docs(eval): add window B telemetry slice and maintain hold decision

William Valentin
2026-02-23 22:31:06 -08:00
parent 9156adb2a8
commit 2d31f85c75
4 changed files with 164 additions and 12 deletions
+12 -9
@@ -81,17 +81,19 @@ pnpm audit:backend-canary \
### Window B
-- Dates: _TBD_
-- Route volume: _TBD_
-- Summary artifact: _TBD_
+- Dates: February 24, 2026 (since 06:14:00Z; post-initial-fallback slice)
+- Route volume: 6 total routes (`pi_embedded`: 6, `native`: 0)
+- Summary artifacts:
+  - `docs/plans/artifacts/pi_embedded_eval_window_b_2026-02-24_post_fallbacks.md`
+  - `docs/plans/artifacts/pi_embedded_eval_window_b_2026-02-24_post_fallbacks.json`
| Check | Result | Notes |
| --- | --- | --- |
-| Completion rate delta | _TBD_ | |
-| P50 latency delta | _TBD_ | |
-| P95 latency delta | _TBD_ | |
-| Fallback rate | _TBD_ | |
-| Guardrail escapes | _TBD_ | |
+| Completion rate delta | n/a (insufficient baseline) | no native-routed turns in this slice |
+| P50 latency delta | n/a (insufficient baseline) | no native-routed turns in this slice |
+| P95 latency delta | n/a (insufficient baseline) | no native-routed turns in this slice |
+| Fallback rate | 0.00% (pass) | 0 fallbacks / 6 attempts |
+| Guardrail escapes | none observed (provisional pass) | no `forced_native_guard` events in this window |
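The Window B results in the table can be reproduced with a small calculation. The sketch below is illustrative only: the `WindowSlice` shape, its field names, and the `summarizeWindow` helper are hypothetical and not taken from the repository. It also assumes that fallback attempts equal the number of `pi_embedded` routes, which matches the "0 fallbacks / 6 attempts" note but is not stated as a rule in the source.

```typescript
// Hypothetical shape for one telemetry window; field names are assumptions.
interface WindowSlice {
  piEmbeddedRoutes: number; // turns routed to pi_embedded
  nativeRoutes: number;     // turns routed to native (the baseline)
  fallbacks: number;        // pi_embedded attempts that fell back
}

interface WindowSummary {
  fallbackRate: string;
  deltaGates: string;
}

// Summarize a window: fallback rate as a percentage, and whether the
// delta gates (completion, p50, p95) have a baseline to compare against.
function summarizeWindow(slice: WindowSlice): WindowSummary {
  const attempts = slice.piEmbeddedRoutes; // assumption: one attempt per routed turn
  const fallbackRate =
    attempts === 0
      ? "n/a (no attempts)"
      : `${((slice.fallbacks / attempts) * 100).toFixed(2)}%`;
  // Without native-routed turns there is nothing to compute a delta against.
  const deltaGates =
    slice.nativeRoutes === 0 ? "n/a (insufficient baseline)" : "evaluable";
  return { fallbackRate, deltaGates };
}

// Window B as reported: 6 pi_embedded routes, 0 native routes, 0 fallbacks.
const windowB = summarizeWindow({ piEmbeddedRoutes: 6, nativeRoutes: 0, fallbacks: 0 });
console.log(windowB);
```

With the Window B counts, the fallback rate comes out as 0.00% and the delta gates are reported as not evaluable, which is why the hold decision depends on a later window that includes native-routed turns.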
## Tool Compatibility Findings
@@ -110,7 +112,8 @@ Track all tool-adjacent/risky prompts that were force-routed to native (`no_tool
- Rationale: Window A fails 3/4 numeric gates (p50 delta, p95 delta, fallback rate) with only 10 total routed turns, including two concrete fallback failure modes:
- module session factory mismatch
- no assistant text returned from Pi runtime
-- Next cohort/config delta: none until Window B confirms gate pass and fallback causes are remediated.
+Window B shows fallback recovery (0%) in a post-fallback slice but cannot evaluate delta gates because it contains no baseline native routes.
+- Next cohort/config delta: none until an additional baseline-balanced window confirms delta gates and guardrail coverage probes are completed.
## Diagram/Protocol Impact Review