From 55f1a3dd7be71c3a4d27bcc1ab6d2b470d06af8c Mon Sep 17 00:00:00 2001
From: William Valentin
Date: Fri, 27 Feb 2026 09:36:22 -0800
Subject: [PATCH] feat(audit): refresh all phase0 live windows in cadence run

---
 README.md                                     |  2 +-
 docs/api/PROTOCOL.md                          |  2 +-
 docs/architecture/AGENT_DIAGRAM.md            |  2 +-
 .../GATEWAY_SESSIONS_AND_QUEUE.md             |  2 +-
 ...phase0-instrumentation-ticket-checklist.md |  2 +-
 .../phase0_baseline_live_2026-02-27.json      | 58 +++++++++----------
 .../phase0_baseline_live_2026-02-27.jsonl     |  5 ++
 .../phase0_baseline_live_2026-02-27.md        | 20 +++----
 ...aseline_live_backend_drift_2026-02-27.json | 28 ++++-----
 ..._baseline_live_backend_drift_2026-02-27.md | 18 +++---
 ...seline_live_backend_native_2026-02-27.json | 46 ++++++++++-----
 ...eline_live_backend_native_2026-02-27.jsonl |  2 +
 ...baseline_live_backend_native_2026-02-27.md | 15 ++---
 ...e_live_backend_pi_embedded_2026-02-27.json | 58 +++++++++----------
 ..._live_backend_pi_embedded_2026-02-27.jsonl |  3 +
 ...ine_live_backend_pi_embedded_2026-02-27.md | 18 +++---
 ...ase0_baseline_live_gateway_2026-02-27.json |  2 +-
 docs/plans/state.json                         | 32 +++++++++-
 package.json                                  |  2 +-
 19 files changed, 189 insertions(+), 128 deletions(-)

diff --git a/README.md b/README.md
index 4418ddd..6abab34 100644
--- a/README.md
+++ b/README.md
@@ -1635,7 +1635,7 @@ pnpm audit:phase0-baseline:live:pi
 pnpm audit:phase0-baseline:live:native
 ```
 
-One-shot refresh for both channel + gateway live windows:
+One-shot refresh for all live baseline windows (channel, gateway, backend-scoped `pi_embedded`, backend-scoped `native`):
 ```bash
 pnpm audit:phase0-baseline:live:refresh
 ```
diff --git a/docs/api/PROTOCOL.md b/docs/api/PROTOCOL.md
index 2764e87..ba3e8d7 100644
--- a/docs/api/PROTOCOL.md
+++ b/docs/api/PROTOCOL.md
@@ -23,7 +23,7 @@ The gateway provides:
 - **HTTP Server**: Serves static dashboard and handles webhook endpoints
 - **Node Capability Negotiation**: Optional companion-node role/capability registration
 
-Operational note: onboarding (`flynn setup` / `flynn onboard`) now runs post-save live readiness checks (model/channel/memory/automation) and prints a guided first-success task flow. Companion CLI now also supports bootstrap-manifest export (`flynn companion --export-bootstrap `), release-bundle export (`--export-release-bundle ` with optional `--signing-key`/`--signing-key-id` signature output), release-bundle verification (`--verify-release-bundle ` with optional `--verify-signing-key`/`--verify-signing-key-id`/`--require-signature`), platform shell-template export (`--export-shell-template `), plus richer shell bootstrap flags for status/location/push (`--app-version`, `--latitude/--longitude`, `--push-token`, etc.) for desktop/mobile app packaging without changing JSON-RPC method/event shapes. Audit observability now includes live phase-0 baseline capture flows: `pnpm audit:phase0-baseline:live` for channel-origin windows, backend-scoped variants (`pnpm audit:phase0-baseline:live:pi` / `pnpm audit:phase0-baseline:live:native`) via `--backend`, `pnpm audit:phase0-baseline:live:gateway` (auto-detected cancel window) for gateway-origin windows, `pnpm audit:phase0-baseline:live:refresh` for one-shot refresh of both windows, and `pnpm audit:phase0-baseline:live:drift` for backend artifact freshness/drift gates (writing `phase0_baseline_live_backend_drift_.md/.json` reports). These scripts default to current UTC-date tags unless `--tag` is explicitly provided.
+Operational note: onboarding (`flynn setup` / `flynn onboard`) now runs post-save live readiness checks (model/channel/memory/automation) and prints a guided first-success task flow. Companion CLI now also supports bootstrap-manifest export (`flynn companion --export-bootstrap `), release-bundle export (`--export-release-bundle ` with optional `--signing-key`/`--signing-key-id` signature output), release-bundle verification (`--verify-release-bundle ` with optional `--verify-signing-key`/`--verify-signing-key-id`/`--require-signature`), platform shell-template export (`--export-shell-template `), plus richer shell bootstrap flags for status/location/push (`--app-version`, `--latitude/--longitude`, `--push-token`, etc.) for desktop/mobile app packaging without changing JSON-RPC method/event shapes. Audit observability now includes live phase-0 baseline capture flows: `pnpm audit:phase0-baseline:live` for channel-origin windows, backend-scoped variants (`pnpm audit:phase0-baseline:live:pi` / `pnpm audit:phase0-baseline:live:native`) via `--backend`, `pnpm audit:phase0-baseline:live:gateway` (auto-detected cancel window) for gateway-origin windows, `pnpm audit:phase0-baseline:live:refresh` for one-shot refresh of all live windows (channel + gateway + backend-scoped), and `pnpm audit:phase0-baseline:live:drift` for backend artifact freshness/drift gates (writing `phase0_baseline_live_backend_drift_.md/.json` reports). These scripts default to current UTC-date tags unless `--tag` is explicitly provided.
 
 ### Execution Model (Sessions + Per-Session Queue)
 
diff --git a/docs/architecture/AGENT_DIAGRAM.md b/docs/architecture/AGENT_DIAGRAM.md
index c2599f0..fad21f2 100644
--- a/docs/architecture/AGENT_DIAGRAM.md
+++ b/docs/architecture/AGENT_DIAGRAM.md
@@ -169,7 +169,7 @@ Gateway streaming UX signals:
 - `pnpm audit:phase0-baseline:live` captures anonymized channel-origin live run/reaction baseline artifacts from real audit logs.
 - `pnpm audit:phase0-baseline:live:pi` and `pnpm audit:phase0-baseline:live:native` capture backend-scoped channel windows using `backend.route` timelines.
 - `pnpm audit:phase0-baseline:live:gateway` captures gateway-origin baseline windows by auto-selecting the latest cancel/cancelled session window (or use `scripts/capture-phase0-live-baseline.ts --source gateway --since ... --until ...` for explicit windows).
-- `pnpm audit:phase0-baseline:live:refresh` runs both channel + gateway capture commands in one step for cadence refreshes.
+- `pnpm audit:phase0-baseline:live:refresh` runs channel + gateway + backend-scoped (`pi_embedded` and `native`) capture commands in one cadence step.
 - `pnpm audit:phase0-baseline:live:drift` evaluates backend-scoped artifact freshness/drift gates and writes `docs/plans/artifacts/phase0_baseline_live_backend_drift_.md/.json`; `pnpm audit:phase0-baseline:live:refresh:drift` runs capture + drift checks in one cadence step.
 - `audit:phase0-baseline:live*` scripts are cadence-safe by default (UTC-date tags auto-generated unless explicitly overridden).
 - Canvas artifacts are persisted by the gateway so session UI surfaces can recover after daemon restarts.
diff --git a/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md b/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md
index 2c8b3e2..cb28841 100644
--- a/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md
+++ b/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md
@@ -34,7 +34,7 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`.
 - Audit phase-0 live telemetry snapshots can be regenerated with `pnpm audit:phase0-baseline:live` (channel-origin anonymized sample JSONL + summary JSON/markdown artifacts).
 - Backend-scoped channel snapshots can be regenerated with `pnpm audit:phase0-baseline:live:pi` / `pnpm audit:phase0-baseline:live:native` (`--backend` filtering via `backend.route` timelines).
 - Gateway-origin phase-0 windows (including cancel-path samples) can be captured with `pnpm audit:phase0-baseline:live:gateway` (auto-detect latest cancel window) or `scripts/capture-phase0-live-baseline.ts --source gateway --since ... --until ...` for explicit bounds.
-- `pnpm audit:phase0-baseline:live:refresh` runs both capture paths to refresh channel + gateway artifacts in one command.
+- `pnpm audit:phase0-baseline:live:refresh` runs channel + gateway + backend-scoped (`pi_embedded` and `native`) capture paths in one command.
 - `pnpm audit:phase0-baseline:live:drift` checks backend-scoped artifact freshness/drift gates and writes `phase0_baseline_live_backend_drift_.md/.json`; `pnpm audit:phase0-baseline:live:refresh:drift` chains refresh + drift checks for scheduled cadence runs.
 - `audit:phase0-baseline:live*` package scripts now omit fixed tags so scheduled runs automatically roll to current UTC-date artifact tags.
 - Companion CLI supports one-shot shell bootstrap metadata for live sessions (`--app-version`/`--status-text`, `--latitude`/`--longitude`, `--push-token`) so desktop/mobile wrappers can initialize node status/location/push in a single launch flow.
diff --git a/docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md b/docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md
index edc8c61..312ae5e 100644
--- a/docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md
+++ b/docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md
@@ -203,7 +203,7 @@ Phase 0 is complete when:
 2. A baseline summary artifact is generated and committed under `docs/plans/artifacts/`.
 3. No user-visible response behavior changed compared to pre-phase baseline.
 
-Follow-up status (2026-02-27): live channel-session artifacts exist under `docs/plans/artifacts/phase0_baseline_live_2026-02-27.*` via `pnpm audit:phase0-baseline:live` (anonymized IDs), and a second gateway-origin live window (including `run.cancel` + `cancel_requested`/`cancelled`) exists under `docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.*`. Gateway window refreshes can now run via `pnpm audit:phase0-baseline:live:gateway` (auto-selected cancel window), both windows can be refreshed together with `pnpm audit:phase0-baseline:live:refresh` (scheduling example included in README), backend-scoped channel windows are now available via `pnpm audit:phase0-baseline:live:pi` / `pnpm audit:phase0-baseline:live:native`, and backend artifact freshness/drift checks are now available via `pnpm audit:phase0-baseline:live:drift` (or chained with `pnpm audit:phase0-baseline:live:refresh:drift`) with drift report artifacts written to `docs/plans/artifacts/phase0_baseline_live_backend_drift_.{md,json}`.
+Follow-up status (2026-02-27): live channel-session artifacts exist under `docs/plans/artifacts/phase0_baseline_live_2026-02-27.*` via `pnpm audit:phase0-baseline:live` (anonymized IDs), and a second gateway-origin live window (including `run.cancel` + `cancel_requested`/`cancelled`) exists under `docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.*`. Gateway window refreshes can now run via `pnpm audit:phase0-baseline:live:gateway` (auto-selected cancel window), all live windows can be refreshed together with `pnpm audit:phase0-baseline:live:refresh` (channel + gateway + backend-scoped `pi`/`native`; scheduling example included in README), backend artifact freshness/drift checks are now available via `pnpm audit:phase0-baseline:live:drift` (or chained with `pnpm audit:phase0-baseline:live:refresh:drift`) with drift report artifacts written to `docs/plans/artifacts/phase0_baseline_live_backend_drift_.{md,json}`.
## Subagent Model Assignment Plan diff --git a/docs/plans/artifacts/phase0_baseline_live_2026-02-27.json b/docs/plans/artifacts/phase0_baseline_live_2026-02-27.json index 83c2722..bb267f4 100644 --- a/docs/plans/artifacts/phase0_baseline_live_2026-02-27.json +++ b/docs/plans/artifacts/phase0_baseline_live_2026-02-27.json @@ -1,8 +1,8 @@ { - "generated_at": "2026-02-27T16:46:42.576Z", + "generated_at": "2026-02-27T17:36:01.625Z", "source_audit_path": "~/.local/share/flynn/audit.log", - "source_event_count": 110, - "sampled_event_count": 104, + "source_event_count": 115, + "sampled_event_count": 109, "filters": { "sources": [ "channel" @@ -22,19 +22,19 @@ }, "summary": { "event_counts": { - "run_state": 65, + "run_state": 68, "run_cancel": 0, "reaction_match": 0, - "reaction_skip": 39 + "reaction_skip": 41 }, "run_outcomes": { "overall": { - "total_outcomes": 27, - "complete": 27, + "total_outcomes": 28, + "complete": 28, "cancelled": 0, "error": 0, "cancel_requested": 0, - "start": 38, + "start": 40, "completion_rate_pct": 100, "cancel_rate_pct": 0, "error_rate_pct": 0 @@ -43,12 +43,12 @@ { "key": "gmail", "stats": { - "total_outcomes": 25, - "complete": 25, + "total_outcomes": 26, + "complete": 26, "cancelled": 0, "error": 0, "cancel_requested": 0, - "start": 25, + "start": 26, "completion_rate_pct": 100, "cancel_rate_pct": 0, "error_rate_pct": 0 @@ -62,7 +62,7 @@ "cancelled": 0, "error": 0, "cancel_requested": 0, - "start": 13, + "start": 14, "completion_rate_pct": 100, "cancel_rate_pct": 0, "error_rate_pct": 0 @@ -112,6 +112,20 @@ "error_rate_pct": 0 } }, + { + "key": "session_f6304f25e43b", + "stats": { + "total_outcomes": 2, + "complete": 2, + "cancelled": 0, + "error": 0, + "cancel_requested": 0, + "start": 2, + "completion_rate_pct": 100, + "cancel_rate_pct": 0, + "error_rate_pct": 0 + } + }, { "key": "session_33469de5a1ee", "stats": { @@ -322,20 +336,6 @@ "error_rate_pct": 0 } }, - { - "key": "session_f6304f25e43b", - "stats": { - "total_outcomes": 1, - 
"complete": 1, - "cancelled": 0, - "error": 0, - "cancel_requested": 0, - "start": 1, - "completion_rate_pct": 100, - "cancel_rate_pct": 0, - "error_rate_pct": 0 - } - }, { "key": "session_fd6536fa5ff4", "stats": { @@ -355,14 +355,14 @@ "cancel_latency_ms": null, "reactions": { "matched": 0, - "skipped": 39, - "total": 39, + "skipped": 41, + "total": 41, "match_rate_pct": 0, "skip_rate_pct": 100, "skip_reasons": [ { "reason": "no_rules", - "count": 39, + "count": 41, "pct": 100 } ] diff --git a/docs/plans/artifacts/phase0_baseline_live_2026-02-27.jsonl b/docs/plans/artifacts/phase0_baseline_live_2026-02-27.jsonl index 39cacf1..8efee97 100644 --- a/docs/plans/artifacts/phase0_baseline_live_2026-02-27.jsonl +++ b/docs/plans/artifacts/phase0_baseline_live_2026-02-27.jsonl @@ -102,3 +102,8 @@ {"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772208000012} {"level":"info","event_type":"run.state","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","state":"start","request_id":"request_a3bafbb93755"},"timestamp":1772208000013} {"level":"info","event_type":"run.state","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","state":"complete","request_id":"request_a3bafbb93755","duration_ms":35239},"timestamp":1772208035252} +{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772211257454} +{"level":"info","event_type":"run.state","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","state":"start","request_id":"request_607c64c2760f"},"timestamp":1772211257454} 
+{"level":"info","event_type":"run.state","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","state":"complete","request_id":"request_607c64c2760f","duration_ms":3870},"timestamp":1772211261324} +{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_534570702ea5","channel":"cron","sender":"sender_552aeb8f1b32","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772211600036} +{"level":"info","event_type":"run.state","event":{"session_id":"session_534570702ea5","channel":"cron","sender":"sender_552aeb8f1b32","source":"channel","state":"start","request_id":"request_c0a9fc76c188"},"timestamp":1772211600036} diff --git a/docs/plans/artifacts/phase0_baseline_live_2026-02-27.md b/docs/plans/artifacts/phase0_baseline_live_2026-02-27.md index c160436..12c5be4 100644 --- a/docs/plans/artifacts/phase0_baseline_live_2026-02-27.md +++ b/docs/plans/artifacts/phase0_baseline_live_2026-02-27.md @@ -1,27 +1,27 @@ # Phase 0 Baseline Telemetry Summary -- Run state events: 65 +- Run state events: 68 - Run cancel events: 0 - Reaction matches: 0 -- Reaction skips: 39 +- Reaction skips: 41 - Sources: channel ## Run Outcomes (Overall) -- Total outcomes: 27 -- Complete: 27 (100.00%) +- Total outcomes: 28 +- Complete: 28 (100.00%) - Cancelled: 0 (0.00%) - Errors: 0 (0.00%) - Cancel requested: 0 -- Starts: 38 +- Starts: 40 ## Run Outcomes by Channel | Channel | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts | | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | -| gmail | 25 | 25 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 25 | -| cron | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 13 | +| gmail | 26 | 26 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 26 | +| cron | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 14 | ## Run Outcomes by Session @@ -30,6 +30,7 @@ | session_2f2f1e414e81 | 5 | 5 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 
5 | | session_f4d8ddc04194 | 3 | 3 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 3 | | session_eabc3c2a91b9 | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 2 | +| session_f6304f25e43b | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 2 | | session_33469de5a1ee | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | | session_3ffb2e631ab1 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | | session_4d9e843358a3 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | @@ -45,7 +46,6 @@ | session_cb9a69d8a362 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | | session_e0a2a17b7329 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | | session_ea839415979e | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | -| session_f6304f25e43b | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | | session_fd6536fa5ff4 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | ## Cancel Latency @@ -55,11 +55,11 @@ ## Reaction Decisions - Matched: 0 (0.00%) -- Skipped: 39 (100.00%) +- Skipped: 41 (100.00%) ### Skip Reasons | Reason | Count | Percent | | --- | ---: | ---: | -| no_rules | 39 | 100.00% | +| no_rules | 41 | 100.00% | diff --git a/docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.json b/docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.json index 66177aa..d2af247 100644 --- a/docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.json +++ b/docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.json @@ -1,5 +1,5 @@ { - "generated_at": "2026-02-27T17:04:49.009Z", + "generated_at": "2026-02-27T17:36:02.803Z", "artifacts_dir": "/home/will/lab/flynn/docs/plans/artifacts", "backends": [ "pi_embedded", @@ -29,15 +29,15 @@ "candidate": { "tag": "2026-02-27", "path": "/home/will/lab/flynn/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json", - "generated_at": "2026-02-27T16:45:18.488Z" + "generated_at": "2026-02-27T17:36:02.214Z" }, "baseline": null, "comparison": { "baseline": null, "candidate": { - 
"source_event_count": 110, - "sampled_event_count": 56, - "run_total_outcomes": 25, + "source_event_count": 115, + "sampled_event_count": 59, + "run_total_outcomes": 26, "completion_rate_pct": 100, "cancel_rate_pct": 0, "error_rate_pct": 0, @@ -59,7 +59,7 @@ "freshness": { "enabled": true, "pass": true, - "actual_age_hours": 0.33, + "actual_age_hours": 0, "threshold_hours": 36 }, "drift_gate": { @@ -68,7 +68,7 @@ { "criterion": "candidate_sampled_events", "pass": true, - "actual": "56", + "actual": "59", "threshold": ">= 10" }, { @@ -116,21 +116,21 @@ "candidate": { "tag": "2026-02-27", "path": "/home/will/lab/flynn/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json", - "generated_at": "2026-02-27T16:45:18.490Z" + "generated_at": "2026-02-27T17:36:02.514Z" }, "baseline": null, "comparison": { "baseline": null, "candidate": { - "source_event_count": 110, - "sampled_event_count": 13, + "source_event_count": 115, + "sampled_event_count": 15, "run_total_outcomes": 2, "completion_rate_pct": 100, "cancel_rate_pct": 0, "error_rate_pct": 0, "cancel_latency_p95_ms": null, - "reaction_match_rate_pct": null, - "reaction_skip_rate_pct": null + "reaction_match_rate_pct": 0, + "reaction_skip_rate_pct": 100 }, "deltas": { "sampled_event_count_pct": null, @@ -146,7 +146,7 @@ "freshness": { "enabled": true, "pass": true, - "actual_age_hours": 0.33, + "actual_age_hours": 0, "threshold_hours": 36 }, "drift_gate": { @@ -155,7 +155,7 @@ { "criterion": "candidate_sampled_events", "pass": true, - "actual": "13", + "actual": "15", "threshold": ">= 10" }, { diff --git a/docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.md b/docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.md index 93cdc14..4a77993 100644 --- a/docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.md +++ b/docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.md @@ -1,6 +1,6 @@ # Phase-0 Backend Drift Check -Generated at: 
2026-02-27T17:04:49.009Z +Generated at: 2026-02-27T17:36:02.803Z Artifacts: /home/will/lab/flynn/docs/plans/artifacts Backends: pi_embedded, native Freshness max age (hours): 36 @@ -19,9 +19,9 @@ Overall gate: PASS ## pi_embedded - status: PASS - candidate: tag=2026-02-27 file=/home/will/lab/flynn/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json -- candidate generated_at: 2026-02-27T16:45:18.488Z +- candidate generated_at: 2026-02-27T17:36:02.214Z - baseline: none -- candidate snapshot: sampled=56 outcomes=25 completion=100% cancel=0% error=0% cancel_p95_ms=n/a +- candidate snapshot: sampled=59 outcomes=26 completion=100% cancel=0% error=0% cancel_p95_ms=n/a - deltas: sampled_event_count_pct=n/a run_total_outcomes_pct=n/a @@ -31,9 +31,9 @@ Overall gate: PASS cancel_latency_p95_ms=n/a reaction_match_rate_pp=n/a reaction_skip_rate_pp=n/a -- freshness gate: PASS (age_hours=0.33 threshold=36) +- freshness gate: PASS (age_hours=0 threshold=36) - drift gate: PASS - PASS candidate_sampled_events actual=56 threshold=>= 10 + PASS candidate_sampled_events actual=59 threshold=>= 10 PASS sampled_events_drop_pct actual=n/a threshold=<= 80 PASS run_outcomes_drop_pct actual=n/a threshold=<= 80 PASS completion_rate_drop_pp actual=n/a threshold=<= 35 @@ -44,9 +44,9 @@ Overall gate: PASS ## native - status: PASS - candidate: tag=2026-02-27 file=/home/will/lab/flynn/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json -- candidate generated_at: 2026-02-27T16:45:18.490Z +- candidate generated_at: 2026-02-27T17:36:02.514Z - baseline: none -- candidate snapshot: sampled=13 outcomes=2 completion=100% cancel=0% error=0% cancel_p95_ms=n/a +- candidate snapshot: sampled=15 outcomes=2 completion=100% cancel=0% error=0% cancel_p95_ms=n/a - deltas: sampled_event_count_pct=n/a run_total_outcomes_pct=n/a @@ -56,9 +56,9 @@ Overall gate: PASS cancel_latency_p95_ms=n/a reaction_match_rate_pp=n/a reaction_skip_rate_pp=n/a -- freshness gate: PASS 
(age_hours=0.33 threshold=36) +- freshness gate: PASS (age_hours=0 threshold=36) - drift gate: PASS - PASS candidate_sampled_events actual=13 threshold=>= 10 + PASS candidate_sampled_events actual=15 threshold=>= 10 PASS sampled_events_drop_pct actual=n/a threshold=<= 80 PASS run_outcomes_drop_pct actual=n/a threshold=<= 80 PASS completion_rate_drop_pp actual=n/a threshold=<= 35 diff --git a/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json b/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json index 27dd98d..76e5159 100644 --- a/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json +++ b/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json @@ -1,8 +1,8 @@ { - "generated_at": "2026-02-27T16:45:18.490Z", + "generated_at": "2026-02-27T17:36:02.514Z", "source_audit_path": "~/.local/share/flynn/audit.log", - "source_event_count": 110, - "sampled_event_count": 13, + "source_event_count": 115, + "sampled_event_count": 15, "filters": { "sources": [ "channel" @@ -14,7 +14,7 @@ "probe" ], "anonymized_identifiers": true, - "backend_route_event_count": 127 + "backend_route_event_count": 129 }, "options": { "sources": [ @@ -26,10 +26,10 @@ }, "summary": { "event_counts": { - "run_state": 13, + "run_state": 14, "run_cancel": 0, "reaction_match": 0, - "reaction_skip": 0 + "reaction_skip": 1 }, "run_outcomes": { "overall": { @@ -38,7 +38,7 @@ "cancelled": 0, "error": 0, "cancel_requested": 0, - "start": 11, + "start": 12, "completion_rate_pct": 100, "cancel_rate_pct": 0, "error_rate_pct": 0 @@ -52,7 +52,7 @@ "cancelled": 0, "error": 0, "cancel_requested": 0, - "start": 11, + "start": 12, "completion_rate_pct": 100, "cancel_rate_pct": 0, "error_rate_pct": 0 @@ -172,6 +172,20 @@ "error_rate_pct": null } }, + { + "key": "session_534570702ea5", + "stats": { + "total_outcomes": 0, + "complete": 0, + "cancelled": 0, + "error": 0, + "cancel_requested": 0, + "start": 1, + "completion_rate_pct": null, + 
"cancel_rate_pct": null, + "error_rate_pct": null + } + }, { "key": "session_683372f346c3", "stats": { @@ -219,11 +233,17 @@ "cancel_latency_ms": null, "reactions": { "matched": 0, - "skipped": 0, - "total": 0, - "match_rate_pct": null, - "skip_rate_pct": null, - "skip_reasons": [] + "skipped": 1, + "total": 1, + "match_rate_pct": 0, + "skip_rate_pct": 100, + "skip_reasons": [ + { + "reason": "no_rules", + "count": 1, + "pct": 100 + } + ] } } } diff --git a/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.jsonl b/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.jsonl index 6ad563e..92f3074 100644 --- a/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.jsonl +++ b/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.jsonl @@ -11,3 +11,5 @@ {"level":"info","event_type":"run.state","event":{"session_id":"session_a3f64a8e3c1e","channel":"cron","sender":"sender_a31bd6d4a95a","source":"channel","state":"start","request_id":"request_fc572d83d4c6"},"timestamp":1772182800034} {"level":"info","event_type":"run.state","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","state":"start","request_id":"request_a3bafbb93755"},"timestamp":1772208000013} {"level":"info","event_type":"run.state","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","state":"complete","request_id":"request_a3bafbb93755","duration_ms":35239},"timestamp":1772208035252} +{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_534570702ea5","channel":"cron","sender":"sender_552aeb8f1b32","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772211600036} 
+{"level":"info","event_type":"run.state","event":{"session_id":"session_534570702ea5","channel":"cron","sender":"sender_552aeb8f1b32","source":"channel","state":"start","request_id":"request_c0a9fc76c188"},"timestamp":1772211600036} diff --git a/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.md b/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.md index ab51d57..e28304f 100644 --- a/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.md +++ b/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.md @@ -1,9 +1,9 @@ # Phase 0 Baseline Telemetry Summary -- Run state events: 13 +- Run state events: 14 - Run cancel events: 0 - Reaction matches: 0 -- Reaction skips: 0 +- Reaction skips: 1 - Sources: channel @@ -14,13 +14,13 @@ - Cancelled: 0 (0.00%) - Errors: 0 (0.00%) - Cancel requested: 0 -- Starts: 11 +- Starts: 12 ## Run Outcomes by Channel | Channel | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts | | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | -| cron | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 11 | +| cron | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 12 | ## Run Outcomes by Session @@ -34,6 +34,7 @@ | session_494cb3b392af | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 | | session_49b700741e03 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 | | session_4cd8ba5e6df5 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 | +| session_534570702ea5 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 | | session_683372f346c3 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 | | session_a3f64a8e3c1e | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 | | session_ffcee254d546 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 | @@ -44,12 +45,12 @@ ## Reaction Decisions -- Matched: 0 (n/a) -- Skipped: 0 (n/a) +- Matched: 0 (0.00%) +- Skipped: 1 (100.00%) ### Skip Reasons | Reason | Count | Percent | | --- | ---: | ---: | -| _none_ | 0 | 0.00% | +| no_rules | 1 | 100.00% | diff --git 
a/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json b/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json index 2e01ba4..4815b67 100644 --- a/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json +++ b/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json @@ -1,8 +1,8 @@ { - "generated_at": "2026-02-27T16:45:18.488Z", + "generated_at": "2026-02-27T17:36:02.214Z", "source_audit_path": "~/.local/share/flynn/audit.log", - "source_event_count": 110, - "sampled_event_count": 56, + "source_event_count": 115, + "sampled_event_count": 59, "filters": { "sources": [ "channel" @@ -14,7 +14,7 @@ "probe" ], "anonymized_identifiers": true, - "backend_route_event_count": 127 + "backend_route_event_count": 129 }, "options": { "sources": [ @@ -26,19 +26,19 @@ }, "summary": { "event_counts": { - "run_state": 42, + "run_state": 44, "run_cancel": 0, "reaction_match": 0, - "reaction_skip": 14 + "reaction_skip": 15 }, "run_outcomes": { "overall": { - "total_outcomes": 25, - "complete": 25, + "total_outcomes": 26, + "complete": 26, "cancelled": 0, "error": 0, "cancel_requested": 0, - "start": 17, + "start": 18, "completion_rate_pct": 100, "cancel_rate_pct": 0, "error_rate_pct": 0 @@ -47,12 +47,12 @@ { "key": "gmail", "stats": { - "total_outcomes": 25, - "complete": 25, + "total_outcomes": 26, + "complete": 26, "cancelled": 0, "error": 0, "cancel_requested": 0, - "start": 17, + "start": 18, "completion_rate_pct": 100, "cancel_rate_pct": 0, "error_rate_pct": 0 @@ -102,6 +102,20 @@ "error_rate_pct": 0 } }, + { + "key": "session_f6304f25e43b", + "stats": { + "total_outcomes": 2, + "complete": 2, + "cancelled": 0, + "error": 0, + "cancel_requested": 0, + "start": 2, + "completion_rate_pct": 100, + "cancel_rate_pct": 0, + "error_rate_pct": 0 + } + }, { "key": "session_33469de5a1ee", "stats": { @@ -284,20 +298,6 @@ "error_rate_pct": 0 } }, - { - "key": "session_f6304f25e43b", - "stats": { - 
"total_outcomes": 1, - "complete": 1, - "cancelled": 0, - "error": 0, - "cancel_requested": 0, - "start": 1, - "completion_rate_pct": 100, - "cancel_rate_pct": 0, - "error_rate_pct": 0 - } - }, { "key": "session_fd6536fa5ff4", "stats": { @@ -317,14 +317,14 @@ "cancel_latency_ms": null, "reactions": { "matched": 0, - "skipped": 14, - "total": 14, + "skipped": 15, + "total": 15, "match_rate_pct": 0, "skip_rate_pct": 100, "skip_reasons": [ { "reason": "no_rules", - "count": 14, + "count": 15, "pct": 100 } ] diff --git a/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.jsonl b/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.jsonl index 717fe6b..8dc6320 100644 --- a/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.jsonl +++ b/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.jsonl @@ -54,3 +54,6 @@ {"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_2f2f1e414e81","channel":"gmail","sender":"sender_323cedc3233a","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772206157229} {"level":"info","event_type":"run.state","event":{"session_id":"session_2f2f1e414e81","channel":"gmail","sender":"sender_323cedc3233a","source":"channel","state":"start","request_id":"request_ab73d670c119"},"timestamp":1772206157229} {"level":"info","event_type":"run.state","event":{"session_id":"session_2f2f1e414e81","channel":"gmail","sender":"sender_323cedc3233a","source":"channel","state":"complete","request_id":"request_ab73d670c119","duration_ms":3850},"timestamp":1772206161079} +{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772211257454} 
+{"level":"info","event_type":"run.state","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","state":"start","request_id":"request_607c64c2760f"},"timestamp":1772211257454} +{"level":"info","event_type":"run.state","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","state":"complete","request_id":"request_607c64c2760f","duration_ms":3870},"timestamp":1772211261324} diff --git a/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.md b/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.md index 154047c..55fea44 100644 --- a/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.md +++ b/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.md @@ -1,26 +1,26 @@ # Phase 0 Baseline Telemetry Summary -- Run state events: 42 +- Run state events: 44 - Run cancel events: 0 - Reaction matches: 0 -- Reaction skips: 14 +- Reaction skips: 15 - Sources: channel ## Run Outcomes (Overall) -- Total outcomes: 25 -- Complete: 25 (100.00%) +- Total outcomes: 26 +- Complete: 26 (100.00%) - Cancelled: 0 (0.00%) - Errors: 0 (0.00%) - Cancel requested: 0 -- Starts: 17 +- Starts: 18 ## Run Outcomes by Channel | Channel | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts | | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | -| gmail | 25 | 25 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 17 | +| gmail | 26 | 26 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 18 | ## Run Outcomes by Session @@ -29,6 +29,7 @@ | session_2f2f1e414e81 | 5 | 5 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 5 | | session_f4d8ddc04194 | 3 | 3 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 3 | | session_eabc3c2a91b9 | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | +| session_f6304f25e43b | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 2 | | session_33469de5a1ee | 1 | 1 | 0 | 0 
| 100.00% | 0.00% | 0.00% | 0 | 1 | | session_3ffb2e631ab1 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 0 | | session_4d9e843358a3 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 0 | @@ -42,7 +43,6 @@ | session_cb9a69d8a362 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | | session_e0a2a17b7329 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | | session_ea839415979e | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | -| session_f6304f25e43b | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | | session_fd6536fa5ff4 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 | ## Cancel Latency @@ -52,11 +52,11 @@ ## Reaction Decisions - Matched: 0 (0.00%) -- Skipped: 14 (100.00%) +- Skipped: 15 (100.00%) ### Skip Reasons | Reason | Count | Percent | | --- | ---: | ---: | -| no_rules | 14 | 100.00% | +| no_rules | 15 | 100.00% | diff --git a/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json b/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json index 5711b12..80842a1 100644 --- a/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json +++ b/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json @@ -1,5 +1,5 @@ { - "generated_at": "2026-02-27T16:46:42.880Z", + "generated_at": "2026-02-27T17:36:01.922Z", "source_audit_path": "~/.local/share/flynn/audit.log", "source_event_count": 6, "sampled_event_count": 6, diff --git a/docs/plans/state.json b/docs/plans/state.json index ac4cb46..f9a2f6a 100644 --- a/docs/plans/state.json +++ b/docs/plans/state.json @@ -234,6 +234,36 @@ ], "test_status": "pnpm audit:phase0-baseline:live:drift + pnpm test:run src/audit/phase0BaselineDrift.test.ts + pnpm typecheck passing" }, + "phase0-live-baseline-refresh-full-window": { + "status": "completed", + "date": "2026-02-27", + "updated": "2026-02-27", + "summary": "Expanded `pnpm audit:phase0-baseline:live:refresh` to regenerate all live windows in one command (channel, gateway, backend-scoped `pi_embedded`, backend-scoped `native`) so 
scheduled `refresh:drift` runs keep backend artifacts fresh for baseline-vs-prior comparisons.", + "files_modified": [ + "package.json", + "README.md", + "docs/api/PROTOCOL.md", + "docs/architecture/AGENT_DIAGRAM.md", + "docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md", + "docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md", + "docs/plans/artifacts/phase0_baseline_live_2026-02-27.jsonl", + "docs/plans/artifacts/phase0_baseline_live_2026-02-27.md", + "docs/plans/artifacts/phase0_baseline_live_2026-02-27.json", + "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.jsonl", + "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.md", + "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json", + "docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.jsonl", + "docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.md", + "docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json", + "docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.jsonl", + "docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.md", + "docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json", + "docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.md", + "docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.json", + "docs/plans/state.json" + ], + "test_status": "pnpm audit:phase0-baseline:live:refresh:drift + pnpm test:run src/audit/phase0BaselineDrift.test.ts + pnpm typecheck passing" + }, "phase0-instrumentation-ticket-checklist": { "status": "completed", "date": "2026-02-25", @@ -7420,7 +7450,7 @@ "deeper_surfaces_phase0_ticket_03": "completed — gateway metrics now track run-state outcomes, cancel latency samples, and reaction decision counters with routing/gateway emitters", "deeper_surfaces_phase0_ticket_04": "completed — added phase-0 baseline summary tooling for run outcomes, cancel latency, and reaction decisions with 
markdown/json CLI output", "deeper_surfaces_phase0_ticket_05": "completed — documented phase-0 telemetry fields/workflow, refreshed architecture/protocol docs, generated anonymized live baseline artifacts for channel/gateway/backend-scoped (pi/native) windows, and added backend artifact freshness/drift gates with persisted drift reports (`phase0_baseline_live_backend_drift_.{md,json}`)", - "next_up": "Run scheduled `pnpm audit:phase0-baseline:live:refresh:drift` in each active environment and collect at least one additional UTC-date drift artifact so baseline-vs-prior comparisons become active before tightening thresholds or changing additional run-control/reaction semantics.", + "next_up": "Run scheduled `pnpm audit:phase0-baseline:live:refresh:drift` in each active environment (now refreshing channel + gateway + backend-scoped windows together) and collect at least one additional UTC-date drift artifact so baseline-vs-prior comparisons become active before tightening thresholds or changing additional run-control/reaction semantics.", "pi_embedded_canary_spike": "completed — added optional pi_embedded backend adapter, canary-safe no-tools routing guard, backend success/fallback latency audit events, and docs/diagram updates while native remains default", "pi_embedded_evaluation_phase": "completed — final decision rollback (applied in runtime config): Window A failed latency/fallback gates (p50 +259ms, p95 +5695ms, fallback 25%, categories: pi_module_interface/empty_assistant_text); Window B remained sample-insufficient; controlled probes verified guard coverage (pi_no_tools_mode/capability_query/attachments_present each hit once)", "pi_embedded_manual_mode": "completed — added persisted runtime backend controls for manual Pi activation/deactivation (`/runtime` preferred, `/backend` alias; `status`, `activate pi`, `deactivate pi`, `use config`) while keeping config-driven default routing", diff --git a/package.json b/package.json index b00b5e7..53dfde8 100644 --- 
a/package.json +++ b/package.json @@ -26,7 +26,7 @@ "audit:phase0-baseline:live:pi": "node --import tsx/esm scripts/capture-phase0-live-baseline.ts --audit ~/.local/share/flynn/audit.log --source channel --backend pi_embedded --exclude-session-substring probe", "audit:phase0-baseline:live:native": "node --import tsx/esm scripts/capture-phase0-live-baseline.ts --audit ~/.local/share/flynn/audit.log --source channel --backend native --exclude-session-substring probe", "audit:phase0-baseline:live:gateway": "node --import tsx/esm scripts/capture-phase0-live-baseline.ts --audit ~/.local/share/flynn/audit.log --source gateway --auto-gateway-cancel-window", - "audit:phase0-baseline:live:refresh": "pnpm audit:phase0-baseline:live && pnpm audit:phase0-baseline:live:gateway", + "audit:phase0-baseline:live:refresh": "pnpm audit:phase0-baseline:live && pnpm audit:phase0-baseline:live:gateway && pnpm audit:phase0-baseline:live:pi && pnpm audit:phase0-baseline:live:native", "audit:phase0-baseline:live:drift": "node --import tsx/esm scripts/check-phase0-baseline-backend-drift.ts --artifacts-dir docs/plans/artifacts --backend pi_embedded,native --max-age-hours 36 --min-candidate-sampled-events 10 --max-sampled-events-drop-pct 80 --max-run-outcomes-drop-pct 80 --max-completion-rate-drop-pp 35 --max-cancel-rate-increase-pp 25 --max-error-rate-increase-pp 25 --max-cancel-latency-p95-increase-ms 6000 --write-default-artifacts", "audit:phase0-baseline:live:refresh:drift": "pnpm audit:phase0-baseline:live:refresh && pnpm audit:phase0-baseline:live:drift", "audit:backend-canary:probes": "node --import tsx/esm scripts/run-pi-canary-guard-probes.ts",