feat(audit): refresh all phase0 live windows in cadence run

2026-02-27 09:36:22 -08:00
parent e905fe1d56
commit 55f1a3dd7b
19 changed files with 189 additions and 128 deletions
@@ -1635,7 +1635,7 @@ pnpm audit:phase0-baseline:live:pi
 pnpm audit:phase0-baseline:live:native
 ```

-One-shot refresh for both channel + gateway live windows:
+One-shot refresh for all live baseline windows (channel, gateway, backend-scoped `pi_embedded`, backend-scoped `native`):
 ```bash
 pnpm audit:phase0-baseline:live:refresh
 ```
@@ -23,7 +23,7 @@ The gateway provides:
 - **HTTP Server**: Serves static dashboard and handles webhook endpoints
 - **Node Capability Negotiation**: Optional companion-node role/capability registration

-Operational note: onboarding (`flynn setup` / `flynn onboard`) now runs post-save live readiness checks (model/channel/memory/automation) and prints a guided first-success task flow. Companion CLI now also supports bootstrap-manifest export (`flynn companion --export-bootstrap <path|->`), release-bundle export (`--export-release-bundle <dir>` with optional `--signing-key`/`--signing-key-id` signature output), release-bundle verification (`--verify-release-bundle <dir>` with optional `--verify-signing-key`/`--verify-signing-key-id`/`--require-signature`), platform shell-template export (`--export-shell-template <dir>`), plus richer shell bootstrap flags for status/location/push (`--app-version`, `--latitude/--longitude`, `--push-token`, etc.) for desktop/mobile app packaging without changing JSON-RPC method/event shapes. Audit observability now includes live phase-0 baseline capture flows: `pnpm audit:phase0-baseline:live` for channel-origin windows, backend-scoped variants (`pnpm audit:phase0-baseline:live:pi` / `pnpm audit:phase0-baseline:live:native`) via `--backend`, `pnpm audit:phase0-baseline:live:gateway` (auto-detected cancel window) for gateway-origin windows, `pnpm audit:phase0-baseline:live:refresh` for one-shot refresh of both windows, and `pnpm audit:phase0-baseline:live:drift` for backend artifact freshness/drift gates (writing `phase0_baseline_live_backend_drift_<UTC-date>.md/.json` reports). These scripts default to current UTC-date tags unless `--tag` is explicitly provided.
+Operational note: onboarding (`flynn setup` / `flynn onboard`) now runs post-save live readiness checks (model/channel/memory/automation) and prints a guided first-success task flow. Companion CLI now also supports bootstrap-manifest export (`flynn companion --export-bootstrap <path|->`), release-bundle export (`--export-release-bundle <dir>` with optional `--signing-key`/`--signing-key-id` signature output), release-bundle verification (`--verify-release-bundle <dir>` with optional `--verify-signing-key`/`--verify-signing-key-id`/`--require-signature`), platform shell-template export (`--export-shell-template <dir>`), plus richer shell bootstrap flags for status/location/push (`--app-version`, `--latitude/--longitude`, `--push-token`, etc.) for desktop/mobile app packaging without changing JSON-RPC method/event shapes. Audit observability now includes live phase-0 baseline capture flows: `pnpm audit:phase0-baseline:live` for channel-origin windows, backend-scoped variants (`pnpm audit:phase0-baseline:live:pi` / `pnpm audit:phase0-baseline:live:native`) via `--backend`, `pnpm audit:phase0-baseline:live:gateway` (auto-detected cancel window) for gateway-origin windows, `pnpm audit:phase0-baseline:live:refresh` for one-shot refresh of all live windows (channel + gateway + backend-scoped), and `pnpm audit:phase0-baseline:live:drift` for backend artifact freshness/drift gates (writing `phase0_baseline_live_backend_drift_<UTC-date>.md/.json` reports). These scripts default to current UTC-date tags unless `--tag` is explicitly provided.

 ### Execution Model (Sessions + Per-Session Queue)

@@ -169,7 +169,7 @@ Gateway streaming UX signals:
 - `pnpm audit:phase0-baseline:live` captures anonymized channel-origin live run/reaction baseline artifacts from real audit logs.
 - `pnpm audit:phase0-baseline:live:pi` and `pnpm audit:phase0-baseline:live:native` capture backend-scoped channel windows using `backend.route` timelines.
 - `pnpm audit:phase0-baseline:live:gateway` captures gateway-origin baseline windows by auto-selecting the latest cancel/cancelled session window (or use `scripts/capture-phase0-live-baseline.ts --source gateway --since ... --until ...` for explicit windows).
-- `pnpm audit:phase0-baseline:live:refresh` runs both channel + gateway capture commands in one step for cadence refreshes.
+- `pnpm audit:phase0-baseline:live:refresh` runs channel + gateway + backend-scoped (`pi_embedded` and `native`) capture commands in one cadence step.
 - `pnpm audit:phase0-baseline:live:drift` evaluates backend-scoped artifact freshness/drift gates and writes `docs/plans/artifacts/phase0_baseline_live_backend_drift_<UTC-date>.md/.json`; `pnpm audit:phase0-baseline:live:refresh:drift` runs capture + drift checks in one cadence step.
 - `audit:phase0-baseline:live*` scripts are cadence-safe by default (UTC-date tags auto-generated unless explicitly overridden).
 - Canvas artifacts are persisted by the gateway so session UI surfaces can recover after daemon restarts.
@@ -34,7 +34,7 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`.
 - Audit phase-0 live telemetry snapshots can be regenerated with `pnpm audit:phase0-baseline:live` (channel-origin anonymized sample JSONL + summary JSON/markdown artifacts).
 - Backend-scoped channel snapshots can be regenerated with `pnpm audit:phase0-baseline:live:pi` / `pnpm audit:phase0-baseline:live:native` (`--backend` filtering via `backend.route` timelines).
 - Gateway-origin phase-0 windows (including cancel-path samples) can be captured with `pnpm audit:phase0-baseline:live:gateway` (auto-detect latest cancel window) or `scripts/capture-phase0-live-baseline.ts --source gateway --since ... --until ...` for explicit bounds.
-- `pnpm audit:phase0-baseline:live:refresh` runs both capture paths to refresh channel + gateway artifacts in one command.
+- `pnpm audit:phase0-baseline:live:refresh` runs channel + gateway + backend-scoped (`pi_embedded` and `native`) capture paths in one command.
 - `pnpm audit:phase0-baseline:live:drift` checks backend-scoped artifact freshness/drift gates and writes `phase0_baseline_live_backend_drift_<UTC-date>.md/.json`; `pnpm audit:phase0-baseline:live:refresh:drift` chains refresh + drift checks for scheduled cadence runs.
 - `audit:phase0-baseline:live*` package scripts now omit fixed tags so scheduled runs automatically roll to current UTC-date artifact tags.
 - Companion CLI supports one-shot shell bootstrap metadata for live sessions (`--app-version`/`--status-text`, `--latitude`/`--longitude`, `--push-token`) so desktop/mobile wrappers can initialize node status/location/push in a single launch flow.
@@ -203,7 +203,7 @@ Phase 0 is complete when:
 2. A baseline summary artifact is generated and committed under `docs/plans/artifacts/`.
 3. No user-visible response behavior changed compared to pre-phase baseline.

-Follow-up status (2026-02-27): live channel-session artifacts exist under `docs/plans/artifacts/phase0_baseline_live_2026-02-27.*` via `pnpm audit:phase0-baseline:live` (anonymized IDs), and a second gateway-origin live window (including `run.cancel` + `cancel_requested`/`cancelled`) exists under `docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.*`. Gateway window refreshes can now run via `pnpm audit:phase0-baseline:live:gateway` (auto-selected cancel window), both windows can be refreshed together with `pnpm audit:phase0-baseline:live:refresh` (scheduling example included in README), backend-scoped channel windows are now available via `pnpm audit:phase0-baseline:live:pi` / `pnpm audit:phase0-baseline:live:native`, and backend artifact freshness/drift checks are now available via `pnpm audit:phase0-baseline:live:drift` (or chained with `pnpm audit:phase0-baseline:live:refresh:drift`) with drift report artifacts written to `docs/plans/artifacts/phase0_baseline_live_backend_drift_<UTC-date>.{md,json}`.
+Follow-up status (2026-02-27): live channel-session artifacts exist under `docs/plans/artifacts/phase0_baseline_live_2026-02-27.*` via `pnpm audit:phase0-baseline:live` (anonymized IDs), and a second gateway-origin live window (including `run.cancel` + `cancel_requested`/`cancelled`) exists under `docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.*`. Gateway window refreshes can now run via `pnpm audit:phase0-baseline:live:gateway` (auto-selected cancel window), all live windows can be refreshed together with `pnpm audit:phase0-baseline:live:refresh` (channel + gateway + backend-scoped `pi_embedded`/`native`; scheduling example included in README), and backend artifact freshness/drift checks are now available via `pnpm audit:phase0-baseline:live:drift` (or chained with `pnpm audit:phase0-baseline:live:refresh:drift`) with drift report artifacts written to `docs/plans/artifacts/phase0_baseline_live_backend_drift_<UTC-date>.{md,json}`.

 ## Subagent Model Assignment Plan

@@ -1,8 +1,8 @@
 {
-  "generated_at": "2026-02-27T16:46:42.576Z",
+  "generated_at": "2026-02-27T17:36:01.625Z",
  "source_audit_path": "~/.local/share/flynn/audit.log",
-  "source_event_count": 110,
-  "sampled_event_count": 104,
+  "source_event_count": 115,
+  "sampled_event_count": 109,
  "filters": {
    "sources": [
      "channel"
@@ -22,19 +22,19 @@
  },
  "summary": {
    "event_counts": {
-      "run_state": 65,
+      "run_state": 68,
      "run_cancel": 0,
      "reaction_match": 0,
-      "reaction_skip": 39
+      "reaction_skip": 41
    },
    "run_outcomes": {
      "overall": {
-        "total_outcomes": 27,
-        "complete": 27,
+        "total_outcomes": 28,
+        "complete": 28,
        "cancelled": 0,
        "error": 0,
        "cancel_requested": 0,
-        "start": 38,
+        "start": 40,
        "completion_rate_pct": 100,
        "cancel_rate_pct": 0,
        "error_rate_pct": 0
@@ -43,12 +43,12 @@
        {
          "key": "gmail",
          "stats": {
-            "total_outcomes": 25,
-            "complete": 25,
+            "total_outcomes": 26,
+            "complete": 26,
            "cancelled": 0,
            "error": 0,
            "cancel_requested": 0,
-            "start": 25,
+            "start": 26,
            "completion_rate_pct": 100,
            "cancel_rate_pct": 0,
            "error_rate_pct": 0
@@ -62,7 +62,7 @@
            "cancelled": 0,
            "error": 0,
            "cancel_requested": 0,
-            "start": 13,
+            "start": 14,
            "completion_rate_pct": 100,
            "cancel_rate_pct": 0,
            "error_rate_pct": 0
@@ -112,6 +112,20 @@
            "error_rate_pct": 0
          }
        },
+        {
+          "key": "session_f6304f25e43b",
+          "stats": {
+            "total_outcomes": 2,
+            "complete": 2,
+            "cancelled": 0,
+            "error": 0,
+            "cancel_requested": 0,
+            "start": 2,
+            "completion_rate_pct": 100,
+            "cancel_rate_pct": 0,
+            "error_rate_pct": 0
+          }
+        },
        {
          "key": "session_33469de5a1ee",
          "stats": {
@@ -322,20 +336,6 @@
            "error_rate_pct": 0
          }
        },
-        {
-          "key": "session_f6304f25e43b",
-          "stats": {
-            "total_outcomes": 1,
-            "complete": 1,
-            "cancelled": 0,
-            "error": 0,
-            "cancel_requested": 0,
-            "start": 1,
-            "completion_rate_pct": 100,
-            "cancel_rate_pct": 0,
-            "error_rate_pct": 0
-          }
-        },
        {
          "key": "session_fd6536fa5ff4",
          "stats": {
@@ -355,14 +355,14 @@
    "cancel_latency_ms": null,
    "reactions": {
      "matched": 0,
-      "skipped": 39,
-      "total": 39,
+      "skipped": 41,
+      "total": 41,
      "match_rate_pct": 0,
      "skip_rate_pct": 100,
      "skip_reasons": [
        {
          "reason": "no_rules",
-          "count": 39,
+          "count": 41,
          "pct": 100
        }
      ]
@@ -102,3 +102,8 @@
 {"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772208000012}
 {"level":"info","event_type":"run.state","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","state":"start","request_id":"request_a3bafbb93755"},"timestamp":1772208000013}
 {"level":"info","event_type":"run.state","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","state":"complete","request_id":"request_a3bafbb93755","duration_ms":35239},"timestamp":1772208035252}
+{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772211257454}
+{"level":"info","event_type":"run.state","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","state":"start","request_id":"request_607c64c2760f"},"timestamp":1772211257454}
+{"level":"info","event_type":"run.state","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","state":"complete","request_id":"request_607c64c2760f","duration_ms":3870},"timestamp":1772211261324}
+{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_534570702ea5","channel":"cron","sender":"sender_552aeb8f1b32","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772211600036}
+{"level":"info","event_type":"run.state","event":{"session_id":"session_534570702ea5","channel":"cron","sender":"sender_552aeb8f1b32","source":"channel","state":"start","request_id":"request_c0a9fc76c188"},"timestamp":1772211600036}
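The per-channel counts in the summary artifacts can in principle be re-derived from the sampled JSONL above. The following is a minimal sketch under stated assumptions, not the repository's implementation: `tallyRunOutcomes`, `OutcomeStats`, and `AuditLine` are hypothetical names, and it assumes that an outcome is any `run.state` event whose `state` is `complete`, `cancelled`, or `error`, and that rates are `null` when a channel has no outcomes (as the `null` rate fields in the artifacts suggest).

```typescript
// Hypothetical sketch: names and counting rules are assumptions, not the repository's code.
interface AuditLine {
  event_type: string;
  event: { channel: string; state?: string };
}

interface OutcomeStats {
  total_outcomes: number;
  complete: number;
  cancelled: number;
  error: number;
  start: number;
  completion_rate_pct: number | null;
}

function emptyStats(): OutcomeStats {
  return { total_outcomes: 0, complete: 0, cancelled: 0, error: 0, start: 0, completion_rate_pct: null };
}

function tallyRunOutcomes(jsonl: string): Map<string, OutcomeStats> {
  const byChannel = new Map<string, OutcomeStats>();
  for (const raw of jsonl.split("\n")) {
    if (raw.trim() === "") continue; // skip blank lines
    const line = JSON.parse(raw) as AuditLine;
    if (line.event_type !== "run.state") continue; // ignore reaction.* events
    const stats = byChannel.get(line.event.channel) ?? emptyStats();
    const state = line.event.state;
    if (state === "start") {
      stats.start += 1;
    } else if (state === "complete") {
      stats.complete += 1;
      stats.total_outcomes += 1;
    } else if (state === "cancelled") {
      stats.cancelled += 1;
      stats.total_outcomes += 1;
    } else if (state === "error") {
      stats.error += 1;
      stats.total_outcomes += 1;
    }
    byChannel.set(line.event.channel, stats);
  }
  // Rates stay null for channels that have starts but no outcomes yet.
  byChannel.forEach((stats) => {
    stats.completion_rate_pct =
      stats.total_outcomes === 0
        ? null
        : Math.round((stats.complete / stats.total_outcomes) * 10000) / 100;
  });
  return byChannel;
}

// Usage with trimmed-down versions of the lines added in this hunk.
const sample = [
  '{"event_type":"reaction.skip","event":{"channel":"gmail","reason":"no_rules"}}',
  '{"event_type":"run.state","event":{"channel":"gmail","state":"start"}}',
  '{"event_type":"run.state","event":{"channel":"gmail","state":"complete"}}',
  '{"event_type":"run.state","event":{"channel":"cron","state":"start"}}',
].join("\n");
const tallied = tallyRunOutcomes(sample);
console.log(tallied.get("gmail"), tallied.get("cron"));
```

Under these assumptions, a trailing `start` with no matching outcome (like the `cron` line above) raises the start count without changing the completion rate, which is consistent with `Starts` exceeding `Total outcomes` in the summaries.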
@@ -1,27 +1,27 @@
 # Phase 0 Baseline Telemetry Summary

-- Run state events: 65
+- Run state events: 68
 - Run cancel events: 0
 - Reaction matches: 0
-- Reaction skips: 39
+- Reaction skips: 41

 - Sources: channel

 ## Run Outcomes (Overall)

-- Total outcomes: 27
-- Complete: 27 (100.00%)
+- Total outcomes: 28
+- Complete: 28 (100.00%)
 - Cancelled: 0 (0.00%)
 - Errors: 0 (0.00%)
 - Cancel requested: 0
-- Starts: 38
+- Starts: 40

 ## Run Outcomes by Channel

 | Channel | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts |
 | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
-| gmail | 25 | 25 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 25 |
-| cron | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 13 |
+| gmail | 26 | 26 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 26 |
+| cron | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 14 |

 ## Run Outcomes by Session

@@ -30,6 +30,7 @@
 | session_2f2f1e414e81 | 5 | 5 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 5 |
 | session_f4d8ddc04194 | 3 | 3 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 3 |
 | session_eabc3c2a91b9 | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 2 |
+| session_f6304f25e43b | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 2 |
 | session_33469de5a1ee | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
 | session_3ffb2e631ab1 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
 | session_4d9e843358a3 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
@@ -45,7 +46,6 @@
 | session_cb9a69d8a362 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
 | session_e0a2a17b7329 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
 | session_ea839415979e | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
-| session_f6304f25e43b | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
 | session_fd6536fa5ff4 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |

 ## Cancel Latency
@@ -55,11 +55,11 @@
 ## Reaction Decisions

 - Matched: 0 (0.00%)
-- Skipped: 39 (100.00%)
+- Skipped: 41 (100.00%)

 ### Skip Reasons

 | Reason | Count | Percent |
 | --- | ---: | ---: |
-| no_rules | 39 | 100.00% |
+| no_rules | 41 | 100.00% |

@@ -1,5 +1,5 @@
 {
-  "generated_at": "2026-02-27T17:04:49.009Z",
+  "generated_at": "2026-02-27T17:36:02.803Z",
  "artifacts_dir": "/home/will/lab/flynn/docs/plans/artifacts",
  "backends": [
    "pi_embedded",
@@ -29,15 +29,15 @@
      "candidate": {
        "tag": "2026-02-27",
        "path": "/home/will/lab/flynn/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json",
-        "generated_at": "2026-02-27T16:45:18.488Z"
+        "generated_at": "2026-02-27T17:36:02.214Z"
      },
      "baseline": null,
      "comparison": {
        "baseline": null,
        "candidate": {
-          "source_event_count": 110,
-          "sampled_event_count": 56,
-          "run_total_outcomes": 25,
+          "source_event_count": 115,
+          "sampled_event_count": 59,
+          "run_total_outcomes": 26,
          "completion_rate_pct": 100,
          "cancel_rate_pct": 0,
          "error_rate_pct": 0,
@@ -59,7 +59,7 @@
      "freshness": {
        "enabled": true,
        "pass": true,
-        "actual_age_hours": 0.33,
+        "actual_age_hours": 0,
        "threshold_hours": 36
      },
      "drift_gate": {
@@ -68,7 +68,7 @@
          {
            "criterion": "candidate_sampled_events",
            "pass": true,
-            "actual": "56",
+            "actual": "59",
            "threshold": ">= 10"
          },
          {
@@ -116,21 +116,21 @@
      "candidate": {
        "tag": "2026-02-27",
        "path": "/home/will/lab/flynn/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json",
-        "generated_at": "2026-02-27T16:45:18.490Z"
+        "generated_at": "2026-02-27T17:36:02.514Z"
      },
      "baseline": null,
      "comparison": {
        "baseline": null,
        "candidate": {
-          "source_event_count": 110,
-          "sampled_event_count": 13,
+          "source_event_count": 115,
+          "sampled_event_count": 15,
          "run_total_outcomes": 2,
          "completion_rate_pct": 100,
          "cancel_rate_pct": 0,
          "error_rate_pct": 0,
          "cancel_latency_p95_ms": null,
-          "reaction_match_rate_pct": null,
-          "reaction_skip_rate_pct": null
+          "reaction_match_rate_pct": 0,
+          "reaction_skip_rate_pct": 100
        },
        "deltas": {
          "sampled_event_count_pct": null,
@@ -146,7 +146,7 @@
      "freshness": {
        "enabled": true,
        "pass": true,
-        "actual_age_hours": 0.33,
+        "actual_age_hours": 0,
        "threshold_hours": 36
      },
      "drift_gate": {
@@ -155,7 +155,7 @@
          {
            "criterion": "candidate_sampled_events",
            "pass": true,
-            "actual": "13",
+            "actual": "15",
            "threshold": ">= 10"
          },
          {
@@ -1,6 +1,6 @@
 # Phase-0 Backend Drift Check

-Generated at: 2026-02-27T17:04:49.009Z
+Generated at: 2026-02-27T17:36:02.803Z
 Artifacts: /home/will/lab/flynn/docs/plans/artifacts
 Backends: pi_embedded, native
 Freshness max age (hours): 36
@@ -19,9 +19,9 @@ Overall gate: PASS
 ## pi_embedded
 - status: PASS
 - candidate: tag=2026-02-27 file=/home/will/lab/flynn/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json
-- candidate generated_at: 2026-02-27T16:45:18.488Z
+- candidate generated_at: 2026-02-27T17:36:02.214Z
 - baseline: none
-- candidate snapshot: sampled=56 outcomes=25 completion=100% cancel=0% error=0% cancel_p95_ms=n/a
+- candidate snapshot: sampled=59 outcomes=26 completion=100% cancel=0% error=0% cancel_p95_ms=n/a
 - deltas:
  sampled_event_count_pct=n/a
  run_total_outcomes_pct=n/a
@@ -31,9 +31,9 @@ Overall gate: PASS
  cancel_latency_p95_ms=n/a
  reaction_match_rate_pp=n/a
  reaction_skip_rate_pp=n/a
-- freshness gate: PASS (age_hours=0.33 threshold=36)
+- freshness gate: PASS (age_hours=0 threshold=36)
 - drift gate: PASS
-  PASS candidate_sampled_events actual=56 threshold=>= 10
+  PASS candidate_sampled_events actual=59 threshold=>= 10
  PASS sampled_events_drop_pct actual=n/a threshold=<= 80
  PASS run_outcomes_drop_pct actual=n/a threshold=<= 80
  PASS completion_rate_drop_pp actual=n/a threshold=<= 35
@@ -44,9 +44,9 @@ Overall gate: PASS
 ## native
 - status: PASS
 - candidate: tag=2026-02-27 file=/home/will/lab/flynn/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json
-- candidate generated_at: 2026-02-27T16:45:18.490Z
+- candidate generated_at: 2026-02-27T17:36:02.514Z
 - baseline: none
-- candidate snapshot: sampled=13 outcomes=2 completion=100% cancel=0% error=0% cancel_p95_ms=n/a
+- candidate snapshot: sampled=15 outcomes=2 completion=100% cancel=0% error=0% cancel_p95_ms=n/a
 - deltas:
  sampled_event_count_pct=n/a
  run_total_outcomes_pct=n/a
@@ -56,9 +56,9 @@ Overall gate: PASS
  cancel_latency_p95_ms=n/a
  reaction_match_rate_pp=n/a
  reaction_skip_rate_pp=n/a
-- freshness gate: PASS (age_hours=0.33 threshold=36)
+- freshness gate: PASS (age_hours=0 threshold=36)
 - drift gate: PASS
-  PASS candidate_sampled_events actual=13 threshold=>= 10
+  PASS candidate_sampled_events actual=15 threshold=>= 10
  PASS sampled_events_drop_pct actual=n/a threshold=<= 80
  PASS run_outcomes_drop_pct actual=n/a threshold=<= 80
  PASS completion_rate_drop_pp actual=n/a threshold=<= 35
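The freshness and minimum-sample gates reported above can be illustrated with a short sketch. This is an approximation under stated assumptions rather than the drift script's actual code: `ageHours`, `freshnessGate`, and `sampledEventsGate` are hypothetical names, the age is assumed to be measured from the candidate's `generated_at` to the report's `Generated at` time and rounded to two decimals, and the comparison operators are inferred from the `threshold` strings in the report.

```typescript
// Hypothetical sketch: function names and rounding rule are assumptions.
const MS_PER_HOUR = 3600000;

// Age of a candidate artifact in hours, rounded to two decimal places.
function ageHours(candidateGeneratedAt: string, checkedAt: string): number {
  const elapsedMs = Date.parse(checkedAt) - Date.parse(candidateGeneratedAt);
  return Math.round((elapsedMs / MS_PER_HOUR) * 100) / 100;
}

interface GateResult {
  pass: boolean;
  actual: number;
  threshold: number;
}

// Passes when the artifact is no older than the maximum age (36 hours in this report).
function freshnessGate(candidateGeneratedAt: string, checkedAt: string, thresholdHours: number): GateResult {
  const actual = ageHours(candidateGeneratedAt, checkedAt);
  return { pass: actual <= thresholdHours, actual, threshold: thresholdHours };
}

// Passes when the candidate window has at least the minimum number of sampled events (">= 10").
function sampledEventsGate(sampledEventCount: number, minimum: number): GateResult {
  return { pass: sampledEventCount >= minimum, actual: sampledEventCount, threshold: minimum };
}

// Timestamps taken from the old and new versions of this report.
console.log(freshnessGate("2026-02-27T16:45:18.488Z", "2026-02-27T17:04:49.009Z", 36));
console.log(freshnessGate("2026-02-27T17:36:02.214Z", "2026-02-27T17:36:02.803Z", 36));
console.log(sampledEventsGate(15, 10));
```

With these inputs the sketch gives ages of 0.33 and 0 hours, matching the `age_hours` values on the removed and added lines; the drop from 0.33 to 0 reflects the refresh now regenerating the backend artifacts immediately before the drift check.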
@@ -1,8 +1,8 @@
 {
-  "generated_at": "2026-02-27T16:45:18.490Z",
+  "generated_at": "2026-02-27T17:36:02.514Z",
  "source_audit_path": "~/.local/share/flynn/audit.log",
-  "source_event_count": 110,
-  "sampled_event_count": 13,
+  "source_event_count": 115,
+  "sampled_event_count": 15,
  "filters": {
    "sources": [
      "channel"
@@ -14,7 +14,7 @@
      "probe"
    ],
    "anonymized_identifiers": true,
-    "backend_route_event_count": 127
+    "backend_route_event_count": 129
  },
  "options": {
    "sources": [
@@ -26,10 +26,10 @@
  },
  "summary": {
    "event_counts": {
-      "run_state": 13,
+      "run_state": 14,
      "run_cancel": 0,
      "reaction_match": 0,
-      "reaction_skip": 0
+      "reaction_skip": 1
    },
    "run_outcomes": {
      "overall": {
@@ -38,7 +38,7 @@
        "cancelled": 0,
        "error": 0,
        "cancel_requested": 0,
-        "start": 11,
+        "start": 12,
        "completion_rate_pct": 100,
        "cancel_rate_pct": 0,
        "error_rate_pct": 0
@@ -52,7 +52,7 @@
            "cancelled": 0,
            "error": 0,
            "cancel_requested": 0,
-            "start": 11,
+            "start": 12,
            "completion_rate_pct": 100,
            "cancel_rate_pct": 0,
            "error_rate_pct": 0
@@ -172,6 +172,20 @@
            "error_rate_pct": null
          }
        },
+        {
+          "key": "session_534570702ea5",
+          "stats": {
+            "total_outcomes": 0,
+            "complete": 0,
+            "cancelled": 0,
+            "error": 0,
+            "cancel_requested": 0,
+            "start": 1,
+            "completion_rate_pct": null,
+            "cancel_rate_pct": null,
+            "error_rate_pct": null
+          }
+        },
        {
          "key": "session_683372f346c3",
          "stats": {
@@ -219,11 +233,17 @@
    "cancel_latency_ms": null,
    "reactions": {
      "matched": 0,
-      "skipped": 0,
-      "total": 0,
-      "match_rate_pct": null,
-      "skip_rate_pct": null,
-      "skip_reasons": []
+      "skipped": 1,
+      "total": 1,
+      "match_rate_pct": 0,
+      "skip_rate_pct": 100,
+      "skip_reasons": [
+        {
+          "reason": "no_rules",
+          "count": 1,
+          "pct": 100
+        }
+      ]
    }
  }
 }
@@ -11,3 +11,5 @@
 {"level":"info","event_type":"run.state","event":{"session_id":"session_a3f64a8e3c1e","channel":"cron","sender":"sender_a31bd6d4a95a","source":"channel","state":"start","request_id":"request_fc572d83d4c6"},"timestamp":1772182800034}
 {"level":"info","event_type":"run.state","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","state":"start","request_id":"request_a3bafbb93755"},"timestamp":1772208000013}
 {"level":"info","event_type":"run.state","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","state":"complete","request_id":"request_a3bafbb93755","duration_ms":35239},"timestamp":1772208035252}
+{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_534570702ea5","channel":"cron","sender":"sender_552aeb8f1b32","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772211600036}
+{"level":"info","event_type":"run.state","event":{"session_id":"session_534570702ea5","channel":"cron","sender":"sender_552aeb8f1b32","source":"channel","state":"start","request_id":"request_c0a9fc76c188"},"timestamp":1772211600036}
@@ -1,9 +1,9 @@
 # Phase 0 Baseline Telemetry Summary

-- Run state events: 13
+- Run state events: 14
 - Run cancel events: 0
 - Reaction matches: 0
-- Reaction skips: 0
+- Reaction skips: 1

 - Sources: channel

@@ -14,13 +14,13 @@
 - Cancelled: 0 (0.00%)
 - Errors: 0 (0.00%)
 - Cancel requested: 0
-- Starts: 11
+- Starts: 12

 ## Run Outcomes by Channel

 | Channel | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts |
 | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
-| cron | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 11 |
+| cron | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 12 |

 ## Run Outcomes by Session

@@ -34,6 +34,7 @@
 | session_494cb3b392af | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
 | session_49b700741e03 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
 | session_4cd8ba5e6df5 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
+| session_534570702ea5 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
 | session_683372f346c3 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
 | session_a3f64a8e3c1e | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
 | session_ffcee254d546 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
@@ -44,12 +45,12 @@

 ## Reaction Decisions

-- Matched: 0 (n/a)
-- Skipped: 0 (n/a)
+- Matched: 0 (0.00%)
+- Skipped: 1 (100.00%)

 ### Skip Reasons

 | Reason | Count | Percent |
 | --- | ---: | ---: |
-| _none_ | 0 | 0.00% |
+| no_rules | 1 | 100.00% |
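The switch from `n/a` to `0.00%` / `100.00%` in this summary follows from the reaction total moving from 0 to 1. A minimal sketch of that behavior, assuming rates are `null` when there are no reaction decisions and are rendered as `n/a` in the markdown summary; `reactionRates` and `formatPct` are hypothetical names, not functions from the repository.

```typescript
// Hypothetical sketch: names are assumptions; behavior inferred from the artifacts in this diff.
interface ReactionRates {
  matched: number;
  skipped: number;
  total: number;
  match_rate_pct: number | null;
  skip_rate_pct: number | null;
}

// Rates are undefined (null) for an empty window to avoid dividing by zero.
function reactionRates(matched: number, skipped: number): ReactionRates {
  const total = matched + skipped;
  const pct = (count: number): number | null =>
    total === 0 ? null : Math.round((count / total) * 10000) / 100;
  return { matched, skipped, total, match_rate_pct: pct(matched), skip_rate_pct: pct(skipped) };
}

// Markdown rendering: null becomes "n/a", numbers get two decimals.
function formatPct(value: number | null): string {
  return value === null ? "n/a" : `${value.toFixed(2)}%`;
}

const before = reactionRates(0, 0); // native window before the refresh
const after = reactionRates(0, 1); // native window after one reaction.skip was sampled
console.log(formatPct(before.skip_rate_pct), formatPct(after.skip_rate_pct));
```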

@@ -1,8 +1,8 @@
 {
-  "generated_at": "2026-02-27T16:45:18.488Z",
+  "generated_at": "2026-02-27T17:36:02.214Z",
  "source_audit_path": "~/.local/share/flynn/audit.log",
-  "source_event_count": 110,
-  "sampled_event_count": 56,
+  "source_event_count": 115,
+  "sampled_event_count": 59,
  "filters": {
    "sources": [
      "channel"
@@ -14,7 +14,7 @@
      "probe"
    ],
    "anonymized_identifiers": true,
-    "backend_route_event_count": 127
+    "backend_route_event_count": 129
  },
  "options": {
    "sources": [
@@ -26,19 +26,19 @@
  },
  "summary": {
    "event_counts": {
-      "run_state": 42,
+      "run_state": 44,
      "run_cancel": 0,
      "reaction_match": 0,
-      "reaction_skip": 14
+      "reaction_skip": 15
    },
    "run_outcomes": {
      "overall": {
-        "total_outcomes": 25,
-        "complete": 25,
+        "total_outcomes": 26,
+        "complete": 26,
        "cancelled": 0,
        "error": 0,
        "cancel_requested": 0,
-        "start": 17,
+        "start": 18,
        "completion_rate_pct": 100,
        "cancel_rate_pct": 0,
        "error_rate_pct": 0
@@ -47,12 +47,12 @@
        {
          "key": "gmail",
          "stats": {
-            "total_outcomes": 25,
-            "complete": 25,
+            "total_outcomes": 26,
+            "complete": 26,
            "cancelled": 0,
            "error": 0,
            "cancel_requested": 0,
-            "start": 17,
+            "start": 18,
            "completion_rate_pct": 100,
            "cancel_rate_pct": 0,
            "error_rate_pct": 0
@@ -102,6 +102,20 @@
            "error_rate_pct": 0
          }
        },
+        {
+          "key": "session_f6304f25e43b",
+          "stats": {
+            "total_outcomes": 2,
+            "complete": 2,
+            "cancelled": 0,
+            "error": 0,
+            "cancel_requested": 0,
+            "start": 2,
+            "completion_rate_pct": 100,
+            "cancel_rate_pct": 0,
+            "error_rate_pct": 0
+          }
+        },
        {
          "key": "session_33469de5a1ee",
          "stats": {
@@ -284,20 +298,6 @@
            "error_rate_pct": 0
          }
        },
-        {
-          "key": "session_f6304f25e43b",
-          "stats": {
-            "total_outcomes": 1,
-            "complete": 1,
-            "cancelled": 0,
-            "error": 0,
-            "cancel_requested": 0,
-            "start": 1,
-            "completion_rate_pct": 100,
-            "cancel_rate_pct": 0,
-            "error_rate_pct": 0
-          }
-        },
        {
          "key": "session_fd6536fa5ff4",
          "stats": {
@@ -317,14 +317,14 @@
    "cancel_latency_ms": null,
    "reactions": {
      "matched": 0,
-      "skipped": 14,
-      "total": 14,
+      "skipped": 15,
+      "total": 15,
      "match_rate_pct": 0,
      "skip_rate_pct": 100,
      "skip_reasons": [
        {
          "reason": "no_rules",
-          "count": 14,
+          "count": 15,
          "pct": 100
        }
      ]
@@ -54,3 +54,6 @@
 {"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_2f2f1e414e81","channel":"gmail","sender":"sender_323cedc3233a","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772206157229}
 {"level":"info","event_type":"run.state","event":{"session_id":"session_2f2f1e414e81","channel":"gmail","sender":"sender_323cedc3233a","source":"channel","state":"start","request_id":"request_ab73d670c119"},"timestamp":1772206157229}
 {"level":"info","event_type":"run.state","event":{"session_id":"session_2f2f1e414e81","channel":"gmail","sender":"sender_323cedc3233a","source":"channel","state":"complete","request_id":"request_ab73d670c119","duration_ms":3850},"timestamp":1772206161079}
+{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772211257454}
+{"level":"info","event_type":"run.state","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","state":"start","request_id":"request_607c64c2760f"},"timestamp":1772211257454}
+{"level":"info","event_type":"run.state","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","state":"complete","request_id":"request_607c64c2760f","duration_ms":3870},"timestamp":1772211261324}
@@ -1,26 +1,26 @@
 # Phase 0 Baseline Telemetry Summary

-- Run state events: 42
+- Run state events: 44
 - Run cancel events: 0
 - Reaction matches: 0
-- Reaction skips: 14
+- Reaction skips: 15

 - Sources: channel

 ## Run Outcomes (Overall)

-- Total outcomes: 25
-- Complete: 25 (100.00%)
+- Total outcomes: 26
+- Complete: 26 (100.00%)
 - Cancelled: 0 (0.00%)
 - Errors: 0 (0.00%)
 - Cancel requested: 0
-- Starts: 17
+- Starts: 18

 ## Run Outcomes by Channel

 | Channel | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts |
 | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
-| gmail | 25 | 25 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 17 |
+| gmail | 26 | 26 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 18 |

 ## Run Outcomes by Session

@@ -29,6 +29,7 @@
 | session_2f2f1e414e81 | 5 | 5 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 5 |
 | session_f4d8ddc04194 | 3 | 3 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 3 |
 | session_eabc3c2a91b9 | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
+| session_f6304f25e43b | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 2 |
 | session_33469de5a1ee | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
 | session_3ffb2e631ab1 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 0 |
 | session_4d9e843358a3 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 0 |
@@ -42,7 +43,6 @@
 | session_cb9a69d8a362 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
 | session_e0a2a17b7329 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
 | session_ea839415979e | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
-| session_f6304f25e43b | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
 | session_fd6536fa5ff4 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |

 ## Cancel Latency
@@ -52,11 +52,11 @@
 ## Reaction Decisions

 - Matched: 0 (0.00%)
-- Skipped: 14 (100.00%)
+- Skipped: 15 (100.00%)

 ### Skip Reasons

 | Reason | Count | Percent |
 | --- | ---: | ---: |
-| no_rules | 14 | 100.00% |
+| no_rules | 15 | 100.00% |

@@ -1,5 +1,5 @@
 {
-  "generated_at": "2026-02-27T16:46:42.880Z",
+  "generated_at": "2026-02-27T17:36:01.922Z",
  "source_audit_path": "~/.local/share/flynn/audit.log",
  "source_event_count": 6,
  "sampled_event_count": 6,
@@ -234,6 +234,36 @@
      ],
      "test_status": "pnpm audit:phase0-baseline:live:drift + pnpm test:run src/audit/phase0BaselineDrift.test.ts + pnpm typecheck passing"
    },
+    "phase0-live-baseline-refresh-full-window": {
+      "status": "completed",
+      "date": "2026-02-27",
+      "updated": "2026-02-27",
+      "summary": "Expanded `pnpm audit:phase0-baseline:live:refresh` to regenerate all live windows in one command (channel, gateway, backend-scoped `pi_embedded`, backend-scoped `native`) so scheduled `refresh:drift` runs keep backend artifacts fresh for baseline-vs-prior comparisons.",
+      "files_modified": [
+        "package.json",
+        "README.md",
+        "docs/api/PROTOCOL.md",
+        "docs/architecture/AGENT_DIAGRAM.md",
+        "docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md",
+        "docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md",
+        "docs/plans/artifacts/phase0_baseline_live_2026-02-27.jsonl",
+        "docs/plans/artifacts/phase0_baseline_live_2026-02-27.md",
+        "docs/plans/artifacts/phase0_baseline_live_2026-02-27.json",
+        "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.jsonl",
+        "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.md",
+        "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json",
+        "docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.jsonl",
+        "docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.md",
+        "docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json",
+        "docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.jsonl",
+        "docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.md",
+        "docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json",
+        "docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.md",
+        "docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.json",
+        "docs/plans/state.json"
+      ],
+      "test_status": "pnpm audit:phase0-baseline:live:refresh:drift + pnpm test:run src/audit/phase0BaselineDrift.test.ts + pnpm typecheck passing"
+    },
    "phase0-instrumentation-ticket-checklist": {
      "status": "completed",
      "date": "2026-02-25",
@@ -7420,7 +7450,7 @@
    "deeper_surfaces_phase0_ticket_03": "completed — gateway metrics now track run-state outcomes, cancel latency samples, and reaction decision counters with routing/gateway emitters",
    "deeper_surfaces_phase0_ticket_04": "completed — added phase-0 baseline summary tooling for run outcomes, cancel latency, and reaction decisions with markdown/json CLI output",
    "deeper_surfaces_phase0_ticket_05": "completed — documented phase-0 telemetry fields/workflow, refreshed architecture/protocol docs, generated anonymized live baseline artifacts for channel/gateway/backend-scoped (pi/native) windows, and added backend artifact freshness/drift gates with persisted drift reports (`phase0_baseline_live_backend_drift_<UTC-date>.{md,json}`)",
-    "next_up": "Run scheduled `pnpm audit:phase0-baseline:live:refresh:drift` in each active environment and collect at least one additional UTC-date drift artifact so baseline-vs-prior comparisons become active before tightening thresholds or changing additional run-control/reaction semantics.",
+    "next_up": "Run scheduled `pnpm audit:phase0-baseline:live:refresh:drift` in each active environment (now refreshing channel + gateway + backend-scoped windows together) and collect at least one additional UTC-date drift artifact so baseline-vs-prior comparisons become active before tightening thresholds or changing additional run-control/reaction semantics.",
    "pi_embedded_canary_spike": "completed — added optional pi_embedded backend adapter, canary-safe no-tools routing guard, backend success/fallback latency audit events, and docs/diagram updates while native remains default",
    "pi_embedded_evaluation_phase": "completed — final decision rollback (applied in runtime config): Window A failed latency/fallback gates (p50 +259ms, p95 +5695ms, fallback 25%, categories: pi_module_interface/empty_assistant_text); Window B remained sample-insufficient; controlled probes verified guard coverage (pi_no_tools_mode/capability_query/attachments_present each hit once)",
    "pi_embedded_manual_mode": "completed — added persisted runtime backend controls for manual Pi activation/deactivation (`/runtime` preferred, `/backend` alias; `status`, `activate pi`, `deactivate pi`, `use config`) while keeping config-driven default routing",
@@ -26,7 +26,7 @@
    "audit:phase0-baseline:live:pi": "node --import tsx/esm scripts/capture-phase0-live-baseline.ts --audit ~/.local/share/flynn/audit.log --source channel --backend pi_embedded --exclude-session-substring probe",
    "audit:phase0-baseline:live:native": "node --import tsx/esm scripts/capture-phase0-live-baseline.ts --audit ~/.local/share/flynn/audit.log --source channel --backend native --exclude-session-substring probe",
    "audit:phase0-baseline:live:gateway": "node --import tsx/esm scripts/capture-phase0-live-baseline.ts --audit ~/.local/share/flynn/audit.log --source gateway --auto-gateway-cancel-window",
-    "audit:phase0-baseline:live:refresh": "pnpm audit:phase0-baseline:live && pnpm audit:phase0-baseline:live:gateway",
+    "audit:phase0-baseline:live:refresh": "pnpm audit:phase0-baseline:live && pnpm audit:phase0-baseline:live:gateway && pnpm audit:phase0-baseline:live:pi && pnpm audit:phase0-baseline:live:native",
    "audit:phase0-baseline:live:drift": "node --import tsx/esm scripts/check-phase0-baseline-backend-drift.ts --artifacts-dir docs/plans/artifacts --backend pi_embedded,native --max-age-hours 36 --min-candidate-sampled-events 10 --max-sampled-events-drop-pct 80 --max-run-outcomes-drop-pct 80 --max-completion-rate-drop-pp 35 --max-cancel-rate-increase-pp 25 --max-error-rate-increase-pp 25 --max-cancel-latency-p95-increase-ms 6000 --write-default-artifacts",
    "audit:phase0-baseline:live:refresh:drift": "pnpm audit:phase0-baseline:live:refresh && pnpm audit:phase0-baseline:live:drift",
    "audit:backend-canary:probes": "node --import tsx/esm scripts/run-pi-canary-guard-probes.ts",