feat(audit): refresh all phase0 live windows in cadence run

Author: William Valentin
Date: 2026-02-27 09:36:22 -08:00
parent e905fe1d56
commit 55f1a3dd7b
19 changed files with 189 additions and 128 deletions
+1 -1
@@ -1635,7 +1635,7 @@ pnpm audit:phase0-baseline:live:pi
pnpm audit:phase0-baseline:live:native
```
One-shot refresh for both channel + gateway live windows:
One-shot refresh for all live baseline windows (channel, gateway, backend-scoped `pi_embedded`, backend-scoped `native`):
```bash
pnpm audit:phase0-baseline:live:refresh
```
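The one-shot refresh is a plain `&&` chain in `package.json` (channel, then gateway, then `pi_embedded`, then `native`), so a failing step stops every later capture. The TypeScript sketch below only illustrates that fail-fast ordering; it is not how the commit implements the script. `StepRunner` and `runRefreshChain` are hypothetical names, and the runner is injected so the ordering logic can be exercised without invoking `pnpm`.

```typescript
// Order mirrors the `audit:phase0-baseline:live:refresh` chain in package.json.
const REFRESH_STEPS: readonly string[] = [
  "audit:phase0-baseline:live", // channel-origin window
  "audit:phase0-baseline:live:gateway", // gateway-origin (auto-detected cancel) window
  "audit:phase0-baseline:live:pi", // backend-scoped pi_embedded window
  "audit:phase0-baseline:live:native", // backend-scoped native window
];

// Hypothetical runner type: executes one pnpm script and returns its exit code.
type StepRunner = (script: string) => number;

interface ChainResult {
  completed: string[];
  failed?: string;
}

// Same semantics as `a && b && c && d`: stop at the first non-zero exit code,
// so later windows are not captured when an earlier step fails.
function runRefreshChain(steps: readonly string[], run: StepRunner): ChainResult {
  const completed: string[] = [];
  for (const step of steps) {
    if (run(step) !== 0) {
      return { completed, failed: step };
    }
    completed.push(step);
  }
  return { completed };
}
```

A real runner could wrap `spawnSync("pnpm", [script], { stdio: "inherit" }).status` from `node:child_process`, treating a `null` status as a failure.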
+1 -1
@@ -23,7 +23,7 @@ The gateway provides:
- **HTTP Server**: Serves static dashboard and handles webhook endpoints
- **Node Capability Negotiation**: Optional companion-node role/capability registration
Operational note: onboarding (`flynn setup` / `flynn onboard`) now runs post-save live readiness checks (model/channel/memory/automation) and prints a guided first-success task flow. Companion CLI now also supports bootstrap-manifest export (`flynn companion --export-bootstrap <path|->`), release-bundle export (`--export-release-bundle <dir>` with optional `--signing-key`/`--signing-key-id` signature output), release-bundle verification (`--verify-release-bundle <dir>` with optional `--verify-signing-key`/`--verify-signing-key-id`/`--require-signature`), platform shell-template export (`--export-shell-template <dir>`), plus richer shell bootstrap flags for status/location/push (`--app-version`, `--latitude/--longitude`, `--push-token`, etc.) for desktop/mobile app packaging without changing JSON-RPC method/event shapes. Audit observability now includes live phase-0 baseline capture flows: `pnpm audit:phase0-baseline:live` for channel-origin windows, backend-scoped variants (`pnpm audit:phase0-baseline:live:pi` / `pnpm audit:phase0-baseline:live:native`) via `--backend`, `pnpm audit:phase0-baseline:live:gateway` (auto-detected cancel window) for gateway-origin windows, `pnpm audit:phase0-baseline:live:refresh` for one-shot refresh of both windows, and `pnpm audit:phase0-baseline:live:drift` for backend artifact freshness/drift gates (writing `phase0_baseline_live_backend_drift_<UTC-date>.md/.json` reports). These scripts default to current UTC-date tags unless `--tag` is explicitly provided.
Operational note: onboarding (`flynn setup` / `flynn onboard`) now runs post-save live readiness checks (model/channel/memory/automation) and prints a guided first-success task flow. Companion CLI now also supports bootstrap-manifest export (`flynn companion --export-bootstrap <path|->`), release-bundle export (`--export-release-bundle <dir>` with optional `--signing-key`/`--signing-key-id` signature output), release-bundle verification (`--verify-release-bundle <dir>` with optional `--verify-signing-key`/`--verify-signing-key-id`/`--require-signature`), platform shell-template export (`--export-shell-template <dir>`), plus richer shell bootstrap flags for status/location/push (`--app-version`, `--latitude/--longitude`, `--push-token`, etc.) for desktop/mobile app packaging without changing JSON-RPC method/event shapes. Audit observability now includes live phase-0 baseline capture flows: `pnpm audit:phase0-baseline:live` for channel-origin windows, backend-scoped variants (`pnpm audit:phase0-baseline:live:pi` / `pnpm audit:phase0-baseline:live:native`) via `--backend`, `pnpm audit:phase0-baseline:live:gateway` (auto-detected cancel window) for gateway-origin windows, `pnpm audit:phase0-baseline:live:refresh` for one-shot refresh of all live windows (channel + gateway + backend-scoped), and `pnpm audit:phase0-baseline:live:drift` for backend artifact freshness/drift gates (writing `phase0_baseline_live_backend_drift_<UTC-date>.md/.json` reports). These scripts default to current UTC-date tags unless `--tag` is explicitly provided.
### Execution Model (Sessions + Per-Session Queue)
+1 -1
@@ -169,7 +169,7 @@ Gateway streaming UX signals:
- `pnpm audit:phase0-baseline:live` captures anonymized channel-origin live run/reaction baseline artifacts from real audit logs.
- `pnpm audit:phase0-baseline:live:pi` and `pnpm audit:phase0-baseline:live:native` capture backend-scoped channel windows using `backend.route` timelines.
- `pnpm audit:phase0-baseline:live:gateway` captures gateway-origin baseline windows by auto-selecting the latest cancel/cancelled session window (or use `scripts/capture-phase0-live-baseline.ts --source gateway --since ... --until ...` for explicit windows).
- `pnpm audit:phase0-baseline:live:refresh` runs both channel + gateway capture commands in one step for cadence refreshes.
- `pnpm audit:phase0-baseline:live:refresh` runs channel + gateway + backend-scoped (`pi_embedded` and `native`) capture commands in one cadence step.
- `pnpm audit:phase0-baseline:live:drift` evaluates backend-scoped artifact freshness/drift gates and writes `docs/plans/artifacts/phase0_baseline_live_backend_drift_<UTC-date>.md/.json`; `pnpm audit:phase0-baseline:live:refresh:drift` runs capture + drift checks in one cadence step.
- `audit:phase0-baseline:live*` scripts are cadence-safe by default (UTC-date tags auto-generated unless explicitly overridden).
- Canvas artifacts are persisted by the gateway so session UI surfaces can recover after daemon restarts.
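According to the bullets above, the `audit:phase0-baseline:live*` scripts fall back to a UTC-date tag when `--tag` is omitted. The sketch below shows one way to derive that tag and the artifact base names that appear in this commit. `utcDateTag` and `artifactBaseName` are hypothetical helpers, not the capture script's code. Because the tag follows the UTC calendar, a run late on a Pacific-time evening already writes the next day's artifacts.

```typescript
// Hypothetical helper: default tag (YYYY-MM-DD) used when --tag is omitted.
function utcDateTag(now: Date = new Date()): string {
  // toISOString() always renders in UTC, so the first 10 chars are the UTC date.
  return now.toISOString().slice(0, 10);
}

type ArtifactKind =
  | "channel"
  | "gateway"
  | "backend_pi_embedded"
  | "backend_native"
  | "backend_drift";

// Hypothetical helper: base names follow the files under docs/plans/artifacts,
// e.g. phase0_baseline_live_2026-02-27 and phase0_baseline_live_gateway_2026-02-27.
function artifactBaseName(kind: ArtifactKind, tag: string): string {
  const infix = kind === "channel" ? "" : `${kind}_`;
  return `phase0_baseline_live_${infix}${tag}`;
}
```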
@@ -34,7 +34,7 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`.
- Audit phase-0 live telemetry snapshots can be regenerated with `pnpm audit:phase0-baseline:live` (channel-origin anonymized sample JSONL + summary JSON/markdown artifacts).
- Backend-scoped channel snapshots can be regenerated with `pnpm audit:phase0-baseline:live:pi` / `pnpm audit:phase0-baseline:live:native` (`--backend` filtering via `backend.route` timelines).
- Gateway-origin phase-0 windows (including cancel-path samples) can be captured with `pnpm audit:phase0-baseline:live:gateway` (auto-detect latest cancel window) or `scripts/capture-phase0-live-baseline.ts --source gateway --since ... --until ...` for explicit bounds.
- `pnpm audit:phase0-baseline:live:refresh` runs both capture paths to refresh channel + gateway artifacts in one command.
- `pnpm audit:phase0-baseline:live:refresh` runs channel + gateway + backend-scoped (`pi_embedded` and `native`) capture paths in one command.
- `pnpm audit:phase0-baseline:live:drift` checks backend-scoped artifact freshness/drift gates and writes `phase0_baseline_live_backend_drift_<UTC-date>.md/.json`; `pnpm audit:phase0-baseline:live:refresh:drift` chains refresh + drift checks for scheduled cadence runs.
- `audit:phase0-baseline:live*` package scripts now omit fixed tags so scheduled runs automatically roll to current UTC-date artifact tags.
- Companion CLI supports one-shot shell bootstrap metadata for live sessions (`--app-version`/`--status-text`, `--latitude`/`--longitude`, `--push-token`) so desktop/mobile wrappers can initialize node status/location/push in a single launch flow.
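The live artifacts replace raw identifiers with stable pseudonyms such as `session_f6304f25e43b`. The same pseudonym appears in both the channel and the `pi_embedded` JSONL windows, so the mapping looks deterministic across captures. This commit does not show the capture script's actual scheme. The sketch below assumes a deterministic hash and uses FNV-1a only for illustration. A salted cryptographic hash such as HMAC-SHA-256 would be the safer choice for real anonymization. `fnv1a32` and `anonymize` are hypothetical names.

```typescript
// FNV-1a 32-bit over UTF-16 code units (adequate for ASCII identifiers).
// Illustrative only: NOT collision- or preimage-resistant.
function fnv1a32(input: string, seed: number): number {
  let hash = (0x811c9dc5 ^ seed) >>> 0; // offset basis, perturbed by seed
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash;
}

// Hypothetical pseudonymizer: `<prefix>_<12 hex chars>`, stable for the same input + salt.
function anonymize(prefix: string, rawId: string, salt = ""): string {
  const material = salt + rawId;
  const hex =
    fnv1a32(material, 0).toString(16).padStart(8, "0") +
    fnv1a32(material, 1).toString(16).padStart(8, "0");
  return `${prefix}_${hex.slice(0, 12)}`;
}
```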
@@ -203,7 +203,7 @@ Phase 0 is complete when:
2. A baseline summary artifact is generated and committed under `docs/plans/artifacts/`.
3. No user-visible response behavior changed compared to pre-phase baseline.
Follow-up status (2026-02-27): live channel-session artifacts exist under `docs/plans/artifacts/phase0_baseline_live_2026-02-27.*` via `pnpm audit:phase0-baseline:live` (anonymized IDs), and a second gateway-origin live window (including `run.cancel` + `cancel_requested`/`cancelled`) exists under `docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.*`. Gateway window refreshes can now run via `pnpm audit:phase0-baseline:live:gateway` (auto-selected cancel window), both windows can be refreshed together with `pnpm audit:phase0-baseline:live:refresh` (scheduling example included in README), backend-scoped channel windows are now available via `pnpm audit:phase0-baseline:live:pi` / `pnpm audit:phase0-baseline:live:native`, and backend artifact freshness/drift checks are now available via `pnpm audit:phase0-baseline:live:drift` (or chained with `pnpm audit:phase0-baseline:live:refresh:drift`) with drift report artifacts written to `docs/plans/artifacts/phase0_baseline_live_backend_drift_<UTC-date>.{md,json}`.
Follow-up status (2026-02-27): live channel-session artifacts exist under `docs/plans/artifacts/phase0_baseline_live_2026-02-27.*` via `pnpm audit:phase0-baseline:live` (anonymized IDs), and a second gateway-origin live window (including `run.cancel` + `cancel_requested`/`cancelled`) exists under `docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.*`. Gateway window refreshes can now run via `pnpm audit:phase0-baseline:live:gateway` (auto-selected cancel window), all live windows can be refreshed together with `pnpm audit:phase0-baseline:live:refresh` (channel + gateway + backend-scoped `pi`/`native`; scheduling example included in README), backend artifact freshness/drift checks are now available via `pnpm audit:phase0-baseline:live:drift` (or chained with `pnpm audit:phase0-baseline:live:refresh:drift`) with drift report artifacts written to `docs/plans/artifacts/phase0_baseline_live_backend_drift_<UTC-date>.{md,json}`.
## Subagent Model Assignment Plan
@@ -1,8 +1,8 @@
{
"generated_at": "2026-02-27T16:46:42.576Z",
"generated_at": "2026-02-27T17:36:01.625Z",
"source_audit_path": "~/.local/share/flynn/audit.log",
"source_event_count": 110,
"sampled_event_count": 104,
"source_event_count": 115,
"sampled_event_count": 109,
"filters": {
"sources": [
"channel"
@@ -22,19 +22,19 @@
},
"summary": {
"event_counts": {
"run_state": 65,
"run_state": 68,
"run_cancel": 0,
"reaction_match": 0,
"reaction_skip": 39
"reaction_skip": 41
},
"run_outcomes": {
"overall": {
"total_outcomes": 27,
"complete": 27,
"total_outcomes": 28,
"complete": 28,
"cancelled": 0,
"error": 0,
"cancel_requested": 0,
"start": 38,
"start": 40,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0
@@ -43,12 +43,12 @@
{
"key": "gmail",
"stats": {
"total_outcomes": 25,
"complete": 25,
"total_outcomes": 26,
"complete": 26,
"cancelled": 0,
"error": 0,
"cancel_requested": 0,
"start": 25,
"start": 26,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0
@@ -62,7 +62,7 @@
"cancelled": 0,
"error": 0,
"cancel_requested": 0,
"start": 13,
"start": 14,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0
@@ -112,6 +112,20 @@
"error_rate_pct": 0
}
},
{
"key": "session_f6304f25e43b",
"stats": {
"total_outcomes": 2,
"complete": 2,
"cancelled": 0,
"error": 0,
"cancel_requested": 0,
"start": 2,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0
}
},
{
"key": "session_33469de5a1ee",
"stats": {
@@ -322,20 +336,6 @@
"error_rate_pct": 0
}
},
{
"key": "session_f6304f25e43b",
"stats": {
"total_outcomes": 1,
"complete": 1,
"cancelled": 0,
"error": 0,
"cancel_requested": 0,
"start": 1,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0
}
},
{
"key": "session_fd6536fa5ff4",
"stats": {
@@ -355,14 +355,14 @@
"cancel_latency_ms": null,
"reactions": {
"matched": 0,
"skipped": 39,
"total": 39,
"skipped": 41,
"total": 41,
"match_rate_pct": 0,
"skip_rate_pct": 100,
"skip_reasons": [
{
"reason": "no_rules",
"count": 39,
"count": 41,
"pct": 100
}
]
@@ -102,3 +102,8 @@
{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772208000012}
{"level":"info","event_type":"run.state","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","state":"start","request_id":"request_a3bafbb93755"},"timestamp":1772208000013}
{"level":"info","event_type":"run.state","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","state":"complete","request_id":"request_a3bafbb93755","duration_ms":35239},"timestamp":1772208035252}
{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772211257454}
{"level":"info","event_type":"run.state","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","state":"start","request_id":"request_607c64c2760f"},"timestamp":1772211257454}
{"level":"info","event_type":"run.state","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","state":"complete","request_id":"request_607c64c2760f","duration_ms":3870},"timestamp":1772211261324}
{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_534570702ea5","channel":"cron","sender":"sender_552aeb8f1b32","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772211600036}
{"level":"info","event_type":"run.state","event":{"session_id":"session_534570702ea5","channel":"cron","sender":"sender_552aeb8f1b32","source":"channel","state":"start","request_id":"request_c0a9fc76c188"},"timestamp":1772211600036}
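Each JSONL line above is one anonymized audit event. The sketch below tallies such lines into the `event_counts` and skip-reason totals that the summary artifacts report. The mapping from `event_type` (for example `run.state`) to keys like `run_state` is inferred from the field names and is an assumption. `tallyEvents` is a hypothetical helper.

```typescript
interface AuditLine {
  level: string;
  event_type: string;
  event: Record<string, unknown>;
  timestamp: number;
}

interface Tally {
  eventCounts: Record<string, number>;
  skipReasons: Record<string, number>;
}

// Hypothetical helper: count events per type and reaction.skip reasons.
function tallyEvents(jsonl: string): Tally {
  const eventCounts: Record<string, number> = {
    run_state: 0,
    run_cancel: 0,
    reaction_match: 0,
    reaction_skip: 0,
  };
  const skipReasons: Record<string, number> = {};
  for (const raw of jsonl.split("\n")) {
    const line = raw.trim();
    if (line === "") continue; // tolerate trailing newlines
    const parsed = JSON.parse(line) as AuditLine;
    const key = parsed.event_type.replace(".", "_"); // "run.state" -> "run_state"
    if (key in eventCounts) eventCounts[key] = (eventCounts[key] ?? 0) + 1;
    const reason = parsed.event["reason"];
    if (parsed.event_type === "reaction.skip" && typeof reason === "string") {
      skipReasons[reason] = (skipReasons[reason] ?? 0) + 1;
    }
  }
  return { eventCounts, skipReasons };
}
```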
@@ -1,27 +1,27 @@
# Phase 0 Baseline Telemetry Summary
- Run state events: 65
- Run state events: 68
- Run cancel events: 0
- Reaction matches: 0
- Reaction skips: 39
- Reaction skips: 41
- Sources: channel
## Run Outcomes (Overall)
- Total outcomes: 27
- Complete: 27 (100.00%)
- Total outcomes: 28
- Complete: 28 (100.00%)
- Cancelled: 0 (0.00%)
- Errors: 0 (0.00%)
- Cancel requested: 0
- Starts: 38
- Starts: 40
## Run Outcomes by Channel
| Channel | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| gmail | 25 | 25 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 25 |
| cron | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 13 |
| gmail | 26 | 26 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 26 |
| cron | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 14 |
## Run Outcomes by Session
@@ -30,6 +30,7 @@
| session_2f2f1e414e81 | 5 | 5 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 5 |
| session_f4d8ddc04194 | 3 | 3 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 3 |
| session_eabc3c2a91b9 | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 2 |
| session_f6304f25e43b | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 2 |
| session_33469de5a1ee | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
| session_3ffb2e631ab1 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
| session_4d9e843358a3 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
@@ -45,7 +46,6 @@
| session_cb9a69d8a362 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
| session_e0a2a17b7329 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
| session_ea839415979e | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
| session_f6304f25e43b | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
| session_fd6536fa5ff4 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
## Cancel Latency
@@ -55,11 +55,11 @@
## Reaction Decisions
- Matched: 0 (0.00%)
- Skipped: 39 (100.00%)
- Skipped: 41 (100.00%)
### Skip Reasons
| Reason | Count | Percent |
| --- | ---: | ---: |
| no_rules | 39 | 100.00% |
| no_rules | 41 | 100.00% |
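The summary tables above report percentages derived from outcome counts. The sketch below assumes that `total_outcomes = complete + cancelled + error`, which holds for every row shown (for example, 28 = 28 + 0 + 0). Starts are therefore not part of the denominator (40 starts against 28 outcomes). It also assumes a rate is `null`, rendered as `n/a`, when the denominator is zero, as in the zero-outcome sessions of the `native` window. `outcomeRates` and `formatPct` are hypothetical names.

```typescript
interface OutcomeCounts {
  complete: number;
  cancelled: number;
  error: number;
}

// Percentage, or null when there is nothing to divide by.
function pct(part: number, whole: number): number | null {
  return whole === 0 ? null : (part / whole) * 100;
}

// Hypothetical helper: rates over terminal outcomes only (starts excluded).
function outcomeRates(c: OutcomeCounts) {
  const total = c.complete + c.cancelled + c.error;
  return {
    total_outcomes: total,
    completion_rate_pct: pct(c.complete, total),
    cancel_rate_pct: pct(c.cancelled, total),
    error_rate_pct: pct(c.error, total),
  };
}

// Markdown rendering: two decimals, or "n/a" for undefined rates.
function formatPct(value: number | null): string {
  return value === null ? "n/a" : `${value.toFixed(2)}%`;
}
```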
@@ -1,5 +1,5 @@
{
"generated_at": "2026-02-27T17:04:49.009Z",
"generated_at": "2026-02-27T17:36:02.803Z",
"artifacts_dir": "/home/will/lab/flynn/docs/plans/artifacts",
"backends": [
"pi_embedded",
@@ -29,15 +29,15 @@
"candidate": {
"tag": "2026-02-27",
"path": "/home/will/lab/flynn/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json",
"generated_at": "2026-02-27T16:45:18.488Z"
"generated_at": "2026-02-27T17:36:02.214Z"
},
"baseline": null,
"comparison": {
"baseline": null,
"candidate": {
"source_event_count": 110,
"sampled_event_count": 56,
"run_total_outcomes": 25,
"source_event_count": 115,
"sampled_event_count": 59,
"run_total_outcomes": 26,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0,
@@ -59,7 +59,7 @@
"freshness": {
"enabled": true,
"pass": true,
"actual_age_hours": 0.33,
"actual_age_hours": 0,
"threshold_hours": 36
},
"drift_gate": {
@@ -68,7 +68,7 @@
{
"criterion": "candidate_sampled_events",
"pass": true,
"actual": "56",
"actual": "59",
"threshold": ">= 10"
},
{
@@ -116,21 +116,21 @@
"candidate": {
"tag": "2026-02-27",
"path": "/home/will/lab/flynn/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json",
"generated_at": "2026-02-27T16:45:18.490Z"
"generated_at": "2026-02-27T17:36:02.514Z"
},
"baseline": null,
"comparison": {
"baseline": null,
"candidate": {
"source_event_count": 110,
"sampled_event_count": 13,
"source_event_count": 115,
"sampled_event_count": 15,
"run_total_outcomes": 2,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0,
"cancel_latency_p95_ms": null,
"reaction_match_rate_pct": null,
"reaction_skip_rate_pct": null
"reaction_match_rate_pct": 0,
"reaction_skip_rate_pct": 100
},
"deltas": {
"sampled_event_count_pct": null,
@@ -146,7 +146,7 @@
"freshness": {
"enabled": true,
"pass": true,
"actual_age_hours": 0.33,
"actual_age_hours": 0,
"threshold_hours": 36
},
"drift_gate": {
@@ -155,7 +155,7 @@
{
"criterion": "candidate_sampled_events",
"pass": true,
"actual": "13",
"actual": "15",
"threshold": ">= 10"
},
{
@@ -1,6 +1,6 @@
# Phase-0 Backend Drift Check
Generated at: 2026-02-27T17:04:49.009Z
Generated at: 2026-02-27T17:36:02.803Z
Artifacts: /home/will/lab/flynn/docs/plans/artifacts
Backends: pi_embedded, native
Freshness max age (hours): 36
@@ -19,9 +19,9 @@ Overall gate: PASS
## pi_embedded
- status: PASS
- candidate: tag=2026-02-27 file=/home/will/lab/flynn/docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json
- candidate generated_at: 2026-02-27T16:45:18.488Z
- candidate generated_at: 2026-02-27T17:36:02.214Z
- baseline: none
- candidate snapshot: sampled=56 outcomes=25 completion=100% cancel=0% error=0% cancel_p95_ms=n/a
- candidate snapshot: sampled=59 outcomes=26 completion=100% cancel=0% error=0% cancel_p95_ms=n/a
- deltas:
sampled_event_count_pct=n/a
run_total_outcomes_pct=n/a
@@ -31,9 +31,9 @@ Overall gate: PASS
cancel_latency_p95_ms=n/a
reaction_match_rate_pp=n/a
reaction_skip_rate_pp=n/a
- freshness gate: PASS (age_hours=0.33 threshold=36)
- freshness gate: PASS (age_hours=0 threshold=36)
- drift gate: PASS
PASS candidate_sampled_events actual=56 threshold=>= 10
PASS candidate_sampled_events actual=59 threshold=>= 10
PASS sampled_events_drop_pct actual=n/a threshold=<= 80
PASS run_outcomes_drop_pct actual=n/a threshold=<= 80
PASS completion_rate_drop_pp actual=n/a threshold=<= 35
@@ -44,9 +44,9 @@ Overall gate: PASS
## native
- status: PASS
- candidate: tag=2026-02-27 file=/home/will/lab/flynn/docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json
- candidate generated_at: 2026-02-27T16:45:18.490Z
- candidate generated_at: 2026-02-27T17:36:02.514Z
- baseline: none
- candidate snapshot: sampled=13 outcomes=2 completion=100% cancel=0% error=0% cancel_p95_ms=n/a
- candidate snapshot: sampled=15 outcomes=2 completion=100% cancel=0% error=0% cancel_p95_ms=n/a
- deltas:
sampled_event_count_pct=n/a
run_total_outcomes_pct=n/a
@@ -56,9 +56,9 @@ Overall gate: PASS
cancel_latency_p95_ms=n/a
reaction_match_rate_pp=n/a
reaction_skip_rate_pp=n/a
- freshness gate: PASS (age_hours=0.33 threshold=36)
- freshness gate: PASS (age_hours=0 threshold=36)
- drift gate: PASS
PASS candidate_sampled_events actual=13 threshold=>= 10
PASS candidate_sampled_events actual=15 threshold=>= 10
PASS sampled_events_drop_pct actual=n/a threshold=<= 80
PASS run_outcomes_drop_pct actual=n/a threshold=<= 80
PASS completion_rate_drop_pp actual=n/a threshold=<= 35
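The drift report above applies a freshness gate plus per-criterion drift checks to each backend. The sketch below reproduces a subset of those checks using the thresholds from the `audit:phase0-baseline:live:drift` flags in `package.json`. It omits the cancel-rate, error-rate, and cancel-latency criteria. Comparisons without a baseline are reported as `n/a` and pass, matching the report. The reported ages (0.33 h before this commit, 0 after) are consistent with rounding the gap between the drift report's and the candidate's `generated_at` to two decimals. That rounding is an inference, not documented behavior. All function names are hypothetical.

```typescript
interface Snapshot {
  sampled_event_count: number;
  run_total_outcomes: number;
  completion_rate_pct: number | null;
}

interface Criterion {
  criterion: string;
  pass: boolean;
  actual: string;
  threshold: string;
}

// Subset of the thresholds passed as flags in package.json.
const LIMITS = {
  maxAgeHours: 36,
  minCandidateSampledEvents: 10,
  maxSampledEventsDropPct: 80,
  maxRunOutcomesDropPct: 80,
  maxCompletionRateDropPp: 35,
};

// Hours between two ISO timestamps, rounded to two decimals (assumed rounding).
function ageHours(candidateGeneratedAt: string, reportGeneratedAt: string): number {
  const ms = Date.parse(reportGeneratedAt) - Date.parse(candidateGeneratedAt);
  return Math.round((ms / 3_600_000) * 100) / 100;
}

// Relative drop from baseline to candidate in percent; null when undefined.
function dropPct(baseline: number, candidate: number): number | null {
  return baseline === 0 ? null : ((baseline - candidate) / baseline) * 100;
}

// "<= max" check; a missing comparison (no baseline yet) is n/a and passes.
function atMost(criterion: string, actual: number | null, max: number): Criterion {
  return {
    criterion,
    pass: actual === null || actual <= max,
    actual: actual === null ? "n/a" : actual.toFixed(2),
    threshold: `<= ${max}`,
  };
}

function evaluateBackend(candidate: Snapshot, baseline: Snapshot | null, age: number) {
  const freshnessPass = age <= LIMITS.maxAgeHours;
  let completionDrop: number | null = null;
  if (baseline !== null && baseline.completion_rate_pct !== null && candidate.completion_rate_pct !== null) {
    completionDrop = baseline.completion_rate_pct - candidate.completion_rate_pct; // percentage points
  }
  const criteria: Criterion[] = [
    {
      criterion: "candidate_sampled_events",
      pass: candidate.sampled_event_count >= LIMITS.minCandidateSampledEvents,
      actual: String(candidate.sampled_event_count),
      threshold: `>= ${LIMITS.minCandidateSampledEvents}`,
    },
    atMost("sampled_events_drop_pct",
      baseline ? dropPct(baseline.sampled_event_count, candidate.sampled_event_count) : null,
      LIMITS.maxSampledEventsDropPct),
    atMost("run_outcomes_drop_pct",
      baseline ? dropPct(baseline.run_total_outcomes, candidate.run_total_outcomes) : null,
      LIMITS.maxRunOutcomesDropPct),
    atMost("completion_rate_drop_pp", completionDrop, LIMITS.maxCompletionRateDropPp),
  ];
  // Overall backend status: fresh AND every drift criterion passes.
  return { freshnessPass, criteria, pass: freshnessPass && criteria.every((c) => c.pass) };
}
```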
@@ -1,8 +1,8 @@
{
"generated_at": "2026-02-27T16:45:18.490Z",
"generated_at": "2026-02-27T17:36:02.514Z",
"source_audit_path": "~/.local/share/flynn/audit.log",
"source_event_count": 110,
"sampled_event_count": 13,
"source_event_count": 115,
"sampled_event_count": 15,
"filters": {
"sources": [
"channel"
@@ -14,7 +14,7 @@
"probe"
],
"anonymized_identifiers": true,
"backend_route_event_count": 127
"backend_route_event_count": 129
},
"options": {
"sources": [
@@ -26,10 +26,10 @@
},
"summary": {
"event_counts": {
"run_state": 13,
"run_state": 14,
"run_cancel": 0,
"reaction_match": 0,
"reaction_skip": 0
"reaction_skip": 1
},
"run_outcomes": {
"overall": {
@@ -38,7 +38,7 @@
"cancelled": 0,
"error": 0,
"cancel_requested": 0,
"start": 11,
"start": 12,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0
@@ -52,7 +52,7 @@
"cancelled": 0,
"error": 0,
"cancel_requested": 0,
"start": 11,
"start": 12,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0
@@ -172,6 +172,20 @@
"error_rate_pct": null
}
},
{
"key": "session_534570702ea5",
"stats": {
"total_outcomes": 0,
"complete": 0,
"cancelled": 0,
"error": 0,
"cancel_requested": 0,
"start": 1,
"completion_rate_pct": null,
"cancel_rate_pct": null,
"error_rate_pct": null
}
},
{
"key": "session_683372f346c3",
"stats": {
@@ -219,11 +233,17 @@
"cancel_latency_ms": null,
"reactions": {
"matched": 0,
"skipped": 0,
"total": 0,
"match_rate_pct": null,
"skip_rate_pct": null,
"skip_reasons": []
"skipped": 1,
"total": 1,
"match_rate_pct": 0,
"skip_rate_pct": 100,
"skip_reasons": [
{
"reason": "no_rules",
"count": 1,
"pct": 100
}
]
}
}
}
@@ -11,3 +11,5 @@
{"level":"info","event_type":"run.state","event":{"session_id":"session_a3f64a8e3c1e","channel":"cron","sender":"sender_a31bd6d4a95a","source":"channel","state":"start","request_id":"request_fc572d83d4c6"},"timestamp":1772182800034}
{"level":"info","event_type":"run.state","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","state":"start","request_id":"request_a3bafbb93755"},"timestamp":1772208000013}
{"level":"info","event_type":"run.state","event":{"session_id":"session_5ae4ad331184","channel":"cron","sender":"sender_a912a223d950","source":"channel","state":"complete","request_id":"request_a3bafbb93755","duration_ms":35239},"timestamp":1772208035252}
{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_534570702ea5","channel":"cron","sender":"sender_552aeb8f1b32","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772211600036}
{"level":"info","event_type":"run.state","event":{"session_id":"session_534570702ea5","channel":"cron","sender":"sender_552aeb8f1b32","source":"channel","state":"start","request_id":"request_c0a9fc76c188"},"timestamp":1772211600036}
@@ -1,9 +1,9 @@
# Phase 0 Baseline Telemetry Summary
- Run state events: 13
- Run state events: 14
- Run cancel events: 0
- Reaction matches: 0
- Reaction skips: 0
- Reaction skips: 1
- Sources: channel
@@ -14,13 +14,13 @@
- Cancelled: 0 (0.00%)
- Errors: 0 (0.00%)
- Cancel requested: 0
- Starts: 11
- Starts: 12
## Run Outcomes by Channel
| Channel | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| cron | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 11 |
| cron | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 12 |
## Run Outcomes by Session
@@ -34,6 +34,7 @@
| session_494cb3b392af | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
| session_49b700741e03 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
| session_4cd8ba5e6df5 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
| session_534570702ea5 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
| session_683372f346c3 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
| session_a3f64a8e3c1e | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
| session_ffcee254d546 | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 1 |
@@ -44,12 +45,12 @@
## Reaction Decisions
- Matched: 0 (n/a)
- Skipped: 0 (n/a)
- Matched: 0 (0.00%)
- Skipped: 1 (100.00%)
### Skip Reasons
| Reason | Count | Percent |
| --- | ---: | ---: |
| _none_ | 0 | 0.00% |
| no_rules | 1 | 100.00% |
@@ -1,8 +1,8 @@
{
"generated_at": "2026-02-27T16:45:18.488Z",
"generated_at": "2026-02-27T17:36:02.214Z",
"source_audit_path": "~/.local/share/flynn/audit.log",
"source_event_count": 110,
"sampled_event_count": 56,
"source_event_count": 115,
"sampled_event_count": 59,
"filters": {
"sources": [
"channel"
@@ -14,7 +14,7 @@
"probe"
],
"anonymized_identifiers": true,
"backend_route_event_count": 127
"backend_route_event_count": 129
},
"options": {
"sources": [
@@ -26,19 +26,19 @@
},
"summary": {
"event_counts": {
"run_state": 42,
"run_state": 44,
"run_cancel": 0,
"reaction_match": 0,
"reaction_skip": 14
"reaction_skip": 15
},
"run_outcomes": {
"overall": {
"total_outcomes": 25,
"complete": 25,
"total_outcomes": 26,
"complete": 26,
"cancelled": 0,
"error": 0,
"cancel_requested": 0,
"start": 17,
"start": 18,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0
@@ -47,12 +47,12 @@
{
"key": "gmail",
"stats": {
"total_outcomes": 25,
"complete": 25,
"total_outcomes": 26,
"complete": 26,
"cancelled": 0,
"error": 0,
"cancel_requested": 0,
"start": 17,
"start": 18,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0
@@ -102,6 +102,20 @@
"error_rate_pct": 0
}
},
{
"key": "session_f6304f25e43b",
"stats": {
"total_outcomes": 2,
"complete": 2,
"cancelled": 0,
"error": 0,
"cancel_requested": 0,
"start": 2,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0
}
},
{
"key": "session_33469de5a1ee",
"stats": {
@@ -284,20 +298,6 @@
"error_rate_pct": 0
}
},
{
"key": "session_f6304f25e43b",
"stats": {
"total_outcomes": 1,
"complete": 1,
"cancelled": 0,
"error": 0,
"cancel_requested": 0,
"start": 1,
"completion_rate_pct": 100,
"cancel_rate_pct": 0,
"error_rate_pct": 0
}
},
{
"key": "session_fd6536fa5ff4",
"stats": {
@@ -317,14 +317,14 @@
"cancel_latency_ms": null,
"reactions": {
"matched": 0,
"skipped": 14,
"total": 14,
"skipped": 15,
"total": 15,
"match_rate_pct": 0,
"skip_rate_pct": 100,
"skip_reasons": [
{
"reason": "no_rules",
"count": 14,
"count": 15,
"pct": 100
}
]
@@ -54,3 +54,6 @@
{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_2f2f1e414e81","channel":"gmail","sender":"sender_323cedc3233a","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772206157229}
{"level":"info","event_type":"run.state","event":{"session_id":"session_2f2f1e414e81","channel":"gmail","sender":"sender_323cedc3233a","source":"channel","state":"start","request_id":"request_ab73d670c119"},"timestamp":1772206157229}
{"level":"info","event_type":"run.state","event":{"session_id":"session_2f2f1e414e81","channel":"gmail","sender":"sender_323cedc3233a","source":"channel","state":"complete","request_id":"request_ab73d670c119","duration_ms":3850},"timestamp":1772206161079}
{"level":"debug","event_type":"reaction.skip","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","reason":"no_rules","candidate_count":0},"timestamp":1772211257454}
{"level":"info","event_type":"run.state","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","state":"start","request_id":"request_607c64c2760f"},"timestamp":1772211257454}
{"level":"info","event_type":"run.state","event":{"session_id":"session_f6304f25e43b","channel":"gmail","sender":"sender_311c7608cc58","source":"channel","state":"complete","request_id":"request_607c64c2760f","duration_ms":3870},"timestamp":1772211261324}
@@ -1,26 +1,26 @@
# Phase 0 Baseline Telemetry Summary
- Run state events: 42
- Run state events: 44
- Run cancel events: 0
- Reaction matches: 0
- Reaction skips: 14
- Reaction skips: 15
- Sources: channel
## Run Outcomes (Overall)
- Total outcomes: 25
- Complete: 25 (100.00%)
- Total outcomes: 26
- Complete: 26 (100.00%)
- Cancelled: 0 (0.00%)
- Errors: 0 (0.00%)
- Cancel requested: 0
- Starts: 17
- Starts: 18
## Run Outcomes by Channel
| Channel | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| gmail | 25 | 25 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 17 |
| gmail | 26 | 26 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 18 |
## Run Outcomes by Session
@@ -29,6 +29,7 @@
| session_2f2f1e414e81 | 5 | 5 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 5 |
| session_f4d8ddc04194 | 3 | 3 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 3 |
| session_eabc3c2a91b9 | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
| session_f6304f25e43b | 2 | 2 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 2 |
| session_33469de5a1ee | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
| session_3ffb2e631ab1 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 0 |
| session_4d9e843358a3 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 0 |
@@ -42,7 +43,6 @@
| session_cb9a69d8a362 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
| session_e0a2a17b7329 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
| session_ea839415979e | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
| session_f6304f25e43b | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
| session_fd6536fa5ff4 | 1 | 1 | 0 | 0 | 100.00% | 0.00% | 0.00% | 0 | 1 |
## Cancel Latency
@@ -52,11 +52,11 @@
## Reaction Decisions
- Matched: 0 (0.00%)
- Skipped: 14 (100.00%)
- Skipped: 15 (100.00%)
### Skip Reasons
| Reason | Count | Percent |
| --- | ---: | ---: |
| no_rules | 14 | 100.00% |
| no_rules | 15 | 100.00% |
@@ -1,5 +1,5 @@
{
"generated_at": "2026-02-27T16:46:42.880Z",
"generated_at": "2026-02-27T17:36:01.922Z",
"source_audit_path": "~/.local/share/flynn/audit.log",
"source_event_count": 6,
"sampled_event_count": 6,
+31 -1
@@ -234,6 +234,36 @@
],
"test_status": "pnpm audit:phase0-baseline:live:drift + pnpm test:run src/audit/phase0BaselineDrift.test.ts + pnpm typecheck passing"
},
"phase0-live-baseline-refresh-full-window": {
"status": "completed",
"date": "2026-02-27",
"updated": "2026-02-27",
"summary": "Expanded `pnpm audit:phase0-baseline:live:refresh` to regenerate all live windows in one command (channel, gateway, backend-scoped `pi_embedded`, backend-scoped `native`) so scheduled `refresh:drift` runs keep backend artifacts fresh for baseline-vs-prior comparisons.",
"files_modified": [
"package.json",
"README.md",
"docs/api/PROTOCOL.md",
"docs/architecture/AGENT_DIAGRAM.md",
"docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md",
"docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md",
"docs/plans/artifacts/phase0_baseline_live_2026-02-27.jsonl",
"docs/plans/artifacts/phase0_baseline_live_2026-02-27.md",
"docs/plans/artifacts/phase0_baseline_live_2026-02-27.json",
"docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.jsonl",
"docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.md",
"docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json",
"docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.jsonl",
"docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.md",
"docs/plans/artifacts/phase0_baseline_live_backend_pi_embedded_2026-02-27.json",
"docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.jsonl",
"docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.md",
"docs/plans/artifacts/phase0_baseline_live_backend_native_2026-02-27.json",
"docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.md",
"docs/plans/artifacts/phase0_baseline_live_backend_drift_2026-02-27.json",
"docs/plans/state.json"
],
"test_status": "pnpm audit:phase0-baseline:live:refresh:drift + pnpm test:run src/audit/phase0BaselineDrift.test.ts + pnpm typecheck passing"
},
"phase0-instrumentation-ticket-checklist": {
"status": "completed",
"date": "2026-02-25",
@@ -7420,7 +7450,7 @@
"deeper_surfaces_phase0_ticket_03": "completed — gateway metrics now track run-state outcomes, cancel latency samples, and reaction decision counters with routing/gateway emitters",
"deeper_surfaces_phase0_ticket_04": "completed — added phase-0 baseline summary tooling for run outcomes, cancel latency, and reaction decisions with markdown/json CLI output",
"deeper_surfaces_phase0_ticket_05": "completed — documented phase-0 telemetry fields/workflow, refreshed architecture/protocol docs, generated anonymized live baseline artifacts for channel/gateway/backend-scoped (pi/native) windows, and added backend artifact freshness/drift gates with persisted drift reports (`phase0_baseline_live_backend_drift_<UTC-date>.{md,json}`)",
"next_up": "Run scheduled `pnpm audit:phase0-baseline:live:refresh:drift` in each active environment and collect at least one additional UTC-date drift artifact so baseline-vs-prior comparisons become active before tightening thresholds or changing additional run-control/reaction semantics.",
"next_up": "Run scheduled `pnpm audit:phase0-baseline:live:refresh:drift` in each active environment (now refreshing channel + gateway + backend-scoped windows together) and collect at least one additional UTC-date drift artifact so baseline-vs-prior comparisons become active before tightening thresholds or changing additional run-control/reaction semantics.",
"pi_embedded_canary_spike": "completed — added optional pi_embedded backend adapter, canary-safe no-tools routing guard, backend success/fallback latency audit events, and docs/diagram updates while native remains default",
"pi_embedded_evaluation_phase": "completed — final decision rollback (applied in runtime config): Window A failed latency/fallback gates (p50 +259ms, p95 +5695ms, fallback 25%, categories: pi_module_interface/empty_assistant_text); Window B remained sample-insufficient; controlled probes verified guard coverage (pi_no_tools_mode/capability_query/attachments_present each hit once)",
"pi_embedded_manual_mode": "completed — added persisted runtime backend controls for manual Pi activation/deactivation (`/runtime` preferred, `/backend` alias; `status`, `activate pi`, `deactivate pi`, `use config`) while keeping config-driven default routing",
+1 -1
@@ -26,7 +26,7 @@
"audit:phase0-baseline:live:pi": "node --import tsx/esm scripts/capture-phase0-live-baseline.ts --audit ~/.local/share/flynn/audit.log --source channel --backend pi_embedded --exclude-session-substring probe",
"audit:phase0-baseline:live:native": "node --import tsx/esm scripts/capture-phase0-live-baseline.ts --audit ~/.local/share/flynn/audit.log --source channel --backend native --exclude-session-substring probe",
"audit:phase0-baseline:live:gateway": "node --import tsx/esm scripts/capture-phase0-live-baseline.ts --audit ~/.local/share/flynn/audit.log --source gateway --auto-gateway-cancel-window",
"audit:phase0-baseline:live:refresh": "pnpm audit:phase0-baseline:live && pnpm audit:phase0-baseline:live:gateway",
"audit:phase0-baseline:live:refresh": "pnpm audit:phase0-baseline:live && pnpm audit:phase0-baseline:live:gateway && pnpm audit:phase0-baseline:live:pi && pnpm audit:phase0-baseline:live:native",
"audit:phase0-baseline:live:drift": "node --import tsx/esm scripts/check-phase0-baseline-backend-drift.ts --artifacts-dir docs/plans/artifacts --backend pi_embedded,native --max-age-hours 36 --min-candidate-sampled-events 10 --max-sampled-events-drop-pct 80 --max-run-outcomes-drop-pct 80 --max-completion-rate-drop-pp 35 --max-cancel-rate-increase-pp 25 --max-error-rate-increase-pp 25 --max-cancel-latency-p95-increase-ms 6000 --write-default-artifacts",
"audit:phase0-baseline:live:refresh:drift": "pnpm audit:phase0-baseline:live:refresh && pnpm audit:phase0-baseline:live:drift",
"audit:backend-canary:probes": "node --import tsx/esm scripts/run-pi-canary-guard-probes.ts",