diff --git a/README.md b/README.md index 266cc4a..04d7a76 100644 --- a/README.md +++ b/README.md @@ -1629,6 +1629,18 @@ Live baseline artifacts (sample JSONL + JSON/Markdown summaries) can be captured pnpm audit:phase0-baseline:live ``` +Gateway-origin windows can be captured separately (for example when validating cancel paths) by restricting source + time window: +```bash +node --import tsx/esm scripts/capture-phase0-live-baseline.ts \ + --audit ~/.local/share/flynn/audit.log \ + --source gateway \ + --since \ + --until \ + --sample-out docs/plans/artifacts/phase0_baseline_live_gateway_.jsonl \ + --summary-json-out docs/plans/artifacts/phase0_baseline_live_gateway_.json \ + --summary-md-out docs/plans/artifacts/phase0_baseline_live_gateway_.md +``` + ## Gateway Lock Single-client mode for the WebSocket gateway. When enabled, only one WebSocket connection is allowed at a time. Additional connections are rejected with close code `4003`. diff --git a/docs/api/PROTOCOL.md b/docs/api/PROTOCOL.md index f80316c..adc5b45 100644 --- a/docs/api/PROTOCOL.md +++ b/docs/api/PROTOCOL.md @@ -23,7 +23,7 @@ The gateway provides: - **HTTP Server**: Serves static dashboard and handles webhook endpoints - **Node Capability Negotiation**: Optional companion-node role/capability registration -Operational note: onboarding (`flynn setup` / `flynn onboard`) now runs post-save live readiness checks (model/channel/memory/automation) and prints a guided first-success task flow. Companion CLI now also supports bootstrap-manifest export (`flynn companion --export-bootstrap `), release-bundle export (`--export-release-bundle ` with optional `--signing-key`/`--signing-key-id` signature output), release-bundle verification (`--verify-release-bundle ` with optional `--verify-signing-key`/`--verify-signing-key-id`/`--require-signature`), platform shell-template export (`--export-shell-template `), plus richer shell bootstrap flags for status/location/push (`--app-version`, `--latitude/--longitude`, `--push-token`, etc.) for desktop/mobile app packaging without changing JSON-RPC method/event shapes. Audit observability now includes a live phase-0 baseline capture flow (`pnpm audit:phase0-baseline:live`) that emits anonymized run/reaction sample artifacts. +Operational note: onboarding (`flynn setup` / `flynn onboard`) now runs post-save live readiness checks (model/channel/memory/automation) and prints a guided first-success task flow. Companion CLI now also supports bootstrap-manifest export (`flynn companion --export-bootstrap `), release-bundle export (`--export-release-bundle ` with optional `--signing-key`/`--signing-key-id` signature output), release-bundle verification (`--verify-release-bundle ` with optional `--verify-signing-key`/`--verify-signing-key-id`/`--require-signature`), platform shell-template export (`--export-shell-template `), plus richer shell bootstrap flags for status/location/push (`--app-version`, `--latitude/--longitude`, `--push-token`, etc.) for desktop/mobile app packaging without changing JSON-RPC method/event shapes. Audit observability now includes live phase-0 baseline capture flows: `pnpm audit:phase0-baseline:live` for channel-origin windows and `scripts/capture-phase0-live-baseline.ts --source gateway --since ... --until ...` for gateway-origin windows. ### Execution Model (Sessions + Per-Session Queue) diff --git a/docs/architecture/AGENT_DIAGRAM.md b/docs/architecture/AGENT_DIAGRAM.md index 2d45474..cad3734 100644 --- a/docs/architecture/AGENT_DIAGRAM.md +++ b/docs/architecture/AGENT_DIAGRAM.md @@ -166,7 +166,8 @@ Gateway streaming UX signals: - `.github/workflows/companion-release-bundle.yml` provides CI artifact generation for companion release bundles using the same build-and-verify pipeline. - `.github/workflows/companion-reference-apps-check.yml` enforces reference-app generator sync in CI. - `flynn companion` can bootstrap status/location/push metadata on connect (`node.status.set` + optional `node.location.set`/`node.push_token.set`) so thin companion shells can register operational context in one launch. -- `pnpm audit:phase0-baseline:live` captures anonymized live run/reaction baseline artifacts from real audit logs to replace probe-only telemetry samples. +- `pnpm audit:phase0-baseline:live` captures anonymized channel-origin live run/reaction baseline artifacts from real audit logs. +- `scripts/capture-phase0-live-baseline.ts --source gateway --since ... --until ...` captures gateway-origin baseline windows (including cancel-path telemetry) as separate artifacts. - Canvas artifacts are persisted by the gateway so session UI surfaces can recover after daemon restarts. - TTS synthesis uses an ordered provider chain with health cooldown tracking; if all providers fail, replies degrade to text-only without dropping the response. - Talk mode accepts spoken/text `stop`/`cancel` while active and maps it onto the same `/stop` run-control cancellation path used for text sessions. diff --git a/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md b/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md index 6956805..6c995fc 100644 --- a/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md +++ b/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md @@ -31,7 +31,8 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`. - Companion reference-app sync can be enforced with `pnpm companion:reference-apps:check` (regenerate + diff fail on drift). - CI workflow `.github/workflows/companion-release-bundle.yml` mirrors this pipeline for manual artifact generation/upload. - CI workflow `.github/workflows/companion-reference-apps-check.yml` enforces reference-app generator sync on pull requests. -- Audit phase-0 live telemetry snapshots can be regenerated with `pnpm audit:phase0-baseline:live` (anonymized sample JSONL + summary JSON/markdown artifacts). +- Audit phase-0 live telemetry snapshots can be regenerated with `pnpm audit:phase0-baseline:live` (channel-origin anonymized sample JSONL + summary JSON/markdown artifacts). +- Gateway-origin phase-0 windows (including cancel-path samples) can be captured with `scripts/capture-phase0-live-baseline.ts --source gateway --since ... --until ...`. - Companion CLI supports one-shot shell bootstrap metadata for live sessions (`--app-version`/`--status-text`, `--latitude`/`--longitude`, `--push-token`) so desktop/mobile wrappers can initialize node status/location/push in a single launch flow. - Canvas artifacts are persisted per session under the gateway data directory for UI recovery across restarts. - TTS output is best-effort with ordered provider fallback + per-provider cooldown tracking; synthesis failures still fall back to text-only responses. diff --git a/docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md b/docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md index 1a95bf9..ef341db 100644 --- a/docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md +++ b/docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md @@ -203,7 +203,7 @@ Phase 0 is complete when: 2. A baseline summary artifact is generated and committed under `docs/plans/artifacts/`. 3. No user-visible response behavior changed compared to pre-phase baseline. -Follow-up status (2026-02-27): live channel-session artifacts now exist under `docs/plans/artifacts/phase0_baseline_live_2026-02-27.*` via `pnpm audit:phase0-baseline:live` (anonymized IDs); gateway-origin live samples remain a future slice. +Follow-up status (2026-02-27): live channel-session artifacts exist under `docs/plans/artifacts/phase0_baseline_live_2026-02-27.*` via `pnpm audit:phase0-baseline:live` (anonymized IDs), and a second gateway-origin live window (including `run.cancel` + `cancel_requested`/`cancelled`) exists under `docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.*`. ## Subagent Model Assignment Plan diff --git a/docs/plans/artifacts/phase0_baseline_live_2026-02-27.json b/docs/plans/artifacts/phase0_baseline_live_2026-02-27.json index 10c7871..b60ee2b 100644 --- a/docs/plans/artifacts/phase0_baseline_live_2026-02-27.json +++ b/docs/plans/artifacts/phase0_baseline_live_2026-02-27.json @@ -1,7 +1,7 @@ { - "generated_at": "2026-02-27T07:39:16.384Z", + "generated_at": "2026-02-27T07:49:58.821Z", "source_audit_path": "~/.local/share/flynn/audit.log", - "source_event_count": 88, + "source_event_count": 94, "sampled_event_count": 88, "filters": { "sources": [ diff --git a/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json b/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json new file mode 100644 index 0000000..4aea7ce --- /dev/null +++ b/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json @@ -0,0 +1,94 @@ +{ + "generated_at": "2026-02-27T07:47:41.346Z", + "source_audit_path": "~/.local/share/flynn/audit.log", + "source_event_count": 6, + "sampled_event_count": 6, + "filters": { + "since_ms": 1772178440693, + "until_ms": 1772178442694, + "sources": [ + "gateway" + ], + "exclude_session_substrings": [ + "probe" + ], + "anonymized_identifiers": true + }, + "options": { + "sources": [ + "gateway" + ], + "maxSessions": 20, + "maxChannels": 20, + "maxSkipReasons": 10 + }, + "summary": { + "event_counts": { + "run_state": 5, + "run_cancel": 1, + "reaction_match": 0, + "reaction_skip": 0 + }, + "run_outcomes": { + "overall": { + "total_outcomes": 2, + "complete": 1, + "cancelled": 1, + "error": 0, + "cancel_requested": 1, + "start": 2, + "completion_rate_pct": 50, + "cancel_rate_pct": 50, + "error_rate_pct": 0 + }, + "by_channel": [ + { + "key": "ws", + "stats": { + "total_outcomes": 2, + "complete": 1, + "cancelled": 1, + "error": 0, + "cancel_requested": 1, + "start": 2, + "completion_rate_pct": 50, + "cancel_rate_pct": 50, + "error_rate_pct": 0 + } + } + ], + "by_session": [ + { + "key": "session_67024a716ed2", + "stats": { + "total_outcomes": 2, + "complete": 1, + "cancelled": 1, + "error": 0, + "cancel_requested": 1, + "start": 2, + "completion_rate_pct": 50, + "cancel_rate_pct": 50, + "error_rate_pct": 0 + } + } + ] + }, + "cancel_latency_ms": { + "count": 1, + "avg_ms": 0, + "p50_ms": 0, + "p95_ms": 0, + "min_ms": 0, + "max_ms": 0 + }, + "reactions": { + "matched": 0, + "skipped": 0, + "total": 0, + "match_rate_pct": null, + "skip_rate_pct": null, + "skip_reasons": [] + } + } +} diff --git a/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.jsonl b/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.jsonl new file mode 100644 index 0000000..1368af7 --- /dev/null +++ b/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.jsonl @@ -0,0 +1,6 @@ +{"level":"info","event_type":"run.state","event":{"session_id":"session_67024a716ed2","channel":"ws","sender":"sender_776ae1d44923","source":"gateway","state":"start","request_id":"request_e7102f267cb2"},"timestamp":1772178441693} +{"level":"info","event_type":"run.cancel","event":{"session_id":"session_67024a716ed2","channel":"ws","sender":"sender_776ae1d44923","source":"gateway","requested":true,"acknowledged":true,"request_id":"request_61e1e51ee26b","latency_ms":0},"timestamp":1772178441693} +{"level":"info","event_type":"run.state","event":{"session_id":"session_67024a716ed2","channel":"ws","sender":"sender_776ae1d44923","source":"gateway","state":"cancel_requested","request_id":"request_61e1e51ee26b","duration_ms":0},"timestamp":1772178441694} +{"level":"info","event_type":"run.state","event":{"session_id":"session_67024a716ed2","channel":"ws","sender":"sender_776ae1d44923","source":"gateway","state":"cancelled","request_id":"request_e7102f267cb2","duration_ms":1},"timestamp":1772178441694} +{"level":"info","event_type":"run.state","event":{"session_id":"session_67024a716ed2","channel":"ws","sender":"sender_776ae1d44923","source":"gateway","state":"start","request_id":"request_897903dab9c4"},"timestamp":1772178441694} +{"level":"info","event_type":"run.state","event":{"session_id":"session_67024a716ed2","channel":"ws","sender":"sender_776ae1d44923","source":"gateway","state":"complete","request_id":"request_897903dab9c4","duration_ms":0},"timestamp":1772178441694} diff --git a/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.md b/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.md new file mode 100644 index 0000000..3d15069 --- /dev/null +++ b/docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.md @@ -0,0 +1,50 @@ +# Phase 0 Baseline Telemetry Summary + +- Run state events: 5 +- Run cancel events: 1 +- Reaction matches: 0 +- Reaction skips: 0 + +- Sources: gateway + +## Run Outcomes (Overall) + +- Total outcomes: 2 +- Complete: 1 (50.00%) +- Cancelled: 1 (50.00%) +- Errors: 0 (0.00%) +- Cancel requested: 1 +- Starts: 2 + +## Run Outcomes by Channel + +| Channel | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts | +| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | +| ws | 2 | 1 | 1 | 0 | 50.00% | 50.00% | 0.00% | 1 | 2 | + +## Run Outcomes by Session + +| Session | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts | +| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | +| session_67024a716ed2 | 2 | 1 | 1 | 0 | 50.00% | 50.00% | 0.00% | 1 | 2 | + +## Cancel Latency + +- Count: 1 +- Avg: 0ms +- P50: 0ms +- P95: 0ms +- Min: 0ms +- Max: 0ms + +## Reaction Decisions + +- Matched: 0 (n/a) +- Skipped: 0 (n/a) + +### Skip Reasons + +| Reason | Count | Percent | +| --- | ---: | ---: | +| _none_ | 0 | 0.00% | + diff --git a/docs/plans/state.json b/docs/plans/state.json index 01278f1..23be604 100644 --- a/docs/plans/state.json +++ b/docs/plans/state.json @@ -67,7 +67,7 @@ "status": "completed", "date": "2026-02-25", "updated": "2026-02-27", - "summary": "Updated protocol/docs/diagrams for phase-0 telemetry fields, documented baseline workflow, and replaced the original probe-only baseline workflow with anonymized live channel-session audit artifacts (`phase0_baseline_live_2026-02-27.*`).", + "summary": "Updated protocol/docs/diagrams for phase-0 telemetry fields, documented baseline workflow, replaced the original probe-only baseline workflow with anonymized live channel-session audit artifacts (`phase0_baseline_live_2026-02-27.*`), and added a second gateway-origin live window artifact set (`phase0_baseline_live_gateway_2026-02-27.*`) including cancel-path telemetry.", "files_modified": [ "README.md", "docs/api/PROTOCOL.md", @@ -77,15 +77,18 @@ "docs/plans/artifacts/phase0_baseline_live_2026-02-27.jsonl", "docs/plans/artifacts/phase0_baseline_live_2026-02-27.md", "docs/plans/artifacts/phase0_baseline_live_2026-02-27.json", + "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.jsonl", + "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.md", + "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json", "docs/plans/state.json" ], - "test_status": "pnpm audit:phase0-baseline:live + pnpm test:run src/audit/phase0LiveBaseline.test.ts src/audit/phase0BaselineSummary.test.ts + pnpm typecheck passing" + "test_status": "pnpm audit:phase0-baseline:live + node --import tsx/esm scripts/capture-phase0-live-baseline.ts --audit ~/.local/share/flynn/audit.log --source gateway --since 1772178440693 --until 1772178442694 --sample-out docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.jsonl --summary-json-out docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json --summary-md-out docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.md + pnpm test:run src/audit/phase0LiveBaseline.test.ts src/audit/phase0BaselineSummary.test.ts + pnpm typecheck passing" }, "phase0-live-baseline-capture-tooling": { "status": "completed", "date": "2026-02-27", "updated": "2026-02-27", - "summary": "Added a dedicated live phase-0 baseline capture flow that reads audit logs, filters run/reaction telemetry, excludes probe sessions, anonymizes session/sender/request IDs, and writes sample + summary artifacts for operational refreshes.", + "summary": "Added a dedicated live phase-0 baseline capture flow that reads audit logs, filters run/reaction telemetry, excludes probe sessions, anonymizes session/sender/request IDs, and writes sample + summary artifacts for operational refreshes across both channel-origin and gateway-origin windows.", "files_modified": [ "src/audit/phase0LiveBaseline.ts", "src/audit/phase0LiveBaseline.test.ts", @@ -99,9 +102,30 @@ "docs/plans/artifacts/phase0_baseline_live_2026-02-27.jsonl", "docs/plans/artifacts/phase0_baseline_live_2026-02-27.md", "docs/plans/artifacts/phase0_baseline_live_2026-02-27.json", + "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.jsonl", + "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.md", + "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json", "docs/plans/state.json" ], - "test_status": "pnpm audit:phase0-baseline:live + pnpm test:run src/audit/phase0LiveBaseline.test.ts src/audit/phase0BaselineSummary.test.ts + pnpm typecheck passing" + "test_status": "pnpm audit:phase0-baseline:live + node --import tsx/esm scripts/capture-phase0-live-baseline.ts --audit ~/.local/share/flynn/audit.log --source gateway --since 1772178440693 --until 1772178442694 --sample-out docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.jsonl --summary-json-out docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json --summary-md-out docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.md + pnpm test:run src/audit/phase0LiveBaseline.test.ts src/audit/phase0BaselineSummary.test.ts + pnpm typecheck passing" + }, + "phase0-live-baseline-gateway-window": { + "status": "completed", + "date": "2026-02-27", + "updated": "2026-02-27", + "summary": "Captured and committed a gateway-origin live phase-0 baseline window with explicit cancel-path coverage (`run.cancel`, `run.state.cancel_requested`, `run.state.cancelled`) plus a paired completion sample, producing anonymized `phase0_baseline_live_gateway_2026-02-27.*` artifacts.", + "files_modified": [ + "README.md", + "docs/api/PROTOCOL.md", + "docs/architecture/AGENT_DIAGRAM.md", + "docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md", + "docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md", + "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.jsonl", + "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.md", + "docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json", + "docs/plans/state.json" + ], + "test_status": "node --import tsx/esm scripts/capture-phase0-live-baseline.ts --audit ~/.local/share/flynn/audit.log --source gateway --since 1772178440693 --until 1772178442694 --sample-out docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.jsonl --summary-json-out docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.json --summary-md-out docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.md + pnpm test:run src/audit/phase0LiveBaseline.test.ts src/audit/phase0BaselineSummary.test.ts + pnpm typecheck passing" }, "phase0-instrumentation-ticket-checklist": { "status": "completed", @@ -7288,8 +7312,8 @@ "deeper_surfaces_phase0_ticket_02": "completed — gateway + daemon routing emit run lifecycle/cancel telemetry and reaction match/skip audit events with filter summaries and cancellation latency, plus focused tests", "deeper_surfaces_phase0_ticket_03": "completed — gateway metrics now track run-state outcomes, cancel latency samples, and reaction decision counters with routing/gateway emitters", "deeper_surfaces_phase0_ticket_04": "completed — added phase-0 baseline summary tooling for run outcomes, cancel latency, and reaction decisions with markdown/json CLI output", - "deeper_surfaces_phase0_ticket_05": "completed — documented phase-0 telemetry fields/workflow, refreshed architecture/protocol docs, and generated anonymized live baseline artifacts from real channel audit traffic (probe-only artifact workflow superseded)", - "next_up": "Capture a gateway-origin live phase-0 baseline sample (including run.cancel/cancelled paths) and append as a second live artifact window alongside the channel sample", + "deeper_surfaces_phase0_ticket_05": "completed — documented phase-0 telemetry fields/workflow, refreshed architecture/protocol docs, and generated anonymized live baseline artifacts for both channel-origin and gateway-origin traffic (including cancel-path coverage)", + "next_up": "Phase-0 baseline windows now cover channel and gateway sources; keep both artifact windows refreshed on cadence before additional run-control/reaction semantic changes.", "pi_embedded_canary_spike": "completed — added optional pi_embedded backend adapter, canary-safe no-tools routing guard, backend success/fallback latency audit events, and docs/diagram updates while native remains default", "pi_embedded_evaluation_phase": "completed — final decision rollback (applied in runtime config): Window A failed latency/fallback gates (p50 +259ms, p95 +5695ms, fallback 25%, categories: pi_module_interface/empty_assistant_text); Window B remained sample-insufficient; controlled probes verified guard coverage (pi_no_tools_mode/capability_query/attachments_present each hit once)", "pi_embedded_manual_mode": "completed — added persisted runtime backend controls for manual Pi activation/deactivation (`/runtime` preferred, `/backend` alias; `status`, `activate pi`, `deactivate pi`, `use config`) while keeping config-driven default routing",