diff --git a/README.md b/README.md
index 6d9aefc..60dffcf 100644
--- a/README.md
+++ b/README.md
@@ -1494,6 +1494,16 @@ Session lifecycle now includes proactive context maintenance events:
 - `session.checkpoint` when proactive checkpoint summaries are written to memory
 - `session.auto_compact` when proactive critical-threshold auto-compaction runs
 
+Phase 0 baseline observability adds:
+- `run.state` (start/complete/cancel_requested/cancelled/error)
+- `run.cancel` (cancel intent/ack + latency)
+- `reaction.match` / `reaction.skip` (reaction decision outcomes + skip reasons)
+
+Baseline summaries can be generated from audit logs:
+```bash
+pnpm audit:phase0-baseline --audit ~/.local/share/flynn/audit.log --since 2026-02-25T00:00:00Z --format markdown
+```
+
 ## Gateway Lock
 
 Single-client mode for the WebSocket gateway. When enabled, only one WebSocket connection is allowed at a time. Additional connections are rejected with close code `4003`.
diff --git a/docs/api/PROTOCOL.md b/docs/api/PROTOCOL.md
index bc80c2f..1f11c6c 100644
--- a/docs/api/PROTOCOL.md
+++ b/docs/api/PROTOCOL.md
@@ -352,6 +352,59 @@ Useful for proactive compaction monitoring and operator dashboards.
 }
 ```
 
+#### `system.metrics`
+
+Return aggregated gateway metrics snapshot (used by the dashboard).
+
+Includes run-state counters, cancel latency samples, and reaction decision counters.
+
+**Request:**
+```json
+{
+  "id": 11,
+  "method": "system.metrics"
+}
+```
+
+**Response:**
+```json
+{
+  "id": 11,
+  "result": {
+    "messagesProcessed": 120,
+    "errors": 2,
+    "activeRequests": 1,
+    "uptime": 3600,
+    "modelCalls": {
+      "total": 15,
+      "avgLatency": 420,
+      "errorRate": 0.07,
+      "recentCalls": []
+    },
+    "runStates": {
+      "start": 25,
+      "complete": 22,
+      "cancel_requested": 1,
+      "cancelled": 1,
+      "error": 1
+    },
+    "cancelLatencyMs": {
+      "sampleCount": 4,
+      "samples": [120, 240, 310, 95]
+    },
+    "reactions": {
+      "matched": 6,
+      "skipped": 3,
+      "skipReasons": {
+        "no_match": 2,
+        "no_rules": 1
+      }
+    },
+    "queueDepth": 0
+  }
+}
+```
+
 #### `system.localBackends`
 
 Return status for user-level local LLM backend daemons (for example `ollama.service` and `llama-server.service`).
diff --git a/docs/architecture/AGENT_DIAGRAM.md b/docs/architecture/AGENT_DIAGRAM.md
index b1b5062..6a11534 100644
--- a/docs/architecture/AGENT_DIAGRAM.md
+++ b/docs/architecture/AGENT_DIAGRAM.md
@@ -27,6 +27,7 @@ flowchart LR
   subgraph HOST[Host (Flynn Daemon)]
     CA[ChannelAdapters]
     GW[Gateway\nHTTP + WS JSON-RPC + Web UI]
+    MC[MetricsCollector\nrun state + cancel latency + reactions]
     RT[Routing\ncreateMessageRouter()]
     PF[Preferences\n~/.local/share/flynn/preferences.json\nmodelTier + backendMode]
     SM[SessionManager\nSQLite]
@@ -60,6 +61,7 @@ flowchart LR
 
   CH --> CA
   GW --> RT
+  GW --> MC
   CA --> RT
   RT --> SM
   RT --> OR
@@ -147,6 +149,7 @@ Key files:
 - Model routing: `src/models/router.ts`
 - Tool policy + execution: `src/tools/policy.ts`, `src/tools/executor.ts`
 - Canary backend telemetry summarization (offline evaluation): `src/audit/backendCanarySummary.ts`, `scripts/summarize-backend-canary.ts`
+- Phase 0 baseline telemetry summarization: `src/audit/phase0BaselineSummary.ts`, `scripts/summarize-phase0-baseline.ts`
 
 ## Component Graph (Agent-Safety Boundary)
 
diff --git a/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md b/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md
index a2f208d..5dc2228 100644
--- a/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md
+++ b/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md
@@ -14,6 +14,7 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`.
 - Runtime backend mode can be overridden manually via `/runtime` command fast-path (`status`, `activate pi`, `deactivate pi`, `use config`) and is persisted in preferences (`/backend` remains a compatibility alias).
 - `flynn tui` now attaches to this same gateway command path for `/runtime ...` and auto-starts/attaches daemon+gateway when needed.
 - Backend routing outcomes are auditable via `backend.route` / `backend.success` / `backend.fallback`, which enables offline canary evaluation without changing gateway protocol methods.
+- Run lifecycle/cancel intent and reaction decisions are emitted to audit logs, and aggregated into `system.metrics` counters (runStates, cancelLatencyMs, reactions) for dashboards.
 
 ## Component Map
 
@@ -31,6 +32,8 @@ flowchart LR
     LQ[LaneQueue\nper-session FIFO]
     SB[SessionBridge\nconnectionId -> sessionId -> AgentOrchestrator]
     AQ[AuditLogger\nqueue.preempt events]
+    MC[MetricsCollector\nrun states + cancel latency + reactions]
+    UI[Web UI Dashboard]
   end
 
   subgraph CORE[Flynn Core]
@@ -46,6 +49,8 @@ flowchart LR
   GS --> LQ
   GS --> SB
   LQ --> AQ
+  GS --> MC
+  MC --> UI
   SB --> AO
   SB --> SM
 
diff --git a/docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md b/docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md
index 36b00c8..78c39eb 100644
--- a/docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md
+++ b/docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md
@@ -149,6 +149,8 @@ Add or extend report tooling to summarize phase-0 telemetry slices:
 
 ## Ticket 0.5 — Docs + Diagram + State Sync
 
+Status: completed (2026-02-25)
+
 ### Scope
 
 Document new observability fields and baseline workflow:
diff --git a/docs/plans/artifacts/phase0_baseline_2026-02-25.json b/docs/plans/artifacts/phase0_baseline_2026-02-25.json
new file mode 100644
index 0000000..11a5728
--- /dev/null
+++ b/docs/plans/artifacts/phase0_baseline_2026-02-25.json
@@ -0,0 +1,44 @@
+{
+  "generated_at": "2026-02-25T17:20:35.391Z",
+  "event_count": 0,
+  "filters": {
+    "since_ms": 1771977600000
+  },
+  "options": {
+    "maxSessions": 20,
+    "maxChannels": 20,
+    "maxSkipReasons": 10
+  },
+  "summary": {
+    "event_counts": {
+      "run_state": 0,
+      "run_cancel": 0,
+      "reaction_match": 0,
+      "reaction_skip": 0
+    },
+    "run_outcomes": {
+      "overall": {
+        "total_outcomes": 0,
+        "complete": 0,
+        "cancelled": 0,
+        "error": 0,
+        "cancel_requested": 0,
+        "start": 0,
+        "completion_rate_pct": null,
+        "cancel_rate_pct": null,
+        "error_rate_pct": null
+      },
+      "by_channel": [],
+      "by_session": []
+    },
+    "cancel_latency_ms": null,
+    "reactions": {
+      "matched": 0,
+      "skipped": 0,
+      "total": 0,
+      "match_rate_pct": null,
+      "skip_rate_pct": null,
+      "skip_reasons": []
+    }
+  }
+}
diff --git a/docs/plans/artifacts/phase0_baseline_2026-02-25.md b/docs/plans/artifacts/phase0_baseline_2026-02-25.md
new file mode 100644
index 0000000..d4f7fc0
--- /dev/null
+++ b/docs/plans/artifacts/phase0_baseline_2026-02-25.md
@@ -0,0 +1,43 @@
+# Phase 0 Baseline Telemetry Summary
+
+- Run state events: 0
+- Run cancel events: 0
+- Reaction matches: 0
+- Reaction skips: 0
+
+## Run Outcomes (Overall)
+
+- Total outcomes: 0
+- Complete: 0 (n/a)
+- Cancelled: 0 (n/a)
+- Errors: 0 (n/a)
+- Cancel requested: 0
+- Starts: 0
+
+## Run Outcomes by Channel
+
+| Channel | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts |
+| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
+| _none_ | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 0 |
+
+## Run Outcomes by Session
+
+| Session | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts |
+| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
+| _none_ | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 0 |
+
+## Cancel Latency
+
+- No cancel latency samples.
+
+## Reaction Decisions
+
+- Matched: 0 (n/a)
+- Skipped: 0 (n/a)
+
+### Skip Reasons
+
+| Reason | Count | Percent |
+| --- | ---: | ---: |
+| _none_ | 0 | 0.00% |
+
diff --git a/docs/plans/state.json b/docs/plans/state.json
index 6ae00aa..d63fd8d 100644
--- a/docs/plans/state.json
+++ b/docs/plans/state.json
@@ -63,6 +63,23 @@
       ],
       "test_status": "pnpm test:run src/audit/phase0BaselineSummary.test.ts passing"
     },
+    "phase0-ticket-0.5-docs-diagram-state-sync": {
+      "status": "completed",
+      "date": "2026-02-25",
+      "updated": "2026-02-25",
+      "summary": "Updated protocol/docs/diagrams for phase-0 telemetry fields, documented baseline workflow, and generated initial phase-0 baseline artifacts (empty sample window).",
+      "files_modified": [
+        "README.md",
+        "docs/api/PROTOCOL.md",
+        "docs/architecture/AGENT_DIAGRAM.md",
+        "docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md",
+        "docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md",
+        "docs/plans/artifacts/phase0_baseline_2026-02-25.md",
+        "docs/plans/artifacts/phase0_baseline_2026-02-25.json",
+        "docs/plans/state.json"
+      ],
+      "test_status": "pnpm audit:phase0-baseline --audit ~/.local/share/flynn/audit.log --since 2026-02-25T00:00:00Z --format markdown/json (0 events in window)"
+    },
     "phase0-instrumentation-ticket-checklist": {
       "status": "completed",
       "date": "2026-02-25",
@@ -6677,7 +6694,8 @@
     "deeper_surfaces_phase0_ticket_02": "completed — gateway + daemon routing emit run lifecycle/cancel telemetry and reaction match/skip audit events with filter summaries and cancellation latency, plus focused tests",
     "deeper_surfaces_phase0_ticket_03": "completed — gateway metrics now track run-state outcomes, cancel latency samples, and reaction decision counters with routing/gateway emitters",
     "deeper_surfaces_phase0_ticket_04": "completed — added phase-0 baseline summary tooling for run outcomes, cancel latency, and reaction decisions with markdown/json CLI output",
-    "next_up": "Implement Ticket 0.5 from docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md",
+    "deeper_surfaces_phase0_ticket_05": "completed — documented phase-0 telemetry fields/workflow, refreshed architecture/protocol docs, and generated baseline artifacts",
+    "next_up": "Exercise gateway + channel sessions to emit run/reaction events, then regenerate phase-0 baseline artifacts with real samples",
     "pi_embedded_canary_spike": "completed — added optional pi_embedded backend adapter, canary-safe no-tools routing guard, backend success/fallback latency audit events, and docs/diagram updates while native remains default",
     "pi_embedded_evaluation_phase": "completed — final decision rollback (applied in runtime config): Window A failed latency/fallback gates (p50 +259ms, p95 +5695ms, fallback 25%, categories: pi_module_interface/empty_assistant_text); Window B remained sample-insufficient; controlled probes verified guard coverage (pi_no_tools_mode/capability_query/attachments_present each hit once)",
     "pi_embedded_manual_mode": "completed — added persisted runtime backend controls for manual Pi activation/deactivation (`/runtime` preferred, `/backend` alias; `status`, `activate pi`, `deactivate pi`, `use config`) while keeping config-driven default routing",
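Reviewer note (not part of this change): a minimal TypeScript sketch of how a dashboard client might turn the new `system.metrics` fields into the same kind of rates the baseline artifacts report. It assumes the field names from the `system.metrics` example response in `docs/api/PROTOCOL.md`; the type and function names (`Phase0Metrics`, `digestPhase0Metrics`, `pct`, `median`) are hypothetical, and using `complete + cancelled + error` as the outcome denominator is an assumption modeled on the artifact's `total_outcomes` breakdown, not confirmed behavior of `src/audit/phase0BaselineSummary.ts`.

```typescript
// Hypothetical types for the phase-0 fields of a `system.metrics` result,
// inferred from the example response in docs/api/PROTOCOL.md (not an exported API).
interface RunStateCounters {
  start: number;
  complete: number;
  cancel_requested: number;
  cancelled: number;
  error: number;
}

interface Phase0Metrics {
  runStates: RunStateCounters;
  cancelLatencyMs: { sampleCount: number; samples: number[] };
  reactions: { matched: number; skipped: number; skipReasons: Record<string, number> };
}

interface Phase0Digest {
  completionRatePct: number | null;
  cancelRatePct: number | null;
  errorRatePct: number | null;
  cancelLatencyMedianMs: number | null;
  cancelLatencyMaxMs: number | null;
  reactionMatchRatePct: number | null;
}

// Percentage of `part` in `total`; null when the window is empty (rendered as "n/a").
function pct(part: number, total: number): number | null {
  return total === 0 ? null : (part / total) * 100;
}

// Median of the latency samples (mean of the two middle values for an even count).
function median(samples: number[]): number | null {
  if (samples.length === 0) return null;
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

function digestPhase0Metrics(m: Phase0Metrics): Phase0Digest {
  const r = m.runStates;
  // Assumption: only terminal states count as outcomes; `start` and
  // `cancel_requested` are intermediate and excluded from the denominator.
  const outcomes = r.complete + r.cancelled + r.error;
  const reactionTotal = m.reactions.matched + m.reactions.skipped;
  const samples = m.cancelLatencyMs.samples;
  return {
    completionRatePct: pct(r.complete, outcomes),
    cancelRatePct: pct(r.cancelled, outcomes),
    errorRatePct: pct(r.error, outcomes),
    cancelLatencyMedianMs: median(samples),
    cancelLatencyMaxMs: samples.length === 0 ? null : Math.max(...samples),
    reactionMatchRatePct: pct(m.reactions.matched, reactionTotal),
  };
}

// Values copied from the example `system.metrics` response.
const example: Phase0Metrics = {
  runStates: { start: 25, complete: 22, cancel_requested: 1, cancelled: 1, error: 1 },
  cancelLatencyMs: { sampleCount: 4, samples: [120, 240, 310, 95] },
  reactions: { matched: 6, skipped: 3, skipReasons: { no_match: 2, no_rules: 1 } },
};

console.log(digestPhase0Metrics(example));
```

With the example response this yields a completion rate of about 91.7% (22 of 24 terminal outcomes), a median cancel latency of 180 ms, and a reaction match rate of about 66.7%. For an empty window every rate comes back as `null`, which a renderer can show as `n/a` like the committed baseline artifacts. The sketch reads only `samples`, since whether `sampleCount` can exceed the retained sample list is not stated in this change.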