docs(observability): document phase-0 telemetry and baseline workflow

2026-02-25 09:22:56 -08:00
parent 0b8f7c7299
commit 8b5266c66c
8 changed files with 179 additions and 1 deletions
@@ -1494,6 +1494,16 @@ Session lifecycle now includes proactive context maintenance events:
 - `session.checkpoint` when proactive checkpoint summaries are written to memory
 - `session.auto_compact` when proactive critical-threshold auto-compaction runs

+Phase 0 baseline observability adds:
+- `run.state` (start/complete/cancel_requested/cancelled/error)
+- `run.cancel` (cancel intent/ack + latency)
+- `reaction.match` / `reaction.skip` (reaction decision outcomes + skip reasons)
+
+Baseline summaries can be generated from audit logs:
+```bash
+pnpm audit:phase0-baseline --audit ~/.local/share/flynn/audit.log --since 2026-02-25T00:00:00Z --format markdown
+```
+
 ## Gateway Lock

 Single-client mode for the WebSocket gateway. When enabled, only one WebSocket connection is allowed at a time. Additional connections are rejected with close code `4003`.
@@ -352,6 +352,59 @@ Useful for proactive compaction monitoring and operator dashboards.
 }
 ```

+#### `system.metrics`
+
+Return an aggregated snapshot of gateway metrics (used by the dashboard).
+
+The snapshot includes run-state counters, cancel latency samples, and reaction decision counters.
+
+**Request:**
+```json
+{
+  "id": 11,
+  "method": "system.metrics"
+}
+```
+
+**Response:**
+```json
+{
+  "id": 11,
+  "result": {
+    "messagesProcessed": 120,
+    "errors": 2,
+    "activeRequests": 1,
+    "uptime": 3600,
+    "modelCalls": {
+      "total": 15,
+      "avgLatency": 420,
+      "errorRate": 0.07,
+      "recentCalls": []
+    },
+    "runStates": {
+      "start": 25,
+      "complete": 22,
+      "cancel_requested": 1,
+      "cancelled": 1,
+      "error": 1
+    },
+    "cancelLatencyMs": {
+      "sampleCount": 4,
+      "samples": [120, 240, 310, 95]
+    },
+    "reactions": {
+      "matched": 6,
+      "skipped": 3,
+      "skipReasons": {
+        "no_match": 2,
+        "no_rules": 1
+      }
+    },
+    "queueDepth": 0
+  }
+}
+```
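As a rough illustration of how a dashboard might consume the Phase 0 fields in this response, the sketch below derives a completion rate, a reaction match rate, and cancel-latency percentiles from the sample values. The `Phase0Metrics` interface mirrors only the fields shown in the sample. The helpers `ratePct`, `percentile`, and `summarize` are hypothetical names, and treating "outcomes" as `complete + cancelled + error` is an assumption, not something the protocol doc states.

```typescript
// Subset of the `system.metrics` result shown in the sample response.
interface Phase0Metrics {
  runStates: {
    start: number;
    complete: number;
    cancel_requested: number;
    cancelled: number;
    error: number;
  };
  cancelLatencyMs: { sampleCount: number; samples: number[] };
  reactions: {
    matched: number;
    skipped: number;
    skipReasons: Record<string, number>;
  };
}

// Hypothetical helper: percentage rounded to two decimals,
// or null when there is nothing to divide by.
function ratePct(count: number, total: number): number | null {
  return total === 0 ? null : Math.round((count / total) * 10000) / 100;
}

// Hypothetical helper: nearest-rank percentile over the raw samples.
function percentile(samples: number[], p: number): number | null {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Assumption: terminal outcomes are complete + cancelled + error.
function summarize(m: Phase0Metrics) {
  const outcomes = m.runStates.complete + m.runStates.cancelled + m.runStates.error;
  const decisions = m.reactions.matched + m.reactions.skipped;
  return {
    outcomes,
    completionRatePct: ratePct(m.runStates.complete, outcomes),
    reactionMatchRatePct: ratePct(m.reactions.matched, decisions),
    cancelP50Ms: percentile(m.cancelLatencyMs.samples, 50),
    cancelMaxMs: percentile(m.cancelLatencyMs.samples, 100),
  };
}

// Values copied from the sample response.
const sample: Phase0Metrics = {
  runStates: { start: 25, complete: 22, cancel_requested: 1, cancelled: 1, error: 1 },
  cancelLatencyMs: { sampleCount: 4, samples: [120, 240, 310, 95] },
  reactions: { matched: 6, skipped: 3, skipReasons: { no_match: 2, no_rules: 1 } },
};

const summary = summarize(sample);
console.log(summary);
```

With the sample values, 24 of the 25 started runs have reached a terminal outcome, which is consistent with `activeRequests: 1` in the same snapshot.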
+
 #### `system.localBackends`

 Return status for user-level local LLM backend daemons (for example `ollama.service` and `llama-server.service`).
@@ -27,6 +27,7 @@ flowchart LR
  subgraph HOST[Host (Flynn Daemon)]
    CA[ChannelAdapters]
    GW[Gateway\nHTTP + WS JSON-RPC + Web UI]
+    MC[MetricsCollector\nrun state + cancel latency + reactions]
    RT[Routing\ncreateMessageRouter()]
    PF[Preferences\n~/.local/share/flynn/preferences.json\nmodelTier + backendMode]
    SM[SessionManager\nSQLite]
@@ -60,6 +61,7 @@ flowchart LR

  CH --> CA
  GW --> RT
+  GW --> MC
  CA --> RT
  RT --> SM
  RT --> OR
@@ -147,6 +149,7 @@ Key files:
 - Model routing: `src/models/router.ts`
 - Tool policy + execution: `src/tools/policy.ts`, `src/tools/executor.ts`
 - Canary backend telemetry summarization (offline evaluation): `src/audit/backendCanarySummary.ts`, `scripts/summarize-backend-canary.ts`
+- Phase 0 baseline telemetry summarization: `src/audit/phase0BaselineSummary.ts`, `scripts/summarize-phase0-baseline.ts`

 ## Component Graph (Agent-Safety Boundary)

@@ -14,6 +14,7 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`.
 - Runtime backend mode can be overridden manually via `/runtime` command fast-path (`status`, `activate pi`, `deactivate pi`, `use config`) and is persisted in preferences (`/backend` remains a compatibility alias).
 - `flynn tui` now attaches to this same gateway command path for `/runtime ...` and auto-starts/attaches daemon+gateway when needed.
 - Backend routing outcomes are auditable via `backend.route` / `backend.success` / `backend.fallback`, which enables offline canary evaluation without changing gateway protocol methods.
+- Run lifecycle events, cancel intent, and reaction decisions are emitted to audit logs and aggregated into the `system.metrics` fields `runStates`, `cancelLatencyMs`, and `reactions` for dashboards.

 ## Component Map

@@ -31,6 +32,8 @@ flowchart LR
    LQ[LaneQueue\nper-session FIFO]
    SB[SessionBridge\nconnectionId -> sessionId -> AgentOrchestrator]
    AQ[AuditLogger\nqueue.preempt events]
+    MC[MetricsCollector\nrun states + cancel latency + reactions]
+    UI[Web UI Dashboard]
  end

  subgraph CORE[Flynn Core]
@@ -46,6 +49,8 @@ flowchart LR
  GS --> LQ
  GS --> SB
  LQ --> AQ
+  GS --> MC
+  MC --> UI

  SB --> AO
  SB --> SM
@@ -149,6 +149,8 @@ Add or extend report tooling to summarize phase-0 telemetry slices:

 ## Ticket 0.5 — Docs + Diagram + State Sync

+Status: completed (2026-02-25)
+
 ### Scope

 Document new observability fields and baseline workflow:
@@ -0,0 +1,44 @@
+{
+  "generated_at": "2026-02-25T17:20:35.391Z",
+  "event_count": 0,
+  "filters": {
+    "since_ms": 1771977600000
+  },
+  "options": {
+    "maxSessions": 20,
+    "maxChannels": 20,
+    "maxSkipReasons": 10
+  },
+  "summary": {
+    "event_counts": {
+      "run_state": 0,
+      "run_cancel": 0,
+      "reaction_match": 0,
+      "reaction_skip": 0
+    },
+    "run_outcomes": {
+      "overall": {
+        "total_outcomes": 0,
+        "complete": 0,
+        "cancelled": 0,
+        "error": 0,
+        "cancel_requested": 0,
+        "start": 0,
+        "completion_rate_pct": null,
+        "cancel_rate_pct": null,
+        "error_rate_pct": null
+      },
+      "by_channel": [],
+      "by_session": []
+    },
+    "cancel_latency_ms": null,
+    "reactions": {
+      "matched": 0,
+      "skipped": 0,
+      "total": 0,
+      "match_rate_pct": null,
+      "skip_rate_pct": null,
+      "skip_reasons": []
+    }
+  }
+}
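For readers checking this artifact by hand, the sketch below shows how the `since_ms` filter relates to the `--since 2026-02-25T00:00:00Z` argument and how an empty window produces the all-zero `event_counts` above. The `AuditEvent` shape (`event`, `ts`) and the `countPhase0Events` helper are hypothetical; the real audit log schema and the logic in `src/audit/phase0BaselineSummary.ts` are not shown in this change. Treating the `since` bound as inclusive is also an assumption.

```typescript
// Phase 0 audit event names, as listed in the README change.
const PHASE0_EVENTS = new Set<string>([
  "run.state",
  "run.cancel",
  "reaction.match",
  "reaction.skip",
]);

// Hypothetical minimal event shape: event name plus epoch-millisecond timestamp.
interface AuditEvent {
  event: string;
  ts: number;
}

// Convert the ISO-8601 `--since` value into epoch milliseconds.
const sinceMs = Date.parse("2026-02-25T00:00:00Z");
console.log(sinceMs); // 1771977600000, matching filters.since_ms

// Count Phase 0 events at or after `since`, keyed like `summary.event_counts`
// (for example `run.state` becomes `run_state`).
function countPhase0Events(events: AuditEvent[], since: number): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const name of PHASE0_EVENTS) counts[name.replace(".", "_")] = 0;
  for (const e of events) {
    if (e.ts >= since && PHASE0_EVENTS.has(e.event)) {
      counts[e.event.replace(".", "_")] += 1;
    }
  }
  return counts;
}

// An empty window yields zero for every counter, as in this artifact.
const emptyCounts = countPhase0Events([], sinceMs);
console.log(emptyCounts);

// Illustrative (made-up) events: one before the window, two inside it,
// and one unrelated event type that is ignored.
const exampleCounts = countPhase0Events(
  [
    { event: "run.state", ts: sinceMs - 1 },
    { event: "run.state", ts: sinceMs },
    { event: "reaction.skip", ts: sinceMs + 5000 },
    { event: "backend.route", ts: sinceMs + 6000 },
  ],
  sinceMs,
);
console.log(exampleCounts);
```

Rates in the artifact are `null` rather than `0` because every denominator is zero in an empty window; a regenerated baseline with real samples should replace them with numeric percentages.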
@@ -0,0 +1,43 @@
+# Phase 0 Baseline Telemetry Summary
+
+- Run state events: 0
+- Run cancel events: 0
+- Reaction matches: 0
+- Reaction skips: 0
+
+## Run Outcomes (Overall)
+
+- Total outcomes: 0
+- Complete: 0 (n/a)
+- Cancelled: 0 (n/a)
+- Errors: 0 (n/a)
+- Cancel requested: 0
+- Starts: 0
+
+## Run Outcomes by Channel
+
+| Channel | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts |
+| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
+| _none_ | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 0 |
+
+## Run Outcomes by Session
+
+| Session | Outcomes | Complete | Cancelled | Error | Complete % | Cancel % | Error % | Cancel Req | Starts |
+| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
+| _none_ | 0 | 0 | 0 | 0 | n/a | n/a | n/a | 0 | 0 |
+
+## Cancel Latency
+
+- No cancel latency samples.
+
+## Reaction Decisions
+
+- Matched: 0 (n/a)
+- Skipped: 0 (n/a)
+
+### Skip Reasons
+
+| Reason | Count | Percent |
+| --- | ---: | ---: |
+| _none_ | 0 | 0.00% |
+
@@ -63,6 +63,23 @@
      ],
      "test_status": "pnpm test:run src/audit/phase0BaselineSummary.test.ts passing"
    },
+    "phase0-ticket-0.5-docs-diagram-state-sync": {
+      "status": "completed",
+      "date": "2026-02-25",
+      "updated": "2026-02-25",
+      "summary": "Updated protocol/docs/diagrams for phase-0 telemetry fields, documented baseline workflow, and generated initial phase-0 baseline artifacts (empty sample window).",
+      "files_modified": [
+        "README.md",
+        "docs/api/PROTOCOL.md",
+        "docs/architecture/AGENT_DIAGRAM.md",
+        "docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md",
+        "docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md",
+        "docs/plans/artifacts/phase0_baseline_2026-02-25.md",
+        "docs/plans/artifacts/phase0_baseline_2026-02-25.json",
+        "docs/plans/state.json"
+      ],
+      "test_status": "pnpm audit:phase0-baseline --audit ~/.local/share/flynn/audit.log --since 2026-02-25T00:00:00Z --format markdown/json (0 events in window)"
+    },
    "phase0-instrumentation-ticket-checklist": {
      "status": "completed",
      "date": "2026-02-25",
@@ -6677,7 +6694,8 @@
    "deeper_surfaces_phase0_ticket_02": "completed — gateway + daemon routing emit run lifecycle/cancel telemetry and reaction match/skip audit events with filter summaries and cancellation latency, plus focused tests",
    "deeper_surfaces_phase0_ticket_03": "completed — gateway metrics now track run-state outcomes, cancel latency samples, and reaction decision counters with routing/gateway emitters",
    "deeper_surfaces_phase0_ticket_04": "completed — added phase-0 baseline summary tooling for run outcomes, cancel latency, and reaction decisions with markdown/json CLI output",
-    "next_up": "Implement Ticket 0.5 from docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md",
+    "deeper_surfaces_phase0_ticket_05": "completed — documented phase-0 telemetry fields/workflow, refreshed architecture/protocol docs, and generated baseline artifacts",
+    "next_up": "Exercise gateway + channel sessions to emit run/reaction events, then regenerate phase-0 baseline artifacts with real samples",
    "pi_embedded_canary_spike": "completed — added optional pi_embedded backend adapter, canary-safe no-tools routing guard, backend success/fallback latency audit events, and docs/diagram updates while native remains default",
    "pi_embedded_evaluation_phase": "completed — final decision rollback (applied in runtime config): Window A failed latency/fallback gates (p50 +259ms, p95 +5695ms, fallback 25%, categories: pi_module_interface/empty_assistant_text); Window B remained sample-insufficient; controlled probes verified guard coverage (pi_no_tools_mode/capability_query/attachments_present each hit once)",
    "pi_embedded_manual_mode": "completed — added persisted runtime backend controls for manual Pi activation/deactivation (`/runtime` preferred, `/backend` alias; `status`, `activate pi`, `deactivate pi`, `use config`) while keeping config-driven default routing",