# Phase 0 Ticket Checklist: Baseline Instrumentation and Guardrails
Created: 2026-02-25
Owner: Flynn core
Status: ready to implement
Parent: `docs/plans/2026-02-25-deeper-end-user-surfaces-and-integrated-behavior-stack-plan.md`
## Goal
Establish a measurement baseline for deeper-surface and behavior-stack work before changing runtime semantics.
This checklist intentionally avoids behavior changes. It adds instrumentation and reporting only.
## PR Boundary
In scope:
1. Audit event coverage for run lifecycle, cancel intent/result, and reaction decision paths.
2. Gateway metrics support for cancel latency and run transition counters.
3. Baseline summary tooling for repeatable canary comparisons.
4. Documentation updates for new telemetry fields and collection workflow.
Out of scope:
1. Queue semantic changes (`interrupt`, `steer`, active preemption behavior).
2. New gateway event surfaces to clients (`run_state` comes in Phase 1).
3. Reactions behavior changes (priority/cooldown/stop semantics come in Phase 2).
4. Companion/canvas persistence behavior changes (Phase 3).
## Ticket 0.1 — Audit Schema Extension
Status: completed (2026-02-25)
### Scope
Add additive audit event types and payload contracts:
1. `run.state` (start, complete, cancel_requested, cancelled, error)
2. `run.cancel` (source, requested, acknowledged, latency_ms)
3. `reaction.match` (rule_name, channel, sender, filter_summary)
4. `reaction.skip` (reason category, candidate_count)
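
As a rough illustration of the payload contracts listed above, the four event types could be modelled as a discriminated union. This is a minimal sketch, not the actual contents of `src/audit/types.ts`: the `runId`/`sessionId` fields, the value types, and the helpers `serializeEvent` and `cancelEvent` are assumptions made for the example.

```typescript
// Hypothetical shapes: names inside the payloads follow the checklist,
// but identifier fields and value types are assumptions.
type RunStatePhase = "start" | "complete" | "cancel_requested" | "cancelled" | "error";

interface RunStateEvent {
  type: "run.state";
  runId: string; // assumed identifier field
  sessionId: string; // assumed identifier field
  state: RunStatePhase;
}

interface RunCancelEvent {
  type: "run.cancel";
  runId: string; // assumed identifier field
  source: string;
  requested: number; // assumed: epoch ms when the cancel was requested
  acknowledged?: number; // assumed: epoch ms when the cancel was acknowledged
  latency_ms?: number;
}

interface ReactionMatchEvent {
  type: "reaction.match";
  rule_name: string;
  channel: string;
  sender: string;
  filter_summary: string;
}

interface ReactionSkipEvent {
  type: "reaction.skip";
  reason: string;
  candidate_count: number;
}

type Phase0AuditEvent = RunStateEvent | RunCancelEvent | ReactionMatchEvent | ReactionSkipEvent;

// Serialize one event as a single JSON line with a timestamp field.
function serializeEvent(event: Phase0AuditEvent, timestamp: number): string {
  return JSON.stringify({ timestamp, ...event });
}

// Build a run.cancel payload; latency is only set once the ack time is known.
function cancelEvent(
  runId: string,
  source: string,
  requested: number,
  acknowledged?: number,
): RunCancelEvent {
  const base: RunCancelEvent = { type: "run.cancel", runId, source, requested };
  if (acknowledged === undefined) return base;
  return { ...base, acknowledged, latency_ms: Math.max(0, acknowledged - requested) };
}
```

Because every variant carries a distinct `type` literal, existing consumers that switch on known types can ignore the new ones, which is one way to keep the change additive.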
### Files
1. `src/audit/types.ts`
2. `src/audit/logger.ts`
3. `src/audit/logger.test.ts`
### Acceptance
1. All new events are additive and backward-compatible.
2. Audit logger exposes typed methods for each new event.
3. Unit tests cover serialization, and the path expansion tests still pass.
### Suggested commit message
`feat(audit): add run lifecycle and reaction decision event types`
## Ticket 0.2 — Gateway/Router Emitters for Baseline Events
Status: completed (2026-02-25)
### Scope
Emit new audit events without changing request handling behavior:
1. Gateway `agent.send` path:
   - emit run start/complete/error states,
   - emit cancel request/ack timing when `/stop` or a cancellation path is used.
2. Daemon routing path:
   - emit matching run lifecycle events for channel-origin sessions,
   - emit reaction match/skip events around `matchReactionPrompt(...)`.
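
One way to emit lifecycle events without changing request handling is to wrap the existing handler and swallow any emitter failure. The sketch below assumes a hypothetical `EmitFn` sink and a hypothetical `withRunLifecycle` wrapper; neither name comes from the Flynn codebase.

```typescript
type RunState = "start" | "complete" | "error";

// Hypothetical audit sink signature, assumed for this example.
type EmitFn = (event: { type: "run.state"; runId: string; state: RunState }) => void;

// Emit an event, dropping any failure so instrumentation cannot affect the reply.
function safeEmit(emit: EmitFn, runId: string, state: RunState): void {
  try {
    emit({ type: "run.state", runId, state });
  } catch {
    // Intentionally ignored: a broken audit sink must not alter request handling.
  }
}

// Wrap a handler so start/complete/error states are emitted around it.
// The handler's result and any thrown error pass through unchanged.
async function withRunLifecycle<T>(
  runId: string,
  emit: EmitFn,
  handler: () => Promise<T>,
): Promise<T> {
  safeEmit(emit, runId, "start");
  try {
    const result = await handler();
    safeEmit(emit, runId, "complete");
    return result;
  } catch (err) {
    safeEmit(emit, runId, "error");
    throw err; // rethrow the original error untouched
  }
}
```

Under these assumptions, a test can assert both the emitted states and that the wrapped handler's return value is identical with and without the emitter.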
### Files
1. `src/gateway/handlers/agent.ts`
2. `src/daemon/routing.ts`
3. `src/gateway/handlers/agent.test.ts`
4. `src/daemon/routing.test.ts`
### Acceptance
1. Event emission has no side effects on request handling and does not alter reply behavior.
2. Run IDs/session IDs are consistent with existing audit conventions.
3. Existing tests continue passing; new assertions verify emitted metadata.
### Suggested commit message
`feat(observability): emit run and reaction baseline audit events`
## Ticket 0.3 — Metrics Collector Baseline Counters
Status: completed (2026-02-25)
### Scope
Extend in-memory gateway metrics with baseline counters:
1. Run-state counters by outcome (`completed`, `cancelled`, `errored`).
2. Cancel-latency histogram buckets (or ring-buffer sample list).
3. Reaction decision counters (`matched`, `skipped`, per-reason).
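
The counters above could be kept in a small in-memory collector along the following lines. This is a sketch under assumptions: the class name `BaselineMetrics`, the snapshot shape, and the default sample cap of 256 are illustrative, and it uses the ring-buffer sample list rather than histogram buckets.

```typescript
type RunOutcome = "completed" | "cancelled" | "errored";

// Assumed snapshot shape; the real metrics snapshot may differ.
interface BaselineSnapshot {
  runs: Record<RunOutcome, number>;
  cancelLatencyMs: number[];
  reactions: { matched: number; skipped: number; skipReasons: Record<string, number> };
}

class BaselineMetrics {
  private runs: Record<RunOutcome, number> = { completed: 0, cancelled: 0, errored: 0 };
  private latencies: number[] = [];
  private matched = 0;
  private skipped = 0;
  private skipReasons: Record<string, number> = {};
  private readonly maxSamples: number;

  constructor(maxSamples = 256) {
    this.maxSamples = maxSamples;
  }

  recordRun(outcome: RunOutcome): void {
    this.runs[outcome] += 1;
  }

  // Keep only the most recent maxSamples latencies (bounded sample list).
  recordCancelLatency(ms: number): void {
    this.latencies.push(ms);
    if (this.latencies.length > this.maxSamples) this.latencies.shift();
  }

  recordReaction(decision: "matched" | "skipped", reason?: string): void {
    if (decision === "matched") {
      this.matched += 1;
      return;
    }
    this.skipped += 1;
    const key = reason ?? "unknown";
    this.skipReasons[key] = (this.skipReasons[key] ?? 0) + 1;
  }

  // Return copies so callers cannot mutate internal state.
  snapshot(): BaselineSnapshot {
    return {
      runs: { ...this.runs },
      cancelLatencyMs: [...this.latencies],
      reactions: {
        matched: this.matched,
        skipped: this.skipped,
        skipReasons: { ...this.skipReasons },
      },
    };
  }
}
```

Initialising every counter to zero means a fresh snapshot already has its full shape, which is one way to satisfy the stable-defaults requirement.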
### Files
1. `src/gateway/metrics.ts`
2. `src/gateway/metrics.test.ts`
3. `src/gateway/handlers/observability.ts` (if exposing additional metrics fields)
4. `src/gateway/handlers/observability.test.ts` (if needed)
### Acceptance
1. New metrics appear in snapshots with stable defaults.
2. No regression to existing dashboard/observability consumers.
3. Tests validate accumulation, reset behavior (if any), and shape compatibility.
### Suggested commit message
`feat(metrics): add baseline run and reaction counters for phase-0`
## Ticket 0.4 — Baseline Summary Tooling
Status: completed (2026-02-25)
### Scope
Add or extend report tooling to summarize phase-0 telemetry slices:
1. Run completion/error/cancel rates by channel/session.
2. Cancel latency p50/p95.
3. Reaction match/skip rates and top skip reasons.
4. Optional markdown + json output for plan artifacts.
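
The summary slices above reduce to a few rates and percentiles over audit records. The sketch below is illustrative only: `AuditRecord`, `percentile`, and `summarize` are hypothetical names, percentiles use the nearest-rank method (an assumption, since the checklist does not specify one), and the per-channel/session grouping is omitted for brevity.

```typescript
// Assumed minimal record shape; every field is optional so older logs still parse.
interface AuditRecord {
  type?: string;
  state?: string;
  latency_ms?: number;
  reason?: string;
}

// Nearest-rank percentile; returns null when there are no samples.
function percentile(samples: number[], p: number): number | null {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] ?? null;
}

function rate(count: number, total: number): number | null {
  return total === 0 ? null : count / total;
}

function summarize(records: AuditRecord[]) {
  const runs = { complete: 0, error: 0, cancelled: 0 };
  const latencies: number[] = [];
  const reasons = new Map<string, number>();
  let matched = 0;
  let skipped = 0;

  for (const r of records) {
    if (r.type === "run.state") {
      if (r.state === "complete") runs.complete += 1;
      else if (r.state === "error") runs.error += 1;
      else if (r.state === "cancelled") runs.cancelled += 1;
    } else if (r.type === "run.cancel") {
      // Records without a numeric latency are skipped rather than treated as zero.
      if (typeof r.latency_ms === "number") latencies.push(r.latency_ms);
    } else if (r.type === "reaction.match") {
      matched += 1;
    } else if (r.type === "reaction.skip") {
      skipped += 1;
      const key = r.reason ?? "unknown";
      reasons.set(key, (reasons.get(key) ?? 0) + 1);
    }
  }

  const finished = runs.complete + runs.error + runs.cancelled;
  // Sort by count descending, then by name, so output order is deterministic.
  const topSkipReasons = [...reasons.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([reason, count]) => ({ reason, count }));

  return {
    completionRate: rate(runs.complete, finished),
    errorRate: rate(runs.error, finished),
    cancelRate: rate(runs.cancelled, finished),
    cancelLatencyP50: percentile(latencies, 50),
    cancelLatencyP95: percentile(latencies, 95),
    reactionMatchRate: rate(matched, matched + skipped),
    topSkipReasons,
  };
}
```

Returning `null` for empty slices, instead of `0` or `NaN`, keeps logs that predate the new event types distinguishable from logs where the measured rate really was zero.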
### Files
1. `src/audit/backendCanarySummary.ts` (extend or split shared summarizer helpers)
2. `src/audit/backendCanarySummary.test.ts`
3. `scripts/summarize-backend-canary.ts` (or new script if cleaner)
4. `package.json` (script entry if needed)
### Acceptance
1. Tool works on current audit logs without requiring new event types.
2. New event types are incorporated when present.
3. Output is deterministic and safe on missing fields.
### Suggested commit message
`feat(audit): add phase-0 baseline summary metrics for runs and reactions`
## Ticket 0.5 — Docs + Diagram + State Sync
Status: completed (2026-02-25)
### Scope
Document new observability fields and baseline workflow:
1. Protocol/observability notes (if gateway payloads changed).
2. Architecture docs review for run lifecycle observability path.
3. Plan/state updates with executed commands and artifacts.
### Files
1. `README.md`
2. `docs/api/PROTOCOL.md`
3. `docs/architecture/AGENT_DIAGRAM.md`
4. `docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md`
5. `docs/plans/state.json`
6. `docs/plans/artifacts/*` (baseline outputs)
### Acceptance
1. Diagram review is explicitly documented.
2. `state.json` includes ticket completion/test status.
3. Artifact links are present and reproducible.
### Suggested commit message
`docs(observability): document phase-0 telemetry and baseline workflow`
## Validation Commands (per completed ticket set)
Run focused suites first, then full validation when the phase is complete:
```bash
pnpm typecheck
pnpm test:run src/audit/logger.test.ts
pnpm test:run src/gateway/metrics.test.ts
pnpm test:run src/gateway/handlers/agent.test.ts
pnpm test:run src/daemon/routing.test.ts
pnpm test:run src/audit/backendCanarySummary.test.ts
pnpm test:run src/gateway/handlers/observability.test.ts
pnpm test:run
pnpm lint
pnpm build
```
## Rollout and Exit Criteria
Phase 0 is complete when:
1. Baseline metrics are emitted for at least one representative channel session and one gateway session.
2. A baseline summary artifact is generated and committed under `docs/plans/artifacts/`.
3. No user-visible response behavior has changed compared to the pre-phase baseline.
## Subagent Model Assignment Plan
1. `claude-haiku-4.5`:
   - schema additions, mechanical logger wiring, docs consistency edits.
2. `claude-sonnet-4.6`:
   - gateway/router instrumentation and metrics implementation.
3. `claude-opus-4.6`:
   - event taxonomy review, baseline gate design, failure-mode coverage review.