Phase 0 Ticket Checklist: Baseline Instrumentation and Guardrails
Created: 2026-02-25
Owner: Flynn core
Status: ready to implement
Parent: docs/plans/2026-02-25-deeper-end-user-surfaces-and-integrated-behavior-stack-plan.md
Goal
Establish a measurement baseline for deeper-surface and behavior-stack work before changing runtime semantics.
This checklist intentionally avoids behavior changes. It adds instrumentation and reporting only.
PR Boundary
In scope:
- Audit event coverage for run lifecycle, cancel intent/result, and reaction decision paths.
- Gateway metrics support for cancel latency and run transition counters.
- Baseline summary tooling for repeatable canary comparisons.
- Documentation updates for new telemetry fields and collection workflow.
Out of scope:
- Queue semantic changes (interrupt, steer, active preemption behavior).
- New gateway event surfaces to clients (run_state comes in Phase 1).
- Reactions behavior changes (priority/cooldown/stop semantics come in Phase 2).
- Companion/canvas persistence behavior changes (Phase 3).
Ticket 0.1 — Audit Schema Extension
Status: completed (2026-02-25)
Scope
Add additive audit event types and payload contracts:
- run.state (start, complete, cancel_requested, cancelled, error)
- run.cancel (source, requested, acknowledged, latency_ms)
- reaction.match (rule_name, channel, sender, filter_summary)
- reaction.skip (reason category, candidate_count)
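The four event contracts above could be modelled as a discriminated union, so that adding a variant stays additive for existing consumers. This is a minimal sketch, not the contents of src/audit/types.ts: the field names come from the ticket scope, while the field types, the `BaselineAuditEvent` name, and the `serializeAuditEvent` helper are assumptions made for illustration.

```typescript
// Hypothetical payload contracts: field names follow the ticket scope,
// but every field type below is an assumption.
type RunState = "start" | "complete" | "cancel_requested" | "cancelled" | "error";

interface RunStateEvent {
  type: "run.state";
  state: RunState;
}

interface RunCancelEvent {
  type: "run.cancel";
  source: string;
  requested: boolean;
  acknowledged: boolean;
  latency_ms?: number; // assumed to be absent until the cancel is acknowledged
}

interface ReactionMatchEvent {
  type: "reaction.match";
  rule_name: string;
  channel: string;
  sender: string;
  filter_summary: string;
}

interface ReactionSkipEvent {
  type: "reaction.skip";
  reason: string; // reason category
  candidate_count: number;
}

// Adding a new member to this union does not change the existing members,
// which is what keeps the extension additive.
type BaselineAuditEvent =
  | RunStateEvent
  | RunCancelEvent
  | ReactionMatchEvent
  | ReactionSkipEvent;

// Hypothetical helper: serializes one event as a single JSON line.
function serializeAuditEvent(event: BaselineAuditEvent, timestamp: string): string {
  return JSON.stringify({ timestamp, ...event });
}
```

Because `type` is a literal on every variant, a `switch (event.type)` narrows to the right payload shape without casts.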
Files
- src/audit/types.ts
- src/audit/logger.ts
- src/audit/logger.test.ts
Acceptance
- All new events are additive and backward-compatible.
- Audit logger exposes typed methods for each new event.
- Unit tests cover serialization, and path expansion still passes.
Suggested commit message
feat(audit): add run lifecycle and reaction decision event types
Ticket 0.2 — Gateway/Router Emitters for Baseline Events
Status: completed (2026-02-25)
Scope
Emit new audit events without changing request handling behavior:
- Gateway agent.send path:
  - emit run start/complete/error states,
  - emit cancel request/ack timing when /stop or cancellation path is used.
- Daemon routing path:
  - emit matching run lifecycle events for channel-origin sessions,
  - emit reaction match/skip events around matchReactionPrompt(...).
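One way to keep emission from altering request handling is to wrap the handler so that audit failures are swallowed while the handler's own result or error passes through unchanged. The sketch below is synchronous for brevity (a real handler would presumably be async and awaited). `AuditSink`, `safeEmit`, and `withRunLifecycle` are hypothetical names, not the actual code in src/gateway/handlers/agent.ts.

```typescript
// Hypothetical sink type: anything that accepts an audit event.
type AuditEvent = { type: string; [key: string]: unknown };
type AuditSink = (event: AuditEvent) => void;

// Emit without letting a failing sink affect the caller.
function safeEmit(sink: AuditSink, event: AuditEvent): void {
  try {
    sink(event);
  } catch (_err) {
    // Intentionally ignored: telemetry must not change reply behavior.
  }
}

// Wraps a handler with run.state start/complete/error events.
// The handler's return value and thrown errors are passed through untouched.
function withRunLifecycle<T>(sink: AuditSink, runId: string, handler: () => T): T {
  safeEmit(sink, { type: "run.state", state: "start", runId });
  try {
    const result = handler();
    safeEmit(sink, { type: "run.state", state: "complete", runId });
    return result;
  } catch (err) {
    safeEmit(sink, { type: "run.state", state: "error", runId });
    throw err;
  }
}
```

The same shape would apply around matchReactionPrompt(...): emit reaction.match or reaction.skip after the call returns, again through `safeEmit`, so the routing decision itself is never touched.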
Files
- src/gateway/handlers/agent.ts
- src/daemon/routing.ts
- src/gateway/handlers/agent.test.ts
- src/daemon/routing.test.ts
Acceptance
- Event emission is side-effect free and does not alter reply behavior.
- Run IDs/session IDs are consistent with existing audit conventions.
- Existing tests continue passing; new assertions verify emitted metadata.
Suggested commit message
feat(observability): emit run and reaction baseline audit events
Ticket 0.3 — Metrics Collector Baseline Counters
Status: completed (2026-02-25)
Scope
Extend in-memory gateway metrics with baseline counters:
- Run-state counters by outcome (completed, cancelled, errored).
- Cancel-latency histogram buckets (or ring-buffer sample list).
- Reaction decision counters (matched, skipped, per-reason).
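The counters above might look roughly like the following in-memory collector, which takes the ring-buffer option for cancel latency. This is a sketch under assumptions: the class name `BaselineMetrics`, its method names, the snapshot shape, and the default sample cap of 256 are all hypothetical and not taken from src/gateway/metrics.ts.

```typescript
type RunOutcome = "completed" | "cancelled" | "errored";

interface BaselineSnapshot {
  runs: Record<RunOutcome, number>;
  cancelLatencyMs: number[];
  reactions: { matched: number; skipped: number; skippedByReason: Record<string, number> };
}

// Hypothetical collector; the real metrics module may be shaped differently.
class BaselineMetrics {
  private readonly maxLatencySamples: number;
  private readonly runs: Record<RunOutcome, number> = { completed: 0, cancelled: 0, errored: 0 };
  private readonly cancelLatencyMs: number[] = [];
  private matched = 0;
  private readonly skippedByReason: Record<string, number> = {};

  constructor(maxLatencySamples = 256) {
    this.maxLatencySamples = maxLatencySamples;
  }

  recordRunOutcome(outcome: RunOutcome): void {
    this.runs[outcome] += 1;
  }

  // Ring-buffer behavior: once the cap is reached, the oldest sample is dropped.
  recordCancelLatency(ms: number): void {
    if (!Number.isFinite(ms) || ms < 0) return; // ignore unusable samples
    this.cancelLatencyMs.push(ms);
    if (this.cancelLatencyMs.length > this.maxLatencySamples) this.cancelLatencyMs.shift();
  }

  recordReactionMatch(): void {
    this.matched += 1;
  }

  recordReactionSkip(reason: string): void {
    this.skippedByReason[reason] = (this.skippedByReason[reason] ?? 0) + 1;
  }

  // Returns copies so callers cannot mutate internal state; a fresh collector
  // yields all-zero counters, which gives snapshots stable defaults.
  snapshot(): BaselineSnapshot {
    const skipped = Object.values(this.skippedByReason).reduce((sum, n) => sum + n, 0);
    return {
      runs: { ...this.runs },
      cancelLatencyMs: [...this.cancelLatencyMs],
      reactions: { matched: this.matched, skipped, skippedByReason: { ...this.skippedByReason } },
    };
  }
}
```

A bounded sample list keeps memory flat and still allows percentiles to be computed at snapshot time; fixed histogram buckets would be cheaper but lose resolution.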
Files
- src/gateway/metrics.ts
- src/gateway/metrics.test.ts
- src/gateway/handlers/observability.ts (if exposing additional metrics fields)
- src/gateway/handlers/observability.test.ts (if needed)
Acceptance
- New metrics appear in snapshots with stable defaults.
- No regression to existing dashboard/observability consumers.
- Tests validate accumulation, reset behavior (if any), and shape compatibility.
Suggested commit message
feat(metrics): add baseline run and reaction counters for phase-0
Ticket 0.4 — Baseline Summary Tooling
Status: completed (2026-02-25)
Scope
Add or extend report tooling to summarize phase-0 telemetry slices:
- Run completion/error/cancel rates by channel/session.
- Cancel latency p50/p95.
- Reaction match/skip rates and top skip reasons.
- Optional markdown + json output for plan artifacts.
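As an illustration of the cancel-latency slice, the sketch below computes p50/p95 with the nearest-rank method and skips records that lack a usable latency_ms, so it stays safe on older logs. The `AuditRecord` shape and both function names are assumptions; the actual summarizer in src/audit/backendCanarySummary.ts may use a different percentile definition.

```typescript
// Hypothetical record shape: only the fields this sketch reads are named.
interface AuditRecord {
  type?: string;
  latency_ms?: unknown;
  [key: string]: unknown;
}

// Nearest-rank percentile over an ascending-sorted array; null when empty.
function percentile(sortedAsc: number[], p: number): number | null {
  if (sortedAsc.length === 0) return null;
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  const index = Math.min(sortedAsc.length - 1, Math.max(0, rank - 1));
  return sortedAsc[index] ?? null;
}

function summarizeCancelLatency(records: AuditRecord[]): {
  count: number;
  p50: number | null;
  p95: number | null;
} {
  const samples: number[] = [];
  for (const record of records) {
    if (record.type !== "run.cancel") continue;
    const value = record.latency_ms;
    // Missing or malformed fields are skipped rather than treated as zero.
    if (typeof value === "number" && Number.isFinite(value) && value >= 0) samples.push(value);
  }
  samples.sort((a, b) => a - b); // numeric sort keeps output deterministic
  return { count: samples.length, p50: percentile(samples, 50), p95: percentile(samples, 95) };
}
```

Returning null for an empty window, rather than 0, lets the markdown and json outputs distinguish "no cancels observed" from "instant cancels".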
Files
- src/audit/backendCanarySummary.ts (extend or split shared summarizer helpers)
- src/audit/backendCanarySummary.test.ts
- scripts/summarize-backend-canary.ts (or new script if cleaner)
- package.json (script entry if needed)
Acceptance
- Tool works on current audit logs without requiring new event types.
- New event types are incorporated when present.
- Output is deterministic and safe on missing fields.
Suggested commit message
feat(audit): add phase-0 baseline summary metrics for runs and reactions
Ticket 0.5 — Docs + Diagram + State Sync
Status: completed (2026-02-25)
Scope
Document new observability fields and baseline workflow:
- Protocol/observability notes (if gateway payloads changed).
- Architecture docs review for run lifecycle observability path.
- Plan/state updates with executed commands and artifacts.
Files
- README.md
- docs/api/PROTOCOL.md
- docs/architecture/AGENT_DIAGRAM.md
- docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md
- docs/plans/state.json
- docs/plans/artifacts/* (baseline outputs)
Acceptance
- Diagram review is explicitly documented.
- state.json includes ticket completion/test status.
- Artifact links are present and reproducible.
Suggested commit message
docs(observability): document phase-0 telemetry and baseline workflow
Validation Commands (per completed ticket set)
Run focused suites first, then full validation when phase is complete:
pnpm typecheck
pnpm test:run src/audit/logger.test.ts
pnpm test:run src/gateway/metrics.test.ts
pnpm test:run src/gateway/handlers/agent.test.ts
pnpm test:run src/daemon/routing.test.ts
pnpm test:run src/audit/backendCanarySummary.test.ts
pnpm test:run src/gateway/handlers/observability.test.ts
pnpm test:run
pnpm lint
pnpm build
Rollout and Exit Criteria
Phase 0 is complete when:
- Baseline metrics are emitted for at least one representative channel session and one gateway session.
- A baseline summary artifact is generated and committed under docs/plans/artifacts/.
- No user-visible response behavior changed compared to pre-phase baseline.
Follow-up status (2026-02-27):
- Live channel-session artifacts exist under docs/plans/artifacts/phase0_baseline_live_2026-02-27.* via pnpm audit:phase0-baseline:live (anonymized IDs).
- A second gateway-origin live window (including run.cancel + cancel_requested/cancelled) exists under docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.*.
- Gateway window refreshes can now run via pnpm audit:phase0-baseline:live:gateway (auto-selected cancel window).
- All live windows can be refreshed together with pnpm audit:phase0-baseline:live:refresh (channel + gateway + backend-scoped pi/native; scheduling example included in README).
- Backend artifact freshness/drift checks are now available via pnpm audit:phase0-baseline:live:drift (or chained with pnpm audit:phase0-baseline:live:refresh:drift), with drift report artifacts written to docs/plans/artifacts/phase0_baseline_live_backend_drift_<tag>.{md,json} and optional reaction match/skip drift thresholds.
- Cadence runs can preserve distinct timestamped comparison points via pnpm audit:phase0-baseline:live:refresh:drift:rolling (supports shared TAG override with filename-safe tag values).
- Rolling-tag retention can be managed via pnpm audit:phase0-baseline:live:prune (dry-run) / pnpm audit:phase0-baseline:live:prune:apply, with prune report artifacts written to phase0_baseline_live_prune_<tag>.{md,json} (and retained as a managed rolling family).
- One-command cadence scheduling is available via pnpm audit:phase0-baseline:live:refresh:drift:rolling:prune (non-negative integer KEEP_PER_FAMILY optional override).
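The follow-up status mentions filename-safe TAG values and a non-negative integer KEEP_PER_FAMILY override. A validation step for those two inputs could be sketched as below. The allowed character set, the function names, and the error messages are assumptions for illustration only; the actual scripts may accept or reject different values.

```typescript
// Assumed rule: a tag is filename-safe if it starts with an alphanumeric
// character and otherwise contains only letters, digits, '.', '_' or '-'.
const TAG_PATTERN = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;

// Hypothetical helper: trims the tag and rejects anything that could
// escape the artifacts directory (path separators, leading dots, spaces).
function normalizeArtifactTag(raw: string): string {
  const tag = raw.trim();
  if (!TAG_PATTERN.test(tag)) {
    throw new Error(`Invalid artifact tag: ${JSON.stringify(raw)}`);
  }
  return tag;
}

// Hypothetical helper: parses KEEP_PER_FAMILY, falling back when it is unset.
function parseKeepPerFamily(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const text = raw.trim();
  if (!/^\d+$/.test(text)) {
    throw new Error(`KEEP_PER_FAMILY must be a non-negative integer, got ${JSON.stringify(raw)}`);
  }
  const value = Number(text);
  if (!Number.isSafeInteger(value)) {
    throw new Error(`KEEP_PER_FAMILY is too large: ${JSON.stringify(raw)}`);
  }
  return value;
}
```

Rejecting bad input outright, instead of silently rewriting it, keeps the tag in a report filename identical to the tag the operator typed.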
Subagent Model Assignment Plan
- claude-haiku-4.5:
  - schema additions, mechanical logger wiring, docs consistency edits.
- claude-sonnet-4.6:
  - gateway/router instrumentation and metrics implementation.
- claude-opus-4.6:
  - event taxonomy review, baseline gate design, failure-mode coverage review.