# Phase 0 Ticket Checklist: Baseline Instrumentation and Guardrails

Created: 2026-02-25
Owner: Flynn core
Status: ready to implement
Parent: `docs/plans/2026-02-25-deeper-end-user-surfaces-and-integrated-behavior-stack-plan.md`

## Goal

Establish a measurement baseline for deeper-surface and behavior-stack work before changing runtime semantics.

This checklist intentionally avoids behavior changes. It adds instrumentation and reporting only.

## PR Boundary

In scope:

1. Audit event coverage for run lifecycle, cancel intent/result, and reaction decision paths.
2. Gateway metrics support for cancel latency and run transition counters.
3. Baseline summary tooling for repeatable canary comparisons.
4. Documentation updates for new telemetry fields and collection workflow.

Out of scope:

1. Queue semantic changes (`interrupt`, `steer`, active preemption behavior).
2. New gateway event surfaces to clients (`run_state` comes in Phase 1).
3. Reactions behavior changes (priority/cooldown/stop semantics come in Phase 2).
4. Companion/canvas persistence behavior changes (Phase 3).

## Ticket 0.1 — Audit Schema Extension

Status: completed (2026-02-25)

### Scope

Add additive audit event types and payload contracts:

1. `run.state` (start, complete, cancel_requested, cancelled, error)
2. `run.cancel` (source, requested, acknowledged, latency_ms)
3. `reaction.match` (rule_name, channel, sender, filter_summary)
4. `reaction.skip` (reason category, candidate_count)

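
As a rough illustration of the payload contracts listed in this ticket's scope, the four events could be modeled as a discriminated union. This is a sketch only: the field names come from the scope list, while `run_id`, the concrete field types (for example epoch-millisecond timestamps for `requested`/`acknowledged`), and the helpers `buildRunCancel` and `describeEvent` are assumptions, not the contents of `src/audit/types.ts`.

```typescript
// Hypothetical payload shapes. Field names follow the scope list; the types
// and the run_id field are assumptions made for illustration.
type RunState = "start" | "complete" | "cancel_requested" | "cancelled" | "error";

interface RunStateEvent {
  type: "run.state";
  run_id: string; // assumed identifier field
  state: RunState;
}

interface RunCancelEvent {
  type: "run.cancel";
  run_id: string; // assumed identifier field
  source: string; // e.g. "/stop"
  requested: number; // assumed: epoch ms when cancel was requested
  acknowledged: number | null; // assumed: epoch ms, null until acknowledged
  latency_ms: number | null;
}

interface ReactionMatchEvent {
  type: "reaction.match";
  rule_name: string;
  channel: string;
  sender: string;
  filter_summary: string;
}

interface ReactionSkipEvent {
  type: "reaction.skip";
  reason: string; // reason category
  candidate_count: number;
}

type BaselineAuditEvent =
  | RunStateEvent
  | RunCancelEvent
  | ReactionMatchEvent
  | ReactionSkipEvent;

// Derives latency_ms from the two timestamps; stays null while unacknowledged.
function buildRunCancel(
  runId: string,
  source: string,
  requested: number,
  acknowledged: number | null,
): RunCancelEvent {
  const latency = acknowledged === null ? null : Math.max(0, acknowledged - requested);
  return { type: "run.cancel", run_id: runId, source, requested, acknowledged, latency_ms: latency };
}

// Exhaustive switch: adding a new event type without a case is a compile error.
function describeEvent(event: BaselineAuditEvent): string {
  switch (event.type) {
    case "run.state":
      return `run ${event.run_id} -> ${event.state}`;
    case "run.cancel":
      return `cancel via ${event.source} (${event.latency_ms ?? "pending"} ms)`;
    case "reaction.match":
      return `matched ${event.rule_name} on ${event.channel}`;
    case "reaction.skip":
      return `skipped: ${event.reason} (${event.candidate_count} candidates)`;
  }
}
```

Because every member is a new `type` value, existing consumers that switch on known types are unaffected, which is one way to keep the change additive.
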
### Files

1. `src/audit/types.ts`
2. `src/audit/logger.ts`
3. `src/audit/logger.test.ts`

### Acceptance

1. All new events are additive and backward-compatible.
2. Audit logger exposes typed methods for each new event.
3. Unit tests cover serialization, and the existing path-expansion tests still pass.

### Suggested commit message

`feat(audit): add run lifecycle and reaction decision event types`

## Ticket 0.2 — Gateway/Router Emitters for Baseline Events

Status: completed (2026-02-25)

### Scope

Emit new audit events without changing request handling behavior:

1. Gateway `agent.send` path:
   - emit run start/complete/error states,
   - emit cancel request/ack timing when `/stop` or another cancellation path is used.
2. Daemon routing path:
   - emit matching run lifecycle events for channel-origin sessions,
   - emit reaction match/skip events around `matchReactionPrompt(...)`.

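
One way to keep emission from changing request handling is to wrap every audit call so a failing sink can never reach the reply path. The sketch below is illustrative only: `AuditSink`, `emitSafely`, and `withRunLifecycle` are hypothetical names, not the actual code in `src/gateway/handlers/agent.ts` or `src/daemon/routing.ts`.

```typescript
// Hypothetical minimal event and sink shapes for illustration.
interface LifecycleEvent {
  type: "run.state";
  run_id: string;
  state: "start" | "complete" | "error";
}

type AuditSink = (event: LifecycleEvent) => void;

// Swallows sink failures so audit problems cannot alter the reply.
function emitSafely(sink: AuditSink, event: LifecycleEvent): boolean {
  try {
    sink(event);
    return true;
  } catch {
    return false;
  }
}

// Runs `work` and emits start/complete/error around it. The return value and
// any thrown error are passed through unchanged.
function withRunLifecycle(sink: AuditSink, runId: string, work: () => string): string {
  emitSafely(sink, { type: "run.state", run_id: runId, state: "start" });
  try {
    const reply = work();
    emitSafely(sink, { type: "run.state", run_id: runId, state: "complete" });
    return reply;
  } catch (err) {
    emitSafely(sink, { type: "run.state", run_id: runId, state: "error" });
    throw err;
  }
}
```

With this shape, a test can assert both the emitted metadata and that the reply is identical whether the sink succeeds or throws.
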
### Files

1. `src/gateway/handlers/agent.ts`
2. `src/daemon/routing.ts`
3. `src/gateway/handlers/agent.test.ts`
4. `src/daemon/routing.test.ts`

### Acceptance

1. Event emission is side-effect free and does not alter reply behavior.
2. Run IDs/session IDs are consistent with existing audit conventions.
3. Existing tests continue passing; new assertions verify emitted metadata.

### Suggested commit message

`feat(observability): emit run and reaction baseline audit events`

## Ticket 0.3 — Metrics Collector Baseline Counters

Status: completed (2026-02-25)

### Scope

Extend in-memory gateway metrics with baseline counters:

1. Run-state counters by outcome (`completed`, `cancelled`, `errored`).
2. Cancel-latency histogram buckets (or ring-buffer sample list).
3. Reaction decision counters (`matched`, `skipped`, per-reason).

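
A minimal sketch of such a collector, using the ring-buffer option for cancel latency. The class name `BaselineMetrics`, its method names, the snapshot shape, and the default of 256 samples are all assumptions for illustration; the real collector lives in `src/gateway/metrics.ts` and may look different.

```typescript
type RunOutcome = "completed" | "cancelled" | "errored";

interface BaselineSnapshot {
  runs: Record<RunOutcome, number>;
  cancelLatencyMs: number[];
  reactions: { matched: number; skipped: number; skipReasons: Record<string, number> };
}

// Hypothetical in-memory collector; not the actual gateway metrics class.
class BaselineMetrics {
  private readonly maxSamples: number;
  private runs: Record<RunOutcome, number> = { completed: 0, cancelled: 0, errored: 0 };
  private cancelLatencyMs: number[] = [];
  private matched = 0;
  private skipReasons: Record<string, number> = {};

  constructor(maxSamples = 256) {
    this.maxSamples = Math.max(1, maxSamples);
  }

  recordRun(outcome: RunOutcome): void {
    this.runs[outcome] += 1;
  }

  // Keeps only the most recent `maxSamples` values; ignores invalid input.
  recordCancelLatency(ms: number): void {
    if (!Number.isFinite(ms) || ms < 0) return;
    this.cancelLatencyMs.push(ms);
    if (this.cancelLatencyMs.length > this.maxSamples) this.cancelLatencyMs.shift();
  }

  recordReaction(decision: "matched" | "skipped", reason = "unspecified"): void {
    if (decision === "matched") {
      this.matched += 1;
      return;
    }
    this.skipReasons[reason] = (this.skipReasons[reason] ?? 0) + 1;
  }

  // Returns copies so callers cannot mutate internal state.
  snapshot(): BaselineSnapshot {
    const skipped = Object.keys(this.skipReasons).reduce(
      (sum, key) => sum + (this.skipReasons[key] ?? 0),
      0,
    );
    return {
      runs: { ...this.runs },
      cancelLatencyMs: [...this.cancelLatencyMs],
      reactions: { matched: this.matched, skipped, skipReasons: { ...this.skipReasons } },
    };
  }
}
```

A fresh collector returns zeroed counters and an empty sample list, which gives snapshot consumers stable defaults before any run has been recorded.
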
### Files

1. `src/gateway/metrics.ts`
2. `src/gateway/metrics.test.ts`
3. `src/gateway/handlers/observability.ts` (if exposing additional metrics fields)
4. `src/gateway/handlers/observability.test.ts` (if needed)

### Acceptance

1. New metrics appear in snapshots with stable defaults.
2. No regression to existing dashboard/observability consumers.
3. Tests validate accumulation, reset behavior (if any), and shape compatibility.

### Suggested commit message

`feat(metrics): add baseline run and reaction counters for phase-0`

## Ticket 0.4 — Baseline Summary Tooling

Status: completed (2026-02-25)

### Scope

Add or extend report tooling to summarize phase-0 telemetry slices:

1. Run completion/error/cancel rates by channel/session.
2. Cancel latency p50/p95.
3. Reaction match/skip rates and top skip reasons.
4. Optional markdown + json output for plan artifacts.

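
The summary slices can be sketched as a pure function over parsed audit records. This is an assumption-laden illustration, not the logic in `src/audit/backendCanarySummary.ts`: the record shape, the names `summarizeBaseline` and `percentile`, and the nearest-rank percentile method are all choices made here, and per-channel/session grouping is omitted for brevity.

```typescript
// Hypothetical loosely typed record: every field may be missing or malformed.
interface AuditRecord {
  type?: unknown;
  state?: unknown;
  latency_ms?: unknown;
  reason?: unknown;
}

// Nearest-rank percentile over an ascending array; null when there are no samples.
function percentile(sortedAsc: number[], p: number): number | null {
  if (sortedAsc.length === 0) return null;
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  const index = Math.min(sortedAsc.length, Math.max(1, rank)) - 1;
  const value = sortedAsc[index];
  return value === undefined ? null : value;
}

function rate(part: number, total: number): number {
  return total === 0 ? 0 : part / total;
}

function summarizeBaseline(records: AuditRecord[]) {
  let complete = 0, cancelled = 0, errored = 0, matched = 0, skipped = 0;
  const latencies: number[] = [];
  const skipCounts: Record<string, number> = {};

  for (const r of records) {
    if (r.type === "run.state") {
      if (r.state === "complete") complete += 1;
      else if (r.state === "cancelled") cancelled += 1;
      else if (r.state === "error") errored += 1;
    } else if (r.type === "run.cancel") {
      // Skip records with a missing or non-numeric latency instead of failing.
      if (typeof r.latency_ms === "number" && Number.isFinite(r.latency_ms)) latencies.push(r.latency_ms);
    } else if (r.type === "reaction.match") {
      matched += 1;
    } else if (r.type === "reaction.skip") {
      skipped += 1;
      const reason = typeof r.reason === "string" ? r.reason : "unknown";
      skipCounts[reason] = (skipCounts[reason] ?? 0) + 1;
    }
  }

  latencies.sort((a, b) => a - b);
  const finished = complete + cancelled + errored;
  // Sort by count, then by name, so output order is deterministic.
  const topSkipReasons = Object.keys(skipCounts)
    .map((reason) => ({ reason, count: skipCounts[reason] ?? 0 }))
    .sort((a, b) => b.count - a.count || (a.reason < b.reason ? -1 : a.reason > b.reason ? 1 : 0));

  return {
    runs: {
      completionRate: rate(complete, finished),
      cancelRate: rate(cancelled, finished),
      errorRate: rate(errored, finished),
    },
    cancelLatencyMs: { p50: percentile(latencies, 50), p95: percentile(latencies, 95) },
    reactions: { matchRate: rate(matched, matched + skipped), topSkipReasons },
  };
}
```

On logs that contain none of the new event types, this returns zero rates and `null` percentiles rather than throwing, which matches the intent that the tool keeps working on current audit logs.
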
### Files

1. `src/audit/backendCanarySummary.ts` (extend or split shared summarizer helpers)
2. `src/audit/backendCanarySummary.test.ts`
3. `scripts/summarize-backend-canary.ts` (or new script if cleaner)
4. `package.json` (script entry if needed)

### Acceptance

1. Tool works on current audit logs without requiring new event types.
2. New event types are incorporated when present.
3. Output is deterministic and safe on missing fields.

### Suggested commit message

`feat(audit): add phase-0 baseline summary metrics for runs and reactions`

## Ticket 0.5 — Docs + Diagram + State Sync

Status: completed (2026-02-25)

### Scope

Document new observability fields and baseline workflow:

1. Protocol/observability notes (if gateway payloads changed).
2. Architecture docs review for run lifecycle observability path.
3. Plan/state updates with executed commands and artifacts.

### Files

1. `README.md`
2. `docs/api/PROTOCOL.md`
3. `docs/architecture/AGENT_DIAGRAM.md`
4. `docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md`
5. `docs/plans/state.json`
6. `docs/plans/artifacts/*` (baseline outputs)

### Acceptance

1. Diagram review is explicitly documented.
2. `state.json` includes ticket completion/test status.
3. Artifact links are present and reproducible.

### Suggested commit message

`docs(observability): document phase-0 telemetry and baseline workflow`

## Validation Commands (per completed ticket set)

Run focused suites first, then full validation when phase is complete:

```bash
pnpm typecheck
pnpm test:run src/audit/logger.test.ts
pnpm test:run src/gateway/metrics.test.ts
pnpm test:run src/gateway/handlers/agent.test.ts
pnpm test:run src/daemon/routing.test.ts
pnpm test:run src/audit/backendCanarySummary.test.ts
pnpm test:run src/gateway/handlers/observability.test.ts
pnpm test:run
pnpm lint
pnpm build
```

## Rollout and Exit Criteria

Phase 0 is complete when:

1. Baseline metrics are emitted for at least one representative channel session and one gateway session.
2. A baseline summary artifact is generated and committed under `docs/plans/artifacts/`.
3. No user-visible response behavior changed compared to pre-phase baseline.

Follow-up status (2026-02-27):

1. Live channel-session artifacts exist under `docs/plans/artifacts/phase0_baseline_live_2026-02-27.*`, generated via `pnpm audit:phase0-baseline:live` (anonymized IDs).
2. A second, gateway-origin live window (including `run.cancel` + `cancel_requested`/`cancelled`) exists under `docs/plans/artifacts/phase0_baseline_live_gateway_2026-02-27.*`.
3. Gateway window refreshes can now run via `pnpm audit:phase0-baseline:live:gateway` (auto-selected cancel window).
4. All live windows can be refreshed together with `pnpm audit:phase0-baseline:live:refresh` (channel + gateway + backend-scoped `pi`/`native`; scheduling example included in README).
5. Backend artifact freshness/drift checks are now available via `pnpm audit:phase0-baseline:live:drift` (or chained with `pnpm audit:phase0-baseline:live:refresh:drift`), with drift report artifacts written to `docs/plans/artifacts/phase0_baseline_live_backend_drift_<UTC-date>.{md,json}`.

## Subagent Model Assignment Plan

1. `claude-haiku-4.5`:
   - schema additions, mechanical logger wiring, docs consistency edits.
2. `claude-sonnet-4.6`:
   - gateway/router instrumentation and metrics implementation.
3. `claude-opus-4.6`:
   - event taxonomy review, baseline gate design, failure-mode coverage review.