William Valentin 23b813a92f feat(audit): add phase0 run/reaction baseline audit events

2026-02-25 00:12:31 -08:00


Flynn Deeper End-User Surfaces + Integrated Behavior Stack Plan

Date: 2026-02-25
Status: proposed roadmap
Scope: deepen assistant "product feel" (behavior semantics + user-facing surfaces) without rewriting Flynn core architecture

Summary

This plan adopts a balanced hybrid strategy:

Improve behavior semantics first where correctness risk is highest (interrupt/cancel/run control).
In parallel, ship selective deeper user surfaces (companion, canvas persistence, voice continuity).
Land each slice with explicit observability gates so rollout decisions are data-driven.

Why This Plan

Flynn already has strong foundations:

Queue + session orchestration: src/gateway/lane-queue.ts, src/gateway/session-bridge.ts
Multi-path routing and backend fallback: src/daemon/routing.ts
Companion RPC foundation: src/companion/runtimeClient.ts, src/gateway/protocol.ts
Canvas API baseline: src/gateway/handlers/canvas.ts, src/gateway/canvas-store.ts
Voice in/out primitives: src/models/media.ts, src/models/tts.ts
Reactions baseline: src/automation/reactions.ts

The largest remaining gap versus an OpenClaw-like "assistant feel" is how those systems behave together, not any missing foundational architecture.

Goals and Success Criteria

Deterministic active-run control under bursty traffic
Rich, safe proactive behavior stack
Durable end-user surfaces for companion/canvas/voice
Measurable reliability improvements across canary phases

Quantitative success gates:

Cancel-to-ack p95 <= 500ms on gateway sessions.
Duplicate assistant responses caused by run preemption: 0 in integration tests.
Reaction false-positive rate <= 3% in canary logs.
Companion reconnect success >= 99% in soak tests.
Canvas artifact persistence survives daemon restart in integration tests.
Voice failures degrade to text-only replies with no dropped responses.

Out of Scope

Full native macOS/iOS/Android app suite in this phase set.
Broad protocol redesign or protocol-version breaking changes.
Pi backend expansion (kept separate from this roadmap until re-approval).

Workstreams and Complexity

Workstream	Complexity	Main Risk
Run-control semantics unification	High	race conditions and cancellation ordering
Reactions + proactive behavior v2	Medium-High	noisy or looping automation
Companion + canvas + voice deepening	High	cross-surface consistency and restart behavior
Rollout hardening + observability	Medium	incomplete canary signals

Phase 0 - Baseline Instrumentation and Guardrails

Duration: 3-5 days

Execution checklist:

docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md

Deliverables

Add baseline event and metric instrumentation for:
- run-state transitions,
- cancellation path timings,
- reaction match/skip reasons,
- surface delivery outcomes.
Define canary gate calculator inputs for each later phase.
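A minimal sketch of what a canary gate calculator could look like, using the quantitative success gates listed earlier as thresholds. The `GateInputs` shape, the field names, and the nearest-rank percentile method are assumptions for illustration; the plan does not define the real inputs.

```typescript
// Hypothetical input shape; the real calculator inputs are defined in Phase 0.
interface GateInputs {
  cancelToAckMs: number[];        // observed cancel-to-ack latencies
  reactionMatches: number;        // total reaction matches in canary logs
  reactionFalsePositives: number; // matches judged to be false positives
  reconnectAttempts: number;      // companion reconnect attempts in soak
  reconnectSuccesses: number;     // attempts that succeeded
}

interface GateResult {
  name: string;
  value: number;
  pass: boolean;
}

// Nearest-rank percentile: the smallest sample such that at least p percent
// of samples are less than or equal to it. Returns NaN for an empty input.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN;
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Gates with no data evaluate to NaN, and every comparison with NaN is false,
// so a missing signal fails the gate instead of silently passing it.
function evaluateGates(i: GateInputs): GateResult[] {
  const p95 = percentile(i.cancelToAckMs, 95);
  const fpRate = i.reactionFalsePositives / i.reactionMatches;
  const reconnectRate = i.reconnectSuccesses / i.reconnectAttempts;
  return [
    { name: "cancel_to_ack_p95_ms", value: p95, pass: p95 <= 500 },
    { name: "reaction_false_positive_rate", value: fpRate, pass: fpRate <= 0.03 },
    { name: "companion_reconnect_rate", value: reconnectRate, pass: reconnectRate >= 0.99 },
  ];
}
```

Failing closed on missing data is a design choice in this sketch: it addresses the "incomplete canary signals" risk by making an absent metric block rollout.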

Files

src/audit/types.ts
src/audit/logger.ts
src/gateway/metrics.ts
docs/api/PROTOCOL.md (event semantics if needed)
docs/plans/state.json

Acceptance

Baseline report generated before behavior changes.
No runtime behavior changes in this phase, only observability.

Phase 1 - Run-Control Semantics + Gateway UX Signals

Duration: 2-3 weeks

Objectives

Enforce "latest wins" semantics when queue policy is interrupt.
Align cancellation behavior between gateway and channel router paths.
Expose user-visible run lifecycle state in gateway events/UI.
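One way to picture "latest wins" is a lane that marks the active run cancelled whenever a newer run arrives, and lets a run deliver its final response only if it is still the active one. The sketch below is illustrative only: `InterruptLane`, `Run`, and `tryDeliver` are hypothetical names, not the API of src/gateway/lane-queue.ts.

```typescript
// Hypothetical types for illustration; not the real lane-queue.ts API.
interface Run {
  id: string;
  prompt: string;
  cancelled: boolean;
}

class InterruptLane {
  private active: Run | null = null;

  // A new submission preempts the active run before taking its place.
  submit(id: string, prompt: string): Run {
    if (this.active !== null) this.active.cancelled = true;
    const run: Run = { id, prompt, cancelled: false };
    this.active = run;
    return run;
  }

  // A run may deliver its final response only while it is still the active,
  // non-cancelled run. Delivery clears the slot, so a second call for the
  // same run returns false and cannot produce a duplicate response.
  tryDeliver(run: Run): boolean {
    if (run.cancelled || this.active !== run) return false;
    this.active = null;
    return true;
  }
}
```

Usage: submit "a", then "b"; only `tryDeliver(b)` succeeds, and only once.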

Implementation

Queue semantics hardening:
- src/gateway/lane-queue.ts
- ensure queued + active behavior rules are explicit and testable.
Active run cancellation wiring:
- src/gateway/handlers/agent.ts
- src/gateway/session-bridge.ts
- src/daemon/routing.ts (activeRuns parity behavior).
Event surface:
- add additive run_state event in src/gateway/protocol.ts
- consume/render in src/gateway/ui/pages/chat.js.
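The additive nature of run_state can be sketched as a client-side reducer whose default branch ignores event types it does not know. The state names, the `runId` field, and the `assistant_message` event are assumptions for illustration; the actual event shape belongs in src/gateway/protocol.ts.

```typescript
// Hypothetical state names and event shapes; the plan only names the event type.
type RunState = "queued" | "running" | "cancelling" | "cancelled" | "completed" | "failed";

interface RunStateEvent {
  type: "run_state";
  runId: string;
  state: RunState;
}

interface AssistantMessageEvent {
  type: "assistant_message";
  runId: string;
  text: string;
}

type GatewayEvent = RunStateEvent | AssistantMessageEvent;

interface ChatView {
  messages: string[];
  runStates: Record<string, RunState>;
}

// Unknown event types fall through to the default branch and leave the view
// unchanged. A client written before run_state existed behaves the same way
// for run_state itself, which is what keeps the change backward-compatible.
function applyEvent(view: ChatView, event: GatewayEvent | { type: string }): ChatView {
  switch (event.type) {
    case "assistant_message": {
      const e = event as AssistantMessageEvent;
      return { ...view, messages: [...view.messages, e.text] };
    }
    case "run_state": {
      const e = event as RunStateEvent;
      return { ...view, runStates: { ...view.runStates, [e.runId]: e.state } };
    }
    default:
      return view;
  }
}
```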

Test Plan

src/gateway/lane-queue.test.ts: preemption ordering, overflow with interrupt, debounce edge cases.
src/gateway/handlers/agent.test.ts: interrupt + active cancel + queued supersede flows.
src/daemon/routing.test.ts: channel-path cancellation parity.
src/gateway/ui/pages/chat.test.ts: run_state rendering and transitions.

Acceptance

Cancel-to-ack p95 <= 500ms.
Zero duplicate final responses in integration suite.
Backward compatibility for clients ignoring run_state.

Phase 2 - Reactions and Proactive Behavior Stack V2

Duration: 2 weeks

Objectives

Replace first-match reaction behavior with deterministic priority + cooldown semantics.
Keep announce delivery safe and auditable.
Prevent recursion/looping behavior.

Config/API Additions (backward-compatible)

Extend automation.reactions[] in src/config/schema.ts with:

priority (number, default 100)
cooldown_ms (number, default 0)
stop_on_match (boolean, default true)

Existing fields remain valid and unchanged.
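A sketch of how the three new fields could be defaulted while passing existing fields through untouched. It uses plain TypeScript and does not assume any particular schema library in src/config/schema.ts; `withReactionDefaults` and the `pattern` field in the usage note are hypothetical.

```typescript
// New optional fields from this plan; existing fields pass through untouched.
interface ReactionConfigInput {
  priority?: number;
  cooldown_ms?: number;
  stop_on_match?: boolean;
  [existingField: string]: unknown;
}

interface ReactionConfig {
  priority: number;
  cooldown_ms: number;
  stop_on_match: boolean;
  [existingField: string]: unknown;
}

// Applies the documented defaults (100, 0, true) only when a field is absent,
// so configs written before this change keep their current meaning.
function withReactionDefaults(input: ReactionConfigInput): ReactionConfig {
  return {
    ...input,
    priority: input.priority ?? 100,
    cooldown_ms: input.cooldown_ms ?? 0,
    stop_on_match: input.stop_on_match ?? true,
  };
}
```

For example, `withReactionDefaults({ pattern: "deploy" })` keeps `pattern` and fills in `priority: 100`, `cooldown_ms: 0`, and `stop_on_match: true`.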

Implementation

Reaction engine expansion:
- src/automation/reactions.ts (or split reactionEngine.ts if needed).
Routing integration:
- src/daemon/routing.ts deterministic reaction resolution.
Delivery consistency:
- src/automation/cron.ts
- src/automation/webhooks.ts
- preserve delivery_mode semantics and audit metadata.
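Deterministic resolution with priority, cooldown, and stop_on_match could look roughly like the sketch below. Several points are assumptions the plan does not settle: that a lower priority number wins, that ties fall back to declaration order, and that a rule in cooldown is skipped so lower-priority rules still get a chance. `ReactionResolver` and `Rule` are hypothetical names.

```typescript
// Hypothetical rule shape; `matches` stands in for whatever matching
// src/automation/reactions.ts performs today.
interface Rule {
  id: string;
  priority: number;
  cooldown_ms: number;
  stop_on_match: boolean;
  matches: (text: string) => boolean;
}

class ReactionResolver {
  private ordered: Rule[];
  private lastFired = new Map<string, number>();

  constructor(rules: Rule[]) {
    // Assumption: lower priority number is evaluated first; ties keep
    // declaration order so the result never depends on sort internals.
    this.ordered = rules
      .map((rule, index) => ({ rule, index }))
      .sort((a, b) => a.rule.priority - b.rule.priority || a.index - b.index)
      .map((entry) => entry.rule);
  }

  // Returns the ids of rules that fire for this input, in evaluation order.
  resolve(text: string, nowMs: number): string[] {
    const fired: string[] = [];
    for (const rule of this.ordered) {
      if (!rule.matches(text)) continue;
      const last = this.lastFired.get(rule.id);
      // Cooldown suppression: skip a rule that fired too recently.
      if (last !== undefined && nowMs - last < rule.cooldown_ms) continue;
      this.lastFired.set(rule.id, nowMs);
      fired.push(rule.id);
      if (rule.stop_on_match) break;
    }
    return fired;
  }
}
```

Passing the clock in as `nowMs` keeps the resolver deterministic under test, which matches the plan's goal of deterministic rule selection under overlap.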

Test Plan

src/automation/reactions.test.ts:
- priority conflict resolution,
- cooldown suppression,
- metadata and template rendering.
src/daemon/routing.test.ts:
- reaction trigger integration and command-path exclusion.
src/automation/cron.test.ts / src/automation/webhooks.test.ts:
- announce/isolation metadata correctness.

Acceptance

False-positive match rate <= 3% in canary.
No reaction recursion loops.
Deterministic rule selection under overlap.

Phase 3 - Deeper Surfaces: Companion, Canvas Durability, Voice Continuity

Duration: 3-4 weeks

Objectives

Upgrade companion from heartbeat-only utility to reliable daily-use surface.
Make canvas artifacts durable across restart.
Improve voice continuity behavior around cancellation and fallbacks.

Implementation

Companion hardening:
- src/cli/companion.ts
- src/companion/runtimeClient.ts
- src/gateway/handlers/node.ts
- focus on reconnect and subscription resilience.
Canvas persistence:
- src/gateway/canvas-store.ts (durable backing instead of in-memory only)
- src/gateway/handlers/canvas.ts
- UI rendering/inspection in src/gateway/ui/pages/chat.js (or dedicated canvas page).
Voice continuity:
- src/daemon/routing.ts (talk-mode + cancellation + output behavior)
- src/models/tts.ts
- channel adapter output checks where required.
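For the canvas persistence item, a write-through store over a small backing interface is one possible shape. Everything here is a sketch under assumptions: `Backing`, `Artifact`, `DurableCanvasStore`, the JSON serialization, and the oldest-first eviction are illustrative choices, and a real backing would be a file or database rather than the in-memory stand-in used in the usage note.

```typescript
// Hypothetical persistence seam; a real implementation might wrap a file or
// an embedded database behind the same two calls.
interface Backing {
  read(): string | null;
  write(data: string): void;
}

interface Artifact {
  id: string;
  content: string;
}

class DurableCanvasStore {
  private artifacts = new Map<string, Artifact>();
  private backing: Backing;
  private maxArtifacts: number;

  constructor(backing: Backing, maxArtifacts: number = 100) {
    this.backing = backing;
    this.maxArtifacts = maxArtifacts;
    // Reload whatever a previous process persisted.
    const raw = backing.read();
    if (raw !== null) {
      for (const artifact of JSON.parse(raw) as Artifact[]) {
        this.artifacts.set(artifact.id, artifact);
      }
    }
  }

  put(artifact: Artifact): void {
    // Delete then set so Map insertion order tracks most recent write.
    this.artifacts.delete(artifact.id);
    this.artifacts.set(artifact.id, artifact);
    // Evict the least recently written artifacts beyond the cap.
    while (this.artifacts.size > this.maxArtifacts) {
      const oldest = Array.from(this.artifacts.keys())[0];
      this.artifacts.delete(oldest);
    }
    // Write-through: persist on every change so a restart loses nothing.
    this.backing.write(JSON.stringify(Array.from(this.artifacts.values())));
  }

  get(id: string): Artifact | undefined {
    return this.artifacts.get(id);
  }
}
```

Constructing a second `DurableCanvasStore` over the same backing simulates a daemon restart: artifacts written by the first instance are visible to the second.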

Test Plan

Companion integration tests for reconnect and event continuity.
Canvas store tests for restart durability and eviction policy.
Voice tests for TTS errors, fallback to text, and interrupted runs.
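Reconnect tests are easier to write when the retry schedule is a pure function. The sketch below uses capped exponential backoff with full jitter, which is a common technique and not something the plan prescribes; `reconnectDelayMs` and its default values are hypothetical.

```typescript
// Capped exponential backoff with full jitter. The random source is injected
// so tests can make the schedule deterministic.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 250,
  maxMs: number = 30000,
  random: () => number = Math.random,
): number {
  // Ceiling doubles per attempt until it reaches maxMs.
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  // Full jitter: pick uniformly in [0, ceiling) to avoid reconnect stampedes.
  return Math.floor(random() * ceiling);
}
```

With a fixed random source of 0.5, attempt 0 yields 125 ms, attempt 3 yields 1000 ms, and large attempt numbers settle at 15000 ms.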

Acceptance

Companion reconnect success >= 99% in soak.
Canvas survives daemon restart in integration suite.
Voice path never drops assistant reply when TTS fails.
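The "never drops the reply" criterion can be expressed as a pure decision function: the text is always present in the outbound reply, and audio is attached only when synthesis succeeded. `TtsResult`, `OutboundReply`, and `buildReply` are hypothetical names, not the interface of src/models/tts.ts.

```typescript
// Hypothetical result type for a TTS attempt.
type TtsResult =
  | { ok: true; audio: Uint8Array }
  | { ok: false; error: string };

interface OutboundReply {
  text: string;            // always delivered, even when voice fails
  audio?: Uint8Array;      // present only when synthesis succeeded
  degraded: boolean;       // true when the reply fell back to text-only
  degradedReason?: string; // kept for audit and canary metrics
}

// TTS failure degrades the reply to text-only instead of dropping it.
function buildReply(text: string, tts: TtsResult): OutboundReply {
  if (tts.ok) {
    return { text, audio: tts.audio, degraded: false };
  }
  return { text, degraded: true, degradedReason: tts.error };
}
```

Recording `degradedReason` on the reply gives the surface delivery metrics from Phase 0 something concrete to count.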

Phase 4 - Rollout, Hardening, and Operator Readiness

Duration: 1 week

Deliverables

Canary rollout plan by feature flag/surface.
Explicit rollback playbook.
Operator docs and architecture/protocol docs synchronized.

Documentation Updates (required in same PRs)

README.md
docs/api/PROTOCOL.md
docs/architecture/AGENT_DIAGRAM.md
docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md
docs/plans/state.json

Execution and Commit Structure

Branch: feature/deeper-surfaces-integrated-behavior-stack
Atomic commits per slice:
- implementation + tests + docs + state.json together
Rebase onto main before merge.
Fast-forward merge only.

Model-Tier Delegation Plan for Implementation Work

claude-haiku-4.5:
- mechanical schema/test/doc updates.
claude-sonnet-4.6:
- default implementation tasks across queue/routing/companion/canvas.
claude-opus-4.6:
- concurrency semantics review, failure-mode design, and rollout gate design.

Risks and Mitigations

Risk: preemption races create duplicate or orphaned replies.
Mitigation: run-state event model + deterministic cancellation tests.
Risk: proactive rules become noisy.
Mitigation: priority/cooldown/stop semantics + canary thresholds.
Risk: deeper surfaces drift from core behavior semantics.
Mitigation: shared gateway protocol contracts and integration tests across surfaces.

Default Decisions Locked

Keep gateway protocol backward-compatible (additive only).
Prioritize behavior reliability before broadening platform count.
Use companion CLI/runtime path as first deep-surface target.
Keep Pi expansion out of this roadmap until separate canary re-approval.
