flynn/docs/plans/2026-02-25-deeper-end-user-surfaces-and-integrated-behavior-stack-plan.md

# Flynn Deeper End-User Surfaces + Integrated Behavior Stack Plan

Date: 2026-02-25
Status: proposed roadmap
Scope: deepen assistant "product feel" (behavior semantics + user-facing surfaces) without rewriting Flynn core architecture

## Summary

This plan adopts a balanced hybrid strategy:

1. Improve behavior semantics first where correctness risk is highest (interrupt/cancel/run control).
2. In parallel, ship selective deeper user surfaces (companion, canvas persistence, voice continuity).
3. Land each slice with explicit observability gates so rollout decisions are data-driven.

## Why This Plan

Flynn already has strong foundations:

- Queue + session orchestration: `src/gateway/lane-queue.ts`, `src/gateway/session-bridge.ts`
- Multi-path routing and backend fallback: `src/daemon/routing.ts`
- Companion RPC foundation: `src/companion/runtimeClient.ts`, `src/gateway/protocol.ts`
- Canvas API baseline: `src/gateway/handlers/canvas.ts`, `src/gateway/canvas-store.ts`
- Voice in/out primitives: `src/models/media.ts`, `src/models/tts.ts`
- Reactions baseline: `src/automation/reactions.ts`

The largest remaining gap versus an OpenClaw-like "assistant feel" is integration behavior across those systems, not missing foundational architecture.

## Goals and Success Criteria

1. Deterministic active-run control under bursty traffic.
2. A proactive behavior stack that is rich but safe (prioritized, rate-limited, loop-free).
3. Durable end-user surfaces for companion, canvas, and voice.
4. Measurable reliability improvements across canary phases.

Quantitative success gates:

1. Cancel-to-ack p95 <= 500ms on gateway sessions.
2. Duplicate assistant responses caused by run preemption: 0 in integration tests.
3. Reaction false-positive rate <= 3% in canary logs.
4. Companion reconnect success >= 99% in soak tests.
5. Canvas artifact persistence survives daemon restart in integration tests.
6. Voice failures degrade to text-only replies with no dropped responses.

## Out of Scope

1. Full native macOS/iOS/Android app suite in this phase set.
2. Broad protocol redesign or protocol-version breaking changes.
3. Pi backend expansion (kept separate from this roadmap until re-approval).

## Workstreams and Complexity

| Workstream | Complexity | Main Risk |
| --- | --- | --- |
| Run-control semantics unification | High | race conditions and cancellation ordering |
| Reactions + proactive behavior v2 | Medium-High | noisy or looping automation |
| Companion + canvas + voice deepening | High | cross-surface consistency and restart behavior |
| Rollout hardening + observability | Medium | incomplete canary signals |

## Phase 0 - Baseline Instrumentation and Guardrails

Duration: 3-5 days

Execution checklist:
- `docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md`

### Deliverables

1. Add baseline event and metric instrumentation for:
   - run-state transitions,
   - cancellation path timings,
   - reaction match/skip reasons,
   - surface delivery outcomes.
2. Define canary gate calculator inputs for each later phase.
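
To make the "canary gate calculator inputs" concrete, the following is a minimal sketch of how the quantitative success gates could be evaluated from collected metrics. Every name here (`GateInputs`, `evaluateGates`, the gate labels) is a hypothetical illustration rather than an existing Flynn API, and treating the false-positive rate as false positives divided by total matches is an assumption.

```typescript
// All field names below are illustrative assumptions, not Flynn's metric schema.
interface GateInputs {
  cancelToAckMs: number[];        // observed cancel-to-ack latencies
  duplicateResponses: number;     // duplicate final replies in the integration suite
  reactionMatches: number;        // total reaction matches in canary logs
  reactionFalsePositives: number; // matches judged to be false positives
  reconnectAttempts: number;
  reconnectSuccesses: number;
}

interface GateResult {
  name: string;
  value: number;
  pass: boolean;
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN;
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? NaN : numerator / denominator;
}

function evaluateGates(m: GateInputs): GateResult[] {
  const p95 = percentile(m.cancelToAckMs, 95);
  const falsePositiveRate = ratio(m.reactionFalsePositives, m.reactionMatches);
  const reconnectRate = ratio(m.reconnectSuccesses, m.reconnectAttempts);
  // Comparisons against NaN are false, so a gate with no data fails closed.
  return [
    { name: "cancel_to_ack_p95_ms", value: p95, pass: p95 <= 500 },
    { name: "duplicate_responses", value: m.duplicateResponses, pass: m.duplicateResponses === 0 },
    { name: "reaction_false_positive_rate", value: falsePositiveRate, pass: falsePositiveRate <= 0.03 },
    { name: "companion_reconnect_rate", value: reconnectRate, pass: reconnectRate >= 0.99 },
  ];
}
```

Failing closed on missing data is a deliberate choice in this sketch: a phase with no canary samples should not be able to pass its gate by default.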

### Files

1. `src/audit/types.ts`
2. `src/audit/logger.ts`
3. `src/gateway/metrics.ts`
4. `docs/api/PROTOCOL.md` (event semantics if needed)
5. `docs/plans/state.json`

### Acceptance

1. Baseline report generated before behavior changes.
2. No runtime behavior changes in this phase, only observability.

## Phase 1 - Run-Control Semantics + Gateway UX Signals

Duration: 2-3 weeks

### Objectives

1. Enforce "latest wins" semantics when queue policy is `interrupt`.
2. Align cancellation behavior between gateway and channel router paths.
3. Expose user-visible run lifecycle state in gateway events/UI.

### Implementation

1. Queue semantics hardening:
   - `src/gateway/lane-queue.ts`
   - ensure queued + active behavior rules are explicit and testable.
2. Active run cancellation wiring:
   - `src/gateway/handlers/agent.ts`
   - `src/gateway/session-bridge.ts`
   - `src/daemon/routing.ts` (`activeRuns` cancellation parity with the gateway path).
3. Event surface:
   - add additive `run_state` event in `src/gateway/protocol.ts`
   - consume/render in `src/gateway/ui/pages/chat.js`.
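
The "latest wins" rule for the `interrupt` policy can be sketched as a small state machine. This is an illustration under assumptions, not the behavior of `lane-queue.ts`: the class name `LatestWinsLane`, the state names, and the shape of the `run_state` payload are all hypothetical.

```typescript
// Hypothetical run_state payload; the real event shape is not defined in this plan.
type RunState = "started" | "cancelled" | "completed";

interface RunStateEvent {
  type: "run_state";
  runId: string;
  state: RunState;
}

class LatestWinsLane {
  private activeRunId: string | null = null;
  readonly events: RunStateEvent[] = [];

  // A new submission preempts whatever run is currently active.
  submit(runId: string): void {
    if (this.activeRunId !== null) {
      this.emit(this.activeRunId, "cancelled");
    }
    this.activeRunId = runId;
    this.emit(runId, "started");
  }

  // Returns true only when the reply belongs to the run that is still active.
  // A preempted run that finishes late gets false, so its reply is dropped
  // instead of being delivered as a duplicate.
  complete(runId: string): boolean {
    if (runId !== this.activeRunId) return false;
    this.activeRunId = null;
    this.emit(runId, "completed");
    return true;
  }

  private emit(runId: string, state: RunState): void {
    this.events.push({ type: "run_state", runId, state });
  }
}
```

Gating reply delivery on the return value of `complete` is what ties this to the "zero duplicate final responses" criterion: cancellation may race with completion, but only one run can ever pass the check.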

### Test Plan

1. `src/gateway/lane-queue.test.ts`: preemption ordering, overflow with interrupt, debounce edge cases.
2. `src/gateway/handlers/agent.test.ts`: interrupt + active cancel + queued supersede flows.
3. `src/daemon/routing.test.ts`: channel-path cancellation parity.
4. `src/gateway/ui/pages/chat.test.ts`: `run_state` rendering and transitions.

### Acceptance

1. Cancel-to-ack p95 <= 500ms.
2. Zero duplicate final responses in integration suite.
3. Backward compatibility for clients ignoring `run_state`.
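
Backward compatibility for clients that ignore `run_state` depends on those clients skipping unknown event types rather than failing on them. A minimal sketch of that client-side pattern follows; the `GatewayEvent` envelope, the `message` event type, and its `text` field are hypothetical stand-ins, not the contents of `protocol.ts`.

```typescript
// Hypothetical event envelope: a type tag plus arbitrary payload fields.
interface GatewayEvent {
  type: string;
  [key: string]: unknown;
}

// Models an older client that predates run_state.
function legacyClientRender(events: GatewayEvent[]): string[] {
  const rendered: string[] = [];
  for (const ev of events) {
    switch (ev.type) {
      case "message":
        rendered.push(String(ev.text));
        break;
      default:
        // Unknown (newer) event types such as run_state are skipped, not treated as errors.
        break;
    }
  }
  return rendered;
}
```

A compatibility test in this style would replay a stream that interleaves `run_state` events with existing events and assert that the legacy output is unchanged.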

## Phase 2 - Reactions and Proactive Behavior Stack V2

Duration: 2 weeks

### Objectives

1. Replace first-match reaction behavior with deterministic priority + cooldown semantics.
2. Keep announce delivery safe and auditable.
3. Prevent recursion/looping behavior.

### Config/API Additions (backward-compatible)

Extend `automation.reactions[]` in `src/config/schema.ts` with:

1. `priority` (number, default `100`)
2. `cooldown_ms` (number, default `0`)
3. `stop_on_match` (boolean, default `true`)

Existing fields remain valid and unchanged.
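
One way to keep the additions backward-compatible is to treat all three fields as optional in config and fill the defaults at load time. The sketch below assumes a plain TypeScript normalizer; `ReactionRuleConfig`, `withReactionDefaults`, and the `pattern` field (standing in for the existing fields) are hypothetical, and the actual validation approach in `src/config/schema.ts` may differ.

```typescript
interface ReactionRuleConfig {
  pattern: string;         // hypothetical stand-in for the existing rule fields
  priority?: number;       // new, optional
  cooldown_ms?: number;    // new, optional
  stop_on_match?: boolean; // new, optional
}

type ResolvedReactionRule = Required<ReactionRuleConfig>;

// Fills the documented defaults so older configs load without changes.
function withReactionDefaults(rule: ReactionRuleConfig): ResolvedReactionRule {
  return {
    pattern: rule.pattern,
    priority: rule.priority ?? 100,
    cooldown_ms: rule.cooldown_ms ?? 0,
    stop_on_match: rule.stop_on_match ?? true,
  };
}
```

With `stop_on_match` defaulting to `true` and equal default priorities, a config that sets none of the new fields would plausibly keep today's first-match behavior, provided ties are broken by config order.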

### Implementation

1. Reaction engine expansion:
   - `src/automation/reactions.ts` (or split `reactionEngine.ts` if needed).
2. Routing integration:
   - `src/daemon/routing.ts` deterministic reaction resolution.
3. Delivery consistency:
   - `src/automation/cron.ts`
   - `src/automation/webhooks.ts`
   - preserve `delivery_mode` semantics and audit metadata.
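
Deterministic resolution with priority, cooldown, and stop semantics might look like the sketch below. Several choices are assumptions that this plan does not settle: a lower `priority` number wins, ties keep config order, and a rule that is cooling down is skipped without blocking lower-priority rules. `ReactionResolver` and the `Rule` shape are hypothetical names.

```typescript
interface Rule {
  id: string;
  pattern: RegExp;      // assumed non-global, so test() carries no state between calls
  priority: number;     // assumption: lower number is evaluated first
  cooldown_ms: number;
  stop_on_match: boolean;
}

class ReactionResolver {
  private ordered: Rule[];
  private lastFired: Record<string, number> = {};

  constructor(rules: Rule[]) {
    // Array.prototype.sort is stable, so equal priorities keep config order.
    this.ordered = rules.slice().sort((a, b) => a.priority - b.priority);
  }

  // Returns the ids of the rules that fire for this text at time nowMs.
  resolve(text: string, nowMs: number): string[] {
    const fired: string[] = [];
    for (const rule of this.ordered) {
      if (!rule.pattern.test(text)) continue;
      const last = this.lastFired[rule.id];
      // A rule inside its cooldown window is suppressed and evaluation continues.
      if (last !== undefined && nowMs - last < rule.cooldown_ms) continue;
      this.lastFired[rule.id] = nowMs;
      fired.push(rule.id);
      if (rule.stop_on_match) break;
    }
    return fired;
  }
}
```

Passing `nowMs` in explicitly, instead of reading the clock inside `resolve`, keeps cooldown behavior reproducible in tests.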

### Test Plan

1. `src/automation/reactions.test.ts`:
   - priority conflict resolution,
   - cooldown suppression,
   - metadata and template rendering.
2. `src/daemon/routing.test.ts`:
   - reaction trigger integration and command-path exclusion.
3. `src/automation/cron.test.ts` / `src/automation/webhooks.test.ts`:
   - announce/isolation metadata correctness.

### Acceptance

1. False-positive match rate <= 3% in canary.
2. No reaction recursion loops.
3. Deterministic rule selection under overlap.

## Phase 3 - Deeper Surfaces: Companion, Canvas Durability, Voice Continuity

Duration: 3-4 weeks

### Objectives

1. Upgrade companion from heartbeat-only utility to reliable daily-use surface.
2. Make canvas artifacts durable across restart.
3. Improve voice continuity behavior around cancellation and fallbacks.

### Implementation

1. Companion hardening:
   - `src/cli/companion.ts`
   - `src/companion/runtimeClient.ts`
   - `src/gateway/handlers/node.ts`
   - focus on reconnect and subscription resilience.
2. Canvas persistence:
   - `src/gateway/canvas-store.ts` (durable backing instead of in-memory only)
   - `src/gateway/handlers/canvas.ts`
   - UI rendering/inspection in `src/gateway/ui/pages/chat.js` (or dedicated canvas page).
3. Voice continuity:
   - `src/daemon/routing.ts` (talk-mode + cancellation + output behavior)
   - `src/models/tts.ts`
   - channel adapter output checks where required.
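
For the companion reconnect work, a capped exponential backoff with jitter is a common starting point. The sketch below is an assumption about approach, not a description of `runtimeClient.ts`; `reconnectDelayMs` and `BackoffOptions` are hypothetical names, and the random source is injectable so tests can be deterministic.

```typescript
interface BackoffOptions {
  baseMs: number; // delay before the first retry
  maxMs: number;  // upper bound on any single delay
  jitter: number; // fraction of the delay that may be subtracted, in [0, 1]
}

// attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
function reconnectDelayMs(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const capped = Math.min(opts.maxMs, opts.baseMs * Math.pow(2, attempt));
  // Jitter spreads the delay over [capped * (1 - jitter), capped] so that
  // many companions reconnecting at once do not retry in lockstep.
  const spread = capped * opts.jitter * random();
  return Math.round(capped - spread);
}
```

Subscription resilience would sit on top of this: after a successful reconnect the client would need to replay its active subscriptions, which this sketch does not cover.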

### Test Plan

1. Companion integration tests for reconnect and event continuity.
2. Canvas store tests for restart durability and eviction policy.
3. Voice tests for TTS errors, fallback to text, and interrupted runs.
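
The restart-durability and eviction tests for the canvas store could be written against an injected backing, so that a "restart" is simply a new store instance over the same backing. The sketch below assumes that design; `Backing`, `MemoryBacking`, `DurableCanvasStore`, the artifact fields, and oldest-first eviction are all hypothetical and not taken from `canvas-store.ts`.

```typescript
// Hypothetical persistence seam; a real backing might wrap a file or a database.
interface Backing {
  read(): string | null;
  write(data: string): void;
}

interface Artifact {
  id: string;
  content: string;
  updatedAt: number;
}

class MemoryBacking implements Backing {
  private data: string | null = null;
  read(): string | null {
    return this.data;
  }
  write(data: string): void {
    this.data = data;
  }
}

class DurableCanvasStore {
  private backing: Backing;
  private maxArtifacts: number;
  private artifacts: Artifact[] = [];

  constructor(backing: Backing, maxArtifacts: number) {
    this.backing = backing;
    this.maxArtifacts = maxArtifacts;
    // Reload whatever a previous instance persisted.
    const raw = backing.read();
    if (raw !== null) this.artifacts = JSON.parse(raw) as Artifact[];
  }

  put(artifact: Artifact): void {
    // Replace any existing artifact with the same id.
    this.artifacts = this.artifacts.filter((a) => a.id !== artifact.id);
    this.artifacts.push(artifact);
    // Assumed eviction policy: drop the least recently updated beyond the cap.
    this.artifacts.sort((a, b) => a.updatedAt - b.updatedAt);
    while (this.artifacts.length > this.maxArtifacts) this.artifacts.shift();
    // Persist on every write so a restart loses nothing that was acknowledged.
    this.backing.write(JSON.stringify(this.artifacts));
  }

  get(id: string): Artifact | undefined {
    for (const a of this.artifacts) {
      if (a.id === id) return a;
    }
    return undefined;
  }
}
```

Rewriting the full snapshot on each `put` is the simplest durable option and is only reasonable for small stores; an append-only log or a database would be the usual next step.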

### Acceptance

1. Companion reconnect success >= 99% in soak.
2. Canvas survives daemon restart in integration suite.
3. Voice path never drops assistant reply when TTS fails.
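
The "never drops the reply" criterion amounts to treating audio as an optional enhancement of a text reply that is always delivered. A minimal sketch follows, with the synthesizer injected and kept synchronous for brevity (a real TTS call would be asynchronous); `deliverReply` and `Delivery` are hypothetical names, not the API of `src/models/tts.ts`.

```typescript
interface Delivery {
  text: string;       // always present
  audio?: Uint8Array; // present only when synthesis succeeded
  degraded: boolean;  // true when the reply fell back to text-only
}

// synthesize is injected so failures can be simulated in tests.
function deliverReply(
  text: string,
  synthesize: (text: string) => Uint8Array,
): Delivery {
  try {
    const audio = synthesize(text);
    return { text, audio, degraded: false };
  } catch (_err) {
    // A TTS failure degrades the reply to text-only instead of dropping it.
    return { text, degraded: true };
  }
}
```

The `degraded` flag gives the observability layer something to count, so fallback frequency can feed the canary gates.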

## Phase 4 - Rollout, Hardening, and Operator Readiness

Duration: 1 week

### Deliverables

1. Canary rollout plan by feature flag/surface.
2. Explicit rollback playbook.
3. Operator docs and architecture/protocol docs synchronized.

### Documentation Updates (required in same PRs)

1. `README.md`
2. `docs/api/PROTOCOL.md`
3. `docs/architecture/AGENT_DIAGRAM.md`
4. `docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md`
5. `docs/plans/state.json`

## Execution and Commit Structure

1. Branch: `feature/deeper-surfaces-integrated-behavior-stack`
2. Atomic commits per slice:
   - implementation + tests + docs + `state.json` together
3. Rebase onto `main` before merge.
4. Fast-forward merge only.

## Model-Tier Delegation Plan for Implementation Work

1. `claude-haiku-4.5`:
   - mechanical schema/test/doc updates.
2. `claude-sonnet-4.6`:
   - default implementation tasks across queue/routing/companion/canvas.
3. `claude-opus-4.6`:
   - concurrency semantics review, failure-mode design, and rollout gate design.

## Risks and Mitigations

1. Risk: preemption races create duplicate or orphaned replies.
   Mitigation: run-state event model + deterministic cancellation tests.
2. Risk: proactive rules become noisy.
   Mitigation: priority/cooldown/stop semantics + canary thresholds.
3. Risk: deeper surfaces drift from core behavior semantics.
   Mitigation: shared gateway protocol contracts and integration tests across surfaces.

## Default Decisions Locked

1. Keep gateway protocol backward-compatible (additive only).
2. Prioritize behavior reliability before broadening platform count.
3. Use companion CLI/runtime path as first deep-surface target.
4. Keep Pi expansion out of this roadmap until separate canary re-approval.