flynn/docs/plans/2026-02-25-deeper-end-user-surfaces-and-integrated-behavior-stack-plan.md
2026-02-25 00:12:31 -08:00


Flynn Deeper End-User Surfaces + Integrated Behavior Stack Plan

Date: 2026-02-25
Status: proposed roadmap
Scope: deepen the assistant "product feel" (behavior semantics + user-facing surfaces) without rewriting Flynn's core architecture

Summary

This plan adopts a balanced hybrid strategy:

  1. Improve behavior semantics first where correctness risk is highest (interrupt/cancel/run control).
  2. In parallel, ship selective deeper user surfaces (companion, canvas persistence, voice continuity).
  3. Land each slice with explicit observability gates so rollout decisions are data-driven.

Why This Plan

Flynn already has strong foundations:

  • Queue + session orchestration: src/gateway/lane-queue.ts, src/gateway/session-bridge.ts
  • Multi-path routing and backend fallback: src/daemon/routing.ts
  • Companion RPC foundation: src/companion/runtimeClient.ts, src/gateway/protocol.ts
  • Canvas API baseline: src/gateway/handlers/canvas.ts, src/gateway/canvas-store.ts
  • Voice in/out primitives: src/models/media.ts, src/models/tts.ts
  • Reactions baseline: src/automation/reactions.ts

The largest remaining gap versus an OpenClaw-like "assistant feel" is integrated behavior across these systems, not missing foundational architecture.

Goals and Success Criteria

  1. Deterministic active-run control under bursty traffic
  2. Rich, safe proactive behavior stack
  3. Durable end-user surfaces for companion/canvas/voice
  4. Measurable reliability improvements across canary phases

Quantitative success gates:

  1. Cancel-to-ack p95 <= 500ms on gateway sessions.
  2. Duplicate assistant responses caused by run preemption: 0 in integration tests.
  3. Reaction false-positive rate <= 3% in canary logs.
  4. Companion reconnect success >= 99% in soak tests.
  5. Canvas artifacts persist across daemon restarts in integration tests.
  6. Voice failures degrade to text-only replies with no dropped responses.
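Gate 1 is only comparable across canary phases if everyone computes p95 the same way. A minimal sketch, assuming the nearest-rank percentile definition; percentile and evaluateCancelAckGate are hypothetical helper names, not existing Flynn code:

```typescript
// Hypothetical canary gate helpers; not part of the current Flynn codebase.

/** Nearest-rank percentile: the smallest sample with at least p% of samples <= it. */
function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of an empty sample set");
  if (!(p > 0 && p <= 100)) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so an integer p keeps the rank computation exact.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based
  return sorted[rank - 1];
}

interface GateResult {
  p95Ms: number;
  thresholdMs: number;
  pass: boolean;
}

/** Success gate 1: cancel-to-ack p95 <= 500 ms on gateway sessions. */
function evaluateCancelAckGate(cancelToAckMs: readonly number[], thresholdMs = 500): GateResult {
  const p95Ms = percentile(cancelToAckMs, 95);
  return { p95Ms, thresholdMs, pass: p95Ms <= thresholdMs };
}

// Usage: 20 samples with one slow outlier.
const result = evaluateCancelAckGate([...Array<number>(19).fill(120), 900]);
console.log(result); // { p95Ms: 120, thresholdMs: 500, pass: true }
```

Nearest-rank always returns an observed sample, so the gate never reports an interpolated latency that no user experienced. With 20 samples, a single outlier above 500ms still passes while two do not.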

Out of Scope

  1. Full native macOS/iOS/Android app suite in this phase set.
  2. Broad protocol redesign or protocol-version breaking changes.
  3. Pi backend expansion (kept separate from this roadmap until re-approval).

Workstreams and Complexity

Workstream | Complexity | Main risk
Run-control semantics unification | High | Race conditions and cancellation ordering
Reactions + proactive behavior v2 | Medium-High | Noisy or looping automation
Companion + canvas + voice deepening | High | Cross-surface consistency and restart behavior
Rollout hardening + observability | Medium | Incomplete canary signals

Phase 0 - Baseline Instrumentation and Guardrails

Duration: 3-5 days

Execution checklist:

  • docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md

Deliverables

  1. Add baseline event and metric instrumentation for:
    • run-state transitions,
    • cancellation path timings,
    • reaction match/skip reasons,
    • surface delivery outcomes.
  2. Define canary gate calculator inputs for each later phase.
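The four instrumentation targets above could share one discriminated event union so the canary gate calculator reads a single stream. A sketch under stated assumptions: the event names, fields, and the in-memory EventSink are hypothetical and do not describe the current src/audit/types.ts or src/audit/logger.ts:

```typescript
// Hypothetical Phase 0 event shapes; not the current src/audit/types.ts.
type RunState = "queued" | "running" | "cancelling" | "cancelled" | "completed" | "failed";
type Surface = "gateway" | "companion" | "canvas" | "voice";

type ObservabilityEvent =
  | { kind: "run_state_transition"; runId: string; from: RunState; to: RunState; at: number }
  | { kind: "cancel_timing"; runId: string; requestedAt: number; ackedAt: number }
  | { kind: "reaction_decision"; ruleId: string; outcome: "matched" | "skipped"; reason: string; at: number }
  | { kind: "surface_delivery"; surface: Surface; ok: boolean; error?: string; at: number };

type CancelTiming = Extract<ObservabilityEvent, { kind: "cancel_timing" }>;

/** In-memory sink for illustration; a real build would forward to the audit logger. */
class EventSink {
  readonly events: ObservabilityEvent[] = [];

  emit(event: ObservabilityEvent): void {
    this.events.push(event); // record only: Phase 0 must not change runtime behavior
  }

  /** Cancel-to-ack latencies in ms, the input to the cancel-to-ack p95 gate. */
  cancelToAckMs(): number[] {
    return this.events
      .filter((e): e is CancelTiming => e.kind === "cancel_timing")
      .map((e) => e.ackedAt - e.requestedAt);
  }

  /** Skip-reason counts, used to tune reaction false positives. */
  skipReasons(): Map<string, number> {
    const counts = new Map<string, number>();
    for (const e of this.events) {
      if (e.kind === "reaction_decision" && e.outcome === "skipped") {
        counts.set(e.reason, (counts.get(e.reason) ?? 0) + 1);
      }
    }
    return counts;
  }
}
```

Because Phase 0 must not change runtime behavior, a sink like this only records; it never rejects an event or alters the run it describes.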

Files

  1. src/audit/types.ts
  2. src/audit/logger.ts
  3. src/gateway/metrics.ts
  4. docs/api/PROTOCOL.md (event semantics if needed)
  5. docs/plans/state.json

Acceptance

  1. A baseline report is generated before any behavior changes land.
  2. No runtime behavior changes in this phase; observability only.

Phase 1 - Run-Control Semantics + Gateway UX Signals

Duration: 2-3 weeks

Objectives

  1. Enforce "latest wins" semantics when the queue policy is interrupt.
  2. Align cancellation behavior between gateway and channel router paths.
  3. Expose user-visible run lifecycle state in gateway events/UI.
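One way to make "latest wins" concrete for a single session lane under the interrupt policy is to abort the in-flight run on every new submission and discard results from any superseded run. A simplified sketch: LatestWinsLane and RunFn are hypothetical and ignore queue depth, debounce, and overflow, which src/gateway/lane-queue.ts would still need to handle; AbortController is the standard global available in Node 15+ and browsers:

```typescript
// Hypothetical sketch; does not mirror src/gateway/lane-queue.ts.
type RunFn = (input: string, signal: AbortSignal) => Promise<string>;

class LatestWinsLane {
  private active: AbortController | null = null;

  constructor(private readonly run: RunFn) {}

  /**
   * Start a run for `input`, aborting any run already in flight.
   * Resolves to the reply, or to null if this run was superseded.
   */
  async submit(input: string): Promise<string | null> {
    this.active?.abort(); // preempt: latest wins
    const controller = new AbortController();
    this.active = controller;
    try {
      const reply = await this.run(input, controller.signal);
      // A run that ignored its abort signal may still resolve; drop its late reply.
      return controller.signal.aborted ? null : reply;
    } catch (err) {
      if (controller.signal.aborted) return null; // cancellation is not an error
      throw err;
    } finally {
      if (this.active === controller) this.active = null;
    }
  }
}
```

Treating an aborted run as a null result rather than an error keeps cancellation from surfacing as a failure, and checking the signal after the await drops a late reply from a run that ignored its abort signal, which is one source of duplicate final responses.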

Implementation

  1. Queue semantics hardening:
    • src/gateway/lane-queue.ts
    • make the rules for queued and active runs explicit and testable.
  2. Active run cancellation wiring:
    • src/gateway/handlers/agent.ts
    • src/gateway/session-bridge.ts
    • src/daemon/routing.ts (activeRuns parity behavior).
  3. Event surface:
    • add an additive run_state event in src/gateway/protocol.ts
    • consume and render it in src/gateway/ui/pages/chat.js.
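A client-side reducer for run_state might look like the following. It assumes a hypothetical event shape (runId, state, and a per-session seq counter) that is not yet defined in src/gateway/protocol.ts, and it illustrates the additive contract: anything that is not a well-formed run_state event leaves this view unchanged, so existing events keep flowing through their current handlers.

```typescript
// Hypothetical run_state shape; not yet part of src/gateway/protocol.ts.
type RunStateName = "queued" | "running" | "cancelling" | "cancelled" | "completed" | "failed";

interface RunStateEvent {
  type: "run_state";
  runId: string;
  state: RunStateName;
  seq: number; // assumed monotonic per session
}

interface ChatView {
  runId: string | null;
  state: RunStateName | "idle";
  lastSeq: number;
}

const RUN_STATES: ReadonlySet<string> = new Set<string>([
  "queued", "running", "cancelling", "cancelled", "completed", "failed",
]);

function isRunStateEvent(msg: unknown): msg is RunStateEvent {
  if (typeof msg !== "object" || msg === null) return false;
  const { type, runId, state, seq } = msg as Record<string, unknown>;
  return type === "run_state" && typeof runId === "string" &&
    typeof state === "string" && RUN_STATES.has(state) && typeof seq === "number";
}

/** Fold one gateway message into the view; non-run_state messages are ignored here. */
function applyEvent(view: ChatView, msg: unknown): ChatView {
  if (!isRunStateEvent(msg)) return view;
  if (msg.seq <= view.lastSeq) return view; // stale or duplicate transition
  return { runId: msg.runId, state: msg.state, lastSeq: msg.seq };
}
```

The seq check lets the UI discard stale or duplicated transitions when events arrive out of order after a reconnect.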

Test Plan

  1. src/gateway/lane-queue.test.ts: preemption ordering, overflow with interrupt, debounce edge cases.
  2. src/gateway/handlers/agent.test.ts: interrupt + active cancel + queued supersede flows.
  3. src/daemon/routing.test.ts: channel-path cancellation parity.
  4. src/gateway/ui/pages/chat.test.ts: run_state rendering and transitions.

Acceptance

  1. Cancel-to-ack p95 <= 500ms.
  2. Zero duplicate final responses in integration suite.
  3. Clients that ignore run_state keep working unchanged (backward compatibility).

Phase 2 - Reactions and Proactive Behavior Stack V2

Duration: 2 weeks

Objectives

  1. Replace first-match reaction behavior with deterministic priority + cooldown semantics.
  2. Keep announce delivery safe and auditable.
  3. Prevent recursion/looping behavior.

Config/API Additions (backward-compatible)

Extend automation.reactions[] in src/config/schema.ts with:

  1. priority (number, default 100)
  2. cooldown_ms (number, default 0)
  3. stop_on_match (boolean, default true)

Existing fields remain valid and unchanged.
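Defaults for the three new fields could be applied in one normalization step so that existing configs load unchanged. A sketch: only priority, cooldown_ms, and stop_on_match come from this plan; id and match stand in for existing rule fields, and normalizeReactionRule is a hypothetical name, not the schema.ts API:

```typescript
// Hypothetical normalization; id and match are placeholders for existing fields.
interface ReactionRuleInput {
  id: string;
  match: string;
  priority?: number;
  cooldown_ms?: number;
  stop_on_match?: boolean;
}

type ReactionRule = Required<ReactionRuleInput>;

function normalizeReactionRule(input: ReactionRuleInput): ReactionRule {
  const priority = input.priority ?? 100; // plan default
  const cooldown_ms = input.cooldown_ms ?? 0; // plan default: no cooldown
  if (!Number.isFinite(priority)) {
    throw new Error(`reaction ${input.id}: priority must be a finite number`);
  }
  if (!Number.isFinite(cooldown_ms) || cooldown_ms < 0) {
    throw new Error(`reaction ${input.id}: cooldown_ms must be a non-negative number`);
  }
  // stop_on_match defaults to true, preserving today's first-match behavior.
  return { ...input, priority, cooldown_ms, stop_on_match: input.stop_on_match ?? true };
}
```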

Implementation

  1. Reaction engine expansion:
    • src/automation/reactions.ts (or split out reactionEngine.ts if needed).
  2. Routing integration:
    • src/daemon/routing.ts: deterministic reaction resolution.
  3. Delivery consistency:
    • src/automation/cron.ts
    • src/automation/webhooks.ts
    • preserve delivery_mode semantics and audit metadata.
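Deterministic resolution with priority, cooldown, and stop_on_match can be sketched as an ordered scan. The plan does not fix these details, so the following are assumptions: a higher priority value is evaluated first, ties keep config order, matching is a case-insensitive substring test, and messages whose origin is "reaction" never trigger rules (a simple loop guard). ReactionResolver is hypothetical:

```typescript
// Hypothetical resolver; ordering, matching, and origin tagging are assumptions.
interface Rule {
  id: string;
  match: string;
  priority: number;
  cooldown_ms: number;
  stop_on_match: boolean;
}

interface Message {
  text: string;
  origin: "user" | "reaction" | "system";
}

class ReactionResolver {
  private readonly lastFired = new Map<string, number>();
  private readonly ordered: Rule[];

  constructor(rules: readonly Rule[]) {
    // Array.prototype.sort is stable (ES2019+), so equal priorities keep config order.
    this.ordered = [...rules].sort((a, b) => b.priority - a.priority);
  }

  /** Return the rules that fire for `msg` at time `now` (ms), in firing order. */
  resolve(msg: Message, now: number): Rule[] {
    if (msg.origin === "reaction") return []; // loop guard: never react to reaction output
    const fired: Rule[] = [];
    const text = msg.text.toLowerCase();
    for (const rule of this.ordered) {
      if (!text.includes(rule.match.toLowerCase())) continue;
      const last = this.lastFired.get(rule.id);
      if (last !== undefined && now - last < rule.cooldown_ms) continue; // cooldown suppression
      fired.push(rule);
      this.lastFired.set(rule.id, now);
      if (rule.stop_on_match) break;
    }
    return fired;
  }
}
```

In this sketch a rule suppressed by its cooldown does not end the scan, so a lower-priority rule can still fire; if a cooling-down rule should also block lower rules, the cooldown branch would break instead of continuing.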

Test Plan

  1. src/automation/reactions.test.ts:
    • priority conflict resolution,
    • cooldown suppression,
    • metadata and template rendering.
  2. src/daemon/routing.test.ts:
    • reaction trigger integration and command-path exclusion.
  3. src/automation/cron.test.ts / src/automation/webhooks.test.ts:
    • announce/isolation metadata correctness.

Acceptance

  1. False-positive match rate <= 3% in canary.
  2. No reaction recursion loops.
  3. Deterministic rule selection under overlap.

Phase 3 - Deeper Surfaces: Companion, Canvas Durability, Voice Continuity

Duration: 3-4 weeks

Objectives

  1. Upgrade the companion from a heartbeat-only utility to a reliable daily-use surface.
  2. Make canvas artifacts durable across restart.
  3. Improve voice continuity behavior around cancellation and fallbacks.
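Reconnect resilience usually combines capped exponential backoff with jitter, so that many companions do not retry in lockstep after a gateway restart, plus a resubscribe step after each successful connect. A sketch assuming "full jitter" backoff; backoffDelayMs, reconnectWithBackoff, and their parameters are hypothetical, not the runtimeClient.ts API:

```typescript
// Hypothetical reconnect helpers; defaults and names are illustrative.
interface BackoffOptions {
  baseMs: number;
  capMs: number;
}

/** Full jitter: a uniform delay in [0, min(cap, base * 2^attempt)). */
function backoffDelayMs(attempt: number, opts: BackoffOptions, random: () => number = Math.random): number {
  const ceiling = Math.min(opts.capMs, opts.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

/**
 * Retry `connect` until it succeeds, then replay subscriptions so the client
 * resumes receiving events. Resolves to the number of attempts used.
 */
async function reconnectWithBackoff(
  connect: () => Promise<void>,
  resubscribe: () => Promise<void>,
  opts: BackoffOptions & { maxAttempts: number },
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<number> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      await connect();
      await resubscribe(); // a failed resubscribe counts as a failed attempt
      return attempt + 1;
    } catch {
      // Skip the pointless wait after the final failed attempt.
      if (attempt + 1 < opts.maxAttempts) await sleep(backoffDelayMs(attempt, opts));
    }
  }
  throw new Error(`companion reconnect failed after ${opts.maxAttempts} attempts`);
}
```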

Implementation

  1. Companion hardening:
    • src/cli/companion.ts
    • src/companion/runtimeClient.ts
    • src/gateway/handlers/node.ts
    • focus on reconnect and subscription resilience.
  2. Canvas persistence:
    • src/gateway/canvas-store.ts (durable backing instead of in-memory only)
    • src/gateway/handlers/canvas.ts
    • UI rendering/inspection in src/gateway/ui/pages/chat.js (or dedicated canvas page).
  3. Voice continuity:
    • src/daemon/routing.ts (talk-mode + cancellation + output behavior)
    • src/models/tts.ts
    • channel adapter output checks where required.
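For canvas durability, a small file-backed store is one option: keep the in-memory map for reads, rewrite a JSON snapshot on every write via a temp file plus rename, and evict the least-recently-written artifacts beyond a cap. FileCanvasStore, CanvasArtifact, and maxArtifacts are hypothetical; the real canvas-store.ts interface and eviction policy may differ:

```typescript
import { existsSync, mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical durable store; not the current src/gateway/canvas-store.ts.
interface CanvasArtifact {
  id: string;
  content: string;
  updatedAt: number;
}

class FileCanvasStore {
  private readonly artifacts = new Map<string, CanvasArtifact>();
  private readonly file: string;

  constructor(dir: string, private readonly maxArtifacts = 200) {
    mkdirSync(dir, { recursive: true });
    this.file = join(dir, "canvas.json");
    if (existsSync(this.file)) {
      // Snapshot order is write order, so recency survives a restart.
      const saved = JSON.parse(readFileSync(this.file, "utf8")) as CanvasArtifact[];
      for (const a of saved) this.artifacts.set(a.id, a);
    }
  }

  put(artifact: CanvasArtifact): void {
    this.artifacts.delete(artifact.id); // re-insert so Map order tracks recency
    this.artifacts.set(artifact.id, artifact);
    while (this.artifacts.size > this.maxArtifacts) {
      const oldest = this.artifacts.keys().next().value as string;
      this.artifacts.delete(oldest);
    }
    this.flush();
  }

  get(id: string): CanvasArtifact | undefined {
    return this.artifacts.get(id);
  }

  private flush(): void {
    const tmp = `${this.file}.tmp`;
    writeFileSync(tmp, JSON.stringify([...this.artifacts.values()]));
    renameSync(tmp, this.file); // replaces the old snapshot in one step
  }
}
```

rename replaces the target atomically on POSIX filesystems, so a crash mid-write leaves the previous snapshot intact. The sketch does not fsync, which covers daemon restarts but not power loss, and rewriting the whole snapshot per write is only reasonable while the artifact cap stays small.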

Test Plan

  1. Companion integration tests for reconnect and event continuity.
  2. Canvas store tests for restart durability and eviction policy.
  3. Voice tests for TTS errors, fallback to text, and interrupted runs.

Acceptance

  1. Companion reconnect success >= 99% in soak.
  2. Canvas survives daemon restart in integration suite.
  3. Voice path never drops assistant reply when TTS fails.
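The "never drop the reply" rule can be enforced by treating every TTS outcome other than success (error, abort, or timeout) as a text-only reply. A sketch; replyWithVoiceFallback, the Reply shape, and the 5 s default timeout are assumptions rather than the src/models/tts.ts interface:

```typescript
// Hypothetical fallback wrapper; synthesize stands in for the real TTS call.
interface Reply {
  text: string;
  audio?: Uint8Array;
  degraded: boolean;
  ttsError?: string;
}

async function replyWithVoiceFallback(
  text: string,
  synthesize: (text: string, signal: AbortSignal) => Promise<Uint8Array>,
  timeoutMs = 5000,
): Promise<Reply> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort(); // ask the provider to stop
      reject(new Error(`TTS timed out after ${timeoutMs} ms`));
    }, timeoutMs);
  });
  try {
    const audio = await Promise.race([synthesize(text, controller.signal), timeout]);
    return { text, audio, degraded: false };
  } catch (err) {
    // Any TTS failure still yields the text reply, so nothing is dropped.
    return { text, degraded: true, ttsError: err instanceof Error ? err.message : String(err) };
  } finally {
    clearTimeout(timer);
  }
}
```

Racing against a timer, rather than relying only on the abort signal, keeps the fallback working even if a TTS provider ignores cancellation.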

Phase 4 - Rollout, Hardening, and Operator Readiness

Duration: 1 week

Deliverables

  1. Canary rollout plan by feature flag/surface.
  2. Explicit rollback playbook.
  3. Operator docs and architecture/protocol docs synchronized.
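A canary plan by flag and surface needs stable assignment, so a session does not flip between old and new behavior from one message to the next. A sketch that hashes flag name plus session id into a bucket from 0 to 99; CanaryConfig, isEnabled, and the killSwitch rollback field are hypothetical, and FNV-1a (computed here over UTF-16 code units) is used only as a cheap, non-cryptographic hash:

```typescript
// Hypothetical canary assignment; flag names and config shape are illustrative.
type Surface = "gateway" | "companion" | "canvas" | "voice";

interface CanaryConfig {
  flag: string;
  surfaces: Partial<Record<Surface, number>>; // rollout percentage 0-100 per surface
  killSwitch: boolean; // rollback: force the flag off everywhere
}

/** 32-bit FNV-1a over UTF-16 code units; stable across processes. */
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

function isEnabled(cfg: CanaryConfig, surface: Surface, sessionId: string): boolean {
  if (cfg.killSwitch) return false;
  const percent = cfg.surfaces[surface] ?? 0; // unlisted surfaces stay off
  const bucket = fnv1a32(`${cfg.flag}:${sessionId}`) % 100; // stable 0-99 bucket
  return bucket < percent;
}
```

Hashing the flag name together with the session id spreads sessions independently per flag, so the same users are not always first into every canary; flipping killSwitch is the fastest rollback path because it needs no percentage edits.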

Documentation Updates (required in same PRs)

  1. README.md
  2. docs/api/PROTOCOL.md
  3. docs/architecture/AGENT_DIAGRAM.md
  4. docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md
  5. docs/plans/state.json

Execution and Commit Structure

  1. Branch: feature/deeper-surfaces-integrated-behavior-stack
  2. Atomic commits per slice:
    • implementation + tests + docs + state.json together
  3. Rebase onto main before merge.
  4. Fast-forward merge only.

Model-Tier Delegation Plan for Implementation Work

  1. claude-haiku-4.5:
    • mechanical schema/test/doc updates.
  2. claude-sonnet-4.6:
    • default implementation tasks across queue/routing/companion/canvas.
  3. claude-opus-4.6:
    • concurrency semantics review, failure-mode design, and rollout gate design.

Risks and Mitigations

  1. Risk: preemption races create duplicate or orphaned replies.
    Mitigation: run-state event model + deterministic cancellation tests.
  2. Risk: proactive rules become noisy.
    Mitigation: priority/cooldown/stop semantics + canary thresholds.
  3. Risk: deeper surfaces drift from core behavior semantics.
    Mitigation: shared gateway protocol contracts and integration tests across surfaces.

Default Decisions Locked

  1. Keep gateway protocol backward-compatible (additive only).
  2. Prioritize behavior reliability before broadening platform count.
  3. Use companion CLI/runtime path as first deep-surface target.
  4. Keep Pi expansion out of this roadmap until separate canary re-approval.