260 lines
8.9 KiB
Markdown
260 lines
8.9 KiB
Markdown
# Flynn Deeper End-User Surfaces + Integrated Behavior Stack Plan
|
|
|
|
Date: 2026-02-25
|
|
Status: proposed roadmap
|
|
Scope: deepen assistant "product feel" (behavior semantics + user-facing surfaces) without rewriting Flynn core architecture
|
|
|
|
## Summary
|
|
|
|
This plan adopts a balanced hybrid strategy:
|
|
|
|
1. Improve behavior semantics first where correctness risk is highest (interrupt/cancel/run control).
|
|
2. In parallel, ship selective deeper user surfaces (companion, canvas persistence, voice continuity).
|
|
3. Land each slice with explicit observability gates so rollout decisions are data-driven.
|
|
|
|
## Why This Plan
|
|
|
|
Flynn already has strong foundations:
|
|
|
|
- Queue + session orchestration: `src/gateway/lane-queue.ts`, `src/gateway/session-bridge.ts`
|
|
- Multi-path routing and backend fallback: `src/daemon/routing.ts`
|
|
- Companion RPC foundation: `src/companion/runtimeClient.ts`, `src/gateway/protocol.ts`
|
|
- Canvas API baseline: `src/gateway/handlers/canvas.ts`, `src/gateway/canvas-store.ts`
|
|
- Voice in/out primitives: `src/models/media.ts`, `src/models/tts.ts`
|
|
- Reactions baseline: `src/automation/reactions.ts`
|
|
|
|
Largest remaining gap vs OpenClaw-like "assistant feel" is integration behavior across those systems, not missing foundational architecture.
|
|
|
|
## Goals and Success Criteria
|
|
|
|
1. Deterministic active-run control under bursty traffic
|
|
2. Rich, safe proactive behavior stack
|
|
3. Durable end-user surfaces for companion/canvas/voice
|
|
4. Measurable reliability improvements across canary phases
|
|
|
|
Quantitative success gates:
|
|
|
|
1. Cancel-to-ack p95 <= 500ms on gateway sessions.
|
|
2. Duplicate assistant responses caused by run preemption: 0 in integration tests.
|
|
3. Reaction false-positive rate <= 3% in canary logs.
|
|
4. Companion reconnect success >= 99% in soak tests.
|
|
5. Canvas artifact persistence survives daemon restart in integration tests.
|
|
6. Voice failures degrade to text-only replies with no dropped responses.
|
|
|
|
## Out of Scope
|
|
|
|
1. Full native macOS/iOS/Android app suite in this phase set.
|
|
2. Broad protocol redesign or protocol-version breaking changes.
|
|
3. Pi backend expansion (kept separate from this roadmap until re-approval).
|
|
|
|
## Workstreams and Complexity
|
|
|
|
| Workstream | Complexity | Main Risk |
|
|
| --- | --- | --- |
|
|
| Run-control semantics unification | High | race conditions and cancellation ordering |
|
|
| Reactions + proactive behavior v2 | Medium-High | noisy or looping automation |
|
|
| Companion + canvas + voice deepening | High | cross-surface consistency and restart behavior |
|
|
| Rollout hardening + observability | Medium | incomplete canary signals |
|
|
|
|
## Phase 0 - Baseline Instrumentation and Guardrails
|
|
|
|
Duration: 3-5 days
|
|
|
|
Execution checklist:
|
|
- `docs/plans/2026-02-25-phase0-instrumentation-ticket-checklist.md`
|
|
|
|
### Deliverables
|
|
|
|
1. Add baseline event and metric instrumentation for:
|
|
- run-state transitions,
|
|
- cancellation path timings,
|
|
- reaction match/skip reasons,
|
|
- surface delivery outcomes.
|
|
2. Define canary gate calculator inputs for each later phase.
|
|
|
|
### Files
|
|
|
|
1. `src/audit/types.ts`
|
|
2. `src/audit/logger.ts`
|
|
3. `src/gateway/metrics.ts`
|
|
4. `docs/api/PROTOCOL.md` (event semantics if needed)
|
|
5. `docs/plans/state.json`
|
|
|
|
### Acceptance
|
|
|
|
1. Baseline report generated before behavior changes.
|
|
2. No runtime behavior changes in this phase, only observability.
|
|
|
|
## Phase 1 - Run-Control Semantics + Gateway UX Signals
|
|
|
|
Duration: 2-3 weeks
|
|
|
|
### Objectives
|
|
|
|
1. Enforce "latest wins" semantics when queue policy is `interrupt`.
|
|
2. Align cancellation behavior between gateway and channel router paths.
|
|
3. Expose user-visible run lifecycle state in gateway events/UI.
|
|
|
|
### Implementation
|
|
|
|
1. Queue semantics hardening:
|
|
- `src/gateway/lane-queue.ts`
|
|
- ensure queued + active behavior rules are explicit and testable.
|
|
2. Active run cancellation wiring:
|
|
- `src/gateway/handlers/agent.ts`
|
|
- `src/gateway/session-bridge.ts`
|
|
- `src/daemon/routing.ts` (`activeRuns` parity behavior).
|
|
3. Event surface:
|
|
- add additive `run_state` event in `src/gateway/protocol.ts`
|
|
- consume/render in `src/gateway/ui/pages/chat.js`.
|
|
|
|
### Test Plan
|
|
|
|
1. `src/gateway/lane-queue.test.ts`: preemption ordering, overflow with interrupt, debounce edge cases.
|
|
2. `src/gateway/handlers/agent.test.ts`: interrupt + active cancel + queued supersede flows.
|
|
3. `src/daemon/routing.test.ts`: channel-path cancellation parity.
|
|
4. `src/gateway/ui/pages/chat.test.ts`: `run_state` rendering and transitions.
|
|
|
|
### Acceptance
|
|
|
|
1. Cancel-to-ack p95 <= 500ms.
|
|
2. Zero duplicate final responses in integration suite.
|
|
3. Backward compatibility for clients ignoring `run_state`.
|
|
|
|
## Phase 2 - Reactions and Proactive Behavior Stack V2
|
|
|
|
Duration: 2 weeks
|
|
|
|
### Objectives
|
|
|
|
1. Replace first-match reaction behavior with deterministic priority + cooldown semantics.
|
|
2. Keep announce delivery safe and auditable.
|
|
3. Prevent recursion/looping behavior.
|
|
|
|
### Config/API Additions (backward-compatible)
|
|
|
|
Extend `automation.reactions[]` in `src/config/schema.ts` with:
|
|
|
|
1. `priority` (number, default `100`)
|
|
2. `cooldown_ms` (number, default `0`)
|
|
3. `stop_on_match` (boolean, default `true`)
|
|
|
|
Existing fields remain valid and unchanged.
|
|
|
|
### Implementation
|
|
|
|
1. Reaction engine expansion:
|
|
- `src/automation/reactions.ts` (or split `reactionEngine.ts` if needed).
|
|
2. Routing integration:
|
|
- `src/daemon/routing.ts` deterministic reaction resolution.
|
|
3. Delivery consistency:
|
|
- `src/automation/cron.ts`
|
|
- `src/automation/webhooks.ts`
|
|
- preserve `delivery_mode` semantics and audit metadata.
|
|
|
|
### Test Plan
|
|
|
|
1. `src/automation/reactions.test.ts`:
|
|
- priority conflict resolution,
|
|
- cooldown suppression,
|
|
- metadata and template rendering.
|
|
2. `src/daemon/routing.test.ts`:
|
|
- reaction trigger integration and command-path exclusion.
|
|
3. `src/automation/cron.test.ts` / `src/automation/webhooks.test.ts`:
|
|
- announce/isolation metadata correctness.
|
|
|
|
### Acceptance
|
|
|
|
1. False-positive match rate <= 3% in canary.
|
|
2. No reaction recursion loops.
|
|
3. Deterministic rule selection under overlap.
|
|
|
|
## Phase 3 - Deeper Surfaces: Companion, Canvas Durability, Voice Continuity
|
|
|
|
Duration: 3-4 weeks
|
|
|
|
### Objectives
|
|
|
|
1. Upgrade companion from heartbeat-only utility to reliable daily-use surface.
|
|
2. Make canvas artifacts durable across restart.
|
|
3. Improve voice continuity behavior around cancellation and fallbacks.
|
|
|
|
### Implementation
|
|
|
|
1. Companion hardening:
|
|
- `src/cli/companion.ts`
|
|
- `src/companion/runtimeClient.ts`
|
|
- `src/gateway/handlers/node.ts`
|
|
- focus on reconnect and subscription resilience.
|
|
2. Canvas persistence:
|
|
- `src/gateway/canvas-store.ts` (durable backing instead of in-memory only)
|
|
- `src/gateway/handlers/canvas.ts`
|
|
- UI rendering/inspection in `src/gateway/ui/pages/chat.js` (or dedicated canvas page).
|
|
3. Voice continuity:
|
|
- `src/daemon/routing.ts` (talk-mode + cancellation + output behavior)
|
|
- `src/models/tts.ts`
|
|
- channel adapter output checks where required.
|
|
|
|
### Test Plan
|
|
|
|
1. Companion integration tests for reconnect and event continuity.
|
|
2. Canvas store tests for restart durability and eviction policy.
|
|
3. Voice tests for TTS errors, fallback to text, and interrupted runs.
|
|
|
|
### Acceptance
|
|
|
|
1. Companion reconnect success >= 99% in soak.
|
|
2. Canvas survives daemon restart in integration suite.
|
|
3. Voice path never drops assistant reply when TTS fails.
|
|
|
|
## Phase 4 - Rollout, Hardening, and Operator Readiness
|
|
|
|
Duration: 1 week
|
|
|
|
### Deliverables
|
|
|
|
1. Canary rollout plan by feature flag/surface.
|
|
2. Explicit rollback playbook.
|
|
3. Operator docs and architecture/protocol docs synchronized.
|
|
|
|
### Documentation Updates (required in same PRs)
|
|
|
|
1. `README.md`
|
|
2. `docs/api/PROTOCOL.md`
|
|
3. `docs/architecture/AGENT_DIAGRAM.md`
|
|
4. `docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md`
|
|
5. `docs/plans/state.json`
|
|
|
|
## Execution and Commit Structure
|
|
|
|
1. Branch: `feature/deeper-surfaces-integrated-behavior-stack`
|
|
2. Atomic commits per slice:
|
|
- implementation + tests + docs + `state.json` together
|
|
3. Rebase onto `main` before merge.
|
|
4. Fast-forward merge only.
|
|
|
|
## Model-Tier Delegation Plan for Implementation Work
|
|
|
|
1. `claude-haiku-4.5`:
|
|
- mechanical schema/test/doc updates.
|
|
2. `claude-sonnet-4.6`:
|
|
- default implementation tasks across queue/routing/companion/canvas.
|
|
3. `claude-opus-4.6`:
|
|
- concurrency semantics review, failure-mode design, and rollout gate design.
|
|
|
|
## Risks and Mitigations
|
|
|
|
1. Risk: preemption races create duplicate or orphaned replies.
|
|
Mitigation: run-state event model + deterministic cancellation tests.
|
|
2. Risk: proactive rules become noisy.
|
|
Mitigation: priority/cooldown/stop semantics + canary thresholds.
|
|
3. Risk: deeper surfaces drift from core behavior semantics.
|
|
Mitigation: shared gateway protocol contracts and integration tests across surfaces.
|
|
|
|
## Default Decisions Locked
|
|
|
|
1. Keep gateway protocol backward-compatible (additive only).
|
|
2. Prioritize behavior reliability before broadening platform count.
|
|
3. Use companion CLI/runtime path as first deep-surface target.
|
|
4. Keep Pi expansion out of this roadmap until separate canary re-approval.
|