# Phase 4 Rollout + Operator Readiness (Deeper Surfaces) Date: 2026-02-25 ## Summary This document provides the rollout plan, rollback playbook, and operator readiness checklist for the deeper end-user surfaces + integrated behavior stack workstreams (run-control, reactions v2, companion/canvas/voice). ## Canary Rollout Plan ### Guarded Rollout Steps 1. **Run-control semantics (Phase 1)** Toggle: `server.queue.mode: interrupt` only for canary sessions via `server.queue.overrides.sessions`. Gate: `cancel-to-ack p95 <= 500ms`, zero duplicate final responses in integration tests. Observe: `run_state` events (`start`, `cancel_requested`, `cancelled`, `complete`, `error`) in gateway UI + audit logs. 2. **Reactions v2 (Phase 2)** Toggle: restrict `automation.reactions` list to canary rules + scoped triggers. Gate: reaction false-positive rate <= 3% in audit logs (`reactionMatch`, `reactionSkip`). Observe: `system.metrics` reaction counters + recursion guard skip reasons. 3. **Companion + Canvas (Phase 3)** Toggle: `server.nodes.enabled: true` for companion canary nodes, enable `server.nodes.feature_gates.ui.canvas`. Gate: companion reconnect success >= 99% in soak; canvas artifacts survive restart in integration runs. Observe: node registration + capability logs; canvas list/get/put success in gateway UI. 4. **Voice Continuity (Phase 3)** Toggle: `tts.enabled: true` and `tts.enabled_channels` for canary channels; `audio.enabled: true` for inbound voice. Gate: no dropped responses when TTS fails; text-only fallback confirmed in tests. Observe: warning logs for TTS failures, reply delivery counts. ### Rollout Cadence 1. Week 1: enable canary on a single internal channel + 1-2 sessions. 2. Week 2: expand to 5-10% sessions/channels after gates hold. 3. Week 3: expand to 25-50% after second gate review. 4. Week 4: default-on unless gates fail; keep toggles for rollback. ## Rollback Playbook 1. **Run-control rollback** Set `server.queue.mode: collect` globally. Remove canary overrides in `server.queue.overrides.sessions`. 2. **Reactions rollback** Set `automation.reactions: []` or remove canary rules. Verify `reactionMatch` count drops to zero. 3. **Companion rollback** Set `server.nodes.enabled: false` (or restrict `allowed_roles` to none). Clear companion node registrations by restarting gateway. 4. **Canvas rollback** Disable `ui.canvas` in `server.nodes.feature_gates`. Optional: archive/remove `dataDir/canvas` after capture if needed. 5. **Voice rollback** Set `tts.enabled: false` and/or remove `tts.enabled_channels`. Set `audio.enabled: false` to stop inbound voice processing. ## Operator Readiness Checklist Confirm protocol and architecture docs are synchronized (`docs/api/PROTOCOL.md`, `docs/architecture/AGENT_DIAGRAM.md`, `docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md`). Verify audit logs and `system.metrics` are capturing `run_state` transitions, cancel latency buckets, and reaction match/skip reasons. Validate canary tests: run-control queue preemption + cancel, reaction priority/cooldown, companion reconnect + re-register, canvas persistence across restart, TTS failure fallback. Capture a before/after snapshot of error rate, cancellation latency, reaction false positives, companion reconnect success. ## Owner + Comms - Primary owner: Flynn core team - Canary checkpoint cadence: weekly - Escalation: revert via rollback playbook within 1 hour of gate breach