Phase 4 Rollout + Operator Readiness (Deeper Surfaces)
Date: 2026-02-25
Summary
This document provides the rollout plan, rollback playbook, and operator readiness checklist for the deeper end-user surfaces + integrated behavior stack workstreams (run-control, reactions v2, companion/canvas/voice).
Canary Rollout Plan
Guarded Rollout Steps
- Run-control semantics (Phase 1)
  - Toggle: `server.queue.mode: interrupt` only for canary sessions via `server.queue.overrides.sessions`.
  - Gate: cancel-to-ack p95 <= 500ms, zero duplicate final responses in integration tests.
  - Observe: `run_state` events (`start`, `cancel_requested`, `cancelled`, `complete`, `error`) in gateway UI + audit logs.
- Reactions v2 (Phase 2)
  - Toggle: restrict `automation.reactions` list to canary rules + scoped triggers.
  - Gate: reaction false-positive rate <= 3% in audit logs (`reactionMatch`, `reactionSkip`).
  - Observe: `system.metrics` reaction counters + recursion guard skip reasons.
- Companion + Canvas (Phase 3)
  - Toggle: `server.nodes.enabled: true` for companion canary nodes, enable `server.nodes.feature_gates.ui.canvas`.
  - Gate: companion reconnect success >= 99% in soak; canvas artifacts survive restart in integration runs.
  - Observe: node registration + capability logs; canvas list/get/put success in gateway UI.
- Voice Continuity (Phase 3)
  - Toggle: `tts.enabled: true` and `tts.enabled_channels` for canary channels; `audio.enabled: true` for inbound voice.
  - Gate: no dropped responses when TTS fails; text-only fallback confirmed in tests.
  - Observe: warning logs for TTS failures, reply delivery counts.
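The numeric gates above can be checked mechanically at each checkpoint. The following is a minimal sketch, assuming the raw counts have already been pulled from audit logs and `system.metrics`; the `CanaryMetrics` container, its field names, and `evaluate_gates` are hypothetical and not part of the gateway. Gates that are confirmed by tests rather than by counters (canvas persistence, TTS fallback) are not modeled here.

```python
from dataclasses import dataclass


def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies (ms)."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based rank
    return ordered[rank - 1]


def ratio(numerator, denominator):
    """Return numerator/denominator, or None when there is no data."""
    return numerator / denominator if denominator else None


@dataclass
class CanaryMetrics:
    # Hypothetical container; the field names are illustrative only.
    cancel_to_ack_ms: list
    duplicate_final_responses: int
    reaction_matches: int
    reaction_false_positives: int
    reconnect_attempts: int
    reconnect_successes: int


def evaluate_gates(m):
    """Map each gate to True/False using the thresholds listed above.

    A gate with no data (zero matches or zero reconnect attempts) is
    treated as not passed, since there is no evidence either way.
    """
    fp_rate = ratio(m.reaction_false_positives, m.reaction_matches)
    reconnect = ratio(m.reconnect_successes, m.reconnect_attempts)
    return {
        "run_control": p95(m.cancel_to_ack_ms) <= 500
        and m.duplicate_final_responses == 0,
        "reactions_v2": fp_rate is not None and fp_rate <= 0.03,
        "companion": reconnect is not None and reconnect >= 0.99,
    }
```

Treating a gate with no samples as failed is a design choice of this sketch: it keeps an idle canary from being promoted on the strength of empty counters.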
Rollout Cadence
- Week 1: enable canary on a single internal channel + 1-2 sessions.
- Week 2: expand to 5-10% sessions/channels after gates hold.
- Week 3: expand to 25-50% after second gate review.
- Week 4: default-on unless gates fail; keep toggles for rollback.
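The cadence above does not say how sessions are selected for each percentage. One common approach, sketched here purely as an assumption, is to hash the session ID into a stable bucket so that a session admitted in Week 2 stays in the canary in Weeks 3 and 4. The `bucket` and `in_canary` helpers and the `STAGE_PERCENT` table are hypothetical; the table uses the upper bound of each range.

```python
import hashlib

# Upper bounds of the weekly ranges above, as a percent of sessions.
# Week 1 is an explicit allow-list rather than a percentage.
STAGE_PERCENT = {2: 10, 3: 50, 4: 100}


def bucket(session_id):
    """Map a session ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_canary(session_id, week, allow_list=frozenset()):
    """Return True if the session is in the canary for the given week."""
    if session_id in allow_list:
        return True
    # Weeks without a percentage (such as Week 1) admit nobody else.
    return bucket(session_id) < STAGE_PERCENT.get(week, 0)
```

Because the bucket depends only on the session ID, raising the percentage only ever adds sessions, which keeps week-over-week gate comparisons on a growing but consistent population.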
Rollback Playbook
- Run-control rollback
  - Set `server.queue.mode: collect` globally.
  - Remove canary overrides in `server.queue.overrides.sessions`.
- Reactions rollback
  - Set `automation.reactions: []` or remove canary rules.
  - Verify `reactionMatch` count drops to zero.
- Companion rollback
  - Set `server.nodes.enabled: false` (or restrict `allowed_roles` to none).
  - Clear companion node registrations by restarting gateway.
- Canvas rollback
  - Disable `ui.canvas` in `server.nodes.feature_gates`.
  - Optional: archive/remove `dataDir/canvas` after capture if needed.
- Voice rollback
  - Set `tts.enabled: false` and/or remove `tts.enabled_channels`.
  - Set `audio.enabled: false` to stop inbound voice processing.
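The config side of the playbook can be expressed as one transformation over the loaded configuration. This is a sketch under stated assumptions: the config is treated as a nested dict whose keys mirror the dotted paths above, `server.queue.overrides.sessions` is assumed to be a mapping, and `feature_gates` is assumed to be a mapping keyed by `"ui.canvas"`. The real shapes may differ. `apply_rollback` is a hypothetical helper; the gateway restart and the optional `dataDir/canvas` archive remain manual steps.

```python
import copy

SURFACES = {"run_control", "reactions", "companion", "canvas", "voice"}


def apply_rollback(config, surfaces):
    """Return a copy of config with the named surfaces rolled back."""
    unknown = set(surfaces) - SURFACES
    if unknown:
        raise ValueError(f"unknown surfaces: {sorted(unknown)}")
    cfg = copy.deepcopy(config)  # never mutate the caller's config
    server = cfg.setdefault("server", {})
    if "run_control" in surfaces:
        queue = server.setdefault("queue", {})
        queue["mode"] = "collect"
        # Assumed shape: a mapping of session ID to override settings.
        queue.setdefault("overrides", {})["sessions"] = {}
    if "reactions" in surfaces:
        cfg.setdefault("automation", {})["reactions"] = []
    if "companion" in surfaces:
        server.setdefault("nodes", {})["enabled"] = False
    if "canvas" in surfaces:
        nodes = server.setdefault("nodes", {})
        # Assumed shape: feature_gates is a mapping keyed by gate name.
        nodes.setdefault("feature_gates", {})["ui.canvas"] = False
    if "voice" in surfaces:
        tts = cfg.setdefault("tts", {})
        tts["enabled"] = False
        tts.pop("enabled_channels", None)
        cfg.setdefault("audio", {})["enabled"] = False
    return cfg
```

For example, `apply_rollback(cfg, {"run_control"})` reverts only the queue settings and leaves the other surfaces untouched, which matches rolling back one phase after a single gate breach.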
Operator Readiness Checklist
- Confirm protocol and architecture docs are synchronized (docs/api/PROTOCOL.md, docs/architecture/AGENT_DIAGRAM.md, docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md).
- Verify audit logs and system.metrics are capturing run_state transitions, cancel latency buckets, and reaction match/skip reasons.
- Validate canary tests: run-control queue preemption + cancel, reaction priority/cooldown, companion reconnect + re-register, canvas persistence across restart, TTS failure fallback.
- Capture a before/after snapshot of error rate, cancellation latency, reaction false positives, companion reconnect success.
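The before/after snapshot in the last checklist item is easier to review when the two captures are diffed with the direction of each metric made explicit. A minimal sketch follows, assuming each snapshot is a flat dict; the metric key names and `compare_snapshots` are hypothetical.

```python
# Hypothetical metric keys for the four values named in the checklist.
LOWER_IS_BETTER = {
    "error_rate",
    "cancel_latency_p95_ms",
    "reaction_false_positive_rate",
}
HIGHER_IS_BETTER = {"companion_reconnect_success"}


def compare_snapshots(before, after):
    """Return per-metric delta (after - before) and a regression flag."""
    report = {}
    for name in sorted(LOWER_IS_BETTER | HIGHER_IS_BETTER):
        delta = after[name] - before[name]
        # A regression is movement in the wrong direction for that metric.
        regressed = delta > 0 if name in LOWER_IS_BETTER else delta < 0
        report[name] = {"delta": delta, "regressed": regressed}
    return report
```

A regression flag here only means the metric moved the wrong way; whether that breaches a gate still depends on the thresholds in the rollout steps.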
Owner + Comms
- Primary owner: Flynn core team
- Canary checkpoint cadence: weekly
- Escalation: revert via rollback playbook within 1 hour of gate breach