docs: sync voice reliability updates and phase state

William Valentin
2026-02-26 17:29:29 -08:00
parent 163b1a0139
commit 03926a81eb
6 changed files with 83 additions and 11 deletions
+2 -1
@@ -156,7 +156,8 @@ Gateway streaming UX signals:
 - Routing applies reaction rules with deterministic priority/cooldown (and recursion guard) before intent routing.
 - Companion nodes re-register `node.*` capabilities after reconnect; runtime clients can auto-reconnect, optionally replay cached node state (`register/status/location/push`), and surface connection events.
 - Canvas artifacts are persisted by the gateway so session UI surfaces can recover after daemon restarts.
-- TTS synthesis failures degrade to text-only replies without dropping the response.
+- TTS synthesis uses an ordered provider chain with health cooldown tracking; if all providers fail, replies degrade to text-only without dropping the response.
+- Talk mode accepts spoken/text `stop`/`cancel` while active and maps it onto the same `/stop` run-control cancellation path used for text sessions.
 Key files:
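
The TTS bullet added in this hunk describes an ordered provider chain with health cooldown tracking and a text-only fallback. A minimal sketch of that behaviour might look like the following. The names `Provider`, `TtsChain`, `speak`, and the 30-second default cooldown are hypothetical and chosen for illustration; they are not taken from the gateway's code.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Provider:
    """One TTS backend (hypothetical shape, for illustration only)."""
    name: str
    synthesize: Callable[[str], bytes]
    cooldown_until: float = 0.0  # clock value before which the provider is skipped


@dataclass
class TtsChain:
    """Tries providers in order and skips any that are still cooling down."""
    providers: List[Provider]
    cooldown_seconds: float = 30.0  # assumed default, not from the source
    clock: Callable[[], float] = time.monotonic

    def speak(self, text: str) -> dict:
        now = self.clock()
        for provider in self.providers:
            if now < provider.cooldown_until:
                continue  # recently failed: leave it alone until the cooldown ends
            try:
                audio: Optional[bytes] = provider.synthesize(text)
            except Exception:
                # Mark the provider unhealthy and fall through to the next one.
                provider.cooldown_until = now + self.cooldown_seconds
                continue
            return {"text": text, "audio": audio, "provider": provider.name}
        # Every provider failed or is cooling down: keep the reply, drop the audio.
        return {"text": text, "audio": None, "provider": None}
```

The reply text is always returned, so a synthesis outage changes only the `audio` field. Injecting `clock` keeps the cooldown logic testable without sleeping.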
@@ -21,7 +21,8 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`.
 - Browser workflow reliability primitives (`browser.wait_for/assert/extract/checkpoint.*`) execute in the same queued session lane and apply browser-config guardrails (domain allowlist/high-risk confirmation, bounded retries, workflow step budget).
 - Companion `node.*` registration is per WebSocket connection; reconnects must re-register capabilities before invoking node RPC methods (or use runtime-client reconnect state replay to re-register/status/location/push automatically).
 - Canvas artifacts are persisted per session under the gateway data directory for UI recovery across restarts.
-- TTS output is best-effort; synthesis failures fall back to text-only responses.
+- TTS output is best-effort with ordered provider fallback + per-provider cooldown tracking; synthesis failures still fall back to text-only responses.
+- Talk mode voice sessions share the same cancel/replace semantics as text lanes (`/stop`, interrupt mode preemption), including spoken `stop`/`cancel` mapping while talk mode is active.
 ## Component Map
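
The talk-mode bullet added in the hunk above says that a spoken `stop` or `cancel` maps onto the same `/stop` path as text sessions while talk mode is active. One way this mapping could be sketched is shown below. The function name `map_talk_utterance` and the normalization rules (strip punctuation, lowercase, match the whole utterance) are assumptions made for illustration and are not confirmed by the diff.

```python
import re
from typing import Optional

# Words treated as cancellation requests; taken from the doc text above.
STOP_WORDS = {"stop", "cancel"}


def map_talk_utterance(utterance: str, talk_mode_active: bool) -> Optional[str]:
    """Return "/stop" when the whole utterance is a stop word, else None.

    Hypothetical helper: matching the entire utterance (rather than any
    occurrence of the word) is an assumption, meant to avoid cancelling
    on ordinary sentences that happen to contain "stop".
    """
    if not talk_mode_active:
        return None  # outside talk mode, transcripts are not reinterpreted
    normalized = re.sub(r"[^\w\s]", "", utterance).strip().lower()
    return "/stop" if normalized in STOP_WORDS else None
```

Returning the literal `/stop` command, instead of cancelling directly, keeps a single cancellation path shared by voice and text.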