docs: sync voice reliability updates and phase state

William Valentin
2026-02-26 17:29:29 -08:00
parent 163b1a0139
commit 03926a81eb
6 changed files with 83 additions and 11 deletions
+24 -1
@@ -572,7 +572,24 @@ Flynn can attach synthesized voice replies (OpenAI-compatible `/v1/audio/speech`
tts:
enabled: true
enabled_channels: [telegram, whatsapp, discord] # Empty = all channels
provider:
providers:
- name: primary
type: custom # openai | custom
endpoint: "https://tts-primary.example.com/v1/audio/speech"
api_key: "${PRIMARY_TTS_API_KEY}"
model: "gpt-4o-mini-tts"
voice: "alloy"
format: "mp3" # mp3 | wav | opus
- name: backup
type: openai
api_key: "${OPENAI_API_KEY}"
model: "gpt-4o-mini-tts"
voice: "nova"
format: "opus"
fallback:
max_attempts: 2
failure_cooldown_ms: 60000
provider: # Legacy single-provider shape (still supported)
type: openai # openai | custom
endpoint: "https://api.openai.com/v1/audio/speech"
api_key: "${OPENAI_API_KEY}" # Optional Bearer token
@@ -585,12 +602,18 @@ tts:
|-------|----------|-------------|
| `tts.enabled` | no | Enable voice reply synthesis (default: `false`) |
| `tts.enabled_channels` | no | Channels allowed to receive voice replies (`[]` means all channels) |
| `tts.providers[]` | no | Ordered provider chain for synthesis fallback |
| `tts.providers[].name` | no | Provider label used for health tracking/debug logs |
| `tts.provider.type` | no | `openai` or `custom` (default: `openai`) |
| `tts.provider.endpoint` | no | OpenAI-compatible `/v1/audio/speech` endpoint (`openai` defaults to OpenAI API URL) |
| `tts.provider.api_key` | no | Bearer token for authentication |
| `tts.provider.model` | no | TTS model (default: `gpt-4o-mini-tts`) |
| `tts.provider.voice` | no | Voice identifier (default: `alloy`) |
| `tts.provider.format` | no | Output format: `mp3`, `wav`, `opus` (default: `mp3`) |
| `tts.fallback.max_attempts` | no | Max providers attempted per reply before text fallback (default: `3`) |
| `tts.fallback.failure_cooldown_ms` | no | Cooldown for providers after synthesis failures (default: `60000`) |
If every configured provider fails, Flynn falls back to a text-only reply (the reply is never dropped) and retries unhealthy providers once their cooldown window has elapsed.
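The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/models/tts.ts`: the names `TtsProvider`, `ProviderHealth`, `planAttempts`, and `synthesizeWithFallback` are hypothetical, and the sketch assumes that providers still in cooldown are skipped without counting toward `max_attempts`.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
interface TtsProvider {
  name: string;
  synthesize(text: string): Promise<Uint8Array>;
}

// Tracks the last failure time per provider (assumed health model).
class ProviderHealth {
  private readonly failedAt = new Map<string, number>();
  private readonly cooldownMs: number;

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs; // mirrors tts.fallback.failure_cooldown_ms
  }

  markFailure(name: string, now: number): void {
    this.failedAt.set(name, now);
  }

  markSuccess(name: string): void {
    this.failedAt.delete(name);
  }

  // A provider is available if it never failed or its cooldown has elapsed.
  isAvailable(name: string, now: number): boolean {
    const failed = this.failedAt.get(name);
    return failed === undefined || now - failed >= this.cooldownMs;
  }
}

// Keeps config order, drops providers in cooldown, caps at maxAttempts.
function planAttempts(
  providers: TtsProvider[],
  health: ProviderHealth,
  maxAttempts: number, // mirrors tts.fallback.max_attempts
  now: number,
): TtsProvider[] {
  return providers
    .filter((p) => health.isAvailable(p.name, now))
    .slice(0, maxAttempts);
}

// Returns audio from the first provider that succeeds, or null so the
// caller can send the text-only reply.
async function synthesizeWithFallback(
  text: string,
  providers: TtsProvider[],
  health: ProviderHealth,
  maxAttempts: number,
  now: number = Date.now(),
): Promise<Uint8Array | null> {
  for (const provider of planAttempts(providers, health, maxAttempts, now)) {
    try {
      const audio = await provider.synthesize(text);
      health.markSuccess(provider.name);
      return audio;
    } catch {
      health.markFailure(provider.name, now);
    }
  }
  return null;
}
```

Returning `null` rather than throwing keeps the text reply path independent of synthesis, which matches the "no dropped reply" guarantee stated above.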
### Capture Tools
+19
@@ -592,6 +592,7 @@ automation:
# wake_phrase: "hey flynn"
# timeout_ms: 120000
# allow_manual_toggle: true
# # While active, spoken/text "stop" or "cancel" maps to /stop run control.
# ── Text-to-Speech (TTS) Output ──────────────────────────────────────
# Optional voice output for assistant replies. Uses an OpenAI-compatible
@@ -600,6 +601,23 @@ automation:
# tts:
# enabled: false
# enabled_channels: [telegram, whatsapp, discord] # Empty = all channels
# providers:
# - name: primary
# type: custom # openai | custom
# endpoint: "https://tts-primary.example.com/v1/audio/speech"
# api_key: "${PRIMARY_TTS_API_KEY}"
# model: "gpt-4o-mini-tts"
# voice: "alloy"
# format: "mp3" # mp3 | wav | opus
# - name: backup
# type: openai
# api_key: "${OPENAI_API_KEY}"
# model: "gpt-4o-mini-tts"
# voice: "nova"
# format: "opus"
# fallback:
# max_attempts: 2
# failure_cooldown_ms: 60000
# provider:
# type: openai # openai | custom
# endpoint: "https://api.openai.com/v1/audio/speech"
@@ -607,6 +625,7 @@ automation:
# model: "gpt-4o-mini-tts"
# voice: "alloy"
# format: "mp3" # mp3 | wav | opus
# # Legacy single-provider format (`provider`) is still supported.
# ── Sub-Agent Configs ────────────────────────────────────────────────
# Named agent configurations for delegation via agent.delegate tool.
+2 -2
@@ -1550,7 +1550,7 @@ Outbound attachment (image, audio, file).
}
```
Audio attachments (TTS responses) are best-effort: if synthesis fails, the gateway still returns the text reply without any audio attachment.
Audio attachments (TTS responses) are best-effort with provider fallback: the daemon can try an ordered TTS provider chain, tracking provider health and cooldown between attempts. If synthesis still fails, the gateway returns the text reply without any audio attachment.
#### `context_warning`
@@ -1582,7 +1582,7 @@ Proactive context pressure signal emitted by `agent.send` before `done`.
#### `run_state`
Run lifecycle transition emitted during `agent.send` processing.
Run lifecycle transition emitted during `agent.send` processing (text and voice/talk turns use the same lifecycle states and cancel semantics).
```json
{
+2 -1
@@ -156,7 +156,8 @@ Gateway streaming UX signals:
- Routing applies reaction rules with deterministic priority/cooldown (and recursion guard) before intent routing.
- Companion nodes re-register `node.*` capabilities after reconnect; runtime clients can auto-reconnect, optionally replay cached node state (`register/status/location/push`), and surface connection events.
- Canvas artifacts are persisted by the gateway so session UI surfaces can recover after daemon restarts.
- TTS synthesis failures degrade to text-only replies without dropping the response.
- TTS synthesis uses an ordered provider chain with health cooldown tracking; if all providers fail, replies degrade to text-only without dropping the response.
- Talk mode accepts spoken/text `stop`/`cancel` while active and maps it onto the same `/stop` run-control cancellation path used for text sessions.
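The talk-mode `stop`/`cancel` mapping in the last bullet could look something like the sketch below. It is illustrative only: `mapTalkModeInput` is a hypothetical helper, and the matching rule (exact word after trimming, lowercasing, and dropping trailing punctuation) is an assumption rather than the behavior of `src/daemon/routing.ts`.

```typescript
// Words that cancel the active run while talk mode is on (from the docs above).
const CANCEL_WORDS = new Set(["stop", "cancel"]);

// Hypothetical helper: rewrites a bare "stop"/"cancel" utterance to the
// /stop run-control command, but only while talk mode is active.
function mapTalkModeInput(input: string, talkModeActive: boolean): string {
  if (!talkModeActive) return input;
  // Assumed normalization: transcripts often arrive as "Stop." or "Cancel!".
  const normalized = input.trim().toLowerCase().replace(/[.!?]+$/, "");
  return CANCEL_WORDS.has(normalized) ? "/stop" : input;
}
```

Matching only the bare word keeps ordinary requests such as "stop the music" flowing to the agent instead of cancelling the run.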
Key files:
@@ -21,7 +21,8 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`.
- Browser workflow reliability primitives (`browser.wait_for/assert/extract/checkpoint.*`) execute in the same queued session lane and apply browser-config guardrails (domain allowlist/high-risk confirmation, bounded retries, workflow step budget).
- Companion `node.*` registration is per WebSocket connection; reconnects must re-register capabilities before invoking node RPC methods (or use runtime-client reconnect state replay to re-register/status/location/push automatically).
- Canvas artifacts are persisted per session under the gateway data directory for UI recovery across restarts.
- TTS output is best-effort; synthesis failures fall back to text-only responses.
- TTS output is best-effort with ordered provider fallback and per-provider cooldown tracking; synthesis failures still fall back to text-only responses.
- Talk mode voice sessions share the same cancel/replace semantics as text lanes (`/stop`, interrupt mode preemption), including spoken `stop`/`cancel` mapping while talk mode is active.
## Component Map
+34 -6
@@ -1,6 +1,6 @@
{
"version": "1.0",
"updated_at": "2026-02-26",
"updated_at": "2026-02-27",
"description": "Tracks the status of all Flynn plans and implementation phases",
"plans": {
"phase0-ticket-0.1-audit-schema-extension": {
@@ -6788,8 +6788,8 @@
"personal-assistant-productization-plan-2026-02-26": {
"status": "in_progress",
"date": "2026-02-26",
"updated": "2026-02-26",
"summary": "Rebaselined Flynn's OpenClaw-style personal-assistant gaps and defined an execution-ready 8-10 week roadmap. Phase 3 browser reliability is shipped, and Phase 1 companion reliability/runtime handoff hardening is in progress (node-state reconnect replay + wrapper/CLI handoff paths + reconnect/token-refresh integration coverage).",
"updated": "2026-02-27",
"summary": "Rebaselined Flynn's OpenClaw-style personal-assistant gaps and defined an execution-ready 8-10 week roadmap. Phase 3 browser reliability, Phase 1 companion reconnect/handoff reliability, and Phase 2 voice daily-driver reliability (talk controls + TTS provider fallback/health + interruption-safe voice cancel semantics) are now shipped.",
"files_modified": [
"docs/plans/2026-02-26-personal-assistant-productization-plan.md",
"docs/plans/state.json"
@@ -6840,6 +6840,34 @@
],
"test_status": "pnpm test:run src/companion/runtimeClient.test.ts src/companion/platformClients.test.ts src/companion/platformClients.integration.test.ts src/cli/companion.test.ts + pnpm typecheck passing"
},
"personal-assistant-productization-phase2-voice-reliability": {
"status": "completed",
"date": "2026-02-27",
"updated": "2026-02-27",
"summary": "Implemented Phase 2 voice daily-driver reliability: added ordered multi-provider TTS fallback with per-provider health cooldown tracking, exposed talk mode + wake/timer controls and TTS fallback policy in gateway config/UI surfaces, and mapped spoken talk-mode `stop/cancel` to `/stop` run-control semantics so voice interruption behavior matches text flows.",
"files_modified": [
"src/models/tts.ts",
"src/models/tts.test.ts",
"src/daemon/routing.ts",
"src/daemon/routing.test.ts",
"src/config/schema.ts",
"src/config/schema.test.ts",
"src/gateway/handlers/config.ts",
"src/gateway/handlers/handlers.test.ts",
"src/gateway/ui/pages/settings.js",
"src/gateway/ui/pages/settings.test.ts",
"src/gateway/ui/pages/dashboard.js",
"src/gateway/ui/pages/dashboard.test.ts",
"src/gateway/ui/pages/chat.test.ts",
"config/default.yaml",
"README.md",
"docs/api/PROTOCOL.md",
"docs/architecture/AGENT_DIAGRAM.md",
"docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md",
"docs/plans/state.json"
],
"test_status": "pnpm test:run src/models/tts.test.ts src/daemon/routing.test.ts src/gateway/handlers/handlers.test.ts src/gateway/ui/pages/settings.test.ts src/gateway/ui/pages/dashboard.test.ts src/gateway/ui/pages/chat.test.ts src/config/schema.test.ts + pnpm typecheck passing"
},
"subagents-support-phase1": {
"status": "completed",
"date": "2026-02-26",
@@ -6874,7 +6902,7 @@
}
},
"overall_progress": {
"total_test_count": 2553,
"total_test_count": 2559,
"all_tests_passing": true,
"p0_completion": "3/3 (100%)",
"p1_completion": "4/4 (100%)",
@@ -6889,7 +6917,7 @@
"tier2_completion": "4/4 (100%) \u2014 inbound webhooks, vector memory search, Dockerfile, heartbeat monitor",
"tier3_completion": "5/5 (100%) \u2014 lane queue, credential redaction, web UI token dashboard, xAI (Grok) provider, Voyage AI embeddings",
"tier4_completion": "4/4 (100%) \u2014 gateway lock, shell completion, Tailscale Serve/Funnel, DM pairing codes",
"feature_gap_scorecard": "rebaselined 2026-02-26 and updated 2026-02-26 (phase 3 + phase 1 reliability slice) — channel breadth, setup wizard, baseline browser automation, subagent controls, browser workflow reliability primitives (wait/assert/extract/retries/checkpoints/guardrails/budgets), and companion reconnect/runtime-handoff foundations are implemented; remaining high-impact personal-assistant gaps center on shipped desktop/mobile companion apps, voice UX polish, and first-success onboarding funnel optimization.",
"feature_gap_scorecard": "rebaselined 2026-02-26 and updated 2026-02-27 (phase 3 + phase 1 + phase 2 reliability slices) — channel breadth, setup wizard, baseline browser automation, subagent controls, browser workflow reliability primitives (wait/assert/extract/retries/checkpoints/guardrails/budgets), companion reconnect/runtime-handoff foundations, and voice reliability hardening (talk controls + TTS fallback/health + interruption-safe cancel semantics) are implemented; remaining high-impact personal-assistant gaps center on shipped desktop/mobile companion apps and first-success onboarding funnel optimization.",
"operator_dx_milestone": "Phase 3 (Live Ops Dashboard): 2/2 plans complete \u2014 milestone done",
"dashboard_observability": "completed \u2014 service health graphs + core service log viewer added to web UI via observability RPCs and bounded backend sampling",
"gmail_auth_cli": "flynn gmail-auth command implemented with OAuth2 flow, doctor check, config routed to Telegram",
@@ -6922,7 +6950,7 @@
"deeper_surfaces_phase3_companion_canvas_voice": "completed \u2014 companion reconnect resilience (auto-reconnect with backoff, pending-wait cancellation on disconnect), canvas artifact persistence (SQLite-backed store, daemon-restart durability), voice TTS fallback coverage (text-only reply on TTS failure, no dropped responses)",
"deeper_surfaces_phase4_rollout": "completed \u2014 phase 4 rollout and operator readiness plan documented: canary rollout plan by feature flag/surface, explicit rollback playbook, operator docs and architecture/protocol docs synchronized",
"post_phase_test_fixes": "completed \u2014 fixed 4 test failures introduced by phases 1-3: iOS/Android push listNodes (missing publishHeartbeat before platform-filtered query), server.test agent.send (run_state events now precede done; added sendAndWaitForDone helper), httpBody 413 (req.destroy() closed socket before response could be sent; replaced with Connection: close header on 413 responses)",
"personal_assistant_productization_plan": "in_progress \u2014 8-10 week phased roadmap active; Phase 3 browser workflow reliability shipped, and Phase 1 companion runtime reliability now includes reconnect state replay plus typed handoff support with integration coverage. Remaining phases: companion app packaging/surfaces, voice reliability hardening, and onboarding 2.0 first-success funnel.",
"personal_assistant_productization_plan": "in_progress \u2014 8-10 week phased roadmap active; Phase 3 browser workflow reliability shipped, Phase 1 companion runtime reliability includes reconnect state replay + typed handoff support, and Phase 2 voice reliability now ships talk controls + TTS provider fallback/health + interruption-safe voice cancel mapping. Remaining phases: companion app packaging/surfaces and onboarding 2.0 first-success funnel.",
"subagents_support": "completed \u2014 subagent phases 1-3 shipped with `subagent.spawn/send/list/cancel/delete/summary`, per-child queue mode (`followup|interrupt`), budgets (`max_turns`, `max_total_tokens`, `turn_timeout_ms`), tool-profile overrides, trace-linked audit events, `/subagents` inspection commands, and focused regression tests."
},
"soul_md_and_cron_create": {