Document companion reconnect/handoff reliability progress

This commit is contained in:
William Valentin
2026-02-26 17:01:22 -08:00
parent 184dc2c688
commit 2a9bed8c91
5 changed files with 34 additions and 8 deletions
+2 -1
View File
@@ -1072,6 +1072,7 @@ Register node role/capabilities for the current WebSocket connection.
Registration is scoped to the connection. If a companion reconnects it must call `node.register` again
to restore node identity, capabilities, and access to `node.*` methods.
`CompanionRuntimeClient` can optionally replay cached node registration/status/location/push state on reconnect.
**Request:**
```json
@@ -1857,5 +1858,5 @@ For more implementation details, see:
- Protocol types: `src/gateway/protocol.ts`
- Handlers: `src/gateway/handlers/`
- Gateway server: `src/gateway/server.ts`
- Companion runtime client helper: `src/companion/runtimeClient.ts` (node + system + `canvas.*` typed RPC wrappers, optional `autoConnect`/`autoReconnect`, connection event subscriptions)
- Companion runtime client helper: `src/companion/runtimeClient.ts` (node + system + `canvas.*` typed RPC wrappers, optional `autoConnect`/`autoReconnect`, optional reconnect state replay, `sendAgentMessage` handoff helper, connection event subscriptions)
- Platform companion wrappers: `src/companion/platformClients.ts`
+1 -1
View File
@@ -154,7 +154,7 @@ Gateway streaming UX signals:
- WebSocket `agent.send` emits `run_state` lifecycle events (`start`, `cancel_requested`, `cancelled`, `complete`, `error`) for UI/state rendering.
- Routing applies reaction rules with deterministic priority/cooldown (and recursion guard) before intent routing.
- Companion nodes re-register `node.*` capabilities after reconnect; runtime clients can auto-reconnect and surface connection events.
- Companion nodes re-register `node.*` capabilities after reconnect; runtime clients can auto-reconnect, optionally replay cached node state (`register/status/location/push`), and surface connection events.
- Canvas artifacts are persisted by the gateway so session UI surfaces can recover after daemon restarts.
- TTS synthesis failures degrade to text-only replies without dropping the response.
@@ -19,7 +19,7 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`.
- Reaction matching is deterministic (priority + cooldown + recursion guard) before intent/agent routing.
- `subagent.*` tools create child orchestrators scoped to the parent conversation (`subagent:<parentSessionId>:<childId>`) with idle TTL cleanup, per-child queue mode (`followup|interrupt`), and session budgets (turn/token/timeout); this is tool-loop behavior, not a separate gateway RPC session lane.
- Browser workflow reliability primitives (`browser.wait_for/assert/extract/checkpoint.*`) execute in the same queued session lane and apply browser-config guardrails (domain allowlist/high-risk confirmation, bounded retries, workflow step budget).
- Companion `node.*` registration is per WebSocket connection; reconnects must re-register capabilities before invoking node RPC methods.
- Companion `node.*` registration is per WebSocket connection; reconnects must re-register capabilities before invoking node RPC methods (or use runtime-client reconnect state replay to re-register/status/location/push automatically).
- Canvas artifacts are persisted per session under the gateway data directory for UI recovery across restarts.
- TTS output is best-effort; synthesis failures fall back to text-only responses.
+26 -4
View File
@@ -6789,7 +6789,7 @@
"status": "in_progress",
"date": "2026-02-26",
"updated": "2026-02-26",
"summary": "Rebaselined Flynn's OpenClaw-style personal-assistant gaps and defined an execution-ready 8-10 week roadmap. Phase 3 browser reliability work is now shipped (workflow primitives, retry/budget/guardrails, checkpoints), with companion/voice/onboarding phases remaining.",
"summary": "Rebaselined Flynn's OpenClaw-style personal-assistant gaps and defined an execution-ready 8-10 week roadmap. Phase 3 browser reliability is shipped, and Phase 1 companion reliability/runtime handoff hardening is in progress (node-state reconnect replay + wrapper/CLI handoff paths + reconnect/token-refresh integration coverage).",
"files_modified": [
"docs/plans/2026-02-26-personal-assistant-productization-plan.md",
"docs/plans/state.json"
@@ -6818,6 +6818,28 @@
],
"test_status": "pnpm test:run src/tools/builtin/browser/tools.test.ts src/config/schema.test.ts src/tools/policy.test.ts + pnpm typecheck passing"
},
"personal-assistant-productization-phase1-companion-reconnect-handoff": {
"status": "completed",
"date": "2026-02-26",
"updated": "2026-02-26",
"summary": "Advanced Phase 1 companion MVP reliability: companion runtime now supports optional reconnect-time node-state replay (`node.register/status/location/push`), added typed companion `agent.send` handoff helper (`sendAgentMessage` + platform `sendMessageHandoff`), and expanded integration coverage for reconnect/background wake/token refresh and handoff flows.",
"files_modified": [
"src/companion/runtimeClient.ts",
"src/companion/runtimeClient.test.ts",
"src/companion/platformClients.ts",
"src/companion/platformClients.test.ts",
"src/companion/platformClients.integration.test.ts",
"src/companion/index.ts",
"src/cli/companion.ts",
"src/cli/companion.test.ts",
"README.md",
"docs/api/PROTOCOL.md",
"docs/architecture/AGENT_DIAGRAM.md",
"docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md",
"docs/plans/state.json"
],
"test_status": "pnpm test:run src/companion/runtimeClient.test.ts src/companion/platformClients.test.ts src/companion/platformClients.integration.test.ts src/cli/companion.test.ts + pnpm typecheck passing"
},
"subagents-support-phase1": {
"status": "completed",
"date": "2026-02-26",
@@ -6852,7 +6874,7 @@
}
},
"overall_progress": {
"total_test_count": 2544,
"total_test_count": 2553,
"all_tests_passing": true,
"p0_completion": "3/3 (100%)",
"p1_completion": "4/4 (100%)",
@@ -6867,7 +6889,7 @@
"tier2_completion": "4/4 (100%) \u2014 inbound webhooks, vector memory search, Dockerfile, heartbeat monitor",
"tier3_completion": "5/5 (100%) \u2014 lane queue, credential redaction, web UI token dashboard, xAI (Grok) provider, Voyage AI embeddings",
"tier4_completion": "4/4 (100%) \u2014 gateway lock, shell completion, Tailscale Serve/Funnel, DM pairing codes",
"feature_gap_scorecard": "rebaselined 2026-02-26 and updated 2026-02-26 (phase 3) — channel breadth, setup wizard, baseline browser automation, subagent controls, and browser workflow reliability primitives (wait/assert/extract/retries/checkpoints/guardrails/budgets) are implemented; remaining high-impact personal-assistant gaps center on shipped companion apps (desktop/mobile), voice UX polish, and first-success onboarding funnel optimization.",
"feature_gap_scorecard": "rebaselined 2026-02-26 and updated 2026-02-26 (phase 3 + phase 1 reliability slice) — channel breadth, setup wizard, baseline browser automation, subagent controls, browser workflow reliability primitives (wait/assert/extract/retries/checkpoints/guardrails/budgets), and companion reconnect/runtime-handoff foundations are implemented; remaining high-impact personal-assistant gaps center on shipped desktop/mobile companion apps, voice UX polish, and first-success onboarding funnel optimization.",
"operator_dx_milestone": "Phase 3 (Live Ops Dashboard): 2/2 plans complete \u2014 milestone done",
"dashboard_observability": "completed \u2014 service health graphs + core service log viewer added to web UI via observability RPCs and bounded backend sampling",
"gmail_auth_cli": "flynn gmail-auth command implemented with OAuth2 flow, doctor check, config routed to Telegram",
@@ -6900,7 +6922,7 @@
"deeper_surfaces_phase3_companion_canvas_voice": "completed \u2014 companion reconnect resilience (auto-reconnect with backoff, pending-wait cancellation on disconnect), canvas artifact persistence (SQLite-backed store, daemon-restart durability), voice TTS fallback coverage (text-only reply on TTS failure, no dropped responses)",
"deeper_surfaces_phase4_rollout": "completed \u2014 phase 4 rollout and operator readiness plan documented: canary rollout plan by feature flag/surface, explicit rollback playbook, operator docs and architecture/protocol docs synchronized",
"post_phase_test_fixes": "completed \u2014 fixed 4 test failures introduced by phases 1-3: iOS/Android push listNodes (missing publishHeartbeat before platform-filtered query), server.test agent.send (run_state events now precede done; added sendAndWaitForDone helper), httpBody 413 (req.destroy() closed socket before response could be sent; replaced with Connection: close header on 413 responses)",
"personal_assistant_productization_plan": "in_progress \u2014 8-10 week phased roadmap active; Phase 3 browser workflow reliability layer shipped (wait/assert/extract/checkpoints + guardrails/retries/budgets). Remaining phases: companion MVP surfaces, voice reliability hardening, and onboarding 2.0 first-success funnel.",
"personal_assistant_productization_plan": "in_progress \u2014 8-10 week phased roadmap active; Phase 3 browser workflow reliability shipped, and Phase 1 companion runtime reliability now includes reconnect state replay plus typed handoff support with integration coverage. Remaining phases: companion app packaging/surfaces, voice reliability hardening, and onboarding 2.0 first-success funnel.",
"subagents_support": "completed \u2014 subagent phases 1-3 shipped with `subagent.spawn/send/list/cancel/delete/summary`, per-child queue mode (`followup|interrupt`), budgets (`max_turns`, `max_total_tokens`, `turn_timeout_ms`), tool-profile overrides, trace-linked audit events, `/subagents` inspection commands, and focused regression tests."
},
"soul_md_and_cron_create": {