From 2a9bed8c9177e231a321437ea99df77e15284b28 Mon Sep 17 00:00:00 2001 From: William Valentin Date: Thu, 26 Feb 2026 17:01:22 -0800 Subject: [PATCH] Document companion reconnect/handoff reliability progress --- README.md | 5 +++- docs/api/PROTOCOL.md | 3 +- docs/architecture/AGENT_DIAGRAM.md | 2 +- .../GATEWAY_SESSIONS_AND_QUEUE.md | 2 +- docs/plans/state.json | 30 ++++++++++++++++--- 5 files changed, 34 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 91bbfdc..e68932e 100644 --- a/README.md +++ b/README.md @@ -1692,7 +1692,7 @@ Methods: - `system.capabilities` returns gateway protocol and node policy snapshot. Companion runtime helper: -- `src/companion/runtimeClient.ts` provides a typed Node/WebSocket client for companion runtimes (macOS/iOS/Android workers) with wrappers for `node.register`, `node.capabilities.get`, `node.location.set/get`, `node.status.set`, `node.push_token.set`, `system.capabilities`, `system.nodes`, and canvas artifact RPCs (`canvas.put/get/list/delete/clear`), plus convenience helpers (`bootstrapNode`, optional `autoConnect`, `dispose()`, `waitForIdle()` for pending-work drain synchronization) and event helpers (`subscribeEvents()`, `subscribeEvent()`, `subscribeAgentStream()`, `subscribeAgentTyping()`, `subscribeContextWarning()`, `waitForEvent()` with timeout/predicate/abort support plus event-name/timeout validation and deterministic teardown cancellation including socket-close rejection, `waitForAnyEvent()` with event-list/timeout validation, `waitForAgentStream()`, `waitForAgentTyping()`, `waitForContextWarning()`, `clearEventSubscriptions()` returning cleared-subscription/cancelled-wait counts, `cancelPendingEventWaits()` returning cancelled waiter count, `listKnownEventNames()`, `eventSubscriptionCount`) plus in-flight observability via `pendingRequestCount`, `pendingEventWaitCount`, `hasPendingWork`, `idle`, `lastDisconnectCode`, `lastDisconnectReason`, `getPendingWorkSnapshot()`, `getEventSurfaceSnapshot()`, 
and `getConnectionSnapshot()` (including disconnect metadata, cleared on successful reconnect). +- `src/companion/runtimeClient.ts` provides a typed Node/WebSocket client for companion runtimes (macOS/iOS/Android workers) with wrappers for `node.register`, `node.capabilities.get`, `node.location.set/get`, `node.status.set`, `node.push_token.set`, `system.capabilities`, `system.nodes`, canvas artifact RPCs (`canvas.put/get/list/delete/clear`), and `sendAgentMessage` handoff support (`agent.send` with `done`/`error` resolution), plus convenience helpers (`bootstrapNode`, optional `autoConnect`, `dispose()`, `waitForIdle()` for pending-work drain synchronization, optional reconnect-time node state replay via `enableNodeStateRecovery`) and event helpers (`subscribeEvents()`, `subscribeEvent()`, `subscribeAgentStream()`, `subscribeAgentTyping()`, `subscribeContextWarning()`, `waitForEvent()` with timeout/predicate/abort support plus event-name/timeout validation and deterministic teardown cancellation including socket-close rejection, `waitForAnyEvent()` with event-list/timeout validation, `waitForAgentStream()`, `waitForAgentTyping()`, `waitForContextWarning()`, `clearEventSubscriptions()` returning cleared-subscription/cancelled-wait counts, `cancelPendingEventWaits()` returning cancelled waiter count, `listKnownEventNames()`, `eventSubscriptionCount`) plus in-flight observability via `pendingRequestCount`, `pendingEventWaitCount`, `hasPendingWork`, `idle`, `lastDisconnectCode`, `lastDisconnectReason`, `getPendingWorkSnapshot()`, `getEventSurfaceSnapshot()`, and `getConnectionSnapshot()` (including disconnect metadata, cleared on successful reconnect). 
- `src/companion/platformClients.ts` provides platform-focused wrappers: - `MacOSCompanionClient` (`platform: "macos"`, APNs push registration) - `IOSCompanionClient` (`platform: "ios"`, APNs push registration) @@ -1701,6 +1701,8 @@ Companion runtime helper: - shared `publishHeartbeat()` helper for periodic `node.status.set` updates with safe defaults - `createHeartbeatLoop()` convenience helper that returns a bound `CompanionHeartbeatLoop` - optional `defaultSessionId` for canvas helper calls so `sessionId` can be omitted per call + - optional reconnect recovery toggle (`recoverNodeStateOnReconnect`, default true) to replay node registration/status/location/push after transport reconnect + - `sendMessageHandoff()` passthrough helper for companion-originated `agent.send` handoff - lifecycle passthroughs for connection state/teardown (`connected`, `disconnect(code?, reason?)`, `dispose(code?, reason?)`) - stream passthrough helpers (`subscribeEvents`, `subscribeEvent`, `clearEventSubscriptions`, `cancelPendingEventWaits`, `listKnownEventNames`, `eventSubscriptionCount`, `subscribeAgentStream/Typing/ContextWarning`, `waitForEvent`, `waitForAnyEvent`, `waitForAgentStream/Typing/ContextWarning`) - runtime observability/control passthroughs (`pendingRequestCount`, `pendingEventWaitCount`, `hasPendingWork`, `idle`, `lastDisconnectCode`, `lastDisconnectReason`, `getPendingWorkSnapshot()`, `getEventSurfaceSnapshot()`, `getConnectionSnapshot()`, `connected`, `waitForIdle()`) @@ -1709,6 +1711,7 @@ Companion runtime helper: Minimal companion CLI: - `flynn companion --once` connects to the gateway, registers a node, publishes one heartbeat, then exits. - `flynn companion --platform macos --heartbeat 30` runs a long-lived node with periodic heartbeats and logs `agent.stream`/`agent.typing` events. +- `flynn companion --once --handoff "summarize my status"` performs one post-registration `agent.send` handoff and prints the `done` content. 
## WebChat PWA Push Subscriptions diff --git a/docs/api/PROTOCOL.md b/docs/api/PROTOCOL.md index 0329397..2856cb9 100644 --- a/docs/api/PROTOCOL.md +++ b/docs/api/PROTOCOL.md @@ -1072,6 +1072,7 @@ Register node role/capabilities for the current WebSocket connection. Registration is scoped to the connection. If a companion reconnects it must call `node.register` again to restore node identity, capabilities, and access to `node.*` methods. +`CompanionRuntimeClient` can optionally replay cached node registration/status/location/push state on reconnect. **Request:** ```json @@ -1857,5 +1858,5 @@ For more implementation details, see: - Protocol types: `src/gateway/protocol.ts` - Handlers: `src/gateway/handlers/` - Gateway server: `src/gateway/server.ts` -- Companion runtime client helper: `src/companion/runtimeClient.ts` (node + system + `canvas.*` typed RPC wrappers, optional `autoConnect`/`autoReconnect`, connection event subscriptions) +- Companion runtime client helper: `src/companion/runtimeClient.ts` (node + system + `canvas.*` typed RPC wrappers, optional `autoConnect`/`autoReconnect`, optional reconnect state replay, `sendAgentMessage` handoff helper, connection event subscriptions) - Platform companion wrappers: `src/companion/platformClients.ts` diff --git a/docs/architecture/AGENT_DIAGRAM.md b/docs/architecture/AGENT_DIAGRAM.md index 51b5bfc..61199d4 100644 --- a/docs/architecture/AGENT_DIAGRAM.md +++ b/docs/architecture/AGENT_DIAGRAM.md @@ -154,7 +154,7 @@ Gateway streaming UX signals: - WebSocket `agent.send` emits `run_state` lifecycle events (`start`, `cancel_requested`, `cancelled`, `complete`, `error`) for UI/state rendering. - Routing applies reaction rules with deterministic priority/cooldown (and recursion guard) before intent routing. -- Companion nodes re-register `node.*` capabilities after reconnect; runtime clients can auto-reconnect and surface connection events. 
+- Companion nodes re-register `node.*` capabilities after reconnect; runtime clients can auto-reconnect, optionally replay cached node state (`register/status/location/push`), and surface connection events. - Canvas artifacts are persisted by the gateway so session UI surfaces can recover after daemon restarts. - TTS synthesis failures degrade to text-only replies without dropping the response. diff --git a/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md b/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md index d8662ae..647bdef 100644 --- a/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md +++ b/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md @@ -19,7 +19,7 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`. - Reaction matching is deterministic (priority + cooldown + recursion guard) before intent/agent routing. - `subagent.*` tools create child orchestrators scoped to the parent conversation (`subagent::`) with idle TTL cleanup, per-child queue mode (`followup|interrupt`), and session budgets (turn/token/timeout); this is tool-loop behavior, not a separate gateway RPC session lane. - Browser workflow reliability primitives (`browser.wait_for/assert/extract/checkpoint.*`) execute in the same queued session lane and apply browser-config guardrails (domain allowlist/high-risk confirmation, bounded retries, workflow step budget). -- Companion `node.*` registration is per WebSocket connection; reconnects must re-register capabilities before invoking node RPC methods. +- Companion `node.*` registration is per WebSocket connection; reconnects must re-register capabilities before invoking node RPC methods (or use the runtime client's reconnect state replay to restore registration, status, location, and push token automatically). - Canvas artifacts are persisted per session under the gateway data directory for UI recovery across restarts. - TTS output is best-effort; synthesis failures fall back to text-only responses. 
diff --git a/docs/plans/state.json b/docs/plans/state.json index 657ef10..47077f9 100644 --- a/docs/plans/state.json +++ b/docs/plans/state.json @@ -6789,7 +6789,7 @@ "status": "in_progress", "date": "2026-02-26", "updated": "2026-02-26", - "summary": "Rebaselined Flynn's OpenClaw-style personal-assistant gaps and defined an execution-ready 8-10 week roadmap. Phase 3 browser reliability work is now shipped (workflow primitives, retry/budget/guardrails, checkpoints), with companion/voice/onboarding phases remaining.", + "summary": "Rebaselined Flynn's OpenClaw-style personal-assistant gaps and defined an execution-ready 8-10 week roadmap. Phase 3 browser reliability is shipped, and Phase 1 companion reliability/runtime handoff hardening is in progress (node-state reconnect replay + wrapper/CLI handoff paths + reconnect/token-refresh integration coverage).", "files_modified": [ "docs/plans/2026-02-26-personal-assistant-productization-plan.md", "docs/plans/state.json" @@ -6818,6 +6818,28 @@ ], "test_status": "pnpm test:run src/tools/builtin/browser/tools.test.ts src/config/schema.test.ts src/tools/policy.test.ts + pnpm typecheck passing" }, + "personal-assistant-productization-phase1-companion-reconnect-handoff": { + "status": "completed", + "date": "2026-02-26", + "updated": "2026-02-26", + "summary": "Advanced Phase 1 companion MVP reliability: companion runtime now supports optional reconnect-time node-state replay (`node.register/status/location/push`), added typed companion `agent.send` handoff helper (`sendAgentMessage` + platform `sendMessageHandoff`), and expanded integration coverage for reconnect/background wake/token refresh and handoff flows.", + "files_modified": [ + "src/companion/runtimeClient.ts", + "src/companion/runtimeClient.test.ts", + "src/companion/platformClients.ts", + "src/companion/platformClients.test.ts", + "src/companion/platformClients.integration.test.ts", + "src/companion/index.ts", + "src/cli/companion.ts", + 
"src/cli/companion.test.ts", + "README.md", + "docs/api/PROTOCOL.md", + "docs/architecture/AGENT_DIAGRAM.md", + "docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md", + "docs/plans/state.json" + ], + "test_status": "pnpm test:run src/companion/runtimeClient.test.ts src/companion/platformClients.test.ts src/companion/platformClients.integration.test.ts src/cli/companion.test.ts + pnpm typecheck passing" + }, "subagents-support-phase1": { "status": "completed", "date": "2026-02-26", @@ -6852,7 +6874,7 @@ } }, "overall_progress": { - "total_test_count": 2544, + "total_test_count": 2553, "all_tests_passing": true, "p0_completion": "3/3 (100%)", "p1_completion": "4/4 (100%)", @@ -6867,7 +6889,7 @@ "tier2_completion": "4/4 (100%) \u2014 inbound webhooks, vector memory search, Dockerfile, heartbeat monitor", "tier3_completion": "5/5 (100%) \u2014 lane queue, credential redaction, web UI token dashboard, xAI (Grok) provider, Voyage AI embeddings", "tier4_completion": "4/4 (100%) \u2014 gateway lock, shell completion, Tailscale Serve/Funnel, DM pairing codes", - "feature_gap_scorecard": "rebaselined 2026-02-26 and updated 2026-02-26 (phase 3) — channel breadth, setup wizard, baseline browser automation, subagent controls, and browser workflow reliability primitives (wait/assert/extract/retries/checkpoints/guardrails/budgets) are implemented; remaining high-impact personal-assistant gaps center on shipped companion apps (desktop/mobile), voice UX polish, and first-success onboarding funnel optimization.", + "feature_gap_scorecard": "rebaselined 2026-02-26 and updated 2026-02-26 (phase 3 + phase 1 reliability slice) — channel breadth, setup wizard, baseline browser automation, subagent controls, browser workflow reliability primitives (wait/assert/extract/retries/checkpoints/guardrails/budgets), and companion reconnect/runtime-handoff foundations are implemented; remaining high-impact personal-assistant gaps center on shipped desktop/mobile companion apps, voice UX polish, and 
first-success onboarding funnel optimization.", "operator_dx_milestone": "Phase 3 (Live Ops Dashboard): 2/2 plans complete \u2014 milestone done", "dashboard_observability": "completed \u2014 service health graphs + core service log viewer added to web UI via observability RPCs and bounded backend sampling", "gmail_auth_cli": "flynn gmail-auth command implemented with OAuth2 flow, doctor check, config routed to Telegram", @@ -6900,7 +6922,7 @@ "deeper_surfaces_phase3_companion_canvas_voice": "completed \u2014 companion reconnect resilience (auto-reconnect with backoff, pending-wait cancellation on disconnect), canvas artifact persistence (SQLite-backed store, daemon-restart durability), voice TTS fallback coverage (text-only reply on TTS failure, no dropped responses)", "deeper_surfaces_phase4_rollout": "completed \u2014 phase 4 rollout and operator readiness plan documented: canary rollout plan by feature flag/surface, explicit rollback playbook, operator docs and architecture/protocol docs synchronized", "post_phase_test_fixes": "completed \u2014 fixed 4 test failures introduced by phases 1-3: iOS/Android push listNodes (missing publishHeartbeat before platform-filtered query), server.test agent.send (run_state events now precede done; added sendAndWaitForDone helper), httpBody 413 (req.destroy() closed socket before response could be sent; replaced with Connection: close header on 413 responses)", - "personal_assistant_productization_plan": "in_progress \u2014 8-10 week phased roadmap active; Phase 3 browser workflow reliability layer shipped (wait/assert/extract/checkpoints + guardrails/retries/budgets). 
Remaining phases: companion MVP surfaces, voice reliability hardening, and onboarding 2.0 first-success funnel.", + "personal_assistant_productization_plan": "in_progress \u2014 8-10 week phased roadmap active; Phase 3 browser workflow reliability shipped, and Phase 1 companion runtime reliability now includes reconnect state replay plus typed handoff support with integration coverage. Remaining phases: companion app packaging/surfaces, voice reliability hardening, and onboarding 2.0 first-success funnel.", "subagents_support": "completed \u2014 subagent phases 1-3 shipped with `subagent.spawn/send/list/cancel/delete/summary`, per-child queue mode (`followup|interrupt`), budgets (`max_turns`, `max_total_tokens`, `turn_timeout_ms`), tool-profile overrides, trace-linked audit events, `/subagents` inspection commands, and focused regression tests." }, "soul_md_and_cron_create": {