feat(companion): add reconnect resilience

This commit is contained in:
William Valentin
2026-02-25 11:12:21 -08:00
parent 7b170cff4d
commit ac60fa5be3
9 changed files with 297 additions and 27 deletions
+1
View File
@@ -143,6 +143,7 @@ Gateway streaming UX signals:
- WebSocket `agent.send` emits `run_state` lifecycle events (`start`, `cancel_requested`, `cancelled`, `complete`, `error`) for UI/state rendering.
- Routing applies reaction rules with deterministic priority/cooldown (and recursion guard) before intent routing.
- Companion nodes re-register `node.*` capabilities after reconnect; runtime clients can auto-reconnect and surface connection events.
Key files:
@@ -16,6 +16,7 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`.
- Backend routing outcomes are auditable via `backend.route` / `backend.success` / `backend.fallback`, which enables offline canary evaluation without changing gateway protocol methods.
- Run lifecycle/cancel intent and reaction decisions are emitted to audit logs, and aggregated into `system.metrics` counters (runStates, cancelLatencyMs, reactions) for dashboards.
- Reaction matching is deterministic (priority + cooldown + recursion guard) before intent/agent routing.
- Companion `node.*` registration is per WebSocket connection; reconnects must re-register capabilities before invoking node RPC methods.
## Component Map