Gateway Sessions and Queueing (Agent Execution Model)
This document explains how the gateway maps WebSocket clients onto durable sessions, and how work is serialised per session so agent execution stays coherent under concurrent requests.
If you only want the protocol surface, see docs/api/PROTOCOL.md.
Key Ideas
- A WebSocket client gets a `connectionId`.
- Each connection is attached to a `sessionId`.
- Agent work is queued per `sessionId` (FIFO), not per connection.
- Sessions persist in SQLite via `SessionManager` even if clients disconnect.
- On the first message of a session, the orchestrator can inject session-start memory (`user/profile` + `user/working`) when `memory.user_namespace` is configured.
- Once dequeued, a message may be routed to the native orchestrator path or to an optional external backend path (`claude_code`, `opencode`, `codex`, `gemini`, `pi_embedded`), depending on agent/backend config.
- Runtime backend mode can be overridden manually via the `/runtime` command fast-path (`status`, `activate pi`, `deactivate pi`, `use config`) and is persisted in preferences (`/backend` remains a compatibility alias). `flynn tui` now attaches to this same gateway command path for `/runtime ...` and auto-starts/attaches the daemon + gateway when needed.
- Backend routing outcomes are auditable via `backend.route`/`backend.success`/`backend.fallback`, which enables offline canary evaluation without changing gateway protocol methods.
- Run lifecycle/cancel intent and reaction decisions are emitted to audit logs and aggregated into `system.metrics` counters (`runStates`, `cancelLatencyMs`, `reactions`) for dashboards.
- Reaction matching is deterministic (priority + cooldown + recursion guard) and runs before intent/agent routing.
- `subagent.*` tools create child orchestrators scoped to the parent conversation (`subagent:<parentSessionId>:<childId>`) with idle-TTL cleanup, a per-child queue mode (`followup`|`interrupt`), and session budgets (turn/token/timeout). This is tool-loop behavior, not a separate gateway RPC session lane.
- Browser workflow reliability primitives (`browser.wait_for`/`assert`/`extract`/`checkpoint.*`) execute in the same queued session lane and apply browser-config guardrails (domain allowlist/high-risk confirmation, bounded retries, workflow step budget).
- Companion `node.*` registration is per WebSocket connection. After a reconnect, clients must re-register capabilities before invoking node RPC methods (or use runtime-client reconnect state replay to re-register status/location/push state automatically).
- Companion packaging/bootstrap can be generated offline via `flynn companion --export-bootstrap <path|->`, which emits resolved gateway/node/runtime settings without opening a WebSocket session.
- Companion release artifacts can be generated via `flynn companion --export-release-bundle <dir>`, producing bootstrap JSON + launcher + README + `CHECKSUMS.sha256` + `RELEASE_MANIFEST.json` for installable shell distribution workflows.
- Generated launchers validate `CHECKSUMS.sha256` before invoking `flynn companion`, reducing the risk of accidentally launching a tampered bundle.
- Companion release-bundle exports can optionally be signed (`--signing-key`, `--signing-key-id`) to emit `CHECKSUMS.sha256.sig` for distribution trust verification.
- Companion release bundles can be verified before install via `flynn companion --verify-release-bundle <dir>`, with optional signature-key checks.
- Companion packaging automation is available via `pnpm companion:bundle -- --output <dir> ...`, which builds and verifies the release bundle in one pass.
- Companion platform starter scaffolds can be generated via `flynn companion --export-shell-template <dir>` for macOS/iOS/Android reference app bootstrapping, including iOS/Android runtime skeletons that issue `node.register`, `node.status.set`, `node.location.set`, `node.push_token.set`, and `agent.send`.
- Companion reference app directories can be regenerated via `pnpm companion:reference-apps -- --output apps/companion` for repo-shipped starter surfaces, including a runnable `macos-app` Swift Package menu-bar scaffold.
- Companion reference-app sync can be enforced with `pnpm companion:reference-apps:check` (regenerates, then fails if the diff shows drift).
- CI workflow `.github/workflows/companion-release-bundle.yml` mirrors this pipeline for manual artifact generation/upload.
- CI workflow `.github/workflows/companion-reference-apps-check.yml` enforces reference-app generator sync on pull requests.
- Audit phase-0 live telemetry snapshots can be regenerated with `pnpm audit:phase0-baseline:live` (channel-origin anonymized sample JSONL + summary JSON/markdown artifacts).
- Backend-scoped channel snapshots can be regenerated with `pnpm audit:phase0-baseline:live:pi`/`pnpm audit:phase0-baseline:live:native` (`--backend` filtering via `backend.route` timelines).
- Gateway-origin phase-0 windows (including cancel-path samples) can be captured with `pnpm audit:phase0-baseline:live:gateway` (auto-detects the latest cancel window) or `scripts/capture-phase0-live-baseline.ts --source gateway --since ... --until ...` for explicit bounds.
- `pnpm audit:phase0-baseline:live:refresh` runs the channel + gateway + backend-scoped (`pi_embedded` and `native`) capture paths in one command.
- `pnpm audit:phase0-baseline:live:drift` checks backend-scoped artifact freshness/drift gates and writes `phase0_baseline_live_backend_drift_<UTC-date>.md`/`.json`; `pnpm audit:phase0-baseline:live:refresh:drift` chains refresh + drift checks for scheduled cadence runs.
- `pnpm audit:phase0-baseline:live:refresh:drift:rolling` performs the same chain using one UTC timestamp tag (`YYYY-MM-DD-HHMMSS`) across channel/gateway/backend/drift outputs, so each cadence run preserves a distinct comparison point.
- `pnpm audit:phase0-baseline:live:prune` (dry-run) and `pnpm audit:phase0-baseline:live:prune:apply` (delete) manage retention of rolling-tag artifacts, controlling artifact growth while preserving the newest snapshots per family.
- `pnpm audit:phase0-baseline:live:refresh:drift:rolling:prune` combines rolling refresh + drift with retention apply for one-command cron scheduling; adjust retention depth with `KEEP_PER_FAMILY`.
- `audit:phase0-baseline:live*` package scripts now omit fixed tags, so scheduled runs automatically roll to current UTC-date artifact tags.
- The companion CLI supports one-shot shell bootstrap metadata for live sessions (`--app-version`/`--status-text`, `--latitude`/`--longitude`, `--push-token`), so desktop/mobile wrappers can initialize node status/location/push in a single launch flow.
- Canvas artifacts are persisted per session under the gateway data directory for UI recovery across restarts.
- TTS output is best-effort with ordered provider fallback + per-provider cooldown tracking; synthesis failures still fall back to text-only responses.
- Talk mode voice sessions share the same cancel/replace semantics as text lanes (`/stop`, `interrupt`-mode preemption), including mapping spoken `stop`/`cancel` while talk mode is active.
- Setup/onboarding UX now adds post-save live readiness checks (model/channel/memory/automation) and a guided first-success task flow, making the zero-to-first-automation path more reliable before sustained gateway use.
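The reaction-matching bullet above (priority + cooldown + recursion guard, evaluated before intent/agent routing) can be sketched as a small deterministic matcher. This is a minimal sketch under stated assumptions: `Reaction`, `ReactionMatcher`, `MAX_REACTION_DEPTH`, and the id-based tie-breaker are hypothetical names and choices for illustration, not the gateway's actual types.

```typescript
// Hypothetical types: the real reaction config and matcher are not shown in this doc.
interface Reaction {
  id: string;
  priority: number; // higher priority is tried first
  cooldownMs: number; // minimum gap between two firings of the same reaction
  matches: (text: string) => boolean;
}

interface MatchContext {
  text: string;
  nowMs: number;
  depth: number; // number of reactions already active on this call path
}

const MAX_REACTION_DEPTH = 1; // assumed guard: a reaction's output cannot trigger another reaction

class ReactionMatcher {
  private readonly reactions: Reaction[];
  private readonly lastFired = new Map<string, number>();

  constructor(reactions: Reaction[]) {
    // Fix the evaluation order once: priority descending, then id as a stable tie-breaker.
    this.reactions = [...reactions].sort((a, b) =>
      b.priority !== a.priority ? b.priority - a.priority : a.id < b.id ? -1 : a.id > b.id ? 1 : 0,
    );
  }

  /** Returns the reaction to fire, or undefined to fall through to intent/agent routing. */
  match(ctx: MatchContext): Reaction | undefined {
    if (ctx.depth >= MAX_REACTION_DEPTH) return undefined; // recursion guard

    for (const r of this.reactions) {
      const last = this.lastFired.get(r.id);
      if (last !== undefined && ctx.nowMs - last < r.cooldownMs) continue; // still cooling down
      if (!r.matches(ctx.text)) continue;
      this.lastFired.set(r.id, ctx.nowMs);
      return r;
    }
    return undefined;
  }
}
```

Sorting once in the constructor makes the result independent of config iteration order. Given the same text, clock, and firing history, the same reaction always wins, and a cooling-down reaction yields to the next-highest match.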
Component Map
```mermaid
flowchart LR
    subgraph CFG[Config + Runtime Policy]
        QP["server.queue policy<br/>mode/cap/overflow/overrides"]
        BM["backend runtime mode<br/>config_default|force_native|force_pi_embedded"]
    end
    subgraph GW[Gateway Process]
        TUI["TUI client<br/>(runtime command forwarding)"]
        WS["WebSocket connection<br/>(connectionId)"]
        GS[GatewayServer]
        LQ["LaneQueue<br/>per-session FIFO"]
        SB["SessionBridge<br/>connectionId -> sessionId -> AgentOrchestrator"]
        AQ["AuditLogger<br/>queue.preempt events"]
        MC["MetricsCollector<br/>run states + cancel latency + reactions"]
        UI[Web UI Dashboard]
    end
    subgraph CORE[Flynn Core]
        SM["SessionManager<br/>in-memory cache + SQLite"]
        SS["SessionStore<br/>SQLite tables"]
        AO["AgentOrchestrator / External Backends"]
    end
    TUI --> WS
    WS --> GS
    QP --> GS
    BM --> GS
    GS --> LQ
    GS --> SB
    LQ --> AQ
    GS --> MC
    MC --> UI
    SB --> AO
    SB --> SM
    SM --> SS
```
Session IDs (What Actually Gets Stored)
The durable session ID stored by SessionManager is:
`<frontend>:<userId>`
For the gateway:
- `SessionBridge.connect()` assigns a `connectionId` (UUID).
- It defaults the connection's `sessionId` to `ws:<connectionId>`.
- It then calls `SessionManager.getSession('ws', sessionId)`.
That means gateway sessions are stored as:
`ws:ws:<connectionId>`
This is expected: the gateway adds its own namespace, and the session manager namespaces again by frontend.
Key files:
- `src/gateway/session-bridge.ts`
- `src/session/manager.ts`
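The two namespacing steps compose as follows. This is a minimal sketch: `defaultGatewaySessionId`, `durableSessionKey`, and `connectionIdFromDurableKey` are hypothetical helpers that mirror the behavior described above, not functions exported by `SessionBridge` or `SessionManager`. The fixed UUID stands in for the one `SessionBridge.connect()` would generate.

```typescript
// Hypothetical helpers mirroring the two namespacing steps described above.

/** Step 1 (gateway): default session ID for a fresh WebSocket connection. */
function defaultGatewaySessionId(connectionId: string): string {
  return `ws:${connectionId}`;
}

/** Step 2 (SessionManager): the durable key is `<frontend>:<userId>`. */
function durableSessionKey(frontend: string, userId: string): string {
  return `${frontend}:${userId}`;
}

/** Recovers the connectionId from a durable gateway key, or undefined for other frontends. */
function connectionIdFromDurableKey(key: string): string | undefined {
  const prefix = "ws:ws:"; // frontend namespace + gateway namespace
  return key.startsWith(prefix) ? key.slice(prefix.length) : undefined;
}

const connectionId = "3f2b9c1e-0000-4000-8000-000000000000"; // stand-in for a generated UUID
const sessionId = defaultGatewaySessionId(connectionId);
const stored = durableSessionKey("ws", sessionId);
console.log(stored); // ws:ws:3f2b9c1e-0000-4000-8000-000000000000
```

When querying the SQLite tables by hand, strip both prefixes, not just one.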
Per-Session FIFO Queueing (LaneQueue)
`agent.send` uses a lane ID derived from the session:
- lane = `SessionBridge.getSessionId(connectionId)` (preferred)
- fallback lane = `connectionId` (only if session lookup fails)
Within a lane:
- Only one request executes at a time.
- Later requests queue (FIFO) and start after the active request finishes.
- `interrupt` mode enforces "latest wins": any queued backlog is superseded and the active run is asked to cancel, so the newest request becomes the next (or immediate) execution.
Across lanes:
- Independent sessions run in parallel.
Key files:
- `src/gateway/lane-queue.ts`
- `src/gateway/handlers/agent.ts`
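The lane rules above (session-derived lane ID with a connection fallback, one running task per lane, FIFO backlog, "latest wins" in `interrupt` mode, parallel lanes) fit in a small promise-based queue. This is a minimal sketch under assumptions: `LaneQueueSketch`, `laneIdFor`, `QueueMode`, and the `AbortSignal`-based cancel hook are hypothetical and do not reflect the real API of `src/gateway/lane-queue.ts`.

```typescript
// Hypothetical sketch: names and signatures are assumptions, not the real LaneQueue API.
type Task<T> = (signal: AbortSignal) => Promise<T>;
type QueueMode = "followup" | "interrupt";

interface Pending {
  run: Task<unknown>;
  resolve: (value: unknown) => void;
  reject: (reason: unknown) => void;
}

interface Lane {
  active?: AbortController | undefined; // set while a task from this lane is running
  backlog: Pending[]; // FIFO of not-yet-started tasks
}

/** Lane ID: the session ID when available, else the connection ID. */
function laneIdFor(connectionId: string, lookup: (id: string) => string | undefined): string {
  try {
    return lookup(connectionId) ?? connectionId;
  } catch {
    return connectionId; // session lookup failed
  }
}

class LaneQueueSketch {
  private readonly lanes = new Map<string, Lane>();

  enqueue<T>(laneId: string, task: Task<T>, mode: QueueMode = "followup"): Promise<T> {
    const lane: Lane = this.lanes.get(laneId) ?? { backlog: [] };
    this.lanes.set(laneId, lane);

    if (mode === "interrupt") {
      // "Latest wins": supersede the queued backlog and ask the active run to stop.
      for (const p of lane.backlog.splice(0)) p.reject(new Error("superseded"));
      lane.active?.abort();
    }

    const result = new Promise<T>((resolve, reject) => {
      lane.backlog.push({ run: task, resolve: (v) => resolve(v as T), reject });
    });
    this.drain(laneId);
    return result;
  }

  private drain(laneId: string): void {
    const lane = this.lanes.get(laneId);
    if (!lane || lane.active) return; // at most one running task per lane
    const next = lane.backlog.shift();
    if (!next) {
      this.lanes.delete(laneId); // idle lane: free it
      return;
    }
    const controller = new AbortController();
    lane.active = controller;
    next
      .run(controller.signal)
      .then(next.resolve, next.reject)
      .finally(() => {
        lane.active = undefined;
        this.drain(laneId); // start the next queued task, if any
      });
  }
}
```

The interrupted run is only signalled. The newest request waits until that run actually settles, which matches the "next (or immediate) execution" wording: it starts immediately only if the active run stops right away.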
Cancellation Semantics
`agent.cancel` performs two separate actions:
- Cancels any queued (not-yet-started) work in the lane (`LaneQueue.cancel(laneId)`).
- Requests cancellation of the active agent operation (`AgentOrchestrator.cancel()` via `SessionBridge.cancel()`).
Important:
- Cancellation is best-effort for the currently running work: it stops at the next safe point in the agent loop.
- Queued work is deterministically rejected.
- The gateway streams `run_state` events (`start`, `cancel_requested`, `cancelled`, `complete`, `error`) so clients can render lifecycle state without parsing assistant text.
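A client can fold these `run_state` events into a single display state per run. The event names come from the list above. The transition table is an assumption: because cancellation of the active run is best-effort, this sketch lets `cancel_requested` still end in `complete` or `error`. `applyRunState`, `isTerminal`, and `NEXT` are hypothetical names.

```typescript
// Event names are from the doc; the allowed transitions are an illustrative assumption.
type RunState = "start" | "cancel_requested" | "cancelled" | "complete" | "error";

const TERMINAL: ReadonlySet<RunState> = new Set<RunState>(["cancelled", "complete", "error"]);

// Best-effort cancel: a run that saw cancel_requested may still complete or error.
const NEXT: Record<RunState, readonly RunState[]> = {
  start: ["cancel_requested", "complete", "error"],
  cancel_requested: ["cancelled", "complete", "error"],
  cancelled: [],
  complete: [],
  error: [],
};

/** Folds one event into the current state; events that are not valid transitions are ignored. */
function applyRunState(current: RunState | undefined, next: RunState): RunState | undefined {
  if (current === undefined) return next === "start" ? next : current;
  return NEXT[current].includes(next) ? next : current;
}

function isTerminal(state: RunState | undefined): boolean {
  return state !== undefined && TERMINAL.has(state);
}
```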
Key files:
- `src/gateway/handlers/agent.ts`
- `src/backends/native/orchestrator.ts`
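The split between deterministic and best-effort cancellation can be illustrated with a loop that only checks for cancellation between steps. This is a hypothetical sketch: `OrchestratorSketch`, `Step`, `CancelledError`, `QueueLike`, `BridgeLike`, and `handleAgentCancel` are illustrative names, and the `AbortController`-based mechanism is an assumed stand-in for whatever the native orchestrator actually uses.

```typescript
// Hypothetical sketch; only the two-step cancel behavior is taken from the text above.
class CancelledError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CancelledError";
  }
}

interface Step {
  name: string;
  run: () => Promise<void>;
}

/** Best-effort cancellation: an in-flight step finishes; the loop stops at the next safe point. */
class OrchestratorSketch {
  private controller = new AbortController();

  cancel(): void {
    this.controller.abort(); // a request, not a hard stop
  }

  async runLoop(steps: Step[]): Promise<string[]> {
    this.controller = new AbortController(); // fresh signal for each run
    const signal = this.controller.signal;
    const done: string[] = [];
    for (const step of steps) {
      if (signal.aborted) throw new CancelledError(`cancelled before ${step.name}`); // safe point
      await step.run();
      done.push(step.name);
    }
    return done;
  }
}

// Assumed collaborator shapes for the agent.cancel handler.
interface QueueLike {
  cancel(laneId: string): number; // number of queued tasks rejected (assumption)
}
interface BridgeLike {
  cancel(connectionId: string): void; // forwards to the session's orchestrator
}

function handleAgentCancel(connectionId: string, laneId: string, queue: QueueLike, bridge: BridgeLike): number {
  // 1) Deterministic: reject everything that has not started yet.
  const rejected = queue.cancel(laneId);
  // 2) Best-effort: ask the active run to stop at its next safe point.
  bridge.cancel(connectionId);
  return rejected;
}
```

This sketch clears the queue before it signals the active run. As a result, when the active run winds down, the lane has nothing left to start. Whether the real handler depends on that ordering is an assumption.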