docs: map existing codebase

2026-02-09 19:31:05 -08:00
parent 85b1401440
commit d2d64f3361
7 changed files with 2042 additions and 0 deletions
@@ -0,0 +1,298 @@
+# Architecture
+
+**Analysis Date:** 2025-02-09
+
+## Pattern Overview
+
+**Overall:** Multi-channel AI agent daemon with layered pipeline architecture
+
+**Key Characteristics:**
+- Message pipeline: Channel Adapter → ChannelRegistry → MessageRouter → AgentOrchestrator → NativeAgent → ModelClient
+- Registry/factory pattern for extensible channels, tools, models, and agent configs
+- Dependency injection via constructor config objects (no DI container)
+- YAML + Zod config-driven feature toggling — most subsystems are optional and activated by config
+- Lifecycle-managed daemon with ordered shutdown handlers (LIFO)
+- Persistence: SQLite for sessions and vector embeddings, the filesystem for memory
+
+## Layers
+
+**CLI Layer:**
+- Purpose: Parse commands, load config, bootstrap daemon or TUI
+- Location: `src/cli/`
+- Contains: Command registrations (commander.js), config loading, TUI entry
+- Depends on: Config, Daemon
+- Used by: End user via `flynn` binary
+- Entry: `src/cli/index.ts` → registers commands: `start`, `tui`, `send`, `sessions`, `doctor`, `config`, `completion`
+
+**Config Layer:**
+- Purpose: Load YAML config, validate with Zod, expand `${ENV_VAR}` references
+- Location: `src/config/`
+- Contains: Zod schema definitions (`schema.ts`), YAML loader with env expansion (`loader.ts`)
+- Depends on: `zod`, `yaml`
+- Used by: All layers — the `Config` type flows through the entire system
+- Key file: `src/config/schema.ts` — single source of truth for all configuration types (409 lines)
+
+**Channel Layer:**
+- Purpose: Uniform messaging abstraction across platforms
+- Location: `src/channels/`
+- Contains: `ChannelAdapter` interface, `ChannelRegistry`, platform adapters
+- Depends on: Platform SDKs (grammy, discord.js, @slack/bolt, whatsapp-web.js)
+- Used by: Daemon (message routing), Automation (cron/webhook output)
+- Interface: `ChannelAdapter` with `connect()`, `disconnect()`, `send()`, `onMessage()`
+- Adapters: `src/channels/telegram/adapter.ts`, `src/channels/discord/adapter.ts`, `src/channels/slack/adapter.ts`, `src/channels/whatsapp/adapter.ts`, `src/channels/webchat/adapter.ts`
+
+**Agent Layer (Backend):**
+- Purpose: Core AI agent loop — process messages, execute tools, manage conversation
+- Location: `src/backends/native/`
+- Contains: `NativeAgent` (tool loop), `AgentOrchestrator` (delegation, compaction, memory)
+- Depends on: Models, Tools, Session, Context, Memory
+- Used by: Daemon (via message router), Gateway (via SessionBridge)
+- Key abstraction: `NativeAgent` runs the tool loop; `AgentOrchestrator` wraps it with orchestration features
+
+**Model Layer:**
+- Purpose: Unified interface to LLM providers with tier-based routing and fallback
+- Location: `src/models/`
+- Contains: `ModelClient` interface, provider implementations, `ModelRouter`, retry logic, cost estimation
+- Depends on: Provider SDKs (@anthropic-ai/sdk, openai, @google/generative-ai, ollama, @aws-sdk/client-bedrock-runtime)
+- Used by: Agent layer, Gateway SessionBridge
+- Providers: `anthropic.ts`, `openai.ts`, `gemini.ts`, `bedrock.ts`, `github.ts`, `local/ollama.ts`, `local/llamacpp.ts`
+
+**Tool Layer:**
+- Purpose: Tool registry, execution with policy enforcement and hook checks
+- Location: `src/tools/`
+- Contains: `ToolRegistry`, `ToolExecutor`, `ToolPolicy`, builtin tool implementations
+- Depends on: Hooks (confirmation), Memory, Process/Browser managers
+- Used by: Agent layer (tool loop), Gateway (tool execution)
+- Three tool patterns:
+  - Static: `export const fooTool: Tool` (e.g., `src/tools/builtin/shell.ts`)
+  - Factory: `export function createFooTool(dep): Tool` (e.g., `src/tools/builtin/media-send.ts`)
+  - Multi-factory: `export function createFooTools(dep): Tool[]` (e.g., `src/tools/builtin/process/index.ts`)
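The three tool patterns above can be sketched as follows. The `Tool` shape follows the interface listed under Key Abstractions (`name`, `description`, `inputSchema`, `execute`). The exact field types, the `ToolResult` fields, and the example tools (`echoTool`, `createGreetTool`, `createCounterTools`) are illustrative assumptions, not the real definitions in `src/tools/types.ts`.

```typescript
// Hypothetical approximation of the Tool contract in src/tools/types.ts.
interface ToolResult {
  success: boolean;
  output?: string;
  error?: string;
}

interface Tool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema object (assumed)
  execute(args: Record<string, unknown>): Promise<ToolResult>;
}

// 1. Static: a module-level constant with no dependencies.
export const echoTool: Tool = {
  name: "demo.echo",
  description: "Echo the input text back",
  inputSchema: { type: "object", properties: { text: { type: "string" } } },
  async execute(args) {
    return { success: true, output: String(args.text ?? "") };
  },
};

// 2. Factory: dependencies are captured in a closure.
export function createGreetTool(greeting: string): Tool {
  return {
    name: "demo.greet",
    description: "Greet someone",
    inputSchema: { type: "object", properties: { who: { type: "string" } } },
    async execute(args) {
      return { success: true, output: `${greeting}, ${String(args.who ?? "world")}` };
    },
  };
}

// 3. Multi-factory: several tools share one dependency (here, a counter object).
export function createCounterTools(state: { count: number }): Tool[] {
  const inc: Tool = {
    name: "demo.counter.inc",
    description: "Increment the shared counter",
    inputSchema: { type: "object", properties: {} },
    async execute() {
      state.count += 1;
      return { success: true, output: String(state.count) };
    },
  };
  const get: Tool = {
    name: "demo.counter.get",
    description: "Read the shared counter",
    inputSchema: { type: "object", properties: {} },
    async execute() {
      return { success: true, output: String(state.count) };
    },
  };
  return [inc, get];
}
```

The factory variants are useful when a tool needs a dependency, such as a manager instance, that only exists after the daemon has initialized it.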
+
+**Session Layer:**
+- Purpose: Persistent conversation history per channel+sender pair
+- Location: `src/session/`
+- Contains: `SessionStore` (SQLite), `SessionManager` (in-memory cache + store), `ManagedSession`
+- Depends on: `better-sqlite3`
+- Used by: Agent layer, Gateway SessionBridge
+
+**Memory Layer:**
+- Purpose: Persistent knowledge across sessions — namespace-based files + hybrid vector search
+- Location: `src/memory/`
+- Contains: `MemoryStore` (file-based), `VectorStore` (SQLite-backed embeddings), `HybridSearch`, embedding providers, text chunker
+- Depends on: Embedding providers (OpenAI, Gemini, Ollama, llama.cpp, Voyage)
+- Used by: Agent layer (memory injection into system prompt), Tools (memory.read/write/search)
+
+**Context Layer:**
+- Purpose: Token estimation and conversation compaction
+- Location: `src/context/`
+- Contains: Token estimator (`tokens.ts`), compaction logic (`compaction.ts`)
+- Depends on: Agent layer (delegation for compaction), Memory (extraction)
+- Used by: AgentOrchestrator (automatic compaction before each message)
+
+**Gateway Layer:**
+- Purpose: WebSocket JSON-RPC server + HTTP static file server + vanilla JS dashboard
+- Location: `src/gateway/`
+- Contains: `GatewayServer`, `Router`, `SessionBridge`, `LaneQueue`, auth, protocol, static serving, Tailscale Serve
+- Depends on: `ws`, Session, Agent, Tools
+- Used by: WebChat adapter, TUI (connects as WS client), external dashboard clients
+- Protocol: JSON-RPC 2.0 over WebSocket
+
+**Hooks Layer:**
+- Purpose: Pattern-based tool confirmation engine
+- Location: `src/hooks/`
+- Contains: `HookEngine` with glob-pattern matching for confirm/log/silent actions
+- Depends on: Nothing (pure logic)
+- Used by: ToolExecutor (checks before execution)
+
+**Prompt Layer:**
+- Purpose: Assemble system prompt from template files (SOUL.md, AGENTS.md, etc.)
+- Location: `src/prompt/`
+- Contains: Template search and assembly logic
+- Depends on: Filesystem (searches multiple directories)
+- Used by: Daemon (system prompt construction at startup)
+
+**MCP Layer:**
+- Purpose: Model Context Protocol server management — bridge external MCP tools into the tool registry
+- Location: `src/mcp/`
+- Contains: `McpClient`, `McpManager`, tool bridging (`bridge.ts`)
+- Depends on: `@modelcontextprotocol/sdk`
+- Used by: Daemon (starts MCP servers, registers bridged tools)
+
+**Skills Layer:**
+- Purpose: Pluggable skill system — bundled, managed (installed), and workspace skills
+- Location: `src/skills/`
+- Contains: `SkillRegistry`, `SkillInstaller`, skill loader
+- Depends on: Filesystem
+- Used by: Daemon (loads skills, injects into system prompt)
+
+**Agents Config Layer:**
+- Purpose: Named agent configurations with per-agent overrides (model tier, tools, sandbox)
+- Location: `src/agents/`
+- Contains: `AgentConfigRegistry`, `AgentRouter` (channel+sender → agent config resolution)
+- Depends on: Config
+- Used by: Daemon message router (selects agent config per session)
+
+**Automation Layer:**
+- Purpose: Scheduled tasks, webhooks, heartbeat monitoring, Gmail watching
+- Location: `src/automation/`
+- Contains: `CronScheduler`, `WebhookHandler`, `HeartbeatMonitor`, `GmailWatcher`
+- Depends on: `croner`, `googleapis`, Channels
+- Used by: Daemon (registers as channel adapters or standalone monitors)
+
+**Sandbox Layer:**
+- Purpose: Docker container isolation for tool execution
+- Location: `src/sandbox/`
+- Contains: `DockerSandbox`, `SandboxManager`, sandboxed shell/process tools
+- Depends on: Docker CLI
+- Used by: Daemon message router (replaces shell/process tools with sandboxed variants)
+
+**Frontend Layer (Legacy):**
+- Purpose: Direct Telegram bot integration and TUI
+- Location: `src/frontends/`
+- Contains: Telegram bot handlers with confirmation UI, TUI (minimal readline + fullscreen React/Ink)
+- Depends on: grammy (Telegram), ink/react (TUI)
+- Used by: CLI commands (`start` uses Telegram frontend, `tui` uses TUI)
+
+## Data Flow
+
+**Inbound Message Processing (Channel → Response):**
+
+1. Platform SDK receives message → Channel adapter normalizes to `InboundMessage`
+2. Adapter calls `onMessage()` callback → `ChannelRegistry.handleInbound()` routes to `MessageHandler`
+3. `createMessageRouter()` resolves agent config via `AgentRouter.resolve(channel, senderId)`
+4. `getOrCreateAgent()` creates or retrieves the `AgentOrchestrator` for the session (cached by `channel:senderId:agentConfig`)
+5. Audio attachments transcribed if present
+6. `orchestrator.process()` → injects memory context → checks compaction → delegates to `NativeAgent.process()`
+7. `NativeAgent.toolLoop()` → sends to `ModelRouter.chat()` → model returns response or tool calls
+8. If tool calls: `ToolExecutor.execute()` → policy check → hook check → tool execution → loop back to model
+9. Final text response returned → reply function sends via adapter → `adapter.send()` → platform SDK
+
+**Gateway WebSocket Flow:**
+
+1. Client connects via WebSocket → auth check → `SessionBridge.connect()` → `NativeAgent` created
+2. Client sends JSON-RPC message → `GatewayServer.handleMessage()` → `Router.dispatch()` → handler
+3. `agent.send` handler → `LaneQueue` serializes requests → `SessionBridge` processes via `NativeAgent`
+4. Streaming events sent back via WebSocket as JSON-RPC notifications
+5. HTTP requests serve static dashboard UI or webhook endpoints
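Step 3 describes per-session serialization. One common way to get that behavior is to keep a promise chain per lane key. The following is a sketch of the idea, assuming lanes are keyed by session ID. The real `LaneQueue` API in `src/gateway/` may differ.

```typescript
// Hypothetical sketch: tasks in the same lane run one at a time, in order;
// tasks in different lanes run concurrently.
class LaneQueue {
  private tails = new Map<string, Promise<unknown>>();

  enqueue<T>(lane: string, task: () => Promise<T>): Promise<T> {
    const prev = this.tails.get(lane) ?? Promise.resolve();
    // Start after the previous task settles, whether it resolved or rejected.
    const run = prev.then(task, task);
    // The stored tail never rejects, so one failure doesn't poison the lane.
    const tail = run.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(lane, tail);
    // Remove the lane once it is idle so the map doesn't grow without bound.
    tail.then(() => {
      if (this.tails.get(lane) === tail) this.tails.delete(lane);
    });
    return run;
  }

  get activeLanes(): number {
    return this.tails.size;
  }
}
```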
+
+**Model Routing with Fallback:**
+
+1. `ModelRouter.chat(request, tier)` → tries primary client for requested tier
+2. If retry config enabled: `withRetry()` wraps call with exponential backoff
+3. On failure → try tier-specific fallbacks (e.g., Anthropic → the same model via GitHub Models)
+4. On failure → try global fallback chain (typically local model)
+5. All failures → throw aggregated error
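The cascade reduces to trying an ordered list of clients and collecting their errors. This sketch makes several assumptions. It uses a simplified `ModelClient` with a single `chat()` method. It omits the `withRetry()` backoff wrapper. It reports the aggregated failure as one `Error` whose message joins the individual causes. The real `ModelRouter` in `src/models/router.ts` may be structured differently.

```typescript
// Simplified stand-ins for the real request/response types (assumed).
interface ChatRequest { prompt: string }
interface ChatResponse { text: string; model: string }
interface ModelClient { name: string; chat(req: ChatRequest): Promise<ChatResponse> }

type Tier = "fast" | "default" | "complex" | "local";

// Hypothetical sketch of tier routing with cascading fallback.
class FallbackRouter implements ModelClient {
  name = "router";

  constructor(
    private primary: Record<Tier, ModelClient>,
    private tierFallbacks: Partial<Record<Tier, ModelClient[]>>,
    private globalFallbacks: ModelClient[],
    private defaultTier: Tier = "default",
  ) {}

  // Satisfies ModelClient, so tier-unaware callers can use the router directly.
  chat(req: ChatRequest): Promise<ChatResponse> {
    return this.chatTier(req, this.defaultTier);
  }

  async chatTier(req: ChatRequest, tier: Tier): Promise<ChatResponse> {
    // Order: primary for the tier, then tier-specific fallbacks, then the global chain.
    const chain = [this.primary[tier], ...(this.tierFallbacks[tier] ?? []), ...this.globalFallbacks];
    const errors: unknown[] = [];
    for (const client of chain) {
      try {
        return await client.chat(req);
      } catch (err) {
        errors.push(err);
      }
    }
    // Every candidate failed: surface all causes in one error.
    const detail = errors.map((e) => (e instanceof Error ? e.message : String(e))).join("; ");
    throw new Error(`All ${chain.length} clients failed for tier "${tier}": ${detail}`);
  }
}
```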
+
+**Compaction Flow:**
+
+1. Before each `process()`, `AgentOrchestrator.compactIfNeeded()` checks token count vs threshold
+2. If threshold exceeded → `compactHistory()` splits messages into compactable + recent (keep N turns)
+3. Delegates summarization to `fast` tier via `orchestrator.delegate()`
+4. Optionally extracts memory facts via separate delegation call
+5. Replaces session history with `[summary_message, ...recent_messages]`
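The split in step 2 and the replacement in step 5 can be illustrated with two small functions. The `Message` shape, the rule that a turn starts at each `user` message, and the use of a `system` role for the summary are all assumptions made for this sketch. The real `compactHistory()` in `src/context/compaction.ts` may count turns differently. In particular, it must avoid separating a tool call from its tool result.

```typescript
interface Message { role: "user" | "assistant" | "system"; content: string }

// Hypothetical: split history so the last `keepTurns` user-initiated turns stay verbatim.
// A turn is assumed to start at each "user" message.
function splitForCompaction(
  history: Message[],
  keepTurns: number,
): { compactable: Message[]; recent: Message[] } {
  if (keepTurns <= 0) return { compactable: history.slice(), recent: [] };
  let turnsSeen = 0;
  let cut = history.length; // index where the "recent" slice begins
  for (let i = history.length - 1; i >= 0; i--) {
    if (history[i].role !== "user") continue;
    turnsSeen++;
    cut = i;
    if (turnsSeen === keepTurns) break;
  }
  return { compactable: history.slice(0, cut), recent: history.slice(cut) };
}

// Step 5: history becomes [summary_message, ...recent_messages].
// Using the "system" role for the summary is an assumption of this sketch.
function applySummary(summary: string, recent: Message[]): Message[] {
  return [{ role: "system", content: `[Summary of earlier conversation]\n${summary}` }, ...recent];
}
```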
+
+**State Management:**
+- Session history: SQLite (`~/.local/share/flynn/sessions.db`) + in-memory cache in `SessionManager`
+- Memory: Namespace-based markdown files in `~/.local/share/flynn/memory/`
+- Vectors: SQLite (`~/.local/share/flynn/vectors.db`) for embeddings
+- Config: YAML file at `~/.config/flynn/config.yaml` (read once at startup)
+
+## Key Abstractions
+
+**ModelClient:**
+- Purpose: Uniform interface to any LLM provider
+- Interface: `chat(request: ChatRequest): Promise<ChatResponse>` + optional `chatStream()`
+- Implementations: `src/models/anthropic.ts`, `src/models/openai.ts`, `src/models/gemini.ts`, `src/models/bedrock.ts`, `src/models/github.ts`, `src/models/local/ollama.ts`, `src/models/local/llamacpp.ts`
+- Pattern: Each provider wraps its SDK and normalizes to `ChatResponse`
+
+**ModelRouter:**
+- Purpose: Tier-based model selection with cascading fallback
+- Location: `src/models/router.ts`
+- Tiers: `fast`, `default`, `complex`, `local` — each maps to a `ModelClient`
+- Implements the `ModelClient` interface itself, so consumers don't need to know about tiers
+
+**ChannelAdapter:**
+- Purpose: Normalize platform-specific messaging into a common interface
+- Interface: `connect()`, `disconnect()`, `send(peerId, msg)`, `onMessage(handler)`
+- Location: `src/channels/types.ts`
+- Pattern: Each adapter wraps a platform SDK, handles auth/filtering, emits `InboundMessage`
+
+**Tool:**
+- Purpose: Executable capability exposed to the AI model
+- Interface: `{ name, description, inputSchema, execute(args): Promise<ToolResult> }`
+- Location: `src/tools/types.ts`
+- Registration: tool file → `src/tools/builtin/index.ts` → `src/tools/index.ts` → `src/daemon/index.ts`
+
+**Session:**
+- Purpose: Conversation state (message history) for a channel+sender pair
+- Interface: `addMessage()`, `getHistory()`, `clear()`, `replaceHistory()`
+- Location: `src/session/manager.ts`
+- ID format: `channel:senderId` (e.g., `telegram:123456`)
+
+**AgentOrchestrator:**
+- Purpose: Wraps NativeAgent with delegation, compaction, memory, usage tracking
+- Location: `src/backends/native/orchestrator.ts`
+- Key method: `delegate(SubAgentRequest)` — stateless single-turn call to any tier
+- Delegation tasks: compaction, memory extraction, classification, tool summarization, complex reasoning
+
+**DaemonContext:**
+- Purpose: Holds all initialized subsystems returned by `startDaemon()`
+- Location: `src/daemon/index.ts`
+- Contains: config, lifecycle, session/model/tool/channel/gateway/mcp/skill/agent registries
+
+## Entry Points
+
+**CLI Binary (`flynn`):**
+- Location: `src/cli/index.ts`
+- Triggers: `flynn start`, `flynn tui`, `flynn send`, `flynn sessions`, `flynn doctor`, `flynn config`, `flynn completion`
+- Responsibilities: Parse args, load config, bootstrap subsystems
+
+**Daemon Start:**
+- Location: `src/daemon/index.ts` → `startDaemon(config)`
+- Triggers: `flynn start` CLI command
+- Responsibilities: Initialize all subsystems in order, wire dependencies, start channel adapters and gateway
+
+**Gateway Server:**
+- Location: `src/gateway/server.ts`
+- Triggers: HTTP/WS connections on configured port (default 18800)
+- Responsibilities: JSON-RPC routing, WebSocket session management, static UI serving, webhook HTTP endpoints
+
+**TUI:**
+- Location: `src/frontends/tui/minimal.ts` (readline) and `src/frontends/tui/fullscreen.ts` (React/Ink)
+- Triggers: `flynn tui` or `flynn tui --fullscreen`
+- Responsibilities: Local interactive chat interface connecting to gateway via WebSocket
+
+## Error Handling
+
+**Strategy:** Catch-and-convert with descriptive context. No global error handler.
+
+**Patterns:**
+- Model layer: Retry with exponential backoff → tier fallback → global fallback → throw aggregated error
+- Tool execution: `Promise.race` timeout → catch → return `ToolResult { success: false, error: message }`
+- Channel adapters: `Promise.allSettled` for start/stop — log per-adapter errors, don't crash
+- Daemon: Lifecycle LIFO shutdown handlers — each wrapped in try/catch
+- Config: Zod validation throws with structured error messages on invalid config
+- Gateway: JSON-RPC error codes (`ParseError`, `MethodNotFound`, `InternalError`)
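The daemon's LIFO shutdown pattern, with each handler in its own try/catch, can be sketched as follows. The `Lifecycle` class and its method names are hypothetical, not the API in `src/daemon/`.

```typescript
type ShutdownHandler = { name: string; fn: () => void | Promise<void> };

class Lifecycle {
  private handlers: ShutdownHandler[] = [];

  // Register in startup order; the last subsystem started is the first stopped.
  onShutdown(name: string, fn: () => void | Promise<void>): void {
    this.handlers.push({ name, fn });
  }

  // Returns the names of handlers that failed.
  async shutdown(log: (msg: string) => void = console.error): Promise<string[]> {
    const failed: string[] = [];
    // Pop from the end (LIFO) so dependents stop before their dependencies.
    while (this.handlers.length > 0) {
      const h = this.handlers.pop()!;
      try {
        await h.fn();
      } catch (err) {
        // One failing handler must not prevent the rest from running.
        failed.push(h.name);
        log(`shutdown handler "${h.name}" failed: ${String(err)}`);
      }
    }
    return failed;
  }
}
```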
+
+## Cross-Cutting Concerns
+
+**Logging:** `console.log`/`console.error`/`console.warn`/`console.debug` throughout. No structured logging framework. Debug-level messages for model fallback decisions.
+
+**Validation:** Zod for config validation (`src/config/schema.ts`). Each tool's `inputSchema` is sent to the model, which is expected to follow it, but arguments are not validated against that schema at runtime; the only checks are whatever each tool performs itself.
+
+**Authentication:** Multi-layer:
+- Gateway: Bearer token auth + optional Tailscale identity header (`src/gateway/auth.ts`)
+- Telegram: `allowed_chat_ids` whitelist
+- Discord: `allowed_guild_ids` + `allowed_channel_ids` whitelists
+- Slack: `allowed_channel_ids` whitelist + signing secret
+- WhatsApp: `allowed_numbers` + `allowed_group_ids` whitelists
+- Webhooks: HMAC signature verification (per-webhook secret)
+- Pairing: DM pairing codes for unknown senders (`src/channels/pairing.ts`)
+
+**Tool Policy:** Profile-based filtering (minimal/messaging/coding/full) + glob-pattern allow/deny lists + per-agent/per-provider overrides (`src/tools/policy.ts`).
+
+**Configuration:** Single YAML file with `${ENV_VAR}` expansion, validated by comprehensive Zod schema. Every subsystem is feature-toggled via config. Default config path: `~/.config/flynn/config.yaml`.
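`${ENV_VAR}` expansion can be done as a recursive pass over the parsed YAML value before Zod validation. This sketch makes two assumptions: it throws on unset variables, and it supports no default-value syntax. The actual behavior of `src/config/loader.ts` may differ.

```typescript
// Hypothetical: replace ${NAME} in every string inside a parsed YAML value.
const ENV_REF = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g;

function expandEnv(value: unknown, env: Record<string, string | undefined> = process.env): unknown {
  if (typeof value === "string") {
    return value.replace(ENV_REF, (_match, name: string) => {
      const resolved = env[name];
      if (resolved === undefined) throw new Error(`Config references unset env var: ${name}`);
      return resolved;
    });
  }
  if (Array.isArray(value)) return value.map((v) => expandEnv(v, env));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) out[k] = expandEnv(v, env);
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}
```

Running expansion before validation means Zod sees the final values, so a variable that expands to an invalid value is still rejected.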
+
+---
+
+*Architecture analysis: 2025-02-09*
@@ -0,0 +1,213 @@
+# Codebase Concerns
+
+**Analysis Date:** 2025-02-09
+
+## Tech Debt
+
+**God File: `src/daemon/index.ts` (1087 lines):**
+- Issue: Single file handles all service wiring — model client creation, channel setup, agent factory, memory initialization, vector indexer, session pruning, lifecycle management, and graceful shutdown. This is the most complex file in the codebase.
+- Files: `src/daemon/index.ts`
+- Impact: Any change to wiring logic requires editing this file. High risk of merge conflicts. Difficult to test individual setup phases in isolation.
+- Fix approach: Extract service factories into separate modules (e.g., `src/daemon/models.ts`, `src/daemon/channels.ts`, `src/daemon/memory.ts`). Keep `index.ts` as a thin composition root that calls into extracted modules.
+
+**No ESLint Configuration:**
+- Issue: No `eslint.config.js`, `.eslintrc.*`, or any other ESLint configuration file exists. `package.json` defines a `lint` script (`pnpm lint`), but there is no config for it to enforce.
+- Files: Project root (missing `eslint.config.js`)
+- Impact: No automated enforcement of code style, unused imports, or common error patterns. Relying solely on TypeScript compiler checks and developer discipline.
+- Fix approach: Add an `eslint.config.js` using the flat config format. Start from the recommended config provided by `typescript-eslint` and add project-specific rules incrementally.
+
+**Pervasive `as any` Casting (100+ instances):**
+- Issue: Heavy use of `as any` and `as unknown as` type casts, concentrated in three areas:
+  1. **Test files** (~70%): Mocking objects with `{} as any` or `mockObj as unknown as RealType`. This is common in test code but makes refactoring fragile — tests won't fail when interfaces change.
+  2. **Model clients and channel adapters** (~15%): SDK types don't cover every feature used (e.g., Ollama thinking, WhatsApp `getChat()`, Discord `sendTyping()`).
+  3. **Production code** (~15%): Type system workarounds like `loopMessages as unknown as Message[]` in `src/backends/native/agent.ts:136`.
+- Files: `src/models/local/ollama.ts` (lines 36, 92, 159, 173), `src/channels/whatsapp/adapter.ts` (lines 105, 224, 263, 280), `src/channels/discord/adapter.ts` (lines 208, 219), `src/backends/native/agent.ts` (line 136), `src/models/anthropic.ts` (line 83), `src/models/bedrock.ts` (line 180)
+- Impact: Type safety gaps. Refactoring won't catch breakage in cast sites. SDK upgrades may silently break at runtime.
+- Fix approach: For test files, create proper mock factories with typed interfaces. For model clients, use module augmentation or wrapper types. For agent.ts, define a proper `LoopMessage` → `Message` adapter.
+
+**Agent Cache Grows Unboundedly:**
+- Issue: The `agents` Map in `src/daemon/index.ts:342` caches `AgentOrchestrator` instances keyed by `channel:senderId:agentConfig`. Entries are never evicted. Each entry holds a full agent with session history, token counts, and tool state.
+- Files: `src/daemon/index.ts:342-453`
+- Impact: Memory grows linearly with unique senders. In a multi-channel deployment with many users, this becomes a memory leak. Long-running daemon will accumulate stale agent instances.
+- Fix approach: Add LRU eviction or TTL-based pruning (similar to session store's existing TTL pattern). Consider flushing agent state to SQLite and lazy-loading.
+
+**Tool Executor Timer Leak:**
+- Issue: The `Promise.race` timeout in the tool executor creates a `setTimeout` that is never cleared when the tool finishes first. The timer therefore still fires after the race has settled and rejects a promise that nothing is waiting on.
+- Files: `src/tools/executor.ts:61-65`
+- Impact: Minor. Each tool execution leaves one pending timer until the timeout elapses. Under heavy tool use this adds unnecessary timer churn, and each pending timer keeps the Node.js event loop alive until it fires. The late rejection is not reported as unhandled, because `Promise.race` attaches a rejection handler to every promise it races.
+- Fix approach: Use `AbortController` + `AbortSignal.timeout()` pattern, or explicitly `clearTimeout` in a `.finally()` block. See `src/tools/builtin/web-fetch.ts:200` for the correct pattern already used elsewhere in the codebase.
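The following is a minimal sketch of the `clearTimeout`-in-`finally` fix. It assumes a hypothetical `withTimeout()` helper; the name and error message are illustrative, not the code in `src/tools/executor.ts`.

```typescript
// Hypothetical helper: race a promise against a timeout and always clear the timer.
async function withTimeout<T>(work: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Runs whether `work` won or lost, so no timer outlives the call.
    clearTimeout(timer);
  }
}
```

Because the timer is cleared as soon as the tool finishes, a fast tool no longer leaves a pending timer behind for the full timeout.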
+
+**Agent Cancellation Not Implemented:**
+- Issue: `TODO: Wire AbortController into NativeAgent for actual cancellation` — the gateway handler sets a flag but the agent loop has no mechanism to check it or abort mid-iteration.
+- Files: `src/gateway/handlers/agent.ts:89`, `src/backends/native/agent.ts`
+- Impact: Users cannot cancel long-running agent tasks. The "cancel" API endpoint exists but is a no-op beyond setting a flag. This affects gateway/TUI UX.
+- Fix approach: Pass an `AbortSignal` into the agent loop (`NativeAgent.process()` / `toolLoop()`). Check `signal.aborted` between loop iterations and before each tool execution, and propagate the signal to model client `chat()` calls.
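The proposed fix can be sketched as a loop that checks the signal between iterations and before each tool call. `runToolLoop`, `step`, `runTool`, and the iteration cap are hypothetical. The real `NativeAgent` loop and model client signatures may differ.

```typescript
interface ToolCall { name: string; args: Record<string, unknown> }
interface StepResult { text?: string; toolCalls: ToolCall[] }

function checkAborted(signal: AbortSignal): void {
  if (signal.aborted) throw new Error("Agent run cancelled");
}

// Hypothetical agent loop that honors cancellation via AbortSignal.
async function runToolLoop(
  step: (signal: AbortSignal) => Promise<StepResult>, // one model call
  runTool: (call: ToolCall, signal: AbortSignal) => Promise<void>,
  signal: AbortSignal,
  maxIterations = 10,
): Promise<string> {
  for (let i = 0; i < maxIterations; i++) {
    checkAborted(signal); // between loop iterations
    const result = await step(signal); // the signal is passed down to the model client
    if (result.toolCalls.length === 0) return result.text ?? "";
    for (const call of result.toolCalls) {
      checkAborted(signal); // before each tool execution
      await runTool(call, signal);
    }
  }
  throw new Error(`Tool loop exceeded ${maxIterations} iterations`);
}
```

With this shape, the gateway's cancel handler only needs to call `abort()` on an `AbortController` stored per request.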
+
+**PairingManager State is Ephemeral:**
+- Issue: `PairingManager` stores approved senders and pending codes in memory-only Maps (`src/channels/pairing.ts:36-37`). All pairing state is lost on daemon restart.
+- Files: `src/channels/pairing.ts:36-37`
+- Impact: After restart, all previously paired senders must re-pair. In production, this means users lose access on every deployment or crash.
+- Fix approach: Persist approved senders to SQLite (alongside sessions). Pending codes can remain in-memory since they're short-lived by design.
+
+**Hardcoded Anthropic → GitHub Model Mapping:**
+- Issue: `anthropicToGitHubModel()` contains a hardcoded mapping table that must be manually updated for each new Anthropic model release. A generic fallback regex exists but only handles the date-suffix stripping pattern.
+- Files: `src/daemon/index.ts:155-175`
+- Impact: New Anthropic model releases require a code change to work with GitHub Models provider. The fallback regex may produce incorrect names for models that don't follow the `name-N-N-YYYYMMDD` convention.
+- Fix approach: Move mapping to config file or allow user overrides in YAML config. The generic fallback is a good start but should be validated against known GitHub model names.
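One way to implement the suggested config override is a lookup that consults user-supplied mappings first, then applies the date-suffix fallback, and optionally rejects guesses that are not in a set of known GitHub model names. The function name, its parameters, and the example model IDs in the usage are hypothetical. Real GitHub Models names should be checked against the provider's catalog.

```typescript
// Only pattern the generic fallback understands: a trailing -YYYYMMDD date suffix.
const DATE_SUFFIX = /-\d{8}$/;

// Hypothetical resolver: explicit (config-supplied) mappings win over the regex guess.
function resolveGitHubModel(
  anthropicModel: string,
  overrides: Record<string, string> = {},
  known?: ReadonlySet<string>,
): string | undefined {
  if (Object.prototype.hasOwnProperty.call(overrides, anthropicModel)) return overrides[anthropicModel];
  const guess = anthropicModel.replace(DATE_SUFFIX, "");
  // Reject names the provider is not known to serve instead of failing at request time.
  if (known && !known.has(guess)) return undefined;
  return guess;
}
```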
+
+**Duplicated Content Conversion Logic Across Model Clients:**
+- Issue: Each model client independently implements `Message` → provider-specific format conversion, tool call parsing, and response normalization. The same JSON.parse pattern for tool arguments appears in 3+ clients.
+- Files: `src/models/github.ts`, `src/models/openai.ts`, `src/models/local/llamacpp.ts`, `src/models/gemini.ts`, `src/models/anthropic.ts`, `src/models/bedrock.ts`
+- Impact: Bug fixes or format changes must be applied across all clients. Easy to miss one. Different error handling behavior per client.
+- Fix approach: Extract shared conversion utilities (e.g., `parseToolArguments()`, `convertMessages()`) into `src/models/shared.ts` or `src/models/utils.ts`.
+
+## Known Bugs
+
+**Tool Executor Timeout Creates Orphaned Promise:**
+- Symptoms: When a tool completes before the timeout, the pending `setTimeout` callback still fires later and rejects the timeout promise that lost the race.
+- Files: `src/tools/executor.ts:63-65`
+- Trigger: Any tool that completes normally while its timeout is pending.
+- Workaround: None needed in practice. `Promise.race` has already attached a rejection handler to the timeout promise, so the late rejection is handled and no `unhandledRejection` is raised; the real cost is the timer that stays pending until it fires.
+
+## Security Considerations
+
+**File Tools Have No Path Restrictions (CRITICAL):**
+- Risk: `file.read`, `file.write`, `file.edit`, `file.list` tools can access ANY path on the filesystem. An AI agent (or a prompt injection via user input) could read sensitive files (`/etc/shadow`, `~/.ssh/id_rsa`, `.env`) or write to critical system paths.
+- Files: `src/tools/builtin/file-read.ts`, `src/tools/builtin/file-write.ts`, `src/tools/builtin/file-edit.ts`, `src/tools/builtin/file-list.ts`
+- Current mitigation: Tool policy system (`src/tools/policy.ts`) can deny specific tools entirely, but there is NO path-level restriction. Hooks can prompt for confirmation on specific tools.
+- Recommendations: Add configurable path allowlist/denylist to file tools. At minimum, deny access to common sensitive paths (`/etc/shadow`, `~/.ssh/`, any `.env` file). Consider a sandbox directory approach where file tools only operate within configured working directories.
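The following is a minimal sketch of the working-directory approach. It assumes a hypothetical `isPathAllowed()` check that file tools would call before touching the filesystem. It resolves symlinks with `fs.realpathSync`, falling back to the nearest existing parent for files that don't exist yet. Without this step, a symlink inside an allowed root could point anywhere. The choice of `~/.ssh` and `.env*` files as always-denied is illustrative.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Resolve symlinks for the longest existing prefix so new files can still be checked.
function realpathLenient(p: string): string {
  const abs = path.resolve(p);
  try {
    return fs.realpathSync(abs);
  } catch {
    const parent = path.dirname(abs);
    if (parent === abs) return abs; // reached the filesystem root
    return path.join(realpathLenient(parent), path.basename(abs));
  }
}

function isInside(child: string, root: string): boolean {
  const rel = path.relative(root, child);
  return rel === "" || (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel));
}

// Hypothetical policy: allow only paths under `roots`, and never obvious secrets.
function isPathAllowed(target: string, roots: string[]): boolean {
  const real = realpathLenient(target);
  if (isInside(real, realpathLenient(path.join(os.homedir(), ".ssh")))) return false;
  if (/^\.env(\..*)?$/.test(path.basename(real))) return false; // .env, .env.local, ...
  return roots.some((root) => isInside(real, realpathLenient(root)));
}
```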
+
+**`browser.evaluate` Uses `eval()` (HIGH):**
+- Risk: The `browser.evaluate` tool calls `eval(expr)` in the browser page context. While this executes in the Puppeteer browser sandbox (not the Node.js process), it allows arbitrary JavaScript execution on whatever page the browser has navigated to.
+- Files: `src/tools/builtin/browser/tools.ts:226`
+- Current mitigation: Tool policy can deny `browser.evaluate` entirely. The browser runs in a sandboxed Puppeteer context.
+- Recommendations: Consider restricting to a safer evaluation mechanism or adding URL allowlists for browser navigation.
+
+**Gateway Token Comparison Vulnerable to Timing Attacks (MEDIUM):**
+- Risk: Token authentication in `src/gateway/auth.ts` uses `===` for string comparison (lines 38, 43). This is theoretically vulnerable to timing attacks where an attacker can deduce the token character by character based on comparison timing.
+- Files: `src/gateway/auth.ts:38,43`
+- Current mitigation: The gateway is typically accessed over Tailscale (private network), reducing attack surface.
+- Recommendations: Use `crypto.timingSafeEqual()` for token comparison. The codebase already uses this correctly in `src/automation/webhooks.ts` — apply the same pattern here.
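`crypto.timingSafeEqual()` throws when its two buffers differ in length. A common approach is to hash both sides to a fixed length first. This is a minimal sketch with a hypothetical function name; the pattern in `src/automation/webhooks.ts` may differ.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Compare a presented token with the expected one in constant time.
// Hashing both to SHA-256 yields equal-length buffers, so timingSafeEqual never throws
// and the comparison time doesn't depend on where the strings first differ.
function tokensMatch(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented, "utf8").digest();
  const b = createHash("sha256").update(expected, "utf8").digest();
  return timingSafeEqual(a, b);
}
```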
+
+**Unguarded JSON.parse on Tool Call Arguments (MEDIUM):**
+- Risk: Multiple model clients call `JSON.parse(tc.function.arguments)` without try-catch. If a model returns malformed JSON in tool arguments, `JSON.parse` throws a `SyntaxError` that propagates out of the client and can abort the agent loop.
+- Files: `src/models/github.ts:151,268`, `src/models/openai.ts:96`, `src/models/local/llamacpp.ts:127,262`
+- Current mitigation: None. The agent loop's outer try-catch may catch this, but the error message won't be helpful.
+- Recommendations: Wrap all tool argument parsing in try-catch with a fallback to `{}` or a descriptive error. Extract a shared `safeParseToolArgs()` utility.
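The following sketches the suggested `safeParseToolArgs()` helper. It returns a discriminated result rather than silently substituting `{}`. This is a design choice made here, not the codebase's current behavior. The advantage is that the agent loop can report malformed arguments back to the model as a tool error.

```typescript
type ParsedArgs =
  | { ok: true; args: Record<string, unknown> }
  | { ok: false; error: string };

// Hypothetical shared helper for src/models/*: never throws on model output.
function safeParseToolArgs(raw: string | undefined, toolName: string): ParsedArgs {
  if (raw === undefined || raw.trim() === "") return { ok: true, args: {} };
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, error: `Invalid JSON arguments for tool "${toolName}": ${reason}` };
  }
  // Tool arguments must be a JSON object, not an array or a primitive.
  if (value === null || typeof value !== "object" || Array.isArray(value)) {
    return { ok: false, error: `Arguments for tool "${toolName}" must be a JSON object` };
  }
  return { ok: true, args: value as Record<string, unknown> };
}
```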
+
+**No Rate Limiting on Gateway (LOW-MEDIUM):**
+- Risk: The WebSocket and HTTP gateway endpoints have no rate limiting. A malicious or misconfigured client could flood the daemon with requests, causing resource exhaustion.
+- Files: `src/gateway/server.ts`
+- Current mitigation: Authentication is required (token or Tailscale identity). The lane queue serializes per-session, which limits concurrency per sender.
+- Recommendations: Add connection-level rate limiting (max connections per IP, max messages per minute per session). Consider using the `ws` server's built-in `maxPayload` option and adding message rate tracking.
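Per-session message rate tracking can be as simple as a sliding-window counter. The WebSocket message handler would check it before dispatching. The class, its limits, and the `prune()` method are hypothetical. `maxPayload` is a real `ws` server option but is not shown here.

```typescript
// Hypothetical sliding-window limiter keyed by session ID or client IP.
class RateLimiter {
  private hits = new Map<string, number[]>();

  constructor(private maxEvents: number, private windowMs: number) {}

  // Returns true (and records the event) if allowed; false if over the limit.
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.maxEvents) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }

  // Drop keys with no events in the current window (call periodically).
  prune(now: number = Date.now()): void {
    const cutoff = now - this.windowMs;
    this.hits.forEach((times, key) => {
      if (times.every((t) => t <= cutoff)) this.hits.delete(key);
    });
  }

  get size(): number {
    return this.hits.size;
  }
}
```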
+
+## Performance Bottlenecks
+
+**Memory Context Rebuilt on Every Message:**
+- Problem: `_injectMemoryContext()` is called at the start of every `process()` call in the orchestrator. It reads from the memory store (which reads files from disk via `getContextForPrompt()`) and rebuilds the system prompt string by concatenating base prompt + memory context.
+- Files: `src/backends/native/orchestrator.ts:218,347-360`, `src/memory/store.ts`
+- Cause: No caching or dirty-checking. Even if memory hasn't changed since last message, the full read + concatenation happens again.
+- Improvement path: Cache the enriched system prompt and only rebuild when memory store signals a change (e.g., via a dirty flag or version counter on the store).
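The version-counter idea can be sketched as a small cache keyed on the store's version. `VersionedMemory` and `getVersion()` are hypothetical, and `getContextForPrompt()` is assumed to be synchronous here. The real `MemoryStore` would need to bump a counter on every write.

```typescript
interface VersionedMemory {
  getVersion(): number; // incremented on every write (assumed API)
  getContextForPrompt(): string; // expensive: reads files from disk
}

// Hypothetical cache: rebuild the enriched prompt only when memory has changed.
class SystemPromptCache {
  private cachedVersion = -1;
  private cachedPrompt = "";

  constructor(private basePrompt: string, private memory: VersionedMemory) {}

  get(): string {
    const version = this.memory.getVersion();
    if (version !== this.cachedVersion) {
      const context = this.memory.getContextForPrompt();
      this.cachedPrompt = context ? `${this.basePrompt}\n\n${context}` : this.basePrompt;
      this.cachedVersion = version;
    }
    return this.cachedPrompt;
  }
}
```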
+
+**Background Vector Indexer Runs Unconditionally Every 30 Seconds:**
+- Problem: The vector indexer `setInterval` fires every 30 seconds regardless of whether any namespaces are dirty. While `getDirtyNamespaces()` returns an empty array quickly, the interval itself and the function call overhead are unnecessary when idle.
+- Files: `src/daemon/index.ts:606-637`
+- Cause: Simple polling pattern with fixed interval.
+- Improvement path: Use an event-driven approach: trigger indexing when `memoryStore.write()` is called. The loop body already exits early when the dirty set is empty, so this is low priority.
+
+**Web-Fetch Cache Has No Size Limit:**
+- Problem: The module-level `cache` Map in web-fetch has no size limit. Entries expire after a 5-minute TTL, but expiry is checked lazily, only when the same URL is read again, so entries for URLs that are never re-requested are never removed.
+- Files: `src/tools/builtin/web-fetch.ts:37`
+- Cause: No max-size constraint on the cache Map. Eviction only happens on read (lazy).
+- Improvement path: Add a max entry count (e.g., 100). Use LRU eviction when limit is reached. Or add periodic eviction via the existing `evictExpired()` function on a timer.
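A JavaScript `Map` iterates in insertion order. A size-bounded LRU with TTL can therefore be built by re-inserting entries on access and evicting from the front. This sketch assumes string keys (URLs) and the 5-minute TTL described above. The class name and the 100-entry default are illustrative.

```typescript
// Hypothetical bounded cache: LRU eviction by size, plus TTL expiry.
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private maxEntries = 100, private ttlMs = 5 * 60 * 1000) {}

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    this.entries.delete(key);
    if (entry.expiresAt <= now) return undefined; // expired: stays deleted
    this.entries.set(key, entry); // re-insert to mark as most recently used
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
    // Evict least recently used entries: the oldest in Map iteration order.
    while (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest === undefined) break;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

The size bound caps memory even for URLs that are never requested again, while the TTL still keeps fetched content fresh.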
+
+## Fragile Areas
+
+**Model Client Content Format Handling:**
+- Files: `src/backends/native/agent.ts:127-136`, `src/models/anthropic.ts:83`, `src/models/gemini.ts:206`
+- Why fragile: The `LoopMessage` type carries structured content (tool_use blocks, tool_result blocks) that must be cast to `Message[]` via `as unknown as`. Each model client then re-interprets this structure differently. Adding a new content block type (e.g., images) requires changes in every client.
+- Safe modification: When adding new content types, test against ALL model providers, not just the primary one. The cast at agent.ts:136 hides type mismatches.
+- Test coverage: Agent loop is tested (`agent.test.ts`) but individual model client content conversion is only partially tested.
+
+**Session Store Stores Structured Content as TEXT:**
+- Files: `src/session/store.ts`
+- Why fragile: Session messages with structured content (tool_use, tool_result blocks) are serialized to TEXT columns in SQLite. Deserialization must exactly match the serialization format. Schema changes to message content structure require migration.
+- Safe modification: Always verify round-trip serialization when changing message content types. Add integration tests for session save/restore with tool_use content.
+- Test coverage: Session store has tests but they may not cover all content block variations.
+
+**Compaction Delegation Has No Circuit Breaker:**
+- Files: `src/context/compaction.ts:72-91`, `src/backends/native/orchestrator.ts:366-402`
+- Why fragile: Compaction delegates to a sub-agent (fast tier) for summarization and memory extraction. If the sub-agent call fails (model error, timeout, rate limit), the error is caught and logged but the original (uncompacted) history remains. Repeated failures will cause the context to grow until it exceeds the model's context window, at which point messages will fail entirely.
+- Safe modification: Add retry logic or a fallback compaction strategy (e.g., simple truncation) when delegation fails.
+- Test coverage: Compaction has unit tests but no integration test for delegation failure scenarios.
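The following sketches the fallback idea with a simple circuit breaker. It tries the delegated summary first. After repeated failures it stops delegating for a cooldown period and falls back to plain truncation, so history cannot grow without bound. `summarize`, the thresholds, the injected clock, and the truncation marker are hypothetical.

```typescript
interface Msg { role: string; content: string }

// Hypothetical compaction wrapper with a circuit breaker.
class CompactionGuard {
  private consecutiveFailures = 0;
  private openedAt = 0;

  constructor(
    private summarize: (msgs: Msg[]) => Promise<string>, // delegated fast-tier call
    private maxFailures = 3, // failures before the breaker opens
    private cooldownMs = 60_000, // how long to skip delegation once open
    private now: () => number = () => Date.now(),
  ) {}

  async compact(compactable: Msg[], recent: Msg[]): Promise<Msg[]> {
    const open = this.consecutiveFailures >= this.maxFailures && this.now() - this.openedAt < this.cooldownMs;
    if (!open) {
      try {
        const summary = await this.summarize(compactable);
        this.consecutiveFailures = 0; // success closes the breaker
        return [{ role: "system", content: `[Summary]\n${summary}` }, ...recent];
      } catch {
        this.consecutiveFailures++;
        if (this.consecutiveFailures >= this.maxFailures) this.openedAt = this.now();
      }
    }
    // Fallback: drop the compactable part (simple truncation).
    return [{ role: "system", content: `[${compactable.length} earlier messages truncated]` }, ...recent];
  }
}
```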
+
+## Scaling Limits
+
+**Agent Cache (Memory):**
+- Current capacity: One `AgentOrchestrator` + `OutboundAttachmentCollector` per unique `channel:senderId:agentConfig` combination.
+- Limit: Memory-bound. Each agent holds full conversation history (until compacted). With 100+ active users, memory usage becomes significant.
+- Scaling path: LRU eviction with configurable max entries. Persist agent state to SQLite for cold sessions.
+
+**SQLite Session Store:**
+- Current capacity: Single SQLite database for all sessions across all channels.
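+- Limit: SQLite allows a single writer at a time, and each separately committed write waits on a disk sync, so throughput for many small writes (estimated at ~100 writes/sec) is bound by storage latency. With a high-volume multi-channel deployment, write contention may occur.
+- Scaling path: Confirm that WAL mode (`PRAGMA journal_mode=WAL`) is actually enabled rather than assuming it. For higher scale, consider per-channel databases or migration to a server database.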
+
+## Dependencies at Risk
+
+**WhatsApp Web.js (`whatsapp-web.js`):**
+- Risk: Unofficial WhatsApp client library that reverse-engineers the WhatsApp Web protocol. Can break on any WhatsApp update without warning. Heavy `as any` casting in the adapter suggests the types are incomplete.
+- Impact: WhatsApp channel becomes unavailable until library is updated.
+- Migration plan: No official WhatsApp Node.js SDK alternative. Consider the WhatsApp Business API (Cloud API) for a more stable integration.
+
+## Missing Critical Features
+
+**No Structured Logging:**
+- Problem: All logging uses `console.log`, `console.error`, `console.warn` throughout the codebase. No structured logging framework (winston, pino, etc.).
+- Blocks: Log aggregation, filtering by severity, JSON log parsing by monitoring tools, correlation IDs for request tracing.
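+
+As a dependency-free illustration of structured logging, the hypothetical `createLogger` below writes one JSON object per line, with a level, a timestamp, a message, and optional fields such as a correlation ID added through `child()`. A library such as pino would be the more complete choice. The names and line format here are assumptions.
+
+```typescript
+type Level = 'debug' | 'info' | 'warn' | 'error';
+type Fields = Record<string, unknown>;
+
+const LEVEL_ORDER: Record<Level, number> = { debug: 10, info: 20, warn: 30, error: 40 };
+
+export interface Logger {
+  debug(msg: string, fields?: Fields): void;
+  info(msg: string, fields?: Fields): void;
+  warn(msg: string, fields?: Fields): void;
+  error(msg: string, fields?: Fields): void;
+  /** Derive a logger that adds fixed fields (e.g. a request ID) to every line. */
+  child(extra: Fields): Logger;
+}
+
+export function createLogger(
+  options: { minLevel?: Level; bindings?: Fields; write?: (line: string) => void } = {},
+): Logger {
+  const minLevel = options.minLevel ?? 'info';
+  const bindings = options.bindings ?? {};
+  const write = options.write ?? ((line: string) => { process.stdout.write(line + '\n'); });
+
+  function log(level: Level, msg: string, fields: Fields = {}): void {
+    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return;
+    // Convert Error values to error.message before passing them in:
+    // JSON.stringify drops an Error's non-enumerable message and stack.
+    write(JSON.stringify({ time: new Date().toISOString(), level, msg, ...bindings, ...fields }));
+  }
+
+  return {
+    debug: (msg, fields) => log('debug', msg, fields),
+    info: (msg, fields) => log('info', msg, fields),
+    warn: (msg, fields) => log('warn', msg, fields),
+    error: (msg, fields) => log('error', msg, fields),
+    child: (extra) => createLogger({ minLevel, write, bindings: { ...bindings, ...extra } }),
+  };
+}
+```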
+
+**No Health Check Endpoint:**
+- Problem: The gateway server has no `/health` or `/ready` endpoint for container orchestration or monitoring.
+- Blocks: Kubernetes liveness/readiness probes, load balancer health checks, uptime monitoring.
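+
+A health endpoint on the existing `http.createServer` gateway could be as small as the sketch below. The `/health` (liveness) and `/ready` (readiness) paths follow common Kubernetes practice. The `isReady` callback, the response body, and the idea of calling `handleHealth` first in the gateway's request handler are assumptions about how it would be wired into `src/gateway/server.ts`.
+
+```typescript
+import type { IncomingMessage, ServerResponse } from 'node:http';
+
+export interface HealthResult {
+  status: number;
+  body: { status: 'ok' | 'unavailable' };
+}
+
+/** Liveness: the process answers. Readiness: dependencies (channels, models) are usable. */
+export function healthCheck(path: string, isReady: () => boolean): HealthResult | undefined {
+  if (path === '/health') return { status: 200, body: { status: 'ok' } };
+  if (path === '/ready') {
+    return isReady() ? { status: 200, body: { status: 'ok' } } : { status: 503, body: { status: 'unavailable' } };
+  }
+  return undefined; // not a health route; let the normal gateway handler run
+}
+
+/** Returns true if the request was answered as a health check. */
+export function handleHealth(req: IncomingMessage, res: ServerResponse, isReady: () => boolean): boolean {
+  if (req.method !== 'GET') return false;
+  const path = new URL(req.url ?? '/', 'http://localhost').pathname;
+  const result = healthCheck(path, isReady);
+  if (!result) return false;
+  res.writeHead(result.status, { 'Content-Type': 'application/json' });
+  res.end(JSON.stringify(result.body));
+  return true;
+}
+```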
+
+## Test Coverage Gaps
+
+**File Tools (ALL untested):**
+- What's not tested: `file.read`, `file.write`, `file.edit`, `file.list` — the tools that interact with the filesystem.
+- Files: `src/tools/builtin/file-read.ts`, `src/tools/builtin/file-write.ts`, `src/tools/builtin/file-edit.ts`, `src/tools/builtin/file-list.ts`
+- Risk: Path handling edge cases, permission errors, large file handling, and (critically) any path restriction logic added in the future.
+- Priority: High — these are security-sensitive tools.
+
+**Memory Tools (ALL untested):**
+- What's not tested: `memory.read`, `memory.write`, `memory.search` tool wrappers.
+- Files: `src/tools/builtin/memory-read.ts`, `src/tools/builtin/memory-write.ts`, `src/tools/builtin/memory-search.ts`
+- Risk: Namespace validation, content truncation, error handling for missing namespaces.
+- Priority: Medium
+
+**Process Tools (ALL untested):**
+- What's not tested: `process.start`, `process.kill`, `process.list`, `process.output`, `process.status` — tools for managing background processes.
+- Files: `src/tools/builtin/process/start.ts`, `src/tools/builtin/process/kill.ts`, `src/tools/builtin/process/list.ts`, `src/tools/builtin/process/output.ts`, `src/tools/builtin/process/status.ts`
+- Risk: Process lifecycle edge cases, PID reuse, zombie processes. Note: `src/tools/builtin/process/manager.ts` IS tested.
+- Priority: Medium
+
+**Gateway Handlers (ALL untested individually):**
+- What's not tested: Individual handler modules — agent, config, pairing, sessions, system, tools handlers.
+- Files: `src/gateway/handlers/agent.ts`, `src/gateway/handlers/config.ts`, `src/gateway/handlers/pairing.ts`, `src/gateway/handlers/sessions.ts`, `src/gateway/handlers/system.ts`, `src/gateway/handlers/tools.ts`
+- Risk: Request validation, error responses, edge cases in each handler. Note: `src/gateway/handlers/handlers.test.ts` exists and tests the composed handler set, but individual handler logic is not unit-tested.
+- Priority: Low-Medium (integration tests via handlers.test.ts provide some coverage)
+
+**MCP Client (untested):**
+- What's not tested: MCP protocol client for connecting to external tool servers.
+- Files: `src/mcp/client.ts`
+- Risk: Connection lifecycle, protocol errors, tool discovery failures. Note: `src/mcp/bridge.test.ts` tests the bridge layer with mocked clients.
+- Priority: Medium
+
+**GitHub Models Client (untested):**
+- What's not tested: GitHub Models provider with streaming and non-streaming chat.
+- Files: `src/models/github.ts`
+- Risk: Unguarded `JSON.parse` on tool arguments (lines 151, 268), streaming chunk assembly, error handling for API failures.
+- Priority: Medium — this is a primary model provider.
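+
+Guarding those `JSON.parse` calls could look like the hypothetical helper below. It turns malformed or non-object tool arguments into an error value the caller can report as a failed tool result, instead of an exception escaping the stream handler. The helper name, the return shape, and treating empty input as `{}` (which some providers send for argument-less calls) are assumptions.
+
+```typescript
+export type ParsedArgs =
+  | { ok: true; value: Record<string, unknown> }
+  | { ok: false; error: string };
+
+/** Parse model-supplied tool arguments without throwing. */
+export function parseToolArguments(raw: string | undefined): ParsedArgs {
+  // Assumption: an absent or empty argument string means "no arguments".
+  if (raw === undefined || raw.trim() === '') return { ok: true, value: {} };
+  try {
+    const value: unknown = JSON.parse(raw);
+    if (typeof value !== 'object' || value === null || Array.isArray(value)) {
+      return { ok: false, error: 'Tool arguments must be a JSON object' };
+    }
+    return { ok: true, value: value as Record<string, unknown> };
+  } catch (error) {
+    const reason = error instanceof Error ? error.message : String(error);
+    return { ok: false, error: `Invalid tool arguments JSON: ${reason}` };
+  }
+}
+```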
+
+---
+
+*Concerns audit: 2026-02-09*
@@ -0,0 +1,294 @@
+# Coding Conventions
+
+**Analysis Date:** 2026-02-09
+
+## Naming Patterns
+
+**Files:**
+- Source files: kebab-case with `.ts` extension, e.g. `vector-store.ts`, `web-fetch.ts`, `memory-read.ts`
+- React/Ink components: PascalCase with `.tsx` extension (in `src/frontends/tui/components/`)
+- Test files: co-located with source, suffixed `.test.ts` — `agent.test.ts` beside `agent.ts`
+- Type-only files: `types.ts` in each module directory
+- Index barrel files: `index.ts` in each module directory
+
+**Classes:**
+- PascalCase: `NativeAgent`, `ModelRouter`, `ToolRegistry`, `ChannelRegistry`, `SessionStore`
+- Implementation classes use `implements` with interface: `class AnthropicClient implements ModelClient`
+
+**Interfaces/Types:**
+- PascalCase: `ChatRequest`, `ChatResponse`, `ToolResult`, `InboundMessage`
+- Config interfaces suffixed with `Config`: `NativeAgentConfig`, `OrchestratorConfig`, `AnthropicClientConfig`
+- Union types use `type` keyword: `type ConversationMessage = Message | ToolMessage`
+- Zod-inferred types: `type Config = z.infer<typeof configSchema>` in `src/config/schema.ts`
+
+**Functions:**
+- camelCase: `loadConfig()`, `expandEnvVars()`, `createClientFromConfig()`
+- Factory functions prefixed with `create`: `createMemoryReadTool()`, `createWebSearchTool()`, `createBrowserTools()`
+- Boolean checks prefixed with `is`/`has`/`should`: `isSupportedImage()`, `hasImages()`, `shouldCompact()`
+
+**Variables/Fields:**
+- camelCase for public: `modelClient`, `systemPrompt`, `toolRegistry`
+- Underscore prefix for private fields: `_config`, `_dirtyNamespaces`, `_totalUsage`, `_callCount`
+- Constants: UPPER_SNAKE_CASE: `FETCH_TIMEOUT_MS`, `MAX_RESULTS`, `DEFAULT_RETRY_CONFIG`, `MODEL_COSTS_PER_MILLION`
+
+**Tool Names:**
+- Dot-separated namespacing: `shell.exec`, `file.read`, `file.write`, `memory.read`, `web.fetch`
+- MCP tools use colon separator: `mcp:filesystem:read_file`
+- Tool groups prefixed with `group:`: `group:fs`, `group:runtime`, `group:web`, `group:memory`
+
+## Code Style
+
+**Formatting:**
+- No Prettier or formatter config detected, so formatting is maintained by convention only
+- 2-space indentation throughout
+- Single quotes for strings
+- Trailing commas in multiline structures
+- Semicolons always used
+
+**Linting:**
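+- ESLint v9+ is listed as a devDependency, but no config file was found. ESLint v9 expects a flat config file (`eslint.config.js` or similar) and reports an error when none exists, so `pnpm lint` may fail until one is added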
+- Run with: `pnpm lint` (runs `eslint src/`)
+
+**TypeScript Strictness:**
+- `strict: true` in `tsconfig.json`
+- Target: ES2022, Module: NodeNext, ModuleResolution: NodeNext
+- JSX: react-jsx (for Ink TUI components)
+- All declarations, declaration maps, and source maps enabled
+
+## Import Organization
+
+**Order:**
+1. Node.js stdlib: `import { readFileSync } from 'fs';`
+2. Third-party packages: `import Anthropic from '@anthropic-ai/sdk';`
+3. Local imports: `import { configSchema } from './schema.js';`
+
+**Path Style:**
+- Always use `.js` extensions for local imports (NodeNext resolution):
+  ```typescript
+  import { NativeAgent } from './agent.js';
+  import type { Config } from '../config/schema.js';
+  ```
+
+**Type-only Imports:**
+- Use `import type` for type-only imports:
+  ```typescript
+  import type { ModelClient, ChatResponse } from '../../models/types.js';
+  import type { Tool, ToolResult } from '../../tools/types.js';
+  ```
+
+**Barrel Files:**
+- Every module has an `index.ts` that re-exports public API
+- Use explicit named exports, not `export *`:
+  ```typescript
+  // src/tools/index.ts
+  export type { Tool, ToolCall, ToolResult } from './types.js';
+  export { ToolRegistry } from './registry.js';
+  export { ToolExecutor } from './executor.js';
+  ```
+- Types re-exported with `export type`:
+  ```typescript
+  export type { AnthropicToolDef, OpenAIToolDef } from './registry.js';
+  ```
+
+## Tool Patterns
+
+Flynn tools follow three distinct patterns. Use the appropriate one:
+
+**Static Tool (no dependencies):**
+```typescript
+// src/tools/builtin/shell.ts
+import type { Tool, ToolResult } from '../types.js';
+
+interface ShellExecArgs {
+  command: string;
+  cwd?: string;
+}
+
+export const shellExecTool: Tool = {
+  name: 'shell.exec',
+  description: 'Execute a shell command...',
+  inputSchema: {
+    type: 'object',
+    properties: {
+      command: { type: 'string', description: 'The shell command to execute' },
+    },
+    required: ['command'],
+  },
+  execute: async (rawArgs: unknown): Promise<ToolResult> => {
+    const args = rawArgs as ShellExecArgs;
+    // ...implementation elided; must return a ToolResult
+  },
+};
+```
+
+**Factory Tool (single tool needing dependency injection):**
+```typescript
+// src/tools/builtin/memory-read.ts
+export function createMemoryReadTool(store: MemoryStore): Tool {
+  return {
+    name: 'memory.read',
+    description: '...',
+    inputSchema: { ... },
+    execute: async (rawArgs: unknown): Promise<ToolResult> => {
+      const args = rawArgs as MemoryReadArgs;
+      // uses `store` from closure
+    },
+  };
+}
+```
+
+**Multi-Factory (related tool set):**
+```typescript
+// src/tools/builtin/index.ts
+export function createMemoryTools(store: MemoryStore, hybridSearch?: HybridSearch): Tool[] {
+  return [
+    createMemoryReadTool(store),
+    createMemoryWriteTool(store),
+    createMemorySearchTool(store, hybridSearch),
+  ];
+}
+```
+
+**Registration chain:**
+1. Tool file (e.g., `src/tools/builtin/shell.ts`)
+2. `src/tools/builtin/index.ts` (barrel exports + `allBuiltinTools` array)
+3. `src/tools/index.ts` (barrel re-exports)
+4. Registered in `src/daemon/index.ts` via `ToolRegistry.register()`
+
+## Error Handling
+
+**Pattern 1: Return ToolResult with error field (tools):**
+```typescript
+try {
+  const content = store.read(args.namespace);
+  return { success: true, output: content };
+} catch (error) {
+  return {
+    success: false,
+    output: '',
+    error: error instanceof Error ? error.message : String(error),
+  };
+}
+```
+
+**Pattern 2: Throw with descriptive message (config/setup):**
+```typescript
+if (envValue === undefined) {
+  throw new Error(`Environment variable ${envVar} is not set`);
+}
+```
+
+**Pattern 3: Throw for duplicate registration (registries):**
+```typescript
+if (this.tools.has(tool.name)) {
+  throw new Error(`Tool '${tool.name}' is already registered`);
+}
+```
+
+**Pattern 4: Yield error events in streams:**
+```typescript
+try {
+  for await (const event of stream) {
+    yield { type: 'content', content: event.delta.text };
+  }
+} catch (error) {
+  yield {
+    type: 'error',
+    error: error instanceof Error ? error : new Error(String(error)),
+  };
+}
+```
+
+**Pattern 5: Fire-and-forget with error logging (channels):**
+```typescript
+this.messageHandler(msg, reply).catch((err: unknown) => {
+  console.error(`Error handling message from '${msg.channel}':`, err);
+});
+```
+
+**Pattern 6: Promise.allSettled for non-critical failures:**
+```typescript
+const results = await Promise.allSettled(adapters.map((a) => a.connect()));
+for (const [i, result] of results.entries()) {
+  if (result.status === 'rejected') {
+    console.error(`Failed to start channel '${adapters[i].name}':`, result.reason);
+  }
+}
+```
+
+**instanceof check pattern:** Always use `error instanceof Error ? error.message : String(error)` for unknown errors.
+
+## Logging
+
+**Framework:** `console` (no logging library)
+
+**Patterns:**
+- `console.log()` for informational messages: startup, config loaded
+- `console.warn()` for non-fatal issues: missing handler, unknown channel
+- `console.error()` for failures: failed connections, adapter errors
+- No structured logging — messages are plain strings with context
+
+## Comments
+
+**JSDoc-style comments on interfaces:**
+```typescript
+/** Media attachment received from or sent to a channel. */
+export interface Attachment {
+  /** MIME type (e.g. "image/jpeg", "audio/ogg", "application/pdf"). */
+  mimeType: string;
+  /** Base64-encoded data (preferred for model APIs). */
+  data?: string;
+}
+```
+
+**Section dividers in larger files:**
+```typescript
+// ── Public types ──────────────────────────────────────────────────────
+// ── Helpers ─────────────────────────────────────────────────────────
+```
+
+**Inline comments for non-obvious logic:**
+```typescript
+// Policy check (defense in depth — tools should also be filtered at listing time)
+// Fire and forget — errors are logged, not propagated
+```
+
+**When to comment:**
+- Always add JSDoc on exported interfaces and their fields
+- Use section dividers in files > 100 lines
+- Add inline comments for non-obvious behavior, workarounds, or design decisions
+- No comments on self-explanatory code
+
+## Function Design
+
+**Size:** Functions generally stay under 50 lines. Complex logic is extracted into private methods (e.g., `toolLoop()`, `singleTurn()` in `NativeAgent`).
+
+**Parameters:**
+- Config objects for constructors with >2 params: `NativeAgentConfig`, `OrchestratorConfig`
+- Raw args typed as `unknown`, then cast internally: `const args = rawArgs as ShellExecArgs`
+- Optional params use `?` with defaults via `??`: `config.maxTokens ?? 4096`
+
+**Return Values:**
+- Async functions return `Promise<T>`
+- Tools return `Promise<ToolResult>` with `{ success, output, error? }`
+- Streams use `AsyncIterable<ChatStreamEvent>`
+- Model clients always return `ChatResponse` with required `content`, `stopReason`, `usage`
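+
+Taken together, these conventions look roughly like the sketch below. `ExampleClient`, `ExampleClientConfig`, and `EchoArgs` are hypothetical names. They illustrate the config-object constructor, `_`-prefixed private fields, `??` defaults (`4096` mirrors the `config.maxTokens ?? 4096` example above), and the `unknown`-then-cast argument handling with a `{ success, output, error? }` result.
+
+```typescript
+/** Config object instead of a long positional parameter list. */
+export interface ExampleClientConfig {
+  model: string;
+  maxTokens?: number;
+  temperature?: number;
+}
+
+interface EchoArgs {
+  text: string;
+}
+
+export class ExampleClient {
+  private readonly _model: string;
+  private readonly _maxTokens: number;
+  private readonly _temperature: number;
+
+  constructor(config: ExampleClientConfig) {
+    this._model = config.model;
+    // `??` only falls back on null/undefined, so an explicit 0 is preserved.
+    this._maxTokens = config.maxTokens ?? 4096;
+    this._temperature = config.temperature ?? 1;
+  }
+
+  describe(): string {
+    return `${this._model} (maxTokens=${this._maxTokens}, temperature=${this._temperature})`;
+  }
+
+  /** Raw args arrive as `unknown` and are cast, then checked, internally. */
+  async echo(rawArgs: unknown): Promise<{ success: boolean; output: string; error?: string }> {
+    const args = rawArgs as EchoArgs | null;
+    if (typeof args?.text !== 'string') {
+      return { success: false, output: '', error: 'text must be a string' };
+    }
+    return { success: true, output: args.text };
+  }
+}
+```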
+
+## Module Design
+
+**Exports:** Named exports only — no default exports anywhere in the codebase.
+
+**Barrel Files:** Every module directory has `index.ts`. Import from barrel in consuming code:
+```typescript
+import { ToolRegistry, ToolExecutor, ToolPolicy } from '../../tools/index.js';
+```
+
+**Config validation:** Use Zod schemas in `src/config/schema.ts`. All config types are inferred from schemas:
+```typescript
+export const configSchema = z.object({ ... });
+export type Config = z.infer<typeof configSchema>;
+```
+
+---
+
+*Convention analysis: 2026-02-09*
@@ -0,0 +1,323 @@
+# External Integrations
+
+**Analysis Date:** 2026-02-09
+
+## AI Model Providers
+
+Flynn supports 10 model providers via a unified `ModelClient` interface (`src/models/types.ts`). Each provider implements `chat()` and optionally `chatStream()`. The `ModelRouter` (`src/models/router.ts`) manages tier-based routing (fast/default/complex/local) with fallback chains.
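+
+Tier routing with a fallback chain can be sketched as follows. `MinimalModelClient` is a deliberately reduced stand-in for `ModelClient`, which has richer request and response types. The routing logic shows the general shape (tier client first, then each fallback in order). It is not a copy of `src/models/router.ts`.
+
+```typescript
+type Tier = 'fast' | 'default' | 'complex' | 'local';
+
+/** Reduced stand-in for the real ModelClient interface. */
+interface MinimalModelClient {
+  name: string;
+  chat(prompt: string): Promise<string>;
+}
+
+export class TierRouter {
+  private readonly tiers: Map<Tier, MinimalModelClient>;
+  private readonly fallbacks: MinimalModelClient[];
+
+  constructor(tiers: Map<Tier, MinimalModelClient>, fallbacks: MinimalModelClient[] = []) {
+    this.tiers = tiers;
+    this.fallbacks = fallbacks;
+  }
+
+  /** Try the client for `tier`, then each fallback in order; reject only if all fail. */
+  async chat(tier: Tier, prompt: string): Promise<{ client: string; text: string }> {
+    const primary = this.tiers.get(tier);
+    const candidates = primary ? [primary, ...this.fallbacks] : [...this.fallbacks];
+    const errors: string[] = [];
+    for (const client of candidates) {
+      try {
+        return { client: client.name, text: await client.chat(prompt) };
+      } catch (error) {
+        errors.push(`${client.name}: ${error instanceof Error ? error.message : String(error)}`);
+      }
+    }
+    throw new Error(`All model clients failed for tier '${tier}': ${errors.join('; ') || 'no clients configured'}`);
+  }
+}
+```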
+
+**Anthropic:**
+- SDK: `@anthropic-ai/sdk` (`src/models/anthropic.ts`)
+- Auth: `ANTHROPIC_API_KEY` env var or `api_key` in config
+- Features: Streaming, tool use, extended thinking mode, multimodal (images)
+- Extended thinking: `thinking: { type: 'enabled', budget_tokens: 4096 }` on the request
+
+**OpenAI:**
+- SDK: `openai` (`src/models/openai.ts`)
+- Auth: `OPENAI_API_KEY` env var or `api_key` in config
+- Features: Tool use, multimodal (images via data URIs or URLs)
+- Also powers: OpenRouter, ZhipuAI, xAI via `baseURL` override
+
+**Google Gemini:**
+- SDK: `@google/generative-ai` (`src/models/gemini.ts`)
+- Auth: `GOOGLE_API_KEY` env var or `api_key` in config
+- Features: Streaming, tool use, extended thinking, multimodal
+
+**AWS Bedrock:**
+- SDK: `@aws-sdk/client-bedrock-runtime` (`src/models/bedrock.ts`)
+- Auth: IAM credentials or explicit `accessKeyId`/`secretAccessKey` in config; region from the `AWS_REGION` env var
+- Features: Streaming (ConverseStream), tool use, multimodal
+- Models: Meta Llama, Amazon Titan (cost-tracked in `src/models/costs.ts`)
+
+**GitHub Models (Copilot):**
+- SDK: `openai` (OpenAI-compatible API) (`src/models/github.ts`)
+- Auth: `GITHUB_TOKEN` env var or OAuth device flow (`src/auth/github.ts`)
+- Endpoint: `https://api.githubcopilot.com`
+- Auto-fallback: When an Anthropic tier fails, Flynn automatically tries the same model via GitHub Models before the global fallback chain (`src/daemon/index.ts` `createAutoFallbackClient()`)
+- OAuth device flow: Uses client ID `Ov23li8tweQw6odWQebz`, stores token at `~/.config/flynn/auth.json`
+
+**OpenRouter:**
+- SDK: `openai` with `baseURL: https://openrouter.ai/api/v1` (`src/daemon/index.ts`)
+- Auth: `OPENROUTER_API_KEY` env var or `api_key` in config
+
+**ZhipuAI:**
+- SDK: `openai` with `baseURL: https://api.z.ai/api/paas/v4` (`src/daemon/index.ts`)
+- Auth: `ZHIPUAI_API_KEY` env var or `api_key` in config
+
+**xAI (Grok):**
+- SDK: `openai` with `baseURL: https://api.x.ai/v1` (`src/daemon/index.ts`)
+- Auth: `XAI_API_KEY` env var or `api_key` in config
+
+**Ollama (Local):**
+- SDK: `ollama` (`src/models/local/ollama.ts`)
+- Auth: None (local server)
+- Endpoint: Configurable `host` (default: `http://localhost:11434`)
+- Config: `num_gpu` option for GPU layer control
+
+**llama.cpp (Local):**
+- SDK: Raw `fetch` HTTP calls (`src/models/local/llamacpp.ts`)
+- Auth: Optional `auth_token` header
+- Endpoint: Configurable (default: `http://localhost:8080`)
+
+## Embedding Providers
+
+Embedding providers (`src/memory/embeddings.ts`) power the hybrid vector + keyword search system. Factory function: `createEmbeddingProvider()`.
+
+**OpenAI Embeddings:**
+- SDK: `openai` (lazy import)
+- Auth: `OPENAI_API_KEY` or config `api_key`
+- Default model: `text-embedding-3-small`, default dims: 1536
+
+**Gemini Embeddings:**
+- SDK: `@google/generative-ai` (lazy import)
+- Auth: `GOOGLE_API_KEY` or config `api_key`
+- Uses `batchEmbedContents` for efficiency, default dims: 768
+
+**Ollama Embeddings:**
+- SDK: `ollama` (lazy import)
+- Auth: None (local)
+- Configurable host endpoint, default dims: 768
+
+**LlamaCpp Embeddings:**
+- SDK: Raw `fetch` to `/embedding` endpoint
+- Auth: None
+- Default endpoint: `http://localhost:8080`, default dims: 768
+
+**Voyage AI Embeddings:**
+- SDK: `openai` (OpenAI-compatible API, lazy import)
+- Auth: `VOYAGE_API_KEY` env var or config `api_key`
+- Endpoint: `https://api.voyageai.com/v1`, default dims: 1024
+
+## Data Storage
+
+**Session Database (SQLite):**
+- Library: `better-sqlite3` (`src/session/store.ts`)
+- Location: `{dataDir}/sessions.db`
+- Schema: `messages` table with `id`, `session_id`, `role`, `content`, `created_at`
+- TTL-based pruning: Configurable via `sessions.ttl` (default: 30 days), hourly cleanup
+
+**Vector Database (SQLite):**
+- Library: `better-sqlite3` (`src/memory/vector-store.ts`)
+- Location: `{dataDir}/vectors.db`
+- Stores embedding chunks as `Float32Array` BLOBs
+- Content hashing for deduplication
+- Background indexer runs every 30 seconds
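+
+Storing embeddings as `Float32Array` BLOBs is a round trip between the typed array and a `Buffer`, and content-hash deduplication can use a digest of the chunk text. The sketch below shows both with Node built-ins only. The function names and the choice of SHA-256 are assumptions, and the actual table layout in `src/memory/vector-store.ts` is not reproduced.
+
+```typescript
+import { createHash } from 'node:crypto';
+
+/** Float32Array -> Buffer view over the same memory, suitable for a BLOB column. */
+export function embeddingToBlob(vector: Float32Array): Buffer {
+  return Buffer.from(vector.buffer, vector.byteOffset, vector.byteLength);
+}
+
+/**
+ * BLOB -> Float32Array. Copies first, because a Buffer's byteOffset need not be
+ * 4-byte aligned. Bytes are in native order, so BLOBs are not portable across endianness.
+ */
+export function blobToEmbedding(blob: Buffer): Float32Array {
+  const copy = new Uint8Array(blob); // fresh, aligned ArrayBuffer
+  return new Float32Array(copy.buffer, 0, copy.byteLength / Float32Array.BYTES_PER_ELEMENT);
+}
+
+/** Stable hash of chunk text; a chunk whose hash is already stored can be skipped. */
+export function contentHash(text: string): string {
+  return createHash('sha256').update(text, 'utf8').digest('hex');
+}
+```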
+
+**Memory Store (Filesystem):**
+- Location: `{dataDir}/memory/` (`src/memory/store.ts`)
+- Format: Markdown files organized by namespace
+- Layout: `global.md`, `user.md`, `sessions/{id}.md`
+- Hybrid search: Keyword + vector (configurable weight via `hybrid_weight`, default 0.7)
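+
+A weighted blend is the usual way to combine the two scores. The sketch assumes both scores are already normalized to [0, 1] and that `hybrid_weight` weights the vector score, with the remainder on the keyword score. The source gives only the 0.7 default, so which side the weight applies to is an assumption.
+
+```typescript
+export interface ScoredChunk {
+  id: string;
+  vectorScore: number; // assumed normalized to [0, 1]
+  keywordScore: number; // assumed normalized to [0, 1]
+}
+
+/** Blend as w * vector + (1 - w) * keyword and sort best-first. */
+export function rankHybrid(chunks: ScoredChunk[], hybridWeight = 0.7): { id: string; score: number }[] {
+  if (hybridWeight < 0 || hybridWeight > 1) throw new RangeError('hybridWeight must be in [0, 1]');
+  return chunks
+    .map((c) => ({ id: c.id, score: hybridWeight * c.vectorScore + (1 - hybridWeight) * c.keywordScore }))
+    .sort((a, b) => b.score - a.score);
+}
+```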
+
+**File Storage:**
+- Local filesystem only — no cloud object storage
+
+**Caching:**
+- In-memory response cache for web fetch tool (5-minute TTL) (`src/tools/builtin/web-fetch.ts`)
+- No external cache service (Redis, etc.)
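+
+A 5-minute in-memory response cache can be a `Map` of entries stamped with an expiry time. The class below is a hypothetical sketch; the real cache in `web-fetch.ts` may differ, for example in how it bounds its size. The clock is injectable so expiry is easy to test.
+
+```typescript
+export class TtlCache<V> {
+  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
+  private readonly ttlMs: number;
+  private readonly now: () => number;
+
+  constructor(ttlMs = 5 * 60 * 1000, now: () => number = Date.now) {
+    this.ttlMs = ttlMs;
+    this.now = now;
+  }
+
+  get(key: string): V | undefined {
+    const entry = this.entries.get(key);
+    if (!entry) return undefined;
+    if (this.now() >= entry.expiresAt) {
+      this.entries.delete(key); // lazily evict expired entries on read
+      return undefined;
+    }
+    return entry.value;
+  }
+
+  set(key: string, value: V): void {
+    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
+  }
+}
+```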
+
+## Channel Adapters (Messaging Platforms)
+
+All adapters implement `ChannelAdapter` interface (`src/channels/types.ts`): `connect()`, `disconnect()`, `send()`, `onMessage()`.
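+
+The contract can be illustrated with a toy in-memory adapter. Only the four method names come from the source. The message type and the `(msg, reply)` handler signature (modeled on the fire-and-forget example in the conventions) are simplified assumptions; the real types in `src/channels/types.ts` carry more fields, such as attachments.
+
+```typescript
+/** Simplified stand-ins for the real channel types. */
+interface InboundMessage {
+  channel: string;
+  senderId: string;
+  text: string;
+}
+type Reply = (text: string) => Promise<void>;
+type MessageHandler = (msg: InboundMessage, reply: Reply) => Promise<void>;
+
+interface ChannelAdapter {
+  readonly name: string;
+  connect(): Promise<void>;
+  disconnect(): Promise<void>;
+  send(peerId: string, text: string): Promise<void>;
+  onMessage(handler: MessageHandler): void;
+}
+
+/** Toy adapter: records outbound messages and lets tests inject inbound ones. */
+export class MemoryChannelAdapter implements ChannelAdapter {
+  readonly name = 'memory';
+  readonly sent: { peerId: string; text: string }[] = [];
+  private handler: MessageHandler | undefined;
+  private connected = false;
+
+  async connect(): Promise<void> {
+    this.connected = true;
+  }
+
+  async disconnect(): Promise<void> {
+    this.connected = false;
+  }
+
+  async send(peerId: string, text: string): Promise<void> {
+    if (!this.connected) throw new Error(`Channel '${this.name}' is not connected`);
+    this.sent.push({ peerId, text });
+  }
+
+  onMessage(handler: MessageHandler): void {
+    this.handler = handler;
+  }
+
+  /** Simulate a platform event; replies go back to the sender via send(). */
+  async receive(senderId: string, text: string): Promise<void> {
+    if (!this.handler) return;
+    const reply: Reply = (replyText) => this.send(senderId, replyText);
+    await this.handler({ channel: this.name, senderId, text }, reply);
+  }
+}
+```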
+
+**Telegram:**
+- SDK: `grammy` (`src/channels/telegram/`)
+- Auth: Bot token via `telegram.bot_token` config
+- Features: Long polling, chat ID allowlist, mention requirement, pairing codes, image/audio attachments
+
+**Discord:**
+- SDK: `discord.js` (`src/channels/discord/`)
+- Auth: Bot token via `discord.bot_token` config
+- Features: Guild/channel allowlists, mention requirement, pairing codes
+
+**Slack:**
+- SDK: `@slack/bolt` (`src/channels/slack/`)
+- Auth: `bot_token`, `app_token`, `signing_secret` in config
+- Features: Socket mode, channel allowlists, mention requirement, pairing codes
+
+**WhatsApp:**
+- SDK: `whatsapp-web.js` (`src/channels/whatsapp/`)
+- Auth: QR code scanning (web client emulation)
+- Features: Number/group allowlists, mention requirement, custom data directory, pairing codes
+
+**WebChat:**
+- Implementation: Gateway WebSocket bridge (`src/channels/webchat/`)
+- Auth: Gateway token or Tailscale identity
+- UI: Vanilla JS dashboard at `src/gateway/ui/` (HTML + CSS + JS, no framework)
+
+## Authentication & Identity
+
+**GitHub OAuth (Device Flow):**
+- Implementation: `src/auth/github.ts`
+- Client ID: `Ov23li8tweQw6odWQebz` (GitHub Copilot)
+- Flow: Device code → User authorization → Token polling
+- Storage: `~/.config/flynn/auth.json` (0600 permissions)
+- Priority: `GITHUB_TOKEN` env → stored OAuth token → `null`
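+
+The lookup order can be sketched as a small function, with reading `auth.json` reduced to an injected callback. `resolveGitHubToken` and `saveTokenFile` are hypothetical names. The sketch also shows one way to get the 0600 mode: `writeFileSync`'s `mode` option only applies when the file is created, so an explicit `chmodSync` covers an existing file.
+
+```typescript
+import { chmodSync, mkdirSync, writeFileSync } from 'node:fs';
+import { dirname } from 'node:path';
+
+/** GITHUB_TOKEN env var first, then the stored OAuth token, else null. */
+export function resolveGitHubToken(
+  env: Record<string, string | undefined>,
+  readStoredToken: () => string | null,
+): string | null {
+  const fromEnv = env.GITHUB_TOKEN?.trim();
+  if (fromEnv) return fromEnv;
+  try {
+    return readStoredToken() || null;
+  } catch {
+    return null; // a missing or unreadable auth.json behaves like "no token"
+  }
+}
+
+/** Write token JSON readable only by the owner (mode 0600). */
+export function saveTokenFile(path: string, data: unknown): void {
+  mkdirSync(dirname(path), { recursive: true, mode: 0o700 });
+  writeFileSync(path, JSON.stringify(data, null, 2), { mode: 0o600 });
+  chmodSync(path, 0o600); // `mode` above is ignored if the file already existed
+}
+```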
+
+**Gateway Auth:**
+- Static bearer token (`server.token` in config)
+- Tailscale identity header trust (`server.tailscale_identity`)
+- HTTP auth optional (`server.auth_http`)
+- Gateway lock: Single-client WebSocket mode (`server.lock`)
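+
+For the static bearer token, a constant-time comparison avoids leaking how much of a guessed token matched. The source does not say how the gateway compares tokens, so `isAuthorized` is a hypothetical sketch using Node's `crypto.timingSafeEqual`. Both sides are hashed first because `timingSafeEqual` requires equal-length inputs. Refusing all requests when no token is configured is an assumed policy.
+
+```typescript
+import { createHash, timingSafeEqual } from 'node:crypto';
+
+/** Check an `Authorization: Bearer <token>` header against the configured `server.token`. */
+export function isAuthorized(authorizationHeader: string | undefined, expectedToken: string): boolean {
+  if (!expectedToken) return false; // assumed policy: no configured token, no access
+  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader ?? '');
+  if (!match) return false;
+  // Hashing yields two 32-byte digests, so the lengths always match.
+  const given = createHash('sha256').update(match[1] ?? '').digest();
+  const expected = createHash('sha256').update(expectedToken).digest();
+  return timingSafeEqual(given, expected);
+}
+```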
+
+**DM Pairing Codes:**
+- Implementation: `src/channels/pairing.ts`
+- Purpose: Authenticate unknown senders via one-time codes
+- Config: `pairing.enabled`, `pairing.code_ttl` (default 5m), `pairing.code_length` (default 6)
+- Gateway handlers for code generation/verification
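+
+One-time pairing codes with a TTL can be sketched as below. The defaults mirror `pairing.code_ttl` (5 minutes) and `pairing.code_length` (6). The digits-only alphabet, the in-memory map, and the class and method names are assumptions rather than the contents of `src/channels/pairing.ts`.
+
+```typescript
+import { randomInt } from 'node:crypto';
+
+export class PairingCodes {
+  private readonly pending = new Map<string, { senderId: string; expiresAt: number }>();
+  private readonly ttlMs: number;
+  private readonly length: number;
+  private readonly now: () => number;
+
+  constructor(ttlMs = 5 * 60 * 1000, length = 6, now: () => number = Date.now) {
+    this.ttlMs = ttlMs;
+    this.length = length;
+    this.now = now;
+  }
+
+  /** Issue a numeric code for `senderId` using a CSPRNG, avoiding collisions with pending codes. */
+  issue(senderId: string): string {
+    let code: string;
+    do {
+      code = '';
+      for (let i = 0; i < this.length; i++) code += String(randomInt(10));
+    } while (this.pending.has(code));
+    this.pending.set(code, { senderId, expiresAt: this.now() + this.ttlMs });
+    return code;
+  }
+
+  /** Redeem a code once; returns the sender it was issued to, or null if unknown or expired. */
+  redeem(code: string): string | null {
+    const entry = this.pending.get(code);
+    if (!entry) return null;
+    this.pending.delete(code); // one-time use, even when expired
+    return this.now() < entry.expiresAt ? entry.senderId : null;
+  }
+}
+```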
+
+**Gmail OAuth2:**
+- SDK: `googleapis` (`src/automation/gmail.ts`)
+- Credentials: `~/.config/flynn/gmail-credentials.json`
+- Token: `~/.config/flynn/gmail-token.json`
+- Setup: `flynn gmail-auth` CLI command
+
+## Automation
+
+**Cron Scheduler:**
+- Library: `croner` (`src/automation/cron.ts`)
+- Config: `automation.cron[]` — each job has `name`, `schedule`, `message`, `output.channel`, `output.peer`
+- Implements `ChannelAdapter` to inject cron-triggered messages into the channel registry
+- Features: Enable/disable per job, timezone support, runtime management tools
+
+**Webhooks:**
+- Implementation: `src/automation/webhooks.ts`
+- Auth: HMAC-SHA256 signature verification (`X-Webhook-Signature` header)
+- Templates: `{{body}}` and `{{json.field}}` placeholders
+- Route: `POST /webhooks/{name}` on the gateway HTTP server
+- Config: `automation.webhooks[]` with `name`, `secret`, `message`, `output`
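+
+Verification and templating could look like the sketch below. The source specifies HMAC-SHA256 and the `X-Webhook-Signature` header but not the signature encoding. A hex digest with an optional `sha256=` prefix (GitHub's style) is therefore an assumption, as are the function names and rendering missing fields as empty strings. The HMAC has to be computed over the raw request body, before any JSON parsing.
+
+```typescript
+import { createHmac, timingSafeEqual } from 'node:crypto';
+
+/** Verify a hex HMAC-SHA256 of the raw body; accepts an optional "sha256=" prefix. */
+export function verifyWebhookSignature(rawBody: Buffer | string, secret: string, header: string | undefined): boolean {
+  if (!header) return false;
+  const provided = header.startsWith('sha256=') ? header.slice('sha256='.length) : header;
+  const expected = createHmac('sha256', secret).update(rawBody).digest('hex');
+  const a = Buffer.from(provided, 'utf8');
+  const b = Buffer.from(expected, 'utf8');
+  // timingSafeEqual throws on a length mismatch, so compare lengths first.
+  return a.length === b.length && timingSafeEqual(a, b);
+}
+
+/** Render `{{body}}` (raw text) and `{{json.a.b}}` (dotted path into the parsed body). */
+export function renderTemplate(template: string, rawBody: string): string {
+  let json: unknown;
+  try {
+    json = JSON.parse(rawBody);
+  } catch {
+    json = undefined; // non-JSON bodies can still use {{body}}
+  }
+  return template.replace(/\{\{\s*(body|json(?:\.[\w-]+)+)\s*\}\}/g, (_match, expr: string) => {
+    if (expr === 'body') return rawBody;
+    let value: unknown = json;
+    for (const key of expr.split('.').slice(1)) {
+      value = value !== null && typeof value === 'object' ? (value as Record<string, unknown>)[key] : undefined;
+    }
+    if (value === undefined) return ''; // assumption: missing fields render as empty
+    return typeof value === 'string' ? value : JSON.stringify(value);
+  });
+}
+```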
+
+**Gmail Watcher:**
+- SDK: `googleapis` (`src/automation/gmail.ts`)
+- Modes: Pub/Sub push notifications or polling fallback
+- Pub/Sub topic: `projects/flynn-agent/topics/gmail-push`
+- Watch renewal: Every 6 days (Google watch expires at ~7 days)
+- Config: `automation.gmail` with `watch_labels`, `poll_interval`, `history_start`
+- Route: `POST /gmail/push` on gateway for Pub/Sub push
+
+**Heartbeat Monitor:**
+- Implementation: `src/automation/heartbeat.ts`
+- Checks: gateway, model, channels, memory, disk
+- Config: `automation.heartbeat` with `interval`, `checks`, `failure_threshold`, `disk_threshold_mb`
+- Notification: Sends to configured channel/peer on failures
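+
+One reading of `failure_threshold` is the number of consecutive failed runs of a check before a notification is sent. The tracker below sketches that interpretation; the interpretation and the names are assumptions. It notifies once when a check crosses the threshold and once when it recovers, so a persistent outage does not re-notify on every interval.
+
+```typescript
+export type CheckName = 'gateway' | 'model' | 'channels' | 'memory' | 'disk';
+export type HeartbeatEvent = { kind: 'failing' | 'recovered'; check: CheckName; failures: number };
+
+export class FailureTracker {
+  private readonly consecutive = new Map<CheckName, number>();
+  private readonly alerted = new Set<CheckName>();
+  private readonly threshold: number;
+
+  constructor(threshold: number) {
+    this.threshold = threshold;
+  }
+
+  /** Record one check result; returns an event when a notification should be sent. */
+  record(check: CheckName, ok: boolean): HeartbeatEvent | null {
+    if (ok) {
+      const failures = this.consecutive.get(check) ?? 0;
+      this.consecutive.set(check, 0);
+      // Only report recovery if a failure notification was sent earlier.
+      return this.alerted.delete(check) ? { kind: 'recovered', check, failures } : null;
+    }
+    const failures = (this.consecutive.get(check) ?? 0) + 1;
+    this.consecutive.set(check, failures);
+    if (failures >= this.threshold && !this.alerted.has(check)) {
+      this.alerted.add(check);
+      return { kind: 'failing', check, failures };
+    }
+    return null;
+  }
+}
+```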
+
+## Web & Content Tools
+
+**Web Search (Brave / SearXNG):**
+- Implementation: `src/tools/builtin/web-search.ts`
+- Brave Search API: `https://api.search.brave.com/res/v1/web/search`
+  - Auth: `X-Subscription-Token` header via `web_search.api_key`
+- SearXNG: Self-hosted instance via `web_search.endpoint`
+  - Auth: None (private instance)
+- Config: `web_search.provider` (`brave` or `searxng`), `web_search.max_results`
+
+**Web Fetch (Readability):**
+- Libraries: `linkedom`, `@mozilla/readability`, `turndown` (`src/tools/builtin/web-fetch.ts`)
+- Features: HTML → Markdown conversion, article extraction, response caching (5min TTL)
+- Truncation: 50,000 character max
+
+**Browser Automation:**
+- Library: `puppeteer-core` (`src/tools/builtin/browser/`)
+- Config: `browser.executable_path` or `browser.ws_endpoint`
+- Features: Headless browsing, page management, screenshots
+- Limits: `browser.max_pages` (default 5), `browser.default_timeout` (default 30s)
+
+## Audio Transcription
+
+**Whisper-Compatible API:**
+- Implementation: `src/models/media.ts`
+- Endpoint: Configurable via `audio.transcription_endpoint`
+- Auth: `audio.transcription_api_key` (Bearer token)
+- Model: `audio.transcription_model` (default: `whisper-1`)
+- Supported formats: OGG, MP3, WAV, WebM, MP4, M4A
+- Integration: Auto-transcribes audio attachments from channels before model processing
+
+## MCP (Model Context Protocol)
+
+**MCP Client:**
+- SDK: `@modelcontextprotocol/sdk` (`src/mcp/client.ts`)
+- Transport: stdio (spawns external processes)
+- Config: `mcp.servers[]` with `name`, `command`, `args`, `env`, `cwd`
+- Bridge: MCP tools auto-registered in Flynn's tool registry (`src/mcp/bridge.ts`)
+- Management: `McpManager` starts/stops all configured servers (`src/mcp/manager.ts`)
+
+## Docker Sandbox
+
+**Per-Session Containers:**
+- Implementation: `src/sandbox/manager.ts`, `src/sandbox/docker.ts`
+- Config: `sandbox.image` (default: `node:22-slim`), `sandbox.network` (default: `none`), `sandbox.memory_limit`, `sandbox.cpu_limit`
+- Features: Lazily created per session, replaces `shell.exec` and `process.start` tools with sandboxed versions
+- Prerequisite: Docker daemon available
+
+## Networking & Exposure
+
+**Gateway Server:**
+- Protocol: WebSocket (JSON-RPC) + HTTP (`src/gateway/server.ts`)
+- Default port: 18800
+- Binding: `127.0.0.1` (localhost only) or `0.0.0.0`
+- Features: LaneQueue for request ordering, session bridge, static file serving for dashboard
+
+**Tailscale Serve:**
+- Implementation: `src/gateway/tailscale.ts`
+- Purpose: Expose gateway HTTPS endpoint on tailnet
+- Config: `server.tailscale.serve`, `server.tailscale.hostname`, `server.tailscale.port`
+- Prerequisite: Tailscale CLI installed and daemon running
+
+## Monitoring & Observability
+
+**Error Tracking:**
+- None (console.error only)
+
+**Logging:**
+- `console.log` / `console.error` / `console.debug` throughout
+- No structured logging framework
+
+**Cost Tracking:**
+- Built-in: `src/models/costs.ts` with per-million-token pricing for known models
+- Tracks: Anthropic, OpenAI, Gemini, xAI, Bedrock models
+- GitHub Copilot models tracked at $0 (subscription-included)
+- Usage exposed via `/usage` command and gateway `system.usage` RPC
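+
+With per-million-token pricing, the cost of a call is each token count times its rate, divided by one million. The sketch borrows the `MODEL_COSTS_PER_MILLION` name from the conventions, but its shape, the model names, and every price in it are placeholders, not real rates. The $0 entry stands for a subscription-included Copilot model.
+
+```typescript
+interface ModelPricing {
+  inputPerMillion: number; // USD per 1M input tokens
+  outputPerMillion: number; // USD per 1M output tokens
+}
+
+// Placeholder model names and prices, for illustration only.
+const MODEL_COSTS_PER_MILLION = new Map<string, ModelPricing>([
+  ['example-large', { inputPerMillion: 3, outputPerMillion: 15 }],
+  ['example-copilot', { inputPerMillion: 0, outputPerMillion: 0 }], // subscription-included
+]);
+
+/** Estimated USD cost, or null for models without known pricing. */
+export function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number | null {
+  const pricing = MODEL_COSTS_PER_MILLION.get(model);
+  if (!pricing) return null;
+  return (inputTokens * pricing.inputPerMillion + outputTokens * pricing.outputPerMillion) / 1_000_000;
+}
+```
+
+For example, 10,000 input and 2,000 output tokens on `example-large` cost (10,000 * 3 + 2,000 * 15) / 1,000,000 = $0.06.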
+
+## CI/CD & Deployment
+
+**Hosting:**
+- Self-hosted (designed for personal deployment)
+- Process supervisor expected for restarts (exit code 75 = restart signal)
+
+**CI Pipeline:**
+- Not detected in repository
+
+## Environment Configuration
+
+**Required env vars (minimum viable):**
+- `ANTHROPIC_API_KEY` (or other model provider key)
+- `FLYNN_TELEGRAM_TOKEN` (if using default Telegram channel)
+
+**Optional env vars (by feature):**
+- `OPENAI_API_KEY` - OpenAI models and embeddings
+- `GOOGLE_API_KEY` - Gemini models and embeddings
+- `GITHUB_TOKEN` - GitHub Models / Copilot access
+- `AWS_REGION` - Bedrock region
+- `OPENROUTER_API_KEY` - OpenRouter access
+- `ZHIPUAI_API_KEY` - ZhipuAI access
+- `XAI_API_KEY` - xAI (Grok) access
+- `VOYAGE_API_KEY` - Voyage AI embeddings
+- `FLYNN_DATA_DIR` - Custom data directory
+
+**Secrets location:**
+- API keys: YAML config (with `${ENV_VAR}` expansion) or environment variables
+- OAuth tokens: `~/.config/flynn/auth.json` (GitHub), `~/.config/flynn/gmail-token.json` (Gmail)
+- `.env.example` present at project root
+
+## Webhooks & Callbacks
+
+**Incoming:**
+- `POST /webhooks/{name}` - Named webhooks with HMAC-SHA256 verification (`src/automation/webhooks.ts`)
+- `POST /gmail/push` - Google Pub/Sub push notifications for Gmail (`src/automation/gmail.ts`)
+
+**Outgoing:**
+- None (no outbound webhooks — all communication goes through channel adapters)
+
+---
+
+*Integration audit: 2026-02-09*
@@ -0,0 +1,155 @@
+# Technology Stack
+
+**Analysis Date:** 2026-02-09
+
+## Languages
+
+**Primary:**
+- TypeScript 5.7+ - All source code in `src/`
+- JSX (react-jsx) - TUI components in `src/frontends/tui/components/`
+
+**Secondary:**
+- JavaScript (vanilla) - Gateway web dashboard in `src/gateway/ui/` (HTML/CSS/JS, no framework)
+- SQL - Inline SQLite schema in `src/session/store.ts` and `src/memory/vector-store.ts`
+- YAML - Configuration files in `config/`
+
+## Runtime
+
+**Environment:**
+- Node.js >= 22.0.0 (required by `engines` in `package.json`)
+- ES2022 target, NodeNext module system
+
+**Package Manager:**
+- pnpm
+- Lockfile: `pnpm-lock.yaml` (present, ~6500 lines)
+
+## Frameworks
+
+**Core:**
+- None (custom daemon architecture — no Express, Fastify, or similar)
+- Raw `http.createServer` + `ws` WebSocket server for the gateway (`src/gateway/server.ts`)
+
+**CLI:**
+- Commander 14.x - CLI argument parsing (`src/cli/index.ts`)
+- Ink 6.x + React 19.x - Terminal UI framework (`src/frontends/tui/`)
+
+**Testing:**
+- Vitest 3.x - Test runner and assertion library
+
+**Build/Dev:**
+- TypeScript compiler (`tsc`) - Production builds
+- tsx 4.x - Development mode with watch (`pnpm dev`)
+
+## Key Dependencies
+
+**Critical (AI/Model Providers):**
+- `@anthropic-ai/sdk` ^0.39.0 - Anthropic Claude API client (`src/models/anthropic.ts`)
+- `openai` ^4.0.0 - OpenAI API client, also used for OpenRouter/ZhipuAI/xAI/GitHub Models (`src/models/openai.ts`, `src/models/github.ts`)
+- `@google/generative-ai` ^0.24.1 - Google Gemini API client (`src/models/gemini.ts`)
+- `@aws-sdk/client-bedrock-runtime` ^3.985.0 - AWS Bedrock for model inference (`src/models/bedrock.ts`)
+- `ollama` ^0.5.0 - Ollama local model client (`src/models/local/ollama.ts`)
+
+**Critical (Channel Adapters):**
+- `grammy` ^1.35.0 - Telegram bot framework (`src/channels/telegram/`)
+- `discord.js` ^14.25.1 - Discord bot library (`src/channels/discord/`)
+- `@slack/bolt` ^4.6.0 - Slack bot framework (`src/channels/slack/`)
+- `whatsapp-web.js` ^1.34.6 - WhatsApp Web client (`src/channels/whatsapp/`)
+
+**Critical (Data):**
+- `better-sqlite3` ^11.0.0 - SQLite for sessions (`src/session/store.ts`) and vector storage (`src/memory/vector-store.ts`)
+- `zod` ^3.24.0 - Configuration schema validation (`src/config/schema.ts`)
+- `yaml` ^2.7.0 - Config file parsing (`src/config/loader.ts`)
+
+**Infrastructure:**
+- `ws` ^8.19.0 - WebSocket server for gateway (`src/gateway/server.ts`)
+- `@modelcontextprotocol/sdk` ^1.26.0 - MCP client for external tool servers (`src/mcp/client.ts`)
+- `googleapis` ^148.0.0 - Gmail API integration (`src/automation/gmail.ts`)
+- `croner` ^10.0.1 - Cron job scheduling (`src/automation/cron.ts`)
+- `puppeteer-core` ^24.37.2 - Browser automation for web tools (`src/tools/builtin/browser/`)
+
+**Content Processing:**
+- `@mozilla/readability` ^0.5.0 - Article extraction from web pages (`src/tools/builtin/web-fetch.ts`)
+- `linkedom` ^0.18.0 - Server-side DOM for Readability (`src/tools/builtin/web-fetch.ts`)
+- `turndown` ^7.2.0 - HTML to Markdown conversion (`src/tools/builtin/web-fetch.ts`)
+- `marked` ^17.0.1 + `marked-terminal` ^7.3.0 - Markdown rendering in terminal
+- `cli-highlight` ^2.1.11 - Syntax highlighting in terminal
+
+## Configuration
+
+**Format:**
+- YAML config file with `${ENV_VAR}` expansion support
+- Zod schema validation at load time (`src/config/schema.ts`)
+- Config loaded via `loadConfig()` in `src/config/loader.ts`
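+
+`${ENV_VAR}` expansion can be done with a regex replace that throws for unset variables, reusing the error message quoted in the coding conventions (`Environment variable ${envVar} is not set`). The sketch applies expansion to string values after YAML parsing, so a variable whose value contains YAML syntax cannot change the document's structure. Whether `src/config/loader.ts` expands raw text or parsed values, and the exact variable-name pattern, are assumptions.
+
+```typescript
+/** Replace every `${NAME}` in `text`; throw if any referenced variable is unset. */
+export function expandEnvVars(text: string, env: Record<string, string | undefined> = process.env): string {
+  return text.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, envVar: string) => {
+    const envValue = env[envVar];
+    if (envValue === undefined) {
+      throw new Error(`Environment variable ${envVar} is not set`);
+    }
+    return envValue;
+  });
+}
+
+/** Walk a parsed YAML value and expand only its strings. */
+export function expandEnvVarsDeep(value: unknown, env: Record<string, string | undefined> = process.env): unknown {
+  if (typeof value === 'string') return expandEnvVars(value, env);
+  if (Array.isArray(value)) return value.map((item) => expandEnvVarsDeep(item, env));
+  if (value !== null && typeof value === 'object') {
+    return Object.fromEntries(Object.entries(value).map(([k, v]) => [k, expandEnvVarsDeep(v, env)]));
+  }
+  return value; // numbers, booleans, null pass through unchanged
+}
+```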
+
+**Config Location:**
+- Default template: `config/default.yaml`
+- User config: `~/.config/flynn/config.yaml` (conventional)
+- CLI flag: `--config <path>` on `flynn start`
+
+**Data Directory:**
+- Default: `~/.local/share/flynn/`
+- Override: `FLYNN_DATA_DIR` environment variable
+- Contains: `sessions.db`, `vectors.db`, `memory/` directory
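+
+Resolving the data directory is a one-line precedence rule. The sketch assumes `~` means `os.homedir()` and that an empty `FLYNN_DATA_DIR` is ignored. It does not consult `XDG_DATA_HOME`, since the source does not mention it. The function names are hypothetical.
+
+```typescript
+import { homedir } from 'node:os';
+import { join, resolve } from 'node:path';
+
+/** FLYNN_DATA_DIR wins when set and non-empty; otherwise ~/.local/share/flynn. */
+export function resolveDataDir(env: Record<string, string | undefined> = process.env, home: string = homedir()): string {
+  const override = env.FLYNN_DATA_DIR?.trim();
+  return override ? resolve(override) : join(home, '.local', 'share', 'flynn');
+}
+
+/** Paths of the files the data directory is documented to contain. */
+export function dataPaths(dataDir: string): { sessionsDb: string; vectorsDb: string; memoryDir: string } {
+  return {
+    sessionsDb: join(dataDir, 'sessions.db'),
+    vectorsDb: join(dataDir, 'vectors.db'),
+    memoryDir: join(dataDir, 'memory'),
+  };
+}
+```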
+
+**Auth Storage:**
+- GitHub OAuth tokens: `~/.config/flynn/auth.json` (0600 permissions)
+- Gmail OAuth tokens: `~/.config/flynn/gmail-token.json`
+- Gmail credentials: `~/.config/flynn/gmail-credentials.json`
+
+**Key Environment Variables:**
+- `ANTHROPIC_API_KEY` - Anthropic API authentication
+- `OPENAI_API_KEY` - OpenAI API authentication
+- `GOOGLE_API_KEY` - Gemini API authentication
+- `GITHUB_TOKEN` - GitHub Models / Copilot authentication
+- `AWS_REGION` - AWS Bedrock region (default: us-east-1)
+- `OPENROUTER_API_KEY` - OpenRouter API authentication
+- `ZHIPUAI_API_KEY` - ZhipuAI API authentication
+- `XAI_API_KEY` - xAI (Grok) API authentication
+- `VOYAGE_API_KEY` - Voyage AI embeddings authentication
+- `FLYNN_TELEGRAM_TOKEN` - Telegram bot token (referenced in default config)
+- `FLYNN_DATA_DIR` - Custom data directory override
+
+**Build:**
+- `tsconfig.json` - TypeScript compiler config
+  - Target: ES2022
+  - Module: NodeNext
+  - ModuleResolution: NodeNext
+  - Strict mode enabled
+  - JSX: react-jsx
+  - Source maps, declarations, and declaration maps enabled
+  - Root: `src/`, Output: `dist/`
+
+## Platform Requirements
+
+**Development:**
+- Node.js >= 22.0.0
+- pnpm (package manager)
+- TypeScript 5.7+
+
+**Production:**
+- Node.js >= 22.0.0
+- SQLite3 (native binding via better-sqlite3)
+
+**Optional Runtime Dependencies:**
+- Docker - For sandbox container execution (`src/sandbox/`)
+- Tailscale CLI - For Tailscale Serve gateway exposure (`src/gateway/tailscale.ts`)
+- Chromium/Chrome - For browser tools via puppeteer-core (`src/tools/builtin/browser/`)
+
+## Build Commands
+
+```bash
+pnpm build                    # Compile TypeScript to dist/
+pnpm dev                      # Run daemon with tsx watch mode
+pnpm start                    # Start production build (node dist/cli/index.js start)
+pnpm tui                      # Run TUI (readline mode)
+pnpm tui:fs                   # Run TUI (fullscreen React/Ink mode)
+pnpm tui:dev                  # Run TUI with watch mode
+pnpm test                     # Run Vitest in watch mode
+pnpm test:run                 # Run Vitest once
+pnpm lint                     # ESLint
+pnpm typecheck                # tsc --noEmit
+```
+
+---
+
+*Stack analysis: 2026-02-09*
@@ -0,0 +1,331 @@
+# Codebase Structure
+
+**Analysis Date:** 2026-02-09
+
+## Directory Layout
+
+```
+flynn/
+├── src/                       # All TypeScript source code
+│   ├── agents/                # Named agent configs + routing
+│   ├── auth/                  # GitHub device flow auth
+│   ├── automation/            # Cron, webhooks, heartbeat, Gmail
+│   ├── backends/              # AI agent implementations
+│   │   └── native/            # NativeAgent + AgentOrchestrator
+│   ├── channels/              # Multi-platform messaging adapters
+│   │   ├── discord/           # Discord adapter
+│   │   ├── slack/             # Slack adapter
+│   │   ├── telegram/          # Telegram adapter
+│   │   ├── webchat/           # WebChat adapter (wraps gateway)
+│   │   └── whatsapp/          # WhatsApp adapter
+│   ├── cli/                   # CLI commands (commander.js)
+│   ├── config/                # YAML config loading + Zod schema
+│   ├── context/               # Token estimation + compaction
+│   ├── daemon/                # Daemon bootstrap + lifecycle
+│   ├── frontends/             # UI frontends
+│   │   ├── telegram/          # Telegram bot (legacy direct integration)
+│   │   └── tui/               # Terminal UI (readline + React/Ink)
+│   │       └── components/    # Ink React components
+│   ├── gateway/               # WebSocket JSON-RPC server
+│   │   ├── handlers/          # JSON-RPC method handlers
+│   │   └── ui/                # Vanilla JS dashboard
+│   │       ├── lib/           # Shared JS (ws-client)
+│   │       └── pages/         # Page JS (chat, dashboard, sessions, settings, usage)
+│   ├── hooks/                 # Tool confirmation engine
+│   ├── mcp/                   # Model Context Protocol bridge
+│   ├── memory/                # Persistent memory + vector search
+│   ├── models/                # LLM provider clients + router
+│   │   └── local/             # Ollama + llama.cpp clients
+│   ├── prompt/                # System prompt template assembly
+│   ├── sandbox/               # Docker sandbox for tool execution
+│   ├── session/               # SQLite session persistence
+│   ├── skills/                # Pluggable skill system
+│   └── tools/                 # Tool registry, executor, policy
+│       └── builtin/           # Built-in tool implementations
+│           ├── browser/       # Puppeteer browser tools
+│           └── process/       # Background process tools
+├── config/                    # Example/default config files
+├── docs/                      # Documentation
+│   └── plans/                 # Planning docs
+├── .planning/                 # GSD planning documents
+│   └── codebase/              # Codebase analysis (this file)
+├── AGENTS.md                  # Agent instructions for Claude Code
+├── CHANGELOG.md               # Version changelog
+├── CLAUDE.md                  # Claude Code shared memory
+├── SOUL.md                    # Flynn's AI personality/identity
+├── Dockerfile                 # Docker build config
+├── docker-compose.yml         # Docker Compose config
+├── package.json               # Node.js package manifest
+├── tsconfig.json              # TypeScript configuration
+└── pnpm-lock.yaml             # Lockfile
+```
+
+## Directory Purposes
+
+**`src/agents/`:**
+- Purpose: Named agent configurations and routing logic
+- Contains: `AgentConfigRegistry` (stores named configs), `AgentRouter` (resolves channel+sender → agent config)
+- Key files: `src/agents/registry.ts`, `src/agents/router.ts`
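+
+The binding shape and precedence rules below are hypothetical; they only illustrate the kind of channel+sender → agent resolution `AgentRouter` is described as doing. In this sketch a binding that matches both channel and sender beats a sender-only binding, which beats a channel-only binding, which beats a catch-all:
+
+```typescript
+// Hypothetical binding shape, not taken from Flynn's config schema.
+interface AgentBinding {
+  agent: string;
+  channel?: string; // e.g. 'telegram'; omitted = any channel
+  sender?: string; // platform sender id; omitted = any sender
+}
+
+// Returns the agent of the most specific matching binding, else defaultAgent.
+export function resolveAgent(
+  bindings: AgentBinding[],
+  channel: string,
+  sender: string,
+  defaultAgent = 'default',
+): string {
+  let best: AgentBinding | undefined;
+  let bestScore = -1;
+  for (const b of bindings) {
+    if (b.channel !== undefined && b.channel !== channel) continue;
+    if (b.sender !== undefined && b.sender !== sender) continue;
+    // Sender matches weigh more than channel matches; ties keep the first binding.
+    const score = (b.channel !== undefined ? 1 : 0) + (b.sender !== undefined ? 2 : 0);
+    if (score > bestScore) {
+      best = b;
+      bestScore = score;
+    }
+  }
+  return best?.agent ?? defaultAgent;
+}
+```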
+
+**`src/auth/`:**
+- Purpose: GitHub device flow authentication for GitHub Models provider
+- Contains: OAuth device flow implementation
+- Key files: `src/auth/github.ts`
+
+**`src/automation/`:**
+- Purpose: Scheduled jobs, incoming webhooks, health monitoring, Gmail watching
+- Contains: CronScheduler (croner-based), WebhookHandler (HMAC auth), HeartbeatMonitor, GmailWatcher
+- Key files: `src/automation/cron.ts`, `src/automation/webhooks.ts`, `src/automation/heartbeat.ts`, `src/automation/gmail.ts`
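+
+For the HMAC auth on incoming webhooks, a typical verification looks like the sketch below. The signature format (hex HMAC-SHA256 of the raw body, with an optional `sha256=` prefix) is an assumption borrowed from common webhook providers; Flynn's header name and encoding may differ:
+
+```typescript
+import { createHmac, timingSafeEqual } from 'node:crypto';
+
+// Hex-encoded HMAC-SHA256 of the raw (unparsed) request body.
+export function signBody(secret: string, rawBody: string | Buffer): string {
+  return createHmac('sha256', secret).update(rawBody).digest('hex');
+}
+
+export function verifyWebhookSignature(
+  secret: string,
+  rawBody: string | Buffer,
+  signatureHeader: string | undefined,
+): boolean {
+  if (!signatureHeader) return false;
+  const received = signatureHeader.startsWith('sha256=')
+    ? signatureHeader.slice('sha256='.length)
+    : signatureHeader;
+  const expected = Buffer.from(signBody(secret, rawBody), 'hex');
+  const actual = Buffer.from(received, 'hex');
+  // timingSafeEqual throws on length mismatch, so compare lengths first.
+  return actual.length === expected.length && timingSafeEqual(actual, expected);
+}
+```
+
+Comparing with `timingSafeEqual` rather than `===` avoids leaking how many leading bytes matched through response timing.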
+
+**`src/backends/native/`:**
+- Purpose: Core AI agent implementation — message processing and tool execution loop
+- Contains: `NativeAgent` (tool loop), `AgentOrchestrator` (delegation/compaction/memory wrapper), prompt templates
+- Key files: `src/backends/native/agent.ts`, `src/backends/native/orchestrator.ts`, `src/backends/native/prompts.ts`, `src/backends/native/attachments.ts`
+
+**`src/channels/`:**
+- Purpose: Platform-agnostic messaging layer with uniform adapter interface
+- Contains: `ChannelAdapter` interface, `ChannelRegistry`, per-platform adapter directories, `PairingManager`
+- Key files: `src/channels/types.ts`, `src/channels/registry.ts`, `src/channels/pairing.ts`, `src/channels/utils.ts`
+
+**`src/cli/`:**
+- Purpose: CLI command definitions and entry point
+- Contains: Commander.js command registrations, config loading helpers
+- Key files: `src/cli/index.ts` (entry point), `src/cli/start.ts`, `src/cli/tui.ts`, `src/cli/send.ts`, `src/cli/sessions.ts`, `src/cli/doctor.ts`, `src/cli/config-cmd.ts`, `src/cli/completion.ts`, `src/cli/shared.ts`
+
+**`src/config/`:**
+- Purpose: Configuration loading and validation
+- Contains: YAML loader with `${ENV_VAR}` expansion, comprehensive Zod schema
+- Key files: `src/config/schema.ts` (all config types), `src/config/loader.ts` (YAML parse + validate)
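+
+A minimal sketch of `${ENV_VAR}` expansion applied to the parsed YAML tree before Zod validation. The function names are hypothetical, and throwing on an unset variable is an assumed policy (the real loader might substitute an empty string instead):
+
+```typescript
+type Env = Record<string, string | undefined>;
+
+// Replaces each ${NAME} in a string with its environment value.
+export function expandEnv(input: string, env: Env = process.env): string {
+  return input.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
+    const value = env[name];
+    // Assumption: an unset variable is a configuration error.
+    if (value === undefined) throw new Error(`Environment variable ${name} is not set`);
+    return value;
+  });
+}
+
+// Walks a parsed YAML tree and expands every string leaf.
+export function expandEnvDeep(node: unknown, env: Env = process.env): unknown {
+  if (typeof node === 'string') return expandEnv(node, env);
+  if (Array.isArray(node)) return node.map((item) => expandEnvDeep(item, env));
+  if (node !== null && typeof node === 'object') {
+    return Object.fromEntries(
+      Object.entries(node).map(([key, value]) => [key, expandEnvDeep(value, env)]),
+    );
+  }
+  return node; // numbers, booleans, null pass through unchanged
+}
+```
+
+The expanded tree would then go to `configSchema.parse()`, so Zod validates the final values rather than the `${...}` placeholders.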
+
+**`src/context/`:**
+- Purpose: Conversation context management — token counting and history compaction
+- Contains: Token estimator (rule-based, no tokenizer), compaction logic using delegation
+- Key files: `src/context/tokens.ts`, `src/context/compaction.ts`
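+
+"Rule-based, no tokenizer" usually means a character-count heuristic. The sketch below assumes roughly four characters per token plus a fixed per-message overhead; the constants, the names, and the 80% compaction threshold are illustrative, not Flynn's values:
+
+```typescript
+// Illustrative constants, not Flynn's actual values.
+const CHARS_PER_TOKEN = 4;
+const PER_MESSAGE_OVERHEAD = 4; // role markers, separators, etc.
+
+interface ChatMessage {
+  role: 'user' | 'assistant';
+  content: string;
+}
+
+export function estimateTokens(text: string): number {
+  return Math.ceil(text.length / CHARS_PER_TOKEN);
+}
+
+export function estimateConversationTokens(messages: ChatMessage[]): number {
+  return messages.reduce((sum, m) => sum + PER_MESSAGE_OVERHEAD + estimateTokens(m.content), 0);
+}
+
+// Compaction would kick in once the estimate exceeds a fraction of the window.
+export function needsCompaction(messages: ChatMessage[], contextWindow: number, threshold = 0.8): boolean {
+  return estimateConversationTokens(messages) > contextWindow * threshold;
+}
+```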
+
+**`src/daemon/`:**
+- Purpose: Daemon bootstrap — wires all subsystems together and manages lifecycle
+- Contains: `startDaemon()` function (1088 lines), `Lifecycle` manager, model client factory, message router
+- Key files: `src/daemon/index.ts` (main orchestration), `src/daemon/lifecycle.ts`
+
+**`src/frontends/telegram/`:**
+- Purpose: Legacy direct Telegram bot integration with confirmation UI
+- Contains: Bot handlers, confirmation keyboard management
+- Key files: `src/frontends/telegram/bot.ts`, `src/frontends/telegram/handlers.ts`, `src/frontends/telegram/confirmations.ts`
+
+**`src/frontends/tui/`:**
+- Purpose: Terminal user interface with two modes
+- Contains: Minimal readline TUI, fullscreen React/Ink TUI, markdown rendering, slash commands
+- Key files: `src/frontends/tui/minimal.ts`, `src/frontends/tui/fullscreen.ts`, `src/frontends/tui/commands.ts`, `src/frontends/tui/markdown.ts`
+
+**`src/gateway/`:**
+- Purpose: WebSocket JSON-RPC server + HTTP server + dashboard
+- Contains: `GatewayServer`, `Router` (method dispatch), `SessionBridge` (WS → NativeAgent), `LaneQueue` (request serialization), auth, protocol, static file serving, Tailscale Serve integration
+- Key files: `src/gateway/server.ts`, `src/gateway/router.ts`, `src/gateway/session-bridge.ts`, `src/gateway/lane-queue.ts`, `src/gateway/protocol.ts`, `src/gateway/auth.ts`, `src/gateway/static.ts`, `src/gateway/tailscale.ts`
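+
+`LaneQueue` is described only as request serialization. One common way to get that is per-lane promise chaining, sketched here with a hypothetical `enqueue(lane, task)` API: tasks sharing a lane (for example, one session) run one at a time in FIFO order, while different lanes proceed concurrently:
+
+```typescript
+// Hypothetical API; only the idea (per-lane serialization) comes from the docs.
+export class LaneQueue {
+  // Tail promise per lane; these never reject.
+  private readonly tails = new Map<string, Promise<void>>();
+
+  enqueue<T>(lane: string, task: () => Promise<T>): Promise<T> {
+    const prev = this.tails.get(lane) ?? Promise.resolve();
+    // Start only after the previous task in this lane has settled.
+    const run = prev.then(() => task());
+    // Swallow the outcome so one failing task does not block the lane.
+    const tail: Promise<void> = run.then(
+      () => undefined,
+      () => undefined,
+    );
+    this.tails.set(lane, tail);
+    // Forget drained lanes so the map does not grow without bound.
+    void tail.then(() => {
+      if (this.tails.get(lane) === tail) this.tails.delete(lane);
+    });
+    return run;
+  }
+}
+```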
+
+**`src/gateway/handlers/`:**
+- Purpose: JSON-RPC method handler implementations
+- Contains: Handler factories for system, session, tool, agent, config, and pairing methods
+- Key files: `src/gateway/handlers/system.ts`, `src/gateway/handlers/sessions.ts`, `src/gateway/handlers/agent.ts`, `src/gateway/handlers/tools.ts`, `src/gateway/handlers/config.ts`, `src/gateway/handlers/pairing.ts`
+
+**`src/gateway/ui/`:**
+- Purpose: Vanilla JS web dashboard served by gateway HTTP server
+- Contains: HTML pages, CSS, client-side JavaScript
+- Key files: `src/gateway/ui/index.html`, `src/gateway/ui/chat.html`, `src/gateway/ui/style.css`, `src/gateway/ui/app.js`, `src/gateway/ui/lib/ws-client.js`
+
+**`src/hooks/`:**
+- Purpose: Tool execution confirmation engine with glob-pattern matching
+- Contains: `HookEngine` with pending confirmation queue
+- Key files: `src/hooks/engine.ts`
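+
+Glob matching on tool names can be sketched by translating each pattern into an anchored regular expression. Supporting only `*` and `?`, the `confirm`/`deny` action names, and first-match-wins ordering are assumptions; returning `silent` for non-matching tools follows the test name quoted in the testing notes ("returns silent action for non-matching tools"):
+
+```typescript
+// Converts "shell.*" into /^shell\..*$/; only `*` and `?` are wildcards.
+export function globToRegExp(glob: string): RegExp {
+  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, '\\$&');
+  return new RegExp('^' + escaped.replace(/\*/g, '.*').replace(/\?/g, '.') + '$');
+}
+
+type HookAction = 'confirm' | 'deny' | 'silent';
+interface HookRule {
+  pattern: string;
+  action: HookAction;
+}
+
+// First matching rule wins; tools that match no rule run silently.
+export function resolveHookAction(rules: HookRule[], toolName: string): HookAction {
+  const rule = rules.find((r) => globToRegExp(r.pattern).test(toolName));
+  return rule?.action ?? 'silent';
+}
+```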
+
+**`src/mcp/`:**
+- Purpose: Model Context Protocol integration — start external MCP servers and bridge their tools
+- Contains: `McpClient`, `McpManager`, tool bridging utilities
+- Key files: `src/mcp/client.ts`, `src/mcp/manager.ts`, `src/mcp/bridge.ts`
+
+**`src/memory/`:**
+- Purpose: Persistent memory system with keyword + vector hybrid search
+- Contains: `MemoryStore` (namespace-based markdown files), `VectorStore` (SQLite), `HybridSearch`, embedding providers, text chunker
+- Key files: `src/memory/store.ts`, `src/memory/vector-store.ts`, `src/memory/hybrid-search.ts`, `src/memory/embeddings.ts`, `src/memory/chunker.ts`
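+
+One way to combine keyword and vector results is to min-max normalize each list and take a weighted sum per chunk id, as sketched below. This fusion strategy and the default 0.7 vector weight are assumptions; Flynn's `HybridSearch` may weight or fuse differently (for example, with reciprocal rank fusion):
+
+```typescript
+interface Scored {
+  id: string;
+  score: number;
+}
+
+// Rescales scores to [0, 1] so keyword and vector scores are comparable.
+function normalize(results: Scored[]): Map<string, number> {
+  const out = new Map<string, number>();
+  if (results.length === 0) return out;
+  const scores = results.map((r) => r.score);
+  const min = Math.min(...scores);
+  const max = Math.max(...scores);
+  for (const r of results) {
+    // If all scores are equal, treat every hit as a full match.
+    out.set(r.id, max === min ? 1 : (r.score - min) / (max - min));
+  }
+  return out;
+}
+
+export function hybridMerge(keyword: Scored[], vector: Scored[], vectorWeight = 0.7, limit = 10): Scored[] {
+  const kw = normalize(keyword);
+  const vec = normalize(vector);
+  const ids = new Set([...kw.keys(), ...vec.keys()]);
+  const merged: Scored[] = [];
+  for (const id of ids) {
+    // A chunk missing from one list contributes 0 from that side.
+    const score = (1 - vectorWeight) * (kw.get(id) ?? 0) + vectorWeight * (vec.get(id) ?? 0);
+    merged.push({ id, score });
+  }
+  return merged.sort((x, y) => y.score - x.score).slice(0, limit);
+}
+```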
+
+**`src/models/`:**
+- Purpose: LLM provider client implementations and tier-based routing
+- Contains: Provider clients, `ModelRouter`, retry logic, cost estimation, media helpers
+- Key files: `src/models/types.ts` (core interfaces), `src/models/router.ts`, `src/models/anthropic.ts`, `src/models/openai.ts`, `src/models/gemini.ts`, `src/models/bedrock.ts`, `src/models/github.ts`, `src/models/retry.ts`, `src/models/costs.ts`, `src/models/media.ts`
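+
+Tier-based routing with fallback can be sketched as trying each client in order and returning the first success. The interfaces below are trimmed-down stand-ins for those in `src/models/types.ts`, per-client retries (`retry.ts`) are omitted, and only the `All model providers failed` error prefix comes from Flynn's router tests:
+
+```typescript
+// Simplified stand-ins; the real types carry more fields (usage, stopReason, ...).
+interface ChatRequest {
+  messages: { role: 'user' | 'assistant'; content: string }[];
+}
+interface ChatResponse {
+  content: string;
+}
+interface ModelClient {
+  chat(req: ChatRequest): Promise<ChatResponse>;
+}
+
+// Tries clients in tier order; the first successful response wins.
+export async function chatWithFallback(clients: ModelClient[], req: ChatRequest): Promise<ChatResponse> {
+  const errors: unknown[] = [];
+  for (const client of clients) {
+    try {
+      return await client.chat(req);
+    } catch (err) {
+      errors.push(err); // remember the failure and fall through to the next tier
+    }
+  }
+  throw new Error(`All model providers failed (${errors.length} attempted)`);
+}
+```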
+
+**`src/models/local/`:**
+- Purpose: Local model provider clients
+- Contains: Ollama and llama.cpp client implementations
+- Key files: `src/models/local/ollama.ts`, `src/models/local/llamacpp.ts`
+
+**`src/prompt/`:**
+- Purpose: System prompt assembly from template files
+- Contains: Template search across directories (SOUL.md, AGENTS.md, IDENTITY.md, USER.md, TOOLS.md)
+- Key files: `src/prompt/template.ts`
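+
+A sketch of template search across directories. Only the file names come from the list above; the search order (first directory containing the file wins) and joining sections with blank lines are assumptions:
+
+```typescript
+import { existsSync, readFileSync } from 'node:fs';
+import { join } from 'node:path';
+
+const TEMPLATE_FILES = ['SOUL.md', 'AGENTS.md', 'IDENTITY.md', 'USER.md', 'TOOLS.md'];
+
+// Returns the path in the first directory that contains `name`, if any.
+export function findTemplate(name: string, searchDirs: string[]): string | undefined {
+  for (const dir of searchDirs) {
+    const candidate = join(dir, name);
+    if (existsSync(candidate)) return candidate;
+  }
+  return undefined;
+}
+
+// Concatenates whichever templates exist, in a fixed order.
+export function assembleSystemPrompt(searchDirs: string[]): string {
+  const sections: string[] = [];
+  for (const name of TEMPLATE_FILES) {
+    const path = findTemplate(name, searchDirs);
+    if (path) sections.push(readFileSync(path, 'utf8').trim());
+  }
+  return sections.join('\n\n');
+}
+```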
+
+**`src/sandbox/`:**
+- Purpose: Docker container isolation for shell/process tool execution
+- Contains: `DockerSandbox` (container lifecycle), `SandboxManager` (per-session containers), sandboxed tool wrappers
+- Key files: `src/sandbox/docker.ts`, `src/sandbox/manager.ts`, `src/sandbox/tools.ts`
+
+**`src/session/`:**
+- Purpose: Conversation history persistence
+- Contains: `SessionStore` (SQLite), `SessionManager` (in-memory cache), `ManagedSession`
+- Key files: `src/session/store.ts`, `src/session/manager.ts`
+
+**`src/skills/`:**
+- Purpose: Pluggable skill system — load skills from bundled, managed, and workspace directories
+- Contains: `SkillRegistry`, `SkillInstaller`, skill loader, skill type definitions
+- Key files: `src/skills/registry.ts`, `src/skills/installer.ts`, `src/skills/loader.ts`, `src/skills/types.ts`
+
+**`src/tools/`:**
+- Purpose: Tool abstraction layer — registry, execution, policy enforcement
+- Contains: `ToolRegistry`, `ToolExecutor`, `ToolPolicy`, type definitions
+- Key files: `src/tools/types.ts`, `src/tools/registry.ts`, `src/tools/executor.ts`, `src/tools/policy.ts`
+
+**`src/tools/builtin/`:**
+- Purpose: Built-in tool implementations shipped with Flynn
+- Contains: Shell exec, file operations, web fetch, memory ops, web search, media send, image analysis, session management, agent listing, cross-channel messaging, cron management
+- Key files: `src/tools/builtin/shell.ts`, `src/tools/builtin/file-read.ts`, `src/tools/builtin/file-write.ts`, `src/tools/builtin/file-edit.ts`, `src/tools/builtin/file-patch.ts`, `src/tools/builtin/file-list.ts`, `src/tools/builtin/web-fetch.ts`, `src/tools/builtin/web-search.ts`, `src/tools/builtin/memory-read.ts`, `src/tools/builtin/memory-write.ts`, `src/tools/builtin/memory-search.ts`, `src/tools/builtin/media-send.ts`, `src/tools/builtin/image-analyze.ts`, `src/tools/builtin/system-info.ts`, `src/tools/builtin/sessions.ts`, `src/tools/builtin/agents-list.ts`, `src/tools/builtin/message-send.ts`, `src/tools/builtin/cron.ts`
+
+**`src/tools/builtin/browser/`:**
+- Purpose: Puppeteer-based browser automation tools
+- Contains: `BrowserManager` (page lifecycle), browser tool implementations (navigate, screenshot, click, type, content, eval)
+- Key files: `src/tools/builtin/browser/manager.ts`, `src/tools/builtin/browser/tools.ts`
+
+**`src/tools/builtin/process/`:**
+- Purpose: Background process management tools
+- Contains: `ProcessManager`, tools for start/status/output/kill/list
+- Key files: `src/tools/builtin/process/manager.ts`, `src/tools/builtin/process/start.ts`, `src/tools/builtin/process/status.ts`, `src/tools/builtin/process/output.ts`, `src/tools/builtin/process/kill.ts`, `src/tools/builtin/process/list.ts`
+
+## Key File Locations
+
+**Entry Points:**
+- `src/cli/index.ts`: CLI entry point (binary: `flynn`)
+- `src/daemon/index.ts`: Daemon bootstrap (`startDaemon()`) — the central wiring point
+- `src/gateway/server.ts`: Gateway WebSocket + HTTP server
+- `src/frontends/tui/minimal.ts`: TUI readline mode
+- `src/frontends/tui/fullscreen.ts`: TUI fullscreen Ink mode
+
+**Configuration:**
+- `src/config/schema.ts`: Complete Zod config schema — all types defined here
+- `src/config/loader.ts`: YAML parse + env expansion + Zod validation
+- `tsconfig.json`: TypeScript compiler config (strict, ES2022, NodeNext)
+- `package.json`: Dependencies and scripts
+- `config/`: Example/default config files directory
+
+**Core Logic:**
+- `src/backends/native/agent.ts`: NativeAgent — the AI tool loop
+- `src/backends/native/orchestrator.ts`: AgentOrchestrator — delegation, compaction, memory
+- `src/models/router.ts`: ModelRouter — tier-based model selection with fallback
+- `src/tools/executor.ts`: ToolExecutor — policy check → hook check → execute
+- `src/channels/registry.ts`: ChannelRegistry — adapter lifecycle + message routing
+- `src/daemon/index.ts`: startDaemon() — wires everything together
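+
+The `ToolExecutor` ordering (policy check, then hook check, then execute) can be sketched as a single async function. `Policy`, `Hooks`, and the error strings are hypothetical; the `{ success, output }` result shape follows the `makeTool` helper in the testing notes:
+
+```typescript
+// Hypothetical stand-ins for Flynn's own types.
+interface ToolResult {
+  success: boolean;
+  output: string;
+  error?: string;
+}
+interface Tool {
+  name: string;
+  execute(args: Record<string, unknown>): Promise<ToolResult>;
+}
+interface Policy {
+  isAllowed(toolName: string): boolean;
+}
+interface Hooks {
+  confirm(toolName: string, args: Record<string, unknown>): Promise<boolean>;
+}
+
+export async function executeTool(
+  tool: Tool,
+  args: Record<string, unknown>,
+  policy: Policy,
+  hooks: Hooks,
+): Promise<ToolResult> {
+  // 1. Policy: denied tools never reach the user or the implementation.
+  if (!policy.isAllowed(tool.name)) {
+    return { success: false, output: '', error: `Tool ${tool.name} is not allowed by policy` };
+  }
+  // 2. Hooks: may pause for user confirmation before running.
+  if (!(await hooks.confirm(tool.name, args))) {
+    return { success: false, output: '', error: `Tool ${tool.name} was rejected by the user` };
+  }
+  // 3. Execute, turning thrown errors into failed results.
+  try {
+    return await tool.execute(args);
+  } catch (err) {
+    return { success: false, output: '', error: err instanceof Error ? err.message : String(err) };
+  }
+}
+```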
+
+**Testing:**
+- Test files are co-located with source: `src/path/to/file.test.ts` alongside `src/path/to/file.ts`
+- No separate test directory
+
+## Naming Conventions
+
+**Files:**
+- Source files: `kebab-case.ts` (e.g., `session-bridge.ts`, `lane-queue.ts`)
+- React components: `camelCase.ts` in `src/frontends/tui/components/`
+- Test files: `*.test.ts` suffix (e.g., `agent.test.ts`, `registry.test.ts`)
+- Index files: `index.ts` barrel exports in every directory
+- Type-only files: `types.ts` for pure type definitions
+
+**Directories:**
+- Feature-based: lowercase, `kebab-case/` when multi-word (e.g., `automation/`, `sandbox/`, `memory/`)
+- Platform subdirs: `lowercase/` (e.g., `telegram/`, `discord/`, `slack/`)
+- Nested features: `parent/child/` (e.g., `tools/builtin/browser/`)
+
+**Exports:**
+- Every directory has an `index.ts` barrel file that re-exports public APIs
+- Types use `export type` for type-only exports
+- Registration chain flows: implementation → `builtin/index.ts` → `tools/index.ts` → `daemon/index.ts`
+
+## Where to Add New Code
+
+**New Channel Adapter:**
+- Create directory: `src/channels/<platform>/`
+- Create: `adapter.ts` (implements `ChannelAdapter`), `index.ts` (re-exports)
+- Add test: `adapter.test.ts`
+- Register in: `src/channels/index.ts` (export), `src/daemon/index.ts` (instantiate + register)
+- Add config: `src/config/schema.ts` (new optional schema block)
+
+**New Model Provider:**
+- Create: `src/models/<provider>.ts` (implements `ModelClient`)
+- Add export: `src/models/index.ts`
+- Add case: `src/daemon/index.ts` → `createClientFromConfig()` switch statement
+- Add config: `src/config/schema.ts` → `modelConfigBaseSchema.provider` enum
+
+**New Tool:**
+- Static tool (no deps): Create `src/tools/builtin/<name>.ts`, export const
+- Factory tool (needs deps): Create `src/tools/builtin/<name>.ts`, export function
+- Add to: `src/tools/builtin/index.ts` (export + add to `allBuiltinTools` if static)
+- Add to: `src/tools/index.ts` (re-export)
+- Register in: `src/daemon/index.ts` (call factory + register with `toolRegistry`)
+- Add to profiles: `src/tools/policy.ts` → `PROFILE_TOOLS` if needed
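+
+The static vs. factory distinction might look like this. The `Tool` shape is copied from the `makeTool` test helper and may be incomplete, and both tools (`time.now`, `note.read`) are hypothetical examples, not Flynn built-ins:
+
+```typescript
+interface ToolResult {
+  success: boolean;
+  output: string;
+}
+interface Tool {
+  name: string;
+  description: string;
+  inputSchema: { type: 'object'; properties: Record<string, unknown>; required?: string[] };
+  execute(args: Record<string, unknown>): Promise<ToolResult>;
+}
+
+// Static tool: no dependencies, so it can be a plain exported const.
+export const timeNow: Tool = {
+  name: 'time.now',
+  description: 'Returns the current time as an ISO-8601 string',
+  inputSchema: { type: 'object', properties: {} },
+  execute: async () => ({ success: true, output: new Date().toISOString() }),
+};
+
+// Factory tool: needs a runtime dependency (here a key/value store), so the
+// daemon would call this once the dependency exists and register the result.
+export function createNoteReadTool(store: Map<string, string>): Tool {
+  return {
+    name: 'note.read',
+    description: 'Reads a note by key',
+    inputSchema: { type: 'object', properties: { key: { type: 'string' } }, required: ['key'] },
+    execute: async (args) => {
+      const key = String(args.key);
+      const value = store.get(key);
+      return value === undefined
+        ? { success: false, output: `No note named ${key}` }
+        : { success: true, output: value };
+    },
+  };
+}
+```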
+
+**New Tool Group (multi-tool):**
+- Create directory: `src/tools/builtin/<group>/`
+- Create: `manager.ts` (shared state), individual tool files, `index.ts`
+- Follow pattern of `src/tools/builtin/process/` or `src/tools/builtin/browser/`
+
+**New Gateway Handler:**
+- Create: `src/gateway/handlers/<domain>.ts` (export `createXxxHandlers()`)
+- Add to: `src/gateway/handlers/index.ts`
+- Register in: `src/gateway/server.ts` → `registerHandlers()`
+
+**New Automation Type:**
+- Create: `src/automation/<type>.ts`
+- If it produces messages: implement `ChannelAdapter` interface
+- Add to: `src/automation/index.ts`
+- Register in: `src/daemon/index.ts`
+- Add config: `src/config/schema.ts` → `automationSchema`
+
+**New CLI Command:**
+- Create: `src/cli/<command>.ts` → export `registerXxxCommand(program)`
+- Register in: `src/cli/index.ts`
+
+**Utilities:**
+- Shared helpers: Place in the most specific layer that uses them
+- Cross-cutting: `src/channels/utils.ts` for channel utils, `src/models/media.ts` for media utils
+- No global `utils/` directory — utilities are co-located with their domain
+
+## Special Directories
+
+**`dist/`:**
+- Purpose: Compiled JavaScript output
+- Generated: Yes (by `pnpm build` / `tsc`)
+- Committed: No (in `.gitignore`)
+
+**`node_modules/`:**
+- Purpose: Installed dependencies
+- Generated: Yes (by `pnpm install`)
+- Committed: No (in `.gitignore`)
+
+**`config/`:**
+- Purpose: Example/default configuration files
+- Generated: No
+- Committed: Yes
+
+**`src/gateway/ui/`:**
+- Purpose: Static web dashboard (vanilla HTML/CSS/JS, not compiled)
+- Generated: No — hand-written vanilla JS
+- Committed: Yes
+- Note: Served by the gateway HTTP server at runtime from `dist/gateway/ui/`
+
+**`.planning/`:**
+- Purpose: GSD planning and analysis documents
+- Generated: By analysis tools
+- Committed: Yes
+
+**`docs/plans/`:**
+- Purpose: Feature planning documents and state tracking
+- Generated: No
+- Committed: Yes
+
+---
+
+*Structure analysis: 2026-02-09*
@@ -0,0 +1,428 @@
+# Testing Patterns
+
+**Analysis Date:** 2026-02-09
+
+## Test Framework
+
+**Runner:**
+- Vitest v3.x
+- Config: No `vitest.config.ts` file — uses Vitest defaults with `package.json` `type: "module"`
+
+**Assertion Library:**
+- Vitest built-in `expect()` API (Chai-compatible)
+
+**Run Commands:**
+```bash
+pnpm test                    # Run all tests in watch mode
+pnpm test:run                # Run all tests once (no watch)
+pnpm test:run src/path/to/file.test.ts  # Run a single test file
+```
+
+## Test File Organization
+
+**Location:**
+- Co-located with source files — test files live next to the code they test
+- `src/models/router.ts` → `src/models/router.test.ts`
+- `src/tools/policy.ts` → `src/tools/policy.test.ts`
+- `src/backends/native/agent.ts` → `src/backends/native/agent.test.ts`
+
+**Naming:**
+- `*.test.ts` suffix (no `.spec.ts` files exist)
+- Test file name matches source file name: `schema.test.ts` tests `schema.ts`
+
+**Statistics:**
+- 88 test files across the codebase
+- ~16,676 total lines of test code
+- 152 source (non-test) `.ts` files
+
+## Test Structure
+
+**Suite Organization:**
+```typescript
+import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
+import { ClassName } from './source-file.js';
+
+describe('ClassName', () => {
+  it('does something specific', () => {
+    // arrange, act, assert
+  });
+
+  it('handles error case', () => {
+    // ...
+  });
+});
+```
+
+**Nested Describes for Grouping:**
+```typescript
+describe('ToolPolicy', () => {
+  describe('default config (full profile)', () => {
+    it('allows all tools when profile is full', () => { /* ... */ });
+  });
+
+  describe('profile filtering', () => {
+    it('minimal profile only allows read-only tools', () => { /* ... */ });
+    it('coding profile includes file writes and shell', () => { /* ... */ });
+  });
+
+  describe('edge cases', () => {
+    it('handles empty tool list', () => { /* ... */ });
+  });
+});
+```
+
+**Naming convention for `it()` blocks:**
+- Start with verb: "returns", "creates", "handles", "uses", "fires", "respects"
+- Describe expected behavior: `'returns silent action for non-matching tools'`
+- Error cases: `'returns error for missing file'`, `'fails if old_string not found'`
+
+## Setup and Teardown
+
+**beforeEach/afterEach for test isolation:**
+```typescript
+// File system tests — create temp dir, clean up after
+let testDir: string;
+
+beforeEach(() => {
+  testDir = mkdtempSync(join(tmpdir(), 'flynn-file-test-'));
+});
+
+afterEach(() => {
+  rmSync(testDir, { recursive: true });
+});
+```
+
+**Database tests — create store, close and clean:**
+```typescript
+let store: SessionStore;
+
+beforeEach(() => {
+  store = new SessionStore(dbPath);
+});
+
+afterEach(() => {
+  store.close();
+  if (existsSync(dbPath)) {
+    unlinkSync(dbPath);
+  }
+});
+```
+
+**Mock cleanup:**
+```typescript
+beforeEach(() => {
+  vi.clearAllMocks();
+});
+```
+
+## Mocking
+
+**Framework:** Vitest built-in `vi.fn()` and `vi.mock()`
+
+**Pattern 1: Inline mock objects (most common):**
+```typescript
+const createMockClient = (): ModelClient => ({
+  chat: vi.fn().mockResolvedValue({
+    content: 'Hello!',
+    stopReason: 'end_turn',
+    usage: { inputTokens: 10, outputTokens: 5 },
+  } satisfies ChatResponse),
+});
+```
+
+**Pattern 2: Mock factory functions for reusable test doubles:**
+```typescript
+function mockMemoryStore(results: SearchResult[]): MemoryStore {
+  return {
+    search: vi.fn(() => results),
+    read: vi.fn(() => ''),
+    write: vi.fn(),
+    listNamespaces: vi.fn(() => []),
+  } as unknown as MemoryStore;
+}
+
+function mockVectorStore(results: VectorSearchResult[]): VectorStore {
+  return {
+    search: vi.fn(() => results),
+    upsertChunks: vi.fn(),
+    deleteNamespace: vi.fn(),
+  } as unknown as VectorStore;
+}
+```
+
+**Pattern 3: `vi.mock()` for module mocking:**
+```typescript
+vi.mock('./docker.js', () => ({
+  DockerSandbox: vi.fn().mockImplementation(() => ({
+    create: vi.fn().mockResolvedValue(undefined),
+    destroy: vi.fn().mockResolvedValue(undefined),
+    exec: vi.fn().mockResolvedValue({ stdout: '', stderr: '' }),
+  })),
+}));
+```
+
+**Pattern 4: `vi.mock()` with hoisted shared mocks:**
+```typescript
+const mockGenerateContent = vi.fn();
+const mockGetGenerativeModel = vi.fn().mockReturnValue({
+  generateContent: mockGenerateContent,
+});
+
+vi.mock('@google/generative-ai', () => ({
+  GoogleGenerativeAI: vi.fn().mockImplementation(() => ({
+    getGenerativeModel: mockGetGenerativeModel,
+  })),
+}));
+```
+
+**Pattern 5: Sequential mock returns (multi-step interactions):**
+```typescript
+let callCount = 0;
+const mockClient: ModelClient = {
+  chat: vi.fn().mockImplementation(() => {
+    callCount++;
+    if (callCount === 1) {
+      return {
+        content: '',
+        stopReason: 'tool_use',
+        toolCalls: [{ id: 'call_1', name: 'test.echo', args: { text: 'hello' } }],
+        usage: { inputTokens: 10, outputTokens: 5 },
+      };
+    }
+    return {
+      content: 'The tool returned: hello',
+      stopReason: 'end_turn',
+      usage: { inputTokens: 15, outputTokens: 10 },
+    };
+  }),
+};
+```
+
+**Pattern 6: Spying on console methods:**
+```typescript
+const warnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
+// ... test code ...
+expect(warnSpy).toHaveBeenCalledWith(expect.stringContaining('Output channel'));
+warnSpy.mockRestore();
+```
+
+**Pattern 7: `as any` / `as unknown as T` for partial mocks:**
+```typescript
+scheduler = new CronScheduler(jobs, mockChannelRegistry as any);
+
+const mockClient = { chat: vi.fn() } as unknown as ModelClient;
+```
+
+**What to Mock:**
+- External API clients (Anthropic, OpenAI, Gemini SDKs)
+- Docker/container interactions
+- Model clients in agent/orchestrator tests
+- Channel adapters and registries in integration tests
+- `console.warn`/`console.error` when testing warning paths
+
+**What NOT to Mock:**
+- Zod schema validation — test against real schemas
+- In-memory data structures (Maps, Sets)
+- Pure functions (formatters, parsers)
+- File system when the test IS about file operations (use temp dirs instead)
+
+## Fixtures and Factories
+
+**Test Data — Helper functions over fixtures:**
+```typescript
+// Factory function for config objects
+function defaultConfig(overrides: Partial<ToolsConfig> = {}): ToolsConfig {
+  return {
+    profile: 'full',
+    allow: [],
+    deny: [],
+    agents: {},
+    providers: {},
+    ...overrides,
+  };
+}
+
+// Factory function for domain objects
+function makeTool(name: string): Tool {
+  return {
+    name,
+    description: `Mock ${name}`,
+    inputSchema: { type: 'object', properties: {} },
+    execute: async () => ({ success: true, output: '' }),
+  };
+}
+
+// Factory for test messages
+function makeMessages(count: number): Message[] {
+  const msgs: Message[] = [];
+  for (let i = 0; i < count; i++) {
+    msgs.push({
+      role: i % 2 === 0 ? 'user' : 'assistant',
+      content: `Message ${i}`,
+    });
+  }
+  return msgs;
+}
+```
+
+**Minimal configs for schema tests:**
+```typescript
+const minimalConfig = {
+  telegram: { bot_token: 'test', allowed_chat_ids: [1] },
+  models: { default: { provider: 'anthropic', model: 'claude-3' } },
+};
+```
+
+**Location:**
+- No separate fixtures directory — helper functions defined at top of each test file
+- No shared test utilities file — each test is self-contained
+
+## Coverage
+
+**Requirements:** No coverage thresholds enforced. No coverage config detected.
+
+**View Coverage:**
+```bash
+pnpm test:run -- --coverage    # If @vitest/coverage-v8 is installed
+```
+
+## Test Types
+
+**Unit Tests (majority):**
+- Test individual classes and functions in isolation
+- Mock external dependencies
+- Files: `src/models/router.test.ts`, `src/tools/policy.test.ts`, `src/hooks/engine.test.ts`
+
+**Integration Tests (some):**
+- Test real interactions between components
+- File system tools use real temp directories: `src/tools/builtin/file.test.ts`
+- Session store uses real SQLite: `src/session/store.test.ts`
+- Gateway tests spin up real WebSocket server: `src/gateway/server.test.ts`
+
+**E2E Tests:**
+- Not present — no end-to-end test framework
+
+## Common Patterns
+
+**Async Testing:**
+```typescript
+it('processes messages', async () => {
+  const response = await agent.process('Hi');
+  expect(response).toBe('Hello!');
+});
+```
+
+**Error/Rejection Testing:**
+```typescript
+it('throws when all providers fail', async () => {
+  await expect(router.chat({ messages: [{ role: 'user', content: 'Hi' }] }))
+    .rejects.toThrow('All model providers failed');
+});
+```
+
+**Zod Schema Rejection Testing:**
+```typescript
+it('rejects cron job with empty name', () => {
+  expect(() => configSchema.parse({
+    ...baseConfig,
+    automation: {
+      cron: [{ name: '', schedule: '0 9 * * *' /* , ...other required job fields */ }],
+    },
+  })).toThrow();
+});
+```
+
+**Testing Callback Invocation:**
+```typescript
+it('calls onToolUse callback on start and end', async () => {
+  const onToolUse = vi.fn();
+  // ... setup ...
+  await agent.process('echo hi');
+
+  expect(onToolUse).toHaveBeenCalledTimes(2);
+  expect(onToolUse).toHaveBeenNthCalledWith(1, expect.objectContaining({
+    type: 'start',
+    tool: 'test.echo',
+  }));
+});
+```
+
+**Testing with `satisfies` for type-safe mocks:**
+```typescript
+chat: vi.fn().mockResolvedValue({
+  content: 'Hello!',
+  stopReason: 'end_turn',
+  usage: { inputTokens: 10, outputTokens: 5 },
+} satisfies ChatResponse),
+```
+
+**Testing Collection Contents:**
+```typescript
+const names = result.map(t => t.name);
+expect(names).toContain('file.read');
+expect(names).not.toContain('shell.exec');
+```
+
+**Async Stream Testing:**
+```typescript
+it('streams from primary client', async () => {
+  const mockStream = async function* (): AsyncIterable<ChatStreamEvent> {
+    yield { type: 'content', content: 'Hello' };
+    yield { type: 'done', usage: { inputTokens: 5, outputTokens: 3 } };
+  };
+
+  const mockClient = {
+    chat: vi.fn(),
+    chatStream: vi.fn().mockReturnValue(mockStream()),
+  };
+  // `router` is a ModelRouter whose primary client is mockClient (construction omitted)
+
+  const chunks: string[] = [];
+  for await (const event of router.chatStream({ messages: [] })) {
+    if (event.type === 'content' && event.content) {
+      chunks.push(event.content);
+    }
+  }
+  expect(chunks).toEqual(['Hello']);
+});
+```
+
+**Integration Test with Real Server (beforeAll/afterAll):**
+```typescript
+let server: GatewayServer;
+
+beforeAll(async () => {
+  server = new GatewayServer(config);
+  await server.start();
+});
+
+afterAll(async () => {
+  await server.stop();
+});
+
+function createClient(): Promise<WebSocket> {
+  return new Promise((resolve, reject) => {
+    const ws = new WebSocket(`ws://127.0.0.1:${TEST_PORT}`);
+    ws.on('open', () => resolve(ws));
+    ws.on('error', reject);
+  });
+}
+```
+
+## Conventions Summary
+
+When writing tests for Flynn:
+
+1. **Import from vitest:** `import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';`
+2. **Import source with `.js` extension:** `import { ClassName } from './source-file.js';`
+3. **Co-locate test file** next to source file as `source-file.test.ts`
+4. **Use `describe`/`it` blocks** with descriptive behavior-focused names
+5. **Create mock factories** as functions at the top of the test file
+6. **Use `vi.fn()` for mocks**, `vi.mock()` for module mocking, `as unknown as T` for partial mocks
+7. **Clean up resources** in `afterEach`: temp dirs, database files, mock spies
+8. **Test both success and failure paths** — every feature should have at least one error test
+9. **Use helper factories** to build test data, not shared fixture files
+10. **Keep tests self-contained** — each test file should be independently understandable
+
+---
+
+*Testing analysis: 2026-02-09*