From 0180d4fb8f152c3590066c9cf9f79b52fabc1e81 Mon Sep 17 00:00:00 2001
From: William Valentin <william.valentin.info@gmail.com>
Date: Fri, 6 Feb 2026 13:17:51 -0800
Subject: [PATCH] docs: add Phase 0/1 implementation plan and feature gap
 analysis

---
 ...026-02-06-openclaw-feature-gap-analysis.md | 306 +++++++
 .../2026-02-06-p0-p1-implementation-plan.md   | 845 ++++++++++++++++++
 2 files changed, 1151 insertions(+)
 create mode 100644 docs/plans/2026-02-06-openclaw-feature-gap-analysis.md
 create mode 100644 docs/plans/2026-02-06-p0-p1-implementation-plan.md

diff --git a/docs/plans/2026-02-06-openclaw-feature-gap-analysis.md b/docs/plans/2026-02-06-openclaw-feature-gap-analysis.md
new file mode 100644
index 0000000..a8c9de2
--- /dev/null
+++ b/docs/plans/2026-02-06-openclaw-feature-gap-analysis.md
@@ -0,0 +1,306 @@
+# Flynn vs OpenClaw — Feature Gap Analysis
+
+**Date:** 2026-02-06
+**Purpose:** Comprehensive comparison of Flynn's current implementation against OpenClaw's feature set, to guide prioritisation of future work.
+
+## Legend
+
+- **MATCH** — Flynn has equivalent functionality
+- **PARTIAL** — Flynn has some implementation but incomplete
+- **MISSING** — Not implemented in Flynn
+
+---
+
+## 1. Channels / Frontends
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| Telegram | grammY bot | grammY bot | **MATCH** |
+| WhatsApp | Baileys (WhatsApp Web) | -- | **MISSING** |
+| Discord | discord.js | -- | **MISSING** |
+| Slack | Bolt SDK | -- | **MISSING** |
+| Signal | signal-cli | -- | **MISSING** |
+| iMessage / BlueBubbles | imsg + BlueBubbles | -- | **MISSING** |
+| Google Chat | Chat API | -- | **MISSING** |
+| Microsoft Teams | Bot Framework | -- | **MISSING** |
+| Matrix | Extension | -- | **MISSING** |
+| Zalo / Zalo Personal | Extension | -- | **MISSING** |
+| WebChat | Gateway-served | Gateway (stub) | **PARTIAL** |
+| TUI (terminal) | `openclaw tui` | Minimal + Fullscreen (React/Ink) | **MATCH** |
+| LINE / Feishu / Mattermost | Extensions/plugins | -- | **MISSING** |
+
+Flynn has **2 of ~15 channels**. The messaging channel ecosystem is the single biggest gap.
+
+---
+
+## 2. Model Providers
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| Anthropic (Claude) | Full + OAuth | Full | **MATCH** |
+| OpenAI | Full + OAuth + Codex | Full | **MATCH** |
+| Ollama (local) | Supported | Full | **MATCH** |
+| Llama.cpp (local) | Supported | Basic | **PARTIAL** |
+| Gemini / Google | Full provider | Stub only | **PARTIAL** |
+| OpenRouter | Supported | -- | **MISSING** |
+| Amazon Bedrock | Supported | -- | **MISSING** |
+| GLM / MiniMax / Moonshot | Supported | -- | **MISSING** |
+| Vercel AI Gateway | Supported | -- | **MISSING** |
+| Z.AI | Supported | -- | **MISSING** |
+| Synthetic provider | Supported | -- | **MISSING** |
+| OAuth subscription auth | Anthropic + OpenAI | API keys only | **MISSING** |
+| Model failover chains | Full (fallback + rotation) | Fallback chains | **MATCH** |
+| Model tier routing | Per-agent, per-provider | default/fast/complex/local | **MATCH** |
+| Provider-specific tool policy | Per-provider tool filtering | -- | **MISSING** |
+
+---
+
+## 3. Agent Runtime & Tools
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| Tool loop with streaming | RPC mode + block streaming | Tool loop (max 10 iter) | **MATCH** |
+| `exec` / shell | Full (background, pty, timeout, elevated) | Basic (bash -c, timeout) | **PARTIAL** |
+| `read` / file read | Full (line ranges) | Full (line offset/limit) | **MATCH** |
+| `write` / file write | Full | Full (auto-mkdir) | **MATCH** |
+| `edit` / file edit | Full | Full (exact match, replace_all) | **MATCH** |
+| `apply_patch` | Multi-hunk structured patches | -- | **MISSING** |
+| `file.list` / glob | -- | Full (glob filtering) | **MATCH** |
+| `web_fetch` | Full (markdown/text extract, caching) | Basic HTTP GET | **PARTIAL** |
+| `web_search` | Brave Search API | -- | **MISSING** |
+| Browser control | Full CDP (Chromium profiles, snapshots, actions) | -- | **MISSING** |
+| Canvas / A2UI | Agent-driven visual workspace | -- | **MISSING** |
+| `process` tool | Background exec management (poll/log/write/kill) | -- | **MISSING** |
+| `image` tool | Image analysis with configurable model | -- | **MISSING** |
+| `message` tool | Cross-channel messaging + actions | -- | **MISSING** |
+| `cron` tool | Runtime cron management | -- | **MISSING** |
+| `gateway` tool | Restart/config management | -- | **MISSING** |
+| `sessions_*` tools | List/history/send/spawn across sessions | -- | **MISSING** |
+| `agents_list` tool | Sub-agent discovery | -- | **MISSING** |
+| Tool profiles | minimal/coding/messaging/full | -- | **MISSING** |
+| Tool groups | `group:fs`, `group:runtime`, etc. | -- | **MISSING** |
+| Tool allow/deny lists | Global + per-agent + per-provider | -- | **MISSING** |
+
+---
+
+## 4. Session Management
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| Session persistence | JSONL files | SQLite | **MATCH** (different storage) |
+| Session isolation | Per-sender + group isolation | `{frontend}:{userId}` | **MATCH** |
+| Session transfer | Between channels | Between frontends | **MATCH** |
+| Multi-agent routing | Isolated workspaces per agent | Single backend | **MISSING** |
+| Session pruning | Tool result trimming (in-memory) | -- | **MISSING** |
+| `/new` / `/reset` | Full | Full | **MATCH** |
+| `/status` | Full (model + tokens + cost) | Full (model + confirmations) | **MATCH** |
+
+---
+
+## 5. Context Window & Compaction
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| Auto-compaction | Full (summarise older history) | -- | **MISSING** |
+| Manual `/compact` | Full (with instructions) | -- | **MISSING** |
+| Pre-compaction memory flush | Silent agentic turn | -- | **MISSING** |
+| Token tracking | Full (per-response, cost) | Input/output counters | **PARTIAL** |
+
+**Critical gap** — without compaction, long conversations will hit token limits and fail.
+
+---
+
+## 6. Memory System
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| Markdown memory files | `MEMORY.md` + daily logs | -- | **MISSING** |
+| `memory_search` tool | Semantic vector search | -- | **MISSING** |
+| `memory_get` tool | Read memory files | -- | **MISSING** |
+| Vector embeddings | OpenAI/Gemini/local | -- | **MISSING** |
+| Hybrid search (BM25 + vector) | Full | -- | **MISSING** |
+| Session memory indexing | Experimental | -- | **MISSING** |
+| QMD backend | Experimental | -- | **MISSING** |
+
+OpenClaw has a sophisticated memory system. Flynn has none.
+
+---
+
+## 7. MCP (Model Context Protocol)
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| MCP tool servers | Not emphasised | Full (stdio transport) | **MATCH** |
+| MCP tool bridging | Not emphasised | Full (`mcp:{server}:{tool}`) | **MATCH** |
+| MCP server lifecycle | Not emphasised | Full (start/stop/restart) | **MATCH** |
+
+Flynn actually has MCP support that OpenClaw doesn't emphasise — OpenClaw relies on its own native tool system and plugins instead.
+
+---
+
+## 8. Security & Safety
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| Tool confirmation hooks | Full | Full (confirm/log/silent patterns) | **MATCH** |
+| Chat ID allowlists | Per-channel | Telegram only | **PARTIAL** |
+| DM pairing (unknown senders) | Full (pairing codes) | -- | **MISSING** |
+| Docker sandboxing | Full (per-session/agent/shared) | -- | **MISSING** |
+| Elevated mode | Host exec escape hatch | -- | **MISSING** |
+| Tool execution timeouts | Full (configurable) | 30s default | **MATCH** |
+| Output truncation | Full | 51KB | **MATCH** |
+| Gateway auth (token/password) | Full | -- | **MISSING** |
+
+---
+
+## 9. Automation & Scheduling
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| Cron jobs | Full (runtime + config) | Full (YAML config) | **MATCH** |
+| Webhooks | Full (inbound triggers) | -- | **MISSING** |
+| Gmail Pub/Sub | Full | -- | **MISSING** |
+| Heartbeat | Full | -- | **MISSING** |
+
+---
+
+## 10. Apps & Companion Devices
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| macOS menu bar app | Full | -- | **MISSING** |
+| iOS node | Full (Canvas, Voice, Camera) | -- | **MISSING** |
+| Android node | Full (Canvas, Talk, Camera) | -- | **MISSING** |
+| Voice Wake / Talk Mode | Full (ElevenLabs) | -- | **MISSING** |
+| Camera / screen capture | Via nodes | -- | **MISSING** |
+| Location access | Via nodes | -- | **MISSING** |
+
+---
+
+## 11. Skills & Plugins
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| Skills system | Bundled/managed/workspace | Bundled/managed/workspace | **MATCH** |
+| Skill manifest | Full | Full (requirements, versioning) | **MATCH** |
+| ClawHub registry | Community skill registry | -- | **MISSING** |
+| Plugin system | Full (register tools + CLI commands) | -- | **MISSING** |
+| Workspace prompt injection | AGENTS.md, SOUL.md, TOOLS.md | -- | **MISSING** |
+
+---
+
+## 12. Gateway & Infrastructure
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| WebSocket control plane | Full | WebSocket gateway (basic) | **PARTIAL** |
+| Control UI (web dashboard) | Full | -- | **MISSING** |
+| Tailscale Serve/Funnel | Full integration | -- | **MISSING** |
+| Remote gateway access | SSH tunnels + tailnet | -- | **MISSING** |
+| Health checks / doctor | 10+ checks | 10 checks | **MATCH** |
+| `onboard` wizard | Full guided setup | -- | **MISSING** |
+| Docker deployment | Full | -- | **MISSING** |
+| Nix deployment | Full | -- | **MISSING** |
+| Fly.io / Railway / Render | Supported | -- | **MISSING** |
+| Bonjour/mDNS discovery | Full | -- | **MISSING** |
+| Gateway lock | Full | -- | **MISSING** |
+
+---
+
+## 13. Chat Commands
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| `/status` | Full | Full | **MATCH** |
+| `/new` / `/reset` | Full | Full | **MATCH** |
+| `/compact` | Full | -- | **MISSING** |
+| `/think <level>` | Full (off to xhigh) | -- | **MISSING** |
+| `/verbose` | Full | -- | **MISSING** |
+| `/usage` | Full (off/tokens/full) | -- | **MISSING** |
+| `/local` / `/cloud` | -- | Full | Flynn-unique |
+| `/model` | -- | Full | Flynn-unique |
+
+---
+
+## 14. Miscellaneous
+
+| Feature | OpenClaw | Flynn | Status |
+|---------|----------|-------|--------|
+| Streaming & chunking | Full (per-channel limits) | Full (streaming responses) | **MATCH** |
+| Typing indicators | Full | Telegram only | **PARTIAL** |
+| Presence tracking | Full | -- | **MISSING** |
+| Usage tracking / cost | Full | Basic token counters | **PARTIAL** |
+| Markdown rendering | Per-channel formatting | Basic (TUI + Telegram) | **PARTIAL** |
+| Media pipeline | Images/audio/video/transcription | -- | **MISSING** |
+| Group chat support | Full (mention gating, routing) | -- | **MISSING** |
+| Retry policy | Full (configurable) | -- | **MISSING** |
+| System prompt templating | AGENTS.md, SOUL.md, IDENTITY.md, USER.md | -- | **MISSING** |
+
+---
+
+## Summary Scorecard
+
+| Category | Compared | Match | Partial | Missing |
+|----------|:--------:|:-----:|:-------:|:-------:|
+| Channels | 15 | 2 | 1 | 12 |
+| Model Providers | 15 | 5 | 2 | 8 |
+| Agent & Tools | 21 | 5 | 2 | 14 |
+| Sessions | 7 | 5 | 0 | 2 |
+| Context/Compaction | 4 | 0 | 1 | 3 |
+| Memory | 7 | 0 | 0 | 7 |
+| MCP | 3 | 3 | 0 | 0 |
+| Security | 8 | 3 | 1 | 4 |
+| Automation | 4 | 1 | 0 | 3 |
+| Companion Apps | 6 | 0 | 0 | 6 |
+| Skills/Plugins | 5 | 2 | 0 | 3 |
+| Gateway/Infra | 11 | 1 | 1 | 9 |
+| Chat Commands | 8 | 2 | 0 | 4 (+2 Flynn-unique) |
+| Misc | 9 | 1 | 3 | 5 |
+| **TOTAL** | **123** | **30 (24%)** | **11 (9%)** | **80 (65%)** |
+
+---
+
+## Top Priority Gaps (recommended order)
+
+### P0 — Functionally Critical
+
+1. **Context compaction** — Without this, long conversations hit token limits and break. Blocks real-world use for extended sessions.
+
+2. **Memory system** — OpenClaw's markdown-based memory with vector search gives the assistant persistent knowledge across sessions. Flynn has nothing persistent beyond session history.
+
+### P1 — High Impact
+
+3. **Messaging channels (WhatsApp, Discord, Slack)** — Flynn has 2 of 15 channels. Adding the top 3 popular channels covers the majority of use cases.
+
+4. **Web search tool** — `web_search` (Brave API) is a commonly-used agent capability Flynn lacks entirely.
+
+5. **Background exec / process management** — OpenClaw's `process` tool lets agents manage long-running commands. Flynn's shell tool is fire-and-forget.
+
+6. **Enhanced `web_fetch`** — Flynn's is basic HTTP GET; OpenClaw extracts markdown/text, caches responses, and handles JS-heavy sites via browser fallback.
+
+### P2 — Important for Production
+
+7. **Docker sandboxing** — Tool isolation for non-main sessions. Important for any multi-user or group-facing deployment.
+
+8. **Multi-agent routing** — Isolated agents per workspace/sender with sub-agent spawning.
+
+9. **Tool allow/deny and profiles** — Fine-grained control over which tools each agent/session can use.
+
+10. **System prompt templating** — AGENTS.md, SOUL.md, IDENTITY.md, USER.md workspace injection for personality and behaviour customisation.
+
+### P3 — Nice to Have
+
+11. **Browser control (CDP)** — Powerful but complex; depends on use case.
+12. **Gemini provider (full)** — Currently a stub.
+13. **Additional model providers** — OpenRouter, Bedrock, etc.
+14. **Gateway auth** — Token/password auth for the WebSocket control plane.
+15. **Companion apps** — macOS/iOS/Android nodes (huge scope, niche audience).
+
+---
+
+## What Flynn Has That OpenClaw Doesn't Emphasise
+
+- **Full MCP protocol support** with stdio transport, tool bridging, and server lifecycle management
+- **Model tier switching** via chat commands (`/local`, `/cloud`, `/model`)
+- **Gemini provider** (stub, but in the schema — OpenClaw removed non-Pi agent paths)
+- **SQLite session storage** (vs OpenClaw's JSONL files)
diff --git a/docs/plans/2026-02-06-p0-p1-implementation-plan.md b/docs/plans/2026-02-06-p0-p1-implementation-plan.md
new file mode 100644
index 0000000..ed8aec9
--- /dev/null
+++ b/docs/plans/2026-02-06-p0-p1-implementation-plan.md
@@ -0,0 +1,845 @@
+# Flynn P0 + P1 Implementation Plan
+
+**Date:** 2026-02-06
+**Scope:** 7 features: the six P0 and P1 items from the gap analysis, plus a foundational sub-agent delegation layer (feature 0) that the P0 items build on.
+**Prerequisite:** [Feature Gap Analysis](./2026-02-06-openclaw-feature-gap-analysis.md)
+
+---
+
+## Feature Summary
+
+| # | Feature | Priority | Est. Effort | Dependencies |
+|---|---------|----------|-------------|--------------|
+| 0 | Multi-model sub-agent delegation | P0 | 3–4 days | None (foundational) |
+| 1 | Context compaction | P0 | 2–3 days | #0 (uses cheap model for summaries) |
+| 2 | Memory system | P0 | 3–4 days | #0, #1 |
+| 3 | Messaging channels (WhatsApp, Discord, Slack) | P1 | 2–3 days each | None |
+| 4 | Web search tool | P1 | 0.5 day | None |
+| 5 | Background exec / process management | P1 | 1–2 days | None |
+| 6 | Enhanced web_fetch | P1 | 1 day | None |
+
+**Total estimated effort:** 15–22 days
+
+---
+
+## Phase 0: Multi-Model Sub-Agent Delegation (P0 — Foundational)
+
+### Problem
+
+Flynn currently runs a **single NativeAgent per session** that talks to one model tier at a time. The `ModelRouter` (`src/models/router.ts`) supports tiers (`fast`/`default`/`complex`/`local`) and a fallback chain, but:
+
+- There is no concept of **sub-agents** — the primary agent can't spawn a cheaper model for a subtask.
+- Model selection is **per-session** (via `/model` command), not **per-task**.
+- Compaction summaries, memory extraction, and classification tasks all use the same expensive model as the main conversation — wasteful.
+- There is no orchestrator pattern where an expensive model (Opus) plans and delegates to cheaper models (Sonnet, Haiku) for execution.
+
+### Model Tier Mapping
+
+| Tier | Model | Use For |
+|------|-------|---------|
+| **complex** (orchestrator) | Claude Opus 4.6 | Planning, orchestration, complex reasoning, multi-step decisions |
+| **default** (worker) | Claude Sonnet 4.5 | General conversation, tool use, code generation, channel adapters |
+| **fast** (utility) | Claude Haiku 4.5 | Compaction summaries, memory extraction, classification, keyword extraction, formatting |
+
+This maps directly to Flynn's existing `ModelTier` type. The infrastructure is already there — what's missing is the **delegation mechanism**.
+
+### Design
+
+#### Sub-agent spawning
+
+Add the ability for `NativeAgent` to spawn **ephemeral sub-agents** that run a single task on a specific model tier and return the result:
+
+```typescript
+interface SubAgentRequest {
+  /** Which model tier to use for this subtask. */
+  tier: ModelTier;
+  /** System prompt for the sub-agent (task-specific). */
+  systemPrompt: string;
+  /** The task message. */
+  message: string;
+  /** Max tokens for the response. */
+  maxTokens?: number;
+  /** Whether to include tools. Default: false (most subtasks are pure text). */
+  tools?: boolean;
+}
+
+interface SubAgentResult {
+  content: string;
+  usage: TokenUsage;
+  tier: ModelTier;
+}
+```
+
+The sub-agent is **stateless** — no session, no history, just a single request/response. It's a thin wrapper around `modelRouter.chat()` with a specific tier.
+
+#### Where delegation happens
+
+| Task | Delegated to | Reason |
+|------|-------------|--------|
+| Compaction summary | **fast** (Haiku) | Summarisation is a well-defined extraction task; doesn't need complex reasoning |
+| Memory fact extraction | **fast** (Haiku) | Simple extraction from conversation text |
+| Message classification | **fast** (Haiku) | "Is this a command, question, or statement?" — trivial |
+| Tool result summarisation | **fast** (Haiku) | Condense verbose tool output before feeding back |
+| Primary conversation | **default** (Sonnet) | General-purpose agent work |
+| Complex planning/reasoning | **complex** (Opus) | Multi-step planning, architecture decisions, ambiguous requests |
+| Sub-agent orchestration | **complex** (Opus) | When the agent decides to break a task into subtasks |
+
+#### Automatic tier escalation
+
+Add optional **auto-escalation** where the primary agent (Sonnet) can recognise it's struggling and escalate to Opus:
+
+1. If the agent hits `maxIterations` without completing the task → escalate to `complex`.
+2. If the agent's response contains explicit uncertainty markers ("I'm not sure", "This is beyond...") → offer escalation.
+3. Configurable: `auto_escalate: true` in config.
+
+This is a **future enhancement** — start with explicit delegation points (compaction, memory extraction) and add auto-escalation later.
+
+#### AgentOrchestrator class
+
+Create a new `AgentOrchestrator` that sits between the channel message handler and the `NativeAgent`:
+
+```typescript
+class AgentOrchestrator {
+  private primaryAgent: NativeAgent;   // default tier (Sonnet)
+  private modelRouter: ModelRouter;
+
+  /** Spawn a sub-agent for a single-turn task on a specific tier. */
+  async delegate(request: SubAgentRequest): Promise<SubAgentResult>;
+
+  /** Process a user message — delegates to primary agent, which may internally delegate subtasks. */
+  async process(userMessage: string): Promise<string>;
+}
+```
+
+The orchestrator replaces the current direct `NativeAgent` usage in the message router (`src/daemon/index.ts:139-186`).
+
+#### Passing the orchestrator to tools and compaction
+
+The key insight: **compaction and memory extraction don't need a new agent class** — they just need access to `modelRouter.chat(request, 'fast')`. The orchestrator provides a `delegate()` method that any subsystem can call:
+
+```typescript
+// In compaction.ts
+const summary = await orchestrator.delegate({
+  tier: 'fast',
+  systemPrompt: COMPACTION_SYSTEM_PROMPT,
+  message: `Summarise this conversation:\n\n${messagesToCompact}`,
+  maxTokens: 1024,
+});
+
+// In memory extraction
+const facts = await orchestrator.delegate({
+  tier: 'fast',
+  systemPrompt: MEMORY_EXTRACTION_PROMPT,
+  message: `Extract key facts from:\n\n${summary}`,
+  maxTokens: 512,
+});
+```
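+
+The `delegate()` call used above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the planned implementation: `ChatRouter` is a hypothetical stand-in for the real `ModelRouter` surface, the `TokenUsage` field names are guesses, and `Delegator` is an illustrative name for the delegation part of `AgentOrchestrator`.
+
+```typescript
+type ModelTier = 'fast' | 'default' | 'complex' | 'local';
+
+// Assumed shape; Flynn's real TokenUsage type may use different field names.
+interface TokenUsage { inputTokens: number; outputTokens: number }
+
+interface SubAgentRequest {
+  tier: ModelTier;
+  systemPrompt: string;
+  message: string;
+  maxTokens?: number;
+}
+
+interface SubAgentResult { content: string; usage: TokenUsage; tier: ModelTier }
+
+// Hypothetical minimal surface of ModelRouter; the real chat() signature may differ.
+interface ChatRouter {
+  chat(
+    request: { system: string; messages: { role: 'user'; content: string }[]; maxTokens?: number },
+    tier: ModelTier,
+  ): Promise<{ content: string; usage: TokenUsage }>;
+}
+
+class Delegator {
+  private readonly router: ChatRouter;
+  private readonly maxDepth: number;
+  // Counts delegate() calls currently in flight. Concurrent (non-nested)
+  // delegations also increment it, so a real implementation may want
+  // per-call depth tracking instead.
+  private depth = 0;
+
+  constructor(router: ChatRouter, maxDepth = 3) {
+    this.router = router;
+    this.maxDepth = maxDepth;
+  }
+
+  /** Single-turn, stateless call on the requested tier. */
+  async delegate(request: SubAgentRequest): Promise<SubAgentResult> {
+    if (this.depth >= this.maxDepth) {
+      throw new Error(`delegation depth limit (${this.maxDepth}) reached`);
+    }
+    this.depth += 1;
+    try {
+      const response = await this.router.chat(
+        {
+          system: request.systemPrompt,
+          messages: [{ role: 'user', content: request.message }],
+          maxTokens: request.maxTokens,
+        },
+        request.tier,
+      );
+      return { content: response.content, usage: response.usage, tier: request.tier };
+    } finally {
+      // Always release the slot, even if the model call throws.
+      this.depth -= 1;
+    }
+  }
+}
+```
+
+Keeping the depth check inside `delegate()` means every subsystem that delegates is covered by `max_delegation_depth` without extra wiring.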
+
+### New files
+
+| File | Purpose |
+|------|---------|
+| `src/backends/native/orchestrator.ts` | `AgentOrchestrator` — sub-agent spawning and delegation |
+| `src/backends/native/prompts.ts` | System prompts for delegated tasks (compaction, extraction, classification) |
+
+### Changes to existing files
+
+| File | Change |
+|------|--------|
+| `src/backends/native/agent.ts` | Accept optional `orchestrator` reference for internal delegation. Add `delegateSubtask()` method. |
+| `src/daemon/index.ts` | Replace direct `NativeAgent` creation in `createMessageRouter()` with `AgentOrchestrator`. |
+| `src/config/schema.ts` | Add `agents` config block for tier assignment and delegation policy. |
+| `src/models/router.ts` | No changes needed — already supports `chat(request, tier)`. |
+
+### Config additions
+
+```yaml
+agents:
+  primary_tier: default              # Model tier for main conversation (Sonnet)
+  delegation:
+    compaction: fast                  # Tier for compaction summaries (Haiku)
+    memory_extraction: fast           # Tier for memory fact extraction (Haiku)
+    classification: fast              # Tier for message classification (Haiku)
+    tool_summarisation: fast          # Tier for condensing tool output (Haiku)
+    complex_reasoning: complex        # Tier for escalated reasoning (Opus)
+  auto_escalate: false               # Future: auto-escalate on failure
+  max_delegation_depth: 3            # Prevent infinite delegation chains
+```
+
+### Implementation steps
+
+1. Create `src/backends/native/orchestrator.ts`:
+   - Constructor takes `ModelRouter`, `systemPrompt`, `session`, `toolRegistry`, `toolExecutor`, delegation config.
+   - `delegate(request: SubAgentRequest): Promise<SubAgentResult>` — single-turn call to `modelRouter.chat()` with specified tier.
+   - `process(userMessage: string): Promise<string>` — delegates to internal `NativeAgent`.
+   - Tracks delegation depth to prevent loops.
+   - Logs tier usage for cost visibility.
+2. Create `src/backends/native/prompts.ts` with task-specific system prompts.
+3. Update `createMessageRouter()` in `src/daemon/index.ts` to use `AgentOrchestrator` instead of raw `NativeAgent`.
+4. Add `agents` config block to schema.
+5. Wire delegation config through to compaction (Phase 1) and memory (Phase 2).
+6. Tests: delegation routing, tier selection, depth limiting.
+
+### Cost implications
+
+| Operation | Without delegation | With delegation |
+|-----------|-------------------|-----------------|
+| Compaction summary | Opus/Sonnet ($$$) | Haiku ($) |
+| Memory extraction | Opus/Sonnet ($$$) | Haiku ($) |
+| 10 classifications | Opus/Sonnet ($$$) | Haiku ($) |
+| Complex reasoning | Sonnet ($$) | Opus ($$$) — but only when needed |
+
+Net effect: **significant cost reduction** for background tasks, with targeted spend on complex reasoning only when it matters.
+
+---
+
+## Phase 1: Context Compaction (P0)
+
+### Problem
+
+Flynn sends the **entire session history** to the model on every turn. There is no summarisation, trimming, or token budgeting. Once a conversation exceeds the model's context window, it fails hard.
+
+**Current flow** (`src/backends/native/agent.ts:92-165`):
+```
+toolLoop() → loopMessages = full this.history → send to model
+```
+
+The `SessionStore` (`src/session/store.ts`) and `ManagedSession` (`src/session/manager.ts`) store every message verbatim and replay them all on load.
+
+### Design
+
+#### Token counting
+
+Add a `tokenCount` utility that estimates token counts per message. Two strategies:
+
+1. **Cheap estimate:** character-based heuristic (`chars / 4` for English). Good enough for budgeting.
+2. **Accurate count:** use Anthropic's token-counting endpoint (`messages.countTokens` in the TypeScript SDK) or `tiktoken` for OpenAI models. Only needed if we want precise billing.
+
+Start with the cheap estimate; add accurate counting later behind a flag.
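+
+A minimal sketch of the cheap estimate, assuming a simplified `Message` shape and a small fixed per-message overhead (both are illustrative assumptions, not Flynn's actual types or tuned values):
+
+```typescript
+// Hypothetical message shape; Flynn's real Message type may differ.
+interface Message {
+  role: 'system' | 'user' | 'assistant' | 'tool';
+  content: string;
+}
+
+/** Rough estimate: about four characters per token for English text. */
+function estimateTokens(text: string): number {
+  return Math.ceil(text.length / 4);
+}
+
+/** Sum the per-message estimates, adding an assumed overhead for role/framing. */
+function estimateMessageTokens(messages: Message[], perMessageOverhead = 4): number {
+  return messages.reduce(
+    (total, m) => total + estimateTokens(m.content) + perMessageOverhead,
+    0,
+  );
+}
+```
+
+Rounding up keeps the estimate slightly pessimistic, which is the safer direction when the result feeds a compaction trigger.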
+
+#### Compaction strategy
+
+Use a **summarise-and-replace** approach (same as OpenClaw):
+
+1. When total estimated tokens exceed a **compaction threshold** (configurable, default: 80% of model's context window), trigger compaction.
+2. Take all messages **except the last N turns** (configurable, default: 4 turns).
+3. **Delegate** the summarisation request to the **fast tier (Haiku)** via `orchestrator.delegate()`: "Summarise this conversation so far, preserving key facts, decisions, and context." This is a well-defined extraction task that doesn't need complex reasoning.
+4. Replace the older messages with a single `[system_summary]` message.
+5. Persist the compacted history to SQLite (replace the old messages).
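+
+Steps 1 to 4 above can be sketched as a single function. This is an illustration only: `summarise` is a hypothetical stand-in for `orchestrator.delegate({ tier: 'fast', ... })`, one turn is assumed to be a user message plus an assistant message, and persistence (step 5) is left out.
+
+```typescript
+interface Message {
+  role: 'system' | 'user' | 'assistant';
+  content: string;
+}
+
+// Hypothetical stand-in for orchestrator.delegate(); resolves to the summary text.
+type Summarise = (transcript: string) => Promise<string>;
+
+interface CompactionOpts {
+  messages: Message[];
+  contextWindow: number; // model context window, in tokens
+  thresholdPct: number;  // e.g. 80
+  keepTurns: number;     // e.g. 4; one turn = user + assistant message (assumption)
+  summarise: Summarise;
+}
+
+const estimateTokens = (text: string): number => Math.ceil(text.length / 4);
+
+async function compactIfNeeded(opts: CompactionOpts): Promise<Message[]> {
+  const { messages, contextWindow, thresholdPct, keepTurns, summarise } = opts;
+  const used = messages.reduce((n, m) => n + estimateTokens(m.content), 0);
+  const keepCount = keepTurns * 2;
+
+  // Below the threshold, or nothing older than the kept turns: leave history alone.
+  if (used <= contextWindow * (thresholdPct / 100) || messages.length <= keepCount) {
+    return messages;
+  }
+
+  const older = messages.slice(0, messages.length - keepCount);
+  const recent = messages.slice(messages.length - keepCount);
+  const transcript = older.map((m) => `${m.role}: ${m.content}`).join('\n');
+  const summary = await summarise(transcript);
+
+  // Replace the older messages with one summary message, keep recent turns verbatim.
+  return [{ role: 'system', content: `[system_summary] ${summary}` }, ...recent];
+}
+```
+
+Returning the original array untouched when no compaction is needed lets the caller skip the SQLite rewrite with a simple identity check.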
+
+#### Where compaction runs
+
+Compaction is a concern of `AgentOrchestrator` (Phase 0), not the session store. The orchestrator decides when to compact based on the model it's using, and delegates the summary generation to the **fast** tier via `orchestrator.delegate({ tier: 'fast', ... })`.
+
+#### New files
+
+| File | Purpose |
+|------|---------|
+| `src/context/tokens.ts` | Token estimation utilities |
+| `src/context/compaction.ts` | Compaction logic (summarise + replace) |
+
+#### Changes to existing files
+
+| File | Change |
+|------|--------|
+| `src/backends/native/agent.ts` | Add `compactIfNeeded()` call before building `loopMessages`. Add compaction config to `NativeAgentConfig`. |
+| `src/session/manager.ts` | Add `ManagedSession.replaceHistory(messages)` method for compaction to persist the compacted state. |
+| `src/session/store.ts` | Add `replaceMessages(sessionId, messages)` — atomic delete + re-insert in a transaction. |
+| `src/models/types.ts` | Add optional `contextWindow` field to `ChatResponse` or create a `ModelCapabilities` type. |
+| `src/config/schema.ts` | Add `compaction` config block: `{ enabled, threshold_pct, keep_turns }`. The summary tier is set in `agents.delegation.compaction`, not here. |
+| `src/daemon/index.ts` | Pass compaction config to agent creation. |
+
+#### Config additions
+
+```yaml
+compaction:
+  enabled: true
+  threshold_pct: 80          # Trigger at 80% of context window
+  keep_turns: 4              # Always keep the last 4 exchanges
+  # summary_tier is configured in agents.delegation.compaction (default: fast/Haiku)
+```
+
+#### Chat commands
+
+| Command | Description |
+|---------|-------------|
+| `/compact` | Force compaction of the current session immediately. |
+
+#### Implementation steps
+
+1. Create `src/context/tokens.ts` with `estimateTokens(text: string): number` and `estimateMessageTokens(messages: Message[]): number`.
+2. Create `src/context/compaction.ts` with `compactHistory(opts: CompactionOpts): Promise<Message[]>`:
+   - Takes messages, orchestrator (for delegation), keep_turns.
+   - Calls `orchestrator.delegate({ tier: 'fast', ... })` for the summary.
+   - Returns `[summaryMessage, ...recentMessages]`.
+3. Add `replaceMessages()` to `SessionStore`.
+4. Add `replaceHistory()` to `ManagedSession`.
+5. Add compaction config to schema.
+6. Wire `compactIfNeeded()` into `AgentOrchestrator.process()` — called before building the request, checks token budget.
+7. Add `/compact` command handling in the message router.
+8. Tests: token estimation accuracy, compaction trigger logic, history replacement, delegation to fast tier.
+
+#### Model context window sizes
+
+Hard-code a lookup table in `src/context/tokens.ts`:
+
+```typescript
+const CONTEXT_WINDOWS: Record<string, number> = {
+  'claude-sonnet-4-20250514': 200_000,
+  'claude-3-5-haiku-20241022': 200_000,
+  'gpt-4o': 128_000,
+  'gpt-4o-mini': 128_000,
+  // ... etc
+};
+```
+
+Allow override in config: `models.default.context_window: 128000`.
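+
+One way the lookup, the config override, and `threshold_pct` could combine is sketched below. The fallback value for unknown models and both function names are assumptions made for illustration.
+
+```typescript
+const CONTEXT_WINDOWS: Record<string, number | undefined> = {
+  'claude-sonnet-4-20250514': 200_000,
+  'claude-3-5-haiku-20241022': 200_000,
+  'gpt-4o': 128_000,
+  'gpt-4o-mini': 128_000,
+};
+
+// Assumed conservative default for models missing from the table.
+const FALLBACK_CONTEXT_WINDOW = 8_192;
+
+/** A config override wins, then the lookup table, then the fallback. */
+function resolveContextWindow(model: string, override?: number): number {
+  return override ?? CONTEXT_WINDOWS[model] ?? FALLBACK_CONTEXT_WINDOW;
+}
+
+/** Token count at which compaction should trigger for this model. */
+function compactionThreshold(model: string, thresholdPct: number, override?: number): number {
+  return Math.floor((resolveContextWindow(model, override) * thresholdPct) / 100);
+}
+```
+
+For example, with `threshold_pct: 80` a 128,000-token window triggers compaction at 102,400 estimated tokens.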
+
+---
+
+## Phase 2: Memory System (P0)
+
+### Problem
+
+Flynn has no persistent knowledge across sessions. Every new session starts blank. The agent can't remember user preferences, past decisions, or accumulated knowledge.
+
+### Design
+
+A lightweight memory system with three layers:
+
+1. **Memory files** — Markdown files that the agent can read/write (like OpenClaw's `MEMORY.md`).
+2. **Memory tools** — `memory.read`, `memory.write`, `memory.search` builtin tools.
+3. **Auto-indexing** — After compaction, key facts are extracted and appended to memory.
+
+#### Storage
+
+Longer term, entries can live in a dedicated SQLite table in the existing `sessions.db` (or a separate `memory.db`); the MVP in Phase 2a below uses plain markdown files instead:
+
+```sql
+CREATE TABLE memory_entries (
+  id INTEGER PRIMARY KEY AUTOINCREMENT,
+  session_id TEXT,           -- NULL for global memories
+  namespace TEXT NOT NULL,   -- 'user', 'facts', 'preferences', etc.
+  key TEXT NOT NULL,
+  content TEXT NOT NULL,
+  embedding BLOB,            -- Future: vector embedding for search
+  created_at INTEGER NOT NULL DEFAULT (unixepoch()),
+  updated_at INTEGER NOT NULL DEFAULT (unixepoch())
+);
+CREATE INDEX idx_memory_ns ON memory_entries(namespace);
+CREATE INDEX idx_memory_session ON memory_entries(session_id);
+```
+
+#### Phase 2a: File-based memory (MVP)
+
+The simplest useful memory: a markdown file per namespace in `~/.local/share/flynn/memory/`.
+
+```
+~/.local/share/flynn/memory/
+├── global.md          # Cross-session knowledge
+├── user.md            # User preferences, facts about the user
+└── sessions/
+    └── {session_id}.md  # Per-session notes
+```
+
+#### Memory tools
+
+| Tool | Description |
+|------|-------------|
+| `memory.read` | Read a memory file by namespace. Args: `{ namespace: string }` |
+| `memory.write` | Append to or replace a memory file. Args: `{ namespace: string, content: string, mode: 'append' \| 'replace' }` |
+| `memory.search` | Search across all memory files for a keyword. Args: `{ query: string }`. Returns matching lines with context. |
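+
+The keyword search behind `memory.search` could look roughly like this. It is a sketch under assumptions: the `files` record stands in for markdown files already read from the memory directory, matching is a case-insensitive substring test, and the `SearchResult` fields are illustrative.
+
+```typescript
+interface SearchResult {
+  namespace: string;
+  line: number;      // 1-based line number of the match
+  text: string;      // the matching line
+  context: string[]; // the matching line plus surrounding lines
+}
+
+function searchMemory(
+  files: Record<string, string>, // namespace -> file contents (assumed already loaded)
+  query: string,
+  contextLines = 1,
+): SearchResult[] {
+  const needle = query.trim().toLowerCase();
+  if (needle === '') return [];
+
+  const results: SearchResult[] = [];
+  for (const [namespace, content] of Object.entries(files)) {
+    const lines = content.split('\n');
+    lines.forEach((text, i) => {
+      if (!text.toLowerCase().includes(needle)) return;
+      const start = Math.max(0, i - contextLines);
+      const end = Math.min(lines.length, i + contextLines + 1);
+      results.push({ namespace, line: i + 1, text, context: lines.slice(start, end) });
+    });
+  }
+  return results;
+}
+```
+
+Keeping the search a pure function over loaded contents makes it easy to test and leaves file I/O to `MemoryStore`.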
+
+#### Phase 2b: Vector search (future)
+
+Defer vector embeddings and semantic search to a later phase. The file-based approach with keyword search covers 80% of use cases.
+
+When implemented:
+- Add `sqlite-vec` or similar for vector storage
+- Embed memory entries on write using the configured model's embedding API
+- Hybrid search: keyword (BM25) + vector similarity
+
+#### System prompt integration
+
+On every agent turn, inject a `# Memory Context` section into the system prompt:
+
+```
+# Memory Context
+
+The following is your persistent memory. Use it to maintain continuity across sessions.
+
+## User
+{contents of user.md, truncated to ~1000 tokens}
+
+## Global
+{contents of global.md, truncated to ~1000 tokens}
+```
+
+This is injected dynamically by the agent before each request, not baked into the static system prompt.
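+
+Building that section per request might look like the sketch below. The function names, the `[truncated]` marker, and the reuse of the four-characters-per-token heuristic for truncation are all assumptions for illustration.
+
+```typescript
+/** Cut text to roughly maxTokens using the chars/4 heuristic (assumption). */
+function truncateToTokens(text: string, maxTokens: number): string {
+  const maxChars = maxTokens * 4;
+  return text.length <= maxChars ? text : `${text.slice(0, maxChars)}\n[truncated]`;
+}
+
+/** Assemble the memory section that is appended to the system prompt each turn. */
+function buildMemoryContext(user: string, global: string, perSectionTokens = 1000): string {
+  return [
+    '# Memory Context',
+    '',
+    'The following is your persistent memory. Use it to maintain continuity across sessions.',
+    '',
+    '## User',
+    truncateToTokens(user, perSectionTokens),
+    '',
+    '## Global',
+    truncateToTokens(global, perSectionTokens),
+  ].join('\n');
+}
+```
+
+Two sections capped at about 1,000 tokens each stay within the `max_context_tokens: 2000` budget.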
+
+#### Auto-extraction after compaction
+
+When compaction runs (Phase 1), add a follow-up step using the **fast tier (Haiku)** via `orchestrator.delegate()`:
+
+1. Along with the summary, delegate to Haiku to extract any **new facts worth remembering** (user preferences, decisions, names, etc.). This is a simple extraction task — no need for Sonnet/Opus.
+2. Append extracted facts to `user.md` or `global.md`.
+
+This creates a natural knowledge accumulation loop: conversation → compaction (Haiku) → memory extraction (Haiku) → next session gets richer context.
+
+The cost of these background operations is minimal since they run on the cheapest model tier.
+
+#### New files
+
+| File | Purpose |
+|------|---------|
+| `src/memory/store.ts` | MemoryStore class — read/write/search markdown files |
+| `src/memory/index.ts` | Exports |
+| `src/tools/builtin/memory-read.ts` | `memory.read` tool |
+| `src/tools/builtin/memory-write.ts` | `memory.write` tool |
+| `src/tools/builtin/memory-search.ts` | `memory.search` tool |
+
+#### Changes to existing files
+
+| File | Change |
+|------|--------|
+| `src/tools/builtin/index.ts` | Register memory tools in `allBuiltinTools` |
+| `src/backends/native/orchestrator.ts` | Inject memory context into system prompt before each request |
+| `src/context/compaction.ts` | Add memory extraction step after summarisation (delegates to fast tier) |
+| `src/daemon/index.ts` | Initialize MemoryStore, pass to orchestrator config |
+| `src/config/schema.ts` | Add `memory` config block: `{ enabled, dir, namespaces, auto_extract, max_context_tokens }` |
+
+#### Config additions
+
+```yaml
+memory:
+  enabled: true
+  dir: ~/.local/share/flynn/memory
+  auto_extract: true         # Extract facts during compaction
+  max_context_tokens: 2000   # Max tokens injected per turn from memory
+```
+
+#### Implementation steps
+
+1. Create `src/memory/store.ts`:
+   - `read(namespace): string` — read file contents
+   - `write(namespace, content, mode): void` — append or replace
+   - `search(query): SearchResult[]` — line-by-line keyword match with context
+   - `listNamespaces(): string[]`
+2. Create memory tools (3 files).
+3. Register tools.
+4. Add memory context injection to `NativeAgent` — load memory before building the request, inject into system prompt.
+5. Add memory extraction to compaction flow.
+6. Tests: memory CRUD, search, injection, extraction.
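+
+The line-by-line keyword match from step 1 could look roughly like this. It is a sketch only: the `SearchResult` fields and the `searchNamespace` helper are assumed names, and the real `MemoryStore.search` would call something like it once per namespace file.
+
+```typescript
+// Hypothetical result shape; the plan only names the type, not its fields.
+interface SearchResult {
+  namespace: string;
+  line: number;      // 1-based line number of the match
+  context: string[]; // the matching line plus surrounding lines
+}
+
+function searchNamespace(
+  namespace: string,
+  content: string,
+  query: string,
+  contextLines = 1,
+): SearchResult[] {
+  const needle = query.trim().toLowerCase();
+  if (needle === "") return [];
+  const lines = content.split(/\r?\n/);
+  const results: SearchResult[] = [];
+  lines.forEach((text, i) => {
+    // Case-insensitive substring match on each line.
+    if (!text.toLowerCase().includes(needle)) return;
+    const from = Math.max(0, i - contextLines);
+    const to = Math.min(lines.length, i + contextLines + 1);
+    results.push({ namespace, line: i + 1, context: lines.slice(from, to) });
+  });
+  return results;
+}
+```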
+
+---
+
+## Phase 3: Messaging Channels (P1)
+
+### Problem
+
+Flynn has only Telegram and WebChat. The three most requested channels are WhatsApp, Discord, and Slack.
+
+### Design approach
+
+Flynn's `ChannelAdapter` interface (`src/channels/types.ts:51-69`) is clean and well-defined. Adding a new channel means:
+
+1. Implement `ChannelAdapter` (two properties, `name` and `status`, plus four methods: `connect()`, `disconnect()`, `send()`, `onMessage()`).
+2. Add config section.
+3. Register in daemon startup.
+
+Each channel is independent — implement in any order.
+
+### 3a: Discord
+
+**Library:** `discord.js` v14
+**Effort:** 1–2 days
+
+#### Config
+
+```yaml
+discord:
+  bot_token: ${DISCORD_BOT_TOKEN}
+  allowed_guild_ids: []      # Empty = all guilds
+  allowed_channel_ids: []    # Empty = all channels
+```
+
+#### New files
+
+| File | Purpose |
+|------|---------|
+| `src/channels/discord/adapter.ts` | DiscordAdapter implementing ChannelAdapter |
+| `src/channels/discord/index.ts` | Exports |
+
+#### Key decisions
+
+- **Peer ID:** Use `channelId` (not `userId`) so the agent maintains separate sessions per Discord channel.
+- **Message chunking:** Discord has a 2000-char limit. Chunk long responses.
+- **Mentions:** Only respond when mentioned (`@Flynn`) or in DMs. Configurable.
+- **Slash commands:** Register `/reset` and `/status` as Discord slash commands.
+
+#### Implementation steps
+
+1. Add `discord.js` dependency.
+2. Create `DiscordAdapter` class.
+3. Add config schema for `discord` section.
+4. Register in daemon if `config.discord.bot_token` is set.
+5. Export from `src/channels/index.ts`.
+6. Test with a bot in a private server.
+
+### 3b: Slack
+
+**Library:** `@slack/bolt` (Bolt for JavaScript)
+**Effort:** 1–2 days
+
+#### Config
+
+```yaml
+slack:
+  bot_token: ${SLACK_BOT_TOKEN}
+  app_token: ${SLACK_APP_TOKEN}   # For Socket Mode
+  signing_secret: ${SLACK_SIGNING_SECRET}
+  allowed_channel_ids: []
+```
+
+#### New files
+
+| File | Purpose |
+|------|---------|
+| `src/channels/slack/adapter.ts` | SlackAdapter implementing ChannelAdapter |
+| `src/channels/slack/index.ts` | Exports |
+
+#### Key decisions
+
+- **Socket Mode** for self-hosted deployments (no public URL needed). Falls back to HTTP events if `app_token` not set.
+- **Peer ID:** `channelId:threadTs` to isolate threaded conversations.
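+- **Message chunking:** Slack truncates message `text` beyond 40,000 characters, and a single section block's text is capped at 3,000 characters, so chunk long responses. Use `mrkdwn` formatting.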
+- **Slash commands:** `/flynn-reset`, `/flynn-status`.
+
+### 3c: WhatsApp
+
+**Library:** `@whiskeysockets/baileys` (speaks the WhatsApp Web protocol directly over WebSocket), or `whatsapp-web.js` (drives WhatsApp Web through Puppeteer)
+**Effort:** 2–3 days (more complex due to QR auth)
+
+#### Config
+
+```yaml
+whatsapp:
+  auth_dir: ~/.local/share/flynn/whatsapp-auth
+  allowed_numbers: []        # E.164 format, empty = all
+```
+
+#### Key decisions
+
+- **Auth flow:** WhatsApp Web requires QR code scanning on first connect. Display QR in terminal on startup.
+- **Session persistence:** Store auth state in `auth_dir` so re-auth isn't needed on restart.
+- **Peer ID:** Phone number (E.164).
+- **Media:** Start with text-only; defer image/audio handling.
+
+**WhatsApp is the most complex channel.** Consider doing Discord and Slack first, then WhatsApp.
+
+### Shared channel infrastructure
+
+Before implementing individual channels, extract any common patterns:
+
+1. **Message chunking utility** — `src/channels/utils/chunking.ts`: `chunkMessage(text: string, maxLen: number): string[]`
+2. **Allowlist checking** — `src/channels/utils/auth.ts`: `isAllowed(senderId: string, allowlist: string[]): boolean`
+3. **Markdown adaptation** — `src/channels/utils/markdown.ts`: Platform-specific markdown conversion (Discord uses different syntax from Telegram).
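+
+A minimal sketch of the first two utilities, using the signatures given above. The break-point strategy (newline, then space, then hard split) and the "empty allowlist means everyone" rule are assumptions, the latter taken from the config comments.
+
+```typescript
+// Splits text into chunks of at most maxLen UTF-16 code units.
+function chunkMessage(text: string, maxLen: number): string[] {
+  if (maxLen <= 0) throw new RangeError("maxLen must be positive");
+  const chunks: string[] = [];
+  let rest = text;
+  while (rest.length > maxLen) {
+    // Prefer the last newline within the limit, then the last space.
+    let cut = rest.lastIndexOf("\n", maxLen);
+    if (cut <= 0) cut = rest.lastIndexOf(" ", maxLen);
+    if (cut <= 0) cut = maxLen; // no break point found: hard split
+    chunks.push(rest.slice(0, cut));
+    rest = rest.slice(cut).replace(/^\s+/, ""); // drop the whitespace we broke on
+  }
+  if (rest.length > 0) chunks.push(rest);
+  return chunks;
+}
+
+// Empty allowlist means "allow everyone", matching the config comments.
+function isAllowed(senderId: string, allowlist: string[]): boolean {
+  return allowlist.length === 0 || allowlist.includes(senderId);
+}
+```
+
+This version can split inside a fenced code block or a surrogate pair; a production utility would likely need to handle both.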
+
+---
+
+## Phase 4: Web Search Tool (P1)
+
+### Problem
+
+The agent has no way to search the web. This is one of the most commonly-used agent tools.
+
+### Design
+
+#### Provider options
+
+| Provider | Pros | Cons |
+|----------|------|------|
+| **Brave Search API** | Free tier (2k/month), clean API, good results | Requires API key signup |
+| **SearXNG** | Self-hosted, no API key, already running in homelab | Results quality varies |
+| **Tavily** | Purpose-built for AI agents, great results | Small free tier; paid beyond it |
+| **DuckDuckGo** | No API key needed | Unofficial API, rate limits |
+
+**Recommendation:** Support Brave as primary, SearXNG as self-hosted alternative. Make the provider configurable.
+
+#### Config
+
+```yaml
+tools:
+  web_search:
+    provider: brave           # brave | searxng | tavily
+    api_key: ${BRAVE_SEARCH_API_KEY}
+    endpoint: null            # Override for SearXNG: http://searxng:8080
+    max_results: 5
+```
+
+#### New files
+
+| File | Purpose |
+|------|---------|
+| `src/tools/builtin/web-search.ts` | `web.search` tool |
+
+#### Tool interface
+
+```typescript
+{
+  name: 'web.search',
+  description: 'Search the web for information. Returns titles, URLs, and snippets.',
+  inputSchema: {
+    type: 'object',
+    properties: {
+      query: { type: 'string', description: 'Search query' },
+      count: { type: 'number', description: 'Number of results (default 5, max 20)' },
+    },
+    required: ['query'],
+  },
+}
+```
+
+#### Output format
+
+```
+1. **Title** — url
+   Snippet text...
+
+2. **Title** — url
+   Snippet text...
+```
+
+Structured as markdown so the model can easily parse and reference results.
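+
+The format above could be produced by a small helper like the following. The `WebSearchResult` shape, the `formatResults` name, and the empty-result message are assumptions for illustration.
+
+```typescript
+// Hypothetical normalized result shape shared by all providers.
+interface WebSearchResult {
+  title: string;
+  url: string;
+  snippet: string;
+}
+
+function formatResults(results: WebSearchResult[]): string {
+  if (results.length === 0) return "No results found."; // assumed wording
+  return results
+    // \u2014 is the dash used between title and URL in the format above.
+    .map((r, i) => `${i + 1}. **${r.title}** \u2014 ${r.url}\n   ${r.snippet}`)
+    .join("\n\n");
+}
+```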
+
+#### Implementation steps
+
+1. Create `src/tools/builtin/web-search.ts`.
+2. Add Brave Search API client (simple `fetch` — no SDK needed).
+3. Add SearXNG support as alternative backend.
+4. Add tool config section to schema.
+5. Register in `allBuiltinTools`.
+6. Tests: mock API responses, result formatting.
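+
+Step 2 might be sketched as below. The endpoint, the `X-Subscription-Token` header, and the `web.results[].description` field follow Brave's Web Search API; the `searchBrave` name, the `FetchLike` type, and passing `fetch` in as a parameter (so step 6 can mock it) are assumptions.
+
+```typescript
+// Minimal structural types so the client can be tested with a fake fetch.
+interface FetchLikeResponse {
+  ok: boolean;
+  status: number;
+  json(): Promise<unknown>;
+}
+type FetchLike = (
+  url: string,
+  init: { headers: Record<string, string> },
+) => Promise<FetchLikeResponse>;
+
+interface BraveWebResult {
+  title: string;
+  url: string;
+  description?: string;
+}
+
+async function searchBrave(
+  query: string,
+  count: number,
+  apiKey: string,
+  fetchImpl: FetchLike,
+): Promise<{ title: string; url: string; snippet: string }[]> {
+  // Clamp to the 1..20 range advertised in the tool schema.
+  const n = Math.min(Math.max(1, Math.floor(count)), 20);
+  const url =
+    "https://api.search.brave.com/res/v1/web/search" +
+    `?q=${encodeURIComponent(query)}&count=${n}`;
+  const res = await fetchImpl(url, {
+    headers: { Accept: "application/json", "X-Subscription-Token": apiKey },
+  });
+  if (!res.ok) throw new Error(`Brave Search request failed with HTTP ${res.status}`);
+  const body = (await res.json()) as { web?: { results?: BraveWebResult[] } };
+  return (body.web?.results ?? []).map((r) => ({
+    title: r.title,
+    url: r.url,
+    snippet: r.description ?? "",
+  }));
+}
+```
+
+In the tool itself the global `fetch` would be passed as `fetchImpl`; tests pass a stub that returns canned JSON.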
+
+---
+
+## Phase 5: Background Exec / Process Management (P1)
+
+### Problem
+
+Flynn's `shell.exec` (`src/tools/builtin/shell.ts`) is strictly blocking: it runs a command, waits for it to finish (up to a 30s timeout), and returns stdout/stderr. There's no way to:
+
+- Run a long-running process (e.g., `npm run dev`)
+- Check on a running process
+- Read its ongoing output
+- Kill it
+
+### Design
+
+Add a `process` tool family that manages background processes:
+
+| Tool | Description |
+|------|-------------|
+| `process.start` | Start a command in the background. Returns a process ID. |
+| `process.status` | Check whether a process is running, exited, killed, or errored. |
+| `process.output` | Read recent stdout/stderr from a background process. |
+| `process.kill` | Kill a background process. |
+| `process.list` | List all managed background processes. |
+
+#### Process manager
+
+Create a `ProcessManager` class that maintains a registry of spawned processes:
+
+```typescript
+interface ManagedProcess {
+  id: string;
+  command: string;
+  cwd?: string;
+  pid: number;
+  status: 'running' | 'exited' | 'killed' | 'error';
+  exitCode?: number;
+  outputBuffer: RingBuffer;  // Last N bytes of combined stdout+stderr
+  startedAt: number;
+}
+```
+
+#### Output buffering
+
+Use a ring buffer (circular buffer) to keep the last 64KB of output per process. This bounds memory use for long-running processes with verbose output: once the buffer is full, the oldest bytes are overwritten.
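+
+A minimal byte-oriented ring buffer, as a sketch of what `RingBuffer` in `ManagedProcess` could be. The `write`/`read` method names are assumptions; stdout and stderr chunks from Node arrive as `Buffer`, which is a `Uint8Array` subclass, so they could be passed to `write` directly.
+
+```typescript
+class RingBuffer {
+  private readonly buf: Uint8Array;
+  private readonly capacity: number;
+  private start = 0;  // index of the oldest byte
+  private length = 0; // number of valid bytes currently stored
+
+  constructor(capacity: number) {
+    if (!Number.isInteger(capacity) || capacity <= 0) {
+      throw new RangeError("capacity must be a positive integer");
+    }
+    this.capacity = capacity;
+    this.buf = new Uint8Array(capacity);
+  }
+
+  write(chunk: Uint8Array): void {
+    // Only the last `capacity` bytes of an oversized chunk can survive.
+    const src =
+      chunk.length > this.capacity ? chunk.subarray(chunk.length - this.capacity) : chunk;
+    for (let i = 0; i < src.length; i++) {
+      const end = (this.start + this.length) % this.capacity;
+      this.buf[end] = src[i];
+      if (this.length < this.capacity) {
+        this.length++;
+      } else {
+        // Buffer full: the write above replaced the oldest byte, so advance start.
+        this.start = (this.start + 1) % this.capacity;
+      }
+    }
+  }
+
+  // Returns a copy of the stored bytes, oldest first.
+  read(): Uint8Array {
+    const out = new Uint8Array(this.length);
+    for (let i = 0; i < this.length; i++) {
+      out[i] = this.buf[(this.start + i) % this.capacity];
+    }
+    return out;
+  }
+}
+```
+
+With `new RingBuffer(64 * 1024)` per process, memory stays fixed regardless of how much the process prints. Decoding to text for `process.output` may cut a multi-byte UTF-8 sequence at the start of the buffer, which a real implementation would need to tolerate.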
+
+#### Safety
+
+- **Max processes:** Limit to 10 concurrent background processes.
+- **Auto-cleanup:** Kill processes that have been running for more than 1 hour (configurable).
+- **Shutdown cleanup:** Kill all managed processes on daemon shutdown.
+- **Hook integration:** `process.start` should go through the confirmation engine (same as `shell.exec`).
+
+#### New files
+
+| File | Purpose |
+|------|---------|
+| `src/tools/builtin/process/manager.ts` | ProcessManager class |
+| `src/tools/builtin/process/start.ts` | `process.start` tool |
+| `src/tools/builtin/process/status.ts` | `process.status` tool |
+| `src/tools/builtin/process/output.ts` | `process.output` tool |
+| `src/tools/builtin/process/kill.ts` | `process.kill` tool |
+| `src/tools/builtin/process/list.ts` | `process.list` tool |
+| `src/tools/builtin/process/index.ts` | Exports |
+
+#### Changes to existing files
+
+| File | Change |
+|------|--------|
+| `src/tools/builtin/index.ts` | Register process tools |
+| `src/daemon/index.ts` | Create ProcessManager, pass to tool constructors, register shutdown handler |
+| `src/config/schema.ts` | Add `process` config: `{ max_concurrent, max_runtime_minutes, buffer_size }` |
+
+#### Implementation steps
+
+1. Implement `RingBuffer` utility (or use an npm package like `ringbufferjs`).
+2. Create `ProcessManager` class with spawn, track, kill, cleanup methods.
+3. Implement 5 process tools.
+4. Register tools and wire shutdown cleanup.
+5. Tests: spawn + kill lifecycle, output buffering, max process limits.
+
+---
+
+## Phase 6: Enhanced web_fetch (P1)
+
+### Problem
+
+Flynn's `web.fetch` (`src/tools/builtin/web-fetch.ts:19-50`) is a bare `fetch()` call that returns raw HTML. This is nearly useless for LLMs — they need extracted text/markdown, not raw HTML with scripts and styles.
+
+### Design
+
+#### Enhancements
+
+1. **HTML-to-markdown extraction** — Strip scripts/styles, convert to markdown using `@mozilla/readability` + `turndown`.
+2. **Format parameter** — Let the agent choose: `text`, `markdown` (default), or `html`.
+3. **Response caching** — Cache fetched pages for 5 minutes to avoid redundant requests in tool loops.
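+4. **Redirect following** — `fetch()` already follows redirects (up to 20 per the Fetch standard). Enforcing a lower limit means setting `redirect: 'manual'` and following `Location` headers in a bounded loop.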
+5. **Content type handling** — Return JSON prettified, plain text as-is, HTML converted.
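+
+Item 5 could be sketched as a pure function that dispatches on the response's `Content-Type`. The `renderBody` name and the injected `htmlToMarkdown` callback (standing in for the Readability + turndown pipeline) are assumptions for illustration.
+
+```typescript
+// Hypothetical converter signature; the real one would wrap Readability + turndown.
+type HtmlConverter = (html: string) => string;
+
+function renderBody(
+  contentType: string,
+  body: string,
+  htmlToMarkdown: HtmlConverter,
+): string {
+  // Strip parameters such as "; charset=utf-8" before comparing.
+  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
+  if (mime === "application/json" || mime.endsWith("+json")) {
+    try {
+      return JSON.stringify(JSON.parse(body), null, 2);
+    } catch {
+      return body; // malformed JSON: return it as received
+    }
+  }
+  if (mime === "text/html" || mime === "application/xhtml+xml") {
+    return htmlToMarkdown(body);
+  }
+  return body; // plain text and anything unrecognised pass through unchanged
+}
+```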
+
+#### Libraries
+
+| Package | Purpose |
+|---------|---------|
+| `turndown` | HTML → Markdown converter |
+| `linkedom` | Lightweight DOM implementation (for Readability) |
+| `@mozilla/readability` | Extract article content from HTML |
+
+Using `linkedom` instead of `jsdom` — it's much lighter and sufficient for content extraction.
+
+#### Tool interface update
+
+```typescript
+{
+  name: 'web.fetch',
+  description: 'Fetch a URL and extract its content. Returns clean text/markdown by default, not raw HTML.',
+  inputSchema: {
+    type: 'object',
+    properties: {
+      url: { type: 'string', description: 'The URL to fetch' },
+      format: { type: 'string', enum: ['markdown', 'text', 'html'], description: 'Output format (default: markdown)' },
+      timeout: { type: 'number', description: 'Timeout in milliseconds (default 15000)' },
+    },
+    required: ['url'],
+  },
+}
+```
+
+#### Caching
+
+Simple in-memory cache with TTL:
+
+```typescript
+const cache = new Map<string, { content: string; timestamp: number }>();
+const CACHE_TTL = 5 * 60 * 1000; // 5 minutes
+```
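+
+A slightly fuller sketch of the same idea, with lazy expiry on read. The `TtlCache` name and the injectable clock (there only to make expiry testable) are assumptions.
+
+```typescript
+interface CacheEntry {
+  content: string;
+  timestamp: number; // ms since epoch when the entry was stored
+}
+
+class TtlCache {
+  private readonly entries = new Map<string, CacheEntry>();
+  private readonly ttlMs: number;
+  private readonly now: () => number;
+
+  constructor(ttlMs: number, now: () => number = Date.now) {
+    this.ttlMs = ttlMs;
+    this.now = now;
+  }
+
+  get(key: string): string | undefined {
+    const entry = this.entries.get(key);
+    if (!entry) return undefined;
+    if (this.now() - entry.timestamp >= this.ttlMs) {
+      this.entries.delete(key); // expired: evict lazily on read
+      return undefined;
+    }
+    return entry.content;
+  }
+
+  set(key: string, content: string): void {
+    this.entries.set(key, { content, timestamp: this.now() });
+  }
+}
+```
+
+Because the same URL can be requested in different formats, the key would probably need to include the format (for example `markdown:https://example.com`). Entries that are never read again are never evicted here, so a size cap or periodic sweep may be worth adding.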
+
+#### Changes to existing files
+
+| File | Change |
+|------|--------|
+| `src/tools/builtin/web-fetch.ts` | Major rewrite — add extraction, caching, format parameter |
+
+#### Implementation steps
+
+1. Add `turndown`, `linkedom`, `@mozilla/readability` dependencies.
+2. Create extraction pipeline: fetch → parse DOM → readability → turndown → clean markdown.
+3. Add format parameter handling.
+4. Add response caching.
+5. Update tool description to reflect new capabilities.
+6. Tests: extraction from sample HTML, caching behaviour, format handling.
+
+---
+
+## Implementation Order
+
+```
+Week 1:  Phase 0 (Multi-Model Delegation) ─────────────────────── P0 (foundational)
+Week 2:  Phase 1 (Context Compaction) ─────────────────────────── P0 (uses delegation)
+Week 3:  Phase 2 (Memory System) ──────────────────────────────── P0 (uses delegation)
+Week 4:  Phase 4 (Web Search) + Phase 6 (Enhanced web_fetch) ─── P1 (quick wins)
+Week 5:  Phase 5 (Process Management) ─────────────────────────── P1
+Week 6+: Phase 3 (Channels: Discord → Slack → WhatsApp) ──────── P1
+```
+
+**Rationale:**
+- **Delegation first** — Phase 0 is foundational. Compaction and memory both need to delegate subtasks to cheaper models. Building the orchestrator first means Phase 1 and 2 can use it immediately.
+- Compaction and memory are sequential (memory extraction depends on compaction).
+- Web search and enhanced web_fetch are small, independent, and immediately useful — do them as palate cleansers between the big features.
+- Process management is self-contained.
+- Channels are the largest body of work but each is independent — can be done in parallel or interleaved.
+
+### Model usage across all phases
+
+| Phase | Primary model (user-facing) | Delegated tasks | Delegation tier |
+|-------|---------------------------|-----------------|-----------------|
+| 0 | Sonnet (default) | Sub-agent infrastructure | N/A (infrastructure) |
+| 1 | Sonnet (default) | Compaction summaries | Haiku (fast) |
+| 2 | Sonnet (default) | Memory fact extraction | Haiku (fast) |
+| 3 | Sonnet (default) | Message classification, markdown adaptation | Haiku (fast) |
+| 4 | Sonnet (default) | None (direct API call) | N/A |
+| 5 | Sonnet (default) | None | N/A |
+| 6 | Sonnet (default) | None | N/A |
+
+Opus (complex) is reserved for **user-facing tasks** that require deep reasoning — it's never used for background operations.
+
+---
+
+## Testing Strategy
+
+Each phase should include:
+
+1. **Unit tests** — Pure logic (token estimation, ring buffer, markdown extraction, memory search).
+2. **Integration tests** — Tool execution with mocked model responses.
+3. **Manual smoke test** — Run via TUI and Telegram to verify end-to-end.
+
+Key test files to create:
+
+| Test file | Covers |
+|-----------|--------|
+| `src/backends/native/orchestrator.test.ts` | Delegation routing, tier selection, depth limiting, cost tracking |
+| `src/context/tokens.test.ts` | Token estimation accuracy |
+| `src/context/compaction.test.ts` | Compaction trigger logic, summary replacement, fast-tier delegation |
+| `src/memory/store.test.ts` | Memory CRUD, search |
+| `src/tools/builtin/web-search.test.ts` | API mocking, result formatting |
+| `src/tools/builtin/process/manager.test.ts` | Process lifecycle, cleanup |
+| `src/tools/builtin/web-fetch.test.ts` | HTML extraction, caching |
+
+---
+
+## Risk Assessment
+
+| Risk | Impact | Mitigation |
+|------|--------|------------|
+| Haiku summaries lose critical context vs Sonnet | High | Validate quality; use detailed extraction prompts; allow per-task tier override in config |
+| Delegation depth spirals (agent delegates to agent that delegates...) | Medium | Hard limit `max_delegation_depth: 3`; sub-agents cannot spawn sub-agents |
+| Fast tier unavailable (Haiku rate limit / outage) | Medium | Fallback to default tier for delegation; log the fallback cost increase |
+| Compaction summaries lose critical context | High | Keep last 4 turns intact; allow user to adjust `keep_turns`; log what was compacted |
+| Memory injection bloats system prompt | Medium | Hard cap on injected memory tokens; truncate oldest entries |
+| WhatsApp auth flow is fragile | Medium | Defer WhatsApp to last; use battle-tested Baileys library |
+| Brave Search free tier limits (2k/month) | Low | SearXNG as free self-hosted fallback |
+| Background processes leak resources | Medium | Max process limit, auto-kill timeout, shutdown cleanup |
+| HTML extraction fails on JS-heavy sites | Low | Accept graceful degradation; defer CDP/browser fallback to P3 |