William Valentin 0180d4fb8f docs: add Phase 0/1 implementation plan and feature gap analysis

2026-02-06 13:17:51 -08:00

Flynn P0 + P1 Implementation Plan

Date: 2026-02-06

Scope: 7 features from the gap analysis, covering the functionally critical (P0) and high-impact (P1) items.

Prerequisite: Feature Gap Analysis

Feature Summary

#	Feature	Priority	Est. Effort	Dependencies
0	Multi-model sub-agent delegation	P0	3–4 days	None (foundational)
1	Context compaction	P0	2–3 days	#0 (uses cheap model for summaries)
2	Memory system	P0	3–4 days	#0, #1
3	Messaging channels (WhatsApp, Discord, Slack)	P1	1–3 days each (4–7 days total)	None
4	Web search tool	P1	0.5 day	None
5	Background exec / process management	P1	1–2 days	None
6	Enhanced web_fetch	P1	1 day	None

Total estimated effort: 15–22 days

Phase 0: Multi-Model Sub-Agent Delegation (P0 — Foundational)

Problem

Flynn currently runs a single NativeAgent per session that talks to one model tier at a time. The ModelRouter (src/models/router.ts) supports tiers (fast/default/complex/local) and a fallback chain, but:

There is no concept of sub-agents — the primary agent can't spawn a cheaper model for a subtask.
Model selection is per-session (via /model command), not per-task.
Compaction summaries, memory extraction, and classification tasks all use the same expensive model as the main conversation — wasteful.
There is no orchestrator pattern where an expensive model (Opus) plans and delegates to cheaper models (Sonnet, Haiku) for execution.

Model Tier Mapping

Tier	Model	Use For
complex (orchestrator)	Claude Opus 4.6	Planning, orchestration, complex reasoning, multi-step decisions
default (worker)	Claude Sonnet 4.5	General conversation, tool use, code generation, channel adapters
fast (utility)	Claude Haiku 4.5	Compaction summaries, memory extraction, classification, keyword extraction, formatting

This maps directly to Flynn's existing ModelTier type. The infrastructure is already there — what's missing is the delegation mechanism.

Design

Sub-agent spawning

Add the ability for NativeAgent to spawn ephemeral sub-agents that run a single task on a specific model tier and return the result:

interface SubAgentRequest {
  /** Which model tier to use for this subtask. */
  tier: ModelTier;
  /** System prompt for the sub-agent (task-specific). */
  systemPrompt: string;
  /** The task message. */
  message: string;
  /** Max tokens for the response. */
  maxTokens?: number;
  /** Whether to include tools. Default: false (most subtasks are pure text). */
  tools?: boolean;
}

interface SubAgentResult {
  content: string;
  usage: TokenUsage;
  tier: ModelTier;
}

The sub-agent is stateless — no session, no history, just a single request/response. It's a thin wrapper around modelRouter.chat() with a specific tier.
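As a rough illustration of the "thin wrapper" idea, the sketch below runs one stateless request on a chosen tier. Only chat(request, tier) and the SubAgentRequest/SubAgentResult fields come from this plan; ModelRouterLike, the ChatRequest/ChatResponse shapes, the TokenUsage fields, and the runSubAgent name are assumptions made for the example (the tools flag is omitted for brevity).

```typescript
// Sketch only: ModelRouterLike and the request/response shapes are assumptions,
// not Flynn's actual types.
type ModelTier = 'fast' | 'default' | 'complex' | 'local';

interface TokenUsage { inputTokens: number; outputTokens: number; }
interface ChatMessage { role: 'system' | 'user' | 'assistant'; content: string; }
interface ChatRequest { messages: ChatMessage[]; maxTokens?: number; }
interface ChatResponse { content: string; usage: TokenUsage; }

interface ModelRouterLike {
  chat(request: ChatRequest, tier: ModelTier): Promise<ChatResponse>;
}

interface SubAgentRequest {
  tier: ModelTier;
  systemPrompt: string;
  message: string;
  maxTokens?: number;
}

interface SubAgentResult { content: string; usage: TokenUsage; tier: ModelTier; }

/** Run one stateless request/response on the requested tier. No session, no history. */
async function runSubAgent(
  router: ModelRouterLike,
  request: SubAgentRequest,
): Promise<SubAgentResult> {
  const response = await router.chat(
    {
      messages: [
        { role: 'system', content: request.systemPrompt },
        { role: 'user', content: request.message },
      ],
      maxTokens: request.maxTokens,
    },
    request.tier,
  );
  return { content: response.content, usage: response.usage, tier: request.tier };
}
```

Because the router is passed in, a test can substitute a fake that records which tier was requested.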

Where delegation happens

Task	Delegated to	Reason
Compaction summary	fast (Haiku)	Summarisation is a well-defined extraction task; doesn't need complex reasoning
Memory fact extraction	fast (Haiku)	Simple extraction from conversation text
Message classification	fast (Haiku)	"Is this a command, question, or statement?" — trivial
Tool result summarisation	fast (Haiku)	Condense verbose tool output before feeding back
Primary conversation	default (Sonnet)	General-purpose agent work
Complex planning/reasoning	complex (Opus)	Multi-step planning, architecture decisions, ambiguous requests
Sub-agent orchestration	complex (Opus)	When the agent decides to break a task into subtasks

Automatic tier escalation

Add optional auto-escalation where the primary agent (Sonnet) can recognise it's struggling and escalate to Opus:

If the agent hits maxIterations without completing the task → escalate to complex.
If the agent's response contains explicit uncertainty markers ("I'm not sure", "This is beyond...") → offer escalation.
Configurable: auto_escalate: true in config.

This is a future enhancement — start with explicit delegation points (compaction, memory extraction) and add auto-escalation later.
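When auto-escalation is eventually added, the two rules above might reduce to a small pure function like the one below. This is a sketch under assumptions: the TurnOutcome shape, the marker list, and the decideEscalation name are illustrative and not part of the existing code.

```typescript
// Sketch only: names and the marker list are illustrative assumptions.
const UNCERTAINTY_MARKERS = ["i'm not sure", 'this is beyond'];

type EscalationDecision = 'escalate' | 'offer' | 'none';

interface TurnOutcome {
  completed: boolean;     // did the agent finish the task?
  iterations: number;     // tool-loop iterations used
  maxIterations: number;  // configured iteration cap
  response: string;       // final assistant text
}

function decideEscalation(outcome: TurnOutcome, autoEscalate: boolean): EscalationDecision {
  if (!autoEscalate) return 'none';
  // Rule 1: ran out of iterations without completing, so escalate to the complex tier.
  if (!outcome.completed && outcome.iterations >= outcome.maxIterations) return 'escalate';
  // Rule 2: explicit uncertainty in the reply, so offer escalation to the user.
  // Curly apostrophes are normalised so "I’m not sure" also matches.
  const text = outcome.response.toLowerCase().replace(/\u2019/g, "'");
  if (UNCERTAINTY_MARKERS.some((marker) => text.includes(marker))) return 'offer';
  return 'none';
}
```

Keeping the decision separate from the agent loop makes the policy easy to test and to switch off via auto_escalate.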

AgentOrchestrator class

Create a new AgentOrchestrator that sits between the channel message handler and the NativeAgent:

class AgentOrchestrator {
  private primaryAgent: NativeAgent;   // default tier (Sonnet)
  private modelRouter: ModelRouter;

  /** Spawn a sub-agent for a single-turn task on a specific tier. */
  async delegate(request: SubAgentRequest): Promise<SubAgentResult>;

  /** Process a user message — delegates to primary agent, which may internally delegate subtasks. */
  async process(userMessage: string): Promise<string>;
}

The orchestrator replaces the current direct NativeAgent usage in the message router (src/daemon/index.ts:139-186).

Passing the orchestrator to tools and compaction

The key insight: compaction and memory extraction don't need a new agent class — they just need access to modelRouter.chat(request, 'fast'). The orchestrator provides a delegate() method that any subsystem can call:

// In compaction.ts
const summary = await orchestrator.delegate({
  tier: 'fast',
  systemPrompt: COMPACTION_SYSTEM_PROMPT,
  message: `Summarise this conversation:\n\n${messagesToCompact}`,
  maxTokens: 1024,
});

// In memory extraction
const facts = await orchestrator.delegate({
  tier: 'fast',
  systemPrompt: MEMORY_EXTRACTION_PROMPT,
  message: `Extract key facts from:\n\n${summary}`,
  maxTokens: 512,
});

New files

File	Purpose
`src/backends/native/orchestrator.ts`	`AgentOrchestrator` — sub-agent spawning and delegation
`src/backends/native/prompts.ts`	System prompts for delegated tasks (compaction, extraction, classification)

Changes to existing files

File	Change
`src/backends/native/agent.ts`	Accept optional `orchestrator` reference for internal delegation. Add `delegateSubtask()` method.
`src/daemon/index.ts`	Replace direct `NativeAgent` creation in `createMessageRouter()` with `AgentOrchestrator`.
`src/config/schema.ts`	Add `agents` config block for tier assignment and delegation policy.
`src/models/router.ts`	No changes needed — already supports `chat(request, tier)`.

Config additions

agents:
  primary_tier: default              # Model tier for main conversation (Sonnet)
  delegation:
    compaction: fast                  # Tier for compaction summaries (Haiku)
    memory_extraction: fast           # Tier for memory fact extraction (Haiku)
    classification: fast              # Tier for message classification (Haiku)
    tool_summarisation: fast          # Tier for condensing tool output (Haiku)
    complex_reasoning: complex        # Tier for escalated reasoning (Opus)
  auto_escalate: false               # Future: auto-escalate on failure
  max_delegation_depth: 3            # Prevent infinite delegation chains

Implementation steps

Create src/backends/native/orchestrator.ts:
- Constructor takes ModelRouter, systemPrompt, session, toolRegistry, toolExecutor, delegation config.
- delegate(request: SubAgentRequest): Promise<SubAgentResult> — single-turn call to modelRouter.chat() with specified tier.
- process(userMessage: string): Promise<string> — delegates to internal NativeAgent.
- Tracks delegation depth to prevent loops.
- Logs tier usage for cost visibility.
Create src/backends/native/prompts.ts with task-specific system prompts.
Update createMessageRouter() in src/daemon/index.ts to use AgentOrchestrator instead of raw NativeAgent.
Add agents config block to schema.
Wire delegation config through to compaction (Phase 1) and memory (Phase 2).
Tests: delegation routing, tier selection, depth limiting.
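The depth-tracking step could be implemented with a small guard such as the following. DelegationDepthGuard is a hypothetical helper invented for this sketch; only the max_delegation_depth setting comes from the plan.

```typescript
// Sketch only: DelegationDepthGuard is a hypothetical helper, not an existing Flynn class.
class DelegationDepthGuard {
  private depth = 0;
  private readonly maxDepth: number;

  constructor(maxDepth: number) {
    this.maxDepth = maxDepth;
  }

  /** Run `task` one level deeper; reject if the configured depth would be exceeded. */
  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.depth >= this.maxDepth) {
      throw new Error(`max_delegation_depth (${this.maxDepth}) exceeded`);
    }
    this.depth += 1;
    try {
      return await task();
    } finally {
      this.depth -= 1; // always unwind, even when the task throws
    }
  }
}
```

A single shared counter like this treats concurrent delegations as if they were nested. If delegations can run in parallel, passing the current depth along the call chain would be the safer variant.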

Cost implications

Operation	Without delegation	With delegation
Compaction summary	Opus/Sonnet ($$$)	Haiku ($)
Memory extraction	Opus/Sonnet ($$$)	Haiku ($)
10 classifications	Opus/Sonnet ($$$)	Haiku ($)
Complex reasoning	Sonnet ($$)	Opus ($$$) — but only when needed

Net effect: significant cost reduction for background tasks, with targeted spend on complex reasoning only when it matters.

Phase 1: Context Compaction (P0)

Problem

Flynn sends the entire session history to the model on every turn. There is no summarisation, trimming, or token budgeting. Once a conversation exceeds the model's context window, it fails hard.

Current flow (src/backends/native/agent.ts:92-165):

toolLoop() → loopMessages = full this.history → send to model

The SessionStore (src/session/store.ts) and ManagedSession (src/session/manager.ts) store every message verbatim and replay them all on load.

Design

Token counting

Add a tokenCount utility that estimates token counts per message. Two strategies:

Cheap estimate — character-based heuristic (chars / 4 for English). Good enough for budgeting.
Accurate count: use Anthropic's token-counting endpoint (count_tokens) for Claude models, or tiktoken for OpenAI models. Only needed if we want precise budgeting or billing figures.

Start with the cheap estimate; add accurate counting later behind a flag.
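A minimal sketch of the cheap estimate, using the estimateTokens and estimateMessageTokens names from the implementation steps. The Message shape and the per-message overhead constant are assumptions for illustration.

```typescript
// Sketch only: the Message shape and PER_MESSAGE_OVERHEAD are assumptions.
interface Message {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

const CHARS_PER_TOKEN = 4;       // heuristic for English text
const PER_MESSAGE_OVERHEAD = 4;  // assumed allowance for role and formatting tokens

/** Estimate tokens as ceil(chars / 4). */
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

/** Sum the per-message estimates plus a fixed overhead per message. */
function estimateMessageTokens(messages: Message[]): number {
  return messages.reduce(
    (sum, m) => sum + estimateTokens(m.content) + PER_MESSAGE_OVERHEAD,
    0,
  );
}
```

For example, an 8-character user message is estimated at 2 content tokens plus the assumed overhead of 4, so 6 tokens in total.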

Compaction strategy

Use a summarise-and-replace approach (same as OpenClaw):

When total estimated tokens exceed a compaction threshold (configurable, default: 80% of model's context window), trigger compaction.
Take all messages except the last N turns (configurable, default: 4 turns).
Delegate the summarisation request to the fast tier (Haiku) via orchestrator.delegate(): "Summarise this conversation so far, preserving key facts, decisions, and context." This is a well-defined extraction task that doesn't need complex reasoning.
Replace the older messages with a single [system_summary] message.
Persist the compacted history to SQLite (replace the old messages).
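The first four steps above could be sketched as a pure function, leaving persistence to the caller. The Message shape, the CompactionOpts fields, and the rule that a "turn" starts at a user message are assumptions. The summarise callback stands in for orchestrator.delegate({ tier: 'fast', ... }).

```typescript
// Sketch only: shapes and the turn-counting rule are assumptions for illustration.
interface Message { role: 'system' | 'user' | 'assistant'; content: string; }

interface CompactionOpts {
  messages: Message[];
  keepTurns: number;        // recent user turns to keep verbatim
  estimatedTokens: number;  // current estimate for the whole history
  contextWindow: number;    // model context window in tokens
  thresholdPct: number;     // e.g. 80
  summarise: (transcript: string) => Promise<string>; // stand-in for the fast-tier call
}

/** Index of the first message belonging to the last `keepTurns` user turns. */
function splitIndex(messages: Message[], keepTurns: number): number {
  if (keepTurns <= 0) return messages.length;
  let seen = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i]?.role === 'user' && ++seen === keepTurns) return i;
  }
  return 0; // fewer turns than keepTurns: nothing to compact
}

async function compactHistory(opts: CompactionOpts): Promise<Message[]> {
  const threshold = (opts.contextWindow * opts.thresholdPct) / 100;
  if (opts.estimatedTokens <= threshold) return opts.messages;

  const cut = splitIndex(opts.messages, opts.keepTurns);
  if (cut === 0) return opts.messages;

  const older = opts.messages.slice(0, cut);
  const recent = opts.messages.slice(cut);
  const transcript = older.map((m) => `${m.role}: ${m.content}`).join('\n');
  const summary = await opts.summarise(transcript);
  const summaryMessage: Message = { role: 'system', content: `[system_summary] ${summary}` };
  return [summaryMessage, ...recent];
}
```

The caller would then persist the returned array, replacing the stored history for the session.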

Where compaction runs

Compaction is a concern of AgentOrchestrator (Phase 0), not the session store. The orchestrator decides when to compact based on the model it's using, and delegates the summary generation to the fast tier via orchestrator.delegate({ tier: 'fast', ... }).

New files

File	Purpose
`src/context/tokens.ts`	Token estimation utilities
`src/context/compaction.ts`	Compaction logic (summarise + replace)

Changes to existing files

File	Change
`src/backends/native/agent.ts`	Add `compactIfNeeded()` call before building `loopMessages`. Add compaction config to `NativeAgentConfig`.
`src/session/manager.ts`	Add `ManagedSession.replaceHistory(messages)` method for compaction to persist the compacted state.
`src/session/store.ts`	Add `replaceMessages(sessionId, messages)` — atomic delete + re-insert in a transaction.
`src/models/types.ts`	Add optional `contextWindow` field to `ChatResponse` or create a `ModelCapabilities` type.
`src/config/schema.ts`	Add `compaction` config block: `{ enabled, threshold_pct, keep_turns }`. The summary tier is taken from `agents.delegation.compaction`.
`src/daemon/index.ts`	Pass compaction config to agent creation.

Config additions

compaction:
  enabled: true
  threshold_pct: 80          # Trigger at 80% of context window
  keep_turns: 4              # Always keep the last 4 exchanges
  # summary_tier is configured in agents.delegation.compaction (default: fast/Haiku)

Chat commands

Command	Description
`/compact`	Force compaction of the current session immediately.

Implementation steps

Create src/context/tokens.ts with estimateTokens(text: string): number and estimateMessageTokens(messages: Message[]): number.
Create src/context/compaction.ts with compactHistory(opts: CompactionOpts): Promise<Message[]>:
- Takes messages, orchestrator (for delegation), keep_turns.
- Calls orchestrator.delegate({ tier: 'fast', ... }) for the summary.
- Returns [summaryMessage, ...recentMessages].
Add replaceMessages() to SessionStore.
Add replaceHistory() to ManagedSession.
Add compaction config to schema.
Wire compactIfNeeded() into AgentOrchestrator.process() — called before building the request, checks token budget.
Add /compact command handling in the message router.
Tests: token estimation accuracy, compaction trigger logic, history replacement, delegation to fast tier.

Model context window sizes

Hard-code a lookup table in src/context/tokens.ts:

const CONTEXT_WINDOWS: Record<string, number> = {
  'claude-sonnet-4-20250514': 200_000,
  'claude-3-5-haiku-20241022': 200_000,
  'gpt-4o': 128_000,
  'gpt-4o-mini': 128_000,
  // ... etc
};

Allow override in config: models.default.context_window: 128000.
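As a worked example of how the lookup, the config override, and the threshold percentage might combine, consider the sketch below. The table entries are copied from above. getContextWindow, compactionThreshold, and the 128,000-token fallback for unknown models are assumptions.

```typescript
// Sketch only: helper names and the fallback value are assumptions.
const CONTEXT_WINDOWS: Record<string, number> = {
  'claude-sonnet-4-20250514': 200_000,
  'claude-3-5-haiku-20241022': 200_000,
  'gpt-4o': 128_000,
  'gpt-4o-mini': 128_000,
};

const DEFAULT_CONTEXT_WINDOW = 128_000; // assumed conservative fallback for unknown models

/** A positive config override wins; otherwise use the table, then the fallback. */
function getContextWindow(modelId: string, override?: number): number {
  if (override !== undefined && override > 0) return override;
  return CONTEXT_WINDOWS[modelId] ?? DEFAULT_CONTEXT_WINDOW;
}

/** Token count at which compaction should trigger. */
function compactionThreshold(modelId: string, thresholdPct: number, override?: number): number {
  return Math.floor((getContextWindow(modelId, override) * thresholdPct) / 100);
}
```

With the default threshold_pct of 80, a 200,000-token model triggers compaction at 160,000 estimated tokens, and a model overridden to 128,000 triggers at 102,400.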

Phase 2: Memory System (P0)

Problem

Flynn has no persistent knowledge across sessions. Every new session starts blank. The agent can't remember user preferences, past decisions, or accumulated knowledge.

Design

A lightweight memory system with three layers:

Memory files — Markdown files that the agent can read/write (like OpenClaw's MEMORY.md).
Memory tools — memory.read, memory.write, memory.search builtin tools.
Auto-indexing — After compaction, key facts are extracted and appended to memory.

Storage

Use a dedicated SQLite table in the existing sessions.db (or a separate memory.db):

CREATE TABLE memory_entries (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  session_id TEXT,           -- NULL for global memories
  namespace TEXT NOT NULL,   -- 'user', 'facts', 'preferences', etc.
  key TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding BLOB,            -- Future: vector embedding for search
  created_at INTEGER NOT NULL DEFAULT (unixepoch()),
  updated_at INTEGER NOT NULL DEFAULT (unixepoch())
);
CREATE INDEX idx_memory_ns ON memory_entries(namespace);
CREATE INDEX idx_memory_session ON memory_entries(session_id);

Phase 2a: File-based memory (MVP)

The simplest useful memory: a markdown file per namespace in ~/.local/share/flynn/memory/.

~/.local/share/flynn/memory/
├── global.md          # Cross-session knowledge
├── user.md            # User preferences, facts about the user
└── sessions/
    └── {session_id}.md  # Per-session notes

Memory tools

Tool	Description
`memory.read`	Read a memory file by namespace. Args: `{ namespace: string }`
`memory.write`	Append to or replace a memory file. Args: `{ namespace: string, content: string, mode: 'append' | 'replace' }`
`memory.search`	Search across all memory files for a keyword. Args: `{ query: string }`. Returns matching lines with context.
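The keyword search behind memory.search can stay very simple in the MVP. The sketch below works on already-loaded file contents so it stays independent of the filesystem. The SearchResult fields and the searchMemory name are assumptions.

```typescript
// Sketch only: SearchResult fields and the function name are assumptions.
interface SearchResult {
  namespace: string;
  line: number;       // 1-based line number of the match
  match: string;      // the matching line
  context: string[];  // the match plus surrounding lines
}

/** Case-insensitive, line-by-line keyword match across namespace -> file contents. */
function searchMemory(
  files: Record<string, string>,
  query: string,
  contextLines = 1,
): SearchResult[] {
  const needle = query.trim().toLowerCase();
  if (needle === '') return [];

  const results: SearchResult[] = [];
  for (const [namespace, text] of Object.entries(files)) {
    const lines = text.split('\n');
    lines.forEach((line, i) => {
      if (!line.toLowerCase().includes(needle)) return;
      const start = Math.max(0, i - contextLines);
      const end = Math.min(lines.length, i + contextLines + 1);
      results.push({ namespace, line: i + 1, match: line, context: lines.slice(start, end) });
    });
  }
  return results;
}
```

A MemoryStore.search() implementation could read each namespace file and pass the contents to a function like this.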

Phase 2b: Vector search (future)

Defer vector embeddings and semantic search to a later phase. The file-based approach with keyword search covers 80% of use cases.

When implemented:

Add sqlite-vec or similar for vector storage
Embed memory entries on write using the configured model's embedding API
Hybrid search: keyword (BM25) + vector similarity

System prompt integration

On every agent turn, inject a [Memory Context] section into the system prompt:

# Memory Context

The following is your persistent memory. Use it to maintain continuity across sessions.

## User
{contents of user.md, truncated to ~1000 tokens}

## Global
{contents of global.md, truncated to ~1000 tokens}

This is injected dynamically by the agent before each request, not baked into the static system prompt.
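One way the dynamic injection might look, assuming the chars / 4 token heuristic from Phase 1. The buildMemoryContext and truncateToTokens names are hypothetical. When a file is too long, the sketch keeps the tail so that the oldest appended entries are dropped first.

```typescript
// Sketch only: function names and the truncation marker are assumptions.
const CHARS_PER_TOKEN = 4;

/** Keep roughly the last `maxTokens` tokens of `text`, dropping the oldest content. */
function truncateToTokens(text: string, maxTokens: number): string {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  if (maxChars <= 0) return '';
  if (text.length <= maxChars) return text;
  return '[truncated]\n' + text.slice(-maxChars);
}

/** Render the Memory Context section from the contents of user.md and global.md. */
function buildMemoryContext(user: string, global: string, maxTokensPerSection = 1000): string {
  return [
    '# Memory Context',
    '',
    'The following is your persistent memory. Use it to maintain continuity across sessions.',
    '',
    '## User',
    truncateToTokens(user, maxTokensPerSection),
    '',
    '## Global',
    truncateToTokens(global, maxTokensPerSection),
  ].join('\n');
}
```

The result would be appended to the static system prompt on each request, for example `${systemPrompt}\n\n${buildMemoryContext(user, global)}`.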

Auto-extraction after compaction

When compaction runs (Phase 1), add a follow-up step using the fast tier (Haiku) via orchestrator.delegate():

Along with the summary, delegate to Haiku to extract any new facts worth remembering (user preferences, decisions, names, etc.). This is a simple extraction task — no need for Sonnet/Opus.
Append extracted facts to user.md or global.md.

This creates a natural knowledge accumulation loop: conversation → compaction (Haiku) → memory extraction (Haiku) → next session gets richer context.

The cost of these background operations is minimal since they run on the cheapest model tier.

New files

File	Purpose
`src/memory/store.ts`	MemoryStore class — read/write/search markdown files
`src/memory/index.ts`	Exports
`src/tools/builtin/memory-read.ts`	`memory.read` tool
`src/tools/builtin/memory-write.ts`	`memory.write` tool
`src/tools/builtin/memory-search.ts`	`memory.search` tool

Changes to existing files

File	Change
`src/tools/builtin/index.ts`	Register memory tools in `allBuiltinTools`
`src/backends/native/orchestrator.ts`	Inject memory context into system prompt before each request
`src/context/compaction.ts`	Add memory extraction step after summarisation (delegates to fast tier)
`src/daemon/index.ts`	Initialize MemoryStore, pass to orchestrator config
`src/config/schema.ts`	Add `memory` config block: `{ enabled, dir, namespaces, auto_extract }`

Config additions

memory:
  enabled: true
  dir: ~/.local/share/flynn/memory
  auto_extract: true         # Extract facts during compaction
  max_context_tokens: 2000   # Max tokens injected per turn from memory

Implementation steps

Create src/memory/store.ts:
- read(namespace): string — read file contents
- write(namespace, content, mode): void — append or replace
- search(query): SearchResult[] — line-by-line keyword match with context
- listNamespaces(): string[]
Create memory tools (3 files).
Register tools.
Add memory context injection to AgentOrchestrator: load memory before building the request and inject it into the system prompt.
Add memory extraction to compaction flow.
Tests: memory CRUD, search, injection, extraction.

Phase 3: Messaging Channels (P1)

Problem

Flynn has only Telegram and WebChat. The three most requested channels are WhatsApp, Discord, and Slack.

Design approach

Flynn's ChannelAdapter interface (src/channels/types.ts:51-69) is clean and well-defined. Adding a new channel means:

Implement ChannelAdapter (six members: the name and status properties, plus connect(), disconnect(), send(), and onMessage()).
Add config section.
Register in daemon startup.

Each channel is independent — implement in any order.

3a: Discord

Library: discord.js v14 Effort: 1–2 days

Config

discord:
  bot_token: ${DISCORD_BOT_TOKEN}
  allowed_guild_ids: []      # Empty = all guilds
  allowed_channel_ids: []    # Empty = all channels

New files

File	Purpose
`src/channels/discord/adapter.ts`	DiscordAdapter implementing ChannelAdapter
`src/channels/discord/index.ts`	Exports

Key decisions

Peer ID: Use channelId (not userId) so the agent maintains separate sessions per Discord channel.
Message chunking: Discord has a 2000-char limit. Chunk long responses.
Mentions: Only respond when mentioned (@Flynn) or in DMs. Configurable.
Slash commands: Register /reset and /status as Discord slash commands.

Implementation steps

Add discord.js dependency.
Create DiscordAdapter class.
Add config schema for discord section.
Register in daemon if config.discord.bot_token is set.
Export from src/channels/index.ts.
Test with a bot in a private server.

3b: Slack

Library: @slack/bolt (Bolt for JavaScript) Effort: 1–2 days

Config

slack:
  bot_token: ${SLACK_BOT_TOKEN}
  app_token: ${SLACK_APP_TOKEN}   # For Socket Mode
  signing_secret: ${SLACK_SIGNING_SECRET}
  allowed_channel_ids: []

New files

File	Purpose
`src/channels/slack/adapter.ts`	SlackAdapter implementing ChannelAdapter
`src/channels/slack/index.ts`	Exports

Key decisions

Socket Mode for self-hosted deployments (no public URL needed). Falls back to HTTP events if app_token not set.
Peer ID: channelId:threadTs to isolate threaded conversations.
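Message chunking: Slack truncates message text beyond 40,000 characters, and the text of a single section block is capped at 3,000 characters, so chunk accordingly. Use mrkdwn formatting.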
Slash commands: /flynn-reset, /flynn-status.

3c: WhatsApp

Library: whatsapp-web.js (or @whiskeysockets/baileys for full WhatsApp Web protocol) Effort: 2–3 days (more complex due to QR auth)

Config

whatsapp:
  auth_dir: ~/.local/share/flynn/whatsapp-auth
  allowed_numbers: []        # E.164 format, empty = all

Key decisions

Auth flow: WhatsApp Web requires QR code scanning on first connect. Display QR in terminal on startup.
Session persistence: Store auth state in auth_dir so re-auth isn't needed on restart.
Peer ID: Phone number (E.164).
Media: Start with text-only; defer image/audio handling.

WhatsApp is the most complex channel. Consider doing Discord and Slack first, then WhatsApp.

Shared channel infrastructure

Before implementing individual channels, extract any common patterns:

Message chunking utility — src/channels/utils/chunking.ts: chunkMessage(text: string, maxLen: number): string[]
Allowlist checking — src/channels/utils/auth.ts: isAllowed(senderId: string, allowlist: string[]): boolean
Markdown adaptation — src/channels/utils/markdown.ts: Platform-specific markdown conversion (Discord uses different syntax from Telegram).
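The first two utilities are small enough to sketch here, using the signatures listed above. The splitting preference (newline, then space, then a hard cut) is an assumption, as is treating an empty allowlist as "allow everyone", which follows the config comments in the channel sections.

```typescript
// Sketch only: the splitting preference is an assumption, not a settled design.

/** Split `text` into chunks of at most `maxLen` characters, preferring line and word breaks. */
function chunkMessage(text: string, maxLen: number): string[] {
  if (maxLen <= 0) throw new RangeError('maxLen must be positive');
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > maxLen) {
    // Prefer the last newline inside the window, then the last space, else hard-cut.
    let cut = rest.lastIndexOf('\n', maxLen);
    if (cut <= 0) cut = rest.lastIndexOf(' ', maxLen);
    if (cut <= 0) cut = maxLen;
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut).replace(/^\s/, ''); // drop the separator we split on
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

/** An empty allowlist means everyone is allowed. */
function isAllowed(senderId: string, allowlist: string[]): boolean {
  return allowlist.length === 0 || allowlist.includes(senderId);
}
```

Lengths here are counted in UTF-16 code units, so a hard cut can split an emoji or other surrogate pair. A production version may want to avoid cutting inside code fences as well.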

Phase 4: Web Search Tool (P1)

Problem

The agent has no way to search the web. This is one of the most commonly-used agent tools.

Design

Provider options

Provider	Pros	Cons
Brave Search API	Free tier (2k/month), clean API, good results	Requires API key signup
SearXNG	Self-hosted, no API key, already running in homelab	Results quality varies
Tavily	Purpose-built for AI agents, great results	Paid only
DuckDuckGo	No API key needed	Unofficial API, rate limits

Recommendation: Support Brave as primary, SearXNG as self-hosted alternative. Make the provider configurable.

Config

tools:
  web_search:
    provider: brave           # brave | searxng | tavily
    api_key: ${BRAVE_SEARCH_API_KEY}
    endpoint: null            # Override for SearXNG: http://searxng:8080
    max_results: 5

New files

File	Purpose
`src/tools/builtin/web-search.ts`	`web.search` tool

Tool interface

{
  name: 'web.search',
  description: 'Search the web for information. Returns titles, URLs, and snippets.',
  inputSchema: {
    type: 'object',
    properties: {
      query: { type: 'string', description: 'Search query' },
      count: { type: 'number', description: 'Number of results (default 5, max 20)' },
    },
    required: ['query'],
  },
}

Output format

1. **Title** — url
   Snippet text...

2. **Title** — url
   Snippet text...

Structured as markdown so the model can easily parse and reference results.
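A formatter for this output could look like the sketch below. The SearchHit shape and the formatResults name are assumptions. Each provider client would map its own response into that shape first.

```typescript
// Sketch only: SearchHit and formatResults are illustrative names.
interface SearchHit {
  title: string;
  url: string;
  snippet: string;
}

/** Render hits as a numbered markdown list in the output format shown above. */
function formatResults(hits: SearchHit[], maxResults = 5): string {
  return hits
    .slice(0, maxResults)
    .map((hit, i) => `${i + 1}. **${hit.title}** \u2014 ${hit.url}\n   ${hit.snippet}`)
    .join('\n\n');
}
```

Keeping formatting separate from fetching means the Brave and SearXNG backends can share it.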

Implementation steps

Create src/tools/builtin/web-search.ts.
Add Brave Search API client (simple fetch — no SDK needed).
Add SearXNG support as alternative backend.
Add tool config section to schema.
Register in allBuiltinTools.
Tests: mock API responses, result formatting.

Phase 5: Background Exec / Process Management (P1)

Problem

Flynn's shell.exec (src/tools/builtin/shell.ts) is strictly run-to-completion: it runs a command, blocks until it finishes (up to a 30s timeout), and returns stdout/stderr. There's no way to:

Run a long-running process (e.g., npm run dev)
Check on a running process
Read its ongoing output
Kill it

Design

Add a process tool family that manages background processes:

Tool	Description
`process.start`	Start a command in the background. Returns a process ID.
`process.status`	Check if a process is running, exited, or errored.
`process.output`	Read recent stdout/stderr from a background process.
`process.kill`	Kill a background process.
`process.list`	List all managed background processes.

Process manager

Create a ProcessManager class that maintains a registry of spawned processes:

interface ManagedProcess {
  id: string;
  command: string;
  cwd?: string;
  pid: number;
  status: 'running' | 'exited' | 'killed' | 'error';
  exitCode?: number;
  outputBuffer: RingBuffer;  // Last N bytes of combined stdout+stderr
  startedAt: number;
}

Output buffering

Use a ring buffer (circular buffer) to keep the last 64KB of output per process. Older output is overwritten, so memory use stays bounded even for long-running processes with verbose output.
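A byte-oriented ring buffer along these lines would be enough for the output buffer. This is a sketch, not the planned implementation: the class shape and method names are assumptions.

```typescript
// Sketch only: method names and the byte-oriented API are assumptions.
class RingBuffer {
  private readonly buf: Uint8Array;
  private start = 0;   // index of the oldest byte
  private length = 0;  // number of valid bytes

  constructor(capacity: number) {
    if (capacity <= 0) throw new RangeError('capacity must be positive');
    this.buf = new Uint8Array(capacity);
  }

  /** Append bytes, overwriting the oldest data once the buffer is full. */
  write(chunk: Uint8Array): void {
    const cap = this.buf.length;
    // Only the last `cap` bytes of an oversized chunk can survive.
    const data = chunk.length > cap ? chunk.subarray(chunk.length - cap) : chunk;
    data.forEach((byte) => {
      this.buf[(this.start + this.length) % cap] = byte;
      if (this.length < cap) this.length += 1;
      else this.start = (this.start + 1) % cap; // the oldest byte was overwritten
    });
  }

  /** Return the buffered bytes, oldest first. */
  read(): Uint8Array {
    const out = new Uint8Array(this.length);
    const tailEnd = Math.min(this.buf.length, this.start + this.length);
    const tail = this.buf.subarray(this.start, tailEnd);
    out.set(tail, 0);
    out.set(this.buf.subarray(0, this.length - tail.length), tail.length);
    return out;
  }
}
```

With `new RingBuffer(64 * 1024)`, stdout and stderr chunks can both be written to the same instance. Decoding the result as UTF-8 may produce a replacement character at the start if the oldest bytes were cut mid-character.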

Safety

Max processes: Limit to 10 concurrent background processes.
Auto-cleanup: Kill processes that have been running for more than 1 hour (configurable).
Shutdown cleanup: Kill all managed processes on daemon shutdown.
Hook integration: process.start should go through the confirmation engine (same as shell.exec).

New files

File	Purpose
`src/tools/builtin/process/manager.ts`	ProcessManager class
`src/tools/builtin/process/start.ts`	`process.start` tool
`src/tools/builtin/process/status.ts`	`process.status` tool
`src/tools/builtin/process/output.ts`	`process.output` tool
`src/tools/builtin/process/kill.ts`	`process.kill` tool
`src/tools/builtin/process/list.ts`	`process.list` tool
`src/tools/builtin/process/index.ts`	Exports

Changes to existing files

File	Change
`src/tools/builtin/index.ts`	Register process tools
`src/daemon/index.ts`	Create ProcessManager, pass to tool constructors, register shutdown handler
`src/config/schema.ts`	Add `process` config: `{ max_concurrent, max_runtime_minutes, buffer_size }`

Implementation steps

Implement RingBuffer utility (or use an npm package like ringbufferjs).
Create ProcessManager class with spawn, track, kill, cleanup methods.
Implement 5 process tools.
Register tools and wire shutdown cleanup.
Tests: spawn + kill lifecycle, output buffering, max process limits.

Phase 6: Enhanced web_fetch (P1)

Problem

Flynn's web.fetch (src/tools/builtin/web-fetch.ts:19-50) is a bare fetch() call that returns raw HTML. This is nearly useless for LLMs — they need extracted text/markdown, not raw HTML with scripts and styles.

Design

Enhancements

HTML-to-markdown extraction — Strip scripts/styles, convert to markdown using @mozilla/readability + turndown.
Format parameter — Let the agent choose: text, markdown (default), or html.
Response caching — Cache fetched pages for 5 minutes to avoid redundant requests in tool loops.
Redirect following — Already handled by fetch(), but add a max redirect limit.
Content type handling — Return JSON prettified, plain text as-is, HTML converted.

Libraries

Package	Purpose
`turndown`	HTML → Markdown converter
`linkedom`	Lightweight DOM implementation (for Readability)
`@mozilla/readability`	Extract article content from HTML

Use linkedom instead of jsdom: it is much lighter and sufficient for content extraction.

Tool interface update

{
  name: 'web.fetch',
  description: 'Fetch a URL and extract its content. Returns clean text/markdown by default, not raw HTML.',
  inputSchema: {
    type: 'object',
    properties: {
      url: { type: 'string', description: 'The URL to fetch' },
      format: { type: 'string', enum: ['markdown', 'text', 'html'], description: 'Output format (default: markdown)' },
      timeout: { type: 'number', description: 'Timeout in milliseconds (default 15000)' },
    },
    required: ['url'],
  },
}

Caching

Simple in-memory cache with TTL:

const cache = new Map<string, { content: string; timestamp: number }>();
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes
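The read and write paths around that map might look like the following. The getCached and setCached names are assumptions, and the `now` parameter exists only to make expiry testable.

```typescript
// Sketch only: helper names are assumptions; the map and TTL mirror the snippet above.
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes
const cache = new Map<string, { content: string; timestamp: number }>();

/** Return the cached content for `url`, or undefined if missing or expired. */
function getCached(url: string, now: number = Date.now()): string | undefined {
  const entry = cache.get(url);
  if (!entry) return undefined;
  if (now - entry.timestamp >= CACHE_TTL) {
    cache.delete(url); // expired: evict lazily on read
    return undefined;
  }
  return entry.content;
}

/** Store `content` for `url`, stamped with the current time. */
function setCached(url: string, content: string, now: number = Date.now()): void {
  cache.set(url, { content, timestamp: now });
}
```

Since the format parameter changes what is returned, either cache the raw response body and convert on each read, or include the format in the key (for example `${format}:${url}`). Entries are only evicted on read here, so a size cap may be worth adding.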

Changes to existing files

File	Change
`src/tools/builtin/web-fetch.ts`	Major rewrite — add extraction, caching, format parameter

Implementation steps

Add turndown, linkedom, @mozilla/readability dependencies.
Create extraction pipeline: fetch → parse DOM → readability → turndown → clean markdown.
Add format parameter handling.
Add response caching.
Update tool description to reflect new capabilities.
Tests: extraction from sample HTML, caching behaviour, format handling.

Implementation Order

Week 1:  Phase 0 (Multi-Model Delegation) ─────────────────────── P0 (foundational)
Week 2:  Phase 1 (Context Compaction) ─────────────────────────── P0 (uses delegation)
Week 3:  Phase 2 (Memory System) ──────────────────────────────── P0 (uses delegation)
Week 4:  Phase 4 (Web Search) + Phase 6 (Enhanced web_fetch) ─── P1 (quick wins)
Week 5:  Phase 5 (Process Management) ─────────────────────────── P1
Week 6+: Phase 3 (Channels: Discord → Slack → WhatsApp) ──────── P1

Rationale:

Delegation first — Phase 0 is foundational. Compaction and memory both need to delegate subtasks to cheaper models. Building the orchestrator first means Phase 1 and 2 can use it immediately.
Compaction and memory are sequential (memory extraction depends on compaction).
Web search and enhanced web_fetch are small, independent, and immediately useful — do them as palate cleansers between the big features.
Process management is self-contained.
Channels are the largest body of work but each is independent — can be done in parallel or interleaved.

Model usage across all phases

Phase	Primary model (user-facing)	Delegated tasks	Delegation tier
0	Sonnet (default)	Sub-agent infrastructure	N/A (infrastructure)
1	Sonnet (default)	Compaction summaries	Haiku (fast)
2	Sonnet (default)	Memory fact extraction	Haiku (fast)
3	Sonnet (default)	Message classification, markdown adaptation	Haiku (fast)
4	Sonnet (default)	None (direct API call)	N/A
5	Sonnet (default)	None	N/A
6	Sonnet (default)	None	N/A

Opus (complex) is reserved for user-facing tasks that require deep reasoning — it's never used for background operations.

Testing Strategy

Each phase should include:

Unit tests — Pure logic (token estimation, ring buffer, markdown extraction, memory search).
Integration tests — Tool execution with mocked model responses.
Manual smoke test — Run via TUI and Telegram to verify end-to-end.

Key test files to create:

Test file	Covers
`src/backends/native/orchestrator.test.ts`	Delegation routing, tier selection, depth limiting, cost tracking
`src/context/tokens.test.ts`	Token estimation accuracy
`src/context/compaction.test.ts`	Compaction trigger logic, summary replacement, fast-tier delegation
`src/memory/store.test.ts`	Memory CRUD, search
`src/tools/builtin/web-search.test.ts`	API mocking, result formatting
`src/tools/builtin/process/manager.test.ts`	Process lifecycle, cleanup
`src/tools/builtin/web-fetch.test.ts`	HTML extraction, caching

Risk Assessment

Risk	Impact	Mitigation
Haiku summaries lose critical context vs Sonnet	High	Validate quality; use detailed extraction prompts; allow per-task tier override in config
Delegation depth spirals (agent delegates to agent that delegates...)	Medium	Hard limit `max_delegation_depth: 3`; sub-agents cannot spawn sub-agents
Fast tier unavailable (Haiku rate limit / outage)	Medium	Fallback to default tier for delegation; log the fallback cost increase
Compaction summaries lose critical context	High	Keep last 4 turns intact; allow user to adjust `keep_turns`; log what was compacted
Memory injection bloats system prompt	Medium	Hard cap on injected memory tokens; truncate oldest entries
WhatsApp auth flow is fragile	Medium	Defer WhatsApp to last; use battle-tested Baileys library
Brave Search free tier limits (2k/month)	Low	SearXNG as free self-hosted fallback
Background processes leak resources	Medium	Max process limit, auto-kill timeout, shutdown cleanup
HTML extraction fails on JS-heavy sites	Low	Accept graceful degradation; defer CDP/browser fallback to P3
