# Flynn P0 + P1 Implementation Plan

**Date:** 2026-02-06

**Scope:** 7 features from the gap analysis — the functionally critical (P0) and high-impact (P1) items.

**Prerequisite:** [Feature Gap Analysis](./2026-02-06-openclaw-feature-gap-analysis.md)

---

## Feature Summary

| # | Feature | Priority | Est. Effort | Dependencies |
|---|---------|----------|-------------|--------------|
| 0 | Multi-model sub-agent delegation | P0 | 3–4 days | None (foundational) |
| 1 | Context compaction | P0 | 2–3 days | #0 (uses cheap model for summaries) |
| 2 | Memory system | P0 | 3–4 days | #0, #1 |
| 3 | Messaging channels (WhatsApp, Discord, Slack) | P1 | 1–3 days each (4–7 total) | None |
| 4 | Web search tool | P1 | 0.5 day | None |
| 5 | Background exec / process management | P1 | 1–2 days | None |
| 6 | Enhanced web_fetch | P1 | 1 day | None |

**Total estimated effort:** 15–22 days

---

## Phase 0: Multi-Model Sub-Agent Delegation (P0 — Foundational)

### Problem

Flynn currently runs a **single NativeAgent per session** that talks to one model tier at a time. The `ModelRouter` (`src/models/router.ts`) supports tiers (`fast`/`default`/`complex`/`local`) and a fallback chain, but:

- There is no concept of **sub-agents**: the primary agent can't spawn a cheaper model for a subtask.
- Model selection is **per-session** (via the `/model` command), not **per-task**.
- Background work such as compaction summaries, memory extraction, and classification (once added in later phases) would run on the same expensive model as the main conversation, which is wasteful.
- There is no orchestrator pattern in which an expensive model (Opus) plans and delegates execution to cheaper models (Sonnet, Haiku).

### Model Tier Mapping

| Tier | Model | Use For |
|------|-------|---------|
| **complex** (orchestrator) | Claude Opus 4.6 | Planning, orchestration, complex reasoning, multi-step decisions |
| **default** (worker) | Claude Sonnet 4.5 | General conversation, tool use, code generation, channel adapters |
| **fast** (utility) | Claude Haiku 4.5 | Compaction summaries, memory extraction, classification, keyword extraction, formatting |

This maps directly to Flynn's existing `ModelTier` type. The infrastructure is already there; what's missing is the **delegation mechanism**.

### Design

#### Sub-agent spawning

Add the ability for `NativeAgent` to spawn **ephemeral sub-agents** that run a single task on a specific model tier and return the result:

```typescript
interface SubAgentRequest {
  /** Which model tier to use for this subtask. */
  tier: ModelTier;
  /** System prompt for the sub-agent (task-specific). */
  systemPrompt: string;
  /** The task message. */
  message: string;
  /** Max tokens for the response. */
  maxTokens?: number;
  /** Whether to include tools. Default: false (most subtasks are pure text). */
  tools?: boolean;
}

interface SubAgentResult {
  content: string;
  usage: TokenUsage;
  tier: ModelTier;
}
```

The sub-agent is **stateless** — no session, no history, just a single request/response. It's a thin wrapper around `modelRouter.chat()` with a specific tier.
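
For illustration, a minimal sketch of such a stateless `delegate()`, assuming the router exposes something like `chat(request, tier)`. The `ChatRequest`, `ChatResponse`, and `TierRouter` shapes below are hypothetical stand-ins for whatever `src/models/types.ts` actually defines, and the `tools` flag is ignored here.

```typescript
type ModelTier = 'fast' | 'default' | 'complex' | 'local';

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical request/response shapes; the real ones live in src/models/types.ts.
interface ChatRequest {
  system: string;
  messages: { role: 'user' | 'assistant'; content: string }[];
  maxTokens?: number;
}

interface ChatResponse {
  content: string;
  usage: TokenUsage;
}

interface TierRouter {
  chat(request: ChatRequest, tier: ModelTier): Promise<ChatResponse>;
}

interface SubAgentRequest {
  tier: ModelTier;
  systemPrompt: string;
  message: string;
  maxTokens?: number;
  tools?: boolean; // not handled in this sketch
}

interface SubAgentResult {
  content: string;
  usage: TokenUsage;
  tier: ModelTier;
}

// Stateless: every call builds a fresh one-message conversation, so no
// session history flows into (or out of) the sub-agent.
async function delegate(router: TierRouter, req: SubAgentRequest): Promise<SubAgentResult> {
  const response = await router.chat(
    {
      system: req.systemPrompt,
      messages: [{ role: 'user', content: req.message }],
      maxTokens: req.maxTokens ?? 1024,
    },
    req.tier,
  );
  return { content: response.content, usage: response.usage, tier: req.tier };
}
```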

#### Where delegation happens

| Task | Delegated to | Reason |
|------|-------------|--------|
| Compaction summary | **fast** (Haiku) | Summarisation is a well-defined extraction task; doesn't need complex reasoning |
| Memory fact extraction | **fast** (Haiku) | Simple extraction from conversation text |
| Message classification | **fast** (Haiku) | "Is this a command, question, or statement?" — trivial |
| Tool result summarisation | **fast** (Haiku) | Condense verbose tool output before feeding back |
| Primary conversation | **default** (Sonnet) | General-purpose agent work |
| Complex planning/reasoning | **complex** (Opus) | Multi-step planning, architecture decisions, ambiguous requests |
| Sub-agent orchestration | **complex** (Opus) | When the agent decides to break a task into subtasks |

#### Automatic tier escalation

Add optional **auto-escalation** where the primary agent (Sonnet) can recognise it's struggling and escalate to Opus:

1. If the agent hits `maxIterations` without completing the task → escalate to `complex`.
2. If the agent's response contains explicit uncertainty markers ("I'm not sure", "This is beyond...") → offer escalation.
3. Configurable: `auto_escalate: true` in config.

This is a **future enhancement** — start with explicit delegation points (compaction, memory extraction) and add auto-escalation later.
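
When auto-escalation is picked up, the two triggers could be checked with something like the function below. The `AgentTurn` shape, the marker phrases, and the `'offer'`/`'escalate'` distinction are assumptions for illustration; phrase matching is a crude heuristic that would need tuning (for example, models may emit curly apostrophes).

```typescript
type ModelTier = 'fast' | 'default' | 'complex' | 'local';

// Hypothetical summary of one primary-agent turn.
interface AgentTurn {
  tier: ModelTier;
  iterations: number;
  maxIterations: number;
  completed: boolean;
  finalText: string;
}

type EscalationDecision = 'none' | 'escalate' | 'offer';

// Illustrative phrases only; matched case-insensitively.
const UNCERTAINTY_MARKERS = ["i'm not sure", 'this is beyond', "i don't know how"];

function escalationDecision(turn: AgentTurn, autoEscalate: boolean): EscalationDecision {
  // Nothing to do when the feature is off or we're already on the top tier.
  if (!autoEscalate || turn.tier === 'complex') return 'none';
  // Trigger 1: ran out of iterations without finishing -> escalate automatically.
  if (!turn.completed && turn.iterations >= turn.maxIterations) return 'escalate';
  // Trigger 2: explicit uncertainty in the answer -> offer escalation to the user.
  const text = turn.finalText.toLowerCase();
  if (UNCERTAINTY_MARKERS.some((marker) => text.includes(marker))) return 'offer';
  return 'none';
}
```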

#### AgentOrchestrator class

Create a new `AgentOrchestrator` that sits between the channel message handler and the `NativeAgent`:

```typescript
class AgentOrchestrator {
  private primaryAgent: NativeAgent; // default tier (Sonnet)
  private modelRouter: ModelRouter;

  /** Spawn a sub-agent for a single-turn task on a specific tier. */
  async delegate(request: SubAgentRequest): Promise<SubAgentResult>;

  /** Process a user message — delegates to primary agent, which may internally delegate subtasks. */
  async process(userMessage: string): Promise<string>;
}
```

The orchestrator replaces the current direct `NativeAgent` usage in the message router (`src/daemon/index.ts:139-186`).

#### Passing the orchestrator to tools and compaction

The key insight: **compaction and memory extraction don't need a new agent class** — they just need access to `modelRouter.chat(request, 'fast')`. The orchestrator provides a `delegate()` method that any subsystem can call:

```typescript
// In compaction.ts
const summary = await orchestrator.delegate({
  tier: 'fast',
  systemPrompt: COMPACTION_SYSTEM_PROMPT,
  message: `Summarise this conversation:\n\n${messagesToCompact}`,
  maxTokens: 1024,
});

// In memory extraction
const facts = await orchestrator.delegate({
  tier: 'fast',
  systemPrompt: MEMORY_EXTRACTION_PROMPT,
  // delegate() returns a SubAgentResult, so interpolate its text content
  message: `Extract key facts from:\n\n${summary.content}`,
  maxTokens: 512,
});
```

### New files

| File | Purpose |
|------|---------|
| `src/backends/native/orchestrator.ts` | `AgentOrchestrator` — sub-agent spawning and delegation |
| `src/backends/native/prompts.ts` | System prompts for delegated tasks (compaction, extraction, classification) |

### Changes to existing files

| File | Change |
|------|--------|
| `src/backends/native/agent.ts` | Accept optional `orchestrator` reference for internal delegation. Add `delegateSubtask()` method. |
| `src/daemon/index.ts` | Replace direct `NativeAgent` creation in `createMessageRouter()` with `AgentOrchestrator`. |
| `src/config/schema.ts` | Add `agents` config block for tier assignment and delegation policy. |
| `src/models/router.ts` | No changes needed — already supports `chat(request, tier)`. |

### Config additions

```yaml
agents:
  primary_tier: default          # Model tier for main conversation (Sonnet)
  delegation:
    compaction: fast             # Tier for compaction summaries (Haiku)
    memory_extraction: fast      # Tier for memory fact extraction (Haiku)
    classification: fast         # Tier for message classification (Haiku)
    tool_summarisation: fast     # Tier for condensing tool output (Haiku)
    complex_reasoning: complex   # Tier for escalated reasoning (Opus)
  auto_escalate: false           # Future: auto-escalate on failure
  max_delegation_depth: 3        # Prevent infinite delegation chains
```

### Implementation steps

1. Create `src/backends/native/orchestrator.ts`:
   - Constructor takes `ModelRouter`, `systemPrompt`, `session`, `toolRegistry`, `toolExecutor`, delegation config.
   - `delegate(request: SubAgentRequest): Promise<SubAgentResult>` — single-turn call to `modelRouter.chat()` with specified tier.
   - `process(userMessage: string): Promise<string>` — delegates to internal `NativeAgent`.
   - Tracks delegation depth to prevent loops.
   - Logs tier usage for cost visibility.
2. Create `src/backends/native/prompts.ts` with task-specific system prompts.
3. Update `createMessageRouter()` in `src/daemon/index.ts` to use `AgentOrchestrator` instead of raw `NativeAgent`.
4. Add `agents` config block to schema.
5. Wire delegation config through to compaction (Phase 1) and memory (Phase 2).
6. Tests: delegation routing, tier selection, depth limiting.
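
One way to implement the depth tracking and tier-usage logging from step 1 is to wrap the underlying delegate call, as sketched below. It uses Node's `AsyncLocalStorage` so that depth is tracked per async call chain: parallel delegations do not count against each other, only nested ones do. `DelegationTracker` and `DelegateFn` are hypothetical names, and the request/result types are trimmed copies of the interfaces above.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

type ModelTier = 'fast' | 'default' | 'complex' | 'local';

interface SubAgentRequest { tier: ModelTier; systemPrompt: string; message: string; maxTokens?: number }
interface SubAgentResult { content: string; usage: { inputTokens: number; outputTokens: number }; tier: ModelTier }

type DelegateFn = (req: SubAgentRequest) => Promise<SubAgentResult>;

interface TierUsage { calls: number; inputTokens: number; outputTokens: number }

class DelegationTracker {
  // Current nesting depth for whichever async call chain is running.
  private readonly depth = new AsyncLocalStorage<number>();
  readonly usageByTier = new Map<ModelTier, TierUsage>();

  constructor(private readonly inner: DelegateFn, private readonly maxDepth = 3) {}

  delegate(req: SubAgentRequest): Promise<SubAgentResult> {
    const current = this.depth.getStore() ?? 0;
    if (current >= this.maxDepth) {
      return Promise.reject(new Error(`delegation depth limit (${this.maxDepth}) reached`));
    }
    // Anything the inner call delegates will see current + 1 as its depth.
    return this.depth.run(current + 1, async () => {
      const result = await this.inner(req);
      this.record(result);
      return result;
    });
  }

  // Per-tier call and token totals, for cost visibility.
  private record(result: SubAgentResult): void {
    const entry = this.usageByTier.get(result.tier) ?? { calls: 0, inputTokens: 0, outputTokens: 0 };
    entry.calls += 1;
    entry.inputTokens += result.usage.inputTokens;
    entry.outputTokens += result.usage.outputTokens;
    this.usageByTier.set(result.tier, entry);
  }
}
```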

### Cost implications

| Operation | Without delegation | With delegation |
|-----------|-------------------|-----------------|
| Compaction summary | Opus/Sonnet ($$$) | Haiku ($) |
| Memory extraction | Opus/Sonnet ($$$) | Haiku ($) |
| 10 classifications | Opus/Sonnet ($$$) | Haiku ($) |
| Complex reasoning | Sonnet ($$) | Opus ($$$) — but only when needed |

Net effect: **significant cost reduction** for background tasks, with targeted spend on complex reasoning only when it matters.

---

## Phase 1: Context Compaction (P0)

### Problem

Flynn sends the **entire session history** to the model on every turn. There is no summarisation, trimming, or token budgeting. Once a conversation exceeds the model's context window, it fails hard.

**Current flow** (`src/backends/native/agent.ts:92-165`):
```
toolLoop() → loopMessages = full this.history → send to model
```

The `SessionStore` (`src/session/store.ts`) and `ManagedSession` (`src/session/manager.ts`) store every message verbatim and replay them all on load.

### Design

#### Token counting

Add a token-estimation utility (`src/context/tokens.ts`) that estimates token counts per message. Two strategies:

1. **Cheap estimate** — character-based heuristic (`chars / 4` for English). Good enough for budgeting.
2. **Accurate count** — use Anthropic's token-counting endpoint (`client.messages.countTokens()` in the TypeScript SDK) or `tiktoken` for OpenAI models. Only needed if the cheap estimate proves too coarse near the compaction threshold.

Start with the cheap estimate; add accurate counting later behind a flag.
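
The cheap estimate is only a few lines. This sketch assumes a minimal `Message` shape with string content; the fixed per-message overhead is a guess to account for role and formatting tokens, not a measured value.

```typescript
// Hypothetical minimal message shape; Flynn's real Message type may differ.
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

// Rough heuristic: ~4 characters per token for English text.
const CHARS_PER_TOKEN = 4;
// Assumed fixed cost per message for role/formatting tokens (a guess, not a measurement).
const PER_MESSAGE_OVERHEAD = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function estimateMessageTokens(messages: Message[]): number {
  return messages.reduce(
    (total, m) => total + estimateTokens(m.content) + PER_MESSAGE_OVERHEAD,
    0,
  );
}
```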

#### Compaction strategy

Use a **summarise-and-replace** approach (same as OpenClaw):

1. When total estimated tokens exceed a **compaction threshold** (configurable, default: 80% of model's context window), trigger compaction.
2. Take all messages **except the last N turns** (configurable, default: 4 turns).
3. **Delegate** the summarisation request to the **fast tier (Haiku)** via `orchestrator.delegate()`: "Summarise this conversation so far, preserving key facts, decisions, and context." This is a well-defined extraction task that doesn't need complex reasoning.
4. Replace the older messages with a single `[system_summary]` message.
5. Persist the compacted history to SQLite (replace the old messages).
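
Steps 1–4 could be sketched as a pure function, assuming a turn starts at each user message and that the summariser is passed in (in Flynn it would wrap `orchestrator.delegate({ tier: 'fast', ... })`). The `Message` and `CompactionOpts` shapes are illustrative, and persisting the result (step 5) is left to the caller.

```typescript
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

// Stand-in for orchestrator.delegate({ tier: 'fast', ... }) returning the summary text.
type Summarise = (transcript: string) => Promise<string>;

interface CompactionOpts {
  messages: Message[];
  keepTurns: number; // assumed >= 1
  summarise: Summarise;
}

// Step 1: integer comparison of tokens vs threshold_pct of the context window.
function needsCompaction(estimatedTokens: number, contextWindow: number, thresholdPct: number): boolean {
  return estimatedTokens * 100 >= contextWindow * thresholdPct;
}

// Step 2: index of the first message in the last `keepTurns` turns
// (a turn is assumed to begin at each user message).
function recentTurnsStart(messages: Message[], keepTurns: number): number {
  let turnsSeen = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === 'user' && ++turnsSeen === keepTurns) return i;
  }
  return 0; // fewer turns than keepTurns: keep everything
}

// Steps 3-4: summarise the older messages and replace them with one summary message.
async function compactHistory({ messages, keepTurns, summarise }: CompactionOpts): Promise<Message[]> {
  const start = recentTurnsStart(messages, keepTurns);
  if (start === 0) return messages; // nothing old enough to compact
  const older = messages.slice(0, start);
  const recent = messages.slice(start);
  const transcript = older.map((m) => `${m.role}: ${m.content}`).join('\n\n');
  const summary = await summarise(transcript);
  // The 'system' role is an assumption; providers differ on mid-conversation system messages.
  const summaryMessage: Message = { role: 'system', content: `[system_summary]\n${summary}` };
  return [summaryMessage, ...recent];
}
```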

#### Where compaction runs

Compaction is a concern of `AgentOrchestrator` (Phase 0), not the session store. The orchestrator decides when to compact based on the model it's using, and delegates the summary generation to the **fast** tier via `orchestrator.delegate({ tier: 'fast', ... })`.

#### New files

| File | Purpose |
|------|---------|
| `src/context/tokens.ts` | Token estimation utilities |
| `src/context/compaction.ts` | Compaction logic (summarise + replace) |

#### Changes to existing files

| File | Change |
|------|--------|
| `src/backends/native/orchestrator.ts` | Call `compactIfNeeded()` in `process()` before the primary agent builds `loopMessages`. Accept compaction config. |
| `src/session/manager.ts` | Add `ManagedSession.replaceHistory(messages)` method for compaction to persist the compacted state. |
| `src/session/store.ts` | Add `replaceMessages(sessionId, messages)` — atomic delete + re-insert in a transaction. |
| `src/models/types.ts` | Add optional `contextWindow` field to `ChatResponse` or create a `ModelCapabilities` type. |
| `src/config/schema.ts` | Add `compaction` config block: `{ enabled, threshold_pct, keep_turns }` (the summary tier comes from `agents.delegation.compaction`). |
| `src/daemon/index.ts` | Pass compaction config to orchestrator creation. |

#### Config additions

```yaml
compaction:
  enabled: true
  threshold_pct: 80     # Trigger at 80% of context window
  keep_turns: 4         # Always keep the last 4 exchanges
  # summary_tier is configured in agents.delegation.compaction (default: fast/Haiku)
```

#### Chat commands

| Command | Description |
|---------|-------------|
| `/compact` | Force compaction of the current session immediately. |

#### Implementation steps

1. Create `src/context/tokens.ts` with `estimateTokens(text: string): number` and `estimateMessageTokens(messages: Message[]): number`.
2. Create `src/context/compaction.ts` with `compactHistory(opts: CompactionOpts): Promise<Message[]>`:
   - Takes messages, orchestrator (for delegation), keep_turns.
   - Calls `orchestrator.delegate({ tier: 'fast', ... })` for the summary.
   - Returns `[summaryMessage, ...recentMessages]`.
3. Add `replaceMessages()` to `SessionStore`.
4. Add `replaceHistory()` to `ManagedSession`.
5. Add compaction config to schema.
6. Wire `compactIfNeeded()` into `AgentOrchestrator.process()` — called before building the request, checks token budget.
7. Add `/compact` command handling in the message router.
8. Tests: token estimation accuracy, compaction trigger logic, history replacement, delegation to fast tier.

#### Model context window sizes

Hard-code a lookup table in `src/context/tokens.ts`:

```typescript
const CONTEXT_WINDOWS: Record<string, number> = {
  'claude-sonnet-4-20250514': 200_000,
  'claude-3-5-haiku-20241022': 200_000,
  'gpt-4o': 128_000,
  'gpt-4o-mini': 128_000,
  // ... etc. (also add the model IDs behind the complex/default/fast tiers in Phase 0)
};
```

Allow override in config: `models.default.context_window: 128000`.
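
The lookup and that override could be resolved together as below. The fallback for unknown models is an assumption (deliberately small, so compaction triggers early rather than overflowing), and `resolveContextWindow`/`compactionBudget` are hypothetical names.

```typescript
const CONTEXT_WINDOWS: Record<string, number> = {
  'claude-sonnet-4-20250514': 200_000,
  'claude-3-5-haiku-20241022': 200_000,
  'gpt-4o': 128_000,
  'gpt-4o-mini': 128_000,
};

// Assumed conservative fallback for models missing from the table.
const DEFAULT_CONTEXT_WINDOW = 32_000;

function resolveContextWindow(model: string, override?: number): number {
  // A positive config value (e.g. models.default.context_window) wins over the table.
  if (override !== undefined && override > 0) return override;
  return CONTEXT_WINDOWS[model] ?? DEFAULT_CONTEXT_WINDOW;
}

// Token budget at which compaction kicks in (threshold_pct from config).
function compactionBudget(model: string, thresholdPct: number, override?: number): number {
  return Math.floor((resolveContextWindow(model, override) * thresholdPct) / 100);
}
```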

---

## Phase 2: Memory System (P0)

### Problem

Flynn has no persistent knowledge across sessions. Every new session starts blank. The agent can't remember user preferences, past decisions, or accumulated knowledge.

### Design

A lightweight memory system with three layers:

1. **Memory files** — Markdown files that the agent can read/write (like OpenClaw's `MEMORY.md`).
2. **Memory tools** — `memory.read`, `memory.write`, `memory.search` builtin tools.
3. **Auto-indexing** — After compaction, key facts are extracted and appended to memory.

#### Storage

Use a dedicated SQLite table in the existing `sessions.db` (or a separate `memory.db`):

```sql
-- DEFAULT (unixepoch()) requires SQLite 3.38.0 or later
CREATE TABLE memory_entries (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  session_id TEXT,            -- NULL for global memories
  namespace TEXT NOT NULL,    -- 'user', 'facts', 'preferences', etc.
  key TEXT NOT NULL,
  content TEXT NOT NULL,
  embedding BLOB,             -- Future: vector embedding for search
  created_at INTEGER NOT NULL DEFAULT (unixepoch()),
  updated_at INTEGER NOT NULL DEFAULT (unixepoch())
);
CREATE INDEX idx_memory_ns ON memory_entries(namespace);
CREATE INDEX idx_memory_session ON memory_entries(session_id);
```

#### Phase 2a: File-based memory (MVP)

The simplest useful memory: a markdown file per namespace in `~/.local/share/flynn/memory/`.

```
~/.local/share/flynn/memory/
├── global.md            # Cross-session knowledge
├── user.md              # User preferences, facts about the user
└── sessions/
    └── {session_id}.md  # Per-session notes
```

#### Memory tools

| Tool | Description |
|------|-------------|
| `memory.read` | Read a memory file by namespace. Args: `{ namespace: string }` |
| `memory.write` | Append to or replace a memory file. Args: `{ namespace: string, content: string, mode: 'append' \| 'replace' }` |
| `memory.search` | Search across all memory files for a keyword. Args: `{ query: string }`. Returns matching lines with context. |

#### Phase 2b: Vector search (future)

Defer vector embeddings and semantic search to a later phase. The file-based approach with keyword search should cover most early use cases.

When implemented:

- Add `sqlite-vec` or similar for vector storage
- Embed memory entries on write using a configured embedding API (Anthropic does not offer an embeddings endpoint, so this needs a separate provider)
- Hybrid search: keyword (BM25) + vector similarity

#### System prompt integration

On every agent turn, inject a `# Memory Context` section into the system prompt:

```
# Memory Context

The following is your persistent memory. Use it to maintain continuity across sessions.

## User
{contents of user.md, truncated to ~1000 tokens}

## Global
{contents of global.md, truncated to ~1000 tokens}
```

This is injected dynamically by the agent before each request, not baked into the static system prompt.
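
Assembling that section might look like the sketch below. It reuses the `chars / 4` heuristic from Phase 1 as a character budget and keeps the *end* of each file on the assumption that appended entries are the newest; `buildMemoryContext` and the truncation marker are hypothetical.

```typescript
const CHARS_PER_TOKEN = 4; // same rough heuristic as the Phase 1 estimate

// Keep the tail: memory files are appended to, so the end is assumed newest.
function truncateToTokens(text: string, maxTokens: number): string {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  if (text.length <= maxChars) return text.trim();
  return '[...earlier entries truncated]\n' + text.slice(text.length - maxChars).trim();
}

function buildMemoryContext(user: string, globalMem: string, maxTokensPerSection = 1000): string {
  const lines: string[] = [
    '# Memory Context',
    '',
    'The following is your persistent memory. Use it to maintain continuity across sessions.',
  ];
  for (const [title, body] of [['User', user], ['Global', globalMem]] as const) {
    if (body.trim() === '') continue; // omit empty namespaces entirely
    lines.push('', `## ${title}`, truncateToTokens(body, maxTokensPerSection));
  }
  return lines.join('\n');
}
```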

#### Auto-extraction after compaction

When compaction runs (Phase 1), add a follow-up step using the **fast tier (Haiku)** via `orchestrator.delegate()`:

1. Along with the summary, delegate to Haiku to extract any **new facts worth remembering** (user preferences, decisions, names, etc.). This is a simple extraction task — no need for Sonnet/Opus.
2. Append extracted facts to `user.md` or `global.md`.

This creates a natural knowledge accumulation loop: conversation → compaction (Haiku) → memory extraction (Haiku) → next session gets richer context.

The cost of these background operations is minimal since they run on the cheapest model tier.

#### New files

| File | Purpose |
|------|---------|
| `src/memory/store.ts` | MemoryStore class — read/write/search markdown files |
| `src/memory/index.ts` | Exports |
| `src/tools/builtin/memory-read.ts` | `memory.read` tool |
| `src/tools/builtin/memory-write.ts` | `memory.write` tool |
| `src/tools/builtin/memory-search.ts` | `memory.search` tool |

#### Changes to existing files

| File | Change |
|------|--------|
| `src/tools/builtin/index.ts` | Register memory tools in `allBuiltinTools` |
| `src/backends/native/orchestrator.ts` | Inject memory context into system prompt before each request |
| `src/context/compaction.ts` | Add memory extraction step after summarisation (delegates to fast tier) |
| `src/daemon/index.ts` | Initialize MemoryStore, pass to orchestrator config |
| `src/config/schema.ts` | Add `memory` config block: `{ enabled, dir, auto_extract, max_context_tokens }` |

#### Config additions

```yaml
memory:
  enabled: true
  dir: ~/.local/share/flynn/memory
  auto_extract: true          # Extract facts during compaction
  max_context_tokens: 2000    # Max tokens injected per turn from memory
```

#### Implementation steps

1. Create `src/memory/store.ts`:
   - `read(namespace): string` — read file contents
   - `write(namespace, content, mode): void` — append or replace
   - `search(query): SearchResult[]` — line-by-line keyword match with context
   - `listNamespaces(): string[]`
2. Create memory tools (3 files).
3. Register tools.
4. Add memory context injection to `AgentOrchestrator` (matching the file-changes table above) — load memory before building the request, inject into system prompt.
5. Add memory extraction to compaction flow.
6. Tests: memory CRUD, search, injection, extraction.
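
A file-backed sketch of the step 1 `MemoryStore`, assuming synchronous `node:fs` calls are acceptable for small files and that a namespace such as `sessions/{session_id}` maps to `sessions/{session_id}.md` under the memory directory. The traversal check and the one-line-either-side search context are choices made for this sketch.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

interface SearchResult {
  namespace: string;
  line: number; // 1-based line number of the match
  context: string; // the matching line plus one line either side
}

class MemoryStore {
  constructor(private readonly dir: string = path.join(os.homedir(), '.local', 'share', 'flynn', 'memory')) {}

  // 'user' -> <dir>/user.md, 'sessions/abc' -> <dir>/sessions/abc.md
  private fileFor(namespace: string): string {
    // Reject absolute paths, empty segments, '.' and '..' so files stay inside `dir`.
    if (namespace.split('/').some((seg) => seg === '' || seg === '.' || seg === '..')) {
      throw new Error(`invalid namespace: ${namespace}`);
    }
    return path.join(this.dir, `${namespace}.md`);
  }

  read(namespace: string): string {
    const file = this.fileFor(namespace);
    return fs.existsSync(file) ? fs.readFileSync(file, 'utf8') : '';
  }

  write(namespace: string, content: string, mode: 'append' | 'replace'): void {
    const file = this.fileFor(namespace);
    fs.mkdirSync(path.dirname(file), { recursive: true });
    const body = content.endsWith('\n') ? content : content + '\n';
    if (mode === 'replace') {
      fs.writeFileSync(file, body);
      return;
    }
    // Append on a fresh line even if the file lacks a trailing newline.
    const existing = this.read(namespace);
    const sep = existing === '' || existing.endsWith('\n') ? '' : '\n';
    fs.appendFileSync(file, sep + body);
  }

  listNamespaces(): string[] {
    const found: string[] = [];
    const walk = (rel: string): void => {
      for (const entry of fs.readdirSync(path.join(this.dir, rel), { withFileTypes: true })) {
        const relPath = rel === '' ? entry.name : `${rel}/${entry.name}`;
        if (entry.isDirectory()) walk(relPath);
        else if (entry.name.endsWith('.md')) found.push(relPath.slice(0, -'.md'.length));
      }
    };
    if (fs.existsSync(this.dir)) walk('');
    return found.sort();
  }

  // Case-insensitive substring match, line by line, across every namespace.
  search(query: string): SearchResult[] {
    const needle = query.toLowerCase();
    const results: SearchResult[] = [];
    for (const namespace of this.listNamespaces()) {
      const lines = this.read(namespace).split('\n');
      lines.forEach((text, i) => {
        if (text.toLowerCase().includes(needle)) {
          const context = lines.slice(Math.max(0, i - 1), i + 2).join('\n');
          results.push({ namespace, line: i + 1, context });
        }
      });
    }
    return results;
  }
}
```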

---

## Phase 3: Messaging Channels (P1)

### Problem

Flynn has only Telegram and WebChat. The three most requested channels are WhatsApp, Discord, and Slack.

### Design approach

Flynn's `ChannelAdapter` interface (`src/channels/types.ts:51-69`) is clean and well-defined. Adding a new channel means:

1. Implement `ChannelAdapter` (six members: `name`, `status`, `connect()`, `disconnect()`, `send()`, `onMessage()`).
2. Add config section.
3. Register in daemon startup.

Each channel is independent — implement in any order.

### 3a: Discord

**Library:** `discord.js` v14

**Effort:** 1–2 days

#### Config

```yaml
discord:
  bot_token: ${DISCORD_BOT_TOKEN}
  allowed_guild_ids: []       # Empty = all guilds
  allowed_channel_ids: []     # Empty = all channels
```

#### New files

| File | Purpose |
|------|---------|
| `src/channels/discord/adapter.ts` | DiscordAdapter implementing ChannelAdapter |
| `src/channels/discord/index.ts` | Exports |

#### Key decisions

- **Peer ID:** Use `channelId` (not `userId`) so the agent maintains separate sessions per Discord channel.
- **Message chunking:** Discord has a 2000-char limit. Chunk long responses.
- **Mentions:** Only respond when mentioned (`@Flynn`) or in DMs. Configurable.
- **Slash commands:** Register `/reset` and `/status` as Discord slash commands.

#### Implementation steps

1. Add `discord.js` dependency.
2. Create `DiscordAdapter` class.
3. Add config schema for `discord` section.
4. Register in daemon if `config.discord.bot_token` is set.
5. Export from `src/channels/index.ts`.
6. Test with a bot in a private server.

### 3b: Slack

**Library:** `@slack/bolt` (Bolt for JavaScript)

**Effort:** 1–2 days

#### Config

```yaml
slack:
  bot_token: ${SLACK_BOT_TOKEN}
  app_token: ${SLACK_APP_TOKEN}          # For Socket Mode
  signing_secret: ${SLACK_SIGNING_SECRET}
  allowed_channel_ids: []
```

#### New files

| File | Purpose |
|------|---------|
| `src/channels/slack/adapter.ts` | SlackAdapter implementing ChannelAdapter |
| `src/channels/slack/index.ts` | Exports |

#### Key decisions

- **Socket Mode** for self-hosted deployments (no public URL needed). Falls back to HTTP events if `app_token` not set.
- **Peer ID:** `channelId:threadTs` to isolate threaded conversations.
- **Message chunking:** Slack truncates a message's `text` beyond 40,000 characters, and a single section block's text is capped at 3,000 characters, so chunk long responses. Use `mrkdwn` formatting.
- **Slash commands:** `/flynn-reset`, `/flynn-status`.

### 3c: WhatsApp

**Library:** `whatsapp-web.js` (drives WhatsApp Web in a headless browser via Puppeteer) or `@whiskeysockets/baileys` (speaks the WhatsApp Web WebSocket protocol directly, no browser)

**Effort:** 2–3 days (more complex due to QR auth)

#### Config

```yaml
whatsapp:
  auth_dir: ~/.local/share/flynn/whatsapp-auth
  allowed_numbers: []   # E.164 format, empty = all
```

#### Key decisions

- **Auth flow:** WhatsApp Web requires QR code scanning on first connect. Display QR in terminal on startup.
- **Session persistence:** Store auth state in `auth_dir` so re-auth isn't needed on restart.
- **Peer ID:** Phone number (E.164).
- **Media:** Start with text-only; defer image/audio handling.

**WhatsApp is the most complex channel.** Consider doing Discord and Slack first, then WhatsApp.

### Shared channel infrastructure

Before implementing individual channels, extract any common patterns:

1. **Message chunking utility** — `src/channels/utils/chunking.ts`: `chunkMessage(text: string, maxLen: number): string[]`
2. **Allowlist checking** — `src/channels/utils/auth.ts`: `isAllowed(senderId: string, allowlist: string[]): boolean`
3. **Markdown adaptation** — `src/channels/utils/markdown.ts`: Platform-specific markdown conversion (Discord uses different syntax from Telegram).
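
Helpers 1 and 2 are small enough to sketch directly. This `chunkMessage` breaks after the last newline that fits within `maxLen` and hard-splits only when a single line is too long, so concatenating the chunks reproduces the input exactly. It does not try to keep fenced code blocks intact, which a real adapter might want, and lengths are counted in UTF-16 code units.

```typescript
function chunkMessage(text: string, maxLen: number): string[] {
  if (maxLen <= 0) throw new Error('maxLen must be positive');
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > maxLen) {
    // Break just after the last newline that fits; otherwise hard-split at maxLen.
    const nl = rest.lastIndexOf('\n', maxLen - 1);
    const cut = nl >= 0 ? nl + 1 : maxLen;
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut);
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

// An empty allowlist means "allow everyone", matching the channel config comments.
function isAllowed(senderId: string, allowlist: string[]): boolean {
  return allowlist.length === 0 || allowlist.includes(senderId);
}
```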

---

## Phase 4: Web Search Tool (P1)

### Problem

The agent has no way to search the web. This is one of the most commonly used agent tools.

### Design

#### Provider options

| Provider | Pros | Cons |
|----------|------|------|
| **Brave Search API** | Free tier (2k/month), clean API, good results | Requires API key signup |
| **SearXNG** | Self-hosted, no API key, already running in homelab | Result quality varies |
| **Tavily** | Purpose-built for AI agents, great results | Paid only |
| **DuckDuckGo** | No API key needed | Unofficial API, rate limits |

**Recommendation:** Support Brave as primary, SearXNG as self-hosted alternative. Make the provider configurable.

#### Config

```yaml
tools:
  web_search:
    provider: brave                    # brave | searxng | tavily
    api_key: ${BRAVE_SEARCH_API_KEY}
    endpoint: null                     # Override for SearXNG: http://searxng:8080
    max_results: 5
```

#### New files

| File | Purpose |
|------|---------|
| `src/tools/builtin/web-search.ts` | `web.search` tool |

#### Tool interface

```typescript
{
  name: 'web.search',
  description: 'Search the web for information. Returns titles, URLs, and snippets.',
  inputSchema: {
    type: 'object',
    properties: {
      query: { type: 'string', description: 'Search query' },
      count: { type: 'number', description: 'Number of results (default 5, max 20)' },
    },
    required: ['query'],
  },
}
```

#### Output format

```
1. **Title** — url
   Snippet text...

2. **Title** — url
   Snippet text...
```

Structured as markdown so the model can easily parse and reference results.

#### Implementation steps

1. Create `src/tools/builtin/web-search.ts`.
2. Add Brave Search API client (simple `fetch` — no SDK needed).
3. Add SearXNG support as alternative backend.
4. Add tool config section to schema.
5. Register in `allBuiltinTools`.
6. Tests: mock API responses, result formatting.
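
A sketch of steps 2–3 using plain `fetch` (Node 18+). It assumes Brave's `GET https://api.search.brave.com/res/v1/web/search` endpoint with the `X-Subscription-Token` header and a `web.results[]` array of `{ title, url, description }`, and SearXNG's `/search?format=json` endpoint (JSON output must be enabled in SearXNG's settings) returning `results[]` of `{ title, url, content }`; both should be checked against current API docs. Parsing and formatting are kept separate from the network calls so step 6 can test them with mocked responses.

```typescript
interface SearchHit {
  title: string;
  url: string;
  snippet: string;
}

interface BraveResponse {
  web?: { results?: { title: string; url: string; description?: string }[] };
}

interface SearxngResponse {
  results?: { title: string; url: string; content?: string }[];
}

function parseBrave(body: BraveResponse): SearchHit[] {
  return (body.web?.results ?? []).map((r) => ({ title: r.title, url: r.url, snippet: r.description ?? '' }));
}

function parseSearxng(body: SearxngResponse): SearchHit[] {
  return (body.results ?? []).map((r) => ({ title: r.title, url: r.url, snippet: r.content ?? '' }));
}

// Numbered list: bold title, URL, indented snippet (the output format above).
function formatResults(hits: SearchHit[]): string {
  if (hits.length === 0) return 'No results found.';
  return hits.map((h, i) => `${i + 1}. **${h.title}** \u2014 ${h.url}\n   ${h.snippet}`).join('\n\n');
}

async function braveSearch(query: string, apiKey: string, count = 5): Promise<SearchHit[]> {
  const url = new URL('https://api.search.brave.com/res/v1/web/search');
  url.searchParams.set('q', query);
  url.searchParams.set('count', String(Math.min(Math.max(count, 1), 20)));
  const res = await fetch(url, {
    headers: { Accept: 'application/json', 'X-Subscription-Token': apiKey },
  });
  if (!res.ok) throw new Error(`Brave search failed: HTTP ${res.status}`);
  return parseBrave((await res.json()) as BraveResponse).slice(0, count);
}

async function searxngSearch(query: string, endpoint: string, count = 5): Promise<SearchHit[]> {
  const url = new URL('/search', endpoint); // e.g. http://searxng:8080/search
  url.searchParams.set('q', query);
  url.searchParams.set('format', 'json');
  const res = await fetch(url);
  if (!res.ok) throw new Error(`SearXNG search failed: HTTP ${res.status}`);
  return parseSearxng((await res.json()) as SearxngResponse).slice(0, count);
}
```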

---

## Phase 5: Background Exec / Process Management (P1)

### Problem

Flynn's `shell.exec` (`src/tools/builtin/shell.ts`) is blocking: it runs a command, waits for it to finish (up to a 30s timeout), and returns stdout/stderr. There's no way to:

- Run a long-running process (e.g., `npm run dev`)
- Check on a running process
- Read its ongoing output
- Kill it

### Design

Add a `process` tool family that manages background processes:

| Tool | Description |
|------|-------------|
| `process.start` | Start a command in the background. Returns a process ID. |
| `process.status` | Check if a process is running, exited, or errored. |
| `process.output` | Read recent stdout/stderr from a background process. |
| `process.kill` | Kill a background process. |
| `process.list` | List all managed background processes. |

#### Process manager

Create a `ProcessManager` class that maintains a registry of spawned processes:

```typescript
interface ManagedProcess {
  id: string;
  command: string;
  cwd?: string;
  pid: number;
  status: 'running' | 'exited' | 'killed' | 'error';
  exitCode?: number;
  outputBuffer: RingBuffer; // Last N bytes of combined stdout+stderr
  startedAt: number;
}
```

#### Output buffering

Use a ring buffer (circular buffer) to keep the last 64KB of output per process. This bounds memory use for long-running processes with verbose output: older output is overwritten rather than accumulated.
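
If we write the ring buffer ourselves rather than use a package, a byte-oriented version is short. This sketch keeps the most recent `capacity` bytes; after wrap-around the oldest retained byte can fall in the middle of a multi-byte UTF-8 character, which `TextDecoder` renders as a replacement character.

```typescript
class RingBuffer {
  private readonly buf: Uint8Array;
  private start = 0; // index of the oldest retained byte
  private length = 0; // number of valid bytes (<= capacity)

  constructor(readonly capacity = 64 * 1024) {
    this.buf = new Uint8Array(capacity);
  }

  write(data: Uint8Array): void {
    // Only the last `capacity` bytes of an oversized write can survive anyway.
    const chunk = data.length > this.capacity ? data.subarray(data.length - this.capacity) : data;
    for (let i = 0; i < chunk.length; i++) {
      const end = (this.start + this.length) % this.capacity;
      this.buf[end] = chunk[i];
      if (this.length < this.capacity) {
        this.length++;
      } else {
        // Buffer full: we just overwrote the oldest byte, so advance the start.
        this.start = (this.start + 1) % this.capacity;
      }
    }
  }

  // Oldest-to-newest copy of the retained bytes.
  snapshot(): Uint8Array {
    const out = new Uint8Array(this.length);
    for (let i = 0; i < this.length; i++) {
      out[i] = this.buf[(this.start + i) % this.capacity];
    }
    return out;
  }

  toString(): string {
    return new TextDecoder().decode(this.snapshot());
  }
}
```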

#### Safety

- **Max processes:** Limit to 10 concurrent background processes.
- **Auto-cleanup:** Kill processes that have been running for more than 1 hour (configurable).
- **Shutdown cleanup:** Kill all managed processes on daemon shutdown.
- **Hook integration:** `process.start` should go through the confirmation engine (same as `shell.exec`).
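
A compressed sketch of `ProcessManager` covering the concurrency limit, runtime cap, and shutdown rules above. It assumes a POSIX host: each command runs in its own process group (`detached: true`) so that killing the negative PID reaches the shell's children too. For brevity it keeps output as a trimmed string instead of a ring buffer, and it leaves confirmation-engine integration to the tool layer. All names and options are illustrative.

```typescript
import { spawn, ChildProcess } from 'node:child_process';
import { randomUUID } from 'node:crypto';
import { setTimeout } from 'node:timers';

type ProcStatus = 'running' | 'exited' | 'killed' | 'error';

interface ManagedProcess {
  id: string;
  command: string;
  cwd?: string;
  pid: number;
  status: ProcStatus;
  exitCode?: number;
  output: string; // simplified stand-in for a ring buffer: last N characters
  startedAt: number;
}

interface ProcessManagerOptions {
  maxConcurrent: number; // e.g. 10
  maxRuntimeMs: number; // e.g. 60 * 60 * 1000
  maxOutputChars: number;
}

class ProcessManager {
  private readonly procs = new Map<string, { info: ManagedProcess; child: ChildProcess }>();

  constructor(private readonly opts: ProcessManagerOptions) {}

  start(command: string, cwd?: string): ManagedProcess {
    const running = this.list().filter((p) => p.status === 'running').length;
    if (running >= this.opts.maxConcurrent) {
      throw new Error(`limit of ${this.opts.maxConcurrent} background processes reached`);
    }
    // detached: the shell becomes a process-group leader, so kill(-pid) reaches its children.
    const child = spawn(command, { cwd, shell: true, detached: true, stdio: ['ignore', 'pipe', 'pipe'] });
    if (child.pid === undefined) {
      child.once('error', () => {}); // the async spawn error is surfaced by the throw below
      throw new Error(`failed to spawn: ${command}`);
    }
    const info: ManagedProcess = {
      id: randomUUID(),
      command,
      cwd,
      pid: child.pid,
      status: 'running',
      output: '',
      startedAt: Date.now(),
    };
    const append = (data: Buffer): void => {
      info.output = (info.output + data.toString()).slice(-this.opts.maxOutputChars);
    };
    child.stdout?.on('data', append);
    child.stderr?.on('data', append);
    child.on('error', () => {
      info.status = 'error';
    });
    child.on('exit', (code) => {
      if (info.status === 'running') {
        info.status = 'exited';
        info.exitCode = code ?? undefined;
      }
    });
    // Auto-cleanup at the runtime cap; unref() so the timer never keeps the daemon alive.
    setTimeout(() => this.kill(info.id), this.opts.maxRuntimeMs).unref();
    this.procs.set(info.id, { info, child });
    return info;
  }

  kill(id: string): void {
    const entry = this.procs.get(id);
    if (!entry || entry.info.status !== 'running') return;
    entry.info.status = 'killed';
    try {
      process.kill(-entry.info.pid, 'SIGTERM'); // negative pid targets the whole process group
    } catch {
      // The group has already exited.
    }
  }

  list(): ManagedProcess[] {
    return Array.from(this.procs.values(), (p) => p.info);
  }

  // Intended for the daemon's shutdown handler.
  killAll(): void {
    Array.from(this.procs.keys()).forEach((id) => this.kill(id));
  }
}
```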

#### New files

| File | Purpose |
|------|---------|
| `src/tools/builtin/process/manager.ts` | ProcessManager class |
| `src/tools/builtin/process/start.ts` | `process.start` tool |
| `src/tools/builtin/process/status.ts` | `process.status` tool |
| `src/tools/builtin/process/output.ts` | `process.output` tool |
| `src/tools/builtin/process/kill.ts` | `process.kill` tool |
| `src/tools/builtin/process/list.ts` | `process.list` tool |
| `src/tools/builtin/process/index.ts` | Exports |

#### Changes to existing files

| File | Change |
|------|--------|
| `src/tools/builtin/index.ts` | Register process tools |
| `src/daemon/index.ts` | Create ProcessManager, pass to tool constructors, register shutdown handler |
| `src/config/schema.ts` | Add `process` config: `{ max_concurrent, max_runtime_minutes, buffer_size }` |

#### Implementation steps

1. Implement `RingBuffer` utility (or use an npm package like `ringbufferjs`).
2. Create `ProcessManager` class with spawn, track, kill, cleanup methods.
3. Implement 5 process tools.
4. Register tools and wire shutdown cleanup.
5. Tests: spawn + kill lifecycle, output buffering, max process limits.

---

## Phase 6: Enhanced web_fetch (P1)

### Problem

Flynn's `web.fetch` (`src/tools/builtin/web-fetch.ts:19-50`) is a bare `fetch()` call that returns raw HTML. This is nearly useless for LLMs — they need extracted text/markdown, not raw HTML with scripts and styles.

### Design

#### Enhancements

1. **HTML-to-markdown extraction** — Strip scripts/styles, convert to markdown using `@mozilla/readability` + `turndown`.
2. **Format parameter** — Let the agent choose: `text`, `markdown` (default), or `html`.
3. **Response caching** — Cache fetched pages for 5 minutes to avoid redundant requests in tool loops.
4. **Redirect following** — `fetch()` already follows redirects (Node's built-in `fetch` gives up after 20). Enforcing a lower limit requires `redirect: 'manual'` and following `Location` headers ourselves.
5. **Content type handling** — Return JSON prettified, plain text as-is, HTML converted.
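
Enhancement 5 amounts to dispatching on the `Content-Type` header. In this sketch the HTML path is passed in as a function (in Flynn it would be the Readability + Turndown pipeline); `renderBody` and its signature are hypothetical.

```typescript
type Format = 'markdown' | 'text' | 'html';
type HtmlConverter = (html: string, format: Exclude<Format, 'html'>) => string;

function renderBody(contentType: string | null, body: string, format: Format, convertHtml: HtmlConverter): string {
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const mime = (contentType ?? '').split(';')[0].trim().toLowerCase();

  if (mime === 'application/json' || mime.endsWith('+json')) {
    try {
      return JSON.stringify(JSON.parse(body), null, 2); // prettified JSON
    } catch {
      return body; // malformed JSON: return as-is rather than fail the tool call
    }
  }
  if (mime === 'text/html' || mime === 'application/xhtml+xml') {
    return format === 'html' ? body : convertHtml(body, format);
  }
  return body; // text/plain and anything else: pass through unchanged
}
```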

#### Libraries

| Package | Purpose |
|---------|---------|
| `turndown` | HTML → Markdown converter |
| `linkedom` | Lightweight DOM implementation (for Readability) |
| `@mozilla/readability` | Extract article content from HTML |

Using `linkedom` instead of `jsdom` — it's much lighter and sufficient for content extraction.

#### Tool interface update

```typescript
{
  name: 'web.fetch',
  description: 'Fetch a URL and extract its content. Returns clean text/markdown by default, not raw HTML.',
  inputSchema: {
    type: 'object',
    properties: {
      url: { type: 'string', description: 'The URL to fetch' },
      format: { type: 'string', enum: ['markdown', 'text', 'html'], description: 'Output format (default: markdown)' },
      timeout: { type: 'number', description: 'Timeout in milliseconds (default 15000)' },
    },
    required: ['url'],
  },
}
```

#### Caching

Simple in-memory cache with TTL:

```typescript
const cache = new Map<string, { content: string; timestamp: number }>();
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes
```
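
Wrapping that map in a small class keeps expiry in one place. The injectable clock (for tests) and the size cap with oldest-first eviction are additions of this sketch rather than requirements from the plan; keying by URL plus format would keep the markdown and HTML variants of a page apart.

```typescript
class TtlCache {
  private readonly entries = new Map<string, { content: string; timestamp: number }>();

  constructor(
    private readonly ttlMs = 5 * 60 * 1000, // 5 minutes, as above
    private readonly maxEntries = 100,
    private readonly now: () => number = Date.now,
  ) {}

  get(key: string): string | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() - hit.timestamp >= this.ttlMs) {
      this.entries.delete(key); // expired: drop it so the map doesn't grow stale
      return undefined;
    }
    return hit.content;
  }

  set(key: string, content: string): void {
    this.entries.delete(key); // re-insert so Map order reflects the latest write
    this.entries.set(key, { content, timestamp: this.now() });
    if (this.entries.size > this.maxEntries) {
      // Maps iterate in insertion order, so the first key is the oldest write.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```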
|
||
|
||
#### Changes to existing files

| File | Change |
|------|--------|
| `src/tools/builtin/web-fetch.ts` | Major rewrite: add extraction, caching, format parameter |

#### Implementation steps

1. Add the `turndown`, `linkedom`, and `@mozilla/readability` dependencies (plus `@types/turndown`, since `turndown` does not bundle type definitions).
2. Create the extraction pipeline: fetch → parse DOM → readability → turndown → clean markdown.
3. Add format parameter and content-type handling.
4. Add response caching and the redirect limit.
5. Update the tool description to reflect the new capabilities.
6. Tests: extraction from sample HTML, caching behaviour, format handling.

---

## Implementation Order

```
Week 1: Phase 0 (Multi-Model Delegation) ─────────────────────── P0 (foundational)
Week 2: Phase 1 (Context Compaction) ─────────────────────────── P0 (uses delegation)
Week 3: Phase 2 (Memory System) ──────────────────────────────── P0 (uses delegation)
Week 4: Phase 4 (Web Search) + Phase 6 (Enhanced web_fetch) ─── P1 (quick wins)
Week 5: Phase 5 (Process Management) ─────────────────────────── P1
Week 6+: Phase 3 (Channels: Discord → Slack → WhatsApp) ──────── P1
```

**Rationale:**

- **Delegation first**: Phase 0 is foundational. Compaction and memory both need to delegate subtasks to cheaper models, so building the orchestrator first means Phases 1 and 2 can use it immediately.
- Compaction and memory are sequential (memory extraction depends on compaction).
- Web search and enhanced web_fetch are small, independent, and immediately useful, so they work well as palate cleansers between the big features.
- Process management is self-contained.
- Channels are the largest body of work, but each channel is independent, so they can be built in parallel or interleaved.

### Model usage across all phases

| Phase | Primary model (user-facing) | Delegated tasks | Delegation tier |
|-------|---------------------------|-----------------|-----------------|
| 0 | Sonnet (default) | Sub-agent infrastructure | N/A (infrastructure) |
| 1 | Sonnet (default) | Compaction summaries | Haiku (fast) |
| 2 | Sonnet (default) | Memory fact extraction | Haiku (fast) |
| 3 | Sonnet (default) | Message classification, markdown adaptation | Haiku (fast) |
| 4 | Sonnet (default) | None (direct API call) | N/A |
| 5 | Sonnet (default) | None | N/A |
| 6 | Sonnet (default) | None | N/A |

Opus (complex) is reserved for **user-facing tasks** that require deep reasoning; it's never used for background operations.

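
The tier assignments in the table, together with the rule that Opus never runs background work, could be enforced at the point where a delegated task picks its model. In this sketch the task names, `DELEGATION_TIERS`, `tierForBackgroundTask`, and the `isAvailable` probe are hypothetical. The tier names follow the `ModelRouter` tiers (`fast`/`default`/`complex`/`local`), and the per-task override and the fast → default fallback mirror mitigations listed under Risk Assessment.

```typescript
// Sketch: pick the model tier for a delegated background task.
// Task names, DELEGATION_TIERS and tierForBackgroundTask are hypothetical.
type Tier = 'fast' | 'default' | 'complex' | 'local';
type BackgroundTask =
  | 'compaction_summary'
  | 'memory_extraction'
  | 'message_classification'
  | 'markdown_adaptation';

// Mirrors the "Delegation tier" column: every delegated task runs on Haiku (fast).
const DELEGATION_TIERS: Record<BackgroundTask, Tier> = {
  compaction_summary: 'fast',
  memory_extraction: 'fast',
  message_classification: 'fast',
  markdown_adaptation: 'fast',
};

export interface TierOptions {
  override?: Tier; // per-task override from config
  isAvailable?: (tier: Tier) => boolean; // e.g. false during a rate limit or outage
}

export function tierForBackgroundTask(task: BackgroundTask, opts: TierOptions = {}): Tier {
  const tier = opts.override ?? DELEGATION_TIERS[task];
  // Opus (complex) is reserved for user-facing work, never background operations.
  if (tier === 'complex') {
    throw new Error(`complex tier is not allowed for background task "${task}"`);
  }
  if (tier === 'fast' && opts.isAvailable !== undefined && !opts.isAvailable('fast')) {
    // Log the fallback, since it raises the cost of the delegated task.
    console.warn(`fast tier unavailable for "${task}"; falling back to default (higher cost)`);
    return 'default';
  }
  return tier;
}
```
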
---

## Testing Strategy

Each phase should include:

1. **Unit tests**: Pure logic (token estimation, ring buffer, markdown extraction, memory search).
2. **Integration tests**: Tool execution with mocked model responses.
3. **Manual smoke test**: Run via the TUI and Telegram to verify end-to-end behaviour.

Key test files to create:

| Test file | Covers |
|-----------|--------|
| `src/backends/native/orchestrator.test.ts` | Delegation routing, tier selection, depth limiting, cost tracking |
| `src/context/tokens.test.ts` | Token estimation accuracy |
| `src/context/compaction.test.ts` | Compaction trigger logic, summary replacement, fast-tier delegation |
| `src/memory/store.test.ts` | Memory CRUD, search |
| `src/tools/builtin/web-search.test.ts` | API mocking, result formatting |
| `src/tools/builtin/process/manager.test.ts` | Process lifecycle, cleanup |
| `src/tools/builtin/web-fetch.test.ts` | HTML extraction, caching |

---

## Risk Assessment

| Risk | Impact | Mitigation |
|------|--------|------------|
| Haiku summaries lose critical context vs Sonnet | High | Validate quality; use detailed extraction prompts; allow per-task tier override in config |
| Delegation depth spirals (agent delegates to agent that delegates...) | Medium | Hard limit `max_delegation_depth: 3`; sub-agents cannot spawn sub-agents |
| Fast tier unavailable (Haiku rate limit / outage) | Medium | Fallback to default tier for delegation; log the fallback cost increase |
| Compaction summaries lose critical context | High | Keep last 4 turns intact; allow user to adjust `keep_turns`; log what was compacted |
| Memory injection bloats system prompt | Medium | Hard cap on injected memory tokens; truncate oldest entries |
| WhatsApp auth flow is fragile | Medium | Defer WhatsApp to last; use battle-tested Baileys library |
| Brave Search free tier limits (2k/month) | Low | SearXNG as free self-hosted fallback |
| Background processes leak resources | Medium | Max process limit, auto-kill timeout, shutdown cleanup |
| HTML extraction fails on JS-heavy sites | Low | Accept graceful degradation; defer CDP/browser fallback to P3 |
