flynn/docs/plans/2026-02-05-openclaw-parity-design.md

# OpenClaw Parity Design: Flynn Feature Roadmap

## Overview

This plan evolves Flynn from a multi-model chat wrapper into a full self-hosted OpenClaw alternative. The approach is bottom-up: tools first, then the gateway, then channels, then skills and advanced features.

**Build approach**: Use Sonnet/Haiku via GitHub Copilot for implementation subagents.

**Current state**: Flynn v0.1.0 covers ~22% of OpenClaw features. Strong in model routing (4 providers, tiered fallback) and session management (SQLite, transfer). Zero tool execution capability.

---

## Phase 1: Agent Tool Framework + Agent Loop

**Goal**: Turn Flynn from a chatbot into an agent that can execute tools with multi-step reasoning.

### 1.1 Tool Definition System

```
src/tools/
├── types.ts          # Tool, ToolCall, ToolResult interfaces
├── registry.ts       # ToolRegistry: register, lookup, list, serialize for providers
├── executor.ts       # ToolExecutor: run tools, enforce hooks, timeout, truncation
├── builtin/
│   ├── shell.ts      # shell.exec - run bash commands (cwd, timeout, max output)
│   ├── file-read.ts  # file.read - read file contents (path, offset, limit)
│   ├── file-write.ts # file.write - write/create files (path, content)
│   ├── file-edit.ts  # file.edit - find-and-replace (path, oldString, newString)
│   ├── file-list.ts  # file.list - glob/list directory (pattern, path)
│   └── web-fetch.ts  # web.fetch - HTTP GET with markdown conversion (url, format)
```

**Tool interface**:
```typescript
interface Tool {
  name: string;                           // e.g. "shell.exec"
  description: string;                    // For the model's tool selection
  inputSchema: Record<string, unknown>;   // JSON Schema for parameters
  execute(args: unknown): Promise<ToolResult>;
}

interface ToolResult {
  success: boolean;
  output: string;
  error?: string;
}

interface ToolCall {
  id: string;          // Provider-assigned call ID
  name: string;        // Tool name
  args: unknown;       // Parsed arguments
}
```

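**ToolRegistry**: Collects tools and serializes them to Anthropic format (`{ name, description, input_schema }`) or OpenAI format (`{ type: "function", function: { name, description, parameters } }`). Both providers restrict tool names to letters, digits, underscores, and hyphens, so dotted names such as `shell.exec` must be mapped to a provider-safe form (for example `shell_exec`) during serialization and mapped back when tool calls are parsed.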
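A minimal sketch of the registry, assuming a simple in-memory map. The `toWireName` helper and the `toAnthropic`/`toOpenAI` method names are illustrative and not part of the plan. The dot-to-underscore mapping is one possible convention for satisfying the providers' tool-name restrictions.

```typescript
// Sketch only: names beyond register/lookup/list are assumptions.
interface ToolResult { success: boolean; output: string; error?: string }
interface Tool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  execute(args: unknown): Promise<ToolResult>;
}

// Assumed convention: "shell.exec" travels on the wire as "shell_exec".
const toWireName = (name: string): string => name.replace(/\./g, "_");

class ToolRegistry {
  private byWireName = new Map<string, Tool>();

  register(tool: Tool): void {
    const wire = toWireName(tool.name);
    if (this.byWireName.has(wire)) throw new Error(`Duplicate tool: ${tool.name}`);
    this.byWireName.set(wire, tool);
  }

  // Accepts either the dotted name or the wire name returned by a provider.
  lookup(name: string): Tool | undefined {
    return this.byWireName.get(toWireName(name));
  }

  list(): Tool[] {
    return Array.from(this.byWireName.values());
  }

  toAnthropic() {
    return this.list().map((t) => ({
      name: toWireName(t.name),
      description: t.description,
      input_schema: t.inputSchema,
    }));
  }

  toOpenAI() {
    return this.list().map((t) => ({
      type: "function" as const,
      function: {
        name: toWireName(t.name),
        description: t.description,
        parameters: t.inputSchema,
      },
    }));
  }
}
```

Keying the map by wire name means a tool call coming back from either provider resolves with a single `lookup`, without a separate reverse table.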

**ToolExecutor**: Wraps execution with:
- Hook engine check (confirm/log/silent) before execution
- Configurable timeout (default 30s for shell, 10s for file ops)
- Output truncation (max 50KB, with "truncated" marker)
- Error capture (catches exceptions, returns as ToolResult.error)
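The four responsibilities above could be combined roughly as follows. This is a sketch under assumptions: the `allow` callback stands in for the hook engine, whose real interface is not defined here, and truncation counts characters rather than bytes for simplicity.

```typescript
interface ToolResult { success: boolean; output: string; error?: string }
interface Tool { name: string; execute(args: unknown): Promise<ToolResult> }

// Approximates the 50KB cap by character count (assumption).
const MAX_OUTPUT_CHARS = 50 * 1024;

// Hypothetical hook shape: resolves to true when the call may proceed.
type HookCheck = (tool: Tool, args: unknown) => Promise<boolean>;

function truncate(output: string, max: number = MAX_OUTPUT_CHARS): string {
  return output.length > max ? output.slice(0, max) + "\n[truncated]" : output;
}

// Rejects if the promise does not settle within `ms`.
// Note: this does not cancel the underlying work.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function executeTool(
  tool: Tool,
  args: unknown,
  opts: { timeoutMs: number; allow?: HookCheck },
): Promise<ToolResult> {
  try {
    if (opts.allow && !(await opts.allow(tool, args))) {
      return { success: false, output: "", error: "Denied by hook" };
    }
    const result = await withTimeout(tool.execute(args), opts.timeoutMs);
    return { ...result, output: truncate(result.output) };
  } catch (err) {
    // Exceptions and timeouts are reported as a failed ToolResult.
    return { success: false, output: "", error: err instanceof Error ? err.message : String(err) };
  }
}
```

Because the timeout only stops waiting, tools that spawn processes (such as `shell.exec`) would still need their own kill logic.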

### 1.2 Model Provider Tool Support

Update `ChatRequest` to accept optional `tools: Tool[]`.

**Anthropic** (`src/models/anthropic.ts`):
- Pass tools as `tools` parameter to `messages.create()`/`messages.stream()`
- Parse `tool_use` content blocks from response (id, name, input)
- Accept `tool_result` content blocks in messages (tool_use_id, content)
- Handle `stop_reason: "tool_use"` to signal tool call response

**OpenAI** (`src/models/openai.ts`):
- Pass tools as `tools` parameter with `type: "function"` wrapper
- Parse `tool_calls` from response choices
- Accept `role: "tool"` messages with `tool_call_id`
- Handle `finish_reason: "tool_calls"` to signal tool call response

**Types updates** (`src/models/types.ts`):
- Add `ToolCall` to `ChatResponse`
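- Add `tool_use` and `tool_result` message variants (Anthropic models these as content blocks; OpenAI uses `tool_calls` on the assistant message plus `role: "tool"` messages)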
- Add `tools` to `ChatRequest`
- Add `tool_calls` stream event type
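To illustrate how the two provider formats might be normalized into the shared `ToolCall` type, here is a small sketch. The `fromAnthropic` and `fromOpenAI` function names are hypothetical. The input shapes follow the provider response fields listed above, reduced to the parts needed here.

```typescript
interface ToolCall { id: string; name: string; args: unknown }

// Subset of Anthropic content blocks relevant to tool use.
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

// Subset of an OpenAI chat completion tool call.
interface OpenAIToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

function fromAnthropic(content: AnthropicBlock[]): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const block of content) {
    if (block.type === "tool_use") {
      calls.push({ id: block.id, name: block.name, args: block.input });
    }
  }
  return calls;
}

function fromOpenAI(toolCalls: OpenAIToolCall[] = []): ToolCall[] {
  // JSON.parse throws on malformed arguments; a caller would likely
  // convert that into a failed ToolResult rather than crash the loop.
  return toolCalls.map((c) => ({
    id: c.id,
    name: c.function.name,
    args: JSON.parse(c.function.arguments),
  }));
}
```

With both providers reduced to `ToolCall[]`, the agent loop only needs to check whether the array is empty to decide between executing tools and returning text.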

### 1.3 Agent Loop

Replace single-turn NativeAgent with iterative tool-use loop:

```
User message
  -> Model call (with tools in request)
  -> If response contains tool_use:
       -> For each tool call:
            -> Check hooks (confirm/log/silent)
            -> If confirm: wait for approval (Telegram inline keyboard / TUI prompt)
            -> Execute tool via ToolExecutor
            -> Collect ToolResult
       -> Append tool_results to conversation
       -> Loop back to model call
  -> If text response (no tool_use):
       -> Return final response to user
  -> If max iterations reached (default 10):
       -> Return partial response with warning
```
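The loop above can be sketched as follows. This is an illustration under assumptions, not the final `NativeAgent` API: `LoopDeps`, `ModelTurn`, and the `Message` shape are hypothetical, and hook checks, timeouts, and truncation are assumed to live behind `runTool`.

```typescript
interface ToolCall { id: string; name: string; args: unknown }
interface ToolResult { success: boolean; output: string; error?: string }
interface ModelTurn { text: string; toolCalls: ToolCall[] }

type Message =
  | { role: "user" | "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; result: ToolResult };

interface LoopDeps {
  callModel(messages: Message[]): Promise<ModelTurn>;
  runTool(call: ToolCall): Promise<ToolResult>;
  isAborted?(): boolean;
  maxIterations?: number; // defaults to 10
}

async function runAgentLoop(
  userText: string,
  deps: LoopDeps,
): Promise<{ text: string; warning?: string }> {
  const messages: Message[] = [{ role: "user", content: userText }];
  const max = deps.maxIterations ?? 10;
  let lastText = "";

  for (let i = 0; i < max; i++) {
    if (deps.isAborted?.()) return { text: lastText, warning: "Aborted" };

    const turn = await deps.callModel(messages);
    lastText = turn.text || lastText;

    // A plain text response ends the loop.
    if (turn.toolCalls.length === 0) return { text: turn.text };

    messages.push({ role: "assistant", content: turn.text, toolCalls: turn.toolCalls });
    for (const call of turn.toolCalls) {
      if (deps.isAborted?.()) return { text: lastText, warning: "Aborted" };
      const result = await deps.runTool(call);
      messages.push({ role: "tool", toolCallId: call.id, result });
    }
  }
  return { text: lastText, warning: `Stopped after ${max} iterations` };
}
```

Passing the model and tool runner in as dependencies keeps the loop testable with a mock model that returns scripted tool calls.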

**Streaming during tool execution**: Emit status events:
- `{ event: "tool_start", tool: "shell.exec", args: { command: "ls" } }`
- `{ event: "tool_end", tool: "shell.exec", result: { success: true, output: "..." } }`
- These render as status lines in TUI and Telegram

**Abort support**:
- TUI: Escape key sets abort flag, checked before each tool execution
- Telegram: `/cancel` command sets abort flag on active session
- Agent loop checks abort flag before each iteration

### 1.4 Frontend Updates

**Telegram**:
- Show tool execution as status messages ("Running shell.exec: `ls -la`...")
- Confirmation buttons already exist; wire them to the tool executor
- Show final response after tool loop completes

**TUI (both modes)**:
- Show tool calls inline with dimmed formatting
- Show tool results with output (truncated in UI if long)
- Streaming text interleaved with tool status
- Escape aborts the agent loop

### 1.5 Deliverables

- [ ] `src/tools/types.ts` - Tool, ToolCall, ToolResult interfaces
- [ ] `src/tools/registry.ts` - ToolRegistry with provider serialization
- [ ] `src/tools/executor.ts` - ToolExecutor with hooks, timeout, truncation
- [ ] `src/tools/builtin/shell.ts` - Shell exec tool
- [ ] `src/tools/builtin/file-read.ts` - File read tool
- [ ] `src/tools/builtin/file-write.ts` - File write tool
- [ ] `src/tools/builtin/file-edit.ts` - File edit tool
- [ ] `src/tools/builtin/file-list.ts` - File list/glob tool
- [ ] `src/tools/builtin/web-fetch.ts` - Web fetch tool
- [ ] `src/models/types.ts` - Add tool-related types
- [ ] `src/models/anthropic.ts` - Add tool use support
- [ ] `src/models/openai.ts` - Add tool use support
- [ ] `src/backends/native/agent.ts` - Agent loop with tool execution
- [ ] `src/frontends/telegram/bot.ts` - Tool status messages
- [ ] `src/frontends/tui/minimal.ts` - Tool display in minimal TUI
- [ ] `src/frontends/tui/components/App.tsx` - Tool display in fullscreen TUI
- [ ] Tests for all new modules

---

## Phase 2: WebSocket Gateway

**Goal**: Central control plane that multiple clients connect to. Decouples frontends from the agent.

### 2.1 Gateway Core

```
src/gateway/
├── server.ts         # WebSocket server (ws library), configurable port
├── protocol.ts       # JSON-RPC-style message types
├── router.ts         # Routes methods to handlers
├── auth.ts           # Token + password auth, Tailscale identity headers
├── session-bridge.ts # Maps WS client connections to sessions
└── handlers/
    ├── agent.ts      # agent.send, agent.cancel, agent.status
    ├── sessions.ts   # sessions.list, sessions.history, sessions.send
    ├── tools.ts      # tools.list, tools.invoke
    ├── config.ts     # config.get, config.patch
    └── system.ts     # system.health, system.restart
```

### 2.2 Protocol

JSON-RPC-like over WebSocket:

```
Request:  { id: number, method: string, params: object }
Response: { id: number, result: object }
Error:    { id: number, error: { code: number, message: string } }
Event:    { id: number, event: string, data: object }  // streaming; id matches the originating request
```

Event types for agent.send streaming:
- `content` - text chunk from model
- `tool_start` - tool execution beginning
- `tool_end` - tool execution complete
- `thinking` - model reasoning (if exposed)
- `done` - final response
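One way the message shapes and method routing might look in TypeScript. This is a sketch under assumptions: the error codes are borrowed from JSON-RPC 2.0 (the plan does not fix them), `-1` is used as a placeholder id when none can be read, and `parseRequest`/`dispatch` are hypothetical names.

```typescript
interface RpcRequest { id: number; method: string; params: Record<string, unknown> }
interface RpcResponse { id: number; result: Record<string, unknown> }
interface RpcError { id: number; error: { code: number; message: string } }
interface RpcEvent {
  id: number; // assumed to echo the id of the request being streamed
  event: "content" | "tool_start" | "tool_end" | "thinking" | "done";
  data: Record<string, unknown>;
}

type Handler = (params: Record<string, unknown>) => Promise<Record<string, unknown>>;

// Codes taken from JSON-RPC 2.0 (assumption).
const PARSE_ERROR = -32700;
const INVALID_REQUEST = -32600;
const METHOD_NOT_FOUND = -32601;
const INTERNAL_ERROR = -32603;

function parseRequest(raw: string): RpcRequest | RpcError {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { id: -1, error: { code: PARSE_ERROR, message: "Parse error" } };
  }
  const obj = (typeof value === "object" && value !== null ? value : {}) as Record<string, unknown>;
  const { id, method, params } = obj;
  if (typeof id !== "number") {
    return { id: -1, error: { code: INVALID_REQUEST, message: "Invalid request" } };
  }
  if (typeof method !== "string" || typeof params !== "object" || params === null) {
    return { id, error: { code: INVALID_REQUEST, message: "Invalid request" } };
  }
  return { id, method, params: params as Record<string, unknown> };
}

async function dispatch(
  raw: string,
  handlers: Map<string, Handler>,
): Promise<RpcResponse | RpcError> {
  const req = parseRequest(raw);
  if ("error" in req) return req;
  const handler = handlers.get(req.method);
  if (!handler) {
    return { id: req.id, error: { code: METHOD_NOT_FOUND, message: `Unknown method: ${req.method}` } };
  }
  try {
    return { id: req.id, result: await handler(req.params) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { id: req.id, error: { code: INTERNAL_ERROR, message } };
  }
}
```

Streaming methods such as `agent.send` would additionally emit `RpcEvent` frames before the final response, which this sketch leaves out.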

### 2.3 Control UI

Minimal web dashboard served from gateway port:

```
src/gateway/ui/
├── index.html        # Dashboard: sessions, model info, config
├── chat.html         # WebChat: WS-connected chat interface
└── assets/           # CSS + minimal JS (no framework, or Preact)
```

### 2.4 Daemon Refactor

Gateway becomes the hub:
- Owns session manager, tool registry, model router
- Telegram adapter connects as a gateway client (in-process bridge)
- TUI connects as a gateway client (in-process or WS)
- WebChat connects via WS from browser

### 2.5 Deliverables

- [ ] `src/gateway/server.ts` - WebSocket server
- [ ] `src/gateway/protocol.ts` - Message type definitions
- [ ] `src/gateway/router.ts` - Method routing
- [ ] `src/gateway/auth.ts` - Authentication
- [ ] `src/gateway/session-bridge.ts` - Client-to-session mapping
- [ ] Gateway handlers (agent, sessions, tools, config, system)
- [ ] `src/gateway/ui/` - Control UI + WebChat
- [ ] Refactor daemon to use gateway as hub
- [ ] Refactor Telegram as gateway client
- [ ] Tests for gateway protocol and handlers

---

## Phase 3: Channel Adapters

**Goal**: Multi-channel inbox. One assistant accessible from WhatsApp, Telegram, Discord, Slack, and WebChat.

### 3.1 Channel Adapter Interface

```
src/channels/
├── types.ts          # ChannelAdapter interface
├── registry.ts       # ChannelRegistry: load/unload at runtime
├── telegram/         # Refactored from src/frontends/telegram/
├── discord/          # discord.js
├── whatsapp/         # Baileys
├── slack/            # Bolt (Socket Mode)
└── webchat/          # Gateway WS built-in
```

**ChannelAdapter interface**:
```typescript
interface ChannelAdapter {
  name: string;
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  send(peerId: string, message: OutboundMessage): Promise<void>;
  onMessage(handler: (msg: InboundMessage) => void): void;
}
```
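As an illustration of the interface, the following sketch implements an in-memory adapter of the kind that could back adapter tests. `InboundMessage` and `OutboundMessage` are not defined in this plan, so the shapes below are assumptions, as are `LoopbackAdapter` and `simulateInbound`.

```typescript
// Assumed message shapes; the plan does not define them.
interface InboundMessage { channel: string; peerId: string; text: string }
interface OutboundMessage { text: string }

interface ChannelAdapter {
  name: string;
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  send(peerId: string, message: OutboundMessage): Promise<void>;
  onMessage(handler: (msg: InboundMessage) => void): void;
}

// Hypothetical adapter that records outbound messages instead of sending them.
class LoopbackAdapter implements ChannelAdapter {
  name = "loopback";
  sent: Array<{ peerId: string; message: OutboundMessage }> = [];
  private handler?: (msg: InboundMessage) => void;
  private connected = false;

  async connect(): Promise<void> { this.connected = true; }
  async disconnect(): Promise<void> { this.connected = false; }

  async send(peerId: string, message: OutboundMessage): Promise<void> {
    if (!this.connected) throw new Error("loopback: not connected");
    this.sent.push({ peerId, message });
  }

  onMessage(handler: (msg: InboundMessage) => void): void {
    this.handler = handler;
  }

  // Test helper: pretend a peer sent us a message.
  simulateInbound(peerId: string, text: string): void {
    this.handler?.({ channel: this.name, peerId, text });
  }
}
```

Real adapters (discord.js, Baileys, Bolt) would translate their library events into `InboundMessage` inside `connect()` and call the registered handler the same way.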

### 3.2 Build Order

1. Telegram (refactor existing)
2. Discord (discord.js)
3. WhatsApp (Baileys)
4. Slack (Bolt Socket Mode)
5. WebChat (gateway built-in)

### 3.3 Security Per Channel

Every adapter implements:
- DM pairing: unknown senders get pairing code, must be approved via CLI/UI
- Allowlists: `channels.<name>.allowFrom` config array
- Group mention gating: `channels.<name>.groups.*.requireMention`
- Rate limiting: per-sender throttle (configurable)
- Message size limits per channel
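A rough sketch of how allowlisting, size limits, and per-sender throttling might be combined into a single inbound check. The `SenderGate` class, the `Verdict` values, and the config field names other than `allowFrom` are assumptions. The sliding one-minute window is one possible throttling strategy.

```typescript
interface ChannelSecurityConfig {
  allowFrom: string[];          // mirrors channels.<name>.allowFrom
  maxMessagesPerMinute: number; // assumed field name
  maxMessageChars: number;      // assumed field name
}

type Verdict = "allow" | "needs_pairing" | "rate_limited" | "too_large";

class SenderGate {
  // Timestamps (ms) of recently accepted messages, per sender.
  private recent = new Map<string, number[]>();

  constructor(private config: ChannelSecurityConfig) {}

  check(senderId: string, text: string, now: number = Date.now()): Verdict {
    // Unknown senders go through the DM pairing flow.
    if (!this.config.allowFrom.includes(senderId)) return "needs_pairing";
    if (text.length > this.config.maxMessageChars) return "too_large";

    const windowStart = now - 60_000;
    const times = (this.recent.get(senderId) ?? []).filter((t) => t > windowStart);
    if (times.length >= this.config.maxMessagesPerMinute) {
      this.recent.set(senderId, times);
      return "rate_limited";
    }
    times.push(now);
    this.recent.set(senderId, times);
    return "allow";
  }
}
```

Accepting `now` as a parameter keeps the throttle deterministic in tests. Group mention gating would be a further check ahead of the rate limit.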

### 3.4 Deliverables

- [ ] `src/channels/types.ts` - ChannelAdapter interface
- [ ] `src/channels/registry.ts` - Channel registry
- [ ] `src/channels/telegram/` - Refactored Telegram adapter
- [ ] `src/channels/discord/` - Discord adapter
- [ ] `src/channels/whatsapp/` - WhatsApp adapter
- [ ] `src/channels/slack/` - Slack adapter
- [ ] `src/channels/webchat/` - WebChat adapter
- [ ] DM pairing system
- [ ] Per-channel security (allowlists, mention gating, rate limiting)
- [ ] Tests per adapter

---

## Phase 4: Skills + MCP

**Goal**: Extensible capability system with community skills and MCP tool servers.

### 4.1 Skills System

```
src/skills/
├── types.ts          # Skill, SkillManifest interfaces
├── loader.ts         # Load SKILL.md + scripts from skill directories
├── registry.ts       # Discovery, gating (OS/bin/env checks)
└── installer.ts      # Auto-install dependencies (with user confirmation)
```

Skill directory structure:
```
~/.flynn/workspace/skills/<name>/
├── SKILL.md          # Instructions injected into system prompt
├── manifest.json     # Requirements, permissions, dependencies
└── scripts/          # Executable helpers
```

Three tiers: bundled (shipped with Flynn), managed (installed via CLI), workspace (user-created).
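The gating step (OS, binary, and environment checks) could be expressed as a pure function over the manifest, as in this sketch. The `requires` field layout and the `HostInfo` shape are assumptions, since the plan does not specify the `manifest.json` schema.

```typescript
// Assumed manifest layout; only the gating-related fields are shown.
interface SkillManifest {
  name: string;
  requires?: { os?: string[]; bins?: string[]; env?: string[] };
}

// Snapshot of the host, gathered once at startup (hypothetical shape).
interface HostInfo {
  platform: string;                        // e.g. "linux", "darwin"
  bins: Set<string>;                       // binaries found on PATH
  env: Record<string, string | undefined>; // environment variables
}

function checkSkill(
  manifest: SkillManifest,
  host: HostInfo,
): { eligible: boolean; missing: string[] } {
  const req = manifest.requires ?? {};
  const missing: string[] = [];

  if (req.os && !req.os.includes(host.platform)) missing.push(`os:${host.platform}`);
  for (const bin of req.bins ?? []) {
    if (!host.bins.has(bin)) missing.push(`bin:${bin}`);
  }
  for (const key of req.env ?? []) {
    if (!host.env[key]) missing.push(`env:${key}`);
  }
  return { eligible: missing.length === 0, missing };
}
```

Returning the list of unmet requirements, rather than a bare boolean, would let the installer or a doctor command tell the user what to fix.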

### 4.2 MCP Integration

```
src/mcp/
├── client.ts         # MCP client (stdio transport)
├── bridge.ts         # Convert MCP tools -> Flynn tool registry entries
└── manager.ts        # Lifecycle: start/stop/restart MCP servers per config
```

Wire the existing `mcp.servers` config to actually start MCP server processes, discover their tools, and register them in the tool registry. MCP tools appear alongside builtins; the agent cannot tell them apart.
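A sketch of what the bridge might do, assuming a thin client wrapper. `McpClientLike` is a hypothetical interface for illustration and not the MCP SDK's API. The `mcp.<server>.<tool>` naming scheme is also an assumption. The tool definition and result shapes follow the MCP `tools/list` and `tools/call` responses, reduced to text content.

```typescript
interface ToolResult { success: boolean; output: string; error?: string }
interface Tool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  execute(args: unknown): Promise<ToolResult>;
}

// Reduced MCP shapes: tool definitions and call results.
interface McpToolDef { name: string; description?: string; inputSchema: Record<string, unknown> }
interface McpCallResult { content: Array<{ type: string; text?: string }>; isError?: boolean }

// Hypothetical wrapper around whichever MCP client ends up in src/mcp/client.ts.
interface McpClientLike {
  listTools(): Promise<McpToolDef[]>;
  callTool(name: string, args: unknown): Promise<McpCallResult>;
}

async function bridgeMcpTools(serverName: string, client: McpClientLike): Promise<Tool[]> {
  const defs = await client.listTools();
  return defs.map((def) => ({
    // Namespaced to avoid collisions with builtins (assumed convention).
    name: `mcp.${serverName}.${def.name}`,
    description: def.description ?? "",
    inputSchema: def.inputSchema,
    async execute(args: unknown): Promise<ToolResult> {
      const res = await client.callTool(def.name, args);
      // Only text content is surfaced in this sketch.
      const text = res.content
        .filter((c) => c.type === "text")
        .map((c) => c.text ?? "")
        .join("\n");
      return res.isError
        ? { success: false, output: "", error: text }
        : { success: true, output: text };
    },
  }));
}
```

Since the bridged objects satisfy the same `Tool` interface as the builtins, they can be registered and executed through the existing registry and executor without special cases.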

### 4.3 Deliverables

- [ ] Skills types, loader, registry, installer
- [ ] MCP client, bridge, manager
- [ ] Wire MCP config to runtime
- [ ] Bundled skills (at least: web-search, git, system-info)
- [ ] Tests

---

## Phase 5: Advanced Features

Build these after the core is solid. Each feature is independent of the others.

| Feature | Module | Description |
|---------|--------|-------------|
| Cron/scheduling | `src/automation/cron.ts` | Cron expressions trigger agent messages |
| Webhooks | `src/automation/webhooks.ts` | Inbound HTTP triggers |
| Browser tool | `src/tools/builtin/browser.ts` | Playwright headless browser |
| Agent-to-agent | `sessions_list`, `sessions_history`, `sessions_send` tools | Multi-agent coordination |
| Sub-agents | `src/backends/native/subagent.ts` | Spawn scoped child sessions |
| Sandboxing | `src/sandbox/` | Docker per-session isolation |
| Voice | `src/voice/` | STT (Whisper) + TTS (ElevenLabs) |
| Canvas/A2UI | `src/gateway/ui/canvas/` | Agent-driven web workspace |
| CLI surface | `src/cli/` | `flynn gateway`, `flynn agent`, `flynn send`, etc. |
| Mobile nodes | Separate repos | iOS (Swift) + Android (Kotlin) companion apps |
| Onboarding wizard | `src/cli/onboard.ts` | Guided setup flow |
| Doctor diagnostics | `src/cli/doctor.ts` | Config + health validation |

---

## Implementation Notes

### Model Selection for Subagents

Use cheaper, faster models via GitHub Copilot for implementation, and reserve the most capable model for review:
- **Sonnet**: Complex implementation tasks (agent loop, gateway, protocol)
- **Haiku**: Mechanical tasks (individual tool implementations, adapter boilerplate, tests)
- **Opus**: Design review, architecture decisions only

### Testing Strategy

- Unit tests for all tools (mock filesystem/network)
- Unit tests for tool registry serialization (Anthropic + OpenAI formats)
- Integration tests for agent loop (mock model returns tool_use, verify execution)
- Integration tests for gateway protocol (WS client/server)
- E2E tests for tool execution (real shell commands in temp dirs)

### Migration Path

- Phase 1 is additive (no breaking changes to existing code)
- Phase 2 refactors daemon (breaking, but internal only)
- Phase 3 moves Telegram (file rename, adapter interface)
- Each phase is a separate feature branch

---

*Design Version: 1.0*
*Created: 2026-02-05*
*Approach: Bottom-up (tools -> gateway -> channels -> skills -> advanced)*