flynn/docs/plans/2026-02-05-openclaw-parity-design.md
William Valentin aa95f2132c feat: add channel adapter abstraction with Telegram and WebChat adapters
Implement Phase 3 channel adapters that decouple message sources from
the agent via a uniform ChannelAdapter interface and ChannelRegistry.

- Add ChannelAdapter/InboundMessage/OutboundMessage types
- Add ChannelRegistry for adapter lifecycle and message routing
- Add TelegramAdapter (grammy bot, auth middleware, confirmations, chunking)
- Add WebChatAdapter (thin shim over GatewayServer)
- Refactor daemon to use ChannelRegistry with per-channel-per-user agents
- Add config.get/config.patch gateway handlers (Phase 2 loose end)
- Add system.restart gateway handler (Phase 2 loose end)
- Add implementation plans and design docs

Tests: 225 passing (33 new channel adapter + gateway handler tests)
2026-02-05 20:00:36 -08:00


# OpenClaw Parity Design: Flynn Feature Roadmap
## Overview
Plan to evolve Flynn from a multi-model chat wrapper into a full self-hosted OpenClaw alternative. Bottom-up approach: tools first, then gateway, then channels, then skills/advanced features.

**Build approach**: Use Sonnet/Haiku via GitHub Copilot for implementation subagents.

**Current state**: Flynn v0.1.0 covers ~22% of OpenClaw features. Strong in model routing (4 providers, tiered fallback) and session management (SQLite, transfer). Zero tool execution capability.
---
## Phase 1: Agent Tool Framework + Agent Loop
**Goal**: Turn Flynn from a chatbot into an agent that can execute tools with multi-step reasoning.
### 1.1 Tool Definition System
```
src/tools/
├── types.ts          # Tool, ToolCall, ToolResult interfaces
├── registry.ts       # ToolRegistry: register, lookup, list, serialize for providers
├── executor.ts       # ToolExecutor: run tools, enforce hooks, timeout, truncation
└── builtin/
    ├── shell.ts      # shell.exec - run bash commands (cwd, timeout, max output)
    ├── file-read.ts  # file.read - read file contents (path, offset, limit)
    ├── file-write.ts # file.write - write/create files (path, content)
    ├── file-edit.ts  # file.edit - find-and-replace (path, oldString, newString)
    ├── file-list.ts  # file.list - glob/list directory (pattern, path)
    └── web-fetch.ts  # web.fetch - HTTP GET with markdown conversion (url, format)
```
**Tool interface**:
```typescript
interface Tool {
  name: string;                         // e.g. "shell.exec"
  description: string;                  // For the model's tool selection
  inputSchema: Record<string, unknown>; // JSON Schema for parameters
  execute(args: unknown): Promise<ToolResult>;
}

interface ToolResult {
  success: boolean;
  output: string;
  error?: string;
}

interface ToolCall {
  id: string;    // Provider-assigned call ID
  name: string;  // Tool name
  args: unknown; // Parsed arguments
}
```
**ToolRegistry**: Collects tools, serializes to Anthropic format (`{ name, description, input_schema }`) or OpenAI format (`{ type: "function", function: { name, description, parameters } }`).
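
The two serialization targets can be sketched as follows. This is a minimal illustration, not Flynn's actual registry: the method names (`lookup`, `list`, `toAnthropic`, `toOpenAI`) and the `wireName` helper are assumptions. Both providers restrict tool names to letters, digits, underscores, and hyphens, so the sketch rewrites dotted names such as `shell.exec` to `shell_exec` before sending them; that mapping is also an assumption and is not part of the design above.

```typescript
// Sketch only: class shape, method names, and name mapping are assumptions.
interface ToolResult {
  success: boolean;
  output: string;
  error?: string;
}

interface Tool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  execute(args: unknown): Promise<ToolResult>;
}

// Assumption: dotted names are rewritten for the wire ("shell.exec" -> "shell_exec").
function wireName(name: string): string {
  return name.replace(/[^a-zA-Z0-9_-]/g, "_");
}

class ToolRegistry {
  private readonly tools = new Map<string, Tool>();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  lookup(name: string): Tool | undefined {
    return this.tools.get(name);
  }

  list(): Tool[] {
    return Array.from(this.tools.values());
  }

  // Anthropic format: { name, description, input_schema }
  toAnthropic() {
    return this.list().map((t) => ({
      name: wireName(t.name),
      description: t.description,
      input_schema: t.inputSchema,
    }));
  }

  // OpenAI format: { type: "function", function: { name, description, parameters } }
  toOpenAI() {
    return this.list().map((t) => ({
      type: "function" as const,
      function: {
        name: wireName(t.name),
        description: t.description,
        parameters: t.inputSchema,
      },
    }));
  }
}
```

If names are rewritten like this, the registry would also need a reverse lookup from wire name to registered name when a tool call comes back from the model.
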
**ToolExecutor**: Wraps execution with:
- Hook engine check (confirm/log/silent) before execution
- Configurable timeout (default 30s for shell, 10s for file ops)
- Output truncation (max 50KB, with "truncated" marker)
- Error capture (catches exceptions, returns as ToolResult.error)
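
A rough sketch of the timeout, truncation, and error-capture wrapping follows. The hook check is left out, and the function names (`runTool`, `truncate`) and the `[truncated]` marker text are assumptions made for illustration. Note that racing against a timer only stops waiting for the tool; it does not cancel the underlying work, so a real shell tool would also need to kill its child process.

```typescript
// Sketch only: names and the marker text are assumptions, hooks are omitted.
interface ToolResult {
  success: boolean;
  output: string;
  error?: string;
}

interface Tool {
  name: string;
  execute(args: unknown): Promise<ToolResult>;
}

const MAX_OUTPUT_CHARS = 50 * 1024; // approximates the 50KB limit using string length

function truncate(output: string, limit: number = MAX_OUTPUT_CHARS): string {
  if (output.length <= limit) return output;
  return output.slice(0, limit) + "\n[truncated]";
}

async function runTool(tool: Tool, args: unknown, timeoutMs: number = 30000): Promise<ToolResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Rejects after timeoutMs; the tool itself keeps running unless it is cancelled separately.
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${tool.name} timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  try {
    const result = await Promise.race([tool.execute(args), timeout]);
    return { ...result, output: truncate(result.output) };
  } catch (err) {
    // Exceptions and timeouts are reported through ToolResult.error instead of propagating.
    return {
      success: false,
      output: "",
      error: err instanceof Error ? err.message : String(err),
    };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```
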
### 1.2 Model Provider Tool Support
Update `ChatRequest` to accept optional `tools: Tool[]`.
**Anthropic** (`src/models/anthropic.ts`):
- Pass tools as `tools` parameter to `messages.create()`/`messages.stream()`
- Parse `tool_use` content blocks from response (id, name, input)
- Accept `tool_result` content blocks in messages (tool_use_id, content)
- Handle `stop_reason: "tool_use"` to signal tool call response
**OpenAI** (`src/models/openai.ts`):
- Pass tools as `tools` parameter with `type: "function"` wrapper
- Parse `tool_calls` from response choices
- Accept `role: "tool"` messages with `tool_call_id`
- Handle `finish_reason: "tool_calls"` to signal tool call response
**Types updates** (`src/models/types.ts`):
- Add `ToolCall` to `ChatResponse`
- Add `tool_use` and `tool_result` message roles
- Add `tools` to `ChatRequest`
- Add `tool_calls` stream event type
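
To illustrate how the two provider responses could be normalized into the shared `ToolCall` type, here is a small sketch. The wire shapes reflect the `tool_use` content blocks and `tool_calls` entries named above; the function names `fromAnthropic` and `fromOpenAI` are hypothetical. One difference worth noting is that OpenAI delivers arguments as a JSON string, while Anthropic delivers an already-parsed object.

```typescript
// Sketch only: function names are hypothetical; only the fields used here are typed.
interface ToolCall {
  id: string;
  name: string;
  args: unknown;
}

type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface OpenAIToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON-encoded string
}

function fromAnthropic(content: AnthropicBlock[]): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const block of content) {
    if (block.type === "tool_use") {
      calls.push({ id: block.id, name: block.name, args: block.input });
    }
  }
  return calls;
}

function fromOpenAI(toolCalls: OpenAIToolCall[] | undefined): ToolCall[] {
  return (toolCalls ?? []).map((call) => {
    let args: unknown = undefined;
    try {
      args = JSON.parse(call.function.arguments);
    } catch {
      // Malformed JSON: leave args undefined so later validation can reject the call.
    }
    return { id: call.id, name: call.function.name, args };
  });
}
```
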
### 1.3 Agent Loop
Replace single-turn NativeAgent with iterative tool-use loop:
```
User message
  -> Model call (with tools in request)
    -> If response contains tool_use:
      -> For each tool call:
        -> Check hooks (confirm/log/silent)
        -> If confirm: wait for approval (Telegram inline keyboard / TUI prompt)
        -> Execute tool via ToolExecutor
        -> Collect ToolResult
      -> Append tool_results to conversation
      -> Loop back to model call
    -> If text response (no tool_use):
      -> Return final response to user
    -> If max iterations reached (default 10):
      -> Return partial response with warning
```
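
The flow above could look roughly like this in code. It is a sketch under several assumptions: the model and the tool executor are injected as plain functions (`CallModel`, `RunTool`), the message shape is simplified, hook confirmation is omitted, and abort is modeled as a shared flag object checked before each iteration and each tool execution. None of these names come from Flynn's codebase.

```typescript
// Sketch only: types and names are assumptions; hooks and streaming are omitted.
interface ToolCall { id: string; name: string; args: unknown }
interface ToolResult { success: boolean; output: string; error?: string }

type Message =
  | { role: "user" | "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

interface ModelReply { text: string; toolCalls: ToolCall[] }
type CallModel = (messages: Message[]) => Promise<ModelReply>;
type RunTool = (call: ToolCall) => Promise<ToolResult>;

interface LoopOptions {
  maxIterations?: number;       // default 10, as in the plan
  abort?: { aborted: boolean }; // set by the frontend (Escape key, /cancel)
}

async function runAgentLoop(
  userText: string,
  callModel: CallModel,
  runTool: RunTool,
  opts: LoopOptions = {},
): Promise<string> {
  const maxIterations = opts.maxIterations ?? 10;
  const messages: Message[] = [{ role: "user", content: userText }];
  let lastText = "";

  for (let i = 0; i < maxIterations; i++) {
    if (opts.abort?.aborted) return `${lastText}\n[aborted]`.trim();

    const reply = await callModel(messages);
    lastText = reply.text;
    // A reply without tool calls is the final answer.
    if (reply.toolCalls.length === 0) return reply.text;

    messages.push({ role: "assistant", content: reply.text, toolCalls: reply.toolCalls });
    for (const call of reply.toolCalls) {
      if (opts.abort?.aborted) return `${lastText}\n[aborted]`.trim();
      const result = await runTool(call);
      messages.push({
        role: "tool",
        toolCallId: call.id,
        content: result.success ? result.output : `Error: ${result.error ?? "unknown"}`,
      });
    }
  }
  // Iteration cap reached: return what we have, with a warning.
  return `${lastText}\n[stopped: reached ${maxIterations} iterations]`.trim();
}
```

Injecting the model and executor as functions keeps the loop testable with a mock model that returns a tool call on the first turn and plain text on the second.
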
**Streaming during tool execution**: Emit status events:
- `{ event: "tool_start", tool: "shell.exec", args: { command: "ls" } }`
- `{ event: "tool_end", tool: "shell.exec", result: { success: true, output: "..." } }`
- These render as status lines in TUI and Telegram
**Abort support**:
- TUI: Escape key sets abort flag, checked before each tool execution
- Telegram: `/cancel` command sets abort flag on active session
- Agent loop checks abort flag before each iteration
### 1.4 Frontend Updates
**Telegram**:
- Show tool execution as status messages ("Running shell.exec: `ls -la`...")
- Confirmation buttons already exist, wire to tool executor
- Show final response after tool loop completes
**TUI (both modes)**:
- Show tool calls inline with dimmed formatting
- Show tool results with output (truncated in UI if long)
- Streaming text interleaved with tool status
- Escape aborts the agent loop
### 1.5 Deliverables
- [ ] `src/tools/types.ts` - Tool, ToolCall, ToolResult interfaces
- [ ] `src/tools/registry.ts` - ToolRegistry with provider serialization
- [ ] `src/tools/executor.ts` - ToolExecutor with hooks, timeout, truncation
- [ ] `src/tools/builtin/shell.ts` - Shell exec tool
- [ ] `src/tools/builtin/file-read.ts` - File read tool
- [ ] `src/tools/builtin/file-write.ts` - File write tool
- [ ] `src/tools/builtin/file-edit.ts` - File edit tool
- [ ] `src/tools/builtin/file-list.ts` - File list/glob tool
- [ ] `src/tools/builtin/web-fetch.ts` - Web fetch tool
- [ ] `src/models/types.ts` - Add tool-related types
- [ ] `src/models/anthropic.ts` - Add tool use support
- [ ] `src/models/openai.ts` - Add tool use support
- [ ] `src/backends/native/agent.ts` - Agent loop with tool execution
- [ ] `src/frontends/telegram/bot.ts` - Tool status messages
- [ ] `src/frontends/tui/minimal.ts` - Tool display in minimal TUI
- [ ] `src/frontends/tui/components/App.tsx` - Tool display in fullscreen TUI
- [ ] Tests for all new modules
---
## Phase 2: WebSocket Gateway
**Goal**: Central control plane that multiple clients connect to. Decouples frontends from the agent.
### 2.1 Gateway Core
```
src/gateway/
├── server.ts         # WebSocket server (ws library), configurable port
├── protocol.ts       # JSON-RPC-style message types
├── router.ts         # Routes methods to handlers
├── auth.ts           # Token + password auth, Tailscale identity headers
├── session-bridge.ts # Maps WS client connections to sessions
└── handlers/
    ├── agent.ts      # agent.send, agent.cancel, agent.status
    ├── sessions.ts   # sessions.list, sessions.history, sessions.send
    ├── tools.ts      # tools.list, tools.invoke
    ├── config.ts     # config.get, config.patch
    └── system.ts     # system.health, system.restart
```
### 2.2 Protocol
JSON-RPC-like over WebSocket:
```
Request: { id: number, method: string, params: object }
Response: { id: number, result: object }
Error: { id: number, error: { code: number, message: string } }
Event: { id: number, event: string, data: object } // streaming
```
Event types for `agent.send` streaming:
- `content` - text chunk from model
- `tool_start` - tool execution beginning
- `tool_end` - tool execution complete
- `thinking` - model reasoning (if exposed)
- `done` - final response
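
As an illustration of the protocol, the message shapes and a transport-agnostic router might be sketched as below. The `Router` class, the `Handler` signature, and the `send` callback are assumptions; the numeric error codes are borrowed from JSON-RPC 2.0 (-32600 invalid request, -32601 method not found, -32000 server error) and the `id: -1` used for unparseable requests is a placeholder, since the plan does not specify either.

```typescript
// Sketch only: Router, Handler, and error codes are assumptions, not the gateway's real API.
interface RpcRequest { id: number; method: string; params: Record<string, unknown> }
interface RpcResponse { id: number; result: unknown }
interface RpcError { id: number; error: { code: number; message: string } }
type AgentEventName = "content" | "tool_start" | "tool_end" | "thinking" | "done";
interface RpcEvent { id: number; event: AgentEventName; data: Record<string, unknown> }
type Outbound = RpcResponse | RpcError | RpcEvent;

type Emit = (event: AgentEventName, data: Record<string, unknown>) => void;
type Handler = (params: Record<string, unknown>, emit: Emit) => Promise<unknown>;

function parseRequest(raw: string): RpcRequest | null {
  try {
    const v = JSON.parse(raw);
    if (typeof v === "object" && v !== null && typeof v.id === "number" && typeof v.method === "string") {
      const params = typeof v.params === "object" && v.params !== null ? v.params : {};
      return { id: v.id, method: v.method, params };
    }
  } catch {
    // Fall through: not valid JSON.
  }
  return null;
}

class Router {
  private readonly handlers = new Map<string, Handler>();

  on(method: string, handler: Handler): void {
    this.handlers.set(method, handler);
  }

  // `send` writes one message to the client, e.g. JSON.stringify + socket send.
  async dispatch(raw: string, send: (msg: Outbound) => void): Promise<void> {
    const req = parseRequest(raw);
    if (req === null) {
      send({ id: -1, error: { code: -32600, message: "Invalid request" } });
      return;
    }
    const handler = this.handlers.get(req.method);
    if (handler === undefined) {
      send({ id: req.id, error: { code: -32601, message: `Unknown method: ${req.method}` } });
      return;
    }
    try {
      // Events share the request id so the client can correlate the stream.
      const result = await handler(req.params, (event, data) => send({ id: req.id, event, data }));
      send({ id: req.id, result });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      send({ id: req.id, error: { code: -32000, message } });
    }
  }
}
```
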
### 2.3 Control UI
Minimal web dashboard served from gateway port:
```
src/gateway/ui/
├── index.html # Dashboard: sessions, model info, config
├── chat.html # WebChat: WS-connected chat interface
└── assets/ # CSS + minimal JS (no framework, or Preact)
```
### 2.4 Daemon Refactor
Gateway becomes the hub:
- Owns session manager, tool registry, model router
- Telegram adapter connects as a gateway client (in-process bridge)
- TUI connects as a gateway client (in-process or WS)
- WebChat connects via WS from browser
### 2.5 Deliverables
- [ ] `src/gateway/server.ts` - WebSocket server
- [ ] `src/gateway/protocol.ts` - Message type definitions
- [ ] `src/gateway/router.ts` - Method routing
- [ ] `src/gateway/auth.ts` - Authentication
- [ ] `src/gateway/session-bridge.ts` - Client-to-session mapping
- [ ] Gateway handlers (agent, sessions, tools, config, system)
- [ ] `src/gateway/ui/` - Control UI + WebChat
- [ ] Refactor daemon to use gateway as hub
- [ ] Refactor Telegram as gateway client
- [ ] Tests for gateway protocol and handlers
---
## Phase 3: Channel Adapters
**Goal**: Multi-channel inbox. One assistant accessible from WhatsApp, Telegram, Discord, Slack, and WebChat.
### 3.1 Channel Adapter Interface
```
src/channels/
├── types.ts # ChannelAdapter interface
├── registry.ts # ChannelRegistry: load/unload at runtime
├── telegram/ # Refactored from src/frontends/telegram/
├── discord/ # discord.js
├── whatsapp/ # Baileys
├── slack/ # Bolt (Socket Mode)
└── webchat/ # Gateway WS built-in
```
**ChannelAdapter interface**:
```typescript
interface ChannelAdapter {
  name: string;
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  send(peerId: string, message: OutboundMessage): Promise<void>;
  onMessage(handler: (msg: InboundMessage) => void): void;
}
```
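
To show how an adapter and the registry might fit together, here is a sketch with an in-memory adapter. The `InboundMessage` and `OutboundMessage` field names, the `MemoryAdapter` class, and the registry's `register`/`unregister` methods are all assumptions chosen for illustration; the plan only fixes the `ChannelAdapter` interface itself.

```typescript
// Sketch only: message shapes, MemoryAdapter, and registry methods are assumptions.
interface InboundMessage { channel: string; peerId: string; text: string }
interface OutboundMessage { text: string }

interface ChannelAdapter {
  name: string;
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  send(peerId: string, message: OutboundMessage): Promise<void>;
  onMessage(handler: (msg: InboundMessage) => void): void;
}

// In-memory adapter, useful for tests: records sends and lets a test inject inbound messages.
class MemoryAdapter implements ChannelAdapter {
  name = "memory";
  connected = false;
  sent: Array<{ peerId: string; message: OutboundMessage }> = [];
  private handler: ((msg: InboundMessage) => void) | undefined;

  async connect(): Promise<void> { this.connected = true; }
  async disconnect(): Promise<void> { this.connected = false; }
  async send(peerId: string, message: OutboundMessage): Promise<void> {
    this.sent.push({ peerId, message });
  }
  onMessage(handler: (msg: InboundMessage) => void): void { this.handler = handler; }
  simulateInbound(peerId: string, text: string): void {
    this.handler?.({ channel: this.name, peerId, text });
  }
}

type InboundHandler = (msg: InboundMessage) => Promise<OutboundMessage>;

class ChannelRegistry {
  private readonly adapters = new Map<string, ChannelAdapter>();
  private readonly onInbound: InboundHandler;

  constructor(onInbound: InboundHandler) {
    this.onInbound = onInbound;
  }

  async register(adapter: ChannelAdapter): Promise<void> {
    // Route every inbound message to the agent, then reply on the same adapter.
    adapter.onMessage((msg) => {
      void this.onInbound(msg)
        .then((reply) => adapter.send(msg.peerId, reply))
        .catch(() => { /* a real registry would log the failure */ });
    });
    await adapter.connect();
    this.adapters.set(adapter.name, adapter);
  }

  async unregister(name: string): Promise<void> {
    const adapter = this.adapters.get(name);
    if (adapter === undefined) return;
    await adapter.disconnect();
    this.adapters.delete(name);
  }
}
```
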
### 3.2 Build Order
1. Telegram (refactor existing)
2. Discord (discord.js)
3. WhatsApp (Baileys)
4. Slack (Bolt Socket Mode)
5. WebChat (gateway built-in)
### 3.3 Security Per Channel
Every adapter implements:
- DM pairing: unknown senders get pairing code, must be approved via CLI/UI
- Allowlists: `channels.<name>.allowFrom` config array
- Group mention gating: `channels.<name>.groups.*.requireMention`
- Rate limiting: per-sender throttle (configurable)
- Message size limits per channel
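
A possible shape for the per-message checks is sketched below. The config field names other than `allowFrom` and `requireMention`, the `Verdict` values, the order of the checks, and the sliding-window limiter are assumptions; DM pairing itself (issuing and approving codes) is not shown, only the point at which an unknown sender would be diverted to it.

```typescript
// Sketch only: field names beyond allowFrom/requireMention and the check order are assumptions.
interface ChannelSecurityConfig {
  allowFrom: string[];      // sender IDs already approved
  requireMention: boolean;  // group messages must mention the bot
  maxMessageLength: number; // hypothetical per-channel size limit
}

interface IncomingMeta {
  senderId: string;
  text: string;
  isGroup: boolean;
  mentionsBot: boolean;
}

type Verdict = "accept" | "needs_pairing" | "ignored" | "too_long" | "rate_limited";

// Sliding-window throttle: at most `limit` accepted messages per sender per window.
class SenderRateLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(senderId: string, now: number = Date.now()): boolean {
    const recent = (this.hits.get(senderId) ?? []).filter((t) => now - t < this.windowMs);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(senderId, recent);
    return allowed;
  }
}

function checkInbound(
  msg: IncomingMeta,
  cfg: ChannelSecurityConfig,
  limiter: SenderRateLimiter,
  now: number = Date.now(),
): Verdict {
  if (!cfg.allowFrom.includes(msg.senderId)) return "needs_pairing";
  if (msg.isGroup && cfg.requireMention && !msg.mentionsBot) return "ignored";
  if (msg.text.length > cfg.maxMessageLength) return "too_long";
  if (!limiter.allow(msg.senderId, now)) return "rate_limited";
  return "accept";
}
```
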
### 3.4 Deliverables
- [ ] `src/channels/types.ts` - ChannelAdapter interface
- [ ] `src/channels/registry.ts` - Channel registry
- [ ] `src/channels/telegram/` - Refactored Telegram adapter
- [ ] `src/channels/discord/` - Discord adapter
- [ ] `src/channels/whatsapp/` - WhatsApp adapter
- [ ] `src/channels/slack/` - Slack adapter
- [ ] `src/channels/webchat/` - WebChat adapter
- [ ] DM pairing system
- [ ] Per-channel security (allowlists, mention gating, rate limiting)
- [ ] Tests per adapter
---
## Phase 4: Skills + MCP
**Goal**: Extensible capability system with community skills and MCP tool servers.
### 4.1 Skills System
```
src/skills/
├── types.ts # Skill, SkillManifest interfaces
├── loader.ts # Load SKILL.md + scripts from skill directories
├── registry.ts # Discovery, gating (OS/bin/env checks)
└── installer.ts # Auto-install dependencies (with user confirmation)
```
Skill directory structure:
```
~/.flynn/workspace/skills/<name>/
├── SKILL.md # Instructions injected into system prompt
├── manifest.json # Requirements, permissions, dependencies
└── scripts/ # Executable helpers
```
Three tiers: bundled (shipped with Flynn), managed (installed via CLI), workspace (user-created).
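
The gating step (OS/bin/env checks) could be sketched as a pure function over a manifest and a snapshot of the host. The `requires` block and its field names are hypothetical, since the plan does not define the `manifest.json` schema, and `HostInfo` is an assumed abstraction that keeps the check testable without touching the real filesystem.

```typescript
// Sketch only: the manifest schema and HostInfo are hypothetical.
interface SkillManifest {
  name: string;
  requires?: {
    os?: string[];   // e.g. ["linux", "darwin"]
    bins?: string[]; // executables that must be on PATH
    env?: string[];  // environment variables that must be set
  };
}

interface HostInfo {
  platform: string;
  env: Record<string, string | undefined>;
  hasBinary(name: string): boolean;
}

interface GateResult {
  eligible: boolean;
  missing: string[]; // human-readable reasons, empty when eligible
}

function checkSkill(manifest: SkillManifest, host: HostInfo): GateResult {
  const missing: string[] = [];
  const req = manifest.requires ?? {};

  if (req.os !== undefined && !req.os.includes(host.platform)) {
    missing.push(`os: needs one of ${req.os.join(", ")}`);
  }
  for (const bin of req.bins ?? []) {
    if (!host.hasBinary(bin)) missing.push(`bin: ${bin}`);
  }
  for (const key of req.env ?? []) {
    const value = host.env[key];
    if (value === undefined || value === "") missing.push(`env: ${key}`);
  }
  return { eligible: missing.length === 0, missing };
}
```

In a Node process the host snapshot would presumably be built from `process.platform` and `process.env`, with `hasBinary` scanning the `PATH` directories.
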
### 4.2 MCP Integration
```
src/mcp/
├── client.ts # MCP client (stdio transport)
├── bridge.ts # Convert MCP tools -> Flynn tool registry entries
└── manager.ts # Lifecycle: start/stop/restart MCP servers per config
```
Wire the existing `mcp.servers` config to actually start MCP server processes, discover their tools, and register them in the tool registry. MCP tools appear alongside builtins -- the agent doesn't know the difference.
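
A bridge entry might be built along these lines. The sketch assumes a minimal client surface (`McpClientLike` with `listTools` and `callTool`) rather than any specific SDK, and the `mcp.<server>.<tool>` naming prefix is an invented convention. The tool description fields (`name`, `description`, `inputSchema`) and the result fields (`content`, `isError`) follow the MCP tool definitions.

```typescript
// Sketch only: McpClientLike and the "mcp.<server>.<tool>" prefix are assumptions.
interface ToolResult { success: boolean; output: string; error?: string }
interface Tool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  execute(args: unknown): Promise<ToolResult>;
}

interface McpToolInfo {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

interface McpCallResult {
  content: Array<{ type: string; text?: string }>;
  isError?: boolean;
}

// Hypothetical minimal client surface; a real client would wrap the stdio transport.
interface McpClientLike {
  listTools(): Promise<McpToolInfo[]>;
  callTool(name: string, args: unknown): Promise<McpCallResult>;
}

function bridgeTool(serverName: string, client: McpClientLike, info: McpToolInfo): Tool {
  return {
    name: `mcp.${serverName}.${info.name}`,
    description: info.description ?? "",
    inputSchema: info.inputSchema,
    async execute(args: unknown): Promise<ToolResult> {
      const res = await client.callTool(info.name, args);
      // Only text parts are kept; other content types are dropped in this sketch.
      const text = res.content
        .filter((part) => part.type === "text")
        .map((part) => part.text ?? "")
        .join("\n");
      return res.isError === true
        ? { success: false, output: "", error: text || "MCP tool reported an error" }
        : { success: true, output: text };
    },
  };
}

// Discover all tools on one server and convert them to registry entries.
async function bridgeServer(serverName: string, client: McpClientLike): Promise<Tool[]> {
  const infos = await client.listTools();
  return infos.map((info) => bridgeTool(serverName, client, info));
}
```
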
### 4.3 Deliverables
- [ ] Skills types, loader, registry, installer
- [ ] MCP client, bridge, manager
- [ ] Wire MCP config to runtime
- [ ] Bundled skills (at least: web-search, git, system-info)
- [ ] Tests
---
## Phase 5: Advanced Features
Build these after the core (Phases 1-4) is solid. Each feature is independent of the others.
| Feature | Module | Description |
|---------|--------|-------------|
| Cron/scheduling | `src/automation/cron.ts` | Cron expressions trigger agent messages |
| Webhooks | `src/automation/webhooks.ts` | Inbound HTTP triggers |
| Browser tool | `src/tools/builtin/browser.ts` | Playwright headless browser |
| Agent-to-agent | `sessions_list/history/send` tools | Multi-agent coordination |
| Sub-agents | `src/backends/native/subagent.ts` | Spawn scoped child sessions |
| Sandboxing | `src/sandbox/` | Docker per-session isolation |
| Voice | `src/voice/` | STT (Whisper) + TTS (ElevenLabs) |
| Canvas/A2UI | `src/gateway/ui/canvas/` | Agent-driven web workspace |
| CLI surface | `src/cli/` | `flynn gateway`, `flynn agent`, `flynn send`, etc. |
| Mobile nodes | Separate repos | iOS (Swift) + Android (Kotlin) companion apps |
| Onboarding wizard | `src/cli/onboard.ts` | Guided setup flow |
| Doctor diagnostics | `src/cli/doctor.ts` | Config + health validation |
---
## Implementation Notes
### Model Selection for Subagents
Use cheaper/faster models via GitHub Copilot for implementation:
- **Sonnet**: Complex implementation tasks (agent loop, gateway, protocol)
- **Haiku**: Mechanical tasks (individual tool implementations, adapter boilerplate, tests)
- **Opus**: Design review, architecture decisions only
### Testing Strategy
- Unit tests for all tools (mock filesystem/network)
- Unit tests for tool registry serialization (Anthropic + OpenAI formats)
- Integration tests for agent loop (mock model returns tool_use, verify execution)
- Integration tests for gateway protocol (WS client/server)
- E2E tests for tool execution (real shell commands in temp dirs)
### Migration Path
- Phase 1 is additive (no breaking changes to existing code)
- Phase 2 refactors daemon (breaking, but internal only)
- Phase 3 moves Telegram (file rename, adapter interface)
- Each phase is a separate feature branch
---
*Design Version: 1.0*
*Created: 2026-02-05*
*Approach: Bottom-up (tools -> gateway -> channels -> skills -> advanced)*