From cfdd4484950de7e5b9a0ccf21b62bc1d9115f619 Mon Sep 17 00:00:00 2001 From: William Valentin Date: Fri, 6 Feb 2026 16:52:38 -0800 Subject: [PATCH] docs: add Docker sandbox and multi-agent routing design/implementation plans --- ...06-p2-docker-sandbox-multi-agent-design.md | 173 ++ ...cker-sandbox-multi-agent-implementation.md | 1832 +++++++++++++++++ 2 files changed, 2005 insertions(+) create mode 100644 docs/plans/2026-02-06-p2-docker-sandbox-multi-agent-design.md create mode 100644 docs/plans/2026-02-06-p2-docker-sandbox-multi-agent-implementation.md diff --git a/docs/plans/2026-02-06-p2-docker-sandbox-multi-agent-design.md b/docs/plans/2026-02-06-p2-docker-sandbox-multi-agent-design.md new file mode 100644 index 0000000..2b4509f --- /dev/null +++ b/docs/plans/2026-02-06-p2-docker-sandbox-multi-agent-design.md @@ -0,0 +1,173 @@ +# P2: Docker Sandboxing + Multi-Agent Routing — Design + +**Date:** 2026-02-06 +**Status:** Approved +**Priority:** P2 (completes all P2 work) + +--- + +## Feature 1: Docker Sandboxing + +### Goal + +Channel sessions (Telegram, Discord, Slack, WhatsApp) execute `shell.exec` and `process.start` inside Docker containers. TUI and local WebSocket sessions continue running on the host. + +### Architecture + +Tool-level wrapping: sandboxed versions of dangerous tools (`shell.exec`, `process.start`) delegate to `docker exec` inside a per-session container. All other tools (file.read, web.fetch, memory.*, etc.) run on the host unchanged. 
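A minimal sketch of the tool-level wrapping idea, assuming two hypothetical helpers, `executionTarget` and `buildDockerExecArgs` (neither name appears in the plan). Only `shell.exec` and `process.start` are redirected into the container. The command string is passed to `bash -c` as a single argv entry, so it is interpreted by the shell inside the container rather than by a host shell.

```typescript
// Tool names come from the design above; the helper names are illustrative assumptions.
const SANDBOXED_TOOLS = new Set(['shell.exec', 'process.start']);

/** Decide where a tool call runs, given whether the session has an active sandbox. */
function executionTarget(toolName: string, sandboxActive: boolean): 'container' | 'host' {
  return sandboxActive && SANDBOXED_TOOLS.has(toolName) ? 'container' : 'host';
}

/** Build the argv for `docker exec`; `-w` sets the working directory inside the container. */
function buildDockerExecArgs(containerId: string, command: string, cwd?: string): string[] {
  const args = ['exec'];
  if (cwd) {
    args.push('-w', cwd);
  }
  args.push(containerId, 'bash', '-c', command);
  return args;
}

// Usage: a channel session listing its workspace.
const argv = buildDockerExecArgs('container-abc', 'ls', '/workspace/project');
// argv is ['exec', '-w', '/workspace/project', 'container-abc', 'bash', '-c', 'ls']
```

Passing this array to `child_process.execFile('docker', argv, ...)` avoids building a shell string on the host, which is presumably why the plan prefers `execFile`.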
+ +``` +src/sandbox/ + docker.ts — DockerSandbox class (create/exec/destroy containers via CLI) + docker.test.ts — Tests (mocked Docker CLI) + manager.ts — SandboxManager (session→container mapping + lifecycle) + manager.test.ts — Tests + tools.ts — createSandboxedShellTool(), createSandboxedProcessStartTool() + tools.test.ts — Tests + index.ts — Barrel export +``` + +### Config Schema + +```yaml +sandbox: + enabled: false # opt-in + image: "node:22-slim" # base container image + workspace_dir: "/workspace" # mount path inside container + network: "none" # container network mode (none/bridge/host) + memory_limit: "512m" # memory limit per container + cpu_limit: "1.0" # CPU limit per container + timeout_seconds: 300 # auto-kill timeout per container +``` + +### DockerSandbox Class + +Wraps Docker CLI via `child_process.execFile` (no Docker SDK dependency): +- `create()` — `docker create` with resource limits, bind mount, network mode +- `start()` — `docker start` +- `exec(command, opts)` — `docker exec` with timeout, returns stdout/stderr +- `destroy()` — `docker rm -f` +- `isRunning()` — `docker inspect` check + +### SandboxManager + +- `getOrCreate(sessionId, config)` — Lazy container creation on first tool call +- `destroy(sessionId)` — Stop and remove container +- `destroyAll()` — Shutdown hook for daemon cleanup + +### Sandboxed Tools + +- `createSandboxedShellTool(sandbox)` — Same `Tool` interface as `shell.exec`, but runs via `sandbox.exec(command)`. Preserves cwd (translated to container path), timeout, output truncation. +- `createSandboxedProcessStartTool(sandbox)` — Wraps `process.start` to spawn via `docker exec -d` (detached mode). + +### Per-Session ToolRegistry + +When sandbox is active for a channel session, the daemon creates a cloned `ToolRegistry` that replaces `shell.exec` and `process.start` with sandboxed versions. All other tools reference the shared host registry. 
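One way to picture the per-session registry, as a sketch only: `MiniRegistry` and `SketchTool` below are hypothetical stand-ins for the real `ToolRegistry` and `Tool` types, whose actual fields are not shown in this design. The clone copies the name-to-tool map, so replacing `shell.exec` in a session copy leaves the host registry untouched, while tools that are not replaced remain the same shared objects.

```typescript
// Hypothetical, simplified tool shape for illustration only.
interface SketchTool {
  name: string;
  description: string;
  execute: (args: unknown) => Promise<{ success: boolean; output: string }>;
}

// Hypothetical stand-in for ToolRegistry with just the methods the design relies on.
class MiniRegistry {
  private tools = new Map<string, SketchTool>();

  register(tool: SketchTool): void {
    this.tools.set(tool.name, tool);
  }

  get(name: string): SketchTool | undefined {
    return this.tools.get(name);
  }

  /** Shallow copy: a separate map that points at the same tool objects. */
  clone(): MiniRegistry {
    const copy = new MiniRegistry();
    this.tools.forEach((tool) => copy.register(tool));
    return copy;
  }

  /** Swap an existing entry; refuses names that were never registered. */
  replace(tool: SketchTool): void {
    if (!this.tools.has(tool.name)) {
      throw new Error(`Tool not registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }
}

function makeTool(name: string, description: string): SketchTool {
  return { name, description, execute: async () => ({ success: true, output: '' }) };
}

// Shared host registry, created once by the daemon.
const host = new MiniRegistry();
host.register(makeTool('shell.exec', 'host shell'));
host.register(makeTool('file.read', 'host file read'));

// Per-session copy for a sandboxed channel session.
const session = host.clone();
session.replace(makeTool('shell.exec', 'sandboxed shell'));
```

A shallow copy keeps per-session cost low: only the two replaced entries differ between the host registry and a session registry.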
+ +### Error Handling + +- Docker not installed → log warning at startup, fall through to host execution +- Container creation fails → log error, return tool error (not crash) +- Container timeout → `docker rm -f`, return timeout error +- Docker daemon unavailable → graceful degradation with clear error messages + +--- + +## Feature 2: Multi-Agent Routing + +### Goal + +Named agent configurations that can be assigned to channels, senders, or sender patterns. Each agent config specifies its own system prompt, model tier, tool profile, and sandbox setting. + +### Architecture + +``` +src/agents/ + registry.ts — AgentConfigRegistry (stores named AgentConfig objects) + router.ts — AgentRouter (resolves {channel, senderId} → AgentConfig) + router.test.ts — Tests + index.ts — Barrel export +``` + +### Config Schema + +```yaml +agent_configs: + assistant: + system_prompt: "You are a helpful assistant." + model_tier: default + tool_profile: messaging + sandbox: true + + coder: + system_prompt: "You are a coding assistant. Focus on writing clean code." + model_tier: complex + tool_profile: coding + sandbox: true + +routing: + default_agent: assistant + channels: + discord: coder + senders: + "telegram:12345": coder + "slack:U0*": assistant +``` + +### AgentConfigRegistry + +Stores parsed `AgentConfig` objects by name: +- `register(config)` — Add a named config +- `get(name)` — Look up by name +- `list()` — All registered configs +- `loadFromConfig(rawConfig)` — Parse from validated YAML + +### AgentConfig Type + +```typescript +interface AgentConfig { + name: string; + systemPrompt?: string; // overrides global system prompt + modelTier?: ModelTier; // fast/default/complex/local + toolProfile?: ToolProfile; // minimal/messaging/coding/full + toolOverrides?: ToolOverrideConfig; + sandbox?: boolean; // use Docker sandbox (if globally enabled) +} +``` + +### AgentRouter + +Resolves which `AgentConfig` to use for a given message: +1. 
Check `senders` map — exact match first, then glob patterns (via `minimatch`) +2. Check `channels` map — channel name match +3. Fall back to `routing.default_agent` + +### Daemon Integration + +The `createMessageRouter()` function changes: +1. On message: `agentRouter.resolve(channel, senderId)` returns agent config name +2. Cache key: `${channel}:${senderId}:${agentConfigName}` (agent change = new orchestrator) +3. Create `AgentOrchestrator` with resolved config's system prompt, model tier, tool policy +4. If sandbox enabled for this config + globally: create per-session sandboxed ToolRegistry +5. Otherwise: use shared host ToolRegistry + +--- + +## Modified Files + +- `src/config/schema.ts` — Add `sandboxSchema`, `agentConfigSchema`, `routingSchema` +- `src/config/index.ts` — Export new types +- `src/daemon/index.ts` — Wire SandboxManager + AgentRouter into message handler +- `src/tools/registry.ts` — Add `clone()` method for per-session copies + +## Testing + +- All Docker interactions mocked (no real Docker in tests) +- Agent router tested with config fixtures (exact, glob, channel, default fallback) +- Sandboxed tools tested with mocked Docker CLI exec +- Integration tested via daemon message handler with mocked dependencies + +## Dependencies + +- No new npm dependencies (Docker CLI, `minimatch` already available or trivially implemented) +- Runtime: Docker must be installed on host for sandbox feature to work (graceful degradation if absent) diff --git a/docs/plans/2026-02-06-p2-docker-sandbox-multi-agent-implementation.md b/docs/plans/2026-02-06-p2-docker-sandbox-multi-agent-implementation.md new file mode 100644 index 0000000..25e8360 --- /dev/null +++ b/docs/plans/2026-02-06-p2-docker-sandbox-multi-agent-implementation.md @@ -0,0 +1,1832 @@ +# P2: Docker Sandboxing + Multi-Agent Routing — Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. 
+ +**Goal:** Add Docker container sandboxing for channel tool execution and named agent configuration with config-based routing. + +**Architecture:** Tool-level wrapping — sandboxed `shell.exec` and `process.start` delegate to `docker exec` inside per-session containers. Agent config registry stores named agent definitions (system prompt, model tier, tool profile, sandbox flag) with config-based routing that maps channels/senders to agent configs. + +**Tech Stack:** TypeScript (ES2022, NodeNext), Zod schemas, Vitest tests, Docker CLI (no SDK dependency), `child_process.execFile`. + +--- + +## Task 1: Config Schema — Sandbox + Agent Configs + Routing + +**Files:** +- Modify: `src/config/schema.ts:164-231` +- Modify: `src/config/index.ts:1-3` + +**Step 1: Write the failing test** + +Create file: `src/config/schema.test.ts` + +```typescript +import { describe, it, expect } from 'vitest'; +import { configSchema } from './schema.js'; + +describe('configSchema — sandbox', () => { + const minimalConfig = { + telegram: { bot_token: 'test', allowed_chat_ids: [1] }, + models: { default: { provider: 'anthropic', model: 'claude-3' } }, + }; + + it('defaults sandbox to disabled', () => { + const result = configSchema.parse(minimalConfig); + expect(result.sandbox.enabled).toBe(false); + expect(result.sandbox.image).toBe('node:22-slim'); + expect(result.sandbox.network).toBe('none'); + expect(result.sandbox.memory_limit).toBe('512m'); + expect(result.sandbox.cpu_limit).toBe('1.0'); + expect(result.sandbox.timeout_seconds).toBe(300); + }); + + it('accepts sandbox config', () => { + const result = configSchema.parse({ + ...minimalConfig, + sandbox: { enabled: true, image: 'ubuntu:24.04', network: 'bridge' }, + }); + expect(result.sandbox.enabled).toBe(true); + expect(result.sandbox.image).toBe('ubuntu:24.04'); + expect(result.sandbox.network).toBe('bridge'); + }); +}); + +describe('configSchema — agent_configs', () => { + const minimalConfig = { + telegram: { bot_token: 'test', 
allowed_chat_ids: [1] }, + models: { default: { provider: 'anthropic', model: 'claude-3' } }, + }; + + it('defaults agent_configs to empty', () => { + const result = configSchema.parse(minimalConfig); + expect(result.agent_configs).toEqual({}); + }); + + it('accepts named agent configs', () => { + const result = configSchema.parse({ + ...minimalConfig, + agent_configs: { + assistant: { + system_prompt: 'You are helpful.', + model_tier: 'default', + tool_profile: 'messaging', + }, + coder: { + model_tier: 'complex', + tool_profile: 'coding', + sandbox: true, + }, + }, + }); + expect(result.agent_configs.assistant.system_prompt).toBe('You are helpful.'); + expect(result.agent_configs.assistant.tool_profile).toBe('messaging'); + expect(result.agent_configs.coder.sandbox).toBe(true); + }); +}); + +describe('configSchema — routing', () => { + const minimalConfig = { + telegram: { bot_token: 'test', allowed_chat_ids: [1] }, + models: { default: { provider: 'anthropic', model: 'claude-3' } }, + }; + + it('defaults routing to empty', () => { + const result = configSchema.parse(minimalConfig); + expect(result.routing.default_agent).toBeUndefined(); + expect(result.routing.channels).toEqual({}); + expect(result.routing.senders).toEqual({}); + }); + + it('accepts routing config', () => { + const result = configSchema.parse({ + ...minimalConfig, + routing: { + default_agent: 'assistant', + channels: { discord: 'coder' }, + senders: { 'telegram:12345': 'coder' }, + }, + }); + expect(result.routing.default_agent).toBe('assistant'); + expect(result.routing.channels.discord).toBe('coder'); + expect(result.routing.senders['telegram:12345']).toBe('coder'); + }); +}); +``` + +**Step 2: Run test to verify it fails** + +Run: `pnpm vitest run src/config/schema.test.ts` +Expected: FAIL — `sandbox`, `agent_configs`, and `routing` properties don't exist on config + +**Step 3: Implement the schema additions** + +Add to `src/config/schema.ts` before the `configSchema` definition (before line 
192): + +```typescript +// ── Sandbox schemas ─────────────────────────────────────────────────── + +const sandboxSchema = z.object({ + enabled: z.boolean().default(false), + image: z.string().default('node:22-slim'), + workspace_dir: z.string().default('/workspace'), + network: z.enum(['none', 'bridge', 'host']).default('none'), + memory_limit: z.string().default('512m'), + cpu_limit: z.string().default('1.0'), + timeout_seconds: z.number().min(10).max(3600).default(300), +}).default({}); + +// ── Agent config + routing schemas ──────────────────────────────────── + +const modelTierEnum = z.enum(['fast', 'default', 'complex', 'local']); + +const agentConfigEntrySchema = z.object({ + system_prompt: z.string().optional(), + model_tier: modelTierEnum.optional(), + tool_profile: toolProfileEnum.optional(), + tool_overrides: toolOverrideSchema.optional(), + sandbox: z.boolean().default(false), +}); + +const agentConfigsSchema = z.record(z.string(), agentConfigEntrySchema).default({}); + +const routingSchema = z.object({ + default_agent: z.string().optional(), + channels: z.record(z.string(), z.string()).default({}), + senders: z.record(z.string(), z.string()).default({}), +}).default({}); +``` + +Then add these three new fields to the `configSchema` z.object (around lines 192-212): + +```typescript + sandbox: sandboxSchema, + agent_configs: agentConfigsSchema, + routing: routingSchema, +``` + +And add type exports at the end (after line 230): + +```typescript +export type SandboxConfig = z.infer<typeof sandboxSchema>; +export type AgentConfigEntry = z.infer<typeof agentConfigEntrySchema>; +export type RoutingConfig = z.infer<typeof routingSchema>; +``` + +**Step 4: Update `src/config/index.ts` barrel export** + +Add the new types to the export line: + +```typescript +export { configSchema, type Config, type TelegramConfig, type ModelConfig, type CronJobConfig, type AgentsConfig, type CompactionConfig, type ToolProfile, type ToolOverrideConfig, type ToolsConfig, type SandboxConfig, type AgentConfigEntry, type RoutingConfig } from 
'./schema.js'; +``` + +**Step 5: Run test to verify it passes** + +Run: `pnpm vitest run src/config/schema.test.ts` +Expected: PASS (all 6 tests) + +**Step 6: Run full test suite** + +Run: `pnpm test:run` +Expected: All 606+ tests pass + +**Step 7: Commit** + +```bash +git add src/config/schema.ts src/config/schema.test.ts src/config/index.ts +git commit -m "feat: add sandbox, agent_configs, and routing config schemas" +``` + +--- + +## Task 2: DockerSandbox Class + +**Files:** +- Create: `src/sandbox/docker.ts` +- Create: `src/sandbox/docker.test.ts` + +**Step 1: Write the failing test** + +Create file: `src/sandbox/docker.test.ts` + +```typescript +import { describe, it, expect, vi, beforeEach } from 'vitest'; +import { DockerSandbox, type DockerSandboxConfig } from './docker.js'; +import * as childProcess from 'child_process'; + +// Mock child_process.execFile +vi.mock('child_process', () => ({ + execFile: vi.fn(), +})); + +const mockedExecFile = vi.mocked(childProcess.execFile); + +function mockExecFileSuccess(stdout = '', stderr = '') { + mockedExecFile.mockImplementation( + (_cmd: unknown, _args: unknown, _opts: unknown, callback: unknown) => { + (callback as (err: null, stdout: string, stderr: string) => void)(null, stdout, stderr); + return {} as ReturnType<typeof childProcess.execFile>; + }, + ); +} + +function mockExecFileError(message: string) { + mockedExecFile.mockImplementation( + (_cmd: unknown, _args: unknown, _opts: unknown, callback: unknown) => { + (callback as (err: Error) => void)(new Error(message)); + return {} as ReturnType<typeof childProcess.execFile>; + }, + ); +} + +describe('DockerSandbox', () => { + const defaultConfig: DockerSandboxConfig = { + sessionId: 'test-session', + image: 'node:22-slim', + workspaceDir: '/workspace', + network: 'none', + memoryLimit: '512m', + cpuLimit: '1.0', + timeoutSeconds: 300, + }; + + beforeEach(() => { + vi.clearAllMocks(); + }); + + describe('create()', () => { + it('creates a docker container with correct args', async () => { + 
mockExecFileSuccess('container-abc123'); + const sandbox = new DockerSandbox(defaultConfig); + await sandbox.create(); + + expect(mockedExecFile).toHaveBeenCalledWith( + 'docker', + expect.arrayContaining([ + 'create', + '--name', expect.stringContaining('flynn-test-session'), + '--memory', '512m', + '--cpus', '1.0', + '--network', 'none', + '-v', expect.stringContaining(':/workspace'), + 'node:22-slim', + 'sleep', 'infinity', + ]), + expect.any(Object), + expect.any(Function), + ); + expect(sandbox.containerId).toBe('container-abc123'); + }); + + it('starts the container after creating', async () => { + mockExecFileSuccess('container-abc123'); + const sandbox = new DockerSandbox(defaultConfig); + await sandbox.create(); + + // Second call should be docker start + expect(mockedExecFile).toHaveBeenCalledTimes(2); + expect(mockedExecFile).toHaveBeenNthCalledWith( + 2, 'docker', ['start', 'container-abc123'], + expect.any(Object), expect.any(Function), + ); + }); + + it('throws if docker create fails', async () => { + mockExecFileError('docker not found'); + const sandbox = new DockerSandbox(defaultConfig); + await expect(sandbox.create()).rejects.toThrow('docker not found'); + }); + }); + + describe('exec()', () => { + it('runs command inside container', async () => { + const sandbox = new DockerSandbox(defaultConfig); + // Manually set container ID to skip create + (sandbox as unknown as { _containerId: string })._containerId = 'container-abc'; + + mockExecFileSuccess('hello world\n'); + const result = await sandbox.exec('echo hello world'); + + expect(mockedExecFile).toHaveBeenCalledWith( + 'docker', + ['exec', 'container-abc', 'bash', '-c', 'echo hello world'], + expect.objectContaining({ timeout: expect.any(Number) }), + expect.any(Function), + ); + expect(result).toEqual({ stdout: 'hello world\n', stderr: '' }); + }); + + it('passes cwd as workdir option', async () => { + const sandbox = new DockerSandbox(defaultConfig); + (sandbox as unknown as { _containerId: 
string })._containerId = 'container-abc'; + + mockExecFileSuccess(''); + await sandbox.exec('ls', { cwd: '/workspace/project' }); + + expect(mockedExecFile).toHaveBeenCalledWith( + 'docker', + ['exec', '-w', '/workspace/project', 'container-abc', 'bash', '-c', 'ls'], + expect.any(Object), + expect.any(Function), + ); + }); + + it('throws if no container created', async () => { + const sandbox = new DockerSandbox(defaultConfig); + await expect(sandbox.exec('echo hi')).rejects.toThrow('not created'); + }); + }); + + describe('destroy()', () => { + it('force-removes the container', async () => { + const sandbox = new DockerSandbox(defaultConfig); + (sandbox as unknown as { _containerId: string })._containerId = 'container-abc'; + + mockExecFileSuccess(); + await sandbox.destroy(); + + expect(mockedExecFile).toHaveBeenCalledWith( + 'docker', ['rm', '-f', 'container-abc'], + expect.any(Object), expect.any(Function), + ); + }); + + it('does nothing if no container', async () => { + const sandbox = new DockerSandbox(defaultConfig); + await sandbox.destroy(); // should not throw + expect(mockedExecFile).not.toHaveBeenCalled(); + }); + }); + + describe('isAvailable()', () => { + it('returns true when docker is installed', async () => { + mockExecFileSuccess('Docker version 27.0.0'); + const result = await DockerSandbox.isAvailable(); + expect(result).toBe(true); + }); + + it('returns false when docker is not installed', async () => { + mockExecFileError('command not found'); + const result = await DockerSandbox.isAvailable(); + expect(result).toBe(false); + }); + }); +}); +``` + +**Step 2: Run test to verify it fails** + +Run: `pnpm vitest run src/sandbox/docker.test.ts` +Expected: FAIL — cannot find module `./docker.js` + +**Step 3: Implement DockerSandbox** + +Create file: `src/sandbox/docker.ts` + +```typescript +import { execFile } from 'child_process'; + +export interface DockerSandboxConfig { + sessionId: string; + image: string; + workspaceDir: string; + network: 
'none' | 'bridge' | 'host'; + memoryLimit: string; + cpuLimit: string; + timeoutSeconds: number; +} + +export interface ExecOptions { + cwd?: string; + timeout?: number; +} + +export interface ExecResult { + stdout: string; + stderr: string; +} + +/** + * Manages a single Docker container for sandboxed tool execution. + * Uses the Docker CLI directly (no SDK dependency). + */ +export class DockerSandbox { + private config: DockerSandboxConfig; + private _containerId: string | null = null; + private _hostWorkdir: string; + + constructor(config: DockerSandboxConfig) { + this.config = config; + // Use a temp directory on the host, named by session + const sanitizedId = config.sessionId.replace(/[^a-zA-Z0-9_-]/g, '_'); + this._hostWorkdir = `/tmp/flynn-sandbox-${sanitizedId}`; + } + + get containerId(): string | null { + return this._containerId; + } + + get containerName(): string { + const sanitizedId = this.config.sessionId.replace(/[^a-zA-Z0-9_-]/g, '_'); + return `flynn-${sanitizedId}`; + } + + /** Create and start the sandbox container. */ + async create(): Promise<void> { + const args = [ + 'create', + '--name', this.containerName, + '--memory', this.config.memoryLimit, + '--cpus', this.config.cpuLimit, + '--network', this.config.network, + '-v', `${this._hostWorkdir}:${this.config.workspaceDir}`, + this.config.image, + 'sleep', 'infinity', + ]; + + const createResult = await this.dockerCmd(args); + this._containerId = createResult.stdout.trim(); + + await this.dockerCmd(['start', this._containerId]); + } + + /** Execute a command inside the container. */ + async exec(command: string, opts?: ExecOptions): Promise<ExecResult> { + if (!this._containerId) { + throw new Error('Sandbox container not created. Call create() first.'); + } + + const args = ['exec']; + if (opts?.cwd) { + args.push('-w', opts.cwd); + } + args.push(this._containerId, 'bash', '-c', command); + + const timeout = opts?.timeout ?? 
this.config.timeoutSeconds * 1000; + return this.dockerCmd(args, timeout); + } + + /** Force-remove the container. */ + async destroy(): Promise<void> { + if (!this._containerId) return; + + try { + await this.dockerCmd(['rm', '-f', this._containerId]); + } catch { + // Ignore errors during cleanup + } + this._containerId = null; + } + + /** Check if Docker is available on this host. */ + static async isAvailable(): Promise<boolean> { + try { + await new Promise<string>((resolve, reject) => { + execFile('docker', ['version', '--format', '{{.Server.Version}}'], { + timeout: 5000, + }, (error, stdout) => { + if (error) reject(error); + else resolve(stdout); + }); + }); + return true; + } catch { + return false; + } + } + + /** Run a docker CLI command. */ + private dockerCmd(args: string[], timeout = 30_000): Promise<ExecResult> { + return new Promise<ExecResult>((resolve, reject) => { + execFile('docker', args, { timeout, maxBuffer: 1024 * 1024 }, (error, stdout, stderr) => { + if (error) { + reject(error); + return; + } + resolve({ stdout, stderr }); + }); + }); + } +} +``` + +**Step 4: Run test to verify it passes** + +Run: `pnpm vitest run src/sandbox/docker.test.ts` +Expected: PASS (all tests) + +**Step 5: Commit** + +```bash +git add src/sandbox/docker.ts src/sandbox/docker.test.ts +git commit -m "feat: add DockerSandbox class for container lifecycle" +``` + +--- + +## Task 3: SandboxManager + +**Files:** +- Create: `src/sandbox/manager.ts` +- Create: `src/sandbox/manager.test.ts` + +**Step 1: Write the failing test** + +Create file: `src/sandbox/manager.test.ts` + +```typescript +import { describe, it, expect, vi, beforeEach } from 'vitest'; +import { SandboxManager } from './manager.js'; +import { DockerSandbox } from './docker.js'; +import type { SandboxConfig } from '../config/schema.js'; + +// Mock DockerSandbox +vi.mock('./docker.js', () => ({ + DockerSandbox: vi.fn().mockImplementation(() => ({ + create: vi.fn().mockResolvedValue(undefined), + destroy: vi.fn().mockResolvedValue(undefined), + exec: 
vi.fn().mockResolvedValue({ stdout: '', stderr: '' }), + containerId: 'mock-container', + })), +})); + +describe('SandboxManager', () => { + const defaultConfig: SandboxConfig = { + enabled: true, + image: 'node:22-slim', + workspace_dir: '/workspace', + network: 'none', + memory_limit: '512m', + cpu_limit: '1.0', + timeout_seconds: 300, + }; + + beforeEach(() => { + vi.clearAllMocks(); + }); + + describe('getOrCreate()', () => { + it('creates a new sandbox for unknown session', async () => { + const manager = new SandboxManager(defaultConfig); + const sandbox = await manager.getOrCreate('session-1'); + + expect(DockerSandbox).toHaveBeenCalledWith(expect.objectContaining({ + sessionId: 'session-1', + image: 'node:22-slim', + })); + expect(sandbox.create).toHaveBeenCalled(); + }); + + it('reuses existing sandbox for same session', async () => { + const manager = new SandboxManager(defaultConfig); + const first = await manager.getOrCreate('session-1'); + const second = await manager.getOrCreate('session-1'); + + expect(first).toBe(second); + expect(DockerSandbox).toHaveBeenCalledTimes(1); + }); + + it('creates separate sandboxes for different sessions', async () => { + const manager = new SandboxManager(defaultConfig); + await manager.getOrCreate('session-1'); + await manager.getOrCreate('session-2'); + + expect(DockerSandbox).toHaveBeenCalledTimes(2); + }); + }); + + describe('destroy()', () => { + it('destroys sandbox and removes from cache', async () => { + const manager = new SandboxManager(defaultConfig); + const sandbox = await manager.getOrCreate('session-1'); + + await manager.destroy('session-1'); + expect(sandbox.destroy).toHaveBeenCalled(); + + // Should create a new one now + await manager.getOrCreate('session-1'); + expect(DockerSandbox).toHaveBeenCalledTimes(2); + }); + + it('does nothing for unknown session', async () => { + const manager = new SandboxManager(defaultConfig); + await manager.destroy('nonexistent'); // should not throw + }); + }); + + 
describe('destroyAll()', () => { + it('destroys all sandboxes', async () => { + const manager = new SandboxManager(defaultConfig); + const s1 = await manager.getOrCreate('session-1'); + const s2 = await manager.getOrCreate('session-2'); + + await manager.destroyAll(); + expect(s1.destroy).toHaveBeenCalled(); + expect(s2.destroy).toHaveBeenCalled(); + }); + }); +}); +``` + +**Step 2: Run test to verify it fails** + +Run: `pnpm vitest run src/sandbox/manager.test.ts` +Expected: FAIL — cannot find module `./manager.js` + +**Step 3: Implement SandboxManager** + +Create file: `src/sandbox/manager.ts` + +```typescript +import { DockerSandbox } from './docker.js'; +import type { SandboxConfig } from '../config/schema.js'; + +/** + * Manages per-session Docker sandboxes. + * Creates containers lazily on first access, destroys on session cleanup. + */ +export class SandboxManager { + private sandboxes = new Map(); + private config: SandboxConfig; + + constructor(config: SandboxConfig) { + this.config = config; + } + + /** Get or create a sandbox for a session. */ + async getOrCreate(sessionId: string): Promise { + let sandbox = this.sandboxes.get(sessionId); + if (sandbox) return sandbox; + + sandbox = new DockerSandbox({ + sessionId, + image: this.config.image, + workspaceDir: this.config.workspace_dir, + network: this.config.network, + memoryLimit: this.config.memory_limit, + cpuLimit: this.config.cpu_limit, + timeoutSeconds: this.config.timeout_seconds, + }); + + await sandbox.create(); + this.sandboxes.set(sessionId, sandbox); + return sandbox; + } + + /** Destroy a specific session's sandbox. */ + async destroy(sessionId: string): Promise { + const sandbox = this.sandboxes.get(sessionId); + if (!sandbox) return; + + await sandbox.destroy(); + this.sandboxes.delete(sessionId); + } + + /** Destroy all sandboxes (daemon shutdown). 
*/ + async destroyAll(): Promise { + const entries = Array.from(this.sandboxes.entries()); + await Promise.allSettled( + entries.map(async ([id, sandbox]) => { + await sandbox.destroy(); + this.sandboxes.delete(id); + }), + ); + } +} +``` + +**Step 4: Run test to verify it passes** + +Run: `pnpm vitest run src/sandbox/manager.test.ts` +Expected: PASS + +**Step 5: Commit** + +```bash +git add src/sandbox/manager.ts src/sandbox/manager.test.ts +git commit -m "feat: add SandboxManager for per-session container lifecycle" +``` + +--- + +## Task 4: Sandboxed Tool Wrappers + +**Files:** +- Create: `src/sandbox/tools.ts` +- Create: `src/sandbox/tools.test.ts` + +**Step 1: Write the failing test** + +Create file: `src/sandbox/tools.test.ts` + +```typescript +import { describe, it, expect, vi, beforeEach } from 'vitest'; +import { createSandboxedShellTool, createSandboxedProcessStartTool } from './tools.js'; +import type { DockerSandbox } from './docker.js'; + +function mockSandbox(): DockerSandbox { + return { + exec: vi.fn().mockResolvedValue({ stdout: 'output', stderr: '' }), + create: vi.fn(), + destroy: vi.fn(), + containerId: 'test-container', + containerName: 'flynn-test', + config: {}, + } as unknown as DockerSandbox; +} + +describe('createSandboxedShellTool', () => { + let sandbox: DockerSandbox; + + beforeEach(() => { + sandbox = mockSandbox(); + }); + + it('has the same name as shell.exec', () => { + const tool = createSandboxedShellTool(sandbox); + expect(tool.name).toBe('shell.exec'); + }); + + it('delegates to sandbox.exec', async () => { + const tool = createSandboxedShellTool(sandbox); + const result = await tool.execute({ command: 'echo hello' }); + + expect(sandbox.exec).toHaveBeenCalledWith('echo hello', { cwd: undefined, timeout: 30000 }); + expect(result.success).toBe(true); + expect(result.output).toBe('output'); + }); + + it('passes cwd to sandbox.exec', async () => { + const tool = createSandboxedShellTool(sandbox); + await tool.execute({ command: 
'ls', cwd: '/workspace/project' }); + + expect(sandbox.exec).toHaveBeenCalledWith('ls', { cwd: '/workspace/project', timeout: 30000 }); + }); + + it('passes timeout to sandbox.exec', async () => { + const tool = createSandboxedShellTool(sandbox); + await tool.execute({ command: 'sleep 10', timeout: 5000 }); + + expect(sandbox.exec).toHaveBeenCalledWith('sleep 10', { cwd: undefined, timeout: 5000 }); + }); + + it('returns error on sandbox.exec failure', async () => { + (sandbox.exec as ReturnType).mockRejectedValue(new Error('container dead')); + const tool = createSandboxedShellTool(sandbox); + const result = await tool.execute({ command: 'fail' }); + + expect(result.success).toBe(false); + expect(result.error).toBe('container dead'); + }); + + it('includes stderr in output', async () => { + (sandbox.exec as ReturnType).mockResolvedValue({ stdout: 'out', stderr: 'warn' }); + const tool = createSandboxedShellTool(sandbox); + const result = await tool.execute({ command: 'cmd' }); + + expect(result.output).toContain('out'); + expect(result.output).toContain('stderr: warn'); + }); +}); + +describe('createSandboxedProcessStartTool', () => { + let sandbox: DockerSandbox; + + beforeEach(() => { + sandbox = mockSandbox(); + }); + + it('has the same name as process.start', () => { + const tool = createSandboxedProcessStartTool(sandbox); + expect(tool.name).toBe('process.start'); + }); + + it('runs detached command via sandbox', async () => { + const tool = createSandboxedProcessStartTool(sandbox); + const result = await tool.execute({ command: 'npm run dev' }); + + expect(sandbox.exec).toHaveBeenCalledWith( + expect.stringContaining('npm run dev'), + expect.any(Object), + ); + expect(result.success).toBe(true); + expect(result.output).toContain('Started sandboxed background process'); + }); +}); +``` + +**Step 2: Run test to verify it fails** + +Run: `pnpm vitest run src/sandbox/tools.test.ts` +Expected: FAIL — cannot find module `./tools.js` + +**Step 3: Implement 
sandboxed tools** + +Create file: `src/sandbox/tools.ts` + +```typescript +import type { Tool, ToolResult } from '../tools/types.js'; +import type { DockerSandbox } from './docker.js'; + +interface ShellExecArgs { + command: string; + cwd?: string; + timeout?: number; +} + +interface ProcessStartArgs { + command: string; + cwd?: string; +} + +/** + * Create a sandboxed version of shell.exec that delegates to docker exec. + * Same Tool interface — drop-in replacement for the host shell.exec. + */ +export function createSandboxedShellTool(sandbox: DockerSandbox): Tool { + return { + name: 'shell.exec', + description: 'Execute a shell command inside a sandboxed container and return stdout/stderr.', + inputSchema: { + type: 'object', + properties: { + command: { type: 'string', description: 'The shell command to execute' }, + cwd: { type: 'string', description: 'Working directory inside the container (optional)' }, + timeout: { type: 'number', description: 'Timeout in milliseconds (default 30000)' }, + }, + required: ['command'], + }, + execute: async (rawArgs: unknown): Promise => { + const args = rawArgs as ShellExecArgs; + const timeout = args.timeout ?? 30_000; + + try { + const result = await sandbox.exec(args.command, { + cwd: args.cwd, + timeout, + }); + + const output = result.stdout + (result.stderr ? `\nstderr: ${result.stderr}` : ''); + return { success: true, output }; + } catch (error) { + return { + success: false, + output: '', + error: error instanceof Error ? error.message : String(error), + }; + } + }, + }; +} + +/** + * Create a sandboxed version of process.start that runs in the container. + * Uses `nohup ... &` via docker exec since we can't spawn detached inside containers. 
+ */ +export function createSandboxedProcessStartTool(sandbox: DockerSandbox): Tool { + return { + name: 'process.start', + description: 'Start a command in the background inside a sandboxed container.', + inputSchema: { + type: 'object', + properties: { + command: { type: 'string', description: 'The shell command to run in the background' }, + cwd: { type: 'string', description: 'Working directory inside the container (optional)' }, + }, + required: ['command'], + }, + execute: async (rawArgs: unknown): Promise<ToolResult> => { + const args = rawArgs as ProcessStartArgs; + + try { + // Run via nohup + background in the container + const wrappedCmd = `nohup bash -c '${args.command.replace(/'/g, "'\\''")}' > /tmp/proc.log 2>&1 & echo $!`; + const result = await sandbox.exec(wrappedCmd, { cwd: args.cwd }); + + const pid = result.stdout.trim(); + return { + success: true, + output: `Started sandboxed background process (PID ${pid})\nCommand: ${args.command}`, + }; + } catch (error) { + return { + success: false, + output: '', + error: error instanceof Error ? error.message : 'Failed to start sandboxed process', + }; + } + }, + }; +} +``` + +**Step 4: Run test to verify it passes** + +Run: `pnpm vitest run src/sandbox/tools.test.ts` +Expected: PASS + +**Step 5: Commit** + +```bash +git add src/sandbox/tools.ts src/sandbox/tools.test.ts +git commit -m "feat: add sandboxed tool wrappers for shell.exec and process.start" +``` + +--- + +## Task 5: Sandbox Barrel Export + ToolRegistry.clone() + +**Files:** +- Create: `src/sandbox/index.ts` +- Modify: `src/tools/registry.ts:19-97` + +**Step 1: Write the failing test for ToolRegistry.clone()** + +Add to a new test or extend existing tests. 
Create file `src/tools/registry.test.ts` (if it doesn't exist — check first): + +```typescript +import { describe, it, expect } from 'vitest'; +import { ToolRegistry } from './registry.js'; +import type { Tool } from './types.js'; + +function makeTool(name: string): Tool { + return { + name, + description: `Mock ${name}`, + inputSchema: { type: 'object', properties: {} }, + execute: async () => ({ success: true, output: '' }), + }; +} + +describe('ToolRegistry', () => { + describe('clone()', () => { + it('creates a copy with all tools', () => { + const reg = new ToolRegistry(); + reg.register(makeTool('tool.a')); + reg.register(makeTool('tool.b')); + + const cloned = reg.clone(); + expect(cloned.list().map(t => t.name).sort()).toEqual(['tool.a', 'tool.b']); + }); + + it('inherits the policy from original', () => { + const reg = new ToolRegistry(); + const mockPolicy = { filterTools: vi.fn(), isAllowed: vi.fn(), resolveAllowedNames: vi.fn(), getEffectiveProfile: vi.fn() }; + reg.setPolicy(mockPolicy as any); + + const cloned = reg.clone(); + expect(cloned.getPolicy()).toBe(mockPolicy); + }); + + it('allows replacing tools in clone without affecting original', () => { + const reg = new ToolRegistry(); + const originalTool = makeTool('shell.exec'); + reg.register(originalTool); + + const cloned = reg.clone(); + const replacementTool = makeTool('shell.exec'); + replacementTool.description = 'Sandboxed version'; + + cloned.replace(replacementTool); + expect(cloned.get('shell.exec')!.description).toBe('Sandboxed version'); + expect(reg.get('shell.exec')!.description).toBe('Mock shell.exec'); + }); + }); + + describe('replace()', () => { + it('replaces an existing tool', () => { + const reg = new ToolRegistry(); + reg.register(makeTool('tool.a')); + const replacement = makeTool('tool.a'); + replacement.description = 'New description'; + + reg.replace(replacement); + expect(reg.get('tool.a')!.description).toBe('New description'); + }); + + it('throws if tool does not 
exist', () => { + const reg = new ToolRegistry(); + expect(() => reg.replace(makeTool('nonexistent'))).toThrow('not registered'); + }); + }); +}); +``` + +Note: Add `import { vi } from 'vitest'` to the imports at the top. + +**Step 2: Run test to verify it fails** + +Run: `pnpm vitest run src/tools/registry.test.ts` +Expected: FAIL — `clone()` and `replace()` don't exist on ToolRegistry + +**Step 3: Add clone() and replace() to ToolRegistry** + +In `src/tools/registry.ts`, add these two methods to the `ToolRegistry` class (after the `unregister` method, around line 32): + +```typescript + /** Replace an existing tool with a new implementation. Throws if not registered. */ + replace(tool: Tool): void { + if (!this.tools.has(tool.name)) { + throw new Error(`Tool '${tool.name}' is not registered — cannot replace`); + } + this.tools.set(tool.name, tool); + } + + /** Create a shallow clone of this registry (new Map, same Tool objects + policy). */ + clone(): ToolRegistry { + const cloned = new ToolRegistry(); + for (const tool of this.tools.values()) { + cloned.register(tool); + } + if (this._policy) { + cloned.setPolicy(this._policy); + } + return cloned; + } +``` + +**Step 4: Create the sandbox barrel export** + +Create file: `src/sandbox/index.ts` + +```typescript +export { DockerSandbox, type DockerSandboxConfig, type ExecOptions, type ExecResult } from './docker.js'; +export { SandboxManager } from './manager.js'; +export { createSandboxedShellTool, createSandboxedProcessStartTool } from './tools.js'; +``` + +**Step 5: Run tests to verify they pass** + +Run: `pnpm vitest run src/tools/registry.test.ts` +Expected: PASS + +**Step 6: Run full test suite** + +Run: `pnpm test:run` +Expected: All tests pass + +**Step 7: Commit** + +```bash +git add src/sandbox/index.ts src/tools/registry.ts src/tools/registry.test.ts +git commit -m "feat: add ToolRegistry.clone() and replace() for per-session registries" +``` + +--- + +## Task 6: Agent Config Registry + +**Files:** +- 
Create: `src/agents/registry.ts` +- Create: `src/agents/registry.test.ts` + +**Step 1: Write the failing test** + +Create file: `src/agents/registry.test.ts` + +```typescript +import { describe, it, expect } from 'vitest'; +import { AgentConfigRegistry, type AgentConfig } from './registry.js'; + +describe('AgentConfigRegistry', () => { + describe('register()', () => { + it('registers a named agent config', () => { + const registry = new AgentConfigRegistry(); + const config: AgentConfig = { name: 'assistant', systemPrompt: 'Be helpful.' }; + registry.register(config); + + expect(registry.get('assistant')).toEqual(config); + }); + + it('throws on duplicate name', () => { + const registry = new AgentConfigRegistry(); + registry.register({ name: 'assistant' }); + expect(() => registry.register({ name: 'assistant' })).toThrow('already registered'); + }); + }); + + describe('get()', () => { + it('returns undefined for unknown name', () => { + const registry = new AgentConfigRegistry(); + expect(registry.get('nonexistent')).toBeUndefined(); + }); + }); + + describe('list()', () => { + it('returns all registered configs', () => { + const registry = new AgentConfigRegistry(); + registry.register({ name: 'a' }); + registry.register({ name: 'b' }); + expect(registry.list().map(c => c.name).sort()).toEqual(['a', 'b']); + }); + }); + + describe('loadFromConfig()', () => { + it('loads configs from a raw config object', () => { + const registry = new AgentConfigRegistry(); + registry.loadFromConfig({ + assistant: { + system_prompt: 'Be helpful.', + model_tier: 'default', + tool_profile: 'messaging', + sandbox: false, + }, + coder: { + model_tier: 'complex', + tool_profile: 'coding', + sandbox: true, + }, + }); + + expect(registry.list()).toHaveLength(2); + const assistant = registry.get('assistant')!; + expect(assistant.systemPrompt).toBe('Be helpful.'); + expect(assistant.modelTier).toBe('default'); + expect(assistant.toolProfile).toBe('messaging'); + + const coder = 
registry.get('coder')!; + expect(coder.sandbox).toBe(true); + }); + }); +}); +``` + +**Step 2: Run test to verify it fails** + +Run: `pnpm vitest run src/agents/registry.test.ts` +Expected: FAIL (cannot find module `./registry.js`) + +**Step 3: Implement AgentConfigRegistry** + +Create file: `src/agents/registry.ts` + +```typescript +import type { ToolProfile, ToolOverrideConfig } from '../config/schema.js'; +import type { ModelTier } from '../models/router.js'; + +export interface AgentConfig { + name: string; + systemPrompt?: string; + modelTier?: ModelTier; + toolProfile?: ToolProfile; + toolOverrides?: ToolOverrideConfig; + sandbox?: boolean; +} + +/** + * AgentConfigRegistry stores named agent configurations. + * Loaded from YAML config at startup. + */ +export class AgentConfigRegistry { + private configs = new Map<string, AgentConfig>(); + + register(config: AgentConfig): void { + if (this.configs.has(config.name)) { + throw new Error(`Agent config '${config.name}' is already registered`); + } + this.configs.set(config.name, config); + } + + get(name: string): AgentConfig | undefined { + return this.configs.get(name); + } + + list(): AgentConfig[] { + return Array.from(this.configs.values()); + } + + /** + * Load agent configs from the parsed YAML config. + * Maps from the config schema format to the internal AgentConfig format. 
+ */ + loadFromConfig(rawConfigs: Record<string, { system_prompt?: string; model_tier?: string; tool_profile?: string; tool_overrides?: ToolOverrideConfig; sandbox?: boolean }>): void { + for (const [name, raw] of Object.entries(rawConfigs)) { + this.register({ + name, + systemPrompt: raw.system_prompt, + modelTier: raw.model_tier as ModelTier | undefined, + toolProfile: raw.tool_profile as ToolProfile | undefined, + toolOverrides: raw.tool_overrides, + sandbox: raw.sandbox, + }); + } + } +} +``` + +**Step 4: Run test to verify it passes** + +Run: `pnpm vitest run src/agents/registry.test.ts` +Expected: PASS + +**Step 5: Commit** + +```bash +git add src/agents/registry.ts src/agents/registry.test.ts +git commit -m "feat: add AgentConfigRegistry for named agent configurations" +``` + +--- + +## Task 7: Agent Router + +**Files:** +- Create: `src/agents/router.ts` +- Create: `src/agents/router.test.ts` + +**Step 1: Write the failing test** + +Create file: `src/agents/router.test.ts` + +```typescript +import { describe, it, expect } from 'vitest'; +import { AgentRouter, type RoutingConfig } from './router.js'; + +describe('AgentRouter', () => { + describe('resolve()', () => { + it('returns default_agent when no specific match', () => { + const router = new AgentRouter({ + default_agent: 'assistant', + channels: {}, + senders: {}, + }); + expect(router.resolve('telegram', '12345')).toBe('assistant'); + }); + + it('returns undefined when no default and no match', () => { + const router = new AgentRouter({ + channels: {}, + senders: {}, + }); + expect(router.resolve('telegram', '12345')).toBeUndefined(); + }); + + it('matches exact sender', () => { + const router = new AgentRouter({ + default_agent: 'assistant', + channels: {}, + senders: { 'telegram:12345': 'coder' }, + }); + expect(router.resolve('telegram', '12345')).toBe('coder'); + }); + + it('matches sender with glob pattern', () => { + const router = new AgentRouter({ + default_agent: 'assistant', + channels: {}, + senders: { 'slack:U0*': 'coder' }, + }); + expect(router.resolve('slack', 'U0ABC')).toBe('coder'); + expect(router.resolve('slack', 
'U1ABC')).toBe('assistant'); // no sender or channel match, so default_agent applies + }); + + it('matches channel when no sender match', () => { + const router = new AgentRouter({ + default_agent: 'assistant', + channels: { discord: 'coder' }, + senders: {}, + }); + expect(router.resolve('discord', 'any-user')).toBe('coder'); + }); + + it('sender match takes priority over channel match', () => { + const router = new AgentRouter({ + default_agent: 'assistant', + channels: { discord: 'coder' }, + senders: { 'discord:special-user': 'vip' }, + }); + expect(router.resolve('discord', 'special-user')).toBe('vip'); + expect(router.resolve('discord', 'normal-user')).toBe('coder'); + }); + + it('falls through: sender → channel → default', () => { + const router = new AgentRouter({ + default_agent: 'fallback', + channels: { discord: 'guild-agent' }, + senders: { 'discord:admin': 'admin-agent' }, + }); + expect(router.resolve('discord', 'admin')).toBe('admin-agent'); + expect(router.resolve('discord', 'regular')).toBe('guild-agent'); + expect(router.resolve('telegram', 'someone')).toBe('fallback'); + }); + }); +}); +``` + +**Step 2: Run test to verify it fails** + +Run: `pnpm vitest run src/agents/router.test.ts` +Expected: FAIL (cannot find module `./router.js`) + +**Step 3: Implement AgentRouter** + +Create file: `src/agents/router.ts` + +```typescript +/** + * AgentRouter resolves which agent config to use for a given channel+sender. + * + * Resolution order: + * 1. Exact sender match (channel:senderId) + * 2. Glob pattern sender match + * 3. Channel match + * 4. default_agent fallback + */ + +export interface RoutingConfig { + default_agent?: string; + channels: Record<string, string>; + senders: Record<string, string>; +} + +/** + * Convert a simple glob pattern to regex. + * Supports `*` (any chars) with `.` escaped. 
+ */ +function patternToRegex(pattern: string): RegExp { + const escaped = pattern + .replace(/[.+^${}()|[\]\\]/g, '\\$&') + .replace(/\*/g, '.*'); + return new RegExp(`^${escaped}$`); +} + +export class AgentRouter { + private config: RoutingConfig; + + constructor(config: RoutingConfig) { + this.config = config; + } + + /** + * Resolve the agent config name for a channel + sender pair. + * Returns undefined if no match and no default. + */ + resolve(channel: string, senderId: string): string | undefined { + const senderKey = `${channel}:${senderId}`; + + // 1. Exact sender match + if (this.config.senders[senderKey]) { + return this.config.senders[senderKey]; + } + + // 2. Glob pattern sender match + for (const [pattern, agentName] of Object.entries(this.config.senders)) { + if (pattern.includes('*') && patternToRegex(pattern).test(senderKey)) { + return agentName; + } + } + + // 3. Channel match + if (this.config.channels[channel]) { + return this.config.channels[channel]; + } + + // 4. Default fallback + return this.config.default_agent; + } +} +``` + +**Step 4: Run test to verify it passes** + +Run: `pnpm vitest run src/agents/router.test.ts` +Expected: PASS + +**Step 5: Commit** + +```bash +git add src/agents/router.ts src/agents/router.test.ts +git commit -m "feat: add AgentRouter for config-based sender/channel routing" +``` + +--- + +## Task 8: Agents Barrel Export + +**Files:** +- Create: `src/agents/index.ts` + +**Step 1: Create the barrel file** + +Create file: `src/agents/index.ts` + +```typescript +export { AgentConfigRegistry, type AgentConfig } from './registry.js'; +export { AgentRouter, type RoutingConfig } from './router.js'; +``` + +**Step 2: Verify build** + +Run: `pnpm typecheck` +Expected: No errors + +**Step 3: Commit** + +```bash +git add src/agents/index.ts +git commit -m "feat: add agents barrel export" +``` + +--- + +## Task 9: Wire Everything Into the Daemon + +**Files:** +- Modify: `src/daemon/index.ts` + +This is the integration task. 
The daemon's `createMessageRouter()` needs to use the `AgentRouter` and `SandboxManager`. + +**Step 1: Write the integration test** + +Create file: `src/daemon/routing.test.ts` + +```typescript +import { describe, it, expect } from 'vitest'; +import { AgentRouter } from '../agents/router.js'; +import { AgentConfigRegistry } from '../agents/registry.js'; + +describe('daemon agent routing integration', () => { + it('resolves agent config for channel messages', () => { + const registry = new AgentConfigRegistry(); + registry.loadFromConfig({ + assistant: { system_prompt: 'Be helpful.', model_tier: 'default', tool_profile: 'messaging', sandbox: false }, + coder: { system_prompt: 'Write code.', model_tier: 'complex', tool_profile: 'coding', sandbox: true }, + }); + + const router = new AgentRouter({ + default_agent: 'assistant', + channels: { discord: 'coder' }, + senders: { 'telegram:admin': 'coder' }, + }); + + // Discord user gets coder + const discordAgent = router.resolve('discord', 'user123'); + expect(discordAgent).toBe('coder'); + expect(registry.get(discordAgent!)!.systemPrompt).toBe('Write code.'); + + // Telegram admin gets coder + const telegramAdmin = router.resolve('telegram', 'admin'); + expect(telegramAdmin).toBe('coder'); + + // Random telegram user gets assistant + const telegramUser = router.resolve('telegram', 'random'); + expect(telegramUser).toBe('assistant'); + expect(registry.get(telegramUser!)!.systemPrompt).toBe('Be helpful.'); + }); + + it('returns undefined when no routing or default agent is configured', () => { + const router = new AgentRouter({ channels: {}, senders: {} }); + expect(router.resolve('telegram', '123')).toBeUndefined(); + }); +}); +``` + +**Step 2: Run test to verify it passes** + +Run: `pnpm vitest run src/daemon/routing.test.ts` +Expected: PASS (the test only combines the registry and router built in Tasks 6 and 7) + +**Step 3: Modify daemon/index.ts** + +Add imports at the top of `src/daemon/index.ts` (after existing imports): + +```typescript +import { 
AgentConfigRegistry, AgentRouter } from '../agents/index.js'; +import { SandboxManager, createSandboxedShellTool, createSandboxedProcessStartTool } from '../sandbox/index.js'; +``` + +Add to `DaemonContext` interface: + +```typescript + agentConfigRegistry: AgentConfigRegistry; + agentRouter: AgentRouter; + sandboxManager?: SandboxManager; +``` + +Modify `createMessageRouter()` to accept additional dependencies: + +```typescript +function createMessageRouter(deps: { + sessionManager: SessionManager; + modelRouter: ModelRouter; + systemPrompt: string; + toolRegistry: ToolRegistry; + toolExecutor: ToolExecutor; + config: Config; + memoryStore?: MemoryStore; + agentConfigRegistry?: AgentConfigRegistry; + agentRouter?: AgentRouter; + sandboxManager?: SandboxManager; +}) { +``` + +Inside `getOrCreateAgent()`, resolve the agent config and create sandboxed registries: + +```typescript + function getOrCreateAgent(channel: string, senderId: string): AgentOrchestrator { + // Resolve agent config name from routing + const agentConfigName = deps.agentRouter?.resolve(channel, senderId); + const agentConfig = agentConfigName ? deps.agentConfigRegistry?.get(agentConfigName) : undefined; + + const cacheKey = agentConfigName + ? `${channel}:${senderId}:${agentConfigName}` + : `${channel}:${senderId}`; + + let agent = agents.get(cacheKey); + if (!agent) { + const session = deps.sessionManager.getSession(channel, senderId); + + // Determine system prompt — agent config overrides global + const systemPrompt = agentConfig?.systemPrompt ?? deps.systemPrompt; + + // Determine primary tier + const primaryTier = agentConfig?.modelTier ?? deps.config.agents.primary_tier ?? 
'default'; + + // Determine tool policy context + const toolPolicyContext: ToolPolicyContext = { + agent: primaryTier, + provider: deps.config.models.default.provider, + }; + + // Determine tool registry — sandbox if configured + let toolRegistry = deps.toolRegistry; + if (agentConfig?.sandbox && deps.sandboxManager && deps.config.sandbox.enabled) { + // Create a cloned registry with sandboxed tools + toolRegistry = deps.toolRegistry.clone(); + // Sandbox will be created lazily on first tool call + // For now, create a wrapper that handles lazy initialization + const sessionId = `${channel}:${senderId}`; + const sandbox = deps.sandboxManager; + const sandboxConfig = deps.config.sandbox; + + // Replace shell.exec and process.start with lazy-sandboxed versions + const lazySandboxedShell: Tool = { + name: 'shell.exec', + description: 'Execute a shell command inside a sandboxed container.', + inputSchema: { + type: 'object', + properties: { + command: { type: 'string', description: 'The shell command to execute' }, + cwd: { type: 'string', description: 'Working directory (optional)' }, + timeout: { type: 'number', description: 'Timeout in milliseconds (default 30000)' }, + }, + required: ['command'], + }, + execute: async (rawArgs: unknown) => { + const dockerSandbox = await sandbox.getOrCreate(sessionId); + const tool = createSandboxedShellTool(dockerSandbox); + return tool.execute(rawArgs); + }, + }; + + const lazySandboxedProcessStart: Tool = { + name: 'process.start', + description: 'Start a command in the background inside a sandboxed container.', + inputSchema: { + type: 'object', + properties: { + command: { type: 'string', description: 'The shell command to run' }, + cwd: { type: 'string', description: 'Working directory (optional)' }, + }, + required: ['command'], + }, + execute: async (rawArgs: unknown) => { + const dockerSandbox = await sandbox.getOrCreate(sessionId); + const tool = createSandboxedProcessStartTool(dockerSandbox); + return 
tool.execute(rawArgs); + }, + }; + + toolRegistry.replace(lazySandboxedShell); + toolRegistry.replace(lazySandboxedProcessStart); + } + + const delegationConfig: DelegationConfig = { + compaction: deps.config.agents.delegation.compaction ?? 'fast', + memory_extraction: deps.config.agents.delegation.memory_extraction ?? 'fast', + classification: deps.config.agents.delegation.classification ?? 'fast', + tool_summarisation: deps.config.agents.delegation.tool_summarisation ?? 'fast', + complex_reasoning: deps.config.agents.delegation.complex_reasoning ?? 'complex', + }; + + agent = new AgentOrchestrator({ + modelRouter: deps.modelRouter, + systemPrompt, + session, + toolRegistry, + toolExecutor: deps.toolExecutor, + primaryTier, + delegation: delegationConfig, + maxDelegationDepth: deps.config.agents.max_delegation_depth ?? 3, + compaction: deps.config.compaction.enabled ? { + thresholdPct: deps.config.compaction.threshold_pct, + keepTurns: deps.config.compaction.keep_turns, + summaryMaxTokens: deps.config.compaction.summary_max_tokens, + } : undefined, + modelName: deps.config.models.default.model, + contextWindow: deps.config.models.default.context_window, + memoryStore: deps.memoryStore, + toolPolicyContext, + }); + agents.set(cacheKey, agent); + } + return agent; + } +``` + +In `startDaemon()`, add agent config registry and router initialization after skills loading (around line 385): + +```typescript + // Initialize agent config registry and router + const agentConfigRegistry = new AgentConfigRegistry(); + if (config.agent_configs && Object.keys(config.agent_configs).length > 0) { + agentConfigRegistry.loadFromConfig(config.agent_configs); + console.log(`Loaded ${Object.keys(config.agent_configs).length} agent config(s): ${Object.keys(config.agent_configs).join(', ')}`); + } + + const agentRouter = new AgentRouter(config.routing); + + // Initialize sandbox manager if enabled + let sandboxManager: SandboxManager | undefined; + if (config.sandbox.enabled) { + const 
dockerAvailable = await DockerSandbox.isAvailable(); + if (dockerAvailable) { + sandboxManager = new SandboxManager(config.sandbox); + console.log(`Docker sandbox enabled: image=${config.sandbox.image}, network=${config.sandbox.network}`); + } else { + console.warn('Docker sandbox enabled in config but Docker is not available — falling back to host execution'); + } + } +``` + +Add sandbox shutdown hook: + +```typescript + if (sandboxManager) { + lifecycle.onShutdown(async () => { + await sandboxManager!.destroyAll(); + console.log('Docker sandboxes destroyed'); + }); + } +``` + +Pass new deps to `createMessageRouter()`: + +```typescript + channelRegistry.setMessageHandler(createMessageRouter({ + sessionManager, + modelRouter, + systemPrompt, + toolRegistry, + toolExecutor, + config, + memoryStore, + agentConfigRegistry, + agentRouter, + sandboxManager, + })); +``` + +Add to DaemonContext return: + +```typescript + return { + config, + lifecycle, + sessionStore, + sessionManager, + hookEngine, + modelRouter, + toolRegistry, + toolExecutor, + gateway, + channelRegistry, + mcpManager, + skillRegistry, + skillInstaller, + agentConfigRegistry, + agentRouter, + sandboxManager, + }; +``` + +Note: You'll need to import `DockerSandbox` and the `Tool` type at the top, and import `ToolPolicyContext`: + +```typescript +import { DockerSandbox } from '../sandbox/index.js'; +import type { Tool } from '../tools/types.js'; +import type { ToolPolicyContext } from '../tools/policy.js'; +``` + +**Step 4: Run full test suite** + +Run: `pnpm test:run` +Expected: All tests pass + +**Step 5: Run typecheck** + +Run: `pnpm typecheck` +Expected: No errors + +**Step 6: Commit** + +```bash +git add src/daemon/index.ts src/daemon/routing.test.ts +git commit -m "feat: wire Docker sandboxing and agent routing into daemon" +``` + +--- + +## Task 10: Update state.json + Final Verification + +**Files:** +- Modify: `docs/plans/state.json` + +**Step 1: Run full test suite and typecheck** + +Run: `pnpm 
test:run && pnpm typecheck` +Expected: All tests pass, no type errors + +**Step 2: Update state.json** + +Add the new P2 entries to `docs/plans/state.json` under the `p2-implementation` plan's `phases` object: + +```json +"docker_sandboxing": { + "priority": "P2", + "status": "completed", + "description": "Docker container sandboxing for channel tool execution (shell.exec, process.start)", + "files_created": [ + "src/sandbox/docker.ts", + "src/sandbox/docker.test.ts", + "src/sandbox/manager.ts", + "src/sandbox/manager.test.ts", + "src/sandbox/tools.ts", + "src/sandbox/tools.test.ts", + "src/sandbox/index.ts" + ], + "files_modified": [ + "src/config/schema.ts", + "src/config/index.ts", + "src/tools/registry.ts", + "src/daemon/index.ts" + ], + "test_status": "N/N passing" +}, +"multi_agent_routing": { + "priority": "P2", + "status": "completed", + "description": "Named agent configs with config-based channel/sender routing", + "files_created": [ + "src/agents/registry.ts", + "src/agents/registry.test.ts", + "src/agents/router.ts", + "src/agents/router.test.ts", + "src/agents/index.ts", + "src/daemon/routing.test.ts", + "src/config/schema.test.ts" + ], + "files_modified": [ + "src/config/schema.ts", + "src/config/index.ts", + "src/daemon/index.ts" + ], + "test_status": "N/N passing" +} +``` + +Update `overall_progress.p2_completion` to `"7/7 (100%)"` and `next_up` to `"p3 (group chat, gateway auth, gemini provider, browser control, additional providers)"`. + +Update `overall_progress.total_test_count` with the actual count. + +**Step 3: Commit** + +```bash +git add docs/plans/state.json +git commit -m "docs: update state.json with Docker sandbox and multi-agent routing" +``` + +--- + +## Summary + +| Task | Component | Est. 
Time | +|------|-----------|-----------| +| 1 | Config schemas (sandbox + agent_configs + routing) | 5 min | +| 2 | DockerSandbox class | 5 min | +| 3 | SandboxManager | 3 min | +| 4 | Sandboxed tool wrappers | 5 min | +| 5 | Barrel export + ToolRegistry.clone() | 3 min | +| 6 | AgentConfigRegistry | 3 min | +| 7 | AgentRouter | 3 min | +| 8 | Agents barrel export | 1 min | +| 9 | Daemon integration | 10 min | +| 10 | State update + verification | 3 min | + +**Total estimated: ~40 minutes**
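
As a supplementary sketch for Task 4, the quoting applied by `createSandboxedProcessStartTool` can be tried in isolation. The helper name `wrapForBackground` is hypothetical and exists only for illustration; the `replace()` call and the surrounding `nohup` template are copied from the plan.

```typescript
// Hypothetical helper: the name and signature are illustrative, not part of the plan.
// It mirrors the inline quoting in Task 4's createSandboxedProcessStartTool.
function wrapForBackground(command: string): string {
  // Close the single-quoted string, emit an escaped quote, then reopen it:
  // each ' in the command becomes '\''
  const quoted = command.replace(/'/g, "'\\''");
  return `nohup bash -c '${quoted}' > /tmp/proc.log 2>&1 & echo $!`;
}

const wrapped = wrapForBackground("echo 'hi'");
console.log(wrapped);
// prints: nohup bash -c 'echo '\''hi'\''' > /tmp/proc.log 2>&1 & echo $!
```

Because the log path in the template is fixed, each background process started in the same container redirects to the same `/tmp/proc.log`; a per-process log name may be worth considering if several are expected.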