# P2: Docker Sandboxing + Multi-Agent Routing — Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Add Docker container sandboxing for channel tool execution and named agent configuration with config-based routing.

**Architecture:** Tool-level wrapping: sandboxed `shell.exec` and `process.start` delegate to `docker exec` inside per-session containers. An agent config registry stores named agent definitions (system prompt, model tier, tool profile, sandbox flag), and config-based routing maps channels/senders to agent configs.

**Tech Stack:** TypeScript (ES2022, NodeNext), Zod schemas, Vitest tests, Docker CLI (no SDK dependency), `child_process.execFile`.

---

## Task 1: Config Schema — Sandbox + Agent Configs + Routing

**Files:**
- Modify: `src/config/schema.ts:164-231`
- Modify: `src/config/index.ts:1-3`

**Step 1: Write the failing test**

Create file: `src/config/schema.test.ts`

```typescript
import { describe, it, expect } from 'vitest';
import { configSchema } from './schema.js';

describe('configSchema — sandbox', () => {
  const minimalConfig = {
    telegram: { bot_token: 'test', allowed_chat_ids: [1] },
    models: { default: { provider: 'anthropic', model: 'claude-3' } },
  };

  it('defaults sandbox to disabled', () => {
    const result = configSchema.parse(minimalConfig);
    expect(result.sandbox.enabled).toBe(false);
    expect(result.sandbox.image).toBe('node:22-slim');
    expect(result.sandbox.network).toBe('none');
    expect(result.sandbox.memory_limit).toBe('512m');
    expect(result.sandbox.cpu_limit).toBe('1.0');
    expect(result.sandbox.timeout_seconds).toBe(300);
  });

  it('accepts sandbox config', () => {
    const result = configSchema.parse({
      ...minimalConfig,
      sandbox: { enabled: true, image: 'ubuntu:24.04', network: 'bridge' },
    });
    expect(result.sandbox.enabled).toBe(true);
    expect(result.sandbox.image).toBe('ubuntu:24.04');
    expect(result.sandbox.network).toBe('bridge');
  });
});

describe('configSchema — agent_configs', () => {
  const minimalConfig = {
    telegram: { bot_token: 'test', allowed_chat_ids: [1] },
    models: { default: { provider: 'anthropic', model: 'claude-3' } },
  };

  it('defaults agent_configs to empty', () => {
    const result = configSchema.parse(minimalConfig);
    expect(result.agent_configs).toEqual({});
  });

  it('accepts named agent configs', () => {
    const result = configSchema.parse({
      ...minimalConfig,
      agent_configs: {
        assistant: {
          system_prompt: 'You are helpful.',
          model_tier: 'default',
          tool_profile: 'messaging',
        },
        coder: {
          model_tier: 'complex',
          tool_profile: 'coding',
          sandbox: true,
        },
      },
    });
    expect(result.agent_configs.assistant.system_prompt).toBe('You are helpful.');
    expect(result.agent_configs.assistant.tool_profile).toBe('messaging');
    expect(result.agent_configs.coder.sandbox).toBe(true);
  });
});

describe('configSchema — routing', () => {
  const minimalConfig = {
    telegram: { bot_token: 'test', allowed_chat_ids: [1] },
    models: { default: { provider: 'anthropic', model: 'claude-3' } },
  };

  it('defaults routing to empty', () => {
    const result = configSchema.parse(minimalConfig);
    expect(result.routing.default_agent).toBeUndefined();
    expect(result.routing.channels).toEqual({});
    expect(result.routing.senders).toEqual({});
  });

  it('accepts routing config', () => {
    const result = configSchema.parse({
      ...minimalConfig,
      routing: {
        default_agent: 'assistant',
        channels: { discord: 'coder' },
        senders: { 'telegram:12345': 'coder' },
      },
    });
    expect(result.routing.default_agent).toBe('assistant');
    expect(result.routing.channels.discord).toBe('coder');
    expect(result.routing.senders['telegram:12345']).toBe('coder');
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/config/schema.test.ts`
Expected: FAIL (`sandbox`, `agent_configs`, and `routing` properties don't exist on config)

**Step 3: Implement the schema additions**

Add to `src/config/schema.ts` before the `configSchema` definition (before line 192):

```typescript
// ── Sandbox schemas ───────────────────────────────────────────────────
const sandboxSchema = z.object({
  enabled: z.boolean().default(false),
  image: z.string().default('node:22-slim'),
  workspace_dir: z.string().default('/workspace'),
  network: z.enum(['none', 'bridge', 'host']).default('none'),
  memory_limit: z.string().default('512m'),
  cpu_limit: z.string().default('1.0'),
  timeout_seconds: z.number().min(10).max(3600).default(300),
}).default({});

// ── Agent config + routing schemas ────────────────────────────────────
const modelTierEnum = z.enum(['fast', 'default', 'complex', 'local']);

const agentConfigEntrySchema = z.object({
  system_prompt: z.string().optional(),
  model_tier: modelTierEnum.optional(),
  tool_profile: toolProfileEnum.optional(),
  tool_overrides: toolOverrideSchema.optional(),
  sandbox: z.boolean().default(false),
});

const agentConfigsSchema = z.record(z.string(), agentConfigEntrySchema).default({});

const routingSchema = z.object({
  default_agent: z.string().optional(),
  channels: z.record(z.string(), z.string()).default({}),
  senders: z.record(z.string(), z.string()).default({}),
}).default({});
```

Then add these three new fields to the `configSchema` z.object (around lines 192-212):

```typescript
sandbox: sandboxSchema,
agent_configs: agentConfigsSchema,
routing: routingSchema,
```

And add type exports at the end (after line 230):

```typescript
export type SandboxConfig = z.infer<typeof sandboxSchema>;
export type AgentConfigEntry = z.infer<typeof agentConfigEntrySchema>;
export type RoutingConfig = z.infer<typeof routingSchema>;
```

**Step 4: Update `src/config/index.ts` barrel export**

Add the new types to the export line:

```typescript
export {
  configSchema,
  type Config,
  type TelegramConfig,
  type ModelConfig,
  type CronJobConfig,
  type AgentsConfig,
  type CompactionConfig,
  type ToolProfile,
  type ToolOverrideConfig,
  type ToolsConfig,
  type SandboxConfig,
  type AgentConfigEntry,
  type RoutingConfig,
} from './schema.js';
```

**Step 5: Run test to verify it passes**

Run: `pnpm vitest run src/config/schema.test.ts`
Expected: PASS (all 6 tests)
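For reference, the defaults declared in Step 3 imply that a config omitting `sandbox`, `agent_configs`, and `routing` should parse to roughly the shape below. This is only an illustrative sketch: `expectedDefaults` is a hypothetical name that the plan's code does not define, and it assumes Zod 3 semantics, where an object-level `.default({})` is still run through the inner field defaults.

```typescript
// Illustrative only: `expectedDefaults` is a hypothetical name, not part of
// the plan. The values mirror the `.default(...)` calls in sandboxSchema,
// agentConfigsSchema, and routingSchema (assuming Zod 3 default handling).
const expectedDefaults = {
  sandbox: {
    enabled: false,
    image: 'node:22-slim',
    workspace_dir: '/workspace',
    network: 'none',
    memory_limit: '512m',
    cpu_limit: '1.0',
    timeout_seconds: 300,
  },
  agent_configs: {},
  // default_agent is optional and has no default, so the key is absent.
  routing: { channels: {}, senders: {} },
};

console.log(JSON.stringify(expectedDefaults, null, 2));
```

If the Step 1 tests fail on a nested default (for example `result.sandbox.image`), comparing the parsed output against this shape is a quick way to see which `.default(...)` call is missing.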

**Step 6: Run full test suite**

Run: `pnpm test:run`
Expected: All 606+ tests pass

**Step 7: Commit**

```bash
git add src/config/schema.ts src/config/schema.test.ts src/config/index.ts
git commit -m "feat: add sandbox, agent_configs, and routing config schemas"
```

---

## Task 2: DockerSandbox Class

**Files:**
- Create: `src/sandbox/docker.ts`
- Create: `src/sandbox/docker.test.ts`

**Step 1: Write the failing test**

Create file: `src/sandbox/docker.test.ts`

```typescript
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { DockerSandbox, type DockerSandboxConfig } from './docker.js';
import * as childProcess from 'child_process';

// Mock child_process.execFile
vi.mock('child_process', () => ({
  execFile: vi.fn(),
}));

const mockedExecFile = vi.mocked(childProcess.execFile);

function mockExecFileSuccess(stdout = '', stderr = '') {
  mockedExecFile.mockImplementation(
    (_cmd: unknown, _args: unknown, _opts: unknown, callback: unknown) => {
      (callback as (err: null, stdout: string, stderr: string) => void)(null, stdout, stderr);
      return {} as ReturnType<typeof childProcess.execFile>;
    },
  );
}

function mockExecFileError(message: string) {
  mockedExecFile.mockImplementation(
    (_cmd: unknown, _args: unknown, _opts: unknown, callback: unknown) => {
      (callback as (err: Error) => void)(new Error(message));
      return {} as ReturnType<typeof childProcess.execFile>;
    },
  );
}

describe('DockerSandbox', () => {
  const defaultConfig: DockerSandboxConfig = {
    sessionId: 'test-session',
    image: 'node:22-slim',
    workspaceDir: '/workspace',
    network: 'none',
    memoryLimit: '512m',
    cpuLimit: '1.0',
    timeoutSeconds: 300,
  };

  beforeEach(() => {
    vi.clearAllMocks();
  });

  describe('create()', () => {
    it('creates a docker container with correct args', async () => {
      mockExecFileSuccess('container-abc123');
      const sandbox = new DockerSandbox(defaultConfig);
      await sandbox.create();

      expect(mockedExecFile).toHaveBeenCalledWith(
        'docker',
        expect.arrayContaining([
          'create',
          '--name', expect.stringContaining('flynn-test-session'),
          '--memory', '512m',
          '--cpus', '1.0',
          '--network', 'none',
          '-v', expect.stringContaining(':/workspace'),
          'node:22-slim',
          'sleep', 'infinity',
        ]),
        expect.any(Object),
        expect.any(Function),
      );
      expect(sandbox.containerId).toBe('container-abc123');
    });

    it('starts the container after creating', async () => {
      mockExecFileSuccess('container-abc123');
      const sandbox = new DockerSandbox(defaultConfig);
      await sandbox.create();

      // Second call should be docker start
      expect(mockedExecFile).toHaveBeenCalledTimes(2);
      expect(mockedExecFile).toHaveBeenNthCalledWith(
        2,
        'docker',
        ['start', 'container-abc123'],
        expect.any(Object),
        expect.any(Function),
      );
    });

    it('throws if docker create fails', async () => {
      mockExecFileError('docker not found');
      const sandbox = new DockerSandbox(defaultConfig);
      await expect(sandbox.create()).rejects.toThrow('docker not found');
    });
  });

  describe('exec()', () => {
    it('runs command inside container', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      // Manually set container ID to skip create
      (sandbox as unknown as { _containerId: string })._containerId = 'container-abc';
      mockExecFileSuccess('hello world\n');

      const result = await sandbox.exec('echo hello world');

      expect(mockedExecFile).toHaveBeenCalledWith(
        'docker',
        ['exec', 'container-abc', 'bash', '-c', 'echo hello world'],
        expect.objectContaining({ timeout: expect.any(Number) }),
        expect.any(Function),
      );
      expect(result).toEqual({ stdout: 'hello world\n', stderr: '' });
    });

    it('passes cwd as workdir option', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      (sandbox as unknown as { _containerId: string })._containerId = 'container-abc';
      mockExecFileSuccess('');

      await sandbox.exec('ls', { cwd: '/workspace/project' });

      expect(mockedExecFile).toHaveBeenCalledWith(
        'docker',
        ['exec', '-w', '/workspace/project', 'container-abc', 'bash', '-c', 'ls'],
        expect.any(Object),
        expect.any(Function),
      );
    });

    it('throws if no container created', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      await expect(sandbox.exec('echo hi')).rejects.toThrow('not created');
    });
  });

  describe('destroy()', () => {
    it('force-removes the container', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      (sandbox as unknown as { _containerId: string })._containerId = 'container-abc';
      mockExecFileSuccess();

      await sandbox.destroy();

      expect(mockedExecFile).toHaveBeenCalledWith(
        'docker',
        ['rm', '-f', 'container-abc'],
        expect.any(Object),
        expect.any(Function),
      );
    });

    it('does nothing if no container', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      await sandbox.destroy(); // should not throw
      expect(mockedExecFile).not.toHaveBeenCalled();
    });
  });

  describe('isAvailable()', () => {
    it('returns true when docker is installed', async () => {
      mockExecFileSuccess('Docker version 27.0.0');
      const result = await DockerSandbox.isAvailable();
      expect(result).toBe(true);
    });

    it('returns false when docker is not installed', async () => {
      mockExecFileError('command not found');
      const result = await DockerSandbox.isAvailable();
      expect(result).toBe(false);
    });
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/sandbox/docker.test.ts`
Expected: FAIL (cannot find module `./docker.js`)

**Step 3: Implement DockerSandbox**

Create file: `src/sandbox/docker.ts`

```typescript
import { execFile } from 'child_process';

export interface DockerSandboxConfig {
  sessionId: string;
  image: string;
  workspaceDir: string;
  network: 'none' | 'bridge' | 'host';
  memoryLimit: string;
  cpuLimit: string;
  timeoutSeconds: number;
}

export interface ExecOptions {
  cwd?: string;
  timeout?: number;
}

export interface ExecResult {
  stdout: string;
  stderr: string;
}

/**
 * Manages a single Docker container for sandboxed tool execution.
 * Uses the Docker CLI directly (no SDK dependency).
 */
export class DockerSandbox {
  private config: DockerSandboxConfig;
  private _containerId: string | null = null;
  private _hostWorkdir: string;

  constructor(config: DockerSandboxConfig) {
    this.config = config;
    // Use a temp directory on the host, named by session
    const sanitizedId = config.sessionId.replace(/[^a-zA-Z0-9_-]/g, '_');
    this._hostWorkdir = `/tmp/flynn-sandbox-${sanitizedId}`;
  }

  get containerId(): string | null {
    return this._containerId;
  }

  get containerName(): string {
    const sanitizedId = this.config.sessionId.replace(/[^a-zA-Z0-9_-]/g, '_');
    return `flynn-${sanitizedId}`;
  }

  /** Create and start the sandbox container. */
  async create(): Promise<void> {
    const args = [
      'create',
      '--name', this.containerName,
      '--memory', this.config.memoryLimit,
      '--cpus', this.config.cpuLimit,
      '--network', this.config.network,
      '-v', `${this._hostWorkdir}:${this.config.workspaceDir}`,
      this.config.image,
      'sleep', 'infinity',
    ];

    // `docker create` prints the new container ID on stdout
    const createResult = await this.dockerCmd(args);
    this._containerId = createResult.stdout.trim();
    await this.dockerCmd(['start', this._containerId]);
  }

  /** Execute a command inside the container. */
  async exec(command: string, opts?: ExecOptions): Promise<ExecResult> {
    if (!this._containerId) {
      throw new Error('Sandbox container not created. Call create() first.');
    }

    const args = ['exec'];
    if (opts?.cwd) {
      args.push('-w', opts.cwd);
    }
    args.push(this._containerId, 'bash', '-c', command);

    // Per-call timeout (ms) wins; otherwise fall back to the configured limit
    const timeout = opts?.timeout ?? this.config.timeoutSeconds * 1000;
    return this.dockerCmd(args, timeout);
  }

  /** Force-remove the container. */
  async destroy(): Promise<void> {
    if (!this._containerId) return;
    try {
      await this.dockerCmd(['rm', '-f', this._containerId]);
    } catch {
      // Ignore errors during cleanup
    }
    this._containerId = null;
  }

  /** Check if Docker is available on this host.
   */
  static async isAvailable(): Promise<boolean> {
    try {
      await new Promise<string>((resolve, reject) => {
        execFile('docker', ['version', '--format', '{{.Server.Version}}'], {
          timeout: 5000,
        }, (error, stdout) => {
          if (error) reject(error);
          else resolve(stdout);
        });
      });
      return true;
    } catch {
      return false;
    }
  }

  /** Run a docker CLI command. */
  private dockerCmd(args: string[], timeout = 30_000): Promise<ExecResult> {
    return new Promise<ExecResult>((resolve, reject) => {
      execFile('docker', args, { timeout, maxBuffer: 1024 * 1024 }, (error, stdout, stderr) => {
        if (error) {
          reject(error);
          return;
        }
        resolve({ stdout, stderr });
      });
    });
  }
}
```

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/sandbox/docker.test.ts`
Expected: PASS (all tests)

**Step 5: Commit**

```bash
git add src/sandbox/docker.ts src/sandbox/docker.test.ts
git commit -m "feat: add DockerSandbox class for container lifecycle"
```

---

## Task 3: SandboxManager

**Files:**
- Create: `src/sandbox/manager.ts`
- Create: `src/sandbox/manager.test.ts`

**Step 1: Write the failing test**

Create file: `src/sandbox/manager.test.ts`

```typescript
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { SandboxManager } from './manager.js';
import { DockerSandbox } from './docker.js';
import type { SandboxConfig } from '../config/schema.js';

// Mock DockerSandbox
vi.mock('./docker.js', () => ({
  DockerSandbox: vi.fn().mockImplementation(() => ({
    create: vi.fn().mockResolvedValue(undefined),
    destroy: vi.fn().mockResolvedValue(undefined),
    exec: vi.fn().mockResolvedValue({ stdout: '', stderr: '' }),
    containerId: 'mock-container',
  })),
}));

describe('SandboxManager', () => {
  const defaultConfig: SandboxConfig = {
    enabled: true,
    image: 'node:22-slim',
    workspace_dir: '/workspace',
    network: 'none',
    memory_limit: '512m',
    cpu_limit: '1.0',
    timeout_seconds: 300,
  };

  beforeEach(() => {
    vi.clearAllMocks();
  });

  describe('getOrCreate()', () => {
    it('creates a new sandbox for unknown session', async () => {
      const manager = new SandboxManager(defaultConfig);
      const sandbox = await manager.getOrCreate('session-1');

      expect(DockerSandbox).toHaveBeenCalledWith(expect.objectContaining({
        sessionId: 'session-1',
        image: 'node:22-slim',
      }));
      expect(sandbox.create).toHaveBeenCalled();
    });

    it('reuses existing sandbox for same session', async () => {
      const manager = new SandboxManager(defaultConfig);
      const first = await manager.getOrCreate('session-1');
      const second = await manager.getOrCreate('session-1');

      expect(first).toBe(second);
      expect(DockerSandbox).toHaveBeenCalledTimes(1);
    });

    it('creates separate sandboxes for different sessions', async () => {
      const manager = new SandboxManager(defaultConfig);
      await manager.getOrCreate('session-1');
      await manager.getOrCreate('session-2');

      expect(DockerSandbox).toHaveBeenCalledTimes(2);
    });
  });

  describe('destroy()', () => {
    it('destroys sandbox and removes from cache', async () => {
      const manager = new SandboxManager(defaultConfig);
      const sandbox = await manager.getOrCreate('session-1');

      await manager.destroy('session-1');
      expect(sandbox.destroy).toHaveBeenCalled();

      // Should create a new one now
      await manager.getOrCreate('session-1');
      expect(DockerSandbox).toHaveBeenCalledTimes(2);
    });

    it('does nothing for unknown session', async () => {
      const manager = new SandboxManager(defaultConfig);
      await manager.destroy('nonexistent'); // should not throw
    });
  });

  describe('destroyAll()', () => {
    it('destroys all sandboxes', async () => {
      const manager = new SandboxManager(defaultConfig);
      const s1 = await manager.getOrCreate('session-1');
      const s2 = await manager.getOrCreate('session-2');

      await manager.destroyAll();

      expect(s1.destroy).toHaveBeenCalled();
      expect(s2.destroy).toHaveBeenCalled();
    });
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/sandbox/manager.test.ts`
Expected: FAIL (cannot find module `./manager.js`)

**Step 3: Implement SandboxManager**

Create file: `src/sandbox/manager.ts`

```typescript
import { DockerSandbox } from './docker.js';
import type { SandboxConfig } from '../config/schema.js';

/**
 * Manages per-session Docker sandboxes.
 * Creates containers lazily on first access, destroys on session cleanup.
 */
export class SandboxManager {
  private sandboxes = new Map<string, DockerSandbox>();
  private config: SandboxConfig;

  constructor(config: SandboxConfig) {
    this.config = config;
  }

  /** Get or create a sandbox for a session. */
  async getOrCreate(sessionId: string): Promise<DockerSandbox> {
    let sandbox = this.sandboxes.get(sessionId);
    if (sandbox) return sandbox;

    // Map snake_case config keys to the camelCase DockerSandboxConfig
    sandbox = new DockerSandbox({
      sessionId,
      image: this.config.image,
      workspaceDir: this.config.workspace_dir,
      network: this.config.network,
      memoryLimit: this.config.memory_limit,
      cpuLimit: this.config.cpu_limit,
      timeoutSeconds: this.config.timeout_seconds,
    });

    await sandbox.create();
    this.sandboxes.set(sessionId, sandbox);
    return sandbox;
  }

  /** Destroy a specific session's sandbox. */
  async destroy(sessionId: string): Promise<void> {
    const sandbox = this.sandboxes.get(sessionId);
    if (!sandbox) return;
    await sandbox.destroy();
    this.sandboxes.delete(sessionId);
  }

  /** Destroy all sandboxes (daemon shutdown).
   */
  async destroyAll(): Promise<void> {
    const entries = Array.from(this.sandboxes.entries());
    await Promise.allSettled(
      entries.map(async ([id, sandbox]) => {
        await sandbox.destroy();
        this.sandboxes.delete(id);
      }),
    );
  }
}
```

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/sandbox/manager.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/sandbox/manager.ts src/sandbox/manager.test.ts
git commit -m "feat: add SandboxManager for per-session container lifecycle"
```

---

## Task 4: Sandboxed Tool Wrappers

**Files:**
- Create: `src/sandbox/tools.ts`
- Create: `src/sandbox/tools.test.ts`

**Step 1: Write the failing test**

Create file: `src/sandbox/tools.test.ts`

```typescript
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { createSandboxedShellTool, createSandboxedProcessStartTool } from './tools.js';
import type { DockerSandbox } from './docker.js';

function mockSandbox(): DockerSandbox {
  return {
    exec: vi.fn().mockResolvedValue({ stdout: 'output', stderr: '' }),
    create: vi.fn(),
    destroy: vi.fn(),
    containerId: 'test-container',
    containerName: 'flynn-test',
    config: {},
  } as unknown as DockerSandbox;
}

describe('createSandboxedShellTool', () => {
  let sandbox: DockerSandbox;

  beforeEach(() => {
    sandbox = mockSandbox();
  });

  it('has the same name as shell.exec', () => {
    const tool = createSandboxedShellTool(sandbox);
    expect(tool.name).toBe('shell.exec');
  });

  it('delegates to sandbox.exec', async () => {
    const tool = createSandboxedShellTool(sandbox);
    const result = await tool.execute({ command: 'echo hello' });

    expect(sandbox.exec).toHaveBeenCalledWith('echo hello', { cwd: undefined, timeout: 30000 });
    expect(result.success).toBe(true);
    expect(result.output).toBe('output');
  });

  it('passes cwd to sandbox.exec', async () => {
    const tool = createSandboxedShellTool(sandbox);
    await tool.execute({ command: 'ls', cwd: '/workspace/project' });

    expect(sandbox.exec).toHaveBeenCalledWith('ls', {
      cwd: '/workspace/project',
      timeout: 30000,
    });
  });

  it('passes timeout to sandbox.exec', async () => {
    const tool = createSandboxedShellTool(sandbox);
    await tool.execute({ command: 'sleep 10', timeout: 5000 });

    expect(sandbox.exec).toHaveBeenCalledWith('sleep 10', { cwd: undefined, timeout: 5000 });
  });

  it('returns error on sandbox.exec failure', async () => {
    (sandbox.exec as ReturnType<typeof vi.fn>).mockRejectedValue(new Error('container dead'));
    const tool = createSandboxedShellTool(sandbox);
    const result = await tool.execute({ command: 'fail' });

    expect(result.success).toBe(false);
    expect(result.error).toBe('container dead');
  });

  it('includes stderr in output', async () => {
    (sandbox.exec as ReturnType<typeof vi.fn>).mockResolvedValue({ stdout: 'out', stderr: 'warn' });
    const tool = createSandboxedShellTool(sandbox);
    const result = await tool.execute({ command: 'cmd' });

    expect(result.output).toContain('out');
    expect(result.output).toContain('stderr: warn');
  });
});

describe('createSandboxedProcessStartTool', () => {
  let sandbox: DockerSandbox;

  beforeEach(() => {
    sandbox = mockSandbox();
  });

  it('has the same name as process.start', () => {
    const tool = createSandboxedProcessStartTool(sandbox);
    expect(tool.name).toBe('process.start');
  });

  it('runs detached command via sandbox', async () => {
    const tool = createSandboxedProcessStartTool(sandbox);
    const result = await tool.execute({ command: 'npm run dev' });

    expect(sandbox.exec).toHaveBeenCalledWith(
      expect.stringContaining('npm run dev'),
      expect.any(Object),
    );
    expect(result.success).toBe(true);
    expect(result.output).toContain('Started sandboxed background process');
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/sandbox/tools.test.ts`
Expected: FAIL (cannot find module `./tools.js`)

**Step 3: Implement sandboxed tools**

Create file: `src/sandbox/tools.ts`

```typescript
import type { Tool, ToolResult } from '../tools/types.js';
import type { DockerSandbox } from './docker.js';

interface ShellExecArgs {
  command: string;
  cwd?: string;
  timeout?: number;
}

interface ProcessStartArgs {
  command: string;
  cwd?: string;
}

/**
 * Create a sandboxed version of shell.exec that delegates to docker exec.
 * Same Tool interface — drop-in replacement for the host shell.exec.
 */
export function createSandboxedShellTool(sandbox: DockerSandbox): Tool {
  return {
    name: 'shell.exec',
    description: 'Execute a shell command inside a sandboxed container and return stdout/stderr.',
    inputSchema: {
      type: 'object',
      properties: {
        command: { type: 'string', description: 'The shell command to execute' },
        cwd: { type: 'string', description: 'Working directory inside the container (optional)' },
        timeout: { type: 'number', description: 'Timeout in milliseconds (default 30000)' },
      },
      required: ['command'],
    },
    execute: async (rawArgs: unknown): Promise<ToolResult> => {
      const args = rawArgs as ShellExecArgs;
      const timeout = args.timeout ?? 30_000;

      try {
        const result = await sandbox.exec(args.command, {
          cwd: args.cwd,
          timeout,
        });
        // Append stderr (if any) so the caller sees warnings alongside stdout
        const output = result.stdout + (result.stderr ? `\nstderr: ${result.stderr}` : '');
        return { success: true, output };
      } catch (error) {
        return {
          success: false,
          output: '',
          error: error instanceof Error ? error.message : String(error),
        };
      }
    },
  };
}

/**
 * Create a sandboxed version of process.start that runs in the container.
 * Uses `nohup ... &` via docker exec since we can't spawn detached inside containers.
 */
export function createSandboxedProcessStartTool(sandbox: DockerSandbox): Tool {
  return {
    name: 'process.start',
    description: 'Start a command in the background inside a sandboxed container.',
    inputSchema: {
      type: 'object',
      properties: {
        command: { type: 'string', description: 'The shell command to run in the background' },
        cwd: { type: 'string', description: 'Working directory inside the container (optional)' },
      },
      required: ['command'],
    },
    execute: async (rawArgs: unknown): Promise<ToolResult> => {
      const args = rawArgs as ProcessStartArgs;

      try {
        // Run via nohup + background in the container.
        // Single quotes in the command are escaped as '\'' so the whole
        // command survives being wrapped in bash -c '...'; `echo $!` prints the PID.
        const wrappedCmd = `nohup bash -c '${args.command.replace(/'/g, "'\\''")}' > /tmp/proc.log 2>&1 & echo $!`;
        const result = await sandbox.exec(wrappedCmd, { cwd: args.cwd });
        const pid = result.stdout.trim();

        return {
          success: true,
          output: `Started sandboxed background process (PID ${pid})\nCommand: ${args.command}`,
        };
      } catch (error) {
        return {
          success: false,
          output: '',
          error: error instanceof Error ? error.message : 'Failed to start sandboxed process',
        };
      }
    },
  };
}
```

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/sandbox/tools.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/sandbox/tools.ts src/sandbox/tools.test.ts
git commit -m "feat: add sandboxed tool wrappers for shell.exec and process.start"
```

---

## Task 5: Sandbox Barrel Export + ToolRegistry.clone()

**Files:**
- Create: `src/sandbox/index.ts`
- Modify: `src/tools/registry.ts:19-97`

**Step 1: Write the failing test for ToolRegistry.clone()**

Add to a new test or extend existing tests.
Create file `src/tools/registry.test.ts` (if it doesn't exist; check first):

```typescript
import { describe, it, expect, vi } from 'vitest';
import { ToolRegistry } from './registry.js';
import type { Tool } from './types.js';

function makeTool(name: string): Tool {
  return {
    name,
    description: `Mock ${name}`,
    inputSchema: { type: 'object', properties: {} },
    execute: async () => ({ success: true, output: '' }),
  };
}

describe('ToolRegistry', () => {
  describe('clone()', () => {
    it('creates a copy with all tools', () => {
      const reg = new ToolRegistry();
      reg.register(makeTool('tool.a'));
      reg.register(makeTool('tool.b'));

      const cloned = reg.clone();
      expect(cloned.list().map(t => t.name).sort()).toEqual(['tool.a', 'tool.b']);
    });

    it('inherits the policy from original', () => {
      const reg = new ToolRegistry();
      const mockPolicy = {
        filterTools: vi.fn(),
        isAllowed: vi.fn(),
        resolveAllowedNames: vi.fn(),
        getEffectiveProfile: vi.fn(),
      };
      reg.setPolicy(mockPolicy as any);

      const cloned = reg.clone();
      expect(cloned.getPolicy()).toBe(mockPolicy);
    });

    it('allows replacing tools in clone without affecting original', () => {
      const reg = new ToolRegistry();
      const originalTool = makeTool('shell.exec');
      reg.register(originalTool);

      const cloned = reg.clone();
      const replacementTool = makeTool('shell.exec');
      replacementTool.description = 'Sandboxed version';
      cloned.replace(replacementTool);

      expect(cloned.get('shell.exec')!.description).toBe('Sandboxed version');
      expect(reg.get('shell.exec')!.description).toBe('Mock shell.exec');
    });
  });

  describe('replace()', () => {
    it('replaces an existing tool', () => {
      const reg = new ToolRegistry();
      reg.register(makeTool('tool.a'));

      const replacement = makeTool('tool.a');
      replacement.description = 'New description';
      reg.replace(replacement);

      expect(reg.get('tool.a')!.description).toBe('New description');
    });

    it('throws if tool does not exist', () => {
      const reg = new ToolRegistry();
      expect(() => reg.replace(makeTool('nonexistent'))).toThrow('not registered');
    });
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/tools/registry.test.ts`
Expected: FAIL (`clone()` and `replace()` don't exist on ToolRegistry)

**Step 3: Add clone() and replace() to ToolRegistry**

In `src/tools/registry.ts`, add these two methods to the `ToolRegistry` class (after the `unregister` method, around line 32):

```typescript
/** Replace an existing tool with a new implementation. Throws if not registered. */
replace(tool: Tool): void {
  if (!this.tools.has(tool.name)) {
    throw new Error(`Tool '${tool.name}' is not registered — cannot replace`);
  }
  this.tools.set(tool.name, tool);
}

/** Create a shallow clone of this registry (new Map, same Tool objects + policy). */
clone(): ToolRegistry {
  const cloned = new ToolRegistry();
  for (const tool of this.tools.values()) {
    cloned.register(tool);
  }
  if (this._policy) {
    cloned.setPolicy(this._policy);
  }
  return cloned;
}
```

**Step 4: Create the sandbox barrel export**

Create file: `src/sandbox/index.ts`

```typescript
export { DockerSandbox, type DockerSandboxConfig, type ExecOptions, type ExecResult } from './docker.js';
export { SandboxManager } from './manager.js';
export { createSandboxedShellTool, createSandboxedProcessStartTool } from './tools.js';
```

**Step 5: Run tests to verify they pass**

Run: `pnpm vitest run src/tools/registry.test.ts`
Expected: PASS

**Step 6: Run full test suite**

Run: `pnpm test:run`
Expected: All tests pass

**Step 7: Commit**

```bash
git add src/sandbox/index.ts src/tools/registry.ts src/tools/registry.test.ts
git commit -m "feat: add ToolRegistry.clone() and replace() for per-session registries"
```

---

## Task 6: Agent Config Registry

**Files:**
- Create: `src/agents/registry.ts`
- Create: `src/agents/registry.test.ts`

**Step 1: Write the failing test**

Create file: `src/agents/registry.test.ts`

```typescript
import { describe, it, expect } from 'vitest';
import { AgentConfigRegistry, type AgentConfig } from './registry.js';

describe('AgentConfigRegistry', () => {
  describe('register()', () => {
    it('registers a named agent config', () => {
      const registry = new AgentConfigRegistry();
      const config: AgentConfig = { name: 'assistant', systemPrompt: 'Be helpful.' };
      registry.register(config);
      expect(registry.get('assistant')).toEqual(config);
    });

    it('throws on duplicate name', () => {
      const registry = new AgentConfigRegistry();
      registry.register({ name: 'assistant' });
      expect(() => registry.register({ name: 'assistant' })).toThrow('already registered');
    });
  });

  describe('get()', () => {
    it('returns undefined for unknown name', () => {
      const registry = new AgentConfigRegistry();
      expect(registry.get('nonexistent')).toBeUndefined();
    });
  });

  describe('list()', () => {
    it('returns all registered configs', () => {
      const registry = new AgentConfigRegistry();
      registry.register({ name: 'a' });
      registry.register({ name: 'b' });
      expect(registry.list().map(c => c.name).sort()).toEqual(['a', 'b']);
    });
  });

  describe('loadFromConfig()', () => {
    it('loads configs from a raw config object', () => {
      const registry = new AgentConfigRegistry();
      registry.loadFromConfig({
        assistant: {
          system_prompt: 'Be helpful.',
          model_tier: 'default',
          tool_profile: 'messaging',
          sandbox: false,
        },
        coder: {
          model_tier: 'complex',
          tool_profile: 'coding',
          sandbox: true,
        },
      });

      expect(registry.list()).toHaveLength(2);

      const assistant = registry.get('assistant')!;
      expect(assistant.systemPrompt).toBe('Be helpful.');
      expect(assistant.modelTier).toBe('default');
      expect(assistant.toolProfile).toBe('messaging');

      const coder = registry.get('coder')!;
      expect(coder.sandbox).toBe(true);
    });
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/agents/registry.test.ts`
Expected: FAIL (cannot find module `./registry.js`)

**Step 3: Implement AgentConfigRegistry**

Create file: `src/agents/registry.ts`

```typescript
import type { ToolProfile, ToolOverrideConfig } from '../config/schema.js';
import type { ModelTier } from '../models/router.js';

export interface AgentConfig {
  name: string;
  systemPrompt?: string;
  modelTier?: ModelTier;
  toolProfile?: ToolProfile;
  toolOverrides?: ToolOverrideConfig;
  sandbox?: boolean;
}

/**
 * AgentConfigRegistry — stores named agent configurations.
 * Loaded from YAML config at startup.
 */
export class AgentConfigRegistry {
  private configs = new Map<string, AgentConfig>();

  register(config: AgentConfig): void {
    if (this.configs.has(config.name)) {
      throw new Error(`Agent config '${config.name}' is already registered`);
    }
    this.configs.set(config.name, config);
  }

  get(name: string): AgentConfig | undefined {
    return this.configs.get(name);
  }

  list(): AgentConfig[] {
    return Array.from(this.configs.values());
  }

  /**
   * Load agent configs from the parsed YAML config.
   * Maps from the config schema format to the internal AgentConfig format.
   */
  loadFromConfig(rawConfigs: Record<string, {
    system_prompt?: string;
    model_tier?: string;
    tool_profile?: string;
    tool_overrides?: ToolOverrideConfig;
    sandbox?: boolean;
  }>): void {
    for (const [name, raw] of Object.entries(rawConfigs)) {
      this.register({
        name,
        systemPrompt: raw.system_prompt,
        modelTier: raw.model_tier as ModelTier | undefined,
        toolProfile: raw.tool_profile as ToolProfile | undefined,
        toolOverrides: raw.tool_overrides,
        sandbox: raw.sandbox,
      });
    }
  }
}
```

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/agents/registry.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/agents/registry.ts src/agents/registry.test.ts
git commit -m "feat: add AgentConfigRegistry for named agent configurations"
```

---

## Task 7: Agent Router

**Files:**
- Create: `src/agents/router.ts`
- Create: `src/agents/router.test.ts`

**Step 1: Write the failing test**

Create file: `src/agents/router.test.ts`

```typescript
import { describe, it, expect } from 'vitest';
import { AgentRouter, type RoutingConfig } from './router.js';

describe('AgentRouter', () => {
  describe('resolve()', () => {
    it('returns default_agent when no specific match', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: {},
        senders: {},
      });
      expect(router.resolve('telegram', '12345')).toBe('assistant');
    });

    it('returns undefined when no default and no match', () => {
      const router = new AgentRouter({
        channels: {},
        senders: {},
      });
      expect(router.resolve('telegram', '12345')).toBeUndefined();
    });

    it('matches exact sender', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: {},
        senders: { 'telegram:12345': 'coder' },
      });
      expect(router.resolve('telegram', '12345')).toBe('coder');
    });

    it('matches sender with glob pattern', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: {},
        senders: { 'slack:U0*': 'coder' },
      });
      expect(router.resolve('slack', 'U0ABC')).toBe('coder');
      // No sender or channel match, so this falls back to default_agent
      expect(router.resolve('slack', 'U1ABC')).toBe('assistant');
    });

    it('matches channel when no sender match', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: { discord: 'coder' },
        senders: {},
      });
      expect(router.resolve('discord', 'any-user')).toBe('coder');
    });

    it('sender match takes priority over channel match', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: { discord: 'coder' },
        senders: { 'discord:special-user': 'vip' },
      });
      expect(router.resolve('discord', 'special-user')).toBe('vip');
      expect(router.resolve('discord', 'normal-user')).toBe('coder');
    });

    it('falls through: sender → channel → default', () => {
      const router = new AgentRouter({
        default_agent: 'fallback',
        channels: { discord: 'guild-agent' },
        senders: { 'discord:admin': 'admin-agent' },
      });
      expect(router.resolve('discord', 'admin')).toBe('admin-agent');
      expect(router.resolve('discord', 'regular')).toBe('guild-agent');
      expect(router.resolve('telegram', 'someone')).toBe('fallback');
    });
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/agents/router.test.ts`
Expected: FAIL (cannot find module `./router.js`)

**Step 3: Implement AgentRouter**

Create file:
`src/agents/router.ts`

```typescript
/**
 * AgentRouter resolves which agent config to use for a given channel+sender.
 *
 * Resolution order:
 * 1. Exact sender match (channel:senderId)
 * 2. Glob pattern sender match
 * 3. Channel match
 * 4. default_agent fallback
 */

export interface RoutingConfig {
  default_agent?: string;
  channels: Record<string, string>;
  senders: Record<string, string>;
}

/**
 * Convert a simple glob pattern to regex.
 * Supports `*` (any run of characters); all other regex metacharacters are escaped.
 */
function patternToRegex(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`);
}

export class AgentRouter {
  private config: RoutingConfig;

  constructor(config: RoutingConfig) {
    this.config = config;
  }

  /**
   * Resolve the agent config name for a channel + sender pair.
   * Returns undefined if no match and no default.
   */
  resolve(channel: string, senderId: string): string | undefined {
    const senderKey = `${channel}:${senderId}`;

    // 1. Exact sender match
    if (this.config.senders[senderKey]) {
      return this.config.senders[senderKey];
    }

    // 2. Glob pattern sender match
    for (const [pattern, agentName] of Object.entries(this.config.senders)) {
      if (pattern.includes('*') && patternToRegex(pattern).test(senderKey)) {
        return agentName;
      }
    }

    // 3. Channel match
    if (this.config.channels[channel]) {
      return this.config.channels[channel];
    }

    // 4. Default fallback
    return this.config.default_agent;
  }
}
```

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/agents/router.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/agents/router.ts src/agents/router.test.ts
git commit -m "feat: add AgentRouter for config-based sender/channel routing"
```

---

## Task 8: Agents Barrel Export

**Files:**
- Create: `src/agents/index.ts`

**Step 1: Create the barrel file**

Create file: `src/agents/index.ts`

```typescript
export { AgentConfigRegistry, type AgentConfig } from './registry.js';
export { AgentRouter, type RoutingConfig } from './router.js';
```

**Step 2: Verify build**

Run: `pnpm typecheck`
Expected: No errors

**Step 3: Commit**

```bash
git add src/agents/index.ts
git commit -m "feat: add agents barrel export"
```

---

## Task 9: Wire Everything Into the Daemon

**Files:**
- Modify: `src/daemon/index.ts`

This is the integration task. The daemon's `createMessageRouter()` needs to use the `AgentRouter` and `SandboxManager`.
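
As a rough illustration of that wiring, the sketch below shows the per-message decision in isolation: resolve an agent name, look up its config, and fall back to the global settings when nothing matches. It is a simplified stand-in, not the daemon code. The interfaces are trimmed copies of the Task 6 and Task 7 types, `resolveAgentName` and `effectiveSettings` are hypothetical helpers that do not exist in the codebase, glob matching is left out, and `sandboxEnabled` stands in for both the `sandbox.enabled` flag and the presence of a `SandboxManager`.

```typescript
// Trimmed stand-ins for the Task 6 / Task 7 types so the sketch runs on its own.
interface AgentConfig {
  name: string;
  systemPrompt?: string;
  sandbox?: boolean;
}

interface RoutingConfig {
  default_agent?: string;
  channels: Record<string, string>;
  senders: Record<string, string>;
}

// Hypothetical, simplified resolver: exact sender, then channel, then default.
// (The real AgentRouter also checks glob patterns between the first two steps.)
function resolveAgentName(
  routing: RoutingConfig,
  channel: string,
  senderId: string,
): string | undefined {
  return (
    routing.senders[`${channel}:${senderId}`] ??
    routing.channels[channel] ??
    routing.default_agent
  );
}

// Hypothetical helper: picks the effective settings for one channel + sender,
// falling back to the global values when no agent config applies.
function effectiveSettings(
  routing: RoutingConfig,
  configs: Map<string, AgentConfig>,
  globals: { systemPrompt: string; sandboxEnabled: boolean },
  channel: string,
  senderId: string,
): { systemPrompt: string; sandboxed: boolean; cacheKey: string } {
  const name = resolveAgentName(routing, channel, senderId);
  const agentConfig = name ? configs.get(name) : undefined;
  return {
    // Agent-level prompt overrides the global prompt.
    systemPrompt: agentConfig?.systemPrompt ?? globals.systemPrompt,
    // Sandboxing needs both the per-agent flag and the global switch.
    sandboxed: Boolean(agentConfig?.sandbox) && globals.sandboxEnabled,
    // The agent name is part of the cache key, so a routing change yields a new agent.
    cacheKey: name ? `${channel}:${senderId}:${name}` : `${channel}:${senderId}`,
  };
}

const routing: RoutingConfig = {
  default_agent: 'assistant',
  channels: { discord: 'coder' },
  senders: {},
};
const configs = new Map<string, AgentConfig>([
  ['assistant', { name: 'assistant', systemPrompt: 'Be helpful.' }],
  ['coder', { name: 'coder', systemPrompt: 'Write code.', sandbox: true }],
]);
const globals = { systemPrompt: 'Global prompt.', sandboxEnabled: true };

const discord = effectiveSettings(routing, configs, globals, 'discord', 'user123');
const telegram = effectiveSettings(routing, configs, globals, 'telegram', 'random');
console.log(discord, telegram);
```

Keeping this decision in one place makes the fallback behaviour easy to reason about: a message with no routing match and no `default_agent` simply gets the global prompt and unsandboxed tools.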
**Step 1: Write the integration test**

Create file: `src/daemon/routing.test.ts`

```typescript
import { describe, it, expect } from 'vitest';
import { AgentRouter } from '../agents/router.js';
import { AgentConfigRegistry } from '../agents/registry.js';

describe('daemon agent routing integration', () => {
  it('resolves agent config for channel messages', () => {
    const registry = new AgentConfigRegistry();
    registry.loadFromConfig({
      assistant: { system_prompt: 'Be helpful.', model_tier: 'default', tool_profile: 'messaging', sandbox: false },
      coder: { system_prompt: 'Write code.', model_tier: 'complex', tool_profile: 'coding', sandbox: true },
    });

    const router = new AgentRouter({
      default_agent: 'assistant',
      channels: { discord: 'coder' },
      senders: { 'telegram:admin': 'coder' },
    });

    // Discord user gets coder
    const discordAgent = router.resolve('discord', 'user123');
    expect(discordAgent).toBe('coder');
    expect(registry.get(discordAgent!)!.systemPrompt).toBe('Write code.');

    // Telegram admin gets coder
    const telegramAdmin = router.resolve('telegram', 'admin');
    expect(telegramAdmin).toBe('coder');

    // Random telegram user gets assistant
    const telegramUser = router.resolve('telegram', 'random');
    expect(telegramUser).toBe('assistant');
    expect(registry.get(telegramUser!)!.systemPrompt).toBe('Be helpful.');
  });

  it('returns undefined when no routing is configured', () => {
    const router = new AgentRouter({ channels: {}, senders: {} });
    expect(router.resolve('telegram', '123')).toBeUndefined();
  });
});
```

**Step 2: Run test to verify it passes**

Run: `pnpm vitest run src/daemon/routing.test.ts`
Expected: PASS (this test exercises components that were built in earlier tasks, so it should pass immediately)

**Step 3: Modify daemon/index.ts**

Add imports at the top of `src/daemon/index.ts` (after existing imports):

```typescript
import { AgentConfigRegistry, AgentRouter } from '../agents/index.js';
import { SandboxManager, createSandboxedShellTool, createSandboxedProcessStartTool } from '../sandbox/index.js';
```

Add to
`DaemonContext` interface:

```typescript
agentConfigRegistry: AgentConfigRegistry;
agentRouter: AgentRouter;
sandboxManager?: SandboxManager;
```

Modify `createMessageRouter()` to accept additional dependencies:

```typescript
function createMessageRouter(deps: {
  sessionManager: SessionManager;
  modelRouter: ModelRouter;
  systemPrompt: string;
  toolRegistry: ToolRegistry;
  toolExecutor: ToolExecutor;
  config: Config;
  memoryStore?: MemoryStore;
  agentConfigRegistry?: AgentConfigRegistry;
  agentRouter?: AgentRouter;
  sandboxManager?: SandboxManager;
}) {
```

Inside `getOrCreateAgent()`, resolve the agent config and create sandboxed registries:

```typescript
function getOrCreateAgent(channel: string, senderId: string): AgentOrchestrator {
  // Resolve agent config name from routing
  const agentConfigName = deps.agentRouter?.resolve(channel, senderId);
  const agentConfig = agentConfigName
    ? deps.agentConfigRegistry?.get(agentConfigName)
    : undefined;

  const cacheKey = agentConfigName
    ? `${channel}:${senderId}:${agentConfigName}`
    : `${channel}:${senderId}`;

  let agent = agents.get(cacheKey);
  if (!agent) {
    const session = deps.sessionManager.getSession(channel, senderId);

    // Determine system prompt: agent config overrides global
    const systemPrompt = agentConfig?.systemPrompt ?? deps.systemPrompt;

    // Determine primary tier
    const primaryTier = agentConfig?.modelTier ?? deps.config.agents.primary_tier ??
      'default';

    // Determine tool policy context
    const toolPolicyContext: ToolPolicyContext = {
      agent: primaryTier,
      provider: deps.config.models.default.provider,
    };

    // Determine tool registry: sandbox if configured
    let toolRegistry = deps.toolRegistry;
    if (agentConfig?.sandbox && deps.sandboxManager && deps.config.sandbox.enabled) {
      // Create a cloned registry with sandboxed tools
      toolRegistry = deps.toolRegistry.clone();

      // Sandbox will be created lazily on first tool call
      // For now, create a wrapper that handles lazy initialization
      const sessionId = `${channel}:${senderId}`;
      const sandbox = deps.sandboxManager;
      const sandboxConfig = deps.config.sandbox;

      // Replace shell.exec and process.start with lazy-sandboxed versions
      const lazySandboxedShell: Tool = {
        name: 'shell.exec',
        description: 'Execute a shell command inside a sandboxed container.',
        inputSchema: {
          type: 'object',
          properties: {
            command: { type: 'string', description: 'The shell command to execute' },
            cwd: { type: 'string', description: 'Working directory (optional)' },
            timeout: { type: 'number', description: 'Timeout in milliseconds (default 30000)' },
          },
          required: ['command'],
        },
        execute: async (rawArgs: unknown) => {
          const dockerSandbox = await sandbox.getOrCreate(sessionId);
          const tool = createSandboxedShellTool(dockerSandbox);
          return tool.execute(rawArgs);
        },
      };

      const lazySandboxedProcessStart: Tool = {
        name: 'process.start',
        description: 'Start a command in the background inside a sandboxed container.',
        inputSchema: {
          type: 'object',
          properties: {
            command: { type: 'string', description: 'The shell command to run' },
            cwd: { type: 'string', description: 'Working directory (optional)' },
          },
          required: ['command'],
        },
        execute: async (rawArgs: unknown) => {
          const dockerSandbox = await sandbox.getOrCreate(sessionId);
          const tool = createSandboxedProcessStartTool(dockerSandbox);
          return tool.execute(rawArgs);
        },
      };

      toolRegistry.replace(lazySandboxedShell);
      toolRegistry.replace(lazySandboxedProcessStart);
    }
    const delegationConfig: DelegationConfig = {
      compaction: deps.config.agents.delegation.compaction ?? 'fast',
      memory_extraction: deps.config.agents.delegation.memory_extraction ?? 'fast',
      classification: deps.config.agents.delegation.classification ?? 'fast',
      tool_summarisation: deps.config.agents.delegation.tool_summarisation ?? 'fast',
      complex_reasoning: deps.config.agents.delegation.complex_reasoning ?? 'complex',
    };

    agent = new AgentOrchestrator({
      modelRouter: deps.modelRouter,
      systemPrompt,
      session,
      toolRegistry,
      toolExecutor: deps.toolExecutor,
      primaryTier,
      delegation: delegationConfig,
      maxDelegationDepth: deps.config.agents.max_delegation_depth ?? 3,
      compaction: deps.config.compaction.enabled
        ? {
            thresholdPct: deps.config.compaction.threshold_pct,
            keepTurns: deps.config.compaction.keep_turns,
            summaryMaxTokens: deps.config.compaction.summary_max_tokens,
          }
        : undefined,
      modelName: deps.config.models.default.model,
      contextWindow: deps.config.models.default.context_window,
      memoryStore: deps.memoryStore,
      toolPolicyContext,
    });
    agents.set(cacheKey, agent);
  }
  return agent;
}
```

In `startDaemon()`, add agent config registry and router initialization after skills loading (around line 385):

```typescript
// Initialize agent config registry and router
const agentConfigRegistry = new AgentConfigRegistry();
if (config.agent_configs && Object.keys(config.agent_configs).length > 0) {
  agentConfigRegistry.loadFromConfig(config.agent_configs);
  console.log(`Loaded ${Object.keys(config.agent_configs).length} agent config(s): ${Object.keys(config.agent_configs).join(', ')}`);
}

const agentRouter = new AgentRouter(config.routing);

// Initialize sandbox manager if enabled
let sandboxManager: SandboxManager | undefined;
if (config.sandbox.enabled) {
  const dockerAvailable = await DockerSandbox.isAvailable();
  if (dockerAvailable) {
    sandboxManager = new SandboxManager(config.sandbox);
    console.log(`Docker sandbox enabled: image=${config.sandbox.image}, network=${config.sandbox.network}`);
  } else {
    console.warn('Docker sandbox enabled in config but Docker is not available — falling back to host execution');
  }
}
```

Add sandbox shutdown hook:

```typescript
if (sandboxManager) {
  lifecycle.onShutdown(async () => {
    await sandboxManager!.destroyAll();
    console.log('Docker sandboxes destroyed');
  });
}
```

Pass new deps to `createMessageRouter()`:

```typescript
channelRegistry.setMessageHandler(createMessageRouter({
  sessionManager,
  modelRouter,
  systemPrompt,
  toolRegistry,
  toolExecutor,
  config,
  memoryStore,
  agentConfigRegistry,
  agentRouter,
  sandboxManager,
}));
```

Add to DaemonContext return:

```typescript
return {
  config,
  lifecycle,
  sessionStore,
  sessionManager,
  hookEngine,
  modelRouter,
  toolRegistry,
  toolExecutor,
  gateway,
  channelRegistry,
  mcpManager,
  skillRegistry,
  skillInstaller,
  agentConfigRegistry,
  agentRouter,
  sandboxManager,
};
```

Note: You'll need to import `DockerSandbox` and the `Tool` type at the top, and import `ToolPolicyContext`:

```typescript
import { DockerSandbox } from '../sandbox/index.js';
import type { Tool } from '../tools/types.js';
import type { ToolPolicyContext } from '../tools/policy.js';
```

**Step 4: Run full test suite**

Run: `pnpm test:run`
Expected: All tests pass

**Step 5: Run typecheck**

Run: `pnpm typecheck`
Expected: No errors

**Step 6: Commit**

```bash
git add src/daemon/index.ts src/daemon/routing.test.ts
git commit -m "feat: wire Docker sandboxing and agent routing into daemon"
```

---

## Task 10: Update state.json + Final Verification

**Files:**
- Modify: `docs/plans/state.json`

**Step 1: Run full test suite and typecheck**

Run: `pnpm test:run && pnpm typecheck`
Expected: All tests pass, no type errors

**Step 2: Update state.json**

Add the new P2 entries to `docs/plans/state.json` under the `p2-implementation` plan's `phases` object:

```json
"docker_sandboxing": {
  "priority": "P2",
  "status": "completed",
  "description": "Docker container sandboxing for channel tool execution (shell.exec, process.start)",
  "files_created": [
    "src/sandbox/docker.ts",
    "src/sandbox/docker.test.ts",
    "src/sandbox/manager.ts",
    "src/sandbox/manager.test.ts",
    "src/sandbox/tools.ts",
    "src/sandbox/tools.test.ts",
    "src/sandbox/index.ts"
  ],
  "files_modified": [
    "src/config/schema.ts",
    "src/config/index.ts",
    "src/tools/registry.ts",
    "src/daemon/index.ts"
  ],
  "test_status": "N/N passing"
},
"multi_agent_routing": {
  "priority": "P2",
  "status": "completed",
  "description": "Named agent configs with config-based channel/sender routing",
  "files_created": [
    "src/agents/registry.ts",
    "src/agents/registry.test.ts",
    "src/agents/router.ts",
    "src/agents/router.test.ts",
    "src/agents/index.ts",
    "src/daemon/routing.test.ts",
    "src/config/schema.test.ts"
  ],
  "files_modified": [
    "src/config/schema.ts",
    "src/config/index.ts",
    "src/daemon/index.ts"
  ],
  "test_status": "N/N passing"
}
```

Update `overall_progress.p2_completion` to `"7/7 (100%)"` and `next_up` to `"p3 (group chat, gateway auth, gemini provider, browser control, additional providers)"`. Update `overall_progress.total_test_count` with the actual count.

**Step 3: Commit**

```bash
git add docs/plans/state.json
git commit -m "docs: update state.json with Docker sandbox and multi-agent routing"
```

---

## Summary

| Task | Component | Est. Time |
|------|-----------|-----------|
| 1 | Config schemas (sandbox + agent_configs + routing) | 5 min |
| 2 | DockerSandbox class | 5 min |
| 3 | SandboxManager | 3 min |
| 4 | Sandboxed tool wrappers | 5 min |
| 5 | Barrel export + ToolRegistry.clone() | 3 min |
| 6 | AgentConfigRegistry | 3 min |
| 7 | AgentRouter | 3 min |
| 8 | Agents barrel export | 1 min |
| 9 | Daemon integration | 10 min |
| 10 | State update + verification | 3 min |

**Total estimated: ~40 minutes**