P2: Docker Sandboxing + Multi-Agent Routing — Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Add Docker container sandboxing for channel tool execution and named agent configuration with config-based routing.

Architecture: Tool-level wrapping — sandboxed shell.exec and process.start delegate to docker exec inside per-session containers. Agent config registry stores named agent definitions (system prompt, model tier, tool profile, sandbox flag) with config-based routing that maps channels/senders to agent configs.

Tech Stack: TypeScript (ES2022, NodeNext), Zod schemas, Vitest tests, Docker CLI (no SDK dependency), child_process.execFile.
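
The config-based routing in the architecture summary can be pictured as a small lookup. This is a minimal sketch, not the plan's implementation: the function names `ownValue` and `resolveAgentName` are hypothetical, and the precedence order (sender match first, then channel, then `default_agent`) is an assumption. The plan only specifies the three routing fields and the `channel:senderId` key format that appears in the Task 1 schema tests.

```typescript
// Simplified shape mirroring the `routing` config section.
interface RoutingConfig {
  default_agent?: string;
  channels: Record<string, string>;
  senders: Record<string, string>;
}

// Reads only own properties, so keys such as "constructor" never match by accident.
function ownValue(map: Record<string, string>, key: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(map, key) ? map[key] : undefined;
}

// Assumed precedence: exact sender match, then channel match, then default.
function resolveAgentName(
  routing: RoutingConfig,
  channel: string,
  senderId: string,
): string | undefined {
  const senderKey = `${channel}:${senderId}`;
  return (
    ownValue(routing.senders, senderKey) ??
    ownValue(routing.channels, channel) ??
    routing.default_agent
  );
}

const exampleRouting: RoutingConfig = {
  default_agent: 'assistant',
  channels: { discord: 'coder' },
  senders: { 'telegram:12345': 'coder' },
};
```

With `exampleRouting`, a Telegram message from sender `12345` would resolve to `coder`, while any other Telegram sender would fall through to `assistant`.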


Task 1: Config Schema — Sandbox + Agent Configs + Routing

Files:

  • Create: src/config/schema.test.ts
  • Modify: src/config/schema.ts:164-231
  • Modify: src/config/index.ts:1-3

Step 1: Write the failing test

Create file: src/config/schema.test.ts

import { describe, it, expect } from 'vitest';
import { configSchema } from './schema.js';

describe('configSchema — sandbox', () => {
  const minimalConfig = {
    telegram: { bot_token: 'test', allowed_chat_ids: [1] },
    models: { default: { provider: 'anthropic', model: 'claude-3' } },
  };

  it('defaults sandbox to disabled', () => {
    const result = configSchema.parse(minimalConfig);
    expect(result.sandbox.enabled).toBe(false);
    expect(result.sandbox.image).toBe('node:22-slim');
    expect(result.sandbox.network).toBe('none');
    expect(result.sandbox.memory_limit).toBe('512m');
    expect(result.sandbox.cpu_limit).toBe('1.0');
    expect(result.sandbox.timeout_seconds).toBe(300);
  });

  it('accepts sandbox config', () => {
    const result = configSchema.parse({
      ...minimalConfig,
      sandbox: { enabled: true, image: 'ubuntu:24.04', network: 'bridge' },
    });
    expect(result.sandbox.enabled).toBe(true);
    expect(result.sandbox.image).toBe('ubuntu:24.04');
    expect(result.sandbox.network).toBe('bridge');
  });
});

describe('configSchema — agent_configs', () => {
  const minimalConfig = {
    telegram: { bot_token: 'test', allowed_chat_ids: [1] },
    models: { default: { provider: 'anthropic', model: 'claude-3' } },
  };

  it('defaults agent_configs to empty', () => {
    const result = configSchema.parse(minimalConfig);
    expect(result.agent_configs).toEqual({});
  });

  it('accepts named agent configs', () => {
    const result = configSchema.parse({
      ...minimalConfig,
      agent_configs: {
        assistant: {
          system_prompt: 'You are helpful.',
          model_tier: 'default',
          tool_profile: 'messaging',
        },
        coder: {
          model_tier: 'complex',
          tool_profile: 'coding',
          sandbox: true,
        },
      },
    });
    expect(result.agent_configs.assistant.system_prompt).toBe('You are helpful.');
    expect(result.agent_configs.assistant.tool_profile).toBe('messaging');
    expect(result.agent_configs.coder.sandbox).toBe(true);
  });
});

describe('configSchema — routing', () => {
  const minimalConfig = {
    telegram: { bot_token: 'test', allowed_chat_ids: [1] },
    models: { default: { provider: 'anthropic', model: 'claude-3' } },
  };

  it('defaults routing to empty', () => {
    const result = configSchema.parse(minimalConfig);
    expect(result.routing.default_agent).toBeUndefined();
    expect(result.routing.channels).toEqual({});
    expect(result.routing.senders).toEqual({});
  });

  it('accepts routing config', () => {
    const result = configSchema.parse({
      ...minimalConfig,
      routing: {
        default_agent: 'assistant',
        channels: { discord: 'coder' },
        senders: { 'telegram:12345': 'coder' },
      },
    });
    expect(result.routing.default_agent).toBe('assistant');
    expect(result.routing.channels.discord).toBe('coder');
    expect(result.routing.senders['telegram:12345']).toBe('coder');
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/config/schema.test.ts
Expected: FAIL (sandbox, agent_configs, and routing properties don't exist on config)

Step 3: Implement the schema additions

Add to src/config/schema.ts before the configSchema definition (before line 192):

// ── Sandbox schemas ───────────────────────────────────────────────────

const sandboxSchema = z.object({
  enabled: z.boolean().default(false),
  image: z.string().default('node:22-slim'),
  workspace_dir: z.string().default('/workspace'),
  network: z.enum(['none', 'bridge', 'host']).default('none'),
  memory_limit: z.string().default('512m'),
  cpu_limit: z.string().default('1.0'),
  timeout_seconds: z.number().min(10).max(3600).default(300),
}).default({});

// ── Agent config + routing schemas ────────────────────────────────────

const modelTierEnum = z.enum(['fast', 'default', 'complex', 'local']);

const agentConfigEntrySchema = z.object({
  system_prompt: z.string().optional(),
  model_tier: modelTierEnum.optional(),
  tool_profile: toolProfileEnum.optional(),
  tool_overrides: toolOverrideSchema.optional(),
  sandbox: z.boolean().default(false),
});

const agentConfigsSchema = z.record(z.string(), agentConfigEntrySchema).default({});

const routingSchema = z.object({
  default_agent: z.string().optional(),
  channels: z.record(z.string(), z.string()).default({}),
  senders: z.record(z.string(), z.string()).default({}),
}).default({});

Then add these three new fields to the configSchema z.object (around lines 192-212):

  sandbox: sandboxSchema,
  agent_configs: agentConfigsSchema,
  routing: routingSchema,

And add type exports at the end (after line 230):

export type SandboxConfig = z.infer<typeof sandboxSchema>;
export type AgentConfigEntry = z.infer<typeof agentConfigEntrySchema>;
export type RoutingConfig = z.infer<typeof routingSchema>;
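
The schema accepts any string as a routing target, so a route can name an agent that is missing from `agent_configs`. A startup check along the following lines could catch that early. This is a sketch under assumptions: `findUnknownRoutingTargets` and the simplified `RoutingShape` type are hypothetical and not part of the plan.

```typescript
// Simplified stand-in for the inferred routing type.
interface RoutingShape {
  default_agent?: string;
  channels: Record<string, string>;
  senders: Record<string, string>;
}

// Returns every routing target that has no matching key in agent_configs.
function findUnknownRoutingTargets(
  agentNames: readonly string[],
  routing: RoutingShape,
): string[] {
  const known = new Set(agentNames);
  const targets = [
    ...(routing.default_agent !== undefined ? [routing.default_agent] : []),
    ...Object.values(routing.channels),
    ...Object.values(routing.senders),
  ];
  // De-duplicate while keeping first-seen order.
  return Array.from(new Set(targets)).filter((name) => !known.has(name));
}

const unknownTargets = findUnknownRoutingTargets(['assistant', 'coder'], {
  default_agent: 'assistant',
  channels: { discord: 'coder' },
  senders: { 'telegram:12345': 'reviewer' },
});
// unknownTargets is ['reviewer']
```

Whether an unknown target should be a hard error or a warning is a design choice the plan leaves open.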

Step 4: Update src/config/index.ts barrel export

Add the new types to the export line:

export { configSchema, type Config, type TelegramConfig, type ModelConfig, type CronJobConfig, type AgentsConfig, type CompactionConfig, type ToolProfile, type ToolOverrideConfig, type ToolsConfig, type SandboxConfig, type AgentConfigEntry, type RoutingConfig } from './schema.js';

Step 5: Run test to verify it passes

Run: pnpm vitest run src/config/schema.test.ts
Expected: PASS (all 6 tests)

Step 6: Run full test suite

Run: pnpm test:run
Expected: All 606+ tests pass

Step 7: Commit

git add src/config/schema.ts src/config/schema.test.ts src/config/index.ts
git commit -m "feat: add sandbox, agent_configs, and routing config schemas"

Task 2: DockerSandbox Class

Files:

  • Create: src/sandbox/docker.ts
  • Create: src/sandbox/docker.test.ts

Step 1: Write the failing test

Create file: src/sandbox/docker.test.ts

import { describe, it, expect, vi, beforeEach } from 'vitest';
import { DockerSandbox, type DockerSandboxConfig } from './docker.js';
import * as childProcess from 'child_process';

// Mock child_process.execFile
vi.mock('child_process', () => ({
  execFile: vi.fn(),
}));

const mockedExecFile = vi.mocked(childProcess.execFile);

function mockExecFileSuccess(stdout = '', stderr = '') {
  mockedExecFile.mockImplementation(
    (_cmd: unknown, _args: unknown, _opts: unknown, callback: unknown) => {
      (callback as (err: null, stdout: string, stderr: string) => void)(null, stdout, stderr);
      return {} as ReturnType<typeof childProcess.execFile>;
    },
  );
}

function mockExecFileError(message: string) {
  mockedExecFile.mockImplementation(
    (_cmd: unknown, _args: unknown, _opts: unknown, callback: unknown) => {
      (callback as (err: Error) => void)(new Error(message));
      return {} as ReturnType<typeof childProcess.execFile>;
    },
  );
}

describe('DockerSandbox', () => {
  const defaultConfig: DockerSandboxConfig = {
    sessionId: 'test-session',
    image: 'node:22-slim',
    workspaceDir: '/workspace',
    network: 'none',
    memoryLimit: '512m',
    cpuLimit: '1.0',
    timeoutSeconds: 300,
  };

  beforeEach(() => {
    vi.clearAllMocks();
  });

  describe('create()', () => {
    it('creates a docker container with correct args', async () => {
      mockExecFileSuccess('container-abc123');
      const sandbox = new DockerSandbox(defaultConfig);
      await sandbox.create();

      expect(mockedExecFile).toHaveBeenCalledWith(
        'docker',
        expect.arrayContaining([
          'create',
          '--name', expect.stringContaining('flynn-test-session'),
          '--memory', '512m',
          '--cpus', '1.0',
          '--network', 'none',
          '-v', expect.stringContaining(':/workspace'),
          'node:22-slim',
          'sleep', 'infinity',
        ]),
        expect.any(Object),
        expect.any(Function),
      );
      expect(sandbox.containerId).toBe('container-abc123');
    });

    it('starts the container after creating', async () => {
      mockExecFileSuccess('container-abc123');
      const sandbox = new DockerSandbox(defaultConfig);
      await sandbox.create();

      // Second call should be docker start
      expect(mockedExecFile).toHaveBeenCalledTimes(2);
      expect(mockedExecFile).toHaveBeenNthCalledWith(
        2, 'docker', ['start', 'container-abc123'],
        expect.any(Object), expect.any(Function),
      );
    });

    it('throws if docker create fails', async () => {
      mockExecFileError('docker not found');
      const sandbox = new DockerSandbox(defaultConfig);
      await expect(sandbox.create()).rejects.toThrow('docker not found');
    });
  });

  describe('exec()', () => {
    it('runs command inside container', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      // Manually set container ID to skip create
      (sandbox as unknown as { _containerId: string })._containerId = 'container-abc';

      mockExecFileSuccess('hello world\n');
      const result = await sandbox.exec('echo hello world');

      expect(mockedExecFile).toHaveBeenCalledWith(
        'docker',
        ['exec', 'container-abc', 'bash', '-c', 'echo hello world'],
        expect.objectContaining({ timeout: expect.any(Number) }),
        expect.any(Function),
      );
      expect(result).toEqual({ stdout: 'hello world\n', stderr: '' });
    });

    it('passes cwd as workdir option', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      (sandbox as unknown as { _containerId: string })._containerId = 'container-abc';

      mockExecFileSuccess('');
      await sandbox.exec('ls', { cwd: '/workspace/project' });

      expect(mockedExecFile).toHaveBeenCalledWith(
        'docker',
        ['exec', '-w', '/workspace/project', 'container-abc', 'bash', '-c', 'ls'],
        expect.any(Object),
        expect.any(Function),
      );
    });

    it('throws if no container created', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      await expect(sandbox.exec('echo hi')).rejects.toThrow('not created');
    });
  });

  describe('destroy()', () => {
    it('force-removes the container', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      (sandbox as unknown as { _containerId: string })._containerId = 'container-abc';

      mockExecFileSuccess();
      await sandbox.destroy();

      expect(mockedExecFile).toHaveBeenCalledWith(
        'docker', ['rm', '-f', 'container-abc'],
        expect.any(Object), expect.any(Function),
      );
    });

    it('does nothing if no container', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      await sandbox.destroy(); // should not throw
      expect(mockedExecFile).not.toHaveBeenCalled();
    });
  });

  describe('isAvailable()', () => {
    it('returns true when docker is installed', async () => {
      mockExecFileSuccess('Docker version 27.0.0');
      const result = await DockerSandbox.isAvailable();
      expect(result).toBe(true);
    });

    it('returns false when docker is not installed', async () => {
      mockExecFileError('command not found');
      const result = await DockerSandbox.isAvailable();
      expect(result).toBe(false);
    });
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/sandbox/docker.test.ts
Expected: FAIL (cannot find module ./docker.js)

Step 3: Implement DockerSandbox

Create file: src/sandbox/docker.ts

import { execFile } from 'child_process';

export interface DockerSandboxConfig {
  sessionId: string;
  image: string;
  workspaceDir: string;
  network: 'none' | 'bridge' | 'host';
  memoryLimit: string;
  cpuLimit: string;
  timeoutSeconds: number;
}

export interface ExecOptions {
  cwd?: string;
  timeout?: number;
}

export interface ExecResult {
  stdout: string;
  stderr: string;
}

/**
 * Manages a single Docker container for sandboxed tool execution.
 * Uses the Docker CLI directly (no SDK dependency).
 */
export class DockerSandbox {
  private config: DockerSandboxConfig;
  private _containerId: string | null = null;
  private _hostWorkdir: string;

  constructor(config: DockerSandboxConfig) {
    this.config = config;
    // Use a temp directory on the host, named by session
    const sanitizedId = config.sessionId.replace(/[^a-zA-Z0-9_-]/g, '_');
    this._hostWorkdir = `/tmp/flynn-sandbox-${sanitizedId}`;
  }

  get containerId(): string | null {
    return this._containerId;
  }

  get containerName(): string {
    const sanitizedId = this.config.sessionId.replace(/[^a-zA-Z0-9_-]/g, '_');
    return `flynn-${sanitizedId}`;
  }

  /** Create and start the sandbox container. */
  async create(): Promise<void> {
    const args = [
      'create',
      '--name', this.containerName,
      '--memory', this.config.memoryLimit,
      '--cpus', this.config.cpuLimit,
      '--network', this.config.network,
      '-v', `${this._hostWorkdir}:${this.config.workspaceDir}`,
      this.config.image,
      'sleep', 'infinity',
    ];

    const createResult = await this.dockerCmd(args);
    this._containerId = createResult.stdout.trim();

    await this.dockerCmd(['start', this._containerId]);
  }

  /** Execute a command inside the container. */
  async exec(command: string, opts?: ExecOptions): Promise<ExecResult> {
    if (!this._containerId) {
      throw new Error('Sandbox container not created. Call create() first.');
    }

    const args = ['exec'];
    if (opts?.cwd) {
      args.push('-w', opts.cwd);
    }
    args.push(this._containerId, 'bash', '-c', command);

    const timeout = opts?.timeout ?? this.config.timeoutSeconds * 1000;
    return this.dockerCmd(args, timeout);
  }

  /** Force-remove the container. */
  async destroy(): Promise<void> {
    if (!this._containerId) return;

    try {
      await this.dockerCmd(['rm', '-f', this._containerId]);
    } catch {
      // Ignore errors during cleanup
    }
    this._containerId = null;
  }

  /** Check if Docker is available on this host. */
  static async isAvailable(): Promise<boolean> {
    try {
      await new Promise<string>((resolve, reject) => {
        execFile('docker', ['version', '--format', '{{.Server.Version}}'], {
          timeout: 5000,
        }, (error, stdout) => {
          if (error) reject(error);
          else resolve(stdout);
        });
      });
      return true;
    } catch {
      return false;
    }
  }

  /** Run a docker CLI command. */
  private dockerCmd(args: string[], timeout = 30_000): Promise<ExecResult> {
    return new Promise((resolve, reject) => {
      execFile('docker', args, { timeout, maxBuffer: 1024 * 1024 }, (error, stdout, stderr) => {
        if (error) {
          reject(error);
          return;
        }
        resolve({ stdout, stderr });
      });
    });
  }
}
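
Because `exec()` hands an argument array to `execFile`, the user command reaches `bash -c` inside the container as a single argument and is never interpreted by a shell on the host. The argument construction can also be isolated as pure functions, which makes it checkable without mocking `child_process`. In this sketch `sanitizeSessionId` and `buildExecArgs` are hypothetical helper names; the logic mirrors the class above.

```typescript
// Same character filter the class applies to session IDs.
function sanitizeSessionId(sessionId: string): string {
  return sessionId.replace(/[^a-zA-Z0-9_-]/g, '_');
}

// Builds the argv for `docker exec`, optionally with a working directory.
function buildExecArgs(containerId: string, command: string, cwd?: string): string[] {
  const args = ['exec'];
  if (cwd) {
    args.push('-w', cwd);
  }
  args.push(containerId, 'bash', '-c', command);
  return args;
}

const containerName = `flynn-${sanitizeSessionId('telegram:12345')}`;
// containerName is 'flynn-telegram_12345'

const execArgs = buildExecArgs('container-abc', 'echo "$HOME"; ls', '/workspace');
// execArgs is ['exec', '-w', '/workspace', 'container-abc', 'bash', '-c', 'echo "$HOME"; ls']
```

Note that distinct session IDs such as `a:b` and `a_b` sanitize to the same name, so callers may want to guarantee uniqueness upstream.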

Step 4: Run test to verify it passes

Run: pnpm vitest run src/sandbox/docker.test.ts
Expected: PASS (all tests)

Step 5: Commit

git add src/sandbox/docker.ts src/sandbox/docker.test.ts
git commit -m "feat: add DockerSandbox class for container lifecycle"

Task 3: SandboxManager

Files:

  • Create: src/sandbox/manager.ts
  • Create: src/sandbox/manager.test.ts

Step 1: Write the failing test

Create file: src/sandbox/manager.test.ts

import { describe, it, expect, vi, beforeEach } from 'vitest';
import { SandboxManager } from './manager.js';
import { DockerSandbox } from './docker.js';
import type { SandboxConfig } from '../config/schema.js';

// Mock DockerSandbox
vi.mock('./docker.js', () => ({
  DockerSandbox: vi.fn().mockImplementation(() => ({
    create: vi.fn().mockResolvedValue(undefined),
    destroy: vi.fn().mockResolvedValue(undefined),
    exec: vi.fn().mockResolvedValue({ stdout: '', stderr: '' }),
    containerId: 'mock-container',
  })),
}));

describe('SandboxManager', () => {
  const defaultConfig: SandboxConfig = {
    enabled: true,
    image: 'node:22-slim',
    workspace_dir: '/workspace',
    network: 'none',
    memory_limit: '512m',
    cpu_limit: '1.0',
    timeout_seconds: 300,
  };

  beforeEach(() => {
    vi.clearAllMocks();
  });

  describe('getOrCreate()', () => {
    it('creates a new sandbox for unknown session', async () => {
      const manager = new SandboxManager(defaultConfig);
      const sandbox = await manager.getOrCreate('session-1');

      expect(DockerSandbox).toHaveBeenCalledWith(expect.objectContaining({
        sessionId: 'session-1',
        image: 'node:22-slim',
      }));
      expect(sandbox.create).toHaveBeenCalled();
    });

    it('reuses existing sandbox for same session', async () => {
      const manager = new SandboxManager(defaultConfig);
      const first = await manager.getOrCreate('session-1');
      const second = await manager.getOrCreate('session-1');

      expect(first).toBe(second);
      expect(DockerSandbox).toHaveBeenCalledTimes(1);
    });

    it('creates separate sandboxes for different sessions', async () => {
      const manager = new SandboxManager(defaultConfig);
      await manager.getOrCreate('session-1');
      await manager.getOrCreate('session-2');

      expect(DockerSandbox).toHaveBeenCalledTimes(2);
    });
  });

  describe('destroy()', () => {
    it('destroys sandbox and removes from cache', async () => {
      const manager = new SandboxManager(defaultConfig);
      const sandbox = await manager.getOrCreate('session-1');

      await manager.destroy('session-1');
      expect(sandbox.destroy).toHaveBeenCalled();

      // Should create a new one now
      await manager.getOrCreate('session-1');
      expect(DockerSandbox).toHaveBeenCalledTimes(2);
    });

    it('does nothing for unknown session', async () => {
      const manager = new SandboxManager(defaultConfig);
      await manager.destroy('nonexistent'); // should not throw
    });
  });

  describe('destroyAll()', () => {
    it('destroys all sandboxes', async () => {
      const manager = new SandboxManager(defaultConfig);
      const s1 = await manager.getOrCreate('session-1');
      const s2 = await manager.getOrCreate('session-2');

      await manager.destroyAll();
      expect(s1.destroy).toHaveBeenCalled();
      expect(s2.destroy).toHaveBeenCalled();
    });
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/sandbox/manager.test.ts
Expected: FAIL (cannot find module ./manager.js)

Step 3: Implement SandboxManager

Create file: src/sandbox/manager.ts

import { DockerSandbox } from './docker.js';
import type { SandboxConfig } from '../config/schema.js';

/**
 * Manages per-session Docker sandboxes.
 * Creates containers lazily on first access, destroys on session cleanup.
 */
export class SandboxManager {
  private sandboxes = new Map<string, DockerSandbox>();
  private config: SandboxConfig;

  constructor(config: SandboxConfig) {
    this.config = config;
  }

  /** Get or create a sandbox for a session. */
  async getOrCreate(sessionId: string): Promise<DockerSandbox> {
    let sandbox = this.sandboxes.get(sessionId);
    if (sandbox) return sandbox;

    sandbox = new DockerSandbox({
      sessionId,
      image: this.config.image,
      workspaceDir: this.config.workspace_dir,
      network: this.config.network,
      memoryLimit: this.config.memory_limit,
      cpuLimit: this.config.cpu_limit,
      timeoutSeconds: this.config.timeout_seconds,
    });

    await sandbox.create();
    this.sandboxes.set(sessionId, sandbox);
    return sandbox;
  }

  /** Destroy a specific session's sandbox. */
  async destroy(sessionId: string): Promise<void> {
    const sandbox = this.sandboxes.get(sessionId);
    if (!sandbox) return;

    await sandbox.destroy();
    this.sandboxes.delete(sessionId);
  }

  /** Destroy all sandboxes (daemon shutdown). */
  async destroyAll(): Promise<void> {
    const entries = Array.from(this.sandboxes.entries());
    await Promise.allSettled(
      entries.map(async ([id, sandbox]) => {
        await sandbox.destroy();
        this.sandboxes.delete(id);
      }),
    );
  }
}
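
As written, two concurrent `getOrCreate()` calls for the same session can both miss the map, because the entry is stored only after `create()` resolves; the second `docker create` would then fail on the duplicate container name. One common remedy is to cache the in-flight promise instead of the finished object. The sketch below shows that pattern in isolation; `AsyncOnceCache` is a hypothetical name and not part of the plan.

```typescript
// Caches the promise returned by an async factory so that concurrent
// callers for the same key share a single creation attempt.
class AsyncOnceCache<V> {
  private pending = new Map<string, Promise<V>>();
  private factory: (key: string) => Promise<V>;

  constructor(factory: (key: string) => Promise<V>) {
    this.factory = factory;
  }

  getOrCreate(key: string): Promise<V> {
    const existing = this.pending.get(key);
    if (existing) return existing;

    const created: Promise<V> = this.factory(key).catch((error: unknown) => {
      // Drop the failed attempt so a later call can retry,
      // but only if this entry has not been replaced meanwhile.
      if (this.pending.get(key) === created) {
        this.pending.delete(key);
      }
      throw error;
    });
    this.pending.set(key, created);
    return created;
  }

  delete(key: string): boolean {
    return this.pending.delete(key);
  }
}

// Stand-in factory: counts how many times creation actually runs.
let creations = 0;
const cache = new AsyncOnceCache<string>(async (sessionId) => {
  creations += 1;
  return `container-for-${sessionId}`;
});
```

Calling `cache.getOrCreate('s1')` twice before the first call settles runs the factory once and gives both callers the same promise.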

Step 4: Run test to verify it passes

Run: pnpm vitest run src/sandbox/manager.test.ts
Expected: PASS

Step 5: Commit

git add src/sandbox/manager.ts src/sandbox/manager.test.ts
git commit -m "feat: add SandboxManager for per-session container lifecycle"

Task 4: Sandboxed Tool Wrappers

Files:

  • Create: src/sandbox/tools.ts
  • Create: src/sandbox/tools.test.ts

Step 1: Write the failing test

Create file: src/sandbox/tools.test.ts

import { describe, it, expect, vi, beforeEach } from 'vitest';
import { createSandboxedShellTool, createSandboxedProcessStartTool } from './tools.js';
import type { DockerSandbox } from './docker.js';

function mockSandbox(): DockerSandbox {
  return {
    exec: vi.fn().mockResolvedValue({ stdout: 'output', stderr: '' }),
    create: vi.fn(),
    destroy: vi.fn(),
    containerId: 'test-container',
    containerName: 'flynn-test',
    config: {},
  } as unknown as DockerSandbox;
}

describe('createSandboxedShellTool', () => {
  let sandbox: DockerSandbox;

  beforeEach(() => {
    sandbox = mockSandbox();
  });

  it('has the same name as shell.exec', () => {
    const tool = createSandboxedShellTool(sandbox);
    expect(tool.name).toBe('shell.exec');
  });

  it('delegates to sandbox.exec', async () => {
    const tool = createSandboxedShellTool(sandbox);
    const result = await tool.execute({ command: 'echo hello' });

    expect(sandbox.exec).toHaveBeenCalledWith('echo hello', { cwd: undefined, timeout: 30000 });
    expect(result.success).toBe(true);
    expect(result.output).toBe('output');
  });

  it('passes cwd to sandbox.exec', async () => {
    const tool = createSandboxedShellTool(sandbox);
    await tool.execute({ command: 'ls', cwd: '/workspace/project' });

    expect(sandbox.exec).toHaveBeenCalledWith('ls', { cwd: '/workspace/project', timeout: 30000 });
  });

  it('passes timeout to sandbox.exec', async () => {
    const tool = createSandboxedShellTool(sandbox);
    await tool.execute({ command: 'sleep 10', timeout: 5000 });

    expect(sandbox.exec).toHaveBeenCalledWith('sleep 10', { cwd: undefined, timeout: 5000 });
  });

  it('returns error on sandbox.exec failure', async () => {
    (sandbox.exec as ReturnType<typeof vi.fn>).mockRejectedValue(new Error('container dead'));
    const tool = createSandboxedShellTool(sandbox);
    const result = await tool.execute({ command: 'fail' });

    expect(result.success).toBe(false);
    expect(result.error).toBe('container dead');
  });

  it('includes stderr in output', async () => {
    (sandbox.exec as ReturnType<typeof vi.fn>).mockResolvedValue({ stdout: 'out', stderr: 'warn' });
    const tool = createSandboxedShellTool(sandbox);
    const result = await tool.execute({ command: 'cmd' });

    expect(result.output).toContain('out');
    expect(result.output).toContain('stderr: warn');
  });
});

describe('createSandboxedProcessStartTool', () => {
  let sandbox: DockerSandbox;

  beforeEach(() => {
    sandbox = mockSandbox();
  });

  it('has the same name as process.start', () => {
    const tool = createSandboxedProcessStartTool(sandbox);
    expect(tool.name).toBe('process.start');
  });

  it('runs detached command via sandbox', async () => {
    const tool = createSandboxedProcessStartTool(sandbox);
    const result = await tool.execute({ command: 'npm run dev' });

    expect(sandbox.exec).toHaveBeenCalledWith(
      expect.stringContaining('npm run dev'),
      expect.any(Object),
    );
    expect(result.success).toBe(true);
    expect(result.output).toContain('Started sandboxed background process');
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/sandbox/tools.test.ts
Expected: FAIL (cannot find module ./tools.js)

Step 3: Implement sandboxed tools

Create file: src/sandbox/tools.ts

import type { Tool, ToolResult } from '../tools/types.js';
import type { DockerSandbox } from './docker.js';

interface ShellExecArgs {
  command: string;
  cwd?: string;
  timeout?: number;
}

interface ProcessStartArgs {
  command: string;
  cwd?: string;
}

/**
 * Create a sandboxed version of shell.exec that delegates to docker exec.
 * Same Tool interface — drop-in replacement for the host shell.exec.
 */
export function createSandboxedShellTool(sandbox: DockerSandbox): Tool {
  return {
    name: 'shell.exec',
    description: 'Execute a shell command inside a sandboxed container and return stdout/stderr.',
    inputSchema: {
      type: 'object',
      properties: {
        command: { type: 'string', description: 'The shell command to execute' },
        cwd: { type: 'string', description: 'Working directory inside the container (optional)' },
        timeout: { type: 'number', description: 'Timeout in milliseconds (default 30000)' },
      },
      required: ['command'],
    },
    execute: async (rawArgs: unknown): Promise<ToolResult> => {
      const args = rawArgs as ShellExecArgs;
      const timeout = args.timeout ?? 30_000;

      try {
        const result = await sandbox.exec(args.command, {
          cwd: args.cwd,
          timeout,
        });

        const output = result.stdout + (result.stderr ? `\nstderr: ${result.stderr}` : '');
        return { success: true, output };
      } catch (error) {
        return {
          success: false,
          output: '',
          error: error instanceof Error ? error.message : String(error),
        };
      }
    },
  };
}

/**
 * Create a sandboxed version of process.start that runs in the container.
 * Uses `nohup ... &` via docker exec since we can't spawn detached inside containers.
 */
export function createSandboxedProcessStartTool(sandbox: DockerSandbox): Tool {
  return {
    name: 'process.start',
    description: 'Start a command in the background inside a sandboxed container.',
    inputSchema: {
      type: 'object',
      properties: {
        command: { type: 'string', description: 'The shell command to run in the background' },
        cwd: { type: 'string', description: 'Working directory inside the container (optional)' },
      },
      required: ['command'],
    },
    execute: async (rawArgs: unknown): Promise<ToolResult> => {
      const args = rawArgs as ProcessStartArgs;

      try {
        // Run via nohup + background in the container
        const wrappedCmd = `nohup bash -c '${args.command.replace(/'/g, "'\\''")}' > /tmp/proc.log 2>&1 & echo $!`;
        const result = await sandbox.exec(wrappedCmd, { cwd: args.cwd });

        const pid = result.stdout.trim();
        return {
          success: true,
          output: `Started sandboxed background process (PID ${pid})\nCommand: ${args.command}`,
        };
      } catch (error) {
        return {
          success: false,
          output: '',
          error: error instanceof Error ? error.message : 'Failed to start sandboxed process',
        };
      }
    },
  };
}
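
The single-quote escaping used for `wrappedCmd` is easy to get wrong, so it can help to see it as a standalone helper. In this sketch `shellSingleQuote` and `buildBackgroundCommand` are hypothetical names, and the per-process log path is an assumed variation: the version above redirects every background process to the same `/tmp/proc.log`, so each new process truncates the previous log.

```typescript
// Wraps a string in single quotes for POSIX shells. Each embedded single
// quote becomes: close quote, backslash-escaped quote, reopen quote.
function shellSingleQuote(value: string): string {
  return `'${value.replace(/'/g, "'\\''")}'`;
}

// Builds a command that starts `command` in the background and prints its PID.
// `logId` is an assumed caller-supplied label used to keep log files separate.
function buildBackgroundCommand(command: string, logId: string): string {
  const safeId = logId.replace(/[^a-zA-Z0-9_-]/g, '_');
  const logFile = `/tmp/proc-${safeId}.log`;
  return `nohup bash -c ${shellSingleQuote(command)} > ${logFile} 2>&1 & echo $!`;
}

const quoted = shellSingleQuote("echo 'hi'");
const wrapped = buildBackgroundCommand('npm run dev', 'dev-server');
// wrapped is "nohup bash -c 'npm run dev' > /tmp/proc-dev-server.log 2>&1 & echo $!"
```

Keeping the quoting in one helper also makes it straightforward to unit-test commands that contain quotes without starting a container.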

Step 4: Run test to verify it passes

Run: pnpm vitest run src/sandbox/tools.test.ts
Expected: PASS

Step 5: Commit

git add src/sandbox/tools.ts src/sandbox/tools.test.ts
git commit -m "feat: add sandboxed tool wrappers for shell.exec and process.start"

Task 5: Sandbox Barrel Export + ToolRegistry.clone()

Files:

  • Create: src/sandbox/index.ts
  • Modify: src/tools/registry.ts:19-97
  • Create: src/tools/registry.test.ts (or extend it if it already exists)

Step 1: Write the failing test for ToolRegistry.clone()

Check whether src/tools/registry.test.ts already exists. If it does, extend it with the tests below; otherwise create it:

import { describe, it, expect, vi } from 'vitest';
import { ToolRegistry } from './registry.js';
import type { Tool } from './types.js';

function makeTool(name: string): Tool {
  return {
    name,
    description: `Mock ${name}`,
    inputSchema: { type: 'object', properties: {} },
    execute: async () => ({ success: true, output: '' }),
  };
}

describe('ToolRegistry', () => {
  describe('clone()', () => {
    it('creates a copy with all tools', () => {
      const reg = new ToolRegistry();
      reg.register(makeTool('tool.a'));
      reg.register(makeTool('tool.b'));

      const cloned = reg.clone();
      expect(cloned.list().map(t => t.name).sort()).toEqual(['tool.a', 'tool.b']);
    });

    it('inherits the policy from original', () => {
      const reg = new ToolRegistry();
      const mockPolicy = { filterTools: vi.fn(), isAllowed: vi.fn(), resolveAllowedNames: vi.fn(), getEffectiveProfile: vi.fn() };
      reg.setPolicy(mockPolicy as any);

      const cloned = reg.clone();
      expect(cloned.getPolicy()).toBe(mockPolicy);
    });

    it('allows replacing tools in clone without affecting original', () => {
      const reg = new ToolRegistry();
      const originalTool = makeTool('shell.exec');
      reg.register(originalTool);

      const cloned = reg.clone();
      const replacementTool = makeTool('shell.exec');
      replacementTool.description = 'Sandboxed version';

      cloned.replace(replacementTool);
      expect(cloned.get('shell.exec')!.description).toBe('Sandboxed version');
      expect(reg.get('shell.exec')!.description).toBe('Mock shell.exec');
    });
  });

  describe('replace()', () => {
    it('replaces an existing tool', () => {
      const reg = new ToolRegistry();
      reg.register(makeTool('tool.a'));
      const replacement = makeTool('tool.a');
      replacement.description = 'New description';

      reg.replace(replacement);
      expect(reg.get('tool.a')!.description).toBe('New description');
    });

    it('throws if tool does not exist', () => {
      const reg = new ToolRegistry();
      expect(() => reg.replace(makeTool('nonexistent'))).toThrow('not registered');
    });
  });
});

Note: the policy test calls vi.fn(), so vi must be imported from 'vitest' along with describe, it, and expect.

Step 2: Run test to verify it fails

Run: pnpm vitest run src/tools/registry.test.ts
Expected: FAIL (clone() and replace() don't exist on ToolRegistry)

Step 3: Add clone() and replace() to ToolRegistry

In src/tools/registry.ts, add these two methods to the ToolRegistry class (after the unregister method, around line 32):

  /** Replace an existing tool with a new implementation. Throws if not registered. */
  replace(tool: Tool): void {
    if (!this.tools.has(tool.name)) {
      throw new Error(`Tool '${tool.name}' is not registered — cannot replace`);
    }
    this.tools.set(tool.name, tool);
  }

  /** Create a shallow clone of this registry (new Map, same Tool objects + policy). */
  clone(): ToolRegistry {
    const cloned = new ToolRegistry();
    for (const tool of this.tools.values()) {
      cloned.register(tool);
    }
    if (this._policy) {
      cloned.setPolicy(this._policy);
    }
    return cloned;
  }
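
The two methods are meant to be used together: clone the shared registry once per session, then swap in sandboxed tools on the clone only. The sketch below uses a stripped-down stand-in (`MiniRegistry`, `MiniTool`) rather than the real `ToolRegistry`, and the wiring function `buildSessionRegistry` is an assumption about how later tasks might combine them.

```typescript
interface MiniTool {
  name: string;
  description: string;
}

// Minimal stand-in with the same register/replace/clone semantics.
class MiniRegistry {
  private tools = new Map<string, MiniTool>();

  register(tool: MiniTool): void {
    this.tools.set(tool.name, tool);
  }

  get(name: string): MiniTool | undefined {
    return this.tools.get(name);
  }

  replace(tool: MiniTool): void {
    if (!this.tools.has(tool.name)) {
      throw new Error(`Tool '${tool.name}' is not registered`);
    }
    this.tools.set(tool.name, tool);
  }

  // New Map, same tool objects: a shallow copy.
  clone(): MiniRegistry {
    const cloned = new MiniRegistry();
    for (const tool of this.tools.values()) {
      cloned.register(tool);
    }
    return cloned;
  }
}

// Hypothetical wiring: overrides apply to the clone, never to the base.
function buildSessionRegistry(base: MiniRegistry, sandboxed: MiniTool[]): MiniRegistry {
  const session = base.clone();
  for (const tool of sandboxed) {
    session.replace(tool);
  }
  return session;
}

const baseRegistry = new MiniRegistry();
baseRegistry.register({ name: 'shell.exec', description: 'host shell' });
const sessionRegistry = buildSessionRegistry(baseRegistry, [
  { name: 'shell.exec', description: 'sandboxed shell' },
]);
```

Because `replace()` throws for unknown names, a sandboxed wrapper can never silently add a tool that the base registry did not already expose.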

Step 4: Create the sandbox barrel export

Create file: src/sandbox/index.ts

export { DockerSandbox, type DockerSandboxConfig, type ExecOptions, type ExecResult } from './docker.js';
export { SandboxManager } from './manager.js';
export { createSandboxedShellTool, createSandboxedProcessStartTool } from './tools.js';

Step 5: Run tests to verify they pass

Run: pnpm vitest run src/tools/registry.test.ts
Expected: PASS

Step 6: Run full test suite

Run: pnpm test:run
Expected: All tests pass

Step 7: Commit

git add src/sandbox/index.ts src/tools/registry.ts src/tools/registry.test.ts
git commit -m "feat: add ToolRegistry.clone() and replace() for per-session registries"

Task 6: Agent Config Registry

Files:

  • Create: src/agents/registry.ts
  • Create: src/agents/registry.test.ts

Step 1: Write the failing test

Create file: src/agents/registry.test.ts

import { describe, it, expect } from 'vitest';
import { AgentConfigRegistry, type AgentConfig } from './registry.js';

describe('AgentConfigRegistry', () => {
  describe('register()', () => {
    it('registers a named agent config', () => {
      const registry = new AgentConfigRegistry();
      const config: AgentConfig = { name: 'assistant', systemPrompt: 'Be helpful.' };
      registry.register(config);

      expect(registry.get('assistant')).toEqual(config);
    });

    it('throws on duplicate name', () => {
      const registry = new AgentConfigRegistry();
      registry.register({ name: 'assistant' });
      expect(() => registry.register({ name: 'assistant' })).toThrow('already registered');
    });
  });

  describe('get()', () => {
    it('returns undefined for unknown name', () => {
      const registry = new AgentConfigRegistry();
      expect(registry.get('nonexistent')).toBeUndefined();
    });
  });

  describe('list()', () => {
    it('returns all registered configs', () => {
      const registry = new AgentConfigRegistry();
      registry.register({ name: 'a' });
      registry.register({ name: 'b' });
      expect(registry.list().map(c => c.name).sort()).toEqual(['a', 'b']);
    });
  });

  describe('loadFromConfig()', () => {
    it('loads configs from a raw config object', () => {
      const registry = new AgentConfigRegistry();
      registry.loadFromConfig({
        assistant: {
          system_prompt: 'Be helpful.',
          model_tier: 'default',
          tool_profile: 'messaging',
          sandbox: false,
        },
        coder: {
          model_tier: 'complex',
          tool_profile: 'coding',
          sandbox: true,
        },
      });

      expect(registry.list()).toHaveLength(2);
      const assistant = registry.get('assistant')!;
      expect(assistant.systemPrompt).toBe('Be helpful.');
      expect(assistant.modelTier).toBe('default');
      expect(assistant.toolProfile).toBe('messaging');

      const coder = registry.get('coder')!;
      expect(coder.sandbox).toBe(true);
    });
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/agents/registry.test.ts Expected: FAIL — cannot find module ./registry.js

Step 3: Implement AgentConfigRegistry

Create file: src/agents/registry.ts

import type { ToolProfile, ToolOverrideConfig } from '../config/schema.js';
import type { ModelTier } from '../models/router.js';

export interface AgentConfig {
  name: string;
  systemPrompt?: string;
  modelTier?: ModelTier;
  toolProfile?: ToolProfile;
  toolOverrides?: ToolOverrideConfig;
  sandbox?: boolean;
}

/**
 * AgentConfigRegistry — stores named agent configurations.
 * Loaded from YAML config at startup.
 */
export class AgentConfigRegistry {
  private configs = new Map<string, AgentConfig>();

  register(config: AgentConfig): void {
    if (this.configs.has(config.name)) {
      throw new Error(`Agent config '${config.name}' is already registered`);
    }
    this.configs.set(config.name, config);
  }

  get(name: string): AgentConfig | undefined {
    return this.configs.get(name);
  }

  list(): AgentConfig[] {
    return Array.from(this.configs.values());
  }

  /**
   * Load agent configs from the parsed YAML config.
   * Maps from the config schema format to the internal AgentConfig format.
   */
  loadFromConfig(rawConfigs: Record<string, {
    system_prompt?: string;
    model_tier?: string;
    tool_profile?: string;
    tool_overrides?: ToolOverrideConfig;
    sandbox?: boolean;
  }>): void {
    for (const [name, raw] of Object.entries(rawConfigs)) {
      this.register({
        name,
        systemPrompt: raw.system_prompt,
        modelTier: raw.model_tier as ModelTier | undefined,
        toolProfile: raw.tool_profile as ToolProfile | undefined,
        toolOverrides: raw.tool_overrides,
        sandbox: raw.sandbox,
      });
    }
  }
}
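loadFromConfig() narrows model_tier and tool_profile with `as` casts, which is safe only when the input has already passed the Zod schema from Task 1. If the method is ever called with unvalidated data, a runtime guard could replace the cast. The sketch below is one way to do that; the tier names in KNOWN_TIERS are assumed from the values used elsewhere in this plan, and toAgentConfig is a hypothetical helper, not part of the planned API:

```typescript
// Assumed tier names for illustration; the real ModelTier union lives in src/models/router.ts.
const KNOWN_TIERS = ['fast', 'default', 'complex'] as const;
type ModelTier = (typeof KNOWN_TIERS)[number];

interface RawAgentConfig {
  system_prompt?: string;
  model_tier?: string;
  sandbox?: boolean;
}

interface AgentConfig {
  name: string;
  systemPrompt?: string;
  modelTier?: ModelTier;
  sandbox?: boolean;
}

// Type guard: true only for strings listed in KNOWN_TIERS.
function isModelTier(value: string): value is ModelTier {
  return (KNOWN_TIERS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: maps snake_case config to camelCase and rejects unknown tiers.
function toAgentConfig(name: string, raw: RawAgentConfig): AgentConfig {
  const tier = raw.model_tier;
  if (tier !== undefined && !isModelTier(tier)) {
    throw new Error(`Agent config '${name}' has unknown model_tier '${tier}'`);
  }
  return {
    name,
    systemPrompt: raw.system_prompt,
    modelTier: tier,
    sandbox: raw.sandbox,
  };
}

const coder = toAgentConfig('coder', { model_tier: 'complex', sandbox: true });
```

The same pattern would apply to tool_profile if its allowed values are known at runtime.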

Step 4: Run test to verify it passes

Run: pnpm vitest run src/agents/registry.test.ts Expected: PASS

Step 5: Commit

git add src/agents/registry.ts src/agents/registry.test.ts
git commit -m "feat: add AgentConfigRegistry for named agent configurations"

Task 7: Agent Router

Files:

  • Create: src/agents/router.ts
  • Create: src/agents/router.test.ts

Step 1: Write the failing test

Create file: src/agents/router.test.ts

import { describe, it, expect } from 'vitest';
import { AgentRouter, type RoutingConfig } from './router.js';

describe('AgentRouter', () => {
  describe('resolve()', () => {
    it('returns default_agent when no specific match', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: {},
        senders: {},
      });
      expect(router.resolve('telegram', '12345')).toBe('assistant');
    });

    it('returns undefined when no default and no match', () => {
      const router = new AgentRouter({
        channels: {},
        senders: {},
      });
      expect(router.resolve('telegram', '12345')).toBeUndefined();
    });

    it('matches exact sender', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: {},
        senders: { 'telegram:12345': 'coder' },
      });
      expect(router.resolve('telegram', '12345')).toBe('coder');
    });

    it('matches sender with glob pattern', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: {},
        senders: { 'slack:U0*': 'coder' },
      });
      expect(router.resolve('slack', 'U0ABC')).toBe('coder');
      expect(router.resolve('slack', 'U1ABC')).toBe('assistant'); // no sender or channel match, so default_agent applies
    });

    it('matches channel when no sender match', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: { discord: 'coder' },
        senders: {},
      });
      expect(router.resolve('discord', 'any-user')).toBe('coder');
    });

    it('sender match takes priority over channel match', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: { discord: 'coder' },
        senders: { 'discord:special-user': 'vip' },
      });
      expect(router.resolve('discord', 'special-user')).toBe('vip');
      expect(router.resolve('discord', 'normal-user')).toBe('coder');
    });

    it('falls through: sender → channel → default', () => {
      const router = new AgentRouter({
        default_agent: 'fallback',
        channels: { discord: 'guild-agent' },
        senders: { 'discord:admin': 'admin-agent' },
      });
      expect(router.resolve('discord', 'admin')).toBe('admin-agent');
      expect(router.resolve('discord', 'regular')).toBe('guild-agent');
      expect(router.resolve('telegram', 'someone')).toBe('fallback');
    });
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/agents/router.test.ts Expected: FAIL — cannot find module ./router.js

Step 3: Implement AgentRouter

Create file: src/agents/router.ts

/**
 * AgentRouter resolves which agent config to use for a given channel+sender.
 *
 * Resolution order:
 * 1. Exact sender match (channel:senderId)
 * 2. Glob pattern sender match
 * 3. Channel match
 * 4. default_agent fallback
 */

export interface RoutingConfig {
  default_agent?: string;
  channels: Record<string, string>;
  senders: Record<string, string>;
}

/**
 * Convert a simple glob pattern to regex.
 * Supports `*` (any run of characters); every other regex metacharacter,
 * including `.` and `?`, is escaped and matched literally.
 */
function patternToRegex(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`);
}

export class AgentRouter {
  private config: RoutingConfig;

  constructor(config: RoutingConfig) {
    this.config = config;
  }

  /**
   * Resolve the agent config name for a channel + sender pair.
   * Returns undefined if no match and no default.
   */
  resolve(channel: string, senderId: string): string | undefined {
    const senderKey = `${channel}:${senderId}`;

    // 1. Exact sender match
    if (this.config.senders[senderKey]) {
      return this.config.senders[senderKey];
    }

    // 2. Glob pattern sender match
    for (const [pattern, agentName] of Object.entries(this.config.senders)) {
      if (pattern.includes('*') && patternToRegex(pattern).test(senderKey)) {
        return agentName;
      }
    }

    // 3. Channel match
    if (this.config.channels[channel]) {
      return this.config.channels[channel];
    }

    // 4. Default fallback
    return this.config.default_agent;
  }
}
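To make the resolution order concrete, here is a condensed, function-style restatement of the same four steps, followed by a routing table that exercises each one. It is a sketch for tracing behaviour, not a replacement for the class; the agent names and sender IDs are made up. One detail worth noting: when several glob patterns match, the first one in the config's key order wins.

```typescript
interface RoutingConfig {
  default_agent?: string;
  channels: Record<string, string>;
  senders: Record<string, string>;
}

// `*` becomes `.*`; all other regex metacharacters are matched literally.
function globToRegex(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`);
}

function resolveAgent(config: RoutingConfig, channel: string, senderId: string): string | undefined {
  const senderKey = `${channel}:${senderId}`;

  // 1. Exact sender match
  const exact = config.senders[senderKey];
  if (exact) return exact;

  // 2. Glob sender match, first matching pattern in key order
  for (const [pattern, agentName] of Object.entries(config.senders)) {
    if (pattern.includes('*') && globToRegex(pattern).test(senderKey)) {
      return agentName;
    }
  }

  // 3. Channel match, then 4. default fallback
  return config.channels[channel] || config.default_agent;
}

// Hypothetical routing table for illustration.
const routing: RoutingConfig = {
  default_agent: 'assistant',
  channels: { discord: 'coder' },
  senders: {
    'discord:admin': 'ops',
    'slack:U0*': 'coder',
    'slack:*': 'assistant-lite',
  },
};

const byExactSender = resolveAgent(routing, 'discord', 'admin'); // 'ops'
const byFirstGlob = resolveAgent(routing, 'slack', 'U0ABC');     // 'coder'
const byLaterGlob = resolveAgent(routing, 'slack', 'U1ABC');     // 'assistant-lite'
const byChannel = resolveAgent(routing, 'discord', 'guest');     // 'coder'
const byDefault = resolveAgent(routing, 'telegram', '42');       // 'assistant'
```

Since overlapping globs are resolved by key order, more specific patterns should be listed before broader ones in the senders map.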

Step 4: Run test to verify it passes

Run: pnpm vitest run src/agents/router.test.ts Expected: PASS

Step 5: Commit

git add src/agents/router.ts src/agents/router.test.ts
git commit -m "feat: add AgentRouter for config-based sender/channel routing"

Task 8: Agents Barrel Export

Files:

  • Create: src/agents/index.ts

Step 1: Create the barrel file

Create file: src/agents/index.ts

export { AgentConfigRegistry, type AgentConfig } from './registry.js';
export { AgentRouter, type RoutingConfig } from './router.js';

Step 2: Verify build

Run: pnpm typecheck Expected: No errors

Step 3: Commit

git add src/agents/index.ts
git commit -m "feat: add agents barrel export"

Task 9: Wire Everything Into the Daemon

Files:

  • Modify: src/daemon/index.ts

This is the integration task. The daemon's createMessageRouter() needs to use the AgentRouter and SandboxManager.

Step 1: Write the integration test

Create file: src/daemon/routing.test.ts

import { describe, it, expect } from 'vitest';
import { AgentRouter } from '../agents/router.js';
import { AgentConfigRegistry } from '../agents/registry.js';

describe('daemon agent routing integration', () => {
  it('resolves agent config for channel messages', () => {
    const registry = new AgentConfigRegistry();
    registry.loadFromConfig({
      assistant: { system_prompt: 'Be helpful.', model_tier: 'default', tool_profile: 'messaging', sandbox: false },
      coder: { system_prompt: 'Write code.', model_tier: 'complex', tool_profile: 'coding', sandbox: true },
    });

    const router = new AgentRouter({
      default_agent: 'assistant',
      channels: { discord: 'coder' },
      senders: { 'telegram:admin': 'coder' },
    });

    // Discord user gets coder
    const discordAgent = router.resolve('discord', 'user123');
    expect(discordAgent).toBe('coder');
    expect(registry.get(discordAgent!)!.systemPrompt).toBe('Write code.');

    // Telegram admin gets coder
    const telegramAdmin = router.resolve('telegram', 'admin');
    expect(telegramAdmin).toBe('coder');

    // Random telegram user gets assistant
    const telegramUser = router.resolve('telegram', 'random');
    expect(telegramUser).toBe('assistant');
    expect(registry.get(telegramUser!)!.systemPrompt).toBe('Be helpful.');
  });

  it('returns undefined when no routing is configured', () => {
    const router = new AgentRouter({ channels: {}, senders: {} });
    expect(router.resolve('telegram', '123')).toBeUndefined();
  });
});

Step 2: Run test to verify it passes

Run: pnpm vitest run src/daemon/routing.test.ts Expected: PASS (this test exercises the components built in Tasks 6 and 7 together, so it should pass immediately)

Step 3: Modify daemon/index.ts

Add imports at the top of src/daemon/index.ts (after existing imports):

import { AgentConfigRegistry, AgentRouter } from '../agents/index.js';
import { SandboxManager, createSandboxedShellTool, createSandboxedProcessStartTool } from '../sandbox/index.js';

Add to DaemonContext interface:

  agentConfigRegistry: AgentConfigRegistry;
  agentRouter: AgentRouter;
  sandboxManager?: SandboxManager;

Modify createMessageRouter() to accept additional dependencies:

function createMessageRouter(deps: {
  sessionManager: SessionManager;
  modelRouter: ModelRouter;
  systemPrompt: string;
  toolRegistry: ToolRegistry;
  toolExecutor: ToolExecutor;
  config: Config;
  memoryStore?: MemoryStore;
  agentConfigRegistry?: AgentConfigRegistry;
  agentRouter?: AgentRouter;
  sandboxManager?: SandboxManager;
}) {

Inside getOrCreateAgent(), resolve the agent config and create sandboxed registries:

  function getOrCreateAgent(channel: string, senderId: string): AgentOrchestrator {
    // Resolve agent config name from routing
    const agentConfigName = deps.agentRouter?.resolve(channel, senderId);
    const agentConfig = agentConfigName ? deps.agentConfigRegistry?.get(agentConfigName) : undefined;

    const cacheKey = agentConfigName
      ? `${channel}:${senderId}:${agentConfigName}`
      : `${channel}:${senderId}`;

    let agent = agents.get(cacheKey);
    if (!agent) {
      const session = deps.sessionManager.getSession(channel, senderId);

      // Determine system prompt — agent config overrides global
      const systemPrompt = agentConfig?.systemPrompt ?? deps.systemPrompt;

      // Determine primary tier
      const primaryTier = agentConfig?.modelTier ?? deps.config.agents.primary_tier ?? 'default';

      // Determine tool policy context
      const toolPolicyContext: ToolPolicyContext = {
        agent: primaryTier,
        provider: deps.config.models.default.provider,
      };

      // Determine tool registry — sandbox if configured
      let toolRegistry = deps.toolRegistry;
      if (agentConfig?.sandbox && deps.sandboxManager && deps.config.sandbox.enabled) {
        // Clone the shared registry so the sandboxed tools apply to this session only
        toolRegistry = deps.toolRegistry.clone();
        // The container is created lazily, on the first tool call, via getOrCreate()
        const sessionId = `${channel}:${senderId}`;
        const sandbox = deps.sandboxManager;

        // Replace shell.exec and process.start with lazy-sandboxed versions
        const lazySandboxedShell: Tool = {
          name: 'shell.exec',
          description: 'Execute a shell command inside a sandboxed container.',
          inputSchema: {
            type: 'object',
            properties: {
              command: { type: 'string', description: 'The shell command to execute' },
              cwd: { type: 'string', description: 'Working directory (optional)' },
              timeout: { type: 'number', description: 'Timeout in milliseconds (default 30000)' },
            },
            required: ['command'],
          },
          execute: async (rawArgs: unknown) => {
            const dockerSandbox = await sandbox.getOrCreate(sessionId);
            const tool = createSandboxedShellTool(dockerSandbox);
            return tool.execute(rawArgs);
          },
        };

        const lazySandboxedProcessStart: Tool = {
          name: 'process.start',
          description: 'Start a command in the background inside a sandboxed container.',
          inputSchema: {
            type: 'object',
            properties: {
              command: { type: 'string', description: 'The shell command to run' },
              cwd: { type: 'string', description: 'Working directory (optional)' },
            },
            required: ['command'],
          },
          execute: async (rawArgs: unknown) => {
            const dockerSandbox = await sandbox.getOrCreate(sessionId);
            const tool = createSandboxedProcessStartTool(dockerSandbox);
            return tool.execute(rawArgs);
          },
        };

        toolRegistry.replace(lazySandboxedShell);
        toolRegistry.replace(lazySandboxedProcessStart);
      }

      const delegationConfig: DelegationConfig = {
        compaction: deps.config.agents.delegation.compaction ?? 'fast',
        memory_extraction: deps.config.agents.delegation.memory_extraction ?? 'fast',
        classification: deps.config.agents.delegation.classification ?? 'fast',
        tool_summarisation: deps.config.agents.delegation.tool_summarisation ?? 'fast',
        complex_reasoning: deps.config.agents.delegation.complex_reasoning ?? 'complex',
      };

      agent = new AgentOrchestrator({
        modelRouter: deps.modelRouter,
        systemPrompt,
        session,
        toolRegistry,
        toolExecutor: deps.toolExecutor,
        primaryTier,
        delegation: delegationConfig,
        maxDelegationDepth: deps.config.agents.max_delegation_depth ?? 3,
        compaction: deps.config.compaction.enabled ? {
          thresholdPct: deps.config.compaction.threshold_pct,
          keepTurns: deps.config.compaction.keep_turns,
          summaryMaxTokens: deps.config.compaction.summary_max_tokens,
        } : undefined,
        modelName: deps.config.models.default.model,
        contextWindow: deps.config.models.default.context_window,
        memoryStore: deps.memoryStore,
        toolPolicyContext,
      });
      agents.set(cacheKey, agent);
    }
    return agent;
  }
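The lazy wrappers in getOrCreateAgent() defer container creation until a tool is actually invoked, so sessions that never run a shell command never start a container. The sketch below isolates that pattern with a fake manager; FakeSandboxManager, Sandbox, and lazySandboxedShell are hypothetical stand-ins, and the real SandboxManager.getOrCreate() is assumed to cache one sandbox per session ID in a similar way:

```typescript
// Hypothetical stand-ins; the real classes live in src/sandbox.
interface Sandbox {
  exec(command: string): Promise<string>;
}

class FakeSandboxManager {
  created = 0;
  private sandboxes = new Map<string, Sandbox>();

  async getOrCreate(sessionId: string): Promise<Sandbox> {
    let sandbox = this.sandboxes.get(sessionId);
    if (!sandbox) {
      this.created += 1; // stands in for starting a container
      sandbox = { exec: async (command: string) => `[${sessionId}] ${command}` };
      this.sandboxes.set(sessionId, sandbox);
    }
    return sandbox;
  }
}

interface Tool {
  name: string;
  execute: (rawArgs: unknown) => Promise<string>;
}

// Building the tool does no work; the sandbox is requested inside execute().
function lazySandboxedShell(manager: FakeSandboxManager, sessionId: string): Tool {
  return {
    name: 'shell.exec',
    execute: async (rawArgs: unknown) => {
      const { command } = rawArgs as { command: string };
      const sandbox = await manager.getOrCreate(sessionId);
      return sandbox.exec(command);
    },
  };
}

const manager = new FakeSandboxManager();
const tool = lazySandboxedShell(manager, 'telegram:12345');
const createdBeforeFirstCall = manager.created; // nothing started yet

// Two calls in the same session share one sandbox.
const done = tool
  .execute({ command: 'ls' })
  .then(() => tool.execute({ command: 'pwd' }))
  .then((output) => ({ output, created: manager.created }));
```

A side effect of this design is that a Docker startup failure surfaces as a tool error on the first command rather than at agent creation time.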

In startDaemon(), add agent config registry and router initialization after skills loading (around line 385):

  // Initialize agent config registry and router
  const agentConfigRegistry = new AgentConfigRegistry();
  if (config.agent_configs && Object.keys(config.agent_configs).length > 0) {
    agentConfigRegistry.loadFromConfig(config.agent_configs);
    console.log(`Loaded ${Object.keys(config.agent_configs).length} agent config(s): ${Object.keys(config.agent_configs).join(', ')}`);
  }

  const agentRouter = new AgentRouter(config.routing);

  // Initialize sandbox manager if enabled
  let sandboxManager: SandboxManager | undefined;
  if (config.sandbox.enabled) {
    const dockerAvailable = await DockerSandbox.isAvailable();
    if (dockerAvailable) {
      sandboxManager = new SandboxManager(config.sandbox);
      console.log(`Docker sandbox enabled: image=${config.sandbox.image}, network=${config.sandbox.network}`);
    } else {
      console.warn('Docker sandbox enabled in config but Docker is not available — falling back to host execution');
    }
  }
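In getOrCreateAgent(), a routing target that is missing from the registry yields an undefined agentConfig, so the session silently falls back to the global prompt and host execution. A startup check along the following lines could surface such typos early. This is an optional sketch, not part of the plan; findUnknownRoutingTargets is a hypothetical helper and the sample names are invented:

```typescript
interface RoutingConfig {
  default_agent?: string;
  channels: Record<string, string>;
  senders: Record<string, string>;
}

// Hypothetical helper: returns routing targets that have no matching agent config.
function findUnknownRoutingTargets(routing: RoutingConfig, knownAgents: string[]): string[] {
  const known = new Set(knownAgents);
  const targets = [
    ...Object.values(routing.channels),
    ...Object.values(routing.senders),
  ];
  if (routing.default_agent) {
    targets.push(routing.default_agent);
  }
  // De-duplicate and sort so the report is stable.
  const unknown = new Set(targets.filter((target) => !known.has(target)));
  return Array.from(unknown).sort();
}

// 'codr' is a deliberate typo for 'coder'.
const unknownTargets = findUnknownRoutingTargets(
  {
    default_agent: 'assistant',
    channels: { discord: 'coder' },
    senders: { 'telegram:admin': 'codr' },
  },
  ['assistant', 'coder'],
);
```

In startDaemon() the second argument could come from agentConfigRegistry.list().map(c => c.name), with a console.warn for any names returned.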

Add sandbox shutdown hook:

  if (sandboxManager) {
    lifecycle.onShutdown(async () => {
      await sandboxManager!.destroyAll();
      console.log('Docker sandboxes destroyed');
    });
  }

Pass new deps to createMessageRouter():

  channelRegistry.setMessageHandler(createMessageRouter({
    sessionManager,
    modelRouter,
    systemPrompt,
    toolRegistry,
    toolExecutor,
    config,
    memoryStore,
    agentConfigRegistry,
    agentRouter,
    sandboxManager,
  }));

Add to DaemonContext return:

  return {
    config,
    lifecycle,
    sessionStore,
    sessionManager,
    hookEngine,
    modelRouter,
    toolRegistry,
    toolExecutor,
    gateway,
    channelRegistry,
    mcpManager,
    skillRegistry,
    skillInstaller,
    agentConfigRegistry,
    agentRouter,
    sandboxManager,
  };

Note: Also import DockerSandbox, the Tool type, and ToolPolicyContext at the top of the file (DockerSandbox can instead be added to the existing '../sandbox/index.js' import):

import { DockerSandbox } from '../sandbox/index.js';
import type { Tool } from '../tools/types.js';
import type { ToolPolicyContext } from '../tools/policy.js';

Step 4: Run full test suite

Run: pnpm test:run Expected: All tests pass

Step 5: Run typecheck

Run: pnpm typecheck Expected: No errors

Step 6: Commit

git add src/daemon/index.ts src/daemon/routing.test.ts
git commit -m "feat: wire Docker sandboxing and agent routing into daemon"

Task 10: Update state.json + Final Verification

Files:

  • Modify: docs/plans/state.json

Step 1: Run full test suite and typecheck

Run: pnpm test:run && pnpm typecheck Expected: All tests pass, no type errors

Step 2: Update state.json

Add the new P2 entries to docs/plans/state.json under the p2-implementation plan's phases object:

"docker_sandboxing": {
  "priority": "P2",
  "status": "completed",
  "description": "Docker container sandboxing for channel tool execution (shell.exec, process.start)",
  "files_created": [
    "src/sandbox/docker.ts",
    "src/sandbox/docker.test.ts",
    "src/sandbox/manager.ts",
    "src/sandbox/manager.test.ts",
    "src/sandbox/tools.ts",
    "src/sandbox/tools.test.ts",
    "src/sandbox/index.ts"
  ],
  "files_modified": [
    "src/config/schema.ts",
    "src/config/index.ts",
    "src/tools/registry.ts",
    "src/daemon/index.ts"
  ],
  "test_status": "N/N passing"
},
"multi_agent_routing": {
  "priority": "P2",
  "status": "completed",
  "description": "Named agent configs with config-based channel/sender routing",
  "files_created": [
    "src/agents/registry.ts",
    "src/agents/registry.test.ts",
    "src/agents/router.ts",
    "src/agents/router.test.ts",
    "src/agents/index.ts",
    "src/daemon/routing.test.ts",
    "src/config/schema.test.ts"
  ],
  "files_modified": [
    "src/config/schema.ts",
    "src/config/index.ts",
    "src/daemon/index.ts"
  ],
  "test_status": "N/N passing"
}

Update overall_progress.p2_completion to "7/7 (100%)" and next_up to "p3 (group chat, gateway auth, gemini provider, browser control, additional providers)".

Update overall_progress.total_test_count with the actual count.

Step 3: Commit

git add docs/plans/state.json
git commit -m "docs: update state.json with Docker sandbox and multi-agent routing"

Summary

| Task | Component | Est. Time |
|------|-----------|-----------|
| 1 | Config schemas (sandbox + agent_configs + routing) | 5 min |
| 2 | DockerSandbox class | 5 min |
| 3 | SandboxManager | 3 min |
| 4 | Sandboxed tool wrappers | 5 min |
| 5 | Barrel export + ToolRegistry.clone() | 3 min |
| 6 | AgentConfigRegistry | 3 min |
| 7 | AgentRouter | 3 min |
| 8 | Agents barrel export | 1 min |
| 9 | Daemon integration | 10 min |
| 10 | State update + verification | 3 min |

Total estimated: ~40 minutes