P2: Docker Sandboxing + Multi-Agent Routing — Design

Date: 2026-02-06
Status: Approved
Priority: P2 (completes all P2 work)


Feature 1: Docker Sandboxing

Goal

Channel sessions (Telegram, Discord, Slack, WhatsApp) execute shell.exec and process.start inside Docker containers. TUI and local WebSocket sessions continue running on the host.

Architecture

Tool-level wrapping: sandboxed versions of dangerous tools (shell.exec, process.start) delegate to docker exec inside a per-session container. All other tools (file.read, web.fetch, memory.*, etc.) run on the host unchanged.

src/sandbox/
  docker.ts         — DockerSandbox class (create/exec/destroy containers via CLI)
  docker.test.ts    — Tests (mocked Docker CLI)
  manager.ts        — SandboxManager (session→container mapping + lifecycle)
  manager.test.ts   — Tests
  tools.ts          — createSandboxedShellTool(), createSandboxedProcessStartTool()
  tools.test.ts     — Tests
  index.ts          — Barrel export

Config Schema

sandbox:
  enabled: false              # opt-in
  image: "node:22-slim"       # base container image
  workspace_dir: "/workspace" # mount path inside container
  network: "none"             # container network mode (none/bridge/host)
  memory_limit: "512m"        # memory limit per container
  cpu_limit: "1.0"            # CPU limit per container
  timeout_seconds: 300        # auto-kill timeout per container
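
As a rough sketch of how these defaults might be applied to a partial user config: the `SandboxConfig` type, the camelCased field names, and the `resolveSandboxConfig` helper are all hypothetical, and the actual schema in src/config/schema.ts may be shaped differently.

```typescript
// Hypothetical TypeScript mirror of the YAML keys above (camelCased).
interface SandboxConfig {
  enabled: boolean;
  image: string;
  workspaceDir: string;
  network: "none" | "bridge" | "host";
  memoryLimit: string;
  cpuLimit: string;
  timeoutSeconds: number;
}

// Defaults copied from the config schema above.
const SANDBOX_DEFAULTS: SandboxConfig = {
  enabled: false,
  image: "node:22-slim",
  workspaceDir: "/workspace",
  network: "none",
  memoryLimit: "512m",
  cpuLimit: "1.0",
  timeoutSeconds: 300,
};

// Merge user-supplied values over the defaults and reject obviously bad input.
function resolveSandboxConfig(partial: Partial<SandboxConfig> = {}): SandboxConfig {
  const merged: SandboxConfig = { ...SANDBOX_DEFAULTS, ...partial };
  if (!["none", "bridge", "host"].includes(merged.network)) {
    throw new Error(`unsupported network mode: ${merged.network}`);
  }
  if (!(merged.timeoutSeconds > 0)) {
    throw new Error("timeout_seconds must be a positive number");
  }
  return merged;
}
```

Because `enabled` defaults to `false`, a deployment that omits the `sandbox` block entirely keeps the current host-execution behavior.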

DockerSandbox Class

Wraps Docker CLI via child_process.execFile (no Docker SDK dependency):

  • create() → docker create with resource limits, bind mount, network mode
  • start() → docker start
  • exec(command, opts) → docker exec with timeout, returns stdout/stderr
  • destroy() → docker rm -f
  • isRunning() → docker inspect check
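
The mapping from methods to Docker CLI calls could look roughly like the sketch below. It is illustrative only: `ContainerOptions`, `DockerRunner`, `buildCreateArgs`, and `buildExecArgs` are hypothetical names, and the runner is injected so the class can be exercised without Docker. In production the runner would presumably be a thin wrapper over `child_process.execFile("docker", args, { timeout })`. Keeping the container alive with `sleep infinity` is also an assumption.

```typescript
// Hypothetical option names; the values map onto the sandbox config keys.
interface ContainerOptions {
  name: string;        // container name, e.g. derived from the session id
  image: string;
  hostDir: string;     // host directory bind-mounted into the container
  workspaceDir: string;
  network: string;
  memoryLimit: string;
  cpuLimit: string;
}

// Arguments for `docker create`: resource limits, bind mount, network mode.
// Assumption: `sleep infinity` keeps the container up for later `docker exec` calls.
function buildCreateArgs(o: ContainerOptions): string[] {
  return [
    "create", "--name", o.name,
    "--network", o.network,
    "--memory", o.memoryLimit,
    "--cpus", o.cpuLimit,
    "-v", `${o.hostDir}:${o.workspaceDir}`,
    "-w", o.workspaceDir,
    o.image, "sleep", "infinity",
  ];
}

// Arguments for `docker exec`, optionally overriding the working directory.
function buildExecArgs(name: string, command: string, cwd?: string): string[] {
  const args = ["exec"];
  if (cwd) args.push("-w", cwd);
  args.push(name, "sh", "-c", command);
  return args;
}

// Runs `docker <args>` and resolves with its output (injected for testability).
type DockerRunner = (
  args: string[],
  timeoutMs?: number,
) => Promise<{ stdout: string; stderr: string }>;

class DockerSandbox {
  private readonly opts: ContainerOptions;
  private readonly run: DockerRunner;

  constructor(opts: ContainerOptions, run: DockerRunner) {
    this.opts = opts;
    this.run = run;
  }

  create() { return this.run(buildCreateArgs(this.opts)); }
  start() { return this.run(["start", this.opts.name]); }
  exec(command: string, o: { cwd?: string; timeoutMs?: number } = {}) {
    return this.run(buildExecArgs(this.opts.name, command, o.cwd), o.timeoutMs);
  }
  destroy() { return this.run(["rm", "-f", this.opts.name]); }

  // A missing container makes `docker inspect` fail, which counts as "not running".
  async isRunning(): Promise<boolean> {
    try {
      const { stdout } = await this.run(["inspect", "-f", "{{.State.Running}}", this.opts.name]);
      return stdout.trim() === "true";
    } catch {
      return false;
    }
  }
}
```

Passing the command as a single `sh -c` argument through an argument array (rather than a shell string on the host) means the host never interprets the command; only the shell inside the container does.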

SandboxManager

  • getOrCreate(sessionId, config) — Lazy container creation on first tool call
  • destroy(sessionId) — Stop and remove container
  • destroyAll() — Shutdown hook for daemon cleanup
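
One way the lazy session→container mapping might be implemented is sketched here. The `Sandbox` interface and the injected `factory` are hypothetical. Storing the in-flight promise rather than the finished sandbox is a design assumption, made so that two concurrent tool calls from one session do not create two containers.

```typescript
// Minimal surface the manager needs from a sandbox (hypothetical interface).
interface Sandbox {
  create(): Promise<unknown>;
  start(): Promise<unknown>;
  destroy(): Promise<unknown>;
}

class SandboxManager<C> {
  // Keyed by session id; holds the boot promise so concurrent callers share it.
  private readonly sessions = new Map<string, Promise<Sandbox>>();
  private readonly factory: (sessionId: string, config: C) => Sandbox;

  constructor(factory: (sessionId: string, config: C) => Sandbox) {
    this.factory = factory;
  }

  // Lazy creation: the container is only built on the first tool call.
  getOrCreate(sessionId: string, config: C): Promise<Sandbox> {
    const existing = this.sessions.get(sessionId);
    if (existing) return existing;

    const created = this.boot(sessionId, config);
    this.sessions.set(sessionId, created);
    // Forget failed boots so a later call can retry.
    created.catch(() => {
      if (this.sessions.get(sessionId) === created) this.sessions.delete(sessionId);
    });
    return created;
  }

  private async boot(sessionId: string, config: C): Promise<Sandbox> {
    const sandbox = this.factory(sessionId, config);
    await sandbox.create();
    await sandbox.start();
    return sandbox;
  }

  // Stop and remove one session's container, if it exists.
  async destroy(sessionId: string): Promise<void> {
    const entry = this.sessions.get(sessionId);
    if (!entry) return;
    this.sessions.delete(sessionId);
    const sandbox = await entry.catch(() => undefined);
    if (sandbox) await sandbox.destroy();
  }

  // Shutdown hook: tear down every known container.
  async destroyAll(): Promise<void> {
    const ids = Array.from(this.sessions.keys());
    await Promise.all(ids.map((id) => this.destroy(id)));
  }
}
```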

Sandboxed Tools

  • createSandboxedShellTool(sandbox) — Same Tool interface as shell.exec, but runs via sandbox.exec(command). Preserves cwd (translated to container path), timeout, output truncation.
  • createSandboxedProcessStartTool(sandbox) — Wraps process.start to spawn via docker exec -d (detached mode).
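
The cwd translation and output truncation could be sketched as follows. This is an assumption-heavy illustration: `toContainerPath`, `truncate`, the extra `paths` and `maxOutput` parameters, and the choice to fall back to the workspace root for paths outside the mount are not taken from the design. Paths are assumed to be absolute POSIX paths.

```typescript
// Map a host path under the bind-mounted root to its path inside the container.
// Assumption: paths outside the mount fall back to the container workspace root.
function toContainerPath(hostPath: string, hostRoot: string, containerRoot: string): string {
  const root = hostRoot.replace(/\/+$/, "");
  if (hostPath === root) return containerRoot;
  if (hostPath.startsWith(root + "/")) return containerRoot + hostPath.slice(root.length);
  return containerRoot;
}

// Cap output length and note how much was dropped.
function truncate(text: string, max: number): string {
  if (text.length <= max) return text;
  return text.slice(0, max) + `\n[truncated ${text.length - max} chars]`;
}

interface ExecResult { stdout: string; stderr: string }
interface SandboxLike {
  exec(command: string, opts: { cwd?: string; timeoutMs?: number }): Promise<ExecResult>;
}
interface ShellArgs { command: string; cwd?: string; timeoutMs?: number }

// Same tool name as the host version, so it can replace it in a registry.
function createSandboxedShellTool(
  sandbox: SandboxLike,
  paths: { hostRoot: string; containerRoot: string },
  maxOutput = 10000,
) {
  return {
    name: "shell.exec",
    async execute(args: ShellArgs): Promise<ExecResult> {
      const cwd = toContainerPath(args.cwd ?? paths.hostRoot, paths.hostRoot, paths.containerRoot);
      const { stdout, stderr } = await sandbox.exec(args.command, { cwd, timeoutMs: args.timeoutMs });
      return { stdout: truncate(stdout, maxOutput), stderr: truncate(stderr, maxOutput) };
    },
  };
}
```

For example, with the workspace mounted from `/srv/ws` to `/workspace`, a host cwd of `/srv/ws/app` becomes `/workspace/app` inside the container.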

Per-Session ToolRegistry

When sandbox is active for a channel session, the daemon creates a cloned ToolRegistry that replaces shell.exec and process.start with sandboxed versions. All other tools reference the shared host registry.
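
A minimal sketch of the cloning step, assuming a registry keyed by tool name. The `Tool` interface and the `createSessionRegistry` helper are hypothetical; only `clone()` is named in this design. The clone is shallow, so unreplaced tools are the same objects as in the host registry.

```typescript
// Hypothetical minimal Tool shape; the real interface likely has more fields.
interface Tool {
  name: string;
  execute(args: unknown): Promise<unknown>;
}

class ToolRegistry {
  private readonly tools = new Map<string, Tool>();

  register(tool: Tool): void {
    this.tools.set(tool.name, tool); // same name replaces the previous entry
  }

  get(name: string): Tool | undefined {
    return this.tools.get(name);
  }

  // Shallow copy: the new registry shares tool objects with the original.
  clone(): ToolRegistry {
    const copy = new ToolRegistry();
    this.tools.forEach((tool) => copy.register(tool));
    return copy;
  }
}

// Build a per-session registry whose dangerous tools are swapped for sandboxed ones.
function createSessionRegistry(host: ToolRegistry, sandboxedTools: Tool[]): ToolRegistry {
  const session = host.clone();
  for (const tool of sandboxedTools) session.register(tool);
  return session;
}
```

Replacing entries on the clone leaves the host registry untouched, so TUI and local WebSocket sessions keep the unsandboxed tools.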

Error Handling

  • Docker not installed → log warning at startup, fall through to host execution
  • Container creation fails → log error, return tool error (not crash)
  • Container timeout → docker rm -f, return timeout error
  • Docker daemon unavailable → graceful degradation with clear error messages
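
To make the failure cases above concrete, here is one hedged way to classify errors from the Docker CLI before turning them into tool errors. The `SandboxFailure` type and `classifyDockerError` are hypothetical. The sketch assumes the error shape produced by a promisified `child_process.execFile` (`code === "ENOENT"` when the binary is missing, `killed === true` after a timeout kill, `stderr` attached to the error) and matches on the Docker CLI's "Cannot connect to the Docker daemon" message.

```typescript
type SandboxFailure =
  | { kind: "docker_missing"; message: string }
  | { kind: "timeout"; message: string }
  | { kind: "daemon_unavailable"; message: string }
  | { kind: "command_failed"; message: string };

// Assumed subset of the error object rejected by a promisified execFile.
interface ExecFileErrorLike {
  code?: string | number;
  killed?: boolean;
  stderr?: string;
  message?: string;
}

function classifyDockerError(err: ExecFileErrorLike): SandboxFailure {
  // The `docker` binary could not be spawned at all.
  if (err.code === "ENOENT") {
    return { kind: "docker_missing", message: "Docker CLI not found on PATH" };
  }
  // execFile killed the child after its `timeout` elapsed.
  if (err.killed) {
    return { kind: "timeout", message: "Sandboxed command timed out" };
  }
  const stderr = err.stderr ?? "";
  if (stderr.includes("Cannot connect to the Docker daemon")) {
    return { kind: "daemon_unavailable", message: "Docker daemon is not reachable" };
  }
  return {
    kind: "command_failed",
    message: stderr.trim() || err.message || "docker command failed",
  };
}
```

A caller could then log a warning for `docker_missing`, run `docker rm -f` for `timeout`, and return the `message` as a tool error in every case instead of letting the exception crash the daemon.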

Feature 2: Multi-Agent Routing

Goal

Named agent configurations that can be assigned to channels, senders, or sender patterns. Each agent config specifies its own system prompt, model tier, tool profile, and sandbox setting.

Architecture

src/agents/
  registry.ts        — AgentConfigRegistry (stores named AgentConfig objects)
  router.ts          — AgentRouter (resolves {channel, senderId} → AgentConfig)
  router.test.ts     — Tests
  index.ts           — Barrel export

Config Schema

agent_configs:
  assistant:
    system_prompt: "You are a helpful assistant."
    model_tier: default
    tool_profile: messaging
    sandbox: true

  coder:
    system_prompt: "You are a coding assistant. Focus on writing clean code."
    model_tier: complex
    tool_profile: coding
    sandbox: true

routing:
  default_agent: assistant
  channels:
    discord: coder
  senders:
    "telegram:12345": coder
    "slack:U0*": assistant

AgentConfigRegistry

Stores parsed AgentConfig objects by name:

  • register(config) — Add a named config
  • get(name) — Look up by name
  • list() — All registered configs
  • loadFromConfig(rawConfig) — Parse from validated YAML

AgentConfig Type

interface AgentConfig {
  name: string;
  systemPrompt?: string;     // overrides global system prompt
  modelTier?: ModelTier;     // fast/default/complex/local
  toolProfile?: ToolProfile; // minimal/messaging/coding/full
  toolOverrides?: ToolOverrideConfig;
  sandbox?: boolean;         // use Docker sandbox (if globally enabled)
}
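
A registry over this type might be sketched as below. Several details are assumptions: the `RawAgentConfig` shape (snake_case keys as in the YAML), rejecting duplicate names, and omitting `toolOverrides` for brevity. The `ModelTier` and `ToolProfile` unions are inferred from the comments in the interface above.

```typescript
type ModelTier = "fast" | "default" | "complex" | "local";
type ToolProfile = "minimal" | "messaging" | "coding" | "full";

interface AgentConfig {
  name: string;
  systemPrompt?: string;
  modelTier?: ModelTier;
  toolProfile?: ToolProfile;
  sandbox?: boolean;
}

// Assumed shape of one validated `agent_configs` entry from YAML.
interface RawAgentConfig {
  system_prompt?: string;
  model_tier?: ModelTier;
  tool_profile?: ToolProfile;
  sandbox?: boolean;
}

class AgentConfigRegistry {
  private readonly configs = new Map<string, AgentConfig>();

  // Assumption: duplicate names are a configuration error.
  register(config: AgentConfig): void {
    if (this.configs.has(config.name)) {
      throw new Error(`agent config already registered: ${config.name}`);
    }
    this.configs.set(config.name, config);
  }

  get(name: string): AgentConfig | undefined {
    return this.configs.get(name);
  }

  list(): AgentConfig[] {
    return Array.from(this.configs.values());
  }

  // Convert snake_case YAML entries into AgentConfig objects keyed by name.
  loadFromConfig(raw: Record<string, RawAgentConfig>): void {
    for (const name of Object.keys(raw)) {
      const entry = raw[name];
      this.register({
        name,
        systemPrompt: entry.system_prompt,
        modelTier: entry.model_tier,
        toolProfile: entry.tool_profile,
        sandbox: entry.sandbox,
      });
    }
  }
}
```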

AgentRouter

Resolves which AgentConfig to use for a given message:

  1. Check senders map — exact match first, then glob patterns (via minimatch)
  2. Check channels map — channel name match
  3. Fall back to routing.default_agent
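
The resolution order above can be sketched as follows. To stay dependency-free, the sketch swaps minimatch for a small hand-rolled matcher, `globToRegExp`, which supports only `*` and `?`; it is a stand-in, not minimatch's behavior. The `RoutingConfig` shape, the `channel:senderId` key format (inferred from the example config), and returning the agent name rather than the config object are assumptions.

```typescript
interface RoutingConfig {
  defaultAgent: string;
  channels?: Record<string, string>;
  senders?: Record<string, string>;
}

// Simplified glob: `*` matches any run of characters, `?` matches one character.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

class AgentRouter {
  private readonly routing: RoutingConfig;

  constructor(routing: RoutingConfig) {
    this.routing = routing;
  }

  // Returns the name of the agent config to use for this message.
  resolve(channel: string, senderId: string): string {
    const key = `${channel}:${senderId}`;
    const senders = this.routing.senders ?? {};

    // 1a. Exact sender match wins over everything else.
    if (Object.prototype.hasOwnProperty.call(senders, key)) return senders[key];

    // 1b. Glob sender patterns, checked in declaration order.
    for (const pattern of Object.keys(senders)) {
      if (/[*?]/.test(pattern) && globToRegExp(pattern).test(key)) return senders[pattern];
    }

    // 2. Channel-wide assignment.
    const channels = this.routing.channels ?? {};
    if (Object.prototype.hasOwnProperty.call(channels, channel)) return channels[channel];

    // 3. Global default.
    return this.routing.defaultAgent;
  }
}
```

With the example routing config, `resolve("telegram", "12345")` yields `coder` through the exact sender entry, and any Discord sender without a sender rule yields `coder` through the channel entry.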

Daemon Integration

The createMessageRouter() function changes:

  1. On message: agentRouter.resolve(channel, senderId) returns agent config name
  2. Cache key: ${channel}:${senderId}:${agentConfigName} (agent change = new orchestrator)
  3. Create AgentOrchestrator with resolved config's system prompt, model tier, tool policy
  4. If sandbox enabled for this config + globally: create per-session sandboxed ToolRegistry
  5. Otherwise: use shared host ToolRegistry
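
Steps 2 and 4 could be sketched like this. `orchestratorCacheKey`, `usesSandbox`, and `OrchestratorCache` are hypothetical helper names, and the orchestrator itself is left as a generic type parameter since its constructor is not specified here.

```typescript
interface AgentConfig {
  name: string;
  sandbox?: boolean;
}

// Step 2: including the agent name means a routing change yields a fresh orchestrator.
function orchestratorCacheKey(channel: string, senderId: string, agentConfigName: string): string {
  return `${channel}:${senderId}:${agentConfigName}`;
}

// Step 4: sandboxing needs both the global switch and the per-agent flag.
function usesSandbox(globallyEnabled: boolean, config: AgentConfig): boolean {
  return globallyEnabled && config.sandbox === true;
}

// Generic cache so the sketch does not depend on the real AgentOrchestrator.
class OrchestratorCache<T> {
  private readonly entries = new Map<string, T>();

  getOrCreate(key: string, create: () => T): T {
    const existing = this.entries.get(key);
    if (existing !== undefined) return existing;
    const created = create();
    this.entries.set(key, created);
    return created;
  }
}
```

One consequence of this key scheme is that the orchestrator created under a previous agent assignment stays in the cache under its old key until something evicts it; whether eviction is needed is left open here.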

Modified Files

  • src/config/schema.ts — Add sandboxSchema, agentConfigSchema, routingSchema
  • src/config/index.ts — Export new types
  • src/daemon/index.ts — Wire SandboxManager + AgentRouter into message handler
  • src/tools/registry.ts — Add clone() method for per-session copies

Testing

  • All Docker interactions mocked (no real Docker in tests)
  • Agent router tested with config fixtures (exact, glob, channel, default fallback)
  • Sandboxed tools tested with mocked Docker CLI exec
  • Integration tested via daemon message handler with mocked dependencies

Dependencies

  • No new npm dependencies: Docker is driven through its CLI, and glob matching uses minimatch if it is already available or a trivially implemented matcher otherwise
  • Runtime: Docker must be installed on host for sandbox feature to work (graceful degradation if absent)