# P2: Docker Sandboxing + Multi-Agent Routing — Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Add Docker container sandboxing for channel tool execution and named agent configuration with config-based routing.

**Architecture:** Tool-level wrapping — sandboxed `shell.exec` and `process.start` delegate to `docker exec` inside per-session containers. Agent config registry stores named agent definitions (system prompt, model tier, tool profile, sandbox flag) with config-based routing that maps channels/senders to agent configs.

**Tech Stack:** TypeScript (ES2022, NodeNext), Zod schemas, Vitest tests, Docker CLI (no SDK dependency), `child_process.execFile`.

---

## Task 1: Config Schema — Sandbox + Agent Configs + Routing

**Files:**
- Modify: `src/config/schema.ts:164-231`
- Modify: `src/config/index.ts:1-3`

**Step 1: Write the failing test**

Create file: `src/config/schema.test.ts`

```typescript
import { describe, it, expect } from 'vitest';
import { configSchema } from './schema.js';

describe('configSchema — sandbox', () => {
  const minimalConfig = {
    telegram: { bot_token: 'test', allowed_chat_ids: [1] },
    models: { default: { provider: 'anthropic', model: 'claude-3' } },
  };

  it('defaults sandbox to disabled', () => {
    const result = configSchema.parse(minimalConfig);
    expect(result.sandbox.enabled).toBe(false);
    expect(result.sandbox.image).toBe('node:22-slim');
    expect(result.sandbox.network).toBe('none');
    expect(result.sandbox.memory_limit).toBe('512m');
    expect(result.sandbox.cpu_limit).toBe('1.0');
    expect(result.sandbox.timeout_seconds).toBe(300);
  });

  it('accepts sandbox config', () => {
    const result = configSchema.parse({
      ...minimalConfig,
      sandbox: { enabled: true, image: 'ubuntu:24.04', network: 'bridge' },
    });
    expect(result.sandbox.enabled).toBe(true);
    expect(result.sandbox.image).toBe('ubuntu:24.04');
    expect(result.sandbox.network).toBe('bridge');
  });
});

describe('configSchema — agent_configs', () => {
  const minimalConfig = {
    telegram: { bot_token: 'test', allowed_chat_ids: [1] },
    models: { default: { provider: 'anthropic', model: 'claude-3' } },
  };

  it('defaults agent_configs to empty', () => {
    const result = configSchema.parse(minimalConfig);
    expect(result.agent_configs).toEqual({});
  });

  it('accepts named agent configs', () => {
    const result = configSchema.parse({
      ...minimalConfig,
      agent_configs: {
        assistant: {
          system_prompt: 'You are helpful.',
          model_tier: 'default',
          tool_profile: 'messaging',
        },
        coder: {
          model_tier: 'complex',
          tool_profile: 'coding',
          sandbox: true,
        },
      },
    });
    expect(result.agent_configs.assistant.system_prompt).toBe('You are helpful.');
    expect(result.agent_configs.assistant.tool_profile).toBe('messaging');
    expect(result.agent_configs.coder.sandbox).toBe(true);
  });
});

describe('configSchema — routing', () => {
  const minimalConfig = {
    telegram: { bot_token: 'test', allowed_chat_ids: [1] },
    models: { default: { provider: 'anthropic', model: 'claude-3' } },
  };

  it('defaults routing to empty', () => {
    const result = configSchema.parse(minimalConfig);
    expect(result.routing.default_agent).toBeUndefined();
    expect(result.routing.channels).toEqual({});
    expect(result.routing.senders).toEqual({});
  });

  it('accepts routing config', () => {
    const result = configSchema.parse({
      ...minimalConfig,
      routing: {
        default_agent: 'assistant',
        channels: { discord: 'coder' },
        senders: { 'telegram:12345': 'coder' },
      },
    });
    expect(result.routing.default_agent).toBe('assistant');
    expect(result.routing.channels.discord).toBe('coder');
    expect(result.routing.senders['telegram:12345']).toBe('coder');
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/config/schema.test.ts`
Expected: FAIL — `sandbox`, `agent_configs`, and `routing` properties don't exist on config

**Step 3: Implement the schema additions**

Add to `src/config/schema.ts` before the `configSchema` definition (before line 192):

```typescript
// ── Sandbox schemas ───────────────────────────────────────────────────

const sandboxSchema = z.object({
  enabled: z.boolean().default(false),
  image: z.string().default('node:22-slim'),
  workspace_dir: z.string().default('/workspace'),
  network: z.enum(['none', 'bridge', 'host']).default('none'),
  memory_limit: z.string().default('512m'),
  cpu_limit: z.string().default('1.0'),
  timeout_seconds: z.number().min(10).max(3600).default(300),
}).default({});

// ── Agent config + routing schemas ────────────────────────────────────

const modelTierEnum = z.enum(['fast', 'default', 'complex', 'local']);

const agentConfigEntrySchema = z.object({
  system_prompt: z.string().optional(),
  model_tier: modelTierEnum.optional(),
  tool_profile: toolProfileEnum.optional(),
  tool_overrides: toolOverrideSchema.optional(),
  sandbox: z.boolean().default(false),
});

const agentConfigsSchema = z.record(z.string(), agentConfigEntrySchema).default({});

const routingSchema = z.object({
  default_agent: z.string().optional(),
  channels: z.record(z.string(), z.string()).default({}),
  senders: z.record(z.string(), z.string()).default({}),
}).default({});
```

Then add these three new fields to the `configSchema` `z.object` (around lines 192-212):

```typescript
  sandbox: sandboxSchema,
  agent_configs: agentConfigsSchema,
  routing: routingSchema,
```

And add type exports at the end (after line 230):

```typescript
export type SandboxConfig = z.infer<typeof sandboxSchema>;
export type AgentConfigEntry = z.infer<typeof agentConfigEntrySchema>;
export type RoutingConfig = z.infer<typeof routingSchema>;
```
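
The routing schema only stores the mapping; this task does not spell out how a message is resolved to an agent. As a minimal sketch, assuming sender rules take precedence over channel rules, which in turn take precedence over `default_agent`, resolution might look like the following. The names `resolveAgentName` and `lookup`, and the `channel:senderId` key format (inferred from the `'telegram:12345'` test fixture), are assumptions rather than part of the plan.

```typescript
// Local mirror of the RoutingConfig shape so the sketch is self-contained.
interface RoutingConfig {
  default_agent?: string;
  channels: Record<string, string>;
  senders: Record<string, string>;
}

// Own-property lookup, so keys such as "constructor" never match inherited members.
function lookup(table: Record<string, string>, key: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Hypothetical helper: picks an agent config name for an incoming message.
// Assumed precedence: sender rule > channel rule > default_agent.
function resolveAgentName(
  routing: RoutingConfig,
  channel: string,
  senderId: string,
): string | undefined {
  const senderKey = `${channel}:${senderId}`;
  return lookup(routing.senders, senderKey)
    ?? lookup(routing.channels, channel)
    ?? routing.default_agent;
}

const routing: RoutingConfig = {
  default_agent: 'assistant',
  channels: { discord: 'coder' },
  senders: { 'telegram:12345': 'coder' },
};

console.log(resolveAgentName(routing, 'telegram', '12345')); // sender rule applies
console.log(resolveAgentName(routing, 'discord', '999'));    // channel rule applies
console.log(resolveAgentName(routing, 'telegram', '777'));   // falls back to default_agent
```
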

**Step 4: Update `src/config/index.ts` barrel export**

Add the new types to the export line:

```typescript
export { configSchema, type Config, type TelegramConfig, type ModelConfig, type CronJobConfig, type AgentsConfig, type CompactionConfig, type ToolProfile, type ToolOverrideConfig, type ToolsConfig, type SandboxConfig, type AgentConfigEntry, type RoutingConfig } from './schema.js';
```

**Step 5: Run test to verify it passes**

Run: `pnpm vitest run src/config/schema.test.ts`
Expected: PASS (all 6 tests)

**Step 6: Run full test suite**

Run: `pnpm test:run`
Expected: All 606+ tests pass

**Step 7: Commit**

```bash
git add src/config/schema.ts src/config/schema.test.ts src/config/index.ts
git commit -m "feat: add sandbox, agent_configs, and routing config schemas"
```

---

## Task 2: DockerSandbox Class

**Files:**
- Create: `src/sandbox/docker.ts`
- Create: `src/sandbox/docker.test.ts`

**Step 1: Write the failing test**

Create file: `src/sandbox/docker.test.ts`

```typescript
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { DockerSandbox, type DockerSandboxConfig } from './docker.js';
import * as childProcess from 'child_process';

// Mock child_process.execFile
vi.mock('child_process', () => ({
  execFile: vi.fn(),
}));

const mockedExecFile = vi.mocked(childProcess.execFile);

function mockExecFileSuccess(stdout = '', stderr = '') {
  mockedExecFile.mockImplementation(
    (_cmd: unknown, _args: unknown, _opts: unknown, callback: unknown) => {
      (callback as (err: null, stdout: string, stderr: string) => void)(null, stdout, stderr);
      return {} as ReturnType<typeof childProcess.execFile>;
    },
  );
}

function mockExecFileError(message: string) {
  mockedExecFile.mockImplementation(
    (_cmd: unknown, _args: unknown, _opts: unknown, callback: unknown) => {
      (callback as (err: Error) => void)(new Error(message));
      return {} as ReturnType<typeof childProcess.execFile>;
    },
  );
}

describe('DockerSandbox', () => {
  const defaultConfig: DockerSandboxConfig = {
    sessionId: 'test-session',
    image: 'node:22-slim',
    workspaceDir: '/workspace',
    network: 'none',
    memoryLimit: '512m',
    cpuLimit: '1.0',
    timeoutSeconds: 300,
  };

  beforeEach(() => {
    vi.clearAllMocks();
  });

  describe('create()', () => {
    it('creates a docker container with correct args', async () => {
      mockExecFileSuccess('container-abc123');
      const sandbox = new DockerSandbox(defaultConfig);
      await sandbox.create();

      expect(mockedExecFile).toHaveBeenCalledWith(
        'docker',
        expect.arrayContaining([
          'create',
          '--name', expect.stringContaining('flynn-test-session'),
          '--memory', '512m',
          '--cpus', '1.0',
          '--network', 'none',
          '-v', expect.stringContaining(':/workspace'),
          'node:22-slim',
          'sleep', 'infinity',
        ]),
        expect.any(Object),
        expect.any(Function),
      );
      expect(sandbox.containerId).toBe('container-abc123');
    });

    it('starts the container after creating', async () => {
      mockExecFileSuccess('container-abc123');
      const sandbox = new DockerSandbox(defaultConfig);
      await sandbox.create();

      // Second call should be docker start
      expect(mockedExecFile).toHaveBeenCalledTimes(2);
      expect(mockedExecFile).toHaveBeenNthCalledWith(
        2, 'docker', ['start', 'container-abc123'],
        expect.any(Object), expect.any(Function),
      );
    });

    it('throws if docker create fails', async () => {
      mockExecFileError('docker not found');
      const sandbox = new DockerSandbox(defaultConfig);
      await expect(sandbox.create()).rejects.toThrow('docker not found');
    });
  });

  describe('exec()', () => {
    it('runs command inside container', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      // Manually set container ID to skip create
      (sandbox as unknown as { _containerId: string })._containerId = 'container-abc';

      mockExecFileSuccess('hello world\n');
      const result = await sandbox.exec('echo hello world');

      expect(mockedExecFile).toHaveBeenCalledWith(
        'docker',
        ['exec', 'container-abc', 'bash', '-c', 'echo hello world'],
        expect.objectContaining({ timeout: expect.any(Number) }),
        expect.any(Function),
      );
      expect(result).toEqual({ stdout: 'hello world\n', stderr: '' });
    });

    it('passes cwd as workdir option', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      (sandbox as unknown as { _containerId: string })._containerId = 'container-abc';

      mockExecFileSuccess('');
      await sandbox.exec('ls', { cwd: '/workspace/project' });

      expect(mockedExecFile).toHaveBeenCalledWith(
        'docker',
        ['exec', '-w', '/workspace/project', 'container-abc', 'bash', '-c', 'ls'],
        expect.any(Object),
        expect.any(Function),
      );
    });

    it('throws if no container created', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      await expect(sandbox.exec('echo hi')).rejects.toThrow('not created');
    });
  });

  describe('destroy()', () => {
    it('force-removes the container', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      (sandbox as unknown as { _containerId: string })._containerId = 'container-abc';

      mockExecFileSuccess();
      await sandbox.destroy();

      expect(mockedExecFile).toHaveBeenCalledWith(
        'docker', ['rm', '-f', 'container-abc'],
        expect.any(Object), expect.any(Function),
      );
    });

    it('does nothing if no container', async () => {
      const sandbox = new DockerSandbox(defaultConfig);
      await sandbox.destroy(); // should not throw
      expect(mockedExecFile).not.toHaveBeenCalled();
    });
  });

  describe('isAvailable()', () => {
    it('returns true when docker is installed', async () => {
      mockExecFileSuccess('Docker version 27.0.0');
      const result = await DockerSandbox.isAvailable();
      expect(result).toBe(true);
    });

    it('returns false when docker is not installed', async () => {
      mockExecFileError('command not found');
      const result = await DockerSandbox.isAvailable();
      expect(result).toBe(false);
    });
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/sandbox/docker.test.ts`
Expected: FAIL — cannot find module `./docker.js`

**Step 3: Implement DockerSandbox**

Create file: `src/sandbox/docker.ts`

```typescript
import { execFile } from 'child_process';

export interface DockerSandboxConfig {
  sessionId: string;
  image: string;
  workspaceDir: string;
  network: 'none' | 'bridge' | 'host';
  memoryLimit: string;
  cpuLimit: string;
  timeoutSeconds: number;
}

export interface ExecOptions {
  cwd?: string;
  timeout?: number;
}

export interface ExecResult {
  stdout: string;
  stderr: string;
}

/**
 * Manages a single Docker container for sandboxed tool execution.
 * Uses the Docker CLI directly (no SDK dependency).
 */
export class DockerSandbox {
  private config: DockerSandboxConfig;
  private _containerId: string | null = null;
  private _hostWorkdir: string;

  constructor(config: DockerSandboxConfig) {
    this.config = config;
    // Use a temp directory on the host, named by session
    const sanitizedId = config.sessionId.replace(/[^a-zA-Z0-9_-]/g, '_');
    this._hostWorkdir = `/tmp/flynn-sandbox-${sanitizedId}`;
  }

  get containerId(): string | null {
    return this._containerId;
  }

  get containerName(): string {
    const sanitizedId = this.config.sessionId.replace(/[^a-zA-Z0-9_-]/g, '_');
    return `flynn-${sanitizedId}`;
  }

  /** Create and start the sandbox container. */
  async create(): Promise<void> {
    const args = [
      'create',
      '--name', this.containerName,
      '--memory', this.config.memoryLimit,
      '--cpus', this.config.cpuLimit,
      '--network', this.config.network,
      '-v', `${this._hostWorkdir}:${this.config.workspaceDir}`,
      this.config.image,
      'sleep', 'infinity',
    ];

    const createResult = await this.dockerCmd(args);
    this._containerId = createResult.stdout.trim();

    await this.dockerCmd(['start', this._containerId]);
  }

  /** Execute a command inside the container. */
  async exec(command: string, opts?: ExecOptions): Promise<ExecResult> {
    if (!this._containerId) {
      throw new Error('Sandbox container not created. Call create() first.');
    }

    const args = ['exec'];
    if (opts?.cwd) {
      args.push('-w', opts.cwd);
    }
    args.push(this._containerId, 'bash', '-c', command);

    const timeout = opts?.timeout ?? this.config.timeoutSeconds * 1000;
    return this.dockerCmd(args, timeout);
  }

  /** Force-remove the container. */
  async destroy(): Promise<void> {
    if (!this._containerId) return;

    try {
      await this.dockerCmd(['rm', '-f', this._containerId]);
    } catch {
      // Ignore errors during cleanup
    }
    this._containerId = null;
  }

  /** Check if Docker is available on this host. */
  static async isAvailable(): Promise<boolean> {
    try {
      await new Promise<string>((resolve, reject) => {
        execFile('docker', ['version', '--format', '{{.Server.Version}}'], {
          timeout: 5000,
        }, (error, stdout) => {
          if (error) reject(error);
          else resolve(stdout);
        });
      });
      return true;
    } catch {
      return false;
    }
  }

  /** Run a docker CLI command. */
  private dockerCmd(args: string[], timeout = 30_000): Promise<ExecResult> {
    return new Promise((resolve, reject) => {
      execFile('docker', args, { timeout, maxBuffer: 1024 * 1024 }, (error, stdout, stderr) => {
        if (error) {
          reject(error);
          return;
        }
        resolve({ stdout, stderr });
      });
    });
  }
}
```
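
Because `create()` assembles the `docker create` argument list inline, the list can only be checked through the mocked `execFile`. One option, sketched here as an assumption rather than as part of the plan, is to factor the argument construction into a pure function that can be asserted on directly. The name `buildCreateArgs` and the `CreateArgsInput` type are hypothetical; the flags are the same ones `create()` passes above.

```typescript
// Hypothetical input type: the values create() reads from its config and fields.
interface CreateArgsInput {
  containerName: string;
  hostWorkdir: string;
  workspaceDir: string;
  image: string;
  network: 'none' | 'bridge' | 'host';
  memoryLimit: string;
  cpuLimit: string;
}

// Hypothetical pure helper: returns the argv handed to the `docker` binary.
function buildCreateArgs(input: CreateArgsInput): string[] {
  return [
    'create',
    '--name', input.containerName,
    '--memory', input.memoryLimit,
    '--cpus', input.cpuLimit,
    '--network', input.network,
    // Bind-mount the per-session host directory at the workspace path.
    '-v', `${input.hostWorkdir}:${input.workspaceDir}`,
    input.image,
    // Keep the container alive so later `docker exec` calls have a target.
    'sleep', 'infinity',
  ];
}

const createArgs = buildCreateArgs({
  containerName: 'flynn-test-session',
  hostWorkdir: '/tmp/flynn-sandbox-test-session',
  workspaceDir: '/workspace',
  image: 'node:22-slim',
  network: 'none',
  memoryLimit: '512m',
  cpuLimit: '1.0',
});

console.log(createArgs.join(' '));
```
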

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/sandbox/docker.test.ts`
Expected: PASS (all tests)

**Step 5: Commit**

```bash
git add src/sandbox/docker.ts src/sandbox/docker.test.ts
git commit -m "feat: add DockerSandbox class for container lifecycle"
```

---

## Task 3: SandboxManager

**Files:**
- Create: `src/sandbox/manager.ts`
- Create: `src/sandbox/manager.test.ts`

**Step 1: Write the failing test**

Create file: `src/sandbox/manager.test.ts`

```typescript
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { SandboxManager } from './manager.js';
import { DockerSandbox } from './docker.js';
import type { SandboxConfig } from '../config/schema.js';

// Mock DockerSandbox
vi.mock('./docker.js', () => ({
  DockerSandbox: vi.fn().mockImplementation(() => ({
    create: vi.fn().mockResolvedValue(undefined),
    destroy: vi.fn().mockResolvedValue(undefined),
    exec: vi.fn().mockResolvedValue({ stdout: '', stderr: '' }),
    containerId: 'mock-container',
  })),
}));

describe('SandboxManager', () => {
  const defaultConfig: SandboxConfig = {
    enabled: true,
    image: 'node:22-slim',
    workspace_dir: '/workspace',
    network: 'none',
    memory_limit: '512m',
    cpu_limit: '1.0',
    timeout_seconds: 300,
  };

  beforeEach(() => {
    vi.clearAllMocks();
  });

  describe('getOrCreate()', () => {
    it('creates a new sandbox for unknown session', async () => {
      const manager = new SandboxManager(defaultConfig);
      const sandbox = await manager.getOrCreate('session-1');

      expect(DockerSandbox).toHaveBeenCalledWith(expect.objectContaining({
        sessionId: 'session-1',
        image: 'node:22-slim',
      }));
      expect(sandbox.create).toHaveBeenCalled();
    });

    it('reuses existing sandbox for same session', async () => {
      const manager = new SandboxManager(defaultConfig);
      const first = await manager.getOrCreate('session-1');
      const second = await manager.getOrCreate('session-1');

      expect(first).toBe(second);
      expect(DockerSandbox).toHaveBeenCalledTimes(1);
    });

    it('creates separate sandboxes for different sessions', async () => {
      const manager = new SandboxManager(defaultConfig);
      await manager.getOrCreate('session-1');
      await manager.getOrCreate('session-2');

      expect(DockerSandbox).toHaveBeenCalledTimes(2);
    });
  });

  describe('destroy()', () => {
    it('destroys sandbox and removes from cache', async () => {
      const manager = new SandboxManager(defaultConfig);
      const sandbox = await manager.getOrCreate('session-1');

      await manager.destroy('session-1');
      expect(sandbox.destroy).toHaveBeenCalled();

      // Should create a new one now
      await manager.getOrCreate('session-1');
      expect(DockerSandbox).toHaveBeenCalledTimes(2);
    });

    it('does nothing for unknown session', async () => {
      const manager = new SandboxManager(defaultConfig);
      await manager.destroy('nonexistent'); // should not throw
    });
  });

  describe('destroyAll()', () => {
    it('destroys all sandboxes', async () => {
      const manager = new SandboxManager(defaultConfig);
      const s1 = await manager.getOrCreate('session-1');
      const s2 = await manager.getOrCreate('session-2');

      await manager.destroyAll();
      expect(s1.destroy).toHaveBeenCalled();
      expect(s2.destroy).toHaveBeenCalled();
    });
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/sandbox/manager.test.ts`
Expected: FAIL — cannot find module `./manager.js`

**Step 3: Implement SandboxManager**

Create file: `src/sandbox/manager.ts`

```typescript
import { DockerSandbox } from './docker.js';
import type { SandboxConfig } from '../config/schema.js';

/**
 * Manages per-session Docker sandboxes.
 * Creates containers lazily on first access, destroys on session cleanup.
 */
export class SandboxManager {
  private sandboxes = new Map<string, DockerSandbox>();
  private config: SandboxConfig;

  constructor(config: SandboxConfig) {
    this.config = config;
  }

  /** Get or create a sandbox for a session. */
  async getOrCreate(sessionId: string): Promise<DockerSandbox> {
    let sandbox = this.sandboxes.get(sessionId);
    if (sandbox) return sandbox;

    sandbox = new DockerSandbox({
      sessionId,
      image: this.config.image,
      workspaceDir: this.config.workspace_dir,
      network: this.config.network,
      memoryLimit: this.config.memory_limit,
      cpuLimit: this.config.cpu_limit,
      timeoutSeconds: this.config.timeout_seconds,
    });

    await sandbox.create();
    this.sandboxes.set(sessionId, sandbox);
    return sandbox;
  }

  /** Destroy a specific session's sandbox. */
  async destroy(sessionId: string): Promise<void> {
    const sandbox = this.sandboxes.get(sessionId);
    if (!sandbox) return;

    await sandbox.destroy();
    this.sandboxes.delete(sessionId);
  }

  /** Destroy all sandboxes (daemon shutdown). */
  async destroyAll(): Promise<void> {
    const entries = Array.from(this.sandboxes.entries());
    await Promise.allSettled(
      entries.map(async ([id, sandbox]) => {
        await sandbox.destroy();
        this.sandboxes.delete(id);
      }),
    );
  }
}
```
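
Note that `getOrCreate()` stores the sandbox only after `create()` resolves, so two overlapping calls for the same session would each try to create a container. If that matters for the daemon, one possible approach is to cache the in-flight promise instead of the finished object. The following is a generic sketch of that idea, not the plan's implementation; `LazyAsyncCache` and its `factory` parameter are hypothetical names.

```typescript
// Generic sketch: caches the in-flight promise so concurrent callers share one creation.
class LazyAsyncCache<T> {
  private pending = new Map<string, Promise<T>>();
  private factory: (key: string) => Promise<T>;

  constructor(factory: (key: string) => Promise<T>) {
    this.factory = factory;
  }

  getOrCreate(key: string): Promise<T> {
    const existing = this.pending.get(key);
    if (existing) return existing;

    // Store the promise before it settles; evict on failure so a later call can retry.
    const created = this.factory(key).catch((error: unknown) => {
      this.pending.delete(key);
      throw error;
    });
    this.pending.set(key, created);
    return created;
  }

  delete(key: string): void {
    this.pending.delete(key);
  }
}

let factoryCalls = 0;
const cache = new LazyAsyncCache<string>(async (key) => {
  factoryCalls += 1;
  return `sandbox-for-${key}`;
});

// Both calls are issued before the first creation has finished.
const first = cache.getOrCreate('session-1');
const second = cache.getOrCreate('session-1');

void Promise.all([first, second]).then(([a, b]) => {
  console.log(a === b, factoryCalls);
});
```
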

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/sandbox/manager.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/sandbox/manager.ts src/sandbox/manager.test.ts
git commit -m "feat: add SandboxManager for per-session container lifecycle"
```

---

## Task 4: Sandboxed Tool Wrappers

**Files:**
- Create: `src/sandbox/tools.ts`
- Create: `src/sandbox/tools.test.ts`

**Step 1: Write the failing test**

Create file: `src/sandbox/tools.test.ts`

```typescript
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { createSandboxedShellTool, createSandboxedProcessStartTool } from './tools.js';
import type { DockerSandbox } from './docker.js';

function mockSandbox(): DockerSandbox {
  return {
    exec: vi.fn().mockResolvedValue({ stdout: 'output', stderr: '' }),
    create: vi.fn(),
    destroy: vi.fn(),
    containerId: 'test-container',
    containerName: 'flynn-test',
    config: {},
  } as unknown as DockerSandbox;
}

describe('createSandboxedShellTool', () => {
  let sandbox: DockerSandbox;

  beforeEach(() => {
    sandbox = mockSandbox();
  });

  it('has the same name as shell.exec', () => {
    const tool = createSandboxedShellTool(sandbox);
    expect(tool.name).toBe('shell.exec');
  });

  it('delegates to sandbox.exec', async () => {
    const tool = createSandboxedShellTool(sandbox);
    const result = await tool.execute({ command: 'echo hello' });

    expect(sandbox.exec).toHaveBeenCalledWith('echo hello', { cwd: undefined, timeout: 30000 });
    expect(result.success).toBe(true);
    expect(result.output).toBe('output');
  });

  it('passes cwd to sandbox.exec', async () => {
    const tool = createSandboxedShellTool(sandbox);
    await tool.execute({ command: 'ls', cwd: '/workspace/project' });

    expect(sandbox.exec).toHaveBeenCalledWith('ls', { cwd: '/workspace/project', timeout: 30000 });
  });

  it('passes timeout to sandbox.exec', async () => {
    const tool = createSandboxedShellTool(sandbox);
    await tool.execute({ command: 'sleep 10', timeout: 5000 });

    expect(sandbox.exec).toHaveBeenCalledWith('sleep 10', { cwd: undefined, timeout: 5000 });
  });

  it('returns error on sandbox.exec failure', async () => {
    (sandbox.exec as ReturnType<typeof vi.fn>).mockRejectedValue(new Error('container dead'));
    const tool = createSandboxedShellTool(sandbox);
    const result = await tool.execute({ command: 'fail' });

    expect(result.success).toBe(false);
    expect(result.error).toBe('container dead');
  });

  it('includes stderr in output', async () => {
    (sandbox.exec as ReturnType<typeof vi.fn>).mockResolvedValue({ stdout: 'out', stderr: 'warn' });
    const tool = createSandboxedShellTool(sandbox);
    const result = await tool.execute({ command: 'cmd' });

    expect(result.output).toContain('out');
    expect(result.output).toContain('stderr: warn');
  });
});

describe('createSandboxedProcessStartTool', () => {
  let sandbox: DockerSandbox;

  beforeEach(() => {
    sandbox = mockSandbox();
  });

  it('has the same name as process.start', () => {
    const tool = createSandboxedProcessStartTool(sandbox);
    expect(tool.name).toBe('process.start');
  });

  it('runs detached command via sandbox', async () => {
    const tool = createSandboxedProcessStartTool(sandbox);
    const result = await tool.execute({ command: 'npm run dev' });

    expect(sandbox.exec).toHaveBeenCalledWith(
      expect.stringContaining('npm run dev'),
      expect.any(Object),
    );
    expect(result.success).toBe(true);
    expect(result.output).toContain('Started sandboxed background process');
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/sandbox/tools.test.ts`
Expected: FAIL — cannot find module `./tools.js`

**Step 3: Implement sandboxed tools**

Create file: `src/sandbox/tools.ts`

```typescript
import type { Tool, ToolResult } from '../tools/types.js';
import type { DockerSandbox } from './docker.js';

interface ShellExecArgs {
  command: string;
  cwd?: string;
  timeout?: number;
}

interface ProcessStartArgs {
  command: string;
  cwd?: string;
}

/**
 * Create a sandboxed version of shell.exec that delegates to docker exec.
 * Same Tool interface — drop-in replacement for the host shell.exec.
 */
export function createSandboxedShellTool(sandbox: DockerSandbox): Tool {
  return {
    name: 'shell.exec',
    description: 'Execute a shell command inside a sandboxed container and return stdout/stderr.',
    inputSchema: {
      type: 'object',
      properties: {
        command: { type: 'string', description: 'The shell command to execute' },
        cwd: { type: 'string', description: 'Working directory inside the container (optional)' },
        timeout: { type: 'number', description: 'Timeout in milliseconds (default 30000)' },
      },
      required: ['command'],
    },
    execute: async (rawArgs: unknown): Promise<ToolResult> => {
      const args = rawArgs as ShellExecArgs;
      const timeout = args.timeout ?? 30_000;

      try {
        const result = await sandbox.exec(args.command, {
          cwd: args.cwd,
          timeout,
        });

        const output = result.stdout + (result.stderr ? `\nstderr: ${result.stderr}` : '');
        return { success: true, output };
      } catch (error) {
        return {
          success: false,
          output: '',
          error: error instanceof Error ? error.message : String(error),
        };
      }
    },
  };
}

/**
 * Create a sandboxed version of process.start that runs in the container.
 * Uses `nohup ... &` via docker exec since we can't spawn detached inside containers.
 */
export function createSandboxedProcessStartTool(sandbox: DockerSandbox): Tool {
  return {
    name: 'process.start',
    description: 'Start a command in the background inside a sandboxed container.',
    inputSchema: {
      type: 'object',
      properties: {
        command: { type: 'string', description: 'The shell command to run in the background' },
        cwd: { type: 'string', description: 'Working directory inside the container (optional)' },
      },
      required: ['command'],
    },
    execute: async (rawArgs: unknown): Promise<ToolResult> => {
      const args = rawArgs as ProcessStartArgs;

      try {
        // Run via nohup + background in the container
        const wrappedCmd = `nohup bash -c '${args.command.replace(/'/g, "'\\''")}' > /tmp/proc.log 2>&1 & echo $!`;
        const result = await sandbox.exec(wrappedCmd, { cwd: args.cwd });

        const pid = result.stdout.trim();
        return {
          success: true,
          output: `Started sandboxed background process (PID ${pid})\nCommand: ${args.command}`,
        };
      } catch (error) {
        return {
          success: false,
          output: '',
          error: error instanceof Error ? error.message : 'Failed to start sandboxed process',
        };
      }
    },
  };
}
```
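
The `wrappedCmd` line relies on the usual POSIX shell idiom for embedding a single quote inside a single-quoted string: close the quote, emit an escaped quote, and reopen it (`'\''`). As an illustration only, the same escaping can be isolated into small helpers that are easy to test on their own. `shellQuote` and `wrapBackground` are hypothetical names; the log path is copied from the code above.

```typescript
// Hypothetical helper: wraps a string in single quotes for a POSIX shell,
// escaping embedded single quotes as '\'' (close, escaped quote, reopen).
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, "'\\''")}'`;
}

// Builds the same background-launch command line as the process.start wrapper:
// run detached with nohup, redirect output to a log file, and print the PID.
function wrapBackground(command: string): string {
  return `nohup bash -c ${shellQuote(command)} > /tmp/proc.log 2>&1 & echo $!`;
}

console.log(shellQuote("echo 'hi'"));
console.log(wrapBackground('npm run dev'));
```
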

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/sandbox/tools.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/sandbox/tools.ts src/sandbox/tools.test.ts
git commit -m "feat: add sandboxed tool wrappers for shell.exec and process.start"
```

---

## Task 5: Sandbox Barrel Export + ToolRegistry.clone()

**Files:**
- Create: `src/sandbox/index.ts`
- Modify: `src/tools/registry.ts:19-97`

**Step 1: Write the failing test for ToolRegistry.clone()**

Create `src/tools/registry.test.ts`, or extend it if it already exists (check first):

```typescript
|
|
import { describe, it, expect } from 'vitest';
|
|
import { ToolRegistry } from './registry.js';
|
|
import type { Tool } from './types.js';
|
|
|
|
function makeTool(name: string): Tool {
  return {
    name,
    description: `Mock ${name}`,
    inputSchema: { type: 'object', properties: {} },
    execute: async () => ({ success: true, output: '' }),
  };
}

describe('ToolRegistry', () => {
  describe('clone()', () => {
    it('creates a copy with all tools', () => {
      const reg = new ToolRegistry();
      reg.register(makeTool('tool.a'));
      reg.register(makeTool('tool.b'));

      const cloned = reg.clone();
      expect(cloned.list().map(t => t.name).sort()).toEqual(['tool.a', 'tool.b']);
    });

    it('inherits the policy from original', () => {
      const reg = new ToolRegistry();
      const mockPolicy = { filterTools: vi.fn(), isAllowed: vi.fn(), resolveAllowedNames: vi.fn(), getEffectiveProfile: vi.fn() };
      reg.setPolicy(mockPolicy as any);

      const cloned = reg.clone();
      expect(cloned.getPolicy()).toBe(mockPolicy);
    });

    it('allows replacing tools in clone without affecting original', () => {
      const reg = new ToolRegistry();
      const originalTool = makeTool('shell.exec');
      reg.register(originalTool);

      const cloned = reg.clone();
      const replacementTool = makeTool('shell.exec');
      replacementTool.description = 'Sandboxed version';

      cloned.replace(replacementTool);
      expect(cloned.get('shell.exec')!.description).toBe('Sandboxed version');
      expect(reg.get('shell.exec')!.description).toBe('Mock shell.exec');
    });
  });

  describe('replace()', () => {
    it('replaces an existing tool', () => {
      const reg = new ToolRegistry();
      reg.register(makeTool('tool.a'));
      const replacement = makeTool('tool.a');
      replacement.description = 'New description';

      reg.replace(replacement);
      expect(reg.get('tool.a')!.description).toBe('New description');
    });

    it('throws if tool does not exist', () => {
      const reg = new ToolRegistry();
      expect(() => reg.replace(makeTool('nonexistent'))).toThrow('not registered');
    });
  });
});
```

Note: The policy test uses `vi.fn()`, so add `vi` to the `vitest` import at the top of the file.

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/tools/registry.test.ts`
Expected: FAIL, because `clone()` and `replace()` don't exist on `ToolRegistry` yet

**Step 3: Add clone() and replace() to ToolRegistry**

In `src/tools/registry.ts`, add these two methods to the `ToolRegistry` class (after the `unregister` method, around line 32):

```typescript
  /** Replace an existing tool with a new implementation. Throws if not registered. */
  replace(tool: Tool): void {
    if (!this.tools.has(tool.name)) {
      throw new Error(`Tool '${tool.name}' is not registered; cannot replace`);
    }
    this.tools.set(tool.name, tool);
  }

  /** Create a shallow clone of this registry (new Map, same Tool objects + policy). */
  clone(): ToolRegistry {
    const cloned = new ToolRegistry();
    for (const tool of this.tools.values()) {
      cloned.register(tool);
    }
    if (this._policy) {
      cloned.setPolicy(this._policy);
    }
    return cloned;
  }
```

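The reason a shallow clone is enough for per-session sandboxing can be sketched with a stripped-down stand-in. `MiniTool` and `MiniRegistry` below are hypothetical simplifications written for this illustration, not the project's real `Tool` or `ToolRegistry`; they only mirror the `register`/`get`/`replace`/`clone` behaviour described above.

```typescript
// Hypothetical, simplified stand-ins for Tool and ToolRegistry (illustration only).
interface MiniTool {
  name: string;
  description: string;
}

class MiniRegistry {
  private tools = new Map<string, MiniTool>();

  register(tool: MiniTool): void {
    this.tools.set(tool.name, tool);
  }

  get(name: string): MiniTool | undefined {
    return this.tools.get(name);
  }

  // Swaps the map entry; the previously registered object is left untouched.
  replace(tool: MiniTool): void {
    if (!this.tools.has(tool.name)) {
      throw new Error(`Tool '${tool.name}' is not registered`);
    }
    this.tools.set(tool.name, tool);
  }

  // Shallow clone: a new Map whose entries point at the same tool objects.
  clone(): MiniRegistry {
    const cloned = new MiniRegistry();
    this.tools.forEach((tool) => cloned.register(tool));
    return cloned;
  }
}

// One shared registry for the daemon, one clone for a sandboxed session.
const shared = new MiniRegistry();
shared.register({ name: 'shell.exec', description: 'host' });

const perSession = shared.clone();
perSession.replace({ name: 'shell.exec', description: 'sandboxed' });
```

Because `replace()` swaps the map entry instead of mutating the existing object, the shared registry keeps its host tool. The caveat of a shallow clone is that mutating a tool object in place (for example assigning to its `description`) would be visible through both registries.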
**Step 4: Create the sandbox barrel export**

Create file: `src/sandbox/index.ts`

```typescript
export { DockerSandbox, type DockerSandboxConfig, type ExecOptions, type ExecResult } from './docker.js';
export { SandboxManager } from './manager.js';
export { createSandboxedShellTool, createSandboxedProcessStartTool } from './tools.js';
```

**Step 5: Run tests to verify they pass**

Run: `pnpm vitest run src/tools/registry.test.ts`
Expected: PASS

**Step 6: Run full test suite**

Run: `pnpm test:run`
Expected: All tests pass

**Step 7: Commit**

```bash
git add src/sandbox/index.ts src/tools/registry.ts src/tools/registry.test.ts
git commit -m "feat: add ToolRegistry.clone() and replace() for per-session registries"
```

---

## Task 6: Agent Config Registry

**Files:**
- Create: `src/agents/registry.ts`
- Create: `src/agents/registry.test.ts`

**Step 1: Write the failing test**

Create file: `src/agents/registry.test.ts`

```typescript
import { describe, it, expect } from 'vitest';
import { AgentConfigRegistry, type AgentConfig } from './registry.js';

describe('AgentConfigRegistry', () => {
  describe('register()', () => {
    it('registers a named agent config', () => {
      const registry = new AgentConfigRegistry();
      const config: AgentConfig = { name: 'assistant', systemPrompt: 'Be helpful.' };
      registry.register(config);

      expect(registry.get('assistant')).toEqual(config);
    });

    it('throws on duplicate name', () => {
      const registry = new AgentConfigRegistry();
      registry.register({ name: 'assistant' });
      expect(() => registry.register({ name: 'assistant' })).toThrow('already registered');
    });
  });

  describe('get()', () => {
    it('returns undefined for unknown name', () => {
      const registry = new AgentConfigRegistry();
      expect(registry.get('nonexistent')).toBeUndefined();
    });
  });

  describe('list()', () => {
    it('returns all registered configs', () => {
      const registry = new AgentConfigRegistry();
      registry.register({ name: 'a' });
      registry.register({ name: 'b' });
      expect(registry.list().map(c => c.name).sort()).toEqual(['a', 'b']);
    });
  });

  describe('loadFromConfig()', () => {
    it('loads configs from a raw config object', () => {
      const registry = new AgentConfigRegistry();
      registry.loadFromConfig({
        assistant: {
          system_prompt: 'Be helpful.',
          model_tier: 'default',
          tool_profile: 'messaging',
          sandbox: false,
        },
        coder: {
          model_tier: 'complex',
          tool_profile: 'coding',
          sandbox: true,
        },
      });

      expect(registry.list()).toHaveLength(2);
      const assistant = registry.get('assistant')!;
      expect(assistant.systemPrompt).toBe('Be helpful.');
      expect(assistant.modelTier).toBe('default');
      expect(assistant.toolProfile).toBe('messaging');

      const coder = registry.get('coder')!;
      expect(coder.sandbox).toBe(true);
    });
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/agents/registry.test.ts`
Expected: FAIL, cannot find module `./registry.js`

**Step 3: Implement AgentConfigRegistry**

Create file: `src/agents/registry.ts`

```typescript
import type { ToolProfile, ToolOverrideConfig } from '../config/schema.js';
import type { ModelTier } from '../models/router.js';

export interface AgentConfig {
  name: string;
  systemPrompt?: string;
  modelTier?: ModelTier;
  toolProfile?: ToolProfile;
  toolOverrides?: ToolOverrideConfig;
  sandbox?: boolean;
}

/**
 * AgentConfigRegistry stores named agent configurations.
 * Loaded from YAML config at startup.
 */
export class AgentConfigRegistry {
  private configs = new Map<string, AgentConfig>();

  register(config: AgentConfig): void {
    if (this.configs.has(config.name)) {
      throw new Error(`Agent config '${config.name}' is already registered`);
    }
    this.configs.set(config.name, config);
  }

  get(name: string): AgentConfig | undefined {
    return this.configs.get(name);
  }

  list(): AgentConfig[] {
    return Array.from(this.configs.values());
  }

  /**
   * Load agent configs from the parsed YAML config.
   * Maps from the config schema format to the internal AgentConfig format.
   */
  loadFromConfig(rawConfigs: Record<string, {
    system_prompt?: string;
    model_tier?: string;
    tool_profile?: string;
    tool_overrides?: ToolOverrideConfig;
    sandbox?: boolean;
  }>): void {
    for (const [name, raw] of Object.entries(rawConfigs)) {
      this.register({
        name,
        systemPrompt: raw.system_prompt,
        modelTier: raw.model_tier as ModelTier | undefined,
        toolProfile: raw.tool_profile as ToolProfile | undefined,
        toolOverrides: raw.tool_overrides,
        sandbox: raw.sandbox,
      });
    }
  }
}
```

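`loadFromConfig()` casts `model_tier` and `tool_profile` from plain strings without checking them, which is safe only if the input has already passed through `configSchema`. If the method might ever be called with unvalidated data, a small guard like the one below could replace the cast. This is a sketch under assumptions: `parseModelTier` is a hypothetical helper, and the tier names are only the values that appear in this plan; the real `ModelTier` union in `src/models/router.ts` may differ.

```typescript
// Assumed tier names (taken from values used in this plan, not from the real ModelTier type).
const MODEL_TIERS = ['default', 'fast', 'complex'] as const;
type AssumedModelTier = (typeof MODEL_TIERS)[number];

// Hypothetical helper: narrows a raw string to a known tier or throws a descriptive error.
function parseModelTier(value: string | undefined, agentName: string): AssumedModelTier | undefined {
  if (value === undefined) {
    return undefined; // the field is optional, so "not set" is a valid result
  }
  const known = MODEL_TIERS as readonly string[];
  if (known.indexOf(value) !== -1) {
    return value as AssumedModelTier;
  }
  throw new Error(`Agent config '${agentName}': unknown model_tier '${value}'`);
}
```

Failing at load time, with the agent name in the message, makes a typo in the YAML easier to trace than an unexpected tier surfacing later in the model router.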
**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/agents/registry.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/agents/registry.ts src/agents/registry.test.ts
git commit -m "feat: add AgentConfigRegistry for named agent configurations"
```

---

## Task 7: Agent Router

**Files:**
- Create: `src/agents/router.ts`
- Create: `src/agents/router.test.ts`

**Step 1: Write the failing test**

Create file: `src/agents/router.test.ts`

```typescript
import { describe, it, expect } from 'vitest';
import { AgentRouter, type RoutingConfig } from './router.js';

describe('AgentRouter', () => {
  describe('resolve()', () => {
    it('returns default_agent when no specific match', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: {},
        senders: {},
      });
      expect(router.resolve('telegram', '12345')).toBe('assistant');
    });

    it('returns undefined when no default and no match', () => {
      const router = new AgentRouter({
        channels: {},
        senders: {},
      });
      expect(router.resolve('telegram', '12345')).toBeUndefined();
    });

    it('matches exact sender', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: {},
        senders: { 'telegram:12345': 'coder' },
      });
      expect(router.resolve('telegram', '12345')).toBe('coder');
    });

    it('matches sender with glob pattern', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: {},
        senders: { 'slack:U0*': 'coder' },
      });
      expect(router.resolve('slack', 'U0ABC')).toBe('coder');
      // No sender or channel match, so this falls back to default_agent
      expect(router.resolve('slack', 'U1ABC')).toBe('assistant');
    });

    it('matches channel when no sender match', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: { discord: 'coder' },
        senders: {},
      });
      expect(router.resolve('discord', 'any-user')).toBe('coder');
    });

    it('sender match takes priority over channel match', () => {
      const router = new AgentRouter({
        default_agent: 'assistant',
        channels: { discord: 'coder' },
        senders: { 'discord:special-user': 'vip' },
      });
      expect(router.resolve('discord', 'special-user')).toBe('vip');
      expect(router.resolve('discord', 'normal-user')).toBe('coder');
    });

    it('falls through: sender → channel → default', () => {
      const router = new AgentRouter({
        default_agent: 'fallback',
        channels: { discord: 'guild-agent' },
        senders: { 'discord:admin': 'admin-agent' },
      });
      expect(router.resolve('discord', 'admin')).toBe('admin-agent');
      expect(router.resolve('discord', 'regular')).toBe('guild-agent');
      expect(router.resolve('telegram', 'someone')).toBe('fallback');
    });
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/agents/router.test.ts`
Expected: FAIL, cannot find module `./router.js`

**Step 3: Implement AgentRouter**

Create file: `src/agents/router.ts`

```typescript
/**
 * AgentRouter resolves which agent config to use for a given channel+sender.
 *
 * Resolution order:
 * 1. Exact sender match (channel:senderId)
 * 2. Glob pattern sender match
 * 3. Channel match
 * 4. default_agent fallback
 */

export interface RoutingConfig {
  default_agent?: string;
  channels: Record<string, string>;
  senders: Record<string, string>;
}

/**
 * Convert a simple glob pattern to regex.
 * Supports `*` (any run of characters); every other regex metacharacter,
 * including `.` and `?`, is escaped and matched literally.
 */
function patternToRegex(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`);
}

export class AgentRouter {
  private config: RoutingConfig;

  constructor(config: RoutingConfig) {
    this.config = config;
  }

  /**
   * Resolve the agent config name for a channel + sender pair.
   * Returns undefined if no match and no default.
   */
  resolve(channel: string, senderId: string): string | undefined {
    const senderKey = `${channel}:${senderId}`;

    // 1. Exact sender match
    if (this.config.senders[senderKey]) {
      return this.config.senders[senderKey];
    }

    // 2. Glob pattern sender match (first matching pattern in config order wins)
    for (const [pattern, agentName] of Object.entries(this.config.senders)) {
      if (pattern.includes('*') && patternToRegex(pattern).test(senderKey)) {
        return agentName;
      }
    }

    // 3. Channel match
    if (this.config.channels[channel]) {
      return this.config.channels[channel];
    }

    // 4. Default fallback
    return this.config.default_agent;
  }
}
```

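One consequence of the glob step is that when several patterns match the same sender, the first one in config order wins. The standalone sketch below illustrates this; `globToRegex`, `senders`, and `firstGlobMatch` are hypothetical names written for this example in the spirit of `patternToRegex`, not part of the planned module.

```typescript
// Escape regex metacharacters, then turn each `*` into `.*` (same idea as patternToRegex).
function globToRegex(pattern: string): RegExp {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*');
  return new RegExp(`^${escaped}$`);
}

// Hypothetical sender table; keys are `channel:senderId` globs.
const senders: Record<string, string> = {
  'slack:U0*': 'coder',    // specific pattern listed first
  'slack:*': 'assistant',  // broader catch-all listed second
};

// Returns the agent for the first glob that matches, in declaration order.
function firstGlobMatch(senderKey: string): string | undefined {
  for (const pattern of Object.keys(senders)) {
    if (globToRegex(pattern).test(senderKey)) {
      return senders[pattern];
    }
  }
  return undefined;
}
```

With this table, `slack:U0ABC` resolves to `coder` and `slack:U1ABC` to `assistant`. Swapping the two entries would send every Slack user to `assistant`, so more specific patterns should be listed before broader ones in the `senders` config.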
**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/agents/router.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/agents/router.ts src/agents/router.test.ts
git commit -m "feat: add AgentRouter for config-based sender/channel routing"
```

---

## Task 8: Agents Barrel Export

**Files:**
- Create: `src/agents/index.ts`

**Step 1: Create the barrel file**

Create file: `src/agents/index.ts`

```typescript
export { AgentConfigRegistry, type AgentConfig } from './registry.js';
export { AgentRouter, type RoutingConfig } from './router.js';
```

**Step 2: Verify build**

Run: `pnpm typecheck`
Expected: No errors

**Step 3: Commit**

```bash
git add src/agents/index.ts
git commit -m "feat: add agents barrel export"
```

---

## Task 9: Wire Everything Into the Daemon

**Files:**
- Modify: `src/daemon/index.ts`

This is the integration task. The daemon's `createMessageRouter()` needs to use the `AgentRouter` and `SandboxManager`.

**Step 1: Write the integration test**

Create file: `src/daemon/routing.test.ts`

```typescript
import { describe, it, expect } from 'vitest';
import { AgentRouter } from '../agents/router.js';
import { AgentConfigRegistry } from '../agents/registry.js';

describe('daemon agent routing integration', () => {
  it('resolves agent config for channel messages', () => {
    const registry = new AgentConfigRegistry();
    registry.loadFromConfig({
      assistant: { system_prompt: 'Be helpful.', model_tier: 'default', tool_profile: 'messaging', sandbox: false },
      coder: { system_prompt: 'Write code.', model_tier: 'complex', tool_profile: 'coding', sandbox: true },
    });

    const router = new AgentRouter({
      default_agent: 'assistant',
      channels: { discord: 'coder' },
      senders: { 'telegram:admin': 'coder' },
    });

    // Discord user gets coder
    const discordAgent = router.resolve('discord', 'user123');
    expect(discordAgent).toBe('coder');
    expect(registry.get(discordAgent!)!.systemPrompt).toBe('Write code.');

    // Telegram admin gets coder
    const telegramAdmin = router.resolve('telegram', 'admin');
    expect(telegramAdmin).toBe('coder');

    // Random telegram user gets assistant
    const telegramUser = router.resolve('telegram', 'random');
    expect(telegramUser).toBe('assistant');
    expect(registry.get(telegramUser!)!.systemPrompt).toBe('Be helpful.');
  });

  it('returns undefined when no routing is configured', () => {
    const router = new AgentRouter({ channels: {}, senders: {} });
    expect(router.resolve('telegram', '123')).toBeUndefined();
  });
});
```

**Step 2: Run test to verify it passes**

Run: `pnpm vitest run src/daemon/routing.test.ts`
Expected: PASS (this test exercises the already-built registry and router together)

**Step 3: Modify daemon/index.ts**

Add imports at the top of `src/daemon/index.ts` (after existing imports):

```typescript
import { AgentConfigRegistry, AgentRouter } from '../agents/index.js';
import { SandboxManager, createSandboxedShellTool, createSandboxedProcessStartTool } from '../sandbox/index.js';
```

Add to `DaemonContext` interface:

```typescript
  agentConfigRegistry: AgentConfigRegistry;
  agentRouter: AgentRouter;
  sandboxManager?: SandboxManager;
```

Modify `createMessageRouter()` to accept additional dependencies:

```typescript
function createMessageRouter(deps: {
  sessionManager: SessionManager;
  modelRouter: ModelRouter;
  systemPrompt: string;
  toolRegistry: ToolRegistry;
  toolExecutor: ToolExecutor;
  config: Config;
  memoryStore?: MemoryStore;
  agentConfigRegistry?: AgentConfigRegistry;
  agentRouter?: AgentRouter;
  sandboxManager?: SandboxManager;
}) {
```

Inside `getOrCreateAgent()`, resolve the agent config and create sandboxed registries:

```typescript
  function getOrCreateAgent(channel: string, senderId: string): AgentOrchestrator {
    // Resolve agent config name from routing
    const agentConfigName = deps.agentRouter?.resolve(channel, senderId);
    const agentConfig = agentConfigName ? deps.agentConfigRegistry?.get(agentConfigName) : undefined;

    const cacheKey = agentConfigName
      ? `${channel}:${senderId}:${agentConfigName}`
      : `${channel}:${senderId}`;

    let agent = agents.get(cacheKey);
    if (!agent) {
      const session = deps.sessionManager.getSession(channel, senderId);

      // Determine system prompt: agent config overrides global
      const systemPrompt = agentConfig?.systemPrompt ?? deps.systemPrompt;

      // Determine primary tier
      const primaryTier = agentConfig?.modelTier ?? deps.config.agents.primary_tier ?? 'default';

      // Determine tool policy context
      const toolPolicyContext: ToolPolicyContext = {
        agent: primaryTier,
        provider: deps.config.models.default.provider,
      };

      // Determine tool registry: sandbox if configured
      let toolRegistry = deps.toolRegistry;
      if (agentConfig?.sandbox && deps.sandboxManager && deps.config.sandbox.enabled) {
        // Create a cloned registry with sandboxed tools
        toolRegistry = deps.toolRegistry.clone();
        // The container is created lazily on the first tool call, so each
        // replacement tool resolves the sandbox at execution time.
        const sessionId = `${channel}:${senderId}`;
        const sandbox = deps.sandboxManager;

        // Replace shell.exec and process.start with lazy-sandboxed versions
        const lazySandboxedShell: Tool = {
          name: 'shell.exec',
          description: 'Execute a shell command inside a sandboxed container.',
          inputSchema: {
            type: 'object',
            properties: {
              command: { type: 'string', description: 'The shell command to execute' },
              cwd: { type: 'string', description: 'Working directory (optional)' },
              timeout: { type: 'number', description: 'Timeout in milliseconds (default 30000)' },
            },
            required: ['command'],
          },
          execute: async (rawArgs: unknown) => {
            const dockerSandbox = await sandbox.getOrCreate(sessionId);
            const tool = createSandboxedShellTool(dockerSandbox);
            return tool.execute(rawArgs);
          },
        };

        const lazySandboxedProcessStart: Tool = {
          name: 'process.start',
          description: 'Start a command in the background inside a sandboxed container.',
          inputSchema: {
            type: 'object',
            properties: {
              command: { type: 'string', description: 'The shell command to run' },
              cwd: { type: 'string', description: 'Working directory (optional)' },
            },
            required: ['command'],
          },
          execute: async (rawArgs: unknown) => {
            const dockerSandbox = await sandbox.getOrCreate(sessionId);
            const tool = createSandboxedProcessStartTool(dockerSandbox);
            return tool.execute(rawArgs);
          },
        };

        toolRegistry.replace(lazySandboxedShell);
        toolRegistry.replace(lazySandboxedProcessStart);
      }

      const delegationConfig: DelegationConfig = {
        compaction: deps.config.agents.delegation.compaction ?? 'fast',
        memory_extraction: deps.config.agents.delegation.memory_extraction ?? 'fast',
        classification: deps.config.agents.delegation.classification ?? 'fast',
        tool_summarisation: deps.config.agents.delegation.tool_summarisation ?? 'fast',
        complex_reasoning: deps.config.agents.delegation.complex_reasoning ?? 'complex',
      };

      agent = new AgentOrchestrator({
        modelRouter: deps.modelRouter,
        systemPrompt,
        session,
        toolRegistry,
        toolExecutor: deps.toolExecutor,
        primaryTier,
        delegation: delegationConfig,
        maxDelegationDepth: deps.config.agents.max_delegation_depth ?? 3,
        compaction: deps.config.compaction.enabled ? {
          thresholdPct: deps.config.compaction.threshold_pct,
          keepTurns: deps.config.compaction.keep_turns,
          summaryMaxTokens: deps.config.compaction.summary_max_tokens,
        } : undefined,
        modelName: deps.config.models.default.model,
        contextWindow: deps.config.models.default.context_window,
        memoryStore: deps.memoryStore,
        toolPolicyContext,
      });
      agents.set(cacheKey, agent);
    }
    return agent;
  }
```

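The two lazy wrappers in `getOrCreateAgent()` differ only in their metadata and in which factory they call, so the shared shape could be pulled into a small helper. The sketch below is one possible way to do that, not part of the plan: `lazyTool`, `SketchTool`, and `SketchToolResult` are hypothetical names, and the types are simplified stand-ins for the project's real `Tool` type.

```typescript
// Simplified stand-ins for the project's tool types (assumed shapes, illustration only).
interface SketchToolResult {
  success: boolean;
  output: string;
}

interface SketchTool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  execute: (args: unknown) => Promise<SketchToolResult>;
}

// Hypothetical helper: exposes static metadata up front, but resolves the real
// tool on every call so the caller decides when the sandbox is created or reused.
function lazyTool(
  meta: Pick<SketchTool, 'name' | 'description' | 'inputSchema'>,
  resolve: () => Promise<SketchTool>,
): SketchTool {
  return {
    ...meta,
    execute: async (args: unknown) => {
      const tool = await resolve();
      return tool.execute(args);
    },
  };
}

// Usage with a fake resolver standing in for
// `createSandboxedShellTool(await sandbox.getOrCreate(sessionId))`.
let resolveCount = 0;
const lazyShell = lazyTool(
  {
    name: 'shell.exec',
    description: 'Execute a shell command inside a sandboxed container.',
    inputSchema: { type: 'object', properties: {}, required: ['command'] },
  },
  async () => {
    resolveCount += 1;
    return {
      name: 'shell.exec',
      description: 'real tool',
      inputSchema: {},
      execute: async (args: unknown) => ({ success: true, output: `ran ${JSON.stringify(args)}` }),
    };
  },
);
```

Nothing is resolved until `execute` is first called, which matches the lazy container creation described above. The helper deliberately does not cache the resolved tool: calling the resolver each time leaves container lifecycle decisions (reuse, recreation after cleanup) with `SandboxManager.getOrCreate()`.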
In `startDaemon()`, add agent config registry and router initialization after skills loading (around line 385):

```typescript
  // Initialize agent config registry and router
  const agentConfigRegistry = new AgentConfigRegistry();
  if (config.agent_configs && Object.keys(config.agent_configs).length > 0) {
    agentConfigRegistry.loadFromConfig(config.agent_configs);
    console.log(`Loaded ${Object.keys(config.agent_configs).length} agent config(s): ${Object.keys(config.agent_configs).join(', ')}`);
  }

  const agentRouter = new AgentRouter(config.routing);

  // Initialize sandbox manager if enabled
  let sandboxManager: SandboxManager | undefined;
  if (config.sandbox.enabled) {
    const dockerAvailable = await DockerSandbox.isAvailable();
    if (dockerAvailable) {
      sandboxManager = new SandboxManager(config.sandbox);
      console.log(`Docker sandbox enabled: image=${config.sandbox.image}, network=${config.sandbox.network}`);
    } else {
      console.warn('Docker sandbox enabled in config but Docker is not available; falling back to host execution');
    }
  }
```

Add sandbox shutdown hook:

```typescript
  if (sandboxManager) {
    lifecycle.onShutdown(async () => {
      await sandboxManager!.destroyAll();
      console.log('Docker sandboxes destroyed');
    });
  }
```

Pass new deps to `createMessageRouter()`:

```typescript
  channelRegistry.setMessageHandler(createMessageRouter({
    sessionManager,
    modelRouter,
    systemPrompt,
    toolRegistry,
    toolExecutor,
    config,
    memoryStore,
    agentConfigRegistry,
    agentRouter,
    sandboxManager,
  }));
```

Add to DaemonContext return:

```typescript
  return {
    config,
    lifecycle,
    sessionStore,
    sessionManager,
    hookEngine,
    modelRouter,
    toolRegistry,
    toolExecutor,
    gateway,
    channelRegistry,
    mcpManager,
    skillRegistry,
    skillInstaller,
    agentConfigRegistry,
    agentRouter,
    sandboxManager,
  };
```

Note: You'll also need to import `DockerSandbox`, the `Tool` type, and `ToolPolicyContext` at the top of the file:

```typescript
import { DockerSandbox } from '../sandbox/index.js';
import type { Tool } from '../tools/types.js';
import type { ToolPolicyContext } from '../tools/policy.js';
```

**Step 4: Run full test suite**

Run: `pnpm test:run`
Expected: All tests pass

**Step 5: Run typecheck**

Run: `pnpm typecheck`
Expected: No errors

**Step 6: Commit**

```bash
git add src/daemon/index.ts src/daemon/routing.test.ts
git commit -m "feat: wire Docker sandboxing and agent routing into daemon"
```

---

## Task 10: Update state.json + Final Verification

**Files:**
- Modify: `docs/plans/state.json`

**Step 1: Run full test suite and typecheck**

Run: `pnpm test:run && pnpm typecheck`
Expected: All tests pass, no type errors

**Step 2: Update state.json**

Add the new P2 entries to `docs/plans/state.json` under the `p2-implementation` plan's `phases` object:

```json
"docker_sandboxing": {
  "priority": "P2",
  "status": "completed",
  "description": "Docker container sandboxing for channel tool execution (shell.exec, process.start)",
  "files_created": [
    "src/sandbox/docker.ts",
    "src/sandbox/docker.test.ts",
    "src/sandbox/manager.ts",
    "src/sandbox/manager.test.ts",
    "src/sandbox/tools.ts",
    "src/sandbox/tools.test.ts",
    "src/sandbox/index.ts"
  ],
  "files_modified": [
    "src/config/schema.ts",
    "src/config/index.ts",
    "src/tools/registry.ts",
    "src/daemon/index.ts"
  ],
  "test_status": "N/N passing"
},
"multi_agent_routing": {
  "priority": "P2",
  "status": "completed",
  "description": "Named agent configs with config-based channel/sender routing",
  "files_created": [
    "src/agents/registry.ts",
    "src/agents/registry.test.ts",
    "src/agents/router.ts",
    "src/agents/router.test.ts",
    "src/agents/index.ts",
    "src/daemon/routing.test.ts",
    "src/config/schema.test.ts"
  ],
  "files_modified": [
    "src/config/schema.ts",
    "src/config/index.ts",
    "src/daemon/index.ts"
  ],
  "test_status": "N/N passing"
}
```

Update `overall_progress.p2_completion` to `"7/7 (100%)"` and `next_up` to `"p3 (group chat, gateway auth, gemini provider, browser control, additional providers)"`.

Update `overall_progress.total_test_count` with the actual count, and replace the `N/N passing` placeholders with the real numbers from Step 1.

**Step 3: Commit**

```bash
git add docs/plans/state.json
git commit -m "docs: update state.json with Docker sandbox and multi-agent routing"
```

---

## Summary

| Task | Component | Est. Time |
|------|-----------|-----------|
| 1 | Config schemas (sandbox + agent_configs + routing) | 5 min |
| 2 | DockerSandbox class | 5 min |
| 3 | SandboxManager | 3 min |
| 4 | Sandboxed tool wrappers | 5 min |
| 5 | Barrel export + ToolRegistry.clone() | 3 min |
| 6 | AgentConfigRegistry | 3 min |
| 7 | AgentRouter | 3 min |
| 8 | Agents barrel export | 1 min |
| 9 | Daemon integration | 10 min |
| 10 | State update + verification | 3 min |

**Total estimated: ~40 minutes**