flynn/docs/plans/2026-02-05-phase1-tool-framework.md
William Valentin aa95f2132c feat: add channel adapter abstraction with Telegram and WebChat adapters
Implement Phase 3 channel adapters that decouple message sources from
the agent via a uniform ChannelAdapter interface and ChannelRegistry.

- Add ChannelAdapter/InboundMessage/OutboundMessage types
- Add ChannelRegistry for adapter lifecycle and message routing
- Add TelegramAdapter (grammy bot, auth middleware, confirmations, chunking)
- Add WebChatAdapter (thin shim over GatewayServer)
- Refactor daemon to use ChannelRegistry with per-channel-per-user agents
- Add config.get/config.patch gateway handlers (Phase 2 loose end)
- Add system.restart gateway handler (Phase 2 loose end)
- Add implementation plans and design docs

Tests: 225 passing (33 new channel adapter + gateway handler tests)
2026-02-05 20:00:36 -08:00


# Phase 1: Agent Tool Framework + Agent Loop — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Add a tool execution framework with native function calling (Anthropic/OpenAI) and an iterative agent loop so Flynn can run shell commands, read/write/edit files, fetch web pages, and chain multiple tool calls per turn.

**Architecture:** Tools are defined as typed objects with JSON Schema inputs and an async `execute` method. A ToolRegistry collects them and serializes to provider-specific formats. A ToolExecutor wraps execution with hook checks, timeouts, and output truncation. The NativeAgent gains an agentic loop: call model -> if tool_use, execute tools -> feed results back -> repeat until text response or max iterations.

**Tech Stack:** TypeScript (strict, NodeNext), Vitest, Anthropic SDK `@anthropic-ai/sdk`, OpenAI SDK `openai`, Node.js `child_process` for shell, `fs` for file ops, `fetch` for web.

**Build model policy:** Opus 4.6 supervises and reviews. Sonnet/Haiku via GitHub Copilot execute implementation tasks as subagents. Each task is dispatched to a subagent and reviewed by Opus before committing.
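
The loop described under **Architecture** can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the NativeAgent implementation: `ModelTurn`, `HistoryEntry`, `runAgentLoop`, `stubModel`, and `stubTool` are hypothetical names, and the model is a stub rather than a real provider call.

```typescript
// Hypothetical shapes for illustration only; the real types are defined in Task 1.
interface ToolCall { id: string; name: string; args: unknown }
interface ToolResult { success: boolean; output: string; error?: string }

// A model turn either ends with text or requests one or more tool calls.
type ModelTurn =
  | { kind: 'text'; text: string }
  | { kind: 'tool_use'; calls: ToolCall[] };

type HistoryEntry =
  | { role: 'user'; text: string }
  | { role: 'assistant'; calls: ToolCall[] }
  | { role: 'tool'; callId: string; result: ToolResult };

async function runAgentLoop(
  prompt: string,
  callModel: (history: HistoryEntry[]) => Promise<ModelTurn>,
  runTool: (call: ToolCall) => Promise<ToolResult>,
  maxIterations = 10,
): Promise<string> {
  const history: HistoryEntry[] = [{ role: 'user', text: prompt }];
  for (let i = 0; i < maxIterations; i++) {
    const turn = await callModel(history);
    if (turn.kind === 'text') return turn.text; // a text response ends the loop
    history.push({ role: 'assistant', calls: turn.calls });
    for (const call of turn.calls) {
      // Feed each tool result back so the next model call can see it
      history.push({ role: 'tool', callId: call.id, result: await runTool(call) });
    }
  }
  return '[stopped: max iterations reached]';
}

// Stub model: requests one tool call, then answers using the tool output.
const stubModel = async (history: HistoryEntry[]): Promise<ModelTurn> => {
  const last = history[history.length - 1];
  if (last !== undefined && last.role === 'tool') {
    return { kind: 'text', text: `Tool said: ${last.result.output}` };
  }
  return { kind: 'tool_use', calls: [{ id: 'call_1', name: 'test.echo', args: { text: 'hi' } }] };
};

// Stub tool: echoes the `text` argument.
const stubTool = async (call: ToolCall): Promise<ToolResult> => ({
  success: true,
  output: String((call.args as { text: string }).text),
});
```

The iteration cap is what keeps a model that never stops requesting tools from looping forever; the sentinel string returned in that case is an arbitrary choice for this sketch.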
---
## Task 0: SOUL.md + System Prompt Foundation
**Files:**
- Create: `SOUL.md` (project root)
- Modify: `src/daemon/index.ts` (load SOUL.md into system prompt)
**Step 1: Create SOUL.md**
Already created at project root. Defines Flynn's identity: direct, technical, opinionated, security-conscious. Loaded into every session.
**Step 2: Update daemon to load SOUL.md**
In `src/daemon/index.ts`, replace the hardcoded `SYSTEM_PROMPT` string with a loader that reads `SOUL.md` from the workspace root and prepends it to the system prompt:
```typescript
import { readFileSync, existsSync } from 'fs';
import { resolve } from 'path';

function loadSystemPrompt(): string {
  const soulPath = resolve(process.cwd(), 'SOUL.md');
  let soul = '';
  if (existsSync(soulPath)) {
    soul = readFileSync(soulPath, 'utf-8') + '\n\n';
  }
  return soul + TOOL_INSTRUCTIONS;
}
```
Here, `TOOL_INSTRUCTIONS` is the tool-aware portion of the prompt, added in Task 17.
**Step 3: Commit**
```bash
git add SOUL.md src/daemon/index.ts
git commit -m "feat: add SOUL.md identity file and load into system prompt"
```
---
## Task 1: Tool Type Definitions
**Files:**
- Create: `src/tools/types.ts`
- Test: `src/tools/types.test.ts`
**Step 1: Write the failing test**
```typescript
// src/tools/types.test.ts
import { describe, it, expect } from 'vitest';
import type { Tool, ToolCall, ToolResult, ToolUseMessage, ToolResultMessage } from './types.js';
describe('Tool types', () => {
it('Tool interface is structurally correct', () => {
const tool: Tool = {
name: 'test.echo',
description: 'Echoes input',
inputSchema: {
type: 'object',
properties: { text: { type: 'string' } },
required: ['text'],
},
execute: async (args) => ({ success: true, output: String((args as { text: string }).text) }),
};
expect(tool.name).toBe('test.echo');
expect(tool.inputSchema.type).toBe('object');
});
it('ToolCall has required fields', () => {
const call: ToolCall = { id: 'call_1', name: 'test.echo', args: { text: 'hi' } };
expect(call.id).toBe('call_1');
expect(call.name).toBe('test.echo');
});
it('ToolResult has success and output', () => {
const result: ToolResult = { success: true, output: 'hello' };
expect(result.success).toBe(true);
const errResult: ToolResult = { success: false, output: '', error: 'boom' };
expect(errResult.error).toBe('boom');
});
it('ToolUseMessage has correct shape', () => {
const msg: ToolUseMessage = {
role: 'assistant',
content: [{ type: 'tool_use', id: 'call_1', name: 'test.echo', input: { text: 'hi' } }],
};
expect(msg.role).toBe('assistant');
expect(msg.content[0].type).toBe('tool_use');
});
it('ToolResultMessage has correct shape', () => {
const msg: ToolResultMessage = {
role: 'user',
content: [{ type: 'tool_result', tool_use_id: 'call_1', content: 'output here' }],
};
expect(msg.role).toBe('user');
expect(msg.content[0].type).toBe('tool_result');
});
});
```
**Step 2: Run test to verify it fails**
Run: `pnpm vitest run src/tools/types.test.ts`
Expected: FAIL — module `./types.js` not found
**Step 3: Write the implementation**
```typescript
// src/tools/types.ts
export interface ToolInputSchema {
type: 'object';
properties: Record<string, unknown>;
required?: string[];
}
export interface Tool {
name: string;
description: string;
inputSchema: ToolInputSchema;
execute(args: unknown): Promise<ToolResult>;
}
export interface ToolCall {
id: string;
name: string;
args: unknown;
}
export interface ToolResult {
success: boolean;
output: string;
error?: string;
}
// Content block for assistant messages containing tool calls
export interface ToolUseBlock {
type: 'tool_use';
id: string;
name: string;
input: unknown;
}
// Content block for user messages returning tool results
export interface ToolResultBlock {
type: 'tool_result';
tool_use_id: string;
content: string;
is_error?: boolean;
}
// Message from assistant requesting tool use
export interface ToolUseMessage {
role: 'assistant';
content: ToolUseBlock[];
}
// Message from user returning tool results
export interface ToolResultMessage {
role: 'user';
content: ToolResultBlock[];
}
```
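As a rough illustration of how these types relate, a `tool_use` block can be mapped to a `ToolCall`, and the call's `ToolResult` mapped back into a `tool_result` block. The helper names `toToolCall` and `toResultBlock` are hypothetical, the interfaces are repeated so the snippet stands alone, and sending `error` as the content of a failed call is an assumption rather than something this plan specifies.

```typescript
interface ToolCall { id: string; name: string; args: unknown }
interface ToolResult { success: boolean; output: string; error?: string }
interface ToolUseBlock { type: 'tool_use'; id: string; name: string; input: unknown }
interface ToolResultBlock { type: 'tool_result'; tool_use_id: string; content: string; is_error?: boolean }

// Assistant-side block -> internal call representation
function toToolCall(block: ToolUseBlock): ToolCall {
  return { id: block.id, name: block.name, args: block.input };
}

// Internal result -> user-side block, linked back to the call by its id
function toResultBlock(call: ToolCall, result: ToolResult): ToolResultBlock {
  if (result.success) {
    return { type: 'tool_result', tool_use_id: call.id, content: result.output };
  }
  return {
    type: 'tool_result',
    tool_use_id: call.id,
    content: result.error ?? result.output,
    is_error: true,
  };
}
```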
**Step 4: Run test to verify it passes**
Run: `pnpm vitest run src/tools/types.test.ts`
Expected: PASS (all 5 tests)
**Step 5: Commit**
```bash
git add src/tools/types.ts src/tools/types.test.ts
git commit -m "feat(tools): add tool type definitions"
```
---
## Task 2: Tool Registry
**Files:**
- Create: `src/tools/registry.ts`
- Test: `src/tools/registry.test.ts`
**Step 1: Write the failing test**
```typescript
// src/tools/registry.test.ts
import { describe, it, expect } from 'vitest';
import { ToolRegistry } from './registry.js';
import type { Tool } from './types.js';
const echoTool: Tool = {
name: 'test.echo',
description: 'Echoes input back',
inputSchema: {
type: 'object',
properties: { text: { type: 'string', description: 'Text to echo' } },
required: ['text'],
},
execute: async (args) => ({ success: true, output: String((args as { text: string }).text) }),
};
const greetTool: Tool = {
name: 'test.greet',
description: 'Greets someone',
inputSchema: {
type: 'object',
properties: { name: { type: 'string' } },
required: ['name'],
},
execute: async (args) => ({ success: true, output: `Hello ${(args as { name: string }).name}` }),
};
describe('ToolRegistry', () => {
it('registers and retrieves tools by name', () => {
const registry = new ToolRegistry();
registry.register(echoTool);
expect(registry.get('test.echo')).toBe(echoTool);
expect(registry.get('nonexistent')).toBeUndefined();
});
it('lists all registered tools', () => {
const registry = new ToolRegistry();
registry.register(echoTool);
registry.register(greetTool);
const tools = registry.list();
expect(tools).toHaveLength(2);
expect(tools.map(t => t.name)).toContain('test.echo');
expect(tools.map(t => t.name)).toContain('test.greet');
});
it('throws on duplicate registration', () => {
const registry = new ToolRegistry();
registry.register(echoTool);
expect(() => registry.register(echoTool)).toThrow('already registered');
});
it('serializes to Anthropic format', () => {
const registry = new ToolRegistry();
registry.register(echoTool);
const anthropicTools = registry.toAnthropicFormat();
expect(anthropicTools).toEqual([{
name: 'test.echo',
description: 'Echoes input back',
input_schema: echoTool.inputSchema,
}]);
});
it('serializes to OpenAI format', () => {
const registry = new ToolRegistry();
registry.register(echoTool);
const openaiTools = registry.toOpenAIFormat();
expect(openaiTools).toEqual([{
type: 'function',
function: {
name: 'test.echo',
description: 'Echoes input back',
parameters: echoTool.inputSchema,
},
}]);
});
});
```
**Step 2: Run test to verify it fails**
Run: `pnpm vitest run src/tools/registry.test.ts`
Expected: FAIL — module `./registry.js` not found
**Step 3: Write the implementation**
```typescript
// src/tools/registry.ts
import type { Tool, ToolInputSchema } from './types.js';
export interface AnthropicToolDef {
name: string;
description: string;
input_schema: ToolInputSchema;
}
export interface OpenAIToolDef {
type: 'function';
function: {
name: string;
description: string;
parameters: ToolInputSchema;
};
}
export class ToolRegistry {
private tools: Map<string, Tool> = new Map();
register(tool: Tool): void {
if (this.tools.has(tool.name)) {
throw new Error(`Tool '${tool.name}' is already registered`);
}
this.tools.set(tool.name, tool);
}
get(name: string): Tool | undefined {
return this.tools.get(name);
}
list(): Tool[] {
return Array.from(this.tools.values());
}
toAnthropicFormat(): AnthropicToolDef[] {
return this.list().map(t => ({
name: t.name,
description: t.description,
input_schema: t.inputSchema,
}));
}
toOpenAIFormat(): OpenAIToolDef[] {
return this.list().map(t => ({
type: 'function' as const,
function: {
name: t.name,
description: t.description,
parameters: t.inputSchema,
},
}));
}
}
```
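One point to verify before sending these definitions to a provider: as far as the provider documentation describes, both Anthropic and OpenAI restrict tool names to letters, digits, underscores, and dashes (up to 64 characters), so dotted names such as `test.echo` may be rejected on the wire. A minimal sketch of one possible workaround, assuming a lookup table built at serialization time is acceptable; `WireNameMap` and its methods are hypothetical and not part of this plan.

```typescript
// Character set and length limit assumed from the provider documentation.
const WIRE_NAME = /^[a-zA-Z0-9_-]{1,64}$/;

class WireNameMap {
  private toInternal = new Map<string, string>();

  // Register an internal tool name and return a provider-safe name for it.
  add(internalName: string): string {
    const wire = internalName.replace(/[^a-zA-Z0-9_-]/g, '_');
    if (!WIRE_NAME.test(wire)) {
      throw new Error(`Cannot derive a valid wire name from '${internalName}'`);
    }
    const existing = this.toInternal.get(wire);
    if (existing !== undefined && existing !== internalName) {
      throw new Error(`Wire name '${wire}' collides: '${existing}' vs '${internalName}'`);
    }
    this.toInternal.set(wire, internalName);
    return wire;
  }

  // Resolve a name returned by the provider back to the internal tool name.
  resolve(wireName: string): string | undefined {
    return this.toInternal.get(wireName);
  }
}
```

Keeping an explicit table, rather than reversing the substitution, avoids ambiguity when an internal name already contains an underscore.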
**Step 4: Run test to verify it passes**
Run: `pnpm vitest run src/tools/registry.test.ts`
Expected: PASS (all 5 tests)
**Step 5: Commit**
```bash
git add src/tools/registry.ts src/tools/registry.test.ts
git commit -m "feat(tools): add ToolRegistry with provider serialization"
```
---
## Task 3: Tool Executor
**Files:**
- Create: `src/tools/executor.ts`
- Test: `src/tools/executor.test.ts`
**Step 1: Write the failing test**
```typescript
// src/tools/executor.test.ts
import { describe, it, expect, vi } from 'vitest';
import { ToolExecutor } from './executor.js';
import { ToolRegistry } from './registry.js';
import { HookEngine } from '../hooks/engine.js';
import type { Tool } from './types.js';
const echoTool: Tool = {
name: 'test.echo',
description: 'Echoes input',
inputSchema: { type: 'object', properties: { text: { type: 'string' } }, required: ['text'] },
execute: async (args) => ({ success: true, output: (args as { text: string }).text }),
};
const slowTool: Tool = {
name: 'test.slow',
description: 'Takes forever',
inputSchema: { type: 'object', properties: {} },
execute: async () => {
await new Promise(r => setTimeout(r, 5000));
return { success: true, output: 'done' };
},
};
const failTool: Tool = {
name: 'test.fail',
description: 'Throws',
inputSchema: { type: 'object', properties: {} },
execute: async () => { throw new Error('kaboom'); },
};
const bigOutputTool: Tool = {
name: 'test.big',
description: 'Returns huge output',
inputSchema: { type: 'object', properties: {} },
execute: async () => ({ success: true, output: 'x'.repeat(100_000) }),
};
describe('ToolExecutor', () => {
it('executes a tool and returns result', async () => {
const registry = new ToolRegistry();
registry.register(echoTool);
const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
const executor = new ToolExecutor(registry, hooks);
const result = await executor.execute('test.echo', { text: 'hello' });
expect(result.success).toBe(true);
expect(result.output).toBe('hello');
});
it('returns error for unknown tool', async () => {
const registry = new ToolRegistry();
const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
const executor = new ToolExecutor(registry, hooks);
const result = await executor.execute('nonexistent', {});
expect(result.success).toBe(false);
expect(result.error).toContain('not found');
});
it('catches tool execution errors', async () => {
const registry = new ToolRegistry();
registry.register(failTool);
const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
const executor = new ToolExecutor(registry, hooks);
const result = await executor.execute('test.fail', {});
expect(result.success).toBe(false);
expect(result.error).toContain('kaboom');
});
it('enforces timeout', async () => {
const registry = new ToolRegistry();
registry.register(slowTool);
const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
const executor = new ToolExecutor(registry, hooks, { defaultTimeoutMs: 100 });
const result = await executor.execute('test.slow', {});
expect(result.success).toBe(false);
expect(result.error).toContain('timed out');
});
it('truncates large output', async () => {
const registry = new ToolRegistry();
registry.register(bigOutputTool);
const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
const executor = new ToolExecutor(registry, hooks, { maxOutputBytes: 1000 });
const result = await executor.execute('test.big', {});
expect(result.success).toBe(true);
expect(result.output.length).toBeLessThanOrEqual(1100); // 1000 + truncation message
expect(result.output).toContain('[truncated]');
});
it('blocks on confirm hook and resolves when approved', async () => {
const registry = new ToolRegistry();
registry.register(echoTool);
const hooks = new HookEngine({ confirm: ['test.*'], log: [], silent: [] });
const executor = new ToolExecutor(registry, hooks);
// Start execution (will block on confirmation)
const resultPromise = executor.execute('test.echo', { text: 'hi' });
// Approve the pending confirmation
const pending = hooks.getPendingConfirmations();
expect(pending).toHaveLength(1);
hooks.resolveConfirmation(pending[0].id, { approved: true });
const result = await resultPromise;
expect(result.success).toBe(true);
expect(result.output).toBe('hi');
});
it('blocks on confirm hook and returns denied', async () => {
const registry = new ToolRegistry();
registry.register(echoTool);
const hooks = new HookEngine({ confirm: ['test.*'], log: [], silent: [] });
const executor = new ToolExecutor(registry, hooks);
const resultPromise = executor.execute('test.echo', { text: 'hi' });
const pending = hooks.getPendingConfirmations();
hooks.resolveConfirmation(pending[0].id, { approved: false, reason: 'nope' });
const result = await resultPromise;
expect(result.success).toBe(false);
expect(result.error).toContain('denied');
});
});
```
**Step 2: Run test to verify it fails**
Run: `pnpm vitest run src/tools/executor.test.ts`
Expected: FAIL — module `./executor.js` not found
**Step 3: Write the implementation**
```typescript
// src/tools/executor.ts
import type { ToolResult } from './types.js';
import type { ToolRegistry } from './registry.js';
import type { HookEngine } from '../hooks/engine.js';

export interface ToolExecutorConfig {
  defaultTimeoutMs?: number;
  // Output cap, compared against string length (UTF-16 code units), not encoded bytes
  maxOutputBytes?: number;
}

export class ToolExecutor {
  private registry: ToolRegistry;
  private hooks: HookEngine;
  private defaultTimeoutMs: number;
  private maxOutputBytes: number;

  constructor(registry: ToolRegistry, hooks: HookEngine, config?: ToolExecutorConfig) {
    this.registry = registry;
    this.hooks = hooks;
    this.defaultTimeoutMs = config?.defaultTimeoutMs ?? 30_000;
    this.maxOutputBytes = config?.maxOutputBytes ?? 51_200;
  }

  async execute(toolName: string, args: unknown): Promise<ToolResult> {
    const tool = this.registry.get(toolName);
    if (!tool) {
      return { success: false, output: '', error: `Tool '${toolName}' not found` };
    }

    // Ask the hook engine whether this tool needs user confirmation first
    const action = this.hooks.getAction(toolName);
    if (action === 'confirm') {
      const hookResult = await this.hooks.requestConfirmation(
        toolName,
        args as Record<string, unknown>,
      );
      if (!hookResult.approved) {
        return {
          success: false,
          output: '',
          error: `Tool '${toolName}' denied by user: ${hookResult.reason ?? 'no reason'}`,
        };
      }
    }

    // Execute with timeout. Promise.race only stops waiting; a timed-out tool keeps running.
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new Error(`Tool '${toolName}' timed out after ${this.defaultTimeoutMs}ms`)),
          this.defaultTimeoutMs,
        );
      });
      const result = await Promise.race([tool.execute(args), timeout]);

      // Truncate oversized output without mutating the tool's own result object
      if (result.output.length > this.maxOutputBytes) {
        return { ...result, output: result.output.slice(0, this.maxOutputBytes) + '\n[truncated]' };
      }
      return result;
    } catch (error) {
      return {
        success: false,
        output: '',
        error: error instanceof Error ? error.message : String(error),
      };
    } finally {
      // Clear the timer so a pending timeout cannot keep the process alive
      clearTimeout(timer);
    }
  }
}
```
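The `maxOutputBytes` check above compares string length, which counts UTF-16 code units rather than encoded bytes. If a true byte cap is wanted, one possible sketch follows. `truncateUtf8` is a hypothetical helper, not part of the planned executor, and it assumes UTF-8 is the encoding that matters.

```typescript
// Truncate `text` to at most `maxBytes` of UTF-8 without splitting a character.
function truncateUtf8(text: string, maxBytes: number, marker = '\n[truncated]'): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;

  let end = Math.max(0, maxBytes);
  // Back up while the first excluded byte is a UTF-8 continuation byte (10xxxxxx),
  // which would mean the cut falls in the middle of a multi-byte character.
  while (end > 0 && ((bytes[end] ?? 0) & 0xc0) === 0x80) end--;

  return new TextDecoder().decode(bytes.subarray(0, end)) + marker;
}
```

The marker is appended after the cap is applied, so the returned string can exceed `maxBytes` by the marker's length, mirroring the allowance the executor test makes for the truncation message.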
**Step 4: Run test to verify it passes**
Run: `pnpm vitest run src/tools/executor.test.ts`
Expected: PASS (all 7 tests)
**Step 5: Commit**
```bash
git add src/tools/executor.ts src/tools/executor.test.ts
git commit -m "feat(tools): add ToolExecutor with hooks, timeout, truncation"
```
---
## Task 4: Shell Exec Tool
**Files:**
- Create: `src/tools/builtin/shell.ts`
- Test: `src/tools/builtin/shell.test.ts`
**Step 1: Write the failing test**
```typescript
// src/tools/builtin/shell.test.ts
import { describe, it, expect } from 'vitest';
import { shellExecTool } from './shell.js';
import { tmpdir } from 'os';
import { mkdtempSync, writeFileSync, rmSync } from 'fs';
import { join } from 'path';
describe('shell.exec tool', () => {
it('has correct metadata', () => {
expect(shellExecTool.name).toBe('shell.exec');
expect(shellExecTool.inputSchema.required).toContain('command');
});
it('runs a simple command', async () => {
const result = await shellExecTool.execute({ command: 'echo hello' });
expect(result.success).toBe(true);
expect(result.output.trim()).toBe('hello');
});
it('captures stderr on failure', async () => {
const result = await shellExecTool.execute({ command: 'ls /nonexistent_dir_xyz' });
expect(result.success).toBe(false);
expect(result.error).toBeTruthy();
});
it('respects cwd parameter', async () => {
const dir = mkdtempSync(join(tmpdir(), 'flynn-test-'));
writeFileSync(join(dir, 'test.txt'), 'content');
try {
const result = await shellExecTool.execute({ command: 'ls test.txt', cwd: dir });
expect(result.success).toBe(true);
expect(result.output.trim()).toBe('test.txt');
} finally {
rmSync(dir, { recursive: true });
}
});
it('respects timeout parameter', async () => {
const result = await shellExecTool.execute({ command: 'sleep 10', timeout: 200 });
expect(result.success).toBe(false);
expect(result.error).toContain('timed out');
});
});
```
**Step 2: Run test to verify it fails**
Run: `pnpm vitest run src/tools/builtin/shell.test.ts`
Expected: FAIL — module `./shell.js` not found
**Step 3: Write the implementation**
```typescript
// src/tools/builtin/shell.ts
import { execFile } from 'child_process';
import type { Tool, ToolResult } from '../types.js';
interface ShellExecArgs {
command: string;
cwd?: string;
timeout?: number;
}
export const shellExecTool: Tool = {
name: 'shell.exec',
description: 'Execute a shell command and return stdout/stderr. Use for running build commands, git operations, system tasks, etc.',
inputSchema: {
type: 'object',
properties: {
command: { type: 'string', description: 'The shell command to execute' },
cwd: { type: 'string', description: 'Working directory (optional)' },
timeout: { type: 'number', description: 'Timeout in milliseconds (default 30000)' },
},
required: ['command'],
},
execute: async (rawArgs: unknown): Promise<ToolResult> => {
const args = rawArgs as ShellExecArgs;
const timeout = args.timeout ?? 30_000;
return new Promise((resolve) => {
execFile('bash', ['-c', args.command], {
cwd: args.cwd,
timeout,
maxBuffer: 1024 * 1024, // 1MB
}, (error, stdout, stderr) => {
if (error) {
if (error.killed || error.signal === 'SIGTERM') {
resolve({ success: false, output: stdout, error: `Command timed out after ${timeout}ms` });
return;
}
resolve({
success: false,
output: stdout,
error: stderr || error.message,
});
return;
}
resolve({ success: true, output: stdout + (stderr ? `\nstderr: ${stderr}` : '') });
});
});
},
};
```
**Step 4: Run test to verify it passes**
Run: `pnpm vitest run src/tools/builtin/shell.test.ts`
Expected: PASS (all 5 tests)
**Step 5: Commit**
```bash
git add src/tools/builtin/shell.ts src/tools/builtin/shell.test.ts
git commit -m "feat(tools): add shell.exec builtin tool"
```
---
## Task 5: File Tools (read, write, edit, list)
**Files:**
- Create: `src/tools/builtin/file-read.ts`
- Create: `src/tools/builtin/file-write.ts`
- Create: `src/tools/builtin/file-edit.ts`
- Create: `src/tools/builtin/file-list.ts`
- Test: `src/tools/builtin/file.test.ts` (all four in one test file)
**Step 1: Write the failing test**
```typescript
// src/tools/builtin/file.test.ts
import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { fileReadTool } from './file-read.js';
import { fileWriteTool } from './file-write.js';
import { fileEditTool } from './file-edit.js';
import { fileListTool } from './file-list.js';
import { mkdtempSync, writeFileSync, readFileSync, rmSync, mkdirSync } from 'fs';
import { join } from 'path';
import { tmpdir } from 'os';
let testDir: string;
beforeEach(() => {
testDir = mkdtempSync(join(tmpdir(), 'flynn-file-test-'));
});
afterEach(() => {
rmSync(testDir, { recursive: true });
});
describe('file.read', () => {
it('reads a file', async () => {
writeFileSync(join(testDir, 'hello.txt'), 'hello world');
const result = await fileReadTool.execute({ path: join(testDir, 'hello.txt') });
expect(result.success).toBe(true);
expect(result.output).toBe('hello world');
});
it('reads with offset and limit', async () => {
writeFileSync(join(testDir, 'lines.txt'), 'line1\nline2\nline3\nline4\n');
const result = await fileReadTool.execute({ path: join(testDir, 'lines.txt'), offset: 1, limit: 2 });
expect(result.success).toBe(true);
expect(result.output).toBe('line2\nline3');
});
it('returns error for missing file', async () => {
const result = await fileReadTool.execute({ path: join(testDir, 'nope.txt') });
expect(result.success).toBe(false);
expect(result.error).toBeTruthy();
});
});
describe('file.write', () => {
it('writes a new file', async () => {
const filePath = join(testDir, 'new.txt');
const result = await fileWriteTool.execute({ path: filePath, content: 'new content' });
expect(result.success).toBe(true);
expect(readFileSync(filePath, 'utf-8')).toBe('new content');
});
it('creates intermediate directories', async () => {
const filePath = join(testDir, 'sub', 'dir', 'file.txt');
const result = await fileWriteTool.execute({ path: filePath, content: 'deep' });
expect(result.success).toBe(true);
expect(readFileSync(filePath, 'utf-8')).toBe('deep');
});
});
describe('file.edit', () => {
it('replaces a string in a file', async () => {
const filePath = join(testDir, 'edit.txt');
writeFileSync(filePath, 'hello world');
const result = await fileEditTool.execute({
path: filePath,
old_string: 'world',
new_string: 'flynn',
});
expect(result.success).toBe(true);
expect(readFileSync(filePath, 'utf-8')).toBe('hello flynn');
});
it('fails if old_string not found', async () => {
const filePath = join(testDir, 'edit2.txt');
writeFileSync(filePath, 'hello world');
const result = await fileEditTool.execute({
path: filePath,
old_string: 'xyz',
new_string: 'abc',
});
expect(result.success).toBe(false);
expect(result.error).toContain('not found');
});
it('fails if old_string matches multiple times without replace_all', async () => {
const filePath = join(testDir, 'edit3.txt');
writeFileSync(filePath, 'aaa bbb aaa');
const result = await fileEditTool.execute({
path: filePath,
old_string: 'aaa',
new_string: 'ccc',
});
expect(result.success).toBe(false);
expect(result.error).toContain('multiple');
});
it('replaces all when replace_all is true', async () => {
const filePath = join(testDir, 'edit4.txt');
writeFileSync(filePath, 'aaa bbb aaa');
const result = await fileEditTool.execute({
path: filePath,
old_string: 'aaa',
new_string: 'ccc',
replace_all: true,
});
expect(result.success).toBe(true);
expect(readFileSync(filePath, 'utf-8')).toBe('ccc bbb ccc');
});
});
describe('file.list', () => {
it('lists files in a directory', async () => {
writeFileSync(join(testDir, 'a.txt'), '');
writeFileSync(join(testDir, 'b.ts'), '');
mkdirSync(join(testDir, 'sub'));
writeFileSync(join(testDir, 'sub', 'c.txt'), '');
const result = await fileListTool.execute({ path: testDir });
expect(result.success).toBe(true);
expect(result.output).toContain('a.txt');
expect(result.output).toContain('b.ts');
expect(result.output).toContain('sub');
});
it('filters with glob pattern', async () => {
writeFileSync(join(testDir, 'a.txt'), '');
writeFileSync(join(testDir, 'b.ts'), '');
const result = await fileListTool.execute({ path: testDir, pattern: '*.ts' });
expect(result.success).toBe(true);
expect(result.output).toContain('b.ts');
expect(result.output).not.toContain('a.txt');
});
});
```
**Step 2: Run test to verify it fails**
Run: `pnpm vitest run src/tools/builtin/file.test.ts`
Expected: FAIL — modules not found
**Step 3: Write the implementations**
```typescript
// src/tools/builtin/file-read.ts
import { readFileSync } from 'fs';
import type { Tool, ToolResult } from '../types.js';
interface FileReadArgs {
path: string;
offset?: number; // 0-based line offset
limit?: number; // number of lines
}
export const fileReadTool: Tool = {
name: 'file.read',
description: 'Read the contents of a file. Optionally read specific lines with offset and limit.',
inputSchema: {
type: 'object',
properties: {
path: { type: 'string', description: 'Absolute path to the file' },
offset: { type: 'number', description: 'Line offset to start reading from (0-based)' },
limit: { type: 'number', description: 'Number of lines to read' },
},
required: ['path'],
},
execute: async (rawArgs: unknown): Promise<ToolResult> => {
const args = rawArgs as FileReadArgs;
try {
const content = readFileSync(args.path, 'utf-8');
if (args.offset !== undefined || args.limit !== undefined) {
const lines = content.split('\n');
const start = args.offset ?? 0;
const end = args.limit !== undefined ? start + args.limit : lines.length;
return { success: true, output: lines.slice(start, end).join('\n') };
}
return { success: true, output: content };
} catch (error) {
return { success: false, output: '', error: error instanceof Error ? error.message : String(error) };
}
},
};
```
```typescript
// src/tools/builtin/file-write.ts
import { writeFileSync, mkdirSync } from 'fs';
import { dirname } from 'path';
import type { Tool, ToolResult } from '../types.js';

interface FileWriteArgs {
  path: string;
  content: string;
}

export const fileWriteTool: Tool = {
  name: 'file.write',
  description: 'Write content to a file. Creates the file and parent directories if they do not exist.',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Absolute path to write to' },
      content: { type: 'string', description: 'Content to write' },
    },
    required: ['path', 'content'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as FileWriteArgs;
    try {
      mkdirSync(dirname(args.path), { recursive: true });
      writeFileSync(args.path, args.content, 'utf-8');
      // Report encoded bytes; content.length would count UTF-16 code units instead
      const bytes = Buffer.byteLength(args.content, 'utf-8');
      return { success: true, output: `Wrote ${bytes} bytes to ${args.path}` };
    } catch (error) {
      return { success: false, output: '', error: error instanceof Error ? error.message : String(error) };
    }
  },
};
```
```typescript
// src/tools/builtin/file-edit.ts
import { readFileSync, writeFileSync } from 'fs';
import type { Tool, ToolResult } from '../types.js';

interface FileEditArgs {
  path: string;
  old_string: string;
  new_string: string;
  replace_all?: boolean;
}

export const fileEditTool: Tool = {
  name: 'file.edit',
  description: 'Edit a file by replacing an exact string match. Fails if old_string is not found or matches multiple times (unless replace_all is true).',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Absolute path to the file' },
      old_string: { type: 'string', description: 'Exact string to find' },
      new_string: { type: 'string', description: 'Replacement string' },
      replace_all: { type: 'boolean', description: 'Replace all occurrences (default false)' },
    },
    required: ['path', 'old_string', 'new_string'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as FileEditArgs;
    try {
      // An empty search string would "match" between every character
      if (args.old_string === '') {
        return { success: false, output: '', error: 'old_string must not be empty' };
      }
      const content = readFileSync(args.path, 'utf-8');
      // Count non-overlapping occurrences
      const count = content.split(args.old_string).length - 1;
      if (count === 0) {
        return { success: false, output: '', error: `old_string not found in ${args.path}` };
      }
      if (count > 1 && !args.replace_all) {
        return { success: false, output: '', error: `old_string found multiple times (${count}). Use replace_all or provide more context.` };
      }
      // Function replacers insert new_string literally; a plain string would expand `$&`, `$1`, etc.
      const newContent = args.replace_all
        ? content.replaceAll(args.old_string, () => args.new_string)
        : content.replace(args.old_string, () => args.new_string);
      writeFileSync(args.path, newContent, 'utf-8');
      return { success: true, output: `Edited ${args.path} (${count} replacement${count > 1 ? 's' : ''})` };
    } catch (error) {
      return { success: false, output: '', error: error instanceof Error ? error.message : String(error) };
    }
  },
};
```
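One JavaScript detail is worth keeping in mind for `file.edit`: when the replacement argument to `String.prototype.replace` is a string, sequences such as `$&` and `$1` are treated as substitution patterns, so replacement text containing `$` may not be inserted verbatim. Passing a function as the replacement avoids this. A small sketch of the difference; `replaceLiteral` is a hypothetical helper name.

```typescript
// Replace the first (or every) occurrence of `oldString`, inserting `newString` verbatim.
function replaceLiteral(content: string, oldString: string, newString: string, all = false): string {
  return all
    ? content.split(oldString).join(newString) // split/join never interprets `$` sequences
    : content.replace(oldString, () => newString); // a function's return value is used as-is
}

const source = 'price: X';
const viaString = source.replace('X', '$&$&');          // 'price: XX' (`$&` expands to the match)
const viaFunction = replaceLiteral(source, 'X', '$&$&'); // 'price: $&$&'
```

This matters for an editing tool because source code and shell snippets frequently contain `$`.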
```typescript
// src/tools/builtin/file-list.ts
import { readdirSync } from 'fs';
import type { Tool, ToolResult } from '../types.js';

interface FileListArgs {
  path: string;
  pattern?: string;
}

// Minimal glob matcher: `*` matches any run of characters; every other character is literal.
function matchGlob(name: string, pattern: string): boolean {
  // Escape all regex metacharacters except `*`, then expand `*` to `.*`
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
  return regex.test(name);
}

export const fileListTool: Tool = {
  name: 'file.list',
  description: 'List files and directories in a given path. Optionally filter with a glob pattern.',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Directory path to list' },
      pattern: { type: 'string', description: 'Glob pattern to filter results (e.g. "*.ts")' },
    },
    required: ['path'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as FileListArgs;
    try {
      let entries = readdirSync(args.path, { withFileTypes: true });
      const { pattern } = args;
      if (pattern) {
        entries = entries.filter(e => matchGlob(e.name, pattern));
      }
      // Directories get a trailing slash so they are distinguishable from files
      const output = entries
        .map(e => e.isDirectory() ? `${e.name}/` : e.name)
        .sort()
        .join('\n');
      return { success: true, output };
    } catch (error) {
      return { success: false, output: '', error: error instanceof Error ? error.message : String(error) };
    }
  },
};
```
**Step 4: Run test to verify it passes**
Run: `pnpm vitest run src/tools/builtin/file.test.ts`
Expected: PASS (all 11 tests)
**Step 5: Commit**
```bash
git add src/tools/builtin/file-read.ts src/tools/builtin/file-write.ts src/tools/builtin/file-edit.ts src/tools/builtin/file-list.ts src/tools/builtin/file.test.ts
git commit -m "feat(tools): add file read/write/edit/list builtin tools"
```
---
## Task 6: Web Fetch Tool
**Files:**
- Create: `src/tools/builtin/web-fetch.ts`
- Test: `src/tools/builtin/web-fetch.test.ts`
**Step 1: Write the failing test**
```typescript
// src/tools/builtin/web-fetch.test.ts
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { webFetchTool } from './web-fetch.js';
// Mock global fetch
const mockFetch = vi.fn();
vi.stubGlobal('fetch', mockFetch);
beforeEach(() => {
mockFetch.mockReset();
});
describe('web.fetch', () => {
it('has correct metadata', () => {
expect(webFetchTool.name).toBe('web.fetch');
expect(webFetchTool.inputSchema.required).toContain('url');
});
it('fetches a URL and returns body text', async () => {
mockFetch.mockResolvedValue({
ok: true,
status: 200,
text: async () => '<html><body><h1>Hello</h1><p>World</p></body></html>',
headers: new Headers({ 'content-type': 'text/html' }),
});
const result = await webFetchTool.execute({ url: 'https://example.com' });
expect(result.success).toBe(true);
expect(result.output).toBeTruthy();
expect(mockFetch).toHaveBeenCalledWith('https://example.com', expect.any(Object));
});
it('returns error on HTTP failure', async () => {
mockFetch.mockResolvedValue({
ok: false,
status: 404,
text: async () => 'Not Found',
headers: new Headers(),
});
const result = await webFetchTool.execute({ url: 'https://example.com/nope' });
expect(result.success).toBe(false);
expect(result.error).toContain('404');
});
it('returns error on network failure', async () => {
mockFetch.mockRejectedValue(new Error('network error'));
const result = await webFetchTool.execute({ url: 'https://down.example.com' });
expect(result.success).toBe(false);
expect(result.error).toContain('network error');
});
});
```
**Step 2: Run test to verify it fails**
Run: `pnpm vitest run src/tools/builtin/web-fetch.test.ts`
Expected: FAIL — module not found
**Step 3: Write the implementation**
```typescript
// src/tools/builtin/web-fetch.ts
import type { Tool, ToolResult } from '../types.js';
interface WebFetchArgs {
url: string;
timeout?: number;
}
export const webFetchTool: Tool = {
name: 'web.fetch',
description: 'Fetch the content of a URL via HTTP GET. Returns the response body as text.',
inputSchema: {
type: 'object',
properties: {
url: { type: 'string', description: 'The URL to fetch' },
timeout: { type: 'number', description: 'Timeout in milliseconds (default 15000)' },
},
required: ['url'],
},
execute: async (rawArgs: unknown): Promise<ToolResult> => {
const args = rawArgs as WebFetchArgs;
const timeout = args.timeout ?? 15_000;
try {
const response = await fetch(args.url, {
signal: AbortSignal.timeout(timeout),
headers: {
'User-Agent': 'Flynn/0.1 (personal AI assistant)',
'Accept': 'text/html, application/json, text/plain, */*',
},
});
if (!response.ok) {
return {
success: false,
output: '',
error: `HTTP ${response.status}: ${await response.text()}`,
};
}
const body = await response.text();
return { success: true, output: body };
} catch (error) {
return {
success: false,
output: '',
error: error instanceof Error ? error.message : String(error),
};
}
},
};
```
**Step 4: Run test to verify it passes**
Run: `pnpm vitest run src/tools/builtin/web-fetch.test.ts`
Expected: PASS (all 4 tests)
**Step 5: Commit**
```bash
git add src/tools/builtin/web-fetch.ts src/tools/builtin/web-fetch.test.ts
git commit -m "feat(tools): add web.fetch builtin tool"
```
---
## Task 7: Tools Index + Register All Builtins
**Files:**
- Create: `src/tools/index.ts`
- Create: `src/tools/builtin/index.ts`
**Step 1: Create the barrel exports**
```typescript
// src/tools/builtin/index.ts
export { shellExecTool } from './shell.js';
export { fileReadTool } from './file-read.js';
export { fileWriteTool } from './file-write.js';
export { fileEditTool } from './file-edit.js';
export { fileListTool } from './file-list.js';
export { webFetchTool } from './web-fetch.js';
import type { Tool } from '../types.js';
import { shellExecTool } from './shell.js';
import { fileReadTool } from './file-read.js';
import { fileWriteTool } from './file-write.js';
import { fileEditTool } from './file-edit.js';
import { fileListTool } from './file-list.js';
import { webFetchTool } from './web-fetch.js';
export const allBuiltinTools: Tool[] = [
shellExecTool,
fileReadTool,
fileWriteTool,
fileEditTool,
fileListTool,
webFetchTool,
];
```
```typescript
// src/tools/index.ts
export type { Tool, ToolCall, ToolResult, ToolInputSchema, ToolUseBlock, ToolResultBlock, ToolUseMessage, ToolResultMessage } from './types.js';
export { ToolRegistry } from './registry.js';
export type { AnthropicToolDef, OpenAIToolDef } from './registry.js';
export { ToolExecutor } from './executor.js';
export type { ToolExecutorConfig } from './executor.js';
export { allBuiltinTools } from './builtin/index.js';
export { shellExecTool } from './builtin/shell.js';
export { fileReadTool } from './builtin/file-read.js';
export { fileWriteTool } from './builtin/file-write.js';
export { fileEditTool } from './builtin/file-edit.js';
export { fileListTool } from './builtin/file-list.js';
export { webFetchTool } from './builtin/web-fetch.js';
```
**Step 2: Run all tool tests to verify nothing broke**
Run: `pnpm vitest run src/tools/`
Expected: All tests PASS
**Step 3: Commit**
```bash
git add src/tools/index.ts src/tools/builtin/index.ts
git commit -m "feat(tools): add barrel exports and allBuiltinTools list"
```
---
## Task 8: Update Model Types for Tool Use
**Files:**
- Modify: `src/models/types.ts`
- Test: `src/models/types.test.ts` (new)
**Step 1: Write the failing test**
```typescript
// src/models/types.test.ts
import { describe, it, expect } from 'vitest';
import type { ChatRequest, ChatResponse, ToolMessage, ContentBlock } from './types.js';
describe('Model types with tool support', () => {
it('ChatRequest accepts tools array', () => {
const req: ChatRequest = {
messages: [{ role: 'user', content: 'hi' }],
tools: [{
name: 'test',
description: 'test tool',
input_schema: { type: 'object', properties: {} },
}],
};
expect(req.tools).toHaveLength(1);
});
it('ChatResponse has optional toolCalls', () => {
const resp: ChatResponse = {
content: '',
stopReason: 'tool_use',
usage: { inputTokens: 0, outputTokens: 0 },
toolCalls: [{ id: 'call_1', name: 'test', args: {} }],
};
expect(resp.toolCalls).toHaveLength(1);
expect(resp.stopReason).toBe('tool_use');
});
it('ToolMessage represents tool results in conversation', () => {
const msg: ToolMessage = {
role: 'tool_result',
toolResults: [{ tool_use_id: 'call_1', content: 'result', is_error: false }],
};
expect(msg.role).toBe('tool_result');
expect(msg.toolResults).toHaveLength(1);
});
it('ContentBlock can be text or tool_use', () => {
const text: ContentBlock = { type: 'text', text: 'hello' };
const tool: ContentBlock = { type: 'tool_use', id: 'c1', name: 'test', input: {} };
expect(text.type).toBe('text');
expect(tool.type).toBe('tool_use');
});
});
```
**Step 2: Run test to verify it fails**
Run: `pnpm vitest run src/models/types.test.ts`
Expected: FAIL — `ToolMessage`, `ContentBlock` not exported
**Step 3: Update types.ts**
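Update `src/models/types.ts` to add tool-related types. Keep existing types backward compatible: `Message` is unchanged, and `ChatRequest`, `ChatResponse`, and `ChatStreamEvent` gain only optional fields: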
```typescript
// src/models/types.ts
export interface Message {
role: 'user' | 'assistant';
content: string;
timestamp?: number;
}
// Tool definition passed to model API
export interface ToolDefinition {
name: string;
description: string;
input_schema: {
type: 'object';
properties: Record<string, unknown>;
required?: string[];
};
}
// Individual tool call returned by model
export interface ModelToolCall {
id: string;
name: string;
args: unknown;
}
// Content blocks for multi-content responses
export type ContentBlock =
| { type: 'text'; text: string }
| { type: 'tool_use'; id: string; name: string; input: unknown };
// Tool result fed back into conversation
export interface ToolResultEntry {
tool_use_id: string;
content: string;
is_error?: boolean;
}
// Message type for tool results (distinct from user/assistant)
export interface ToolMessage {
role: 'tool_result';
toolResults: ToolResultEntry[];
}
// Union type for all messages in a conversation
export type ConversationMessage = Message | ToolMessage;
export interface ChatRequest {
messages: Message[];
system?: string;
maxTokens?: number;
tools?: ToolDefinition[];
}
export interface ChatResponse {
content: string;
stopReason: 'end_turn' | 'max_tokens' | 'stop_sequence' | 'tool_use' | string;
usage: TokenUsage;
toolCalls?: ModelToolCall[];
}
export interface TokenUsage {
inputTokens: number;
outputTokens: number;
}
export interface ChatStreamEvent {
type: 'content' | 'done' | 'error' | 'tool_use';
content?: string;
usage?: TokenUsage;
error?: Error;
toolCall?: ModelToolCall;
}
export interface StreamingModelClient {
chatStream(request: ChatRequest): AsyncIterable<ChatStreamEvent>;
}
export interface ModelClient {
chat(request: ChatRequest): Promise<ChatResponse>;
chatStream?(request: ChatRequest): AsyncIterable<ChatStreamEvent>;
}
```
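To illustrate how the `ConversationMessage` union might be consumed, here is a minimal sketch that maps it to a provider-style wire message. `WireMessage`, `WireBlock`, and `toWireMessage` are hypothetical names introduced for this example, and the types are local copies trimmed to the fields used.

```typescript
// Local copies of the plan's types so the sketch is self-contained.
interface Message { role: 'user' | 'assistant'; content: string }
interface ToolResultEntry { tool_use_id: string; content: string; is_error?: boolean }
interface ToolMessage { role: 'tool_result'; toolResults: ToolResultEntry[] }
type ConversationMessage = Message | ToolMessage;

// Hypothetical wire shape: tool results travel as content blocks in a user turn.
type WireBlock = { type: 'tool_result'; tool_use_id: string; content: string; is_error?: boolean };
interface WireMessage { role: 'user' | 'assistant'; content: string | WireBlock[] }

function toWireMessage(msg: ConversationMessage): WireMessage {
  // The `role` field discriminates the union, so this check narrows to ToolMessage.
  if (msg.role === 'tool_result') {
    return {
      role: 'user',
      content: msg.toolResults.map((r) => ({
        type: 'tool_result' as const,
        tool_use_id: r.tool_use_id,
        content: r.content,
        // Only emit is_error when the tool actually failed.
        ...(r.is_error ? { is_error: true } : {}),
      })),
    };
  }
  // Plain user/assistant messages pass through unchanged.
  return { role: msg.role, content: msg.content };
}
```

Because `role` is the discriminant, no type assertions are needed on either branch.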
**Step 4: Run test to verify it passes**
Run: `pnpm vitest run src/models/types.test.ts`
Expected: PASS
**Step 5: Run ALL existing model tests to verify no regressions**
Run: `pnpm vitest run src/models/`
Expected: All existing tests PASS (types are backward compatible — `Message` unchanged, new fields are optional)
**Step 6: Commit**
```bash
git add src/models/types.ts src/models/types.test.ts
git commit -m "feat(models): add tool use types to model interfaces"
```
---
## Task 9: Anthropic Tool Use Support
**Files:**
- Modify: `src/models/anthropic.ts`
- Modify: `src/models/anthropic.test.ts`
**Step 1: Write the failing test (add to existing test file)**
Add these tests to `src/models/anthropic.test.ts`:
```typescript
// Add after existing describe blocks in src/models/anthropic.test.ts
describe('AnthropicClient tool use', () => {
it('passes tools to API and parses tool_use response', async () => {
// This test requires updating the mock to return tool_use blocks
// We need to access the mock and override for this test
const Anthropic = (await import('@anthropic-ai/sdk')).default;
const mockInstance = new Anthropic();
// Override create to return tool_use
(mockInstance.messages.create as ReturnType<typeof vi.fn>).mockResolvedValueOnce({
content: [
{ type: 'tool_use', id: 'toolu_01', name: 'shell.exec', input: { command: 'ls' } },
],
stop_reason: 'tool_use',
usage: { input_tokens: 20, output_tokens: 15 },
});
const client = new AnthropicClient({
apiKey: 'test-key',
model: 'claude-sonnet-4-20250514',
});
const response = await client.chat({
messages: [{ role: 'user', content: 'list files' }],
tools: [{
name: 'shell.exec',
description: 'Run shell command',
input_schema: { type: 'object', properties: { command: { type: 'string' } }, required: ['command'] },
}],
});
expect(response.stopReason).toBe('tool_use');
expect(response.toolCalls).toHaveLength(1);
expect(response.toolCalls![0]).toEqual({
id: 'toolu_01',
name: 'shell.exec',
args: { command: 'ls' },
});
});
});
```
**Step 2: Run test to verify it fails**
Run: `pnpm vitest run src/models/anthropic.test.ts`
Expected: FAIL — `toolCalls` is undefined (current code only extracts text blocks)
**Step 3: Update anthropic.ts to support tool use**
Replace the body of the `chat` method in `src/models/anthropic.ts`. Key changes:
1. Pass `tools` to `messages.create()` when present
2. Parse both `text` and `tool_use` content blocks from response
3. Return `toolCalls` array when tool_use blocks present
Updated `chat` method:
```typescript
async chat(request: ChatRequest): Promise<ChatResponse> {
const params: Record<string, unknown> = {
model: this.model,
max_tokens: request.maxTokens ?? this.defaultMaxTokens,
system: request.system,
messages: request.messages.map((m) => ({
role: m.role,
content: m.content,
})),
};
if (request.tools && request.tools.length > 0) {
params.tools = request.tools;
}
const response = await this.client.messages.create(params as Parameters<typeof this.client.messages.create>[0]);
const textContent = response.content.find((c) => c.type === 'text');
const content = textContent?.type === 'text' ? textContent.text : '';
const toolCalls = response.content
.filter((c): c is { type: 'tool_use'; id: string; name: string; input: unknown } => c.type === 'tool_use')
.map(c => ({ id: c.id, name: c.name, args: c.input }));
return {
content,
stopReason: response.stop_reason ?? 'end_turn',
usage: {
inputTokens: response.usage.input_tokens,
outputTokens: response.usage.output_tokens,
},
...(toolCalls.length > 0 ? { toolCalls } : {}),
};
}
```
Also update `chatStream` to pass the `tools` param and yield `{ type: 'tool_use', toolCall: {...} }` events. A `content_block_start` event with `tool_use` type carries only the call's `id` and `name`; the input arrives afterwards as `input_json_delta` chunks, so accumulate them and yield the event once the block's `content_block_stop` arrives.
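One subtlety in the streaming case: Anthropic's stream delivers a tool call's input incrementally as `input_json_delta` events, so the arguments are complete only at `content_block_stop`. The sketch below shows the accumulation in isolation. `StreamEvent` is a simplified stand-in for the SDK's event types (only the fields used here), and `collectToolCalls` is a hypothetical helper, not part of the SDK.

```typescript
interface ModelToolCall { id: string; name: string; args: unknown }

// Simplified stand-ins for the SDK's streaming events.
type StreamEvent =
  | { type: 'content_block_start'; index: number;
      content_block: { type: 'text' } | { type: 'tool_use'; id: string; name: string } }
  | { type: 'content_block_delta'; index: number;
      delta: { type: 'text_delta'; text: string } | { type: 'input_json_delta'; partial_json: string } }
  | { type: 'content_block_stop'; index: number };

function collectToolCalls(events: StreamEvent[]): ModelToolCall[] {
  // Tool calls still receiving input, keyed by content block index.
  const pending = new Map<number, { id: string; name: string; json: string }>();
  const done: ModelToolCall[] = [];

  for (const ev of events) {
    if (ev.type === 'content_block_start' && ev.content_block.type === 'tool_use') {
      pending.set(ev.index, { id: ev.content_block.id, name: ev.content_block.name, json: '' });
    } else if (ev.type === 'content_block_delta' && ev.delta.type === 'input_json_delta') {
      const call = pending.get(ev.index);
      if (call) call.json += ev.delta.partial_json;
    } else if (ev.type === 'content_block_stop') {
      const call = pending.get(ev.index);
      if (call) {
        // A tool with no arguments may stream no JSON at all; treat that as {}.
        done.push({ id: call.id, name: call.name, args: call.json ? JSON.parse(call.json) : {} });
        pending.delete(ev.index);
      }
    }
  }
  return done;
}
```

In `chatStream` the same bookkeeping would run inside the `for await` loop, yielding a `tool_use` event at the point where this sketch pushes to `done`.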
**Step 4: Run tests to verify they pass**
Run: `pnpm vitest run src/models/anthropic.test.ts`
Expected: All PASS
**Step 5: Commit**
```bash
git add src/models/anthropic.ts src/models/anthropic.test.ts
git commit -m "feat(models): add tool use support to AnthropicClient"
```
---
## Task 10: OpenAI Tool Use Support
**Files:**
- Modify: `src/models/openai.ts`
- Modify: `src/models/openai.test.ts`
**Step 1: Write the failing test**
Add to `src/models/openai.test.ts`:
```typescript
describe('OpenAIClient tool use', () => {
it('passes tools to API and parses tool_calls response', async () => {
const OpenAI = (await import('openai')).default;
const mockInstance = new OpenAI();
(mockInstance.chat.completions.create as ReturnType<typeof vi.fn>).mockResolvedValueOnce({
choices: [{
message: {
content: null,
tool_calls: [{
id: 'call_1',
type: 'function',
function: { name: 'shell.exec', arguments: '{"command":"ls"}' },
}],
},
finish_reason: 'tool_calls',
}],
usage: { prompt_tokens: 20, completion_tokens: 15 },
});
const client = new OpenAIClient({
apiKey: 'test-key',
model: 'gpt-4o',
});
const response = await client.chat({
messages: [{ role: 'user', content: 'list files' }],
tools: [{
name: 'shell.exec',
description: 'Run shell command',
input_schema: { type: 'object', properties: { command: { type: 'string' } }, required: ['command'] },
}],
});
expect(response.stopReason).toBe('tool_calls');
expect(response.toolCalls).toHaveLength(1);
expect(response.toolCalls![0]).toEqual({
id: 'call_1',
name: 'shell.exec',
args: { command: 'ls' },
});
});
});
```
**Step 2: Run test to verify it fails**
Run: `pnpm vitest run src/models/openai.test.ts`
Expected: FAIL — `toolCalls` undefined
**Step 3: Update openai.ts**
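Update the `chat` method to:
1. Convert `tools` to OpenAI format (`{ type: 'function', function: { name, description, parameters } }`)
2. Parse `tool_calls` from the first response choice, treating a `null` message `content` as an empty string
3. Return a `toolCalls` array whose `args` are parsed from each call's JSON `arguments` string
4. Pass `finish_reason` through as `stopReason` (the test above expects `'tool_calls'`)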
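The conversion steps above can be sketched as two pure functions. This is a minimal illustration, not the final `openai.ts` code: `toOpenAITools` and `parseToolCalls` are hypothetical helper names, the interfaces are local stand-ins for the shapes used in the test, and the fallback for malformed JSON is an assumption.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: { type: 'object'; properties: Record<string, unknown>; required?: string[] };
}
interface ModelToolCall { id: string; name: string; args: unknown }

// Local stand-ins mirroring the request/response fields used in the test.
interface OpenAIToolDef {
  type: 'function';
  function: { name: string; description: string; parameters: ToolDefinition['input_schema'] };
}
interface OpenAIToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

// The JSON Schema moves from `input_schema` to `function.parameters`.
function toOpenAITools(tools: ToolDefinition[]): OpenAIToolDef[] {
  return tools.map((t) => ({
    type: 'function' as const,
    function: { name: t.name, description: t.description, parameters: t.input_schema },
  }));
}

// OpenAI returns arguments as a JSON string, so they must be parsed.
function parseToolCalls(calls: OpenAIToolCall[] | null | undefined): ModelToolCall[] {
  return (calls ?? []).map((c) => {
    let args: unknown;
    try {
      args = JSON.parse(c.function.arguments);
    } catch {
      // Assumption: keep the raw string so the tool can reject it with a clear error.
      args = c.function.arguments;
    }
    return { id: c.id, name: c.function.name, args };
  });
}
```

OpenAI's API reference restricts function names to letters, digits, underscores, and dashes, so dotted names such as `shell.exec` may need to be mapped before sending. This sketch does not attempt that mapping.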
**Step 4: Run tests**
Run: `pnpm vitest run src/models/openai.test.ts`
Expected: PASS
**Step 5: Commit**
```bash
git add src/models/openai.ts src/models/openai.test.ts
git commit -m "feat(models): add tool use support to OpenAIClient"
```
---
## Task 11: Agent Loop
**Files:**
- Modify: `src/backends/native/agent.ts`
- Modify: `src/backends/native/agent.test.ts`
This is the biggest task. The NativeAgent `process()` method changes from single-turn to iterative loop.
**Step 1: Write the failing test**
Add to `src/backends/native/agent.test.ts`:
```typescript
import { ToolRegistry, ToolExecutor } from '../../tools/index.js';
import { HookEngine } from '../../hooks/index.js';
import type { Tool } from '../../tools/index.js';
// Simple test tool
const echoTool: Tool = {
name: 'test.echo',
description: 'Echo',
inputSchema: { type: 'object', properties: { text: { type: 'string' } }, required: ['text'] },
execute: async (args) => ({ success: true, output: (args as { text: string }).text }),
};
describe('NativeAgent tool loop', () => {
it('executes tool calls and feeds results back', async () => {
let callCount = 0;
const mockClient: ModelClient = {
chat: vi.fn().mockImplementation(() => {
callCount++;
if (callCount === 1) {
// First call: model requests tool use
return {
content: '',
stopReason: 'tool_use',
usage: { inputTokens: 10, outputTokens: 5 },
toolCalls: [{ id: 'call_1', name: 'test.echo', args: { text: 'hello' } }],
};
}
// Second call: model gives final text response
return {
content: 'The tool returned: hello',
stopReason: 'end_turn',
usage: { inputTokens: 15, outputTokens: 10 },
};
}),
};
const registry = new ToolRegistry();
registry.register(echoTool);
const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
const executor = new ToolExecutor(registry, hooks);
const agent = new NativeAgent({
modelClient: mockClient,
systemPrompt: 'You are helpful.',
toolRegistry: registry,
toolExecutor: executor,
});
const response = await agent.process('echo hello');
expect(response).toBe('The tool returned: hello');
expect(mockClient.chat).toHaveBeenCalledTimes(2);
});
it('respects max iterations', async () => {
// Model always returns tool_use
const mockClient: ModelClient = {
chat: vi.fn().mockResolvedValue({
content: '',
stopReason: 'tool_use',
usage: { inputTokens: 10, outputTokens: 5 },
toolCalls: [{ id: 'call_1', name: 'test.echo', args: { text: 'loop' } }],
}),
};
const registry = new ToolRegistry();
registry.register(echoTool);
const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
const executor = new ToolExecutor(registry, hooks);
const agent = new NativeAgent({
modelClient: mockClient,
systemPrompt: 'You are helpful.',
toolRegistry: registry,
toolExecutor: executor,
maxIterations: 3,
});
const response = await agent.process('loop forever');
expect(response).toContain('max iterations');
expect(mockClient.chat).toHaveBeenCalledTimes(3);
});
it('works without tools (backward compatible)', async () => {
const mockClient: ModelClient = {
chat: vi.fn().mockResolvedValue({
content: 'Hello!',
stopReason: 'end_turn',
usage: { inputTokens: 10, outputTokens: 5 },
}),
};
const agent = new NativeAgent({
modelClient: mockClient,
systemPrompt: 'You are helpful.',
});
const response = await agent.process('Hi');
expect(response).toBe('Hello!');
});
});
```
**Step 2: Run test to verify it fails**
Run: `pnpm vitest run src/backends/native/agent.test.ts`
Expected: FAIL — NativeAgent doesn't accept `toolRegistry`/`toolExecutor`
**Step 3: Rewrite agent.ts with tool loop**
The updated NativeAgent:
- `NativeAgentConfig` gains optional `toolRegistry`, `toolExecutor`, `maxIterations` fields
- `process()` becomes a loop: call model -> if the response contains `toolCalls` (Anthropic reports `stopReason` `'tool_use'`, OpenAI `'tool_calls'`), execute tools, append results, loop
- Conversation history stores both regular messages and tool messages
- Model receives tools from registry in each `chat()` call
- Max iterations (default 10) prevents infinite loops
- Backward compatible: if no registry/executor provided, works exactly as before
Key implementation details:
- Build Anthropic-format messages for tool results: `{ role: 'user', content: [{ type: 'tool_result', tool_use_id, content }] }`
- The agent needs to track the raw content blocks (not just text) for tool_use responses
- On max iterations, return a warning message that contains the phrase `max iterations` (the test asserts on it)
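The loop described above can be sketched independently of the real classes. This is a simplified illustration under stated assumptions: `LoopDeps`, `Turn`, and `runAgentLoop` are hypothetical names, the model client and executor are reduced to two callbacks, and the loop keys off the presence of `toolCalls` rather than a specific `stopReason` string.

```typescript
interface ModelToolCall { id: string; name: string; args: unknown }
interface ChatResponse { content: string; stopReason: string; toolCalls?: ModelToolCall[] }
interface ToolResult { success: boolean; output: string; error?: string }
interface ToolResultEntry { tool_use_id: string; content: string; is_error: boolean }

// Hypothetical history entry: assistant turns keep their tool calls for replay.
type Turn =
  | { role: 'user' | 'assistant'; content: string; toolCalls?: ModelToolCall[] }
  | { role: 'tool_result'; toolResults: ToolResultEntry[] };

// Hypothetical minimal collaborators standing in for ModelClient and ToolExecutor.
interface LoopDeps {
  chat(history: Turn[]): Promise<ChatResponse>;
  runTool(call: ModelToolCall): Promise<ToolResult>;
  maxIterations?: number;
}

async function runAgentLoop(input: string, deps: LoopDeps): Promise<string> {
  const history: Turn[] = [{ role: 'user', content: input }];
  const max = deps.maxIterations ?? 10;

  for (let i = 0; i < max; i++) {
    const response = await deps.chat(history);
    const calls = response.toolCalls ?? [];
    history.push({ role: 'assistant', content: response.content, toolCalls: calls });

    // A response without tool calls is the final answer.
    if (calls.length === 0) return response.content;

    // Run every requested tool and feed all results back in one message.
    const toolResults: ToolResultEntry[] = [];
    for (const call of calls) {
      const result = await deps.runTool(call);
      toolResults.push({
        tool_use_id: call.id,
        content: result.success ? result.output : (result.error ?? 'tool failed'),
        is_error: !result.success,
      });
    }
    history.push({ role: 'tool_result', toolResults });
  }
  return `Stopped: reached max iterations (${max}) without a final response.`;
}
```

Failed tools are reported back to the model as error results instead of aborting the loop, which gives the model a chance to recover on the next iteration.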
**Step 4: Run tests**
Run: `pnpm vitest run src/backends/native/agent.test.ts`
Expected: All PASS (existing + new)
**Step 5: Commit**
```bash
git add src/backends/native/agent.ts src/backends/native/agent.test.ts
git commit -m "feat(agent): add iterative tool use loop with max iterations"
```
---
## Task 12: Wire Tools into Daemon
**Files:**
- Modify: `src/daemon/index.ts`
**Step 1: No test needed (integration wiring)**
This is wiring code that creates the tool registry, registers all builtins, creates the executor, and passes them to the NativeAgent. No new logic, just composition.
**Step 2: Update daemon/index.ts**
Changes:
1. Import `ToolRegistry`, `ToolExecutor`, `allBuiltinTools` from `../tools/index.js`
2. After creating hookEngine, create registry and executor:
```typescript
const toolRegistry = new ToolRegistry();
for (const tool of allBuiltinTools) {
toolRegistry.register(tool);
}
const toolExecutor = new ToolExecutor(toolRegistry, hookEngine);
```
3. Pass `toolRegistry` and `toolExecutor` to NativeAgent constructor
4. Add `toolRegistry` and `toolExecutor` to `DaemonContext` interface
**Step 3: Run typecheck and existing tests**
Run: `pnpm typecheck && pnpm vitest run`
Expected: PASS
**Step 4: Commit**
```bash
git add src/daemon/index.ts
git commit -m "feat(daemon): wire tool registry and executor into agent"
```
---
## Task 13: Update TUI for Tool Display
**Files:**
- Modify: `src/frontends/tui/minimal.ts`
**Step 1: No new test (display-only change)**
The TUI's `handleMessage` method currently calls `modelClient.chatStream()` or `modelClient.chat()` directly. After this task, it calls `agent.process()` instead (which handles the tool loop internally) and displays tool execution status.
Because `process()` returns only the final text, Phase 1 surfaces tool status through an optional `onToolUse` callback on `NativeAgentConfig` that the TUI hooks into.
**Step 2: Add onToolUse callback to NativeAgent**
In `src/backends/native/agent.ts`, add to NativeAgentConfig:
```typescript
onToolUse?: (event: { type: 'start' | 'end'; tool: string; args?: unknown; result?: ToolResult }) => void;
```
The agent loop calls this before and after each tool execution.
**Step 3: Update MinimalTui to use agent instead of raw model client**
Change `MinimalTuiConfig` to accept `NativeAgent` instead of raw `ModelClient`. The `handleMessage` method calls `agent.process()` and the `onToolUse` callback prints tool status lines:
```
⚡ shell.exec: ls -la
✓ success (24 lines)
```
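A formatter for these status lines might look like the sketch below. The event shape is copied from the `onToolUse` signature. `formatToolEvent` is a hypothetical helper, and two details are assumptions: the preview shows the first string-valued argument, and failures use a `✗` line that the plan does not specify.

```typescript
interface ToolResult { success: boolean; output: string; error?: string }
type ToolUseEvent = { type: 'start' | 'end'; tool: string; args?: unknown; result?: ToolResult };

function formatToolEvent(event: ToolUseEvent): string {
  if (event.type === 'start') {
    // Assumption: preview the first string-valued argument, e.g. the shell command.
    const args = event.args;
    let preview = '';
    if (typeof args === 'object' && args !== null) {
      for (const value of Object.values(args)) {
        if (typeof value === 'string') {
          preview = value.length > 60 ? `${value.slice(0, 60)}...` : value;
          break;
        }
      }
    }
    return preview ? `⚡ ${event.tool}: ${preview}` : `⚡ ${event.tool}`;
  }

  const result = event.result;
  if (!result || !result.success) {
    // Assumption: the failure line format is not specified by the plan.
    return `✗ failed: ${result?.error ?? 'unknown error'}`;
  }
  // Ignore a single trailing newline so "a\nb\n" counts as two lines.
  const text = result.output.replace(/\n$/, '');
  const lines = text === '' ? 0 : text.split('\n').length;
  return `✓ success (${lines} line${lines === 1 ? '' : 's'})`;
}
```

The TUI would pass something like `(event) => console.log(formatToolEvent(event))` as `onToolUse`.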
**Step 4: Run existing TUI tests**
Run: `pnpm vitest run src/frontends/tui/`
Expected: PASS (may need to update test mocks to use agent instead of raw client)
**Step 5: Commit**
```bash
git add src/backends/native/agent.ts src/frontends/tui/minimal.ts
git commit -m "feat(tui): display tool execution status in minimal TUI"
```
---
## Task 14: Update Telegram for Tool Display
**Files:**
- Modify: `src/frontends/telegram/bot.ts`
- Modify: `src/frontends/telegram/handlers.ts`
**Step 1: Update handlers to show tool status**
The Telegram message handler currently calls `agent.process(text)` and gets back text. With the onToolUse callback, we can send status messages during tool execution.
For Telegram, tool status should appear as edited messages or new messages:
- On tool start: Send a status message ("⚡ Running shell.exec...")
- On tool end: Edit the status message with result summary
- After loop completes: Send the final response
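The send-then-edit flow above can be sketched against a small transport interface. `StatusTransport` and `createToolStatusReporter` are hypothetical names for this example; in the real handler, `send` and `edit` would wrap the bot library's reply and edit-message calls. Note that `onToolUse` is declared as returning `void`, so the agent would not wait for this async callback; the fallback branch covers the case where `end` arrives before the status message id is known.

```typescript
interface ToolResult { success: boolean; output: string; error?: string }
type ToolUseEvent = { type: 'start' | 'end'; tool: string; args?: unknown; result?: ToolResult };

// Hypothetical transport: send returns a message id that edit can target later.
interface StatusTransport {
  send(text: string): Promise<number>;
  edit(messageId: number, text: string): Promise<void>;
}

function createToolStatusReporter(transport: StatusTransport) {
  let statusId: number | undefined;

  return async (event: ToolUseEvent): Promise<void> => {
    if (event.type === 'start') {
      // Remember the status message so the end event can edit it in place.
      statusId = await transport.send(`⚡ Running ${event.tool}...`);
      return;
    }

    const summary = event.result?.success
      ? `✓ ${event.tool} finished`
      : `✗ ${event.tool} failed: ${event.result?.error ?? 'unknown error'}`;

    if (statusId !== undefined) {
      await transport.edit(statusId, summary);
      statusId = undefined;
    } else {
      // No status message to edit (for example, the send has not resolved yet).
      await transport.send(summary);
    }
  };
}
```

One reporter per active chat keeps status messages from different conversations separate.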
**Step 2: Update bot.ts**
The bot needs to supply the agent's `onToolUse` callback, wired to send Telegram status messages to the chat that triggered the request.
**Step 3: Run tests**
Run: `pnpm vitest run src/frontends/telegram/`
Expected: PASS
**Step 4: Commit**
```bash
git add src/frontends/telegram/bot.ts src/frontends/telegram/handlers.ts
git commit -m "feat(telegram): display tool execution status messages"
```
---
## Task 15: Update Model Index Exports
**Files:**
- Modify: `src/models/index.ts`
**Step 1: Add new type exports**
```typescript
export type { ToolDefinition, ModelToolCall, ContentBlock, ToolResultEntry, ToolMessage, ConversationMessage } from './types.js';
```
**Step 2: Run typecheck**
Run: `pnpm typecheck`
Expected: PASS
**Step 3: Commit**
```bash
git add src/models/index.ts
git commit -m "feat(models): export tool-related types from index"
```
---
## Task 16: Full Integration Test
**Files:**
- Create: `src/tools/integration.test.ts`
**Step 1: Write integration test**
```typescript
// src/tools/integration.test.ts
import { describe, it, expect, vi } from 'vitest';
import { NativeAgent } from '../backends/native/agent.js';
import { ToolRegistry } from './registry.js';
import { ToolExecutor } from './executor.js';
import { HookEngine } from '../hooks/engine.js';
import { shellExecTool } from './builtin/shell.js';
import { fileReadTool } from './builtin/file-read.js';
import { fileWriteTool } from './builtin/file-write.js';
import type { ModelClient, ChatResponse } from '../models/types.js';
import { mkdtempSync, rmSync } from 'fs';
import { join } from 'path';
import { tmpdir } from 'os';
describe('Tool integration (end-to-end)', () => {
it('agent uses shell tool and returns result', async () => {
let callCount = 0;
const mockClient: ModelClient = {
chat: vi.fn().mockImplementation(() => {
callCount++;
if (callCount === 1) {
return {
content: '',
stopReason: 'tool_use',
usage: { inputTokens: 10, outputTokens: 5 },
toolCalls: [{ id: 'c1', name: 'shell.exec', args: { command: 'echo integration_test' } }],
} satisfies ChatResponse;
}
return {
content: 'The command output was: integration_test',
stopReason: 'end_turn',
usage: { inputTokens: 20, outputTokens: 10 },
} satisfies ChatResponse;
}),
};
const registry = new ToolRegistry();
registry.register(shellExecTool);
const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
const executor = new ToolExecutor(registry, hooks);
const agent = new NativeAgent({
modelClient: mockClient,
systemPrompt: 'You have tools.',
toolRegistry: registry,
toolExecutor: executor,
});
const result = await agent.process('run echo integration_test');
expect(result).toContain('integration_test');
});
it('agent chains multiple tools', async () => {
const dir = mkdtempSync(join(tmpdir(), 'flynn-integ-'));
let callCount = 0;
const mockClient: ModelClient = {
chat: vi.fn().mockImplementation(() => {
callCount++;
if (callCount === 1) {
return {
content: '',
stopReason: 'tool_use',
usage: { inputTokens: 10, outputTokens: 5 },
toolCalls: [{ id: 'c1', name: 'file.write', args: { path: join(dir, 'test.txt'), content: 'hello' } }],
};
}
if (callCount === 2) {
return {
content: '',
stopReason: 'tool_use',
usage: { inputTokens: 15, outputTokens: 8 },
toolCalls: [{ id: 'c2', name: 'file.read', args: { path: join(dir, 'test.txt') } }],
};
}
return {
content: 'I wrote and read the file. It contains: hello',
stopReason: 'end_turn',
usage: { inputTokens: 20, outputTokens: 10 },
};
}),
};
const registry = new ToolRegistry();
registry.register(fileWriteTool);
registry.register(fileReadTool);
const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
const executor = new ToolExecutor(registry, hooks);
const agent = new NativeAgent({
modelClient: mockClient,
systemPrompt: 'You have file tools.',
toolRegistry: registry,
toolExecutor: executor,
});
try {
const result = await agent.process('write hello to test.txt then read it');
expect(result).toContain('hello');
expect(mockClient.chat).toHaveBeenCalledTimes(3);
} finally {
rmSync(dir, { recursive: true });
}
});
});
```
**Step 2: Run integration test**
Run: `pnpm vitest run src/tools/integration.test.ts`
Expected: PASS
**Step 3: Run full test suite**
Run: `pnpm vitest run`
Expected: All tests PASS
**Step 4: Run typecheck**
Run: `pnpm typecheck`
Expected: PASS (no type errors)
**Step 5: Commit**
```bash
git add src/tools/integration.test.ts
git commit -m "test: add end-to-end tool integration tests"
```
---
## Task 17: Update System Prompt for Tool Awareness
**Files:**
- Modify: `src/daemon/index.ts`
**Step 1: Update SYSTEM_PROMPT**
Add tool awareness to the system prompt so the model knows it has tools:
```typescript
const SYSTEM_PROMPT = `You are Flynn, a helpful personal AI assistant running on the user's machine. You are direct, concise, and helpful.
You have access to tools that let you interact with the system:
- shell.exec: Run shell commands (bash)
- file.read: Read file contents
- file.write: Write/create files
- file.edit: Edit files (find and replace)
- file.list: List directory contents
- web.fetch: Fetch web pages
Use tools when the user's request requires interacting with the filesystem, running commands, or fetching web content. For conversational questions, respond directly without tools.
Keep responses focused. Use markdown when it improves readability.`;
```
**Step 2: Run tests (nothing should break)**
Run: `pnpm vitest run`
Expected: PASS
**Step 3: Commit**
```bash
git add src/daemon/index.ts
git commit -m "feat(daemon): update system prompt with tool descriptions"
```
---
## Summary
| Task | Description | Files | Tests |
|------|-------------|-------|-------|
| 0 | SOUL.md + system prompt loader | `SOUL.md`, `src/daemon/index.ts` | 0 |
| 1 | Tool type definitions | `src/tools/types.ts` | 5 |
| 2 | Tool registry | `src/tools/registry.ts` | 5 |
| 3 | Tool executor | `src/tools/executor.ts` | 7 |
| 4 | Shell exec tool | `src/tools/builtin/shell.ts` | 5 |
| 5 | File tools (4 files) | `src/tools/builtin/file-*.ts` | 10 |
| 6 | Web fetch tool | `src/tools/builtin/web-fetch.ts` | 4 |
| 7 | Index/barrel exports | `src/tools/index.ts` + `builtin/index.ts` | 0 |
| 8 | Model types for tool use | `src/models/types.ts` | 4 |
| 9 | Anthropic tool use | `src/models/anthropic.ts` | 1+ |
| 10 | OpenAI tool use | `src/models/openai.ts` | 1+ |
| 11 | Agent loop | `src/backends/native/agent.ts` | 3+ |
| 12 | Wire into daemon | `src/daemon/index.ts` | 0 |
| 13 | TUI tool display | `src/frontends/tui/minimal.ts` | 0 |
| 14 | Telegram tool display | `src/frontends/telegram/*.ts` | 0 |
| 15 | Model index exports | `src/models/index.ts` | 0 |
| 16 | Integration tests | `src/tools/integration.test.ts` | 2 |
| 17 | System prompt update | `src/daemon/index.ts` | 0 |
**Total: ~47+ new tests across 18 tasks, ~16 new files, ~5 modified files**
**Execution model:** Opus 4.6 supervises and reviews. Subagents via GitHub Copilot execute implementation.
**Subagent models:**
- **Claude Haiku 4.5** (`github-copilot/claude-haiku-4.5`): Mechanical tasks (types, file tools, wiring, exports)
- **Claude Sonnet 4.5** (`github-copilot/claude-sonnet-4.5`): Complex tasks (registry, executor, model integration, agent loop, frontend updates)
**Task grouping for subagents:**
- **Haiku 4.5** (mechanical): Tasks 0, 1, 5, 6, 7, 12, 15, 17
- **Sonnet 4.5** (complex): Tasks 2, 3, 4, 8, 9, 10, 11, 13, 14, 16
**Estimated effort:** Tasks 0-7 are foundational (types + tools). Tasks 8-11 are core complexity (model integration + agent loop). Tasks 12-17 are wiring/polish.