# Phase 1: Agent Tool Framework + Agent Loop — Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Add a tool execution framework with native function calling (Anthropic/OpenAI) and an iterative agent loop so Flynn can run shell commands, read/write/edit files, fetch web pages, and chain multiple tool calls per turn.

**Architecture:** Tools are defined as typed objects with JSON Schema inputs and an async `execute` method. A ToolRegistry collects them and serializes them to provider-specific formats. A ToolExecutor wraps execution with hook checks, timeouts, and output truncation. The NativeAgent gains an agentic loop: call model -> if tool_use, execute tools -> feed results back -> repeat until text response or max iterations.

**Tech Stack:** TypeScript (strict, NodeNext), Vitest, Anthropic SDK `@anthropic-ai/sdk`, OpenAI SDK `openai`, Node.js `child_process` for shell, `fs` for file ops, `fetch` for web.

**Build model policy:** Opus 4.6 supervises and reviews. Sonnet/Haiku via GitHub Copilot execute implementation tasks as subagents. Each task is dispatched to a subagent and reviewed by Opus before committing.

---
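The agentic loop described under **Architecture** can be sketched roughly as follows. This is a minimal illustration, not the NativeAgent implementation from later tasks: `ModelTurn`, `HistoryEntry`, `LoopResult`, `runAgentLoop`, and the injected `callModel`/`executeTool` functions are hypothetical names, and the default of 10 iterations is an assumed placeholder. Injecting the two functions keeps the loop independent of any provider SDK, which also makes it easy to test with fakes.

```typescript
// Hypothetical shapes for illustration only; the plan's real NativeAgent and
// provider clients are defined in later tasks.
type ModelTurn =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; calls: { id: string; name: string; args: unknown }[] };

interface LoopResult {
  success: boolean;
  output: string;
  error?: string;
}

type HistoryEntry =
  | { role: 'user'; text: string }
  | { role: 'assistant'; turn: ModelTurn }
  | { role: 'tool'; callId: string; result: LoopResult };

async function runAgentLoop(
  callModel: (history: HistoryEntry[]) => Promise<ModelTurn>,
  executeTool: (name: string, args: unknown) => Promise<LoopResult>,
  userText: string,
  maxIterations = 10, // assumed default, not taken from the plan
): Promise<string> {
  const history: HistoryEntry[] = [{ role: 'user', text: userText }];

  for (let i = 0; i < maxIterations; i++) {
    const turn = await callModel(history);
    history.push({ role: 'assistant', turn });

    // A plain text response ends the loop.
    if (turn.type === 'text') return turn.text;

    // Otherwise run each requested tool and feed its result back to the model.
    for (const call of turn.calls) {
      const result = await executeTool(call.name, call.args);
      history.push({ role: 'tool', callId: call.id, result });
    }
  }

  return `Stopped after ${maxIterations} iterations without a text response.`;
}
```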
## Task 0: SOUL.md + System Prompt Foundation

**Files:**
- Create: `SOUL.md` (project root)
- Modify: `src/daemon/index.ts` (load SOUL.md into system prompt)

**Step 1: Create SOUL.md**

Already created at project root. Defines Flynn's identity: direct, technical, opinionated, security-conscious. Loaded into every session.

**Step 2: Update daemon to load SOUL.md**

In `src/daemon/index.ts`, replace the hardcoded `SYSTEM_PROMPT` string with a loader that reads `SOUL.md` from the workspace root and prepends it to the system prompt:

```typescript
import { readFileSync, existsSync } from 'fs';
import { resolve } from 'path';

function loadSystemPrompt(): string {
  const soulPath = resolve(process.cwd(), 'SOUL.md');
  let soul = '';
  if (existsSync(soulPath)) {
    soul = readFileSync(soulPath, 'utf-8') + '\n\n';
  }
  return soul + TOOL_INSTRUCTIONS;
}
```

Where `TOOL_INSTRUCTIONS` is the tool-aware portion added in Task 17.

**Step 3: Commit**

```bash
git add SOUL.md src/daemon/index.ts
git commit -m "feat: add SOUL.md identity file and load into system prompt"
```

---

## Task 1: Tool Type Definitions

**Files:**
- Create: `src/tools/types.ts`
- Test: `src/tools/types.test.ts`

**Step 1: Write the failing test**

```typescript
// src/tools/types.test.ts
import { describe, it, expect } from 'vitest';
import type { Tool, ToolCall, ToolResult, ToolUseMessage, ToolResultMessage } from './types.js';

describe('Tool types', () => {
  it('Tool interface is structurally correct', () => {
    const tool: Tool = {
      name: 'test.echo',
      description: 'Echoes input',
      inputSchema: {
        type: 'object',
        properties: { text: { type: 'string' } },
        required: ['text'],
      },
      execute: async (args) => ({ success: true, output: String((args as { text: string }).text) }),
    };

    expect(tool.name).toBe('test.echo');
    expect(tool.inputSchema.type).toBe('object');
  });

  it('ToolCall has required fields', () => {
    const call: ToolCall = { id: 'call_1', name: 'test.echo', args: { text: 'hi' } };
    expect(call.id).toBe('call_1');
    expect(call.name).toBe('test.echo');
  });

  it('ToolResult has success and output', () => {
    const result: ToolResult = { success: true, output: 'hello' };
    expect(result.success).toBe(true);

    const errResult: ToolResult = { success: false, output: '', error: 'boom' };
    expect(errResult.error).toBe('boom');
  });

  it('ToolUseMessage has correct shape', () => {
    const msg: ToolUseMessage = {
      role: 'assistant',
      content: [{ type: 'tool_use', id: 'call_1', name: 'test.echo', input: { text: 'hi' } }],
    };
    expect(msg.role).toBe('assistant');
    expect(msg.content[0].type).toBe('tool_use');
  });

  it('ToolResultMessage has correct shape', () => {
    const msg: ToolResultMessage = {
      role: 'user',
      content: [{ type: 'tool_result', tool_use_id: 'call_1', content: 'output here' }],
    };
    expect(msg.role).toBe('user');
    expect(msg.content[0].type).toBe('tool_result');
  });
});
```
**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/tools/types.test.ts`
Expected: FAIL (module `./types.js` not found). Note that this test only uses `import type`, which is erased when the file is transpiled, so Vitest on its own may pass here; if it does, a type check such as `pnpm tsc --noEmit` should report the missing module instead.

**Step 3: Write the implementation**

```typescript
// src/tools/types.ts

export interface ToolInputSchema {
  type: 'object';
  properties: Record<string, unknown>;
  required?: string[];
}

export interface Tool {
  name: string;
  description: string;
  inputSchema: ToolInputSchema;
  execute(args: unknown): Promise<ToolResult>;
}

export interface ToolCall {
  id: string;
  name: string;
  args: unknown;
}

export interface ToolResult {
  success: boolean;
  output: string;
  error?: string;
}

// Content block for assistant messages containing tool calls
export interface ToolUseBlock {
  type: 'tool_use';
  id: string;
  name: string;
  input: unknown;
}

// Content block for user messages returning tool results
export interface ToolResultBlock {
  type: 'tool_result';
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Message from assistant requesting tool use
export interface ToolUseMessage {
  role: 'assistant';
  content: ToolUseBlock[];
}

// Message from user returning tool results
export interface ToolResultMessage {
  role: 'user';
  content: ToolResultBlock[];
}
```
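To show how these types are meant to connect, here is a minimal sketch of turning a `ToolResult` into the `ToolResultBlock` that goes back to the model. The helper name `toToolResultBlock` and the `Error: ...` text format are assumptions for illustration and are not defined anywhere in this plan; the two interfaces are repeated so the snippet stands alone.

```typescript
// Minimal copies of the interfaces above so the sketch is self-contained.
interface ToolResult {
  success: boolean;
  output: string;
  error?: string;
}

interface ToolResultBlock {
  type: 'tool_result';
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Hypothetical helper (not part of the plan): pair a result with its call id.
function toToolResultBlock(callId: string, result: ToolResult): ToolResultBlock {
  if (result.success) {
    return { type: 'tool_result', tool_use_id: callId, content: result.output };
  }
  // Keep any partial output so the model can see what happened before the failure.
  const detail = [result.output, `Error: ${result.error ?? 'unknown error'}`]
    .filter((part) => part.length > 0)
    .join('\n');
  return { type: 'tool_result', tool_use_id: callId, content: detail, is_error: true };
}
```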

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/tools/types.test.ts`
Expected: PASS (all 5 tests)

**Step 5: Commit**

```bash
git add src/tools/types.ts src/tools/types.test.ts
git commit -m "feat(tools): add tool type definitions"
```

---

## Task 2: Tool Registry

**Files:**
- Create: `src/tools/registry.ts`
- Test: `src/tools/registry.test.ts`

**Step 1: Write the failing test**

```typescript
// src/tools/registry.test.ts
import { describe, it, expect } from 'vitest';
import { ToolRegistry } from './registry.js';
import type { Tool } from './types.js';

const echoTool: Tool = {
  name: 'test.echo',
  description: 'Echoes input back',
  inputSchema: {
    type: 'object',
    properties: { text: { type: 'string', description: 'Text to echo' } },
    required: ['text'],
  },
  execute: async (args) => ({ success: true, output: String((args as { text: string }).text) }),
};

const greetTool: Tool = {
  name: 'test.greet',
  description: 'Greets someone',
  inputSchema: {
    type: 'object',
    properties: { name: { type: 'string' } },
    required: ['name'],
  },
  execute: async (args) => ({ success: true, output: `Hello ${(args as { name: string }).name}` }),
};

describe('ToolRegistry', () => {
  it('registers and retrieves tools by name', () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);

    expect(registry.get('test.echo')).toBe(echoTool);
    expect(registry.get('nonexistent')).toBeUndefined();
  });

  it('lists all registered tools', () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);
    registry.register(greetTool);

    const tools = registry.list();
    expect(tools).toHaveLength(2);
    expect(tools.map(t => t.name)).toContain('test.echo');
    expect(tools.map(t => t.name)).toContain('test.greet');
  });

  it('throws on duplicate registration', () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);
    expect(() => registry.register(echoTool)).toThrow('already registered');
  });

  it('serializes to Anthropic format', () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);

    const anthropicTools = registry.toAnthropicFormat();
    expect(anthropicTools).toEqual([{
      name: 'test.echo',
      description: 'Echoes input back',
      input_schema: echoTool.inputSchema,
    }]);
  });

  it('serializes to OpenAI format', () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);

    const openaiTools = registry.toOpenAIFormat();
    expect(openaiTools).toEqual([{
      type: 'function',
      function: {
        name: 'test.echo',
        description: 'Echoes input back',
        parameters: echoTool.inputSchema,
      },
    }]);
  });
});
```
**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/tools/registry.test.ts`
Expected: FAIL (module `./registry.js` not found)

**Step 3: Write the implementation**

```typescript
// src/tools/registry.ts
import type { Tool, ToolInputSchema } from './types.js';

export interface AnthropicToolDef {
  name: string;
  description: string;
  input_schema: ToolInputSchema;
}

export interface OpenAIToolDef {
  type: 'function';
  function: {
    name: string;
    description: string;
    parameters: ToolInputSchema;
  };
}

export class ToolRegistry {
  private tools: Map<string, Tool> = new Map();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool '${tool.name}' is already registered`);
    }
    this.tools.set(tool.name, tool);
  }

  get(name: string): Tool | undefined {
    return this.tools.get(name);
  }

  list(): Tool[] {
    return Array.from(this.tools.values());
  }

  toAnthropicFormat(): AnthropicToolDef[] {
    return this.list().map(t => ({
      name: t.name,
      description: t.description,
      input_schema: t.inputSchema,
    }));
  }

  toOpenAIFormat(): OpenAIToolDef[] {
    return this.list().map(t => ({
      type: 'function' as const,
      function: {
        name: t.name,
        description: t.description,
        parameters: t.inputSchema,
      },
    }));
  }
}
```
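One thing worth checking before wiring the serializers to a live provider: to the best of my knowledge, both the Anthropic and OpenAI APIs restrict tool and function names to letters, digits, underscores, and hyphens (up to 64 characters), so dotted names such as `test.echo` or `shell.exec` may be rejected as-is. If that turns out to apply, a reversible mapping is one way to keep the dotted names internally. The sketch below is an assumption, not part of the plan: `toProviderName`, `fromProviderName`, the `__` separator, and the regex are all illustrative choices. The round-trip check rejects any name that could not be mapped back unambiguously.

```typescript
// Assumption: provider tool names must match this pattern.
const PROVIDER_NAME = /^[a-zA-Z0-9_-]{1,64}$/;

// Hypothetical helper: 'shell__exec' -> 'shell.exec'
function fromProviderName(providerName: string): string {
  return providerName.replace(/__/g, '.');
}

// Hypothetical helper: 'shell.exec' -> 'shell__exec'
function toProviderName(toolName: string): string {
  const mapped = toolName.replace(/\./g, '__');
  // Reject names that are invalid for the provider or that would not map
  // back to the same internal name (e.g. names already containing '__').
  if (!PROVIDER_NAME.test(mapped) || fromProviderName(mapped) !== toolName) {
    throw new Error(`Tool name '${toolName}' cannot be mapped to a provider-safe name`);
  }
  return mapped;
}
```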

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/tools/registry.test.ts`
Expected: PASS (all 5 tests)

**Step 5: Commit**

```bash
git add src/tools/registry.ts src/tools/registry.test.ts
git commit -m "feat(tools): add ToolRegistry with provider serialization"
```

---

## Task 3: Tool Executor

**Files:**
- Create: `src/tools/executor.ts`
- Test: `src/tools/executor.test.ts`

**Step 1: Write the failing test**

```typescript
// src/tools/executor.test.ts
import { describe, it, expect, vi } from 'vitest';
import { ToolExecutor } from './executor.js';
import { ToolRegistry } from './registry.js';
import { HookEngine } from '../hooks/engine.js';
import type { Tool } from './types.js';

const echoTool: Tool = {
  name: 'test.echo',
  description: 'Echoes input',
  inputSchema: { type: 'object', properties: { text: { type: 'string' } }, required: ['text'] },
  execute: async (args) => ({ success: true, output: (args as { text: string }).text }),
};

const slowTool: Tool = {
  name: 'test.slow',
  description: 'Takes forever',
  inputSchema: { type: 'object', properties: {} },
  execute: async () => {
    await new Promise(r => setTimeout(r, 5000));
    return { success: true, output: 'done' };
  },
};

const failTool: Tool = {
  name: 'test.fail',
  description: 'Throws',
  inputSchema: { type: 'object', properties: {} },
  execute: async () => { throw new Error('kaboom'); },
};

const bigOutputTool: Tool = {
  name: 'test.big',
  description: 'Returns huge output',
  inputSchema: { type: 'object', properties: {} },
  execute: async () => ({ success: true, output: 'x'.repeat(100_000) }),
};

describe('ToolExecutor', () => {
  it('executes a tool and returns result', async () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const result = await executor.execute('test.echo', { text: 'hello' });
    expect(result.success).toBe(true);
    expect(result.output).toBe('hello');
  });

  it('returns error for unknown tool', async () => {
    const registry = new ToolRegistry();
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const result = await executor.execute('nonexistent', {});
    expect(result.success).toBe(false);
    expect(result.error).toContain('not found');
  });

  it('catches tool execution errors', async () => {
    const registry = new ToolRegistry();
    registry.register(failTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const result = await executor.execute('test.fail', {});
    expect(result.success).toBe(false);
    expect(result.error).toContain('kaboom');
  });

  it('enforces timeout', async () => {
    const registry = new ToolRegistry();
    registry.register(slowTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks, { defaultTimeoutMs: 100 });

    const result = await executor.execute('test.slow', {});
    expect(result.success).toBe(false);
    expect(result.error).toContain('timed out');
  });

  it('truncates large output', async () => {
    const registry = new ToolRegistry();
    registry.register(bigOutputTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks, { maxOutputBytes: 1000 });

    const result = await executor.execute('test.big', {});
    expect(result.success).toBe(true);
    expect(result.output.length).toBeLessThanOrEqual(1100); // 1000 + truncation message
    expect(result.output).toContain('[truncated]');
  });

  it('blocks on confirm hook and resolves when approved', async () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);
    const hooks = new HookEngine({ confirm: ['test.*'], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    // Start execution (will block on confirmation)
    const resultPromise = executor.execute('test.echo', { text: 'hi' });

    // Approve the pending confirmation
    const pending = hooks.getPendingConfirmations();
    expect(pending).toHaveLength(1);
    hooks.resolveConfirmation(pending[0].id, { approved: true });

    const result = await resultPromise;
    expect(result.success).toBe(true);
    expect(result.output).toBe('hi');
  });

  it('blocks on confirm hook and returns denied', async () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);
    const hooks = new HookEngine({ confirm: ['test.*'], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const resultPromise = executor.execute('test.echo', { text: 'hi' });

    const pending = hooks.getPendingConfirmations();
    hooks.resolveConfirmation(pending[0].id, { approved: false, reason: 'nope' });

    const result = await resultPromise;
    expect(result.success).toBe(false);
    expect(result.error).toContain('denied');
  });
});
```
**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/tools/executor.test.ts`
Expected: FAIL (module `./executor.js` not found)

**Step 3: Write the implementation**

```typescript
// src/tools/executor.ts
import type { ToolResult } from './types.js';
import type { ToolRegistry } from './registry.js';
import type { HookEngine } from '../hooks/engine.js';

export interface ToolExecutorConfig {
  defaultTimeoutMs?: number;
  maxOutputBytes?: number;
}

export class ToolExecutor {
  private registry: ToolRegistry;
  private hooks: HookEngine;
  private defaultTimeoutMs: number;
  private maxOutputBytes: number;

  constructor(registry: ToolRegistry, hooks: HookEngine, config?: ToolExecutorConfig) {
    this.registry = registry;
    this.hooks = hooks;
    this.defaultTimeoutMs = config?.defaultTimeoutMs ?? 30_000;
    this.maxOutputBytes = config?.maxOutputBytes ?? 51_200;
  }

  async execute(toolName: string, args: unknown): Promise<ToolResult> {
    const tool = this.registry.get(toolName);
    if (!tool) {
      return { success: false, output: '', error: `Tool '${toolName}' not found` };
    }

    // Check hooks
    const action = this.hooks.getAction(toolName);
    if (action === 'confirm') {
      const hookResult = await this.hooks.requestConfirmation(
        toolName,
        args as Record<string, unknown>,
      );
      if (!hookResult.approved) {
        return {
          success: false,
          output: '',
          error: `Tool '${toolName}' denied by user: ${hookResult.reason ?? 'no reason'}`,
        };
      }
    }

    // Execute with timeout
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      const result = await Promise.race([
        tool.execute(args),
        new Promise<ToolResult>((_, reject) => {
          timer = setTimeout(
            () => reject(new Error(`Tool '${toolName}' timed out after ${this.defaultTimeoutMs}ms`)),
            this.defaultTimeoutMs,
          );
        }),
      ]);

      // Truncate output if too large (compares string length, not encoded bytes)
      if (result.output.length > this.maxOutputBytes) {
        return { ...result, output: result.output.slice(0, this.maxOutputBytes) + '\n[truncated]' };
      }

      return result;
    } catch (error) {
      return {
        success: false,
        output: '',
        error: error instanceof Error ? error.message : String(error),
      };
    } finally {
      // Clear the timer so a fast tool does not leave a pending timeout behind
      clearTimeout(timer);
    }
  }
}
```
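The option is named `maxOutputBytes`, but the check above compares it against `output.length`, which counts UTF-16 code units rather than bytes. For ASCII output the two agree; for multi-byte text the encoded size can be noticeably larger. If a true byte budget matters, one option is to truncate on the encoded buffer. The following is a minimal sketch under that assumption; `truncateUtf8` is a hypothetical helper name and is not part of the plan.

```typescript
import { Buffer } from 'node:buffer';

// Hypothetical helper: cut `text` to at most `maxBytes` of UTF-8, never
// splitting a multi-byte character, and append a marker when truncated.
function truncateUtf8(text: string, maxBytes: number, marker = '\n[truncated]'): string {
  const encoded = Buffer.from(text, 'utf-8');
  if (encoded.length <= maxBytes) return text;

  // Move the cut point back while the first excluded byte is a UTF-8
  // continuation byte (bit pattern 10xxxxxx), i.e. while we are mid-character.
  let end = Math.max(0, maxBytes);
  while (end > 0 && (encoded.readUInt8(end) & 0xc0) === 0x80) end--;

  return encoded.subarray(0, end).toString('utf-8') + marker;
}
```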

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/tools/executor.test.ts`
Expected: PASS (all 7 tests)

**Step 5: Commit**

```bash
git add src/tools/executor.ts src/tools/executor.test.ts
git commit -m "feat(tools): add ToolExecutor with hooks, timeout, truncation"
```

---

## Task 4: Shell Exec Tool

**Files:**
- Create: `src/tools/builtin/shell.ts`
- Test: `src/tools/builtin/shell.test.ts`

**Step 1: Write the failing test**

```typescript
// src/tools/builtin/shell.test.ts
import { describe, it, expect } from 'vitest';
import { shellExecTool } from './shell.js';
import { tmpdir } from 'os';
import { mkdtempSync, writeFileSync, rmSync } from 'fs';
import { join } from 'path';

describe('shell.exec tool', () => {
  it('has correct metadata', () => {
    expect(shellExecTool.name).toBe('shell.exec');
    expect(shellExecTool.inputSchema.required).toContain('command');
  });

  it('runs a simple command', async () => {
    const result = await shellExecTool.execute({ command: 'echo hello' });
    expect(result.success).toBe(true);
    expect(result.output.trim()).toBe('hello');
  });

  it('captures stderr on failure', async () => {
    const result = await shellExecTool.execute({ command: 'ls /nonexistent_dir_xyz' });
    expect(result.success).toBe(false);
    expect(result.error).toBeTruthy();
  });

  it('respects cwd parameter', async () => {
    const dir = mkdtempSync(join(tmpdir(), 'flynn-test-'));
    writeFileSync(join(dir, 'test.txt'), 'content');
    try {
      const result = await shellExecTool.execute({ command: 'ls test.txt', cwd: dir });
      expect(result.success).toBe(true);
      expect(result.output.trim()).toBe('test.txt');
    } finally {
      rmSync(dir, { recursive: true });
    }
  });

  it('respects timeout parameter', async () => {
    const result = await shellExecTool.execute({ command: 'sleep 10', timeout: 200 });
    expect(result.success).toBe(false);
    expect(result.error).toContain('timed out');
  });
});
```
**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/tools/builtin/shell.test.ts`
Expected: FAIL (module `./shell.js` not found)

**Step 3: Write the implementation**

```typescript
// src/tools/builtin/shell.ts
import { execFile } from 'child_process';
import type { Tool, ToolResult } from '../types.js';

interface ShellExecArgs {
  command: string;
  cwd?: string;
  timeout?: number;
}

export const shellExecTool: Tool = {
  name: 'shell.exec',
  description: 'Execute a shell command and return stdout/stderr. Use for running build commands, git operations, system tasks, etc.',
  inputSchema: {
    type: 'object',
    properties: {
      command: { type: 'string', description: 'The shell command to execute' },
      cwd: { type: 'string', description: 'Working directory (optional)' },
      timeout: { type: 'number', description: 'Timeout in milliseconds (default 30000)' },
    },
    required: ['command'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as ShellExecArgs;
    const timeout = args.timeout ?? 30_000;

    return new Promise((resolve) => {
      execFile('bash', ['-c', args.command], {
        cwd: args.cwd,
        timeout,
        maxBuffer: 1024 * 1024, // 1MB
      }, (error, stdout, stderr) => {
        if (error) {
          if (error.killed || error.signal === 'SIGTERM') {
            resolve({ success: false, output: stdout, error: `Command timed out after ${timeout}ms` });
            return;
          }
          resolve({
            success: false,
            output: stdout,
            error: stderr || error.message,
          });
          return;
        }
        resolve({ success: true, output: stdout + (stderr ? `\nstderr: ${stderr}` : '') });
      });
    });
  },
};
```

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/tools/builtin/shell.test.ts`
Expected: PASS (all 5 tests)

**Step 5: Commit**

```bash
git add src/tools/builtin/shell.ts src/tools/builtin/shell.test.ts
git commit -m "feat(tools): add shell.exec builtin tool"
```

---

## Task 5: File Tools (read, write, edit, list)

**Files:**
- Create: `src/tools/builtin/file-read.ts`
- Create: `src/tools/builtin/file-write.ts`
- Create: `src/tools/builtin/file-edit.ts`
- Create: `src/tools/builtin/file-list.ts`
- Test: `src/tools/builtin/file.test.ts` (all four in one test file)

**Step 1: Write the failing test**

```typescript
// src/tools/builtin/file.test.ts
import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { fileReadTool } from './file-read.js';
import { fileWriteTool } from './file-write.js';
import { fileEditTool } from './file-edit.js';
import { fileListTool } from './file-list.js';
import { mkdtempSync, writeFileSync, readFileSync, rmSync, mkdirSync } from 'fs';
import { join } from 'path';
import { tmpdir } from 'os';

let testDir: string;

beforeEach(() => {
  testDir = mkdtempSync(join(tmpdir(), 'flynn-file-test-'));
});

afterEach(() => {
  rmSync(testDir, { recursive: true });
});

describe('file.read', () => {
  it('reads a file', async () => {
    writeFileSync(join(testDir, 'hello.txt'), 'hello world');
    const result = await fileReadTool.execute({ path: join(testDir, 'hello.txt') });
    expect(result.success).toBe(true);
    expect(result.output).toBe('hello world');
  });

  it('reads with offset and limit', async () => {
    writeFileSync(join(testDir, 'lines.txt'), 'line1\nline2\nline3\nline4\n');
    const result = await fileReadTool.execute({ path: join(testDir, 'lines.txt'), offset: 1, limit: 2 });
    expect(result.success).toBe(true);
    expect(result.output).toBe('line2\nline3');
  });

  it('returns error for missing file', async () => {
    const result = await fileReadTool.execute({ path: join(testDir, 'nope.txt') });
    expect(result.success).toBe(false);
    expect(result.error).toBeTruthy();
  });
});

describe('file.write', () => {
  it('writes a new file', async () => {
    const filePath = join(testDir, 'new.txt');
    const result = await fileWriteTool.execute({ path: filePath, content: 'new content' });
    expect(result.success).toBe(true);
    expect(readFileSync(filePath, 'utf-8')).toBe('new content');
  });

  it('creates intermediate directories', async () => {
    const filePath = join(testDir, 'sub', 'dir', 'file.txt');
    const result = await fileWriteTool.execute({ path: filePath, content: 'deep' });
    expect(result.success).toBe(true);
    expect(readFileSync(filePath, 'utf-8')).toBe('deep');
  });
});

describe('file.edit', () => {
  it('replaces a string in a file', async () => {
    const filePath = join(testDir, 'edit.txt');
    writeFileSync(filePath, 'hello world');
    const result = await fileEditTool.execute({
      path: filePath,
      old_string: 'world',
      new_string: 'flynn',
    });
    expect(result.success).toBe(true);
    expect(readFileSync(filePath, 'utf-8')).toBe('hello flynn');
  });

  it('fails if old_string not found', async () => {
    const filePath = join(testDir, 'edit2.txt');
    writeFileSync(filePath, 'hello world');
    const result = await fileEditTool.execute({
      path: filePath,
      old_string: 'xyz',
      new_string: 'abc',
    });
    expect(result.success).toBe(false);
    expect(result.error).toContain('not found');
  });

  it('fails if old_string matches multiple times without replace_all', async () => {
    const filePath = join(testDir, 'edit3.txt');
    writeFileSync(filePath, 'aaa bbb aaa');
    const result = await fileEditTool.execute({
      path: filePath,
      old_string: 'aaa',
      new_string: 'ccc',
    });
    expect(result.success).toBe(false);
    expect(result.error).toContain('multiple');
  });

  it('replaces all when replace_all is true', async () => {
    const filePath = join(testDir, 'edit4.txt');
    writeFileSync(filePath, 'aaa bbb aaa');
    const result = await fileEditTool.execute({
      path: filePath,
      old_string: 'aaa',
      new_string: 'ccc',
      replace_all: true,
    });
    expect(result.success).toBe(true);
    expect(readFileSync(filePath, 'utf-8')).toBe('ccc bbb ccc');
  });
});

describe('file.list', () => {
  it('lists files in a directory', async () => {
    writeFileSync(join(testDir, 'a.txt'), '');
    writeFileSync(join(testDir, 'b.ts'), '');
    mkdirSync(join(testDir, 'sub'));
    writeFileSync(join(testDir, 'sub', 'c.txt'), '');

    const result = await fileListTool.execute({ path: testDir });
    expect(result.success).toBe(true);
    expect(result.output).toContain('a.txt');
    expect(result.output).toContain('b.ts');
    expect(result.output).toContain('sub');
  });

  it('filters with glob pattern', async () => {
    writeFileSync(join(testDir, 'a.txt'), '');
    writeFileSync(join(testDir, 'b.ts'), '');

    const result = await fileListTool.execute({ path: testDir, pattern: '*.ts' });
    expect(result.success).toBe(true);
    expect(result.output).toContain('b.ts');
    expect(result.output).not.toContain('a.txt');
  });
});
```
**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/tools/builtin/file.test.ts`
Expected: FAIL (modules not found)

**Step 3: Write the implementations**

```typescript
// src/tools/builtin/file-read.ts
import { readFileSync } from 'fs';
import type { Tool, ToolResult } from '../types.js';

interface FileReadArgs {
  path: string;
  offset?: number; // 0-based line offset
  limit?: number; // number of lines
}

export const fileReadTool: Tool = {
  name: 'file.read',
  description: 'Read the contents of a file. Optionally read specific lines with offset and limit.',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Absolute path to the file' },
      offset: { type: 'number', description: 'Line offset to start reading from (0-based)' },
      limit: { type: 'number', description: 'Number of lines to read' },
    },
    required: ['path'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as FileReadArgs;
    try {
      const content = readFileSync(args.path, 'utf-8');
      if (args.offset !== undefined || args.limit !== undefined) {
        const lines = content.split('\n');
        const start = args.offset ?? 0;
        const end = args.limit !== undefined ? start + args.limit : lines.length;
        return { success: true, output: lines.slice(start, end).join('\n') };
      }
      return { success: true, output: content };
    } catch (error) {
      return { success: false, output: '', error: error instanceof Error ? error.message : String(error) };
    }
  },
};
```
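As written, `file.read` will read any path the process can access, and the same holds for the other file tools in this task. Since SOUL.md describes Flynn as security-conscious, one option is to confine the file tools to a workspace root before touching the filesystem. The sketch below assumes a hypothetical `resolveInside` helper and a caller-supplied root; neither is part of the plan. It only normalizes path strings and does not resolve symlinks, so it is a first line of defence rather than a sandbox.

```typescript
import { resolve, relative, isAbsolute, sep } from 'node:path';

// Hypothetical helper: resolve `requested` against `root` and refuse any
// result that lands outside the root directory.
function resolveInside(root: string, requested: string): string {
  const absRoot = resolve(root);
  const target = resolve(absRoot, requested);
  const rel = relative(absRoot, target);

  // Outside the root, the relative path either climbs with '..' or, on
  // Windows across drives, is itself absolute.
  const escapes = rel === '..' || rel.startsWith('..' + sep) || isAbsolute(rel);
  if (escapes) {
    throw new Error(`Path '${requested}' escapes workspace root '${absRoot}'`);
  }
  return target;
}
```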

```typescript
// src/tools/builtin/file-write.ts
import { writeFileSync, mkdirSync } from 'fs';
import { dirname } from 'path';
import type { Tool, ToolResult } from '../types.js';

interface FileWriteArgs {
  path: string;
  content: string;
}

export const fileWriteTool: Tool = {
  name: 'file.write',
  description: 'Write content to a file. Creates the file and parent directories if they do not exist.',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Absolute path to write to' },
      content: { type: 'string', description: 'Content to write' },
    },
    required: ['path', 'content'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as FileWriteArgs;
    try {
      mkdirSync(dirname(args.path), { recursive: true });
      writeFileSync(args.path, args.content, 'utf-8');
      // Report encoded bytes; content.length would count UTF-16 code units instead
      const bytes = Buffer.byteLength(args.content, 'utf-8');
      return { success: true, output: `Wrote ${bytes} bytes to ${args.path}` };
    } catch (error) {
      return { success: false, output: '', error: error instanceof Error ? error.message : String(error) };
    }
  },
};
```

```typescript
// src/tools/builtin/file-edit.ts
import { readFileSync, writeFileSync } from 'fs';
import type { Tool, ToolResult } from '../types.js';

interface FileEditArgs {
  path: string;
  old_string: string;
  new_string: string;
  replace_all?: boolean;
}

export const fileEditTool: Tool = {
  name: 'file.edit',
  description: 'Edit a file by replacing an exact string match. Fails if old_string is not found or matches multiple times (unless replace_all is true).',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Absolute path to the file' },
      old_string: { type: 'string', description: 'Exact string to find' },
      new_string: { type: 'string', description: 'Replacement string' },
      replace_all: { type: 'boolean', description: 'Replace all occurrences (default false)' },
    },
    required: ['path', 'old_string', 'new_string'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as FileEditArgs;
    try {
      // An empty search string matches everywhere, so reject it up front
      if (args.old_string === '') {
        return { success: false, output: '', error: 'old_string must not be empty' };
      }

      const content = readFileSync(args.path, 'utf-8');

      if (!content.includes(args.old_string)) {
        return { success: false, output: '', error: `old_string not found in ${args.path}` };
      }

      // Count occurrences
      const count = content.split(args.old_string).length - 1;
      if (count > 1 && !args.replace_all) {
        return { success: false, output: '', error: `old_string found multiple times (${count}). Use replace_all or provide more context.` };
      }

      // Use replacer functions so sequences like '$&' or '$$' in new_string
      // are inserted literally instead of being treated as replacement patterns
      const newContent = args.replace_all
        ? content.replaceAll(args.old_string, () => args.new_string)
        : content.replace(args.old_string, () => args.new_string);

      writeFileSync(args.path, newContent, 'utf-8');
      return { success: true, output: `Edited ${args.path} (${count} replacement${count > 1 ? 's' : ''})` };
    } catch (error) {
      return { success: false, output: '', error: error instanceof Error ? error.message : String(error) };
    }
  },
};
```
|
|
|
|
```typescript
// src/tools/builtin/file-list.ts
import { readdirSync } from 'fs';
import type { Tool, ToolResult } from '../types.js';

interface FileListArgs {
  path: string;
  pattern?: string;
}

function matchGlob(name: string, pattern: string): boolean {
  // Escape every regex metacharacter except `*`, then expand `*` to `.*`
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
  return regex.test(name);
}

export const fileListTool: Tool = {
  name: 'file.list',
  description: 'List files and directories in a given path. Optionally filter with a glob pattern.',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Directory path to list' },
      pattern: { type: 'string', description: 'Glob pattern to filter results (e.g. "*.ts")' },
    },
    required: ['path'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as FileListArgs;
    try {
      let entries = readdirSync(args.path, { withFileTypes: true });
      if (args.pattern) {
        entries = entries.filter(e => matchGlob(e.name, args.pattern!));
      }
      const output = entries
        .map(e => e.isDirectory() ? `${e.name}/` : e.name)
        .sort()
        .join('\n');
      return { success: true, output };
    } catch (error) {
      return { success: false, output: '', error: error instanceof Error ? error.message : String(error) };
    }
  },
};
```

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/tools/builtin/file.test.ts`
Expected: PASS (all 10 tests)

**Step 5: Commit**

```bash
git add src/tools/builtin/file-read.ts src/tools/builtin/file-write.ts src/tools/builtin/file-edit.ts src/tools/builtin/file-list.ts src/tools/builtin/file.test.ts
git commit -m "feat(tools): add file read/write/edit/list builtin tools"
```

---

## Task 6: Web Fetch Tool

**Files:**
- Create: `src/tools/builtin/web-fetch.ts`
- Test: `src/tools/builtin/web-fetch.test.ts`

**Step 1: Write the failing test**

```typescript
// src/tools/builtin/web-fetch.test.ts
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { webFetchTool } from './web-fetch.js';

// Mock global fetch
const mockFetch = vi.fn();
vi.stubGlobal('fetch', mockFetch);

beforeEach(() => {
  mockFetch.mockReset();
});

describe('web.fetch', () => {
  it('has correct metadata', () => {
    expect(webFetchTool.name).toBe('web.fetch');
    expect(webFetchTool.inputSchema.required).toContain('url');
  });

  it('fetches a URL and returns body text', async () => {
    mockFetch.mockResolvedValue({
      ok: true,
      status: 200,
      text: async () => '<html><body><h1>Hello</h1><p>World</p></body></html>',
      headers: new Headers({ 'content-type': 'text/html' }),
    });

    const result = await webFetchTool.execute({ url: 'https://example.com' });
    expect(result.success).toBe(true);
    expect(result.output).toBeTruthy();
    expect(mockFetch).toHaveBeenCalledWith('https://example.com', expect.any(Object));
  });

  it('returns error on HTTP failure', async () => {
    mockFetch.mockResolvedValue({
      ok: false,
      status: 404,
      text: async () => 'Not Found',
      headers: new Headers(),
    });

    const result = await webFetchTool.execute({ url: 'https://example.com/nope' });
    expect(result.success).toBe(false);
    expect(result.error).toContain('404');
  });

  it('returns error on network failure', async () => {
    mockFetch.mockRejectedValue(new Error('network error'));

    const result = await webFetchTool.execute({ url: 'https://down.example.com' });
    expect(result.success).toBe(false);
    expect(result.error).toContain('network error');
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/tools/builtin/web-fetch.test.ts`
Expected: FAIL (module not found)

**Step 3: Write the implementation**

```typescript
// src/tools/builtin/web-fetch.ts
import type { Tool, ToolResult } from '../types.js';

interface WebFetchArgs {
  url: string;
  timeout?: number;
}

export const webFetchTool: Tool = {
  name: 'web.fetch',
  description: 'Fetch the content of a URL via HTTP GET. Returns the response body as text.',
  inputSchema: {
    type: 'object',
    properties: {
      url: { type: 'string', description: 'The URL to fetch' },
      timeout: { type: 'number', description: 'Timeout in milliseconds (default 15000)' },
    },
    required: ['url'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as WebFetchArgs;
    const timeout = args.timeout ?? 15_000;

    try {
      const response = await fetch(args.url, {
        signal: AbortSignal.timeout(timeout),
        headers: {
          'User-Agent': 'Flynn/0.1 (personal AI assistant)',
          'Accept': 'text/html, application/json, text/plain, */*',
        },
      });

      if (!response.ok) {
        return {
          success: false,
          output: '',
          error: `HTTP ${response.status}: ${await response.text()}`,
        };
      }

      const body = await response.text();
      return { success: true, output: body };
    } catch (error) {
      return {
        success: false,
        output: '',
        error: error instanceof Error ? error.message : String(error),
      };
    }
  },
};
```

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/tools/builtin/web-fetch.test.ts`
Expected: PASS (all 4 tests)

**Step 5: Commit**

```bash
git add src/tools/builtin/web-fetch.ts src/tools/builtin/web-fetch.test.ts
git commit -m "feat(tools): add web.fetch builtin tool"
```

---

## Task 7: Tools Index + Register All Builtins

**Files:**
- Create: `src/tools/index.ts`
- Create: `src/tools/builtin/index.ts`

**Step 1: Create the barrel exports**

```typescript
// src/tools/builtin/index.ts
export { shellExecTool } from './shell.js';
export { fileReadTool } from './file-read.js';
export { fileWriteTool } from './file-write.js';
export { fileEditTool } from './file-edit.js';
export { fileListTool } from './file-list.js';
export { webFetchTool } from './web-fetch.js';

import type { Tool } from '../types.js';
import { shellExecTool } from './shell.js';
import { fileReadTool } from './file-read.js';
import { fileWriteTool } from './file-write.js';
import { fileEditTool } from './file-edit.js';
import { fileListTool } from './file-list.js';
import { webFetchTool } from './web-fetch.js';

export const allBuiltinTools: Tool[] = [
  shellExecTool,
  fileReadTool,
  fileWriteTool,
  fileEditTool,
  fileListTool,
  webFetchTool,
];
```

```typescript
// src/tools/index.ts
export type { Tool, ToolCall, ToolResult, ToolInputSchema, ToolUseBlock, ToolResultBlock, ToolUseMessage, ToolResultMessage } from './types.js';
export { ToolRegistry } from './registry.js';
export type { AnthropicToolDef, OpenAIToolDef } from './registry.js';
export { ToolExecutor } from './executor.js';
export type { ToolExecutorConfig } from './executor.js';
export { allBuiltinTools } from './builtin/index.js';
export { shellExecTool } from './builtin/shell.js';
export { fileReadTool } from './builtin/file-read.js';
export { fileWriteTool } from './builtin/file-write.js';
export { fileEditTool } from './builtin/file-edit.js';
export { fileListTool } from './builtin/file-list.js';
export { webFetchTool } from './builtin/web-fetch.js';
```

**Step 2: Run all tool tests to verify nothing broke**

Run: `pnpm vitest run src/tools/`
Expected: All tests PASS

**Step 3: Commit**

```bash
git add src/tools/index.ts src/tools/builtin/index.ts
git commit -m "feat(tools): add barrel exports and allBuiltinTools list"
```

---

## Task 8: Update Model Types for Tool Use

**Files:**
- Modify: `src/models/types.ts`
- Test: `src/models/types.test.ts` (new)

**Step 1: Write the failing test**

```typescript
// src/models/types.test.ts
import { describe, it, expect } from 'vitest';
import type { ChatRequest, ChatResponse, ToolMessage, ContentBlock } from './types.js';

describe('Model types with tool support', () => {
  it('ChatRequest accepts tools array', () => {
    const req: ChatRequest = {
      messages: [{ role: 'user', content: 'hi' }],
      tools: [{
        name: 'test',
        description: 'test tool',
        input_schema: { type: 'object', properties: {} },
      }],
    };
    expect(req.tools).toHaveLength(1);
  });

  it('ChatResponse has optional toolCalls', () => {
    const resp: ChatResponse = {
      content: '',
      stopReason: 'tool_use',
      usage: { inputTokens: 0, outputTokens: 0 },
      toolCalls: [{ id: 'call_1', name: 'test', args: {} }],
    };
    expect(resp.toolCalls).toHaveLength(1);
    expect(resp.stopReason).toBe('tool_use');
  });

  it('ToolMessage represents tool results in conversation', () => {
    const msg: ToolMessage = {
      role: 'tool_result',
      toolResults: [{ tool_use_id: 'call_1', content: 'result', is_error: false }],
    };
    expect(msg.role).toBe('tool_result');
    expect(msg.toolResults).toHaveLength(1);
  });

  it('ContentBlock can be text or tool_use', () => {
    const text: ContentBlock = { type: 'text', text: 'hello' };
    const tool: ContentBlock = { type: 'tool_use', id: 'c1', name: 'test', input: {} };
    expect(text.type).toBe('text');
    expect(tool.type).toBe('tool_use');
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/models/types.test.ts`
Expected: FAIL (`ToolMessage`, `ContentBlock` not exported)

**Step 3: Update types.ts**

Update `src/models/types.ts` to add tool-related types. Keep ALL existing types unchanged, add new ones:

```typescript
// src/models/types.ts

export interface Message {
  role: 'user' | 'assistant';
  content: string;
  timestamp?: number;
}

// Tool definition passed to model API
export interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: 'object';
    properties: Record<string, unknown>;
    required?: string[];
  };
}

// Individual tool call returned by model
export interface ModelToolCall {
  id: string;
  name: string;
  args: unknown;
}

// Content blocks for multi-content responses
export type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown };

// Tool result fed back into conversation
export interface ToolResultEntry {
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Message type for tool results (distinct from user/assistant)
export interface ToolMessage {
  role: 'tool_result';
  toolResults: ToolResultEntry[];
}

// Union type for all messages in a conversation
export type ConversationMessage = Message | ToolMessage;

export interface ChatRequest {
  messages: Message[];
  system?: string;
  maxTokens?: number;
  tools?: ToolDefinition[];
}

export interface ChatResponse {
  content: string;
  stopReason: 'end_turn' | 'max_tokens' | 'stop_sequence' | 'tool_use' | string;
  usage: TokenUsage;
  toolCalls?: ModelToolCall[];
}

export interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

export interface ChatStreamEvent {
  type: 'content' | 'done' | 'error' | 'tool_use';
  content?: string;
  usage?: TokenUsage;
  error?: Error;
  toolCall?: ModelToolCall;
}

export interface StreamingModelClient {
  chatStream(request: ChatRequest): AsyncIterable<ChatStreamEvent>;
}

export interface ModelClient {
  chat(request: ChatRequest): Promise<ChatResponse>;
  chatStream?(request: ChatRequest): AsyncIterable<ChatStreamEvent>;
}
```

**Step 4: Run test to verify it passes**

Run: `pnpm vitest run src/models/types.test.ts`
Expected: PASS

**Step 5: Run ALL existing model tests to verify no regressions**

Run: `pnpm vitest run src/models/`
Expected: All existing tests PASS (types are backward compatible: `Message` is unchanged and new fields are optional)

**Step 6: Commit**

```bash
git add src/models/types.ts src/models/types.test.ts
git commit -m "feat(models): add tool use types to model interfaces"
```

---

## Task 9: Anthropic Tool Use Support

**Files:**
- Modify: `src/models/anthropic.ts`
- Modify: `src/models/anthropic.test.ts`

**Step 1: Write the failing test (add to existing test file)**

Add these tests to `src/models/anthropic.test.ts`:

```typescript
// Add after existing describe blocks in src/models/anthropic.test.ts

describe('AnthropicClient tool use', () => {
  it('passes tools to API and parses tool_use response', async () => {
    // This test requires updating the mock to return tool_use blocks
    // We need to access the mock and override for this test
    const Anthropic = (await import('@anthropic-ai/sdk')).default;
    const mockInstance = new Anthropic();

    // Override create to return tool_use
    (mockInstance.messages.create as ReturnType<typeof vi.fn>).mockResolvedValueOnce({
      content: [
        { type: 'tool_use', id: 'toolu_01', name: 'shell.exec', input: { command: 'ls' } },
      ],
      stop_reason: 'tool_use',
      usage: { input_tokens: 20, output_tokens: 15 },
    });

    const client = new AnthropicClient({
      apiKey: 'test-key',
      model: 'claude-sonnet-4-20250514',
    });

    const response = await client.chat({
      messages: [{ role: 'user', content: 'list files' }],
      tools: [{
        name: 'shell.exec',
        description: 'Run shell command',
        input_schema: { type: 'object', properties: { command: { type: 'string' } }, required: ['command'] },
      }],
    });

    expect(response.stopReason).toBe('tool_use');
    expect(response.toolCalls).toHaveLength(1);
    expect(response.toolCalls![0]).toEqual({
      id: 'toolu_01',
      name: 'shell.exec',
      args: { command: 'ls' },
    });
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/models/anthropic.test.ts`
Expected: FAIL (`toolCalls` is undefined because the current code only extracts text blocks)

**Step 3: Update anthropic.ts to support tool use**

Replace the body of the `chat` method in `src/models/anthropic.ts`. Key changes:
1. Pass `tools` to `messages.create()` when present
2. Parse both `text` and `tool_use` content blocks from response
3. Return `toolCalls` array when tool_use blocks present

Updated `chat` method:

```typescript
async chat(request: ChatRequest): Promise<ChatResponse> {
  const params: Record<string, unknown> = {
    model: this.model,
    max_tokens: request.maxTokens ?? this.defaultMaxTokens,
    system: request.system,
    messages: request.messages.map((m) => ({
      role: m.role,
      content: m.content,
    })),
  };

  if (request.tools && request.tools.length > 0) {
    params.tools = request.tools;
  }

  const response = await this.client.messages.create(params as Parameters<typeof this.client.messages.create>[0]);

  const textContent = response.content.find((c) => c.type === 'text');
  const content = textContent?.type === 'text' ? textContent.text : '';

  const toolCalls = response.content
    .filter((c): c is { type: 'tool_use'; id: string; name: string; input: unknown } => c.type === 'tool_use')
    .map(c => ({ id: c.id, name: c.name, args: c.input }));

  return {
    content,
    stopReason: response.stop_reason ?? 'end_turn',
    usage: {
      inputTokens: response.usage.input_tokens,
      outputTokens: response.usage.output_tokens,
    },
    ...(toolCalls.length > 0 ? { toolCalls } : {}),
  };
}
```

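Also update `chatStream` similarly: pass the `tools` param and yield `{ type: 'tool_use', toolCall: {...} }` events for content blocks of type `tool_use`. Note that `content_block_start` carries only the block's `id` and `name`; the input arrives afterwards as `input_json_delta` fragments, so the event should be yielded once the block's JSON is complete (at `content_block_stop`).
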
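As a rough sketch of the streaming side (not the plan's actual implementation), the function below folds Anthropic-style stream events into complete tool calls. The event shapes are simplified plain objects modelled on the Messages streaming API (`content_block_start`, `input_json_delta`, `content_block_stop`); the `StreamEvent` type and the `collectToolCalls` name are hypothetical, and a real `chatStream` would yield each call as a `tool_use` event instead of returning an array.

```typescript
// Hypothetical, simplified event shapes modelled on Anthropic's streaming events.
type StreamEvent =
  | { type: 'content_block_start'; index: number; content_block: { type: 'text' } | { type: 'tool_use'; id: string; name: string } }
  | { type: 'content_block_delta'; index: number; delta: { type: 'text_delta'; text: string } | { type: 'input_json_delta'; partial_json: string } }
  | { type: 'content_block_stop'; index: number };

// Same shape as ModelToolCall in src/models/types.ts.
interface ModelToolCall {
  id: string;
  name: string;
  args: unknown;
}

// Accumulates partial JSON per content block and emits a tool call
// only once its block has been closed.
function collectToolCalls(events: StreamEvent[]): ModelToolCall[] {
  const pending = new Map<number, { id: string; name: string; json: string }>();
  const calls: ModelToolCall[] = [];

  for (const event of events) {
    if (event.type === 'content_block_start') {
      if (event.content_block.type === 'tool_use') {
        pending.set(event.index, { id: event.content_block.id, name: event.content_block.name, json: '' });
      }
    } else if (event.type === 'content_block_delta') {
      const block = pending.get(event.index);
      if (block && event.delta.type === 'input_json_delta') {
        block.json += event.delta.partial_json;
      }
    } else {
      const block = pending.get(event.index);
      if (block) {
        pending.delete(event.index);
        // A tool called without arguments may stream no JSON at all; treat that as {}.
        calls.push({ id: block.id, name: block.name, args: block.json ? JSON.parse(block.json) : {} });
      }
    }
  }
  return calls;
}
```

Keying the pending blocks by `index` keeps the sketch correct when a response interleaves text blocks with several tool calls.
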
**Step 4: Run tests to verify they pass**

Run: `pnpm vitest run src/models/anthropic.test.ts`
Expected: All PASS

**Step 5: Commit**

```bash
git add src/models/anthropic.ts src/models/anthropic.test.ts
git commit -m "feat(models): add tool use support to AnthropicClient"
```

---

## Task 10: OpenAI Tool Use Support

**Files:**
- Modify: `src/models/openai.ts`
- Modify: `src/models/openai.test.ts`

**Step 1: Write the failing test**

Add to `src/models/openai.test.ts`:

```typescript
describe('OpenAIClient tool use', () => {
  it('passes tools to API and parses tool_calls response', async () => {
    const OpenAI = (await import('openai')).default;
    const mockInstance = new OpenAI();

    (mockInstance.chat.completions.create as ReturnType<typeof vi.fn>).mockResolvedValueOnce({
      choices: [{
        message: {
          content: null,
          tool_calls: [{
            id: 'call_1',
            type: 'function',
            function: { name: 'shell.exec', arguments: '{"command":"ls"}' },
          }],
        },
        finish_reason: 'tool_calls',
      }],
      usage: { prompt_tokens: 20, completion_tokens: 15 },
    });

    const client = new OpenAIClient({
      apiKey: 'test-key',
      model: 'gpt-4o',
    });

    const response = await client.chat({
      messages: [{ role: 'user', content: 'list files' }],
      tools: [{
        name: 'shell.exec',
        description: 'Run shell command',
        input_schema: { type: 'object', properties: { command: { type: 'string' } }, required: ['command'] },
      }],
    });

    expect(response.stopReason).toBe('tool_calls');
    expect(response.toolCalls).toHaveLength(1);
    expect(response.toolCalls![0]).toEqual({
      id: 'call_1',
      name: 'shell.exec',
      args: { command: 'ls' },
    });
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/models/openai.test.ts`
Expected: FAIL (`toolCalls` undefined)

**Step 3: Update openai.ts**

Update `chat` method to:
1. Convert `tools` to OpenAI format (`{ type: 'function', function: { name, description, parameters } }`)
2. Parse `tool_calls` from response choice
3. Return `toolCalls` array with parsed JSON arguments

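The two conversions listed above might look roughly like the sketch below. It is an illustration under assumptions rather than the final `openai.ts` code: `toOpenAITools` and `parseOpenAIToolCalls` are hypothetical helper names, and `OpenAIToolDef` and `OpenAIToolCall` are local, simplified stand-ins for the SDK's types that carry only the fields used here.

```typescript
// Shapes copied from the plan's src/models/types.ts.
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: { type: 'object'; properties: Record<string, unknown>; required?: string[] };
}

interface ModelToolCall {
  id: string;
  name: string;
  args: unknown;
}

// Simplified stand-ins for the OpenAI request/response shapes (assumption).
interface OpenAIToolDef {
  type: 'function';
  function: { name: string; description: string; parameters: ToolDefinition['input_schema'] };
}

interface OpenAIToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

// Step 1: wrap each tool definition in OpenAI's { type: 'function', function: {...} } envelope.
function toOpenAITools(tools: ToolDefinition[]): OpenAIToolDef[] {
  return tools.map((t) => ({
    type: 'function' as const,
    function: { name: t.name, description: t.description, parameters: t.input_schema },
  }));
}

// Steps 2 and 3: map response tool_calls to ModelToolCall, parsing the JSON-encoded arguments.
function parseOpenAIToolCalls(toolCalls: OpenAIToolCall[] | null | undefined): ModelToolCall[] {
  return (toolCalls ?? []).map((tc) => {
    let args: unknown;
    try {
      args = JSON.parse(tc.function.arguments);
    } catch {
      // The model can emit malformed JSON; keep the raw string so the caller can report it.
      args = tc.function.arguments;
    }
    return { id: tc.id, name: tc.function.name, args };
  });
}
```

Note that OpenAI function names are restricted to letters, digits, `_` and `-`, so a dotted name such as `shell.exec` would presumably need to be mapped (for example by the registry) before the request is sent; the sketch passes names through unchanged.
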
**Step 4: Run tests**

Run: `pnpm vitest run src/models/openai.test.ts`
Expected: PASS

**Step 5: Commit**

```bash
git add src/models/openai.ts src/models/openai.test.ts
git commit -m "feat(models): add tool use support to OpenAIClient"
```

---

## Task 11: Agent Loop

**Files:**
- Modify: `src/backends/native/agent.ts`
- Modify: `src/backends/native/agent.test.ts`

This is the biggest task. The NativeAgent `process()` method changes from single-turn to iterative loop.

**Step 1: Write the failing test**

Add to `src/backends/native/agent.test.ts`:

```typescript
import { ToolRegistry, ToolExecutor, allBuiltinTools } from '../../tools/index.js';
import { HookEngine } from '../../hooks/index.js';
import type { Tool } from '../../tools/index.js';

// Simple test tool
const echoTool: Tool = {
  name: 'test.echo',
  description: 'Echo',
  inputSchema: { type: 'object', properties: { text: { type: 'string' } }, required: ['text'] },
  execute: async (args) => ({ success: true, output: (args as { text: string }).text }),
};

describe('NativeAgent tool loop', () => {
  it('executes tool calls and feeds results back', async () => {
    let callCount = 0;
    const mockClient: ModelClient = {
      chat: vi.fn().mockImplementation(() => {
        callCount++;
        if (callCount === 1) {
          // First call: model requests tool use
          return {
            content: '',
            stopReason: 'tool_use',
            usage: { inputTokens: 10, outputTokens: 5 },
            toolCalls: [{ id: 'call_1', name: 'test.echo', args: { text: 'hello' } }],
          };
        }
        // Second call: model gives final text response
        return {
          content: 'The tool returned: hello',
          stopReason: 'end_turn',
          usage: { inputTokens: 15, outputTokens: 10 },
        };
      }),
    };

    const registry = new ToolRegistry();
    registry.register(echoTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const agent = new NativeAgent({
      modelClient: mockClient,
      systemPrompt: 'You are helpful.',
      toolRegistry: registry,
      toolExecutor: executor,
    });

    const response = await agent.process('echo hello');
    expect(response).toBe('The tool returned: hello');
    expect(mockClient.chat).toHaveBeenCalledTimes(2);
  });

  it('respects max iterations', async () => {
    // Model always returns tool_use
    const mockClient: ModelClient = {
      chat: vi.fn().mockResolvedValue({
        content: '',
        stopReason: 'tool_use',
        usage: { inputTokens: 10, outputTokens: 5 },
        toolCalls: [{ id: 'call_1', name: 'test.echo', args: { text: 'loop' } }],
      }),
    };

    const registry = new ToolRegistry();
    registry.register(echoTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const agent = new NativeAgent({
      modelClient: mockClient,
      systemPrompt: 'You are helpful.',
      toolRegistry: registry,
      toolExecutor: executor,
      maxIterations: 3,
    });

    const response = await agent.process('loop forever');
    expect(response).toContain('max iterations');
    expect(mockClient.chat).toHaveBeenCalledTimes(3);
  });

  it('works without tools (backward compatible)', async () => {
    const mockClient: ModelClient = {
      chat: vi.fn().mockResolvedValue({
        content: 'Hello!',
        stopReason: 'end_turn',
        usage: { inputTokens: 10, outputTokens: 5 },
      }),
    };

    const agent = new NativeAgent({
      modelClient: mockClient,
      systemPrompt: 'You are helpful.',
    });

    const response = await agent.process('Hi');
    expect(response).toBe('Hello!');
  });
});
```

**Step 2: Run test to verify it fails**

Run: `pnpm vitest run src/backends/native/agent.test.ts`
Expected: FAIL (NativeAgent doesn't accept `toolRegistry`/`toolExecutor`)

**Step 3: Rewrite agent.ts with tool loop**

The updated NativeAgent:
- `NativeAgentConfig` gains optional `toolRegistry`, `toolExecutor`, `maxIterations` fields
- `process()` becomes a loop: call model -> if the response carries `toolCalls` (`stopReason` is `'tool_use'` from Anthropic but `'tool_calls'` from OpenAI, per the Task 10 test), execute tools, append results, loop
- Conversation history stores both regular messages and tool messages
- Model receives tools from registry in each `chat()` call
- Max iterations (default 10) prevents infinite loops
- Backward compatible: if no registry/executor provided, works exactly as before

Key implementation details:
- Build Anthropic-format messages for tool results: `{ role: 'user', content: [{ type: 'tool_result', tool_use_id, content }] }`
- The agent needs to track the raw content blocks (not just text) for tool_use responses
- On max iterations, return a warning message

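The loop described above can be sketched as follows. This is a minimal illustration under assumptions, not the `NativeAgent` implementation: `runToolLoop`, `LoopDeps` and `LoopMessage` are hypothetical names, the model and the executor are reduced to two injected functions, and the types are simplified copies of the plan's `ChatResponse`, `ModelToolCall` and `ToolResult`.

```typescript
// Simplified stand-ins for the plan's types.
interface ModelToolCall { id: string; name: string; args: unknown; }
interface ChatResponse { content: string; stopReason: string; toolCalls?: ModelToolCall[]; }
interface ToolResult { success: boolean; output: string; error?: string; }
interface ToolResultEntry { tool_use_id: string; content: string; is_error?: boolean; }

// Hypothetical history entry: regular turns plus tool-result turns.
type LoopMessage =
  | { role: 'user' | 'assistant'; content: string; toolCalls?: ModelToolCall[] }
  | { role: 'tool_result'; toolResults: ToolResultEntry[] };

// Hypothetical dependencies standing in for ModelClient.chat and ToolExecutor.
interface LoopDeps {
  chat: (messages: LoopMessage[]) => Promise<ChatResponse>;
  execute: (call: ModelToolCall) => Promise<ToolResult>;
  maxIterations?: number;
}

async function runToolLoop(userMessage: string, deps: LoopDeps): Promise<string> {
  const maxIterations = deps.maxIterations ?? 10;
  const messages: LoopMessage[] = [{ role: 'user', content: userMessage }];

  for (let i = 0; i < maxIterations; i++) {
    const response = await deps.chat(messages);
    const toolCalls = response.toolCalls ?? [];

    // No tool calls means the model produced its final answer.
    if (toolCalls.length === 0) {
      return response.content;
    }

    // Keep the assistant turn with its tool calls so results can be matched by id.
    messages.push({ role: 'assistant', content: response.content, toolCalls });

    const toolResults: ToolResultEntry[] = [];
    for (const call of toolCalls) {
      const result = await deps.execute(call);
      toolResults.push({
        tool_use_id: call.id,
        content: result.success ? result.output : (result.error ?? 'Tool failed'),
        is_error: !result.success,
      });
    }
    messages.push({ role: 'tool_result', toolResults });
  }

  return `Stopped: reached max iterations (${maxIterations}) without a final response.`;
}
```

Checking for the presence of `toolCalls` rather than a specific `stopReason` keeps the loop provider-agnostic, and with no tools configured the first response returns immediately, which matches the backward-compatible case.
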
**Step 4: Run tests**

Run: `pnpm vitest run src/backends/native/agent.test.ts`
Expected: All PASS (existing + new)

**Step 5: Commit**

```bash
git add src/backends/native/agent.ts src/backends/native/agent.test.ts
git commit -m "feat(agent): add iterative tool use loop with max iterations"
```

---

## Task 12: Wire Tools into Daemon

**Files:**
- Modify: `src/daemon/index.ts`

**Step 1: No test needed (integration wiring)**

This is wiring code that creates the tool registry, registers all builtins, creates the executor, and passes them to the NativeAgent. No new logic, just composition.

**Step 2: Update daemon/index.ts**

Changes:
1. Import `ToolRegistry`, `ToolExecutor`, `allBuiltinTools` from `../tools/index.js`
2. After creating hookEngine, create registry and executor:
   ```typescript
   const toolRegistry = new ToolRegistry();
   for (const tool of allBuiltinTools) {
     toolRegistry.register(tool);
   }
   const toolExecutor = new ToolExecutor(toolRegistry, hookEngine);
   ```
3. Pass `toolRegistry` and `toolExecutor` to NativeAgent constructor
4. Add `toolRegistry` and `toolExecutor` to `DaemonContext` interface

**Step 3: Run typecheck and existing tests**

Run: `pnpm typecheck && pnpm vitest run`
Expected: PASS

**Step 4: Commit**

```bash
git add src/daemon/index.ts
git commit -m "feat(daemon): wire tool registry and executor into agent"
```

---

## Task 13: Update TUI for Tool Display

**Files:**
- Modify: `src/frontends/tui/minimal.ts`

**Step 1: No new test (display-only change)**

The TUI's `handleMessage` method currently calls `modelClient.chatStream()` or `modelClient.chat()` directly. After this task, it should call `agent.process()` instead (which handles the tool loop internally), and display tool execution status.

However, for Phase 1, a simpler approach: the NativeAgent's `process()` returns only the final text. For tool status display, add an optional `onToolUse` callback to NativeAgentConfig that the TUI can hook into.

**Step 2: Add onToolUse callback to NativeAgent**

In `src/backends/native/agent.ts`, add to NativeAgentConfig:
```typescript
onToolUse?: (event: { type: 'start' | 'end'; tool: string; args?: unknown; result?: ToolResult }) => void;
```

The agent loop calls this before and after each tool execution.

**Step 3: Update MinimalTui to use agent instead of raw model client**

Change `MinimalTuiConfig` to accept `NativeAgent` instead of raw `ModelClient`. The `handleMessage` method calls `agent.process()` and the `onToolUse` callback prints tool status lines:

```
⚡ shell.exec: ls -la
✓ success (24 lines)
```

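One possible way to turn `onToolUse` events into status lines like these is sketched below. The `formatToolEvent` helper is hypothetical and not part of the plan; it previews the arguments as JSON rather than extracting the command, and `ToolResult` is a simplified copy of the plan's type.

```typescript
// Simplified copy of the plan's ToolResult.
interface ToolResult { success: boolean; output: string; error?: string; }

// Event shape taken from the onToolUse signature above.
interface ToolUseEvent {
  type: 'start' | 'end';
  tool: string;
  args?: unknown;
  result?: ToolResult;
}

// Hypothetical helper: renders one status line per event.
function formatToolEvent(event: ToolUseEvent): string {
  if (event.type === 'start') {
    // Compact one-line preview of the arguments, if any were given.
    const preview = event.args === undefined ? '' : `: ${JSON.stringify(event.args)}`;
    return `⚡ ${event.tool}${preview}`;
  }
  const result = event.result;
  if (!result) return '✓ done';
  if (!result.success) return `✗ failed: ${result.error ?? 'unknown error'}`;
  const lines = result.output === '' ? 0 : result.output.split('\n').length;
  return `✓ success (${lines} line${lines === 1 ? '' : 's'})`;
}
```

A TUI could then pass `onToolUse: (event) => console.log(formatToolEvent(event))` when constructing the agent, keeping the rendering logic out of the agent loop itself.
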
**Step 4: Run existing TUI tests**

Run: `pnpm vitest run src/frontends/tui/`
Expected: PASS (may need to update test mocks to use agent instead of raw client)

**Step 5: Commit**

```bash
git add src/backends/native/agent.ts src/frontends/tui/minimal.ts
git commit -m "feat(tui): display tool execution status in minimal TUI"
```

---

## Task 14: Update Telegram for Tool Display

**Files:**
- Modify: `src/frontends/telegram/bot.ts`
- Modify: `src/frontends/telegram/handlers.ts`

**Step 1: Update handlers to show tool status**

The Telegram message handler currently calls `agent.process(text)` and gets back text. With the onToolUse callback, we can send status messages during tool execution.

For Telegram, tool status should appear as edited messages or new messages:
- On tool start: Send a status message ("⚡ Running shell.exec...")
- On tool end: Edit the status message with result summary
- After loop completes: Send the final response

**Step 2: Update bot.ts**

The bot needs access to the agent's onToolUse callback, wired to send Telegram status messages for the active chat context.

**Step 3: Run tests**

Run: `pnpm vitest run src/frontends/telegram/`
Expected: PASS

**Step 4: Commit**

```bash
git add src/frontends/telegram/bot.ts src/frontends/telegram/handlers.ts
git commit -m "feat(telegram): display tool execution status messages"
```

---

## Task 15: Update Model Index Exports

**Files:**
- Modify: `src/models/index.ts`

**Step 1: Add new type exports**

```typescript
export type { ToolDefinition, ModelToolCall, ContentBlock, ToolResultEntry, ToolMessage, ConversationMessage } from './types.js';
```

**Step 2: Run typecheck**

Run: `pnpm typecheck`
Expected: PASS

**Step 3: Commit**

```bash
git add src/models/index.ts
git commit -m "feat(models): export tool-related types from index"
```

---

## Task 16: Full Integration Test

**Files:**
- Create: `src/tools/integration.test.ts`

**Step 1: Write integration test**

```typescript
|
|
// src/tools/integration.test.ts
|
|
import { describe, it, expect, vi } from 'vitest';
|
|
import { NativeAgent } from '../backends/native/agent.js';
|
|
import { ToolRegistry } from './registry.js';
|
|
import { ToolExecutor } from './executor.js';
|
|
import { HookEngine } from '../hooks/engine.js';
|
|
import { shellExecTool } from './builtin/shell.js';
|
|
import { fileReadTool } from './builtin/file-read.js';
|
|
import { fileWriteTool } from './builtin/file-write.js';
|
|
import type { ModelClient, ChatResponse } from '../models/types.js';
|
|
import { mkdtempSync, rmSync } from 'fs';
|
|
import { join } from 'path';
|
|
import { tmpdir } from 'os';
|
|
|
|
describe('Tool integration (end-to-end)', () => {
|
|
it('agent uses shell tool and returns result', async () => {
|
|
let callCount = 0;
|
|
const mockClient: ModelClient = {
|
|
chat: vi.fn().mockImplementation(() => {
|
|
callCount++;
|
|
if (callCount === 1) {
|
|
return {
|
|
content: '',
|
|
stopReason: 'tool_use',
|
|
usage: { inputTokens: 10, outputTokens: 5 },
|
|
toolCalls: [{ id: 'c1', name: 'shell.exec', args: { command: 'echo integration_test' } }],
|
|
} satisfies ChatResponse;
|
|
}
|
|
return {
|
|
content: 'The command output was: integration_test',
|
|
stopReason: 'end_turn',
|
|
usage: { inputTokens: 20, outputTokens: 10 },
|
|
} satisfies ChatResponse;
|
|
}),
|
|
};
|
|
|
|
const registry = new ToolRegistry();
|
|
registry.register(shellExecTool);
|
|
const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
|
|
const executor = new ToolExecutor(registry, hooks);
|
|
|
|
const agent = new NativeAgent({
|
|
modelClient: mockClient,
|
|
systemPrompt: 'You have tools.',
|
|
toolRegistry: registry,
|
|
toolExecutor: executor,
|
|
});
|
|
|
|
const result = await agent.process('run echo integration_test');
|
|
expect(result).toContain('integration_test');
|
|
});
|
|
|
|
it('agent chains multiple tools', async () => {
|
|
const dir = mkdtempSync(join(tmpdir(), 'flynn-integ-'));
|
|
let callCount = 0;
|
|
|
|
const mockClient: ModelClient = {
|
|
chat: vi.fn().mockImplementation(() => {
|
|
callCount++;
|
|
if (callCount === 1) {
|
|
return {
|
|
content: '',
|
|
stopReason: 'tool_use',
|
|
usage: { inputTokens: 10, outputTokens: 5 },
|
|
toolCalls: [{ id: 'c1', name: 'file.write', args: { path: join(dir, 'test.txt'), content: 'hello' } }],
|
|
};
|
|
}
|
|
if (callCount === 2) {
|
|
return {
|
|
content: '',
|
|
stopReason: 'tool_use',
|
|
usage: { inputTokens: 15, outputTokens: 8 },
|
|
toolCalls: [{ id: 'c2', name: 'file.read', args: { path: join(dir, 'test.txt') } }],
|
|
};
|
|
}
|
|
return {
|
|
content: 'I wrote and read the file. It contains: hello',
|
|
stopReason: 'end_turn',
|
|
usage: { inputTokens: 20, outputTokens: 10 },
|
|
};
|
|
}),
|
|
};
|
|
|
|
const registry = new ToolRegistry();
|
|
registry.register(fileWriteTool);
|
|
registry.register(fileReadTool);
|
|
const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
|
|
const executor = new ToolExecutor(registry, hooks);
|
|
|
|
const agent = new NativeAgent({
|
|
modelClient: mockClient,
|
|
systemPrompt: 'You have file tools.',
|
|
toolRegistry: registry,
|
|
toolExecutor: executor,
|
|
});
|
|
|
|
try {
|
|
const result = await agent.process('write hello to test.txt then read it');
|
|
expect(result).toContain('hello');
|
|
expect(mockClient.chat).toHaveBeenCalledTimes(3);
|
|
} finally {
|
|
rmSync(dir, { recursive: true });
|
|
}
|
|
});
|
|
});
|
|
```
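
For orientation, the loop these tests exercise (call model, run any requested tools, feed results back, repeat until a text response or the iteration cap) can be sketched in isolation. This is a minimal illustration, not the `NativeAgent` implementation: every name below (`SketchResponse`, `SketchMessage`, `runAgentLoop`, `scriptedChat`) and the default cap of 10 are assumptions made for the example.

```typescript
// Hypothetical, simplified types; the plan's real ones live in src/models/types.ts.
interface SketchToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface SketchResponse {
  content: string;
  stopReason: 'tool_use' | 'end_turn';
  toolCalls?: SketchToolCall[];
}

interface SketchMessage {
  role: 'user' | 'assistant' | 'tool';
  content: string;
  toolCallId?: string;
}

type SketchChat = (messages: SketchMessage[]) => Promise<SketchResponse>;
type SketchTool = (args: Record<string, unknown>) => Promise<string>;

// Call the model; while it asks for tools, run them and append their output
// to the conversation. Stop on a text response or after maxIterations rounds.
async function runAgentLoop(
  chat: SketchChat,
  tools: Map<string, SketchTool>,
  userInput: string,
  maxIterations = 10, // assumed default, not taken from the plan
): Promise<string> {
  const messages: SketchMessage[] = [{ role: 'user', content: userInput }];
  for (let i = 0; i < maxIterations; i++) {
    const response = await chat(messages);
    const calls = response.toolCalls ?? [];
    if (response.stopReason !== 'tool_use' || calls.length === 0) {
      return response.content;
    }
    messages.push({ role: 'assistant', content: response.content });
    for (const call of calls) {
      const tool = tools.get(call.name);
      // Unknown tools and thrown errors are reported back to the model as text.
      const output = tool
        ? await tool(call.args).catch((err: unknown) => `Error: ${String(err)}`)
        : `Error: unknown tool ${call.name}`;
      messages.push({ role: 'tool', content: output, toolCallId: call.id });
    }
  }
  return 'Stopped: reached max tool iterations.';
}

// Scripted model: returns the responses in order and repeats the last one.
function scriptedChat(script: SketchResponse[]): SketchChat {
  let turn = 0;
  return async () => script[Math.min(turn++, script.length - 1)];
}
```

The scripted model plays the same role as the `vi.fn()` mocks in the integration test: it makes the number of model calls and the tool sequence deterministic, so assertions such as `toHaveBeenCalledTimes(3)` are meaningful.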

**Step 2: Run integration test**

Run: `pnpm vitest run src/tools/integration.test.ts`
Expected: PASS

**Step 3: Run full test suite**

Run: `pnpm vitest run`
Expected: All tests PASS

**Step 4: Run typecheck**

Run: `pnpm typecheck`
Expected: PASS (no type errors)

**Step 5: Commit**

```bash
git add src/tools/integration.test.ts
git commit -m "test: add end-to-end tool integration tests"
```

---

## Task 17: Update System Prompt for Tool Awareness

**Files:**
- Modify: `src/daemon/index.ts`

**Step 1: Update SYSTEM_PROMPT**

Add tool awareness to the system prompt so the model knows it has tools:

```typescript
const SYSTEM_PROMPT = `You are Flynn, a helpful personal AI assistant running on the user's machine. You are direct, concise, and helpful.

You have access to tools that let you interact with the system:
- shell.exec: Run shell commands (bash)
- file.read: Read file contents
- file.write: Write/create files
- file.edit: Edit files (find and replace)
- file.list: List directory contents
- web.fetch: Fetch web pages

Use tools when the user's request requires interacting with the filesystem, running commands, or fetching web content. For conversational questions, respond directly without tools.

Keep responses focused. Use markdown when it improves readability.`;
```
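
The tool list in the prompt is written by hand, so it can drift from what is actually registered. One possible alternative, shown only as a sketch, is to render the list from tool metadata. The `ToolSummary` shape and the `renderToolSection` and `buildSystemPrompt` helpers are hypothetical and not part of this plan; the real tool type is defined in `src/tools/types.ts`.

```typescript
// Hypothetical shape for illustration; not the plan's actual tool type.
interface ToolSummary {
  name: string;
  description: string;
}

// Render one "- name: description" bullet per tool, sorted by name so the
// prompt text stays stable regardless of registration order.
function renderToolSection(tools: ToolSummary[]): string {
  return [...tools]
    .sort((a, b) => a.name.localeCompare(b.name))
    .map((t) => `- ${t.name}: ${t.description}`)
    .join('\n');
}

// Append the tool section to a base prompt; with no tools, return the base unchanged.
function buildSystemPrompt(base: string, tools: ToolSummary[]): string {
  if (tools.length === 0) return base;
  const intro = 'You have access to tools that let you interact with the system:';
  return `${base}\n\n${intro}\n${renderToolSection(tools)}`;
}
```

The hard-coded prompt above is simpler and is what this task implements; a generated section only becomes worthwhile once tools are added or removed at runtime.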

**Step 2: Run tests (nothing should break)**

Run: `pnpm vitest run`
Expected: PASS

**Step 3: Commit**

```bash
git add src/daemon/index.ts
git commit -m "feat(daemon): update system prompt with tool descriptions"
```

---

## Summary

| Task | Description | Files | Tests |
|------|-------------|-------|-------|
| 0 | SOUL.md + system prompt loader | `SOUL.md`, `src/daemon/index.ts` | 0 |
| 1 | Tool type definitions | `src/tools/types.ts` | 5 |
| 2 | Tool registry | `src/tools/registry.ts` | 5 |
| 3 | Tool executor | `src/tools/executor.ts` | 7 |
| 4 | Shell exec tool | `src/tools/builtin/shell.ts` | 5 |
| 5 | File tools (4 files) | `src/tools/builtin/file-*.ts` | 10 |
| 6 | Web fetch tool | `src/tools/builtin/web-fetch.ts` | 4 |
| 7 | Index/barrel exports | `src/tools/index.ts` + `builtin/index.ts` | 0 |
| 8 | Model types for tool use | `src/models/types.ts` | 4 |
| 9 | Anthropic tool use | `src/models/anthropic.ts` | 1+ |
| 10 | OpenAI tool use | `src/models/openai.ts` | 1+ |
| 11 | Agent loop | `src/backends/native/agent.ts` | 3+ |
| 12 | Wire into daemon | `src/daemon/index.ts` | 0 |
| 13 | TUI tool display | `src/frontends/tui/minimal.ts` | 0 |
| 14 | Telegram tool display | `src/frontends/telegram/*.ts` | 0 |
| 15 | Model index exports | `src/models/index.ts` | 0 |
| 16 | Integration tests | `src/tools/integration.test.ts` | 2 |
| 17 | System prompt update | `src/daemon/index.ts` | 0 |

**Total: 47+ new tests across 18 tasks (Tasks 0-17), ~16 new files, ~5 modified files**

**Execution model:** Opus 4.6 supervises and reviews. Subagents via GitHub Copilot execute implementation.

**Subagent models:**
- **Claude Haiku 4.5** (`github-copilot/claude-haiku-4.5`): Mechanical tasks (types, file tools, wiring, exports)
- **Claude Sonnet 4.5** (`github-copilot/claude-sonnet-4.5`): Complex tasks (registry, executor, model integration, agent loop, frontend updates)

**Task grouping for subagents:**
- **Haiku 4.5** (mechanical): Tasks 0, 1, 5, 6, 7, 12, 15, 17
- **Sonnet 4.5** (complex): Tasks 2, 3, 4, 8, 9, 10, 11, 13, 14, 16

**Estimated effort:** Tasks 0-7 are foundational (types + tools). Tasks 8-11 are core complexity (model integration + agent loop). Tasks 12-17 are wiring/polish.