flynn/docs/plans/2026-02-05-phase1-tool-framework.md
William Valentin aa95f2132c feat: add channel adapter abstraction with Telegram and WebChat adapters
Implement Phase 3 channel adapters that decouple message sources from
the agent via a uniform ChannelAdapter interface and ChannelRegistry.

- Add ChannelAdapter/InboundMessage/OutboundMessage types
- Add ChannelRegistry for adapter lifecycle and message routing
- Add TelegramAdapter (grammy bot, auth middleware, confirmations, chunking)
- Add WebChatAdapter (thin shim over GatewayServer)
- Refactor daemon to use ChannelRegistry with per-channel-per-user agents
- Add config.get/config.patch gateway handlers (Phase 2 loose end)
- Add system.restart gateway handler (Phase 2 loose end)
- Add implementation plans and design docs

Tests: 225 passing (33 new channel adapter + gateway handler tests)
2026-02-05 20:00:36 -08:00

Phase 1: Agent Tool Framework + Agent Loop — Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Add a tool execution framework with native function calling (Anthropic/OpenAI) and an iterative agent loop so Flynn can run shell commands, read/write/edit files, fetch web pages, and chain multiple tool calls per turn.

Architecture: Tools are defined as typed objects with JSON Schema inputs and an async execute method. A ToolRegistry collects them and serializes to provider-specific formats. A ToolExecutor wraps execution with hook checks, timeouts, and output truncation. The NativeAgent gains an agentic loop: call model -> if tool_use, execute tools -> feed results back -> repeat until text response or max iterations.

Tech Stack: TypeScript (strict, NodeNext), Vitest, Anthropic SDK @anthropic-ai/sdk, OpenAI SDK openai, Node.js child_process for shell, fs for file ops, fetch for web.

Build model policy: Opus 4.6 supervises and reviews. Sonnet/Haiku via GitHub Copilot execute implementation tasks as subagents. Each task is dispatched to a subagent and reviewed by Opus before committing.
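
The agentic loop described under Architecture can be sketched roughly as follows. This is a minimal illustration, not the NativeAgent implementation: ModelTurn, HistoryEntry, CallModel, RunTool, and runAgentLoop are hypothetical names, and the model call is abstracted behind a callback so the control flow stands alone.

```typescript
// Hypothetical shapes for illustration only; the plan's real types are defined in Task 1.
interface ToolCall { id: string; name: string; args: unknown }
interface ToolResult { success: boolean; output: string; error?: string }

// One model response: either final text or a batch of tool calls.
type ModelTurn =
  | { kind: 'text'; text: string }
  | { kind: 'tool_use'; calls: ToolCall[] };

type HistoryEntry =
  | { role: 'user'; text: string }
  | { role: 'assistant'; calls: ToolCall[] }
  | { role: 'tool'; callId: string; result: ToolResult };

type CallModel = (history: HistoryEntry[]) => Promise<ModelTurn>;
type RunTool = (name: string, args: unknown) => Promise<ToolResult>;

async function runAgentLoop(
  prompt: string,
  callModel: CallModel,
  runTool: RunTool,
  maxIterations = 10, // assumed default; the plan does not fix a number here
): Promise<string> {
  const history: HistoryEntry[] = [{ role: 'user', text: prompt }];

  for (let i = 0; i < maxIterations; i++) {
    const turn = await callModel(history);

    // A plain text response ends the loop.
    if (turn.kind === 'text') return turn.text;

    // Otherwise record the tool calls, execute each one, and feed results back.
    history.push({ role: 'assistant', calls: turn.calls });
    for (const call of turn.calls) {
      const result = await runTool(call.name, call.args);
      history.push({ role: 'tool', callId: call.id, result });
    }
  }

  return `Stopped after ${maxIterations} iterations without a final text response`;
}
```

Bounding the loop with maxIterations keeps a model that never stops requesting tools from running indefinitely.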


Task 0: SOUL.md + System Prompt Foundation

Files:

  • Create: SOUL.md (project root)
  • Modify: src/daemon/index.ts (load SOUL.md into system prompt)

Step 1: Create SOUL.md

SOUL.md already exists at the project root. It defines Flynn's identity (direct, technical, opinionated, security-conscious) and is loaded into every session.

Step 2: Update daemon to load SOUL.md

In src/daemon/index.ts, replace the hardcoded SYSTEM_PROMPT string with a loader that reads SOUL.md from the workspace root and prepends it to the system prompt:

import { readFileSync, existsSync } from 'fs';
import { resolve } from 'path';

function loadSystemPrompt(): string {
  const soulPath = resolve(process.cwd(), 'SOUL.md');
  let soul = '';
  if (existsSync(soulPath)) {
    soul = readFileSync(soulPath, 'utf-8') + '\n\n';
  }
  return soul + TOOL_INSTRUCTIONS;
}

Here, TOOL_INSTRUCTIONS is the tool-aware portion of the system prompt, which is added in Task 17.

Step 3: Commit

git add SOUL.md src/daemon/index.ts
git commit -m "feat: add SOUL.md identity file and load into system prompt"

Task 1: Tool Type Definitions

Files:

  • Create: src/tools/types.ts
  • Test: src/tools/types.test.ts

Step 1: Write the failing test

// src/tools/types.test.ts
import { describe, it, expect } from 'vitest';
import type { Tool, ToolCall, ToolResult, ToolUseMessage, ToolResultMessage } from './types.js';

describe('Tool types', () => {
  it('Tool interface is structurally correct', () => {
    const tool: Tool = {
      name: 'test.echo',
      description: 'Echoes input',
      inputSchema: {
        type: 'object',
        properties: { text: { type: 'string' } },
        required: ['text'],
      },
      execute: async (args) => ({ success: true, output: String((args as { text: string }).text) }),
    };

    expect(tool.name).toBe('test.echo');
    expect(tool.inputSchema.type).toBe('object');
  });

  it('ToolCall has required fields', () => {
    const call: ToolCall = { id: 'call_1', name: 'test.echo', args: { text: 'hi' } };
    expect(call.id).toBe('call_1');
    expect(call.name).toBe('test.echo');
  });

  it('ToolResult has success and output', () => {
    const result: ToolResult = { success: true, output: 'hello' };
    expect(result.success).toBe(true);

    const errResult: ToolResult = { success: false, output: '', error: 'boom' };
    expect(errResult.error).toBe('boom');
  });

  it('ToolUseMessage has correct shape', () => {
    const msg: ToolUseMessage = {
      role: 'assistant',
      content: [{ type: 'tool_use', id: 'call_1', name: 'test.echo', input: { text: 'hi' } }],
    };
    expect(msg.role).toBe('assistant');
    expect(msg.content[0].type).toBe('tool_use');
  });

  it('ToolResultMessage has correct shape', () => {
    const msg: ToolResultMessage = {
      role: 'user',
      content: [{ type: 'tool_result', tool_use_id: 'call_1', content: 'output here' }],
    };
    expect(msg.role).toBe('user');
    expect(msg.content[0].type).toBe('tool_result');
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/tools/types.test.ts
Expected: FAIL (module ./types.js not found)

Step 3: Write the implementation

// src/tools/types.ts

export interface ToolInputSchema {
  type: 'object';
  properties: Record<string, unknown>;
  required?: string[];
}

export interface Tool {
  name: string;
  description: string;
  inputSchema: ToolInputSchema;
  execute(args: unknown): Promise<ToolResult>;
}

export interface ToolCall {
  id: string;
  name: string;
  args: unknown;
}

export interface ToolResult {
  success: boolean;
  output: string;
  error?: string;
}

// Content block for assistant messages containing tool calls
export interface ToolUseBlock {
  type: 'tool_use';
  id: string;
  name: string;
  input: unknown;
}

// Content block for user messages returning tool results
export interface ToolResultBlock {
  type: 'tool_result';
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Message from assistant requesting tool use
export interface ToolUseMessage {
  role: 'assistant';
  content: ToolUseBlock[];
}

// Message from user returning tool results
export interface ToolResultMessage {
  role: 'user';
  content: ToolResultBlock[];
}

Step 4: Run test to verify it passes

Run: pnpm vitest run src/tools/types.test.ts
Expected: PASS (all 5 tests)

Step 5: Commit

git add src/tools/types.ts src/tools/types.test.ts
git commit -m "feat(tools): add tool type definitions"
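
To show how these types relate, here is a small sketch that turns a ToolCall and its ToolResult into the content blocks defined above. The helpers toToolUseBlock and toToolResultBlock are hypothetical (they are not part of this task), and the choice to send the error text as the block content on failure is an assumption.

```typescript
// Minimal copies of the Task 1 types so the sketch stands alone.
interface ToolCall { id: string; name: string; args: unknown }
interface ToolResult { success: boolean; output: string; error?: string }
interface ToolUseBlock { type: 'tool_use'; id: string; name: string; input: unknown }
interface ToolResultBlock {
  type: 'tool_result';
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Hypothetical helper: the assistant-side block that requests a tool call.
function toToolUseBlock(call: ToolCall): ToolUseBlock {
  return { type: 'tool_use', id: call.id, name: call.name, input: call.args };
}

// Hypothetical helper: the user-side block that answers a tool call.
// tool_use_id links the result back to the originating call.
function toToolResultBlock(call: ToolCall, result: ToolResult): ToolResultBlock {
  if (result.success) {
    return { type: 'tool_result', tool_use_id: call.id, content: result.output };
  }
  // Assumption: on failure, prefer the error message and flag the block as an error.
  return {
    type: 'tool_result',
    tool_use_id: call.id,
    content: result.error ?? result.output,
    is_error: true,
  };
}
```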

Task 2: Tool Registry

Files:

  • Create: src/tools/registry.ts
  • Test: src/tools/registry.test.ts

Step 1: Write the failing test

// src/tools/registry.test.ts
import { describe, it, expect } from 'vitest';
import { ToolRegistry } from './registry.js';
import type { Tool } from './types.js';

const echoTool: Tool = {
  name: 'test.echo',
  description: 'Echoes input back',
  inputSchema: {
    type: 'object',
    properties: { text: { type: 'string', description: 'Text to echo' } },
    required: ['text'],
  },
  execute: async (args) => ({ success: true, output: String((args as { text: string }).text) }),
};

const greetTool: Tool = {
  name: 'test.greet',
  description: 'Greets someone',
  inputSchema: {
    type: 'object',
    properties: { name: { type: 'string' } },
    required: ['name'],
  },
  execute: async (args) => ({ success: true, output: `Hello ${(args as { name: string }).name}` }),
};

describe('ToolRegistry', () => {
  it('registers and retrieves tools by name', () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);

    expect(registry.get('test.echo')).toBe(echoTool);
    expect(registry.get('nonexistent')).toBeUndefined();
  });

  it('lists all registered tools', () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);
    registry.register(greetTool);

    const tools = registry.list();
    expect(tools).toHaveLength(2);
    expect(tools.map(t => t.name)).toContain('test.echo');
    expect(tools.map(t => t.name)).toContain('test.greet');
  });

  it('throws on duplicate registration', () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);
    expect(() => registry.register(echoTool)).toThrow('already registered');
  });

  it('serializes to Anthropic format', () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);

    const anthropicTools = registry.toAnthropicFormat();
    expect(anthropicTools).toEqual([{
      name: 'test.echo',
      description: 'Echoes input back',
      input_schema: echoTool.inputSchema,
    }]);
  });

  it('serializes to OpenAI format', () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);

    const openaiTools = registry.toOpenAIFormat();
    expect(openaiTools).toEqual([{
      type: 'function',
      function: {
        name: 'test.echo',
        description: 'Echoes input back',
        parameters: echoTool.inputSchema,
      },
    }]);
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/tools/registry.test.ts
Expected: FAIL (module ./registry.js not found)

Step 3: Write the implementation

// src/tools/registry.ts
import type { Tool, ToolInputSchema } from './types.js';

export interface AnthropicToolDef {
  name: string;
  description: string;
  input_schema: ToolInputSchema;
}

export interface OpenAIToolDef {
  type: 'function';
  function: {
    name: string;
    description: string;
    parameters: ToolInputSchema;
  };
}

export class ToolRegistry {
  private tools: Map<string, Tool> = new Map();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool '${tool.name}' is already registered`);
    }
    this.tools.set(tool.name, tool);
  }

  get(name: string): Tool | undefined {
    return this.tools.get(name);
  }

  list(): Tool[] {
    return Array.from(this.tools.values());
  }

  toAnthropicFormat(): AnthropicToolDef[] {
    return this.list().map(t => ({
      name: t.name,
      description: t.description,
      input_schema: t.inputSchema,
    }));
  }

  toOpenAIFormat(): OpenAIToolDef[] {
    return this.list().map(t => ({
      type: 'function' as const,
      function: {
        name: t.name,
        description: t.description,
        parameters: t.inputSchema,
      },
    }));
  }
}

Step 4: Run test to verify it passes

Run: pnpm vitest run src/tools/registry.test.ts
Expected: PASS (all 5 tests)

Step 5: Commit

git add src/tools/registry.ts src/tools/registry.test.ts
git commit -m "feat(tools): add ToolRegistry with provider serialization"
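
One thing worth verifying before sending these definitions to a provider: as far as the provider documentation indicates, both the Anthropic and OpenAI APIs restrict tool/function names to letters, digits, underscores, and hyphens (up to 64 characters), which would reject dotted names such as test.echo. If that constraint applies, a reversible name mapping at the serialization boundary is one option. The sketch below is an assumption-laden illustration: toProviderName, fromProviderName, and the double-underscore convention are hypothetical and not part of the ToolRegistry above.

```typescript
// Assumed provider constraint on tool names; check the current API docs before relying on it.
const PROVIDER_NAME_RE = /^[a-zA-Z0-9_-]{1,64}$/;

// Hypothetical: map an internal dotted name (e.g. "shell.exec") to a provider-safe name.
function toProviderName(toolName: string): string {
  // Reject names that already contain the separator so the mapping stays reversible.
  if (toolName.includes('__')) {
    throw new Error(`Tool name '${toolName}' must not contain '__'`);
  }
  const providerName = toolName.replace(/\./g, '__');
  if (!PROVIDER_NAME_RE.test(providerName)) {
    throw new Error(`Tool name '${toolName}' cannot be mapped to a provider-safe name`);
  }
  return providerName;
}

// Hypothetical: map a provider-safe name from a tool call back to the internal name.
function fromProviderName(providerName: string): string {
  return providerName.replace(/__/g, '.');
}
```

With such a mapping, the serializers would emit toProviderName(t.name) and the agent loop would call fromProviderName on the name it receives in a tool call before looking the tool up in the registry.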

Task 3: Tool Executor

Files:

  • Create: src/tools/executor.ts
  • Test: src/tools/executor.test.ts

Step 1: Write the failing test

// src/tools/executor.test.ts
import { describe, it, expect } from 'vitest';
import { ToolExecutor } from './executor.js';
import { ToolRegistry } from './registry.js';
import { HookEngine } from '../hooks/engine.js';
import type { Tool } from './types.js';

const echoTool: Tool = {
  name: 'test.echo',
  description: 'Echoes input',
  inputSchema: { type: 'object', properties: { text: { type: 'string' } }, required: ['text'] },
  execute: async (args) => ({ success: true, output: (args as { text: string }).text }),
};

const slowTool: Tool = {
  name: 'test.slow',
  description: 'Takes forever',
  inputSchema: { type: 'object', properties: {} },
  execute: async () => {
    await new Promise(r => setTimeout(r, 5000));
    return { success: true, output: 'done' };
  },
};

const failTool: Tool = {
  name: 'test.fail',
  description: 'Throws',
  inputSchema: { type: 'object', properties: {} },
  execute: async () => { throw new Error('kaboom'); },
};

const bigOutputTool: Tool = {
  name: 'test.big',
  description: 'Returns huge output',
  inputSchema: { type: 'object', properties: {} },
  execute: async () => ({ success: true, output: 'x'.repeat(100_000) }),
};

describe('ToolExecutor', () => {
  it('executes a tool and returns result', async () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const result = await executor.execute('test.echo', { text: 'hello' });
    expect(result.success).toBe(true);
    expect(result.output).toBe('hello');
  });

  it('returns error for unknown tool', async () => {
    const registry = new ToolRegistry();
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const result = await executor.execute('nonexistent', {});
    expect(result.success).toBe(false);
    expect(result.error).toContain('not found');
  });

  it('catches tool execution errors', async () => {
    const registry = new ToolRegistry();
    registry.register(failTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const result = await executor.execute('test.fail', {});
    expect(result.success).toBe(false);
    expect(result.error).toContain('kaboom');
  });

  it('enforces timeout', async () => {
    const registry = new ToolRegistry();
    registry.register(slowTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks, { defaultTimeoutMs: 100 });

    const result = await executor.execute('test.slow', {});
    expect(result.success).toBe(false);
    expect(result.error).toContain('timed out');
  });

  it('truncates large output', async () => {
    const registry = new ToolRegistry();
    registry.register(bigOutputTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks, { maxOutputBytes: 1000 });

    const result = await executor.execute('test.big', {});
    expect(result.success).toBe(true);
    expect(result.output.length).toBeLessThanOrEqual(1100); // 1000 + truncation message
    expect(result.output).toContain('[truncated]');
  });

  it('blocks on confirm hook and resolves when approved', async () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);
    const hooks = new HookEngine({ confirm: ['test.*'], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    // Start execution (will block on confirmation)
    const resultPromise = executor.execute('test.echo', { text: 'hi' });

    // Approve the pending confirmation
    const pending = hooks.getPendingConfirmations();
    expect(pending).toHaveLength(1);
    hooks.resolveConfirmation(pending[0].id, { approved: true });

    const result = await resultPromise;
    expect(result.success).toBe(true);
    expect(result.output).toBe('hi');
  });

  it('blocks on confirm hook and returns denied', async () => {
    const registry = new ToolRegistry();
    registry.register(echoTool);
    const hooks = new HookEngine({ confirm: ['test.*'], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const resultPromise = executor.execute('test.echo', { text: 'hi' });

    const pending = hooks.getPendingConfirmations();
    hooks.resolveConfirmation(pending[0].id, { approved: false, reason: 'nope' });

    const result = await resultPromise;
    expect(result.success).toBe(false);
    expect(result.error).toContain('denied');
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/tools/executor.test.ts
Expected: FAIL (module ./executor.js not found)

Step 3: Write the implementation

// src/tools/executor.ts
import type { ToolResult } from './types.js';
import type { ToolRegistry } from './registry.js';
import type { HookEngine } from '../hooks/engine.js';

export interface ToolExecutorConfig {
  defaultTimeoutMs?: number;
  maxOutputBytes?: number;
}

export class ToolExecutor {
  private registry: ToolRegistry;
  private hooks: HookEngine;
  private defaultTimeoutMs: number;
  private maxOutputBytes: number;

  constructor(registry: ToolRegistry, hooks: HookEngine, config?: ToolExecutorConfig) {
    this.registry = registry;
    this.hooks = hooks;
    this.defaultTimeoutMs = config?.defaultTimeoutMs ?? 30_000;
    this.maxOutputBytes = config?.maxOutputBytes ?? 51_200;
  }

  async execute(toolName: string, args: unknown): Promise<ToolResult> {
    const tool = this.registry.get(toolName);
    if (!tool) {
      return { success: false, output: '', error: `Tool '${toolName}' not found` };
    }

    // Check hooks
    const action = this.hooks.getAction(toolName);
    if (action === 'confirm') {
      const hookResult = await this.hooks.requestConfirmation(
        toolName,
        args as Record<string, unknown>,
      );
      if (!hookResult.approved) {
        return {
          success: false,
          output: '',
          error: `Tool '${toolName}' denied by user: ${hookResult.reason ?? 'no reason'}`,
        };
      }
    }

    // Execute with timeout
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      const result = await Promise.race([
        tool.execute(args),
        new Promise<ToolResult>((_, reject) => {
          timer = setTimeout(
            () => reject(new Error(`Tool '${toolName}' timed out after ${this.defaultTimeoutMs}ms`)),
            this.defaultTimeoutMs,
          );
        }),
      ]);

      // Truncate output if too large. Note: String.length counts UTF-16 code units, not bytes.
      if (result.output.length > this.maxOutputBytes) {
        return { ...result, output: result.output.slice(0, this.maxOutputBytes) + '\n[truncated]' };
      }

      return result;
    } catch (error) {
      return {
        success: false,
        output: '',
        error: error instanceof Error ? error.message : String(error),
      };
    } finally {
      // Clear the timer so a finished tool does not keep the event loop alive
      clearTimeout(timer);
    }
  }
}

Step 4: Run test to verify it passes

Run: pnpm vitest run src/tools/executor.test.ts
Expected: PASS (all 7 tests)

Step 5: Commit

git add src/tools/executor.ts src/tools/executor.test.ts
git commit -m "feat(tools): add ToolExecutor with hooks, timeout, truncation"
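
The option is named maxOutputBytes, but comparing against String.length counts UTF-16 code units, so non-ASCII output can exceed the byte budget. If a strict byte limit matters, truncation could be done on the encoded bytes instead. The following is a sketch under that assumption; truncateUtf8 is a hypothetical helper, not part of the ToolExecutor above.

```typescript
// Hypothetical helper: truncate text to at most maxBytes of UTF-8, never splitting a character.
function truncateUtf8(text: string, maxBytes: number, marker = '\n[truncated]'): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;

  // Back up while the first dropped byte is a UTF-8 continuation byte (10xxxxxx),
  // so the kept prefix ends on a complete character.
  let end = maxBytes;
  while (end > 0 && (bytes[end]! & 0xc0) === 0x80) end--;

  return new TextDecoder().decode(bytes.subarray(0, end)) + marker;
}
```

The marker is appended after the limit is applied, so the returned string can be slightly longer than maxBytes, matching the allowance the truncation test makes for the truncation message.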

Task 4: Shell Exec Tool

Files:

  • Create: src/tools/builtin/shell.ts
  • Test: src/tools/builtin/shell.test.ts

Step 1: Write the failing test

// src/tools/builtin/shell.test.ts
import { describe, it, expect } from 'vitest';
import { shellExecTool } from './shell.js';
import { tmpdir } from 'os';
import { mkdtempSync, writeFileSync, rmSync } from 'fs';
import { join } from 'path';

describe('shell.exec tool', () => {
  it('has correct metadata', () => {
    expect(shellExecTool.name).toBe('shell.exec');
    expect(shellExecTool.inputSchema.required).toContain('command');
  });

  it('runs a simple command', async () => {
    const result = await shellExecTool.execute({ command: 'echo hello' });
    expect(result.success).toBe(true);
    expect(result.output.trim()).toBe('hello');
  });

  it('captures stderr on failure', async () => {
    const result = await shellExecTool.execute({ command: 'ls /nonexistent_dir_xyz' });
    expect(result.success).toBe(false);
    expect(result.error).toBeTruthy();
  });

  it('respects cwd parameter', async () => {
    const dir = mkdtempSync(join(tmpdir(), 'flynn-test-'));
    writeFileSync(join(dir, 'test.txt'), 'content');
    try {
      const result = await shellExecTool.execute({ command: 'ls test.txt', cwd: dir });
      expect(result.success).toBe(true);
      expect(result.output.trim()).toBe('test.txt');
    } finally {
      rmSync(dir, { recursive: true });
    }
  });

  it('respects timeout parameter', async () => {
    const result = await shellExecTool.execute({ command: 'sleep 10', timeout: 200 });
    expect(result.success).toBe(false);
    expect(result.error).toContain('timed out');
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/tools/builtin/shell.test.ts
Expected: FAIL (module ./shell.js not found)

Step 3: Write the implementation

// src/tools/builtin/shell.ts
import { execFile } from 'child_process';
import type { Tool, ToolResult } from '../types.js';

interface ShellExecArgs {
  command: string;
  cwd?: string;
  timeout?: number;
}

export const shellExecTool: Tool = {
  name: 'shell.exec',
  description: 'Execute a shell command and return stdout/stderr. Use for running build commands, git operations, system tasks, etc.',
  inputSchema: {
    type: 'object',
    properties: {
      command: { type: 'string', description: 'The shell command to execute' },
      cwd: { type: 'string', description: 'Working directory (optional)' },
      timeout: { type: 'number', description: 'Timeout in milliseconds (default 30000)' },
    },
    required: ['command'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as ShellExecArgs;
    const timeout = args.timeout ?? 30_000;

    return new Promise((resolve) => {
      execFile('bash', ['-c', args.command], {
        cwd: args.cwd,
        timeout,
        maxBuffer: 1024 * 1024, // 1MB
      }, (error, stdout, stderr) => {
        if (error) {
          if (error.killed || error.signal === 'SIGTERM') {
            resolve({ success: false, output: stdout, error: `Command timed out after ${timeout}ms` });
            return;
          }
          resolve({
            success: false,
            output: stdout,
            error: stderr || error.message,
          });
          return;
        }
        resolve({ success: true, output: stdout + (stderr ? `\nstderr: ${stderr}` : '') });
      });
    });
  },
};

Step 4: Run test to verify it passes

Run: pnpm vitest run src/tools/builtin/shell.test.ts
Expected: PASS (all 5 tests)

Step 5: Commit

git add src/tools/builtin/shell.ts src/tools/builtin/shell.test.ts
git commit -m "feat(tools): add shell.exec builtin tool"

Task 5: File Tools (read, write, edit, list)

Files:

  • Create: src/tools/builtin/file-read.ts
  • Create: src/tools/builtin/file-write.ts
  • Create: src/tools/builtin/file-edit.ts
  • Create: src/tools/builtin/file-list.ts
  • Test: src/tools/builtin/file.test.ts (all four in one test file)

Step 1: Write the failing test

// src/tools/builtin/file.test.ts
import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { fileReadTool } from './file-read.js';
import { fileWriteTool } from './file-write.js';
import { fileEditTool } from './file-edit.js';
import { fileListTool } from './file-list.js';
import { mkdtempSync, writeFileSync, readFileSync, rmSync, mkdirSync } from 'fs';
import { join } from 'path';
import { tmpdir } from 'os';

let testDir: string;

beforeEach(() => {
  testDir = mkdtempSync(join(tmpdir(), 'flynn-file-test-'));
});

afterEach(() => {
  rmSync(testDir, { recursive: true });
});

describe('file.read', () => {
  it('reads a file', async () => {
    writeFileSync(join(testDir, 'hello.txt'), 'hello world');
    const result = await fileReadTool.execute({ path: join(testDir, 'hello.txt') });
    expect(result.success).toBe(true);
    expect(result.output).toBe('hello world');
  });

  it('reads with offset and limit', async () => {
    writeFileSync(join(testDir, 'lines.txt'), 'line1\nline2\nline3\nline4\n');
    const result = await fileReadTool.execute({ path: join(testDir, 'lines.txt'), offset: 1, limit: 2 });
    expect(result.success).toBe(true);
    expect(result.output).toBe('line2\nline3');
  });

  it('returns error for missing file', async () => {
    const result = await fileReadTool.execute({ path: join(testDir, 'nope.txt') });
    expect(result.success).toBe(false);
    expect(result.error).toBeTruthy();
  });
});

describe('file.write', () => {
  it('writes a new file', async () => {
    const filePath = join(testDir, 'new.txt');
    const result = await fileWriteTool.execute({ path: filePath, content: 'new content' });
    expect(result.success).toBe(true);
    expect(readFileSync(filePath, 'utf-8')).toBe('new content');
  });

  it('creates intermediate directories', async () => {
    const filePath = join(testDir, 'sub', 'dir', 'file.txt');
    const result = await fileWriteTool.execute({ path: filePath, content: 'deep' });
    expect(result.success).toBe(true);
    expect(readFileSync(filePath, 'utf-8')).toBe('deep');
  });
});

describe('file.edit', () => {
  it('replaces a string in a file', async () => {
    const filePath = join(testDir, 'edit.txt');
    writeFileSync(filePath, 'hello world');
    const result = await fileEditTool.execute({
      path: filePath,
      old_string: 'world',
      new_string: 'flynn',
    });
    expect(result.success).toBe(true);
    expect(readFileSync(filePath, 'utf-8')).toBe('hello flynn');
  });

  it('fails if old_string not found', async () => {
    const filePath = join(testDir, 'edit2.txt');
    writeFileSync(filePath, 'hello world');
    const result = await fileEditTool.execute({
      path: filePath,
      old_string: 'xyz',
      new_string: 'abc',
    });
    expect(result.success).toBe(false);
    expect(result.error).toContain('not found');
  });

  it('fails if old_string matches multiple times without replace_all', async () => {
    const filePath = join(testDir, 'edit3.txt');
    writeFileSync(filePath, 'aaa bbb aaa');
    const result = await fileEditTool.execute({
      path: filePath,
      old_string: 'aaa',
      new_string: 'ccc',
    });
    expect(result.success).toBe(false);
    expect(result.error).toContain('multiple');
  });

  it('replaces all when replace_all is true', async () => {
    const filePath = join(testDir, 'edit4.txt');
    writeFileSync(filePath, 'aaa bbb aaa');
    const result = await fileEditTool.execute({
      path: filePath,
      old_string: 'aaa',
      new_string: 'ccc',
      replace_all: true,
    });
    expect(result.success).toBe(true);
    expect(readFileSync(filePath, 'utf-8')).toBe('ccc bbb ccc');
  });
});

describe('file.list', () => {
  it('lists files in a directory', async () => {
    writeFileSync(join(testDir, 'a.txt'), '');
    writeFileSync(join(testDir, 'b.ts'), '');
    mkdirSync(join(testDir, 'sub'));
    writeFileSync(join(testDir, 'sub', 'c.txt'), '');

    const result = await fileListTool.execute({ path: testDir });
    expect(result.success).toBe(true);
    expect(result.output).toContain('a.txt');
    expect(result.output).toContain('b.ts');
    expect(result.output).toContain('sub');
  });

  it('filters with glob pattern', async () => {
    writeFileSync(join(testDir, 'a.txt'), '');
    writeFileSync(join(testDir, 'b.ts'), '');

    const result = await fileListTool.execute({ path: testDir, pattern: '*.ts' });
    expect(result.success).toBe(true);
    expect(result.output).toContain('b.ts');
    expect(result.output).not.toContain('a.txt');
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/tools/builtin/file.test.ts
Expected: FAIL (modules not found)

Step 3: Write the implementations

// src/tools/builtin/file-read.ts
import { readFileSync } from 'fs';
import type { Tool, ToolResult } from '../types.js';

interface FileReadArgs {
  path: string;
  offset?: number;  // 0-based line offset
  limit?: number;   // number of lines
}

export const fileReadTool: Tool = {
  name: 'file.read',
  description: 'Read the contents of a file. Optionally read specific lines with offset and limit.',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Absolute path to the file' },
      offset: { type: 'number', description: 'Line offset to start reading from (0-based)' },
      limit: { type: 'number', description: 'Number of lines to read' },
    },
    required: ['path'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as FileReadArgs;
    try {
      const content = readFileSync(args.path, 'utf-8');
      if (args.offset !== undefined || args.limit !== undefined) {
        const lines = content.split('\n');
        const start = args.offset ?? 0;
        const end = args.limit !== undefined ? start + args.limit : lines.length;
        return { success: true, output: lines.slice(start, end).join('\n') };
      }
      return { success: true, output: content };
    } catch (error) {
      return { success: false, output: '', error: error instanceof Error ? error.message : String(error) };
    }
  },
};

// src/tools/builtin/file-write.ts
import { writeFileSync, mkdirSync } from 'fs';
import { dirname } from 'path';
import type { Tool, ToolResult } from '../types.js';

interface FileWriteArgs {
  path: string;
  content: string;
}

export const fileWriteTool: Tool = {
  name: 'file.write',
  description: 'Write content to a file. Creates the file and parent directories if they do not exist.',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Absolute path to write to' },
      content: { type: 'string', description: 'Content to write' },
    },
    required: ['path', 'content'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as FileWriteArgs;
    try {
      mkdirSync(dirname(args.path), { recursive: true });
      writeFileSync(args.path, args.content, 'utf-8');
      return { success: true, output: `Wrote ${Buffer.byteLength(args.content, 'utf-8')} bytes to ${args.path}` };
    } catch (error) {
      return { success: false, output: '', error: error instanceof Error ? error.message : String(error) };
    }
  },
};

// src/tools/builtin/file-edit.ts
import { readFileSync, writeFileSync } from 'fs';
import type { Tool, ToolResult } from '../types.js';

interface FileEditArgs {
  path: string;
  old_string: string;
  new_string: string;
  replace_all?: boolean;
}

export const fileEditTool: Tool = {
  name: 'file.edit',
  description: 'Edit a file by replacing an exact string match. Fails if old_string is not found or matches multiple times (unless replace_all is true).',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Absolute path to the file' },
      old_string: { type: 'string', description: 'Exact string to find' },
      new_string: { type: 'string', description: 'Replacement string' },
      replace_all: { type: 'boolean', description: 'Replace all occurrences (default false)' },
    },
    required: ['path', 'old_string', 'new_string'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as FileEditArgs;
    try {
      const content = readFileSync(args.path, 'utf-8');

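      // An empty old_string would match everywhere, so reject it up front
      if (args.old_string === '') {
        return { success: false, output: '', error: 'old_string must not be empty' };
      }

      if (!content.includes(args.old_string)) {
        return { success: false, output: '', error: `old_string not found in ${args.path}` };
      }

      // Count occurrences
      const count = content.split(args.old_string).length - 1;
      if (count > 1 && !args.replace_all) {
        return { success: false, output: '', error: `old_string found multiple times (${count}). Use replace_all or provide more context.` };
      }

      // Use replacer functions so '$' sequences in new_string (e.g. "$&", "$1") are inserted literally
      const newContent = args.replace_all
        ? content.replaceAll(args.old_string, () => args.new_string)
        : content.replace(args.old_string, () => args.new_string);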

      writeFileSync(args.path, newContent, 'utf-8');
      return { success: true, output: `Edited ${args.path} (${count} replacement${count > 1 ? 's' : ''})` };
    } catch (error) {
      return { success: false, output: '', error: error instanceof Error ? error.message : String(error) };
    }
  },
};

// src/tools/builtin/file-list.ts
import { readdirSync } from 'fs';
import type { Tool, ToolResult } from '../types.js';

interface FileListArgs {
  path: string;
  pattern?: string;
}

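function matchGlob(name: string, pattern: string): boolean {
  // Escape every regex metacharacter except '*', then expand '*' to '.*'
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
  return regex.test(name);
}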

export const fileListTool: Tool = {
  name: 'file.list',
  description: 'List files and directories in a given path. Optionally filter with a glob pattern.',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Directory path to list' },
      pattern: { type: 'string', description: 'Glob pattern to filter results (e.g. "*.ts")' },
    },
    required: ['path'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as FileListArgs;
    try {
      let entries = readdirSync(args.path, { withFileTypes: true });
      if (args.pattern) {
        entries = entries.filter(e => matchGlob(e.name, args.pattern!));
      }
      const output = entries
        .map(e => e.isDirectory() ? `${e.name}/` : e.name)
        .sort()
        .join('\n');
      return { success: true, output };
    } catch (error) {
      return { success: false, output: '', error: error instanceof Error ? error.message : String(error) };
    }
  },
};

Step 4: Run test to verify it passes

Run: pnpm vitest run src/tools/builtin/file.test.ts Expected: PASS (all 10 tests)

Step 5: Commit

git add src/tools/builtin/file-read.ts src/tools/builtin/file-write.ts src/tools/builtin/file-edit.ts src/tools/builtin/file-list.ts src/tools/builtin/file.test.ts
git commit -m "feat(tools): add file read/write/edit/list builtin tools"

Task 6: Web Fetch Tool

Files:

  • Create: src/tools/builtin/web-fetch.ts
  • Test: src/tools/builtin/web-fetch.test.ts

Step 1: Write the failing test

// src/tools/builtin/web-fetch.test.ts
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { webFetchTool } from './web-fetch.js';

// Mock global fetch
const mockFetch = vi.fn();
vi.stubGlobal('fetch', mockFetch);

beforeEach(() => {
  mockFetch.mockReset();
});

describe('web.fetch', () => {
  it('has correct metadata', () => {
    expect(webFetchTool.name).toBe('web.fetch');
    expect(webFetchTool.inputSchema.required).toContain('url');
  });

  it('fetches a URL and returns body text', async () => {
    mockFetch.mockResolvedValue({
      ok: true,
      status: 200,
      text: async () => '<html><body><h1>Hello</h1><p>World</p></body></html>',
      headers: new Headers({ 'content-type': 'text/html' }),
    });

    const result = await webFetchTool.execute({ url: 'https://example.com' });
    expect(result.success).toBe(true);
    expect(result.output).toBeTruthy();
    expect(mockFetch).toHaveBeenCalledWith('https://example.com', expect.any(Object));
  });

  it('returns error on HTTP failure', async () => {
    mockFetch.mockResolvedValue({
      ok: false,
      status: 404,
      text: async () => 'Not Found',
      headers: new Headers(),
    });

    const result = await webFetchTool.execute({ url: 'https://example.com/nope' });
    expect(result.success).toBe(false);
    expect(result.error).toContain('404');
  });

  it('returns error on network failure', async () => {
    mockFetch.mockRejectedValue(new Error('network error'));

    const result = await webFetchTool.execute({ url: 'https://down.example.com' });
    expect(result.success).toBe(false);
    expect(result.error).toContain('network error');
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/tools/builtin/web-fetch.test.ts Expected: FAIL — module not found

Step 3: Write the implementation

// src/tools/builtin/web-fetch.ts
import type { Tool, ToolResult } from '../types.js';

interface WebFetchArgs {
  url: string;
  timeout?: number;
}

export const webFetchTool: Tool = {
  name: 'web.fetch',
  description: 'Fetch the content of a URL via HTTP GET. Returns the response body as text.',
  inputSchema: {
    type: 'object',
    properties: {
      url: { type: 'string', description: 'The URL to fetch' },
      timeout: { type: 'number', description: 'Timeout in milliseconds (default 15000)' },
    },
    required: ['url'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as WebFetchArgs;
    const timeout = args.timeout ?? 15_000;

    try {
      const response = await fetch(args.url, {
        signal: AbortSignal.timeout(timeout),
        headers: {
          'User-Agent': 'Flynn/0.1 (personal AI assistant)',
          'Accept': 'text/html, application/json, text/plain, */*',
        },
      });

      if (!response.ok) {
        return {
          success: false,
          output: '',
          error: `HTTP ${response.status}: ${await response.text()}`,
        };
      }

      const body = await response.text();
      return { success: true, output: body };
    } catch (error) {
      return {
        success: false,
        output: '',
        error: error instanceof Error ? error.message : String(error),
      };
    }
  },
};

Step 4: Run test to verify it passes

Run: pnpm vitest run src/tools/builtin/web-fetch.test.ts Expected: PASS (all 4 tests)

Step 5: Commit

git add src/tools/builtin/web-fetch.ts src/tools/builtin/web-fetch.test.ts
git commit -m "feat(tools): add web.fetch builtin tool"

Task 7: Tools Index + Register All Builtins

Files:

  • Create: src/tools/index.ts
  • Create: src/tools/builtin/index.ts

Step 1: Create the barrel exports

// src/tools/builtin/index.ts
export { shellExecTool } from './shell.js';
export { fileReadTool } from './file-read.js';
export { fileWriteTool } from './file-write.js';
export { fileEditTool } from './file-edit.js';
export { fileListTool } from './file-list.js';
export { webFetchTool } from './web-fetch.js';

import type { Tool } from '../types.js';
import { shellExecTool } from './shell.js';
import { fileReadTool } from './file-read.js';
import { fileWriteTool } from './file-write.js';
import { fileEditTool } from './file-edit.js';
import { fileListTool } from './file-list.js';
import { webFetchTool } from './web-fetch.js';

export const allBuiltinTools: Tool[] = [
  shellExecTool,
  fileReadTool,
  fileWriteTool,
  fileEditTool,
  fileListTool,
  webFetchTool,
];

// src/tools/index.ts
export type { Tool, ToolCall, ToolResult, ToolInputSchema, ToolUseBlock, ToolResultBlock, ToolUseMessage, ToolResultMessage } from './types.js';
export { ToolRegistry } from './registry.js';
export type { AnthropicToolDef, OpenAIToolDef } from './registry.js';
export { ToolExecutor } from './executor.js';
export type { ToolExecutorConfig } from './executor.js';
export { allBuiltinTools } from './builtin/index.js';
export { shellExecTool } from './builtin/shell.js';
export { fileReadTool } from './builtin/file-read.js';
export { fileWriteTool } from './builtin/file-write.js';
export { fileEditTool } from './builtin/file-edit.js';
export { fileListTool } from './builtin/file-list.js';
export { webFetchTool } from './builtin/web-fetch.js';

Step 2: Run all tool tests to verify nothing broke

Run: pnpm vitest run src/tools/ Expected: All tests PASS

Step 3: Commit

git add src/tools/index.ts src/tools/builtin/index.ts
git commit -m "feat(tools): add barrel exports and allBuiltinTools list"

Task 8: Update Model Types for Tool Use

Files:

  • Modify: src/models/types.ts
  • Test: src/models/types.test.ts (new)

Step 1: Write the failing test

// src/models/types.test.ts
import { describe, it, expect } from 'vitest';
import type { ChatRequest, ChatResponse, ToolMessage, ContentBlock } from './types.js';

describe('Model types with tool support', () => {
  it('ChatRequest accepts tools array', () => {
    const req: ChatRequest = {
      messages: [{ role: 'user', content: 'hi' }],
      tools: [{
        name: 'test',
        description: 'test tool',
        input_schema: { type: 'object', properties: {} },
      }],
    };
    expect(req.tools).toHaveLength(1);
  });

  it('ChatResponse has optional toolCalls', () => {
    const resp: ChatResponse = {
      content: '',
      stopReason: 'tool_use',
      usage: { inputTokens: 0, outputTokens: 0 },
      toolCalls: [{ id: 'call_1', name: 'test', args: {} }],
    };
    expect(resp.toolCalls).toHaveLength(1);
    expect(resp.stopReason).toBe('tool_use');
  });

  it('ToolMessage represents tool results in conversation', () => {
    const msg: ToolMessage = {
      role: 'tool_result',
      toolResults: [{ tool_use_id: 'call_1', content: 'result', is_error: false }],
    };
    expect(msg.role).toBe('tool_result');
    expect(msg.toolResults).toHaveLength(1);
  });

  it('ContentBlock can be text or tool_use', () => {
    const text: ContentBlock = { type: 'text', text: 'hello' };
    const tool: ContentBlock = { type: 'tool_use', id: 'c1', name: 'test', input: {} };
    expect(text.type).toBe('text');
    expect(tool.type).toBe('tool_use');
  });
});

Step 2: Run test to verify it fails

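Run: pnpm typecheck Expected: FAIL (ToolMessage, ContentBlock not exported). The test imports only types, which Vitest erases without type-checking, so pnpm vitest run src/models/types.test.ts on its own would not fail at this point.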

Step 3: Update types.ts

Update src/models/types.ts to add tool-related types. Keep existing types backward compatible (any field added to them is optional) and add the new ones:

// src/models/types.ts

export interface Message {
  role: 'user' | 'assistant';
  content: string;
  timestamp?: number;
}

// Tool definition passed to model API
export interface ToolDefinition {
  name: string;
  description: string;
  input_schema: {
    type: 'object';
    properties: Record<string, unknown>;
    required?: string[];
  };
}

// Individual tool call returned by model
export interface ModelToolCall {
  id: string;
  name: string;
  args: unknown;
}

// Content blocks for multi-content responses
export type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown };

// Tool result fed back into conversation
export interface ToolResultEntry {
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Message type for tool results (distinct from user/assistant)
export interface ToolMessage {
  role: 'tool_result';
  toolResults: ToolResultEntry[];
}

// Union type for all messages in a conversation
export type ConversationMessage = Message | ToolMessage;

export interface ChatRequest {
  messages: Message[];
  system?: string;
  maxTokens?: number;
  tools?: ToolDefinition[];
}

export interface ChatResponse {
  content: string;
  stopReason: 'end_turn' | 'max_tokens' | 'stop_sequence' | 'tool_use' | string;
  usage: TokenUsage;
  toolCalls?: ModelToolCall[];
}

export interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

export interface ChatStreamEvent {
  type: 'content' | 'done' | 'error' | 'tool_use';
  content?: string;
  usage?: TokenUsage;
  error?: Error;
  toolCall?: ModelToolCall;
}

export interface StreamingModelClient {
  chatStream(request: ChatRequest): AsyncIterable<ChatStreamEvent>;
}

export interface ModelClient {
  chat(request: ChatRequest): Promise<ChatResponse>;
  chatStream?(request: ChatRequest): AsyncIterable<ChatStreamEvent>;
}

Step 4: Run test to verify it passes

Run: pnpm vitest run src/models/types.test.ts Expected: PASS

Step 5: Run ALL existing model tests to verify no regressions

Run: pnpm vitest run src/models/ Expected: All existing tests PASS (types are backward compatible — Message unchanged, new fields are optional)

Step 6: Commit

git add src/models/types.ts src/models/types.test.ts
git commit -m "feat(models): add tool use types to model interfaces"

Task 9: Anthropic Tool Use Support

Files:

  • Modify: src/models/anthropic.ts
  • Modify: src/models/anthropic.test.ts

Step 1: Write the failing test (add to existing test file)

Add these tests to src/models/anthropic.test.ts:

// Add after existing describe blocks in src/models/anthropic.test.ts

describe('AnthropicClient tool use', () => {
  it('passes tools to API and parses tool_use response', async () => {
    // This test requires updating the mock to return tool_use blocks
    // We need to access the mock and override for this test
    const Anthropic = (await import('@anthropic-ai/sdk')).default;
    const mockInstance = new Anthropic();

    // Override create to return tool_use
    (mockInstance.messages.create as ReturnType<typeof vi.fn>).mockResolvedValueOnce({
      content: [
        { type: 'tool_use', id: 'toolu_01', name: 'shell.exec', input: { command: 'ls' } },
      ],
      stop_reason: 'tool_use',
      usage: { input_tokens: 20, output_tokens: 15 },
    });

    const client = new AnthropicClient({
      apiKey: 'test-key',
      model: 'claude-sonnet-4-20250514',
    });

    const response = await client.chat({
      messages: [{ role: 'user', content: 'list files' }],
      tools: [{
        name: 'shell.exec',
        description: 'Run shell command',
        input_schema: { type: 'object', properties: { command: { type: 'string' } }, required: ['command'] },
      }],
    });

    expect(response.stopReason).toBe('tool_use');
    expect(response.toolCalls).toHaveLength(1);
    expect(response.toolCalls![0]).toEqual({
      id: 'toolu_01',
      name: 'shell.exec',
      args: { command: 'ls' },
    });
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/models/anthropic.test.ts Expected: FAIL — toolCalls is undefined (current code only extracts text blocks)

Step 3: Update anthropic.ts to support tool use

Update the chat method in src/models/anthropic.ts:

Replace the chat method body. Key changes:

  1. Pass tools to messages.create() when present
  2. Parse both text and tool_use content blocks from response
  3. Return toolCalls array when tool_use blocks present

Updated chat method:

  async chat(request: ChatRequest): Promise<ChatResponse> {
    const params: Record<string, unknown> = {
      model: this.model,
      max_tokens: request.maxTokens ?? this.defaultMaxTokens,
      system: request.system,
      messages: request.messages.map((m) => ({
        role: m.role,
        content: m.content,
      })),
    };

    if (request.tools && request.tools.length > 0) {
      params.tools = request.tools;
    }

    const response = await this.client.messages.create(params as Parameters<typeof this.client.messages.create>[0]);

    const textContent = response.content.find((c) => c.type === 'text');
    const content = textContent?.type === 'text' ? textContent.text : '';

    const toolCalls = response.content
      .filter((c): c is { type: 'tool_use'; id: string; name: string; input: unknown } => c.type === 'tool_use')
      .map(c => ({ id: c.id, name: c.name, args: c.input }));

    return {
      content,
      stopReason: response.stop_reason ?? 'end_turn',
      usage: {
        inputTokens: response.usage.input_tokens,
        outputTokens: response.usage.output_tokens,
      },
      ...(toolCalls.length > 0 ? { toolCalls } : {}),
    };
  }

Also update chatStream the same way: pass the tools param and yield { type: 'tool_use', toolCall: {...} } events for tool_use content blocks. Note that a tool_use block's input is not complete at content_block_start; it arrives as input_json_delta events and is only complete at content_block_stop. (Remaining details are left to the implementation.)
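As a rough illustration of the streaming side, the sketch below accumulates a tool call across stream events and emits it once the block closes. The event shapes are a simplified subset of the Anthropic streaming events, and createToolUseAccumulator, BlockEvent, and ToolStreamOutput are hypothetical names that are not part of the plan. It emits at content_block_stop rather than content_block_start because the arguments stream in as partial JSON.

```typescript
// Local copy of the plan's ModelToolCall shape so the sketch is self-contained.
interface ModelToolCall { id: string; name: string; args: unknown }

// Simplified subset of the Anthropic streaming events that matter for tool use (assumption).
type BlockEvent =
  | { type: 'content_block_start'; index: number;
      content_block: { type: 'text' } | { type: 'tool_use'; id: string; name: string } }
  | { type: 'content_block_delta'; index: number;
      delta: { type: 'text_delta'; text: string } | { type: 'input_json_delta'; partial_json: string } }
  | { type: 'content_block_stop'; index: number };

type ToolStreamOutput =
  | { type: 'content'; content: string }
  | { type: 'tool_use'; toolCall: ModelToolCall };

// Hypothetical helper: feed it each stream event; it returns an event to yield, or undefined.
export function createToolUseAccumulator(): (ev: BlockEvent) => ToolStreamOutput | undefined {
  // Tool-use blocks that have started but not yet stopped, keyed by content block index.
  const pending = new Map<number, { id: string; name: string; json: string }>();

  return (ev) => {
    if (ev.type === 'content_block_start') {
      if (ev.content_block.type === 'tool_use') {
        pending.set(ev.index, { id: ev.content_block.id, name: ev.content_block.name, json: '' });
      }
      return undefined;
    }
    if (ev.type === 'content_block_delta') {
      if (ev.delta.type === 'text_delta') {
        return { type: 'content', content: ev.delta.text };
      }
      // input_json_delta: append the partial JSON to the matching tool-use block.
      const block = pending.get(ev.index);
      if (block) block.json += ev.delta.partial_json;
      return undefined;
    }
    // content_block_stop: the arguments are complete, so the tool call can be emitted.
    const done = pending.get(ev.index);
    if (!done) return undefined;
    pending.delete(ev.index);
    const args: unknown = done.json ? JSON.parse(done.json) : {};
    return { type: 'tool_use', toolCall: { id: done.id, name: done.name, args } };
  };
}
```

Inside chatStream, each event from the SDK stream would be passed to the accumulator and any non-undefined result yielded to the caller.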

Step 4: Run tests to verify they pass

Run: pnpm vitest run src/models/anthropic.test.ts Expected: All PASS

Step 5: Commit

git add src/models/anthropic.ts src/models/anthropic.test.ts
git commit -m "feat(models): add tool use support to AnthropicClient"

Task 10: OpenAI Tool Use Support

Files:

  • Modify: src/models/openai.ts
  • Modify: src/models/openai.test.ts

Step 1: Write the failing test

Add to src/models/openai.test.ts:

describe('OpenAIClient tool use', () => {
  it('passes tools to API and parses tool_calls response', async () => {
    const OpenAI = (await import('openai')).default;
    const mockInstance = new OpenAI();

    (mockInstance.chat.completions.create as ReturnType<typeof vi.fn>).mockResolvedValueOnce({
      choices: [{
        message: {
          content: null,
          tool_calls: [{
            id: 'call_1',
            type: 'function',
            function: { name: 'shell.exec', arguments: '{"command":"ls"}' },
          }],
        },
        finish_reason: 'tool_calls',
      }],
      usage: { prompt_tokens: 20, completion_tokens: 15 },
    });

    const client = new OpenAIClient({
      apiKey: 'test-key',
      model: 'gpt-4o',
    });

    const response = await client.chat({
      messages: [{ role: 'user', content: 'list files' }],
      tools: [{
        name: 'shell.exec',
        description: 'Run shell command',
        input_schema: { type: 'object', properties: { command: { type: 'string' } }, required: ['command'] },
      }],
    });

    expect(response.stopReason).toBe('tool_calls');
    expect(response.toolCalls).toHaveLength(1);
    expect(response.toolCalls![0]).toEqual({
      id: 'call_1',
      name: 'shell.exec',
      args: { command: 'ls' },
    });
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/models/openai.test.ts Expected: FAIL — toolCalls undefined

Step 3: Update openai.ts

Update chat method to:

  1. Convert tools to OpenAI format ({ type: 'function', function: { name, description, parameters } })
  2. Parse tool_calls from response choice
  3. Return toolCalls array with parsed JSON arguments
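The three changes above can be sketched as two small pure functions. This is a minimal sketch, not the final openai.ts code: toOpenAITools and parseOpenAIToolCalls are hypothetical helper names, the interfaces are local copies of the shapes used in this plan, and falling back to an empty object on malformed argument JSON is an assumption.

```typescript
// Local copies of the plan's ToolDefinition / ModelToolCall shapes (self-contained sketch).
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: { type: 'object'; properties: Record<string, unknown>; required?: string[] };
}

interface ModelToolCall { id: string; name: string; args: unknown }

// Function-tool shape sent in the Chat Completions request.
interface OpenAIFunctionTool {
  type: 'function';
  function: { name: string; description: string; parameters: ToolDefinition['input_schema'] };
}

// Tool-call shape found on the response message; `arguments` is a JSON string.
interface OpenAIToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

// Hypothetical helper: map the provider-neutral definitions to OpenAI's format.
export function toOpenAITools(tools: ToolDefinition[]): OpenAIFunctionTool[] {
  return tools.map((t): OpenAIFunctionTool => ({
    type: 'function',
    function: { name: t.name, description: t.description, parameters: t.input_schema },
  }));
}

// Hypothetical helper: turn response tool_calls into ModelToolCall entries with parsed args.
export function parseOpenAIToolCalls(calls: OpenAIToolCall[] | null | undefined): ModelToolCall[] {
  return (calls ?? []).map((c) => {
    let args: unknown;
    try {
      args = JSON.parse(c.function.arguments);
    } catch {
      // Assumption: treat malformed argument JSON as "no arguments" instead of throwing.
      args = {};
    }
    return { id: c.id, name: c.function.name, args };
  });
}
```

One thing to verify during implementation: the OpenAI API documents function names as limited to letters, digits, underscores, and dashes, so dotted names such as shell.exec may need to be mapped before sending. The sketch does not handle that.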

Step 4: Run tests

Run: pnpm vitest run src/models/openai.test.ts Expected: PASS

Step 5: Commit

git add src/models/openai.ts src/models/openai.test.ts
git commit -m "feat(models): add tool use support to OpenAIClient"

Task 11: Agent Loop

Files:

  • Modify: src/backends/native/agent.ts
  • Modify: src/backends/native/agent.test.ts

This is the biggest task. The NativeAgent process() method changes from a single model call per turn to an iterative loop.

Step 1: Write the failing test

Add to src/backends/native/agent.test.ts:

import { ToolRegistry, ToolExecutor } from '../../tools/index.js';
import { HookEngine } from '../../hooks/index.js';
import type { Tool } from '../../tools/index.js';

// Simple test tool
const echoTool: Tool = {
  name: 'test.echo',
  description: 'Echo',
  inputSchema: { type: 'object', properties: { text: { type: 'string' } }, required: ['text'] },
  execute: async (args) => ({ success: true, output: (args as { text: string }).text }),
};

describe('NativeAgent tool loop', () => {
  it('executes tool calls and feeds results back', async () => {
    let callCount = 0;
    const mockClient: ModelClient = {
      chat: vi.fn().mockImplementation(() => {
        callCount++;
        if (callCount === 1) {
          // First call: model requests tool use
          return {
            content: '',
            stopReason: 'tool_use',
            usage: { inputTokens: 10, outputTokens: 5 },
            toolCalls: [{ id: 'call_1', name: 'test.echo', args: { text: 'hello' } }],
          };
        }
        // Second call: model gives final text response
        return {
          content: 'The tool returned: hello',
          stopReason: 'end_turn',
          usage: { inputTokens: 15, outputTokens: 10 },
        };
      }),
    };

    const registry = new ToolRegistry();
    registry.register(echoTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const agent = new NativeAgent({
      modelClient: mockClient,
      systemPrompt: 'You are helpful.',
      toolRegistry: registry,
      toolExecutor: executor,
    });

    const response = await agent.process('echo hello');
    expect(response).toBe('The tool returned: hello');
    expect(mockClient.chat).toHaveBeenCalledTimes(2);
  });

  it('respects max iterations', async () => {
    // Model always returns tool_use
    const mockClient: ModelClient = {
      chat: vi.fn().mockResolvedValue({
        content: '',
        stopReason: 'tool_use',
        usage: { inputTokens: 10, outputTokens: 5 },
        toolCalls: [{ id: 'call_1', name: 'test.echo', args: { text: 'loop' } }],
      }),
    };

    const registry = new ToolRegistry();
    registry.register(echoTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const agent = new NativeAgent({
      modelClient: mockClient,
      systemPrompt: 'You are helpful.',
      toolRegistry: registry,
      toolExecutor: executor,
      maxIterations: 3,
    });

    const response = await agent.process('loop forever');
    expect(response).toContain('max iterations');
    expect(mockClient.chat).toHaveBeenCalledTimes(3);
  });

  it('works without tools (backward compatible)', async () => {
    const mockClient: ModelClient = {
      chat: vi.fn().mockResolvedValue({
        content: 'Hello!',
        stopReason: 'end_turn',
        usage: { inputTokens: 10, outputTokens: 5 },
      }),
    };

    const agent = new NativeAgent({
      modelClient: mockClient,
      systemPrompt: 'You are helpful.',
    });

    const response = await agent.process('Hi');
    expect(response).toBe('Hello!');
  });
});

Step 2: Run test to verify it fails

Run: pnpm vitest run src/backends/native/agent.test.ts Expected: FAIL — NativeAgent doesn't accept toolRegistry/toolExecutor

Step 3: Rewrite agent.ts with tool loop

The updated NativeAgent:

  • NativeAgentConfig gains optional toolRegistry, toolExecutor, maxIterations fields
  • process() becomes a loop: call model -> if stopReason === 'tool_use', execute tools, append results, loop
  • Conversation history stores both regular messages and tool messages
  • Model receives tools from registry in each chat() call
  • Max iterations (default 10) prevents infinite loops
  • Backward compatible: if no registry/executor provided, works exactly as before

Key implementation details:

  • Build Anthropic-format messages for tool results: { role: 'user', content: [{ type: 'tool_result', tool_use_id, content }] }
  • The agent needs to track the raw content blocks (not just text) for tool_use responses
  • On max iterations, return a warning message
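The loop described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the final agent.ts: runAgentLoop, LoopDeps, HistoryEntry, and TurnResponse are hypothetical names, and the model client and tool executor are reduced to two injected functions so the example stands alone. It also loops on the presence of toolCalls rather than on the stopReason string, because the OpenAI test in Task 10 expects 'tool_calls' where the Anthropic one expects 'tool_use'.

```typescript
// Local stand-ins for the plan's types (self-contained sketch).
interface ModelToolCall { id: string; name: string; args: unknown }
interface ToolResult { success: boolean; output: string; error?: string }
interface TurnResponse { content: string; stopReason: string; toolCalls?: ModelToolCall[] }
interface ToolResultEntry { tool_use_id: string; content: string; is_error: boolean }

// Hypothetical history shape: regular messages plus tool-result messages.
type HistoryEntry =
  | { role: 'user' | 'assistant'; content: string; toolCalls?: ModelToolCall[] }
  | { role: 'tool_result'; toolResults: ToolResultEntry[] };

// Hypothetical dependencies standing in for ModelClient.chat() and ToolExecutor.
interface LoopDeps {
  callModel: (history: HistoryEntry[]) => Promise<TurnResponse>;
  runTool: (call: ModelToolCall) => Promise<ToolResult>;
  maxIterations?: number;
}

export async function runAgentLoop(userText: string, deps: LoopDeps): Promise<string> {
  const history: HistoryEntry[] = [{ role: 'user', content: userText }];
  const max = deps.maxIterations ?? 10;

  for (let i = 0; i < max; i++) {
    const response = await deps.callModel(history);
    const calls = response.toolCalls ?? [];

    // No tool calls means the model produced its final text answer.
    if (calls.length === 0) {
      history.push({ role: 'assistant', content: response.content });
      return response.content;
    }

    // Record the assistant turn with its tool calls, then run each tool in order.
    history.push({ role: 'assistant', content: response.content, toolCalls: calls });
    const toolResults: ToolResultEntry[] = [];
    for (const call of calls) {
      const result = await deps.runTool(call);
      toolResults.push({
        tool_use_id: call.id,
        content: result.success ? result.output : (result.error ?? 'tool failed'),
        is_error: !result.success,
      });
    }
    history.push({ role: 'tool_result', toolResults });
  }

  // Loop budget exhausted: return a warning instead of looping forever.
  return `Stopped after reaching max iterations (${max}) without a final response.`;
}
```

In the real agent, callModel would wrap modelClient.chat() with the registry's tool definitions, and runTool would delegate to the ToolExecutor so hooks, timeouts, and truncation still apply.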

Step 4: Run tests

Run: pnpm vitest run src/backends/native/agent.test.ts Expected: All PASS (existing + new)

Step 5: Commit

git add src/backends/native/agent.ts src/backends/native/agent.test.ts
git commit -m "feat(agent): add iterative tool use loop with max iterations"

Task 12: Wire Tools into Daemon

Files:

  • Modify: src/daemon/index.ts

Step 1: No test needed (integration wiring)

This is wiring code that creates the tool registry, registers all builtins, creates the executor, and passes them to the NativeAgent. No new logic, just composition.

Step 2: Update daemon/index.ts

Changes:

  1. Import ToolRegistry, ToolExecutor, allBuiltinTools from ../tools/index.js
  2. After creating hookEngine, create registry and executor:
    const toolRegistry = new ToolRegistry();
    for (const tool of allBuiltinTools) {
      toolRegistry.register(tool);
    }
    const toolExecutor = new ToolExecutor(toolRegistry, hookEngine);
    
  3. Pass toolRegistry and toolExecutor to NativeAgent constructor
  4. Add toolRegistry and toolExecutor to DaemonContext interface

Step 3: Run typecheck and existing tests

Run: pnpm typecheck && pnpm vitest run Expected: PASS

Step 4: Commit

git add src/daemon/index.ts
git commit -m "feat(daemon): wire tool registry and executor into agent"

Task 13: Update TUI for Tool Display

Files:

  • Modify: src/frontends/tui/minimal.ts

Step 1: No new test (display-only change)

The TUI's handleMessage method currently calls modelClient.chatStream() or modelClient.chat() directly. After this task, it should call agent.process() instead (which handles the tool loop internally), and display tool execution status.

For Phase 1, take a simpler approach: NativeAgent's process() still returns only the final text. To display tool status, add an optional onToolUse callback to NativeAgentConfig that the TUI can hook into.

Step 2: Add onToolUse callback to NativeAgent

In src/backends/native/agent.ts, add to NativeAgentConfig:

onToolUse?: (event: { type: 'start' | 'end'; tool: string; args?: unknown; result?: ToolResult }) => void;

The agent loop calls this before and after each tool execution.

Step 3: Update MinimalTui to use agent instead of raw model client

Change MinimalTuiConfig to accept NativeAgent instead of raw ModelClient. The handleMessage method calls agent.process() and the onToolUse callback prints tool status lines:

⚡ shell.exec: ls -la
✓ success (24 lines)
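One way the TUI could turn onToolUse events into the status lines shown above is a small pure formatter. This is a sketch only: formatToolEvent is a hypothetical name, the ToolResult shape is a local copy, and both the choice to show the command argument when present and the wording of the failure line are assumptions.

```typescript
// Local copies of the shapes used by the onToolUse callback (self-contained sketch).
interface ToolResult { success: boolean; output: string; error?: string }
interface ToolUseEvent { type: 'start' | 'end'; tool: string; args?: unknown; result?: ToolResult }

// Hypothetical formatter: returns one status line per event for the TUI to print.
export function formatToolEvent(event: ToolUseEvent): string {
  if (event.type === 'start') {
    // Assumption: show the `command` argument when there is one, otherwise the raw args as JSON.
    const args = event.args as Record<string, unknown> | undefined;
    const command = args?.['command'];
    const summary = typeof command === 'string' ? command : JSON.stringify(event.args ?? {});
    return `⚡ ${event.tool}: ${summary}`;
  }

  if (event.result?.success) {
    // Ignore trailing newlines so "a\nb\n" counts as two lines, not three.
    const trimmed = event.result.output.trimEnd();
    const lines = trimmed === '' ? 0 : trimmed.split('\n').length;
    return `✓ success (${lines} line${lines === 1 ? '' : 's'})`;
  }

  return `✗ failed: ${event.result?.error ?? 'unknown error'}`;
}
```

The TUI would then pass something like onToolUse: (e) => console.log(formatToolEvent(e)) when constructing the agent.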

Step 4: Run existing TUI tests

Run: pnpm vitest run src/frontends/tui/ Expected: PASS (may need to update test mocks to use agent instead of raw client)

Step 5: Commit

git add src/backends/native/agent.ts src/frontends/tui/minimal.ts
git commit -m "feat(tui): display tool execution status in minimal TUI"

Task 14: Update Telegram for Tool Display

Files:

  • Modify: src/frontends/telegram/bot.ts
  • Modify: src/frontends/telegram/handlers.ts

Step 1: Update handlers to show tool status

The Telegram message handler currently calls agent.process(text) and gets back text. With the onToolUse callback, we can send status messages during tool execution.

For Telegram, tool status should appear as edited messages or new messages:

  • On tool start: Send a status message ("Running shell.exec...")
  • On tool end: Edit the status message with result summary
  • After loop completes: Send the final response
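The send-then-edit flow above might look like the sketch below. It deliberately avoids the real Telegram client: StatusChannel is a hypothetical interface with send (returning a message id) and edit, which the bot code would implement with whatever its Telegram library provides, and createToolStatusHandler is likewise a hypothetical name.

```typescript
// Local copies of the callback shapes (self-contained sketch).
interface ToolResult { success: boolean; output: string; error?: string }
interface ToolUseEvent { type: 'start' | 'end'; tool: string; args?: unknown; result?: ToolResult }

// Hypothetical abstraction over the chat: send returns the id of the new message.
interface StatusChannel {
  send(text: string): Promise<number>;
  edit(messageId: number, text: string): Promise<void>;
}

// Hypothetical factory: returns a handler suitable for wiring to onToolUse for one chat.
export function createToolStatusHandler(channel: StatusChannel): (event: ToolUseEvent) => Promise<void> {
  // Id of the status message for the tool currently running, if any.
  let statusMessage: Promise<number> | undefined;

  return async (event) => {
    if (event.type === 'start') {
      statusMessage = channel.send(`Running ${event.tool}...`);
      await statusMessage;
      return;
    }
    if (!statusMessage) return; // 'end' without a matching 'start': nothing to edit

    const id = await statusMessage;
    statusMessage = undefined;
    const summary = event.result?.success
      ? 'done'
      : `failed: ${event.result?.error ?? 'unknown error'}`;
    await channel.edit(id, `${event.tool}: ${summary}`);
  };
}
```

Because onToolUse is declared as returning void, the agent would not await this handler; the real wiring should catch rejected promises (for example from a failed edit) so they do not surface as unhandled rejections.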

Step 2: Update bot.ts

The bot needs access to the agent's onToolUse callback, wired to send Telegram status messages for the active chat context.

Step 3: Run tests

Run: pnpm vitest run src/frontends/telegram/ Expected: PASS

Step 4: Commit

git add src/frontends/telegram/bot.ts src/frontends/telegram/handlers.ts
git commit -m "feat(telegram): display tool execution status messages"

Task 15: Update Model Index Exports

Files:

  • Modify: src/models/index.ts

Step 1: Add new type exports

export type { ToolDefinition, ModelToolCall, ContentBlock, ToolResultEntry, ToolMessage, ConversationMessage } from './types.js';

Step 2: Run typecheck

Run: pnpm typecheck Expected: PASS

Step 3: Commit

git add src/models/index.ts
git commit -m "feat(models): export tool-related types from index"

Task 16: Full Integration Test

Files:

  • Create: src/tools/integration.test.ts

Step 1: Write integration test

// src/tools/integration.test.ts
import { describe, it, expect, vi } from 'vitest';
import { NativeAgent } from '../backends/native/agent.js';
import { ToolRegistry } from './registry.js';
import { ToolExecutor } from './executor.js';
import { HookEngine } from '../hooks/engine.js';
import { shellExecTool } from './builtin/shell.js';
import { fileReadTool } from './builtin/file-read.js';
import { fileWriteTool } from './builtin/file-write.js';
import type { ModelClient, ChatResponse } from '../models/types.js';
import { mkdtempSync, rmSync } from 'fs';
import { join } from 'path';
import { tmpdir } from 'os';

describe('Tool integration (end-to-end)', () => {
  it('agent uses shell tool and returns result', async () => {
    let callCount = 0;
    const mockClient: ModelClient = {
      chat: vi.fn().mockImplementation(() => {
        callCount++;
        if (callCount === 1) {
          return {
            content: '',
            stopReason: 'tool_use',
            usage: { inputTokens: 10, outputTokens: 5 },
            toolCalls: [{ id: 'c1', name: 'shell.exec', args: { command: 'echo integration_test' } }],
          } satisfies ChatResponse;
        }
        return {
          content: 'The command output was: integration_test',
          stopReason: 'end_turn',
          usage: { inputTokens: 20, outputTokens: 10 },
        } satisfies ChatResponse;
      }),
    };

    const registry = new ToolRegistry();
    registry.register(shellExecTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const agent = new NativeAgent({
      modelClient: mockClient,
      systemPrompt: 'You have tools.',
      toolRegistry: registry,
      toolExecutor: executor,
    });

    const result = await agent.process('run echo integration_test');
    expect(result).toContain('integration_test');
  });

  it('agent chains multiple tools', async () => {
    const dir = mkdtempSync(join(tmpdir(), 'flynn-integ-'));
    let callCount = 0;

    const mockClient: ModelClient = {
      chat: vi.fn().mockImplementation(() => {
        callCount++;
        if (callCount === 1) {
          return {
            content: '',
            stopReason: 'tool_use',
            usage: { inputTokens: 10, outputTokens: 5 },
            toolCalls: [{ id: 'c1', name: 'file.write', args: { path: join(dir, 'test.txt'), content: 'hello' } }],
          };
        }
        if (callCount === 2) {
          return {
            content: '',
            stopReason: 'tool_use',
            usage: { inputTokens: 15, outputTokens: 8 },
            toolCalls: [{ id: 'c2', name: 'file.read', args: { path: join(dir, 'test.txt') } }],
          };
        }
        return {
          content: 'I wrote and read the file. It contains: hello',
          stopReason: 'end_turn',
          usage: { inputTokens: 20, outputTokens: 10 },
        };
      }),
    };

    const registry = new ToolRegistry();
    registry.register(fileWriteTool);
    registry.register(fileReadTool);
    const hooks = new HookEngine({ confirm: [], log: [], silent: [] });
    const executor = new ToolExecutor(registry, hooks);

    const agent = new NativeAgent({
      modelClient: mockClient,
      systemPrompt: 'You have file tools.',
      toolRegistry: registry,
      toolExecutor: executor,
    });

    try {
      const result = await agent.process('write hello to test.txt then read it');
      expect(result).toContain('hello');
      expect(mockClient.chat).toHaveBeenCalledTimes(3);
    } finally {
      rmSync(dir, { recursive: true });
    }
  });
});

Step 2: Run integration test

Run: pnpm vitest run src/tools/integration.test.ts Expected: PASS

Step 3: Run full test suite

Run: pnpm vitest run Expected: All tests PASS

Step 4: Run typecheck

Run: pnpm typecheck Expected: PASS (no type errors)

Step 5: Commit

git add src/tools/integration.test.ts
git commit -m "test: add end-to-end tool integration tests"

Task 17: Update System Prompt for Tool Awareness

Files:

  • Modify: src/daemon/index.ts

Step 1: Update SYSTEM_PROMPT

Add tool awareness to the system prompt so the model knows it has tools:

const SYSTEM_PROMPT = `You are Flynn, a helpful personal AI assistant running on the user's machine. You are direct, concise, and helpful.

You have access to tools that let you interact with the system:
- shell.exec: Run shell commands (bash)
- file.read: Read file contents
- file.write: Write/create files
- file.edit: Edit files (find and replace)
- file.list: List directory contents
- web.fetch: Fetch web pages

Use tools when the user's request requires interacting with the filesystem, running commands, or fetching web content. For conversational questions, respond directly without tools.

Keep responses focused. Use markdown when it improves readability.`;

Step 2: Run tests (nothing should break)

Run: pnpm vitest run Expected: PASS

Step 3: Commit

git add src/daemon/index.ts
git commit -m "feat(daemon): update system prompt with tool descriptions"

Summary

| Task | Description | Files | Tests |
|------|-------------|-------|-------|
| 0 | SOUL.md + system prompt loader | SOUL.md, src/daemon/index.ts | 0 |
| 1 | Tool type definitions | src/tools/types.ts | 5 |
| 2 | Tool registry | src/tools/registry.ts | 5 |
| 3 | Tool executor | src/tools/executor.ts | 7 |
| 4 | Shell exec tool | src/tools/builtin/shell.ts | 5 |
| 5 | File tools (4 files) | src/tools/builtin/file-*.ts | 10 |
| 6 | Web fetch tool | src/tools/builtin/web-fetch.ts | 4 |
| 7 | Index/barrel exports | src/tools/index.ts + builtin/index.ts | 0 |
| 8 | Model types for tool use | src/models/types.ts | 4 |
| 9 | Anthropic tool use | src/models/anthropic.ts | 1+ |
| 10 | OpenAI tool use | src/models/openai.ts | 1+ |
| 11 | Agent loop | src/backends/native/agent.ts | 3+ |
| 12 | Wire into daemon | src/daemon/index.ts | 0 |
| 13 | TUI tool display | src/frontends/tui/minimal.ts | 0 |
| 14 | Telegram tool display | src/frontends/telegram/*.ts | 0 |
| 15 | Model index exports | src/models/index.ts | 0 |
| 16 | Integration tests | src/tools/integration.test.ts | 2 |
| 17 | System prompt update | src/daemon/index.ts | 0 |

Total: ~47+ new tests across 18 tasks, ~16 new files, ~5 modified files

Execution model: Opus 4.6 supervises and reviews. Subagents via GitHub Copilot execute implementation.

Subagent models:

  • Claude Haiku 4.5 (github-copilot/claude-haiku-4.5): Mechanical tasks (types, file tools, wiring, exports)
  • Claude Sonnet 4.5 (github-copilot/claude-sonnet-4.5): Complex tasks (registry, executor, model integration, agent loop, frontend updates)

Task grouping for subagents:

  • Haiku 4.5 (mechanical): Tasks 0, 1, 5, 6, 7, 12, 15, 17
  • Sonnet 4.5 (complex): Tasks 2, 3, 4, 8, 9, 10, 11, 13, 14, 16

Estimated effort: Tasks 0-7 are foundational (types + tools). Tasks 8-11 are core complexity (model integration + agent loop). Tasks 12-17 are wiring/polish.