11 KiB
Testing Patterns
Analysis Date: 2026-02-09
Test Framework
Runner:
- Vitest v3.x
- Config: No
vitest.config.tsfile — uses Vitest defaults withpackage.jsontype: "module"
Assertion Library:
- Vitest built-in
expect()API (Chai-compatible)
Run Commands:
pnpm test # Run all tests in watch mode
pnpm test:run # Run all tests once (no watch)
pnpm test:run src/path/to/file.test.ts # Run a single test file
Test File Organization
Location:
- Co-located with source files — test files live next to the code they test
src/models/router.ts→src/models/router.test.tssrc/tools/policy.ts→src/tools/policy.test.tssrc/backends/native/agent.ts→src/backends/native/agent.test.ts
Naming:
*.test.tssuffix (no.spec.tsfiles exist)- Test file name matches source file name:
schema.test.tstestsschema.ts
Statistics:
- 88 test files across the codebase
- ~16,676 total lines of test code
- 152 source (non-test)
.tsfiles
Test Structure
Suite Organization:
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest';
import { ClassName } from './source-file.js';
describe('ClassName', () => {
it('does something specific', () => {
// arrange, act, assert
});
it('handles error case', () => {
// ...
});
});
Nested Describes for Grouping:
describe('ToolPolicy', () => {
describe('default config (full profile)', () => {
it('allows all tools when profile is full', () => { ... });
});
describe('profile filtering', () => {
it('minimal profile only allows read-only tools', () => { ... });
it('coding profile includes file writes and shell', () => { ... });
});
describe('edge cases', () => {
it('handles empty tool list', () => { ... });
});
});
Naming convention for it() blocks:
- Start with verb: "returns", "creates", "handles", "uses", "fires", "respects"
- Describe expected behavior:
'returns silent action for non-matching tools' - Error cases:
'returns error for missing file','fails if old_string not found'
Setup and Teardown
beforeEach/afterEach for test isolation:
// File system tests — create temp dir, clean up after
let testDir: string;
beforeEach(() => {
testDir = mkdtempSync(join(tmpdir(), 'flynn-file-test-'));
});
afterEach(() => {
rmSync(testDir, { recursive: true });
});
Database tests — create store, close and clean:
let store: SessionStore;
beforeEach(() => {
store = new SessionStore(dbPath);
});
afterEach(() => {
store.close();
if (existsSync(dbPath)) {
unlinkSync(dbPath);
}
});
Mock cleanup:
beforeEach(() => {
vi.clearAllMocks();
});
Mocking
Framework: Vitest built-in vi.fn() and vi.mock()
Pattern 1: Inline mock objects (most common):
const createMockClient = (): ModelClient => ({
chat: vi.fn().mockResolvedValue({
content: 'Hello!',
stopReason: 'end_turn',
usage: { inputTokens: 10, outputTokens: 5 },
} satisfies ChatResponse),
});
Pattern 2: Mock factory functions for reusable test doubles:
function mockMemoryStore(results: SearchResult[]): MemoryStore {
return {
search: vi.fn(() => results),
read: vi.fn(() => ''),
write: vi.fn(),
listNamespaces: vi.fn(() => []),
} as unknown as MemoryStore;
}
function mockVectorStore(results: VectorSearchResult[]): VectorStore {
return {
search: vi.fn(() => results),
upsertChunks: vi.fn(),
deleteNamespace: vi.fn(),
} as unknown as VectorStore;
}
Pattern 3: vi.mock() for module mocking:
vi.mock('./docker.js', () => ({
DockerSandbox: vi.fn().mockImplementation(() => ({
create: vi.fn().mockResolvedValue(undefined),
destroy: vi.fn().mockResolvedValue(undefined),
exec: vi.fn().mockResolvedValue({ stdout: '', stderr: '' }),
})),
}));
Pattern 4: vi.mock() with hoisted shared mocks:
const mockGenerateContent = vi.fn();
const mockGetGenerativeModel = vi.fn().mockReturnValue({
generateContent: mockGenerateContent,
});
vi.mock('@google/generative-ai', () => ({
GoogleGenerativeAI: vi.fn().mockImplementation(() => ({
getGenerativeModel: mockGetGenerativeModel,
})),
}));
Pattern 5: Sequential mock returns (multi-step interactions):
let callCount = 0;
const mockClient: ModelClient = {
chat: vi.fn().mockImplementation(() => {
callCount++;
if (callCount === 1) {
return {
content: '',
stopReason: 'tool_use',
toolCalls: [{ id: 'call_1', name: 'test.echo', args: { text: 'hello' } }],
usage: { inputTokens: 10, outputTokens: 5 },
};
}
return {
content: 'The tool returned: hello',
stopReason: 'end_turn',
usage: { inputTokens: 15, outputTokens: 10 },
};
}),
};
Pattern 6: Spying on console methods:
const warnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
// ... test code ...
expect(warnSpy).toHaveBeenCalledWith(expect.stringContaining('Output channel'));
warnSpy.mockRestore();
Pattern 7: as any / as unknown as T for partial mocks:
scheduler = new CronScheduler(jobs, mockChannelRegistry as any);
const mockClient = { chat: vi.fn() } as unknown as ModelClient;
What to Mock:
- External API clients (Anthropic, OpenAI, Gemini SDKs)
- Docker/container interactions
- Model clients in agent/orchestrator tests
- Channel adapters and registries in integration tests
console.warn/console.errorwhen testing warning paths
What NOT to Mock:
- Zod schema validation — test against real schemas
- In-memory data structures (Maps, Sets)
- Pure functions (formatters, parsers)
- File system when the test IS about file operations (use temp dirs instead)
Fixtures and Factories
Test Data — Helper functions over fixtures:
// Factory function for config objects
function defaultConfig(overrides: Partial<ToolsConfig> = {}): ToolsConfig {
return {
profile: 'full',
allow: [],
deny: [],
agents: {},
providers: {},
...overrides,
};
}
// Factory function for domain objects
function makeTool(name: string): Tool {
return {
name,
description: `Mock ${name}`,
inputSchema: { type: 'object', properties: {} },
execute: async () => ({ success: true, output: '' }),
};
}
// Factory for test messages
function makeMessages(count: number): Message[] {
const msgs: Message[] = [];
for (let i = 0; i < count; i++) {
msgs.push({
role: i % 2 === 0 ? 'user' : 'assistant',
content: `Message ${i}`,
});
}
return msgs;
}
Minimal configs for schema tests:
const minimalConfig = {
telegram: { bot_token: 'test', allowed_chat_ids: [1] },
models: { default: { provider: 'anthropic', model: 'claude-3' } },
};
Location:
- No separate fixtures directory — helper functions defined at top of each test file
- No shared test utilities file — each test is self-contained
Coverage
Requirements: No coverage thresholds enforced. No coverage config detected.
View Coverage:
pnpm test:run -- --coverage # If @vitest/coverage-v8 is installed
Test Types
Unit Tests (majority):
- Test individual classes and functions in isolation
- Mock external dependencies
- Files:
src/models/router.test.ts,src/tools/policy.test.ts,src/hooks/engine.test.ts
Integration Tests (some):
- Test real interactions between components
- File system tools use real temp directories:
src/tools/builtin/file.test.ts - Session store uses real SQLite:
src/session/store.test.ts - Gateway tests spin up real WebSocket server:
src/gateway/server.test.ts
E2E Tests:
- Not present — no end-to-end test framework
Common Patterns
Async Testing:
it('processes messages', async () => {
const response = await agent.process('Hi');
expect(response).toBe('Hello!');
});
Error/Rejection Testing:
it('throws when all providers fail', async () => {
await expect(router.chat({ messages: [{ role: 'user', content: 'Hi' }] }))
.rejects.toThrow('All model providers failed');
});
Zod Schema Rejection Testing:
it('rejects cron job with empty name', () => {
expect(() => configSchema.parse({
...baseConfig,
automation: {
cron: [{ name: '', schedule: '0 9 * * *', ... }],
},
})).toThrow();
});
Testing Callback Invocation:
it('calls onToolUse callback on start and end', async () => {
const onToolUse = vi.fn();
// ... setup ...
await agent.process('echo hi');
expect(onToolUse).toHaveBeenCalledTimes(2);
expect(onToolUse).toHaveBeenNthCalledWith(1, expect.objectContaining({
type: 'start',
tool: 'test.echo',
}));
});
Testing with satisfies for type-safe mocks:
chat: vi.fn().mockResolvedValue({
content: 'Hello!',
stopReason: 'end_turn',
usage: { inputTokens: 10, outputTokens: 5 },
} satisfies ChatResponse),
Testing Collection Contents:
const names = result.map(t => t.name);
expect(names).toContain('file.read');
expect(names).not.toContain('shell.exec');
Async Stream Testing:
it('streams from primary client', async () => {
const mockStream = async function* (): AsyncIterable<ChatStreamEvent> {
yield { type: 'content', content: 'Hello' };
yield { type: 'done', usage: { inputTokens: 5, outputTokens: 3 } };
};
const mockClient = {
chat: vi.fn(),
chatStream: vi.fn().mockReturnValue(mockStream()),
};
const chunks: string[] = [];
for await (const event of router.chatStream({ messages: [] })) {
if (event.type === 'content' && event.content) {
chunks.push(event.content);
}
}
expect(chunks).toEqual(['Hello']);
});
Integration Test with Real Server (beforeAll/afterAll):
let server: GatewayServer;
beforeAll(async () => {
server = new GatewayServer(config);
await server.start();
});
afterAll(async () => {
await server.stop();
});
function createClient(): Promise<WebSocket> {
return new Promise((resolve, reject) => {
const ws = new WebSocket(`ws://127.0.0.1:${TEST_PORT}`);
ws.on('open', () => resolve(ws));
ws.on('error', reject);
});
}
Conventions Summary
When writing tests for Flynn:
- Import from vitest:
import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'; - Import source with
.jsextension:import { ClassName } from './source-file.js'; - Co-locate test file next to source file as
source-file.test.ts - Use
describe/itblocks with descriptive behavior-focused names - Create mock factories as functions at the top of the test file
- Use
vi.fn()for mocks,vi.mock()for module mocking,as unknown as Tfor partial mocks - Clean up resources in
afterEach: temp dirs, database files, mock spies - Test both success and failure paths — every feature should have at least one error test
- Use helper factories to build test data, not shared fixture files
- Keep tests self-contained — each test file should be independently understandable
Testing analysis: 2026-02-09