docs: Add comprehensive documentation for production deployment and contribution

This commit adds 6 new documentation files to fill critical gaps:

- CONTRIBUTING.md: Developer onboarding guide with setup, workflow,
  code style, testing, and adding features

- TROUBLESHOOTING.md: Common issues and solutions for errors,
  model issues, tool issues, channel issues, gateway issues,
  configuration issues, and memory/database issues

- docs/api/PROTOCOL.md: Gateway JSON-RPC protocol documentation
  with connection, authentication, message format, methods,
  events, error codes, and example client implementation

- docs/api/TOOLS.md: Tools API documentation covering tool interface,
  input schema format, result format, tool patterns,
  tool registration, tool policy, execution flow, and
  builtin tools reference

- docs/deployment/PRODUCTION.md: Production deployment guide
  covering Docker deployment, systemd service, security,
  configuration, monitoring, backup & recovery, and
  performance tuning

- docs/performance/TUNING.md: Performance optimization guide
  covering context management, model routing, tool execution,
  memory & embeddings, session management, database
  performance, gateway performance, and resource usage

These files complement the existing excellent documentation
(README.md, AGENTS.md, ARCHITECTURE.md, STRUCTURE.md,
CONVENTIONS.md) to provide complete coverage for users,
developers, and operators.
William Valentin
2026-02-13 16:07:29 -08:00
parent cc54b3a10c
commit 8a6cd7f559
6 changed files with 5143 additions and 0 deletions
+491
@@ -0,0 +1,491 @@
# Contributing to Flynn
Thank you for your interest in contributing to Flynn! This guide will help you get started.
## Table of Contents
- [Quick Start](#quick-start)
- [Development Setup](#development-setup)
- [Development Workflow](#development-workflow)
- [Code Style](#code-style)
- [Testing](#testing)
- [Adding Features](#adding-features)
- [Commit Guidelines](#commit-guidelines)
- [Submitting Changes](#submitting-changes)
- [Getting Help](#getting-help)
## Quick Start
```bash
# Clone the repository
git clone <repo-url>
cd flynn
# Install dependencies
pnpm install
# Build the project
pnpm build
# Run the daemon
pnpm start
```
## Development Setup
### Prerequisites
- **Node.js** >= 22.0.0
- **pnpm** (package manager)
- **Docker** (optional, for sandbox features)
### Installation
```bash
# Install dependencies
pnpm install
# Verify TypeScript compiles
pnpm typecheck
# Run linter
pnpm lint
# Run tests
pnpm test
```
### Development Commands
| Command | Description |
|---------|-------------|
| `pnpm build` | Compile TypeScript to `dist/` |
| `pnpm dev` | Run daemon with watch mode (tsx watch) |
| `pnpm start` | Start production build |
| `pnpm tui` | Minimal TUI (readline) |
| `pnpm tui:fs` | Fullscreen TUI (React/Ink) |
| `pnpm test` | Run vitest in watch mode |
| `pnpm test:run` | Run tests once (CI) |
| `pnpm lint` | Run ESLint |
| `pnpm typecheck` | TypeScript check (no emit) |
### Running a Single Test File
```bash
pnpm test:run src/path/to/file.test.ts
```
### Configuration for Development
Create a development config:
```bash
cp config/default.yaml ~/.config/flynn/config.yaml
# Edit config with your API keys and settings
```
## Development Workflow
### Branching Strategy
1. **Main branch**: `main` - stable production code
2. **Feature branches**: `feature/description` - new features
3. **Bugfix branches**: `bugfix/description` - bug fixes
4. **Refactor branches**: `refactor/description` - code improvements
### Feature Development
1. Create a feature branch from `main`
```bash
git checkout -b feature/my-new-feature
```
2. Make your changes
3. Build and test
```bash
pnpm build
pnpm test:run
pnpm lint
pnpm typecheck
```
4. Commit your changes (see [Commit Guidelines](#commit-guidelines))
5. Push and create a pull request
### Committing Changes
Before committing, ensure:
- All tests pass: `pnpm test:run`
- Linting passes: `pnpm lint`
- Type checking passes: `pnpm typecheck`
- Build succeeds: `pnpm build`
```bash
git add .
git commit -m "feat: add my new feature"
```
## Code Style
Flynn follows specific conventions documented in [`.planning/codebase/CONVENTIONS.md`](.planning/codebase/CONVENTIONS.md).
### Key Guidelines
- **2-space indentation** (no tabs)
- **Single quotes** for strings
- **Trailing commas** in multiline structures
- **Semicolons** always used
- **camelCase** for functions/variables
- **PascalCase** for classes/interfaces
- **kebab-case** for source files (`my-feature.ts`)
- **PascalCase** for React components (`MyComponent.tsx`)
- Test files co-located with source: `file.test.ts` beside `file.ts`
### Import Organization
```typescript
// 1. Node.js stdlib
import { readFileSync } from 'fs';
import { execFile } from 'child_process';
// 2. Third-party packages
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';
// 3. Local imports (always use .js extension)
import { NativeAgent } from './agent.js';
import type { Config } from '../config/schema.js';
```
### Error Handling
```typescript
// Pattern 1: Return ToolResult with error (tools)
try {
  const result = await someOperation();
  return { success: true, output: result };
} catch (error) {
  return {
    success: false,
    output: '',
    error: error instanceof Error ? error.message : String(error),
  };
}

// Pattern 2: Throw with descriptive message (config/setup)
if (envValue === undefined) {
  throw new Error(`Environment variable ${envVar} is not set`);
}
```
## Testing
### Test Framework
Flynn uses **Vitest** for testing. Test files are co-located with source files:
```
src/
├── agent.ts
├── agent.test.ts
├── models/
│   ├── anthropic.ts
│   └── anthropic.test.ts
```
### Writing Tests
```typescript
import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import { MyComponent } from './my-component.js';
describe('MyComponent', () => {
  beforeEach(() => {
    // Setup before each test
  });

  afterEach(() => {
    // Cleanup after each test
  });

  it('should do something correctly', () => {
    const result = MyComponent.doSomething();
    expect(result).toBe('expected-value');
  });

  it('should handle errors gracefully', () => {
    expect(() => MyComponent.doSomethingInvalid()).toThrow();
  });
});
```
### Testing Guidelines
- Test both success and failure cases
- Clean up resources (files, directories) in `afterEach` or `it` blocks
- Mock external dependencies (APIs, databases, filesystem)
- Use `describe`/`it` pattern for organization
- Keep tests focused and independent
### Running Tests
```bash
# Watch mode (during development)
pnpm test
# Run once (CI/pre-commit)
pnpm test:run
# Run specific test file
pnpm test:run src/models/anthropic.test.ts
# Run tests whose names match a pattern
pnpm test:run -t "anthropic"
```
## Adding Features
### Adding a New Tool
Flynn tools follow three patterns:
#### Pattern 1: Static Tool (no dependencies)
```typescript
// src/tools/builtin/my-tool.ts
import type { Tool, ToolResult } from '../types.js';
interface MyToolArgs {
  input: string;
}

export const myTool: Tool = {
  name: 'my.tool',
  description: 'Description of what this tool does',
  inputSchema: {
    type: 'object',
    properties: {
      input: { type: 'string', description: 'Input parameter' },
    },
    required: ['input'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    const args = rawArgs as MyToolArgs;
    // Implementation
    return { success: true, output: 'result' };
  },
};
```
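The `rawArgs as MyToolArgs` cast in the static tool is not checked at runtime, so malformed input would only fail later. A minimal sketch of a guard that returns an error result instead, assuming the `ToolResult` shape used in this guide; `parseMyToolArgs` is a hypothetical helper, not part of Flynn:

```typescript
// ToolResult mirrors the shape shown in this guide; it is redeclared here
// only so the sketch is self-contained.
interface ToolResult {
  success: boolean;
  output: string;
  error?: string;
}

interface MyToolArgs {
  input: string;
}

// Hypothetical helper: returns validated args, or a failed ToolResult
// describing what was wrong with the input.
function parseMyToolArgs(rawArgs: unknown): MyToolArgs | ToolResult {
  if (typeof rawArgs !== 'object' || rawArgs === null) {
    return { success: false, output: '', error: 'Arguments must be an object' };
  }
  const input = (rawArgs as Record<string, unknown>).input;
  if (typeof input !== 'string') {
    return { success: false, output: '', error: "'input' must be a string" };
  }
  return { input };
}
```

Inside `execute`, a tool could call this first and return early when the result has a `success` field.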
#### Pattern 2: Factory Tool (needs dependency injection)
```typescript
// src/tools/builtin/memory-read.ts
import type { Tool, ToolResult } from '../types.js';
import type { MemoryStore } from '../../memory/store.js';
export function createMemoryReadTool(store: MemoryStore): Tool {
  return {
    name: 'memory.read',
    description: 'Read from memory store',
    inputSchema: {
      type: 'object',
      properties: {
        namespace: { type: 'string', description: 'Memory namespace' },
      },
      required: ['namespace'],
    },
    execute: async (rawArgs: unknown): Promise<ToolResult> => {
      const args = rawArgs as { namespace: string };
      try {
        const content = store.read(args.namespace);
        return { success: true, output: content };
      } catch (error) {
        return {
          success: false,
          output: '',
          error: error instanceof Error ? error.message : String(error),
        };
      }
    },
  };
}
```
#### Pattern 3: Multi-Factory (related tool set)
```typescript
// src/tools/builtin/index.ts
export function createMemoryTools(store: MemoryStore, hybridSearch?: HybridSearch): Tool[] {
  return [
    createMemoryReadTool(store),
    createMemoryWriteTool(store),
    createMemorySearchTool(store, hybridSearch),
  ];
}
```
**Registration Steps:**
1. Add export to `src/tools/builtin/index.ts`
2. Add export to `src/tools/index.ts`
3. Register in `src/daemon/index.ts` (call factory + register)
4. Add to tool profiles in `src/tools/policy.ts` if needed
5. Write tests in `src/tools/builtin/my-tool.test.ts`
### Adding a New Channel Adapter
1. Create directory: `src/channels/<platform>/`
2. Create `adapter.ts` implementing `ChannelAdapter` interface
3. Create `index.ts` re-exporting the adapter
4. Add test: `adapter.test.ts`
5. Register in `src/channels/index.ts`
6. Register in `src/daemon/index.ts`
7. Add config schema in `src/config/schema.ts`
### Adding a New Model Provider
1. Create `src/models/<provider>.ts` implementing `ModelClient` interface
2. Add export to `src/models/index.ts`
3. Add case in `src/daemon/index.ts` → `createClientFromConfig()`
4. Add to `src/config/schema.ts` → `modelConfigBaseSchema.provider` enum
5. Write tests in `src/models/<provider>.test.ts`
### Adding a New CLI Command
```typescript
// src/cli/my-cmd.ts
import { Command } from 'commander';
import { loadConfigSafe } from './shared.js';
export function registerMyCommand(program: Command) {
  program
    .command('my-cmd')
    .description('Description of my command')
    .option('-c, --config <path>', 'Config file path')
    .action(async (options) => {
      const configResult = await loadConfigSafe(options.config);
      if (configResult.error) {
        console.error(configResult.error);
        process.exit(1);
      }
      // Implementation
    });
}
```
Register in `src/cli/index.ts`:
```typescript
import { registerMyCommand } from './my-cmd.js';
// In registerCommands()
registerMyCommand(program);
```
## Commit Guidelines
### Commit Message Format
Follow conventional commits:
```
<type>(<scope>): <description>

[optional body]

[optional footer]
```
### Types
- `feat`: New feature
- `fix`: Bug fix
- `refactor`: Code refactoring (no functional change)
- `docs`: Documentation changes
- `test`: Test additions/modifications
- `chore`: Build process, dependencies, tooling
- `style`: Code style changes (formatting, semicolons, etc.)
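The header format and type list can be checked mechanically. A minimal sketch, assuming the scope is optional (as in `feat: add my new feature`) and limited to lowercase letters, digits, and hyphens; `isValidCommitHeader` is a hypothetical helper, not an existing Flynn script:

```typescript
// Commit types listed in this guide.
const COMMIT_TYPES = ['feat', 'fix', 'refactor', 'docs', 'test', 'chore', 'style'] as const;

// <type>(<scope>): <description>, with the scope optional.
// The allowed scope characters are an assumption made for this sketch.
const HEADER_PATTERN = new RegExp(
  `^(${COMMIT_TYPES.join('|')})(\\([a-z0-9-]+\\))?: \\S.*$`,
);

function isValidCommitHeader(header: string): boolean {
  return HEADER_PATTERN.test(header);
}
```

A check like this could run in a local commit hook, but whether Flynn enforces the format automatically is not stated here.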
### Examples
```
feat(tools): add image analysis tool

Implements image analysis using OpenAI Vision API.

Closes #123

fix(gateway): handle WebSocket disconnection gracefully

Prevents infinite loop when connection drops during event emission.

docs(readme): update quick start instructions

Clarify configuration steps for new users.

test(models): add Anthropic client retry tests

Verify exponential backoff and fallback behavior.
```
## Submitting Changes
### Pull Request Process
1. Ensure your branch is up to date with `main`
```bash
git fetch origin
git rebase origin/main
```
2. Push your branch
```bash
git push -u origin feature/my-new-feature
```
3. Create a pull request with:
   - Clear description of changes
   - Reference related issues
   - Screenshots if UI changes
   - Test results
   - Breaking changes noted
### Code Review Checklist
Before submitting, verify:
- [ ] All tests pass (`pnpm test:run`)
- [ ] Linting passes (`pnpm lint`)
- [ ] Type checking passes (`pnpm typecheck`)
- [ ] Build succeeds (`pnpm build`)
- [ ] New features have tests
- [ ] Documentation updated (README, AGENTS.md, code comments)
- [ ] No console.log or debugger statements
- [ ] Sensitive data not committed (API keys, tokens)
- [ ] Commit messages follow format
## Getting Help
### Documentation
- **Architecture**: [`.planning/codebase/ARCHITECTURE.md`](.planning/codebase/ARCHITECTURE.md)
- **Structure**: [`.planning/codebase/STRUCTURE.md`](.planning/codebase/STRUCTURE.md)
- **Conventions**: [`.planning/codebase/CONVENTIONS.md`](.planning/codebase/CONVENTIONS.md)
- **Developer Guide**: [`AGENTS.md`](AGENTS.md)
- **User Documentation**: [`README.md`](README.md)
### Troubleshooting
See [`TROUBLESHOOTING.md`](TROUBLESHOOTING.md) for common issues and solutions.
### Questions?
- Open an issue for bugs or feature requests
- Start a discussion for questions
- Check existing issues and discussions first
---
Thank you for contributing to Flynn!
+693
@@ -0,0 +1,693 @@
# Troubleshooting Flynn
This guide covers common issues, error messages, and debugging techniques for Flynn.
## Table of Contents
- [Common Errors](#common-errors)
- [Model Issues](#model-issues)
- [Tool Issues](#tool-issues)
- [Channel Issues](#channel-issues)
- [Gateway Issues](#gateway-issues)
- [Configuration Issues](#configuration-issues)
- [Memory & Database Issues](#memory--database-issues)
- [Debug Mode](#debug-mode)
- [Getting Help](#getting-help)
## Common Errors
### `Error: Environment variable FLYNN_CONFIG is not set`
Flynn can't find your configuration file.
**Solution:**
```bash
# Create config from template
cp config/default.yaml ~/.config/flynn/config.yaml
# Or specify config file explicitly
flynn start --config ~/my-config.yaml
```
### `Error: Failed to load config: ...`
Configuration validation failed. Check your YAML syntax.
**Solution:**
```bash
# Validate your config
flynn doctor --config ~/.config/flynn/config.yaml
# Check YAML syntax
yamllint ~/.config/flynn/config.yaml
```
Common issues:
- Missing quotes around special characters
- Incorrect indentation (YAML is space-sensitive, 2 spaces)
- Invalid boolean values (use `true`/`false`, not `yes`/`no`)
### `Error: Tool 'xxx' is not allowed by tool policy`
Tool execution blocked by policy configuration.
**Solution:**
1. Check tool policy in config:
```yaml
tools:
  policy: 'full' # or 'coding', 'messaging', 'minimal'
  profiles:
    full:
      allow: ['*']
      deny: []
```
2. Check agent-specific tool overrides:
```yaml
agents:
  my-agent:
    toolPolicy: 'full'
```
3. Verify tool is registered (check `src/daemon/index.ts`)
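To reason about why a tool is blocked, it can help to model the allow/deny check. A minimal sketch, assuming that `deny` takes precedence over `allow` and that a trailing `*` acts as a prefix wildcard; both rules are assumptions for illustration, and the actual logic lives in `src/tools/policy.ts`:

```typescript
interface ToolProfile {
  allow: string[];
  deny: string[];
}

// A trailing '*' matches any name with that prefix; '*' alone matches everything.
function matchesPattern(pattern: string, toolName: string): boolean {
  if (pattern.endsWith('*')) {
    return toolName.startsWith(pattern.slice(0, -1));
  }
  return pattern === toolName;
}

// Assumed precedence: an explicit deny always wins over allow.
function isToolAllowed(toolName: string, profile: ToolProfile): boolean {
  if (profile.deny.some((pattern) => matchesPattern(pattern, toolName))) {
    return false;
  }
  return profile.allow.some((pattern) => matchesPattern(pattern, toolName));
}
```

Under these assumptions, the `full` profile above (`allow: ['*']`, `deny: []`) permits every registered tool.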
### `Error: Session not found`
Gateway tried to access a non-existent session.
**Solution:**
```bash
# List active sessions
flynn sessions
# Sessions are auto-created on first message
# Ensure you're sending to a valid channel/sender combination
```
## Model Issues
### Model refuses to answer / "I can't help with that"
The model may be rejecting requests due to content filters or safety constraints.
**Solution:**
1. Try rephrasing your request
2. Check if the model has specific content policies
3. Try a different model tier:
```bash
# In TUI, switch models
/model complex
```
### Rate Limit Errors / Too Many Requests
API rate limits exceeded.
**Solution:**
1. Reduce request frequency
2. Add retry configuration in config:
```yaml
models:
default:
anthropic:
apiKey: 'sk-...'
retry:
maxAttempts: 3
initialDelayMs: 1000
maxDelayMs: 30000
multiplier: 2
```
3. Switch to a different provider or tier
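The four retry settings describe a capped exponential backoff. A minimal sketch of the resulting wait times, assuming one delay between consecutive attempts and no jitter; `backoffDelays` is a hypothetical helper, and Flynn's actual retry code may differ:

```typescript
interface RetryConfig {
  maxAttempts: number;
  initialDelayMs: number;
  maxDelayMs: number;
  multiplier: number;
}

// Returns the wait (in ms) before each retry. With maxAttempts attempts
// there are at most maxAttempts - 1 waits.
function backoffDelays(config: RetryConfig): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < config.maxAttempts - 1; retry++) {
    const uncapped = config.initialDelayMs * config.multiplier ** retry;
    delays.push(Math.min(uncapped, config.maxDelayMs));
  }
  return delays;
}
```

With the values shown above (`maxAttempts: 3`, `initialDelayMs: 1000`, `multiplier: 2`), this gives waits of 1000 ms and 2000 ms; `maxDelayMs: 30000` only matters once the uncapped delay exceeds 30 seconds.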
### Model Fallback Not Working
Fallback chain not triggering on errors.
**Solution:**
1. Check model router config:
```yaml
models:
  router:
    tiers:
      default: 'anthropic:claude-sonnet-4-20250514'
    fallbackChain:
      - 'github:claude-sonnet-4-5'
      - 'local:ollama:llama3'
```
2. Check error patterns in retry config (errors matching these patterns won't retry):
```yaml
retry:
  nonRetryablePatterns:
    - 'invalid_api_key'
    - 'permission_denied'
```
3. Enable debug logging to see fallback decisions:
```bash
DEBUG='*' flynn start
```
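The interaction between the fallback chain and the non-retryable patterns can be sketched as follows. This assumes that a matching error stops the chain entirely rather than moving to the next model, which is an assumption for illustration; `callWithFallback` and `ModelCall` are hypothetical names, not Flynn's router API:

```typescript
type ModelCall = (model: string) => Promise<string>;

// Tries each model in order. An error whose message contains one of the
// non-retryable patterns aborts immediately (assumed behavior).
async function callWithFallback(
  models: string[],
  call: ModelCall,
  nonRetryablePatterns: string[],
): Promise<string> {
  let lastError: unknown = new Error('No models configured');
  for (const model of models) {
    try {
      return await call(model);
    } catch (error) {
      lastError = error;
      const message = error instanceof Error ? error.message : String(error);
      if (nonRetryablePatterns.some((pattern) => message.includes(pattern))) {
        break;
      }
    }
  }
  throw lastError;
}
```

If fallback never seems to trigger, one thing to check under this model is whether the failing error matches a configured pattern.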
### Context Window Exceeded
Conversation too long for model's context window.
**Solution:**
1. Check compaction settings:
```yaml
agents:
  default:
    compaction:
      thresholdPct: 80
      keepTurns: 4
      summaryMaxTokens: 1024
```
2. Adjust `keepTurns`: a higher value preserves more recent history, a lower value frees more of the context window
3. Lower `thresholdPct` to trigger compaction earlier
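The effect of these settings is easier to see with numbers. A minimal sketch, assuming `thresholdPct` is a percentage of the model's context window and that a "turn" is one element of the history array; both helpers are hypothetical and not taken from Flynn's source:

```typescript
// Compaction triggers once usage reaches thresholdPct percent of the window.
function shouldCompact(usedTokens: number, contextWindow: number, thresholdPct: number): boolean {
  return usedTokens >= (contextWindow * thresholdPct) / 100;
}

// Splits history into the older part to summarize and the most recent
// keepTurns entries to keep verbatim.
function splitForCompaction<T>(turns: T[], keepTurns: number): { toSummarize: T[]; toKeep: T[] } {
  const cut = Math.max(0, turns.length - keepTurns);
  return { toSummarize: turns.slice(0, cut), toKeep: turns.slice(cut) };
}
```

For a 200,000-token window, `thresholdPct: 80` would trigger at 160,000 tokens under this model, and a lower percentage would trigger sooner.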
## Tool Issues
### Tool Execution Timeout
Tool took too long to complete.
**Solution:**
1. Check timeout config:
```yaml
tools:
  executor:
    defaultTimeoutMs: 30000
```
2. Increase timeout for specific tools (if supported):
```bash
shell.exec --timeout 60000 "long-running-command"
```
3. Check if tool is hanging (stuck process, network issue)
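A timeout like `defaultTimeoutMs` is commonly enforced by racing the work against a timer. A minimal sketch of that pattern; `withTimeout` is a hypothetical helper, and Flynn's executor may implement timeouts differently:

```typescript
// Rejects if `work` does not settle within timeoutMs. The underlying
// operation is not cancelled here; only the wait is abandoned.
async function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`Tool timed out after ${timeoutMs}ms`));
    }, timeoutMs);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

Because the work itself keeps running in this pattern, a tool that times out may still hold a process or socket open, which is why step 3 matters.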
### Tool Permission Denied
Tool can't access file/execute command.
**Solution:**
1. Check file permissions:
```bash
ls -la /path/to/file
chmod +x /path/to/executable
```
2. Check Docker sandbox permissions (if enabled):
```bash
# Ensure Docker daemon is running
docker ps
# Check Flynn's Docker access
groups $USER | grep docker
```
3. Verify hook configuration (hooks may block tools):
```bash
flynn config | grep -A 20 hooks
```
### Tool Output Truncated
Output exceeds maximum size limit.
**Solution:**
Check max output config:
```yaml
tools:
  executor:
    maxOutputBytes: 51200 # 50KB default
```
Increase limit if needed, or use file operations to handle large outputs.
### Tool Not Found
Tool doesn't exist or isn't registered.
**Solution:**
```bash
# List available tools
flynn doctor --config ~/.config/flynn/config.yaml
# Check if tool is in builtin list
grep -r "name: 'your.tool'" src/tools/builtin/
```
If you added a custom tool, ensure it's registered in `src/daemon/index.ts`.
## Channel Issues
### Telegram Bot Not Responding
Telegram bot not receiving or processing messages.
**Solution:**
1. Check bot token in config:
```yaml
channels:
  telegram:
    enabled: true
    token: '123456789:ABCdefGHIjklMNOpqrSTUvwxYZ'
```
2. Verify bot is running:
```bash
# Check Flynn logs for telegram startup
flynn start 2>&1 | grep telegram
```
3. Test bot via Telegram API:
```bash
curl https://api.telegram.org/bot<YOUR_TOKEN>/getMe
```
4. Check the `allowedChatIds` whitelist:
```yaml
channels:
  telegram:
    allowedChatIds: ['123456789'] # Your chat ID
```
### Discord Bot Not Joining Channels
Discord bot permissions issues.
**Solution:**
1. Check bot token:
```yaml
channels:
  discord:
    enabled: true
    token: 'MTIzNDU2Nzg5O...ABcDefGhIjKlMnOpQrStUvWxYz'
```
2. Verify guild/channel IDs in whitelist:
```yaml
channels:
  discord:
    allowedGuildIds: ['123456789012345678']
    allowedChannelIds: ['123456789012345678']
```
3. Check bot permissions in Discord server (need `Read Messages`, `Send Messages`, `Embed Links`)
### Slack Webhooks Not Receiving
Slack event subscription issues.
**Solution:**
1. Verify signing secret in config:
```yaml
channels:
  slack:
    enabled: true
    signingSecret: 'a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6'
```
2. Check Slack app permissions and event subscriptions
3. Verify ngrok/tunnel if testing locally
### WhatsApp Bot Not Connecting
WhatsApp Web.js connection issues.
**Solution:**
1. Check phone number whitelist:
```yaml
channels:
  whatsapp:
    enabled: true
    allowedNumbers: ['+1234567890']
```
2. WhatsApp requires QR code scan on first run - run in foreground:
```bash
flynn start # Watch console for QR code
```
3. If running in Docker, ensure it can display QR code or use saved session
## Gateway Issues
### WebSocket Connection Refused
Can't connect to gateway WebSocket.
**Solution:**
1. Check gateway is running:
```bash
curl http://localhost:18800/health
```
2. Check port configuration:
```yaml
gateway:
  enabled: true
  port: 18800
```
3. Check firewall rules:
```bash
sudo ufw allow 18800
```
4. Check auth token if configured:
```yaml
gateway:
  auth:
    token: 'your-secret-token'
```
### Gateway Lock Active
Only one WebSocket client allowed at a time.
**Solution:**
```yaml
gateway:
  lock:
    enabled: false # Disable lock
```
Or disconnect existing client first.
### Tailscale Serve Not Exposing
Tailscale integration not working.
**Solution:**
1. Ensure Tailscale is installed and running:
```bash
tailscale status
```
2. Check Tailscale config:
```yaml
gateway:
  tailscaleServe:
    enabled: true
    hostname: 'my-flynn'
    port: 443
```
3. Check Flynn logs for Tailscale errors:
```bash
flynn start 2>&1 | grep -i tailscale
```
## Configuration Issues
### Invalid YAML Syntax
Configuration file has YAML syntax errors.
**Solution:**
```bash
# Validate with YAML linter
pip install yamllint
yamllint ~/.config/flynn/config.yaml
# Or use online YAML validator
# https://www.yamllint.com/
```
Common mistakes:
- Using tabs instead of spaces (YAML requires spaces)
- Incorrect indentation (must be consistent)
- Missing quotes around special characters (`:`, `{`, `}`, `[`, `]`)
- Unclosed brackets or quotes
### Environment Variable Expansion Not Working
`${ENV_VAR}` in config not expanding.
**Solution:**
1. Ensure variable is set:
```bash
echo $MY_API_KEY
```
2. Check variable syntax in config:
```yaml
models:
  default:
    anthropic:
      apiKey: '${ANTHROPIC_API_KEY}' # Correct
      # Not: $ANTHROPIC_API_KEY or "${ANTHROPIC_API_KEY}"
```
3. Ensure the braced `${...}` form is used; the bare `$VAR` form is not expanded:
```yaml
# Wrong
apiKey: '$ANTHROPIC_API_KEY'
# Correct
apiKey: '${ANTHROPIC_API_KEY}'
```
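The expansion rule can be illustrated with a small function. A minimal sketch, assuming only the braced `${NAME}` form is substituted and that an unset variable is an error (matching the message shown under Common Errors); `expandEnv` is a hypothetical helper, not Flynn's config loader:

```typescript
// Replaces every ${NAME} in a config value with env[NAME].
// Bare $NAME is left untouched because only the braced form is matched.
function expandEnv(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}
```

In a Node.js process this would be called as `expandEnv(rawValue, process.env)`, so the variable must be exported in the environment that starts Flynn, not just in an interactive shell.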
### Secrets Showing in Logs
API keys or sensitive data visible in logs.
**Solution:**
Flynn automatically redacts secrets when showing config with `flynn config`.
If you see secrets in output, check:
- Don't log config manually (use the built-in redaction)
- Don't commit config files to git (add to `.gitignore`)
- Use `FLYNN_CONFIG` env var for sensitive paths
## Memory & Database Issues
### SQLite Database Locked
Can't write to session or vector database.
**Solution:**
1. Check if another Flynn instance is running:
```bash
ps aux | grep flynn
```
2. Kill existing instances:
```bash
pkill flynn
```
3. Check database permissions:
```bash
ls -la ~/.local/share/flynn/*.db
chmod 644 ~/.local/share/flynn/*.db
```
### Memory Not Saving
Memory writes not persisting.
**Solution:**
1. Check memory directory:
```bash
ls -la ~/.local/share/flynn/memory/
```
2. Check namespace format (must be valid filename):
```bash
# Good
memory.write --namespace 'my-knowledge'
# Bad
memory.write --namespace 'my/namespace' # Creates directory
```
3. Check disk space:
```bash
df -h ~/.local/share/flynn/
```
### Vector Search Returns No Results
Hybrid search not finding matches.
**Solution:**
1. Ensure embeddings are generated:
```bash
# Check vector database
sqlite3 ~/.local/share/flynn/vectors.db "SELECT COUNT(*) FROM embeddings;"
```
2. Check embedding provider config:
```yaml
memory:
  embeddings:
    provider: 'openai' # or 'gemini', 'ollama', 'llamacpp'
    openai:
      apiKey: '${OPENAI_API_KEY}'
      model: 'text-embedding-3-small'
```
3. Try keyword search only:
```bash
memory.search --query 'my query' --keyword-only
```
## Debug Mode
### Enable Debug Logging
Flynn uses console logging. Enable verbose output:
```bash
# Set DEBUG environment variable (if using debug packages)
DEBUG='*' flynn start
# Or redirect all output to file
flynn start > /tmp/flynn.log 2>&1
# Monitor logs in real-time
tail -f /tmp/flynn.log
```
### Run Doctor Command
Use the built-in diagnostic tool:
```bash
# Full check
flynn doctor
# Check specific component
flynn doctor --config ~/.config/flynn/config.yaml
# Check connectivity
flynn doctor --check connectivity
```
The doctor checks:
- Config syntax and validation
- Model provider connectivity
- Channel adapter status
- Memory and database integrity
- Gateway health
### Test Components Individually
```bash
# Test model connectivity
flynn send "Hello, world!"
# Test specific tool (in TUI)
/file.read /path/to/file
# Test channel
# Send message via platform (Telegram, Discord, etc.)
# Test gateway
curl http://localhost:18800/health
```
### Inspect Session Data
```bash
# View sessions
flynn sessions
# Inspect session database directly
sqlite3 ~/.local/share/flynn/sessions.db "SELECT * FROM sessions LIMIT 5;"
# View session messages
sqlite3 ~/.local/share/flynn/sessions.db "SELECT * FROM messages WHERE session_id = '...';"
```
### Enable Heartbeat Monitoring
Flynn includes a heartbeat monitor that checks system health:
```yaml
automation:
  heartbeat:
    enabled: true
    interval: '5m'
    checks:
      - 'gateway'
      - 'model'
      - 'channels'
      - 'memory'
      - 'disk'
```
Failures will be logged and notifications sent (if configured).
## Getting Help
### Log Your Issue
When reporting issues, include:
1. **Error message** (full stack trace if available)
2. **Flynn version**: `flynn --version`
3. **Configuration** (redact secrets): `flynn config`
4. **Node version**: `node --version`
5. **OS**: `uname -a`
6. **Steps to reproduce**
### Check Existing Issues
Search GitHub issues for similar problems:
- https://github.com/your-repo/flynn/issues
### Ask for Help
- **GitHub Issues**: Bug reports and feature requests
- **GitHub Discussions**: Questions and community help
- **Documentation**: Check README.md, AGENTS.md, and this file
### Provide Diagnostic Output
```bash
# Run doctor and save output
flynn doctor > /tmp/flynn-doctor.txt 2>&1
# Get config (redacted)
flynn config > /tmp/flynn-config.txt
# Get logs
journalctl -u flynn -n 100 --no-pager > /tmp/flynn-logs.txt
# Attach these files to your issue
```
---
Still having issues? Open a GitHub issue with the diagnostic output above.
+1007
File diff suppressed because it is too large
+1193
File diff suppressed because it is too large
+913
@@ -0,0 +1,913 @@
# Production Deployment Guide
This guide covers deploying Flynn in a production environment.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Docker Deployment](#docker-deployment)
- [Systemd Service](#systemd-service)
- [Security](#security)
- [Configuration](#configuration)
- [Monitoring](#monitoring)
- [Backup & Recovery](#backup--recovery)
- [Performance Tuning](#performance-tuning)
- [Scaling Considerations](#scaling-considerations)
## Prerequisites
### System Requirements
- **OS**: Linux (Ubuntu 22.04+ recommended) or macOS
- **Node.js**: >= 22.0.0
- **Memory**: Minimum 2GB, 4GB+ recommended
- **Disk**: 10GB+ for sessions, memory, and vectors
- **Docker**: Required for sandbox features (optional)
### Network Requirements
- Public IP or VPN (Tailscale recommended) for remote access
- Open ports: 18800 (gateway), optional 443 (Tailscale Serve)
- Outbound HTTPS access for model providers and web tools
### External Services (Optional)
- **Model Providers**: Anthropic, OpenAI, GitHub Models, etc. (API keys required)
- **Email**: SMTP server for email notifications
- **Object Storage**: MinIO or S3 for backups (optional)
## Docker Deployment
### Quick Start
Using the provided `docker-compose.yml`:
```bash
# Clone repository
git clone <repo-url>
cd flynn
# Create config
cp config/default.yaml config/production.yaml
# Edit config/production.yaml with your settings
# Start services
docker-compose up -d
# View logs
docker-compose logs -f
```
### Dockerfile
The multi-stage Dockerfile:
```dockerfile
# Stage 1: Build
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
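# Install all dependencies: the build needs dev dependencies such as the TypeScript compiler
RUN npm ci
COPY . .
# Build, then drop dev dependencies so only runtime packages are copied to the next stage
RUN npm run build && npm prune --omit=dev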
# Stage 2: Runtime
FROM node:22-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY config ./config
COPY src/gateway/ui ./dist/gateway/ui
# Create data directory
RUN mkdir -p /data
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD node -e "require('http').get('http://localhost:18800/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"
# Expose gateway port
EXPOSE 18800
# Run
CMD ["node", "dist/cli/index.js", "start"]
```
### Docker Compose Configuration
```yaml
version: '3.8'

services:
  flynn:
    build: .
    container_name: flynn
    restart: unless-stopped
    ports:
      - "18800:18800"
    volumes:
      - ./config/production.yaml:/flynn/config.yaml:ro
      - flynn_data:/data
      - /var/run/docker.sock:/var/run/docker.sock # For sandbox
    environment:
      - NODE_ENV=production
      - FLYNN_CONFIG=/flynn/config.yaml
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:18800/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 5s

  whisper:
    image: openai/whisper-server:latest
    container_name: whisper-server
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - whisper_cache:/cache
    environment:
      - WHISPER_MODEL=base
      - WHISPER_HTTP_PORT=8080

volumes:
  flynn_data:
  whisper_cache:
```
### Environment Variables
```bash
# Node environment
export NODE_ENV=production
# Config path
export FLYNN_CONFIG=/path/to/config.yaml
# Data directory (default: ~/.local/share/flynn)
export FLYNN_DATA_DIR=/var/lib/flynn
# Optional: Override model provider credentials
export ANTHROPIC_API_KEY=sk-...
export OPENAI_API_KEY=sk-...
```
## Systemd Service
### Service File
Create `/etc/systemd/system/flynn.service`:
```ini
[Unit]
Description=Flynn AI Assistant Daemon
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=flynn
Group=flynn
WorkingDirectory=/opt/flynn
Environment="NODE_ENV=production"
Environment="FLYNN_CONFIG=/etc/flynn/config.yaml"
Environment="FLYNN_DATA_DIR=/var/lib/flynn"
ExecStart=/usr/local/bin/node /opt/flynn/dist/cli/index.js start
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=flynn
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/flynn /var/log/flynn /var/run
# Resource limits
MemoryMax=2G
MemorySwapMax=0
CPUQuota=200%
[Install]
WantedBy=multi-user.target
```
### Create Flynn User
```bash
# Create group and system user
sudo groupadd --system flynn
sudo useradd --system --gid flynn --home-dir /var/lib/flynn --shell /usr/sbin/nologin flynn
# Create directories
sudo mkdir -p /opt/flynn /etc/flynn /var/lib/flynn /var/log/flynn
sudo chown -R flynn:flynn /var/lib/flynn /var/log/flynn
# Copy the build and config (ExecStart expects /opt/flynn/dist/cli/index.js)
sudo cp -r dist node_modules package.json /opt/flynn/
sudo cp config/production.yaml /etc/flynn/config.yaml
sudo chown -R root:root /opt/flynn
sudo chown root:flynn /etc/flynn/config.yaml
sudo chmod 640 /etc/flynn/config.yaml
```
### Enable and Start Service
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable service (start on boot)
sudo systemctl enable flynn
# Start service
sudo systemctl start flynn
# Check status
sudo systemctl status flynn
# View logs
sudo journalctl -u flynn -f
# Restart service
sudo systemctl restart flynn
```
### Service Management
```bash
# Stop service
sudo systemctl stop flynn
# Reload config (requires restart)
sudo systemctl restart flynn
# Check if running
sudo systemctl is-active flynn
# View recent logs
sudo journalctl -u flynn -n 100 --no-pager
```
## Security
### Secrets Management
Never commit secrets to version control. Use one of these approaches:
#### Environment Variables
```yaml
# config/production.yaml
models:
  default:
    anthropic:
      apiKey: '${ANTHROPIC_API_KEY}'
```
Set in `/etc/flynn/.env` or systemd service file:
```ini
Environment="ANTHROPIC_API_KEY=sk-..."
```
#### HashiCorp Vault (Advanced)
Use a secrets manager and inject at runtime:
```bash
# Read the secret straight into the environment instead of writing it to a temp file
export ANTHROPIC_API_KEY="$(vault kv get -field=api_key secret/anthropic)"
```
### Authentication
#### Gateway Auth
```yaml
# config/production.yaml
gateway:
  enabled: true
  auth:
    token: 'your-random-token-here' # Generate with: openssl rand -hex 32
    trustTailscaleIdentity: true
    applyToHttp: true
```
Generate a secure token:
```bash
openssl rand -hex 32
```
#### Channel Whitelists
Restrict who can interact with Flynn:
```yaml
channels:
  telegram:
    allowedChatIds: ['123456789'] # Your Telegram chat ID
  discord:
    allowedGuildIds: ['987654321098765432']
    allowedChannelIds: ['123456789012345678']
  slack:
    allowedChannelIds: ['C12345678']
    signingSecret: '${SLACK_SIGNING_SECRET}'
```
### Network Security
#### Firewall
```bash
# Ubuntu/Debian (ufw)
sudo ufw allow 22/tcp # SSH
sudo ufw allow 18800/tcp # Flynn gateway
sudo ufw enable
# CentOS/RHEL (firewalld)
sudo firewall-cmd --permanent --add-port=18800/tcp
sudo firewall-cmd --reload
```
#### Reverse Proxy (Nginx)
Place Flynn behind Nginx for TLS:
```nginx
server {
    listen 443 ssl http2;
    server_name flynn.example.com;

    ssl_certificate /etc/letsencrypt/live/flynn.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/flynn.example.com/privkey.pem;

    # WebSocket upgrade
    location / {
        proxy_pass http://localhost:18800;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    # Health check endpoint (no auth required)
    location /health {
        proxy_pass http://localhost:18800/health;
        access_log off;
    }
}
```
Obtain TLS certificate with Let's Encrypt:
```bash
sudo certbot --nginx -d flynn.example.com
```
### File Permissions
```bash
# Data directory
sudo chmod 750 /var/lib/flynn
sudo chown flynn:flynn /var/lib/flynn
# Config file
sudo chmod 640 /etc/flynn/config.yaml
sudo chown root:flynn /etc/flynn/config.yaml
# Logs
sudo chmod 750 /var/log/flynn
sudo chown flynn:flynn /var/log/flynn
```
### Sandbox Security
Docker sandbox adds isolation but requires careful configuration:
```yaml
# config/production.yaml
sandbox:
enabled: true
image: 'node:22-alpine'
dockerSocket: '/var/run/docker.sock'
resourceLimits:
memory: '512m'
cpus: '0.5'
timeoutSec: 60
networkMode: 'none' # No network access
```
Ensure Docker is secured:
```bash
# Let the flynn user access the Docker socket (docker group membership is effectively root access)
sudo usermod -aG docker flynn
# Configure Docker daemon security
sudo vim /etc/docker/daemon.json
```
```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "live-restore": true,
  "userland-proxy": false
}
```
## Configuration
### Production Config Template
```yaml
# config/production.yaml
# Base config for production deployment
# ── Gateway ───────────────────────────────────────────────────────────────
gateway:
enabled: true
port: 18800
auth:
token: '${GATEWAY_TOKEN}'
trustTailscaleIdentity: true
applyToHttp: true
lock:
enabled: true
tailscaleServe:
enabled: false # Set to true to expose via Tailscale
hostname: 'flynn'
port: 443
# ── Models ─────────────────────────────────────────────────────────────────
models:
default:
anthropic:
apiKey: '${ANTHROPIC_API_KEY}'
model: 'claude-sonnet-4-20250514'
maxTokens: 4096
router:
tiers:
default: 'anthropic:claude-sonnet-4-20250514'
fast: 'anthropic:claude-haiku-4-20250514'
complex: 'anthropic:claude-opus-4-20250514'
local: 'ollama:llama3'
fallbackChain:
- 'github:claude-sonnet-4-5'
- 'local:ollama:llama3'
retry:
maxAttempts: 3
initialDelayMs: 1000
multiplier: 2
maxDelayMs: 30000
# ── Channels ───────────────────────────────────────────────────────────────
channels:
telegram:
enabled: true
token: '${TELEGRAM_BOT_TOKEN}'
allowedChatIds: ['123456789']
discord:
enabled: false
slack:
enabled: false
whatsapp:
enabled: false
# ── Sessions ───────────────────────────────────────────────────────────────
sessions:
ttl: '7d'
maxSessions: 100
# ── Memory ────────────────────────────────────────────────────────────────
memory:
enabled: true
embeddings:
provider: 'openai'
openai:
apiKey: '${OPENAI_API_KEY}'
model: 'text-embedding-3-small'
# ── Tools ─────────────────────────────────────────────────────────────────
tools:
policy: 'coding' # Restrict tool access
executor:
defaultTimeoutMs: 30000
maxOutputBytes: 51200
sandbox:
enabled: false # Enable if using Docker
# ── Agents ────────────────────────────────────────────────────────────────
agents:
default:
modelTier: 'default'
toolPolicy: 'coding'
compaction:
thresholdPct: 80
keepTurns: 4
summaryMaxTokens: 1024
# ── Automation ────────────────────────────────────────────────────────────
automation:
cron:
enabled: false
webhooks:
enabled: false
heartbeat:
enabled: true
interval: '5m'
checks:
- 'gateway'
- 'model'
- 'channels'
- 'memory'
- 'disk'
notifications:
- type: 'telegram'
chatId: '123456789'
# ── Logging ───────────────────────────────────────────────────────────────
logging:
level: 'info' # debug, info, warn, error
```
### Config Validation
Validate config before starting:
```bash
flynn doctor --config /etc/flynn/config.yaml
```
## Monitoring
### Health Checks
Flynn provides a health check endpoint:
```bash
# HTTP health check
curl http://localhost:18800/health
# Response
{
"status": "ok",
"version": "0.1.0",
"uptime": 12345
}
```
### Logs
#### Journalctl (systemd)
```bash
# Follow logs
sudo journalctl -u flynn -f
# View last 100 lines
sudo journalctl -u flynn -n 100 --no-pager
# View logs since yesterday
sudo journalctl -u flynn --since yesterday
# Search for errors
sudo journalctl -u flynn | grep -i error
```
#### Log Rotation
The systemd journal handles its own rotation (logrotate is not involved). Cap its size and retention in `journald.conf`:
```bash
sudo vim /etc/systemd/journald.conf
```
```
[Journal]
SystemMaxUse=100M
MaxRetentionSec=7day
```
Restart journald to apply the change:
```bash
sudo systemctl restart systemd-journald
```
### Heartbeat Monitor
Enable built-in heartbeat monitoring:
```yaml
automation:
heartbeat:
enabled: true
interval: '5m'
checks:
- 'gateway'
- 'model'
- 'channels'
- 'memory'
- 'disk'
notifications:
- type: 'telegram'
chatId: '123456789'
- type: 'webhook'
url: 'https://hooks.slack.com/services/...'
```
### External Monitoring
#### Prometheus (Optional)
Use Node.js prom-client for metrics (not currently implemented):
```yaml
# Future feature
monitoring:
prometheus:
enabled: true
port: 9090
```
#### Uptime Monitoring
Use external services:
- UptimeRobot
- Pingdom
- Better Uptime
Monitor:
- Gateway HTTP health endpoint
- WebSocket connection
- Response time
## Backup & Recovery
### What to Backup
1. **Configuration**: `/etc/flynn/config.yaml`
2. **Sessions**: SQLite database at `~/.local/share/flynn/sessions.db`
3. **Memory Files**: `~/.local/share/flynn/memory/`
4. **Vectors**: SQLite database at `~/.local/share/flynn/vectors.db`
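5. **Pairing Codes**: SQLite table within `sessions.db`

The `~/.local/share/flynn` paths are relative to the home directory of the user that runs Flynn. The backup script below uses `DATA_DIR="/var/lib/flynn"`; point it at the directory your deployment actually uses.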
### Backup Script
Create `/usr/local/bin/flynn-backup.sh`:
```bash
#!/bin/bash
set -e
BACKUP_DIR="/var/backups/flynn"
DATA_DIR="/var/lib/flynn"
CONFIG_DIR="/etc/flynn"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/flynn_$DATE.tar.gz"
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Stop Flynn so the databases are not modified during the backup
sudo systemctl stop flynn
# Restart Flynn on exit, even if a later step fails (set -e would otherwise leave it stopped)
trap 'sudo systemctl start flynn' EXIT
# Create backup
tar -czf "$BACKUP_FILE" \
  "$CONFIG_DIR/config.yaml" \
  "$DATA_DIR/sessions.db" \
  "$DATA_DIR/vectors.db" \
  "$DATA_DIR/memory/"
# Delete backups older than 90 days
find "$BACKUP_DIR" -name "flynn_*.tar.gz" -mtime +90 -delete
echo "Backup created: $BACKUP_FILE"
```
Make executable:
```bash
sudo chmod +x /usr/local/bin/flynn-backup.sh
```
### Cron Job
Add to root crontab:
```bash
sudo crontab -e
```
```
# Daily backup at 2 AM
0 2 * * * /usr/local/bin/flynn-backup.sh >> /var/log/flynn-backup.log 2>&1
```
### Restore
```bash
# Stop Flynn
sudo systemctl stop flynn
# Extract backup
sudo tar -xzf /var/backups/flynn/flynn_20250213_020000.tar.gz -C /
# Start Flynn
sudo systemctl start flynn
```
### Database Maintenance
Run SQLite vacuum periodically:
```bash
sqlite3 /var/lib/flynn/sessions.db "VACUUM;"
sqlite3 /var/lib/flynn/vectors.db "VACUUM;"
```
Add to crontab (monthly):
```
0 0 1 * * sqlite3 /var/lib/flynn/sessions.db "VACUUM;" >> /var/log/flynn-maintenance.log 2>&1
```
## Performance Tuning
### Node.js Tuning
Set Node.js options for production:
```bash
# In systemd service
Environment="NODE_OPTIONS=--max-old-space-size=2048"
# Or via environment variable
export NODE_OPTIONS="--max-old-space-size=2048"
```
### Context Management
Optimize compaction settings:
```yaml
agents:
default:
compaction:
thresholdPct: 75 # Trigger earlier
keepTurns: 6 # Keep more context
summaryMaxTokens: 2048 # Better summaries
```
### SQLite Performance
Enable WAL mode:
```bash
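# journal_mode=WAL is stored in the database file, so setting it once is enough
sqlite3 /var/lib/flynn/sessions.db "PRAGMA journal_mode=WAL;"
# synchronous=NORMAL and cache_size=-64000 (64MB) are per-connection settings:
# running them from the sqlite3 CLI only affects that CLI session, so they
# must be applied on the connection Flynn itself opens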
```
### Model Routing
Configure tiers for optimal cost/latency:
```yaml
models:
router:
tiers:
fast: 'anthropic:claude-haiku-4-20250514' # Quick tasks
default: 'anthropic:claude-sonnet-4-20250514' # General use
complex: 'anthropic:claude-opus-4-20250514' # Complex reasoning
local: 'ollama:llama3' # Fallback
```
### Caching (Future)
Consider adding caching for:
- Repeated tool calls
- Memory search results
- Model responses for common queries
## Scaling Considerations
### Single-Operator Scope
Flynn is designed for a single operator with multiple concurrent users. Limitations:
- **Max Concurrent Sessions**: ~100 (depends on model rate limits)
- **Throughput**: ~10-20 messages/second (varies by model)
- **Memory Usage**: 2-4GB for moderate usage
### When to Scale Up
Consider scaling if:
- Consistent CPU usage > 80%
- Memory usage > 4GB
- Frequent rate limiting from model providers
- Slow response times > 30 seconds
### Scaling Strategies
1. **Horizontal Scaling**: Deploy multiple Flynn instances behind a load balancer (not currently supported - sessions are stateful)
2. **Vertical Scaling**: Increase server resources (CPU, memory)
3. **Multi-Instance Architecture** (future):
- Shared session storage (PostgreSQL/Redis)
- Message queue for request distribution
- Session affinity for stateful connections
### Cost Optimization
- Use local models for non-critical tasks
- Cache embeddings
- Optimize compaction to reduce token usage
- Use efficient models for delegated tasks
## Troubleshooting Production Issues
### Service Won't Start
```bash
# Check status
sudo systemctl status flynn
# View logs
sudo journalctl -u flynn -n 50 --no-pager
# Validate config
flynn doctor --config /etc/flynn/config.yaml
```
### High Memory Usage
```bash
# Check memory
free -h
# Check process memory
ps aux | grep flynn
# Restart service
sudo systemctl restart flynn
```
### Gateway Connection Issues
```bash
# Check if port is listening
sudo ss -tlnp | grep 18800
# Check firewall
sudo ufw status
# Test connectivity
curl http://localhost:18800/health
```
### Slow Response Times
```bash
# Check CPU usage
top
# Check model provider status
# Verify API keys are valid
# Check network latency
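# Enable debug logging: a variable set on the systemctl command line is not passed to the service.
# Add Environment="DEBUG=*" under [Service] with `sudo systemctl edit flynn`, then restart:
sudo systemctl restart flynn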
```
---
For additional help, see:
- [TROUBLESHOOTING.md](../../TROUBLESHOOTING.md)
- [README.md](../../README.md)
- GitHub Issues
+846
@@ -0,0 +1,846 @@
# Performance Tuning Guide
This guide covers performance optimization techniques for Flynn in production environments.
## Table of Contents
- [Overview](#overview)
- [Context Management](#context-management)
- [Model Routing](#model-routing)
- [Tool Execution](#tool-execution)
- [Memory & Embeddings](#memory--embeddings)
- [Session Management](#session-management)
- [Database Performance](#database-performance)
- [Gateway Performance](#gateway-performance)
- [Resource Usage](#resource-usage)
- [Monitoring & Profiling](#monitoring--profiling)
## Overview
Flynn's performance depends on several factors:
1. **Context window efficiency**: How efficiently tokens are used
2. **Model selection**: Choosing the right model for each task
3. **Tool execution**: Fast, reliable tool responses
4. **I/O operations**: Database and file system access
5. **Concurrency**: Handling multiple simultaneous requests
### Performance Goals
- **Response time**: < 5 seconds for simple queries
- **Context efficiency**: > 80% token utilization
- **Throughput**: 10-20 concurrent conversations
- **Resource usage**: < 2GB memory, < 50% CPU
## Context Management
### Compaction Settings
Context compaction prevents conversations from exceeding model context windows.
```yaml
agents:
  default:
    compaction:
      # Trigger compaction at 75% of context window
      thresholdPct: 75
      # Keep last 6 turns (user + assistant pairs)
      keepTurns: 6
      # Allow 2048 tokens for summary
      summaryMaxTokens: 2048
      # Preserve high-importance messages
      importanceThreshold: 0.8
```
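To make the interaction of `thresholdPct` and `keepTurns` concrete, here is a minimal sketch of the trigger check and the history split. It is not Flynn's implementation: `shouldCompact` and `splitForCompaction` are hypothetical names, the trigger is assumed to fire at "greater than or equal", and `importanceThreshold` is left out for brevity.

```typescript
interface CompactionConfig {
  thresholdPct: number;
  keepTurns: number;
}

// True once estimated token usage reaches thresholdPct of the context window.
function shouldCompact(usedTokens: number, contextWindow: number, cfg: CompactionConfig): boolean {
  return usedTokens >= (contextWindow * cfg.thresholdPct) / 100;
}

// Splits the history into the older part to summarize and the recent part kept verbatim.
// One turn is counted as a user message plus the assistant reply (2 messages).
function splitForCompaction<T>(messages: T[], cfg: CompactionConfig): { summarize: T[]; keep: T[] } {
  const keepCount = Math.min(messages.length, cfg.keepTurns * 2);
  const cut = messages.length - keepCount;
  return { summarize: messages.slice(0, cut), keep: messages.slice(cut) };
}
```

With a 200,000-token window and `thresholdPct: 75`, compaction would start at 150,000 estimated tokens; with `keepTurns: 6`, the last 12 messages stay verbatim and everything older is summarized into at most `summaryMaxTokens` tokens.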
### Tuning Guidelines
**For fast interactions:**
```yaml
thresholdPct: 60 # Compact early
keepTurns: 2 # Minimal history
summaryMaxTokens: 512 # Short summaries
```
**For complex reasoning:**
```yaml
thresholdPct: 85 # Maximize context
keepTurns: 10 # More history
summaryMaxTokens: 4096 # Detailed summaries
```
### Context Depth Levels
Control how much context is injected into the system prompt:
```yaml
prompt:
contextDepth: 'normal' # minimal | normal | detailed | debug
```
- `minimal`: Only basic system prompt
- `normal`: System prompt + basic memory
- `detailed`: Full memory + tool descriptions
- `debug`: Verbose context (development only)
### Token Counting
Flynn uses rule-based token estimation (fast but approximate).
**Exact tokenization (more accurate but slower; not yet implemented):**
```typescript
// Currently not implemented
// Future: Use tiktoken or similar for exact token counts
```
## Model Routing
### Tier Configuration
Optimize model tiers for cost and latency:
```yaml
models:
  router:
    tiers:
      # Fast, cheap: Quick tasks, delegated calls
      fast: 'anthropic:claude-haiku-4-20250514'
      # Default: General conversation
      default: 'anthropic:claude-sonnet-4-20250514'
      # Complex: Deep reasoning, analysis
      complex: 'anthropic:claude-opus-4-20250514'
      # Fallback: Local models when cloud fails
      local: 'ollama:llama3'
```
### Delegation Tasks
Map delegation tasks to appropriate tiers:
```yaml
agents:
  default:
    delegation:
      tiers:
        compaction: 'fast' # Summarize history
        memoryExtraction: 'fast' # Extract facts
        classification: 'default' # Classify intent
        toolSummarization: 'default' # Summarize tool results
        complexReasoning: 'complex' # Deep analysis
```
### Fallback Chains
Configure fallback chains for resilience:
```yaml
models:
router:
# Try same model on different provider
tierFallbacks:
default:
- 'github:claude-sonnet-4-5'
- 'openai:gpt-4o-mini'
# Global fallback when all tiers fail
fallbackChain:
- 'github:claude-sonnet-4-5'
- 'local:ollama:llama3'
```
### Retry Configuration
Optimize retry behavior for different scenarios:
```yaml
models:
  router:
    retry:
      # Total attempts per request
      maxAttempts: 3
      # Start with 1s delay
      initialDelayMs: 1000
      # Exponential backoff
      multiplier: 2
      # Max 30s between retries
      maxDelayMs: 30000
      # Don't retry errors matching these patterns (auth failures and rate limits)
      nonRetryablePatterns:
        - 'invalid_api_key'
        - 'permission_denied'
        - 'rate_limit_exceeded'
```
**For production reliability:**
```yaml
maxAttempts: 5
initialDelayMs: 500
multiplier: 1.5
maxDelayMs: 60000
```
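The delays these settings produce can be worked out with a short sketch. It assumes the usual exponential formula (`initialDelayMs * multiplier^n`, capped at `maxDelayMs`), that `maxAttempts` counts the first attempt, and no jitter; `backoffSchedule` is a hypothetical helper, and Flynn's router may differ in any of these details.

```typescript
interface RetryConfig {
  maxAttempts: number;
  initialDelayMs: number;
  multiplier: number;
  maxDelayMs: number;
}

// Delay before each retry: initialDelayMs * multiplier^n, capped at maxDelayMs.
// With maxAttempts total attempts there are maxAttempts - 1 waits.
function backoffSchedule(cfg: RetryConfig): number[] {
  const delays: number[] = [];
  for (let n = 0; n < cfg.maxAttempts - 1; n++) {
    const delay = cfg.initialDelayMs * Math.pow(cfg.multiplier, n);
    delays.push(Math.min(delay, cfg.maxDelayMs));
  }
  return delays;
}
```

Under these assumptions the default settings wait 1000 ms and then 2000 ms, while the production-reliability settings wait 500, 750, 1125, and 1687.5 ms. The second profile retries more often but adds less latency per retry.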
### Cost Estimation
Monitor token usage and costs:
```typescript
// Model costs (examples, USD per 1M tokens)
const MODEL_COSTS = {
  'anthropic:claude-sonnet-4-20250514': {
    input: 3.0, // $3 per 1M input tokens
    output: 15.0 // $15 per 1M output tokens
  },
  'anthropic:claude-haiku-4-20250514': {
    input: 0.25,
    output: 1.25
  }
};
```
Track usage with `AgentOrchestrator.getUsageStats()`.
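
A per-request estimate follows directly from a price table like the one above. The sketch below is illustrative only: `estimateCostUsd` is a hypothetical helper, the prices are the example values from this guide rather than authoritative rates, and the return shape of `getUsageStats()` is not assumed.

```typescript
// Example prices in USD per 1M tokens, copied from the table above.
const MODEL_COSTS: Record<string, { input: number; output: number }> = {
  "anthropic:claude-sonnet-4-20250514": { input: 3.0, output: 15.0 },
  "anthropic:claude-haiku-4-20250514": { input: 0.25, output: 1.25 },
};

// Hypothetical helper: estimated USD cost of one request,
// or undefined when the model has no entry in the table.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number | undefined {
  const price = MODEL_COSTS[model];
  if (price === undefined) return undefined;
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}
```

For example, a Sonnet request with 10,000 input tokens and 2,000 output tokens comes to roughly $0.06 at these example prices, split evenly between input and output.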
## Tool Execution
### Timeout Configuration
Set appropriate timeouts for different tool types:
```yaml
tools:
executor:
# Default 30s timeout
defaultTimeoutMs: 30000
# Max 50KB output
maxOutputBytes: 51200
```
**For long-running tools:**
```yaml
tools:
executor:
defaultTimeoutMs: 60000 # 60s
```
**For fast tools:**
```yaml
tools:
executor:
defaultTimeoutMs: 10000 # 10s
```
### Caching (Future)
Implement caching for repeated operations:
```yaml
# Not yet implemented
tools:
cache:
enabled: true
ttl: 300 # 5 minutes
maxSize: 1000
excludePatterns:
- 'shell.exec'
- 'process.*'
```
### Sandbox Performance
Docker sandbox adds overhead. Optimize:
```yaml
sandbox:
enabled: true
image: 'node:22-alpine'
# Resource limits
resourceLimits:
memory: '512m'
cpus: '0.5'
timeoutSec: 60
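# Host networking avoids bridge overhead but removes network isolation;
# keep 'none' unless sandboxed tools need network access and the host is trusted
networkMode: 'host' # Faster than bridge mode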
```
**For best performance:**
```yaml
sandbox:
enabled: false # Disable if not needed
```
### Parallel Tool Execution
Flynn executes tools sequentially. For parallel execution:
```typescript
// Future enhancement
const results = await Promise.all([
toolRegistry.execute('tool1', args1),
toolRegistry.execute('tool2', args2),
toolRegistry.execute('tool3', args3)
]);
```
## Memory & Embeddings
### Embedding Provider Selection
Choose embedding provider based on latency and cost:
```yaml
memory:
embeddings:
provider: 'openai' # openai | gemini | ollama | llamacpp | voyage
openai:
apiKey: '${OPENAI_API_KEY}'
model: 'text-embedding-3-small' # Fastest
# Alternative: Local embeddings
ollama:
host: 'localhost:11434'
model: 'nomic-embed-text'
```
**Approximate latency per embedding request (illustrative; measure in your own environment):**
- OpenAI `text-embedding-3-small`: ~100ms
- Gemini: ~200ms
- Ollama `nomic-embed-text`: ~500ms (local)
- llama.cpp: ~300ms (local)
### Text Chunking
Optimize chunking for better search:
```yaml
memory:
embeddings:
chunking:
# Smaller chunks for precision
maxChunkSize: 512
# Overlap for context preservation
chunkOverlap: 50
# Don't chunk small documents
minChunkSize: 128
```
**For fast indexing:**
```yaml
maxChunkSize: 1024
chunkOverlap: 100
```
**For precise search:**
```yaml
maxChunkSize: 256
chunkOverlap: 25
```
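The effect of `maxChunkSize` and `chunkOverlap` is easiest to see in a sliding-window sketch. This is an assumption-laden illustration: sizes are treated as characters (the unit is not stated here and may be tokens in Flynn), `minChunkSize` is read as "documents at or below this size are not split", and `chunkText` is a hypothetical function.

```typescript
interface ChunkingConfig {
  maxChunkSize: number;
  chunkOverlap: number;
  minChunkSize: number;
}

// Splits text into windows of maxChunkSize characters; consecutive windows
// share chunkOverlap characters. Short documents are returned as one chunk.
function chunkText(text: string, cfg: ChunkingConfig): string[] {
  if (cfg.chunkOverlap >= cfg.maxChunkSize) {
    throw new Error("chunkOverlap must be smaller than maxChunkSize");
  }
  if (text.length <= cfg.minChunkSize) return text.length > 0 ? [text] : [];
  const step = cfg.maxChunkSize - cfg.chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + cfg.maxChunkSize));
    // Stop once a window has reached the end of the text.
    if (start + cfg.maxChunkSize >= text.length) break;
  }
  return chunks;
}
```

Smaller chunks with the same overlap ratio produce more embeddings per document, which is why the precise-search profile costs more to index than the fast-indexing one.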
### Hybrid Search Tuning
Balance keyword and vector search:
```yaml
memory:
search:
# Weight vector search higher
vectorWeight: 0.7
keywordWeight: 0.3
# Return top results
limit: 10
# Minimum relevance threshold
threshold: 0.5
```
**For keyword-heavy queries:**
```yaml
vectorWeight: 0.4
keywordWeight: 0.6
```
**For semantic queries:**
```yaml
vectorWeight: 0.8
keywordWeight: 0.2
```
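One common way such weights are applied is a weighted sum of the two scores, followed by the threshold and limit. The sketch below assumes that scheme and assumes both scores are already normalized to the range 0 to 1; `rankHybrid` and the `Candidate` shape are hypothetical, and Flynn's actual fusion method may differ.

```typescript
interface SearchConfig {
  vectorWeight: number;
  keywordWeight: number;
  limit: number;
  threshold: number;
}

// Both scores are assumed to be normalized to [0, 1] before fusion.
interface Candidate {
  id: string;
  vectorScore: number;
  keywordScore: number;
}

// Weighted-sum fusion: combine scores, drop results below the threshold,
// sort best-first, and keep at most `limit` results.
function rankHybrid(candidates: Candidate[], cfg: SearchConfig): { id: string; score: number }[] {
  return candidates
    .map((c) => ({
      id: c.id,
      score: cfg.vectorWeight * c.vectorScore + cfg.keywordWeight * c.keywordScore,
    }))
    .filter((r) => r.score >= cfg.threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, cfg.limit);
}
```

Keeping the two weights summing to 1 keeps the fused score in the same 0 to 1 range, so the `threshold` value stays comparable across profiles.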
### Embedding Caching
Cache embeddings to avoid recomputation:
```yaml
memory:
embeddings:
cache:
enabled: true
ttl: 86400 # 24 hours
```
## Session Management
### TTL Configuration
Set appropriate session TTLs:
```yaml
sessions:
ttl: '7d' # Keep sessions for 7 days
# Maximum concurrent sessions
maxSessions: 100
```
**For memory efficiency:**
```yaml
ttl: '1d'
maxSessions: 50
```
**For long-term memory:**
```yaml
ttl: '30d'
maxSessions: 200
```
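For reference, a TTL string such as `'7d'` can be turned into an expiry check along these lines. The duration grammar (an integer followed by `s`, `m`, `h`, or `d`) is an assumption based on the values used in this guide, and `parseDuration` and `isExpired` are hypothetical helpers rather than Flynn functions.

```typescript
const UNIT_MS: Record<string, number> = { s: 1000, m: 60000, h: 3600000, d: 86400000 };

// Parses durations such as '7d', '1h', or '5m' into milliseconds.
function parseDuration(value: string): number {
  const match = /^(\d+)([smhd])$/.exec(value.trim());
  if (match === null) throw new Error(`Invalid duration: ${value}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// A session is expired once it has been idle for longer than the TTL.
function isExpired(lastActiveAtMs: number, ttl: string, nowMs: number): boolean {
  return nowMs - lastActiveAtMs > parseDuration(ttl);
}
```

Measuring from last activity rather than creation time means an active conversation is never pruned mid-use; whether Flynn measures from creation or last activity is not specified here.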
### Session Pruning
Prune old sessions regularly:
```yaml
automation:
sessionPruner:
enabled: true
interval: '1h' # Run every hour
# Prune sessions older than TTL
pruneOlderThan: '7d'
```
### Session Indexing
Optimize session search with indexes:
```sql
-- SQLite indexes
CREATE INDEX idx_sessions_created_at ON sessions(created_at);
CREATE INDEX idx_sessions_last_active ON sessions(last_active_at);
CREATE INDEX idx_messages_session_id ON messages(session_id);
```
## Database Performance
### SQLite Configuration
Optimize SQLite for Flynn's workload:
```sql
-- Run in the SQLite connection setup (only journal_mode persists in the database file)
PRAGMA journal_mode = WAL;    -- Better concurrency
PRAGMA synchronous = NORMAL;  -- Faster writes
PRAGMA cache_size = -64000;   -- 64MB cache
PRAGMA temp_store = MEMORY;   -- Store temp data in memory
PRAGMA mmap_size = 268435456; -- 256MB mmap
PRAGMA page_size = 4096;      -- Default page size; applies only to a new database or after VACUUM outside WAL mode
```
### Connection Pooling
Flynn uses a single SQLite connection per database. For high concurrency, consider:
```typescript
// Future: connection pool (pseudocode). `ConnectionPool` is a hypothetical class;
// better-sqlite3 is synchronous and does not ship a pool.
const pool = new ConnectionPool({
filename: '/path/to/database.db',
maxConnections: 10
});
```
### Query Optimization
Use indexed columns in queries:
```typescript
// Good: uses the index on last_active_at
const recentSessions = db.prepare(`
  SELECT * FROM sessions
  WHERE last_active_at > ?
  ORDER BY last_active_at DESC
  LIMIT 10
`).all(threshold);
// Bad: full table scan (no index on message_count)
const busySessions = db.prepare(`
  SELECT * FROM sessions
  WHERE message_count > ?
`).all(threshold);
```
### Vacuum and Analyze
Regular maintenance improves performance:
```bash
# Vacuum to reclaim space
sqlite3 sessions.db "VACUUM;"
# Analyze for query optimization
sqlite3 sessions.db "ANALYZE;"
# Rebuild indexes
sqlite3 sessions.db "REINDEX;"
```
Add to crontab (monthly):
```
0 0 1 * * sqlite3 /var/lib/flynn/sessions.db "VACUUM; ANALYZE;" >> /var/log/flynn-maintenance.log 2>&1
```
## Gateway Performance
### Connection Limits
Limit concurrent connections:
```yaml
gateway:
enabled: true
port: 18800
# Maximum concurrent WebSocket connections
maxConnections: 50
# Single-client lock
lock:
enabled: true # Only one client at a time
```
**For multiple users:**
```yaml
gateway:
maxConnections: 100
lock:
enabled: false
```
### Lane Queue
The lane queue serializes requests per session:
```yaml
gateway:
laneQueue:
# Max requests per session
maxDepth: 10
# Request timeout
requestTimeoutMs: 30000
```
### WebSocket Optimization
Configure WebSocket for performance:
```typescript
// Gateway server WebSocket options
const wsOptions = {
  // Enable compression for messages larger than 1KB
  perMessageDeflate: {
    threshold: 1024
  },
  // Track connected clients (needed to ping each one for heartbeats)
  clientTracking: true,
  // Maximum message size
  maxPayload: 16 * 1024 * 1024 // 16MB
};
```
### HTTP Server
Optimize HTTP server for static files:
```yaml
gateway:
static:
# Enable gzip compression
gzip: true
# Cache static assets
cacheControl: 'public, max-age=3600'
# Serve index.html for SPA routes
spa: true
```
## Resource Usage
### Node.js Options
Tune Node.js for production:
```bash
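# Increase the V8 heap limit (keep it below any systemd or container memory limit)
export NODE_OPTIONS="--max-old-space-size=4096"
# Do not add --optimize-for-size or --gc-interval here: they are not on Node's
# NODE_OPTIONS allowlist, and --gc-interval is a debugging flag that forces frequent GC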
```
In systemd service:
```ini
Environment="NODE_OPTIONS=--max-old-space-size=4096"
```
### Process Limits
Set appropriate limits:
```ini
[Service]
# Memory limit (2GB); MemoryMax= replaces the deprecated MemoryLimit=
MemoryMax=2G
MemorySwapMax=0
# CPU quota (200% = 2 cores)
CPUQuota=200%
# File descriptors
LimitNOFILE=65536
```
### Docker Resource Limits
Constrain Docker container:
```yaml
services:
flynn:
deploy:
resources:
limits:
cpus: '2.0'
memory: 2G
reservations:
cpus: '1.0'
memory: 1G
```
### Memory Monitoring
Monitor memory usage:
```bash
# Check Flynn memory
ps aux | grep flynn
# System memory
free -h
# Node.js heap stats (add this line to the application code, not the shell):
#   console.log('Heap used:', process.memoryUsage().heapUsed / 1024 / 1024, 'MB');
```
## Monitoring & Profiling
### Health Checks
Enable the heartbeat checks (the gateway serves `/health` on its own port):
```yaml
automation:
heartbeat:
enabled: true
interval: '5m'
checks:
- 'gateway'
- 'model'
- 'channels'
- 'memory'
- 'disk'
```
Check health:
```bash
curl http://localhost:18800/health
```
### Logging Levels
Configure logging appropriately:
```yaml
logging:
level: 'info' # debug | info | warn | error
```
**Development:** `debug` - All messages
**Production:** `info` - Normal operation
**Minimal:** `warn` - Only warnings and errors
### Performance Metrics
Track key metrics:
```typescript
// Future: Metrics collection
interface Metrics {
  // Response times
  avgResponseTime: number;
  p95ResponseTime: number;
  p99ResponseTime: number;
  // Throughput
  requestsPerSecond: number;
  concurrentSessions: number;
  // Token usage
  avgInputTokens: number;
  avgOutputTokens: number;
  totalTokens: number;
  // Errors
  errorRate: number;
  timeoutRate: number;
}
```
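Until metrics collection exists, the response-time fields can be computed from recorded latencies with a few lines. The sketch below uses the nearest-rank percentile definition, which is one of several common conventions; `percentile` and `summarizeResponseTimes` are hypothetical helpers and not part of Flynn.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

// Computes the response-time fields of the Metrics interface above.
function summarizeResponseTimes(responseTimesMs: number[]) {
  if (responseTimesMs.length === 0) throw new Error("no samples");
  const total = responseTimesMs.reduce((sum, t) => sum + t, 0);
  return {
    avgResponseTime: total / responseTimesMs.length,
    p95ResponseTime: percentile(responseTimesMs, 95),
    p99ResponseTime: percentile(responseTimesMs, 99),
  };
}
```

Percentiles are worth tracking alongside the average because a small number of slow model calls can leave the mean looking healthy while the p99 breaches the response-time goal.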
### Profiling
Profile Node.js execution:
```bash
# Generate CPU profile
node --prof dist/cli/index.js start
# Process profile
node --prof-process isolate-*.log > profile.txt
# For Chrome DevTools, record a .cpuprofile file instead and load it in the DevTools profiler
node --cpu-prof dist/cli/index.js start
```
### Flamegraphs
Generate flamegraphs for bottleneck analysis:
```bash
# Install 0x
npm install -g 0x
# Run with profiler
0x dist/cli/index.js start
```
## Common Performance Issues
### High Memory Usage
**Symptoms:**
- OOM errors
- Slow garbage collection
- System swapping
**Solutions:**
1. Reduce `keepTurns` in compaction
2. Decrease session TTL
3. Prune old sessions
4. Increase Node.js memory limit
5. Check for memory leaks
### Slow Response Times
**Symptoms:**
- Responses > 10 seconds
- Timeouts
- Poor user experience
**Solutions:**
1. Switch to faster model tier
2. Enable compaction
3. Use local fallbacks
4. Optimize tool timeouts
5. Check network latency
### High CPU Usage
**Symptoms:**
- CPU > 80%
- Slow system
- High latency
**Solutions:**
1. Reduce concurrent sessions
2. Optimize database queries
3. Use efficient embeddings
4. Disable unnecessary features
5. Scale vertically (more CPU)
### Database Locks
**Symptoms:**
- SQLite database locked errors
- Slow writes
- Concurrent access issues
**Solutions:**
1. Enable WAL mode
2. Reduce write frequency
3. Use connection pooling
4. Add appropriate indexes
### Model Rate Limits
**Symptoms:**
- 429 Too Many Requests errors
- Frequent fallbacks
- Increased latency
**Solutions:**
1. Configure retry with exponential backoff
2. Use faster models for delegated tasks
3. Implement request queuing
4. Add local model fallbacks
## Performance Checklist
Before deploying to production, verify:
- [ ] Compaction configured with appropriate threshold
- [ ] Model tiers configured for cost/latency
- [ ] Fallback chains configured
- [ ] Tool timeouts set appropriately
- [ ] Session TTL reasonable for use case
- [ ] SQLite optimized (WAL mode, cache size)
- [ ] Database indexes created
- [ ] Gateway connection limits set
- [ ] Memory limits configured
- [ ] Monitoring enabled
- [ ] Logging level set to `info` or `warn`
- [ ] Health checks working
- [ ] Backup/restore tested
---
For more information:
- [TROUBLESHOOTING.md](../../TROUBLESHOOTING.md)
- [PRODUCTION.md](../deployment/PRODUCTION.md)
- [ARCHITECTURE.md](../../.planning/codebase/ARCHITECTURE.md)