docs: Add comprehensive documentation for production deployment and contribution

This commit adds 6 new documentation files to fill critical gaps:

- CONTRIBUTING.md: Developer onboarding guide with setup, workflow, code style, testing, and adding features
- TROUBLESHOOTING.md: Common issues and solutions for errors, model issues, tool issues, channel issues, gateway issues, configuration issues, and memory/database issues
- docs/api/PROTOCOL.md: Gateway JSON-RPC protocol documentation with connection, authentication, message format, methods, events, error codes, and example client implementation
- docs/api/TOOLS.md: Tools API documentation covering tool interface, input schema format, result format, tool patterns, tool registration, tool policy, execution flow, and builtin tools reference
- docs/deployment/PRODUCTION.md: Production deployment guide covering Docker deployment, systemd service, security, configuration, monitoring, backup & recovery, and performance tuning
- docs/performance/TUNING.md: Performance optimization guide covering context management, model routing, tool execution, memory & embeddings, session management, database performance, gateway performance, and resource usage

These files complement the existing excellent documentation (README.md, AGENTS.md, ARCHITECTURE.md, STRUCTURE.md, CONVENTIONS.md) to provide complete coverage for users, developers, and operators.
2026-02-13 16:07:29 -08:00
parent cc54b3a10c
commit 8a6cd7f559
6 changed files with 5143 additions and 0 deletions
@@ -0,0 +1,491 @@
+# Contributing to Flynn
+
+Thank you for your interest in contributing to Flynn! This guide will help you get started.
+
+## Table of Contents
+
+- [Quick Start](#quick-start)
+- [Development Setup](#development-setup)
+- [Development Workflow](#development-workflow)
+- [Code Style](#code-style)
+- [Testing](#testing)
+- [Adding Features](#adding-features)
+- [Commit Guidelines](#commit-guidelines)
+- [Submitting Changes](#submitting-changes)
+- [Getting Help](#getting-help)
+
+## Quick Start
+
+```bash
+# Clone the repository
+git clone <repo-url>
+cd flynn
+
+# Install dependencies
+pnpm install
+
+# Build the project
+pnpm build
+
+# Run the daemon
+pnpm start
+```
+
+## Development Setup
+
+### Prerequisites
+
+- **Node.js** >= 22.0.0
+- **pnpm** (package manager)
+- **Docker** (optional, for sandbox features)
+
+### Installation
+
+```bash
+# Install dependencies
+pnpm install
+
+# Verify TypeScript compiles
+pnpm typecheck
+
+# Run linter
+pnpm lint
+
+# Run tests
+pnpm test
+```
+
+### Development Commands
+
+| Command | Description |
+|---------|-------------|
+| `pnpm build` | Compile TypeScript to `dist/` |
+| `pnpm dev` | Run daemon with watch mode (tsx watch) |
+| `pnpm start` | Start production build |
+| `pnpm tui` | Minimal TUI (readline) |
+| `pnpm tui:fs` | Fullscreen TUI (React/Ink) |
+| `pnpm test` | Run vitest in watch mode |
+| `pnpm test:run` | Run tests once (CI) |
+| `pnpm lint` | Run ESLint |
+| `pnpm typecheck` | TypeScript check (no emit) |
+
+### Running a Single Test File
+
+```bash
+pnpm test:run src/path/to/file.test.ts
+```
+
+### Configuration for Development
+
+Create a development config:
+
+```bash
+cp config/default.yaml ~/.config/flynn/config.yaml
+# Edit config with your API keys and settings
+```
+
+## Development Workflow
+
+### Branching Strategy
+
+1. **Main branch**: `main` - stable production code
+2. **Feature branches**: `feature/description` - new features
+3. **Bugfix branches**: `bugfix/description` - bug fixes
+4. **Refactor branches**: `refactor/description` - code improvements
+
+### Feature Development
+
+1. Create a feature branch from `main`
+   ```bash
+   git checkout -b feature/my-new-feature
+   ```
+
+2. Make your changes
+
+3. Build and test
+   ```bash
+   pnpm build
+   pnpm test:run   # single run; plain `pnpm test` stays in watch mode
+   pnpm lint
+   pnpm typecheck
+   ```
+
+4. Commit your changes (see [Commit Guidelines](#commit-guidelines))
+
+5. Push and create a pull request
+
+### Committing Changes
+
+Before committing, ensure:
+- All tests pass: `pnpm test:run`
+- Linting passes: `pnpm lint`
+- Type checking passes: `pnpm typecheck`
+- Build succeeds: `pnpm build`
+
+```bash
+git add .
+git commit -m "feat: add my new feature"
+```
+
+## Code Style
+
+Flynn follows specific conventions documented in [`.planning/codebase/CONVENTIONS.md`](.planning/codebase/CONVENTIONS.md).
+
+### Key Guidelines
+
+- **2-space indentation** (no tabs)
+- **Single quotes** for strings
+- **Trailing commas** in multiline structures
+- **Semicolons** always used
+- **camelCase** for functions/variables
+- **PascalCase** for classes/interfaces
+- **kebab-case** for source files (`my-feature.ts`)
+- **PascalCase** for React components (`MyComponent.tsx`)
+- Test files co-located with source: `file.test.ts` beside `file.ts`
+
+### Import Organization
+
+```typescript
+// 1. Node.js stdlib
+import { readFileSync } from 'fs';
+import { execFile } from 'child_process';
+
+// 2. Third-party packages
+import Anthropic from '@anthropic-ai/sdk';
+import { z } from 'zod';
+
+// 3. Local imports (always use .js extension)
+import { NativeAgent } from './agent.js';
+import type { Config } from '../config/schema.js';
+```
+
+### Error Handling
+
+```typescript
+// Pattern 1: Return ToolResult with error (tools)
+try {
+  const result = await someOperation();
+  return { success: true, output: result };
+} catch (error) {
+  return {
+    success: false,
+    output: '',
+    error: error instanceof Error ? error.message : String(error),
+  };
+}
+
+// Pattern 2: Throw with descriptive message (config/setup)
+if (envValue === undefined) {
+  throw new Error(`Environment variable ${envVar} is not set`);
+}
+```
+
+## Testing
+
+### Test Framework
+
+Flynn uses **Vitest** for testing. Test files are co-located with source files:
+
+```
+src/
+├── agent.ts
+├── agent.test.ts
+├── models/
+│   ├── anthropic.ts
+│   └── anthropic.test.ts
+```
+
+### Writing Tests
+
+```typescript
+import { describe, it, expect, beforeEach, afterEach } from 'vitest';
+import { MyComponent } from './my-component.js';
+
+describe('MyComponent', () => {
+  beforeEach(() => {
+    // Setup before each test
+  });
+
+  afterEach(() => {
+    // Cleanup after each test
+  });
+
+  it('should do something correctly', () => {
+    const result = MyComponent.doSomething();
+    expect(result).toBe('expected-value');
+  });
+
+  it('should handle errors gracefully', () => {
+    expect(() => MyComponent.doSomethingInvalid()).toThrow();
+  });
+});
+```
+
+### Testing Guidelines
+
+- Test both success and failure cases
+- Clean up resources (files, directories) in `afterEach` or `it` blocks
+- Mock external dependencies (APIs, databases, filesystem)
+- Use `describe`/`it` pattern for organization
+- Keep tests focused and independent
+
+### Running Tests
+
+```bash
+# Watch mode (during development)
+pnpm test
+
+# Run once (CI/pre-commit)
+pnpm test:run
+
+# Run specific test file
+pnpm test:run src/models/anthropic.test.ts
+
+# Run tests whose names match a pattern (Vitest uses -t / --testNamePattern)
+pnpm test:run -t "anthropic"
+```
+
+## Adding Features
+
+### Adding a New Tool
+
+Flynn tools follow three patterns:
+
+#### Pattern 1: Static Tool (no dependencies)
+
+```typescript
+// src/tools/builtin/my-tool.ts
+import type { Tool, ToolResult } from '../types.js';
+
+interface MyToolArgs {
+  input: string;
+}
+
+export const myTool: Tool = {
+  name: 'my.tool',
+  description: 'Description of what this tool does',
+  inputSchema: {
+    type: 'object',
+    properties: {
+      input: { type: 'string', description: 'Input parameter' },
+    },
+    required: ['input'],
+  },
+  execute: async (rawArgs: unknown): Promise<ToolResult> => {
+    const args = rawArgs as MyToolArgs;
+    // Implementation
+    return { success: true, output: 'result' };
+  },
+};
+```
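+
+The cast in `execute` trusts the caller to send the right shape. A minimal sketch of checking `rawArgs` at runtime before using it, assuming the `MyToolArgs` shape above; `isMyToolArgs` and `parseArgs` are hypothetical helpers, not part of Flynn's tool API:
+
+```typescript
+interface MyToolArgs {
+  input: string;
+}
+
+// Hypothetical type guard: narrows unknown input to MyToolArgs.
+function isMyToolArgs(value: unknown): value is MyToolArgs {
+  return (
+    typeof value === 'object' &&
+    value !== null &&
+    typeof (value as Record<string, unknown>).input === 'string'
+  );
+}
+
+// Returns the validated args, or an error message suitable for ToolResult.error.
+function parseArgs(rawArgs: unknown): { args?: MyToolArgs; error?: string } {
+  if (!isMyToolArgs(rawArgs)) {
+    return { error: 'Expected an object with a string "input" property' };
+  }
+  return { args: rawArgs };
+}
+```
+
+A tool could call `parseArgs(rawArgs)` first and return `{ success: false, output: '', error }` when `error` is set, which keeps malformed input on the Pattern 1 error path described under Error Handling.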
+
+#### Pattern 2: Factory Tool (needs dependency injection)
+
+```typescript
+// src/tools/builtin/memory-read.ts
+import type { Tool, ToolResult } from '../types.js';
+import type { MemoryStore } from '../../memory/store.js';
+
+export function createMemoryReadTool(store: MemoryStore): Tool {
+  return {
+    name: 'memory.read',
+    description: 'Read from memory store',
+    inputSchema: {
+      type: 'object',
+      properties: {
+        namespace: { type: 'string', description: 'Memory namespace' },
+      },
+      required: ['namespace'],
+    },
+    execute: async (rawArgs: unknown): Promise<ToolResult> => {
+      const args = rawArgs as { namespace: string };
+      try {
+        const content = store.read(args.namespace);
+        return { success: true, output: content };
+      } catch (error) {
+        return {
+          success: false,
+          output: '',
+          error: error instanceof Error ? error.message : String(error),
+        };
+      }
+    },
+  };
+}
+```
+
+#### Pattern 3: Multi-Factory (related tool set)
+
+```typescript
+// src/tools/builtin/index.ts
+export function createMemoryTools(store: MemoryStore, hybridSearch?: HybridSearch): Tool[] {
+  return [
+    createMemoryReadTool(store),
+    createMemoryWriteTool(store),
+    createMemorySearchTool(store, hybridSearch),
+  ];
+}
+```
+
+**Registration Steps:**
+
+1. Add export to `src/tools/builtin/index.ts`
+2. Add export to `src/tools/index.ts`
+3. Register in `src/daemon/index.ts` (call factory + register)
+4. Add to tool profiles in `src/tools/policy.ts` if needed
+5. Write tests in `src/tools/builtin/my-tool.test.ts`
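+
+Step 3 amounts to constructing each tool and adding it to a registry keyed by name. The sketch below illustrates that idea only; `ToolRegistry` and its methods are hypothetical stand-ins, the two interfaces are reduced versions of the real types, and the actual registration code in `src/daemon/index.ts` may look different:
+
+```typescript
+// Reduced, hypothetical versions of the types in src/tools/types.ts.
+interface ToolResult {
+  success: boolean;
+  output: string;
+  error?: string;
+}
+
+interface Tool {
+  name: string;
+  description: string;
+  execute: (rawArgs: unknown) => Promise<ToolResult>;
+}
+
+// Hypothetical registry: maps tool names to tools and rejects duplicates.
+class ToolRegistry {
+  private readonly tools = new Map<string, Tool>();
+
+  register(tool: Tool): void {
+    if (this.tools.has(tool.name)) {
+      throw new Error(`Tool '${tool.name}' is already registered`);
+    }
+    this.tools.set(tool.name, tool);
+  }
+
+  get(name: string): Tool | undefined {
+    return this.tools.get(name);
+  }
+
+  names(): string[] {
+    return Array.from(this.tools.keys());
+  }
+}
+
+const registry = new ToolRegistry();
+registry.register({
+  name: 'my.tool',
+  description: 'Example tool',
+  execute: async () => ({ success: true, output: 'result' }),
+});
+```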
+
+### Adding a New Channel Adapter
+
+1. Create directory: `src/channels/<platform>/`
+2. Create `adapter.ts` implementing `ChannelAdapter` interface
+3. Create `index.ts` re-exporting the adapter
+4. Add test: `adapter.test.ts`
+5. Register in `src/channels/index.ts`
+6. Register in `src/daemon/index.ts`
+7. Add config schema in `src/config/schema.ts`
+
+### Adding a New Model Provider
+
+1. Create `src/models/<provider>.ts` implementing `ModelClient` interface
+2. Add export to `src/models/index.ts`
+3. Add case in `src/daemon/index.ts` → `createClientFromConfig()`
+4. Add to `src/config/schema.ts` → `modelConfigBaseSchema.provider` enum
+5. Write tests in `src/models/<provider>.test.ts`
+
+### Adding a New CLI Command
+
+```typescript
+// src/cli/my-cmd.ts
+import { Command } from 'commander';
+import { loadConfigSafe } from './shared.js';
+
+export function registerMyCommand(program: Command) {
+  program
+    .command('my-cmd')
+    .description('Description of my command')
+    .option('-c, --config <path>', 'Config file path')
+    .action(async (options) => {
+      const configResult = await loadConfigSafe(options.config);
+      if (configResult.error) {
+        console.error(configResult.error);
+        process.exit(1);
+      }
+      // Implementation
+    });
+}
+```
+
+Register in `src/cli/index.ts`:
+```typescript
+import { registerMyCommand } from './my-cmd.js';
+
+// In registerCommands()
+registerMyCommand(program);
+```
+
+## Commit Guidelines
+
+### Commit Message Format
+
+Follow conventional commits:
+
+```
+<type>(<scope>): <description>
+
+[optional body]
+
+[optional footer]
+```
+
+### Types
+
+- `feat`: New feature
+- `fix`: Bug fix
+- `refactor`: Code refactoring (no functional change)
+- `docs`: Documentation changes
+- `test`: Test additions/modifications
+- `chore`: Build process, dependencies, tooling
+- `style`: Code style changes (formatting, semicolons, etc.)
+
+### Examples
+
+```
+feat(tools): add image analysis tool
+
+Implements image analysis using OpenAI Vision API.
+
+Closes #123
+
+fix(gateway): handle WebSocket disconnection gracefully
+
+Prevents infinite loop when connection drops during event emission.
+
+docs(readme): update quick start instructions
+
+Clarify configuration steps for new users.
+
+test(models): add Anthropic client retry tests
+
+Verify exponential backoff and fallback behavior.
+```
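+
+A commit-msg hook or CI check could enforce the header format. The sketch below is one possible validator for the first line only; `isValidCommitHeader` is a hypothetical helper, and the allowed scope characters (lowercase letters, digits, hyphens) are an assumption rather than a documented rule:
+
+```typescript
+// Types listed in the "Types" section above.
+const COMMIT_TYPES = ['feat', 'fix', 'refactor', 'docs', 'test', 'chore', 'style'];
+
+// <type>(<scope>): <description>, with the scope optional.
+const HEADER_PATTERN = new RegExp(
+  `^(${COMMIT_TYPES.join('|')})(\\([a-z0-9-]+\\))?: \\S.*$`,
+);
+
+// Hypothetical helper: checks only the first line of a commit message.
+function isValidCommitHeader(message: string): boolean {
+  const [header = ''] = message.split('\n');
+  return HEADER_PATTERN.test(header);
+}
+```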
+
+## Submitting Changes
+
+### Pull Request Process
+
+1. Ensure your branch is up to date with `main`
+   ```bash
+   git fetch origin
+   git rebase origin/main
+   ```
+
+2. Push your branch
+   ```bash
+   git push -u origin feature/my-new-feature
+   ```
+
+3. Create a pull request with:
+   - Clear description of changes
+   - Reference related issues
+   - Screenshots if UI changes
+   - Test results
+   - Breaking changes noted
+
+### Code Review Checklist
+
+Before submitting, verify:
+
+- [ ] All tests pass (`pnpm test:run`)
+- [ ] Linting passes (`pnpm lint`)
+- [ ] Type checking passes (`pnpm typecheck`)
+- [ ] Build succeeds (`pnpm build`)
+- [ ] New features have tests
+- [ ] Documentation updated (README, AGENTS.md, code comments)
+- [ ] No console.log or debugger statements
+- [ ] Sensitive data not committed (API keys, tokens)
+- [ ] Commit messages follow format
+
+## Getting Help
+
+### Documentation
+
+- **Architecture**: [`.planning/codebase/ARCHITECTURE.md`](.planning/codebase/ARCHITECTURE.md)
+- **Structure**: [`.planning/codebase/STRUCTURE.md`](.planning/codebase/STRUCTURE.md)
+- **Conventions**: [`.planning/codebase/CONVENTIONS.md`](.planning/codebase/CONVENTIONS.md)
+- **Developer Guide**: [`AGENTS.md`](AGENTS.md)
+- **User Documentation**: [`README.md`](README.md)
+
+### Troubleshooting
+
+See [`TROUBLESHOOTING.md`](TROUBLESHOOTING.md) for common issues and solutions.
+
+### Questions?
+
+- Open an issue for bugs or feature requests
+- Start a discussion for questions
+- Check existing issues and discussions first
+
+---
+
+Thank you for contributing to Flynn!
@@ -0,0 +1,693 @@
+# Troubleshooting Flynn
+
+This guide covers common issues, error messages, and debugging techniques for Flynn.
+
+## Table of Contents
+
+- [Common Errors](#common-errors)
+- [Model Issues](#model-issues)
+- [Tool Issues](#tool-issues)
+- [Channel Issues](#channel-issues)
+- [Gateway Issues](#gateway-issues)
+- [Configuration Issues](#configuration-issues)
+- [Memory & Database Issues](#memory--database-issues)
+- [Debug Mode](#debug-mode)
+- [Getting Help](#getting-help)
+
+## Common Errors
+
+### `Error: Environment variable FLYNN_CONFIG is not set`
+
+Flynn can't find your configuration file.
+
+**Solution:**
+
+```bash
+# Create config from template
+cp config/default.yaml ~/.config/flynn/config.yaml
+
+# Or specify config file explicitly
+flynn start --config ~/my-config.yaml
+```
+
+### `Error: Failed to load config: ...`
+
+Configuration validation failed. Check your YAML syntax.
+
+**Solution:**
+
+```bash
+# Validate your config
+flynn doctor --config ~/.config/flynn/config.yaml
+
+# Check YAML syntax
+yamllint ~/.config/flynn/config.yaml
+```
+
+Common issues:
+- Missing quotes around special characters
+- Incorrect indentation (YAML is whitespace-sensitive; use 2 spaces per level, never tabs)
+- Invalid boolean values (use `true`/`false`, not `yes`/`no`)
+
+### `Error: Tool 'xxx' is not allowed by tool policy`
+
+Tool execution blocked by policy configuration.
+
+**Solution:**
+
+1. Check tool policy in config:
+   ```yaml
+   tools:
+     policy: 'full'  # or 'coding', 'messaging', 'minimal'
+     profiles:
+       full:
+         allow: ['*']
+         deny: []
+   ```
+
+2. Check agent-specific tool overrides:
+   ```yaml
+   agents:
+     my-agent:
+       toolPolicy: 'full'
+   ```
+
+3. Verify tool is registered (check `src/daemon/index.ts`)
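+
+To reason about why a tool is blocked, it can help to model how an `allow`/`deny` profile might be evaluated. This is a sketch under stated assumptions, not Flynn's actual policy code in `src/tools/policy.ts`: it assumes `*` matches every tool, other entries must match the tool name exactly, and a `deny` entry always wins over `allow`.
+
+```typescript
+interface ToolProfile {
+  allow: string[];
+  deny: string[];
+}
+
+// Assumed matching rule: '*' matches every tool, anything else must match exactly.
+function matches(pattern: string, toolName: string): boolean {
+  return pattern === '*' || pattern === toolName;
+}
+
+// Assumed precedence: a deny entry always wins over an allow entry.
+function isToolAllowed(profile: ToolProfile, toolName: string): boolean {
+  if (profile.deny.some((pattern) => matches(pattern, toolName))) {
+    return false;
+  }
+  return profile.allow.some((pattern) => matches(pattern, toolName));
+}
+
+const full: ToolProfile = { allow: ['*'], deny: [] };
+const restricted: ToolProfile = { allow: ['*'], deny: ['shell.exec'] };
+```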
+
+### `Error: Session not found`
+
+Gateway tried to access a non-existent session.
+
+**Solution:**
+
+```bash
+# List active sessions
+flynn sessions
+
+# Sessions are auto-created on first message
+# Ensure you're sending to a valid channel/sender combination
+```
+
+## Model Issues
+
+### Model refuses to answer / "I can't help with that"
+
+The model may be rejecting requests due to content filters or safety constraints.
+
+**Solution:**
+
+1. Try rephrasing your request
+2. Check if the model has specific content policies
+3. Try a different model tier:
+   ```bash
+   # In TUI, switch models
+   /model complex
+   ```
+
+### Rate Limit Errors / Too Many Requests
+
+API rate limits exceeded.
+
+**Solution:**
+
+1. Reduce request frequency
+2. Add retry configuration in config:
+   ```yaml
+   models:
+     default:
+       anthropic:
+         apiKey: 'sk-...'
+         retry:
+           maxAttempts: 3
+           initialDelayMs: 1000
+           maxDelayMs: 30000
+           multiplier: 2
+   ```
+
+3. Switch to a different provider or tier
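+
+The four retry settings describe an exponential backoff schedule. A minimal sketch of the delays they would produce, assuming `maxAttempts` counts the first call (so there are `maxAttempts - 1` retries) and that no jitter is applied; `retryDelays` is a hypothetical helper:
+
+```typescript
+interface RetryConfig {
+  maxAttempts: number;
+  initialDelayMs: number;
+  maxDelayMs: number;
+  multiplier: number;
+}
+
+// Delay before each retry: initialDelayMs * multiplier^n, capped at maxDelayMs.
+function retryDelays(config: RetryConfig): number[] {
+  const delays: number[] = [];
+  for (let retry = 0; retry < config.maxAttempts - 1; retry += 1) {
+    const delay = config.initialDelayMs * Math.pow(config.multiplier, retry);
+    delays.push(Math.min(delay, config.maxDelayMs));
+  }
+  return delays;
+}
+
+// With the values from the config above: [1000, 2000]
+const delays = retryDelays({
+  maxAttempts: 3,
+  initialDelayMs: 1000,
+  maxDelayMs: 30000,
+  multiplier: 2,
+});
+```
+
+Under these assumptions, raising `maxAttempts` adds longer waits until the schedule flattens out at `maxDelayMs`.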
+
+### Model Fallback Not Working
+
+Fallback chain not triggering on errors.
+
+**Solution:**
+
+1. Check model router config:
+   ```yaml
+   models:
+     router:
+       tiers:
+         default: 'anthropic:claude-sonnet-4-20250514'
+         fallbackChain:
+           - 'github:claude-sonnet-4-5'
+           - 'local:ollama:llama3'
+   ```
+
+2. Check error patterns in retry config (errors matching these patterns won't retry):
+   ```yaml
+   retry:
+     nonRetryablePatterns:
+       - 'invalid_api_key'
+       - 'permission_denied'
+   ```
+
+3. Enable debug logging to see fallback decisions:
+   ```bash
+   DEBUG='*' flynn start
+   ```
+
+### Context Window Exceeded
+
+Conversation too long for model's context window.
+
+**Solution:**
+
+1. Check compaction settings:
+   ```yaml
+   agents:
+     default:
+       compaction:
+         thresholdPct: 80
+         keepTurns: 4
+         summaryMaxTokens: 1024
+   ```
+
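+2. Reduce `keepTurns` if compaction does not free enough space; raise it only if too much recent history is being summarized away
+3. Lower `thresholdPct` to trigger compaction earlier (assuming it is the percentage of the context window at which compaction starts)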
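+
+The interaction between the two main settings can be sketched as follows. This assumes `thresholdPct` is the percentage of the context window at which compaction starts and `keepTurns` is the number of most recent turns left untouched; both helper functions are hypothetical and only illustrate the arithmetic:
+
+```typescript
+interface CompactionConfig {
+  thresholdPct: number; // percentage of the context window that triggers compaction
+  keepTurns: number; // most recent turns kept verbatim
+}
+
+// Assumed trigger: compact once used tokens reach thresholdPct of the window.
+function shouldCompact(
+  usedTokens: number,
+  contextWindow: number,
+  config: CompactionConfig,
+): boolean {
+  return usedTokens >= (contextWindow * config.thresholdPct) / 100;
+}
+
+// Splits history into the turns to summarize and the turns to keep as-is.
+function splitForCompaction<T>(
+  turns: T[],
+  config: CompactionConfig,
+): { summarize: T[]; keep: T[] } {
+  const cut = Math.max(turns.length - config.keepTurns, 0);
+  return { summarize: turns.slice(0, cut), keep: turns.slice(cut) };
+}
+```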
+
+## Tool Issues
+
+### Tool Execution Timeout
+
+Tool took too long to complete.
+
+**Solution:**
+
+1. Check timeout config:
+   ```yaml
+   tools:
+     executor:
+       defaultTimeoutMs: 30000
+   ```
+
+2. Increase timeout for specific tools (if supported):
+   ```bash
+   shell.exec --timeout 60000 "long-running-command"
+   ```
+
+3. Check if tool is hanging (stuck process, network issue)
+
+### Tool Permission Denied
+
+Tool can't access file/execute command.
+
+**Solution:**
+
+1. Check file permissions:
+   ```bash
+   ls -la /path/to/file
+   chmod +x /path/to/executable
+   ```
+
+2. Check Docker sandbox permissions (if enabled):
+   ```bash
+   # Ensure Docker daemon is running
+   docker ps
+
+   # Check Flynn's Docker access
+   groups $USER | grep docker
+   ```
+
+3. Verify hook configuration (hooks may block tools):
+   ```bash
+   flynn config | grep -A 20 hooks
+   ```
+
+### Tool Output Truncated
+
+Output exceeds maximum size limit.
+
+**Solution:**
+
+Check max output config:
+```yaml
+tools:
+  executor:
+    maxOutputBytes: 51200  # 50KB default
+```
+
+Increase limit if needed, or use file operations to handle large outputs.
+
+### Tool Not Found
+
+Tool doesn't exist or isn't registered.
+
+**Solution:**
+
+```bash
+# List available tools
+flynn doctor --config ~/.config/flynn/config.yaml
+
+# Check if tool is in builtin list
+grep -r "name: 'your.tool'" src/tools/builtin/
+```
+
+If you added a custom tool, ensure it's registered in `src/daemon/index.ts`.
+
+## Channel Issues
+
+### Telegram Bot Not Responding
+
+Telegram bot not receiving or processing messages.
+
+**Solution:**
+
+1. Check bot token in config:
+   ```yaml
+   channels:
+     telegram:
+       enabled: true
+       token: '123456789:ABCdefGHIjklMNOpqrSTUvwxYZ'
+   ```
+
+2. Verify bot is running:
+   ```bash
+   # Check Flynn logs for telegram startup
+   flynn start 2>&1 | grep telegram
+   ```
+
+3. Test bot via Telegram API:
+   ```bash
+   curl https://api.telegram.org/bot<YOUR_TOKEN>/getMe
+   ```
+
+4. Check the `allowedChatIds` whitelist:
+   ```yaml
+   channels:
+     telegram:
+       allowedChatIds: ['123456789']  # Your chat ID
+   ```
+
+### Discord Bot Not Joining Channels
+
+Discord bot permissions issues.
+
+**Solution:**
+
+1. Check bot token:
+   ```yaml
+   channels:
+     discord:
+       enabled: true
+       token: 'MTIzNDU2Nzg5O...ABcDefGhIjKlMnOpQrStUvWxYz'
+   ```
+
+2. Verify guild/channel IDs in whitelist:
+   ```yaml
+   channels:
+     discord:
+       allowedGuildIds: ['123456789012345678']
+       allowedChannelIds: ['123456789012345678']
+   ```
+
+3. Check bot permissions in Discord server (need `Read Messages`, `Send Messages`, `Embed Links`)
+
+### Slack Webhooks Not Receiving
+
+Slack event subscription issues.
+
+**Solution:**
+
+1. Verify signing secret in config:
+   ```yaml
+   channels:
+     slack:
+       enabled: true
+       signingSecret: 'a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6'
+   ```
+
+2. Check Slack app permissions and event subscriptions
+3. Verify ngrok/tunnel if testing locally
+
+### WhatsApp Bot Not Connecting
+
+WhatsApp Web.js connection issues.
+
+**Solution:**
+
+1. Check phone number whitelist:
+   ```yaml
+   channels:
+     whatsapp:
+       enabled: true
+       allowedNumbers: ['+1234567890']
+   ```
+
+2. WhatsApp requires QR code scan on first run - run in foreground:
+   ```bash
+   flynn start  # Watch console for QR code
+   ```
+
+3. If running in Docker, ensure it can display QR code or use saved session
+
+## Gateway Issues
+
+### WebSocket Connection Refused
+
+Can't connect to gateway WebSocket.
+
+**Solution:**
+
+1. Check gateway is running:
+   ```bash
+   curl http://localhost:18800/health
+   ```
+
+2. Check port configuration:
+   ```yaml
+   gateway:
+     enabled: true
+     port: 18800
+   ```
+
+3. Check firewall rules:
+   ```bash
+   sudo ufw allow 18800
+   ```
+
+4. Check auth token if configured:
+   ```yaml
+   gateway:
+     auth:
+       token: 'your-secret-token'
+   ```
+
+### Gateway Lock Active
+
+Only one WebSocket client allowed at a time.
+
+**Solution:**
+
+```yaml
+gateway:
+  lock:
+    enabled: false  # Disable lock
+```
+
+Or disconnect existing client first.
+
+### Tailscale Serve Not Exposing
+
+Tailscale integration not working.
+
+**Solution:**
+
+1. Ensure Tailscale is installed and running:
+   ```bash
+   tailscale status
+   ```
+
+2. Check Tailscale config:
+   ```yaml
+   gateway:
+     tailscaleServe:
+       enabled: true
+       hostname: 'my-flynn'
+       port: 443
+   ```
+
+3. Check Flynn logs for Tailscale errors:
+   ```bash
+   flynn start 2>&1 | grep -i tailscale
+   ```
+
+## Configuration Issues
+
+### Invalid YAML Syntax
+
+Configuration file has YAML syntax errors.
+
+**Solution:**
+
+```bash
+# Validate with YAML linter
+pip install yamllint
+yamllint ~/.config/flynn/config.yaml
+
+# Or use online YAML validator
+# https://www.yamllint.com/
+```
+
+Common mistakes:
+- Using tabs instead of spaces (YAML requires spaces)
+- Incorrect indentation (must be consistent)
+- Missing quotes around special characters (`:`, `{`, `}`, `[`, `]`)
+- Unclosed brackets or quotes
+
+### Environment Variable Expansion Not Working
+
+`${ENV_VAR}` in config not expanding.
+
+**Solution:**
+
+1. Ensure variable is set:
+   ```bash
+   echo $MY_API_KEY
+   ```
+
+2. Check variable syntax in config:
+   ```yaml
+   models:
+     default:
+       anthropic:
+         apiKey: '${ANTHROPIC_API_KEY}'  # Correct
+         # Not: $ANTHROPIC_API_KEY or "${ANTHROPIC_API_KEY}"
+   ```
+
+3. Use the braced `${NAME}` form; a bare `$NAME` is not expanded:
+   ```yaml
+   # Wrong
+   apiKey: $ANTHROPIC_API_KEY
+
+   # Correct
+   apiKey: '${ANTHROPIC_API_KEY}'
+   ```
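+
+For intuition, `${NAME}` expansion can be pictured as a string substitution over each config value. The function below is a hypothetical sketch, not Flynn's config loader; it assumes only the braced form is recognized and that an unset variable raises the error shown under Common Errors:
+
+```typescript
+// Replaces ${NAME} placeholders with values from env.
+// A bare $NAME is left untouched; an unset variable throws.
+function expandEnv(value: string, env: Record<string, string | undefined>): string {
+  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
+    const resolved = env[name];
+    if (resolved === undefined) {
+      throw new Error(`Environment variable ${name} is not set`);
+    }
+    return resolved;
+  });
+}
+```
+
+Passing the environment in as a parameter (for example `process.env`) keeps the function easy to test without touching real variables.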
+
+### Secrets Showing in Logs
+
+API keys or sensitive data visible in logs.
+
+**Solution:**
+
+Flynn automatically redacts secrets when showing config with `flynn config`.
+
+If you see secrets in output, check:
+- Don't log config manually (use the built-in redaction)
+- Don't commit config files to git (add to `.gitignore`)
+- Use `FLYNN_CONFIG` env var for sensitive paths
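+
+If you need to print config from your own scripts, mask secrets first. The sketch below shows one way to do that; it is not the redaction used by `flynn config`, and the list of secret-looking key names is an assumption:
+
+```typescript
+// Assumed list of key-name fragments that mark a value as secret.
+const SECRET_KEY_PATTERN = /(apikey|token|secret|password)/i;
+
+// Returns a deep copy of config with secret-looking string values masked.
+function redact(config: unknown, key = ''): unknown {
+  if (Array.isArray(config)) {
+    return config.map((item) => redact(item, key));
+  }
+  if (typeof config === 'object' && config !== null) {
+    const source = config as Record<string, unknown>;
+    const result: Record<string, unknown> = {};
+    for (const childKey of Object.keys(source)) {
+      result[childKey] = redact(source[childKey], childKey);
+    }
+    return result;
+  }
+  if (typeof config === 'string' && SECRET_KEY_PATTERN.test(key)) {
+    return '***';
+  }
+  return config;
+}
+```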
+
+## Memory & Database Issues
+
+### SQLite Database Locked
+
+Can't write to session or vector database.
+
+**Solution:**
+
+1. Check if another Flynn instance is running:
+   ```bash
+   ps aux | grep flynn
+   ```
+
+2. Kill existing instances:
+   ```bash
+   pkill flynn
+   ```
+
+3. Check database permissions:
+   ```bash
+   ls -la ~/.local/share/flynn/*.db
+   chmod 644 ~/.local/share/flynn/*.db
+   ```
+
+### Memory Not Saving
+
+Memory writes not persisting.
+
+**Solution:**
+
+1. Check memory directory:
+   ```bash
+   ls -la ~/.local/share/flynn/memory/
+   ```
+
+2. Check namespace format (must be valid filename):
+   ```bash
+   # Good
+   memory.write --namespace 'my-knowledge'
+
+   # Bad
+   memory.write --namespace 'my/namespace'  # Creates directory
+   ```
+
+3. Check disk space:
+   ```bash
+   df -h ~/.local/share/flynn/
+   ```
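+
+The namespace rule in step 2 can be checked before writing. A minimal sketch, assuming a valid namespace contains only letters, digits, dots, underscores, and hyphens; the exact rule Flynn applies is not documented here, and `isValidNamespace` is a hypothetical helper:
+
+```typescript
+// Assumed rule: a single path segment made of safe filename characters,
+// excluding the special "." and ".." segments.
+function isValidNamespace(namespace: string): boolean {
+  return /^[A-Za-z0-9._-]+$/.test(namespace) && namespace !== '.' && namespace !== '..';
+}
+```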
+
+### Vector Search Returns No Results
+
+Hybrid search not finding matches.
+
+**Solution:**
+
+1. Ensure embeddings are generated:
+   ```bash
+   # Check vector database
+   sqlite3 ~/.local/share/flynn/vectors.db "SELECT COUNT(*) FROM embeddings;"
+   ```
+
+2. Check embedding provider config:
+   ```yaml
+   memory:
+     embeddings:
+       provider: 'openai'  # or 'gemini', 'ollama', 'llamacpp'
+       openai:
+         apiKey: '${OPENAI_API_KEY}'
+         model: 'text-embedding-3-small'
+   ```
+
+3. Try keyword search only:
+   ```bash
+   memory.search --query 'my query' --keyword-only
+   ```
+
+## Debug Mode
+
+### Enable Debug Logging
+
+Flynn uses console logging. Enable verbose output:
+
+```bash
+# Set DEBUG environment variable (if using debug packages)
+DEBUG='*' flynn start
+
+# Or redirect all output to file
+flynn start > /tmp/flynn.log 2>&1
+
+# Monitor logs in real-time
+tail -f /tmp/flynn.log
+```
+
+### Run Doctor Command
+
+Use the built-in diagnostic tool:
+
+```bash
+# Full check
+flynn doctor
+
+# Check a specific config file
+flynn doctor --config ~/.config/flynn/config.yaml
+
+# Check connectivity
+flynn doctor --check connectivity
+```
+
+The doctor checks:
+- Config syntax and validation
+- Model provider connectivity
+- Channel adapter status
+- Memory and database integrity
+- Gateway health
+
+### Test Components Individually
+
+```bash
+# Test model connectivity
+flynn send "Hello, world!"
+
+# Test specific tool (in TUI)
+/file.read /path/to/file
+
+# Test channel
+# Send message via platform (Telegram, Discord, etc.)
+
+# Test gateway
+curl http://localhost:18800/health
+```
+
+### Inspect Session Data
+
+```bash
+# View sessions
+flynn sessions
+
+# Inspect session database directly
+sqlite3 ~/.local/share/flynn/sessions.db "SELECT * FROM sessions LIMIT 5;"
+
+# View session messages
+sqlite3 ~/.local/share/flynn/sessions.db "SELECT * FROM messages WHERE session_id = '...';"
+```
+
+### Enable Heartbeat Monitoring
+
+Flynn includes a heartbeat monitor that checks system health:
+
+```yaml
+automation:
+  heartbeat:
+    enabled: true
+    interval: '5m'
+    checks:
+      - 'gateway'
+      - 'model'
+      - 'channels'
+      - 'memory'
+      - 'disk'
+```
+
+Failures will be logged and notifications sent (if configured).
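+
+The `interval` value is a duration string. A sketch of how such a string could be converted to milliseconds, assuming the units `s`, `m`, and `h` are accepted; only `'5m'` appears in the config above, so the other units and the `parseInterval` helper are assumptions:
+
+```typescript
+const UNIT_MS: Record<string, number> = { s: 1000, m: 60000, h: 3600000 };
+
+// Parses strings such as '30s', '5m', or '1h' into milliseconds.
+function parseInterval(interval: string): number {
+  const match = /^(\d+)([smh])$/.exec(interval);
+  if (match === null) {
+    throw new Error(`Invalid interval: ${interval}`);
+  }
+  return Number(match[1]) * UNIT_MS[match[2]];
+}
+```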
+
+## Getting Help
+
+### Log Your Issue
+
+When reporting issues, include:
+
+1. **Error message** (full stack trace if available)
+2. **Flynn version**: `flynn --version`
+3. **Configuration** (redact secrets): `flynn config`
+4. **Node version**: `node --version`
+5. **OS**: `uname -a`
+6. **Steps to reproduce**
+
+### Check Existing Issues
+
+Search GitHub issues for similar problems:
+- https://github.com/your-repo/flynn/issues
+
+### Ask for Help
+
+- **GitHub Issues**: Bug reports and feature requests
+- **GitHub Discussions**: Questions and community help
+- **Documentation**: Check README.md, AGENTS.md, and this file
+
+### Provide Diagnostic Output
+
+```bash
+# Run doctor and save output
+flynn doctor > /tmp/flynn-doctor.txt 2>&1
+
+# Get config (redacted)
+flynn config > /tmp/flynn-config.txt
+
+# Get logs
+journalctl -u flynn -n 100 --no-pager > /tmp/flynn-logs.txt
+
+# Attach these files to your issue
+```
+
+---
+
+Still having issues? Open a GitHub issue with the diagnostic output above.
@@ -0,0 +1,913 @@
+# Production Deployment Guide
+
+This guide covers deploying Flynn in a production environment.
+
+## Table of Contents
+
+- [Prerequisites](#prerequisites)
+- [Docker Deployment](#docker-deployment)
+- [Systemd Service](#systemd-service)
+- [Security](#security)
+- [Configuration](#configuration)
+- [Monitoring](#monitoring)
+- [Backup & Recovery](#backup--recovery)
+- [Performance Tuning](#performance-tuning)
+- [Scaling Considerations](#scaling-considerations)
+
+## Prerequisites
+
+### System Requirements
+
+- **OS**: Linux (Ubuntu 22.04+ recommended) or macOS
+- **Node.js**: >= 22.0.0
+- **Memory**: Minimum 2GB, 4GB+ recommended
+- **Disk**: 10GB+ for sessions, memory, and vectors
+- **Docker**: Optional; required only for sandbox features
+
+### Network Requirements
+
+- Public IP or VPN (Tailscale recommended) for remote access
+- Open ports: 18800 (gateway), optional 443 (Tailscale Serve)
+- Outbound HTTPS access for model providers and web tools
+
+### External Services (Optional)
+
+- **Model Providers**: Anthropic, OpenAI, GitHub Models, etc. (API keys required)
+- **Email**: SMTP server for email notifications
+- **Object Storage**: MinIO or S3 for backups (optional)
+
+## Docker Deployment
+
+### Quick Start
+
+Using the provided `docker-compose.yml`:
+
+```bash
+# Clone repository
+git clone <repo-url>
+cd flynn
+
+# Create config
+cp config/default.yaml config/production.yaml
+# Edit config/production.yaml with your settings
+
+# Start services
+docker-compose up -d
+
+# View logs
+docker-compose logs -f
+```
+
+### Dockerfile
+
+The multi-stage Dockerfile:
+
+```dockerfile
+# Stage 1: Build
+FROM node:22-alpine AS builder
+WORKDIR /app
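+COPY package*.json ./
+# Install all dependencies; the build step needs devDependencies such as TypeScript
+RUN npm ci
+COPY . .
+RUN npm run build
+# Drop devDependencies before node_modules is copied into the runtime image
+RUN npm prune --omit=dev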
+
+# Stage 2: Runtime
+FROM node:22-alpine
+WORKDIR /app
+COPY --from=builder /app/dist ./dist
+COPY --from=builder /app/node_modules ./node_modules
+COPY config ./config
+COPY src/gateway/ui ./dist/gateway/ui
+
+# Create data directory
+RUN mkdir -p /data
+
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+  CMD node -e "require('http').get('http://localhost:18800/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"
+
+# Expose gateway port
+EXPOSE 18800
+
+# Run
+CMD ["node", "dist/cli/index.js", "start"]
+```
+
+### Docker Compose Configuration
+
+```yaml
+version: '3.8'
+
+services:
+  flynn:
+    build: .
+    container_name: flynn
+    restart: unless-stopped
+    ports:
+      - "18800:18800"
+    volumes:
+      - ./config/production.yaml:/flynn/config.yaml:ro
+      - flynn_data:/data
+      - /var/run/docker.sock:/var/run/docker.sock  # For sandbox
+    environment:
+      - NODE_ENV=production
+      - FLYNN_CONFIG=/flynn/config.yaml
+    healthcheck:
+      test: ["CMD", "wget", "--spider", "-q", "http://localhost:18800/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 5s
+
+  whisper:
+    image: openai/whisper-server:latest
+    container_name: whisper-server
+    restart: unless-stopped
+    ports:
+      - "8080:8080"
+    volumes:
+      - whisper_cache:/cache
+    environment:
+      - WHISPER_MODEL=base
+      - WHISPER_HTTP_PORT=8080
+
+volumes:
+  flynn_data:
+  whisper_cache:
+```
+
+### Environment Variables
+
+```bash
+# Node environment
+export NODE_ENV=production
+
+# Config path
+export FLYNN_CONFIG=/path/to/config.yaml
+
+# Data directory (default: ~/.local/share/flynn)
+export FLYNN_DATA_DIR=/var/lib/flynn
+
+# Optional: Override model provider credentials
+export ANTHROPIC_API_KEY=sk-...
+export OPENAI_API_KEY=sk-...
+```
+
+## Systemd Service
+
+### Service File
+
+Create `/etc/systemd/system/flynn.service`:
+
+```ini
+[Unit]
+Description=Flynn AI Assistant Daemon
+After=network-online.target
+Wants=network-online.target
+
+[Service]
+Type=simple
+User=flynn
+Group=flynn
+WorkingDirectory=/opt/flynn
+Environment="NODE_ENV=production"
+Environment="FLYNN_CONFIG=/etc/flynn/config.yaml"
+Environment="FLYNN_DATA_DIR=/var/lib/flynn"
+ExecStart=/usr/local/bin/node /opt/flynn/dist/cli/index.js start
+Restart=always
+RestartSec=10
+StandardOutput=journal
+StandardError=journal
+SyslogIdentifier=flynn
+
+# Security hardening
+NoNewPrivileges=true
+PrivateTmp=true
+ProtectSystem=strict
+ProtectHome=true
+ReadWritePaths=/var/lib/flynn /var/log/flynn /var/run
+
+# Resource limits
+MemoryMax=2G
+MemorySwapMax=0
+CPUQuota=200%
+
+[Install]
+WantedBy=multi-user.target
+```
+
+### Create Flynn User
+
+```bash
+# Create system user and matching group
+sudo useradd --system --user-group --home-dir /var/lib/flynn --shell /usr/sbin/nologin flynn
+
+# Create directories
+sudo mkdir -p /opt/flynn /etc/flynn /var/lib/flynn /var/log/flynn
+sudo chown -R flynn:flynn /var/lib/flynn /var/log/flynn
+
+# Copy build output and config (ExecStart expects /opt/flynn/dist/cli/index.js)
+sudo cp -r dist /opt/flynn/
+sudo cp config/production.yaml /etc/flynn/config.yaml
+sudo chown -R root:root /opt/flynn
+sudo chown root:flynn /etc/flynn/config.yaml
+sudo chmod 640 /etc/flynn/config.yaml
+```
+
+### Enable and Start Service
+
+```bash
+# Reload systemd
+sudo systemctl daemon-reload
+
+# Enable service (start on boot)
+sudo systemctl enable flynn
+
+# Start service
+sudo systemctl start flynn
+
+# Check status
+sudo systemctl status flynn
+
+# View logs
+sudo journalctl -u flynn -f
+
+# Restart service
+sudo systemctl restart flynn
+```
+
+### Service Management
+
+```bash
+# Stop service
+sudo systemctl stop flynn
+
+# Reload config (requires restart)
+sudo systemctl restart flynn
+
+# Check if running
+sudo systemctl is-active flynn
+
+# View recent logs
+sudo journalctl -u flynn -n 100 --no-pager
+```
+
+## Security
+
+### Secrets Management
+
+Never commit secrets to version control. Use one of these approaches:
+
+#### Environment Variables
+
+```yaml
+# config/production.yaml
+models:
+  default:
+    anthropic:
+      apiKey: '${ANTHROPIC_API_KEY}'
+```
+
+Set in `/etc/flynn/.env` or systemd service file:
+```ini
+Environment="ANTHROPIC_API_KEY=sk-..."
+```
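+
+The `${ANTHROPIC_API_KEY}` placeholder above implies that environment variables are substituted into the config at load time. A minimal sketch of that substitution follows, assuming placeholders use uppercase names and that a missing variable should fail at startup. `interpolateEnv` is a hypothetical helper for illustration, not Flynn's actual loader:
+
+```typescript
+// Hypothetical helper: Flynn's real config loader may behave differently.
+type Env = Record<string, string | undefined>;
+
+function interpolateEnv(text: string, env: Env): string {
+  return text.replace(/\$\{([A-Z0-9_]+)\}/g, (_match: string, name: string) => {
+    const value = env[name];
+    if (value === undefined || value === '') {
+      // Fail fast so a missing secret surfaces at startup, not at the first API call
+      throw new Error(`Missing environment variable: ${name}`);
+    }
+    return value;
+  });
+}
+
+const rawLine = "apiKey: '${ANTHROPIC_API_KEY}'";
+const resolvedLine = interpolateEnv(rawLine, { ANTHROPIC_API_KEY: 'sk-test' });
+```
+
+In a real process the second argument would be `process.env`; passing it explicitly keeps the function easy to test.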
+
+#### HashiCorp Vault (Advanced)
+
+Use a secrets manager and inject at runtime:
+
+```bash
+# Read the secret straight into the environment; avoid writing it to disk
+export ANTHROPIC_API_KEY="$(vault kv get -field=api_key secret/anthropic)"
+```
+
+### Authentication
+
+#### Gateway Auth
+
+```yaml
+# config/production.yaml
+gateway:
+  enabled: true
+  auth:
+    token: 'your-random-token-here'  # Generate with: openssl rand -hex 32
+    trustTailscaleIdentity: true
+    applyToHttp: true
+```
+
+Generate a secure token:
+```bash
+openssl rand -hex 32
+```
+
+#### Channel Whitelists
+
+Restrict who can interact with Flynn:
+
+```yaml
+channels:
+  telegram:
+    allowedChatIds: ['123456789']  # Your Telegram chat ID
+  discord:
+    allowedGuildIds: ['987654321098765432']
+    allowedChannelIds: ['123456789012345678']
+  slack:
+    allowedChannelIds: ['C12345678']
+    signingSecret: '${SLACK_SIGNING_SECRET}'
+```
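+
+How these allow-lists are enforced is not shown in this guide. As a rough illustration, the check could look like the sketch below. It assumes that a missing or empty list denies everyone and that IDs are compared as strings (Telegram delivers numeric chat IDs, while the YAML stores strings). `AllowList` and `isChatAllowed` are hypothetical names:
+
+```typescript
+// Hypothetical shape mirroring the allowedChatIds key in the YAML above
+interface AllowList {
+  allowedChatIds?: string[];
+}
+
+function isChatAllowed(cfg: AllowList, chatId: string | number): boolean {
+  const allowed = cfg.allowedChatIds ?? [];
+  // Assumption: deny by default when the list is missing or empty
+  return allowed.indexOf(String(chatId)) !== -1;
+}
+
+const telegramCfg: AllowList = { allowedChatIds: ['123456789'] };
+const ownerAllowed = isChatAllowed(telegramCfg, 123456789);
+const strangerAllowed = isChatAllowed(telegramCfg, 42);
+```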
+
+### Network Security
+
+#### Firewall
+
+```bash
+# Ubuntu/Debian (ufw)
+sudo ufw allow 22/tcp    # SSH
+sudo ufw allow 18800/tcp  # Flynn gateway
+sudo ufw enable
+
+# CentOS/RHEL (firewalld)
+sudo firewall-cmd --permanent --add-port=18800/tcp
+sudo firewall-cmd --reload
+```
+
+#### Reverse Proxy (Nginx)
+
+Place Flynn behind Nginx for TLS:
+
+```nginx
+server {
+    listen 443 ssl http2;
+    server_name flynn.example.com;
+
+    ssl_certificate /etc/letsencrypt/live/flynn.example.com/fullchain.pem;
+    ssl_certificate_key /etc/letsencrypt/live/flynn.example.com/privkey.pem;
+
+    # WebSocket upgrade
+    location / {
+        proxy_pass http://localhost:18800;
+        proxy_http_version 1.1;
+        proxy_set_header Upgrade $http_upgrade;
+        proxy_set_header Connection "upgrade";
+        proxy_set_header Host $host;
+        proxy_set_header X-Real-IP $remote_addr;
+        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+        proxy_set_header X-Forwarded-Proto $scheme;
+
+        # Timeouts (long send/read timeouts keep idle WebSocket connections open)
+        proxy_connect_timeout 60s;
+        proxy_send_timeout 3600s;
+        proxy_read_timeout 3600s;
+    }
+
+    # Health check endpoint (no auth required)
+    location /health {
+        proxy_pass http://localhost:18800/health;
+        access_log off;
+    }
+}
+```
+
+Obtain TLS certificate with Let's Encrypt:
+```bash
+sudo certbot --nginx -d flynn.example.com
+```
+
+### File Permissions
+
+```bash
+# Data directory
+sudo chmod 750 /var/lib/flynn
+sudo chown flynn:flynn /var/lib/flynn
+
+# Config file
+sudo chmod 640 /etc/flynn/config.yaml
+sudo chown root:flynn /etc/flynn/config.yaml
+
+# Logs
+sudo chmod 750 /var/log/flynn
+sudo chown flynn:flynn /var/log/flynn
+```
+
+### Sandbox Security
+
+Docker sandbox adds isolation but requires careful configuration:
+
+```yaml
+# config/production.yaml
+sandbox:
+  enabled: true
+  image: 'node:22-alpine'
+  dockerSocket: '/var/run/docker.sock'
+  resourceLimits:
+    memory: '512m'
+    cpus: '0.5'
+    timeoutSec: 60
+  networkMode: 'none'  # No network access
+```
+
+Ensure Docker is secured:
+```bash
+# Let the flynn user access the Docker socket
+# (membership in the docker group is effectively root access on the host)
+sudo usermod -aG docker flynn
+
+# Configure Docker daemon security
+sudo vim /etc/docker/daemon.json
+```
+
+```json
+{
+  "log-driver": "json-file",
+  "log-opts": {
+    "max-size": "10m",
+    "max-file": "3"
+  },
+  "live-restore": true,
+  "userland-proxy": false
+}
+```
+
+## Configuration
+
+### Production Config Template
+
+```yaml
+# config/production.yaml
+# Base config for production deployment
+
+# ── Gateway ───────────────────────────────────────────────────────────────
+gateway:
+  enabled: true
+  port: 18800
+  auth:
+    token: '${GATEWAY_TOKEN}'
+    trustTailscaleIdentity: true
+    applyToHttp: true
+  lock:
+    enabled: true
+  tailscaleServe:
+    enabled: false  # Set to true to expose via Tailscale
+    hostname: 'flynn'
+    port: 443
+
+# ── Models ─────────────────────────────────────────────────────────────────
+models:
+  default:
+    anthropic:
+      apiKey: '${ANTHROPIC_API_KEY}'
+      model: 'claude-sonnet-4-20250514'
+      maxTokens: 4096
+
+  router:
+    tiers:
+      default: 'anthropic:claude-sonnet-4-20250514'
+      fast: 'anthropic:claude-haiku-4-20250514'
+      complex: 'anthropic:claude-opus-4-20250514'
+      local: 'ollama:llama3'
+
+    fallbackChain:
+      - 'github:claude-sonnet-4-5'
+      - 'local:ollama:llama3'
+
+    retry:
+      maxAttempts: 3
+      initialDelayMs: 1000
+      multiplier: 2
+      maxDelayMs: 30000
+
+# ── Channels ───────────────────────────────────────────────────────────────
+channels:
+  telegram:
+    enabled: true
+    token: '${TELEGRAM_BOT_TOKEN}'
+    allowedChatIds: ['123456789']
+
+  discord:
+    enabled: false
+
+  slack:
+    enabled: false
+
+  whatsapp:
+    enabled: false
+
+# ── Sessions ───────────────────────────────────────────────────────────────
+sessions:
+  ttl: '7d'
+  maxSessions: 100
+
+# ── Memory ────────────────────────────────────────────────────────────────
+memory:
+  enabled: true
+  embeddings:
+    provider: 'openai'
+    openai:
+      apiKey: '${OPENAI_API_KEY}'
+      model: 'text-embedding-3-small'
+
+# ── Tools ─────────────────────────────────────────────────────────────────
+tools:
+  policy: 'coding'  # Restrict tool access
+
+  executor:
+    defaultTimeoutMs: 30000
+    maxOutputBytes: 51200
+
+  sandbox:
+    enabled: false  # Enable if using Docker
+
+# ── Agents ────────────────────────────────────────────────────────────────
+agents:
+  default:
+    modelTier: 'default'
+    toolPolicy: 'coding'
+    compaction:
+      thresholdPct: 80
+      keepTurns: 4
+      summaryMaxTokens: 1024
+
+# ── Automation ────────────────────────────────────────────────────────────
+automation:
+  cron:
+    enabled: false
+
+  webhooks:
+    enabled: false
+
+  heartbeat:
+    enabled: true
+    interval: '5m'
+    checks:
+      - 'gateway'
+      - 'model'
+      - 'channels'
+      - 'memory'
+      - 'disk'
+    notifications:
+      - type: 'telegram'
+        chatId: '123456789'
+
+# ── Logging ───────────────────────────────────────────────────────────────
+logging:
+  level: 'info'  # debug, info, warn, error
+```
+
+### Config Validation
+
+Validate config before starting:
+
+```bash
+flynn doctor --config /etc/flynn/config.yaml
+```
+
+## Monitoring
+
+### Health Checks
+
+Flynn provides a health check endpoint:
+
+```bash
+# HTTP health check
+curl http://localhost:18800/health
+
+# Response
+{
+  "status": "ok",
+  "version": "0.1.0",
+  "uptime": 12345
+}
+```
+
+### Logs
+
+#### Journalctl (systemd)
+
+```bash
+# Follow logs
+sudo journalctl -u flynn -f
+
+# View last 100 lines
+sudo journalctl -u flynn -n 100 --no-pager
+
+# View logs since yesterday
+sudo journalctl -u flynn --since yesterday
+
+# Search for errors
+sudo journalctl -u flynn | grep -i error
+```
+
+#### Log Rotation
+
+Configure retention for the systemd journal (journald manages its own rotation; logrotate is not involved):
+
+```bash
+sudo vim /etc/systemd/journald.conf
+```
+
+```
+[Journal]
+SystemMaxUse=100M
+MaxRetentionSec=7day
+```
+
+Restart journald:
+```bash
+sudo systemctl restart systemd-journald
+```
+
+### Heartbeat Monitor
+
+Enable built-in heartbeat monitoring:
+
+```yaml
+automation:
+  heartbeat:
+    enabled: true
+    interval: '5m'
+    checks:
+      - 'gateway'
+      - 'model'
+      - 'channels'
+      - 'memory'
+      - 'disk'
+    notifications:
+      - type: 'telegram'
+        chatId: '123456789'
+      - type: 'webhook'
+        url: 'https://hooks.slack.com/services/...'
+```
+
+### External Monitoring
+
+#### Prometheus (Optional)
+
+Use Node.js prom-client for metrics (not currently implemented):
+
+```yaml
+# Future feature
+monitoring:
+  prometheus:
+    enabled: true
+    port: 9090
+```
+
+#### Uptime Monitoring
+
+Use external services:
+- UptimeRobot
+- Pingdom
+- Better Uptime
+
+Monitor:
+- Gateway HTTP health endpoint
+- WebSocket connection
+- Response time
+
+## Backup & Recovery
+
+### What to Backup
+
+1. **Configuration**: `/etc/flynn/config.yaml`
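+2. **Sessions**: SQLite database at `$FLYNN_DATA_DIR/sessions.db` (default: `~/.local/share/flynn/sessions.db`)
+3. **Memory Files**: `$FLYNN_DATA_DIR/memory/` (default: `~/.local/share/flynn/memory/`)
+4. **Vectors**: SQLite database at `$FLYNN_DATA_DIR/vectors.db` (default: `~/.local/share/flynn/vectors.db`)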
+5. **Pairing Codes**: SQLite table within sessions.db
+
+### Backup Script
+
+Create `/usr/local/bin/flynn-backup.sh`:
+
+```bash
+#!/bin/bash
+set -euo pipefail
+
+BACKUP_DIR="/var/backups/flynn"
+DATA_DIR="/var/lib/flynn"
+CONFIG_DIR="/etc/flynn"
+DATE=$(date +%Y%m%d_%H%M%S)
+BACKUP_FILE="$BACKUP_DIR/flynn_$DATE.tar.gz"
+
+# Create backup directory
+mkdir -p "$BACKUP_DIR"
+
+# Always restart Flynn on exit, even if the backup fails
+# (run this script as root, e.g. from root's crontab)
+trap 'systemctl start flynn' EXIT
+
+# Stop Flynn so the SQLite files are in a consistent state
+systemctl stop flynn
+
+# Create backup
+tar -czf "$BACKUP_FILE" \
+  "$CONFIG_DIR/config.yaml" \
+  "$DATA_DIR/sessions.db" \
+  "$DATA_DIR/vectors.db" \
+  "$DATA_DIR/memory/"
+
+# Delete backups older than 90 days
+find "$BACKUP_DIR" -name "flynn_*.tar.gz" -mtime +90 -delete
+
+echo "Backup created: $BACKUP_FILE"
+```
+
+Make executable:
+```bash
+sudo chmod +x /usr/local/bin/flynn-backup.sh
+```
+
+### Cron Job
+
+Add to root crontab:
+
+```bash
+sudo crontab -e
+```
+
+```
+# Daily backup at 2 AM
+0 2 * * * /usr/local/bin/flynn-backup.sh >> /var/log/flynn-backup.log 2>&1
+```
+
+### Restore
+
+```bash
+# Stop Flynn
+sudo systemctl stop flynn
+
+# Extract backup
+sudo tar -xzf /var/backups/flynn/flynn_20250213_020000.tar.gz -C /
+
+# Start Flynn
+sudo systemctl start flynn
+```
+
+### Database Maintenance
+
+Run SQLite vacuum periodically:
+
+```bash
+sqlite3 /var/lib/flynn/sessions.db "VACUUM;"
+sqlite3 /var/lib/flynn/vectors.db "VACUUM;"
+```
+
+Add to crontab (monthly):
+```
+0 0 1 * * sqlite3 /var/lib/flynn/sessions.db "VACUUM;" >> /var/log/flynn-maintenance.log 2>&1
+```
+
+## Performance Tuning
+
+### Node.js Tuning
+
+Set Node.js options for production:
+
+```bash
+# In systemd service
+Environment="NODE_OPTIONS=--max-old-space-size=2048"
+
+# Or via environment variable
+export NODE_OPTIONS="--max-old-space-size=2048"
+```
+
+### Context Management
+
+Optimize compaction settings:
+
+```yaml
+agents:
+  default:
+    compaction:
+      thresholdPct: 75  # Trigger earlier
+      keepTurns: 6      # Keep more context
+      summaryMaxTokens: 2048  # Better summaries
+```
+
+### SQLite Performance
+
+Enable WAL mode:
+
+```bash
+# WAL mode is stored in the database file, so set it once
+sqlite3 /var/lib/flynn/sessions.db "PRAGMA journal_mode=WAL;"
+# synchronous and cache_size are per-connection and not persisted; set them in
+# Flynn's connection setup: PRAGMA synchronous=NORMAL; PRAGMA cache_size=-64000 (64MB)
+```
+
+### Model Routing
+
+Configure tiers for optimal cost/latency:
+
+```yaml
+models:
+  router:
+    tiers:
+      fast: 'anthropic:claude-haiku-4-20250514'      # Quick tasks
+      default: 'anthropic:claude-sonnet-4-20250514'  # General use
+      complex: 'anthropic:claude-opus-4-20250514'     # Complex reasoning
+      local: 'ollama:llama3'                          # Fallback
+```
+
+### Caching (Future)
+
+Consider adding caching for:
+- Repeated tool calls
+- Memory search results
+- Model responses for common queries
+
+## Scaling Considerations
+
+### Single-Operator Scope
+
+Flynn is designed for a single operator with multiple concurrent users. Limitations:
+
+- **Max Concurrent Sessions**: ~100 (depends on model rate limits)
+- **Throughput**: ~10-20 messages/second (varies by model)
+- **Memory Usage**: 2-4GB for moderate usage
+
+### When to Scale Up
+
+Consider scaling if:
+- Consistent CPU usage > 80%
+- Memory usage > 4GB
+- Frequent rate limiting from model providers
+- Slow response times > 30 seconds
+
+### Scaling Strategies
+
+1. **Horizontal Scaling**: Deploy multiple Flynn instances behind a load balancer (not currently supported - sessions are stateful)
+
+2. **Vertical Scaling**: Increase server resources (CPU, memory)
+
+3. **Multi-Instance Architecture** (future):
+   - Shared session storage (PostgreSQL/Redis)
+   - Message queue for request distribution
+   - Session affinity for stateful connections
+
+### Cost Optimization
+
+- Use local models for non-critical tasks
+- Cache embeddings
+- Optimize compaction to reduce token usage
+- Use efficient models for delegated tasks
+
+## Troubleshooting Production Issues
+
+### Service Won't Start
+
+```bash
+# Check status
+sudo systemctl status flynn
+
+# View logs
+sudo journalctl -u flynn -n 50 --no-pager
+
+# Validate config
+flynn doctor --config /etc/flynn/config.yaml
+```
+
+### High Memory Usage
+
+```bash
+# Check memory
+free -h
+
+# Check process memory
+ps aux | grep flynn
+
+# Restart service
+sudo systemctl restart flynn
+```
+
+### Gateway Connection Issues
+
+```bash
+# Check if port is listening
+sudo ss -tlnp | grep 18800
+
+# Check firewall
+sudo ufw status
+
+# Test connectivity
+curl http://localhost:18800/health
+```
+
+### Slow Response Times
+
+```bash
+# Check CPU usage
+top
+
+# Check model provider status
+# Verify API keys are valid
+# Check network latency
+
+# Enable debug logging: set logging.level to 'debug' in /etc/flynn/config.yaml, then
+sudo systemctl restart flynn
+```
+
+---
+
+For additional help, see:
+- [TROUBLESHOOTING.md](../../TROUBLESHOOTING.md)
+- [README.md](../../README.md)
+- GitHub Issues
@@ -0,0 +1,846 @@
+# Performance Tuning Guide
+
+This guide covers performance optimization techniques for Flynn in production environments.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Context Management](#context-management)
+- [Model Routing](#model-routing)
+- [Tool Execution](#tool-execution)
+- [Memory & Embeddings](#memory--embeddings)
+- [Session Management](#session-management)
+- [Database Performance](#database-performance)
+- [Gateway Performance](#gateway-performance)
+- [Resource Usage](#resource-usage)
+- [Monitoring & Profiling](#monitoring--profiling)
+
+## Overview
+
+Flynn's performance depends on several factors:
+
+1. **Context window efficiency**: How efficiently tokens are used
+2. **Model selection**: Choosing the right model for each task
+3. **Tool execution**: Fast, reliable tool responses
+4. **I/O operations**: Database and file system access
+5. **Concurrency**: Handling multiple simultaneous requests
+
+### Performance Goals
+
+- **Response time**: < 5 seconds for simple queries
+- **Context efficiency**: > 80% token utilization
+- **Throughput**: 10-20 concurrent conversations
+- **Resource usage**: < 2GB memory, < 50% CPU
+
+## Context Management
+
+### Compaction Settings
+
+Context compaction prevents conversations from exceeding model context windows.
+
+```yaml
+agents:
+  default:
+    compaction:
+      # Trigger compaction at 75% of context window
+      thresholdPct: 75
+
+      # Keep last 6 turns (user + assistant pairs)
+      keepTurns: 6
+
+      # Allow 2048 tokens for summary
+      summaryMaxTokens: 2048
+
+      # Preserve high-importance messages
+      importanceThreshold: 0.8
+```
+
+### Tuning Guidelines
+
+**For fast interactions:**
+```yaml
+thresholdPct: 60      # Compact early
+keepTurns: 2          # Minimal history
+summaryMaxTokens: 512  # Short summaries
+```
+
+**For complex reasoning:**
+```yaml
+thresholdPct: 85      # Maximize context
+keepTurns: 10         # More history
+summaryMaxTokens: 4096 # Detailed summaries
+```
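+
+To make the interaction between `thresholdPct` and `keepTurns` concrete, here is a minimal sketch of a compaction decision. It assumes one turn equals one user message plus one assistant reply, and that token usage comes from an external estimate. `shouldCompact` and `splitForCompaction` are hypothetical names, not Flynn's internals:
+
+```typescript
+interface CompactionConfig {
+  thresholdPct: number;
+  keepTurns: number;
+}
+
+// True once estimated usage reaches the configured share of the context window
+function shouldCompact(usedTokens: number, contextWindow: number, cfg: CompactionConfig): boolean {
+  return usedTokens >= (contextWindow * cfg.thresholdPct) / 100;
+}
+
+// Splits history into messages to summarize and messages to keep verbatim.
+// Assumption: one turn = one user message + one assistant reply (two messages).
+function splitForCompaction<T>(messages: T[], cfg: CompactionConfig): { summarize: T[]; keep: T[] } {
+  const keepCount = Math.min(messages.length, cfg.keepTurns * 2);
+  const cut = messages.length - keepCount;
+  return { summarize: messages.slice(0, cut), keep: messages.slice(cut) };
+}
+
+const compactionCfg: CompactionConfig = { thresholdPct: 75, keepTurns: 6 };
+const history: number[] = [];
+for (let i = 0; i < 20; i++) history.push(i);
+const compactionSplit = splitForCompaction(history, compactionCfg);
+```
+
+With a 200,000-token window and `thresholdPct: 75`, compaction would start at 150,000 estimated tokens; of 20 messages, the oldest 8 would be summarized and the newest 12 kept.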
+
+### Context Depth Levels
+
+Control how much context is injected into the system prompt:
+
+```yaml
+prompt:
+  contextDepth: 'normal'  # minimal | normal | detailed | debug
+```
+
+- `minimal`: Only basic system prompt
+- `normal`: System prompt + basic memory
+- `detailed`: Full memory + tool descriptions
+- `debug`: Verbose context (development only)
+
+### Token Counting
+
+Flynn uses rule-based token estimation (fast but approximate).
+
+**Exact token counting (slower, not yet implemented):**
+
+```typescript
+// Currently not implemented
+// Future: Use tiktoken or similar for exact token counts
+```
+
+## Model Routing
+
+### Tier Configuration
+
+Optimize model tiers for cost and latency:
+
+```yaml
+models:
+  router:
+    tiers:
+      # Fast, cheap: Quick tasks, delegated calls
+      fast: 'anthropic:claude-haiku-4-20250514'
+
+      # Default: General conversation
+      default: 'anthropic:claude-sonnet-4-20250514'
+
+      # Complex: Deep reasoning, analysis
+      complex: 'anthropic:claude-opus-4-20250514'
+
+      # Fallback: Local models when cloud fails
+      local: 'ollama:llama3'
+```
+
+### Delegation Tasks
+
+Map delegation tasks to appropriate tiers:
+
+```yaml
+agents:
+  default:
+    delegation:
+      tiers:
+        compaction: 'fast'           # Summarize history
+        memoryExtraction: 'fast'     # Extract facts
+        classification: 'default'     # Classify intent
+        toolSummarization: 'default' # Summarize tool results
+        complexReasoning: 'complex'  # Deep analysis
+```
+
+### Fallback Chains
+
+Configure fallback chains for resilience:
+
+```yaml
+models:
+  router:
+    # Per-tier fallbacks, tried in order when the tier's primary model fails
+    tierFallbacks:
+      default:
+        - 'github:claude-sonnet-4-5'
+        - 'openai:gpt-4o-mini'
+
+    # Global fallback when all tiers fail
+    fallbackChain:
+      - 'github:claude-sonnet-4-5'
+      - 'local:ollama:llama3'
+```
+
+### Retry Configuration
+
+Optimize retry behavior for different scenarios:
+
+```yaml
+models:
+  router:
+    retry:
+      # More retries for transient failures
+      maxAttempts: 3
+
+      # Start with 1s delay
+      initialDelayMs: 1000
+
+      # Exponential backoff
+      multiplier: 2
+
+      # Max 30s between retries
+      maxDelayMs: 30000
+
+      # Don't retry errors matching these patterns (auth failures, rate limits)
+      nonRetryablePatterns:
+        - 'invalid_api_key'
+        - 'permission_denied'
+        - 'rate_limit_exceeded'
+```
+
+**For production reliability:**
+```yaml
+maxAttempts: 5
+initialDelayMs: 500
+multiplier: 1.5
+maxDelayMs: 60000
+```
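+
+The retry parameters describe an exponential backoff schedule. The sketch below shows the waits they would produce, assuming the first attempt runs immediately and each later wait is `initialDelayMs * multiplier^n` capped at `maxDelayMs`. This is an interpretation of the settings, and `retryDelays` is a hypothetical helper:
+
+```typescript
+interface RetryConfig {
+  maxAttempts: number;
+  initialDelayMs: number;
+  multiplier: number;
+  maxDelayMs: number;
+}
+
+// Delay before each retry. The first attempt is immediate,
+// so there are maxAttempts - 1 waits in total.
+function retryDelays(cfg: RetryConfig): number[] {
+  const delays: number[] = [];
+  for (let retry = 0; retry < cfg.maxAttempts - 1; retry++) {
+    const raw = cfg.initialDelayMs * Math.pow(cfg.multiplier, retry);
+    delays.push(Math.min(raw, cfg.maxDelayMs));
+  }
+  return delays;
+}
+
+const defaultDelays = retryDelays({
+  maxAttempts: 3,
+  initialDelayMs: 1000,
+  multiplier: 2,
+  maxDelayMs: 30000,
+});
+```
+
+Under these assumptions the default settings wait 1 s and then 2 s, so a request that fails all three attempts adds about 3 s of backoff on top of the model latency itself.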
+
+### Cost Estimation
+
+Monitor token usage and costs:
+
+```typescript
+// Model costs (examples)
+const MODEL_COSTS = {
+  'anthropic:claude-sonnet-4-20250514': {
+    input: 3.0,    // $3 per 1M input tokens
+    output: 15.0    // $15 per 1M output tokens
+  },
+  'anthropic:claude-haiku-4-20250514': {
+    input: 0.25,
+    output: 1.25
+  }
+};
+```
+
+Track usage with `AgentOrchestrator.getUsageStats()`.
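+
+Given a cost table like the one above, a per-request estimate is simple arithmetic. The sketch below reuses the example prices (USD per 1M tokens) and is illustrative only; `estimateCostUsd` is a hypothetical helper, and the shape of the data returned by `getUsageStats()` is not assumed here:
+
+```typescript
+interface ModelCost {
+  input: number;  // USD per 1M input tokens
+  output: number; // USD per 1M output tokens
+}
+
+// Example prices copied from the table above; check your provider's current pricing
+const EXAMPLE_COSTS: Record<string, ModelCost> = {
+  'anthropic:claude-sonnet-4-20250514': { input: 3.0, output: 15.0 },
+  'anthropic:claude-haiku-4-20250514': { input: 0.25, output: 1.25 },
+};
+
+function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
+  const cost = EXAMPLE_COSTS[model];
+  if (!cost) throw new Error(`No cost entry for model: ${model}`);
+  return (inputTokens * cost.input + outputTokens * cost.output) / 1e6;
+}
+
+// 2,000 input tokens and 500 output tokens on the default tier
+const requestCost = estimateCostUsd('anthropic:claude-sonnet-4-20250514', 2000, 500);
+```
+
+For that request the estimate is (2000 × 3 + 500 × 15) / 1,000,000 = $0.0135.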
+
+## Tool Execution
+
+### Timeout Configuration
+
+Set appropriate timeouts for different tool types:
+
+```yaml
+tools:
+  executor:
+    # Default 30s timeout
+    defaultTimeoutMs: 30000
+
+    # Max 50KB output
+    maxOutputBytes: 51200
+```
+
+**For long-running tools:**
+```yaml
+tools:
+  executor:
+    defaultTimeoutMs: 60000  # 60s
+```
+
+**For fast tools:**
+```yaml
+tools:
+  executor:
+    defaultTimeoutMs: 10000  # 10s
+```
+
+### Caching (Future)
+
+Implement caching for repeated operations:
+
+```yaml
+# Not yet implemented
+tools:
+  cache:
+    enabled: true
+    ttl: 300  # 5 minutes
+    maxSize: 1000
+    excludePatterns:
+      - 'shell.exec'
+      - 'process.*'
+```
+
+### Sandbox Performance
+
+Docker sandbox adds overhead. Optimize:
+
+```yaml
+sandbox:
+  enabled: true
+  image: 'node:22-alpine'
+
+  # Resource limits
+  resourceLimits:
+    memory: '512m'
+    cpus: '0.5'
+    timeoutSec: 60
+
+  # Host networking avoids bridge overhead but removes network isolation;
+  # use it only for trusted code (the production guide uses 'none')
+  networkMode: 'host'
+```
+
+**For best performance:**
+```yaml
+sandbox:
+  enabled: false  # Disable if not needed
+```
+
+### Parallel Tool Execution
+
+Flynn currently executes tools sequentially. Running independent tools in parallel is a possible future enhancement:
+
+```typescript
+// Future enhancement
+const results = await Promise.all([
+  toolRegistry.execute('tool1', args1),
+  toolRegistry.execute('tool2', args2),
+  toolRegistry.execute('tool3', args3)
+]);
+```
+
+## Memory & Embeddings
+
+### Embedding Provider Selection
+
+Choose embedding provider based on latency and cost:
+
+```yaml
+memory:
+  embeddings:
+    provider: 'openai'  # openai | gemini | ollama | llamacpp | voyage
+
+    openai:
+      apiKey: '${OPENAI_API_KEY}'
+      model: 'text-embedding-3-small'  # Fastest
+
+    # Alternative: Local embeddings
+    ollama:
+      host: 'localhost:11434'
+      model: 'nomic-embed-text'
+```
+
+**Latency comparison (rough figures; actual values depend on hardware and network):**
+- OpenAI `text-embedding-3-small`: ~100ms
+- Gemini: ~200ms
+- Ollama `nomic-embed-text`: ~500ms (local)
+- llama.cpp: ~300ms (local)
+
+### Text Chunking
+
+Optimize chunking for better search:
+
+```yaml
+memory:
+  embeddings:
+    chunking:
+      # Smaller chunks for precision
+      maxChunkSize: 512
+
+      # Overlap for context preservation
+      chunkOverlap: 50
+
+      # Don't chunk small documents
+      minChunkSize: 128
+```
+
+**For fast indexing:**
+```yaml
+maxChunkSize: 1024
+chunkOverlap: 100
+```
+
+**For precise search:**
+```yaml
+maxChunkSize: 256
+chunkOverlap: 25
+```
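+
+The effect of `maxChunkSize` and `chunkOverlap` can be seen in a small sliding-window chunker. This sketch assumes sizes are measured in characters (Flynn may count tokens instead) and leaves out `minChunkSize`; `chunkText` is a hypothetical function:
+
+```typescript
+// Assumption: sizes are in characters, not tokens
+function chunkText(text: string, maxChunkSize: number, chunkOverlap: number): string[] {
+  if (chunkOverlap >= maxChunkSize) {
+    throw new Error('chunkOverlap must be smaller than maxChunkSize');
+  }
+  if (text.length === 0) return [];
+  if (text.length <= maxChunkSize) return [text];
+
+  const chunks: string[] = [];
+  const step = maxChunkSize - chunkOverlap; // each window starts this far after the previous one
+  for (let start = 0; start < text.length; start += step) {
+    chunks.push(text.slice(start, start + maxChunkSize));
+    if (start + maxChunkSize >= text.length) break; // this window reached the end
+  }
+  return chunks;
+}
+
+const sampleText = new Array(1001).join('a'); // 1000 characters
+const sampleChunks = chunkText(sampleText, 512, 50);
+```
+
+Smaller chunks mean more embeddings to compute and store, which is why the fast-indexing preset above doubles `maxChunkSize`.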
+
+### Hybrid Search Tuning
+
+Balance keyword and vector search:
+
+```yaml
+memory:
+  search:
+    # Weight vector search higher
+    vectorWeight: 0.7
+    keywordWeight: 0.3
+
+    # Return top results
+    limit: 10
+
+    # Minimum relevance threshold
+    threshold: 0.5
+```
+
+**For keyword-heavy queries:**
+```yaml
+vectorWeight: 0.4
+keywordWeight: 0.6
+```
+
+**For semantic queries:**
+```yaml
+vectorWeight: 0.8
+keywordWeight: 0.2
+```
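+
+One common way to apply such weights is a weighted sum of the two scores, followed by the threshold and limit. The sketch below assumes both scores are already normalized to the range 0 to 1; whether Flynn combines scores this way is not confirmed by this guide, and `rankHybrid` is a hypothetical name:
+
+```typescript
+interface ScoredCandidate {
+  id: string;
+  vectorScore: number;  // assumed normalized to 0..1
+  keywordScore: number; // assumed normalized to 0..1
+}
+
+interface SearchConfig {
+  vectorWeight: number;
+  keywordWeight: number;
+  limit: number;
+  threshold: number;
+}
+
+function rankHybrid(candidates: ScoredCandidate[], cfg: SearchConfig): { id: string; score: number }[] {
+  return candidates
+    .map((c) => ({
+      id: c.id,
+      score: cfg.vectorWeight * c.vectorScore + cfg.keywordWeight * c.keywordScore,
+    }))
+    .filter((r) => r.score >= cfg.threshold) // drop weak matches
+    .sort((a, b) => b.score - a.score)       // best first
+    .slice(0, cfg.limit);
+}
+
+const candidates: ScoredCandidate[] = [
+  { id: 'a', vectorScore: 0.9, keywordScore: 0.1 },
+  { id: 'b', vectorScore: 0.2, keywordScore: 0.9 },
+  { id: 'c', vectorScore: 0.5, keywordScore: 1.0 },
+];
+
+const semanticFirst = rankHybrid(candidates, { vectorWeight: 0.7, keywordWeight: 0.3, limit: 10, threshold: 0.5 });
+const keywordFirst = rankHybrid(candidates, { vectorWeight: 0.4, keywordWeight: 0.6, limit: 10, threshold: 0.5 });
+```
+
+With the default weights, `a` ranks ahead of `c` and `b` falls below the threshold; with the keyword-heavy weights, `c` and `b` are returned and `a` is dropped.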
+
+### Embedding Caching
+
+Cache embeddings to avoid recomputation:
+
+```yaml
+memory:
+  embeddings:
+    cache:
+      enabled: true
+      ttl: 86400  # 24 hours
+```
+
+## Session Management
+
+### TTL Configuration
+
+Set appropriate session TTLs:
+
+```yaml
+sessions:
+  ttl: '7d'  # Keep sessions for 7 days
+
+  # Maximum concurrent sessions
+  maxSessions: 100
+```
+
+**For memory efficiency:**
+```yaml
+ttl: '1d'
+maxSessions: 50
+```
+
+**For long-term memory:**
+```yaml
+ttl: '30d'
+maxSessions: 200
+```
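+
+TTL strings such as `'7d'` have to be converted to milliseconds before a session can be checked for expiry. A minimal sketch, assuming the format is an integer followed by `s`, `m`, `h`, or `d`; the full syntax Flynn accepts is not documented here, and both function names are hypothetical:
+
+```typescript
+const UNIT_MS: Record<string, number> = { s: 1000, m: 60000, h: 3600000, d: 86400000 };
+
+// Assumption: durations look like "<integer><unit>" with unit s, m, h, or d
+function parseDurationMs(value: string): number {
+  const match = /^(\d+)([smhd])$/.exec(value.trim());
+  if (!match) throw new Error(`Invalid duration: ${value}`);
+  return Number(match[1]) * UNIT_MS[match[2]];
+}
+
+// A session expires once it has been idle for longer than the TTL
+function isExpired(lastActiveAtMs: number, ttl: string, nowMs: number): boolean {
+  return nowMs - lastActiveAtMs > parseDurationMs(ttl);
+}
+
+const sevenDaysMs = parseDurationMs('7d');
+```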
+
+### Session Pruning
+
+Prune old sessions regularly:
+
+```yaml
+automation:
+  sessionPruner:
+    enabled: true
+    interval: '1h'  # Run every hour
+
+    # Prune sessions older than TTL
+    pruneOlderThan: '7d'
+```
+
+### Session Indexing
+
+Optimize session search with indexes:
+
+```sql
+-- SQLite indexes
+CREATE INDEX idx_sessions_created_at ON sessions(created_at);
+CREATE INDEX idx_sessions_last_active ON sessions(last_active_at);
+CREATE INDEX idx_messages_session_id ON messages(session_id);
+```
+
+## Database Performance
+
+### SQLite Configuration
+
+Optimize SQLite for Flynn's workload:
+
+```sql
+-- In SQLite connection setup
+PRAGMA journal_mode = WAL;        -- Better concurrency
+PRAGMA synchronous = NORMAL;      -- Faster writes
+PRAGMA cache_size = -64000;       -- 64MB cache
+PRAGMA temp_store = MEMORY;        -- Store temp data in memory
+PRAGMA mmap_size = 268435456;     -- 256MB mmap
+PRAGMA page_size = 4096;          -- Default page size
+```
+
+### Connection Pooling
+
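+Flynn uses a single SQLite connection per database. SQLite allows only one writer at a time, so extra connections mainly help concurrent reads. For high read concurrency, consider:
+
+```typescript
+// Future: dedicated read connections (sketch, not implemented in Flynn)
+import Database from 'better-sqlite3';
+
+const filename = '/path/to/database.db';
+const writer = new Database(filename);
+// In WAL mode, read-only connections do not block the writer
+const readers = [1, 2, 3, 4].map(() => new Database(filename, { readonly: true }));
+```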
+
+### Query Optimization
+
+Use indexed columns in queries:
+
+```typescript
+// Good: Uses index
+const sessions = db.prepare(`
+  SELECT * FROM sessions
+  WHERE last_active_at > ?
+  ORDER BY last_active_at DESC
+  LIMIT 10
+`).all(threshold);
+
+// Bad: Full table scan (message_count is not indexed)
+const busySessions = db.prepare(`
+  SELECT * FROM sessions
+  WHERE message_count > ?
+`).all(threshold);
+```
+
+### Vacuum and Analyze
+
+Regular maintenance improves performance:
+
+```bash
+# Vacuum to reclaim space
+sqlite3 sessions.db "VACUUM;"
+
+# Analyze for query optimization
+sqlite3 sessions.db "ANALYZE;"
+
+# Rebuild indexes
+sqlite3 sessions.db "REINDEX;"
+```
+
+Add to crontab (monthly):
+```
+0 0 1 * * sqlite3 /var/lib/flynn/sessions.db "VACUUM; ANALYZE;" >> /var/log/flynn-maintenance.log 2>&1
+```
+
+## Gateway Performance
+
+### Connection Limits
+
+Limit concurrent connections:
+
+```yaml
+gateway:
+  enabled: true
+  port: 18800
+
+  # Maximum concurrent WebSocket connections
+  maxConnections: 50
+
+  # Single-client lock
+  lock:
+    enabled: true  # Only one client at a time
+```
+
+**For multiple users:**
+```yaml
+gateway:
+  maxConnections: 100
+  lock:
+    enabled: false
+```
+
+### Lane Queue
+
+The lane queue serializes requests per session:
+
+```yaml
+gateway:
+  laneQueue:
+    # Max requests per session
+    maxDepth: 10
+
+    # Request timeout
+    requestTimeoutMs: 30000
+```
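+
+To illustrate what "serializes requests per session" means, here is a small promise-chain queue. It is a sketch under stated assumptions, not the gateway's implementation: `LaneQueue` is a hypothetical class, `maxDepth` counts running plus waiting tasks, and `requestTimeoutMs` is left out:
+
+```typescript
+// Hypothetical sketch: runs tasks one at a time per session, in arrival order
+class LaneQueue {
+  private tails = new Map<string, Promise<unknown>>();
+  private depths = new Map<string, number>();
+
+  constructor(private readonly maxDepth: number) {}
+
+  enqueue<T>(sessionId: string, task: () => Promise<T>): Promise<T> {
+    const depth = this.depths.get(sessionId) ?? 0;
+    if (depth >= this.maxDepth) {
+      return Promise.reject(new Error(`Lane full for session ${sessionId}`));
+    }
+    this.depths.set(sessionId, depth + 1);
+
+    const tail = this.tails.get(sessionId) ?? Promise.resolve();
+    // Start after the previous task settles, whether it succeeded or failed
+    const run = tail.then(task, task);
+    const done = (): void => {
+      const remaining = (this.depths.get(sessionId) ?? 1) - 1;
+      if (remaining === 0) {
+        this.depths.delete(sessionId);
+        this.tails.delete(sessionId);
+      } else {
+        this.depths.set(sessionId, remaining);
+      }
+    };
+    this.tails.set(sessionId, run.then(done, done));
+    return run;
+  }
+
+  depth(sessionId: string): number {
+    return this.depths.get(sessionId) ?? 0;
+  }
+}
+
+const lanes = new LaneQueue(2);
+const callOrder: number[] = [];
+const first = lanes.enqueue('session-1', async () => { callOrder.push(1); return 'one'; });
+const second = lanes.enqueue('session-1', async () => { callOrder.push(2); return 'two'; });
+```
+
+Requests for different sessions use separate lanes and therefore do not wait on each other; only requests within the same session are ordered.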
+
+### WebSocket Optimization
+
+Configure WebSocket for performance:
+
+```typescript
+// Gateway server WebSocket options
+const wsOptions = {
+  // Enable compression
+  perMessageDeflate: {
+    threshold: 1024
+  },
+
+  // Track connected clients (lets the server iterate them for ping/pong heartbeats)
+  clientTracking: true,
+
+  // Maximum message size
+  maxPayload: 16 * 1024 * 1024  // 16MB
+};
+```
+
+### HTTP Server
+
+Optimize HTTP server for static files:
+
+```yaml
+gateway:
+  static:
+    # Enable gzip compression
+    gzip: true
+
+    # Cache static assets
+    cacheControl: 'public, max-age=3600'
+
+    # Serve index.html for SPA routes
+    spa: true
+```
+
+## Resource Usage
+
+### Node.js Options
+
+Tune Node.js for production:
+
+```bash
+# Increase the V8 heap limit (value in MB)
+export NODE_OPTIONS="--max-old-space-size=4096"
+
+# Other V8 flags such as --optimize-for-size are rejected in NODE_OPTIONS;
+# pass them on the node command line instead
+node --optimize-for-size dist/cli/index.js start
+```
+
+In systemd service:
+```ini
+Environment="NODE_OPTIONS=--max-old-space-size=4096"
+```
+
+### Process Limits
+
+Set appropriate limits:
+
+```ini
+[Service]
+# Memory limit (2GB); keep Node's --max-old-space-size below this
+MemoryMax=2G
+MemorySwapMax=0
+
+# CPU quota (200% = 2 cores)
+CPUQuota=200%
+
+# File descriptors
+LimitNOFILE=65536
+```
+
+### Docker Resource Limits
+
+Constrain Docker container:
+
+```yaml
+services:
+  flynn:
+    deploy:
+      resources:
+        limits:
+          cpus: '2.0'
+          memory: 2G
+        reservations:
+          cpus: '1.0'
+          memory: 1G
+```
+
+### Memory Monitoring
+
+Monitor memory usage:
+
+```bash
+# Check Flynn memory
+ps aux | grep flynn
+
+# System memory
+free -h
+
+# Node.js heap stats (add to code)
+console.log('Heap used:', process.memoryUsage().heapUsed / 1024 / 1024, 'MB');
+```
+
+## Monitoring & Profiling
+
+### Health Checks
+
+Enable heartbeat checks:
+
+```yaml
+automation:
+  heartbeat:
+    enabled: true
+    interval: '5m'
+    checks:
+      - 'gateway'
+      - 'model'
+      - 'channels'
+      - 'memory'
+      - 'disk'
+```
+
+Check the gateway health endpoint:
+```bash
+curl http://localhost:18800/health
+```
+
+### Logging Levels
+
+Configure logging appropriately:
+
+```yaml
+logging:
+  level: 'info'  # debug | info | warn | error
+```
+
+- **Development:** `debug` - All messages
+- **Production:** `info` - Normal operation
+- **Minimal:** `warn` - Only warnings and errors
+
+### Performance Metrics
+
+Track key metrics:
+
+```typescript
+// Future: Metrics collection
+interface Metrics {
+  // Response times
+  avgResponseTime: number;
+  p95ResponseTime: number;
+  p99ResponseTime: number;
+
+  // Throughput
+  requestsPerSecond: number;
+  concurrentSessions: number;
+
+  // Token usage
+  avgInputTokens: number;
+  avgOutputTokens: number;
+  totalTokens: number;
+
+  // Errors
+  errorRate: number;
+  timeoutRate: number;
+}
+```
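+
+Until metrics collection exists, the response-time fields of this interface could be computed from raw samples as sketched below. The example uses the nearest-rank percentile definition, which is one of several in common use; `percentile` and `summarizeResponseTimes` are hypothetical helpers:
+
+```typescript
+// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it
+function percentile(samples: number[], p: number): number {
+  if (samples.length === 0) throw new Error('No samples');
+  const sorted = samples.slice().sort((a, b) => a - b);
+  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
+  return sorted[rank - 1];
+}
+
+function summarizeResponseTimes(samplesMs: number[]) {
+  const total = samplesMs.reduce((sum, ms) => sum + ms, 0);
+  return {
+    avgResponseTime: total / samplesMs.length,
+    p95ResponseTime: percentile(samplesMs, 95),
+    p99ResponseTime: percentile(samplesMs, 99),
+  };
+}
+
+// Synthetic samples: 1 ms, 2 ms, ..., 100 ms
+const samplesMs: number[] = [];
+for (let ms = 1; ms <= 100; ms++) samplesMs.push(ms);
+const responseSummary = summarizeResponseTimes(samplesMs);
+```
+
+Percentiles are usually more informative than the average here, because a few slow model calls can dominate perceived latency without moving the mean much.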
+
+### Profiling
+
+Profile Node.js execution:
+
+```bash
+# Option 1: V8 tick profiler (text summary)
+node --prof dist/cli/index.js start
+node --prof-process isolate-*.log > profile.txt
+
+# Option 2: CPU profile for Chrome DevTools
+# Writes a .cpuprofile file on exit; load that file in Chrome DevTools
+node --cpu-prof dist/cli/index.js start
+
+### Flamegraphs
+
+Generate flamegraphs for bottleneck analysis:
+
+```bash
+# Install 0x
+npm install -g 0x
+
+# Run with profiler
+0x dist/cli/index.js start
+```
+
+## Common Performance Issues
+
+### High Memory Usage
+
+**Symptoms:**
+- OOM errors
+- Slow garbage collection
+- System swapping
+
+**Solutions:**
+1. Reduce `keepTurns` in compaction
+2. Decrease session TTL
+3. Prune old sessions
+4. Increase Node.js memory limit
+5. Check for memory leaks
+
+### Slow Response Times
+
+**Symptoms:**
+- Responses > 10 seconds
+- Timeouts
+- Poor user experience
+
+**Solutions:**
+1. Switch to faster model tier
+2. Enable compaction
+3. Use local fallbacks
+4. Optimize tool timeouts
+5. Check network latency
+
+### High CPU Usage
+
+**Symptoms:**
+- CPU > 80%
+- Slow system
+- High latency
+
+**Solutions:**
+1. Reduce concurrent sessions
+2. Optimize database queries
+3. Use efficient embeddings
+4. Disable unnecessary features
+5. Scale vertically (more CPU)
+
+### Database Locks
+
+**Symptoms:**
+- SQLite database locked errors
+- Slow writes
+- Concurrent access issues
+
+**Solutions:**
+1. Enable WAL mode
+2. Reduce write frequency
+3. Use connection pooling
+4. Add appropriate indexes
+
+### Model Rate Limits
+
+**Symptoms:**
+- 429 Too Many Requests errors
+- Frequent fallbacks
+- Increased latency
+
+**Solutions:**
+1. Configure retry with exponential backoff
+2. Use faster models for delegated tasks
+3. Implement request queuing
+4. Add local model fallbacks
+
+## Performance Checklist
+
+Before deploying to production, verify:
+
+- [ ] Compaction configured with appropriate threshold
+- [ ] Model tiers configured for cost/latency
+- [ ] Fallback chains configured
+- [ ] Tool timeouts set appropriately
+- [ ] Session TTL reasonable for use case
+- [ ] SQLite optimized (WAL mode, cache size)
+- [ ] Database indexes created
+- [ ] Gateway connection limits set
+- [ ] Memory limits configured
+- [ ] Monitoring enabled
+- [ ] Logging level set to `info` or `warn`
+- [ ] Health checks working
+- [ ] Backup/restore tested
+
+---
+
+For more information:
+- [TROUBLESHOOTING.md](../../TROUBLESHOOTING.md)
+- [PRODUCTION.md](../deployment/PRODUCTION.md)
+- [ARCHITECTURE.md](../../.planning/codebase/ARCHITECTURE.md)