This commit adds 6 new documentation files to fill critical gaps:
- CONTRIBUTING.md: Developer onboarding guide with setup, workflow, code style, testing, and adding features
- TROUBLESHOOTING.md: Common issues and solutions for errors, model issues, tool issues, channel issues, gateway issues, configuration issues, and memory/database issues
- docs/api/PROTOCOL.md: Gateway JSON-RPC protocol documentation with connection, authentication, message format, methods, events, error codes, and example client implementation
- docs/api/TOOLS.md: Tools API documentation covering tool interface, input schema format, result format, tool patterns, tool registration, tool policy, execution flow, and builtin tools reference
- docs/deployment/PRODUCTION.md: Production deployment guide covering Docker deployment, systemd service, security, configuration, monitoring, backup & recovery, and performance tuning
- docs/performance/TUNING.md: Performance optimization guide covering context management, model routing, tool execution, memory & embeddings, session management, database performance, gateway performance, and resource usage
These files complement the existing documentation (README.md, AGENTS.md, ARCHITECTURE.md, STRUCTURE.md, CONVENTIONS.md) to provide complete coverage for users, developers, and operators.
Production Deployment Guide
This guide covers deploying Flynn in a production environment.
Table of Contents
- Prerequisites
- Docker Deployment
- Systemd Service
- Security
- Configuration
- Monitoring
- Backup & Recovery
- Performance Tuning
- Scaling Considerations
Prerequisites
System Requirements
- OS: Linux (Ubuntu 22.04+ recommended) or macOS
- Node.js: >= 22.0.0
- Memory: Minimum 2GB, 4GB+ recommended
- Disk: 10GB+ for sessions, memory, and vectors
- Docker: Optional; required only for sandbox features
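The Node.js requirement above can be checked before installation. The following is a minimal sketch; the function name `node_version_ok` is hypothetical and not part of Flynn.

```shell
#!/bin/sh
# Returns 0 when a Node.js version string (e.g. "v22.3.0") meets the minimum major version.
# node_version_ok is a hypothetical helper, not a Flynn command.
node_version_ok() {
  version="${1#v}"          # strip the leading "v"
  major="${version%%.*}"    # keep everything before the first dot
  min_major="${2:-22}"
  case "$major" in
    ''|*[!0-9]*) return 1 ;;   # empty or not a number
  esac
  [ "$major" -ge "$min_major" ]
}

# Check the locally installed runtime, if there is one.
if command -v node >/dev/null 2>&1; then
  if node_version_ok "$(node --version)"; then
    echo "Node.js version OK"
  else
    echo "Node.js >= 22 required, found $(node --version)" >&2
  fi
fi
```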
Network Requirements
- Public IP or VPN (Tailscale recommended) for remote access
- Open ports: 18800 (gateway), optional 443 (Tailscale Serve)
- Outbound HTTPS access for model providers and web tools
External Services (Optional)
- Model Providers: Anthropic, OpenAI, GitHub Models, etc. (API keys required)
- Email: SMTP server for email notifications
- Object Storage: MinIO or S3 for backups (optional)
Docker Deployment
Quick Start
Using the provided docker-compose.yml:
# Clone repository
git clone <repo-url>
cd flynn
# Create config
cp config/default.yaml config/production.yaml
# Edit config/production.yaml with your settings
# Start services
docker-compose up -d
# View logs
docker-compose logs -f
Dockerfile
The multi-stage Dockerfile:
# Stage 1: Build
FROM node:22-alpine AS builder
WORKDIR /app
COPY package*.json ./
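# Install all dependencies; the build step typically needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime packages are copied into the final image
RUN npm prune --omit=dev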
# Stage 2: Runtime
FROM node:22-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY config ./config
COPY src/gateway/ui ./dist/gateway/ui
# Create data directory
RUN mkdir -p /data
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD node -e "require('http').get('http://localhost:18800/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"
# Expose gateway port
EXPOSE 18800
# Run
CMD ["node", "dist/cli/index.js", "start"]
Docker Compose Configuration
version: '3.8'
services:
  flynn:
    build: .
    container_name: flynn
    restart: unless-stopped
    ports:
      - "18800:18800"
    volumes:
      - ./config/production.yaml:/flynn/config.yaml:ro
      - flynn_data:/data
      - /var/run/docker.sock:/var/run/docker.sock # For sandbox
    environment:
      - NODE_ENV=production
      - FLYNN_CONFIG=/flynn/config.yaml
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:18800/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 5s
  whisper:
    image: openai/whisper-server:latest
    container_name: whisper-server
    restart: unless-stopped
    ports:
      - "8080:8080"
    volumes:
      - whisper_cache:/cache
    environment:
      - WHISPER_MODEL=base
      - WHISPER_HTTP_PORT=8080
volumes:
  flynn_data:
  whisper_cache:
Environment Variables
# Node environment
export NODE_ENV=production
# Config path
export FLYNN_CONFIG=/path/to/config.yaml
# Data directory (default: ~/.local/share/flynn)
export FLYNN_DATA_DIR=/var/lib/flynn
# Optional: Override model provider credentials
export ANTHROPIC_API_KEY=sk-...
export OPENAI_API_KEY=sk-...
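A start script can fail fast when one of these variables is missing. This is a sketch only; `require_env` is a hypothetical helper, and which variables are actually mandatory depends on your configuration.

```shell
#!/bin/sh
# Reports every unset or empty variable named in the arguments.
# Returns non-zero if at least one is missing. (Hypothetical helper, not part of Flynn.)
require_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup of the variable called $name
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Typical use before starting the service:
#   require_env NODE_ENV FLYNN_CONFIG ANTHROPIC_API_KEY || exit 1
```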
Systemd Service
Service File
Create /etc/systemd/system/flynn.service:
[Unit]
Description=Flynn AI Assistant Daemon
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=flynn
Group=flynn
WorkingDirectory=/opt/flynn
Environment="NODE_ENV=production"
Environment="FLYNN_CONFIG=/etc/flynn/config.yaml"
Environment="FLYNN_DATA_DIR=/var/lib/flynn"
ExecStart=/usr/local/bin/node /opt/flynn/dist/cli/index.js start
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=flynn
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/flynn /var/log/flynn /var/run
# Resource limits
MemoryMax=2G
MemorySwapMax=0
CPUQuota=200%
[Install]
WantedBy=multi-user.target
Create Flynn User
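# Create system user and matching group
sudo useradd --system --user-group --home-dir /var/lib/flynn --shell /usr/sbin/nologin flynn
# Create directories
sudo mkdir -p /opt/flynn /etc/flynn /var/lib/flynn /var/log/flynn
# Copy the build output and runtime dependencies (ExecStart expects /opt/flynn/dist)
sudo cp -r dist node_modules package.json /opt/flynn/
sudo cp config/production.yaml /etc/flynn/config.yaml
# Code is owned by root; data and logs are writable by the flynn user
sudo chown -R root:root /opt/flynn
sudo chown -R flynn:flynn /var/lib/flynn /var/log/flynn
# The config may hold secrets: readable by root and the flynn group only
sudo chown root:flynn /etc/flynn/config.yaml
sudo chmod 640 /etc/flynn/config.yaml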
Enable and Start Service
# Reload systemd
sudo systemctl daemon-reload
# Enable service (start on boot)
sudo systemctl enable flynn
# Start service
sudo systemctl start flynn
# Check status
sudo systemctl status flynn
# View logs
sudo journalctl -u flynn -f
# Restart service
sudo systemctl restart flynn
Service Management
# Stop service
sudo systemctl stop flynn
# Reload config (requires restart)
sudo systemctl restart flynn
# Check if running
sudo systemctl is-active flynn
# View recent logs
sudo journalctl -u flynn -n 100 --no-pager
Security
Secrets Management
Never commit secrets to version control. Use one of these approaches:
Environment Variables
# config/production.yaml
models:
  default:
    anthropic:
      apiKey: '${ANTHROPIC_API_KEY}'
Set the variable in the systemd service file, or in /etc/flynn/.env loaded with EnvironmentFile=/etc/flynn/.env:
Environment="ANTHROPIC_API_KEY=sk-..."
HashiCorp Vault (Advanced)
Use a secrets manager and inject at runtime:
# Read the key straight into the environment; avoid writing it to disk
export ANTHROPIC_API_KEY="$(vault kv get -field=api_key secret/anthropic)"
Authentication
Gateway Auth
# config/production.yaml
gateway:
enabled: true
auth:
token: 'your-random-token-here' # Generate with: openssl rand -hex 32
trustTailscaleIdentity: true
applyToHttp: true
Generate a secure token:
openssl rand -hex 32
Channel Whitelists
Restrict who can interact with Flynn:
channels:
  telegram:
    allowedChatIds: ['123456789'] # Your Telegram chat ID
  discord:
    allowedGuildIds: ['987654321098765432']
    allowedChannelIds: ['123456789012345678']
  slack:
    allowedChannelIds: ['C12345678']
    signingSecret: '${SLACK_SIGNING_SECRET}'
Network Security
Firewall
# Ubuntu/Debian (ufw)
sudo ufw allow 22/tcp # SSH
sudo ufw allow 18800/tcp # Flynn gateway
sudo ufw enable
# CentOS/RHEL (firewalld)
sudo firewall-cmd --permanent --add-port=18800/tcp
sudo firewall-cmd --reload
Reverse Proxy (Nginx)
Place Flynn behind Nginx for TLS:
server {
    listen 443 ssl http2;
    server_name flynn.example.com;
    ssl_certificate /etc/letsencrypt/live/flynn.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/flynn.example.com/privkey.pem;
    # WebSocket upgrade
    location / {
        proxy_pass http://localhost:18800;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Timeouts: Nginx closes an idle WebSocket once proxy_read_timeout elapses,
        # so keep the read/send timeouts long
        proxy_connect_timeout 60s;
        proxy_send_timeout 3600s;
        proxy_read_timeout 3600s;
    }
    # Health check endpoint (no auth required)
    location /health {
        proxy_pass http://localhost:18800/health;
        access_log off;
    }
}
Obtain TLS certificate with Let's Encrypt:
sudo certbot --nginx -d flynn.example.com
File Permissions
# Data directory
sudo chmod 750 /var/lib/flynn
sudo chown flynn:flynn /var/lib/flynn
# Config file
sudo chmod 640 /etc/flynn/config.yaml
sudo chown root:flynn /etc/flynn/config.yaml
# Logs
sudo chmod 750 /var/log/flynn
sudo chown flynn:flynn /var/log/flynn
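The permissions above can be verified from a script. A minimal sketch, assuming GNU `stat` (Linux); on macOS the equivalent call is `stat -f '%Lp'`. The helper name `not_world_accessible` is hypothetical.

```shell
#!/bin/sh
# Succeeds only if the path's permission bits grant nothing to "other" users.
# Assumes GNU stat; not_world_accessible is a hypothetical helper.
not_world_accessible() {
  mode="$(stat -c '%a' "$1")" || return 1
  other="${mode#"${mode%?}"}"   # last octal digit = permissions for "other"
  [ "$other" = "0" ]
}

# Typical use:
#   not_world_accessible /etc/flynn/config.yaml || echo "config is world-accessible" >&2
```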
Sandbox Security
Docker sandbox adds isolation but requires careful configuration:
# config/production.yaml
sandbox:
enabled: true
image: 'node:22-alpine'
dockerSocket: '/var/run/docker.sock'
resourceLimits:
memory: '512m'
cpus: '0.5'
timeoutSec: 60
networkMode: 'none' # No network access
Ensure Docker is secured:
# Let the flynn user talk to the Docker daemon (note: docker group membership is root-equivalent)
sudo usermod -aG docker flynn
# Configure Docker daemon security
sudo vim /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "live-restore": true,
  "userland-proxy": false
}
Configuration
Production Config Template
# config/production.yaml
# Base config for production deployment
# ── Gateway ───────────────────────────────────────────────────────────────
gateway:
enabled: true
port: 18800
auth:
token: '${GATEWAY_TOKEN}'
trustTailscaleIdentity: true
applyToHttp: true
lock:
enabled: true
tailscaleServe:
enabled: false # Set to true to expose via Tailscale
hostname: 'flynn'
port: 443
# ── Models ─────────────────────────────────────────────────────────────────
models:
default:
anthropic:
apiKey: '${ANTHROPIC_API_KEY}'
model: 'claude-sonnet-4-20250514'
maxTokens: 4096
router:
tiers:
default: 'anthropic:claude-sonnet-4-20250514'
fast: 'anthropic:claude-haiku-4-20250514'
complex: 'anthropic:claude-opus-4-20250514'
local: 'ollama:llama3'
fallbackChain:
- 'github:claude-sonnet-4-5'
- 'local:ollama:llama3'
retry:
maxAttempts: 3
initialDelayMs: 1000
multiplier: 2
maxDelayMs: 30000
# ── Channels ───────────────────────────────────────────────────────────────
channels:
telegram:
enabled: true
token: '${TELEGRAM_BOT_TOKEN}'
allowedChatIds: ['123456789']
discord:
enabled: false
slack:
enabled: false
whatsapp:
enabled: false
# ── Sessions ───────────────────────────────────────────────────────────────
sessions:
ttl: '7d'
maxSessions: 100
# ── Memory ────────────────────────────────────────────────────────────────
memory:
enabled: true
embeddings:
provider: 'openai'
openai:
apiKey: '${OPENAI_API_KEY}'
model: 'text-embedding-3-small'
# ── Tools ─────────────────────────────────────────────────────────────────
tools:
policy: 'coding' # Restrict tool access
executor:
defaultTimeoutMs: 30000
maxOutputBytes: 51200
sandbox:
enabled: false # Enable if using Docker
# ── Agents ────────────────────────────────────────────────────────────────
agents:
default:
modelTier: 'default'
toolPolicy: 'coding'
compaction:
thresholdPct: 80
keepTurns: 4
summaryMaxTokens: 1024
# ── Automation ────────────────────────────────────────────────────────────
automation:
cron:
enabled: false
webhooks:
enabled: false
heartbeat:
enabled: true
interval: '5m'
checks:
- 'gateway'
- 'model'
- 'channels'
- 'memory'
- 'disk'
notifications:
- type: 'telegram'
chatId: '123456789'
# ── Logging ───────────────────────────────────────────────────────────────
logging:
level: 'info' # debug, info, warn, error
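The retry settings in the template (initialDelayMs 1000, multiplier 2, maxDelayMs 30000) read like exponential backoff. The sketch below prints the resulting delays under the assumption that the delay before retry n is initialDelayMs × multiplier^(n−1), capped at maxDelayMs. The source does not confirm this formula, and how many delays are actually used depends on how Flynn counts maxAttempts.

```shell
#!/bin/sh
# Delay in ms before retry number $1 (1-based), ASSUMING capped exponential backoff.
# Arguments 2-4 default to the values from the config template.
backoff_delay_ms() {
  attempt="$1"
  initial="${2:-1000}"
  multiplier="${3:-2}"
  max="${4:-30000}"
  delay="$initial"
  i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$max" ]; do
    delay=$((delay * multiplier))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$max" ]; then delay="$max"; fi   # apply the cap
  echo "$delay"
}

for n in 1 2 3; do
  echo "retry $n: $(backoff_delay_ms "$n") ms"
done
```

With the template values this prints 1000, 2000, and 4000 ms; the 30000 ms cap only takes effect from the sixth retry onward.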
Config Validation
Validate config before starting:
flynn doctor --config /etc/flynn/config.yaml
Monitoring
Health Checks
Flynn provides a health check endpoint:
# HTTP health check
curl http://localhost:18800/health
# Response
{
  "status": "ok",
  "version": "0.1.0",
  "uptime": 12345
}
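A deployment script can wait for this endpoint before declaring the service up. The following is a sketch that assumes the response shape shown above; `health_body_ok` and `wait_for_health` are hypothetical helper names.

```shell
#!/bin/sh
# Reads a /health response body on stdin; succeeds if it reports "status": "ok".
health_body_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Polls the endpoint until it is healthy or the attempts run out.
# $1 = URL (defaults to the local gateway), $2 = number of attempts.
wait_for_health() {
  url="${1:-http://localhost:18800/health}"
  attempts="${2:-10}"
  while [ "$attempts" -gt 0 ]; do
    if curl -fsS --max-time 5 "$url" 2>/dev/null | health_body_ok; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 2
  done
  return 1
}

# Typical use after a restart:
#   wait_for_health || { echo "Flynn did not become healthy" >&2; exit 1; }
```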
Logs
Journalctl (systemd)
# Follow logs
sudo journalctl -u flynn -f
# View last 100 lines
sudo journalctl -u flynn -n 100 --no-pager
# View logs since yesterday
sudo journalctl -u flynn --since yesterday
# Search for errors
sudo journalctl -u flynn | grep -i error
Log Rotation
Limit journal size and retention in journald's own configuration (logrotate is not involved):
sudo vim /etc/systemd/journald.conf
[Journal]
SystemMaxUse=100M
MaxRetentionSec=7day
Restart journald:
sudo systemctl restart systemd-journald
Heartbeat Monitor
Enable built-in heartbeat monitoring:
automation:
  heartbeat:
    enabled: true
    interval: '5m'
    checks:
      - 'gateway'
      - 'model'
      - 'channels'
      - 'memory'
      - 'disk'
    notifications:
      - type: 'telegram'
        chatId: '123456789'
      - type: 'webhook'
        url: 'https://hooks.slack.com/services/...'
External Monitoring
Prometheus (Optional)
Use Node.js prom-client for metrics (not currently implemented):
# Future feature
monitoring:
  prometheus:
    enabled: true
    port: 9090
Uptime Monitoring
Use external services:
- UptimeRobot
- Pingdom
- Better Uptime
Monitor:
- Gateway HTTP health endpoint
- WebSocket connection
- Response time
Backup & Recovery
What to Backup
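- Configuration: /etc/flynn/config.yaml
- Sessions: SQLite database at ~/.local/share/flynn/sessions.db
- Memory Files: ~/.local/share/flynn/memory/
- Vectors: SQLite database at ~/.local/share/flynn/vectors.db
- Pairing Codes: SQLite table within sessions.db
The data paths above are the defaults. With FLYNN_DATA_DIR=/var/lib/flynn, as in the systemd setup, the same files live under /var/lib/flynn.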
Backup Script
Create /usr/local/bin/flynn-backup.sh:
#!/bin/bash
set -euo pipefail
BACKUP_DIR="/var/backups/flynn"
DATA_DIR="/var/lib/flynn"
CONFIG_DIR="/etc/flynn"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/flynn_$DATE.tar.gz"
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Stop Flynn so the SQLite files are not modified mid-copy.
# The script runs from root's crontab, so sudo is not needed.
systemctl stop flynn
# Restart Flynn on exit, even if the backup fails
trap 'systemctl start flynn' EXIT
# Create backup
tar -czf "$BACKUP_FILE" \
  "$CONFIG_DIR/config.yaml" \
  "$DATA_DIR/sessions.db" \
  "$DATA_DIR/vectors.db" \
  "$DATA_DIR/memory/"
# Delete backups older than 90 days
find "$BACKUP_DIR" -name "flynn_*.tar.gz" -mtime +90 -delete
echo "Backup created: $BACKUP_FILE"
Make executable:
sudo chmod +x /usr/local/bin/flynn-backup.sh
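The backup script prunes archives by age only. If you would rather keep a fixed number of recent backups, a count-based policy can be sketched as follows. `prune_backups` is a hypothetical helper; it relies on the flynn_YYYYmmdd_HHMMSS.tar.gz names sorting chronologically.

```shell
#!/bin/sh
# Deletes all but the $2 newest flynn_*.tar.gz archives in directory $1.
# Hypothetical helper; relies on the timestamped file names sorting by date.
prune_backups() {
  dir="$1"
  keep="${2:-7}"
  set -- "$dir"/flynn_*.tar.gz      # glob expands in sorted order: oldest first
  [ -e "$1" ] || return 0           # no backups yet
  while [ "$#" -gt "$keep" ]; do
    rm -f -- "$1"                   # remove the oldest remaining archive
    shift
  done
}

# Typical use at the end of the backup script:
#   prune_backups /var/backups/flynn 7
```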
Cron Job
Add to root crontab:
sudo crontab -e
# Daily backup at 2 AM
0 2 * * * /usr/local/bin/flynn-backup.sh >> /var/log/flynn-backup.log 2>&1
Restore
# Stop Flynn
sudo systemctl stop flynn
# Extract backup
sudo tar -xzf /var/backups/flynn/flynn_20250213_020000.tar.gz -C /
# Start Flynn
sudo systemctl start flynn
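Before extracting over a live system, it can help to confirm that the archive is readable and contains the expected files. A minimal sketch, assuming the paths used by the backup script above; `verify_backup` is a hypothetical helper.

```shell
#!/bin/sh
# Succeeds if the archive can be listed and contains the expected members.
# tar stores absolute paths without the leading "/", so names are matched in that form.
verify_backup() {
  archive="$1"
  listing="$(tar -tzf "$archive" 2>/dev/null)" || return 1
  for member in etc/flynn/config.yaml var/lib/flynn/sessions.db var/lib/flynn/vectors.db; do
    printf '%s\n' "$listing" | grep -qxF "$member" || {
      echo "missing from backup: $member" >&2
      return 1
    }
  done
}

# Typical use before a restore:
#   verify_backup /var/backups/flynn/flynn_20250213_020000.tar.gz || exit 1
```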
Database Maintenance
Run SQLite vacuum periodically:
sqlite3 /var/lib/flynn/sessions.db "VACUUM;"
sqlite3 /var/lib/flynn/vectors.db "VACUUM;"
Add to crontab (monthly):
0 0 1 * * sqlite3 /var/lib/flynn/sessions.db "VACUUM;" >> /var/log/flynn-maintenance.log 2>&1
Performance Tuning
Node.js Tuning
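Set Node.js options for production. Keep the V8 heap limit below the service memory limit (2G in the systemd unit), because the process also needs memory outside the heap:
# In systemd service
Environment="NODE_OPTIONS=--max-old-space-size=1536"
# Or via environment variable
export NODE_OPTIONS="--max-old-space-size=1536"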
Context Management
Optimize compaction settings:
agents:
  default:
    compaction:
      thresholdPct: 75 # Trigger earlier
      keepTurns: 6 # Keep more context
      summaryMaxTokens: 2048 # Better summaries
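To get a feel for what the threshold change means, the sketch below computes the token count at which compaction would start. It assumes thresholdPct is a percentage of the model's context window, which the source does not state, and uses a hypothetical 200000-token window.

```shell
#!/bin/sh
# Token count at which compaction would start, ASSUMING thresholdPct is a
# percentage of the context window. $1 = window size in tokens, $2 = thresholdPct.
compaction_trigger_tokens() {
  window="$1"
  threshold_pct="$2"
  echo $((window * threshold_pct / 100))
}

echo "thresholdPct 80 -> $(compaction_trigger_tokens 200000 80) tokens"
echo "thresholdPct 75 -> $(compaction_trigger_tokens 200000 75) tokens"
```

Under these assumptions, lowering the threshold from 80 to 75 starts compaction 10000 tokens earlier (150000 instead of 160000).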
SQLite Performance
Enable WAL mode:
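# journal_mode=WAL is stored in the database file and persists across connections
sqlite3 /var/lib/flynn/sessions.db "PRAGMA journal_mode=WAL;"
# synchronous and cache_size apply only to the connection that sets them, so running
# them from the sqlite3 CLI does not affect Flynn; the application must set them itself:
#   PRAGMA synchronous=NORMAL;
#   PRAGMA cache_size=-64000;  (about 64MB)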
Model Routing
Configure tiers for optimal cost/latency:
models:
  router:
    tiers:
      fast: 'anthropic:claude-haiku-4-20250514' # Quick tasks
      default: 'anthropic:claude-sonnet-4-20250514' # General use
      complex: 'anthropic:claude-opus-4-20250514' # Complex reasoning
      local: 'ollama:llama3' # Fallback
Caching (Future)
Consider adding caching for:
- Repeated tool calls
- Memory search results
- Model responses for common queries
Scaling Considerations
Single-Operator Scope
Flynn is designed for a single operator with multiple concurrent users. Limitations:
- Max Concurrent Sessions: ~100 (depends on model rate limits)
- Throughput: ~10-20 messages/second (varies by model)
- Memory Usage: 2-4GB for moderate usage
When to Scale Up
Consider scaling if:
- Consistent CPU usage > 80%
- Memory usage > 4GB
- Frequent rate limiting from model providers
- Slow response times > 30 seconds
Scaling Strategies
- Horizontal Scaling: Deploy multiple Flynn instances behind a load balancer (not currently supported; sessions are stateful)
- Vertical Scaling: Increase server resources (CPU, memory)
- Multi-Instance Architecture (future):
  - Shared session storage (PostgreSQL/Redis)
  - Message queue for request distribution
  - Session affinity for stateful connections
Cost Optimization
- Use local models for non-critical tasks
- Cache embeddings
- Optimize compaction to reduce token usage
- Use efficient models for delegated tasks
Troubleshooting Production Issues
Service Won't Start
# Check status
sudo systemctl status flynn
# View logs
sudo journalctl -u flynn -n 50 --no-pager
# Validate config
flynn doctor --config /etc/flynn/config.yaml
High Memory Usage
# Check memory
free -h
# Check process memory
ps aux | grep flynn
# Restart service
sudo systemctl restart flynn
Gateway Connection Issues
# Check if port is listening
sudo ss -tlnp | grep 18800
# Check firewall
sudo ufw status
# Test connectivity
curl http://localhost:18800/health
Slow Response Times
# Check CPU usage
top
# Check model provider status
# Verify API keys are valid
# Check network latency
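# Enable debug logging: set logging.level to 'debug' in /etc/flynn/config.yaml, then restart
# (variables set on the systemctl command line are not passed to the service)
sudo systemctl restart flynn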
For additional help, see:
- TROUBLESHOOTING.md
- README.md
- GitHub Issues