# Production Deployment Guide
This guide covers deploying Flynn in a production environment.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Docker Deployment](#docker-deployment)
- [Systemd Service](#systemd-service)
- [Security](#security)
- [Configuration](#configuration)
- [Monitoring](#monitoring)
- [Backup & Recovery](#backup--recovery)
- [Performance Tuning](#performance-tuning)
- [Scaling Considerations](#scaling-considerations)
## Prerequisites
### System Requirements
- **OS**: Linux (Ubuntu 22.04+ recommended) or macOS
- **Node.js**: >= 22.0.0
- **Memory**: Minimum 2GB, 4GB+ recommended
- **Disk**: 10GB+ for sessions, memory, and vectors
- **Docker**: Optional; required only for sandbox features
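The Node.js requirement above can be checked before installing. The following is a minimal sketch, assuming a version string in the form printed by `node --version`; the function name `node_version_ok` is hypothetical and not part of Flynn.

```bash
# Returns 0 when the given Node.js version string (e.g. "v22.3.0")
# has a major version at or above the required major (default: 22).
node_version_ok() {
  local version="${1#v}"        # strip a leading "v" if present
  local required="${2:-22}"
  local major="${version%%.*}"  # keep the text before the first dot
  [[ "$major" =~ ^[0-9]+$ ]] || return 2   # not a recognizable version
  (( major >= required ))
}

# Usage (only meaningful where node is installed):
#   node_version_ok "$(node --version)" || echo "Node.js >= 22 required" >&2
```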
### Network Requirements
- Public IP or VPN (Tailscale recommended) for remote access
- Open ports: 18800 (gateway), optional 443 (Tailscale Serve)
- Outbound HTTPS access for model providers and web tools
### External Services (Optional)
- **Model Providers**: Anthropic, OpenAI, GitHub Models, etc. (API keys required)
- **Email**: SMTP server for email notifications
- **Object Storage**: MinIO or S3 for backups (optional)
## Docker Deployment
### Quick Start
Using the provided `docker-compose.yml`:
```bash
# Clone repository
git clone <repo-url>
cd flynn
# Create config
cp config/default.yaml config/production.yaml
# Edit config/production.yaml with your settings
# Start services
docker compose up -d
# View logs
docker compose logs -f
```
### Dockerfile
Use the repo Dockerfile: `Dockerfile`.
Notes:
- Multi-stage build (builder + runtime).
- Uses `corepack` + `pnpm` with `pnpm-lock.yaml` for reproducible installs.
- Exposes port `18800` and runs `dist/cli/index.js start`.
### Docker Compose Configuration
Use the repo compose file: `docker-compose.yml`.
The important parts to customize:
- Mount your config: `./config/production.yaml:/config/config.yaml:ro`
- Set provider keys (`ANTHROPIC_API_KEY`, etc.)
- Optionally set gateway token auth (`FLYNN_SERVER_TOKEN`)
### Environment Variables
```bash
# Node environment
export NODE_ENV=production
# Config path
export FLYNN_CONFIG=/path/to/config.yaml
# Data directory (default: ~/.local/share/flynn)
export FLYNN_DATA_DIR=/var/lib/flynn
# Optional: Override model provider credentials
export ANTHROPIC_API_KEY=sk-...
export OPENAI_API_KEY=sk-...
```
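A startup wrapper could fail fast when one of these variables is missing. This is a sketch only: `require_env` is a hypothetical helper, and which variables are truly mandatory for a given deployment is an assumption to adjust.

```bash
# Prints the name of every listed variable that is unset or empty,
# and returns non-zero if any are missing.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name
    if [[ -z "${!name:-}" ]]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_env NODE_ENV FLYNN_CONFIG FLYNN_DATA_DIR || exit 1
```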
## Systemd Service
### Service File
Create `/etc/systemd/system/flynn.service`:
```ini
[Unit]
Description=Flynn AI Assistant Daemon
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=flynn
Group=flynn
WorkingDirectory=/opt/flynn
Environment="NODE_ENV=production"
Environment="FLYNN_CONFIG=/etc/flynn/config.yaml"
Environment="FLYNN_DATA_DIR=/var/lib/flynn"
ExecStart=/usr/local/bin/node /opt/flynn/dist/cli/index.js start
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=flynn
# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/flynn /var/log/flynn /var/run
# Resource limits
MemoryMax=2G
MemorySwapMax=0
CPUQuota=200%
[Install]
WantedBy=multi-user.target
```
### Create Flynn User
```bash
# Create group and user (group first, so useradd can assign it)
sudo groupadd --system flynn
sudo useradd --system --gid flynn --home-dir /var/lib/flynn --shell /usr/sbin/nologin flynn
# Create directories
sudo mkdir -p /opt/flynn/dist /etc/flynn /var/lib/flynn /var/log/flynn
sudo chown -R flynn:flynn /var/lib/flynn /var/log/flynn
# Copy the build output and config (the service runs /opt/flynn/dist/cli/index.js)
sudo cp -r dist/. /opt/flynn/dist/
sudo cp config/production.yaml /etc/flynn/config.yaml
sudo chown -R root:root /opt/flynn
# The config may contain secrets: readable by root and the flynn group only
sudo chown root:flynn /etc/flynn/config.yaml
sudo chmod 640 /etc/flynn/config.yaml
```
### Enable and Start Service
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable service (start on boot)
sudo systemctl enable flynn
# Start service
sudo systemctl start flynn
# Check status
sudo systemctl status flynn
# View logs
sudo journalctl -u flynn -f
# Restart service
sudo systemctl restart flynn
```
### Service Management
```bash
# Stop service
sudo systemctl stop flynn
# Reload config (requires restart)
sudo systemctl restart flynn
# Check if running
sudo systemctl is-active flynn
# View recent logs
sudo journalctl -u flynn -n 100 --no-pager
```
## Security
### Secrets Management
Never commit secrets to version control. Use one of these approaches:
#### Environment Variables
```yaml
# config/production.yaml
models:
  default:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: '${ANTHROPIC_API_KEY}'
```
Set in `/etc/flynn/.env` or systemd service file:
```ini
Environment="ANTHROPIC_API_KEY=sk-..."
```
#### HashiCorp Vault (Advanced)
Use a secrets manager and inject at runtime:
```bash
# Read the key straight into the environment; avoid writing it to disk
export ANTHROPIC_API_KEY="$(vault kv get -field=api_key secret/anthropic)"
```
### Authentication
#### Gateway Auth
```yaml
# config/production.yaml
server:
  token: 'your-random-token-here' # Generate with: openssl rand -hex 32
  tailscale_identity: true
  auth_http: true
  lock: false
```
Generate a secure token:
```bash
openssl rand -hex 32
```
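Before deploying, it can help to confirm that the configured token has the shape `openssl rand -hex 32` produces (64 lowercase hexadecimal characters) rather than a leftover placeholder. The sketch below assumes that format; `is_hex_token` is a hypothetical helper, not a Flynn command.

```bash
# Returns 0 only for a string of exactly 64 lowercase hex characters,
# which is what `openssl rand -hex 32` prints.
is_hex_token() {
  [[ "$1" =~ ^[0-9a-f]{64}$ ]]
}

# Usage:
#   is_hex_token "$FLYNN_SERVER_TOKEN" || echo "token looks wrong" >&2
```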
#### Safe Defaults (Recommended)
These defaults align with `docs/security/SAFE_PERSONAL_AGENT.md`:
```yaml
pairing:
  enabled: true
tools:
  profile: messaging
sandbox:
  enabled: true
```
#### Channel Whitelists
Restrict who can interact with Flynn:
```yaml
channels:
  telegram:
    allowedChatIds: ['123456789'] # Your Telegram chat ID
  discord:
    allowedGuildIds: ['987654321098765432']
    allowedChannelIds: ['123456789012345678']
  slack:
    allowedChannelIds: ['C12345678']
    signingSecret: '${SLACK_SIGNING_SECRET}'
```
### Network Security
#### Firewall
```bash
# Ubuntu/Debian (ufw)
sudo ufw allow 22/tcp # SSH
sudo ufw allow 18800/tcp # Flynn gateway
sudo ufw enable
# CentOS/RHEL (firewalld)
sudo firewall-cmd --permanent --add-port=18800/tcp
sudo firewall-cmd --reload
```
#### Reverse Proxy (Nginx)
Place Flynn behind Nginx for TLS:
```nginx
server {
    listen 443 ssl http2;
    server_name flynn.example.com;
    ssl_certificate /etc/letsencrypt/live/flynn.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/flynn.example.com/privkey.pem;
    # WebSocket upgrade
    location / {
        proxy_pass http://localhost:18800;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }
    # Health check endpoint (no auth required)
    location /health {
        proxy_pass http://localhost:18800/health;
        access_log off;
    }
}
```
Obtain TLS certificate with Let's Encrypt:
```bash
sudo certbot --nginx -d flynn.example.com
```
### File Permissions
```bash
# Data directory
sudo chmod 750 /var/lib/flynn
sudo chown flynn:flynn /var/lib/flynn
# Config file
sudo chmod 640 /etc/flynn/config.yaml
sudo chown root:flynn /etc/flynn/config.yaml
# Logs
sudo chmod 750 /var/log/flynn
sudo chown flynn:flynn /var/log/flynn
```
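The permissions above can be audited after deployment. This is a minimal sketch under the assumption that GNU `stat` is available (the `-c` flag; macOS uses `stat -f` instead); `check_perms` is a hypothetical helper name.

```bash
# Compares a path's octal mode (and optionally owner:group) against
# expected values. Relies on GNU stat's -c format option.
check_perms() {
  local path="$1" want_mode="$2" want_owner="${3:-}"
  local mode owner
  mode="$(stat -c '%a' "$path")" || return 2
  owner="$(stat -c '%U:%G' "$path")" || return 2
  if [[ "$mode" != "$want_mode" ]]; then
    echo "$path: mode $mode, expected $want_mode" >&2
    return 1
  fi
  if [[ -n "$want_owner" && "$owner" != "$want_owner" ]]; then
    echo "$path: owner $owner, expected $want_owner" >&2
    return 1
  fi
}

# Usage:
#   check_perms /etc/flynn/config.yaml 640 root:flynn
#   check_perms /var/lib/flynn 750 flynn:flynn
```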
### Sandbox Security
Docker sandbox adds isolation but requires careful configuration:
```yaml
# config/production.yaml
sandbox:
  enabled: true
  image: 'node:22-alpine'
  dockerSocket: '/var/run/docker.sock'
  resourceLimits:
    memory: '512m'
    cpus: '0.5'
  timeoutSec: 60
  networkMode: 'none' # No network access
```
Ensure Docker is secured:
```bash
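# Allow the flynn user to use the Docker socket.
# Note: membership in the docker group is effectively root-equivalent on the host.
sudo usermod -aG docker flynn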
# Configure Docker daemon security
sudo vim /etc/docker/daemon.json
```
```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "live-restore": true,
  "userland-proxy": false
}
```
## Configuration
### Production Config Template
```yaml
# config/production.yaml
# Base config for production deployment
# ── Gateway ───────────────────────────────────────────────────────────────
gateway:
  enabled: true
  port: 18800
  auth:
    token: '${GATEWAY_TOKEN}'
    trustTailscaleIdentity: true
    applyToHttp: true
  lock:
    enabled: true
  tailscaleServe:
    enabled: false # Set to true to expose via Tailscale
    hostname: 'flynn'
    port: 443
# ── Models ─────────────────────────────────────────────────────────────────
models:
  default:
    anthropic:
      apiKey: '${ANTHROPIC_API_KEY}'
      model: 'claude-sonnet-4-20250514'
      maxTokens: 4096
  router:
    tiers:
      default: 'anthropic:claude-sonnet-4-20250514'
      fast: 'anthropic:claude-haiku-4-20250514'
      complex: 'anthropic:claude-opus-4-20250514'
      local: 'ollama:llama3'
    fallbackChain:
      - 'github:claude-sonnet-4-5'
      - 'local:ollama:llama3'
    retry:
      maxAttempts: 3
      initialDelayMs: 1000
      multiplier: 2
      maxDelayMs: 30000
# ── Channels ───────────────────────────────────────────────────────────────
channels:
  telegram:
    enabled: true
    token: '${TELEGRAM_BOT_TOKEN}'
    allowedChatIds: ['123456789']
  discord:
    enabled: false
  slack:
    enabled: false
  whatsapp:
    enabled: false
# ── Sessions ───────────────────────────────────────────────────────────────
sessions:
  ttl: '7d'
  maxSessions: 100
# ── Memory ────────────────────────────────────────────────────────────────
memory:
  enabled: true
  embeddings:
    provider: 'openai'
    openai:
      apiKey: '${OPENAI_API_KEY}'
      model: 'text-embedding-3-small'
# ── Tools ─────────────────────────────────────────────────────────────────
tools:
  policy: 'coding' # Restrict tool access
  executor:
    defaultTimeoutMs: 30000
    maxOutputBytes: 51200
sandbox:
  enabled: false # Enable if using Docker
# ── Agents ────────────────────────────────────────────────────────────────
agents:
  default:
    modelTier: 'default'
    toolPolicy: 'coding'
    compaction:
      thresholdPct: 80
      keepTurns: 4
      summaryMaxTokens: 1024
# ── Automation ────────────────────────────────────────────────────────────
automation:
  cron:
    enabled: false
  webhooks:
    enabled: false
  heartbeat:
    enabled: true
    interval: '5m'
    checks:
      - 'gateway'
      - 'model'
      - 'channels'
      - 'memory'
      - 'disk'
    notifications:
      - type: 'telegram'
        chatId: '123456789'
# ── Logging ───────────────────────────────────────────────────────────────
logging:
  level: 'info' # debug, info, warn, error
```
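The `retry` values in the template (`initialDelayMs`, `multiplier`, `maxDelayMs`) read like an exponential backoff schedule. The sketch below shows what such a schedule looks like in general; how Flynn actually applies these settings (jitter, whether `maxAttempts` counts the first attempt) is not stated here and is an assumption. `backoff_delays` is a hypothetical helper.

```bash
# Prints one delay (in ms) per line: initial * multiplier^n, capped at max.
# Arguments: initial_ms multiplier max_ms count (integers only)
backoff_delays() {
  local delay="$1" multiplier="$2" max="$3" count="$4" i
  for (( i = 0; i < count; i++ )); do
    if (( delay > max )); then
      delay="$max"
    fi
    echo "$delay"
    delay=$(( delay * multiplier ))
  done
}

# Usage, with the template's values and six retries:
#   backoff_delays 1000 2 30000 6
# prints 1000, 2000, 4000, 8000, 16000, 30000 (one per line)
```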
### Config Validation
Validate config before starting:
```bash
flynn doctor --config /etc/flynn/config.yaml
```
## Monitoring
### Health Checks
Flynn provides a health check endpoint:
```bash
# HTTP health check
curl http://localhost:18800/health
# Response
{
"status": "ok",
"version": "0.1.0",
"uptime": 12345
}
```
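For deploy scripts, the health endpoint is often polled until the service comes up. The following is a sketch of that pattern, not a Flynn feature; `retry_until` and `flynn_healthy` are hypothetical names, and the attempt count and delay are arbitrary.

```bash
# Runs a probe command up to ATTEMPTS times, sleeping DELAY seconds
# between failures. Returns 0 as soon as the probe succeeds.
retry_until() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Probe for the health endpoint; curl's --fail makes HTTP error
# responses (4xx/5xx) count as failures.
flynn_healthy() {
  curl --fail --silent --max-time 5 "http://localhost:18800/health" >/dev/null
}

# Usage:
#   retry_until 10 3 flynn_healthy || echo "Flynn is not healthy" >&2
```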
### Logs
#### Journalctl (systemd)
```bash
# Follow logs
sudo journalctl -u flynn -f
# View last 100 lines
sudo journalctl -u flynn -n 100 --no-pager
# View logs since yesterday
sudo journalctl -u flynn --since yesterday
# Search for errors
sudo journalctl -u flynn | grep -i error
```
#### Log Rotation
Retention for the systemd journal is configured in journald, not logrotate:
```bash
sudo vim /etc/systemd/journald.conf
```
```ini
[Journal]
SystemMaxUse=100M
MaxRetentionSec=7day
```
Restart journald:
```bash
sudo systemctl restart systemd-journald
```
### Heartbeat Monitor
Enable built-in heartbeat monitoring:
```yaml
automation:
  heartbeat:
    enabled: true
    interval: '5m'
    checks:
      - 'gateway'
      - 'model'
      - 'channels'
      - 'memory'
      - 'disk'
    notifications:
      - type: 'telegram'
        chatId: '123456789'
      - type: 'webhook'
        url: 'https://hooks.slack.com/services/...'
```
### External Monitoring
#### Prometheus (Optional)
Use Node.js prom-client for metrics (not currently implemented):
```yaml
# Future feature
monitoring:
  prometheus:
    enabled: true
    port: 9090
```
#### Uptime Monitoring
Use external services:
- UptimeRobot
- Pingdom
- Better Uptime
Monitor:
- Gateway HTTP health endpoint
- WebSocket connection
- Response time
## Backup & Recovery
### What to Backup
Data paths depend on `FLYNN_DATA_DIR` (default `~/.local/share/flynn`; `/var/lib/flynn` in the systemd setup in this guide).
1. **Configuration**: `/etc/flynn/config.yaml`
2. **Sessions**: SQLite database at `$FLYNN_DATA_DIR/sessions.db`
3. **Memory Files**: `$FLYNN_DATA_DIR/memory/`
4. **Vectors**: SQLite database at `$FLYNN_DATA_DIR/vectors.db`
5. **Pairing Codes**: SQLite table within `sessions.db` (covered by the sessions backup)
### Backup Script
Create `/usr/local/bin/flynn-backup.sh`:
```bash
#!/bin/bash
# Run as root (for example from root's crontab).
set -euo pipefail
BACKUP_DIR="/var/backups/flynn"
DATA_DIR="/var/lib/flynn"
CONFIG_DIR="/etc/flynn"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/flynn_$DATE.tar.gz"
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Stop Flynn so the SQLite files are not modified mid-backup,
# and restart it on exit even if a later step fails
systemctl stop flynn
trap 'systemctl start flynn' EXIT
# Create backup
tar -czf "$BACKUP_FILE" \
  "$CONFIG_DIR/config.yaml" \
  "$DATA_DIR/sessions.db" \
  "$DATA_DIR/vectors.db" \
  "$DATA_DIR/memory/"
# Delete backups older than 90 days
find "$BACKUP_DIR" -name "flynn_*.tar.gz" -mtime +90 -delete
echo "Backup created: $BACKUP_FILE"
Make executable:
```bash
sudo chmod +x /usr/local/bin/flynn-backup.sh
```
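A backup is only useful if it contains what a restore needs, so it can be worth listing the archive after it is written. This is a sketch of one way to do that; `verify_backup` is a hypothetical helper and the member list is whatever the deployment considers essential.

```bash
# Returns 0 when the archive lists every expected member.
# Names are compared without a leading "/", because tar strips it
# when archiving absolute paths.
verify_backup() {
  local archive="$1" listing member
  shift
  listing="$(tar -tzf "$archive")" || return 2
  for member in "$@"; do
    if ! grep -qxF -- "${member#/}" <<<"$listing"; then
      echo "missing from archive: $member" >&2
      return 1
    fi
  done
}

# Usage:
#   verify_backup "$BACKUP_FILE" \
#     /etc/flynn/config.yaml /var/lib/flynn/sessions.db /var/lib/flynn/vectors.db
```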
### Cron Job
Add to root crontab:
```bash
sudo crontab -e
```
```
# Daily backup at 2 AM
0 2 * * * /usr/local/bin/flynn-backup.sh >> /var/log/flynn-backup.log 2>&1
```
### Restore
```bash
# Stop Flynn
sudo systemctl stop flynn
# Extract backup
sudo tar -xzf /var/backups/flynn/flynn_20250213_020000.tar.gz -C /
# Start Flynn
sudo systemctl start flynn
```
### Database Maintenance
Run SQLite vacuum periodically:
```bash
sqlite3 /var/lib/flynn/sessions.db "VACUUM;"
sqlite3 /var/lib/flynn/vectors.db "VACUUM;"
```
Add to crontab (monthly):
```
0 0 1 * * sqlite3 /var/lib/flynn/sessions.db "VACUUM;" >> /var/log/flynn-maintenance.log 2>&1
```
## Performance Tuning
### Node.js Tuning
Set Node.js options for production:
```bash
# In systemd service
Environment="NODE_OPTIONS=--max-old-space-size=2048"
# Or via environment variable
export NODE_OPTIONS="--max-old-space-size=2048"
```
### Context Management
Optimize compaction settings:
```yaml
agents:
  default:
    compaction:
      thresholdPct: 75 # Trigger earlier
      keepTurns: 6 # Keep more context
      summaryMaxTokens: 2048 # Better summaries
```
### SQLite Performance
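Enable WAL mode. `journal_mode=WAL` is stored in the database file and persists across connections. `synchronous` and `cache_size` are per-connection settings, so they only take effect when set by the process that opens the database, not from a one-off `sqlite3` shell:
```bash
# Persistent: switches the database file to write-ahead logging
sqlite3 /var/lib/flynn/sessions.db "PRAGMA journal_mode=WAL;"
# Per-connection only: these affect just this sqlite3 invocation
sqlite3 /var/lib/flynn/sessions.db "PRAGMA synchronous=NORMAL;"
sqlite3 /var/lib/flynn/sessions.db "PRAGMA cache_size=-64000;" # ~64 MB (negative values are in KiB)
```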
### Model Routing
Configure tiers for optimal cost/latency:
```yaml
models:
  router:
    tiers:
      fast: 'anthropic:claude-haiku-4-20250514' # Quick tasks
      default: 'anthropic:claude-sonnet-4-20250514' # General use
      complex: 'anthropic:claude-opus-4-20250514' # Complex reasoning
      local: 'ollama:llama3' # Fallback
```
### Caching (Future)
Consider adding caching for:
- Repeated tool calls
- Memory search results
- Model responses for common queries
## Scaling Considerations
### Single-Operator Scope
Flynn is designed for a single operator with multiple concurrent users. Limitations:
- **Max Concurrent Sessions**: ~100 (depends on model rate limits)
- **Throughput**: ~10-20 messages/second (varies by model)
- **Memory Usage**: 2-4GB for moderate usage
### When to Scale Up
Consider scaling if:
- Consistent CPU usage > 80%
- Memory usage > 4GB
- Frequent rate limiting from model providers
- Slow response times > 30 seconds
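These thresholds can be turned into a simple check for a monitoring script. The sketch below only compares single integer samples against the limits listed above; collecting the samples, and deciding what counts as "consistent" or "frequent", is left out and would need to be defined per deployment. `scale_signals` is a hypothetical helper.

```bash
# Prints one line per exceeded threshold; returns 0 if any was exceeded.
# Arguments: cpu_percent memory_mib response_seconds (integers only)
scale_signals() {
  local cpu_pct="$1" mem_mib="$2" response_sec="$3" hit=1
  if (( cpu_pct > 80 )); then echo "cpu"; hit=0; fi
  if (( mem_mib > 4096 )); then echo "memory"; hit=0; fi
  if (( response_sec > 30 )); then echo "latency"; hit=0; fi
  return "$hit"
}

# Usage:
#   scale_signals 85 2048 5 && echo "consider scaling up"
```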
### Scaling Strategies
1. **Horizontal Scaling**: Deploy multiple Flynn instances behind a load balancer (not currently supported - sessions are stateful)
2. **Vertical Scaling**: Increase server resources (CPU, memory)
3. **Multi-Instance Architecture** (future):
- Shared session storage (PostgreSQL/Redis)
- Message queue for request distribution
- Session affinity for stateful connections
### Cost Optimization
- Use local models for non-critical tasks
- Cache embeddings
- Optimize compaction to reduce token usage
- Use efficient models for delegated tasks
## Troubleshooting Production Issues
### Service Won't Start
```bash
# Check status
sudo systemctl status flynn
# View logs
sudo journalctl -u flynn -n 50 --no-pager
# Validate config
flynn doctor --config /etc/flynn/config.yaml
```
### High Memory Usage
```bash
# Check memory
free -h
# Check process memory
ps aux | grep flynn
# Restart service
sudo systemctl restart flynn
```
### Gateway Connection Issues
```bash
# Check if port is listening
sudo ss -tlnp | grep 18800
# Check firewall
sudo ufw status
# Test connectivity
curl http://localhost:18800/health
```
### Slow Response Times
```bash
# Check CPU usage
top
# Check model provider status
# Verify API keys are valid
# Check network latency
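# Enable debug logging: set logging.level to 'debug' in /etc/flynn/config.yaml, then restart
# (a variable set on the systemctl command line is not passed to the service)
sudo systemctl restart flynn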
```
---
For additional help, see:
- [TROUBLESHOOTING.md](../../TROUBLESHOOTING.md)
- [README.md](../../README.md)
- GitHub Issues