# Production Deployment Guide

This guide covers deploying Flynn in a production environment.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Docker Deployment](#docker-deployment)
- [Nix Deployment](#nix-deployment)
- [PaaS Deployment](#paas-deployment)
- [Systemd Service](#systemd-service)
- [Security](#security)
- [Configuration](#configuration)
- [Monitoring](#monitoring)
- [Backup & Recovery](#backup--recovery)
- [Performance Tuning](#performance-tuning)
- [Scaling Considerations](#scaling-considerations)
- [Troubleshooting Production Issues](#troubleshooting-production-issues)

## Prerequisites

### System Requirements

- **OS**: Linux (Ubuntu 22.04+ recommended) or macOS
- **Node.js**: >= 22.0.0
- **Memory**: Minimum 2GB, 4GB+ recommended
- **Disk**: 10GB+ for sessions, memory, and vectors
- **Docker**: Optional; required only for sandbox features

### Network Requirements

- Public IP or VPN (Tailscale recommended) for remote access
- Open ports: 18800 (gateway), optional 443 (Tailscale Serve)
- Outbound HTTPS access for model providers and web tools

### External Services (Optional)

- **Model Providers**: Anthropic, OpenAI, GitHub Models, etc. (API keys required)
- **Email**: SMTP server for email notifications
- **Object Storage**: MinIO or S3 for backups

## Docker Deployment

### Quick Start

Using the provided `docker-compose.yml`:

```bash
# Clone repository (substitute the repository URL)
git clone REPOSITORY_URL
cd flynn

# Create config
cp config/default.yaml config/production.yaml
# Edit config/production.yaml with your settings

# Start services
docker compose up -d

# View logs
docker compose logs -f
```

### Dockerfile

Use the repo Dockerfile: `Dockerfile`.

Notes:

- Multi-stage build (builder + runtime).
- Uses `corepack` + `pnpm` with `pnpm-lock.yaml` for reproducible installs.
- Exposes port `18800` and runs `dist/cli/index.js start`.

### Docker Compose Configuration

Use the repo compose file: `docker-compose.yml`.
The important parts to customize:

- Mount your config: `./config/production.yaml:/config/config.yaml:ro`
- Set provider keys (`ANTHROPIC_API_KEY`, etc.)
- Optionally set gateway token auth (`FLYNN_SERVER_TOKEN`)

### Environment Variables

```bash
# Node environment
export NODE_ENV=production

# Config path
export FLYNN_CONFIG=/path/to/config.yaml

# Data directory (default: ~/.local/share/flynn)
export FLYNN_DATA_DIR=/var/lib/flynn

# Optional: Override model provider credentials
export ANTHROPIC_API_KEY=sk-...
export OPENAI_API_KEY=sk-...
```

## Nix Deployment

If you use Nix, this repo ships a flake (package + dev shell + optional NixOS module).
See `docs/deployment/NIX.md`.

## PaaS Deployment

Templates and notes for Fly.io / Railway / Render are in `docs/deployment/PAAS.md`.

## Systemd Service

### Service File

Create `/etc/systemd/system/flynn.service`:

```ini
[Unit]
Description=Flynn AI Assistant Daemon
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=flynn
Group=flynn
WorkingDirectory=/opt/flynn
Environment="NODE_ENV=production"
Environment="FLYNN_CONFIG=/etc/flynn/config.yaml"
Environment="FLYNN_DATA_DIR=/var/lib/flynn"
ExecStart=/usr/local/bin/node /opt/flynn/dist/cli/index.js start
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=flynn

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/flynn /var/log/flynn /var/run

# Resource limits (cgroup v2 directives)
MemoryMax=2G
MemorySwapMax=0
CPUQuota=200%

[Install]
WantedBy=multi-user.target
```

### Create Flynn User

```bash
# Create group and system user
sudo groupadd --system flynn
sudo useradd --system --gid flynn --home /var/lib/flynn --shell /usr/sbin/nologin flynn

# Create directories
sudo mkdir -p /opt/flynn /etc/flynn /var/lib/flynn /var/log/flynn

# Copy build output and config (keeps the dist/ directory that ExecStart expects)
sudo cp -r dist /opt/flynn/
sudo cp config/production.yaml /etc/flynn/config.yaml

# Code and config are owned by root; data and logs by the flynn user
sudo chown -R root:root /opt/flynn /etc/flynn
sudo chown -R flynn:flynn /var/lib/flynn /var/log/flynn

# The config may hold secrets: readable by root and the flynn group only
sudo chown root:flynn /etc/flynn/config.yaml
sudo chmod 640 /etc/flynn/config.yaml
```

### Enable and Start Service

```bash
# Reload systemd
sudo systemctl daemon-reload

# Enable service (start on boot)
sudo systemctl enable flynn

# Start service
sudo systemctl start flynn

# Check status
sudo systemctl status flynn

# View logs
sudo journalctl -u flynn -f

# Restart service
sudo systemctl restart flynn
```

### Service Management

```bash
# Stop service
sudo systemctl stop flynn

# Reload config (requires restart)
sudo systemctl restart flynn

# Check if running
sudo systemctl is-active flynn

# View recent logs
sudo journalctl -u flynn -n 100 --no-pager
```

## Security

### Secrets Management

Never commit secrets to version control. Use one of these approaches:

#### Environment Variables

```yaml
# config/production.yaml
models:
  default:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: '${ANTHROPIC_API_KEY}'
```

Set the variable in `/etc/flynn/.env` or in the systemd service file:

```ini
Environment="ANTHROPIC_API_KEY=sk-..."
```

#### HashiCorp Vault (Advanced)

Use a secrets manager and inject the secret at runtime:

```bash
# Read the key straight into the environment; nothing is written to disk
export ANTHROPIC_API_KEY="$(vault kv get -field=api_key secret/anthropic)"
```

### Authentication

#### Gateway Auth

```yaml
# config/production.yaml
server:
  token: 'your-random-token-here' # Generate with: openssl rand -hex 32
  tailscale_identity: true
  auth_http: true
  lock: false
  max_request_body_bytes: 1048576
  ws_rate_limit:
    enabled: true
    capacity: 30
    refill_per_sec: 15
    max_violations: 8
    violation_window_ms: 10000
```

Generate a secure token:

```bash
openssl rand -hex 32
```

#### Safe Defaults (Recommended)

These defaults align with `docs/security/SAFE_PERSONAL_AGENT.md`:

```yaml
pairing:
  enabled: true
tools:
  profile: messaging
sandbox:
  enabled: true
```

#### Channel Whitelists

Restrict who can interact with Flynn:

```yaml
channels:
  telegram:
    allowedChatIds: ['123456789'] # Your Telegram chat ID
  discord:
    allowedGuildIds: ['987654321098765432']
    allowedChannelIds: ['123456789012345678']
  slack:
    allowedChannelIds: ['C12345678']
    signingSecret: '${SLACK_SIGNING_SECRET}'
```

### Network Security

#### Firewall

```bash
# Ubuntu/Debian (ufw)
sudo ufw allow 22/tcp # SSH
sudo ufw allow 18800/tcp # Flynn gateway
sudo ufw enable

# CentOS/RHEL (firewalld)
sudo firewall-cmd --permanent --add-port=18800/tcp
sudo firewall-cmd --reload
```

#### Reverse Proxy (Nginx)

Place Flynn behind Nginx for TLS:

```nginx
server {
    listen 443 ssl http2;
    server_name flynn.example.com;

    ssl_certificate /etc/letsencrypt/live/flynn.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/flynn.example.com/privkey.pem;

    # WebSocket upgrade
    location / {
        proxy_pass http://localhost:18800;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    # Health check endpoint (no auth required)
    location /health {
        proxy_pass http://localhost:18800/health;
        access_log off;
    }
}
```

Obtain a TLS certificate with Let's Encrypt:

```bash
sudo certbot --nginx -d flynn.example.com
```

### File Permissions

```bash
# Data directory
sudo chmod 750 /var/lib/flynn
sudo chown flynn:flynn /var/lib/flynn

# Config file
sudo chmod 640 /etc/flynn/config.yaml
sudo chown root:flynn /etc/flynn/config.yaml

# Logs
sudo chmod 750 /var/log/flynn
sudo chown flynn:flynn /var/log/flynn
```

### Sandbox Security

The Docker sandbox adds isolation but requires careful configuration:

```yaml
# config/production.yaml
sandbox:
  enabled: true
  image: 'node:22-alpine'
  dockerSocket: '/var/run/docker.sock'
  resourceLimits:
    memory: '512m'
    cpus: '0.5'
  timeoutSec: 60
  networkMode: 'none' # No network access
```

Ensure Docker is secured:

```bash
# Give the flynn user access to the Docker socket
# (membership in the docker group is effectively root access on the host)
sudo usermod -aG docker flynn

# Configure Docker daemon security
sudo vim /etc/docker/daemon.json
```

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "live-restore": true,
  "userland-proxy": false
}
```

## Configuration

### Production Config Template

```yaml
# config/production.yaml
# Base config for production deployment

# ── Gateway ───────────────────────────────────────────────────────────────
gateway:
  enabled: true
  port: 18800
  auth:
    token: '${GATEWAY_TOKEN}'
    trustTailscaleIdentity: true
    applyToHttp: true
  lock:
    enabled: true
  tailscaleServe:
    enabled: false # Set to true to expose via Tailscale
    hostname: 'flynn'
    port: 443

# ── Models ─────────────────────────────────────────────────────────────────
models:
  default:
    anthropic:
      apiKey: '${ANTHROPIC_API_KEY}'
      model: 'claude-sonnet-4-20250514'
      maxTokens: 4096
  router:
    tiers:
      default: 'anthropic:claude-sonnet-4-20250514'
      fast: 'anthropic:claude-haiku-4-20250514'
      complex: 'anthropic:claude-opus-4-20250514'
      local: 'ollama:llama3'
    fallbackChain:
      - 'github:claude-sonnet-4-5'
      - 'local:ollama:llama3'
  retry:
    maxAttempts: 3
    initialDelayMs: 1000
    multiplier: 2
    maxDelayMs: 30000

# ── Channels ───────────────────────────────────────────────────────────────
channels:
  telegram:
    enabled: true
    token: '${TELEGRAM_BOT_TOKEN}'
    allowedChatIds: ['123456789']
  discord:
    enabled: false
  slack:
    enabled: false
  whatsapp:
    enabled: false

# ── Sessions ───────────────────────────────────────────────────────────────
sessions:
  ttl: '7d'
  maxSessions: 100

# ── Memory ────────────────────────────────────────────────────────────────
memory:
  enabled: true
  embeddings:
    provider: 'openai'
    openai:
      apiKey: '${OPENAI_API_KEY}'
      model: 'text-embedding-3-small'

# ── Tools ─────────────────────────────────────────────────────────────────
tools:
  policy: 'coding' # Restrict tool access
  executor:
    defaultTimeoutMs: 30000
    maxOutputBytes: 51200

sandbox:
  enabled: false # Enable if using Docker

# ── Agents ────────────────────────────────────────────────────────────────
agents:
  default:
    modelTier: 'default'
    toolPolicy: 'coding'
    compaction:
      thresholdPct: 80
      keepTurns: 4
      summaryMaxTokens: 1024

# ── Automation ────────────────────────────────────────────────────────────
automation:
  cron:
    enabled: false
  webhooks:
    enabled: false
  heartbeat:
    enabled: true
    interval: '5m'
    checks:
      - 'gateway'
      - 'model'
      - 'channels'
      - 'memory'
      - 'disk'
    notifications:
      - type: 'telegram'
        chatId: '123456789'

# ── Logging ───────────────────────────────────────────────────────────────
logging:
  level: 'info' # debug, info, warn, error
```

### Config Validation

Validate the config before starting:

```bash
flynn doctor --config /etc/flynn/config.yaml
```

## Monitoring

### Health Checks

Flynn provides a health check endpoint:

```bash
# HTTP health check
curl http://localhost:18800/health

# Response
# { "status": "ok", "version": "0.1.0", "uptime": 12345 }
```

### Logs

#### Journalctl (systemd)

```bash
# Follow logs
sudo journalctl -u flynn -f

# View last 100 lines
sudo journalctl -u flynn -n 100 --no-pager

# View logs since yesterday
sudo journalctl -u flynn --since yesterday

# Search for errors
sudo journalctl -u flynn | grep -i error
```

#### Log Rotation

Flynn logs to the systemd journal, so retention is configured in journald rather than logrotate:

```bash
sudo vim /etc/systemd/journald.conf
```

```
[Journal]
SystemMaxUse=100M
MaxRetentionSec=7day
```

Restart journald:

```bash
sudo systemctl restart systemd-journald
```

### Heartbeat Monitor

Enable built-in heartbeat monitoring:

```yaml
automation:
  heartbeat:
    enabled: true
    interval: '5m'
    checks:
      - 'gateway'
      - 'model'
      - 'channels'
      - 'memory'
      - 'disk'
    notifications:
      - type: 'telegram'
        chatId: '123456789'
      - type: 'webhook'
        url: 'https://hooks.slack.com/services/...'
```

### External Monitoring

#### Prometheus (Optional)

Prometheus metrics are not currently implemented. A future release may expose them using the Node.js `prom-client` library:

```yaml
# Future feature
monitoring:
  prometheus:
    enabled: true
    port: 9090
```

#### Uptime Monitoring

Use external services:

- UptimeRobot
- Pingdom
- Better Uptime

Monitor:

- Gateway HTTP health endpoint
- WebSocket connection
- Response time

## Backup & Recovery

### What to Backup

Paths below use the default data directory (`~/.local/share/flynn`). With `FLYNN_DATA_DIR=/var/lib/flynn`, as in the systemd setup, the same files live under `/var/lib/flynn`.

1. **Configuration**: `/etc/flynn/config.yaml`
2. **Sessions**: SQLite database at `~/.local/share/flynn/sessions.db`
3. **Memory Files**: `~/.local/share/flynn/memory/`
4. **Vectors**: SQLite database at `~/.local/share/flynn/vectors.db`
5.
   **Pairing Codes**: SQLite table within `sessions.db`

### Backup Script

Create `/usr/local/bin/flynn-backup.sh` (run it as root):

```bash
#!/bin/bash
set -e

BACKUP_DIR="/var/backups/flynn"
DATA_DIR="/var/lib/flynn"
CONFIG_DIR="/etc/flynn"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/flynn_$DATE.tar.gz"

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Stop Flynn so the SQLite files are not written during the backup,
# and restart it on exit even if a later step fails
systemctl stop flynn
trap 'systemctl start flynn' EXIT

# Create backup
tar -czf "$BACKUP_FILE" \
  "$CONFIG_DIR/config.yaml" \
  "$DATA_DIR/sessions.db" \
  "$DATA_DIR/vectors.db" \
  "$DATA_DIR/memory/"

# Delete backups older than 90 days
find "$BACKUP_DIR" -name "flynn_*.tar.gz" -mtime +90 -delete

echo "Backup created: $BACKUP_FILE"
```

Make it executable:

```bash
sudo chmod +x /usr/local/bin/flynn-backup.sh
```

### Cron Job

Add to root crontab:

```bash
sudo crontab -e
```

```
# Daily backup at 2 AM
0 2 * * * /usr/local/bin/flynn-backup.sh >> /var/log/flynn-backup.log 2>&1
```

### Restore

```bash
# Stop Flynn
sudo systemctl stop flynn

# Extract backup (archive paths are relative to /)
sudo tar -xzf /var/backups/flynn/flynn_20250213_020000.tar.gz -C /

# Start Flynn
sudo systemctl start flynn
```

### Database Maintenance

Run SQLite vacuum periodically:

```bash
sqlite3 /var/lib/flynn/sessions.db "VACUUM;"
sqlite3 /var/lib/flynn/vectors.db "VACUUM;"
```

Add to crontab (monthly):

```
0 0 1 * * sqlite3 /var/lib/flynn/sessions.db "VACUUM;" >> /var/log/flynn-maintenance.log 2>&1
```

## Performance Tuning

### Node.js Tuning

Set Node.js options for production:

```bash
# In systemd service
Environment="NODE_OPTIONS=--max-old-space-size=2048"

# Or via environment variable
export NODE_OPTIONS="--max-old-space-size=2048"
```

`--max-old-space-size` is given in MiB and covers only the V8 heap. Keep it below the service's memory limit (2G in the unit file above), either by lowering this value or by raising the limit, so that non-heap memory still fits.

### Context Management

Optimize compaction settings:

```yaml
agents:
  default:
    compaction:
      thresholdPct: 75 # Trigger earlier
      keepTurns: 6 # Keep more context
      summaryMaxTokens: 2048 # Better summaries
```

### SQLite Performance

Enable WAL mode and tune the connection settings:

```bash
# journal_mode=WAL is stored in the database file and persists
sqlite3 /var/lib/flynn/sessions.db "PRAGMA journal_mode=WAL;"

# synchronous and cache_size apply only to the connection that sets them,
# so running them here does not change Flynn's own connections
sqlite3 /var/lib/flynn/sessions.db "PRAGMA synchronous=NORMAL;"
sqlite3 /var/lib/flynn/sessions.db "PRAGMA cache_size=-64000;" # 64MB
```

### Model Routing

Configure tiers for optimal cost/latency:

```yaml
models:
  router:
    tiers:
      fast: 'anthropic:claude-haiku-4-20250514' # Quick tasks
      default: 'anthropic:claude-sonnet-4-20250514' # General use
      complex: 'anthropic:claude-opus-4-20250514' # Complex reasoning
      local: 'ollama:llama3' # Fallback
```

### Caching (Future)

Consider adding caching for:

- Repeated tool calls
- Memory search results
- Model responses for common queries

## Scaling Considerations

### Single-Operator Scope

Flynn is designed for a single operator with multiple concurrent users.

Limitations:

- **Max Concurrent Sessions**: ~100 (depends on model rate limits)
- **Throughput**: ~10-20 messages/second (varies by model)
- **Memory Usage**: 2-4GB for moderate usage

### When to Scale Up

Consider scaling if you see:

- CPU usage consistently above 80%
- Memory usage above 4GB
- Frequent rate limiting from model providers
- Response times above 30 seconds

### Scaling Strategies

1. **Horizontal Scaling**: Deploy multiple Flynn instances behind a load balancer (not currently supported, because sessions are stateful)
2. **Vertical Scaling**: Increase server resources (CPU, memory)
3.
   **Multi-Instance Architecture** (future):
   - Shared session storage (PostgreSQL/Redis)
   - Message queue for request distribution
   - Session affinity for stateful connections

### Cost Optimization

- Use local models for non-critical tasks
- Cache embeddings
- Optimize compaction to reduce token usage
- Use efficient models for delegated tasks

## Troubleshooting Production Issues

### Service Won't Start

```bash
# Check status
sudo systemctl status flynn

# View logs
sudo journalctl -u flynn -n 50 --no-pager

# Validate config
flynn doctor --config /etc/flynn/config.yaml
```

### High Memory Usage

```bash
# Check memory
free -h

# Check process memory
ps aux | grep flynn

# Restart service
sudo systemctl restart flynn
```

### Gateway Connection Issues

```bash
# Check if port is listening
sudo ss -tlnp | grep 18800

# Check firewall
sudo ufw status

# Test connectivity
curl http://localhost:18800/health
```

### Slow Response Times

```bash
# Check CPU usage
top

# Check model provider status
# Verify API keys are valid
# Check network latency

# Enable debug logging: set logging.level to 'debug' in /etc/flynn/config.yaml,
# then restart (an environment variable set on the systemctl command line
# is not passed to the service)
sudo systemctl restart flynn
```

---

For additional help, see:

- [TROUBLESHOOTING.md](../../TROUBLESHOOTING.md)
- [README.md](../../README.md)
- GitHub Issues
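
The health check used in the troubleshooting steps above can also be scripted, for example from cron or a deploy pipeline. The following is a minimal sketch and not part of Flynn: the function names `health_status` and `check_flynn` and the `FLYNN_HEALTH_URL` override are hypothetical, and it assumes the `/health` response contains a `"status"` field as in the example response shown in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Flynn.

# Print the value of the "status" field from a health JSON document on stdin.
# Uses tr and sed instead of jq so it works on a minimal host.
health_status() {
  tr -d '\n' | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Query the gateway and return 0 only when it reports "ok".
# FLYNN_HEALTH_URL is an assumed override; the default matches this guide.
check_flynn() {
  local url="${FLYNN_HEALTH_URL:-http://localhost:18800/health}"
  local body status
  body="$(curl --silent --fail --max-time 5 "$url")" || {
    echo "gateway unreachable at $url" >&2
    return 1
  }
  status="$(printf '%s' "$body" | health_status)"
  echo "gateway status: ${status:-unknown}"
  [ "$status" = "ok" ]
}
```

After sourcing the file, `check_flynn || sudo journalctl -u flynn -n 50 --no-pager` would print recent logs only when the check fails. Parsing JSON with `sed` is a deliberate shortcut for a single flat field; a real JSON parser is the safer choice if the response grows more complex.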