Production Deployment Guide

This guide covers deploying Flynn in a production environment.

Prerequisites

System Requirements

  • OS: Linux (Ubuntu 22.04+ recommended) or macOS
  • Node.js: >= 22.0.0
  • Memory: Minimum 2GB, 4GB+ recommended
  • Disk: 10GB+ for sessions, memory, and vectors
  • Docker: Optional; required only for sandbox features
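
The Node.js and disk requirements above can be checked before installing. The following is a minimal sketch; the function names are hypothetical helpers and not part of Flynn, and the disk check only looks at the filesystem that holds the directory you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers (not part of Flynn).

# Succeeds if a Node.js version string such as "v22.3.0" has major version >= 22.
node_version_ok() {
  local version major
  version="${1#v}"          # strip the leading "v"
  major="${version%%.*}"    # keep the text before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;  # empty or non-numeric input
  esac
  [ "$major" -ge 22 ]
}

# Succeeds if the filesystem holding directory $1 has at least $2 GB available.
disk_space_ok() {
  local avail_kb
  avail_kb="$(df -Pk "$1" | awk 'NR==2 {print $4}')"
  [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Example usage:
#   node_version_ok "$(node --version)" || echo "Node.js >= 22 is required" >&2
#   disk_space_ok /var/lib 10 || echo "Less than 10 GB available" >&2
```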

Network Requirements

  • Public IP or VPN (Tailscale recommended) for remote access
  • Open ports: 18800 (gateway), optional 443 (Tailscale Serve)
  • Outbound HTTPS access for model providers and web tools

External Services (Optional)

  • Model Providers: Anthropic, OpenAI, GitHub Models, etc. (API keys required)
  • Email: SMTP server for email notifications
  • Object Storage: MinIO or S3 for backups (optional)

Docker Deployment

Quick Start

Using the provided docker-compose.yml:

# Clone repository
git clone <repo-url>
cd flynn

# Create config
cp config/default.yaml config/production.yaml
# Edit config/production.yaml with your settings

# Start services
docker compose up -d

# View logs
docker compose logs -f
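
`docker compose up -d` returns as soon as the containers are created, which can be before the gateway accepts connections. A minimal sketch of a readiness wait follows, assuming the /health endpoint on port 18800 described under Monitoring; the function name and the retry defaults are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a health URL until it answers or attempts run out.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
  local url attempts delay i
  url="$1"; attempts="${2:-30}"; delay="${3:-2}"; i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP error codes; --max-time bounds each probe.
    if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # Wait between probes, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$(( i + 1 ))
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}

# Example usage:
#   docker compose up -d && wait_for_health http://localhost:18800/health
```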

Dockerfile

Use the Dockerfile provided in the repository.

Notes:

  • Multi-stage build (builder + runtime).
  • Uses corepack + pnpm with pnpm-lock.yaml for reproducible installs.
  • Exposes port 18800 and runs dist/cli/index.js start.

Docker Compose Configuration

Use the docker-compose.yml provided in the repository.

The important parts to customize:

  • Mount your config: ./config/production.yaml:/config/config.yaml:ro
  • Set provider keys (ANTHROPIC_API_KEY, etc.)
  • Optionally set gateway token auth (FLYNN_SERVER_TOKEN)
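
One way to supply the gateway token is an env file with owner-only permissions. The sketch below generates a token with `openssl rand -hex 32` (the command this guide uses elsewhere) and writes it as FLYNN_SERVER_TOKEN. The function name is hypothetical, and whether the repository's docker-compose.yml reads such a file is an assumption; check its `env_file` and `environment` entries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an env file containing a fresh gateway token.
# Usage: write_token_env FILE
write_token_env() {
  local file token
  file="$1"
  if command -v openssl >/dev/null 2>&1; then
    token="$(openssl rand -hex 32)" || return 1
  else
    # Fallback without openssl: 32 random bytes rendered as hex.
    token="$(od -An -tx1 -N32 /dev/urandom | tr -d ' \n')"
  fi
  # Restrictive umask in a subshell so a new file is never group/world readable.
  ( umask 077 && printf 'FLYNN_SERVER_TOKEN=%s\n' "$token" > "$file" ) || return 1
  # Also tighten the mode in case the file already existed.
  chmod 600 "$file"
}

# Example usage:
#   write_token_env .env
```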

Environment Variables

# Node environment
export NODE_ENV=production

# Config path
export FLYNN_CONFIG=/path/to/config.yaml

# Data directory (default: ~/.local/share/flynn)
export FLYNN_DATA_DIR=/var/lib/flynn

# Optional: Override model provider credentials
export ANTHROPIC_API_KEY=sk-...
export OPENAI_API_KEY=sk-...

Systemd Service

Service File

Create /etc/systemd/system/flynn.service:

[Unit]
Description=Flynn AI Assistant Daemon
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=flynn
Group=flynn
WorkingDirectory=/opt/flynn
Environment="NODE_ENV=production"
Environment="FLYNN_CONFIG=/etc/flynn/config.yaml"
Environment="FLYNN_DATA_DIR=/var/lib/flynn"
ExecStart=/usr/local/bin/node /opt/flynn/dist/cli/index.js start
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=flynn

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/flynn /var/log/flynn /var/run

# Resource limits
MemoryMax=2G
MemorySwapMax=0
CPUQuota=200%

[Install]
WantedBy=multi-user.target

Create Flynn User

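# Create group and user (group first, so useradd can reference it)
sudo groupadd --system flynn
sudo useradd --system --gid flynn --home-dir /var/lib/flynn --shell /usr/sbin/nologin flynn

# Create directories
sudo mkdir -p /opt/flynn /etc/flynn /var/lib/flynn /var/log/flynn

# Copy the build output and config
# (dist/ is copied as a directory so that /opt/flynn/dist/cli/index.js matches ExecStart)
sudo cp -r dist /opt/flynn/
sudo cp config/production.yaml /etc/flynn/config.yaml

# Ownership: code and config owned by root, data and logs owned by flynn
sudo chown -R root:root /opt/flynn /etc/flynn
sudo chown -R flynn:flynn /var/lib/flynn /var/log/flynn

# Config may contain secrets: readable by root and the flynn group only
sudo chown root:flynn /etc/flynn/config.yaml
sudo chmod 640 /etc/flynn/config.yaml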

Enable and Start Service

# Reload systemd
sudo systemctl daemon-reload

# Enable service (start on boot)
sudo systemctl enable flynn

# Start service
sudo systemctl start flynn

# Check status
sudo systemctl status flynn

# View logs
sudo journalctl -u flynn -f

# Restart service
sudo systemctl restart flynn

Service Management

# Stop service
sudo systemctl stop flynn

# Reload config (requires restart)
sudo systemctl restart flynn

# Check if running
sudo systemctl is-active flynn

# View recent logs
sudo journalctl -u flynn -n 100 --no-pager

Security

Secrets Management

Never commit secrets to version control. Use one of these approaches:

Environment Variables

# config/production.yaml
models:
  default:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: '${ANTHROPIC_API_KEY}'

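Set the variable in the systemd service file, or in an environment file such as /etc/flynn/.env that the service loads through an EnvironmentFile= directive: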

Environment="ANTHROPIC_API_KEY=sk-..."

HashiCorp Vault (Advanced)

Use a secrets manager and inject at runtime:

# Read the key straight into the environment instead of writing it to a temp file
export ANTHROPIC_API_KEY="$(vault kv get -field=api_key secret/anthropic)"

Authentication

Gateway Auth

# config/production.yaml
server:
  token: 'your-random-token-here'  # Generate with: openssl rand -hex 32
  tailscale_identity: true
  auth_http: true
  lock: false

Generate a secure token:

openssl rand -hex 32

These defaults align with docs/security/SAFE_PERSONAL_AGENT.md:

pairing:
  enabled: true

tools:
  profile: messaging

sandbox:
  enabled: true

Channel Whitelists

Restrict who can interact with Flynn:

channels:
  telegram:
    allowedChatIds: ['123456789']  # Your Telegram chat ID
  discord:
    allowedGuildIds: ['987654321098765432']
    allowedChannelIds: ['123456789012345678']
  slack:
    allowedChannelIds: ['C12345678']
    signingSecret: '${SLACK_SIGNING_SECRET}'

Network Security

Firewall

# Ubuntu/Debian (ufw)
sudo ufw allow 22/tcp    # SSH
sudo ufw allow 18800/tcp  # Flynn gateway (omit if reachable only through Nginx or Tailscale)
sudo ufw enable

# CentOS/RHEL (firewalld)
sudo firewall-cmd --permanent --add-port=18800/tcp
sudo firewall-cmd --reload

Reverse Proxy (Nginx)

Place Flynn behind Nginx for TLS:

server {
    listen 443 ssl http2;
    server_name flynn.example.com;

    ssl_certificate /etc/letsencrypt/live/flynn.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/flynn.example.com/privkey.pem;

    # WebSocket upgrade
    location / {
        proxy_pass http://localhost:18800;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts (send/read are long so idle WebSocket connections are not closed)
        proxy_connect_timeout 60s;
        proxy_send_timeout 3600s;
        proxy_read_timeout 3600s;
    }

    # Health check endpoint (no auth required)
    location /health {
        proxy_pass http://localhost:18800/health;
        access_log off;
    }
}

Obtain TLS certificate with Let's Encrypt:

sudo certbot --nginx -d flynn.example.com

File Permissions

# Data directory
sudo chmod 750 /var/lib/flynn
sudo chown flynn:flynn /var/lib/flynn

# Config file
sudo chmod 640 /etc/flynn/config.yaml
sudo chown root:flynn /etc/flynn/config.yaml

# Logs
sudo chmod 750 /var/log/flynn
sudo chown flynn:flynn /var/log/flynn
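
The modes set above can be verified after each deployment. This is a minimal sketch with a hypothetical helper name; it tries GNU `stat` first and falls back to the BSD/macOS syntax.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a path's octal mode with an expected value.
# Usage: check_mode PATH EXPECTED_MODE
check_mode() {
  local path expected actual
  path="$1"; expected="$2"
  # GNU stat uses -c, BSD/macOS stat uses -f; try both.
  actual="$(stat -c '%a' "$path" 2>/dev/null || stat -f '%Lp' "$path" 2>/dev/null)" || return 2
  if [ "$actual" = "$expected" ]; then
    echo "ok: $path is $actual"
  else
    echo "mismatch: $path is $actual, expected $expected" >&2
    return 1
  fi
}

# Example usage:
#   check_mode /var/lib/flynn 750
#   check_mode /etc/flynn/config.yaml 640
#   check_mode /var/log/flynn 750
```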

Sandbox Security

Docker sandbox adds isolation but requires careful configuration:

# config/production.yaml
sandbox:
  enabled: true
  image: 'node:22-alpine'
  dockerSocket: '/var/run/docker.sock'
  resourceLimits:
    memory: '512m'
    cpus: '0.5'
    timeoutSec: 60
  networkMode: 'none'  # No network access

Ensure Docker is secured:

# Allow the flynn user to use the Docker socket
# (docker group membership is effectively root-equivalent on the host)
sudo usermod -aG docker flynn

# Configure Docker daemon security
sudo vim /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "live-restore": true,
  "userland-proxy": false
}

Configuration

Production Config Template

# config/production.yaml
# Base config for production deployment

# ── Gateway ───────────────────────────────────────────────────────────────
gateway:
  enabled: true
  port: 18800
  auth:
    token: '${GATEWAY_TOKEN}'
    trustTailscaleIdentity: true
    applyToHttp: true
  lock:
    enabled: true
  tailscaleServe:
    enabled: false  # Set to true to expose via Tailscale
    hostname: 'flynn'
    port: 443

# ── Models ─────────────────────────────────────────────────────────────────
models:
  default:
    anthropic:
      apiKey: '${ANTHROPIC_API_KEY}'
      model: 'claude-sonnet-4-20250514'
      maxTokens: 4096

  router:
    tiers:
      default: 'anthropic:claude-sonnet-4-20250514'
      fast: 'anthropic:claude-3-5-haiku-20241022'
      complex: 'anthropic:claude-opus-4-20250514'
      local: 'ollama:llama3'

    fallbackChain:
      - 'github:claude-sonnet-4-5'
      - 'local:ollama:llama3'

    retry:
      maxAttempts: 3
      initialDelayMs: 1000
      multiplier: 2
      maxDelayMs: 30000

# ── Channels ───────────────────────────────────────────────────────────────
channels:
  telegram:
    enabled: true
    token: '${TELEGRAM_BOT_TOKEN}'
    allowedChatIds: ['123456789']

  discord:
    enabled: false

  slack:
    enabled: false

  whatsapp:
    enabled: false

# ── Sessions ───────────────────────────────────────────────────────────────
sessions:
  ttl: '7d'
  maxSessions: 100

# ── Memory ────────────────────────────────────────────────────────────────
memory:
  enabled: true
  embeddings:
    provider: 'openai'
    openai:
      apiKey: '${OPENAI_API_KEY}'
      model: 'text-embedding-3-small'

# ── Tools ─────────────────────────────────────────────────────────────────
tools:
  policy: 'coding'  # Restrict tool access

  executor:
    defaultTimeoutMs: 30000
    maxOutputBytes: 51200

  sandbox:
    enabled: false  # Enable if using Docker

# ── Agents ────────────────────────────────────────────────────────────────
agents:
  default:
    modelTier: 'default'
    toolPolicy: 'coding'
    compaction:
      thresholdPct: 80
      keepTurns: 4
      summaryMaxTokens: 1024

# ── Automation ────────────────────────────────────────────────────────────
automation:
  cron:
    enabled: false

  webhooks:
    enabled: false

  heartbeat:
    enabled: true
    interval: '5m'
    checks:
      - 'gateway'
      - 'model'
      - 'channels'
      - 'memory'
      - 'disk'
    notifications:
      - type: 'telegram'
        chatId: '123456789'

# ── Logging ───────────────────────────────────────────────────────────────
logging:
  level: 'info'  # debug, info, warn, error
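
The `retry` values in the template above (initialDelayMs, multiplier, maxDelayMs) read like a capped exponential backoff. The guide does not state the exact formula Flynn uses, so the sketch below assumes the conventional one: delay = min(initialDelayMs × multiplier^(n−1), maxDelayMs) for the n-th retry. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of capped exponential backoff;
# Flynn's actual retry formula may differ.
# Usage: backoff_delay_ms RETRY_NUMBER INITIAL_MS MULTIPLIER MAX_MS
backoff_delay_ms() {
  local retry delay multiplier max i
  retry="$1"; delay="$2"; multiplier="$3"; max="$4"; i=1
  # Multiply once per retry after the first, stopping early once the cap is reached.
  while [ "$i" -lt "$retry" ] && [ "$delay" -lt "$max" ]; do
    delay=$(( delay * multiplier ))
    i=$(( i + 1 ))
  done
  if [ "$delay" -gt "$max" ]; then delay="$max"; fi
  echo "$delay"
}

# Example usage with the template's values:
#   backoff_delay_ms 1 1000 2 30000
#   backoff_delay_ms 2 1000 2 30000
```

Under this assumption the template's settings give delays of 1000, 2000 and 4000 ms for the first three retries, and any later delay is capped at 30000 ms.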

Config Validation

Validate config before starting:

flynn doctor --config /etc/flynn/config.yaml
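
The config template references secrets as `${VAR}` placeholders. The guide does not say whether `flynn doctor` reports placeholders whose variables are unset, so a separate pre-start check may be useful. The sketch below is a hypothetical helper that lists every placeholder in a config file whose environment variable is unset or empty.

```shell
#!/usr/bin/env bash
# Hypothetical pre-start check: print the names of ${VAR} placeholders in a
# config file whose environment variables are unset or empty.
# Usage: missing_env_vars CONFIG_FILE   (returns 1 if any are missing)
missing_env_vars() {
  local config names name value missing
  config="$1"; missing=0
  # Extract placeholder names such as ANTHROPIC_API_KEY, de-duplicated.
  names="$(grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$config" | tr -d '${}' | sort -u)"
  for name in $names; do
    # Indirect lookup; names are restricted to identifier characters by the grep above.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
#   missing_env_vars /etc/flynn/config.yaml || echo "Set the variables listed above" >&2
```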

Monitoring

Health Checks

Flynn provides a health check endpoint:

# HTTP health check
curl http://localhost:18800/health

# Response
{
  "status": "ok",
  "version": "0.1.0",
  "uptime": 12345
}
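
For scripts and monitors, checking the body is more reliable than checking only that the port answers. A minimal sketch, assuming the response shape shown above; the function name is hypothetical, and `grep` is used so that no JSON tool is required.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a /health response body reports "status": "ok".
# Usage: health_is_ok RESPONSE_BODY
health_is_ok() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Example usage:
#   health_is_ok "$(curl -fsS http://localhost:18800/health)" || echo "Flynn is unhealthy" >&2
```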

Logs

Journalctl (systemd)

# Follow logs
sudo journalctl -u flynn -f

# View last 100 lines
sudo journalctl -u flynn -n 100 --no-pager

# View logs since yesterday
sudo journalctl -u flynn --since yesterday

# Search for errors
sudo journalctl -u flynn | grep -i error

Log Rotation

Limit journal size and retention in the journald configuration:

sudo vim /etc/systemd/journald.conf
[Journal]
SystemMaxUse=100M
MaxRetentionSec=7day

Restart journald:

sudo systemctl restart systemd-journald

Heartbeat Monitor

Enable built-in heartbeat monitoring:

automation:
  heartbeat:
    enabled: true
    interval: '5m'
    checks:
      - 'gateway'
      - 'model'
      - 'channels'
      - 'memory'
      - 'disk'
    notifications:
      - type: 'telegram'
        chatId: '123456789'
      - type: 'webhook'
        url: 'https://hooks.slack.com/services/...'

External Monitoring

Prometheus (Optional)

Prometheus metrics are not currently implemented. A future version could expose them with the Node.js prom-client library:

# Future feature
monitoring:
  prometheus:
    enabled: true
    port: 9090

Uptime Monitoring

Use external services:

  • UptimeRobot
  • Pingdom
  • Better Uptime

Monitor:

  • Gateway HTTP health endpoint
  • WebSocket connection
  • Response time

Backup & Recovery

What to Backup

  1. Configuration: /etc/flynn/config.yaml
  2. Sessions: SQLite database at $FLYNN_DATA_DIR/sessions.db
  3. Memory Files: $FLYNN_DATA_DIR/memory/
  4. Vectors: SQLite database at $FLYNN_DATA_DIR/vectors.db
  5. Pairing Codes: SQLite table within sessions.db

FLYNN_DATA_DIR is /var/lib/flynn in the systemd setup above and defaults to ~/.local/share/flynn.

Backup Script

Create /usr/local/bin/flynn-backup.sh:

#!/bin/bash
# Run as root (for example from root's crontab)
set -euo pipefail

# The archive contains the config and its secrets: keep it owner-readable only
umask 077

BACKUP_DIR="/var/backups/flynn"
DATA_DIR="/var/lib/flynn"
CONFIG_DIR="/etc/flynn"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/flynn_$DATE.tar.gz"

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Stop Flynn so the SQLite files are consistent, and restart it on exit
# even if a later step fails
systemctl stop flynn
trap 'systemctl start flynn' EXIT

# Create backup
tar -czf "$BACKUP_FILE" \
  "$CONFIG_DIR/config.yaml" \
  "$DATA_DIR/sessions.db" \
  "$DATA_DIR/vectors.db" \
  "$DATA_DIR/memory/"

# Delete backups older than 90 days
find "$BACKUP_DIR" -name "flynn_*.tar.gz" -mtime +90 -delete

echo "Backup created: $BACKUP_FILE"

Make executable:

sudo chmod +x /usr/local/bin/flynn-backup.sh
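
A backup is only useful if it can be read back. The sketch below is a hypothetical helper that confirms an archive is readable and contains the config and both databases named in the backup script; it does not validate the contents of those files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a Flynn backup archive is readable and
# contains the expected files.
# Usage: verify_backup ARCHIVE
verify_backup() {
  local archive listing member
  archive="$1"
  # Listing the archive fails if the gzip stream or tar structure is corrupt.
  listing="$(tar -tzf "$archive" 2>/dev/null)" || {
    echo "unreadable archive: $archive" >&2
    return 1
  }
  for member in config.yaml sessions.db vectors.db; do
    if ! printf '%s\n' "$listing" | grep -q "/$member\$"; then
      echo "missing from archive: $member" >&2
      return 1
    fi
  done
  echo "backup looks complete: $archive"
}

# Example usage:
#   verify_backup /var/backups/flynn/flynn_20250213_020000.tar.gz
```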

Cron Job

Add to root crontab:

sudo crontab -e
# Daily backup at 2 AM
0 2 * * * /usr/local/bin/flynn-backup.sh >> /var/log/flynn-backup.log 2>&1

Restore

# Stop Flynn
sudo systemctl stop flynn

# Extract backup
sudo tar -xzf /var/backups/flynn/flynn_20250213_020000.tar.gz -C /

# Start Flynn
sudo systemctl start flynn

Database Maintenance

Run SQLite vacuum periodically:

sqlite3 /var/lib/flynn/sessions.db "VACUUM;"
sqlite3 /var/lib/flynn/vectors.db "VACUUM;"

Add to crontab (monthly):

0 0 1 * * sqlite3 /var/lib/flynn/sessions.db "VACUUM;" >> /var/log/flynn-maintenance.log 2>&1
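
Before vacuuming, it can be worth confirming that the database is not corrupt. The sketch below wraps SQLite's `PRAGMA integrity_check`, which returns the single row `ok` for a healthy database; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if SQLite reports the database as intact.
# Usage: sqlite_integrity_ok DB_FILE
sqlite_integrity_ok() {
  local result
  result="$(sqlite3 "$1" 'PRAGMA integrity_check;')" || return 1
  [ "$result" = "ok" ]
}

# Example usage:
#   sqlite_integrity_ok /var/lib/flynn/sessions.db \
#     && sqlite3 /var/lib/flynn/sessions.db "VACUUM;"
```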

Performance Tuning

Node.js Tuning

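Set Node.js options for production. --max-old-space-size is given in MB and limits only the V8 heap, so keep it below the memory limit of the service (2G in the systemd unit above) to leave headroom for non-heap memory:

# In systemd service
Environment="NODE_OPTIONS=--max-old-space-size=1536"

# Or via environment variable
export NODE_OPTIONS="--max-old-space-size=1536"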
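
The heap size can be derived from the service's memory limit rather than hard-coded. The sketch below uses 75% of the limit, which is a rule-of-thumb assumption and not a Flynn recommendation; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a V8 heap size (MB) from a memory limit (MB).
# Usage: heap_mb_for_limit LIMIT_MB [PERCENT]   (PERCENT defaults to 75)
heap_mb_for_limit() {
  local limit_mb percent
  limit_mb="$1"; percent="${2:-75}"
  echo $(( limit_mb * percent / 100 ))
}

# Example usage:
#   export NODE_OPTIONS="--max-old-space-size=$(heap_mb_for_limit 2048)"
```

For a 2048 MB limit this yields 1536 MB, leaving roughly 512 MB for buffers, native modules, and the runtime itself.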

Context Management

Optimize compaction settings:

agents:
  default:
    compaction:
      thresholdPct: 75  # Trigger earlier
      keepTurns: 6      # Keep more context
      summaryMaxTokens: 2048  # Better summaries

SQLite Performance

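Enable WAL mode. The journal mode is stored in the database file, so this only needs to run once:

sqlite3 /var/lib/flynn/sessions.db "PRAGMA journal_mode=WAL;"

synchronous and cache_size are per-connection settings and are not stored in the database file, so setting them from the sqlite3 CLI has no effect on Flynn's own connections. They only help if the application applies them each time it opens the database:

PRAGMA synchronous=NORMAL;
PRAGMA cache_size=-64000;  -- about 64 MB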

Model Routing

Configure tiers for optimal cost/latency:

models:
  router:
    tiers:
      fast: 'anthropic:claude-3-5-haiku-20241022'    # Quick tasks
      default: 'anthropic:claude-sonnet-4-20250514'  # General use
      complex: 'anthropic:claude-opus-4-20250514'     # Complex reasoning
      local: 'ollama:llama3'                          # Fallback

Caching (Future)

Consider adding caching for:

  • Repeated tool calls
  • Memory search results
  • Model responses for common queries

Scaling Considerations

Single-Operator Scope

Flynn is designed for a single operator with multiple concurrent users. Limitations:

  • Max Concurrent Sessions: ~100 (depends on model rate limits)
  • Throughput: ~10-20 messages/second (varies by model)
  • Memory Usage: 2-4GB for moderate usage

When to Scale Up

Consider scaling if:

  • Consistent CPU usage > 80%
  • Memory usage > 4GB
  • Frequent rate limiting from model providers
  • Slow response times > 30 seconds

Scaling Strategies

  1. Horizontal Scaling: Deploy multiple Flynn instances behind a load balancer (not currently supported - sessions are stateful)

  2. Vertical Scaling: Increase server resources (CPU, memory)

  3. Multi-Instance Architecture (future):

    • Shared session storage (PostgreSQL/Redis)
    • Message queue for request distribution
    • Session affinity for stateful connections

Cost Optimization

  • Use local models for non-critical tasks
  • Cache embeddings
  • Optimize compaction to reduce token usage
  • Use efficient models for delegated tasks

Troubleshooting Production Issues

Service Won't Start

# Check status
sudo systemctl status flynn

# View logs
sudo journalctl -u flynn -n 50 --no-pager

# Validate config
flynn doctor --config /etc/flynn/config.yaml

High Memory Usage

# Check memory
free -h

# Check process memory
ps aux | grep flynn

# Restart service
sudo systemctl restart flynn

Gateway Connection Issues

# Check if port is listening
sudo ss -tlnp | grep 18800

# Check firewall
sudo ufw status

# Test connectivity
curl http://localhost:18800/health

Slow Response Times

# Check CPU usage
top

# Check model provider status
# Verify API keys are valid
# Check network latency

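# Enable debug logging: set logging.level to 'debug' in /etc/flynn/config.yaml, then restart
# (variables set on the systemctl command line are not passed to the service)
sudo systemctl restart flynn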

For additional help, see: