will/flynn

William Valentin 2177413833 feat(deploy): add Nix flake + NixOS module

2026-02-15 18:26:10 -08:00

Production Deployment Guide

This guide covers deploying Flynn in a production environment.

Prerequisites
Docker Deployment
Nix Deployment
Systemd Service
Security
Configuration
Monitoring
Backup & Recovery
Performance Tuning
Scaling Considerations

Prerequisites

System Requirements

OS: Linux (Ubuntu 22.04+ recommended) or macOS
Node.js: >= 22.0.0
Memory: Minimum 2GB, 4GB+ recommended
Disk: 10GB+ for sessions, memory, and vectors
Docker: Optional; required only for sandbox features

Network Requirements

Public IP or VPN (Tailscale recommended) for remote access
Open ports: 18800 (gateway), optional 443 (Tailscale Serve)
Outbound HTTPS access for model providers and web tools

External Services (Optional)

Model Providers: Anthropic, OpenAI, GitHub Models, etc. (API keys required)
Email: SMTP server for email notifications
Object Storage: MinIO or S3 for backups (optional)

Docker Deployment

Quick Start

Using the provided docker-compose.yml:

# Clone repository
git clone <repo-url>
cd flynn

# Create config
cp config/default.yaml config/production.yaml
# Edit config/production.yaml with your settings

# Start services
docker compose up -d

# View logs
docker compose logs -f

Dockerfile

Use the Dockerfile shipped in the repository.

Notes:

Multi-stage build (builder + runtime).
Uses corepack + pnpm with pnpm-lock.yaml for reproducible installs.
Exposes port 18800 and runs dist/cli/index.js start.

Docker Compose Configuration

Use the repo compose file: docker-compose.yml.

The important parts to customize:

Mount your config: ./config/production.yaml:/config/config.yaml:ro
Set provider keys (ANTHROPIC_API_KEY, etc.)
Optionally set gateway token auth (FLYNN_SERVER_TOKEN)

Environment Variables

# Node environment
export NODE_ENV=production

# Config path
export FLYNN_CONFIG=/path/to/config.yaml

# Data directory (default: ~/.local/share/flynn)
export FLYNN_DATA_DIR=/var/lib/flynn

# Optional: Override model provider credentials
export ANTHROPIC_API_KEY=sk-...
export OPENAI_API_KEY=sk-...
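
Since a missing variable usually surfaces only as a runtime failure, a wrapper script can check for the variables it depends on before starting Flynn. The following is a minimal sketch; require_env is a hypothetical helper and not part of Flynn, and which variables are actually required depends on your config.

```shell
# Hypothetical helper (not part of Flynn): fail fast when variables are unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name (POSIX-compatible indirection).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, calling require_env FLYNN_CONFIG ANTHROPIC_API_KEY || exit 1 at the top of a start script stops it before the daemon is launched with an incomplete environment.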

Nix Deployment

If you use Nix, this repo ships a flake (package + dev shell + optional NixOS module). See docs/deployment/NIX.md.

Systemd Service

Service File

Create /etc/systemd/system/flynn.service:

[Unit]
Description=Flynn AI Assistant Daemon
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=flynn
Group=flynn
WorkingDirectory=/opt/flynn
Environment="NODE_ENV=production"
Environment="FLYNN_CONFIG=/etc/flynn/config.yaml"
Environment="FLYNN_DATA_DIR=/var/lib/flynn"
ExecStart=/usr/local/bin/node /opt/flynn/dist/cli/index.js start
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=flynn

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/flynn /var/log/flynn /var/run

# Resource limits
MemoryMax=2G
MemorySwapMax=0
CPUQuota=200%

[Install]
WantedBy=multi-user.target

Create Flynn User

# Create group and system user
sudo groupadd --system flynn
sudo useradd --system --gid flynn --home-dir /var/lib/flynn --shell /usr/sbin/nologin flynn

# Create directories
sudo mkdir -p /opt/flynn /etc/flynn /var/lib/flynn /var/log/flynn
sudo chown -R flynn:flynn /var/lib/flynn /var/log/flynn

# Copy the build output and config (ExecStart expects /opt/flynn/dist/cli/index.js)
sudo cp -r dist /opt/flynn/
sudo cp config/production.yaml /etc/flynn/config.yaml

# Code and config are owned by root; the flynn group may read the config
sudo chown -R root:root /opt/flynn /etc/flynn
sudo chown root:flynn /etc/flynn/config.yaml
sudo chmod 640 /etc/flynn/config.yaml

Enable and Start Service

# Reload systemd
sudo systemctl daemon-reload

# Enable service (start on boot)
sudo systemctl enable flynn

# Start service
sudo systemctl start flynn

# Check status
sudo systemctl status flynn

# View logs
sudo journalctl -u flynn -f

# Restart service
sudo systemctl restart flynn

Service Management

# Stop service
sudo systemctl stop flynn

# Reload config (requires restart)
sudo systemctl restart flynn

# Check if running
sudo systemctl is-active flynn

# View recent logs
sudo journalctl -u flynn -n 100 --no-pager

Security

Secrets Management

Never commit secrets to version control. Use one of these approaches:

Environment Variables

# config/production.yaml
models:
  default:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: '${ANTHROPIC_API_KEY}'

Set in /etc/flynn/.env or systemd service file:

Environment="ANTHROPIC_API_KEY=sk-..."

HashiCorp Vault (Advanced)

Use a secrets manager and inject at runtime:

# Read the key straight into the environment; avoid writing it to a temporary file
export ANTHROPIC_API_KEY="$(vault kv get -field=api_key secret/anthropic)"

Authentication

Gateway Auth

# config/production.yaml
server:
  token: 'your-random-token-here'  # Generate with: openssl rand -hex 32
  tailscale_identity: true
  auth_http: true
  lock: false

Generate a secure token:

openssl rand -hex 32
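
To keep the token out of the config file and the shell history, it can be written directly to an environment file that only its owner can read. This is a sketch under assumptions: write_token_env is a hypothetical helper, and the file path and variable name are whatever your setup expects (this guide mentions both /etc/flynn/.env and the GATEWAY_TOKEN placeholder).

```shell
# Hypothetical helper: generate a token and store it as NAME=value in an env file.
write_token_env() {
  env_file=$1
  var_name=$2
  token=$(openssl rand -hex 32) || return 1
  # Create the file inside a subshell with a restrictive umask.
  ( umask 077 && printf '%s=%s\n' "$var_name" "$token" > "$env_file" ) || return 1
  # Also tighten the mode in case the file already existed with wider permissions.
  chmod 600 "$env_file"
}
```

Run as root, write_token_env /etc/flynn/.env GATEWAY_TOKEN would produce a file that a systemd unit could load with EnvironmentFile=.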

Safe Defaults (Recommended)

These defaults align with docs/security/SAFE_PERSONAL_AGENT.md:

pairing:
  enabled: true

tools:
  profile: messaging

sandbox:
  enabled: true

Channel Whitelists

Restrict who can interact with Flynn:

channels:
  telegram:
    allowedChatIds: ['123456789']  # Your Telegram chat ID
  discord:
    allowedGuildIds: ['987654321098765432']
    allowedChannelIds: ['123456789012345678']
  slack:
    allowedChannelIds: ['C12345678']
    signingSecret: '${SLACK_SIGNING_SECRET}'

Network Security

Firewall

# Ubuntu/Debian (ufw)
sudo ufw allow 22/tcp    # SSH
sudo ufw allow 18800/tcp  # Flynn gateway
sudo ufw enable

# CentOS/RHEL (firewalld)
sudo firewall-cmd --permanent --add-port=18800/tcp
sudo firewall-cmd --reload

Reverse Proxy (Nginx)

Place Flynn behind Nginx for TLS:

server {
    listen 443 ssl http2;
    server_name flynn.example.com;

    ssl_certificate /etc/letsencrypt/live/flynn.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/flynn.example.com/privkey.pem;

    # WebSocket upgrade
    location / {
        proxy_pass http://localhost:18800;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts
        proxy_connect_timeout 60s;
        proxy_send_timeout 60s;
        proxy_read_timeout 60s;
    }

    # Health check endpoint (no auth required)
    location /health {
        proxy_pass http://localhost:18800/health;
        access_log off;
    }
}

Obtain TLS certificate with Let's Encrypt:

sudo certbot --nginx -d flynn.example.com

File Permissions

# Data directory
sudo chmod 750 /var/lib/flynn
sudo chown flynn:flynn /var/lib/flynn

# Config file
sudo chmod 640 /etc/flynn/config.yaml
sudo chown root:flynn /etc/flynn/config.yaml

# Logs
sudo chmod 750 /var/log/flynn
sudo chown flynn:flynn /var/log/flynn
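
Permissions tend to drift after upgrades or manual edits, so it can help to verify them rather than only set them. The sketch below assumes GNU coreutils stat (Linux); check_mode is a hypothetical helper, not a Flynn command.

```shell
# Hypothetical helper: succeed only if the path has exactly the expected octal mode.
check_mode() {
  path=$1
  want=$2
  # stat -c '%a' prints the permission bits in octal (GNU coreutils syntax).
  actual=$(stat -c '%a' "$path") || return 1
  if [ "$actual" != "$want" ]; then
    echo "$path: expected mode $want, found $actual" >&2
    return 1
  fi
}
```

For the layout above, the checks would be check_mode /var/lib/flynn 750, check_mode /etc/flynn/config.yaml 640 and check_mode /var/log/flynn 750. Ownership can be inspected the same way with stat -c '%U:%G'.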

Sandbox Security

Docker sandbox adds isolation but requires careful configuration:

# config/production.yaml
sandbox:
  enabled: true
  image: 'node:22-alpine'
  dockerSocket: '/var/run/docker.sock'
  resourceLimits:
    memory: '512m'
    cpus: '0.5'
    timeoutSec: 60
  networkMode: 'none'  # No network access

Ensure Docker is secured:

# Let the flynn user access the Docker socket
# (membership in the docker group is effectively root-equivalent on the host)
sudo usermod -aG docker flynn

# Configure Docker daemon security
sudo vim /etc/docker/daemon.json

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "live-restore": true,
  "userland-proxy": false
}

Configuration

Production Config Template

# config/production.yaml
# Base config for production deployment

# ── Gateway ───────────────────────────────────────────────────────────────
gateway:
  enabled: true
  port: 18800
  auth:
    token: '${GATEWAY_TOKEN}'
    trustTailscaleIdentity: true
    applyToHttp: true
  lock:
    enabled: true
  tailscaleServe:
    enabled: false  # Set to true to expose via Tailscale
    hostname: 'flynn'
    port: 443

# ── Models ─────────────────────────────────────────────────────────────────
models:
  default:
    anthropic:
      apiKey: '${ANTHROPIC_API_KEY}'
      model: 'claude-sonnet-4-20250514'
      maxTokens: 4096

  router:
    tiers:
      default: 'anthropic:claude-sonnet-4-20250514'
      fast: 'anthropic:claude-haiku-4-20250514'
      complex: 'anthropic:claude-opus-4-20250514'
      local: 'ollama:llama3'

    fallbackChain:
      - 'github:claude-sonnet-4-5'
      - 'local:ollama:llama3'

    retry:
      maxAttempts: 3
      initialDelayMs: 1000
      multiplier: 2
      maxDelayMs: 30000

# ── Channels ───────────────────────────────────────────────────────────────
channels:
  telegram:
    enabled: true
    token: '${TELEGRAM_BOT_TOKEN}'
    allowedChatIds: ['123456789']

  discord:
    enabled: false

  slack:
    enabled: false

  whatsapp:
    enabled: false

# ── Sessions ───────────────────────────────────────────────────────────────
sessions:
  ttl: '7d'
  maxSessions: 100

# ── Memory ────────────────────────────────────────────────────────────────
memory:
  enabled: true
  embeddings:
    provider: 'openai'
    openai:
      apiKey: '${OPENAI_API_KEY}'
      model: 'text-embedding-3-small'

# ── Tools ─────────────────────────────────────────────────────────────────
tools:
  policy: 'coding'  # Restrict tool access

  executor:
    defaultTimeoutMs: 30000
    maxOutputBytes: 51200

  sandbox:
    enabled: false  # Enable if using Docker

# ── Agents ────────────────────────────────────────────────────────────────
agents:
  default:
    modelTier: 'default'
    toolPolicy: 'coding'
    compaction:
      thresholdPct: 80
      keepTurns: 4
      summaryMaxTokens: 1024

# ── Automation ────────────────────────────────────────────────────────────
automation:
  cron:
    enabled: false

  webhooks:
    enabled: false

  heartbeat:
    enabled: true
    interval: '5m'
    checks:
      - 'gateway'
      - 'model'
      - 'channels'
      - 'memory'
      - 'disk'
    notifications:
      - type: 'telegram'
        chatId: '123456789'

# ── Logging ───────────────────────────────────────────────────────────────
logging:
  level: 'info'  # debug, info, warn, error
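
To get a feel for the retry settings in the template, the following sketch prints the delay before each retry. It assumes plain exponential backoff (each delay is the previous one times multiplier, capped at maxDelayMs) with no jitter; the guide does not state Flynn's exact algorithm or whether maxAttempts counts the first try, and backoff_schedule is a hypothetical helper.

```shell
# Hypothetical helper: print one delay (in ms) per attempt.
# Arguments: attempts, initial delay, multiplier, maximum delay.
backoff_schedule() {
  attempts=$1
  delay=$2
  multiplier=$3
  max_delay=$4
  i=1
  while [ "$i" -le "$attempts" ]; do
    echo "$delay"
    delay=$((delay * multiplier))
    # Cap the next delay at the configured maximum.
    if [ "$delay" -gt "$max_delay" ]; then
      delay=$max_delay
    fi
    i=$((i + 1))
  done
}
```

With the template values, backoff_schedule 3 1000 2 30000 prints 1000, 2000 and 4000, so under these assumptions the 30000 ms cap only matters once maxAttempts is raised to six or more.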

Config Validation

Validate config before starting:

flynn doctor --config /etc/flynn/config.yaml

Monitoring

Health Checks

Flynn provides a health check endpoint:

# HTTP health check
curl http://localhost:18800/health

# Response
{
  "status": "ok",
  "version": "0.1.0",
  "uptime": 12345
}
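
For scripted checks (cron, a load balancer probe, a deploy step), the response can be reduced to a pass or fail exit code. This is a sketch that assumes the response shape shown above; health_status and check_health are hypothetical helpers, and the sed expression is a simple field extraction rather than a full JSON parser.

```shell
# Hypothetical helper: print the value of the "status" field from JSON on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if the endpoint answers and reports "ok".
check_health() {
  url=${1:-http://localhost:18800/health}
  # --fail makes curl return non-zero on HTTP errors; --max-time bounds the wait.
  body=$(curl --silent --fail --max-time 5 "$url") || return 1
  [ "$(printf '%s\n' "$body" | health_status)" = "ok" ]
}
```

A probe can then be written as check_health || echo "Flynn gateway is unhealthy" >&2.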

Logs

Journalctl (systemd)

# Follow logs
sudo journalctl -u flynn -f

# View last 100 lines
sudo journalctl -u flynn -n 100 --no-pager

# View logs since yesterday
sudo journalctl -u flynn --since yesterday

# Search for errors
sudo journalctl -u flynn | grep -i error

Log Rotation

Limit the size and retention of the systemd journal (journald manages its own rotation; logrotate is not involved):

sudo vim /etc/systemd/journald.conf

[Journal]
SystemMaxUse=100M
MaxRetentionSec=7day

Restart journald:

sudo systemctl restart systemd-journald

Heartbeat Monitor

Enable built-in heartbeat monitoring:

automation:
  heartbeat:
    enabled: true
    interval: '5m'
    checks:
      - 'gateway'
      - 'model'
      - 'channels'
      - 'memory'
      - 'disk'
    notifications:
      - type: 'telegram'
        chatId: '123456789'
      - type: 'webhook'
        url: 'https://hooks.slack.com/services/...'

External Monitoring

Prometheus (Optional)

Use Node.js prom-client for metrics (not currently implemented):

# Future feature
monitoring:
  prometheus:
    enabled: true
    port: 9090

Uptime Monitoring

Use external services:

UptimeRobot
Pingdom
Better Uptime

Monitor:

Gateway HTTP health endpoint
WebSocket connection
Response time

Backup & Recovery

What to Backup

Configuration: /etc/flynn/config.yaml
Sessions: SQLite database at <data dir>/sessions.db
Memory Files: <data dir>/memory/
Vectors: SQLite database at <data dir>/vectors.db
Pairing Codes: SQLite table within sessions.db

Here <data dir> is FLYNN_DATA_DIR: ~/.local/share/flynn by default, /var/lib/flynn in the systemd setup.

Backup Script

Create /usr/local/bin/flynn-backup.sh:

#!/bin/bash
# Run as root (for example from root's crontab)
set -euo pipefail

BACKUP_DIR="/var/backups/flynn"
DATA_DIR="/var/lib/flynn"
CONFIG_DIR="/etc/flynn"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/flynn_$DATE.tar.gz"

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Stop Flynn so the SQLite files are not written to during the copy
systemctl stop flynn

# Restart Flynn when the script exits, even if a step below fails
trap 'systemctl start flynn' EXIT

# Create backup
tar -czf "$BACKUP_FILE" \
  "$CONFIG_DIR/config.yaml" \
  "$DATA_DIR/sessions.db" \
  "$DATA_DIR/vectors.db" \
  "$DATA_DIR/memory/"

# Delete backups older than 90 days
find "$BACKUP_DIR" -name "flynn_*.tar.gz" -mtime +90 -delete

echo "Backup created: $BACKUP_FILE"

Make executable:

sudo chmod +x /usr/local/bin/flynn-backup.sh
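
A backup is only useful if it can be read back, so it is worth checking each archive after it is written. The sketch below lists the archive and confirms that the expected members are present; verify_backup is a hypothetical helper. Note that GNU tar strips the leading / when archiving absolute paths, so member names are given without it.

```shell
# Hypothetical helper: check that an archive is readable and contains the given members.
# Usage: verify_backup ARCHIVE [MEMBER...]
verify_backup() {
  archive=$1
  shift
  # Listing the archive fails if the gzip stream or tar structure is damaged.
  listing=$(tar -tzf "$archive") || return 1
  for member in "$@"; do
    # -x matches the whole line, -F treats the member path as a literal string.
    if ! printf '%s\n' "$listing" | grep -qxF "$member"; then
      echo "missing from backup: $member" >&2
      return 1
    fi
  done
}
```

For the script above, a check could look like verify_backup "$BACKUP_FILE" etc/flynn/config.yaml var/lib/flynn/sessions.db var/lib/flynn/vectors.db.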

Cron Job

Add to root crontab:

sudo crontab -e

# Daily backup at 2 AM
0 2 * * * /usr/local/bin/flynn-backup.sh >> /var/log/flynn-backup.log 2>&1

Restore

# Stop Flynn
sudo systemctl stop flynn

# Extract backup
sudo tar -xzf /var/backups/flynn/flynn_20250213_020000.tar.gz -C /

# Start Flynn
sudo systemctl start flynn

Database Maintenance

Run SQLite vacuum periodically:

sqlite3 /var/lib/flynn/sessions.db "VACUUM;"
sqlite3 /var/lib/flynn/vectors.db "VACUUM;"

Add to crontab (monthly):

0 0 1 * * sqlite3 /var/lib/flynn/sessions.db "VACUUM;" >> /var/log/flynn-maintenance.log 2>&1

Performance Tuning

Node.js Tuning

Set Node.js options for production:

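# In systemd service (keep the heap limit below the service's memory limit,
# otherwise the kernel kills the process before V8 reaches its own limit)
Environment="NODE_OPTIONS=--max-old-space-size=1536"

# Or via environment variable
export NODE_OPTIONS="--max-old-space-size=1536"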

Context Management

Optimize compaction settings:

agents:
  default:
    compaction:
      thresholdPct: 75  # Trigger earlier
      keepTurns: 6      # Keep more context
      summaryMaxTokens: 2048  # Better summaries

SQLite Performance

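Enable WAL mode. The journal mode is stored in the database file, so it only needs to be set once:

sqlite3 /var/lib/flynn/sessions.db "PRAGMA journal_mode=WAL;"

# synchronous and cache_size apply only to the connection that sets them,
# so running them from the sqlite3 CLI does not change Flynn's own connections.
# They take effect only if the application issues them when it opens the database:
#   PRAGMA synchronous=NORMAL;
#   PRAGMA cache_size=-64000;  -- about 64MB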

Model Routing

Configure tiers for optimal cost/latency:

models:
  router:
    tiers:
      fast: 'anthropic:claude-haiku-4-20250514'      # Quick tasks
      default: 'anthropic:claude-sonnet-4-20250514'  # General use
      complex: 'anthropic:claude-opus-4-20250514'     # Complex reasoning
      local: 'ollama:llama3'                          # Fallback

Caching (Future)

Consider adding caching for:

Repeated tool calls
Memory search results
Model responses for common queries

Scaling Considerations

Single-Operator Scope

Flynn is designed for a single operator with multiple concurrent users. Limitations:

Max Concurrent Sessions: ~100 (depends on model rate limits)
Throughput: ~10-20 messages/second (varies by model)
Memory Usage: 2-4GB for moderate usage

When to Scale Up

Consider scaling if:

Consistent CPU usage > 80%
Memory usage > 4GB
Frequent rate limiting from model providers
Slow response times > 30 seconds

Scaling Strategies

Horizontal Scaling: Deploy multiple Flynn instances behind a load balancer (not currently supported - sessions are stateful)
Vertical Scaling: Increase server resources (CPU, memory)
Multi-Instance Architecture (future):
- Shared session storage (PostgreSQL/Redis)
- Message queue for request distribution
- Session affinity for stateful connections

Cost Optimization

Use local models for non-critical tasks
Cache embeddings
Optimize compaction to reduce token usage
Use efficient models for delegated tasks

Troubleshooting Production Issues

Service Won't Start

# Check status
sudo systemctl status flynn

# View logs
sudo journalctl -u flynn -n 50 --no-pager

# Validate config
flynn doctor --config /etc/flynn/config.yaml

High Memory Usage

# Check memory
free -h

# Check process memory
ps aux | grep flynn

# Restart service
sudo systemctl restart flynn

Gateway Connection Issues

# Check if port is listening
sudo ss -tlnp | grep 18800

# Check firewall
sudo ufw status

# Test connectivity
curl http://localhost:18800/health

Slow Response Times

# Check CPU usage
top

# Check model provider status
# Verify API keys are valid
# Check network latency

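# Enable debug logging: set logging.level to 'debug' in /etc/flynn/config.yaml, then restart
# (a variable set in front of sudo systemctl does not reach the service process)
sudo systemctl restart flynn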

For additional help, see:
