flynn/.planning/codebase/INTEGRATIONS.md
William Valentin 5c531a760d docs: document native audio support across README, CHANGELOG, config, and planning docs
- README: add audio.transcribe to tool list, update media pipeline description,
  add Native Audio Support and Audio Transcription config sections, add
  supports_audio per-tier override example
- SOUL.md: add audio.transcribe to available tools list
- CHANGELOG: add native audio support and audio.transcribe tool entries
- config/default.yaml: add commented audio config section, supports_audio hint
- INTEGRATIONS.md: expand audio section with native passthrough, capabilities,
  smart routing, AudioSource type, token estimation, audio.transcribe tool
- STRUCTURE.md: add capabilities.ts and audio-transcribe.ts to key file listings
- ARCHITECTURE.md: update data flow step 5 to describe smart audio routing
2026-02-11 18:41:53 -08:00

External Integrations

Analysis Date: 2026-02-09

AI Model Providers

Flynn supports 10 model providers via a unified ModelClient interface (src/models/types.ts). Each provider implements chat() and optionally chatStream(). The ModelRouter (src/models/router.ts) manages tier-based routing (fast/default/complex/local) with fallback chains.
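The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the actual ModelRouter: the ChatRequest and ChatResponse shapes, the name field, and the routeChat helper are hypothetical, and the real ModelClient in src/models/types.ts may differ.

```typescript
// Hypothetical shapes for illustration; see src/models/types.ts for the real ones.
type Tier = "fast" | "default" | "complex" | "local";

interface ChatRequest {
  prompt: string;
}

interface ChatResponse {
  text: string;
  provider: string;
}

interface ModelClient {
  name: string;
  chat(req: ChatRequest): Promise<ChatResponse>;
}

// Try each client configured for the tier in order; the first success wins.
async function routeChat(
  chains: Record<Tier, ModelClient[]>,
  tier: Tier,
  req: ChatRequest,
): Promise<ChatResponse> {
  const errors: string[] = [];
  for (const client of chains[tier]) {
    try {
      return await client.chat(req);
    } catch (err) {
      // Record the failure and move on to the next client in the chain.
      const reason = err instanceof Error ? err.message : String(err);
      errors.push(`${client.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed for tier "${tier}": ${errors.join("; ")}`);
}
```

Collecting the per-provider errors keeps the final failure message useful when every entry in the chain is down.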

Anthropic:

  • SDK: @anthropic-ai/sdk (src/models/anthropic.ts)
  • Auth: ANTHROPIC_API_KEY env var or api_key in config
  • Features: Streaming, tool use, extended thinking mode, multimodal (images)
  • Extended thinking: { type: 'enabled', budget_tokens: 4096 } on request
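A sketch of how the thinking option might be attached to a request. The withThinking helper and the MessageParams type are hypothetical; only the thinking object shape comes from the entry above. At the time of writing, the Anthropic API expects max_tokens to be larger than budget_tokens, so the helper raises max_tokens when needed (the 1024-token headroom is an arbitrary choice).

```typescript
// Hypothetical helper; only the `thinking` object shape is taken from the doc.
interface ThinkingConfig {
  type: "enabled";
  budget_tokens: number;
}

interface MessageParams {
  model: string;
  max_tokens: number;
  thinking?: ThinkingConfig;
  messages: { role: "user" | "assistant"; content: string }[];
}

function withThinking(params: MessageParams, budgetTokens = 4096): MessageParams {
  // Keep max_tokens above the thinking budget so the visible answer has room.
  const maxTokens = Math.max(params.max_tokens, budgetTokens + 1024);
  return {
    ...params,
    max_tokens: maxTokens,
    thinking: { type: "enabled", budget_tokens: budgetTokens },
  };
}
```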

OpenAI:

  • SDK: openai (src/models/openai.ts)
  • Auth: OPENAI_API_KEY env var or api_key in config
  • Features: Tool use, multimodal (images via data URIs or URLs)
  • Also powers: OpenRouter, ZhipuAI, xAI via baseURL override

Google Gemini:

  • SDK: @google/generative-ai (src/models/gemini.ts)
  • Auth: GOOGLE_API_KEY env var or api_key in config
  • Features: Streaming, tool use, extended thinking, multimodal

AWS Bedrock:

  • SDK: @aws-sdk/client-bedrock-runtime (src/models/bedrock.ts)
  • Auth: AWS_REGION env var + IAM credentials or explicit accessKeyId/secretAccessKey in config
  • Features: Streaming (ConverseStream), tool use, multimodal
  • Models: Meta Llama, Amazon Titan (cost-tracked in src/models/costs.ts)

GitHub Models (Copilot):

  • SDK: openai (OpenAI-compatible API) (src/models/github.ts)
  • Auth: GITHUB_TOKEN env var or OAuth device flow (src/auth/github.ts)
  • Endpoint: https://api.githubcopilot.com
  • Auto-fallback: When an Anthropic tier fails, Flynn automatically tries the same model via GitHub Models before the global fallback chain (createAutoFallbackClient() in src/daemon/index.ts)
  • OAuth device flow: Uses client ID Ov23li8tweQw6odWQebz, stores token at ~/.config/flynn/auth.json

OpenRouter:

  • SDK: openai with baseURL: https://openrouter.ai/api/v1 (src/daemon/index.ts)
  • Auth: OPENROUTER_API_KEY env var or api_key in config

ZhipuAI:

  • SDK: openai with baseURL: https://api.z.ai/api/paas/v4 (src/daemon/index.ts)
  • Auth: ZHIPUAI_API_KEY env var or api_key in config

xAI (Grok):

  • SDK: openai with baseURL: https://api.x.ai/v1 (src/daemon/index.ts)
  • Auth: XAI_API_KEY env var or api_key in config
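The three OpenAI-compatible providers above differ only in base URL and key. A minimal sketch of resolving the options that would be passed to the openai SDK constructor (which accepts apiKey and baseURL); the clientOptions helper and the provider ids are hypothetical, and the env-before-config precedence is an assumption since the entries only say "env var or api_key in config".

```typescript
interface CompatProvider {
  baseURL: string;
  apiKeyEnv: string;
}

// Base URLs and env var names as listed in this document.
const COMPAT_PROVIDERS: Record<string, CompatProvider> = {
  openrouter: { baseURL: "https://openrouter.ai/api/v1", apiKeyEnv: "OPENROUTER_API_KEY" },
  zhipuai: { baseURL: "https://api.z.ai/api/paas/v4", apiKeyEnv: "ZHIPUAI_API_KEY" },
  xai: { baseURL: "https://api.x.ai/v1", apiKeyEnv: "XAI_API_KEY" },
};

function clientOptions(
  provider: string,
  env: Record<string, string | undefined>,
  configKey?: string,
): { baseURL: string; apiKey: string } {
  const entry: CompatProvider | undefined = COMPAT_PROVIDERS[provider];
  if (entry === undefined) {
    throw new Error(`Unknown OpenAI-compatible provider: ${provider}`);
  }
  // Assumption: the environment variable wins over the config value.
  const apiKey = env[entry.apiKeyEnv] ?? configKey;
  if (apiKey === undefined || apiKey === "") {
    throw new Error(`Missing ${entry.apiKeyEnv} or api_key in config`);
  }
  return { baseURL: entry.baseURL, apiKey };
}
```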

Ollama (Local):

  • SDK: ollama (src/models/local/ollama.ts)
  • Auth: None (local server)
  • Endpoint: Configurable host (default: http://localhost:11434)
  • Config: num_gpu option for GPU layer control

llama.cpp (Local):

  • SDK: Raw fetch HTTP calls (src/models/local/llamacpp.ts)
  • Auth: Optional auth_token header
  • Endpoint: Configurable (default: http://localhost:8080)

Embedding Providers

Embedding providers (src/memory/embeddings.ts) power the hybrid vector + keyword search system. Factory function: createEmbeddingProvider().

OpenAI Embeddings:

  • SDK: openai (lazy import)
  • Auth: OPENAI_API_KEY or config api_key
  • Default model: text-embedding-3-small, default dims: 1536

Gemini Embeddings:

  • SDK: @google/generative-ai (lazy import)
  • Auth: GOOGLE_API_KEY or config api_key
  • Uses batchEmbedContents for efficiency, default dims: 768

Ollama Embeddings:

  • SDK: ollama (lazy import)
  • Auth: None (local)
  • Configurable host endpoint, default dims: 768

LlamaCpp Embeddings:

  • SDK: Raw fetch to /embedding endpoint
  • Auth: None
  • Default endpoint: http://localhost:8080, default dims: 768

Voyage AI Embeddings:

  • SDK: openai (OpenAI-compatible API, lazy import)
  • Auth: VOYAGE_API_KEY env var or config api_key
  • Endpoint: https://api.voyageai.com/v1, default dims: 1024

Data Storage

Session Database (SQLite):

  • Library: better-sqlite3 (src/session/store.ts)
  • Location: {dataDir}/sessions.db
  • Schema: messages table with id, session_id, role, content, created_at
  • TTL-based pruning: Configurable via sessions.ttl (default: 30 days), hourly cleanup

Vector Database (SQLite):

  • Library: better-sqlite3 (src/memory/vector-store.ts)
  • Location: {dataDir}/vectors.db
  • Stores embedding chunks as Float32Array BLOBs
  • Content hashing for deduplication
  • Background indexer runs every 30 seconds
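Storing embeddings as Float32Array BLOBs and hashing content for deduplication can be sketched like this. The function names are hypothetical and SHA-256 is an assumption; the document does not say which hash src/memory/vector-store.ts uses.

```typescript
import { createHash } from "node:crypto";

// Pack an embedding into the raw bytes of a Float32Array (4 bytes per dimension).
function embeddingToBlob(embedding: number[]): Buffer {
  const floats = Float32Array.from(embedding);
  return Buffer.from(floats.buffer, floats.byteOffset, floats.byteLength);
}

// Unpack a BLOB back into floats. The bytes are copied into a fresh buffer first,
// because a Node Buffer may start at an offset that is not 4-byte aligned.
function blobToEmbedding(blob: Buffer): Float32Array {
  const copy = new Uint8Array(blob.byteLength);
  copy.set(blob);
  return new Float32Array(copy.buffer);
}

// Assumption: SHA-256 of the chunk text serves as the deduplication key.
function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}
```

A chunk whose hash already exists in the store can be skipped without calling the embedding provider again.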

Memory Store (Filesystem):

  • Location: {dataDir}/memory/ (src/memory/store.ts)
  • Format: Markdown files organized by namespace
  • Layout: global.md, user.md, sessions/{id}.md
  • Hybrid search: Keyword + vector (configurable weight via hybrid_weight, default 0.7)
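One common way to combine the two signals is a weighted sum. The sketch below assumes hybrid_weight is the share given to the vector score and that both scores are normalized to [0, 1]; neither detail is confirmed by this document, and the names are hypothetical.

```typescript
interface ScoredChunk {
  id: string;
  vector: number; // similarity from the vector index, assumed in [0, 1]
  keyword: number; // keyword match score, assumed in [0, 1]
}

// Assumption: weight applies to the vector score, the remainder to keywords.
function hybridScore(vector: number, keyword: number, weight = 0.7): number {
  if (weight < 0 || weight > 1) {
    throw new RangeError("weight must be in [0, 1]");
  }
  return weight * vector + (1 - weight) * keyword;
}

// Return chunk ids ordered from best to worst combined score.
function rank(results: ScoredChunk[], weight = 0.7): string[] {
  return [...results]
    .sort(
      (a, b) =>
        hybridScore(b.vector, b.keyword, weight) - hybridScore(a.vector, a.keyword, weight),
    )
    .map((r) => r.id);
}
```

With the default weight a strong semantic match outranks a strong keyword match; setting the weight to 0 reduces the ranking to keyword search only.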

File Storage:

  • Local filesystem only — no cloud object storage

Caching:

  • In-memory response cache for web fetch tool (5-minute TTL) (src/tools/builtin/web-fetch.ts)
  • No external cache service (Redis, etc.)
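A minimal sketch of an in-memory cache with a 5-minute TTL, in the spirit of the web fetch cache. The TtlCache class is hypothetical and not the code in src/tools/builtin/web-fetch.ts; the injectable clock exists only to make expiry easy to test.

```typescript
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number = 5 * 60 * 1000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (this.now() >= hit.expiresAt) {
      // Expired entries are evicted lazily on read.
      this.entries.delete(key);
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```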

Channel Adapters (Messaging Platforms)

All adapters implement ChannelAdapter interface (src/channels/types.ts): connect(), disconnect(), send(), onMessage().
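The four methods named above suggest an interface along these lines. The parameter and return types are assumptions, as is the InboundMessage shape; the LoopbackAdapter is a hypothetical in-memory adapter included only to show the contract in use.

```typescript
// Hypothetical signatures; the real interface is in src/channels/types.ts.
interface InboundMessage {
  peer: string;
  text: string;
}

type MessageHandler = (msg: InboundMessage) => void;

interface ChannelAdapter {
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  send(peer: string, text: string): Promise<void>;
  onMessage(handler: MessageHandler): void;
}

// In-memory adapter: records outgoing messages and lets callers inject incoming ones.
class LoopbackAdapter implements ChannelAdapter {
  sent: InboundMessage[] = [];
  private handlers: MessageHandler[] = [];
  private connected = false;

  async connect(): Promise<void> {
    this.connected = true;
  }

  async disconnect(): Promise<void> {
    this.connected = false;
  }

  async send(peer: string, text: string): Promise<void> {
    if (!this.connected) throw new Error("adapter is not connected");
    this.sent.push({ peer, text });
  }

  onMessage(handler: MessageHandler): void {
    this.handlers.push(handler);
  }

  // Simulate a message arriving from the platform.
  receive(msg: InboundMessage): void {
    for (const handler of this.handlers) handler(msg);
  }
}
```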

Telegram:

  • SDK: grammy (src/channels/telegram/)
  • Auth: Bot token via telegram.bot_token config
  • Features: Long polling, chat ID allowlist, mention requirement, pairing codes, image/audio attachments

Discord:

  • SDK: discord.js (src/channels/discord/)
  • Auth: Bot token via discord.bot_token config
  • Features: Guild/channel allowlists, mention requirement, pairing codes

Slack:

  • SDK: @slack/bolt (src/channels/slack/)
  • Auth: bot_token, app_token, signing_secret in config
  • Features: Socket mode, channel allowlists, mention requirement, pairing codes

WhatsApp:

  • SDK: whatsapp-web.js (src/channels/whatsapp/)
  • Auth: QR code scanning (web client emulation)
  • Features: Number/group allowlists, mention requirement, custom data directory, pairing codes

WebChat:

  • Implementation: Gateway WebSocket bridge (src/channels/webchat/)
  • Auth: Gateway token or Tailscale identity
  • UI: Vanilla JS dashboard at src/gateway/ui/ (HTML + CSS + JS, no framework)

Authentication & Identity

GitHub OAuth (Device Flow):

  • Implementation: src/auth/github.ts
  • Client ID: Ov23li8tweQw6odWQebz (GitHub Copilot)
  • Flow: Device code → User authorization → Token polling
  • Storage: ~/.config/flynn/auth.json (file mode 0600)
  • Priority: GITHUB_TOKEN env → stored OAuth token → null
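The priority order above amounts to a short lookup chain. A sketch with a hypothetical resolveGithubToken helper; treating blank strings as unset is an assumption.

```typescript
// Returns the first usable token: env var, then stored OAuth token, else null.
function resolveGithubToken(
  envToken: string | undefined,
  storedToken: string | undefined,
): string | null {
  const candidates = [envToken, storedToken];
  for (const candidate of candidates) {
    // Assumption: empty or whitespace-only values count as "not set".
    if (candidate !== undefined && candidate.trim() !== "") {
      return candidate;
    }
  }
  return null;
}
```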

Gateway Auth:

  • Static bearer token (server.token in config)
  • Tailscale identity header trust (server.tailscale_identity)
  • HTTP auth optional (server.auth_http)
  • Gateway lock: Single-client WebSocket mode (server.lock)

DM Pairing Codes:

  • Implementation: src/channels/pairing.ts, src/session/store.ts (SQLite persistence)
  • Purpose: Authenticate unknown senders via one-time codes
  • Config: pairing.enabled, pairing.code_ttl (default 5m), pairing.code_length (default 6)
  • Gateway handlers for code generation/verification
  • TUI /pair command execution (generate/list/revoke) in src/frontends/tui/minimal.ts
  • Persistence: PairingStore interface with SQLite pairing_approved table; approved senders survive daemon restarts
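A sketch of one-time code generation using the defaults listed above (6 characters, 5-minute TTL). The helper names are hypothetical, and the numeric-only alphabet is an assumption; src/channels/pairing.ts may use a different one.

```typescript
import { randomInt } from "node:crypto";

interface PairingCode {
  code: string;
  expiresAt: number; // epoch milliseconds
}

// Assumption: codes are decimal digits drawn from a cryptographic RNG.
function generatePairingCode(
  length = 6,
  ttlMs = 5 * 60 * 1000,
  now: number = Date.now(),
): PairingCode {
  let code = "";
  for (let i = 0; i < length; i++) {
    code += randomInt(0, 10).toString(); // upper bound is exclusive
  }
  return { code, expiresAt: now + ttlMs };
}

// A code is accepted only if it matches and has not expired.
function isCodeValid(issued: PairingCode, candidate: string, now: number = Date.now()): boolean {
  return now < issued.expiresAt && candidate === issued.code;
}
```

A real implementation would also delete the code after its first successful use so that it stays one-time.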

Gmail OAuth2:

  • SDK: googleapis (src/automation/gmail.ts)
  • Credentials: ~/.config/flynn/gmail-credentials.json
  • Token: ~/.config/flynn/gmail-token.json
  • Setup: flynn gmail-auth CLI command

Automation

Cron Scheduler:

  • Library: croner (src/automation/cron.ts)
  • Config: automation.cron[] — each job has name, schedule, message, output.channel, output.peer
  • Implements ChannelAdapter to inject cron-triggered messages into the channel registry
  • Features: Enable/disable per job, timezone support, runtime management tools

Webhooks:

  • Implementation: src/automation/webhooks.ts
  • Auth: HMAC-SHA256 signature verification (X-Webhook-Signature header)
  • Templates: {{body}} and {{json.field}} placeholders
  • Route: POST /webhooks/{name} on the gateway HTTP server
  • Config: automation.webhooks[] with name, secret, message, output
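Signature verification and placeholder substitution might look like the sketch below. It assumes the X-Webhook-Signature header carries a bare hex digest of the raw request body (the actual encoding or any prefix is not stated here), and the function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the signature is hex(HMAC-SHA256(secret, rawBody)).
function signBody(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

function verifySignature(secret: string, rawBody: string, signatureHex: string): boolean {
  const expected = Buffer.from(signBody(secret, rawBody), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Replace {{body}} with the raw body and {{json.a.b}} with a field of the parsed body.
function renderTemplate(template: string, rawBody: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    parsed = undefined; // non-JSON bodies leave {{json.*}} placeholders empty
  }
  return template.replace(
    /\{\{(body|json\.([\w.]+))\}\}/g,
    (_match: string, token: string, path: string | undefined) => {
      if (token === "body") return rawBody;
      let current: unknown = parsed;
      for (const key of (path ?? "").split(".")) {
        if (current === null || typeof current !== "object") return "";
        current = (current as Record<string, unknown>)[key];
      }
      return current === undefined || current === null ? "" : String(current);
    },
  );
}
```

Verification must run against the raw request bytes, before any JSON parsing, because re-serializing the body can change whitespace and invalidate the signature.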

Gmail Watcher:

  • SDK: googleapis (src/automation/gmail.ts)
  • Modes: Pub/Sub push notifications or polling fallback
  • Pub/Sub topic: projects/flynn-agent/topics/gmail-push
  • Watch renewal: Every 6 days (a Gmail watch expires after about 7 days)
  • Config: automation.gmail with watch_labels, poll_interval, history_start
  • Route: POST /gmail/push on gateway for Pub/Sub push

Heartbeat Monitor:

  • Implementation: src/automation/heartbeat.ts
  • Checks: gateway, model, channels, memory, disk
  • Config: automation.heartbeat with interval, checks, failure_threshold, disk_threshold_mb
  • Notification: Sends to configured channel/peer on failures
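The failure_threshold setting implies some form of failure counting. The sketch below assumes the threshold counts consecutive failures per check and that one notification is sent when the count first reaches it; the FailureTracker class is hypothetical and src/automation/heartbeat.ts may behave differently.

```typescript
class FailureTracker {
  private counts = new Map<string, number>();
  private threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Record one check result. Returns true only on the failure that reaches the
  // threshold, so a long outage produces a single notification.
  record(check: string, ok: boolean): boolean {
    if (ok) {
      this.counts.set(check, 0);
      return false;
    }
    const next = (this.counts.get(check) ?? 0) + 1;
    this.counts.set(check, next);
    return next === this.threshold;
  }
}
```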

Web & Content Tools

Web Search (Brave / SearXNG):

  • Implementation: src/tools/builtin/web-search.ts
  • Brave Search API: https://api.search.brave.com/res/v1/web/search
    • Auth: X-Subscription-Token header via web_search.api_key
  • SearXNG: Self-hosted instance via web_search.endpoint
    • Auth: None (private instance)
  • Config: web_search.provider (brave or searxng), web_search.max_results

Web Fetch (Readability):

  • Libraries: linkedom, @mozilla/readability, turndown (src/tools/builtin/web-fetch.ts)
  • Features: HTML → Markdown conversion, article extraction, response caching (5min TTL)
  • Truncation: 50,000 character max

Browser Automation:

  • Library: puppeteer-core (src/tools/builtin/browser/)
  • Config: browser.executable_path or browser.ws_endpoint
  • Features: Headless browsing, page management, screenshots
  • Limits: browser.max_pages (default 5), browser.default_timeout (default 30s)

Audio Transcription

Whisper-Compatible API:

  • Implementation: src/models/media.ts
  • Endpoint: Configurable via audio.transcription_endpoint
  • Auth: audio.transcription_api_key (Bearer token)
  • Model: audio.transcription_model (default: whisper-1)
  • Supported formats: OGG, MP3, WAV, WebM, MP4, M4A
  • Integration: Auto-transcribes audio attachments from channels before model processing when the target model cannot accept raw audio (see Native Audio Passthrough)

Native Audio Passthrough:

  • Implementation: src/models/capabilities.ts, src/daemon/routing.ts
  • Capability check: supportsAudioInput(provider, model, override?) determines if a model can process raw audio
  • Audio-capable providers: Gemini (inlineData), OpenAI (input_audio), GitHub (input_audio)
  • Non-audio providers: Anthropic, Bedrock, Ollama, llama.cpp (fall back to Whisper transcription)
  • Config override: supports_audio: true/false per model tier overrides auto-detection
  • Smart routing: createMessageRouter() checks capability, passes raw AudioSource for capable models or transcribes via Whisper for others
  • Audio content types: AudioSource ({ type: 'audio', data: string, mimeType: string }) in src/models/types.ts
  • Token estimation: estimateAudioTokens() in src/context/tokens.ts (base64 length → bytes → duration at 16 kbps → tokens at 32 per second)
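The capability check, routing decision, and token estimate can be sketched together. This is an illustration under stated assumptions: provider ids are hypothetical strings, the check ignores the model name (the real supportsAudioInput may be model-specific), routeAudio stands in for the audio branch of createMessageRouter(), and rounding up is an arbitrary choice.

```typescript
interface AudioSource {
  type: "audio";
  data: string; // base64-encoded audio
  mimeType: string;
}

// Providers listed above as audio-capable; the ids are assumed.
const AUDIO_PROVIDERS = new Set(["gemini", "openai", "github"]);

function supportsAudioInput(provider: string, _model: string, override?: boolean): boolean {
  // A per-tier supports_audio setting wins over auto-detection.
  if (override !== undefined) return override;
  return AUDIO_PROVIDERS.has(provider);
}

type AudioRoute = { kind: "native"; audio: AudioSource } | { kind: "transcribe"; audio: AudioSource };

function routeAudio(provider: string, model: string, audio: AudioSource, override?: boolean): AudioRoute {
  return supportsAudioInput(provider, model, override)
    ? { kind: "native", audio }
    : { kind: "transcribe", audio };
}

// base64 length -> bytes -> seconds at 16 kbps -> tokens at 32 per second.
function estimateAudioTokens(base64: string): number {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  const bytes = Math.floor((base64.length * 3) / 4) - padding;
  const seconds = (bytes * 8) / 16000;
  return Math.ceil(seconds * 32);
}
```

At 16 kbps, 2000 bytes of audio is one second, which the estimate counts as 32 tokens.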

Agent Tool: audio.transcribe:

  • Implementation: src/tools/builtin/audio-transcribe.ts
  • Transcribes audio files on-demand via the configured Whisper-compatible endpoint
  • Input: file path or base64 data with MIME type
  • Output: transcribed text

MCP (Model Context Protocol)

MCP Client:

  • SDK: @modelcontextprotocol/sdk (src/mcp/client.ts)
  • Transport: stdio (spawns external processes)
  • Config: mcp.servers[] with name, command, args, env, cwd
  • Bridge: MCP tools auto-registered in Flynn's tool registry (src/mcp/bridge.ts)
  • Management: McpManager starts/stops all configured servers (src/mcp/manager.ts)

Docker Sandbox

Per-Session Containers:

  • Implementation: src/sandbox/manager.ts, src/sandbox/docker.ts
  • Config: sandbox.image (default: node:22-slim), sandbox.network (default: none), sandbox.memory_limit, sandbox.cpu_limit
  • Features: Lazily created per session, replaces shell.exec and process.start tools with sandboxed versions
  • Prerequisite: Docker daemon available

Networking & Exposure

Gateway Server:

  • Protocol: WebSocket (JSON-RPC) + HTTP (src/gateway/server.ts)
  • Default port: 18800
  • Binding: 127.0.0.1 (localhost only) or 0.0.0.0
  • Features: LaneQueue for request ordering, session bridge, static file serving for dashboard

Tailscale Serve:

  • Implementation: src/gateway/tailscale.ts
  • Purpose: Expose gateway HTTPS endpoint on tailnet
  • Config: server.tailscale.serve, server.tailscale.hostname, server.tailscale.port
  • Prerequisite: Tailscale CLI installed and daemon running

Monitoring & Observability

Error Tracking:

  • None (console.error only)

Logging:

  • console.log / console.error / console.debug throughout
  • No structured logging framework

Cost Tracking:

  • Built-in: src/models/costs.ts with per-million-token pricing for known models
  • Tracks: Anthropic, OpenAI, Gemini, xAI, Bedrock models
  • GitHub Copilot models tracked at $0 (subscription-included)
  • Usage exposed via /usage command and gateway system.usage RPC

CI/CD & Deployment

Hosting:

  • Self-hosted (designed for personal deployment)
  • Process supervisor expected for restarts (exit code 75 = restart signal)

CI Pipeline:

  • Not detected in repository

Environment Configuration

Required env vars (minimum viable):

  • ANTHROPIC_API_KEY (or other model provider key)
  • FLYNN_TELEGRAM_TOKEN (if using default Telegram channel)

Optional env vars (by feature):

  • OPENAI_API_KEY - OpenAI models and embeddings
  • GOOGLE_API_KEY - Gemini models and embeddings
  • GITHUB_TOKEN - GitHub Models / Copilot access
  • AWS_REGION - Bedrock region
  • OPENROUTER_API_KEY - OpenRouter access
  • ZHIPUAI_API_KEY - ZhipuAI access
  • XAI_API_KEY - xAI (Grok) access
  • VOYAGE_API_KEY - Voyage AI embeddings
  • FLYNN_DATA_DIR - Custom data directory

Secrets location:

  • API keys: YAML config (with ${ENV_VAR} expansion) or environment variables
  • OAuth tokens: ~/.config/flynn/auth.json (GitHub), ~/.config/flynn/gmail-token.json (Gmail)
  • .env.example present at project root
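The ${ENV_VAR} expansion mentioned above can be sketched as a single regex replace. The expandEnv helper is hypothetical, and failing on an unset variable is an assumption; the actual loader may substitute an empty string instead.

```typescript
// Replace every ${NAME} in a config string with the matching environment value.
function expandEnv(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      // Assumption: an unset variable is a configuration error.
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}
```

In a running daemon the second argument would typically be process.env.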

Webhooks & Callbacks

Incoming:

  • POST /webhooks/{name} - Named webhooks with HMAC-SHA256 verification (src/automation/webhooks.ts)
  • POST /gmail/push - Google Pub/Sub push notifications for Gmail (src/automation/gmail.ts)

Outgoing:

  • None (no outbound webhooks — all communication goes through channel adapters)

Integration audit: 2026-02-09