William Valentin 5c531a760d docs: document native audio support across README, CHANGELOG, config, and planning docs

- README: add audio.transcribe to tool list, update media pipeline description,
  add Native Audio Support and Audio Transcription config sections, add
  supports_audio per-tier override example
- SOUL.md: add audio.transcribe to available tools list
- CHANGELOG: add native audio support and audio.transcribe tool entries
- config/default.yaml: add commented audio config section, supports_audio hint
- INTEGRATIONS.md: expand audio section with native passthrough, capabilities,
  smart routing, AudioSource type, token estimation, audio.transcribe tool
- STRUCTURE.md: add capabilities.ts and audio-transcribe.ts to key file listings
- ARCHITECTURE.md: update data flow step 5 to describe smart audio routing

2026-02-11 18:41:53 -08:00


External Integrations

Analysis Date: 2026-02-09

AI Model Providers

Flynn supports 10 model providers via a unified ModelClient interface (src/models/types.ts). Each provider implements chat() and optionally chatStream(). The ModelRouter (src/models/router.ts) manages tier-based routing (fast/default/complex/local) with fallback chains.
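
The tier routing and fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the contents of src/models/router.ts: the ChatMessage shape, the TierRouter class, and the way fallbacks are keyed per tier are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real ModelClient in src/models/types.ts may differ.
type Tier = "fast" | "default" | "complex" | "local";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ModelClient {
  chat(messages: ChatMessage[]): Promise<string>;
  chatStream?(messages: ChatMessage[]): AsyncIterable<string>;
}

// Illustrative router: try the requested tier, then each fallback tier in order.
class TierRouter {
  private readonly clients: Partial<Record<Tier, ModelClient>>;
  private readonly fallbacks: Partial<Record<Tier, Tier[]>>;

  constructor(
    clients: Partial<Record<Tier, ModelClient>>,
    fallbacks: Partial<Record<Tier, Tier[]>> = {},
  ) {
    this.clients = clients;
    this.fallbacks = fallbacks;
  }

  async chat(tier: Tier, messages: ChatMessage[]): Promise<string> {
    const order: Tier[] = [tier, ...(this.fallbacks[tier] ?? [])];
    let lastError: unknown = new Error(`no client configured for tier "${tier}"`);
    for (const candidate of order) {
      const client = this.clients[candidate];
      if (!client) continue; // tier not configured, skip it
      try {
        return await client.chat(messages);
      } catch (err) {
        lastError = err; // remember the failure and try the next tier
      }
    }
    throw lastError;
  }
}
```

In this sketch a failure on one tier is only surfaced if every tier in the chain also fails, which matches the idea of a fallback chain but says nothing about how Flynn orders or logs the attempts.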

Anthropic:

SDK: @anthropic-ai/sdk (src/models/anthropic.ts)
Auth: ANTHROPIC_API_KEY env var or api_key in config
Features: Streaming, tool use, extended thinking mode, multimodal (images)
Extended thinking: { type: 'enabled', budget_tokens: 4096 } on request
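
As an illustration of the extended thinking option, the sketch below builds a request body carrying the thinking block quoted above. The helper name, the answerTokens parameter, and the model string are hypothetical; only the thinking object comes from this document. The Anthropic API requires max_tokens to be larger than budget_tokens, so the helper adds headroom for the visible answer.

```typescript
// Plain-object sketch of a Messages API request; no SDK import is needed to build it.
interface ThinkingRequest {
  model: string;
  max_tokens: number;
  thinking: { type: "enabled"; budget_tokens: number };
  messages: { role: "user" | "assistant"; content: string }[];
}

// Hypothetical helper; Flynn's own request construction lives in src/models/anthropic.ts.
function buildThinkingRequest(
  model: string,
  prompt: string,
  budgetTokens = 4096, // value quoted in this document
  answerTokens = 1024, // assumed headroom for the final answer
): ThinkingRequest {
  return {
    model,
    // max_tokens must exceed the thinking budget
    max_tokens: budgetTokens + answerTokens,
    thinking: { type: "enabled", budget_tokens: budgetTokens },
    messages: [{ role: "user", content: prompt }],
  };
}
```

An object of this shape could be passed to the SDK's messages.create() call, assuming the chosen model supports extended thinking.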

OpenAI:

SDK: openai (src/models/openai.ts)
Auth: OPENAI_API_KEY env var or api_key in config
Features: Tool use, multimodal (images via data URIs or URLs)
Also powers: OpenRouter, ZhipuAI, xAI via baseURL override

Google Gemini:

SDK: @google/generative-ai (src/models/gemini.ts)
Auth: GOOGLE_API_KEY env var or api_key in config
Features: Streaming, tool use, extended thinking, multimodal

AWS Bedrock:

SDK: @aws-sdk/client-bedrock-runtime (src/models/bedrock.ts)
Auth: AWS_REGION env var + IAM credentials or explicit accessKeyId/secretAccessKey in config
Features: Streaming (ConverseStream), tool use, multimodal
Models: Meta Llama, Amazon Titan (cost-tracked in src/models/costs.ts)

GitHub Models (Copilot):

SDK: openai (OpenAI-compatible API) (src/models/github.ts)
Auth: GITHUB_TOKEN env var or OAuth device flow (src/auth/github.ts)
Endpoint: https://api.githubcopilot.com
Auto-fallback: When an Anthropic tier fails, Flynn automatically tries the same model via GitHub Models before the global fallback chain (src/daemon/index.ts createAutoFallbackClient())
OAuth device flow: Uses client ID Ov23li8tweQw6odWQebz, stores token at ~/.config/flynn/auth.json

OpenRouter:

SDK: openai with baseURL: https://openrouter.ai/api/v1 (src/daemon/index.ts)
Auth: OPENROUTER_API_KEY env var or api_key in config

ZhipuAI:

SDK: openai with baseURL: https://api.z.ai/api/paas/v4 (src/daemon/index.ts)
Auth: ZHIPUAI_API_KEY env var or api_key in config

xAI (Grok):

SDK: openai with baseURL: https://api.x.ai/v1 (src/daemon/index.ts)
Auth: XAI_API_KEY env var or api_key in config
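
The three OpenAI-compatible providers above differ only in base URL and key, which can be sketched as a lookup table. The URLs and env var names are taken from this document; the resolveClientOptions helper and the choice to prefer the config key over the env var are assumptions, since the precedence is not stated here.

```typescript
// Base URLs and env var names as listed in this document.
const COMPAT_PROVIDERS = {
  openrouter: { baseURL: "https://openrouter.ai/api/v1", apiKeyEnv: "OPENROUTER_API_KEY" },
  zhipuai: { baseURL: "https://api.z.ai/api/paas/v4", apiKeyEnv: "ZHIPUAI_API_KEY" },
  xai: { baseURL: "https://api.x.ai/v1", apiKeyEnv: "XAI_API_KEY" },
} as const;

type CompatProviderName = keyof typeof COMPAT_PROVIDERS;

// Hypothetical helper: returns options in the shape accepted by the openai SDK constructor.
function resolveClientOptions(
  name: CompatProviderName,
  configApiKey: string | undefined,
  env: Record<string, string | undefined> = process.env,
): { baseURL: string; apiKey: string } {
  const provider = COMPAT_PROVIDERS[name];
  // Assumption: an explicit config key wins over the environment variable.
  const apiKey = configApiKey ?? env[provider.apiKeyEnv];
  if (!apiKey) {
    throw new Error(`missing API key for ${name}: set ${provider.apiKeyEnv} or api_key in config`);
  }
  return { baseURL: provider.baseURL, apiKey };
}
```

The returned object can be handed to `new OpenAI(options)`, which is how a single SDK can serve several providers.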

Ollama (Local):

SDK: ollama (src/models/local/ollama.ts)
Auth: None (local server)
Endpoint: Configurable host (default: http://localhost:11434)
Config: num_gpu option for GPU layer control

llama.cpp (Local):

SDK: Raw fetch HTTP calls (src/models/local/llamacpp.ts)
Auth: Optional auth_token header
Endpoint: Configurable (default: http://localhost:8080)

Embedding Providers

Embedding providers (src/memory/embeddings.ts) power the hybrid vector + keyword search system. Factory function: createEmbeddingProvider().

OpenAI Embeddings:

SDK: openai (lazy import)
Auth: OPENAI_API_KEY or config api_key
Default model: text-embedding-3-small, default dims: 1536

Gemini Embeddings:

SDK: @google/generative-ai (lazy import)
Auth: GOOGLE_API_KEY or config api_key
Uses batchEmbedContents for efficiency, default dims: 768

Ollama Embeddings:

SDK: ollama (lazy import)
Auth: None (local)
Configurable host endpoint, default dims: 768

LlamaCpp Embeddings:

SDK: Raw fetch to /embedding endpoint
Auth: None
Default endpoint: http://localhost:8080, default dims: 768

Voyage AI Embeddings:

SDK: openai (OpenAI-compatible API, lazy import)
Auth: VOYAGE_API_KEY env var or config api_key
Endpoint: https://api.voyageai.com/v1, default dims: 1024

Data Storage

Session Database (SQLite):

Library: better-sqlite3 (src/session/store.ts)
Location: {dataDir}/sessions.db
Schema: messages table with id, session_id, role, content, created_at
TTL-based pruning: Configurable via sessions.ttl (default: 30 days), hourly cleanup

Vector Database (SQLite):

Library: better-sqlite3 (src/memory/vector-store.ts)
Location: {dataDir}/vectors.db
Stores embedding chunks as Float32Array BLOBs
Content hashing for deduplication
Background indexer runs every 30 seconds
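
Storing embeddings as Float32Array BLOBs and deduplicating by content hash might look like the following. This is a sketch under assumptions: the helper names are invented for illustration and SHA-256 is an assumed hash choice, since this document does not say which algorithm src/memory/vector-store.ts uses.

```typescript
import { createHash } from "node:crypto";

// Serialize an embedding as raw bytes suitable for a SQLite BLOB column.
function embeddingToBlob(vector: Float32Array): Buffer {
  return Buffer.from(vector.buffer, vector.byteOffset, vector.byteLength);
}

// Deserialize a BLOB back into a Float32Array.
// The bytes are copied first so the result is 4-byte aligned and independent of the Buffer.
function blobToEmbedding(blob: Buffer): Float32Array {
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

// Assumed hash: SHA-256 of the chunk text, used to skip chunks that are already indexed.
function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}
```

With better-sqlite3, a Buffer bound to a statement parameter is stored as a BLOB and read back as a Buffer, so these helpers can sit directly around the insert and select calls.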

Memory Store (Filesystem):

Location: {dataDir}/memory/ (src/memory/store.ts)
Format: Markdown files organized by namespace
Layout: global.md, user.md, sessions/{id}.md
Hybrid search: Keyword + vector (configurable weight via hybrid_weight, default 0.7)
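
One common way to combine keyword and vector results is a weighted sum, sketched below. It assumes that hybrid_weight is the share given to the vector score and that both scores are already normalized to the range 0 to 1; neither point is confirmed by this document, and the Candidate shape is hypothetical.

```typescript
interface Candidate {
  id: string;
  vectorScore: number; // assumed normalized to [0, 1]
  keywordScore: number; // assumed normalized to [0, 1]
}

// Weighted blend; hybridWeight = 0.7 mirrors the documented default.
function hybridScore(vectorScore: number, keywordScore: number, hybridWeight = 0.7): number {
  return hybridWeight * vectorScore + (1 - hybridWeight) * keywordScore;
}

// Return a new array sorted by blended score, best first.
function rankHybrid(candidates: Candidate[], hybridWeight = 0.7): Candidate[] {
  return [...candidates].sort(
    (a, b) =>
      hybridScore(b.vectorScore, b.keywordScore, hybridWeight) -
      hybridScore(a.vectorScore, a.keywordScore, hybridWeight),
  );
}
```

Under this reading, raising hybrid_weight favors semantic matches and lowering it favors exact keyword hits.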

File Storage:

Local filesystem only — no cloud object storage

Caching:

In-memory response cache for web fetch tool (5-minute TTL) (src/tools/builtin/web-fetch.ts)
No external cache service (Redis, etc.)
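
An in-memory cache with a 5-minute TTL can be sketched as a Map with expiry timestamps. The TtlCache class and its injectable clock are assumptions for illustration and are not taken from src/tools/builtin/web-fetch.ts.

```typescript
// Minimal TTL cache; entries expire lazily when they are read.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs = 5 * 60 * 1000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // drop the stale entry on access
      return undefined;
    }
    return entry.value;
  }
}
```

Lazy expiry keeps the sketch short; a long-running process would likely also want a size bound or periodic sweep, which is omitted here.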

Channel Adapters (Messaging Platforms)

All adapters implement ChannelAdapter interface (src/channels/types.ts): connect(), disconnect(), send(), onMessage().
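
The four methods named above suggest an interface along the following lines. The parameter and message shapes are guesses, since src/channels/types.ts is not reproduced here, and LoopbackAdapter is a hypothetical in-memory adapter included only to show the contract in use.

```typescript
// Assumed message shape; the real type may carry attachments, IDs, and metadata.
interface InboundMessage {
  channel: string;
  peer: string;
  text: string;
}

type MessageHandler = (message: InboundMessage) => void | Promise<void>;

// Method names from this document; signatures are assumptions.
interface ChannelAdapter {
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  send(peer: string, text: string): Promise<void>;
  onMessage(handler: MessageHandler): void;
}

// Hypothetical adapter that records outgoing messages and lets callers inject incoming ones.
class LoopbackAdapter implements ChannelAdapter {
  readonly sent: { peer: string; text: string }[] = [];
  private handlers: MessageHandler[] = [];
  private connected = false;

  async connect(): Promise<void> {
    this.connected = true;
  }

  async disconnect(): Promise<void> {
    this.connected = false;
    this.handlers = [];
  }

  async send(peer: string, text: string): Promise<void> {
    if (!this.connected) throw new Error("adapter is not connected");
    this.sent.push({ peer, text });
  }

  onMessage(handler: MessageHandler): void {
    this.handlers.push(handler);
  }

  // Simulate a message arriving from the platform.
  inject(peer: string, text: string): void {
    for (const handler of this.handlers) {
      void handler({ channel: "loopback", peer, text });
    }
  }
}
```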

Telegram:

SDK: grammy (src/channels/telegram/)
Auth: Bot token via telegram.bot_token config
Features: Long polling, chat ID allowlist, mention requirement, pairing codes, image/audio attachments

Discord:

SDK: discord.js (src/channels/discord/)
Auth: Bot token via discord.bot_token config
Features: Guild/channel allowlists, mention requirement, pairing codes

Slack:

SDK: @slack/bolt (src/channels/slack/)
Auth: bot_token, app_token, signing_secret in config
Features: Socket mode, channel allowlists, mention requirement, pairing codes

WhatsApp:

SDK: whatsapp-web.js (src/channels/whatsapp/)
Auth: QR code scanning (web client emulation)
Features: Number/group allowlists, mention requirement, custom data directory, pairing codes

WebChat:

Implementation: Gateway WebSocket bridge (src/channels/webchat/)
Auth: Gateway token or Tailscale identity
UI: Vanilla JS dashboard at src/gateway/ui/ (HTML + CSS + JS, no framework)

Authentication & Identity

GitHub OAuth (Device Flow):

Implementation: src/auth/github.ts
Client ID: Ov23li8tweQw6odWQebz (GitHub Copilot)
Flow: Device code → User authorization → Token polling
Storage: ~/.config/flynn/auth.json (file mode 0600, readable and writable by the owner only)
Priority: GITHUB_TOKEN env → stored OAuth token → null
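
The token priority above (environment variable, then stored OAuth token, then null) can be sketched as a single lookup function. The resolveGithubToken name and the `token` field inside auth.json are assumptions; the actual file layout is defined in src/auth/github.ts.

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

const DEFAULT_AUTH_PATH = join(homedir(), ".config", "flynn", "auth.json");

// Hypothetical resolver following the documented priority order.
function resolveGithubToken(
  env: Record<string, string | undefined> = process.env,
  authPath: string = DEFAULT_AUTH_PATH,
): string | null {
  // 1. The environment variable wins when set and non-empty.
  const fromEnv = env["GITHUB_TOKEN"];
  if (fromEnv) return fromEnv;

  // 2. Otherwise fall back to the stored OAuth token, if the file exists and parses.
  try {
    const stored: unknown = JSON.parse(readFileSync(authPath, "utf8"));
    if (typeof stored === "object" && stored !== null) {
      // Assumption: the token is stored under a "token" key.
      const token = (stored as Record<string, unknown>)["token"];
      if (typeof token === "string" && token.length > 0) return token;
    }
  } catch {
    // Missing or unreadable file: fall through to null.
  }

  // 3. No credentials available.
  return null;
}
```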

Gateway Auth:

Static bearer token (server.token in config)
Tailscale identity header trust (server.tailscale_identity)
HTTP auth optional (server.auth_http)
Gateway lock: Single-client WebSocket mode (server.lock)

DM Pairing Codes:

Implementation: src/channels/pairing.ts, src/session/store.ts (SQLite persistence)
Purpose: Authenticate unknown senders via one-time codes
Config: pairing.enabled, pairing.code_ttl (default 5m), pairing.code_length (default 6)
Gateway handlers for code generation/verification
TUI /pair command execution (generate/list/revoke) in src/frontends/tui/minimal.ts
Persistence: PairingStore interface with SQLite pairing_approved table; approved senders survive daemon restarts
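
One-time pairing codes with a TTL can be sketched as follows. The defaults mirror pairing.code_length (6) and pairing.code_ttl (5m) from this document; the digits-only alphabet, the PendingCodes class, and the in-memory Map are assumptions, and persistence of approved senders is left out.

```typescript
import { randomInt } from "node:crypto";

// Generate a numeric code using a cryptographically secure random source.
function generatePairingCode(length = 6): string {
  let code = "";
  for (let i = 0; i < length; i++) {
    code += randomInt(0, 10).toString(); // one digit, 0 through 9
  }
  return code;
}

// Hypothetical in-memory registry of codes that have been issued but not yet redeemed.
class PendingCodes {
  private readonly pending = new Map<string, number>(); // code -> expiry timestamp (ms)
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs = 5 * 60 * 1000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  issue(length = 6): string {
    const code = generatePairingCode(length);
    this.pending.set(code, this.now() + this.ttlMs);
    return code;
  }

  // A code can be redeemed once, and only before it expires.
  redeem(code: string): boolean {
    const expiresAt = this.pending.get(code);
    if (expiresAt === undefined) return false;
    this.pending.delete(code); // consume the code whether or not it is still valid
    return this.now() < expiresAt;
  }
}
```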

Gmail OAuth2:

SDK: googleapis (src/automation/gmail.ts)
Credentials: ~/.config/flynn/gmail-credentials.json
Token: ~/.config/flynn/gmail-token.json
Setup: flynn gmail-auth CLI command

Automation

Cron Scheduler:

Library: croner (src/automation/cron.ts)
Config: automation.cron[] — each job has name, schedule, message, output.channel, output.peer
Implements ChannelAdapter to inject cron-triggered messages into the channel registry
Features: Enable/disable per job, timezone support, runtime management tools

Webhooks:

Implementation: src/automation/webhooks.ts
Auth: HMAC-SHA256 signature verification (X-Webhook-Signature header)
Templates: {{body}} and {{json.field}} placeholders
Route: POST /webhooks/{name} on the gateway HTTP server
Config: automation.webhooks[] with name, secret, message, output
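
Signature verification and template rendering for an incoming webhook can be sketched as below. The hex encoding of the signature, the optional `sha256=` prefix, and support for dotted paths such as {{json.a.b}} are assumptions; this document only states that HMAC-SHA256 is checked against the X-Webhook-Signature header and that {{body}} and {{json.field}} placeholders exist.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the hex HMAC-SHA256 of the raw request body.
function signBody(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Compare the header value with the expected signature in constant time.
function verifySignature(secret: string, rawBody: string, headerValue: string): boolean {
  const expected = Buffer.from(signBody(secret, rawBody), "hex");
  const given = Buffer.from(headerValue.replace(/^sha256=/, ""), "hex");
  // timingSafeEqual throws on length mismatch, so check the length first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

// Replace {{body}} with the raw body and {{json.path}} with a field from the parsed body.
function renderTemplate(template: string, rawBody: string): string {
  let json: unknown;
  try {
    json = JSON.parse(rawBody);
  } catch {
    json = undefined; // non-JSON bodies leave {{json.*}} placeholders empty
  }
  return template.replace(
    /\{\{\s*(body|json\.([\w.]+))\s*\}\}/g,
    (_match: string, key: string, path?: string) => {
      if (key === "body") return rawBody;
      let current: unknown = json;
      for (const part of (path ?? "").split(".")) {
        if (current === null || typeof current !== "object") return "";
        current = (current as Record<string, unknown>)[part];
      }
      return current === undefined || current === null ? "" : String(current);
    },
  );
}
```

Verifying against the raw body, before any JSON parsing, matters because re-serialized JSON rarely matches the bytes the sender signed.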

Gmail Watcher:

SDK: googleapis (src/automation/gmail.ts)
Modes: Pub/Sub push notifications or polling fallback
Pub/Sub topic: projects/flynn-agent/topics/gmail-push
Watch renewal: Every 6 days (Google watch expires at ~7 days)
Config: automation.gmail with watch_labels, poll_interval, history_start
Route: POST /gmail/push on gateway for Pub/Sub push

Heartbeat Monitor:

Implementation: src/automation/heartbeat.ts
Checks: gateway, model, channels, memory, disk
Config: automation.heartbeat with interval, checks, failure_threshold, disk_threshold_mb
Notification: Sends to configured channel/peer on failures

Web & Content Tools

Web Search (Brave / SearXNG):

Implementation: src/tools/builtin/web-search.ts
Brave Search API: https://api.search.brave.com/res/v1/web/search
- Auth: X-Subscription-Token header via web_search.api_key
SearXNG: Self-hosted instance via web_search.endpoint
- Auth: None (private instance)
Config: web_search.provider (brave or searxng), web_search.max_results

Web Fetch (Readability):

Libraries: linkedom, @mozilla/readability, turndown (src/tools/builtin/web-fetch.ts)
Features: HTML → Markdown conversion, article extraction, response caching (5min TTL)
Truncation: 50,000 character max

Browser Automation:

Library: puppeteer-core (src/tools/builtin/browser/)
Config: browser.executable_path or browser.ws_endpoint
Features: Headless browsing, page management, screenshots
Limits: browser.max_pages (default 5), browser.default_timeout (default 30s)

Audio Transcription

Whisper-Compatible API:

Implementation: src/models/media.ts
Endpoint: Configurable via audio.transcription_endpoint
Auth: audio.transcription_api_key (Bearer token)
Model: audio.transcription_model (default: whisper-1)
Supported formats: OGG, MP3, WAV, WebM, MP4, M4A
Integration: Auto-transcribes audio attachments from channels before model processing

Native Audio Passthrough:

Implementation: src/models/capabilities.ts, src/daemon/routing.ts
Capability check: supportsAudioInput(provider, model, override?) determines if a model can process raw audio
Audio-capable providers: Gemini (inlineData), OpenAI (input_audio), GitHub (input_audio)
Non-audio providers: Anthropic, Bedrock, Ollama, llama.cpp (fall back to Whisper transcription)
Config override: supports_audio: true/false per model tier overrides auto-detection
Smart routing: createMessageRouter() checks capability, passes raw AudioSource for capable models or transcribes via Whisper for others
Audio content types: AudioSource ({ type: 'audio', data: string, mimeType: string }) in src/models/types.ts
Token estimation: estimateAudioTokens() in src/context/tokens.ts (base64 length → decoded bytes → duration at an assumed 16 kbps bitrate → tokens at 32 per second)
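
The capability check, routing decision, and token estimate described above can be sketched together. The function names supportsAudioInput and estimateAudioTokens and the AudioSource shape come from this document; their bodies here are reconstructions. In particular, the sketch keys capability on provider only (the real check may also inspect the model), reads 16kbps as 16,000 bits per second, and rounds the token count up, all of which are assumptions. routeAudio is a hypothetical stand-in for the decision made inside createMessageRouter().

```typescript
interface AudioSource {
  type: "audio";
  data: string; // base64-encoded audio
  mimeType: string;
}

// Providers listed in this document as audio-capable.
const AUDIO_CAPABLE_PROVIDERS = new Set(["gemini", "openai", "github"]);

// A per-tier supports_audio override, when present, wins over auto-detection.
function supportsAudioInput(provider: string, _model: string, override?: boolean): boolean {
  if (override !== undefined) return override;
  return AUDIO_CAPABLE_PROVIDERS.has(provider);
}

type AudioRoute =
  | { kind: "native"; audio: AudioSource } // pass raw audio to the model
  | { kind: "transcribe"; audio: AudioSource }; // send through Whisper first

function routeAudio(provider: string, model: string, audio: AudioSource, override?: boolean): AudioRoute {
  return supportsAudioInput(provider, model, override)
    ? { kind: "native", audio }
    : { kind: "transcribe", audio };
}

// base64 length -> bytes -> seconds at 16 kbps -> tokens at 32 per second.
function estimateAudioTokens(base64: string): number {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  const bytes = Math.floor((base64.length * 3) / 4) - padding;
  const seconds = (bytes * 8) / 16000;
  return Math.ceil(seconds * 32);
}
```

Under these assumptions, 2,000 bytes of audio is estimated as one second and therefore 32 tokens.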

Agent Tool: audio.transcribe:

Implementation: src/tools/builtin/audio-transcribe.ts
Transcribes audio files on-demand via the configured Whisper-compatible endpoint
Input: file path or base64 data with MIME type
Output: transcribed text

MCP (Model Context Protocol)

MCP Client:

SDK: @modelcontextprotocol/sdk (src/mcp/client.ts)
Transport: stdio (spawns external processes)
Config: mcp.servers[] with name, command, args, env, cwd
Bridge: MCP tools auto-registered in Flynn's tool registry (src/mcp/bridge.ts)
Management: McpManager starts/stops all configured servers (src/mcp/manager.ts)

Docker Sandbox

Per-Session Containers:

Implementation: src/sandbox/manager.ts, src/sandbox/docker.ts
Config: sandbox.image (default: node:22-slim), sandbox.network (default: none), sandbox.memory_limit, sandbox.cpu_limit
Features: Lazily created per session, replaces shell.exec and process.start tools with sandboxed versions
Prerequisite: Docker daemon available

Networking & Exposure

Gateway Server:

Protocol: WebSocket (JSON-RPC) + HTTP (src/gateway/server.ts)
Default port: 18800
Binding: 127.0.0.1 (localhost only) or 0.0.0.0 (all network interfaces)
Features: LaneQueue for request ordering, session bridge, static file serving for dashboard

Tailscale Serve:

Implementation: src/gateway/tailscale.ts
Purpose: Expose gateway HTTPS endpoint on tailnet
Config: server.tailscale.serve, server.tailscale.hostname, server.tailscale.port
Prerequisite: Tailscale CLI installed and daemon running

Monitoring & Observability

Error Tracking:

None (console.error only)

Logging:

console.log / console.error / console.debug throughout
No structured logging framework

Cost Tracking:

Built-in: src/models/costs.ts with per-million-token pricing for known models
Tracks: Anthropic, OpenAI, Gemini, xAI, Bedrock models
GitHub Copilot models tracked at $0 (subscription-included)
Usage exposed via /usage command and gateway system.usage RPC
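
Per-million-token pricing reduces to a short calculation, sketched below. The ModelPricing shape and the prices used in the example are hypothetical and are not taken from src/models/costs.ts.

```typescript
// Prices are expressed in USD per one million tokens.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

function estimateCostUsd(pricing: ModelPricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens * pricing.inputPerMTok + outputTokens * pricing.outputPerMTok) / 1_000_000;
}

// Hypothetical price table for illustration only.
const EXAMPLE_PRICING: Record<string, ModelPricing> = {
  "example-paid-model": { inputPerMTok: 3, outputPerMTok: 15 },
  // Subscription-included models are tracked at zero cost.
  "example-copilot-model": { inputPerMTok: 0, outputPerMTok: 0 },
};
```

With the example prices, 500,000 input tokens and 100,000 output tokens cost 1.50 + 1.50 = 3.00 USD.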

CI/CD & Deployment

Hosting:

Self-hosted (designed for personal deployment)
Process supervisor expected for restarts (exit code 75 = restart signal)

CI Pipeline:

Not detected in repository

Environment Configuration

Required env vars (minimum viable):

ANTHROPIC_API_KEY (or other model provider key)
FLYNN_TELEGRAM_TOKEN (if using default Telegram channel)

Optional env vars (by feature):

OPENAI_API_KEY - OpenAI models and embeddings
GOOGLE_API_KEY - Gemini models and embeddings
GITHUB_TOKEN - GitHub Models / Copilot access
AWS_REGION - Bedrock region
OPENROUTER_API_KEY - OpenRouter access
ZHIPUAI_API_KEY - ZhipuAI access
XAI_API_KEY - xAI (Grok) access
VOYAGE_API_KEY - Voyage AI embeddings
FLYNN_DATA_DIR - Custom data directory

Secrets location:

API keys: YAML config (with ${ENV_VAR} expansion) or environment variables
OAuth tokens: ~/.config/flynn/auth.json (GitHub), ~/.config/flynn/gmail-token.json (Gmail)
.env.example present at project root
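
The ${ENV_VAR} expansion mentioned for YAML config values can be sketched with a single regular expression. The expandEnv name and the decision to throw on an unset variable are assumptions; Flynn may instead leave the placeholder in place or substitute an empty string.

```typescript
// Replace every ${NAME} in a config string with the value of that environment variable.
function expandEnv(
  value: string,
  env: Record<string, string | undefined> = process.env,
): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      // Assumption: fail loudly rather than silently producing an empty secret.
      throw new Error(`environment variable ${name} is not set`);
    }
    return resolved;
  });
}
```

Applied to a config value such as `${ANTHROPIC_API_KEY}`, this keeps the secret out of the YAML file while still letting the file declare where the key is used.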

Webhooks & Callbacks

Incoming:

POST /webhooks/{name} - Named webhooks with HMAC-SHA256 verification (src/automation/webhooks.ts)
POST /gmail/push - Google Pub/Sub push notifications for Gmail (src/automation/gmail.ts)

Outgoing:

None (no outbound webhooks — all communication goes through channel adapters)

Integration audit: 2026-02-09
