# External Integrations

**Analysis Date:** 2026-02-09

## AI Model Providers

Flynn supports 10 model providers via a unified `ModelClient` interface (`src/models/types.ts`). Each provider implements `chat()` and optionally `chatStream()`. The `ModelRouter` (`src/models/router.ts`) manages tier-based routing (fast/default/complex/local) with fallback chains.
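The routing idea above can be sketched as follows. This is a minimal illustration, not the code in `src/models/router.ts`: the `ChatRequest` shape, the `SketchRouter` class, and the single shared fallback list are all assumptions; only the tier names and the `chat()` / `chatStream()` method names come from the text.

```typescript
// Hypothetical shapes: the real definitions live in src/models/types.ts.
type Tier = "fast" | "default" | "complex" | "local";

interface ChatRequest {
  prompt: string;
}

interface ModelClient {
  name: string;
  chat(req: ChatRequest): Promise<string>;
  // Optional, mirroring "optionally chatStream()" in the text above.
  chatStream?(req: ChatRequest): AsyncIterable<string>;
}

class SketchRouter {
  private readonly tiers: Record<Tier, ModelClient>;
  private readonly fallbacks: ModelClient[];

  constructor(tiers: Record<Tier, ModelClient>, fallbacks: ModelClient[] = []) {
    this.tiers = tiers;
    this.fallbacks = fallbacks;
  }

  // Try the tier's primary client first, then each fallback in order.
  async chat(tier: Tier, req: ChatRequest): Promise<{ from: string; text: string }> {
    const chain = [this.tiers[tier], ...this.fallbacks];
    let lastError: unknown;
    for (const client of chain) {
      try {
        return { from: client.name, text: await client.chat(req) };
      } catch (err) {
        lastError = err; // remember the failure and move on to the next client
      }
    }
    throw new Error(`all clients failed for tier "${tier}": ${String(lastError)}`);
  }
}
```

Returning the name of the client that answered makes it easy to log when a fallback was used.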

**Anthropic:**
- SDK: `@anthropic-ai/sdk` (`src/models/anthropic.ts`)
- Auth: `ANTHROPIC_API_KEY` env var or `api_key` in config
- Features: Streaming, tool use, extended thinking mode, multimodal (images)
- Extended thinking: `{ type: 'enabled', budget_tokens: 4096 }` on request

**OpenAI:**
- SDK: `openai` (`src/models/openai.ts`)
- Auth: `OPENAI_API_KEY` env var or `api_key` in config
- Features: Tool use, multimodal (images via data URIs or URLs)
- Also powers: OpenRouter, ZhipuAI, xAI via `baseURL` override

**Google Gemini:**
- SDK: `@google/generative-ai` (`src/models/gemini.ts`)
- Auth: `GOOGLE_API_KEY` env var or `api_key` in config
- Features: Streaming, tool use, extended thinking, multimodal

**AWS Bedrock:**
- SDK: `@aws-sdk/client-bedrock-runtime` (`src/models/bedrock.ts`)
- Auth: `AWS_REGION` env var + IAM credentials or explicit `accessKeyId`/`secretAccessKey` in config
- Features: Streaming (ConverseStream), tool use, multimodal
- Models: Meta Llama, Amazon Titan (cost-tracked in `src/models/costs.ts`)

**GitHub Models (Copilot):**
- SDK: `openai` (OpenAI-compatible API) (`src/models/github.ts`)
- Auth: `GITHUB_TOKEN` env var or OAuth device flow (`src/auth/github.ts`)
- Endpoint: `https://api.githubcopilot.com`
- Auto-fallback: When an Anthropic tier fails, Flynn automatically tries the same model via GitHub Models before the global fallback chain (`src/daemon/index.ts` `createAutoFallbackClient()`)
- OAuth device flow: Uses client ID `Ov23li8tweQw6odWQebz`, stores token at `~/.config/flynn/auth.json`

**OpenRouter:**
- SDK: `openai` with `baseURL: https://openrouter.ai/api/v1` (`src/daemon/index.ts`)
- Auth: `OPENROUTER_API_KEY` env var or `api_key` in config

**ZhipuAI:**
- SDK: `openai` with `baseURL: https://api.z.ai/api/paas/v4` (`src/daemon/index.ts`)
- Auth: `ZHIPUAI_API_KEY` env var or `api_key` in config

**xAI (Grok):**
- SDK: `openai` with `baseURL: https://api.x.ai/v1` (`src/daemon/index.ts`)
- Auth: `XAI_API_KEY` env var or `api_key` in config
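The three OpenAI-compatible providers above differ only in base URL and key source, which suggests a small lookup table. The sketch below is an assumption about how that could be organized: the `CompatProvider` record and `resolveClientOptions` helper are hypothetical, and the rule that a config key wins over the environment variable is a guess, since the entries above only say "env var or `api_key` in config".

```typescript
// Base URLs and env var names are copied from the provider entries above;
// the record shape and the helper are hypothetical.
interface CompatProvider {
  baseURL: string;
  apiKeyEnv: string;
}

const OPENAI_COMPATIBLE: Record<string, CompatProvider> = {
  openrouter: { baseURL: "https://openrouter.ai/api/v1", apiKeyEnv: "OPENROUTER_API_KEY" },
  zhipuai: { baseURL: "https://api.z.ai/api/paas/v4", apiKeyEnv: "ZHIPUAI_API_KEY" },
  xai: { baseURL: "https://api.x.ai/v1", apiKeyEnv: "XAI_API_KEY" },
};

// Resolve constructor options for an OpenAI-style client.
// Assumption: an explicit config key takes precedence over the env var.
function resolveClientOptions(
  provider: string,
  env: Record<string, string | undefined>,
  configKey?: string,
): { baseURL: string; apiKey: string } {
  const entry = OPENAI_COMPATIBLE[provider];
  if (!entry) throw new Error(`unknown provider: ${provider}`);
  const apiKey = configKey ?? env[entry.apiKeyEnv];
  if (!apiKey) throw new Error(`missing API key: set ${entry.apiKeyEnv} or api_key`);
  return { baseURL: entry.baseURL, apiKey };
}
```

The returned object matches the `baseURL` and `apiKey` options accepted by the `openai` SDK constructor, so it could be passed straight to `new OpenAI(...)`.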

**Ollama (Local):**
- SDK: `ollama` (`src/models/local/ollama.ts`)
- Auth: None (local server)
- Endpoint: Configurable `host` (default: `http://localhost:11434`)
- Config: `num_gpu` option for GPU layer control

**llama.cpp (Local):**
- SDK: Raw `fetch` HTTP calls (`src/models/local/llamacpp.ts`)
- Auth: Optional `auth_token` header
- Endpoint: Configurable (default: `http://localhost:8080`)

## Embedding Providers

Embedding providers (`src/memory/embeddings.ts`) power the hybrid vector + keyword search system. Factory function: `createEmbeddingProvider()`.

**OpenAI Embeddings:**
- SDK: `openai` (lazy import)
- Auth: `OPENAI_API_KEY` or config `api_key`
- Default model: `text-embedding-3-small`, default dims: 1536

**Gemini Embeddings:**
- SDK: `@google/generative-ai` (lazy import)
- Auth: `GOOGLE_API_KEY` or config `api_key`
- Uses `batchEmbedContents` for efficiency, default dims: 768

**Ollama Embeddings:**
- SDK: `ollama` (lazy import)
- Auth: None (local)
- Configurable host endpoint, default dims: 768

**LlamaCpp Embeddings:**
- SDK: Raw `fetch` to `/embedding` endpoint
- Auth: None
- Default endpoint: `http://localhost:8080`, default dims: 768

**Voyage AI Embeddings:**
- SDK: `openai` (OpenAI-compatible API, lazy import)
- Auth: `VOYAGE_API_KEY` env var or config `api_key`
- Endpoint: `https://api.voyageai.com/v1`, default dims: 1024

## Data Storage

**Session Database (SQLite):**
- Library: `better-sqlite3` (`src/session/store.ts`)
- Location: `{dataDir}/sessions.db`
- Schema: `messages` table with `id`, `session_id`, `role`, `content`, `created_at`
- TTL-based pruning: Configurable via `sessions.ttl` (default: 30 days), hourly cleanup

**Vector Database (SQLite):**
- Library: `better-sqlite3` (`src/memory/vector-store.ts`)
- Location: `{dataDir}/vectors.db`
- Stores embedding chunks as `Float32Array` BLOBs
- Content hashing for deduplication
- Background indexer runs every 30 seconds
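Storing a `Float32Array` as a BLOB and deduplicating by content hash can be sketched as below. The helper names are hypothetical, and SHA-256 is an assumption: the list above says only that content is hashed, not which algorithm `src/memory/vector-store.ts` uses.

```typescript
import { createHash } from "node:crypto";

// Pack an embedding into bytes suitable for a SQLite BLOB column.
function encodeEmbedding(vec: Float32Array): Uint8Array {
  // Slice so the result does not alias the caller's buffer.
  return new Uint8Array(vec.buffer.slice(vec.byteOffset, vec.byteOffset + vec.byteLength));
}

// Unpack a BLOB back into a Float32Array (4 bytes per component).
function decodeEmbedding(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) throw new RangeError("BLOB length is not a multiple of 4");
  // Copy into a fresh buffer: a Node Buffer may be an unaligned view into a shared pool.
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}

// Content hash used as a deduplication key (SHA-256 is an assumption).
function contentHash(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hypothetical in-memory stand-in for the table: skip chunks already indexed.
function indexChunk(store: Map<string, Uint8Array>, text: string, vec: Float32Array): boolean {
  const key = contentHash(text);
  if (store.has(key)) return false; // duplicate content, nothing to do
  store.set(key, encodeEmbedding(vec));
  return true;
}
```

Copying on decode costs one allocation per read but avoids alignment errors when the driver hands back a pooled `Buffer`.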

**Memory Store (Filesystem):**
- Location: `{dataDir}/memory/` (`src/memory/store.ts`)
- Format: Markdown files organized by namespace
- Layout: `global.md`, `user.md`, `sessions/{id}.md`
- Hybrid search: Keyword + vector (configurable weight via `hybrid_weight`, default 0.7)
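One common way to combine keyword and vector results is a weighted sum of the two scores. The sketch below assumes that `hybrid_weight` is the share given to the vector score and that both scores are normalized to [0, 1]; neither detail is stated above, so treat the formula as illustrative rather than as what `src/memory/store.ts` computes.

```typescript
// Cosine similarity between two embeddings of equal length.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new RangeError("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // Treat a zero vector as having no similarity to anything.
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Blend a vector score and a keyword score, both assumed to be in [0, 1].
// Assumption: hybrid_weight is the share given to the vector score.
function hybridScore(vectorScore: number, keywordScore: number, hybridWeight: number = 0.7): number {
  if (hybridWeight < 0 || hybridWeight > 1) {
    throw new RangeError("hybrid_weight must be in [0, 1]");
  }
  return hybridWeight * vectorScore + (1 - hybridWeight) * keywordScore;
}
```

With the default weight, a chunk that matches only semantically scores 0.7 and one that matches only by keyword scores 0.3, so semantic matches rank first unless the keyword match is much stronger.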

**File Storage:**
- Local filesystem only (no cloud object storage)

**Caching:**
- In-memory response cache for web fetch tool (5-minute TTL) (`src/tools/builtin/web-fetch.ts`)
- No external cache service (Redis, etc.)

## Channel Adapters (Messaging Platforms)

All adapters implement the `ChannelAdapter` interface (`src/channels/types.ts`): `connect()`, `disconnect()`, `send()`, `onMessage()`.
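The four methods named above could look roughly like this. Only the method names come from the text; the parameter lists, the `InboundMessage` shape, and the `LoopbackAdapter` test double are assumptions for illustration, not the definitions in `src/channels/types.ts`.

```typescript
// Hypothetical message shape; only the four method names come from the text above.
interface InboundMessage {
  peer: string;
  text: string;
}

type MessageHandler = (msg: InboundMessage) => void;

interface ChannelAdapter {
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  send(peer: string, text: string): Promise<void>;
  onMessage(handler: MessageHandler): void;
}

// In-memory adapter, useful as a test double for code that consumes adapters.
class LoopbackAdapter implements ChannelAdapter {
  private readonly handlers: MessageHandler[] = [];
  private connected = false;
  readonly sent: Array<{ peer: string; text: string }> = [];

  async connect(): Promise<void> {
    this.connected = true;
  }

  async disconnect(): Promise<void> {
    this.connected = false;
  }

  async send(peer: string, text: string): Promise<void> {
    if (!this.connected) throw new Error("not connected");
    this.sent.push({ peer, text });
  }

  onMessage(handler: MessageHandler): void {
    this.handlers.push(handler);
  }

  // Simulate a message arriving from the platform.
  receive(msg: InboundMessage): void {
    for (const handler of this.handlers) handler(msg);
  }
}
```

Because the cron scheduler is described later as implementing the same interface, a double like this can stand in for any message source in tests.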

**Telegram:**
- SDK: `grammy` (`src/channels/telegram/`)
- Auth: Bot token via `telegram.bot_token` config
- Features: Long polling, chat ID allowlist, mention requirement, pairing codes, image/audio attachments

**Discord:**
- SDK: `discord.js` (`src/channels/discord/`)
- Auth: Bot token via `discord.bot_token` config
- Features: Guild/channel allowlists, mention requirement, pairing codes

**Slack:**
- SDK: `@slack/bolt` (`src/channels/slack/`)
- Auth: `bot_token`, `app_token`, `signing_secret` in config
- Features: Socket mode, channel allowlists, mention requirement, pairing codes

**WhatsApp:**
- SDK: `whatsapp-web.js` (`src/channels/whatsapp/`)
- Auth: QR code scanning (web client emulation)
- Features: Number/group allowlists, mention requirement, custom data directory, pairing codes

**WebChat:**
- Implementation: Gateway WebSocket bridge (`src/channels/webchat/`)
- Auth: Gateway token or Tailscale identity
- UI: Vanilla JS dashboard at `src/gateway/ui/` (HTML + CSS + JS, no framework)

## Authentication & Identity

**GitHub OAuth (Device Flow):**
- Implementation: `src/auth/github.ts`
- Client ID: `Ov23li8tweQw6odWQebz` (GitHub Copilot)
- Flow: Device code → User authorization → Token polling
- Storage: `~/.config/flynn/auth.json` (600 permissions)
- Priority: `GITHUB_TOKEN` env → stored OAuth token → `null`
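The token priority in the last bullet can be written as a small resolver. This is a sketch under assumptions: `resolveGithubToken` and the injected `readStored` callback are hypothetical names, with the callback standing in for whatever reads `~/.config/flynn/auth.json` in `src/auth/github.ts`.

```typescript
// Resolve a GitHub token using the priority listed above:
// GITHUB_TOKEN env var, then the stored OAuth token, then null.
// `readStored` is a hypothetical hook standing in for reading the auth file.
function resolveGithubToken(
  env: Record<string, string | undefined>,
  readStored: () => string | null,
): string | null {
  const fromEnv = env["GITHUB_TOKEN"]?.trim();
  if (fromEnv) return fromEnv;
  try {
    return readStored() || null;
  } catch {
    return null; // unreadable or missing auth file is treated as "no token"
  }
}
```

Injecting the reader keeps the priority logic testable without touching the filesystem.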

**Gateway Auth:**
- Static bearer token (`server.token` in config)
- Tailscale identity header trust (`server.tailscale_identity`)
- HTTP auth optional (`server.auth_http`)
- Gateway lock: Single-client WebSocket mode (`server.lock`)

**DM Pairing Codes:**
- Implementation: `src/channels/pairing.ts`, `src/session/store.ts` (SQLite persistence)
- Purpose: Authenticate unknown senders via one-time codes
- Config: `pairing.enabled`, `pairing.code_ttl` (default 5m), `pairing.code_length` (default 6)
- Gateway handlers for code generation/verification
- TUI `/pair` command execution (generate/list/revoke) in `src/frontends/tui/minimal.ts`
- Persistence: `PairingStore` interface with SQLite `pairing_approved` table, so approved senders survive daemon restarts
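A one-time pairing code with a TTL can be sketched as follows. The defaults (length 6, TTL 5 minutes) come from the config bullet above; the numeric alphabet, the `PairingRegistry` class, and the in-memory `approved` set (standing in for the `pairing_approved` table) are assumptions.

```typescript
import { randomInt } from "node:crypto";

// Numeric codes are an assumption; the source only gives length and TTL defaults.
function generateCode(length: number = 6): string {
  let code = "";
  for (let i = 0; i < length; i++) code += randomInt(0, 10).toString();
  return code;
}

class PairingRegistry {
  private readonly pending = new Map<string, number>(); // code -> expiry timestamp (ms)
  readonly approved = new Set<string>(); // stands in for the pairing_approved table
  private readonly ttlMs: number;
  private readonly codeLength: number;

  constructor(ttlMs: number = 5 * 60_000, codeLength: number = 6) {
    this.ttlMs = ttlMs;
    this.codeLength = codeLength;
  }

  issue(now: number = Date.now()): string {
    const code = generateCode(this.codeLength);
    this.pending.set(code, now + this.ttlMs);
    return code;
  }

  // One-time use: a known code is consumed on its first verification attempt.
  verify(code: string, sender: string, now: number = Date.now()): boolean {
    const expiresAt = this.pending.get(code);
    if (expiresAt === undefined) return false;
    this.pending.delete(code);
    if (now > expiresAt) return false;
    this.approved.add(sender);
    return true;
  }
}
```

`randomInt` draws from a cryptographically secure source, which matters here because the code is the only thing an unknown sender has to present.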

**Gmail OAuth2:**
- SDK: `googleapis` (`src/automation/gmail.ts`)
- Credentials: `~/.config/flynn/gmail-credentials.json`
- Token: `~/.config/flynn/gmail-token.json`
- Setup: `flynn gmail-auth` CLI command

## Automation

**Cron Scheduler:**
- Library: `croner` (`src/automation/cron.ts`)
- Config: `automation.cron[]`, where each job has `name`, `schedule`, `message`, `output.channel`, `output.peer`
- Implements `ChannelAdapter` to inject cron-triggered messages into the channel registry
- Features: Enable/disable per job, timezone support, runtime management tools

**Webhooks:**
- Implementation: `src/automation/webhooks.ts`
- Auth: HMAC-SHA256 signature verification (`X-Webhook-Signature` header)
- Templates: `{{body}}` and `{{json.field}}` placeholders
- Route: `POST /webhooks/{name}` on the gateway HTTP server
- Config: `automation.webhooks[]` with `name`, `secret`, `message`, `output`
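Signature verification and template rendering for a webhook can be sketched as below. Several details are assumptions rather than facts about `src/automation/webhooks.ts`: the signature is taken to be a bare hex digest of the raw body, dotted paths such as `{{json.repo.name}}` are allowed, and missing fields render as an empty string.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hex HMAC-SHA256 of the raw request body (hex, unprefixed: an assumption).
function signBody(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Compare the X-Webhook-Signature header value against the expected digest.
function verifySignature(secret: string, rawBody: string, header: string | undefined): boolean {
  if (!header) return false;
  const expected = Buffer.from(signBody(secret, rawBody), "utf8");
  const received = Buffer.from(header.trim(), "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Replace {{body}} with the raw body and {{json.a.b}} with a field of the parsed JSON.
function renderTemplate(template: string, rawBody: string): string {
  let json: unknown;
  try {
    json = JSON.parse(rawBody);
  } catch {
    json = undefined; // non-JSON bodies still support {{body}}
  }
  return template.replace(/\{\{\s*(body|json\.[\w.]+)\s*\}\}/g, (_match, key: string) => {
    if (key === "body") return rawBody;
    let value: unknown = json;
    for (const part of key.slice("json.".length).split(".")) {
      value =
        value !== null && typeof value === "object"
          ? (value as Record<string, unknown>)[part]
          : undefined;
    }
    if (value === undefined) return "";
    return typeof value === "string" ? value : JSON.stringify(value);
  });
}
```

Verification must run against the raw bytes received, before any JSON parsing, since re-serializing the body would change the digest.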

**Gmail Watcher:**
- SDK: `googleapis` (`src/automation/gmail.ts`)
- Modes: Pub/Sub push notifications or polling fallback
- Pub/Sub topic: `projects/flynn-agent/topics/gmail-push`
- Watch renewal: Every 6 days (Google watch expires at ~7 days)
- Config: `automation.gmail` with `watch_labels`, `poll_interval`, `history_start`
- Route: `POST /gmail/push` on gateway for Pub/Sub push

**Heartbeat Monitor:**
- Implementation: `src/automation/heartbeat.ts`
- Checks: gateway, model, channels, memory, disk
- Config: `automation.heartbeat` with `interval`, `checks`, `failure_threshold`, `disk_threshold_mb`
- Notification: Sends to configured channel/peer on failures

## Web & Content Tools

**Web Search (Brave / SearXNG):**
- Implementation: `src/tools/builtin/web-search.ts`
- Brave Search API: `https://api.search.brave.com/res/v1/web/search`
  - Auth: `X-Subscription-Token` header via `web_search.api_key`
- SearXNG: Self-hosted instance via `web_search.endpoint`
  - Auth: None (private instance)
- Config: `web_search.provider` (`brave` or `searxng`), `web_search.max_results`

**Web Fetch (Readability):**
- Libraries: `linkedom`, `@mozilla/readability`, `turndown` (`src/tools/builtin/web-fetch.ts`)
- Features: HTML → Markdown conversion, article extraction, response caching (5min TTL)
- Truncation: 50,000 character max
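The caching and truncation behaviour described above can be sketched with a small TTL map. The 5-minute TTL and 50,000-character limit come from the list; the `TtlCache` class, the lazy eviction on read, and the `truncate` helper are assumptions, not the code in `src/tools/builtin/web-fetch.ts`.

```typescript
// Minimal in-memory cache whose entries expire after a fixed TTL.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number = 5 * 60_000) {
    this.ttlMs = ttlMs;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (now >= hit.expiresAt) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

const MAX_CHARS = 50_000;

// Cap converted Markdown at the configured character limit.
function truncate(text: string, max: number = MAX_CHARS): string {
  return text.length > max ? text.slice(0, max) : text;
}
```

Passing `now` explicitly keeps expiry deterministic in tests; callers would normally rely on the `Date.now()` default.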

**Browser Automation:**
- Library: `puppeteer-core` (`src/tools/builtin/browser/`)
- Config: `browser.executable_path` or `browser.ws_endpoint`
- Features: Headless browsing, page management, screenshots
- Limits: `browser.max_pages` (default 5), `browser.default_timeout` (default 30s)

## Audio Transcription

**Whisper-Compatible API:**
- Implementation: `src/models/media.ts`
- Endpoint: Configurable via `audio.transcription_endpoint`
- Auth: `audio.transcription_api_key` (Bearer token)
- Model: `audio.transcription_model` (default: `whisper-1`)
- Supported formats: OGG, MP3, WAV, WebM, MP4, M4A
- Integration: Auto-transcribes audio attachments from channels before model processing when the target model lacks native audio input (see Native Audio Passthrough)

**Native Audio Passthrough:**
- Implementation: `src/models/capabilities.ts`, `src/daemon/routing.ts`
- Capability check: `supportsAudioInput(provider, model, override?)` determines if a model can process raw audio
- Audio-capable providers: Gemini (`inlineData`), OpenAI (`input_audio`), GitHub (`input_audio`)
- Non-audio providers: Anthropic, Bedrock, Ollama, llama.cpp (fall back to Whisper transcription)
- Config override: `supports_audio: true/false` per model tier overrides auto-detection
- Smart routing: `createMessageRouter()` checks capability, passes raw `AudioSource` for capable models or transcribes via Whisper for others
- Audio content types: `AudioSource` (`{ type: 'audio', data: string, mimeType: string }`) in `src/models/types.ts`
- Token estimation: `estimateAudioTokens()` in `src/context/tokens.ts` (base64 length -> bytes -> duration at 16kbps -> tokens at 32/sec)
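The capability check, routing decision, and token estimate described above can be sketched as follows. Function names and the `AudioSource` shape come from the list; everything else is an assumption, including the lowercase provider identifiers, provider-level (rather than model-level) detection, the padding correction, rounding up, and the `routeAudio` helper standing in for the relevant part of `createMessageRouter()`.

```typescript
type AudioSource = { type: "audio"; data: string; mimeType: string };

// Provider identifiers are assumed; the source lists Gemini, OpenAI and GitHub as capable.
const AUDIO_PROVIDERS = new Set(["gemini", "openai", "github"]);

// Provider-level detection only; the real check may also inspect the model name.
function supportsAudioInput(provider: string, _model: string, override?: boolean): boolean {
  if (override !== undefined) return override; // supports_audio in config wins
  return AUDIO_PROVIDERS.has(provider);
}

// base64 length -> bytes -> duration at 16 kbps -> tokens at 32 per second.
function estimateAudioTokens(base64: string): number {
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  const bytes = Math.floor((base64.length * 3) / 4) - padding;
  const seconds = (bytes * 8) / 16_000;
  return Math.ceil(seconds * 32);
}

type RoutedAudio = { kind: "audio"; source: AudioSource } | { kind: "text"; text: string };

// Pass raw audio to capable models; otherwise transcribe first.
async function routeAudio(
  source: AudioSource,
  provider: string,
  model: string,
  transcribe: (source: AudioSource) => Promise<string>,
  override?: boolean,
): Promise<RoutedAudio> {
  if (supportsAudioInput(provider, model, override)) return { kind: "audio", source };
  return { kind: "text", text: await transcribe(source) };
}
```

As a worked example of the estimate: 8,000 base64 characters decode to 6,000 bytes, which is 48,000 bits, or 3 seconds at 16 kbps, giving 96 tokens.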

**Agent Tool: audio.transcribe:**
- Implementation: `src/tools/builtin/audio-transcribe.ts`
- Transcribes audio files on-demand via the configured Whisper-compatible endpoint
- Input: file path or base64 data with MIME type
- Output: transcribed text

## MCP (Model Context Protocol)

**MCP Client:**
- SDK: `@modelcontextprotocol/sdk` (`src/mcp/client.ts`)
- Transport: stdio (spawns external processes)
- Config: `mcp.servers[]` with `name`, `command`, `args`, `env`, `cwd`
- Bridge: MCP tools auto-registered in Flynn's tool registry (`src/mcp/bridge.ts`)
- Management: `McpManager` starts/stops all configured servers (`src/mcp/manager.ts`)

## Docker Sandbox

**Per-Session Containers:**
- Implementation: `src/sandbox/manager.ts`, `src/sandbox/docker.ts`
- Config: `sandbox.image` (default: `node:22-slim`), `sandbox.network` (default: `none`), `sandbox.memory_limit`, `sandbox.cpu_limit`
- Features: Lazily created per session, replaces `shell.exec` and `process.start` tools with sandboxed versions
- Prerequisite: Docker daemon available

## Networking & Exposure

**Gateway Server:**
- Protocol: WebSocket (JSON-RPC) + HTTP (`src/gateway/server.ts`)
- Default port: 18800
- Binding: `127.0.0.1` (localhost only) or `0.0.0.0`
- Features: LaneQueue for request ordering, session bridge, static file serving for dashboard
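The source does not describe how LaneQueue orders requests. One plausible reading is a per-lane FIFO: tasks on the same lane (for example, the same session) run one at a time in arrival order, while different lanes proceed independently. The sketch below implements that reading by chaining promises; the class name and the lane-key semantics are assumptions.

```typescript
// Per-lane serial queue: tasks on one lane run strictly in enqueue order.
class LaneQueueSketch {
  private readonly tails = new Map<string, Promise<void>>();

  // Run `task` after every earlier task on the same lane has settled.
  enqueue<T>(lane: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(lane) ?? Promise.resolve();
    const result = previous.then(task);
    // The stored tail never rejects, so one failure does not block the lane.
    const tail = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(lane, tail);
    // Drop the lane entry once it is idle to avoid unbounded growth.
    void tail.then(() => {
      if (this.tails.get(lane) === tail) this.tails.delete(lane);
    });
    return result;
  }
}
```

The caller still sees a rejection from its own task through the returned promise; only the internal chain swallows it.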

**Tailscale Serve:**
- Implementation: `src/gateway/tailscale.ts`
- Purpose: Expose gateway HTTPS endpoint on tailnet
- Config: `server.tailscale.serve`, `server.tailscale.hostname`, `server.tailscale.port`
- Prerequisite: Tailscale CLI installed and daemon running

## Monitoring & Observability

**Error Tracking:**
- None (console.error only)

**Logging:**
- `console.log` / `console.error` / `console.debug` throughout
- No structured logging framework

**Cost Tracking:**
- Built-in: `src/models/costs.ts` with per-million-token pricing for known models
- Tracks: Anthropic, OpenAI, Gemini, xAI, Bedrock models
- GitHub Copilot models tracked at $0 (subscription-included)
- Usage exposed via `/usage` command and gateway `system.usage` RPC
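Per-million-token pricing reduces to a simple formula: cost = (input tokens / 1,000,000) × input price + (output tokens / 1,000,000) × output price. The sketch below applies it with placeholder model names and prices that are purely illustrative; the real table lives in `src/models/costs.ts` and is not reproduced here.

```typescript
// Prices in USD per million tokens.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Placeholder entries for illustration only; not real models or prices.
const PRICING: Record<string, Pricing> = {
  "example-paid-model": { inputPerMTok: 3, outputPerMTok: 15 },
  "example-copilot-model": { inputPerMTok: 0, outputPerMTok: 0 }, // subscription-included
};

// Returns undefined for unknown models rather than guessing a price.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number | undefined {
  const pricing = PRICING[model];
  if (!pricing) return undefined;
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMTok +
    (outputTokens / 1_000_000) * pricing.outputPerMTok
  );
}
```

With the placeholder prices, 1,000,000 input tokens and 500,000 output tokens cost 3 + 7.5 = 10.5 USD.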

## CI/CD & Deployment

**Hosting:**
- Self-hosted (designed for personal deployment)
- Process supervisor expected for restarts (exit code 75 = restart signal)

**CI Pipeline:**
- Not detected in repository

## Environment Configuration

**Required env vars (minimum viable):**
- `ANTHROPIC_API_KEY` (or other model provider key)
- `FLYNN_TELEGRAM_TOKEN` (if using default Telegram channel)

**Optional env vars (by feature):**
- `OPENAI_API_KEY` - OpenAI models and embeddings
- `GOOGLE_API_KEY` - Gemini models and embeddings
- `GITHUB_TOKEN` - GitHub Models / Copilot access
- `AWS_REGION` - Bedrock region
- `OPENROUTER_API_KEY` - OpenRouter access
- `ZHIPUAI_API_KEY` - ZhipuAI access
- `XAI_API_KEY` - xAI (Grok) access
- `VOYAGE_API_KEY` - Voyage AI embeddings
- `FLYNN_DATA_DIR` - Custom data directory

**Secrets location:**
- API keys: YAML config (with `${ENV_VAR}` expansion) or environment variables
- OAuth tokens: `~/.config/flynn/auth.json` (GitHub), `~/.config/flynn/gmail-token.json` (Gmail)
- `.env.example` present at project root
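The `${ENV_VAR}` expansion mentioned above can be sketched as a single regular-expression pass over each config string. The `expandEnv` helper is hypothetical, and failing loudly on an unset variable is an assumption; the actual loader may substitute an empty string or support defaults.

```typescript
// Expand ${NAME} references in a config string from the given environment.
// Assumption: an unset variable is an error rather than an empty string.
function expandEnv(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      throw new Error(`environment variable ${name} is not set`);
    }
    return resolved;
  });
}
```

For example, a config value of `${ANTHROPIC_API_KEY}` would be replaced by the key from the process environment before the provider client is constructed.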

## Webhooks & Callbacks

**Incoming:**
- `POST /webhooks/{name}` - Named webhooks with HMAC-SHA256 verification (`src/automation/webhooks.ts`)
- `POST /gmail/push` - Google Pub/Sub push notifications for Gmail (`src/automation/gmail.ts`)

**Outgoing:**
- None (no outbound webhooks; all communication goes through channel adapters)

---

*Integration audit: 2026-02-09*