William Valentin 5c531a760d docs: document native audio support across README, CHANGELOG, config, and planning docs

- README: add audio.transcribe to tool list, update media pipeline description,
  add Native Audio Support and Audio Transcription config sections, add
  supports_audio per-tier override example
- SOUL.md: add audio.transcribe to available tools list
- CHANGELOG: add native audio support and audio.transcribe tool entries
- config/default.yaml: add commented audio config section, supports_audio hint
- INTEGRATIONS.md: expand audio section with native passthrough, capabilities,
  smart routing, AudioSource type, token estimation, audio.transcribe tool
- STRUCTURE.md: add capabilities.ts and audio-transcribe.ts to key file listings
- ARCHITECTURE.md: update data flow step 5 to describe smart audio routing

2026-02-11 18:41:53 -08:00


Architecture

Analysis Date: 2025-02-09

Pattern Overview

Overall: Multi-channel AI agent daemon with layered pipeline architecture

Key Characteristics:

Message pipeline: Channel Adapter → ChannelRegistry → MessageRouter → AgentOrchestrator → NativeAgent → ModelClient
Registry/factory pattern for extensible channels, tools, models, and agent configs
Dependency injection via constructor config objects (no DI container)
YAML + Zod config-driven feature toggling — most subsystems are optional and activated by config
Lifecycle-managed daemon with ordered shutdown handlers (LIFO)
SQLite for session persistence, filesystem for memory persistence, SQLite for vector embeddings
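
The ordered (LIFO) shutdown mentioned above can be sketched roughly as follows. This is a minimal illustration, not Flynn's actual lifecycle code: the `Lifecycle` class, `onShutdown`, and `shutdown` are hypothetical names, and returning the list of failed handlers is an assumption made so that the behaviour is easy to observe.

```typescript
// Hypothetical sketch: names and return shape are illustrative assumptions.
type ShutdownHandler = () => void | Promise<void>;

class Lifecycle {
  private handlers: { name: string; fn: ShutdownHandler }[] = [];

  // Register a handler. Handlers registered later run earlier on shutdown.
  onShutdown(name: string, fn: ShutdownHandler): void {
    this.handlers.push({ name, fn });
  }

  // Run handlers in reverse registration order (LIFO).
  // Each handler is isolated so one failure does not block the rest.
  async shutdown(): Promise<string[]> {
    const failed: string[] = [];
    for (const { name, fn } of [...this.handlers].reverse()) {
      try {
        await fn();
      } catch {
        failed.push(name);
      }
    }
    this.handlers = [];
    return failed;
  }
}
```

Registering subsystems in startup order means that whatever started last (for example, channel adapters) is stopped first, before the stores it depends on are closed.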

Layers

CLI Layer:

Purpose: Parse commands, load config, bootstrap daemon or TUI
Location: src/cli/
Contains: Command registrations (commander.js), config loading, TUI entry
Depends on: Config, Daemon
Used by: End user via flynn binary
Entry: src/cli/index.ts → registers commands: start, tui, send, sessions, doctor, config, completion

Config Layer:

Purpose: Load YAML config, validate with Zod, expand ${ENV_VAR} references
Location: src/config/
Contains: Zod schema definitions (schema.ts), YAML loader with env expansion (loader.ts)
Depends on: zod, yaml
Used by: All layers — the Config type flows through the entire system
Key file: src/config/schema.ts — single source of truth for all configuration types (409 lines)
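
As a rough illustration of the ${ENV_VAR} expansion step, the sketch below walks an already-parsed config tree and substitutes references in every string. The `expandEnv` helper and the `ConfigValue` type are hypothetical, and throwing on a missing variable is an assumption; the real loader in loader.ts may behave differently (for example, by leaving the reference untouched).

```typescript
// Hypothetical helper: expands ${NAME} references in every string of a parsed config tree.
type ConfigValue =
  | string
  | number
  | boolean
  | null
  | ConfigValue[]
  | { [key: string]: ConfigValue };

function expandEnv(
  value: ConfigValue,
  env: Record<string, string | undefined>,
): ConfigValue {
  if (typeof value === "string") {
    return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
      const resolved = env[name];
      if (resolved === undefined) {
        // Assumption: a missing variable is treated as a hard error.
        throw new Error(`Missing environment variable: ${name}`);
      }
      return resolved;
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => expandEnv(item, env));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: ConfigValue } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = expandEnv(child, env);
    }
    return out;
  }
  // Numbers, booleans, and null pass through unchanged.
  return value;
}
```

In a Node process this would typically be called as `expandEnv(parsedYaml, process.env)` before the result is handed to schema validation.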

Channel Layer:

Purpose: Uniform messaging abstraction across platforms
Location: src/channels/
Contains: ChannelAdapter interface, ChannelRegistry, platform adapters
Depends on: Platform SDKs (grammy, discord.js, @slack/bolt, whatsapp-web.js)
Used by: Daemon (message routing), Automation (cron/webhook output)
Interface: ChannelAdapter with connect(), disconnect(), send(), onMessage()
Adapters: src/channels/telegram/adapter.ts, src/channels/discord/adapter.ts, src/channels/slack/adapter.ts, src/channels/whatsapp/adapter.ts, src/channels/webchat/adapter.ts

Agent Layer (Backend):

Purpose: Core AI agent loop — process messages, execute tools, manage conversation
Location: src/backends/native/
Contains: NativeAgent (tool loop), AgentOrchestrator (delegation, compaction, memory)
Depends on: Models, Tools, Session, Context, Memory
Used by: Daemon (via message router), Gateway (via SessionBridge)
Key abstraction: NativeAgent runs the tool loop; AgentOrchestrator wraps it with orchestration features

Model Layer:

Purpose: Unified interface to LLM providers with tier-based routing and fallback
Location: src/models/
Contains: ModelClient interface, provider implementations, ModelRouter, retry logic, cost estimation
Depends on: Provider SDKs (@anthropic-ai/sdk, openai, @google/generative-ai, ollama, @aws-sdk/client-bedrock-runtime)
Used by: Agent layer, Gateway SessionBridge
Providers: anthropic.ts, openai.ts, gemini.ts, bedrock.ts, github.ts, local/ollama.ts, local/llamacpp.ts

Tool Layer:

Purpose: Tool registry, execution with policy enforcement and hook checks
Location: src/tools/
Contains: ToolRegistry, ToolExecutor, ToolPolicy, builtin tool implementations
Depends on: Hooks (confirmation), Memory, Process/Browser managers
Used by: Agent layer (tool loop), Gateway (tool execution)
Three tool patterns:
- Static: export const fooTool: Tool (e.g., src/tools/builtin/shell.ts)
- Factory: export function createFooTool(dep): Tool (e.g., src/tools/builtin/media-send.ts)
- Multi-factory: export function createFooTools(dep): Tool[] (e.g., src/tools/builtin/process/index.ts)
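
The three patterns can be compared side by side in a small sketch. The `Tool` and `ToolResult` shapes below are approximations based on the interface summary in this document, and `echoTool`, `createNotifyTool`, `createCounterTools`, and `Notifier` are hypothetical examples, not tools that exist in the repository. In a real module each declaration would be exported.

```typescript
// Approximate shapes, assumed from the Tool interface described in this document.
interface ToolResult {
  success: boolean;
  output?: string;
  error?: string;
}

interface Tool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  execute(args: Record<string, unknown>): Promise<ToolResult>;
}

// Static: no dependencies, so the tool is a plain constant.
const echoTool: Tool = {
  name: "echo",
  description: "Return the input text unchanged",
  inputSchema: { type: "object", properties: { text: { type: "string" } } },
  async execute(args) {
    return { success: true, output: String(args.text ?? "") };
  },
};

// Factory: the tool closes over one injected dependency.
interface Notifier {
  notify(text: string): void;
}

function createNotifyTool(notifier: Notifier): Tool {
  return {
    name: "notify",
    description: "Send a short notification",
    inputSchema: { type: "object", properties: { text: { type: "string" } } },
    async execute(args) {
      notifier.notify(String(args.text ?? ""));
      return { success: true };
    },
  };
}

// Multi-factory: several related tools share one dependency.
function createCounterTools(counter: { value: number }): Tool[] {
  return [
    {
      name: "counter.increment",
      description: "Increase the counter by one",
      inputSchema: { type: "object", properties: {} },
      async execute() {
        counter.value += 1;
        return { success: true, output: String(counter.value) };
      },
    },
    {
      name: "counter.read",
      description: "Read the current counter value",
      inputSchema: { type: "object", properties: {} },
      async execute() {
        return { success: true, output: String(counter.value) };
      },
    },
  ];
}
```

The factory forms keep dependencies explicit at construction time, which matches the constructor-based dependency injection used elsewhere in the codebase.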

Session Layer:

Purpose: Persistent conversation history per channel+sender pair
Location: src/session/
Contains: SessionStore (SQLite), SessionManager (in-memory cache + store), ManagedSession
Depends on: better-sqlite3
Used by: Agent layer, Gateway SessionBridge

Memory Layer:

Purpose: Persistent knowledge across sessions — namespace-based files + hybrid vector search
Location: src/memory/
Contains: MemoryStore (file-based), VectorStore (SQLite-backed embeddings), HybridSearch, embedding providers, text chunker
Depends on: Embedding providers (OpenAI, Gemini, Ollama, llama.cpp, Voyage)
Used by: Agent layer (memory injection into system prompt), Tools (memory.read/write/search)

Context Layer:

Purpose: Token estimation and conversation compaction
Location: src/context/
Contains: Token estimator (tokens.ts), compaction logic (compaction.ts)
Depends on: Agent layer (delegation for compaction), Memory (extraction)
Used by: AgentOrchestrator (automatic compaction before each message)

Gateway Layer:

Purpose: WebSocket JSON-RPC server + HTTP static file server + vanilla JS dashboard
Location: src/gateway/
Contains: GatewayServer, Router, SessionBridge, LaneQueue, auth, protocol, static serving, Tailscale Serve
Depends on: ws, Session, Agent, Tools
Used by: WebChat adapter, TUI (connects as WS client), external dashboard clients
Protocol: JSON-RPC 2.0 over WebSocket

Hooks Layer:

Purpose: Pattern-based tool confirmation engine
Location: src/hooks/
Contains: HookEngine with glob-pattern matching for confirm/log/silent actions
Depends on: Nothing (pure logic)
Used by: ToolExecutor (checks before execution)
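
A minimal sketch of glob-based hook resolution follows. It is an illustration under assumptions, not the HookEngine implementation: `HookRule`, `globToRegExp`, and `resolveAction` are hypothetical names, only `*` is treated as a wildcard, the first matching rule wins, and unmatched tools fall back to "silent".

```typescript
type HookAction = "confirm" | "log" | "silent";

interface HookRule {
  pattern: string; // glob over tool names, e.g. "shell.*"
  action: HookAction;
}

// Convert a simple glob to an anchored RegExp. Only "*" is a wildcard here (assumption).
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// First matching rule wins; tools with no matching rule get the fallback action.
function resolveAction(
  toolName: string,
  rules: HookRule[],
  fallback: HookAction = "silent",
): HookAction {
  for (const rule of rules) {
    if (globToRegExp(rule.pattern).test(toolName)) {
      return rule.action;
    }
  }
  return fallback;
}
```

With rules such as `{ pattern: "shell.*", action: "confirm" }`, a caller like ToolExecutor could ask for the action before running a tool and pause for user confirmation when required.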

Prompt Layer:

Purpose: Assemble system prompt from template files (SOUL.md, AGENTS.md, etc.)
Location: src/prompt/
Contains: Template search and assembly logic
Depends on: Filesystem (searches multiple directories)
Used by: Daemon (system prompt construction at startup)

MCP Layer:

Purpose: Model Context Protocol server management — bridge external MCP tools into the tool registry
Location: src/mcp/
Contains: McpClient, McpManager, tool bridging (bridge.ts)
Depends on: @modelcontextprotocol/sdk
Used by: Daemon (starts MCP servers, registers bridged tools)

Skills Layer:

Purpose: Pluggable skill system — bundled, managed (installed), and workspace skills
Location: src/skills/
Contains: SkillRegistry, SkillInstaller, skill loader
Depends on: Filesystem
Used by: Daemon (loads skills, injects into system prompt)

Agents Config Layer:

Purpose: Named agent configurations with per-agent overrides (model tier, tools, sandbox)
Location: src/agents/
Contains: AgentConfigRegistry, AgentRouter (channel+sender → agent config resolution)
Depends on: Config
Used by: Daemon message router (selects agent config per session)

Automation Layer:

Purpose: Scheduled tasks, webhooks, heartbeat monitoring, Gmail watching
Location: src/automation/
Contains: CronScheduler, WebhookHandler, HeartbeatMonitor, GmailWatcher
Depends on: croner, googleapis, Channels
Used by: Daemon (registers as channel adapters or standalone monitors)

Sandbox Layer:

Purpose: Docker container isolation for tool execution
Location: src/sandbox/
Contains: DockerSandbox, SandboxManager, sandboxed shell/process tools
Depends on: Docker CLI
Used by: Daemon message router (replaces shell/process tools with sandboxed variants)

Frontend Layer (Legacy):

Purpose: Direct Telegram bot integration and TUI
Location: src/frontends/
Contains: Telegram bot handlers with confirmation UI, TUI (minimal readline + fullscreen React/Ink)
Depends on: grammy (Telegram), ink/react (TUI)
Used by: CLI commands (start uses Telegram frontend, tui uses TUI)

Data Flow

Inbound Message Processing (Channel → Response):

1. Platform SDK receives message → Channel adapter normalizes to InboundMessage
2. Adapter calls onMessage() callback → ChannelRegistry.handleInbound() routes to MessageHandler
3. createMessageRouter() resolves agent config via AgentRouter.resolve(channel, senderId)
4. getOrCreateAgent() creates/retrieves AgentOrchestrator for the session (cached by channel:sender:agentConfig)
5. Audio routing: supportsAudioInput() checks provider capability; native audio is passed through for Gemini/OpenAI/GitHub and transcribed via Whisper for other providers
6. orchestrator.process() → injects memory context → checks compaction → delegates to NativeAgent.process()
7. NativeAgent.toolLoop() → sends to ModelRouter.chat() → model returns response or tool calls
8. If tool calls: ToolExecutor.execute() → policy check → hook check → tool execution → loop back to model
9. Final text response returned → reply function sends via adapter → adapter.send() → platform SDK
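
The audio routing decision in this flow can be sketched as a small capability check. The provider list comes from the description above; the function signature, the `AudioRoute` type, the `routeAudio` helper, and the optional override parameter (standing in for a per-tier supports_audio setting) are assumptions for illustration only.

```typescript
type AudioRoute = "native" | "transcribe";

// Providers described above as accepting native audio input.
const NATIVE_AUDIO_PROVIDERS = new Set(["gemini", "openai", "github"]);

// Assumed signature: an explicit override takes precedence over the built-in default.
function supportsAudioInput(provider: string, override?: boolean): boolean {
  return override ?? NATIVE_AUDIO_PROVIDERS.has(provider);
}

// Pass audio through when the provider supports it, otherwise transcribe it first.
function routeAudio(provider: string, override?: boolean): AudioRoute {
  return supportsAudioInput(provider, override) ? "native" : "transcribe";
}
```

Under these assumptions, `routeAudio("anthropic")` selects transcription, while `routeAudio("anthropic", true)` would force native passthrough for a tier configured to support audio.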

Gateway WebSocket Flow:

1. Client connects via WebSocket → auth check → SessionBridge.connect() → NativeAgent created
2. Client sends JSON-RPC message → GatewayServer.handleMessage() → Router.dispatch() → handler
3. agent.send handler → LaneQueue serializes requests → SessionBridge processes via NativeAgent
4. Streaming events sent back via WebSocket as JSON-RPC notifications
5. HTTP requests serve static dashboard UI or webhook endpoints
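
Request serialization of the kind attributed to LaneQueue can be sketched with a per-lane promise chain. This is a hypothetical illustration, not the actual class: the `enqueue` method, the string lane key, and the choice to let a lane continue after a failed task are all assumptions.

```typescript
// Hypothetical sketch of per-lane serialization; not the real LaneQueue.
class LaneQueue {
  // Tail of each lane's chain. Stored tails never reject.
  private tails = new Map<string, Promise<unknown>>();

  // Run `task` after every earlier task in the same lane has settled.
  enqueue<T>(lane: string, task: () => Promise<T>): Promise<T> {
    const tail = this.tails.get(lane) ?? Promise.resolve();
    const run = tail.then(task);
    // Swallow the error on the stored tail so one failure does not poison the lane.
    this.tails.set(lane, run.catch(() => undefined));
    return run;
  }
}
```

Tasks in different lanes still run concurrently, so one busy session does not delay another. A production version would also remove idle lanes from the map to avoid unbounded growth.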

Model Routing with Fallback:

1. ModelRouter.chat(request, tier) → tries primary client for requested tier
2. If retry config enabled: withRetry() wraps call with exponential backoff
3. On failure → try tier-specific fallbacks (e.g., Anthropic → GitHub Models same model)
4. On failure → try global fallback chain (typically local model)
5. All failures → throw aggregated error
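
The fallback cascade can be approximated with the sketch below. The interfaces are simplified stand-ins for ChatRequest, ChatResponse, and ModelClient, and `chatWithFallback` and `RetryOptions` are hypothetical names. Two behaviours are assumptions rather than facts from the source: only the primary client is retried, and the aggregated error is a plain Error whose message lists each failure.

```typescript
// Simplified stand-ins for the real request/response/client types.
interface ChatRequest { prompt: string }
interface ChatResponse { text: string; provider: string }
interface ModelClient {
  name: string;
  chat(request: ChatRequest): Promise<ChatResponse>;
}
interface RetryOptions { attempts: number; baseDelayMs: number }

// Retry with exponential backoff: base, 2x base, 4x base, ...
async function withRetry<T>(fn: () => Promise<T>, options: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < options.attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < options.attempts - 1) {
        const delay = options.baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

async function chatWithFallback(
  request: ChatRequest,
  primary: ModelClient,
  tierFallbacks: ModelClient[],
  globalFallbacks: ModelClient[],
  retry: RetryOptions = { attempts: 3, baseDelayMs: 250 },
): Promise<ChatResponse> {
  const failures: string[] = [];
  for (const client of [primary, ...tierFallbacks, ...globalFallbacks]) {
    const call = () => client.chat(request);
    try {
      // Assumption: only the primary is retried; each fallback gets a single attempt.
      return client === primary ? await withRetry(call, retry) : await call();
    } catch (err) {
      failures.push(`${client.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All models failed: ${failures.join("; ")}`);
}
```

Collecting every failure into one message keeps the final error useful for debugging, since the caller sees which providers were tried and why each one failed.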

Compaction Flow:

1. Before each process(), AgentOrchestrator.compactIfNeeded() checks token count vs threshold
2. If threshold exceeded → compactHistory() splits messages into compactable + recent (keep N turns)
3. Delegates summarization to fast tier via orchestrator.delegate()
4. Optionally extracts memory facts via separate delegation call
5. Replaces session history with [summary_message, ...recent_messages]
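
The compaction steps can be sketched as a standalone function. Everything named here is illustrative: `estimateTokens` uses a rough four-characters-per-token heuristic, a "turn" is assumed to be two messages (user plus assistant), the summary is stored as a system message, and the `summarize` callback stands in for the delegation call to the fast tier. The real implementation in compaction.ts may differ on each point.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic (assumption): about four characters per token.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

// Split history into an older compactable part and the most recent turns.
// Assumption: one turn is two messages (user + assistant).
function splitForCompaction(history: Message[], keepTurns: number) {
  const keep = Math.min(history.length, keepTurns * 2);
  return {
    compactable: history.slice(0, history.length - keep),
    recent: history.slice(history.length - keep),
  };
}

// Hypothetical stand-in for the orchestrator's compaction check.
async function compactIfOverThreshold(
  history: Message[],
  thresholdTokens: number,
  keepTurns: number,
  summarize: (messages: Message[]) => Promise<string>,
): Promise<Message[]> {
  if (estimateTokens(history) <= thresholdTokens) return history;
  const { compactable, recent } = splitForCompaction(history, keepTurns);
  if (compactable.length === 0) return history;
  const summary = await summarize(compactable);
  // New history: [summary_message, ...recent_messages]
  return [{ role: "system", content: `Summary of earlier conversation: ${summary}` }, ...recent];
}
```

Keeping the most recent turns verbatim preserves the immediate context the model needs, while older material is reduced to a single summary message.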

State Management:

Session history: SQLite (~/.local/share/flynn/sessions.db) + in-memory cache in SessionManager
Memory: Namespace-based markdown files in ~/.local/share/flynn/memory/
Vectors: SQLite (~/.local/share/flynn/vectors.db) for embeddings
Config: YAML file at ~/.config/flynn/config.yaml (read once at startup)

Key Abstractions

ModelClient:

Purpose: Uniform interface to any LLM provider
Interface: chat(request: ChatRequest): Promise<ChatResponse> + optional chatStream()
Implementations: src/models/anthropic.ts, src/models/openai.ts, src/models/gemini.ts, src/models/bedrock.ts, src/models/github.ts, src/models/local/ollama.ts, src/models/local/llamacpp.ts
Pattern: Each provider wraps its SDK and normalizes to ChatResponse

ModelRouter:

Purpose: Tier-based model selection with cascading fallback
Location: src/models/router.ts
Tiers: fast, default, complex, local — each maps to a ModelClient
Implements ModelClient interface itself, so consumers don't need to know about tiers

ChannelAdapter:

Purpose: Normalize platform-specific messaging into a common interface
Interface: connect(), disconnect(), send(peerId, msg), onMessage(handler)
Location: src/channels/types.ts
Pattern: Each adapter wraps a platform SDK, handles auth/filtering, emits InboundMessage
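
To make the adapter contract concrete, here is a hedged sketch of an in-memory adapter. The interface follows the method list above, but the parameter types are assumptions (the real `send` likely takes a richer message object), and `LoopbackAdapter`, its allow-list, and `simulateIncoming` are hypothetical stand-ins for a platform SDK.

```typescript
interface InboundMessage {
  channel: string;
  senderId: string;
  text: string;
}

type MessageHandler = (message: InboundMessage) => void | Promise<void>;

// Approximation of the interface summarized above; parameter types are assumed.
interface ChannelAdapter {
  readonly name: string;
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  send(peerId: string, msg: string): Promise<void>;
  onMessage(handler: MessageHandler): void;
}

// Hypothetical in-memory adapter that plays the role of a platform SDK wrapper.
class LoopbackAdapter implements ChannelAdapter {
  readonly name = "loopback";
  readonly sent: { peerId: string; msg: string }[] = [];
  private handler?: MessageHandler;
  private connected = false;
  private allowed: Set<string>;

  constructor(allowedSenders: string[]) {
    this.allowed = new Set(allowedSenders);
  }

  async connect(): Promise<void> {
    this.connected = true;
  }

  async disconnect(): Promise<void> {
    this.connected = false;
  }

  async send(peerId: string, msg: string): Promise<void> {
    if (!this.connected) throw new Error("adapter is not connected");
    this.sent.push({ peerId, msg });
  }

  onMessage(handler: MessageHandler): void {
    this.handler = handler;
  }

  // Simulates a platform event: filter by allow-list, normalize, then emit.
  async simulateIncoming(senderId: string, text: string): Promise<boolean> {
    if (!this.handler || !this.allowed.has(senderId)) return false;
    await this.handler({ channel: this.name, senderId, text: text.trim() });
    return true;
  }
}
```

Because filtering and normalization happen inside the adapter, the rest of the pipeline only ever sees InboundMessage values from permitted senders.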

Tool:

Purpose: Executable capability exposed to the AI model
Interface: { name, description, inputSchema, execute(args): Promise<ToolResult> }
Location: src/tools/types.ts
Registration: tool file → src/tools/builtin/index.ts → src/tools/index.ts → src/daemon/index.ts

Session:

Purpose: Conversation state (message history) for a channel+sender pair
Interface: addMessage(), getHistory(), clear(), replaceHistory()
Location: src/session/manager.ts
ID format: channel:senderId (e.g., telegram:123456)

AgentOrchestrator:

Purpose: Wraps NativeAgent with delegation, compaction, memory, usage tracking
Location: src/backends/native/orchestrator.ts
Key method: delegate(SubAgentRequest) — stateless single-turn call to any tier
Delegation tasks: compaction, memory extraction, classification, tool summarisation, complex reasoning

DaemonContext:

Purpose: Holds all initialized subsystems returned by startDaemon()
Location: src/daemon/index.ts
Contains: config, lifecycle, session/model/tool/channel/gateway/mcp/skill/agent registries

Entry Points

CLI Binary (flynn):

Location: src/cli/index.ts
Triggers: flynn start, flynn tui, flynn send, flynn sessions, flynn doctor, flynn config, flynn completion
Responsibilities: Parse args, load config, bootstrap subsystems

Daemon Start:

Location: src/daemon/index.ts → startDaemon(config)
Triggers: flynn start CLI command
Responsibilities: Initialize all subsystems in order, wire dependencies, start channel adapters and gateway

Gateway Server:

Location: src/gateway/server.ts
Triggers: HTTP/WS connections on configured port (default 18800)
Responsibilities: JSON-RPC routing, WebSocket session management, static UI serving, webhook HTTP endpoints

TUI:

Location: src/frontends/tui/minimal.ts (readline) and src/frontends/tui/fullscreen.ts (React/Ink)
Triggers: flynn tui or flynn tui --fullscreen
Responsibilities: Local interactive chat interface connecting to gateway via WebSocket

Error Handling

Strategy: Catch-and-convert with descriptive context. No global error handler.

Patterns:

Model layer: Retry with exponential backoff → tier fallback → global fallback → throw aggregated error
Tool execution: Promise.race timeout → catch → return ToolResult { success: false, error: message }
Channel adapters: Promise.allSettled for start/stop — log per-adapter errors, don't crash
Daemon: Lifecycle LIFO shutdown handlers — each wrapped in try/catch
Config: Zod validation throws with structured error messages on invalid config
Gateway: JSON-RPC error codes (ParseError, MethodNotFound, InternalError)
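
The tool execution pattern (Promise.race timeout, then catch and convert to a ToolResult) might look roughly like this. The `executeWithTimeout` helper is hypothetical, and the ToolResult shape with an `output` field is an assumption; only the `success` and `error` fields are stated in the pattern above.

```typescript
interface ToolResult {
  success: boolean;
  output?: string;
  error?: string;
}

// Hypothetical helper: never throws, always resolves to a ToolResult.
async function executeWithTimeout(
  run: () => Promise<string>,
  timeoutMs: number,
): Promise<ToolResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Tool timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    // Whichever settles first wins: the tool or the timeout.
    const output = await Promise.race([run(), timeout]);
    return { success: true, output };
  } catch (err) {
    // Convert any failure into a result the model can read and react to.
    return { success: false, error: err instanceof Error ? err.message : String(err) };
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that Promise.race does not cancel the losing task; a tool that exceeds its timeout keeps running in the background unless it supports its own cancellation.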

Cross-Cutting Concerns

Logging: console.log/console.error/console.warn/console.debug throughout. No structured logging framework. Debug-level messages for model fallback decisions.

Validation: Zod for config validation (src/config/schema.ts). Tool arguments are constrained only by the inputSchema advertised to the model; there is no runtime validation of tool args beyond what each tool checks itself.

Authentication: Multi-layer:

Gateway: Bearer token auth + optional Tailscale identity header (src/gateway/auth.ts)
Telegram: allowed_chat_ids whitelist
Discord: allowed_guild_ids + allowed_channel_ids whitelists
Slack: allowed_channel_ids whitelist + signing secret
WhatsApp: allowed_numbers + allowed_group_ids whitelists
Webhooks: HMAC signature verification (per-webhook secret)
Pairing: DM pairing codes for unknown senders (src/channels/pairing.ts)
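
For the webhook entry in the list above, HMAC verification generally follows the shape below. This is a generic sketch using Node's crypto module, not the code in the repository: the SHA-256 algorithm, the hex-encoded signature, and the `verifyWebhookSignature` name are all assumptions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: checks a hex-encoded HMAC-SHA256 signature over the raw request body.
function verifyWebhookSignature(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

The signature must be computed over the exact bytes received, before any JSON parsing, and the constant-time comparison avoids leaking how many leading bytes matched.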

Tool Policy: Profile-based filtering (minimal/messaging/coding/full) + glob-pattern allow/deny lists + per-agent/per-provider overrides (src/tools/policy.ts).

Configuration: Single YAML file with ${ENV_VAR} expansion, validated by comprehensive Zod schema. Every subsystem is feature-toggled via config. Default config path: ~/.config/flynn/config.yaml.

Architecture analysis: 2025-02-09
