flynn/.planning/codebase/ARCHITECTURE.md
William Valentin 5c531a760d docs: document native audio support across README, CHANGELOG, config, and planning docs
- README: add audio.transcribe to tool list, update media pipeline description,
  add Native Audio Support and Audio Transcription config sections, add
  supports_audio per-tier override example
- SOUL.md: add audio.transcribe to available tools list
- CHANGELOG: add native audio support and audio.transcribe tool entries
- config/default.yaml: add commented audio config section, supports_audio hint
- INTEGRATIONS.md: expand audio section with native passthrough, capabilities,
  smart routing, AudioSource type, token estimation, audio.transcribe tool
- STRUCTURE.md: add capabilities.ts and audio-transcribe.ts to key file listings
- ARCHITECTURE.md: update data flow step 5 to describe smart audio routing
2026-02-11 18:41:53 -08:00


Architecture

Analysis Date: 2025-02-09

Pattern Overview

Overall: Multi-channel AI agent daemon with layered pipeline architecture

Key Characteristics:

  • Message pipeline: Channel Adapter → ChannelRegistry → MessageRouter → AgentOrchestrator → NativeAgent → ModelClient
  • Registry/factory pattern for extensible channels, tools, models, and agent configs
  • Dependency injection via constructor config objects (no DI container)
  • YAML + Zod config-driven feature toggling — most subsystems are optional and activated by config
  • Lifecycle-managed daemon with ordered shutdown handlers (LIFO)
  • SQLite for session persistence and vector embeddings; filesystem for memory persistence
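
The LIFO shutdown ordering mentioned above can be sketched as follows. This is a minimal illustration, not the daemon's actual lifecycle code: the `Lifecycle` class, `onShutdown`, and the returned list of failed handler names are hypothetical.

```typescript
// Hypothetical sketch of ordered (LIFO) shutdown handlers.
type ShutdownHandler = () => Promise<void> | void;

class Lifecycle {
  private readonly handlers: { name: string; handler: ShutdownHandler }[] = [];

  // Handlers are registered in startup order.
  onShutdown(name: string, handler: ShutdownHandler): void {
    this.handlers.push({ name, handler });
  }

  // Runs handlers in reverse registration order. Each handler is wrapped in
  // try/catch so one failure does not prevent the remaining handlers from running.
  async shutdown(): Promise<string[]> {
    const failed: string[] = [];
    for (;;) {
      const entry = this.handlers.pop();
      if (!entry) break;
      try {
        await entry.handler();
      } catch {
        failed.push(entry.name);
      }
    }
    return failed;
  }
}
```

Subsystems started last (for example the gateway) are stopped first, so anything they depend on is still alive while they shut down.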

Layers

CLI Layer:

  • Purpose: Parse commands, load config, bootstrap daemon or TUI
  • Location: src/cli/
  • Contains: Command registrations (commander.js), config loading, TUI entry
  • Depends on: Config, Daemon
  • Used by: End user via flynn binary
  • Entry: src/cli/index.ts → registers commands: start, tui, send, sessions, doctor, config, completion

Config Layer:

  • Purpose: Load YAML config, validate with Zod, expand ${ENV_VAR} references
  • Location: src/config/
  • Contains: Zod schema definitions (schema.ts), YAML loader with env expansion (loader.ts)
  • Depends on: zod, yaml
  • Used by: All layers — the Config type flows through the entire system
  • Key file: src/config/schema.ts — single source of truth for all configuration types (409 lines)
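
As an illustration of the `${ENV_VAR}` expansion step, here is a minimal sketch. The function name `expandEnvVars`, passing the environment as a parameter, and throwing on a missing variable are all assumptions; the real loader.ts may behave differently.

```typescript
// Hypothetical sketch of ${ENV_VAR} expansion applied to raw config text.
type Env = Record<string, string | undefined>;

function expandEnvVars(input: string, env: Env): string {
  return input.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const value = env[name];
    if (value === undefined) {
      // Assumption: fail loudly instead of substituting an empty string.
      throw new Error(`Missing environment variable: ${name}`);
    }
    return value;
  });
}
```

In the pipeline described above, expansion would run on the YAML values before the Zod schema validates the result.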

Channel Layer:

  • Purpose: Uniform messaging abstraction across platforms
  • Location: src/channels/
  • Contains: ChannelAdapter interface, ChannelRegistry, platform adapters
  • Depends on: Platform SDKs (grammy, discord.js, @slack/bolt, whatsapp-web.js)
  • Used by: Daemon (message routing), Automation (cron/webhook output)
  • Interface: ChannelAdapter with connect(), disconnect(), send(), onMessage()
  • Adapters: src/channels/telegram/adapter.ts, src/channels/discord/adapter.ts, src/channels/slack/adapter.ts, src/channels/whatsapp/adapter.ts, src/channels/webchat/adapter.ts

Agent Layer (Backend):

  • Purpose: Core AI agent loop — process messages, execute tools, manage conversation
  • Location: src/backends/native/
  • Contains: NativeAgent (tool loop), AgentOrchestrator (delegation, compaction, memory)
  • Depends on: Models, Tools, Session, Context, Memory
  • Used by: Daemon (via message router), Gateway (via SessionBridge)
  • Key abstraction: NativeAgent runs the tool loop; AgentOrchestrator wraps it with orchestration features

Model Layer:

  • Purpose: Unified interface to LLM providers with tier-based routing and fallback
  • Location: src/models/
  • Contains: ModelClient interface, provider implementations, ModelRouter, retry logic, cost estimation
  • Depends on: Provider SDKs (@anthropic-ai/sdk, openai, @google/generative-ai, ollama, @aws-sdk/client-bedrock-runtime)
  • Used by: Agent layer, Gateway SessionBridge
  • Providers: anthropic.ts, openai.ts, gemini.ts, bedrock.ts, github.ts, local/ollama.ts, local/llamacpp.ts

Tool Layer:

  • Purpose: Tool registry, execution with policy enforcement and hook checks
  • Location: src/tools/
  • Contains: ToolRegistry, ToolExecutor, ToolPolicy, builtin tool implementations
  • Depends on: Hooks (confirmation), Memory, Process/Browser managers
  • Used by: Agent layer (tool loop), Gateway (tool execution)
  • Three tool patterns:
    • Static: export const fooTool: Tool (e.g., src/tools/builtin/shell.ts)
    • Factory: export function createFooTool(dep): Tool (e.g., src/tools/builtin/media-send.ts)
    • Multi-factory: export function createFooTools(dep): Tool[] (e.g., src/tools/builtin/process/index.ts)
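
The three tool patterns can be sketched side by side. The `Tool` and `ToolResult` shapes follow the interface listed under Key Abstractions, but the `output` field, the example tools, and the `Clock` dependency are hypothetical. The `export` keywords are omitted so the snippet runs as a single file.

```typescript
interface ToolResult { success: boolean; output?: string; error?: string }

interface Tool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  execute(args: Record<string, unknown>): Promise<ToolResult>;
}

// Static: no dependencies, so the tool is a plain constant.
const echoTool: Tool = {
  name: "echo",
  description: "Return the given text unchanged",
  inputSchema: { type: "object", properties: { text: { type: "string" } } },
  async execute(args) {
    return { success: true, output: String(args["text"] ?? "") };
  },
};

// Factory: the tool closes over a dependency supplied at wiring time.
interface Clock { now(): Date }

function createTimeTool(clock: Clock): Tool {
  return {
    name: "time.now",
    description: "Return the current time as an ISO string",
    inputSchema: { type: "object", properties: {} },
    async execute() {
      return { success: true, output: clock.now().toISOString() };
    },
  };
}

// Multi-factory: several related tools share one dependency.
function createCounterTools(counts: Map<string, number>): Tool[] {
  const key = (args: Record<string, unknown>): string => String(args["key"] ?? "default");
  const schema = { type: "object", properties: { key: { type: "string" } } };
  return [
    {
      name: "counter.increment",
      description: "Increment a named counter",
      inputSchema: schema,
      async execute(args) {
        const next = (counts.get(key(args)) ?? 0) + 1;
        counts.set(key(args), next);
        return { success: true, output: String(next) };
      },
    },
    {
      name: "counter.read",
      description: "Read a named counter",
      inputSchema: schema,
      async execute(args) {
        return { success: true, output: String(counts.get(key(args)) ?? 0) };
      },
    },
  ];
}
```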

Session Layer:

  • Purpose: Persistent conversation history per channel+sender pair
  • Location: src/session/
  • Contains: SessionStore (SQLite), SessionManager (in-memory cache + store), ManagedSession
  • Depends on: better-sqlite3
  • Used by: Agent layer, Gateway SessionBridge

Memory Layer:

  • Purpose: Persistent knowledge across sessions — namespace-based files + hybrid vector search
  • Location: src/memory/
  • Contains: MemoryStore (file-based), VectorStore (SQLite-backed embeddings), HybridSearch, embedding providers, text chunker
  • Depends on: Embedding providers (OpenAI, Gemini, Ollama, llama.cpp, Voyage)
  • Used by: Agent layer (memory injection into system prompt), Tools (memory.read/write/search)

Context Layer:

  • Purpose: Token estimation and conversation compaction
  • Location: src/context/
  • Contains: Token estimator (tokens.ts), compaction logic (compaction.ts)
  • Depends on: Agent layer (delegation for compaction), Memory (extraction)
  • Used by: AgentOrchestrator (automatic compaction before each message)

Gateway Layer:

  • Purpose: WebSocket JSON-RPC server + HTTP static file server + vanilla JS dashboard
  • Location: src/gateway/
  • Contains: GatewayServer, Router, SessionBridge, LaneQueue, auth, protocol, static serving, Tailscale Serve
  • Depends on: ws, Session, Agent, Tools
  • Used by: WebChat adapter, TUI (connects as WS client), external dashboard clients
  • Protocol: JSON-RPC 2.0 over WebSocket
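
A minimal sketch of JSON-RPC 2.0 dispatch, assuming synchronous handlers held in a `Map`. The `dispatch` function and `Handler` type are hypothetical; the numeric error codes are the standard ones from the JSON-RPC 2.0 specification.

```typescript
const JsonRpcErrorCode = {
  ParseError: -32700,
  InvalidRequest: -32600,
  MethodNotFound: -32601,
  InternalError: -32603,
} as const;

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: string | number | null;
  result?: unknown;
  error?: { code: number; message: string };
}

// Assumption: handlers are synchronous here; a real gateway would likely await them.
type Handler = (params: unknown) => unknown;

function errorResponse(id: string | number | null, code: number, message: string): JsonRpcResponse {
  return { jsonrpc: "2.0", id, error: { code, message } };
}

function dispatch(raw: string, handlers: Map<string, Handler>): JsonRpcResponse {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return errorResponse(null, JsonRpcErrorCode.ParseError, "Parse error");
  }
  if (typeof parsed !== "object" || parsed === null) {
    return errorResponse(null, JsonRpcErrorCode.InvalidRequest, "Invalid Request");
  }
  const request = parsed as { id?: string | number | null; method?: unknown; params?: unknown };
  const id = request.id ?? null;
  const handler = typeof request.method === "string" ? handlers.get(request.method) : undefined;
  if (!handler) {
    return errorResponse(id, JsonRpcErrorCode.MethodNotFound, "Method not found");
  }
  try {
    return { jsonrpc: "2.0", id, result: handler(request.params) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return errorResponse(id, JsonRpcErrorCode.InternalError, message);
  }
}
```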

Hooks Layer:

  • Purpose: Pattern-based tool confirmation engine
  • Location: src/hooks/
  • Contains: HookEngine with glob-pattern matching for confirm/log/silent actions
  • Depends on: Nothing (pure logic)
  • Used by: ToolExecutor (checks before execution)
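
To illustrate glob-based hook resolution, here is a minimal sketch. It assumes that patterns match tool names, that the first matching rule wins, and that unmatched tools default to `silent`; none of these are confirmed by this document, and `HookRule`, `globToRegExp`, and `resolveAction` are hypothetical names.

```typescript
type HookAction = "confirm" | "log" | "silent";

interface HookRule { pattern: string; action: HookAction }

// Translate a simple glob (only * and ?) into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

// Assumption: first matching rule wins; the fallback action is configurable.
function resolveAction(rules: HookRule[], toolName: string, fallback: HookAction = "silent"): HookAction {
  for (const rule of rules) {
    if (globToRegExp(rule.pattern).test(toolName)) return rule.action;
  }
  return fallback;
}
```

Keeping this logic free of I/O matches the "pure logic" note above and makes the rules easy to unit test.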

Prompt Layer:

  • Purpose: Assemble system prompt from template files (SOUL.md, AGENTS.md, etc.)
  • Location: src/prompt/
  • Contains: Template search and assembly logic
  • Depends on: Filesystem (searches multiple directories)
  • Used by: Daemon (system prompt construction at startup)

MCP Layer:

  • Purpose: Model Context Protocol server management — bridge external MCP tools into the tool registry
  • Location: src/mcp/
  • Contains: McpClient, McpManager, tool bridging (bridge.ts)
  • Depends on: @modelcontextprotocol/sdk
  • Used by: Daemon (starts MCP servers, registers bridged tools)

Skills Layer:

  • Purpose: Pluggable skill system — bundled, managed (installed), and workspace skills
  • Location: src/skills/
  • Contains: SkillRegistry, SkillInstaller, skill loader
  • Depends on: Filesystem
  • Used by: Daemon (loads skills, injects into system prompt)

Agents Config Layer:

  • Purpose: Named agent configurations with per-agent overrides (model tier, tools, sandbox)
  • Location: src/agents/
  • Contains: AgentConfigRegistry, AgentRouter (channel+sender → agent config resolution)
  • Depends on: Config
  • Used by: Daemon message router (selects agent config per session)

Automation Layer:

  • Purpose: Scheduled tasks, webhooks, heartbeat monitoring, Gmail watching
  • Location: src/automation/
  • Contains: CronScheduler, WebhookHandler, HeartbeatMonitor, GmailWatcher
  • Depends on: croner, googleapis, Channels
  • Used by: Daemon (registers as channel adapters or standalone monitors)

Sandbox Layer:

  • Purpose: Docker container isolation for tool execution
  • Location: src/sandbox/
  • Contains: DockerSandbox, SandboxManager, sandboxed shell/process tools
  • Depends on: Docker CLI
  • Used by: Daemon message router (replaces shell/process tools with sandboxed variants)

Frontend Layer (Legacy):

  • Purpose: Direct Telegram bot integration and TUI
  • Location: src/frontends/
  • Contains: Telegram bot handlers with confirmation UI, TUI (minimal readline + fullscreen React/Ink)
  • Depends on: grammy (Telegram), ink/react (TUI)
  • Used by: CLI commands (start uses Telegram frontend, tui uses TUI)

Data Flow

Inbound Message Processing (Channel → Response):

  1. Platform SDK receives message → Channel adapter normalizes to InboundMessage
  2. Adapter calls onMessage() callback → ChannelRegistry.handleInbound() routes to MessageHandler
  3. createMessageRouter() resolves agent config via AgentRouter.resolve(channel, senderId)
  4. getOrCreateAgent() creates/retrieves AgentOrchestrator for the session (cached by channel:sender:agentConfig)
  5. Audio routing: supportsAudioInput() checks provider capability — native audio passed through for Gemini/OpenAI/GitHub, transcribed via Whisper for others
  6. orchestrator.process() → injects memory context → checks compaction → delegates to NativeAgent.process()
  7. NativeAgent.toolLoop() → sends to ModelRouter.chat() → model returns response or tool calls
  8. If tool calls: ToolExecutor.execute() → policy check → hook check → tool execution → loop back to model
  9. Final text response returned → reply function sends via adapter → adapter.send() → platform SDK
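
Step 5 (audio routing) might look roughly like the sketch below. The provider names come from the step description; the function signatures, the `AudioRoute` type, and the way a per-tier `supports_audio` override is passed in are assumptions.

```typescript
type AudioRoute = "native" | "transcribe";

// Providers listed in step 5 as accepting native audio input.
const NATIVE_AUDIO_PROVIDERS: ReadonlySet<string> = new Set(["gemini", "openai", "github"]);

// Hypothetical signature: an explicit override takes precedence over the provider default.
function supportsAudioInput(provider: string, supportsAudioOverride?: boolean): boolean {
  if (supportsAudioOverride !== undefined) return supportsAudioOverride;
  return NATIVE_AUDIO_PROVIDERS.has(provider);
}

// Pass audio through when the model can take it; otherwise transcribe it to text first.
function routeAudio(provider: string, supportsAudioOverride?: boolean): AudioRoute {
  return supportsAudioInput(provider, supportsAudioOverride) ? "native" : "transcribe";
}
```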

Gateway WebSocket Flow:

  1. Client connects via WebSocket → auth check → SessionBridge.connect() → NativeAgent created
  2. Client sends JSON-RPC message → GatewayServer.handleMessage() → Router.dispatch() → handler
  3. agent.send handler → LaneQueue serializes requests → SessionBridge processes via NativeAgent
  4. Streaming events sent back via WebSocket as JSON-RPC notifications
  5. HTTP requests serve static dashboard UI or webhook endpoints
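
The request serialization in step 3 can be sketched as a per-lane promise chain. This is an illustration only: keying lanes by a string such as a session ID, and the `enqueue` method, are assumptions about how LaneQueue works.

```typescript
// Hypothetical sketch: tasks in the same lane run strictly one after another,
// while different lanes proceed independently.
class LaneQueue {
  private readonly tails = new Map<string, Promise<void>>();

  enqueue<T>(lane: string, task: () => Promise<T>): Promise<T> {
    const tail = this.tails.get(lane) ?? Promise.resolve();
    const run = tail.then(() => task());
    // Store a tail that never rejects, so one failed task does not block the lane.
    this.tails.set(lane, run.then(() => undefined, () => undefined));
    return run;
  }
}
```

Serializing per lane keeps two overlapping requests from interleaving writes to the same conversation history.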

Model Routing with Fallback:

  1. ModelRouter.chat(request, tier) → tries primary client for requested tier
  2. If retry config enabled: withRetry() wraps call with exponential backoff
  3. On failure → try tier-specific fallbacks (e.g., Anthropic → GitHub Models same model)
  4. On failure → try global fallback chain (typically local model)
  5. All failures → throw aggregated error
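
The fallback cascade above can be sketched as follows. The `ChatRequest` and `ChatResponse` shapes are simplified placeholders, `chatWithFallback` is a hypothetical name, and applying the retry wrapper to every client in the chain (not just the primary) is an assumption.

```typescript
interface ChatRequest { prompt: string }
interface ChatResponse { text: string }
interface ModelClient { chat(request: ChatRequest): Promise<ChatResponse> }

// Retry with exponential backoff: baseDelayMs, 2x, 4x, ...
async function withRetry<T>(fn: () => Promise<T>, attempts: number, baseDelayMs: number): Promise<T> {
  const total = Math.max(1, attempts);
  let lastError: unknown;
  for (let i = 0; i < total; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < total - 1) {
        await new Promise<void>((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}

// Try the primary client, then tier-specific fallbacks, then the global chain.
async function chatWithFallback(
  request: ChatRequest,
  primary: ModelClient,
  tierFallbacks: ModelClient[],
  globalFallbacks: ModelClient[],
  retry: { attempts: number; baseDelayMs: number } = { attempts: 2, baseDelayMs: 100 },
): Promise<ChatResponse> {
  const errors: string[] = [];
  for (const client of [primary, ...tierFallbacks, ...globalFallbacks]) {
    try {
      return await withRetry(() => client.chat(request), retry.attempts, retry.baseDelayMs);
    } catch (err) {
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  // Every client failed: surface all collected messages in one error.
  throw new Error(`All model clients failed: ${errors.join("; ")}`);
}
```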

Compaction Flow:

  1. Before each process(), AgentOrchestrator.compactIfNeeded() checks token count vs threshold
  2. If threshold exceeded → compactHistory() splits messages into compactable + recent (keep N turns)
  3. Delegates summarization to fast tier via orchestrator.delegate()
  4. Optionally extracts memory facts via separate delegation call
  5. Replaces session history with [summary_message, ...recent_messages]
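
A simplified sketch of the compaction steps, under several assumptions: tokens are estimated at roughly four characters each, the "recent" window is counted in messages instead of turns, and a synchronous `summarize` callback stands in for the delegated fast-tier call. The standalone function shown here is illustrative, not the orchestrator's actual method.

```typescript
interface Message { role: "system" | "user" | "assistant"; content: string }

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(messages: Message[]): number {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

function compactIfNeeded(
  history: Message[],
  thresholdTokens: number,
  keepRecent: number,
  summarize: (messages: Message[]) => string,
): Message[] {
  // Step 1: nothing to do while under the threshold.
  if (estimateTokens(history) <= thresholdTokens) return history;

  // Step 2: split into older, compactable messages and the recent tail to keep.
  const cut = Math.max(0, history.length - keepRecent);
  const compactable = history.slice(0, cut);
  const recent = history.slice(cut);
  if (compactable.length === 0) return history;

  // Steps 3 and 5: summarize the older part and rebuild the history.
  const summary: Message = { role: "system", content: summarize(compactable) };
  return [summary, ...recent];
}
```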

State Management:

  • Session history: SQLite (~/.local/share/flynn/sessions.db) + in-memory cache in SessionManager
  • Memory: Namespace-based markdown files in ~/.local/share/flynn/memory/
  • Vectors: SQLite (~/.local/share/flynn/vectors.db) for embeddings
  • Config: YAML file at ~/.config/flynn/config.yaml (read once at startup)

Key Abstractions

ModelClient:

  • Purpose: Uniform interface to any LLM provider
  • Interface: chat(request: ChatRequest): Promise<ChatResponse> + optional chatStream()
  • Implementations: src/models/anthropic.ts, src/models/openai.ts, src/models/gemini.ts, src/models/bedrock.ts, src/models/github.ts, src/models/local/ollama.ts, src/models/local/llamacpp.ts
  • Pattern: Each provider wraps its SDK and normalizes to ChatResponse

ModelRouter:

  • Purpose: Tier-based model selection with cascading fallback
  • Location: src/models/router.ts
  • Tiers: fast, default, complex, local — each maps to a ModelClient
  • Implements ModelClient interface itself, so consumers don't need to know about tiers

ChannelAdapter:

  • Purpose: Normalize platform-specific messaging into a common interface
  • Interface: connect(), disconnect(), send(peerId, msg), onMessage(handler)
  • Location: src/channels/types.ts
  • Pattern: Each adapter wraps a platform SDK, handles auth/filtering, emits InboundMessage
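
To make the adapter contract concrete, here is a minimal in-memory sketch. The method names follow the interface listed above; the `InboundMessage` fields, the `MessageHandler` type, and the `LoopbackAdapter` class are hypothetical and exist only for illustration.

```typescript
// Assumed message shape; the real InboundMessage likely carries more fields.
interface InboundMessage { channel: string; senderId: string; text: string }

type MessageHandler = (message: InboundMessage) => void | Promise<void>;

interface ChannelAdapter {
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  send(peerId: string, msg: string): Promise<void>;
  onMessage(handler: MessageHandler): void;
}

// In-memory adapter: records outbound messages instead of calling a platform SDK.
class LoopbackAdapter implements ChannelAdapter {
  readonly sent: { peerId: string; msg: string }[] = [];
  private handler: MessageHandler | undefined;
  private connected = false;

  async connect(): Promise<void> { this.connected = true; }
  async disconnect(): Promise<void> { this.connected = false; }

  async send(peerId: string, msg: string): Promise<void> {
    if (!this.connected) throw new Error("adapter is not connected");
    this.sent.push({ peerId, msg });
  }

  onMessage(handler: MessageHandler): void { this.handler = handler; }

  // Test hook standing in for a platform SDK event.
  async simulateInbound(senderId: string, text: string): Promise<void> {
    if (this.handler) await this.handler({ channel: "loopback", senderId, text });
  }
}
```

An adapter like this is also handy in tests, since the routing code only ever sees the `ChannelAdapter` interface.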

Tool:

  • Purpose: Executable capability exposed to the AI model
  • Interface: { name, description, inputSchema, execute(args): Promise<ToolResult> }
  • Location: src/tools/types.ts
  • Registration: tool file → src/tools/builtin/index.ts → src/tools/index.ts → src/daemon/index.ts

Session:

  • Purpose: Conversation state (message history) for a channel+sender pair
  • Interface: addMessage(), getHistory(), clear(), replaceHistory()
  • Location: src/session/manager.ts
  • ID format: channel:senderId (e.g., telegram:123456)

AgentOrchestrator:

  • Purpose: Wraps NativeAgent with delegation, compaction, memory, usage tracking
  • Location: src/backends/native/orchestrator.ts
  • Key method: delegate(SubAgentRequest) — stateless single-turn call to any tier
  • Delegation tasks: compaction, memory extraction, classification, tool summarisation, complex reasoning

DaemonContext:

  • Purpose: Holds all initialized subsystems returned by startDaemon()
  • Location: src/daemon/index.ts
  • Contains: config, lifecycle, session/model/tool/channel/gateway/mcp/skill/agent registries

Entry Points

CLI Binary (flynn):

  • Location: src/cli/index.ts
  • Triggers: flynn start, flynn tui, flynn send, flynn sessions, flynn doctor, flynn config
  • Responsibilities: Parse args, load config, bootstrap subsystems

Daemon Start:

  • Location: src/daemon/index.ts → startDaemon(config)
  • Triggers: flynn start CLI command
  • Responsibilities: Initialize all subsystems in order, wire dependencies, start channel adapters and gateway

Gateway Server:

  • Location: src/gateway/server.ts
  • Triggers: HTTP/WS connections on configured port (default 18800)
  • Responsibilities: JSON-RPC routing, WebSocket session management, static UI serving, webhook HTTP endpoints

TUI:

  • Location: src/frontends/tui/minimal.ts (readline) and src/frontends/tui/fullscreen.ts (React/Ink)
  • Triggers: flynn tui or flynn tui --fullscreen
  • Responsibilities: Local interactive chat interface connecting to gateway via WebSocket

Error Handling

Strategy: Catch-and-convert with descriptive context. No global error handler.

Patterns:

  • Model layer: Retry with exponential backoff → tier fallback → global fallback → throw aggregated error
  • Tool execution: Promise.race timeout → catch → return ToolResult { success: false, error: message }
  • Channel adapters: Promise.allSettled for start/stop — log per-adapter errors, don't crash
  • Daemon: Lifecycle LIFO shutdown handlers — each wrapped in try/catch
  • Config: Zod validation throws with structured error messages on invalid config
  • Gateway: JSON-RPC error codes (ParseError, MethodNotFound, InternalError)
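
The tool execution pattern (race against a timeout, then convert any failure into a `ToolResult`) can be sketched like this. The `executeWithTimeout` helper and the `output` field are hypothetical; only the `{ success: false, error: message }` shape comes from the list above.

```typescript
interface ToolResult { success: boolean; output?: string; error?: string }

async function executeWithTimeout(
  run: () => Promise<ToolResult>,
  timeoutMs: number,
): Promise<ToolResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`Tool timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Whichever settles first wins: the tool itself or the timeout rejection.
    return await Promise.race([run(), timeout]);
  } catch (err) {
    // Catch-and-convert: the caller always receives a ToolResult, never an exception.
    const message = err instanceof Error ? err.message : String(err);
    return { success: false, error: message };
  } finally {
    clearTimeout(timer);
  }
}
```

Note that `Promise.race` does not cancel the losing promise; a tool that overruns keeps running in the background unless it supports its own cancellation.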

Cross-Cutting Concerns

Logging: console.log/console.error/console.warn/console.debug throughout. No structured logging framework. Debug-level messages for model fallback decisions.

Validation: Zod for config validation (src/config/schema.ts). Tool args are shaped only by the input schema supplied to the model; there is no runtime validation of tool args beyond what each tool checks itself.

Authentication: Multi-layer:

  • Gateway: Bearer token auth + optional Tailscale identity header (src/gateway/auth.ts)
  • Telegram: allowed_chat_ids whitelist
  • Discord: allowed_guild_ids + allowed_channel_ids whitelists
  • Slack: allowed_channel_ids whitelist + signing secret
  • WhatsApp: allowed_numbers + allowed_group_ids whitelists
  • Webhooks: HMAC signature verification (per-webhook secret)
  • Pairing: DM pairing codes for unknown senders (src/channels/pairing.ts)

Tool Policy: Profile-based filtering (minimal/messaging/coding/full) + glob-pattern allow/deny lists + per-agent/per-provider overrides (src/tools/policy.ts).

Configuration: Single YAML file with ${ENV_VAR} expansion, validated by comprehensive Zod schema. Every subsystem is feature-toggled via config. Default config path: ~/.config/flynn/config.yaml.


Architecture analysis: 2025-02-09