# Architecture

**Analysis Date:** 2025-02-09

## Pattern Overview

**Overall:** Multi-channel AI agent daemon with layered pipeline architecture

**Key Characteristics:**
- Message pipeline: Channel Adapter → ChannelRegistry → MessageRouter → AgentOrchestrator → NativeAgent → ModelClient
- Registry/factory pattern for extensible channels, tools, models, and agent configs
- Dependency injection via constructor config objects (no DI container)
- YAML + Zod config-driven feature toggling: most subsystems are optional and activated by config
- Lifecycle-managed daemon with ordered shutdown handlers (LIFO)
- SQLite for session persistence, filesystem for memory persistence, SQLite for vector embeddings

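The LIFO shutdown ordering can be sketched as below. This is a minimal illustration, not the project's lifecycle code: `Lifecycle`, `onShutdown`, and `shutdown` are hypothetical names, and collecting the names of failed handlers is an assumption made so the behaviour is observable.

```typescript
// Hypothetical names throughout; the real lifecycle module may differ.
type ShutdownHandler = () => void | Promise<void>;

class Lifecycle {
  private readonly handlers: { name: string; run: ShutdownHandler }[] = [];

  // Register a handler; later registrations shut down first.
  onShutdown(name: string, run: ShutdownHandler): void {
    this.handlers.push({ name, run });
  }

  // Run handlers in reverse registration order (LIFO). A failing handler is
  // recorded and does not prevent the remaining handlers from running.
  async shutdown(): Promise<string[]> {
    const failed: string[] = [];
    while (this.handlers.length > 0) {
      const handler = this.handlers.pop()!;
      try {
        await handler.run();
      } catch {
        failed.push(handler.name);
      }
    }
    return failed;
  }
}
```

Reverse order means a subsystem started last (for example a channel adapter) stops before the stores it depends on.
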
## Layers

**CLI Layer:**
- Purpose: Parse commands, load config, bootstrap daemon or TUI
- Location: `src/cli/`
- Contains: Command registrations (commander.js), config loading, TUI entry
- Depends on: Config, Daemon
- Used by: End user via `flynn` binary
- Entry: `src/cli/index.ts` → registers commands: `start`, `tui`, `send`, `sessions`, `doctor`, `config`, `completion`

**Config Layer:**
- Purpose: Load YAML config, validate with Zod, expand `${ENV_VAR}` references
- Location: `src/config/`
- Contains: Zod schema definitions (`schema.ts`), YAML loader with env expansion (`loader.ts`)
- Depends on: `zod`, `yaml`
- Used by: All layers; the `Config` type flows through the entire system
- Key file: `src/config/schema.ts`, the single source of truth for all configuration types (409 lines)

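A rough sketch of `${ENV_VAR}` expansion over an already-parsed config object follows. The helper name `expandEnv` is hypothetical, and throwing on an unset variable is an assumption; the real loader in `src/config/loader.ts` may handle missing variables differently.

```typescript
// Hypothetical helper; not the actual loader implementation.
function expandEnv(
  value: unknown,
  env: Record<string, string | undefined>,
): unknown {
  if (typeof value === "string") {
    // Replace each ${NAME} with the matching environment value.
    return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
      const resolved = env[name];
      if (resolved === undefined) {
        // Assumption: an unset variable is treated as a configuration error.
        throw new Error(`Missing environment variable: ${name}`);
      }
      return resolved;
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => expandEnv(item, env));
  }
  if (value !== null && typeof value === "object") {
    const expanded: Record<string, unknown> = {};
    for (const [key, item] of Object.entries(value as Record<string, unknown>)) {
      expanded[key] = expandEnv(item, env);
    }
    return expanded;
  }
  // Numbers, booleans, and null pass through unchanged.
  return value;
}
```

Running expansion before schema validation lets the validator see the final values rather than the placeholders.
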
**Channel Layer:**
- Purpose: Uniform messaging abstraction across platforms
- Location: `src/channels/`
- Contains: `ChannelAdapter` interface, `ChannelRegistry`, platform adapters
- Depends on: Platform SDKs (grammy, discord.js, @slack/bolt, whatsapp-web.js)
- Used by: Daemon (message routing), Automation (cron/webhook output)
- Interface: `ChannelAdapter` with `connect()`, `disconnect()`, `send()`, `onMessage()`
- Adapters: `src/channels/telegram/adapter.ts`, `src/channels/discord/adapter.ts`, `src/channels/slack/adapter.ts`, `src/channels/whatsapp/adapter.ts`, `src/channels/webchat/adapter.ts`

**Agent Layer (Backend):**
- Purpose: Core AI agent loop: process messages, execute tools, manage conversation
- Location: `src/backends/native/`
- Contains: `NativeAgent` (tool loop), `AgentOrchestrator` (delegation, compaction, memory)
- Depends on: Models, Tools, Session, Context, Memory
- Used by: Daemon (via message router), Gateway (via SessionBridge)
- Key abstraction: `NativeAgent` runs the tool loop; `AgentOrchestrator` wraps it with orchestration features

**Model Layer:**
- Purpose: Unified interface to LLM providers with tier-based routing and fallback
- Location: `src/models/`
- Contains: `ModelClient` interface, provider implementations, `ModelRouter`, retry logic, cost estimation
- Depends on: Provider SDKs (@anthropic-ai/sdk, openai, @google/generative-ai, ollama, @aws-sdk/client-bedrock-runtime)
- Used by: Agent layer, Gateway SessionBridge
- Providers: `anthropic.ts`, `openai.ts`, `gemini.ts`, `bedrock.ts`, `github.ts`, `local/ollama.ts`, `local/llamacpp.ts`

**Tool Layer:**
- Purpose: Tool registry, execution with policy enforcement and hook checks
- Location: `src/tools/`
- Contains: `ToolRegistry`, `ToolExecutor`, `ToolPolicy`, builtin tool implementations
- Depends on: Hooks (confirmation), Memory, Process/Browser managers
- Used by: Agent layer (tool loop), Gateway (tool execution)
- Three tool patterns:
  - Static: `export const fooTool: Tool` (e.g., `src/tools/builtin/shell.ts`)
  - Factory: `export function createFooTool(dep): Tool` (e.g., `src/tools/builtin/media-send.ts`)
  - Multi-factory: `export function createFooTools(dep): Tool[]` (e.g., `src/tools/builtin/process/index.ts`)

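The three tool patterns might look roughly like this. Only the field names of `Tool` come from this document; the field types, the `output` field, and every tool shown (`echoTool`, `createNoteTool`, `createCounterTools`) are made up for illustration.

```typescript
// Field types are assumptions; only the field names come from this document.
interface ToolResult {
  success: boolean;
  output?: string; // hypothetical field
  error?: string;
}

interface Tool {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
  execute(args: Record<string, unknown>): Promise<ToolResult>;
}

// Static: a constant with no external dependencies.
const echoTool: Tool = {
  name: "echo",
  description: "Return the given text unchanged",
  inputSchema: { type: "object", properties: { text: { type: "string" } } },
  async execute(args) {
    return { success: true, output: String(args["text"] ?? "") };
  },
};

// Factory: one tool that closes over an injected dependency.
function createNoteTool(notes: string[]): Tool {
  return {
    name: "note.add",
    description: "Append a note to the injected list",
    inputSchema: { type: "object", properties: { text: { type: "string" } } },
    async execute(args) {
      notes.push(String(args["text"] ?? ""));
      return { success: true, output: `stored ${notes.length} note(s)` };
    },
  };
}

// Multi-factory: several related tools that share one dependency.
function createCounterTools(counter: { value: number }): Tool[] {
  const make = (name: string, delta: number): Tool => ({
    name,
    description: `Change the counter by ${delta}`,
    inputSchema: { type: "object", properties: {} },
    async execute() {
      counter.value += delta;
      return { success: true, output: String(counter.value) };
    },
  });
  return [make("counter.increment", 1), make("counter.decrement", -1)];
}
```

The factory forms keep dependencies explicit, which matches the constructor-injection style described in the pattern overview.
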
**Session Layer:**
- Purpose: Persistent conversation history per channel+sender pair
- Location: `src/session/`
- Contains: `SessionStore` (SQLite), `SessionManager` (in-memory cache + store), `ManagedSession`
- Depends on: `better-sqlite3`
- Used by: Agent layer, Gateway SessionBridge

**Memory Layer:**
- Purpose: Persistent knowledge across sessions: namespace-based files + hybrid vector search
- Location: `src/memory/`
- Contains: `MemoryStore` (file-based), `VectorStore` (SQLite-backed embeddings), `HybridSearch`, embedding providers, text chunker
- Depends on: Embedding providers (OpenAI, Gemini, Ollama, llama.cpp, Voyage)
- Used by: Agent layer (memory injection into system prompt), Tools (memory.read/write/search)

**Context Layer:**
- Purpose: Token estimation and conversation compaction
- Location: `src/context/`
- Contains: Token estimator (`tokens.ts`), compaction logic (`compaction.ts`)
- Depends on: Agent layer (delegation for compaction), Memory (extraction)
- Used by: AgentOrchestrator (automatic compaction before each message)

**Gateway Layer:**
- Purpose: WebSocket JSON-RPC server + HTTP static file server + vanilla JS dashboard
- Location: `src/gateway/`
- Contains: `GatewayServer`, `Router`, `SessionBridge`, `LaneQueue`, auth, protocol, static serving, Tailscale Serve
- Depends on: `ws`, Session, Agent, Tools
- Used by: WebChat adapter, TUI (connects as WS client), external dashboard clients
- Protocol: JSON-RPC 2.0 over WebSocket

**Hooks Layer:**
- Purpose: Pattern-based tool confirmation engine
- Location: `src/hooks/`
- Contains: `HookEngine` with glob-pattern matching for confirm/log/silent actions
- Depends on: Nothing (pure logic)
- Used by: ToolExecutor (checks before execution)

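Glob-based hook resolution can be approximated as follows. This sketch assumes patterns are matched against tool names, that `*` is the only wildcard, that the first matching rule wins, and that unmatched tools default to `silent`; none of that is confirmed by this document, and `HookRule` and `resolveAction` are hypothetical names.

```typescript
type HookAction = "confirm" | "log" | "silent";

// Hypothetical rule shape.
interface HookRule {
  pattern: string;
  action: HookAction;
}

// Convert a simple glob (only `*` is special) into an anchored RegExp.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

// Return the action of the first rule whose pattern matches the tool name.
function resolveAction(
  toolName: string,
  rules: HookRule[],
  fallback: HookAction = "silent",
): HookAction {
  for (const rule of rules) {
    if (globToRegExp(rule.pattern).test(toolName)) {
      return rule.action;
    }
  }
  return fallback;
}
```

With rules `[{ pattern: "memory.*", action: "log" }, { pattern: "shell", action: "confirm" }]`, a call to `memory.read` would be logged and `shell` would require confirmation.
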
**Prompt Layer:**
- Purpose: Assemble system prompt from template files (SOUL.md, AGENTS.md, etc.)
- Location: `src/prompt/`
- Contains: Template search and assembly logic
- Depends on: Filesystem (searches multiple directories)
- Used by: Daemon (system prompt construction at startup)

**MCP Layer:**
- Purpose: Model Context Protocol server management: bridges external MCP tools into the tool registry
- Location: `src/mcp/`
- Contains: `McpClient`, `McpManager`, tool bridging (`bridge.ts`)
- Depends on: `@modelcontextprotocol/sdk`
- Used by: Daemon (starts MCP servers, registers bridged tools)

**Skills Layer:**
- Purpose: Pluggable skill system: bundled, managed (installed), and workspace skills
- Location: `src/skills/`
- Contains: `SkillRegistry`, `SkillInstaller`, skill loader
- Depends on: Filesystem
- Used by: Daemon (loads skills, injects into system prompt)

**Agents Config Layer:**
- Purpose: Named agent configurations with per-agent overrides (model tier, tools, sandbox)
- Location: `src/agents/`
- Contains: `AgentConfigRegistry`, `AgentRouter` (channel+sender → agent config resolution)
- Depends on: Config
- Used by: Daemon message router (selects agent config per session)

**Automation Layer:**
- Purpose: Scheduled tasks, webhooks, heartbeat monitoring, Gmail watching
- Location: `src/automation/`
- Contains: `CronScheduler`, `WebhookHandler`, `HeartbeatMonitor`, `GmailWatcher`
- Depends on: `croner`, `googleapis`, Channels
- Used by: Daemon (registers as channel adapters or standalone monitors)

**Sandbox Layer:**
- Purpose: Docker container isolation for tool execution
- Location: `src/sandbox/`
- Contains: `DockerSandbox`, `SandboxManager`, sandboxed shell/process tools
- Depends on: Docker CLI
- Used by: Daemon message router (replaces shell/process tools with sandboxed variants)

**Frontend Layer (Legacy):**
- Purpose: Direct Telegram bot integration and TUI
- Location: `src/frontends/`
- Contains: Telegram bot handlers with confirmation UI, TUI (minimal readline + fullscreen React/Ink)
- Depends on: grammy (Telegram), ink/react (TUI)
- Used by: CLI commands (`start` uses Telegram frontend, `tui` uses TUI)

## Data Flow

**Inbound Message Processing (Channel → Response):**

1. Platform SDK receives message → Channel adapter normalizes to `InboundMessage`
2. Adapter calls `onMessage()` callback → `ChannelRegistry.handleInbound()` routes to `MessageHandler`
3. `createMessageRouter()` resolves agent config via `AgentRouter.resolve(channel, senderId)`
4. `getOrCreateAgent()` creates/retrieves `AgentOrchestrator` for the session (cached by `channel:sender:agentConfig`)
5. Audio routing: `supportsAudioInput()` checks provider capability. Native audio is passed through for Gemini/OpenAI/GitHub and transcribed via Whisper for all other providers
6. `orchestrator.process()` → injects memory context → checks compaction → delegates to `NativeAgent.process()`
7. `NativeAgent.toolLoop()` → sends to `ModelRouter.chat()` → model returns response or tool calls
8. If tool calls: `ToolExecutor.execute()` → policy check → hook check → tool execution → loop back to model
9. Final text response returned → reply function sends via adapter → `adapter.send()` → platform SDK

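Step 5 (audio routing) could be sketched like this. Only the name `supportsAudioInput()` and the provider split come from this document; its real signature, the `Provider` and `RoutedAudio` types, and the injected `transcribe` callback (standing in for the Whisper call) are assumptions.

```typescript
// Everything except the name supportsAudioInput() is hypothetical.
type Provider =
  | "anthropic" | "openai" | "gemini" | "bedrock" | "github" | "ollama" | "llamacpp";

// Providers this document lists as accepting native audio.
const NATIVE_AUDIO: ReadonlySet<Provider> = new Set<Provider>(["gemini", "openai", "github"]);

function supportsAudioInput(provider: Provider): boolean {
  return NATIVE_AUDIO.has(provider);
}

type RoutedAudio =
  | { kind: "native"; audio: Uint8Array }
  | { kind: "transcript"; text: string };

// Pass audio through untouched when the provider accepts it; otherwise
// transcribe first and forward plain text.
async function routeAudio(
  provider: Provider,
  audio: Uint8Array,
  transcribe: (audio: Uint8Array) => Promise<string>,
): Promise<RoutedAudio> {
  if (supportsAudioInput(provider)) {
    return { kind: "native", audio };
  }
  return { kind: "transcript", text: await transcribe(audio) };
}
```

Injecting the transcriber keeps the routing decision testable without calling a real speech-to-text service.
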
**Gateway WebSocket Flow:**

1. Client connects via WebSocket → auth check → `SessionBridge.connect()` → `NativeAgent` created
2. Client sends JSON-RPC message → `GatewayServer.handleMessage()` → `Router.dispatch()` → handler
3. `agent.send` handler → `LaneQueue` serializes requests → `SessionBridge` processes via `NativeAgent`
4. Streaming events sent back via WebSocket as JSON-RPC notifications
5. HTTP requests serve static dashboard UI or webhook endpoints

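One common way to serialize requests, as step 3 describes, is a promise chain per lane. The sketch below reuses the name `LaneQueue`, but its API (`enqueue(lane, task)`) and the idea of keying lanes by session are assumptions, not the project's implementation.

```typescript
// Hypothetical API; only the class name appears in this document.
class LaneQueue {
  // Tail of the promise chain for each lane.
  private readonly tails = new Map<string, Promise<void>>();

  // Run `task` after every earlier task in the same lane has settled.
  enqueue<T>(lane: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(lane) ?? Promise.resolve();
    const result = previous.then(task);
    // Swallow the outcome on the stored tail so one rejected task
    // does not block or reject the tasks queued behind it.
    this.tails.set(lane, result.then(() => undefined, () => undefined));
    return result;
  }
}
```

Tasks in different lanes still run concurrently; only tasks sharing a lane are ordered. This sketch never removes idle lanes from the map, which a long-running server would need to handle.
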
**Model Routing with Fallback:**

1. `ModelRouter.chat(request, tier)` → tries primary client for requested tier
2. If retry config enabled: `withRetry()` wraps call with exponential backoff
3. On failure → try tier-specific fallbacks (e.g., Anthropic → GitHub Models same model)
4. On failure → try global fallback chain (typically local model)
5. All failures → throw aggregated error

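The fallback sequence above can be sketched as follows. The request/response shapes, `RetryConfig`, and `chatWithFallback` are hypothetical; `withRetry` borrows the name from step 2, but its signature here is an assumption, as is applying retries only to the primary client.

```typescript
// Simplified, hypothetical shapes.
interface ChatRequest { prompt: string; }
interface ChatResponse { text: string; }
interface ModelClient { chat(request: ChatRequest): Promise<ChatResponse>; }
interface RetryConfig { attempts: number; baseDelayMs: number; }

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retry with exponential backoff: baseDelayMs, then 2x, 4x, ...
async function withRetry<T>(call: () => Promise<T>, retry: RetryConfig): Promise<T> {
  let lastError: unknown = new Error("retry.attempts must be at least 1");
  for (let attempt = 0; attempt < retry.attempts; attempt++) {
    try {
      return await call();
    } catch (error) {
      lastError = error;
      if (attempt < retry.attempts - 1) {
        await sleep(retry.baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}

// Primary (with retry) → tier-specific fallbacks → global fallbacks.
async function chatWithFallback(
  request: ChatRequest,
  primary: ModelClient,
  tierFallbacks: ModelClient[],
  globalFallbacks: ModelClient[],
  retry: RetryConfig,
): Promise<ChatResponse> {
  const failures: string[] = [];
  const describe = (error: unknown) => (error instanceof Error ? error.message : String(error));
  try {
    return await withRetry(() => primary.chat(request), retry);
  } catch (error) {
    failures.push(describe(error));
  }
  for (const client of [...tierFallbacks, ...globalFallbacks]) {
    try {
      return await client.chat(request);
    } catch (error) {
      failures.push(describe(error));
    }
  }
  // Aggregate every failure into one error for the caller.
  throw new Error(`All model clients failed: ${failures.join("; ")}`);
}
```
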
**Compaction Flow:**

1. Before each `process()`, `AgentOrchestrator.compactIfNeeded()` checks token count vs threshold
2. If threshold exceeded → `compactHistory()` splits messages into compactable + recent (keep N turns)
3. Delegates summarization to `fast` tier via `orchestrator.delegate()`
4. Optionally extracts memory facts via separate delegation call
5. Replaces session history with `[summary_message, ...recent_messages]`

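A simplified sketch of the compaction steps follows. It takes several shortcuts that are assumptions: tokens are estimated at roughly four characters each, the "keep N turns" rule is approximated by keeping the last N messages, the summary is stored as a `system` message, and summarization is an injected callback standing in for the `fast`-tier delegation. Memory-fact extraction (step 4) is omitted.

```typescript
// Hypothetical message shape.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough estimate: about four characters per token (an assumption).
function estimateTokens(messages: Message[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

// Return the history unchanged when under the threshold; otherwise replace
// the older messages with a single summary message.
async function compactIfNeeded(
  history: Message[],
  thresholdTokens: number,
  keepRecent: number,
  summarize: (messages: Message[]) => Promise<string>,
): Promise<Message[]> {
  if (estimateTokens(history) <= thresholdTokens || history.length <= keepRecent) {
    return history;
  }
  const splitAt = history.length - keepRecent;
  const compactable = history.slice(0, splitAt);
  const recent = history.slice(splitAt);
  const summary = await summarize(compactable);
  return [{ role: "system", content: `Summary of earlier conversation: ${summary}` }, ...recent];
}
```
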
**State Management:**
- Session history: SQLite (`~/.local/share/flynn/sessions.db`) + in-memory cache in `SessionManager`
- Memory: Namespace-based markdown files in `~/.local/share/flynn/memory/`
- Vectors: SQLite (`~/.local/share/flynn/vectors.db`) for embeddings
- Config: YAML file at `~/.config/flynn/config.yaml` (read once at startup)

## Key Abstractions

**ModelClient:**
- Purpose: Uniform interface to any LLM provider
- Interface: `chat(request: ChatRequest): Promise<ChatResponse>` + optional `chatStream()`
- Implementations: `src/models/anthropic.ts`, `src/models/openai.ts`, `src/models/gemini.ts`, `src/models/bedrock.ts`, `src/models/github.ts`, `src/models/local/ollama.ts`, `src/models/local/llamacpp.ts`
- Pattern: Each provider wraps its SDK and normalizes to `ChatResponse`

**ModelRouter:**
- Purpose: Tier-based model selection with cascading fallback
- Location: `src/models/router.ts`
- Tiers: `fast`, `default`, `complex`, `local`; each maps to a `ModelClient`
- Implements `ModelClient` interface itself, so consumers don't need to know about tiers

**ChannelAdapter:**
- Purpose: Normalize platform-specific messaging into a common interface
- Interface: `connect()`, `disconnect()`, `send(peerId, msg)`, `onMessage(handler)`
- Location: `src/channels/types.ts`
- Pattern: Each adapter wraps a platform SDK, handles auth/filtering, emits `InboundMessage`

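To make the `ChannelAdapter` contract concrete, here is a toy in-memory adapter. The four method names come from this document; the parameter and return types, the message shapes, and `LoopbackAdapter` itself are assumptions for illustration.

```typescript
// Hypothetical message shapes.
interface InboundMessage { channel: string; senderId: string; text: string; }
interface OutboundMessage { text: string; }
type MessageHandler = (message: InboundMessage) => Promise<void>;

// Method names follow this document; the types are assumed.
interface ChannelAdapter {
  connect(): Promise<void>;
  disconnect(): Promise<void>;
  send(peerId: string, msg: OutboundMessage): Promise<void>;
  onMessage(handler: MessageHandler): void;
}

// In-memory adapter: records outbound messages instead of calling an SDK.
class LoopbackAdapter implements ChannelAdapter {
  readonly sent: { peerId: string; msg: OutboundMessage }[] = [];
  private handler: MessageHandler | undefined;
  private connected = false;

  async connect(): Promise<void> { this.connected = true; }
  async disconnect(): Promise<void> { this.connected = false; }

  async send(peerId: string, msg: OutboundMessage): Promise<void> {
    if (!this.connected) throw new Error("adapter is not connected");
    this.sent.push({ peerId, msg });
  }

  onMessage(handler: MessageHandler): void { this.handler = handler; }

  // Stands in for a platform SDK event; normalizes to InboundMessage.
  async receive(senderId: string, text: string): Promise<void> {
    if (!this.connected || !this.handler) return;
    await this.handler({ channel: "loopback", senderId, text });
  }
}
```

An adapter like this is mainly useful in tests: a handler registered through `onMessage()` can reply with `send()`, and the test can then inspect `sent`.
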
**Tool:**
- Purpose: Executable capability exposed to the AI model
- Interface: `{ name, description, inputSchema, execute(args): Promise<ToolResult> }`
- Location: `src/tools/types.ts`
- Registration: tool file → `src/tools/builtin/index.ts` → `src/tools/index.ts` → `src/daemon/index.ts`

**Session:**
- Purpose: Conversation state (message history) for a channel+sender pair
- Interface: `addMessage()`, `getHistory()`, `clear()`, `replaceHistory()`
- Location: `src/session/manager.ts`
- ID format: `channel:senderId` (e.g., `telegram:123456`)

**AgentOrchestrator:**
- Purpose: Wraps NativeAgent with delegation, compaction, memory, usage tracking
- Location: `src/backends/native/orchestrator.ts`
- Key method: `delegate(SubAgentRequest)`, a stateless single-turn call to any tier
- Delegation tasks: compaction, memory extraction, classification, tool summarisation, complex reasoning

**DaemonContext:**
- Purpose: Holds all initialized subsystems returned by `startDaemon()`
- Location: `src/daemon/index.ts`
- Contains: config, lifecycle, session/model/tool/channel/gateway/mcp/skill/agent registries

## Entry Points

**CLI Binary (`flynn`):**
- Location: `src/cli/index.ts`
- Triggers: `flynn start`, `flynn tui`, `flynn send`, `flynn sessions`, `flynn doctor`, `flynn config`
- Responsibilities: Parse args, load config, bootstrap subsystems

**Daemon Start:**
- Location: `src/daemon/index.ts` → `startDaemon(config)`
- Triggers: `flynn start` CLI command
- Responsibilities: Initialize all subsystems in order, wire dependencies, start channel adapters and gateway

**Gateway Server:**
- Location: `src/gateway/server.ts`
- Triggers: HTTP/WS connections on configured port (default 18800)
- Responsibilities: JSON-RPC routing, WebSocket session management, static UI serving, webhook HTTP endpoints

**TUI:**
- Location: `src/frontends/tui/minimal.ts` (readline) and `src/frontends/tui/fullscreen.ts` (React/Ink)
- Triggers: `flynn tui` or `flynn tui --fullscreen`
- Responsibilities: Local interactive chat interface connecting to gateway via WebSocket

## Error Handling

**Strategy:** Catch-and-convert with descriptive context. No global error handler.

**Patterns:**
- Model layer: Retry with exponential backoff → tier fallback → global fallback → throw aggregated error
- Tool execution: `Promise.race` timeout → catch → return `ToolResult { success: false, error: message }`
- Channel adapters: `Promise.allSettled` for start/stop: log per-adapter errors, don't crash
- Daemon: Lifecycle LIFO shutdown handlers, each wrapped in try/catch
- Config: Zod validation throws with structured error messages on invalid config
- Gateway: JSON-RPC error codes (`ParseError`, `MethodNotFound`, `InternalError`)

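The tool-execution pattern (`Promise.race` timeout, then catch-and-convert) can be sketched as below. `executeWithTimeout` is a hypothetical helper, and the `output` field on `ToolResult` is an assumption; `success` and `error` follow the shape quoted above.

```typescript
interface ToolResult {
  success: boolean;
  output?: string; // hypothetical field
  error?: string;
}

// Race the tool against a timer and convert any failure, including the
// timeout, into a ToolResult instead of letting it throw.
async function executeWithTimeout(
  run: () => Promise<ToolResult>,
  timeoutMs: number,
): Promise<ToolResult> {
  let rejectTimeout: (error: Error) => void = () => {};
  const timeout = new Promise<never>((_resolve, reject) => {
    rejectTimeout = reject;
  });
  const timer = setTimeout(
    () => rejectTimeout(new Error(`Tool timed out after ${timeoutMs}ms`)),
    timeoutMs,
  );
  try {
    return await Promise.race([run(), timeout]);
  } catch (error) {
    return { success: false, error: error instanceof Error ? error.message : String(error) };
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

Note that `Promise.race` only stops waiting; the underlying tool keeps running unless it supports cancellation.
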
## Cross-Cutting Concerns

**Logging:** `console.log`/`console.error`/`console.warn`/`console.debug` throughout. No structured logging framework. Debug-level messages for model fallback decisions.

**Validation:** Zod for config validation (`src/config/schema.ts`). Tool args validated by model-provided schema. No runtime validation on tool args beyond what the tool itself checks.

**Authentication:** Multi-layer:
- Gateway: Bearer token auth + optional Tailscale identity header (`src/gateway/auth.ts`)
- Telegram: `allowed_chat_ids` whitelist
- Discord: `allowed_guild_ids` + `allowed_channel_ids` whitelists
- Slack: `allowed_channel_ids` whitelist + signing secret
- WhatsApp: `allowed_numbers` + `allowed_group_ids` whitelists
- Webhooks: HMAC signature verification (per-webhook secret)
- Pairing: DM pairing codes for unknown senders (`src/channels/pairing.ts`)

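For the webhook entry, HMAC verification with Node's built-in `crypto` module typically looks like the sketch below. The hash algorithm (SHA-256), the hex encoding of the signature, and the function name `verifyWebhookSignature` are assumptions; this document only states that each webhook has its own secret.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper. Assumes the sender signs the raw request body with
// HMAC-SHA256 and transmits the signature hex-encoded.
function verifyWebhookSignature(body: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(body).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Verification should run against the raw body bytes as received, before any JSON parsing, since re-serialized JSON may not match what the sender signed.
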
**Tool Policy:** Profile-based filtering (minimal/messaging/coding/full) + glob-pattern allow/deny lists + per-agent/per-provider overrides (`src/tools/policy.ts`).

**Configuration:** Single YAML file with `${ENV_VAR}` expansion, validated by comprehensive Zod schema. Every subsystem is feature-toggled via config. Default config path: `~/.config/flynn/config.yaml`.

---

*Architecture analysis: 2025-02-09*