William Valentin 28c78d469d docs: update audio config docs and add voice message failure fix to changelog
- README.md: Update audio config format to match schema (enabled + provider.* fields instead of old transcription_endpoint fields), add whisper.cpp server Docker example
- CHANGELOG.md: Add '### Fixed' section with voice message failure handling details
- config/default.yaml: Update audio section with new schema format and Docker setup example
2026-02-11 19:47:52 -08:00


Changelog

All notable changes to Flynn are documented in this file.

[Unreleased]

Added

  • Native Audio Support -- Smart routing for voice messages: audio-capable models (Gemini, OpenAI, GitHub) receive raw audio directly via AudioSource content parts; non-audio models (Anthropic, Bedrock, Ollama, llama.cpp) get Whisper transcription fallback. supportsAudioInput() capability check with per-model supports_audio config override. Audio token estimation (base64 -> bytes -> duration -> tokens at 32 tokens/sec). 38 new tests (18 capabilities + 15 media + 5 token estimation).

  • Agent Tool: audio.transcribe -- Transcribe audio files via a Whisper-compatible API endpoint. Configurable via the audio config section (enabled and provider.* fields, which replace the old transcription_endpoint fields); supports OGG, MP3, WAV, WebM, MP4, and M4A formats.

  • xAI (Grok) Provider -- xAI as an OpenAI-compatible model provider with provider: xai config. Supports grok-3, grok-3-mini, grok-2, grok-2-mini, grok-3-fast. Uses the XAI_API_KEY env var or config api_key.

  • Voyage AI Embeddings -- Voyage AI embedding provider for memory vector search. OpenAI-compatible API at https://api.voyageai.com/v1, defaults to 1024 dimensions. Uses VOYAGE_API_KEY env var.

  • Lane Queue -- Per-session FIFO queue in the gateway that serializes concurrent requests instead of rejecting them. Ensures ordered execution within each session. 9 tests.

  • Credential Redaction -- Expanded redactConfig() from 2 to 18+ secret fields: all channel tokens, model tier api_keys, embedding keys, webhook secrets, Gmail credentials, MCP server env vars.

  • Web UI Token Dashboard -- New Usage page in the web dashboard SPA with summary cards (total tokens, cost, sessions), per-session breakdown table, and 10s auto-refresh. New system.tokenUsage gateway endpoint.

  • Gateway Lock -- Single-client mode for the WebSocket gateway. When server.lock: true is set, only one connection is allowed at a time; additional connections are rejected with close code 4003. Web UI detects locked state. 4 tests.

  • Shell Completion -- flynn completion <shell> command generating bash, zsh, and fish completions. --install flag writes to the appropriate shell config file. 11 tests.

  • Tailscale Serve -- Auto-expose gateway via tailscale serve on daemon start. isTailscaleAvailable() check, auto-start/stop lifecycle, flynn doctor integration. 6 tests.

  • DM Pairing Codes -- PairingManager with time-limited codes for authenticating unknown DM senders. Integrated into all 4 channel adapters (Telegram, Discord, Slack, WhatsApp). Gateway handlers (pairing.generate, pairing.list, pairing.revoke). TUI /pair command with generate/list/revoke subcommands wired through PairingManager. SQLite persistence via PairingStore interface -- approved senders survive daemon restarts. Configurable TTL and code length. 35 tests.

  • Zhipu AI (GLM) Provider -- Support for Zhipu AI's GLM models (glm-4.5, glm-4.7, etc.) via their OpenAI-compatible API at https://api.z.ai/api/paas/v4. Uses provider: zhipuai in config with api_key or ZHIPUAI_API_KEY env var.

  • Agent Tool: file.patch -- Multi-file, multi-hunk structured patch tool. Apply line-based replacements, insertions, and deletions across multiple files in a single tool call. Hunks are applied bottom-up to preserve line numbers. 10 tests.

  • Gmail Pub/Sub Watcher -- Automation source monitoring Gmail via Google Cloud Pub/Sub push notifications with polling fallback. OAuth2 auth, configurable watch labels, template rendering with email metadata placeholders ({{from}}, {{subject}}, {{snippet}}, {{date}}, {{id}}, {{labels}}). Wired into daemon lifecycle and gateway (POST /gmail/push endpoint). 16 tests.

  • Inbound Webhooks -- HTTP endpoints (POST /webhooks/:name) that trigger agent processing. Config-driven with per-webhook HMAC signature verification, message template rendering ({{body}}, {{json.field}}), and output channel routing. Bypasses gateway token auth in favour of per-webhook secrets.

  • Heartbeat Monitor -- Periodic health check system with 5 configurable checks: gateway responsiveness, model router status, channel adapter connectivity, memory store accessibility, and disk space. Sends failure notifications after a configurable threshold and recovery notifications when health restores.

  • Vector Memory Search -- Hybrid keyword + semantic search for the memory system. Provider-agnostic embeddings (OpenAI, Gemini, Ollama, llama.cpp) stored in SQLite with brute-force cosine similarity. Background indexer processes dirty namespaces every 30s. Configurable hybrid weighting (0.0 = keyword only, 1.0 = vector only). Graceful fallback to keyword search when embeddings are unavailable.

  • Docker Deployment -- Multi-stage Dockerfile (node:22-alpine build + runtime), .dockerignore, and docker-compose.yml. Native dependency handling for better-sqlite3. FLYNN_DATA_DIR environment variable for configurable data directory in container deployments.

  • Agent Tools: sessions.* -- 4 new agent-callable tools (sessions.list, sessions.history, sessions.create, sessions.delete) wrapping SessionManager for runtime session management by the AI agent

  • Agent Tools: agents.list -- New tool exposing AgentConfigRegistry to the agent, listing all registered agent configurations with tiers, profiles, and sandbox status

  • Agent Tools: message.send -- Cross-channel messaging tool allowing the agent to proactively send messages to any connected channel (Telegram, Discord, Slack, etc.)

  • Agent Tools: cron.* -- 2 new tools (cron.list, cron.trigger) for runtime cron job management, allowing the agent to list and manually trigger scheduled jobs

  • Web UI Dashboard (P7) -- Full SPA control dashboard at the gateway web UI with four pages: Dashboard (health stats, channels, auto-refresh), Chat (session selector, streaming tool events, markdown rendering), Sessions (list, history viewer, delete), and Settings (hook pattern editor, tool list, config viewer). No build step: vanilla JS with ES modules, hash-based routing, and WebSocket JSON-RPC client with auto-reconnect.

  • Gateway: sessions.delete -- New handler to clear a session's message history

  • Gateway: sessions.switch -- New handler to switch a WebSocket connection to a different session

  • Gateway: system.channels -- New handler listing active channel adapters and their connection status

  • Gateway: system.usage -- New handler returning aggregated usage stats (uptime, sessions, connections, tools)

  • CLI Surface -- Full command-line interface via the flynn binary with 6 commands: start, tui, send, sessions, doctor, config

  • Doctor Diagnostics -- flynn doctor validates config, YAML parsing, schema, env vars, data directory, session DB, model config, Telegram, MCP servers, and skills

  • Cron Scheduling -- automation.cron config for scheduled agent messages with output channel routing (e.g. fire a prompt at 9 AM, send the response to Telegram)

  • CronScheduler Channel Adapter -- Implements ChannelAdapter interface for cron-triggered messages through the standard agent pipeline

  • CLI Shared Utilities -- Config loading, data dir resolution, secret redaction, status formatting for all CLI commands

  • CronJobConfig Type Export -- CronJobConfig type available from config/index.ts

  • Agent Tool: system.info -- Get current date, time, hostname, platform, architecture, OS release, uptime, Node.js version, memory usage, and working directory. Available in all tool profiles.

  • Runtime Context Injection -- System prompt now automatically includes current date and time via a # Runtime Context section in every session

  • Local Model Tool Calling -- Ollama and llama.cpp clients now support tool calling. Tools are converted to each backend's native format, tool call responses are parsed with generated IDs, and stopReason is set to tool_use. Ollama streaming also handles thinking fields from reasoning models (deepseek-r1, glm-4.7-flash). llama.cpp accumulates streaming tool call deltas across chunks. 16 tests (8 Ollama + 8 llama.cpp).
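
The bottom-up hunk ordering mentioned in the file.patch entry above can be illustrated with a minimal TypeScript sketch. The Hunk shape and the applyHunks function are hypothetical, not Flynn's actual schema, and the sketch assumes hunks within a file do not overlap. Applying the hunk with the highest start line first means the remaining hunks can still use the original line numbers.

```typescript
// Hypothetical hunk shape; the real file.patch schema is not shown in this changelog.
interface Hunk {
  startLine: number;   // 1-indexed first line affected, in ORIGINAL numbering
  deleteCount: number; // lines to remove (0 = pure insertion)
  insert: string[];    // lines to place at startLine
}

// Apply non-overlapping hunks to a copy of the file's lines.
function applyHunks(lines: string[], hunks: Hunk[]): string[] {
  const result = [...lines];
  // Highest start line first: edits near the end of the file cannot
  // shift the positions that hunks nearer the top still refer to.
  const ordered = [...hunks].sort((a, b) => b.startLine - a.startLine);
  for (const hunk of ordered) {
    result.splice(hunk.startLine - 1, hunk.deleteCount, ...hunk.insert);
  }
  return result;
}

const original = ["a", "b", "c", "d"];
const patched = applyHunks(original, [
  { startLine: 1, deleteCount: 1, insert: ["A1", "A2"] }, // replace line 1 with two lines
  { startLine: 3, deleteCount: 1, insert: [] },           // delete line 3 ("c")
]);
console.log(patched); // ["A1", "A2", "b", "d"]
```

Had the hunks been applied top-down, the replacement at line 1 would have grown the file by one line, and the deletion at line 3 would have removed "b" instead of "c".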

Changed

  • Gateway Server -- GatewayServerConfig now accepts channelRegistry for channel status reporting; static file server supports .mjs, .png, .ico, .woff2
  • Entry Points Refactored -- src/index.ts and src/tui.ts now delegate to CLI module (src/cli/index.ts) instead of directly starting daemon/TUI
  • Daemon Wiring -- CronScheduler auto-registers in channel registry when automation.cron jobs are configured; channelRegistry passed to GatewayServer

Fixed

  • Voice Message Failure Handling -- Telegram voice/audio messages now send feedback to the user when a download fails instead of being silently dropped. When audio transcription is not configured and the model is not audio-capable, a graceful error message is sent to the user instead of an empty message, which would cause an API crash.
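
The fixed voice-message flow could be sketched as a small decision function. All names below (VoiceContext, planVoiceMessage, and the reply strings) are illustrative assumptions; only supportsAudioInput() is named in this changelog, and the actual Telegram adapter code may be structured differently.

```typescript
// Hypothetical outcome of handling one incoming voice/audio message.
type VoiceOutcome =
  | { kind: "forward_audio" }                 // send raw audio to an audio-capable model
  | { kind: "transcribe" }                    // fall back to Whisper transcription
  | { kind: "notify_user"; message: string }; // tell the user what went wrong

interface VoiceContext {
  downloadOk: boolean;              // did the audio download succeed?
  modelSupportsAudio: boolean;      // e.g. the result of supportsAudioInput()
  transcriptionConfigured: boolean; // is the audio transcription provider set up?
}

function planVoiceMessage(ctx: VoiceContext): VoiceOutcome {
  if (!ctx.downloadOk) {
    // Previously a silent drop; now the user gets feedback.
    return { kind: "notify_user", message: "Sorry, I could not download that voice message." };
  }
  if (ctx.modelSupportsAudio) {
    return { kind: "forward_audio" };
  }
  if (ctx.transcriptionConfigured) {
    return { kind: "transcribe" };
  }
  // Never forward an empty message to the model API; explain instead.
  return {
    kind: "notify_user",
    message: "Voice messages need audio transcription to be configured for this model.",
  };
}

const outcome = planVoiceMessage({
  downloadOk: true,
  modelSupportsAudio: false,
  transcriptionConfigured: false,
});
console.log(outcome.kind); // "notify_user"
```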

[0.1.0] - 2026-02-05

Added

  • Core Agent -- NativeAgent with conversation history and iterative tool use
  • Model Providers -- Anthropic Claude, OpenAI, Ollama, llama.cpp with streaming
  • Model Router -- Intelligent routing with fallback chains and tier switching
  • Telegram Bot -- Full Telegram frontend with commands, confirmations, tool status
  • Terminal UI -- Minimal (readline) and fullscreen (React/Ink) modes with markdown rendering, streaming, model switching, and session transfer
  • Session Persistence -- SQLite-backed sessions with multi-frontend support
  • Hook Engine -- Pattern-based confirmation system for sensitive tool operations
  • Tool Framework -- Registry, executor, and builtin tools (shell, file, web-fetch)
  • Channel Abstraction -- Unified ChannelAdapter interface with Telegram and WebChat
  • WebSocket Gateway -- JSON-RPC protocol with API key auth and web UI dashboard
  • MCP Integration -- External tool server support via Model Context Protocol
  • Skills System -- Extensible capability packages (bundled, managed, workspace tiers)
  • Config System -- YAML config with Zod validation and env var expansion
  • Daemon Lifecycle -- Graceful shutdown with ordered cleanup handlers
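
As a rough illustration of the env var expansion mentioned under Config System, the sketch below substitutes ${NAME} placeholders from an environment map. The ${NAME} syntax and the expandEnv helper are assumptions made for illustration; this changelog does not specify which placeholder syntax Flynn's config loader accepts.

```typescript
// Replace ${NAME} placeholders with values from `env`.
// Unknown variables are left untouched so that a later validation
// step can report them instead of silently using an empty string.
function expandEnv(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (match: string, name: string) => {
    const resolved = env[name];
    return resolved === undefined ? match : resolved;
  });
}

// Example with a hand-built environment map (a real loader would
// more likely pass process.env).
const exampleEnv = { XAI_API_KEY: "abc123" };
console.log(expandEnv("api_key: ${XAI_API_KEY}", exampleEnv)); // "api_key: abc123"
console.log(expandEnv("api_key: ${MISSING_KEY}", exampleEnv)); // "api_key: ${MISSING_KEY}"
```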