# Changelog

All notable changes to Flynn are documented in this file.

## [Unreleased]

### Added

- **Native Audio Support** -- Smart routing for voice messages: audio-capable models
  (Gemini, OpenAI, GitHub) receive raw audio directly via `AudioSource` content parts;
  non-audio models (Anthropic, Bedrock, Ollama, llama.cpp) fall back to Whisper
  transcription. `supportsAudioInput()` capability check with per-model `supports_audio`
  config override. Audio token estimation (base64 -> bytes -> duration -> tokens at
  32 tokens/sec). 38 new tests (18 capabilities + 15 media + 5 token estimation).
- **Agent Tool: audio.transcribe** -- Transcribe audio files via a Whisper-compatible
  API endpoint. Configurable via `audio.transcription_endpoint`; supports OGG, MP3,
  WAV, WebM, MP4, and M4A formats.
- **xAI (Grok) Provider** -- xAI as an OpenAI-compatible model provider with
  `provider: xai` config. Supports grok-3, grok-3-mini, grok-2, grok-2-mini, and
  grok-3-fast. Uses the `XAI_API_KEY` env var or config `api_key`.
- **Voyage AI Embeddings** -- Voyage AI embedding provider for memory vector
  search. OpenAI-compatible API at `https://api.voyageai.com/v1`; defaults to
  1024 dimensions. Uses the `VOYAGE_API_KEY` env var.
- **Lane Queue** -- Per-session FIFO queue in the gateway that serializes
  concurrent requests instead of rejecting them, ensuring ordered execution
  within each session. 9 tests.
- **Credential Redaction** -- Expanded `redactConfig()` from 2 to 18+ secret
  fields: all channel tokens, model tier `api_key`s, embedding keys, webhook
  secrets, Gmail credentials, and MCP server env vars.
- **Web UI Token Dashboard** -- New Usage page in the web dashboard SPA with
  summary cards (total tokens, cost, sessions), a per-session breakdown table,
  and 10s auto-refresh. New `system.tokenUsage` gateway endpoint.
- **Gateway Lock** -- Single-client mode for the WebSocket gateway. When
  `server.lock: true`, only one connection is allowed at a time; additional
  connections are rejected with close code 4003. The web UI detects the locked
  state. 4 tests.
- **Shell Completion** -- `flynn completion <shell>` command that generates bash,
  zsh, and fish completions. The `--install` flag writes to the appropriate shell
  config file. 11 tests.
- **Tailscale Serve** -- Auto-expose the gateway via `tailscale serve` on daemon
  start. `isTailscaleAvailable()` check, auto-start/stop lifecycle, and
  `flynn doctor` integration. 6 tests.
- **DM Pairing Codes** -- PairingManager with time-limited codes for
  authenticating unknown DM senders. Integrated into all 4 channel adapters
  (Telegram, Discord, Slack, WhatsApp). Gateway handlers (`pairing.generate`,
  `pairing.list`, `pairing.revoke`). TUI `/pair` command with generate/list/revoke
  subcommands wired through PairingManager. SQLite persistence via `PairingStore`
  interface -- approved senders survive daemon restarts. Configurable TTL and code
  length. 35 tests.
- **Zhipu AI (GLM) Provider** -- Support for Zhipu AI's GLM models (glm-4.5, glm-4.7, etc.)
  via their OpenAI-compatible API at `https://api.z.ai/api/paas/v4`. Uses `provider: zhipuai`
  in config with `api_key` or `ZHIPUAI_API_KEY` env var.
- **Agent Tool: file.patch** -- Multi-file, multi-hunk structured patch tool. Apply
  line-based replacements, insertions, and deletions across multiple files in a single
  tool call. Hunks are applied bottom-up to preserve line numbers. 10 tests.
- **Gmail Pub/Sub Watcher** -- Automation source monitoring Gmail via Google Cloud
  Pub/Sub push notifications with polling fallback. OAuth2 auth, configurable watch
  labels, template rendering with email metadata placeholders (`{{from}}`, `{{subject}}`,
  `{{snippet}}`, `{{date}}`, `{{id}}`, `{{labels}}`). Wired into daemon lifecycle and
  gateway (`POST /gmail/push` endpoint). 16 tests.
- **Inbound Webhooks** -- HTTP endpoints (`POST /webhooks/:name`) that trigger agent
  processing. Config-driven with per-webhook HMAC signature verification, message
  template rendering (`{{body}}`, `{{json.field}}`), and output channel routing.
  Bypasses gateway token auth in favour of per-webhook secrets.
- **Heartbeat Monitor** -- Periodic health check system with 5 configurable checks:
  gateway responsiveness, model router status, channel adapter connectivity, memory
  store accessibility, and disk space. Sends failure notifications after a configurable
  threshold and recovery notifications when health is restored.
- **Vector Memory Search** -- Hybrid keyword + semantic search for the memory system.
  Provider-agnostic embeddings (OpenAI, Gemini, Ollama, llama.cpp) stored in SQLite
  with brute-force cosine similarity. Background indexer processes dirty namespaces
  every 30s. Configurable hybrid weighting (0.0 = keyword only, 1.0 = vector only).
  Graceful fallback to keyword search when embeddings are unavailable.
- **Docker Deployment** -- Multi-stage Dockerfile (node:22-alpine build + runtime),
  `.dockerignore`, and `docker-compose.yml`. Native dependency handling for
  `better-sqlite3`. `FLYNN_DATA_DIR` environment variable for configurable data
  directory in container deployments.
- **Agent Tools: sessions.\*** -- 4 new agent-callable tools (`sessions.list`,
  `sessions.history`, `sessions.create`, `sessions.delete`) wrapping SessionManager
  for runtime session management by the AI agent
- **Agent Tools: agents.list** -- New tool exposing AgentConfigRegistry to the agent,
  listing all registered agent configurations with tiers, profiles, and sandbox status
- **Agent Tools: message.send** -- Cross-channel messaging tool allowing the agent to
  proactively send messages to any connected channel (Telegram, Discord, Slack, etc.)
- **Agent Tools: cron.\*** -- 2 new tools (`cron.list`, `cron.trigger`) for runtime
  cron job management, allowing the agent to list and manually trigger scheduled jobs
- **Web UI Dashboard (P7)** -- Full SPA control dashboard at the gateway web UI with
  four pages: Dashboard (health stats, channels, auto-refresh), Chat (session selector,
  streaming tool events, markdown rendering), Sessions (list, history viewer, delete),
  and Settings (hook pattern editor, tool list, config viewer). No build step: vanilla
  JS with ES modules, hash-based routing, and a WebSocket JSON-RPC client with auto-reconnect.
- **Gateway: sessions.delete** -- New handler to clear a session's message history
- **Gateway: sessions.switch** -- New handler to switch a WebSocket connection to a
  different session
- **Gateway: system.channels** -- New handler listing active channel adapters and their
  connection status
- **Gateway: system.usage** -- New handler returning aggregated usage stats (uptime,
  sessions, connections, tools)
- **CLI Surface** -- Full command-line interface via `flynn` binary with 6 commands:
  `start`, `tui`, `send`, `sessions`, `doctor`, `config`
- **Doctor Diagnostics** -- `flynn doctor` validates config, YAML parsing, schema,
  env vars, data directory, session DB, model config, Telegram, MCP servers, and skills
- **Cron Scheduling** -- `automation.cron` config for scheduled agent messages with
  output channel routing (e.g. fire a prompt at 9 AM, send the response to Telegram)
- **CronScheduler Channel Adapter** -- Implements `ChannelAdapter` interface for
  cron-triggered messages through the standard agent pipeline
- **CLI Shared Utilities** -- Config loading, data dir resolution, secret redaction,
  status formatting for all CLI commands
- **CronJobConfig Type Export** -- `CronJobConfig` type available from `config/index.ts`
- **Agent Tool: system.info** -- Get current date, time, hostname, platform,
  architecture, OS release, uptime, Node.js version, memory usage, and working
  directory. Available in all tool profiles.
- **Runtime Context Injection** -- System prompt now automatically includes current
  date and time via a `# Runtime Context` section in every session
- **Local Model Tool Calling** -- Ollama and llama.cpp clients now support tool
  calling. Tools are converted to each backend's native format, tool call responses
  are parsed with generated IDs, and `stopReason` is set to `tool_use`. Ollama
  streaming also handles `thinking` fields from reasoning models (deepseek-r1,
  glm-4.7-flash). llama.cpp accumulates streaming tool call deltas across chunks.
  16 tests (8 Ollama + 8 llama.cpp).

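
The **Native Audio Support** entry estimates audio tokens as base64 -> bytes -> duration -> tokens at 32 tokens/sec. Below is a minimal TypeScript sketch of that pipeline. Only the 32 tokens/sec rate comes from the entry. The changelog does not state the bitrate used to turn bytes into seconds, so `ASSUMED_BYTES_PER_SECOND` (16 kbps, a plausible rate for compressed voice notes) is hypothetical, as are the function names. The byte count is derived from the base64 length and padding, so the payload never has to be decoded just to size it.

```typescript
// Rate stated in the changelog entry.
const AUDIO_TOKENS_PER_SECOND = 32;
// Hypothetical: assumed average bitrate of compressed voice audio (16 kbps).
const ASSUMED_BYTES_PER_SECOND = 2000;

// Decoded byte length of an unwrapped base64 string, computed without decoding it.
function base64DecodedLength(b64: string): number {
  const padding = b64.endsWith("==") ? 2 : b64.endsWith("=") ? 1 : 0;
  return Math.floor((b64.length * 3) / 4) - padding;
}

// base64 -> bytes -> duration (seconds) -> tokens, rounded up.
function estimateAudioTokens(base64Audio: string): number {
  const bytes = base64DecodedLength(base64Audio);
  const seconds = bytes / ASSUMED_BYTES_PER_SECOND;
  return Math.ceil(seconds * AUDIO_TOKENS_PER_SECOND);
}

// One second of audio at the assumed bitrate: 2668 base64 chars decode to 2000 bytes.
const oneSecond = "A".repeat(2667) + "=";
console.log(estimateAudioTokens(oneSecond)); // 32
```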
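
The **Lane Queue** entry serializes concurrent requests per session instead of rejecting them. A common TypeScript way to get per-key FIFO ordering is to chain each new task onto the promise of the previous task for the same key. The sketch below is hypothetical (`LaneQueue`, `enqueue`, `activeLanes`) and is not Flynn's gateway code. It assumes that a failed task should not block later tasks in the same lane, and that different sessions may run concurrently.

```typescript
// Per-session FIFO: each task starts only after the previous task for the
// same session has settled. Different sessions run independently.
class LaneQueue {
  private tails = new Map<string, Promise<void>>();

  enqueue<T>(sessionId: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(sessionId) ?? Promise.resolve();
    const result = previous.then(() => task());
    // The stored tail never rejects, so one failing task does not stall the lane.
    const tail = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(sessionId, tail);
    // Drop the map entry once the lane goes idle to avoid unbounded growth.
    void tail.then(() => {
      if (this.tails.get(sessionId) === tail) this.tails.delete(sessionId);
    });
    return result;
  }

  // Number of sessions that still have queued or running work.
  get activeLanes(): number {
    return this.tails.size;
  }
}
```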
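
The **Vector Memory Search** entry describes brute-force cosine similarity and a hybrid weight where 0.0 means keyword only and 1.0 means vector only. A linear blend is the simplest reading of that knob, and the sketch below assumes it. It also assumes keyword scores are already normalized to [0, 1], and it falls back to the keyword score when either embedding is missing. `MemoryHit`, `hybridRank`, and the score fields are hypothetical names.

```typescript
interface MemoryHit {
  id: string;
  keywordScore: number; // assumed normalized to [0, 1]
  embedding?: number[]; // absent if the background indexer has not reached it yet
}

// Cosine similarity; returns 0 when either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute force: score every hit against the query, then sort best-first.
function hybridRank(
  hits: MemoryHit[],
  queryEmbedding: number[] | undefined,
  weight: number, // 0.0 = keyword only, 1.0 = vector only
): { id: string; score: number }[] {
  return hits
    .map((hit) => {
      const vector =
        queryEmbedding && hit.embedding ? cosineSimilarity(queryEmbedding, hit.embedding) : undefined;
      // Graceful fallback: without embeddings, use the keyword score alone.
      const score =
        vector === undefined ? hit.keywordScore : (1 - weight) * hit.keywordScore + weight * vector;
      return { id: hit.id, score };
    })
    .sort((x, y) => y.score - x.score);
}
```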
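
The **Agent Tool: file.patch** entry applies hunks bottom-up so that line numbers stay valid. The sketch below shows the idea. Hunks are sorted by descending start line, so each splice changes only lines below every hunk that has not yet been applied. If hunks were applied top-down, an insertion near the top would shift every later target line. The `Hunk` shape (1-based `start`, `deleteCount`, `insert`) is a hypothetical stand-in for the tool's real schema, and overlapping hunks are not detected.

```typescript
// Hypothetical hunk shape: replace `deleteCount` lines starting at 1-based
// line `start` with `insert`. deleteCount 0 = pure insertion; empty insert = deletion.
interface Hunk {
  start: number;
  deleteCount: number;
  insert: string[];
}

function applyHunks(content: string, hunks: Hunk[]): string {
  const lines = content.split("\n");
  // Bottom-up: highest start line first, so hunks higher in the file still
  // refer to the original line numbers when their turn comes.
  const ordered = [...hunks].sort((a, b) => b.start - a.start);
  for (const hunk of ordered) {
    if (hunk.start < 1 || hunk.start - 1 + hunk.deleteCount > lines.length) {
      throw new Error(`hunk at line ${hunk.start} is out of range`);
    }
    lines.splice(hunk.start - 1, hunk.deleteCount, ...hunk.insert);
  }
  return lines.join("\n");
}

// Both hunks use line numbers from the ORIGINAL file.
const patched = applyHunks("a\nb\nc\nd", [
  { start: 1, deleteCount: 0, insert: ["header"] }, // insert before line 1
  { start: 3, deleteCount: 1, insert: ["C"] }, // replace original line 3 ("c")
]);
// patched === "header\na\nb\nC\nd"
```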
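
The **Inbound Webhooks** entry verifies a per-webhook HMAC signature. Below is a minimal sketch using Node's built-in `node:crypto` (`createHmac`, `timingSafeEqual`). Several details are assumptions, not Flynn's documented format: the SHA-256 hash, the hex encoding, and the optional `sha256=` prefix (the convention of GitHub's `X-Hub-Signature-256` header). `verifyWebhookSignature` is a hypothetical name. The HMAC must be computed over the exact raw body bytes, not over re-serialized JSON.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Returns true if `signature` is the hex HMAC-SHA256 of the raw request body
// under this webhook's secret. Accepts an optional "sha256=" prefix.
function verifyWebhookSignature(secret: string, rawBody: string | Buffer, signature: string): boolean {
  const provided = signature.startsWith("sha256=") ? signature.slice("sha256=".length) : signature;
  if (!/^[0-9a-f]+$/i.test(provided)) return false;
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(provided, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first;
  // the constant-time compare avoids leaking how many bytes matched.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```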
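
The **Gmail Pub/Sub Watcher** entry (`{{from}}`, `{{subject}}`, and so on) and the **Inbound Webhooks** entry (`{{body}}`, `{{json.field}}`) both render `{{...}}` placeholders. A small renderer that resolves dotted paths could look like the sketch below. Leaving unknown placeholders untouched and JSON-encoding object values are choices made by this sketch, not documented behaviour, and `renderTemplate` is a hypothetical name.

```typescript
// Replaces {{name}} or {{a.b.c}} with values from `vars`, walking dotted paths.
function renderTemplate(template: string, vars: Record<string, unknown>): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (placeholder: string, path: string) => {
    let value: unknown = vars;
    for (const key of path.split(".")) {
      if (value === null || typeof value !== "object") {
        value = undefined;
        break;
      }
      value = (value as Record<string, unknown>)[key];
    }
    if (value === undefined) return placeholder; // leave unknown placeholders as-is
    return typeof value === "object" ? JSON.stringify(value) : String(value);
  });
}

// Webhook-style rendering: `body` is the raw payload, `json` its parsed form.
const rawBody = '{"repo":{"name":"flynn"},"action":"opened"}';
const message = renderTemplate("{{json.action}} in {{json.repo.name}}", {
  body: rawBody,
  json: JSON.parse(rawBody),
});
// message === "opened in flynn"
```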
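
The **DM Pairing Codes** entry describes time-limited codes with a configurable TTL and code length. The sketch below models only that core: it generates numeric codes with `crypto.randomInt` and accepts each one before it expires. `PairingCodeBook` is hypothetical. It is not the real `PairingManager`, and it omits the channel integration and `PairingStore` persistence the entry mentions. Single-use redemption is an assumption. The injectable clock is there so expiry can be tested without waiting.

```typescript
import { randomInt } from "node:crypto";

class PairingCodeBook {
  private readonly expiries = new Map<string, number>(); // code -> expiry time (ms)
  private readonly ttlMs: number;
  private readonly codeLength: number;
  private readonly now: () => number;

  constructor(ttlMs: number, codeLength: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.codeLength = codeLength;
    this.now = now;
  }

  generate(): string {
    let code = "";
    do {
      code = "";
      // randomInt draws from a CSPRNG; Math.random() would make codes guessable.
      for (let i = 0; i < this.codeLength; i++) code += randomInt(10).toString();
    } while (this.expiries.has(code)); // avoid colliding with a live code
    this.expiries.set(code, this.now() + this.ttlMs);
    return code;
  }

  // A valid code is consumed on first use; expired or unknown codes fail.
  redeem(code: string): boolean {
    const expiresAt = this.expiries.get(code);
    this.expiries.delete(code);
    return expiresAt !== undefined && this.now() < expiresAt;
  }
}
```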
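
The **Heartbeat Monitor** entry sends a failure notification once failures reach a configurable threshold and a recovery notification when health is restored. For each check, that behaviour reduces to a small state machine over consecutive results. The sketch below assumes the threshold counts consecutive failures and that each outage produces exactly one failure notice and one recovery notice. `CheckState` and `Notification` are hypothetical names.

```typescript
type Notification = "failure" | "recovery" | null;

// Tracks one health check (e.g. disk space) across heartbeat ticks.
class CheckState {
  private consecutiveFailures = 0;
  private alerting = false;
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Feed one check result; returns the notification to send, if any.
  record(healthy: boolean): Notification {
    if (healthy) {
      this.consecutiveFailures = 0;
      if (this.alerting) {
        this.alerting = false;
        return "recovery";
      }
      return null;
    }
    this.consecutiveFailures++;
    if (!this.alerting && this.consecutiveFailures >= this.threshold) {
      this.alerting = true; // notify once per outage, not on every failing tick
      return "failure";
    }
    return null;
  }
}

const disk = new CheckState(3);
const results = [false, false, false, false, true, true].map((ok) => disk.record(ok));
// results: [null, null, "failure", null, "recovery", null]
```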
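
The **Credential Redaction** entry extends `redactConfig()` to cover channel tokens, model tier `api_key`s, embedding keys, webhook secrets, Gmail credentials, and MCP server env vars. One way to do that is a recursive walk keyed on field names. The key list below is illustrative only, because the changelog does not name the 18+ fields. Treating every value under an `env` map as secret is also an assumption. `redactSecrets` is a hypothetical name, and it returns a copy so the live config is never mutated.

```typescript
// Illustrative field names only; not the actual set redacted by redactConfig().
const SECRET_KEYS = new Set([
  "api_key", "token", "bot_token", "app_token", "secret", "client_secret", "refresh_token", "password",
]);
const REDACTED = "[REDACTED]";

// Returns a deep copy of `value` with secret string fields replaced.
function redactSecrets(value: unknown, redactAll = false): unknown {
  if (Array.isArray(value)) return value.map((item) => redactSecrets(item, redactAll));
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      // Everything under an `env` map (e.g. MCP server env vars) is treated as secret.
      out[key] = redactSecrets(child, redactAll || key === "env" || SECRET_KEYS.has(key));
    }
    return out;
  }
  // Leave empty strings alone so "not configured" stays visible in diagnostics.
  return redactAll && typeof value === "string" && value !== "" ? REDACTED : value;
}
```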
### Changed

- **Gateway Server** -- `GatewayServerConfig` now accepts `channelRegistry` for
  channel status reporting; static file server supports `.mjs`, `.png`, `.ico`, `.woff2`
- **Entry Points Refactored** -- `src/index.ts` and `src/tui.ts` now delegate to
  the CLI module (`src/cli/index.ts`) instead of directly starting the daemon/TUI
- **Daemon Wiring** -- CronScheduler auto-registers in the channel registry when
  `automation.cron` jobs are configured; channelRegistry passed to GatewayServer

## [0.1.0] - 2026-02-05

### Added

- **Core Agent** -- NativeAgent with conversation history and iterative tool use
- **Model Providers** -- Anthropic Claude, OpenAI, Ollama, llama.cpp with streaming
- **Model Router** -- Intelligent routing with fallback chains and tier switching
- **Telegram Bot** -- Full Telegram frontend with commands, confirmations, tool status
- **Terminal UI** -- Minimal (readline) and fullscreen (React/Ink) modes with
  markdown rendering, streaming, model switching, and session transfer
- **Session Persistence** -- SQLite-backed sessions with multi-frontend support
- **Hook Engine** -- Pattern-based confirmation system for sensitive tool operations
- **Tool Framework** -- Registry, executor, and builtin tools (shell, file, web-fetch)
- **Channel Abstraction** -- Unified ChannelAdapter interface with Telegram and WebChat
- **WebSocket Gateway** -- JSON-RPC protocol with API key auth and web UI dashboard
- **MCP Integration** -- External tool server support via Model Context Protocol
- **Skills System** -- Extensible capability packages (bundled, managed, workspace tiers)
- **Config System** -- YAML config with Zod validation and env var expansion
- **Daemon Lifecycle** -- Graceful shutdown with ordered cleanup handlers

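
The 0.1.0 **Config System** entry expands environment variables in the YAML config. The sketch below assumes a `${VAR}` syntax applied recursively to string values after YAML parsing and before schema validation. It also assumes an unset variable should fail loudly at startup rather than expand to an empty string. The changelog confirms neither the syntax nor the unset-variable behaviour, and `expandEnv` is a hypothetical name.

```typescript
type Env = Record<string, string | undefined>;

// Recursively replaces ${VAR} in every string of a parsed config object.
// Unset variables raise an error so a missing secret is caught at startup.
function expandEnv(value: unknown, env: Env): unknown {
  if (typeof value === "string") {
    return value.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match: string, name: string) => {
      const resolved = env[name];
      if (resolved === undefined) throw new Error(`config references unset env var ${name}`);
      return resolved;
    });
  }
  if (Array.isArray(value)) return value.map((item) => expandEnv(item, env));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(Object.entries(value).map(([key, child]) => [key, expandEnv(child, env)]));
  }
  return value; // numbers, booleans, null pass through unchanged
}

// Hypothetical call site: expandEnv(parsedYaml, process.env) before Zod validation.
```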
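
The 0.1.0 **Model Router** entry mentions fallback chains. In its simplest form, a fallback chain tries each model in order and returns the first successful response, collecting errors along the way so a total failure is easy to diagnose. `ModelClient` and `completeWithFallback` are hypothetical, and this sketch omits the tier switching and streaming that the real router also handles.

```typescript
interface ModelClient {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Tries each client in order; the first successful response wins.
async function completeWithFallback(
  chain: ModelClient[],
  prompt: string,
): Promise<{ model: string; text: string }> {
  const errors: string[] = [];
  for (const client of chain) {
    try {
      const text = await client.complete(prompt);
      return { model: client.name, text };
    } catch (err) {
      // Record why this link failed, then fall through to the next one.
      errors.push(`${client.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all models failed (${errors.join("; ")})`);
}
```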