docs: document native audio support across README, CHANGELOG, config, and planning docs

- README: add audio.transcribe to tool list, update media pipeline description, add Native Audio Support and Audio Transcription config sections, add supports_audio per-tier override example
- SOUL.md: add audio.transcribe to available tools list
- CHANGELOG: add native audio support and audio.transcribe tool entries
- config/default.yaml: add commented audio config section, supports_audio hint
- INTEGRATIONS.md: expand audio section with native passthrough, capabilities, smart routing, AudioSource type, token estimation, audio.transcribe tool
- STRUCTURE.md: add capabilities.ts and audio-transcribe.ts to key file listings
- ARCHITECTURE.md: update data flow step 5 to describe smart audio routing
2026-02-11 18:41:53 -08:00
parent 819ac26b3b
commit 5c531a760d
7 changed files with 87 additions and 8 deletions
@@ -6,6 +6,15 @@ All notable changes to Flynn are documented in this file.

 ### Added

+- **Native Audio Support** -- Smart routing for voice messages: audio-capable models
+  (Gemini, OpenAI, GitHub) receive raw audio directly via `AudioSource` content parts;
+  non-audio models (Anthropic, Bedrock, Ollama, llama.cpp) get Whisper transcription
+  fallback. `supportsAudioInput()` capability check with per-model `supports_audio`
+  config override. Audio token estimation (base64 -> bytes -> duration -> tokens at
+  32 tokens/sec). 38 new tests (18 capabilities + 15 media + 5 token estimation).
+- **Agent Tool: audio.transcribe** -- Transcribe audio files via a Whisper-compatible
+  API endpoint. Configurable via `audio.transcription_endpoint`, supports OGG, MP3,
+  WAV, WebM, MP4, M4A formats.
 - **xAI (Grok) Provider** -- xAI as OpenAI-compatible model provider with
  `provider: xai` config. Supports grok-3, grok-3-mini, grok-2, grok-2-mini,
  grok-3-fast. Uses `XAI_API_KEY` env var or config `api_key`.
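
The Native Audio Support entry above describes two mechanisms: a capability check with a per-model `supports_audio` override, and a token estimate that goes base64 -> bytes -> duration -> tokens at 32 tokens/sec. A minimal sketch of both follows. It is not the code in `capabilities.ts`: the names `canTakeAudio`, `routeAudio`, `base64ByteLength`, and `estimateAudioTokens` are hypothetical, and the bytes-to-duration step assumes a fixed average bitrate because the entry does not say how duration is derived.

```typescript
// Illustrative sketch only; all function names here are hypothetical.

// Provider split as listed in the changelog entry.
const AUDIO_CAPABLE_PROVIDERS = new Set(["gemini", "openai", "github"]);

// Rate stated in the changelog entry.
const AUDIO_TOKENS_PER_SECOND = 32;

// ASSUMPTION: average encoded bitrate used to turn bytes into seconds.
// 4000 bytes/sec is roughly 32 kbit/s; the real derivation is not documented here.
const ASSUMED_BYTES_PER_SECOND = 4000;

// An explicit supports_audio override wins; otherwise fall back to the provider default.
function canTakeAudio(provider: string, supportsAudioOverride?: boolean): boolean {
  return supportsAudioOverride ?? AUDIO_CAPABLE_PROVIDERS.has(provider);
}

// "native" means pass raw audio to the model; "transcribe" means use the Whisper fallback.
function routeAudio(provider: string, supportsAudioOverride?: boolean): "native" | "transcribe" {
  return canTakeAudio(provider, supportsAudioOverride) ? "native" : "transcribe";
}

// Decoded size of a base64 string, computed without decoding it.
function base64ByteLength(b64: string): number {
  const clean = b64.replace(/\s+/g, "");
  if (clean.length === 0) return 0;
  const padding = clean.endsWith("==") ? 2 : clean.endsWith("=") ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}

// base64 -> bytes -> duration -> tokens, rounded up to a whole token.
function estimateAudioTokens(b64: string, bytesPerSecond: number = ASSUMED_BYTES_PER_SECOND): number {
  const seconds = base64ByteLength(b64) / bytesPerSecond;
  return Math.ceil(seconds * AUDIO_TOKENS_PER_SECOND);
}
```

Under these assumptions, `routeAudio("anthropic")` selects the transcription fallback, while `routeAudio("ollama", true)` selects native passthrough because the override takes precedence over the provider default.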