docs: document native audio support across README, CHANGELOG, config, and planning docs

- README: add audio.transcribe to tool list, update media pipeline description,
  add Native Audio Support and Audio Transcription config sections, add
  supports_audio per-tier override example
- SOUL.md: add audio.transcribe to available tools list
- CHANGELOG: add native audio support and audio.transcribe tool entries
- config/default.yaml: add commented audio config section, supports_audio hint
- INTEGRATIONS.md: expand audio section with native passthrough, capabilities,
  smart routing, AudioSource type, token estimation, audio.transcribe tool
- STRUCTURE.md: add capabilities.ts and audio-transcribe.ts to key file listings
- ARCHITECTURE.md: update data flow step 5 to describe smart audio routing
William Valentin
2026-02-11 18:41:53 -08:00
parent 819ac26b3b
commit 5c531a760d
7 changed files with 87 additions and 8 deletions
@@ -6,6 +6,15 @@ All notable changes to Flynn are documented in this file.
### Added
- **Native Audio Support** -- Smart routing for voice messages: audio-capable models
(Gemini, OpenAI, GitHub) receive raw audio directly via `AudioSource` content parts;
non-audio models (Anthropic, Bedrock, Ollama, llama.cpp) get Whisper transcription
fallback. `supportsAudioInput()` capability check with per-model `supports_audio`
config override. Audio token estimation (base64 -> bytes -> duration -> tokens at
32 tokens/sec). 38 new tests (18 capabilities + 15 media + 5 token estimation).
- **Agent Tool: audio.transcribe** -- Transcribe audio files via a Whisper-compatible
API endpoint. Configurable via `audio.transcription_endpoint`, supports OGG, MP3,
WAV, WebM, MP4, M4A formats.
- **xAI (Grok) Provider** -- xAI as OpenAI-compatible model provider with
`provider: xai` config. Supports grok-3, grok-3-mini, grok-2, grok-2-mini,
grok-3-fast. Uses `XAI_API_KEY` env var or config `api_key`.
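
The Native Audio Support entry above describes a capability check with a config override, followed by a base64 -> bytes -> duration -> tokens estimate. A minimal sketch of how those pieces might fit together is shown below. Only `supportsAudioInput`, `supports_audio`, `AudioSource`, the provider groupings, and the 32 tokens/sec rate come from the changelog; the type shapes, the `routeAudio` and `estimateAudioTokens` helpers, and the assumed bitrate are hypothetical and may differ from Flynn's actual code.

```typescript
// Illustrative sketch only: type shapes and helper names are assumptions,
// not Flynn's real implementation.

type Provider =
  | "gemini" | "openai" | "github"                    // audio-capable per the changelog
  | "anthropic" | "bedrock" | "ollama" | "llama.cpp"; // transcription fallback

interface ModelConfig {
  provider: Provider;
  model: string;
  supports_audio?: boolean; // explicit config override, wins when set
}

// Assumed shape of an audio content part.
interface AudioSource {
  type: "audio";
  mediaType: string; // e.g. "audio/ogg"
  data: string;      // base64-encoded audio bytes
}

const AUDIO_CAPABLE: ReadonlySet<Provider> = new Set<Provider>(["gemini", "openai", "github"]);

// The config override takes precedence; otherwise fall back to the provider default.
function supportsAudioInput(cfg: ModelConfig): boolean {
  return cfg.supports_audio ?? AUDIO_CAPABLE.has(cfg.provider);
}

const TOKENS_PER_SECOND = 32;          // rate stated in the changelog
const ASSUMED_BYTES_PER_SECOND = 4000; // ASSUMPTION: ~32 kbit/s voice audio

// Decoded size of a base64 string without actually decoding it.
function base64ByteLength(b64: string): number {
  const clean = b64.replace(/\s/g, "");
  const padding = clean.endsWith("==") ? 2 : clean.endsWith("=") ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}

// base64 -> bytes -> duration -> tokens
function estimateAudioTokens(b64: string, bytesPerSecond: number = ASSUMED_BYTES_PER_SECOND): number {
  const seconds = base64ByteLength(b64) / bytesPerSecond;
  return Math.ceil(seconds * TOKENS_PER_SECOND);
}

type Route =
  | { kind: "native"; part: AudioSource; estimatedTokens: number }
  | { kind: "transcribe" }; // caller would hand the audio to Whisper instead

function routeAudio(cfg: ModelConfig, audio: AudioSource): Route {
  if (supportsAudioInput(cfg)) {
    return { kind: "native", part: audio, estimatedTokens: estimateAudioTokens(audio.data) };
  }
  return { kind: "transcribe" };
}
```

With these assumptions, a model on a non-audio provider can still opt in to native passthrough by setting `supports_audio: true`, and an audio-capable provider can be forced onto the transcription path with `supports_audio: false`. The duration step depends entirely on the assumed bitrate, so a real implementation would likely derive duration from the container or codec rather than a fixed constant.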