docs: document native audio support across README, CHANGELOG, config, and planning docs
- README: add audio.transcribe to tool list, update media pipeline description, add Native Audio Support and Audio Transcription config sections, add supports_audio per-tier override example
- SOUL.md: add audio.transcribe to available tools list
- CHANGELOG: add native audio support and audio.transcribe tool entries
- config/default.yaml: add commented audio config section, supports_audio hint
- INTEGRATIONS.md: expand audio section with native passthrough, capabilities, smart routing, AudioSource type, token estimation, audio.transcribe tool
- STRUCTURE.md: add capabilities.ts and audio-transcribe.ts to key file listings
- ARCHITECTURE.md: update data flow step 5 to describe smart audio routing
@@ -6,6 +6,15 @@ All notable changes to Flynn are documented in this file.
### Added
- **Native Audio Support** -- Smart routing for voice messages: audio-capable models
(Gemini, OpenAI, GitHub) receive raw audio directly via `AudioSource` content parts;
models without native audio input (Anthropic, Bedrock, Ollama, llama.cpp) fall back to
Whisper transcription. A `supportsAudioInput()` capability check picks the route, with a per-model `supports_audio`
config override. Audio token estimation (base64 -> bytes -> duration -> tokens at
32 tokens/sec). 38 new tests (18 capabilities + 15 media + 5 token estimation).
- **Agent Tool: audio.transcribe** -- Transcribe audio files via a Whisper-compatible
API endpoint configured via `audio.transcription_endpoint`. Supported formats: OGG, MP3,
WAV, WebM, MP4, and M4A.
- **xAI (Grok) Provider** -- xAI as an OpenAI-compatible model provider, selected with
`provider: xai` in config. Supports grok-3, grok-3-mini, grok-3-fast, grok-2, and
grok-2-mini. Reads the API key from the `XAI_API_KEY` environment variable or the config `api_key`.
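The token estimate in the Native Audio Support entry chains three conversions (base64 to bytes, bytes to duration, duration to tokens). A minimal sketch of that arithmetic follows. The 16 kbps bitrate used to turn bytes into seconds is an assumption for illustration, since the entry does not say how Flynn derives duration, and `base64DecodedLength` / `estimateAudioTokens` are hypothetical names rather than Flynn's API.

```typescript
// Sketch of the base64 -> bytes -> duration -> tokens estimate.
// ASSUMPTION: 16 kbps audio (typical of compressed voice notes); Flynn's real
// bytes-to-duration rule is not documented in this entry.
const ASSUMED_BITRATE_BPS = 16_000;
const BYTES_PER_SECOND = ASSUMED_BITRATE_BPS / 8; // 2,000 bytes per second
const AUDIO_TOKENS_PER_SECOND = 32; // rate stated in the changelog entry

// Decoded size of a base64 payload: every 4 characters encode 3 bytes,
// minus one byte per trailing "=" padding character.
function base64DecodedLength(b64: string): number {
  const clean = b64.replace(/\s/g, "");
  if (clean.length === 0) return 0;
  const padding = clean.endsWith("==") ? 2 : clean.endsWith("=") ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}

// Hypothetical helper: estimated prompt tokens for one audio attachment.
function estimateAudioTokens(b64: string): number {
  const bytes = base64DecodedLength(b64);
  const durationSec = bytes / BYTES_PER_SECOND;
  return Math.ceil(durationSec * AUDIO_TOKENS_PER_SECOND);
}

// Usage: a 4,000-byte clip is 5,336 base64 characters (ending in "==").
const clip = "A".repeat(5334) + "==";
console.log(base64DecodedLength(clip)); // 4000
console.log(estimateAudioTokens(clip)); // 64 (2 s at 32 tokens/sec)
```

Rounding up keeps the estimate conservative, so a context budget built on it errs toward reserving too many tokens rather than too few.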
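The smart routing described under Native Audio Support amounts to a capability check with a config override. The sketch below illustrates that rule under stated assumptions: the `ModelConfig` shape, the lowercase provider identifiers, and the `routeAudio` helper are hypothetical, `supportsAudioInput` here is a stand-in whose real signature in Flynn may differ, and treating providers the entry does not list (such as xAI) as transcription-only is an assumed safe default.

```typescript
// Hypothetical per-model config shape; `supports_audio` mirrors the config key
// named in the changelog, the other fields are assumptions.
interface ModelConfig {
  provider: string; // assumed lowercase identifiers, e.g. "gemini", "anthropic"
  model: string;
  supports_audio?: boolean;
}

// Providers the changelog lists as accepting raw audio.
const AUDIO_CAPABLE_PROVIDERS: ReadonlySet<string> = new Set(["gemini", "openai", "github"]);

// Stand-in for the capability check (signature assumed, not Flynn's).
// An explicit `supports_audio` always wins; otherwise the provider list decides,
// so Anthropic, Bedrock, Ollama, llama.cpp and any unlisted provider default
// to "no audio input".
function supportsAudioInput(cfg: ModelConfig): boolean {
  if (cfg.supports_audio !== undefined) return cfg.supports_audio;
  return AUDIO_CAPABLE_PROVIDERS.has(cfg.provider);
}

type AudioRoute = "native" | "transcribe";

// "native": send the audio as an AudioSource content part.
// "transcribe": run Whisper first and send the transcript as text.
function routeAudio(cfg: ModelConfig): AudioRoute {
  return supportsAudioInput(cfg) ? "native" : "transcribe";
}

console.log(routeAudio({ provider: "gemini", model: "example-model" })); // native
console.log(routeAudio({ provider: "ollama", model: "example-model", supports_audio: true })); // native
```

Letting the per-model flag override the provider default covers both directions: a local model that does accept audio can opt in, and a hosted model that does not can opt out.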
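For the `audio.transcribe` tool, "Whisper-compatible" typically means an OpenAI-style transcription route that takes a multipart form with `file` and `model` fields and returns JSON with a `text` field. The sketch below assumes exactly that: the `audio.transcription_endpoint` value is taken to be the full URL, `whisper-1` is an assumed model name, and `mimeTypeFor` / `transcribe` are hypothetical helpers, not Flynn's implementation. It needs a runtime with global `fetch`, `FormData`, and `Blob` (Node 18+, Deno, or Bun).

```typescript
// MIME types for the formats listed in the changelog entry.
// ASSUMPTION: files are identified by their extension.
const AUDIO_MIME_TYPES = new Map<string, string>([
  ["ogg", "audio/ogg"],
  ["mp3", "audio/mpeg"],
  ["wav", "audio/wav"],
  ["webm", "audio/webm"],
  ["mp4", "audio/mp4"],
  ["m4a", "audio/mp4"],
]);

// Hypothetical helper: returns the MIME type, or throws for unsupported formats.
function mimeTypeFor(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  const mime = AUDIO_MIME_TYPES.get(ext);
  if (mime === undefined) throw new Error(`unsupported audio format: ${fileName}`);
  return mime;
}

// Sketch of one call to a Whisper-compatible transcription endpoint.
// ASSUMPTIONS: `endpoint` is the full URL (for OpenAI this would be
// https://api.openai.com/v1/audio/transcriptions), the server accepts the
// model name "whisper-1", and it answers with JSON shaped like { "text": "..." }.
async function transcribe(
  endpoint: string,
  apiKey: string,
  fileName: string,
  audio: ArrayBuffer,
): Promise<string> {
  const form = new FormData();
  form.append("file", new Blob([audio], { type: mimeTypeFor(fileName) }), fileName);
  form.append("model", "whisper-1");
  const res = await fetch(endpoint, {
    method: "POST",
    // No Content-Type header: fetch sets multipart/form-data with its own boundary.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`transcription failed: HTTP ${res.status}`);
  const json = (await res.json()) as { text: string };
  return json.text;
}
```

Checking the extension before any network call means an unsupported file fails fast with a clear error instead of a server-side 4xx.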