docs: document native audio support across README, CHANGELOG, config, and planning docs
- README: add audio.transcribe to tool list, update media pipeline description, add Native Audio Support and Audio Transcription config sections, add supports_audio per-tier override example
- SOUL.md: add audio.transcribe to available tools list
- CHANGELOG: add native audio support and audio.transcribe tool entries
- config/default.yaml: add commented audio config section, supports_audio hint
- INTEGRATIONS.md: expand audio section with native passthrough, capabilities, smart routing, AudioSource type, token estimation, audio.transcribe tool
- STRUCTURE.md: add capabilities.ts and audio-transcribe.ts to key file listings
- ARCHITECTURE.md: update data flow step 5 to describe smart audio routing
@@ -234,6 +234,22 @@ All adapters implement `ChannelAdapter` interface (`src/channels/types.ts`): `co
- Supported formats: OGG, MP3, WAV, WebM, MP4, M4A
- Integration: Automatically transcribes audio attachments from channels before model processing

**Native Audio Passthrough:**
- Implementation: `src/models/capabilities.ts`, `src/daemon/routing.ts`
- Capability check: `supportsAudioInput(provider, model, override?)` determines whether a model can accept raw audio input
- Audio-capable providers: Gemini (`inlineData`), OpenAI (`input_audio`), GitHub (`input_audio`)
- Providers without audio input: Anthropic, Bedrock, Ollama, llama.cpp (audio falls back to Whisper transcription)
- Config override: `supports_audio: true/false` on a model tier overrides auto-detection
- Smart routing: `createMessageRouter()` checks the capability, then passes the raw `AudioSource` to capable models or transcribes it via Whisper for all others
- Audio content type: `AudioSource` (`{ type: 'audio', data: string, mimeType: string }`) in `src/models/types.ts`
- Token estimation: `estimateAudioTokens()` in `src/context/tokens.ts` (base64 length -> bytes -> duration at 16kbps -> tokens at 32/sec)
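The capability check can be pictured as a small pure function. This is a sketch under assumptions: the `ProviderId` union and the lowercase provider IDs are hypothetical, and the sketch decides by provider only, whereas the real `supportsAudioInput()` may also inspect the model name.

```typescript
// Hypothetical provider IDs; the real project may name these differently.
type ProviderId =
  | "gemini"
  | "openai"
  | "github"
  | "anthropic"
  | "bedrock"
  | "ollama"
  | "llamacpp";

// Providers whose APIs accept raw audio parts
// (Gemini `inlineData`, OpenAI and GitHub `input_audio`).
const AUDIO_CAPABLE: ReadonlySet<ProviderId> = new Set<ProviderId>([
  "gemini",
  "openai",
  "github",
]);

// Sketch of the capability check: an explicit `supports_audio` value from the
// model tier config wins; otherwise fall back to provider-level auto-detection.
// `_model` is accepted for signature parity but ignored in this sketch.
function supportsAudioInputSketch(
  provider: ProviderId,
  _model: string,
  override?: boolean,
): boolean {
  if (override !== undefined) return override;
  return AUDIO_CAPABLE.has(provider);
}
```

In this sketch, `supports_audio: true` on an Anthropic tier makes the check return `true`, and `supports_audio: false` on an OpenAI tier forces the Whisper fallback.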
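For a single audio attachment, the smart-routing decision reduces to one branch. The sketch below reuses the documented `AudioSource` shape; `TextPart`, `Transcriber`, and `routeAudioPart` are hypothetical names, and the injected transcriber stands in for whatever Whisper client `createMessageRouter()` actually calls.

```typescript
// Shape documented for `AudioSource` in src/models/types.ts.
interface AudioSource {
  type: "audio";
  data: string; // base64-encoded audio bytes
  mimeType: string;
}

// Hypothetical text part used when audio is replaced by its transcript.
interface TextPart {
  type: "text";
  text: string;
}

type ContentPart = AudioSource | TextPart;

// Hypothetical stand-in for the Whisper-compatible transcription client.
type Transcriber = (audio: AudioSource) => Promise<string>;

// Pass raw audio through to capable models; otherwise substitute a transcript
// so that text-only providers still see the spoken content.
async function routeAudioPart(
  audio: AudioSource,
  modelSupportsAudio: boolean,
  transcribe: Transcriber,
): Promise<ContentPart> {
  if (modelSupportsAudio) return audio;
  const text = await transcribe(audio);
  return { type: "text", text };
}
```

Keeping the transcriber injectable lets the branch be exercised without a live Whisper endpoint.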
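The token estimate is plain arithmetic. At 16 kbps, audio is 2,000 bytes per second, so a one-minute clip is about 120,000 bytes (160,000 base64 characters) and about 60 × 32 = 1,920 tokens. The sketch below follows those steps; `approxAudioTokens` is a hypothetical name, and rounding up and stripping base64 padding are assumptions rather than confirmed details of `estimateAudioTokens()`.

```typescript
// Assumed constants matching the description: 16 kbps audio, 32 tokens per second.
const AUDIO_BITRATE_BITS_PER_SEC = 16_000;
const AUDIO_TOKENS_PER_SEC = 32;

// base64 length -> decoded bytes -> duration in seconds -> tokens.
function approxAudioTokens(base64: string): number {
  // Trailing '=' padding carries no payload, so drop it before converting.
  const chars = base64.replace(/=+$/, "").length;
  const bytes = Math.floor((chars * 3) / 4); // every 4 base64 chars encode 3 bytes
  const seconds = bytes / (AUDIO_BITRATE_BITS_PER_SEC / 8); // 2,000 bytes per second
  return Math.ceil(seconds * AUDIO_TOKENS_PER_SEC);
}
```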
**Agent Tool (`audio.transcribe`):**
- Implementation: `src/tools/builtin/audio-transcribe.ts`
- Transcribes audio files on demand via the configured Whisper-compatible endpoint
- Input: a file path, or base64-encoded data with its MIME type
- Output: transcribed text
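A tool like this mostly resolves its input to bytes plus a MIME type and uploads them. The sketch below assumes an OpenAI-style transcription endpoint (multipart `file` and `model` fields, JSON response with a `text` field) and the model name `whisper-1`; `TranscribeInput`, `FetchLike`, `resolveAudio`, `transcribeAudio`, and the extension/MIME table are all hypothetical, not the contents of `audio-transcribe.ts`. It needs Node 18+ for the global `FormData` and `Blob`.

```typescript
import { readFileSync } from "node:fs";
import { basename, extname } from "node:path";

// Hypothetical tool input: a file path, or inline base64 data with its MIME type.
type TranscribeInput =
  | { path: string; mimeType?: string }
  | { data: string; mimeType: string };

// Minimal fetch-compatible signature so the HTTP client can be injected.
type FetchLike = (
  url: string,
  init: { method: string; body: FormData },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Assumed extension <-> MIME table for the formats listed in the Whisper section.
const FORMATS: ReadonlyArray<readonly [string, string]> = [
  ["ogg", "audio/ogg"],
  ["mp3", "audio/mpeg"],
  ["wav", "audio/wav"],
  ["webm", "audio/webm"],
  ["m4a", "audio/mp4"],
  ["mp4", "audio/mp4"],
];

// Load the audio bytes and pick a MIME type and filename for the upload.
function resolveAudio(input: TranscribeInput): {
  bytes: Uint8Array;
  mimeType: string;
  filename: string;
} {
  if ("path" in input) {
    const ext = extname(input.path).slice(1).toLowerCase();
    const mimeType =
      input.mimeType ??
      FORMATS.find(([e]) => e === ext)?.[1] ??
      "application/octet-stream";
    return {
      bytes: new Uint8Array(readFileSync(input.path)),
      mimeType,
      filename: basename(input.path),
    };
  }
  // Inline data: derive a file extension so the endpoint can sniff the format.
  const ext = FORMATS.find(([, m]) => m === input.mimeType)?.[0] ?? "bin";
  return {
    bytes: new Uint8Array(Buffer.from(input.data, "base64")),
    mimeType: input.mimeType,
    filename: `audio.${ext}`,
  };
}

// POST the audio as multipart form data and return the transcript text.
async function transcribeAudio(
  input: TranscribeInput,
  endpoint: string,
  fetchImpl: FetchLike,
  model = "whisper-1", // assumed model name; self-hosted servers may differ
): Promise<string> {
  const { bytes, mimeType, filename } = resolveAudio(input);
  const form = new FormData();
  form.append("file", new Blob([bytes], { type: mimeType }), filename);
  form.append("model", model);
  const res = await fetchImpl(endpoint, { method: "POST", body: form });
  if (!res.ok) throw new Error(`transcription failed: HTTP ${res.status}`);
  const body = (await res.json()) as { text?: unknown };
  if (typeof body.text !== "string") throw new Error("response has no text field");
  return body.text;
}
```

Injecting `fetchImpl` keeps the sketch testable without a network; in Node 18+ the global `fetch` should satisfy `FetchLike`, so a real call could pass it directly.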
## MCP (Model Context Protocol)

**MCP Client:**