docs: document native audio support across README, CHANGELOG, config, and planning docs

- README: add audio.transcribe to tool list, update media pipeline description, add Native Audio Support and Audio Transcription config sections, add supports_audio per-tier override example
- SOUL.md: add audio.transcribe to available tools list
- CHANGELOG: add native audio support and audio.transcribe tool entries
- config/default.yaml: add commented audio config section, supports_audio hint
- INTEGRATIONS.md: expand audio section with native passthrough, capabilities, smart routing, AudioSource type, token estimation, audio.transcribe tool
- STRUCTURE.md: add capabilities.ts and audio-transcribe.ts to key file listings
- ARCHITECTURE.md: update data flow step 5 to describe smart audio routing
2026-02-11 18:41:53 -08:00
parent 819ac26b3b
commit 5c531a760d
7 changed files with 87 additions and 8 deletions
@@ -39,6 +39,7 @@ models:
  default:
    provider: anthropic
    model: claude-sonnet-4-20250514
+    # supports_audio: false            # Override native audio detection per tier
  local:
    provider: ollama
    model: glm-4.7-flash
@@ -117,3 +118,14 @@ hooks:
 #       peer: "123456789"
 #     failure_threshold: 2
 #     disk_threshold_mb: 100
+
+# ── Audio ────────────────────────────────────────────────────────────
+# Configure a Whisper-compatible endpoint for audio transcription.
+# Models that support native audio input (Gemini, OpenAI, GitHub) will
+# receive raw audio directly; others fall back to this endpoint.
+
+# audio:
+#   transcription_endpoint: "http://localhost:8080/v1/audio/transcriptions"
+#   transcription_api_key: "${WHISPER_API_KEY}"
+#   transcription_model: "whisper-1"
+#   transcription_provider: "openai"
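
The routing behavior described in the config comments above (native passthrough when the model supports audio, otherwise fall back to the Whisper-compatible endpoint) can be sketched roughly as follows. This is a minimal illustration, not the code in capabilities.ts: the `routeAudio` function, the `AudioRoute` type, the `NATIVE_AUDIO_PROVIDERS` list, the lowercase provider identifiers, and the `"whisper-1"` default are all assumptions made for the example. Only the config keys (`supports_audio`, `transcription_endpoint`, and so on) come from the diff.

```typescript
// Hypothetical shapes mirroring the keys in config/default.yaml.
interface TierConfig {
  provider: string;
  model: string;
  supports_audio?: boolean; // per-tier override of native audio detection
}

interface AudioConfig {
  transcription_endpoint?: string;
  transcription_api_key?: string;
  transcription_model?: string;
  transcription_provider?: string;
}

// Hypothetical result type: where an incoming audio attachment should go.
type AudioRoute =
  | { kind: "native" }
  | { kind: "transcribe"; endpoint: string; model: string }
  | { kind: "unsupported" };

// Assumption: detection is provider-level and uses these identifiers.
// The config comment names Gemini, OpenAI, and GitHub as native-audio capable.
const NATIVE_AUDIO_PROVIDERS: readonly string[] = ["gemini", "openai", "github"];

function routeAudio(tier: TierConfig, audio?: AudioConfig): AudioRoute {
  // An explicit supports_audio value wins over the detected capability.
  const detected = NATIVE_AUDIO_PROVIDERS.indexOf(tier.provider) !== -1;
  const native = tier.supports_audio ?? detected;
  if (native) {
    return { kind: "native" };
  }
  // No native support: fall back to the transcription endpoint if configured.
  if (audio?.transcription_endpoint) {
    return {
      kind: "transcribe",
      endpoint: audio.transcription_endpoint,
      model: audio.transcription_model ?? "whisper-1",
    };
  }
  return { kind: "unsupported" };
}
```

Under these assumptions, the `default` tier above (provider `anthropic`) would route to the transcription endpoint once the `audio:` block is uncommented, and setting `supports_audio: false` on a tier would force the fallback even for a provider that is detected as audio-capable.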