docs: update audio config docs and add voice message failure fix to changelog

- README.md: Update audio config format to match schema (enabled + provider.* fields instead of old transcription_endpoint fields), add whisper.cpp server Docker example
- CHANGELOG.md: Add '### Fixed' section with voice message failure handling details
- config/default.yaml: Update audio section with new schema format and Docker setup example
William Valentin
2026-02-11 19:47:52 -08:00
parent 2e235213d9
commit 28c78d469d
3 changed files with 49 additions and 16 deletions
+26 -9
@@ -170,20 +170,37 @@ Configure a Whisper-compatible endpoint for models that don't support native aud
```yaml
audio:
-  transcription_endpoint: "http://localhost:8080/v1/audio/transcriptions"
-  transcription_api_key: "${WHISPER_API_KEY}" # Optional Bearer token
-  transcription_model: "whisper-1" # Model name (default: whisper-1)
-  transcription_provider: "openai" # Provider format: openai (default)
+  enabled: true
+  provider:
+    type: custom # openai, groq, ollama, llamacpp, custom
+    endpoint: "http://localhost:18801/v1/audio/transcriptions"
+    api_key: "${WHISPER_API_KEY}" # Optional Bearer token
+    model: "whisper-1" # Model name (default: whisper-1)
```
| Field | Required | Description |
|-------|----------|-------------|
-| `transcription_endpoint` | yes | Whisper-compatible API endpoint |
-| `transcription_api_key` | no | Bearer token for authentication |
-| `transcription_model` | no | Model name sent in the request (default: `whisper-1`) |
-| `transcription_provider` | no | API format: `openai` (default) |
+| `enabled` | no | Enable audio transcription (default: `false`) |
+| `provider.type` | yes | Provider type: `openai`, `groq`, `ollama`, `llamacpp`, or `custom` |
+| `provider.endpoint` | yes | Whisper-compatible API endpoint |
+| `provider.api_key` | no | Bearer token for authentication |
+| `provider.model` | no | Model name sent in request (default: `whisper-1`) |
-Without an `audio` config, voice messages from non-audio-capable models are silently skipped.
+Without an `audio` config, voice messages from non-audio-capable models will display an error message to the user. For local transcription, you can run a whisper.cpp server:
+```bash
+# Start whisper.cpp server with OpenAI-compatible endpoint
+docker run -d \
+  --name whisper-server \
+  -p 18801:8080 \
+  ghcr.io/ggml-org/whisper.cpp:main \
+  --model /app/models/ggml-base.en.bin \
+  --host 0.0.0.0 \
+  --port 8080 \
+  --convert \
+  --language en \
+  --inference-path /v1/audio/transcriptions
+```
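
Once the container is running, a quick smoke test can confirm that the endpoint named in the `audio` config answers before the bot relies on it. The sketch below is not part of this commit. It assumes a hypothetical local file `sample.wav`, and the `transcribe` helper is a name invented here. It also assumes the server accepts a multipart upload with `file` and `response_format` fields, as the whisper.cpp server example and OpenAI-style transcription APIs do.

```bash
#!/usr/bin/env bash
# Endpoint taken from the audio config above: host port 18801 maps to the
# container's port 8080, and the path matches --inference-path.
WHISPER_URL="http://localhost:18801/v1/audio/transcriptions"

# Hypothetical helper: POST one audio file as multipart form data and
# print whatever the server returns. Returns 1 if the file is missing.
transcribe() {
  local audio_file="$1"
  if [ ! -f "$audio_file" ]; then
    echo "no such file: $audio_file" >&2
    return 1
  fi
  # --fail makes curl exit non-zero on HTTP errors (4xx/5xx).
  curl --silent --show-error --fail \
    -F "file=@${audio_file}" \
    -F "response_format=json" \
    "$WHISPER_URL"
}

# Usage (requires the server from the Docker example to be running):
#   transcribe sample.wav
```

If `provider.api_key` is set, the same request would presumably also need an `Authorization: Bearer ...` header. That is an assumption based on the "Bearer token" wording in the field table, not something this diff shows.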
## Telegram Commands