Voice Transcription Debug Runbook
This runbook covers Telegram voice-message troubleshooting for audio.transcribe.
Fast Checks
- Confirm tool is enabled in config:
  audio:
    enabled: true
    provider:
      endpoint: http://localhost:18801/v1/audio/transcriptions
- Confirm recent tool events:
tail -n 400 ~/.local/share/flynn/audit.log | jq -c 'select(.event_type=="tool.start" or .event_type=="tool.success" or .event_type=="tool.error" or .event_type=="tool.args_rewritten")'
- If needed, confirm local endpoint behavior directly:
curl -sS -i -X POST http://localhost:18801/v1/audio/transcriptions \
-F file=@/tmp/sample.ogg \
-F model=whisper-1 \
-F response_format=json
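The audit-log check above can also be done without jq. The following is a minimal Python sketch, assuming the audit log holds one JSON object per line with a top-level event_type field (as the jq filter implies). The function name filter_tool_events is hypothetical and not part of Flynn.

```python
import json

# Event types selected by the jq filter in the audit-log check.
TOOL_EVENT_TYPES = {
    "tool.start",
    "tool.success",
    "tool.error",
    "tool.args_rewritten",
}


def filter_tool_events(lines):
    """Return parsed audit records whose event_type is a tool event.

    Assumes one JSON object per line; blank or unparseable lines are skipped.
    """
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line
        if isinstance(record, dict) and record.get("event_type") in TOOL_EVENT_TYPES:
            events.append(record)
    return events
```

To mirror the tail -n 400 check, pass it the last 400 lines read from ~/.local/share/flynn/audit.log.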
Interpreting Common Errors
- Either data or url must be provided
  - Model/tool call had empty args and no hydrated attachment data.
- Only http/https URLs are allowed, got file:
  - Model emitted file://... URL; Flynn should rewrite from latest session audio.
- Transcription endpoint error: FFmpeg conversion failed.
  - Endpoint could not decode payload as audio. Often caused by model-provided fake or mismatched data/mime_type.
- [No speech detected]
  - Request succeeded and endpoint returned empty transcript text.
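When scanning many tool.error events, the mapping above can be applied mechanically. This is an illustrative sketch only: the diagnose helper and the wording of each cause are hypothetical, and the match strings are copied from the messages listed in this section.

```python
# Substrings of the error messages listed above, paired with the likely cause.
KNOWN_ERRORS = [
    ("Either data or url must be provided",
     "empty tool args and no hydrated attachment data"),
    ("Only http/https URLs are allowed",
     "model emitted a file:// URL; expect a rewrite from session audio"),
    ("FFmpeg conversion failed",
     "endpoint could not decode the payload; check data and mime_type"),
    ("[No speech detected]",
     "request succeeded but the transcript was empty"),
]


def diagnose(message):
    """Return the likely cause for a known error message, or None."""
    for needle, cause in KNOWN_ERRORS:
        if needle in message:
            return cause
    return None
```

Substring matching is used so that prefixes such as "Transcription endpoint error:" do not prevent a match.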
Rewrite Metric
Flynn emits tool.args_rewritten whenever it replaces model-provided audio.transcribe args with trusted session audio bytes.
Fields:
- source: latest_turn, persisted, or history
- reason: latest_audio_preferred, voice_turn_fallback, invalid_model_args, missing_model_args
- original_* and final_mime_type for quick diagnosis
Example:
{
  "event_type": "tool.args_rewritten",
  "event": {
    "tool_name": "audio.transcribe",
    "source": "persisted",
    "reason": "voice_turn_fallback",
    "original_mime_type": "audio/ogg",
    "final_mime_type": "audio/ogg"
  }
}
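To see which rewrite paths fire most often, events of this shape can be tallied by source and reason. A minimal sketch, assuming the audit log is newline-delimited JSON with the structure shown in the example; count_rewrites is a hypothetical helper, not a Flynn API.

```python
import json
from collections import Counter


def count_rewrites(lines):
    """Count tool.args_rewritten events by (source, reason)."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        if not isinstance(record, dict):
            continue
        if record.get("event_type") != "tool.args_rewritten":
            continue
        event = record.get("event") or {}
        counts[(event.get("source"), event.get("reason"))] += 1
    return counts
```

A high count for invalid_model_args or missing_model_args would suggest the model is frequently emitting unusable arguments, while voice_turn_fallback with source persisted indicates the cached attachment is being used.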
Where Audio Is Stored Locally
Audio bytes are not written as standalone files by default. They are persisted in SQLite:
- DB path: ~/.local/share/flynn/sessions.db
- Table: session_config
- Key: lastAudioAttachment
- Value: JSON with data (base64) or url, plus mimeType
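The stored value can be decoded to check whether the cached payload looks like real audio. A minimal sketch, assuming session_config has session_id, key, and value columns (as the queries in this runbook imply) and that the value follows the layout above; describe_last_audio is a hypothetical helper.

```python
import base64
import json
import sqlite3


def describe_last_audio(conn, session_id):
    """Summarize the cached audio attachment for one session, or return None."""
    row = conn.execute(
        "SELECT value FROM session_config "
        "WHERE session_id = ? AND key = 'lastAudioAttachment'",
        (session_id,),
    ).fetchone()
    if row is None:
        return None
    attachment = json.loads(row[0])
    summary = {
        "mimeType": attachment.get("mimeType"),
        "url": attachment.get("url"),
    }
    if attachment.get("data"):
        # Decoded size helps spot empty or implausibly small payloads.
        summary["bytes"] = len(base64.b64decode(attachment["data"]))
    return summary
```

Open the connection with sqlite3.connect on the DB path listed above and pass a session id such as telegram:8367012007.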
Inspect current value:
sqlite3 ~/.local/share/flynn/sessions.db \
"SELECT session_id,key,length(value) FROM session_config WHERE key='lastAudioAttachment';"
Clear Cached Audio for One Session
Delete only one chat/session cache entry (example session id: telegram:8367012007):
sqlite3 ~/.local/share/flynn/sessions.db \
"DELETE FROM session_config WHERE session_id='telegram:8367012007' AND key='lastAudioAttachment';"
Verify:
sqlite3 ~/.local/share/flynn/sessions.db \
"SELECT session_id,key,length(value) FROM session_config WHERE key='lastAudioAttachment';"
If /reset is run in that chat, it also clears the session's lastAudioAttachment row.
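The same deletion can be scripted with a parameterized query, which avoids hand-quoting session ids. A minimal sketch, assuming the session_config columns used in the commands above; clear_last_audio is a hypothetical helper and is not how Flynn itself implements /reset.

```python
import sqlite3


def clear_last_audio(conn, session_id):
    """Delete one session's lastAudioAttachment row; return rows removed."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "DELETE FROM session_config "
            "WHERE session_id = ? AND key = 'lastAudioAttachment'",
            (session_id,),
        )
    return cursor.rowcount
```

A return value of 0 means there was no cached audio for that session, so nothing was changed.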
Data Lifecycle
- session.clear() (e.g. /reset) removes messages, tool executions, and session config for that session.
- Session TTL pruning removes stale sessions and associated config from SQLite.