NPU voice/audio local-file pipeline
This is the first-slice local-file voice/audio path for the NPU maximization program:
local audio file or already-staged attachment
-> OpenVINO NPU Whisper (:18816)
-> OpenVINO NPU classifier (:18819)
-> explicit advisory gate
-> Atlas/Hermes only after separate approval
The implementation is scripts/npu_voice_audio_pipeline.py. It is a CLI wrapper only; it starts no listener and performs no outbound sends, Obsidian writes, memory writes, vector DB mutations, Kanban mutations, service restarts, platform API calls, or live Atlas/Hermes routing changes.
Safety gates
Closed unless explicitly approved later:
- Telegram/Discord fetching by bot token or attachment URL.
- Outbound messages or auto-sends.
- Obsidian/vault writes.
- Memory writes.
- Vector DB mutation or reindex.
- Automatic Kanban mutation.
- Service restarts or new persistent listeners.
- Private-directory root broadening.
- Live Atlas/Hermes routing authority changes.
HTTP success is not NPU proof. For NPU claims, require real inference plus positive /sys/class/accel/accel0/device/npu_busy_time_us deltas. The CLI reports response deltas and observed sysfs deltas for Whisper and classifier calls.
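The proof rule above can be sketched as a read of the sysfs counter before and after a call. This is a minimal illustration, not the wrapper's actual code: the helper names (read_busy_us, call_with_busy_delta, is_npu_proof) are hypothetical, and only the counter path comes from this runbook. The counter is device-wide, so other NPU work running at the same time would also show up in the delta.

```python
from pathlib import Path
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Counter path quoted from this runbook.
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = NPU_BUSY_PATH) -> int:
    """Read the cumulative NPU busy-time counter, in microseconds."""
    return int(path.read_text().strip())


def call_with_busy_delta(call: Callable[[], T], path: Path = NPU_BUSY_PATH) -> Tuple[T, int]:
    """Run `call` and return its result plus the counter delta around it."""
    before = read_busy_us(path)
    result = call()
    after = read_busy_us(path)
    return result, after - before


def is_npu_proof(http_ok: bool, delta_us: int) -> bool:
    """HTTP success alone is not enough; the busy counter must have advanced."""
    return http_ok and delta_us > 0
```

A caller would wrap each Whisper or classifier request in call_with_busy_delta and treat a zero delta as "no NPU evidence", whatever the HTTP status was.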
Example: synthetic local WAV smoke
cd /home/will/lab/swarm
python - <<'PY'
import math, struct, wave
path = '/tmp/npu-voice-smoke.wav'
sr = 16000
with wave.open(path, 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(sr)
    frames = bytearray()
    for i in range(int(sr * 0.6)):
        frames.extend(struct.pack('<h', int(12000 * math.sin(2 * math.pi * 440 * i / sr))))
    w.writeframes(frames)
print(path)
PY
Run the local-file wrapper:
/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py \
  --audio /tmp/npu-voice-smoke.wav \
  --title "synthetic smoke" \
  --source manual_smoke \
  --json
Compact output shape:
{
  "ok": true,
  "source": "manual_smoke",
  "transcript_chars": 3,
  "action_worthy": false,
  "atlas_gate": "suppressed_not_action_worthy",
  "whisper_npu_delta_us": 85441,
  "whisper_sysfs_delta_us": 85441,
  "classifier_npu_delta_us": 85908,
  "classifier_sysfs_delta_us": 85908,
  "classifier_observed_sysfs_delta_us": 85908,
  "external_sends": 0,
  "writes": 0
}
A non-actionable smoke run should stay at suppressed_not_action_worthy. A transcript containing a reminder, task, follow-up, or explicit question, or one the classifier labels tool_needed=true, should move to advisory_only_not_sent. In both cases nothing is sent.
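The two gate values can be illustrated with a small decision function. This is a sketch under stated assumptions: the cue regex and the function names are hypothetical, and the real wrapper's detection rules are not documented here. Only the two gate strings and the tool_needed label come from this runbook.

```python
import re

# Hypothetical cue words; the real wrapper's detection rules may differ.
ACTION_CUES = re.compile(r"\b(remind|reminder|task|todo|follow[- ]?up)\b", re.IGNORECASE)


def looks_action_worthy(transcript: str) -> bool:
    """Rough heuristic: a cue word or a question mark marks the text as actionable."""
    return bool(ACTION_CUES.search(transcript)) or "?" in transcript


def atlas_gate(transcript: str, tool_needed: bool = False) -> str:
    """Return the advisory gate value; neither outcome sends anything."""
    if tool_needed or looks_action_worthy(transcript):
        return "advisory_only_not_sent"
    return "suppressed_not_action_worthy"
```

Both branches are advisory only. Routing to Atlas/Hermes stays behind the separate approval described at the top of this document.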
Example: already-staged platform voice file
This example assumes another approved process has already placed the audio file locally. The wrapper does not fetch from Telegram/Discord and does not read bot tokens.
/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py \
  --audio /tmp/staged-voice-message.ogg \
  --source staged_telegram \
  --title "staged local Telegram voice memo" \
  --json
Compact fields
The CLI always reports:
- ok
- id
- source
- transcript_chars
- action_worthy
- atlas_gate
- next_gate
- whisper_npu_delta_us
- whisper_sysfs_delta_us
- classifier_npu_delta_us
- classifier_sysfs_delta_us
- classifier_observed_sysfs_delta_us
- labels.workflow_category
- labels.tool_needed
- labels.urgency
- labels.safety_confirmation_required
- external_sends
- writes
Transcript text is omitted by default. Use --include-transcript or --include-transcript-preview-chars N only for explicit local debugging.
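The default-omit behavior could look like the sketch below. It assumes that --include-transcript-preview-chars N means "the first N characters". The output keys transcript and transcript_preview and the function name are hypothetical. Only transcript_chars is a documented field.

```python
from typing import Any, Dict, Optional


def transcript_fields(
    transcript: str,
    include_transcript: bool = False,
    preview_chars: Optional[int] = None,
) -> Dict[str, Any]:
    """Build transcript-related report fields; text is left out unless requested."""
    fields: Dict[str, Any] = {"transcript_chars": len(transcript)}
    if include_transcript:
        fields["transcript"] = transcript  # hypothetical key name
    elif preview_chars is not None and preview_chars > 0:
        # Assumed semantics: a preview is the leading N characters.
        fields["transcript_preview"] = transcript[:preview_chars]  # hypothetical key name
    return fields
```

With no flags set, only the character count leaves the process, which keeps spoken content out of logs by default.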
Input limits
- --audio must be an absolute local path.
- Symlinks, directories, missing files, empty files, unsupported extensions, and files over --max-bytes are refused.
- WAV duration is capped by --max-audio-seconds; other codecs remain size-capped in this first slice.
- Classifier transcript payload is bounded by --max-transcript-chars.
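The refusal rules above can be sketched as a single validation function. This is an illustration, not the wrapper's code: the function name, the refusal reason strings, and the allowed-extension set are assumptions (the set only reflects the .wav and .ogg files used in this document's examples).

```python
import os
import wave
from pathlib import Path
from typing import Optional

# Assumption: only the extensions seen in this runbook's examples.
ALLOWED_EXTENSIONS = {".wav", ".ogg"}


def refusal_reason(path_str: str, max_bytes: int, max_audio_seconds: float) -> Optional[str]:
    """Return None if the file is acceptable, otherwise a short refusal reason."""
    if not os.path.isabs(path_str):
        return "not_absolute"
    path = Path(path_str)
    if path.is_symlink():  # checked first so dangling symlinks are caught too
        return "symlink"
    if not path.exists():
        return "missing"
    if path.is_dir():
        return "directory"
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        return "unsupported_extension"
    size = path.stat().st_size
    if size == 0:
        return "empty"
    if size > max_bytes:
        return "too_large"
    if path.suffix.lower() == ".wav":
        # Only WAV gets a duration check; other codecs stay size-capped.
        try:
            with wave.open(str(path), "rb") as w:
                seconds = w.getnframes() / w.getframerate()
        except (wave.Error, EOFError):
            return "unreadable_wav"
        if seconds > max_audio_seconds:
            return "too_long"
    return None
```

Cheap checks on the path run before any file content is opened, so a refused input is never parsed.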
Health prerequisites
Read-only checks:
curl -fsS http://127.0.0.1:18816/health
curl -fsS http://127.0.0.1:18819/healthz
Do not restart services from this runbook. If either endpoint is unhealthy, stop and request an ops/remediation task.
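For scripted use, the two curl checks can be approximated in Python with the standard library. This is a sketch that roughly mirrors curl -fsS (any connection failure or HTTP error status counts as unhealthy). The function names are hypothetical, and only the two URLs come from this runbook. It performs GET requests only and changes nothing.

```python
import urllib.error
import urllib.request
from typing import Callable, List, Sequence

# Endpoints quoted from this runbook.
HEALTH_URLS = (
    "http://127.0.0.1:18816/health",
    "http://127.0.0.1:18819/healthz",
)


def endpoint_healthy(url: str, timeout: float = 3.0) -> bool:
    """Read-only GET; True only when the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def unhealthy_endpoints(
    urls: Sequence[str] = HEALTH_URLS,
    check: Callable[[str], bool] = endpoint_healthy,
) -> List[str]:
    """Return the endpoints that failed the check, in the order given."""
    return [u for u in urls if not check(u)]
```

A non-empty result from unhealthy_endpoints() is the signal to stop and request an ops/remediation task. The sketch deliberately contains no restart or retry logic.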