swarm-master/docs/npu-voice-audio-pipeline.md
2026-06-05 15:52:43 -07:00


# NPU voice/audio local-file pipeline
This is the first-slice local-file voice/audio path for the NPU maximization program:
```text
local audio file or already-staged attachment
-> OpenVINO NPU Whisper (:18816)
-> OpenVINO NPU classifier (:18819)
-> explicit advisory gate
-> Atlas/Hermes only after separate approval
```
The implementation is `scripts/npu_voice_audio_pipeline.py`. It is a CLI wrapper only; it starts no listener and performs no outbound sends, Obsidian writes, memory writes, vector DB mutations, Kanban mutations, service restarts, platform API calls, or live Atlas/Hermes routing changes.
## Safety gates
The following stay closed unless explicitly approved later:
- Telegram/Discord fetching by bot token or attachment URL.
- Outbound messages or auto-sends.
- Obsidian/vault writes.
- Memory writes.
- Vector DB mutation or reindex.
- Automatic Kanban mutation.
- Service restarts or new persistent listeners.
- Private-directory root broadening.
- Live Atlas/Hermes routing authority changes.
HTTP success is not proof of NPU execution. For NPU claims, require real inference plus a positive delta in `/sys/class/accel/accel0/device/npu_busy_time_us`. For both the Whisper and classifier calls, the CLI reports the deltas returned in the service response alongside the observed sysfs deltas.
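The evidence rule above amounts to reading a cumulative counter before and after an inference call. A minimal sketch, assuming a single NPU exposed as `accel0` (the path named in this runbook) and a counter file holding one integer; the helper names `read_busy_us`, `with_npu_delta`, and `is_npu_proof` are hypothetical and not taken from `scripts/npu_voice_audio_pipeline.py`:

```python
from pathlib import Path
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Counter path from this runbook; assumes a single NPU exposed as accel0.
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = NPU_BUSY_PATH) -> int:
    """Read the cumulative NPU busy time, in microseconds."""
    return int(path.read_text().strip())


def with_npu_delta(call: Callable[[], T], path: Path = NPU_BUSY_PATH) -> Tuple[T, int]:
    """Run `call` and return (its result, busy-time delta in microseconds)."""
    before = read_busy_us(path)
    result = call()
    after = read_busy_us(path)
    return result, after - before


def is_npu_proof(delta_us: int) -> bool:
    # Only a strictly positive delta counts as evidence that the NPU did work.
    return delta_us > 0
```

Because the counter is device-wide, any other NPU client running at the same time also inflates the delta; a positive delta is necessary evidence, not exclusive attribution.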
## Example: synthetic local WAV smoke
```bash
cd /home/will/lab/swarm
python - <<'PY'
import math, struct, wave

path = '/tmp/npu-voice-smoke.wav'
sr = 16000  # 16 kHz mono, the sample rate Whisper models expect
with wave.open(path, 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 16-bit PCM
    w.setframerate(sr)
    frames = bytearray()
    # 0.6 s of a 440 Hz sine tone at moderate amplitude
    for i in range(int(sr * 0.6)):
        frames.extend(struct.pack('<h', int(12000 * math.sin(2 * math.pi * 440 * i / sr))))
    w.writeframes(frames)
print(path)
PY
```
Run the local-file wrapper:
```bash
/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py \
--audio /tmp/npu-voice-smoke.wav \
--title "synthetic smoke" \
--source manual_smoke \
--json
```
Compact output shape (excerpt; the full list is under Compact fields):
```json
{
"ok": true,
"source": "manual_smoke",
"transcript_chars": 3,
"action_worthy": false,
"atlas_gate": "suppressed_not_action_worthy",
"whisper_npu_delta_us": 85441,
"whisper_sysfs_delta_us": 85441,
"classifier_npu_delta_us": 85908,
"classifier_sysfs_delta_us": 85908,
"classifier_observed_sysfs_delta_us": 85908,
"external_sends": 0,
"writes": 0
}
```
A non-actionable smoke should stay `suppressed_not_action_worthy`. A transcript containing a reminder, task, follow-up, or explicit question, or one for which the classifier returns `tool_needed=true`, should become `advisory_only_not_sent`: it is flagged as advisory but still not sent.
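The gate rule above can be sketched as a small pure function. This is an illustration only: the keyword patterns, the question-mark test for "explicit question", and the names `Labels`, `is_action_worthy`, and `atlas_gate` are hypothetical, since the script's actual heuristics are not described in this runbook. Only the two gate strings come from the source.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword heuristics; the real script's rules may differ.
ACTION_PATTERNS = [
    re.compile(r"\bremind(er)?\b", re.IGNORECASE),
    re.compile(r"\b(tasks?|todo|to-do)\b", re.IGNORECASE),
    re.compile(r"\bfollow[- ]?up\b", re.IGNORECASE),
]


@dataclass
class Labels:
    """Subset of classifier labels used by the gate (hypothetical shape)."""
    tool_needed: bool = False


def is_action_worthy(transcript: str, labels: Labels) -> bool:
    if labels.tool_needed:
        return True
    # Treat any question mark in the transcript as an explicit question.
    if "?" in transcript:
        return True
    return any(p.search(transcript) for p in ACTION_PATTERNS)


def atlas_gate(transcript: str, labels: Labels) -> str:
    # Neither outcome sends anything; "advisory" only marks the item for
    # a separate, later approval step.
    if is_action_worthy(transcript, labels):
        return "advisory_only_not_sent"
    return "suppressed_not_action_worthy"
```

Keeping the gate a pure function of transcript and labels makes it easy to test offline without touching either NPU service.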
## Example: already-staged platform voice file
This example assumes another approved process has already placed the audio file locally. The wrapper does not fetch from Telegram/Discord and does not read bot tokens.
```bash
/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py \
--audio /tmp/staged-voice-message.ogg \
--source staged_telegram \
--title "staged local Telegram voice memo" \
--json
```
## Compact fields
The CLI always reports:
- `ok`
- `id`
- `source`
- `transcript_chars`
- `action_worthy`
- `atlas_gate`
- `next_gate`
- `whisper_npu_delta_us`
- `whisper_sysfs_delta_us`
- `classifier_npu_delta_us`
- `classifier_sysfs_delta_us`
- `classifier_observed_sysfs_delta_us`
- `labels.workflow_category`
- `labels.tool_needed`
- `labels.urgency`
- `labels.safety_confirmation_required`
- `external_sends`
- `writes`
Transcript text is omitted by default. Use `--include-transcript` or `--include-transcript-preview-chars N` only for explicit local debugging.
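When scripting around the wrapper, the field list above can be checked mechanically. A minimal sketch, assuming the dotted `labels.*` names denote keys nested under a `labels` object in the `--json` output (the runbook does not show that object); `REQUIRED_FIELDS`, `missing_fields`, `check_cli_output`, and `check_no_side_effects` are hypothetical helper names:

```python
import json
from typing import Any, Dict, List

# Field names copied from the "Compact fields" list; dotted names are
# assumed to be nested under a "labels" object.
REQUIRED_FIELDS = [
    "ok", "id", "source", "transcript_chars", "action_worthy", "atlas_gate",
    "next_gate", "whisper_npu_delta_us", "whisper_sysfs_delta_us",
    "classifier_npu_delta_us", "classifier_sysfs_delta_us",
    "classifier_observed_sysfs_delta_us", "labels.workflow_category",
    "labels.tool_needed", "labels.urgency",
    "labels.safety_confirmation_required", "external_sends", "writes",
]

_MISSING = object()  # sentinel, so that a null value still counts as present


def lookup(doc: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path through nested dicts, or return _MISSING."""
    node: Any = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def missing_fields(doc: Dict[str, Any]) -> List[str]:
    return [f for f in REQUIRED_FIELDS if lookup(doc, f) is _MISSING]


def check_cli_output(text: str) -> List[str]:
    """Parse the CLI's --json output and return any missing field names."""
    return missing_fields(json.loads(text))


def check_no_side_effects(doc: Dict[str, Any]) -> bool:
    # The wrapper is expected to report zero sends and zero writes.
    return doc.get("external_sends") == 0 and doc.get("writes") == 0
```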
## Input limits
- `--audio` must be an absolute local path.
- Symlinks, directories, missing files, empty files, unsupported extensions, and files over `--max-bytes` are refused.
- WAV duration is capped by `--max-audio-seconds`; other formats are only size-capped in this first slice.
- Classifier transcript payload is bounded by `--max-transcript-chars`.
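The path checks above can be sketched with the standard library alone. This is not the script's code: the default limits, the extension allow-list, and the names `validate_audio_path` and `InputRefused` are hypothetical placeholders for whatever `--max-bytes`, `--max-audio-seconds`, and the supported-extension set actually are.

```python
import stat
import wave
from pathlib import Path

# Hypothetical defaults and allow-list; the real CLI's values may differ.
DEFAULT_MAX_BYTES = 25 * 1024 * 1024
DEFAULT_MAX_AUDIO_SECONDS = 300.0
ALLOWED_EXTENSIONS = {".wav", ".ogg", ".mp3", ".m4a", ".flac", ".webm"}


class InputRefused(ValueError):
    """Raised when an --audio argument fails a local input check."""


def validate_audio_path(raw: str, max_bytes: int = DEFAULT_MAX_BYTES,
                        max_audio_seconds: float = DEFAULT_MAX_AUDIO_SECONDS) -> Path:
    path = Path(raw)
    if not path.is_absolute():
        raise InputRefused("path must be absolute")
    # lstat() does not follow a symlink in the final path component.
    try:
        st = path.lstat()
    except FileNotFoundError:
        raise InputRefused("file not found") from None
    if stat.S_ISLNK(st.st_mode):
        raise InputRefused("symlinks are refused")
    if not stat.S_ISREG(st.st_mode):
        raise InputRefused("not a regular file")
    if st.st_size == 0:
        raise InputRefused("file is empty")
    if st.st_size > max_bytes:
        raise InputRefused("file exceeds max bytes")
    suffix = path.suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise InputRefused(f"unsupported extension: {suffix!r}")
    if suffix == ".wav":
        # Only WAV duration is checked; other formats stay size-capped.
        try:
            with wave.open(str(path), "rb") as w:
                seconds = w.getnframes() / w.getframerate()
        except (wave.Error, EOFError, ZeroDivisionError) as exc:
            raise InputRefused(f"unreadable WAV: {exc}") from None
        if seconds > max_audio_seconds:
            raise InputRefused("WAV longer than max audio seconds")
    return path
```

Checking with `lstat()` before opening anything means a symlink or directory is refused without ever reading its target.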
## Health prerequisites
Read-only checks:
```bash
curl -fsS http://127.0.0.1:18816/health
curl -fsS http://127.0.0.1:18819/healthz
```
Do not restart services from this runbook. If either endpoint is unhealthy, stop and request an ops/remediation task.
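The same read-only probes can be run from Python, for example as a preflight step before invoking the wrapper. A minimal sketch using only `urllib.request`; the names `HEALTH_URLS`, `check_health`, and `all_healthy` are hypothetical. Like `curl -f`, it treats any HTTP error status as unhealthy, and it never restarts anything.

```python
import http.client
import urllib.request
from typing import Dict

# Endpoints copied from the runbook; only GET requests are issued.
HEALTH_URLS = {
    "whisper": "http://127.0.0.1:18816/health",
    "classifier": "http://127.0.0.1:18819/healthz",
}


def check_health(urls: Dict[str, str] = HEALTH_URLS,
                 timeout: float = 3.0) -> Dict[str, bool]:
    results: Dict[str, bool] = {}
    for name, url in urls.items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[name] = 200 <= resp.status < 300
        # URLError, HTTPError and timeouts are all OSError subclasses;
        # HTTPException covers malformed responses.
        except (OSError, http.client.HTTPException):
            results[name] = False
    return results


def all_healthy(results: Dict[str, bool]) -> bool:
    # An empty result set is not treated as healthy.
    return bool(results) and all(results.values())
```

If `all_healthy(check_health())` is false, the runbook's rule still applies: stop and request an ops/remediation task.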