feat(npu): add voice audio advisory pipeline

2026-06-05 15:52:43 -07:00
parent 6906c2079b
commit d2bad88596
3 changed files with 644 additions and 0 deletions
@@ -0,0 +1,135 @@
+# NPU voice/audio local-file pipeline
+
+This is the first-slice local-file voice/audio path for the NPU maximization program:
+
+```text
+local audio file or already-staged attachment
+  -> OpenVINO NPU Whisper (:18816)
+  -> OpenVINO NPU classifier (:18819)
+  -> explicit advisory gate
+  -> Atlas/Hermes only after separate approval
+```
+
+The implementation is `scripts/npu_voice_audio_pipeline.py`. It is a CLI wrapper only; it starts no listener and performs no outbound sends, Obsidian writes, memory writes, vector DB mutations, Kanban mutations, service restarts, platform API calls, or live Atlas/Hermes routing changes.
+
+## Safety gates
+
+Closed unless explicitly approved later:
+
+- Telegram/Discord fetching by bot token or attachment URL.
+- Outbound messages or auto-sends.
+- Obsidian/vault writes.
+- Memory writes.
+- Vector DB mutation or reindex.
+- Automatic Kanban mutation.
+- Service restarts or new persistent listeners.
+- Private-directory root broadening.
+- Live Atlas/Hermes routing authority changes.
+
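+An HTTP success response does not prove that a request ran on the NPU. To claim NPU execution, require a real inference call plus a positive delta in `/sys/class/accel/accel0/device/npu_busy_time_us` across that call. The CLI reports both the deltas returned in the service responses and the sysfs deltas it observes itself, for the Whisper and classifier calls.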
+
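The busy-time check described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `scripts/npu_voice_audio_pipeline.py`: the helper names `read_busy_time_us`, `with_busy_delta`, and `is_npu_proof` are hypothetical, and the sketch assumes the sysfs file holds a single cumulative integer in microseconds.

```python
from pathlib import Path
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Counter path quoted in the runbook text above.
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_time_us(path: Path = NPU_BUSY_PATH) -> int:
    """Read the counter (assumed: one cumulative integer, in microseconds)."""
    return int(path.read_text().strip())


def with_busy_delta(call: Callable[[], T],
                    read: Callable[[], int] = read_busy_time_us) -> Tuple[T, int]:
    """Run `call` and return (result, counter delta observed around it)."""
    before = read()
    result = call()
    after = read()
    return result, after - before


def is_npu_proof(http_ok: bool, delta_us: int) -> bool:
    """HTTP success alone is not enough; the counter must also have advanced."""
    return http_ok and delta_us > 0
```

Passing the reader as a parameter keeps the sketch usable on machines without an NPU. If the counter is device-wide, other NPU workloads running at the same time could inflate the delta, so a positive value is best read as necessary evidence, not as an exact cost for one call.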
+## Example: synthetic local WAV smoke
+
+```bash
+cd /home/will/lab/swarm
+python - <<'PY'
+# Write a 0.6 s, 440 Hz sine tone as 16 kHz mono 16-bit PCM.
+import math, struct, wave
+path = '/tmp/npu-voice-smoke.wav'
+sr = 16000  # sample rate in Hz
+with wave.open(path, 'wb') as w:
+    w.setnchannels(1)   # mono
+    w.setsampwidth(2)   # 16-bit samples
+    w.setframerate(sr)
+    frames = bytearray()
+    for i in range(int(sr * 0.6)):
+        # Amplitude 12000 stays well inside the signed 16-bit range.
+        frames.extend(struct.pack('<h', int(12000 * math.sin(2 * math.pi * 440 * i / sr))))
+    w.writeframes(frames)
+print(path)
+PY
+```
+
+Run the local-file wrapper:
+
+```bash
+/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py \
+  --audio /tmp/npu-voice-smoke.wav \
+  --title "synthetic smoke" \
+  --source manual_smoke \
+  --json
+```
+
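+Compact output shape (abridged; `id`, `next_gate`, and the `labels.*` fields listed under "Compact fields" are not shown):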
+
+```json
+{
+  "ok": true,
+  "source": "manual_smoke",
+  "transcript_chars": 3,
+  "action_worthy": false,
+  "atlas_gate": "suppressed_not_action_worthy",
+  "whisper_npu_delta_us": 85441,
+  "whisper_sysfs_delta_us": 85441,
+  "classifier_npu_delta_us": 85908,
+  "classifier_sysfs_delta_us": 85908,
+  "classifier_observed_sysfs_delta_us": 85908,
+  "external_sends": 0,
+  "writes": 0
+}
+```
+
+A non-actionable smoke test should stay `suppressed_not_action_worthy`. A transcript that contains a reminder, task, follow-up, or explicit question, or that the classifier marks `tool_needed=true`, should be gated as `advisory_only_not_sent`; it is never sent.
+
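The two gate outcomes described above can be illustrated with a small decision function. This is a sketch under stated assumptions: only the two gate strings and the `tool_needed` label come from this runbook. The function names and the cue list in `looks_action_worthy` are hypothetical, and the real script may decide action-worthiness differently.

```python
# Hypothetical cue words; the real pipeline's criteria are not shown in this runbook.
ACTION_CUES = ("remind", "todo", "task", "follow up", "follow-up")


def looks_action_worthy(transcript: str) -> bool:
    """Rough stand-in: a reminder/task/follow-up cue or an explicit question."""
    text = transcript.lower()
    return "?" in text or any(cue in text for cue in ACTION_CUES)


def atlas_gate(action_worthy: bool, tool_needed: bool = False) -> str:
    """Map the two signals to a gate value; neither outcome sends anything."""
    if action_worthy or tool_needed:
        return "advisory_only_not_sent"
    return "suppressed_not_action_worthy"
```

For example, `atlas_gate(looks_action_worthy("Remind me to call Sam tomorrow"))` yields the advisory gate, while a three-character transcript such as the synthetic tone's output stays suppressed unless the classifier sets `tool_needed`.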
+## Example: already-staged platform voice file
+
+This example assumes another approved process has already placed the audio file locally. The wrapper does not fetch from Telegram/Discord and does not read bot tokens.
+
+```bash
+/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py \
+  --audio /tmp/staged-voice-message.ogg \
+  --source staged_telegram \
+  --title "staged local Telegram voice memo" \
+  --json
+```
+
+## Compact fields
+
+The CLI always reports:
+
+- `ok`
+- `id`
+- `source`
+- `transcript_chars`
+- `action_worthy`
+- `atlas_gate`
+- `next_gate`
+- `whisper_npu_delta_us`
+- `whisper_sysfs_delta_us`
+- `classifier_npu_delta_us`
+- `classifier_sysfs_delta_us`
+- `classifier_observed_sysfs_delta_us`
+- `labels.workflow_category`
+- `labels.tool_needed`
+- `labels.urgency`
+- `labels.safety_confirmation_required`
+- `external_sends`
+- `writes`
+
+Transcript text is omitted by default. Use `--include-transcript` or `--include-transcript-preview-chars N` only for explicit local debugging.
+
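A caller that consumes the `--json` output might verify the safety invariants before trusting a run. The following is a minimal sketch, assuming the report is a flat JSON object shaped like the compact example in this runbook. `report_problems` is a hypothetical helper, not part of the CLI, and treating every `*_delta_us` field as a required positive integer is an assumption drawn from the NPU-proof rule.

```python
from typing import List


def report_problems(report: dict) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if report.get("ok") is not True:
        problems.append("ok is not true")
    # This slice must never send or write anything.
    for key in ("external_sends", "writes"):
        if report.get(key) != 0:
            problems.append(f"{key} must be 0")
    # Assumption: every *_delta_us field should show a positive delta.
    deltas = {k: v for k, v in report.items() if k.endswith("_delta_us")}
    if not deltas:
        problems.append("no *_delta_us fields present")
    for key, value in sorted(deltas.items()):
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} is not a positive delta")
    return problems
```

Feeding it `json.loads(...)` of the CLI output and failing the smoke run when the list is non-empty keeps the "HTTP success is not NPU proof" rule mechanical.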
+## Input limits
+
+- `--audio` must be an absolute local path.
+- Symlinks, directories, missing files, empty files, unsupported extensions, and files over `--max-bytes` are refused.
+- WAV duration is capped by `--max-audio-seconds`; in this first slice, other formats are limited only by the `--max-bytes` size cap.
+- Classifier transcript payload is bounded by `--max-transcript-chars`.
+
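The refusal rules listed above could be checked roughly as follows. This is a sketch, not the script's actual validation: `refusal_reason`, the reason strings, and the `ALLOWED_EXTENSIONS` set are hypothetical (`.wav` and `.ogg` are simply the extensions used in this runbook's examples), and the limits are passed in explicitly because the CLI defaults are not stated here.

```python
import wave
from pathlib import Path
from typing import Optional

# Assumption: the real list of supported extensions is not given in this runbook.
ALLOWED_EXTENSIONS = {".wav", ".ogg"}


def refusal_reason(audio: str, max_bytes: int,
                   max_audio_seconds: float) -> Optional[str]:
    """Return why the file is refused, or None if it passes every check."""
    path = Path(audio)
    if not path.is_absolute():
        return "not_absolute"
    if path.is_symlink():  # checks the final path component only
        return "symlink"
    if path.is_dir():
        return "directory"
    if not path.is_file():
        return "missing"
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        return "unsupported_extension"
    size = path.stat().st_size
    if size == 0:
        return "empty"
    if size > max_bytes:
        return "too_large"
    if path.suffix.lower() == ".wav":
        # Only WAV gets a duration check; other formats are size-capped only.
        try:
            with wave.open(str(path), "rb") as w:
                rate = w.getframerate()
                frames = w.getnframes()
        except (wave.Error, EOFError):
            return "unreadable_wav"
        if rate <= 0:
            return "unreadable_wav"
        if frames / rate > max_audio_seconds:
            return "too_long"
    return None
```

Checking for a symlink before anything else avoids following a link just to inspect its target. A stricter variant would also reject symlinked parent directories, which this sketch does not attempt.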
+## Health prerequisites
+
+Read-only checks:
+
+```bash
+curl -fsS http://127.0.0.1:18816/health
+curl -fsS http://127.0.0.1:18819/healthz
+```
+
+Do not restart services from this runbook. If either endpoint is unhealthy, stop and request an ops/remediation task.