swarm-master/docs/npu-voice-audio-pipeline.md
2026-06-05 15:52:43 -07:00

NPU voice/audio local-file pipeline

This is the first-slice local-file voice/audio path for the NPU maximization program:

local audio file or already-staged attachment
  -> OpenVINO NPU Whisper (:18816)
  -> OpenVINO NPU classifier (:18819)
  -> explicit advisory gate
  -> Atlas/Hermes only after separate approval

The implementation is scripts/npu_voice_audio_pipeline.py. It is a CLI wrapper only; it starts no listener and performs no outbound sends, Obsidian writes, memory writes, vector DB mutations, Kanban mutations, service restarts, platform API calls, or live Atlas/Hermes routing changes.

Safety gates

Closed unless explicitly approved later:

  • Telegram/Discord fetching by bot token or attachment URL.
  • Outbound messages or auto-sends.
  • Obsidian/vault writes.
  • Memory writes.
  • Vector DB mutation or reindex.
  • Automatic Kanban mutation.
  • Service restarts or new persistent listeners.
  • Private-directory root broadening.
  • Live Atlas/Hermes routing authority changes.

HTTP success is not NPU proof. For NPU claims, require real inference plus positive /sys/class/accel/accel0/device/npu_busy_time_us deltas. The CLI reports response deltas and observed sysfs deltas for Whisper and classifier calls.
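
One way to take the sysfs measurement by hand is to read the counter before and after a call and subtract. The sketch below assumes the counter file holds a single non-negative integer in microseconds; read_busy_us and busy_delta_us are hypothetical helper names, not functions from scripts/npu_voice_audio_pipeline.py.

```python
from pathlib import Path
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Counter path taken from this runbook; assumed to contain one integer (microseconds).
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = NPU_BUSY_PATH) -> int:
    """Read the cumulative NPU busy time in microseconds."""
    return int(path.read_text().strip())


def busy_delta_us(call: Callable[[], T], path: Path = NPU_BUSY_PATH) -> Tuple[T, int]:
    """Run `call` and return its result plus the busy-time delta observed around it."""
    before = read_busy_us(path)
    result = call()
    after = read_busy_us(path)
    return result, after - before
```

A delta of 0 around a call that returned HTTP 200 means the response alone does not support an NPU claim. Because the counter is device-wide, a positive delta could in principle include other NPU work running at the same time, so a quiet system makes the reading easier to interpret.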

Example: synthetic local WAV smoke

cd /home/will/lab/swarm
python - <<'PY'
import math, struct, wave
path = '/tmp/npu-voice-smoke.wav'
sr = 16000
with wave.open(path, 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(sr)
    frames = bytearray()
    for i in range(int(sr * 0.6)):
        frames.extend(struct.pack('<h', int(12000 * math.sin(2 * math.pi * 440 * i / sr))))
    w.writeframes(frames)
print(path)
PY

Run the local-file wrapper:

/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py \
  --audio /tmp/npu-voice-smoke.wav \
  --title "synthetic smoke" \
  --source manual_smoke \
  --json

Compact output shape (abridged; id, next_gate, and the labels.* fields are omitted here):

{
  "ok": true,
  "source": "manual_smoke",
  "transcript_chars": 3,
  "action_worthy": false,
  "atlas_gate": "suppressed_not_action_worthy",
  "whisper_npu_delta_us": 85441,
  "whisper_sysfs_delta_us": 85441,
  "classifier_npu_delta_us": 85908,
  "classifier_sysfs_delta_us": 85908,
  "classifier_observed_sysfs_delta_us": 85908,
  "external_sends": 0,
  "writes": 0
}

A non-actionable smoke run should stay suppressed_not_action_worthy. A transcript containing a reminder, task, follow-up, or explicit question, or one for which the classifier returns tool_needed=true, should become advisory_only_not_sent; nothing is sent in either case.
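
The gate rule above can be sketched as a pure function. This is an illustration under stated assumptions, not the wrapper's code: atlas_gate and has_action_cue are hypothetical names, and how an action cue (reminder, task, follow-up, explicit question) is detected in the transcript is not shown.

```python
def atlas_gate(has_action_cue: bool, tool_needed: bool) -> str:
    """Map the two inputs named in this runbook to one of its two gate values."""
    # Either a cue in the transcript or the classifier's tool_needed label
    # makes the item action-worthy.
    action_worthy = has_action_cue or tool_needed
    # Neither branch sends anything; the advisory value still needs separate approval.
    if action_worthy:
        return "advisory_only_not_sent"
    return "suppressed_not_action_worthy"
```

For the synthetic smoke WAV, both inputs would be expected to be false, which gives suppressed_not_action_worthy as in the sample output.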

Example: already-staged platform voice file

This example assumes another approved process has already placed the audio file locally. The wrapper does not fetch from Telegram/Discord and does not read bot tokens.

/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py \
  --audio /tmp/staged-voice-message.ogg \
  --source staged_telegram \
  --title "staged local Telegram voice memo" \
  --json

Compact fields

The CLI always reports:

  • ok
  • id
  • source
  • transcript_chars
  • action_worthy
  • atlas_gate
  • next_gate
  • whisper_npu_delta_us
  • whisper_sysfs_delta_us
  • classifier_npu_delta_us
  • classifier_sysfs_delta_us
  • classifier_observed_sysfs_delta_us
  • labels.workflow_category
  • labels.tool_needed
  • labels.urgency
  • labels.safety_confirmation_required
  • external_sends
  • writes
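
A caller that captures the --json output could check a few of the fields listed above before trusting a run. The sketch below is a hypothetical consumer-side check, not part of the CLI; check_report is an assumed name, and it only looks at top-level fields that appear in the sample output.

```python
from typing import Any, Dict, List

# Fields that must stay at zero because the wrapper performs no sends or writes.
MUST_BE_ZERO = ("external_sends", "writes")
# Sysfs deltas that must be positive before making an NPU claim.
SYSFS_DELTAS = ("whisper_sysfs_delta_us", "classifier_sysfs_delta_us")


def check_report(report: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed report; empty means none found."""
    problems: List[str] = []
    if report.get("ok") is not True:
        problems.append("ok is not true")
    for key in MUST_BE_ZERO:
        if report.get(key) != 0:
            problems.append(f"{key} must be 0")
    for key in SYSFS_DELTAS:
        value = report.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} has no positive delta")
    return problems
```

The report dictionary would typically come from json.loads applied to the CLI's standard output.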

Transcript text is omitted by default. Use --include-transcript or --include-transcript-preview-chars N only for explicit local debugging.

Input limits

  • --audio must be an absolute local path.
  • Symlinks, directories, missing files, empty files, unsupported extensions, and files over --max-bytes are refused.
  • WAV duration is capped by --max-audio-seconds; other codecs remain size-capped in this first slice.
  • Classifier transcript payload is bounded by --max-transcript-chars.
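
The refusal rules above can be sketched as a single pre-flight function. This is a minimal illustration, not the wrapper's implementation: refusal_reason is a hypothetical name, the extension set is an assumption (the runbook does not list the accepted extensions), and the caps are passed in because their defaults are not stated here.

```python
import wave
from pathlib import Path
from typing import Optional

# Assumed for illustration only; the real accepted extensions are not listed in this runbook.
ALLOWED_EXTENSIONS = {".wav", ".ogg"}


def refusal_reason(audio: str, max_bytes: int, max_audio_seconds: float) -> Optional[str]:
    """Return why the path would be refused, or None if it passes these checks."""
    path = Path(audio)
    if not path.is_absolute():
        return "not an absolute path"
    # Checks the final path component only; symlinked parent directories are not detected.
    if path.is_symlink():
        return "symlink"
    if not path.exists():
        return "missing file"
    if path.is_dir():
        return "directory"
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        return "unsupported extension"
    size = path.stat().st_size
    if size == 0:
        return "empty file"
    if size > max_bytes:
        return "over max bytes"
    if path.suffix.lower() == ".wav":
        # Only WAV gets a duration check; other codecs are size-capped only.
        try:
            with wave.open(str(path), "rb") as w:
                seconds = w.getnframes() / w.getframerate()
        except (wave.Error, EOFError, ZeroDivisionError):
            return "unreadable WAV"
        if seconds > max_audio_seconds:
            return "over max audio seconds"
    return None
```

Returning a reason string instead of raising keeps the check easy to log and to test; the order of the checks here is a design choice of the sketch, not something the runbook specifies.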

Health prerequisites

Read-only checks:

curl -fsS http://127.0.0.1:18816/health
curl -fsS http://127.0.0.1:18819/healthz

Do not restart services from this runbook. If either endpoint is unhealthy, stop and request an ops/remediation task.
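
The same read-only checks can be scripted when curl is not convenient. The sketch below treats any HTTP 2xx answer as healthy, which is an assumption about the endpoints; is_healthy is a hypothetical helper and the 3-second timeout is an arbitrary choice. It performs a single GET and never retries or restarts anything.

```python
import http.client
import urllib.request

# Endpoints copied from the read-only checks in this runbook.
HEALTH_URLS = (
    "http://127.0.0.1:18816/health",
    "http://127.0.0.1:18819/healthz",
)


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True only if one GET to `url` answers with HTTP 2xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    for url in HEALTH_URLS:
        print(url, "healthy" if is_healthy(url) else "UNHEALTHY: stop and request ops")
```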