feat(npu): add voice audio advisory pipeline
@@ -0,0 +1,135 @@
# NPU voice/audio local-file pipeline

This is the first-slice local-file voice/audio path for the NPU maximization program:

```text
local audio file or already-staged attachment
  -> OpenVINO NPU Whisper (:18816)
  -> OpenVINO NPU classifier (:18819)
  -> explicit advisory gate
  -> Atlas/Hermes only after separate approval
```

The implementation is `scripts/npu_voice_audio_pipeline.py`. It is a CLI wrapper only; it starts no listener and performs no outbound sends, Obsidian writes, memory writes, vector DB mutations, Kanban mutations, service restarts, platform API calls, or live Atlas/Hermes routing changes.

## Safety gates

Closed unless explicitly approved later:

- Telegram/Discord fetching by bot token or attachment URL.
- Outbound messages or auto-sends.
- Obsidian/vault writes.
- Memory writes.
- Vector DB mutation or reindex.
- Automatic Kanban mutation.
- Service restarts or new persistent listeners.
- Private-directory root broadening.
- Live Atlas/Hermes routing authority changes.

HTTP success is not NPU proof. For NPU claims, require real inference plus positive `/sys/class/accel/accel0/device/npu_busy_time_us` deltas. The CLI reports response deltas and observed sysfs deltas for Whisper and classifier calls.
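The proof rule above can be sketched as a before/after read of the busy-time counter around one inference call. This is a minimal illustration, not the wrapper's actual code: the helper names (`read_busy_us`, `measure_npu_delta`, `is_npu_proof`) are hypothetical, and it assumes the sysfs file holds a single cumulative integer in microseconds.

```python
from pathlib import Path
from typing import Callable

# Counter path quoted from the runbook text above.
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = NPU_BUSY_PATH) -> int:
    """Read the counter; assumes the file contains one integer (microseconds)."""
    return int(path.read_text().strip())


def measure_npu_delta(run_inference: Callable[[], object],
                      path: Path = NPU_BUSY_PATH) -> tuple[object, int]:
    """Run one inference call and return (result, busy-time delta in us)."""
    before = read_busy_us(path)
    result = run_inference()  # hypothetical callable, e.g. one HTTP request
    after = read_busy_us(path)
    return result, after - before


def is_npu_proof(http_ok: bool, delta_us: int) -> bool:
    """HTTP success alone is not proof; the counter must also have advanced."""
    return http_ok and delta_us > 0
```

If the counter is device-wide, as this sketch assumes, concurrent NPU work from another process would inflate the delta, so a positive delta is best read as necessary evidence and not as exact attribution.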

## Example: synthetic local WAV smoke

```bash
cd /home/will/lab/swarm
python - <<'PY'
import math, struct, wave

path = '/tmp/npu-voice-smoke.wav'
sr = 16000  # sample rate in Hz

# Write 0.6 s of a 440 Hz sine tone as mono 16-bit PCM.
with wave.open(path, 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(sr)
    frames = bytearray()
    for i in range(int(sr * 0.6)):
        frames.extend(struct.pack('<h', int(12000 * math.sin(2 * math.pi * 440 * i / sr))))
    w.writeframes(frames)
print(path)
PY
```

Run the local-file wrapper:

```bash
/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py \
  --audio /tmp/npu-voice-smoke.wav \
  --title "synthetic smoke" \
  --source manual_smoke \
  --json
```

Compact output shape:

```json
{
  "ok": true,
  "source": "manual_smoke",
  "transcript_chars": 3,
  "action_worthy": false,
  "atlas_gate": "suppressed_not_action_worthy",
  "whisper_npu_delta_us": 85441,
  "whisper_sysfs_delta_us": 85441,
  "classifier_npu_delta_us": 85908,
  "classifier_sysfs_delta_us": 85908,
  "classifier_observed_sysfs_delta_us": 85908,
  "external_sends": 0,
  "writes": 0
}
```

A non-actionable smoke should stay `suppressed_not_action_worthy`. A transcript with a reminder, task, follow-up, explicit question, or classifier `tool_needed=true` should become `advisory_only_not_sent`; in neither case is anything sent.
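The gate rule described above can be illustrated with a small decision function. This is only a sketch: the cue-word regex and the function names are hypothetical stand-ins, since the wrapper's real detection logic is not shown in this document. Only the two gate strings and the `tool_needed` label come from the text.

```python
import re

# Hypothetical cue words for reminders, tasks, and follow-ups.
ACTION_CUES = re.compile(
    r"\b(remind|reminder|todo|task|follow[- ]?up)\b", re.IGNORECASE
)


def is_action_worthy(transcript: str, tool_needed: bool) -> bool:
    """True if the classifier asked for a tool or the text looks actionable."""
    if tool_needed:
        return True
    if "?" in transcript:  # crude proxy for "explicit question"
        return True
    return bool(ACTION_CUES.search(transcript))


def atlas_gate(transcript: str, tool_needed: bool = False) -> str:
    """Map a transcript to one of the two gate values named in this runbook."""
    if is_action_worthy(transcript, tool_needed):
        return "advisory_only_not_sent"
    return "suppressed_not_action_worthy"
```

Note that both branches return a label only; neither path sends anything, which mirrors the advisory-only intent of the gate.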

## Example: already-staged platform voice file

This example assumes another approved process has already placed the audio file locally. The wrapper does not fetch from Telegram/Discord and does not read bot tokens.

```bash
/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py \
  --audio /tmp/staged-voice-message.ogg \
  --source staged_telegram \
  --title "staged local Telegram voice memo" \
  --json
```

## Compact fields

The CLI always reports:

- `ok`
- `id`
- `source`
- `transcript_chars`
- `action_worthy`
- `atlas_gate`
- `next_gate`
- `whisper_npu_delta_us`
- `whisper_sysfs_delta_us`
- `classifier_npu_delta_us`
- `classifier_sysfs_delta_us`
- `classifier_observed_sysfs_delta_us`
- `labels.workflow_category`
- `labels.tool_needed`
- `labels.urgency`
- `labels.safety_confirmation_required`
- `external_sends`
- `writes`

Transcript text is omitted by default. Use `--include-transcript` or `--include-transcript-preview-chars N` only for explicit local debugging.
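The omit-by-default behaviour can be pictured as follows. This is a hedged sketch: the function name and the `transcript` / `transcript_preview` keys are hypothetical, and only `transcript_chars` and the two flags are taken from this document.

```python
def transcript_fields(transcript: str,
                      include_transcript: bool = False,
                      preview_chars: int = 0) -> dict:
    """Return the transcript-related part of a compact report.

    By default only the character count is exposed, never the text.
    """
    fields = {"transcript_chars": len(transcript)}
    if include_transcript:
        # Mirrors --include-transcript (output key name is an assumption).
        fields["transcript"] = transcript
    elif preview_chars > 0:
        # Mirrors --include-transcript-preview-chars N (key name assumed).
        fields["transcript_preview"] = transcript[:preview_chars]
    return fields
```

Keeping the count but not the text lets a smoke run be checked for a plausible transcript length without copying speech content into logs.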

## Input limits

- `--audio` must be an absolute local path.
- Symlinks, directories, missing files, empty files, unsupported extensions, and files over `--max-bytes` are refused.
- WAV duration is capped by `--max-audio-seconds`; in this first slice, other codecs are capped by size only.
- Classifier transcript payload is bounded by `--max-transcript-chars`.
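The file checks listed above can be sketched with the standard library. The function name, the default limits, and the extension set below are assumptions for illustration and are not taken from the wrapper. The sketch also checks only whether the final path component is a symlink.

```python
import os
import wave
from pathlib import Path

# Hypothetical extension allow-list; the real set is not documented here.
ALLOWED_EXTS = {".wav", ".ogg", ".mp3", ".m4a", ".flac"}


def validate_audio(path_str: str,
                   max_bytes: int = 25 * 1024 * 1024,
                   max_audio_seconds: float = 120.0) -> Path:
    """Return the path if it passes every check, else raise ValueError."""
    if not os.path.isabs(path_str):
        raise ValueError("audio path must be absolute")
    path = Path(path_str)
    if path.is_symlink():
        raise ValueError("symlinks are refused")
    if not path.exists():
        raise ValueError("file does not exist")
    if not path.is_file():
        raise ValueError("not a regular file")
    if path.suffix.lower() not in ALLOWED_EXTS:
        raise ValueError("unsupported extension")
    size = path.stat().st_size
    if size == 0:
        raise ValueError("file is empty")
    if size > max_bytes:
        raise ValueError("file exceeds the byte limit")
    if path.suffix.lower() == ".wav":
        # Only WAV gets a duration check; other codecs are size-capped only.
        with wave.open(str(path), "rb") as w:
            duration = w.getnframes() / w.getframerate()
        if duration > max_audio_seconds:
            raise ValueError("WAV exceeds the duration limit")
    return path
```

Checking `is_symlink()` before `exists()` means a dangling symlink is reported as a symlink and not as a missing file, which keeps the refusal reason accurate.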

## Health prerequisites

Read-only checks:

```bash
curl -fsS http://127.0.0.1:18816/health
curl -fsS http://127.0.0.1:18819/healthz
```

Do not restart services from this runbook. If either endpoint is unhealthy, stop and request an ops/remediation task.
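The same read-only checks can be scripted when a pre-flight gate is wanted before running the wrapper. This is a sketch under assumptions: the function names are hypothetical, and any 2xx response is treated as healthy, roughly matching `curl -f`. It issues GET requests only and never restarts anything.

```python
import http.client
import urllib.request

# Endpoints quoted from the curl commands above.
HEALTH_URLS = {
    "whisper": "http://127.0.0.1:18816/health",
    "classifier": "http://127.0.0.1:18819/healthz",
}


def http_get_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True only for a 2xx response; any error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses, as are timeouts.
        return False


def unhealthy_services(urls=HEALTH_URLS, fetch=http_get_ok) -> list:
    """List the names of services whose health endpoint did not answer OK."""
    return [name for name, url in urls.items() if not fetch(url)]


if __name__ == "__main__":
    bad = unhealthy_services()
    if bad:
        raise SystemExit(f"unhealthy: {', '.join(bad)}; stop and request remediation")
    print("both endpoints healthy")
```

The `fetch` parameter exists so the check can be exercised with a stub, without touching the live services.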