7 Commits

Author SHA1 Message Date
William Valentin 08fb9ca686 docs(npu): update integrated health runbooks 2026-06-05 15:52:51 -07:00
William Valentin 9e5ffa0fd0 feat(npu): add kanban hygiene advisory 2026-06-05 15:52:43 -07:00
William Valentin d2bad88596 feat(npu): add voice audio advisory pipeline 2026-06-05 15:52:43 -07:00
William Valentin 6906c2079b feat(npu): add explicit-root batch triage wrapper 2026-06-05 15:52:43 -07:00
William Valentin 6155b54ab5 feat(npu): add cron and n8n advisory examples 2026-06-05 15:52:43 -07:00
William Valentin 5a14adaf58 feat(npu): add utilization digest tooling 2026-06-05 15:52:43 -07:00
William Valentin b7b4edf0f5 feat(npu): add local context gate advisory 2026-06-05 15:52:42 -07:00
28 changed files with 4596 additions and 134 deletions
+2
View File
@@ -37,6 +37,8 @@ For the current host-side AI/search/voice automation stack, n8n watchdogs, and a
- [`docs/swarm-infrastructure.md`](docs/swarm-infrastructure.md) — operational overview and quick checks
- [`docs/swarm-infrastructure.html`](docs/swarm-infrastructure.html) — dark SVG architecture diagram
- [`docs/diagram-maintenance.md`](docs/diagram-maintenance.md) — diagram upkeep conventions
- [`docs/npu-utilization-digest.md`](docs/npu-utilization-digest.md) — compact on-demand NPU proof/utilization digest runbook
- [`docs/npu-integrated-health-ops.md`](docs/npu-integrated-health-ops.md) — integrated operator health-check workflow combining `npu-service-health.sh` and the utilization digest
- OpenVINO NPU services and prototypes are documented in `swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md` and the component READMEs under `openvino-*-npu*/`. Live baseline ports are RAG `:18810`, Whisper NPU `:18816`, and embeddings `:18817`; sidecar ports `:18818`, `:18819`, `:18820`, and optional doc/image triage `:18829` are approved prototypes only, not live Atlas/Hermes routing.
## VM: zap
+52
View File
@@ -0,0 +1,52 @@
version: 1
policy:
default_mode: dry_run
require_explicit_root: true
allow_external_uploads: false
allow_mutations: false
log_raw_text: false
include_full_paths_default: false
npu_proof_path: /sys/class/accel/accel0/device/npu_busy_time_us
# Copy to config/triage-roots.local.yaml and approve exactly one narrow,
# lane-specific staging root. The committed template is intentionally
# unapproved/fail-closed; do not point any lane at broad home, Downloads,
# vault, screenshot, photo-library, or historical audio roots without explicit
# approval for that exact lane/root.
roots:
screenshots:
approved: false
root: null
allowed_extensions: [.png, .jpg, .jpeg, .webp, .heic]
max_files: 50
max_file_mb: 25
receipts:
approved: false
root: null
allowed_extensions: [.png, .jpg, .jpeg, .pdf, .webp]
max_files: 50
max_file_mb: 25
downloads:
approved: false
root: null
allowed_extensions: [.pdf, .png, .jpg, .jpeg, .webp]
max_files: 50
max_file_mb: 25
obsidian_attachments:
approved: false
root: null
allowed_extensions: [.pdf, .png, .jpg, .jpeg, .webp, .mp3, .m4a, .wav, .ogg]
max_files: 50
max_file_mb: 50
voice_memos:
approved: false
root: null
allowed_extensions: [.mp3, .m4a, .wav, .ogg, .opus]
max_files: 25
max_file_mb: 100
meeting_snippets:
approved: false
root: null
allowed_extensions: [.mp3, .m4a, .wav, .ogg, .opus]
max_files: 25
max_file_mb: 200
+46
View File
@@ -0,0 +1,46 @@
version: 1
policy:
default_mode: dry_run
require_explicit_root: true
allow_external_uploads: false
allow_mutations: false
log_raw_text: false
include_full_paths_default: false
npu_proof_path: /sys/class/accel/accel0/device/npu_busy_time_us
roots:
screenshots:
approved: true
root: ../openvino-doc-image-triage-npu/samples
allowed_extensions: [.png, .jpg, .jpeg, .webp, .heic]
max_files: 50
max_file_mb: 25
receipts:
approved: true
root: ../openvino-doc-image-triage-npu/samples
allowed_extensions: [.png, .jpg, .jpeg, .pdf, .webp]
max_files: 50
max_file_mb: 25
downloads:
approved: true
root: ../openvino-doc-image-triage-npu/samples
allowed_extensions: [.pdf, .png, .jpg, .jpeg, .webp]
max_files: 50
max_file_mb: 25
obsidian_attachments:
approved: true
root: ../openvino-doc-image-triage-npu/samples
allowed_extensions: [.pdf, .png, .jpg, .jpeg, .webp, .mp3, .m4a, .wav, .ogg]
max_files: 50
max_file_mb: 50
voice_memos:
approved: true
root: ../tmp/synthetic-voice-memos
allowed_extensions: [.mp3, .m4a, .wav, .ogg, .opus]
max_files: 25
max_file_mb: 100
meeting_snippets:
approved: true
root: ../tmp/synthetic-meeting-snippets
allowed_extensions: [.mp3, .m4a, .wav, .ogg, .opus]
max_files: 25
max_file_mb: 200
+65
View File
@@ -0,0 +1,65 @@
# Explicit-root NPU batch triage dry-run examples
These examples are wrappers only. They do not install cron jobs, enable services,
change Atlas/Hermes routing, write Obsidian/RAG/vector DBs, move/delete files, or
send outbound messages.
The committed manifest template at `config/triage-roots.example.yaml` is
intentionally unapproved. For real private data, copy it to
`config/triage-roots.local.yaml` and approve exactly one narrow lane-specific
staging folder. Request-level `--root` may narrow that manifest root but cannot
broaden it.
Synthetic document/image smoke, CPU-only/no NPU claim:
```bash
python scripts/npu-batch-triage-dry-run.py \
--manifest config/triage-roots.test.yaml \
--lane screenshots \
--root openvino-doc-image-triage-npu/samples \
--limit 5 \
--dry-run \
--no-npu \
--json
```
Synthetic document/image smoke with the existing local embeddings NPU service,
if `127.0.0.1:18817` is healthy. Treat NPU as proven only when `npu.proof_ok` is
true and `npu.busy_delta_us` (or item-level delta) is positive:
```bash
python scripts/npu-batch-triage-dry-run.py \
--manifest config/triage-roots.test.yaml \
--lane receipts \
--root openvino-doc-image-triage-npu/samples \
--limit 5 \
--dry-run \
--json
```
Audio smoke should use generated/public synthetic audio only until a private
audio staging root is approved:
```bash
python scripts/npu-batch-triage-dry-run.py \
--manifest config/triage-roots.test.yaml \
--lane voice_memos \
--root tmp/synthetic-voice-memos \
--limit 3 \
--dry-run \
--no-npu \
--json
```
Cron/n8n shape (disabled example only):
```text
Manual Trigger / disabled cron
-> Execute Command: python /home/will/lab/swarm/scripts/npu-batch-triage-dry-run.py --manifest /home/will/lab/swarm/config/triage-roots.local.yaml --lane receipts --limit 25 --dry-run --json
-> IF ok && npu.proof_ok && files_processed > 0
-> local dashboard/report only
```
Do not connect this output to Telegram/Discord/email sends, Obsidian writes,
RAG/vector reindex, file moves/deletes, Kanban mutation, service restarts, or
Atlas/Hermes routing without a separate reviewed approval gate.
+201
View File
@@ -0,0 +1,201 @@
# NPU integrated health checks — operator runbook notes
Compact, read-only operator workflow that combines the existing
`scripts/npu-service-health.sh` listener/systemd/embedding-proof probe with the
reviewer-approved `scripts/npu-utilization-digest.py` per-service utilization
and fallback report. Together they form a single safe daily / on-demand NPU
health pass.
Scope:
- Read-only against live services. No restarts, route changes, vector mutation,
advisory POSTs, outbound sends, or memory writes.
- No new persistent services, timers, sockets, compose services, or Dockerfiles
are introduced by this integration. Both scripts are foreground / on-demand.
- Binds verified local-only or on the approved Docker bridge (`172.19.0.1:18830`).
Pre-existing broader binds on the live baseline ports (`18810`, `18814`,
`18816`, `18817`) are noted in the runbook and unchanged here.
- NPU proof requires real inference plus a positive
`/sys/class/accel/accel0/device/npu_busy_time_us` delta. HTTP 200 alone is
not sufficient.
## When to run
- Daily / on-demand ops check.
- After upgrades that touch the NPU stack, OpenVINO, or any of the live
specialists.
- Before any approval-gated change that depends on the NPU reflex layer.
- As the read-only verification step of a deploy or recovery runbook.
## Required artifacts on the branch
| Path | Role |
| --- | --- |
| `scripts/npu-service-health.sh` | Listener / systemd / Docker / health endpoint / single embedding proof. Existing baseline script. |
| `scripts/npu-utilization-digest.py` | Per-service utilization digest with NPU proof per probe, compact text or JSONL output, optional JSONL artifact. |
| `docs/npu-utilization-digest.md` | Per-service digest reference. |
| `tests/test_npu_utilization_digest.py` | Offline unit tests for the digest (no live services required). |
## Integrated workflow
### Step 1 — Listener and service-state snapshot
```bash
cd ~/lab/swarm
./scripts/npu-service-health.sh
```
What it verifies, in order:
1. `npu_busy_time_us` counter is readable.
2. Required listeners are present on `18810 / 18814 / 18816 / 18817 / 18818 /
18819 / 18820 / 18829 / 18830`.
3. User systemd services are active/enabled for embeddings, RAG health,
reranker, router/classifier, and the small GenAI worker.
4. Docker Compose `whisper-server-npu` is up.
5. Health endpoints return JSON for the live baseline and local specialists.
6. A single non-private embeddings request to `:18817` produces a positive
sysfs `npu_busy_time_us` delta; the script exits nonzero if there is no
positive delta.
Read the last block (`== Embeddings NPU busy-time proof ==`) first. If
`result=ok` and `sysfs_delta_us > 0`, the central NPU path is healthy. If not,
do not run the digest; triage the embeddings service first.
### Step 2 — Per-service utilization digest
```bash
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text
```
Compact output shape:
```text
NPU utilization digest <timestamp>
counter=/sys/class/accel/accel0/device/npu_busy_time_us delta_us=<total>
services_ok=<ok>/<total> proof_ok=<ok>/<proof-capable> fallbacks=<n> gates_closed=<n>
- embeddings: ok=true calls=1 avg_ms=... npu_delta_us=... proof=true mode=NPU
- rerank: ok=true calls=1 docs=2 avg_ms=... npu_delta_us=... proof=true mode=NPU
- whisper: ok=true calls=1 jobs=1 avg_ms=... npu_delta_us=... proof=true mode=NPU
- classifier: ok=true calls=1 events=1 avg_ms=... npu_delta_us=... proof=true dry_run=true ...
- genai: ok=true jobs=0 loaded=false mode=loaded=false reason=skipped_cold_load
- doc_triage: ok=true calls=1 files=1 avg_ms=... npu_delta_us=... proof=true gate=closed:private-root
- rag_endpoint: ok=true mode=health_only gate=closed:vector-mutation
- rag_health: ok=true mode=health_only
- advisory_gateway: ok=true mode=health_only gate=closed:advisory-post
fallbacks: skipped_cold_load=1
```
Read order for ops:
1. `services_ok` row — anything below `9/9` means a service is down or unhealthy.
2. `proof_ok` row — `proof_ok=5/5` means every probe that ran with a real
inference request produced a positive sysfs NPU delta.
3. `fallbacks:` line — `skipped_cold_load=1` is expected (GenAI worker is
intentionally not cold-loaded). Any other fallback label is a triage signal.
4. `gate=` labels — closed gates that remain closed by design.
### Step 3 — Optional artifact for trend tracking
```bash
scripts/npu-utilization-digest.py --format jsonl
```
Writes a single JSONL line per digest under
`/home/will/.local/state/npu-utilization/digests/<timestamp>.jsonl`. The first
line is the summary; subsequent lines are per-service rows. No JSONL write
happens with `--no-write`.
### Step 4 — Offline unit tests
```bash
python -m pytest tests/test_npu_utilization_digest.py -q
```
Does not require live services. Use to validate digest logic after edits or
before merging.
## Compact proof interpretation
For each proof-capable service, both the response-level `npu_busy_delta_us`
(when the service reports it) and the script's own sysfs before/after delta
must agree and be `> 0`. The proof is only valid when an actual inference
request ran. If a probe was skipped (`reason=skipped_cold_load` or
`reason=smoke_disabled`), `proof_ok` for that row is `None` and the row
contributes a labeled fallback instead of a proof failure.
Proof currently runs on:
- `embeddings` (`:18817`)
- `rerank` (`:18818`)
- `whisper` (`:18816`) when `--include-whisper-smoke=true` (default)
- `classifier` (`:18819`)
- `doc_triage` (`:18829`) when `--include-doc-triage-smoke=true` (default);
proof is via the embeddings service, not directly on the NPU device, so the
row reports `mode=NPU-via-embedding-service`.
Intentionally health-only (no proof row):
- `rag_endpoint` (`:18810`) — closed:vector-mutation
- `rag_health` (`:18814`)
- `advisory_gateway` (`172.19.0.1:18830`) — closed:advisory-post
Intentionally skipped by default:
- `genai` (`:18820`) — `loaded=false` until first use; cold-loading just to
prove the NPU is not free, so it is treated as a labeled fallback rather
than a proof failure. Opt in with `--include-genai-smoke=true` only when the
task actually needs a generation smoke.
## Exit codes and triage gates
`scripts/npu-service-health.sh`:
| Exit | Meaning | Next |
| ---: | --- | --- |
| 0 | All checks passed including embeddings proof. | Continue to digest. |
| 2 | `npu_busy_time_us` not readable. | Check kernel/driver; do not run digest. |
| 3 | Embedding request failed. | Triage `openvino-embeddings.service` and port `:18817`. |
| 4 | Embedding request succeeded but sysfs delta `<= 0`. | Service reachable but not on the NPU; check service logs and device bind. |
`scripts/npu-utilization-digest.py`:
| Exit | Meaning | Next |
| ---: | --- | --- |
| 0 | All reachable services handled; proof/fallback accounting completed. | Inspect `proof_ok` and `fallbacks:` for any unexpected labels. |
| 2 | `--strict-proof` was set and at least one proof-required probe ran without a positive sysfs delta. | Triage the named service's NPU path. |
## Approval gates left closed
The integrated workflow intentionally does not:
- start, stop, restart, enable, or disable any user systemd unit or Docker
Compose service;
- write to or mutate the Chroma collection `obsidian_bge_npu` or any other
vector store;
- change Atlas/Hermes routing or model defaults;
- post classification/generation/triage events to the advisory gateway;
- broaden private document, image, or audio roots;
- bind any new listener, including on `0.0.0.0`;
- write memory, send messages, execute tools, or mutate Kanban state.
These remain approval-gated and are tracked on the `npu-maximization` board.
## Quick reference
```bash
# Single-pass NPU health check (listener + systemd + embeddings proof).
cd ~/lab/swarm && ./scripts/npu-service-health.sh
# Compact digest with per-service proof and fallback accounting.
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text
# Same, with a JSONL artifact for trend tracking.
scripts/npu-utilization-digest.py --format jsonl
# Strict mode for CI / pre-merge.
scripts/npu-utilization-digest.py --no-write --strict-proof
# Offline digest logic tests.
python -m pytest tests/test_npu_utilization_digest.py -q
```
+49
View File
@@ -0,0 +1,49 @@
# NPU utilization digest
Compact on-demand observability for Will's local OpenVINO/NPU specialists.
Script:
```bash
/home/will/lab/swarm/scripts/npu-utilization-digest.py --format text
```
Safe defaults:
- read-only for services; no service starts/stops/restarts, routing changes, vector DB mutation, advisory POSTs, outbound sends, or memory writes;
- writes only a compact JSONL artifact under `/home/will/.local/state/npu-utilization/digests` unless `--no-write` is passed;
- uses synthetic/non-private requests for embeddings, rerank, classifier dry-run, and doc triage;
- keeps GenAI generation disabled by default when the worker is not loaded, to avoid cold-load side effects;
- advisory gateway remains health-only because POSTs write metadata/events;
- NPU proof is only true when an inference probe ran and `/sys/class/accel/accel0/device/npu_busy_time_us` increased around that probe.
Common commands:
```bash
# Compact CLI digest, plus JSONL artifact.
scripts/npu-utilization-digest.py --format text
# No artifact write; useful during reviews.
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false
# Machine-readable stdout.
scripts/npu-utilization-digest.py --format jsonl --no-write
# CI/unit tests; live services not required.
python -m pytest tests/test_npu_utilization_digest.py -q
```
Output shape is intentionally small: service booleans, counts, average probe ms, sysfs deltas, proof flags, fallback warning counts, artifact path, and closed gates. `fallbacks` includes unavailable services, failed/missing proof, and skipped proof-capable smokes such as disabled Whisper/doc-triage probes or GenAI cold-load skips; intentionally health-only RAG/advisory rows are not fallbacks unless unavailable. It does not print raw embeddings, transcripts, OCR text, model completions, request headers, or full upstream JSON.
Covered rows:
- `embeddings`: `/v1/embeddings` synthetic string, positive sysfs delta required.
- `rerank`: `/rerank` with two synthetic docs, positive sysfs delta required.
- `whisper`: health-only unless the bounded generated-WAV smoke is enabled.
- `classifier`: `/v1/classify` with `dry_run=true` and `include_evidence=false`, positive sysfs delta required.
- `genai`: health-only by default; skips when `loaded=false` unless explicitly opted in.
- `doc_triage`: one approved synthetic sample under the service sample root, with `allowed_roots` narrowed to that sample directory; NPU proof is via embeddings.
- `rag_endpoint` and `rag_health`: health-only; no vector mutation.
- `advisory_gateway`: health-only; `closed:advisory-post` gate remains closed.
Closed gates left for later approval: sending/delivery, recurring timer, GenAI cold-load smoke, advisory POSTs, Atlas/Hermes routing changes, vector mutation/reindex, and broad private document/audio/image roots.
+135
View File
@@ -0,0 +1,135 @@
# NPU voice/audio local-file pipeline
This is the first-slice local-file voice/audio path for the NPU maximization program:
```text
local audio file or already-staged attachment
-> OpenVINO NPU Whisper (:18816)
-> OpenVINO NPU classifier (:18819)
-> explicit advisory gate
-> Atlas/Hermes only after separate approval
```
The implementation is `scripts/npu_voice_audio_pipeline.py`. It is a CLI wrapper only; it starts no listener and performs no outbound sends, Obsidian writes, memory writes, vector DB mutations, Kanban mutations, service restarts, platform API calls, or live Atlas/Hermes routing changes.
## Safety gates
Closed unless explicitly approved later:
- Telegram/Discord fetching by bot token or attachment URL.
- Outbound messages or auto-sends.
- Obsidian/vault writes.
- Memory writes.
- Vector DB mutation or reindex.
- Automatic Kanban mutation.
- Service restarts or new persistent listeners.
- Private-directory root broadening.
- Live Atlas/Hermes routing authority changes.
HTTP success is not NPU proof. For NPU claims, require real inference plus positive `/sys/class/accel/accel0/device/npu_busy_time_us` deltas. The CLI reports response deltas and observed sysfs deltas for Whisper and classifier calls.
## Example: synthetic local WAV smoke
```bash
cd /home/will/lab/swarm
python - <<'PY'
import math, struct, wave
path = '/tmp/npu-voice-smoke.wav'
sr = 16000
with wave.open(path, 'wb') as w:
w.setnchannels(1)
w.setsampwidth(2)
w.setframerate(sr)
frames = bytearray()
for i in range(int(sr * 0.6)):
frames.extend(struct.pack('<h', int(12000 * math.sin(2 * math.pi * 440 * i / sr))))
w.writeframes(frames)
print(path)
PY
```
Run the local-file wrapper:
```bash
/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py \
--audio /tmp/npu-voice-smoke.wav \
--title "synthetic smoke" \
--source manual_smoke \
--json
```
Compact output shape:
```json
{
"ok": true,
"source": "manual_smoke",
"transcript_chars": 3,
"action_worthy": false,
"atlas_gate": "suppressed_not_action_worthy",
"whisper_npu_delta_us": 85441,
"whisper_sysfs_delta_us": 85441,
"classifier_npu_delta_us": 85908,
"classifier_sysfs_delta_us": 85908,
"classifier_observed_sysfs_delta_us": 85908,
"external_sends": 0,
"writes": 0
}
```
A non-actionable smoke should stay `suppressed_not_action_worthy`. A transcript with a reminder, task, follow-up, explicit question, or classifier `tool_needed=true` should become `advisory_only_not_sent`, not sent.
## Example: already-staged platform voice file
This example assumes another approved process has already placed the audio file locally. The wrapper does not fetch from Telegram/Discord and does not read bot tokens.
```bash
/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py \
--audio /tmp/staged-voice-message.ogg \
--source staged_telegram \
--title "staged local Telegram voice memo" \
--json
```
## Compact fields
The CLI always reports:
- `ok`
- `id`
- `source`
- `transcript_chars`
- `action_worthy`
- `atlas_gate`
- `next_gate`
- `whisper_npu_delta_us`
- `whisper_sysfs_delta_us`
- `classifier_npu_delta_us`
- `classifier_sysfs_delta_us`
- `classifier_observed_sysfs_delta_us`
- `labels.workflow_category`
- `labels.tool_needed`
- `labels.urgency`
- `labels.safety_confirmation_required`
- `external_sends`
- `writes`
Transcript text is omitted by default. Use `--include-transcript` or `--include-transcript-preview-chars N` only for explicit local debugging.
## Input limits
- `--audio` must be an absolute local path.
- Symlinks, directories, missing files, empty files, unsupported extensions, and files over `--max-bytes` are refused.
- WAV duration is capped by `--max-audio-seconds`; other codecs remain size-capped in this first slice.
- Classifier transcript payload is bounded by `--max-transcript-chars`.
## Health prerequisites
Read-only checks:
```bash
curl -fsS http://127.0.0.1:18816/health
curl -fsS http://127.0.0.1:18819/healthz
```
Do not restart services from this runbook. If either endpoint is unhealthy, stop and request an ops/remediation task.
+12
View File
@@ -33,6 +33,18 @@ POST /v1/advisory/generate
POST /v1/advisory/triage
```
## Cron and n8n advisory dry-run contract
For cron/n8n event classification, use the dry-run contract in `docs/cron-n8n-advisory-classifier.md`.
It defines the normalized event envelope, decision envelope, `suppress|log|summarize|escalate` recommendation mapping, and duplicate/stale/no-op/action-required examples.
Example artifacts:
- `examples/cron-advisory-dry-run.sh` — host-local cron wrapper that prints one compact decision line and performs no side effects.
- `examples/n8n-advisory-dry-run-fragment.json` — sanitized inactive n8n node fragment for Set -> HTTP Request -> Code decision mapping.
Both examples preserve the gateway authority boundary: advisory only, no send/restart/memory/tool/routing authority.
### Classifier shadow call
```bash
@@ -0,0 +1,256 @@
# Cron and n8n advisory classifier contract
Status: dry-run specification and integration examples
Scope: cron and n8n alert/event classification through the OpenVINO advisory gateway
Gateway: `http://172.19.0.1:18830` from `n8n-agent` and host-local cron on the current bridge-bound service. Override `NPU_ADVISORY_GATEWAY_URL=http://127.0.0.1:18830` only if a localhost-bound instance is explicitly running.
## Authority boundary
This contract is advisory only. It may recommend one of `suppress`, `log`, `summarize`, or `escalate`, but it must not perform the action itself.
Every integration must preserve these authority flags:
```json
{
"may_route": false,
"may_write_memory": false,
"may_send_external": false,
"may_process_private_dirs": false,
"may_execute_tools": false,
"may_restart_services": false
}
```
Allowed side effects in dry-run mode:
- read an explicit cron/n8n event payload;
- call the advisory gateway classifier/generator;
- write compact local stdout or n8n execution logs;
- store metadata-only advisory counters if an existing log sink already does so.
Forbidden without separate explicit approval:
- outbound sends/pages/Discord/Telegram/email;
- service restarts, command execution, or tool calls;
- Hermes/Atlas routing changes;
- memory writes;
- broad private-directory processing;
- vector database mutation or reindexing.
## Input event envelope
Cron and n8n producers should normalize events before classification. Keep this input small and avoid raw private payloads.
```json
{
"schema": "cron_n8n_event_v1",
"trace_id": "cron:service-health:2026-06-05T14:30:00Z",
"source": "cron",
"workflow": "npu-service-health",
"event_kind": "health_check",
"severity": "warning",
"subject": "openvino-reranker health check repeated warning",
"summary": "Two consecutive health probes reported timeout, no restart attempted.",
"dedupe_key": "service:openvino-reranker:timeout",
"observed_at": "2026-06-05T14:30:00Z",
"stale_after_s": 900,
"action_requested": false,
"dry_run": true
}
```
Field rules:
- `source`: `cron` or `n8n`.
- `workflow`: compact job/workflow name, not a private URL.
- `subject` + `summary`: the only text sent to the classifier.
- `dedupe_key`: stable non-secret key for duplicate detection by the caller.
- `stale_after_s`: caller-side freshness gate; stale events should not page.
- `action_requested`: true only when an upstream job is asking a human/Atlas to consider action.
- `dry_run`: must remain true for this phase.
## Gateway classifier call
The current gateway `/v1/advisory/classify` accepts explicit text and wraps the classifier response in `openvino_advisory_v1` with NPU proof and authority fields.
Host cron example for the current bridge-bound service:
```bash
curl -fsS http://172.19.0.1:18830/v1/advisory/classify \
-H 'Content-Type: application/json' \
-d '{
"trace_id":"cron:service-health:sample",
"text":"source=cron workflow=npu-service-health severity=warning kind=health_check subject=openvino-reranker repeated timeout summary=Two consecutive health probes reported timeout; no restart attempted; dry_run=true"
}' | jq '{schema, mode, trace_id, npu_ok: .npu_proof.ok, npu_delta: .npu_proof.npu_busy_delta_us, authority, labels: .result.labels}'
```
n8n Docker-bridge example:
```bash
curl -fsS http://172.19.0.1:18830/v1/advisory/classify \
-H 'Content-Type: application/json' \
-d '{"trace_id":"n8n:swarm-health:sample","text":"source=n8n workflow=swarm-health-watchdog severity=critical kind=health_check subject=multiple services unhealthy summary=Health probe failed for three services; dry_run=true"}' \
| jq '{mode, npu_ok: .npu_proof.ok, npu_delta: .npu_proof.npu_busy_delta_us, may_send_external: .authority.may_send_external}'
```
NPU proof gate: an HTTP 200 is not enough. Treat the classifier as NPU-backed only when `.npu_proof.ok == true` and `.npu_proof.npu_busy_delta_us > 0` for real inference.
## Advisory decision envelope
Cron/n8n wrappers should map the gateway response plus caller-side freshness/deduplication state into this compact decision envelope:
```json
{
"schema": "cron_n8n_advisory_decision_v1",
"trace_id": "cron:service-health:2026-06-05T14:30:00Z",
"source": "cron",
"workflow": "npu-service-health",
"dry_run": true,
"recommendation": "summarize",
"classification": "action_required",
"confidence": 0.84,
"reason_codes": ["warning_or_high_urgency", "fresh_event", "not_duplicate"],
"npu_proof": {"required": true, "ok": true, "npu_busy_delta_us": 1234},
"authority": {
"may_route": false,
"may_write_memory": false,
"may_send_external": false,
"may_process_private_dirs": false,
"may_execute_tools": false,
"may_restart_services": false
},
"next_gate": "human_or_atlas_review_required_before_any_side_effect"
}
```
Decision fields:
- `recommendation`: `suppress`, `log`, `summarize`, or `escalate`.
- `classification`: `duplicate`, `stale`, `no_op`, or `action_required` for v1 examples.
- `confidence`: use classifier urgency/category confidence when available; otherwise use a conservative wrapper score.
- `reason_codes`: compact machine-readable rationale, not raw payload text.
- `next_gate`: always a review/approval gate before side effects.
## Recommendation mapping
This is the v1 dry-run mapping. It is intentionally conservative and caller-side; the NPU classifier advises, the wrapper chooses a recommendation, and humans/Atlas retain authority.
| Caller/classifier signal | Classification | Recommendation | Dry-run behavior |
|---|---|---|---|
| Same `dedupe_key` observed inside caller cooldown | `duplicate` | `suppress` | Log compact duplicate count only. Do not send. |
| `observed_at + stale_after_s` is older than now | `stale` | `log` | Log stale event and age. Do not summarize/page. |
| Severity low/normal, no action requested, classifier urgency low/normal | `no_op` | `log` | Keep normal execution log only. |
| Warning/high urgency or action requested, NPU proof ok | `action_required` | `summarize` | Draft a local summary for review; no send/restart. |
| Critical severity or repeated failures and NPU proof ok | `action_required` | `escalate` | Recommend escalation to Atlas/human; wrapper still must not send/restart. |
| NPU proof missing or false | `action_required` or caller-specific | `log` | Log `npu_proof_failed`; do not claim NPU-backed advice. |
## Required examples
### Duplicate -> suppress
Input summary:
```json
{"source":"cron","workflow":"npu-service-health","severity":"warning","dedupe_key":"service:reranker:timeout","summary":"Same timeout as prior run inside cooldown.","dry_run":true}
```
Decision:
```json
{"classification":"duplicate","recommendation":"suppress","reason_codes":["dedupe_key_in_cooldown"],"next_gate":"none_in_dry_run"}
```
### Stale -> log
Input summary:
```json
{"source":"n8n","workflow":"swarm-health-watchdog","severity":"warning","observed_at":"older_than_stale_after","stale_after_s":900,"summary":"Delayed webhook replay for an old probe.","dry_run":true}
```
Decision:
```json
{"classification":"stale","recommendation":"log","reason_codes":["event_stale"],"next_gate":"none_in_dry_run"}
```
### No-op -> log
Input summary:
```json
{"source":"cron","workflow":"backup-check","severity":"normal","action_requested":false,"summary":"Backup completed and all expected files are present.","dry_run":true}
```
Decision:
```json
{"classification":"no_op","recommendation":"log","reason_codes":["normal_severity","no_action_requested"],"next_gate":"none_in_dry_run"}
```
### Action required -> summarize/escalate
Input summary:
```json
{"source":"n8n","workflow":"swarm-health-watchdog","severity":"critical","action_requested":true,"summary":"RAG and embeddings health failed repeatedly; no restart attempted.","dry_run":true}
```
Decision:
```json
{"classification":"action_required","recommendation":"escalate","reason_codes":["critical_severity","action_requested","fresh_event"],"next_gate":"human_or_atlas_review_required_before_any_side_effect"}
```
## Optional local summary draft
If the decision is `summarize` or `escalate`, a wrapper may request a bounded draft from `/v1/advisory/generate`:
```bash
curl -fsS http://172.19.0.1:18830/v1/advisory/generate \
-H 'Content-Type: application/json' \
-d '{"trace_id":"cron:service-health:sample","job":"summary","input":"Health check warning: openvino-reranker timed out twice; no restart attempted.","max_new_tokens":48}' \
| jq '{mode, trace_id, npu_ok: .npu_proof.ok, authority, draft: .result.draft_text, final_authority: .result.final_authority}'
```
The draft remains non-authoritative. It must not be automatically sent externally or written to memory.
## n8n integration pattern
Recommended node chain for dry-run workflows:
```text
Schedule/Webhook/Failure Trigger
-> Set normalized event envelope
-> HTTP Request POST /v1/advisory/classify
-> Code node maps decision envelope
-> IF node on recommendation
suppress/log: execution log only
summarize/escalate: optional local summary draft, then execution log only
```
The IF node must not connect to outbound messaging, service restart, memory write, or Hermes routing nodes until a separate approval changes the authority boundary.
See `../examples/n8n-advisory-dry-run-fragment.json` for a sanitized node fragment.
## Cron integration pattern
Cron jobs should call a wrapper script that prints one compact line and exits successfully unless the wrapper itself fails. The wrapper should not page or restart.
Example crontab shape:
```text
*/15 * * * * /home/will/lab/swarm/openvino-advisory-gateway/examples/cron-advisory-dry-run.sh npu-service-health warning health_check "openvino-reranker timeout twice" "service:openvino-reranker:timeout" >> /home/will/.local/state/npu-advisory/cron.log 2>&1
```
See `../examples/cron-advisory-dry-run.sh`.
## Verification checklist
- Gateway health is reachable on the intended interface.
- Classifier response includes `schema=openvino_advisory_v1`.
- `.authority.*` flags are all false for side-effect authority.
- `.npu_proof.ok` is true and `npu_busy_delta_us > 0` before claiming NPU-backed advice.
- Decision envelope is compact and contains only booleans/counts/paths/deltas/gates.
- Duplicate/stale/no-op/action-required examples remain dry-run only.
- No n8n workflow activation, outbound send, service restart, memory write, routing change, private-dir broadening, or vector DB mutation occurred.
@@ -0,0 +1,48 @@
#!/usr/bin/env bash
set -euo pipefail
# Dry-run cron/n8n advisory wrapper.
# It calls the advisory classifier and prints one compact decision line.
# It does not send, restart, route, execute tools, or write memory.
GATEWAY_URL="${NPU_ADVISORY_GATEWAY_URL:-http://172.19.0.1:18830}"
WORKFLOW="${1:-cron-advisory-sample}"
SEVERITY="${2:-normal}"
EVENT_KIND="${3:-health_check}"
SUBJECT="${4:-sample advisory event}"
DEDUPE_KEY="${5:-sample}"
TRACE_ID="${NPU_ADVISORY_TRACE_ID:-cron:${WORKFLOW}:$(date -u +%Y%m%dT%H%M%SZ)}"
TEXT="source=cron workflow=${WORKFLOW} severity=${SEVERITY} kind=${EVENT_KIND} subject=${SUBJECT} dedupe_key=${DEDUPE_KEY} dry_run=true authority=no-send,no-restart,no-memory"
payload=$(jq -nc --arg trace_id "$TRACE_ID" --arg text "$TEXT" '{trace_id:$trace_id,text:$text}')
response=$(curl -fsS "${GATEWAY_URL%/}/v1/advisory/classify" -H 'Content-Type: application/json' -d "$payload")
printf '%s\n' "$response" | jq -c --arg source cron --arg workflow "$WORKFLOW" --arg severity "$SEVERITY" --arg dedupe_key "$DEDUPE_KEY" '
. as $env
| ($env.result.labels.urgency.value // "normal") as $urgency
| ($env.result.labels.urgency.confidence // 0) as $confidence
| ($env.npu_proof.ok == true and (($env.npu_proof.npu_busy_delta_us // 0) > 0)) as $npu_ok
| (if ($npu_ok | not) then "log"
elif ($severity == "critical") then "escalate"
elif ($severity == "warning" or $urgency == "high" or $urgency == "critical") then "summarize"
else "log" end) as $recommendation
| (if ($recommendation == "log" and $severity == "normal") then "no_op" else "action_required" end) as $classification
| {
schema: "cron_n8n_advisory_decision_v1",
trace_id: $env.trace_id,
source: $source,
workflow: $workflow,
dry_run: true,
recommendation: $recommendation,
classification: $classification,
confidence: $confidence,
reason_codes: ([
(if $npu_ok then "npu_proof_ok" else "npu_proof_failed" end),
("severity_" + $severity),
("urgency_" + $urgency)
]),
npu_proof: $env.npu_proof,
authority: $env.authority,
next_gate: (if $recommendation == "escalate" or $recommendation == "summarize" then "human_or_atlas_review_required_before_any_side_effect" else "none_in_dry_run" end)
}'
@@ -0,0 +1,70 @@
{
"name": "OpenVINO Advisory Dry-Run Fragment",
"active": false,
"nodes": [
{
"parameters": {
"values": {
"string": [
{"name": "schema", "value": "cron_n8n_event_v1"},
{"name": "source", "value": "n8n"},
{"name": "workflow", "value": "swarm-health-watchdog"},
{"name": "event_kind", "value": "health_check"},
{"name": "severity", "value": "warning"},
{"name": "subject", "value": "OpenVINO service health warning"},
{"name": "summary", "value": "Health probe reported a warning; no restart or send is authorized."},
{"name": "dedupe_key", "value": "service:openvino:warning"},
{"name": "dry_run", "value": "true"}
]
},
"options": {}
},
"id": "set-normalized-event",
"name": "Set normalized advisory event",
"type": "n8n-nodes-base.set",
"typeVersion": 2,
"position": [260, 300]
},
{
"parameters": {
"method": "POST",
"url": "http://172.19.0.1:18830/v1/advisory/classify",
"sendBody": true,
"contentType": "json",
"jsonBody": "={{ JSON.stringify({ trace_id: 'n8n:' + $json.workflow + ':' + $now.toISO(), text: 'source=n8n workflow=' + $json.workflow + ' severity=' + $json.severity + ' kind=' + $json.event_kind + ' subject=' + $json.subject + ' summary=' + $json.summary + ' dedupe_key=' + $json.dedupe_key + ' dry_run=true authority=no-send,no-restart,no-memory' }) }}",
"options": {
"timeout": 20000
}
},
"id": "http-advisory-classify",
"name": "HTTP advisory classify dry-run",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4,
"position": [520, 300]
},
{
"parameters": {
"jsCode": "const env = $json;\nconst labels = env.result?.labels || {};\nconst urgency = labels.urgency?.value || 'normal';\nconst severity = $('Set normalized advisory event').first().json.severity || 'normal';\nconst npuOk = env.npu_proof?.ok === true && (env.npu_proof?.npu_busy_delta_us || 0) > 0;\nlet recommendation = 'log';\nlet classification = 'no_op';\nconst reason_codes = [npuOk ? 'npu_proof_ok' : 'npu_proof_failed', `severity_${severity}`, `urgency_${urgency}`];\nif (npuOk && severity === 'critical') { recommendation = 'escalate'; classification = 'action_required'; }\nelse if (npuOk && (severity === 'warning' || urgency === 'high' || urgency === 'critical')) { recommendation = 'summarize'; classification = 'action_required'; }\nif (!npuOk) reason_codes.push('log_only_no_npu_claim');\nreturn [{ json: { schema: 'cron_n8n_advisory_decision_v1', trace_id: env.trace_id, source: 'n8n', workflow: $('Set normalized advisory event').first().json.workflow, dry_run: true, recommendation, classification, confidence: labels.urgency?.confidence || 0, reason_codes, npu_proof: env.npu_proof, authority: env.authority, next_gate: (recommendation === 'summarize' || recommendation === 'escalate') ? 'human_or_atlas_review_required_before_any_side_effect' : 'none_in_dry_run' } } }];"
},
"id": "map-dry-run-decision",
"name": "Map dry-run decision (no side effects)",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [780, 300]
}
],
"connections": {
"Set normalized advisory event": {
"main": [[{"node": "HTTP advisory classify dry-run", "type": "main", "index": 0}]]
},
"HTTP advisory classify dry-run": {
"main": [[{"node": "Map dry-run decision (no side effects)", "type": "main", "index": 0}]]
}
},
"settings": {
"executionOrder": "v1"
},
"pinData": {},
"staticData": null,
"tags": ["dry-run", "openvino", "advisory"]
}
+89
View File
@@ -0,0 +1,89 @@
# OpenVINO Context Gate
Local-only Atlas/Hermes context-gate advisory prototype.
This first slice is CLI-only and dry-run by design. It takes a non-private query,
optionally asks the localhost classifier on `127.0.0.1:18819` for advisory labels,
and emits a compact typed context bundle plan. It does not retrieve private
content or change live Atlas/Hermes behavior.
## Safety invariants
Closed in v1:
- live Atlas/Hermes routing changes
- memory writes
- outbound sends
- tool execution by the sidecar
- service restarts
- vector DB mutation or reindexing
- private root broadening
- live config changes
The CLI only plans which source classes an authoritative Atlas/Hermes agent might
use later: `durable_memory`, `session_search`, `rag_search`, `repo_files`,
`live_system`, `web`, or `no_retrieval`.
NPU proof is strict: `npu_verified=true` is only emitted when a live classifier
request reports a positive endpoint NPU delta and a positive sysfs/endpoint sysfs
busy delta. HTTP 200 alone is never treated as proof. Offline and fallback modes
set `npu_verified=false` and include a warning.
## Usage
Live classifier path, with compact terminal output:
```bash
python scripts/context-gate-advisory.py \
--query "How do I check whether the RAG reranker is using the NPU?" \
--format compact
```
Deterministic offline smoke, safe for unit-test hosts without NPU services:
```bash
python scripts/context-gate-advisory.py \
--offline \
--query "Write a haiku about Seattle rain." \
--format compact-json
```
Fallback plan if the classifier is down:
```bash
python scripts/context-gate-advisory.py \
--allow-offline-fallback \
--query "Where did we leave the NPU context gate implementation plan?" \
--context platform=kanban \
--context repo_path=/home/will/lab/swarm \
--format compact-json
```
## Output shape
Full JSON includes:
- `schema=atlas_context_gate_plan_v1`
- `dry_run=true`
- `query_class`
- `source_plan`
- `bundle_plan`
- `npu_proof`
- closed `authority`
- closed approval `gates`
- compact `warnings`
Compact output intentionally avoids raw private snippets and raw JSON dumps:
```text
ok=true schema=atlas_context_gate_plan_v1 bundle=OpsDebugBundle sources=live_system,repo_files,rag_search source_count=3 npu_verified=false classifier_delta_us=None outer_sysfs_delta_us=None gates=closed:route,memory,send,tools,restart,vector,private_roots,config warnings=offline_heuristic_classifier_no_npu_claim,npu_proof_inconclusive
```
## Notes for reviewers
- No HTTP service or systemd unit is added in this slice.
- The prototype does not call RAG, memory, session search, web, filesystem tools,
or the advisory gateway. It only emits a plan.
- Unit tests use fake/offline classifier results and do not require live NPU.
- Optional live smoke may call only the local classifier endpoint and read
`/sys/class/accel/accel0/device/npu_busy_time_us` for positive delta proof.
+5
View File
@@ -0,0 +1,5 @@
"""Atlas/Hermes local advisory context-gate prototype."""
from .context_gate import SCHEMA, ContextGateError, build_plan, compact_json, compact_line, validate_plan
__all__ = ["SCHEMA", "ContextGateError", "build_plan", "compact_json", "compact_line", "validate_plan"]
+90
View File
@@ -0,0 +1,90 @@
from __future__ import annotations
import argparse
import json
import sys
from typing import Any
from .context_gate import (
DEFAULT_CLASSIFIER_URL,
ContextGateError,
build_plan,
classify_live,
classify_offline,
compact_json,
compact_line,
)
def _parse_context(raw_items: list[str]) -> dict[str, Any]:
context: dict[str, Any] = {}
for item in raw_items:
if "=" not in item:
raise ContextGateError(f"invalid_context_item:{item}")
key, value = item.split("=", 1)
if not key:
raise ContextGateError("invalid_context_key")
if value.lower() == "true":
parsed: Any = True
elif value.lower() == "false":
parsed = False
else:
parsed = value
context[key] = parsed
return context
def build_arg_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
description="Emit a local-only Atlas/Hermes advisory context bundle plan. No routing, retrieval, memory writes, sends, restarts, or vector mutations are performed.",
)
parser.add_argument("--query", required=True, help="Non-private query to plan for")
parser.add_argument("--format", choices=["compact", "compact-json", "json"], default="compact")
parser.add_argument("--context", action="append", default=[], metavar="KEY=VALUE", help="Optional compact request context, e.g. platform=kanban repo_path=/path")
parser.add_argument("--max-sources", type=int, default=4)
parser.add_argument("--trace-id")
parser.add_argument("--classifier-url", default=DEFAULT_CLASSIFIER_URL)
parser.add_argument("--classifier-timeout", type=float, default=8.0)
parser.add_argument("--offline", action="store_true", help="Use deterministic heuristic labels; makes no NPU claim")
parser.add_argument("--allow-offline-fallback", action="store_true", help="If live classifier is unavailable, emit an advisory fallback plan with npu_verified=false")
parser.add_argument("--no-require-npu-proof", action="store_true", help="Do not add npu_proof_inconclusive warning when running offline/fallback")
return parser
def main(argv: list[str] | None = None) -> int:
parser = build_arg_parser()
args = parser.parse_args(argv)
try:
context = _parse_context(args.context)
options = {
"dry_run": True,
"max_sources": args.max_sources,
"include_private_text": False,
"require_npu_proof": not args.no_require_npu_proof,
"trace_id": args.trace_id,
}
if args.offline:
classifier = classify_offline(args.query, context)
else:
try:
classifier = classify_live(args.query, context, classifier_url=args.classifier_url, timeout=args.classifier_timeout)
except ContextGateError as exc:
if not args.allow_offline_fallback:
raise
classifier = classify_offline(args.query, context, warning=str(exc))
plan = build_plan(args.query, context=context, options=options, classifier=classifier)
except ContextGateError as exc:
print(f"error={exc}", file=sys.stderr)
return 2
if args.format == "json":
print(json.dumps(plan, indent=2, sort_keys=True))
elif args.format == "compact-json":
print(compact_json(plan))
else:
print(compact_line(plan))
return 0
if __name__ == "__main__": # pragma: no cover
raise SystemExit(main())
+482
View File
@@ -0,0 +1,482 @@
"""Local-only advisory context bundle planner for Atlas/Hermes.
This module intentionally emits a retrieval/authority plan only. It does not call
Hermes memory/session/RAG/web tools, mutate vector stores, broaden private roots,
or change live routing.
"""
from __future__ import annotations
import json
import ipaddress
import re
import time
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Mapping, Sequence
SCHEMA = "atlas_context_gate_plan_v1"
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")
DEFAULT_CLASSIFIER_URL = "http://127.0.0.1:18819/v1/classify"
AUTHORITY = {
"may_route": False,
"may_write_memory": False,
"may_send_external": False,
"may_process_private_dirs": False,
"may_execute_tools": False,
"may_restart_services": False,
"may_mutate_vector_db": False,
"may_change_live_config": False,
}
GATES = {
"live_routing_change": "closed_requires_explicit_approval",
"memory_write": "closed_requires_explicit_approval",
"outbound_send": "closed_requires_explicit_approval",
"tool_execution": "closed_requires_explicit_approval",
"service_restart": "closed_requires_explicit_approval",
"vector_mutation": "closed_requires_explicit_approval",
"private_root_broadening": "closed_requires_explicit_approval",
}
_ALLOWED_SOURCES = {
"durable_memory",
"session_search",
"rag_search",
"repo_files",
"live_system",
"web",
"no_retrieval",
}
class ContextGateError(ValueError):
"""Raised for invalid requests or unavailable required local stages."""
@dataclass(frozen=True)
class ClassifierResult:
labels: Mapping[str, Any]
npu_busy_delta_us: int | None
sysfs_npu_busy_delta_us: int | None
outer_sysfs_delta_us: int | None
live: bool
warning: str | None = None
def read_npu_busy_time_us(path: Path = NPU_BUSY_PATH) -> int | None:
try:
return int(path.read_text(encoding="utf-8").strip())
except (FileNotFoundError, PermissionError, ValueError, OSError):
return None
def _label_value(labels: Mapping[str, Any], name: str, default: Any) -> Any:
value = labels.get(name, default)
if isinstance(value, Mapping) and "value" in value:
return value.get("value", default)
return value
def _label_confidence(labels: Mapping[str, Any], name: str, default: float = 0.5) -> float:
value = labels.get(name)
if isinstance(value, Mapping):
try:
return float(value.get("confidence", default))
except (TypeError, ValueError):
return default
return default
def heuristic_labels(query: str, context: Mapping[str, Any] | None = None) -> dict[str, Any]:
"""Small transparent fallback used by tests and explicit offline smoke mode."""
text = query.lower()
platform = str((context or {}).get("platform", "unknown")).lower()
current_words = ["current", "now", "health", "port", "process", "systemd", "status", "npu", "listening", "logs"]
prior_words = ["where did we leave", "what did we decide", "previous", "earlier", "handoff", "plan"]
coding_words = ["implement", "code", "repo", "test", "pytest", "diff", "branch", "hermes"]
research_words = ["research", "compare", "summarize", "explain", "what is", "how do i"]
unsafe_words = ["change live routing", "live routing", "restart", "send", "write memory", "reindex", "mutate", "delete"]
safety = any(w in text for w in unsafe_words)
tool_needed = any(w in text for w in current_words + coding_words) or safety
if platform == "kanban" or "kanban" in text or any(w in text for w in coding_words):
category = "coding"
elif any(w in text for w in current_words):
category = "devops"
elif any(w in text for w in research_words + prior_words):
category = "research"
else:
category = "chat"
if "remember" in text or "preference" in text:
memory_candidate = "durable_user_fact"
elif "convention" in text or "workflow" in text:
memory_candidate = "workflow_convention"
else:
memory_candidate = "none"
urgency = "high" if any(w in text for w in ["urgent", "critical", "down", "broken"]) else "normal"
return {
"tool_needed": {"value": tool_needed, "confidence": 0.76 if tool_needed else 0.68},
"memory_candidate": {"value": memory_candidate, "confidence": 0.8 if memory_candidate != "none" else 0.35},
"urgency": {"value": urgency, "confidence": 0.8 if urgency == "high" else 0.65},
"workflow_category": {"value": category, "confidence": 0.78 if category != "chat" else 0.7},
"safety_confirmation_required": {"value": safety, "confidence": 0.9 if safety else 0.2},
}
class _NoClassifierRedirectHandler(urllib.request.HTTPRedirectHandler):
"""Fail closed instead of following redirects away from a validated local URL."""
def redirect_request(self, req, fp, code, msg, headers, newurl): # type: ignore[no-untyped-def]
return None
_CLASSIFIER_OPENER = urllib.request.build_opener(_NoClassifierRedirectHandler)
def classify_live(
query: str,
context: Mapping[str, Any] | None = None,
classifier_url: str = DEFAULT_CLASSIFIER_URL,
timeout: float = 8.0,
) -> ClassifierResult:
classifier_url = validate_classifier_url(classifier_url)
before = read_npu_busy_time_us()
payload = {
"id": f"context-gate-{int(time.time())}",
"text": query,
"context": {"platform": (context or {}).get("platform", "cli"), "source": "context_gate"},
"options": {"include_evidence": False, "include_embedding_debug": False, "dry_run": True},
}
req = urllib.request.Request(
classifier_url,
data=json.dumps(payload).encode("utf-8"),
headers={"Content-Type": "application/json"},
method="POST",
)
try:
with _CLASSIFIER_OPENER.open(req, timeout=timeout) as resp: # noqa: S310 - local configured endpoint only
raw = resp.read(256_000)
except (urllib.error.URLError, TimeoutError, OSError) as exc:
raise ContextGateError(f"classifier_unavailable: {exc}") from exc
after = read_npu_busy_time_us()
try:
data = json.loads(raw.decode("utf-8"))
except json.JSONDecodeError as exc:
raise ContextGateError("classifier_invalid_json") from exc
labels = data.get("labels")
if not isinstance(labels, Mapping):
raise ContextGateError("classifier_missing_labels")
outer = after - before if before is not None and after is not None else None
return ClassifierResult(
labels=labels,
npu_busy_delta_us=_as_int_or_none(data.get("npu_busy_delta_us")),
sysfs_npu_busy_delta_us=_as_int_or_none(data.get("sysfs_npu_busy_delta_us")),
outer_sysfs_delta_us=outer,
live=True,
)
def validate_classifier_url(classifier_url: str) -> str:
"""Validate the local-only classifier endpoint before any POST is attempted."""
parsed = urllib.parse.urlparse(classifier_url)
if parsed.scheme not in {"http", "https"}:
raise ContextGateError("invalid_classifier_url:scheme_must_be_http_or_https")
host = parsed.hostname
if not host:
raise ContextGateError("invalid_classifier_url:missing_host")
host_normalized = host.lower().rstrip(".")
if host_normalized == "localhost":
return classifier_url
try:
address = ipaddress.ip_address(host_normalized)
except ValueError as exc:
raise ContextGateError("invalid_classifier_url:host_must_be_loopback") from exc
if not address.is_loopback:
raise ContextGateError("invalid_classifier_url:host_must_be_loopback")
return classifier_url
def _as_int_or_none(value: Any) -> int | None:
try:
return int(value)
except (TypeError, ValueError):
return None
def classify_offline(query: str, context: Mapping[str, Any] | None = None, warning: str | None = None) -> ClassifierResult:
return ClassifierResult(
labels=heuristic_labels(query, context),
npu_busy_delta_us=None,
sysfs_npu_busy_delta_us=None,
outer_sysfs_delta_us=None,
live=False,
warning=warning or "offline_heuristic_classifier_no_npu_claim",
)
def _has_any(text: str, needles: list[str]) -> bool:
return any(n in text for n in needles)
def _source(source: str, action: str, reason: str, priority: int, freshness: str, confidence: float) -> dict[str, Any]:
assert source in _ALLOWED_SOURCES
return {
"source": source,
"action": action,
"reason": reason,
"priority": priority,
"freshness": freshness,
"permission": "tool_required_by_authoritative_agent" if source != "no_retrieval" else "none",
"missing_behavior": "retrieve_or_mark_missing" if source != "no_retrieval" else "skip_retrieval",
"confidence": round(confidence, 2),
}
def select_sources(query: str, labels: Mapping[str, Any], context: Mapping[str, Any], max_sources: int) -> list[dict[str, Any]]:
text = query.lower()
sources: list[dict[str, Any]] = []
category = str(_label_value(labels, "workflow_category", "unknown"))
memory_candidate = str(_label_value(labels, "memory_candidate", "none"))
tool_needed = bool(_label_value(labels, "tool_needed", False))
if tool_needed or _has_any(text, ["current", "now", "health", "port", "process", "systemd", "status", "npu", "listening", "logs", "time", "date"]):
sources.append(_source("live_system", "inspect_with_terminal_or_domain_tool", "current service/system state requested", 1, "live_required", 0.9))
if context.get("repo_path") or category == "coding" or _has_any(text, ["repo", "code", "file", "test", "pytest", "diff", "implementation", "hermes", "atlas"]):
sources.append(_source("repo_files", "inspect_explicit_repo_paths", "repo-specific implementation or config context", 2, "current_filesystem", 0.84))
if _has_any(text, ["where did we leave", "what did we decide", "previous", "earlier", "handoff", "prior", "last time"]):
sources.append(_source("session_search", "search_prior_sessions_or_kanban_handoffs", "prior decision or handoff requested", 3, "session-era", 0.82))
if _has_any(text, ["runbook", "note", "obsidian", "rag", "docs", "knowledge", "plan"]):
sources.append(_source("rag_search", "query_local_index_read_only", "local docs or indexed knowledge likely useful", 4, "cached_index", 0.76))
if memory_candidate != "none" or _has_any(text, ["preference", "remember", "profile", "durable fact"]):
sources.append(_source("durable_memory", "read_stable_facts_only", "stable preference/environment facts may be relevant", 5, "static", 0.72))
if _has_any(text, ["latest", "news", "version", "release", "public", "web"]):
sources.append(_source("web", "search_public_current_sources", "current external public fact requested", 6, "live_external", 0.7))
if not sources:
sources.append(_source("no_retrieval", "answer_directly", "no factual retrieval dependency detected", 1, "none", 0.78))
# Stable priority order and bounded compact plan.
seen: set[str] = set()
deduped = []
for item in sorted(sources, key=lambda x: x["priority"]):
if item["source"] not in seen:
seen.add(item["source"])
deduped.append(item)
return deduped[:max_sources]
def select_bundle_name(query: str, labels: Mapping[str, Any], context: Mapping[str, Any]) -> str:
text = query.lower()
category = str(_label_value(labels, "workflow_category", "unknown"))
if context.get("platform") == "kanban" or context.get("task_id") or category == "coding":
return "CodingTaskBundle"
if category in {"devops", "debugging"} or _has_any(text, ["health", "port", "systemd", "npu", "service", "logs"]):
return "OpsDebugBundle"
if category in {"note_taking", "productivity"} or _has_any(text, ["preference", "remember", "profile"]):
return "PersonalAssistantBundle"
if "no_retrieval" in [s["source"] for s in select_sources(query, labels, context, 1)]:
return "SimpleResponseBundle"
return "ResearchBundle"
def _field(field: str, shape: str, source: str, freshness: str, missing: str, privacy: str, confidence: float = 0.8) -> dict[str, Any]:
return {
"field": field,
"shape": shape,
"source_of_truth": source,
"freshness": freshness,
"provenance_required": True,
"missing_behavior": missing,
"privacy": privacy,
"confidence": round(confidence, 2),
}
def build_bundle_plan(bundle_name: str, sources: Sequence[Mapping[str, Any]], query: str, labels: Mapping[str, Any]) -> dict[str, Any]:
safety_required = bool(_label_value(labels, "safety_confirmation_required", False))
source_names = {s["source"] for s in sources}
if bundle_name == "OpsDebugBundle":
required = [
_field("problem_statement", "compact_text", "user", "request", "mark_missing", "query_text_only"),
_field("target_scope", "service_repo_or_host", "query_or_classifier", "request", "ask_or_infer_low_confidence", "no_private_paths_beyond_explicit"),
_field("live_state", "status_table", "live_system", "live_required", "retrieve_or_fail_closed", "no_raw_logs_by_default"),
_field("safety_gates", "closed_gate_map", "policy", "static", "fail_closed", "no_private_data"),
_field("provenance", "tool_names_and_paths", "executing_agent", "run", "mark_missing", "paths_only"),
]
elif bundle_name == "CodingTaskBundle":
required = [
_field("repo_root", "absolute_path", "task_or_context", "current", "ask_or_fail", "explicit_path_only"),
_field("git_state", "branch_dirty_counts", "live_system", "live_required", "retrieve_or_fail_closed", "no_diff_dump_by_default"),
_field("requirements", "bullet_summary", "user_kanban_files", "current", "retrieve_or_mark_missing", "no_private_snippets"),
_field("relevant_paths", "path_list", "repo_files", "current_filesystem", "search_narrowly", "paths_only"),
_field("tests_or_smokes", "command_list", "repo_files", "current_filesystem", "mark_missing", "commands_only"),
_field("review_gates", "closed_gate_map", "policy", "static", "fail_closed", "no_private_data"),
]
elif bundle_name == "PersonalAssistantBundle":
required = [
_field("user_intent", "compact_text", "user", "request", "mark_missing", "query_text_only"),
_field("durable_facts_needed", "fact_keys", "durable_memory", "static", "retrieve_or_mark_missing", "no_raw_memory_dump"),
_field("prior_decisions_needed", "session_refs", "session_search", "session-era", "retrieve_or_mark_missing", "summaries_only"),
_field("privacy_boundary", "closed_gate_map", "policy", "static", "fail_closed", "no_private_data"),
_field("action_authority", "closed_gate_map", "policy", "static", "fail_closed", "no_private_data"),
]
elif bundle_name == "SimpleResponseBundle":
required = []
else:
required = [
_field("research_question", "compact_text", "user", "request", "mark_missing", "query_text_only"),
_field("source_plan", "ordered_source_list", "context_gate", "run", "mark_missing", "no_private_snippets"),
_field("evidence_requirements", "provenance_rules", "policy", "static", "fail_closed", "no_private_data"),
_field("freshness_cutoff", "freshness_policy", "classifier_query", "request", "mark_missing", "no_private_data"),
_field("missing_data_behavior", "policy_enum", "policy", "static", "fail_closed", "no_private_data"),
]
blocked = []
if safety_required or re.search(r"\b(route|routing|restart|send|write memory|reindex|delete|mutate)\b", query.lower()):
blocked.append(_field("authority_side_effect", "approval_required", "policy", "static", "fail_closed", "no_side_effects_in_v1", 0.95))
if "rag_search" in source_names:
blocked.append(_field("vector_db_mutation", "not_allowed", "policy", "static", "fail_closed", "read_only_query_plan", 0.95))
return {"bundle_name": bundle_name, "required_fields": required, "optional_fields": [], "blocked_fields": blocked}
def summarize_query_class(labels: Mapping[str, Any]) -> dict[str, Any]:
return {
"workflow_category": _label_value(labels, "workflow_category", "unknown"),
"urgency": _label_value(labels, "urgency", "normal"),
"tool_needed": bool(_label_value(labels, "tool_needed", False)),
"memory_candidate": _label_value(labels, "memory_candidate", "none"),
"safety_confirmation_required": bool(_label_value(labels, "safety_confirmation_required", False)),
"confidence": round(max(
_label_confidence(labels, "workflow_category", 0.5),
_label_confidence(labels, "tool_needed", 0.5),
_label_confidence(labels, "safety_confirmation_required", 0.5),
), 2),
}
def npu_proof_from_classifier(result: ClassifierResult, require_npu_proof: bool) -> tuple[dict[str, Any], list[str]]:
endpoint_delta = result.npu_busy_delta_us
endpoint_sysfs_delta = result.sysfs_npu_busy_delta_us
outer_delta = result.outer_sysfs_delta_us
positive_endpoint_sysfs = endpoint_sysfs_delta is not None and endpoint_sysfs_delta > 0
positive_outer = outer_delta is not None and outer_delta > 0
verified = bool(result.live and (positive_endpoint_sysfs or positive_outer))
warnings: list[str] = []
if result.warning:
warnings.append(result.warning)
if require_npu_proof and not verified:
warnings.append("npu_proof_inconclusive")
return {
"classifier_delta_us": endpoint_delta,
"classifier_sysfs_delta_us": endpoint_sysfs_delta,
"outer_sysfs_delta_us": outer_delta,
"rerank_delta_us": None,
"verified": verified,
"required": require_npu_proof,
"classifier_live": result.live,
}, warnings
def build_plan(
query: str,
*,
context: Mapping[str, Any] | None = None,
options: Mapping[str, Any] | None = None,
classifier: ClassifierResult | None = None,
) -> dict[str, Any]:
if not query or not query.strip():
raise ContextGateError("query_required")
context = dict(context or {})
options = dict(options or {})
if options.get("dry_run", True) is not True:
raise ContextGateError("dry_run_must_remain_true_in_v1")
if options.get("include_private_text", False):
raise ContextGateError("include_private_text_not_allowed_in_v1")
max_sources = max(1, min(6, int(options.get("max_sources", 4))))
require_npu = bool(options.get("require_npu_proof", True))
if classifier is None:
classifier = classify_offline(query, context)
labels = classifier.labels
source_plan = select_sources(query, labels, context, max_sources)
bundle_name = select_bundle_name(query, labels, context)
npu_proof, warnings = npu_proof_from_classifier(classifier, require_npu)
plan = {
"schema": SCHEMA,
"trace_id": options.get("trace_id") or context.get("trace_id"),
"dry_run": True,
"ok": True,
"query_class": summarize_query_class(labels),
"source_plan": source_plan,
"bundle_plan": build_bundle_plan(bundle_name, source_plan, query, labels),
"npu_proof": npu_proof,
"authority": dict(AUTHORITY),
"gates": dict(GATES),
"warnings": warnings,
}
validate_plan(plan)
return plan
def validate_plan(plan: Mapping[str, Any]) -> None:
if plan.get("schema") != SCHEMA:
raise ContextGateError("invalid_schema")
if plan.get("dry_run") is not True:
raise ContextGateError("dry_run_missing")
if plan.get("authority") != AUTHORITY:
raise ContextGateError("authority_not_closed")
sources = plan.get("source_plan")
if not isinstance(sources, list) or not sources:
raise ContextGateError("source_plan_required")
for item in sources:
if item.get("source") not in _ALLOWED_SOURCES:
raise ContextGateError(f"invalid_source:{item.get('source')}")
required_blocks = ["query_class", "bundle_plan", "npu_proof", "gates"]
for block in required_blocks:
if block not in plan:
raise ContextGateError(f"missing_block:{block}")
def compact_line(plan: Mapping[str, Any]) -> str:
sources = ",".join(str(s["source"]) for s in plan["source_plan"])
closed = "route,memory,send,tools,restart,vector,private_roots,config"
warnings = ",".join(plan.get("warnings") or []) or "none"
return (
f"ok={str(plan['ok']).lower()} schema={plan['schema']} "
f"bundle={plan['bundle_plan']['bundle_name']} sources={sources} "
f"source_count={len(plan['source_plan'])} "
f"npu_verified={str(plan['npu_proof']['verified']).lower()} "
f"classifier_delta_us={plan['npu_proof'].get('classifier_delta_us')} "
f"outer_sysfs_delta_us={plan['npu_proof'].get('outer_sysfs_delta_us')} "
f"gates=closed:{closed} warnings={warnings}"
)
def compact_json(plan: Mapping[str, Any]) -> str:
compact = {
"schema": plan["schema"],
"ok": plan["ok"],
"dry_run": plan["dry_run"],
"bundle_name": plan["bundle_plan"]["bundle_name"],
"sources": [s["source"] for s in plan["source_plan"]],
"source_count": len(plan["source_plan"]),
"query_class": plan["query_class"],
"npu_proof": plan["npu_proof"],
"authority": plan["authority"],
"gates_closed": list(plan["gates"].keys()),
"warnings": plan.get("warnings", []),
}
return json.dumps(compact, sort_keys=True, separators=(",", ":"))
+16
View File
@@ -0,0 +1,16 @@
#!/usr/bin/env python3
"""Thin repo-local wrapper for the Atlas/Hermes context-gate advisory CLI."""
from __future__ import annotations
import sys
from pathlib import Path
REPO_ROOT = Path(__file__).resolve().parents[1]
if str(REPO_ROOT) not in sys.path:
sys.path.insert(0, str(REPO_ROOT))
from openvino_context_gate.cli import main # noqa: E402
if __name__ == "__main__":
raise SystemExit(main())
+526
View File
@@ -0,0 +1,526 @@
#!/usr/bin/env python3
"""Dry-run Kanban hygiene advisory classifier.
Reads compact board/task summaries and emits bounded labels/next gates without
mutating any Hermes Kanban state. Phase 1 is deterministic rules only; it does
not call kanban tools, restart services, write memory, or send outbound data.
"""
from __future__ import annotations
import argparse
import json
import re
import sys
import time
from pathlib import Path
from typing import Any
SCHEMA = "kanban_hygiene_advisory_v1"
AUTHORITY = {
"may_mutate_board": False,
"may_assign": False,
"may_block_or_unblock": False,
"may_complete_or_archive": False,
"may_create_tasks": False,
"may_write_memory": False,
"may_send_external": False,
"may_restart_services": False,
"may_execute_tools": False,
}
NPU_PROOF = {
"required_for_npu_claims": True,
"attempted": False,
"ok": None,
"npu_busy_delta_us": None,
}
REQUIRED_TASK_FIELDS = {"id", "title", "status", "assignee", "created_at", "updated_at"}
SUPPORTED_STATUSES = {
"triage",
"todo",
"ready",
"running",
"blocked",
"done",
"archived",
"failed",
"cancelled",
}
TASK_TYPES = {
"charter",
"discovery",
"spec",
"implement",
"test",
"review",
"docs",
"ops",
"integration",
"final",
"unknown",
}
LANES = {
"observability_utilization",
"cron_n8n_classifier",
"rag_context_gate",
"doc_image_audio_triage",
"voice_audio_pipeline",
"kanban_hygiene",
"docs_runbook_service_map",
"ops_integration",
"final_closeout",
"general",
"unknown",
}
LIFECYCLE_PREFIXES = {
"charter",
"discovery",
"spec",
"implement",
"test",
"review",
"docs",
"doc",
"ops",
"integration",
"final",
}
def compact_text(task: dict[str, Any]) -> str:
parts = [str(task.get("title", "")), str(task.get("body_excerpt", "")), str(task.get("last_run_summary_excerpt", "")), str(task.get("last_comment_excerpt", ""))]
return " ".join(part for part in parts if part).lower()
def load_jsonl(raw: str) -> tuple[list[dict[str, Any]], dict[str, Any]]:
tasks = []
for line_no, line in enumerate(raw.splitlines(), start=1):
if not line.strip():
continue
try:
row = json.loads(line)
except json.JSONDecodeError as exc:
raise ValueError(f"invalid JSONL on line {line_no}: {exc.msg}") from exc
if not isinstance(row, dict):
raise ValueError(f"JSONL line {line_no} is not an object")
tasks.append(row)
return tasks, {}
def load_input(path: str | None, fmt: str) -> tuple[list[dict[str, Any]], dict[str, Any]]:
raw = sys.stdin.read() if not path or path == "-" else Path(path).read_text(encoding="utf-8")
if not raw.strip():
raise ValueError("input is empty")
parse_as_jsonl = fmt == "jsonl" or (fmt == "auto" and "\n" in raw.strip() and not raw.lstrip().startswith(("{", "[")))
if parse_as_jsonl:
return load_jsonl(raw)
try:
data = json.loads(raw)
except json.JSONDecodeError as exc:
if fmt == "auto" and "\n" in raw.strip():
return load_jsonl(raw)
raise ValueError(f"invalid JSON input: {exc.msg}") from exc
if isinstance(data, list):
if not all(isinstance(item, dict) for item in data):
raise ValueError("JSON list must contain task objects")
return data, {}
if isinstance(data, dict):
tasks = data.get("tasks")
if tasks is None:
# Treat a single object with required task fields as a one-task summary.
if REQUIRED_TASK_FIELDS.issubset(data):
return [data], {}
raise ValueError("JSON object must contain a 'tasks' list")
if not isinstance(tasks, list) or not all(isinstance(item, dict) for item in tasks):
raise ValueError("'tasks' must be a list of objects")
metadata = {key: value for key, value in data.items() if key != "tasks"}
return tasks, metadata
raise ValueError("input must be JSON object, JSON list, or JSON Lines")
def validate_task(task: dict[str, Any]) -> None:
missing = sorted(REQUIRED_TASK_FIELDS - set(task))
if missing:
task_id = task.get("id", "<unknown>")
raise ValueError(f"task {task_id} missing required fields: {', '.join(missing)}")
status = str(task.get("status"))
if status not in SUPPORTED_STATUSES:
raise ValueError(f"task {task.get('id')} has unsupported status: {status}")
for field in ("created_at", "updated_at"):
if not isinstance(task.get(field), (int, float)):
raise ValueError(f"task {task.get('id')} field {field} must be epoch seconds")
def confidence(value: float) -> float:
return round(max(0.0, min(1.0, value)), 2)
def classify_task_type(task: dict[str, Any]) -> dict[str, Any]:
title = str(task.get("title", "")).strip().lower()
body = compact_text(task)
prefix = title.split(":", 1)[0].strip() if ":" in title else ""
prefix_map = {"doc": "docs"}
if prefix in LIFECYCLE_PREFIXES:
value = prefix_map.get(prefix, prefix)
if value in TASK_TYPES:
return {"value": value, "confidence": 0.95, "reason_codes": [f"title_prefix_{value}"]}
keyword_rules = [
("discovery", ["discover", "inventory", "repo map", "read-only"]),
("spec", ["spec", "define", "contract", "schema"]),
("implement", ["implement", "engineer", "script", "code", "build"]),
("review", ["review", "approve", "findings"]),
("docs", ["docs", "runbook", "readme"]),
("ops", ["ops", "health", "monitor", "deploy", "cleanup"]),
("integration", ["integration", "merge", "cherry-pick", "fan-in"]),
("final", ["final", "closeout", "synthesis"]),
("test", ["test", "smoke", "validate"]),
("charter", ["charter", "program framing"]),
]
for value, needles in keyword_rules:
if any(needle in body for needle in needles):
return {"value": value, "confidence": 0.78, "reason_codes": [f"keyword_{value}"]}
return {"value": "unknown", "confidence": 0.2, "reason_codes": ["insufficient_signal"]}
def classify_lane(task: dict[str, Any]) -> dict[str, Any]:
text = compact_text(task)
rules = [
("kanban_hygiene", ["kanban", "task hygiene", "board summaries", "review-needed", "next gate"]),
("cron_n8n_classifier", ["cron", "n8n", "alert", "event classifier"]),
("rag_context_gate", ["rag", "context gate", "retrieval", "bundle"]),
("doc_image_audio_triage", ["document", "image", "audio triage", "ocr", "attachments"]),
("voice_audio_pipeline", ["voice", "whisper", "memo", "transcribe"]),
("docs_runbook_service_map", ["service map", "runbook", "readme"]),
("observability_utilization", ["health", "utilization", "metrics", "digest"]),
("ops_integration", ["merge", "integration", "cherry-pick", "fan-in"]),
("final_closeout", ["final", "closeout", "synthesis"]),
]
for value, needles in rules:
matched = [needle.replace(" ", "_") for needle in needles if needle in text]
if matched:
return {"value": value, "confidence": 0.9, "reason_codes": [f"mentions_{matched[0]}"]}
if text:
return {"value": "general", "confidence": 0.45, "reason_codes": ["no_lane_specific_signal"]}
return {"value": "unknown", "confidence": 0.1, "reason_codes": ["insufficient_signal"]}
def classify_project(task: dict[str, Any], board: str | None, input_metadata: dict[str, Any]) -> dict[str, Any]:
explicit = task.get("project") or input_metadata.get("project")
if explicit:
return {"value": str(explicit), "confidence": 0.9, "source": "input"}
board_name = board or input_metadata.get("board")
if board_name:
return {"value": str(board_name), "confidence": 0.98, "source": "board_name"}
text = compact_text(task)
if "npu" in text or "openvino" in text:
return {"value": "npu-maximization", "confidence": 0.72, "source": "body"}
return {"value": "unknown", "confidence": 0.1, "source": "unknown"}
def classify_blocker(task: dict[str, Any]) -> dict[str, Any]:
status = str(task.get("status"))
text = compact_text(task)
last_outcome = str(task.get("last_run_outcome") or "").lower()
reason_codes: list[str] = []
value = "none"
blocked = False
conf = 0.0
if status == "blocked":
blocked = True
conf = 0.85
if "review-required" in text or "changes requested" in text:
value = "review_changes_requested"
reason_codes.append("blocked_review_required_or_changes")
elif any(word in text for word in ("credential", "token", "path", "spawn_failed")):
value = "missing_credentials"
reason_codes.append("blocked_missing_credentials_or_path")
elif any(word in text for word in ("human", "approval", "decision", "confirm")):
value = "human_decision"
reason_codes.append("blocked_human_decision")
else:
value = "unknown"
reason_codes.append("status_blocked")
elif status == "todo" and task.get("parents"):
value = "missing_parent"
conf = 0.75
reason_codes.append("todo_with_parents")
elif last_outcome in {"crashed", "timed_out", "failed"}:
value = "failed_tests"
conf = 0.65
reason_codes.append(f"last_run_{last_outcome}")
return {"value": value, "blocked": blocked, "confidence": confidence(conf), "reason_codes": reason_codes}
def age_hours(now: float, timestamp: Any) -> float | None:
if not isinstance(timestamp, (int, float)):
return None
return round(max(0.0, now - float(timestamp)) / 3600.0, 2)
def classify_staleness(task: dict[str, Any], now: float) -> dict[str, Any]:
status = str(task.get("status"))
created = float(task["created_at"])
activity_ts = float(task.get("heartbeat_at") or task.get("last_activity_at") or task.get("updated_at") or created)
age = age_hours(now, created)
last_activity = age_hours(now, activity_ts)
threshold = 24
value = "fresh"
reason_codes: list[str] = []
if status == "running":
threshold = 1
if last_activity is not None and last_activity > 1:
value = "stale_lock"
reason_codes.append("running_no_recent_heartbeat")
elif status == "ready":
threshold = 24
if last_activity is not None and last_activity >= 72:
value = "stale"
reason_codes.append("ready_over_72h")
elif last_activity is not None and last_activity >= 24:
value = "aging"
reason_codes.append("ready_over_24h")
elif status == "blocked":
review_required = "review-required" in compact_text(task)
threshold = 24 if review_required else 48
if last_activity is not None and last_activity >= 168:
value = "stale"
reason_codes.append("blocked_over_7d")
elif review_required and last_activity is not None and last_activity >= 72:
value = "stale"
reason_codes.append("review_required_over_72h")
elif last_activity is not None and last_activity >= threshold:
value = "aging"
reason_codes.append("blocked_or_review_aging")
elif status == "todo" and not task.get("parents") and last_activity is not None and last_activity >= 72:
value = "orphaned"
threshold = 72
reason_codes.append("todo_without_parents_over_72h")
return {
"value": value,
"age_hours": age,
"last_activity_hours": last_activity,
"threshold_hours": threshold,
"reason_codes": reason_codes,
}
def normalize_title(title: str) -> str:
text = title.lower().strip()
text = re.sub(r"^(charter|discovery|spec|implement|test|review|docs?|ops|integration|final)\s*:\s*", "", text)
text = re.sub(r"[^a-z0-9]+", " ", text)
return re.sub(r"\s+", " ", text).strip()
def find_duplicates(tasks: list[dict[str, Any]], labels: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
groups: dict[tuple[str, str, str], list[str]] = {}
active_statuses = SUPPORTED_STATUSES - {"done", "archived", "cancelled"}
for task in tasks:
if str(task.get("status")) not in active_statuses:
continue
task_id = str(task["id"])
key = (
normalize_title(str(task.get("title", ""))),
labels[task_id]["lane"]["value"],
labels[task_id]["task_type"]["value"],
)
if key[0]:
groups.setdefault(key, []).append(task_id)
result = {
str(task["id"]): {
"is_duplicate": False,
"canonical_task_id": None,
"candidate_ids": [],
"confidence": 0.0,
"reason_codes": [],
}
for task in tasks
}
for ids in groups.values():
if len(ids) < 2:
continue
canonical = sorted(ids)[0]
for task_id in ids:
candidates = [candidate for candidate in ids if candidate != task_id]
result[task_id] = {
"is_duplicate": task_id != canonical,
"canonical_task_id": canonical if task_id != canonical else None,
"candidate_ids": candidates,
"confidence": 0.86,
"reason_codes": ["same_normalized_title_lane_and_task_type"],
}
return result
def has_non_positive_npu_busy_delta(text: str) -> bool:
if "npu" not in text and "busy" not in text:
return False
patterns = [
r"\b(?:npu_)?busy(?:_time)?(?:_delta)?(?:_us)?\s*[=:]\s*([+-]?\d+(?:\.\d+)?)\b",
r"\b(?:npu_)?delta(?:_us)?\s*[=:]\s*([+-]?\d+(?:\.\d+)?)\b",
]
for pattern in patterns:
for match in re.finditer(pattern, text):
try:
if float(match.group(1)) <= 0:
return True
except ValueError:
continue
return False
def classify_review_needed(task: dict[str, Any], task_type: str) -> dict[str, Any]:
text = compact_text(task)
changed_files = task.get("changed_files") or task.get("diff_path") or task.get("tests_run")
if has_non_positive_npu_busy_delta(text):
return {"value": True, "kind": "npu_proof_gate", "confidence": 0.84, "reason_codes": ["npu_claim_non_positive_busy_delta"]}
if "npu" in text and ("http 200" in text or "no busy" in text or "missing busy" in text):
return {"value": True, "kind": "npu_proof_gate", "confidence": 0.8, "reason_codes": ["npu_claim_needs_busy_delta"]}
if "review-required" in text:
kind = "code_change" if task_type == "implement" else "spec_review"
return {"value": True, "kind": kind, "confidence": 0.92, "reason_codes": ["review_required_marker"]}
if changed_files and task_type in {"implement", "ops", "docs"}:
return {"value": True, "kind": "code_change", "confidence": 0.86, "reason_codes": ["reported_changed_files_or_tests"]}
if any(needle in text for needle in ("routing authority", "restart service", "write memory", "send outbound", "private root", "wildcard bind", "vector db mutation")):
return {"value": True, "kind": "human_approval", "confidence": 0.84, "reason_codes": ["authority_change_requires_approval"]}
return {"value": False, "kind": "none", "confidence": 0.2, "reason_codes": []}
def classify_next_gate(task: dict[str, Any], labels: dict[str, Any]) -> dict[str, Any]:
task_type = labels["task_type"]["value"]
status = str(task.get("status"))
reason_codes: list[str] = []
if labels["duplicate"]["is_duplicate"]:
return {"value": "dedupe_review", "confidence": 0.86, "reason_codes": ["duplicate_candidate"]}
if labels["staleness"]["value"] == "stale_lock":
return {"value": "investigate_stale_lock", "confidence": 0.88, "reason_codes": ["running_stale_lock"]}
blocker = labels["blocker"]
if blocker["value"] in {"human_decision", "missing_credentials", "unknown"} and blocker["blocked"]:
return {"value": "needs_human_decision", "confidence": 0.85, "reason_codes": blocker["reason_codes"] or ["blocked"]}
if blocker["value"] == "missing_parent":
return {"value": "wait_for_parents", "confidence": 0.82, "reason_codes": ["unfinished_parents"]}
if task_type == "implement" and not (task.get("tests_run") or task.get("test_evidence")) and status in {"blocked", "done"}:
return {"value": "needs_test_evidence", "confidence": 0.78, "reason_codes": ["implementation_without_test_evidence"]}
review_needed = labels["review_needed"]
if review_needed["kind"] == "npu_proof_gate":
return {"value": "needs_npu_proof", "confidence": 0.8, "reason_codes": review_needed["reason_codes"]}
if review_needed["value"]:
return {"value": "ready_for_review", "confidence": 0.86, "reason_codes": review_needed["reason_codes"]}
gate_by_type = {
"spec": "ready_for_implementation",
"implement": "ready_for_review",
"review": "ready_for_integration",
"docs": "ready_for_integration",
"ops": "ready_for_ops_validation",
"integration": "ready_for_closeout",
"final": "safe_to_complete",
"discovery": "safe_to_complete",
"charter": "ready_for_spec",
"test": "ready_for_review",
}
type_gate = gate_by_type.get(task_type, "unknown")
if task_type in gate_by_type:
reason_codes.append(f"task_type_{task_type}")
return {"value": type_gate, "confidence": 0.74 if type_gate != "unknown" else 0.2, "reason_codes": reason_codes}
def advisory(tasks: list[dict[str, Any]], *, board: str | None, now: float, input_metadata: dict[str, Any], include_evidence: bool) -> dict[str, Any]:
for task in tasks:
validate_task(task)
prelim: dict[str, dict[str, Any]] = {}
for task in tasks:
task_id = str(task["id"])
prelim[task_id] = {
"task_type": classify_task_type(task),
"project": classify_project(task, board, input_metadata),
"lane": classify_lane(task),
"blocker": classify_blocker(task),
"staleness": classify_staleness(task, now),
}
duplicates = find_duplicates(tasks, prelim)
items = []
for task in tasks:
task_id = str(task["id"])
labels = dict(prelim[task_id])
labels["duplicate"] = duplicates[task_id]
labels["review_needed"] = classify_review_needed(task, labels["task_type"]["value"])
labels["next_gate"] = classify_next_gate(task, labels)
item = {
"task_id": task_id,
**labels,
"warnings": [],
}
if include_evidence:
item["evidence"] = {
"normalized_title": normalize_title(str(task.get("title", ""))),
"status": task.get("status"),
"parents_count": len(task.get("parents") or []),
"children_count": len(task.get("children") or []),
}
items.append(item)
counts = {
"tasks": len(items),
"duplicates": sum(1 for item in items if item["duplicate"]["is_duplicate"]),
"review_needed": sum(1 for item in items if item["review_needed"]["value"]),
"stale": sum(1 for item in items if item["staleness"]["value"] in {"stale", "stale_lock", "orphaned"}),
"blocked": sum(1 for item in items if item["blocker"]["blocked"]),
}
return {
"schema": SCHEMA,
"dry_run": True,
"created": int(now),
"board": board or input_metadata.get("board") or None,
"counts": counts,
"authority": AUTHORITY,
"npu_proof": NPU_PROOF,
"items": items,
}
def build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
description="Dry-run Kanban hygiene advisory classifier",
epilog="Input: JSON object with tasks[] or JSONL task objects. Required task fields: id,title,status,assignee,created_at,updated_at. Optional compact fields such as body_excerpt, parents, children, changed_files, tests_run, last_run_outcome, and last_comment_excerpt improve labels.",
)
parser.add_argument("--input", "-i", help="Input JSON/JSONL file; omit or '-' for stdin")
parser.add_argument("--format", choices=["auto", "json", "jsonl"], default="auto", help="Input format")
parser.add_argument("--board", help="Board/project name to include in output")
parser.add_argument("--now", type=float, default=None, help="Epoch seconds for deterministic staleness tests")
parser.add_argument("--compact", action="store_true", help="Accepted for compatibility; output is compact JSON by default")
parser.add_argument("--include-evidence", action="store_true", help="Include short derived evidence fields")
return parser
def main(argv: list[str] | None = None) -> int:
parser = build_parser()
args = parser.parse_args(argv)
try:
tasks, metadata = load_input(args.input, args.format)
output = advisory(
tasks,
board=args.board,
now=args.now if args.now is not None else time.time(),
input_metadata=metadata,
include_evidence=args.include_evidence,
)
except (OSError, ValueError) as exc:
print(f"kanban-hygiene-advisory: {exc}", file=sys.stderr)
return 2
print(json.dumps(output, sort_keys=True, separators=(",", ":")))
return 0
if __name__ == "__main__":
raise SystemExit(main())
+523
View File
@@ -0,0 +1,523 @@
#!/usr/bin/env python3
"""Explicit-root dry-run batch triage for local documents, images, and audio.
This wrapper is intentionally report-only. It requires a lane-scoped approved
root in a manifest, rejects request roots that broaden that approval, redacts raw
text/transcripts by default, and never mutates Obsidian, RAG/vector DBs, files,
routing, memory, services, or sends.
"""
from __future__ import annotations
import argparse
import datetime as dt
import hashlib
import ipaddress
import importlib.util
import json
import mimetypes
import os
import re
import sys
import time
import urllib.error
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Any
try:
import yaml # type: ignore
except Exception as exc: # pragma: no cover
raise SystemExit("PyYAML is required to read triage root manifests") from exc
LANES = (
"screenshots",
"receipts",
"downloads",
"obsidian_attachments",
"voice_memos",
"meeting_snippets",
)
AUDIO_LANES = {"voice_memos", "meeting_snippets"}
DOC_IMAGE_LANES = {"screenshots", "receipts", "downloads", "obsidian_attachments"}
SKIP_DIR_NAMES = {".git", ".obsidian", "__pycache__", ".cache", "cache", "chroma", "chromadb", "vector_db", "vectors"}
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")
DEFAULT_WHISPER_URL = "http://127.0.0.1:18816/v1/audio/transcriptions"
MUTATIONS_FALSE = {
"obsidian": False,
"rag": False,
"vector_db": False,
"sends": False,
"file_moves": False,
"routing": False,
"memory": False,
"service_restarts": False,
}
ACTION_PATTERNS = {
"follow_up": re.compile(r"\b(follow up|follow-up|circle back|reply|respond)\b", re.I),
"date_or_deadline": re.compile(r"\b(deadline|due|by (?:mon|tue|wed|thu|fri|sat|sun)|20\d{2}[-/]\d{1,2}[-/]\d{1,2})\b", re.I),
"decision": re.compile(r"\b(decided|decision|approved|rejected|go with|choose)\b", re.I),
"task": re.compile(r"\b(todo|to-do|action item|assign|need to|please)\b", re.I),
}
class FailClosed(Exception):
pass
def sha256_text(text: str) -> str:
return "sha256:" + hashlib.sha256(text.encode("utf-8", errors="replace")).hexdigest()
def sha256_file(path: Path) -> str:
h = hashlib.sha256()
with path.open("rb") as f:
for chunk in iter(lambda: f.read(1024 * 1024), b""):
h.update(chunk)
return "sha256:" + h.hexdigest()
def read_busy(path: Path = NPU_BUSY_PATH) -> int | None:
try:
return int(path.read_text().strip())
except Exception:
return None
def validate_local_whisper_url(whisper_url: str) -> str:
"""Fail closed unless Whisper transcription stays on the approved loopback service."""
try:
parsed = urllib.parse.urlparse(whisper_url)
port = parsed.port
except ValueError as exc:
raise FailClosed("whisper_url_invalid") from exc
if parsed.scheme != "http":
raise FailClosed("whisper_url_scheme_not_http")
if parsed.username or parsed.password:
raise FailClosed("whisper_url_credentials_not_allowed")
if port != 18816:
raise FailClosed("whisper_url_port_not_approved")
host = (parsed.hostname or "").strip().lower()
if host == "localhost":
return whisper_url
try:
if ipaddress.ip_address(host).is_loopback:
return whisper_url
except ValueError:
pass
raise FailClosed("whisper_url_not_loopback")
def is_under(path: Path, root: Path) -> bool:
try:
path.resolve().relative_to(root.resolve())
return True
except ValueError:
return False
def load_manifest(path: Path) -> dict[str, Any]:
if not path.exists():
raise FailClosed(f"manifest_missing:{path}")
data = yaml.safe_load(path.read_text())
if not isinstance(data, dict):
raise FailClosed("manifest_invalid:not_mapping")
if data.get("version") != 1:
raise FailClosed("manifest_invalid:version_must_be_1")
policy = data.get("policy") or {}
if policy.get("default_mode", "dry_run") != "dry_run":
raise FailClosed("policy_invalid:default_mode_not_dry_run")
for key, expected in {
"require_explicit_root": True,
"allow_external_uploads": False,
"allow_mutations": False,
"log_raw_text": False,
}.items():
if policy.get(key) is not expected:
raise FailClosed(f"policy_invalid:{key}")
if not isinstance(data.get("roots"), dict):
raise FailClosed("manifest_invalid:roots_missing")
return data
def resolve_lane_root(manifest: dict[str, Any], manifest_path: Path, lane: str, requested_root: str | None) -> tuple[dict[str, Any], Path, Path]:
lane_cfg = (manifest.get("roots") or {}).get(lane)
if not isinstance(lane_cfg, dict):
raise FailClosed(f"lane_missing:{lane}")
if lane_cfg.get("approved") is not True:
raise FailClosed(f"lane_unapproved:{lane}")
root_value = lane_cfg.get("root")
if not root_value:
raise FailClosed(f"root_missing:{lane}")
approved_root = Path(str(root_value)).expanduser()
if not approved_root.is_absolute():
approved_root = (manifest_path.parent / approved_root).resolve()
else:
approved_root = approved_root.resolve()
if not approved_root.exists() or not approved_root.is_dir():
raise FailClosed(f"approved_root_unavailable:{lane}")
selected_root = Path(requested_root).expanduser() if requested_root else approved_root
selected_root = selected_root.resolve()
if not selected_root.exists() or not selected_root.is_dir():
raise FailClosed(f"request_root_unavailable:{lane}")
if not is_under(selected_root, approved_root):
raise FailClosed(f"request_root_broadens_approval:{lane}")
return lane_cfg, approved_root, selected_root
def allowed_exts(lane_cfg: dict[str, Any]) -> set[str]:
return {str(e).lower() if str(e).startswith(".") else "." + str(e).lower() for e in lane_cfg.get("allowed_extensions", [])}
def iter_files(root: Path, approved_root: Path, exts: set[str], max_file_mb: float, max_age_days: float | None) -> tuple[list[Path], dict[str, int], int]:
skipped = {"extension": 0, "size": 0, "symlink_escape": 0, "not_regular_file": 0, "too_old": 0, "policy": 0}
accepted: list[Path] = []
files_seen = 0
now = time.time()
max_bytes = int(max_file_mb * 1024 * 1024)
for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
dirnames[:] = [d for d in dirnames if d not in SKIP_DIR_NAMES and not d.startswith(".")]
current = Path(dirpath)
if not is_under(current, approved_root):
skipped["symlink_escape"] += 1
dirnames[:] = []
continue
for name in filenames:
path = current / name
if name.startswith("."):
skipped["policy"] += 1
continue
files_seen += 1
try:
resolved = path.resolve()
except Exception:
skipped["symlink_escape"] += 1
continue
if not is_under(resolved, approved_root):
skipped["symlink_escape"] += 1
continue
if not resolved.is_file():
skipped["not_regular_file"] += 1
continue
if resolved.suffix.lower() not in exts:
skipped["extension"] += 1
continue
try:
st = resolved.stat()
except OSError:
skipped["not_regular_file"] += 1
continue
if st.st_size > max_bytes:
skipped["size"] += 1
continue
if max_age_days is not None and now - st.st_mtime > max_age_days * 86400:
skipped["too_old"] += 1
continue
accepted.append(resolved)
accepted.sort(key=lambda p: p.stat().st_mtime, reverse=True)
return accepted, skipped, files_seen
def load_doc_triage_module(repo_root: Path):
module_path = repo_root / "openvino-doc-image-triage-npu" / "triage.py"
spec = importlib.util.spec_from_file_location("doc_image_triage", module_path)
if spec is None or spec.loader is None:
raise RuntimeError("doc_image_triage_import_failed")
module = importlib.util.module_from_spec(spec)
sys.modules.setdefault("doc_image_triage", module)
spec.loader.exec_module(module) # type: ignore[union-attr]
return module
def fallback_doc_item(path: Path, root: Path, lane: str) -> dict[str, Any]:
sidecar = path.with_suffix(path.suffix + ".txt")
text = ""
if sidecar.exists() and sidecar.is_file():
text = sidecar.read_text(errors="replace")[:12000]
lower = text.lower()
category = "unknown_or_low_confidence"
if any(w in lower for w in ("receipt", "subtotal", "store")):
category = "receipt"
elif any(w in lower for w in ("invoice", "amount due", "payment due")):
category = "bill_or_invoice"
elif lane == "screenshots":
category = "screenshot_web_or_app"
reasons = [name for name, rx in ACTION_PATTERNS.items() if rx.search(text)]
return {
"basename": path.name,
"relative_path_hash": sha256_text(path.relative_to(root).as_posix()),
"file_id": sha256_file(path),
"media_type": infer_media_type(path),
"category": category,
"needs_attention": bool(reasons),
"reasons": sorted(reasons),
"raw_text_redacted": True,
"full_path_included": False,
"metadata": {"dates_count": len(set(re.findall(r"\b20\d{2}[-/]\d{1,2}[-/]\d{1,2}\b", text))), "amounts_count": len(set(re.findall(r"\$\s?\d+(?:\.\d{2})?", text))), "raw_values_redacted": True},
"processing": {"doc_image_triage": "fallback_cpu_sidecar_rules", "npu_verified": False},
}
def infer_media_type(path: Path) -> str:
if path.suffix.lower() == ".pdf":
return "pdf"
mt, _ = mimetypes.guess_type(path.name)
if mt and mt.startswith("image/"):
return "image"
if mt and mt.startswith("audio/"):
return "audio"
return "unknown"
def compact_doc_item(path: Path, root: Path, lane: str, triage_result: dict[str, Any]) -> dict[str, Any]:
pages = triage_result.get("pages") or []
first = pages[0] if pages else {}
cls = first.get("classification") or {}
attn = first.get("needs_attention") or {}
meta = first.get("metadata") or {}
device_summary = triage_result.get("processing_device_summary") or {}
item = {
"basename": path.name,
"relative_path_hash": sha256_text(path.relative_to(root).as_posix()),
"file_id": triage_result.get("file_id") or sha256_file(path),
"media_type": triage_result.get("media_type") or infer_media_type(path),
"category": cls.get("label") or "unknown_or_low_confidence",
"needs_attention": bool(attn.get("value")),
"reasons": attn.get("reasons") or [],
"raw_text_redacted": True,
"full_path_included": False,
"metadata": {
"dates_count": meta.get("dates_count", 0),
"amounts_count": meta.get("amounts_count", 0),
"raw_values_redacted": True,
},
"processing": {
"doc_image_triage": "openvino-doc-image-triage-npu",
"image_category_device": (cls.get("device") or "CPU"),
"needs_attention_device": attn.get("device") or "CPU",
"npu_verified": bool(device_summary.get("npu_verified")),
"npu_busy_delta_us": device_summary.get("npu_busy_delta_us"),
},
}
if lane == "receipts":
item["receipt_fields"] = {"vendor_present": bool((meta.get("detected_entities") or {}).get("org_present")), "amounts_count": item["metadata"]["amounts_count"], "dates_count": item["metadata"]["dates_count"]}
return item
def classify_transcript(text: str, lane: str) -> dict[str, Any]:
reasons = [name for name, rx in ACTION_PATTERNS.items() if rx.search(text)]
action_count = sum(1 for rx in (ACTION_PATTERNS["follow_up"], ACTION_PATTERNS["task"]) if rx.search(text))
decisions = 1 if ACTION_PATTERNS["decision"].search(text) else 0
followups = 1 if ACTION_PATTERNS["follow_up"].search(text) else 0
return {
"category": "meeting_snippet" if lane == "meeting_snippets" else "voice_memo",
"action_worthy": bool(reasons),
"reasons": sorted(reasons),
"action_items_count": action_count,
"decisions_count": decisions,
"followups_count": followups,
}
def multipart_transcribe(path: Path, whisper_url: str, timeout: float) -> dict[str, Any]:
whisper_url = validate_local_whisper_url(whisper_url)
boundary = "----NpuBatchTriage" + hashlib.sha256(path.name.encode()).hexdigest()[:12]
data = path.read_bytes()
body = (
f"--{boundary}\r\n"
f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
"Content-Type: application/octet-stream\r\n\r\n"
).encode() + data + (
f"\r\n--{boundary}\r\n"
'Content-Disposition: form-data; name="model"\r\n\r\n'
"whisper-1\r\n"
f"--{boundary}--\r\n"
).encode()
before = read_busy()
req = urllib.request.Request(whisper_url, data=body, headers={"Content-Type": f"multipart/form-data; boundary={boundary}"})
t0 = time.perf_counter()
with urllib.request.urlopen(req, timeout=timeout) as resp:
raw = resp.read(1024 * 1024)
status = resp.status
parsed = json.loads(raw.decode())
after = read_busy()
text = str(parsed.get("text") or parsed.get("transcription") or "").strip()
service_delta = parsed.get("npu_busy_delta_us")
sysfs_delta = None if before is None or after is None else after - before
proof_delta = service_delta if isinstance(service_delta, int) else sysfs_delta
return {
"ok": status == 200 and bool(text),
"text": text,
"transcript_chars": len(text),
"duration_seconds": parsed.get("duration_seconds"),
"language": parsed.get("language"),
"npu_busy_delta_us": proof_delta,
"verified_npu": bool(proof_delta and proof_delta > 0),
"wall_ms": round((time.perf_counter() - t0) * 1000, 2),
}
def compact_audio_item(path: Path, root: Path, lane: str, no_npu: bool, whisper_url: str, timeout: float) -> dict[str, Any]:
transcript = ""
transcribed = False
npu_delta = 0
proof_ok = False
duration = None
language = None
error = None
if not no_npu:
try:
result = multipart_transcribe(path, whisper_url, timeout)
transcript = result["text"]
transcribed = result["ok"]
npu_delta = result.get("npu_busy_delta_us") or 0
proof_ok = bool(result.get("verified_npu"))
duration = result.get("duration_seconds")
language = result.get("language")
except (urllib.error.URLError, TimeoutError, OSError, json.JSONDecodeError) as exc:
error = f"whisper_error:{type(exc).__name__}"
summary = classify_transcript(transcript, lane)
item = {
"basename": path.name,
"relative_path_hash": sha256_text(path.relative_to(root).as_posix()),
"file_id": sha256_file(path),
"media_type": "audio",
"duration_seconds": duration,
"transcribed": transcribed,
"transcript_chars": len(transcript),
"language": language,
**summary,
"npu_busy_delta_us": npu_delta,
"raw_transcript_logged": False,
"full_path_included": False,
}
if error:
item["error"] = error
return item
def process(args: argparse.Namespace) -> dict[str, Any]:
repo_root = Path(__file__).resolve().parents[1]
manifest_path = Path(args.manifest).expanduser().resolve()
manifest = load_manifest(manifest_path)
lane_cfg, approved_root, root = resolve_lane_root(manifest, manifest_path, args.lane, args.root)
exts = allowed_exts(lane_cfg)
if not exts:
raise FailClosed(f"extensions_missing:{args.lane}")
manifest_limit = int(lane_cfg.get("max_files", 50))
limit = min(args.limit if args.limit is not None else manifest_limit, manifest_limit)
files, skipped, files_seen = iter_files(root, approved_root, exts, float(lane_cfg.get("max_file_mb", 25)), args.max_age_days)
selected = files[:limit]
npu_before = read_busy()
items: list[dict[str, Any]] = []
errors: list[str] = []
doc_module = None
if args.lane in AUDIO_LANES and not args.no_npu:
validate_local_whisper_url(args.whisper_url)
if args.lane in DOC_IMAGE_LANES and not args.no_npu:
try:
doc_module = load_doc_triage_module(repo_root)
except Exception as exc:
errors.append(f"doc_triage_import_error:{type(exc).__name__}")
for path in selected:
try:
if args.lane in AUDIO_LANES:
item = compact_audio_item(path, root, args.lane, args.no_npu, args.whisper_url, args.timeout_seconds)
elif doc_module is not None:
opts = doc_module.TriageOptions(
dry_run=False,
include_ocr_text=False,
include_full_path=False,
use_embeddings=not args.no_npu,
allowed_roots=[approved_root],
timeout_seconds=args.timeout_seconds,
)
item = compact_doc_item(path, root, args.lane, doc_module.triage_file(path, opts))
else:
item = fallback_doc_item(path, root, args.lane)
if args.include_full_path:
item["full_path"] = str(path)
item["full_path_included"] = True
if args.include_raw_text:
item["raw_text_included"] = False
item["raw_text_note"] = "unsupported_by_batch_wrapper"
items.append(item)
except FailClosed:
raise
except Exception as exc:
errors.append(f"{path.name}:{type(exc).__name__}")
items.append({"basename": path.name, "ok": False, "error": type(exc).__name__, "raw_text_redacted": True, "full_path_included": False})
npu_after = read_busy()
sysfs_delta = None if npu_before is None or npu_after is None else npu_after - npu_before
item_deltas = [i.get("npu_busy_delta_us") for i in items if isinstance(i.get("npu_busy_delta_us"), int)]
claimed = not args.no_npu and any((d or 0) > 0 for d in item_deltas + ([sysfs_delta] if isinstance(sysfs_delta, int) else []))
proof_ok = claimed and bool(sysfs_delta is None or sysfs_delta > 0 or any((d or 0) > 0 for d in item_deltas))
return {
"ok": not errors,
"lane": args.lane,
"dry_run": True,
"approved_root": True,
"root_basename": root.name,
"files_seen": files_seen,
"files_processed": len(items),
"skipped": skipped,
"npu": {"claimed": claimed, "busy_delta_us": sysfs_delta, "proof_ok": proof_ok},
"mutations": MUTATIONS_FALSE.copy(),
"items": items,
"raw_content_redacted": not args.include_raw_text,
"full_paths_included": bool(args.include_full_path),
"errors": errors,
"gates": {
"external_uploads": False,
"private_root_broadening": False,
"obsidian_mutation": False,
"vector_db_mutation": False,
"outbound_sends": False,
"routing_changes": False,
},
}
def build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(description="Explicit-root dry-run batch triage wrapper")
parser.add_argument("--manifest", required=True, help="lane approval manifest; missing/unapproved fails closed")
parser.add_argument("--lane", required=True, choices=LANES)
parser.add_argument("--root", help="optional narrower root under the manifest-approved lane root")
parser.add_argument("--dry-run", action="store_true", help="required; mutation modes are not implemented")
parser.add_argument("--limit", type=int, default=None)
parser.add_argument("--max-age-days", type=float, default=None)
parser.add_argument("--include-raw-text", action="store_true", help="kept redacted by this wrapper; present only for explicit operator attempts")
parser.add_argument("--include-full-path", action="store_true", help="operator-only local debugging")
parser.add_argument("--no-npu", action="store_true", help="CPU-only smoke; never claims NPU")
parser.add_argument("--json", action="store_true", help="emit compact JSON")
parser.add_argument("--pretty", action="store_true", help="pretty JSON for local debugging")
parser.add_argument("--whisper-url", default=DEFAULT_WHISPER_URL)
parser.add_argument("--timeout-seconds", type=float, default=20.0)
return parser
def main(argv: list[str] | None = None) -> int:
parser = build_parser()
args = parser.parse_args(argv)
if not args.dry_run:
print(json.dumps({"ok": False, "error": "dry_run_required", "mutations": MUTATIONS_FALSE}), file=sys.stderr)
return 2
if args.limit is not None and args.limit < 1:
print(json.dumps({"ok": False, "error": "limit_must_be_positive"}), file=sys.stderr)
return 2
try:
out = process(args)
except FailClosed as exc:
out = {"ok": False, "error": "fail_closed", "reason": str(exc), "dry_run": True, "mutations": MUTATIONS_FALSE.copy()}
print(json.dumps(out, indent=2 if args.pretty else None, sort_keys=True))
return 0 if out.get("ok") else 2
if __name__ == "__main__":
raise SystemExit(main())
+536
View File
@@ -0,0 +1,536 @@
#!/usr/bin/env python3
"""Compact, read-only NPU/OpenVINO utilization digest.
Default behavior is safe for on-demand or scheduled runs: health checks plus
bounded synthetic probes, one compact JSONL artifact, and no service restarts,
routing changes, advisory POSTs, vector mutations, outbound sends, or private
root broadening.
"""
from __future__ import annotations
import argparse
import base64
import datetime as dt
import json
import math
import os
import tempfile
import time
import urllib.error
import urllib.parse
import urllib.request
import uuid
import wave
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Callable
BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")
DEFAULT_OUT_DIR = Path("/home/will/.local/state/npu-utilization/digests")
EMBED_URL = "http://127.0.0.1:18817/v1/embeddings"
EMBED_HEALTH_URL = "http://127.0.0.1:18817/healthz"
RERANK_URL = "http://127.0.0.1:18818/rerank"
RERANK_HEALTH_URL = "http://127.0.0.1:18818/readyz"
WHISPER_URL = "http://127.0.0.1:18816/v1/audio/transcriptions"
WHISPER_HEALTH_URL = "http://127.0.0.1:18816/health"
CLASSIFIER_URL = "http://127.0.0.1:18819/v1/classify"
CLASSIFIER_HEALTH_URL = "http://127.0.0.1:18819/healthz"
GENAI_HEALTH_URL = "http://127.0.0.1:18820/healthz"
GENAI_GENERATE_URL = "http://127.0.0.1:18820/v1/generate"
DOC_TRIAGE_URL = "http://127.0.0.1:18829/triage"
DOC_TRIAGE_HEALTH_URL = "http://127.0.0.1:18829/healthz"
RAG_ENDPOINT_HEALTH_URL = "http://127.0.0.1:18810/healthz"
RAG_HEALTH_URL = "http://127.0.0.1:18814/healthz"
ADVISORY_HEALTH_URL = "http://172.19.0.1:18830/healthz"
@dataclass
class ServiceRow:
type: str = "service"
service: str = ""
reachable: bool = False
probe_ran: bool = False
proof_ok: bool | None = None
calls: int = 0
items: int = 0
avg_ms: float | None = None
npu_delta_us: int | None = None
response_delta_us: int | None = None
mode: str = "unavailable"
fallbacks: int = 0
warnings: list[str] = field(default_factory=list)
gate: str = "none"
jobs: int | None = None
events: int | None = None
files: int | None = None
docs: int | None = None
text_len: int | None = None
sample_rate: int | None = None
embedding_count: int | None = None
embedding_dim: int | None = None
dry_run: bool | None = None
suppress: int | None = None
escalate: int | None = None
loaded: bool | None = None
allowed_roots_count: int | None = None
reason: str | None = None
error: str | None = None
def compact_dict(obj: Any) -> dict[str, Any]:
data = asdict(obj) if hasattr(obj, "__dataclass_fields__") else dict(obj)
return {k: v for k, v in data.items() if v is not None and v != []}
def read_busy(path: Path = BUSY_PATH) -> int | None:
try:
return int(path.read_text().strip())
except Exception:
return None
def safe_error(exc: BaseException) -> str:
return type(exc).__name__
def http_get_json(url: str, timeout: float) -> tuple[int, dict[str, Any]]:
try:
req = urllib.request.Request(url, headers={"Accept": "application/json"})
with urllib.request.urlopen(req, timeout=timeout) as resp:
body = resp.read(1024 * 1024).decode("utf-8", "replace")
return int(resp.status), json.loads(body or "{}")
except urllib.error.HTTPError as exc:
try:
body = exc.read(1024 * 1024).decode("utf-8", "replace")
return int(exc.code), json.loads(body or "{}")
except Exception:
return int(exc.code), {"error": "http_error"}
except Exception as exc:
return 0, {"error": safe_error(exc)}
def http_post_json(url: str, payload: dict[str, Any], timeout: float) -> tuple[int, dict[str, Any]]:
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json", "Accept": "application/json"})
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = resp.read(2 * 1024 * 1024).decode("utf-8", "replace")
return int(resp.status), json.loads(data or "{}")
except urllib.error.HTTPError as exc:
try:
data = exc.read(1024 * 1024).decode("utf-8", "replace")
return int(exc.code), json.loads(data or "{}")
except Exception:
return int(exc.code), {"error": "http_error"}
except Exception as exc:
return 0, {"error": safe_error(exc)}
def health_row(service: str, url: str, timeout: float, gate: str = "none", mode: str = "health_only") -> tuple[ServiceRow, dict[str, Any]]:
status, payload = http_get_json(url, timeout)
ok = status == 200 and payload.get("ok", True) is not False
row = ServiceRow(service=service, reachable=ok, mode=mode if ok else "unavailable", gate=gate)
if not ok:
row.fallbacks = 1
row.warnings.append("unavailable")
row.error = str(payload.get("error") or payload.get("ready_error") or f"http_{status}")[:80]
return row, payload
def measure_probe(fn: Callable[[], tuple[int, dict[str, Any]]], timeout_label: str, busy_path: Path = BUSY_PATH) -> tuple[int, dict[str, Any], float, int | None]:
before = read_busy(busy_path)
started = time.perf_counter()
status, payload = fn()
elapsed_ms = round((time.perf_counter() - started) * 1000, 3)
after = read_busy(busy_path)
delta = None if before is None or after is None else after - before
return status, payload, elapsed_ms, delta
def apply_proof(row: ServiceRow, delta: int | None) -> None:
row.npu_delta_us = delta
row.proof_ok = bool(delta is not None and delta > 0)
if not row.proof_ok:
row.fallbacks += 1
row.warnings.append("no_positive_sysfs_delta" if delta is not None else "missing_sysfs_counter")
def mark_skipped_fallback(row: ServiceRow, reason: str) -> None:
"""Record a skipped/unloaded proof condition as a fallback.
Health-only rows that are intentionally never proof probes should keep
fallbacks at zero. This helper is for proof-capable rows where a bounded
smoke was disabled or skipped to avoid side effects such as cold-loading.
"""
row.fallbacks += 1
row.warnings.append(reason)
def probe_embeddings(timeout: float, busy_path: Path = BUSY_PATH, post_json: Callable[..., tuple[int, dict[str, Any]]] = http_post_json) -> ServiceRow:
row, _ = health_row("embeddings", EMBED_HEALTH_URL, timeout)
if not row.reachable:
return row
payload = {"input": "non-private npu utilization digest probe", "model": "bge-base-en-v1.5-int8-ov"}
status, data, elapsed, delta = measure_probe(lambda: post_json(EMBED_URL, payload, timeout), "embeddings", busy_path)
row.probe_ran = True
row.calls = 1
row.items = 1
row.avg_ms = elapsed
row.mode = "NPU"
row.reachable = status == 200 and "data" in data
row.embedding_count = len(data.get("data", [])) if isinstance(data.get("data"), list) else 0
row.embedding_dim = data.get("embedding_dim")
row.response_delta_us = data.get("npu_busy_delta_us") if isinstance(data.get("npu_busy_delta_us"), int) else None
apply_proof(row, delta)
if not row.reachable:
row.warnings.append("probe_http_failed")
row.error = str(data.get("error") or f"http_{status}")[:80]
return row
def probe_rerank(timeout: float, busy_path: Path = BUSY_PATH, post_json: Callable[..., tuple[int, dict[str, Any]]] = http_post_json) -> ServiceRow:
row, _ = health_row("rerank", RERANK_HEALTH_URL, timeout)
if not row.reachable:
return row
docs = ["Intel NPU accelerates OpenVINO inference.", "Bananas ripen on a kitchen counter."]
payload = {"query": "OpenVINO NPU inference", "documents": docs, "top_k": 2, "return_documents": False}
status, data, elapsed, delta = measure_probe(lambda: post_json(RERANK_URL, payload, timeout), "rerank", busy_path)
row.probe_ran = True
row.calls = 1
row.docs = len(docs)
row.avg_ms = float(data.get("duration_ms") or elapsed)
row.mode = "NPU"
row.reachable = status == 200 and data.get("ok", True) is not False
row.response_delta_us = data.get("npu_busy_delta_us") if isinstance(data.get("npu_busy_delta_us"), int) else None
apply_proof(row, delta)
if not row.reachable:
row.warnings.append("probe_http_failed")
row.error = str(data.get("error") or f"http_{status}")[:80]
return row
def probe_classifier(timeout: float, busy_path: Path = BUSY_PATH, post_json: Callable[..., tuple[int, dict[str, Any]]] = http_post_json) -> ServiceRow:
row, _ = health_row("classifier", CLASSIFIER_HEALTH_URL, timeout, mode="dry_run")
if not row.reachable:
return row
payload = {
"id": "npu-digest-probe",
"text": "Non-private cron event: backup completed successfully, no user action required.",
"options": {"dry_run": True, "include_evidence": False},
}
status, data, elapsed, delta = measure_probe(lambda: post_json(CLASSIFIER_URL, payload, timeout), "classifier", busy_path)
row.probe_ran = True
row.calls = 1
row.events = 1
row.avg_ms = elapsed
row.mode = "dry_run"
row.dry_run = True
row.reachable = status == 200 and "error" not in data
row.response_delta_us = next((data.get(k) for k in ("sysfs_npu_busy_delta_us", "npu_busy_delta_us") if isinstance(data.get(k), int)), None)
raw_labels = data.get("labels")
labels: dict[str, Any] = raw_labels if isinstance(raw_labels, dict) else {}
raw_action = data.get("action")
action: dict[str, Any] = raw_action if isinstance(raw_action, dict) else {}
row.escalate = int(bool(action.get("escalate") or labels.get("action_required") or labels.get("tool_needed")))
row.suppress = int(bool(action.get("suppress") or labels.get("no_op") or labels.get("duplicate")))
row.items = len(labels)
apply_proof(row, delta)
if not row.reachable:
row.warnings.append("probe_http_failed")
row.error = str(data.get("error") or f"http_{status}")[:80]
return row
def write_tone_wav(path: Path, seconds: float = 0.35, sample_rate: int = 16000) -> None:
frames = int(seconds * sample_rate)
with wave.open(str(path), "wb") as wav:
wav.setnchannels(1)
wav.setsampwidth(2)
wav.setframerate(sample_rate)
for i in range(frames):
value = int(9000 * math.sin(2 * math.pi * 440 * (i / sample_rate)))
wav.writeframesraw(value.to_bytes(2, byteorder="little", signed=True))
def post_multipart_file(url: str, file_path: Path, timeout: float) -> tuple[int, dict[str, Any]]:
boundary = "----npu-digest-" + uuid.uuid4().hex
file_bytes = file_path.read_bytes()
parts = [
f"--{boundary}\r\nContent-Disposition: form-data; name=\"model\"\r\n\r\nwhisper\r\n".encode(),
f"--{boundary}\r\nContent-Disposition: form-data; name=\"response_format\"\r\n\r\njson\r\n".encode(),
f"--{boundary}\r\nContent-Disposition: form-data; name=\"file\"; filename=\"npu-digest.wav\"\r\nContent-Type: audio/wav\r\n\r\n".encode(),
file_bytes,
f"\r\n--{boundary}--\r\n".encode(),
]
req = urllib.request.Request(url, data=b"".join(parts), headers={"Content-Type": f"multipart/form-data; boundary={boundary}"})
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
return int(resp.status), json.loads(resp.read(1024 * 1024).decode("utf-8", "replace") or "{}")
except Exception as exc:
return 0, {"error": safe_error(exc)}
def probe_whisper(timeout: float, include_smoke: bool, busy_path: Path = BUSY_PATH) -> ServiceRow:
row, _ = health_row("whisper", WHISPER_HEALTH_URL, timeout)
row.jobs = 0
if not row.reachable or not include_smoke:
if row.reachable:
row.mode = "health_only"
row.reason = "smoke_disabled"
mark_skipped_fallback(row, "skipped")
return row
with tempfile.TemporaryDirectory(prefix="npu-digest-whisper-") as tmp:
wav_path = Path(tmp) / "probe.wav"
write_tone_wav(wav_path)
status, data, elapsed, delta = measure_probe(lambda: post_multipart_file(WHISPER_URL, wav_path, timeout), "whisper", busy_path)
row.probe_ran = True
row.calls = 1
row.jobs = 1
row.avg_ms = elapsed
row.mode = "NPU"
row.reachable = status == 200 and "error" not in data
row.text_len = len(str(data.get("text") or ""))
row.sample_rate = data.get("sample_rate") if isinstance(data.get("sample_rate"), int) else None
row.response_delta_us = data.get("npu_busy_delta_us") if isinstance(data.get("npu_busy_delta_us"), int) else None
apply_proof(row, delta)
if not row.reachable:
row.warnings.append("probe_http_failed")
row.error = str(data.get("error") or f"http_{status}")[:80]
return row
def probe_genai(timeout: float, include_smoke: bool, busy_path: Path = BUSY_PATH, post_json: Callable[..., tuple[int, dict[str, Any]]] = http_post_json) -> ServiceRow:
row, health = health_row("genai", GENAI_HEALTH_URL, timeout)
row.loaded = bool(health.get("loaded")) if isinstance(health, dict) and "loaded" in health else None
row.jobs = 0
if not row.reachable:
return row
if not include_smoke or row.loaded is False:
row.mode = "loaded=false" if row.loaded is False else "health_only"
row.reason = "skipped_cold_load" if row.loaded is False else "smoke_disabled"
mark_skipped_fallback(row, row.reason)
return row
payload = {"prompt": "Say pong.", "max_new_tokens": 8}
status, data, elapsed, delta = measure_probe(lambda: post_json(GENAI_GENERATE_URL, payload, timeout), "genai", busy_path)
row.probe_ran = True
row.calls = 1
row.jobs = 1
row.avg_ms = elapsed
row.mode = "NPU"
row.reachable = status == 200 and "error" not in data
apply_proof(row, delta)
return row
def doc_triage_sample_path() -> Path | None:
candidates = [
Path("/home/will/lab/swarm/openvino-doc-image-triage-npu/samples/synthetic_invoice.png"),
Path(__file__).resolve().parents[1] / "openvino-doc-image-triage-npu" / "samples" / "synthetic_invoice.png",
]
for candidate in candidates:
if candidate.exists() and candidate.with_suffix(".png.txt").exists():
return candidate
return None
def probe_doc_triage(timeout: float, include_smoke: bool, busy_path: Path = BUSY_PATH, post_json: Callable[..., tuple[int, dict[str, Any]]] = http_post_json) -> ServiceRow:
row, _ = health_row("doc_triage", DOC_TRIAGE_HEALTH_URL, timeout, gate="closed:private-root")
row.files = 0
if not row.reachable or not include_smoke:
if row.reachable:
row.mode = "health_only"
row.reason = "smoke_disabled"
mark_skipped_fallback(row, "skipped")
return row
sample = doc_triage_sample_path()
if sample is not None:
root = sample.parent.resolve()
payload = {"path": str(sample), "options": {"allowed_roots": [str(root)], "include_ocr_text": False, "use_embeddings": True}}
status, data, elapsed, delta = measure_probe(lambda: post_json(DOC_TRIAGE_URL, payload, timeout), "doc_triage", busy_path)
else:
with tempfile.TemporaryDirectory(prefix="npu-digest-doc-") as tmp:
root = Path(tmp).resolve()
sample = root / "synthetic-invoice.png"
sample.write_bytes(base64.b64decode("iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR42mP8/x8AAwMCAO+/p9sAAAAASUVORK5CYII="))
sample.with_suffix(".png.txt").write_text("Synthetic invoice. Amount due $12.34 by 2026-06-30. No private data.\n")
payload = {"path": str(sample), "options": {"allowed_roots": [str(root)], "include_ocr_text": False, "use_embeddings": True}}
status, data, elapsed, delta = measure_probe(lambda: post_json(DOC_TRIAGE_URL, payload, timeout), "doc_triage", busy_path)
row.probe_ran = True
row.calls = 1
row.files = 1
row.avg_ms = elapsed
row.mode = "NPU-via-embedding-service"
row.allowed_roots_count = 1
row.reachable = status == 200 and data.get("ok", True) is not False
raw_result = data.get("result")
result: dict[str, Any] = raw_result if isinstance(raw_result, dict) else {}
raw_pages = result.get("pages")
pages: list[Any] = raw_pages if isinstance(raw_pages, list) else []
embedding: dict[str, Any] = {}
if pages and isinstance(pages[0], dict):
raw_attn = pages[0].get("needs_attention")
attn: dict[str, Any] = raw_attn if isinstance(raw_attn, dict) else {}
raw_embedding = attn.get("embedding")
embedding = raw_embedding if isinstance(raw_embedding, dict) else {}
row.response_delta_us = embedding.get("npu_busy_delta_us") if isinstance(embedding.get("npu_busy_delta_us"), int) else None
apply_proof(row, delta)
if not row.reachable:
row.warnings.append("probe_http_failed")
row.error = str(data.get("error") or f"http_{status}")[:80]
return row
def build_summary(rows: list[ServiceRow], artifact_path: str | None, counter_delta_us: int | None, started_at: str) -> dict[str, Any]:
services_ok = sum(1 for r in rows if r.reachable)
proof_rows = [r for r in rows if r.probe_ran and r.proof_ok is not None]
proof_ok = sum(1 for r in proof_rows if r.proof_ok)
gates_closed = sum(1 for r in rows if str(r.gate).startswith("closed:"))
fallbacks = sum(r.fallbacks for r in rows)
warnings: dict[str, int] = {}
for row in rows:
for warning in row.warnings:
warnings[warning] = warnings.get(warning, 0) + 1
return {
"type": "summary",
"timestamp": started_at,
"counter": str(BUSY_PATH),
"delta_us": counter_delta_us,
"services_ok": services_ok,
"services_total": len(rows),
"proof_ok": proof_ok,
"proof_total": len(proof_rows),
"fallbacks": fallbacks,
"gates_closed": gates_closed,
"warnings": warnings,
"artifact": artifact_path,
}
def render_text(summary: dict[str, Any], rows: list[ServiceRow]) -> str:
lines = [
f"NPU utilization digest {summary['timestamp']}",
f"counter={summary['counter']} delta_us={summary.get('delta_us')}",
f"services_ok={summary['services_ok']}/{summary['services_total']} proof_ok={summary['proof_ok']}/{summary['proof_total']} fallbacks={summary['fallbacks']} gates_closed={summary['gates_closed']}",
]
for r in rows:
parts = [f"- {r.service}:", f"ok={str(r.reachable).lower()}"]
if r.calls:
parts.append(f"calls={r.calls}")
if r.jobs is not None:
parts.append(f"jobs={r.jobs}")
if r.events is not None:
parts.append(f"events={r.events}")
if r.files is not None:
parts.append(f"files={r.files}")
if r.docs is not None:
parts.append(f"docs={r.docs}")
if r.avg_ms is not None:
parts.append(f"avg_ms={r.avg_ms}")
if r.npu_delta_us is not None:
parts.append(f"npu_delta_us={r.npu_delta_us}")
if r.proof_ok is not None:
parts.append(f"proof={str(r.proof_ok).lower()}")
if r.dry_run is not None:
parts.append(f"dry_run={str(r.dry_run).lower()}")
if r.suppress is not None:
parts.append(f"suppress={r.suppress}")
if r.escalate is not None:
parts.append(f"escalate={r.escalate}")
if r.loaded is not None:
parts.append(f"loaded={str(r.loaded).lower()}")
if r.allowed_roots_count is not None:
parts.append(f"allowed_roots={r.allowed_roots_count}")
if r.text_len is not None:
parts.append(f"text_len={r.text_len}")
if r.mode:
parts.append(f"mode={r.mode}")
if r.gate != "none":
parts.append(f"gate={r.gate}")
if r.reason:
parts.append(f"reason={r.reason}")
if r.warnings:
parts.append("warnings=" + ",".join(sorted(set(r.warnings))))
lines.append(" ".join(parts))
warning_counts = summary.get("warnings") or {}
lines.append("fallbacks: " + " ".join(f"{k}={v}" for k, v in sorted(warning_counts.items())) if warning_counts else "fallbacks: none")
if summary.get("artifact"):
lines.append(f"artifact: {summary['artifact']}")
return "\n".join(lines)
def write_jsonl(summary: dict[str, Any], rows: list[ServiceRow], out_dir: Path) -> Path:
out_dir.mkdir(parents=True, exist_ok=True)
stamp = summary["timestamp"].replace(":", "").replace("+", "").replace("-", "")
path = out_dir / f"{stamp}.jsonl"
with path.open("w", encoding="utf-8") as f:
f.write(json.dumps(summary, sort_keys=True, separators=(",", ":")) + "\n")
for row in rows:
f.write(json.dumps(compact_dict(row), sort_keys=True, separators=(",", ":")) + "\n")
return path
def str_bool(value: str) -> bool:
lowered = value.lower()
if lowered in {"1", "true", "yes", "y", "on"}:
return True
if lowered in {"0", "false", "no", "n", "off"}:
return False
raise argparse.ArgumentTypeError("expected true or false")
def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
parser = argparse.ArgumentParser(description="Compact NPU utilization digest")
parser.add_argument("--format", choices=("text", "jsonl"), default="text")
parser.add_argument("--out", default=str(DEFAULT_OUT_DIR))
parser.add_argument("--timeout-s", type=float, default=8.0)
parser.add_argument("--include-whisper-smoke", type=str_bool, default=True)
parser.add_argument("--include-genai-smoke", type=str_bool, default=False)
parser.add_argument("--include-doc-triage-smoke", type=str_bool, default=True)
parser.add_argument("--no-write", action="store_true")
parser.add_argument("--strict-proof", action="store_true", help="exit nonzero if a proof-required probe ran without positive sysfs delta")
parser.add_argument("--verbose", action="store_true")
return parser.parse_args(argv)
def run(args: argparse.Namespace) -> tuple[dict[str, Any], list[ServiceRow]]:
started_at = dt.datetime.now().astimezone().replace(microsecond=0).isoformat()
before_all = read_busy(BUSY_PATH)
rows = [
probe_embeddings(args.timeout_s),
probe_rerank(args.timeout_s),
probe_whisper(args.timeout_s, args.include_whisper_smoke),
probe_classifier(args.timeout_s),
probe_genai(args.timeout_s, args.include_genai_smoke),
probe_doc_triage(args.timeout_s, args.include_doc_triage_smoke),
]
rows.append(health_row("rag_endpoint", RAG_ENDPOINT_HEALTH_URL, args.timeout_s, gate="closed:vector-mutation")[0])
rows.append(health_row("rag_health", RAG_HEALTH_URL, args.timeout_s)[0])
rows.append(health_row("advisory_gateway", ADVISORY_HEALTH_URL, args.timeout_s, gate="closed:advisory-post")[0])
after_all = read_busy(BUSY_PATH)
delta_all = None if before_all is None or after_all is None else after_all - before_all
summary = build_summary(rows, artifact_path=None, counter_delta_us=delta_all, started_at=started_at)
return summary, rows
def main(argv: list[str] | None = None) -> int:
args = parse_args(argv)
summary, rows = run(args)
if not args.no_write:
artifact = write_jsonl(summary, rows, Path(args.out).expanduser())
summary["artifact"] = str(artifact)
# rewrite with artifact path included in the summary line
artifact.write_text("\n".join([json.dumps(summary, sort_keys=True, separators=(",", ":"))] + [json.dumps(compact_dict(r), sort_keys=True, separators=(",", ":")) for r in rows]) + "\n")
if args.format == "jsonl":
print(json.dumps(summary, sort_keys=True, separators=(",", ":")))
for row in rows:
print(json.dumps(compact_dict(row), sort_keys=True, separators=(",", ":")))
else:
print(render_text(summary, rows))
if args.strict_proof and any(r.probe_ran and r.proof_ok is False for r in rows):
return 2
return 0
if __name__ == "__main__":
raise SystemExit(main())
+339
View File
@@ -0,0 +1,339 @@
#!/usr/bin/env python3
"""Local-file voice/audio NPU advisory pipeline.
Side-effect-free first slice:
local audio file -> Whisper NPU -> classifier NPU -> advisory gate
No platform fetching, outbound sends, Obsidian/memory/vector writes, service
restarts, or live Atlas/Hermes routing changes are performed by this script.
"""
from __future__ import annotations
import argparse
import ipaddress
import json
import mimetypes
import os
import re
import sys
import time
import uuid
import wave
from pathlib import Path
from typing import Any
import urllib.error
import urllib.parse
import urllib.request
DEFAULT_WHISPER_URL = "http://127.0.0.1:18816/v1/audio/transcriptions"
DEFAULT_CLASSIFIER_URL = "http://127.0.0.1:18819/v1/classify"
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")
AUDIO_EXTENSIONS = {".wav", ".ogg", ".oga", ".opus", ".mp3", ".m4a", ".mp4", ".webm", ".flac"}
ACTION_MARKERS = re.compile(
r"\b(remind|todo|to-do|task|follow[- ]?up|schedule|call|email|send|draft|inspect|check|fix|review|question|ask)\b",
re.IGNORECASE,
)
class PipelineError(RuntimeError):
def __init__(self, message: str, *, status: int = 1, details: dict[str, Any] | None = None):
super().__init__(message)
self.status = status
self.details = details or {}
def validate_loopback_endpoint(url: str, *, label: str) -> str:
"""Return url when it targets an explicit local HTTP(S) endpoint.
The pipeline reads local audio and posts transcripts/audio bytes, so endpoint
overrides must not be able to exfiltrate data to remote hosts. Keep the
policy intentionally narrow: localhost, IPv4 loopback, or IPv6 ::1 only.
"""
parsed = urllib.parse.urlparse(url)
if parsed.scheme not in {"http", "https"}:
raise PipelineError(
f"{label}_url_scheme_not_allowed",
details={"url_host": parsed.hostname or "", "allowed_schemes": ["http", "https"]},
)
host = parsed.hostname
if not host:
raise PipelineError(f"{label}_url_missing_host")
normalized = host.rstrip(".").lower()
if normalized == "localhost":
return url
try:
address = ipaddress.ip_address(normalized)
except ValueError as exc:
raise PipelineError(
f"{label}_url_host_not_loopback",
details={"url_host": host, "allowed_hosts": ["localhost", "127.0.0.0/8", "::1"]},
) from exc
if not address.is_loopback:
raise PipelineError(
f"{label}_url_host_not_loopback",
details={"url_host": host, "allowed_hosts": ["localhost", "127.0.0.0/8", "::1"]},
)
return url
def read_npu_busy_us(path: Path = NPU_BUSY_PATH) -> int | None:
try:
return int(path.read_text().strip())
except (OSError, ValueError):
return None
def delta_us(before: int | None, after: int | None) -> int | None:
if before is None or after is None:
return None
return max(0, after - before)
def encode_multipart(fields: dict[str, str], files: dict[str, tuple[str, bytes, str]]) -> tuple[bytes, str]:
boundary = "----npu-voice-audio-" + uuid.uuid4().hex
parts: list[bytes] = []
for name, value in fields.items():
parts.append(f"--{boundary}\r\n".encode())
parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
parts.append(str(value).encode())
parts.append(b"\r\n")
for name, (filename, data, content_type) in files.items():
parts.append(f"--{boundary}\r\n".encode())
parts.append(f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'.encode())
parts.append(f"Content-Type: {content_type}\r\n\r\n".encode())
parts.append(data)
parts.append(b"\r\n")
parts.append(f"--{boundary}--\r\n".encode())
return b"".join(parts), f"multipart/form-data; boundary={boundary}"
def post_json(url: str, payload: dict[str, Any], *, timeout: int) -> dict[str, Any]:
url = validate_loopback_endpoint(url, label="classifier")
req = urllib.request.Request(
url,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
method="POST",
)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
return json.loads(resp.read().decode())
except urllib.error.HTTPError as exc:
body = exc.read().decode(errors="replace")[:300]
raise PipelineError(f"classifier_http_{exc.code}", details={"body_preview": body}) from exc
except (urllib.error.URLError, TimeoutError, json.JSONDecodeError) as exc:
raise PipelineError(f"classifier_request_failed: {exc}") from exc
def post_whisper(url: str, audio_path: Path, audio_data: bytes, language: str, *, timeout: int) -> dict[str, Any]:
url = validate_loopback_endpoint(url, label="whisper")
content_type = mimetypes.guess_type(audio_path.name)[0] or "application/octet-stream"
body, multipart_type = encode_multipart(
{"model": "whisper-1", "language": language, "response_format": "json"},
{"file": (audio_path.name, audio_data, content_type)},
)
req = urllib.request.Request(url, data=body, headers={"Content-Type": multipart_type}, method="POST")
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
return json.loads(resp.read().decode())
except urllib.error.HTTPError as exc:
body = exc.read().decode(errors="replace")[:300]
raise PipelineError(f"whisper_http_{exc.code}", details={"body_preview": body}) from exc
except (urllib.error.URLError, TimeoutError, json.JSONDecodeError) as exc:
raise PipelineError(f"whisper_request_failed: {exc}") from exc
def validate_audio_path(path_text: str, *, max_bytes: int, max_audio_seconds: float | None) -> tuple[Path, int]:
path = Path(path_text).expanduser()
if not path.is_absolute():
raise PipelineError("audio_path_must_be_absolute")
if path.is_symlink():
raise PipelineError("audio_path_must_not_be_symlink")
if not path.exists():
raise PipelineError("audio_path_not_found")
if not path.is_file():
raise PipelineError("audio_path_not_file")
if path.suffix.lower() not in AUDIO_EXTENSIONS:
raise PipelineError("unsupported_audio_extension", details={"extension": path.suffix.lower()})
size = path.stat().st_size
if size <= 0:
raise PipelineError("audio_file_empty")
if size > max_bytes:
raise PipelineError("audio_file_too_large", details={"bytes": size, "max_bytes": max_bytes})
if max_audio_seconds is not None and path.suffix.lower() == ".wav":
try:
with wave.open(str(path), "rb") as wav:
duration = wav.getnframes() / float(wav.getframerate())
except wave.Error as exc:
raise PipelineError(f"wav_decode_failed: {exc}") from exc
if duration > max_audio_seconds:
raise PipelineError("audio_duration_too_long", details={"duration_seconds": round(duration, 3), "max_audio_seconds": max_audio_seconds})
return path, size
def extract_transcript(payload: dict[str, Any]) -> str:
text = payload.get("text") or payload.get("transcript") or payload.get("transcription")
if not text and isinstance(payload.get("segments"), list):
text = " ".join(str(seg.get("text", "")) for seg in payload["segments"] if isinstance(seg, dict))
return str(text or "").strip()
def label_value(labels: dict[str, Any], key: str, default: Any = None) -> Any:
value = labels.get(key, default)
if isinstance(value, dict) and "value" in value:
return value.get("value")
return value
def compact_labels(classifier_payload: dict[str, Any]) -> dict[str, Any]:
raw_labels = classifier_payload.get("labels")
labels: dict[str, Any] = raw_labels if isinstance(raw_labels, dict) else {}
return {
"workflow_category": label_value(labels, "workflow_category"),
"tool_needed": bool(label_value(labels, "tool_needed", False)),
"urgency": label_value(labels, "urgency", "normal"),
"safety_confirmation_required": bool(label_value(labels, "safety_confirmation_required", False)),
}
def classify_text(
*,
classifier_url: str,
item_id: str,
source: str,
title: str,
transcript: str,
max_transcript_chars: int,
dry_run: bool,
timeout: int,
) -> tuple[dict[str, Any], int | None]:
bounded_transcript = transcript[:max_transcript_chars]
title_line = f"Title: {title}\n" if title else ""
text = "Voice memo transcript summary candidate.\n" f"Source: {source}\n" f"{title_line}Transcript:\n{bounded_transcript}"
payload = {
"id": item_id,
"text": text,
"context": {"source": source, "media": "audio"},
"options": {"include_evidence": False, "dry_run": dry_run},
}
before = read_npu_busy_us()
data = post_json(classifier_url, payload, timeout=timeout)
after = read_npu_busy_us()
return data, delta_us(before, after)
def decide_gate(transcript: str, labels: dict[str, Any], whisper_proven: bool, classifier_proven: bool) -> tuple[bool, str, str]:
safety_required = bool(labels.get("safety_confirmation_required"))
urgency = str(labels.get("urgency") or "normal").lower()
action_worthy = bool(labels.get("tool_needed")) or urgency in {"high", "critical"} or bool(ACTION_MARKERS.search(transcript))
if not whisper_proven or not classifier_proven:
return action_worthy, "blocked_missing_npu_proof", "npu_proof_required"
if safety_required:
return action_worthy, "blocked_safety_confirmation_required", "human_approval_required"
if action_worthy:
return True, "advisory_only_not_sent", "dry_run_no_side_effects"
return False, "suppressed_not_action_worthy", "dry_run_no_side_effects"
def run_pipeline(args: argparse.Namespace) -> dict[str, Any]:
args.whisper_url = validate_loopback_endpoint(args.whisper_url, label="whisper")
args.classifier_url = validate_loopback_endpoint(args.classifier_url, label="classifier")
audio_path, audio_bytes = validate_audio_path(
args.audio,
max_bytes=args.max_bytes,
max_audio_seconds=args.max_audio_seconds,
)
audio_data = audio_path.read_bytes()
item_id = args.id or f"voice-audio-{int(time.time())}"
whisper_before = read_npu_busy_us()
whisper_payload = post_whisper(args.whisper_url, audio_path, audio_data, args.language, timeout=args.timeout)
whisper_after = read_npu_busy_us()
whisper_sysfs_delta = delta_us(whisper_before, whisper_after)
transcript = extract_transcript(whisper_payload)
if not transcript:
raise PipelineError("whisper_empty_transcript")
whisper_response_delta = int(whisper_payload.get("npu_busy_delta_us") or 0)
whisper_proven = whisper_response_delta > 0 and (whisper_sysfs_delta is None or whisper_sysfs_delta > 0)
classifier_payload, classifier_sysfs_observed = classify_text(
classifier_url=args.classifier_url,
item_id=item_id,
source=args.source,
title=args.title or "",
transcript=transcript,
max_transcript_chars=args.max_transcript_chars,
dry_run=args.dry_run,
timeout=args.timeout,
)
labels = compact_labels(classifier_payload)
classifier_response_delta = int(classifier_payload.get("npu_busy_delta_us") or 0)
classifier_response_sysfs_delta = int(classifier_payload.get("sysfs_npu_busy_delta_us") or 0)
classifier_proven = classifier_response_delta > 0 and classifier_response_sysfs_delta > 0 and (classifier_sysfs_observed is None or classifier_sysfs_observed > 0)
action_worthy, atlas_gate, next_gate = decide_gate(transcript, labels, whisper_proven, classifier_proven)
output: dict[str, Any] = {
"ok": True,
"id": item_id,
"source": args.source,
"transcript_chars": len(transcript),
"action_worthy": action_worthy,
"atlas_gate": atlas_gate,
"next_gate": next_gate,
"whisper_npu_delta_us": whisper_response_delta,
"whisper_sysfs_delta_us": whisper_sysfs_delta,
"classifier_npu_delta_us": classifier_response_delta,
"classifier_sysfs_delta_us": classifier_response_sysfs_delta,
"classifier_observed_sysfs_delta_us": classifier_sysfs_observed,
"labels": labels,
"external_sends": 0,
"writes": 0,
}
if args.include_transcript:
output["transcript"] = transcript
if args.include_transcript_preview_chars > 0:
output["transcript_preview"] = transcript[: args.include_transcript_preview_chars]
if args.include_raw:
output["raw"] = {"whisper": whisper_payload, "classifier": classifier_payload}
return output
def build_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(description="Run local-file audio through NPU Whisper and NPU classifier in dry-run advisory mode.")
parser.add_argument("--audio", required=True, help="Absolute path to a local audio file; no URL/platform fetching is performed.")
parser.add_argument("--id", default="", help="Optional stable item id for classifier correlation.")
parser.add_argument("--source", default="local_file", choices=["local_file", "manual_smoke", "local_voice_memo", "meeting_snippet", "staged_telegram", "staged_discord"], help="Local/staged source label only.")
parser.add_argument("--title", default="", help="Optional short local title for classifier context.")
parser.add_argument("--language", default="en")
parser.add_argument("--whisper-url", default=DEFAULT_WHISPER_URL)
parser.add_argument("--classifier-url", default=DEFAULT_CLASSIFIER_URL)
parser.add_argument("--dry-run", dest="dry_run", action="store_true", default=True, help="Keep classifier in dry-run advisory mode (default).")
parser.add_argument("--no-dry-run", dest="dry_run", action="store_false", help="Send dry_run=false to classifier; this script still performs no side effects.")
parser.add_argument("--json", action="store_true", help="Emit compact JSON; default is JSON for machine-safe handoff.")
parser.add_argument("--include-transcript", action="store_true", help="Include full transcript in output; off by default.")
parser.add_argument("--include-transcript-preview-chars", type=int, default=0, help="Include a bounded transcript preview; default 0.")
parser.add_argument("--include-raw", action="store_true", help="Include raw service responses for one-off local debugging; off by default.")
parser.add_argument("--max-bytes", type=int, default=25 * 1024 * 1024)
parser.add_argument("--max-audio-seconds", type=float, default=300.0, help="Enforced for WAV inputs; other codecs remain size-capped.")
parser.add_argument("--max-transcript-chars", type=int, default=6000)
parser.add_argument("--timeout", type=int, default=300)
return parser
def main(argv: list[str] | None = None) -> int:
parser = build_parser()
args = parser.parse_args(argv)
try:
result = run_pipeline(args)
print(json.dumps(result, ensure_ascii=False, sort_keys=True))
return 0
except PipelineError as exc:
result = {"ok": False, "error": str(exc), "external_sends": 0, "writes": 0, **exc.details}
print(json.dumps(result, ensure_ascii=False, sort_keys=True), file=sys.stderr)
return exc.status
if __name__ == "__main__":
raise SystemExit(main())
@@ -3,7 +3,7 @@ type: runbook
system: openvino-npu-services
status: draft
created: 2026-06-04
updated: 2026-06-04
updated: 2026-06-05
tags:
- runbook
- openvino
@@ -18,33 +18,92 @@ related:
# OpenVINO NPU Services Runbook
This runbook is the integrated operations view for Will's local Intel NPU/OpenVINO services from the `npu-capability-expansion` board.
This runbook is the integrated operations view for Will's local Intel NPU/OpenVINO services after the first approved `npu-maximization` lanes. It treats the NPU as a local reflex layer: classify, embed, rerank, transcribe, triage, and draft compact advisory output while Atlas/Hermes keeps final authority unless a separate approval changes that.
Safety posture:
- Do not restart the live Atlas/Hermes gateway from this runbook.
- Do not change primary Atlas/Hermes routing without explicit Will approval.
- Do not delete, overwrite, or in-place reindex existing Chroma/vector collections.
- Treat HTTP 200 as necessary but not sufficient for NPU-backed services; verify `/sys/class/accel/accel0/device/npu_busy_time_us` before/after an inference.
- Keep endpoints local-only unless Will explicitly approves broader exposure.
- Keep raw prompts, private documents, OCR text, and secrets out of logs and durable handoffs.
- Treat HTTP 200 as necessary but not sufficient for NPU-backed services; verify `/sys/class/accel/accel0/device/npu_busy_time_us` before/after a real inference.
- Keep endpoints local-only or on the approved Docker bridge only; do not add wildcard binds.
- Keep raw prompts, private documents, OCR text, transcripts, and secrets out of logs and durable handoffs.
- Keep operational outputs compact: booleans, counts, paths, deltas, and gates rather than raw JSON dumps.
## Current service map
## Reflex-layer topology
| Capability | Port | Runtime / service | Path | State | Health endpoint | NPU proof |
| --- | ---: | --- | --- | --- | --- | --- |
| Obsidian/RAG endpoint | 18810 | `obsidian-reindex-endpoint.service` / local Python endpoint | `~/lab/swarm/scripts/` | live baseline; uses collection `obsidian_bge_npu` | `http://127.0.0.1:18810/healthz` | indirect via embeddings `:18817`; do not mutate existing collection |
| RAG/embedding health wrapper | 18814 | `rag-embedding-health.service` | `~/lab/swarm/swarm-common/rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | should exercise embeddings path when configured |
| Whisper transcription, OpenVINO NPU | 18816 | Docker Compose service/container `whisper-server-npu` | `~/lab/swarm/whisper-openvino-npu/` | live baseline | `http://127.0.0.1:18816/health` | transcription response includes `npu_busy_delta_us`; sysfs delta must increase |
| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz` | embedding response and sysfs delta must be positive |
| NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
| NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
| Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
| Document/image triage prototype | optional 18829 for review only; 18828 was an earlier smoke alternate | CLI-first; foreground local-only server if needed; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
```text
event / audio / doc / query / task
-> local OpenVINO/NPU specialists
embeddings :18817, rerank :18818, whisper :18816,
classifier :18819, genai worker :18820, doc/image triage :18829,
advisory gateway 172.19.0.1:18830
-> explicit policy and authority gates
-> Atlas/Hermes or human only when approved/useful
```
Authority split:
- NPU services may advise, label, score, transcribe, embed, rerank, triage explicit roots/files, and draft bounded summaries.
- NPU services must not route Atlas/Hermes, write memory, send outbound messages, restart services, execute tools, mutate Kanban, or mutate vector DBs without separate approval.
## Live baseline services
These are part of the current live local baseline. Use read-only checks unless Will explicitly asks for remediation.
| Capability | Port / bind | Runtime / service | State | Health / proof | Notes |
| --- | ---: | --- | --- | --- | --- |
| Obsidian/RAG endpoint | `18810` | `obsidian-reindex-endpoint.service` / local Python endpoint | live baseline | `http://127.0.0.1:18810/healthz`; NPU proof is indirect through embeddings/rerank | Uses collection `obsidian_bge_npu`; do not mutate/reindex in place. Discovery observed `RAG_RERANK_ENABLED=true` and `RAG_RERANK_REQUIRE_NPU_PROOF=true`; do not change from this runbook. |
| RAG/embedding health wrapper | `18814` | `rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | Health wrapper only; use compact summaries. |
| Whisper transcription | `18816` | Docker Compose service/container `whisper-server-npu` | live baseline | `http://127.0.0.1:18816/health`; transcription response plus sysfs busy delta must increase | Use small non-private WAV fixtures for proof. Do not restart from docs. |
| OpenVINO embeddings | `18817` | user systemd `openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz`; embedding response and sysfs delta must be positive | Model `bge-base-en-v1.5-int8-ov`, dim 768. Existing bind is broader than new-service guidance; do not broaden anything else. |
## Live local-only advisory specialists
These services are available locally for advisory/reflex work, not for authority. Some were originally prototypes but discovery/review found them active/enabled; do not reinstall or enable again blindly.
| Capability | Port / bind | Runtime / service | State | Health / proof | Authority boundary |
| --- | ---: | --- | --- | --- | --- |
| NPU reranker | `18818` localhost | `openvino-reranker.service` / `openvino-reranker-npu/` | live local specialist | `/readyz`; `/rerank` response and positive sysfs delta | Rerank only; no vector mutation. |
| NPU router/classifier | `18819` localhost | `openvino-router-classifier.service` / `openvino-classifier-npu/` | live local specialist, dry-run/advisory | `/healthz`; `/v1/classify` response and positive sysfs delta | Labels/recommendations only; no routing, sends, memory writes, restarts, or tool execution. |
| Small OpenVINO GenAI worker | `18820` localhost | `openvino-genai-npu-worker.service` / `openvino-genai-npu-worker/` | live local specialist; may report `loaded=false` until used | `/healthz`, `/models`; generation proof requires positive sysfs delta | Bounded draft/title/summary jobs only; not primary Atlas chat. Avoid cold-load generation unless the task requires it. |
| Document/image triage | `18829` localhost | `openvino-doc-image-triage-npu/` | live local specialist with explicit roots | `/healthz`, `/models`; v1 NPU proof is semantic embedding through `:18817` | Request roots may narrow configured roots, never broaden. OCR/image classification are CPU/local fallbacks. |
| Advisory gateway | `172.19.0.1:18830` approved bridge | `openvino-advisory-gateway.service` / `openvino-advisory-gateway/` | live bridge-facing advisory wrapper | `/healthz`; classify/generate/triage responses include NPU proof | For `n8n-agent` and host cron. POSTs can write metadata events, so use health-only unless classification/draft is in scope. No wildcard bind. |
Port notes:
- `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
- `18820` is reserved for the GenAI worker prototype. Use optional `18829` for document/image triage foreground review until Will approves a final persistent port. `18828` was used in earlier review smoke only and should not be treated as the preferred documented port.
- Existing `:18817` is currently bound on `0.0.0.0` by the user service; prototype services should still default to `127.0.0.1`.
- Prefer localhost for host-only sidecars. The advisory gateway bridge bind is intentionally for Docker bridge consumers such as `n8n-agent`.
- `18828` was an earlier review alternate for doc/image triage and should not be treated as the preferred documented port.
- Check listeners before foreground smokes: `ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18829|18830)\b'`.
## Dry-run examples and approved lane artifacts
The first-slice lanes below are approved as dry-run/local advisory examples. They may be merged into the repo by the integration lane, but they do not grant authority to mutate live Atlas/Hermes behavior.
| Lane | Approved branch / commit | Artifact paths | Safe use |
| --- | --- | --- | --- |
| Observability/utilization digest | `feature/npu-max-observability` @ `d661dc299` | `docs/npu-utilization-digest.md`, `scripts/npu-utilization-digest.py` | Read-only compact digest; can write JSONL under `~/.local/state/npu-utilization/digests` unless `--no-write`. Reviewer verified services_ok=9/9, proof_ok=5/5 on live smoke. |
| Context-gate advisory CLI | `feature/npu-max-context-gate` @ `b4ef90aff` | `openvino_context_gate/`, `scripts/context-gate-advisory.py` | Plans typed context bundle sources; no retrieval, routing, memory write, or private content. Classifier URL is loopback-only and redirects fail closed. |
| Cron/n8n advisory classifier | `feature/npu-max-cron-n8n` @ `54d3bcb7` | `openvino-advisory-gateway/docs/cron-n8n-advisory-classifier.md`, `examples/cron-advisory-dry-run.sh`, `examples/n8n-advisory-dry-run-fragment.json` | Dry-run event classification: duplicate/stale/no-op/action-required -> suppress/log/summarize/escalate recommendation, then human/Atlas gate before side effects. |
| Explicit-root batch doc/image/audio triage | `feature/npu-max-doc-audio-triage` @ `bfa62cddb` | `docs/npu-batch-triage-dry-run.md`, `scripts/npu-batch-triage-dry-run.py`, `config/triage-roots*.yaml` | Reads only approved/narrow staging roots; reports compact counts/proof; no file moves, Obsidian/RAG writes, sends, or vector mutation. Whisper endpoint override is loopback `:18816` only. |
| Voice/audio local-file pipeline | `feature/npu-max-voice` @ `534816249` | `docs/npu-voice-audio-pipeline.md`, `scripts/npu_voice_audio_pipeline.py` | Local audio file -> Whisper NPU -> classifier NPU -> advisory gate. No platform fetching, sends, writes, memory writes, or routing changes. |
| Kanban/task hygiene advisory | `feature/npu-max-kanban-hygiene` @ `575a3cef6` | `scripts/kanban-hygiene-advisory.py` | Reads compact board summaries and suggests labels/next gates only. Does not call Kanban tools or mutate the board. NPU proof failures dominate generic review-required gates. |
Dry-run command patterns:
```bash
# Compact service/proof digest; no artifact write during review.
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false
# Local-only context-gate planning; does not retrieve private content.
python scripts/context-gate-advisory.py --query "How do I check NPU reranker proof?" --format compact
# Cron/n8n event advisory wrapper; dry-run only, one compact decision line.
openvino-advisory-gateway/examples/cron-advisory-dry-run.sh npu-service-health warning health_check "openvino-reranker timeout twice" "service:openvino-reranker:timeout"
# Explicit-root triage; manifest root may be narrowed by --root, never broadened.
python scripts/npu-batch-triage-dry-run.py --manifest config/triage-roots.test.yaml --lane receipts --root openvino-doc-image-triage-npu/samples --limit 5 --dry-run --json
# Local-file audio advisory; transcript omitted unless explicitly requested.
/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py --audio /tmp/npu-voice-smoke.wav --title "synthetic smoke" --source manual_smoke --json
```
## Read-only unified health check
@@ -55,15 +114,15 @@ cd ~/lab/swarm
./scripts/npu-service-health.sh
```
The script is read-only. It checks listeners for `18810`, `18816`, `18817`, `18818`, `18819`, `18820`, `18829` plus the existing `18814` wrapper and `18828` review alternate, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
The script is read-only. It checks listeners for the live baseline and local specialists, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
Manual minimal checks:
```bash
BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
cat "$BUSY"
ss -ltnp | grep -E ':(18810|18816|18817|18818|18819|18820|18829)\b' || true
systemctl --user is-active openvino-embeddings.service rag-embedding-health.service
ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18829|18830)\b' || true
systemctl --user is-active openvino-embeddings.service rag-embedding-health.service openvino-reranker.service openvino-router-classifier.service openvino-genai-npu-worker.service openvino-doc-image-triage.service openvino-advisory-gateway.service
cd ~/lab/swarm && docker compose ps whisper-server-npu
curl -fsS http://127.0.0.1:18817/healthz | jq .
```
@@ -87,23 +146,7 @@ A healthy NPU path has:
## Service-specific smoke checks
For any foreground prototype server below, run it in a terminal you control or capture its PID and stop it at the end of the smoke. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning for these review smokes unless Will explicitly approved persistent service enablement.
Safe foreground-server pattern:
```bash
server_pid=""
cleanup() {
if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then
kill "$server_pid"
wait "$server_pid" 2>/dev/null || true
fi
}
trap cleanup EXIT
# start prototype server with --host 127.0.0.1 --port <port> &
# server_pid=$!
# run curl/smoke commands, then let trap stop it
```
For any foreground prototype/server smoke, run it in a terminal you control or capture its PID and stop it at the end. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning unless Will explicitly approved persistent service enablement. Several specialists are already live; do not start duplicate listeners.
### Whisper NPU (`:18816`)
@@ -115,7 +158,6 @@ curl -fsS http://127.0.0.1:18816/health | jq .
Operational notes:
- Managed as Docker Compose service/container `whisper-server-npu` in `~/lab/swarm`.
- Consistent with existing swarm service patterns because it is a containerized service with Compose health.
- Do not restart it from this runbook unless Will asked for remediation.
### OpenVINO embeddings (`:18817`)
@@ -127,26 +169,10 @@ curl -fsS http://127.0.0.1:18817/healthz | jq .
Operational notes:
- User systemd unit: `openvino-embeddings.service`.
- Model: `bge-base-en-v1.5-int8-ov`.
- Model directory: `/home/will/.cache/openvino-models/bge-base-en-v1.5-int8-ov`.
- Live RAG `:18810` uses Chroma collection `obsidian_bge_npu` through this service. Do not reindex or replace this collection in place.
### Reranker prototype (`:18818`)
Foreground review start only, after confirming port is free:
```bash
ss -ltnp | grep ':18818\b' || true
cd ~/lab/swarm/openvino-reranker-npu
source /home/will/.venvs/openvino-reranker/bin/activate
OPENVINO_RERANKER_HOST=127.0.0.1 \
OPENVINO_RERANKER_PORT=18818 \
OPENVINO_RERANKER_DEVICE=NPU \
OPENVINO_RERANKER_MODEL_DIR=/home/will/.cache/openvino-models/rerankers/ms-marco-MiniLM-L6-v2-int8-ov \
python server.py
```
From another shell:
### Reranker (`:18818`)
```bash
curl -fsS http://127.0.0.1:18818/readyz | jq .
@@ -154,107 +180,78 @@ python ~/lab/swarm/openvino-reranker-npu/smoke.py --url http://127.0.0.1:18818
```
Approval gate:
- May be installed as `openvino-reranker.service` only after foreground smoke and Will approval.
- May be integrated into RAG only behind disabled-by-default knobs such as `RAG_RERANK_ENABLED=false`; request-time reranking must not mutate Chroma.
- Rerank may score candidate passages only. Any change to RAG answer selection, rerank policy, or vector DB behavior requires separate approval and rollback notes.
### Router/classifier prototype (`:18819`)
Foreground review start only, after confirming port is free:
```bash
ss -ltnp | grep ':18819\b' || true
cd ~/lab/swarm/openvino-classifier-npu
/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819
```
Smoke:
### Router/classifier (`:18819`)
```bash
curl -fsS http://127.0.0.1:18819/healthz | jq .
curl -fsS http://127.0.0.1:18819/v1/classify \
-H 'Content-Type: application/json' \
-d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true,"dry_run":true}}' | jq .
-d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":false,"dry_run":true}}' | jq '{id, labels, npu_busy_delta_us, sysfs_npu_busy_delta_us}'
```
Approval gate:
- May be installed as `openvino-router-classifier.service` only after Will approves live service enablement.
- Must remain dry-run and must not alter Hermes/Atlas routing, memory writes, safety confirmation flow, or outbound messages without a separate explicit approval.
- Must remain dry-run/advisory and must not alter Hermes/Atlas routing, memory writes, safety confirmation flow, or outbound messages without a separate explicit approval.
### Small GenAI NPU worker (`:18820`)
Foreground review start only, after confirming port is free:
```bash
ss -ltnp | grep ':18820\b' || true
cd ~/lab/swarm/openvino-genai-npu-worker
/home/will/.venvs/npu/bin/python worker.py --host 127.0.0.1 --port 18820
```
Smoke:
```bash
curl -fsS http://127.0.0.1:18820/healthz | jq .
curl -fsS http://127.0.0.1:18820/models | jq .
curl -fsS http://127.0.0.1:18820/v1/worker/condense-notification \
-H 'Content-Type: application/json' \
-d '{"input":"Non-private smoke notification for local NPU worker.","max_new_tokens":64}' | jq .
```
Approval gate:
- May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
- Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.
- Must not become primary Atlas/Hermes model routing. Use only for bounded local jobs such as title, summary, notification condensation, and memory-candidate drafting after the relevant job is approved.
- Avoid generation smokes that cold-load the model unless the task explicitly calls for it.
### Document/image triage prototype (`:18829` optional review port)
Foreground review start only, after confirming the port is free:
```bash
ss -ltnp | grep ':18829\b' || true
cd ~/lab/swarm/openvino-doc-image-triage-npu
/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
```
Smoke:
### Document/image triage (`:18829`)
```bash
curl -fsS http://127.0.0.1:18829/healthz | jq .
curl -fsS http://127.0.0.1:18829/models | jq .
/home/will/.venvs/npu/bin/python tests/smoke_test.py
```
Approval gate:
- Do not point it at arbitrary directories; allowed roots must be equal to or under configured roots.
- Do not include raw OCR text or full source paths unless Will explicitly asks for a one-off response.
- Do not include raw OCR text or full source paths unless Will explicitly asks for one-off debugging.
- v1 only uses the NPU through `:18817` embeddings for needs-attention; image category classification and OCR are CPU/local fallbacks.
## Systemd and Compose recommendations
### Advisory gateway (`172.19.0.1:18830`)
Recommended management split:
- Keep containerized services in Docker Compose when they already have Docker build/runtime shape and Compose health (`whisper-server-npu`).
- Keep host-side OpenVINO Python prototypes as user systemd services when they depend on local venvs, sysfs NPU access, model caches, and localhost-only APIs (`openvino-embeddings`, optional reranker/classifier/GenAI worker).
- Do not add the prototypes to the live gateway or primary routing during installation. Installation and routing are separate approval gates.
```bash
curl -fsS http://172.19.0.1:18830/healthz | jq .
docker exec n8n-agent wget -qO- -T 8 http://172.19.0.1:18830/healthz
```
User-systemd unit expectations for optional prototypes:
- `WorkingDirectory` points at the service directory under `~/lab/swarm/`.
- `ExecStart` uses the existing venv path documented by the prototype.
- `Environment` pins host to `127.0.0.1`, port, model path, device `NPU`, and any upstream endpoint.
- `Restart=on-failure`, not aggressive restart loops.
- Logs go to user journal; do not log raw request bodies.
- Start manually for smoke; enable on boot only after Will approval.
Approval gate:
- Classification/generation/triage POSTs are advisory only and may write metadata counters. Do not wire outputs to sends, restarts, memory writes, tool execution, or Atlas/Hermes routing without a separate reviewed approval.
Compose expectations for existing swarm services:
- Prefer `cd ~/lab/swarm && make ps`, `make status`, and targeted `docker compose ps <service>` for read-only checks.
- Do not run `docker compose up -d`, restart containers, pull images, or prune volumes from this runbook without approval.
## Approval-gated / not-live integrations
The following remain closed even though dry-run examples and local specialists exist:
| Integration | Current gate |
| --- | --- |
| Primary Atlas/Hermes routing changes | closed; no live routing authority changes from this program slice |
| Memory writes from NPU classifier/GenAI/advisory gateway | closed |
| Telegram/Discord/email/outbound sends from cron/n8n/voice/advisory output | closed |
| Service restarts or tool execution triggered by classifier/gateway output | closed |
| Automatic Kanban task mutation, assignment, block/unblock, completion, or task creation | closed |
| Broad private document/image/audio root processing | closed; only explicit approved/narrow roots |
| Vector DB mutation/reindex or Chroma collection replacement | closed |
| Wildcard binds or broader exposure for new services | closed |
| GenAI worker as primary chat model | closed; bounded local drafts only |
| Diffusion/image generation on the NPU | rejected/parked for this program |
## Monitoring and logging notes
Minimum recurring monitoring should include:
- Listener presence for `18816`, `18817`, and any approved optional prototype ports.
- User service state for `openvino-embeddings.service` and any approved optional prototype unit.
- Docker Compose health for `whisper-server-npu`.
- Listener presence for live baseline and any approved specialist ports.
- User service state for OpenVINO services and Docker Compose health for `whisper-server-npu`.
- HTTP health endpoint success.
- Positive sysfs NPU busy-time delta on at least one non-private inference probe, preferably embeddings `:18817` because it is already live and central.
- Journal/container logs only at summary level. Avoid raw prompts, raw OCR text, private document names, credentials, and API keys.
- Compact counts/deltas/gates only. Avoid raw prompts, transcripts, OCR text, private document names, credentials, and API keys.
Useful log commands:
@@ -264,23 +261,14 @@ journalctl --user -u rag-embedding-health.service -n 100 --no-pager
journalctl --user -u openvino-reranker.service -n 100 --no-pager
journalctl --user -u openvino-router-classifier.service -n 100 --no-pager
journalctl --user -u openvino-genai-npu-worker.service -n 100 --no-pager
journalctl --user -u openvino-advisory-gateway.service -n 100 --no-pager
cd ~/lab/swarm && docker compose logs --tail 100 whisper-server-npu
```
## Approval gates
## Approved/parked outcomes
Requires explicit Will approval before proceeding:
- Installing, enabling, or autostarting `openvino-reranker.service`, `openvino-router-classifier.service`, or `openvino-genai-npu-worker.service`.
- Assigning a final persistent port to document/image triage or enabling it as a persistent service.
- Enabling live RAG reranking or any request path that changes Atlas/RAG answers.
- Changing primary Atlas/Hermes routing or connecting router/classifier outputs to live decisions.
- Connecting the GenAI worker to primary Atlas chat, gateway routing, memory writes, or outbound notifications.
- Restarting the live Atlas/Hermes gateway.
- Deleting, overwriting, or in-place reindexing existing vector collections.
- Broadening bind addresses or exposure beyond local-only defaults.
Approved/parked outcomes:
- Built/approved prototypes: reranker (`:18818`), router/classifier (`:18819`), small GenAI worker (`:18820`), document/image triage (review ports `:18828`/`:18829`).
- Live baseline retained: Whisper NPU (`:18816`), OpenVINO embeddings (`:18817`), RAG endpoint (`:18810`) using `obsidian_bge_npu`.
- Live baseline retained: RAG endpoint (`:18810`), RAG health wrapper (`:18814`), Whisper NPU (`:18816`), OpenVINO embeddings (`:18817`).
- Live local-only advisory/reflex specialists: reranker (`:18818`), router/classifier (`:18819`), GenAI worker (`:18820`), doc/image triage (`:18829`), advisory gateway bridge (`172.19.0.1:18830`).
- Approved dry-run examples: utilization digest, context gate plan, cron/n8n advisory classifier, explicit-root batch triage, local-file voice/audio pipeline, Kanban hygiene advisory.
- Parked: always-on wake-word/audio and conventional vision detection until Will wants a concrete use case.
- Rejected for this NPU program: diffusion/image generation.
+200
View File
@@ -0,0 +1,200 @@
from __future__ import annotations
import json
import socket
import subprocess
import sys
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
import pytest
REPO_ROOT = Path(__file__).resolve().parents[1]
if str(REPO_ROOT) not in sys.path:
sys.path.insert(0, str(REPO_ROOT))
from openvino_context_gate.context_gate import ( # noqa: E402
AUTHORITY,
ClassifierResult,
ContextGateError,
build_plan,
classify_live,
compact_json,
compact_line,
)
def fake_classifier(
labels: dict,
*,
endpoint_delta: int | None = 120,
sysfs_delta: int | None = 120,
outer_delta: int | None = 80,
) -> ClassifierResult:
return ClassifierResult(
labels=labels,
npu_busy_delta_us=endpoint_delta,
sysfs_npu_busy_delta_us=sysfs_delta,
outer_sysfs_delta_us=outer_delta,
live=True,
)
def labels(category: str, *, tool: bool = False, safety: bool = False, memory: str = "none") -> dict:
return {
"tool_needed": {"value": tool, "confidence": 0.8 if tool else 0.4},
"memory_candidate": {"value": memory, "confidence": 0.8 if memory != "none" else 0.3},
"urgency": {"value": "normal", "confidence": 0.6},
"workflow_category": {"value": category, "confidence": 0.86},
"safety_confirmation_required": {"value": safety, "confidence": 0.9 if safety else 0.1},
}
def test_current_npu_debug_query_selects_ops_live_and_repo_sources() -> None:
plan = build_plan(
"How do I check whether the RAG reranker is using the NPU?",
context={"platform": "cli", "repo_path": "/home/will/lab/swarm"},
classifier=fake_classifier(labels("devops", tool=True)),
)
assert plan["schema"] == "atlas_context_gate_plan_v1"
assert plan["bundle_plan"]["bundle_name"] == "OpsDebugBundle"
assert [s["source"] for s in plan["source_plan"]][:2] == ["live_system", "repo_files"]
assert plan["npu_proof"]["verified"] is True
assert plan["authority"] == AUTHORITY
assert all(value.startswith("closed_") for value in plan["gates"].values())
def test_prior_plan_query_uses_session_or_rag_and_coding_for_kanban() -> None:
plan = build_plan(
"Where did we leave the NPU context gate implementation plan?",
context={"platform": "kanban", "task_id": "t_example", "repo_path": "/home/will/lab/swarm"},
classifier=fake_classifier(labels("coding", tool=True)),
)
sources = [s["source"] for s in plan["source_plan"]]
assert plan["bundle_plan"]["bundle_name"] == "CodingTaskBundle"
assert "repo_files" in sources
assert "session_search" in sources
assert "rag_search" in sources
def test_simple_creative_query_no_retrieval_offline_no_npu_claim() -> None:
plan = build_plan("Write a haiku about Seattle rain.")
assert plan["bundle_plan"]["bundle_name"] == "SimpleResponseBundle"
assert [s["source"] for s in plan["source_plan"]] == ["no_retrieval"]
assert plan["npu_proof"]["verified"] is False
assert "npu_proof_inconclusive" in plan["warnings"]
assert "offline_heuristic_classifier_no_npu_claim" in plan["warnings"]
def test_unsafe_live_routing_request_keeps_authority_closed_and_blocks_side_effect() -> None:
plan = build_plan(
"Change Hermes live routing to use the classifier automatically.",
context={"repo_path": "/home/will/lab/swarm"},
classifier=fake_classifier(labels("coding", tool=True, safety=True)),
)
assert plan["authority"] == AUTHORITY
assert plan["authority"]["may_route"] is False
assert any(field["field"] == "authority_side_effect" for field in plan["bundle_plan"]["blocked_fields"])
assert plan["gates"]["live_routing_change"] == "closed_requires_explicit_approval"
def test_rejects_non_dry_run_and_private_text_options() -> None:
with pytest.raises(ContextGateError, match="dry_run_must_remain_true"):
build_plan("hello", options={"dry_run": False})
with pytest.raises(ContextGateError, match="include_private_text"):
build_plan("hello", options={"include_private_text": True})
def test_compact_outputs_are_small_and_parseable() -> None:
plan = build_plan("How do I check whether port 18819 is healthy?")
line = compact_line(plan)
assert "schema=atlas_context_gate_plan_v1" in line
assert "gates=closed:" in line
parsed = json.loads(compact_json(plan))
assert parsed["schema"] == "atlas_context_gate_plan_v1"
assert isinstance(parsed["sources"], list)
assert "authority" in parsed
def test_cli_offline_compact_json_smoke() -> None:
script = REPO_ROOT / "scripts" / "context-gate-advisory.py"
result = subprocess.run(
[sys.executable, str(script), "--offline", "--query", "Write a haiku about Seattle rain.", "--format", "compact-json"],
check=True,
text=True,
capture_output=True,
cwd=REPO_ROOT,
)
parsed = json.loads(result.stdout)
assert parsed["ok"] is True
assert parsed["bundle_name"] == "SimpleResponseBundle"
assert parsed["sources"] == ["no_retrieval"]
assert parsed["npu_proof"]["verified"] is False
def test_npu_proof_requires_positive_sysfs_delta() -> None:
classifier = fake_classifier(labels("devops", tool=True), endpoint_delta=120, sysfs_delta=0, outer_delta=None)
plan = build_plan("How do I check whether the RAG reranker is using the NPU?", classifier=classifier)
assert plan["npu_proof"]["verified"] is False
assert "npu_proof_inconclusive" in plan["warnings"]
endpoint_sysfs_plan = build_plan(
"How do I check whether the RAG reranker is using the NPU?",
classifier=fake_classifier(labels("devops", tool=True), endpoint_delta=120, sysfs_delta=1, outer_delta=None),
)
assert endpoint_sysfs_plan["npu_proof"]["verified"] is True
outer_sysfs_plan = build_plan(
"How do I check whether the RAG reranker is using the NPU?",
classifier=fake_classifier(labels("devops", tool=True), endpoint_delta=120, sysfs_delta=0, outer_delta=1),
)
assert outer_sysfs_plan["npu_proof"]["verified"] is True
def test_classifier_url_must_be_loopback_or_localhost() -> None:
for url in [
"http://example.com/v1/classify",
"https://10.0.0.5/v1/classify",
"http://0.0.0.0:18819/v1/classify",
"ftp://127.0.0.1/v1/classify",
]:
with pytest.raises(ContextGateError, match="invalid_classifier_url"):
classify_live("hello", classifier_url=url, timeout=0.01)
def test_classifier_url_redirect_to_non_loopback_is_not_followed(monkeypatch: pytest.MonkeyPatch) -> None:
requests: list[str] = []
class RedirectHandler(BaseHTTPRequestHandler):
def do_POST(self) -> None: # noqa: N802 - stdlib callback name
requests.append(self.path)
self.send_response(302)
self.send_header("Location", "http://example.com/v1/classify")
self.end_headers()
def log_message(self, format: str, *args: object) -> None:
return
original_create_connection = socket.create_connection
def guarded_create_connection(address, *args, **kwargs): # type: ignore[no-untyped-def]
host = address[0]
if host not in {"127.0.0.1", "localhost"}:
raise AssertionError(f"attempted non-loopback redirect connection to {host}")
return original_create_connection(address, *args, **kwargs)
server = ThreadingHTTPServer(("127.0.0.1", 0), RedirectHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
monkeypatch.setattr(socket, "create_connection", guarded_create_connection)
try:
url = f"http://127.0.0.1:{server.server_port}/v1/classify"
with pytest.raises(ContextGateError, match="classifier_unavailable"):
classify_live("hello", classifier_url=url, timeout=1.0)
finally:
server.shutdown()
server.server_close()
thread.join(timeout=2)
assert requests == ["/v1/classify"]
+198
View File
@@ -0,0 +1,198 @@
from __future__ import annotations
import importlib.util
import json
import subprocess
import sys
import tempfile
import unittest
from pathlib import Path
from typing import cast
ROOT = Path(__file__).resolve().parents[1]
MODULE_PATH = ROOT / "scripts" / "kanban-hygiene-advisory.py"
def load_module():
spec = importlib.util.spec_from_file_location("kanban_hygiene_advisory", MODULE_PATH)
assert spec and spec.loader
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)
return module
def task(task_id: str, title: str, status: str = "ready", **extra):
row = {
"id": task_id,
"title": title,
"status": status,
"assignee": "engineer",
"created_at": 1_780_000_000,
"updated_at": 1_780_000_100,
}
row.update(extra)
return row
class KanbanHygieneAdvisoryTests(unittest.TestCase):
def setUp(self):
self.mod = load_module()
def advisory(self, tasks, now=1_780_003_600):
return self.mod.advisory(
tasks,
board="npu-maximization",
now=now,
input_metadata={},
include_evidence=False,
)
def test_output_contract_and_authority_flags_are_all_false(self):
output = self.advisory([
task("t_spec", "spec: Kanban/task hygiene classifier", body_excerpt="Define dry-run labels and next gate.")
])
self.assertEqual(output["schema"], "kanban_hygiene_advisory_v1")
self.assertTrue(output["dry_run"])
self.assertEqual(output["counts"]["tasks"], 1)
self.assertTrue(output["npu_proof"]["required_for_npu_claims"])
self.assertFalse(output["npu_proof"]["attempted"])
self.assertTrue(output["authority"])
self.assertTrue(all(value is False for value in output["authority"].values()))
def test_required_labels_and_kanban_lane_gate(self):
output = self.advisory([
task("t1", "spec: Kanban/task hygiene classifier", body_excerpt="Read board summaries and suggest review-needed next gate labels.")
])
item = output["items"][0]
for key in ["task_type", "project", "lane", "blocker", "staleness", "duplicate", "review_needed", "next_gate"]:
self.assertIn(key, item)
self.assertEqual(item["task_type"]["value"], "spec")
self.assertEqual(item["project"]["value"], "npu-maximization")
self.assertEqual(item["lane"]["value"], "kanban_hygiene")
self.assertEqual(item["next_gate"]["value"], "ready_for_implementation")
def test_lifecycle_chain_is_not_duplicate_even_with_same_normalized_title(self):
rows = [
task("t_spec", "spec: Kanban hygiene advisory", children=["t_impl"]),
task("t_impl", "implement: Kanban hygiene advisory", parents=["t_spec"], children=["t_review"]),
task("t_review", "review: Kanban hygiene advisory", parents=["t_impl"]),
]
output = self.advisory(rows)
self.assertEqual(output["counts"]["duplicates"], 0)
self.assertTrue(all(not item["duplicate"]["is_duplicate"] for item in output["items"]))
def test_duplicate_same_type_lane_and_normalized_title_is_flagged(self):
rows = [
task("t_a", "implement: dry-run Kanban hygiene advisory", body_excerpt="Kanban board summaries"),
task("t_b", "implement: dry run kanban hygiene advisory", body_excerpt="Kanban board summaries"),
]
output = self.advisory(rows)
self.assertEqual(output["counts"]["duplicates"], 1)
dupes = [item for item in output["items"] if item["duplicate"]["is_duplicate"]]
self.assertEqual(len(dupes), 1)
self.assertEqual(dupes[0]["next_gate"]["value"], "dedupe_review")
def test_staleness_is_deterministic_with_now(self):
output = self.advisory([
task("t_run", "implement: NPU service", status="running", updated_at=1_780_000_000, heartbeat_at=1_780_000_000)
], now=1_780_007_201)
item = output["items"][0]
self.assertEqual(item["staleness"]["value"], "stale_lock")
self.assertEqual(item["next_gate"]["value"], "investigate_stale_lock")
self.assertEqual(output["counts"]["stale"], 1)
def test_review_required_marker_sets_ready_for_review(self):
output = self.advisory([
task(
"t_impl",
"implement: dry-run Kanban hygiene advisory",
status="blocked",
body_excerpt="review-required: code change needs review",
changed_files=["scripts/kanban-hygiene-advisory.py"],
tests_run=8,
)
])
item = output["items"][0]
self.assertTrue(item["review_needed"]["value"])
self.assertEqual(item["review_needed"]["kind"], "code_change")
self.assertEqual(item["next_gate"]["value"], "ready_for_review")
def test_missing_parent_waits_without_marking_blocked(self):
output = self.advisory([
task("t_child", "implement: context gate", status="todo", parents=["t_parent"], body_excerpt="RAG context gate")
])
item = output["items"][0]
self.assertEqual(item["blocker"]["value"], "missing_parent")
self.assertFalse(item["blocker"]["blocked"])
self.assertEqual(item["next_gate"]["value"], "wait_for_parents")
def test_npu_claim_without_busy_delta_routes_to_proof_gate(self):
for excerpt in [
"NPU classifier returned HTTP 200 but missing busy delta evidence",
"NPU reranker reported npu_busy_delta_us=0",
"NPU reranker reported npu_busy_delta_us=-5",
"NPU reranker reported npu_busy_delta_us=-0.1",
]:
with self.subTest(excerpt=excerpt):
output = self.advisory([task("t_npu", "test: NPU classifier smoke", body_excerpt=excerpt)])
item = output["items"][0]
self.assertTrue(item["review_needed"]["value"])
self.assertEqual(item["review_needed"]["kind"], "npu_proof_gate")
self.assertEqual(item["next_gate"]["value"], "needs_npu_proof")
def test_npu_proof_gate_dominates_review_required_marker(self):
for excerpt in [
"review-required: NPU reranker reported npu_busy_delta_us=0 after smoke",
"review-required: NPU classifier returned HTTP 200 but missing busy delta evidence",
]:
with self.subTest(excerpt=excerpt):
output = self.advisory([
task(
"t_npu_review",
"implement: NPU classifier smoke",
status="blocked",
body_excerpt=excerpt,
changed_files=["scripts/npu-classifier.py"],
tests_run=1,
)
])
item = output["items"][0]
self.assertTrue(item["review_needed"]["value"])
self.assertEqual(item["review_needed"]["kind"], "npu_proof_gate")
self.assertEqual(item["next_gate"]["value"], "needs_npu_proof")
def test_cli_accepts_jsonl_auto_format_and_invalid_schema_exits_nonzero(self):
good_rows = [
json.dumps(task("t1", "docs: service map update", body_excerpt="runbook README")),
json.dumps(task("t2", "ops: utilization digest", body_excerpt="health metrics digest")),
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as handle:
handle.write("\n".join(good_rows))
good_path = handle.name
try:
result = subprocess.run(
[sys.executable, str(MODULE_PATH), "--input", good_path, "--board", "npu-maximization", "--now", "1780003600"],
capture_output=True,
text=True,
check=False,
)
finally:
Path(good_path).unlink(missing_ok=True)
self.assertEqual(result.returncode, 0, result.stderr)
parsed = json.loads(result.stdout)
self.assertEqual(parsed["counts"]["tasks"], 2)
bad = subprocess.run(
[sys.executable, str(MODULE_PATH)],
input=json.dumps({"tasks": [{"id": "missing-fields"}]}),
capture_output=True,
text=True,
check=False,
)
self.assertNotEqual(bad.returncode, 0)
self.assertIn("missing required fields", bad.stderr)
if __name__ == "__main__":
unittest.main()
+202
View File
@@ -0,0 +1,202 @@
from __future__ import annotations
import importlib.util
import json
import os
import subprocess
import sys
from pathlib import Path
import pytest
import yaml
ROOT = Path(__file__).resolve().parents[1]
SCRIPT = ROOT / "scripts" / "npu-batch-triage-dry-run.py"
def load_script_module():
spec = importlib.util.spec_from_file_location("npu_batch_triage_dry_run", SCRIPT)
assert spec and spec.loader
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
def run_cli(*args: str) -> subprocess.CompletedProcess[str]:
return subprocess.run(
[sys.executable, str(SCRIPT), *args],
cwd=ROOT,
text=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
check=False,
)
def manifest(tmp_path: Path, lane: str = "screenshots", approved: bool = True, root: Path | None = None, exts: list[str] | None = None) -> Path:
root = root or tmp_path / "approved"
root.mkdir(parents=True, exist_ok=True)
data = {
"version": 1,
"policy": {
"default_mode": "dry_run",
"require_explicit_root": True,
"allow_external_uploads": False,
"allow_mutations": False,
"log_raw_text": False,
},
"roots": {
lane: {
"approved": approved,
"root": str(root),
"allowed_extensions": exts or [".png", ".txt", ".wav"],
"max_files": 10,
"max_file_mb": 5,
}
},
}
path = tmp_path / "triage-roots.yaml"
path.write_text(yaml.safe_dump(data))
return path
def test_missing_manifest_fails_closed(tmp_path: Path) -> None:
result = run_cli("--manifest", str(tmp_path / "missing.yaml"), "--lane", "screenshots", "--dry-run", "--no-npu", "--json")
out = json.loads(result.stdout)
assert result.returncode == 2
assert out["ok"] is False
assert out["error"] == "fail_closed"
assert "manifest_missing" in out["reason"]
assert all(v is False for v in out["mutations"].values())
def test_unapproved_lane_fails_closed(tmp_path: Path) -> None:
man = manifest(tmp_path, approved=False)
result = run_cli("--manifest", str(man), "--lane", "screenshots", "--dry-run", "--no-npu", "--json")
out = json.loads(result.stdout)
assert result.returncode == 2
assert out["ok"] is False
assert "lane_unapproved:screenshots" in out["reason"]
def test_request_root_cannot_broaden_manifest_root(tmp_path: Path) -> None:
approved = tmp_path / "approved" / "narrow"
man = manifest(tmp_path, root=approved)
broad = tmp_path / "approved"
result = run_cli("--manifest", str(man), "--lane", "screenshots", "--root", str(broad), "--dry-run", "--no-npu", "--json")
out = json.loads(result.stdout)
assert result.returncode == 2
assert out["ok"] is False
assert "request_root_broadens_approval:screenshots" in out["reason"]
def test_symlink_escape_is_skipped_and_output_redacted(tmp_path: Path) -> None:
approved = tmp_path / "approved"
approved.mkdir()
outside = tmp_path / "outside"
outside.mkdir()
(approved / "note.png").write_bytes(b"fake image")
(approved / "note.png.txt").write_text("Invoice payment due 2026-06-10 $42.00")
(outside / "secret.png").write_bytes(b"secret")
os.symlink(outside / "secret.png", approved / "escape.png")
man = manifest(tmp_path, root=approved, exts=[".png"])
result = run_cli("--manifest", str(man), "--lane", "screenshots", "--dry-run", "--no-npu", "--json")
out = json.loads(result.stdout)
assert result.returncode == 0
assert out["ok"] is True
assert out["files_processed"] == 1
assert out["skipped"]["symlink_escape"] == 1
item = out["items"][0]
assert item["basename"] == "note.png"
assert item["raw_text_redacted"] is True
assert item["full_path_included"] is False
assert "full_path" not in item
assert "Invoice" not in json.dumps(out)
assert out["npu"]["claimed"] is False
assert all(v is False for v in out["mutations"].values())
def test_committed_sample_manifest_cpu_smoke() -> None:
result = run_cli(
"--manifest",
"config/triage-roots.test.yaml",
"--lane",
"receipts",
"--root",
"openvino-doc-image-triage-npu/samples",
"--limit",
"2",
"--dry-run",
"--no-npu",
"--json",
)
out = json.loads(result.stdout)
assert result.returncode == 0
assert out["ok"] is True
assert out["lane"] == "receipts"
assert out["dry_run"] is True
assert out["files_processed"] == 2
assert out["npu"] == {"busy_delta_us": None, "claimed": False, "proof_ok": False} or out["npu"]["claimed"] is False
assert all(v is False for v in out["mutations"].values())
assert all(item.get("raw_text_redacted", True) for item in out["items"])
def test_audio_lane_no_npu_does_not_transcribe_or_claim(tmp_path: Path) -> None:
approved = tmp_path / "voice"
approved.mkdir()
(approved / "memo.wav").write_bytes(b"not really wav; no-npu mode must not decode")
man = manifest(tmp_path, lane="voice_memos", root=approved, exts=[".wav"])
result = run_cli("--manifest", str(man), "--lane", "voice_memos", "--dry-run", "--no-npu", "--json")
out = json.loads(result.stdout)
assert result.returncode == 0
item = out["items"][0]
assert item["transcribed"] is False
assert item["raw_transcript_logged"] is False
assert out["npu"]["claimed"] is False
assert all(v is False for v in out["mutations"].values())
def test_external_whisper_url_fails_closed_before_audio_read(tmp_path: Path) -> None:
module = load_script_module()
with pytest.raises(module.FailClosed, match="whisper_url_not_loopback"):
module.multipart_transcribe(tmp_path / "missing.wav", "http://example.com:18816/v1/audio/transcriptions", 0.01)
def test_audio_lane_rejects_external_whisper_url(tmp_path: Path) -> None:
approved = tmp_path / "voice"
approved.mkdir()
(approved / "memo.wav").write_bytes(b"synthetic audio bytes")
man = manifest(tmp_path, lane="voice_memos", root=approved, exts=[".wav"])
result = run_cli(
"--manifest",
str(man),
"--lane",
"voice_memos",
"--dry-run",
"--whisper-url",
"https://example.com/v1/audio/transcriptions",
"--json",
)
out = json.loads(result.stdout)
assert result.returncode == 2
assert out["ok"] is False
assert out["error"] == "fail_closed"
assert out["reason"] == "whisper_url_scheme_not_http"
assert out["mutations"] == {
"obsidian": False,
"rag": False,
"vector_db": False,
"sends": False,
"file_moves": False,
"routing": False,
"memory": False,
"service_restarts": False,
}
def test_localhost_whisper_url_is_allowed() -> None:
module = load_script_module()
assert module.validate_local_whisper_url("http://localhost:18816/v1/audio/transcriptions")
assert module.validate_local_whisper_url("http://127.0.0.1:18816/v1/audio/transcriptions")
assert module.validate_local_whisper_url("http://[::1]:18816/v1/audio/transcriptions")
+162
View File
@@ -0,0 +1,162 @@
from __future__ import annotations
import importlib.util
import json
import sys
from pathlib import Path
SCRIPT = Path(__file__).resolve().parents[1] / "scripts" / "npu-utilization-digest.py"
spec = importlib.util.spec_from_file_location("npu_utilization_digest", SCRIPT)
assert spec and spec.loader
digest = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = digest
spec.loader.exec_module(digest)
def test_positive_sysfs_delta_required_for_proof(tmp_path, monkeypatch):
busy = tmp_path / "busy"
busy.write_text("100")
def fake_health(service, url, timeout, gate="none", mode="health_only"):
return digest.ServiceRow(service=service, reachable=True, mode=mode, gate=gate), {"ok": True}
def fake_post(url, payload, timeout):
busy.write_text("100")
return 200, {"data": [{"embedding": [0.1, 0.2]}], "embedding_dim": 2, "npu_busy_delta_us": 7}
monkeypatch.setattr(digest, "health_row", fake_health)
row = digest.probe_embeddings(1, busy_path=busy, post_json=fake_post)
assert row.probe_ran is True
assert row.proof_ok is False
assert "no_positive_sysfs_delta" in row.warnings
def test_embeddings_row_redacts_vectors(tmp_path):
row = digest.ServiceRow(
service="embeddings",
reachable=True,
probe_ran=True,
proof_ok=True,
calls=1,
items=1,
avg_ms=12.3,
npu_delta_us=5,
embedding_count=1,
embedding_dim=3,
mode="NPU",
)
summary = digest.build_summary([row], None, 5, "2026-06-05T14:20:00-07:00")
text = digest.render_text(summary, [row])
assert "embedding_count" not in text # counts are intentionally terse in text
assert "0.1" not in text
out = digest.write_jsonl(summary, [row], tmp_path)
body = out.read_text()
assert "embedding" in body # compact metadata key is okay
assert "[0.1" not in body
assert "embedding_dim" in body
def test_classifier_dry_run_payload(tmp_path, monkeypatch):
busy = tmp_path / "busy"
busy.write_text("10")
seen = {}
def fake_health(service, url, timeout, gate="none", mode="health_only"):
return digest.ServiceRow(service=service, reachable=True, mode=mode, gate=gate), {"ok": True}
def fake_post(url, payload, timeout):
seen.update(payload)
busy.write_text("35")
return 200, {"labels": {"tool_needed": True, "duplicate": False}, "npu_busy_delta_us": 25}
monkeypatch.setattr(digest, "health_row", fake_health)
row = digest.probe_classifier(1, busy_path=busy, post_json=fake_post)
assert seen["options"]["dry_run"] is True
assert seen["options"]["include_evidence"] is False
assert row.escalate == 1
assert row.suppress == 0
assert row.proof_ok is True
def test_doc_triage_allowed_root_compact(tmp_path):
row = digest.ServiceRow(
service="doc_triage",
reachable=True,
probe_ran=True,
proof_ok=True,
files=1,
avg_ms=9,
npu_delta_us=11,
allowed_roots_count=1,
mode="NPU-via-embedding-service",
gate="closed:private-root",
)
summary = digest.build_summary([row], None, 11, "2026-06-05T14:20:00-07:00")
text = digest.render_text(summary, [row])
assert "allowed_roots=1" in text
assert str(tmp_path) not in text
def test_advisory_gateway_health_only(monkeypatch):
calls = []
def fake_get(url, timeout):
return 200, {"ok": True}
def fake_post(url, payload, timeout):
calls.append((url, payload))
return 200, {}
monkeypatch.setattr(digest, "http_get_json", fake_get)
monkeypatch.setattr(digest, "http_post_json", fake_post)
row, _ = digest.health_row("advisory_gateway", digest.ADVISORY_HEALTH_URL, 1, gate="closed:advisory-post")
assert row.reachable is True
assert row.probe_ran is False
assert row.mode == "health_only"
assert calls == []
def test_genai_loaded_false_skips_default_smoke(monkeypatch):
def fake_health(service, url, timeout, gate="none", mode="health_only"):
return digest.ServiceRow(service=service, reachable=True, mode=mode), {"ok": True, "loaded": False}
monkeypatch.setattr(digest, "health_row", fake_health)
row = digest.probe_genai(1, include_smoke=False)
assert row.probe_ran is False
assert row.loaded is False
assert row.reason == "skipped_cold_load"
assert row.fallbacks == 1
assert "skipped_cold_load" in row.warnings
def test_disabled_proof_smokes_count_as_fallbacks(monkeypatch):
def fake_health(service, url, timeout, gate="none", mode="health_only"):
return digest.ServiceRow(service=service, reachable=True, mode=mode, gate=gate), {"ok": True, "loaded": True}
monkeypatch.setattr(digest, "health_row", fake_health)
rows = [
digest.probe_whisper(1, include_smoke=False),
digest.probe_genai(1, include_smoke=False),
digest.probe_doc_triage(1, include_smoke=False),
]
summary = digest.build_summary(rows, None, 0, "2026-06-05T14:20:00-07:00")
assert summary["fallbacks"] == 3
assert summary["warnings"] == {"skipped": 2, "smoke_disabled": 1}
def test_jsonl_shape(tmp_path):
rows = [digest.ServiceRow(service="embeddings", reachable=True, probe_ran=True, proof_ok=True, npu_delta_us=1)]
summary = digest.build_summary(rows, None, 1, "2026-06-05T14:20:00-07:00")
path = digest.write_jsonl(summary, rows, tmp_path)
lines = [json.loads(line) for line in path.read_text().splitlines()]
assert lines[0]["type"] == "summary"
assert lines[1]["type"] == "service"
assert lines[1]["service"] == "embeddings"
def test_exit_codes(monkeypatch):
rows = [digest.ServiceRow(service="embeddings", reachable=True, probe_ran=True, proof_ok=False, warnings=["no_positive_sysfs_delta"])]
summary = digest.build_summary(rows, None, 0, "2026-06-05T14:20:00-07:00")
monkeypatch.setattr(digest, "run", lambda args: (summary, rows))
assert digest.main(["--no-write"]) == 0
assert digest.main(["--no-write", "--strict-proof"]) == 2
+170
View File
@@ -0,0 +1,170 @@
import importlib.util
import json
import sys
import types
import unittest
from argparse import Namespace
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import cast
from unittest import mock
MODULE_PATH = Path(__file__).resolve().parents[1] / "scripts" / "npu_voice_audio_pipeline.py"
def load_module():
spec = importlib.util.spec_from_file_location("npu_voice_audio_pipeline", MODULE_PATH)
assert spec is not None and spec.loader is not None
module = importlib.util.module_from_spec(spec)
sys.modules[spec.name] = module
spec.loader.exec_module(module)
return cast(types.ModuleType, module)
class NpuVoiceAudioPipelineTests(unittest.TestCase):
def setUp(self):
self.pipeline = load_module()
def test_rejects_relative_audio_path(self):
with self.assertRaisesRegex(self.pipeline.PipelineError, "audio_path_must_be_absolute"):
self.pipeline.validate_audio_path("memo.wav", max_bytes=1024, max_audio_seconds=300)
def test_rejects_symlink_audio_path(self):
with TemporaryDirectory() as tmp:
root = Path(tmp)
target = root / "memo.wav"
target.write_bytes(b"RIFFfake")
link = root / "link.wav"
link.symlink_to(target)
with self.assertRaisesRegex(self.pipeline.PipelineError, "audio_path_must_not_be_symlink"):
self.pipeline.validate_audio_path(str(link), max_bytes=1024, max_audio_seconds=None)
def test_compact_labels_unwraps_classifier_label_values(self):
labels = self.pipeline.compact_labels(
{
"labels": {
"workflow_category": {"value": "media"},
"tool_needed": {"value": True},
"urgency": {"value": "high"},
"safety_confirmation_required": {"value": False},
}
}
)
self.assertEqual(labels["workflow_category"], "media")
self.assertTrue(labels["tool_needed"])
self.assertEqual(labels["urgency"], "high")
self.assertFalse(labels["safety_confirmation_required"])
def test_gate_blocks_missing_npu_proof(self):
action_worthy, atlas_gate, next_gate = self.pipeline.decide_gate(
"remind me to review logs",
{"tool_needed": True, "urgency": "normal", "safety_confirmation_required": False},
whisper_proven=False,
classifier_proven=True,
)
self.assertTrue(action_worthy)
self.assertEqual(atlas_gate, "blocked_missing_npu_proof")
self.assertEqual(next_gate, "npu_proof_required")
def test_loopback_endpoint_policy_accepts_local_urls(self):
allowed = [
"http://localhost:18816/v1/audio/transcriptions",
"https://localhost:18816/v1/audio/transcriptions",
"http://127.0.0.1:18816/v1/audio/transcriptions",
"http://127.42.0.9:18816/v1/audio/transcriptions",
"http://[::1]:18816/v1/audio/transcriptions",
]
for url in allowed:
with self.subTest(url=url):
self.assertEqual(self.pipeline.validate_loopback_endpoint(url, label="whisper"), url)
def test_loopback_endpoint_policy_rejects_remote_urls(self):
rejected = [
"http://example.com:18816/v1/audio/transcriptions",
"https://10.0.0.5:18816/v1/audio/transcriptions",
"http://192.168.1.10:18816/v1/audio/transcriptions",
"http://[2001:db8::1]:18816/v1/audio/transcriptions",
"file:///tmp/audio.wav",
]
for url in rejected:
with self.subTest(url=url):
with self.assertRaisesRegex(self.pipeline.PipelineError, "whisper_url_.*not_.*|whisper_url_scheme_not_allowed"):
self.pipeline.validate_loopback_endpoint(url, label="whisper")
def test_run_pipeline_rejects_remote_url_before_audio_read(self):
args = Namespace(
audio="/tmp/does-not-exist-remote-rejection-smoke.ogg",
id="voice-smoke",
source="local_file",
title="synthetic smoke",
language="en",
whisper_url="http://example.com:18816/v1/audio/transcriptions",
classifier_url="http://127.0.0.1:18819/v1/classify",
dry_run=True,
include_transcript=False,
include_transcript_preview_chars=0,
include_raw=False,
max_bytes=1024 * 1024,
max_audio_seconds=300,
max_transcript_chars=6000,
timeout=1,
)
with self.assertRaisesRegex(self.pipeline.PipelineError, "whisper_url_host_not_loopback"):
self.pipeline.run_pipeline(args)
def test_run_pipeline_compact_success_with_mocked_services(self):
with TemporaryDirectory() as tmp:
audio = Path(tmp) / "memo.ogg"
audio.write_bytes(b"not-real-audio-but-services-are-mocked")
args = Namespace(
audio=str(audio),
id="voice-smoke",
source="local_file",
title="synthetic smoke",
language="en",
whisper_url="http://127.0.0.1:18816/v1/audio/transcriptions",
classifier_url="http://127.0.0.1:18819/v1/classify",
dry_run=True,
include_transcript=False,
include_transcript_preview_chars=0,
include_raw=False,
max_bytes=1024 * 1024,
max_audio_seconds=300,
max_transcript_chars=6000,
timeout=1,
)
busy_values = iter([100, 150, 150, 225])
with mock.patch.object(self.pipeline, "read_npu_busy_us", side_effect=lambda: next(busy_values)):
with mock.patch.object(
self.pipeline,
"post_whisper",
return_value={"text": "remind me to check npu logs", "npu_busy_delta_us": 50},
):
with mock.patch.object(
self.pipeline,
"post_json",
return_value={
"dry_run": True,
"labels": {
"workflow_category": {"value": "media"},
"tool_needed": {"value": True},
"urgency": {"value": "normal"},
"safety_confirmation_required": {"value": False},
},
"npu_busy_delta_us": 75,
"sysfs_npu_busy_delta_us": 75,
},
):
result = self.pipeline.run_pipeline(args)
self.assertTrue(result["ok"])
self.assertEqual(result["external_sends"], 0)
self.assertEqual(result["writes"], 0)
self.assertEqual(result["whisper_sysfs_delta_us"], 50)
self.assertEqual(result["classifier_observed_sysfs_delta_us"], 75)
self.assertEqual(result["atlas_gate"], "advisory_only_not_sent")
self.assertNotIn("transcript", result)
json.dumps(result)
if __name__ == "__main__":
unittest.main()
View File