NPU integrated health checks — operator runbook notes
Compact, read-only operator workflow that combines the existing
scripts/npu-service-health.sh listener/systemd/embedding-proof probe with the
reviewer-approved scripts/npu-utilization-digest.py per-service utilization
and fallback report. Together they form a single safe daily / on-demand NPU
health pass.
Scope:
- Read-only against live services. No restarts, route changes, vector mutation, advisory POSTs, outbound sends, or memory writes.
- No new persistent services, timers, sockets, compose services, or Dockerfiles are introduced by this integration. Both scripts are foreground / on-demand.
- Binds verified local-only or on the approved Docker bridge (172.19.0.1:18830). Pre-existing broader binds on the live baseline ports (18810, 18814, 18816, 18817) are noted in the runbook and unchanged here.
- NPU proof requires real inference plus a positive /sys/class/accel/accel0/device/npu_busy_time_us delta. HTTP 200 alone is not sufficient.
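The proof rule in the last bullet can be sketched as a before/after read of the busy-time counter around one real request. This is a minimal illustration, not the code in either script: `read_busy_us`, `prove_npu_use`, and the `run_inference` callable are hypothetical names introduced here.

```python
from pathlib import Path
from typing import Callable

# Counter path taken from the runbook text.
COUNTER = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(counter: Path = COUNTER) -> int:
    """Read the cumulative NPU busy time in microseconds."""
    return int(counter.read_text().strip())


def prove_npu_use(run_inference: Callable[[], int], counter: Path = COUNTER) -> dict:
    """Run one inference request and report whether the NPU counter advanced.

    `run_inference` is a hypothetical callable that performs a real request
    and returns its HTTP status code.
    """
    before = read_busy_us(counter)
    status = run_inference()
    delta_us = read_busy_us(counter) - before
    # HTTP 200 alone is not proof: the counter must also have moved forward.
    return {
        "http_status": status,
        "delta_us": delta_us,
        "proof": status == 200 and delta_us > 0,
    }
```

Passing the counter path as a parameter keeps the check testable offline against a temporary file.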
When to run
- Daily / on-demand ops check.
- After upgrades that touch the NPU stack, OpenVINO, or any of the live specialists.
- Before any approval-gated change that depends on the NPU reflex layer.
- As the read-only verification step of a deploy or recovery runbook.
Required artifacts on the branch
| Path | Role |
|---|---|
| scripts/npu-service-health.sh | Listener / systemd / Docker / health endpoint / single embedding proof. Existing baseline script. |
| scripts/npu-utilization-digest.py | Per-service utilization digest with NPU proof per probe, compact text or JSONL output, optional JSONL artifact. |
| docs/npu-utilization-digest.md | Per-service digest reference. |
| docs/npu-advisory-observability-runbook.md | Dry-run comparison and later promotion criteria for advisory lanes. |
| tests/test_npu_utilization_digest.py | Offline unit tests for the digest (no live services required). |
Integrated workflow
Step 1 — Listener and service-state snapshot
```shell
cd ~/lab/swarm
./scripts/npu-service-health.sh
```
What it verifies, in order:
- npu_busy_time_us counter is readable.
- Required listeners are present on 18810 / 18814 / 18816 / 18817 / 18818 / 18819 / 18820 / 18829 / 18830.
- User systemd services are active/enabled for embeddings, RAG health, reranker, router/classifier, and the small GenAI worker.
- Docker Compose whisper-server-npu is up.
- Health endpoints return JSON for the live baseline and local specialists.
- A single non-private embeddings request to :18817 produces a positive sysfs npu_busy_time_us delta; the script exits nonzero if there is no positive delta.
Read the last block (== Embeddings NPU busy-time proof ==) first. If
result=ok and sysfs_delta_us > 0, the central NPU path is healthy. If not,
do not run the digest; triage the embeddings service first.
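The "read the last block first" rule can be automated when the health output is captured to a string. The sketch below assumes the proof block starts with the header named above and contains whitespace-separated `key=value` tokens including `result` and `sysfs_delta_us`; the runbook names those fields but does not show the exact layout, so treat the parsing as an assumption. `embeddings_proof_ok` is a hypothetical helper name.

```python
import re

PROOF_HEADER = "== Embeddings NPU busy-time proof =="


def embeddings_proof_ok(health_output: str) -> bool:
    """Return True only if the final proof block reports result=ok and a positive delta."""
    # Keep only the text after the last occurrence of the proof header.
    _, found, block = health_output.rpartition(PROOF_HEADER)
    if not found:
        return False
    # Assumed layout: whitespace-separated key=value tokens.
    fields = dict(re.findall(r"(\w+)=(\S+)", block))
    try:
        delta_us = int(fields.get("sysfs_delta_us", "0"))
    except ValueError:
        return False
    return fields.get("result") == "ok" and delta_us > 0
```

A `False` result maps to the instruction above: skip the digest and triage the embeddings service first.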
Step 2 — Per-service utilization digest
```shell
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text
```
Compact output shape:
```text
NPU utilization digest <timestamp>
counter=/sys/class/accel/accel0/device/npu_busy_time_us delta_us=<total>
services_ok=<ok>/<total> proof_ok=<ok>/<proof-capable> fallbacks=<n> gates_closed=<n>
- embeddings: ok=true calls=1 avg_ms=... npu_delta_us=... proof=true mode=NPU
- rerank: ok=true calls=1 docs=2 avg_ms=... npu_delta_us=... proof=true mode=NPU
- whisper: ok=true calls=1 jobs=1 avg_ms=... npu_delta_us=... proof=true mode=NPU
- classifier: ok=true calls=1 events=1 avg_ms=... npu_delta_us=... proof=true dry_run=true ...
- genai: ok=true jobs=0 loaded=false mode=loaded=false reason=skipped_cold_load
- doc_triage: ok=true calls=1 files=1 avg_ms=... npu_delta_us=... proof=true gate=closed:private-root
- rag_endpoint: ok=true mode=health_only gate=closed:vector-mutation
- rag_health: ok=true mode=health_only
- advisory_gateway: ok=true mode=health_only gate=closed:advisory-post
fallbacks: skipped_cold_load=1
```
Read order for ops:
- services_ok row: anything below 9/9 means a service is down or unhealthy.
- proof_ok row: proof_ok=5/5 means every probe that ran with a real inference request produced a positive sysfs NPU delta.
- fallbacks: line, where skipped_cold_load=1 is expected (the GenAI worker is intentionally not cold-loaded). Any other fallback label is a triage signal.
- gate= labels: closed gates that remain closed by design.
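The read order above can be applied mechanically to a captured text digest. The following is a rough sketch that relies only on the compact output shape shown in this step; `triage_digest` and `EXPECTED_FALLBACKS` are hypothetical names, and the real digest may emit labels this sketch does not anticipate.

```python
import re

# The only fallback label the runbook treats as normal.
EXPECTED_FALLBACKS = {"skipped_cold_load"}


def triage_digest(text: str) -> list[str]:
    """Return findings for a compact text digest, following the ops read order."""
    findings = []
    # services_ok and proof_ok are "<ok>/<total>" ratios on the summary row.
    for key in ("services_ok", "proof_ok"):
        m = re.search(rf"\b{key}=(\d+)/(\d+)", text)
        if m is None:
            findings.append(f"{key}: missing from digest")
        elif int(m.group(1)) != int(m.group(2)):
            findings.append(f"{key}: {m.group(1)}/{m.group(2)}")
    # The trailing "fallbacks:" line lists label=count pairs.
    for line in text.splitlines():
        if line.startswith("fallbacks:"):
            for label, count in re.findall(r"(\w+)=(\d+)", line):
                if label not in EXPECTED_FALLBACKS and int(count) > 0:
                    findings.append(f"unexpected fallback: {label}={count}")
    return findings
```

An empty list means the summary ratios are full and no unexpected fallback label appeared; `gate=` labels are left for a human to confirm, since closed gates are expected by design.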
Step 3 — Optional artifact for trend tracking
```shell
scripts/npu-utilization-digest.py --format jsonl
```
Writes one JSONL file per digest under
/home/will/.local/state/npu-utilization/digests/<timestamp>.jsonl. The first
line is the summary; subsequent lines are per-service rows. No JSONL write
happens with --no-write.
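For trend tracking, an artifact can be split back into its summary and per-service rows. This is a small sketch that depends only on the line layout described above (summary first, service rows after); `load_digest` is a hypothetical helper, and no field names inside the JSON objects are assumed.

```python
import json
from pathlib import Path


def load_digest(path: Path) -> tuple[dict, list[dict]]:
    """Split one digest artifact into (summary, per-service rows)."""
    lines = [ln for ln in path.read_text().splitlines() if ln.strip()]
    if not lines:
        raise ValueError(f"empty digest artifact: {path}")
    records = [json.loads(ln) for ln in lines]
    # Per the runbook: the first line is the summary, the rest are service rows.
    return records[0], records[1:]
```

Iterating over the sorted `*.jsonl` files in the digests directory and collecting each summary would give one record per digest run.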
Step 4 — Offline unit tests
```shell
python -m pytest tests/test_npu_utilization_digest.py -q
```
Does not require live services. Use to validate digest logic after edits or before merging.
Compact proof interpretation
For each proof-capable service, both the response-level npu_busy_delta_us
(when the service reports it) and the script's own sysfs before/after delta
must agree and be > 0. The proof is only valid when an actual inference
request ran. If a probe was skipped (reason=skipped_cold_load or
reason=smoke_disabled), proof_ok for that row is None and the row
contributes a labeled fallback instead of a proof failure.
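The tri-state rule above can be written out as a small function. This sketch reads "agree" as "every available delta is positive", which is an assumption; the real digest may compare the two values more strictly. The function and parameter names are hypothetical.

```python
from typing import Optional


def proof_ok(ran: bool, sysfs_delta_us: int,
             response_delta_us: Optional[int] = None) -> Optional[bool]:
    """Tri-state proof result for one service row.

    None  -> probe was skipped; count it as a labeled fallback, not a failure.
    True  -> every available delta is positive.
    False -> an inference ran but at least one delta is not positive.
    """
    if not ran:
        return None
    if sysfs_delta_us <= 0:
        return False
    # The service-reported delta is optional; when present it must also be > 0.
    if response_delta_us is not None and response_delta_us <= 0:
        return False
    return True
```

Keeping `None` distinct from `False` is what lets a skipped GenAI probe show up under `fallbacks:` without lowering `proof_ok`.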
Proof currently runs on:
- embeddings (:18817)
- rerank (:18818)
- whisper (:18816) when --include-whisper-smoke=true (default)
- classifier (:18819)
- doc_triage (:18829) when --include-doc-triage-smoke=true (default); proof is via the embeddings service, not directly on the NPU device, so the row reports mode=NPU-via-embedding-service.
Intentionally health-only (no proof row):
- rag_endpoint (:18810): closed:vector-mutation
- rag_health (:18814)
- advisory_gateway (172.19.0.1:18830): closed:advisory-post
Intentionally skipped by default:
- genai (:18820): loaded=false until first use. Cold-loading the model only to prove NPU use has a real cost, so the skip is treated as a labeled fallback rather than a proof failure. Opt in with --include-genai-smoke=true only when the task actually needs a generation smoke.
Exit codes and triage gates
scripts/npu-service-health.sh:
| Exit | Meaning | Next |
|---|---|---|
| 0 | All checks passed including embeddings proof. | Continue to digest. |
| 2 | npu_busy_time_us not readable. | Check kernel/driver; do not run digest. |
| 3 | Embedding request failed. | Triage openvino-embeddings.service and port :18817. |
| 4 | Embedding request succeeded but sysfs delta <= 0. | Service reachable but not on the NPU; check service logs and device bind. |
scripts/npu-utilization-digest.py:
| Exit | Meaning | Next |
|---|---|---|
| 0 | All reachable services handled; proof/fallback accounting completed. | Inspect proof_ok and fallbacks: for any unexpected labels. |
| 2 | --strict-proof was set and at least one proof-required probe ran without a positive sysfs delta. | Triage the named service's NPU path. |
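The two exit-code tables can be chained into one gated pass: run the health script, and run the strict digest only if health exited 0. The sketch below is one possible wrapper, not part of the branch. It assumes the working directory is the repository root, and `health_pass`, `run_exit_code`, and `HEALTH_NEXT` are hypothetical names; the commands and exit-code meanings are copied from this runbook.

```python
import subprocess
from typing import Callable, Sequence

HEALTH_CMD = ["./scripts/npu-service-health.sh"]
DIGEST_CMD = ["scripts/npu-utilization-digest.py", "--no-write", "--strict-proof"]

# Next actions copied from the health-script exit-code table.
HEALTH_NEXT = {
    2: "npu_busy_time_us not readable: check kernel/driver; do not run digest",
    3: "embedding request failed: triage openvino-embeddings.service and :18817",
    4: "no positive sysfs delta: check service logs and device bind",
}


def run_exit_code(cmd: Sequence[str]) -> int:
    """Run a command in the foreground and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def health_pass(run: Callable[[Sequence[str]], int] = run_exit_code) -> str:
    """Run the health script, then the strict digest only if health exited 0."""
    code = run(HEALTH_CMD)
    if code != 0:
        return HEALTH_NEXT.get(code, f"health script exited {code}: unlisted code")
    code = run(DIGEST_CMD)
    if code == 2:
        return "strict proof failed: triage the named service's NPU path"
    if code != 0:
        return f"digest exited {code}: unlisted code"
    return "ok: inspect proof_ok and fallbacks: for unexpected labels"
```

The `run` parameter exists so the gating logic can be exercised offline with a fake runner; both commands stay read-only and foreground, in line with the scope of this runbook.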
Approval gates left closed
The integrated workflow intentionally does not:
- start, stop, restart, enable, or disable any user systemd unit or Docker Compose service;
- write to or mutate the Chroma collection obsidian_bge_npu or any other vector store;
- change Atlas/Hermes routing or model defaults;
- post classification/generation/triage events to the advisory gateway;
- broaden private document, image, or audio roots;
- bind any new listener, including on 0.0.0.0;
- write memory, send messages, execute tools, or mutate Kanban state.
These remain approval-gated and are tracked on the npu-maximization board.
For advisory-lane promotion decisions, pair this live utilization pass with the fixture-only dry-run comparison in docs/npu-advisory-observability-runbook.md. The digest can show whether live NPU services are healthy enough to collect evidence; it does not promote advisory outputs into authority. Promotion remains a separate lane-specific approval with explicit scope and rollback.
Quick reference
```shell
# Single-pass NPU health check (listener + systemd + embeddings proof).
cd ~/lab/swarm && ./scripts/npu-service-health.sh
# Compact digest with per-service proof and fallback accounting.
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text
# Same, with a JSONL artifact for trend tracking.
scripts/npu-utilization-digest.py --format jsonl
# Strict mode for CI / pre-merge.
scripts/npu-utilization-digest.py --no-write --strict-proof
# Offline digest logic tests.
python -m pytest tests/test_npu_utilization_digest.py -q
```