docs(npu): update integrated health runbooks

2026-06-05 15:52:51 -07:00
parent 9e5ffa0fd0
commit 08fb9ca686
2 changed files with 323 additions and 134 deletions
@@ -0,0 +1,201 @@
+# NPU integrated health checks — operator runbook notes
+
+Compact, read-only operator workflow that combines the existing
+`scripts/npu-service-health.sh` listener/systemd/embedding-proof probe with the
+reviewer-approved `scripts/npu-utilization-digest.py` per-service utilization
+and fallback report. Together they form a single safe daily / on-demand NPU
+health pass.
+
+Scope:
+
+- Read-only against live services. No restarts, route changes, vector mutation,
+  advisory POSTs, outbound sends, or memory writes.
+- No new persistent services, timers, sockets, compose services, or Dockerfiles
+  are introduced by this integration. Both scripts are foreground / on-demand.
+- Binds are verified as local-only or on the approved Docker bridge (`172.19.0.1:18830`).
+  Pre-existing broader binds on the live baseline ports (`18810`, `18814`,
+  `18816`, `18817`) are noted in the runbook and unchanged here.
+- NPU proof requires real inference plus a positive
+  `/sys/class/accel/accel0/device/npu_busy_time_us` delta. HTTP 200 alone is
+  not sufficient.
+
+## When to run
+
+- Daily / on-demand ops check.
+- After upgrades that touch the NPU stack, OpenVINO, or any of the live
+  specialists.
+- Before any approval-gated change that depends on the NPU reflex layer.
+- As the read-only verification step of a deploy or recovery runbook.
+
+## Required artifacts on the branch
+
+| Path | Role |
+| --- | --- |
+| `scripts/npu-service-health.sh` | Listener / systemd / Docker / health endpoint / single embedding proof. Existing baseline script. |
+| `scripts/npu-utilization-digest.py` | Per-service utilization digest with NPU proof per probe, compact text or JSONL output, optional JSONL artifact. |
+| `docs/npu-utilization-digest.md` | Per-service digest reference. |
+| `tests/test_npu_utilization_digest.py` | Offline unit tests for the digest (no live services required). |
+
+## Integrated workflow
+
+### Step 1 — Listener and service-state snapshot
+
+```bash
+cd ~/lab/swarm
+./scripts/npu-service-health.sh
+```
+
+What it verifies, in order:
+
+1. `npu_busy_time_us` counter is readable.
+2. Required listeners are present on `18810 / 18814 / 18816 / 18817 / 18818 /
+   18819 / 18820 / 18829 / 18830`.
+3. User systemd services are active/enabled for embeddings, RAG health,
+   reranker, router/classifier, and the small GenAI worker.
+4. Docker Compose `whisper-server-npu` is up.
+5. Health endpoints return JSON for the live baseline and local specialists.
+6. A single non-private embeddings request to `:18817` produces a positive
+   sysfs `npu_busy_time_us` delta; the script exits nonzero if there is no
+   positive delta.
+
+Read the last block (`== Embeddings NPU busy-time proof ==`) first. If
+`result=ok` and `sysfs_delta_us > 0`, the central NPU path is healthy. If not,
+do not run the digest; triage the embeddings service first.
+
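The busy-time proof described above can be sketched as a before/after read of the sysfs counter around one inference call. This is a minimal illustration and not the code in `scripts/npu-service-health.sh`; the helper names `read_busy_us`, `busy_delta_us`, and `proof_ok` are hypothetical, and the inference request is passed in as a callable so that no endpoint or payload is assumed.

```python
from pathlib import Path
from typing import Callable

# Counter path as named in the runbook.
COUNTER = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(counter: Path = COUNTER) -> int:
    """Read the cumulative NPU busy time in microseconds."""
    return int(counter.read_text().strip())


def busy_delta_us(run_inference: Callable[[], object], counter: Path = COUNTER) -> int:
    """Run one inference request and return the busy-time delta it produced."""
    before = read_busy_us(counter)
    run_inference()  # hypothetical callable: performs the real request
    after = read_busy_us(counter)
    return after - before


def proof_ok(delta_us: int) -> bool:
    """A successful HTTP response is not enough; the counter must have advanced."""
    return delta_us > 0
```

Passing the counter path as a parameter keeps the sketch testable against a plain file on machines without an NPU.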
+### Step 2 — Per-service utilization digest
+
+```bash
+scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text
+```
+
+Compact output shape:
+
+```text
+NPU utilization digest <timestamp>
+counter=/sys/class/accel/accel0/device/npu_busy_time_us delta_us=<total>
+services_ok=<ok>/<total> proof_ok=<ok>/<proof-capable> fallbacks=<n> gates_closed=<n>
+- embeddings: ok=true calls=1 avg_ms=... npu_delta_us=... proof=true mode=NPU
+- rerank:     ok=true calls=1 docs=2   avg_ms=... npu_delta_us=... proof=true mode=NPU
+- whisper:    ok=true calls=1 jobs=1   avg_ms=... npu_delta_us=... proof=true mode=NPU
+- classifier: ok=true calls=1 events=1 avg_ms=... npu_delta_us=... proof=true dry_run=true ...
+- genai:      ok=true jobs=0 loaded=false mode=loaded=false reason=skipped_cold_load
+- doc_triage: ok=true calls=1 files=1  avg_ms=... npu_delta_us=... proof=true gate=closed:private-root
+- rag_endpoint:   ok=true mode=health_only gate=closed:vector-mutation
+- rag_health:     ok=true mode=health_only
+- advisory_gateway: ok=true mode=health_only gate=closed:advisory-post
+fallbacks: skipped_cold_load=1
+```
+
+Read order for ops:
+
+1. `services_ok` field: anything below `9/9` means a service is down or unhealthy.
+2. `proof_ok` field: `proof_ok=5/5` means every probe that ran with a real
+   inference request produced a positive sysfs NPU delta.
+3. `fallbacks:` line: `skipped_cold_load=1` is expected (GenAI worker is
+   intentionally not cold-loaded). Any other fallback label is a triage signal.
+4. `gate=` labels: these mark approval gates that are closed by design, not
+   failures.
+
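The read order above can be applied mechanically to the compact text output. The sketch below assumes only the summary and `fallbacks:` line shapes shown in the sample output; the function name `triage_digest` and the wording of its findings are hypothetical and are not part of `scripts/npu-utilization-digest.py`.

```python
import re

# Matches the summary fields shown in the compact output shape.
SUMMARY_RE = re.compile(r"services_ok=(\d+)/(\d+) proof_ok=(\d+)/(\d+)")

# Per the runbook, only the intentional GenAI cold-load skip is expected.
EXPECTED_FALLBACKS = {"skipped_cold_load"}


def triage_digest(text: str) -> list[str]:
    """Return triage findings for one compact text digest."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return ["summary line not found"]
    svc_ok, svc_total, proof_ok, proof_total = map(int, match.groups())

    findings = []
    if svc_ok < svc_total:
        findings.append(f"{svc_total - svc_ok} service(s) down or unhealthy")
    if proof_ok < proof_total:
        findings.append(f"{proof_total - proof_ok} probe(s) without a positive NPU delta")

    # The fallbacks line looks like "fallbacks: label=count label=count".
    for line in text.splitlines():
        if line.startswith("fallbacks:"):
            for item in line.split(":", 1)[1].split():
                label = item.split("=", 1)[0]
                if label not in EXPECTED_FALLBACKS:
                    findings.append(f"unexpected fallback: {label}")
    return findings
```

An empty list means the digest needs no triage under these rules; `gate=` labels are deliberately ignored because closed gates are expected.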
+### Step 3 — Optional artifact for trend tracking
+
+```bash
+scripts/npu-utilization-digest.py --format jsonl
+```
+
+Writes one JSONL file per digest to
+`/home/will/.local/state/npu-utilization/digests/<timestamp>.jsonl`. The first
+line is the summary; subsequent lines are per-service rows. Nothing is written
+when `--no-write` is set.
+
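For trend tracking, an artifact with this layout can be split into its summary and per-service rows as sketched below. The function name `load_digest` is hypothetical, and the sketch makes no assumption about field names inside the records; it relies only on the first-line-is-summary layout described above.

```python
import json
from pathlib import Path


def load_digest(path: Path) -> tuple[dict, list[dict]]:
    """Split one digest artifact into its summary record and per-service rows."""
    records = [
        json.loads(line)
        for line in path.read_text().splitlines()
        if line.strip()  # tolerate a trailing blank line
    ]
    if not records:
        raise ValueError(f"empty digest artifact: {path}")
    # First record is the summary; everything after it is a per-service row.
    return records[0], records[1:]
```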
+### Step 4 — Offline unit tests
+
+```bash
+python -m pytest tests/test_npu_utilization_digest.py -q
+```
+
+Does not require live services. Use to validate digest logic after edits or
+before merging.
+
+## Compact proof interpretation
+
+For each proof-capable service, both the response-level `npu_busy_delta_us`
+(when the service reports it) and the script's own sysfs before/after delta
+must agree and be `> 0`. The proof is only valid when an actual inference
+request ran. If a probe was skipped (`reason=skipped_cold_load` or
+`reason=smoke_disabled`), `proof_ok` for that row is `None` and the row
+contributes a labeled fallback instead of a proof failure.
+
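The tri-state accounting described above (pass, fail, or skipped with a labeled fallback) can be sketched as follows. The row dictionaries and their `proof_ok` and `reason` keys are assumptions modeled on the labels in the text output; the real digest may structure its rows differently, and `tally_proofs` is a hypothetical name.

```python
from typing import Optional


def tally_proofs(rows: list[dict]) -> dict:
    """Count proof passes, proof failures, and labeled fallbacks."""
    passed = 0
    failed = 0
    fallbacks: dict[str, int] = {}
    for row in rows:
        proof: Optional[bool] = row.get("proof_ok")
        if proof is True:
            passed += 1
        elif proof is False:
            failed += 1
        else:
            # None means no inference ran. A skipped probe carries a reason
            # and becomes a labeled fallback; a health-only row carries none.
            reason = row.get("reason")
            if reason:
                fallbacks[reason] = fallbacks.get(reason, 0) + 1
    return {"passed": passed, "failed": failed, "fallbacks": fallbacks}
```

Keeping `None` distinct from `False` is what stops an intentionally skipped probe from being counted as a proof failure.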
+Proof currently runs on:
+
+- `embeddings` (`:18817`)
+- `rerank` (`:18818`)
+- `whisper` (`:18816`) when `--include-whisper-smoke=true` (default)
+- `classifier` (`:18819`)
+- `doc_triage` (`:18829`) when `--include-doc-triage-smoke=true` (default);
+  proof is via the embeddings service, not directly on the NPU device, so the
+  row reports `mode=NPU-via-embedding-service`.
+
+Intentionally health-only (no proof row):
+
+- `rag_endpoint` (`:18810`) — closed:vector-mutation
+- `rag_health` (`:18814`)
+- `advisory_gateway` (`172.19.0.1:18830`) — closed:advisory-post
+
+Intentionally skipped by default:
+
+- `genai` (`:18820`): `loaded=false` until first use. Cold-loading the model
+  solely to obtain an NPU proof is expensive, so the skip is treated as a
+  labeled fallback rather than a proof failure. Opt in with
+  `--include-genai-smoke=true` only when the task actually needs a generation
+  smoke.
+
+## Exit codes and triage gates
+
+`scripts/npu-service-health.sh`:
+
+| Exit | Meaning | Next |
+| ---: | --- | --- |
+| 0 | All checks passed including embeddings proof. | Continue to digest. |
+| 2 | `npu_busy_time_us` not readable. | Check kernel/driver; do not run digest. |
+| 3 | Embedding request failed. | Triage `openvino-embeddings.service` and port `:18817`. |
+| 4 | Embedding request succeeded but sysfs delta `<= 0`. | Service reachable but not on the NPU; check service logs and device bind. |
+
+`scripts/npu-utilization-digest.py`:
+
+| Exit | Meaning | Next |
+| ---: | --- | --- |
+| 0 | All reachable services handled; proof/fallback accounting completed. | Inspect `proof_ok` and `fallbacks:` for any unexpected labels. |
+| 2 | `--strict-proof` was set and at least one proof-required probe ran without a positive sysfs delta. | Triage the named service's NPU path. |
+
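The gating in the tables above (run the digest only when the health script exits 0) can be sketched as a small wrapper. The exit codes and next steps are taken from the health-script table; the function names are hypothetical, and the commands are passed in as argument lists so the sketch does not hard-code any script path or flag.

```python
import subprocess
import sys

# Next steps for scripts/npu-service-health.sh, copied from the table above.
HEALTH_NEXT = {
    0: "continue to digest",
    2: "npu_busy_time_us not readable: check kernel/driver, do not run digest",
    3: "embedding request failed: triage openvino-embeddings.service and :18817",
    4: "no positive sysfs delta: check service logs and device bind",
}


def next_step(exit_code: int) -> str:
    """Map a health-script exit code to the documented next action."""
    return HEALTH_NEXT.get(
        exit_code, f"unlisted exit code {exit_code}: inspect script output"
    )


def run_health_then_digest(health_cmd: list[str], digest_cmd: list[str]) -> int:
    """Run the health check; run the digest only if the health check passed."""
    health = subprocess.run(health_cmd)
    print(next_step(health.returncode), file=sys.stderr)
    if health.returncode != 0:
        return health.returncode
    return subprocess.run(digest_cmd).returncode
```

A call might look like `run_health_then_digest(["./scripts/npu-service-health.sh"], ["scripts/npu-utilization-digest.py", "--no-write", "--strict-proof"])`, using only the commands already listed in this runbook.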
+## Approval gates left closed
+
+The integrated workflow intentionally does not:
+
+- start, stop, restart, enable, or disable any user systemd unit or Docker
+  Compose service;
+- write to or mutate the Chroma collection `obsidian_bge_npu` or any other
+  vector store;
+- change Atlas/Hermes routing or model defaults;
+- post classification/generation/triage events to the advisory gateway;
+- broaden private document, image, or audio roots;
+- bind any new listener, including on `0.0.0.0`;
+- write memory, send messages, execute tools, or mutate Kanban state.
+
+These remain approval-gated and are tracked on the `npu-maximization` board.
+
+## Quick reference
+
+```bash
+# Single-pass NPU health check (listener + systemd + embeddings proof).
+cd ~/lab/swarm && ./scripts/npu-service-health.sh
+
+# Compact digest with per-service proof and fallback accounting.
+scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text
+
+# Same, with a JSONL artifact for trend tracking.
+scripts/npu-utilization-digest.py --format jsonl
+
+# Strict mode for CI / pre-merge.
+scripts/npu-utilization-digest.py --no-write --strict-proof
+
+# Offline digest logic tests.
+python -m pytest tests/test_npu_utilization_digest.py -q
+```