# NPU integrated health checks — operator runbook notes

Compact, read-only operator workflow that combines the existing
`scripts/npu-service-health.sh` listener/systemd/embedding-proof probe with the
reviewer-approved `scripts/npu-utilization-digest.py` per-service utilization
and fallback report. Together they form a single safe daily / on-demand NPU
health pass.

Scope:

- Read-only against live services. No restarts, route changes, vector mutation,
  advisory POSTs, outbound sends, or memory writes.
- No new persistent services, timers, sockets, compose services, or Dockerfiles
  are introduced by this integration. Both scripts are foreground / on-demand.
- Binds are verified to be local-only or on the approved Docker bridge
  (`172.19.0.1:18830`). Pre-existing broader binds on the live baseline ports
  (`18810`, `18814`, `18816`, `18817`) are noted in the runbook and unchanged here.
- NPU proof requires real inference plus a positive
  `/sys/class/accel/accel0/device/npu_busy_time_us` delta. HTTP 200 alone is
  not sufficient.

## When to run

- Daily / on-demand ops check.
- After upgrades that touch the NPU stack, OpenVINO, or any of the live
  specialists.
- Before any approval-gated change that depends on the NPU reflex layer.
- As the read-only verification step of a deploy or recovery runbook.

## Required artifacts on the branch

| Path | Role |
| --- | --- |
| `scripts/npu-service-health.sh` | Listener / systemd / Docker / health endpoint / single embedding proof. Existing baseline script. |
| `scripts/npu-utilization-digest.py` | Per-service utilization digest with NPU proof per probe, compact text or JSONL output, optional JSONL artifact. |
| `docs/npu-utilization-digest.md` | Per-service digest reference. |
| `tests/test_npu_utilization_digest.py` | Offline unit tests for the digest (no live services required). |

## Integrated workflow

### Step 1 — Listener and service-state snapshot

```bash
cd ~/lab/swarm
./scripts/npu-service-health.sh
```

What it verifies, in order:

1. `npu_busy_time_us` counter is readable.
2. Required listeners are present on `18810 / 18814 / 18816 / 18817 / 18818 /
   18819 / 18820 / 18829 / 18830`.
3. User systemd services are active/enabled for embeddings, RAG health,
   reranker, router/classifier, and the small GenAI worker.
4. Docker Compose `whisper-server-npu` is up.
5. Health endpoints return JSON for the live baseline and local specialists.
6. A single non-private embeddings request to `:18817` produces a positive
   sysfs `npu_busy_time_us` delta; the script exits nonzero if there is no
   positive delta.

Read the last block (`== Embeddings NPU busy-time proof ==`) first. If
`result=ok` and `sysfs_delta_us > 0`, the central NPU path is healthy. If not,
do not run the digest; triage the embeddings service first.

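
The busy-time proof in item 6 can be sketched as a before/after read of the sysfs counter around one real inference call. This is a minimal illustration, not the health script's actual implementation: `read_busy_us` and `prove_npu_use` are hypothetical names, and the caller is assumed to supply the embeddings request as `run_inference`.

```python
from pathlib import Path
from typing import Callable

# Counter path quoted in this runbook.
COUNTER = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = COUNTER) -> int:
    """Read the cumulative NPU busy time in microseconds."""
    return int(path.read_text().strip())


def prove_npu_use(run_inference: Callable[[], bool],
                  read_counter: Callable[[], int] = read_busy_us) -> dict:
    """Hypothetical helper: sample the counter around one inference call."""
    before = read_counter()
    request_ok = bool(run_inference())  # caller supplies the real request
    delta = read_counter() - before
    return {
        "request_ok": request_ok,
        "sysfs_delta_us": delta,
        # A successful response alone is not proof; the counter must move.
        "proof": request_ok and delta > 0,
    }
```

Passing the counter reader as a parameter keeps the sketch testable offline, in the same spirit as the digest's unit tests.
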
### Step 2 — Per-service utilization digest

```bash
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text
```

Compact output shape:

```text
NPU utilization digest <timestamp>
counter=/sys/class/accel/accel0/device/npu_busy_time_us delta_us=<total>
services_ok=<ok>/<total> proof_ok=<ok>/<proof-capable> fallbacks=<n> gates_closed=<n>
- embeddings: ok=true calls=1 avg_ms=... npu_delta_us=... proof=true mode=NPU
- rerank: ok=true calls=1 docs=2 avg_ms=... npu_delta_us=... proof=true mode=NPU
- whisper: ok=true calls=1 jobs=1 avg_ms=... npu_delta_us=... proof=true mode=NPU
- classifier: ok=true calls=1 events=1 avg_ms=... npu_delta_us=... proof=true dry_run=true ...
- genai: ok=true jobs=0 loaded=false mode=loaded=false reason=skipped_cold_load
- doc_triage: ok=true calls=1 files=1 avg_ms=... npu_delta_us=... proof=true gate=closed:private-root
- rag_endpoint: ok=true mode=health_only gate=closed:vector-mutation
- rag_health: ok=true mode=health_only
- advisory_gateway: ok=true mode=health_only gate=closed:advisory-post
fallbacks: skipped_cold_load=1
```

Read order for ops:

1. `services_ok` field: anything below `9/9` means a service is down or unhealthy.
2. `proof_ok` field: `proof_ok=5/5` means every probe that ran with a real
   inference request produced a positive sysfs NPU delta.
3. `fallbacks:` line: `skipped_cold_load=1` is expected (the GenAI worker is
   intentionally not cold-loaded). Any other fallback label is a triage signal.
4. `gate=` labels: these mark gates that remain closed by design.

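
The read order above can be automated for a quick pass over the text output. The sketch below assumes the summary and `fallbacks:` lines have exactly the shape shown in this section; `review_digest` and `EXPECTED_FALLBACKS` are hypothetical names, not part of the digest script.

```python
import re

# Matches fields such as services_ok=9/9 or proof_ok=5/5.
RATIO = re.compile(r"(\w+)=(\d+)/(\d+)")
# Fallback labels this runbook treats as expected.
EXPECTED_FALLBACKS = {"skipped_cold_load"}


def review_digest(text: str) -> list[str]:
    """Return triage notes for a text-format digest; empty means clean."""
    notes = []
    for line in text.splitlines():
        if line.startswith("services_ok="):
            # Flag any ok/total ratio that is not complete.
            for name, ok, total in RATIO.findall(line):
                if int(ok) < int(total):
                    notes.append(f"{name} is {ok}/{total}")
        elif line.startswith("fallbacks:"):
            # Tokens look like label=count; only the label matters here.
            for token in line.split(":", 1)[1].split():
                label = token.split("=", 1)[0]
                if label not in EXPECTED_FALLBACKS:
                    notes.append(f"unexpected fallback: {label}")
    return notes
```

An empty list corresponds to the healthy case described above; any note names the field or label to triage first.
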
### Step 3 — Optional artifact for trend tracking

```bash
scripts/npu-utilization-digest.py --format jsonl
```

Writes one JSONL file per digest under
`/home/will/.local/state/npu-utilization/digests/<timestamp>.jsonl`. The first
line is the summary; subsequent lines are per-service rows. Nothing is written
when `--no-write` is set.

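
For trend tracking, an artifact can be split back into its summary and per-service rows. This is a sketch that relies only on the layout stated above (first line summary, remaining lines rows); `split_digest` is a hypothetical helper and makes no assumption about field names inside each record.

```python
import json
from typing import Iterable


def split_digest(lines: Iterable[str]) -> tuple[dict, list[dict]]:
    """Split JSONL lines of one digest into (summary, per-service rows)."""
    # Skip blank lines so a trailing newline does not break parsing.
    records = [json.loads(line) for line in lines if line.strip()]
    if not records:
        raise ValueError("digest artifact contains no JSONL records")
    return records[0], records[1:]
```

Typical use would be `split_digest(Path(artifact).read_text().splitlines())`, with `artifact` pointing at one file in the digests directory.
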
### Step 4 — Offline unit tests

```bash
python -m pytest tests/test_npu_utilization_digest.py -q
```

Does not require live services. Use to validate digest logic after edits or
before merging.

## Compact proof interpretation

For each proof-capable service, both the response-level `npu_busy_delta_us`
(when the service reports it) and the script's own sysfs before/after delta
must agree and be `> 0`. The proof is only valid when an actual inference
request ran. If a probe was skipped (`reason=skipped_cold_load` or
`reason=smoke_disabled`), `proof_ok` for that row is `None` and the row
contributes a labeled fallback instead of a proof failure.

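
The three outcomes described above (pass, fail, skipped) can be expressed as a small tri-state function. This is a sketch under one stated assumption: "agree" is read here as "both deltas are positive", which may be looser than what the digest script actually checks. The function name and signature are hypothetical.

```python
from typing import Optional

# Skip reasons named in this runbook.
SKIP_REASONS = {"skipped_cold_load", "smoke_disabled"}


def proof_ok(sysfs_delta_us: int,
             reported_delta_us: Optional[int] = None,
             reason: Optional[str] = None) -> Optional[bool]:
    """Return True/False for a probe that ran, None for a skipped probe."""
    if reason in SKIP_REASONS:
        return None  # labeled fallback, not a proof failure
    if sysfs_delta_us <= 0:
        return False  # the script's own counter did not move
    if reported_delta_us is not None and reported_delta_us <= 0:
        return False  # the service reported a delta that disagrees
    return True
```

Returning `None` rather than `False` for skipped probes keeps them out of the `proof_ok` failure count while still allowing them to be tallied as fallbacks.
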
Proof currently runs on:

- `embeddings` (`:18817`)
- `rerank` (`:18818`)
- `whisper` (`:18816`) when `--include-whisper-smoke=true` (default)
- `classifier` (`:18819`)
- `doc_triage` (`:18829`) when `--include-doc-triage-smoke=true` (default);
  proof is via the embeddings service, not directly on the NPU device, so the
  row reports `mode=NPU-via-embedding-service`.

Intentionally health-only (no proof row):

- `rag_endpoint` (`:18810`): gate `closed:vector-mutation`
- `rag_health` (`:18814`)
- `advisory_gateway` (`172.19.0.1:18830`): gate `closed:advisory-post`

Intentionally skipped by default:

- `genai` (`:18820`): `loaded=false` until first use. Cold-loading the model
  solely to prove NPU use is expensive, so the skip is treated as a labeled
  fallback rather than a proof failure. Opt in with
  `--include-genai-smoke=true` only when the task actually needs a generation
  smoke.

## Exit codes and triage gates

`scripts/npu-service-health.sh`:

| Exit | Meaning | Next |
| ---: | --- | --- |
| 0 | All checks passed including embeddings proof. | Continue to digest. |
| 2 | `npu_busy_time_us` not readable. | Check kernel/driver; do not run digest. |
| 3 | Embedding request failed. | Triage `openvino-embeddings.service` and port `:18817`. |
| 4 | Embedding request succeeded but sysfs delta `<= 0`. | Service reachable but not on the NPU; check service logs and device bind. |

`scripts/npu-utilization-digest.py`:

| Exit | Meaning | Next |
| ---: | --- | --- |
| 0 | All reachable services handled; proof/fallback accounting completed. | Inspect `proof_ok` and `fallbacks:` for any unexpected labels. |
| 2 | `--strict-proof` was set and at least one proof-required probe ran without a positive sysfs delta. | Triage the named service's NPU path. |

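
The health-script table above doubles as a gate for whether the digest should run at all. The sketch below encodes that table as a lookup; `HEALTH_NEXT` and `next_step` are hypothetical names, and treating any undocumented exit code as "stop" is an assumption rather than documented behavior.

```python
# Exit codes and next actions copied from the health-script table above.
HEALTH_NEXT = {
    0: "continue to digest",
    2: "npu_busy_time_us not readable: check kernel/driver, skip digest",
    3: "embedding request failed: triage openvino-embeddings.service and :18817",
    4: "no positive sysfs delta: check service logs and device bind",
}


def next_step(health_exit: int) -> tuple[bool, str]:
    """Return (run_digest, note) for a health-script exit code."""
    note = HEALTH_NEXT.get(
        health_exit,
        # Assumption: codes outside the table are treated as failures.
        f"undocumented exit code {health_exit}: stop and investigate",
    )
    return health_exit == 0, note
```

A wrapper could feed it `subprocess.run(["./scripts/npu-service-health.sh"]).returncode` and only invoke the digest when the first element is `True`.
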
## Approval gates left closed

The integrated workflow intentionally does not:

- start, stop, restart, enable, or disable any user systemd unit or Docker
  Compose service;
- write to or mutate the Chroma collection `obsidian_bge_npu` or any other
  vector store;
- change Atlas/Hermes routing or model defaults;
- post classification/generation/triage events to the advisory gateway;
- broaden private document, image, or audio roots;
- bind any new listener, including on `0.0.0.0`;
- write memory, send messages, execute tools, or mutate Kanban state.

These remain approval-gated and are tracked on the `npu-maximization` board.

## Quick reference

```bash
# Single-pass NPU health check (listener + systemd + embeddings proof).
cd ~/lab/swarm && ./scripts/npu-service-health.sh

# Compact digest with per-service proof and fallback accounting.
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text

# Same, with a JSONL artifact for trend tracking.
scripts/npu-utilization-digest.py --format jsonl

# Strict mode for CI / pre-merge.
scripts/npu-utilization-digest.py --no-write --strict-proof

# Offline digest logic tests.
python -m pytest tests/test_npu_utilization_digest.py -q
```