feat(npu): add utilization digest tooling
@@ -0,0 +1,49 @@
# NPU utilization digest

Compact on-demand observability for Will's local OpenVINO/NPU specialists.

Script:

```bash
/home/will/lab/swarm/scripts/npu-utilization-digest.py --format text
```

Safe defaults:

- read-only for services; no service starts/stops/restarts, routing changes, vector DB mutation, advisory POSTs, outbound sends, or memory writes;
- writes only a compact JSONL artifact under `/home/will/.local/state/npu-utilization/digests` unless `--no-write` is passed;
- uses synthetic/non-private requests for embeddings, rerank, classifier dry-run, and doc triage;
- keeps GenAI generation disabled by default when the worker is not loaded, to avoid cold-load side effects;
- advisory gateway remains health-only because POSTs write metadata/events;
- NPU proof is only true when an inference probe ran and `/sys/class/accel/accel0/device/npu_busy_time_us` increased around that probe.
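
The NPU proof rule in the last bullet can be sketched roughly as follows. This is a minimal illustration and not the script's actual code: `read_busy_time_us`, `npu_proof`, and the `probe` callable are hypothetical names, and only the sysfs path comes from the text above.

```python
from pathlib import Path
from typing import Callable, Optional

# Counter path taken from the safe-defaults list above.
BUSY_TIME_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_time_us(path: Path = BUSY_TIME_PATH) -> Optional[int]:
    """Return the cumulative busy-time counter, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def npu_proof(probe: Callable[[], bool], path: Path = BUSY_TIME_PATH) -> dict:
    """Run one probe and report whether the counter rose around it.

    `probe` is a hypothetical callable that returns True only when the
    inference request actually ran and succeeded.
    """
    before = read_busy_time_us(path)
    try:
        probe_ran = bool(probe())
    except Exception:
        probe_ran = False  # a probe that raised did not run successfully
    after = read_busy_time_us(path)
    delta = None if before is None or after is None else after - before
    return {
        "probe_ran": probe_ran,
        "busy_delta_us": delta,
        # Both conditions are required: the probe ran AND the delta is positive.
        "npu_proof": probe_ran and delta is not None and delta > 0,
    }
```

Under this reading, a healthy service with a zero delta, or an unreadable counter, yields `npu_proof` of `False` rather than an optimistic guess.
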

Common commands:

```bash
# Compact CLI digest, plus JSONL artifact.
scripts/npu-utilization-digest.py --format text

# No artifact write; useful during reviews.
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false

# Machine-readable stdout.
scripts/npu-utilization-digest.py --format jsonl --no-write

# CI/unit tests; live services not required.
python -m pytest tests/test_npu_utilization_digest.py -q
```
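
For consuming the JSONL artifact afterwards, something like the following may be enough. It is a sketch under two assumptions that the text above does not confirm: artifact files end in `.jsonl`, and each line is one JSON object. `latest_digest` is a hypothetical helper, not part of the script.

```python
import json
from pathlib import Path
from typing import Optional

# Artifact directory named in the safe defaults above.
DIGEST_DIR = Path("/home/will/.local/state/npu-utilization/digests")


def latest_digest(digest_dir: Path = DIGEST_DIR) -> Optional[dict]:
    """Return the last JSON record from the most recently modified *.jsonl file."""
    if not digest_dir.is_dir():
        return None
    # Assumption: artifacts use a .jsonl suffix; adjust the glob if they do not.
    files = sorted(digest_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime)
    if not files:
        return None
    lines = [ln for ln in files[-1].read_text().splitlines() if ln.strip()]
    return json.loads(lines[-1]) if lines else None
```
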
Output shape is intentionally small: service booleans, counts, average probe ms, sysfs deltas, proof flags, fallback warning counts, artifact path, and closed gates. `fallbacks` includes unavailable services, failed/missing proof, and skipped proof-capable smokes such as disabled Whisper/doc-triage probes or GenAI cold-load skips; intentionally health-only RAG/advisory rows are not fallbacks unless unavailable. It does not print raw embeddings, transcripts, OCR text, model completions, request headers, or full upstream JSON.
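
One way to read the `fallbacks` rule above is sketched below. The `Row` fields and helper names are hypothetical and chosen only for illustration; the real script's data model may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    """Hypothetical per-service row; field names are illustrative only."""
    name: str
    available: bool              # service answered its health check
    proof_capable: bool          # an inference probe exists for this row
    health_only: bool            # intentionally never probed (RAG, advisory)
    probe_skipped: bool = False  # smoke disabled or cold-load skipped
    npu_proof: bool = False      # probe ran and the sysfs counter rose


def is_fallback(row: Row) -> bool:
    """Apply the fallback rules from the paragraph above, in order."""
    if not row.available:
        return True   # unavailable services always count
    if row.health_only:
        return False  # intentional health-only rows do not count
    if row.proof_capable and row.probe_skipped:
        return True   # skipped proof-capable smoke
    if row.proof_capable and not row.npu_proof:
        return True   # failed or missing proof
    return False


def fallback_names(rows: List[Row]) -> List[str]:
    """Names of the rows that would be counted as fallbacks."""
    return [r.name for r in rows if is_fallback(r)]


rows = [
    Row("embeddings", True, True, False, npu_proof=True),
    Row("whisper", True, True, False, probe_skipped=True),
    Row("rag_health", True, False, True),
    Row("advisory_gateway", False, False, True),
]
print(fallback_names(rows))  # ['whisper', 'advisory_gateway']
```

Checking availability first means a health-only row such as `advisory_gateway` still surfaces as a fallback when it is down, which matches the "unless unavailable" clause.
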

Covered rows:

- `embeddings`: `/v1/embeddings` synthetic string, positive sysfs delta required.
- `rerank`: `/rerank` with two synthetic docs, positive sysfs delta required.
- `whisper`: health-only unless the bounded generated-WAV smoke is enabled.
- `classifier`: `/v1/classify` with `dry_run=true` and `include_evidence=false`, positive sysfs delta required.
- `genai`: health-only by default; skips when `loaded=false` unless explicitly opted in.
- `doc_triage`: one approved synthetic sample under the service sample root, with `allowed_roots` narrowed to that sample directory; NPU proof is via embeddings.
- `rag_endpoint` and `rag_health`: health-only; no vector mutation.
- `advisory_gateway`: health-only; `closed:advisory-post` gate remains closed.

Closed gates left for later approval: sending/delivery, recurring timer, GenAI cold-load smoke, advisory POSTs, Atlas/Hermes routing changes, vector mutation/reindex, and broad private document/audio/image roots.