swarm-master/docs/npu-utilization-digest.md
2026-06-05 15:52:43 -07:00

NPU utilization digest

Compact on-demand observability for Will's local OpenVINO/NPU specialists.

Script:

/home/will/lab/swarm/scripts/npu-utilization-digest.py --format text

Safe defaults:

  • read-only for services; no service starts/stops/restarts, routing changes, vector DB mutation, advisory POSTs, outbound sends, or memory writes;
  • writes only a compact JSONL artifact under /home/will/.local/state/npu-utilization/digests unless --no-write is passed;
  • uses synthetic/non-private requests for embeddings, rerank, classifier dry-run, and doc triage;
  • keeps GenAI generation disabled by default when the worker is not loaded, to avoid cold-load side effects;
  • advisory gateway remains health-only because POSTs write metadata/events;
  • NPU proof is only true when an inference probe ran and /sys/class/accel/accel0/device/npu_busy_time_us increased around that probe.
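
The last rule above can be sketched as follows. This is a minimal illustration, not the script's actual code: `read_busy_us`, `npu_proof`, and the returned keys are hypothetical names, and only the sysfs path comes from this page.

```python
from pathlib import Path
from typing import Callable, Optional

# Counter path as given in the rule above.
BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = BUSY_PATH) -> Optional[int]:
    """Return the busy-time counter, or None if it cannot be read or parsed."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def npu_proof(probe: Callable[[], bool], path: Path = BUSY_PATH) -> dict:
    """Run one probe and report whether the counter rose around it.

    `probe` is a hypothetical callable that returns True when an
    inference request actually ran.
    """
    before = read_busy_us(path)
    probe_ran = bool(probe())
    after = read_busy_us(path)
    # A delta exists only if both reads succeeded.
    delta = after - before if before is not None and after is not None else None
    return {
        "probe_ran": probe_ran,
        "sysfs_delta_us": delta,
        # Proof needs both conditions: a probe ran AND the counter increased.
        "npu_proof": probe_ran and delta is not None and delta > 0,
    }
```

Note that the counter is device-wide, so another workload running at the same moment could also raise it; a positive delta is evidence around the probe rather than strict attribution.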

Common commands:

# Compact CLI digest, plus JSONL artifact.
scripts/npu-utilization-digest.py --format text

# No artifact write; useful during reviews.
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false

# Machine-readable stdout.
scripts/npu-utilization-digest.py --format jsonl --no-write

# CI/unit tests; live services not required.
python -m pytest tests/test_npu_utilization_digest.py -q
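
To inspect artifacts written by the first command, a small reader along these lines may help. It is a sketch under assumptions: the `*.jsonl` file pattern and the `load_digests` helper are hypothetical, and no particular record fields are assumed. Only the directory comes from this page.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Artifact directory as documented under "Safe defaults".
DIGEST_DIR = Path("/home/will/.local/state/npu-utilization/digests")


def load_digests(directory: Path = DIGEST_DIR) -> List[Dict[str, Any]]:
    """Read every JSON object from every *.jsonl file, in filename order."""
    records: List[Dict[str, Any]] = []
    if not directory.is_dir():
        return records
    for path in sorted(directory.glob("*.jsonl")):  # file pattern is an assumption
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                item = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip partial or corrupt lines rather than fail
            if isinstance(item, dict):
                records.append(item)
    return records


if __name__ == "__main__":
    digests = load_digests()
    print(f"{len(digests)} digest record(s) found")
```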

The output is intentionally small: service booleans, counts, average probe latency in ms, sysfs deltas, proof flags, fallback warning counts, the artifact path, and closed gates. The fallbacks field covers unavailable services, failed or missing proof, and proof-capable smokes that were skipped, such as disabled Whisper or doc-triage probes and GenAI cold-load skips. Rows that are intentionally health-only (RAG, advisory) count as fallbacks only when the service is unavailable. The digest does not print raw embeddings, transcripts, OCR text, model completions, request headers, or full upstream JSON.
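
The fallback rules in the paragraph above can be expressed as a small classifier. This is a sketch only: the `Row` fields and the reason strings are hypothetical and do not reflect the script's real schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """Hypothetical per-service row; field names are illustrative."""
    name: str
    available: bool
    health_only: bool = False         # intentionally health-only (RAG, advisory)
    probe_skipped: bool = False       # proof-capable smoke was disabled or skipped
    npu_proof: Optional[bool] = None  # None means no proof was recorded


def fallback_reasons(row: Row) -> List[str]:
    """Return why a row counts as a fallback; an empty list means it does not."""
    if not row.available:
        return ["unavailable"]
    if row.health_only:
        return []  # health-only by design is not a fallback
    if row.probe_skipped:
        return ["proof-smoke-skipped"]
    if not row.npu_proof:
        return ["proof-failed-or-missing"]
    return []


rows = [
    Row("embeddings", available=True, npu_proof=True),
    Row("whisper", available=True, probe_skipped=True),
    Row("rag_health", available=True, health_only=True),
    Row("advisory_gateway", available=False, health_only=True),
]
fallback_count = sum(1 for r in rows if fallback_reasons(r))
```

In this example, `whisper` and `advisory_gateway` are counted, while `embeddings` and `rag_health` are not.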

Covered rows:

  • embeddings: /v1/embeddings synthetic string, positive sysfs delta required.
  • rerank: /rerank with two synthetic docs, positive sysfs delta required.
  • whisper: health-only unless the bounded generated-WAV smoke is enabled.
  • classifier: /v1/classify with dry_run=true and include_evidence=false, positive sysfs delta required.
  • genai: health-only by default; skips when loaded=false unless explicitly opted in.
  • doc_triage: one approved synthetic sample under the service sample root, with allowed_roots narrowed to that sample directory; NPU proof is via embeddings.
  • rag_endpoint and rag_health: health-only; no vector mutation.
  • advisory_gateway: health-only; closed:advisory-post gate remains closed.
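
For orientation, the synthetic request bodies for the three probe rows might look like the sketch below. Only the endpoint paths, the two-document rerank, and `dry_run`/`include_evidence` come from the list above. The other field names (`input`, `query`, `documents`, `text`) and the sample strings are assumptions and may not match the services' actual request schemas.

```python
import json
from typing import Any, Dict

# Synthetic, non-private sample text (illustrative).
SYNTHETIC_TEXT = "synthetic npu digest probe"


def build_probe_payloads() -> Dict[str, Dict[str, Any]]:
    """Map each probed endpoint path to a synthetic request body."""
    return {
        "/v1/embeddings": {"input": SYNTHETIC_TEXT},  # field name assumed
        "/rerank": {
            "query": SYNTHETIC_TEXT,                  # field name assumed
            "documents": ["synthetic doc one", "synthetic doc two"],
        },
        "/v1/classify": {
            "text": SYNTHETIC_TEXT,                   # field name assumed
            "dry_run": True,            # documented: dry_run=true
            "include_evidence": False,  # documented: include_evidence=false
        },
    }


if __name__ == "__main__":
    # Print the bodies only; nothing is sent to any service.
    for endpoint, body in build_probe_payloads().items():
        print(endpoint, json.dumps(body))
```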

Closed gates left for later approval: sending/delivery, recurring timer, GenAI cold-load smoke, advisory POSTs, Atlas/Hermes routing changes, vector mutation/reindex, and broad private document/audio/image roots.