docs(npu): update integrated health runbooks
@@ -0,0 +1,201 @@
# NPU integrated health checks — operator runbook notes

Compact, read-only operator workflow that combines the existing
`scripts/npu-service-health.sh` listener/systemd/embedding-proof probe with the
reviewer-approved `scripts/npu-utilization-digest.py` per-service utilization
and fallback report. Together they form a single safe daily / on-demand NPU
health pass.

Scope:

- Read-only against live services. No restarts, route changes, vector mutation,
  advisory POSTs, outbound sends, or memory writes.
- No new persistent services, timers, sockets, compose services, or Dockerfiles
  are introduced by this integration. Both scripts are foreground / on-demand.
- Binds verified local-only or on the approved Docker bridge (`172.19.0.1:18830`).
  Pre-existing broader binds on the live baseline ports (`18810`, `18814`,
  `18816`, `18817`) are noted in the runbook and unchanged here.
- NPU proof requires real inference plus a positive
  `/sys/class/accel/accel0/device/npu_busy_time_us` delta. HTTP 200 alone is
  not sufficient.

## When to run

- Daily / on-demand ops check.
- After upgrades that touch the NPU stack, OpenVINO, or any of the live
  specialists.
- Before any approval-gated change that depends on the NPU reflex layer.
- As the read-only verification step of a deploy or recovery runbook.

## Required artifacts on the branch

| Path | Role |
| --- | --- |
| `scripts/npu-service-health.sh` | Listener / systemd / Docker / health endpoint / single embedding proof. Existing baseline script. |
| `scripts/npu-utilization-digest.py` | Per-service utilization digest with NPU proof per probe, compact text or JSONL output, optional JSONL artifact. |
| `docs/npu-utilization-digest.md` | Per-service digest reference. |
| `tests/test_npu_utilization_digest.py` | Offline unit tests for the digest (no live services required). |

## Integrated workflow

### Step 1 — Listener and service-state snapshot

```bash
cd ~/lab/swarm
./scripts/npu-service-health.sh
```

What it verifies, in order:

1. `npu_busy_time_us` counter is readable.
2. Required listeners are present on `18810 / 18814 / 18816 / 18817 / 18818 /
   18819 / 18820 / 18829 / 18830`.
3. User systemd services are active/enabled for embeddings, RAG health,
   reranker, router/classifier, and the small GenAI worker.
4. Docker Compose `whisper-server-npu` is up.
5. Health endpoints return JSON for the live baseline and local specialists.
6. A single non-private embeddings request to `:18817` produces a positive
   sysfs `npu_busy_time_us` delta; the script exits nonzero if there is no
   positive delta.

Read the last block (`== Embeddings NPU busy-time proof ==`) first. If
`result=ok` and `sysfs_delta_us > 0`, the central NPU path is healthy. If not,
do not run the digest; triage the embeddings service first.
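
The busy-time proof in item 6 comes down to reading the counter before and after one real request. The following is a minimal Python sketch of that idea, not the implementation inside `scripts/npu-service-health.sh`. The names `read_busy_us` and `measure_delta` are hypothetical, and `run_inference` stands in for whatever sends the non-private embeddings request.

```python
from pathlib import Path
from typing import Callable

# Counter path taken from the runbook; everything else here is illustrative.
COUNTER = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = COUNTER) -> int:
    """Return the cumulative NPU busy time in microseconds."""
    return int(path.read_text().strip())


def measure_delta(run_inference: Callable[[], bool],
                  read: Callable[[], int] = read_busy_us) -> dict:
    """Run one inference and report whether the busy counter advanced."""
    before = read()
    request_ok = bool(run_inference())
    after = read()
    delta = after - before
    # A successful request without a positive delta is not an NPU proof.
    return {
        "request_ok": request_ok,
        "sysfs_delta_us": delta,
        "proof": request_ok and delta > 0,
    }
```

Passing a fake `read` callable keeps the logic testable on machines without the accel device; on the NPU host the default reads the real counter.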

### Step 2 — Per-service utilization digest

```bash
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text
```

Compact output shape:

```text
NPU utilization digest <timestamp>
counter=/sys/class/accel/accel0/device/npu_busy_time_us delta_us=<total>
services_ok=<ok>/<total> proof_ok=<ok>/<proof-capable> fallbacks=<n> gates_closed=<n>
- embeddings: ok=true calls=1 avg_ms=... npu_delta_us=... proof=true mode=NPU
- rerank: ok=true calls=1 docs=2 avg_ms=... npu_delta_us=... proof=true mode=NPU
- whisper: ok=true calls=1 jobs=1 avg_ms=... npu_delta_us=... proof=true mode=NPU
- classifier: ok=true calls=1 events=1 avg_ms=... npu_delta_us=... proof=true dry_run=true ...
- genai: ok=true jobs=0 loaded=false mode=loaded=false reason=skipped_cold_load
- doc_triage: ok=true calls=1 files=1 avg_ms=... npu_delta_us=... proof=true gate=closed:private-root
- rag_endpoint: ok=true mode=health_only gate=closed:vector-mutation
- rag_health: ok=true mode=health_only
- advisory_gateway: ok=true mode=health_only gate=closed:advisory-post
fallbacks: skipped_cold_load=1
```

Read order for ops:

1. `services_ok` row — anything below `9/9` means a service is down or unhealthy.
2. `proof_ok` row — `proof_ok=5/5` means every probe that ran with a real
   inference request produced a positive sysfs NPU delta.
3. `fallbacks:` line — `skipped_cold_load=1` is expected (GenAI worker is
   intentionally not cold-loaded). Any other fallback label is a triage signal.
4. `gate=` labels — closed gates that remain closed by design.
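
The read order above can be automated for a quick pass over the text output. This is a sketch only, assuming the summary and `fallbacks:` lines look exactly like the compact output shape shown earlier; `parse_summary`, `parse_fallbacks`, and `triage_signals` are hypothetical helpers, not functions of the digest script.

```python
def parse_summary(line: str) -> dict:
    """Parse 'services_ok=9/9 proof_ok=5/5 fallbacks=1 gates_closed=3'."""
    fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
    summary = {}
    for key in ("services_ok", "proof_ok"):
        ok, total = fields[key].split("/")
        summary[key] = (int(ok), int(total))
    summary["fallbacks"] = int(fields["fallbacks"])
    summary["gates_closed"] = int(fields["gates_closed"])
    return summary


def parse_fallbacks(line: str) -> dict:
    """Parse 'fallbacks: skipped_cold_load=1' into {label: count}."""
    _, _, rest = line.partition(":")
    return {k: int(v) for k, v in (tok.split("=", 1) for tok in rest.split())}


def triage_signals(summary: dict, fallbacks: dict,
                   expected: tuple = ("skipped_cold_load",)) -> list:
    """List what an operator should look at, in the runbook's read order."""
    signals = []
    ok, total = summary["services_ok"]
    if ok < total:
        signals.append(f"services_ok {ok}/{total}")
    ok, total = summary["proof_ok"]
    if ok < total:
        signals.append(f"proof_ok {ok}/{total}")
    # Any fallback label other than the expected cold-load skip needs triage.
    for label in sorted(fallbacks):
        if label not in expected:
            signals.append(f"unexpected fallback {label}")
    return signals
```

An empty list from `triage_signals` corresponds to the healthy case described above; closed `gate=` labels are deliberately not reported because they are closed by design.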

### Step 3 — Optional artifact for trend tracking

```bash
scripts/npu-utilization-digest.py --format jsonl
```

Writes one JSONL file per digest under
`/home/will/.local/state/npu-utilization/digests/<timestamp>.jsonl`. The first
line is the summary; subsequent lines are per-service rows. No JSONL write
happens with `--no-write`.
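
For trend tracking, a digest artifact can be split back into its summary and per-service rows. The sketch below assumes only the layout stated above (summary record first, service rows after); it makes no assumption about field names, and `parse_digest` / `load_digest` are hypothetical helper names.

```python
import json
from pathlib import Path


def parse_digest(text: str) -> tuple:
    """Split digest JSONL text into (summary, per_service_rows)."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    if not records:
        raise ValueError("empty digest")
    return records[0], records[1:]


def load_digest(path: Path) -> tuple:
    """Read one digest artifact from disk and split it."""
    return parse_digest(Path(path).read_text())
```

Reading artifacts this way stays within the read-only scope of the workflow: nothing is written back to the digests directory.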

### Step 4 — Offline unit tests

```bash
python -m pytest tests/test_npu_utilization_digest.py -q
```

Does not require live services. Use to validate digest logic after edits or
before merging.

## Compact proof interpretation

For each proof-capable service, both the response-level `npu_busy_delta_us`
(when the service reports it) and the script's own sysfs before/after delta
must agree and be `> 0`. The proof is only valid when an actual inference
request ran. If a probe was skipped (`reason=skipped_cold_load` or
`reason=smoke_disabled`), `proof_ok` for that row is `None` and the row
contributes a labeled fallback instead of a proof failure.
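
The three-way outcome described here (proof, failure, or skipped) can be written as a small function. This is a hedged sketch, not the digest's code: `evaluate_proof` is a hypothetical name, and "agree" is interpreted as "both deltas are positive when both are present", which is an assumption.

```python
from typing import Optional

# Skip reasons named in the runbook; skipped probes are fallbacks, not failures.
SKIP_REASONS = {"skipped_cold_load", "smoke_disabled"}


def evaluate_proof(sysfs_delta_us: Optional[int],
                   response_delta_us: Optional[int] = None,
                   reason: Optional[str] = None) -> Optional[bool]:
    """Return True/False for probes that ran, None for skipped probes."""
    if reason in SKIP_REASONS:
        return None  # counted as a labeled fallback
    if sysfs_delta_us is None or sysfs_delta_us <= 0:
        return False
    # The service-reported delta is optional; when present it must also be > 0.
    if response_delta_us is not None and response_delta_us <= 0:
        return False
    return True
```

Keeping `None` distinct from `False` is what lets the summary count `proof_ok` only over probes that actually ran an inference.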

Proof currently runs on:

- `embeddings` (`:18817`)
- `rerank` (`:18818`)
- `whisper` (`:18816`) when `--include-whisper-smoke=true` (default)
- `classifier` (`:18819`)
- `doc_triage` (`:18829`) when `--include-doc-triage-smoke=true` (default);
  proof is via the embeddings service, not directly on the NPU device, so the
  row reports `mode=NPU-via-embedding-service`.

Intentionally health-only (no proof row):

- `rag_endpoint` (`:18810`) — closed:vector-mutation
- `rag_health` (`:18814`)
- `advisory_gateway` (`172.19.0.1:18830`) — closed:advisory-post

Intentionally skipped by default:

- `genai` (`:18820`) — `loaded=false` until first use; cold-loading the model
  only to produce an NPU proof has a real cost, so the skipped probe is treated
  as a labeled fallback rather than a proof failure. Opt in with
  `--include-genai-smoke=true` only when the task actually needs a generation
  smoke.

## Exit codes and triage gates

`scripts/npu-service-health.sh`:

| Exit | Meaning | Next |
| ---: | --- | --- |
| 0 | All checks passed including embeddings proof. | Continue to digest. |
| 2 | `npu_busy_time_us` not readable. | Check kernel/driver; do not run digest. |
| 3 | Embedding request failed. | Triage `openvino-embeddings.service` and port `:18817`. |
| 4 | Embedding request succeeded but sysfs delta `<= 0`. | Service reachable but not on the NPU; check service logs and device bind. |

`scripts/npu-utilization-digest.py`:

| Exit | Meaning | Next |
| ---: | --- | --- |
| 0 | All reachable services handled; proof/fallback accounting completed. | Inspect `proof_ok` and `fallbacks:` for any unexpected labels. |
| 2 | `--strict-proof` was set and at least one proof-required probe ran without a positive sysfs delta. | Triage the named service's NPU path. |
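
For wrappers that chain the two scripts, the tables above can be encoded as lookups. This is an illustrative sketch: the exit codes and next steps are copied from the tables, while `next_step`, `may_run_digest`, and the handling of undocumented codes are assumptions.

```python
# Exit codes and next steps copied from the two tables above.
HEALTH_EXIT = {
    0: "all checks passed; continue to digest",
    2: "npu_busy_time_us not readable; check kernel/driver, do not run digest",
    3: "embedding request failed; triage openvino-embeddings.service and :18817",
    4: "embedding ok but sysfs delta <= 0; check service logs and device bind",
}
DIGEST_EXIT = {
    0: "accounting completed; inspect proof_ok and fallbacks for unexpected labels",
    2: "strict-proof failure; triage the named service's NPU path",
}


def next_step(script: str, code: int) -> str:
    """Map an exit code of 'health' or 'digest' to the documented next step."""
    table = {"health": HEALTH_EXIT, "digest": DIGEST_EXIT}[script]
    # Codes not listed in the runbook are treated conservatively (assumption).
    return table.get(code, f"undocumented exit code {code}; treat as failure")


def may_run_digest(health_exit: int) -> bool:
    """Only a clean health pass should be followed by the digest."""
    return health_exit == 0
```

Gating the digest on `may_run_digest` mirrors Step 1's rule that the embeddings service is triaged first when the health script fails.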

## Approval gates left closed

The integrated workflow intentionally does not:

- start, stop, restart, enable, or disable any user systemd unit or Docker
  Compose service;
- write to or mutate the Chroma collection `obsidian_bge_npu` or any other
  vector store;
- change Atlas/Hermes routing or model defaults;
- post classification/generation/triage events to the advisory gateway;
- broaden private document, image, or audio roots;
- bind any new listener, including on `0.0.0.0`;
- write memory, send messages, execute tools, or mutate Kanban state.

These remain approval-gated and are tracked on the `npu-maximization` board.

## Quick reference

```bash
# Single-pass NPU health check (listener + systemd + embeddings proof).
cd ~/lab/swarm && ./scripts/npu-service-health.sh

# Compact digest with per-service proof and fallback accounting.
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text

# Same, with a JSONL artifact for trend tracking.
scripts/npu-utilization-digest.py --format jsonl

# Strict mode for CI / pre-merge.
scripts/npu-utilization-digest.py --no-write --strict-proof

# Offline digest logic tests.
python -m pytest tests/test_npu_utilization_digest.py -q
```
+122
-134
@@ -3,7 +3,7 @@ type: runbook
system: openvino-npu-services
status: draft
created: 2026-06-04
updated: 2026-06-04
updated: 2026-06-05
tags:
- runbook
- openvino
@@ -18,33 +18,92 @@ related:

# OpenVINO NPU Services Runbook

This runbook is the integrated operations view for Will's local Intel NPU/OpenVINO services from the `npu-capability-expansion` board.
This runbook is the integrated operations view for Will's local Intel NPU/OpenVINO services after the first approved `npu-maximization` lanes. It treats the NPU as a local reflex layer: classify, embed, rerank, transcribe, triage, and draft compact advisory output while Atlas/Hermes keeps final authority unless a separate approval changes that.

Safety posture:
- Do not restart the live Atlas/Hermes gateway from this runbook.
- Do not change primary Atlas/Hermes routing without explicit Will approval.
- Do not delete, overwrite, or in-place reindex existing Chroma/vector collections.
- Treat HTTP 200 as necessary but not sufficient for NPU-backed services; verify `/sys/class/accel/accel0/device/npu_busy_time_us` before/after an inference.
- Keep endpoints local-only unless Will explicitly approves broader exposure.
- Keep raw prompts, private documents, OCR text, and secrets out of logs and durable handoffs.
- Treat HTTP 200 as necessary but not sufficient for NPU-backed services; verify `/sys/class/accel/accel0/device/npu_busy_time_us` before/after a real inference.
- Keep endpoints local-only or on the approved Docker bridge only; do not add wildcard binds.
- Keep raw prompts, private documents, OCR text, transcripts, and secrets out of logs and durable handoffs.
- Keep operational outputs compact: booleans, counts, paths, deltas, and gates rather than raw JSON dumps.
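
One way to follow the compact-output rule is to reduce a health payload to its scalar fields before printing it. The sketch below is an assumption about how that could look, not existing tooling: `compact_line` is a hypothetical helper, and the length cutoff is a heuristic that does not by itself guarantee a short string is non-private.

```python
def compact_line(name: str, payload: dict, max_len: int = 40) -> str:
    """Render only scalar fields as 'key=value', dropping nested or long values."""
    parts = []
    for key in sorted(payload):
        value = payload[key]
        if isinstance(value, bool):
            parts.append(f"{key}={str(value).lower()}")
        elif isinstance(value, (int, float)):
            parts.append(f"{key}={value}")
        elif isinstance(value, str) and len(value) <= max_len:
            parts.append(f"{key}={value}")
        # dicts, lists, and long strings (possible raw text) are omitted
    return f"- {name}: " + " ".join(parts)
```

The result has the same shape as the per-service rows of the utilization digest, so it can sit next to them in a handoff note.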

## Current service map
## Reflex-layer topology

| Capability | Port | Runtime / service | Path | State | Health endpoint | NPU proof |
| --- | ---: | --- | --- | --- | --- | --- |
| Obsidian/RAG endpoint | 18810 | `obsidian-reindex-endpoint.service` / local Python endpoint | `~/lab/swarm/scripts/` | live baseline; uses collection `obsidian_bge_npu` | `http://127.0.0.1:18810/healthz` | indirect via embeddings `:18817`; do not mutate existing collection |
| RAG/embedding health wrapper | 18814 | `rag-embedding-health.service` | `~/lab/swarm/swarm-common/rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | should exercise embeddings path when configured |
| Whisper transcription, OpenVINO NPU | 18816 | Docker Compose service/container `whisper-server-npu` | `~/lab/swarm/whisper-openvino-npu/` | live baseline | `http://127.0.0.1:18816/health` | transcription response includes `npu_busy_delta_us`; sysfs delta must increase |
| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz` | embedding response and sysfs delta must be positive |
| NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
| NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
| Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
| Document/image triage prototype | optional 18829 for review only; 18828 was an earlier smoke alternate | CLI-first; foreground local-only server if needed; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
```text
event / audio / doc / query / task
  -> local OpenVINO/NPU specialists
     embeddings :18817, rerank :18818, whisper :18816,
     classifier :18819, genai worker :18820, doc/image triage :18829,
     advisory gateway 172.19.0.1:18830
  -> explicit policy and authority gates
  -> Atlas/Hermes or human only when approved/useful
```

Authority split:
- NPU services may advise, label, score, transcribe, embed, rerank, triage explicit roots/files, and draft bounded summaries.
- NPU services must not route Atlas/Hermes, write memory, send outbound messages, restart services, execute tools, mutate Kanban, or mutate vector DBs without separate approval.

## Live baseline services

These are part of the current live local baseline. Use read-only checks unless Will explicitly asks for remediation.

| Capability | Port / bind | Runtime / service | State | Health / proof | Notes |
| --- | ---: | --- | --- | --- | --- |
| Obsidian/RAG endpoint | `18810` | `obsidian-reindex-endpoint.service` / local Python endpoint | live baseline | `http://127.0.0.1:18810/healthz`; NPU proof is indirect through embeddings/rerank | Uses collection `obsidian_bge_npu`; do not mutate/reindex in place. Discovery observed `RAG_RERANK_ENABLED=true` and `RAG_RERANK_REQUIRE_NPU_PROOF=true`; do not change from this runbook. |
| RAG/embedding health wrapper | `18814` | `rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | Health wrapper only; use compact summaries. |
| Whisper transcription | `18816` | Docker Compose service/container `whisper-server-npu` | live baseline | `http://127.0.0.1:18816/health`; transcription response plus sysfs busy delta must increase | Use small non-private WAV fixtures for proof. Do not restart from docs. |
| OpenVINO embeddings | `18817` | user systemd `openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz`; embedding response and sysfs delta must be positive | Model `bge-base-en-v1.5-int8-ov`, dim 768. Existing bind is broader than new-service guidance; do not broaden anything else. |

## Live local-only advisory specialists

These services are available locally for advisory/reflex work, not for authority. Some were originally prototypes but discovery/review found them active/enabled; do not reinstall or enable again blindly.

| Capability | Port / bind | Runtime / service | State | Health / proof | Authority boundary |
| --- | ---: | --- | --- | --- | --- |
| NPU reranker | `18818` localhost | `openvino-reranker.service` / `openvino-reranker-npu/` | live local specialist | `/readyz`; `/rerank` response and positive sysfs delta | Rerank only; no vector mutation. |
| NPU router/classifier | `18819` localhost | `openvino-router-classifier.service` / `openvino-classifier-npu/` | live local specialist, dry-run/advisory | `/healthz`; `/v1/classify` response and positive sysfs delta | Labels/recommendations only; no routing, sends, memory writes, restarts, or tool execution. |
| Small OpenVINO GenAI worker | `18820` localhost | `openvino-genai-npu-worker.service` / `openvino-genai-npu-worker/` | live local specialist; may report `loaded=false` until used | `/healthz`, `/models`; generation proof requires positive sysfs delta | Bounded draft/title/summary jobs only; not primary Atlas chat. Avoid cold-load generation unless the task requires it. |
| Document/image triage | `18829` localhost | `openvino-doc-image-triage-npu/` | live local specialist with explicit roots | `/healthz`, `/models`; v1 NPU proof is semantic embedding through `:18817` | Request roots may narrow configured roots, never broaden. OCR/image classification are CPU/local fallbacks. |
| Advisory gateway | `172.19.0.1:18830` approved bridge | `openvino-advisory-gateway.service` / `openvino-advisory-gateway/` | live bridge-facing advisory wrapper | `/healthz`; classify/generate/triage responses include NPU proof | For `n8n-agent` and host cron. POSTs can write metadata events, so use health-only unless classification/draft is in scope. No wildcard bind. |

Port notes:
- `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
- `18820` is reserved for the GenAI worker prototype. Use optional `18829` for document/image triage foreground review until Will approves a final persistent port. `18828` was used in earlier review smoke only and should not be treated as the preferred documented port.
- Existing `:18817` is currently bound on `0.0.0.0` by the user service; prototype services should still default to `127.0.0.1`.
- Prefer localhost for host-only sidecars. The advisory gateway bridge bind is intentionally for Docker bridge consumers such as `n8n-agent`.
- `18828` was an earlier review alternate for doc/image triage and should not be treated as the preferred documented port.
- Check listeners before foreground smokes: `ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18829|18830)\b'`.
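
The listener check can also be evaluated programmatically, which makes wildcard binds easier to spot than in raw `grep` output. This is a sketch that assumes the usual `ss -ltn` column layout (state, queues, then local `address:port`); `parse_listeners` and `listener_report` are hypothetical helper names.

```python
# Ports taken from the listener check in the port notes.
EXPECTED_PORTS = (18810, 18814, 18816, 18817, 18818, 18819, 18820, 18829, 18830)
WILDCARDS = {"0.0.0.0", "*", "[::]", "::"}


def parse_listeners(ss_output: str) -> dict:
    """Map port -> set of local bind addresses from `ss -ltn` style output."""
    binds = {}
    for line in ss_output.splitlines():
        cols = line.split()
        if len(cols) < 4 or cols[0] != "LISTEN":
            continue  # header or unrelated line
        addr, _, port = cols[3].rpartition(":")
        if port.isdigit():
            binds.setdefault(int(port), set()).add(addr)
    return binds


def listener_report(ss_output: str, expected: tuple = EXPECTED_PORTS) -> dict:
    """Report expected ports with no listener and ports bound on a wildcard."""
    binds = parse_listeners(ss_output)
    missing = [p for p in expected if p not in binds]
    wildcard = sorted(p for p in expected if binds.get(p, set()) & WILDCARDS)
    return {"missing": missing, "wildcard": wildcard}
```

A wildcard entry for `18817` is the pre-existing bind noted above, so it is a known finding rather than a new one; any other port in `wildcard` would contradict the no-wildcard-bind rule.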

## Dry-run examples and approved lane artifacts

The first-slice lanes below are approved as dry-run/local advisory examples. They may be merged into the repo by the integration lane, but they do not grant authority to mutate live Atlas/Hermes behavior.

| Lane | Approved branch / commit | Artifact paths | Safe use |
| --- | --- | --- | --- |
| Observability/utilization digest | `feature/npu-max-observability` @ `d661dc299` | `docs/npu-utilization-digest.md`, `scripts/npu-utilization-digest.py` | Read-only compact digest; can write JSONL under `~/.local/state/npu-utilization/digests` unless `--no-write`. Reviewer verified services_ok=9/9, proof_ok=5/5 on live smoke. |
| Context-gate advisory CLI | `feature/npu-max-context-gate` @ `b4ef90aff` | `openvino_context_gate/`, `scripts/context-gate-advisory.py` | Plans typed context bundle sources; no retrieval, routing, memory write, or private content. Classifier URL is loopback-only and redirects fail closed. |
| Cron/n8n advisory classifier | `feature/npu-max-cron-n8n` @ `54d3bcb7` | `openvino-advisory-gateway/docs/cron-n8n-advisory-classifier.md`, `examples/cron-advisory-dry-run.sh`, `examples/n8n-advisory-dry-run-fragment.json` | Dry-run event classification: duplicate/stale/no-op/action-required -> suppress/log/summarize/escalate recommendation, then human/Atlas gate before side effects. |
| Explicit-root batch doc/image/audio triage | `feature/npu-max-doc-audio-triage` @ `bfa62cddb` | `docs/npu-batch-triage-dry-run.md`, `scripts/npu-batch-triage-dry-run.py`, `config/triage-roots*.yaml` | Reads only approved/narrow staging roots; reports compact counts/proof; no file moves, Obsidian/RAG writes, sends, or vector mutation. Whisper endpoint override is loopback `:18816` only. |
| Voice/audio local-file pipeline | `feature/npu-max-voice` @ `534816249` | `docs/npu-voice-audio-pipeline.md`, `scripts/npu_voice_audio_pipeline.py` | Local audio file -> Whisper NPU -> classifier NPU -> advisory gate. No platform fetching, sends, writes, memory writes, or routing changes. |
| Kanban/task hygiene advisory | `feature/npu-max-kanban-hygiene` @ `575a3cef6` | `scripts/kanban-hygiene-advisory.py` | Reads compact board summaries and suggests labels/next gates only. Does not call Kanban tools or mutate the board. NPU proof failures dominate generic review-required gates. |

Dry-run command patterns:

```bash
# Compact service/proof digest; no artifact write during review.
scripts/npu-utilization-digest.py --no-write --include-genai-smoke false

# Local-only context-gate planning; does not retrieve private content.
python scripts/context-gate-advisory.py --query "How do I check NPU reranker proof?" --format compact

# Cron/n8n event advisory wrapper; dry-run only, one compact decision line.
openvino-advisory-gateway/examples/cron-advisory-dry-run.sh npu-service-health warning health_check "openvino-reranker timeout twice" "service:openvino-reranker:timeout"

# Explicit-root triage; manifest root may be narrowed by --root, never broadened.
python scripts/npu-batch-triage-dry-run.py --manifest config/triage-roots.test.yaml --lane receipts --root openvino-doc-image-triage-npu/samples --limit 5 --dry-run --json

# Local-file audio advisory; transcript omitted unless explicitly requested.
/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py --audio /tmp/npu-voice-smoke.wav --title "synthetic smoke" --source manual_smoke --json
```

## Read-only unified health check

@@ -55,15 +114,15 @@ cd ~/lab/swarm
./scripts/npu-service-health.sh
```

The script is read-only. It checks listeners for `18810`, `18816`, `18817`, `18818`, `18819`, `18820`, `18829` plus the existing `18814` wrapper and `18828` review alternate, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
The script is read-only. It checks listeners for the live baseline and local specialists, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.

Manual minimal checks:

```bash
BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
cat "$BUSY"
ss -ltnp | grep -E ':(18810|18816|18817|18818|18819|18820|18829)\b' || true
systemctl --user is-active openvino-embeddings.service rag-embedding-health.service
ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18829|18830)\b' || true
systemctl --user is-active openvino-embeddings.service rag-embedding-health.service openvino-reranker.service openvino-router-classifier.service openvino-genai-npu-worker.service openvino-doc-image-triage.service openvino-advisory-gateway.service
cd ~/lab/swarm && docker compose ps whisper-server-npu
curl -fsS http://127.0.0.1:18817/healthz | jq .
```
@@ -87,23 +146,7 @@ A healthy NPU path has:

## Service-specific smoke checks

For any foreground prototype server below, run it in a terminal you control or capture its PID and stop it at the end of the smoke. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning for these review smokes unless Will explicitly approved persistent service enablement.

Safe foreground-server pattern:

```bash
server_pid=""
cleanup() {
  if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then
    kill "$server_pid"
    wait "$server_pid" 2>/dev/null || true
  fi
}
trap cleanup EXIT
# start prototype server with --host 127.0.0.1 --port <port> &
# server_pid=$!
# run curl/smoke commands, then let trap stop it
```
For any foreground prototype/server smoke, run it in a terminal you control or capture its PID and stop it at the end. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning unless Will explicitly approved persistent service enablement. Several specialists are already live; do not start duplicate listeners.
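
The capture-the-PID-and-stop-it rule can be sketched in Python as a context manager that always stops the child process. This is an illustration only, not part of the repo: `foreground_server`, `run_smoke`, and `SLEEPER` are hypothetical names, and `SLEEPER` is a harmless stand-in for a prototype server command. Check listeners first, since several specialists are already live.

```python
import subprocess
import sys
from contextlib import contextmanager

# Harmless stand-in for a prototype server command (assumption for the demo).
SLEEPER = [sys.executable, "-c", "import time; time.sleep(60)"]


@contextmanager
def foreground_server(cmd: list, stop_timeout: float = 10.0):
    """Start a child process and always stop it when the block exits."""
    proc = subprocess.Popen(cmd)
    try:
        yield proc
    finally:
        if proc.poll() is None:      # still running
            proc.terminate()         # polite stop first
            try:
                proc.wait(timeout=stop_timeout)
            except subprocess.TimeoutExpired:
                proc.kill()          # force stop if it ignores terminate
                proc.wait()


def run_smoke(cmd: list, check) -> tuple:
    """Run `check(proc)` while the server is up; return (result, stopped)."""
    with foreground_server(cmd) as proc:
        result = check(proc)
    return result, proc.poll() is not None
```

Because the stop happens in `finally`, the process is cleaned up even when a smoke check raises, so no listener outlives the review.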

### Whisper NPU (`:18816`)

@@ -115,7 +158,6 @@ curl -fsS http://127.0.0.1:18816/health | jq .

Operational notes:
- Managed as Docker Compose service/container `whisper-server-npu` in `~/lab/swarm`.
- Consistent with existing swarm service patterns because it is a containerized service with Compose health.
- Do not restart it from this runbook unless Will asked for remediation.

### OpenVINO embeddings (`:18817`)
@@ -127,26 +169,10 @@ curl -fsS http://127.0.0.1:18817/healthz | jq .

Operational notes:
- User systemd unit: `openvino-embeddings.service`.
- Model: `bge-base-en-v1.5-int8-ov`.
- Model directory: `/home/will/.cache/openvino-models/bge-base-en-v1.5-int8-ov`.
- Live RAG `:18810` uses Chroma collection `obsidian_bge_npu` through this service. Do not reindex or replace this collection in place.

### Reranker prototype (`:18818`)

Foreground review start only, after confirming port is free:

```bash
ss -ltnp | grep ':18818\b' || true
cd ~/lab/swarm/openvino-reranker-npu
source /home/will/.venvs/openvino-reranker/bin/activate
OPENVINO_RERANKER_HOST=127.0.0.1 \
OPENVINO_RERANKER_PORT=18818 \
OPENVINO_RERANKER_DEVICE=NPU \
OPENVINO_RERANKER_MODEL_DIR=/home/will/.cache/openvino-models/rerankers/ms-marco-MiniLM-L6-v2-int8-ov \
python server.py
```

From another shell:
### Reranker (`:18818`)

```bash
curl -fsS http://127.0.0.1:18818/readyz | jq .
@@ -154,107 +180,78 @@ python ~/lab/swarm/openvino-reranker-npu/smoke.py --url http://127.0.0.1:18818
```

Approval gate:
- May be installed as `openvino-reranker.service` only after foreground smoke and Will approval.
- May be integrated into RAG only behind disabled-by-default knobs such as `RAG_RERANK_ENABLED=false`; request-time reranking must not mutate Chroma.
- Rerank may score candidate passages only. Any change to RAG answer selection, rerank policy, or vector DB behavior requires separate approval and rollback notes.

### Router/classifier prototype (`:18819`)

Foreground review start only, after confirming port is free:

```bash
ss -ltnp | grep ':18819\b' || true
cd ~/lab/swarm/openvino-classifier-npu
/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819
```

Smoke:
### Router/classifier (`:18819`)

```bash
curl -fsS http://127.0.0.1:18819/healthz | jq .
curl -fsS http://127.0.0.1:18819/v1/classify \
-H 'Content-Type: application/json' \
-d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true,"dry_run":true}}' | jq .
-d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":false,"dry_run":true}}' | jq '{id, labels, npu_busy_delta_us, sysfs_npu_busy_delta_us}'
```

Approval gate:
- May be installed as `openvino-router-classifier.service` only after Will approves live service enablement.
- Must remain dry-run and must not alter Hermes/Atlas routing, memory writes, safety confirmation flow, or outbound messages without a separate explicit approval.
- Must remain dry-run/advisory and must not alter Hermes/Atlas routing, memory writes, safety confirmation flow, or outbound messages without a separate explicit approval.

### Small GenAI NPU worker (`:18820`)

Foreground review start only, after confirming port is free:

```bash
ss -ltnp | grep ':18820\b' || true
cd ~/lab/swarm/openvino-genai-npu-worker
/home/will/.venvs/npu/bin/python worker.py --host 127.0.0.1 --port 18820
```

Smoke:

```bash
curl -fsS http://127.0.0.1:18820/healthz | jq .
curl -fsS http://127.0.0.1:18820/models | jq .
curl -fsS http://127.0.0.1:18820/v1/worker/condense-notification \
-H 'Content-Type: application/json' \
-d '{"input":"Non-private smoke notification for local NPU worker.","max_new_tokens":64}' | jq .
```

Approval gate:
- May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
- Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.
- Must not become primary Atlas/Hermes model routing. Use only for bounded local jobs such as title, summary, notification condensation, and memory-candidate drafting after the relevant job is approved.
- Avoid generation smokes that cold-load the model unless the task explicitly calls for it.

### Document/image triage prototype (`:18829` optional review port)
|
||||
|
||||
Foreground review start only, after confirming the port is free:
|
||||
|
||||
```bash
|
||||
ss -ltnp | grep ':18829\b' || true
|
||||
cd ~/lab/swarm/openvino-doc-image-triage-npu
|
||||
/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
|
||||
```
|
||||
|
||||
Smoke:

```bash
curl -fsS http://127.0.0.1:18829/healthz | jq .
curl -fsS http://127.0.0.1:18829/models | jq .
/home/will/.venvs/npu/bin/python tests/smoke_test.py
```

Approval gate:

- Do not point it at arbitrary directories; allowed roots must be equal to or under configured roots.
- Do not include raw OCR text or full source paths unless Will explicitly asks for one-off debugging.
- v1 uses the NPU only through `:18817` embeddings for needs-attention; image category classification and OCR are CPU/local fallbacks.

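The "equal to or under configured roots" rule can be sketched as a path-containment check. This assumes GNU `realpath` (for the `-m` flag); the helper name `is_under_root` is hypothetical, and the prototype's own `--allowed-root` enforcement may differ.

```bash
# Hypothetical helper (not part of the prototype).
# Exit 0 only when CANDIDATE resolves to ROOT itself or to a path beneath it.
# `realpath -m` normalises ".." and resolves symlinks on existing components
# without requiring the path to exist.
is_under_root() {
  local root candidate
  root="$(realpath -m -- "$1")" || return 2
  candidate="$(realpath -m -- "$2")" || return 2
  # Appending "/" makes the comparison component-wise: /a/b matches /a/b and
  # /a/b/x but not /a/bc. A root of "/" never matches, which fails closed.
  case "$candidate/" in
    "$root"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A wrapper could call `is_under_root "$configured_root" "$requested_path"` for each configured root and refuse the request when none matches; both variable names here are placeholders.
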
### Advisory gateway (`172.19.0.1:18830`)

```bash
curl -fsS http://172.19.0.1:18830/healthz | jq .
docker exec n8n-agent wget -qO- -T 8 http://172.19.0.1:18830/healthz
```

Approval gate:

- Classification/generation/triage POSTs are advisory only and may write metadata counters. Do not wire outputs to sends, restarts, memory writes, tool execution, or Atlas/Hermes routing without a separate reviewed approval.

## Systemd and Compose recommendations

Recommended management split:

- Keep containerized services in Docker Compose when they already have Docker build/runtime shape and Compose health (`whisper-server-npu`).
- Keep host-side OpenVINO Python prototypes as user systemd services when they depend on local venvs, sysfs NPU access, model caches, and localhost-only APIs (`openvino-embeddings`, optional reranker/classifier/GenAI worker).
- Do not add the prototypes to the live gateway or primary routing during installation. Installation and routing are separate approval gates.

User-systemd unit expectations for optional prototypes:

- `WorkingDirectory` points at the service directory under `~/lab/swarm/`.
- `ExecStart` uses the existing venv path documented by the prototype.
- `Environment` pins host to `127.0.0.1`, port, model path, device `NPU`, and any upstream endpoint.
- `Restart=on-failure`, not aggressive restart loops.
- Logs go to user journal; do not log raw request bodies.
- Start manually for smoke; enable on boot only after Will approval.

Compose expectations for existing swarm services:

- Prefer `cd ~/lab/swarm && make ps`, `make status`, and targeted `docker compose ps <service>` for read-only checks.
- Do not run `docker compose up -d`, restart containers, pull images, or prune volumes from this runbook without approval.

## Approval-gated / not-live integrations

The following remain closed even though dry-run examples and local specialists exist:

| Integration | Current gate |
| --- | --- |
| Primary Atlas/Hermes routing changes | closed; no live routing authority changes from this program slice |
| Memory writes from NPU classifier/GenAI/advisory gateway | closed |
| Telegram/Discord/email/outbound sends from cron/n8n/voice/advisory output | closed |
| Service restarts or tool execution triggered by classifier/gateway output | closed |
| Automatic Kanban task mutation, assignment, block/unblock, completion, or task creation | closed |
| Broad private document/image/audio root processing | closed; only explicit approved/narrow roots |
| Vector DB mutation/reindex or Chroma collection replacement | closed |
| Wildcard binds or broader exposure for new services | closed |
| GenAI worker as primary chat model | closed; bounded local drafts only |
| Diffusion/image generation on the NPU | rejected/parked for this program |

## Monitoring and logging notes

Minimum recurring monitoring should include:

- Listener presence for the live baseline ports (including `18816` and `18817`) and any approved specialist or optional prototype ports.
- User service state for OpenVINO services (`openvino-embeddings.service` and any approved optional prototype unit) and Docker Compose health for `whisper-server-npu`.
- HTTP health endpoint success.
- Positive sysfs NPU busy-time delta on at least one non-private inference probe, preferably embeddings `:18817` because it is already live and central.
- Journal/container logs at summary level only: compact counts, deltas, and gates. Avoid raw prompts, transcripts, raw OCR text, private document names, credentials, and API keys.

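A minimal sketch of the busy-time delta check, assuming the sysfs counter holds a single integer in microseconds that only increases. The helper names and the `NPU_BUSY_FILE` override are hypothetical; the inference probe itself is left as a commented placeholder.

```bash
# Counter path as given in this runbook; the NPU_BUSY_FILE override is a
# hypothetical knob for testing on hosts without an NPU.
NPU_BUSY_FILE="${NPU_BUSY_FILE:-/sys/class/accel/accel0/device/npu_busy_time_us}"

# Print the current counter value (assumed to be one integer, in microseconds).
read_busy_us() {
  cat -- "$NPU_BUSY_FILE"
}

# Print the delta and exit 0 only when the counter advanced.
busy_delta_positive() {
  local before="$1" after="$2" delta
  delta=$(( after - before ))
  printf 'npu_busy_delta_us=%d\n' "$delta"
  [ "$delta" -gt 0 ]
}

# Usage sketch (not executed here):
#   before="$(read_busy_us)"
#   ... run one non-private inference probe, e.g. against embeddings :18817 ...
#   after="$(read_busy_us)"
#   busy_delta_positive "$before" "$after" || echo "no NPU activity observed"
```

Printing only the delta keeps the output within the compact counts/deltas rule, and the non-zero exit status lets a wrapper treat an HTTP 200 without NPU activity as a failed proof.
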
Useful log commands:

```bash
journalctl --user -u rag-embedding-health.service -n 100 --no-pager
journalctl --user -u openvino-reranker.service -n 100 --no-pager
journalctl --user -u openvino-router-classifier.service -n 100 --no-pager
journalctl --user -u openvino-genai-npu-worker.service -n 100 --no-pager
journalctl --user -u openvino-advisory-gateway.service -n 100 --no-pager
cd ~/lab/swarm && docker compose logs --tail 100 whisper-server-npu
```

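To keep log review at summary level, the output of the commands above can be reduced to counts before it is recorded anywhere. The sketch below is an assumption about what "summary level" could look like; `summarize_log` is a hypothetical name, and the keyword matching is deliberately crude.

```bash
# Hypothetical helper (not part of the existing scripts).
# Reads log lines on stdin and prints only counts, never the lines themselves.
summarize_log() {
  awk '
    { line = tolower($0); total++ }   # case-insensitive matching below
    line ~ /error/ { errors++ }
    line ~ /warn/  { warnings++ }
    END { printf "lines=%d errors=%d warnings=%d\n", total, errors, warnings }
  '
}
```

For example, `journalctl --user -u openvino-embeddings.service -n 100 --no-pager | summarize_log` emits a single line of counts that is safe to paste into a digest, since no prompt, transcript, or path text survives the filter.
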
## Approval gates

Requires explicit Will approval before proceeding:

- Installing, enabling, or autostarting `openvino-reranker.service`, `openvino-router-classifier.service`, or `openvino-genai-npu-worker.service`.
- Assigning a final persistent port to document/image triage or enabling it as a persistent service.
- Enabling live RAG reranking or any request path that changes Atlas/RAG answers.
- Changing primary Atlas/Hermes routing or connecting router/classifier outputs to live decisions.
- Connecting the GenAI worker to primary Atlas chat, gateway routing, memory writes, or outbound notifications.
- Restarting the live Atlas/Hermes gateway.
- Deleting, overwriting, or in-place reindexing existing vector collections.
- Broadening bind addresses or exposure beyond local-only defaults.

## Approved/parked outcomes

- Built/approved prototypes: reranker (`:18818`), router/classifier (`:18819`), small GenAI worker (`:18820`), document/image triage (review ports `:18828`/`:18829`).
- Live baseline retained: RAG endpoint (`:18810`) using `obsidian_bge_npu`, RAG health wrapper (`:18814`), Whisper NPU (`:18816`), OpenVINO embeddings (`:18817`).
- Live local-only advisory/reflex specialists: reranker (`:18818`), router/classifier (`:18819`), GenAI worker (`:18820`), doc/image triage (`:18829`), advisory gateway bridge (`172.19.0.1:18830`).
- Approved dry-run examples: utilization digest, context gate plan, cron/n8n advisory classifier, explicit-root batch triage, local-file voice/audio pipeline, Kanban hygiene advisory.
- Parked: always-on wake-word/audio and conventional vision detection until Will wants a concrete use case.
- Rejected for this NPU program: diffusion/image generation.