ops: validate NPU sidecar health runbook

This commit is contained in:
William Valentin
2026-06-04 12:27:44 -07:00
parent 18c6bb5235
commit d6ca0391b7
2 changed files with 33 additions and 10 deletions
+5
View File
@@ -45,6 +45,10 @@ printf 'busy_path=%s\n' "$BUSY_PATH"
printf 'busy_time_us=%s\n' "$(busy_value)"
section "Listeners"
# Required OpenVINO/NPU program ports: live baseline 18810/18816/18817,
# approved prototypes 18818/18819/18820, and optional doc/image triage 18829.
# 18814 is the existing RAG/embedding health wrapper; 18828 is a review-only
# alternate used to avoid collisions during prior smoke tests.
ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829)\b' || true
section "User service states"
@@ -73,6 +77,7 @@ http_json "OpenVINO embeddings" "http://127.0.0.1:18817/healthz" || true
http_json "NPU reranker prototype" "http://127.0.0.1:18818/readyz" || true
http_json "NPU router classifier prototype" "http://127.0.0.1:18819/healthz" || true
http_json "NPU GenAI worker prototype" "http://127.0.0.1:18820/healthz" || true
http_json "NPU doc/image triage prototype" "http://127.0.0.1:18829/healthz" || true
section "Embeddings NPU busy-time proof"
if [[ ! -r "$BUSY_PATH" ]]; then
@@ -35,11 +35,11 @@ Safety posture:
| Obsidian/RAG endpoint | 18810 | `obsidian-reindex-endpoint.service` / local Python endpoint | `~/lab/swarm/scripts/` | live baseline; uses collection `obsidian_bge_npu` | `http://127.0.0.1:18810/healthz` | indirect via embeddings `:18817`; do not mutate existing collection |
| RAG/embedding health wrapper | 18814 | `rag-embedding-health.service` | `~/lab/swarm/swarm-common/rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | should exercise embeddings path when configured |
| Whisper transcription, OpenVINO NPU | 18816 | Docker Compose service/container `whisper-server-npu` | `~/lab/swarm/whisper-openvino-npu/` | live baseline | `http://127.0.0.1:18816/health` | transcription response includes `npu_busy_delta_us`; sysfs delta must increase |
| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/health` | embedding response and sysfs delta must be positive |
| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz` | embedding response and sysfs delta must be positive |
| NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
| NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
| Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
| Document/image triage prototype | 18828 or 18829 for review only | foreground local-only server; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:<port>/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
| Document/image triage prototype | 18829 optional review port; 18828 alternate smoke port | foreground local-only server; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
Port notes:
- `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
@@ -55,17 +55,17 @@ cd ~/lab/swarm
./scripts/npu-service-health.sh
```
The script is read-only. It checks listeners, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
The script is read-only. It checks listeners for `18810`, `18816`, `18817`, `18818`, `18819`, `18820`, `18829` plus the existing `18814` wrapper and `18828` review alternate, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
Manual minimal checks:
```bash
BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
cat "$BUSY"
ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829)\b' || true
ss -ltnp | grep -E ':(18810|18816|18817|18818|18819|18820|18829)\b' || true
systemctl --user is-active openvino-embeddings.service rag-embedding-health.service
cd ~/lab/swarm && docker compose ps whisper-server-npu
curl -fsS http://127.0.0.1:18817/health | jq .
curl -fsS http://127.0.0.1:18817/healthz | jq .
```
Embedding NPU proof:
@@ -87,6 +87,24 @@ A healthy NPU path has:
## Service-specific smoke checks
For any foreground prototype server below, run it in a terminal you control or capture its PID and stop it at the end of the smoke. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning for these review smokes unless Will explicitly approved persistent service enablement.
Safe foreground-server pattern:
```bash
server_pid=""
cleanup() {
if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then
kill "$server_pid"
wait "$server_pid" 2>/dev/null || true
fi
}
trap cleanup EXIT
# start prototype server with --host 127.0.0.1 --port <port> &
# server_pid=$!
# run curl/smoke commands, then let trap stop it
```
### Whisper NPU (`:18816`)
```bash
@@ -104,7 +122,7 @@ Operational notes:
```bash
systemctl --user status openvino-embeddings.service --no-pager
curl -fsS http://127.0.0.1:18817/health | jq .
curl -fsS http://127.0.0.1:18817/healthz | jq .
```
Operational notes:
@@ -186,21 +204,21 @@ Approval gate:
- May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
- Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.
### Document/image triage prototype (`:18828`/`:18829` review ports)
### Document/image triage prototype (`:18829`, with `:18828` as alternate)
Foreground review start only, after confirming port is free:
```bash
ss -ltnp | grep -E ':(18828|18829)\b' || true
cd ~/lab/swarm/openvino-doc-image-triage-npu
/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18828 --allowed-root "$PWD"
/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
```
Smoke:
```bash
curl -fsS http://127.0.0.1:18828/healthz | jq .
curl -fsS http://127.0.0.1:18828/models | jq .
curl -fsS http://127.0.0.1:18829/healthz | jq .
curl -fsS http://127.0.0.1:18829/models | jq .
/home/will/.venvs/npu/bin/python tests/smoke_test.py
```