ops: validate NPU sidecar health runbook
@@ -45,6 +45,10 @@ printf 'busy_path=%s\n' "$BUSY_PATH"
 printf 'busy_time_us=%s\n' "$(busy_value)"

 section "Listeners"
+# Required OpenVINO/NPU program ports: live baseline 18810/18816/18817,
+# approved prototypes 18818/18819/18820, and optional doc/image triage 18829.
+# 18814 is the existing RAG/embedding health wrapper; 18828 is a review-only
+# alternate used to avoid collisions during prior smoke tests.
 ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829)\b' || true

 section "User service states"
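The `|| true` keeps the listener check non-fatal, so a required port that is not listening only shows up as a missing line in the output. As a minimal sketch (not part of this commit), a helper could name the absent ports explicitly. The function name `missing_ports` and the choice to feed it `ss -ltn` output on stdin are assumptions made for illustration.

```bash
# Hypothetical helper, not part of the commit: print each expected port that
# has no matching local address in the `ss -ltn` output supplied on stdin.
missing_ports() {
  local listing port
  listing=$(cat)
  for port in "$@"; do
    # Require whitespace after ":<port>" so 18817 does not match 188170.
    if ! printf '%s\n' "$listing" | grep -Eq ":${port}[[:space:]]"; then
      printf '%s\n' "$port"
    fi
  done
}
```

It stays read-only when used as `ss -ltn | missing_ports 18810 18816 18817`; an empty result would mean every listed port has a listener.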
@@ -73,6 +77,7 @@ http_json "OpenVINO embeddings" "http://127.0.0.1:18817/healthz" || true
 http_json "NPU reranker prototype" "http://127.0.0.1:18818/readyz" || true
 http_json "NPU router classifier prototype" "http://127.0.0.1:18819/healthz" || true
 http_json "NPU GenAI worker prototype" "http://127.0.0.1:18820/healthz" || true
+http_json "NPU doc/image triage prototype" "http://127.0.0.1:18829/healthz" || true

 section "Embeddings NPU busy-time proof"
 if [[ ! -r "$BUSY_PATH" ]]; then
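The hunk stops at the readability guard, so the measurement itself is not shown here. The following is a minimal sketch of a before/after comparison on a busy-time counter, assuming the counter file holds a single decimal integer. The name `npu_busy_delta` and its exit codes are illustrative and are not taken from the script.

```bash
# Hypothetical sketch: run a command and report how much a busy-time counter
# file grew while it ran. Prints the delta and returns 0 only if it is positive.
npu_busy_delta() {
  local busy_path before after
  busy_path=$1
  shift
  [ -r "$busy_path" ] || return 2       # counter file is not readable
  before=$(cat "$busy_path")
  "$@" >/dev/null || return 3           # the workload command failed
  after=$(cat "$busy_path")
  printf '%s\n' "$((after - before))"
  [ "$after" -gt "$before" ]
}
```

Called as `npu_busy_delta "$BUSY_PATH" <embeddings request command>`, a zero or negative delta yields a non-zero exit status, which fits a proof that requires the counter to increase.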
+28 -10
@@ -35,11 +35,11 @@ Safety posture:
 | Obsidian/RAG endpoint | 18810 | `obsidian-reindex-endpoint.service` / local Python endpoint | `~/lab/swarm/scripts/` | live baseline; uses collection `obsidian_bge_npu` | `http://127.0.0.1:18810/healthz` | indirect via embeddings `:18817`; do not mutate existing collection |
 | RAG/embedding health wrapper | 18814 | `rag-embedding-health.service` | `~/lab/swarm/swarm-common/rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | should exercise embeddings path when configured |
 | Whisper transcription, OpenVINO NPU | 18816 | Docker Compose service/container `whisper-server-npu` | `~/lab/swarm/whisper-openvino-npu/` | live baseline | `http://127.0.0.1:18816/health` | transcription response includes `npu_busy_delta_us`; sysfs delta must increase |
-| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/health` | embedding response and sysfs delta must be positive |
+| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz` | embedding response and sysfs delta must be positive |
 | NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
 | NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
 | Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
-| Document/image triage prototype | 18828 or 18829 for review only | foreground local-only server; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:<port>/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
+| Document/image triage prototype | 18829 optional review port; 18828 alternate smoke port | foreground local-only server; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |

 Port notes:
 - `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
@@ -55,17 +55,17 @@ cd ~/lab/swarm
 ./scripts/npu-service-health.sh
 ```

-The script is read-only. It checks listeners, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
+The script is read-only. It checks listeners for `18810`, `18816`, `18817`, `18818`, `18819`, `18820`, `18829` plus the existing `18814` wrapper and `18828` review alternate, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.

 Manual minimal checks:

 ```bash
 BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
 cat "$BUSY"
-ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829)\b' || true
+ss -ltnp | grep -E ':(18810|18816|18817|18818|18819|18820|18829)\b' || true
 systemctl --user is-active openvino-embeddings.service rag-embedding-health.service
 cd ~/lab/swarm && docker compose ps whisper-server-npu
-curl -fsS http://127.0.0.1:18817/health | jq .
+curl -fsS http://127.0.0.1:18817/healthz | jq .
 ```

 Embedding NPU proof:
@@ -87,6 +87,24 @@ A healthy NPU path has:

 ## Service-specific smoke checks

+For any foreground prototype server below, run it in a terminal you control or capture its PID and stop it at the end of the smoke. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning for these review smokes unless Will explicitly approved persistent service enablement.
+
+Safe foreground-server pattern:
+
+```bash
+server_pid=""
+cleanup() {
+  if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then
+    kill "$server_pid"
+    wait "$server_pid" 2>/dev/null || true
+  fi
+}
+trap cleanup EXIT
+# start prototype server with --host 127.0.0.1 --port <port> &
+# server_pid=$!
+# run curl/smoke commands, then let trap stop it
+```
+
 ### Whisper NPU (`:18816`)

 ```bash
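The safe foreground-server pattern added in this hunk leaves the server start as a commented placeholder. The following is a minimal walk-through of the same trap-based cleanup, with `sleep` standing in for a prototype server and the whole smoke wrapped in a subshell function so the `EXIT` trap fires when the function returns. The name `run_foreground_smoke` and the `sleep` stand-in are assumptions for illustration only.

```bash
# Hypothetical walk-through of the pattern; `sleep` stands in for a prototype
# server. The function body is a subshell, so the EXIT trap runs on return.
run_foreground_smoke() (
  server_pid=""
  cleanup() {
    if [ -n "$server_pid" ] && kill -0 "$server_pid" 2>/dev/null; then
      kill "$server_pid"
      wait "$server_pid" 2>/dev/null || true
    fi
  }
  trap cleanup EXIT

  sleep 30 >/dev/null 2>&1 &      # stand-in for the real server command
  server_pid=$!
  printf '%s\n' "$server_pid"     # report the PID so the caller can verify shutdown
  # curl/smoke commands would run here; the trap stops the server afterwards
)
```

Running `pid=$(run_foreground_smoke)` and then `kill -0 "$pid"` is one way to confirm that nothing was left running after the smoke.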
@@ -104,7 +122,7 @@ Operational notes:

 ```bash
 systemctl --user status openvino-embeddings.service --no-pager
-curl -fsS http://127.0.0.1:18817/health | jq .
+curl -fsS http://127.0.0.1:18817/healthz | jq .
 ```

 Operational notes:
@@ -186,21 +204,21 @@ Approval gate:
 - May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
 - Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.

-### Document/image triage prototype (`:18828`/`:18829` review ports)
+### Document/image triage prototype (`:18829`, with `:18828` as alternate)

 Foreground review start only, after confirming port is free:

 ```bash
 ss -ltnp | grep -E ':(18828|18829)\b' || true
 cd ~/lab/swarm/openvino-doc-image-triage-npu
-/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18828 --allowed-root "$PWD"
+/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
 ```

 Smoke:

 ```bash
-curl -fsS http://127.0.0.1:18828/healthz | jq .
-curl -fsS http://127.0.0.1:18828/models | jq .
+curl -fsS http://127.0.0.1:18829/healthz | jq .
+curl -fsS http://127.0.0.1:18829/models | jq .
 /home/will/.venvs/npu/bin/python tests/smoke_test.py
 ```
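The revised section prefers `:18829` and keeps `:18828` as the alternate, but the port choice is still made by reading the `ss` output by eye. As a rough sketch, a helper could pick the first candidate that has no listener. The name `first_free_port` and the stdin convention are hypothetical and do not appear in the runbook.

```bash
# Hypothetical helper: given `ss -ltn` output on stdin and candidate ports in
# preference order, print the first port with no listener. Returns 1 if all
# candidates are taken.
first_free_port() {
  local listing port
  listing=$(cat)
  for port in "$@"; do
    if ! printf '%s\n' "$listing" | grep -Eq ":${port}[[:space:]]"; then
      printf '%s\n' "$port"
      return 0
    fi
  done
  return 1
}
```

For example, `port=$(ss -ltn | first_free_port 18829 18828)` would give a value to pass to `--port`, and a failure status would signal that neither review port is available.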