ops: validate NPU sidecar health runbook
@@ -45,6 +45,10 @@ printf 'busy_path=%s\n' "$BUSY_PATH"
 printf 'busy_time_us=%s\n' "$(busy_value)"
 
 section "Listeners"
+# Required OpenVINO/NPU program ports: live baseline 18810/18816/18817,
+# approved prototypes 18818/18819/18820, and optional doc/image triage 18829.
+# 18814 is the existing RAG/embedding health wrapper; 18828 is a review-only
+# alternate used to avoid collisions during prior smoke tests.
 ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829)\b' || true
 
 section "User service states"
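The listener check above prints whatever matches and leaves it to the reader to spot a missing port. A minimal sketch of the inverse check follows, assuming `ss -ltn`-style output on stdin; `missing_listeners` is a hypothetical helper and is not part of `npu-service-health.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): print each expected port that has
# no listener. Reads `ss -ltn`-style output on stdin; ports are arguments.
missing_listeners() {
  local listing port
  listing="$(cat)"
  for port in "$@"; do
    # ":<port>" followed by whitespace marks the end of the local address column.
    if ! grep -Eq ":${port}[[:space:]]" <<<"$listing"; then
      printf '%s\n' "$port"
    fi
  done
}

# Possible use on the host (commented out so the sketch stays read-only and inert):
# ss -ltn | missing_listeners 18810 18816 18817
```

An empty result would mean every listed port has a listener. The optional and review-only ports (`18829`, `18828`) are left out of the usage line because the comments above describe them as not always bound.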
@@ -73,6 +77,7 @@ http_json "OpenVINO embeddings" "http://127.0.0.1:18817/healthz" || true
 http_json "NPU reranker prototype" "http://127.0.0.1:18818/readyz" || true
 http_json "NPU router classifier prototype" "http://127.0.0.1:18819/healthz" || true
 http_json "NPU GenAI worker prototype" "http://127.0.0.1:18820/healthz" || true
+http_json "NPU doc/image triage prototype" "http://127.0.0.1:18829/healthz" || true
 
 section "Embeddings NPU busy-time proof"
 if [[ ! -r "$BUSY_PATH" ]]; then
+28
-10
@@ -35,11 +35,11 @@ Safety posture:
 | Obsidian/RAG endpoint | 18810 | `obsidian-reindex-endpoint.service` / local Python endpoint | `~/lab/swarm/scripts/` | live baseline; uses collection `obsidian_bge_npu` | `http://127.0.0.1:18810/healthz` | indirect via embeddings `:18817`; do not mutate existing collection |
 | RAG/embedding health wrapper | 18814 | `rag-embedding-health.service` | `~/lab/swarm/swarm-common/rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | should exercise embeddings path when configured |
 | Whisper transcription, OpenVINO NPU | 18816 | Docker Compose service/container `whisper-server-npu` | `~/lab/swarm/whisper-openvino-npu/` | live baseline | `http://127.0.0.1:18816/health` | transcription response includes `npu_busy_delta_us`; sysfs delta must increase |
-| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/health` | embedding response and sysfs delta must be positive |
+| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz` | embedding response and sysfs delta must be positive |
 | NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
 | NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
 | Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
-| Document/image triage prototype | 18828 or 18829 for review only | foreground local-only server; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:<port>/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
+| Document/image triage prototype | 18829 optional review port; 18828 alternate smoke port | foreground local-only server; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
 
 Port notes:
 - `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
@@ -55,17 +55,17 @@ cd ~/lab/swarm
 ./scripts/npu-service-health.sh
 ```
 
-The script is read-only. It checks listeners, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
+The script is read-only. It checks listeners for `18810`, `18816`, `18817`, `18818`, `18819`, `18820`, `18829` plus the existing `18814` wrapper and `18828` review alternate, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
 
 Manual minimal checks:
 
 ```bash
 BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
 cat "$BUSY"
-ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829)\b' || true
+ss -ltnp | grep -E ':(18810|18816|18817|18818|18819|18820|18829)\b' || true
 systemctl --user is-active openvino-embeddings.service rag-embedding-health.service
 cd ~/lab/swarm && docker compose ps whisper-server-npu
-curl -fsS http://127.0.0.1:18817/health | jq .
+curl -fsS http://127.0.0.1:18817/healthz | jq .
 ```
 
 Embedding NPU proof:
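The runbook text in this hunk requires a positive sysfs delta around an embeddings request. The before/after comparison can be sketched as follows, assuming the counter file holds a single non-negative integer in microseconds. `busy_delta` and `measure_busy` are hypothetical names and are not taken from `npu-service-health.sh`.

```bash
#!/usr/bin/env bash
# Counter path as given in the runbook; overridable for testing.
BUSY_PATH="${BUSY_PATH:-/sys/class/accel/accel0/device/npu_busy_time_us}"

# Print after-minus-before and succeed only when the delta is positive.
busy_delta() {
  local before="$1" after="$2" delta
  delta=$(( after - before ))
  printf '%s\n' "$delta"
  (( delta > 0 ))
}

# Hypothetical wrapper: sample the counter around an arbitrary command.
# Returns 2 if the counter is unreadable, 3 if the command itself fails.
measure_busy() {
  local before after
  before="$(cat "$BUSY_PATH")" || return 2
  "$@" >/dev/null || return 3
  after="$(cat "$BUSY_PATH")" || return 2
  busy_delta "$before" "$after"
}
```

To use it, pass the runbook's own non-private embeddings request as the command, for example `measure_busy <embeddings request command>`. A non-zero exit with a printed delta of `0` would suggest the request did not reach the NPU.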
@@ -87,6 +87,24 @@ A healthy NPU path has:
 
 ## Service-specific smoke checks
 
+For any foreground prototype server below, run it in a terminal you control or capture its PID and stop it at the end of the smoke. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning for these review smokes unless Will explicitly approved persistent service enablement.
+
+Safe foreground-server pattern:
+
+```bash
+server_pid=""
+cleanup() {
+  if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then
+    kill "$server_pid"
+    wait "$server_pid" 2>/dev/null || true
+  fi
+}
+trap cleanup EXIT
+# start prototype server with --host 127.0.0.1 --port <port> &
+# server_pid=$!
+# run curl/smoke commands, then let trap stop it
+```
+
 ### Whisper NPU (`:18816`)
 
 ```bash
@@ -104,7 +122,7 @@ Operational notes:
 
 ```bash
 systemctl --user status openvino-embeddings.service --no-pager
-curl -fsS http://127.0.0.1:18817/health | jq .
+curl -fsS http://127.0.0.1:18817/healthz | jq .
 ```
 
 Operational notes:
@@ -186,21 +204,21 @@ Approval gate:
 - May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
 - Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.
 
-### Document/image triage prototype (`:18828`/`:18829` review ports)
+### Document/image triage prototype (`:18829`, with `:18828` as alternate)
 
 Foreground review start only, after confirming port is free:
 
 ```bash
 ss -ltnp | grep -E ':(18828|18829)\b' || true
 cd ~/lab/swarm/openvino-doc-image-triage-npu
-/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18828 --allowed-root "$PWD"
+/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
 ```
 
 Smoke:
 
 ```bash
-curl -fsS http://127.0.0.1:18828/healthz | jq .
+curl -fsS http://127.0.0.1:18829/healthz | jq .
-curl -fsS http://127.0.0.1:18828/models | jq .
+curl -fsS http://127.0.0.1:18829/models | jq .
 /home/will/.venvs/npu/bin/python tests/smoke_test.py
 ```
 
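The triage prototype is started in the foreground and must be stopped when the smoke ends. One way to combine that start with a PID-tracking cleanup trap and a readiness wait is sketched below. `start_foreground_child` and `wait_for_health` are hypothetical helpers, and the polling interval and attempt count are arbitrary assumptions.

```bash
#!/usr/bin/env bash
server_pid=""

# Stop the background process recorded in server_pid, if it is still running.
cleanup() {
  if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then
    kill "$server_pid"
    wait "$server_pid" 2>/dev/null || true
  fi
  server_pid=""
}

# Hypothetical helper: start a command in the background and remember its PID.
start_foreground_child() {
  "$@" &
  server_pid=$!
}

# Hypothetical helper: poll a URL until it answers or the attempts run out.
wait_for_health() {
  local url="$1" attempts="${2:-20}" i
  for (( i = 0; i < attempts; i++ )); do
    curl -fsS -o /dev/null "$url" 2>/dev/null && return 0
    sleep 0.5
  done
  return 1
}

trap cleanup EXIT

# Possible use for the review smoke (commented out; nothing starts by default):
# start_foreground_child /home/will/.venvs/npu/bin/python server.py \
#   --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
# wait_for_health http://127.0.0.1:18829/healthz && curl -fsS http://127.0.0.1:18829/models | jq .
```

Because the child is a job of the current shell, the `EXIT` trap stops it even if a smoke command fails, and nothing is left running after the review.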