diff --git a/scripts/npu-service-health.sh b/scripts/npu-service-health.sh
index 460e0b2..23849f5 100755
--- a/scripts/npu-service-health.sh
+++ b/scripts/npu-service-health.sh
@@ -45,6 +45,10 @@ printf 'busy_path=%s\n' "$BUSY_PATH"
 printf 'busy_time_us=%s\n' "$(busy_value)"
 
 section "Listeners"
+# Required OpenVINO/NPU program ports: live baseline 18810/18816/18817,
+# approved prototypes 18818/18819/18820, and optional doc/image triage 18829.
+# 18814 is the existing RAG/embedding health wrapper; 18828 is a review-only
+# alternate used to avoid collisions during prior smoke tests.
 ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829)\b' || true
 
 section "User service states"
@@ -73,6 +77,7 @@ http_json "OpenVINO embeddings" "http://127.0.0.1:18817/healthz" || true
 http_json "NPU reranker prototype" "http://127.0.0.1:18818/readyz" || true
 http_json "NPU router classifier prototype" "http://127.0.0.1:18819/healthz" || true
 http_json "NPU GenAI worker prototype" "http://127.0.0.1:18820/healthz" || true
+http_json "NPU doc/image triage prototype" "http://127.0.0.1:18829/healthz" || true
 
 section "Embeddings NPU busy-time proof"
 if [[ ! -r "$BUSY_PATH" ]]; then
diff --git a/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md b/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md
index 05d3206..7867fe8 100644
--- a/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md
+++ b/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md
@@ -35,11 +35,11 @@ Safety posture:
 | Obsidian/RAG endpoint | 18810 | `obsidian-reindex-endpoint.service` / local Python endpoint | `~/lab/swarm/scripts/` | live baseline; uses collection `obsidian_bge_npu` | `http://127.0.0.1:18810/healthz` | indirect via embeddings `:18817`; do not mutate existing collection |
 | RAG/embedding health wrapper | 18814 | `rag-embedding-health.service` | `~/lab/swarm/swarm-common/rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | should exercise embeddings path when configured |
 | Whisper transcription, OpenVINO NPU | 18816 | Docker Compose service/container `whisper-server-npu` | `~/lab/swarm/whisper-openvino-npu/` | live baseline | `http://127.0.0.1:18816/health` | transcription response includes `npu_busy_delta_us`; sysfs delta must increase |
-| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/health` | embedding response and sysfs delta must be positive |
+| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz` | embedding response and sysfs delta must be positive |
 | NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
 | NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
 | Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
-| Document/image triage prototype | 18828 or 18829 for review only | foreground local-only server; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
+| Document/image triage prototype | 18829 optional review port; 18828 alternate smoke port | foreground local-only server; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
 
 Port notes:
 - `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
@@ -55,17 +55,17 @@ cd ~/lab/swarm
 ./scripts/npu-service-health.sh
 ```
 
-The script is read-only. It checks listeners, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
+The script is read-only. It checks listeners for `18810`, `18816`, `18817`, `18818`, `18819`, `18820`, `18829` plus the existing `18814` wrapper and `18828` review alternate, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
 
 Manual minimal checks:
 
 ```bash
 BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
 cat "$BUSY"
-ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829)\b' || true
+ss -ltnp | grep -E ':(18810|18816|18817|18818|18819|18820|18829)\b' || true
 systemctl --user is-active openvino-embeddings.service rag-embedding-health.service
 cd ~/lab/swarm && docker compose ps whisper-server-npu
-curl -fsS http://127.0.0.1:18817/health | jq .
+curl -fsS http://127.0.0.1:18817/healthz | jq .
 ```
 
 Embedding NPU proof:
@@ -87,6 +87,24 @@ A healthy NPU path has:
 
 ## Service-specific smoke checks
 
+For any foreground prototype server below, run it in a terminal you control or capture its PID and stop it at the end of the smoke. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning for these review smokes unless Will explicitly approved persistent service enablement.
+
+Safe foreground-server pattern:
+
+```bash
+server_pid=""
+cleanup() {
+  if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then
+    kill "$server_pid"
+    wait "$server_pid" 2>/dev/null || true
+  fi
+}
+trap cleanup EXIT
+# start prototype server with --host 127.0.0.1 --port &
+# server_pid=$!
+# run curl/smoke commands, then let trap stop it
+```
+
 ### Whisper NPU (`:18816`)
 
 ```bash
@@ -104,7 +122,7 @@ Operational notes:
 
 ```bash
 systemctl --user status openvino-embeddings.service --no-pager
-curl -fsS http://127.0.0.1:18817/health | jq .
+curl -fsS http://127.0.0.1:18817/healthz | jq .
 ```
 
 Operational notes:
@@ -186,21 +204,21 @@ Approval gate:
 - May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
 - Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.
 
-### Document/image triage prototype (`:18828`/`:18829` review ports)
+### Document/image triage prototype (`:18829`, with `:18828` as alternate)
 
 Foreground review start only, after confirming port is free:
 
 ```bash
 ss -ltnp | grep -E ':(18828|18829)\b' || true
 cd ~/lab/swarm/openvino-doc-image-triage-npu
-/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18828 --allowed-root "$PWD"
+/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
 ```
 
 Smoke:
 
 ```bash
-curl -fsS http://127.0.0.1:18828/healthz | jq .
-curl -fsS http://127.0.0.1:18828/models | jq .
+curl -fsS http://127.0.0.1:18829/healthz | jq .
+curl -fsS http://127.0.0.1:18829/models | jq .
 /home/will/.venvs/npu/bin/python tests/smoke_test.py
 ```
 
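
Review note: the runbook text in this diff requires a positive sysfs delta on `npu_busy_time_us` for the embeddings proof. The following is a minimal sketch of that before/after measurement, not the logic in `npu-service-health.sh` itself. The function names `npu_busy_delta` and `require_positive_delta` are hypothetical and do not appear in the repository, and the sketch assumes the counter file holds a single non-negative integer.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo): measure how much a busy-time
# counter file grows while a command runs, then require a positive delta.

# Usage: npu_busy_delta <counter-file> <command> [args...]
# Prints (after - before) on stdout. Returns 2 if the counter cannot be read,
# 3 if the wrapped command fails.
npu_busy_delta() {
  local busy_path="$1"
  shift
  local before after
  before="$(cat "$busy_path")" || return 2
  "$@" >/dev/null || return 3
  after="$(cat "$busy_path")" || return 2
  printf '%s\n' "$(( after - before ))"
}

# Succeeds only for a strictly positive integer delta; rejects zero,
# negative values, and non-numeric input.
require_positive_delta() {
  local delta="$1"
  [[ "$delta" =~ ^[0-9]+$ ]] && (( delta > 0 ))
}
```

To use it, pass the sysfs path from the runbook (`/sys/class/accel/accel0/device/npu_busy_time_us`) as the first argument, followed by whatever embeddings request command the runbook's "Embedding NPU proof" section specifies. A `/healthz` call alone is not expected to exercise the NPU, so it is a poor choice for the wrapped command.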