From 08fb9ca686b33e4ff1504e019ff7f801da59e579 Mon Sep 17 00:00:00 2001 From: William Valentin Date: Fri, 5 Jun 2026 15:52:51 -0700 Subject: [PATCH] docs(npu): update integrated health runbooks --- docs/npu-integrated-health-ops.md | 201 ++++++++++++++ .../Runbooks/OpenVINO NPU Services Runbook.md | 256 +++++++++--------- 2 files changed, 323 insertions(+), 134 deletions(-) create mode 100644 docs/npu-integrated-health-ops.md diff --git a/docs/npu-integrated-health-ops.md b/docs/npu-integrated-health-ops.md new file mode 100644 index 0000000..9593698 --- /dev/null +++ b/docs/npu-integrated-health-ops.md @@ -0,0 +1,201 @@ +# NPU integrated health checks — operator runbook notes + +Compact, read-only operator workflow that combines the existing +`scripts/npu-service-health.sh` listener/systemd/embedding-proof probe with the +reviewer-approved `scripts/npu-utilization-digest.py` per-service utilization +and fallback report. Together they form a single safe daily / on-demand NPU +health pass. + +Scope: + +- Read-only against live services. No restarts, route changes, vector mutation, + advisory POSTs, outbound sends, or memory writes. +- No new persistent services, timers, sockets, compose services, or Dockerfiles + are introduced by this integration. Both scripts are foreground / on-demand. +- Binds verified local-only or on the approved Docker bridge (`172.19.0.1:18830`). + Pre-existing broader binds on the live baseline ports (`18810`, `18814`, + `18816`, `18817`) are noted in the runbook and unchanged here. +- NPU proof requires real inference plus a positive + `/sys/class/accel/accel0/device/npu_busy_time_us` delta. HTTP 200 alone is + not sufficient. + +## When to run + +- Daily / on-demand ops check. +- After upgrades that touch the NPU stack, OpenVINO, or any of the live + specialists. +- Before any approval-gated change that depends on the NPU reflex layer. +- As the read-only verification step of a deploy or recovery runbook. 
+ +## Required artifacts on the branch + +| Path | Role | +| --- | --- | +| `scripts/npu-service-health.sh` | Listener / systemd / Docker / health endpoint / single embedding proof. Existing baseline script. | +| `scripts/npu-utilization-digest.py` | Per-service utilization digest with NPU proof per probe, compact text or JSONL output, optional JSONL artifact. | +| `docs/npu-utilization-digest.md` | Per-service digest reference. | +| `tests/test_npu_utilization_digest.py` | Offline unit tests for the digest (no live services required). | + +## Integrated workflow + +### Step 1 — Listener and service-state snapshot + +```bash +cd ~/lab/swarm +./scripts/npu-service-health.sh +``` + +What it verifies, in order: + +1. `npu_busy_time_us` counter is readable. +2. Required listeners are present on `18810 / 18814 / 18816 / 18817 / 18818 / + 18819 / 18820 / 18829 / 18830`. +3. User systemd services are active/enabled for embeddings, RAG health, + reranker, router/classifier, and the small GenAI worker. +4. Docker Compose `whisper-server-npu` is up. +5. Health endpoints return JSON for the live baseline and local specialists. +6. A single non-private embeddings request to `:18817` produces a positive + sysfs `npu_busy_time_us` delta; the script exits nonzero if there is no + positive delta. + +Read the last block (`== Embeddings NPU busy-time proof ==`) first. If +`result=ok` and `sysfs_delta_us > 0`, the central NPU path is healthy. If not, +do not run the digest; triage the embeddings service first. + +### Step 2 — Per-service utilization digest + +```bash +scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text +``` + +Compact output shape: + +```text +NPU utilization digest +counter=/sys/class/accel/accel0/device/npu_busy_time_us delta_us= +services_ok=/ proof_ok=/ fallbacks= gates_closed= +- embeddings: ok=true calls=1 avg_ms=... npu_delta_us=... proof=true mode=NPU +- rerank: ok=true calls=1 docs=2 avg_ms=... npu_delta_us=... 
proof=true mode=NPU +- whisper: ok=true calls=1 jobs=1 avg_ms=... npu_delta_us=... proof=true mode=NPU +- classifier: ok=true calls=1 events=1 avg_ms=... npu_delta_us=... proof=true dry_run=true ... +- genai: ok=true jobs=0 loaded=false mode=loaded=false reason=skipped_cold_load +- doc_triage: ok=true calls=1 files=1 avg_ms=... npu_delta_us=... proof=true gate=closed:private-root +- rag_endpoint: ok=true mode=health_only gate=closed:vector-mutation +- rag_health: ok=true mode=health_only +- advisory_gateway: ok=true mode=health_only gate=closed:advisory-post +fallbacks: skipped_cold_load=1 +``` + +Read order for ops: + +1. `services_ok` row — anything below `9/9` means a service is down or unhealthy. +2. `proof_ok` row — `proof_ok=5/5` means every probe that ran with a real + inference request produced a positive sysfs NPU delta. +3. `fallbacks:` line — `skipped_cold_load=1` is expected (GenAI worker is + intentionally not cold-loaded). Any other fallback label is a triage signal. +4. `gate=` labels — closed gates that remain closed by design. + +### Step 3 — Optional artifact for trend tracking + +```bash +scripts/npu-utilization-digest.py --format jsonl +``` + +Writes one JSONL file per digest under +`/home/will/.local/state/npu-utilization/digests/.jsonl`. The first +line is the summary; subsequent lines are per-service rows. No JSONL write +happens with `--no-write`. + +### Step 4 — Offline unit tests + +```bash +python -m pytest tests/test_npu_utilization_digest.py -q +``` + +Does not require live services. Use to validate digest logic after edits or +before merging. + +## Compact proof interpretation + +For each proof-capable service, both the response-level `npu_busy_delta_us` +(when the service reports it) and the script's own sysfs before/after delta +must agree and be `> 0`. The proof is only valid when an actual inference +request ran.
If a probe was skipped (`reason=skipped_cold_load` or +`reason=smoke_disabled`), `proof_ok` for that row is `None` and the row +contributes a labeled fallback instead of a proof failure. + +Proof currently runs on: + +- `embeddings` (`:18817`) +- `rerank` (`:18818`) +- `whisper` (`:18816`) when `--include-whisper-smoke=true` (default) +- `classifier` (`:18819`) +- `doc_triage` (`:18829`) when `--include-doc-triage-smoke=true` (default); + proof is via the embeddings service, not directly on the NPU device, so the + row reports `mode=NPU-via-embedding-service`. + +Intentionally health-only (no proof row): + +- `rag_endpoint` (`:18810`) — closed:vector-mutation +- `rag_health` (`:18814`) +- `advisory_gateway` (`172.19.0.1:18830`) — closed:advisory-post + +Intentionally skipped by default: + +- `genai` (`:18820`) — `loaded=false` until first use; cold-loading the model + solely to prove the NPU path is costly, so the skip is treated as a labeled + fallback rather than a proof failure. Opt in with `--include-genai-smoke=true` + only when the task actually needs a generation smoke. + +## Exit codes and triage gates + +`scripts/npu-service-health.sh`: + +| Exit | Meaning | Next | +| ---: | --- | --- | +| 0 | All checks passed including embeddings proof. | Continue to digest. | +| 2 | `npu_busy_time_us` not readable. | Check kernel/driver; do not run digest. | +| 3 | Embedding request failed. | Triage `openvino-embeddings.service` and port `:18817`. | +| 4 | Embedding request succeeded but sysfs delta `<= 0`. | Service reachable but not on the NPU; check service logs and device bind. | + +`scripts/npu-utilization-digest.py`: + +| Exit | Meaning | Next | +| ---: | --- | --- | +| 0 | All reachable services handled; proof/fallback accounting completed. | Inspect `proof_ok` and `fallbacks:` for any unexpected labels. | +| 2 | `--strict-proof` was set and at least one proof-required probe ran without a positive sysfs delta. | Triage the named service's NPU path.
| + +## Approval gates left closed + +The integrated workflow intentionally does not: + +- start, stop, restart, enable, or disable any user systemd unit or Docker + Compose service; +- write to or mutate the Chroma collection `obsidian_bge_npu` or any other + vector store; +- change Atlas/Hermes routing or model defaults; +- post classification/generation/triage events to the advisory gateway; +- broaden private document, image, or audio roots; +- bind any new listener, including on `0.0.0.0`; +- write memory, send messages, execute tools, or mutate Kanban state. + +These remain approval-gated and are tracked on the `npu-maximization` board. + +## Quick reference + +```bash +# Single-pass NPU health check (listener + systemd + embeddings proof). +cd ~/lab/swarm && ./scripts/npu-service-health.sh + +# Compact digest with per-service proof and fallback accounting. +scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text + +# Same, with a JSONL artifact for trend tracking. +scripts/npu-utilization-digest.py --format jsonl + +# Strict mode for CI / pre-merge. +scripts/npu-utilization-digest.py --no-write --strict-proof + +# Offline digest logic tests. 
+python -m pytest tests/test_npu_utilization_digest.py -q +``` diff --git a/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md b/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md index 98a7607..94b0c81 100644 --- a/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md +++ b/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md @@ -3,7 +3,7 @@ type: runbook system: openvino-npu-services status: draft created: 2026-06-04 -updated: 2026-06-04 +updated: 2026-06-05 tags: - runbook - openvino @@ -18,33 +18,92 @@ related: # OpenVINO NPU Services Runbook -This runbook is the integrated operations view for Will's local Intel NPU/OpenVINO services from the `npu-capability-expansion` board. +This runbook is the integrated operations view for Will's local Intel NPU/OpenVINO services after the first approved `npu-maximization` lanes. It treats the NPU as a local reflex layer: classify, embed, rerank, transcribe, triage, and draft compact advisory output while Atlas/Hermes keeps final authority unless a separate approval changes that. Safety posture: - Do not restart the live Atlas/Hermes gateway from this runbook. - Do not change primary Atlas/Hermes routing without explicit Will approval. - Do not delete, overwrite, or in-place reindex existing Chroma/vector collections. -- Treat HTTP 200 as necessary but not sufficient for NPU-backed services; verify `/sys/class/accel/accel0/device/npu_busy_time_us` before/after an inference. -- Keep endpoints local-only unless Will explicitly approves broader exposure. -- Keep raw prompts, private documents, OCR text, and secrets out of logs and durable handoffs. +- Treat HTTP 200 as necessary but not sufficient for NPU-backed services; verify `/sys/class/accel/accel0/device/npu_busy_time_us` before/after a real inference. 
+- Keep endpoints local-only or on the approved Docker bridge only; do not add wildcard binds. +- Keep raw prompts, private documents, OCR text, transcripts, and secrets out of logs and durable handoffs. +- Keep operational outputs compact: booleans, counts, paths, deltas, and gates rather than raw JSON dumps. -## Current service map +## Reflex-layer topology -| Capability | Port | Runtime / service | Path | State | Health endpoint | NPU proof | -| --- | ---: | --- | --- | --- | --- | --- | -| Obsidian/RAG endpoint | 18810 | `obsidian-reindex-endpoint.service` / local Python endpoint | `~/lab/swarm/scripts/` | live baseline; uses collection `obsidian_bge_npu` | `http://127.0.0.1:18810/healthz` | indirect via embeddings `:18817`; do not mutate existing collection | -| RAG/embedding health wrapper | 18814 | `rag-embedding-health.service` | `~/lab/swarm/swarm-common/rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | should exercise embeddings path when configured | -| Whisper transcription, OpenVINO NPU | 18816 | Docker Compose service/container `whisper-server-npu` | `~/lab/swarm/whisper-openvino-npu/` | live baseline | `http://127.0.0.1:18816/health` | transcription response includes `npu_busy_delta_us`; sysfs delta must increase | -| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz` | embedding response and sysfs delta must be positive | -| NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive | -| NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | 
`~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` | -| Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` | -| Document/image triage prototype | optional 18829 for review only; 18828 was an earlier smoke alternate | CLI-first; foreground local-only server if needed; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local | +```text +event / audio / doc / query / task + -> local OpenVINO/NPU specialists + embeddings :18817, rerank :18818, whisper :18816, + classifier :18819, genai worker :18820, doc/image triage :18829, + advisory gateway 172.19.0.1:18830 + -> explicit policy and authority gates + -> Atlas/Hermes or human only when approved/useful +``` + +Authority split: +- NPU services may advise, label, score, transcribe, embed, rerank, triage explicit roots/files, and draft bounded summaries. +- NPU services must not route Atlas/Hermes, write memory, send outbound messages, restart services, execute tools, mutate Kanban, or mutate vector DBs without separate approval. + +## Live baseline services + +These are part of the current live local baseline. Use read-only checks unless Will explicitly asks for remediation. 
+ +| Capability | Port / bind | Runtime / service | State | Health / proof | Notes | +| --- | ---: | --- | --- | --- | --- | +| Obsidian/RAG endpoint | `18810` | `obsidian-reindex-endpoint.service` / local Python endpoint | live baseline | `http://127.0.0.1:18810/healthz`; NPU proof is indirect through embeddings/rerank | Uses collection `obsidian_bge_npu`; do not mutate/reindex in place. Discovery observed `RAG_RERANK_ENABLED=true` and `RAG_RERANK_REQUIRE_NPU_PROOF=true`; do not change from this runbook. | +| RAG/embedding health wrapper | `18814` | `rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | Health wrapper only; use compact summaries. | +| Whisper transcription | `18816` | Docker Compose service/container `whisper-server-npu` | live baseline | `http://127.0.0.1:18816/health`; transcription response plus sysfs busy delta must increase | Use small non-private WAV fixtures for proof. Do not restart from docs. | +| OpenVINO embeddings | `18817` | user systemd `openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz`; embedding response and sysfs delta must be positive | Model `bge-base-en-v1.5-int8-ov`, dim 768. Existing bind is broader than new-service guidance; do not broaden anything else. | + +## Live local-only advisory specialists + +These services are available locally for advisory/reflex work, not for authority. Some were originally prototypes but discovery/review found them active/enabled; do not reinstall or enable again blindly. + +| Capability | Port / bind | Runtime / service | State | Health / proof | Authority boundary | +| --- | ---: | --- | --- | --- | --- | +| NPU reranker | `18818` localhost | `openvino-reranker.service` / `openvino-reranker-npu/` | live local specialist | `/readyz`; `/rerank` response and positive sysfs delta | Rerank only; no vector mutation. 
| +| NPU router/classifier | `18819` localhost | `openvino-router-classifier.service` / `openvino-classifier-npu/` | live local specialist, dry-run/advisory | `/healthz`; `/v1/classify` response and positive sysfs delta | Labels/recommendations only; no routing, sends, memory writes, restarts, or tool execution. | +| Small OpenVINO GenAI worker | `18820` localhost | `openvino-genai-npu-worker.service` / `openvino-genai-npu-worker/` | live local specialist; may report `loaded=false` until used | `/healthz`, `/models`; generation proof requires positive sysfs delta | Bounded draft/title/summary jobs only; not primary Atlas chat. Avoid cold-load generation unless the task requires it. | +| Document/image triage | `18829` localhost | `openvino-doc-image-triage-npu/` | live local specialist with explicit roots | `/healthz`, `/models`; v1 NPU proof is semantic embedding through `:18817` | Request roots may narrow configured roots, never broaden. OCR/image classification are CPU/local fallbacks. | +| Advisory gateway | `172.19.0.1:18830` approved bridge | `openvino-advisory-gateway.service` / `openvino-advisory-gateway/` | live bridge-facing advisory wrapper | `/healthz`; classify/generate/triage responses include NPU proof | For `n8n-agent` and host cron. POSTs can write metadata events, so use health-only unless classification/draft is in scope. No wildcard bind. | Port notes: -- `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding. -- `18820` is reserved for the GenAI worker prototype. Use optional `18829` for document/image triage foreground review until Will approves a final persistent port. `18828` was used in earlier review smoke only and should not be treated as the preferred documented port. -- Existing `:18817` is currently bound on `0.0.0.0` by the user service; prototype services should still default to `127.0.0.1`. +- Prefer localhost for host-only sidecars. 
The advisory gateway bridge bind is intentionally for Docker bridge consumers such as `n8n-agent`. +- `18828` was an earlier review alternate for doc/image triage and should not be treated as the preferred documented port. +- Check listeners before foreground smokes: `ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18829|18830)\b'`. + +## Dry-run examples and approved lane artifacts + +The first-slice lanes below are approved as dry-run/local advisory examples. They may be merged into the repo by the integration lane, but they do not grant authority to mutate live Atlas/Hermes behavior. + +| Lane | Approved branch / commit | Artifact paths | Safe use | +| --- | --- | --- | --- | +| Observability/utilization digest | `feature/npu-max-observability` @ `d661dc299` | `docs/npu-utilization-digest.md`, `scripts/npu-utilization-digest.py` | Read-only compact digest; can write JSONL under `~/.local/state/npu-utilization/digests` unless `--no-write`. Reviewer verified services_ok=9/9, proof_ok=5/5 on live smoke. | +| Context-gate advisory CLI | `feature/npu-max-context-gate` @ `b4ef90aff` | `openvino_context_gate/`, `scripts/context-gate-advisory.py` | Plans typed context bundle sources; no retrieval, routing, memory write, or private content. Classifier URL is loopback-only and redirects fail closed. | +| Cron/n8n advisory classifier | `feature/npu-max-cron-n8n` @ `54d3bcb7` | `openvino-advisory-gateway/docs/cron-n8n-advisory-classifier.md`, `examples/cron-advisory-dry-run.sh`, `examples/n8n-advisory-dry-run-fragment.json` | Dry-run event classification: duplicate/stale/no-op/action-required -> suppress/log/summarize/escalate recommendation, then human/Atlas gate before side effects. 
| +| Explicit-root batch doc/image/audio triage | `feature/npu-max-doc-audio-triage` @ `bfa62cddb` | `docs/npu-batch-triage-dry-run.md`, `scripts/npu-batch-triage-dry-run.py`, `config/triage-roots*.yaml` | Reads only approved/narrow staging roots; reports compact counts/proof; no file moves, Obsidian/RAG writes, sends, or vector mutation. Whisper endpoint override is loopback `:18816` only. | +| Voice/audio local-file pipeline | `feature/npu-max-voice` @ `534816249` | `docs/npu-voice-audio-pipeline.md`, `scripts/npu_voice_audio_pipeline.py` | Local audio file -> Whisper NPU -> classifier NPU -> advisory gate. No platform fetching, sends, writes, memory writes, or routing changes. | +| Kanban/task hygiene advisory | `feature/npu-max-kanban-hygiene` @ `575a3cef6` | `scripts/kanban-hygiene-advisory.py` | Reads compact board summaries and suggests labels/next gates only. Does not call Kanban tools or mutate the board. NPU proof failures dominate generic review-required gates. | + +Dry-run command patterns: + +```bash +# Compact service/proof digest; no artifact write during review. +scripts/npu-utilization-digest.py --no-write --include-genai-smoke false + +# Local-only context-gate planning; does not retrieve private content. +python scripts/context-gate-advisory.py --query "How do I check NPU reranker proof?" --format compact + +# Cron/n8n event advisory wrapper; dry-run only, one compact decision line. +openvino-advisory-gateway/examples/cron-advisory-dry-run.sh npu-service-health warning health_check "openvino-reranker timeout twice" "service:openvino-reranker:timeout" + +# Explicit-root triage; manifest root may be narrowed by --root, never broadened. +python scripts/npu-batch-triage-dry-run.py --manifest config/triage-roots.test.yaml --lane receipts --root openvino-doc-image-triage-npu/samples --limit 5 --dry-run --json + +# Local-file audio advisory; transcript omitted unless explicitly requested. 
+/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py --audio /tmp/npu-voice-smoke.wav --title "synthetic smoke" --source manual_smoke --json +``` ## Read-only unified health check @@ -55,15 +114,15 @@ cd ~/lab/swarm ./scripts/npu-service-health.sh ``` -The script is read-only. It checks listeners for `18810`, `18816`, `18817`, `18818`, `18819`, `18820`, `18829` plus the existing `18814` wrapper and `18828` review alternate, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof. +The script is read-only. It checks listeners for the live baseline and local specialists, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof. Manual minimal checks: ```bash BUSY=/sys/class/accel/accel0/device/npu_busy_time_us cat "$BUSY" -ss -ltnp | grep -E ':(18810|18816|18817|18818|18819|18820|18829)\b' || true -systemctl --user is-active openvino-embeddings.service rag-embedding-health.service +ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18829|18830)\b' || true +systemctl --user is-active openvino-embeddings.service rag-embedding-health.service openvino-reranker.service openvino-router-classifier.service openvino-genai-npu-worker.service openvino-doc-image-triage.service openvino-advisory-gateway.service cd ~/lab/swarm && docker compose ps whisper-server-npu curl -fsS http://127.0.0.1:18817/healthz | jq . 
``` @@ -87,23 +146,7 @@ A healthy NPU path has: ## Service-specific smoke checks -For any foreground prototype server below, run it in a terminal you control or capture its PID and stop it at the end of the smoke. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning for these review smokes unless Will explicitly approved persistent service enablement. - -Safe foreground-server pattern: - -```bash -server_pid="" -cleanup() { - if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then - kill "$server_pid" - wait "$server_pid" 2>/dev/null || true - fi -} -trap cleanup EXIT -# start prototype server with --host 127.0.0.1 --port & -# server_pid=$! -# run curl/smoke commands, then let trap stop it -``` +For any foreground prototype/server smoke, run it in a terminal you control or capture its PID and stop it at the end. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning unless Will explicitly approved persistent service enablement. Several specialists are already live; do not start duplicate listeners. ### Whisper NPU (`:18816`) @@ -115,7 +158,6 @@ curl -fsS http://127.0.0.1:18816/health | jq . Operational notes: - Managed as Docker Compose service/container `whisper-server-npu` in `~/lab/swarm`. -- Consistent with existing swarm service patterns because it is a containerized service with Compose health. - Do not restart it from this runbook unless Will asked for remediation. ### OpenVINO embeddings (`:18817`) @@ -127,26 +169,10 @@ curl -fsS http://127.0.0.1:18817/healthz | jq . Operational notes: - User systemd unit: `openvino-embeddings.service`. -- Model: `bge-base-en-v1.5-int8-ov`. - Model directory: `/home/will/.cache/openvino-models/bge-base-en-v1.5-int8-ov`. - Live RAG `:18810` uses Chroma collection `obsidian_bge_npu` through this service. Do not reindex or replace this collection in place. 
-### Reranker prototype (`:18818`) - -Foreground review start only, after confirming port is free: - -```bash -ss -ltnp | grep ':18818\b' || true -cd ~/lab/swarm/openvino-reranker-npu -source /home/will/.venvs/openvino-reranker/bin/activate -OPENVINO_RERANKER_HOST=127.0.0.1 \ -OPENVINO_RERANKER_PORT=18818 \ -OPENVINO_RERANKER_DEVICE=NPU \ -OPENVINO_RERANKER_MODEL_DIR=/home/will/.cache/openvino-models/rerankers/ms-marco-MiniLM-L6-v2-int8-ov \ -python server.py -``` - -From another shell: +### Reranker (`:18818`) ```bash curl -fsS http://127.0.0.1:18818/readyz | jq . @@ -154,107 +180,78 @@ python ~/lab/swarm/openvino-reranker-npu/smoke.py --url http://127.0.0.1:18818 ``` Approval gate: -- May be installed as `openvino-reranker.service` only after foreground smoke and Will approval. -- May be integrated into RAG only behind disabled-by-default knobs such as `RAG_RERANK_ENABLED=false`; request-time reranking must not mutate Chroma. +- Rerank may score candidate passages only. Any change to RAG answer selection, rerank policy, or vector DB behavior requires separate approval and rollback notes. -### Router/classifier prototype (`:18819`) - -Foreground review start only, after confirming port is free: - -```bash -ss -ltnp | grep ':18819\b' || true -cd ~/lab/swarm/openvino-classifier-npu -/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819 -``` - -Smoke: +### Router/classifier (`:18819`) ```bash curl -fsS http://127.0.0.1:18819/healthz | jq . curl -fsS http://127.0.0.1:18819/v1/classify \ -H 'Content-Type: application/json' \ - -d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true,"dry_run":true}}' | jq . 
+ -d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":false,"dry_run":true}}' | jq '{id, labels, npu_busy_delta_us, sysfs_npu_busy_delta_us}' ``` Approval gate: -- May be installed as `openvino-router-classifier.service` only after Will approves live service enablement. -- Must remain dry-run and must not alter Hermes/Atlas routing, memory writes, safety confirmation flow, or outbound messages without a separate explicit approval. +- Must remain dry-run/advisory and must not alter Hermes/Atlas routing, memory writes, safety confirmation flow, or outbound messages without a separate explicit approval. ### Small GenAI NPU worker (`:18820`) -Foreground review start only, after confirming port is free: - -```bash -ss -ltnp | grep ':18820\b' || true -cd ~/lab/swarm/openvino-genai-npu-worker -/home/will/.venvs/npu/bin/python worker.py --host 127.0.0.1 --port 18820 -``` - -Smoke: - ```bash curl -fsS http://127.0.0.1:18820/healthz | jq . curl -fsS http://127.0.0.1:18820/models | jq . -curl -fsS http://127.0.0.1:18820/v1/worker/condense-notification \ - -H 'Content-Type: application/json' \ - -d '{"input":"Non-private smoke notification for local NPU worker.","max_new_tokens":64}' | jq . ``` Approval gate: -- May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement. -- Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting. +- Must not become primary Atlas/Hermes model routing. Use only for bounded local jobs such as title, summary, notification condensation, and memory-candidate drafting after the relevant job is approved. +- Avoid generation smokes that cold-load the model unless the task explicitly calls for it. 
-### Document/image triage prototype (`:18829` optional review port) - -Foreground review start only, after confirming the port is free: - -```bash -ss -ltnp | grep ':18829\b' || true -cd ~/lab/swarm/openvino-doc-image-triage-npu -/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD" -``` - -Smoke: +### Document/image triage (`:18829`) ```bash curl -fsS http://127.0.0.1:18829/healthz | jq . curl -fsS http://127.0.0.1:18829/models | jq . -/home/will/.venvs/npu/bin/python tests/smoke_test.py ``` Approval gate: - Do not point it at arbitrary directories; allowed roots must be equal to or under configured roots. -- Do not include raw OCR text or full source paths unless Will explicitly asks for a one-off response. +- Do not include raw OCR text or full source paths unless Will explicitly asks for one-off debugging. - v1 only uses the NPU through `:18817` embeddings for needs-attention; image category classification and OCR are CPU/local fallbacks. -## Systemd and Compose recommendations +### Advisory gateway (`172.19.0.1:18830`) -Recommended management split: -- Keep containerized services in Docker Compose when they already have Docker build/runtime shape and Compose health (`whisper-server-npu`). -- Keep host-side OpenVINO Python prototypes as user systemd services when they depend on local venvs, sysfs NPU access, model caches, and localhost-only APIs (`openvino-embeddings`, optional reranker/classifier/GenAI worker). -- Do not add the prototypes to the live gateway or primary routing during installation. Installation and routing are separate approval gates. +```bash +curl -fsS http://172.19.0.1:18830/healthz | jq . +docker exec n8n-agent wget -qO- -T 8 http://172.19.0.1:18830/healthz +``` -User-systemd unit expectations for optional prototypes: -- `WorkingDirectory` points at the service directory under `~/lab/swarm/`. -- `ExecStart` uses the existing venv path documented by the prototype. 
-- `Environment` pins host to `127.0.0.1`, port, model path, device `NPU`, and any upstream endpoint. -- `Restart=on-failure`, not aggressive restart loops. -- Logs go to user journal; do not log raw request bodies. -- Start manually for smoke; enable on boot only after Will approval. +Approval gate: +- Classification/generation/triage POSTs are advisory only and may write metadata counters. Do not wire outputs to sends, restarts, memory writes, tool execution, or Atlas/Hermes routing without a separate reviewed approval. -Compose expectations for existing swarm services: -- Prefer `cd ~/lab/swarm && make ps`, `make status`, and targeted `docker compose ps ` for read-only checks. -- Do not run `docker compose up -d`, restart containers, pull images, or prune volumes from this runbook without approval. +## Approval-gated / not-live integrations + +The following remain closed even though dry-run examples and local specialists exist: + +| Integration | Current gate | +| --- | --- | +| Primary Atlas/Hermes routing changes | closed; no live routing authority changes from this program slice | +| Memory writes from NPU classifier/GenAI/advisory gateway | closed | +| Telegram/Discord/email/outbound sends from cron/n8n/voice/advisory output | closed | +| Service restarts or tool execution triggered by classifier/gateway output | closed | +| Automatic Kanban task mutation, assignment, block/unblock, completion, or task creation | closed | +| Broad private document/image/audio root processing | closed; only explicit approved/narrow roots | +| Vector DB mutation/reindex or Chroma collection replacement | closed | +| Wildcard binds or broader exposure for new services | closed | +| GenAI worker as primary chat model | closed; bounded local drafts only | +| Diffusion/image generation on the NPU | rejected/parked for this program | ## Monitoring and logging notes Minimum recurring monitoring should include: -- Listener presence for `18816`, `18817`, and any approved optional 
prototype ports.
-- User service state for `openvino-embeddings.service` and any approved optional prototype unit.
-- Docker Compose health for `whisper-server-npu`.
+- Listener presence for live baseline and any approved specialist ports.
+- User service state for OpenVINO services and Docker Compose health for `whisper-server-npu`.
 - HTTP health endpoint success.
 - Positive sysfs NPU busy-time delta on at least one non-private inference probe, preferably embeddings `:18817` because it is already live and central.
-- Journal/container logs only at summary level. Avoid raw prompts, raw OCR text, private document names, credentials, and API keys.
+- Compact counts/deltas/gates only. Avoid raw prompts, transcripts, OCR text, private document names, credentials, and API keys.
 Useful log commands:
@@ -264,23 +261,14 @@ journalctl --user -u rag-embedding-health.service -n 100 --no-pager
 journalctl --user -u openvino-reranker.service -n 100 --no-pager
 journalctl --user -u openvino-router-classifier.service -n 100 --no-pager
 journalctl --user -u openvino-genai-npu-worker.service -n 100 --no-pager
+journalctl --user -u openvino-advisory-gateway.service -n 100 --no-pager
 cd ~/lab/swarm && docker compose logs --tail 100 whisper-server-npu
 ```
-## Approval gates
+## Approved/parked outcomes
-Requires explicit Will approval before proceeding:
-- Installing, enabling, or autostarting `openvino-reranker.service`, `openvino-router-classifier.service`, or `openvino-genai-npu-worker.service`.
-- Assigning a final persistent port to document/image triage or enabling it as a persistent service.
-- Enabling live RAG reranking or any request path that changes Atlas/RAG answers.
-- Changing primary Atlas/Hermes routing or connecting router/classifier outputs to live decisions.
-- Connecting the GenAI worker to primary Atlas chat, gateway routing, memory writes, or outbound notifications.
-- Restarting the live Atlas/Hermes gateway.
-- Deleting, overwriting, or in-place reindexing existing vector collections.
-- Broadening bind addresses or exposure beyond local-only defaults.
-
-Approved/parked outcomes:
-- Built/approved prototypes: reranker (`:18818`), router/classifier (`:18819`), small GenAI worker (`:18820`), document/image triage (review ports `:18828`/`:18829`).
-- Live baseline retained: Whisper NPU (`:18816`), OpenVINO embeddings (`:18817`), RAG endpoint (`:18810`) using `obsidian_bge_npu`.
+- Live baseline retained: RAG endpoint (`:18810`), RAG health wrapper (`:18814`), Whisper NPU (`:18816`), OpenVINO embeddings (`:18817`).
+- Live local-only advisory/reflex specialists: reranker (`:18818`), router/classifier (`:18819`), GenAI worker (`:18820`), doc/image triage (`:18829`), advisory gateway bridge (`172.19.0.1:18830`).
+- Approved dry-run examples: utilization digest, context gate plan, cron/n8n advisory classifier, explicit-root batch triage, local-file voice/audio pipeline, Kanban hygiene advisory.
 - Parked: always-on wake-word/audio and conventional vision detection until Will wants a concrete use case.
 - Rejected for this NPU program: diffusion/image generation.
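
The monitoring notes in this patch ask for a positive sysfs NPU busy-time delta around a non-private inference probe. A minimal read-only sketch of that check follows. The counter path comes from the runbook; the helper name `npu_busy_delta`, the `NPU_COUNTER` override, and the assumption that the counter file holds a single integer in microseconds are illustrative and are not taken from `scripts/npu-service-health.sh`.

```shell
#!/usr/bin/env bash
# Sketch only: measure the NPU busy-time counter around a caller-supplied probe.
# Assumption: the counter file contains one non-negative integer (microseconds).
NPU_COUNTER="${NPU_COUNTER:-/sys/class/accel/accel0/device/npu_busy_time_us}"

# npu_busy_delta CMD [ARGS...]
# Runs the probe command, prints the counter delta in microseconds, and
# returns nonzero if the counter is unreadable, the probe fails, or the
# delta is not positive.
npu_busy_delta() {
  local before after delta
  before="$(cat "$NPU_COUNTER")" || return 2   # counter unreadable
  "$@" >/dev/null || return 3                  # probe itself failed
  after="$(cat "$NPU_COUNTER")" || return 2
  delta=$(( after - before ))
  echo "$delta"
  [ "$delta" -gt 0 ]                           # HTTP success alone is not proof
}
```

Usage would look like `npu_busy_delta ./embedding-probe.sh`, where `embedding-probe.sh` is a hypothetical wrapper that sends one non-private request to the embeddings service. The helper only reads the counter and runs the probe it is given, so it stays within the read-only scope of the runbook.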