docs(npu): update integrated health runbooks

2026-06-05 15:52:51 -07:00
parent 9e5ffa0fd0
commit 08fb9ca686
2 changed files with 323 additions and 134 deletions
@@ -0,0 +1,201 @@
+# NPU integrated health checks — operator runbook notes
+
+Compact, read-only operator workflow that combines the existing
+`scripts/npu-service-health.sh` listener/systemd/embedding-proof probe with the
+reviewer-approved `scripts/npu-utilization-digest.py` per-service utilization
+and fallback report. Together they form a single safe daily / on-demand NPU
+health pass.
+
+Scope:
+
+- Read-only against live services. No restarts, route changes, vector mutation,
+  advisory POSTs, outbound sends, or memory writes.
+- No new persistent services, timers, sockets, compose services, or Dockerfiles
+  are introduced by this integration. Both scripts are foreground / on-demand.
+- Binds are verified to be local-only or on the approved Docker bridge (`172.19.0.1:18830`).
+  Pre-existing broader binds on the live baseline ports (`18810`, `18814`,
+  `18816`, `18817`) are noted in the runbook and unchanged here.
+- NPU proof requires real inference plus a positive
+  `/sys/class/accel/accel0/device/npu_busy_time_us` delta. HTTP 200 alone is
+  not sufficient.
+
+## When to run
+
+- Daily / on-demand ops check.
+- After upgrades that touch the NPU stack, OpenVINO, or any of the live
+  specialists.
+- Before any approval-gated change that depends on the NPU reflex layer.
+- As the read-only verification step of a deploy or recovery runbook.
+
+## Required artifacts on the branch
+
+| Path | Role |
+| --- | --- |
+| `scripts/npu-service-health.sh` | Listener / systemd / Docker / health endpoint / single embedding proof. Existing baseline script. |
+| `scripts/npu-utilization-digest.py` | Per-service utilization digest with NPU proof per probe, compact text or JSONL output, optional JSONL artifact. |
+| `docs/npu-utilization-digest.md` | Per-service digest reference. |
+| `tests/test_npu_utilization_digest.py` | Offline unit tests for the digest (no live services required). |
+
+## Integrated workflow
+
+### Step 1 — Listener and service-state snapshot
+
+```bash
+cd ~/lab/swarm
+./scripts/npu-service-health.sh
+```
+
+What it verifies, in order:
+
+1. `npu_busy_time_us` counter is readable.
+2. Required listeners are present on `18810 / 18814 / 18816 / 18817 / 18818 /
+   18819 / 18820 / 18829 / 18830`.
+3. User systemd services are active/enabled for embeddings, RAG health,
+   reranker, router/classifier, and the small GenAI worker.
+4. Docker Compose `whisper-server-npu` is up.
+5. Health endpoints return JSON for the live baseline and local specialists.
+6. A single non-private embeddings request to `:18817` produces a positive
+   sysfs `npu_busy_time_us` delta; the script exits nonzero if there is no
+   positive delta.
+
+Read the last block (`== Embeddings NPU busy-time proof ==`) first. If
+`result=ok` and `sysfs_delta_us > 0`, the central NPU path is healthy. If not,
+do not run the digest; triage the embeddings service first.
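+
+The proof in item 6 amounts to reading the counter, running one inference, and
+reading the counter again. A minimal Python sketch of that pattern, assuming the
+counter file holds a single integer; `read_busy_us`, `measure_busy_delta`, and
+the `infer` callable are hypothetical names for illustration, not the script's
+actual implementation:
+
+```python
+from pathlib import Path
+from typing import Callable
+
+BUSY = Path("/sys/class/accel/accel0/device/npu_busy_time_us")
+
+
+def read_busy_us(path: Path = BUSY) -> int:
+    """Read the cumulative NPU busy time in microseconds."""
+    return int(path.read_text().strip())
+
+
+def measure_busy_delta(infer: Callable[[], object], path: Path = BUSY) -> int:
+    """Run one inference and return the busy-time delta around it."""
+    before = read_busy_us(path)
+    infer()  # hypothetical: e.g. one non-private embeddings request
+    after = read_busy_us(path)
+    return after - before
+
+
+def proof_ok(delta_us: int) -> bool:
+    """HTTP 200 alone is not proof; the counter must have advanced."""
+    return delta_us > 0
+```
+
+Passing the counter path as a parameter keeps the logic testable offline
+against a temporary file.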
+
+### Step 2 — Per-service utilization digest
+
+```bash
+scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text
+```
+
+Compact output shape:
+
+```text
+NPU utilization digest <timestamp>
+counter=/sys/class/accel/accel0/device/npu_busy_time_us delta_us=<total>
+services_ok=<ok>/<total> proof_ok=<ok>/<proof-capable> fallbacks=<n> gates_closed=<n>
+- embeddings: ok=true calls=1 avg_ms=... npu_delta_us=... proof=true mode=NPU
+- rerank:     ok=true calls=1 docs=2   avg_ms=... npu_delta_us=... proof=true mode=NPU
+- whisper:    ok=true calls=1 jobs=1   avg_ms=... npu_delta_us=... proof=true mode=NPU
+- classifier: ok=true calls=1 events=1 avg_ms=... npu_delta_us=... proof=true dry_run=true ...
+- genai:      ok=true jobs=0 loaded=false mode=loaded=false reason=skipped_cold_load
+- doc_triage: ok=true calls=1 files=1  avg_ms=... npu_delta_us=... proof=true gate=closed:private-root
+- rag_endpoint:   ok=true mode=health_only gate=closed:vector-mutation
+- rag_health:     ok=true mode=health_only
+- advisory_gateway: ok=true mode=health_only gate=closed:advisory-post
+fallbacks: skipped_cold_load=1
+```
+
+Read order for ops:
+
+1. `services_ok` field: anything below `9/9` means a service is down or unhealthy.
+2. `proof_ok` field: `proof_ok=5/5` means every probe that ran with a real
+   inference request produced a positive sysfs NPU delta.
+3. `fallbacks:` line: `skipped_cold_load=1` is expected (the GenAI worker is
+   intentionally not cold-loaded). Any other fallback label is a triage signal.
+4. `gate=` labels: gates that remain closed by design.
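+
+The read order above can be automated against the compact text output. A
+minimal sketch, assuming the summary and `fallbacks:` lines keep the
+`key=<a>/<b>` and `key=<n>` shapes shown in the compact output;
+`parse_summary`, `needs_triage`, and `unexpected_fallbacks` are hypothetical
+helpers, not part of the digest script:
+
+```python
+import re
+
+# Only the cold-load skip is expected by default, per the read order above.
+EXPECTED_FALLBACKS = {"skipped_cold_load"}
+
+
+def parse_summary(line: str) -> dict:
+    """Parse a line such as 'services_ok=9/9 proof_ok=5/5 fallbacks=1'."""
+    out = {}
+    for key, value in re.findall(r"(\w+)=(\S+)", line):
+        if "/" in value:
+            ok, total = value.split("/", 1)
+            out[key] = (int(ok), int(total))
+        else:
+            out[key] = int(value)
+    return out
+
+
+def needs_triage(summary: dict) -> list:
+    """Return the summary fields where fewer items passed than ran."""
+    return [
+        key
+        for key in ("services_ok", "proof_ok")
+        if key in summary and summary[key][0] < summary[key][1]
+    ]
+
+
+def unexpected_fallbacks(line: str) -> list:
+    """Return labels on the 'fallbacks:' line other than the expected ones."""
+    labels = re.findall(r"(\w+)=\d+", line.partition(":")[2])
+    return [label for label in labels if label not in EXPECTED_FALLBACKS]
+```
+
+An empty result from both `needs_triage` and `unexpected_fallbacks` corresponds
+to the healthy case described in items 1 to 3.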
+
+### Step 3 — Optional artifact for trend tracking
+
+```bash
+scripts/npu-utilization-digest.py --format jsonl
+```
+
+Writes one JSONL file per digest to
+`/home/will/.local/state/npu-utilization/digests/<timestamp>.jsonl`. The first
+line is the summary; subsequent lines are per-service rows. Nothing is written
+when `--no-write` is set.
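+
+For trend tracking, an artifact can be read back by splitting the summary line
+from the per-service rows. A minimal sketch, assuming only the layout described
+above (first line summary, remaining lines per-service rows); no field names
+are assumed, and `load_digest` is a hypothetical helper:
+
+```python
+import json
+from pathlib import Path
+
+
+def load_digest(path: Path) -> tuple:
+    """Return (summary, rows) from one digest JSONL artifact."""
+    lines = [ln for ln in path.read_text().splitlines() if ln.strip()]
+    if not lines:
+        raise ValueError(f"empty digest artifact: {path}")
+    records = [json.loads(ln) for ln in lines]
+    # First record is the summary; the rest are per-service rows.
+    return records[0], records[1:]
+```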
+
+### Step 4 — Offline unit tests
+
+```bash
+python -m pytest tests/test_npu_utilization_digest.py -q
+```
+
+Does not require live services. Use to validate digest logic after edits or
+before merging.
+
+## Compact proof interpretation
+
+For each proof-capable service, both the response-level `npu_busy_delta_us`
+(when the service reports it) and the script's own sysfs before/after delta
+must agree and be `> 0`. The proof is only valid when an actual inference
+request ran. If a probe was skipped (`reason=skipped_cold_load` or
+`reason=smoke_disabled`), `proof_ok` for that row is `None` and the row
+contributes a labeled fallback instead of a proof failure.
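+
+The rule above is tri-state: true, false, or not applicable. A minimal sketch
+of that interpretation; `row_proof` and its parameters are hypothetical and
+mirror the prose only, not the digest's internals. It reads "agree" as "both
+positive", which is an assumption:
+
+```python
+from typing import Optional
+
+# Skip reasons named in this section; a skipped probe is not a failure.
+SKIP_REASONS = {"skipped_cold_load", "smoke_disabled"}
+
+
+def row_proof(
+    sysfs_delta_us: int,
+    response_delta_us: Optional[int] = None,
+    reason: Optional[str] = None,
+) -> Optional[bool]:
+    """Return True/False for a probe that ran, None for a skipped probe."""
+    if reason in SKIP_REASONS:
+        return None  # labeled fallback, not a proof failure
+    if sysfs_delta_us <= 0:
+        return False
+    # The response-level delta is only checked when the service reports it.
+    # Assumption: "agree" means both deltas are positive, not equal.
+    if response_delta_us is not None and response_delta_us <= 0:
+        return False
+    return True
+```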
+
+Proof currently runs on:
+
+- `embeddings` (`:18817`)
+- `rerank` (`:18818`)
+- `whisper` (`:18816`) when `--include-whisper-smoke=true` (default)
+- `classifier` (`:18819`)
+- `doc_triage` (`:18829`) when `--include-doc-triage-smoke=true` (default);
+  proof is via the embeddings service, not directly on the NPU device, so the
+  row reports `mode=NPU-via-embedding-service`.
+
+Intentionally health-only (no proof row):
+
+- `rag_endpoint` (`:18810`) — closed:vector-mutation
+- `rag_health` (`:18814`)
+- `advisory_gateway` (`172.19.0.1:18830`) — closed:advisory-post
+
+Intentionally skipped by default:
+
+- `genai` (`:18820`): `loaded=false` until first use. Cold-loading the model
+  only to obtain an NPU proof is costly, so the skip is treated as a labeled
+  fallback rather than a proof failure. Opt in with `--include-genai-smoke=true`
+  only when the task actually needs a generation smoke.
+
+## Exit codes and triage gates
+
+`scripts/npu-service-health.sh`:
+
+| Exit | Meaning | Next |
+| ---: | --- | --- |
+| 0 | All checks passed including embeddings proof. | Continue to digest. |
+| 2 | `npu_busy_time_us` not readable. | Check kernel/driver; do not run digest. |
+| 3 | Embedding request failed. | Triage `openvino-embeddings.service` and port `:18817`. |
+| 4 | Embedding request succeeded but sysfs delta `<= 0`. | Service reachable but not on the NPU; check service logs and device bind. |
+
+`scripts/npu-utilization-digest.py`:
+
+| Exit | Meaning | Next |
+| ---: | --- | --- |
+| 0 | All reachable services handled; proof/fallback accounting completed. | Inspect `proof_ok` and `fallbacks:` for any unexpected labels. |
+| 2 | `--strict-proof` was set and at least one proof-required probe ran without a positive sysfs delta. | Triage the named service's NPU path. |
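+
+The two tables combine into a simple gate: run the digest only when the health
+script exits 0. A minimal sketch using the exit codes listed above; `next_step`
+and `run_gated` are hypothetical helpers, and any exit code not in the table is
+reported as unknown rather than guessed:
+
+```python
+import subprocess
+
+# Documented exit codes of scripts/npu-service-health.sh and their next steps.
+HEALTH_NEXT = {
+    0: "continue to digest",
+    2: "check kernel/driver; do not run digest",
+    3: "triage openvino-embeddings.service and port :18817",
+    4: "check service logs and device bind",
+}
+
+
+def next_step(exit_code: int) -> str:
+    """Map a health-script exit code to the documented next action."""
+    default = f"unknown exit code {exit_code}; stop and inspect output"
+    return HEALTH_NEXT.get(exit_code, default)
+
+
+def run_gated(health_cmd: list, digest_cmd: list) -> int:
+    """Run the health script, then the digest only if health exited 0."""
+    health = subprocess.run(health_cmd)
+    if health.returncode != 0:
+        print(next_step(health.returncode))
+        return health.returncode
+    return subprocess.run(digest_cmd).returncode
+```
+
+For example, `run_gated(["./scripts/npu-service-health.sh"],
+["scripts/npu-utilization-digest.py", "--no-write", "--format", "text"])`
+from `~/lab/swarm` would follow Steps 1 and 2 in order.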
+
+## Approval gates left closed
+
+The integrated workflow intentionally does not:
+
+- start, stop, restart, enable, or disable any user systemd unit or Docker
+  Compose service;
+- write to or mutate the Chroma collection `obsidian_bge_npu` or any other
+  vector store;
+- change Atlas/Hermes routing or model defaults;
+- post classification/generation/triage events to the advisory gateway;
+- broaden private document, image, or audio roots;
+- bind any new listener, including on `0.0.0.0`;
+- write memory, send messages, execute tools, or mutate Kanban state.
+
+These remain approval-gated and are tracked on the `npu-maximization` board.
+
+## Quick reference
+
+```bash
+# Single-pass NPU health check (listener + systemd + embeddings proof).
+cd ~/lab/swarm && ./scripts/npu-service-health.sh
+
+# Compact digest with per-service proof and fallback accounting.
+scripts/npu-utilization-digest.py --no-write --include-genai-smoke false --format text
+
+# Same, with a JSONL artifact for trend tracking.
+scripts/npu-utilization-digest.py --format jsonl
+
+# Strict mode for CI / pre-merge.
+scripts/npu-utilization-digest.py --no-write --strict-proof
+
+# Offline digest logic tests.
+python -m pytest tests/test_npu_utilization_digest.py -q
+```
@@ -3,7 +3,7 @@ type: runbook
 system: openvino-npu-services
 status: draft
 created: 2026-06-04
-updated: 2026-06-04
+updated: 2026-06-05
 tags:
  - runbook
  - openvino
@@ -18,33 +18,92 @@ related:

 # OpenVINO NPU Services Runbook

-This runbook is the integrated operations view for Will's local Intel NPU/OpenVINO services from the `npu-capability-expansion` board.
+This runbook is the integrated operations view for Will's local Intel NPU/OpenVINO services after the first approved `npu-maximization` lanes. It treats the NPU as a local reflex layer: classify, embed, rerank, transcribe, triage, and draft compact advisory output while Atlas/Hermes keeps final authority unless a separate approval changes that.

 Safety posture:
 - Do not restart the live Atlas/Hermes gateway from this runbook.
 - Do not change primary Atlas/Hermes routing without explicit Will approval.
 - Do not delete, overwrite, or in-place reindex existing Chroma/vector collections.
- Treat HTTP 200 as necessary but not sufficient for NPU-backed services; verify `/sys/class/accel/accel0/device/npu_busy_time_us` before/after an inference.
- Keep endpoints local-only unless Will explicitly approves broader exposure.
- Keep raw prompts, private documents, OCR text, and secrets out of logs and durable handoffs.
+- Treat HTTP 200 as necessary but not sufficient for NPU-backed services; verify `/sys/class/accel/accel0/device/npu_busy_time_us` before/after a real inference.
+- Keep endpoints local-only or on the approved Docker bridge only; do not add wildcard binds.
+- Keep raw prompts, private documents, OCR text, transcripts, and secrets out of logs and durable handoffs.
+- Keep operational outputs compact: booleans, counts, paths, deltas, and gates rather than raw JSON dumps.

-## Current service map
+## Reflex-layer topology

-| Capability | Port | Runtime / service | Path | State | Health endpoint | NPU proof |
-| --- | ---: | --- | --- | --- | --- | --- |
-| Obsidian/RAG endpoint | 18810 | `obsidian-reindex-endpoint.service` / local Python endpoint | `~/lab/swarm/scripts/` | live baseline; uses collection `obsidian_bge_npu` | `http://127.0.0.1:18810/healthz` | indirect via embeddings `:18817`; do not mutate existing collection |
-| RAG/embedding health wrapper | 18814 | `rag-embedding-health.service` | `~/lab/swarm/swarm-common/rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | should exercise embeddings path when configured |
-| Whisper transcription, OpenVINO NPU | 18816 | Docker Compose service/container `whisper-server-npu` | `~/lab/swarm/whisper-openvino-npu/` | live baseline | `http://127.0.0.1:18816/health` | transcription response includes `npu_busy_delta_us`; sysfs delta must increase |
-| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz` | embedding response and sysfs delta must be positive |
-| NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
-| NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
-| Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
-| Document/image triage prototype | optional 18829 for review only; 18828 was an earlier smoke alternate | CLI-first; foreground local-only server if needed; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
+```text
+event / audio / doc / query / task
+  -> local OpenVINO/NPU specialists
+       embeddings :18817, rerank :18818, whisper :18816,
+       classifier :18819, genai worker :18820, doc/image triage :18829,
+       advisory gateway 172.19.0.1:18830
+  -> explicit policy and authority gates
+  -> Atlas/Hermes or human only when approved/useful
+```
+
+Authority split:
+- NPU services may advise, label, score, transcribe, embed, rerank, triage explicit roots/files, and draft bounded summaries.
+- NPU services must not route Atlas/Hermes, write memory, send outbound messages, restart services, execute tools, mutate Kanban, or mutate vector DBs without separate approval.
+
+## Live baseline services
+
+These are part of the current live local baseline. Use read-only checks unless Will explicitly asks for remediation.
+
+| Capability | Port / bind | Runtime / service | State | Health / proof | Notes |
+| --- | ---: | --- | --- | --- | --- |
+| Obsidian/RAG endpoint | `18810` | `obsidian-reindex-endpoint.service` / local Python endpoint | live baseline | `http://127.0.0.1:18810/healthz`; NPU proof is indirect through embeddings/rerank | Uses collection `obsidian_bge_npu`; do not mutate/reindex in place. Discovery observed `RAG_RERANK_ENABLED=true` and `RAG_RERANK_REQUIRE_NPU_PROOF=true`; do not change from this runbook. |
+| RAG/embedding health wrapper | `18814` | `rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | Health wrapper only; use compact summaries. |
+| Whisper transcription | `18816` | Docker Compose service/container `whisper-server-npu` | live baseline | `http://127.0.0.1:18816/health`; proof needs a transcription response and a positive sysfs busy-time delta | Use small non-private WAV fixtures for proof. Do not restart from docs. |
+| OpenVINO embeddings | `18817` | user systemd `openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz`; embedding response and sysfs delta must be positive | Model `bge-base-en-v1.5-int8-ov`, dim 768. Existing bind is broader than new-service guidance; do not broaden anything else. |
+
+## Live local-only advisory specialists
+
+These services are available locally for advisory/reflex work, not for authority. Some were originally prototypes but discovery/review found them active/enabled; do not reinstall or enable again blindly.
+
+| Capability | Port / bind | Runtime / service | State | Health / proof | Authority boundary |
+| --- | ---: | --- | --- | --- | --- |
+| NPU reranker | `18818` localhost | `openvino-reranker.service` / `openvino-reranker-npu/` | live local specialist | `/readyz`; `/v1/rerank` response and positive sysfs delta | Rerank only; no vector mutation. |
+| NPU router/classifier | `18819` localhost | `openvino-router-classifier.service` / `openvino-classifier-npu/` | live local specialist, dry-run/advisory | `/healthz`; `/v1/classify` response and positive sysfs delta | Labels/recommendations only; no routing, sends, memory writes, restarts, or tool execution. |
+| Small OpenVINO GenAI worker | `18820` localhost | `openvino-genai-npu-worker.service` / `openvino-genai-npu-worker/` | live local specialist; may report `loaded=false` until used | `/healthz`, `/models`; generation proof requires positive sysfs delta | Bounded draft/title/summary jobs only; not primary Atlas chat. Avoid cold-load generation unless the task requires it. |
+| Document/image triage | `18829` localhost | `openvino-doc-image-triage-npu/` | live local specialist with explicit roots | `/healthz`, `/models`; v1 NPU proof is semantic embedding through `:18817` | Request roots may narrow configured roots, never broaden. OCR/image classification are CPU/local fallbacks. |
+| Advisory gateway | `172.19.0.1:18830` approved bridge | `openvino-advisory-gateway.service` / `openvino-advisory-gateway/` | live bridge-facing advisory wrapper | `/healthz`; classify/generate/triage responses include NPU proof | For `n8n-agent` and host cron. POSTs can write metadata events, so use health-only unless classification/draft is in scope. No wildcard bind. |

 Port notes:
- `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
- `18820` is reserved for the GenAI worker prototype. Use optional `18829` for document/image triage foreground review until Will approves a final persistent port. `18828` was used in earlier review smoke only and should not be treated as the preferred documented port.
- Existing `:18817` is currently bound on `0.0.0.0` by the user service; prototype services should still default to `127.0.0.1`.
+- Prefer localhost for host-only sidecars. The advisory gateway bridge bind is intentionally for Docker bridge consumers such as `n8n-agent`.
+- `18828` was an earlier review alternate for doc/image triage and should not be treated as the preferred documented port.
+- Check listeners before foreground smokes: `ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18829|18830)\b'`.
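+
+The bind policy in these notes can be checked mechanically against listener
+addresses. A minimal sketch, assuming the only approved non-loopback bind is
+the advisory gateway bridge address from the table above; `bind_allowed` is a
+hypothetical helper and does not parse `ss` output itself:
+
+```python
+import ipaddress
+
+# Approved Docker bridge bind for the advisory gateway only.
+APPROVED_BRIDGE = ipaddress.ip_address("172.19.0.1")
+APPROVED_BRIDGE_PORT = 18830
+
+
+def bind_allowed(address: str, port: int) -> bool:
+    """True for loopback binds, or the approved bridge bind on :18830."""
+    ip = ipaddress.ip_address(address)
+    if ip.is_loopback:
+        return True
+    # Wildcard binds such as 0.0.0.0 fall through and are rejected.
+    return ip == APPROVED_BRIDGE and port == APPROVED_BRIDGE_PORT
+```
+
+Pre-existing broader binds on the live baseline (for example `:18817`) would
+be reported as not allowed by this check; they are documented exceptions, not
+something this runbook changes.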
+
+## Dry-run examples and approved lane artifacts
+
+The first-slice lanes below are approved as dry-run/local advisory examples. They may be merged into the repo by the integration lane, but they do not grant authority to mutate live Atlas/Hermes behavior.
+
+| Lane | Approved branch / commit | Artifact paths | Safe use |
+| --- | --- | --- | --- |
+| Observability/utilization digest | `feature/npu-max-observability` @ `d661dc299` | `docs/npu-utilization-digest.md`, `scripts/npu-utilization-digest.py` | Read-only compact digest; can write JSONL under `~/.local/state/npu-utilization/digests` unless `--no-write`. Reviewer verified services_ok=9/9, proof_ok=5/5 on live smoke. |
+| Context-gate advisory CLI | `feature/npu-max-context-gate` @ `b4ef90aff` | `openvino_context_gate/`, `scripts/context-gate-advisory.py` | Plans typed context bundle sources; no retrieval, routing, memory write, or private content. Classifier URL is loopback-only and redirects fail closed. |
+| Cron/n8n advisory classifier | `feature/npu-max-cron-n8n` @ `54d3bcb7` | `openvino-advisory-gateway/docs/cron-n8n-advisory-classifier.md`, `examples/cron-advisory-dry-run.sh`, `examples/n8n-advisory-dry-run-fragment.json` | Dry-run event classification: duplicate/stale/no-op/action-required -> suppress/log/summarize/escalate recommendation, then human/Atlas gate before side effects. |
+| Explicit-root batch doc/image/audio triage | `feature/npu-max-doc-audio-triage` @ `bfa62cddb` | `docs/npu-batch-triage-dry-run.md`, `scripts/npu-batch-triage-dry-run.py`, `config/triage-roots*.yaml` | Reads only approved/narrow staging roots; reports compact counts/proof; no file moves, Obsidian/RAG writes, sends, or vector mutation. Whisper endpoint override is loopback `:18816` only. |
+| Voice/audio local-file pipeline | `feature/npu-max-voice` @ `534816249` | `docs/npu-voice-audio-pipeline.md`, `scripts/npu_voice_audio_pipeline.py` | Local audio file -> Whisper NPU -> classifier NPU -> advisory gate. No platform fetching, sends, writes, memory writes, or routing changes. |
+| Kanban/task hygiene advisory | `feature/npu-max-kanban-hygiene` @ `575a3cef6` | `scripts/kanban-hygiene-advisory.py` | Reads compact board summaries and suggests labels/next gates only. Does not call Kanban tools or mutate the board. NPU proof failures take precedence over generic review-required gates. |
+
+Dry-run command patterns:
+
+```bash
+# Compact service/proof digest; no artifact write during review.
+scripts/npu-utilization-digest.py --no-write --include-genai-smoke false
+
+# Local-only context-gate planning; does not retrieve private content.
+python scripts/context-gate-advisory.py --query "How do I check NPU reranker proof?" --format compact
+
+# Cron/n8n event advisory wrapper; dry-run only, one compact decision line.
+openvino-advisory-gateway/examples/cron-advisory-dry-run.sh npu-service-health warning health_check "openvino-reranker timeout twice" "service:openvino-reranker:timeout"
+
+# Explicit-root triage; manifest root may be narrowed by --root, never broadened.
+python scripts/npu-batch-triage-dry-run.py --manifest config/triage-roots.test.yaml --lane receipts --root openvino-doc-image-triage-npu/samples --limit 5 --dry-run --json
+
+# Local-file audio advisory; transcript omitted unless explicitly requested.
+/home/will/.venvs/npu/bin/python scripts/npu_voice_audio_pipeline.py --audio /tmp/npu-voice-smoke.wav --title "synthetic smoke" --source manual_smoke --json
+```

 ## Read-only unified health check

@@ -55,15 +114,15 @@ cd ~/lab/swarm
 ./scripts/npu-service-health.sh
 ```

-The script is read-only. It checks listeners for `18810`, `18816`, `18817`, `18818`, `18819`, `18820`, `18829` plus the existing `18814` wrapper and `18828` review alternate, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
+The script is read-only. It checks listeners for the live baseline and local specialists, user service state, Docker Compose state for `whisper-server-npu`, and JSON health endpoints, then sends a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.

 Manual minimal checks:

 ```bash
 BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
 cat "$BUSY"
-ss -ltnp | grep -E ':(18810|18816|18817|18818|18819|18820|18829)\b' || true
-systemctl --user is-active openvino-embeddings.service rag-embedding-health.service
+ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18829|18830)\b' || true
+systemctl --user is-active openvino-embeddings.service rag-embedding-health.service openvino-reranker.service openvino-router-classifier.service openvino-genai-npu-worker.service openvino-doc-image-triage.service openvino-advisory-gateway.service
 cd ~/lab/swarm && docker compose ps whisper-server-npu
 curl -fsS http://127.0.0.1:18817/healthz | jq .
 ```
@@ -87,23 +146,7 @@ A healthy NPU path has:

 ## Service-specific smoke checks

-For any foreground prototype server below, run it in a terminal you control or capture its PID and stop it at the end of the smoke. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning for these review smokes unless Will explicitly approved persistent service enablement.
-
-Safe foreground-server pattern:
-
-```bash
-server_pid=""
-cleanup() {
-  if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then
-    kill "$server_pid"
-    wait "$server_pid" 2>/dev/null || true
-  fi
-}
-trap cleanup EXIT
-# start prototype server with --host 127.0.0.1 --port <port> &
-# server_pid=$!
-# run curl/smoke commands, then let trap stop it
-```
+For any foreground prototype/server smoke, run it in a terminal you control or capture its PID and stop it at the end. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning unless Will has explicitly approved persistent service enablement. Several specialists are already live; do not start duplicate listeners.

 ### Whisper NPU (`:18816`)

@@ -115,7 +158,6 @@ curl -fsS http://127.0.0.1:18816/health | jq .

 Operational notes:
 - Managed as Docker Compose service/container `whisper-server-npu` in `~/lab/swarm`.
- Consistent with existing swarm service patterns because it is a containerized service with Compose health.
 - Do not restart it from this runbook unless Will asked for remediation.

 ### OpenVINO embeddings (`:18817`)
@@ -127,26 +169,10 @@ curl -fsS http://127.0.0.1:18817/healthz | jq .

 Operational notes:
 - User systemd unit: `openvino-embeddings.service`.
- Model: `bge-base-en-v1.5-int8-ov`.
 - Model directory: `/home/will/.cache/openvino-models/bge-base-en-v1.5-int8-ov`.
 - Live RAG `:18810` uses Chroma collection `obsidian_bge_npu` through this service. Do not reindex or replace this collection in place.

-### Reranker prototype (`:18818`)
-
-Foreground review start only, after confirming port is free:
-
-```bash
-ss -ltnp | grep ':18818\b' || true
-cd ~/lab/swarm/openvino-reranker-npu
-source /home/will/.venvs/openvino-reranker/bin/activate
-OPENVINO_RERANKER_HOST=127.0.0.1 \
-OPENVINO_RERANKER_PORT=18818 \
-OPENVINO_RERANKER_DEVICE=NPU \
-OPENVINO_RERANKER_MODEL_DIR=/home/will/.cache/openvino-models/rerankers/ms-marco-MiniLM-L6-v2-int8-ov \
-python server.py
-```
-
-From another shell:
+### Reranker (`:18818`)

 ```bash
 curl -fsS http://127.0.0.1:18818/readyz | jq .
@@ -154,107 +180,78 @@ python ~/lab/swarm/openvino-reranker-npu/smoke.py --url http://127.0.0.1:18818
 ```

 Approval gate:
- May be installed as `openvino-reranker.service` only after foreground smoke and Will approval.
- May be integrated into RAG only behind disabled-by-default knobs such as `RAG_RERANK_ENABLED=false`; request-time reranking must not mutate Chroma.
+- Rerank may score candidate passages only. Any change to RAG answer selection, rerank policy, or vector DB behavior requires separate approval and rollback notes.

-### Router/classifier prototype (`:18819`)
-
-Foreground review start only, after confirming port is free:
-
-```bash
-ss -ltnp | grep ':18819\b' || true
-cd ~/lab/swarm/openvino-classifier-npu
-/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819
-```
-
-Smoke:
+### Router/classifier (`:18819`)

 ```bash
 curl -fsS http://127.0.0.1:18819/healthz | jq .
 curl -fsS http://127.0.0.1:18819/v1/classify \
  -H 'Content-Type: application/json' \
-  -d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true,"dry_run":true}}' | jq .
+  -d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":false,"dry_run":true}}' | jq '{id, labels, npu_busy_delta_us, sysfs_npu_busy_delta_us}'
 ```

 Approval gate:
- May be installed as `openvino-router-classifier.service` only after Will approves live service enablement.
- Must remain dry-run and must not alter Hermes/Atlas routing, memory writes, safety confirmation flow, or outbound messages without a separate explicit approval.
+- Must remain dry-run/advisory and must not alter Hermes/Atlas routing, memory writes, safety confirmation flow, or outbound messages without a separate explicit approval.

 ### Small GenAI NPU worker (`:18820`)

-Foreground review start only, after confirming port is free:
-
-```bash
-ss -ltnp | grep ':18820\b' || true
-cd ~/lab/swarm/openvino-genai-npu-worker
-/home/will/.venvs/npu/bin/python worker.py --host 127.0.0.1 --port 18820
-```
-
-Smoke:
-
 ```bash
 curl -fsS http://127.0.0.1:18820/healthz | jq .
 curl -fsS http://127.0.0.1:18820/models | jq .
-curl -fsS http://127.0.0.1:18820/v1/worker/condense-notification \
-  -H 'Content-Type: application/json' \
-  -d '{"input":"Non-private smoke notification for local NPU worker.","max_new_tokens":64}' | jq .
 ```

 Approval gate:
- May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
- Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.
+- Must not become primary Atlas/Hermes model routing. Use only for bounded local jobs such as title, summary, notification condensation, and memory-candidate drafting after the relevant job is approved.
+- Avoid generation smokes that cold-load the model unless the task explicitly calls for it.

-### Document/image triage prototype (`:18829` optional review port)
-
-Foreground review start only, after confirming the port is free:
-
-```bash
-ss -ltnp | grep ':18829\b' || true
-cd ~/lab/swarm/openvino-doc-image-triage-npu
-/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
-```
-
-Smoke:
+### Document/image triage (`:18829`)

 ```bash
 curl -fsS http://127.0.0.1:18829/healthz | jq .
 curl -fsS http://127.0.0.1:18829/models | jq .
-/home/will/.venvs/npu/bin/python tests/smoke_test.py
 ```

 Approval gate:
 - Do not point it at arbitrary directories; allowed roots must be equal to or under configured roots.
- Do not include raw OCR text or full source paths unless Will explicitly asks for a one-off response.
+- Do not include raw OCR text or full source paths unless Will explicitly asks for one-off debugging.
 - v1 only uses the NPU through `:18817` embeddings for needs-attention; image category classification and OCR are CPU/local fallbacks.

-## Systemd and Compose recommendations
+### Advisory gateway (`172.19.0.1:18830`)

-Recommended management split:
- Keep containerized services in Docker Compose when they already have Docker build/runtime shape and Compose health (`whisper-server-npu`).
- Keep host-side OpenVINO Python prototypes as user systemd services when they depend on local venvs, sysfs NPU access, model caches, and localhost-only APIs (`openvino-embeddings`, optional reranker/classifier/GenAI worker).
- Do not add the prototypes to the live gateway or primary routing during installation. Installation and routing are separate approval gates.
+```bash
+curl -fsS http://172.19.0.1:18830/healthz | jq .
+docker exec n8n-agent wget -qO- -T 8 http://172.19.0.1:18830/healthz
+```

-User-systemd unit expectations for optional prototypes:
- `WorkingDirectory` points at the service directory under `~/lab/swarm/`.
- `ExecStart` uses the existing venv path documented by the prototype.
- `Environment` pins host to `127.0.0.1`, port, model path, device `NPU`, and any upstream endpoint.
- `Restart=on-failure`, not aggressive restart loops.
- Logs go to user journal; do not log raw request bodies.
- Start manually for smoke; enable on boot only after Will approval.
+Approval gate:
+- Classification/generation/triage POSTs are advisory only and may write metadata counters. Do not wire outputs to sends, restarts, memory writes, tool execution, or Atlas/Hermes routing without a separate reviewed approval.

-Compose expectations for existing swarm services:
- Prefer `cd ~/lab/swarm && make ps`, `make status`, and targeted `docker compose ps <service>` for read-only checks.
- Do not run `docker compose up -d`, restart containers, pull images, or prune volumes from this runbook without approval.
+## Approval-gated / not-live integrations
+
+The following remain closed even though dry-run examples and local specialists exist:
+
+| Integration | Current gate |
+| --- | --- |
+| Primary Atlas/Hermes routing changes | closed; no live routing authority changes from this program slice |
+| Memory writes from NPU classifier/GenAI/advisory gateway | closed |
+| Telegram/Discord/email/outbound sends from cron/n8n/voice/advisory output | closed |
+| Service restarts or tool execution triggered by classifier/gateway output | closed |
+| Automatic Kanban task mutation, assignment, block/unblock, completion, or task creation | closed |
+| Broad private document/image/audio root processing | closed; only explicit approved/narrow roots |
+| Vector DB mutation/reindex or Chroma collection replacement | closed |
+| Wildcard binds or broader exposure for new services | closed |
+| GenAI worker as primary chat model | closed; bounded local drafts only |
+| Diffusion/image generation on the NPU | rejected/parked for this program |

 ## Monitoring and logging notes

 Minimum recurring monitoring should include:
- Listener presence for `18816`, `18817`, and any approved optional prototype ports.
- User service state for `openvino-embeddings.service` and any approved optional prototype unit.
- Docker Compose health for `whisper-server-npu`.
+- Listener presence for live baseline and any approved specialist ports.
+- User service state for OpenVINO services and Docker Compose health for `whisper-server-npu`.
 - HTTP health endpoint success.
 - Positive sysfs NPU busy-time delta on at least one non-private inference probe, preferably embeddings `:18817` because it is already live and central.
- Journal/container logs only at summary level. Avoid raw prompts, raw OCR text, private document names, credentials, and API keys.
+- Compact counts/deltas/gates only. Avoid raw prompts, transcripts, OCR text, private document names, credentials, and API keys.

 Useful log commands:

@@ -264,23 +261,14 @@ journalctl --user -u rag-embedding-health.service -n 100 --no-pager
 journalctl --user -u openvino-reranker.service -n 100 --no-pager
 journalctl --user -u openvino-router-classifier.service -n 100 --no-pager
 journalctl --user -u openvino-genai-npu-worker.service -n 100 --no-pager
+journalctl --user -u openvino-advisory-gateway.service -n 100 --no-pager
 cd ~/lab/swarm && docker compose logs --tail 100 whisper-server-npu
 ```

-## Approval gates
+## Approved/parked outcomes

-Requires explicit Will approval before proceeding:
- Installing, enabling, or autostarting `openvino-reranker.service`, `openvino-router-classifier.service`, or `openvino-genai-npu-worker.service`.
- Assigning a final persistent port to document/image triage or enabling it as a persistent service.
- Enabling live RAG reranking or any request path that changes Atlas/RAG answers.
- Changing primary Atlas/Hermes routing or connecting router/classifier outputs to live decisions.
- Connecting the GenAI worker to primary Atlas chat, gateway routing, memory writes, or outbound notifications.
- Restarting the live Atlas/Hermes gateway.
- Deleting, overwriting, or in-place reindexing existing vector collections.
- Broadening bind addresses or exposure beyond local-only defaults.
-
-Approved/parked outcomes:
- Built/approved prototypes: reranker (`:18818`), router/classifier (`:18819`), small GenAI worker (`:18820`), document/image triage (review ports `:18828`/`:18829`).
- Live baseline retained: Whisper NPU (`:18816`), OpenVINO embeddings (`:18817`), RAG endpoint (`:18810`) using `obsidian_bge_npu`.
+- Live baseline retained: RAG endpoint (`:18810`), RAG health wrapper (`:18814`), Whisper NPU (`:18816`), OpenVINO embeddings (`:18817`).
+- Live local-only advisory/reflex specialists: reranker (`:18818`), router/classifier (`:18819`), GenAI worker (`:18820`), doc/image triage (`:18829`), advisory gateway bridge (`172.19.0.1:18830`).
+- Approved dry-run examples: utilization digest, context gate plan, cron/n8n advisory classifier, explicit-root batch triage, local-file voice/audio pipeline, Kanban hygiene advisory.
 - Parked: always-on wake-word/audio and conventional vision detection until Will wants a concrete use case.
 - Rejected for this NPU program: diffusion/image generation.