diff --git a/README.md b/README.md
index 48b9fa6..3ff6a2d 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,7 @@ swarm/
│ └── vm/ # VM provisioning role (local)
├── openclaw/ # Live mirror of guest ~/.openclaw/
├── docker-compose.yaml # LiteLLM + supporting services
+├── docs/ # Swarm/agentmon/n8n infrastructure docs + diagrams
├── litellm-config.yaml # LiteLLM static config
├── litellm-init-credentials.sh # Register API keys into LiteLLM DB
├── litellm-init-models.sh # Register models into LiteLLM DB (idempotent)
@@ -29,6 +30,15 @@ swarm/
└── README.md # This file
```
+## Current swarm/service architecture
+
+For the current host-side AI/search/voice automation stack, n8n watchdogs, and agentmon monitoring layer, see:
+
+- [`docs/swarm-infrastructure.md`](docs/swarm-infrastructure.md) — operational overview and quick checks
+- [`docs/swarm-infrastructure.html`](docs/swarm-infrastructure.html) — dark SVG architecture diagram
+- [`docs/diagram-maintenance.md`](docs/diagram-maintenance.md) — diagram upkeep conventions
+- OpenVINO NPU services and prototypes are documented in `swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md` and in the component READMEs under `openvino-*-npu*/`. The live baseline ports are RAG `:18810`, Whisper NPU `:18816`, and embeddings `:18817`. The sidecar ports `:18818`, `:18819`, and `:18820`, plus the optional doc/image triage port `:18829`, belong to approved prototypes only and are not part of live Atlas/Hermes routing.
+
## VM: zap
| Property | Value |
diff --git a/docs/diagram-maintenance.md b/docs/diagram-maintenance.md
new file mode 100644
index 0000000..675b9cf
--- /dev/null
+++ b/docs/diagram-maintenance.md
@@ -0,0 +1,66 @@
+# Diagram maintenance
+
+Keep infrastructure diagrams current as first-class documentation, not as one-off screenshots.
+
+## Current diagrams
+
+- [`swarm-infrastructure.html`](./swarm-infrastructure.html) — full Atlas/Hermes + n8n + agentmon + local AI/search/voice topology.
+
+## When to update an existing diagram
+
+Update the relevant diagram in the same change set when you change any of these:
+
+- service topology, ports, or container names
+- monitoring or alerting paths
+- n8n workflow architecture
+- Hermes/Atlas routing or gateway responsibilities
+- local AI/search/voice endpoints
+- OpenVINO NPU live/prototype status, ports, or safety gates (`:18810`, `:18816`, `:18817`, `:18818`, `:18819`, `:18820`, optional `:18829`)
+- Obsidian/RAG data flow
+- OpenClaw/VM operational mode
+- ownership/source-of-truth paths for a component
+
+## When to create a new diagram
+
+Create a new focused diagram when the existing overview would become too dense. Good candidates:
+
+- n8n workflow family or alerting-only diagram
+- agentmon internals: collectors → NATS → processor → Postgres → query/UI
+- Obsidian/RAG automation pipeline
+- local AI routing: Hermes/LiteLLM/llama.cpp/Ollama/provider boundaries
+- OpenVINO NPU assistant sidecars, with live baseline and approved/not-live prototype lanes separated
+- messaging/channel routing: Telegram/Discord/email → Hermes/n8n/alerts
+- disaster recovery / backup topology
+
+## Style rules
+
+- Prefer standalone `.html` files with inline SVG so they render offline in any browser.
+- Keep the source file committed alongside the docs; do not rely on generated screenshots as the only artifact.
+- Link diagrams from the nearest README or operational doc.
+- Keep labels operational: service name, port, responsibility, and data direction.
+- Avoid secrets, credential names that imply secret values, private tokens, raw webhook URLs, or sensitive sample payloads.
+- Do not imply live Atlas/Hermes/RAG routing to an OpenVINO NPU prototype unless a reviewed implementation actually enabled it; label approved prototypes as `not live` or `approval required`.
+- If a raw export or live config was used to build the diagram, commit only the sanitized diagram/docs, not the raw sensitive source.
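The prototype-labeling rule above can be checked mechanically. The sketch below is a hypothetical lint, not an existing repo script. It assumes diagrams live in `docs/*.html` and treats any mention of `18818`, `18819`, `18820`, or `18829` as a violation unless the file also carries a `not live` or `approval required` label somewhere.

```python
from pathlib import Path

# Ports the docs treat as approved prototypes that must never look live.
PROTOTYPE_PORTS = ("18818", "18819", "18820", "18829")
# Labels the style rules accept for prototype lanes.
NOT_LIVE_LABELS = ("not live", "not-live", "approval required")


def unlabeled_prototype_ports(text: str) -> list:
    """Return prototype ports mentioned in `text` when the file has no not-live label.

    This is a coarse, file-level check: one label anywhere in the file satisfies it.
    """
    lowered = text.lower()
    if any(label in lowered for label in NOT_LIVE_LABELS):
        return []
    return [port for port in PROTOTYPE_PORTS if port in text]


def check_diagrams(root: str = "docs") -> int:
    """Print offending diagrams under `root` and return how many were found."""
    offenders = 0
    for path in sorted(Path(root).glob("*.html")):
        ports = unlabeled_prototype_ports(path.read_text(encoding="utf-8"))
        if ports:
            print(f"{path}: prototype ports without a not-live label: {ports}")
            offenders += 1
    return offenders


if __name__ == "__main__":
    check_diagrams()
```

Because the check is file-level, it can miss a diagram that labels one prototype lane but draws a live arrow to another. Treat it as a pre-commit hint, not a substitute for review.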
+
+## Verification before committing
+
+```bash
+# Check the files are valid text and do not contain obvious secret markers
+python - <<'PY'
+from pathlib import Path
+for p in Path('docs').glob('*.html'):
+ text = p.read_text(encoding='utf-8')  # UnicodeDecodeError here means the file is not valid UTF-8 text
+ hits = [s for s in ['api_key', 'token', 'password', 'Authorization', 'Bearer ', 'secret'] if s.lower() in text.lower()]
+ print(p, hits)
+PY
+
+# Inspect targeted diff only
+git diff --stat -- docs README.md
+```
+
+After editing diagrams, commit with a docs-focused message, for example:
+
+```bash
+git add docs/*.md docs/*.html README.md
+git commit -m "docs: update swarm infrastructure diagrams"
+```
diff --git a/docs/swarm-infrastructure.html b/docs/swarm-infrastructure.html
new file mode 100644
index 0000000..edc6862
--- /dev/null
+++ b/docs/swarm-infrastructure.html
@@ -0,0 +1,115 @@
+
+
+
+
+
+ Will's Swarm Infrastructure
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Hermes gateway layer
+
+ n8n + agentmon observability
+
+ local swarm services
+
+
+ Telegram DM/groups
+ Discord #ops-alerts
+ Email Gmail IMAP
+
+
+ Atlas / Hermes default profile gateway tools • memory • specialists
+
+
+ n8n-agent automation workflows :18808 host / :5678 container
+ agentmon-query aggregate snapshots/API :8081 /v1/events
+ agentmon pipeline ingest :8080 NATS JetStream event processor Postgres DB web UI :8082 swarm.snapshot + openclaw.snapshot
+
+
+ LiteLLM LLM router + DB :18804
+ Search SearXNG + Brave MCP :18803 / :18802
+ Voice Kokoro + Whisper :18805 / :18816
+ Docker services agentmon.monitor=true swarm/service snapshots
+ OpenClaw VMs currently dormant openclaw.snapshot
+ Obsidian / RAG RAG endpoint :18810 Chroma obsidian_bge_npu
+ NPU sidecars approved prototypes; not live :18818/:18819/:18820/:18829
+
+
+ host local AI llama.cpp :18806 Ollama fallback :18807 OpenVINO embed :18817 live Whisper NPU :18816 live
+
+
+
+ Legend
+ Gateway/Search/Voice
+ Automation/API
+ Data/AI stores
+ Event bus/pipeline
+ Monitoring / not-live prototype flows
+
+
+
+
+
+ Monitoring model • n8n direct probes critical ports • agentmon aggregates Docker/OpenClaw snapshots • n8n polls agentmon for stale/degraded state
+
+ Operational endpoints • n8n: 127.0.0.1:18808 • agentmon query/UI: 8081 / 8082 • live NPU: RAG 18810, Whisper 18816, embeddings 18817 • prototypes not live-routed: 18818/18819/18820/18829
+
+ Source paths • Swarm repo: ~/lab/swarm • Agentmon repo: ~/lab/agentmon • Workflows: swarm-common/n8n-workflows
+
+
+
+
+
diff --git a/docs/swarm-infrastructure.md b/docs/swarm-infrastructure.md
new file mode 100644
index 0000000..dd47587
--- /dev/null
+++ b/docs/swarm-infrastructure.md
@@ -0,0 +1,250 @@
+# Swarm Infrastructure
+
+This document is the source-of-truth overview for Will's local swarm/agent infrastructure on the `zap` workstation. It focuses on the runtime services that support Atlas/Hermes, n8n automation, local model/search/voice tooling, Obsidian/RAG automation, and the new agentmon monitoring layer.
+
+## High-level topology
+
+```text
+Telegram / Discord / Email
+ |
+ v
+Hermes / Atlas gateway (default profile)
+ |
+ +--> local tools and specialist profiles
+ +--> n8n automation workflows on :18808
+
+n8n automation
+ |
+ +--> direct watchdog probes for key service ports
+ +--> Agentmon Health Watchdog -> agentmon-query :8081
+ +--> Obsidian, RAG, voice memo, URL capture, digest workflows
+
+agentmon
+ |
+ +--> agentmon-swarm-monitor -> Docker labels agentmon.monitor=true
+ +--> agentmon-openclaw-monitor -> OpenClaw VM snapshots
+ +--> NATS JetStream -> event processor -> Postgres
+ +--> query API / UI on :8081 / :8082
+
+local AI/search/voice services
+ |
+ +--> LiteLLM :18804
+ +--> SearXNG :18803
+ +--> Brave MCP :18802
+ +--> llama.cpp :18806
+ +--> Ollama embeddings :18807 (legacy/CPU fallback)
+ +--> OpenVINO NPU embeddings :18817
+ +--> Kokoro TTS :18805
+ +--> Whisper NPU :18816
+ +--> approved/not-live NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage optional :18829
+```
+
+See also:
+
+- [`swarm-infrastructure.html`](./swarm-infrastructure.html) — visual architecture diagram
+- [`diagram-maintenance.md`](./diagram-maintenance.md) — how to keep diagrams updated and when to create new ones
+
+## Runtime layers
+
+### 1. Messaging and agent gateway
+
+- **Hermes / Atlas default profile** is the production messaging gateway.
+- Connected platforms include Telegram, Discord, and email.
+- Atlas uses local swarm services where suitable, especially search, local LLMs, embeddings, STT/TTS, n8n, and agentmon.
+- Specialist Hermes profiles are available for delegated work, but the default profile remains the stable production gateway.
+
+### 2. n8n automation
+
+Container/service:
+
+- `n8n-agent`
+- Host URL: `http://127.0.0.1:18808`
+- Container URL: `http://127.0.0.1:5678`
+- Compose file: `/home/will/lab/swarm/docker-compose.yaml`
+
+Important workflow source exports live under:
+
+- `swarm-common/n8n-workflows/`
+
+Current health/automation patterns:
+
+- **Swarm Health Watchdog**: direct endpoint checks for search, LLM, voice, n8n, Docker health, etc.
+- **Agentmon Health Watchdog**: polls agentmon aggregate snapshots and alerts on stale/degraded monitoring state.
+- **RAG and Embedding Health Watchdog**: checks RAG/search/embedding path.
+- Obsidian workflows: health/reindex, inbox triage, daily review, URL-to-note, chat summary capture, weekly decision/runbook extraction.
+
+### 3. Agentmon monitoring layer
+
+Repo:
+
+- `/home/will/lab/agentmon`
+
+Compose services:
+
+- `agentmon-ingest` on `:8080` — ingestion gateway, `/healthz`
+- `agentmon-query` on `:8081` — query API, `/healthz`, `/v1/events`, `/v1/stats/summary`
+- `agentmon-ui` on `:8082` — web UI, `/healthz`
+- `agentmon-processor` — NATS to Postgres event processor
+- `agentmon-swarm-monitor` — monitors Docker containers labeled `agentmon.monitor=true`
+- `agentmon-openclaw-monitor` — emits OpenClaw VM snapshots
+- `agentmon-db` — Postgres
+- `agentmon-nats` — NATS JetStream
+
+Key query endpoints:
+
+```text
+http://127.0.0.1:8080/healthz
+http://127.0.0.1:8081/healthz
+http://127.0.0.1:8082/healthz
+http://127.0.0.1:8081/v1/stats/summary
+http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1
+http://127.0.0.1:8081/v1/events?event_type=swarm.service.snapshot&limit=20
+http://127.0.0.1:8081/v1/events?event_type=openclaw.snapshot&limit=3
+```
+
+From inside `n8n-agent`, use the Docker bridge gateway:
+
+```text
+http://172.19.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1
+```
+
+### 4. Local AI, search, and voice services
+
+Docker services:
+
+- `litellm` — `:18804`, OpenAI-compatible LLM router
+- `litellm-db` — Postgres backing LiteLLM
+- `searxng` — `:18803`, local metasearch
+- `brave-search` — `:18802`, Brave Search MCP server
+- `kokoro-tts` — `:18805`, local TTS
+- `whisper-server-npu` — `:18816`, OpenVINO NPU local transcription
+- `n8n-agent` — `:18808`, automation
+
+Host/user services:
+
+- `llama-server.service` — `:18806`, local llama.cpp OpenAI-compatible LLM
+- `ollama.service` — `:18807`, legacy/CPU embeddings API fallback
+- `openvino-embeddings.service` — `:18817`, OpenVINO NPU embeddings API (`/v1/embeddings`, `/api/embed`, `/api/embeddings`)
+- `docker-health-endpoint.service` — `:18809`, read-only container health for n8n
+- `obsidian-reindex-endpoint.service` — `:18810`, Obsidian/RAG reindex trigger; default collection `obsidian_bge_npu` using OpenVINO NPU embeddings
+- `url-content-extractor.service` — `:18812`, YouTube/PDF/web extraction
+- `voice-memo-processor.service` — `:18813`, voice memo processing
+- `rag-embedding-health.service` — `:18814`, RAG/embedding health wrapper
+
+Approved but not live-routed OpenVINO NPU sidecars:
+
+| Port | Component | State | Safety boundary |
+| ---: | --- | --- | --- |
+| `18818` | reranker | approved prototype; optional foreground/user-systemd only | request-time only; no Chroma/vector mutation; no live RAG integration unless Will approves |
+| `18819` | router/classifier | approved prototype; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
+| `18820` | bounded GenAI worker | approved prototype | background jobs only; not primary Atlas/Hermes model routing |
+| `18829` | document/image triage | CLI-first; optional localhost server | synthetic/non-private smoke data only; no private directory processing; NPU stage is embeddings via `:18817` |
+
+These sidecars must bind to `127.0.0.1` by default, must not be enabled persistently or wired into live Atlas/Hermes/RAG paths without explicit Will approval, and any NPU claim requires a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before/after inference. HTTP 200 alone is not proof.
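The busy-time rule can be expressed as a small helper. This is a sketch and is not part of `scripts/npu-service-health.sh`. `run_inference` is a hypothetical callable that performs one real inference request, and the counter path is the sysfs file named above.

```python
from pathlib import Path
from typing import Callable

BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = BUSY_PATH) -> int:
    """Read the cumulative NPU busy time in microseconds."""
    return int(path.read_text().strip())


def npu_busy_delta(run_inference: Callable[[], object], path: Path = BUSY_PATH) -> int:
    """Run one inference and return how far the busy counter advanced."""
    before = read_busy_us(path)
    run_inference()
    after = read_busy_us(path)
    return after - before


def proves_npu_use(run_inference: Callable[[], object], path: Path = BUSY_PATH) -> bool:
    """A successful response alone is not proof; require a positive counter delta."""
    return npu_busy_delta(run_inference, path) > 0
```

Other NPU work running at the same time can also advance the counter, so a positive delta is necessary but not sufficient. Run the check while the NPU is otherwise idle.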
+
+### 5. Obsidian and RAG
+
+Vault:
+
+- `/home/will/lab/swarm/swarm-common/obsidian-vault/will/will-shared-zap`
+
+Local REST API:
+
+- HTTP: `127.0.0.1:27123`
+- HTTPS: `127.0.0.1:27124`
+
+RAG/vector store:
+
+- ChromaDB path: `~/.hermes/data/rag-search/chroma/`
+- Reindex state/progress: active BGE/NPU state in `~/.hermes/data/rag-search/obsidian_bge_npu_index_state.json` and `obsidian_bge_npu_reindex_progress.json`; legacy Ollama state in `obsidian_index_state.json` remains for comparison/fallback.
+- Active RAG query/reindex embedding backend: OpenVINO NPU embeddings service on `:18817`, currently `bge-base-en-v1.5-int8-ov`, collection `obsidian_bge_npu`.
+- Legacy comparison/fallback collection: `obsidian`, built with Ollama on `:18807` using `nomic-embed-text`.
+- Reindex endpoint: `POST :18810/reindex` for incremental updates, `POST :18810/reindex?full=true` for full semantic rebuilds, `GET :18810/semantic-health` to verify vectors plus a search smoke test.
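The three reindex calls above map onto plain HTTP requests. This sketch assumes the endpoint accepts an empty POST body, because the docs do not say whether it expects one. It only builds requests, so nothing is reindexed until `send` is called.

```python
import urllib.request

REINDEX_BASE = "http://127.0.0.1:18810"


def reindex_request(full: bool = False, base: str = REINDEX_BASE) -> urllib.request.Request:
    """Incremental reindex by default; full=True asks for a full semantic rebuild."""
    url = f"{base}/reindex" + ("?full=true" if full else "")
    return urllib.request.Request(url, data=b"", method="POST")


def semantic_health_request(base: str = REINDEX_BASE) -> urllib.request.Request:
    """Vector count plus search smoke test."""
    return urllib.request.Request(f"{base}/semantic-health", method="GET")


def send(req: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A full rebuild re-embeds the whole vault, so calling `semantic-health` afterwards is the cheaper way to confirm the result.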
+
+## Monitoring model
+
+The monitoring design is intentionally layered:
+
+1. **n8n direct probes** check critical service endpoints and send deduped alerts.
+2. **agentmon** continuously observes labeled Docker services and OpenClaw state, then writes snapshots through NATS/Postgres.
+3. **n8n Agentmon Health Watchdog** polls agentmon's aggregate state and alerts if the monitoring pipeline itself becomes stale/degraded.
+4. **Hermes/Atlas** can inspect both n8n and agentmon when troubleshooting, and can use the same endpoints as part of operational checks.
+
+This means a single process being alive is not enough: the important signal is whether collection, ingestion, processing, storage, query, and alerting are all functioning.
+
+## Agentmon Health Watchdog
+
+Workflow source:
+
+- `swarm-common/n8n-workflows/agentmon-health-watchdog.json`
+
+Installed n8n workflow:
+
+- Name: `Agentmon Health Watchdog`
+- ID: `AgentmonHealthWatchdog`
+- Schedule: every 5 minutes
+
+Alert conditions:
+
+- `agentmon-ingest`, `agentmon-query`, or `agentmon-ui` `/healthz` fails.
+- Latest `swarm.snapshot` is missing.
+- Latest `swarm.snapshot` is older than 3 minutes.
+- Snapshot issues are non-empty.
+- Required agentmon services are missing or not healthy/running:
+ - `agentmon-ingest`
+ - `agentmon-query`
+ - `agentmon-ui`
+ - `agentmon-processor`
+ - `agentmon-swarm-monitor`
+ - `agentmon-db`
+ - `agentmon-nats`
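The alert conditions can be restated as a pure function, which makes them easier to reason about than the n8n node graph. This is a hypothetical sketch and not the workflow's actual code. The snapshot shape (`timestamp` as offset-aware ISO-8601, `issues`, `services` mapping name to state) is an assumption, not the documented agentmon schema.

```python
from datetime import datetime, timedelta, timezone

MAX_SNAPSHOT_AGE = timedelta(minutes=3)
REQUIRED_SERVICES = (
    "agentmon-ingest", "agentmon-query", "agentmon-ui", "agentmon-processor",
    "agentmon-swarm-monitor", "agentmon-db", "agentmon-nats",
)
OK_STATES = {"healthy", "running"}


def watchdog_problems(healthz, snapshot, now=None):
    """Return alert reasons; an empty list means healthy.

    healthz:  mapping of service name -> bool (did /healthz succeed?)
    snapshot: latest swarm.snapshot as a dict, or None if missing. Assumed
              (hypothetical) keys: "timestamp", "issues", "services".
    """
    if now is None:
        now = datetime.now(timezone.utc)
    problems = [f"{name} /healthz failed" for name, ok in healthz.items() if not ok]
    if snapshot is None:
        return problems + ["swarm.snapshot missing"]
    age = now - datetime.fromisoformat(snapshot["timestamp"])
    if age > MAX_SNAPSHOT_AGE:
        problems.append(f"swarm.snapshot stale ({age})")
    if snapshot.get("issues"):
        problems.append(f"snapshot issues: {snapshot['issues']}")
    services = snapshot.get("services", {})
    for name in REQUIRED_SERVICES:
        state = services.get(name)
        if state not in OK_STATES:
            problems.append(f"{name} missing or not healthy/running (state={state!r})")
    return problems
```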
+
+Deduplication:
+
+- Alert after 2 failed checks.
+- Reminder every 6 failed runs.
+- Recovery message when state returns healthy.
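The deduplication rules leave some room for interpretation. The state machine below is one reading and is an assumption, not the workflow's implementation. It alerts on the 2nd consecutive failure, sends a reminder after every further 6 failed runs (failure 8, 14, and so on), and sends a recovery message on the first success after an alert.

```python
from dataclasses import dataclass

ALERT_AFTER = 2    # consecutive failures before the first alert
REMIND_EVERY = 6   # failed runs between reminders once alerting


@dataclass
class DedupState:
    failures: int = 0
    alerting: bool = False

    def record(self, ok: bool):
        """Update state for one run; return 'alert', 'reminder', 'recovery', or None."""
        if ok:
            was_alerting = self.alerting
            self.failures, self.alerting = 0, False
            return "recovery" if was_alerting else None
        self.failures += 1
        if self.failures == ALERT_AFTER:
            self.alerting = True
            return "alert"
        if self.alerting and (self.failures - ALERT_AFTER) % REMIND_EVERY == 0:
            return "reminder"
        return None
```

A single transient failure followed by success produces no message at all, which is the point of alerting only after 2 checks.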
+
+## Operational quick checks
+
+From the host:
+
+```bash
+cd /home/will/lab/swarm
+make status
+make local-ai-health
+./scripts/npu-service-health.sh # read-only; includes sysfs busy-time proof for :18817
+curl -fsS http://127.0.0.1:18808/healthz
+curl -fsS http://127.0.0.1:8081/healthz
+curl -fsS 'http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1' | jq .
+```
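On a host without `curl`, or when scripting several checks together, the `/healthz` probes can be sketched in Python. The mapping holds two of the URLs above and is meant to be extended. `probe` loosely mirrors `curl -fsS` by failing on any non-2xx answer, connection error, or timeout.

```python
import http.client
import urllib.request

# Two of the host-side checks above; extend the mapping as needed.
ENDPOINTS = {
    "n8n": "http://127.0.0.1:18808/healthz",
    "agentmon-query": "http://127.0.0.1:8081/healthz",
}


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True only for a 2xx answer within `timeout`, similar to `curl -f`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx), and socket timeouts are all OSError subclasses.
        return False


def probe_all(endpoints=ENDPOINTS, timeout: float = 5.0) -> dict:
    """Probe every endpoint and map its name to True/False."""
    return {name: probe(url, timeout) for name, url in endpoints.items()}


if __name__ == "__main__":
    for name, ok in probe_all().items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
```

As with agentmon's own `/healthz`, a passing probe only shows the process is answering. Snapshot freshness still has to be checked separately.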
+
+From inside `n8n-agent`:
+
+```bash
+docker exec n8n-agent /bin/sh -lc '
+ wget -qO- -T 5 http://172.19.0.1:8081/healthz
+ wget -qO- -T 5 "http://172.19.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1" | head -c 500
+'
+```
+
+Verify n8n workflow activation:
+
+```bash
+docker exec -u node n8n-agent n8n export:workflow \
+ --id=AgentmonHealthWatchdog \
+ --output=/tmp/agentmon-export.json
+
+docker cp n8n-agent:/tmp/agentmon-export.json /tmp/agentmon-export.json
+jq '.[0] | {id,name,active,nodes:(.nodes|length)}' /tmp/agentmon-export.json
+```
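If `jq` is not installed, the same summary can be produced in Python. The sketch mirrors the filter `.[0] | {id,name,active,nodes:(.nodes|length)}`. As a small, labeled deviation from jq, it also accepts a bare object in case an n8n version exports a single workflow without the surrounding array.

```python
import json
import sys


def summarize_export(exported) -> dict:
    """Mirror `.[0] | {id,name,active,nodes:(.nodes|length)}` on an n8n export."""
    # jq's `.[0]` assumes an array; also tolerate a single exported object.
    wf = exported[0] if isinstance(exported, list) else exported
    return {
        "id": wf.get("id"),
        "name": wf.get("name"),
        "active": wf.get("active"),
        "nodes": len(wf.get("nodes", [])),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(json.dumps(summarize_export(json.load(fh)), indent=2))
```

Usage: `python3 summarize_export.py /tmp/agentmon-export.json`. The script name is illustrative.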
+
+## Notes and pitfalls
+
+- Do not commit `.env`, decrypted credentials, raw credential exports, or runtime DB files.
+- n8n workflow backups can contain sensitive operational data; keep timestamped raw backups untracked unless intentionally sanitized.
+- From the host, use `127.0.0.1:<port>`.
+- From `n8n-agent`, use `127.0.0.1:5678` for n8n itself and `172.19.0.1:<port>` for host-published swarm services.
+- Agentmon `/healthz` only proves the web/API process is alive; pair it with snapshot freshness to prove the monitoring pipeline is flowing.
+- OpenClaw is intentionally dormant unless explicitly re-enabled; do not alert on VMs being shut off by default.
+- OpenVINO NPU sidecars on `:18818`, `:18819`, `:18820`, and optional `:18829` are prototypes/not-live unless a later approved change installs and routes them. Do not draw live Atlas/Hermes/RAG arrows to them in diagrams until that approval and implementation actually exist.
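The host-versus-container addressing rule is easy to get wrong inside n8n HTTP nodes. The helper below is a hypothetical sketch of that rule. The bridge gateway `172.19.0.1` is the value documented above and may differ on another Docker network.

```python
HOST_LOOPBACK = "127.0.0.1"
DOCKER_BRIDGE_GATEWAY = "172.19.0.1"  # documented value; depends on the Docker network
N8N_HOST_PORT = 18808
N8N_CONTAINER_PORT = 5678


def base_url(port: int, from_n8n_container: bool = False) -> str:
    """Return the base URL a caller should use for a host-published service."""
    if not from_n8n_container:
        return f"http://{HOST_LOOPBACK}:{port}"
    if port == N8N_HOST_PORT:
        # Inside the container, n8n itself listens on its internal port.
        return f"http://{HOST_LOOPBACK}:{N8N_CONTAINER_PORT}"
    return f"http://{DOCKER_BRIDGE_GATEWAY}:{port}"
```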
diff --git a/scripts/npu-service-health.sh b/scripts/npu-service-health.sh
new file mode 100755
index 0000000..23849f5
--- /dev/null
+++ b/scripts/npu-service-health.sh
@@ -0,0 +1,115 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Read-only health probe for Will's local OpenVINO/NPU services.
+# This script intentionally does not start, stop, restart, enable, reindex, or route anything.
+
+BUSY_PATH=${BUSY_PATH:-/sys/class/accel/accel0/device/npu_busy_time_us}
+CURL_TIMEOUT=${CURL_TIMEOUT:-8}
+EMBED_MODEL=${EMBED_MODEL:-bge-base-en-v1.5-int8-ov}
+EMBED_URL=${EMBED_URL:-http://127.0.0.1:18817/v1/embeddings}
+
+have() { command -v "$1" >/dev/null 2>&1; }
+
+json_pretty() {
+ if have jq; then
+ jq .
+ else
+ python -m json.tool
+ fi
+}
+
+section() {
+ printf '\n== %s ==\n' "$1"
+}
+
+http_json() {
+ local name=$1 url=$2
+ printf '\n[%s] %s\n' "$name" "$url"
+ if ! curl -fsS --max-time "$CURL_TIMEOUT" "$url" | json_pretty; then
+ printf 'status=unavailable_or_non_json\n'
+ return 1
+ fi
+}
+
+busy_value() {
+ if [[ -r "$BUSY_PATH" ]]; then
+ tr -d '\n' < "$BUSY_PATH"
+ else
+ printf 'missing'
+ fi
+}
+
+section "NPU counter"
+printf 'busy_path=%s\n' "$BUSY_PATH"
+printf 'busy_time_us=%s\n' "$(busy_value)"
+
+section "Listeners"
+# Required OpenVINO/NPU program ports: live baseline 18810/18816/18817,
+# approved prototypes 18818/18819/18820, and optional doc/image triage 18829.
+# 18814 is the existing RAG/embedding health wrapper; 18828 is a review-only
+# alternate used to avoid collisions during prior smoke tests.
+ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829)\b' || true
+
+section "User service states"
+for unit in \
+ openvino-embeddings.service \
+ rag-embedding-health.service \
+ openvino-reranker.service \
+ openvino-router-classifier.service \
+ openvino-genai-npu-worker.service; do
+ active=$(systemctl --user is-active "$unit" 2>/dev/null || true)
+ enabled=$(systemctl --user is-enabled "$unit" 2>/dev/null || true)
+ printf '%-38s active=%-10s enabled=%s\n' "$unit" "${active:-unknown}" "${enabled:-unknown}"
+done
+
+section "Docker service states"
+if [[ -d /home/will/lab/swarm ]]; then
+ (cd /home/will/lab/swarm && docker compose ps whisper-server-npu 2>/dev/null) || true
+fi
+
+section "HTTP health"
+http_json "RAG endpoint" "http://127.0.0.1:18810/healthz" || true
+http_json "RAG/embedding health wrapper" "http://127.0.0.1:18814/healthz" || true
+http_json "Whisper NPU" "http://127.0.0.1:18816/health" || true
+http_json "OpenVINO embeddings" "http://127.0.0.1:18817/healthz" || true
+# Prototypes are expected to be unavailable until explicitly started/approved.
+http_json "NPU reranker prototype" "http://127.0.0.1:18818/readyz" || true
+http_json "NPU router classifier prototype" "http://127.0.0.1:18819/healthz" || true
+http_json "NPU GenAI worker prototype" "http://127.0.0.1:18820/healthz" || true
+http_json "NPU doc/image triage prototype" "http://127.0.0.1:18829/healthz" || true
+
+section "Embeddings NPU busy-time proof"
+if [[ ! -r "$BUSY_PATH" ]]; then
+ printf 'result=failed reason=missing_busy_counter\n'
+ exit 2
+fi
+before=$(busy_value)
+response=$(curl -fsS --max-time "$CURL_TIMEOUT" \
+ "$EMBED_URL" \
+ -H 'Content-Type: application/json' \
+ -d "{\"input\":\"non-private npu health probe\",\"model\":\"$EMBED_MODEL\"}" || true)
+after=$(busy_value)
+if [[ -z "$response" ]]; then
+ printf 'result=failed reason=embedding_request_failed before_us=%s after_us=%s\n' "$before" "$after"
+ exit 3
+fi
+delta=$((after - before))
+printf 'sysfs_before_us=%s\nsysfs_after_us=%s\nsysfs_delta_us=%s\n' "$before" "$after" "$delta"
+RESPONSE_JSON="$response" python - <<'PY' || true
+import json, os
+try:
+ data = json.loads(os.environ.get('RESPONSE_JSON', ''))
+except Exception as exc:
+ print(f'response_parse_error={type(exc).__name__}: {exc}')
+ raise SystemExit(0)
+print(f"response_object={data.get('object')}")
+print(f"response_model={data.get('model')}")
+print(f"response_npu_busy_delta_us={data.get('npu_busy_delta_us')}")
+print(f"embedding_count={len(data.get('data', []))}")
+PY
+if (( delta <= 0 )); then
+ printf 'result=failed reason=no_positive_sysfs_npu_delta\n'
+ exit 4
+fi
+printf 'result=ok\n'
diff --git a/swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md b/swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md
new file mode 100644
index 0000000..8711888
--- /dev/null
+++ b/swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md
@@ -0,0 +1,309 @@
+---
+type: service-catalog
+created: 2026-05-14T14:50:46-07:00
+updated: 2026-06-04T11:35:00-07:00
+tags:
+ - service-catalog
+ - swarm
+ - hermes
+ - automation
+---
+
+# Service Catalog
+
+Canonical index of local services, automation tools, Hermes capabilities, and where to find their operational docs.
+
+> Generated by Atlas from live system inventory on `2026-05-14T14:50:46-07:00`; high-risk local AI/service rows refreshed on `2026-05-27T12:12:06-07:00`; Obsidian/RAG embedding path refreshed on `2026-06-03T21:31:01-07:00`. Secrets are intentionally omitted.
+
+## Quick links
+
+- [[Ops Home]]
+- [[Obsidian Automation Health]]
+- [[Obsidian Plugin Setup]]
+- [[Runbooks Home]]
+- [[Projects Home]]
+- [[Decisions Home]]
+
+## Primary repositories and config locations
+
+| Area | Path / command | Purpose |
+| --- | --- | --- |
+| Swarm repo | `~/lab/swarm` | Docker services, n8n, local AI, OpenClaw helpers, service scripts |
+| Swarm Makefile | `cd ~/lab/swarm && make help` | Authoritative operations target list |
+| n8n workflow exports | `~/lab/swarm/swarm-common/n8n-workflows/` | Versioned workflow backups |
+| Shared Obsidian vault | `~/lab/swarm/swarm-common/obsidian-vault/will/will-shared-zap` | Active API-backed vault |
+| Hermes config | `~/.hermes/config.yaml` | Atlas/Hermes model, tools, gateway, profiles |
+| Hermes env/secrets | `~/.hermes/.env` | Secrets; do not print or commit |
+| Hermes source | `~/.hermes/hermes-agent` | Atlas local source checkout |
+| Hermes skills | `~/.hermes/skills/` | Procedural docs and reusable playbooks |
+
+## Local endpoints
+
+| Service | Port | Status | Purpose | Health / base URL |
+| --- | --- | --- | --- | --- |
+| Brave Search MCP | 18802 | HTTP 406 on plain GET `/mcp` | Brave Search MCP server for Hermes MCP tools | `http://127.0.0.1:18802/mcp` |
+| SearXNG | 18803 | OK 200 | SearXNG metasearch | `http://127.0.0.1:18803/search?q=test&format=json` |
+| LiteLLM | 18804 | no listener / HTTP 000 on 2026-05-27 | LiteLLM OpenAI-compatible model proxy | `http://127.0.0.1:18804/health/liveliness` |
+| Kokoro TTS | 18805 | OK 200 | Kokoro local TTS | `http://127.0.0.1:18805/health` |
+| llama.cpp | 18806 | OK 200 | llama.cpp local LLM | `http://127.0.0.1:18806/v1/models` |
+| Ollama embeddings | 18807 | OK 200 | Ollama embeddings API | `http://127.0.0.1:18807/api/version` |
+| n8n | 18808 | OK 200 | n8n workflow automation | `http://127.0.0.1:18808/healthz` |
+| Docker health | 18809 | OK 200 | Docker/container health API | `http://127.0.0.1:18809/health` |
+| Obsidian reindex | 18810 | OK 200 | Obsidian/RAG reindex trigger | `http://127.0.0.1:18810/healthz` |
+| Whisper CPU | 18811 | OK 200 | Whisper.cpp CPU STT fallback | `http://127.0.0.1:18811/` |
+| URL extractor | 18812 | OK 200 | URL/PDF/YouTube content extractor | `http://127.0.0.1:18812/healthz` |
+| Voice memo processor | 18813 | OK 200 | Voice memo processor | `http://127.0.0.1:18813/healthz` |
+| RAG/embedding health | 18814 | OK 200 | RAG/OpenVINO/Obsidian health wrapper | `http://127.0.0.1:18814/healthz` |
+| Whisper OpenVINO NPU | 18816 | OK 200 / Docker healthy on 2026-06-04 | Intel NPU Whisper transcription service | `http://127.0.0.1:18816/health` |
+| OpenVINO embeddings | 18817 | OK 200 | Intel NPU embeddings service for live Obsidian RAG | `http://127.0.0.1:18817/health` |
+| OpenVINO NPU reranker prototype | 18818 | approved prototype; not enabled live | Optional second-stage RAG reranker | `http://127.0.0.1:18818/readyz` |
+| OpenVINO router/classifier prototype | 18819 | approved prototype; not enabled live | Dry-run Atlas/Hermes message classifier/router | `http://127.0.0.1:18819/healthz` |
+| OpenVINO GenAI NPU worker prototype | 18820 | approved prototype; not enabled live | Bounded local background generation worker | `http://127.0.0.1:18820/healthz` |
+| OpenVINO document/image triage prototype | 18828/18829 | approved foreground prototype; not enabled live | Local document/image triage with NPU embeddings stage via `:18817` | `http://127.0.0.1:<port>/healthz` |
+| Obsidian REST HTTP | 27123 | OK 200 | Obsidian Local REST API HTTP | `http://127.0.0.1:27123/` |
+
+## Docker services
+
+| Container | Status | Ports |
+| --- | --- | --- |
+| n8n-agent | Up 21 hours (healthy) | 0.0.0.0:18808->5678/tcp, [::]:18808->5678/tcp |
+| whisper-server-gpu | Up 27 hours (healthy) | 0.0.0.0:18801->8080/tcp, [::]:18801->8080/tcp |
+| whisper-server | Up 27 hours (healthy) | 0.0.0.0:18811->8080/tcp, [::]:18811->8080/tcp |
+| kokoro-tts | Up 25 hours | 0.0.0.0:18805->8880/tcp, [::]:18805->8880/tcp |
+| brave-search | Up 25 hours | 0.0.0.0:18802->8000/tcp, [::]:18802->8000/tcp |
+| searxng | Up 25 hours | 0.0.0.0:18803->8080/tcp, [::]:18803->8080/tcp |
+
+Management commands:
+
+```bash
+cd ~/lab/swarm
+make ps
+make status
+make local-ai-health
+make api-health
+make timers
+./scripts/npu-service-health.sh
+```
+
+## Host-side systemd/user services
+
+Important known services:
+
+| Unit | Purpose |
+| --- | --- |
+| `llama-server.service` | Host-side llama.cpp local LLM on 18806 |
+| `ollama.service` | Host-side Ollama embeddings on 18807 |
+| `docker-health-endpoint.service` | Container health API on 18809 |
+| `obsidian-reindex-endpoint.service` | Obsidian/RAG reindex endpoint on 18810 |
+| `url-content-extractor.service` | URL/PDF/YouTube extraction on 18812 |
+| `voice-memo-processor.service` | Voice memo processing on 18813 |
+| `rag-embedding-health.service` | RAG/OpenVINO/Obsidian health check wrapper on 18814 |
+| `openvino-embeddings.service` | Intel NPU BGE embedding service on 18817 |
+| `openvino-reranker.service` | Optional NPU reranker prototype on 18818; not installed/enabled without approval |
+| `openvino-router-classifier.service` | Optional dry-run router/classifier prototype on 18819; not installed/enabled without approval |
+| `openvino-genai-npu-worker.service` | Optional bounded GenAI worker prototype on 18820; not installed/enabled without approval |
+
+Useful checks:
+
+```bash
+systemctl --user list-units '*obsidian*' '*rag*' '*url-content*' '*voice-memo*' '*docker-health*' --all
+systemctl --user list-timers
+journalctl --user -u obsidian-reindex-endpoint.service -n 50 --no-pager
+```
+
+## n8n workflows
+
+n8n UI/API: `http://127.0.0.1:18808`
+
+| Workflow | ID | State |
+| --- | --- | --- |
+| Calendar to Obsidian Notes | QRCCdHNXZUHc2Oz4 | inactive |
+| Daily OpenClaw Session Digest | qqYwAD05AvRHrHPc | inactive |
+| Evening Digest | PlZywwqL8MRNEAN6 | active |
+| Gmail Inbox Monitor + Obsidian Notes | whtdorf7yJMVYeHm | active |
+| IMAP Inbox Triage + Obsidian Notes | 9sFwRyUDz51csAp7 | active |
+| IMAP Inbox Triage + Obsidian Notes (squareffect) | xjUoQf97TkBrawc8 | inactive |
+| IMAP Inbox Triage + Obsidian Notes (wills-portal) | kHDK9QdUSiAJ8rCM | inactive |
+| Morning Brief | g3IdGZCK1EtTsv9T | active |
+| n8n Failure Digest | G9ylNbHbnJ6fWX2C | active |
+| Nightly Obsidian Vault Sync | 75JCevkdgkyCr2qH | inactive |
+| Obsidian Chat Summary Capture | LF3i86l3NkxpayxL | active |
+| Obsidian Daily Review | YZyJ5G0Ur8D6TlM8 | active |
+| Obsidian Health + Reindex | PCtD3PuQjzKLyEEE | active |
+| Obsidian Inbox Triage | 6SKSZWZwuJNwuO2P | active |
+| Obsidian URL to Note | Ori3Bu5u5ODtxxyD | active |
+| Obsidian Vault Reindex | 85ntyyphDJ4Ms2b4 | active |
+| Obsidian Weekly Decision Runbook Extractor | UWLMOQQVxbTX6Sis | active |
+| OpenClaw Action Bus | Jwi54VWMdlLqYnRo | inactive |
+| OpenClaw Reminder Webhook | RUR1CGn0ikkxbPin | inactive |
+| RAG and Embedding Health Watchdog | SwKaPtYqUJrakpFu | active |
+| Swarm Health Watchdog | lDKocSFXBQWQrDd3 | active |
+| Voice Memo Capture (Audio URL + Local Whisper) | El1BHJZ56JlzhrRZ | active |
+| Web-to-Notes Capture (Local LLM + Obsidian) | GSmzuA5dgGgyRg5v | active |
+
+Obsidian webhook endpoints:
+
+| Workflow | Method / URL | Input |
+| --- | --- | --- |
+| Obsidian Chat Summary Capture | `POST http://127.0.0.1:18808/webhook/obsidian-chat-summary` | JSON with `type`, `title`, `summary`, `content`, optional `tags`, `metadata` |
+| Obsidian URL to Note | `POST http://127.0.0.1:18808/webhook/obsidian-url-to-note` | JSON with `url`, optional `folder`, `tags`, `notes` |
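A caller for the URL-to-note webhook might build its request as in the sketch below. The field names come from the table above. Sending `tags` as a list of strings is an assumption, because the table does not specify its type. The request is built but not sent.

```python
import json
import urllib.request

N8N_BASE = "http://127.0.0.1:18808"


def url_to_note_request(url, folder=None, tags=None, notes=None, base=N8N_BASE):
    """Build a POST for the Obsidian URL to Note webhook; optional fields are omitted when None."""
    payload = {"url": url}
    for key, value in (("folder", folder), ("tags", tags), ("notes", notes)):
        if value is not None:
            payload[key] = value
    return urllib.request.Request(
        f"{base}/webhook/obsidian-url-to-note",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send it with `urllib.request.urlopen(req, timeout=...)` once the payload looks right. The same pattern applies to the chat-summary webhook with its own fields.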
+
+## Hermes capabilities
+
+### Enabled toolsets
+
+| Toolset | Description |
+| --- | --- |
+| web | 🔍 Web Search & Scraping |
+| browser | 🌐 Browser Automation |
+| terminal | 💻 Terminal & Processes |
+| file | 📁 File Operations |
+| code_execution | ⚡ Code Execution |
+| vision | 👁️ Vision / Image Analysis |
+| image_gen | 🎨 Image Generation |
+| tts | 🔊 Text-to-Speech |
+| skills | 📚 Skills |
+| todo | 📋 Task Planning |
+| memory | 💾 Memory |
+| session_search | 🔎 Session Search |
+| clarify | ❓ Clarifying Questions |
+| delegation | 👥 Task Delegation |
+| cronjob | ⏰ Cron Jobs |
+| messaging | 📨 Cross-Platform Messaging |
+
+### Disabled toolsets
+
+| Toolset | Description |
+| --- | --- |
+| video | 🎬 Video Analysis |
+| video_gen | 🎬 Video Generation |
+| moa | 🧠 Mixture of Agents |
+| rag_search | 🧠 RAG Search |
+| rl | 🧪 RL Training |
+| homeassistant | 🏠 Home Assistant |
+| spotify | 🎵 Spotify |
+| yuanbao | 🤖 Yuanbao |
+| computer_use | 🖱️ Computer Use (macOS) |
+
+### MCP servers
+
+```text
+MCP Servers:
+
+ Name Transport Tools Status
+ ──────────────── ────────────────────────────── ──────────── ──────────
+ brave-search http://127.0.0.1:18802/mcp all ✓ enabled
+```
+
+### Hermes profiles
+
+```text
+Profile Model Gateway Alias Distribution
+ ─────────────── ─────────────────────────── ─────────── ─────────── ────────────────────
+ ◆default gpt-5.5 running — —
+ atlas gpt-5.5 stopped — —
+ engineer gpt-5.5 stopped — —
+ glm-simple glm-5.1 stopped — —
+ ops gpt-5.5 stopped — —
+ orchestrator gpt-5.5 stopped — —
+ researcher gpt-5.5 stopped — —
+ reviewer gpt-5.5 stopped — —
+ writer gpt-5.5 stopped — —
+```
+
+### Hermes cron jobs
+
+```text
+
+┌─────────────────────────────────────────────────────────────────────────┐
+│ Scheduled Jobs │
+└─────────────────────────────────────────────────────────────────────────┘
+
+  c515ca076b73 [active]
+    Name:     Hermes config git snapshot
+    Schedule: 0 3 * * *
+    Repeat:   ∞
+    Next run: 2026-05-15T03:00:00-07:00
+    Deliver:  discord:1494453542243532932
+    Script:   hermes_git_snapshot.sh
+    Mode:     no-agent (script stdout delivered directly)
+    Last run: 2026-05-11T03:00:37.525856-07:00 ok
+
+  c15ee395a38d [active]
+    Name:     atlas-minio-self-backup
+    Schedule: 0 3 * * *
+    Repeat:   ∞
+    Next run: 2026-05-15T03:00:00-07:00
+    Deliver:  origin
+    Script:   atlas-backup-to-minio-cron.sh
+    Mode:     no-agent (script stdout delivered directly)
+
+  1ef682e65695 [active]
+    Name:     watch pi-agent-hermes-bound kanban
+    Schedule: every 2m
+    Repeat:   ∞
+    Next run: 2026-05-14T14:49:39.352638-07:00
+    Deliver:  local
+    Script:   watch_pi_agent_kanban.py
+    Mode:     no-agent (script stdout delivered directly)
+    Last run: 2026-05-14T14:47:39.352638-07:00 ok
+
+```
+
+## Local AI and automation routing
+
+| Capability | Preferred endpoint/tool | Notes |
+| --- | --- | --- |
+| Web search | SearXNG `18803` or Brave MCP `18802` | Hermes web search and MCP Brave Search are both available |
+| Model proxy | LiteLLM `18804` | Use for OpenAI-compatible routed models |
+| Direct local LLM | llama.cpp `18806` | Current model id: `gemma-4-26B-A4B-it-UD-IQ2_M.gguf`; useful for n8n/local automation |
+| Embeddings | OpenVINO NPU `18817`; Ollama `18807` fallback | Live RAG uses `bge-base-en-v1.5-int8-ov` via OpenVINO with Chroma collection `obsidian_bge_npu`; Ollama remains a legacy/CPU fallback |
+| Text-to-speech | Kokoro `18805` / Hermes TTS tool | Local speech generation |
+| Speech-to-text | Whisper OpenVINO NPU `18816`; Whisper CPU `18811` fallback | NPU service is the live default; CPU remains fallback |
+| Workflow automation | n8n `18808` | Durable jobs and webhooks |
+| Knowledge store | Obsidian REST `27123`; RAG/Chroma local store | Obsidian notes plus Hermes rag-search index |
+
+## Obsidian integration
+
+| Component | Location / endpoint | Purpose |
+| --- | --- | --- |
+| Local REST API | `http://127.0.0.1:27123` and `https://127.0.0.1:27124` | Read/write notes and execute commands |
+| Autostart entry | `~/.config/autostart/obsidian-autostart.desktop` | Launches Obsidian at graphical login |
+| Autostart script | `~/.local/bin/start-obsidian-if-needed` | Idempotent launcher for Obsidian |
+| Reindex endpoint | `http://127.0.0.1:18810/reindex` | Rebuilds/updates local Obsidian/RAG index |
+| Dataview plugin | Vault `.obsidian/plugins/dataview` | Dashboard tables |
+| Tasks plugin | Vault `.obsidian/plugins/obsidian-tasks-plugin` | Dashboard task queries |
+
+## Source-of-truth docs
+
+| Topic | Where |
+| --- | --- |
+| Swarm operations | Hermes skill `swarm`; `~/lab/swarm/Makefile` |
+| n8n API/workflow management | Hermes skill `swarm`, reference `n8n-api-and-workflows.md` |
+| Obsidian filesystem/API usage | Hermes skill `obsidian` |
+| Hermes CLI/toolsets/gateway/profiles | Hermes skill `hermes-agent`; `hermes --help`; `hermes tools list` |
+| Obsidian automation workflows | `~/lab/swarm/swarm-common/n8n-workflows/obsidian-*.json` |
+| Runbooks | [[Runbooks Home]] |
+| OpenVINO NPU service operations | [[OpenVINO NPU Services Runbook]]; `~/lab/swarm/scripts/npu-service-health.sh` |
+
+## Safety notes
+
+- Do not print `.env`, API keys, tokens, auth JSON, or decrypted n8n credentials.
+- From inside the `n8n-agent` container, reach host services via `http://172.19.0.1:<port>`, not `127.0.0.1:<port>`.
+- Use file-based workflow updates for large n8n JSON payloads.
+- After structural n8n workflow edits, deactivate/reactivate the workflow.
+- Prefer `make` targets in `~/lab/swarm` for routine service operations.
+- OpenVINO NPU prototype sidecars `:18818`, `:18819`, `:18820`, and optional `:18829` are approved prototypes only; do not enable persistent services, live Atlas/Hermes/RAG routing, vector DB mutation, or private document/image processing without explicit approval. Verify NPU usage with `/sys/class/accel/accel0/device/npu_busy_time_us`; HTTP 200 alone is not proof.
+- Check git status before committing; commit only targeted non-secret source/config/docs.
+
+## Refresh procedure
+
+To refresh this catalog:
+
+```bash
+cd ~/lab/swarm
+make status
+hermes tools list
+hermes mcp list
+# Ask Atlas: "refresh the Obsidian Service Catalog"
+```
diff --git a/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md b/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md
new file mode 100644
index 0000000..98a7607
--- /dev/null
+++ b/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md
@@ -0,0 +1,286 @@
+---
+type: runbook
+system: openvino-npu-services
+status: draft
+created: 2026-06-04
+updated: 2026-06-04
+tags:
+ - runbook
+ - openvino
+ - npu
+ - swarm
+ - atlas
+related:
+ - "[[Service Catalog]]"
+ - "[[Swarm Operating Manual]]"
+ - "[[Atlas Capability Upgrade Program]]"
+---
+
+# OpenVINO NPU Services Runbook
+
+This runbook is the integrated operations view for Will's local Intel NPU/OpenVINO services from the `npu-capability-expansion` board.
+
+Safety posture:
+- Do not restart the live Atlas/Hermes gateway from this runbook.
+- Do not change primary Atlas/Hermes routing without explicit Will approval.
+- Do not delete, overwrite, or in-place reindex existing Chroma/vector collections.
+- Treat HTTP 200 as necessary but not sufficient for NPU-backed services; read `/sys/class/accel/accel0/device/npu_busy_time_us` before and after an inference and confirm the counter increased.
+- Keep endpoints local-only unless Will explicitly approves broader exposure.
+- Keep raw prompts, private documents, OCR text, and secrets out of logs and durable handoffs.
+
+## Current service map
+
+| Capability | Port | Runtime / service | Path | State | Health endpoint | NPU proof |
+| --- | ---: | --- | --- | --- | --- | --- |
+| Obsidian/RAG endpoint | 18810 | `obsidian-reindex-endpoint.service` / local Python endpoint | `~/lab/swarm/scripts/` | live baseline; uses collection `obsidian_bge_npu` | `http://127.0.0.1:18810/healthz` | indirect via embeddings `:18817`; do not mutate existing collection |
+| RAG/embedding health wrapper | 18814 | `rag-embedding-health.service` | `~/lab/swarm/swarm-common/rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | should exercise embeddings path when configured |
+| Whisper transcription, OpenVINO NPU | 18816 | Docker Compose service/container `whisper-server-npu` | `~/lab/swarm/whisper-openvino-npu/` | live baseline | `http://127.0.0.1:18816/health` | transcription response includes `npu_busy_delta_us`; sysfs delta must increase |
+| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz` | embedding response and sysfs delta must be positive |
+| NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
+| NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
+| Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
+| Document/image triage prototype | optional 18829 for review only; 18828 was an earlier smoke alternate | CLI-first; foreground local-only server if needed; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
+
+Port notes:
+- `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
+- `18820` is reserved for the GenAI worker prototype. Use optional `18829` for document/image triage foreground review until Will approves a final persistent port. `18828` was used in earlier review smoke only and should not be treated as the preferred documented port.
+- Existing `:18817` is currently bound on `0.0.0.0` by the user service; prototype services should still default to `127.0.0.1`.
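+
+Checking listeners before binding can be scripted. A minimal sketch, assuming `ss -ltn` output in which the fourth column is the local `address:port`; `port_in_use` is a hypothetical helper, not part of `npu-service-health.sh`:
+
+```bash
+# Hypothetical helper: port_in_use PORT
+# Reads `ss -ltn` lines on stdin and exits 0 if PORT appears in the local
+# address column (field 4). Taking the text after the last ":" covers the
+# 127.0.0.1:18817, 0.0.0.0:18817, *:18817 and [::]:18817 forms.
+port_in_use() {
+  awk -v port="$1" '
+    { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
+    END { exit !found }
+  '
+}
+```
+
+For example, `ss -ltnH | port_in_use 18818 && echo "18818 is taken"` guards a foreground prototype start. Without `-H`, the header line's fourth column is `Local`, so it never matches a port.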
+
+## Read-only unified health check
+
+From the swarm repo:
+
+```bash
+cd ~/lab/swarm
+./scripts/npu-service-health.sh
+```
+
+The script is read-only. It checks:
+- Listeners on `18810`, `18816`, `18817`, `18818`, `18819`, `18820`, and `18829`, plus the existing `18814` wrapper and the `18828` review alternate.
+- User service state.
+- Docker Compose state for `whisper-server-npu`.
+- JSON health endpoints.
+- NPU activity, by sending a non-private embeddings request and reading `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
+
+Manual minimal checks:
+
+```bash
+BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
+cat "$BUSY"
+ss -ltnp | grep -E ':(18810|18816|18817|18818|18819|18820|18829)\b' || true
+systemctl --user is-active openvino-embeddings.service rag-embedding-health.service
+cd ~/lab/swarm && docker compose ps whisper-server-npu
+curl -fsS http://127.0.0.1:18817/healthz | jq .
+```
+
+Embedding NPU proof:
+
+```bash
+BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
+before=$(cat "$BUSY")
+curl -fsS http://127.0.0.1:18817/v1/embeddings \
+ -H 'Content-Type: application/json' \
+ -d '{"input":"non-private npu health probe","model":"bge-base-en-v1.5-int8-ov"}' | jq '{model, object, npu_busy_delta_us, embedding_count:(.data|length)}'
+after=$(cat "$BUSY")
+echo "sysfs_npu_busy_delta_us=$((after-before))"
+```
+
+A healthy NPU path has:
+- HTTP success from the endpoint.
+- Response-level `npu_busy_delta_us > 0` when the service reports it.
+- Sysfs `after - before > 0`.
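+
+The before/after pattern can be wrapped around any probe command. A minimal sketch; `npu_delta_probe` is a hypothetical helper, and it covers only the sysfs criterion (the response-level `npu_busy_delta_us` still has to be read from the endpoint's JSON):
+
+```bash
+# Hypothetical helper: npu_delta_probe BUSY_FILE COMMAND [ARGS...]
+# Reads the NPU busy-time counter, runs COMMAND (its output passes through),
+# reads the counter again, prints the delta, and succeeds only if COMMAND
+# succeeded and the counter increased.
+npu_delta_probe() {
+  local busy_file before after rc delta
+  busy_file=$1
+  shift
+  before=$(cat "$busy_file") || return 2
+  "$@"
+  rc=$?
+  after=$(cat "$busy_file") || return 2
+  delta=$((after - before))
+  echo "sysfs_npu_busy_delta_us=$delta"
+  [ "$rc" -eq 0 ] && [ "$delta" -gt 0 ]
+}
+```
+
+Pass the sysfs path above as `BUSY_FILE` and the embeddings `curl` as the command. Any other NPU work between the two reads also raises the counter, so a positive delta shows the NPU was busy during the probe rather than proving that this request alone used it.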
+
+## Service-specific smoke checks
+
+For any foreground prototype server below, run it in a terminal you control or capture its PID and stop it at the end of the smoke. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning for these review smokes unless Will explicitly approved persistent service enablement.
+
+Safe foreground-server pattern:
+
+```bash
+server_pid=""
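+cleanup() {
+  # Stop the foreground prototype if it is still running.
+  if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then
+    kill "$server_pid"
+    wait "$server_pid" 2>/dev/null || true
+  fi
+}
+trap cleanup EXIT
+# Start the prototype server in the background, bound to localhost, e.g.:
+#   <server command> --host 127.0.0.1 --port <port> &
+# server_pid=$!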
+# run curl/smoke commands, then let trap stop it
+```
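+
+Prototype servers may need time to load their model onto the NPU before the health endpoint answers, so a smoke can poll first. A minimal sketch with a hypothetical `wait_until` helper; it counts one-second retries, so the timeout is approximate:
+
+```bash
+# Hypothetical helper: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
+# Re-runs COMMAND once per second until it succeeds; returns 1 after
+# roughly TIMEOUT_SECONDS retries (command runtime is not counted).
+wait_until() {
+  local timeout_s waited
+  timeout_s=$1
+  shift
+  waited=0
+  until "$@"; do
+    if [ "$waited" -ge "$timeout_s" ]; then
+      return 1
+    fi
+    sleep 1
+    waited=$((waited + 1))
+  done
+}
+```
+
+For example, after `server_pid=$!`, `wait_until 60 curl -fsS -o /dev/null http://127.0.0.1:18819/healthz || exit 1` gives up after about a minute, and the `EXIT` trap then stops the server.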
+
+### Whisper NPU (`:18816`)
+
+```bash
+curl -fsS http://127.0.0.1:18816/health | jq .
+# For a real transcription smoke, use a small non-private WAV fixture only.
+# Verify both response npu_busy_delta_us and sysfs busy-time delta.
+```
+
+Operational notes:
+- Managed as Docker Compose service/container `whisper-server-npu` in `~/lab/swarm`.
+- This matches existing swarm service patterns: it is a containerized service with a Compose health check.
+- Do not restart it from this runbook unless Will asked for remediation.
+
+### OpenVINO embeddings (`:18817`)
+
+```bash
+systemctl --user status openvino-embeddings.service --no-pager
+curl -fsS http://127.0.0.1:18817/healthz | jq .
+```
+
+Operational notes:
+- User systemd unit: `openvino-embeddings.service`.
+- Model: `bge-base-en-v1.5-int8-ov`.
+- Model directory: `/home/will/.cache/openvino-models/bge-base-en-v1.5-int8-ov`.
+- Live RAG `:18810` uses Chroma collection `obsidian_bge_npu` through this service. Do not reindex or replace this collection in place.
+
+### Reranker prototype (`:18818`)
+
+Foreground review start only, after confirming the port is free:
+
+```bash
+ss -ltnp | grep ':18818\b' || true
+cd ~/lab/swarm/openvino-reranker-npu
+source /home/will/.venvs/openvino-reranker/bin/activate
+OPENVINO_RERANKER_HOST=127.0.0.1 \
+OPENVINO_RERANKER_PORT=18818 \
+OPENVINO_RERANKER_DEVICE=NPU \
+OPENVINO_RERANKER_MODEL_DIR=/home/will/.cache/openvino-models/rerankers/ms-marco-MiniLM-L6-v2-int8-ov \
+python server.py
+```
+
+From another shell:
+
+```bash
+curl -fsS http://127.0.0.1:18818/readyz | jq .
+python ~/lab/swarm/openvino-reranker-npu/smoke.py --url http://127.0.0.1:18818
+```
+
+Approval gate:
+- May be installed as `openvino-reranker.service` only after foreground smoke and Will approval.
+- May be integrated into RAG only behind disabled-by-default knobs such as `RAG_RERANK_ENABLED=false`; request-time reranking must not mutate Chroma.
+
+### Router/classifier prototype (`:18819`)
+
+Foreground review start only, after confirming the port is free:
+
+```bash
+ss -ltnp | grep ':18819\b' || true
+cd ~/lab/swarm/openvino-classifier-npu
+/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819
+```
+
+Smoke:
+
+```bash
+curl -fsS http://127.0.0.1:18819/healthz | jq .
+curl -fsS http://127.0.0.1:18819/v1/classify \
+ -H 'Content-Type: application/json' \
+ -d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true,"dry_run":true}}' | jq .
+```
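+
+The smoke above prints the response but does not fail when the NPU counters are missing or zero. A minimal sketch that asserts the two fields named in the service map with `jq -e`; `require_positive` is a hypothetical helper:
+
+```bash
+# Hypothetical helper: require_positive FIELD [FIELD...]
+# Reads one JSON object on stdin and succeeds only if every named top-level
+# field is a number greater than zero. Missing or null fields fail.
+require_positive() {
+  local json field
+  json=$(cat)
+  for field in "$@"; do
+    if ! printf '%s\n' "$json" |
+      jq -e --arg f "$field" '(.[$f] | type) == "number" and .[$f] > 0' >/dev/null; then
+      echo "field $field is missing or not positive" >&2
+      return 1
+    fi
+  done
+}
+```
+
+For the classifier, replacing the final `| jq .` with `| require_positive npu_busy_delta_us sysfs_npu_busy_delta_us` turns the smoke into a pass/fail check. The GenAI worker's generation response can be checked the same way with `npu_busy_delta_us` alone.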
+
+Approval gate:
+- May be installed as `openvino-router-classifier.service` only after Will approves live service enablement.
+- Must remain dry-run and must not alter Hermes/Atlas routing, memory writes, safety confirmation flow, or outbound messages without a separate explicit approval.
+
+### Small GenAI NPU worker (`:18820`)
+
+Foreground review start only, after confirming the port is free:
+
+```bash
+ss -ltnp | grep ':18820\b' || true
+cd ~/lab/swarm/openvino-genai-npu-worker
+/home/will/.venvs/npu/bin/python worker.py --host 127.0.0.1 --port 18820
+```
+
+Smoke:
+
+```bash
+curl -fsS http://127.0.0.1:18820/healthz | jq .
+curl -fsS http://127.0.0.1:18820/models | jq .
+curl -fsS http://127.0.0.1:18820/v1/worker/condense-notification \
+ -H 'Content-Type: application/json' \
+ -d '{"input":"Non-private smoke notification for local NPU worker.","max_new_tokens":64}' | jq .
+```
+
+Approval gate:
+- May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
+- Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.
+
+### Document/image triage prototype (`:18829` optional review port)
+
+Foreground review start only, after confirming the port is free:
+
+```bash
+ss -ltnp | grep ':18829\b' || true
+cd ~/lab/swarm/openvino-doc-image-triage-npu
+/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
+```
+
+Smoke:
+
+```bash
+curl -fsS http://127.0.0.1:18829/healthz | jq .
+curl -fsS http://127.0.0.1:18829/models | jq .
+/home/will/.venvs/npu/bin/python tests/smoke_test.py
+```
+
+Approval gate:
+- Do not point it at arbitrary directories; allowed roots must be equal to or under configured roots.
+- Do not include raw OCR text or full source paths unless Will explicitly asks for a one-off response.
+- v1 uses the NPU only through `:18817` embeddings for the needs-attention check; image category classification and OCR are CPU/local fallbacks.
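+
+The allowed-root rule can be checked before a path is handed to the prototype. A minimal sketch with a hypothetical `is_under_root` helper, not the prototype's own validation; it resolves symlinks with `pwd -P`, so a link pointing outside the root is rejected, and it only handles existing directories:
+
+```bash
+# Hypothetical helper: is_under_root ROOT_DIR CANDIDATE_DIR
+# Succeeds if CANDIDATE_DIR, after resolving symlinks, equals ROOT_DIR or
+# sits beneath it. Both arguments must be existing directories.
+is_under_root() {
+  local root cand
+  root=$(CDPATH= cd "$1" 2>/dev/null && pwd -P) || return 1
+  cand=$(CDPATH= cd "$2" 2>/dev/null && pwd -P) || return 1
+  [ "$root" = / ] && return 0
+  # Append "/" so a sibling such as /data/notes-old does not match /data/notes.
+  case "$cand/" in
+    "$root"/*) return 0 ;;
+    *) return 1 ;;
+  esac
+}
+```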
+
+## Systemd and Compose recommendations
+
+Recommended management split:
+- Keep containerized services in Docker Compose when they already have Docker build/runtime shape and Compose health (`whisper-server-npu`).
+- Keep host-side OpenVINO Python prototypes as user systemd services when they depend on local venvs, sysfs NPU access, model caches, and localhost-only APIs (`openvino-embeddings`, optional reranker/classifier/GenAI worker).
+- Do not add the prototypes to the live gateway or primary routing during installation. Installation and routing are separate approval gates.
+
+User-systemd unit expectations for optional prototypes:
+- `WorkingDirectory` points at the service directory under `~/lab/swarm/`.
+- `ExecStart` uses the existing venv path documented by the prototype.
+- `Environment` pins the host to `127.0.0.1` and sets the port, model path, device `NPU`, and any upstream endpoint.
+- `Restart=on-failure`, not aggressive restart loops.
+- Logs go to user journal; do not log raw request bodies.
+- Start manually for smoke; enable on boot only after Will approval.
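+
+The expectations above map onto a unit file roughly as follows. A minimal sketch for the reranker, assuming the venv, model directory, and `OPENVINO_RERANKER_*` variables from its foreground start command, with `/home/will` written as the systemd `%h` specifier; `write_reranker_unit_draft` is a hypothetical helper that writes a draft for review only and does not install, enable, or start anything:
+
+```bash
+# Hypothetical helper: write_reranker_unit_draft OUTPUT_PATH
+# Writes a review-only draft of openvino-reranker.service. Nothing is copied
+# into ~/.config/systemd/user, enabled, or started.
+write_reranker_unit_draft() {
+  cat > "$1" <<'EOF'
+[Unit]
+Description=OpenVINO NPU reranker prototype (localhost only)
+
+[Service]
+Type=simple
+# %h expands to the user's home directory.
+WorkingDirectory=%h/lab/swarm/openvino-reranker-npu
+Environment=OPENVINO_RERANKER_HOST=127.0.0.1
+Environment=OPENVINO_RERANKER_PORT=18818
+Environment=OPENVINO_RERANKER_DEVICE=NPU
+Environment=OPENVINO_RERANKER_MODEL_DIR=%h/.cache/openvino-models/rerankers/ms-marco-MiniLM-L6-v2-int8-ov
+ExecStart=%h/.venvs/openvino-reranker/bin/python server.py
+# Restart only on failure, with a pause to avoid tight restart loops.
+Restart=on-failure
+RestartSec=30
+
+[Install]
+WantedBy=default.target
+EOF
+}
+```
+
+Writing the draft to a scratch path such as `/tmp/openvino-reranker.service` allows a review with `systemd-analyze --user verify /tmp/openvino-reranker.service`. Copying it into `~/.config/systemd/user/`, enabling it, or starting it stays behind the approval gates below.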
+
+Compose expectations for existing swarm services:
+- Prefer `cd ~/lab/swarm && make ps`, `make status`, and targeted `docker compose ps <service>` for read-only checks.
+- Do not run `docker compose up -d`, restart containers, pull images, or prune volumes from this runbook without approval.
+
+## Monitoring and logging notes
+
+Minimum recurring monitoring should include:
+- Listener presence for `18816`, `18817`, and any approved optional prototype ports.
+- User service state for `openvino-embeddings.service` and any approved optional prototype unit.
+- Docker Compose health for `whisper-server-npu`.
+- HTTP health endpoint success.
+- Positive sysfs NPU busy-time delta on at least one non-private inference probe, preferably embeddings `:18817` because it is already live and central.
+- Journal/container logs only at summary level. Avoid raw prompts, raw OCR text, private document names, credentials, and API keys.
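+
+One way to keep probe output at summary level is to discard response bodies and log only a name and a status. A minimal sketch; `summarize_check` and its `check=... status=...` line format are hypothetical, not an existing agentmon or n8n convention:
+
+```bash
+# Hypothetical helper: summarize_check NAME COMMAND [ARGS...]
+# Runs COMMAND with stdout and stderr discarded, so response bodies, prompts
+# and OCR text never reach logs, and prints a single key=value status line.
+summarize_check() {
+  local name status
+  name=$1
+  shift
+  if "$@" >/dev/null 2>&1; then
+    status=ok
+  else
+    status=fail
+  fi
+  printf 'check=%s status=%s\n' "$name" "$status"
+}
+```
+
+For example, `summarize_check embeddings_healthz curl -fsS http://127.0.0.1:18817/healthz` prints `check=embeddings_healthz status=ok` when the endpoint answers and `status=fail` otherwise.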
+
+Useful log commands:
+
+```bash
+journalctl --user -u openvino-embeddings.service -n 100 --no-pager
+journalctl --user -u rag-embedding-health.service -n 100 --no-pager
+journalctl --user -u openvino-reranker.service -n 100 --no-pager
+journalctl --user -u openvino-router-classifier.service -n 100 --no-pager
+journalctl --user -u openvino-genai-npu-worker.service -n 100 --no-pager
+cd ~/lab/swarm && docker compose logs --tail 100 whisper-server-npu
+```
+
+## Approval gates
+
+Requires explicit Will approval before proceeding:
+- Installing, enabling, or autostarting `openvino-reranker.service`, `openvino-router-classifier.service`, or `openvino-genai-npu-worker.service`.
+- Assigning a final persistent port to document/image triage or enabling it as a persistent service.
+- Enabling live RAG reranking or any request path that changes Atlas/RAG answers.
+- Changing primary Atlas/Hermes routing or connecting router/classifier outputs to live decisions.
+- Connecting the GenAI worker to primary Atlas chat, gateway routing, memory writes, or outbound notifications.
+- Restarting the live Atlas/Hermes gateway.
+- Deleting, overwriting, or in-place reindexing existing vector collections.
+- Broadening bind addresses or exposure beyond local-only defaults.
+
+Approved/parked outcomes:
+- Built/approved prototypes: reranker (`:18818`), router/classifier (`:18819`), small GenAI worker (`:18820`), document/image triage (optional review port `:18829`; `:18828` was an earlier smoke alternate).
+- Live baseline retained: Whisper NPU (`:18816`), OpenVINO embeddings (`:18817`), RAG endpoint (`:18810`) using `obsidian_bge_npu`.
+- Parked: always-on wake-word/audio and conventional vision detection until Will wants a concrete use case.
+- Rejected for this NPU program: diffusion/image generation.