docs(npu): update service maps and runbooks

William Valentin
2026-06-04 13:08:18 -07:00
parent 703c1df860
commit 420df812c0
7 changed files with 1151 additions and 0 deletions
@@ -0,0 +1,309 @@
---
type: service-catalog
created: 2026-05-14T14:50:46-07:00
updated: 2026-06-04T11:35:00-07:00
tags:
- service-catalog
- swarm
- hermes
- automation
---
# Service Catalog
Canonical index of local services, automation tools, Hermes capabilities, and where to find their operational docs.
> Generated by Atlas from live system inventory on `2026-05-14T14:50:46-07:00`; high-risk local AI/service rows refreshed on `2026-05-27T12:12:06-07:00`; Obsidian/RAG embedding path refreshed on `2026-06-03T21:31:01-07:00`. Secrets are intentionally omitted.
## Quick links
- [[Ops Home]]
- [[Obsidian Automation Health]]
- [[Obsidian Plugin Setup]]
- [[Runbooks Home]]
- [[Projects Home]]
- [[Decisions Home]]
## Primary repositories and config locations
| Area | Path / command | Purpose |
| --- | --- | --- |
| Swarm repo | `~/lab/swarm` | Docker services, n8n, local AI, OpenClaw helpers, service scripts |
| Swarm Makefile | `cd ~/lab/swarm && make help` | Authoritative operations target list |
| n8n workflow exports | `~/lab/swarm/swarm-common/n8n-workflows/` | Versioned workflow backups |
| Shared Obsidian vault | `~/lab/swarm/swarm-common/obsidian-vault/will/will-shared-zap` | Active API-backed vault |
| Hermes config | `~/.hermes/config.yaml` | Atlas/Hermes model, tools, gateway, profiles |
| Hermes env/secrets | `~/.hermes/.env` | Secrets; do not print or commit |
| Hermes source | `~/.hermes/hermes-agent` | Atlas local source checkout |
| Hermes skills | `~/.hermes/skills/` | Procedural docs and reusable playbooks |
## Local endpoints
| Service | Port | Status | Purpose | Health / base URL |
| --- | --- | --- | --- | --- |
| Brave Search MCP | 18802 | HTTP 406 on plain GET `/mcp` | Brave Search MCP server for Hermes MCP tools | `http://127.0.0.1:18802/mcp` |
| SearXNG | 18803 | OK 200 | SearXNG metasearch | `http://127.0.0.1:18803/search?q=test&format=json` |
| LiteLLM | 18804 | no listener / HTTP 000 on 2026-05-27 | LiteLLM OpenAI-compatible model proxy | `http://127.0.0.1:18804/health/liveliness` |
| Kokoro TTS | 18805 | OK 200 | Kokoro local TTS | `http://127.0.0.1:18805/health` |
| llama.cpp | 18806 | OK 200 | llama.cpp local LLM | `http://127.0.0.1:18806/v1/models` |
| Ollama embeddings | 18807 | OK 200 | Ollama embeddings API | `http://127.0.0.1:18807/api/version` |
| n8n | 18808 | OK 200 | n8n workflow automation | `http://127.0.0.1:18808/healthz` |
| Docker health | 18809 | OK 200 | Docker/container health API | `http://127.0.0.1:18809/health` |
| Obsidian reindex | 18810 | OK 200 | Obsidian/RAG reindex trigger | `http://127.0.0.1:18810/healthz` |
| Whisper CPU | 18811 | OK 200 | Whisper.cpp CPU STT fallback | `http://127.0.0.1:18811/` |
| URL extractor | 18812 | OK 200 | URL/PDF/YouTube content extractor | `http://127.0.0.1:18812/healthz` |
| Voice memo processor | 18813 | OK 200 | Voice memo processor | `http://127.0.0.1:18813/healthz` |
| RAG/embedding health | 18814 | OK 200 | RAG/OpenVINO/Obsidian health wrapper | `http://127.0.0.1:18814/healthz` |
| Whisper OpenVINO NPU | 18816 | OK 200 / Docker healthy on 2026-06-04 | Intel NPU Whisper transcription service | `http://127.0.0.1:18816/health` |
| OpenVINO embeddings | 18817 | OK 200 | Intel NPU embeddings service for live Obsidian RAG | `http://127.0.0.1:18817/healthz` |
| OpenVINO NPU reranker prototype | 18818 | approved prototype; not enabled live | Optional second-stage RAG reranker | `http://127.0.0.1:18818/readyz` |
| OpenVINO router/classifier prototype | 18819 | approved prototype; not enabled live | Dry-run Atlas/Hermes message classifier/router | `http://127.0.0.1:18819/healthz` |
| OpenVINO GenAI NPU worker prototype | 18820 | approved prototype; not enabled live | Bounded local background generation worker | `http://127.0.0.1:18820/healthz` |
| OpenVINO document/image triage prototype | 18828/18829 | approved foreground prototype; not enabled live | Local document/image triage with NPU embeddings stage via `:18817` | `http://127.0.0.1:<port>/healthz` |
| Obsidian REST HTTP | 27123 | OK 200 | Obsidian Local REST API HTTP | `http://127.0.0.1:27123/` |
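
The Status column above can be re-derived with a small probe loop. The following is a minimal sketch, not the generator Atlas uses: `classify_http_status` and `probe_endpoint` are hypothetical helper names, and the 3-second timeout is an assumption. curl reports `000` when it cannot connect, which matches the LiteLLM row.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the 3-second timeout are assumptions.

# Map a curl %{http_code} value to a catalog-style status string.
classify_http_status() {
  case "$1" in
    000) echo "no listener / HTTP 000" ;;
    2??) echo "OK $1" ;;
    *)   echo "HTTP $1" ;;
  esac
}

# Probe one URL with a plain GET and print "<url><TAB><status>".
probe_endpoint() {
  local url=$1 code
  # curl writes 000 when it cannot connect; ignore its non-zero exit status.
  code=$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' "$url" || true)
  printf '%s\t%s\n' "$url" "$(classify_http_status "${code:-000}")"
}

# Example (uncomment to run against the local endpoints listed above):
# probe_endpoint 'http://127.0.0.1:18803/search?q=test&format=json'
# probe_endpoint 'http://127.0.0.1:18804/health/liveliness'
```

A non-2xx code is not always a fault: the Brave Search MCP row answers a plain GET with 406 by design, so treat the output as a prompt to check the row rather than as a verdict.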
## Docker services
| Container | Status | Ports |
| --- | --- | --- |
| n8n-agent | Up 21 hours (healthy) | 0.0.0.0:18808->5678/tcp, [::]:18808->5678/tcp |
| whisper-server-gpu | Up 27 hours (healthy) | 0.0.0.0:18801->8080/tcp, [::]:18801->8080/tcp |
| whisper-server | Up 27 hours (healthy) | 0.0.0.0:18811->8080/tcp, [::]:18811->8080/tcp |
| kokoro-tts | Up 25 hours | 0.0.0.0:18805->8880/tcp, [::]:18805->8880/tcp |
| brave-search | Up 25 hours | 0.0.0.0:18802->8000/tcp, [::]:18802->8000/tcp |
| searxng | Up 25 hours | 0.0.0.0:18803->8080/tcp, [::]:18803->8080/tcp |
Management commands:
```bash
cd ~/lab/swarm
make ps
make status
make local-ai-health
make api-health
make timers
./scripts/npu-service-health.sh
```
## Host-side systemd/user services
Important known services:
| Unit | Purpose |
| --- | --- |
| `llama-server.service` | Host-side llama.cpp local LLM on 18806 |
| `ollama.service` | Host-side Ollama embeddings on 18807 |
| `docker-health-endpoint.service` | Container health API on 18809 |
| `obsidian-reindex-endpoint.service` | Obsidian/RAG reindex endpoint on 18810 |
| `url-content-extractor.service` | URL/PDF/YouTube extraction on 18812 |
| `voice-memo-processor.service` | Voice memo processing on 18813 |
| `rag-embedding-health.service` | RAG/OpenVINO/Obsidian health check wrapper on 18814 |
| `openvino-embeddings.service` | Intel NPU BGE embedding service on 18817 |
| `openvino-reranker.service` | Optional NPU reranker prototype on 18818; not installed/enabled without approval |
| `openvino-router-classifier.service` | Optional dry-run router/classifier prototype on 18819; not installed/enabled without approval |
| `openvino-genai-npu-worker.service` | Optional bounded GenAI worker prototype on 18820; not installed/enabled without approval |
Useful checks:
```bash
systemctl --user list-units '*obsidian*' '*rag*' '*url-content*' '*voice-memo*' '*docker-health*' --all
systemctl --user list-timers
journalctl --user -u obsidian-reindex-endpoint.service -n 50 --no-pager
```
## n8n workflows
n8n UI/API: `http://127.0.0.1:18808`
| Workflow | ID | State |
| --- | --- | --- |
| Calendar to Obsidian Notes | QRCCdHNXZUHc2Oz4 | inactive |
| Daily OpenClaw Session Digest | qqYwAD05AvRHrHPc | inactive |
| Evening Digest | PlZywwqL8MRNEAN6 | active |
| Gmail Inbox Monitor + Obsidian Notes | whtdorf7yJMVYeHm | active |
| IMAP Inbox Triage + Obsidian Notes | 9sFwRyUDz51csAp7 | active |
| IMAP Inbox Triage + Obsidian Notes (squareffect) | xjUoQf97TkBrawc8 | inactive |
| IMAP Inbox Triage + Obsidian Notes (wills-portal) | kHDK9QdUSiAJ8rCM | inactive |
| Morning Brief | g3IdGZCK1EtTsv9T | active |
| n8n Failure Digest | G9ylNbHbnJ6fWX2C | active |
| Nightly Obsidian Vault Sync | 75JCevkdgkyCr2qH | inactive |
| Obsidian Chat Summary Capture | LF3i86l3NkxpayxL | active |
| Obsidian Daily Review | YZyJ5G0Ur8D6TlM8 | active |
| Obsidian Health + Reindex | PCtD3PuQjzKLyEEE | active |
| Obsidian Inbox Triage | 6SKSZWZwuJNwuO2P | active |
| Obsidian URL to Note | Ori3Bu5u5ODtxxyD | active |
| Obsidian Vault Reindex | 85ntyyphDJ4Ms2b4 | active |
| Obsidian Weekly Decision Runbook Extractor | UWLMOQQVxbTX6Sis | active |
| OpenClaw Action Bus | Jwi54VWMdlLqYnRo | inactive |
| OpenClaw Reminder Webhook | RUR1CGn0ikkxbPin | inactive |
| RAG and Embedding Health Watchdog | SwKaPtYqUJrakpFu | active |
| Swarm Health Watchdog | lDKocSFXBQWQrDd3 | active |
| Voice Memo Capture (Audio URL + Local Whisper) | El1BHJZ56JlzhrRZ | active |
| Web-to-Notes Capture (Local LLM + Obsidian) | GSmzuA5dgGgyRg5v | active |
Obsidian webhook endpoints:
| Workflow | Method / URL | Input |
| --- | --- | --- |
| Obsidian Chat Summary Capture | `POST http://127.0.0.1:18808/webhook/obsidian-chat-summary` | JSON with `type`, `title`, `summary`, `content`, optional `tags`, `metadata` |
| Obsidian URL to Note | `POST http://127.0.0.1:18808/webhook/obsidian-url-to-note` | JSON with `url`, optional `folder`, `tags`, `notes` |
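
A request to the URL-to-note webhook might look like the sketch below. The field names come from the table above; the function names, the example URL, and the `Inbox` folder value are hypothetical, and the response format is not documented here, so the sketch does not parse it. It assumes `jq` and `curl` are installed.

```bash
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; field names come from the table above.

# Build the JSON body with jq so quotes in the URL or folder are escaped safely.
# Usage: build_url_note_payload URL [FOLDER]
build_url_note_payload() {
  local url=$1 folder=${2:-}
  jq -cn --arg url "$url" --arg folder "$folder" \
    '{url: $url} + (if $folder == "" then {} else {folder: $folder} end)'
}

# POST the payload to the Obsidian URL to Note webhook.
post_url_note() {
  build_url_note_payload "$@" |
    curl -fsS -X POST http://127.0.0.1:18808/webhook/obsidian-url-to-note \
      -H 'Content-Type: application/json' \
      --data-binary @-
}

# Example (sends a real request to n8n, so it is left commented out):
# post_url_note 'https://example.com/article' 'Inbox'
```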
## Hermes capabilities
### Enabled toolsets
| Toolset | Description |
| --- | --- |
| web | 🔍 Web Search & Scraping |
| browser | 🌐 Browser Automation |
| terminal | 💻 Terminal & Processes |
| file | 📁 File Operations |
| code_execution | ⚡ Code Execution |
| vision | 👁️ Vision / Image Analysis |
| image_gen | 🎨 Image Generation |
| tts | 🔊 Text-to-Speech |
| skills | 📚 Skills |
| todo | 📋 Task Planning |
| memory | 💾 Memory |
| session_search | 🔎 Session Search |
| clarify | ❓ Clarifying Questions |
| delegation | 👥 Task Delegation |
| cronjob | ⏰ Cron Jobs |
| messaging | 📨 Cross-Platform Messaging |
### Disabled toolsets
| Toolset | Description |
| --- | --- |
| video | 🎬 Video Analysis |
| video_gen | 🎬 Video Generation |
| moa | 🧠 Mixture of Agents |
| rag_search | 🧠 RAG Search |
| rl | 🧪 RL Training |
| homeassistant | 🏠 Home Assistant |
| spotify | 🎵 Spotify |
| yuanbao | 🤖 Yuanbao |
| computer_use | 🖱️ Computer Use (macOS) |
### MCP servers
```text
MCP Servers:
Name             Transport                      Tools        Status
──────────────── ────────────────────────────── ──────────── ──────────
brave-search     http://127.0.0.1:18802/mcp     all          ✓ enabled
```
### Hermes profiles
```text
Profile         Model                       Gateway     Alias       Distribution
─────────────── ─────────────────────────── ─────────── ─────────── ────────────────────
◆default        gpt-5.5                     running     —           —
atlas           gpt-5.5                     stopped     —           —
engineer        gpt-5.5                     stopped     —           —
glm-simple      glm-5.1                     stopped     —           —
ops             gpt-5.5                     stopped     —           —
orchestrator    gpt-5.5                     stopped     —           —
researcher      gpt-5.5                     stopped     —           —
reviewer        gpt-5.5                     stopped     —           —
writer          gpt-5.5                     stopped     —           —
```
### Hermes cron jobs
```text
┌─────────────────────────────────────────────────────────────────────────┐
│ Scheduled Jobs │
└─────────────────────────────────────────────────────────────────────────┘
c515ca076b73 [active]
  Name:     Hermes config git snapshot
  Schedule: 0 3 * * *
  Repeat:   ∞
  Next run: 2026-05-15T03:00:00-07:00
  Deliver:  discord:1494453542243532932
  Script:   hermes_git_snapshot.sh
  Mode:     no-agent (script stdout delivered directly)
  Last run: 2026-05-11T03:00:37.525856-07:00 ok
c15ee395a38d [active]
  Name:     atlas-minio-self-backup
  Schedule: 0 3 * * *
  Repeat:   ∞
  Next run: 2026-05-15T03:00:00-07:00
  Deliver:  origin
  Script:   atlas-backup-to-minio-cron.sh
  Mode:     no-agent (script stdout delivered directly)
1ef682e65695 [active]
  Name:     watch pi-agent-hermes-bound kanban
  Schedule: every 2m
  Repeat:   ∞
  Next run: 2026-05-14T14:49:39.352638-07:00
  Deliver:  local
  Script:   watch_pi_agent_kanban.py
  Mode:     no-agent (script stdout delivered directly)
  Last run: 2026-05-14T14:47:39.352638-07:00 ok
```
## Local AI and automation routing
| Capability | Preferred endpoint/tool | Notes |
| --- | --- | --- |
| Web search | SearXNG `18803` or Brave MCP `18802` | Hermes web search and MCP Brave Search are both available |
| Model proxy | LiteLLM `18804` | Use for OpenAI-compatible routed models |
| Direct local LLM | llama.cpp `18806` | Current model id: `gemma-4-26B-A4B-it-UD-IQ2_M.gguf`; useful for n8n/local automation |
| Embeddings | OpenVINO NPU `18817`; Ollama `18807` fallback | Live RAG uses `bge-base-en-v1.5-int8-ov` via OpenVINO and collection `obsidian_bge_npu`; Ollama remains a legacy/CPU fallback |
| Text-to-speech | Kokoro `18805` / Hermes TTS tool | Local speech generation |
| Speech-to-text | Whisper OpenVINO NPU `18816`; Whisper CPU `18811` fallback | NPU service is the live default; CPU remains fallback |
| Workflow automation | n8n `18808` | Durable jobs and webhooks |
| Knowledge store | Obsidian REST `27123`; RAG/Chroma local store | Obsidian notes plus Hermes rag-search index |
## Obsidian integration
| Component | Location / endpoint | Purpose |
| --- | --- | --- |
| Local REST API | `http://127.0.0.1:27123` and `https://127.0.0.1:27124` | Read/write notes and execute commands |
| Autostart entry | `~/.config/autostart/obsidian-autostart.desktop` | Launches Obsidian at graphical login |
| Autostart script | `~/.local/bin/start-obsidian-if-needed` | Idempotent launcher for Obsidian |
| Reindex endpoint | `http://127.0.0.1:18810/reindex` | Rebuilds/updates local Obsidian/RAG index |
| Dataview plugin | Vault `.obsidian/plugins/dataview` | Dashboard tables |
| Tasks plugin | Vault `.obsidian/plugins/obsidian-tasks-plugin` | Dashboard task queries |
## Source-of-truth docs
| Topic | Where |
| --- | --- |
| Swarm operations | Hermes skill `swarm`; `~/lab/swarm/Makefile` |
| n8n API/workflow management | Hermes skill `swarm`, reference `n8n-api-and-workflows.md` |
| Obsidian filesystem/API usage | Hermes skill `obsidian` |
| Hermes CLI/toolsets/gateway/profiles | Hermes skill `hermes-agent`; `hermes --help`; `hermes tools list` |
| Obsidian automation workflows | `~/lab/swarm/swarm-common/n8n-workflows/obsidian-*.json` |
| Runbooks | [[Runbooks Home]] |
| OpenVINO NPU service operations | [[OpenVINO NPU Services Runbook]]; `~/lab/swarm/scripts/npu-service-health.sh` |
## Safety notes
- Do not print `.env`, API keys, tokens, auth JSON, or decrypted n8n credentials.
- From inside the `n8n-agent` container, host services are reached via `http://172.19.0.1:<port>`, not `127.0.0.1:<port>`.
- Use file-based workflow updates for large n8n JSON payloads.
- After structural n8n workflow edits, deactivate/reactivate the workflow.
- Prefer `make` targets in `~/lab/swarm` for routine service operations.
- OpenVINO NPU prototype sidecars `:18818`, `:18819`, `:18820`, and optional `:18829` are approved prototypes only; do not enable persistent services, live Atlas/Hermes/RAG routing, vector DB mutation, or private document/image processing without explicit approval. Verify NPU usage with `/sys/class/accel/accel0/device/npu_busy_time_us`; HTTP 200 alone is not proof.
- Check git status before committing; commit only targeted non-secret source/config/docs.
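
When copying a host URL from this catalog into an n8n node, the loopback rule above can be applied mechanically. A minimal sketch: `to_container_url` is a hypothetical helper, and `172.19.0.1` is the address given in the note above, which may differ if the Docker network is recreated.

```bash
#!/usr/bin/env bash
# Sketch only: the function name is hypothetical; 172.19.0.1 is the address
# given in the safety note above.

# Rewrite a loopback URL to the address that reaches the host from n8n-agent.
to_container_url() {
  local url=$1
  url=${url/#http:\/\/127.0.0.1:/http://172.19.0.1:}
  url=${url/#http:\/\/localhost:/http://172.19.0.1:}
  printf '%s\n' "$url"
}

# Example:
# to_container_url http://127.0.0.1:18810/reindex
```

URLs that do not start with a loopback host are printed unchanged.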
## Refresh procedure
To refresh this catalog:
```bash
cd ~/lab/swarm
make status
hermes tools list
hermes mcp list
# Ask Atlas: "refresh the Obsidian Service Catalog"
```
@@ -0,0 +1,286 @@
---
type: runbook
system: openvino-npu-services
status: draft
created: 2026-06-04
updated: 2026-06-04
tags:
- runbook
- openvino
- npu
- swarm
- atlas
related:
- [[Service Catalog]]
- [[Swarm Operating Manual]]
- [[Atlas Capability Upgrade Program]]
---
# OpenVINO NPU Services Runbook
This runbook is the integrated operations view for Will's local Intel NPU/OpenVINO services from the `npu-capability-expansion` board.
Safety posture:
- Do not restart the live Atlas/Hermes gateway from this runbook.
- Do not change primary Atlas/Hermes routing without Will's explicit approval.
- Do not delete, overwrite, or in-place reindex existing Chroma/vector collections.
- Treat HTTP 200 as necessary but not sufficient for NPU-backed services; verify `/sys/class/accel/accel0/device/npu_busy_time_us` before/after an inference.
- Keep endpoints local-only unless Will explicitly approves broader exposure.
- Keep raw prompts, private documents, OCR text, and secrets out of logs and durable handoffs.
## Current service map
| Capability | Port | Runtime / service | Path | State | Health endpoint | NPU proof |
| --- | ---: | --- | --- | --- | --- | --- |
| Obsidian/RAG endpoint | 18810 | `obsidian-reindex-endpoint.service` / local Python endpoint | `~/lab/swarm/scripts/` | live baseline; uses collection `obsidian_bge_npu` | `http://127.0.0.1:18810/healthz` | indirect via embeddings `:18817`; do not mutate existing collection |
| RAG/embedding health wrapper | 18814 | `rag-embedding-health.service` | `~/lab/swarm/swarm-common/rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | should exercise embeddings path when configured |
| Whisper transcription, OpenVINO NPU | 18816 | Docker Compose service/container `whisper-server-npu` | `~/lab/swarm/whisper-openvino-npu/` | live baseline | `http://127.0.0.1:18816/health` | transcription response includes `npu_busy_delta_us`; sysfs busy time must increase |
| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz` | embedding response `npu_busy_delta_us` and sysfs delta must both be positive |
| NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
| NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
| Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
| Document/image triage prototype | optional 18829 for review only; 18828 was an earlier smoke alternate | CLI-first; foreground local-only server if needed; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
Port notes:
- `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
- `18820` is reserved for the GenAI worker prototype. Use optional `18829` for document/image triage foreground review until Will approves a final persistent port. `18828` was used in earlier review smoke only and should not be treated as the preferred documented port.
- Existing `:18817` is currently bound on `0.0.0.0` by the user service; prototype services should still default to `127.0.0.1`.
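
The "check listeners before binding" step can be scripted without touching any service. The sketch below parses `ss -Hltn` output from stdin; `port_in_use` is a hypothetical helper name, and the column layout assumed is the one iproute2 `ss` prints for TCP listeners (state first, local address fourth).

```bash
#!/usr/bin/env bash
# Sketch only: the function name is hypothetical.

# Read `ss -Hltn` output on stdin; succeed if any listener is bound to PORT.
port_in_use() {
  local port=$1
  awk -v suffix=":${port}" '
    $1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Example: report which reserved prototype ports already have a listener.
# for port in 18818 18819 18820 18829; do
#   if ss -Hltn | port_in_use "$port"; then echo "$port: in use"; else echo "$port: free"; fi
# done
```

Matching on the `:PORT` suffix covers IPv4, IPv6, and wildcard binds alike, so a listener on `0.0.0.0` or `[::]` is reported the same way as one on `127.0.0.1`.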
## Read-only unified health check
From the swarm repo:
```bash
cd ~/lab/swarm
./scripts/npu-service-health.sh
```
The script is read-only. It checks:
- Listeners on `18810`, `18816`, `18817`, `18818`, `18819`, `18820`, and `18829`, plus the existing `18814` wrapper and the `18828` review alternate.
- User service state.
- Docker Compose state for `whisper-server-npu`.
- JSON health endpoints.

It also sends a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof.
Manual minimal checks:
```bash
BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
cat "$BUSY"
ss -ltnp | grep -E ':(18810|18816|18817|18818|18819|18820|18829)\b' || true
systemctl --user is-active openvino-embeddings.service rag-embedding-health.service
cd ~/lab/swarm && docker compose ps whisper-server-npu
curl -fsS http://127.0.0.1:18817/healthz | jq .
```
Embedding NPU proof:
```bash
BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
before=$(cat "$BUSY")
curl -fsS http://127.0.0.1:18817/v1/embeddings \
  -H 'Content-Type: application/json' \
  -d '{"input":"non-private npu health probe","model":"bge-base-en-v1.5-int8-ov"}' |
  jq '{model, object, npu_busy_delta_us, embedding_count:(.data|length)}'
after=$(cat "$BUSY")
echo "sysfs_npu_busy_delta_us=$((after-before))"
```
A healthy NPU path has:
- HTTP success from the endpoint.
- Response-level `npu_busy_delta_us > 0` when the service reports it.
- Sysfs `after - before > 0`.
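
The sysfs criterion in the list above can be wrapped around any non-private probe command. This is a minimal sketch under stated assumptions: `npu_busy_delta` and the `NPU_BUSY_FILE` override are hypothetical names introduced here, not part of `npu-service-health.sh`.

```bash
#!/usr/bin/env bash
# Sketch only: the function name and NPU_BUSY_FILE override are hypothetical.
# The counter path defaults to the sysfs file named in this runbook.

# Run a probe command and print the busy-time delta; fail unless it is positive.
# Usage: npu_busy_delta COMMAND [ARGS...]
npu_busy_delta() {
  local counter=${NPU_BUSY_FILE:-/sys/class/accel/accel0/device/npu_busy_time_us}
  local before after delta
  before=$(<"$counter") || return 2
  "$@" >/dev/null || return 3          # the probe itself must succeed
  after=$(<"$counter") || return 2
  delta=$((after - before))
  echo "sysfs_npu_busy_delta_us=${delta}"
  (( delta > 0 ))
}

# Example:
# npu_busy_delta curl -fsS http://127.0.0.1:18817/v1/embeddings \
#   -H 'Content-Type: application/json' \
#   -d '{"input":"non-private npu health probe","model":"bge-base-en-v1.5-int8-ov"}'
```

The counter is shared by everything using the NPU, so a positive delta on a busy host shows only that the NPU did work during the probe, not which service caused it. Run the proof while other NPU jobs are idle when attribution matters.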
## Service-specific smoke checks
For any foreground prototype server below, run it in a terminal you control or capture its PID and stop it at the end of the smoke. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning for these review smokes unless Will explicitly approved persistent service enablement.
Safe foreground-server pattern:
```bash
server_pid=""
cleanup() {
  if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then
    kill "$server_pid"
    wait "$server_pid" 2>/dev/null || true
  fi
}
trap cleanup EXIT
# start prototype server with --host 127.0.0.1 --port <port> &
# server_pid=$!
# run curl/smoke commands, then let trap stop it
```
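
A backgrounded prototype usually needs a moment before its health endpoint answers, so smokes that run immediately can fail spuriously. The helper below is a sketch of a bounded wait to place between starting the server and running the smoke; `wait_until`, the attempt count, and the delay are assumptions, not part of any prototype.

```bash
#!/usr/bin/env bash
# Sketch only: the function name, attempt count, and delay are assumptions.

# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}

# Example: wait up to about 10 s for a foreground prototype before running smokes.
# wait_until 20 0.5 curl -fsS http://127.0.0.1:18819/healthz
```

Because the wait is bounded, a server that never becomes ready makes the smoke fail and the `EXIT` trap still stops it.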
### Whisper NPU (`:18816`)
```bash
curl -fsS http://127.0.0.1:18816/health | jq .
# For a real transcription smoke, use a small non-private WAV fixture only.
# Verify both response npu_busy_delta_us and sysfs busy-time delta.
```
Operational notes:
- Managed as Docker Compose service/container `whisper-server-npu` in `~/lab/swarm`.
- Follows existing swarm service patterns: it is a containerized service with a Compose health check.
- Do not restart it from this runbook unless Will asked for remediation.
### OpenVINO embeddings (`:18817`)
```bash
systemctl --user status openvino-embeddings.service --no-pager
curl -fsS http://127.0.0.1:18817/healthz | jq .
```
Operational notes:
- User systemd unit: `openvino-embeddings.service`.
- Model: `bge-base-en-v1.5-int8-ov`.
- Model directory: `/home/will/.cache/openvino-models/bge-base-en-v1.5-int8-ov`.
- Live RAG `:18810` uses Chroma collection `obsidian_bge_npu` through this service. Do not reindex or replace this collection in place.
### Reranker prototype (`:18818`)
Foreground review start only, after confirming port is free:
```bash
ss -ltnp | grep ':18818\b' || true
cd ~/lab/swarm/openvino-reranker-npu
source /home/will/.venvs/openvino-reranker/bin/activate
OPENVINO_RERANKER_HOST=127.0.0.1 \
OPENVINO_RERANKER_PORT=18818 \
OPENVINO_RERANKER_DEVICE=NPU \
OPENVINO_RERANKER_MODEL_DIR=/home/will/.cache/openvino-models/rerankers/ms-marco-MiniLM-L6-v2-int8-ov \
python server.py
```
From another shell:
```bash
curl -fsS http://127.0.0.1:18818/readyz | jq .
python ~/lab/swarm/openvino-reranker-npu/smoke.py --url http://127.0.0.1:18818
```
Approval gate:
- May be installed as `openvino-reranker.service` only after a foreground smoke and Will's approval.
- May be integrated into RAG only behind disabled-by-default knobs such as `RAG_RERANK_ENABLED=false`; request-time reranking must not mutate Chroma.
### Router/classifier prototype (`:18819`)
Foreground review start only, after confirming port is free:
```bash
ss -ltnp | grep ':18819\b' || true
cd ~/lab/swarm/openvino-classifier-npu
/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819
```
Smoke:
```bash
curl -fsS http://127.0.0.1:18819/healthz | jq .
curl -fsS http://127.0.0.1:18819/v1/classify \
  -H 'Content-Type: application/json' \
  -d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true,"dry_run":true}}' | jq .
```
Approval gate:
- May be installed as `openvino-router-classifier.service` only after Will approves live service enablement.
- Must remain dry-run and must not alter Hermes/Atlas routing, memory writes, safety confirmation flow, or outbound messages without a separate explicit approval.
### Small GenAI NPU worker (`:18820`)
Foreground review start only, after confirming port is free:
```bash
ss -ltnp | grep ':18820\b' || true
cd ~/lab/swarm/openvino-genai-npu-worker
/home/will/.venvs/npu/bin/python worker.py --host 127.0.0.1 --port 18820
```
Smoke:
```bash
curl -fsS http://127.0.0.1:18820/healthz | jq .
curl -fsS http://127.0.0.1:18820/models | jq .
curl -fsS http://127.0.0.1:18820/v1/worker/condense-notification \
  -H 'Content-Type: application/json' \
  -d '{"input":"Non-private smoke notification for local NPU worker.","max_new_tokens":64}' | jq .
```
Approval gate:
- May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
- Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.
### Document/image triage prototype (`:18829` optional review port)
Foreground review start only, after confirming the port is free:
```bash
ss -ltnp | grep ':18829\b' || true
cd ~/lab/swarm/openvino-doc-image-triage-npu
/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
```
Smoke:
```bash
curl -fsS http://127.0.0.1:18829/healthz | jq .
curl -fsS http://127.0.0.1:18829/models | jq .
/home/will/.venvs/npu/bin/python tests/smoke_test.py
```
Approval gate:
- Do not point it at arbitrary directories; allowed roots must be equal to or under configured roots.
- Do not include raw OCR text or full source paths unless Will explicitly asks for a one-off response.
- v1 uses the NPU only through `:18817` embeddings, for the needs-attention stage; image category classification and OCR run as CPU/local fallbacks.
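
The allowed-root rule above (a path must equal a configured root or sit beneath it) can be checked before a directory is passed to `--allowed-root`. A minimal sketch, assuming GNU `realpath`; `is_under_root` is a hypothetical helper and not part of the prototype's `server.py`.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper, assumes GNU coreutils `realpath -m`.

# Succeed if CANDIDATE resolves to ROOT itself or to a path beneath ROOT.
# Usage: is_under_root ROOT CANDIDATE
is_under_root() {
  local root candidate
  root=$(realpath -m -- "$1") || return 2
  candidate=$(realpath -m -- "$2") || return 2
  root=${root%/}    # "/" becomes "" so the prefix test below still works
  [[ "$candidate" == "${root:-/}" || "$candidate" == "$root"/* ]]
}

# Example:
# is_under_root ~/lab/swarm/openvino-doc-image-triage-npu "$PWD" || echo "refusing: outside allowed root"
```

Resolving both paths first means `..` segments and symlinks cannot escape the root, and comparing against `"$root"/*` rather than a bare prefix keeps a sibling such as `root-old` from matching `root`.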
## Systemd and Compose recommendations
Recommended management split:
- Keep containerized services in Docker Compose when they already have Docker build/runtime shape and Compose health (`whisper-server-npu`).
- Keep host-side OpenVINO Python prototypes as user systemd services when they depend on local venvs, sysfs NPU access, model caches, and localhost-only APIs (`openvino-embeddings`, optional reranker/classifier/GenAI worker).
- Do not add the prototypes to the live gateway or primary routing during installation. Installation and routing are separate approval gates.
User-systemd unit expectations for optional prototypes:
- `WorkingDirectory` points at the service directory under `~/lab/swarm/`.
- `ExecStart` uses the existing venv path documented by the prototype.
- `Environment` pins host to `127.0.0.1`, port, model path, device `NPU`, and any upstream endpoint.
- `Restart=on-failure`, not aggressive restart loops.
- Logs go to user journal; do not log raw request bodies.
- Start manually for smoke; enable on boot only after Will's approval.
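
To make the expectations above concrete, the sketch below prints a candidate reranker unit to stdout for review. It installs and enables nothing. The environment variable names, model directory, and venv come from the reranker section of this runbook; the function name, `Description` text, `RestartSec=10`, and the `[Install]` section are assumptions, and the `ExecStart` interpreter path assumes the standard venv layout.

```bash
#!/usr/bin/env bash
# Sketch only: prints a review draft; it does not write or enable any unit.
# Function name, Description, RestartSec, and [Install] are assumptions.

render_reranker_unit() {
  cat <<'EOF'
[Unit]
Description=OpenVINO NPU reranker prototype (review draft)

[Service]
WorkingDirectory=%h/lab/swarm/openvino-reranker-npu
Environment=OPENVINO_RERANKER_HOST=127.0.0.1
Environment=OPENVINO_RERANKER_PORT=18818
Environment=OPENVINO_RERANKER_DEVICE=NPU
Environment=OPENVINO_RERANKER_MODEL_DIR=/home/will/.cache/openvino-models/rerankers/ms-marco-MiniLM-L6-v2-int8-ov
ExecStart=/home/will/.venvs/openvino-reranker/bin/python server.py
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target
EOF
}

# Example: review the draft without touching ~/.config/systemd/user.
# render_reranker_unit | less
```

Printing to stdout keeps the draft on the review side of the approval gate: writing it to the user unit directory or running `systemctl --user enable` remains a separate, approved step.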
Compose expectations for existing swarm services:
- Prefer `cd ~/lab/swarm && make ps`, `make status`, and targeted `docker compose ps <service>` for read-only checks.
- Do not run `docker compose up -d`, restart containers, pull images, or prune volumes from this runbook without approval.
## Monitoring and logging notes
Minimum recurring monitoring should include:
- Listener presence for `18816`, `18817`, and any approved optional prototype ports.
- User service state for `openvino-embeddings.service` and any approved optional prototype unit.
- Docker Compose health for `whisper-server-npu`.
- HTTP health endpoint success.
- Positive sysfs NPU busy-time delta on at least one non-private inference probe, preferably embeddings `:18817` because it is already live and central.
- Journal/container logs only at summary level. Avoid raw prompts, raw OCR text, private document names, credentials, and API keys.
Useful log commands:
```bash
journalctl --user -u openvino-embeddings.service -n 100 --no-pager
journalctl --user -u rag-embedding-health.service -n 100 --no-pager
journalctl --user -u openvino-reranker.service -n 100 --no-pager
journalctl --user -u openvino-router-classifier.service -n 100 --no-pager
journalctl --user -u openvino-genai-npu-worker.service -n 100 --no-pager
cd ~/lab/swarm && docker compose logs --tail 100 whisper-server-npu
```
## Approval gates
Requires Will's explicit approval before proceeding:
- Installing, enabling, or autostarting `openvino-reranker.service`, `openvino-router-classifier.service`, or `openvino-genai-npu-worker.service`.
- Assigning a final persistent port to document/image triage or enabling it as a persistent service.
- Enabling live RAG reranking or any request path that changes Atlas/RAG answers.
- Changing primary Atlas/Hermes routing or connecting router/classifier outputs to live decisions.
- Connecting the GenAI worker to primary Atlas chat, gateway routing, memory writes, or outbound notifications.
- Restarting the live Atlas/Hermes gateway.
- Deleting, overwriting, or in-place reindexing existing vector collections.
- Broadening bind addresses or exposure beyond local-only defaults.
Approved/parked outcomes:
- Built/approved prototypes: reranker (`:18818`), router/classifier (`:18819`), small GenAI worker (`:18820`), document/image triage (review ports `:18828`/`:18829`).
- Live baseline retained: Whisper NPU (`:18816`), OpenVINO embeddings (`:18817`), RAG endpoint (`:18810`) using `obsidian_bge_npu`.
- Parked: always-on wake-word/audio and conventional vision detection until Will wants a concrete use case.
- Rejected for this NPU program: diffusion/image generation.