From a1f5b4c3a905194e3b32cf693e15a2f4e694ba28 Mon Sep 17 00:00:00 2001
From: William Valentin
Date: Thu, 4 Jun 2026 12:29:53 -0700
Subject: [PATCH] docs: update OpenVINO NPU service maps

---
 README.md                                     |  1 +
 docs/diagram-maintenance.md                   |  3 +++
 docs/swarm-infrastructure.html                | 20 ++++++++++---------
 docs/swarm-infrastructure.md                  | 14 +++++++++++++
 openvino-classifier-npu/README.md             |  5 +++--
 openvino-doc-image-triage-npu/README.md       | 18 +++++++++--------
 openvino-genai-npu-worker/README.md           |  4 ++++
 .../Resources/Service Catalog.md              |  5 +++--
 .../Runbooks/OpenVINO NPU Services Runbook.md | 16 +++++++--------
 9 files changed, 57 insertions(+), 29 deletions(-)

diff --git a/README.md b/README.md
index e9e8de4..3ff6a2d 100644
--- a/README.md
+++ b/README.md
@@ -37,6 +37,7 @@ For the current host-side AI/search/voice automation stack, n8n watchdogs, and a
 - [`docs/swarm-infrastructure.md`](docs/swarm-infrastructure.md) — operational overview and quick checks
 - [`docs/swarm-infrastructure.html`](docs/swarm-infrastructure.html) — dark SVG architecture diagram
 - [`docs/diagram-maintenance.md`](docs/diagram-maintenance.md) — diagram upkeep conventions
+- OpenVINO NPU services and prototypes are documented in `swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md` and the component READMEs under `openvino-*-npu*/`. Live baseline ports are RAG `:18810`, Whisper NPU `:18816`, and embeddings `:18817`; sidecar ports `:18818`, `:18819`, `:18820`, and optional doc/image triage `:18829` are approved prototypes only, not live Atlas/Hermes routing.
 
 ## VM: zap
 
diff --git a/docs/diagram-maintenance.md b/docs/diagram-maintenance.md
index 54f4c42..675b9cf 100644
--- a/docs/diagram-maintenance.md
+++ b/docs/diagram-maintenance.md
@@ -15,6 +15,7 @@ Update the relevant diagram in the same change set when you change any of these:
 - n8n workflow architecture
 - Hermes/Atlas routing or gateway responsibilities
 - local AI/search/voice endpoints
+- OpenVINO NPU live/prototype status, ports, or safety gates (`:18810`, `:18816`, `:18817`, `:18818`, `:18819`, `:18820`, optional `:18829`)
 - Obsidian/RAG data flow
 - OpenClaw/VM operational mode
 - ownership/source-of-truth paths for a component
@@ -27,6 +28,7 @@ Create a new focused diagram when the existing overview would become too dense.
 - agentmon internals: collectors → NATS → processor → Postgres → query/UI
 - Obsidian/RAG automation pipeline
 - local AI routing: Hermes/LiteLLM/llama.cpp/Ollama/provider boundaries
+- OpenVINO NPU assistant sidecars, with live baseline and approved/not-live prototype lanes separated
 - messaging/channel routing: Telegram/Discord/email → Hermes/n8n/alerts
 - disaster recovery / backup topology
 
@@ -37,6 +39,7 @@ Create a new focused diagram when the existing overview would become too dense.
 - Link diagrams from the nearest README or operational doc.
 - Keep labels operational: service name, port, responsibility, and data direction.
 - Avoid secrets, credential names that imply secret values, private tokens, raw webhook URLs, or sensitive sample payloads.
+- Do not imply live Atlas/Hermes/RAG routing to an OpenVINO NPU prototype unless a reviewed implementation actually enabled it; label approved prototypes as `not live` or `approval required`.
 - If a raw export or live config was used to build the diagram, commit only the sanitized diagram/docs, not the raw sensitive source.
 
 ## Verification before committing
diff --git a/docs/swarm-infrastructure.html b/docs/swarm-infrastructure.html
index 06de736..edc6862 100644
--- a/docs/swarm-infrastructure.html
+++ b/docs/swarm-infrastructure.html
@@ -27,7 +27,7 @@

 Will's Swarm Infrastructure
 Atlas/Hermes gateway + n8n automation + agentmon monitoring + local AI/search/voice services
-
+
@@ -40,7 +40,7 @@
 .edge{fill:none; stroke:#38bdf8; stroke-width:1.8; marker-end:url(#arrow); opacity:.8}.edgeG{fill:none; stroke:#34d399; stroke-width:1.8; marker-end:url(#arrowGreen); opacity:.85}.edgeO{fill:none; stroke:#fb923c; stroke-width:1.8; marker-end:url(#arrowOrange); opacity:.85}.edgeR{fill:none; stroke:#fb7185; stroke-width:1.8; stroke-dasharray:5,4; marker-end:url(#arrowRose); opacity:.85}
-
+
@@ -58,13 +58,14 @@
+
 Hermes gateway layer
 n8n + agentmon observability
-
+
 local swarm services
@@ -86,28 +87,29 @@
 Voice | Kokoro + Whisper | :18805 / :18816
 Docker services | agentmon.monitor=true | swarm/service snapshots
 OpenClaw VMs | currently dormant | openclaw.snapshot
-Obsidian / RAG | :27123/:27124 + ChromaDB
+Obsidian / RAG | RAG endpoint :18810 | Chroma obsidian_bge_npu
+NPU sidecars | approved prototypes; not live | :18818/:18819/:18820/:18829
-host local AI | llama.cpp :18806 | Ollama fallback :18807 | OpenVINO NPU embed :18817
+host local AI | llama.cpp :18806 | Ollama fallback :18807 | OpenVINO embed :18817 live | Whisper NPU :18816 live
-
+
 Legend
 Gateway/Search/Voice
 Automation/API
 Data/AI stores
 Event bus/pipeline
-Monitoring flows
+Monitoring / not-live prototype flows
 Monitoring model
 • n8n direct probes critical ports
 • agentmon aggregates Docker/OpenClaw snapshots
 • n8n polls agentmon for stale/degraded state
-Operational endpoints
-• n8n: 127.0.0.1:18808
-• agentmon query/UI: 8081 / 8082
-• local LLM/embed: 18806 / 18817
-• Ollama fallback: 18807
+Operational endpoints
+• n8n: 127.0.0.1:18808
+• agentmon query/UI: 8081 / 8082
+• live NPU: RAG 18810, Whisper 18816, embeddings 18817
+• prototypes not live-routed: 18818/18819/18820/18829
 Source paths
 • Swarm repo: ~/lab/swarm
 • Agentmon repo: ~/lab/agentmon
 • Workflows: swarm-common/n8n-workflows
-
+
diff --git a/docs/swarm-infrastructure.md b/docs/swarm-infrastructure.md
index b780285..dd47587 100644
--- a/docs/swarm-infrastructure.md
+++ b/docs/swarm-infrastructure.md
@@ -36,6 +36,7 @@ local AI/search/voice services
  +--> OpenVINO NPU embeddings :18817
  +--> Kokoro TTS :18805
  +--> Whisper NPU :18816
+ +--> approved/not-live NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage optional :18829
 ```
 
 See also:
@@ -130,6 +131,17 @@ Host/user services:
 - `voice-memo-processor.service` — `:18813`, voice memo processing
 - `rag-embedding-health.service` — `:18814`, RAG/embedding health wrapper
 
+Approved but not live-routed OpenVINO NPU sidecars:
+
+| Port | Component | State | Safety boundary |
+| ---: | --- | --- | --- |
+| `18818` | reranker | approved prototype; optional foreground/user-systemd only | request-time only; no Chroma/vector mutation; no live RAG integration unless Will approves |
+| `18819` | router/classifier | approved prototype; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
+| `18820` | bounded GenAI worker | approved prototype | background jobs only; not primary Atlas/Hermes model routing |
+| `18829` | document/image triage | CLI-first; optional localhost server | synthetic/non-private smoke data only; no private directory processing; NPU stage is embeddings via `:18817` |
+
+These sidecars must bind to `127.0.0.1` by default, must not be enabled persistently or wired into live Atlas/Hermes/RAG paths without explicit Will approval, and any NPU claim requires a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before/after inference. HTTP 200 alone is not proof.
+
 ### 5. Obsidian and RAG
 
 Vault:
@@ -201,6 +213,7 @@ From the host:
 cd /home/will/lab/swarm
 make status
 make local-ai-health
+./scripts/npu-service-health.sh # read-only; includes sysfs busy-time proof for :18817
 curl -fsS http://127.0.0.1:18808/healthz
 curl -fsS http://127.0.0.1:8081/healthz
 curl -fsS 'http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1' | jq .
@@ -234,3 +247,4 @@ jq '.[0] | {id,name,active,nodes:(.nodes|length)}' /tmp/agentmon-export.json
 - From `n8n-agent`, use `127.0.0.1:5678` for n8n itself and `172.19.0.1:` for host-published swarm services.
 - Agentmon `/healthz` only proves the web/API process is alive; pair it with snapshot freshness to prove the monitoring pipeline is flowing.
 - OpenClaw is intentionally dormant unless explicitly re-enabled; do not alert on VMs being shut off by default.
+- OpenVINO NPU sidecars on `:18818`, `:18819`, `:18820`, and optional `:18829` are prototypes/not-live unless a later approved change installs and routes them. Do not draw live Atlas/Hermes/RAG arrows to them in diagrams until that approval and implementation actually exist.
diff --git a/openvino-classifier-npu/README.md b/openvino-classifier-npu/README.md
index 1d42223..d2447ea 100644
--- a/openvino-classifier-npu/README.md
+++ b/openvino-classifier-npu/README.md
@@ -116,7 +116,8 @@ after review/approval:
 ```bash
 cp openvino-router-classifier.service ~/.config/systemd/user/openvino-router-classifier.service
 systemctl --user daemon-reload
-systemctl --user enable --now openvino-router-classifier.service
+systemctl --user start openvino-router-classifier.service
+systemctl --user status openvino-router-classifier.service --no-pager
 ```
 
-Do not enable it as part of this prototype task without explicit approval.
+Do not enable it at boot or connect it to live Atlas/Hermes routing as part of this prototype task without explicit approval. Keep classifier decisions dry-run until a separate approved routing change lands.
diff --git a/openvino-doc-image-triage-npu/README.md b/openvino-doc-image-triage-npu/README.md
index 56890db..66e076e 100644
--- a/openvino-doc-image-triage-npu/README.md
+++ b/openvino-doc-image-triage-npu/README.md
@@ -88,29 +88,31 @@ Include OCR/sidecar text in a single response only when explicitly requested:
 
 ## HTTP usage
 
-Check that port 18820 is free first:
+The prototype is CLI-first. If a foreground HTTP server is needed for review, prefer optional port `18829` so it does not collide with the GenAI worker prototype on `18820`. Check the port first:
 
 ```bash
-ss -ltnp | grep ':18820\b' || true
+ss -ltnp | grep ':18829\b' || true
 ```
 
-Start local-only server:
+Start a local-only server and stop it after the smoke:
 
 ```bash
 cd /home/will/lab/swarm/openvino-doc-image-triage-npu
-/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18820 --allowed-root "$PWD"
+/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
 ```
 
-Call it:
+Call it with synthetic/non-private fixtures only:
 
 ```bash
-curl -sS http://127.0.0.1:18820/healthz | jq
-curl -sS http://127.0.0.1:18820/models | jq
-curl -sS -X POST http://127.0.0.1:18820/triage \
+curl -sS http://127.0.0.1:18829/healthz | jq
+curl -sS http://127.0.0.1:18829/models | jq
+curl -sS -X POST http://127.0.0.1:18829/triage \
   -H 'Content-Type: application/json' \
   -d '{"path":"/home/will/lab/swarm/openvino-doc-image-triage-npu/samples/synthetic_invoice.png","options":{"allowed_roots":["/home/will/lab/swarm/openvino-doc-image-triage-npu"]}}' | jq
 ```
 
+Do not install or enable a persistent service for this prototype without explicit approval, and do not point it at private document/image directories during smoke tests.
+
 ## Smoke test
 
 ```bash
diff --git a/openvino-genai-npu-worker/README.md b/openvino-genai-npu-worker/README.md
index c7b241b..e7f85cf 100644
--- a/openvino-genai-npu-worker/README.md
+++ b/openvino-genai-npu-worker/README.md
@@ -102,6 +102,10 @@ curl -s http://127.0.0.1:18820/v1/worker/generate \
 
 Response includes `npu_busy_delta_us`; treat zero as failure even if HTTP status is 200.
 
+## Optional systemd user service
+
+A draft unit exists at `systemd/openvino-genai-npu-worker.service` for later review. Do not copy, enable, or autostart it unless Will explicitly approves persistent service enablement. Foreground smoke on `127.0.0.1:18820` plus positive sysfs NPU busy-time delta is required before any installation discussion.
+
 ## Safety boundaries
 
 - Binds only to `127.0.0.1` by default; non-local bind is refused in code.
diff --git a/swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md b/swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md
index 7824688..6024345 100644
--- a/swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md
+++ b/swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md
@@ -257,9 +257,9 @@ Profile Model Gateway Alias Distribu
 | Web search | SearXNG `18803` or Brave MCP `18802` | Hermes web search and MCP Brave Search are both available |
 | Model proxy | LiteLLM `18804` | Use for OpenAI-compatible routed models |
 | Direct local LLM | llama.cpp `18806` | Current model id: `gemma-4-26B-A4B-it-UD-IQ2_M.gguf`; useful for n8n/local automation |
-| Embeddings | Ollama `18807` | Use raw Ollama API root, not `/v1`, for `/api/embed` |
+| Embeddings | OpenVINO NPU `18817`; Ollama `18807` fallback | Live RAG uses `bge-base-en-v1.5-int8-ov` via OpenVINO and collection `obsidian_bge_npu`; Ollama remains a legacy/CPU fallback |
 | Text-to-speech | Kokoro `18805` / Hermes TTS tool | Local speech generation |
-| Speech-to-text | Whisper `18811` and wrappers | Local transcription fallback |
+| Speech-to-text | Whisper OpenVINO NPU `18816`; Whisper CPU `18811` fallback | NPU service is the live default; CPU remains fallback |
 | Workflow automation | n8n `18808` | Durable jobs and webhooks |
 | Knowledge store | Obsidian REST `27123`; RAG/Chroma local store | Obsidian notes plus Hermes rag-search index |
 
@@ -293,6 +293,7 @@ Profile Model Gateway Alias Distribu
 - Use file-based workflow updates for large n8n JSON payloads.
 - After structural n8n workflow edits, deactivate/reactivate the workflow.
 - Prefer `make` targets in `~/lab/swarm` for routine service operations.
+- OpenVINO NPU prototype sidecars `:18818`, `:18819`, `:18820`, and optional `:18829` are approved prototypes only; do not enable persistent services, live Atlas/Hermes/RAG routing, vector DB mutation, or private document/image processing without explicit approval. Verify NPU usage with `/sys/class/accel/accel0/device/npu_busy_time_us`; HTTP 200 alone is not proof.
 - Check git status before committing; commit only targeted non-secret source/config/docs.
 
 ## Refresh procedure
diff --git a/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md b/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md
index 05d3206..b707536 100644
--- a/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md
+++ b/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md
@@ -39,11 +39,11 @@ Safety posture:
 | NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
 | NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
 | Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
-| Document/image triage prototype | 18828 or 18829 for review only | foreground local-only server; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
+| Document/image triage prototype | optional 18829 for review only | CLI-first; foreground local-only server if needed; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
 
 Port notes:
 - `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
-- `18820` was used by the GenAI worker prototype. The document/image triage prototype README still contains a `18820` example, but review used `18828`/`18829` to avoid collision. Prefer `18828`/`18829` for triage foreground review until Will approves a final persistent port.
+- `18820` is reserved for the GenAI worker prototype. Use optional `18829` for document/image triage foreground review until Will approves a final persistent port. `18828` was used in earlier review smoke only and should not be treated as the preferred documented port.
 - Existing `:18817` is currently bound on `0.0.0.0` by the user service; prototype services should still default to `127.0.0.1`.
 
 ## Read-only unified health check
@@ -186,21 +186,21 @@ Approval gate:
 - May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
 - Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.
 
-### Document/image triage prototype (`:18828`/`:18829` review ports)
+### Document/image triage prototype (`:18829` optional review port)
 
-Foreground review start only, after confirming port is free:
+Foreground review start only, after confirming the port is free:
 
 ```bash
-ss -ltnp | grep -E ':(18828|18829)\b' || true
+ss -ltnp | grep ':18829\b' || true
 cd ~/lab/swarm/openvino-doc-image-triage-npu
-/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18828 --allowed-root "$PWD"
+/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
 ```
 
 Smoke:
 
 ```bash
-curl -fsS http://127.0.0.1:18828/healthz | jq .
-curl -fsS http://127.0.0.1:18828/models | jq .
+curl -fsS http://127.0.0.1:18829/healthz | jq .
+curl -fsS http://127.0.0.1:18829/models | jq .
 /home/will/.venvs/npu/bin/python tests/smoke_test.py
 ```
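
Review note: several hunks in this patch repeat the rule that HTTP 200 alone is not proof of NPU use and that a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta is required. The check could be sketched roughly as below. This is a minimal illustration rather than the contents of `scripts/npu-service-health.sh`: it assumes the sysfs file holds a single cumulative integer in microseconds, and `run_inference` is a hypothetical caller-supplied callable (for example, one request to a local sidecar).

```python
from pathlib import Path
from typing import Callable

# Path quoted in the patch; the single-integer file format is an assumption.
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = NPU_BUSY_PATH) -> int:
    """Read the cumulative NPU busy time in microseconds."""
    return int(path.read_text().strip())


def npu_busy_delta(run_inference: Callable[[], object],
                   path: Path = NPU_BUSY_PATH) -> int:
    """Return the busy-time delta (us) observed around one inference call."""
    before = read_busy_us(path)
    run_inference()  # hypothetical workload, e.g. one HTTP request to a sidecar
    after = read_busy_us(path)
    return after - before


def proves_npu_use(delta_us: int) -> bool:
    """Apply the documented rule: only a positive delta counts as proof."""
    return delta_us > 0
```

Because the counter appears to be device-wide, other NPU work running at the same time would presumably raise it as well, so a positive delta is best read as necessary evidence rather than attribution to one specific request.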