docs: update OpenVINO NPU service maps

2026-06-04 12:29:53 -07:00
parent 5b01b1bd11
commit a1f5b4c3a9
9 changed files with 57 additions and 29 deletions
@@ -257,9 +257,9 @@ Profile          Model                        Gateway      Alias        Distribu
 | Web search | SearXNG `18803` or Brave MCP `18802` | Hermes web search and MCP Brave Search are both available |
 | Model proxy | LiteLLM `18804` | Use for OpenAI-compatible routed models |
 | Direct local LLM | llama.cpp `18806` | Current model id: `gemma-4-26B-A4B-it-UD-IQ2_M.gguf`; useful for n8n/local automation |
-| Embeddings | Ollama `18807` | Use raw Ollama API root, not `/v1`, for `/api/embed` |
+| Embeddings | OpenVINO NPU `18817`; Ollama `18807` fallback | Live RAG uses `bge-base-en-v1.5-int8-ov` via OpenVINO and collection `obsidian_bge_npu`; Ollama remains a legacy/CPU fallback |
 | Text-to-speech | Kokoro `18805` / Hermes TTS tool | Local speech generation |
-| Speech-to-text | Whisper `18811` and wrappers | Local transcription fallback |
+| Speech-to-text | Whisper OpenVINO NPU `18816`; Whisper CPU `18811` fallback | NPU service is the live default; CPU remains fallback |
 | Workflow automation | n8n `18808` | Durable jobs and webhooks |
 | Knowledge store | Obsidian REST `27123`; RAG/Chroma local store | Obsidian notes plus Hermes rag-search index |

@@ -293,6 +293,7 @@ Profile          Model                        Gateway      Alias        Distribu
 - Use file-based workflow updates for large n8n JSON payloads.
 - After structural n8n workflow edits, deactivate/reactivate the workflow.
 - Prefer `make` targets in `~/lab/swarm` for routine service operations.
+- OpenVINO NPU prototype sidecars `:18818`, `:18819`, `:18820`, and optional `:18829` are approved as prototypes only; do not enable persistent services, live Atlas/Hermes/RAG routing, vector DB mutation, or private document/image processing without explicit approval. Verify NPU usage by confirming that `/sys/class/accel/accel0/device/npu_busy_time_us` increases across the request; an HTTP 200 alone does not prove the work ran on the NPU.
 - Check git status before committing; commit only targeted non-secret source/config/docs.

 ## Refresh procedure
@@ -39,11 +39,11 @@ Safety posture:
 | NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
 | NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
 | Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
-| Document/image triage prototype | 18828 or 18829 for review only | foreground local-only server; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:<port>/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
+| Document/image triage prototype | optional 18829 for review only | CLI-first; foreground local-only server if needed; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |

 Port notes:
 - `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
- `18820` was used by the GenAI worker prototype. The document/image triage prototype README still contains a `18820` example, but review used `18828`/`18829` to avoid collision. Prefer `18828`/`18829` for triage foreground review until Will approves a final persistent port.
+- `18820` is reserved for the GenAI worker prototype. Use optional `18829` for document/image triage foreground review until Will approves a final persistent port. `18828` was used only in an earlier review smoke test and should not be treated as the documented port.
 - Existing `:18817` is currently bound on `0.0.0.0` by the user service; prototype services should still default to `127.0.0.1`.

 ## Read-only unified health check
@@ -186,21 +186,21 @@ Approval gate:
 - May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
 - Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.

-### Document/image triage prototype (`:18828`/`:18829` review ports)
+### Document/image triage prototype (`:18829` optional review port)

-Foreground review start only, after confirming port is free:
+Foreground review start only, after confirming the port is free:

 ```bash
-ss -ltnp | grep -E ':(18828|18829)\b' || true
+ss -ltnp | grep ':18829\b' || true
 cd ~/lab/swarm/openvino-doc-image-triage-npu
-/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18828 --allowed-root "$PWD"
+/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
 ```

 Smoke:

 ```bash
-curl -fsS http://127.0.0.1:18828/healthz | jq .
-curl -fsS http://127.0.0.1:18828/models | jq .
+curl -fsS http://127.0.0.1:18829/healthz | jq .
+curl -fsS http://127.0.0.1:18829/models | jq .
 /home/will/.venvs/npu/bin/python tests/smoke_test.py
 ```
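The sysfs verification rule above (a positive `npu_busy_time_us` delta, not just HTTP 200) can be scripted. This is a minimal sketch, assuming bash and that `npu_busy_time_us` is a cumulative, device-wide counter in microseconds. Because the counter is device-wide, concurrent NPU work from other services such as `:18816` or `:18817` can also advance it. The script name `npu-busy-delta.sh` and the `NPU_BUSY` override variable are hypothetical and do not exist in the repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: treat an NPU workload as verified only if the sysfs busy counter advances.
set -euo pipefail

# Device-wide cumulative NPU busy time in microseconds (path from the notes above).
# NPU_BUSY is a hypothetical override, e.g. for another accel device or a test file.
NPU_BUSY="${NPU_BUSY:-/sys/class/accel/accel0/device/npu_busy_time_us}"

# Run a command and print how many microseconds of NPU busy time elapsed during it.
npu_busy_delta() {
  local before after
  before=$(<"$NPU_BUSY")
  "$@" >/dev/null || return    # propagate the command's failure status
  after=$(<"$NPU_BUSY")
  echo $((after - before))
}

# When executed directly with a command, run it and fail unless the NPU did work.
if (( $# > 0 )) && [[ "${BASH_SOURCE[0]:-$0}" == "$0" ]]; then
  delta=$(npu_busy_delta "$@")
  echo "npu_busy_delta_us=$delta"
  if (( delta <= 0 )); then
    echo "no NPU activity observed; HTTP success alone is not proof" >&2
    exit 1
  fi
fi
```

For example, running `./npu-busy-delta.sh /home/will/.venvs/npu/bin/python tests/smoke_test.py` from the triage prototype directory should exit non-zero unless the smoke test's embedding call through `:18817` advanced the counter.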