docs: update OpenVINO NPU service maps
@@ -257,9 +257,9 @@ Profile Model Gateway Alias Distribu
 | Web search | SearXNG `18803` or Brave MCP `18802` | Hermes web search and MCP Brave Search are both available |
 | Model proxy | LiteLLM `18804` | Use for OpenAI-compatible routed models |
 | Direct local LLM | llama.cpp `18806` | Current model id: `gemma-4-26B-A4B-it-UD-IQ2_M.gguf`; useful for n8n/local automation |
-| Embeddings | Ollama `18807` | Use raw Ollama API root, not `/v1`, for `/api/embed` |
+| Embeddings | OpenVINO NPU `18817`; Ollama `18807` fallback | Live RAG uses `bge-base-en-v1.5-int8-ov` via OpenVINO and collection `obsidian_bge_npu`; Ollama remains a legacy/CPU fallback |
 | Text-to-speech | Kokoro `18805` / Hermes TTS tool | Local speech generation |
-| Speech-to-text | Whisper `18811` and wrappers | Local transcription fallback |
+| Speech-to-text | Whisper OpenVINO NPU `18816`; Whisper CPU `18811` fallback | NPU service is the live default; CPU remains fallback |
 | Workflow automation | n8n `18808` | Durable jobs and webhooks |
 | Knowledge store | Obsidian REST `27123`; RAG/Chroma local store | Obsidian notes plus Hermes rag-search index |

@@ -293,6 +293,7 @@ Profile Model Gateway Alias Distribu
 - Use file-based workflow updates for large n8n JSON payloads.
 - After structural n8n workflow edits, deactivate/reactivate the workflow.
 - Prefer `make` targets in `~/lab/swarm` for routine service operations.
+- OpenVINO NPU prototype sidecars `:18818`, `:18819`, `:18820`, and optional `:18829` are approved prototypes only; do not enable persistent services, live Atlas/Hermes/RAG routing, vector DB mutation, or private document/image processing without explicit approval. Verify NPU usage with `/sys/class/accel/accel0/device/npu_busy_time_us`; HTTP 200 alone is not proof.
 - Check git status before committing; commit only targeted non-secret source/config/docs.

 ## Refresh procedure
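
Review note, not part of the commit: the bullet added in the hunk above says an HTTP 200 is not proof of NPU use and points at the sysfs counter `npu_busy_time_us`. A minimal sketch of that check follows, assuming the counter is a single integer that only increases while the NPU is busy (the `_us` suffix suggests microseconds, but the diff does not confirm the semantics). The function names and the `NPU_BUSY_FILE` override are hypothetical and exist only for illustration.

```bash
# Sketch only: compare the NPU busy-time counter before and after a command.
# The path comes from the note above; NPU_BUSY_FILE is a hypothetical
# override so the helper can be pointed at a different file.
NPU_BUSY_FILE="${NPU_BUSY_FILE:-/sys/class/accel/accel0/device/npu_busy_time_us}"

# Print the current counter value.
read_npu_busy_us() {
  cat "$NPU_BUSY_FILE"
}

# Run the given command, print how far the counter advanced, and
# return non-zero when it did not advance at all.
npu_busy_delta() {
  local before after delta
  before="$(read_npu_busy_us)" || return 2
  "$@" >/dev/null
  after="$(read_npu_busy_us)" || return 2
  delta=$(( after - before ))
  echo "$delta"
  [ "$delta" -gt 0 ]
}
```

The command passed to `npu_busy_delta` should be the real inference request under review, not a health probe; a health endpoint may well answer without touching the NPU. Other NPU clients running at the same time would also advance the counter, so a positive delta is only meaningful on an otherwise idle device.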
+8
-8
@@ -39,11 +39,11 @@ Safety posture:
 | NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive |
 | NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` |
 | Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | `~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` |
-| Document/image triage prototype | 18828 or 18829 for review only | foreground local-only server; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:<port>/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |
+| Document/image triage prototype | optional 18829 for review only | CLI-first; foreground local-only server if needed; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local |

 Port notes:
 - `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding.
-- `18820` was used by the GenAI worker prototype. The document/image triage prototype README still contains a `18820` example, but review used `18828`/`18829` to avoid collision. Prefer `18828`/`18829` for triage foreground review until Will approves a final persistent port.
+- `18820` is reserved for the GenAI worker prototype. Use optional `18829` for document/image triage foreground review until Will approves a final persistent port. `18828` was used in earlier review smoke only and should not be treated as the preferred documented port.
 - Existing `:18817` is currently bound on `0.0.0.0` by the user service; prototype services should still default to `127.0.0.1`.

 ## Read-only unified health check
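
Review note, not part of the commit: the port notes in the hunk above ask for a listener check before binding and point out that `:18817` listens on `0.0.0.0` while prototypes should stay on `127.0.0.1`. The sketch below shows one way to check both at once, assuming `ss -ltn` style output where the local address column ends in `:<port>`. `classify_ports` is a hypothetical helper name, not something defined in the repository.

```bash
# Sketch only: read `ss -ltn` style output on stdin and, for each port
# argument, print "<port> free" or "<port> in-use <local address>".
classify_ports() {
  local listing port addr
  listing="$(cat)"
  for port in "$@"; do
    # Find the first field that ends in ":<port>" (the local address column).
    addr="$(printf '%s\n' "$listing" | awk -v suffix=":$port" '
      {
        for (i = 1; i <= NF; i++) {
          n = length($i) - length(suffix)
          if (n >= 0 && substr($i, n + 1) == suffix) { print $i; exit }
        }
      }')"
    if [ -n "$addr" ]; then
      echo "$port in-use $addr"
    else
      echo "$port free"
    fi
  done
}

# Live usage against the local listener table:
#   ss -ltn | classify_ports 18817 18818 18819 18820 18829
```

A reported address that starts with `0.0.0.0` or `[::]` means the listener is reachable from other hosts, which is the case the note flags for `:18817`; a prototype started as documented should show `127.0.0.1` instead.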
@@ -186,21 +186,21 @@ Approval gate:
 - May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement.
 - Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting.

-### Document/image triage prototype (`:18828`/`:18829` review ports)
+### Document/image triage prototype (`:18829` optional review port)

-Foreground review start only, after confirming port is free:
+Foreground review start only, after confirming the port is free:

 ```bash
-ss -ltnp | grep -E ':(18828|18829)\b' || true
+ss -ltnp | grep ':18829\b' || true
 cd ~/lab/swarm/openvino-doc-image-triage-npu
-/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18828 --allowed-root "$PWD"
+/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
 ```

 Smoke:

 ```bash
-curl -fsS http://127.0.0.1:18828/healthz | jq .
-curl -fsS http://127.0.0.1:18828/models | jq .
+curl -fsS http://127.0.0.1:18829/healthz | jq .
+curl -fsS http://127.0.0.1:18829/models | jq .
 /home/will/.venvs/npu/bin/python tests/smoke_test.py
 ```
