Document live OpenVINO NPU sidecars

William Valentin
2026-06-04 15:32:32 -07:00
parent 85c496a59e
commit 401321a6d5
5 changed files with 55 additions and 18 deletions
+12 -9
@@ -36,7 +36,7 @@ local AI/search/voice services
+--> OpenVINO NPU embeddings :18817
+--> Kokoro TTS :18805
+--> Whisper NPU :18816
+--> approved/not-live NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage optional :18829
+--> local-only NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage :18829
```
See also:
@@ -126,21 +126,24 @@ Host/user services:
- `ollama.service` → `:18807`, legacy/CPU embeddings API fallback
- `openvino-embeddings.service` → `:18817`, OpenVINO NPU embeddings API (`/v1/embeddings`, `/api/embed`, `/api/embeddings`)
- `docker-health-endpoint.service` → `:18809`, read-only container health for n8n
- `obsidian-reindex-endpoint.service` → `:18810`, Obsidian/RAG reindex trigger and `/semantic-search`; default collection `obsidian_bge_npu` using OpenVINO NPU embeddings, with optional request-time `:18818` reranking disabled by default
- `obsidian-reindex-endpoint.service` → `:18810`, Obsidian/RAG reindex trigger and `/semantic-search`; default collection `obsidian_bge_npu` using OpenVINO NPU embeddings, with request-time `:18818` reranking enabled with vector-order fallback
- `url-content-extractor.service` → `:18812`, YouTube/PDF/web extraction
- `voice-memo-processor.service` → `:18813`, voice memo processing
- `rag-embedding-health.service` → `:18814`, RAG/embedding health wrapper
- `openvino-router-classifier.service` → `:18819`, local-only dry-run Atlas/Hermes message classifier; advisory only
- `openvino-genai-npu-worker.service` → `:18820`, local-only bounded GenAI worker for small background generation jobs
- `openvino-doc-image-triage.service` → `:18829`, local-only document/image triage HTTP wrapper with allowed-root enforcement
Approved but not live-routed OpenVINO NPU sidecars:
Local-only OpenVINO NPU sidecars:
| Port | Component | State | Safety boundary |
| ---: | --- | --- | --- |
| `18818` | reranker | approved prototype; optional foreground/user-systemd only | request-time only; no Chroma/vector mutation; no live RAG integration unless Will approves |
| `18819` | router/classifier | approved prototype; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
| `18820` | bounded GenAI worker | approved prototype | background jobs only; not primary Atlas/Hermes model routing |
| `18829` | document/image triage | CLI-first; optional localhost server | synthetic/non-private smoke data only; no private directory processing; NPU stage is embeddings via `:18817` |
| `18818` | reranker | live user service; request-time second stage for `:18810/semantic-search` | no Chroma/vector mutation; vector-order fallback on timeout/error/non-positive NPU proof |
| `18819` | router/classifier | live user service; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
| `18820` | bounded GenAI worker | live user service | background jobs only; not primary Atlas/Hermes model routing |
| `18829` | document/image triage | live localhost server | allowed-root limited; no private directory processing unless explicitly approved; NPU stage is embeddings via `:18817` |
These sidecars must bind to `127.0.0.1` by default, must not be enabled persistently or wired into live Atlas/Hermes/RAG paths without explicit Will approval, and any NPU claim requires a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before/after inference. HTTP 200 alone is not proof.
These sidecars bind to `127.0.0.1` by default and must not be wired into live Atlas/Hermes routing, memory writes, broad private document processing, or primary model paths without explicit Will approval. Any NPU claim requires a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before/after inference. HTTP 200 alone is not proof.
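The busy-time rule above can be checked mechanically. The following is a minimal sketch, not the project's own tooling: it reads the sysfs counter before and after a command and reports the delta. The function names and the `NPU_BUSY_FILE` override are hypothetical and exist only so the check can be exercised without an NPU.

```bash
#!/usr/bin/env bash
# Sketch: wrap an inference command and report the NPU busy-time delta.
# NPU_BUSY_FILE is a hypothetical override for testing; the default is the
# sysfs counter named in the text above.
NPU_BUSY_FILE="${NPU_BUSY_FILE:-/sys/class/accel/accel0/device/npu_busy_time_us}"

# Prints (after - before) in microseconds for the wrapped command.
npu_busy_delta() {
  local before after
  before=$(cat "$NPU_BUSY_FILE") || return 2
  "$@" >/dev/null || return 3        # the wrapped command itself failed
  after=$(cat "$NPU_BUSY_FILE") || return 2
  echo $((after - before))
}

# Succeeds only when the delta is strictly positive.
npu_proof() {
  local delta
  delta=$(npu_busy_delta "$@") || return $?
  if [ "$delta" -gt 0 ]; then
    echo "npu_busy_delta_us=$delta proof=positive"
  else
    echo "npu_busy_delta_us=$delta proof=none"
    return 1
  fi
}
```

Pass the actual inference request as the wrapped command, for example a `curl` call against one of the sidecar endpoints. A request that returns HTTP 200 but leaves the counter unchanged makes `npu_proof` exit non-zero, which matches the "HTTP 200 alone is not proof" rule.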
### 5. Obsidian and RAG
@@ -254,4 +257,4 @@ jq '.[0] | {id,name,active,nodes:(.nodes|length)}' /tmp/agentmon-export.json
- From `n8n-agent`, use `127.0.0.1:5678` for n8n itself and `172.19.0.1:<host-port>` for host-published swarm services.
- Agentmon `/healthz` only proves the web/API process is alive; pair it with snapshot freshness to prove the monitoring pipeline is flowing.
- OpenClaw is intentionally dormant unless explicitly re-enabled; do not alert on VMs being shut off by default.
- OpenVINO NPU sidecars on `:18819`, `:18820`, and optional `:18829` are prototypes/not-live unless a later approved change installs and routes them. The `:18818` reranker is live as a local request-time second stage for `:18810/semantic-search`; it still falls back to vector order on timeout/error/non-positive NPU proof. Do not draw live Atlas/Hermes/classifier/GenAI arrows to prototypes until approval and implementation actually exist.
- OpenVINO NPU sidecars on `:18819`, `:18820`, and `:18829` are live local-only services, but remain isolated specialists. The `:18818` reranker is live as a local request-time second stage for `:18810/semantic-search`; it still falls back to vector order on timeout/error/non-positive NPU proof. Do not draw live Atlas/Hermes routing, memory-write, broad document-processing, or primary-model arrows to these sidecars without a separate approved integration.
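
The vector-order fallback described for the `:18818` reranker can be illustrated with a small shell sketch. This is not the service's implementation: `rerank_with_fallback` is a hypothetical helper, the reranker is any external command that reads candidates on stdin, and only the timeout and error cases are covered (the NPU-proof condition is left out).

```bash
#!/usr/bin/env bash
# Sketch: try a second-stage reranker, keep vector order if it fails.
# Arguments: <timeout seconds> <file with candidates in vector order> <reranker command...>
rerank_with_fallback() {
  local limit_s="$1" vector_order_file="$2"
  shift 2
  local reranked
  # timeout(1) kills the reranker if it exceeds the limit; a non-zero exit or
  # empty output is treated the same way as a timeout.
  if reranked=$(timeout "$limit_s" "$@" < "$vector_order_file") && [ -n "$reranked" ]; then
    printf '%s\n' "$reranked"
  else
    cat "$vector_order_file"         # fall back to the original vector order
  fi
}
```

Because the fallback path only re-emits the input, a failing reranker can never drop or mutate results; it can only fail to improve their order.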
+3 -4
@@ -128,14 +128,13 @@ Fixture messages live at `fixtures/atlas_hermes_messages.jsonl`.
## Optional systemd user unit
A draft unit is included as `openvino-router-classifier.service`. Install only
after review/approval:
A reviewed local-only user service unit is included as `openvino-router-classifier.service`. Install/enable it when the dry-run classifier should persist across logins:
```bash
cp openvino-router-classifier.service ~/.config/systemd/user/openvino-router-classifier.service
systemctl --user daemon-reload
systemctl --user start openvino-router-classifier.service
systemctl --user enable --now openvino-router-classifier.service
systemctl --user status openvino-router-classifier.service --no-pager
```
Do not enable it at boot or connect it to live Atlas/Hermes routing as part of this prototype task without explicit approval. Keep classifier decisions dry-run until a separate approved routing change lands.
The service is persistent, but classifier decisions remain dry-run until a separate approved routing change lands. Do not connect it to live Atlas/Hermes routing, memory writes, service restarts, or outbound messages.
+12 -2
@@ -38,6 +38,7 @@ Not configured in v1:
- `triage.py` — core library and CLI.
- `server.py` — stdlib HTTP server with `/healthz`, `/models`, `/triage`, `/triage/batch`.
- `openvino-doc-image-triage.service` — local-only user-systemd service template for `127.0.0.1:18829`, limited to this prototype directory as its default allowed root.
- `make_samples.py` — creates synthetic non-private image/PDF samples.
- `tests/smoke_test.py` — end-to-end smoke test, including NPU busy-time verification when `:18817` is reachable.
- `samples/` — generated synthetic fixtures.
@@ -91,7 +92,7 @@ Include OCR/sidecar text in a single response only when explicitly requested:
## HTTP usage
The prototype is CLI-first. HTTP is optional and not enabled by default. If a foreground HTTP server is needed for review, prefer optional port `18829` so it does not collide with the GenAI worker prototype on `18820`. Check the port first:
The prototype is CLI-first, and the local HTTP wrapper can be run as a reviewed user-systemd service on `127.0.0.1:18829` with an allowlist rooted at this prototype directory. Keep it local-only and do not broaden allowed roots to private document/image directories without explicit approval. Check the port first:
```bash
ss -ltnp | grep ':18829\b' || true
@@ -104,6 +105,15 @@ cd /home/will/lab/swarm/openvino-doc-image-triage-npu
/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
```
Install/enable the reviewed local-only service template when the HTTP wrapper should persist across logins:
```bash
install -m 0644 openvino-doc-image-triage.service ~/.config/systemd/user/openvino-doc-image-triage.service
systemctl --user daemon-reload
systemctl --user enable --now openvino-doc-image-triage.service
systemctl --user status openvino-doc-image-triage.service --no-pager
```
Call it with synthetic/non-private fixtures only:
```bash
@@ -114,7 +124,7 @@ curl -sS -X POST http://127.0.0.1:18829/triage \
-d '{"path":"/home/will/lab/swarm/openvino-doc-image-triage-npu/samples/synthetic_invoice.png","options":{"allowed_roots":["/home/will/lab/swarm/openvino-doc-image-triage-npu"]}}' | jq
```
Do not install or enable a persistent service for this prototype without explicit approval, and do not point it at private document/image directories during smoke tests.
Do not point it at private document/image directories during smoke tests unless Will explicitly approves the exact source root.
## Smoke test
@@ -0,0 +1,16 @@
[Unit]
Description=OpenVINO NPU document/image triage HTTP Service (local-only, port 18829)
After=network.target openvino-embeddings.service
Wants=openvino-embeddings.service
[Service]
Type=simple
WorkingDirectory=/home/will/lab/swarm/openvino-doc-image-triage-npu
Environment=DOC_IMAGE_TRIAGE_HOST=127.0.0.1
Environment=DOC_IMAGE_TRIAGE_PORT=18829
ExecStart=/home/will/.venvs/npu/bin/python /home/will/lab/swarm/openvino-doc-image-triage-npu/server.py --host 127.0.0.1 --port 18829 --allowed-root /home/will/lab/swarm/openvino-doc-image-triage-npu
Restart=on-failure
RestartSec=5
[Install]
WantedBy=default.target
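
Once a unit like the one above is running, the local-only claim can be checked against the actual listening sockets. The sketch below assumes the column layout of `ss -ltnH` from iproute2 (local address and port in the fourth column); `loopback_only` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: read `ss -ltnH` output on stdin and succeed only if the given port
# has at least one listener and every listener is bound to 127.0.0.1.
loopback_only() {
  local port="$1"
  awk -v port="$port" '
    {
      n = split($4, parts, ":")      # port is the last colon-separated field
      p = parts[n]
      addr = substr($4, 1, length($4) - length(p) - 1)
    }
    p == port { seen = 1; if (addr != "127.0.0.1") bad = 1 }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}
```

Usage: `ss -ltnH | loopback_only 18829 && echo "18829 is loopback-only"`. A listener on `0.0.0.0`, `*`, or `[::]` for that port makes the check fail, as does the absence of any listener.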
+12 -3
@@ -19,7 +19,7 @@ The worker does not write memory, does not restart Atlas/Hermes, does not change
- `worker.py` — stdlib HTTP API plus CLI wrapper.
- `smoke_llm_npu.py` — direct GenAI smoke test with NPU busy-time verification.
- `tests/test_worker.py` — unit tests with a fake GenAI pipeline and synthetic busy-time counter.
- `systemd/openvino-genai-npu-worker.service` — optional user-service template; not installed by this prototype.
- `systemd/openvino-genai-npu-worker.service` — reviewed local-only user-service template for `127.0.0.1:18820`.
## Model/cache
@@ -129,9 +129,18 @@ OV_GENAI_NPU_PORT=18820
Only `127.0.0.1` is accepted by the current prototype; wider binds require an explicit code change and approval.
## Optional systemd user service
## Systemd user service
A draft unit exists at `systemd/openvino-genai-npu-worker.service` for later review. Do not copy, enable, or autostart it unless Will explicitly approves persistent service enablement. Foreground smoke on `127.0.0.1:18820` plus positive sysfs NPU busy-time delta is required before any installation discussion.
A reviewed local-only unit exists at `systemd/openvino-genai-npu-worker.service` for persistent background use after foreground smoke succeeds with a positive NPU busy-time delta:
```bash
install -m 0644 systemd/openvino-genai-npu-worker.service ~/.config/systemd/user/openvino-genai-npu-worker.service
systemctl --user daemon-reload
systemctl --user enable --now openvino-genai-npu-worker.service
systemctl --user status openvino-genai-npu-worker.service --no-pager
```
The service remains isolated: do not route primary Atlas/Hermes chat, gateway output, or automatic memory writes to it without a separate approved integration.
## Safety boundaries