Document live OpenVINO NPU sidecars
@@ -36,7 +36,7 @@ local AI/search/voice services
 +--> OpenVINO NPU embeddings :18817
 +--> Kokoro TTS :18805
 +--> Whisper NPU :18816
-+--> approved/not-live NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage optional :18829
++--> local-only NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage :18829
 ```

 See also:
@@ -126,21 +126,24 @@ Host/user services:
 - `ollama.service` — `:18807`, legacy/CPU embeddings API fallback
 - `openvino-embeddings.service` — `:18817`, OpenVINO NPU embeddings API (`/v1/embeddings`, `/api/embed`, `/api/embeddings`)
 - `docker-health-endpoint.service` — `:18809`, read-only container health for n8n
-- `obsidian-reindex-endpoint.service` — `:18810`, Obsidian/RAG reindex trigger and `/semantic-search`; default collection `obsidian_bge_npu` using OpenVINO NPU embeddings, with optional request-time `:18818` reranking disabled by default
+- `obsidian-reindex-endpoint.service` — `:18810`, Obsidian/RAG reindex trigger and `/semantic-search`; default collection `obsidian_bge_npu` using OpenVINO NPU embeddings, with request-time `:18818` reranking enabled with vector-order fallback
 - `url-content-extractor.service` — `:18812`, YouTube/PDF/web extraction
 - `voice-memo-processor.service` — `:18813`, voice memo processing
 - `rag-embedding-health.service` — `:18814`, RAG/embedding health wrapper
+- `openvino-router-classifier.service` — `:18819`, local-only dry-run Atlas/Hermes message classifier; advisory only
+- `openvino-genai-npu-worker.service` — `:18820`, local-only bounded GenAI worker for small background generation jobs
+- `openvino-doc-image-triage.service` — `:18829`, local-only document/image triage HTTP wrapper with allowed-root enforcement

-Approved but not live-routed OpenVINO NPU sidecars:
+Local-only OpenVINO NPU sidecars:

 | Port | Component | State | Safety boundary |
 | ---: | --- | --- | --- |
-| `18818` | reranker | approved prototype; optional foreground/user-systemd only | request-time only; no Chroma/vector mutation; no live RAG integration unless Will approves |
-| `18819` | router/classifier | approved prototype; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
-| `18820` | bounded GenAI worker | approved prototype | background jobs only; not primary Atlas/Hermes model routing |
-| `18829` | document/image triage | CLI-first; optional localhost server | synthetic/non-private smoke data only; no private directory processing; NPU stage is embeddings via `:18817` |
+| `18818` | reranker | live user service; request-time second stage for `:18810/semantic-search` | no Chroma/vector mutation; vector-order fallback on timeout/error/non-positive NPU proof |
+| `18819` | router/classifier | live user service; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
+| `18820` | bounded GenAI worker | live user service | background jobs only; not primary Atlas/Hermes model routing |
+| `18829` | document/image triage | live localhost server | allowed-root limited; no private directory processing unless explicitly approved; NPU stage is embeddings via `:18817` |

-These sidecars must bind to `127.0.0.1` by default, must not be enabled persistently or wired into live Atlas/Hermes/RAG paths without explicit Will approval, and any NPU claim requires a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before/after inference. HTTP 200 alone is not proof.
+These sidecars bind to `127.0.0.1` by default and must not be wired into live Atlas/Hermes routing, memory writes, broad private document processing, or primary model paths without explicit Will approval. Any NPU claim requires a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before/after inference. HTTP 200 alone is not proof.

 ### 5. Obsidian and RAG

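The busy-time rule in the hunk above (a positive `npu_busy_time_us` delta, not just an HTTP 200) could be checked along the lines of the following minimal Python sketch. Only the sysfs path comes from the text; the helper names `read_busy_us` and `run_with_npu_proof` are hypothetical and not part of any sidecar, and the sketch assumes the counter file holds a single cumulative integer.

```python
from pathlib import Path
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Counter path quoted in the documentation above.
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = NPU_BUSY_PATH) -> int:
    """Return the cumulative NPU busy time in microseconds (assumed integer)."""
    return int(path.read_text().strip())


def run_with_npu_proof(
    infer: Callable[[], T], path: Path = NPU_BUSY_PATH
) -> Tuple[T, int, bool]:
    """Run `infer` and report (result, busy-time delta, delta > 0).

    `infer` is a hypothetical callable wrapping one inference request.
    A successful return from `infer` alone is not treated as NPU proof.
    """
    before = read_busy_us(path)
    result = infer()
    delta = read_busy_us(path) - before
    return result, delta, delta > 0
```

A caller would wrap one request to a sidecar in `infer` and only claim NPU execution when the third element is `True`. Other NPU workloads running at the same time would also advance the counter, so a sketch like this is only meaningful on an otherwise idle device.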
@@ -254,4 +257,4 @@ jq '.[0] | {id,name,active,nodes:(.nodes|length)}' /tmp/agentmon-export.json
 - From `n8n-agent`, use `127.0.0.1:5678` for n8n itself and `172.19.0.1:<host-port>` for host-published swarm services.
 - Agentmon `/healthz` only proves the web/API process is alive; pair it with snapshot freshness to prove the monitoring pipeline is flowing.
 - OpenClaw is intentionally dormant unless explicitly re-enabled; do not alert on VMs being shut off by default.
-- OpenVINO NPU sidecars on `:18819`, `:18820`, and optional `:18829` are prototypes/not-live unless a later approved change installs and routes them. The `:18818` reranker is live as a local request-time second stage for `:18810/semantic-search`; it still falls back to vector order on timeout/error/non-positive NPU proof. Do not draw live Atlas/Hermes/classifier/GenAI arrows to prototypes until approval and implementation actually exist.
+- OpenVINO NPU sidecars on `:18819`, `:18820`, and `:18829` are live local-only services, but remain isolated specialists. The `:18818` reranker is live as a local request-time second stage for `:18810/semantic-search`; it still falls back to vector order on timeout/error/non-positive NPU proof. Do not draw live Atlas/Hermes routing, memory-write, broad document-processing, or primary-model arrows to these sidecars without a separate approved integration.
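The vector-order fallback described for the `:18818` reranker can be pictured with the short Python sketch below. The reranker's actual request and response shape is not shown in this commit, so the reranker is passed in as a hypothetical callable that returns the reordered documents together with an NPU busy-time delta; `rerank_or_vector_order` is likewise an illustrative name, not the endpoint's real code.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical reranker signature: (query, docs) -> (ranked_docs, npu_delta_us).
RerankFn = Callable[[str, Sequence[str]], Tuple[List[str], int]]


def rerank_or_vector_order(
    query: str, vector_hits: Sequence[str], rerank: RerankFn
) -> Tuple[List[str], str]:
    """Return (documents, mode), keeping vector order unless reranking is proven."""
    try:
        ranked, npu_delta_us = rerank(query, vector_hits)
    except Exception:
        # Timeouts and other errors both land here; keep the original order.
        return list(vector_hits), "vector-order (rerank error)"
    if npu_delta_us <= 0:
        # Reranker answered, but without a positive NPU busy-time delta.
        return list(vector_hits), "vector-order (no NPU proof)"
    return ranked, "reranked"
```

The point of this shape is that the search result is never worse than plain vector search: every failure path returns the hits in the order the vector store produced them.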
@@ -128,14 +128,13 @@ Fixture messages live at `fixtures/atlas_hermes_messages.jsonl`.

 ## Optional systemd user unit

-A draft unit is included as `openvino-router-classifier.service`. Install only
-after review/approval:
+A reviewed local-only user service unit is included as `openvino-router-classifier.service`. Install/enable it when the dry-run classifier should persist across logins:

 ```bash
 cp openvino-router-classifier.service ~/.config/systemd/user/openvino-router-classifier.service
 systemctl --user daemon-reload
-systemctl --user start openvino-router-classifier.service
+systemctl --user enable --now openvino-router-classifier.service
 systemctl --user status openvino-router-classifier.service --no-pager
 ```

-Do not enable it at boot or connect it to live Atlas/Hermes routing as part of this prototype task without explicit approval. Keep classifier decisions dry-run until a separate approved routing change lands.
+The service is persistent, but classifier decisions remain dry-run until a separate approved routing change lands. Do not connect it to live Atlas/Hermes routing, memory writes, service restarts, or outbound messages.
@@ -38,6 +38,7 @@ Not configured in v1:

 - `triage.py` — core library and CLI.
 - `server.py` — stdlib HTTP server with `/healthz`, `/models`, `/triage`, `/triage/batch`.
+- `openvino-doc-image-triage.service` — local-only user-systemd service template for `127.0.0.1:18829`, limited to this prototype directory as its default allowed root.
 - `make_samples.py` — creates synthetic non-private image/PDF samples.
 - `tests/smoke_test.py` — end-to-end smoke test, including NPU busy-time verification when `:18817` is reachable.
 - `samples/` — generated synthetic fixtures.
@@ -91,7 +92,7 @@ Include OCR/sidecar text in a single response only when explicitly requested:

 ## HTTP usage

-The prototype is CLI-first. HTTP is optional and not enabled by default. If a foreground HTTP server is needed for review, prefer optional port `18829` so it does not collide with the GenAI worker prototype on `18820`. Check the port first:
+The prototype is CLI-first, and the local HTTP wrapper can be run as a reviewed user-systemd service on `127.0.0.1:18829` with an allowlist rooted at this prototype directory. Keep it local-only and do not broaden allowed roots to private document/image directories without explicit approval. Check the port first:

 ```bash
 ss -ltnp | grep ':18829\b' || true
@@ -104,6 +105,15 @@ cd /home/will/lab/swarm/openvino-doc-image-triage-npu
 /home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
 ```

+Install/enable the reviewed local-only service template when the HTTP wrapper should persist across logins:
+
+```bash
+install -m 0644 openvino-doc-image-triage.service ~/.config/systemd/user/openvino-doc-image-triage.service
+systemctl --user daemon-reload
+systemctl --user enable --now openvino-doc-image-triage.service
+systemctl --user status openvino-doc-image-triage.service --no-pager
+```
+
 Call it with synthetic/non-private fixtures only:

 ```bash
@@ -114,7 +124,7 @@ curl -sS -X POST http://127.0.0.1:18829/triage \
 -d '{"path":"/home/will/lab/swarm/openvino-doc-image-triage-npu/samples/synthetic_invoice.png","options":{"allowed_roots":["/home/will/lab/swarm/openvino-doc-image-triage-npu"]}}' | jq
 ```

-Do not install or enable a persistent service for this prototype without explicit approval, and do not point it at private document/image directories during smoke tests.
+Do not point it at private document/image directories during smoke tests unless Will explicitly approves the exact source root.

 ## Smoke test

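How `server.py` enforces `--allowed-root` is not shown in this diff. As a rough illustration of the idea only, an allowed-root check might resolve the requested path and accept it only when it sits at or below one of the configured roots. The function name `is_under_allowed_root` is hypothetical.

```python
from pathlib import Path
from typing import Iterable


def is_under_allowed_root(candidate: str, allowed_roots: Iterable[str]) -> bool:
    """Return True only if `candidate` resolves to a path at or below an allowed root.

    Resolving first collapses `..` segments and follows symlinks, so a request
    cannot step outside the root by path tricks alone.
    """
    resolved = Path(candidate).resolve()
    for root in allowed_roots:
        root_resolved = Path(root).resolve()
        if resolved == root_resolved or root_resolved in resolved.parents:
            return True
    return False
```

With the root set to the prototype directory, a request for `samples/synthetic_invoice.png` under that directory would pass, while a path containing `../` that resolves outside it would be rejected.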
@@ -0,0 +1,16 @@
+[Unit]
+Description=OpenVINO NPU document/image triage HTTP Service (local-only, port 18829)
+After=network.target openvino-embeddings.service
+Wants=openvino-embeddings.service
+
+[Service]
+Type=simple
+WorkingDirectory=/home/will/lab/swarm/openvino-doc-image-triage-npu
+Environment=DOC_IMAGE_TRIAGE_HOST=127.0.0.1
+Environment=DOC_IMAGE_TRIAGE_PORT=18829
+ExecStart=/home/will/.venvs/npu/bin/python /home/will/lab/swarm/openvino-doc-image-triage-npu/server.py --host 127.0.0.1 --port 18829 --allowed-root /home/will/lab/swarm/openvino-doc-image-triage-npu
+Restart=on-failure
+RestartSec=5
+
+[Install]
+WantedBy=default.target
@@ -19,7 +19,7 @@ The worker does not write memory, does not restart Atlas/Hermes, does not change
 - `worker.py` — stdlib HTTP API plus CLI wrapper.
 - `smoke_llm_npu.py` — direct GenAI smoke test with NPU busy-time verification.
 - `tests/test_worker.py` — unit tests with a fake GenAI pipeline and synthetic busy-time counter.
-- `systemd/openvino-genai-npu-worker.service` — optional user-service template; not installed by this prototype.
+- `systemd/openvino-genai-npu-worker.service` — reviewed local-only user-service template for `127.0.0.1:18820`.

 ## Model/cache

@@ -129,9 +129,18 @@ OV_GENAI_NPU_PORT=18820

 Only `127.0.0.1` is accepted by the current prototype; wider binds require an explicit code change and approval.

-## Optional systemd user service
+## Systemd user service

-A draft unit exists at `systemd/openvino-genai-npu-worker.service` for later review. Do not copy, enable, or autostart it unless Will explicitly approves persistent service enablement. Foreground smoke on `127.0.0.1:18820` plus positive sysfs NPU busy-time delta is required before any installation discussion.
+A reviewed local-only unit exists at `systemd/openvino-genai-npu-worker.service` for persistent background use after foreground smoke succeeds with a positive NPU busy-time delta:
+
+```bash
+install -m 0644 systemd/openvino-genai-npu-worker.service ~/.config/systemd/user/openvino-genai-npu-worker.service
+systemctl --user daemon-reload
+systemctl --user enable --now openvino-genai-npu-worker.service
+systemctl --user status openvino-genai-npu-worker.service --no-pager
+```
+
+The service remains isolated: do not route primary Atlas/Hermes chat, gateway output, or automatic memory writes to it without a separate approved integration.

 ## Safety boundaries
