diff --git a/docs/swarm-infrastructure.md b/docs/swarm-infrastructure.md
index 347912a..4b8ff52 100644
--- a/docs/swarm-infrastructure.md
+++ b/docs/swarm-infrastructure.md
@@ -36,7 +36,7 @@ local AI/search/voice services
  +--> OpenVINO NPU embeddings :18817
  +--> Kokoro TTS :18805
  +--> Whisper NPU :18816
- +--> approved/not-live NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage optional :18829
+ +--> local-only NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage :18829
 ```
 
 See also:
@@ -126,21 +126,24 @@ Host/user services:
 - `ollama.service` — `:18807`, legacy/CPU embeddings API fallback
 - `openvino-embeddings.service` — `:18817`, OpenVINO NPU embeddings API (`/v1/embeddings`, `/api/embed`, `/api/embeddings`)
 - `docker-health-endpoint.service` — `:18809`, read-only container health for n8n
-- `obsidian-reindex-endpoint.service` — `:18810`, Obsidian/RAG reindex trigger and `/semantic-search`; default collection `obsidian_bge_npu` using OpenVINO NPU embeddings, with optional request-time `:18818` reranking disabled by default
+- `obsidian-reindex-endpoint.service` — `:18810`, Obsidian/RAG reindex trigger and `/semantic-search`; default collection `obsidian_bge_npu` using OpenVINO NPU embeddings, with request-time `:18818` reranking enabled with vector-order fallback
 - `url-content-extractor.service` — `:18812`, YouTube/PDF/web extraction
 - `voice-memo-processor.service` — `:18813`, voice memo processing
 - `rag-embedding-health.service` — `:18814`, RAG/embedding health wrapper
+- `openvino-router-classifier.service` — `:18819`, local-only dry-run Atlas/Hermes message classifier; advisory only
+- `openvino-genai-npu-worker.service` — `:18820`, local-only bounded GenAI worker for small background generation jobs
+- `openvino-doc-image-triage.service` — `:18829`, local-only document/image triage HTTP wrapper with allowed-root enforcement
 
-Approved but not live-routed OpenVINO NPU sidecars:
+Local-only OpenVINO NPU sidecars:
 
 | Port | Component | State | Safety boundary |
 | ---: | --- | --- | --- |
-| `18818` | reranker | approved prototype; optional foreground/user-systemd only | request-time only; no Chroma/vector mutation; no live RAG integration unless Will approves |
-| `18819` | router/classifier | approved prototype; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
-| `18820` | bounded GenAI worker | approved prototype | background jobs only; not primary Atlas/Hermes model routing |
-| `18829` | document/image triage | CLI-first; optional localhost server | synthetic/non-private smoke data only; no private directory processing; NPU stage is embeddings via `:18817` |
+| `18818` | reranker | live user service; request-time second stage for `:18810/semantic-search` | no Chroma/vector mutation; vector-order fallback on timeout/error/non-positive NPU proof |
+| `18819` | router/classifier | live user service; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
+| `18820` | bounded GenAI worker | live user service | background jobs only; not primary Atlas/Hermes model routing |
+| `18829` | document/image triage | live localhost server | allowed-root limited; no private directory processing unless explicitly approved; NPU stage is embeddings via `:18817` |
 
-These sidecars must bind to `127.0.0.1` by default, must not be enabled persistently or wired into live Atlas/Hermes/RAG paths without explicit Will approval, and any NPU claim requires a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before/after inference. HTTP 200 alone is not proof.
+These sidecars bind to `127.0.0.1` by default and must not be wired into live Atlas/Hermes routing, memory writes, broad private document processing, or primary model paths without explicit Will approval. Any NPU claim requires a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before/after inference. HTTP 200 alone is not proof.
 
 ### 5. Obsidian and RAG
 
@@ -254,4 +257,4 @@ jq '.[0] | {id,name,active,nodes:(.nodes|length)}' /tmp/agentmon-export.json
 - From `n8n-agent`, use `127.0.0.1:5678` for n8n itself and `172.19.0.1:` for host-published swarm services.
 - Agentmon `/healthz` only proves the web/API process is alive; pair it with snapshot freshness to prove the monitoring pipeline is flowing.
 - OpenClaw is intentionally dormant unless explicitly re-enabled; do not alert on VMs being shut off by default.
-- OpenVINO NPU sidecars on `:18819`, `:18820`, and optional `:18829` are prototypes/not-live unless a later approved change installs and routes them. The `:18818` reranker is live as a local request-time second stage for `:18810/semantic-search`; it still falls back to vector order on timeout/error/non-positive NPU proof. Do not draw live Atlas/Hermes/classifier/GenAI arrows to prototypes until approval and implementation actually exist.
+- OpenVINO NPU sidecars on `:18819`, `:18820`, and `:18829` are live local-only services, but remain isolated specialists. The `:18818` reranker is live as a local request-time second stage for `:18810/semantic-search`; it still falls back to vector order on timeout/error/non-positive NPU proof. Do not draw live Atlas/Hermes routing, memory-write, broad document-processing, or primary-model arrows to these sidecars without a separate approved integration.
diff --git a/openvino-classifier-npu/README.md b/openvino-classifier-npu/README.md
index 9f32d21..3a08a37 100644
--- a/openvino-classifier-npu/README.md
+++ b/openvino-classifier-npu/README.md
@@ -128,14 +128,13 @@ Fixture messages live at `fixtures/atlas_hermes_messages.jsonl`.
 
 ## Optional systemd user unit
 
-A draft unit is included as `openvino-router-classifier.service`. Install only
-after review/approval:
+A reviewed local-only user service unit is included as `openvino-router-classifier.service`. Install/enable it when the dry-run classifier should persist across logins:
 
 ```bash
 cp openvino-router-classifier.service ~/.config/systemd/user/openvino-router-classifier.service
 systemctl --user daemon-reload
-systemctl --user start openvino-router-classifier.service
+systemctl --user enable --now openvino-router-classifier.service
 systemctl --user status openvino-router-classifier.service --no-pager
 ```
 
-Do not enable it at boot or connect it to live Atlas/Hermes routing as part of this prototype task without explicit approval. Keep classifier decisions dry-run until a separate approved routing change lands.
+The service is persistent, but classifier decisions remain dry-run until a separate approved routing change lands. Do not connect it to live Atlas/Hermes routing, memory writes, service restarts, or outbound messages.
diff --git a/openvino-doc-image-triage-npu/README.md b/openvino-doc-image-triage-npu/README.md
index d7e8af4..6e65aac 100644
--- a/openvino-doc-image-triage-npu/README.md
+++ b/openvino-doc-image-triage-npu/README.md
@@ -38,6 +38,7 @@ Not configured in v1:
 
 - `triage.py` — core library and CLI.
 - `server.py` — stdlib HTTP server with `/healthz`, `/models`, `/triage`, `/triage/batch`.
+- `openvino-doc-image-triage.service` — local-only user-systemd service template for `127.0.0.1:18829`, limited to this prototype directory as its default allowed root.
 - `make_samples.py` — creates synthetic non-private image/PDF samples.
 - `tests/smoke_test.py` — end-to-end smoke test, including NPU busy-time verification when `:18817` is reachable.
 - `samples/` — generated synthetic fixtures.
@@ -91,7 +92,7 @@ Include OCR/sidecar text in a single response only when explicitly requested:
 
 ## HTTP usage
 
-The prototype is CLI-first. HTTP is optional and not enabled by default. If a foreground HTTP server is needed for review, prefer optional port `18829` so it does not collide with the GenAI worker prototype on `18820`. Check the port first:
+The prototype is CLI-first, and the local HTTP wrapper can be run as a reviewed user-systemd service on `127.0.0.1:18829` with an allowlist rooted at this prototype directory. Keep it local-only and do not broaden allowed roots to private document/image directories without explicit approval. Check the port first:
 
 ```bash
 ss -ltnp | grep ':18829\b' || true
@@ -104,6 +105,15 @@ cd /home/will/lab/swarm/openvino-doc-image-triage-npu
 /home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
 ```
 
+Install/enable the reviewed local-only service template when the HTTP wrapper should persist across logins:
+
+```bash
+install -m 0644 openvino-doc-image-triage.service ~/.config/systemd/user/openvino-doc-image-triage.service
+systemctl --user daemon-reload
+systemctl --user enable --now openvino-doc-image-triage.service
+systemctl --user status openvino-doc-image-triage.service --no-pager
+```
+
 Call it with synthetic/non-private fixtures only:
 
 ```bash
@@ -114,7 +124,7 @@ curl -sS -X POST http://127.0.0.1:18829/triage \
 -d '{"path":"/home/will/lab/swarm/openvino-doc-image-triage-npu/samples/synthetic_invoice.png","options":{"allowed_roots":["/home/will/lab/swarm/openvino-doc-image-triage-npu"]}}' | jq
 ```
 
-Do not install or enable a persistent service for this prototype without explicit approval, and do not point it at private document/image directories during smoke tests.
+Do not point it at private document/image directories during smoke tests unless Will explicitly approves the exact source root.
 
 ## Smoke test
 
diff --git a/openvino-doc-image-triage-npu/openvino-doc-image-triage.service b/openvino-doc-image-triage-npu/openvino-doc-image-triage.service
new file mode 100644
index 0000000..cc8816d
--- /dev/null
+++ b/openvino-doc-image-triage-npu/openvino-doc-image-triage.service
@@ -0,0 +1,16 @@
+[Unit]
+Description=OpenVINO NPU document/image triage HTTP Service (local-only, port 18829)
+After=network.target openvino-embeddings.service
+Wants=openvino-embeddings.service
+
+[Service]
+Type=simple
+WorkingDirectory=/home/will/lab/swarm/openvino-doc-image-triage-npu
+Environment=DOC_IMAGE_TRIAGE_HOST=127.0.0.1
+Environment=DOC_IMAGE_TRIAGE_PORT=18829
+ExecStart=/home/will/.venvs/npu/bin/python /home/will/lab/swarm/openvino-doc-image-triage-npu/server.py --host 127.0.0.1 --port 18829 --allowed-root /home/will/lab/swarm/openvino-doc-image-triage-npu
+Restart=on-failure
+RestartSec=5
+
+[Install]
+WantedBy=default.target
diff --git a/openvino-genai-npu-worker/README.md b/openvino-genai-npu-worker/README.md
index 60aaed7..7a5b3a0 100644
--- a/openvino-genai-npu-worker/README.md
+++ b/openvino-genai-npu-worker/README.md
@@ -19,7 +19,7 @@ The worker does not write memory, does not restart Atlas/Hermes, does not change
 - `worker.py` — stdlib HTTP API plus CLI wrapper.
 - `smoke_llm_npu.py` — direct GenAI smoke test with NPU busy-time verification.
 - `tests/test_worker.py` — unit tests with a fake GenAI pipeline and synthetic busy-time counter.
-- `systemd/openvino-genai-npu-worker.service` — optional user-service template; not installed by this prototype.
+- `systemd/openvino-genai-npu-worker.service` — reviewed local-only user-service template for `127.0.0.1:18820`.
 
 ## Model/cache
 
@@ -129,9 +129,18 @@ OV_GENAI_NPU_PORT=18820
 
 Only `127.0.0.1` is accepted by the current prototype; wider binds require an explicit code change and approval.
 
-## Optional systemd user service
+## Systemd user service
 
-A draft unit exists at `systemd/openvino-genai-npu-worker.service` for later review. Do not copy, enable, or autostart it unless Will explicitly approves persistent service enablement. Foreground smoke on `127.0.0.1:18820` plus positive sysfs NPU busy-time delta is required before any installation discussion.
+A reviewed local-only unit exists at `systemd/openvino-genai-npu-worker.service` for persistent background use after foreground smoke succeeds with a positive NPU busy-time delta:
+
+```bash
+install -m 0644 systemd/openvino-genai-npu-worker.service ~/.config/systemd/user/openvino-genai-npu-worker.service
+systemctl --user daemon-reload
+systemctl --user enable --now openvino-genai-npu-worker.service
+systemctl --user status openvino-genai-npu-worker.service --no-pager
+```
+
+The service remains isolated: do not route primary Atlas/Hermes chat, gateway output, or automatic memory writes to it without a separate approved integration.
 
 ## Safety boundaries
 
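
Review note: several hunks above require a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before and after inference, and state that HTTP 200 alone is not proof. A minimal sketch of such a check follows. The function name `npu_busy_delta` and the `NPU_BUSY_FILE` override are illustrative and do not appear in the repository; only the sysfs path is taken from the diff.

```bash
#!/usr/bin/env bash
# Sketch only: treat NPU use as proven by a busy-time counter delta,
# not by the exit status or HTTP status of the request.
# NPU_BUSY_FILE is a hypothetical override so the check can be exercised
# against a plain file; the default is the sysfs path named in the docs.
NPU_BUSY_FILE="${NPU_BUSY_FILE:-/sys/class/accel/accel0/device/npu_busy_time_us}"

# npu_busy_delta CMD [ARGS...]
# Runs CMD, prints the counter delta in microseconds, and returns 0 only
# when CMD succeeded AND the delta is strictly positive.
npu_busy_delta() {
  local before after delta status
  before="$(cat "$NPU_BUSY_FILE")" || return 2
  "$@" >/dev/null && status=0 || status=$?
  after="$(cat "$NPU_BUSY_FILE")" || return 2
  delta=$(( after - before ))
  echo "$delta"
  [ "$status" -eq 0 ] && [ "$delta" -gt 0 ]
}
```

A call might look like `npu_busy_delta curl -sS -f http://127.0.0.1:18817/v1/embeddings -H 'Content-Type: application/json' -d '{"input":"hello"}'`, where the request body is an assumption about the embeddings API rather than something shown in this diff. Because the counter is device-wide, other NPU activity during the call can also raise it, so the check is most meaningful on an otherwise idle device.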