feat: refresh OpenVINO GenAI NPU worker prototype

This commit is contained in:
William Valentin
2026-06-04 12:13:44 -07:00
parent cb874f9743
commit b538a5a1f9
7 changed files with 253 additions and 36 deletions
+26
View File
@@ -18,6 +18,7 @@ The worker does not write memory, does not restart Atlas/Hermes, does not change
- `CONTRACT.md` — bounded-worker service contract, endpoint/CLI API, smoke plan, NPU verification, docs implications, and no-go criteria.
- `worker.py` — stdlib HTTP API plus CLI wrapper.
- `smoke_llm_npu.py` — direct GenAI smoke test with NPU busy-time verification.
- `tests/test_worker.py` — unit tests with a fake GenAI pipeline and synthetic busy-time counter.
- `systemd/openvino-genai-npu-worker.service` — optional user-service template; not installed by this prototype.
## Model/cache
@@ -73,15 +74,20 @@ Observed cold-ish smoke after download/cache setup:
--input 'Kanban task asks for a small OpenVINO GenAI NPU worker prototype.'
```
Exit code is non-zero if validation fails, generation fails, or the worker-reported `npu_busy_delta_us` is not positive.
## HTTP usage
Start locally only:
```bash
cd /home/will/lab/swarm/openvino-genai-npu-worker
ss -ltnp | grep ':18820' && { echo 'port 18820 already in use'; exit 1; } || true
/home/will/.venvs/npu/bin/python worker.py --host 127.0.0.1 --port 18820
```
The server also refuses startup if a listener is already accepting connections on `127.0.0.1:18820`.
Endpoints:
```text
@@ -103,6 +109,26 @@ curl -s http://127.0.0.1:18820/v1/worker/generate \
Response includes `npu_busy_delta_us`; treat zero as failure even if HTTP status is 200.
## Unit tests
These tests use only synthetic strings and a fake GenAI pipeline, so they do not load the model or touch private data:
```bash
cd /home/will/lab/swarm/openvino-genai-npu-worker
python -m pytest -q
```
## Environment variables
```text
OV_GENAI_NPU_MODEL=/home/will/models/openvino-genai/Qwen2.5-1.5B-Instruct-int4-ov
OV_GENAI_NPU_CACHE=/home/will/.cache/openvino/genai-npu/qwen2.5-1.5b-int4
OV_GENAI_NPU_HOST=127.0.0.1
OV_GENAI_NPU_PORT=18820
```
Only `127.0.0.1` is accepted by the current prototype; wider binds require an explicit code change and approval.
## Safety boundaries
- Binds only to `127.0.0.1` by default; non-local bind is refused in code.