feat: refresh OpenVINO GenAI NPU worker prototype
@@ -18,6 +18,7 @@ The worker does not write memory, does not restart Atlas/Hermes, does not change
- `CONTRACT.md` — bounded-worker service contract, endpoint/CLI API, smoke plan, NPU verification, docs implications, and no-go criteria.
- `worker.py` — stdlib HTTP API plus CLI wrapper.
- `smoke_llm_npu.py` — direct GenAI smoke test with NPU busy-time verification.
- `tests/test_worker.py` — unit tests with a fake GenAI pipeline and synthetic busy-time counter.
- `systemd/openvino-genai-npu-worker.service` — optional user-service template; not installed by this prototype.

## Model/cache
@@ -73,15 +74,20 @@ Observed cold-ish smoke after download/cache setup:
--input 'Kanban task asks for a small OpenVINO GenAI NPU worker prototype.'
```

Exit code is non-zero if validation fails, generation fails, or the worker-reported `npu_busy_delta_us` is not positive.
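The exit-code rule can be sketched as a small pure function. This is only an illustration under assumptions, not the actual logic in the prototype: the name `smoke_exit_code` and its parameters are hypothetical, and the specific non-zero value `1` is assumed.

```python
def smoke_exit_code(validation_ok: bool, generation_ok: bool, npu_busy_delta_us: int) -> int:
    """Return 0 only when every check passes (hypothetical helper, for illustration)."""
    if not validation_ok:
        return 1  # request validation failed
    if not generation_ok:
        return 1  # generation raised or returned nothing usable
    # A zero or negative delta means the busy-time counter did not advance.
    if npu_busy_delta_us <= 0:
        return 1
    return 0
```

A caller could pass the result to `sys.exit()` so that shell scripts and CI steps see the failure.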

## HTTP usage

Start locally only:

```bash
cd /home/will/lab/swarm/openvino-genai-npu-worker
ss -ltnp | grep ':18820' && { echo 'port 18820 already in use'; exit 1; } || true
/home/will/.venvs/npu/bin/python worker.py --host 127.0.0.1 --port 18820
```

The server also refuses startup if a listener is already accepting connections on `127.0.0.1:18820`.
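One way such a startup guard could work is a quick TCP connect probe before binding. The sketch below is an assumption about the approach, not a copy of `worker.py`; the function name `listener_present` and the timeout value are hypothetical.

```python
import socket


def listener_present(host: str = "127.0.0.1", port: int = 18820, timeout: float = 0.5) -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise, without raising.
        return sock.connect_ex((host, port)) == 0
```

A server using this probe would exit with an error when it returns `True`, instead of attempting to bind.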

Endpoints:

```text
@@ -103,6 +109,26 @@ curl -s http://127.0.0.1:18820/v1/worker/generate \

Response includes `npu_busy_delta_us`; treat zero as failure even if HTTP status is 200.
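A client-side check for this rule might look like the sketch below. It assumes the response body is a JSON object with `npu_busy_delta_us` as a top-level numeric field; the response layout is not shown in this hunk, and the helper name `generation_succeeded` is hypothetical.

```python
import json


def generation_succeeded(status: int, body: str) -> bool:
    """Accept a response only if HTTP status is 200 and the NPU busy delta is positive."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    # Assumption: the field sits at the top level of the JSON object.
    delta = payload.get("npu_busy_delta_us", 0)
    return isinstance(delta, (int, float)) and delta > 0
```

Returning a plain boolean keeps the caller from mistaking a 200 response with a zero delta for a successful NPU run.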

## Unit tests

These tests use only synthetic strings and a fake GenAI pipeline, so they do not load the model or touch private data:

```bash
cd /home/will/lab/swarm/openvino-genai-npu-worker
python -m pytest -q
```
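To show what a test with a fake pipeline and a synthetic busy-time counter could look like, here is a minimal sketch. None of these names come from `tests/test_worker.py`: `FakePipeline`, `FakeBusyCounter`, and `run_generation` are hypothetical stand-ins chosen for illustration.

```python
class FakePipeline:
    """Stand-in for a GenAI pipeline; returns a canned string without loading a model."""

    def generate(self, prompt: str, max_new_tokens: int = 16) -> str:
        return f"echo: {prompt[:max_new_tokens]}"


class FakeBusyCounter:
    """Synthetic busy-time counter that advances by a fixed step on every read."""

    def __init__(self, step_us: int = 250) -> None:
        self.value_us = 0
        self.step_us = step_us

    def read(self) -> int:
        self.value_us += self.step_us
        return self.value_us


def run_generation(pipeline, counter, prompt: str) -> dict:
    """Hypothetical worker core: measure the busy-time delta around one generate call."""
    before = counter.read()
    text = pipeline.generate(prompt)
    after = counter.read()
    return {"text": text, "npu_busy_delta_us": after - before}


def test_busy_delta_is_positive():
    result = run_generation(FakePipeline(), FakeBusyCounter(), "synthetic prompt")
    assert result["text"].startswith("echo:")
    assert result["npu_busy_delta_us"] == 250
```

Injecting the pipeline and counter as arguments is what lets a test like this run without NPU hardware or model files.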

## Environment variables

```text
OV_GENAI_NPU_MODEL=/home/will/models/openvino-genai/Qwen2.5-1.5B-Instruct-int4-ov
OV_GENAI_NPU_CACHE=/home/will/.cache/openvino/genai-npu/qwen2.5-1.5b-int4
OV_GENAI_NPU_HOST=127.0.0.1
OV_GENAI_NPU_PORT=18820
```

Only `127.0.0.1` is accepted by the current prototype; wider binds require an explicit code change and approval.
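The host restriction could be enforced where the environment is read. The following is a sketch under assumptions rather than the prototype's code: `load_bind_config` is a hypothetical name, and the fallback values simply mirror the variables listed above.

```python
import os

DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 18820


def load_bind_config(env=None) -> tuple[str, int]:
    """Read host and port from the environment and refuse any non-local host."""
    env = os.environ if env is None else env
    host = env.get("OV_GENAI_NPU_HOST", DEFAULT_HOST)
    port = int(env.get("OV_GENAI_NPU_PORT", DEFAULT_PORT))
    # Only the loopback address is allowed; anything else is rejected outright.
    if host != "127.0.0.1":
        raise ValueError(f"refusing non-local bind: {host!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

Passing a plain dict as `env` makes the check easy to exercise in tests without touching the real process environment.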

## Safety boundaries

- Binds only to `127.0.0.1` by default; non-local bind is refused in code.