feat: refresh OpenVINO GenAI NPU worker prototype

William Valentin
2026-06-04 12:13:44 -07:00
parent cb874f9743
commit b538a5a1f9
7 changed files with 253 additions and 36 deletions
+9 -2
@@ -1,6 +1,6 @@
# Bounded OpenVINO GenAI NPU worker contract
-Status: proposed prototype contract; not a live Atlas/Hermes routing dependency.
+Status: prototype contract implemented locally; not a live Atlas/Hermes routing dependency.
Default address: `http://127.0.0.1:18820`.
## Purpose and hard boundary
@@ -167,7 +167,7 @@ Validation/error behavior:
- Unsupported path: `404` JSON `{"error":"not found"}`.
- Unsupported job, empty input, too-long input, invalid token bound, missing model, or generation failure: JSON `{"error":"..."}` with non-2xx preferred for future implementations. The current stdlib prototype returns `400` for these errors.
-- If `npu_busy_delta_us <= 0`, the response should be treated as failed by smoke tests even if an HTTP handler emitted `200`.
+- If `npu_busy_delta_us <= 0`, the response should be treated as failed by smoke tests even if an HTTP handler emitted `200`; the refreshed prototype returns `503` with the generation payload plus an `error` field.
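The failure rules above could be applied on the smoke-test side roughly as follows. This is a minimal sketch, not the prototype's actual test code: `smoke_check` is a hypothetical helper name, and only the `error` and `npu_busy_delta_us` fields come from the contract text.

```python
from typing import Any, Mapping


def smoke_check(status: int, payload: Mapping[str, Any]) -> tuple[bool, str]:
    """Return (passed, reason) for one worker response.

    Hypothetical helper: it mirrors the contract's rules that an `error`
    field, a non-2xx status, or a non-positive `npu_busy_delta_us` all
    count as failure, even when the HTTP handler emitted 200.
    """
    if "error" in payload:
        return False, f"worker reported error: {payload['error']}"
    if not 200 <= status < 300:
        return False, f"unexpected HTTP status {status}"
    busy = payload.get("npu_busy_delta_us")
    # A missing, non-numeric, or boolean value is treated like zero busy time.
    if isinstance(busy, bool) or not isinstance(busy, (int, float)) or busy <= 0:
        return False, f"no NPU busy time recorded: {busy!r}"
    return True, "ok"
```

Checking the busy-time delta independently of the status code keeps the test valid against both the older behavior (`200` with a non-positive delta) and the refreshed prototype (`503` plus an `error` field).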
## Prompt/job contract
@@ -253,6 +253,13 @@ Also verify the temporary listener is gone:
ss -ltnp | grep ':18820' && { echo 'temporary smoke server still running'; exit 1; } || true
```
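The same listener check could be done from Python, for example inside a test teardown. This is a sketch under assumptions: `port_is_listening` is a hypothetical helper, and it only tests whether a TCP connection to the loopback address succeeds, which is narrower than the `ss -ltnp` listing above.

```python
import socket


def port_is_listening(host: str = "127.0.0.1", port: int = 18820,
                      timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper; the default port is the contract's default address.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable: nothing is serving.
        return False
```

A teardown step might then use `assert not port_is_listening()` to fail loudly when the temporary smoke server is still running.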
Unit tests that do not load the model or require private data:
```bash
cd /home/will/lab/swarm/openvino-genai-npu-worker
python -m pytest -q
```
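A model-free unit test of this kind might look like the sketch below. Everything in it is illustrative rather than taken from the repository: `validate_request`, the `job`, `input`, and `max_new_tokens` field names, the `summarize` job name, and the numeric limits are all hypothetical stand-ins for the validation cases the contract lists (unsupported job, empty input, too-long input, invalid token bound).

```python
from typing import Any, Mapping, Optional

# Hypothetical job names and limits; the real values belong to the worker contract.
ALLOWED_JOBS = {"summarize"}
MAX_INPUT_CHARS = 4000
MAX_NEW_TOKENS = 256


def validate_request(body: Mapping[str, Any]) -> Optional[str]:
    """Return an error message, or None when the request is acceptable."""
    if body.get("job") not in ALLOWED_JOBS:
        return "unsupported job"
    text = body.get("input")
    if not isinstance(text, str) or not text.strip():
        return "empty input"
    if len(text) > MAX_INPUT_CHARS:
        return "input too long"
    tokens = body.get("max_new_tokens", MAX_NEW_TOKENS)
    if isinstance(tokens, bool) or not isinstance(tokens, int):
        return "invalid token bound"
    if not 1 <= tokens <= MAX_NEW_TOKENS:
        return "invalid token bound"
    return None


def test_rejects_bad_requests():
    assert validate_request({"job": "nope", "input": "hi"}) == "unsupported job"
    assert validate_request({"job": "summarize", "input": "  "}) == "empty input"
    too_long = "x" * (MAX_INPUT_CHARS + 1)
    assert validate_request({"job": "summarize", "input": too_long}) == "input too long"
    bad_bound = {"job": "summarize", "input": "hi", "max_new_tokens": 0}
    assert validate_request(bad_bound) == "invalid token bound"


def test_accepts_minimal_request():
    assert validate_request({"job": "summarize", "input": "hello"}) is None
```

Because the validator is a pure function, these tests run without loading a model, touching the NPU, or reading private data.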
## NPU busy-time verification plan
Acceptance for any NPU claim requires all of the following: