feat: refresh OpenVINO GenAI NPU worker prototype
This commit is contained in:
@@ -1,6 +1,6 @@
|
||||
# Bounded OpenVINO GenAI NPU worker contract
|
||||
|
||||
Status: proposed prototype contract; not a live Atlas/Hermes routing dependency.
|
||||
Status: prototype contract implemented locally; not a live Atlas/Hermes routing dependency.
|
||||
Default address: `http://127.0.0.1:18820`.
|
||||
|
||||
## Purpose and hard boundary
|
||||
@@ -167,7 +167,7 @@ Validation/error behavior:
|
||||
|
||||
- Unsupported path: `404` JSON `{"error":"not found"}`.
|
||||
- Unsupported job, empty input, too-long input, invalid token bound, missing model, or generation failure: JSON `{"error":"..."}` with non-2xx preferred for future implementations. The current stdlib prototype returns `400` for these errors.
|
||||
- If `npu_busy_delta_us <= 0`, the response should be treated as failed by smoke tests even if an HTTP handler emitted `200`.
|
||||
- If `npu_busy_delta_us <= 0`, the response should be treated as failed by smoke tests even if an HTTP handler emitted `200`; the refreshed prototype returns `503` with the generation payload plus an `error` field.
|
||||
|
||||
## Prompt/job contract
|
||||
|
||||
@@ -253,6 +253,13 @@ Also verify the temporary listener is gone:
|
||||
ss -ltnp | grep ':18820' && { echo 'temporary smoke server still running'; exit 1; } || true
|
||||
```
|
||||
|
||||
Unit tests that do not load the model or require private data:
|
||||
|
||||
```bash
|
||||
cd /home/will/lab/swarm/openvino-genai-npu-worker
|
||||
python -m pytest -q
|
||||
```
|
||||
|
||||
## NPU busy-time verification plan
|
||||
|
||||
Acceptance for any NPU claim requires all of the following:
|
||||
|
||||
Reference in New Issue
Block a user