[verified] refresh OpenVINO NPU reranker prototype
@@ -13,8 +13,9 @@ This service is intentionally not wired into live RAG by default.
## Files

- `SPEC.md`: endpoint/CLI contract, model/runtime recommendation, smoke/NPU proof plan, RAG integration plan, docs implications, and no-go criteria.
- `server.py`: stdlib HTTP OpenVINO Runtime service with fail-fast localhost listener conflict checks and request validation.
- `smoke.py`: non-private API/ranking/NPU busy-time smoke test.
- `tests/test_server_validation.py`: stdlib unit checks for request validation and listener conflict detection.
- `openvino-reranker.service`: optional user-systemd unit.

## One-time setup
@@ -62,7 +63,7 @@ OPENVINO_RERANKER_MODEL_DIR=/home/will/.cache/openvino-models/rerankers/ms-marco
python /home/will/lab/swarm/openvino-reranker-npu/server.py
```

Startup performs a non-private smoke inference and fails closed when `OPENVINO_RERANKER_DEVICE=NPU` but `npu_busy_time_us` does not increase. It also checks whether the requested listener can bind before compiling the OpenVINO model, so obvious port conflicts fail fast; the real server bind still happens immediately after model load.
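One way to implement the pre-compile bind check is to open a throwaway socket, try to bind the configured host and port, and close it again before the model is compiled. This is a minimal sketch, assuming `server.py` builds on `http.server` (whose `HTTPServer` sets `SO_REUSEADDR`). `listener_conflict` is a hypothetical helper name, not the function in `server.py`.

```python
import errno
import socket
from typing import Optional


def listener_conflict(host: str, port: int) -> Optional[str]:
    """Hypothetical helper: return a reason if (host, port) cannot be bound now.

    Returns None when the address is free. The probe socket is closed before
    returning, so the real server can bind the same address afterwards.
    """
    # Pick the address family from the host literal (IPv6 literals contain ':').
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    probe = socket.socket(family, socket.SOCK_STREAM)
    try:
        # Mirror http.server.HTTPServer, which sets SO_REUSEADDR. On Linux,
        # leftover TIME_WAIT sockets are then not reported, but a live
        # listener on the same address still makes bind() fail.
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return f"{host}:{port} is already in use"
        return f"cannot bind {host}:{port}: {exc}"
    finally:
        probe.close()
    return None
```

At startup, a check like this would run before `openvino.Core().compile_model(...)` and exit non-zero when it returns a message. It cannot rule out a race, because another process can still take the port between the probe and the real bind. That is why a failure of the real bind after model load still has to be handled.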
## API
@@ -110,6 +111,16 @@ Expected:
- The top result matches the non-private fixture expectation.
- Response and sysfs `npu_busy_delta_us` are positive.
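Both the startup fail-closed rule and this smoke expectation come down to one comparison: read the cumulative busy counter, run a single non-private inference, read the counter again, and require a positive difference. This is a minimal sketch, assuming the counter is a sysfs file holding an integer microsecond count. The default path and the helper names are assumptions for illustration, not taken from `server.py` or `smoke.py`.

```python
from pathlib import Path
from typing import Callable

# Assumed sysfs location of the cumulative busy counter. The real path depends
# on the kernel driver and the accel node number, so pass it explicitly.
DEFAULT_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = DEFAULT_BUSY_PATH) -> int:
    """Read the cumulative NPU busy time, in microseconds."""
    return int(path.read_text().strip())


def require_npu_activity(run_inference: Callable[[], object],
                         path: Path = DEFAULT_BUSY_PATH) -> int:
    """Run one inference; fail closed unless the busy counter advanced.

    Returns the positive delta in microseconds (the npu_busy_delta_us value).
    """
    before = read_busy_us(path)
    run_inference()
    delta = read_busy_us(path) - before
    if delta <= 0:
        # Correct-looking output alone does not prove the NPU ran, so refuse.
        raise RuntimeError(
            f"npu_busy_time_us did not increase (delta={delta}); "
            "not treating this as NPU execution")
    return delta
```

In the service, `run_inference` would wrap the smoke inference on the compiled model. The counter is presumably device-wide, so a positive delta shows that the NPU did work during the window. It does not prove that this process caused that work.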
## Validation checks

```bash
source /home/will/.venvs/openvino-reranker/bin/activate
PYTHONPATH=/home/will/lab/swarm/openvino-reranker-npu \
python -m unittest discover -s /home/will/lab/swarm/openvino-reranker-npu/tests
```

These checks do not compile the OpenVINO model; they cover request validation and fail-fast listener conflict detection.
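A validation test in this suite can exercise the request checks without touching OpenVINO. The sketch is hypothetical throughout. The request schema (a `query` string plus a list of `documents`), `validate_rerank_request`, and `ValidationTests` are placeholders, not the service's actual API, which is defined in the API section and `SPEC.md`.

```python
import unittest


def validate_rerank_request(payload):
    """Hypothetical validator: return an error string, or None if valid.

    Assumed schema: {"query": non-empty str, "documents": non-empty list of str}.
    """
    if not isinstance(payload, dict):
        return "body must be a JSON object"
    query = payload.get("query")
    if not isinstance(query, str) or not query.strip():
        return "query must be a non-empty string"
    docs = payload.get("documents")
    if not isinstance(docs, list) or not docs:
        return "documents must be a non-empty list"
    if not all(isinstance(d, str) for d in docs):
        return "every document must be a string"
    return None


class ValidationTests(unittest.TestCase):
    def test_accepts_minimal_request(self):
        self.assertIsNone(validate_rerank_request(
            {"query": "q", "documents": ["a", "b"]}))

    def test_rejects_bad_shapes(self):
        bad = [
            [],                                     # not an object
            {"documents": ["a"]},                   # missing query
            {"query": "  ", "documents": ["a"]},    # blank query
            {"query": "q", "documents": []},        # no documents
            {"query": "q", "documents": ["a", 3]},  # non-string document
        ]
        for payload in bad:
            # subTest reports each failing payload separately.
            with self.subTest(payload=payload):
                self.assertIsNotNone(validate_rerank_request(payload))
```

If a module like this is saved under `tests/` with a `test*.py` name, the `unittest discover` command picks it up, because discovery matches that pattern by default.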
## Optional systemd user service
Install the unit only after the foreground command and smoke test pass: