[verified] refresh OpenVINO router classifier prototype

2026-06-04 12:14:26 -07:00
parent 4cf3414fdb
commit 1413cfd888
6 changed files with 196 additions and 10 deletions
@@ -191,7 +191,7 @@ Response:

 Batch limits for prototype review:

-- Keep batches small, ideally <= 32 items.
+- Keep batches small; the prototype rejects empty batches and batches larger than `OPENVINO_CLASSIFIER_MAX_BATCH_SIZE` (default `32`).
 - Use only synthetic fixtures unless Will explicitly approves a real non-private sample set.
 - Do not retain request bodies to disk.

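The batch rule in the hunk above amounts to a two-sided length check. Below is a minimal Python sketch of that check. It assumes the request body has already been parsed into a list of items. `validate_batch` and `BatchSizeError` are hypothetical names, not the prototype's actual code.

```python
# Documented default for OPENVINO_CLASSIFIER_MAX_BATCH_SIZE.
DEFAULT_MAX_BATCH_SIZE = 32


class BatchSizeError(ValueError):
    """Hypothetical error raised when a batch violates the size limits."""


def validate_batch(items, max_batch_size=DEFAULT_MAX_BATCH_SIZE):
    """Reject empty batches and batches larger than max_batch_size.

    Returns the items unchanged when the batch is acceptable.
    """
    if max_batch_size < 1:
        # A limit below 1 would reject every batch, so treat it as misconfiguration.
        raise ValueError("max_batch_size must be at least 1")
    if len(items) == 0:
        raise BatchSizeError("batch is empty")
    if len(items) > max_batch_size:
        raise BatchSizeError(
            f"batch has {len(items)} items; limit is {max_batch_size}"
        )
    return items
```

With the default limit, a 32-item batch passes, while an empty list or a 33-item list raises `BatchSizeError`.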
@@ -213,6 +213,7 @@ Required flags/env:
 - `--port` / `OPENVINO_CLASSIFIER_PORT`; default `18819`.
 - `--embed-url` / `OPENVINO_CLASSIFIER_EMBED_URL`; default `http://127.0.0.1:18817/v1/embeddings`.
 - `--timeout-s` / `OPENVINO_CLASSIFIER_TIMEOUT_S`; default `30`.
+- `--max-batch-size` / `OPENVINO_CLASSIFIER_MAX_BATCH_SIZE`; default `32`.
 - `--no-warmup` to defer prototype embedding until first request.

 A future dedicated CLI mode may be added for one-shot JSONL classification, but foreground HTTP review is sufficient for the dry-run contract.
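The flag/env pairs listed above could be resolved as in this Python sketch. It assumes the common convention that an explicit flag overrides the environment variable, which in turn overrides the documented default. The doc does not state this precedence. `parse_config` is a hypothetical helper, not the service's real entry point. The `float` type for `--timeout-s` and the `>= 1` check on the batch size are also assumptions.

```python
import argparse
import os

# Documented defaults for the foreground service.
DEFAULTS = {
    "port": 18819,
    "embed_url": "http://127.0.0.1:18817/v1/embeddings",
    "timeout_s": 30.0,
    "max_batch_size": 32,
}

# Documented env var backing each flag.
ENV_VARS = {
    "port": "OPENVINO_CLASSIFIER_PORT",
    "embed_url": "OPENVINO_CLASSIFIER_EMBED_URL",
    "timeout_s": "OPENVINO_CLASSIFIER_TIMEOUT_S",
    "max_batch_size": "OPENVINO_CLASSIFIER_MAX_BATCH_SIZE",
}

# Assumed value types; float for the timeout is a guess.
TYPES = {"port": int, "embed_url": str, "timeout_s": float, "max_batch_size": int}


def parse_config(argv=None, env=None):
    """Resolve settings: CLI flag > env var > documented default (assumed order)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="OpenVINO classifier prototype (sketch)")
    for name, caster in TYPES.items():
        # A set env var replaces the built-in default; an explicit flag still wins.
        raw = env.get(ENV_VARS[name])
        default = caster(raw) if raw is not None else DEFAULTS[name]
        parser.add_argument(
            "--" + name.replace("_", "-"), dest=name, type=caster, default=default
        )
    parser.add_argument(
        "--no-warmup",
        action="store_true",
        help="defer prototype embedding until the first request",
    )
    args = parser.parse_args(argv)
    if args.max_batch_size < 1:
        # parser.error prints usage and exits with status 2.
        parser.error("--max-batch-size must be at least 1")
    return args
```

For example, `OPENVINO_CLASSIFIER_MAX_BATCH_SIZE=8` combined with `--max-batch-size 16` resolves to 16 under this precedence.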
@@ -280,6 +281,13 @@ echo "$response" | jq '{npu_busy_delta_us, sysfs_npu_busy_delta_us, warnings}'
 echo "outer_sysfs_npu_busy_delta_us=$((after-before))"
 ```

+Optional localhost smoke-test helper; run it after starting the foreground service:
+
+```bash
+/home/will/.venvs/npu/bin/python openvino-classifier-npu/smoke_classifier.py \
+  --base-url http://127.0.0.1:18819
+```
+
 Acceptance for an NPU-backed classification request:

 - HTTP request succeeds.