[verified] refresh OpenVINO router classifier prototype
@@ -191,7 +191,7 @@ Response:
Batch limits for prototype review:

- Keep batches small, ideally <= 32 items.
- Keep batches small; the prototype rejects empty batches and batches larger than `OPENVINO_CLASSIFIER_MAX_BATCH_SIZE` (default `32`).
- Use only synthetic fixtures unless Will explicitly approves a real non-private sample set.
- Do not retain request bodies to disk.
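The batch limit described above could be enforced with a check along these lines. This is a minimal sketch, not the prototype's actual code: `resolve_max_batch_size` and `validate_batch` are hypothetical names, and only the environment variable and its default of `32` come from the text.

```python
import os

DEFAULT_MAX_BATCH_SIZE = 32  # default stated in the batch-limit text


def resolve_max_batch_size(environ=None):
    """Hypothetical helper: read OPENVINO_CLASSIFIER_MAX_BATCH_SIZE, else use 32."""
    env = os.environ if environ is None else environ
    raw = env.get("OPENVINO_CLASSIFIER_MAX_BATCH_SIZE")
    limit = DEFAULT_MAX_BATCH_SIZE if raw is None else int(raw)
    if limit < 1:
        raise ValueError("max batch size must be at least 1")
    return limit


def validate_batch(items, max_batch_size):
    """Hypothetical helper: reject empty batches and batches over the limit."""
    if len(items) == 0:
        raise ValueError("batch is empty")
    if len(items) > max_batch_size:
        raise ValueError(
            f"batch has {len(items)} items; limit is {max_batch_size}"
        )
    return items
```

Raising `ValueError` is an assumption made for the sketch; how the prototype reports a rejected batch to the HTTP client is not shown in this diff.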
@@ -213,6 +213,7 @@ Required flags/env:
- `--port` / `OPENVINO_CLASSIFIER_PORT`; default `18819`.
- `--embed-url` / `OPENVINO_CLASSIFIER_EMBED_URL`; default `http://127.0.0.1:18817/v1/embeddings`.
- `--timeout-s` / `OPENVINO_CLASSIFIER_TIMEOUT_S`; default `30`.
- `--max-batch-size` / `OPENVINO_CLASSIFIER_MAX_BATCH_SIZE`; default `32`.
- `--no-warmup` to defer prototype embedding until first request.

A future dedicated CLI mode may be added for one-shot JSONL classification, but foreground HTTP review is sufficient for the dry-run contract.
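The flag and environment-variable pairs listed above could be wired up with `argparse` roughly as follows. This is a sketch under assumptions: `build_parser` is a hypothetical name, it covers only the flags visible in this hunk (the list may include others above it), and the precedence of command-line flag over environment variable over built-in default is assumed rather than confirmed by the diff.

```python
import argparse
import os


def build_parser(environ=None):
    """Hypothetical parser: CLI flag overrides env var, env var overrides default."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(
        description="Sketch of the classifier service flags"
    )
    parser.add_argument(
        "--port", type=int,
        default=int(env.get("OPENVINO_CLASSIFIER_PORT", "18819")),
    )
    parser.add_argument(
        "--embed-url",
        default=env.get(
            "OPENVINO_CLASSIFIER_EMBED_URL",
            "http://127.0.0.1:18817/v1/embeddings",
        ),
    )
    parser.add_argument(
        "--timeout-s", type=float,
        default=float(env.get("OPENVINO_CLASSIFIER_TIMEOUT_S", "30")),
    )
    parser.add_argument(
        "--max-batch-size", type=int,
        default=int(env.get("OPENVINO_CLASSIFIER_MAX_BATCH_SIZE", "32")),
    )
    # No env var is listed for this flag, so it is CLI-only in this sketch.
    parser.add_argument("--no-warmup", action="store_true")
    return parser
```

Passing `environ` explicitly keeps the sketch testable without touching the real process environment; for example, `build_parser({}).parse_args([])` yields the defaults listed above.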
@@ -280,6 +281,13 @@ echo "$response" | jq '{npu_busy_delta_us, sysfs_npu_busy_delta_us, warnings}'
echo "outer_sysfs_npu_busy_delta_us=$((after-before))"
```

Optional localhost smoke helper, after starting the foreground service:

```bash
/home/will/.venvs/npu/bin/python openvino-classifier-npu/smoke_classifier.py \
  --base-url http://127.0.0.1:18819
```

Acceptance for an NPU-backed classification request:

- HTTP request succeeds.