feat(npu): add dry-run classifier router prototype
@@ -0,0 +1,141 @@
# OpenVINO NPU router classifier prototype

Dry-run Atlas/Hermes message classifier/router prototype.

The detailed dry-run contract is in [`CONTRACT.md`](./CONTRACT.md), including the
recommended model/runtime, HTTP/CLI schema, smoke-test plan, NPU busy-time proof,
docs/diagram implications, and no-go/defer criteria.

It reuses the existing OpenVINO NPU embeddings service on `127.0.0.1:18817` and
serves an inspectable stdlib HTTP API on `127.0.0.1:18819`. It does not change
live Hermes/Atlas routing, write memory, mutate vector collections, restart
services, or send external messages.

## Runtime shape

- Service: `atlas-router-classifier`
- Default port: `18819`
- Default bind: `127.0.0.1`
- Upstream: `http://127.0.0.1:18817/v1/embeddings`
- Batch limit: `OPENVINO_CLASSIFIER_MAX_BATCH_SIZE`, default `32`
- Model label: `bge-base-en-v1.5-int8-ov/prototype-router-v0`
- NPU proof: `/sys/class/accel/accel0/device/npu_busy_time_us` before/after plus upstream `npu_busy_delta_us`

The classifier uses deterministic high-precision rules for safety/urgency/tool
signals plus cosine similarity against curated embedding prototypes for workflow
and memory recommendations. This design is intentional: rules, prototypes, and
thresholds can be tuned without model training.

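To illustrate the prototype-matching half of the paragraph above, here is a minimal sketch of nearest-prototype labelling by cosine similarity. The function names, the `0.5` threshold, and the toy 3-dimensional vectors are assumptions for illustration only; the real prototypes and thresholds live in the classifier and are not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_label(embedding, prototypes, threshold=0.5, fallback="unknown"):
    """Return (label, score) for the closest prototype, or the fallback label
    when no prototype reaches the (hypothetical) threshold."""
    best = (fallback, 0.0)
    for label, vectors in prototypes.items():
        for vec in vectors:
            score = cosine_similarity(embedding, vec)
            if score > best[1]:
                best = (label, score)
    return best if best[1] >= threshold else (fallback, best[1])


# Toy vectors standing in for real embeddings from the upstream service.
PROTOTYPES = {
    "coding": [[1.0, 0.0, 0.0]],
    "devops": [[0.0, 1.0, 0.0]],
}

label, score = best_label([0.9, 0.1, 0.0], PROTOTYPES)
print(label, round(score, 3))
```

Tuning in this scheme means editing the prototype lists or the threshold, which is consistent with the "no model training" goal stated above.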
## API

### GET `/healthz`

Returns service metadata, labels, prototype count, NPU sysfs counter, and warmup
NPU delta.

### GET `/v1/labels`

Returns label enum values, thresholds, and prototype IDs without dumping private
fixtures.

### POST `/v1/classify`

Request:

```json
{
  "id": "optional trace id",
  "text": "User message or task body to classify.",
  "context": {"platform": "cli", "source": "user"},
  "options": {
    "include_evidence": true,
    "include_embedding_debug": false,
    "dry_run": true
  }
}
```

Response includes:

- `labels.tool_needed`: boolean, confidence, threshold, reason codes
- `labels.memory_candidate`: `none | user_preference | durable_user_fact | environment_fact | workflow_convention | skill_candidate`
- `labels.urgency`: `low | normal | high | critical`
- `labels.workflow_category`: `chat | research | coding | debugging | devops | smart_home | media | note_taking | productivity | kanban | unknown`
- `labels.safety_confirmation_required`: boolean, confidence, reason codes
- `npu_busy_delta_us` and `sysfs_npu_busy_delta_us`
- `evidence` when requested

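A stdlib-only client for this endpoint might look like the sketch below. The helper names, the 10-second timeout, and the assumption that `npu_busy_delta_us` sits at the top level of the response are illustrative guesses; `CONTRACT.md` remains the authority on the exact response shape.

```python
import json
import urllib.request

# Default bind and port from this README.
CLASSIFY_URL = "http://127.0.0.1:18819/v1/classify"


def build_classify_request(text, trace_id=None, include_evidence=True):
    """Build a request body matching the documented schema."""
    payload = {
        "text": text,
        "context": {"platform": "cli", "source": "user"},
        "options": {
            "include_evidence": include_evidence,
            "include_embedding_debug": False,
            "dry_run": True,
        },
    }
    if trace_id is not None:
        payload["id"] = trace_id
    return payload


def classify(text, url=CLASSIFY_URL, timeout_s=10.0):
    """POST one message to the classifier and return the decoded JSON."""
    body = json.dumps(build_classify_request(text)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


def summarize(response):
    """Pull out a few fields; label values are passed through unchanged
    because their exact shape is not assumed here."""
    labels = response.get("labels", {})
    return {
        "urgency": labels.get("urgency"),
        "workflow_category": labels.get("workflow_category"),
        # Assumption: the delta is a top-level integer field.
        "npu_backed": response.get("npu_busy_delta_us", 0) > 0,
    }
```

With the service running, `summarize(classify("What time is it?"))` would give a compact view of one decision without acting on it.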
### POST `/v1/batch_classify`

Request:

```json
{
  "items": [{"id": "m1", "text": "What time is it?"}],
  "options": {"include_evidence": false, "dry_run": true}
}
```

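Because the service has a batch limit (`OPENVINO_CLASSIFIER_MAX_BATCH_SIZE`, default `32`), a caller with more items may want to split them client-side. The sketch below shows one way to do that; how the server responds to an oversized batch is not described here, and the helper names are hypothetical.

```python
def chunk_items(items, max_batch_size=32):
    """Split a list of items into consecutive chunks of at most max_batch_size."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    return [
        items[start:start + max_batch_size]
        for start in range(0, len(items), max_batch_size)
    ]


def build_batch_requests(items, max_batch_size=32):
    """Build one /v1/batch_classify request body per chunk."""
    return [
        {"items": chunk, "options": {"include_evidence": False, "dry_run": True}}
        for chunk in chunk_items(items, max_batch_size)
    ]


# 70 synthetic messages become three request bodies (32 + 32 + 6 items).
messages = [{"id": f"m{i}", "text": f"message {i}"} for i in range(70)]
requests_to_send = build_batch_requests(messages)
print([len(body["items"]) for body in requests_to_send])
```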
## Local smoke test

Check that the proposed port is free first:

```bash
ss -ltnp | grep ':18819' || true
```

Run without installing anything extra; `/home/will/.venvs/npu` already has the
stdlib plus requests/openvino stack used by the upstream embeddings service:

```bash
cd /home/will/lab/swarm/openvino-classifier-npu
/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819
```

Environment variables mirror the flags: `OPENVINO_CLASSIFIER_HOST`,
`OPENVINO_CLASSIFIER_PORT`, `OPENVINO_CLASSIFIER_EMBED_URL`,
`OPENVINO_CLASSIFIER_TIMEOUT_S`, and `OPENVINO_CLASSIFIER_MAX_BATCH_SIZE`.

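One common way to make environment variables mirror flags is to use each variable as the flag's default, so an explicit flag still wins. The sketch below assumes that pattern. Only `--host` and `--port` appear in this README; the `--embed-url`, `--timeout-s`, and `--max-batch-size` flag names and the 10-second timeout default are guesses, not the actual interface of `router_classifier.py`.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults come from OPENVINO_CLASSIFIER_* variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Dry-run router classifier (sketch)")
    parser.add_argument(
        "--host", default=env.get("OPENVINO_CLASSIFIER_HOST", "127.0.0.1"))
    parser.add_argument(
        "--port", type=int,
        default=int(env.get("OPENVINO_CLASSIFIER_PORT", "18819")))
    # The three flag names below are hypothetical.
    parser.add_argument(
        "--embed-url",
        default=env.get("OPENVINO_CLASSIFIER_EMBED_URL",
                        "http://127.0.0.1:18817/v1/embeddings"))
    parser.add_argument(
        "--timeout-s", type=float,
        default=float(env.get("OPENVINO_CLASSIFIER_TIMEOUT_S", "10")))  # assumed default
    parser.add_argument(
        "--max-batch-size", type=int,
        default=int(env.get("OPENVINO_CLASSIFIER_MAX_BATCH_SIZE", "32")))
    return parser


# Environment sets the port; the explicit flag overrides it.
args = build_parser({"OPENVINO_CLASSIFIER_PORT": "19000"}).parse_args(["--port", "18820"])
print(args.host, args.port, args.max_batch_size)
```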
Then from another shell:

```bash
curl -fsS http://127.0.0.1:18819/healthz | jq .
curl -fsS http://127.0.0.1:18819/v1/classify \
  -H 'Content-Type: application/json' \
  -d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true}}' | jq .
```

A valid NPU-backed response must have positive `npu_busy_delta_us`; HTTP 200 by
itself is not considered proof.

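The same proof can be taken from outside the service by reading the sysfs counter before and after a request. The sketch below assumes the counter file holds a single integer in microseconds; the helper names are hypothetical. Since the counter presumably covers the whole device, other NPU workloads running at the same time could inflate the delta.

```python
from pathlib import Path

# Counter path from the "Runtime shape" section of this README.
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path=NPU_BUSY_PATH):
    """Read the NPU busy-time counter as an integer number of microseconds."""
    return int(Path(path).read_text().strip())


def measure_busy_delta(action, path=NPU_BUSY_PATH):
    """Run action() and return (result, counter delta in microseconds)."""
    before = read_busy_us(path)
    result = action()
    after = read_busy_us(path)
    return result, after - before


def is_npu_proof(delta_us):
    """A response only counts as NPU-backed when the delta is positive."""
    return delta_us > 0
```

A caller would pass the HTTP request as `action`, for example `measure_busy_delta(lambda: send_request())`, where `send_request` is whatever function performs the classify call.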
Synthetic fixture smoke helper, after the foreground service is running:

```bash
/home/will/.venvs/npu/bin/python smoke_classifier.py --base-url http://127.0.0.1:18819
```

The helper refuses non-local URLs, checks fixture label expectations, and prints
response plus outer sysfs NPU busy deltas.

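A "local URLs only" guard like the one described above could be written as follows. This is a sketch of the idea, not the check that `smoke_classifier.py` actually performs; the function name and the decision to accept `localhost` by name are assumptions.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_url(url):
    """Return True only for http(s) URLs whose host is a loopback address."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name), so treat it as non-local.
        return False


for candidate in ("http://127.0.0.1:18819", "http://[::1]:18819", "http://example.com"):
    print(candidate, is_local_url(candidate))
```

Rejecting DNS names outright avoids a lookup, at the cost of refusing hostnames that happen to resolve to loopback.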
## Tests

Unit tests use a fake embedding client and do not touch the NPU. The discovery
path is relative, so run this from the directory that contains
`openvino-classifier-npu/`:

```bash
/home/will/.venvs/npu/bin/python -m unittest discover -s openvino-classifier-npu/tests -v
```

Fixture messages live at `fixtures/atlas_hermes_messages.jsonl`.

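For readers unfamiliar with the fake-client pattern, the sketch below shows the general idea: a stand-in client returns canned vectors so the test never reaches the embeddings service or the NPU. `FakeEmbeddingClient`, its `embed` method, and `classify_workflow` are hypothetical names and do not describe the project's real test suite.

```python
import unittest


class FakeEmbeddingClient:
    """Returns canned vectors and records calls instead of doing HTTP."""

    def __init__(self, vectors):
        self.vectors = vectors
        self.calls = []

    def embed(self, texts):
        self.calls.append(list(texts))
        return [self.vectors[text] for text in texts]


def classify_workflow(client, text, prototypes):
    """Pick the prototype label with the largest dot product (toy scorer)."""
    (embedding,) = client.embed([text])
    scores = {
        label: sum(x * y for x, y in zip(embedding, vector))
        for label, vector in prototypes.items()
    }
    return max(scores, key=scores.get)


class FakeClientTest(unittest.TestCase):
    def test_picks_nearest_prototype_without_network(self):
        client = FakeEmbeddingClient({"fix the failing build": [1.0, 0.0]})
        prototypes = {"coding": [1.0, 0.0], "media": [0.0, 1.0]}
        label = classify_workflow(client, "fix the failing build", prototypes)
        self.assertEqual(label, "coding")
        # The fake saw exactly one batch containing the one message.
        self.assertEqual(client.calls, [["fix the failing build"]])
```

Saved as a test module, this runs under the same `python -m unittest` runner used above.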
## Optional systemd user unit

A draft unit is included as `openvino-router-classifier.service`. Install only
after review/approval:

```bash
# Create the user unit directory if it does not exist yet.
mkdir -p ~/.config/systemd/user
cp openvino-router-classifier.service ~/.config/systemd/user/openvino-router-classifier.service
systemctl --user daemon-reload
systemctl --user start openvino-router-classifier.service
systemctl --user status openvino-router-classifier.service --no-pager
```

Do not enable it at boot or connect it to live Atlas/Hermes routing as part of this prototype task without explicit approval. Keep classifier decisions dry-run until a separate approved routing change lands.