OpenVINO NPU router classifier prototype
Dry-run Atlas/Hermes message classifier/router prototype.
It reuses the existing OpenVINO NPU embeddings service on 127.0.0.1:18817 and
serves an inspectable stdlib HTTP API on 127.0.0.1:18819. It does not change
live Hermes/Atlas routing, write memory, mutate vector collections, restart
services, or send external messages.
Runtime shape
- Service: atlas-router-classifier
- Default port: 18819
- Default bind: 127.0.0.1
- Upstream: http://127.0.0.1:18817/v1/embeddings
- Model label: bge-base-en-v1.5-int8-ov/prototype-router-v0
- NPU proof: /sys/class/accel/accel0/device/npu_busy_time_us before/after, plus the upstream npu_busy_delta_us
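The NPU proof compares the sysfs busy-time counter before and after a request. The sketch below shows one way to do that measurement. It assumes the counter file holds a single integer of accumulated busy time in microseconds. `read_busy_us` and `measure_busy_delta` are hypothetical helper names, not the service's actual code.

```python
from pathlib import Path
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Path from the runtime notes; assumed to contain one integer (microseconds).
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = NPU_BUSY_PATH) -> int:
    """Read the cumulative NPU busy-time counter (hypothetical helper)."""
    return int(path.read_text().strip())


def measure_busy_delta(fn: Callable[[], T], path: Path = NPU_BUSY_PATH) -> Tuple[T, int]:
    """Run fn() and return its result plus the counter delta across the call."""
    before = read_busy_us(path)
    result = fn()
    after = read_busy_us(path)
    return result, after - before
```

The counter belongs to the device, not to one process. A positive delta can therefore also include work from other NPU clients running in the same window.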
The classifier combines two mechanisms:

- Deterministic high-precision rules for safety, urgency and tool-need signals.
- Cosine similarity against curated embedding prototypes for workflow and memory recommendations.

Both can be tuned by editing rules or prototypes, with no model training required.
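The sketch below illustrates the two-stage idea. It assumes prototypes are stored as (label, vector) pairs and compared against a single similarity threshold. The regex patterns, prototype labels and threshold value are illustrative placeholders, not the service's curated rules or prototypes.

```python
import math
import re
from typing import List, Sequence, Tuple

# Illustrative high-precision rules: the first regex hit decides the urgency label.
URGENCY_RULES: List[Tuple[str, re.Pattern]] = [
    ("critical", re.compile(r"\b(outage|data loss|production down)\b", re.I)),
    ("high", re.compile(r"\burgent\b|\basap\b", re.I)),
]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rule_urgency(text: str) -> str:
    """First matching rule wins; otherwise fall back to 'normal'."""
    for label, pattern in URGENCY_RULES:
        if pattern.search(text):
            return label
    return "normal"


def nearest_prototype(
    vec: Sequence[float],
    prototypes: List[Tuple[str, Sequence[float]]],
    threshold: float = 0.5,  # placeholder value, not the service's threshold
) -> Tuple[str, float]:
    """Return the best-matching prototype label, or 'unknown' below threshold."""
    best_label, best_score = "unknown", -1.0
    for label, proto in prototypes:
        score = cosine(vec, proto)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return "unknown", best_score
    return best_label, best_score
```

In this sketch the rules decide urgency outright, and the prototypes only decide the fuzzier categories. Tuning therefore means editing patterns or prototype vectors rather than retraining anything.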
API
GET /healthz
Returns service metadata, labels, prototype count, NPU sysfs counter, and warmup NPU delta.
GET /v1/labels
Returns label enum values, thresholds, and prototype IDs without dumping private fixtures.
POST /v1/classify
Request:
{
"id": "optional trace id",
"text": "User message or task body to classify.",
"context": {"platform": "cli", "source": "user"},
"options": {
"include_evidence": true,
"include_embedding_debug": false,
"dry_run": true
}
}
Response includes:
- labels.tool_needed: boolean, confidence, threshold, reason codes
- labels.memory_candidate: none | user_preference | durable_user_fact | environment_fact | workflow_convention | skill_candidate
- labels.urgency: low | normal | high | critical
- labels.workflow_category: chat | research | coding | debugging | devops | smart_home | media | note_taking | productivity | kanban | unknown
- labels.safety_confirmation_required: boolean, confidence, reason codes
- npu_busy_delta_us and sysfs_npu_busy_delta_us
- evidence, when requested
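The enum values listed above can be checked mechanically, for example in a smoke script. The sketch below makes two assumptions about the payload:

- The fields sit under a top-level `labels` object.
- Each label is either a bare string or an object carrying a `value` key.

Both nesting choices are guesses, so adjust the accessors to the real payload.

```python
from typing import Any, Dict, List

# Allowed values copied from the response description.
ALLOWED: Dict[str, set] = {
    "memory_candidate": {"none", "user_preference", "durable_user_fact",
                         "environment_fact", "workflow_convention", "skill_candidate"},
    "urgency": {"low", "normal", "high", "critical"},
    "workflow_category": {"chat", "research", "coding", "debugging", "devops",
                          "smart_home", "media", "note_taking", "productivity",
                          "kanban", "unknown"},
}


def label_value(entry: Any) -> Any:
    """Assumption: a label is a bare value or an object with a 'value' key."""
    if isinstance(entry, dict):
        return entry.get("value")
    return entry


def validate_labels(response: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the labels look well-formed."""
    problems: List[str] = []
    labels = response.get("labels", {})
    for name, allowed in ALLOWED.items():
        value = label_value(labels.get(name))
        if value not in allowed:
            problems.append(f"labels.{name}={value!r} not in allowed set")
    for flag in ("tool_needed", "safety_confirmation_required"):
        # Assumption: boolean labels expose their decision as a bool value.
        if not isinstance(label_value(labels.get(flag)), bool):
            problems.append(f"labels.{flag} is missing a boolean value")
    return problems
```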
POST /v1/batch_classify
Request:
{
"items": [{"id": "m1", "text": "What time is it?"}],
"options": {"include_evidence": false, "dry_run": true}
}
Local smoke test
Check that the proposed port is free first:
ss -ltnp | grep ':18819' || true
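The same check can be scripted in Python using only the standard `socket` module. `port_is_listening` is a hypothetical helper. It treats a successful TCP connect as "something is already listening". Unlike `ss`, which lists listeners on every address, it only probes the given host.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connect to host:port succeeds (i.e. the port is taken)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    busy = port_is_listening("127.0.0.1", 18819)
    print("18819 in use" if busy else "18819 looks free on 127.0.0.1")
```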
Run it without installing anything extra; /home/will/.venvs/npu already has the
stdlib plus the requests/openvino stack used by the upstream embeddings service:
cd /home/will/lab/swarm/openvino-classifier-npu
/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819
Then from another shell:
curl -fsS http://127.0.0.1:18819/healthz | jq .
curl -fsS http://127.0.0.1:18819/v1/classify \
-H 'Content-Type: application/json' \
-d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true}}' | jq .
A valid NPU-backed response must have positive npu_busy_delta_us; HTTP 200 by
itself is not considered proof.
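The curl smoke check can also be scripted with the standard library. The sketch below assumes `npu_busy_delta_us` is a top-level numeric field of the classify response. The response description names the field but not where it sits. `classify` and `has_npu_proof` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://127.0.0.1:18819"


def classify(text: str, trace_id: str = "smoke", base_url: str = BASE_URL,
             timeout: float = 30.0) -> Dict[str, Any]:
    """POST one message to /v1/classify and return the decoded JSON body."""
    payload = {
        "id": trace_id,
        "text": text,
        "options": {"include_evidence": True, "dry_run": True},
    }
    req = urllib.request.Request(
        f"{base_url}/v1/classify",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any proxy environment variables: the service is loopback-only.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    # Non-2xx statuses raise urllib.error.HTTPError, similar to curl -f.
    with opener.open(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def has_npu_proof(response: Dict[str, Any]) -> bool:
    """HTTP 200 alone is not proof: require a positive npu_busy_delta_us."""
    delta = response.get("npu_busy_delta_us")
    return isinstance(delta, (int, float)) and not isinstance(delta, bool) and delta > 0
```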
Tests
Unit tests use a fake embedding client and do not touch the NPU:
/home/will/.venvs/npu/bin/python -m unittest discover -s openvino-classifier-npu/tests -v
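A fake embedding client only needs to be deterministic and shaped like the real one. The sketch below assumes the classifier calls something like `embed(texts)` and gets back one vector per text. `FakeEmbeddingClient` and its hashing scheme are hypothetical and not necessarily what the tests in tests/ use.

```python
import hashlib
import math
from typing import List


class FakeEmbeddingClient:
    """Deterministic stand-in for the NPU embeddings service (no network, no NPU)."""

    def __init__(self, dim: int = 8) -> None:
        # SHA-256 yields 32 bytes, so this scheme supports at most 32 dimensions.
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim
        self.calls = 0

    def embed(self, texts: List[str]) -> List[List[float]]:
        self.calls += 1
        return [self._vector(t) for t in texts]

    def _vector(self, text: str) -> List[float]:
        # Hash the text, map the first `dim` bytes into [-1, 1], then L2-normalise.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        raw = [(digest[i] / 127.5) - 1.0 for i in range(self.dim)]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        return [x / norm for x in raw]
```

Identical texts map to identical vectors. A test can therefore give a prototype and a message the same text to force a perfect cosine match without touching the NPU.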
Fixture messages live at fixtures/atlas_hermes_messages.jsonl.
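The fixture file can also be turned into a /v1/batch_classify request body. The sketch below assumes each JSONL line is an object with at least a `text` key and usually an `id`; the fixture schema is not documented here. `load_batch_request` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_batch_request(path: Path, include_evidence: bool = False) -> Dict[str, Any]:
    """Build a /v1/batch_classify body from a JSONL fixture, skipping blank lines."""
    items = []
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            # Assumption: records carry 'text'; missing ids fall back to the line number.
            items.append({"id": str(record.get("id", f"line-{lineno}")),
                          "text": record["text"]})
    return {"items": items,
            "options": {"include_evidence": include_evidence, "dry_run": True}}
```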
Optional systemd user unit
A draft unit is included as openvino-router-classifier.service. Install only
after review/approval:
cp openvino-router-classifier.service ~/.config/systemd/user/openvino-router-classifier.service
systemctl --user daemon-reload
systemctl --user start openvino-router-classifier.service
systemctl --user status openvino-router-classifier.service --no-pager
Do not enable it at boot or connect it to live Atlas/Hermes routing as part of this prototype task without explicit approval. Keep classifier decisions dry-run until a separate approved routing change lands.