feat(npu): add dry-run classifier router prototype

2026-06-04 13:07:51 -07:00
parent 0683253157
commit ea452886f3
7 changed files with 1305 additions and 0 deletions
@@ -0,0 +1,141 @@
+# OpenVINO NPU router classifier prototype
+
+Dry-run Atlas/Hermes message classifier/router prototype.
+
+The detailed dry-run contract is in [`CONTRACT.md`](./CONTRACT.md), including the
+recommended model/runtime, HTTP/CLI schema, smoke-test plan, NPU busy-time proof,
+docs/diagram implications, and no-go/defer criteria.
+
+It reuses the existing OpenVINO NPU embeddings service on `127.0.0.1:18817` and
+serves an inspectable stdlib HTTP API on `127.0.0.1:18819`. It does not change
+live Hermes/Atlas routing, write memory, mutate vector collections, restart
+services, or send external messages.
+
+## Runtime shape
+
+- Service: `atlas-router-classifier`
+- Default port: `18819`
+- Default bind: `127.0.0.1`
+- Upstream: `http://127.0.0.1:18817/v1/embeddings`
+- Batch limit: `OPENVINO_CLASSIFIER_MAX_BATCH_SIZE`, default `32`
+- Model label: `bge-base-en-v1.5-int8-ov/prototype-router-v0`
+- NPU proof: `/sys/class/accel/accel0/device/npu_busy_time_us` before/after plus upstream `npu_busy_delta_us`
+
+The classifier uses deterministic high-precision rules for safety/urgency/tool
+signals plus cosine similarity against curated embedding prototypes for workflow
+and memory recommendations. This is intentionally tunable without model training.
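+
+A minimal sketch of the prototype-matching half, assuming embeddings arrive as
+plain lists of floats. The names `cosine_similarity` and `best_prototype`, the
+toy vectors, and the `0.5` threshold are illustrative assumptions, not the
+values used in `router_classifier.py`:
+
+```python
+import math
+
+def cosine_similarity(a, b):
+    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
+    dot = sum(x * y for x, y in zip(a, b))
+    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
+    return dot / norm if norm else 0.0
+
+def best_prototype(embedding, prototypes, threshold=0.5, fallback="unknown"):
+    """Return (label, score) for the closest prototype, or the fallback below threshold."""
+    best_label, best_score = fallback, -1.0
+    for label, vector in prototypes.items():
+        score = cosine_similarity(embedding, vector)
+        if score > best_score:
+            best_label, best_score = label, score
+    if best_score < threshold:
+        return fallback, best_score
+    return best_label, best_score
+
+# Toy 3-dimensional prototypes; real embeddings come from the upstream service.
+PROTOTYPES = {"coding": [1.0, 0.0, 0.0], "devops": [0.0, 1.0, 0.0]}
+label, score = best_prototype([0.9, 0.1, 0.0], PROTOTYPES)
+```
+
+Because only the prototype vectors and the threshold change, tuning stays a data
+edit rather than a retraining step.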
+
+## API
+
+### GET `/healthz`
+
+Returns service metadata, labels, prototype count, NPU sysfs counter, and warmup
+NPU delta.
+
+### GET `/v1/labels`
+
+Returns label enum values, thresholds, and prototype IDs without dumping private
+fixtures.
+
+### POST `/v1/classify`
+
+Request:
+
+```json
+{
+  "id": "optional trace id",
+  "text": "User message or task body to classify.",
+  "context": {"platform": "cli", "source": "user"},
+  "options": {
+    "include_evidence": true,
+    "include_embedding_debug": false,
+    "dry_run": true
+  }
+}
+```
+
+Response includes:
+
+- `labels.tool_needed`: boolean, confidence, threshold, reason codes
+- `labels.memory_candidate`: `none | user_preference | durable_user_fact | environment_fact | workflow_convention | skill_candidate`
+- `labels.urgency`: `low | normal | high | critical`
+- `labels.workflow_category`: `chat | research | coding | debugging | devops | smart_home | media | note_taking | productivity | kanban | unknown`
+- `labels.safety_confirmation_required`: boolean, confidence, reason codes
+- `npu_busy_delta_us` and `sysfs_npu_busy_delta_us`
+- `evidence` when requested
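+
+A hedged client sketch for this endpoint using only the standard library. The
+helper names `build_classify_request` and `classify` and the 10-second timeout
+are assumptions for illustration; the URL and payload fields follow the request
+schema above:
+
+```python
+import json
+import urllib.request
+
+BASE_URL = "http://127.0.0.1:18819"  # default bind and port from "Runtime shape"
+
+def build_classify_request(text, trace_id=None, include_evidence=True):
+    """Build a POST request for /v1/classify without sending it."""
+    payload = {
+        "text": text,
+        "options": {"include_evidence": include_evidence, "dry_run": True},
+    }
+    if trace_id is not None:
+        payload["id"] = trace_id
+    return urllib.request.Request(
+        BASE_URL + "/v1/classify",
+        data=json.dumps(payload).encode("utf-8"),
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+
+def classify(text, timeout_s=10.0):
+    """Send the request and return the decoded JSON body (needs the service running)."""
+    with urllib.request.urlopen(build_classify_request(text), timeout=timeout_s) as resp:
+        return json.load(resp)
+```
+
+Keeping request construction separate from sending makes the payload easy to
+inspect or unit-test without a running service.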
+
+### POST `/v1/batch_classify`
+
+Request:
+
+```json
+{
+  "items": [{"id": "m1", "text": "What time is it?"}],
+  "options": {"include_evidence": false, "dry_run": true}
+}
+```
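+
+How the service responds to an over-limit batch is not specified here, so a
+caller may prefer to split work up front. A minimal sketch, assuming the default
+limit of `32`; `chunk_items` is a hypothetical helper, not part of the service:
+
+```python
+def chunk_items(items, max_batch_size=32):
+    """Yield /v1/batch_classify request bodies with at most max_batch_size items each."""
+    if max_batch_size < 1:
+        raise ValueError("max_batch_size must be at least 1")
+    for start in range(0, len(items), max_batch_size):
+        yield {
+            "items": items[start:start + max_batch_size],
+            "options": {"include_evidence": False, "dry_run": True},
+        }
+
+# 70 synthetic messages split into batches of 32, 32, and 6.
+items = [{"id": f"m{i}", "text": f"message {i}"} for i in range(70)]
+bodies = list(chunk_items(items))
+```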
+
+## Local smoke test
+
+Check that the proposed port is free first:
+
+```bash
+ss -ltnp | grep ':18819' || true
+```
+
+Run the service in the foreground. Nothing extra needs installing: the
+`/home/will/.venvs/npu` virtualenv already provides the requests/OpenVINO stack
+used by the upstream embeddings service:
+
+```bash
+cd /home/will/lab/swarm/openvino-classifier-npu
+/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819
+```
+
+Environment variables mirror the flags: `OPENVINO_CLASSIFIER_HOST`,
+`OPENVINO_CLASSIFIER_PORT`, `OPENVINO_CLASSIFIER_EMBED_URL`,
+`OPENVINO_CLASSIFIER_TIMEOUT_S`, and `OPENVINO_CLASSIFIER_MAX_BATCH_SIZE`.
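+
+One common way to wire this up is to read each variable as the `argparse`
+default. The sketch below is an assumption about the pattern, not a copy of
+`router_classifier.py`: only `--host` and `--port` appear in this README, so the
+other flag names and the `10`-second timeout default are hypothetical:
+
+```python
+import argparse
+
+def build_parser(env):
+    """Build a CLI parser whose defaults come from OPENVINO_CLASSIFIER_* variables."""
+    parser = argparse.ArgumentParser(description="router classifier (sketch)")
+    parser.add_argument("--host", default=env.get("OPENVINO_CLASSIFIER_HOST", "127.0.0.1"))
+    parser.add_argument("--port", type=int,
+                        default=int(env.get("OPENVINO_CLASSIFIER_PORT", "18819")))
+    # Flag names below are assumed; the README only names the environment variables.
+    parser.add_argument("--embed-url",
+                        default=env.get("OPENVINO_CLASSIFIER_EMBED_URL",
+                                        "http://127.0.0.1:18817/v1/embeddings"))
+    parser.add_argument("--timeout-s", type=float,  # default of 10 is an assumption
+                        default=float(env.get("OPENVINO_CLASSIFIER_TIMEOUT_S", "10")))
+    parser.add_argument("--max-batch-size", type=int,
+                        default=int(env.get("OPENVINO_CLASSIFIER_MAX_BATCH_SIZE", "32")))
+    return parser
+
+# An explicit flag wins over the environment, which wins over the built-in default.
+args = build_parser({"OPENVINO_CLASSIFIER_PORT": "18820"}).parse_args(["--host", "127.0.0.1"])
+```
+
+In real use the mapping passed in would be `os.environ`; taking it as a
+parameter keeps the sketch testable.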
+
+Then from another shell:
+
+```bash
+curl -fsS http://127.0.0.1:18819/healthz | jq .
+curl -fsS http://127.0.0.1:18819/v1/classify \
+  -H 'Content-Type: application/json' \
+  -d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true}}' | jq .
+```
+
+A valid NPU-backed response must have positive `npu_busy_delta_us`; HTTP 200 by
+itself is not considered proof.
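+
+That check can be sketched as below. `read_busy_us` and `npu_proof` are
+hypothetical helper names, and requiring the outer sysfs delta to be positive as
+well is a stricter assumption than the rule above, which only names
+`npu_busy_delta_us`:
+
+```python
+from pathlib import Path
+
+SYSFS_COUNTER = Path("/sys/class/accel/accel0/device/npu_busy_time_us")
+
+def read_busy_us(path=SYSFS_COUNTER):
+    """Read the cumulative NPU busy time in microseconds."""
+    return int(Path(path).read_text().strip())
+
+def npu_proof(response, busy_before_us, busy_after_us):
+    """Return True only if both the response and the sysfs counter show NPU work."""
+    reported = response.get("npu_busy_delta_us") or 0
+    observed = busy_after_us - busy_before_us
+    return reported > 0 and observed > 0
+```
+
+Typical use is to call `read_busy_us()` immediately before and after the
+request, then pass both readings and the decoded response to `npu_proof`.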
+
+With the foreground service running, run the synthetic-fixture smoke helper:
+
+```bash
+/home/will/.venvs/npu/bin/python smoke_classifier.py --base-url http://127.0.0.1:18819
+```
+
+The helper refuses non-local URLs, checks fixture label expectations, and prints
+response plus outer sysfs NPU busy deltas.
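+
+A minimal sketch of one way to refuse non-local URLs, assuming that "local"
+means `localhost` or a loopback IP literal. `is_local_url` is a hypothetical
+name and may not match the check in `smoke_classifier.py`:
+
+```python
+import ipaddress
+from urllib.parse import urlsplit
+
+def is_local_url(url):
+    """True for http(s) URLs whose host is localhost or a loopback IP literal."""
+    parts = urlsplit(url)
+    if parts.scheme not in ("http", "https") or not parts.hostname:
+        return False
+    if parts.hostname == "localhost":
+        return True
+    try:
+        return ipaddress.ip_address(parts.hostname).is_loopback
+    except ValueError:
+        return False  # a DNS name other than localhost; never resolved here
+```
+
+Parsing the host instead of prefix-matching the string avoids accepting
+look-alikes such as `http://127.0.0.1.example.com`.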
+
+## Tests
+
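+Unit tests use a fake embedding client and do not touch the NPU. The discovery
+path below is relative to the parent directory of `openvino-classifier-npu`, so
+run the command from there: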
+
+```bash
+/home/will/.venvs/npu/bin/python -m unittest discover -s openvino-classifier-npu/tests -v
+```
+
+Fixture messages live at `fixtures/atlas_hermes_messages.jsonl`.
+
+## Optional systemd user unit
+
+A draft unit is included as `openvino-router-classifier.service`. Install only
+after review/approval:
+
+```bash
+# Create the user unit directory if it does not exist yet.
+mkdir -p ~/.config/systemd/user
+cp openvino-router-classifier.service ~/.config/systemd/user/openvino-router-classifier.service
+systemctl --user daemon-reload
+systemctl --user start openvino-router-classifier.service
+systemctl --user status openvino-router-classifier.service --no-pager
+```
+
+Do not enable it at boot or connect it to live Atlas/Hermes routing as part of
+this prototype task without explicit approval. Keep classifier decisions dry-run
+until a separate approved routing change lands.