OpenVINO NPU router classifier prototype

Dry-run Atlas/Hermes message classifier/router prototype.

The detailed dry-run contract is in CONTRACT.md, including the recommended model/runtime, HTTP/CLI schema, smoke-test plan, NPU busy-time proof, docs/diagram implications, and no-go/defer criteria.

The prototype reuses the existing OpenVINO NPU embeddings service on 127.0.0.1:18817 and serves an inspectable stdlib HTTP API on 127.0.0.1:18819. It does not change live Hermes/Atlas routing, write memory, mutate vector collections, restart services, or send external messages.

Runtime shape

  • Service: atlas-router-classifier
  • Default port: 18819
  • Default bind: 127.0.0.1
  • Upstream: http://127.0.0.1:18817/v1/embeddings
  • Batch limit: OPENVINO_CLASSIFIER_MAX_BATCH_SIZE, default 32
  • Model label: bge-base-en-v1.5-int8-ov/prototype-router-v0
  • NPU proof: /sys/class/accel/accel0/device/npu_busy_time_us read before and after, plus the upstream-reported npu_busy_delta_us
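
The before/after counter read can be sketched as follows. This is a minimal illustration, not the prototype's actual code: the helper names read_busy_us and measure_busy_delta are hypothetical, and it assumes the sysfs file holds a single integer in microseconds.

```python
from pathlib import Path
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Counter path taken from the runtime notes above.
NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = NPU_BUSY_PATH) -> int:
    """Read the cumulative NPU busy time (assumed to be one integer, in microseconds)."""
    return int(path.read_text().strip())


def measure_busy_delta(work: Callable[[], T], path: Path = NPU_BUSY_PATH) -> Tuple[T, int]:
    """Run `work` and return its result together with the counter delta around it."""
    before = read_busy_us(path)
    result = work()
    after = read_busy_us(path)
    return result, after - before
```

Because the counter is device-wide, a delta measured this way also includes any other NPU work that ran during the call, which is presumably why the upstream-reported delta is recorded alongside it.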

The classifier uses deterministic high-precision rules for safety/urgency/tool signals plus cosine similarity against curated embedding prototypes for workflow and memory recommendations. Both the rules and the prototypes are intentionally tunable without model training.
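
A rough sketch of the two mechanisms, under stated assumptions: the regex, the function names, and the threshold handling below are illustrative only, and the real rules, prototypes, and thresholds live in the prototype and CONTRACT.md.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical urgency rule; the real patterns and reason codes may differ.
URGENT_RE = re.compile(r"\b(urgent|asap|immediately)\b", re.IGNORECASE)


def urgency_from_rules(text: str) -> str:
    """Deterministic rule: a keyword hit raises urgency, otherwise stay at normal."""
    return "high" if URGENT_RE.search(text) else "normal"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_label(
    embedding: Sequence[float],
    prototypes: Dict[str, List[Sequence[float]]],
    threshold: float,
    fallback: str = "unknown",
) -> Tuple[str, float]:
    """Return the label of the most similar prototype, or `fallback` below threshold."""
    best_label, best_score = fallback, -1.0
    for label, vectors in prototypes.items():
        for vec in vectors:
            score = cosine(embedding, vec)
            if score > best_score:
                best_label, best_score = label, score
    if best_score < threshold:
        return fallback, best_score
    return best_label, best_score
```

Tuning in this scheme means editing the patterns, adding or removing prototype texts, or moving a threshold; no weights are trained.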

API

GET /healthz

Returns service metadata, labels, prototype count, NPU sysfs counter, and warmup NPU delta.

GET /v1/labels

Returns label enum values, thresholds, and prototype IDs without dumping private fixtures.

POST /v1/classify

Request:

{
  "id": "optional trace id",
  "text": "User message or task body to classify.",
  "context": {"platform": "cli", "source": "user"},
  "options": {
    "include_evidence": true,
    "include_embedding_debug": false,
    "dry_run": true
  }
}

Response includes:

  • labels.tool_needed: boolean, confidence, threshold, reason codes
  • labels.memory_candidate: none | user_preference | durable_user_fact | environment_fact | workflow_convention | skill_candidate
  • labels.urgency: low | normal | high | critical
  • labels.workflow_category: chat | research | coding | debugging | devops | smart_home | media | note_taking | productivity | kanban | unknown
  • labels.safety_confirmation_required: boolean, confidence, reason codes
  • npu_busy_delta_us and sysfs_npu_busy_delta_us
  • evidence when requested
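
A stdlib-only client for this endpoint might look like the sketch below. The function names build_classify_request and classify are hypothetical, the 10-second timeout is a placeholder, and the payload uses only the fields shown in the request example above.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://127.0.0.1:18819"  # default bind and port from the runtime notes


def build_classify_request(
    text: str,
    trace_id: Optional[str] = None,
    include_evidence: bool = True,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/classify request."""
    payload: Dict[str, Any] = {
        "text": text,
        "options": {"include_evidence": include_evidence, "dry_run": True},
    }
    if trace_id is not None:
        payload["id"] = trace_id
    return urllib.request.Request(
        f"{base_url}/v1/classify",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str, timeout_s: float = 10.0, **kwargs: Any) -> Dict[str, Any]:
    """Send the request to a running service and decode the JSON response."""
    req = build_classify_request(text, **kwargs)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the payload inspectable without a running service.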

POST /v1/batch_classify

Request:

{
  "items": [{"id": "m1", "text": "What time is it?"}],
  "options": {"include_evidence": false, "dry_run": true}
}
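
Callers with more items than the batch limit can split them client-side. The sketch below assumes the default limit of 32 from the runtime notes; how the service responds to an over-limit batch is not stated here, so staying under the limit is the cautious choice. The helper names are hypothetical.

```python
from typing import Any, Dict, Iterator, List


def chunked(items: List[Dict[str, Any]], max_batch_size: int = 32) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of `items`, each no longer than `max_batch_size`."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), max_batch_size):
        yield items[start:start + max_batch_size]


def batch_payloads(items: List[Dict[str, Any]], max_batch_size: int = 32) -> List[Dict[str, Any]]:
    """Build one /v1/batch_classify request body per chunk."""
    return [
        {"items": chunk, "options": {"include_evidence": False, "dry_run": True}}
        for chunk in chunked(items, max_batch_size)
    ]
```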

Local smoke test

Check that the proposed port is free first:

ss -ltnp | grep ':18819' || true

Nothing extra needs installing: the /home/will/.venvs/npu environment already provides the standard library plus the requests/openvino stack used by the upstream embeddings service. Start the classifier in the foreground:

cd /home/will/lab/swarm/openvino-classifier-npu
/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819

Environment variables mirror the flags: OPENVINO_CLASSIFIER_HOST, OPENVINO_CLASSIFIER_PORT, OPENVINO_CLASSIFIER_EMBED_URL, OPENVINO_CLASSIFIER_TIMEOUT_S, and OPENVINO_CLASSIFIER_MAX_BATCH_SIZE.
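
One common way to make environment variables mirror flags is to use them as argparse defaults, so an explicit flag still wins. The sketch below is an assumption about the pattern, not a copy of router_classifier.py: only --host and --port appear in this README, so the other flag names and the timeout default are placeholders.

```python
import argparse
import os
from typing import Mapping, Optional


def build_parser(env: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Build a parser whose defaults come from OPENVINO_CLASSIFIER_* variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Router classifier flags (sketch)")
    parser.add_argument("--host", default=env.get("OPENVINO_CLASSIFIER_HOST", "127.0.0.1"))
    parser.add_argument(
        "--port", type=int, default=int(env.get("OPENVINO_CLASSIFIER_PORT", "18819"))
    )
    # Flag names below are assumed; only the variable names come from the README.
    parser.add_argument(
        "--embed-url",
        default=env.get("OPENVINO_CLASSIFIER_EMBED_URL", "http://127.0.0.1:18817/v1/embeddings"),
    )
    parser.add_argument(
        "--timeout-s",
        type=float,
        default=float(env.get("OPENVINO_CLASSIFIER_TIMEOUT_S", "10")),  # placeholder default
    )
    parser.add_argument(
        "--max-batch-size",
        type=int,
        default=int(env.get("OPENVINO_CLASSIFIER_MAX_BATCH_SIZE", "32")),
    )
    return parser
```

With this pattern, `OPENVINO_CLASSIFIER_PORT=19000` changes the default, while `--port 18819` on the command line overrides it.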

Then from another shell:

curl -fsS http://127.0.0.1:18819/healthz | jq .
curl -fsS http://127.0.0.1:18819/v1/classify \
  -H 'Content-Type: application/json' \
  -d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true}}' | jq .

A valid NPU-backed response must have positive npu_busy_delta_us; HTTP 200 by itself is not considered proof.
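
That acceptance rule can be checked mechanically. The sketch below assumes npu_busy_delta_us is a top-level numeric field of the response JSON; the function name npu_proof_ok is hypothetical.

```python
from typing import Any, Mapping


def npu_proof_ok(response: Mapping[str, Any]) -> bool:
    """True only if the response reports a strictly positive NPU busy delta."""
    delta = response.get("npu_busy_delta_us")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(delta, bool) or not isinstance(delta, (int, float)):
        return False
    return delta > 0
```

A response with a missing, zero, or non-numeric delta fails the check even if the HTTP status was 200.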

Synthetic fixture smoke helper, after the foreground service is running:

/home/will/.venvs/npu/bin/python smoke_classifier.py --base-url http://127.0.0.1:18819

The helper refuses non-local URLs, checks fixture label expectations, and prints response plus outer sysfs NPU busy deltas.
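
A locality check of this kind could be written as below. This is a sketch of one reasonable approach, not the logic in smoke_classifier.py: it accepts only http/https URLs whose host is localhost or a loopback IP literal, and the name is_local_url is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_url(url: str) -> bool:
    """Return True if the URL points at localhost or a loopback IP address."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname  # lowercased, with port and IPv6 brackets stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as non-local rather than resolved via DNS.
        return False
```

Refusing to resolve hostnames keeps the check deterministic and avoids sending fixture text to a name that merely resolves to a remote address.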

Tests

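Unit tests use a fake embedding client and do not touch the NPU. The discovery path below is relative to the parent of openvino-classifier-npu, so run the command from the repository root: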

/home/will/.venvs/npu/bin/python -m unittest discover -s openvino-classifier-npu/tests -v
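
For readers unfamiliar with the pattern, a fake embedding client can be as small as the sketch below. The class name, the embed method, and the hash-based vectors are all hypothetical; the real fake in tests/ may expose a different interface.

```python
import hashlib
from typing import List, Sequence


class FakeEmbeddingClient:
    """Hypothetical stand-in that returns deterministic vectors without any NPU call."""

    def __init__(self, dim: int = 8) -> None:
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32 (SHA-256 digest size)")
        self.dim = dim
        self.calls: List[List[str]] = []  # recorded inputs, handy for assertions

    def embed(self, texts: Sequence[str]) -> List[List[float]]:
        """Return one pseudo-embedding per input text."""
        self.calls.append(list(texts))
        return [self._vector(text) for text in texts]

    def _vector(self, text: str) -> List[float]:
        # Derive values in [0, 1] from a hash so equal texts map to equal vectors.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self.dim]]
```

Deterministic vectors keep test outcomes stable across runs, and the recorded calls let a test assert how the classifier batched its embedding requests.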

Fixture messages live at fixtures/atlas_hermes_messages.jsonl.

Optional systemd user unit

A draft unit is included as openvino-router-classifier.service. Install only after review/approval:

cp openvino-router-classifier.service ~/.config/systemd/user/openvino-router-classifier.service
systemctl --user daemon-reload
systemctl --user start openvino-router-classifier.service
systemctl --user status openvino-router-classifier.service --no-pager

Do not enable it at boot or connect it to live Atlas/Hermes routing as part of this prototype task without explicit approval. Keep classifier decisions dry-run until a separate approved routing change lands.