feat(npu): add dry-run classifier router prototype
# OpenVINO NPU classifier/router dry-run contract

Status: specification for dry-run prototype refresh
Target port: `127.0.0.1:18819`
Owner context: Atlas/Hermes local assistant sidecar evaluation

This service is an advisory classifier for Atlas/Hermes automation hints. It may suggest labels such as tool-needed, memory-candidate type, urgency, workflow category, and safety-confirmation-required, but it must not make or enforce live routing, memory, tool, or safety decisions without a separate explicit approval from Will.

## Recommended model and runtime

Recommended v1 runtime: small local Python HTTP/CLI service backed by the existing OpenVINO NPU embeddings service on `127.0.0.1:18817`.

Recommended v1 model shape:

- Primary signal: `bge-base-en-v1.5-int8-ov` embeddings from the live embeddings service.
- Classifier layer: inspectable deterministic rules plus cosine similarity against curated synthetic/prototype utterances.
- Model label: `bge-base-en-v1.5-int8-ov/prototype-router-v0`.
- Device proof: request-level `npu_busy_delta_us` from `:18817` plus direct sysfs before/after reads from `/sys/class/accel/accel0/device/npu_busy_time_us`.

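The cosine-similarity part of the classifier layer could look roughly like the sketch below. It is illustrative only: the function names, the bucket names, and the toy three-dimensional vectors are assumptions, and real vectors would come from the embeddings service on `:18817`.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_prototype(query_vec, prototypes):
    """Return (bucket_name, score) for the closest prototype embedding.

    `prototypes` maps a hypothetical bucket name to a list of embedding vectors.
    """
    best_name, best_score = None, -1.0
    for name, vectors in prototypes.items():
        for vec in vectors:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_name, best_score = name, score
    return best_name, best_score

# Toy vectors stand in for real embeddings (assumption for illustration).
toy_prototypes = {
    "devops": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "chat": [[0.0, 1.0, 0.0]],
}
name, score = best_prototype([0.8, 0.2, 0.0], toy_prototypes)
```

Keeping the per-bucket score inspectable is what lets a reviewer see why a message landed in a given bucket.
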
Why this is preferred for the dry run:

1. It reuses the already-live NPU embeddings path rather than adding a second model conversion/runtime dependency before contract validation.
2. Rules and prototypes are transparent enough for safety-sensitive routing hints; a reviewer can inspect why a message was labeled.
3. It avoids fine-tuning or training on private Atlas/Hermes transcripts.
4. It keeps the service small, localhost-only, and easy to start/stop during smoke tests.
5. It produces NPU activity through the embeddings path while making clear that final decision logic remains advisory.

Defer a dedicated NPU sequence-classification model such as TinyBERT/MiniLM until the dry-run labels and thresholds have been evaluated against synthetic fixtures and explicitly-approved non-private examples. If pursued later, use OpenVINO Runtime/Optimum export with fixed input shapes suitable for NPU, and keep the rule layer for safety gates.

## Non-goals and safety invariants

The service must not:

- Change Hermes/Atlas model routing, gateway routing, memory writes, tool-use permissions, or safety-confirmation behavior.
- Restart, stop, enable, or persist any live Atlas/Hermes/gateway/RAG service.
- Bind to anything broader than `127.0.0.1` by default.
- Mutate Chroma/vector collections, trigger reindexing, or write to RAG state.
- Process private document/image directories or private transcript dumps for smoke testing.
- Log raw prompts by default beyond normal foreground stderr during local review.
- Claim NPU success from HTTP 200 alone.

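One way to enforce the loopback-only invariant at startup is a small guard like the following. This is a sketch under assumptions: `is_loopback_host`, `check_bind_host`, and the `allow_non_loopback` opt-in are hypothetical names, not part of the existing prototype.

```python
import ipaddress

def is_loopback_host(host):
    """True only for literal loopback IP addresses such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Hostnames (including "localhost") are rejected rather than resolved.
        return False

def check_bind_host(host, allow_non_loopback=False):
    """Raise ValueError unless the host is loopback or the caller opted in.

    `allow_non_loopback` is a hypothetical explicit opt-in, off by default.
    """
    if not is_loopback_host(host) and not allow_non_loopback:
        raise ValueError(f"refusing to bind to non-loopback host {host!r}")
    return host
```

Rejecting hostnames outright avoids depending on how the resolver maps a name at startup.
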
## Endpoint contract

All HTTP endpoints are local-only by default.

Base URL:

```text
http://127.0.0.1:18819
```

### GET `/healthz`, `/health`, `/readyz`, `/`

Purpose: liveness/readiness metadata.

Response fields:

- `status`: `starting | ok`
- `service`: `atlas-router-classifier`
- `version`: service version string
- `mode`: always `dry_run`
- `model`: model/runtime label
- `embed_url`: upstream embeddings URL
- `device`: expected to say `NPU-via-embedding-service` or equivalent
- `labels`: supported label names
- `embedding_dim`: embedding dimension after warmup
- `prototype_count`: number of synthetic prototype examples loaded
- `prototype_npu_busy_delta_us`: warmup delta reported by upstream embeddings, if available
- `npu_busy_time_us`: current sysfs counter value, if readable
- `warnings`: list of non-fatal warnings

A healthy service is not enough to prove NPU execution. At least one classification request must also show a positive request-level busy delta and a positive sysfs busy delta.

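A reviewer script might sanity-check a health payload along these lines. This is a minimal sketch: which fields are treated as always-present is an assumption (the ones marked "if available", "if readable", or "after warmup" are left optional), and `health_problems` is a hypothetical helper name.

```python
# Fields assumed to be present on every health response (assumption).
REQUIRED_HEALTH_FIELDS = (
    "status", "service", "version", "mode", "model",
    "embed_url", "device", "labels", "warnings",
)

def health_problems(payload):
    """Return a list of human-readable problems; an empty list means it looks sane."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_HEALTH_FIELDS if name not in payload]
    if payload.get("status") not in ("starting", "ok"):
        problems.append(f"unexpected status: {payload.get('status')!r}")
    if payload.get("mode") != "dry_run":
        problems.append(f"mode must be dry_run, got {payload.get('mode')!r}")
    return problems
```

An empty result only says the metadata is well-formed; it says nothing about NPU execution.
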
### GET `/v1/labels`

Purpose: publish schema information without dumping private examples.

Response fields:

- `model`
- `thresholds`
  - `tool_needed`: recommended threshold `0.72`
  - `memory_candidate`: recommended threshold `0.78`
  - `safety_confirmation_required`: recommended threshold `0.80`
  - `workflow_category`: recommended threshold `0.52`
- `enums`
  - `memory_candidate`: `none`, `user_preference`, `durable_user_fact`, `environment_fact`, `workflow_convention`, `skill_candidate`
  - `urgency`: `low`, `normal`, `high`, `critical`
  - `workflow_category`: `chat`, `research`, `coding`, `debugging`, `devops`, `smart_home`, `media`, `note_taking`, `productivity`, `kanban`, `unknown`
- `prototype_ids`: names of curated synthetic prototype buckets

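The thresholds and enums above could be applied roughly as in the sketch below. The threshold and enum values are taken from the list; the `>=` comparison and the fallback values (`none`, `unknown`) for below-threshold enum labels are assumptions, as are the helper names.

```python
THRESHOLDS = {
    "tool_needed": 0.72,
    "memory_candidate": 0.78,
    "safety_confirmation_required": 0.80,
    "workflow_category": 0.52,
}
ENUMS = {
    "memory_candidate": ["none", "user_preference", "durable_user_fact",
                         "environment_fact", "workflow_convention", "skill_candidate"],
    "urgency": ["low", "normal", "high", "critical"],
    "workflow_category": ["chat", "research", "coding", "debugging", "devops",
                          "smart_home", "media", "note_taking", "productivity",
                          "kanban", "unknown"],
}
# Assumed fallback when an enum label's confidence is below its threshold.
FALLBACKS = {"memory_candidate": "none", "workflow_category": "unknown"}

def decide_boolean(label, confidence):
    """Boolean labels fire when confidence reaches the threshold (assumed >=)."""
    return confidence >= THRESHOLDS[label]

def decide_enum(label, candidate, confidence):
    """Return the candidate value, or the fallback if it is below threshold."""
    if candidate not in ENUMS[label]:
        raise ValueError(f"{candidate!r} is not a valid {label} value")
    if label in THRESHOLDS and confidence < THRESHOLDS[label]:
        return FALLBACKS[label]
    return candidate
```

For example, a `memory_candidate` guess of `user_preference` at confidence `0.31` would be reported as `none` under these assumptions.
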
### POST `/v1/classify`

Purpose: classify one user/task message for advisory dry-run hints.

Request:

```json
{
  "id": "optional-trace-id",
  "text": "Urgent: check whether port 18817 is listening and inspect systemd logs.",
  "context": {
    "platform": "cli",
    "source": "user"
  },
  "options": {
    "include_evidence": true,
    "include_embedding_debug": false,
    "dry_run": true
  }
}
```

Required behavior:

- Reject empty text with HTTP 400.
- Default `dry_run` to true.
- Produce no side effects other than local inference and response generation.
- Include evidence by default unless `include_evidence=false`.
- Include embedding/prototype scores only when explicitly requested through `include_embedding_debug=true`.

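The required request handling above can be sketched as a pure function that normalizes a parsed JSON body. Treating whitespace-only text as empty, the shape of the error payload, and the name `normalize_classify_request` are assumptions for illustration.

```python
def normalize_classify_request(body):
    """Return (http_status, payload) for a parsed /v1/classify request body."""
    text = body.get("text") if isinstance(body, dict) else None
    # Assumption: whitespace-only text counts as empty.
    if not isinstance(text, str) or not text.strip():
        return 400, {"error": "text must be a non-empty string"}
    options = body.get("options") or {}
    return 200, {
        "id": body.get("id"),
        "text": text,
        # Evidence is on unless the caller turns it off.
        "include_evidence": options.get("include_evidence", True),
        # Embedding/prototype scores are off unless explicitly requested.
        "include_embedding_debug": options.get("include_embedding_debug", False),
        "dry_run": options.get("dry_run", True),
    }
```

Keeping this step free of I/O makes the defaults easy to cover in the unit/schema tests without an NPU dependency.
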
Response:

```json
{
  "id": "optional-trace-id",
  "model": "bge-base-en-v1.5-int8-ov/prototype-router-v0",
  "created": 1780590000,
  "duration_ms": 12.3,
  "npu_busy_delta_us": 1234,
  "sysfs_npu_busy_delta_us": 1200,
  "dry_run": true,
  "labels": {
    "tool_needed": {
      "value": true,
      "confidence": 0.84,
      "threshold": 0.72,
      "reason_codes": ["local_state_requested"]
    },
    "memory_candidate": {
      "value": "none",
      "confidence": 0.31,
      "threshold": 0.78,
      "reason_codes": []
    },
    "urgency": {
      "value": "high",
      "confidence": 0.84,
      "scores": {"low": 0.0, "normal": 0.2, "high": 0.84, "critical": 0.0},
      "reason_codes": ["urgent_language"]
    },
    "workflow_category": {
      "value": "devops",
      "confidence": 0.86,
      "scores": {"devops": 0.86, "unknown": 0.14}
    },
    "safety_confirmation_required": {
      "value": false,
      "confidence": 0.0,
      "threshold": 0.8,
      "reason_codes": []
    }
  },
  "warnings": [],
  "evidence": []
}
```

### POST `/v1/batch_classify`

Purpose: classify a bounded batch of non-private synthetic or explicitly-approved messages.

Request:

```json
{
  "items": [
    {"id": "m1", "text": "What time is it in Seattle right now?"},
    {"id": "m2", "text": "Restart the live Atlas gateway and switch primary routing."}
  ],
  "options": {"include_evidence": false, "dry_run": true}
}
```

Response:

- `model`
- `duration_ms`
- aggregate `npu_busy_delta_us`
- `results`: array of `/v1/classify` responses

Batch limits for prototype review:

- Keep batches small; the prototype rejects empty batches and batches larger than `OPENVINO_CLASSIFIER_MAX_BATCH_SIZE` (default `32`).
- Use only synthetic fixtures unless Will explicitly approves a real non-private sample set.
- Do not retain request bodies to disk.

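A batch-size check consistent with the limits above might look like this sketch. The contract does not state which status code a rejected batch returns, so the `400` here is an assumption, as are the helper names.

```python
import os

ENV_NAME = "OPENVINO_CLASSIFIER_MAX_BATCH_SIZE"

def configured_max_batch_size(environ=None):
    """Read the batch limit from the environment, defaulting to 32."""
    env = os.environ if environ is None else environ
    return int(env.get(ENV_NAME, "32"))

def validate_batch(body, limit):
    """Return (http_status, error_message_or_None) for a parsed batch body."""
    items = body.get("items") if isinstance(body, dict) else None
    if not isinstance(items, list) or not items:
        return 400, "items must be a non-empty list"  # status code is assumed
    if len(items) > limit:
        return 400, f"batch of {len(items)} exceeds limit {limit}"
    return 200, None
```

Validation happens before any embedding call, so an oversized batch never reaches the NPU path.
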
## CLI contract

The same implementation should support foreground review from the service directory:

```bash
cd /home/will/lab/swarm/openvino-classifier-npu
/home/will/.venvs/npu/bin/python router_classifier.py \
  --host 127.0.0.1 \
  --port 18819 \
  --embed-url http://127.0.0.1:18817/v1/embeddings
```

Required flags/env:

- `--host` / `OPENVINO_CLASSIFIER_HOST`; default `127.0.0.1`.
- `--port` / `OPENVINO_CLASSIFIER_PORT`; default `18819`.
- `--embed-url` / `OPENVINO_CLASSIFIER_EMBED_URL`; default `http://127.0.0.1:18817/v1/embeddings`.
- `--timeout-s` / `OPENVINO_CLASSIFIER_TIMEOUT_S`; default `30`.
- `--max-batch-size` / `OPENVINO_CLASSIFIER_MAX_BATCH_SIZE`; default `32`.
- `--no-warmup` to defer prototype embedding until first request.

A future dedicated CLI mode may be added for one-shot JSONL classification, but foreground HTTP review is sufficient for the dry-run contract.

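The flag/env pairs listed above map naturally onto `argparse` defaults read from the environment. The following is a sketch of one possible wiring, not the prototype's actual parser; `build_parser` and its `environ` parameter are hypothetical, and command-line flags are assumed to take precedence over environment variables.

```python
import argparse
import os

def build_parser(environ=None):
    """Build a parser whose defaults come from OPENVINO_CLASSIFIER_* variables."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Dry-run router classifier")
    parser.add_argument("--host",
                        default=env.get("OPENVINO_CLASSIFIER_HOST", "127.0.0.1"))
    parser.add_argument("--port", type=int,
                        default=int(env.get("OPENVINO_CLASSIFIER_PORT", "18819")))
    parser.add_argument("--embed-url",
                        default=env.get("OPENVINO_CLASSIFIER_EMBED_URL",
                                        "http://127.0.0.1:18817/v1/embeddings"))
    parser.add_argument("--timeout-s", type=float,
                        default=float(env.get("OPENVINO_CLASSIFIER_TIMEOUT_S", "30")))
    parser.add_argument("--max-batch-size", type=int,
                        default=int(env.get("OPENVINO_CLASSIFIER_MAX_BATCH_SIZE", "32")))
    # Defers prototype embedding until the first request.
    parser.add_argument("--no-warmup", action="store_true")
    return parser

# Example: an environment override that is then overridden by an explicit flag.
args = build_parser({"OPENVINO_CLASSIFIER_PORT": "19000"}).parse_args(["--port", "18819"])
```
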
## Synthetic smoke-test plan

Preconditions:

1. Confirm `:18817` embeddings service is healthy.
2. Confirm `:18819` is not already listening.
3. Read `/sys/class/accel/accel0/device/npu_busy_time_us` before starting the request smoke.
4. Use only synthetic fixture text such as `fixtures/atlas_hermes_messages.jsonl`.

Unit/schema smoke, no NPU dependency:

```bash
cd /home/will/lab/swarm
/home/will/.venvs/npu/bin/python -m unittest discover -s openvino-classifier-npu/tests -v
```

Foreground service smoke:

```bash
ss -ltnp | grep ':18819\b' || true
cd /home/will/lab/swarm/openvino-classifier-npu
/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819
```

From another shell:

```bash
curl -fsS http://127.0.0.1:18819/healthz | jq .
curl -fsS http://127.0.0.1:18819/v1/labels | jq .
curl -fsS http://127.0.0.1:18819/v1/classify \
  -H 'Content-Type: application/json' \
  -d '{"id":"smoke-devops","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true,"dry_run":true}}' | jq .
curl -fsS http://127.0.0.1:18819/v1/classify \
  -H 'Content-Type: application/json' \
  -d '{"id":"smoke-safety","text":"Restart the live Atlas gateway and switch primary routing to the new classifier.","options":{"include_evidence":true,"dry_run":true}}' | jq .
```

Expected label checks:

- `smoke-devops`: `tool_needed.value=true`, `urgency.value=high`, `workflow_category.value=devops`.
- `smoke-safety`: `safety_confirmation_required.value=true`, no actual restart or routing change.
- Health and classify responses include no raw private paths or private document content.

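The label expectations above can be checked mechanically against saved responses. This sketch assumes each response carries the `id` used in the smoke request and the `labels` layout shown in the `/v1/classify` example; `EXPECTED`, `get_path`, and `check_expected` are hypothetical names. It cannot verify the "no actual restart or routing change" condition, which still needs a manual check.

```python
_MISSING = object()

# Expected label values per smoke request id, taken from the list above.
EXPECTED = {
    "smoke-devops": {
        "labels.tool_needed.value": True,
        "labels.urgency.value": "high",
        "labels.workflow_category.value": "devops",
    },
    "smoke-safety": {
        "labels.safety_confirmation_required.value": True,
    },
}

def get_path(obj, dotted):
    """Follow a dotted key path through nested dicts; _MISSING if absent."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return _MISSING
        obj = obj[key]
    return obj

def check_expected(response):
    """Return a list of mismatch descriptions for one classify response."""
    mismatches = []
    for path, want in EXPECTED.get(response.get("id"), {}).items():
        got = get_path(response, path)
        if got is _MISSING:
            mismatches.append(f"{path}: missing")
        elif got != want:
            mismatches.append(f"{path}: expected {want!r}, got {got!r}")
    return mismatches
```
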
Shutdown:

- Stop the foreground server with Ctrl-C.
- Re-run `ss -ltnp | grep ':18819\b' || true` and confirm no listener remains.

## NPU busy-time verification plan

Use sysfs plus service response fields; do not accept HTTP 200 alone.

```bash
BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
before=$(cat "$BUSY")
response=$(curl -fsS http://127.0.0.1:18819/v1/classify \
  -H 'Content-Type: application/json' \
  -d '{"id":"npu-proof","text":"Check current systemd service status for the embeddings service.","options":{"include_evidence":false,"dry_run":true}}')
after=$(cat "$BUSY")
echo "$response" | jq '{npu_busy_delta_us, sysfs_npu_busy_delta_us, warnings}'
echo "outer_sysfs_npu_busy_delta_us=$((after-before))"
```

Optional localhost smoke helper, after starting the foreground service:

```bash
/home/will/.venvs/npu/bin/python openvino-classifier-npu/smoke_classifier.py \
  --base-url http://127.0.0.1:18819
```

Acceptance for an NPU-backed classification request:

- HTTP request succeeds.
- Response `npu_busy_delta_us > 0` from upstream embeddings.
- Response `sysfs_npu_busy_delta_us > 0` when sysfs is readable.
- Outer shell `after-before > 0`.
- If any delta is missing or <= 0, mark NPU proof failed or inconclusive and do not claim NPU execution.

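The acceptance rules above can be folded into a single verdict function. This is a sketch, not the behavior of `smoke_classifier.py`: the split between `failed` (a present delta is not positive) and `inconclusive` (a delta is missing) is an assumption, since the contract only says "failed or inconclusive", and the function names are hypothetical.

```python
def read_busy_us(path="/sys/class/accel/accel0/device/npu_busy_time_us"):
    """Return the counter as an int, or None if the file is missing or unreadable."""
    try:
        with open(path, "r", encoding="ascii") as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None

def evaluate_npu_proof(response, outer_before, outer_after):
    """Return (verdict, deltas) for one classify response and outer sysfs reads."""
    outer = None
    if outer_before is not None and outer_after is not None:
        outer = outer_after - outer_before
    deltas = {
        "npu_busy_delta_us": response.get("npu_busy_delta_us"),
        "sysfs_npu_busy_delta_us": response.get("sysfs_npu_busy_delta_us"),
        "outer_sysfs_npu_busy_delta_us": outer,
    }
    present = [value for value in deltas.values() if value is not None]
    if any(value <= 0 for value in present):
        verdict = "failed"          # assumed: a non-positive delta is a hard fail
    elif len(present) < len(deltas):
        verdict = "inconclusive"    # assumed: a missing delta is inconclusive
    else:
        verdict = "passed"
    return verdict, deltas
```

Only a `passed` verdict supports a claim of NPU execution; both other outcomes should be reported as-is in the handoff.
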
## Docs and diagram implications

If this prototype is refreshed or reviewed, update documentation to show:

- Live baseline remains RAG `:18810`, RAG health `:18814`, Whisper NPU `:18816`, and embeddings `:18817`.
- Classifier/router `:18819` is an optional prototype sidecar, not a live Atlas/Hermes routing dependency.
- Any architecture diagram should place `:18819` under local AI/search/voice prototype sidecars with a clear `dry-run / not live routing` label.
- Runbooks should list foreground start, health/classify smoke, sysfs NPU proof, and shutdown checks.
- Service catalog entries should state `not installed/enabled` until Will approves persistent service enablement.
- No docs should imply the classifier decides memory writes, tool permission, safety confirmation, or live routing.

Relevant docs inventory:

- `docs/swarm-infrastructure.md`
- `docs/swarm-infrastructure.html`
- `docs/diagram-maintenance.md`
- `swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md`
- `swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md`

## No-go / defer criteria

Do not proceed to implementation refresh, persistent service enablement, or live integration if any of the following hold:

- `:18817` embeddings is unavailable and no approved NPU embedding fallback exists.
- `/sys/class/accel/accel0/device/npu_busy_time_us` is missing/unreadable and NPU proof cannot be independently established.
- Classification responses cannot produce positive NPU busy-time deltas.
- `:18819` is already occupied by an unknown or live service.
- Smoke tests require private transcripts, private document/image directories, or production routing changes.
- Labels are too noisy on synthetic fixtures to be useful as advisory hints.
- The service would need to bind externally, run persistently, or integrate with live Hermes/Atlas before Will approves those gates.
- Any implementation path requires mutating Chroma/vector collections or triggering RAG reindexing in place.

## Implementation handoff notes

Recommended next engineer actions:

1. Verify or refresh `openvino-classifier-npu/router_classifier.py` to match this contract.
2. Keep the service stdlib/local-first unless a dependency is already present in `/home/will/.venvs/npu`.
3. Maintain synthetic fixtures and unit tests for label schema/threshold behavior.
4. Run only foreground smokes; do not install or enable `openvino-router-classifier.service`.
5. Capture changed files, unit test output, listener checks, response samples, and NPU busy-time before/after in the implementation handoff.