diff --git a/openvino-classifier-npu/CONTRACT.md b/openvino-classifier-npu/CONTRACT.md
new file mode 100644
index 0000000..8e29eac
--- /dev/null
+++ b/openvino-classifier-npu/CONTRACT.md
@@ -0,0 +1,331 @@
+# OpenVINO NPU classifier/router dry-run contract
+
+Status: specification for dry-run prototype refresh
+Target port: `127.0.0.1:18819`
+Owner context: Atlas/Hermes local assistant sidecar evaluation
+
+This service is an advisory classifier for Atlas/Hermes automation hints. It may suggest labels such as tool-needed, memory-candidate type, urgency, workflow category, and safety-confirmation-required, but it must not make or enforce live routing, memory, tool, or safety decisions without a separate explicit approval from Will.
+
+## Recommended model and runtime
+
+Recommended v1 runtime: small local Python HTTP/CLI service backed by the existing OpenVINO NPU embeddings service on `127.0.0.1:18817`.
+
+Recommended v1 model shape:
+
+- Primary signal: `bge-base-en-v1.5-int8-ov` embeddings from the live embeddings service.
+- Classifier layer: inspectable deterministic rules plus cosine similarity against curated synthetic/prototype utterances.
+- Model label: `bge-base-en-v1.5-int8-ov/prototype-router-v0`.
+- Device proof: request-level `npu_busy_delta_us` from `:18817` plus direct sysfs before/after reads from `/sys/class/accel/accel0/device/npu_busy_time_us`.
+
+Why this is preferred for the dry run:
+
+1. It reuses the already-live NPU embeddings path rather than adding a second model conversion/runtime dependency before contract validation.
+2. Rules and prototypes are transparent enough for safety-sensitive routing hints; a reviewer can inspect why a message was labeled.
+3. It avoids fine-tuning or training on private Atlas/Hermes transcripts.
+4. It keeps the service small, localhost-only, and easy to start/stop during smoke tests.
+5. It produces NPU activity through the embeddings path while making clear that final decision logic remains advisory.
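
Review note: the "cosine similarity against curated synthetic/prototype utterances" classifier layer could look roughly like the sketch below. It is a minimal illustration, not the contents of `router_classifier.py`: the function names, the `prototypes` mapping shape, and the `>=` threshold comparison are all assumptions. The default of `0.52` is the recommended `workflow_category` threshold from this contract.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_prototype(query_vec, prototypes):
    """Return (bucket, score) for the most similar prototype vector.

    `prototypes` is a hypothetical mapping of bucket name -> list of
    embedding vectors for that bucket's synthetic utterances.
    """
    best_bucket, best_score = "unknown", 0.0
    for bucket, vectors in prototypes.items():
        for vec in vectors:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_bucket, best_score = bucket, score
    return best_bucket, best_score


def label_workflow(query_vec, prototypes, threshold=0.52):
    """Advisory label: fall back to 'unknown' below the threshold (assumed rule)."""
    bucket, score = best_prototype(query_vec, prototypes)
    value = bucket if score >= threshold else "unknown"
    return {"value": value, "confidence": round(score, 4), "threshold": threshold}
```

Because every score traces back to a named prototype bucket, a reviewer can see which synthetic utterance drove a label, which is the transparency property argued for in the list above.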
+
+Defer a dedicated NPU sequence-classification model such as TinyBERT/MiniLM until the dry-run labels and thresholds have been evaluated against synthetic fixtures and explicitly-approved non-private examples. If pursued later, use OpenVINO Runtime/Optimum export with fixed input shapes suitable for NPU, and keep the rule layer for safety gates.
+
+## Non-goals and safety invariants
+
+The service must not:
+
+- Change Hermes/Atlas model routing, gateway routing, memory writes, tool-use permissions, or safety-confirmation behavior.
+- Restart, stop, enable, or persist any live Atlas/Hermes/gateway/RAG service.
+- Bind to anything broader than `127.0.0.1` by default.
+- Mutate Chroma/vector collections, trigger reindexing, or write to RAG state.
+- Process private document/image directories or private transcript dumps for smoke testing.
+- Log raw prompts by default beyond normal foreground stderr during local review.
+- Claim NPU success from HTTP 200 alone.
+
+## Endpoint contract
+
+All HTTP endpoints are local-only by default.
+
+Base URL:
+
+```text
+http://127.0.0.1:18819
+```
+
+### GET `/healthz`, `/health`, `/readyz`, `/`
+
+Purpose: liveness/readiness metadata.
+
+Response fields:
+
+- `status`: `starting | ok`
+- `service`: `atlas-router-classifier`
+- `version`: service version string
+- `mode`: always `dry_run`
+- `model`: model/runtime label
+- `embed_url`: upstream embeddings URL
+- `device`: expected to say `NPU-via-embedding-service` or equivalent
+- `labels`: supported label names
+- `embedding_dim`: embedding dimension after warmup
+- `prototype_count`: number of synthetic prototype examples loaded
+- `prototype_npu_busy_delta_us`: warmup delta reported by upstream embeddings, if available
+- `npu_busy_time_us`: current sysfs counter value, if readable
+- `warnings`: list of non-fatal warnings
+
+A healthy service is not enough to prove NPU execution. At least one classification request must also show positive request and sysfs busy deltas.
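
Review note: the rule that a healthy service alone proves nothing, and that a classification response must show positive request and sysfs busy deltas, can be expressed as a small check. This is a sketch under assumptions: the function name and the three status strings are hypothetical, and treating a missing delta as `inconclusive` versus a non-positive one as `failed` is one possible reading of the contract rather than something it specifies.

```python
def npu_proof_status(response):
    """Classify NPU evidence in a /v1/classify response dict.

    Returns 'proven' only when both the upstream request delta and the
    sysfs delta are present and strictly positive.
    """
    deltas = (
        response.get("npu_busy_delta_us"),
        response.get("sysfs_npu_busy_delta_us"),
    )
    if any(delta is None for delta in deltas):
        # Assumption: an absent field means we cannot tell either way.
        return "inconclusive"
    if any(delta <= 0 for delta in deltas):
        return "failed"
    return "proven"
```

With the sample response from this contract (`1234` and `1200`) the check yields `proven`; a plain HTTP 200 whose body lacks either field does not.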
+
+### GET `/v1/labels`
+
+Purpose: publish schema information without dumping private examples.
+
+Response fields:
+
+- `model`
+- `thresholds`
+  - `tool_needed`: recommended threshold `0.72`
+  - `memory_candidate`: recommended threshold `0.78`
+  - `safety_confirmation_required`: recommended threshold `0.80`
+  - `workflow_category`: recommended threshold `0.52`
+- `enums`
+  - `memory_candidate`: `none`, `user_preference`, `durable_user_fact`, `environment_fact`, `workflow_convention`, `skill_candidate`
+  - `urgency`: `low`, `normal`, `high`, `critical`
+  - `workflow_category`: `chat`, `research`, `coding`, `debugging`, `devops`, `smart_home`, `media`, `note_taking`, `productivity`, `kanban`, `unknown`
+- `prototype_ids`: names of curated synthetic prototype buckets
+
+### POST `/v1/classify`
+
+Purpose: classify one user/task message for advisory dry-run hints.
+
+Request:
+
+```json
+{
+  "id": "optional-trace-id",
+  "text": "Urgent: check whether port 18817 is listening and inspect systemd logs.",
+  "context": {
+    "platform": "cli",
+    "source": "user"
+  },
+  "options": {
+    "include_evidence": true,
+    "include_embedding_debug": false,
+    "dry_run": true
+  }
+}
+```
+
+Required behavior:
+
+- Reject empty text with HTTP 400.
+- Default `dry_run` to true.
+- Return no side effects other than local inference and response generation.
+- Include evidence by default unless `include_evidence=false`.
+- Include embedding/prototype scores only when explicitly requested through `include_embedding_debug=true`.
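
Review note: the required behaviors above amount to a small request-normalization step. The sketch below is illustrative only. `BadRequest` and `normalize_classify_request` are hypothetical names, the HTTP layer is assumed to map `BadRequest` to HTTP 400, and treating whitespace-only text as empty is an assumption. The contract does not say how an explicit `dry_run=false` should be handled, so the sketch simply passes the value through.

```python
import json


class BadRequest(ValueError):
    """Hypothetical error that the HTTP layer would translate to HTTP 400."""


def normalize_classify_request(raw_body):
    """Parse a /v1/classify body and apply the contract's option defaults."""
    try:
        payload = json.loads(raw_body)
    except (TypeError, ValueError) as exc:
        raise BadRequest("body must be valid JSON") from exc
    if not isinstance(payload, dict):
        raise BadRequest("body must be a JSON object")

    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        raise BadRequest("text must be a non-empty string")

    options = payload.get("options") or {}
    if not isinstance(options, dict):
        raise BadRequest("options must be a JSON object")

    return {
        "id": payload.get("id"),
        "text": text,
        # Evidence is on unless the caller sends include_evidence=false.
        "include_evidence": options.get("include_evidence", True) is not False,
        # Debug scores are off unless the caller sends exactly true.
        "include_embedding_debug": options.get("include_embedding_debug", False) is True,
        # dry_run defaults to true when omitted.
        "dry_run": options.get("dry_run", True),
    }
```

Keeping normalization separate from inference makes the defaults testable without the embeddings service, which fits the "unit/schema smoke, no NPU dependency" tier of the test plan.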
+
+Response:
+
+```json
+{
+  "id": "optional-trace-id",
+  "model": "bge-base-en-v1.5-int8-ov/prototype-router-v0",
+  "created": 1780590000,
+  "duration_ms": 12.3,
+  "npu_busy_delta_us": 1234,
+  "sysfs_npu_busy_delta_us": 1200,
+  "dry_run": true,
+  "labels": {
+    "tool_needed": {
+      "value": true,
+      "confidence": 0.84,
+      "threshold": 0.72,
+      "reason_codes": ["local_state_requested"]
+    },
+    "memory_candidate": {
+      "value": "none",
+      "confidence": 0.31,
+      "threshold": 0.78,
+      "reason_codes": []
+    },
+    "urgency": {
+      "value": "high",
+      "confidence": 0.84,
+      "scores": {"low": 0.0, "normal": 0.2, "high": 0.84, "critical": 0.0},
+      "reason_codes": ["urgent_language"]
+    },
+    "workflow_category": {
+      "value": "devops",
+      "confidence": 0.86,
+      "scores": {"devops": 0.86, "unknown": 0.14}
+    },
+    "safety_confirmation_required": {
+      "value": false,
+      "confidence": 0.0,
+      "threshold": 0.8,
+      "reason_codes": []
+    }
+  },
+  "warnings": [],
+  "evidence": []
+}
+```
+
+### POST `/v1/batch_classify`
+
+Purpose: classify a bounded batch of non-private synthetic or explicitly-approved messages.
+
+Request:
+
+```json
+{
+  "items": [
+    {"id": "m1", "text": "What time is it in Seattle right now?"},
+    {"id": "m2", "text": "Restart the live Atlas gateway and switch primary routing."}
+  ],
+  "options": {"include_evidence": false, "dry_run": true}
+}
+```
+
+Response:
+
+- `model`
+- `duration_ms`
+- aggregate `npu_busy_delta_us`
+- `results`: array of `/v1/classify` responses
+
+Batch limits for prototype review:
+
+- Keep batches small, ideally <= 32 items.
+- Use only synthetic fixtures unless Will explicitly approves a real non-private sample set.
+- Do not retain request bodies to disk.
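
Review note: a client that wants to respect the "ideally <= 32 items" guidance could split a fixture list before posting. This is a hedged client-side sketch: `chunked` and `build_batch_requests` are hypothetical helpers, and whether the service itself enforces a hard cap is not stated in this contract.

```python
def chunked(items, max_items=32):
    """Yield consecutive slices of `items`, each at most `max_items` long."""
    if max_items < 1:
        raise ValueError("max_items must be >= 1")
    for start in range(0, len(items), max_items):
        yield items[start:start + max_items]


def build_batch_requests(items, max_items=32):
    """Build /v1/batch_classify request bodies in memory (nothing is written to disk)."""
    return [
        {"items": chunk, "options": {"include_evidence": False, "dry_run": True}}
        for chunk in chunked(items, max_items)
    ]
```

For example, 70 synthetic fixture messages would become three request bodies of 32, 32, and 6 items.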
+
+## CLI contract
+
+The same implementation should support foreground review from the service directory:
+
+```bash
+cd /home/will/lab/swarm/openvino-classifier-npu
+/home/will/.venvs/npu/bin/python router_classifier.py \
+  --host 127.0.0.1 \
+  --port 18819 \
+  --embed-url http://127.0.0.1:18817/v1/embeddings
+```
+
+Required flags/env:
+
+- `--host` / `OPENVINO_CLASSIFIER_HOST`; default `127.0.0.1`.
+- `--port` / `OPENVINO_CLASSIFIER_PORT`; default `18819`.
+- `--embed-url` / `OPENVINO_CLASSIFIER_EMBED_URL`; default `http://127.0.0.1:18817/v1/embeddings`.
+- `--timeout-s` / `OPENVINO_CLASSIFIER_TIMEOUT_S`; default `30`.
+- `--no-warmup` to defer prototype embedding until first request.
+
+A future dedicated CLI mode may be added for one-shot JSONL classification, but foreground HTTP review is sufficient for the dry-run contract.
+
+## Synthetic smoke-test plan
+
+Preconditions:
+
+1. Confirm `:18817` embeddings service is healthy.
+2. Confirm `:18819` is not already listening.
+3. Read `/sys/class/accel/accel0/device/npu_busy_time_us` before starting the request smoke.
+4. Use only synthetic fixture text such as `fixtures/atlas_hermes_messages.jsonl`.
+
+Unit/schema smoke, no NPU dependency:
+
+```bash
+cd /home/will/lab/swarm
+/home/will/.venvs/npu/bin/python -m unittest discover -s openvino-classifier-npu/tests -v
+```
+
+Foreground service smoke:
+
+```bash
+ss -ltnp | grep ':18819\b' || true
+cd /home/will/lab/swarm/openvino-classifier-npu
+/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819
+```
+
+From another shell:
+
+```bash
+curl -fsS http://127.0.0.1:18819/healthz | jq .
+curl -fsS http://127.0.0.1:18819/v1/labels | jq .
+curl -fsS http://127.0.0.1:18819/v1/classify \
+  -H 'Content-Type: application/json' \
+  -d '{"id":"smoke-devops","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true,"dry_run":true}}' | jq .
+curl -fsS http://127.0.0.1:18819/v1/classify \
+  -H 'Content-Type: application/json' \
+  -d '{"id":"smoke-safety","text":"Restart the live Atlas gateway and switch primary routing to the new classifier.","options":{"include_evidence":true,"dry_run":true}}' | jq .
+```
+
+Expected label checks:
+
+- `smoke-devops`: `tool_needed.value=true`, `urgency.value=high`, `workflow_category.value=devops`.
+- `smoke-safety`: `safety_confirmation_required.value=true`, no actual restart or routing change.
+- Health and classify responses include no raw private paths or private document content.
+
+Shutdown:
+
+- Stop the foreground server with Ctrl-C.
+- Re-run `ss -ltnp | grep ':18819\b' || true` and confirm no listener remains.
+
+## NPU busy-time verification plan
+
+Use sysfs plus service response fields; do not accept HTTP 200 alone.
+
+```bash
+BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
+before=$(cat "$BUSY")
+response=$(curl -fsS http://127.0.0.1:18819/v1/classify \
+  -H 'Content-Type: application/json' \
+  -d '{"id":"npu-proof","text":"Check current systemd service status for the embeddings service.","options":{"include_evidence":false,"dry_run":true}}')
+after=$(cat "$BUSY")
+echo "$response" | jq '{npu_busy_delta_us, sysfs_npu_busy_delta_us, warnings}'
+echo "outer_sysfs_npu_busy_delta_us=$((after-before))"
+```
+
+Acceptance for an NPU-backed classification request:
+
+- HTTP request succeeds.
+- Response `npu_busy_delta_us > 0` from upstream embeddings.
+- Response `sysfs_npu_busy_delta_us > 0` when sysfs is readable.
+- Outer shell `after-before > 0`.
+- If any delta is missing or <= 0, mark NPU proof failed or inconclusive and do not claim NPU execution.
+
+## Docs and diagram implications
+
+If this prototype is refreshed or reviewed, update documentation to show:
+
+- Live baseline remains RAG `:18810`, RAG health `:18814`, Whisper NPU `:18816`, and embeddings `:18817`.
+- Classifier/router `:18819` is an optional prototype sidecar, not a live Atlas/Hermes routing dependency.
+- Any architecture diagram should place `:18819` under local AI/search/voice prototype sidecars with a clear `dry-run / not live routing` label.
+- Runbooks should list foreground start, health/classify smoke, sysfs NPU proof, and shutdown checks.
+- Service catalog entries should state `not installed/enabled` until Will approves persistent service enablement.
+- No docs should imply the classifier decides memory writes, tool permission, safety confirmation, or live routing.
+
+Relevant docs inventory:
+
+- `docs/swarm-infrastructure.md`
+- `docs/swarm-infrastructure.html`
+- `docs/diagram-maintenance.md`
+- `swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md`
+- `swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md`
+
+## No-go / defer criteria
+
+Do not proceed to implementation refresh, persistent service enablement, or live integration if any of the following hold:
+
+- `:18817` embeddings is unavailable and no approved NPU embedding fallback exists.
+- `/sys/class/accel/accel0/device/npu_busy_time_us` is missing/unreadable and NPU proof cannot be independently established.
+- Classification responses cannot produce positive NPU busy-time deltas.
+- `:18819` is already occupied by an unknown or live service.
+- Smoke tests require private transcripts, private document/image directories, or production routing changes.
+- Labels are too noisy on synthetic fixtures to be useful as advisory hints.
+- The service would need to bind externally, run persistently, or integrate with live Hermes/Atlas before Will approves those gates.
+- Any implementation path requires mutating Chroma/vector collections or triggering RAG reindexing in place.
+
+## Implementation handoff notes
+
+Recommended next engineer actions:
+
+1. Verify or refresh `openvino-classifier-npu/router_classifier.py` to match this contract.
+2. Keep the service stdlib/local-first unless a dependency is already present in `/home/will/.venvs/npu`.
+3. Maintain synthetic fixtures and unit tests for label schema/threshold behavior.
+4. Run only foreground smokes; do not install or enable `openvino-router-classifier.service`.
+5. Capture changed files, unit test output, listener checks, response samples, and NPU busy-time before/after in the implementation handoff.
diff --git a/openvino-classifier-npu/README.md b/openvino-classifier-npu/README.md
index 1d42223..74654cb 100644
--- a/openvino-classifier-npu/README.md
+++ b/openvino-classifier-npu/README.md
@@ -2,6 +2,10 @@
 
 Dry-run Atlas/Hermes message classifier/router prototype.
 
+The detailed dry-run contract is in [`CONTRACT.md`](./CONTRACT.md), including the
+recommended model/runtime, HTTP/CLI schema, smoke-test plan, NPU busy-time proof,
+docs/diagram implications, and no-go/defer criteria.
+
 It reuses the existing OpenVINO NPU embeddings service on `127.0.0.1:18817` and
 serves an inspectable stdlib HTTP API on `127.0.0.1:18819`. It does not change
 live Hermes/Atlas routing, write memory, mutate vector collections, restart