swarm-master/openvino-classifier-npu/CONTRACT.md
2026-06-04 13:07:51 -07:00

OpenVINO NPU classifier/router dry-run contract

Status: specification for dry-run prototype refresh
Target port: 127.0.0.1:18819
Owner context: Atlas/Hermes local assistant sidecar evaluation

This service is an advisory classifier for Atlas/Hermes automation hints. It may suggest labels such as tool-needed, memory-candidate type, urgency, workflow category, and safety-confirmation-required, but it must not make or enforce live routing, memory, tool, or safety decisions without a separate explicit approval from Will.

Recommended v1 runtime: small local Python HTTP/CLI service backed by the existing OpenVINO NPU embeddings service on 127.0.0.1:18817.

Recommended v1 model shape:

  • Primary signal: bge-base-en-v1.5-int8-ov embeddings from the live embeddings service.
  • Classifier layer: inspectable deterministic rules plus cosine similarity against curated synthetic/prototype utterances.
  • Model label: bge-base-en-v1.5-int8-ov/prototype-router-v0.
  • Device proof: request-level npu_busy_delta_us from :18817 plus direct sysfs before/after reads from /sys/class/accel/accel0/device/npu_busy_time_us.
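As a rough illustration of the prototype-similarity layer described above, the sketch below scores a query embedding against curated prototype vectors using cosine similarity. The function names (`cosine`, `best_prototype`) and the toy 3-dimensional vectors are hypothetical; real vectors would come from the embeddings service, and the actual classifier may combine scores differently.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_prototype(query, prototypes):
    """Return (bucket_id, score) for the most similar prototype vector.

    `prototypes` maps a bucket id to a list of embedding vectors
    (a hypothetical in-memory layout, not taken from the contract).
    """
    best_id, best_score = None, -1.0
    for bucket_id, vectors in prototypes.items():
        for vec in vectors:
            score = cosine(query, vec)
            if score > best_score:
                best_id, best_score = bucket_id, score
    return best_id, best_score


# Toy vectors standing in for real embeddings.
prototypes = {"devops": [[1.0, 0.0, 0.0]], "chat": [[0.0, 1.0, 0.0]]}
bucket, score = best_prototype([0.9, 0.1, 0.0], prototypes)
print(bucket, round(score, 3))
```

Keeping the similarity step this small is one way to preserve the inspectability the contract asks for: a reviewer can see which prototype bucket won and by how much.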

Why this is preferred for the dry run:

  1. It reuses the already-live NPU embeddings path rather than adding a second model conversion/runtime dependency before contract validation.
  2. Rules and prototypes are transparent enough for safety-sensitive routing hints; a reviewer can inspect why a message was labeled.
  3. It avoids fine-tuning or training on private Atlas/Hermes transcripts.
  4. It keeps the service small, localhost-only, and easy to start/stop during smoke tests.
  5. It produces NPU activity through the embeddings path while making clear that final decision logic remains advisory.

Defer a dedicated NPU sequence-classification model such as TinyBERT/MiniLM until the dry-run labels and thresholds have been evaluated against synthetic fixtures and explicitly-approved non-private examples. If pursued later, use OpenVINO Runtime/Optimum export with fixed input shapes suitable for NPU, and keep the rule layer for safety gates.

Non-goals and safety invariants

The service must not:

  • Change Hermes/Atlas model routing, gateway routing, memory writes, tool-use permissions, or safety-confirmation behavior.
  • Restart, stop, enable, or persist any live Atlas/Hermes/gateway/RAG service.
  • Bind to anything broader than 127.0.0.1 by default.
  • Mutate Chroma/vector collections, trigger reindexing, or write to RAG state.
  • Process private document/image directories or private transcript dumps for smoke testing.
  • Log raw prompts by default beyond normal foreground stderr during local review.
  • Claim NPU success from HTTP 200 alone.
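One way the localhost-only invariant could be enforced at startup is a small guard on the bind host. This is a sketch under assumptions: `is_loopback_host`, `require_loopback`, and the `allow_external` override are hypothetical names, and the contract does not specify how an approved external bind would be expressed.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True only for literal loopback IP addresses such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Hostnames (including "localhost") are rejected rather than resolved,
        # so the decision never depends on DNS or /etc/hosts.
        return False


def require_loopback(host: str, allow_external: bool = False) -> str:
    """Return `host` if it is acceptable to bind, else raise ValueError.

    `allow_external` is a hypothetical explicit override; it defaults to False.
    """
    if allow_external or is_loopback_host(host):
        return host
    raise ValueError(f"refusing to bind to non-loopback host {host!r}")


print(require_loopback("127.0.0.1"))
```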

Endpoint contract

All HTTP endpoints are local-only by default.

Base URL:

http://127.0.0.1:18819

GET /healthz, /health, /readyz, /

Purpose: liveness/readiness metadata.

Response fields:

  • status: starting | ok
  • service: atlas-router-classifier
  • version: service version string
  • mode: always dry_run
  • model: model/runtime label
  • embed_url: upstream embeddings URL
  • device: expected to say NPU-via-embedding-service or equivalent
  • labels: supported label names
  • embedding_dim: embedding dimension after warmup
  • prototype_count: number of synthetic prototype examples loaded
  • prototype_npu_busy_delta_us: warmup delta reported by upstream embeddings, if available
  • npu_busy_time_us: current sysfs counter value, if readable
  • warnings: list of non-fatal warnings

A healthy response alone does not prove NPU execution. At least one classification request must also show a positive request-level busy delta and a positive sysfs busy delta.

GET /v1/labels

Purpose: publish schema information without dumping private examples.

Response fields:

  • model
  • thresholds
    • tool_needed: recommended threshold 0.72
    • memory_candidate: recommended threshold 0.78
    • safety_confirmation_required: recommended threshold 0.80
    • workflow_category: recommended threshold 0.52
  • enums
    • memory_candidate: none, user_preference, durable_user_fact, environment_fact, workflow_convention, skill_candidate
    • urgency: low, normal, high, critical
    • workflow_category: chat, research, coding, debugging, devops, smart_home, media, note_taking, productivity, kanban, unknown
  • prototype_ids: names of curated synthetic prototype buckets
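The recommended thresholds above could be applied along the following lines. This is a minimal sketch, not the service's confirmed logic: the helper names, the `>=` comparison, and the choice to fall back to a default enum value when the top score is below threshold are all assumptions.

```python
# Recommended thresholds as listed for GET /v1/labels.
THRESHOLDS = {
    "tool_needed": 0.72,
    "memory_candidate": 0.78,
    "safety_confirmation_required": 0.80,
    "workflow_category": 0.52,
}


def boolean_label(name, confidence, reason_codes=()):
    """Build a boolean label entry; value is True when confidence meets the threshold."""
    threshold = THRESHOLDS[name]
    return {
        "value": confidence >= threshold,  # assumption: inclusive comparison
        "confidence": confidence,
        "threshold": threshold,
        "reason_codes": list(reason_codes),
    }


def enum_label(name, scores, fallback):
    """Pick the top-scoring enum value, or `fallback` if it is below threshold."""
    best = max(scores, key=scores.get)
    confidence = scores[best]
    threshold = THRESHOLDS[name]
    value = best if confidence >= threshold else fallback
    return {"value": value, "confidence": confidence,
            "threshold": threshold, "scores": dict(scores)}


print(boolean_label("tool_needed", 0.84, ["local_state_requested"]))
print(enum_label("workflow_category", {"devops": 0.86, "unknown": 0.14}, "unknown"))
```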

POST /v1/classify

Purpose: classify one user/task message for advisory dry-run hints.

Request:

{
  "id": "optional-trace-id",
  "text": "Urgent: check whether port 18817 is listening and inspect systemd logs.",
  "context": {
    "platform": "cli",
    "source": "user"
  },
  "options": {
    "include_evidence": true,
    "include_embedding_debug": false,
    "dry_run": true
  }
}

Required behavior:

  • Reject empty text with HTTP 400.
  • Default dry_run to true.
  • Produce no side effects other than local inference and response generation.
  • Include evidence by default unless include_evidence=false.
  • Include embedding/prototype scores only when explicitly requested through include_embedding_debug=true.
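The required behavior above might translate into request normalization like the sketch below. `BadRequest` and `normalize_request` are hypothetical names; the HTTP layer is assumed to map `BadRequest` to a 400 response. The contract does not say how an explicit `dry_run: false` should be handled, so this sketch keeps dry-run on and records a warning, which is an assumption.

```python
class BadRequest(ValueError):
    """Raised for inputs that should produce HTTP 400 (hypothetical helper)."""


def normalize_request(body):
    """Validate a /v1/classify body and fill in contract defaults."""
    if not isinstance(body, dict):
        raise BadRequest("request body must be a JSON object")
    text = body.get("text")
    if not isinstance(text, str) or not text.strip():
        raise BadRequest("text must be a non-empty string")
    options = body.get("options") or {}
    warnings = []
    if options.get("dry_run", True) is False:
        # Assumption: the prototype never leaves dry-run mode.
        warnings.append("dry_run=false ignored; service is dry-run only")
    return {
        "id": body.get("id"),
        "text": text,
        "dry_run": True,
        # Evidence is on unless explicitly disabled.
        "include_evidence": options.get("include_evidence", True) is not False,
        # Debug scores are off unless explicitly enabled.
        "include_embedding_debug": options.get("include_embedding_debug", False) is True,
        "warnings": warnings,
    }


print(normalize_request({"text": "What time is it in Seattle right now?"}))
```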

Response:

{
  "id": "optional-trace-id",
  "model": "bge-base-en-v1.5-int8-ov/prototype-router-v0",
  "created": 1780590000,
  "duration_ms": 12.3,
  "npu_busy_delta_us": 1234,
  "sysfs_npu_busy_delta_us": 1200,
  "dry_run": true,
  "labels": {
    "tool_needed": {
      "value": true,
      "confidence": 0.84,
      "threshold": 0.72,
      "reason_codes": ["local_state_requested"]
    },
    "memory_candidate": {
      "value": "none",
      "confidence": 0.31,
      "threshold": 0.78,
      "reason_codes": []
    },
    "urgency": {
      "value": "high",
      "confidence": 0.84,
      "scores": {"low": 0.0, "normal": 0.2, "high": 0.84, "critical": 0.0},
      "reason_codes": ["urgent_language"]
    },
    "workflow_category": {
      "value": "devops",
      "confidence": 0.86,
      "scores": {"devops": 0.86, "unknown": 0.14}
    },
    "safety_confirmation_required": {
      "value": false,
      "confidence": 0.0,
      "threshold": 0.8,
      "reason_codes": []
    }
  },
  "warnings": [],
  "evidence": []
}

POST /v1/batch_classify

Purpose: classify a bounded batch of non-private synthetic or explicitly-approved messages.

Request:

{
  "items": [
    {"id": "m1", "text": "What time is it in Seattle right now?"},
    {"id": "m2", "text": "Restart the live Atlas gateway and switch primary routing."}
  ],
  "options": {"include_evidence": false, "dry_run": true}
}

Response:

  • model
  • duration_ms
  • npu_busy_delta_us: aggregate for the whole batch
  • results: array of /v1/classify responses

Batch limits for prototype review:

  • Keep batches small; the prototype rejects empty batches and batches larger than OPENVINO_CLASSIFIER_MAX_BATCH_SIZE (default 32).
  • Use only synthetic fixtures unless Will explicitly approves a real non-private sample set.
  • Do not retain request bodies to disk.
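The size limits above could be checked before any embedding call is made. The following is a sketch only; `validate_batch` is a hypothetical helper, and the caller is assumed to turn the `ValueError` into an HTTP 400.

```python
DEFAULT_MAX_BATCH_SIZE = 32  # mirrors the OPENVINO_CLASSIFIER_MAX_BATCH_SIZE default


def validate_batch(body, max_batch_size=DEFAULT_MAX_BATCH_SIZE):
    """Return the list of items, rejecting empty or oversized batches."""
    items = body.get("items") if isinstance(body, dict) else None
    if not isinstance(items, list) or not items:
        raise ValueError("items must be a non-empty list")
    if len(items) > max_batch_size:
        raise ValueError(
            f"batch of {len(items)} exceeds max batch size {max_batch_size}"
        )
    return items


items = validate_batch({"items": [{"id": "m1", "text": "What time is it in Seattle right now?"}]})
print(len(items))
```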

CLI contract

The same implementation should support foreground review from the service directory:

cd /home/will/lab/swarm/openvino-classifier-npu
/home/will/.venvs/npu/bin/python router_classifier.py \
  --host 127.0.0.1 \
  --port 18819 \
  --embed-url http://127.0.0.1:18817/v1/embeddings

Required flags/env:

  • --host / OPENVINO_CLASSIFIER_HOST; default 127.0.0.1.
  • --port / OPENVINO_CLASSIFIER_PORT; default 18819.
  • --embed-url / OPENVINO_CLASSIFIER_EMBED_URL; default http://127.0.0.1:18817/v1/embeddings.
  • --timeout-s / OPENVINO_CLASSIFIER_TIMEOUT_S; default 30.
  • --max-batch-size / OPENVINO_CLASSIFIER_MAX_BATCH_SIZE; default 32.
  • --no-warmup to defer prototype embedding until first request.
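A minimal sketch of how these flags and environment variables might be wired together with `argparse`, with environment values acting as defaults that command-line flags override. The `build_parser` function and its `env` parameter are hypothetical, and the real `router_classifier.py` may be structured differently.

```python
import argparse
import os


def build_parser(env=None):
    """Build the CLI parser; environment values act as defaults for flags."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Dry-run router classifier (sketch)")
    parser.add_argument("--host", default=env.get("OPENVINO_CLASSIFIER_HOST", "127.0.0.1"))
    parser.add_argument("--port", type=int,
                        default=int(env.get("OPENVINO_CLASSIFIER_PORT", "18819")))
    parser.add_argument("--embed-url",
                        default=env.get("OPENVINO_CLASSIFIER_EMBED_URL",
                                        "http://127.0.0.1:18817/v1/embeddings"))
    parser.add_argument("--timeout-s", type=float,
                        default=float(env.get("OPENVINO_CLASSIFIER_TIMEOUT_S", "30")))
    parser.add_argument("--max-batch-size", type=int,
                        default=int(env.get("OPENVINO_CLASSIFIER_MAX_BATCH_SIZE", "32")))
    parser.add_argument("--no-warmup", action="store_true",
                        help="defer prototype embedding until the first request")
    return parser


# An explicit flag wins over the environment, which wins over the built-in default.
args = build_parser({"OPENVINO_CLASSIFIER_PORT": "19000"}).parse_args(["--port", "19001"])
print(args.host, args.port, args.no_warmup)
```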

A future dedicated CLI mode may be added for one-shot JSONL classification, but foreground HTTP review is sufficient for the dry-run contract.

Synthetic smoke-test plan

Preconditions:

  1. Confirm :18817 embeddings service is healthy.
  2. Confirm :18819 is not already listening.
  3. Read /sys/class/accel/accel0/device/npu_busy_time_us before sending any smoke request.
  4. Use only synthetic fixture text such as fixtures/atlas_hermes_messages.jsonl.
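Preconditions 1 and 2 can also be checked from Python rather than with `ss`. The sketch below only tests whether a TCP listener accepts connections; it does not confirm that the embeddings service is healthy at the application level. `is_listening` is a hypothetical helper.

```python
import socket


def is_listening(host, port, timeout_s=0.5):
    """Return True if a TCP connection to (host, port) succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout_s)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("embeddings :18817 listening:", is_listening("127.0.0.1", 18817))
    print("classifier :18819 already in use:", is_listening("127.0.0.1", 18819))
```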

Unit/schema smoke, no NPU dependency:

cd /home/will/lab/swarm
/home/will/.venvs/npu/bin/python -m unittest discover -s openvino-classifier-npu/tests -v

Foreground service smoke:

ss -ltnp | grep ':18819\b' || true
cd /home/will/lab/swarm/openvino-classifier-npu
/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819

From another shell:

curl -fsS http://127.0.0.1:18819/healthz | jq .
curl -fsS http://127.0.0.1:18819/v1/labels | jq .
curl -fsS http://127.0.0.1:18819/v1/classify \
  -H 'Content-Type: application/json' \
  -d '{"id":"smoke-devops","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true,"dry_run":true}}' | jq .
curl -fsS http://127.0.0.1:18819/v1/classify \
  -H 'Content-Type: application/json' \
  -d '{"id":"smoke-safety","text":"Restart the live Atlas gateway and switch primary routing to the new classifier.","options":{"include_evidence":true,"dry_run":true}}' | jq .

Expected label checks:

  • smoke-devops: tool_needed.value=true, urgency.value=high, workflow_category.value=devops.
  • smoke-safety: safety_confirmation_required.value=true, no actual restart or routing change.
  • Health and classify responses include no raw private paths or private document content.
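The expected label checks could be automated with a small comparison helper like the one sketched here. The `EXPECTED` table restates the two checks above; `label_mismatches` is a hypothetical name, and the sample response is abbreviated from the contract's example rather than captured from a real run.

```python
# Expected label values keyed by smoke request id.
EXPECTED = {
    "smoke-devops": {"tool_needed": True, "urgency": "high", "workflow_category": "devops"},
    "smoke-safety": {"safety_confirmation_required": True},
}


def label_mismatches(response):
    """Return human-readable mismatches between a response and EXPECTED."""
    expected = EXPECTED.get(response.get("id"), {})
    labels = response.get("labels", {})
    mismatches = []
    for name, want in expected.items():
        got = labels.get(name, {}).get("value")
        if got != want:
            mismatches.append(f"{name}: expected {want!r}, got {got!r}")
    return mismatches


sample = {
    "id": "smoke-devops",
    "labels": {
        "tool_needed": {"value": True},
        "urgency": {"value": "high"},
        "workflow_category": {"value": "devops"},
    },
}
print(label_mismatches(sample))
```

An empty list means every expected label matched; a response with an unknown id also yields an empty list, so callers may want to check the id separately.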

Shutdown:

  • Stop the foreground server with Ctrl-C.
  • Re-run ss -ltnp | grep ':18819\b' || true and confirm no listener remains.

NPU busy-time verification plan

Use sysfs plus service response fields; do not accept HTTP 200 alone.

BUSY=/sys/class/accel/accel0/device/npu_busy_time_us
before=$(cat "$BUSY")
response=$(curl -fsS http://127.0.0.1:18819/v1/classify \
  -H 'Content-Type: application/json' \
  -d '{"id":"npu-proof","text":"Check current systemd service status for the embeddings service.","options":{"include_evidence":false,"dry_run":true}}')
after=$(cat "$BUSY")
echo "$response" | jq '{npu_busy_delta_us, sysfs_npu_busy_delta_us, warnings}'
echo "outer_sysfs_npu_busy_delta_us=$((after-before))"

Optional localhost smoke helper, after starting the foreground service:

/home/will/.venvs/npu/bin/python openvino-classifier-npu/smoke_classifier.py \
  --base-url http://127.0.0.1:18819

Acceptance for an NPU-backed classification request:

  • HTTP request succeeds.
  • Response npu_busy_delta_us > 0 from upstream embeddings.
  • Response sysfs_npu_busy_delta_us > 0 when sysfs is readable.
  • Outer shell after-before > 0.
  • If any delta is missing or <= 0, mark NPU proof failed or inconclusive and do not claim NPU execution.
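The acceptance criteria above can be expressed as a single check over the three deltas. This is a sketch under assumptions: `read_busy_us` and `npu_proof` are hypothetical helpers, and treating a missing value the same as a non-positive one follows the last bullet rather than any confirmed implementation.

```python
BUSY_PATH = "/sys/class/accel/accel0/device/npu_busy_time_us"


def read_busy_us(path=BUSY_PATH):
    """Return the sysfs busy counter as an int, or None if it cannot be read."""
    try:
        with open(path, "r", encoding="ascii") as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def npu_proof(response, outer_before, outer_after):
    """Evaluate NPU proof: every delta must be present and strictly positive."""
    outer = None
    if outer_before is not None and outer_after is not None:
        outer = outer_after - outer_before
    deltas = {
        "npu_busy_delta_us": response.get("npu_busy_delta_us"),
        "sysfs_npu_busy_delta_us": response.get("sysfs_npu_busy_delta_us"),
        "outer_sysfs_npu_busy_delta_us": outer,
    }
    failures = [
        name for name, value in deltas.items()
        if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0
    ]
    # Any missing or non-positive delta means failed or inconclusive proof.
    return {"proved": not failures, "failures": failures, "deltas": deltas}


result = npu_proof({"npu_busy_delta_us": 1234, "sysfs_npu_busy_delta_us": 1200}, 5000, 6300)
print(result["proved"], result["failures"])
```

In a real smoke run, `outer_before` and `outer_after` would come from `read_busy_us()` called immediately before and after the classify request.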

Docs and diagram implications

If this prototype is refreshed or reviewed, update documentation to show:

  • Live baseline remains RAG :18810, RAG health :18814, Whisper NPU :18816, and embeddings :18817.
  • Classifier/router :18819 is an optional prototype sidecar, not a live Atlas/Hermes routing dependency.
  • Any architecture diagram should place :18819 under local AI/search/voice prototype sidecars with a clear "dry-run / not live routing" label.
  • Runbooks should list foreground start, health/classify smoke, sysfs NPU proof, and shutdown checks.
  • Service catalog entries should state that the classifier is not installed or enabled until Will approves persistent service enablement.
  • No docs should imply the classifier decides memory writes, tool permission, safety confirmation, or live routing.

Relevant docs inventory:

  • docs/swarm-infrastructure.md
  • docs/swarm-infrastructure.html
  • docs/diagram-maintenance.md
  • swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md
  • swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md

No-go / defer criteria

Do not proceed to implementation refresh, persistent service enablement, or live integration if any of the following hold:

  • The :18817 embeddings service is unavailable and no approved NPU embedding fallback exists.
  • /sys/class/accel/accel0/device/npu_busy_time_us is missing/unreadable and NPU proof cannot be independently established.
  • Classification responses cannot produce positive NPU busy-time deltas.
  • :18819 is already occupied by an unknown or live service.
  • Smoke tests require private transcripts, private document/image directories, or production routing changes.
  • Labels are too noisy on synthetic fixtures to be useful as advisory hints.
  • The service would need to bind externally, run persistently, or integrate with live Hermes/Atlas before Will approves those gates.
  • Any implementation path requires mutating Chroma/vector collections or triggering RAG reindexing in place.

Implementation handoff notes

Recommended next engineer actions:

  1. Verify or refresh openvino-classifier-npu/router_classifier.py to match this contract.
  2. Keep the service stdlib/local-first unless a dependency is already present in /home/will/.venvs/npu.
  3. Maintain synthetic fixtures and unit tests for label schema/threshold behavior.
  4. Run only foreground smokes; do not install or enable openvino-router-classifier.service.
  5. Capture changed files, unit test output, listener checks, response samples, and NPU busy-time before/after in the implementation handoff.