swarm-master/openvino-doc-image-triage-npu/SPEC.md
2026-06-04 13:07:51 -07:00


OpenVINO NPU document/image triage spec

Status: CLI-first prototype specification; not a live Atlas/Hermes integration.

Safety stance

  • Default workflow is local CLI execution against explicitly named files.
  • Optional HTTP stays disabled unless a human starts it. When started, it must bind only to loopback (127.0.0.1, ::1, or localhost); the intended address is 127.0.0.1:18829.
  • No persistent systemd unit, Docker service, gateway hook, Atlas/Hermes route, RAG route, Chroma/vector collection mutation, or in-place reindexing is part of this spec.
  • Smoke data must be synthetic/non-private only. Do not point this tool at Will's private document, image, screenshot, Downloads, Desktop, Obsidian, or photo-library directories without explicit approval.
  • NPU claims require /sys/class/accel/accel0/device/npu_busy_time_us before/after deltas. HTTP 200, JSON output, or model-load success alone is not NPU proof.

Recommended v1 runtime:

  • File intake, hashing, MIME/extension checks, image/PDF rendering, sidecar/native PDF text extraction, metadata extraction, and category fallback: local Python CPU path using Pillow plus optional pypdf/pypdfium2.
  • Needs-attention semantic check: reuse the live localhost OpenVINO embeddings service on 127.0.0.1:18817, which currently serves bge-base-en-v1.5-int8-ov, and verify each embedding call with npu_busy_time_us deltas.
  • Category classification in v1: CPU rule fallback, explicitly reported as not an NPU image model.

Why this is the recommended v1:

  • It avoids private-data exposure: no external upload path and no broader local file scanning.
  • It avoids collection/routing risk by using the existing embeddings API as a stateless feature extractor only; it does not write to RAG or Chroma.
  • It gives a real NPU verification hook for the semantic stage without overclaiming that OCR/image classification are NPU-backed.
  • It keeps the prototype useful even when optional PDF dependencies or the embeddings service are unavailable: it can fall back to CPU-only metadata/rule output and mark NPU verification false.

Deferred model work:

  • NPU image category classifier: defer until a static-shape OpenVINO IR image model such as MobileNet/EfficientNet/ResNet is selected, calibrated for the label set, and smoke-tested with busy-time deltas.
  • NPU OCR/VLM: defer; OCR remains local CPU text plumbing in v1.

CLI contract

Command:

cd /home/will/lab/swarm/openvino-doc-image-triage-npu
/home/will/.venvs/npu/bin/python triage.py \
  --allowed-root /home/will/lab/swarm/openvino-doc-image-triage-npu \
  --max-pages 3 \
  --pretty \
  samples/synthetic_invoice.png samples/synthetic_invoice.pdf

Inputs:

  • Positional paths: one or more local image/PDF paths.
  • --allowed-root ROOT: may repeat; every requested path must resolve under one of these roots. Default is current directory.
  • --max-pages N: maximum rendered/extracted PDF pages; default 3.
  • --no-embeddings: disables the localhost :18817 embedding/NPU check; the result then reports CPU fallback or no text instead of NPU evidence.
  • --dry-run: skip image/PDF rendering while still checking intake/hash/text/metadata where available.
  • --include-ocr-text: include raw extracted/sidecar text in this single response only; off by default.
  • --include-full-path: include resolved full paths; off by default.
  • --pretty: pretty-print JSON.
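
The inputs above map fairly directly onto an argument parser. The following is a minimal sketch, assuming triage.py uses the standard-library argparse module; the function name build_parser and the help strings are illustrative and are not taken from the prototype.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in this spec."""
    parser = argparse.ArgumentParser(prog="triage.py")
    parser.add_argument("paths", nargs="+", help="one or more local image/PDF paths")
    # default=None avoids argparse appending to a shared default list;
    # the caller falls back to the current directory when the flag is absent.
    parser.add_argument("--allowed-root", action="append", default=None, metavar="ROOT")
    parser.add_argument("--max-pages", type=int, default=3, metavar="N")
    parser.add_argument("--no-embeddings", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--include-ocr-text", action="store_true")
    parser.add_argument("--include-full-path", action="store_true")
    parser.add_argument("--pretty", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--allowed-root", ".", "--pretty", "samples/synthetic_invoice.png"]
)
allowed_roots = args.allowed_root or ["."]  # spec default: current directory
```

Privacy-sensitive switches such as --include-ocr-text and --include-full-path are store_true flags here so that they stay off unless explicitly requested.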

Output:

  • Batch JSON: { "ok": bool, "files": [...], "generated_at": "..." }.
  • Each per-file result includes file_id as sha256:<digest>, source_path_basename, media type, file size, pages, classification, needs-attention result, metadata counts/flags, privacy flags, and a processing-device summary.
  • Raw OCR/text and full paths are omitted unless explicitly requested.
  • NPU evidence is per embedding call: used, verified_npu, npu_busy_delta_us, endpoint, and wall time.
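
As a rough illustration of the output shape, the sketch below builds the file_id, the path fields, and the batch envelope. The key names file_id, source_path_basename, ok, files, and generated_at come from this spec; the helper names, the source_path key, and the assumption that each per-file result carries its own "ok" flag are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def file_id_for(path: Path) -> str:
    """Stream the file through SHA-256 and return 'sha256:<digest>'."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"


def describe_source(path: Path, include_full_path: bool = False) -> dict:
    """Expose only the basename unless the full path is explicitly requested."""
    fields = {"source_path_basename": path.name}
    if include_full_path:
        fields["source_path"] = str(path.resolve())  # hypothetical key name
    return fields


def batch_envelope(file_results: list) -> dict:
    """Wrap per-file results; assumes each result has a boolean 'ok' field."""
    return {
        "ok": all(result.get("ok", False) for result in file_results),
        "files": file_results,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
```

Hashing in chunks keeps memory use flat for large PDFs, and defaulting include_full_path to False matches the off-by-default privacy stance.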

Exit behavior:

  • Exit 0 when all files triage successfully.
  • Exit 2 when one or more files fail policy/intake/processing checks.
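
The exit rule can be expressed in a few lines. This is a sketch only: it assumes each per-file result exposes a boolean "ok" field, which the spec does not state explicitly, and the names EXIT_OK, EXIT_FAILED, and exit_code_for are illustrative.

```python
EXIT_OK = 0      # all files triaged successfully
EXIT_FAILED = 2  # one or more files failed policy/intake/processing checks


def exit_code_for(file_results: list) -> int:
    """Return 0 only when every per-file result is marked ok (assumed field)."""
    if all(result.get("ok", False) for result in file_results):
        return EXIT_OK
    return EXIT_FAILED
```

A caller would typically finish with sys.exit(exit_code_for(batch["files"])) after printing the batch JSON.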

Optional localhost HTTP contract

HTTP is optional and not enabled by this spec. If it is explicitly started for a smoke test or local demo, bind it to loopback on port 18829 and first check that nothing else is listening there:

cd /home/will/lab/swarm/openvino-doc-image-triage-npu
ss -ltnp | grep ':18829\b' || true
/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
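
The loopback-only bind rule can be checked before the server opens a socket. A minimal sketch, assuming the server validates its --host value against the three names this spec allows; require_loopback_host is a hypothetical helper, not necessarily what server.py does.

```python
# Hosts this spec allows for the optional listener.
LOOPBACK_HOSTS = frozenset({"127.0.0.1", "::1", "localhost"})


def require_loopback_host(host: str) -> str:
    """Return the host unchanged, or raise if it is not an allowed loopback name."""
    if host not in LOOPBACK_HOSTS:
        raise ValueError(f"refusing non-loopback bind host: {host!r}")
    return host
```

Matching against a fixed set, rather than resolving hostnames, avoids any DNS lookup and rejects wildcard binds such as 0.0.0.0 outright.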

Endpoints:

  • GET /healthz or /health: service name, bind policy, configured allowed roots, privacy flags, and current npu_busy_time_us.
  • GET /models: reports v1 stages and whether each is CPU or NPU-backed.
  • POST /triage: { "path": "/local/file", "options": {...} } -> { "ok": true, "result": ... }.
  • POST /triage/batch: { "paths": ["/local/file"], "options": {...} } -> batch JSON.

HTTP privacy/policy rules:

  • Server startup --allowed-root is the outer allowlist.
  • Request options.allowed_roots may narrow that allowlist but must not widen it.
  • Request options.embedding_url may only target the configured local loopback embeddings route http://127.0.0.1:18817/v1/embeddings (or localhost equivalent); external or alternate endpoints are rejected.
  • Request bodies and raw text are not logged by the stdlib handler.
  • Stop the temporary server after the smoke/demo.
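
The narrow-but-never-widen rule and the embedding URL restriction can be sketched as two small checks. The helper names and exception types below are hypothetical; the port, path, and hosts are the ones named in this spec.

```python
from pathlib import Path
from urllib.parse import urlsplit


def narrow_roots(server_roots: list, requested_roots: list = None) -> list:
    """Return the effective roots; requested roots must sit inside a server root."""
    outer = [Path(root).resolve() for root in server_roots]
    if not requested_roots:
        return outer
    narrowed = []
    for raw in requested_roots:
        candidate = Path(raw).resolve()
        inside = any(candidate == root or root in candidate.parents for root in outer)
        if not inside:
            raise PermissionError("allowed_roots may narrow the allowlist, not widen it")
        narrowed.append(candidate)
    return narrowed


def check_embedding_url(url: str) -> str:
    """Accept only the local loopback embeddings route named in this spec."""
    parts = urlsplit(url)
    allowed = (
        parts.scheme == "http"
        and parts.hostname in {"127.0.0.1", "localhost"}
        and parts.port == 18817
        and parts.path == "/v1/embeddings"
    )
    if not allowed:
        raise ValueError("embedding_url must target the local :18817 embeddings route")
    return url
```

Resolving paths before comparing them means a request cannot escape the allowlist through ".." segments or symlinks that point outside a server root.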

Synthetic smoke-test plan

Use only generated fixtures under the prototype directory:

cd /home/will/lab/swarm/openvino-doc-image-triage-npu
/home/will/.venvs/npu/bin/python make_samples.py
/home/will/.venvs/npu/bin/python tests/smoke_test.py

Expected smoke coverage:

  • Creates synthetic invoice/receipt/form-like image/PDF fixtures.
  • Runs CLI triage against the synthetic invoice image/PDF under an explicit allowed root.
  • Asserts privacy flags (external_uploads: false, no full path by default).
  • Asserts invoice category/needs-attention behavior on synthetic text.
  • Starts a temporary localhost HTTP server on a preflighted free ephemeral port, calls /healthz and /triage, verifies no full path leakage, rejects attempts to widen allowed roots, rejects external embedding URLs, and verifies non-loopback binds are rejected.
  • Terminates the temporary server.

The smoke test should use an OS-assigned ephemeral port rather than 18829, so that the test never claims 18829 as if it were a persistent service.
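
One common way to obtain an OS-assigned port is to bind to port 0 on loopback and read back the number the kernel chose. This is a sketch of that general technique, not necessarily how tests/smoke_test.py does its preflight; pick_free_loopback_port is a hypothetical name.

```python
import socket


def pick_free_loopback_port() -> int:
    """Ask the OS for a currently free TCP port on 127.0.0.1."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))  # port 0 lets the kernel choose
        return probe.getsockname()[1]
```

There is a small race between closing the probe socket and starting the server, so the test should still handle a bind failure. An alternative is to pass port 0 to the server itself and read the bound port from it.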

NPU busy-time verification plan

For every test that claims NPU use:

  1. Read /sys/class/accel/accel0/device/npu_busy_time_us before the operation.
  2. Perform an operation that should call the live embeddings service on 127.0.0.1:18817 with non-empty synthetic text.
  3. Read npu_busy_time_us after the operation.
  4. Require both:
    • the per-result embedding object reports used: true, verified_npu: true, and npu_busy_delta_us > 0; and
    • the outer before/after sysfs value increased.
  5. If sysfs is missing or :18817 is unavailable, do not claim NPU success; report CPU fallback / embedding unavailable and keep the smoke result honest.
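
The outer before/after measurement in steps 1 to 5 can be sketched as a wrapper around any operation expected to reach the embeddings service. The sysfs path and the field names used, verified_npu, and npu_busy_delta_us come from this spec; the function names, the wall_time_s key, and the choice to treat OSError as "embedding unavailable" are assumptions.

```python
import time
from pathlib import Path
from typing import Callable, Optional

NPU_BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = NPU_BUSY_PATH) -> Optional[int]:
    """Return the busy-time counter in microseconds, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def run_with_npu_evidence(
    operation: Callable[[], object], path: Path = NPU_BUSY_PATH
) -> dict:
    """Run the operation and report whether the busy-time counter increased."""
    before = read_busy_us(path)
    started = time.perf_counter()
    try:
        operation()
        used = True
    except OSError:  # urllib connection errors are OSError subclasses
        used = False
    wall_time_s = time.perf_counter() - started
    after = read_busy_us(path)
    delta = after - before if before is not None and after is not None else None
    return {
        "used": used,
        # No counter, no increase, or a failed call all mean "not verified".
        "verified_npu": used and delta is not None and delta > 0,
        "npu_busy_delta_us": delta,
        "wall_time_s": wall_time_s,  # hypothetical key name
    }
```

When sysfs is missing, the delta is None and verified_npu is False, which keeps the smoke result honest instead of inferring NPU use from a successful HTTP response.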

Docs and diagram implications

  • Service maps should list document/image triage as CLI-first, with 127.0.0.1:18829 shown as an optional prototype listener that is not live unless explicitly started.
  • Diagrams must not draw live Atlas/Hermes/gateway/RAG routing to this triage lane.
  • If shown with other candidate sidecars, label it separately from live services: live baseline remains RAG :18810, Whisper NPU :18816, and embeddings :18817; prototype sidecars are reranker :18818, classifier/router :18819, GenAI worker :18820, and optional doc/image triage :18829.
  • Runbooks should include CLI smoke, localhost listener checks, busy-time delta verification, and server shutdown instructions.
  • Documentation should state CPU vs NPU stages explicitly so the prototype does not imply NPU OCR or NPU image classification.

No-go / defer criteria

Do not proceed to implementation, live integration, or persistent service enablement if any of these are true:

  • Will has not explicitly approved live routing or persistent service enablement.
  • The requested source path is a private document/image directory or broad home-directory scan rather than synthetic fixtures or an explicitly approved narrow root.
  • The workflow would mutate Obsidian, RAG, Chroma/vector collections, or reindex in place.
  • The optional server would need to bind anywhere other than loopback (127.0.0.1, ::1, or localhost).
  • NPU busy-time does not increase for an operation being described as NPU-backed.
  • Raw OCR text or full paths would be logged, uploaded, stored durably, or returned without explicit request.
  • PDF/image dependencies are missing and the task requires rendered page analysis rather than metadata/text-only fallback.
  • The work depends on a future NPU image classifier/OCR/VLM model that has not yet been selected, converted/quantized to OpenVINO, calibrated for the task, and verified on synthetic fixtures with busy-time deltas.