# OpenVINO NPU document/image triage spec

Status: CLI-first prototype specification; not a live Atlas/Hermes integration.

## Safety stance

- Default workflow is local CLI execution against explicitly named files.
- Optional HTTP stays off unless a human starts it. When started, it must bind to loopback (`127.0.0.1`, `::1`, or `localhost`) and is intended for `127.0.0.1:18829` only.
- No persistent systemd unit, Docker service, gateway hook, Atlas/Hermes route, RAG route, Chroma/vector collection mutation, or in-place reindexing is part of this spec.
- Smoke data must be synthetic/non-private only. Do not point this tool at Will's private document, image, screenshot, Downloads, Desktop, Obsidian, or photo-library directories without explicit approval.
- NPU claims require `/sys/class/accel/accel0/device/npu_busy_time_us` before/after deltas. HTTP 200, JSON output, or model-load success alone is not NPU proof.

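The loopback constraint above can be checked before any socket is opened. The following is a minimal sketch, not the prototype's actual code: the names `is_allowed_bind_host` and `require_loopback` are hypothetical, and it assumes the server validates the `--host` value as a literal rather than resolving DNS names.

```python
import ipaddress

# Loopback targets named in this spec; "localhost" is accepted by name only.
LOOPBACK_ADDRESSES = {ipaddress.ip_address("127.0.0.1"), ipaddress.ip_address("::1")}
LOOPBACK_NAMES = {"localhost"}


def is_allowed_bind_host(host: str) -> bool:
    """Return True only for the loopback hosts listed in the safety stance."""
    if host.lower() in LOOPBACK_NAMES:
        return True
    try:
        # Parsing normalizes alternate spellings such as 0:0:0:0:0:0:0:1.
        return ipaddress.ip_address(host) in LOOPBACK_ADDRESSES
    except ValueError:
        # Not an IP literal (for example a DNS name): refuse instead of resolving.
        return False


def require_loopback(host: str) -> str:
    """Hypothetical startup guard: raise before binding to a non-loopback host."""
    if not is_allowed_bind_host(host):
        raise ValueError(f"refusing non-loopback bind: {host!r}")
    return host
```

Rejecting unknown names outright, rather than resolving them, keeps the check independent of local DNS or `/etc/hosts` configuration.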
## Recommended model/runtime

Recommended v1 runtime:

- File intake, hashing, MIME/extension checks, image/PDF rendering, sidecar/native PDF text extraction, metadata extraction, and category fallback: local Python CPU path using Pillow plus optional `pypdf`/`pypdfium2`.
- Needs-attention semantic check: reuse the live localhost OpenVINO embeddings service on `127.0.0.1:18817`, currently `bge-base-en-v1.5-int8-ov`, and verify each embedding call with `npu_busy_time_us` deltas.
- Category classification in v1: CPU rule fallback, explicitly reported as not an NPU image model.

Why this is the recommended v1:

- It avoids private-data exposure: no external upload path and no broader local file scanning.
- It avoids collection/routing risk by using the existing embeddings API as a stateless feature extractor only; it does not write to RAG or Chroma.
- It gives a real NPU verification hook for the semantic stage without overclaiming that OCR/image classification are NPU-backed.
- It keeps the prototype useful even when optional PDF dependencies or the embeddings service are unavailable: it can fall back to CPU-only metadata/rule output and mark NPU verification false.

Deferred model work:

- NPU image category classifier: defer until a static-shape OpenVINO IR image model such as MobileNet/EfficientNet/ResNet is selected, calibrated for the label set, and smoke-tested with busy-time deltas.
- NPU OCR/VLM: defer; OCR remains local CPU text plumbing in v1.

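To illustrate what a "CPU rule fallback" for category classification might look like, here is a rough sketch. Everything in it is an assumption for illustration: the keyword patterns, the `unknown` label, the function name `classify_by_rules`, and the output keys are hypothetical, and only the invoice/receipt/form labels echo the synthetic fixtures mentioned in this spec.

```python
import re

# Hypothetical keyword rules; the real label set and patterns are not defined here.
CATEGORY_RULES = {
    "invoice": re.compile(r"\b(invoice|amount due|bill to)\b", re.IGNORECASE),
    "receipt": re.compile(r"\b(receipt|subtotal|change due)\b", re.IGNORECASE),
    "form": re.compile(r"\b(signature|date of birth|applicant)\b", re.IGNORECASE),
}


def classify_by_rules(text: str) -> dict:
    """Pick the category with the most keyword hits; 'unknown' when nothing matches."""
    scores = {label: len(rx.findall(text)) for label, rx in CATEGORY_RULES.items()}
    best = max(scores, key=scores.get)
    label = best if scores[best] > 0 else "unknown"
    # Report the stage honestly: this path never touches the NPU.
    return {"label": label, "method": "cpu_rule_fallback", "npu_backed": False}
```

Keeping `npu_backed` hard-coded to `False` in this stage is one way to make sure the output never implies NPU image classification.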
## CLI contract

Command:

```bash
cd /home/will/lab/swarm/openvino-doc-image-triage-npu
/home/will/.venvs/npu/bin/python triage.py \
  --allowed-root /home/will/lab/swarm/openvino-doc-image-triage-npu \
  --max-pages 3 \
  --pretty \
  samples/synthetic_invoice.png samples/synthetic_invoice.pdf
```

Inputs:

- Positional `paths`: one or more local image/PDF paths.
- `--allowed-root ROOT`: may repeat; every requested path must resolve under one of these roots. Default is current directory.
- `--max-pages N`: maximum rendered/extracted PDF pages; default 3.
- `--no-embeddings`: disables the localhost `:18817` embedding/NPU check and reports CPU fallback/no text.
- `--dry-run`: skip image/PDF rendering while still checking intake/hash/text/metadata where available.
- `--include-ocr-text`: include raw extracted/sidecar text in this single response only; off by default.
- `--include-full-path`: include resolved full paths; off by default.
- `--pretty`: pretty-print JSON.

Output:

- Batch JSON: `{ "ok": bool, "files": [...], "generated_at": "..." }`.
- Per file result includes `file_id` as `sha256:<digest>`, `source_path_basename`, media type, file size, pages, classification, needs-attention result, metadata counts/flags, privacy flags, and processing-device summary.
- Raw OCR/text and full paths are omitted unless explicitly requested.
- NPU evidence is per embedding call: `used`, `verified_npu`, `npu_busy_delta_us`, endpoint, and wall time.

Exit behavior:

- Exit 0 when all files triage successfully.
- Exit 2 when one or more files fail policy/intake/processing checks.

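The `--allowed-root` rule, the `sha256:<digest>` file ID, and the exit codes above could be implemented along these lines. This is a minimal sketch assuming Python 3.9+; the helper names (`resolve_under_roots`, `file_id`, `triage_exit_code`) and the per-file `ok` key are illustrative assumptions, not names confirmed by `triage.py`.

```python
import hashlib
from pathlib import Path


def resolve_under_roots(path: str, allowed_roots: list[str]) -> Path:
    """Resolve symlinks and '..' first, then require containment in an allowed root."""
    resolved = Path(path).resolve()
    for root in allowed_roots:
        root_resolved = Path(root).resolve()
        if resolved == root_resolved or root_resolved in resolved.parents:
            return resolved
    # Mention only the basename so a rejected full path is not echoed back.
    raise PermissionError(f"path is outside the allowed roots: {Path(path).name}")


def file_id(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return 'sha256:<hex digest>'."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"sha256:{digest.hexdigest()}"


def triage_exit_code(results: list[dict]) -> int:
    """Exit 0 only when every per-file result succeeded; otherwise exit 2."""
    return 0 if all(r.get("ok") for r in results) else 2
```

Resolving before comparing matters: a path such as `samples/../../private.pdf` starts inside the root textually but resolves outside it.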
## Optional localhost HTTP contract

HTTP is optional and not enabled by this spec. If explicitly started for a smoke or local demo, use localhost and port 18829:

```bash
cd /home/will/lab/swarm/openvino-doc-image-triage-npu
ss -ltnp | grep ':18829\b' || true
/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
```

Endpoints:

- `GET /healthz` or `/health`: service name, bind policy, configured allowed roots, privacy flags, and current `npu_busy_time_us`.
- `GET /models`: reports v1 stages and whether each is CPU or NPU-backed.
- `POST /triage`: `{ "path": "/local/file", "options": {...} }` -> `{ "ok": true, "result": ... }`.
- `POST /triage/batch`: `{ "paths": ["/local/file"], "options": {...} }` -> batch JSON.

HTTP privacy/policy rules:

- Server startup `--allowed-root` is the outer allowlist.
- Request `options.allowed_roots` may narrow that allowlist but must not widen it.
- Request `options.embedding_url` may only target the configured local loopback embeddings route `http://127.0.0.1:18817/v1/embeddings` (or localhost equivalent); external or alternate endpoints are rejected.
- Request bodies and raw text are not logged by the stdlib handler.
- Stop the temporary server after the smoke/demo.

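The narrow-but-never-widen rule and the embedding URL restriction above might be enforced as follows. This is a sketch under assumptions: the function names `effective_roots` and `check_embedding_url` are hypothetical, and the "localhost equivalent" is taken to mean the same scheme, port, and path with the host spelled `localhost`.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlsplit

# The single embeddings route named in this spec, plus its localhost spelling.
EMBEDDING_HOSTS = {"127.0.0.1", "localhost"}
EMBEDDING_PORT = 18817
EMBEDDING_PATH = "/v1/embeddings"


def effective_roots(server_roots: list[str], requested: Optional[list[str]]) -> list[Path]:
    """Return the request's roots; each must sit inside a server startup root."""
    outer = [Path(r).resolve() for r in server_roots]
    if not requested:
        return outer
    narrowed = []
    for candidate in (Path(r).resolve() for r in requested):
        if not any(candidate == root or root in candidate.parents for root in outer):
            raise PermissionError("options.allowed_roots may not widen the server allowlist")
        narrowed.append(candidate)
    return narrowed


def check_embedding_url(url: str) -> str:
    """Accept only the local loopback embeddings route; reject everything else."""
    parts = urlsplit(url)
    allowed = (
        parts.scheme == "http"
        and parts.hostname in EMBEDDING_HOSTS
        and parts.port == EMBEDDING_PORT
        and parts.path == EMBEDDING_PATH
    )
    if not allowed:
        raise ValueError("embedding_url must be the local loopback embeddings route")
    return url
```

Comparing the parsed hostname and port, rather than matching a string prefix, rejects lookalike URLs such as `http://127.0.0.1:18817@example.com/v1/embeddings`.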
## Synthetic smoke-test plan

Use only generated fixtures under the prototype directory:

```bash
cd /home/will/lab/swarm/openvino-doc-image-triage-npu
/home/will/.venvs/npu/bin/python make_samples.py
/home/will/.venvs/npu/bin/python tests/smoke_test.py
```

Expected smoke coverage:

- Creates synthetic invoice/receipt/form-like image/PDF fixtures.
- Runs CLI triage against the synthetic invoice image/PDF under an explicit allowed root.
- Asserts privacy flags (`external_uploads: false`, no full path by default).
- Asserts invoice category/needs-attention behavior on synthetic text.
- Starts a temporary localhost HTTP server on a preflighted free ephemeral port, calls `/healthz` and `/triage`, verifies no full path leakage, rejects attempts to widen allowed roots, rejects external embedding URLs, and verifies non-loopback binds are rejected.
- Terminates the temporary server.

The smoke-test port should stay an OS-assigned ephemeral port, not a live one, so the tests never claim `18829` as a persistent service.

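One common way to preflight a free ephemeral port, as the smoke plan describes, is to bind to port 0 and let the OS choose. The sketch below uses only the standard library; the function name `reserve_ephemeral_port` is hypothetical and not taken from `tests/smoke_test.py`.

```python
import socket


def reserve_ephemeral_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a free port by binding to port 0, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return sock.getsockname()[1]
```

The port is released before the server starts, so another process could in principle take it in between; for a short-lived local smoke test that race is usually acceptable.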
## NPU busy-time verification plan

For every test that claims NPU use:

1. Read `/sys/class/accel/accel0/device/npu_busy_time_us` before the operation.
2. Perform an operation that should call the live embeddings service on `127.0.0.1:18817` with non-empty synthetic text.
3. Read `npu_busy_time_us` after the operation.
4. Require both:
   - the per-result embedding object reports `used: true`, `verified_npu: true`, and `npu_busy_delta_us > 0`; and
   - the outer before/after sysfs value increased.
5. If sysfs is missing or `:18817` is unavailable, do not claim NPU success; report CPU fallback / embedding unavailable and keep the smoke result honest.

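The before/after procedure above can be wrapped around any operation. This is a minimal sketch, not the prototype's implementation: `read_busy_us` and `run_with_npu_check` are hypothetical names, and the `operation` callable stands in for whatever performs the embedding call.

```python
from pathlib import Path
from typing import Callable, Optional

BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = BUSY_PATH) -> Optional[int]:
    """Return the counter value, or None when sysfs is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def run_with_npu_check(operation: Callable[[], object], path: Path = BUSY_PATH) -> dict:
    """Run the operation between two counter reads and report the delta honestly."""
    before = read_busy_us(path)
    result = operation()
    after = read_busy_us(path)
    # A missing counter yields no delta, which must never count as NPU proof.
    delta = after - before if before is not None and after is not None else None
    return {
        "result": result,
        "npu_busy_delta_us": delta,
        "verified_npu": delta is not None and delta > 0,
    }
```

A positive delta only shows that the NPU was busy during the window. If other NPU workloads may be running at the same time, the delta could be inflated, so smoke runs are best done on an otherwise idle device.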
## Docs and diagram implications

- Service maps should list document/image triage as CLI-first and optional prototype `127.0.0.1:18829`, not live unless explicitly started.
- Diagrams must not draw live Atlas/Hermes/gateway/RAG routing to this triage lane.
- If shown with other candidate sidecars, label it separately from live services: live baseline remains RAG `:18810`, Whisper NPU `:18816`, and embeddings `:18817`; prototype sidecars are reranker `:18818`, classifier/router `:18819`, GenAI worker `:18820`, and optional doc/image triage `:18829`.
- Runbooks should include CLI smoke, localhost listener checks, busy-time delta verification, and server shutdown instructions.
- Documentation should state CPU vs NPU stages explicitly so the prototype does not imply NPU OCR or NPU image classification.

## No-go / defer criteria

Do not proceed to implementation, live integration, or persistent service enablement if any of these are true:

- Will has not explicitly approved live routing or persistent service enablement.
- The requested source path is a private document/image directory or broad home-directory scan rather than synthetic fixtures or an explicitly approved narrow root.
- The workflow would mutate Obsidian, RAG, Chroma/vector collections, or reindex in place.
- The optional server would need to bind anywhere other than localhost.
- NPU busy-time does not increase for an operation being described as NPU-backed.
- Raw OCR text or full paths would be logged, uploaded, stored durably, or returned without explicit request.
- PDF/image dependencies are missing and the task requires rendered page analysis rather than metadata/text-only fallback.
- A future image classifier/OCR/VLM model has not been selected, converted/quantized to OpenVINO, calibrated for the task, and verified on synthetic fixtures with busy-time deltas.