diff --git a/openvino-doc-image-triage-npu/README.md b/openvino-doc-image-triage-npu/README.md
index 56890db..f21d7be 100644
--- a/openvino-doc-image-triage-npu/README.md
+++ b/openvino-doc-image-triage-npu/README.md
@@ -1,7 +1,8 @@
 # OpenVINO NPU document/image triage prototype
 
-Local-only prototype for triaging screenshots, photos/scans, and PDF page images.
+Local-only, CLI-first prototype for triaging screenshots, photos/scans, and PDF page images.
 It returns structured JSON metadata and explicitly reports CPU vs NPU stages.
+Optional HTTP is a localhost-only prototype on `127.0.0.1:18829` when explicitly started; it is not a live Atlas/Hermes/RAG integration.
 
 Location: `/home/will/lab/swarm/openvino-doc-image-triage-npu/`
 
@@ -13,6 +14,8 @@ Location: `/home/will/lab/swarm/openvino-doc-image-triage-npu/`
 - Full source paths are omitted by default; responses include basename and SHA-256.
 - Allowed roots are enforced for CLI/server requests.
 - This prototype does not mutate Obsidian, RAG, Chroma, vector collections, routing, or gateway services.
+- Do not process broad private document/image directories; use generated synthetic fixtures unless Will explicitly approves a narrow source root.
+- See `SPEC.md` for the full CLI contract, smoke-test plan, NPU verification plan, docs implications, and no-go/defer criteria.
 
 ## CPU vs NPU stages
 
@@ -88,25 +91,25 @@ Include OCR/sidecar text in a single response only when explicitly requested:
 
 ## HTTP usage
 
-Check that port 18820 is free first:
+HTTP is optional and not enabled by default. Check that port 18829 is free first:
 
 ```bash
-ss -ltnp | grep ':18820\b' || true
+ss -ltnp | grep ':18829\b' || true
 ```
 
 Start local-only server:
 
 ```bash
 cd /home/will/lab/swarm/openvino-doc-image-triage-npu
-/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18820 --allowed-root "$PWD"
+/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
 ```
 
 Call it:
 
 ```bash
-curl -sS http://127.0.0.1:18820/healthz | jq
-curl -sS http://127.0.0.1:18820/models | jq
-curl -sS -X POST http://127.0.0.1:18820/triage \
+curl -sS http://127.0.0.1:18829/healthz | jq
+curl -sS http://127.0.0.1:18829/models | jq
+curl -sS -X POST http://127.0.0.1:18829/triage \
   -H 'Content-Type: application/json' \
   -d '{"path":"/home/will/lab/swarm/openvino-doc-image-triage-npu/samples/synthetic_invoice.png","options":{"allowed_roots":["/home/will/lab/swarm/openvino-doc-image-triage-npu"]}}' | jq
 ```
diff --git a/openvino-doc-image-triage-npu/SPEC.md b/openvino-doc-image-triage-npu/SPEC.md
new file mode 100644
index 0000000..07885e0
--- /dev/null
+++ b/openvino-doc-image-triage-npu/SPEC.md
@@ -0,0 +1,146 @@
+# OpenVINO NPU document/image triage spec
+
+Status: CLI-first prototype specification; not a live Atlas/Hermes integration.
+
+## Safety stance
+
+- Default workflow is local CLI execution against explicitly named files.
+- Optional HTTP stays disabled unless a human starts it; when started, it binds to localhost and is intended for `127.0.0.1:18829` only.
+- No persistent systemd unit, Docker service, gateway hook, Atlas/Hermes route, RAG route, Chroma/vector collection mutation, or in-place reindexing is part of this spec.
+- Smoke data must be synthetic/non-private only. Do not point this tool at Will's private document, image, screenshot, Downloads, Desktop, Obsidian, or photo-library directories without explicit approval.
+- NPU claims require `/sys/class/accel/accel0/device/npu_busy_time_us` before/after deltas. HTTP 200, JSON output, or model-load success alone is not NPU proof.
+
+## Recommended model/runtime
+
+Recommended v1 runtime:
+
+- File intake, hashing, MIME/extension checks, image/PDF rendering, sidecar/native PDF text extraction, metadata extraction, and category fallback: local Python CPU path using Pillow plus optional `pypdf`/`pypdfium2`.
+- Needs-attention semantic check: reuse the live localhost OpenVINO embeddings service on `127.0.0.1:18817`, currently `bge-base-en-v1.5-int8-ov`, and verify each embedding call with `npu_busy_time_us` deltas.
+- Category classification in v1: CPU rule fallback, explicitly reported as not an NPU image model.
+
+Why this is the recommended v1:
+
+- It avoids private-data exposure: no external upload path and no broader local file scanning.
+- It avoids collection/routing risk by using the existing embeddings API as a stateless feature extractor only; it does not write to RAG or Chroma.
+- It gives a real NPU verification hook for the semantic stage without overclaiming that OCR/image classification are NPU-backed.
+- It keeps the prototype useful even when optional PDF dependencies or the embeddings service are unavailable: it can fall back to CPU-only metadata/rule output and mark NPU verification false.
+
+Deferred model work:
+
+- NPU image category classifier: defer until a static-shape OpenVINO IR image model such as MobileNet/EfficientNet/ResNet is selected, calibrated for the label set, and smoke-tested with busy-time deltas.
+- NPU OCR/VLM: defer; OCR remains local CPU text plumbing in v1.
+
+## CLI contract
+
+Command:
+
+```bash
+cd /home/will/lab/swarm/openvino-doc-image-triage-npu
+/home/will/.venvs/npu/bin/python triage.py \
+  --allowed-root /home/will/lab/swarm/openvino-doc-image-triage-npu \
+  --max-pages 3 \
+  --pretty \
+  samples/synthetic_invoice.png samples/synthetic_invoice.pdf
+```
+
+Inputs:
+
+- Positional `paths`: one or more local image/PDF paths.
+- `--allowed-root ROOT`: may repeat; every requested path must resolve under one of these roots. Default is current directory.
+- `--max-pages N`: maximum rendered/extracted PDF pages; default 3.
+- `--no-embeddings`: disables the localhost `:18817` embedding/NPU check and reports CPU fallback/no text.
+- `--dry-run`: skip image/PDF rendering while still checking intake/hash/text/metadata where available.
+- `--include-ocr-text`: include raw extracted/sidecar text in this single response only; off by default.
+- `--include-full-path`: include resolved full paths; off by default.
+- `--pretty`: pretty-print JSON.
+
+Output:
+
+- Batch JSON: `{ "ok": bool, "files": [...], "generated_at": "..." }`.
+- Per file result includes `file_id` as `sha256:`, `source_path_basename`, media type, file size, pages, classification, needs-attention result, metadata counts/flags, privacy flags, and processing-device summary.
+- Raw OCR/text and full paths are omitted unless explicitly requested.
+- NPU evidence is per embedding call: `used`, `verified_npu`, `npu_busy_delta_us`, endpoint, and wall time.
+
+Exit behavior:
+
+- Exit 0 when all files triage successfully.
+- Exit 2 when one or more files fail policy/intake/processing checks.
+
+## Optional localhost HTTP contract
+
+HTTP is optional and not enabled by this spec. If explicitly started for a smoke or local demo, use localhost and port 18829:
+
+```bash
+cd /home/will/lab/swarm/openvino-doc-image-triage-npu
+ss -ltnp | grep ':18829\b' || true
+/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
+```
+
+Endpoints:
+
+- `GET /healthz` or `/health`: service name, bind policy, configured allowed roots, privacy flags, and current `npu_busy_time_us`.
+- `GET /models`: reports v1 stages and whether each is CPU or NPU-backed.
+- `POST /triage`: `{ "path": "/local/file", "options": {...} }` -> `{ "ok": true, "result": ... }`.
+- `POST /triage/batch`: `{ "paths": ["/local/file"], "options": {...} }` -> batch JSON.
+
+HTTP privacy/policy rules:
+
+- Server startup `--allowed-root` is the outer allowlist.
+- Request `options.allowed_roots` may narrow that allowlist but must not widen it.
+- Request `options.embedding_url` may only target the configured local loopback embeddings route `http://127.0.0.1:18817/v1/embeddings` (or localhost equivalent); external or alternate endpoints are rejected.
+- Request bodies and raw text are not logged by the stdlib handler.
+- Stop the temporary server after the smoke/demo.
+
+## Synthetic smoke-test plan
+
+Use only generated fixtures under the prototype directory:
+
+```bash
+cd /home/will/lab/swarm/openvino-doc-image-triage-npu
+/home/will/.venvs/npu/bin/python make_samples.py
+/home/will/.venvs/npu/bin/python tests/smoke_test.py
+```
+
+Expected smoke coverage:
+
+- Creates synthetic invoice/receipt/form-like image/PDF fixtures.
+- Runs CLI triage against the synthetic invoice image/PDF under an explicit allowed root.
+- Asserts privacy flags (`external_uploads: false`, no full path by default).
+- Asserts invoice category/needs-attention behavior on synthetic text.
+- Starts a temporary localhost HTTP server on an ephemeral smoke port, calls `/healthz` and `/triage`, verifies no full path leakage, rejects attempts to widen allowed roots, and rejects external embedding URLs.
+- Terminates the temporary server.
+
+The smoke port in tests should stay ephemeral/non-live (currently `18828`) to avoid claiming `18829` as a persistent service.
+
+## NPU busy-time verification plan
+
+For every test that claims NPU use:
+
+1. Read `/sys/class/accel/accel0/device/npu_busy_time_us` before the operation.
+2. Perform an operation that should call the live embeddings service on `127.0.0.1:18817` with non-empty synthetic text.
+3. Read `npu_busy_time_us` after the operation.
+4. Require both:
+   - the per-result embedding object reports `used: true`, `verified_npu: true`, and `npu_busy_delta_us > 0`; and
+   - the outer before/after sysfs value increased.
+5. If sysfs is missing or `:18817` is unavailable, do not claim NPU success; report CPU fallback / embedding unavailable and keep the smoke result honest.
+
+## Docs and diagram implications
+
+- Service maps should list document/image triage as CLI-first and optional prototype `127.0.0.1:18829`, not live unless explicitly started.
+- Diagrams must not draw live Atlas/Hermes/gateway/RAG routing to this triage lane.
+- If shown with other candidate sidecars, label it separately from live services: live baseline remains RAG `:18810`, Whisper NPU `:18816`, and embeddings `:18817`; prototype sidecars are reranker `:18818`, classifier/router `:18819`, GenAI worker `:18820`, and optional doc/image triage `:18829`.
+- Runbooks should include CLI smoke, localhost listener checks, busy-time delta verification, and server shutdown instructions.
+- Documentation should state CPU vs NPU stages explicitly so the prototype does not imply NPU OCR or NPU image classification.
+
+## No-go / defer criteria
+
+Do not proceed to implementation, live integration, or persistent service enablement if any of these are true:
+
+- Will has not explicitly approved live routing or persistent service enablement.
+- The requested source path is a private document/image directory or broad home-directory scan rather than synthetic fixtures or an explicitly approved narrow root.
+- The workflow would mutate Obsidian, RAG, Chroma/vector collections, or reindex in place.
+- The optional server would need to bind anywhere other than localhost.
+- NPU busy-time does not increase for an operation being described as NPU-backed.
+- Raw OCR text or full paths would be logged, uploaded, stored durably, or returned without explicit request.
+- PDF/image dependencies are missing and the task requires rendered page analysis rather than metadata/text-only fallback.
+- A future image classifier/OCR/VLM model has not been selected, converted/quantized to OpenVINO, calibrated for the task, and verified on synthetic fixtures with busy-time deltas.
diff --git a/openvino-doc-image-triage-npu/server.py b/openvino-doc-image-triage-npu/server.py
index e95b6a1..96e62b0 100644
--- a/openvino-doc-image-triage-npu/server.py
+++ b/openvino-doc-image-triage-npu/server.py
@@ -163,7 +163,7 @@ class Handler(BaseHTTPRequestHandler):
 def main() -> int:
     parser = argparse.ArgumentParser(description="Local-only doc/image triage HTTP server")
     parser.add_argument("--host", default=os.environ.get("DOC_IMAGE_TRIAGE_HOST", "127.0.0.1"))
-    parser.add_argument("--port", type=int, default=int(os.environ.get("DOC_IMAGE_TRIAGE_PORT", "18820")))
+    parser.add_argument("--port", type=int, default=int(os.environ.get("DOC_IMAGE_TRIAGE_PORT", "18829")))
     parser.add_argument("--allowed-root", action="append", default=[], help="allowed local root; may repeat")
     args = parser.parse_args()
     roots = [Path(p).expanduser().resolve() for p in args.allowed_root] or [Path.cwd().resolve()]
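
Reviewer sketch, not part of the patch: the NPU busy-time verification plan in `SPEC.md` could be expressed roughly as follows. The sysfs path and the `used`, `verified_npu`, and `npu_busy_delta_us` fields come from the spec. The helper names `read_busy_us` and `verify_npu`, the `operation` callback, and the shape of the returned dictionary are hypothetical and are not taken from `triage.py` or `tests/smoke_test.py`.

```python
from pathlib import Path
from typing import Callable, Optional

# Counter path as named in SPEC.md; everything else in this sketch is an assumption.
BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = BUSY_PATH) -> Optional[int]:
    """Return the busy-time counter, or None when sysfs is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def verify_npu(operation: Callable[[], dict], path: Path = BUSY_PATH) -> dict:
    """Run `operation` between two counter reads and apply both checks from step 4.

    `operation` is a hypothetical callback that returns the per-result
    embedding object (a dict with `used`, `verified_npu`, `npu_busy_delta_us`).
    """
    before = read_busy_us(path)
    embedding = operation()
    after = read_busy_us(path)

    # Outer check: the sysfs counter itself must have increased.
    outer_delta = None if before is None or after is None else after - before
    outer_ok = outer_delta is not None and outer_delta > 0

    # Inner check: the embedding object must report verified NPU use.
    inner_ok = (
        embedding.get("used") is True
        and embedding.get("verified_npu") is True
        and (embedding.get("npu_busy_delta_us") or 0) > 0
    )

    # Step 5: if either check fails, do not claim NPU success.
    return {
        "npu_verified": inner_ok and outer_ok,
        "outer_delta_us": outer_delta,
        "embedding": embedding,
    }
```

A smoke test could pass a callback that runs triage on a synthetic fixture and returns the embedding object from its result; a missing counter file or an unavailable `:18817` service then yields `npu_verified: False` instead of an error.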