docs: specify OpenVINO doc image triage prototype

This commit is contained in:
Atlas
2026-06-04 12:06:08 -07:00
parent 5b01b1bd11
commit 6b1cae016c
3 changed files with 157 additions and 8 deletions
+10 -7
View File
@@ -1,7 +1,8 @@
# OpenVINO NPU document/image triage prototype # OpenVINO NPU document/image triage prototype
Local-only prototype for triaging screenshots, photos/scans, and PDF page images. Local-only, CLI-first prototype for triaging screenshots, photos/scans, and PDF page images.
It returns structured JSON metadata and explicitly reports CPU vs NPU stages. It returns structured JSON metadata and explicitly reports CPU vs NPU stages.
Optional HTTP is a localhost-only prototype on `127.0.0.1:18829` when explicitly started; it is not a live Atlas/Hermes/RAG integration.
Location: `/home/will/lab/swarm/openvino-doc-image-triage-npu/` Location: `/home/will/lab/swarm/openvino-doc-image-triage-npu/`
@@ -13,6 +14,8 @@ Location: `/home/will/lab/swarm/openvino-doc-image-triage-npu/`
- Full source paths are omitted by default; responses include basename and SHA-256. - Full source paths are omitted by default; responses include basename and SHA-256.
- Allowed roots are enforced for CLI/server requests. - Allowed roots are enforced for CLI/server requests.
- This prototype does not mutate Obsidian, RAG, Chroma, vector collections, routing, or gateway services. - This prototype does not mutate Obsidian, RAG, Chroma, vector collections, routing, or gateway services.
- Do not process broad private document/image directories; use generated synthetic fixtures unless Will explicitly approves a narrow source root.
- See `SPEC.md` for the full CLI contract, smoke-test plan, NPU verification plan, docs implications, and no-go/defer criteria.
## CPU vs NPU stages ## CPU vs NPU stages
@@ -88,25 +91,25 @@ Include OCR/sidecar text in a single response only when explicitly requested:
## HTTP usage ## HTTP usage
Check that port 18820 is free first: HTTP is optional and not enabled by default. Check that port 18829 is free first:
```bash ```bash
ss -ltnp | grep ':18820\b' || true ss -ltnp | grep ':18829\b' || true
``` ```
Start local-only server: Start local-only server:
```bash ```bash
cd /home/will/lab/swarm/openvino-doc-image-triage-npu cd /home/will/lab/swarm/openvino-doc-image-triage-npu
/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18820 --allowed-root "$PWD" /home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
``` ```
Call it: Call it:
```bash ```bash
curl -sS http://127.0.0.1:18820/healthz | jq curl -sS http://127.0.0.1:18829/healthz | jq
curl -sS http://127.0.0.1:18820/models | jq curl -sS http://127.0.0.1:18829/models | jq
curl -sS -X POST http://127.0.0.1:18820/triage \ curl -sS -X POST http://127.0.0.1:18829/triage \
-H 'Content-Type: application/json' \ -H 'Content-Type: application/json' \
-d '{"path":"/home/will/lab/swarm/openvino-doc-image-triage-npu/samples/synthetic_invoice.png","options":{"allowed_roots":["/home/will/lab/swarm/openvino-doc-image-triage-npu"]}}' | jq -d '{"path":"/home/will/lab/swarm/openvino-doc-image-triage-npu/samples/synthetic_invoice.png","options":{"allowed_roots":["/home/will/lab/swarm/openvino-doc-image-triage-npu"]}}' | jq
``` ```
+146
View File
@@ -0,0 +1,146 @@
# OpenVINO NPU document/image triage spec
Status: CLI-first prototype specification; not a live Atlas/Hermes integration.
## Safety stance
- Default workflow is local CLI execution against explicitly named files.
- Optional HTTP is disabled unless a human starts it, binds to localhost, and is intended for `127.0.0.1:18829` only.
- No persistent systemd unit, Docker service, gateway hook, Atlas/Hermes route, RAG route, Chroma/vector collection mutation, or in-place reindexing is part of this spec.
- Smoke data must be synthetic/non-private only. Do not point this tool at Will's private document, image, screenshot, Downloads, Desktop, Obsidian, or photo-library directories without explicit approval.
- NPU claims require `/sys/class/accel/accel0/device/npu_busy_time_us` before/after deltas. HTTP 200, JSON output, or model-load success alone is not NPU proof.
## Recommended model/runtime
Recommended v1 runtime:
- File intake, hashing, MIME/extension checks, image/PDF rendering, sidecar/native PDF text extraction, metadata extraction, and category fallback: local Python CPU path using Pillow plus optional `pypdf`/`pypdfium2`.
- Needs-attention semantic check: reuse the live localhost OpenVINO embeddings service on `127.0.0.1:18817`, currently `bge-base-en-v1.5-int8-ov`, and verify each embedding call with `npu_busy_time_us` deltas.
- Category classification in v1: CPU rule fallback, explicitly reported as not an NPU image model.
Why this is the recommended v1:
- It avoids private-data exposure: no external upload path and no broader local file scanning.
- It avoids collection/routing risk by using the existing embeddings API as a stateless feature extractor only; it does not write to RAG or Chroma.
- It gives a real NPU verification hook for the semantic stage without overclaiming that OCR/image classification are NPU-backed.
- It keeps the prototype useful even when optional PDF dependencies or the embeddings service are unavailable: it can fall back to CPU-only metadata/rule output and mark NPU verification false.
Deferred model work:
- NPU image category classifier: defer until a static-shape OpenVINO IR image model such as MobileNet/EfficientNet/ResNet is selected, calibrated for the label set, and smoke-tested with busy-time deltas.
- NPU OCR/VLM: defer; OCR remains local CPU text plumbing in v1.
## CLI contract
Command:
```bash
cd /home/will/lab/swarm/openvino-doc-image-triage-npu
/home/will/.venvs/npu/bin/python triage.py \
--allowed-root /home/will/lab/swarm/openvino-doc-image-triage-npu \
--max-pages 3 \
--pretty \
samples/synthetic_invoice.png samples/synthetic_invoice.pdf
```
Inputs:
- Positional `paths`: one or more local image/PDF paths.
- `--allowed-root ROOT`: may repeat; every requested path must resolve under one of these roots. Default is current directory.
- `--max-pages N`: maximum rendered/extracted PDF pages; default 3.
- `--no-embeddings`: disables the localhost `:18817` embedding/NPU check and reports CPU fallback/no text.
- `--dry-run`: skip image/PDF rendering while still checking intake/hash/text/metadata where available.
- `--include-ocr-text`: include raw extracted/sidecar text in this single response only; off by default.
- `--include-full-path`: include resolved full paths; off by default.
- `--pretty`: pretty-print JSON.
Output:
- Batch JSON: `{ "ok": bool, "files": [...], "generated_at": "..." }`.
- Per file result includes `file_id` as `sha256:<digest>`, `source_path_basename`, media type, file size, pages, classification, needs-attention result, metadata counts/flags, privacy flags, and processing-device summary.
- Raw OCR/text and full paths are omitted unless explicitly requested.
- NPU evidence is per embedding call: `used`, `verified_npu`, `npu_busy_delta_us`, endpoint, and wall time.
Exit behavior:
- Exit 0 when all files triage successfully.
- Exit 2 when one or more files fail policy/intake/processing checks.
## Optional localhost HTTP contract
HTTP is optional and not enabled by this spec. If explicitly started for a smoke or local demo, use localhost and port 18829:
```bash
cd /home/will/lab/swarm/openvino-doc-image-triage-npu
ss -ltnp | grep ':18829\b' || true
/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD"
```
Endpoints:
- `GET /healthz` or `/health`: service name, bind policy, configured allowed roots, privacy flags, and current `npu_busy_time_us`.
- `GET /models`: reports v1 stages and whether each is CPU or NPU-backed.
- `POST /triage`: `{ "path": "/local/file", "options": {...} }` -> `{ "ok": true, "result": ... }`.
- `POST /triage/batch`: `{ "paths": ["/local/file"], "options": {...} }` -> batch JSON.
HTTP privacy/policy rules:
- Server startup `--allowed-root` is the outer allowlist.
- Request `options.allowed_roots` may narrow that allowlist but must not widen it.
- Request `options.embedding_url` may only target the configured local loopback embeddings route `http://127.0.0.1:18817/v1/embeddings` (or localhost equivalent); external or alternate endpoints are rejected.
- Request bodies and raw text are not logged by the stdlib handler.
- Stop the temporary server after the smoke/demo.
## Synthetic smoke-test plan
Use only generated fixtures under the prototype directory:
```bash
cd /home/will/lab/swarm/openvino-doc-image-triage-npu
/home/will/.venvs/npu/bin/python make_samples.py
/home/will/.venvs/npu/bin/python tests/smoke_test.py
```
Expected smoke coverage:
- Creates synthetic invoice/receipt/form-like image/PDF fixtures.
- Runs CLI triage against the synthetic invoice image/PDF under an explicit allowed root.
- Asserts privacy flags (`external_uploads: false`, no full path by default).
- Asserts invoice category/needs-attention behavior on synthetic text.
- Starts a temporary localhost HTTP server on an ephemeral smoke port, calls `/healthz` and `/triage`, verifies no full path leakage, rejects attempts to widen allowed roots, and rejects external embedding URLs.
- Terminates the temporary server.
The smoke port in tests should stay ephemeral/non-live (currently `18828`) to avoid claiming `18829` as a persistent service.
## NPU busy-time verification plan
For every test that claims NPU use:
1. Read `/sys/class/accel/accel0/device/npu_busy_time_us` before the operation.
2. Perform an operation that should call the live embeddings service on `127.0.0.1:18817` with non-empty synthetic text.
3. Read `npu_busy_time_us` after the operation.
4. Require both:
- the per-result embedding object reports `used: true`, `verified_npu: true`, and `npu_busy_delta_us > 0`; and
- the outer before/after sysfs value increased.
5. If sysfs is missing or `:18817` is unavailable, do not claim NPU success; report CPU fallback / embedding unavailable and keep the smoke result honest.
## Docs and diagram implications
- Service maps should list document/image triage as CLI-first and optional prototype `127.0.0.1:18829`, not live unless explicitly started.
- Diagrams must not draw live Atlas/Hermes/gateway/RAG routing to this triage lane.
- If shown with other candidate sidecars, label it separately from live services: live baseline remains RAG `:18810`, Whisper NPU `:18816`, and embeddings `:18817`; prototype sidecars are reranker `:18818`, classifier/router `:18819`, GenAI worker `:18820`, and optional doc/image triage `:18829`.
- Runbooks should include CLI smoke, localhost listener checks, busy-time delta verification, and server shutdown instructions.
- Documentation should state CPU vs NPU stages explicitly so the prototype does not imply NPU OCR or NPU image classification.
## No-go / defer criteria
Do not proceed to implementation, live integration, or persistent service enablement if any of these are true:
- Will has not explicitly approved live routing or persistent service enablement.
- The requested source path is a private document/image directory or broad home-directory scan rather than synthetic fixtures or an explicitly approved narrow root.
- The workflow would mutate Obsidian, RAG, Chroma/vector collections, or reindex in place.
- The optional server would need to bind anywhere other than localhost.
- NPU busy-time does not increase for an operation being described as NPU-backed.
- Raw OCR text or full paths would be logged, uploaded, stored durably, or returned without explicit request.
- PDF/image dependencies are missing and the task requires rendered page analysis rather than metadata/text-only fallback.
- A future image classifier/OCR/VLM model has not been selected, converted/quantized to OpenVINO, calibrated for the task, and verified on synthetic fixtures with busy-time deltas.
+1 -1
View File
@@ -163,7 +163,7 @@ class Handler(BaseHTTPRequestHandler):
def main() -> int: def main() -> int:
parser = argparse.ArgumentParser(description="Local-only doc/image triage HTTP server") parser = argparse.ArgumentParser(description="Local-only doc/image triage HTTP server")
parser.add_argument("--host", default=os.environ.get("DOC_IMAGE_TRIAGE_HOST", "127.0.0.1")) parser.add_argument("--host", default=os.environ.get("DOC_IMAGE_TRIAGE_HOST", "127.0.0.1"))
parser.add_argument("--port", type=int, default=int(os.environ.get("DOC_IMAGE_TRIAGE_PORT", "18820"))) parser.add_argument("--port", type=int, default=int(os.environ.get("DOC_IMAGE_TRIAGE_PORT", "18829")))
parser.add_argument("--allowed-root", action="append", default=[], help="allowed local root; may repeat") parser.add_argument("--allowed-root", action="append", default=[], help="allowed local root; may repeat")
args = parser.parse_args() args = parser.parse_args()
roots = [Path(p).expanduser().resolve() for p in args.allowed_root] or [Path.cwd().resolve()] roots = [Path(p).expanduser().resolve() for p in args.allowed_root] or [Path.cwd().resolve()]