feat(npu): add advisory gateway wrapper
@@ -133,6 +133,7 @@ Host/user services:
 - `openvino-router-classifier.service` — `:18819`, local-only dry-run Atlas/Hermes message classifier; advisory only
 - `openvino-genai-npu-worker.service` — `:18820`, local-only bounded GenAI worker for small background generation jobs
 - `openvino-doc-image-triage.service` — `:18829`, local-only document/image triage HTTP wrapper with allowed-root enforcement
+- `openvino-advisory-gateway.service` — `:18830`, local-only advisory envelope wrapper over classifier, GenAI, and doc/image triage; explicit no-authority contract
 
 Local-only OpenVINO NPU sidecars:
 
@@ -142,8 +143,9 @@ Local-only OpenVINO NPU sidecars:
 | `18819` | router/classifier | live user service; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
 | `18820` | bounded GenAI worker | live user service | background jobs only; not primary Atlas/Hermes model routing |
 | `18829` | document/image triage | live localhost server | allowed-root limited; no private directory processing unless explicitly approved; NPU stage is embeddings via `:18817` |
+| `18830` | advisory gateway | live user service; host-local Hermes-accessible wrapper | returns `openvino_advisory_v1` envelopes only; no routing, memory writes, external sends, tool execution, restarts, or process-root broadening from request payloads; n8n bridge access intentionally disabled unless separately approved |
 
-These sidecars bind to `127.0.0.1` by default and must not be wired into live Atlas/Hermes routing, memory writes, broad private document processing, or primary model paths without explicit Will approval. Any NPU claim requires a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before/after inference. HTTP 200 alone is not proof.
+These sidecars bind to `127.0.0.1` by default and must not be wired into live Atlas/Hermes routing, memory writes, broad private document processing, or primary model paths without explicit Will approval. Any NPU claim requires a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before/after inference, or a service-reported equivalent. HTTP 200 alone is not proof.
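
The busy-time rule above can be sketched in a few lines of Python. This is an illustration only, not code from this commit: `read_busy_us` and `npu_proof` are hypothetical helper names, and the sketch assumes the sysfs file holds a single monotonically increasing integer in microseconds.

```python
from pathlib import Path
from typing import Callable

# Path quoted in the text above; everything else here is a hypothetical sketch.
BUSY_PATH = Path("/sys/class/accel/accel0/device/npu_busy_time_us")


def read_busy_us(path: Path = BUSY_PATH) -> int:
    """Read the cumulative NPU busy time (assumed to be one integer, in microseconds)."""
    return int(path.read_text().strip())


def npu_proof(
    run_inference: Callable[[], object],
    read_counter: Callable[[], int] = read_busy_us,
) -> dict:
    """Sample the counter before and after a call and report the delta."""
    before = read_counter()
    run_inference()
    after = read_counter()
    delta = after - before
    # A successful call (e.g. HTTP 200) is not enough; only a positive delta counts.
    return {"npu_busy_delta_us": delta, "ok": delta > 0}
```

Passing `read_counter` as a parameter keeps the check testable on machines without an NPU; on the target host the default reader would be used.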
 
 ### 5. Obsidian and RAG
 
@@ -0,0 +1,85 @@
# OpenVINO NPU advisory gateway

Local-only bounded wrapper for the classifier, GenAI worker, and doc/image triage sidecars.

- HTTP bind: `127.0.0.1:18830` only; Docker/n8n bridge access is intentionally not enabled by default
- Service: `openvino-advisory-gateway.service`
- Mode: advisory/shadow/draft only
- Metadata log: `~/.local/state/openvino-advisory-gateway/events.sqlite`

## Authority boundary

Every response includes an explicit authority block:

```json
{
  "may_route": false,
  "may_write_memory": false,
  "may_send_external": false,
  "may_process_private_dirs": false,
  "may_execute_tools": false,
  "may_restart_services": false
}
```

This service may provide hints and drafts. It must not become the live Atlas/Hermes router, memory writer, primary chat model, external sender, tool executor, service restarter, or broad private document processor without a separate approved integration.
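
A caller that consumes these responses could check the authority block before using any result. The following is a minimal sketch under that assumption; `is_advisory_only` is a hypothetical helper, not part of the gateway, and it assumes the block sits under a top-level `authority` key.

```python
from typing import Any

# The six flags listed in the authority block above.
REQUIRED_FALSE = (
    "may_route",
    "may_write_memory",
    "may_send_external",
    "may_process_private_dirs",
    "may_execute_tools",
    "may_restart_services",
)


def is_advisory_only(envelope: dict[str, Any]) -> bool:
    """Return True only if every authority flag is present and exactly False."""
    authority = envelope.get("authority")
    if not isinstance(authority, dict):
        return False
    # A missing flag is treated as a failure rather than as an implicit False.
    return all(authority.get(flag) is False for flag in REQUIRED_FALSE)
```

Treating a missing flag as a failure means a malformed or truncated response is rejected instead of being trusted by default.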

## Endpoints

```text
GET /healthz
POST /v1/advisory/classify
POST /v1/advisory/generate
POST /v1/advisory/triage
```

### Classifier shadow call

```bash
curl -fsS http://127.0.0.1:18830/v1/advisory/classify \
  -H 'Content-Type: application/json' \
  -d '{"trace_id":"smoke","text":"Urgent: inspect service health and systemd status."}' | jq .
```

### Bounded GenAI draft

Allowed jobs: `title`, `summary`, `notification`, `memory_candidate`.

```bash
curl -fsS http://127.0.0.1:18830/v1/advisory/generate \
  -H 'Content-Type: application/json' \
  -d '{"job":"title","input":"Summarize a local health check.","max_new_tokens":24}' | jq .
```
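
The same draft request can be made from Python with only the standard library. This is a hedged sketch rather than a supported client: `build_generate_request` and `request_draft` are hypothetical names, and the URL, job list, and JSON fields are taken from the examples above.

```python
import json
import urllib.request

GATEWAY = "http://127.0.0.1:18830"
ALLOWED_JOBS = {"title", "summary", "notification", "memory_candidate"}


def build_generate_request(job: str, text: str, max_new_tokens: int = 24) -> urllib.request.Request:
    """Build the POST request without sending it, so it can be inspected or tested."""
    if job not in ALLOWED_JOBS:
        raise ValueError(f"job must be one of {sorted(ALLOWED_JOBS)}")
    body = json.dumps({"job": job, "input": text, "max_new_tokens": max_new_tokens}).encode("utf-8")
    return urllib.request.Request(
        f"{GATEWAY}/v1/advisory/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_draft(job: str, text: str, timeout_s: float = 180.0) -> dict:
    """Send the request to the local gateway and return the parsed response."""
    with urllib.request.urlopen(build_generate_request(job, text), timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Checking the job name on the client side only saves a round trip; the gateway remains the place where the allowed-job list is enforced.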

### Explicit-file doc/image triage

```bash
curl -fsS http://127.0.0.1:18830/v1/advisory/triage \
  -H 'Content-Type: application/json' \
  -d '{"path":"/home/will/lab/swarm/openvino-doc-image-triage-npu/samples/synthetic_invoice.png","allowed_roots":["/home/will/lab/swarm/openvino-doc-image-triage-npu"]}' | jq .
```

The gateway requires the path to be inside both:

1. a configured allowed root on the gateway process; and
2. the request's explicit `allowed_roots` list, if one is provided.

Requests cannot broaden the process-configured roots. Do not broaden configured roots to private folders without explicit approval for that root and task.
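
The two-level root rule can be illustrated with a small standalone check. This is a simplified sketch, not the gateway's own function: `path_permitted` is a hypothetical name, it returns a boolean instead of raising, and it does not check that the file exists. It relies on `Path.is_relative_to`, which requires Python 3.9 or later.

```python
from pathlib import Path
from typing import Optional, Sequence


def path_permitted(
    path: str,
    configured_roots: Sequence[str],
    requested_roots: Optional[Sequence[str]] = None,
) -> bool:
    """True if `path` lies inside a requested root that itself lies inside a configured root."""
    # resolve() normalises ".." segments and symlinks before any comparison.
    candidate = Path(path).resolve()
    configured = [Path(r).resolve() for r in configured_roots]
    requested = [Path(r).resolve() for r in (requested_roots or configured_roots)]
    # A request may only narrow the configured roots, never widen them.
    if not all(any(r.is_relative_to(c) for c in configured) for r in requested):
        return False
    return any(candidate.is_relative_to(r) for r in requested)
```

Because the requested roots are validated against the configured ones first, a request that names a private directory is refused even when the file path itself sits inside that directory.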

## Install / run

```bash
install -m 0644 openvino-advisory-gateway.service ~/.config/systemd/user/openvino-advisory-gateway.service
systemctl --user daemon-reload
systemctl --user enable --now openvino-advisory-gateway.service
systemctl --user status openvino-advisory-gateway.service --no-pager
```

`--allowed-root` may be repeated in the systemd unit when additional non-private fixture/review directories are approved. Keep the service bound to `127.0.0.1` unless Will explicitly approves a Docker-bridge exposure plan.

## Tests

```bash
cd /home/will/lab/swarm/openvino-advisory-gateway
python -m pytest tests/test_gateway.py -q
```
@@ -0,0 +1,350 @@
#!/usr/bin/env python3
"""Local-only advisory gateway for OpenVINO NPU sidecars.

This service deliberately returns bounded advisory envelopes. It never routes,
writes memory, sends external messages, executes tools, restarts services, or
broadens document processing authority. Atlas/Hermes may use these outputs as
hints only.
"""
from __future__ import annotations

import argparse
import hashlib
import json
import os
import sqlite3
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from typing import Any, Callable
from urllib.parse import urlparse

HOST = "127.0.0.1"
PORT = 18830
CLASSIFIER_URL = "http://127.0.0.1:18819/v1/classify"
GENAI_URL = "http://127.0.0.1:18820/v1/worker/generate"
DOC_TRIAGE_URL = "http://127.0.0.1:18829/triage"
DEFAULT_LOG_DB = Path(os.environ.get("NPU_ADVISORY_LOG_DB", "/home/will/.local/state/openvino-advisory-gateway/events.sqlite"))
DEFAULT_ALLOWED_ROOT = Path("/home/will/lab/swarm/openvino-doc-image-triage-npu")
DEFAULT_ALLOWED_ROOTS = [Path(p) for p in os.environ.get("NPU_ADVISORY_ALLOWED_ROOTS", str(DEFAULT_ALLOWED_ROOT)).split(os.pathsep) if p]
ALLOWED_GENAI_JOBS = {"title", "summary", "notification", "memory_candidate"}

AUTHORITY = {
    "may_route": False,
    "may_write_memory": False,
    "may_send_external": False,
    "may_process_private_dirs": False,
    "may_execute_tools": False,
    "may_restart_services": False,
}


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def http_post_json(url: str, payload: dict[str, Any], timeout_s: float = 20.0) -> dict[str, Any]:
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"), headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def http_get_json(url: str, timeout_s: float = 8.0) -> dict[str, Any]:
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return {"ok": True, "raw_text": body[:120]}


def _npu_delta_from(result: dict[str, Any], fallback: int | None = None) -> int | None:
    for key in ("npu_busy_delta_us", "sysfs_npu_busy_delta_us"):
        value = result.get(key)
        if isinstance(value, int):
            return value
        if isinstance(value, float):
            return int(value)
    return fallback


def _doc_triage_npu_delta(result: dict[str, Any]) -> int | None:
    pages = ((result.get("result") or {}).get("pages") or []) if isinstance(result, dict) else []
    best: int | None = None
    for page in pages:
        emb = ((page.get("needs_attention") or {}).get("embedding") or {}) if isinstance(page, dict) else {}
        delta = emb.get("npu_busy_delta_us")
        if isinstance(delta, int):
            best = max(best or 0, delta)
    return best


def build_envelope(
    *,
    service: str,
    operation: str,
    result: dict[str, Any],
    mode: str = "advisory",
    input_scope: str,
    npu_busy_delta_us: int | None,
    trace_id: str | None = None,
    warnings: list[str] | None = None,
) -> dict[str, Any]:
    npu_ok = bool(isinstance(npu_busy_delta_us, int) and npu_busy_delta_us > 0)
    return {
        "ok": True,
        "schema": "openvino_advisory_v1",
        "service": service,
        "operation": operation,
        "mode": mode,
        "trace_id": trace_id,
        "input_scope": input_scope,
        "result": result,
        "npu_proof": {"required": True, "ok": npu_ok, "npu_busy_delta_us": npu_busy_delta_us},
        "authority": dict(AUTHORITY),
        "warnings": warnings or [],
    }


class AdvisoryLogger:
    def __init__(self, db_path: str | Path = DEFAULT_LOG_DB):
        self.db_path = Path(db_path)
        self.db_path.parent.mkdir(parents=True, exist_ok=True)
        self._init()

    def _init(self) -> None:
        with sqlite3.connect(self.db_path) as con:
            con.execute(
                """
                CREATE TABLE IF NOT EXISTS advisory_events (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    created_at REAL NOT NULL,
                    service TEXT NOT NULL,
                    operation TEXT NOT NULL,
                    mode TEXT NOT NULL,
                    input_scope TEXT NOT NULL,
                    input_ref TEXT NOT NULL,
                    npu_busy_delta_us INTEGER,
                    ok INTEGER NOT NULL,
                    raw_payload TEXT
                )
                """
            )

    def log(self, envelope: dict[str, Any], *, input_ref: str) -> None:
        proof = envelope.get("npu_proof") or {}
        with sqlite3.connect(self.db_path) as con:
            con.execute(
                """
                INSERT INTO advisory_events(created_at, service, operation, mode, input_scope, input_ref,
                                            npu_busy_delta_us, ok, raw_payload)
                VALUES (?, ?, ?, ?, ?, ?, ?, ?, NULL)
                """,
                (
                    time.time(),
                    str(envelope.get("service")),
                    str(envelope.get("operation")),
                    str(envelope.get("mode")),
                    str(envelope.get("input_scope")),
                    input_ref,
                    proof.get("npu_busy_delta_us"),
                    1 if envelope.get("ok") else 0,
                ),
            )


def classify_text(
    text: str,
    *,
    trace_id: str | None = None,
    http_post_json: Callable[[str, dict[str, Any], float], dict[str, Any]] = http_post_json,
    logger: AdvisoryLogger | None = None,
    timeout_s: float = 20.0,
) -> dict[str, Any]:
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    payload = {"id": trace_id or "advisory", "text": text, "options": {"include_evidence": False, "dry_run": True}}
    result = http_post_json(CLASSIFIER_URL, payload, timeout_s)
    envelope = build_envelope(
        service="classifier",
        operation="classify",
        mode="shadow",
        input_scope="explicit_text",
        trace_id=trace_id,
        result={"labels": result.get("labels", {}), "model": result.get("model"), "service_mode": result.get("mode", "dry_run")},
        npu_busy_delta_us=_npu_delta_from(result),
    )
    if logger:
        logger.log(envelope, input_ref="text:sha256:" + sha256_text(text))
    return envelope


def generate_bounded(
    job: str,
    text: str,
    *,
    max_new_tokens: int | None = None,
    trace_id: str | None = None,
    http_post_json: Callable[[str, dict[str, Any], float], dict[str, Any]] = http_post_json,
    logger: AdvisoryLogger | None = None,
    timeout_s: float = 180.0,
) -> dict[str, Any]:
    if job not in ALLOWED_GENAI_JOBS:
        raise ValueError("unsupported advisory generation job")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("input must be a non-empty string")
    payload: dict[str, Any] = {"job": job, "input": text}
    if max_new_tokens is not None:
        payload["max_new_tokens"] = max_new_tokens
    result = http_post_json(GENAI_URL, payload, timeout_s)
    envelope = build_envelope(
        service="genai",
        operation=f"generate:{job}",
        mode="draft",
        input_scope="explicit_text",
        trace_id=trace_id,
        result={"draft_text": result.get("text", ""), "json": result.get("json"), "timing_ms": result.get("timing_ms"), "final_authority": False},
        npu_busy_delta_us=_npu_delta_from(result),
    )
    if logger:
        logger.log(envelope, input_ref="text:sha256:" + sha256_text(text))
    return envelope


def _resolve_allowed(path: str, allowed_roots: list[str] | None, configured_roots: list[Path] | None = None) -> tuple[Path, list[Path]]:
    # Configured roots come from the process (CLI or environment), never from the request.
    configured = [p.expanduser().resolve() for p in (configured_roots or DEFAULT_ALLOWED_ROOTS)]
    if not configured:
        raise ValueError("at least one configured allowed root is required")
    # With no requested roots, fall back to the configured ones.
    requested = [Path(p).expanduser().resolve() for p in (allowed_roots or [str(p) for p in configured])]
    if not requested:
        raise ValueError("at least one requested allowed root is required")
    # A request may only narrow the configured roots: each requested root must sit inside one.
    for root in requested:
        if not any(root == base or root.is_relative_to(base) for base in configured):
            raise ValueError("requested allowed root is outside configured roots")
    roots = requested
    # resolve() normalises ".." segments and symlinks before the containment check.
    candidate = Path(path).expanduser().resolve()
    if not any(candidate == root or candidate.is_relative_to(root) for root in roots):
        raise ValueError("path must be inside an allowed root")
    if not candidate.exists() or not candidate.is_file():
        raise ValueError("path must be an existing file")
    return candidate, roots


def triage_file(
    path: str,
    *,
    allowed_roots: list[str] | None = None,
    configured_roots: list[Path] | None = None,
    trace_id: str | None = None,
    http_post_json: Callable[[str, dict[str, Any], float], dict[str, Any]] = http_post_json,
    logger: AdvisoryLogger | None = None,
    timeout_s: float = 60.0,
) -> dict[str, Any]:
    candidate, roots = _resolve_allowed(path, allowed_roots, configured_roots)
    payload = {"path": str(candidate), "options": {"allowed_roots": [str(r) for r in roots], "max_pages": 3}}
    result = http_post_json(DOC_TRIAGE_URL, payload, timeout_s)
    delta = _doc_triage_npu_delta(result)
    envelope = build_envelope(
        service="doc_triage",
        operation="triage_file",
        mode="reviewable_artifact",
        input_scope="explicit_file",
        trace_id=trace_id,
        result={"triage": result.get("result"), "final_authority": False},
        npu_busy_delta_us=delta,
    )
    if logger:
        envelope["warnings"].append("metadata-only log; raw file contents are not logged")
        # Only a hash of the resolved path is stored, not the path or the file contents.
        logger.log(envelope, input_ref="file:sha256path:" + sha256_text(str(candidate)))
    return envelope


def health(*, http_get_json: Callable[[str, float], dict[str, Any]] = http_get_json) -> dict[str, Any]:
    deps = {
        "classifier": "http://127.0.0.1:18819/healthz",
        "genai": "http://127.0.0.1:18820/healthz",
        "doc_triage": "http://127.0.0.1:18829/healthz",
    }
    out: dict[str, Any] = {"ok": True, "service": "openvino-advisory-gateway", "mode": "advisory_only", "authority": dict(AUTHORITY), "dependencies": {}}
    for name, url in deps.items():
        try:
            data = http_get_json(url, 8.0)
            out["dependencies"][name] = {"ok": bool(data.get("ok", data.get("status") == "ok")), "service": data.get("service"), "device": data.get("device")}
        except Exception as exc:
            out["ok"] = False
            out["dependencies"][name] = {"ok": False, "error": str(exc)}
    return out


def _read_json(handler: BaseHTTPRequestHandler, max_bytes: int = 256 * 1024) -> dict[str, Any]:
    length = int(handler.headers.get("Content-Length", "0"))
    # A negative length would make rfile.read() block until the client closes the socket.
    if length < 0:
        raise ValueError("invalid Content-Length")
    if length > max_bytes:
        raise ValueError("request JSON too large")
    raw = handler.rfile.read(length)
    if not raw:
        return {}
    data = json.loads(raw.decode("utf-8"))
    # The handlers call .get() on the payload, so anything but a JSON object is rejected here.
    if not isinstance(data, dict):
        raise ValueError("request JSON must be an object")
    return data


def make_handler(logger: AdvisoryLogger, configured_roots: list[Path]):
    class Handler(BaseHTTPRequestHandler):
        server_version = "openvino-advisory-gateway/0.1"

        def log_message(self, format: str, *args: Any) -> None:  # noqa: A002 - stdlib override name
            # Do not log request bodies or private paths.
            print(f"{self.client_address[0]} {format % args}")

        def send_json(self, status: int, payload: Any) -> None:
            body = json.dumps(payload, indent=2, sort_keys=True).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self) -> None:  # noqa: N802
            if urlparse(self.path).path in ("/", "/health", "/healthz"):
                self.send_json(200, health())
                return
            self.send_json(404, {"ok": False, "error": "not_found"})

        def do_POST(self) -> None:  # noqa: N802
            path = urlparse(self.path).path
            try:
                payload = _read_json(self)
                if path == "/v1/advisory/classify":
                    self.send_json(200, classify_text(str(payload.get("text", "")), trace_id=payload.get("trace_id"), logger=logger))
                    return
                if path == "/v1/advisory/generate":
                    self.send_json(200, generate_bounded(str(payload.get("job", "summary")), str(payload.get("input", "")), max_new_tokens=payload.get("max_new_tokens"), trace_id=payload.get("trace_id"), logger=logger))
                    return
                if path == "/v1/advisory/triage":
                    self.send_json(200, triage_file(str(payload.get("path", "")), allowed_roots=payload.get("allowed_roots"), configured_roots=configured_roots, trace_id=payload.get("trace_id"), logger=logger))
                    return
                self.send_json(404, {"ok": False, "error": "not_found"})
            except Exception as exc:
                self.send_json(400, {"ok": False, "error": type(exc).__name__, "message": str(exc), "authority": dict(AUTHORITY)})

    return Handler


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Local-only OpenVINO NPU advisory gateway")
    parser.add_argument("--host", default=os.environ.get("NPU_ADVISORY_HOST", HOST))
    parser.add_argument("--port", type=int, default=int(os.environ.get("NPU_ADVISORY_PORT", str(PORT))))
    parser.add_argument("--log-db", default=str(DEFAULT_LOG_DB))
    parser.add_argument("--allowed-root", action="append", dest="allowed_roots", default=None, help="Configured file root allowed for advisory doc/image triage. May be repeated.")
    args = parser.parse_args(argv)
    if args.host != "127.0.0.1":
        raise SystemExit("refusing non-local bind")
    configured_roots = [Path(p).expanduser().resolve() for p in (args.allowed_roots or DEFAULT_ALLOWED_ROOTS)]
    logger = AdvisoryLogger(args.log_db)
    server = ThreadingHTTPServer((args.host, args.port), make_handler(logger, configured_roots))
    print(json.dumps({"service": "openvino-advisory-gateway", "host": args.host, "port": args.port, "mode": "advisory_only"}), flush=True)
    server.serve_forever()
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
@@ -0,0 +1,17 @@
[Unit]
Description=OpenVINO NPU advisory gateway (local-only, port 18830)
After=network.target openvino-router-classifier.service openvino-genai-npu-worker.service openvino-doc-image-triage.service
Wants=openvino-router-classifier.service openvino-genai-npu-worker.service openvino-doc-image-triage.service

[Service]
Type=simple
WorkingDirectory=/home/will/lab/swarm/openvino-advisory-gateway
Environment=NPU_ADVISORY_HOST=127.0.0.1
Environment=NPU_ADVISORY_PORT=18830
Environment=NPU_ADVISORY_LOG_DB=/home/will/.local/state/openvino-advisory-gateway/events.sqlite
ExecStart=/home/will/.venvs/npu/bin/python /home/will/lab/swarm/openvino-advisory-gateway/gateway.py --host 127.0.0.1 --port 18830 --allowed-root /home/will/lab/swarm/openvino-doc-image-triage-npu
Restart=on-failure
RestartSec=5

[Install]
WantedBy=default.target
@@ -0,0 +1,134 @@
from __future__ import annotations

import json
import sqlite3
import sys
from pathlib import Path

import pytest

sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
import gateway


def test_authority_envelope_is_advisory_and_forbids_side_effects() -> None:
    env = gateway.build_envelope(
        service="classifier",
        operation="classify",
        mode="shadow",
        result={"labels": {"workflow_category": {"value": "devops"}}},
        npu_busy_delta_us=123,
        input_scope="explicit_text",
    )

    assert env["ok"] is True
    assert env["mode"] == "shadow"
    assert env["authority"] == {
        "may_route": False,
        "may_write_memory": False,
        "may_send_external": False,
        "may_process_private_dirs": False,
        "may_execute_tools": False,
        "may_restart_services": False,
    }
    assert env["npu_proof"] == {"required": True, "ok": True, "npu_busy_delta_us": 123}


def test_classify_calls_sidecar_and_logs_metadata_only(tmp_path: Path) -> None:
    calls: list[tuple[str, dict]] = []

    def fake_post(url: str, payload: dict, timeout_s: float) -> dict:
        calls.append((url, payload))
        return {
            "labels": {"tool_needed": {"value": True}},
            "npu_busy_delta_us": 55,
            "sysfs_npu_busy_delta_us": 55,
        }

    logger = gateway.AdvisoryLogger(tmp_path / "events.sqlite")
    env = gateway.classify_text(
        "Inspect live service status",
        trace_id="t1",
        http_post_json=fake_post,
        logger=logger,
    )

    assert calls[0][0].endswith(":18819/v1/classify")
    assert calls[0][1]["options"]["dry_run"] is True
    assert env["service"] == "classifier"
    assert env["authority"]["may_route"] is False
    assert env["npu_proof"]["ok"] is True

    with sqlite3.connect(tmp_path / "events.sqlite") as con:
        row = con.execute("select service, operation, input_ref, raw_payload from advisory_events").fetchone()
    assert row == ("classifier", "classify", "text:sha256:" + gateway.sha256_text("Inspect live service status"), None)


def test_generate_allows_only_bounded_jobs() -> None:
    with pytest.raises(ValueError, match="unsupported advisory generation job"):
        gateway.generate_bounded("primary_chat", "hello", http_post_json=lambda *_: {})


def test_generate_wraps_draft_without_final_authority() -> None:
    def fake_post(url: str, payload: dict, timeout_s: float) -> dict:
        return {"text": "Short title", "npu_busy_delta_us": 99, "timing_ms": {"total": 10}}

    env = gateway.generate_bounded("title", "Summarize this local health check", http_post_json=fake_post)

    assert env["service"] == "genai"
    assert env["operation"] == "generate:title"
    assert env["result"]["draft_text"] == "Short title"
    assert env["result"]["final_authority"] is False
    assert env["authority"]["may_send_external"] is False


def test_doc_triage_requires_explicit_file_under_allowed_root(tmp_path: Path) -> None:
    allowed = tmp_path / "allowed"
    allowed.mkdir()
    target = allowed / "synthetic.png"
    target.write_bytes(b"not real image for unit test")

    def fake_post(url: str, payload: dict, timeout_s: float) -> dict:
        assert payload["path"] == str(target.resolve())
        assert payload["options"]["allowed_roots"] == [str(allowed.resolve())]
        return {"ok": True, "result": {"pages": [{"needs_attention": {"embedding": {"verified_npu": True, "npu_busy_delta_us": 42}}}]}}

    env = gateway.triage_file(str(target), allowed_roots=[str(allowed)], configured_roots=[allowed], http_post_json=fake_post)

    assert env["service"] == "doc_triage"
    assert env["input_scope"] == "explicit_file"
    assert env["npu_proof"]["ok"] is True


def test_doc_triage_rejects_private_root_broadening(tmp_path: Path) -> None:
    allowed = tmp_path / "allowed"
    allowed.mkdir()
    with pytest.raises(ValueError, match="path must be inside an allowed root"):
        gateway.triage_file(str(tmp_path / "outside.png"), allowed_roots=[str(allowed)], configured_roots=[allowed], http_post_json=lambda *_: {})


def test_doc_triage_rejects_requested_root_outside_configured_roots(tmp_path: Path) -> None:
    configured = tmp_path / "configured"
    requested = tmp_path / "private"
    requested.mkdir()
    target = requested / "file.png"
    target.write_bytes(b"synthetic")

    with pytest.raises(ValueError, match="requested allowed root is outside configured roots"):
        gateway.triage_file(
            str(target),
            allowed_roots=[str(requested)],
            configured_roots=[configured],
            http_post_json=lambda *_: {},
        )


def test_health_aggregates_dependencies_without_raw_private_data() -> None:
    def fake_get(url: str, timeout_s: float) -> dict:
        return {"ok": True, "service": url.rsplit(":", 1)[-1]}

    health = gateway.health(http_get_json=fake_get)

    assert health["ok"] is True
    assert set(health["dependencies"]) == {"classifier", "genai", "doc_triage"}
    assert "raw" not in json.dumps(health).lower()
@@ -46,10 +46,10 @@ printf 'busy_time_us=%s\n' "$(busy_value)"
 
 section "Listeners"
 # Required OpenVINO/NPU program ports: live baseline 18810/18816/18817,
-# approved prototypes 18818/18819/18820, and optional doc/image triage 18829.
+# reranker 18818, local-only specialists 18819/18820/18829, and advisory gateway 18830.
 # 18814 is the existing RAG/embedding health wrapper; 18828 is a review-only
 # alternate used to avoid collisions during prior smoke tests.
-ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829)\b' || true
+ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829|18830)\b' || true
 
 section "User service states"
 for unit in \