# NPU advisory dry-run comparison harness

This harness compares advisory-only NPU lane recommendations against synthetic/non-private expected decisions. It is an observability gate only: it does not route, send, write memory, execute tools, restart services, broaden private scans, restart gateways, or mutate vector stores.

For the operator runbook and promotion criteria, see `docs/npu-advisory-observability-runbook.md`. Treat this file as the compact command reference; the runbook is the source for how to interpret metrics and decide whether a lane is promotable later.

## Run

From `/home/will/lab/swarm`:

```bash
python scripts/npu-advisory-dry-run-comparison.py --format json
python scripts/npu-advisory-dry-run-comparison.py --format json --include-decisions
python scripts/npu-advisory-dry-run-comparison.py --format markdown
```

Strict checks for CI/review:

```bash
python scripts/npu-advisory-dry-run-comparison.py --fail-on-mismatch
python scripts/npu-advisory-dry-run-comparison.py --fail-on-authority-violation
```

`--fail-on-authority-violation` is expected to fail with the committed fixture set because one synthetic gateway fixture intentionally proves that `may_* = true` is caught and summarized.

## Fixture coverage

Fixtures live at `fixtures/npu_advisory_dry_run/fixtures.json` and cover:

- context gate;
- cron/n8n advisory events;
- batch document/audio triage shape;
- voice/audio advisory gate;
- Kanban hygiene advisory;
- advisory gateway envelopes.

All fixture payloads are synthetic and omit raw private content. Lane adapters use deterministic local rules or imported pure functions; they do not call live advisory services.

## Output shape

JSON output uses `npu_advisory_dry_run_summary_v1` and includes totals, per-lane counts, confidence buckets, recommendation counts, authority violations, expected-outcome mismatches, and optionally per-fixture `npu_advisory_decision_v1` records.
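
To illustrate the authority-violations entry in the summary, here is a minimal sketch of how a scan for `may_* = true` flags might look. It is not the harness's actual code: the record keys (`fixture_id`, `lane`, `authority_flags`) and the flag names `may_route` and `may_send` are assumptions made for this example.

```python
from typing import Any


def find_authority_violations(decisions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return one entry per record that has any `may_*` flag set to True.

    The key names used here are hypothetical, not the real schema.
    """
    violations = []
    for record in decisions:
        flags = record.get("authority_flags", {})
        # Only flags named may_* with a literal True value count as violations.
        raised = sorted(
            name for name, value in flags.items()
            if name.startswith("may_") and value is True
        )
        if raised:
            violations.append({
                "fixture_id": record.get("fixture_id"),
                "lane": record.get("lane"),
                "flags": raised,
            })
    return violations


# Synthetic records, including one negative control that raises a flag on purpose.
sample = [
    {"fixture_id": "ctx-001", "lane": "context_gate",
     "authority_flags": {"may_route": False, "may_send": False}},
    {"fixture_id": "gw-neg-001", "lane": "advisory_gateway",
     "authority_flags": {"may_route": True, "may_send": False}},
]

violations = find_authority_violations(sample)
print(violations)
```

With the sample above, only the negative-control record is reported, which mirrors why the strict authority check is expected to fail on the committed fixtures.
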
Each decision record includes timestamp, source, service, lane, input class, recommendation, expected recommendation, confidence/bucket, authority flags, allowed actions, actual action (`none_dry_run`), human/Atlas comparison, outcome, NPU proof, latency, fallback reason, and compact notes.

## Promotion gate

Before any future advisory lane receives authority, a separate approval should require at minimum:

- no expected-outcome mismatches for that lane's representative fixture set;
- no false negatives on action-needed events;
- intentionally reviewed false positives;
- zero authority-safe flag violations except known negative-control fixtures;
- documented rollback and a narrow, explicit authority scope.

Passing this harness never grants live authority by itself. Advisory outputs flow into `npu_advisory_decision_v1` records, summary metrics, and a human/Atlas review gate. Any later promotion must be lane-specific, explicitly approved, and reversible.
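
The machine-checkable parts of the promotion gate can be sketched as a per-lane report. This is one possible reading of the criteria, not the harness's implementation: the record keys, the `action_needed` recommendation value, and the `lane_gate_report` helper are all hypothetical. Reviewing false positives, documenting rollback, and scoping authority remain human steps and are not modeled.

```python
from typing import Any

ACTION_NEEDED = "action_needed"  # hypothetical recommendation value


def lane_gate_report(decisions: list[dict[str, Any]], lane: str,
                     negative_controls: frozenset = frozenset()) -> dict[str, Any]:
    """Summarize mismatches, false negatives, and flag violations for one lane."""
    records = [d for d in decisions if d.get("lane") == lane]

    # Any difference between actual and expected recommendation is a mismatch.
    mismatches = [d["fixture_id"] for d in records
                  if d["recommendation"] != d["expected_recommendation"]]

    # A false negative: action was expected but not recommended.
    false_negatives = [d["fixture_id"] for d in records
                       if d["expected_recommendation"] == ACTION_NEEDED
                       and d["recommendation"] != ACTION_NEEDED]

    # may_* flags set to True count unless the fixture is a known negative control.
    violations = [d["fixture_id"] for d in records
                  if d["fixture_id"] not in negative_controls
                  and any(k.startswith("may_") and v is True
                          for k, v in d.get("authority_flags", {}).items())]

    return {
        "lane": lane,
        "fixtures": len(records),
        "mismatches": mismatches,
        "false_negatives": false_negatives,
        "authority_violations": violations,
        # Eligible for human review only; this never grants authority.
        "eligible_for_review": bool(records)
        and not (mismatches or false_negatives or violations),
    }


decisions = [
    {"fixture_id": "kanban-001", "lane": "kanban_hygiene",
     "recommendation": "no_action", "expected_recommendation": "no_action",
     "authority_flags": {"may_write": False}},
    {"fixture_id": "kanban-002", "lane": "kanban_hygiene",
     "recommendation": "no_action", "expected_recommendation": ACTION_NEEDED,
     "authority_flags": {"may_write": False}},
    {"fixture_id": "gw-neg-001", "lane": "advisory_gateway",
     "recommendation": "no_action", "expected_recommendation": "no_action",
     "authority_flags": {"may_route": True}},
]

kanban = lane_gate_report(decisions, "kanban_hygiene")
gateway = lane_gate_report(decisions, "advisory_gateway",
                           negative_controls=frozenset({"gw-neg-001"}))
print(kanban)
print(gateway)
```

The result is named `eligible_for_review` rather than anything like "approved" to keep the sketch consistent with the rule that passing the harness only feeds the human/Atlas review gate.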