diff --git a/README.md b/README.md index 48b9fa6..3ff6a2d 100644 --- a/README.md +++ b/README.md @@ -19,6 +19,7 @@ swarm/ │ └── vm/ # VM provisioning role (local) ├── openclaw/ # Live mirror of guest ~/.openclaw/ ├── docker-compose.yaml # LiteLLM + supporting services +├── docs/ # Swarm/agentmon/n8n infrastructure docs + diagrams ├── litellm-config.yaml # LiteLLM static config ├── litellm-init-credentials.sh # Register API keys into LiteLLM DB ├── litellm-init-models.sh # Register models into LiteLLM DB (idempotent) @@ -29,6 +30,15 @@ swarm/ └── README.md # This file ``` +## Current swarm/service architecture + +For the current host-side AI/search/voice automation stack, n8n watchdogs, and agentmon monitoring layer, see: + +- [`docs/swarm-infrastructure.md`](docs/swarm-infrastructure.md) — operational overview and quick checks +- [`docs/swarm-infrastructure.html`](docs/swarm-infrastructure.html) — dark SVG architecture diagram +- [`docs/diagram-maintenance.md`](docs/diagram-maintenance.md) — diagram upkeep conventions +- OpenVINO NPU services and prototypes are documented in `swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md` and the component READMEs under `openvino-*-npu*/`. Live baseline ports are RAG `:18810`, Whisper NPU `:18816`, and embeddings `:18817`; sidecar ports `:18818`, `:18819`, `:18820`, and optional doc/image triage `:18829` are approved prototypes only, not live Atlas/Hermes routing. + ## VM: zap | Property | Value | diff --git a/docs/diagram-maintenance.md b/docs/diagram-maintenance.md new file mode 100644 index 0000000..675b9cf --- /dev/null +++ b/docs/diagram-maintenance.md @@ -0,0 +1,66 @@ +# Diagram maintenance + +Keep infrastructure diagrams current as first-class documentation, not as one-off screenshots. + +## Current diagrams + +- [`swarm-infrastructure.html`](./swarm-infrastructure.html) — full Atlas/Hermes + n8n + agentmon + local AI/search/voice topology. 
+ +## When to update an existing diagram + +Update the relevant diagram in the same change set when you change any of these: + +- service topology, ports, or container names +- monitoring or alerting paths +- n8n workflow architecture +- Hermes/Atlas routing or gateway responsibilities +- local AI/search/voice endpoints +- OpenVINO NPU live/prototype status, ports, or safety gates (`:18810`, `:18816`, `:18817`, `:18818`, `:18819`, `:18820`, optional `:18829`) +- Obsidian/RAG data flow +- OpenClaw/VM operational mode +- ownership/source-of-truth paths for a component + +## When to create a new diagram + +Create a new focused diagram when the existing overview would become too dense. Good candidates: + +- n8n workflow family or alerting-only diagram +- agentmon internals: collectors → NATS → processor → Postgres → query/UI +- Obsidian/RAG automation pipeline +- local AI routing: Hermes/LiteLLM/llama.cpp/Ollama/provider boundaries +- OpenVINO NPU assistant sidecars, with live baseline and approved/not-live prototype lanes separated +- messaging/channel routing: Telegram/Discord/email → Hermes/n8n/alerts +- disaster recovery / backup topology + +## Style rules + +- Prefer standalone `.html` files with inline SVG so they render offline in any browser. +- Keep the source file committed alongside the docs; do not rely on generated screenshots as the only artifact. +- Link diagrams from the nearest README or operational doc. +- Keep labels operational: service name, port, responsibility, and data direction. +- Avoid secrets, credential names that imply secret values, private tokens, raw webhook URLs, or sensitive sample payloads. +- Do not imply live Atlas/Hermes/RAG routing to an OpenVINO NPU prototype unless a reviewed implementation actually enabled it; label approved prototypes as `not live` or `approval required`. +- If a raw export or live config was used to build the diagram, commit only the sanitized diagram/docs, not the raw sensitive source. 
+ +## Verification before committing + +```bash +# Check the files are valid text and do not contain obvious secret markers +python - <<'PY' +from pathlib import Path +for p in Path('docs').glob('*.html'): + text = p.read_text() + hits = [s for s in ['api_key', 'token', 'password', 'Authorization', 'Bearer ', 'secret'] if s.lower() in text.lower()] + print(p, hits) +PY + +# Inspect targeted diff only +git diff --stat -- docs README.md +``` + +After editing diagrams, commit with a docs-focused message, for example: + +```bash +git add docs/*.md docs/*.html README.md +git commit -m "docs: update swarm infrastructure diagrams" +``` diff --git a/docs/swarm-infrastructure.html b/docs/swarm-infrastructure.html new file mode 100644 index 0000000..edc6862 --- /dev/null +++ b/docs/swarm-infrastructure.html @@ -0,0 +1,115 @@ + + + + + + Will's Swarm Infrastructure + + + +
+

Will's Swarm Infrastructure

Atlas/Hermes gateway + n8n automation + agentmon monitoring + local AI/search/voice services
+
+ [SVG architecture diagram; text labels only]
+ Lanes: Hermes gateway layer | n8n + agentmon observability | local swarm services
+ Channels: Telegram (DM/groups); Discord (#ops-alerts); Email (Gmail IMAP)
+ Atlas / Hermes: default profile gateway; tools • memory • specialists
+ n8n-agent: automation workflows; :18808 host / :5678 container
+ agentmon-query: aggregate snapshots/API; :8081 /v1/events
+ agentmon pipeline: ingest :8080; NATS JetStream; event processor; Postgres DB; web UI :8082; swarm.snapshot + openclaw.snapshot
+ LiteLLM: LLM router + DB; :18804
+ Search: SearXNG + Brave MCP; :18803 / :18802
+ Voice: Kokoro + Whisper; :18805 / :18816
+ Docker services: agentmon.monitor=true; swarm/service snapshots
+ OpenClaw VMs: currently dormant; openclaw.snapshot
+ Obsidian / RAG: RAG endpoint :18810; Chroma obsidian_bge_npu
+ NPU sidecars: approved prototypes; not live; :18818/:18819/:18820/:18829
+ host local AI: llama.cpp :18806; Ollama fallback :18807; OpenVINO embed :18817 live; Whisper NPU :18816 live
+ Legend: Gateway/Search/Voice; Automation/API; Data/AI stores; Event bus/pipeline; Monitoring / not-live prototype flows
+
+

Monitoring model

  • n8n probes critical ports directly
  • agentmon aggregates Docker/OpenClaw snapshots
  • n8n polls agentmon for stale/degraded state
+

Operational endpoints

  • n8n: 127.0.0.1:18808
  • agentmon query/UI: 8081 / 8082
  • live NPU: RAG 18810, Whisper 18816, embeddings 18817
  • prototypes not live-routed: 18818/18819/18820/18829
+

Source paths

  • Swarm repo: ~/lab/swarm
  • Agentmon repo: ~/lab/agentmon
  • Workflows: swarm-common/n8n-workflows
+
+ +
+ + diff --git a/docs/swarm-infrastructure.md b/docs/swarm-infrastructure.md new file mode 100644 index 0000000..dd47587 --- /dev/null +++ b/docs/swarm-infrastructure.md @@ -0,0 +1,250 @@ +# Swarm Infrastructure + +This document is the source-of-truth overview for Will's local swarm/agent infrastructure on the `zap` workstation. It focuses on the runtime services that support Atlas/Hermes, n8n automation, local model/search/voice tooling, Obsidian/RAG automation, and the new agentmon monitoring layer. + +## High-level topology + +```text +Telegram / Discord / Email + | + v +Hermes / Atlas gateway (default profile) + | + +--> local tools and specialist profiles + +--> n8n automation workflows on :18808 + +n8n automation + | + +--> direct watchdog probes for key service ports + +--> Agentmon Health Watchdog -> agentmon-query :8081 + +--> Obsidian, RAG, voice memo, URL capture, digest workflows + +agentmon + | + +--> agentmon-swarm-monitor -> Docker labels agentmon.monitor=true + +--> agentmon-openclaw-monitor -> OpenClaw VM snapshots + +--> NATS JetStream -> event processor -> Postgres + +--> query API / UI on :8081 / :8082 + +local AI/search/voice services + | + +--> LiteLLM :18804 + +--> SearXNG :18803 + +--> Brave MCP :18802 + +--> llama.cpp :18806 + +--> Ollama embeddings :18807 (legacy/CPU fallback) + +--> OpenVINO NPU embeddings :18817 + +--> Kokoro TTS :18805 + +--> Whisper NPU :18816 + +--> approved/not-live NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage optional :18829 +``` + +See also: + +- [`swarm-infrastructure.html`](./swarm-infrastructure.html) — visual architecture diagram +- [`diagram-maintenance.md`](./diagram-maintenance.md) — how to keep diagrams updated and when to create new ones + +## Runtime layers + +### 1. Messaging and agent gateway + +- **Hermes / Atlas default profile** is the production messaging gateway. +- Connected platforms include Telegram, Discord, and email. 
+- Atlas uses local swarm services where suitable, especially search, local LLMs, embeddings, STT/TTS, n8n, and agentmon. +- Specialist Hermes profiles are available for delegated work, but the default profile remains the stable production gateway. + +### 2. n8n automation + +Container/service: + +- `n8n-agent` +- Host URL: `http://127.0.0.1:18808` +- Container URL: `http://127.0.0.1:5678` +- Compose project: `/home/will/lab/swarm/docker-compose.yaml` + +Important workflow source exports live under: + +- `swarm-common/n8n-workflows/` + +Current health/automation patterns: + +- **Swarm Health Watchdog**: direct endpoint checks for search, LLM, voice, n8n, Docker health, etc. +- **Agentmon Health Watchdog**: polls agentmon aggregate snapshots and alerts on stale/degraded monitoring state. +- **RAG and Embedding Health Watchdog**: checks RAG/search/embedding path. +- Obsidian workflows: health/reindex, inbox triage, daily review, URL-to-note, chat summary capture, weekly decision/runbook extraction. + +### 3. 
Agentmon monitoring layer + +Repo: + +- `/home/will/lab/agentmon` + +Compose services: + +- `agentmon-ingest` on `:8080` — ingestion gateway, `/healthz` +- `agentmon-query` on `:8081` — query API, `/healthz`, `/v1/events`, `/v1/stats/summary` +- `agentmon-ui` on `:8082` — web UI, `/healthz` +- `agentmon-processor` — NATS to Postgres event processor +- `agentmon-swarm-monitor` — monitors Docker containers labeled `agentmon.monitor=true` +- `agentmon-openclaw-monitor` — emits OpenClaw VM snapshots +- `agentmon-db` — Postgres +- `agentmon-nats` — NATS JetStream + +Key query endpoints: + +```text +http://127.0.0.1:8080/healthz +http://127.0.0.1:8081/healthz +http://127.0.0.1:8082/healthz +http://127.0.0.1:8081/v1/stats/summary +http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1 +http://127.0.0.1:8081/v1/events?event_type=swarm.service.snapshot&limit=20 +http://127.0.0.1:8081/v1/events?event_type=openclaw.snapshot&limit=3 +``` + +From inside `n8n-agent`, use the Docker bridge gateway: + +```text +http://172.19.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1 +``` + +### 4. 
Local AI, search, and voice services + +Docker services: + +- `litellm` — `:18804`, OpenAI-compatible LLM router +- `litellm-db` — Postgres backing LiteLLM +- `searxng` — `:18803`, local metasearch +- `brave-search` — `:18802`, Brave Search MCP server +- `kokoro-tts` — `:18805`, local TTS +- `whisper-server-npu` — `:18816`, OpenVINO NPU local transcription +- `n8n-agent` — `:18808`, automation + +Host/user services: + +- `llama-server.service` — `:18806`, local llama.cpp OpenAI-compatible LLM +- `ollama.service` — `:18807`, legacy/CPU embeddings API fallback +- `openvino-embeddings.service` — `:18817`, OpenVINO NPU embeddings API (`/v1/embeddings`, `/api/embed`, `/api/embeddings`) +- `docker-health-endpoint.service` — `:18809`, read-only container health for n8n +- `obsidian-reindex-endpoint.service` — `:18810`, Obsidian/RAG reindex trigger; default collection `obsidian_bge_npu` using OpenVINO NPU embeddings +- `url-content-extractor.service` — `:18812`, YouTube/PDF/web extraction +- `voice-memo-processor.service` — `:18813`, voice memo processing +- `rag-embedding-health.service` — `:18814`, RAG/embedding health wrapper + +Approved but not live-routed OpenVINO NPU sidecars: + +| Port | Component | State | Safety boundary | +| ---: | --- | --- | --- | +| `18818` | reranker | approved prototype; optional foreground/user-systemd only | request-time only; no Chroma/vector mutation; no live RAG integration unless Will approves | +| `18819` | router/classifier | approved prototype; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages | +| `18820` | bounded GenAI worker | approved prototype | background jobs only; not primary Atlas/Hermes model routing | +| `18829` | document/image triage | CLI-first; optional localhost server | synthetic/non-private smoke data only; no private directory processing; NPU stage is embeddings via `:18817` | + +These sidecars must bind to `127.0.0.1` by default, must not be enabled persistently or 
wired into live Atlas/Hermes/RAG paths without explicit Will approval, and any NPU claim requires a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before/after inference. HTTP 200 alone is not proof. + +### 5. Obsidian and RAG + +Vault: + +- `/home/will/lab/swarm/swarm-common/obsidian-vault/will/will-shared-zap` + +Local REST API: + +- HTTP: `127.0.0.1:27123` +- HTTPS: `127.0.0.1:27124` + +RAG/vector store: + +- ChromaDB path: `~/.hermes/data/rag-search/chroma/` +- Reindex state/progress: active BGE/NPU state in `~/.hermes/data/rag-search/obsidian_bge_npu_index_state.json` and `obsidian_bge_npu_reindex_progress.json`; legacy Ollama state in `obsidian_index_state.json` remains for comparison/fallback. +- Active RAG query/reindex embedding backend: OpenVINO NPU embeddings service on `:18817`, currently `bge-base-en-v1.5-int8-ov`, collection `obsidian_bge_npu`. +- Legacy comparison/fallback collection: `obsidian`, built with Ollama on `:18807` using `nomic-embed-text`. +- Reindex endpoint: `POST :18810/reindex` for incremental updates, `POST :18810/reindex?full=true` for full semantic rebuilds, `GET :18810/semantic-health` to verify vectors plus a search smoke test. + +## Monitoring model + +The monitoring design is intentionally layered: + +1. **n8n direct probes** check critical service endpoints and send deduped alerts. +2. **agentmon** continuously observes labeled Docker services and OpenClaw state, then writes snapshots through NATS/Postgres. +3. **n8n Agentmon Health Watchdog** polls agentmon's aggregate state and alerts if the monitoring pipeline itself becomes stale/degraded. +4. **Hermes/Atlas** can inspect both n8n and agentmon when troubleshooting, and can use the same endpoints as part of operational checks. + +This means a single process being alive is not enough: the important signal is whether collection, ingestion, processing, storage, query, and alerting are all functioning. 
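The layered idea above (liveness alone is not enough, freshness has to be checked too) can be sketched as a small pure function. This is only an illustration, not the watchdog's actual logic: the helper name `monitoring_state`, its parameters, and the returned strings are hypothetical, and the 180-second default is an assumption chosen to mirror the watchdog's 3-minute staleness rule.

```python
from typing import Mapping, Optional

# Assumption: 180 s mirrors the Agentmon Health Watchdog's 3-minute staleness rule.
MAX_SNAPSHOT_AGE_S = 180


def monitoring_state(healthz_ok: Mapping[str, bool],
                     snapshot_age_s: Optional[float],
                     max_age_s: float = MAX_SNAPSHOT_AGE_S) -> str:
    """Combine process liveness and snapshot freshness (hypothetical helper).

    healthz_ok maps a service name to whether its /healthz probe succeeded.
    snapshot_age_s is the age of the latest swarm.snapshot in seconds,
    or None when no snapshot was returned at all.
    """
    # Layer 1: any failed liveness probe is reported first.
    failed = sorted(name for name, ok in healthz_ok.items() if not ok)
    if failed:
        return "degraded: healthz failed for " + ", ".join(failed)
    # Layer 2: live processes with no data still mean a broken pipeline.
    if snapshot_age_s is None:
        return "degraded: no swarm.snapshot found"
    # Layer 3: data exists but has stopped flowing.
    if snapshot_age_s > max_age_s:
        return f"stale: latest snapshot is {snapshot_age_s:.0f}s old"
    return "healthy"
```

Keeping the verdict separate from the HTTP calls means the same rule could be exercised with canned inputs before being wired to real probes.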
+ +## Agentmon Health Watchdog + +Workflow source: + +- `swarm-common/n8n-workflows/agentmon-health-watchdog.json` + +Installed n8n workflow: + +- Name: `Agentmon Health Watchdog` +- ID: `AgentmonHealthWatchdog` +- Schedule: every 5 minutes + +Alert conditions: + +- `agentmon-ingest`, `agentmon-query`, or `agentmon-ui` `/healthz` fails. +- Latest `swarm.snapshot` is missing. +- Latest `swarm.snapshot` is older than 3 minutes. +- Snapshot issues are non-empty. +- Required agentmon services are missing or not healthy/running: + - `agentmon-ingest` + - `agentmon-query` + - `agentmon-ui` + - `agentmon-processor` + - `agentmon-swarm-monitor` + - `agentmon-db` + - `agentmon-nats` + +Deduplication: + +- Alert after 2 failed checks. +- Reminder every 6 failed runs. +- Recovery message when state returns healthy. + +## Operational quick checks + +From the host: + +```bash +cd /home/will/lab/swarm +make status +make local-ai-health +./scripts/npu-service-health.sh # read-only; includes sysfs busy-time proof for :18817 +curl -fsS http://127.0.0.1:18808/healthz +curl -fsS http://127.0.0.1:8081/healthz +curl -fsS 'http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1' | jq . +``` + +From inside `n8n-agent`: + +```bash +docker exec n8n-agent /bin/sh -lc ' + wget -qO- -T 5 http://172.19.0.1:8081/healthz + wget -qO- -T 5 "http://172.19.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1" | head -c 500 +' +``` + +Verify n8n workflow activation: + +```bash +docker exec -u node n8n-agent n8n export:workflow \ + --id=AgentmonHealthWatchdog \ + --output=/tmp/agentmon-export.json + +docker cp n8n-agent:/tmp/agentmon-export.json /tmp/agentmon-export.json +jq '.[0] | {id,name,active,nodes:(.nodes|length)}' /tmp/agentmon-export.json +``` + +## Notes and pitfalls + +- Do not commit `.env`, decrypted credentials, raw credential exports, or runtime DB files. 
+- n8n workflow backups can contain sensitive operational data; keep timestamped raw backups untracked unless intentionally sanitized. +- From host, use `127.0.0.1:`. +- From `n8n-agent`, use `127.0.0.1:5678` for n8n itself and `172.19.0.1:` for host-published swarm services. +- Agentmon `/healthz` only proves the web/API process is alive; pair it with snapshot freshness to prove the monitoring pipeline is flowing. +- OpenClaw is intentionally dormant unless explicitly re-enabled; do not alert on VMs being shut off by default. +- OpenVINO NPU sidecars on `:18818`, `:18819`, `:18820`, and optional `:18829` are prototypes/not-live unless a later approved change installs and routes them. Do not draw live Atlas/Hermes/RAG arrows to them in diagrams until that approval and implementation actually exist. diff --git a/scripts/npu-service-health.sh b/scripts/npu-service-health.sh new file mode 100755 index 0000000..23849f5 --- /dev/null +++ b/scripts/npu-service-health.sh @@ -0,0 +1,115 @@ +#!/usr/bin/env bash +set -euo pipefail + +# Read-only health probe for Will's local OpenVINO/NPU services. +# This script intentionally does not start, stop, restart, enable, reindex, or route anything. + +BUSY_PATH=${BUSY_PATH:-/sys/class/accel/accel0/device/npu_busy_time_us} +CURL_TIMEOUT=${CURL_TIMEOUT:-8} +EMBED_MODEL=${EMBED_MODEL:-bge-base-en-v1.5-int8-ov} +EMBED_URL=${EMBED_URL:-http://127.0.0.1:18817/v1/embeddings} + +have() { command -v "$1" >/dev/null 2>&1; } + +json_pretty() { + if have jq; then + jq . + else + python -m json.tool + fi +} + +section() { + printf '\n== %s ==\n' "$1" +} + +http_json() { + local name=$1 url=$2 + printf '\n[%s] %s\n' "$name" "$url" + if ! 
curl -fsS --max-time "$CURL_TIMEOUT" "$url" | json_pretty; then + printf 'status=unavailable_or_non_json\n' + return 1 + fi +} + +busy_value() { + if [[ -r "$BUSY_PATH" ]]; then + tr -d '\n' < "$BUSY_PATH" + else + printf 'missing' + fi +} + +section "NPU counter" +printf 'busy_path=%s\n' "$BUSY_PATH" +printf 'busy_time_us=%s\n' "$(busy_value)" + +section "Listeners" +# Required OpenVINO/NPU program ports: live baseline 18810/18816/18817, +# approved prototypes 18818/18819/18820, and optional doc/image triage 18829. +# 18814 is the existing RAG/embedding health wrapper; 18828 is a review-only +# alternate used to avoid collisions during prior smoke tests. +ss -ltnp | grep -E ':(18810|18814|18816|18817|18818|18819|18820|18828|18829)\b' || true + +section "User service states" +for unit in \ + openvino-embeddings.service \ + rag-embedding-health.service \ + openvino-reranker.service \ + openvino-router-classifier.service \ + openvino-genai-npu-worker.service; do + active=$(systemctl --user is-active "$unit" 2>/dev/null || true) + enabled=$(systemctl --user is-enabled "$unit" 2>/dev/null || true) + printf '%-38s active=%-10s enabled=%s\n' "$unit" "${active:-unknown}" "${enabled:-unknown}" +done + +section "Docker service states" +if [[ -d /home/will/lab/swarm ]]; then + (cd /home/will/lab/swarm && docker compose ps whisper-server-npu 2>/dev/null) || true +fi + +section "HTTP health" +http_json "RAG endpoint" "http://127.0.0.1:18810/healthz" || true +http_json "RAG/embedding health wrapper" "http://127.0.0.1:18814/healthz" || true +http_json "Whisper NPU" "http://127.0.0.1:18816/health" || true +http_json "OpenVINO embeddings" "http://127.0.0.1:18817/healthz" || true +# Prototypes are expected to be unavailable until explicitly started/approved. 
+http_json "NPU reranker prototype" "http://127.0.0.1:18818/readyz" || true +http_json "NPU router classifier prototype" "http://127.0.0.1:18819/healthz" || true +http_json "NPU GenAI worker prototype" "http://127.0.0.1:18820/healthz" || true +http_json "NPU doc/image triage prototype" "http://127.0.0.1:18829/healthz" || true + +section "Embeddings NPU busy-time proof" +if [[ ! -r "$BUSY_PATH" ]]; then + printf 'result=failed reason=missing_busy_counter\n' + exit 2 +fi +before=$(busy_value) +response=$(curl -fsS --max-time "$CURL_TIMEOUT" \ + "$EMBED_URL" \ + -H 'Content-Type: application/json' \ + -d "{\"input\":\"non-private npu health probe\",\"model\":\"$EMBED_MODEL\"}" || true) +after=$(busy_value) +if [[ -z "$response" ]]; then + printf 'result=failed reason=embedding_request_failed before_us=%s after_us=%s\n' "$before" "$after" + exit 3 +fi +delta=$((after - before)) +printf 'sysfs_before_us=%s\nsysfs_after_us=%s\nsysfs_delta_us=%s\n' "$before" "$after" "$delta" +RESPONSE_JSON="$response" python - <<'PY' || true +import json, os +try: + data = json.loads(os.environ.get('RESPONSE_JSON', '')) +except Exception as exc: + print(f'response_parse_error={type(exc).__name__}: {exc}') + raise SystemExit(0) +print(f"response_object={data.get('object')}") +print(f"response_model={data.get('model')}") +print(f"response_npu_busy_delta_us={data.get('npu_busy_delta_us')}") +print(f"embedding_count={len(data.get('data', []))}") +PY +if (( delta <= 0 )); then + printf 'result=failed reason=no_positive_sysfs_npu_delta\n' + exit 4 +fi +printf 'result=ok\n' diff --git a/swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md b/swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md new file mode 100644 index 0000000..8711888 --- /dev/null +++ b/swarm-common/obsidian-vault/will/will-shared-zap/Resources/Service Catalog.md @@ -0,0 +1,309 @@ +--- +type: service-catalog +created: 2026-05-14T14:50:46-07:00 +updated: 
2026-06-04T11:35:00-07:00 +tags: + - service-catalog + - swarm + - hermes + - automation +--- + +# Service Catalog + +Canonical index of local services, automation tools, Hermes capabilities, and where to find their operational docs. + +> Generated by Atlas from live system inventory on `2026-05-14T14:50:46-07:00`; high-risk local AI/service rows refreshed on `2026-05-27T12:12:06-07:00`; Obsidian/RAG embedding path refreshed on `2026-06-03T21:31:01-07:00`. Secrets are intentionally omitted. + +## Quick links + +- [[Ops Home]] +- [[Obsidian Automation Health]] +- [[Obsidian Plugin Setup]] +- [[Runbooks Home]] +- [[Projects Home]] +- [[Decisions Home]] + +## Primary repositories and config locations + +| Area | Path / command | Purpose | +| --- | --- | --- | +| Swarm repo | `~/lab/swarm` | Docker services, n8n, local AI, OpenClaw helpers, service scripts | +| Swarm Makefile | `cd ~/lab/swarm && make help` | Authoritative operations target list | +| n8n workflow exports | `~/lab/swarm/swarm-common/n8n-workflows/` | Versioned workflow backups | +| Shared Obsidian vault | `~/lab/swarm/swarm-common/obsidian-vault/will/will-shared-zap` | Active API-backed vault | +| Hermes config | `~/.hermes/config.yaml` | Atlas/Hermes model, tools, gateway, profiles | +| Hermes env/secrets | `~/.hermes/.env` | Secrets; do not print or commit | +| Hermes source | `~/.hermes/hermes-agent` | Atlas local source checkout | +| Hermes skills | `~/.hermes/skills/` | Procedural docs and reusable playbooks | + +## Local endpoints + +| Service | Port | Status | Purpose | Health / base URL | +| --- | --- | --- | --- | --- | +| Brave Search MCP | 18802 | HTTP 406 on plain GET `/mcp` | Brave Search MCP server for Hermes MCP tools | `http://127.0.0.1:18802/mcp` | +| SearXNG | 18803 | OK 200 | SearXNG metasearch | `http://127.0.0.1:18803/search?q=test&format=json` | +| LiteLLM | 18804 | no listener / HTTP 000 on 2026-05-27 | LiteLLM OpenAI-compatible model proxy | 
`http://127.0.0.1:18804/health/liveliness` | +| Kokoro TTS | 18805 | OK 200 | Kokoro local TTS | `http://127.0.0.1:18805/health` | +| llama.cpp | 18806 | OK 200 | llama.cpp local LLM | `http://127.0.0.1:18806/v1/models` | +| Ollama embeddings | 18807 | OK 200 | Ollama embeddings API | `http://127.0.0.1:18807/api/version` | +| n8n | 18808 | OK 200 | n8n workflow automation | `http://127.0.0.1:18808/healthz` | +| Docker health | 18809 | OK 200 | Docker/container health API | `http://127.0.0.1:18809/health` | +| Obsidian reindex | 18810 | OK 200 | Obsidian/RAG reindex trigger | `http://127.0.0.1:18810/healthz` | +| Whisper CPU | 18811 | OK 200 | Whisper.cpp CPU STT fallback | `http://127.0.0.1:18811/` | +| URL extractor | 18812 | OK 200 | URL/PDF/YouTube content extractor | `http://127.0.0.1:18812/healthz` | +| Voice memo processor | 18813 | OK 200 | Voice memo processor | `http://127.0.0.1:18813/healthz` | +| RAG/embedding health | 18814 | OK 200 | RAG/OpenVINO/Obsidian health wrapper | `http://127.0.0.1:18814/healthz` | +| Whisper OpenVINO NPU | 18816 | OK 200 / Docker healthy on 2026-06-04 | Intel NPU Whisper transcription service | `http://127.0.0.1:18816/health` | +| OpenVINO embeddings | 18817 | OK 200 | Intel NPU embeddings service for live Obsidian RAG | `http://127.0.0.1:18817/health` | +| OpenVINO NPU reranker prototype | 18818 | approved prototype; not enabled live | Optional second-stage RAG reranker | `http://127.0.0.1:18818/readyz` | +| OpenVINO router/classifier prototype | 18819 | approved prototype; not enabled live | Dry-run Atlas/Hermes message classifier/router | `http://127.0.0.1:18819/healthz` | +| OpenVINO GenAI NPU worker prototype | 18820 | approved prototype; not enabled live | Bounded local background generation worker | `http://127.0.0.1:18820/healthz` | +| OpenVINO document/image triage prototype | 18828/18829 | approved foreground prototype; not enabled live | Local document/image triage with NPU embeddings stage via `:18817` | 
`http://127.0.0.1:/healthz` | +| Obsidian REST HTTP | 27123 | OK 200 | Obsidian Local REST API HTTP | `http://127.0.0.1:27123/` | + +## Docker services + +| Container | Status | Ports | +| --- | --- | --- | +| n8n-agent | Up 21 hours (healthy) | 0.0.0.0:18808->5678/tcp, [::]:18808->5678/tcp | +| whisper-server-gpu | Up 27 hours (healthy) | 0.0.0.0:18801->8080/tcp, [::]:18801->8080/tcp | +| whisper-server | Up 27 hours (healthy) | 0.0.0.0:18811->8080/tcp, [::]:18811->8080/tcp | +| kokoro-tts | Up 25 hours | 0.0.0.0:18805->8880/tcp, [::]:18805->8880/tcp | +| brave-search | Up 25 hours | 0.0.0.0:18802->8000/tcp, [::]:18802->8000/tcp | +| searxng | Up 25 hours | 0.0.0.0:18803->8080/tcp, [::]:18803->8080/tcp | + +Management commands: + +```bash +cd ~/lab/swarm +make ps +make status +make local-ai-health +make api-health +make timers +./scripts/npu-service-health.sh +``` + +## Host-side systemd/user services + +Important known services: + +| Unit | Purpose | +| --- | --- | +| `llama-server.service` | Host-side llama.cpp local LLM on 18806 | +| `ollama.service` | Host-side Ollama embeddings on 18807 | +| `docker-health-endpoint.service` | Container health API on 18809 | +| `obsidian-reindex-endpoint.service` | Obsidian/RAG reindex endpoint on 18810 | +| `url-content-extractor.service` | URL/PDF/YouTube extraction on 18812 | +| `voice-memo-processor.service` | Voice memo processing on 18813 | +| `rag-embedding-health.service` | RAG/OpenVINO/Obsidian health check wrapper on 18814 | +| `openvino-embeddings.service` | Intel NPU BGE embedding service on 18817 | +| `openvino-reranker.service` | Optional NPU reranker prototype on 18818; not installed/enabled without approval | +| `openvino-router-classifier.service` | Optional dry-run router/classifier prototype on 18819; not installed/enabled without approval | +| `openvino-genai-npu-worker.service` | Optional bounded GenAI worker prototype on 18820; not installed/enabled without approval | + +Useful checks: + +```bash 
+systemctl --user list-units '*obsidian*' '*rag*' '*url-content*' '*voice-memo*' '*docker-health*' --all +systemctl --user list-timers +journalctl --user -u obsidian-reindex-endpoint.service -n 50 --no-pager +``` + +## n8n workflows + +n8n UI/API: `http://127.0.0.1:18808` + +| Workflow | ID | State | +| --- | --- | --- | +| Calendar to Obsidian Notes | QRCCdHNXZUHc2Oz4 | inactive | +| Daily OpenClaw Session Digest | qqYwAD05AvRHrHPc | inactive | +| Evening Digest | PlZywwqL8MRNEAN6 | active | +| Gmail Inbox Monitor + Obsidian Notes | whtdorf7yJMVYeHm | active | +| IMAP Inbox Triage + Obsidian Notes | 9sFwRyUDz51csAp7 | active | +| IMAP Inbox Triage + Obsidian Notes (squareffect) | xjUoQf97TkBrawc8 | inactive | +| IMAP Inbox Triage + Obsidian Notes (wills-portal) | kHDK9QdUSiAJ8rCM | inactive | +| Morning Brief | g3IdGZCK1EtTsv9T | active | +| n8n Failure Digest | G9ylNbHbnJ6fWX2C | active | +| Nightly Obsidian Vault Sync | 75JCevkdgkyCr2qH | inactive | +| Obsidian Chat Summary Capture | LF3i86l3NkxpayxL | active | +| Obsidian Daily Review | YZyJ5G0Ur8D6TlM8 | active | +| Obsidian Health + Reindex | PCtD3PuQjzKLyEEE | active | +| Obsidian Inbox Triage | 6SKSZWZwuJNwuO2P | active | +| Obsidian URL to Note | Ori3Bu5u5ODtxxyD | active | +| Obsidian Vault Reindex | 85ntyyphDJ4Ms2b4 | active | +| Obsidian Weekly Decision Runbook Extractor | UWLMOQQVxbTX6Sis | active | +| OpenClaw Action Bus | Jwi54VWMdlLqYnRo | inactive | +| OpenClaw Reminder Webhook | RUR1CGn0ikkxbPin | inactive | +| RAG and Embedding Health Watchdog | SwKaPtYqUJrakpFu | active | +| Swarm Health Watchdog | lDKocSFXBQWQrDd3 | active | +| Voice Memo Capture (Audio URL + Local Whisper) | El1BHJZ56JlzhrRZ | active | +| Web-to-Notes Capture (Local LLM + Obsidian) | GSmzuA5dgGgyRg5v | active | + +Obsidian webhook endpoints: + +| Workflow | Method / URL | Input | +| --- | --- | --- | +| Obsidian Chat Summary Capture | `POST http://127.0.0.1:18808/webhook/obsidian-chat-summary` | JSON with `type`, `title`, 
`summary`, `content`, optional `tags`, `metadata` | +| Obsidian URL to Note | `POST http://127.0.0.1:18808/webhook/obsidian-url-to-note` | JSON with `url`, optional `folder`, `tags`, `notes` | + +## Hermes capabilities + +### Enabled toolsets + +| Toolset | Description | +| --- | --- | +| web | 🔍 Web Search & Scraping | +| browser | 🌐 Browser Automation | +| terminal | 💻 Terminal & Processes | +| file | 📁 File Operations | +| code_execution | ⚡ Code Execution | +| vision | 👁️ Vision / Image Analysis | +| image_gen | 🎨 Image Generation | +| tts | 🔊 Text-to-Speech | +| skills | 📚 Skills | +| todo | 📋 Task Planning | +| memory | 💾 Memory | +| session_search | 🔎 Session Search | +| clarify | ❓ Clarifying Questions | +| delegation | 👥 Task Delegation | +| cronjob | ⏰ Cron Jobs | +| messaging | 📨 Cross-Platform Messaging | + +### Disabled toolsets + +| Toolset | Description | +| --- | --- | +| video | 🎬 Video Analysis | +| video_gen | 🎬 Video Generation | +| moa | 🧠 Mixture of Agents | +| rag_search | 🧠 RAG Search | +| rl | 🧪 RL Training | +| homeassistant | 🏠 Home Assistant | +| spotify | 🎵 Spotify | +| yuanbao | 🤖 Yuanbao | +| computer_use | 🖱️ Computer Use (macOS) | + +### MCP servers + +```text +MCP Servers: + + Name Transport Tools Status + ──────────────── ────────────────────────────── ──────────── ────────── + brave-search http://127.0.0.1:18802/mcp all ✓ enabled +``` + +### Hermes profiles + +```text +Profile Model Gateway Alias Distribution + ─────────────── ─────────────────────────── ─────────── ─────────── ──────────────────── + ◆default gpt-5.5 running — — + atlas gpt-5.5 stopped — — + engineer gpt-5.5 stopped — — + glm-simple glm-5.1 stopped — — + ops gpt-5.5 stopped — — + orchestrator gpt-5.5 stopped — — + researcher gpt-5.5 stopped — — + reviewer gpt-5.5 stopped — — + writer gpt-5.5 stopped — — +``` + +### Hermes cron jobs + +```text + +┌─────────────────────────────────────────────────────────────────────────┐ +│ Scheduled Jobs │ 
+└─────────────────────────────────────────────────────────────────────────┘ + + c515ca076b73 [active] + Name: Hermes config git snapshot + Schedule: 0 3 * * * + Repeat: ∞ + Next run: 2026-05-15T03:00:00-07:00 + Deliver: discord:1494453542243532932 + Script: hermes_git_snapshot.sh + Mode: no-agent (script stdout delivered directly) + Last run: 2026-05-11T03:00:37.525856-07:00 ok + + c15ee395a38d [active] + Name: atlas-minio-self-backup + Schedule: 0 3 * * * + Repeat: ∞ + Next run: 2026-05-15T03:00:00-07:00 + Deliver: origin + Script: atlas-backup-to-minio-cron.sh + Mode: no-agent (script stdout delivered directly) + + 1ef682e65695 [active] + Name: watch pi-agent-hermes-bound kanban + Schedule: every 2m + Repeat: ∞ + Next run: 2026-05-14T14:49:39.352638-07:00 + Deliver: local + Script: watch_pi_agent_kanban.py + Mode: no-agent (script stdout delivered directly) + Last run: 2026-05-14T14:47:39.352638-07:00 ok + +``` + +## Local AI and automation routing + +| Capability | Preferred endpoint/tool | Notes | +| --- | --- | --- | +| Web search | SearXNG `18803` or Brave MCP `18802` | Hermes web search and MCP Brave Search are both available | +| Model proxy | LiteLLM `18804` | Use for OpenAI-compatible routed models | +| Direct local LLM | llama.cpp `18806` | Current model id: `gemma-4-26B-A4B-it-UD-IQ2_M.gguf`; useful for n8n/local automation | +| Embeddings | OpenVINO NPU `18817`; Ollama `18807` fallback | Live RAG uses `bge-base-en-v1.5-int8-ov` via OpenVINO and collection `obsidian_bge_npu`; Ollama remains a legacy/CPU fallback | +| Text-to-speech | Kokoro `18805` / Hermes TTS tool | Local speech generation | +| Speech-to-text | Whisper OpenVINO NPU `18816`; Whisper CPU `18811` fallback | NPU service is the live default; CPU remains fallback | +| Workflow automation | n8n `18808` | Durable jobs and webhooks | +| Knowledge store | Obsidian REST `27123`; RAG/Chroma local store | Obsidian notes plus Hermes rag-search index | + +## Obsidian integration + +| Component | 
Location / endpoint | Purpose | +| --- | --- | --- | +| Local REST API | `http://127.0.0.1:27123` and `https://127.0.0.1:27124` | Read/write notes and execute commands | +| Autostart entry | `~/.config/autostart/obsidian-autostart.desktop` | Launches Obsidian at graphical login | +| Autostart script | `~/.local/bin/start-obsidian-if-needed` | Idempotent launcher for Obsidian | +| Reindex endpoint | `http://127.0.0.1:18810/reindex` | Rebuilds/updates local Obsidian/RAG index | +| Dataview plugin | Vault `.obsidian/plugins/dataview` | Dashboard tables | +| Tasks plugin | Vault `.obsidian/plugins/obsidian-tasks-plugin` | Dashboard task queries | + +## Source-of-truth docs + +| Topic | Where | +| --- | --- | +| Swarm operations | Hermes skill `swarm`; `~/lab/swarm/Makefile` | +| n8n API/workflow management | Hermes skill `swarm`, reference `n8n-api-and-workflows.md` | +| Obsidian filesystem/API usage | Hermes skill `obsidian` | +| Hermes CLI/toolsets/gateway/profiles | Hermes skill `hermes-agent`; `hermes --help`; `hermes tools list` | +| Obsidian automation workflows | `~/lab/swarm/swarm-common/n8n-workflows/obsidian-*.json` | +| Runbooks | [[Runbooks Home]] | +| OpenVINO NPU service operations | [[OpenVINO NPU Services Runbook]]; `~/lab/swarm/scripts/npu-service-health.sh` | + +## Safety notes + +- Do not print `.env`, API keys, tokens, auth JSON, or decrypted n8n credentials. +- From inside the `n8n-agent` container, host services are reached via `http://172.19.0.1:`, not `127.0.0.1:`. +- Use file-based workflow updates for large n8n JSON payloads. +- After structural n8n workflow edits, deactivate/reactivate the workflow. +- Prefer `make` targets in `~/lab/swarm` for routine service operations. +- OpenVINO NPU prototype sidecars `:18818`, `:18819`, `:18820`, and optional `:18829` are approved prototypes only; do not enable persistent services, live Atlas/Hermes/RAG routing, vector DB mutation, or private document/image processing without explicit approval. 
Verify NPU usage with `/sys/class/accel/accel0/device/npu_busy_time_us`; HTTP 200 alone is not proof. +- Check git status before committing; commit only targeted non-secret source/config/docs. + +## Refresh procedure + +To refresh this catalog: + +```bash +cd ~/lab/swarm +make status +hermes tools list +hermes mcp list +# Ask Atlas: "refresh the Obsidian Service Catalog" +``` diff --git a/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md b/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md new file mode 100644 index 0000000..98a7607 --- /dev/null +++ b/swarm-common/obsidian-vault/will/will-shared-zap/Runbooks/OpenVINO NPU Services Runbook.md @@ -0,0 +1,286 @@ +--- +type: runbook +system: openvino-npu-services +status: draft +created: 2026-06-04 +updated: 2026-06-04 +tags: + - runbook + - openvino + - npu + - swarm + - atlas +related: + - [[Service Catalog]] + - [[Swarm Operating Manual]] + - [[Atlas Capability Upgrade Program]] +--- + +# OpenVINO NPU Services Runbook + +This runbook is the integrated operations view for Will's local Intel NPU/OpenVINO services from the `npu-capability-expansion` board. + +Safety posture: +- Do not restart the live Atlas/Hermes gateway from this runbook. +- Do not change primary Atlas/Hermes routing without explicit Will approval. +- Do not delete, overwrite, or in-place reindex existing Chroma/vector collections. +- Treat HTTP 200 as necessary but not sufficient for NPU-backed services; verify `/sys/class/accel/accel0/device/npu_busy_time_us` before/after an inference. +- Keep endpoints local-only unless Will explicitly approves broader exposure. +- Keep raw prompts, private documents, OCR text, and secrets out of logs and durable handoffs. 
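The local-only rule in the safety posture can be spot-checked from listener output. The following is a minimal sketch, not part of the swarm repo: `is_loopback_addr` and `non_loopback_listeners` are hypothetical helper names, and the input is assumed to be the output of `ss -ltnH`, where the fourth column is the local address and port.

```bash
# Sketch only: helper names are hypothetical, not existing swarm scripts.

# Succeeds when a "Local Address:Port" value is bound to IPv4 or IPv6 loopback.
is_loopback_addr() {
  case "$1" in
    127.*:*|"[::1]":*) return 0 ;;
    *) return 1 ;;
  esac
}

# Reads `ss -ltnH` lines on stdin and prints the local address of every
# listener on the given ports (an ERE alternation) that is not loopback-only.
non_loopback_listeners() {
  local ports_re="$1" state recvq sendq local_addr rest
  while read -r state recvq sendq local_addr rest; do
    [[ "$local_addr" =~ :($ports_re)$ ]] || continue
    is_loopback_addr "$local_addr" || echo "$local_addr"
  done
}
```

A possible invocation is `ss -ltnH | non_loopback_listeners '18818|18819|18820|18829'`; empty output would mean no prototype port is exposed beyond loopback.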
+ +## Current service map + +| Capability | Port | Runtime / service | Path | State | Health endpoint | NPU proof | +| --- | ---: | --- | --- | --- | --- | --- | +| Obsidian/RAG endpoint | 18810 | `obsidian-reindex-endpoint.service` / local Python endpoint | `~/lab/swarm/scripts/` | live baseline; uses collection `obsidian_bge_npu` | `http://127.0.0.1:18810/healthz` | indirect via embeddings `:18817`; do not mutate existing collection | +| RAG/embedding health wrapper | 18814 | `rag-embedding-health.service` | `~/lab/swarm/swarm-common/rag-embedding-health.service` | live baseline | `http://127.0.0.1:18814/healthz` | should exercise embeddings path when configured | +| Whisper transcription, OpenVINO NPU | 18816 | Docker Compose service/container `whisper-server-npu` | `~/lab/swarm/whisper-openvino-npu/` | live baseline | `http://127.0.0.1:18816/health` | transcription response includes `npu_busy_delta_us`; sysfs delta must increase | +| OpenVINO embeddings | 18817 | user systemd `openvino-embeddings.service` | `~/lab/swarm/scripts/openvino-embeddings-server.py`; unit in `~/lab/swarm/swarm-common/openvino-embeddings.service` | live baseline, enabled | `http://127.0.0.1:18817/healthz` | embedding response and sysfs delta must be positive | +| NPU reranker prototype | 18818 | optional user systemd `openvino-reranker.service` | `~/lab/swarm/openvino-reranker-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18818/readyz` | `/readyz` reports `device=NPU`; `/v1/rerank` response and sysfs delta must be positive | +| NPU router/classifier prototype | 18819 | optional user systemd `openvino-router-classifier.service` | `~/lab/swarm/openvino-classifier-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18819/healthz` | `/v1/classify` response has positive `npu_busy_delta_us` and `sysfs_npu_busy_delta_us` | +| Small OpenVINO GenAI NPU worker | 18820 | optional user systemd `openvino-genai-npu-worker.service` | 
`~/lab/swarm/openvino-genai-npu-worker/` | approved prototype; not installed/enabled | `http://127.0.0.1:18820/healthz`; `GET /models` | generation response includes positive `npu_busy_delta_us` | +| Document/image triage prototype | optional 18829 for review only; 18828 was an earlier smoke alternate | CLI-first; foreground local-only server if needed; no persistent unit yet | `~/lab/swarm/openvino-doc-image-triage-npu/` | approved prototype; not installed/enabled | `http://127.0.0.1:18829/healthz`; `GET /models` | v1 NPU stage is semantic embedding through `:18817`; image classification/OCR remain CPU/local | + +Port notes: +- `18818`, `18819`, and `18820` are reserved prototype ports from the program plan; check listeners before binding. +- `18820` is reserved for the GenAI worker prototype. Use optional `18829` for document/image triage foreground review until Will approves a final persistent port. `18828` was used in earlier review smoke only and should not be treated as the preferred documented port. +- Existing `:18817` is currently bound on `0.0.0.0` by the user service; prototype services should still default to `127.0.0.1`. + +## Read-only unified health check + +From the swarm repo: + +```bash +cd ~/lab/swarm +./scripts/npu-service-health.sh +``` + +The script is read-only. It checks listeners for `18810`, `18816`, `18817`, `18818`, `18819`, `18820`, `18829` plus the existing `18814` wrapper and `18828` review alternate, user service state, Docker Compose state for `whisper-server-npu`, JSON health endpoints, and performs a non-private embeddings request while measuring `/sys/class/accel/accel0/device/npu_busy_time_us` before and after. A positive sysfs delta is required for the embeddings proof. 
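The before/after busy-time measurement described above can be wrapped around any probe command. This is a minimal sketch under stated assumptions: `npu_delta_ok` and `with_npu_proof` are hypothetical names, not functions from `npu-service-health.sh`, and `NPU_BUSY_FILE` is an illustrative override so the logic can be exercised against a fixture file.

```bash
# Sketch only: hypothetical helpers, not taken from npu-service-health.sh.
# NPU_BUSY_FILE may be pointed at a fixture file when no NPU is present.
NPU_BUSY_FILE="${NPU_BUSY_FILE:-/sys/class/accel/accel0/device/npu_busy_time_us}"

# Succeeds only when both readings are non-negative integers and after > before.
npu_delta_ok() {
  local before="$1" after="$2"
  [[ "$before" =~ ^[0-9]+$ && "$after" =~ ^[0-9]+$ ]] || return 2
  (( after > before ))
}

# Runs the given command, prints the sysfs delta to stderr, and fails if the
# command failed or the busy-time counter did not increase.
with_npu_proof() {
  local before after rc=0
  before=$(cat "$NPU_BUSY_FILE") || return 2
  "$@" || rc=$?
  after=$(cat "$NPU_BUSY_FILE") || return 2
  echo "sysfs_npu_busy_delta_us=$((after - before))" >&2
  (( rc == 0 )) && npu_delta_ok "$before" "$after"
}
```

Wrapping the embeddings probe as `with_npu_proof curl -fsS ...` would then fail on an HTTP error or on a zero delta, which matches the rule that HTTP 200 alone is not proof.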
+ +Manual minimal checks: + +```bash +BUSY=/sys/class/accel/accel0/device/npu_busy_time_us +cat "$BUSY" +ss -ltnp | grep -E ':(18810|18816|18817|18818|18819|18820|18829)\b' || true +systemctl --user is-active openvino-embeddings.service rag-embedding-health.service +cd ~/lab/swarm && docker compose ps whisper-server-npu +curl -fsS http://127.0.0.1:18817/healthz | jq . +``` + +Embedding NPU proof: + +```bash +BUSY=/sys/class/accel/accel0/device/npu_busy_time_us +before=$(cat "$BUSY") +curl -fsS http://127.0.0.1:18817/v1/embeddings \ + -H 'Content-Type: application/json' \ + -d '{"input":"non-private npu health probe","model":"bge-base-en-v1.5-int8-ov"}' | jq '{model, object, npu_busy_delta_us, embedding_count:(.data|length)}' +after=$(cat "$BUSY") +echo "sysfs_npu_busy_delta_us=$((after-before))" +``` + +A healthy NPU path has: +- HTTP success from the endpoint. +- Response-level `npu_busy_delta_us > 0` when the service reports it. +- Sysfs `after - before > 0`. + +## Service-specific smoke checks + +For any foreground prototype server below, run it in a terminal you control or capture its PID and stop it at the end of the smoke. Do not use `systemctl --user enable`, Docker Compose `up -d`, `nohup`, or shell disowning for these review smokes unless Will has explicitly approved persistent service enablement. + +Safe foreground-server pattern: + +```bash +server_pid="" +cleanup() { + if [[ -n "$server_pid" ]] && kill -0 "$server_pid" 2>/dev/null; then + kill "$server_pid" + wait "$server_pid" 2>/dev/null || true + fi +} +trap cleanup EXIT +# start prototype server with --host 127.0.0.1 --port PORT & +# server_pid=$! +# run curl/smoke commands, then let trap stop it +``` + +### Whisper NPU (`:18816`) + +```bash +curl -fsS http://127.0.0.1:18816/health | jq . +# For a real transcription smoke, use a small non-private WAV fixture only. +# Verify both response npu_busy_delta_us and sysfs busy-time delta.
+``` + +Operational notes: +- Managed as Docker Compose service/container `whisper-server-npu` in `~/lab/swarm`. +- Consistent with existing swarm service patterns because it is a containerized service with Compose health. +- Do not restart it from this runbook unless Will asked for remediation. + +### OpenVINO embeddings (`:18817`) + +```bash +systemctl --user status openvino-embeddings.service --no-pager +curl -fsS http://127.0.0.1:18817/healthz | jq . +``` + +Operational notes: +- User systemd unit: `openvino-embeddings.service`. +- Model: `bge-base-en-v1.5-int8-ov`. +- Model directory: `/home/will/.cache/openvino-models/bge-base-en-v1.5-int8-ov`. +- Live RAG `:18810` uses Chroma collection `obsidian_bge_npu` through this service. Do not reindex or replace this collection in place. + +### Reranker prototype (`:18818`) + +Foreground review start only, after confirming port is free: + +```bash +ss -ltnp | grep ':18818\b' || true +cd ~/lab/swarm/openvino-reranker-npu +source /home/will/.venvs/openvino-reranker/bin/activate +OPENVINO_RERANKER_HOST=127.0.0.1 \ +OPENVINO_RERANKER_PORT=18818 \ +OPENVINO_RERANKER_DEVICE=NPU \ +OPENVINO_RERANKER_MODEL_DIR=/home/will/.cache/openvino-models/rerankers/ms-marco-MiniLM-L6-v2-int8-ov \ +python server.py +``` + +From another shell: + +```bash +curl -fsS http://127.0.0.1:18818/readyz | jq . +python ~/lab/swarm/openvino-reranker-npu/smoke.py --url http://127.0.0.1:18818 +``` + +Approval gate: +- May be installed as `openvino-reranker.service` only after foreground smoke and Will approval. +- May be integrated into RAG only behind disabled-by-default knobs such as `RAG_RERANK_ENABLED=false`; request-time reranking must not mutate Chroma. 
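A disabled-by-default knob of the kind named in the approval gate can be sketched as follows. How the live RAG service actually reads `RAG_RERANK_ENABLED` is not shown in this runbook, so `rerank_enabled` and `maybe_rerank` are hypothetical names and the rerank step is whatever command the caller supplies.

```bash
# Sketch only: hypothetical wrapper, not the RAG service's real implementation.

# Unset, empty, or anything other than the literal "true" leaves reranking off.
rerank_enabled() {
  [[ "${RAG_RERANK_ENABLED:-false}" == "true" ]]
}

# Reads candidate lines on stdin. With the knob off, candidates pass through
# unchanged; with it on, they are piped through the caller-supplied command.
# Either way only the in-flight stream is transformed; nothing is written back.
maybe_rerank() {
  if rerank_enabled; then
    "$@"
  else
    cat
  fi
}
```

Matching only the exact string `true` keeps typos and unexpected values on the safe side, so the default path stays identical to the current baseline behavior.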
+ +### Router/classifier prototype (`:18819`) + +Foreground review start only, after confirming port is free: + +```bash +ss -ltnp | grep ':18819\b' || true +cd ~/lab/swarm/openvino-classifier-npu +/home/will/.venvs/npu/bin/python router_classifier.py --host 127.0.0.1 --port 18819 +``` + +Smoke: + +```bash +curl -fsS http://127.0.0.1:18819/healthz | jq . +curl -fsS http://127.0.0.1:18819/v1/classify \ + -H 'Content-Type: application/json' \ + -d '{"id":"smoke","text":"Urgent: check whether port 18817 is listening and inspect systemd logs.","options":{"include_evidence":true,"dry_run":true}}' | jq . +``` + +Approval gate: +- May be installed as `openvino-router-classifier.service` only after Will approves live service enablement. +- Must remain dry-run and must not alter Hermes/Atlas routing, memory writes, safety confirmation flow, or outbound messages without a separate explicit approval. + +### Small GenAI NPU worker (`:18820`) + +Foreground review start only, after confirming port is free: + +```bash +ss -ltnp | grep ':18820\b' || true +cd ~/lab/swarm/openvino-genai-npu-worker +/home/will/.venvs/npu/bin/python worker.py --host 127.0.0.1 --port 18820 +``` + +Smoke: + +```bash +curl -fsS http://127.0.0.1:18820/healthz | jq . +curl -fsS http://127.0.0.1:18820/models | jq . +curl -fsS http://127.0.0.1:18820/v1/worker/condense-notification \ + -H 'Content-Type: application/json' \ + -d '{"input":"Non-private smoke notification for local NPU worker.","max_new_tokens":64}' | jq . +``` + +Approval gate: +- May be installed as `openvino-genai-npu-worker.service` only after Will approves persistent service enablement. +- Must not become primary Atlas/Hermes model routing. Use only for bounded background jobs such as title, summary, notification condensation, and memory-candidate drafting. 
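The `ss ... || true` checks in the foreground starts above only print a listener; they do not stop the start. A minimal sketch of a hard gate follows, assuming `ss -ltnH` output with the local address in the fourth column; `port_has_listener` and `require_free_port` are hypothetical names.

```bash
# Sketch only: hypothetical helpers for gating a foreground prototype start.

# Reads `ss -ltnH` lines on stdin; succeeds if any listener's local address
# ends in ":PORT" for the port given as $1.
port_has_listener() {
  local port="$1" state recvq sendq local_addr rest
  while read -r state recvq sendq local_addr rest; do
    [[ "$local_addr" == *":$port" ]] && return 0
  done
  return 1
}

# Returns non-zero (with a message on stderr) when the port is already bound.
require_free_port() {
  local port="$1"
  if ss -ltnH | port_has_listener "$port"; then
    echo "port $port already has a listener; not starting" >&2
    return 1
  fi
}
```

A foreground start could then be written as `require_free_port 18820 && /home/will/.venvs/npu/bin/python worker.py --host 127.0.0.1 --port 18820`, so a conflicting listener blocks the launch instead of merely being printed.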
+ +### Document/image triage prototype (`:18829` optional review port) + +Foreground review start only, after confirming the port is free: + +```bash +ss -ltnp | grep ':18829\b' || true +cd ~/lab/swarm/openvino-doc-image-triage-npu +/home/will/.venvs/npu/bin/python server.py --host 127.0.0.1 --port 18829 --allowed-root "$PWD" +``` + +Smoke: + +```bash +curl -fsS http://127.0.0.1:18829/healthz | jq . +curl -fsS http://127.0.0.1:18829/models | jq . +/home/will/.venvs/npu/bin/python tests/smoke_test.py +``` + +Approval gate: +- Do not point it at arbitrary directories; allowed roots must be equal to or under configured roots. +- Do not include raw OCR text or full source paths unless Will explicitly asks for a one-off response. +- v1 only uses the NPU through `:18817` embeddings for needs-attention; image category classification and OCR are CPU/local fallbacks. + +## Systemd and Compose recommendations + +Recommended management split: +- Keep containerized services in Docker Compose when they already have Docker build/runtime shape and Compose health (`whisper-server-npu`). +- Keep host-side OpenVINO Python prototypes as user systemd services when they depend on local venvs, sysfs NPU access, model caches, and localhost-only APIs (`openvino-embeddings`, optional reranker/classifier/GenAI worker). +- Do not add the prototypes to the live gateway or primary routing during installation. Installation and routing are separate approval gates. + +User-systemd unit expectations for optional prototypes: +- `WorkingDirectory` points at the service directory under `~/lab/swarm/`. +- `ExecStart` uses the existing venv path documented by the prototype. +- `Environment` pins host to `127.0.0.1`, port, model path, device `NPU`, and any upstream endpoint. +- `Restart=on-failure`, not aggressive restart loops. +- Logs go to user journal; do not log raw request bodies. +- Start manually for smoke; enable on boot only after Will approval. 
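The unit expectations above can be made concrete by rendering a candidate unit to stdout for review. This is a sketch, not an approved unit: it installs nothing, `render_reranker_unit` is a hypothetical name, the venv interpreter path is inferred from the activate script used in the reranker foreground start, and `RestartSec=10` is an assumed value chosen only to avoid a tight restart loop.

```bash
# Sketch only: prints a candidate user unit for review; it does not install,
# start, or enable anything. The interpreter path and RestartSec are assumptions.
render_reranker_unit() {
  printf '%s\n' \
    '[Unit]' \
    'Description=OpenVINO NPU reranker prototype (approval required)' \
    '' \
    '[Service]' \
    'WorkingDirectory=/home/will/lab/swarm/openvino-reranker-npu' \
    'ExecStart=/home/will/.venvs/openvino-reranker/bin/python server.py' \
    'Environment=OPENVINO_RERANKER_HOST=127.0.0.1' \
    'Environment=OPENVINO_RERANKER_PORT=18818' \
    'Environment=OPENVINO_RERANKER_DEVICE=NPU' \
    'Environment=OPENVINO_RERANKER_MODEL_DIR=/home/will/.cache/openvino-models/rerankers/ms-marco-MiniLM-L6-v2-int8-ov' \
    'Restart=on-failure' \
    'RestartSec=10' \
    '' \
    '[Install]' \
    'WantedBy=default.target'
}
```

The `[Install]` section only takes effect on `systemctl --user enable`, which remains behind the approval gate; a manually started unit ignores it.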
+ +Compose expectations for existing swarm services: +- Prefer `cd ~/lab/swarm && make ps`, `make status`, and targeted `docker compose ps SERVICE` for read-only checks. +- Do not run `docker compose up -d`, restart containers, pull images, or prune volumes from this runbook without approval. + +## Monitoring and logging notes + +Minimum recurring monitoring should include: +- Listener presence for `18816`, `18817`, and any approved optional prototype ports. +- User service state for `openvino-embeddings.service` and any approved optional prototype unit. +- Docker Compose health for `whisper-server-npu`. +- HTTP health endpoint success. +- Positive sysfs NPU busy-time delta on at least one non-private inference probe, preferably embeddings `:18817` because it is already live and central. +- Journal/container logs only at summary level. Avoid raw prompts, raw OCR text, private document names, credentials, and API keys. + +Useful log commands: + +```bash +journalctl --user -u openvino-embeddings.service -n 100 --no-pager +journalctl --user -u rag-embedding-health.service -n 100 --no-pager +journalctl --user -u openvino-reranker.service -n 100 --no-pager +journalctl --user -u openvino-router-classifier.service -n 100 --no-pager +journalctl --user -u openvino-genai-npu-worker.service -n 100 --no-pager +cd ~/lab/swarm && docker compose logs --tail 100 whisper-server-npu +``` + +## Approval gates + +Requires explicit Will approval before proceeding: +- Installing, enabling, or autostarting `openvino-reranker.service`, `openvino-router-classifier.service`, or `openvino-genai-npu-worker.service`. +- Assigning a final persistent port to document/image triage or enabling it as a persistent service. +- Enabling live RAG reranking or any request path that changes Atlas/RAG answers. +- Changing primary Atlas/Hermes routing or connecting router/classifier outputs to live decisions. 
+- Connecting the GenAI worker to primary Atlas chat, gateway routing, memory writes, or outbound notifications. +- Restarting the live Atlas/Hermes gateway. +- Deleting, overwriting, or in-place reindexing existing vector collections. +- Broadening bind addresses or exposure beyond local-only defaults. + +Approved/parked outcomes: +- Built/approved prototypes: reranker (`:18818`), router/classifier (`:18819`), small GenAI worker (`:18820`), document/image triage (review ports `:18828`/`:18829`). +- Live baseline retained: Whisper NPU (`:18816`), OpenVINO embeddings (`:18817`), RAG endpoint (`:18810`) using `obsidian_bge_npu`. +- Parked: always-on wake-word/audio and conventional vision detection until Will wants a concrete use case. +- Rejected for this NPU program: diffusion/image generation.