William Valentin aeb3c9f8fb fix(npu): expose advisory gateway on docker bridge

2026-06-04 16:19:22 -07:00

Swarm Infrastructure

This document is the source-of-truth overview for Will's local swarm/agent infrastructure on the zap workstation. It focuses on the runtime services that support Atlas/Hermes, n8n automation, local model/search/voice tooling, Obsidian/RAG automation, and the new agentmon monitoring layer.

High-level topology

Telegram / Discord / Email
        |
        v
Hermes / Atlas gateway (default profile)
        |
        +--> local tools and specialist profiles
        +--> n8n automation workflows on :18808

n8n automation
        |
        +--> direct watchdog probes for key service ports
        +--> Agentmon Health Watchdog -> agentmon-query :8081
        +--> Obsidian, RAG, voice memo, URL capture, digest workflows

agentmon
        |
        +--> agentmon-swarm-monitor -> Docker labels agentmon.monitor=true
        +--> agentmon-openclaw-monitor -> OpenClaw VM snapshots
        +--> NATS JetStream -> event processor -> Postgres
        +--> query API / UI on :8081 / :8082

local AI/search/voice services
        |
        +--> LiteLLM :18804
        +--> SearXNG :18803
        +--> Brave MCP :18802
        +--> llama.cpp :18806
        +--> Ollama embeddings :18807 (legacy/CPU fallback)
        +--> OpenVINO NPU embeddings :18817
        +--> Kokoro TTS :18805
        +--> Whisper NPU :18816
        +--> local-only NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage :18829

Runtime layers

1. Messaging and agent gateway

Hermes / Atlas default profile is the production messaging gateway.
Connected platforms include Telegram, Discord, and email.
Atlas uses local swarm services where suitable, especially search, local LLMs, embeddings, STT/TTS, n8n, and agentmon.
Specialist Hermes profiles are available for delegated work, but the default profile remains the stable production gateway.

2. n8n automation

Container/service:

n8n-agent
Host URL: http://127.0.0.1:18808
Container URL: http://127.0.0.1:5678
Compose file: /home/will/lab/swarm/docker-compose.yaml

Important workflow source exports live under:

swarm-common/n8n-workflows/

Current health/automation patterns:

Swarm Health Watchdog: direct endpoint checks for search, LLM, voice, n8n, Docker health, etc.
Agentmon Health Watchdog: polls agentmon aggregate snapshots and alerts on stale/degraded monitoring state.
RAG and Embedding Health Watchdog: checks RAG/search/embedding path.
Obsidian workflows: health/reindex, inbox triage, daily review, URL-to-note, chat summary capture, weekly decision/runbook extraction.

3. Agentmon monitoring layer

Repo:

/home/will/lab/agentmon

Compose services:

agentmon-ingest on :8080 — ingestion gateway, /healthz
agentmon-query on :8081 — query API, /healthz, /v1/events, /v1/stats/summary
agentmon-ui on :8082 — web UI, /healthz
agentmon-processor — NATS to Postgres event processor
agentmon-swarm-monitor — monitors Docker containers labeled agentmon.monitor=true
agentmon-openclaw-monitor — emits OpenClaw VM snapshots
agentmon-db — Postgres
agentmon-nats — NATS JetStream

Key health and query endpoints:

http://127.0.0.1:8080/healthz
http://127.0.0.1:8081/healthz
http://127.0.0.1:8082/healthz
http://127.0.0.1:8081/v1/stats/summary
http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1
http://127.0.0.1:8081/v1/events?event_type=swarm.service.snapshot&limit=20
http://127.0.0.1:8081/v1/events?event_type=openclaw.snapshot&limit=3

From inside n8n-agent, use the Docker bridge gateway:

http://172.19.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1

4. Local AI, search, and voice services

Docker services:

litellm — :18804, OpenAI-compatible LLM router
litellm-db — Postgres backing LiteLLM
searxng — :18803, local metasearch
brave-search — :18802, Brave Search MCP server
kokoro-tts — :18805, local TTS
whisper-server-npu — :18816, OpenVINO NPU local transcription
n8n-agent — :18808, automation

Host/user services:

llama-server.service — :18806, local llama.cpp OpenAI-compatible LLM
ollama.service — :18807, legacy/CPU embeddings API fallback
openvino-embeddings.service — :18817, OpenVINO NPU embeddings API (/v1/embeddings, /api/embed, /api/embeddings)
docker-health-endpoint.service — :18809, read-only container health for n8n
obsidian-reindex-endpoint.service — :18810, Obsidian/RAG reindex trigger and /semantic-search; default collection obsidian_bge_npu using OpenVINO NPU embeddings, with request-time :18818 reranking enabled with vector-order fallback
url-content-extractor.service — :18812, YouTube/PDF/web extraction
voice-memo-processor.service — :18813, voice memo processing
rag-embedding-health.service — :18814, RAG/embedding health wrapper
openvino-router-classifier.service — :18819, local-only dry-run Atlas/Hermes message classifier; advisory only
openvino-genai-npu-worker.service — :18820, local-only bounded GenAI worker for small background generation jobs
openvino-doc-image-triage.service — :18829, local-only document/image triage HTTP wrapper with allowed-root enforcement
openvino-advisory-gateway.service — 172.19.0.1:18830, Docker-bridge advisory envelope wrapper over classifier, GenAI, and doc/image triage for n8n-agent; explicit no-authority contract

Local-only OpenVINO NPU sidecars:

| Port | Component | State | Safety boundary |
| --- | --- | --- | --- |
| `18818` | reranker | live user service; request-time second stage for `:18810/semantic-search` | no Chroma/vector mutation; vector-order fallback on timeout/error/non-positive NPU proof |
| `18819` | router/classifier | live user service; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
| `18820` | bounded GenAI worker | live user service | background jobs only; not primary Atlas/Hermes model routing |
| `18829` | document/image triage | live localhost server | allowed-root limited; no private directory processing unless explicitly approved; NPU stage is embeddings via `:18817` |
| `18830` | advisory gateway | live user service; bound to `172.19.0.1` for `n8n-agent` bridge access | returns `openvino_advisory_v1` envelopes only; no routing, memory writes, external sends, tool execution, restarts, or process-root broadening from request payloads; refuses wildcard binds |

These sidecars bind to 127.0.0.1 by default. The exception is openvino-advisory-gateway.service, which is explicitly approved to bind to the Docker bridge IP 172.19.0.1 so that n8n-agent can call it. They must not be wired into live Atlas/Hermes routing, memory writes, broad private document processing, external sends, tool execution, service restarts, or primary model paths without Will's explicit approval. Any claim that work ran on the NPU requires a positive delta in /sys/class/accel/accel0/device/npu_busy_time_us, measured before and after inference, or a service-reported equivalent. HTTP 200 alone is not proof.
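The busy-time proof rule above can be sketched as a small shell helper. This is an illustration only, not the contents of npu-service-health.sh: the function names `npu_proof_ok` and `read_npu_busy_us` are hypothetical, and only the sysfs path comes from this document.

```shell
# Hypothetical helper: succeed only if the NPU busy-time counter advanced.
# $1 = counter value before inference (microseconds)
# $2 = counter value after inference (microseconds)
npu_proof_ok() {
  before=$1
  after=$2
  delta=$((after - before))
  [ "$delta" -gt 0 ]
}

# Hypothetical reader for the sysfs counter named in this document.
read_npu_busy_us() {
  cat /sys/class/accel/accel0/device/npu_busy_time_us
}

# Usage sketch (commented out so the block is safe to source without an NPU):
#   before=$(read_npu_busy_us)
#   # run exactly one inference request against the service under test here
#   after=$(read_npu_busy_us)
#   if npu_proof_ok "$before" "$after"; then echo "NPU proof: positive delta"; fi
```

Keeping the comparison separate from the sysfs read makes the rule easy to exercise with fixed numbers, and it makes explicit that an unchanged counter counts as no proof even when the HTTP call succeeded.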

5. Obsidian and RAG

Vault:

/home/will/lab/swarm/swarm-common/obsidian-vault/will/will-shared-zap

Local REST API:

HTTP: 127.0.0.1:27123
HTTPS: 127.0.0.1:27124

RAG/vector store:

ChromaDB path: ~/.hermes/data/rag-search/chroma/
Reindex state/progress: active BGE/NPU state in ~/.hermes/data/rag-search/obsidian_bge_npu_index_state.json and obsidian_bge_npu_reindex_progress.json; legacy Ollama state in obsidian_index_state.json remains for comparison/fallback.
Active RAG query/reindex embedding backend: OpenVINO NPU embeddings service on :18817, currently bge-base-en-v1.5-int8-ov, collection obsidian_bge_npu.
Legacy comparison/fallback collection: obsidian, built with Ollama on :18807 using nomic-embed-text.
Reindex/search endpoint: POST :18810/reindex for incremental updates, POST :18810/reindex?full=true for full semantic rebuilds, GET :18810/semantic-health to verify vectors plus a search smoke test, and POST :18810/semantic-search for n8n/Hermes semantic context lookup.
Reranker path: RAG_RERANK_ENABLED=true for :18810/semantic-search after local bake testing. /semantic-search retrieves RAG_RERANK_INITIAL_K vector candidates, calls RAG_RERANK_URL (http://127.0.0.1:18818/rerank), returns reranked RAG_RERANK_TOP_K, requires positive npu_busy_delta_us by default (RAG_RERANK_REQUIRE_NPU_PROOF=true), and falls back to vector order with rerank.error metadata on timeout/error/non-positive NPU proof. Reranking is request-time only and must not mutate Chroma/vector collections.
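The fallback rule for the reranker path can be summarized as a decision function. This is a minimal sketch of the behavior described above, not the endpoint's actual code: `rerank_order` and its status strings are hypothetical, and the third argument simply mirrors the documented RAG_RERANK_REQUIRE_NPU_PROOF setting.

```shell
# Hypothetical decision helper: print which ordering /semantic-search returns.
# $1 = reranker call status: "ok", "timeout", or "error" (illustrative values)
# $2 = npu_busy_delta_us observed for the rerank call
# $3 = require NPU proof, "true" or "false" (defaults to "true")
rerank_order() {
  status=$1
  delta=$2
  require_proof=${3:-true}
  if [ "$status" != "ok" ]; then
    echo "vector"      # timeout or error: fall back to vector order
  elif [ "$require_proof" = "true" ] && [ "$delta" -le 0 ]; then
    echo "vector"      # non-positive NPU proof: fall back to vector order
  else
    echo "reranked"    # reranked top-k is returned
  fi
}
```

In both fallback branches only the ordering of the response changes; nothing is written back to the vector collections, which matches the request-time-only constraint.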

Monitoring model

The monitoring design is intentionally layered:

n8n direct probes check critical service endpoints and send deduped alerts.
agentmon continuously observes labeled Docker services and OpenClaw state, then writes snapshots through NATS/Postgres.
n8n Agentmon Health Watchdog polls agentmon's aggregate state and alerts if the monitoring pipeline itself becomes stale/degraded.
Hermes/Atlas can inspect both n8n and agentmon when troubleshooting, and can use the same endpoints as part of operational checks.

This means a single process being alive is not enough: the important signal is whether collection, ingestion, processing, storage, query, and alerting are all functioning.

Agentmon Health Watchdog

Workflow source:

swarm-common/n8n-workflows/agentmon-health-watchdog.json

Installed n8n workflow:

Name: Agentmon Health Watchdog
ID: AgentmonHealthWatchdog
Schedule: every 5 minutes

Alert conditions:

agentmon-ingest, agentmon-query, or agentmon-ui /healthz fails.
Latest swarm.snapshot is missing.
Latest swarm.snapshot is older than 3 minutes.
Snapshot issues are non-empty.
Required agentmon services are missing or not healthy/running:
- agentmon-ingest
- agentmon-query
- agentmon-ui
- agentmon-processor
- agentmon-swarm-monitor
- agentmon-db
- agentmon-nats
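The 3-minute freshness condition above reduces to a timestamp comparison. The sketch below assumes the snapshot time has already been converted to epoch seconds; how that value is extracted from the `/v1/events` response is not shown because the response field names are not given in this document. The function name `snapshot_is_stale` is hypothetical.

```shell
# Hypothetical helper: succeed if the snapshot is older than the allowed age.
# $1 = snapshot time (epoch seconds)
# $2 = current time (epoch seconds)
# $3 = maximum age in seconds (defaults to 180, i.e. 3 minutes)
snapshot_is_stale() {
  snap=$1
  now=$2
  max_age=${3:-180}
  [ $((now - snap)) -gt "$max_age" ]
}

# Usage sketch, with snap_epoch obtained elsewhere:
#   if snapshot_is_stale "$snap_epoch" "$(date +%s)"; then echo "stale"; fi
```

A snapshot that is exactly 180 seconds old is treated as fresh here, following the "older than 3 minutes" wording.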

Deduplication:

Alert after 2 failed checks.
Reminder every 6 failed runs.
Recovery message when state returns healthy.
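The deduplication rules can be sketched as a small state function. This is one possible reading, not the workflow's actual logic: it assumes the failure counts are consecutive, that the first alert fires on the second failure, and that reminders repeat every 6 failed runs counted from that first alert. The name `dedup_action` and its arguments are hypothetical.

```shell
# Hypothetical helper: decide what to send for the current check.
# $1 = "healthy" or "unhealthy" for the current check
# $2 = number of consecutive failed checks before this one
# Prints one of: alert, reminder, recovery, none
dedup_action() {
  state=$1
  prev_failures=$2
  if [ "$state" = "healthy" ]; then
    # Recovery is only announced if an alert had already been sent.
    if [ "$prev_failures" -ge 2 ]; then echo "recovery"; else echo "none"; fi
    return 0
  fi
  failures=$((prev_failures + 1))
  if [ "$failures" -eq 2 ]; then
    echo "alert"       # second consecutive failure: first alert
  elif [ "$failures" -gt 2 ] && [ $(( (failures - 2) % 6 )) -eq 0 ]; then
    echo "reminder"    # every 6 failed runs after the first alert (assumption)
  else
    echo "none"
  fi
}
```

Under these assumptions and the 5-minute schedule, a sustained outage would alert on the second failed run and then remind roughly every 30 minutes until a healthy check produces the recovery message.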

Operational quick checks

From the host:

cd /home/will/lab/swarm
make status
make local-ai-health
./scripts/npu-service-health.sh  # read-only; includes sysfs busy-time proof for :18817
curl -fsS http://127.0.0.1:18810/semantic-health | jq '{status,state,search_ok,result_count}'
curl -fsS http://127.0.0.1:18810/semantic-search \
  -H 'Content-Type: application/json' \
  -d '{"query":"non-private semantic smoke","top_k":2}' \
  | jq '{ok,index,top_k,search_k,rerank,result_count}'
curl -fsS http://127.0.0.1:18808/healthz
curl -fsS http://127.0.0.1:8081/healthz
curl -fsS 'http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1' | jq .

From inside n8n-agent:

docker exec n8n-agent /bin/sh -lc '
  wget -qO- -T 5 http://172.19.0.1:18810/healthz
  wget -qO- -T 5 http://172.19.0.1:18814/healthz
  wget -qO- -T 5 http://172.19.0.1:18817/healthz | head -c 500
'

Verify n8n workflow activation:

docker exec -u node n8n-agent n8n export:workflow \
  --id=AgentmonHealthWatchdog \
  --output=/tmp/agentmon-export.json

docker cp n8n-agent:/tmp/agentmon-export.json /tmp/agentmon-export.json
jq '.[0] | {id,name,active,nodes:(.nodes|length)}' /tmp/agentmon-export.json

Notes and pitfalls

Do not commit .env, decrypted credentials, raw credential exports, or runtime DB files.
n8n workflow backups can contain sensitive operational data; keep timestamped raw backups untracked unless intentionally sanitized.
From host, use 127.0.0.1:<host-port>.
From n8n-agent, use 127.0.0.1:5678 for n8n itself and 172.19.0.1:<host-port> for host-published swarm services.
Agentmon /healthz only proves the web/API process is alive; pair it with snapshot freshness to prove the monitoring pipeline is flowing.
OpenClaw is intentionally dormant unless explicitly re-enabled; do not alert on VMs being shut off by default.
OpenVINO NPU sidecars on :18819, :18820, and :18829 are live local-only services, but remain isolated specialists. The :18818 reranker is live as a local request-time second stage for :18810/semantic-search; it still falls back to vector order on timeout/error/non-positive NPU proof. Do not draw live Atlas/Hermes routing, memory-write, broad document-processing, or primary-model arrows to these sidecars without a separate approved integration.
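The host versus n8n-agent addressing rule from the notes above can be captured in a small URL helper for host-published swarm services. This is a sketch; the function name `swarm_url` and its context labels are hypothetical, while the two addresses come from this document.

```shell
# Hypothetical helper: build a URL for a host-published swarm service.
# $1 = caller context: "host" or "n8n-agent"
# $2 = host-published port
# $3 = request path, including the leading slash
swarm_url() {
  case "$1" in
    host)      echo "http://127.0.0.1:$2$3" ;;
    n8n-agent) echo "http://172.19.0.1:$2$3" ;;   # Docker bridge gateway
    *)         echo "unknown context: $1" >&2; return 1 ;;
  esac
}

# Usage sketch:
#   curl -fsS "$(swarm_url host 8081 /healthz)"
```

The helper does not cover n8n's own in-container address (127.0.0.1:5678), which is a container port, not a host-published one.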
