swarm-master/docs/swarm-infrastructure.md
2026-06-04 16:19:22 -07:00


Swarm Infrastructure

This document is the source-of-truth overview for Will's local swarm/agent infrastructure on the zap workstation. It focuses on the runtime services that support Atlas/Hermes, n8n automation, local model/search/voice tooling, Obsidian/RAG automation, and the new agentmon monitoring layer.

High-level topology

Telegram / Discord / Email
        |
        v
Hermes / Atlas gateway (default profile)
        |
        +--> local tools and specialist profiles
        +--> n8n automation workflows on :18808

n8n automation
        |
        +--> direct watchdog probes for key service ports
        +--> Agentmon Health Watchdog -> agentmon-query :8081
        +--> Obsidian, RAG, voice memo, URL capture, digest workflows

agentmon
        |
        +--> agentmon-swarm-monitor -> Docker labels agentmon.monitor=true
        +--> agentmon-openclaw-monitor -> OpenClaw VM snapshots
        +--> NATS JetStream -> event processor -> Postgres
        +--> query API / UI on :8081 / :8082

local AI/search/voice services
        |
        +--> LiteLLM :18804
        +--> SearXNG :18803
        +--> Brave MCP :18802
        +--> llama.cpp :18806
        +--> Ollama embeddings :18807 (legacy/CPU fallback)
        +--> OpenVINO NPU embeddings :18817
        +--> Kokoro TTS :18805
        +--> Whisper NPU :18816
        +--> local-only NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage :18829
        +--> OpenVINO advisory gateway 172.19.0.1:18830 (Docker bridge, for n8n-agent)


Runtime layers

1. Messaging and agent gateway

  • The Hermes / Atlas default profile is the production messaging gateway.
  • Connected platforms include Telegram, Discord, and email.
  • Atlas uses local swarm services where suitable, especially search, local LLMs, embeddings, STT/TTS, n8n, and agentmon.
  • Specialist Hermes profiles are available for delegated work, but the default profile remains the stable production gateway.

2. n8n automation

Container/service:

  • n8n-agent
  • Host URL: http://127.0.0.1:18808
  • Container URL: http://127.0.0.1:5678
  • Compose file: /home/will/lab/swarm/docker-compose.yaml

Important workflow source exports live under:

  • swarm-common/n8n-workflows/

Current health/automation patterns:

  • Swarm Health Watchdog: direct endpoint checks for search, LLM, voice, n8n, Docker health, etc.
  • Agentmon Health Watchdog: polls agentmon aggregate snapshots and alerts on stale/degraded monitoring state.
  • RAG and Embedding Health Watchdog: checks the RAG, search, and embedding path.
  • Obsidian workflows: health/reindex, inbox triage, daily review, URL-to-note, chat summary capture, weekly decision/runbook extraction.

3. Agentmon monitoring layer

Repo:

  • /home/will/lab/agentmon

Compose services:

  • agentmon-ingest on :8080 — ingestion gateway, /healthz
  • agentmon-query on :8081 — query API, /healthz, /v1/events, /v1/stats/summary
  • agentmon-ui on :8082 — web UI, /healthz
  • agentmon-processor — NATS to Postgres event processor
  • agentmon-swarm-monitor — monitors Docker containers labeled agentmon.monitor=true
  • agentmon-openclaw-monitor — emits OpenClaw VM snapshots
  • agentmon-db — Postgres
  • agentmon-nats — NATS JetStream

Key health and query endpoints:

http://127.0.0.1:8080/healthz
http://127.0.0.1:8081/healthz
http://127.0.0.1:8082/healthz
http://127.0.0.1:8081/v1/stats/summary
http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1
http://127.0.0.1:8081/v1/events?event_type=swarm.service.snapshot&limit=20
http://127.0.0.1:8081/v1/events?event_type=openclaw.snapshot&limit=3

From inside n8n-agent, use the Docker bridge gateway:

http://172.19.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1
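The addressing rule (127.0.0.1 from the host, 172.19.0.1 from inside n8n-agent, and 127.0.0.1:5678 for n8n itself when called from its own container) can be captured in one helper. This is a minimal sketch; `swarm_url` and its `host` / `n8n-agent` context names are hypothetical, not an existing script in the repo.

```bash
# Hypothetical helper: build a URL for a host-published swarm port,
# depending on where the caller runs.
# Usage: swarm_url <host|n8n-agent> <host-port> [path]
swarm_url() {
  local context=$1 port=$2 path=${3:-/}
  case "$context" in
    host)
      # From the host, every published service is on loopback.
      printf 'http://127.0.0.1:%s%s\n' "$port" "$path" ;;
    n8n-agent)
      if [ "$port" = 18808 ]; then
        # n8n itself: use its in-container port, not the bridge.
        printf 'http://127.0.0.1:5678%s\n' "$path"
      else
        # Other host-published services: go via the Docker bridge gateway.
        printf 'http://172.19.0.1:%s%s\n' "$port" "$path"
      fi ;;
    *)
      echo "unknown context: $context" >&2
      return 2 ;;
  esac
}
```

For example, `swarm_url n8n-agent 8081 '/v1/events?event_type=swarm.snapshot&limit=1'` produces the bridge URL shown above.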

4. Local AI, search, and voice services

Docker services:

  • litellm:18804, OpenAI-compatible LLM router
  • litellm-db — Postgres backing LiteLLM
  • searxng:18803, local metasearch
  • brave-search:18802, Brave Search MCP server
  • kokoro-tts:18805, local TTS
  • whisper-server-npu:18816, OpenVINO NPU local transcription
  • n8n-agent:18808, automation

Host/user services:

  • llama-server.service:18806, local llama.cpp OpenAI-compatible LLM
  • ollama.service:18807, legacy/CPU embeddings API fallback
  • openvino-embeddings.service:18817, OpenVINO NPU embeddings API (/v1/embeddings, /api/embed, /api/embeddings)
  • docker-health-endpoint.service:18809, read-only container health for n8n
  • obsidian-reindex-endpoint.service:18810, Obsidian/RAG reindex trigger and /semantic-search endpoint; defaults to the obsidian_bge_npu collection (OpenVINO NPU embeddings) and applies request-time :18818 reranking, falling back to vector order
  • url-content-extractor.service:18812, YouTube/PDF/web extraction
  • voice-memo-processor.service:18813, voice memo processing
  • rag-embedding-health.service:18814, RAG/embedding health wrapper
  • openvino-router-classifier.service:18819, local-only dry-run Atlas/Hermes message classifier; advisory only
  • openvino-genai-npu-worker.service:18820, local-only bounded GenAI worker for small background generation jobs
  • openvino-doc-image-triage.service:18829, local-only document/image triage HTTP wrapper with allowed-root enforcement
  • openvino-advisory-gateway.service on 172.19.0.1:18830, Docker-bridge advisory envelope wrapper over the classifier, GenAI worker, and doc/image triage for n8n-agent; explicit no-authority contract

OpenVINO NPU sidecars (local-only except the advisory gateway):

| Port | Component | State | Safety boundary |
|------|-----------|-------|-----------------|
| 18818 | reranker | live user service; request-time second stage for :18810/semantic-search | no Chroma/vector mutation; vector-order fallback on timeout/error/non-positive NPU proof |
| 18819 | router/classifier | live user service; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
| 18820 | bounded GenAI worker | live user service | background jobs only; not primary Atlas/Hermes model routing |
| 18829 | document/image triage | live localhost server | allowed-root limited; no private directory processing unless explicitly approved; NPU stage is embeddings via :18817 |
| 18830 | advisory gateway | live user service; bound to 172.19.0.1 for n8n-agent bridge access | returns openvino_advisory_v1 envelopes only; no routing, memory writes, external sends, tool execution, restarts, or process-root broadening from request payloads; refuses wildcard binds |
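The bind rules for these sidecars (127.0.0.1 for :18818, :18819, :18820, and :18829; 172.19.0.1 for :18830; never a wildcard) can be audited from listener addresses. This is a minimal sketch: `check_sidecar_binds` is a hypothetical name, and the per-port mapping is taken from this document rather than from any service config.

```bash
# Reads "address:port" lines on stdin, e.g. the Local Address:Port column of
#   ss -ltnH | awk '{print $4}'
# Prints one line per NPU sidecar listener that violates the bind policy and
# returns 1 if any violation was found.
check_sidecar_binds() {
  local line addr port expected status=0
  while IFS= read -r line; do
    [ -n "$line" ] || continue
    port=${line##*:}   # text after the last colon
    addr=${line%:*}    # text before the last colon (handles "[::]:18818" too)
    case "$port" in
      18818|18819|18820|18829) expected=127.0.0.1 ;;
      18830)                   expected=172.19.0.1 ;;
      *) continue ;;           # not a sidecar port; ignore
    esac
    if [ "$addr" != "$expected" ]; then
      printf 'VIOLATION port=%s bound=%s expected=%s\n' "$port" "$addr" "$expected"
      status=1
    fi
  done
  return "$status"
}
```

A wildcard listener such as `0.0.0.0:18819` or `[::]:18830` is reported because its address does not equal the expected one.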

These sidecars bind to 127.0.0.1 by default. The one exception is openvino-advisory-gateway.service, which is explicitly approved to bind to the Docker bridge IP 172.19.0.1 so n8n-agent can call it. Without explicit approval from Will, none of them may be wired into live Atlas/Hermes routing, memory writes, broad private document processing, external sends, tool execution, service restarts, or primary model paths. Any claim that work ran on the NPU requires a positive delta in /sys/class/accel/accel0/device/npu_busy_time_us between readings taken before and after inference, or an equivalent service-reported value. An HTTP 200 response alone is not proof.
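The before/after proof rule can be sketched as a small wrapper. This is a minimal sketch, assuming the counter file holds a single integer and that nothing else uses the NPU during the measurement (a concurrent job could also raise the counter, so a positive delta is necessary rather than conclusive). The function names and the `NPU_BUSY_FILE` override are hypothetical; ./scripts/npu-service-health.sh remains the supported check.

```bash
# Sysfs counter from this document; overridable (an assumption of this sketch).
NPU_BUSY_FILE=${NPU_BUSY_FILE:-/sys/class/accel/accel0/device/npu_busy_time_us}

# Print the current NPU busy time in microseconds.
npu_busy_us() {
  cat "$NPU_BUSY_FILE"
}

# Usage: with_npu_proof <command> [args...]
# Runs the command between two counter reads, prints the delta, and returns 0
# only if the command succeeded AND the delta is positive.
with_npu_proof() {
  local before after delta
  before=$(npu_busy_us) || return 1
  "$@" || return 1
  after=$(npu_busy_us) || return 1
  delta=$((after - before))
  printf 'npu_busy_delta_us=%s\n' "$delta"
  [ "$delta" -gt 0 ]
}
```

For example, `with_npu_proof curl -fsS -o /dev/null -H 'Content-Type: application/json' -d '{"model":"bge-base-en-v1.5-int8-ov","input":"smoke test"}' http://127.0.0.1:18817/v1/embeddings`, where the OpenAI-style request body is an assumption about the :18817 API.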

5. Obsidian and RAG

Vault:

  • /home/will/lab/swarm/swarm-common/obsidian-vault/will/will-shared-zap

Local REST API:

  • HTTP: 127.0.0.1:27123
  • HTTPS: 127.0.0.1:27124

RAG/vector store:

  • ChromaDB path: ~/.hermes/data/rag-search/chroma/
  • Reindex state/progress: active BGE/NPU state in ~/.hermes/data/rag-search/obsidian_bge_npu_index_state.json and obsidian_bge_npu_reindex_progress.json; legacy Ollama state in obsidian_index_state.json remains for comparison/fallback.
  • Active RAG query/reindex embedding backend: OpenVINO NPU embeddings service on :18817, currently bge-base-en-v1.5-int8-ov, collection obsidian_bge_npu.
  • Legacy comparison/fallback collection: obsidian, built with Ollama on :18807 using nomic-embed-text.
  • Reindex/search endpoint: POST :18810/reindex for incremental updates, POST :18810/reindex?full=true for full semantic rebuilds, GET :18810/semantic-health to verify vectors plus a search smoke test, and POST :18810/semantic-search for n8n/Hermes semantic context lookup.
  • Reranker path: after local bake testing, reranking is enabled for :18810/semantic-search (RAG_RERANK_ENABLED=true). Each request retrieves RAG_RERANK_INITIAL_K vector candidates, calls RAG_RERANK_URL (http://127.0.0.1:18818/rerank), and returns the top RAG_RERANK_TOP_K reranked results. By default it requires a positive npu_busy_delta_us (RAG_RERANK_REQUIRE_NPU_PROOF=true); on timeout, error, or non-positive NPU proof it falls back to vector order and adds rerank.error metadata. Reranking is request-time only and must not mutate Chroma/vector collections.
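The fallback rule reduces to a small decision function. This is a hypothetical sketch of the policy as described, not the service's code; the argument names and the status vocabulary (`ok`, `timeout`, `error`) are assumptions.

```bash
# Usage: rerank_decision <rerank_status> <npu_delta_us> [require_proof]
#   rerank_status  "ok", "timeout", or "error" for the call to :18818/rerank
#   npu_delta_us   npu_busy_delta_us reported with the rerank result
#   require_proof  value of RAG_RERANK_REQUIRE_NPU_PROOF (default true)
# Prints "reranked" or "vector_order:<reason>".
rerank_decision() {
  local status=$1 delta=${2:-} require_proof=${3:-true}
  case "$delta" in
    ''|*[!0-9]*) delta=0 ;;   # missing, negative, or non-numeric => no proof
  esac
  if [ "$status" != ok ]; then
    printf 'vector_order:%s\n' "$status"
  elif [ "$require_proof" = true ] && [ "$delta" -le 0 ]; then
    printf 'vector_order:no_npu_proof\n'
  else
    printf 'reranked\n'
  fi
}
```

Treating any non-positive or unparseable delta as "no proof" keeps the default conservative: vector order is always a safe answer, while reranked order must earn its place.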

Monitoring model

The monitoring design is intentionally layered:

  1. n8n direct probes check critical service endpoints and send deduped alerts.
  2. agentmon continuously observes labeled Docker services and OpenClaw state, then writes snapshots through NATS/Postgres.
  3. n8n Agentmon Health Watchdog polls agentmon's aggregate state and alerts if the monitoring pipeline itself becomes stale/degraded.
  4. Hermes/Atlas can inspect both n8n and agentmon when troubleshooting, and can use the same endpoints as part of operational checks.

This means a single process being alive is not enough: the important signal is whether collection, ingestion, processing, storage, query, and alerting are all functioning.

Agentmon Health Watchdog

Workflow source:

  • swarm-common/n8n-workflows/agentmon-health-watchdog.json

Installed n8n workflow:

  • Name: Agentmon Health Watchdog
  • ID: AgentmonHealthWatchdog
  • Schedule: every 5 minutes

Alert conditions:

  • The /healthz check fails for agentmon-ingest, agentmon-query, or agentmon-ui.
  • Latest swarm.snapshot is missing.
  • Latest swarm.snapshot is older than 3 minutes.
  • Snapshot issues are non-empty.
  • Required agentmon services are missing or not healthy/running:
    • agentmon-ingest
    • agentmon-query
    • agentmon-ui
    • agentmon-processor
    • agentmon-swarm-monitor
    • agentmon-db
    • agentmon-nats
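The snapshot-based alert conditions above can be sketched as a single check. This is a minimal sketch, not the workflow's node logic: it assumes the caller has already extracted the snapshot timestamp (epoch seconds), the number of snapshot issues, and one `service-name state` line per service from the latest swarm.snapshot event. That input shape is hypothetical because the event schema is not documented here, and the /healthz condition is left to a separate probe.

```bash
# Service list and threshold taken from the alert conditions above.
AGENTMON_REQUIRED="agentmon-ingest agentmon-query agentmon-ui agentmon-processor agentmon-swarm-monitor agentmon-db agentmon-nats"
MAX_SNAPSHOT_AGE_S=180   # "older than 3 minutes"

# Usage: agentmon_snapshot_problems <snapshot_epoch|""> <now_epoch> <issue_count>
# Reads "service-name state" lines on stdin. Prints one reason per line and
# returns 1 if any alert condition holds, 0 otherwise.
agentmon_snapshot_problems() {
  local snap_ts=$1 now=$2 issues=${3:-0} services svc state rc=0
  services=$(cat)   # read stdin once so each required service can be looked up

  if [ -z "$snap_ts" ]; then
    echo "snapshot missing"; rc=1
  elif [ $((now - snap_ts)) -gt "$MAX_SNAPSHOT_AGE_S" ]; then
    echo "snapshot stale: $((now - snap_ts))s old"; rc=1
  fi

  if [ "$issues" -gt 0 ]; then
    echo "snapshot reports $issues issue(s)"; rc=1
  fi

  for svc in $AGENTMON_REQUIRED; do
    # State from the first matching line; empty if the service is absent.
    state=$(printf '%s\n' "$services" | awk -v s="$svc" '$1 == s { print $2; exit }')
    case "$state" in
      healthy|running) ;;
      '') echo "$svc missing"; rc=1 ;;
      *)  echo "$svc $state"; rc=1 ;;
    esac
  done
  return "$rc"
}
```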

Deduplication:

  • Alert after 2 failed checks.
  • Reminder every 6 failed runs.
  • Recovery message when state returns healthy.
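The deduplication rules can be sketched as a state transition on a failure counter. This is a minimal sketch under two assumptions not confirmed by the workflow source: the counter tracks consecutive failed runs, and "reminder every 6 failed runs" means reminders at failures 8, 14, 20, and so on (every 6 runs after the first alert at failure 2). `dedup_step` is a hypothetical name.

```bash
ALERT_AFTER=2
REMIND_EVERY=6

# Usage: dedup_step <previous_fail_count> <ok|fail>
# Prints "<action> <new_fail_count>", where action is none, alert,
# reminder, or recovery.
dedup_step() {
  local prev=$1 result=$2 count
  if [ "$result" = ok ]; then
    # Only announce recovery if an alert was actually sent for this outage.
    if [ "$prev" -ge "$ALERT_AFTER" ]; then echo "recovery 0"; else echo "none 0"; fi
    return 0
  fi
  count=$((prev + 1))
  if [ "$count" -eq "$ALERT_AFTER" ]; then
    echo "alert $count"
  elif [ "$count" -gt "$ALERT_AFTER" ] &&
       [ $(( (count - ALERT_AFTER) % REMIND_EVERY )) -eq 0 ]; then
    echo "reminder $count"
  else
    echo "none $count"
  fi
}
```

Gating recovery on the counter means a single transient failure never produces a "recovered" message for an alert nobody received.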

Operational quick checks

From the host:

cd /home/will/lab/swarm
make status
make local-ai-health
./scripts/npu-service-health.sh  # read-only; includes sysfs busy-time proof for :18817
curl -fsS http://127.0.0.1:18810/semantic-health | jq '{status,state,search_ok,result_count}'
curl -fsS http://127.0.0.1:18810/semantic-search \
  -H 'Content-Type: application/json' \
  -d '{"query":"non-private semantic smoke","top_k":2}' \
  | jq '{ok,index,top_k,search_k,rerank,result_count}'
curl -fsS http://127.0.0.1:18808/healthz
curl -fsS http://127.0.0.1:8081/healthz
curl -fsS 'http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1' | jq .
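When running several of these HTTP checks in a row, a small wrapper can tally the results instead of stopping at the first failure. This is a minimal sketch; `probe_all`, `probe_fetch`, the `FETCH` override, and the 5-second timeout are assumptions, not part of the Makefile targets.

```bash
# Default fetch: succeed only on a 2xx response within 5 seconds.
probe_fetch() {
  curl -fsS --max-time 5 -o /dev/null "$1"
}
FETCH=${FETCH:-probe_fetch}

# Usage: probe_all <url>...
# Prints PASS/FAIL per URL plus a summary; returns 0 only if all passed.
probe_all() {
  local url failed=0
  for url in "$@"; do
    if "$FETCH" "$url" >/dev/null 2>&1; then
      printf 'PASS %s\n' "$url"
    else
      printf 'FAIL %s\n' "$url"
      failed=$((failed + 1))
    fi
  done
  printf '%d of %d checks failed\n' "$failed" "$#"
  [ "$failed" -eq 0 ]
}
```

For example, `probe_all http://127.0.0.1:18808/healthz http://127.0.0.1:8081/healthz http://127.0.0.1:18810/semantic-health`. A 200 from /healthz still says nothing about snapshot freshness or NPU use, so this complements the checks above rather than replacing them.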

From inside n8n-agent:

docker exec n8n-agent /bin/sh -lc '
  wget -qO- -T 5 http://172.19.0.1:18810/healthz
  wget -qO- -T 5 http://172.19.0.1:18814/healthz
  wget -qO- -T 5 http://172.19.0.1:18817/healthz | head -c 500
'

Verify n8n workflow activation:

docker exec -u node n8n-agent n8n export:workflow \
  --id=AgentmonHealthWatchdog \
  --output=/tmp/agentmon-export.json

docker cp n8n-agent:/tmp/agentmon-export.json /tmp/agentmon-export.json
jq '.[0] | {id,name,active,nodes:(.nodes|length)}' /tmp/agentmon-export.json

Notes and pitfalls

  • Do not commit .env, decrypted credentials, raw credential exports, or runtime DB files.
  • n8n workflow backups can contain sensitive operational data; keep timestamped raw backups untracked unless intentionally sanitized.
  • From the host, use 127.0.0.1:<host-port>.
  • From n8n-agent, use 127.0.0.1:5678 for n8n itself and 172.19.0.1:<host-port> for host-published swarm services.
  • Agentmon /healthz only proves the web/API process is alive; pair it with snapshot freshness to prove the monitoring pipeline is flowing.
  • OpenClaw is intentionally dormant unless explicitly re-enabled; by default, do not alert when OpenClaw VMs are shut off.
  • OpenVINO NPU sidecars on :18819, :18820, and :18829 are live local-only services, but remain isolated specialists. The :18818 reranker is live as a local request-time second stage for :18810/semantic-search; it still falls back to vector order on timeout/error/non-positive NPU proof. Do not draw live Atlas/Hermes routing, memory-write, broad document-processing, or primary-model arrows to these sidecars without a separate approved integration.