# Swarm Infrastructure

This document is the source-of-truth overview for Will's local swarm/agent infrastructure on the `zap` workstation. It focuses on the runtime services that support Atlas/Hermes, n8n automation, local model/search/voice tooling, Obsidian/RAG automation, and the new agentmon monitoring layer.

## High-level topology

```text
Telegram / Discord / Email
        |
        v
Hermes / Atlas gateway (default profile)
        |
        +--> local tools and specialist profiles
        +--> n8n automation workflows on :18808

n8n automation
        |
        +--> direct watchdog probes for key service ports
        +--> Agentmon Health Watchdog -> agentmon-query :8081
        +--> Obsidian, RAG, voice memo, URL capture, digest workflows

agentmon
        |
        +--> agentmon-swarm-monitor -> Docker labels agentmon.monitor=true
        +--> agentmon-openclaw-monitor -> OpenClaw VM snapshots
        +--> NATS JetStream -> event processor -> Postgres
        +--> query API / UI on :8081 / :8082

local AI/search/voice services
        |
        +--> LiteLLM :18804
        +--> SearXNG :18803
        +--> Brave MCP :18802
        +--> llama.cpp :18806
        +--> Ollama embeddings :18807 (legacy/CPU fallback)
        +--> OpenVINO NPU embeddings :18817
        +--> Kokoro TTS :18805
        +--> Whisper NPU :18816
        +--> local-only NPU sidecars: reranker :18818, router/classifier :18819, GenAI worker :18820, doc/image triage :18829
```

See also:

- [`swarm-infrastructure.html`](./swarm-infrastructure.html) — visual architecture diagram
- [`diagram-maintenance.md`](./diagram-maintenance.md) — how to keep diagrams updated and when to create new ones

## Runtime layers

### 1. Messaging and agent gateway

- **Hermes / Atlas default profile** is the production messaging gateway.
- Connected platforms include Telegram, Discord, and email.
- Atlas uses local swarm services where suitable, especially search, local LLMs, embeddings, STT/TTS, n8n, and agentmon.
- Specialist Hermes profiles are available for delegated work, but the default profile remains the stable production gateway.

### 2. n8n automation

Container/service:

- `n8n-agent`
- Host URL: `http://127.0.0.1:18808`
- Container URL: `http://127.0.0.1:5678`
- Compose project: `/home/will/lab/swarm/docker-compose.yaml`

Important workflow source exports live under:

- `swarm-common/n8n-workflows/`

Current health/automation patterns:

- **Swarm Health Watchdog**: direct endpoint checks for search, LLM, voice, n8n, Docker health, etc.
- **Agentmon Health Watchdog**: polls agentmon aggregate snapshots and alerts on stale/degraded monitoring state.
- **RAG and Embedding Health Watchdog**: checks RAG/search/embedding path.
- Obsidian workflows: health/reindex, inbox triage, daily review, URL-to-note, chat summary capture, weekly decision/runbook extraction.

### 3. Agentmon monitoring layer

Repo:

- `/home/will/lab/agentmon`

Compose services:

- `agentmon-ingest` on `:8080` — ingestion gateway, `/healthz`
- `agentmon-query` on `:8081` — query API, `/healthz`, `/v1/events`, `/v1/stats/summary`
- `agentmon-ui` on `:8082` — web UI, `/healthz`
- `agentmon-processor` — NATS to Postgres event processor
- `agentmon-swarm-monitor` — monitors Docker containers labeled `agentmon.monitor=true`
- `agentmon-openclaw-monitor` — emits OpenClaw VM snapshots
- `agentmon-db` — Postgres
- `agentmon-nats` — NATS JetStream

Key query endpoints:

```text
http://127.0.0.1:8080/healthz
http://127.0.0.1:8081/healthz
http://127.0.0.1:8082/healthz
http://127.0.0.1:8081/v1/stats/summary
http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1
http://127.0.0.1:8081/v1/events?event_type=swarm.service.snapshot&limit=20
http://127.0.0.1:8081/v1/events?event_type=openclaw.snapshot&limit=3
```

From inside `n8n-agent`, use the Docker bridge gateway:

```text
http://172.19.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1
```

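
The host-versus-container addressing above can be captured in a small helper. The following is only a sketch: `swarm_url` and the `SWARM_IN_CONTAINER` override are hypothetical names, and treating `/.dockerenv` as a sign of running inside a container is an assumption rather than something this repo defines.

```bash
# Hypothetical helper: build a URL for a host-published swarm port.
# SWARM_IN_CONTAINER=1 or 0 overrides detection; otherwise the presence of
# /.dockerenv is used as a container hint (an assumption, not a guarantee).
swarm_url() {
  local port="$1" path="${2:-/}" host="127.0.0.1"
  local in_container="${SWARM_IN_CONTAINER:-}"
  if [ -z "$in_container" ]; then
    if [ -e /.dockerenv ]; then in_container=1; else in_container=0; fi
  fi
  # Inside a container, host-published ports are reached via the bridge IP.
  [ "$in_container" = "1" ] && host="172.19.0.1"
  echo "http://${host}:${port}${path}"
}
```

For example, `curl -fsS "$(swarm_url 8081 /healthz)"` would target `127.0.0.1` on the host and `172.19.0.1` from a container, so the same check script could be reused in both places.
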
### 4. Local AI, search, and voice services

Docker services:

- `litellm` — `:18804`, OpenAI-compatible LLM router
- `litellm-db` — Postgres backing LiteLLM
- `searxng` — `:18803`, local metasearch
- `brave-search` — `:18802`, Brave Search MCP server
- `kokoro-tts` — `:18805`, local TTS
- `whisper-server-npu` — `:18816`, OpenVINO NPU local transcription
- `n8n-agent` — `:18808`, automation

Host/user services:

- `llama-server.service` — `:18806`, local llama.cpp OpenAI-compatible LLM
- `ollama.service` — `:18807`, legacy/CPU embeddings API fallback
- `openvino-embeddings.service` — `:18817`, OpenVINO NPU embeddings API (`/v1/embeddings`, `/api/embed`, `/api/embeddings`)
- `docker-health-endpoint.service` — `:18809`, read-only container health for n8n
- `obsidian-reindex-endpoint.service` — `:18810`, Obsidian/RAG reindex trigger and `/semantic-search`; default collection `obsidian_bge_npu` using OpenVINO NPU embeddings, with request-time `:18818` reranking enabled and a vector-order fallback
- `url-content-extractor.service` — `:18812`, YouTube/PDF/web extraction
- `voice-memo-processor.service` — `:18813`, voice memo processing
- `rag-embedding-health.service` — `:18814`, RAG/embedding health wrapper
- `openvino-router-classifier.service` — `:18819`, local-only dry-run Atlas/Hermes message classifier; advisory only
- `openvino-genai-npu-worker.service` — `:18820`, local-only bounded GenAI worker for small background generation jobs
- `openvino-doc-image-triage.service` — `:18829`, local-only document/image triage HTTP wrapper with allowed-root enforcement
- `openvino-advisory-gateway.service` — `172.19.0.1:18830`, Docker-bridge advisory envelope wrapper over classifier, GenAI, and doc/image triage for `n8n-agent`; explicit no-authority contract

Local-only OpenVINO NPU sidecars:

| Port | Component | State | Safety boundary |
| ---: | --- | --- | --- |
| `18818` | reranker | live user service; request-time second stage for `:18810/semantic-search` | no Chroma/vector mutation; vector-order fallback on timeout/error/non-positive NPU proof |
| `18819` | router/classifier | live user service; dry-run only | no Hermes/Atlas routing, memory writes, service restarts, or outbound messages |
| `18820` | bounded GenAI worker | live user service | background jobs only; not primary Atlas/Hermes model routing |
| `18829` | document/image triage | live localhost server | allowed-root limited; no private directory processing unless explicitly approved; NPU stage is embeddings via `:18817` |
| `18830` | advisory gateway | live user service; bound to `172.19.0.1` for `n8n-agent` bridge access | returns `openvino_advisory_v1` envelopes only; no routing, memory writes, external sends, tool execution, restarts, or process-root broadening from request payloads; refuses wildcard binds |

These sidecars bind to `127.0.0.1` by default, except `openvino-advisory-gateway.service`, which is explicitly approved on the Docker bridge IP `172.19.0.1` so `n8n-agent` can call it. They must not be wired into live Atlas/Hermes routing, memory writes, broad private document processing, external sends, tool execution, service restarts, or primary model paths without explicit Will approval. Any NPU claim requires a positive `/sys/class/accel/accel0/device/npu_busy_time_us` delta before/after inference or service-reported equivalent. HTTP 200 alone is not proof.

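
The busy-time proof described above amounts to reading the sysfs counter, running one inference request, and reading the counter again. The sketch below illustrates that idea only; it is not the logic of `./scripts/npu-service-health.sh`. The function name `npu_busy_delta` and the `NPU_BUSY_FILE` override are hypothetical.

```bash
# Counter path from this document; overridable so the logic can be exercised
# without NPU hardware (the override variable is a hypothetical convenience).
NPU_BUSY_FILE="${NPU_BUSY_FILE:-/sys/class/accel/accel0/device/npu_busy_time_us}"

# npu_busy_delta CMD [ARGS...]
# Runs CMD between two counter reads, prints the delta in microseconds, and
# returns 0 only when the delta is positive. CMD's stdout is discarded so it
# does not mix with the delta; a failed read or failed CMD returns 2.
npu_busy_delta() {
  local before after delta
  before="$(cat "$NPU_BUSY_FILE")" || return 2
  "$@" >/dev/null || return 2
  after="$(cat "$NPU_BUSY_FILE")" || return 2
  delta=$(( after - before ))
  echo "$delta"
  [ "$delta" -gt 0 ]
}
```

In practice the wrapped command would be a real inference request, for example a `curl` call against `:18817`. A delta of zero with an HTTP 200 response would then count as "no proof" rather than success. Other NPU activity on the machine during the window would also raise the counter, so a quiet system gives the cleanest signal.
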
### 5. Obsidian and RAG

Vault:

- `/home/will/lab/swarm/swarm-common/obsidian-vault/will/will-shared-zap`

Local REST API:

- HTTP: `127.0.0.1:27123`
- HTTPS: `127.0.0.1:27124`

RAG/vector store:

- ChromaDB path: `~/.hermes/data/rag-search/chroma/`
- Reindex state/progress: active BGE/NPU state in `~/.hermes/data/rag-search/obsidian_bge_npu_index_state.json` and `obsidian_bge_npu_reindex_progress.json`; legacy Ollama state in `obsidian_index_state.json` remains for comparison/fallback.
- Active RAG query/reindex embedding backend: OpenVINO NPU embeddings service on `:18817`, currently `bge-base-en-v1.5-int8-ov`, collection `obsidian_bge_npu`.
- Legacy comparison/fallback collection: `obsidian`, built with Ollama on `:18807` using `nomic-embed-text`.
- Reindex/search endpoint: `POST :18810/reindex` for incremental updates, `POST :18810/reindex?full=true` for full semantic rebuilds, `GET :18810/semantic-health` to verify vectors plus a search smoke test, and `POST :18810/semantic-search` for n8n/Hermes semantic context lookup.
- Reranker path: `RAG_RERANK_ENABLED=true` is set for `:18810/semantic-search` following local bake testing. `/semantic-search` retrieves `RAG_RERANK_INITIAL_K` vector candidates, calls `RAG_RERANK_URL` (`http://127.0.0.1:18818/rerank`), and returns the reranked `RAG_RERANK_TOP_K`. It requires a positive `npu_busy_delta_us` by default (`RAG_RERANK_REQUIRE_NPU_PROOF=true`) and falls back to vector order with `rerank.error` metadata on timeout, error, or non-positive NPU proof. Reranking is request-time only and must not mutate Chroma/vector collections.

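
The reranker fallback rule can be read as a small decision function. The sketch below is an illustration under assumptions, not the endpoint's actual code. The function name `rerank_outcome` and the reason strings are hypothetical. The only external fact relied on is that `curl` exits with code 28 when an operation times out.

```bash
# rerank_outcome CURL_EXIT NPU_BUSY_DELTA_US [REQUIRE_NPU_PROOF]
# Prints "reranked" when the reranker result may be used, otherwise
# "vector_order:<reason>" so the caller keeps the original vector ranking.
# Reason strings are illustrative stand-ins for the rerank.error metadata.
rerank_outcome() {
  local curl_exit="$1" delta="${2:-0}" require_proof="${3:-true}"
  if [ "$curl_exit" -eq 28 ]; then
    echo "vector_order:timeout"          # curl exit 28: operation timed out
  elif [ "$curl_exit" -ne 0 ]; then
    echo "vector_order:error"            # any other transport/HTTP failure
  elif [ "$require_proof" = "true" ] && [ "$delta" -le 0 ]; then
    echo "vector_order:no_npu_proof"     # response arrived, NPU did no work
  else
    echo "reranked"
  fi
}
```

The function never touches the candidate list itself. The caller chooses between two orderings of the same candidates, which matches the rule that reranking must not mutate the vector collections.
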
## Monitoring model

The monitoring design is intentionally layered:

1. **n8n direct probes** check critical service endpoints and send deduped alerts.
2. **agentmon** continuously observes labeled Docker services and OpenClaw state, then writes snapshots through NATS/Postgres.
3. **n8n Agentmon Health Watchdog** polls agentmon's aggregate state and alerts if the monitoring pipeline itself becomes stale/degraded.
4. **Hermes/Atlas** can inspect both n8n and agentmon when troubleshooting, and can use the same endpoints as part of operational checks.

This means a single process being alive is not enough: the important signal is whether collection, ingestion, processing, storage, query, and alerting are all functioning.

## Agentmon Health Watchdog

Workflow source:

- `swarm-common/n8n-workflows/agentmon-health-watchdog.json`

Installed n8n workflow:

- Name: `Agentmon Health Watchdog`
- ID: `AgentmonHealthWatchdog`
- Schedule: every 5 minutes

Alert conditions:

- `agentmon-ingest`, `agentmon-query`, or `agentmon-ui` `/healthz` fails.
- Latest `swarm.snapshot` is missing.
- Latest `swarm.snapshot` is older than 3 minutes.
- Snapshot issues are non-empty.
- Required agentmon services are missing or not healthy/running:
  - `agentmon-ingest`
  - `agentmon-query`
  - `agentmon-ui`
  - `agentmon-processor`
  - `agentmon-swarm-monitor`
  - `agentmon-db`
  - `agentmon-nats`

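
The 3-minute freshness rule can be checked from a shell once the snapshot's timestamp is known. The sketch below assumes GNU `date` (for `-d`) and an ISO 8601 timestamp. The function names are hypothetical, and the event field that holds the timestamp is not specified in this document, so extracting it is left to the caller.

```bash
# snapshot_age_seconds ISO_TIMESTAMP [NOW_EPOCH]
# Prints the age of a snapshot in seconds. NOW_EPOCH defaults to the current
# time and exists mainly to make the calculation reproducible.
snapshot_age_seconds() {
  local ts_epoch now
  ts_epoch="$(date -u -d "$1" +%s)" || return 2   # GNU date assumed
  now="${2:-$(date -u +%s)}"
  echo $(( now - ts_epoch ))
}

# snapshot_is_fresh ISO_TIMESTAMP [MAX_AGE_SECONDS] [NOW_EPOCH]
# Returns 0 when the snapshot is no older than MAX_AGE_SECONDS (default 180,
# matching the 3-minute alert condition), 1 when stale, 2 on a parse error.
snapshot_is_fresh() {
  local age
  age="$(snapshot_age_seconds "$1" "${3:-}")" || return 2
  [ "$age" -le "${2:-180}" ]
}
```

A missing snapshot and a stale one stay distinct: an empty timestamp fails to parse and returns 2, while a stale snapshot returns 1. That lets an alert message say which condition fired.
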
Deduplication:

- Alert after 2 failed checks.
- Reminder every 6 failed runs.
- Recovery message when state returns healthy.

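
These deduplication rules behave like a small state machine keyed on the consecutive failure count. The sketch below is one possible reading, not the workflow's actual node logic. It assumes reminders are counted from the first alert (failures 8, 14, ...) and that a recovery message is sent only if an alert was raised. `watchdog_step` and both constants are hypothetical names.

```bash
ALERT_AFTER=2     # alert once this many consecutive checks have failed
REMIND_EVERY=6    # then remind every N further failed runs (assumed reading)

# watchdog_step RESULT PREV_FAILURES
# RESULT is "ok" or "fail"; PREV_FAILURES is the consecutive failure count
# carried over from the previous run. Prints "<action> <new_failures>",
# where action is one of: none, alert, reminder, recovery.
watchdog_step() {
  local result="$1" prev="${2:-0}" failures action="none"
  if [ "$result" = "ok" ]; then
    # Only announce recovery if an alert had actually been sent.
    [ "$prev" -ge "$ALERT_AFTER" ] && action="recovery"
    failures=0
  else
    failures=$(( prev + 1 ))
    if [ "$failures" -eq "$ALERT_AFTER" ]; then
      action="alert"
    elif [ "$failures" -gt "$ALERT_AFTER" ] &&
         [ $(( (failures - ALERT_AFTER) % REMIND_EVERY )) -eq 0 ]; then
      action="reminder"
    fi
  fi
  echo "$action $failures"
}
```

With a 5-minute schedule, this reading gives a first alert roughly 10 minutes into an outage and a reminder every 30 minutes after that. A single failed check that clears on the next run produces no message at all.
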
## Operational quick checks

From the host:

```bash
cd /home/will/lab/swarm
make status
make local-ai-health
./scripts/npu-service-health.sh # read-only; includes sysfs busy-time proof for :18817
curl -fsS http://127.0.0.1:18810/semantic-health | jq '{status,state,search_ok,result_count}'
curl -fsS http://127.0.0.1:18810/semantic-search \
  -H 'Content-Type: application/json' \
  -d '{"query":"non-private semantic smoke","top_k":2}' \
  | jq '{ok,index,top_k,search_k,rerank,result_count}'
curl -fsS http://127.0.0.1:18808/healthz
curl -fsS http://127.0.0.1:8081/healthz
curl -fsS 'http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1' | jq .
```

From inside `n8n-agent`:

```bash
docker exec n8n-agent /bin/sh -lc '
  wget -qO- -T 5 http://172.19.0.1:18810/healthz
  wget -qO- -T 5 http://172.19.0.1:18814/healthz
  wget -qO- -T 5 http://172.19.0.1:18817/healthz | head -c 500
'
```

Verify n8n workflow activation:

```bash
docker exec -u node n8n-agent n8n export:workflow \
  --id=AgentmonHealthWatchdog \
  --output=/tmp/agentmon-export.json

docker cp n8n-agent:/tmp/agentmon-export.json /tmp/agentmon-export.json
jq '.[0] | {id,name,active,nodes:(.nodes|length)}' /tmp/agentmon-export.json
```

## Notes and pitfalls

- Do not commit `.env`, decrypted credentials, raw credential exports, or runtime DB files.
- n8n workflow backups can contain sensitive operational data; keep timestamped raw backups untracked unless intentionally sanitized.
- From host, use `127.0.0.1:<host-port>`.
- From `n8n-agent`, use `127.0.0.1:5678` for n8n itself and `172.19.0.1:<host-port>` for host-published swarm services.
- Agentmon `/healthz` only proves the web/API process is alive; pair it with snapshot freshness to prove the monitoring pipeline is flowing.
- OpenClaw is intentionally dormant unless explicitly re-enabled; do not alert on VMs being shut off by default.
- OpenVINO NPU sidecars on `:18819`, `:18820`, and `:18829` are live local-only services, but remain isolated specialists. The `:18818` reranker is live as a local request-time second stage for `:18810/semantic-search`; it still falls back to vector order on timeout/error/non-positive NPU proof. Do not draw live Atlas/Hermes routing, memory-write, broad document-processing, or primary-model arrows to these sidecars without a separate approved integration.