diff --git a/README.md b/README.md
index 48b9fa6..86cf32a 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,7 @@ swarm/
│ └── vm/ # VM provisioning role (local)
├── openclaw/ # Live mirror of guest ~/.openclaw/
├── docker-compose.yaml # LiteLLM + supporting services
+├── docs/ # Swarm/agentmon/n8n infrastructure docs + diagrams
├── litellm-config.yaml # LiteLLM static config
├── litellm-init-credentials.sh # Register API keys into LiteLLM DB
├── litellm-init-models.sh # Register models into LiteLLM DB (idempotent)
@@ -29,6 +30,13 @@ swarm/
└── README.md # This file
```
+## Current swarm/service architecture
+
+For the host-side AI/search/voice automation stack, the n8n watchdogs, and the agentmon monitoring layer, see:
+
+- [`docs/swarm-infrastructure.md`](docs/swarm-infrastructure.md) — operational overview and quick checks
+- [`docs/swarm-infrastructure.html`](docs/swarm-infrastructure.html) — dark SVG architecture diagram
+
## VM: zap
| Property | Value |
diff --git a/docs/swarm-infrastructure.html b/docs/swarm-infrastructure.html
new file mode 100644
index 0000000..a5ca953
--- /dev/null
+++ b/docs/swarm-infrastructure.html
@@ -0,0 +1,113 @@
+Will's Swarm Infrastructure
+
+Monitoring model
+ • n8n direct probes critical ports
+ • agentmon aggregates Docker/OpenClaw snapshots
+ • n8n polls agentmon for stale/degraded state
+
+Operational endpoints
+ • n8n: 127.0.0.1:18808
+ • agentmon query/UI: 8081 / 8082
+ • local LLM/embed: 18806 / 18807
+
+Source paths
+ • Swarm repo: ~/lab/swarm
+ • Agentmon repo: ~/lab/agentmon
+ • Workflows: swarm-common/n8n-workflows
diff --git a/docs/swarm-infrastructure.md b/docs/swarm-infrastructure.md
new file mode 100644
index 0000000..24826f7
--- /dev/null
+++ b/docs/swarm-infrastructure.md
@@ -0,0 +1,228 @@
+# Swarm Infrastructure
+
+This document is the source-of-truth overview for Will's local swarm/agent infrastructure on the `zap` workstation. It focuses on the runtime services that support Atlas/Hermes, n8n automation, local model/search/voice tooling, Obsidian/RAG automation, and the new agentmon monitoring layer.
+
+## High-level topology
+
+```text
+Telegram / Discord / Email
+ |
+ v
+Hermes / Atlas gateway (default profile)
+ |
+ +--> local tools and specialist profiles
+ +--> n8n automation workflows on :18808
+
+n8n automation
+ |
+ +--> direct watchdog probes for key service ports
+ +--> Agentmon Health Watchdog -> agentmon-query :8081
+ +--> Obsidian, RAG, voice memo, URL capture, digest workflows
+
+agentmon
+ |
+ +--> agentmon-swarm-monitor -> Docker labels agentmon.monitor=true
+ +--> agentmon-openclaw-monitor -> OpenClaw VM snapshots
+ +--> NATS JetStream -> event processor -> Postgres
+ +--> query API / UI on :8081 / :8082
+
+local AI/search/voice services
+ |
+ +--> LiteLLM :18804
+ +--> SearXNG :18803
+ +--> Brave MCP :18802
+ +--> llama.cpp :18806
+ +--> Ollama embeddings :18807
+ +--> Kokoro TTS :18805
+ +--> Whisper :18811
+```
+
+See also: [`swarm-infrastructure.html`](./swarm-infrastructure.html) for a visual architecture diagram.
+
+## Runtime layers
+
+### 1. Messaging and agent gateway
+
+- **Hermes / Atlas default profile** is the production messaging gateway.
+- Connected platforms include Telegram, Discord, and email.
+- Atlas uses local swarm services where suitable, especially search, local LLMs, embeddings, STT/TTS, n8n, and agentmon.
+- Specialist Hermes profiles are available for delegated work, but the default profile remains the stable production gateway.
+
+### 2. n8n automation
+
+Container/service:
+
+- `n8n-agent`
+- Host URL: `http://127.0.0.1:18808`
+- Container-internal URL (from inside `n8n-agent`): `http://127.0.0.1:5678`
+- Compose file: `/home/will/lab/swarm/docker-compose.yaml`
+
+Important workflow source exports live under:
+
+- `swarm-common/n8n-workflows/`
+
+Current health/automation patterns:
+
+- **Swarm Health Watchdog**: direct endpoint checks for search, LLM, voice, n8n, Docker health, etc.
+- **Agentmon Health Watchdog**: polls agentmon aggregate snapshots and alerts on stale/degraded monitoring state.
+- **RAG and Embedding Health Watchdog**: checks the RAG/search/embedding path.
+- **Obsidian workflows**: health/reindex, inbox triage, daily review, URL-to-note, chat summary capture, weekly decision/runbook extraction.
+
+### 3. Agentmon monitoring layer
+
+Repo:
+
+- `/home/will/lab/agentmon`
+
+Compose services:
+
+- `agentmon-ingest` on `:8080` — ingestion gateway, `/healthz`
+- `agentmon-query` on `:8081` — query API, `/healthz`, `/v1/events`, `/v1/stats/summary`
+- `agentmon-ui` on `:8082` — web UI, `/healthz`
+- `agentmon-processor` — NATS to Postgres event processor
+- `agentmon-swarm-monitor` — monitors Docker containers labeled `agentmon.monitor=true`
+- `agentmon-openclaw-monitor` — emits OpenClaw VM snapshots
+- `agentmon-db` — Postgres
+- `agentmon-nats` — NATS JetStream
+
+Key query endpoints:
+
+```text
+http://127.0.0.1:8080/healthz
+http://127.0.0.1:8081/healthz
+http://127.0.0.1:8082/healthz
+http://127.0.0.1:8081/v1/stats/summary
+http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1
+http://127.0.0.1:8081/v1/events?event_type=swarm.service.snapshot&limit=20
+http://127.0.0.1:8081/v1/events?event_type=openclaw.snapshot&limit=3
+```
+
+From inside `n8n-agent`, `127.0.0.1` is the container itself, so reach host-published services through the Docker bridge gateway:
+
+```text
+http://172.19.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1
+```
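+
+Because the same query paths are used from two network contexts, a small helper that reads the base URL from the environment avoids hard-coding either address. This is a hypothetical sketch: `agentmon_events_url` and `AGENTMON_BASE` are illustrative names, not part of agentmon, and the default base is the host address listed above.
+
+```bash
+# Hypothetical helper: build an agentmon event-query URL for either network context.
+# AGENTMON_BASE is an illustrative variable; it defaults to the host-side query API.
+agentmon_events_url() {
+  local event_type="$1"   # e.g. swarm.snapshot, swarm.service.snapshot, openclaw.snapshot
+  local limit="${2:-1}"   # number of events to return
+  local base="${AGENTMON_BASE:-http://127.0.0.1:8081}"
+  printf '%s/v1/events?event_type=%s&limit=%s\n' "$base" "$event_type" "$limit"
+}
+```
+
+On the host, `curl -fsS "$(agentmon_events_url swarm.service.snapshot 20)" | jq .` reproduces the `swarm.service.snapshot` query above; when scripting inside `n8n-agent`, set `AGENTMON_BASE=http://172.19.0.1:8081` first.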
+
+### 4. Local AI, search, and voice services
+
+Docker services:
+
+- `litellm` — `:18804`, OpenAI-compatible LLM router
+- `litellm-db` — Postgres backing LiteLLM
+- `searxng` — `:18803`, local metasearch
+- `brave-search` — `:18802`, Brave Search MCP server
+- `kokoro-tts` — `:18805`, local TTS
+- `whisper-server` — `:18811`, local transcription
+- `n8n-agent` — `:18808`, automation
+
+Host/user services:
+
+- `llama-server.service` — `:18806`, local llama.cpp OpenAI-compatible LLM
+- `ollama.service` — `:18807`, embeddings API
+- `docker-health-endpoint.service` — `:18809`, read-only container health for n8n
+- `obsidian-reindex-endpoint.service` — `:18810`, Obsidian/RAG reindex trigger
+- `url-content-extractor.service` — `:18812`, YouTube/PDF/web extraction
+- `voice-memo-processor.service` — `:18813`, voice memo processing
+- `rag-embedding-health.service` — `:18814`, RAG/embedding health wrapper
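+
+A quick way to confirm that every port in the two lists above is actually bound is a plain TCP connect sweep. The sketch below assumes bash and coreutils `timeout`, and uses bash's `/dev/tcp` pseudo-device so no curl is needed; `port_open` and `sweep_ports` are illustrative names. An open port only shows that something is listening, not that the service behind it is healthy.
+
+```bash
+# Sketch: TCP reachability sweep over the host-published ports listed above.
+port_open() {
+  local host="$1" port="$2"
+  # Redirecting to /dev/tcp/HOST/PORT in a child bash attempts a TCP connect;
+  # timeout caps each attempt at 2 seconds and stderr noise is discarded.
+  timeout 2 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null
+}
+
+sweep_ports() {
+  local host="${1:-127.0.0.1}" entry name port state
+  # name:port pairs copied from the Docker and host/user service lists
+  for entry in litellm:18804 searxng:18803 brave-search:18802 kokoro-tts:18805 \
+               whisper-server:18811 n8n-agent:18808 llama-server:18806 ollama:18807 \
+               docker-health-endpoint:18809 obsidian-reindex-endpoint:18810 \
+               url-content-extractor:18812 voice-memo-processor:18813 \
+               rag-embedding-health:18814; do
+    name="${entry%%:*}"
+    port="${entry##*:}"
+    if port_open "$host" "$port"; then state=UP; else state=DOWN; fi
+    printf '%-26s %-6s %s\n' "$name" "$port" "$state"
+  done
+}
+```
+
+Running `sweep_ports` prints one line per service; pair any `UP` result with the service's own health endpoint before trusting it.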
+
+### 5. Obsidian and RAG
+
+Vault:
+
+- `/home/will/lab/swarm/swarm-common/obsidian-vault/will/will-shared-zap`
+
+Local REST API:
+
+- HTTP: `127.0.0.1:27123`
+- HTTPS: `127.0.0.1:27124`
+
+RAG/vector store:
+
+- ChromaDB path: `~/.hermes/data/rag-search/chroma/`
+- Embeddings backend: Ollama on `:18807`, normally `nomic-embed-text`
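+
+To check the embedding path end to end, one option is to request a single embedding and confirm the reply contains a numeric vector. This sketch assumes Ollama's `/api/embeddings` endpoint (request with `model` and `prompt`, response with an `embedding` array) and that `nomic-embed-text` is already pulled; `embed_ok` and `has_embedding` are hypothetical helper names.
+
+```bash
+# Sketch: verify the embeddings backend returns a vector for the configured model.
+
+# True if a JSON reply contains a non-empty numeric "embedding" array.
+# Assumes compact JSON output (no space after the colon).
+has_embedding() {
+  [[ "$1" == *'"embedding":['[-0-9]* ]]
+}
+
+embed_ok() {
+  local base="${1:-http://127.0.0.1:18807}" model="${2:-nomic-embed-text}" body
+  # -f makes HTTP errors (e.g. unknown model) fail the request.
+  body=$(curl -fsS --max-time 10 "${base}/api/embeddings" \
+    -H 'Content-Type: application/json' \
+    -d "{\"model\":\"${model}\",\"prompt\":\"healthcheck\"}") || return 1
+  has_embedding "$body"
+}
+```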
+
+## Monitoring model
+
+The monitoring design is intentionally layered:
+
+1. **n8n direct probes** check critical service endpoints and send deduped alerts.
+2. **agentmon** continuously observes labeled Docker services and OpenClaw state, then writes snapshots through NATS/Postgres.
+3. **n8n Agentmon Health Watchdog** polls agentmon's aggregate state and alerts if the monitoring pipeline itself becomes stale/degraded.
+4. **Hermes/Atlas** can inspect both n8n and agentmon when troubleshooting, and can use the same endpoints as part of operational checks.
+
+A single live process is therefore not enough: the signal that matters is whether collection, ingestion, processing, storage, query, and alerting are all functioning together.
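+
+The rule above can be written as a small predicate: healthy means `/healthz` succeeded *and* the newest snapshot is recent. A minimal sketch, assuming the caller has already extracted the snapshot timestamp as Unix epoch seconds (the timestamp field in agentmon's event JSON is not documented here); the 180-second default matches the watchdog's 3-minute staleness threshold, and the function names are hypothetical.
+
+```bash
+# Sketch: "process alive" plus "pipeline flowing" as one predicate.
+snapshot_fresh() {
+  local snapshot_epoch="$1" now_epoch="$2" max_age="${3:-180}"
+  # Fresh when the snapshot is no older than max_age seconds.
+  (( now_epoch - snapshot_epoch <= max_age ))
+}
+
+pipeline_ok() {
+  # $1: "yes" if /healthz succeeded; $2: snapshot epoch; $3: now (defaults to current time)
+  local healthz_ok="$1" snapshot_epoch="$2" now_epoch="${3:-$(date +%s)}"
+  [[ "$healthz_ok" == yes ]] && snapshot_fresh "$snapshot_epoch" "$now_epoch"
+}
+```
+
+A missing snapshot (empty timestamp) evaluates as epoch 0 in bash arithmetic and therefore counts as stale. With GNU `date`, an ISO-8601 timestamp string converts via `date -d "$ts" +%s`.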
+
+## Agentmon Health Watchdog
+
+Workflow source:
+
+- `swarm-common/n8n-workflows/agentmon-health-watchdog.json`
+
+Installed n8n workflow:
+
+- Name: `Agentmon Health Watchdog`
+- ID: `AgentmonHealthWatchdog`
+- Schedule: every 5 minutes
+
+Alert conditions:
+
+- `agentmon-ingest`, `agentmon-query`, or `agentmon-ui` `/healthz` fails.
+- Latest `swarm.snapshot` is missing.
+- Latest `swarm.snapshot` is older than 3 minutes.
+- The latest snapshot reports one or more issues.
+- Required agentmon services are missing or not healthy/running:
+ - `agentmon-ingest`
+ - `agentmon-query`
+ - `agentmon-ui`
+ - `agentmon-processor`
+ - `agentmon-swarm-monitor`
+ - `agentmon-db`
+ - `agentmon-nats`
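+
+The required-services condition can also be checked by hand from `docker ps` output. A sketch, assuming bash 4+ (for associative arrays) and that container names match the service names above; Compose may prefix names with the project name unless `container_name` is set, in which case adjust the list. A service counts as healthy here when its status starts with `Up` and is not flagged `unhealthy`; `check_required` is a hypothetical name.
+
+```bash
+# Sketch: report required agentmon services that are missing or not Up/healthy.
+# Input: "NAME<TAB>STATUS" lines on stdin.
+required_services=(agentmon-ingest agentmon-query agentmon-ui agentmon-processor
+                   agentmon-swarm-monitor agentmon-db agentmon-nats)
+
+check_required() {
+  local -A status=()
+  local name st svc bad=0
+  # Index every container's status text by name.
+  while IFS=$'\t' read -r name st; do
+    [[ -n "$name" ]] && status["$name"]="$st"
+  done
+  for svc in "${required_services[@]}"; do
+    if [[ -z "${status[$svc]+set}" ]]; then
+      echo "MISSING $svc"; bad=1
+    elif [[ "${status[$svc]}" != Up* || "${status[$svc]}" == *unhealthy* ]]; then
+      echo "DEGRADED $svc: ${status[$svc]}"; bad=1
+    fi
+  done
+  return "$bad"
+}
+```
+
+Typical use: `docker ps -a --format '{{.Names}}\t{{.Status}}' | check_required`, which prints one line per problem and exits non-zero if any were found.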
+
+Deduplication:
+
+- Alert after 2 failed checks.
+- Reminder every 6 failed runs.
+- Recovery message when state returns healthy.
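+
+The deduplication rules can be modeled as a pure function over a persisted failure counter. The exact counting is an assumption here: failures are taken as consecutive, the first alert fires on failure 2, reminders fire every 6 further failures (8, 14, ...), and a recovery message is sent only if an alert had already gone out. `dedup_step` is a hypothetical name, and the caller is responsible for persisting the returned count between runs.
+
+```bash
+# Sketch of the dedup rules. Prints "<new_failure_count> <action>".
+dedup_step() {
+  local prev_failures="$1" healthy="$2" alert_after=2 remind_every=6
+  if [[ "$healthy" == yes ]]; then
+    # Reset the counter; announce recovery only if an alert was already sent.
+    if (( prev_failures >= alert_after )); then echo "0 recovery"; else echo "0 none"; fi
+    return
+  fi
+  local n=$(( prev_failures + 1 ))
+  if (( n == alert_after )); then
+    echo "$n alert"
+  elif (( n > alert_after && (n - alert_after) % remind_every == 0 )); then
+    echo "$n reminder"
+  else
+    echo "$n none"
+  fi
+}
+```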
+
+## Operational quick checks
+
+From the host:
+
+```bash
+cd /home/will/lab/swarm
+make status
+make local-ai-health
+curl -fsS http://127.0.0.1:18808/healthz
+curl -fsS http://127.0.0.1:8081/healthz
+curl -fsS 'http://127.0.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1' | jq .
+```
+
+From inside `n8n-agent`:
+
+```bash
+docker exec n8n-agent /bin/sh -lc '
+ wget -qO- -T 5 http://172.19.0.1:8081/healthz
+ wget -qO- -T 5 "http://172.19.0.1:8081/v1/events?event_type=swarm.snapshot&limit=1" | head -c 500
+'
+```
+
+Verify n8n workflow activation:
+
+```bash
+docker exec -u node n8n-agent n8n export:workflow \
+ --id=AgentmonHealthWatchdog \
+ --output=/tmp/agentmon-export.json
+
+docker cp n8n-agent:/tmp/agentmon-export.json /tmp/agentmon-export.json
+jq '.[0] | {id,name,active,nodes:(.nodes|length)}' /tmp/agentmon-export.json
+```
+
+## Notes and pitfalls
+
+- Do not commit `.env`, decrypted credentials, raw credential exports, or runtime DB files.
+- n8n workflow backups can contain sensitive operational data; keep timestamped raw backups untracked unless intentionally sanitized.
+- From the host, use `127.0.0.1:` followed by the service's published port.
+- From `n8n-agent`, use `127.0.0.1:5678` for n8n itself and `172.19.0.1:` followed by the published port for host-published swarm services.
+- Agentmon `/healthz` only proves the web/API process is alive; pair it with snapshot freshness to prove the monitoring pipeline is flowing.
+- OpenClaw is intentionally dormant unless explicitly re-enabled; do not alert on VMs being shut off by default.