diff --git a/docs/plans/2026-03-18-swarm-monitor-plan.md b/docs/plans/2026-03-18-swarm-monitor-plan.md
new file mode 100644
index 0000000..8665f1c
--- /dev/null
+++ b/docs/plans/2026-03-18-swarm-monitor-plan.md
@@ -0,0 +1,1282 @@
+# Swarm Monitor Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Add a `swarm-monitor` binary that polls docker-compose services in `~/lab/swarm`, emits `swarm.snapshot` and `swarm.service.snapshot` events to NATS, and surfaces service status on the dashboard strip and a new unified `/infrastructure` page (replacing `/openclaw`).
+
+**Architecture:** New `cmd/swarm-monitor/main.go` polls by shelling out to the Docker CLI (`docker ps`, `docker inspect`) and running HTTP probes, emitting two event types per poll. The existing NATS → event-processor → postgres → query-api pipeline requires zero changes. Frontend adds a swarm strip to the dashboard and merges VM cards + service cards on a renamed `/infrastructure` page.
+
+**Tech Stack:** Go (exec/docker CLI, net/http), vanilla JS, existing NATS publisher pattern
+
+---
+
+### Task 1: Add agentmon labels to docker-compose.yaml
+
+**Files:**
+- Modify: `/home/will/lab/swarm/docker-compose.yaml`
+
+**Step 1: Add labels to each service**
+
+Add a `labels:` block to each monitored service. `litellm-init` is a one-shot container — do NOT label it.
+
+For `whisper-server` (after its `healthcheck:` block):
+```yaml
+    labels:
+      agentmon.monitor: "true"
+      agentmon.role: "voice"
+      agentmon.port: "18801"
+```
+
+For `kokoro-tts` (after `restart: unless-stopped`):
+```yaml
+    labels:
+      agentmon.monitor: "true"
+      agentmon.role: "voice"
+      agentmon.port: "18805"
+```
+
+For `brave-search` (after its `environment:` block):
+```yaml
+    labels:
+      agentmon.monitor: "true"
+      agentmon.role: "mcp"
+      agentmon.port: "18802"
+```
+
+For `searxng` (after its `volumes:` block):
+```yaml
+    labels:
+      agentmon.monitor: "true"
+      agentmon.role: "search"
+      agentmon.port: "18803"
+```
+
+For `litellm` (after its `healthcheck:` block):
+```yaml
+    labels:
+      agentmon.monitor: "true"
+      agentmon.role: "llm-proxy"
+      agentmon.port: "18804"
+```
+
+For `litellm-db` (after its `healthcheck:` block):
+```yaml
+    labels:
+      agentmon.monitor: "true"
+      agentmon.role: "db"
+```
+
+For `n8n-agent` (after its `healthcheck:` block):
+```yaml
+    labels:
+      agentmon.monitor: "true"
+      agentmon.role: "automation"
+      agentmon.port: "18808"
+```
+
+**Step 2: Verify labels appear in running containers**
+
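+Labels are applied when a container is created, so recreate the affected services first (for example, `docker compose up -d` for the active profiles).
+
+Run: `docker ps --filter label=agentmon.monitor=true --format "table {{.Names}}\t{{.Status}}"`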
+
+Expected: lists currently-running swarm containers (whichever profiles are active).
+
+**Step 3: Commit**
+
+```bash
+cd /home/will/lab/swarm
+git add docker-compose.yaml
+git commit -m "feat: add agentmon monitor labels to swarm services"
+```
+
+---
+
+### Task 2: Create swarm types
+
+**Files:**
+- Create: `internal/monitor/swarm/types.go`
+
+**Step 1: Create the types file**
+
+```go
+package swarm
+
+import "time"
+
+// ServiceSnapshot holds the collected state for one docker-compose service.
+type ServiceSnapshot struct {
+	Name           string         `json:"name"`
+	Role           string         `json:"role"`
+	ContainerState string         `json:"container_state"` // Docker State.Status (running/exited/paused/...) or "missing"
+	HealthState    string         `json:"health_state"`    // healthy/unhealthy/starting/none
+	Status         string         `json:"status"`          // healthy/degraded/down
+	UptimeSec      int64          `json:"uptime_sec,omitempty"`
+	HTTPStatus     *int           `json:"http_status,omitempty"`
+	Extra          map[string]any `json:"extra,omitempty"`
+}
+
+// SwarmSnapshot holds a rolled-up snapshot of all labeled services.
+type SwarmSnapshot struct {
+	Services  []ServiceSnapshot `json:"services"`
+	Issues    Issues            `json:"issues"`
+	Timestamp time.Time         `json:"timestamp"`
+}
+
+// Issues flags notable problems detected during a poll.
+type Issues struct {
+	ServiceDown     []string `json:"service_down,omitempty"`
+	ServiceDegraded []string `json:"service_degraded,omitempty"`
+	LLMCooldowns    bool     `json:"llm_cooldowns,omitempty"`
+}
+```
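+
+To illustrate what the frontend receives, the sketch below marshals a trimmed copy of `ServiceSnapshot` (the copy and the `encode` helper are for illustration only, not part of the plan). It shows that `omitempty` drops `uptime_sec`, `http_status`, and `extra` when they are unset, which is why the card renderers need fallbacks such as `'-'`.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+)
+
+// Trimmed copy of ServiceSnapshot from types.go, for illustration only.
+type ServiceSnapshot struct {
+	Name       string         `json:"name"`
+	Status     string         `json:"status"`
+	UptimeSec  int64          `json:"uptime_sec,omitempty"`
+	HTTPStatus *int           `json:"http_status,omitempty"`
+	Extra      map[string]any `json:"extra,omitempty"`
+}
+
+// encode is a hypothetical helper that returns the JSON form of a snapshot.
+func encode(s ServiceSnapshot) string {
+	b, err := json.Marshal(s)
+	if err != nil {
+		panic(err)
+	}
+	return string(b)
+}
+
+func main() {
+	code := 200
+	// Probed service: all optional fields that are set appear in the output.
+	fmt.Println(encode(ServiceSnapshot{Name: "searxng", Status: "healthy", UptimeSec: 42, HTTPStatus: &code}))
+	// Stopped service: zero-valued optional fields are omitted entirely.
+	fmt.Println(encode(ServiceSnapshot{Name: "litellm-db", Status: "down"}))
+}
+```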
+
+**Step 2: Verify it compiles**
+
+Run: `cd /home/will/lab/agentmon && go build ./internal/monitor/swarm/`
+Expected: no errors
+
+**Step 3: Commit**
+
+```bash
+git add internal/monitor/swarm/types.go
+git commit -m "feat: add swarm monitor types"
+```
+
+---
+
+### Task 3: Create swarm collector
+
+**Files:**
+- Create: `internal/monitor/swarm/collector.go`
+
+**Step 1: Create the collector**
+
+```go
+package swarm
+
+import (
+	"context"
+	"encoding/json"
+	"fmt"
+	"net/http"
+	"os/exec"
+	"strconv"
+	"strings"
+	"time"
+)
+
+// Config holds collector configuration.
+type Config struct {
+	LiteLLMBaseURL string
+	LiteLLMAPIKey  string
+	HTTPTimeout    time.Duration
+}
+
+// dockerPsEntry is the JSON shape from `docker ps --format '{{json .}}'`.
+type dockerPsEntry struct {
+	ID     string `json:"ID"`
+	Names  string `json:"Names"`
+	Status string `json:"Status"`
+	State  string `json:"State"`
+}
+
+// dockerInspectEntry is the minimal shape we need from `docker inspect`.
+type dockerInspectEntry struct {
+	Name  string `json:"Name"`
+	State struct {
+		Status    string `json:"Status"`
+		Running   bool   `json:"Running"`
+		StartedAt string `json:"StartedAt"`
+		Health    *struct {
+			Status string `json:"Status"`
+		} `json:"Health"`
+	} `json:"State"`
+	Config struct {
+		Labels map[string]string `json:"Labels"`
+	} `json:"Config"`
+}
+
+// CollectAll lists all containers labeled agentmon.monitor=true and collects
+// a ServiceSnapshot for each.
+func CollectAll(ctx context.Context, cfg Config) ([]ServiceSnapshot, error) {
+	// List labeled containers (running + stopped).
+	out, err := exec.CommandContext(ctx, "docker", "ps", "-a",
+		"--filter", "label=agentmon.monitor=true",
+		"--format", "{{json .}}",
+	).Output()
+	if err != nil {
+		return nil, fmt.Errorf("docker ps failed: %w", err)
+	}
+
+	var entries []dockerPsEntry
+	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
+		if line == "" {
+			continue
+		}
+		var e dockerPsEntry
+		if err := json.Unmarshal([]byte(line), &e); err != nil {
+			continue
+		}
+		entries = append(entries, e)
+	}
+
+	client := &http.Client{Timeout: cfg.HTTPTimeout}
+	var snapshots []ServiceSnapshot
+	for _, e := range entries {
+		snap := collectOne(ctx, e.Names, client, cfg)
+		snapshots = append(snapshots, snap)
+	}
+
+	return snapshots, nil
+}
+
+func collectOne(ctx context.Context, name string, client *http.Client, cfg Config) ServiceSnapshot {
+	snap := ServiceSnapshot{
+		Name:           name,
+		ContainerState: "missing",
+		HealthState:    "none",
+		Status:         "down",
+	}
+
+	// Inspect for detailed state.
+	out, err := exec.CommandContext(ctx, "docker", "inspect", "--format", "{{json .}}", name).Output()
+	if err != nil {
+		return snap
+	}
+
+	var detail dockerInspectEntry
+	if err := json.Unmarshal(out, &detail); err != nil {
+		return snap
+	}
+
+	snap.Role = detail.Config.Labels["agentmon.role"]
+	snap.ContainerState = detail.State.Status
+
+	if detail.State.Health != nil {
+		snap.HealthState = detail.State.Health.Status
+	}
+
+	// Calculate uptime if running.
+	if detail.State.Running && detail.State.StartedAt != "" {
+		if t, err := time.Parse(time.RFC3339Nano, detail.State.StartedAt); err == nil {
+			snap.UptimeSec = int64(time.Since(t).Seconds())
+		}
+	}
+
+	// Role-specific probes.
+	switch snap.Role {
+	case "llm-proxy":
+		collectLLMProxy(ctx, &snap, client, cfg)
+	case "search":
+		collectHTTPProbe(ctx, &snap, client, "http://localhost:"+detail.Config.Labels["agentmon.port"]+"/")
+	case "mcp":
+		collectPortProbe(ctx, &snap, detail.Config.Labels["agentmon.port"])
+	case "db", "voice", "automation":
+		// Docker healthcheck state is sufficient; no HTTP probe.
+	}
+
+	snap.Status = deriveStatus(snap)
+	return snap
+}
+
+func collectLLMProxy(ctx context.Context, snap *ServiceSnapshot, client *http.Client, cfg Config) {
+	if snap.Extra == nil {
+		snap.Extra = make(map[string]any)
+	}
+
+	// Health probe. Check the request error: a malformed base URL yields a nil request.
+	if req, err := http.NewRequestWithContext(ctx, http.MethodGet, cfg.LiteLLMBaseURL+"/health/liveliness", nil); err == nil {
+		code := 0 // 0 marks a failed probe so deriveStatus reports degraded
+		if resp, err := client.Do(req); err == nil {
+			code = resp.StatusCode
+			resp.Body.Close()
+		}
+		snap.HTTPStatus = &code
+	}
+
+	// Model count (requires the master key).
+	if cfg.LiteLLMAPIKey == "" {
+		return
+	}
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, cfg.LiteLLMBaseURL+"/v2/model/info", nil)
+	if err != nil {
+		return
+	}
+	req.Header.Set("Authorization", "Bearer "+cfg.LiteLLMAPIKey)
+	resp, err := client.Do(req)
+	if err != nil {
+		return
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusOK {
+		return // e.g. 401 with a bad key; avoid reporting a misleading count of 0
+	}
+	var result struct {
+		Data []struct {
+			ModelName string `json:"model_name"`
+		} `json:"data"`
+	}
+	if json.NewDecoder(resp.Body).Decode(&result) == nil {
+		snap.Extra["model_count"] = len(result.Data)
+	}
+}
+
+func collectHTTPProbe(ctx context.Context, snap *ServiceSnapshot, client *http.Client, url string) {
+	start := time.Now()
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
+	if err != nil {
+		return // malformed URL, e.g. a non-numeric agentmon.port label
+	}
+	resp, err := client.Do(req)
+	if err != nil {
+		// Record 0 so deriveStatus treats an unreachable endpoint as degraded.
+		code := 0
+		snap.HTTPStatus = &code
+		return
+	}
+	defer resp.Body.Close()
+	code := resp.StatusCode
+	snap.HTTPStatus = &code
+	if snap.Extra == nil {
+		snap.Extra = make(map[string]any)
+	}
+	snap.Extra["response_ms"] = time.Since(start).Milliseconds()
+}
+
+func collectPortProbe(ctx context.Context, snap *ServiceSnapshot, port string) {
+	if port == "" {
+		return
+	}
+	// Use nc to check TCP reachability.
+	err := exec.CommandContext(ctx, "nc", "-z", "-w1", "localhost", port).Run()
+	reachable := err == nil
+	if snap.Extra == nil {
+		snap.Extra = make(map[string]any)
+	}
+	snap.Extra["port_reachable"] = reachable
+}
+
+// deriveStatus computes the overall status from container state + health + probes.
+func deriveStatus(snap ServiceSnapshot) string {
+	if snap.ContainerState != "running" {
+		return "down"
+	}
+	if snap.HealthState == "unhealthy" {
+		return "degraded"
+	}
+	if snap.HTTPStatus != nil && (*snap.HTTPStatus < 200 || *snap.HTTPStatus >= 400) {
+		return "degraded"
+	}
+	if reachable, ok := snap.Extra["port_reachable"].(bool); ok && !reachable {
+		return "degraded"
+	}
+	return "healthy"
+}
+
+// DetectIssues scans a set of snapshots for notable problems.
+func DetectIssues(services []ServiceSnapshot) Issues {
+	issues := Issues{}
+	for _, s := range services {
+		switch s.Status {
+		case "down":
+			issues.ServiceDown = append(issues.ServiceDown, s.Name)
+		case "degraded":
+			issues.ServiceDegraded = append(issues.ServiceDegraded, s.Name)
+		}
+		if s.Role == "llm-proxy" {
+			if extra := s.Extra; extra != nil {
+				if count, ok := extra["cooldown_count"].(int); ok && count > 0 {
+					issues.LLMCooldowns = true
+				}
+			}
+		}
+	}
+	return issues
+}
+
+func intPtr(v int) *int { return &v }
+
+// Keep the strconv import referenced until it is needed; an unused import is a compile error.
+var _ = strconv.Itoa
+```
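+
+The `nc` call in `collectPortProbe` depends on netcat being installed wherever the monitor runs. As a possible alternative, the sketch below performs the same TCP reachability check with the standard library's `net.Dialer`. The `portReachable` name and its parameters are hypothetical, and adopting it would mean adding `"net"` to the collector's imports.
+
+```go
+package main
+
+import (
+	"context"
+	"fmt"
+	"net"
+	"time"
+)
+
+// portReachable reports whether a TCP connection to host:port succeeds
+// within the timeout. Hypothetical stand-in for the `nc -z -w1` call.
+func portReachable(ctx context.Context, host, port string, timeout time.Duration) bool {
+	d := net.Dialer{Timeout: timeout}
+	conn, err := d.DialContext(ctx, "tcp", net.JoinHostPort(host, port))
+	if err != nil {
+		return false
+	}
+	conn.Close()
+	return true
+}
+
+func main() {
+	// Open a listener on an ephemeral port so the example has something to probe.
+	ln, err := net.Listen("tcp", "127.0.0.1:0")
+	if err != nil {
+		panic(err)
+	}
+	defer ln.Close()
+	_, port, _ := net.SplitHostPort(ln.Addr().String())
+
+	fmt.Println("reachable:", portReachable(context.Background(), "127.0.0.1", port, time.Second))
+}
+```
+
+Dialing in-process also lets the probe honor the poll's `context` directly instead of relying on a child process being killed.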
+
+**Step 2: Verify it compiles**
+
+Run: `cd /home/will/lab/agentmon && go build ./internal/monitor/swarm/`
+Expected: no errors
+
+**Step 3: Commit**
+
+```bash
+git add internal/monitor/swarm/collector.go
+git commit -m "feat: add swarm collector with docker inspect + HTTP probes"
+```
+
+---
+
+### Task 4: Create swarm-monitor binary
+
+**Files:**
+- Create: `cmd/swarm-monitor/main.go`
+
+**Step 1: Create the binary**
+
+```go
+package main
+
+import (
+	"context"
+	"encoding/json"
+	"log"
+	"os"
+	"time"
+
+	"agentmon/internal/monitor/swarm"
+	qnats "agentmon/internal/queue/nats"
+)
+
+func main() {
+	natsURL := envDefault("NATS_URL", "nats://nats:4222")
+	natsTopic := envDefault("NATS_TOPIC", "agentmon.events.v1")
+	interval := envDefault("POLL_INTERVAL", "30s")
+	litellmBase := envDefault("LITELLM_BASE_URL", "http://localhost:18804")
+	litellmKey := os.Getenv("LITELLM_MASTER_KEY")
+
+	pub, err := qnats.NewPublisher(natsURL, natsTopic)
+	if err != nil {
+		log.Fatalf("failed to connect to NATS: %v", err)
+	}
+	defer pub.Close()
+
+	pollDuration, err := time.ParseDuration(interval)
+	if err != nil {
+		log.Fatalf("invalid poll interval: %v", err)
+	}
+
+	cfg := swarm.Config{
+		LiteLLMBaseURL: litellmBase,
+		LiteLLMAPIKey:  litellmKey,
+		HTTPTimeout:    5 * time.Second,
+	}
+
+	ticker := time.NewTicker(pollDuration)
+	defer ticker.Stop()
+
+	ctx := context.Background()
+	log.Printf("swarm-monitor started, polling every %s", pollDuration)
+
+	// Poll immediately on start.
+	if err := poll(ctx, pub, cfg); err != nil {
+		log.Printf("initial poll error: %v", err)
+	}
+
+	for range ticker.C {
+		if err := poll(ctx, pub, cfg); err != nil {
+			log.Printf("poll error: %v", err)
+		}
+	}
+}
+
+func poll(ctx context.Context, pub *qnats.Publisher, cfg swarm.Config) error {
+	services, err := swarm.CollectAll(ctx, cfg)
+	if err != nil {
+		return err
+	}
+
+	issues := swarm.DetectIssues(services)
+	now := time.Now().UTC()
+
+	// Emit rolled-up swarm.snapshot.
+	if err := emit(ctx, pub, "swarm.snapshot", "agentmon.swarm", map[string]any{
+		"services": services,
+		"issues":   issues,
+	}, now); err != nil {
+		log.Printf("failed to emit swarm.snapshot: %v", err)
+	}
+
+	// Emit one swarm.service.snapshot per service.
+	for _, svc := range services {
+		if err := emit(ctx, pub, "swarm.service.snapshot", "agentmon.swarm.service", map[string]any{
+			"service": svc,
+		}, now); err != nil {
+			log.Printf("failed to emit swarm.service.snapshot for %s: %v", svc.Name, err)
+		}
+	}
+
+	return nil
+}
+
+func emit(ctx context.Context, pub *qnats.Publisher, eventType, schemaName string, payload map[string]any, ts time.Time) error {
+	event := map[string]any{
+		"schema": map[string]any{
+			"name":    schemaName,
+			"version": 1,
+		},
+		"event": map[string]any{
+			"id":   generateID(),
+			"type": eventType,
+			"ts":   ts.Format(time.RFC3339Nano),
+		},
+		"payload": payload,
+	}
+
+	data, err := json.Marshal(event)
+	if err != nil {
+		return err
+	}
+
+	return pub.Publish(ctx, data)
+}
+
+func generateID() string {
+	return time.Now().Format("20060102150405") + "-" + randomString(8)
+}
+
+func randomString(n int) string {
+	const chars = "abcdefghijklmnopqrstuvwxyz0123456789"
+	b := make([]byte, n)
+	for i := range b {
+		b[i] = chars[time.Now().Nanosecond()%len(chars)]
+		time.Sleep(time.Nanosecond)
+	}
+	return string(b)
+}
+
+func envDefault(key, def string) string {
+	if v := os.Getenv(key); v != "" {
+		return v
+	}
+	return def
+}
+```
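+
+`randomString` above derives each character from the clock, so IDs generated in quick succession are not strongly random. If stronger uniqueness is wanted, one option is to draw the suffix from `crypto/rand`. The sketch below is an assumption, not part of the existing publisher pattern; `newEventID` is a hypothetical name, and it emits a hex suffix rather than the `a-z0-9` alphabet used above.
+
+```go
+package main
+
+import (
+	"crypto/rand"
+	"encoding/hex"
+	"fmt"
+	"time"
+)
+
+// newEventID returns a UTC timestamp prefix plus 8 random hex characters.
+func newEventID() (string, error) {
+	buf := make([]byte, 4) // 4 random bytes encode to 8 hex characters
+	if _, err := rand.Read(buf); err != nil {
+		return "", err
+	}
+	return time.Now().UTC().Format("20060102150405") + "-" + hex.EncodeToString(buf), nil
+}
+
+func main() {
+	id, err := newEventID()
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(id)
+}
+```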
+
+**Step 2: Verify it compiles**
+
+Run: `cd /home/will/lab/agentmon && go build ./cmd/swarm-monitor/`
+Expected: no errors
+
+**Step 3: Verify all binaries still build**
+
+Run: `cd /home/will/lab/agentmon && go build ./...`
+Expected: no errors
+
+**Step 4: Commit**
+
+```bash
+git add cmd/swarm-monitor/main.go
+git commit -m "feat: add swarm-monitor binary"
+```
+
+---
+
+### Task 5: Dashboard swarm strip
+
+**Files:**
+- Modify: `cmd/web-ui/static/app.js`
+- Modify: `cmd/web-ui/static/style.css`
+
+**Step 1: Add swarmState and merge function to app.js**
+
+Near the top of the IIFE, alongside the existing `let openclawState = ...` declaration (line ~49), add:
+
+```js
+let swarmState = { services: {} }; // keyed by service name
+```
+
+After the existing `mergeOpenClawEvents` function (~line 716), add:
+
+```js
+function mergeSwarmSnapshot(evt) {
+  const payload = getEnvelopePayload(evt);
+  const services = payload.services || [];
+  for (const svc of services) {
+    if (svc.name) swarmState.services[svc.name] = svc;
+  }
+}
+
+function mergeSwarmServiceSnapshot(evt) {
+  const payload = getEnvelopePayload(evt);
+  const svc = payload.service;
+  if (svc && svc.name) swarmState.services[svc.name] = svc;
+}
+```
+
+**Step 2: Add swarm strip to renderDashboard**
+
+In `renderDashboard()`, the HTML template already has:
+```html
+<div class="vm-strip" id="dash-vm-strip" style="margin-bottom:1.5rem"></div>
+```
+
+Right after that line, add a swarm strip div:
+```html
+<div class="swarm-strip" id="dash-swarm-strip"></div>
+```
+
+**Step 3: Add renderSwarmStrip_dash function**
+
+After the `renderAgentVMStrip_dash` function (~line 1351), add:
+
+```js
+function renderSwarmStrip_dash() {
+  const strip = document.getElementById('dash-swarm-strip');
+  if (!strip) return;
+  const services = Object.values(swarmState.services);
+  if (services.length === 0) return;
+  strip.innerHTML = services.map(svc => {
+    const statusClass = svc.status === 'healthy' ? 'active'
+      : svc.status === 'degraded' ? 'degraded' : 'inactive';
+    const label = svc.status || 'unknown';
+    return `
+      <div class="vm-pill ${statusClass}">
+        <span class="vm-pill-dot"></span>
+        <span class="vm-pill-name">${escapeHTML(svc.name)}</span>
+        <span class="vm-pill-label">${escapeHTML(label)}</span>
+      </div>
+    `;
+  }).join('');
+}
+```
+
+**Step 4: Wire swarm strip into dashboard data load**
+
+In `renderDashboard()`, the `Promise.all` block loads the initial data. Fetch the swarm snapshots in that same call so the merge runs inside the `try` block, before the `if (!isCurrentPath('/')) return;` guard.
+
+Replace the `Promise.all` call in `renderDashboard` with:
+```js
+const [summaryData, tsData, recentData, snapshots, swarmSnaps] = await Promise.all([
+  api('/v1/stats/summary'),
+  api('/v1/stats/timeseries?window=1h'),
+  api('/v1/events?limit=20'),
+  api('/v1/events?event_type=openclaw.snapshot&limit=100').catch(() => ({ events: [] })),
+  api('/v1/events?event_type=swarm.snapshot&limit=10').catch(() => ({ events: [] })),
+]);
+```
+
+Then after `renderAgentVMStrip_dash()`:
+```js
+for (const evt of swarmSnaps.events || []) mergeSwarmSnapshot(evt);
+renderSwarmStrip_dash();
+```
+
+**Step 5: Handle swarm events in handleDashboardWS**
+
+In `handleDashboardWS`, after the `openclaw.snapshot` handler block, add:
+
+```js
+if (eventType === 'swarm.snapshot') {
+  mergeSwarmSnapshot(msg.data);
+  renderSwarmStrip_dash();
+  return;
+}
+if (eventType === 'swarm.service.snapshot') {
+  mergeSwarmServiceSnapshot(msg.data);
+  renderSwarmStrip_dash();
+  return;
+}
+```
+
+**Step 6: Add swarm strip CSS**
+
+In `style.css`, after the `.vm-pill-label` block (~line 750), add:
+
+```css
+/* ── Swarm strip ──────────────────────────────────────────── */
+.swarm-strip {
+  display: flex;
+  flex-wrap: wrap;
+  gap: 0.75rem;
+  margin-bottom: 1.5rem;
+}
+
+.vm-pill.degraded {
+  border-color: rgba(251, 191, 36, 0.3);
+}
+
+.vm-pill.degraded .vm-pill-dot {
+  background: var(--warning);
+}
+```
+
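+**Step 7: Verify no JS errors**
+
+Build check: `cd /home/will/lab/agentmon && go build ./...`
+Expected: no errors. `go build` does not parse JavaScript, so also load the dashboard in a browser and confirm the console shows no errors.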
+
+**Step 8: Commit**
+
+```bash
+git add cmd/web-ui/static/app.js cmd/web-ui/static/style.css
+git commit -m "feat: add swarm strip to dashboard"
+```
+
+---
+
+### Task 6: Infrastructure page CSS
+
+**Files:**
+- Modify: `cmd/web-ui/static/style.css`
+
+**Step 1: Add infrastructure page styles**
+
+Append to the end of `style.css`:
+
+```css
+/* ── Infrastructure page ──────────────────────────────────── */
+.infra-section-title {
+  font-family: var(--font-display);
+  font-size: 0.75rem;
+  font-weight: 700;
+  color: var(--text-dim);
+  text-transform: uppercase;
+  letter-spacing: 0.12em;
+  margin: 0 0 1rem 0;
+}
+
+.infra-section {
+  margin-bottom: 2rem;
+}
+
+/* Service card grid */
+.service-grid {
+  display: grid;
+  grid-template-columns: repeat(auto-fill, minmax(260px, 1fr));
+  gap: 1.25rem;
+}
+
+.service-card {
+  background: var(--surface);
+  border: 1px solid var(--border);
+  border-radius: var(--radius-lg);
+  padding: 1.125rem 1.25rem;
+  display: flex;
+  flex-direction: column;
+  gap: 0.75rem;
+  transition: border-color 0.2s;
+}
+
+.service-card:hover {
+  border-color: rgba(34, 211, 238, 0.15);
+}
+
+.service-card-header {
+  display: flex;
+  align-items: center;
+  justify-content: space-between;
+}
+
+.service-card-name {
+  font-family: var(--font-mono);
+  font-size: 0.88rem;
+  font-weight: 600;
+  color: var(--text-bright);
+}
+
+.service-badge {
+  font-size: 0.65rem;
+  font-weight: 700;
+  text-transform: uppercase;
+  letter-spacing: 0.08em;
+  padding: 0.2rem 0.55rem;
+  border-radius: 999px;
+}
+
+.service-badge.healthy {
+  background: rgba(52, 211, 153, 0.12);
+  color: var(--success);
+  border: 1px solid rgba(52, 211, 153, 0.2);
+}
+
+.service-badge.degraded {
+  background: rgba(251, 191, 36, 0.12);
+  color: var(--warning);
+  border: 1px solid rgba(251, 191, 36, 0.2);
+}
+
+.service-badge.down {
+  background: rgba(248, 113, 113, 0.12);
+  color: var(--error);
+  border: 1px solid rgba(248, 113, 113, 0.2);
+}
+
+.service-role-tag {
+  font-size: 0.65rem;
+  font-family: var(--font-mono);
+  color: var(--text-dim);
+  margin-top: -0.25rem;
+}
+
+.service-stats {
+  display: flex;
+  flex-direction: column;
+  gap: 0.3rem;
+  font-size: 0.78rem;
+}
+
+.service-stat-row {
+  display: flex;
+  justify-content: space-between;
+  align-items: center;
+}
+
+.service-stat-label {
+  color: var(--text-dim);
+  font-family: var(--font-mono);
+  font-size: 0.72rem;
+}
+
+.service-stat-value {
+  color: var(--text);
+  font-family: var(--font-mono);
+  font-size: 0.75rem;
+}
+
+.service-stat-value.ok   { color: var(--success); }
+.service-stat-value.warn { color: var(--warning); }
+.service-stat-value.bad  { color: var(--error); }
+
+/* LiteLLM cooldown warning */
+.llm-cooldown-banner {
+  background: rgba(251, 191, 36, 0.08);
+  border: 1px solid rgba(251, 191, 36, 0.2);
+  border-radius: var(--radius);
+  padding: 0.4rem 0.625rem;
+  font-size: 0.72rem;
+  color: var(--warning);
+  font-family: var(--font-mono);
+}
+
+/* LiteLLM model count highlight */
+.llm-model-count {
+  font-family: var(--font-display);
+  font-size: 1.5rem;
+  font-weight: 800;
+  color: var(--text-bright);
+  letter-spacing: -0.02em;
+  line-height: 1;
+}
+
+.llm-model-label {
+  font-size: 0.68rem;
+  color: var(--text-dim);
+  text-transform: uppercase;
+  letter-spacing: 0.08em;
+}
+```
+
+**Step 2: Commit**
+
+```bash
+git add cmd/web-ui/static/style.css
+git commit -m "feat: add infrastructure page CSS"
+```
+
+---
+
+### Task 7: Infrastructure page JS + nav rename
+
+**Files:**
+- Modify: `cmd/web-ui/static/app.js`
+- Modify: `cmd/web-ui/static/index.html`
+
+**Step 1: Update nav in index.html**
+
+Change the nav link from `OpenClaw` to `Infra` and update the href:
+
+Old:
+```html
+<nav><a href="/">Dashboard</a><a href="/sessions">Sessions</a><a href="/agents">Agents</a><a href="/openclaw">OpenClaw</a></nav>
+```
+
+New:
+```html
+<nav><a href="/">Dashboard</a><a href="/sessions">Sessions</a><a href="/agents">Agents</a><a href="/infrastructure">Infra</a></nav>
+```
+
+**Step 2: Update the router in app.js**
+
+Change line ~153:
+```js
+} else if (path.startsWith('/openclaw')) {
+  renderOpenClaw();
+```
+to:
+```js
+} else if (path.startsWith('/infrastructure')) {
+  renderInfrastructure();
+```
+
+**Step 3: Add infraUnsubscribe state variable**
+
+Near the existing `let openclawUnsubscribe = null;` declaration (~line 50), add:
+```js
+let infraUnsubscribe = null;
+```
+
+**Step 4: Update cleanupLiveViews to clean up infra subscription**
+
+Find the `cleanupLiveViews` function (~line 107). Replace:
+```js
+if (openclawUnsubscribe) {
+  openclawUnsubscribe();
+  openclawUnsubscribe = null;
+}
+```
+with:
+```js
+if (openclawUnsubscribe) {
+  openclawUnsubscribe();
+  openclawUnsubscribe = null;
+}
+if (infraUnsubscribe) {
+  infraUnsubscribe();
+  infraUnsubscribe = null;
+}
+```
+
+**Step 5: Replace renderOpenClaw with renderInfrastructure**
+
+Replace the existing `renderOpenClaw` function (lines ~664-680) entirely with:
+
+```js
+async function renderInfrastructure() {
+  app.innerHTML = '<div class="page-header"><h2>Infrastructure</h2></div><p class="empty-state">Loading...</p>';
+
+  infraUnsubscribe = subscribeWS(handleInfraWS);
+
+  try {
+    const [ocData, swarmData] = await Promise.all([
+      api('/v1/events?event_type=openclaw.snapshot&limit=100'),
+      api('/v1/events?event_type=swarm.snapshot&limit=10').catch(() => ({ events: [] })),
+    ]);
+
+    mergeOpenClawEvents(ocData.events || []);
+    for (const evt of swarmData.events || []) mergeSwarmSnapshot(evt);
+
+    if (isCurrentPath('/infrastructure')) {
+      renderInfraGrid();
+    }
+  } catch (e) {
+    if (isCurrentPath('/infrastructure')) {
+      app.innerHTML = `<div class="page-header"><h2>Infrastructure</h2></div><p class="empty-state">Error: ${escapeHTML(e.message)}</p>`;
+    }
+  }
+}
+```
+
+**Step 6: Replace handleOpenClawWS with handleInfraWS**
+
+Replace the existing `handleOpenClawWS` function (lines ~682-699) with:
+
+```js
+function handleInfraWS(msg) {
+  if (msg.type !== 'message') return;
+
+  const eventType = getEnvelopeType(msg.data);
+
+  if (eventType === 'openclaw.snapshot') {
+    mergeOpenClawEvents([msg.data]);
+    if (isCurrentPath('/infrastructure')) renderInfraGrid();
+    if (isCurrentPath('/agents')) renderAgentVMStrip();
+    return;
+  }
+
+  if (eventType === 'swarm.snapshot') {
+    mergeSwarmSnapshot(msg.data);
+    if (isCurrentPath('/infrastructure')) renderInfraGrid();
+    renderSwarmStrip_dash();
+    return;
+  }
+
+  if (eventType === 'swarm.service.snapshot') {
+    mergeSwarmServiceSnapshot(msg.data);
+    if (isCurrentPath('/infrastructure')) renderInfraGrid();
+    renderSwarmStrip_dash();
+    return;
+  }
+}
+```
+
+**Step 7: Add renderInfraGrid function**
+
+Replace the existing `renderOpenClawGrid` function (lines ~718-785) with a new `renderInfraGrid` that shows both VMs and service cards. Add it right after the new `handleInfraWS` function:
+
+```js
+function renderInfraGrid() {
+  const vmNames = Object.keys(openclawState.instances).sort();
+  const services = Object.values(swarmState.services);
+
+  app.innerHTML = `
+    <div class="page-header">
+      <h2>Infrastructure <span class="live-indicator"><span class="live-dot"></span>Live</span></h2>
+    </div>
+
+    <div class="infra-section">
+      <p class="infra-section-title">VMs</p>
+      ${vmNames.length === 0
+        ? '<p class="empty-state">No VM data</p>'
+        : `<div class="vm-grid">${vmNames.map(name => renderVMCard(name)).join('')}</div>`
+      }
+    </div>
+
+    <div class="infra-section">
+      <p class="infra-section-title">Services</p>
+      ${services.length === 0
+        ? '<p class="empty-state">No swarm service data</p>'
+        : `<div class="service-grid">${services.map(svc => renderServiceCard(svc)).join('')}</div>`
+      }
+    </div>
+  `;
+}
+
+function renderVMCard(name) {
+  const evt = openclawState.instances[name];
+  const payload = getEnvelopePayload(evt);
+  const inst = payload.instance || {};
+  const host = payload.host || {};
+  const guest = payload.guest;
+  const issues = payload.issues;
+
+  return `
+    <div class="vm-card">
+      <div class="vm-card-header">
+        <h3>${escapeHTML(inst.name || name)}</h3>
+        <div class="vm-status ${host.state === 'running' ? 'running' : 'stopped'}">
+          ${host.state === 'running' ? 'Running' : 'Stopped'}
+        </div>
+      </div>
+      <div class="vm-updated">Updated ${escapeHTML(relativeTime(getEnvelopeTS(evt)))}</div>
+      <table class="vm-stats">
+        <tr><td>Host</td><td>${escapeHTML(inst.host || '-')}</td></tr>
+        <tr><td>Domain</td><td>${escapeHTML(inst.domain || '-')}</td></tr>
+        <tr><td>vCPUs</td><td>${host.vcpus || '-'}</td></tr>
+        <tr><td>Memory</td><td>${escapeHTML(formatBytes(host.memory_kib ? host.memory_kib * 1024 : 0) || '-')}</td></tr>
+        <tr><td>Disk</td><td>${escapeHTML(formatBytes(host.disk_actual_bytes) || '-')}</td></tr>
+        <tr><td>Autostart</td><td>${host.autostart ? 'Yes' : 'No'}</td></tr>
+      </table>
+      ${guest ? `
+        <div class="vm-card-divider"></div>
+        <table class="vm-stats">
+          <tr><td>Gateway</td><td style="${guest.service_active ? 'color:var(--success)' : 'color:var(--error)'}">${guest.service_active ? 'Active' : 'Inactive'}</td></tr>
+          <tr><td>HTTP</td><td style="${guest.http_status === 200 ? 'color:var(--success)' : 'color:var(--error)'}">${guest.http_status || 'N/A'}</td></tr>
+          <tr><td>Version</td><td>${escapeHTML(guest.version || '-')}</td></tr>
+          <tr><td>Guest Mem</td><td>${guest.memory_percent !== undefined ? guest.memory_percent.toFixed(1) : '-'}%</td></tr>
+          <tr><td>Guest Disk</td><td>${guest.disk_percent !== undefined ? guest.disk_percent.toFixed(1) : '-'}%</td></tr>
+          <tr><td>Load</td><td>${guest.load_average !== undefined ? guest.load_average.toFixed(2) : '-'}</td></tr>
+          <tr><td>Uptime</td><td>${escapeHTML(guest.service_uptime || '-')}</td></tr>
+        </table>
+      ` : ''}
+      ${issues && Object.values(issues).some(Boolean) ? `
+        <div class="vm-card-divider"></div>
+        <div class="vm-issues-label">Issues</div>
+        <div class="vm-issues">
+          ${Object.entries(issues).filter(([, value]) => value).map(([key]) => `
+            <span class="issue ${escapeHTML(key)}">${escapeHTML(key.replace(/_/g, ' '))}</span>
+          `).join('')}
+        </div>
+      ` : ''}
+    </div>
+  `;
+}
+
+function renderServiceCard(svc) {
+  const role = svc.role || 'unknown';
+  switch (role) {
+    case 'llm-proxy': return renderLLMProxyCard(svc);
+    case 'db':        return renderDBCard(svc);
+    case 'search':    return renderSearchCard(svc);
+    case 'mcp':       return renderMCPCard(svc);
+    case 'voice':     return renderVoiceCard(svc);
+    case 'automation': return renderAutomationCard(svc);
+    default:          return renderGenericServiceCard(svc);
+  }
+}
+
+function serviceCardHeader(svc) {
+  return `
+    <div class="service-card-header">
+      <div>
+        <div class="service-card-name">${escapeHTML(svc.name)}</div>
+        <div class="service-role-tag">${escapeHTML(svc.role || '')}</div>
+      </div>
+      <span class="service-badge ${escapeHTML(svc.status || 'down')}">${escapeHTML(svc.status || 'down')}</span>
+    </div>
+  `;
+}
+
+function serviceStatRow(label, value, valueClass) {
+  return `
+    <div class="service-stat-row">
+      <span class="service-stat-label">${escapeHTML(label)}</span>
+      <span class="service-stat-value${valueClass ? ' ' + valueClass : ''}">${value}</span>
+    </div>
+  `;
+}
+
+function formatUptime(sec) {
+  if (!sec) return '-';
+  if (sec < 60) return sec + 's';
+  if (sec < 3600) return Math.floor(sec / 60) + 'm';
+  if (sec < 86400) return Math.floor(sec / 3600) + 'h ' + Math.floor((sec % 3600) / 60) + 'm';
+  return Math.floor(sec / 86400) + 'd ' + Math.floor((sec % 86400) / 3600) + 'h';
+}
+
+function renderLLMProxyCard(svc) {
+  const extra = svc.extra || {};
+  const modelCount = extra.model_count;
+  const cooldowns = extra.cooldown_count || 0;
+  const httpStatus = svc.http_status;
+  const httpClass = httpStatus === 200 ? 'ok' : httpStatus ? 'bad' : '';
+
+  return `
+    <div class="service-card">
+      ${serviceCardHeader(svc)}
+      <div style="display:flex;align-items:baseline;gap:0.5rem">
+        <span class="llm-model-count">${modelCount !== undefined ? modelCount : '-'}</span>
+        <span class="llm-model-label">models</span>
+      </div>
+      ${cooldowns > 0 ? `<div class="llm-cooldown-banner">⚠ ${cooldowns} model${cooldowns > 1 ? 's' : ''} in cooldown</div>` : ''}
+      <div class="service-stats">
+        ${serviceStatRow('HTTP', httpStatus ? String(httpStatus) : '-', httpClass)}
+        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
+        ${serviceStatRow('Container', escapeHTML(svc.container_state || '-'), svc.container_state === 'running' ? 'ok' : 'bad')}
+      </div>
+    </div>
+  `;
+}
+
+function renderDBCard(svc) {
+  const healthClass = svc.health_state === 'healthy' ? 'ok' : svc.health_state === 'unhealthy' ? 'bad' : '';
+  return `
+    <div class="service-card">
+      ${serviceCardHeader(svc)}
+      <div class="service-stats">
+        ${serviceStatRow('Health', escapeHTML(svc.health_state || 'none'), healthClass)}
+        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
+        ${serviceStatRow('Container', escapeHTML(svc.container_state || '-'), svc.container_state === 'running' ? 'ok' : 'bad')}
+      </div>
+    </div>
+  `;
+}
+
+function renderSearchCard(svc) {
+  const extra = svc.extra || {};
+  const ms = extra.response_ms;
+  const httpStatus = svc.http_status;
+  const httpClass = httpStatus === 200 ? 'ok' : httpStatus ? 'bad' : '';
+  return `
+    <div class="service-card">
+      ${serviceCardHeader(svc)}
+      <div class="service-stats">
+        ${serviceStatRow('HTTP', httpStatus ? String(httpStatus) : '-', httpClass)}
+        ${ms !== undefined ? serviceStatRow('Response', ms + 'ms', ms < 500 ? 'ok' : 'warn') : ''}
+        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
+      </div>
+    </div>
+  `;
+}
+
+function renderMCPCard(svc) {
+  const extra = svc.extra || {};
+  const reachable = extra.port_reachable;
+  return `
+    <div class="service-card">
+      ${serviceCardHeader(svc)}
+      <div class="service-stats">
+        ${reachable !== undefined ? serviceStatRow('Port', reachable ? 'reachable' : 'unreachable', reachable ? 'ok' : 'bad') : ''}
+        ${serviceStatRow('Container', escapeHTML(svc.container_state || '-'), svc.container_state === 'running' ? 'ok' : 'bad')}
+        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
+      </div>
+    </div>
+  `;
+}
+
+function renderVoiceCard(svc) {
+  const healthClass = svc.health_state === 'healthy' ? 'ok' : svc.health_state === 'unhealthy' ? 'bad' : '';
+  return `
+    <div class="service-card">
+      ${serviceCardHeader(svc)}
+      <div class="service-stats">
+        ${serviceStatRow('Health', escapeHTML(svc.health_state || 'none'), healthClass)}
+        ${serviceStatRow('Container', escapeHTML(svc.container_state || '-'), svc.container_state === 'running' ? 'ok' : 'bad')}
+        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
+      </div>
+    </div>
+  `;
+}
+
+function renderAutomationCard(svc) {
+  const healthClass = svc.health_state === 'healthy' ? 'ok' : svc.health_state === 'unhealthy' ? 'bad' : '';
+  return `
+    <div class="service-card">
+      ${serviceCardHeader(svc)}
+      <div class="service-stats">
+        ${serviceStatRow('Health', escapeHTML(svc.health_state || 'none'), healthClass)}
+        ${serviceStatRow('Container', escapeHTML(svc.container_state || '-'), svc.container_state === 'running' ? 'ok' : 'bad')}
+        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
+      </div>
+    </div>
+  `;
+}
+
+function renderGenericServiceCard(svc) {
+  return `
+    <div class="service-card">
+      ${serviceCardHeader(svc)}
+      <div class="service-stats">
+        ${serviceStatRow('Container', escapeHTML(svc.container_state || '-'), svc.container_state === 'running' ? 'ok' : 'bad')}
+        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
+      </div>
+    </div>
+  `;
+}
+```
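+
+A quick way to sanity-check the `formatUptime` boundaries outside the browser is a standalone Node script. The sketch below copies the function from the block above; the sample inputs are illustrative, not real service data, and running it under Node is an assumption about the local setup.
+
+```javascript
+// Copied from the block above so the script runs on its own under Node.
+function formatUptime(sec) {
+  if (!sec) return '-';
+  if (sec < 60) return sec + 's';
+  if (sec < 3600) return Math.floor(sec / 60) + 'm';
+  if (sec < 86400) return Math.floor(sec / 3600) + 'h ' + Math.floor((sec % 3600) / 60) + 'm';
+  return Math.floor(sec / 86400) + 'd ' + Math.floor((sec % 86400) / 3600) + 'h';
+}
+
+// Illustrative [input, expected] pairs covering each branch.
+const cases = [
+  [undefined, '-'],
+  [0, '-'],          // zero is falsy, so it also renders as '-'
+  [45, '45s'],
+  [125, '2m'],
+  [3725, '1h 2m'],
+  [90000, '1d 1h'],
+];
+
+for (const [sec, expected] of cases) {
+  const got = formatUptime(sec);
+  console.log(sec, '->', got, got === expected ? 'ok' : `MISMATCH (expected ${expected})`);
+}
+```
+
+Note that a container that has just started (`uptime_sec` of 0) displays `-` rather than `0s`, because the first guard treats zero the same as a missing value.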
+
+**Step 8: Verify build**
+
+Run: `cd /home/will/lab/agentmon && go build ./...`
+Expected: no errors
+
+**Step 9: Commit**
+
+```bash
+git add cmd/web-ui/static/app.js cmd/web-ui/static/index.html
+git commit -m "feat: rename OpenClaw to Infrastructure page, add service cards"
+```
+
+---
+
+### Task 8: End-to-end verification
+
+**Step 1: Build all binaries**
+
+Run: `cd /home/will/lab/agentmon && go build ./...`
+Expected: no errors
+
+**Step 2: Test docker label filtering manually**
+
+Run: `docker ps -a --filter label=agentmon.monitor=true --format "table {{.Names}}\t{{.Labels}}\t{{.Status}}"`
+Expected: lists every labeled swarm container with its labels and status; because of `-a`, stopped containers appear too
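+
+The `{{.Labels}}` column prints all labels as one comma-separated `key=value` string, which is hard to scan by eye. A minimal sketch for splitting it into an object is shown below. `parseDockerLabels` is a hypothetical helper, not part of agentmon, and the sample string is made up from the labels added in Task 1. It assumes label values contain no commas; if they might, `docker inspect` JSON output is the safer source.
+
+```javascript
+// Hypothetical helper: turn the comma-separated key=value string printed by
+// `docker ps --format '{{.Labels}}'` into a plain object.
+// Assumes no label value contains a comma.
+function parseDockerLabels(raw) {
+  const labels = {};
+  for (const pair of raw.split(',')) {
+    const idx = pair.indexOf('=');
+    if (idx <= 0) continue; // skip empty or malformed fragments
+    labels[pair.slice(0, idx)] = pair.slice(idx + 1);
+  }
+  return labels;
+}
+
+// Illustrative input built from the Task 1 labels for whisper-server.
+const sample = 'agentmon.monitor=true,agentmon.role=voice,agentmon.port=18801';
+const labels = parseDockerLabels(sample);
+
+console.log(labels['agentmon.monitor']); // true
+console.log(labels['agentmon.role']);    // voice
+console.log(labels['agentmon.port']);    // 18801
+```
+
+All values come back as strings, so the port would need `Number(...)` before any numeric comparison.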
+
+**Step 3: Test swarm-monitor dry run**
+
+Run:
+```bash
+cd /home/will/lab/agentmon
+NATS_URL=nats://localhost:4222 LITELLM_MASTER_KEY=$(source /home/will/lab/swarm/.env && echo $LITELLM_MASTER_KEY) \
+  go run ./cmd/swarm-monitor/ 2>&1 | head -20
+```
+Expected: logs "swarm-monitor started", then either publishes events or logs connection errors. NATS may not be running locally, which is fine: check that the collection phase succeeds before the publish fails.
+
+**Step 4: Navigate to /infrastructure in browser**
+
+Open the web UI and navigate to `/infrastructure`.
+Verify:
+- Nav shows "Infra" link, active when on `/infrastructure`
+- VMs section shows existing openclaw cards
+- Services section shows either cards (if swarm events exist in DB) or "No swarm service data"
+
+**Step 5: Verify swarm strip on dashboard**
+
+Navigate to `/`.
+Verify:
+- VM strip still shows (zap/orb/sun)
+- Swarm strip renders below it (may be empty if no `swarm.snapshot` events in DB yet)
+
+**Step 6: Final commit if any fixes needed**
+
+```bash
+git add -A
+git commit -m "fix: infrastructure page and swarm strip polish"
+```