# Swarm Monitor Implementation Plan
|
|
|
|
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
|
|
|
|
**Goal:** Add a `swarm-monitor` binary that polls docker-compose services in `~/lab/swarm`, emits `swarm.snapshot` and `swarm.service.snapshot` events to NATS, and surfaces service status on the dashboard strip and a new unified `/infrastructure` page (replacing `/openclaw`).
|
|
|
|
**Architecture:** New `cmd/swarm-monitor/main.go` shells out to the `docker` CLI (`docker ps` and `docker inspect`) and runs HTTP/TCP probes, emitting two event types per poll. The existing NATS → event-processor → postgres → query-api pipeline requires no changes. The frontend adds a swarm strip to the dashboard and merges VM cards and service cards on a renamed `/infrastructure` page.
|
|
|
|
**Tech Stack:** Go (exec/docker CLI, net/http), vanilla JS, existing NATS publisher pattern
|
|
|
|
---
|
|
|
|
### Task 1: Add agentmon labels to docker-compose.yaml
|
|
|
|
**Files:**
|
|
- Modify: `/home/will/lab/swarm/docker-compose.yaml`
|
|
|
|
**Step 1: Add labels to each service**
|
|
|
|
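Add a `labels:` block to each monitored service. `litellm-init` is a one-shot container, so do NOT label it. `agentmon.port` records the service's host port; the collector only probes it for the `search` and `mcp` roles, which is why `litellm-db` has no port label.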
|
|
|
|
For `whisper-server` (after its `healthcheck:` block):
|
|
```yaml
    labels:
      agentmon.monitor: "true"
      agentmon.role: "voice"
      agentmon.port: "18801"
```
|
|
|
|
For `kokoro-tts` (after `restart: unless-stopped`):
|
|
```yaml
    labels:
      agentmon.monitor: "true"
      agentmon.role: "voice"
      agentmon.port: "18805"
```
|
|
|
|
For `brave-search` (after its `environment:` block):
|
|
```yaml
    labels:
      agentmon.monitor: "true"
      agentmon.role: "mcp"
      agentmon.port: "18802"
```
|
|
|
|
For `searxng` (after its `volumes:` block):
|
|
```yaml
    labels:
      agentmon.monitor: "true"
      agentmon.role: "search"
      agentmon.port: "18803"
```
|
|
|
|
For `litellm` (after its `healthcheck:` block):
|
|
```yaml
    labels:
      agentmon.monitor: "true"
      agentmon.role: "llm-proxy"
      agentmon.port: "18804"
```
|
|
|
|
For `litellm-db` (after its `healthcheck:` block):
|
|
```yaml
    labels:
      agentmon.monitor: "true"
      agentmon.role: "db"
```
|
|
|
|
For `n8n-agent` (after its `healthcheck:` block):
|
|
```yaml
    labels:
      agentmon.monitor: "true"
      agentmon.role: "automation"
      agentmon.port: "18808"
```
|
|
|
|
**Step 2: Verify labels appear in running containers**
|
|
|
|
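Recreate the containers so the new labels take effect (`docker compose up -d` recreates services whose configuration changed; add `--profile <name>` for profile-gated services), then run: `docker ps --filter label=agentmon.monitor=true --format "table {{.Names}}\t{{.Status}}"`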
|
|
|
|
Expected: lists currently-running swarm containers (whichever profiles are active).
|
|
|
|
**Step 3: Commit**
|
|
|
|
```bash
cd /home/will/lab/swarm
git add docker-compose.yaml
git commit -m "feat: add agentmon monitor labels to swarm services"
```
|
|
|
|
---
|
|
|
|
### Task 2: Create swarm types
|
|
|
|
**Files:**
|
|
- Create: `internal/monitor/swarm/types.go`
|
|
|
|
**Step 1: Create the types file**
|
|
|
|
```go
package swarm

import "time"

// ServiceSnapshot holds the collected state for one docker-compose service.
type ServiceSnapshot struct {
	Name           string         `json:"name"`
	Role           string         `json:"role"`
	ContainerState string         `json:"container_state"` // Docker state (running/exited/restarting/...) or "missing"
	HealthState    string         `json:"health_state"`    // healthy/unhealthy/starting/none
	Status         string         `json:"status"`          // healthy/degraded/down
	UptimeSec      int64          `json:"uptime_sec,omitempty"`
	HTTPStatus     *int           `json:"http_status,omitempty"`
	Extra          map[string]any `json:"extra,omitempty"`
}

// SwarmSnapshot holds a rolled-up snapshot of all labeled services.
type SwarmSnapshot struct {
	Services  []ServiceSnapshot `json:"services"`
	Issues    Issues            `json:"issues"`
	Timestamp time.Time         `json:"timestamp"`
}

// Issues flags notable problems detected during a poll.
type Issues struct {
	ServiceDown     []string `json:"service_down,omitempty"`
	ServiceDegraded []string `json:"service_degraded,omitempty"`
	LLMCooldowns    bool     `json:"llm_cooldowns,omitempty"`
}
```
|
|
|
|
**Step 2: Verify it compiles**
|
|
|
|
Run: `cd /home/will/lab/agentmon && go build ./internal/monitor/swarm/`
|
|
Expected: no errors
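Optionally, you can check what these types put on the wire with the standalone sketch below. It copies `ServiceSnapshot` so it runs on its own; in the repo you would import `agentmon/internal/monitor/swarm` instead. It marshals two made-up snapshots and shows the `omitempty` behaviour:

- A stopped container drops `uptime_sec`, `http_status`, and `extra` entirely.
- `role` and the state strings are always present, even when empty.

Frontend code therefore has to treat the omitted fields as optional.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServiceSnapshot copies the field set from types.go so this example is
// self-contained; real code should use swarm.ServiceSnapshot.
type ServiceSnapshot struct {
	Name           string         `json:"name"`
	Role           string         `json:"role"`
	ContainerState string         `json:"container_state"`
	HealthState    string         `json:"health_state"`
	Status         string         `json:"status"`
	UptimeSec      int64          `json:"uptime_sec,omitempty"`
	HTTPStatus     *int           `json:"http_status,omitempty"`
	Extra          map[string]any `json:"extra,omitempty"`
}

// encode marshals a snapshot the same way the NATS payload is marshalled.
func encode(s ServiceSnapshot) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	code := 200
	// A probed, running service: http_status and extra are included.
	fmt.Println(encode(ServiceSnapshot{
		Name: "litellm", Role: "llm-proxy", ContainerState: "running",
		HealthState: "healthy", Status: "healthy", UptimeSec: 3600,
		HTTPStatus: &code, Extra: map[string]any{"model_count": 12},
	}))
	// A stopped service: zero uptime, nil pointer and nil map are omitted.
	fmt.Println(encode(ServiceSnapshot{
		Name: "searxng", Role: "search", ContainerState: "exited",
		HealthState: "none", Status: "down",
	}))
}
```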
|
|
|
|
**Step 3: Commit**
|
|
|
|
```bash
git add internal/monitor/swarm/types.go
git commit -m "feat: add swarm monitor types"
```
|
|
|
|
---
|
|
|
|
### Task 3: Create swarm collector
|
|
|
|
**Files:**
|
|
- Create: `internal/monitor/swarm/collector.go`
|
|
|
|
**Step 1: Create the collector**
|
|
|
|
```go
package swarm

import (
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"os/exec"
	"strings"
	"time"
)

// Config holds collector configuration.
type Config struct {
	LiteLLMBaseURL string
	LiteLLMAPIKey  string
	HTTPTimeout    time.Duration
}

// dockerPsEntry is the JSON shape from `docker ps --format '{{json .}}'`.
type dockerPsEntry struct {
	ID     string `json:"ID"`
	Names  string `json:"Names"`
	Status string `json:"Status"`
	State  string `json:"State"`
}

// dockerInspectEntry is the minimal shape we need from `docker inspect`.
type dockerInspectEntry struct {
	Name  string `json:"Name"`
	State struct {
		Status    string `json:"Status"`
		Running   bool   `json:"Running"`
		StartedAt string `json:"StartedAt"`
		Health    *struct {
			Status string `json:"Status"`
		} `json:"Health"` // nil when the container defines no healthcheck
	} `json:"State"`
	Config struct {
		Labels map[string]string `json:"Labels"`
	} `json:"Config"`
}

// CollectAll lists all containers labeled agentmon.monitor=true (running and
// stopped) and collects a ServiceSnapshot for each.
func CollectAll(ctx context.Context, cfg Config) ([]ServiceSnapshot, error) {
	out, err := exec.CommandContext(ctx, "docker", "ps", "-a",
		"--filter", "label=agentmon.monitor=true",
		"--format", "{{json .}}",
	).Output()
	if err != nil {
		return nil, fmt.Errorf("docker ps failed: %w", err)
	}

	// docker prints one JSON object per line; skip blank or malformed lines.
	var entries []dockerPsEntry
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		if line == "" {
			continue
		}
		var e dockerPsEntry
		if err := json.Unmarshal([]byte(line), &e); err != nil {
			continue
		}
		entries = append(entries, e)
	}

	client := &http.Client{Timeout: cfg.HTTPTimeout}
	// Non-nil so an empty result encodes as [] rather than null.
	snapshots := make([]ServiceSnapshot, 0, len(entries))
	for _, e := range entries {
		snapshots = append(snapshots, collectOne(ctx, e.Names, client, cfg))
	}
	return snapshots, nil
}

func collectOne(ctx context.Context, name string, client *http.Client, cfg Config) ServiceSnapshot {
	// Defaults reported if docker inspect fails (e.g. the container vanished).
	snap := ServiceSnapshot{
		Name:           name,
		ContainerState: "missing",
		HealthState:    "none",
		Status:         "down",
	}

	out, err := exec.CommandContext(ctx, "docker", "inspect", "--format", "{{json .}}", name).Output()
	if err != nil {
		return snap
	}
	var detail dockerInspectEntry
	if err := json.Unmarshal(out, &detail); err != nil {
		return snap
	}

	snap.Role = detail.Config.Labels["agentmon.role"]
	snap.ContainerState = detail.State.Status
	if detail.State.Health != nil {
		snap.HealthState = detail.State.Health.Status
	}

	// Calculate uptime if running.
	if detail.State.Running && detail.State.StartedAt != "" {
		if t, err := time.Parse(time.RFC3339Nano, detail.State.StartedAt); err == nil {
			snap.UptimeSec = int64(time.Since(t).Seconds())
		}
	}

	// Role-specific probes. Skip them for stopped containers: deriveStatus
	// reports "down" regardless, and a dead probe would cost a full timeout.
	port := detail.Config.Labels["agentmon.port"]
	if detail.State.Running {
		switch snap.Role {
		case "llm-proxy":
			collectLLMProxy(ctx, &snap, client, cfg)
		case "search":
			if port != "" {
				collectHTTPProbe(ctx, &snap, client, "http://localhost:"+port+"/")
			}
		case "mcp":
			collectPortProbe(ctx, &snap, port)
		case "db", "voice", "automation":
			// Docker healthcheck state is sufficient; no extra probe.
		}
	}

	snap.Status = deriveStatus(snap)
	return snap
}

// setExtra stores key=val in snap.Extra, allocating the map on first use.
func setExtra(snap *ServiceSnapshot, key string, val any) {
	if snap.Extra == nil {
		snap.Extra = make(map[string]any)
	}
	snap.Extra[key] = val
}

func collectLLMProxy(ctx context.Context, snap *ServiceSnapshot, client *http.Client, cfg Config) {
	// Liveness probe; records http_status, or probe_error if unreachable.
	collectHTTPProbe(ctx, snap, client, cfg.LiteLLMBaseURL+"/health/liveliness")

	// Model count (needs the master key).
	if cfg.LiteLLMAPIKey == "" {
		return
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, cfg.LiteLLMBaseURL+"/v2/model/info", nil)
	if err != nil {
		return
	}
	req.Header.Set("Authorization", "Bearer "+cfg.LiteLLMAPIKey)
	resp, err := client.Do(req)
	if err != nil {
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return // e.g. 401 with a bad key: report no count rather than 0
	}
	var result struct {
		Data []struct {
			ModelName string `json:"model_name"`
		} `json:"data"`
	}
	if json.NewDecoder(resp.Body).Decode(&result) == nil {
		setExtra(snap, "model_count", len(result.Data))
	}
	// Nothing sets Extra["cooldown_count"] yet, so DetectIssues will not
	// report LLMCooldowns until a cooldown source is wired in here.
}

// collectHTTPProbe records the HTTP status and latency of a GET to url, or
// the transport error if the request could not be completed.
func collectHTTPProbe(ctx context.Context, snap *ServiceSnapshot, client *http.Client, url string) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		setExtra(snap, "probe_error", err.Error())
		return
	}
	start := time.Now()
	resp, err := client.Do(req)
	if err != nil {
		setExtra(snap, "probe_error", err.Error())
		return
	}
	resp.Body.Close()
	code := resp.StatusCode
	snap.HTTPStatus = &code
	setExtra(snap, "response_ms", time.Since(start).Milliseconds())
}

// collectPortProbe checks TCP reachability with a plain dial instead of
// shelling out to nc, whose flags differ between netcat variants.
func collectPortProbe(ctx context.Context, snap *ServiceSnapshot, port string) {
	if port == "" {
		return
	}
	d := net.Dialer{Timeout: time.Second}
	conn, err := d.DialContext(ctx, "tcp", net.JoinHostPort("localhost", port))
	if err == nil {
		conn.Close()
	}
	setExtra(snap, "port_reachable", err == nil)
}

// deriveStatus computes the overall status from container state + health + probes.
func deriveStatus(snap ServiceSnapshot) string {
	if snap.ContainerState != "running" {
		return "down"
	}
	if snap.HealthState == "unhealthy" {
		return "degraded"
	}
	// A probe that could not connect at all must not read as healthy.
	if _, failed := snap.Extra["probe_error"]; failed {
		return "degraded"
	}
	if snap.HTTPStatus != nil && (*snap.HTTPStatus < 200 || *snap.HTTPStatus >= 400) {
		return "degraded"
	}
	if reachable, ok := snap.Extra["port_reachable"].(bool); ok && !reachable {
		return "degraded"
	}
	return "healthy"
}

// DetectIssues scans a set of snapshots for notable problems.
func DetectIssues(services []ServiceSnapshot) Issues {
	issues := Issues{}
	for _, s := range services {
		switch s.Status {
		case "down":
			issues.ServiceDown = append(issues.ServiceDown, s.Name)
		case "degraded":
			issues.ServiceDegraded = append(issues.ServiceDegraded, s.Name)
		}
		if s.Role == "llm-proxy" {
			if count, ok := s.Extra["cooldown_count"].(int); ok && count > 0 {
				issues.LLMCooldowns = true
			}
		}
	}
	return issues
}
```
|
|
|
|
**Step 2: Verify it compiles**
|
|
|
|
Run: `cd /home/will/lab/agentmon && go build ./internal/monitor/swarm/`
|
|
Expected: no errors
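`collectOne` runs one `docker inspect` per container, so each poll costs N+1 CLI calls. If that ever matters, one possible optimization (not part of this plan) is a single `docker inspect name1 name2 ...` call. Without `--format`, Docker prints a JSON array with one object per container and prefixes each `Name` with `/`. The sketch below covers only the decoding side and assumes the same `dockerInspectEntry` shape as above; `parseInspectBatch` is a hypothetical helper name and the sample input is hand-written. If a container disappears between `docker ps` and `docker inspect`, the inspect call may exit non-zero while still printing the objects it did find, so check stdout before discarding it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// dockerInspectEntry repeats the minimal shape from collector.go.
type dockerInspectEntry struct {
	Name  string `json:"Name"`
	State struct {
		Status    string `json:"Status"`
		Running   bool   `json:"Running"`
		StartedAt string `json:"StartedAt"`
		Health    *struct {
			Status string `json:"Status"`
		} `json:"Health"`
	} `json:"State"`
	Config struct {
		Labels map[string]string `json:"Labels"`
	} `json:"Config"`
}

// parseInspectBatch (hypothetical helper) decodes the JSON array printed by
// one `docker inspect name1 name2 ...` call and indexes it by container name.
// Docker reports names with a leading "/", trimmed here so the keys match
// the names that `docker ps` prints.
func parseInspectBatch(out []byte) (map[string]dockerInspectEntry, error) {
	var list []dockerInspectEntry
	if err := json.Unmarshal(out, &list); err != nil {
		return nil, fmt.Errorf("decode docker inspect output: %w", err)
	}
	byName := make(map[string]dockerInspectEntry, len(list))
	for _, d := range list {
		byName[strings.TrimPrefix(d.Name, "/")] = d
	}
	return byName, nil
}

func main() {
	// Hand-written sample in the shape docker inspect prints (fields trimmed).
	sample := []byte(`[
	  {"Name": "/litellm",
	   "State": {"Status": "running", "Running": true,
	             "StartedAt": "2024-01-01T00:00:00.5Z", "Health": {"Status": "healthy"}},
	   "Config": {"Labels": {"agentmon.role": "llm-proxy", "agentmon.port": "18804"}}},
	  {"Name": "/searxng",
	   "State": {"Status": "exited", "Running": false, "StartedAt": "2024-01-01T00:00:00Z"},
	   "Config": {"Labels": {"agentmon.role": "search"}}}
	]`)
	byName, err := parseInspectBatch(sample)
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"litellm", "searxng"} {
		d := byName[name]
		health := "none" // same default collectOne uses without a healthcheck
		if d.State.Health != nil {
			health = d.State.Health.Status
		}
		fmt.Println(name, d.Config.Labels["agentmon.role"], d.State.Status, health)
	}
}
```

The caller would build the arguments as `append([]string{"inspect"}, names...)` and pass them to `exec.CommandContext(ctx, "docker", args...)`.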
|
|
|
|
**Step 3: Commit**
|
|
|
|
```bash
git add internal/monitor/swarm/collector.go
git commit -m "feat: add swarm collector with docker inspect + HTTP probes"
```
|
|
|
|
---
|
|
|
|
### Task 4: Create swarm-monitor binary
|
|
|
|
**Files:**
|
|
- Create: `cmd/swarm-monitor/main.go`
|
|
|
|
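**Step 1: Create the binary**

The collector runs the `docker` CLI and probes `localhost` ports, so run the binary where it can reach both the Docker daemon and the ports the swarm publishes (for example, directly on the swarm host). The default `NATS_URL` of `nats://nats:4222` only resolves where `nats` is a known hostname, so set `NATS_URL` explicitly when running outside that network.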
|
|
|
|
```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"

	"agentmon/internal/monitor/swarm"
	qnats "agentmon/internal/queue/nats"
)

func main() {
	natsURL := envDefault("NATS_URL", "nats://nats:4222")
	natsTopic := envDefault("NATS_TOPIC", "agentmon.events.v1")
	interval := envDefault("POLL_INTERVAL", "30s")
	litellmBase := envDefault("LITELLM_BASE_URL", "http://localhost:18804")
	litellmKey := os.Getenv("LITELLM_MASTER_KEY")

	// Validate config before connecting: log.Fatalf skips deferred calls,
	// and time.NewTicker panics on a non-positive duration.
	pollDuration, err := time.ParseDuration(interval)
	if err != nil {
		log.Fatalf("invalid POLL_INTERVAL %q: %v", interval, err)
	}
	if pollDuration <= 0 {
		log.Fatalf("POLL_INTERVAL must be positive, got %s", pollDuration)
	}

	pub, err := qnats.NewPublisher(natsURL, natsTopic)
	if err != nil {
		log.Fatalf("failed to connect to NATS: %v", err)
	}
	defer pub.Close()

	cfg := swarm.Config{
		LiteLLMBaseURL: litellmBase,
		LiteLLMAPIKey:  litellmKey,
		HTTPTimeout:    5 * time.Second,
	}

	// Stop cleanly on Ctrl-C / SIGTERM so the deferred pub.Close() runs.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	ticker := time.NewTicker(pollDuration)
	defer ticker.Stop()
	log.Printf("swarm-monitor started, polling every %s", pollDuration)

	// Poll immediately on start, then on every tick.
	for {
		// Bound each poll so a hung docker call cannot stall the loop.
		pollCtx, cancel := context.WithTimeout(ctx, pollDuration)
		if err := poll(pollCtx, pub, cfg); err != nil {
			log.Printf("poll error: %v", err)
		}
		cancel()

		select {
		case <-ctx.Done():
			log.Printf("swarm-monitor shutting down")
			return
		case <-ticker.C:
		}
	}
}

func poll(ctx context.Context, pub *qnats.Publisher, cfg swarm.Config) error {
	services, err := swarm.CollectAll(ctx, cfg)
	if err != nil {
		return err
	}

	issues := swarm.DetectIssues(services)
	now := time.Now().UTC()

	// Emit the rolled-up swarm.snapshot.
	if err := emit(ctx, pub, "swarm.snapshot", "agentmon.swarm", map[string]any{
		"services": services,
		"issues":   issues,
	}, now); err != nil {
		log.Printf("failed to emit swarm.snapshot: %v", err)
	}

	// Emit one swarm.service.snapshot per service.
	for _, svc := range services {
		if err := emit(ctx, pub, "swarm.service.snapshot", "agentmon.swarm.service", map[string]any{
			"service": svc,
		}, now); err != nil {
			log.Printf("failed to emit swarm.service.snapshot for %s: %v", svc.Name, err)
		}
	}
	return nil
}

func emit(ctx context.Context, pub *qnats.Publisher, eventType, schemaName string, payload map[string]any, ts time.Time) error {
	event := map[string]any{
		"schema": map[string]any{
			"name":    schemaName,
			"version": 1,
		},
		"event": map[string]any{
			"id":   generateID(),
			"type": eventType,
			"ts":   ts.Format(time.RFC3339Nano),
		},
		"payload": payload,
	}
	data, err := json.Marshal(event)
	if err != nil {
		return err
	}
	return pub.Publish(ctx, data)
}

// generateID returns a UTC timestamp plus 8 random hex characters. It uses
// crypto/rand because clock-derived "random" characters repeat easily.
func generateID() string {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		// Practically unreachable; fall back to nanosecond precision.
		return time.Now().UTC().Format("20060102150405.000000000")
	}
	return time.Now().UTC().Format("20060102150405") + "-" + hex.EncodeToString(b)
}

func envDefault(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}
```
|
|
|
|
**Step 2: Verify it compiles**
|
|
|
|
Run: `cd /home/will/lab/agentmon && go build ./cmd/swarm-monitor/`
|
|
Expected: no errors
|
|
|
|
**Step 3: Verify all binaries still build**
|
|
|
|
Run: `cd /home/will/lab/agentmon && go build ./...`
|
|
Expected: no errors
|
|
|
|
**Step 4: Commit**
|
|
|
|
```bash
git add cmd/swarm-monitor/main.go
git commit -m "feat: add swarm-monitor binary"
```
|
|
|
|
---
|
|
|
|
### Task 5: Dashboard swarm strip
|
|
|
|
**Files:**
|
|
- Modify: `cmd/web-ui/static/app.js`
|
|
- Modify: `cmd/web-ui/static/style.css`
|
|
|
|
**Step 1: Add swarmState and merge functions to app.js**
|
|
|
|
Near the top of the IIFE, alongside the existing `let openclawState = ...` declaration (line ~49), add:
|
|
|
|
```js
let swarmState = { services: {} }; // keyed by service name
```
|
|
|
|
After the existing `mergeOpenClawEvents` function (~line 716), add:
|
|
|
|
```js
function mergeSwarmSnapshot(evt) {
  const payload = getEnvelopePayload(evt);
  const services = payload.services || [];
  for (const svc of services) {
    if (svc.name) swarmState.services[svc.name] = svc;
  }
}

function mergeSwarmServiceSnapshot(evt) {
  const payload = getEnvelopePayload(evt);
  const svc = payload.service;
  if (svc && svc.name) swarmState.services[svc.name] = svc;
}
```
|
|
|
|
**Step 2: Add swarm strip to renderDashboard**
|
|
|
|
In `renderDashboard()`, the HTML template already has:
|
|
```html
<div class="vm-strip" id="dash-vm-strip" style="margin-bottom:1.5rem"></div>
```
|
|
|
|
Right after that line, add a swarm strip div:
|
|
```html
<div class="swarm-strip" id="dash-swarm-strip"></div>
```
|
|
|
|
**Step 3: Add renderSwarmStrip function**
|
|
|
|
After the `renderAgentVMStrip_dash` function (~line 1351), add:
|
|
|
|
```js
function renderSwarmStrip_dash() {
  const strip = document.getElementById('dash-swarm-strip');
  if (!strip) return;
  const services = Object.values(swarmState.services);
  if (services.length === 0) return;
  strip.innerHTML = services.map(svc => {
    const statusClass = svc.status === 'healthy' ? 'active'
      : svc.status === 'degraded' ? 'degraded' : 'inactive';
    const label = svc.status || 'unknown';
    return `
      <div class="vm-pill ${statusClass}">
        <span class="vm-pill-dot"></span>
        <span class="vm-pill-name">${escapeHTML(svc.name)}</span>
        <span class="vm-pill-label">${escapeHTML(label)}</span>
      </div>
    `;
  }).join('');
}
```
|
|
|
|
**Step 4: Wire swarm strip into dashboard data load**
|
|
|
|
In `renderDashboard()`, the `Promise.all` block loads the initial data. Extend that call with the swarm snapshot query. Keeping it in `Promise.all` means it runs inside the existing `try` block, before the `if (!isCurrentPath('/')) return;` guard:

```js
const [summaryData, tsData, recentData, snapshots, swarmSnaps] = await Promise.all([
  api('/v1/stats/summary'),
  api('/v1/stats/timeseries?window=1h'),
  api('/v1/events?limit=20'),
  api('/v1/events?event_type=openclaw.snapshot&limit=100').catch(() => ({ events: [] })),
  api('/v1/events?event_type=swarm.snapshot&limit=10').catch(() => ({ events: [] })),
]);
```

Then, right after the existing `mergeOpenClawEvents(snapshots.events || [])` and `renderAgentVMStrip_dash()` calls, add:

```js
for (const evt of swarmSnaps.events || []) mergeSwarmSnapshot(evt);
renderSwarmStrip_dash();
```

`mergeSwarmSnapshot` overwrites each service with the last value it sees, so this loop assumes `/v1/events` returns events oldest-first. If it returns newest-first, iterate over a reversed copy so the latest snapshot wins. Check how `mergeOpenClawEvents` handles ordering and do the same.
|
|
|
|
**Step 5: Handle swarm events in handleDashboardWS**
|
|
|
|
In `handleDashboardWS`, after the `openclaw.snapshot` handler block, add:
|
|
|
|
```js
if (eventType === 'swarm.snapshot') {
  mergeSwarmSnapshot(msg.data);
  renderSwarmStrip_dash();
  return;
}
if (eventType === 'swarm.service.snapshot') {
  mergeSwarmServiceSnapshot(msg.data);
  renderSwarmStrip_dash();
  return;
}
```
|
|
|
|
**Step 6: Add swarm strip CSS**
|
|
|
|
In `style.css`, after the `.vm-pill-label` block (~line 750), add:
|
|
|
|
```css
/* ── Swarm strip ──────────────────────────────────────────── */
.swarm-strip {
  display: flex;
  flex-wrap: wrap;
  gap: 0.75rem;
  margin-bottom: 1.5rem;
}

.vm-pill.degraded {
  border-color: rgba(251, 191, 36, 0.3);
}

.vm-pill.degraded .vm-pill-dot {
  background: var(--warning);
}
```
|
|
|
|
**Step 7: Verify no JS errors**
|
|
|
|
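`go build` does not parse JavaScript, so check the script separately. If Node.js is installed, run `cd /home/will/lab/agentmon && node --check cmd/web-ui/static/app.js`. Then rebuild with `go build ./...` and load the dashboard in a browser.

Expected: no syntax errors from `node --check`, no errors in the browser console, and a swarm strip once a `swarm.snapshot` event has been stored.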
|
|
|
|
**Step 8: Commit**
|
|
|
|
```bash
git add cmd/web-ui/static/app.js cmd/web-ui/static/style.css
git commit -m "feat: add swarm strip to dashboard"
```
|
|
|
|
---
|
|
|
|
### Task 6: Infrastructure page CSS
|
|
|
|
**Files:**
|
|
- Modify: `cmd/web-ui/static/style.css`
|
|
|
|
**Step 1: Add infrastructure page styles**
|
|
|
|
Append to the end of `style.css`:
|
|
|
|
```css
/* ── Infrastructure page ──────────────────────────────────── */
.infra-section-title {
  font-family: var(--font-display);
  font-size: 0.75rem;
  font-weight: 700;
  color: var(--text-dim);
  text-transform: uppercase;
  letter-spacing: 0.12em;
  margin: 0 0 1rem 0;
}

.infra-section {
  margin-bottom: 2rem;
}

/* Service card grid */
.service-grid {
  display: grid;
  grid-template-columns: repeat(auto-fill, minmax(260px, 1fr));
  gap: 1.25rem;
}

.service-card {
  background: var(--surface);
  border: 1px solid var(--border);
  border-radius: var(--radius-lg);
  padding: 1.125rem 1.25rem;
  display: flex;
  flex-direction: column;
  gap: 0.75rem;
  transition: border-color 0.2s;
}

.service-card:hover {
  border-color: rgba(34, 211, 238, 0.15);
}

.service-card-header {
  display: flex;
  align-items: center;
  justify-content: space-between;
}

.service-card-name {
  font-family: var(--font-mono);
  font-size: 0.88rem;
  font-weight: 600;
  color: var(--text-bright);
}

.service-badge {
  font-size: 0.65rem;
  font-weight: 700;
  text-transform: uppercase;
  letter-spacing: 0.08em;
  padding: 0.2rem 0.55rem;
  border-radius: 999px;
}

.service-badge.healthy {
  background: rgba(52, 211, 153, 0.12);
  color: var(--success);
  border: 1px solid rgba(52, 211, 153, 0.2);
}

.service-badge.degraded {
  background: rgba(251, 191, 36, 0.12);
  color: var(--warning);
  border: 1px solid rgba(251, 191, 36, 0.2);
}

.service-badge.down {
  background: rgba(248, 113, 113, 0.12);
  color: var(--error);
  border: 1px solid rgba(248, 113, 113, 0.2);
}

.service-role-tag {
  font-size: 0.65rem;
  font-family: var(--font-mono);
  color: var(--text-dim);
  margin-top: -0.25rem;
}

.service-stats {
  display: flex;
  flex-direction: column;
  gap: 0.3rem;
  font-size: 0.78rem;
}

.service-stat-row {
  display: flex;
  justify-content: space-between;
  align-items: center;
}

.service-stat-label {
  color: var(--text-dim);
  font-family: var(--font-mono);
  font-size: 0.72rem;
}

.service-stat-value {
  color: var(--text);
  font-family: var(--font-mono);
  font-size: 0.75rem;
}

.service-stat-value.ok { color: var(--success); }
.service-stat-value.warn { color: var(--warning); }
.service-stat-value.bad { color: var(--error); }

/* LiteLLM cooldown warning */
.llm-cooldown-banner {
  background: rgba(251, 191, 36, 0.08);
  border: 1px solid rgba(251, 191, 36, 0.2);
  border-radius: var(--radius);
  padding: 0.4rem 0.625rem;
  font-size: 0.72rem;
  color: var(--warning);
  font-family: var(--font-mono);
}

/* LiteLLM model count highlight */
.llm-model-count {
  font-family: var(--font-display);
  font-size: 1.5rem;
  font-weight: 800;
  color: var(--text-bright);
  letter-spacing: -0.02em;
  line-height: 1;
}

.llm-model-label {
  font-size: 0.68rem;
  color: var(--text-dim);
  text-transform: uppercase;
  letter-spacing: 0.08em;
}
```
|
|
|
|
**Step 2: Commit**
|
|
|
|
```bash
git add cmd/web-ui/static/style.css
git commit -m "feat: add infrastructure page CSS"
```
|
|
|
|
---
|
|
|
|
### Task 7: Infrastructure page JS + nav rename
|
|
|
|
**Files:**
|
|
- Modify: `cmd/web-ui/static/app.js`
|
|
- Modify: `cmd/web-ui/static/index.html`
|
|
|
|
**Step 1: Update nav in index.html**
|
|
|
|
Change the nav link from `OpenClaw` to `Infra` and update the href:
|
|
|
|
Old:
|
|
```html
<nav><a href="/">Dashboard</a><a href="/sessions">Sessions</a><a href="/agents">Agents</a><a href="/openclaw">OpenClaw</a></nav>
```
|
|
|
|
New:
|
|
```html
<nav><a href="/">Dashboard</a><a href="/sessions">Sessions</a><a href="/agents">Agents</a><a href="/infrastructure">Infra</a></nav>
```
|
|
|
|
**Step 2: Update the router in app.js**
|
|
|
|
Change line ~153:
|
|
```js
} else if (path.startsWith('/openclaw')) {
  renderOpenClaw();
```
|
|
to:
|
|
```js
} else if (path.startsWith('/infrastructure')) {
  renderInfrastructure();
```
|
|
|
|
**Step 3: Add infraUnsubscribe state variable**
|
|
|
|
Near the existing `let openclawUnsubscribe = null;` declaration (~line 50), add:
|
|
```js
let infraUnsubscribe = null;
```
|
|
|
|
**Step 4: Update cleanupLiveViews to clean up infra subscription**
|
|
|
|
Find the `cleanupLiveViews` function (~line 107). Replace:
|
|
```js
if (openclawUnsubscribe) {
  openclawUnsubscribe();
  openclawUnsubscribe = null;
}
```
|
|
with:
|
|
```js
if (openclawUnsubscribe) {
  openclawUnsubscribe();
  openclawUnsubscribe = null;
}
if (infraUnsubscribe) {
  infraUnsubscribe();
  infraUnsubscribe = null;
}
```
|
|
|
|
**Step 5: Replace renderOpenClaw with renderInfrastructure**
|
|
|
|
Replace the existing `renderOpenClaw` function (lines ~664-680) entirely with:
|
|
|
|
```js
async function renderInfrastructure() {
  app.innerHTML = '<div class="page-header"><h2>Infrastructure</h2></div><p class="empty-state">Loading...</p>';

  infraUnsubscribe = subscribeWS(handleInfraWS);

  try {
    const [ocData, swarmData] = await Promise.all([
      api('/v1/events?event_type=openclaw.snapshot&limit=100'),
      api('/v1/events?event_type=swarm.snapshot&limit=10').catch(() => ({ events: [] })),
    ]);

    mergeOpenClawEvents(ocData.events || []);
    for (const evt of swarmData.events || []) mergeSwarmSnapshot(evt);

    if (isCurrentPath('/infrastructure')) {
      renderInfraGrid();
    }
  } catch (e) {
    if (isCurrentPath('/infrastructure')) {
      app.innerHTML = `<div class="page-header"><h2>Infrastructure</h2></div><p class="empty-state">Error: ${escapeHTML(e.message)}</p>`;
    }
  }
}
```
|
|
|
|
**Step 6: Replace handleOpenClawWS with handleInfraWS**
|
|
|
|
Replace the existing `handleOpenClawWS` function (lines ~682-699) with:
|
|
|
|
```js
function handleInfraWS(msg) {
  if (msg.type !== 'message') return;

  const eventType = getEnvelopeType(msg.data);

  if (eventType === 'openclaw.snapshot') {
    mergeOpenClawEvents([msg.data]);
    if (isCurrentPath('/infrastructure')) renderInfraGrid();
    if (isCurrentPath('/agents')) renderAgentVMStrip();
    return;
  }

  if (eventType === 'swarm.snapshot') {
    mergeSwarmSnapshot(msg.data);
    if (isCurrentPath('/infrastructure')) renderInfraGrid();
    renderSwarmStrip_dash();
    return;
  }

  if (eventType === 'swarm.service.snapshot') {
    mergeSwarmServiceSnapshot(msg.data);
    if (isCurrentPath('/infrastructure')) renderInfraGrid();
    renderSwarmStrip_dash();
    return;
  }
}
```
|
|
|
|
**Step 7: Add renderInfraGrid function**
|
|
|
|
Delete the existing `renderOpenClawGrid` function (lines ~718-785), after confirming nothing else calls it. Then add the new `renderInfraGrid`, which shows both VM cards and service cards, together with its card helpers, right after the new `handleInfraWS` function:
|
|
|
|
```js
|
|
function renderInfraGrid() {
|
|
const vmNames = Object.keys(openclawState.instances).sort();
|
|
const services = Object.values(swarmState.services);
|
|
|
|
app.innerHTML = `
|
|
<div class="page-header">
|
|
<h2>Infrastructure <span class="live-indicator"><span class="live-dot"></span>Live</span></h2>
|
|
</div>
|
|
|
|
<div class="infra-section">
|
|
<p class="infra-section-title">VMs</p>
|
|
${vmNames.length === 0
|
|
? '<p class="empty-state">No VM data</p>'
|
|
: `<div class="vm-grid">${vmNames.map(name => renderVMCard(name)).join('')}</div>`
|
|
}
|
|
</div>
|
|
|
|
<div class="infra-section">
|
|
<p class="infra-section-title">Services</p>
|
|
${services.length === 0
|
|
? '<p class="empty-state">No swarm service data</p>'
|
|
: `<div class="service-grid">${services.map(svc => renderServiceCard(svc)).join('')}</div>`
|
|
}
|
|
</div>
|
|
`;
|
|
}
|
|
|
|
function renderVMCard(name) {
|
|
const evt = openclawState.instances[name];
|
|
const payload = getEnvelopePayload(evt);
|
|
const inst = payload.instance || {};
|
|
const host = payload.host || {};
|
|
const guest = payload.guest;
|
|
const issues = payload.issues;
|
|
|
|
return `
|
|
<div class="vm-card">
|
|
<div class="vm-card-header">
|
|
<h3>${escapeHTML(inst.name || name)}</h3>
|
|
<div class="vm-status ${host.state === 'running' ? 'running' : 'stopped'}">
|
|
${host.state === 'running' ? 'Running' : 'Stopped'}
|
|
</div>
|
|
</div>
|
|
<div class="vm-updated">Updated ${escapeHTML(relativeTime(getEnvelopeTS(evt)))}</div>
|
|
<table class="vm-stats">
|
|
<tr><td>Host</td><td>${escapeHTML(inst.host || '-')}</td></tr>
|
|
<tr><td>Domain</td><td>${escapeHTML(inst.domain || '-')}</td></tr>
|
|
<tr><td>vCPUs</td><td>${host.vcpus || '-'}</td></tr>
|
|
<tr><td>Memory</td><td>${escapeHTML(formatBytes(host.memory_kib ? host.memory_kib * 1024 : 0) || '-')}</td></tr>
|
|
<tr><td>Disk</td><td>${escapeHTML(formatBytes(host.disk_actual_bytes) || '-')}</td></tr>
|
|
<tr><td>Autostart</td><td>${host.autostart ? 'Yes' : 'No'}</td></tr>
|
|
</table>
|
|
${guest ? `
|
|
<div class="vm-card-divider"></div>
|
|
<table class="vm-stats">
|
|
<tr><td>Gateway</td><td style="${guest.service_active ? 'color:var(--success)' : 'color:var(--error)'}">${guest.service_active ? 'Active' : 'Inactive'}</td></tr>
|
|
<tr><td>HTTP</td><td style="${guest.http_status === 200 ? 'color:var(--success)' : 'color:var(--error)'}">${guest.http_status || 'N/A'}</td></tr>
|
|
<tr><td>Version</td><td>${escapeHTML(guest.version || '-')}</td></tr>
|
|
<tr><td>Guest Mem</td><td>${guest.memory_percent !== undefined ? guest.memory_percent.toFixed(1) : '-'}%</td></tr>
|
|
<tr><td>Guest Disk</td><td>${guest.disk_percent !== undefined ? guest.disk_percent.toFixed(1) : '-'}%</td></tr>
|
|
<tr><td>Load</td><td>${guest.load_average !== undefined ? guest.load_average.toFixed(2) : '-'}</td></tr>
|
|
<tr><td>Uptime</td><td>${escapeHTML(guest.service_uptime || '-')}</td></tr>
|
|
</table>
|
|
` : ''}
|
|
${issues && Object.values(issues).some(Boolean) ? `
|
|
<div class="vm-card-divider"></div>
|
|
<div class="vm-issues-label">Issues</div>
|
|
<div class="vm-issues">
|
|
${Object.entries(issues).filter(([, value]) => value).map(([key]) => `
|
|
<span class="issue ${escapeHTML(key)}">${escapeHTML(key.replace(/_/g, ' '))}</span>
|
|
`).join('')}
|
|
</div>
|
|
` : ''}
|
|
</div>
|
|
`;
|
|
}
|
|
|
|
function renderServiceCard(svc) {
|
|
const role = svc.role || 'unknown';
|
|
switch (role) {
|
|
case 'llm-proxy': return renderLLMProxyCard(svc);
|
|
case 'db': return renderDBCard(svc);
|
|
case 'search': return renderSearchCard(svc);
|
|
case 'mcp': return renderMCPCard(svc);
|
|
case 'voice': return renderVoiceCard(svc);
|
|
case 'automation':return renderAutomationCard(svc);
|
|
default: return renderGenericServiceCard(svc);
|
|
}
|
|
}
|
|
|
|
function serviceCardHeader(svc) {
|
|
return `
|
|
<div class="service-card-header">
|
|
<div>
|
|
<div class="service-card-name">${escapeHTML(svc.name)}</div>
|
|
<div class="service-role-tag">${escapeHTML(svc.role || '')}</div>
|
|
</div>
|
|
<span class="service-badge ${escapeHTML(svc.status || 'down')}">${escapeHTML(svc.status || 'down')}</span>
|
|
</div>
|
|
`;
|
|
}

function serviceStatRow(label, value, valueClass) {
  // value is interpolated as-is so callers can pass pre-built strings;
  // callers must escape any untrusted text (e.g. container_state) first.
  return `
    <div class="service-stat-row">
      <span class="service-stat-label">${escapeHTML(label)}</span>
      <span class="service-stat-value${valueClass ? ' ' + valueClass : ''}">${value}</span>
    </div>
  `;
}

function formatUptime(sec) {
  if (!sec) return '-';
  if (sec < 60) return sec + 's';
  if (sec < 3600) return Math.floor(sec / 60) + 'm';
  if (sec < 86400) return Math.floor(sec / 3600) + 'h ' + Math.floor((sec % 3600) / 60) + 'm';
  return Math.floor(sec / 86400) + 'd ' + Math.floor((sec % 86400) / 3600) + 'h';
}

function renderLLMProxyCard(svc) {
  const extra = svc.extra || {};
  const modelCount = extra.model_count;
  const cooldowns = extra.cooldown_count || 0;
  const httpStatus = svc.http_status;
  const httpClass = httpStatus === 200 ? 'ok' : httpStatus ? 'bad' : '';

  return `
    <div class="service-card">
      ${serviceCardHeader(svc)}
      <div style="display:flex;align-items:baseline;gap:0.5rem">
        <span class="llm-model-count">${modelCount !== undefined ? modelCount : '-'}</span>
        <span class="llm-model-label">models</span>
      </div>
      ${cooldowns > 0 ? `<div class="llm-cooldown-banner">⚠ ${cooldowns} model${cooldowns > 1 ? 's' : ''} in cooldown</div>` : ''}
      <div class="service-stats">
        ${serviceStatRow('HTTP', httpStatus ? String(httpStatus) : '-', httpClass)}
        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
        ${serviceStatRow('Container', escapeHTML(svc.container_state || '-'), svc.container_state === 'running' ? 'ok' : 'bad')}
      </div>
    </div>
  `;
}

function renderDBCard(svc) {
  const healthClass = svc.health_state === 'healthy' ? 'ok' : svc.health_state === 'unhealthy' ? 'bad' : '';
  return `
    <div class="service-card">
      ${serviceCardHeader(svc)}
      <div class="service-stats">
        ${serviceStatRow('Health', escapeHTML(svc.health_state || 'none'), healthClass)}
        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
        ${serviceStatRow('Container', escapeHTML(svc.container_state || '-'), svc.container_state === 'running' ? 'ok' : 'bad')}
      </div>
    </div>
  `;
}

function renderSearchCard(svc) {
  const extra = svc.extra || {};
  const ms = extra.response_ms;
  const httpStatus = svc.http_status;
  const httpClass = httpStatus === 200 ? 'ok' : httpStatus ? 'bad' : '';
  return `
    <div class="service-card">
      ${serviceCardHeader(svc)}
      <div class="service-stats">
        ${serviceStatRow('HTTP', httpStatus ? String(httpStatus) : '-', httpClass)}
        ${ms !== undefined ? serviceStatRow('Response', ms + 'ms', ms < 500 ? 'ok' : 'warn') : ''}
        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
      </div>
    </div>
  `;
}

function renderMCPCard(svc) {
  const extra = svc.extra || {};
  const reachable = extra.port_reachable;
  return `
    <div class="service-card">
      ${serviceCardHeader(svc)}
      <div class="service-stats">
        ${reachable !== undefined ? serviceStatRow('Port', reachable ? 'reachable' : 'unreachable', reachable ? 'ok' : 'bad') : ''}
        ${serviceStatRow('Container', escapeHTML(svc.container_state || '-'), svc.container_state === 'running' ? 'ok' : 'bad')}
        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
      </div>
    </div>
  `;
}

function renderVoiceCard(svc) {
  const healthClass = svc.health_state === 'healthy' ? 'ok' : svc.health_state === 'unhealthy' ? 'bad' : '';
  return `
    <div class="service-card">
      ${serviceCardHeader(svc)}
      <div class="service-stats">
        ${serviceStatRow('Health', escapeHTML(svc.health_state || 'none'), healthClass)}
        ${serviceStatRow('Container', escapeHTML(svc.container_state || '-'), svc.container_state === 'running' ? 'ok' : 'bad')}
        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
      </div>
    </div>
  `;
}

function renderAutomationCard(svc) {
  const healthClass = svc.health_state === 'healthy' ? 'ok' : svc.health_state === 'unhealthy' ? 'bad' : '';
  return `
    <div class="service-card">
      ${serviceCardHeader(svc)}
      <div class="service-stats">
        ${serviceStatRow('Health', escapeHTML(svc.health_state || 'none'), healthClass)}
        ${serviceStatRow('Container', escapeHTML(svc.container_state || '-'), svc.container_state === 'running' ? 'ok' : 'bad')}
        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
      </div>
    </div>
  `;
}

function renderGenericServiceCard(svc) {
  return `
    <div class="service-card">
      ${serviceCardHeader(svc)}
      <div class="service-stats">
        ${serviceStatRow('Container', escapeHTML(svc.container_state || '-'), svc.container_state === 'running' ? 'ok' : 'bad')}
        ${serviceStatRow('Uptime', formatUptime(svc.uptime_sec), '')}
      </div>
    </div>
  `;
}
```
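
The cards lean on two small pieces of logic whose edge cases are easy to misread: the `formatUptime` buckets and the status-to-CSS-class ternaries. A minimal way to check them outside the browser is to run a copy under Node, as sketched here. `httpClassFor` and `healthClassFor` are hypothetical names for the ternaries that `renderLLMProxyCard`, `renderSearchCard`, and the health-based cards currently inline; `app.js` does not define them.

```javascript
// Copy of formatUptime from app.js, so it can run under Node without a DOM.
function formatUptime(sec) {
  if (!sec) return '-';
  if (sec < 60) return sec + 's';
  if (sec < 3600) return Math.floor(sec / 60) + 'm';
  if (sec < 86400) return Math.floor(sec / 3600) + 'h ' + Math.floor((sec % 3600) / 60) + 'm';
  return Math.floor(sec / 86400) + 'd ' + Math.floor((sec % 86400) / 3600) + 'h';
}

// Hypothetical helper: the HTTP ternary the LLM-proxy and search cards inline.
function httpClassFor(httpStatus) {
  return httpStatus === 200 ? 'ok' : httpStatus ? 'bad' : '';
}

// Hypothetical helper: the health ternary the db, voice, and automation cards inline.
function healthClassFor(healthState) {
  return healthState === 'healthy' ? 'ok' : healthState === 'unhealthy' ? 'bad' : '';
}

// Expected: 0 -> -, 45 -> 45s, 60 -> 1m, 3599 -> 59m, 3600 -> 1h 0m,
//           3661 -> 1h 1m, 86400 -> 1d 0h, 90061 -> 1d 1h
for (const sec of [0, 45, 60, 3599, 3600, 3661, 86400, 90061]) {
  console.log(sec, '->', formatUptime(sec));
}
```

Two consequences follow directly from the code: an uptime of exactly 0 seconds renders as `-`, the same as a missing value, and any HTTP status other than exactly the number 200 (for example 204, or the string `"200"`) is styled `bad`.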
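
**Step 8: Verify build**

Run: `cd /home/will/lab/agentmon && go build ./...`
Expected: no errors. `go build` does not parse `app.js`; if Node is installed, `node --check cmd/web-ui/static/app.js` catches JavaScript syntax errors.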

**Step 9: Commit**

```bash
git add cmd/web-ui/static/app.js cmd/web-ui/static/index.html
git commit -m "feat: rename OpenClaw to Infrastructure page, add service cards"
```

---

### Task 8: End-to-end verification

**Step 1: Build all binaries**

Run: `cd /home/will/lab/agentmon && go build ./...`
Expected: no errors. With multiple packages, `go build ./...` only checks that everything compiles and discards the results; it does not write binaries.
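
**Step 2: Test docker label filtering manually**

Run: `docker ps -a --filter label=agentmon.monitor=true --format "table {{.Names}}\t{{.Labels}}\t{{.Status}}"`
Expected: lists every container labeled `agentmon.monitor=true` with its labels and status. Because of `-a`, stopped containers are listed too, so check the Status column to see which are running. `litellm-init` should not appear, since it is deliberately left unlabeled.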
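
The table output above is meant for eyeballing. For a repeatable check, the sketch below parses a tab-separated variant of the same command and flags labeled containers that are not running or lack an `agentmon.role` label. It assumes the format string `{{.Names}}\t{{.Label "agentmon.role"}}\t{{.Status}}` and that running containers report a status beginning with `Up`; `parseDockerPs` and `findProblems` are hypothetical helpers, not part of agentmon, and the sample container names are illustrative.

```javascript
// Sketch: parse the tab-separated output of
//   docker ps -a --filter label=agentmon.monitor=true \
//     --format '{{.Names}}\t{{.Label "agentmon.role"}}\t{{.Status}}'
// Each non-empty line becomes { name, role, status, running }.
function parseDockerPs(text) {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => {
      const [name = '', role = '', status = ''] = line.split('\t');
      const s = status.trim();
      // Running containers report a status beginning with "Up" (e.g. "Up 2 hours (healthy)").
      return { name: name.trim(), role: role.trim(), status: s, running: s.startsWith('Up') };
    });
}

// Returns one message per labeled container that lacks a role or is not running.
function findProblems(containers) {
  const problems = [];
  for (const c of containers) {
    if (c.role === '') problems.push(`${c.name}: missing agentmon.role label`);
    if (!c.running) problems.push(`${c.name}: not running (${c.status})`);
  }
  return problems;
}

// Illustrative input; real container names depend on the compose project name.
const sample = [
  'swarm-whisper-server-1\tvoice\tUp 2 hours (healthy)',
  'swarm-searxng-1\tsearch\tExited (1) 5 minutes ago',
].join('\n');

// Reports a single problem, for swarm-searxng-1.
console.log(findProblems(parseDockerPs(sample)));
```

On a live host, the `docker ps` command in the comment can be run directly and its output passed to `parseDockerPs`; an empty problem list means every monitored container is up and carries a role.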

**Step 3: Test swarm-monitor dry run**

Run:
```bash
cd /home/will/lab/agentmon
NATS_URL=nats://localhost:4222 LITELLM_MASTER_KEY=$(source /home/will/lab/swarm/.env && echo $LITELLM_MASTER_KEY) \
  go run ./cmd/swarm-monitor/ 2>&1 | head -20
```
Expected: logs "swarm-monitor started", then either publishes events or logs connection errors. NATS may not be running locally, and that is fine: look for the collection phase to succeed before the publish fails.
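
When NATS is running and events are flowing, a useful companion check is whether each `swarm.service.snapshot` payload has the shape the Infrastructure page expects. The sketch below validates a decoded payload against the keys that the service-card renderers in `app.js` read (`name`, `role`, `status`, `container_state`, `uptime_sec`, `http_status`, `extra`). It assumes the payload reaches the frontend as a flat object with those keys; that mapping, the helper name `checkServicePayload`, and the sample `status` value are assumptions to confirm against the swarm-monitor code.

```javascript
// Roles that renderServiceCard dispatches on; any other role (or none) gets the generic card.
const KNOWN_ROLES = new Set(['llm-proxy', 'db', 'search', 'mcp', 'voice', 'automation']);

// Sketch: returns warnings for fields the service cards read. Assumes the
// swarm.service.snapshot payload is a flat object using these key names.
function checkServicePayload(svc) {
  if (typeof svc !== 'object' || svc === null) return ['payload is not an object'];
  const warnings = [];
  if (typeof svc.name !== 'string' || svc.name === '') warnings.push('missing name');
  if (!KNOWN_ROLES.has(svc.role)) {
    warnings.push(`role ${JSON.stringify(svc.role)} falls back to the generic card`);
  }
  if (typeof svc.status !== 'string') warnings.push('missing status (badge shows "down")');
  if (typeof svc.container_state !== 'string') warnings.push('missing container_state');
  // The cards compare http_status with === 200, so a string "200" would be styled as bad.
  // null/undefined are fine: the cards render '-' for a missing status.
  if (svc.http_status != null && typeof svc.http_status !== 'number') {
    warnings.push('http_status is not a number');
  }
  if (svc.uptime_sec != null && typeof svc.uptime_sec !== 'number') {
    warnings.push('uptime_sec is not a number');
  }
  // The renderers use `svc.extra || {}`, so a null or missing extra is fine; other non-objects are not.
  if (svc.extra != null && typeof svc.extra !== 'object') warnings.push('extra is not an object');
  return warnings;
}

// Illustrative payload; the status value 'up' is a placeholder, not a confirmed swarm-monitor value.
const sample = {
  name: 'searxng',
  role: 'search',
  status: 'up',
  container_state: 'running',
  uptime_sec: 3600,
  http_status: '200',
};
console.log(checkServicePayload(sample)); // [ 'http_status is not a number' ]
```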

**Step 4: Navigate to /infrastructure in the browser**

Open the web UI and navigate to `/infrastructure`.
Verify:
- Nav shows an "Infra" link, active when on `/infrastructure`
- VMs section shows the existing openclaw cards
- Services section shows either cards (if swarm events exist in the DB) or "No swarm service data"

**Step 5: Verify swarm strip on dashboard**

Navigate to `/`.
Verify:
- VM strip still shows (zap/orb/sun)
- Swarm strip renders below it (it may be empty if no `swarm.snapshot` events are in the DB yet)

**Step 6: Final commit (if any fixes were needed)**

```bash
git add -A
git commit -m "fix: infrastructure page and swarm strip polish"
```