agentmon/docs/plans/2026-03-13-agent-monitoring-plan.md
William Valentin 3434db3c59 feat: complete agent monitoring - hook, UI, and backend filter
- Add event_type and framework filters to events query endpoint
- Add /agents SPA route to web-ui server
- Add Agents nav link and route in frontend
- Add agents page CSS (timeline, VM pills, stats panel)
- Build VM status strip, activity timeline, and real-time stats
- Add agentmon hook for OpenClaw (HOOK.md + handler.ts)
- Add docker-compose, Dockerfile, and supporting infra files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 00:26:42 -07:00


Agent Activity Monitoring Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Monitor all OpenClaw agent activity (tool calls, messages, sessions, errors) across three VMs and display it in a real-time dashboard.

Architecture: An OpenClaw hook (TypeScript) on each VM captures agent events and POSTs them as agentmon envelopes to the ingest gateway. A new /agents page in the web UI renders a live activity timeline via WebSocket. A small backend addition filters events by framework.

Tech Stack: TypeScript (hook), Go (backend filter), Vanilla JS/CSS (UI)

Design doc: docs/plans/2026-03-13-agent-monitoring-design.md


Task 1: Add framework filter to events query

Files:

  • Modify: internal/store/postgres/query.go
  • Modify: cmd/query-api/main.go

The agents UI needs to load recent events filtered by source_framework = 'openclaw', but the existing ListRecentEvents accepts only a limit.

Step 1: Add EventsFilter struct and update ListRecentEvents

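In internal/store/postgres/query.go, replace the current ListRecentEvents function with:

// EventsFilter narrows ListRecentEvents. Empty string fields mean "no filter".
type EventsFilter struct {
	Limit     int    // defaults to 100, capped at 1000
	EventType string // matched against the events.type column
	Framework string // matched against the events.source_framework column
}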

func (d *DB) ListRecentEvents(ctx context.Context, f EventsFilter) ([]EventRow, error) {
	if f.Limit <= 0 {
		f.Limit = 100
	}
	if f.Limit > 1000 {
		f.Limit = 1000
	}

	query := "SELECT event_id, ts, type, payload FROM events WHERE 1=1"
	args := []any{}
	argN := 1

	if f.EventType != "" {
		query += fmt.Sprintf(" AND type = $%d", argN)
		args = append(args, f.EventType)
		argN++
	}
	if f.Framework != "" {
		query += fmt.Sprintf(" AND source_framework = $%d", argN)
		args = append(args, f.Framework)
		argN++
	}

	query += fmt.Sprintf(" ORDER BY ts DESC LIMIT $%d", argN)
	args = append(args, f.Limit)

	rows, err := d.sql.QueryContext(ctx, query, args...)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var out []EventRow
	for rows.Next() {
		var r EventRow
		if err := rows.Scan(&r.EventID, &r.TS, &r.Type, &r.Payload); err != nil {
			return nil, err
		}
		out = append(out, r)
	}
	return out, rows.Err()
}

Add "fmt" to the import block.

Step 2: Update the query-api handler to pass filters

In cmd/query-api/main.go, update the /v1/events handler (around line 113):

r.Get("/v1/events", func(w http.ResponseWriter, r *http.Request) {
    limit, _ := strconv.Atoi(r.URL.Query().Get("limit"))
    f := postgres.EventsFilter{
        Limit:     limit,
        EventType: r.URL.Query().Get("event_type"),
        Framework: r.URL.Query().Get("framework"),
    }
    events, err := db.ListRecentEvents(r.Context(), f)
    if err != nil {
        httpx.WriteJSON(w, http.StatusInternalServerError, map[string]any{"error": "db_error"})
        return
    }
    httpx.WriteJSON(w, http.StatusOK, map[string]any{"events": events})
})

Step 3: Verify it compiles

Run: cd /home/will/lab/agentmon && go build ./...
Expected: No errors

Step 4: Test the filter with curl

Run: curl -s 'http://localhost:8081/v1/events?framework=openclaw&limit=5'
Expected: {"events":null} or {"events":[]} (no openclaw events yet, but no errors)
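The query parameters accepted by the handler in Step 2 can also be assembled programmatically. The following is a minimal sketch, not part of the plan's code: `buildEventsURL`, `effectiveLimit`, and the `EventsQuery` shape are hypothetical names introduced here for illustration. The clamping mirrors the limits in ListRecentEvents (default 100, maximum 1000).

```typescript
// Hypothetical client-side helpers; names are illustrative, not from the codebase.
interface EventsQuery {
  limit?: number;
  eventType?: string;
  framework?: string;
}

// Mirrors the server-side clamping in ListRecentEvents.
function effectiveLimit(limit: number): number {
  if (limit <= 0) return 100;
  if (limit > 1000) return 1000;
  return limit;
}

// Builds the /v1/events URL, sending only the filters that are set.
function buildEventsURL(base: string, q: EventsQuery): string {
  const params = new URLSearchParams();
  if (q.eventType) params.set('event_type', q.eventType);
  if (q.framework) params.set('framework', q.framework);
  if (q.limit !== undefined) params.set('limit', String(q.limit));
  const qs = params.toString();
  return qs ? `${base}/v1/events?${qs}` : `${base}/v1/events`;
}
```

Omitting a parameter leaves the corresponding filter empty on the server, which the Go code treats as "no filter".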

Step 5: Commit

git add internal/store/postgres/query.go cmd/query-api/main.go
git commit -m "feat: add event_type and framework filters to events endpoint"

Task 2: Add /agents SPA route to Go server

Files:

  • Modify: cmd/web-ui/main.go:51

Step 1: Add /agents to the SPA catch-all

In cmd/web-ui/main.go, update line 51 to include /agents:

Change:

if r.URL.Path == "/" || strings.HasPrefix(r.URL.Path, "/sessions") || strings.HasPrefix(r.URL.Path, "/runs") || strings.HasPrefix(r.URL.Path, "/openclaw") {

To:

if r.URL.Path == "/" || strings.HasPrefix(r.URL.Path, "/sessions") || strings.HasPrefix(r.URL.Path, "/runs") || strings.HasPrefix(r.URL.Path, "/openclaw") || strings.HasPrefix(r.URL.Path, "/agents") {

Step 2: Verify it compiles

Run: cd /home/will/lab/agentmon && go build ./cmd/web-ui/
Expected: No errors

Step 3: Commit

git add cmd/web-ui/main.go
git commit -m "feat: add /agents SPA route to web-ui server"

Task 3: Add nav link and route stub

Files:

  • Modify: cmd/web-ui/static/index.html
  • Modify: cmd/web-ui/static/app.js

Step 1: Add "Agents" nav link in index.html

In cmd/web-ui/static/index.html, update the <nav> to add an Agents link before OpenClaw:

<nav><a href="/agents">Agents</a><a href="/openclaw">OpenClaw</a></nav>

Step 2: Add /agents route to the SPA router

In cmd/web-ui/static/app.js, update the route() function (around line 58). Add the agents route before the openclaw check:

function route() {
    const path = window.location.pathname;

    if (path === '/' || path === '/sessions') {
      renderSessions();
    } else if (path.startsWith('/agents')) {
      renderAgents();
    } else if (path.startsWith('/openclaw')) {
      renderOpenClaw();
    } else if (path.startsWith('/sessions/')) {
      const sessionID = path.split('/sessions/')[1];
      renderSession(sessionID);
    } else if (path.startsWith('/runs/')) {
      const runID = path.split('/runs/')[1];
      renderRun(runID);
    } else {
      app.innerHTML = '<p>Page not found</p>';
    }
  }
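The ordering of the checks in route() matters, since the exact match for /sessions must come before the /sessions/ prefix. One way to make that ordering testable is to separate matching from rendering. The sketch below is an illustration only: `matchRoute` and the `Route` type are hypothetical and do not exist in app.js.

```typescript
// Hypothetical pure matcher mirroring the branch order of route().
type Route =
  | { page: 'sessions' }
  | { page: 'agents' }
  | { page: 'openclaw' }
  | { page: 'session'; id: string }
  | { page: 'run'; id: string }
  | { page: 'not-found' };

function matchRoute(path: string): Route {
  if (path === '/' || path === '/sessions') return { page: 'sessions' };
  if (path.startsWith('/agents')) return { page: 'agents' };
  if (path.startsWith('/openclaw')) return { page: 'openclaw' };
  if (path.startsWith('/sessions/')) {
    return { page: 'session', id: path.slice('/sessions/'.length) };
  }
  if (path.startsWith('/runs/')) {
    return { page: 'run', id: path.slice('/runs/'.length) };
  }
  return { page: 'not-found' };
}
```

A router built this way would switch on the returned `page` value and call the matching render function.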

Step 3: Add a stub renderAgents function

Add this above the // Start comment at the bottom of app.js:

  // Agents page
  let agentsState = { events: [], vmStatus: {} };
  let agentsUnsubscribe = null;

  async function renderAgents() {
    app.innerHTML = `
      <div class="page-header"><h2>Agents</h2></div>
      <p class="empty-state">Loading agent activity...</p>
    `;
  }

Step 4: Verify in browser

Open http://localhost:8082/agents. The page should show "Loading agent activity...".

Step 5: Commit

git add cmd/web-ui/static/index.html cmd/web-ui/static/app.js
git commit -m "feat: add /agents route stub and nav link"

Task 4: Agents page CSS

Files:

  • Modify: cmd/web-ui/static/style.css

Step 1: Add all agents page styles

Append to the end of style.css:

/* ── Agents Page ───────────────────────────────────────── */
.agents-layout {
  display: grid;
  grid-template-columns: 1fr 280px;
  gap: 1.5rem;
  margin-top: 1.25rem;
}

@media (max-width: 900px) {
  .agents-layout { grid-template-columns: 1fr; }
}

/* VM Status Strip */
.vm-strip {
  display: flex;
  gap: 0.75rem;
  margin-bottom: 1.5rem;
}

.vm-pill {
  display: flex;
  align-items: center;
  gap: 0.5rem;
  padding: 0.5rem 1rem;
  background: var(--surface);
  border: 1px solid var(--border);
  border-radius: 999px;
  font-size: 0.78rem;
  font-weight: 600;
  letter-spacing: 0.04em;
  transition: border-color 0.2s;
}

.vm-pill.active {
  border-color: rgba(52, 211, 153, 0.3);
}

.vm-pill.inactive {
  border-color: rgba(248, 113, 113, 0.2);
  opacity: 0.6;
}

.vm-pill-dot {
  width: 7px;
  height: 7px;
  border-radius: 50%;
  flex-shrink: 0;
}

.vm-pill.active .vm-pill-dot {
  background: var(--success);
  box-shadow: 0 0 6px rgba(52, 211, 153, 0.5);
  animation: livePulse 2s ease-in-out infinite;
}

.vm-pill.inactive .vm-pill-dot {
  background: var(--error);
}

.vm-pill-name {
  font-family: var(--font-mono);
  color: var(--text-bright);
}

.vm-pill-label {
  color: var(--text-dim);
  font-size: 0.68rem;
  text-transform: uppercase;
  letter-spacing: 0.06em;
}

/* Activity Timeline */
.timeline {
  display: flex;
  flex-direction: column;
  gap: 0.5rem;
}

.timeline-event {
  background: var(--card);
  border: 1px solid var(--border);
  border-radius: var(--radius-lg);
  padding: 0.875rem 1.125rem;
  backdrop-filter: blur(8px);
  -webkit-backdrop-filter: blur(8px);
  animation: fadeUp 0.25s ease both;
  transition: border-color 0.15s;
}

.timeline-event:hover {
  border-color: rgba(34, 211, 238, 0.15);
}

.timeline-event-header {
  display: flex;
  align-items: center;
  gap: 0.6rem;
  margin-bottom: 0.35rem;
}

.timeline-vm-tag {
  font-family: var(--font-mono);
  font-size: 0.68rem;
  font-weight: 700;
  padding: 0.15rem 0.5rem;
  border-radius: 4px;
  letter-spacing: 0.05em;
  text-transform: uppercase;
}

.timeline-vm-tag.zap {
  background: rgba(34, 211, 238, 0.12);
  color: var(--accent);
  border: 1px solid rgba(34, 211, 238, 0.2);
}

.timeline-vm-tag.orb {
  background: rgba(167, 139, 250, 0.12);
  color: var(--purple);
  border: 1px solid rgba(167, 139, 250, 0.2);
}

.timeline-vm-tag.sun {
  background: rgba(251, 191, 36, 0.12);
  color: var(--warning);
  border: 1px solid rgba(251, 191, 36, 0.2);
}

.timeline-event-type {
  font-size: 0.75rem;
  font-weight: 600;
  color: var(--text-bright);
}

.timeline-event-time {
  font-family: var(--font-mono);
  font-size: 0.68rem;
  color: var(--text-dim);
  margin-left: auto;
}

.timeline-event-body {
  font-size: 0.82rem;
  color: var(--text);
  line-height: 1.5;
  padding-left: 0.15rem;
}

.timeline-event-body.tool-name {
  font-family: var(--font-mono);
  color: var(--accent);
  font-size: 0.78rem;
}

.timeline-event-body.message-preview {
  color: var(--text-dim);
  font-style: italic;
}

.timeline-event-body.error-message {
  color: var(--error);
}

.timeline-duration {
  font-family: var(--font-mono);
  font-size: 0.72rem;
  color: var(--text-dim);
  margin-left: 0.5rem;
}

.timeline-detail {
  margin-top: 0.5rem;
  padding: 0.75rem;
  background: #020508;
  border-radius: var(--radius);
  font-family: var(--font-mono);
  font-size: 0.75rem;
  color: #7a9ab5;
  white-space: pre-wrap;
  word-break: break-all;
  line-height: 1.65;
  display: none;
}

.timeline-event.expanded .timeline-detail {
  display: block;
}

.timeline-expand-hint {
  font-size: 0.68rem;
  color: var(--text-dim);
  cursor: pointer;
  margin-top: 0.3rem;
  letter-spacing: 0.03em;
}

.timeline-expand-hint:hover {
  color: var(--accent);
}

/* Stats Panel */
.stats-panel {
  display: flex;
  flex-direction: column;
  gap: 1rem;
}

.stat-card {
  background: var(--surface);
  border: 1px solid var(--border);
  border-radius: var(--radius-lg);
  padding: 1rem;
}

.stat-card-title {
  font-size: 0.68rem;
  font-weight: 700;
  color: var(--text-dim);
  text-transform: uppercase;
  letter-spacing: 0.1em;
  margin-bottom: 0.6rem;
}

.stat-card-value {
  font-family: var(--font-display);
  font-size: 1.6rem;
  font-weight: 800;
  color: var(--text-bright);
  letter-spacing: -0.02em;
}

.stat-card-sub {
  font-size: 0.72rem;
  color: var(--text-dim);
  margin-top: 0.1rem;
}

.stat-list {
  list-style: none;
}

.stat-list li {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: 0.35rem 0;
  border-bottom: 1px solid var(--border-soft);
  font-size: 0.8rem;
}

.stat-list li:last-child { border-bottom: none; }

.stat-list-name {
  font-family: var(--font-mono);
  font-size: 0.75rem;
  color: var(--text);
}

.stat-list-count {
  font-family: var(--font-mono);
  font-size: 0.72rem;
  color: var(--text-dim);
  background: var(--surface-2);
  padding: 0.1rem 0.4rem;
  border-radius: 4px;
}

/* Event type icons */
.event-icon {
  width: 18px;
  height: 18px;
  border-radius: 4px;
  display: flex;
  align-items: center;
  justify-content: center;
  font-size: 0.6rem;
  flex-shrink: 0;
}

.event-icon.message-in {
  background: rgba(52, 211, 153, 0.12);
  color: var(--success);
  border: 1px solid rgba(52, 211, 153, 0.25);
}

.event-icon.message-out {
  background: rgba(34, 211, 238, 0.12);
  color: var(--accent);
  border: 1px solid rgba(34, 211, 238, 0.25);
}

.event-icon.tool {
  background: rgba(167, 139, 250, 0.12);
  color: var(--purple);
  border: 1px solid rgba(167, 139, 250, 0.25);
}

.event-icon.error {
  background: rgba(248, 113, 113, 0.12);
  color: var(--error);
  border: 1px solid rgba(248, 113, 113, 0.25);
}

.event-icon.session {
  background: rgba(251, 191, 36, 0.12);
  color: var(--warning);
  border: 1px solid rgba(251, 191, 36, 0.25);
}

.event-icon.internal {
  background: var(--surface-2);
  color: var(--text-dim);
  border: 1px solid var(--border);
}

Step 2: Commit

git add cmd/web-ui/static/style.css
git commit -m "feat: add agents page CSS with timeline, stats panel, vm pills"

Task 5: Build the VM status strip

Files:

  • Modify: cmd/web-ui/static/app.js

Step 1: Update renderAgents to show VM status strip

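Replace the stub from Task 3 (the agentsState and agentsUnsubscribe declarations together with the renderAgents function) with the following. Keeping the old let declarations alongside the new ones would be a syntax error: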

  // Agents page
  let agentsState = { events: [], vmStatus: {}, stats: { messages: 0, tools: 0, errors: 0, toolCounts: {} } };
  let agentsUnsubscribe = null;

  function getVMStatus() {
    // Reuse openclaw instance data if available
    const vms = ['zap', 'orb', 'sun'];
    return vms.map(name => {
      const inst = openclawState.instances[name];
      if (!inst) return { name, active: false };
      const payload = inst.payload?.payload || inst.payload || {};
      const host = payload.host || {};
      return { name, active: host.state === 'running' };
    });
  }

  async function renderAgents() {
    // Load openclaw status for VM pills
    try {
      const data = await api('/v1/events?event_type=openclaw.snapshot&limit=100');
      updateOpenClawState(data.events || []);
    } catch (e) { /* ignore */ }

    const vms = getVMStatus();

    app.innerHTML = `
      <div class="page-header">
        <h2>Agents <span class="live-indicator"><span class="live-dot"></span>Live</span></h2>
      </div>
      <div class="vm-strip">
        ${vms.map(vm => `
          <div class="vm-pill ${vm.active ? 'active' : 'inactive'}">
            <span class="vm-pill-dot"></span>
            <span class="vm-pill-name">${vm.name}</span>
            <span class="vm-pill-label">${vm.active ? 'online' : 'offline'}</span>
          </div>
        `).join('')}
      </div>
      <div class="agents-layout">
        <div class="timeline" id="agents-timeline">
          <p class="empty-state">Waiting for agent activity...</p>
        </div>
        <div class="stats-panel" id="agents-stats">
          <div class="stat-card">
            <div class="stat-card-title">Messages</div>
            <div class="stat-card-value" id="stat-messages">0</div>
            <div class="stat-card-sub">received &amp; sent</div>
          </div>
          <div class="stat-card">
            <div class="stat-card-title">Tool Calls</div>
            <div class="stat-card-value" id="stat-tools">0</div>
          </div>
          <div class="stat-card">
            <div class="stat-card-title">Errors</div>
            <div class="stat-card-value" id="stat-errors">0</div>
          </div>
          <div class="stat-card">
            <div class="stat-card-title">Top Tools</div>
            <ul class="stat-list" id="stat-top-tools">
              <li style="color:var(--text-dim);font-size:0.8rem">No data yet</li>
            </ul>
          </div>
        </div>
      </div>
    `;

    // Load initial events and subscribe to WebSocket
    await loadAgentEvents();
    if (agentsUnsubscribe) agentsUnsubscribe();
    agentsUnsubscribe = subscribeWS(handleAgentsWS);
  }
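The status derivation in getVMStatus can be checked in isolation if it takes the instance map as an argument instead of reading openclawState. This is a sketch under assumptions: the `HostHolder` and `SnapshotInstance` shapes are inferred from the property accesses in getVMStatus and may not match the real snapshot payload, and `vmStatusFrom` is a hypothetical name.

```typescript
// Shapes inferred from getVMStatus; the actual snapshot payload may differ.
interface HostHolder {
  host?: { state?: string };
}
interface SnapshotInstance {
  payload?: HostHolder & { payload?: HostHolder };
}
interface VMStatus {
  name: string;
  active: boolean;
}

// A VM counts as active only when a snapshot exists and reports state "running".
function vmStatusFrom(
  instances: Record<string, SnapshotInstance | undefined>,
  names: string[],
): VMStatus[] {
  return names.map((name) => {
    const inst = instances[name];
    if (!inst) return { name, active: false };
    // Accept both the nested (payload.payload) and flat (payload) layouts.
    const payload: HostHolder = inst.payload?.payload ?? inst.payload ?? {};
    return { name, active: payload.host?.state === 'running' };
  });
}
```

A VM with no snapshot at all is reported as offline, which matches the pill rendering in renderAgents.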

Step 2: Verify in browser

Open http://localhost:8082/agents. The page should show the VM pills and an empty timeline with the stats cards.

Step 3: Commit

git add cmd/web-ui/static/app.js
git commit -m "feat: agents page layout with VM strip, timeline, and stats panel"

Task 6: Build the activity timeline and stats

Files:

  • Modify: cmd/web-ui/static/app.js

Step 1: Add loadAgentEvents, handleAgentsWS, and rendering functions

Add these functions after renderAgents:

  async function loadAgentEvents() {
    try {
      const data = await api('/v1/events?framework=openclaw&limit=200');
      const events = (data.events || []).reverse();
      agentsState.events = events;
      agentsState.stats = { messages: 0, tools: 0, errors: 0, toolCounts: {} };
      events.forEach(e => updateAgentStats(e));
      renderAgentTimeline();
      renderAgentStats();
    } catch (e) {
      console.error('Failed to load agent events:', e);
    }
  }

  function handleAgentsWS(msg) {
    if (msg.type !== 'message') return;
    try {
      const raw = typeof msg.data === 'string' ? JSON.parse(msg.data) : msg.data;
      // Check if this is an openclaw agent event (framework = openclaw, but not openclaw.snapshot)
      const eventType = raw.event?.type || raw.payload?.event?.type || raw.type;
      const framework = raw.event?.source?.framework || raw.payload?.event?.source?.framework || raw.source_framework;
      if (framework !== 'openclaw' || eventType === 'openclaw.snapshot') return;

      agentsState.events.push(raw);
      // Keep max 500 events in memory
      if (agentsState.events.length > 500) agentsState.events.shift();

      updateAgentStats(raw);
      renderAgentTimeline();
      renderAgentStats();
    } catch (e) { /* ignore parse errors */ }
  }

  function updateAgentStats(evt) {
    const payload = evt.payload || {};
    const eventPayload = payload.payload || payload;
    const eventType = evt.type || payload.event?.type || evt.event?.type;
    const attrs = evt.attributes || payload.attributes || {};

    if (eventType === 'run.start' || eventType === 'run.end') {
      agentsState.stats.messages++;
    } else if (eventType === 'span.end') {
      agentsState.stats.tools++;
      const toolName = attrs.name || 'unknown';
      agentsState.stats.toolCounts[toolName] = (agentsState.stats.toolCounts[toolName] || 0) + 1;
    } else if (eventType === 'error') {
      agentsState.stats.errors++;
    }
  }

  function getEventIcon(eventType) {
    switch (eventType) {
      case 'run.start': return '<div class="event-icon message-in">&#x2193;</div>';
      case 'run.end': return '<div class="event-icon message-out">&#x2191;</div>';
      case 'span.start':
      case 'span.end': return '<div class="event-icon tool">&#x2699;</div>';
      case 'error': return '<div class="event-icon error">!</div>';
      case 'session.start':
      case 'session.end': return '<div class="event-icon session">&#x25CB;</div>';
      default: return '<div class="event-icon internal">&#xB7;</div>';
    }
  }

  function getEventLabel(eventType) {
    const labels = {
      'session.start': 'Session Started',
      'session.end': 'Session Ended',
      'run.start': 'Message Received',
      'run.end': 'Response Sent',
      'span.start': 'Tool Started',
      'span.end': 'Tool Completed',
      'error': 'Error',
      'metric.snapshot': 'Metric',
    };
    return labels[eventType] || eventType;
  }

  function getVMName(evt) {
    const payload = evt.payload || {};
    return evt.client_id || payload.event?.source?.client_id || payload.source?.client_id || evt.event?.source?.client_id || 'unknown';
  }

  // Escape untrusted text (message previews, tool names, errors) before
  // interpolating it into innerHTML.
  function escapeHTML(value) {
    return String(value).replace(/[&<>"']/g, c => ({
      '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;',
    }[c]));
  }

  function getEventBody(evt) {
    const eventType = evt.type || evt.payload?.event?.type || evt.event?.type;
    const payload = evt.payload?.payload || evt.payload || {};
    const attrs = evt.attributes || evt.payload?.attributes || {};

    if (eventType === 'span.end' || eventType === 'span.start') {
      const name = escapeHTML(attrs.name || 'unknown tool');
      const dur = payload.duration_ms ? ` <span class="timeline-duration">${formatDuration(payload.duration_ms)}</span>` : '';
      return `<div class="timeline-event-body tool-name">${name}${dur}</div>`;
    }
    if (eventType === 'run.start') {
      const preview = payload.message_preview || '';
      // Truncate first, then escape, so an entity is never cut in half
      const shown = escapeHTML(preview.substring(0, 120)) + (preview.length > 120 ? '...' : '');
      return preview ? `<div class="timeline-event-body message-preview">"${shown}"</div>` : '';
    }
    if (eventType === 'run.end') {
      const status = payload.status || 'unknown';
      const error = payload.error;
      if (error) return `<div class="timeline-event-body error-message">${escapeHTML(error)}</div>`;
      return `<div class="timeline-event-body">${statusIcon(status)}</div>`;
    }
    if (eventType === 'error') {
      const errPayload = payload.error || {};
      return `<div class="timeline-event-body error-message">${escapeHTML(errPayload.type || 'error')}: ${escapeHTML(errPayload.message || 'unknown')}</div>`;
    }
    return '';
  }

  function renderAgentTimeline() {
    const timeline = document.getElementById('agents-timeline');
    if (!timeline) return;

    // Show most recent events first (last N)
    const recent = agentsState.events.slice(-100).reverse();

    if (recent.length === 0) {
      timeline.innerHTML = '<p class="empty-state">Waiting for agent activity...</p>';
      return;
    }

    timeline.innerHTML = recent.map((evt, i) => {
      const eventType = evt.type || evt.payload?.event?.type || evt.event?.type;
      const ts = evt.ts || evt.payload?.event?.ts || evt.event?.ts;
      // client_id is supplied by the sender, so escape it for both the class and the label
      const vmName = escapeHTML(getVMName(evt));
      const timeStr = ts ? new Date(ts).toLocaleTimeString() : '';
      const hasPayload = evt.payload && Object.keys(evt.payload).length > 0;

      return `
        <div class="timeline-event" data-index="${i}">
          <div class="timeline-event-header">
            ${getEventIcon(eventType)}
            <span class="timeline-vm-tag ${vmName}">${vmName}</span>
            <span class="timeline-event-type">${escapeHTML(getEventLabel(eventType))}</span>
            <span class="timeline-event-time">${timeStr}</span>
          </div>
          ${getEventBody(evt)}
          ${hasPayload ? '<div class="timeline-expand-hint" onclick="this.parentElement.classList.toggle(\'expanded\')">details</div>' : ''}
          ${hasPayload ? `<div class="timeline-detail">${escapeHTML(JSON.stringify(evt.payload, null, 2))}</div>` : ''}
        </div>
      `;
    }).join('');
  }

  function renderAgentStats() {
    const s = agentsState.stats;
    const el = (id) => document.getElementById(id);
    if (el('stat-messages')) el('stat-messages').textContent = s.messages;
    if (el('stat-tools')) el('stat-tools').textContent = s.tools;
    if (el('stat-errors')) el('stat-errors').textContent = s.errors;

    const topTools = Object.entries(s.toolCounts)
      .sort((a, b) => b[1] - a[1])
      .slice(0, 8);

    const list = el('stat-top-tools');
    if (list) {
      if (topTools.length === 0) {
        list.innerHTML = '<li style="color:var(--text-dim);font-size:0.8rem">No data yet</li>';
      } else {
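        // Build rows with textContent so tool names are never parsed as HTML
        list.innerHTML = '';
        topTools.forEach(([name, count]) => {
          const li = document.createElement('li');
          const nameEl = document.createElement('span');
          nameEl.className = 'stat-list-name';
          nameEl.textContent = name;
          const countEl = document.createElement('span');
          countEl.className = 'stat-list-count';
          countEl.textContent = count;
          li.append(nameEl, countEl);
          list.appendChild(li);
        });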
      }
    }
  }
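Several functions above probe the same three locations for each field (`evt.type`, `evt.payload.event.type`, `evt.event.type`). A single normalization step could replace that repetition. The sketch below assumes, as the code above appears to, that stored rows carry the full envelope under `payload` while WebSocket frames are the envelope itself. `normalizeEvent` and `NormalizedEvent` are hypothetical names, not part of app.js.

```typescript
// Hypothetical normalizer; field locations are inferred from the timeline code.
interface NormalizedEvent {
  type: string;
  ts: string;
  vm: string;
  attributes: Record<string, unknown>;
  body: Record<string, unknown>;
}

type LooseEvent = Record<string, any>;

function normalizeEvent(evt: LooseEvent): NormalizedEvent {
  // A stored row looks like { type, ts, payload: <envelope> };
  // a WebSocket frame is assumed to be the envelope itself.
  const envelope: LooseEvent = evt.payload?.event ? evt.payload : evt;
  const meta: LooseEvent = envelope.event ?? {};
  return {
    type: evt.type ?? meta.type ?? 'unknown',
    ts: evt.ts ?? meta.ts ?? '',
    vm: evt.client_id ?? meta.source?.client_id ?? 'unknown',
    attributes: envelope.attributes ?? {},
    body: envelope.payload ?? {},
  };
}
```

With events normalized once on arrival, the stats and rendering functions could read `type`, `vm`, `attributes`, and `body` directly.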

Step 2: Verify in browser

Open http://localhost:8082/agents. The page should show the full layout. No events appear yet because the hook is not deployed, but the structure should be complete, with an empty timeline and zero stats.

Step 3: Commit

git add cmd/web-ui/static/app.js
git commit -m "feat: agents activity timeline and stats panel with real-time WebSocket"

Task 7: Create the OpenClaw hook — HOOK.md

Files:

  • Create: hooks/agentmon/HOOK.md

Step 1: Create the hook metadata file

---
name: agentmon
description: "Emit agent telemetry events to the agentmon monitoring system"
metadata:
  openclaw:
    emoji: "\U0001F4CA"
    events:
      - "command:new"
      - "command:stop"
      - "command:reset"
      - "message:received"
      - "message:sent"
      - "tool_result_persist"
      - "session:compact:before"
      - "session:compact:after"
    export: "default"
    requires:
      env:
        - "AGENTMON_INGEST_URL"
---

# Agentmon Telemetry Hook

Captures OpenClaw agent activity and emits it as structured events to the
agentmon ingest gateway for monitoring and visualization.

## Events Captured

| Event | Maps To | Description |
|-------|---------|-------------|
| command:new | session.start | Agent session begins |
| command:stop/reset | session.end | Session ends |
| message:received | run.start | Inbound message starts a turn |
| message:sent | run.end | Agent response completes |
| tool_result_persist | span.end | Tool call completed |
| session:compact:* | span (internal) | Context management |

## Configuration

Set `AGENTMON_INGEST_URL` to the agentmon ingest gateway URL:

export AGENTMON_INGEST_URL=http://192.168.122.1:8080


Optionally set `AGENTMON_VM_NAME` to override the VM identifier (defaults
to system hostname).

Step 2: Commit

git add hooks/agentmon/HOOK.md
git commit -m "feat: add agentmon hook metadata for OpenClaw"
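The mapping table in HOOK.md can be expressed as a lookup keyed by the OpenClaw event name. This is an illustrative sketch only: `EVENT_MAP` and `mapHookEvent` are hypothetical names. The reset entry lists two events because handler.ts in Task 8 emits session.end followed by session.start on reset.

```typescript
// Hypothetical lookup derived from the "Events Captured" table.
type AgentmonType =
  | 'session.start'
  | 'session.end'
  | 'run.start'
  | 'run.end'
  | 'span.start'
  | 'span.end';

const EVENT_MAP: Record<string, AgentmonType[]> = {
  'command:new': ['session.start'],
  'command:stop': ['session.end'],
  'command:reset': ['session.end', 'session.start'],
  'message:received': ['run.start'],
  'message:sent': ['run.end'],
  'tool_result_persist': ['span.end'],
  'session:compact:before': ['span.start'],
  'session:compact:after': ['span.end'],
};

// Returns the agentmon event types for a hook event, or [] if unmapped.
function mapHookEvent(type: string, action?: string): AgentmonType[] {
  const key = action ? `${type}:${action}` : type;
  return EVENT_MAP[key] ?? [];
}
```

Unmapped events return an empty list, so a caller can ignore them without special cases.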

Task 8: Create the OpenClaw hook — handler.ts

Files:

  • Create: hooks/agentmon/handler.ts

Step 1: Write the complete handler

import { randomUUID } from 'crypto';
import { hostname } from 'os';

// --- Configuration ---

const INGEST_URL = process.env.AGENTMON_INGEST_URL || 'http://192.168.122.1:8080';
const VM_NAME = process.env.AGENTMON_VM_NAME || hostname();
const BATCH_SIZE = 10;
const FLUSH_MS = 2000;
const FETCH_TIMEOUT_MS = 500;

// --- State ---

let buffer: any[] = [];
let flushTimer: ReturnType<typeof setTimeout> | null = null;
const activeRuns = new Map<string, string>(); // sessionKey -> runId
const activeCompactions = new Map<string, string>(); // sessionKey -> spanId

// --- Envelope builder ---

function envelope(
  type: string,
  sessionKey: string,
  opts: {
    runId?: string;
    spanId?: string;
    traceId?: string;
    parentSpanId?: string;
    attributes?: Record<string, any>;
    payload?: Record<string, any>;
  } = {}
) {
  return {
    schema: { name: 'agentmon.event', version: 1 },
    event: {
      id: randomUUID(),
      type,
      ts: new Date().toISOString(),
      source: {
        framework: 'openclaw',
        client_id: VM_NAME,
        host: VM_NAME,
      },
    },
    correlation: {
      session_id: sessionKey || undefined,
      run_id: opts.runId || undefined,
      trace_id: opts.traceId || undefined,
      span_id: opts.spanId || undefined,
      parent_span_id: opts.parentSpanId || undefined,
    },
    attributes: opts.attributes || undefined,
    payload: opts.payload || undefined,
  };
}

// --- Buffered emitter ---

function enqueue(evt: any) {
  buffer.push(evt);
  if (buffer.length >= BATCH_SIZE) {
    flush();
  } else if (!flushTimer) {
    flushTimer = setTimeout(flush, FLUSH_MS);
  }
}

async function flush() {
  if (flushTimer) {
    clearTimeout(flushTimer);
    flushTimer = null;
  }
  if (buffer.length === 0) return;
  const batch = buffer.splice(0);
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS);
  try {
    const res = await fetch(`${INGEST_URL}/v1/events`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(batch),
      signal: controller.signal,
    });
    // fetch resolves on HTTP errors too, so surface non-2xx responses
    if (!res.ok) {
      console.debug(`[agentmon] ingest returned ${res.status} for ${batch.length} events`);
    }
  } catch {
    // Fire-and-forget: log but never block the agent loop
    console.debug(`[agentmon] flush failed for ${batch.length} events`);
  } finally {
    // Always clear the abort timer, including when fetch rejects
    clearTimeout(timeout);
  }
}

// --- Event handler ---

const handler = async (event: any) => {
  const sk = event.sessionKey || 'unknown';

  try {
    // --- Command events ---
    if (event.type === 'command') {
      if (event.action === 'new') {
        enqueue(envelope('session.start', sk));
      } else if (event.action === 'stop') {
        enqueue(envelope('session.end', sk));
        activeRuns.delete(sk);
      } else if (event.action === 'reset') {
        enqueue(envelope('session.end', sk));
        activeRuns.delete(sk);
        enqueue(envelope('session.start', sk));
      }
      return;
    }

    // --- Message events ---
    if (event.type === 'message') {
      if (event.action === 'received') {
        const runId = randomUUID();
        activeRuns.set(sk, runId);
        enqueue(envelope('run.start', sk, {
          runId,
          attributes: {
            channel: event.context?.channelId,
            from: event.context?.from,
          },
          payload: {
            message_preview: (event.context?.content || '').substring(0, 200),
          },
        }));
      } else if (event.action === 'sent') {
        const runId = activeRuns.get(sk);
        enqueue(envelope('run.end', sk, {
          runId,
          attributes: {
            channel: event.context?.channelId,
            to: event.context?.to,
          },
          payload: {
            status: event.context?.success !== false ? 'success' : 'error',
            error: event.context?.error || undefined,
          },
        }));
        // Don't delete runId yet - tools may still fire between sent events
      }
      return;
    }

    // --- Tool result ---
    if (event.type === 'tool_result_persist') {
      const runId = activeRuns.get(sk);
      const spanId = randomUUID();
      const toolName = event.context?.toolName || event.context?.name || 'unknown_tool';
      enqueue(envelope('span.end', sk, {
        runId,
        spanId,
        attributes: {
          span_kind: 'tool',
          name: toolName,
        },
        payload: {
          status: 'success',
          result_preview: JSON.stringify(event.context?.result || '').substring(0, 500),
        },
      }));
      return;
    }

    // --- Session compaction ---
    if (event.type === 'session') {
      const runId = activeRuns.get(sk);
      if (event.action === 'compact:before') {
        const spanId = randomUUID();
        activeCompactions.set(sk, spanId);
        enqueue(envelope('span.start', sk, {
          runId,
          spanId,
          attributes: { span_kind: 'internal', name: 'context_compaction' },
        }));
      } else if (event.action === 'compact:after') {
        const spanId = activeCompactions.get(sk) || randomUUID();
        activeCompactions.delete(sk);
        enqueue(envelope('span.end', sk, {
          runId,
          spanId,
          attributes: { span_kind: 'internal', name: 'context_compaction' },
          payload: { status: 'success' },
        }));
      }
      return;
    }
  } catch {
    console.debug('[agentmon] handler error');
  }
};

export default handler;
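The buffering rule in enqueue and flush (send when the batch is full, otherwise wait for the timer) can be isolated from the timer and the network so it is easy to test. The class below is a sketch of that idea, not the handler's implementation: `EventBatcher` is a hypothetical name, and the caller is assumed to own the timer and the HTTP call.

```typescript
// Hypothetical size-triggered batcher; timers and fetch are left to the caller.
class EventBatcher<T> {
  private buffer: T[] = [];
  private readonly batchSize: number;

  constructor(batchSize: number) {
    this.batchSize = batchSize;
  }

  // Adds an event. Returns a full batch once the size threshold is reached,
  // otherwise null (the caller would arm its flush timer in that case).
  push(evt: T): T[] | null {
    this.buffer.push(evt);
    return this.buffer.length >= this.batchSize ? this.drain() : null;
  }

  // Removes and returns everything buffered so far, as a timer-driven flush would.
  drain(): T[] {
    return this.buffer.splice(0);
  }

  get pending(): number {
    return this.buffer.length;
  }
}
```

Because `drain` empties the buffer before anything is sent, events that arrive while a batch is in flight go into the next batch, which is the same property `buffer.splice(0)` gives the handler above.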

Step 2: Commit

git add hooks/agentmon/handler.ts
git commit -m "feat: agentmon hook handler with buffered event emission"

Task 9: Rebuild and verify end-to-end

Step 1: Rebuild containers

docker compose build --no-cache
docker compose up -d

Step 2: Verify the framework filter works

curl -s 'http://localhost:8081/v1/events?framework=openclaw&limit=5'

Expected: {"events":null} or empty array (no errors)

Step 3: Send a synthetic test event to verify the pipeline

curl -s -X POST http://localhost:8080/v1/events \
  -H 'Content-Type: application/json' \
  -d '[{
    "schema": {"name": "agentmon.event", "version": 1},
    "event": {
      "id": "test-agent-001",
      "type": "run.start",
      "ts": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'",
      "source": {"framework": "openclaw", "client_id": "zap", "host": "zap"}
    },
    "correlation": {"session_id": "test-session-1", "run_id": "test-run-1"},
    "payload": {"message_preview": "Hello, check the logs please"}
  }]'

Expected: {"accepted":1,"rejected":0}
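To exercise each VM tag in the timeline, the same synthetic envelope can be generated per VM rather than hand-edited. The sketch below reproduces the fields of the curl body above. `syntheticRunStart` is a hypothetical helper, and the sequence-based IDs are an assumption chosen to match the `test-agent-001` pattern.

```typescript
// Hypothetical generator for test envelopes shaped like the curl body above.
function syntheticRunStart(vm: string, now: Date, seq: number) {
  return {
    schema: { name: 'agentmon.event', version: 1 },
    event: {
      id: `test-agent-${String(seq).padStart(3, '0')}`,
      type: 'run.start',
      ts: now.toISOString(),
      source: { framework: 'openclaw', client_id: vm, host: vm },
    },
    correlation: {
      session_id: `test-session-${seq}`,
      run_id: `test-run-${seq}`,
    },
    payload: { message_preview: 'Hello, check the logs please' },
  };
}

// Builds one envelope per VM; the result is a batch ready to be JSON-encoded.
function syntheticBatch(vms: string[], now: Date) {
  return vms.map((vm, i) => syntheticRunStart(vm, now, i + 1));
}
```

The JSON-encoded result of `syntheticBatch(['zap', 'orb', 'sun'], new Date())` could be posted to the ingest endpoint in place of the inline curl body.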

Step 4: Verify it appears in the agents page

curl -s 'http://localhost:8081/v1/events?framework=openclaw&limit=5'

Expected: The test event appears in the response.

Open http://localhost:8082/agents in a browser. The test event should appear in the timeline.

Step 5: Commit

git add -A
git commit -m "feat: complete agent monitoring - hook, UI, and backend filter"

Task 10: Deploy hook to VMs

Step 1: Copy the hook to each VM

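for vm in zap orb sun; do
  IP=$(virsh domifaddr $vm | grep -oP '192\.168\.122\.\d+')
  # Copy into hooks/ so that re-running does not nest agentmon/agentmon
  ssh openclaw@${IP} "mkdir -p ~/.openclaw/hooks"
  scp -r hooks/agentmon openclaw@${IP}:~/.openclaw/hooks/
done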

Step 2: Set the environment variable on each VM

# Match only the inet address; the unanchored pattern also matches the broadcast address
HOST_IP=$(ip -4 addr show virbr0 | grep -oP 'inet \K192\.168\.122\.\d+')
for vm in zap orb sun; do
  IP=$(virsh domifaddr $vm | grep -oP '192\.168\.122\.\d+')
  ssh openclaw@${IP} "echo 'AGENTMON_INGEST_URL=http://${HOST_IP}:8080' >> ~/.openclaw/.env"
  ssh openclaw@${IP} "echo 'AGENTMON_VM_NAME=${vm}' >> ~/.openclaw/.env"
done
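Because the loop above appends with `>>`, running it a second time leaves duplicate entries in ~/.openclaw/.env. An idempotent alternative is to rewrite the file so each key appears once. The function below sketches that logic on the file contents only. `upsertEnvLine` is a hypothetical helper, and how the result is written back to each VM is left open.

```typescript
// Hypothetical helper: returns .env contents with exactly one KEY=value line for key.
function upsertEnvLine(content: string, key: string, value: string): string {
  // Drop a single trailing newline so split() does not yield an empty last line.
  const lines = content === '' ? [] : content.replace(/\n$/, '').split('\n');
  // Remove every existing assignment of this key, then append the new one.
  const kept = lines.filter((line) => !line.startsWith(`${key}=`));
  kept.push(`${key}=${value}`);
  return kept.join('\n') + '\n';
}
```

Applying the function twice with the same arguments yields the same contents, so a redeploy does not grow the file.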

Step 3: Verify hooks are discovered

for vm in zap orb sun; do
  IP=$(virsh domifaddr $vm | grep -oP '192\.168\.122\.\d+')
  ssh openclaw@${IP} "openclaw hooks list --json" | jq '.[] | select(.name == "agentmon")'
done

Expected: The agentmon hook appears in the list for each VM.

Step 4: Send a test message to one agent and confirm the event appears

Send a message to any OpenClaw instance through its configured channel. Watch the agents page at http://localhost:8082/agents for the event to appear in the timeline.


Summary

| Task | Component | What it does |
|------|-----------|--------------|
| 1 | Backend | Add framework/event_type filters to events query |
| 2 | Backend | Add /agents SPA route to Go server |
| 3 | Frontend | Add nav link and route stub |
| 4 | Frontend | Agents page CSS (timeline, stats, VM pills) |
| 5 | Frontend | VM status strip and page layout |
| 6 | Frontend | Activity timeline rendering and stats panel |
| 7 | Hook | HOOK.md metadata file |
| 8 | Hook | handler.ts with buffered event emission |
| 9 | Infra | Rebuild, test pipeline end-to-end |
| 10 | Infra | Deploy hook to VMs |