- Add event_type and framework filters to events query endpoint
- Add /agents SPA route to web-ui server
- Add Agents nav link and route in frontend
- Add agents page CSS (timeline, VM pills, stats panel)
- Build VM status strip, activity timeline, and real-time stats
- Add agentmon hook for OpenClaw (HOOK.md + handler.ts)
- Add docker-compose, Dockerfile, and supporting infra files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Agent Activity Monitoring via OpenClaw Hooks
Date: 2026-03-13
Status: Approved
Goal
Monitor all OpenClaw agent and subagent activity across the three VMs (zap, orb, sun) — tool calls, conversation flow, token usage, session lifecycle, and errors — and display it in a real-time dashboard in the agentmon web UI.
Architecture
┌────────────────────────────────────────────────────────┐
│  VM (zap / orb / sun)                                  │
│                                                        │
│  OpenClaw Gateway                                      │
│    ├── agent loop (messages, tools, sessions)          │
│    └── agentmon-hook (TypeScript)                      │
│          │  listens to: message:received/sent,         │
│          │  tool_result_persist, command:*, session:*  │
│          │                                             │
│          └──── POST /v1/events ──────────────────┐     │
│                                                  │     │
└──────────────────────────────────────────────────│─────┘
                                                   │
                                                   ▼
┌────────────────────────────────────────────────────────┐
│  Host                                                  │
│  agentmon ingest-gateway (:8080)                       │
│    → NATS → event-processor → Postgres                 │
│    → query-api → web-ui (new "Agents" page)            │
└────────────────────────────────────────────────────────┘
One hook deployed to all three VMs captures everything and ships it to the existing agentmon pipeline. No changes needed to ingest, NATS, or storage.
Event Mapping
| OpenClaw Event | agentmon Event | What it captures |
|---|---|---|
| command:new | session.start | Agent session begins |
| command:stop / command:reset | session.end | Session ends |
| message:received | run.start | Inbound message starts a turn |
| message:sent | run.end | Agent response completes the turn |
| tool_result_persist | span.start + span.end | Tool call with result |
| session:compact:before/after | span (kind: internal) | Context window management |
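The table above could be expressed as a small lookup function. This is a minimal sketch: the name `mapEventType` and the `MappedEvent` shape are hypothetical, and only the event names come from the table. The table does not say whether a compaction event becomes a start/end pair, so the sketch keeps it as a bare `span` with kind `internal`.

```typescript
// Hypothetical mapper: OpenClaw event name -> agentmon event type(s).
// Only the event names are taken from the mapping table; the function
// name and return shape are illustrative assumptions.
interface MappedEvent {
  type: string;
  spanKind?: "internal";
}

function mapEventType(openclawEvent: string): MappedEvent[] {
  switch (openclawEvent) {
    case "command:new":
      return [{ type: "session.start" }];
    case "command:stop":
    case "command:reset":
      return [{ type: "session.end" }];
    case "message:received":
      return [{ type: "run.start" }];
    case "message:sent":
      return [{ type: "run.end" }];
    case "tool_result_persist":
      // A persisted tool result carries the call and its result,
      // so it produces both ends of the span at once.
      return [{ type: "span.start" }, { type: "span.end" }];
    case "session:compact:before":
    case "session:compact:after":
      // Kept as a bare "span" because the table does not specify
      // how the before/after pair is split.
      return [{ type: "span", spanKind: "internal" }];
    default:
      // Unmapped events are ignored.
      return [];
  }
}
```

Returning an array keeps the one-to-many case (`tool_result_persist`) and the ignored case (empty array) on the same code path.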
Correlation
- session_id = OpenClaw sessionKey
- run_id = generated UUID per inbound message, carried through to message:sent
- framework = "openclaw"
- client_id = VM name (zap / orb / sun)
Token usage and cost are attached as WithLLMUsage attributes on run.end events when the message:sent payload includes usage metadata.
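One way to carry a run_id from message:received through to message:sent is to keep one open run per session. The sketch below assumes that model; `RunTracker`, its method names, and the injected ID generator are all hypothetical.

```typescript
// Hypothetical run tracker: one open run_id per session, created on
// message:received and closed on message:sent. The ID generator is
// injected so the sketch does not depend on a specific UUID source.
interface Correlation {
  session_id: string; // OpenClaw sessionKey
  run_id: string;
  framework: "openclaw";
  client_id: string; // VM name: zap / orb / sun
}

class RunTracker {
  private openRuns: Record<string, string> = {};
  private clientId: string;
  private newId: () => string;

  constructor(clientId: string, newId: () => string) {
    this.clientId = clientId;
    this.newId = newId;
  }

  // Called on message:received: start a new run for this session.
  startRun(sessionKey: string): Correlation {
    const runId = this.newId();
    this.openRuns[sessionKey] = runId;
    return this.build(sessionKey, runId);
  }

  // Called on message:sent: reuse the run_id of the inbound message.
  // Returns null when no run is open (e.g. the hook loaded mid-turn).
  endRun(sessionKey: string): Correlation | null {
    const runId = this.openRuns[sessionKey];
    if (runId === undefined) return null;
    delete this.openRuns[sessionKey];
    return this.build(sessionKey, runId);
  }

  private build(sessionKey: string, runId: string): Correlation {
    return {
      session_id: sessionKey,
      run_id: runId,
      framework: "openclaw",
      client_id: this.clientId,
    };
  }
}
```

In a real handler the generator could be `() => crypto.randomUUID()`, which is available in current Node.js releases and browsers.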
Hook Design
Directory Structure
~/.openclaw/hooks/agentmon/
├── HOOK.md # metadata: events, requirements
├── handler.ts # event capture + HTTP emit
└── package.json # minimal deps
Deployment
Copy the directory to each VM with scp. OpenClaw's hook loader discovers it automatically; no config changes are needed beyond having hooks enabled.
The hook POSTs to the host machine's ingest gateway. VMs are on the libvirt bridge (192.168.122.x), so the hook either reads the gateway URL from an environment variable or uses the host's bridge IP.
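The URL choice described above might look like the following sketch. The variable name `AGENTMON_URL` is hypothetical, and `192.168.122.1` is libvirt's usual default address for the host on its bridge network, not a value confirmed by this design. The port and path come from the architecture diagram.

```typescript
// Hypothetical URL resolution for the hook. AGENTMON_URL is an assumed
// variable name, and 192.168.122.1 is libvirt's usual default host
// address on the bridge; neither is confirmed by this design.
const DEFAULT_GATEWAY = "http://192.168.122.1:8080";

function resolveEventsUrl(env: Record<string, string | undefined>): string {
  const raw = env["AGENTMON_URL"];
  const configured = raw === undefined ? "" : raw.trim();
  const base = configured !== "" ? configured : DEFAULT_GATEWAY;
  // Strip trailing slashes so the path is joined exactly once.
  return base.replace(/\/+$/, "") + "/v1/events";
}
```

Taking the environment as a parameter keeps the function testable; the handler would call it as `resolveEventsUrl(process.env)`.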
Resilience
- Fire-and-forget with a small in-memory buffer (batch up to 10 events or 2s, whichever comes first)
- 500ms timeout on fetch calls — if agentmon is slow, skip and move on
- Events that fail to send are logged locally but not retried
- The hook must never slow down the OpenClaw agent loop
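The batching rule above (10 events or 2 seconds, whichever comes first) can be sketched as a small buffer class. `EventBuffer` and the injected `send` callback are hypothetical; `send` stands in for the HTTP POST.

```typescript
// Hypothetical fire-and-forget buffer: flushes at 10 events or after
// 2 seconds, whichever comes first. The `send` callback stands in for
// the HTTP POST and is an assumption of this sketch.
type Send<T> = (batch: T[]) => void;

class EventBuffer<T> {
  private pending: T[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private send: Send<T>;
  private maxBatch: number;
  private maxWaitMs: number;

  constructor(send: Send<T>, maxBatch = 10, maxWaitMs = 2000) {
    this.send = send;
    this.maxBatch = maxBatch;
    this.maxWaitMs = maxWaitMs;
  }

  // Never throws and never awaits, so the caller is not slowed down.
  push(event: T): void {
    this.pending.push(event);
    if (this.pending.length >= this.maxBatch) {
      this.flush();
    } else if (this.timer === null) {
      // Start the 2s clock when the first event of a batch arrives.
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    try {
      this.send(batch);
    } catch (err) {
      // Failed batches are logged and dropped, not retried.
      console.error("agentmon: dropped", batch.length, "events", err);
    }
  }
}
```

The pending array is swapped out before `send` runs, so events pushed during a slow or failing send start a fresh batch instead of being lost with the old one.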
Error Handling
In the hook
- All HTTP POSTs wrapped in try/catch — never throw, never block
- Malformed event payloads (missing sessionKey, etc.) silently dropped with debug log
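The two rules above, combined with the 500ms timeout from the Resilience section, could be sketched as a single guarded emit function. `OpenClawPayload`, `emitSafely`, and the injected `post` function are hypothetical; the real payload shape is not specified here beyond `sessionKey`.

```typescript
// Hypothetical guarded emit: validates the payload, applies a 500ms
// timeout, and turns every failure into a `false` return value.
// The payload shape and the injected `post` function are assumptions.
interface OpenClawPayload {
  sessionKey?: string;
  [key: string]: unknown;
}

type Post = (
  url: string,
  body: string,
  signal: AbortSignal,
) => Promise<{ ok: boolean }>;

function isValidPayload(p: OpenClawPayload): boolean {
  return typeof p.sessionKey === "string" && p.sessionKey.length > 0;
}

async function emitSafely(
  url: string,
  payload: OpenClawPayload,
  post: Post,
  timeoutMs = 500,
): Promise<boolean> {
  if (!isValidPayload(payload)) {
    console.debug("agentmon: dropping malformed payload");
    return false;
  }
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await post(url, JSON.stringify(payload), controller.signal);
    return res.ok;
  } catch (err) {
    // Timeouts and network errors land here: log locally, do not retry.
    console.error("agentmon: send failed", err);
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

With the global `fetch`, the `post` argument could be `(url, body, signal) => fetch(url, { method: "POST", headers: { "content-type": "application/json" }, body, signal })`. The caller should not await `emitSafely` inside the agent loop if the loop must stay unblocked.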
In the pipeline
- Ingest gateway deduplicates by event ID — safe if a hook sends twice
- Events with framework: "openclaw" but missing correlation IDs get stored but won't appear in the agents timeline
Edge cases
- VM reboots mid-session: no session.end emitted; UI shows session as "ongoing" until a new command:new arrives
- OpenClaw compacts context before hook fires: session:compact:after still fires, captured as internal span
- Network partition between VM and host: events silently lost, no backfill; acceptable for monitoring
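The reboot case can be illustrated with a status derivation on the UI side. This sketch assumes one active session per VM, so that a later session.start from the same client_id supersedes an unfinished one. That assumption, the `superseded` label, and the function name are all hypothetical.

```typescript
// Hypothetical status derivation for the UI. A session with no
// session.end stays "ongoing" until a later session.start from the
// same VM supersedes it. Assumes one active session per VM.
interface SessionEvent {
  type: "session.start" | "session.end";
  session_id: string;
  client_id: string;
}

type SessionStatus = "ongoing" | "ended" | "superseded";

// `events` must be in arrival order.
function sessionStatuses(
  events: SessionEvent[],
): Record<string, SessionStatus> {
  const status: Record<string, SessionStatus> = {};
  const currentByVm: Record<string, string> = {};
  for (const ev of events) {
    if (ev.type === "session.start") {
      const previous = currentByVm[ev.client_id];
      if (previous !== undefined && status[previous] === "ongoing") {
        // No session.end was seen, e.g. the VM rebooted mid-session.
        status[previous] = "superseded";
      }
      currentByVm[ev.client_id] = ev.session_id;
      status[ev.session_id] = "ongoing";
    } else {
      status[ev.session_id] = "ended";
    }
  }
  return status;
}
```

If several agents on one VM can hold sessions at the same time, the per-VM key would need to become a per-agent key.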
UI — Agents Page
Layout
A live activity dashboard at /agents with three sections:
- Top strip: Three VM pill indicators (zap / orb / sun) showing online/offline with a subtle pulse when active
- Activity timeline: Vertical feed of events across all agents — messages, tool calls, errors — with VM name color-coded, monospace timestamps, and collapsible tool call detail rows. Real-time via existing WebSocket.
- Side stats panel: Aggregate metrics — messages/hour, tool calls today, error rate, most-used tools
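The side panel metrics could be computed from a list of events as in the sketch below. The event shape (`type`, `timestamp`, `tool`, `error`) and the choice to count run.start events as messages are assumptions; the caller is expected to pass only today's events so that the tool count matches "tool calls today".

```typescript
// Hypothetical aggregation for the side stats panel. The event shape
// and the "run.start counts as a message" rule are assumptions.
interface TimelineEvent {
  type: string; // e.g. "run.start", "span.end"
  timestamp: number; // Unix epoch milliseconds
  tool?: string; // tool name on span events
  error?: boolean;
}

interface Stats {
  messagesLastHour: number;
  toolCalls: number;
  errorRate: number; // errors / total events, 0 when there are none
  topTools: string[]; // most-used first
}

function computeStats(
  events: TimelineEvent[],
  now: number,
  topN = 3,
): Stats {
  const hourAgo = now - 60 * 60 * 1000;
  const toolCounts: Record<string, number> = {};
  let messages = 0;
  let toolCalls = 0;
  let errors = 0;
  for (const ev of events) {
    if (ev.type === "run.start" && ev.timestamp >= hourAgo) messages++;
    if (ev.type === "span.end" && ev.tool !== undefined) {
      toolCalls++;
      toolCounts[ev.tool] = (toolCounts[ev.tool] || 0) + 1;
    }
    if (ev.error === true) errors++;
  }
  // Sort by count descending, then by name for a stable order.
  const topTools = Object.keys(toolCounts)
    .sort((a, b) => toolCounts[b] - toolCounts[a] || a.localeCompare(b))
    .slice(0, topN);
  return {
    messagesLastHour: messages,
    toolCalls,
    errorRate: events.length === 0 ? 0 : errors / events.length,
    topTools,
  };
}
```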
Aesthetic
Matches the refined dark theme already in place:
- Timeline cards with glassmorphism
- Color-coded VM badges
- Monospace timestamps (Fira Code)
- Syne display font for headings
- Fade-in animations on new events
- Status pill badges consistent with existing design system
Implementation Plan
Phase 1: OpenClaw Hook
- Create hook directory structure (HOOK.md, handler.ts)
- Implement event-to-agentmon mapper: translate each OpenClaw event type to the agentmon envelope schema
- HTTP emitter with buffering (batch up to 10 events or 2s, whichever first) and 500ms timeout
- Unit test the mapper logic locally
Phase 2: Agentmon UI — Agents Page
- Add /agents route to the SPA router in app.js
- Add "Agents" nav link in header
- Build the top strip: three VM status pills pulling from existing openclaw.snapshot data
- Build the activity timeline: subscribe to WebSocket, filter for framework: "openclaw" events, render as vertical feed with collapsible tool call details
- Build the side stats panel: aggregate counts from the query API (messages/hour, tool calls, error rate, top tools)
- Style with the refined dark aesthetic: glassmorphism timeline cards, color-coded VM badges, monospace timestamps, fade-in animations
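The timeline's WebSocket filter might be sketched as below. The message shape is an assumption, as is the choice of session_id and client_id as the required correlation IDs (run_id is left optional because session events have no run). Events without those IDs are excluded, in line with the Error Handling section.

```typescript
// Hypothetical client-side filter for the timeline feed. The message
// shape and the set of required correlation IDs are assumptions.
interface StreamEvent {
  framework?: string;
  session_id?: string;
  run_id?: string;
  client_id?: string;
  type?: string;
}

function belongsInTimeline(ev: StreamEvent): boolean {
  return (
    ev.framework === "openclaw" &&
    typeof ev.session_id === "string" &&
    ev.session_id !== "" &&
    typeof ev.client_id === "string" &&
    ev.client_id !== ""
  );
}

// Parse one raw WebSocket frame. Malformed JSON and events that do
// not belong in the timeline both yield null.
function parseFrame(raw: string): StreamEvent | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return null;
    const ev = parsed as StreamEvent;
    return belongsInTimeline(ev) ? ev : null;
  } catch (_err) {
    return null;
  }
}
```

A message handler would call `parseFrame` on each incoming frame and render only non-null results.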
Phase 3: Deploy
- SCP hook to all three VMs, verify auto-discovery
- Send a test message to one agent, confirm events flow end-to-end
Not in Scope (Future)
- Token/cost dashboard (needs usage data verification in message:sent payloads)
- Historical analytics and aggregation queries
- Hook auto-deployment via openclaw-monitor
- Alerting on error rate spikes