agentmon
Telemetry and monitoring system for AI agent activity across OpenClaw instances running on KVM virtual machines. Captures sessions, runs, tool calls, errors, and VM health metrics — viewable in a real-time web dashboard.
Architecture
┌──────────────────────────┐
│ OpenClaw VMs             │
│ (zap)                    │
│                          │
│ hooks/agentmon/          │
│ → handler.ts             │
└──────────┬───────────────┘
           │ HTTP POST
           ▼
┌─────────────┐   publish   ┌──────────────┐
│ openclaw-   │────────────▶│     NATS     │
│ monitor     │             │    :4222     │
│ (VM polls)  │             └──────┬───────┘
└─────────────┘                    │ subscribe
                                   ▼
                          ┌──────────────────┐
                          │ event-processor  │
                          └────────┬─────────┘
                                   │ INSERT
                                   ▼
┌─────────────┐  query   ┌──────────────┐   proxy   ┌──────────────┐
│   web-ui    │◀────────▶│  query-api   │◀──────────│   browser    │
│    :8082    │          │    :8081     │           └──────────────┘
└─────────────┘          └──────────────┘
                                 ▲
                                 │
                        ┌────────┴───────┐
                        │   PostgreSQL   │
                        │     :5432      │
                        └────────────────┘
Data flow: OpenClaw hooks emit telemetry events over HTTP to the ingest gateway, which publishes them to NATS. The event processor subscribes and persists events to PostgreSQL. The query API serves aggregated data (sessions, runs, spans) to the web UI. A separate openclaw-monitor polls VM health metrics (CPU, memory, disk, service status) via libvirt and SSH.
Real-time updates flow through NATS → query-api → WebSocket → browser.
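The last hop, from query-api to browsers, is a fan-out: each event received from NATS is copied to every open WebSocket connection. Here is a minimal sketch of that step. It assumes one buffered Go channel per connection. Hub, Subscribe, Unsubscribe, and Broadcast are hypothetical names, not agentmon's code. The NATS subscription and the WebSocket write loop are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Hub fans out each message to every registered subscriber channel.
// Hypothetical sketch: names and buffering policy are illustrative only.
type Hub struct {
	mu   sync.Mutex
	subs map[chan []byte]struct{}
}

func NewHub() *Hub {
	return &Hub{subs: make(map[chan []byte]struct{})}
}

// Subscribe registers a buffered channel for one WebSocket client.
func (h *Hub) Subscribe(buf int) chan []byte {
	ch := make(chan []byte, buf)
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

// Unsubscribe removes and closes a subscriber channel (e.g. on disconnect).
func (h *Hub) Unsubscribe(ch chan []byte) {
	h.mu.Lock()
	if _, ok := h.subs[ch]; ok {
		delete(h.subs, ch)
		close(ch)
	}
	h.mu.Unlock()
}

// Broadcast delivers msg to every subscriber without blocking. A client whose
// buffer is full misses this message instead of stalling the others.
// It returns how many subscribers received the message.
func (h *Hub) Broadcast(msg []byte) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for ch := range h.subs {
		select {
		case ch <- msg:
			delivered++
		default: // slow client: drop this message
		}
	}
	return delivered
}

func main() {
	hub := NewHub()
	a, b := hub.Subscribe(1), hub.Subscribe(1)
	// In query-api this call would sit inside the NATS subscription callback.
	n := hub.Broadcast([]byte(`{"type":"run.start"}`))
	fmt.Println(n, string(<-a), string(<-b))
}
```

Dropping messages for a full buffer keeps one slow browser from holding up the rest. The trade-off is that such a client can miss live updates and may need to refetch from the REST endpoints.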
Services
| Service | Port | Description |
|---|---|---|
| ingest-gateway | 8080 | HTTP + WebSocket event ingestion, publishes to NATS |
| query-api | 8081 | REST API for sessions, runs, spans; WebSocket live feed |
| web-ui | 8082 | SPA frontend with reverse proxy to query-api |
| event-processor | — | NATS subscriber, persists events to Postgres |
| openclaw-monitor | — | Polls VM instances via libvirt/SSH, emits snapshots |
| postgres | 5432 | Event storage |
| nats | 4222 | Message queue (JetStream) |
Quick Start
cp .env.example .env
make up
This starts Postgres, NATS, and all application services via Docker Compose. Open http://localhost:8082.
For local development, start infrastructure only and run services manually:
make up # postgres + nats
make run-ingest # terminal 1
make run-query # terminal 2
make run-ui # terminal 3
make run-processor # terminal 4
make run-openclaw-monitor # terminal 5
Or use the convenience scripts:
./start-all.sh # start everything
./stop-all.sh # stop everything
Configuration
Environment variables (see .env.example):
| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | — | Postgres connection string (required) |
| NATS_URL | nats://nats:4222 | NATS server address |
| NATS_TOPIC | agentmon.events.v1 | NATS topic for events |
| AGENTMON_ADDR | :8080 | Ingest gateway listen address |
| AGENTMON_QUERY_ADDR | :8081 | Query API listen address |
| AGENTMON_UI_ADDR | :8082 | Web UI listen address |
| AGENTMON_QUERY_BASE | http://query-api | Query API URL (for web-ui proxy) |
| OPENCLAW_REGISTRY | ~/.claude/state/openclaw-instances.json | VM instance registry |
| POLL_INTERVAL | 30s | VM polling interval |
API
Ingest Gateway (:8080)
GET /healthz Health check
POST /v1/events Batch event ingestion (JSON array)
GET /v1/ws WebSocket event stream
Query API (:8081)
GET /healthz Health check
GET /v1/events List events (?event_type=&framework=&limit=)
GET /v1/sessions List sessions (?from=&to=&framework=&host=&cursor=&limit=)
GET /v1/sessions/{id} Session detail with runs
GET /v1/runs/{id} Run detail with spans
GET /v1/stats/summary Today's aggregate stats (active sessions, runs, tools, errors by framework)
GET /v1/stats/timeseries Bucketed event counts (?window=1h|6h|24h|7d)
GET /v1/ws WebSocket live event broadcast
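Here is a small client-side helper for building a GET /v1/sessions request with the filters listed above. SessionFilter and SessionsURL are hypothetical names. This README does not specify the format of from/to values, so the helper passes them through unchanged.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// SessionFilter holds the optional /v1/sessions query parameters.
// Empty strings and a zero Limit are omitted from the URL.
type SessionFilter struct {
	From, To, Framework, Host, Cursor string
	Limit                             int
}

// SessionsURL builds a GET /v1/sessions URL against the query API base URL.
func SessionsURL(base string, f SessionFilter) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimSuffix(u.Path, "/") + "/v1/sessions"
	q := url.Values{}
	for k, v := range map[string]string{
		"from": f.From, "to": f.To, "framework": f.Framework,
		"host": f.Host, "cursor": f.Cursor,
	} {
		if v != "" {
			q.Set(k, v)
		}
	}
	if f.Limit > 0 {
		q.Set("limit", strconv.Itoa(f.Limit))
	}
	u.RawQuery = q.Encode() // Encode sorts keys, so the output is deterministic
	return u.String(), nil
}

func main() {
	u, err := SessionsURL("http://localhost:8081", SessionFilter{Framework: "openclaw", Host: "zap", Limit: 50})
	if err != nil {
		panic(err)
	}
	fmt.Println(u) // http://localhost:8081/v1/sessions?framework=openclaw&host=zap&limit=50
}
```

The same pattern applies to the other list endpoints, such as /v1/events with event_type, framework, and limit.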
Event Schema
Events follow the agentmon.event envelope format:
{
  "schema": { "name": "agentmon.event", "version": 1 },
  "event": {
    "id": "uuid",
    "type": "session.start",
    "ts": "2026-03-13T12:00:00Z",
    "source": {
      "framework": "openclaw",
      "client_id": "zap",
      "host": "zap"
    }
  },
  "correlation": {
    "session_id": "uuid",
    "run_id": "uuid",
    "span_id": "uuid"
  },
  "attributes": {},
  "payload": {}
}
Event types: session.start, session.end, run.start, run.end, span.start, span.end, error, metric.snapshot, openclaw.snapshot
Database Schema
CREATE TABLE events (
    event_id         TEXT PRIMARY KEY,
    ts               TIMESTAMPTZ NOT NULL,
    type             TEXT NOT NULL,
    session_id       TEXT,
    run_id           TEXT,
    trace_id         TEXT,
    span_id          TEXT,
    parent_span_id   TEXT,
    source_framework TEXT,
    client_id        TEXT,
    payload          JSONB NOT NULL
);
OpenClaw Hook
The hooks/agentmon/ directory contains a TypeScript hook that captures agent activity from OpenClaw instances and emits it to the ingest gateway. It maps OpenClaw events to agentmon's session/run/span model:
| OpenClaw Event | agentmon Event | Description |
|---|---|---|
| command:new | session.start | New conversation started |
| command:stop | session.end | Conversation ended |
| command:reset | session.end + session.start | Conversation reset |
| message:received | run.start | User message received |
| message:sent | run.end | Agent response sent |
| tool_result_persist | span.end | Tool call completed |
| session:compact:before | span.start | Context compaction started |
| session:compact:after | span.end | Context compaction finished |
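The table above can be written as a lookup from OpenClaw event name to an ordered list of agentmon event types. command:reset is the only entry that produces two events, and their order matters. This Go version is for illustration only: the actual hook is TypeScript, translate is a hypothetical name, and unmapped events simply return nil here.

```go
package main

import "fmt"

// openclawToAgentmon restates the mapping table as data (hypothetical name).
var openclawToAgentmon = map[string][]string{
	"command:new":            {"session.start"},
	"command:stop":           {"session.end"},
	"command:reset":          {"session.end", "session.start"}, // close old, then open new
	"message:received":       {"run.start"},
	"message:sent":           {"run.end"},
	"tool_result_persist":    {"span.end"},
	"session:compact:before": {"span.start"},
	"session:compact:after":  {"span.end"},
}

// translate returns the agentmon event types for one OpenClaw event, in emit
// order. In this sketch, unmapped events yield nil and are not forwarded.
func translate(openclawEvent string) []string {
	return openclawToAgentmon[openclawEvent]
}

func main() {
	fmt.Println(translate("command:reset")) // [session.end session.start]
}
```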
Deploying the hook
The hook is deployed to each VM at ~/.openclaw/hooks/agentmon/. Two environment variables are required in ~/.openclaw/.env:
AGENTMON_INGEST_URL=http://192.168.122.1:8080
AGENTMON_VM_NAME=zap
Deployment is automated with Ansible; see playbooks/customize.yml in the swarm Ansible playbooks.
Codex Hook
The hooks/codex/ directory contains a TypeScript handler for Codex CLI telemetry. Current Codex support is session/run oriented:
- sessionStart and sessionEnd map to session.start, run.start, run.end, and session.end
- notify maps turn-complete notifications into run.end
- prompt-submit hooks map user prompts into the next run.start
- usage payloads emit both run.end.payload.usage and a metric.snapshot event
The Codex handler persists lightweight session state across hook subprocesses. If Codex only delivers later-stage hooks for a session, the handler can recover by emitting synthetic session.start/run.start events before the first run.end or usage snapshot. Full-fidelity lifecycle tracking still depends on configuring Codex session lifecycle hooks, not just notify.
Sample Codex hook configuration lives in hooks/codex/hooks.json. On the local Codex CLI version we tested (0.116.0), notify is confirmed to work. Online reports suggest the prompt-submit hook may be named userpromptsubmit or userPromptSubmit, so the sample config includes both aliases.
The current Codex integration does not assume tool or subagent span hooks exist. If a newer Codex CLI exposes official tool/span hooks, they can be added separately without changing the run/session flow above.
Gemini Hook
The hooks/gemini/ directory contains a TypeScript handler for Gemini CLI telemetry. The current integration maps Gemini hook events into agentmon's session/run/span model:
- onStart maps to session.start and an initial run.start
- onStop maps to run.end and session.end
- onToolCall maps to span.start
- onToolResult maps to span.end
Sample Gemini hook configuration lives in hooks/gemini/hooks.json. Install the handler from that directory so the agentmon-gemini-handler binary is available, then point Gemini CLI at the sample hook config and set AGENTMON_INGEST_URL to your ingest gateway.
Hermes Hook
The hooks/hermes/ directory contains a TypeScript handler for Hermes Agent shell-hook telemetry. The current integration maps Hermes hook events into agentmon's session/run/span model:
- on_session_start maps to session.start
- pre_llm_call maps to run.start
- post_llm_call maps to run.end
- pre_tool_call maps to span.start
- post_tool_call maps to span.end
- post_api_request maps usage payloads to metric.snapshot
- on_session_finalize maps to session.end
Sample Hermes hook configuration lives in hooks/hermes/hooks.yaml. Install the handler from that directory so the agentmon-hermes-handler binary is available, then merge the sample hooks: block into ~/.hermes/config.yaml and set AGENTMON_INGEST_URL to your ingest gateway.
Go SDK
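Emit events from Go code. The SDK lives in internal/sdk, so under Go's internal-package rule it can only be imported from within this module:
ctx := context.Background()

emitter, err := sdk.NewEmitter(sdk.Config{
	ServerURL: "http://localhost:8080", // ingest gateway
	Framework: "my-agent",
	ClientID:  "client-001",
	Host:      "localhost",
})
if err != nil {
	log.Fatal(err) // don't defer Close on a failed emitter
}
defer emitter.Close(ctx)

// sessionID and runID are caller-supplied identifiers (UUIDs in the envelope).
emitter.Emit(ctx, sdk.NewSessionStart(sessionID, sdk.WithSource(emitter)))
emitter.Emit(ctx, sdk.NewRunStart(sessionID, runID))
emitter.Emit(ctx, sdk.NewRunEnd(sessionID, runID, sdk.WithPayload(map[string]any{
	"status":      "success",
	"duration_ms": 1234,
})))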
Web UI
The web UI has five views:
- Dashboard (/): real-time overview with summary stats (active sessions, runs, tools, errors), uPlot time-series charts with selectable windows (1h/6h/24h/7d), framework breakdown bars, live activity feed, and top tools ranking. All sections update live via WebSocket.
- Sessions (/sessions): browse all agent sessions with date range, framework, and host filters
- Session Detail (/sessions/{id}): view runs within a session, drill into individual runs and spans
- Agents (/agents): live timeline of OpenClaw agent events with VM status pills and statistics
- OpenClaw (/openclaw): real-time grid of VM health cards (state, CPU, memory, disk, gateway status, issues)
Development
make test # run tests
make tidy # go mod tidy
make logs # docker compose logs
make down # stop everything
Project Structure
cmd/
├── ingest-gateway/ HTTP event ingestion service
├── query-api/ REST API for querying events
├── web-ui/ SPA frontend + static assets
│ └── static/ HTML, CSS, JS
├── event-processor/ NATS → Postgres persistence
└── openclaw-monitor/ VM health polling
internal/
├── event/ Envelope types and validation
├── httpx/ HTTP response helpers
├── queue/nats/ NATS publisher and subscriber
├── store/postgres/ Database queries (sessions, runs, spans, stats)
├── sdk/ Go client library for emitting events
└── monitor/openclaw/ VM metrics collection (libvirt, SSH)
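hooks/
├── agentmon/ OpenClaw hook (TypeScript)
├── codex/ Codex CLI hook (TypeScript)
├── gemini/ Gemini CLI hook (TypeScript)
└── hermes/ Hermes Agent hook (TypeScript)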
deploy/
└── k8s/ Database schema (postgres.sql)