agentmon
Telemetry and monitoring system for AI agent activity across OpenClaw instances running on KVM virtual machines. Captures sessions, runs, tool calls, errors, and VM health metrics — viewable in a real-time web dashboard.
Architecture
┌──────────────────────────┐
│ OpenClaw VMs │
│ (zap, orb, sun) │
│ │
│ hooks/agentmon/ │
│ → handler.ts │
└──────────┬───────────────┘
│ HTTP POST
▼
┌─────────────┐ publish ┌──────────────┐
│ openclaw- │────────────▶│ NATS │
│ monitor │ │ :4222 │
│ (VM polls) │ └──────┬───────┘
└─────────────┘ │ subscribe
▼
┌──────────────────┐
│ event-processor │
└────────┬─────────┘
│ INSERT
▼
┌─────────────┐ query ┌──────────────┐ proxy ┌──────────────┐
│ web-ui │◀────────▶│ query-api │◀──────────│ browser │
│ :8082 │ │ :8081 │ └──────────────┘
└─────────────┘ └──────────────┘
▲
│
┌────────┴───────┐
│ PostgreSQL │
│ :5432 │
└────────────────┘
Data flow: OpenClaw hooks emit telemetry events over HTTP to the ingest gateway, which publishes them to NATS. The event processor subscribes and persists events to PostgreSQL. The query API serves aggregated data (sessions, runs, spans) to the web UI. A separate openclaw-monitor polls VM health metrics (CPU, memory, disk, service status) via libvirt and SSH.
Real-time updates flow through NATS → query-api → WebSocket → browser.
Services
| Service | Port | Description |
|---|---|---|
| ingest-gateway | 8080 | HTTP + WebSocket event ingestion, publishes to NATS |
| query-api | 8081 | REST API for sessions, runs, spans; WebSocket live feed |
| web-ui | 8082 | SPA frontend with reverse proxy to query-api |
| event-processor | — | NATS subscriber, persists events to Postgres |
| openclaw-monitor | — | Polls VM instances via libvirt/SSH, emits snapshots |
| postgres | 5432 | Event storage |
| nats | 4222 | Message queue (JetStream) |
Quick Start
cp .env.example .env
make up
This starts Postgres, NATS, and all application services via Docker Compose. Open http://localhost:8082.
For local development, start infrastructure only and run services manually:
make up # postgres + nats
make run-ingest # terminal 1
make run-query # terminal 2
make run-ui # terminal 3
make run-processor # terminal 4
make run-openclaw-monitor # terminal 5
Or use the convenience scripts:
./start-all.sh # start everything
./stop-all.sh # stop everything
Configuration
Environment variables (see .env.example):
| Variable | Default | Description |
|---|---|---|
| DATABASE_URL | — | Postgres connection string (required) |
| NATS_URL | nats://nats:4222 | NATS server address |
| NATS_TOPIC | agentmon.events.v1 | NATS topic for events |
| AGENTMON_ADDR | :8080 | Ingest gateway listen address |
| AGENTMON_QUERY_ADDR | :8081 | Query API listen address |
| AGENTMON_UI_ADDR | :8082 | Web UI listen address |
| AGENTMON_QUERY_BASE | http://query-api | Query API URL (for web-ui proxy) |
| OPENCLAW_REGISTRY | ~/.claude/state/openclaw-instances.json | VM instance registry |
| POLL_INTERVAL | 30s | VM polling interval |
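As an illustration of how a service might consume these variables, here is a minimal Go sketch that applies the defaults from the table and rejects a missing DATABASE_URL. The Config struct, loadConfig, and getOr are hypothetical names, and parsing POLL_INTERVAL with time.ParseDuration is an assumption about the value format, not a description of the actual services.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Config mirrors a subset of the variables in the table above.
// The struct and its field names are hypothetical.
type Config struct {
	DatabaseURL  string
	NATSURL      string
	NATSTopic    string
	PollInterval time.Duration
}

// lookupFunc has the same shape as os.LookupEnv so callers can swap the source.
type lookupFunc func(key string) (string, bool)

// getOr returns the value for key, or def when it is unset or empty.
func getOr(lookup lookupFunc, key, def string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return def
}

// loadConfig applies the documented defaults and enforces DATABASE_URL.
func loadConfig(lookup lookupFunc) (Config, error) {
	cfg := Config{
		DatabaseURL: getOr(lookup, "DATABASE_URL", ""),
		NATSURL:     getOr(lookup, "NATS_URL", "nats://nats:4222"),
		NATSTopic:   getOr(lookup, "NATS_TOPIC", "agentmon.events.v1"),
	}
	if cfg.DatabaseURL == "" {
		return Config{}, fmt.Errorf("DATABASE_URL is required")
	}
	// Assumption: POLL_INTERVAL uses Go duration syntax such as "30s".
	d, err := time.ParseDuration(getOr(lookup, "POLL_INTERVAL", "30s"))
	if err != nil {
		return Config{}, fmt.Errorf("POLL_INTERVAL: %w", err)
	}
	cfg.PollInterval = d
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Printf("nats=%s topic=%s poll=%s\n", cfg.NATSURL, cfg.NATSTopic, cfg.PollInterval)
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the defaults easy to exercise without touching the process environment.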
API
Ingest Gateway (:8080)
GET /healthz Health check
POST /v1/events Batch event ingestion (JSON array)
GET /v1/ws WebSocket event stream
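For readers who want to post events without the SDK, the following Go sketch builds a batch request for POST /v1/events using only the standard library. newBatchRequest is a hypothetical helper; the Content-Type header and the placeholder IDs are assumptions, and the response format is not documented here, so the example only prints the HTTP status.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"time"
)

// newBatchRequest builds a POST /v1/events request whose body is a JSON array
// of event envelopes. The helper name is hypothetical.
func newBatchRequest(baseURL string, events []map[string]any) (*http.Request, error) {
	body, err := json.Marshal(events)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/events", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// Assumption: the gateway expects a JSON content type.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	events := []map[string]any{{
		"schema": map[string]any{"name": "agentmon.event", "version": 1},
		"event": map[string]any{
			"id":   "00000000-0000-0000-0000-000000000001", // placeholder ID
			"type": "session.start",
			"ts":   time.Now().UTC().Format(time.RFC3339),
			"source": map[string]any{
				"framework": "openclaw", "client_id": "zap", "host": "zap",
			},
		},
		"correlation": map[string]any{
			"session_id": "00000000-0000-0000-0000-0000000000aa", // placeholder ID
		},
		"attributes": map[string]any{},
		"payload":    map[string]any{},
	}}

	req, err := newBatchRequest("http://localhost:8080", events)
	if err != nil {
		fmt.Fprintln(os.Stderr, "build request:", err)
		os.Exit(1)
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, "post failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```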
Query API (:8081)
GET /healthz Health check
GET /v1/events List events (?event_type=&framework=&limit=)
GET /v1/sessions List sessions (?from=&to=&framework=&host=&cursor=&limit=)
GET /v1/sessions/{id} Session detail with runs
GET /v1/runs/{id} Run detail with spans
GET /v1/stats/summary Today's aggregate stats (active sessions, runs, tools, errors by framework)
GET /v1/stats/timeseries Bucketed event counts (?window=1h|6h|24h|7d)
GET /v1/ws WebSocket live event broadcast
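The query parameters listed for /v1/sessions can be assembled with net/url. The sketch below is illustrative: SessionFilter and sessionsURL are hypothetical names, and it covers only framework, host, and limit because the expected formats of from, to, and cursor are not specified in this document.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// SessionFilter holds a subset of the documented /v1/sessions parameters.
// The type is hypothetical.
type SessionFilter struct {
	Framework string
	Host      string
	Limit     int
}

// sessionsURL appends only the filters that are set.
func sessionsURL(base string, f SessionFilter) string {
	q := url.Values{}
	if f.Framework != "" {
		q.Set("framework", f.Framework)
	}
	if f.Host != "" {
		q.Set("host", f.Host)
	}
	if f.Limit > 0 {
		q.Set("limit", strconv.Itoa(f.Limit))
	}
	u := base + "/v1/sessions"
	if enc := q.Encode(); enc != "" {
		u += "?" + enc // Encode sorts parameters by key
	}
	return u
}

func main() {
	f := SessionFilter{Framework: "openclaw", Host: "zap", Limit: 50}
	fmt.Println(sessionsURL("http://localhost:8081", f))
	// prints http://localhost:8081/v1/sessions?framework=openclaw&host=zap&limit=50
}
```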
Event Schema
Events follow the agentmon.event envelope format:
{
"schema": { "name": "agentmon.event", "version": 1 },
"event": {
"id": "uuid",
"type": "session.start",
"ts": "2026-03-13T12:00:00Z",
"source": {
"framework": "openclaw",
"client_id": "zap",
"host": "zap"
}
},
"correlation": {
"session_id": "uuid",
"run_id": "uuid",
"span_id": "uuid"
},
"attributes": {},
"payload": {}
}
Event types: session.start, session.end, run.start, run.end, span.start, span.end, error, metric.snapshot, openclaw.snapshot
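One way to model the envelope in Go is sketched below. The struct names and the validate function are hypothetical and are not taken from internal/event; the checks shown (schema name, version 1, known event type) are assumptions about what reasonable validation could look like.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// The types below follow the JSON envelope shown above; names are hypothetical.
type Envelope struct {
	Schema      Schema         `json:"schema"`
	Event       Event          `json:"event"`
	Correlation Correlation    `json:"correlation"`
	Attributes  map[string]any `json:"attributes"`
	Payload     map[string]any `json:"payload"`
}

type Schema struct {
	Name    string `json:"name"`
	Version int    `json:"version"`
}

type Event struct {
	ID     string `json:"id"`
	Type   string `json:"type"`
	TS     string `json:"ts"`
	Source Source `json:"source"`
}

type Source struct {
	Framework string `json:"framework"`
	ClientID  string `json:"client_id"`
	Host      string `json:"host"`
}

type Correlation struct {
	SessionID string `json:"session_id,omitempty"`
	RunID     string `json:"run_id,omitempty"`
	SpanID    string `json:"span_id,omitempty"`
}

// knownTypes lists the event types named in this README.
var knownTypes = map[string]bool{
	"session.start": true, "session.end": true,
	"run.start": true, "run.end": true,
	"span.start": true, "span.end": true,
	"error": true, "metric.snapshot": true, "openclaw.snapshot": true,
}

// validate applies assumed rules: matching schema header and a known type.
func validate(e Envelope) error {
	if e.Schema.Name != "agentmon.event" || e.Schema.Version != 1 {
		return fmt.Errorf("unsupported schema %q v%d", e.Schema.Name, e.Schema.Version)
	}
	if !knownTypes[e.Event.Type] {
		return fmt.Errorf("unknown event type %q", e.Event.Type)
	}
	return nil
}

func main() {
	raw := []byte(`{
	  "schema": {"name": "agentmon.event", "version": 1},
	  "event": {"id": "uuid", "type": "session.start", "ts": "2026-03-13T12:00:00Z",
	            "source": {"framework": "openclaw", "client_id": "zap", "host": "zap"}},
	  "correlation": {"session_id": "uuid"},
	  "attributes": {}, "payload": {}
	}`)
	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		fmt.Fprintln(os.Stderr, "decode:", err)
		os.Exit(1)
	}
	if err := validate(env); err != nil {
		fmt.Fprintln(os.Stderr, "invalid:", err)
		os.Exit(1)
	}
	fmt.Println(env.Event.Type, env.Event.Source.Host)
}
```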
Database Schema
CREATE TABLE events (
event_id TEXT PRIMARY KEY,
ts TIMESTAMPTZ NOT NULL,
type TEXT NOT NULL,
session_id TEXT,
run_id TEXT,
trace_id TEXT,
span_id TEXT,
parent_span_id TEXT,
source_framework TEXT,
client_id TEXT,
payload JSONB NOT NULL
);
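To show how an envelope might be flattened into this table, here is a hedged Go sketch that prepares the arguments for an insert. Row, nullable, and insertSQL are hypothetical; the ON CONFLICT clause is an assumption that makes redelivered messages idempotent and is not confirmed to be what event-processor does.

```go
package main

import (
	"fmt"
	"time"
)

// insertSQL targets the events table above.
// Assumption: duplicate event_id values are skipped rather than rejected.
const insertSQL = `INSERT INTO events
  (event_id, ts, type, session_id, run_id, trace_id, span_id,
   parent_span_id, source_framework, client_id, payload)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
ON CONFLICT (event_id) DO NOTHING`

// Row is a hypothetical flat view of one event.
type Row struct {
	EventID         string
	TS              time.Time
	Type            string
	SessionID       string
	RunID           string
	TraceID         string
	SpanID          string
	ParentSpanID    string
	SourceFramework string
	ClientID        string
	Payload         string // JSON text for the JSONB column
}

// nullable maps an empty string to nil so optional columns are stored as NULL.
func nullable(s string) any {
	if s == "" {
		return nil
	}
	return s
}

// args returns values in the same order as the placeholders in insertSQL.
func (r Row) args() []any {
	return []any{
		r.EventID, r.TS, r.Type,
		nullable(r.SessionID), nullable(r.RunID), nullable(r.TraceID),
		nullable(r.SpanID), nullable(r.ParentSpanID),
		nullable(r.SourceFramework), nullable(r.ClientID),
		r.Payload,
	}
}

func main() {
	r := Row{
		EventID:         "evt-1", // placeholder ID
		TS:              time.Date(2026, 3, 13, 12, 0, 0, 0, time.UTC),
		Type:            "session.start",
		SessionID:       "sess-1", // placeholder ID
		SourceFramework: "openclaw",
		ClientID:        "zap",
		Payload:         "{}",
	}
	fmt.Println(len(r.args()), "arguments prepared")
}
```

With database/sql, the prepared values could be passed as db.ExecContext(ctx, insertSQL, r.args()...).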
OpenClaw Hook
The hooks/agentmon/ directory contains a TypeScript hook that captures agent activity from OpenClaw instances and emits it to the ingest gateway. It maps OpenClaw events to agentmon's session/run/span model:
| OpenClaw Event | agentmon Event | Description |
|---|---|---|
| command:new | session.start | New conversation started |
| command:stop | session.end | Conversation ended |
| command:reset | session.end + session.start | Conversation reset |
| message:received | run.start | User message received |
| message:sent | run.end | Agent response sent |
| tool_result_persist | span.end | Tool call completed |
| session:compact:before | span.start | Context compaction started |
| session:compact:after | span.end | Context compaction finished |
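The mapping in the table can be expressed as a simple lookup. The real hook is written in TypeScript; the Go sketch below only illustrates the table, with eventMap and mapEvent as hypothetical names. Treating unlisted OpenClaw events as ignored is an assumption.

```go
package main

import "fmt"

// eventMap restates the table above: each OpenClaw event yields one or more
// agentmon event types, in emission order.
var eventMap = map[string][]string{
	"command:new":            {"session.start"},
	"command:stop":           {"session.end"},
	"command:reset":          {"session.end", "session.start"},
	"message:received":       {"run.start"},
	"message:sent":           {"run.end"},
	"tool_result_persist":    {"span.end"},
	"session:compact:before": {"span.start"},
	"session:compact:after":  {"span.end"},
}

// mapEvent returns the agentmon event types for an OpenClaw event name.
// The boolean is false for names that are not in the table.
func mapEvent(name string) ([]string, bool) {
	out, ok := eventMap[name]
	return out, ok
}

func main() {
	for _, name := range []string{"command:reset", "message:sent", "unknown:event"} {
		if out, ok := mapEvent(name); ok {
			fmt.Printf("%s -> %v\n", name, out)
		} else {
			fmt.Printf("%s -> (ignored)\n", name)
		}
	}
}
```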
Deploying the hook
The hook is deployed to each VM at ~/.openclaw/hooks/agentmon/. Two environment variables are required in ~/.openclaw/.env:
AGENTMON_INGEST_URL=http://192.168.122.1:8080
AGENTMON_VM_NAME=zap # or orb, sun
Deployment is automated via Ansible — see the swarm ansible playbook playbooks/customize.yml.
Codex Hook
The hooks/codex/ directory contains a TypeScript handler for Codex CLI telemetry. Current Codex support is session/run oriented:
- sessionStart and sessionEnd map to session.start, run.start, run.end, and session.end
- notify maps turn-complete notifications into run.end
- prompt-submit hooks map user prompts into the next run.start
- usage payloads emit both run.end.payload.usage and a metric.snapshot event
Sample Codex hook configuration lives in hooks/codex/hooks.json. On the Codex CLI version we tested locally (0.116.0), the notify hook is confirmed. Online reports suggest prompt-submit hooks may be named userpromptsubmit or userPromptSubmit, so the sample config includes both aliases.
The current Codex integration does not assume tool or subagent span hooks exist. If a newer Codex CLI exposes official tool/span hooks, they can be added separately without changing the run/session flow above.
Go SDK
Emit events from Go applications:
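// ctx is a context.Context; sessionID and runID are IDs generated by the caller.
// The error check below uses the standard library "log" package.
emitter, err := sdk.NewEmitter(sdk.Config{
    ServerURL: "http://localhost:8080", // ingest gateway address
    Framework: "my-agent",
    ClientID:  "client-001",
    Host:      "localhost",
})
if err != nil {
    log.Fatalf("create emitter: %v", err)
}
defer emitter.Close(ctx)

// Start a session, then record one run inside it.
emitter.Emit(ctx, sdk.NewSessionStart(sessionID, sdk.WithSource(emitter)))
emitter.Emit(ctx, sdk.NewRunStart(sessionID, runID))
emitter.Emit(ctx, sdk.NewRunEnd(sessionID, runID, sdk.WithPayload(map[string]any{
    "status":      "success",
    "duration_ms": 1234,
})))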
Web UI
The web UI has five views:
- Dashboard (/) — real-time overview with summary stats (active sessions, runs, tools, errors), uPlot time-series charts with selectable windows (1h/6h/24h/7d), framework breakdown bars, live activity feed, and top tools ranking. All sections update live via WebSocket.
- Sessions (/sessions) — browse all agent sessions with date range, framework, and host filters
- Session Detail (/sessions/{id}) — view runs within a session, drill into individual runs and spans
- Agents (/agents) — live timeline of OpenClaw agent events with VM status pills and statistics
- OpenClaw (/openclaw) — real-time grid of VM health cards (state, CPU, memory, disk, gateway status, issues)
Development
make test # run tests
make tidy # go mod tidy
make logs # docker compose logs
make down # stop everything
Project Structure
cmd/
├── ingest-gateway/ HTTP event ingestion service
├── query-api/ REST API for querying events
├── web-ui/ SPA frontend + static assets
│ └── static/ HTML, CSS, JS
├── event-processor/ NATS → Postgres persistence
└── openclaw-monitor/ VM health polling
internal/
├── event/ Envelope types and validation
├── httpx/ HTTP response helpers
├── queue/nats/ NATS publisher and subscriber
├── store/postgres/ Database queries (sessions, runs, spans, stats)
├── sdk/ Go client library for emitting events
└── monitor/openclaw/ VM metrics collection (libvirt, SSH)
hooks/
├── agentmon/ OpenClaw hook (TypeScript)
└── codex/ Codex CLI hook (TypeScript)
deploy/
└── k8s/ Database schema (postgres.sql)