# agentmon

Telemetry and monitoring system for AI agent activity across [OpenClaw](https://openclaw.ai/) instances running on KVM virtual machines. Captures sessions, runs, tool calls, errors, and VM health metrics, all viewable in a real-time web dashboard.

## Architecture

```
                        ┌──────────────────────────┐
                        │      OpenClaw VMs        │
                        │  (zap)                   │
                        │                          │
                        │  hooks/agentmon/         │
                        │    → handler.ts          │
                        └──────────┬───────────────┘
                                   │ HTTP POST
                                   ▼
┌─────────────┐  publish    ┌──────────────┐
│ openclaw-   │────────────▶│     NATS     │
│ monitor     │             │    :4222     │
│ (VM polls)  │             └──────┬───────┘
└─────────────┘                    │ subscribe
                                   ▼
                          ┌──────────────────┐
                          │ event-processor  │
                          └────────┬─────────┘
                                   │ INSERT
                                   ▼
┌─────────────┐  query   ┌──────────────┐  proxy    ┌──────────────┐
│   web-ui    │◀────────▶│  query-api   │◀──────────│   browser    │
│   :8082     │          │    :8081     │           └──────────────┘
└─────────────┘          └──────────────┘
                                 ▲
                                 │
                        ┌────────┴───────┐
                        │  PostgreSQL    │
                        │    :5432       │
                        └────────────────┘
```

**Data flow:** OpenClaw hooks emit telemetry events over HTTP to the **ingest gateway**, which publishes them to **NATS**. The **event processor** subscribes and persists events to **PostgreSQL**. The **query API** serves aggregated data (sessions, runs, spans) to the **web UI**. A separate **openclaw-monitor** polls VM health metrics (CPU, memory, disk, service status) via libvirt and SSH. Real-time updates flow through NATS → query-api → WebSocket → browser.

## Services

| Service | Port | Description |
|---------|------|-------------|
| **ingest-gateway** | 8080 | HTTP + WebSocket event ingestion, publishes to NATS |
| **query-api** | 8081 | REST API for sessions, runs, spans; WebSocket live feed |
| **web-ui** | 8082 | SPA frontend with reverse proxy to query-api |
| **event-processor** | n/a | NATS subscriber, persists events to Postgres |
| **openclaw-monitor** | n/a | Polls VM instances via libvirt/SSH, emits snapshots |
| **postgres** | 5432 | Event storage |
| **nats** | 4222 | Message queue (JetStream) |

## Quick Start

```bash
cp .env.example .env
make up
```

This starts Postgres, NATS, and all application services via Docker Compose. Open http://localhost:8082.

For local development, start infrastructure only and run services manually:

```bash
make up                    # postgres + nats
make run-ingest            # terminal 1
make run-query             # terminal 2
make run-ui                # terminal 3
make run-processor         # terminal 4
make run-openclaw-monitor  # terminal 5
```

Or use the convenience scripts:

```bash
./start-all.sh   # start everything
./stop-all.sh    # stop everything
```

## Configuration

Environment variables (see `.env.example`):

| Variable | Default | Description |
|----------|---------|-------------|
| `DATABASE_URL` | (none) | Postgres connection string (required) |
| `NATS_URL` | `nats://nats:4222` | NATS server address |
| `NATS_TOPIC` | `agentmon.events.v1` | NATS topic for events |
| `AGENTMON_ADDR` | `:8080` | Ingest gateway listen address |
| `AGENTMON_QUERY_ADDR` | `:8081` | Query API listen address |
| `AGENTMON_UI_ADDR` | `:8082` | Web UI listen address |
| `AGENTMON_QUERY_BASE` | `http://query-api` | Query API URL (for web-ui proxy) |
| `OPENCLAW_REGISTRY` | `~/.claude/state/openclaw-instances.json` | VM instance registry |
| `POLL_INTERVAL` | `30s` | VM polling interval |

## API

### Ingest Gateway (`:8080`)

```
GET  /healthz      Health check
POST /v1/events    Batch event ingestion (JSON array)
GET  /v1/ws        WebSocket event stream
```

### Query API (`:8081`)

```
GET /healthz                Health check
GET /v1/events              List events (?event_type=&framework=&limit=)
GET /v1/sessions            List sessions (?from=&to=&framework=&host=&cursor=&limit=)
GET /v1/sessions/{id}       Session detail with runs
GET /v1/runs/{id}           Run detail with spans
GET /v1/stats/summary       Today's aggregate stats (active sessions, runs, tools, errors by framework)
GET /v1/stats/timeseries    Bucketed event counts (?window=1h|6h|24h|7d)
GET /v1/ws                  WebSocket live event broadcast
```

## Event Schema

Events follow the `agentmon.event` envelope format:

```json
{
  "schema": { "name": "agentmon.event", "version": 1 },
  "event": {
    "id": "uuid",
    "type": "session.start",
    "ts": "2026-03-13T12:00:00Z",
    "source": { "framework": "openclaw", "client_id": "zap", "host": "zap" }
  },
  "correlation": { "session_id": "uuid", "run_id": "uuid", "span_id": "uuid" },
  "attributes": {},
  "payload": {}
}
```

**Event types:** `session.start`, `session.end`, `run.start`, `run.end`, `span.start`, `span.end`, `error`, `metric.snapshot`, `openclaw.snapshot`

## Database Schema

```sql
CREATE TABLE events (
    event_id         TEXT PRIMARY KEY,
    ts               TIMESTAMPTZ NOT NULL,
    type             TEXT NOT NULL,
    session_id       TEXT,
    run_id           TEXT,
    trace_id         TEXT,
    span_id          TEXT,
    parent_span_id   TEXT,
    source_framework TEXT,
    client_id        TEXT,
    payload          JSONB NOT NULL
);
```

## OpenClaw Hook

The `hooks/agentmon/` directory contains a TypeScript hook that captures agent activity from OpenClaw instances and emits it to the ingest gateway.
It maps OpenClaw events to agentmon's session/run/span model:

| OpenClaw Event | agentmon Event | Description |
|----------------|----------------|-------------|
| `command:new` | `session.start` | New conversation started |
| `command:stop` | `session.end` | Conversation ended |
| `command:reset` | `session.end` + `session.start` | Conversation reset |
| `message:received` | `run.start` | User message received |
| `message:sent` | `run.end` | Agent response sent |
| `tool_result_persist` | `span.end` | Tool call completed |
| `session:compact:before` | `span.start` | Context compaction started |
| `session:compact:after` | `span.end` | Context compaction finished |

### Deploying the hook

The hook is deployed to each VM at `~/.openclaw/hooks/agentmon/`. Two environment variables are required in `~/.openclaw/.env`:

```bash
AGENTMON_INGEST_URL=http://192.168.122.1:8080
AGENTMON_VM_NAME=zap
```

Deployment is automated via Ansible: see `playbooks/customize.yml` in the [swarm ansible playbook](https://gitea-http.taildb3494.ts.net/will/swarm).

## Codex Hook

The `hooks/codex/` directory contains a TypeScript handler for Codex CLI telemetry. Current Codex support is session/run oriented:

- `sessionStart` and `sessionEnd` map to `session.start`, `run.start`, `run.end`, and `session.end`
- `notify` maps turn-complete notifications into `run.end`
- prompt-submit hooks map user prompts into the next `run.start`
- usage payloads emit both `run.end.payload.usage` and a `metric.snapshot` event

The Codex handler persists lightweight session state across hook subprocesses. If Codex only delivers later-stage hooks for a session, the handler can recover by emitting synthetic `session.start`/`run.start` events before the first `run.end` or usage snapshot. Full-fidelity lifecycle tracking still depends on configuring Codex session lifecycle hooks, not just `notify`.

Sample Codex hook configuration lives in [hooks/codex/hooks.json](hooks/codex/hooks.json).
On the local Codex CLI version we checked (`0.116.0`), `notify` is confirmed. Online reports suggest prompt-submit hooks may appear as `userpromptsubmit` or `userPromptSubmit`, so the sample config includes those aliases.

The current Codex integration does not assume tool or subagent span hooks exist. If a newer Codex CLI exposes official tool/span hooks, they can be added separately without changing the run/session flow above.

## Gemini Hook

The `hooks/gemini/` directory contains a TypeScript handler for Gemini CLI telemetry. The current integration maps Gemini hook events into agentmon's session/run/span model:

- `onStart` maps to `session.start` and an initial `run.start`
- `onStop` maps to `run.end` and `session.end`
- `onToolCall` maps to `span.start`
- `onToolResult` maps to `span.end`

Sample Gemini hook configuration lives in [hooks/gemini/hooks.json](hooks/gemini/hooks.json). Install the handler from that directory so the `agentmon-gemini-handler` binary is available, then point Gemini CLI at the sample hook config and set `AGENTMON_INGEST_URL` to your ingest gateway.

## Hermes Hook

The `hooks/hermes/` directory contains a TypeScript handler for Hermes Agent shell-hook telemetry. The current integration maps Hermes hook events into agentmon's session/run/span model:

- `on_session_start` maps to `session.start`
- `pre_llm_call` maps to `run.start`
- `post_llm_call` maps to `run.end`
- `pre_tool_call` maps to `span.start`
- `post_tool_call` maps to `span.end`
- `post_api_request` maps usage payloads to `metric.snapshot`
- `on_session_finalize` maps to `session.end`

Sample Hermes hook configuration lives in [hooks/hermes/hooks.yaml](hooks/hermes/hooks.yaml). Install the handler from that directory so the `agentmon-hermes-handler` binary is available, then merge the sample `hooks:` block into `~/.hermes/config.yaml` and set `AGENTMON_INGEST_URL` to your ingest gateway.
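
The Hermes mapping listed above is essentially a lookup table from hook name to agentmon event type. The sketch below shows that idea in Go for illustration only: the real handler is TypeScript, the names `hermesToAgentmon` and `eventTypeFor` are hypothetical, and `on_unknown_hook` is a made-up hook name used to show the fallback path.

```go
package main

import "fmt"

// hermesToAgentmon restates the hook-to-event mapping from this README.
// Note: post_api_request only yields metric.snapshot when a usage payload
// is present; that condition is not modelled here.
var hermesToAgentmon = map[string]string{
	"on_session_start":    "session.start",
	"pre_llm_call":        "run.start",
	"post_llm_call":       "run.end",
	"pre_tool_call":       "span.start",
	"post_tool_call":      "span.end",
	"post_api_request":    "metric.snapshot",
	"on_session_finalize": "session.end",
}

// eventTypeFor returns the agentmon event type for a Hermes hook name,
// and false for hooks this sketch does not know about.
func eventTypeFor(hook string) (string, bool) {
	t, ok := hermesToAgentmon[hook]
	return t, ok
}

func main() {
	for _, hook := range []string{"pre_tool_call", "on_unknown_hook"} {
		if t, ok := eventTypeFor(hook); ok {
			fmt.Printf("%s -> %s\n", hook, t)
		} else {
			fmt.Printf("%s -> (ignored)\n", hook)
		}
	}
}
```

Returning a boolean alongside the event type lets a caller skip hooks it does not recognise instead of emitting an event with an empty type.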

## Go SDK

Emit events from Go applications:

```go
// ctx, sessionID, and runID are assumed to be defined by the caller;
// the error check below needs the standard "log" package.
emitter, err := sdk.NewEmitter(sdk.Config{
    ServerURL: "http://localhost:8080",
    Framework: "my-agent",
    ClientID:  "client-001",
    Host:      "localhost",
})
if err != nil {
    log.Fatal(err) // emitter could not be created, nothing to close
}
defer emitter.Close(ctx)

// One session containing a single run.
emitter.Emit(ctx, sdk.NewSessionStart(sessionID, sdk.WithSource(emitter)))
emitter.Emit(ctx, sdk.NewRunStart(sessionID, runID))
emitter.Emit(ctx, sdk.NewRunEnd(sessionID, runID, sdk.WithPayload(map[string]any{
    "status":      "success",
    "duration_ms": 1234,
})))
```

## Web UI

The web UI has five views:

- **Dashboard** (`/`): real-time overview with summary stats (active sessions, runs, tools, errors), uPlot time-series charts with selectable windows (1h/6h/24h/7d), framework breakdown bars, live activity feed, and top tools ranking. All sections update live via WebSocket.
- **Sessions** (`/sessions`): browse all agent sessions with date range, framework, and host filters
- **Session Detail** (`/sessions/{id}`): view runs within a session, drill into individual runs and spans
- **Agents** (`/agents`): live timeline of OpenClaw agent events with VM status pills and statistics
- **OpenClaw** (`/openclaw`): real-time grid of VM health cards (state, CPU, memory, disk, gateway status, issues)

## Development

```bash
make test   # run tests
make tidy   # go mod tidy
make logs   # docker compose logs
make down   # stop everything
```

## Project Structure

```
cmd/
├── ingest-gateway/     HTTP event ingestion service
├── query-api/          REST API for querying events
├── web-ui/             SPA frontend + static assets
│   └── static/         HTML, CSS, JS
├── event-processor/    NATS → Postgres persistence
└── openclaw-monitor/   VM health polling
internal/
├── event/              Envelope types and validation
├── httpx/              HTTP response helpers
├── queue/nats/         NATS publisher and subscriber
├── store/postgres/     Database queries (sessions, runs, spans, stats)
├── sdk/                Go client library for emitting events
└── monitor/openclaw/   VM metrics collection (libvirt, SSH)
hooks/
├── agentmon/           OpenClaw hook (TypeScript)
├── codex/              Codex CLI handler (TypeScript)
├── gemini/             Gemini CLI handler (TypeScript)
└── hermes/             Hermes Agent handler (TypeScript)
deploy/
└── k8s/                Database schema (postgres.sql)
```