feat: complete agent monitoring - hook, UI, and backend filter
- Add event_type and framework filters to events query endpoint
- Add /agents SPA route to web-ui server
- Add Agents nav link and route in frontend
- Add agents page CSS (timeline, VM pills, stats panel)
- Build VM status strip, activity timeline, and real-time stats
- Add agentmon hook for OpenClaw (HOOK.md + handler.ts)
- Add docker-compose, Dockerfile, and supporting infra files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -0,0 +1,253 @@
# agentmon

Telemetry and monitoring system for AI agent activity across [OpenClaw](https://openclaw.ai/) instances running on KVM virtual machines. Captures sessions, runs, tool calls, errors, and VM health metrics, all viewable in a real-time web dashboard.

## Architecture

```
                        ┌──────────────────────────┐
                        │     OpenClaw VMs         │
                        │  (zap, orb, sun)         │
                        │                          │
                        │  hooks/agentmon/         │
                        │    → handler.ts          │
                        └──────────┬───────────────┘
                                   │ HTTP POST
                                   ▼
┌─────────────┐   publish   ┌──────────────┐
│ openclaw-   │────────────▶│     NATS     │
│ monitor     │             │    :4222     │
│ (VM polls)  │             └──────┬───────┘
└─────────────┘                    │ subscribe
                                   ▼
                          ┌──────────────────┐
                          │ event-processor  │
                          └────────┬─────────┘
                                   │ INSERT
                                   ▼
┌─────────────┐  query   ┌──────────────┐  proxy   ┌──────────────┐
│   web-ui    │◀────────▶│  query-api   │◀─────────│   browser    │
│   :8082     │          │   :8081      │          └──────────────┘
└─────────────┘          └──────────────┘
                                 ▲
                                 │
                        ┌────────┴───────┐
                        │  PostgreSQL    │
                        │    :5432       │
                        └────────────────┘
```

**Data flow:** OpenClaw hooks emit telemetry events over HTTP to the **ingest gateway**, which publishes them to **NATS**. The **event processor** subscribes and persists events to **PostgreSQL**. The **query API** serves aggregated data (sessions, runs, spans) to the **web UI**. A separate **openclaw-monitor** polls VM health metrics (CPU, memory, disk, service status) via libvirt and SSH.

Real-time updates flow through NATS → query-api → WebSocket → browser.

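
The fan-out step in the real-time path (one stream of events, many connected browsers) can be pictured with a small in-memory hub. This is a minimal sketch using only Go channels, not the query-api implementation: the `Hub` type and its methods are hypothetical, each channel stands in for one WebSocket client, and the NATS subscription and WebSocket handling are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Hub fans one stream of events out to many subscribers.
// Hypothetical type for illustration; not taken from the query-api source.
type Hub struct {
	mu   sync.Mutex
	subs map[chan []byte]struct{}
}

// NewHub returns an empty hub.
func NewHub() *Hub {
	return &Hub{subs: make(map[chan []byte]struct{})}
}

// Subscribe registers a buffered channel that stands in for one client.
func (h *Hub) Subscribe(buf int) chan []byte {
	ch := make(chan []byte, buf)
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

// Unsubscribe removes the channel from the hub and closes it.
func (h *Hub) Unsubscribe(ch chan []byte) {
	h.mu.Lock()
	if _, ok := h.subs[ch]; ok {
		delete(h.subs, ch)
		close(ch)
	}
	h.mu.Unlock()
}

// Broadcast offers msg to every subscriber and returns how many accepted it.
// A subscriber whose buffer is full is skipped, so one slow client
// cannot block delivery to the others.
func (h *Hub) Broadcast(msg []byte) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	delivered := 0
	for ch := range h.subs {
		select {
		case ch <- msg:
			delivered++
		default: // buffer full: drop the message for this subscriber
		}
	}
	return delivered
}

func main() {
	hub := NewHub()
	a := hub.Subscribe(1)
	b := hub.Subscribe(1)
	n := hub.Broadcast([]byte(`{"type":"run.start"}`))
	fmt.Println(n, string(<-a), string(<-b))
}
```

Dropping messages for slow subscribers is one possible policy; a dashboard that must not miss events would need a different choice, such as disconnecting the slow client.
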
## Services

| Service | Port | Description |
|---------|------|-------------|
| **ingest-gateway** | 8080 | HTTP + WebSocket event ingestion, publishes to NATS |
| **query-api** | 8081 | REST API for sessions, runs, spans; WebSocket live feed |
| **web-ui** | 8082 | SPA frontend with reverse proxy to query-api |
| **event-processor** | n/a | NATS subscriber, persists events to Postgres |
| **openclaw-monitor** | n/a | Polls VM instances via libvirt/SSH, emits snapshots |
| **postgres** | 5432 | Event storage |
| **nats** | 4222 | Message queue (JetStream) |

## Quick Start

```bash
cp .env.example .env
make up
```

This starts Postgres, NATS, and all application services via Docker Compose. Open http://localhost:8082.

For local development, start infrastructure only and run services manually:

```bash
make up                     # postgres + nats
make run-ingest             # terminal 1
make run-query              # terminal 2
make run-ui                 # terminal 3
make run-processor          # terminal 4
make run-openclaw-monitor   # terminal 5
```

Or use the convenience scripts:

```bash
./start-all.sh    # start everything
./stop-all.sh     # stop everything
```

## Configuration

Environment variables (see `.env.example`):

| Variable | Default | Description |
|----------|---------|-------------|
| `DATABASE_URL` | (none) | Postgres connection string (required) |
| `NATS_URL` | `nats://nats:4222` | NATS server address |
| `NATS_TOPIC` | `agentmon.events.v1` | NATS subject that events are published to |
| `AGENTMON_ADDR` | `:8080` | Ingest gateway listen address |
| `AGENTMON_QUERY_ADDR` | `:8081` | Query API listen address |
| `AGENTMON_UI_ADDR` | `:8082` | Web UI listen address |
| `AGENTMON_QUERY_BASE` | `http://query-api` | Query API URL (for web-ui proxy) |
| `OPENCLAW_REGISTRY` | `~/.claude/state/openclaw-instances.json` | VM instance registry |
| `POLL_INTERVAL` | `30s` | VM polling interval |

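
As an illustration of how a service might apply these defaults, the sketch below reads a few of the variables from the table above. It is not the project's actual loader: the `Config` struct, `LoadConfig`, and `getenvDefault` are hypothetical names, and it assumes `POLL_INTERVAL` uses Go duration syntax (the documented default `30s` parses that way).

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// Config holds a subset of the documented variables.
// The struct and its field names are illustrative only.
type Config struct {
	DatabaseURL  string
	NATSURL      string
	NATSTopic    string
	PollInterval time.Duration
}

// getenvDefault returns the value for key, or def when it is unset or empty.
func getenvDefault(getenv func(string) string, key, def string) string {
	if v := getenv(key); v != "" {
		return v
	}
	return def
}

// LoadConfig reads settings through getenv, which makes the function easy
// to exercise without touching the real environment (pass os.Getenv normally).
func LoadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		DatabaseURL: getenv("DATABASE_URL"),
		NATSURL:     getenvDefault(getenv, "NATS_URL", "nats://nats:4222"),
		NATSTopic:   getenvDefault(getenv, "NATS_TOPIC", "agentmon.events.v1"),
	}
	// DATABASE_URL has no default, so a missing value is an error.
	if cfg.DatabaseURL == "" {
		return Config{}, errors.New("DATABASE_URL is required")
	}
	// Assumption: POLL_INTERVAL is written in Go duration syntax.
	d, err := time.ParseDuration(getenvDefault(getenv, "POLL_INTERVAL", "30s"))
	if err != nil {
		return Config{}, fmt.Errorf("invalid POLL_INTERVAL: %w", err)
	}
	cfg.PollInterval = d
	return cfg, nil
}

func main() {
	cfg, err := LoadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```
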
## API

### Ingest Gateway (`:8080`)

```
GET  /healthz      Health check
POST /v1/events    Batch event ingestion (JSON array)
GET  /v1/ws        WebSocket event stream
```

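
A batch submission to `POST /v1/events` might look like the following sketch, which uses only the Go standard library. The helpers `postEvents`, `sampleBatch`, and `newFakeGateway` are hypothetical. The fake gateway exists only so the example runs without the real service, and its `202 Accepted` reply is an assumption, since the gateway's actual status codes are not documented here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// postEvents sends events as one JSON array to POST /v1/events
// and returns the HTTP status code of the response.
func postEvents(client *http.Client, baseURL string, events []map[string]any) (int, error) {
	body, err := json.Marshal(events)
	if err != nil {
		return 0, err
	}
	resp, err := client.Post(baseURL+"/v1/events", "application/json", bytes.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// sampleBatch builds a one-event batch in the agentmon.event envelope shape.
func sampleBatch() []map[string]any {
	return []map[string]any{{
		"schema": map[string]any{"name": "agentmon.event", "version": 1},
		"event": map[string]any{
			"id":     "11111111-1111-1111-1111-111111111111",
			"type":   "session.start",
			"ts":     "2026-03-13T12:00:00Z",
			"source": map[string]any{"framework": "openclaw", "client_id": "zap", "host": "zap"},
		},
	}}
}

// newFakeGateway is a stand-in for the ingest gateway.
// Replying 202 is an assumption, not documented behaviour.
func newFakeGateway() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var batch []json.RawMessage
		if r.Method != http.MethodPost || r.URL.Path != "/v1/events" ||
			json.NewDecoder(r.Body).Decode(&batch) != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		w.WriteHeader(http.StatusAccepted)
	}))
}

func main() {
	srv := newFakeGateway()
	defer srv.Close()

	client := &http.Client{Timeout: 5 * time.Second}
	status, err := postEvents(client, srv.URL, sampleBatch())
	fmt.Println(status, err)
}
```

Against a running stack, `srv.URL` would be replaced by the gateway address, for example `http://localhost:8080`.
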
### Query API (`:8081`)

```
GET  /healthz            Health check
GET  /v1/events          List events (?event_type=&framework=&limit=)
GET  /v1/sessions        List sessions (?from=&to=&framework=&host=&cursor=&limit=)
GET  /v1/sessions/{id}   Session detail with runs
GET  /v1/runs/{id}       Run detail with spans
GET  /v1/ws              WebSocket live event broadcast
```

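
The filters on `GET /v1/events` are ordinary query parameters, so a client can assemble the request URL with `net/url`. The sketch below is illustrative: `EventsQuery` and `eventsURL` are hypothetical names, and the parameter value formats (for example, that `limit` is a positive integer) are assumptions.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// EventsQuery mirrors the query parameters listed for GET /v1/events.
type EventsQuery struct {
	EventType string
	Framework string
	Limit     int
}

// eventsURL builds the request URL for the events endpoint.
// Parameters left at their zero value are omitted from the query string.
func eventsURL(base string, q EventsQuery) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	// Any path already present in base is replaced.
	u.Path = "/v1/events"

	params := url.Values{}
	if q.EventType != "" {
		params.Set("event_type", q.EventType)
	}
	if q.Framework != "" {
		params.Set("framework", q.Framework)
	}
	if q.Limit > 0 {
		params.Set("limit", strconv.Itoa(q.Limit))
	}
	// Encode sorts parameters by key and escapes values.
	u.RawQuery = params.Encode()
	return u.String(), nil
}

func main() {
	u, err := eventsURL("http://localhost:8081", EventsQuery{
		EventType: "run.start",
		Framework: "openclaw",
		Limit:     50,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
	// Prints: http://localhost:8081/v1/events?event_type=run.start&framework=openclaw&limit=50
}
```
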
## Event Schema

Events follow the `agentmon.event` envelope format:

```json
{
  "schema": { "name": "agentmon.event", "version": 1 },
  "event": {
    "id": "uuid",
    "type": "session.start",
    "ts": "2026-03-13T12:00:00Z",
    "source": {
      "framework": "openclaw",
      "client_id": "zap",
      "host": "zap"
    }
  },
  "correlation": {
    "session_id": "uuid",
    "run_id": "uuid",
    "span_id": "uuid"
  },
  "attributes": {},
  "payload": {}
}
```

**Event types:** `session.start`, `session.end`, `run.start`, `run.end`, `span.start`, `span.end`, `error`, `metric.snapshot`, `openclaw.snapshot`

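
For readers who want to consume the envelope from Go, the sketch below declares matching struct types and a small parser. These types are hypothetical and written only from the JSON example above; the repository's own definitions live in `internal/event/` and may differ. The validation rules (schema name, non-empty ID, known event type) are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Envelope mirrors the JSON example; the Go names are illustrative.
type Envelope struct {
	Schema struct {
		Name    string `json:"name"`
		Version int    `json:"version"`
	} `json:"schema"`
	Event struct {
		ID     string    `json:"id"`
		Type   string    `json:"type"`
		TS     time.Time `json:"ts"`
		Source struct {
			Framework string `json:"framework"`
			ClientID  string `json:"client_id"`
			Host      string `json:"host"`
		} `json:"source"`
	} `json:"event"`
	Correlation struct {
		SessionID string `json:"session_id"`
		RunID     string `json:"run_id"`
		SpanID    string `json:"span_id"`
	} `json:"correlation"`
	Attributes map[string]any  `json:"attributes"`
	Payload    json.RawMessage `json:"payload"`
}

// knownTypes lists the event types documented above.
var knownTypes = map[string]bool{
	"session.start": true, "session.end": true,
	"run.start": true, "run.end": true,
	"span.start": true, "span.end": true,
	"error": true, "metric.snapshot": true, "openclaw.snapshot": true,
}

// ParseEnvelope decodes one envelope and applies the assumed checks.
func ParseEnvelope(data []byte) (Envelope, error) {
	var env Envelope
	if err := json.Unmarshal(data, &env); err != nil {
		return Envelope{}, fmt.Errorf("decode envelope: %w", err)
	}
	if env.Schema.Name != "agentmon.event" {
		return Envelope{}, fmt.Errorf("unexpected schema name %q", env.Schema.Name)
	}
	if env.Event.ID == "" {
		return Envelope{}, fmt.Errorf("event id is empty")
	}
	if !knownTypes[env.Event.Type] {
		return Envelope{}, fmt.Errorf("unknown event type %q", env.Event.Type)
	}
	return env, nil
}

// sampleEnvelope is test data shaped like the documented example.
const sampleEnvelope = `{
  "schema": {"name": "agentmon.event", "version": 1},
  "event": {
    "id": "11111111-1111-1111-1111-111111111111",
    "type": "session.start",
    "ts": "2026-03-13T12:00:00Z",
    "source": {"framework": "openclaw", "client_id": "zap", "host": "zap"}
  },
  "correlation": {"session_id": "22222222-2222-2222-2222-222222222222"},
  "attributes": {},
  "payload": {}
}`

func main() {
	env, err := ParseEnvelope([]byte(sampleEnvelope))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(env.Event.Type, env.Event.Source.Host, env.Event.TS.Year())
}
```
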
## Database Schema

```sql
CREATE TABLE events (
  event_id         TEXT PRIMARY KEY,
  ts               TIMESTAMPTZ NOT NULL,
  type             TEXT NOT NULL,
  session_id       TEXT,
  run_id           TEXT,
  trace_id         TEXT,
  span_id          TEXT,
  parent_span_id   TEXT,
  source_framework TEXT,
  client_id        TEXT,
  payload          JSONB NOT NULL
);
```

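
To show how one event could map onto this table, the sketch below pairs an `INSERT` statement with its positional arguments. It is an assumption about how a writer might work, not the event-processor's actual query: `Row`, `insertSQL`, `insertArgs`, and `sampleRow` are hypothetical, and the `ON CONFLICT (event_id) DO NOTHING` clause is added here so that a redelivered event is ignored instead of failing on the primary key. No database driver is used, so how the `payload` text binds to `JSONB` depends on the driver chosen.

```go
package main

import (
	"fmt"
	"time"
)

// Row holds one value per column of the events table.
type Row struct {
	EventID         string
	TS              time.Time
	Type            string
	SessionID       string
	RunID           string
	TraceID         string
	SpanID          string
	ParentSpanID    string
	SourceFramework string
	ClientID        string
	Payload         []byte // JSON document
}

// insertSQL uses PostgreSQL positional placeholders, one per column.
const insertSQL = `INSERT INTO events
  (event_id, ts, type, session_id, run_id, trace_id, span_id,
   parent_span_id, source_framework, client_id, payload)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
ON CONFLICT (event_id) DO NOTHING`

// nullable maps an empty string to nil so the column is stored as NULL.
func nullable(s string) any {
	if s == "" {
		return nil
	}
	return s
}

// insertArgs returns the arguments in the same order as the placeholders.
func insertArgs(r Row) []any {
	return []any{
		r.EventID, r.TS, r.Type,
		nullable(r.SessionID), nullable(r.RunID), nullable(r.TraceID),
		nullable(r.SpanID), nullable(r.ParentSpanID),
		nullable(r.SourceFramework), nullable(r.ClientID),
		string(r.Payload), // passed as JSON text
	}
}

// sampleRow is illustrative data for a session.start event.
func sampleRow() Row {
	return Row{
		EventID:         "e-1",
		TS:              time.Date(2026, 3, 13, 12, 0, 0, 0, time.UTC),
		Type:            "session.start",
		SessionID:       "s-1",
		SourceFramework: "openclaw",
		ClientID:        "zap",
		Payload:         []byte(`{}`),
	}
}

func main() {
	fmt.Println(insertSQL)
	fmt.Println(len(insertArgs(sampleRow())), "arguments")
}
```
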
## OpenClaw Hook

The `hooks/agentmon/` directory contains a TypeScript hook that captures agent activity from OpenClaw instances and emits it to the ingest gateway. It maps OpenClaw events to agentmon's session/run/span model:

| OpenClaw Event | agentmon Event | Description |
|----------------|----------------|-------------|
| `command:new` | `session.start` | New conversation started |
| `command:stop` | `session.end` | Conversation ended |
| `command:reset` | `session.end` + `session.start` | Conversation reset |
| `message:received` | `run.start` | User message received |
| `message:sent` | `run.end` | Agent response sent |
| `tool_result_persist` | `span.end` | Tool call completed |
| `session:compact:before` | `span.start` | Context compaction started |
| `session:compact:after` | `span.end` | Context compaction finished |

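
The mapping in the table above can also be written as a lookup, which makes the one-to-many case (`command:reset`) explicit. The hook itself is TypeScript (`handler.ts`); the Go sketch below only restates the table for consistency with the other examples, and `hookEventMap` and `MapHookEvent` are hypothetical names.

```go
package main

import "fmt"

// hookEventMap restates the table: each OpenClaw event maps to one or
// more agentmon event types, emitted in the listed order.
var hookEventMap = map[string][]string{
	"command:new":            {"session.start"},
	"command:stop":           {"session.end"},
	"command:reset":          {"session.end", "session.start"},
	"message:received":       {"run.start"},
	"message:sent":           {"run.end"},
	"tool_result_persist":    {"span.end"},
	"session:compact:before": {"span.start"},
	"session:compact:after":  {"span.end"},
}

// MapHookEvent returns the agentmon event types for an OpenClaw event.
// The boolean is false when the event has no mapping.
func MapHookEvent(name string) ([]string, bool) {
	types, ok := hookEventMap[name]
	return types, ok
}

func main() {
	for _, name := range []string{"command:reset", "message:sent", "unknown:event"} {
		types, ok := MapHookEvent(name)
		fmt.Println(name, types, ok)
	}
}
```
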
### Deploying the hook

The hook is deployed to each VM at `~/.openclaw/hooks/agentmon/`. Two environment variables are required in `~/.openclaw/.env`:

```bash
AGENTMON_INGEST_URL=http://192.168.122.1:8080
AGENTMON_VM_NAME=zap  # or orb, sun
```

Deployment is automated via Ansible; see `playbooks/customize.yml` in the [swarm ansible playbook](https://gitea-http.taildb3494.ts.net/will/swarm) repository.

## Go SDK

Emit events from Go applications:

```go
// Needs the standard context and log packages plus this repository's sdk package.
ctx := context.Background()

emitter, err := sdk.NewEmitter(sdk.Config{
	ServerURL: "http://localhost:8080",
	Framework: "my-agent",
	ClientID:  "client-001",
	Host:      "localhost",
})
if err != nil {
	log.Fatalf("create emitter: %v", err)
}
defer emitter.Close(ctx)

// sessionID and runID are identifiers supplied by the caller.
emitter.Emit(ctx, sdk.NewSessionStart(sessionID, sdk.WithSource(emitter)))
emitter.Emit(ctx, sdk.NewRunStart(sessionID, runID))
emitter.Emit(ctx, sdk.NewRunEnd(sessionID, runID, sdk.WithPayload(map[string]any{
	"status":      "success",
	"duration_ms": 1234,
})))
```

## Web UI

The dashboard has four views:

- **Sessions**: browse all agent sessions with date range and framework filters
- **Session Detail**: view runs within a session, drill into individual runs
- **OpenClaw**: real-time grid of VM health cards (state, CPU, memory, disk, issues)
- **Agents**: live timeline of agent events with statistics (message counts, tool usage, errors)

## Development

```bash
make test    # run tests
make tidy    # go mod tidy
make logs    # docker compose logs
make down    # stop everything
```

## Project Structure

```
cmd/
├── ingest-gateway/       HTTP event ingestion service
├── query-api/            REST API for querying events
├── web-ui/               SPA frontend + static assets
│   └── static/           HTML, CSS, JS
├── event-processor/      NATS → Postgres persistence
└── openclaw-monitor/     VM health polling
internal/
├── event/                Envelope types and validation
├── httpx/                HTTP response helpers
├── queue/nats/           NATS publisher and subscriber
├── store/postgres/       Database queries (sessions, runs, spans)
├── sdk/                  Go client library for emitting events
└── monitor/openclaw/     VM metrics collection (libvirt, SSH)
hooks/
└── agentmon/             OpenClaw hook (TypeScript)
deploy/
└── k8s/                  Database schema (postgres.sql)
```