docs(plans): add dashboard and realtime agent plans

This commit is contained in:
William Valentin
2026-03-20 11:17:17 -07:00
parent 2e277fb138
commit c88746693a
4 changed files with 1871 additions and 0 deletions
@@ -0,0 +1,111 @@
# Dashboard with Real-time Stats and Graphs
**Date:** 2026-03-14
**Status:** Approved
## Overview
Add a comprehensive dashboard at `/` combining server-side aggregation endpoints, the existing WebSocket stream, and uPlot charts for real-time agent monitoring with stats and graphs.
## Architecture
Three data sources feed the dashboard:
1. **Server-side aggregation endpoints** (new) - historical stats on page load
2. **WebSocket stream** (existing) - live events update charts in real-time
3. **Existing REST endpoints** - sessions/runs for linking out
### New Backend Endpoints
#### `GET /v1/stats/summary`
Current-day aggregates:
```json
{
"active_sessions": 3,
"runs_today": 47,
"tool_calls_today": 312,
"errors_today": 2,
"by_framework": {
"openclaw": { "runs": 30, "tools": 210, "errors": 1 },
"claude-code": { "runs": 17, "tools": 102, "errors": 1 }
}
}
```
Simple `COUNT` + `GROUP BY` over the `events` table using `type` and `source_framework` columns, filtered to `ts >= today midnight`.
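The real query lives in Go and Postgres; the JavaScript sketch below only mirrors the aggregation in memory to make the response shape concrete. The row fields (`type`, `source_framework`, `ts`) follow the column names above, but the mapping from event type to counter (`run.start`, `span.end`, `error`) and the `summarize` helper are assumptions for illustration, and `active_sessions` is left out.

```javascript
// Sketch only: an in-memory equivalent of COUNT + GROUP BY over events.
// Assumption: which event types feed which counter is not stated in the plan.
const COUNTER_FOR_TYPE = new Map([
  ['run.start', 'runs'],
  ['span.end', 'tools'],
  ['error', 'errors'],
]);
const TOTAL_KEY = { runs: 'runs_today', tools: 'tool_calls_today', errors: 'errors_today' };

function summarize(rows, midnightMs) {
  const summary = { runs_today: 0, tool_calls_today: 0, errors_today: 0, by_framework: {} };
  for (const row of rows) {
    const counter = COUNTER_FOR_TYPE.get(row.type);
    // Skip uncounted event types and anything before today's midnight.
    if (!counter || Date.parse(row.ts) < midnightMs) continue;
    summary[TOTAL_KEY[counter]] += 1;
    const framework = row.source_framework;
    if (!summary.by_framework[framework]) {
      summary.by_framework[framework] = { runs: 0, tools: 0, errors: 0 };
    }
    summary.by_framework[framework][counter] += 1;
  }
  return summary;
}
```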
#### `GET /v1/stats/timeseries?window=1h&bucket=1m`
Bucketed event counts:
```json
{
"window": "1h",
"bucket": "1m",
"series": [
{ "ts": "2026-03-14T10:00:00Z", "runs": 2, "tools": 14, "errors": 0 },
{ "ts": "2026-03-14T10:01:00Z", "runs": 1, "tools": 8, "errors": 0 }
]
}
```
Bucket sizes auto-calculated if not provided: 1h→1m, 6h→5m, 24h→15m, 7d→1h. Uses Postgres `date_bin` for bucketing.
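A minimal sketch of the default-bucket rule and of flooring a timestamp to its bucket, written in JavaScript for illustration. The helper names (`resolveBucket`, `durationSeconds`, `bucketStart`) are hypothetical; the flooring matches what `date_bin` would produce only under the assumption that the bin origin is the Unix epoch.

```javascript
// Default bucket per window, as listed in the plan.
const DEFAULT_BUCKET = { '1h': '1m', '6h': '5m', '24h': '15m', '7d': '1h' };
const UNIT_SECONDS = { m: 60, h: 3600, d: 86400 };

// Use the caller's bucket if given, otherwise the default for the window.
function resolveBucket(window, bucket) {
  return bucket || DEFAULT_BUCKET[window];
}

// Parse strings such as "15m" or "7d" into seconds.
function durationSeconds(text) {
  const match = /^(\d+)([mhd])$/.exec(text);
  if (!match) throw new Error('unsupported duration: ' + text);
  return Number(match[1]) * UNIT_SECONDS[match[2]];
}

// Floor a Unix timestamp (seconds) to the start of its bucket.
function bucketStart(tsSeconds, bucketSeconds) {
  return Math.floor(tsSeconds / bucketSeconds) * bucketSeconds;
}
```

With these helpers, a request for `window=6h` with no `bucket` would resolve to `5m`, that is, 300-second buckets.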
## Dashboard Layout
Single-scroll page with four sections:
### 1. Summary Strip
Four stat cards in a horizontal row:
- **Active Sessions** - sessions with no `session.end` event, with framework breakdown
- **Runs Today** - total runs since midnight
- **Tool Calls** - total tool spans today
- **Errors** - error events today, red-highlighted if > 0
### 2. OpenClaw VM Strip
Reuse existing VM pill component (zap/orb/sun online/offline status).
### 3. Activity Charts
Two charts side by side:
- **Left: Event rate** - uPlot stacked area time-series. One series per category (runs, tools, errors). Time window selector (1h/6h/24h/7d) in top-right.
- **Right: Framework breakdown** - horizontal bar chart showing events by framework. Rendered as styled divs (categorical, no uPlot needed).
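The proportional bar widths could be derived roughly as follows. This is a sketch under two assumptions the plan does not state: "events by framework" is taken as runs + tools + errors from `by_framework`, and widths are scaled against the largest framework. `frameworkBarWidths` is a hypothetical helper name.

```javascript
// Turn the by_framework map from /v1/stats/summary into sorted bar descriptors.
function frameworkBarWidths(byFramework) {
  const totals = Object.entries(byFramework).map(([name, counts]) => ({
    name,
    total: counts.runs + counts.tools + counts.errors,
  }));
  const max = Math.max(0, ...totals.map(t => t.total));
  return totals
    .sort((a, b) => b.total - a.total)
    .map(t => ({
      ...t,
      // Largest framework fills the row; others are proportional to it.
      widthPct: max === 0 ? 0 : Math.round((t.total / max) * 100),
    }));
}
```

Each descriptor's `widthPct` can then be applied as the `width` style of a bar `<div>`.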
### 4. Bottom Panels
Two columns:
- **Left: Recent activity feed** - last 20 events as compact timeline (reuse existing timeline helpers)
- **Right: Top tools** - ranked list with counts and bar visualization
## Frontend
- **uPlot** loaded from CDN for time-series charts
- Thin wrapper `createChart(el, series, opts)` applies dark theme from existing CSS vars
- Real-time update flow:
  1. Page load → fetch summary + timeseries (default 1h window)
  2. Connect WebSocket → append live events to charts
  3. On each WS event: increment counters, update chart bucket, update top tools, prepend to feed
  4. Time window change → re-fetch from server, rebuild chart
- Framework bars: styled `<div>` elements with proportional widths
- Activity feed: reuse `getEventIcon`, `getEventLabel`, `getEventBody` from existing code (extracted as shared helpers)
- Time window selector: segmented control with 1h / 6h / 24h / 7d buttons
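The "update chart bucket" step of the real-time flow might look like the sketch below. It assumes the chart data is held in uPlot's columnar layout (`[timestamps, runs, tools, errors]`, timestamps in Unix seconds); `bumpBucket` and the column order are illustrative choices, not part of the plan.

```javascript
// Column index per category in the assumed layout [ts, runs, tools, errors].
const COLUMN = new Map([['runs', 1], ['tools', 2], ['errors', 3]]);

// Increment the bucket that contains tsSeconds, appending a new bucket if the
// event is newer than the last one on the chart. Mutates and returns data.
function bumpBucket(data, tsSeconds, bucketSeconds, category) {
  const col = COLUMN.get(category);
  if (col === undefined) return data;
  const start = Math.floor(tsSeconds / bucketSeconds) * bucketSeconds;
  const xs = data[0];
  const last = xs.length > 0 ? xs[xs.length - 1] : -Infinity;
  if (start > last) {
    xs.push(start);
    for (let i = 1; i < data.length; i++) data[i].push(0);
  }
  const idx = xs.lastIndexOf(start);
  if (idx === -1) return data; // event falls outside the loaded buckets
  data[col][idx] += 1;
  return data;
}
```

After a bump, the updated arrays would be handed back to the chart (for uPlot, via `setData`). Buckets with no events are not back-filled here, so a quiet period shows up as a gap between x values.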
## File Changes
### New files
- `internal/store/postgres/stats.go` - `Summary()` and `Timeseries()` query functions
### Modified files
- `cmd/query-api/main.go` - add `/v1/stats/summary` and `/v1/stats/timeseries` handlers
- `cmd/web-ui/static/index.html` - add uPlot CDN script/link tags
- `cmd/web-ui/static/app.js` - dashboard route at `/`, shared helpers, chart rendering, real-time updates
- `cmd/web-ui/static/style.css` - dashboard layout, summary cards, chart containers, time window selector, framework bars
### No changes to
- Database schema (queries use existing `events` table)
- Event format or ingestion pipeline
- Existing pages (sessions, agents, openclaw)
File diff suppressed because it is too large
@@ -0,0 +1,59 @@
# Realtime Agents Activity View
**Date:** 2026-03-14
**Status:** Approved
## Overview
Replace the single-timeline Agents page with a per-agent lane layout showing live presence, in-progress operations, and recent events grouped by VM (client_id).
## Data Model: Live State Tracker
Track open events (start without matching end) to derive live status:
- **Active sessions**: `session.start` events keyed by `session_id`, removed on `session.end`. Each knows its VM, start time, framework.
- **In-progress operations**: `span.start`/`run.start` keyed by `span_id`/`run_id`. Marked complete on matching end event. Show elapsed timer while open.
- **Per-agent grouping**: All state grouped by `client_id` (zap, orb, sun).
The initial REST load seeds this state; WebSocket updates keep it current.
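The start/end pairing can be sketched as a small reducer over events. The envelope shape used here (`type` and `ts` at the top level, plus `source.client_id` and `correlation.*`) is an assumption for illustration, and `applyEvent` and `PAIRS` are hypothetical names.

```javascript
// Each pair opens an entry on its start event and removes it on its end event.
const PAIRS = [
  { start: 'session.start', end: 'session.end', key: 'session_id', bucket: 'sessions' },
  { start: 'span.start', end: 'span.end', key: 'span_id', bucket: 'operations' },
  { start: 'run.start', end: 'run.end', key: 'run_id', bucket: 'operations' },
];

// state: Map of client_id -> { sessions: Map, operations: Map }
function applyEvent(state, evt) {
  const clientId = evt.source.client_id;
  if (!state.has(clientId)) {
    state.set(clientId, { sessions: new Map(), operations: new Map() });
  }
  const agent = state.get(clientId);
  const correlation = evt.correlation || {};
  for (const pair of PAIRS) {
    const id = correlation[pair.key];
    if (!id) continue;
    const openKey = pair.key + ':' + id; // prefix avoids span/run ID collisions
    if (evt.type === pair.start) {
      agent[pair.bucket].set(openKey, { startedAt: evt.ts });
    } else if (evt.type === pair.end) {
      agent[pair.bucket].delete(openKey);
    }
  }
  return state;
}
```

Whatever remains in `sessions` and `operations` after replaying the backlog is, by construction, the set of open events for that VM.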
## Layout: Per-Agent Lanes
Three columns (one per VM), replacing the single timeline + stats sidebar:
```
┌─────────────────────────────────────────────────┐
│ Agents ● Live │
├───────────────┬───────────────┬─────────────────┤
│ ZAP │ ORB │ SUN │
│ ● 2 sessions │ ● 1 session │ ○ idle │
│ ───────────── │ ───────────── │ ─────────────── │
│ ▶ Bash (3.2s) │ ▶ Read (1.1s) │ │
│ ▶ Grep (...) │ │ (recent events) │
│ ───────────── │ ───────────── │ │
│ recent events │ recent events │ │
└───────────────┴───────────────┴─────────────────┘
```
Each lane has:
- **Header**: VM name, online/offline dot, active session count
- **Active operations**: In-progress spans/runs with pulsing dot + live elapsed timer
- **Recent events**: Completed events, same card style, scoped to this agent
Summary stats (messages/tools/errors) move to a compact row above the lanes.
Responsive: lanes stack vertically below 900px.
## Active Operations Display
- `span.start` (tool): pulsing green dot + tool name + elapsed counter (1s setInterval)
- `run.start`: shows as "Thinking..." until `run.end`
- On matching end event: pill fades out, completed event appears in recent list
- If session has both active run + active tool spans, show tool spans (more specific)
- Stale guard: after 5 minutes with no update, dim and show "(stale?)"
## Implementation Notes
- All changes in `style.css` and `app.js` (no backend changes needed)
- Reuse existing WebSocket subscription and REST API calls
- Existing event envelope fields (`correlation.session_id`, `correlation.span_id`, `correlation.run_id`, `source.client_id`) provide all grouping keys
@@ -0,0 +1,619 @@
# Realtime Agents Activity View — Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Replace the single-timeline Agents page with per-agent lanes showing live presence, in-progress operations, and scoped event feeds.
**Architecture:** Pure frontend change. Replace `createAgentsState()`, `renderAgents()`, `handleAgentsWS()`, and related functions in `app.js`. Add new CSS classes in `style.css`. Reuse existing envelope helpers, WS subscription, and REST API.
**Tech Stack:** Vanilla JS, CSS custom properties, existing WebSocket + REST API
---
### Task 1: CSS — Lane layout and active operation styles
**Files:**
- Modify: `cmd/web-ui/static/style.css` (append after existing agents styles, ~line 1430)
**Step 1: Add lane layout CSS**
```css
/* ── Agent lanes ──────────────────────────────────────────── */
.agents-summary-row {
display: flex;
gap: 1.25rem;
margin-bottom: 1.25rem;
}
.agents-summary-stat {
background: var(--surface);
border: 1px solid var(--border);
border-radius: var(--radius);
padding: 0.6rem 1rem;
font-family: var(--font-mono);
font-size: 0.78rem;
color: var(--text-dim);
display: flex;
align-items: center;
gap: 0.5rem;
}
.agents-summary-stat .value {
color: var(--text-bright);
font-weight: 600;
}
.agent-lanes {
display: grid;
grid-template-columns: repeat(3, 1fr);
gap: 1.25rem;
}
@media (max-width: 900px) {
.agent-lanes {
grid-template-columns: 1fr;
}
}
.agent-lane {
background: var(--surface);
border: 1px solid var(--border);
border-radius: var(--radius-lg);
display: flex;
flex-direction: column;
overflow: hidden;
}
.agent-lane-header {
display: flex;
align-items: center;
justify-content: space-between;
padding: 0.875rem 1.125rem;
border-bottom: 1px solid var(--border-soft);
}
.agent-lane-name {
display: flex;
align-items: center;
gap: 0.5rem;
font-family: var(--font-display);
font-size: 0.9rem;
font-weight: 700;
color: var(--text-bright);
text-transform: uppercase;
letter-spacing: 0.06em;
}
.agent-lane-dot {
width: 7px;
height: 7px;
border-radius: 50%;
flex-shrink: 0;
}
.agent-lane-dot.online {
background: var(--success);
box-shadow: 0 0 6px rgba(52, 211, 153, 0.5);
animation: livePulse 2s ease-in-out infinite;
}
.agent-lane-dot.offline {
background: var(--text-dim);
opacity: 0.5;
}
.agent-lane-status {
font-size: 0.7rem;
font-weight: 600;
color: var(--text-dim);
text-transform: uppercase;
letter-spacing: 0.06em;
}
.agent-lane-status.has-sessions {
color: var(--success);
}
/* ── Active operations ────────────────────────────────────── */
.active-ops {
padding: 0.625rem 1.125rem;
display: flex;
flex-direction: column;
gap: 0.375rem;
border-bottom: 1px solid var(--border-soft);
}
.active-op {
display: flex;
align-items: center;
gap: 0.5rem;
padding: 0.35rem 0.625rem;
background: var(--accent-dim);
border: 1px solid rgba(34, 211, 238, 0.15);
border-radius: var(--radius);
font-size: 0.75rem;
animation: fadeUp 0.2s ease both;
}
[data-theme="light"] .active-op {
border-color: rgba(8, 145, 178, 0.2);
}
.active-op.stale {
opacity: 0.5;
}
.active-op-dot {
width: 6px;
height: 6px;
border-radius: 50%;
background: var(--success);
box-shadow: 0 0 6px rgba(52, 211, 153, 0.5);
animation: livePulse 1.5s ease-in-out infinite;
flex-shrink: 0;
}
.active-op.stale .active-op-dot {
background: var(--warning);
box-shadow: none;
animation: none;
}
.active-op-name {
font-family: var(--font-mono);
color: var(--accent);
font-weight: 500;
}
.active-op-time {
font-family: var(--font-mono);
color: var(--text-dim);
font-size: 0.7rem;
margin-left: auto;
}
.active-op-stale {
color: var(--warning);
font-size: 0.65rem;
font-weight: 600;
}
/* ── Lane event feed ──────────────────────────────────────── */
.agent-lane-events {
flex: 1;
overflow-y: auto;
max-height: 520px;
padding: 0.625rem;
position: relative;
}
.agent-lane-events::after {
content: '';
position: sticky;
bottom: 0;
left: 0;
right: 0;
height: 32px;
background: linear-gradient(to bottom, transparent, var(--surface));
pointer-events: none;
display: block;
}
.agent-lane-events .timeline-event {
padding: 0.5rem 0.625rem;
border-radius: var(--radius);
margin-bottom: 0.25rem;
border: 1px solid var(--border-soft);
background: transparent;
font-size: 0.82rem;
}
.agent-lane-events .timeline-event-header {
margin-bottom: 0.2rem;
}
.agent-lane-events .timeline-event-time {
font-size: 0.6rem;
}
.agent-lane-events .empty-state {
padding: 2rem 1rem;
font-size: 0.78rem;
}
```
**Step 2: Commit**
```bash
git add cmd/web-ui/static/style.css
git commit -m "feat(agents): add CSS for per-agent lane layout and active operation pills"
```
---
### Task 2: JS — Live state tracker data model
**Files:**
- Modify: `cmd/web-ui/static/app.js` — replace `createAgentsState()` (~line 665) and add `processAgentEvent()` and `getAgentDisplayOps()`
**Step 1: Replace createAgentsState**
Replace the existing `createAgentsState()` function (lines ~626-643) with:
```js
function createAgentsState() {
function agentBucket() {
return { sessions: {}, operations: {}, events: [], eventIDs: new Set() };
}
return {
agents: { zap: agentBucket(), orb: agentBucket(), sun: agentBucket() },
stats: { messages: 0, tools: 0, errors: 0, toolCounts: {} },
timerInterval: null,
};
}
```
**Step 2: Add processAgentEvent function**
Add after `createAgentsState`:
```js
function getAgentBucket(evt) {
  const name = getVMName(evt).toLowerCase();
  return agentsState.agents[name] || null;
}

// Start time for elapsed timers. Prefer the event's own timestamp so that
// operations seeded from the REST backlog show their true age (and can trip
// the stale guard); fall back to "now" if the timestamp does not parse.
function getEventStartMs(evt) {
  const ms = new Date(getEnvelopeTS(evt)).getTime();
  return Number.isFinite(ms) ? Math.min(ms, Date.now()) : Date.now();
}

function processAgentEvent(evt) {
  const agent = getAgentBucket(evt);
  if (!agent) return;

  const eventType = getEnvelopeType(evt);
  const correlation = getEnvelopeCorrelation(evt);
  const attrs = getEnvelopeAttributes(evt);

  // Track active sessions
  if (eventType === 'session.start' && correlation.session_id) {
    agent.sessions[correlation.session_id] = { ts: getEnvelopeTS(evt) };
  }
  if (eventType === 'session.end' && correlation.session_id) {
    delete agent.sessions[correlation.session_id];
  }

  // Track active operations (span)
  if (eventType === 'span.start' && correlation.span_id) {
    agent.operations['s:' + correlation.span_id] = {
      type: 'span',
      name: attrs.name || attrs.span_kind || 'unknown',
      kind: attrs.span_kind || '',
      startedAt: getEventStartMs(evt),
    };
  }
  if (eventType === 'span.end' && correlation.span_id) {
    delete agent.operations['s:' + correlation.span_id];
  }

  // Track active operations (run)
  if (eventType === 'run.start' && correlation.run_id) {
    agent.operations['r:' + correlation.run_id] = {
      type: 'run',
      name: 'Thinking\u2026',
      kind: 'run',
      startedAt: getEventStartMs(evt),
    };
  }
  if (eventType === 'run.end' && correlation.run_id) {
    delete agent.operations['r:' + correlation.run_id];
  }

  // Add to recent events list (capped at 100 per agent)
  const id = getRecordID(evt);
  if (id && !agent.eventIDs.has(id)) {
    agent.eventIDs.add(id);
    agent.events.push(evt);
    while (agent.events.length > 100) {
      const removed = agent.events.shift();
      agent.eventIDs.delete(getRecordID(removed));
    }
  }
}

// When tool spans are active, show only those (more specific than the run).
function getAgentDisplayOps(agent) {
  const ops = Object.values(agent.operations);
  const hasTools = ops.some(op => op.kind === 'tool');
  return hasTools ? ops.filter(op => op.kind === 'tool') : ops;
}
```
**Step 3: Update recomputeAgentStats**
Replace existing `recomputeAgentStats()` to iterate over all agent buckets:
```js
function recomputeAgentStats() {
const stats = { messages: 0, tools: 0, errors: 0, toolCounts: {} };
for (const agent of Object.values(agentsState.agents)) {
for (const evt of agent.events) {
const eventType = getEnvelopeType(evt);
const attrs = getEnvelopeAttributes(evt);
if (eventType === 'run.start' || eventType === 'run.end') stats.messages++;
if (eventType === 'span.end' && attrs.span_kind === 'tool') {
stats.tools++;
const toolName = attrs.name || 'unknown';
stats.toolCounts[toolName] = (stats.toolCounts[toolName] || 0) + 1;
}
if (eventType === 'error') stats.errors++;
}
}
agentsState.stats = stats;
}
```
**Step 4: Commit**
```bash
git add cmd/web-ui/static/app.js
git commit -m "feat(agents): add live state tracker with session/operation pairing"
```
---
### Task 3: JS — Rewrite renderAgents and lane rendering
**Files:**
- Modify: `cmd/web-ui/static/app.js` — replace `renderAgents()`, `renderAgentTimeline()`, `renderAgentStats()`, `addAgentEvents()`
**Step 1: Replace addAgentEvents**
Replace existing `addAgentEvents()` with a version that routes to processAgentEvent:
```js
function addAgentEvents(events) {
let changed = false;
for (const evt of events) {
const id = getRecordID(evt);
const agent = getAgentBucket(evt);
if (!id || !agent || agent.eventIDs.has(id)) continue;
processAgentEvent(evt);
changed = true;
}
if (changed) {
// Sort each agent's events
for (const agent of Object.values(agentsState.agents)) {
agent.events.sort((a, b) => new Date(getEnvelopeTS(a)).getTime() - new Date(getEnvelopeTS(b)).getTime());
}
recomputeAgentStats();
}
}
```
**Step 2: Replace renderAgents with lane layout**
Replace the existing `renderAgents()` function:
```js
async function renderAgents() {
agentsState = createAgentsState();
app.innerHTML = `
<div class="page-header">
<h2>Agents <span class="live-indicator"><span class="live-dot"></span>Live</span></h2>
</div>
<div class="agents-summary-row" id="agents-summary"></div>
<div class="agent-lanes" id="agents-lanes">
<div class="agent-lane"><div class="agent-lane-header"><div class="agent-lane-name">ZAP</div></div><div class="agent-lane-events"><p class="empty-state">Loading...</p></div></div>
<div class="agent-lane"><div class="agent-lane-header"><div class="agent-lane-name">ORB</div></div><div class="agent-lane-events"><p class="empty-state">Loading...</p></div></div>
<div class="agent-lane"><div class="agent-lane-header"><div class="agent-lane-name">SUN</div></div><div class="agent-lane-events"><p class="empty-state">Loading...</p></div></div>
</div>
`;
try {
const [snapshots, events] = await Promise.all([
api('/v1/events?event_type=openclaw.snapshot&limit=100').catch(() => ({ events: [] })),
api('/v1/events?framework=openclaw&limit=200'),
]);
if (!isCurrentPath('/agents')) return;
mergeOpenClawEvents(snapshots.events || []);
addAgentEvents((events.events || []).slice().reverse());
renderAgentLanes();
renderAgentSummary();
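} catch (e) {
// The user may have navigated away while the requests were in flight.
if (!isCurrentPath('/agents')) return;
const lanesEl = document.getElementById('agents-lanes');
if (lanesEl) {
lanesEl.innerHTML =
`<p class="empty-state">Error loading agent activity: ${escapeHTML(e.message)}</p>`;
}
}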
// Start elapsed timer
agentsState.timerInterval = setInterval(updateAgentTimers, 1000);
agentsUnsubscribe = subscribeWS(handleAgentsWS);
}
```
**Step 3: Add renderAgentLanes**
```js
function renderAgentLanes() {
const lanesEl = document.getElementById('agents-lanes');
if (!lanesEl) return;
const vmNames = ['zap', 'orb', 'sun'];
lanesEl.innerHTML = vmNames.map(name => {
const agent = agentsState.agents[name];
const vmStatus = getVMStatus().find(v => v.name === name);
const isOnline = vmStatus && vmStatus.active;
const sessionCount = Object.keys(agent.sessions).length;
const ops = getAgentDisplayOps(agent);
const statusClass = sessionCount > 0 ? ' has-sessions' : '';
const statusText = !isOnline ? 'offline'
: sessionCount > 0 ? sessionCount + ' session' + (sessionCount > 1 ? 's' : '')
: 'idle';
const opsHTML = ops.length > 0 ? `<div class="active-ops">${ops.map(op => {
const elapsed = Math.floor((Date.now() - op.startedAt) / 1000);
const stale = elapsed > 300;
return `
<div class="active-op${stale ? ' stale' : ''}">
<span class="active-op-dot"></span>
<span class="active-op-name">${escapeHTML(op.name)}</span>
<span class="active-op-time" data-start="${op.startedAt}">${formatElapsed(elapsed)}</span>
${stale ? '<span class="active-op-stale">(stale?)</span>' : ''}
</div>`;
}).join('')}</div>` : '';
const recent = agent.events.slice(-40).reverse();
const eventsHTML = recent.length > 0 ? recent.map(evt => {
const eventType = getEnvelopeType(evt);
const details = getEventDetails(evt);
const detailHTML = details ? `<div class="timeline-detail">${escapeHTML(details)}</div>` : '';
const expandHTML = details ? '<button class="timeline-expand-hint" type="button">details</button>' : '';
return `
<div class="timeline-event">
<div class="timeline-event-header">
${getEventIcon(eventType)}
<span class="timeline-event-type">${escapeHTML(getEventLabel(eventType))}</span>
<span class="timeline-event-time">${escapeHTML(new Date(getEnvelopeTS(evt)).toLocaleTimeString())}</span>
</div>
${getEventBody(evt)}
${expandHTML}
${detailHTML}
</div>`;
}).join('') : '<p class="empty-state">No recent activity</p>';
return `
<div class="agent-lane">
<div class="agent-lane-header">
<div class="agent-lane-name">
<span class="agent-lane-dot ${isOnline ? 'online' : 'offline'}"></span>
${escapeHTML(name.toUpperCase())}
</div>
<span class="agent-lane-status${statusClass}">${statusText}</span>
</div>
${opsHTML}
<div class="agent-lane-events">${eventsHTML}</div>
</div>`;
}).join('');
// Wire up detail expand buttons
lanesEl.querySelectorAll('.timeline-expand-hint').forEach(button => {
button.addEventListener('click', () => {
button.parentElement.classList.toggle('expanded');
});
});
}
```
**Step 4: Add formatElapsed helper**
Add near the other format helpers (around line 202):
```js
function formatElapsed(seconds) {
if (seconds < 60) return seconds + 's';
if (seconds < 3600) return Math.floor(seconds / 60) + 'm ' + (seconds % 60) + 's';
return Math.floor(seconds / 3600) + 'h ' + Math.floor((seconds % 3600) / 60) + 'm';
}
```
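A few boundary values show how the helper behaves at the minute and hour thresholds. The function is repeated here only so the snippet runs on its own; note that seconds are dropped once the elapsed time reaches an hour.

```javascript
function formatElapsed(seconds) {
  if (seconds < 60) return seconds + 's';
  if (seconds < 3600) return Math.floor(seconds / 60) + 'm ' + (seconds % 60) + 's';
  return Math.floor(seconds / 3600) + 'h ' + Math.floor((seconds % 3600) / 60) + 'm';
}

// Print the label for values around each threshold.
for (const s of [0, 59, 60, 75, 3599, 3600, 3725]) {
  console.log(s + ' -> ' + formatElapsed(s));
}
// 0 -> 0s, 59 -> 59s, 60 -> 1m 0s, 75 -> 1m 15s,
// 3599 -> 59m 59s, 3600 -> 1h 0m, 3725 -> 1h 2m
```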
**Step 5: Add renderAgentSummary**
```js
function renderAgentSummary() {
const el = document.getElementById('agents-summary');
if (!el) return;
const s = agentsState.stats;
el.innerHTML = `
<div class="agents-summary-stat">Messages <span class="value">${s.messages}</span></div>
<div class="agents-summary-stat">Tool Calls <span class="value">${s.tools}</span></div>
<div class="agents-summary-stat">Errors <span class="value">${s.errors}</span></div>
`;
}
```
**Step 6: Commit**
```bash
git add cmd/web-ui/static/app.js
git commit -m "feat(agents): per-agent lane layout with active operations and scoped event feeds"
```
---
### Task 4: JS — WS handler and elapsed timer
**Files:**
- Modify: `cmd/web-ui/static/app.js` — replace `handleAgentsWS()`, add `updateAgentTimers()`
**Step 1: Replace handleAgentsWS**
```js
function handleAgentsWS(msg) {
if (msg.type !== 'message') return;
const eventType = getEnvelopeType(msg.data);
if (eventType === 'openclaw.snapshot') {
mergeOpenClawEvents([msg.data]);
renderAgentLanes();
return;
}
const framework = getEnvelopeSource(msg.data).framework || msg.data.source_framework;
if (framework !== 'openclaw') return;
addAgentEvents([msg.data]);
renderAgentLanes();
renderAgentSummary();
}
```
**Step 2: Add updateAgentTimers**
```js
function updateAgentTimers() {
document.querySelectorAll('.active-op-time[data-start]').forEach(el => {
const start = parseInt(el.dataset.start, 10);
if (!start) return;
const elapsed = Math.floor((Date.now() - start) / 1000);
el.textContent = formatElapsed(elapsed);
// Check stale
const op = el.closest('.active-op');
if (op && elapsed > 300 && !op.classList.contains('stale')) {
op.classList.add('stale');
if (!op.querySelector('.active-op-stale')) {
op.insertAdjacentHTML('beforeend', '<span class="active-op-stale">(stale?)</span>');
}
}
});
}
```
**Step 3: Clean up timer on navigation**
In `cleanupLiveViews()`, add timer cleanup:
```js
if (agentsState && agentsState.timerInterval) {
clearInterval(agentsState.timerInterval);
agentsState.timerInterval = null;
}
```
**Step 4: Remove old functions that are no longer needed**
Delete: `renderAgentTimeline()`, `renderAgentStats()` (replaced by `renderAgentLanes()` and `renderAgentSummary()`), `renderAgentVMStrip()` (VM status is now in lane headers).
**Step 5: Commit**
```bash
git add cmd/web-ui/static/app.js
git commit -m "feat(agents): WS handler, elapsed timer, and stale detection"
```