docs(03): create phase plan for live ops dashboard
@@ -0,0 +1,255 @@
---
phase: 03-live-ops-dashboard
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - src/gateway/metrics.ts
  - src/gateway/metrics.test.ts
  - src/gateway/handlers/system.ts
  - src/gateway/server.ts
  - src/daemon/services.ts
autonomous: true

must_haves:
  truths:
    - "MetricsCollector accumulates counters (messages processed, errors) and model call metrics (latency, tokens/sec, provider)"
    - "Gateway exposes system.metrics and system.events RPC methods returning accumulated data"
    - "GET /health returns JSON with daemon status, uptime, and component readiness without WebSocket"
    - "Errors and significant events are captured in a ring buffer accessible via RPC"
    - "Active agent requests are tracked (in-flight count, tool executions, session IDs)"
  artifacts:
    - path: "src/gateway/metrics.ts"
      provides: "MetricsCollector class: single source of truth for all ops metrics"
      exports: ["MetricsCollector"]
    - path: "src/gateway/metrics.test.ts"
      provides: "Tests for MetricsCollector"
      contains: "describe.*MetricsCollector"
    - path: "src/gateway/handlers/system.ts"
      provides: "New system.metrics, system.events, system.activeRequests RPC handlers"
      contains: "system.metrics"
    - path: "src/gateway/server.ts"
      provides: "HTTP /health endpoint and MetricsCollector wiring"
      contains: "/health"
    - path: "src/daemon/services.ts"
      provides: "MetricsCollector creation and wiring into gateway"
      contains: "MetricsCollector"
  key_links:
    - from: "src/gateway/server.ts"
      to: "src/gateway/metrics.ts"
      via: "GatewayServer holds MetricsCollector ref, passes to handlers"
      pattern: "metrics.*MetricsCollector"
    - from: "src/gateway/handlers/system.ts"
      to: "src/gateway/metrics.ts"
      via: "System handlers read from MetricsCollector"
      pattern: "getMetrics|getEvents|getActiveRequests"
    - from: "src/daemon/services.ts"
      to: "src/gateway/metrics.ts"
      via: "createGateway instantiates MetricsCollector"
      pattern: "new MetricsCollector"
---
|
||||
|
||||
<objective>
|
||||
Create the metrics collection backend and wire it into the gateway server with new RPC handlers and an HTTP /health endpoint.
|
||||
|
||||
Purpose: Provide the data layer that the dashboard UI (Plan 02) will consume. Without collected metrics, the dashboard has nothing to show beyond what system.health already provides.
|
||||
|
||||
Output: MetricsCollector class, 3 new RPC methods (system.metrics, system.events, system.activeRequests), HTTP GET /health endpoint, tests.
|
||||
</objective>
|
||||
|
||||
<execution_context>
|
||||
@/home/will/.config/opencode/get-shit-done/workflows/execute-plan.md
|
||||
@/home/will/.config/opencode/get-shit-done/templates/summary.md
|
||||
</execution_context>
|
||||
|
||||
<context>
|
||||
@.planning/PROJECT.md
|
||||
@.planning/ROADMAP.md
|
||||
@.planning/STATE.md
|
||||
@src/gateway/server.ts
|
||||
@src/gateway/handlers/system.ts
|
||||
@src/gateway/handlers/index.ts
|
||||
@src/gateway/protocol.ts
|
||||
@src/gateway/router.ts
|
||||
@src/gateway/handlers/agent.ts
|
||||
@src/gateway/session-bridge.ts
|
||||
@src/daemon/services.ts
|
||||
@src/daemon/index.ts
|
||||
</context>
|
||||
|
||||
<tasks>
|
||||
|
||||
<task type="auto">
|
||||
<name>Task 1: Create MetricsCollector and wire into gateway</name>
|
||||
<files>
|
||||
src/gateway/metrics.ts
|
||||
src/gateway/metrics.test.ts
|
||||
src/gateway/server.ts
|
||||
src/gateway/handlers/system.ts
|
||||
src/gateway/handlers/index.ts
|
||||
src/daemon/services.ts
|
||||
</files>
|
||||
<action>
|
||||
Create `src/gateway/metrics.ts` with a `MetricsCollector` class that tracks:
|
||||
|
||||
**Counters (simple incrementing numbers):**
|
||||
- `messagesProcessed` — incremented each time an agent.send completes (success or error)
|
||||
- `errors` — incremented on agent.send errors and any other recorded errors
|
||||
- `activeRequests` — gauge (increment on start, decrement on end)
|
||||
|
||||
**Model call metrics (ring buffer of recent calls, max 200 entries):**
|
||||
Each entry: `{ timestamp: number, provider: string, latency: number, inputTokens: number, outputTokens: number, tokensPerSec: number, error?: string }`
|
||||
- `recordModelCall(entry)` — push to ring buffer
|
||||
- `getModelMetrics()` — return the array
|
||||
|
||||
**Event stream (ring buffer of recent events, max 500 entries):**
|
||||
Each entry: `{ timestamp: number, level: 'info' | 'warn' | 'error', source: string, message: string, context?: Record<string, unknown> }`
|
||||
- `recordEvent(event)` — push to ring buffer
|
||||
- `getEvents(opts?: { level?: string, limit?: number })` — return filtered/limited array (newest first)
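The two buffers described above share the same behavior: bounded size, oldest entry evicted first, and (for events) a newest-first filtered read. A minimal sketch of that behavior follows. The names `BoundedBuffer`, `OpsEvent`, and `queryEvents` are hypothetical and not part of the plan, and the buffer is array-backed for brevity rather than a true circular array.

```typescript
// Hypothetical helper: the plan only specifies "ring buffer, max N entries".
class BoundedBuffer<T> {
  private items: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  // Append an entry; once over capacity, drop the oldest one (FIFO eviction).
  push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.capacity) {
      this.items.shift();
    }
  }

  // Return a copy, oldest entry first.
  toArray(): T[] {
    return this.items.slice();
  }
}

type OpsEventLevel = "info" | "warn" | "error";

interface OpsEvent {
  timestamp: number;
  level: OpsEventLevel;
  source: string;
  message: string;
  context?: Record<string, unknown>;
}

// Newest-first view with optional level filter and limit,
// mirroring the getEvents(opts) contract described above.
function queryEvents(
  buffer: BoundedBuffer<OpsEvent>,
  opts: { level?: string; limit?: number } = {},
): OpsEvent[] {
  let result = buffer.toArray().reverse();
  if (opts.level !== undefined) {
    result = result.filter((e) => e.level === opts.level);
  }
  if (opts.limit !== undefined) {
    result = result.slice(0, Math.max(0, opts.limit));
  }
  return result;
}
```

With caps of 200 and 500 the `shift()` cost is negligible; a circular index would only matter for much larger buffers.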
|
||||
|
||||
**Active request tracking:**
|
||||
- `startRequest(id: string, info: { sessionId: string, channel: string })` — records start time + info
|
||||
- `endRequest(id: string)` — removes from active map
|
||||
- `getActiveRequests()` — returns array of `{ id, sessionId, channel, startedAt, durationMs }`
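The active request map can be sketched as below. `ActiveRequestTracker` and the injectable `now` clock are assumptions added here so that `durationMs` is easy to test; the plan itself only names the three methods.

```typescript
interface ActiveRequestInfo {
  sessionId: string;
  channel: string;
}

interface ActiveRequestView extends ActiveRequestInfo {
  id: string;
  startedAt: number;
  durationMs: number;
}

// Hypothetical standalone tracker; in the plan these methods live on MetricsCollector.
class ActiveRequestTracker {
  private readonly active = new Map<string, ActiveRequestInfo & { startedAt: number }>();
  private readonly now: () => number;

  // The clock is injectable (an assumption) so tests can control time.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  startRequest(id: string, info: ActiveRequestInfo): void {
    this.active.set(id, { sessionId: info.sessionId, channel: info.channel, startedAt: this.now() });
  }

  endRequest(id: string): void {
    this.active.delete(id);
  }

  // durationMs is computed at read time, so it keeps growing while the request is in flight.
  getActiveRequests(): ActiveRequestView[] {
    const current = this.now();
    return Array.from(this.active, ([id, r]) => ({
      id,
      sessionId: r.sessionId,
      channel: r.channel,
      startedAt: r.startedAt,
      durationMs: current - r.startedAt,
    }));
  }

  // Gauge value: number of requests currently in flight.
  get count(): number {
    return this.active.size;
  }
}
```

Deriving the gauge from the map size, rather than keeping a separate counter, avoids drift if `endRequest` is ever called twice for the same id.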
|
||||
|
||||
**Snapshot method:**
|
||||
- `getSnapshot()` — returns `{ messagesProcessed, errors, activeRequests: number, uptime: number, modelCalls: { total, avgLatency, errorRate, recentCalls }, queueDepth: number }`
|
||||
- Accept a `getQueueDepth` callback in constructor for LaneQueue integration
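The `modelCalls` part of the snapshot is a small aggregation over the model call buffer. The sketch below makes two assumptions the plan does not state: `total` counts only the calls still in the buffer, and `errorRate` is a fraction between 0 and 1. `summarizeModelCalls` is a hypothetical name.

```typescript
interface ModelCallEntry {
  timestamp: number;
  provider: string;
  latency: number;
  inputTokens: number;
  outputTokens: number;
  tokensPerSec: number;
  error?: string;
}

interface ModelCallSummary {
  total: number;
  avgLatency: number;
  errorRate: number;
  recentCalls: ModelCallEntry[];
}

// Aggregate the buffered calls; an empty buffer yields zeros rather than NaN.
function summarizeModelCalls(calls: ModelCallEntry[]): ModelCallSummary {
  const total = calls.length;
  if (total === 0) {
    return { total: 0, avgLatency: 0, errorRate: 0, recentCalls: [] };
  }
  const latencySum = calls.reduce((sum, c) => sum + c.latency, 0);
  const errorCount = calls.filter((c) => c.error !== undefined).length;
  return {
    total,
    avgLatency: latencySum / total,
    errorRate: errorCount / total, // assumption: fraction in [0, 1], not a percentage
    recentCalls: calls.slice(),
  };
}
```

If the dashboard should show a percentage, the conversion can happen at render time so the RPC payload stays unit-free.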
|
||||
|
||||
The class should be simple, synchronous (no async), and have NO external dependencies beyond Node.js builtins. Export the class and all relevant types.
|
||||
|
||||
**Wire MetricsCollector into the gateway:**
|
||||
|
||||
1. In `src/gateway/server.ts`:
|
||||
- Add `metrics?: MetricsCollector` to `GatewayServerConfig` interface
|
||||
- Store the metrics instance on the GatewayServer class
|
||||
- In `handleHttpRequest`, add a handler for `GET /health` BEFORE the auth check (health endpoint should be unauthenticated for Docker HEALTHCHECK). Return JSON: `{ status: 'ok', uptime: <seconds>, version: <string>, sessions: <count>, connections: <count>, tools: <count>, channels: <channelList> }`. Use the same data sources as `system.health` RPC handler. Set `Content-Type: application/json`.
|
||||
- In the agent.send flow: the GatewayServer doesn't handle agent.send directly (it's in the handler), so instead expose `getMetrics()` accessor on GatewayServer so handlers can access it.
|
||||
|
||||
2. In `src/gateway/handlers/system.ts`:
|
||||
- Add `getMetrics?: () => { messagesProcessed: number, errors: number, activeRequests: number, uptime: number, modelCalls: { total: number, avgLatency: number, errorRate: number, recentCalls: unknown[] }, queueDepth: number }` to `SystemHandlerDeps`
|
||||
- Add `getEvents?: (opts?: { level?: string, limit?: number }) => unknown[]` and `getActiveRequests?: () => unknown[]` to `SystemHandlerDeps`
|
||||
- Add `system.metrics` handler: returns `getMetrics()` snapshot
|
||||
- Add `system.events` handler: returns `getEvents()` with optional `level` and `limit` params
|
||||
- Add `system.activeRequests` handler: returns `getActiveRequests()` array
|
||||
- Update the re-exports in `src/gateway/handlers/index.ts` if any new types need exporting
|
||||
|
||||
3. In `src/gateway/handlers/agent.ts` (or via a wrapper in server.ts):
|
||||
- Metrics recording for agent.send is hooked in by the server, not by importing MetricsCollector into the agent handler. In `src/gateway/server.ts`, when registering handlers, pass the metrics callbacks into the system handlers via `SystemHandlerDeps`. For request tracking in agent.send, add `onRequestStart` and `onRequestEnd` callbacks to `AgentHandlerDeps` so the server can hook MetricsCollector in. (Task 2 later simplifies this to passing the collector instance directly.)
|
||||
|
||||
4. In `src/daemon/services.ts`:
- An earlier option was to instantiate `new MetricsCollector({ getQueueDepth: () => 0 })` in `createGateway()` and pass it to the GatewayServer config as `metrics`. Queue depth from LaneQueue is internal to GatewayServer, so that option cannot report a real value without extra wiring.
- Instead, create the MetricsCollector in the GatewayServer constructor, which already has the LaneQueue: `this.metrics = new MetricsCollector({ getQueueDepth: () => this.laneQueue.totalPending() })`. If a switch is wanted, add `metricsEnabled?: boolean` (default true) to `GatewayServerConfig`; no other config change is needed.
- Add a `totalPending()` method to LaneQueue that sums all queue lengths across lanes.
- GatewayServer passes metrics callbacks to system handler deps and agent handler deps.
- This keeps the metrics concern entirely within the gateway module and avoids changing too many files.

Use this approach. Changes to `src/daemon/services.ts` are minimal or unnecessary; just ensure the GatewayServer starts collecting metrics automatically.
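The unauthenticated `GET /health` check described in step 1 can be sketched as a pure routing function that runs before the auth check. `HealthSources`, `HttpReply`, and `handleHealthRequest` are hypothetical names; the real `handleHttpRequest` would write the returned status, headers, and body to the Node response, and would take its values from the same sources as `system.health`.

```typescript
// Hypothetical shape of the data the server already has for system.health.
interface HealthSources {
  startedAtMs: number;
  version: string;
  sessions: number;
  connections: number;
  tools: number;
  channels: string[];
}

interface HttpReply {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Returns a reply for GET /health, or null so the caller can continue
// with the auth check and the remaining routes.
function handleHealthRequest(
  method: string,
  url: string,
  src: HealthSources,
  nowMs: number = Date.now(),
): HttpReply | null {
  const path = url.split("?")[0]; // ignore any query string
  if (method !== "GET" || path !== "/health") {
    return null;
  }
  const payload = {
    status: "ok",
    uptime: Math.floor((nowMs - src.startedAtMs) / 1000), // seconds
    version: src.version,
    sessions: src.sessions,
    connections: src.connections,
    tools: src.tools,
    channels: src.channels,
  };
  return {
    status: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}
```

Returning `null` for non-matching requests keeps the health route a strict early exit, so every other path still goes through authentication.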
|
||||
|
||||
**Update LaneQueue** (`src/gateway/lane-queue.ts`):
|
||||
- Add a `totalPending(): number` method that returns the sum of all lane queue lengths (iterate over lanes, sum queue.length).
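A sketch of `totalPending()` follows. The internals of the real LaneQueue are not shown in this plan, so `LaneQueueSketch` assumes a `Map` from lane ID to an array of pending items; only the summing logic is the point.

```typescript
// Hypothetical minimal stand-in for LaneQueue; the real class may store lanes differently.
class LaneQueueSketch<T> {
  private readonly lanes = new Map<string, T[]>();

  enqueue(laneId: string, item: T): void {
    const queue = this.lanes.get(laneId);
    if (queue) {
      queue.push(item);
    } else {
      this.lanes.set(laneId, [item]);
    }
  }

  dequeue(laneId: string): T | undefined {
    const queue = this.lanes.get(laneId);
    return queue ? queue.shift() : undefined;
  }

  // Sum of pending items across every lane.
  totalPending(): number {
    let total = 0;
    this.lanes.forEach((queue) => {
      total += queue.length;
    });
    return total;
  }
}
```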
|
||||
|
||||
**Tests in `src/gateway/metrics.test.ts`:**
|
||||
- Test counter increment/decrement
|
||||
- Test model call ring buffer (max 200, FIFO eviction)
|
||||
- Test event ring buffer (max 500, FIFO eviction, filtering by level)
|
||||
- Test active request tracking (start, end, duration calculation)
|
||||
- Test getSnapshot returns correct shape
|
||||
- Test getEvents with level filter and limit
|
||||
|
||||
Run `pnpm test:run` to verify zero regressions plus new tests pass.
|
||||
Run `pnpm typecheck` to verify no type errors.
|
||||
</action>
|
||||
<verify>
|
||||
`pnpm test:run` — all existing 1077 tests pass plus new metrics tests pass.
|
||||
`pnpm typecheck` — no type errors.
|
||||
`grep -r "system.metrics\|system.events\|system.activeRequests" src/gateway/handlers/system.ts` — confirms new RPC methods exist.
|
||||
`grep -r "GET.*health\|/health" src/gateway/server.ts` — confirms HTTP health endpoint exists.
|
||||
</verify>
|
||||
<done>
|
||||
MetricsCollector created with counters, model call ring buffer, event ring buffer, and active request tracking.
|
||||
Three new RPC handlers registered (system.metrics, system.events, system.activeRequests).
|
||||
GET /health returns unauthenticated JSON health status.
|
||||
LaneQueue has totalPending() method.
|
||||
All tests pass with zero regressions.
|
||||
</done>
|
||||
</task>
|
||||
|
||||
<task type="auto">
|
||||
<name>Task 2: Hook metrics recording into agent request flow</name>
|
||||
<files>
|
||||
src/gateway/server.ts
|
||||
src/gateway/handlers/agent.ts
|
||||
src/gateway/lane-queue.ts
|
||||
</files>
|
||||
<action>
|
||||
Wire the MetricsCollector into the actual agent request flow so metrics are populated with real data as messages flow through the system.
|
||||
|
||||
1. **In `src/gateway/server.ts` registerHandlers():**
- Pass the MetricsCollector instance directly to the agent handler deps as `metrics?: MetricsCollector`, so the agent handler can call its methods directly.
- This replaces the alternative of a bag of callbacks (`onRequestStart` and `onRequestEnd` calling `this.metrics.startRequest()` and `this.metrics.endRequest()`, `onRequestComplete` calling `this.metrics.incrementMessages()` and optionally `this.metrics.incrementErrors()` on error, and `onModelCall` for latency/token data). The direct reference is cleaner.
|
||||
|
||||
2. **In `src/gateway/handlers/agent.ts`:**
|
||||
- Add `metrics?: MetricsCollector` to `AgentHandlerDeps` (import type from `../metrics.js`)
|
||||
- In `agent.send` handler:
|
||||
- At start: `const requestId = request.id.toString(); deps.metrics?.startRequest(requestId, { sessionId: laneId, channel: 'ws' });`
|
||||
- In the `try` block after `agent.process()` resolves: `deps.metrics?.incrementMessages();`
|
||||
- In the `catch` block: `deps.metrics?.incrementErrors(); deps.metrics?.recordEvent({ timestamp: Date.now(), level: 'error', source: 'agent.send', message: err.message || 'Unknown error', context: { sessionId: laneId } });`
|
||||
- In `finally`: `deps.metrics?.endRequest(requestId);`
|
||||
- For tool use events, record them to metrics: when `event.type === 'end'` and `event.result` and `!event.result.success`, increment error counter and record error event.
|
||||
|
||||
3. **In `src/gateway/server.ts` registerHandlers():**
|
||||
- Pass `metrics: this.metrics` when constructing agent handler deps.
|
||||
- Update the system handlers construction to pass the metrics accessors:
|
||||
```
getMetrics: () => this.metrics.getSnapshot(),
getEvents: (opts) => this.metrics.getEvents(opts),
getActiveRequests: () => this.metrics.getActiveRequests(),
```
|
||||
|
||||
4. **Test the wiring:**
|
||||
- In the existing `src/gateway/server.test.ts` or `src/gateway/handlers/handlers.test.ts`, verify that sending a message through agent.send increments the metrics counters. If the existing test infrastructure doesn't easily support this, at minimum verify through the type system that the wiring is correct.
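The `agent.send` instrumentation from step 2 (start, success, error, finally) can be sketched as one wrapper. `MetricsLike` and `withRequestMetrics` are hypothetical names, and rethrowing the error after recording it is an assumption; the real handler may instead convert the error into an RPC error response.

```typescript
// Hypothetical subset of MetricsCollector used by the agent handler.
interface MetricsLike {
  startRequest(id: string, info: { sessionId: string; channel: string }): void;
  endRequest(id: string): void;
  incrementMessages(): void;
  incrementErrors(): void;
  recordEvent(event: {
    timestamp: number;
    level: "info" | "warn" | "error";
    source: string;
    message: string;
    context?: Record<string, unknown>;
  }): void;
}

// Wraps one agent.send call. Metrics are optional, matching `metrics?: MetricsCollector`.
async function withRequestMetrics<T>(
  metrics: MetricsLike | undefined,
  requestId: string,
  laneId: string,
  run: () => Promise<T>,
): Promise<T> {
  metrics?.startRequest(requestId, { sessionId: laneId, channel: "ws" });
  try {
    const result = await run();
    metrics?.incrementMessages();
    return result;
  } catch (err) {
    const message = err instanceof Error && err.message ? err.message : "Unknown error";
    metrics?.incrementErrors();
    metrics?.recordEvent({
      timestamp: Date.now(),
      level: "error",
      source: "agent.send",
      message,
      context: { sessionId: laneId },
    });
    throw err; // assumption: the caller still decides how to report the failure
  } finally {
    // Always clear the in-flight entry, on success and on error.
    metrics?.endRequest(requestId);
  }
}
```

A test for the wiring can pass a recording fake for `MetricsLike` and assert the call order, without starting a real gateway.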
|
||||
|
||||
Run `pnpm test:run` and `pnpm typecheck`.
|
||||
</action>
|
||||
<verify>
|
||||
`pnpm test:run` — all tests pass (1077 existing + new metrics tests).
|
||||
`pnpm typecheck` — no type errors.
|
||||
`grep -r "metrics\." src/gateway/handlers/agent.ts` — confirms metrics calls in agent handler.
|
||||
</verify>
|
||||
<done>
|
||||
Agent request flow records: messagesProcessed counter, error counter, active request tracking, and error events.
|
||||
Tool failures are recorded as error events.
|
||||
System handlers return live metrics data from MetricsCollector.
|
||||
All tests pass, no type errors.
|
||||
</done>
|
||||
</task>
|
||||
|
||||
</tasks>
|
||||
|
||||
<verification>
|
||||
1. `pnpm test:run` — all 1077+ tests pass
|
||||
2. `pnpm typecheck` — zero type errors
|
||||
3. New system.metrics, system.events, system.activeRequests RPC methods registered (check via getMethods())
|
||||
4. GET /health returns valid JSON with status, uptime, version fields
|
||||
5. MetricsCollector ring buffers enforce size limits
|
||||
</verification>
|
||||
|
||||
<success_criteria>
|
||||
- MetricsCollector exists with counters, model call buffer, event buffer, active request tracking
|
||||
- Three new RPC handlers return metrics data
|
||||
- GET /health endpoint returns unauthenticated JSON health status
|
||||
- Agent request flow records messagesProcessed, errors, active requests, and error events
|
||||
- Zero test regressions, all new tests pass
|
||||
</success_criteria>
|
||||
|
||||
<output>
|
||||
After completion, create `.planning/phases/03-live-ops-dashboard/03-01-SUMMARY.md`
|
||||
</output>
|
||||
@@ -0,0 +1,260 @@
---
phase: 03-live-ops-dashboard
plan: 02
type: execute
wave: 2
depends_on: ["03-01"]
files_modified:
  - src/gateway/ui/pages/dashboard.js
  - src/gateway/ui/style.css
  - src/gateway/ui/index.html
  - src/gateway/ui/lib/ws-client.js
autonomous: false

must_haves:
  truths:
    - "Dashboard shows live-updating counters: messages processed, active sessions, queue depth, daemon uptime; values change in real time"
    - "Dashboard shows model call metrics: per-call latency, tokens/sec throughput, error rates by provider"
    - "Dashboard shows live event stream: scrollable log of errors and events with timestamps, auto-scrolls on new entries"
    - "Dashboard shows active request tracking: in-flight requests with duration and session info"
    - "Dashboard auto-refreshes every 3 seconds for counters and events, maintaining live feel"
  artifacts:
    - path: "src/gateway/ui/pages/dashboard.js"
      provides: "Enhanced dashboard page with metrics, events, and active request sections"
      min_lines: 200
    - path: "src/gateway/ui/style.css"
      provides: "New CSS classes for event stream, metrics cards, active requests table"
      contains: "event-stream"
    - path: "src/gateway/ui/index.html"
      provides: "Unchanged structure (dashboard page already registered)"
    - path: "src/gateway/ui/lib/ws-client.js"
      provides: "No changes needed (call() method already supports the new RPC methods)"
  key_links:
    - from: "src/gateway/ui/pages/dashboard.js"
      to: "system.metrics"
      via: "client.call('system.metrics')"
      pattern: "client\\.call.*system\\.metrics"
    - from: "src/gateway/ui/pages/dashboard.js"
      to: "system.events"
      via: "client.call('system.events')"
      pattern: "client\\.call.*system\\.events"
    - from: "src/gateway/ui/pages/dashboard.js"
      to: "system.activeRequests"
      via: "client.call('system.activeRequests')"
      pattern: "client\\.call.*system\\.activeRequests"
---
|
||||
|
||||
<objective>
|
||||
Extend the existing vanilla JS dashboard with live ops sections: core counters, model call metrics, event stream, and active request tracking.
|
||||
|
||||
Purpose: This is the user-facing deliverable — the operator opens the dashboard and sees real-time system health without tailing logs. All data comes from the RPC handlers created in Plan 01.
|
||||
|
||||
Output: Enhanced dashboard.js with four new sections, supporting CSS, human-verified live dashboard.
|
||||
</objective>
|
||||
|
||||
<execution_context>
|
||||
@/home/will/.config/opencode/get-shit-done/workflows/execute-plan.md
|
||||
@/home/will/.config/opencode/get-shit-done/templates/summary.md
|
||||
</execution_context>
|
||||
|
||||
<context>
|
||||
@.planning/PROJECT.md
|
||||
@.planning/ROADMAP.md
|
||||
@.planning/STATE.md
|
||||
@.planning/phases/03-live-ops-dashboard/03-01-SUMMARY.md
|
||||
@src/gateway/ui/pages/dashboard.js
|
||||
@src/gateway/ui/style.css
|
||||
@src/gateway/ui/index.html
|
||||
@src/gateway/ui/app.js
|
||||
@src/gateway/ui/lib/ws-client.js
|
||||
</context>
|
||||
|
||||
<tasks>
|
||||
|
||||
<task type="auto">
|
||||
<name>Task 1: Extend dashboard page with live ops sections</name>
|
||||
<files>
|
||||
src/gateway/ui/pages/dashboard.js
|
||||
src/gateway/ui/style.css
|
||||
</files>
|
||||
<action>
|
||||
**IMPORTANT: Extend the existing vanilla JS dashboard — do NOT replace with React or any framework. This is a locked user decision.**
|
||||
|
||||
Rewrite `src/gateway/ui/pages/dashboard.js` to show five sections (replacing the current simple health/channels/usage layout; the existing channels grid is kept as Section 5):
|
||||
|
||||
**Section 1: Core Counters (top row of stat cards)**
|
||||
- Messages Processed (from `system.metrics` → messagesProcessed)
|
||||
- Active Sessions (from `system.health` → sessions)
|
||||
- Queue Depth (from `system.metrics` → queueDepth)
|
||||
- Daemon Uptime (from `system.metrics` → uptime, formatted as "Xd Xh Xm Xs")
|
||||
- Active Requests (from `system.metrics` → activeRequests)
|
||||
- Errors (from `system.metrics` → errors, colored red if > 0)
|
||||
|
||||
Use the existing `.stats-grid` and `.stat-card` CSS classes.
|
||||
|
||||
**Section 2: Model Performance (table of recent model calls)**
|
||||
- Show the most recent 20 model calls from `system.metrics` → modelCalls.recentCalls
|
||||
- Table columns: Time (relative, e.g. "3s ago"), Provider, Latency (ms), Tokens/sec, In/Out tokens, Status (✓ or ✗)
|
||||
- Summary row above the table: Total calls, Avg latency, Error rate %
|
||||
- Use existing table CSS classes
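The relative Time column ("3s ago") can be sketched as a small pure function. `formatRelativeTime` is a hypothetical name, and the cutoffs for minutes, hours, and days are assumptions. The dashboard itself is vanilla JS, so the type annotations would be dropped there.

```typescript
// Formats a past timestamp relative to `nowMs`, e.g. "3s ago" or "2m ago".
// Future timestamps (clock skew) are clamped to "0s ago".
function formatRelativeTime(timestampMs: number, nowMs: number = Date.now()): string {
  const seconds = Math.max(0, Math.floor((nowMs - timestampMs) / 1000));
  if (seconds < 60) return `${seconds}s ago`;
  const minutes = Math.floor(seconds / 60);
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}
```

Because the value depends on the current time, this column needs to be recomputed on every fast tick even when no new model calls have arrived.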
|
||||
|
||||
**Section 3: Event Stream (scrollable log)**
|
||||
- Fetch from `system.events` with `{ limit: 50 }`
|
||||
- Each event rendered as a row: `[HH:MM:SS] [LEVEL] source: message`
|
||||
- Color-code: error=red, warn=yellow, info=default
|
||||
- Container has max-height with overflow-y: auto and auto-scrolls to bottom on new entries
|
||||
- New class `.event-stream` for the container, `.event-row` for each entry, `.event-level-error`, `.event-level-warn`, `.event-level-info` for coloring
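The event row format and level class described above can be sketched as two pure helpers. `formatEventRow` and `eventLevelClass` are hypothetical names, and using the browser's local time zone for `HH:MM:SS` is an assumption. In the vanilla JS dashboard the type annotations would be dropped.

```typescript
interface EventRowInput {
  timestamp: number;
  level: "info" | "warn" | "error";
  source: string;
  message: string;
}

// Zero-pad a clock component to two digits.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Renders "[HH:MM:SS] [LEVEL] source: message" using local time.
function formatEventRow(e: EventRowInput): string {
  const d = new Date(e.timestamp);
  const time = `${pad2(d.getHours())}:${pad2(d.getMinutes())}:${pad2(d.getSeconds())}`;
  return `[${time}] [${e.level.toUpperCase()}] ${e.source}: ${e.message}`;
}

// Maps a level to the CSS class used for coloring.
function eventLevelClass(level: EventRowInput["level"]): string {
  return `event-level-${level}`;
}
```

Assigning the formatted string through `textContent` rather than `innerHTML` keeps event messages from being interpreted as markup.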
|
||||
|
||||
**Section 4: Active Requests (table, only shown when requests in flight)**
|
||||
- Fetch from `system.activeRequests`
|
||||
- Table columns: Session, Channel, Duration (live-updating), Started
|
||||
- If no active requests, show "No active requests" muted text
|
||||
- Use existing table CSS
|
||||
|
||||
**Section 5: Channels (keep existing)**
|
||||
- Keep the existing channels grid showing connected/disconnected channel adapters
|
||||
|
||||
**Refresh strategy:**
|
||||
- Replace the current 10-second interval with a 3-second interval for the core data (system.metrics, system.events, system.activeRequests)
|
||||
- Fetch system.health and system.channels every 10 seconds (less dynamic data)
|
||||
- Use `Promise.all` to batch the frequent calls together
|
||||
- Keep the existing `teardown()` pattern with `clearInterval`
|
||||
|
||||
**Implementation approach:**
|
||||
- Keep the same module pattern: `loadDashboard(el, client)` function + `DashboardPage` export with `render`/`teardown`
|
||||
- Use two timers: `_fastTimer` (3s) for metrics/events/requests, `_slowTimer` (10s) for health/channels
|
||||
- On first render, fetch everything with `Promise.all`
|
||||
- On subsequent fast ticks, only update the dynamic sections (don't re-render the whole page — use targeted DOM updates via `getElementById` for each section)
|
||||
- Generate unique section IDs: `#ops-counters`, `#ops-model-table`, `#ops-events`, `#ops-requests`, `#ops-channels`
|
||||
|
||||
**CSS additions in `src/gateway/ui/style.css`:**
|
||||
Add near the end of the file, immediately before the responsive section:
|
||||
|
||||
```css
/* ── Event Stream ──────────────────────────────────────── */
.event-stream {
  max-height: 300px;
  overflow-y: auto;
  background-color: var(--bg-secondary);
  border: 1px solid var(--border);
  border-radius: var(--radius);
  padding: 8px;
  font-size: var(--font-size-sm);
  font-family: var(--font-mono);
}

.event-row {
  padding: 4px 8px;
  border-bottom: 1px solid var(--border-light);
  white-space: pre-wrap;
  word-break: break-word;
}

.event-row:last-child {
  border-bottom: none;
}

.event-level-error { color: var(--error); }
.event-level-warn { color: var(--warning); }
.event-level-info { color: var(--text-secondary); }

/* ── Model Metrics Summary ─────────────────────────────── */
.metrics-summary {
  display: flex;
  gap: 24px;
  margin-bottom: 12px;
  font-size: var(--font-size-sm);
  color: var(--text-secondary);
}

.metrics-summary .metric {
  display: flex;
  gap: 6px;
}

.metrics-summary .metric-value {
  font-weight: 600;
  color: var(--text-primary);
}
```
|
||||
|
||||
**Keep the formatUptime helper** — it already exists and works perfectly.
|
||||
|
||||
**Avoid:** Do NOT add animations or transitions. Do NOT import external libraries. Do NOT rebuild whole sections from template literals via innerHTML on the fast-update path; update specific elements with targeted textContent (or narrowly scoped innerHTML) to avoid flicker.
|
||||
</action>
|
||||
<verify>
|
||||
`pnpm typecheck` — no type errors (vanilla JS won't affect this, but ensures no TS regressions).
|
||||
`pnpm build` — builds successfully (UI files are served as static assets, not compiled).
|
||||
Manual check: Open `src/gateway/ui/pages/dashboard.js` and verify it:
|
||||
- Calls `client.call('system.metrics')`
|
||||
- Calls `client.call('system.events')`
|
||||
- Calls `client.call('system.activeRequests')`
|
||||
- Has 3-second and 10-second refresh timers
|
||||
- Has `teardown()` that cleans up both timers
|
||||
</verify>
|
||||
<done>
|
||||
Dashboard page shows five sections: core counters, model performance table, event stream, active requests, and channels.
|
||||
Counters and events refresh every 3 seconds.
|
||||
Health and channels refresh every 10 seconds.
|
||||
Event stream auto-scrolls and is color-coded by level.
|
||||
Active requests section shows in-flight requests or "no active requests" message.
|
||||
All existing stat-card and table CSS reused; new event-stream CSS added.
|
||||
</done>
|
||||
</task>
|
||||
|
||||
<task type="checkpoint:human-verify" gate="blocking">
|
||||
<name>Task 2: Verify live dashboard in browser</name>
|
||||
<files>src/gateway/ui/pages/dashboard.js</files>
|
||||
<action>
|
||||
Human verification of the live dashboard. What was built:
|
||||
- Live ops dashboard with real-time metrics, event stream, model performance table, active request tracking, and HTTP /health endpoint
|
||||
- Extended the existing vanilla JS dashboard (no framework replacement)
|
||||
|
||||
Steps to verify:
|
||||
1. Start Flynn: `pnpm dev`
|
||||
2. Open the dashboard in a browser (default: http://localhost:3100 or configured port)
|
||||
3. Verify the dashboard shows:
|
||||
- Core counters row: Messages Processed, Active Sessions, Queue Depth, Uptime, Active Requests, Errors
|
||||
- Model Performance section: table of recent model calls (may be empty if no messages sent yet)
|
||||
- Event Stream section: scrollable log (may show startup events)
|
||||
- Active Requests section: "No active requests" or table
|
||||
- Channels section: connected channel adapters
|
||||
4. Send a message through the chat page (or via a connected channel) and verify:
|
||||
- Messages Processed counter increments within 3 seconds
|
||||
- Model Performance table shows the new call with latency and tokens/sec
|
||||
- Event stream shows relevant entries
|
||||
5. Trigger an error (e.g., send a message that causes a tool error) and verify it appears in the event stream in red
|
||||
6. Test HTTP /health: `curl http://localhost:3100/health` — should return JSON with status, uptime, version
|
||||
7. Run `pnpm test:run` — all tests pass
|
||||
|
||||
Resume signal: Type "approved" or describe issues.
|
||||
</action>
|
||||
<verify>Human confirms dashboard displays correctly and updates in real-time.</verify>
|
||||
<done>Dashboard visually confirmed working with live-updating metrics, event stream, and model performance data.</done>
|
||||
</task>
|
||||
|
||||
</tasks>
|
||||
|
||||
<verification>
|
||||
1. Dashboard loads without errors in browser console
|
||||
2. All five sections render with real data
|
||||
3. Counters update within 3 seconds of events occurring
|
||||
4. Event stream is scrollable and color-coded
|
||||
5. `curl /health` returns valid JSON
|
||||
6. `pnpm test:run` — all tests pass
|
||||
7. `pnpm typecheck` — zero type errors
|
||||
</verification>
|
||||
|
||||
<success_criteria>
|
||||
- Dashboard shows live-updating counters that change as messages flow (DASH-01)
|
||||
- Model call metrics visible with latency and tokens/sec (DASH-02)
|
||||
- Event stream shows errors with timestamps and context (DASH-03)
|
||||
- Active requests tracked and displayed (DASH-04)
|
||||
- GET /health returns JSON status (DASH-05)
|
||||
- Existing dashboard pages (chat, sessions, usage, settings) unaffected
|
||||
- Zero test regressions
|
||||
</success_criteria>
|
||||
|
||||
<output>
|
||||
After completion, create `.planning/phases/03-live-ops-dashboard/03-02-SUMMARY.md`
|
||||
</output>
|
||||