flynn/.planning/phases/03-live-ops-dashboard/03-02-PLAN.md
2026-02-09 21:10:03 -08:00


---
phase: 03-live-ops-dashboard
plan: 02
type: execute
wave: 2
depends_on:
  - 03-01
files_modified:
  - src/gateway/ui/pages/dashboard.js
  - src/gateway/ui/style.css
  - src/gateway/ui/index.html
  - src/gateway/ui/lib/ws-client.js
autonomous: false
must_haves:
  truths:
    - "Dashboard shows live-updating counters: messages processed, active sessions, queue depth, daemon uptime; values change in real time"
    - "Dashboard shows model call metrics: per-call latency, tokens/sec throughput, error rates by provider"
    - "Dashboard shows live event stream: scrollable log of errors and events with timestamps, auto-scrolls on new entries"
    - "Dashboard shows active request tracking: in-flight requests with duration and session info"
    - "Dashboard auto-refreshes every 3 seconds for counters and events, maintaining live feel"
  artifacts:
    - path: src/gateway/ui/pages/dashboard.js
      provides: "Enhanced dashboard page with metrics, events, and active request sections"
      min_lines: 200
    - path: src/gateway/ui/style.css
      provides: "New CSS classes for event stream, metrics cards, active requests table"
      contains: event-stream
    - path: src/gateway/ui/index.html
      provides: "Unchanged structure (dashboard page already registered)"
    - path: src/gateway/ui/lib/ws-client.js
      provides: "No changes needed (call() method already supports the new RPC methods)"
  key_links:
    - from: src/gateway/ui/pages/dashboard.js
      to: system.metrics
      via: "client.call('system.metrics')"
      pattern: "client.call.*system.metrics"
    - from: src/gateway/ui/pages/dashboard.js
      to: system.events
      via: "client.call('system.events')"
      pattern: "client.call.*system.events"
    - from: src/gateway/ui/pages/dashboard.js
      to: system.activeRequests
      via: "client.call('system.activeRequests')"
      pattern: "client.call.*system.activeRequests"
---

Extend the existing vanilla JS dashboard with live ops sections: core counters, model call metrics, event stream, and active request tracking.

Purpose: This is the user-facing deliverable — the operator opens the dashboard and sees real-time system health without tailing logs. All data comes from the RPC handlers created in Plan 01.

Output: Enhanced dashboard.js with four new sections, supporting CSS, human-verified live dashboard.

<execution_context> @/home/will/.config/opencode/get-shit-done/workflows/execute-plan.md @/home/will/.config/opencode/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/03-live-ops-dashboard/03-01-SUMMARY.md @src/gateway/ui/pages/dashboard.js @src/gateway/ui/style.css @src/gateway/ui/index.html @src/gateway/ui/app.js @src/gateway/ui/lib/ws-client.js

Task 1: Extend dashboard page with live ops sections

Files: src/gateway/ui/pages/dashboard.js, src/gateway/ui/style.css

**IMPORTANT: Extend the existing vanilla JS dashboard; do NOT replace it with React or any framework. This is a locked user decision.**

Rewrite src/gateway/ui/pages/dashboard.js to show the five sections below. Sections 1 through 4 are new and replace the current simple health/channels/usage layout; Section 5 carries over the existing channels grid.

Section 1: Core Counters (top row of stat cards)

  • Messages Processed (from system.metrics → messagesProcessed)
  • Active Sessions (from system.health → sessions; this value updates on the 10-second tick, since system.health is not part of the 3-second batch)
  • Queue Depth (from system.metrics → queueDepth)
  • Daemon Uptime (from system.metrics → uptime, formatted as "Xd Xh Xm Xs")
  • Active Requests (from system.metrics → activeRequests)
  • Errors (from system.metrics → errors, colored red if > 0)

Use the existing .stats-grid and .stat-card CSS classes.

Section 2: Model Performance (table of recent model calls)

  • Show the most recent 20 model calls from system.metrics → modelCalls.recentCalls
  • Table columns: Time (relative, e.g. "3s ago"), Provider, Latency (ms), Tokens/sec, In/Out tokens, Status (✓ or ✗)
  • Summary row above the table: Total calls, Avg latency, Error rate %
  • Use existing table CSS classes
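
The relative-time column and the summary row can be sketched as two small pure helpers. This is a minimal sketch, not the Plan 01 contract: the entry fields `timestamp`, `latencyMs`, and `ok` are assumed names for illustration, and if system.metrics already returns aggregate totals, those should be preferred over recomputing from recentCalls.

```javascript
// Hypothetical helpers; field names (timestamp, latencyMs, ok) are assumptions.

// Formats an epoch-millisecond timestamp as "3s ago", "2m ago", or "1h ago".
function relativeTime(timestampMs, nowMs = Date.now()) {
  const seconds = Math.max(0, Math.floor((nowMs - timestampMs) / 1000));
  if (seconds < 60) return `${seconds}s ago`;
  const minutes = Math.floor(seconds / 60);
  if (minutes < 60) return `${minutes}m ago`;
  return `${Math.floor(minutes / 60)}h ago`;
}

// Computes the summary row: total calls, average latency, error rate in percent.
function summarizeCalls(calls) {
  const total = calls.length;
  if (total === 0) return { total: 0, avgLatencyMs: 0, errorRatePct: 0 };
  const latencySum = calls.reduce((sum, call) => sum + call.latencyMs, 0);
  const errors = calls.filter((call) => !call.ok).length;
  return {
    total,
    avgLatencyMs: Math.round(latencySum / total),
    // One decimal place, e.g. 12.5
    errorRatePct: Math.round((errors / total) * 1000) / 10,
  };
}
```

Keeping these helpers free of DOM access makes them easy to check in isolation before wiring them into the table renderer.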

Section 3: Event Stream (scrollable log)

  • Fetch from system.events with { limit: 50 }
  • Each event rendered as a row: [HH:MM:SS] [LEVEL] source: message
  • Color-code: error=red, warn=yellow, info=default
  • Container has max-height with overflow-y: auto and auto-scrolls to bottom on new entries
  • New class .event-stream for the container, .event-row for each entry, .event-level-error, .event-level-warn, .event-level-info for coloring
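
One way to render the rows described above is sketched here. The event fields `timestamp` (epoch milliseconds), `level`, `source`, and `message` are assumed for illustration; the actual shape is whatever the system.events handler from Plan 01 returns. The helper names are hypothetical.

```javascript
// Hypothetical helpers; the event shape is an assumption.

// Formats an epoch-millisecond timestamp as local HH:MM:SS.
function formatClock(timestampMs) {
  const d = new Date(timestampMs);
  const pad = (n) => String(n).padStart(2, '0');
  return `${pad(d.getHours())}:${pad(d.getMinutes())}:${pad(d.getSeconds())}`;
}

// Maps a level to one of the three CSS classes; unknown levels fall back to info.
function eventLevelClass(level) {
  const known = ['error', 'warn', 'info'];
  return `event-level-${known.includes(level) ? level : 'info'}`;
}

// Builds the "[HH:MM:SS] [LEVEL] source: message" line.
function formatEventLine(event) {
  const level = String(event.level).toUpperCase();
  return `[${formatClock(event.timestamp)}] [${level}] ${event.source}: ${event.message}`;
}

// Appends rows to the .event-stream container and scrolls to the newest entry.
function appendEvents(container, events) {
  for (const event of events) {
    const row = document.createElement('div');
    row.className = `event-row ${eventLevelClass(event.level)}`;
    row.textContent = formatEventLine(event); // textContent avoids HTML injection
    container.appendChild(row);
  }
  container.scrollTop = container.scrollHeight;
}
```

Setting `textContent` rather than `innerHTML` keeps log messages from being interpreted as markup, which matters because event messages may contain arbitrary error text.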

Section 4: Active Requests (table of in-flight requests, with a placeholder when there are none)

  • Fetch from system.activeRequests
  • Table columns: Session, Channel, Duration (live-updating), Started
  • If no active requests, show "No active requests" muted text
  • Use existing table CSS
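
The live-updating Duration column can be derived on each tick from a start timestamp, as in the sketch below. The request fields `sessionId`, `channel`, and `startedAt` (epoch milliseconds) are assumptions for illustration and may differ from what system.activeRequests actually returns.

```javascript
// Hypothetical helpers; the request shape is an assumption.

// Formats elapsed milliseconds as "7s" or "1m 5s".
function formatDuration(ms) {
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
}

// Converts raw requests into display rows matching the four table columns.
function requestRows(requests, nowMs = Date.now()) {
  return requests.map((request) => ({
    session: request.sessionId,
    channel: request.channel,
    duration: formatDuration(nowMs - request.startedAt),
    started: new Date(request.startedAt).toLocaleTimeString(),
  }));
}
```

An empty array from `requestRows` is the signal to show the "No active requests" text instead of the table.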

Section 5: Channels (keep existing)

  • Keep the existing channels grid showing connected/disconnected channel adapters

Refresh strategy:

  • Replace the current 10-second interval with a 3-second interval for the core data (system.metrics, system.events, system.activeRequests)
  • Fetch system.health and system.channels every 10 seconds (less dynamic data)
  • Use Promise.all to batch the frequent calls together
  • Keep the existing teardown() pattern with clearInterval

Implementation approach:

  • Keep the same module pattern: loadDashboard(el, client) function + DashboardPage export with render/teardown
  • Use two timers: _fastTimer (3s) for metrics/events/requests, _slowTimer (10s) for health/channels
  • On first render, fetch everything with Promise.all
  • On subsequent fast ticks, only update the dynamic sections (don't re-render the whole page — use targeted DOM updates via getElementById for each section)
  • Generate unique section IDs: #ops-counters, #ops-model-table, #ops-events, #ops-requests, #ops-channels
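
The two-timer approach above can be sketched as follows. This is an illustration under stated assumptions, not the final dashboard.js: `startPolling`, the `handlers` callbacks, and the injectable `timers` argument are hypothetical names, and only the RPC method names and intervals come from this plan.

```javascript
// Sketch only: startPolling, handlers, and timers are hypothetical names.
const FAST_MS = 3000;  // metrics, events, active requests
const SLOW_MS = 10000; // health, channels

function startPolling(client, handlers, timers = globalThis) {
  // Fast tick: batch the three frequently changing RPC calls together.
  const fastTick = async () => {
    try {
      const [metrics, events, requests] = await Promise.all([
        client.call('system.metrics'),
        client.call('system.events', { limit: 50 }),
        client.call('system.activeRequests'),
      ]);
      handlers.onFast({ metrics, events, requests });
    } catch (err) {
      handlers.onError(err);
    }
  };

  // Slow tick: less dynamic data.
  const slowTick = async () => {
    try {
      const [health, channels] = await Promise.all([
        client.call('system.health'),
        client.call('system.channels'),
      ]);
      handlers.onSlow({ health, channels });
    } catch (err) {
      handlers.onError(err);
    }
  };

  // First render: fetch everything at once.
  const ready = Promise.all([fastTick(), slowTick()]);

  const fastTimer = timers.setInterval(fastTick, FAST_MS);
  const slowTimer = timers.setInterval(slowTick, SLOW_MS);

  return {
    ready,
    // Call from teardown() so no timer outlives the page.
    stop() {
      timers.clearInterval(fastTimer);
      timers.clearInterval(slowTimer);
    },
  };
}
```

In this sketch, `handlers.onFast` and `handlers.onSlow` would perform the targeted updates on #ops-counters, #ops-model-table, #ops-events, #ops-requests, and #ops-channels, and DashboardPage.teardown() would call `stop()`. Passing `timers` as a parameter is only there to make the timer wiring checkable without a browser.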

CSS additions in src/gateway/ui/style.css: Add these rules near the end of the file, immediately before the responsive section:

/* ── Event Stream ──────────────────────────────────────── */
.event-stream {
  max-height: 300px;
  overflow-y: auto;
  background-color: var(--bg-secondary);
  border: 1px solid var(--border);
  border-radius: var(--radius);
  padding: 8px;
  font-size: var(--font-size-sm);
  font-family: var(--font-mono);
}

.event-row {
  padding: 4px 8px;
  border-bottom: 1px solid var(--border-light);
  white-space: pre-wrap;
  word-break: break-word;
}

.event-row:last-child {
  border-bottom: none;
}

.event-level-error { color: var(--error); }
.event-level-warn { color: var(--warning); }
.event-level-info { color: var(--text-secondary); }

/* ── Model Metrics Summary ─────────────────────────────── */
.metrics-summary {
  display: flex;
  gap: 24px;
  margin-bottom: 12px;
  font-size: var(--font-size-sm);
  color: var(--text-secondary);
}

.metrics-summary .metric {
  display: flex;
  gap: 6px;
}

.metrics-summary .metric-value {
  font-weight: 600;
  color: var(--text-primary);
}

Keep the existing formatUptime helper unchanged; it already works.

Avoid: Do NOT add animations or transitions. Do NOT import external libraries. Do NOT rebuild whole sections from template literals assigned to innerHTML on the fast-update path; make targeted textContent/innerHTML updates on specific elements to avoid flicker.

Verification:

  • pnpm typecheck: no type errors (vanilla JS won't affect this, but it guards against TS regressions).
  • pnpm build: builds successfully (UI files are served as static assets, not compiled).
  • Manual check: open src/gateway/ui/pages/dashboard.js and verify that it:
    • Calls client.call('system.metrics')
    • Calls client.call('system.events')
    • Calls client.call('system.activeRequests')
    • Has 3-second and 10-second refresh timers
    • Has teardown() that cleans up both timers

Done when: Dashboard page shows five sections: core counters, model performance table, event stream, active requests, and channels. Counters and events refresh every 3 seconds. Health and channels refresh every 10 seconds. Event stream auto-scrolls and is color-coded by level. Active requests section shows in-flight requests or a "No active requests" message. All existing stat-card and table CSS is reused; new event-stream CSS is added.

Task 2: Verify live dashboard in browser

Files: src/gateway/ui/pages/dashboard.js

Human verification of the live dashboard.

What was built:

  • Live ops dashboard with real-time metrics, event stream, model performance table, active request tracking, and HTTP /health endpoint
  • Extended the existing vanilla JS dashboard (no framework replacement)

Steps to verify:

  1. Start Flynn: pnpm dev
  2. Open the dashboard in a browser (default: http://localhost:3100 or configured port)
  3. Verify the dashboard shows:
    • Core counters row: Messages Processed, Active Sessions, Queue Depth, Uptime, Active Requests, Errors
    • Model Performance section: table of recent model calls (may be empty if no messages sent yet)
    • Event Stream section: scrollable log (may show startup events)
    • Active Requests section: "No active requests" or table
    • Channels section: connected channel adapters
  4. Send a message through the chat page (or via a connected channel) and verify:
    • Messages Processed counter increments within 3 seconds
    • Model Performance table shows the new call with latency and tokens/sec
    • Event stream shows relevant entries
  5. Trigger an error (e.g., send a message that causes a tool error) and verify it appears in the event stream in red
  6. Test HTTP /health: curl http://localhost:3100/health — should return JSON with status, uptime, version
  7. Run pnpm test:run — all tests pass

Resume signal: Type "approved" or describe issues.

Verification: Human confirms the dashboard displays correctly and updates in real time.

Done when: Dashboard visually confirmed working with live-updating metrics, event stream, and model performance data.

  1. Dashboard loads without errors in browser console
  2. All five sections render with real data
  3. Counters update within 3 seconds of events occurring
  4. Event stream is scrollable and color-coded
  5. `curl /health` returns valid JSON
  6. `pnpm test:run`: all tests pass
  7. `pnpm typecheck`: zero type errors

<success_criteria>

  • Dashboard shows live-updating counters that change as messages flow (DASH-01)
  • Model call metrics visible with latency and tokens/sec (DASH-02)
  • Event stream shows errors with timestamps and context (DASH-03)
  • Active requests tracked and displayed (DASH-04)
  • GET /health returns JSON status (DASH-05)
  • Existing dashboard pages (chat, sessions, usage, settings) unaffected
  • Zero test regressions

</success_criteria>

After completion, create `.planning/phases/03-live-ops-dashboard/03-02-SUMMARY.md`