---
phase: 03-live-ops-dashboard
plan: 02
type: execute
wave: 2
depends_on: ["03-01"]
files_modified:
  - src/gateway/ui/pages/dashboard.js
  - src/gateway/ui/style.css
  - src/gateway/ui/index.html
  - src/gateway/ui/lib/ws-client.js
autonomous: false

must_haves:
  truths:
    - "Dashboard shows live-updating counters: messages processed, active sessions, queue depth, daemon uptime; values change in real time"
    - "Dashboard shows model call metrics: per-call latency, tokens/sec throughput, error rates by provider"
    - "Dashboard shows live event stream: scrollable log of errors and events with timestamps, auto-scrolls on new entries"
    - "Dashboard shows active request tracking: in-flight requests with duration and session info"
    - "Dashboard auto-refreshes every 3 seconds for counters and events, maintaining live feel"
  artifacts:
    - path: "src/gateway/ui/pages/dashboard.js"
      provides: "Enhanced dashboard page with metrics, events, and active request sections"
      min_lines: 200
    - path: "src/gateway/ui/style.css"
      provides: "New CSS classes for event stream, metrics cards, active requests table"
      contains: "event-stream"
    - path: "src/gateway/ui/index.html"
      provides: "Unchanged structure (dashboard page already registered)"
    - path: "src/gateway/ui/lib/ws-client.js"
      provides: "No changes needed (call() method already supports the new RPC methods)"
  key_links:
    - from: "src/gateway/ui/pages/dashboard.js"
      to: "system.metrics"
      via: "client.call('system.metrics')"
      pattern: "client\\.call.*system\\.metrics"
    - from: "src/gateway/ui/pages/dashboard.js"
      to: "system.events"
      via: "client.call('system.events')"
      pattern: "client\\.call.*system\\.events"
    - from: "src/gateway/ui/pages/dashboard.js"
      to: "system.activeRequests"
      via: "client.call('system.activeRequests')"
      pattern: "client\\.call.*system\\.activeRequests"
---

Extend the existing vanilla JS dashboard with live ops sections: core counters, model call metrics, event stream, and active request tracking.

Purpose: This is the user-facing deliverable. The operator opens the dashboard and sees real-time system health without tailing logs. All data comes from the RPC handlers created in Plan 01.

Output: Enhanced dashboard.js with four new sections, supporting CSS, human-verified live dashboard.

@/home/will/.config/opencode/get-shit-done/workflows/execute-plan.md
@/home/will/.config/opencode/get-shit-done/templates/summary.md

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/03-live-ops-dashboard/03-01-SUMMARY.md
@src/gateway/ui/pages/dashboard.js
@src/gateway/ui/style.css
@src/gateway/ui/index.html
@src/gateway/ui/app.js
@src/gateway/ui/lib/ws-client.js

Task 1: Extend dashboard page with live ops sections

src/gateway/ui/pages/dashboard.js
src/gateway/ui/style.css

**IMPORTANT: Extend the existing vanilla JS dashboard. Do NOT replace it with React or any framework. This is a locked user decision.**

Rewrite `src/gateway/ui/pages/dashboard.js` to show five sections: four new ones plus the existing channels grid. Together they replace the current simple health/channels/usage layout.

**Section 1: Core Counters (top row of stat cards)**
- Messages Processed (from `system.metrics` → messagesProcessed)
- Active Sessions (from `system.health` → sessions)
- Queue Depth (from `system.metrics` → queueDepth)
- Daemon Uptime (from `system.metrics` → uptime, formatted as "Xd Xh Xm Xs")
- Active Requests (from `system.metrics` → activeRequests)
- Errors (from `system.metrics` → errors, colored red if > 0)

Use the existing `.stats-grid` and `.stat-card` CSS classes.

**Section 2: Model Performance (table of recent model calls)**
- Show the most recent 20 model calls from `system.metrics` → modelCalls.recentCalls
- Table columns: Time (relative, e.g. "3s ago"), Provider, Latency (ms), Tokens/sec, In/Out tokens, Status (✓ or ✗)
- Summary row above the table: Total calls, Avg latency, Error rate %
- Use existing table CSS classes

**Section 3: Event Stream (scrollable log)**
- Fetch from `system.events` with `{ limit: 50 }`
- Each event rendered as a row: `[HH:MM:SS] [LEVEL] source: message`
- Color-code: error=red, warn=yellow, info=default
- Container has max-height with overflow-y: auto and auto-scrolls to bottom on new entries
- New class `.event-stream` for the container, `.event-row` for each entry, `.event-level-error`, `.event-level-warn`, `.event-level-info` for coloring

**Section 4: Active Requests (table, only shown when requests in flight)**
- Fetch from `system.activeRequests`
- Table columns: Session, Channel, Duration (live-updating), Started
- If no active requests, show "No active requests" muted text
- Use existing table CSS

**Section 5: Channels (keep existing)**
- Keep the existing channels grid showing connected/disconnected channel adapters

**Refresh strategy:**
- Replace the current 10-second interval with a 3-second interval for the core data (system.metrics, system.events, system.activeRequests)
- Fetch system.health and system.channels every 10 seconds (less dynamic data)
- Use `Promise.all` to batch the frequent calls together
- Keep the existing `teardown()` pattern with `clearInterval`

**Implementation approach:**
- Keep the same module pattern: `loadDashboard(el, client)` function + `DashboardPage` export with `render`/`teardown`
- Use two timers: `_fastTimer` (3s) for metrics/events/requests, `_slowTimer` (10s) for health/channels
- On first render, fetch everything with `Promise.all`
- On subsequent fast ticks, only update the dynamic sections (don't re-render the whole page; use targeted DOM updates via `getElementById` for each section)
- Generate unique section IDs: `#ops-counters`, `#ops-model-table`, `#ops-events`, `#ops-requests`, `#ops-channels`

**CSS additions in `src/gateway/ui/style.css`:**

Add near the end of the file, immediately before the responsive section:

```css
/* ── Event Stream ──────────────────────────────────────── */
.event-stream {
  max-height: 300px;
  overflow-y: auto;
  background-color: var(--bg-secondary);
  border: 1px solid var(--border);
  border-radius: var(--radius);
  padding: 8px;
  font-size: var(--font-size-sm);
  font-family: var(--font-mono);
}

.event-row {
  padding: 4px 8px;
  border-bottom: 1px solid var(--border-light);
  white-space: pre-wrap;
  word-break: break-word;
}

.event-row:last-child {
  border-bottom: none;
}

.event-level-error { color: var(--error); }
.event-level-warn { color: var(--warning); }
.event-level-info { color: var(--text-secondary); }

/* ── Model Metrics Summary ─────────────────────────────── */
.metrics-summary {
  display: flex;
  gap: 24px;
  margin-bottom: 12px;
  font-size: var(--font-size-sm);
  color: var(--text-secondary);
}

.metrics-summary .metric {
  display: flex;
  gap: 6px;
}

.metrics-summary .metric-value {
  font-weight: 600;
  color: var(--text-primary);
}
```

**Keep the formatUptime helper.** It already exists and works as is.

**Avoid:** Do NOT add animations or transitions. Do NOT import external libraries. On the fast-update path, do NOT rebuild whole sections from template literals assigned to innerHTML; use targeted textContent/innerHTML updates on specific elements to avoid flicker.

`pnpm typecheck`: no type errors (vanilla JS won't affect this, but it ensures no TS regressions).
`pnpm build`: builds successfully (UI files are served as static assets, not compiled).
Manual check: Open `src/gateway/ui/pages/dashboard.js` and verify it:
- Calls `client.call('system.metrics')`
- Calls `client.call('system.events')`
- Calls `client.call('system.activeRequests')`
- Has 3-second and 10-second refresh timers
- Has `teardown()` that cleans up both timers

Dashboard page shows five sections: core counters, model performance table, event stream, active requests, and channels. Counters and events refresh every 3 seconds.
Health and channels refresh every 10 seconds. Event stream auto-scrolls and is color-coded by level. Active requests section shows in-flight requests or "no active requests" message. All existing stat-card and table CSS reused; new event-stream CSS added.

Task 2: Verify live dashboard in browser

src/gateway/ui/pages/dashboard.js

Human verification of the live dashboard.

What was built:
- Live ops dashboard with real-time metrics, event stream, model performance table, active request tracking, and HTTP /health endpoint
- Extended the existing vanilla JS dashboard (no framework replacement)

Steps to verify:
1. Start Flynn: `pnpm dev`
2. Open the dashboard in a browser (default: http://localhost:3100 or configured port)
3. Verify the dashboard shows:
   - Core counters row: Messages Processed, Active Sessions, Queue Depth, Uptime, Active Requests, Errors
   - Model Performance section: table of recent model calls (may be empty if no messages sent yet)
   - Event Stream section: scrollable log (may show startup events)
   - Active Requests section: "No active requests" or table
   - Channels section: connected channel adapters
4. Send a message through the chat page (or via a connected channel) and verify:
   - Messages Processed counter increments within 3 seconds
   - Model Performance table shows the new call with latency and tokens/sec
   - Event stream shows relevant entries
5. Trigger an error (e.g., send a message that causes a tool error) and verify it appears in the event stream in red
6. Test HTTP /health: `curl http://localhost:3100/health` should return JSON with status, uptime, version
7. Run `pnpm test:run`: all tests pass

Resume signal: Type "approved" or describe issues.

Human confirms dashboard displays correctly and updates in real-time.

Dashboard visually confirmed working with live-updating metrics, event stream, and model performance data.

1. Dashboard loads without errors in browser console
2. All five sections render with real data
3. Counters update within 3 seconds of events occurring
4. Event stream is scrollable and color-coded
5. `curl /health` returns valid JSON
6. `pnpm test:run`: all tests pass
7. `pnpm typecheck`: zero type errors

- Dashboard shows live-updating counters that change as messages flow (DASH-01)
- Model call metrics visible with latency and tokens/sec (DASH-02)
- Event stream shows errors with timestamps and context (DASH-03)
- Active requests tracked and displayed (DASH-04)
- GET /health returns JSON status (DASH-05)
- Existing dashboard pages (chat, sessions, usage, settings) unaffected
- Zero test regressions

After completion, create `.planning/phases/03-live-ops-dashboard/03-02-SUMMARY.md`
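
Illustrative sketch for the Task 1 refresh strategy: the two-timer pattern (3s fast, 10s slow) with a teardown that clears both intervals might look roughly like the following. The RPC method names come from this plan. The function name `startPolling` and the `onFast`/`onSlow`/`onError` callbacks are hypothetical, and `client.call(method, params)` is assumed to return a Promise. The real `loadDashboard`/`DashboardPage` module may be organized differently.

```javascript
// Hypothetical helper: starts the fast (3s) and slow (10s) refresh loops
// and returns a teardown function. Assumes client.call() returns a Promise.
function startPolling(client, { onFast, onSlow, onError = () => {} }) {
  // Fast path: batch the three frequently changing RPC calls together.
  const fetchFast = () =>
    Promise.all([
      client.call('system.metrics'),
      client.call('system.events', { limit: 50 }),
      client.call('system.activeRequests'),
    ])
      .then(([metrics, events, requests]) => onFast({ metrics, events, requests }))
      .catch(onError);

  // Slow path: health and channels change less often.
  const fetchSlow = () =>
    Promise.all([client.call('system.health'), client.call('system.channels')])
      .then(([health, channels]) => onSlow({ health, channels }))
      .catch(onError);

  // First render: fetch everything at once.
  Promise.all([fetchFast(), fetchSlow()]);

  const fastTimer = setInterval(fetchFast, 3000);
  const slowTimer = setInterval(fetchSlow, 10000);

  // Teardown clears both timers, mirroring the existing clearInterval pattern.
  return function teardown() {
    clearInterval(fastTimer);
    clearInterval(slowTimer);
  };
}
```

In this sketch, `onFast` would apply targeted updates to `#ops-counters`, `#ops-model-table`, `#ops-events`, and `#ops-requests`, while `onSlow` would touch only the Active Sessions card and `#ops-channels`. Calling the returned function from `teardown()` stops both loops.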
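
Illustrative sketch for Section 3 of Task 1: one possible way to produce the `[HH:MM:SS] [LEVEL] source: message` rows, pick the level class, and auto-scroll the container. The event shape `{ ts, level, source, message }` with `ts` in epoch milliseconds is an assumption, since this plan does not specify what `system.events` returns. All function names here are hypothetical. The clock is formatted in UTC to keep the example deterministic; the real dashboard may prefer local time.

```javascript
// Zero-pad a number to two digits.
function pad2(n) {
  return String(n).padStart(2, '0');
}

// Format an epoch-millisecond timestamp as HH:MM:SS (UTC in this sketch).
function formatClock(ts) {
  const d = new Date(ts);
  return `${pad2(d.getUTCHours())}:${pad2(d.getUTCMinutes())}:${pad2(d.getUTCSeconds())}`;
}

// Map a level to one of the three CSS classes named in the plan.
// Unknown levels fall back to the info style.
function levelClass(level) {
  const l = String(level || '').toLowerCase();
  return `event-level-${['error', 'warn', 'info'].includes(l) ? l : 'info'}`;
}

// Build the text for one row: [HH:MM:SS] [LEVEL] source: message
function formatEventRow(ev) {
  return `[${formatClock(ev.ts)}] [${String(ev.level).toUpperCase()}] ${ev.source}: ${ev.message}`;
}

// Re-render the event stream container and scroll it to the newest entry.
// textContent (not innerHTML) is used so event messages are never parsed as HTML.
function renderEvents(container, events, doc = document) {
  container.textContent = ''; // drop the previous rows
  for (const ev of events) {
    const row = doc.createElement('div');
    row.className = `event-row ${levelClass(ev.level)}`;
    row.textContent = formatEventRow(ev);
    container.appendChild(row);
  }
  container.scrollTop = container.scrollHeight; // auto-scroll to bottom
}
```

A fast tick could then call `renderEvents(document.getElementById('ops-events'), events)` with the result of `system.events`.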