---
phase: 03-live-ops-dashboard
plan: 02
type: execute
wave: 2
depends_on: ["03-01"]
files_modified:
  - src/gateway/ui/pages/dashboard.js
  - src/gateway/ui/style.css
  - src/gateway/ui/index.html
  - src/gateway/ui/lib/ws-client.js
autonomous: false

must_haves:
  truths:
    - "Dashboard shows live-updating counters: messages processed, active sessions, queue depth, daemon uptime; values change in real time"
    - "Dashboard shows model call metrics: per-call latency, tokens/sec throughput, error rates by provider"
    - "Dashboard shows live event stream: scrollable log of errors and events with timestamps, auto-scrolls on new entries"
    - "Dashboard shows active request tracking: in-flight requests with duration and session info"
    - "Dashboard auto-refreshes every 3 seconds for counters and events, maintaining live feel"
  artifacts:
    - path: "src/gateway/ui/pages/dashboard.js"
      provides: "Enhanced dashboard page with metrics, events, and active request sections"
      min_lines: 200
    - path: "src/gateway/ui/style.css"
      provides: "New CSS classes for event stream, metrics cards, active requests table"
      contains: "event-stream"
    - path: "src/gateway/ui/index.html"
      provides: "Unchanged structure (dashboard page already registered)"
    - path: "src/gateway/ui/lib/ws-client.js"
      provides: "No changes needed (call() method already supports the new RPC methods)"
  key_links:
    - from: "src/gateway/ui/pages/dashboard.js"
      to: "system.metrics"
      via: "client.call('system.metrics')"
      pattern: "client\\.call.*system\\.metrics"
    - from: "src/gateway/ui/pages/dashboard.js"
      to: "system.events"
      via: "client.call('system.events')"
      pattern: "client\\.call.*system\\.events"
    - from: "src/gateway/ui/pages/dashboard.js"
      to: "system.activeRequests"
      via: "client.call('system.activeRequests')"
      pattern: "client\\.call.*system\\.activeRequests"
---

<objective>
Extend the existing vanilla JS dashboard with live ops sections: core counters, model call metrics, event stream, and active request tracking.

Purpose: This is the user-facing deliverable. The operator opens the dashboard and sees real-time system health without tailing logs. All data comes from the RPC handlers created in Plan 01.

Output: Enhanced dashboard.js with four new sections, supporting CSS, and a human-verified live dashboard.
</objective>

<execution_context>
@/home/will/.config/opencode/get-shit-done/workflows/execute-plan.md
@/home/will/.config/opencode/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/03-live-ops-dashboard/03-01-SUMMARY.md
@src/gateway/ui/pages/dashboard.js
@src/gateway/ui/style.css
@src/gateway/ui/index.html
@src/gateway/ui/app.js
@src/gateway/ui/lib/ws-client.js
</context>

<tasks>

<task type="auto">
<name>Task 1: Extend dashboard page with live ops sections</name>
<files>
src/gateway/ui/pages/dashboard.js
src/gateway/ui/style.css
</files>
<action>
**IMPORTANT: Extend the existing vanilla JS dashboard; do NOT replace it with React or any framework. This is a locked user decision.**

Rewrite `src/gateway/ui/pages/dashboard.js` to show four new live ops sections plus the existing channels grid (replacing the current simple health/channels/usage layout):

**Section 1: Core Counters (top row of stat cards)**
- Messages Processed (from `system.metrics` → messagesProcessed)
- Active Sessions (from `system.health` → sessions)
- Queue Depth (from `system.metrics` → queueDepth)
- Daemon Uptime (from `system.metrics` → uptime, formatted as "Xd Xh Xm Xs")
- Active Requests (from `system.metrics` → activeRequests)
- Errors (from `system.metrics` → errors, colored red if > 0)

Use the existing `.stats-grid` and `.stat-card` CSS classes.
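
For reference only, the "Xd Xh Xm Xs" uptime format could be produced roughly as follows. This is a sketch, not the existing `formatUptime` helper: the name `formatUptimeSketch` is hypothetical, and it assumes `uptime` arrives as a number of seconds, which the plan does not state.

```javascript
// Hypothetical stand-in for illustration; the real formatUptime helper stays as-is.
// Assumption: the uptime value is expressed in seconds.
function formatUptimeSketch(totalSeconds) {
  const s = Math.max(0, Math.floor(totalSeconds)); // clamp negatives, drop fractions
  const days = Math.floor(s / 86400);
  const hours = Math.floor((s % 86400) / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  return `${days}d ${hours}h ${minutes}m ${seconds}s`;
}
```

If the daemon reports uptime in milliseconds instead, divide by 1000 before formatting.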

**Section 2: Model Performance (table of recent model calls)**
- Show the most recent 20 model calls from `system.metrics` → modelCalls.recentCalls
- Table columns: Time (relative, e.g. "3s ago"), Provider, Latency (ms), Tokens/sec, In/Out tokens, Status (✓ or ✗)
- Summary row above the table: Total calls, Avg latency, Error rate %
- Use existing table CSS classes
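
The relative Time column and the summary row could be derived along these lines. This is a minimal sketch under assumptions: the field names `timestamp` (milliseconds since epoch), `latencyMs`, and `ok` are hypothetical, and the summary is computed client-side only in case the Plan 01 handler does not already return aggregates.

```javascript
// Assumed record shape (not confirmed by the plan):
// { timestamp: <ms since epoch>, latencyMs: <number>, ok: <boolean> }

// "3s ago"-style label for the Time column.
function relativeTime(timestampMs, nowMs = Date.now()) {
  const seconds = Math.max(0, Math.round((nowMs - timestampMs) / 1000));
  if (seconds < 60) return `${seconds}s ago`;
  const minutes = Math.floor(seconds / 60);
  if (minutes < 60) return `${minutes}m ago`;
  return `${Math.floor(minutes / 60)}h ago`;
}

// Totals for the summary row above the table.
function summarizeCalls(calls) {
  const total = calls.length;
  if (total === 0) return { total: 0, avgLatencyMs: 0, errorRatePct: 0 };
  const latencySum = calls.reduce((sum, c) => sum + c.latencyMs, 0);
  const failures = calls.filter((c) => !c.ok).length;
  return {
    total,
    avgLatencyMs: Math.round(latencySum / total),
    // One decimal place, e.g. 33.3
    errorRatePct: Math.round((failures / total) * 1000) / 10,
  };
}
```

Returning zeros for an empty list avoids a division by zero before any message has been sent.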

**Section 3: Event Stream (scrollable log)**
- Fetch from `system.events` with `{ limit: 50 }`
- Each event rendered as a row: `[HH:MM:SS] [LEVEL] source: message`
- Color-code: error=red, warn=yellow, info=default
- Container has max-height with overflow-y: auto and auto-scrolls to bottom on new entries
- New class `.event-stream` for the container, `.event-row` for each entry, `.event-level-error`, `.event-level-warn`, `.event-level-info` for coloring
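
One way to build each row and keep the log pinned to the newest entry is sketched below. The event shape `{ timestamp, level, source, message }` is an assumption about what `system.events` returns, and the function names are hypothetical.

```javascript
// Assumed event shape (hypothetical): { timestamp, level, source, message }
const LEVEL_CLASSES = new Map([
  ["error", "event-level-error"],
  ["warn", "event-level-warn"],
  ["info", "event-level-info"],
]);

function pad2(n) {
  return String(n).padStart(2, "0");
}

// Builds "[HH:MM:SS] [LEVEL] source: message" using local time.
function formatEventText(event) {
  const d = new Date(event.timestamp);
  const time = `${pad2(d.getHours())}:${pad2(d.getMinutes())}:${pad2(d.getSeconds())}`;
  const level = String(event.level).toUpperCase();
  return `[${time}] [${level}] ${event.source}: ${event.message}`;
}

// Unknown levels fall back to the info styling.
function eventClassName(event) {
  const levelClass = LEVEL_CLASSES.get(event.level) || "event-level-info";
  return `event-row ${levelClass}`;
}

// Pins the scroll position to the bottom after new rows are appended.
function scrollToBottom(container) {
  container.scrollTop = container.scrollHeight;
}
```

Assigning the formatted string through `textContent` rather than `innerHTML` keeps event messages from being interpreted as markup.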

**Section 4: Active Requests (table, shown only when requests are in flight)**
- Fetch from `system.activeRequests`
- Table columns: Session, Channel, Duration (live-updating), Started
- If no active requests, show "No active requests" muted text
- Use existing table CSS
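
The table-or-placeholder decision and the live Duration column might look like this. It is a sketch only: the fields `sessionId`, `channel`, and `startedAt` (milliseconds since epoch) are assumed, not taken from the Plan 01 handler.

```javascript
// Elapsed time since the request started, e.g. "42s" or "1m 5s".
function formatDuration(startedAtMs, nowMs = Date.now()) {
  const totalSeconds = Math.max(0, Math.floor((nowMs - startedAtMs) / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
}

// Assumed request shape (hypothetical): { sessionId, channel, startedAt }
// Returns either the placeholder message or one row object per request.
function buildActiveRequestsView(requests, nowMs = Date.now()) {
  if (!Array.isArray(requests) || requests.length === 0) {
    return { empty: true, message: "No active requests", rows: [] };
  }
  const rows = requests.map((r) => ({
    session: r.sessionId,
    channel: r.channel,
    duration: formatDuration(r.startedAt, nowMs),
    started: new Date(r.startedAt).toLocaleTimeString(),
  }));
  return { empty: false, message: "", rows };
}
```

Because the duration is recomputed from `startedAt` on every call, re-running this on each fast tick is enough to make the column appear live.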

**Section 5: Channels (keep existing)**
- Keep the existing channels grid showing connected/disconnected channel adapters

**Refresh strategy:**
- Replace the current 10-second interval with a 3-second interval for the core data (`system.metrics`, `system.events`, `system.activeRequests`)
- Fetch `system.health` and `system.channels` every 10 seconds (less dynamic data)
- Use `Promise.all` to batch the frequent calls together
- Keep the existing `teardown()` pattern with `clearInterval`
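
A minimal sketch of the two-interval refresh with batched calls and teardown follows. `createPoller` and the `handlers` callbacks are hypothetical names, and `client` is assumed to expose `call(method, params)` returning a Promise, consistent with the `client.call(...)` usage in this plan.

```javascript
// Hypothetical poller; `client` is any object with call(method, params) -> Promise.
function createPoller(client, handlers, fastMs = 3000, slowMs = 10000) {
  let fastTimer = null;
  let slowTimer = null;
  let stopped = false;

  // Frequent data: one batch per tick.
  async function fastTick() {
    const [metrics, events, requests] = await Promise.all([
      client.call("system.metrics"),
      client.call("system.events", { limit: 50 }),
      client.call("system.activeRequests"),
    ]);
    handlers.onFast({ metrics, events, requests });
  }

  // Less dynamic data.
  async function slowTick() {
    const [health, channels] = await Promise.all([
      client.call("system.health"),
      client.call("system.channels"),
    ]);
    handlers.onSlow({ health, channels });
  }

  // A failed fetch is reported but does not stop later ticks.
  const safely = (tick) => () =>
    tick().catch((err) => handlers.onError && handlers.onError(err));

  return {
    async start() {
      await Promise.all([fastTick(), slowTick()]); // first render fetches everything
      if (stopped) return; // teardown ran while the first fetch was in flight
      fastTimer = setInterval(safely(fastTick), fastMs);
      slowTimer = setInterval(safely(slowTick), slowMs);
    },
    teardown() {
      stopped = true;
      clearInterval(fastTimer);
      clearInterval(slowTimer);
      fastTimer = slowTimer = null;
    },
  };
}
```

The `stopped` flag covers the case where the user navigates away before the initial fetch resolves, so no interval is created after `teardown()` has already run.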

**Implementation approach:**
- Keep the same module pattern: `loadDashboard(el, client)` function + `DashboardPage` export with `render`/`teardown`
- Use two timers: `_fastTimer` (3s) for metrics/events/requests, `_slowTimer` (10s) for health/channels
- On first render, fetch everything with `Promise.all`
- On subsequent fast ticks, only update the dynamic sections (don't re-render the whole page; use targeted DOM updates via `getElementById` for each section)
- Give each section a unique ID (written here as selectors): `#ops-counters`, `#ops-model-table`, `#ops-events`, `#ops-requests`, `#ops-channels`
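
Targeted updates on the fast tick could be structured like the sketch below. The per-counter element IDs (`ops-count-messages` and so on) are hypothetical and would sit inside the `ops-counters` section; `doc` is passed in so the same code works with the browser's `document` or a stub.

```javascript
// `doc` is anything exposing getElementById (the real `document` in the browser).
function setText(doc, id, value) {
  const el = doc.getElementById(id);
  if (!el) return false; // element not mounted, e.g. after teardown
  const text = String(value);
  if (el.textContent !== text) el.textContent = text; // skip no-op writes
  return true;
}

// Hypothetical element IDs; only the changed text nodes are touched.
function updateCounters(doc, metrics) {
  setText(doc, "ops-count-messages", metrics.messagesProcessed);
  setText(doc, "ops-count-queue", metrics.queueDepth);
  setText(doc, "ops-count-errors", metrics.errors);
}
```

In the browser this would be called as `updateCounters(document, metrics)` from the fast tick, leaving the rest of the page untouched.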

**CSS additions in `src/gateway/ui/style.css`:**
Add near the end of the file, immediately before the responsive section:

```css
/* ── Event Stream ──────────────────────────────────────── */
.event-stream {
  max-height: 300px;
  overflow-y: auto;
  background-color: var(--bg-secondary);
  border: 1px solid var(--border);
  border-radius: var(--radius);
  padding: 8px;
  font-size: var(--font-size-sm);
  font-family: var(--font-mono);
}

.event-row {
  padding: 4px 8px;
  border-bottom: 1px solid var(--border-light);
  white-space: pre-wrap;
  word-break: break-word;
}

.event-row:last-child {
  border-bottom: none;
}

.event-level-error { color: var(--error); }
.event-level-warn { color: var(--warning); }
.event-level-info { color: var(--text-secondary); }

/* ── Model Metrics Summary ─────────────────────────────── */
.metrics-summary {
  display: flex;
  gap: 24px;
  margin-bottom: 12px;
  font-size: var(--font-size-sm);
  color: var(--text-secondary);
}

.metrics-summary .metric {
  display: flex;
  gap: 6px;
}

.metrics-summary .metric-value {
  font-weight: 600;
  color: var(--text-primary);
}
```

**Keep the formatUptime helper.** It already exists and works perfectly.

**Avoid:** Do NOT add animations or transitions. Do NOT import external libraries. Do NOT re-render whole sections from template literals via innerHTML on the fast-update path; update specific elements with targeted textContent/innerHTML writes to avoid flicker.
</action>
<verify>
`pnpm typecheck`: no type errors (vanilla JS won't affect this, but it guards against TS regressions).
`pnpm build`: builds successfully (UI files are served as static assets, not compiled).
Manual check: Open `src/gateway/ui/pages/dashboard.js` and verify it:
- Calls `client.call('system.metrics')`
- Calls `client.call('system.events')`
- Calls `client.call('system.activeRequests')`
- Has 3-second and 10-second refresh timers
- Has `teardown()` that cleans up both timers
</verify>
<done>
Dashboard page shows five sections: core counters, model performance table, event stream, active requests, and channels.
Counters and events refresh every 3 seconds.
Health and channels refresh every 10 seconds.
Event stream auto-scrolls and is color-coded by level.
Active requests section shows in-flight requests or the "No active requests" message.
All existing stat-card and table CSS reused; new event-stream CSS added.
</done>
</task>

<task type="checkpoint:human-verify" gate="blocking">
<name>Task 2: Verify live dashboard in browser</name>
<files>src/gateway/ui/pages/dashboard.js</files>
<action>
Human verification of the live dashboard. What was built:
- Live ops dashboard with real-time metrics, event stream, model performance table, active request tracking, and HTTP /health endpoint
- Extended the existing vanilla JS dashboard (no framework replacement)

Steps to verify:
1. Start Flynn: `pnpm dev`
2. Open the dashboard in a browser (default: http://localhost:3100, or the configured port)
3. Verify the dashboard shows:
   - Core counters row: Messages Processed, Active Sessions, Queue Depth, Uptime, Active Requests, Errors
   - Model Performance section: table of recent model calls (may be empty if no messages sent yet)
   - Event Stream section: scrollable log (may show startup events)
   - Active Requests section: "No active requests" or table
   - Channels section: connected channel adapters
4. Send a message through the chat page (or via a connected channel) and verify:
   - Messages Processed counter increments within 3 seconds
   - Model Performance table shows the new call with latency and tokens/sec
   - Event stream shows relevant entries
5. Trigger an error (e.g., send a message that causes a tool error) and verify it appears in the event stream in red
6. Test HTTP /health: `curl http://localhost:3100/health` should return JSON with status, uptime, version
7. Run `pnpm test:run`: all tests pass

Resume signal: Type "approved" or describe issues.
</action>
<verify>Human confirms dashboard displays correctly and updates in real-time.</verify>
<done>Dashboard visually confirmed working with live-updating metrics, event stream, and model performance data.</done>
</task>

</tasks>

<verification>
1. Dashboard loads without errors in browser console
2. All five sections render with real data
3. Counters update within 3 seconds of events occurring
4. Event stream is scrollable and color-coded
5. `curl http://localhost:3100/health` returns valid JSON
6. `pnpm test:run`: all tests pass
7. `pnpm typecheck`: zero type errors
</verification>

<success_criteria>
- Dashboard shows live-updating counters that change as messages flow (DASH-01)
- Model call metrics visible with latency and tokens/sec (DASH-02)
- Event stream shows errors with timestamps and context (DASH-03)
- Active requests tracked and displayed (DASH-04)
- GET /health returns JSON status (DASH-05)
- Other existing UI pages (chat, sessions, usage, settings) unaffected
- Zero test regressions
</success_criteria>

<output>
After completion, create `.planning/phases/03-live-ops-dashboard/03-02-SUMMARY.md`
</output>