75 Commits

Author SHA1 Message Date
William Valentin ebc944702f chore: drop retired orb and sun VMs
Only the zap VM remains in the fleet. Remove orb/sun from the README
architecture/config docs, the getVMClassName allowlist, and their
.timeline-vm-tag color styles.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 10:38:04 -07:00
William Valentin 69eb87ebc9 feat(web-ui): improve Agents page legibility and scannability
Targeted UI/UX polish on the Agents page, keeping the existing dark
aesthetic and both Overview/Live view modes:

- Add a readable --text-mute token (dark + light) and apply it to the
  summary chips, lane meta, and idle/offline status, which previously
  used the near-invisible --text-dim.
- Event feed: demote the generic "Span Started/Completed" label to a
  quiet mono category tag and promote the tool name, with a left-edge
  accent by event kind (run/span/error/session). Scoped to
  #agents-content so other pages' feeds are unaffected.
- Active-op pills: add a per-kind left accent bar (tool/subagent/run).
- Lane sparkline: raise opacity and add a gradient so it actually reads.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 10:35:33 -07:00
William Valentin 478c7529a7 feat(hooks): emit per-run token usage and duration on run.end
The stats layer reads usage/duration only from run.end, but neither
framework populated them, so tokens/cost/avg-duration were always 0.

- hermes: accumulate token usage across each run's api-result calls in
  session state and attach the summed usage plus a computed duration_ms
  (from a stored runStartedAt) onto run.end. metric.snapshot emission is
  unchanged, so there is no double counting.
- claude-code: store runStartedAt and use it as a duration_ms fallback at
  all run.end sites. Usage is unavailable from CC hook inputs.

Live verification: a real hermes run now reports duration_ms and
total_tokens on run.end; dashboard tokens_today/avg_duration_ms, both
previously 0, now populate. cost_today stays 0 (no provider emits cost
through the hooks).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 11:16:23 -07:00
William Valentin 5014d89258 feat(metrics): surface tool-span latency in stats and dashboard
Tool spans already carry duration_ms and status, but the metrics layer
only counted them. Expose that data:

- GetTopTools now returns avg/p95 duration and error count per tool.
- Timeseries buckets gain tool_avg_ms / tool_p95_ms (filtered
  percentile_cont over tool spans).
- Dashboard Top Tools shows avg latency per tool; the Latency panel,
  previously always empty (it read run-level duration that is never
  emitted), now plots real tool-span latency (min/avg/p95).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 11:16:23 -07:00
William Valentin c44e7fe72e refactor(web-ui): extract shared component primitives
Introduce components.js with barTrack, barRow, barRankList, metricPill,
metricStrip, and chartHeader helpers. Migrate dashboard.js and usage.js
to use these primitives, replacing 13 families of duplicated CSS
(stat-list, fw-bar, token-bar, metric-pill, chart-insight, chart-header,
usage-chart-total, etc.) with a unified .am-* namespace. Net: -256 lines.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 12:21:48 -07:00
William Valentin 8753c0c9d5 feat(web-ui): better stats and ergonomics
Usage page: add 7-day trend chart (activity/tokens/cost tabs),
framework breakdown panel with per-framework run/tool/error counts
and proportional bars, and 7d aggregate pills above the chart.

Dashboard: add avg cost/run metric pill to the metrics strip.

Run detail: extract and display prompt preview from the first agent
span's payload above the spans table.

Bug fixes: stat-list bars now render correctly (flex-direction:column),
right-panel-tab active background uses correct accent color, missing
framework colors added for hermes/codex/gemini/copilot. Dead code
renderSessionRow removed from sessions.js. Hardcoded font-family
replaced with CSS variable in metric-pill-value and token-stat-value.
Usage page cleanup() wired into router teardown.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 16:49:05 -07:00
William Valentin 1b01f0b0cd chore(compose): pin postgres patch image 2026-05-20 22:19:58 -07:00
William Valentin 27d40ce28f feat(hooks): add Hermes telemetry handler 2026-05-20 17:35:56 -07:00
William Valentin 78376bdd83 feat(query): include session totals and stable framework names 2026-05-20 17:35:56 -07:00
William Valentin db73eca6fd chore(infra): pin nats image digest 2026-05-20 17:35:56 -07:00
William Valentin f8bec2d6d5 fix: ignore non-persistent claude startups 2026-04-30 17:07:19 -07:00
William Valentin 476c0e347f fix: count only live dashboard sessions 2026-04-30 17:07:17 -07:00
William Valentin fd17628e94 fix: ignore invalid claude hook starts 2026-04-29 09:41:07 -07:00
William Valentin 6799cc3681 docs: add run detail improvements design
Covers three improvements to /runs/:id: prompt/error header callouts,
client-side span filter/search, and an interactive waterfall with hover
tooltips and inline detail drawers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 13:35:20 -07:00
William Valentin 184aa5e6cb fix(web-ui): security hardening, SPA nav, and modularization
Ship the in-progress ES-module refactor of the web-ui (new static/modules/
layout, Usage/Settings pages, uplot-based dashboard) alongside a round of
security and UX fixes:

- main.go: add CSP + X-Frame-Options: DENY + X-Content-Type-Options:
  nosniff + Referrer-Policy middleware on every response; WS CheckOrigin
  now requires Origin host to match Host (blocks cross-site WebSocket
  hijacking); upgrade client before dialing upstream so origin check
  runs first; fatal on unparseable AGENTMON_QUERY_BASE.
- app.js: delegated click handler intercepts same-origin <a> clicks for
  SPA navigation (prev. every nav link caused a full page reload,
  dropping WS + in-memory state); delegated .copy-btn[data-copy]
  handler replaces inline onclick=; removed window.navigate /
  window.copyToClipboard globals and the duplicated handleGlobalSearch.
- modules/nav-signal.js: per-route AbortController so in-flight fetches
  are cancelled when the user navigates away, preventing stale toasts
  and wasted renders.
- modules/api.js: honours the nav signal by default; AbortError is
  silent.
- modules/router.js: resets the nav controller on every route; dropped
  the fixed 80ms transition delay; breadcrumbs no longer emit inline
  onclick= (delegated handler picks them up).
- modules/utils.js: renderCopyButton emits data-copy="..." instead of
  nesting a JS string inside an HTML attribute — fixes an XSS where
  values containing ' broke out via &#39; decoding.

Verified: go build clean; `node --check` clean on all modified modules;
manual curl probes confirm security headers present on every response
and WS upgrade returns 403 for cross-origin/missing Origin while 101
for same-origin.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-23 15:36:12 -07:00
William Valentin 41b7165800 fix(store): backfill spans in run detail 2026-04-21 13:07:09 -07:00
William Valentin 43113f6241 feat(web-ui): improve navigation and session UX 2026-04-21 13:07:05 -07:00
William Valentin 8f766b4019 chore(gitignore): ignore build directory 2026-04-21 13:07:01 -07:00
William Valentin d5154b8eec fix(codex): recover session lifecycle from hooks 2026-04-21 13:02:58 -07:00
William Valentin 8b6ce8e628 Add restart policy to docker-compose services 2026-03-27 20:47:24 -07:00
William Valentin c53283ac07 feat: improve web UI UX with global search, breadcrumbs, and better feedback 2026-03-26 14:24:52 -07:00
William Valentin 8bca99573b feat(web-ui): redesign dashboard and live sessions 2026-03-26 11:22:49 -07:00
William Valentin 5ff4794d98 feat(openclaw-monitor): add MinIO telemetry 2026-03-26 11:22:45 -07:00
William Valentin 6605780b58 feat(ingest): batch event writes and harden transport 2026-03-26 11:22:42 -07:00
William Valentin 43877a5448 feat(query-api): add richer stats and retention 2026-03-26 11:22:34 -07:00
William Valentin fdfcb50e80 feat(hooks): consolidate shared transport helpers 2026-03-26 11:22:27 -07:00
William Valentin d49785cb25 fix: filter dashboard activity feed events 2026-03-20 14:05:59 -07:00
William Valentin 687a7aa79d Add live agent views and improve Codex monitoring 2026-03-20 13:59:51 -07:00
William Valentin a87bbc6983 fix(claude-hook): derive span durations from start timestamps 2026-03-20 11:17:40 -07:00
William Valentin d235e3c873 feat(hooks): add telemetry handlers for codex/copilot/gemini 2026-03-20 11:17:26 -07:00
William Valentin c88746693a docs(plans): add dashboard and realtime agent plans 2026-03-20 11:17:17 -07:00
William Valentin 2e277fb138 fix: preserve session state across turns in claude-code hook handler
handleNotification("Done") was incorrectly emitting session.end and
calling clearState at the end of each Claude turn. Since "Done" means
a turn finished (not the session), clearing state caused subsequent
tool calls to find no runId, storing spans without run_id and making
them invisible in run-level queries.

- handleNotification: remove session.end emission and clearState call;
  only emit run.end for the completed turn
- handleSessionEnd: load state file to get runId (in-memory activeRuns
  is always empty in a subprocess)
- handlePromptSubmit: load state file to get runId for ending previous
  run before starting a new one

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 23:42:22 -07:00
William Valentin f8ddea3698 feat: add agentmon services section to infrastructure page
Label all agentmon docker-compose services with agentmon.monitor=true
and agentmon.group=agentmon so the swarm-monitor picks them up.
Adds Group field to ServiceSnapshot, probes /healthz for api/web roles,
and renders a separate "Agentmon" section below Swarm Services on the
Infrastructure page with new api and worker card renderers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 13:41:26 -07:00
William Valentin d2d044a3d8 fix: use Docker socket HTTP API in swarm collector, no CLI dependency
Replace exec.CommandContext calls (docker ps, docker inspect, nc -z) with
direct HTTP calls over the Unix socket using Go's net/http + custom transport.
Also removes netcat-openbsd from Dockerfile since nc is no longer used.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 10:36:32 -07:00
William Valentin f48953781b fix: add swarm-monitor binary and netcat to Dockerfile 2026-03-18 10:31:28 -07:00
William Valentin edaa7bac45 feat: add swarm-monitor service to docker-compose 2026-03-18 10:29:40 -07:00
William Valentin 1b3c74b441 fix: add /infrastructure to SPA catch-all routes 2026-03-18 10:27:06 -07:00
William Valentin cd2f345454 feat: rename OpenClaw to Infrastructure page, add service cards
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 10:20:28 -07:00
William Valentin 93edd39a2b feat: add infrastructure page CSS 2026-03-18 10:16:50 -07:00
William Valentin 07c16653cd feat: add swarm strip to dashboard
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 10:14:48 -07:00
William Valentin 7c043b78a4 feat: add swarm-monitor binary 2026-03-18 10:12:18 -07:00
William Valentin 9c2f048b92 feat: add swarm collector with docker inspect + HTTP probes 2026-03-18 10:10:34 -07:00
William Valentin 083e522bb7 feat: add swarm monitor types 2026-03-18 10:08:54 -07:00
William Valentin 22bc16bf51 docs: swarm monitor implementation plan 2026-03-18 09:57:51 -07:00
William Valentin ecabc7fd19 docs: swarm monitor design — infra page, docker labels, role-driven cards 2026-03-18 09:53:39 -07:00
William Valentin e7be607db4 feat: extend agentmon hook with agent:bootstrap for embedded/cron runs
- Add agent:bootstrap handler to capture run.start events for cron and
  automation runs that bypass the message:received path
- Remove dead event subscriptions (tool_result_persist, session:compact:*)
  which are plugin hook events and never fire through triggerInternalHook
- Remove AGENTMON_INGEST_URL from requires.env since handler has a
  hardcoded fallback URL
- Drop activeCompactions map (no longer needed after removing compaction handlers)

Deployed to zap VM with hooks.internal.enabled=true in openclaw.json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 17:32:32 -07:00
William Valentin 13356adfbd feat: openclaw card dividers, running pulse, issue label section
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 12:16:30 -07:00
William Valentin acd89e95a9 feat: stat card top accents, timeline time hierarchy 2026-03-14 12:14:15 -07:00
William Valentin 5dbfd68fb5 feat: meta tiles, back link button, css chevron, span-details bg fix
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 12:11:55 -07:00
William Valentin eb12319f19 feat: framework color dots in sessions table, filter toolbar panel
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 12:05:40 -07:00