Commit Graph

48 Commits

Author SHA1 Message Date
William Valentin 3aee387a2b fix: repair swarm watchdog n8n HTTP checks 2026-05-14 14:42:02 -07:00
William Valentin 13ab43de6c feat(n8n): add obsidian automation workflows 2026-05-14 14:39:10 -07:00
William Valentin c774030341 feat(n8n): migrate rag health watchdog 2026-05-14 11:50:59 -07:00
William Valentin 13087de8c4 fix(openclaw): define user vars in customize playbook 2026-05-14 10:51:50 -07:00
William Valentin 867f879d4a fix(n8n): harden swarm automation workflows 2026-05-13 17:33:17 -07:00
William Valentin ff28a7c1ad feat(swarm): add URL content extractor + voice memo processor + webhook catalog
- url-content-extractor.py on :18812: YouTube/PDF/web content extraction
- voice-memo-processor.py on :18813: Telegram/Discord/URL voice ingress + Kokoro TTS
- Webhook Action Bus catalog in Obsidian vault
- Updated n8n Implementation Handoff: items #8-10 done
2026-05-13 16:13:00 -07:00
William Valentin 6c13a60f57 feat(swarm): add Obsidian vault reindex endpoint + update handoff
- obsidian-reindex-server.py: HTTP endpoint on port 18810 for
  triggering incremental Obsidian vault reindex from n8n
- Updated n8n Implementation Handoff: Obsidian Semantic Index
  section, new reindex workflow, updated verification commands
2026-05-13 15:18:50 -07:00
William Valentin aa77e11b3a docs: finalize n8n handoff - morning brief, evening digest, api key pitfall, verification section 2026-05-13 14:48:04 -07:00
William Valentin 8544267842 docs: update n8n handoff - all 6 implementation items completed 2026-05-13 14:45:01 -07:00
William Valentin 62a1f57c1f feat(n8n): add Morning Brief scheduled workflow (g3IdGZCK1EtTsv9T)
Daily 06:30 PT scheduled workflow that:
- Collects weather (wttr.in/Seattle), swarm health, n8n/litellm health
- Fetches email highlights from IMAP Triage executions
- Attempts Google Calendar (graceful skip if OAuth expired)
- Synthesizes via local LLM (gemma-4-26B on llama.cpp)
- Delivers to Telegram + Obsidian

All data collection nodes have continueOnFail for resilience.
Workflow ID: g3IdGZCK1EtTsv9T, active: true
2026-05-13 14:42:19 -07:00
William Valentin 9b11016340 feat: add Evening Digest n8n workflow (replaces Nightly Obsidian Vault Sync)
- New workflow PlZywwqL8MRNEAN6: Evening Digest
  - Daily 9PM America/Los_Angeles schedule trigger
  - Collects: n8n success/error executions, swarm health, new Obsidian notes
  - LLM synthesis via local gemma-4-26B for digest generation
  - Delivers to Telegram, Discord, and Obsidian vault
  - All collection nodes have continueOnFail for resilience

- Deactivated old Nightly Obsidian Vault Sync (75JCevkdgkyCr2qH)
- Exported both workflows + updated Failure Digest export
2026-05-13 14:41:46 -07:00
William Valentin b3eefc4d14 feat: add systemd service file and updated n8n watchdog workflow
- docker-health-endpoint.service (systemd user unit)
- swarm-health-watchdog.json with Docker health enrichment
  - Calls http://172.19.0.1:18809/health for container state
  - Includes docker status/health/restarts in alert messages
  - Adds docker field to service check results

Task: t_461f71fe
2026-05-13 14:33:48 -07:00
William Valentin 2dc3c66bb4 Add n8n Failure Digest workflow with Discord delivery
Workflow G9ylNbHbnJ6fWX2C now sends failure digests to both:
- Telegram (existing)
- Discord #ops-alerts channel (new)

Discord delivery uses Bot Auth (UgPqYcoCNNIgr55m) via HTTP Request
node POSTing to Discord API v10 channel messages endpoint.

Task: t_627466f8
2026-05-13 14:31:41 -07:00
William Valentin 733b32b6cd fix(n8n): update IMAP Inbox Triage workflow container URLs from stale 192.168.153.130 to Docker bridge gateway 172.19.0.1
- Judge with Local LLM: http://172.19.0.1:18806/v1/chat/completions
- Write Email to Vault: http://172.19.0.1:27123/vault/...
- Workflow 9sFwRyUDz51csAp7 deactivated, updated, and reactivated
2026-05-13 14:30:28 -07:00
William Valentin 9fdd29f7b7 feat: add Docker health-state HTTP endpoint for Swarm Health Watchdog
- Python HTTP server on 0.0.0.0:18809
- GET /health -> all monitored containers (JSON)
- GET /health/<name> -> single container
- Monitors: brave-search, kokoro-tts, litellm, litellm-db, n8n-agent, searxng, whisper-server
- Returns status, health, restart count via docker inspect
- systemd user service for auto-start

Task: t_461f71fe
2026-05-13 14:29:25 -07:00
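The status, health, and restart count described in the commit above can be read from `docker inspect` output. The following is a minimal sketch under stated assumptions, not the repository's actual server: the function names `summarize_container` and `inspect_container` and the output field names are hypothetical. Only the `State.Status`, `State.Health.Status`, and `RestartCount` keys come from Docker's inspect JSON.

```python
import json
import subprocess


def summarize_container(inspect_entry):
    """Reduce one `docker inspect` JSON object to status, health, restarts.

    The returned field names are illustrative, not the real endpoint's schema.
    """
    state = inspect_entry.get("State", {})
    # Containers without a HEALTHCHECK have no State.Health key.
    health = (state.get("Health") or {}).get("Status", "none")
    return {
        "name": inspect_entry.get("Name", "").lstrip("/"),
        "status": state.get("Status", "unknown"),
        "health": health,
        "restarts": inspect_entry.get("RestartCount", 0),
    }


def inspect_container(name):
    """Run `docker inspect <name>` and summarize the first result."""
    out = subprocess.run(
        ["docker", "inspect", name],
        check=True, capture_output=True, text=True,
    ).stdout
    return summarize_container(json.loads(out)[0])


# Example with a hand-written inspect fragment (no Docker daemon needed).
sample = {
    "Name": "/litellm",
    "RestartCount": 2,
    "State": {"Status": "running", "Health": {"Status": "healthy"}},
}
print(summarize_container(sample))
```

Keeping the parsing separate from the `docker` call lets the summary logic be checked without a running daemon.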
William Valentin aea9042cce docs: update n8n gmail imap handoff 2026-05-13 14:16:53 -07:00
William Valentin f4a61acf7f docs: add n8n implementation handoff 2026-05-13 08:58:25 -07:00
William Valentin 84043aebb8 docs: document n8n recovery and watchdog workflows 2026-05-12 20:29:07 -07:00
William Valentin f2c71cb2dd chore(swarm): wire local AI health checks 2026-05-11 09:52:14 -07:00
William Valentin 50f2640846 feat(whisper): add CUDA Blackwell server, promote to primary on :18801
Adds a custom whisper.cpp Docker image built with CMAKE_CUDA_ARCHITECTURES=120
so it actually initializes on the RTX 5070 Ti — the upstream
ghcr.io/ggml-org/whisper.cpp:main-cuda only ships kernels for sm_75/80/86/90.

Compose changes:
- New whisper-init one-shot service downloads ggml-medium.bin and ggml-small.bin
  into the shared volume on first run, fixing the original crash where
  whisper-server tried to load a model that was never fetched.
- New whisper-server-gpu service (image whisper.cpp:cuda-blackwell, built
  locally from ./whisper-cuda-blackwell/Dockerfile) on port 18801 — the
  benchmarked path (~150 ms per short clip, ~93x faster than CPU/medium with
  identical WER on JFK + 4 TTS samples).
- Existing whisper-server (CPU/medium) moves to port 18811 as the fallback
  for when GPU is unavailable. Container names unchanged so monitoring and
  volume bindings keep working.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 01:12:58 -07:00
William Valentin f0c84a8f05 docs(obsidian): sync vault notes 2026-03-30 17:08:55 -07:00
William Valentin 1606283197 chore(openclaw): sync runtime state 2026-03-30 17:08:49 -07:00
William Valentin f9ef8b55ac docs(zap): update routing policy and OpenClaw notes 2026-03-26 11:02:06 -07:00
William Valentin 88fafab27e feat(k8s): add cluster read-only access resources 2026-03-26 11:02:00 -07:00
William Valentin 7ed5383d10 chore(openclaw): refresh runtime models and credentials 2026-03-26 11:01:47 -07:00
William Valentin 8cb4c7c019 chore: add local env and credential artifacts 2026-03-20 10:33:36 -07:00
William Valentin 1082f8bad7 chore: refresh tokens and sync runtime state snapshots 2026-03-20 10:33:18 -07:00
William Valentin 8c6b54b827 feat: add anthropic model profiles for main and automation 2026-03-20 10:33:14 -07:00
William Valentin 4b1afb1073 feat: add swarm-common obsidian vault
Add Obsidian vault to the swarm-common virtiofs share for access
from zap VM and other VMs. Contains agent memory, notes, and
infrastructure documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 15:36:02 -07:00
William Valentin d96efca2c4 chore: sync OpenClaw runtime state
Sync latest runtime state from zap VM: credential rotations,
device registrations, completion scripts, cron jobs, and
telemetry offsets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 15:35:58 -07:00
William Valentin 905d2eb58c feat: add council agent configurations
Add four specialized council agent personas for structured
multi-perspective deliberation:

- council-pragmatist: practical, implementation-focused perspective
- council-referee: neutral arbiter for resolving disagreements
- council-skeptic: critical analysis and risk identification
- council-visionary: long-term strategic and creative thinking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 15:35:52 -07:00
William Valentin 0215c037da feat: add agentmon monitoring hook for OpenClaw telemetry
Add hook handler that forwards OpenClaw agent events to the agentmon
ingest endpoint for monitoring and observability.

- ansible/playbooks/files/agentmon-hook/: Ansible-deployable hook
- openclaw/hooks/agentmon/: Hook installed in OpenClaw instance

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 15:35:47 -07:00
William Valentin c235b04fc3 feat: update ansible playbooks for openclaw VM configuration
- Add agentmon_ingest_url var to openclaw_servers inventory
- Reduce vm.swappiness from 10 to 5 for better memory management
- Refactor virtiofs mounts: remove bindfs layer, mount swarm-common
  directly at /mnt/swarm-common (simpler, no FUSE overhead)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 15:35:43 -07:00
William Valentin 227bff9e43 feat: add gpt-5.3-codex-spark and qwen2.5-14b-local to LiteLLM init
- Add gpt-5.3-codex-spark OpenAI Codex model
- Add qwen2.5-14b-local: Qwen2.5-14B-Instruct running locally via
  llama.cpp at 192.168.153.113:18806, with model_info (chat mode,
  8192 max tokens, 32768 input, supports function calling)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 15:35:37 -07:00
William Valentin ed3273d1ed feat: configure OpenClaw main agent for native Anthropic API access
Route Claude models directly through the Anthropic API using a
setup-token (Pro subscription) instead of the LiteLLM proxy.

- Add anthropic:manual profile (setup-token auth) to auth-profiles.json
- Remove Claude models from litellm provider in models.json (they now
  use the built-in anthropic catalog instead)
- Set default model to anthropic/claude-sonnet-4-6 in openclaw.json
- Add anthropic/* fallback chain: opus-4-6, sonnet-4-6, opus-4-5,
  sonnet-4-5, haiku-4-5
- Remove litellm/claude-* entries from fallback list
- Update openai-codex and github-copilot credentials

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 15:35:31 -07:00
William Valentin da64b55caf fix: pin container image versions to avoid unexpected upgrades
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 22:50:18 -07:00
William Valentin bd8a039c82 feat: add agentmon monitor labels to swarm services 2026-03-18 10:07:35 -07:00
William Valentin ea5e2c2ef3 Add orb and sun VMs with virtiofs swarm-common share
- Add orb (192.168.122.183) and sun (192.168.122.184) to inventory
- Create host_vars for orb and sun (fresh install, brew_packages: [])
- Add brew_packages to zap host_vars (gogcli, himalaya, kubernetes-cli, opencode)
- customize.yml: parameterize brew_packages via host_vars, add /mnt/swarm-common
  virtiofs+bindfs mount for all VMs, install bindfs, fix Homebrew install
- provision-vm.yml: remove become requirement; use virsh vol commands for all
  disk/image operations (no sudo needed)
- roles/vm/tasks/main.yml: rewrite disk provisioning to use virsh vol-create-as
  and vol-upload; fix vol name quoting for names with spaces; use qcow2 backing
- domain.xml.j2: always include swarm-common virtiofs share; make main share
  conditional on vm_virtiofs_source/tag

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-13 11:06:08 -07:00
William Valentin c8aaa40cd8 Add Homebrew installation and packages to customize playbook
Installs Homebrew as the openclaw user (idempotent via creates guard),
adds it to PATH in .bashrc, then installs the four leaf packages present
on zap: gogcli, himalaya, kubernetes-cli, opencode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 13:50:48 -07:00
William Valentin ebce788702 Add missing apt packages to system-tools role
Adds packages installed on zap that were absent from the playbook:
btop, byobu, fd-find, ffmpeg, gh, mtr-tiny, screen, whois, yt-dlp

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 13:50:42 -07:00
William Valentin b58a6ae06d Track litellm credentials; ignore runtime log files
Add *.log to .gitignore so litellm-maintenance.log and any future log files
are not tracked. Stage litellm-copilot-tokens/api-key.json — repo is local
access only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 13:36:30 -07:00
William Valentin 727069e16d Document LiteLLM setup, model registration, and maintenance
Add LiteLLM section to README covering: service startup, credential and
model registration (including FORCE=1 for re-runs), adding new models via
API, maintenance scripts, systemd timer, and a troubleshooting guide for
the 429/cooldown and duplicate-entry failure modes encountered in practice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 13:33:22 -07:00
William Valentin c94bbe5de8 Add LiteLLM maintenance scripts and systemd health-check timer
litellm-dedup.sh: removes duplicate model DB entries (idempotent, supports
--dry-run). Root cause of duplicates was litellm-init running multiple times
before the DB was populated, causing all entries to be inserted concurrently.

litellm-health-check.sh: runs every 6 hours via systemd user timer; checks
liveness (auto-restarts container if unresponsive) and duplicate entries
(auto-dedups when DEDUP=1). Logs to litellm-maintenance.log.

Systemd units: litellm-health-check.{service,timer} installed under
~/.config/systemd/user/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 13:33:16 -07:00
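The idempotent dedup with a dry-run mode described in the commit above can be sketched as follows. This is an illustration only: the real `litellm-dedup.sh` is a shell script, and the record shape (`id`, `model_name`), the helper names `plan_dedup` and `dedup`, and the `delete` callback are all assumptions rather than the script's or LiteLLM's actual interface.

```python
def plan_dedup(entries, key="model_name"):
    """Split entries into (keep, remove), keeping the first of each key."""
    seen = set()
    keep, remove = [], []
    for entry in entries:
        if entry[key] in seen:
            remove.append(entry)
        else:
            seen.add(entry[key])
            keep.append(entry)
    return keep, remove


def dedup(entries, delete, dry_run=False):
    """Delete duplicate entries via `delete(id)`; only report when dry_run."""
    keep, remove = plan_dedup(entries)
    if not dry_run:
        for entry in remove:
            delete(entry["id"])
    return keep, remove


# Hypothetical rows, as if litellm-init had inserted gpt-4o twice.
rows = [
    {"id": "a1", "model_name": "gpt-4o"},
    {"id": "b2", "model_name": "glm-5"},
    {"id": "c3", "model_name": "gpt-4o"},
]
deleted = []
keep, remove = dedup(rows, deleted.append)
print([r["id"] for r in remove])  # ['c3']
# Running again on the survivors removes nothing, which is the idempotency.
print(dedup(keep, deleted.append)[1])  # []
```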
William Valentin ef344579fa Add gpt-5.4, glm-4.7-flash, and glm-5 models; fix init idempotency
- Register gpt-5.4 (OpenAI codex auth), glm-4.7-flash, and glm-5 (ZAI)
- Add early-exit guard to litellm-init-models.sh: skips registration if
  gpt-4o already exists in the DB, preventing duplicate entries on re-runs;
  set FORCE=1 to bypass and add any missing models

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 13:33:06 -07:00
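The early-exit guard described in the commit above can be sketched as follows. It is a hedged illustration, not the contents of `litellm-init-models.sh`: the function name `models_to_register` and its parameters are hypothetical, and only the `gpt-4o` sentinel and the `FORCE=1` bypass come from the commit message.

```python
def models_to_register(existing, wanted, sentinel="gpt-4o", force=False):
    """Return the models that still need registering.

    If the sentinel model is already present and force is off, assume a
    previous run completed and register nothing. With force on, add only
    the models that are missing, so re-runs do not create duplicates.
    """
    if sentinel in existing and not force:
        return []
    return [m for m in wanted if m not in existing]


wanted = ["gpt-4o", "gpt-5.4", "glm-4.7-flash", "glm-5"]

# Fresh database: everything is registered.
print(models_to_register(set(), wanted))
# Sentinel present: the guard exits early.
print(models_to_register({"gpt-4o"}, wanted))  # []
# FORCE=1 equivalent: only the missing models are added.
print(models_to_register({"gpt-4o"}, wanted, force=True))
```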
William Valentin 07927f6101 Move auto-reboot to 04:00 to avoid e2scrub conflict
e2scrub_all.timer runs Sundays at 03:10. The previous 03:30 reboot
window gave only 20 minutes. 04:00 gives a safe 50-minute buffer
after both the 03:00 config backup and the filesystem scrub.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:51:47 -07:00
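The buffer arithmetic in the commit above can be checked with a short sketch. The helper name `minutes_between` is hypothetical. The times are taken from the commit message, and both are assumed to fall on the same day.

```python
from datetime import datetime


def minutes_between(start, end):
    """Whole minutes from start to end, both given as HH:MM on one day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


SCRUB_START = "03:10"    # e2scrub_all.timer, Sundays
CONFIG_BACKUP = "03:00"  # daily config backup

print(minutes_between(SCRUB_START, "03:30"))    # 20 (old reboot window)
print(minutes_between(SCRUB_START, "04:00"))    # 50 (new reboot window)
print(minutes_between(CONFIG_BACKUP, "04:00"))  # 60
```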
William Valentin d0514fa345 Add automatic security updates to customize playbook
Enable unattended-upgrades with auto-reboot at 03:30 (after the
03:00 config backup). Includes kernel package cleanup and removal
of unused dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:40:04 -07:00
William Valentin 5900a51f3d Include all credentials and runtime config
Remove secret exclusions from .gitignore (local-only repo).
Add openclaw runtime state: credentials, identity, devices,
hooks, telegram, secrets, agent configs.
Exclude noisy/binary data: sessions, sqlite, media, temp files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:20:33 -07:00
William Valentin aceeb7b542 Initial commit — OpenClaw VM infrastructure
- ansible/: VM provisioning playbooks and roles
  - provision-vm.yml: create KVM VM from Ubuntu cloud image
  - install.yml: install OpenClaw on guest (upstream)
  - customize.yml: swappiness, virtiofs fstab, linger
  - roles/vm/: libvirt domain XML, cloud-init templates
  - inventory.yml + host_vars/zap.yml: zap instance config
- backup-openclaw-vm.sh: daily rsync + MinIO upload
- restore-openclaw-vm.sh: full redeploy from scratch
- README.md: full operational documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:18:31 -07:00