swarm-master

Author	SHA1	Message	Date
William Valentin	905e675d77	docs: add diagram maintenance conventions	2026-05-16 12:48:52 -07:00
William Valentin	6a79e0e336	docs: document swarm infrastructure topology	2026-05-16 12:45:02 -07:00
William Valentin	0ddd09f007	feat: add agentmon n8n health watchdog workflow	2026-05-16 12:38:20 -07:00
William Valentin	9954cd60bf	docs: add Obsidian diary system	2026-05-16 12:34:54 -07:00
William Valentin	ca3f932df5	fix(litellm): disable stale copilot registrations	2026-05-15 10:04:48 -07:00
William Valentin	3aee387a2b	fix: repair swarm watchdog n8n HTTP checks	2026-05-14 14:42:02 -07:00
William Valentin	13ab43de6c	feat(n8n): add obsidian automation workflows	2026-05-14 14:39:10 -07:00
William Valentin	c774030341	feat(n8n): migrate rag health watchdog	2026-05-14 11:50:59 -07:00
William Valentin	13087de8c4	fix(openclaw): define user vars in customize playbook	2026-05-14 10:51:50 -07:00
William Valentin	867f879d4a	fix(n8n): harden swarm automation workflows	2026-05-13 17:33:17 -07:00
William Valentin	ff28a7c1ad	feat(swarm): add URL content extractor + voice memo processor + webhook catalog - url-content-extractor.py on :18812: YouTube/PDF/web content extraction - voice-memo-processor.py on :18813: Telegram/Discord/URL voice ingress + Kokoro TTS - Webhook Action Bus catalog in Obsidian vault - Updated n8n Implementation Handoff: items #8-10 done	2026-05-13 16:13:00 -07:00
William Valentin	6c13a60f57	feat(swarm): add Obsidian vault reindex endpoint + update handoff - obsidian-reindex-server.py: HTTP endpoint on port 18810 for triggering incremental Obsidian vault reindex from n8n - Updated n8n Implementation Handoff: Obsidian Semantic Index section, new reindex workflow, updated verification commands	2026-05-13 15:18:50 -07:00
William Valentin	aa77e11b3a	docs: finalize n8n handoff - morning brief, evening digest, api key pitfall, verification section	2026-05-13 14:48:04 -07:00
William Valentin	8544267842	docs: update n8n handoff - all 6 implementation items completed	2026-05-13 14:45:01 -07:00
William Valentin	62a1f57c1f	feat(n8n): add Morning Brief scheduled workflow (g3IdGZCK1EtTsv9T) Daily 06:30 PT scheduled workflow that: - Collects weather (wttr.in/Seattle), swarm health, n8n/litellm health - Fetches email highlights from IMAP Triage executions - Attempts Google Calendar (graceful skip if OAuth expired) - Synthesizes via local LLM (gemma-4-26B on llama.cpp) - Delivers to Telegram + Obsidian All data collection nodes have continueOnFail for resilience. Workflow ID: g3IdGZCK1EtTsv9T, active: true	2026-05-13 14:42:19 -07:00
William Valentin	9b11016340	feat: add Evening Digest n8n workflow (replaces Nightly Obsidian Vault Sync) - New workflow PlZywwqL8MRNEAN6: Evening Digest - Daily 9PM America/Los_Angeles schedule trigger - Collects: n8n success/error executions, swarm health, new Obsidian notes - LLM synthesis via local gemma-4-26B for digest generation - Delivers to Telegram, Discord, and Obsidian vault - All collection nodes have continueOnFail for resilience - Deactivated old Nightly Obsidian Vault Sync (75JCevkdgkyCr2qH) - Exported both workflows + updated Failure Digest export	2026-05-13 14:41:46 -07:00
William Valentin	b3eefc4d14	feat: add systemd service file and updated n8n watchdog workflow - docker-health-endpoint.service (systemd user unit) - swarm-health-watchdog.json with Docker health enrichment - Calls http://172.19.0.1:18809/health for container state - Includes docker status/health/restarts in alert messages - Adds docker field to service check results Task: t_461f71fe	2026-05-13 14:33:48 -07:00
William Valentin	2dc3c66bb4	Add n8n Failure Digest workflow with Discord delivery Workflow G9ylNbHbnJ6fWX2C now sends failure digests to both: - Telegram (existing) - Discord #ops-alerts channel (new) Discord delivery uses Bot Auth (UgPqYcoCNNIgr55m) via HTTP Request node POSTing to Discord API v10 channel messages endpoint. Task: t_627466f8	2026-05-13 14:31:41 -07:00
William Valentin	733b32b6cd	fix(n8n): update IMAP Inbox Triage workflow container URLs from stale 192.168.153.130 to Docker bridge gateway 172.19.0.1 - Judge with Local LLM: http://172.19.0.1:18806/v1/chat/completions - Write Email to Vault: http://172.19.0.1:27123/vault/... - Workflow 9sFwRyUDz51csAp7 deactivated, updated, and reactivated	2026-05-13 14:30:28 -07:00
William Valentin	9fdd29f7b7	feat: add Docker health-state HTTP endpoint for Swarm Health Watchdog - Python HTTP server on 0.0.0.0:18809 - GET /health -> all monitored containers (JSON) - GET /health/<name> -> single container - Monitors: brave-search, kokoro-tts, litellm, litellm-db, n8n-agent, searxng, whisper-server - Returns status, health, restart count via docker inspect - systemd user service for auto-start Task: t_461f71fe	2026-05-13 14:29:25 -07:00
William Valentin	aea9042cce	docs: update n8n gmail imap handoff	2026-05-13 14:16:53 -07:00
William Valentin	f4a61acf7f	docs: add n8n implementation handoff	2026-05-13 08:58:25 -07:00
William Valentin	84043aebb8	docs: document n8n recovery and watchdog workflows	2026-05-12 20:29:07 -07:00
William Valentin	f2c71cb2dd	chore(swarm): wire local AI health checks	2026-05-11 09:52:14 -07:00
William Valentin	50f2640846	feat(whisper): add CUDA Blackwell server, promote to primary on :18801 Adds a custom whisper.cpp Docker image built with CMAKE_CUDA_ARCHITECTURES=120 so it actually initializes on the RTX 5070 Ti — the upstream ghcr.io/ggml-org/whisper.cpp:main-cuda only ships kernels for sm_75/80/86/90. Compose changes: - New whisper-init one-shot service downloads ggml-medium.bin and ggml-small.bin into the shared volume on first run, fixing the original crash where whisper-server tried to load a model that was never fetched. - New whisper-server-gpu service (image whisper.cpp:cuda-blackwell, built locally from ./whisper-cuda-blackwell/Dockerfile) on port 18801 — the benchmarked path (~150 ms per short clip, ~93x faster than CPU/medium with identical WER on JFK + 4 TTS samples). - Existing whisper-server (CPU/medium) moves to port 18811 as the fallback for when GPU is unavailable. Container names unchanged so monitoring and volume bindings keep working. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 01:12:58 -07:00
William Valentin	f0c84a8f05	docs(obsidian): sync vault notes	2026-03-30 17:08:55 -07:00
William Valentin	1606283197	chore(openclaw): sync runtime state	2026-03-30 17:08:49 -07:00
William Valentin	f9ef8b55ac	docs(zap): update routing policy and OpenClaw notes	2026-03-26 11:02:06 -07:00
William Valentin	88fafab27e	feat(k8s): add cluster read-only access resources	2026-03-26 11:02:00 -07:00
William Valentin	7ed5383d10	chore(openclaw): refresh runtime models and credentials	2026-03-26 11:01:47 -07:00
William Valentin	8cb4c7c019	chore: add local env and credential artifacts	2026-03-20 10:33:36 -07:00
William Valentin	1082f8bad7	chore: refresh tokens and sync runtime state snapshots	2026-03-20 10:33:18 -07:00
William Valentin	8c6b54b827	feat: add anthropic model profiles for main and automation	2026-03-20 10:33:14 -07:00
William Valentin	4b1afb1073	feat: add swarm-common obsidian vault Add Obsidian vault to the swarm-common virtiofs share for access from zap VM and other VMs. Contains agent memory, notes, and infrastructure documentation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 15:36:02 -07:00
William Valentin	d96efca2c4	chore: sync OpenClaw runtime state Sync latest runtime state from zap VM: credential rotations, device registrations, completion scripts, cron jobs, and telemetry offsets. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 15:35:58 -07:00
William Valentin	905d2eb58c	feat: add council agent configurations Add four specialized council agent personas for structured multi-perspective deliberation: - council-pragmatist: practical, implementation-focused perspective - council-referee: neutral arbiter for resolving disagreements - council-skeptic: critical analysis and risk identification - council-visionary: long-term strategic and creative thinking Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 15:35:52 -07:00
William Valentin	0215c037da	feat: add agentmon monitoring hook for OpenClaw telemetry Add hook handler that forwards OpenClaw agent events to the agentmon ingest endpoint for monitoring and observability. - ansible/playbooks/files/agentmon-hook/: Ansible-deployable hook - openclaw/hooks/agentmon/: Hook installed in OpenClaw instance Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 15:35:47 -07:00
William Valentin	c235b04fc3	feat: update ansible playbooks for openclaw VM configuration - Add agentmon_ingest_url var to openclaw_servers inventory - Reduce vm.swappiness from 10 to 5 for better memory management - Refactor virtiofs mounts: remove bindfs layer, mount swarm-common directly at /mnt/swarm-common (simpler, no FUSE overhead) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 15:35:43 -07:00
William Valentin	227bff9e43	feat: add gpt-5.3-codex-spark and qwen2.5-14b-local to LiteLLM init - Add gpt-5.3-codex-spark OpenAI Codex model - Add qwen2.5-14b-local: Qwen2.5-14B-Instruct running locally via llama.cpp at 192.168.153.113:18806, with model_info (chat mode, 8192 max tokens, 32768 input, supports function calling) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 15:35:37 -07:00
William Valentin	ed3273d1ed	feat: configure OpenClaw main agent for native Anthropic API access Route Claude models directly through the Anthropic API using a setup-token (Pro subscription) instead of the LiteLLM proxy. - Add anthropic:manual profile (setup-token auth) to auth-profiles.json - Remove Claude models from litellm provider in models.json (they now use the built-in anthropic catalog instead) - Set default model to anthropic/claude-sonnet-4-6 in openclaw.json - Add anthropic/* fallback chain: opus-4-6, sonnet-4-6, opus-4-5, sonnet-4-5, haiku-4-5 - Remove litellm/claude-* entries from fallback list - Update openai-codex and github-copilot credentials Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 15:35:31 -07:00
William Valentin	da64b55caf	fix: pin container image versions to avoid unexpected upgrades Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-18 22:50:18 -07:00
William Valentin	bd8a039c82	feat: add agentmon monitor labels to swarm services	2026-03-18 10:07:35 -07:00
William Valentin	ea5e2c2ef3	Add orb and sun VMs with virtiofs swarm-common share - Add orb (192.168.122.183) and sun (192.168.122.184) to inventory - Create host_vars for orb and sun (fresh install, brew_packages: []) - Add brew_packages to zap host_vars (gogcli, himalaya, kubernetes-cli, opencode) - customize.yml: parameterize brew_packages via host_vars, add /mnt/swarm-common virtiofs+bindfs mount for all VMs, install bindfs, fix Homebrew install - provision-vm.yml: remove become requirement; use virsh vol commands for all disk/image operations (no sudo needed) - roles/vm/tasks/main.yml: rewrite disk provisioning to use virsh vol-create-as and vol-upload; fix vol name quoting for names with spaces; use qcow2 backing - domain.xml.j2: always include swarm-common virtiofs share; make main share conditional on vm_virtiofs_source/tag Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-13 11:06:08 -07:00
William Valentin	c8aaa40cd8	Add Homebrew installation and packages to customize playbook Installs Homebrew as the openclaw user (idempotent via creates guard), adds it to PATH in .bashrc, then installs the four leaf packages present on zap: gogcli, himalaya, kubernetes-cli, opencode. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 13:50:48 -07:00
William Valentin	ebce788702	Add missing apt packages to system-tools role Adds packages installed on zap that were absent from the playbook: btop, byobu, fd-find, ffmpeg, gh, mtr-tiny, screen, whois, yt-dlp Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 13:50:42 -07:00
William Valentin	b58a6ae06d	Track litellm credentials; ignore runtime log files Add *.log to .gitignore so litellm-maintenance.log and any future log files are not tracked. Stage litellm-copilot-tokens/api-key.json — repo is local access only. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 13:36:30 -07:00
William Valentin	727069e16d	Document LiteLLM setup, model registration, and maintenance Add LiteLLM section to README covering: service startup, credential and model registration (including FORCE=1 for re-runs), adding new models via API, maintenance scripts, systemd timer, and a troubleshooting guide for the 429/cooldown and duplicate-entry failure modes encountered in practice. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 13:33:22 -07:00
William Valentin	c94bbe5de8	Add LiteLLM maintenance scripts and systemd health-check timer litellm-dedup.sh: removes duplicate model DB entries (idempotent, supports --dry-run). Root cause of duplicates was litellm-init running multiple times before the DB was populated, causing all entries to be inserted concurrently. litellm-health-check.sh: runs every 6 hours via systemd user timer; checks liveness (auto-restarts container if unresponsive) and duplicate entries (auto-dedups when DEDUP=1). Logs to litellm-maintenance.log. Systemd units: litellm-health-check.{service,timer} installed under ~/.config/systemd/user/. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 13:33:16 -07:00
William Valentin	ef344579fa	Add gpt-5.4, glm-4.7-flash, and glm-5 models; fix init idempotency - Register gpt-5.4 (OpenAI codex auth), glm-4.7-flash, and glm-5 (ZAI) - Add early-exit guard to litellm-init-models.sh: skips registration if gpt-4o already exists in the DB, preventing duplicate entries on re-runs; set FORCE=1 to bypass and add any missing models Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 13:33:06 -07:00
William Valentin	07927f6101	Move auto-reboot to 04:00 to avoid e2scrub conflict e2scrub_all.timer runs Sundays at 03:10. The previous 03:30 reboot window gave only 20 minutes. 04:00 gives a safe 50-minute buffer after both the 03:00 config backup and the filesystem scrub. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 12:51:47 -07:00

53 Commits