- obsidian-reindex-server.py: HTTP endpoint on port 18810 for
triggering incremental Obsidian vault reindex from n8n
- Updated n8n Implementation Handoff: Obsidian Semantic Index
section, new reindex workflow, updated verification commands
Daily 06:30 PT scheduled workflow that:
- Collects weather (wttr.in/Seattle), swarm health, n8n/litellm health
- Fetches email highlights from IMAP Triage executions
- Attempts Google Calendar (graceful skip if OAuth expired)
- Synthesizes via local LLM (gemma-4-26B on llama.cpp)
- Delivers to Telegram + Obsidian
All data collection nodes have continueOnFail for resilience.
Workflow ID: g3IdGZCK1EtTsv9T, active: true
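
The continueOnFail behaviour can be pictured roughly as below. This is a
Python sketch of the pattern only, not the n8n workflow itself; the
collector names and the placeholder data are hypothetical.

```python
from typing import Any, Callable, Dict


def collect_all(collectors: Dict[str, Callable[[], Any]]) -> Dict[str, Any]:
    """Run every collector and record failures instead of aborting the run."""
    results: Dict[str, Any] = {}
    for name, collect in collectors.items():
        try:
            results[name] = {"ok": True, "data": collect()}
        except Exception as exc:  # same idea as continueOnFail: keep going
            results[name] = {"ok": False, "error": str(exc)}
    return results


def expired_calendar() -> Any:
    # Hypothetical stand-in for the calendar step when OAuth has expired.
    raise RuntimeError("OAuth token expired")


briefing = collect_all({
    "weather": lambda: "placeholder weather text",
    "calendar": expired_calendar,
})
```

A failed source then shows up as an entry with ok=False, so the synthesis
step can skip it rather than lose the whole briefing.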
- docker-health-endpoint.service (systemd user unit)
- swarm-health-watchdog.json with Docker health enrichment
- Calls http://172.19.0.1:18809/health for container state
- Includes docker status/health/restarts in alert messages
- Adds docker field to service check results
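
A rough sketch of the enrichment step, in Python for illustration (the
real logic lives in the n8n workflow JSON). The shape of a service check
and the status/health/restarts key names are assumptions, not taken from
the workflow.

```python
from typing import Any, Dict, Optional


def enrich_check(check: Dict[str, Any],
                 docker: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of a service check with a 'docker' field attached."""
    enriched = dict(check)
    enriched["docker"] = docker  # None when the endpoint had no data
    return enriched


def alert_text(check: Dict[str, Any]) -> str:
    """Build an alert line that includes Docker state when it is known."""
    base = f"{check['name']}: {check['state']}"
    docker = check.get("docker")
    if not docker:
        return f"{base} (docker: no data)"
    return (
        f"{base} (docker status={docker.get('status', 'unknown')}, "
        f"health={docker.get('health', 'none')}, "
        f"restarts={docker.get('restarts', 0)})"
    )
```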
Task: t_461f71fe
- Python HTTP server on 0.0.0.0:18809
- GET /health -> all monitored containers (JSON)
- GET /health/<name> -> single container
- Monitors: brave-search, kokoro-tts, litellm, litellm-db, n8n-agent, searxng, whisper-server
- Returns status, health, restart count via docker inspect
- systemd user service for auto-start
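
For reference, a minimal sketch of how the two routes and the docker
inspect fields could map onto the response. It is not the server's actual
code: the function names and the response keys are assumptions. State.Status,
State.Health.Status and RestartCount are the fields docker inspect reports.

```python
from typing import Any, Dict, Optional

MONITORED = (
    "brave-search", "kokoro-tts", "litellm", "litellm-db",
    "n8n-agent", "searxng", "whisper-server",
)


def summarize(inspect_obj: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one `docker inspect` object to status, health, restart count."""
    state = inspect_obj.get("State", {})
    # Containers without a HEALTHCHECK have no State.Health section.
    health = state.get("Health") or {}
    return {
        "status": state.get("Status", "unknown"),
        "health": health.get("Status", "none"),
        "restarts": inspect_obj.get("RestartCount", 0),
    }


def route(path: str) -> Optional[str]:
    """Return '*' for /health, a name for /health/<name>, otherwise None."""
    parts = [p for p in path.split("/") if p]
    if parts == ["health"]:
        return "*"
    if len(parts) == 2 and parts[0] == "health" and parts[1] in MONITORED:
        return parts[1]
    return None
```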
Task: t_461f71fe
Adds a custom whisper.cpp Docker image built with CMAKE_CUDA_ARCHITECTURES=120
so it actually initializes on the RTX 5070 Ti (sm_120). The upstream
ghcr.io/ggml-org/whisper.cpp:main-cuda image only ships kernels for
sm_75/80/86/90.
Compose changes:
- New whisper-init one-shot service downloads ggml-medium.bin and ggml-small.bin
into the shared volume on first run, fixing the original crash where
whisper-server tried to load a model that was never fetched.
- New whisper-server-gpu service (image whisper.cpp:cuda-blackwell, built
locally from ./whisper-cuda-blackwell/Dockerfile) on port 18801 — the
benchmarked path (~150 ms per short clip, ~93x faster than CPU/medium with
identical WER on JFK + 4 TTS samples).
- Existing whisper-server (CPU/medium) moves to port 18811 as the fallback
for when GPU is unavailable. Container names unchanged so monitoring and
volume bindings keep working.
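
A client could pick between the two ports along these lines. This is a
sketch only: the localhost host and the TCP-connect probe are assumptions,
and nothing in this change adds such a helper.

```python
import socket
from typing import Callable
from urllib.parse import urlparse

GPU_URL = "http://localhost:18801"  # whisper-server-gpu (host is assumed)
CPU_URL = "http://localhost:18811"  # whisper-server, CPU/medium fallback


def tcp_probe(url: str, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to the URL's host:port succeeds."""
    parsed = urlparse(url)
    try:
        with socket.create_connection((parsed.hostname, parsed.port),
                                      timeout=timeout):
            return True
    except OSError:
        return False


def pick_whisper(is_up: Callable[[str], bool] = tcp_probe) -> str:
    """Prefer the GPU endpoint; fall back to CPU when it is unreachable."""
    return GPU_URL if is_up(GPU_URL) else CPU_URL
```

Passing the probe in as a parameter keeps the choice testable without a
running container.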
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add Obsidian vault to the swarm-common virtiofs share for access
from zap VM and other VMs. Contains agent memory, notes, and
infrastructure documentation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sync latest runtime state from zap VM: credential rotations,
device registrations, completion scripts, cron jobs, and
telemetry offsets.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add four specialized council agent personas for structured
multi-perspective deliberation:
- council-pragmatist: practical, implementation-focused perspective
- council-referee: neutral arbiter for resolving disagreements
- council-skeptic: critical analysis and risk identification
- council-visionary: long-term strategic and creative thinking
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add hook handler that forwards OpenClaw agent events to the agentmon
ingest endpoint for monitoring and observability.
- ansible/playbooks/files/agentmon-hook/: Ansible-deployable hook
- openclaw/hooks/agentmon/: Hook installed in OpenClaw instance
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add agentmon_ingest_url var to openclaw_servers inventory
- Reduce vm.swappiness from 10 to 5 for better memory management
- Refactor virtiofs mounts: remove bindfs layer, mount swarm-common
directly at /mnt/swarm-common (simpler, no FUSE overhead)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add gpt-5.3-codex-spark OpenAI Codex model
- Add qwen2.5-14b-local: Qwen2.5-14B-Instruct running locally via
llama.cpp at 192.168.153.113:18806, with model_info (chat mode,
8192 max tokens, 32768 input, supports function calling)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Route Claude models directly through the Anthropic API using a
setup-token (Pro subscription) instead of the LiteLLM proxy.
- Add anthropic:manual profile (setup-token auth) to auth-profiles.json
- Remove Claude models from litellm provider in models.json (they now
use the built-in anthropic catalog instead)
- Set default model to anthropic/claude-sonnet-4-6 in openclaw.json
- Add anthropic/* fallback chain: opus-4-6, sonnet-4-6, opus-4-5,
sonnet-4-5, haiku-4-5
- Remove litellm/claude-* entries from fallback list
- Update openai-codex and github-copilot credentials
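
The intended selection order can be sketched as follows. The full model
ids are an assumption (the short names above expanded with the
anthropic/claude- prefix), and next_model is a hypothetical helper, not
OpenClaw's resolver.

```python
from typing import Iterable, Optional, Set

DEFAULT = "anthropic/claude-sonnet-4-6"
FALLBACKS = [
    "anthropic/claude-opus-4-6",
    "anthropic/claude-sonnet-4-6",
    "anthropic/claude-opus-4-5",
    "anthropic/claude-sonnet-4-5",
    "anthropic/claude-haiku-4-5",
]


def next_model(default: str, fallbacks: Iterable[str],
               failed: Set[str]) -> Optional[str]:
    """Return the first model, default first, that has not failed yet."""
    for model in [default, *fallbacks]:
        if model not in failed:
            return model
    return None  # every candidate has failed
```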
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add orb (192.168.122.183) and sun (192.168.122.184) to inventory
- Create host_vars for orb and sun (fresh install, brew_packages: [])
- Add brew_packages to zap host_vars (gogcli, himalaya, kubernetes-cli, opencode)
- customize.yml: parameterize brew_packages via host_vars, add /mnt/swarm-common
virtiofs+bindfs mount for all VMs, install bindfs, fix Homebrew install
- provision-vm.yml: remove become requirement; use virsh vol commands for all
disk/image operations (no sudo needed)
- roles/vm/tasks/main.yml: rewrite disk provisioning to use virsh vol-create-as
and vol-upload; fix vol name quoting for names with spaces; use qcow2 backing
- domain.xml.j2: always include swarm-common virtiofs share; make main share
conditional on vm_virtiofs_source/tag
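
The quoting fix is easiest to see with argument lists. The sketch below
is Python rather than the role's Ansible tasks, and the pool, volume and
file names are hypothetical; only the virsh subcommands and flags are
real.

```python
import shlex
from typing import List, Optional


def vol_create_argv(pool: str, name: str, capacity: str,
                    backing: Optional[str] = None) -> List[str]:
    """Build a `virsh vol-create-as` command for a qcow2 volume."""
    argv = ["virsh", "vol-create-as", pool, name, capacity,
            "--format", "qcow2"]
    if backing:
        argv += ["--backing-vol", backing, "--backing-vol-format", "qcow2"]
    return argv


def vol_upload_argv(pool: str, name: str, local_file: str) -> List[str]:
    """Build a `virsh vol-upload` command for an existing volume."""
    return ["virsh", "vol-upload", "--pool", pool, name, local_file]


# A name with spaces stays one argument; shlex.join shows the quoting.
print(shlex.join(vol_create_argv("default", "my vm disk.qcow2", "40G")))
```

Passing a list to subprocess.run avoids the shell entirely, so a volume
name with spaces needs no manual quoting.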
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Installs Homebrew as the openclaw user (idempotent via creates guard),
adds it to PATH in .bashrc, then installs the four leaf packages present
on zap: gogcli, himalaya, kubernetes-cli, opencode.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds packages installed on zap that were absent from the playbook:
btop, byobu, fd-find, ffmpeg, gh, mtr-tiny, screen, whois, yt-dlp
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add *.log to .gitignore so litellm-maintenance.log and any future log files
are not tracked. Stage litellm-copilot-tokens/api-key.json as well; the repo
is local-access only.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add LiteLLM section to README covering: service startup, credential and
model registration (including FORCE=1 for re-runs), adding new models via
API, maintenance scripts, systemd timer, and a troubleshooting guide for
the 429/cooldown and duplicate-entry failure modes encountered in practice.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
litellm-dedup.sh: removes duplicate model DB entries (idempotent, supports
--dry-run). Root cause of duplicates was litellm-init running multiple times
before the DB was populated, causing all entries to be inserted concurrently.
litellm-health-check.sh: runs every 6 hours via systemd user timer; checks
liveness (auto-restarts container if unresponsive) and duplicate entries
(auto-dedups when DEDUP=1). Logs to litellm-maintenance.log.
Systemd units: litellm-health-check.{service,timer} installed under
~/.config/systemd/user/.
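
The dedup logic amounts to something like the sketch below. It is written
in Python for illustration (the script itself is shell), and it assumes
entries are keyed by model_name with an id column, which may not match
the real DB schema.

```python
from typing import Any, Dict, List, Tuple

Entry = Dict[str, Any]


def dedup(entries: List[Entry],
          dry_run: bool = False) -> Tuple[List[Entry], List[Any]]:
    """Keep the first entry per model_name; report ids of the rest.

    With dry_run=True the input is returned unchanged and only the ids
    that would be removed are reported.
    """
    seen = set()
    keep: List[Entry] = []
    drop_ids: List[Any] = []
    for entry in entries:
        key = entry["model_name"]
        if key in seen:
            drop_ids.append(entry["id"])
        else:
            seen.add(key)
            keep.append(entry)
    return (list(entries) if dry_run else keep), drop_ids
```

Running it again on its own output removes nothing, which is the
idempotence property the script relies on.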
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Register gpt-5.4 (OpenAI codex auth), glm-4.7-flash, and glm-5 (ZAI)
- Add early-exit guard to litellm-init-models.sh: skips registration if
gpt-4o already exists in the DB, preventing duplicate entries on re-runs;
set FORCE=1 to bypass and add any missing models
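
The guard's decision can be summarised as below. This is an illustrative
Python sketch, not the shell script; the function names are hypothetical.

```python
from typing import Iterable, List, Mapping


def should_register(existing: Iterable[str], env: Mapping[str, str],
                    sentinel: str = "gpt-4o") -> bool:
    """Skip registration when the sentinel model exists, unless FORCE=1."""
    if env.get("FORCE") == "1":
        return True
    return sentinel not in set(existing)


def missing_models(wanted: Iterable[str], existing: Iterable[str]) -> List[str]:
    """With FORCE=1, only models not yet in the DB need to be added."""
    present = set(existing)
    return [m for m in wanted if m not in present]
```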
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
e2scrub_all.timer runs Sundays at 03:10. The previous 03:30 reboot
window left only 20 minutes after the scrub starts. 04:00 leaves a
50-minute buffer after the scrub and a full hour after the 03:00
config backup.
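
The buffer arithmetic, as a quick check (minutes_between is a helper
written for this note):

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Minutes from start to end, both given as HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


old_buffer = minutes_between("03:10", "03:30")     # scrub start to old reboot
new_buffer = minutes_between("03:10", "04:00")     # scrub start to new reboot
backup_buffer = minutes_between("03:00", "04:00")  # backup to new reboot
```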
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>