# swarm

This directory is the source of truth for the OpenClaw VM infrastructure. It is shared into the zap VM via virtiofs (mounted at `/mnt/swarm` inside the guest, active after reboot).
## Directory Structure

```
swarm/
├── ansible/                      # VM provisioning and configuration
│   ├── inventory.yml             # Host definitions
│   ├── host_vars/
│   │   └── zap.yml               # All zap-specific variables
│   ├── playbooks/
│   │   ├── provision-vm.yml      # Create the VM on the hypervisor
│   │   ├── install.yml           # Install OpenClaw on the guest
│   │   └── customize.yml         # Post-provision tweaks
│   └── roles/
│       ├── openclaw/             # Upstream role (from openclaw-ansible)
│       └── vm/                   # VM provisioning role (local)
├── openclaw/                     # Live mirror of guest ~/.openclaw/
├── docker-compose.yaml           # LiteLLM + supporting services
├── litellm-config.yaml           # LiteLLM static config
├── litellm-init-credentials.sh   # Register API keys into LiteLLM DB
├── litellm-init-models.sh        # Register models into LiteLLM DB (idempotent)
├── litellm-dedup.sh              # Remove duplicate model DB entries
├── litellm-health-check.sh       # Liveness check + auto-dedup (run by systemd timer)
├── backup-openclaw-vm.sh         # Sync openclaw/ + upload to MinIO
├── restore-openclaw-vm.sh        # Full VM redeploy from scratch
└── README.md                     # This file
```
## VM: zap
| Property | Value |
|---|---|
| Libvirt domain | zap [claw] |
| Guest hostname | zap |
| IP | 192.168.122.182 (static DHCP) |
| MAC | 52:54:00:01:00:71 |
| RAM | 3 GiB |
| vCPUs | 2 |
| Disk | /var/lib/libvirt/images/claw.qcow2 (60 GiB qcow2) |
| OS | Ubuntu 24.04 |
| Firmware | EFI + Secure Boot + TPM 2.0 |
| Autostart | enabled |
| virtiofs | ~/lab/swarm → /mnt/swarm (active after reboot) |
| Swappiness | 10 |
SSH access:

```sh
ssh root@192.168.122.182      # privileged operations
ssh openclaw@192.168.122.182  # application-level access
```
## Provisioning a New VM

Use this when deploying zap from scratch on a fresh hypervisor, or creating a new instance.
### Step 1 — Create the VM

```sh
cd ~/lab/swarm/ansible
ansible-playbook -i inventory.yml playbooks/provision-vm.yml --limit zap
```
This will:

- Download the Ubuntu 24.04 cloud image (cached at `/var/lib/libvirt/images/`)
- Create the disk image via copy-on-write (`claw.qcow2`, 60 GiB)
- Build a cloud-init seed ISO with your SSH key and hostname
- Define the VM XML (EFI, memfd shared memory, virtiofs, TPM, watchdog)
- Add a static DHCP reservation for the MAC/IP pair
- Enable autostart and start the VM
- Wait for SSH to become available
### Step 2 — Install OpenClaw

```sh
ansible-playbook -i inventory.yml playbooks/install.yml --limit zap
```

Installs Node.js, pnpm, Docker, UFW, fail2ban, Tailscale, and OpenClaw via the upstream `openclaw-ansible` role.
### Step 3 — Apply customizations

```sh
ansible-playbook -i inventory.yml playbooks/customize.yml --limit zap
```
Applies settings not covered by the upstream role:

- `vm.swappiness=10` (live + persisted)
- virtiofs fstab entry (`swarm` → `/mnt/swarm`)
- `loginctl enable-linger openclaw` (for user systemd services)
### Step 4 — Restore config

```sh
~/lab/swarm/restore-openclaw-vm.sh zap
```

Rsyncs `openclaw/` back to `~/.openclaw/` on the guest and restarts the gateway service.
### All-in-one redeploy

```sh
# Existing VM (just re-provision guest)
~/lab/swarm/restore-openclaw-vm.sh zap

# Fresh VM at a new IP
~/lab/swarm/restore-openclaw-vm.sh zap <new-ip>
```
When a target IP is passed, restore-openclaw-vm.sh runs all four steps above in sequence.
## Backup
The `openclaw/` directory is a live rsync mirror of the guest's `~/.openclaw/`, automatically updated daily at 03:00 by a systemd user timer.
```sh
# Run manually
~/lab/swarm/backup-openclaw-vm.sh zap

# Check timer status
systemctl --user status openclaw-backup.timer
systemctl --user list-timers openclaw-backup.timer
```
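The timer pair is an ordinary systemd user unit. A minimal sketch of what `openclaw-backup.{service,timer}` could look like (illustrative only; the actual unit contents on the host may differ):

```ini
# ~/.config/systemd/user/openclaw-backup.service (illustrative)
[Unit]
Description=Backup OpenClaw VM zap

[Service]
Type=oneshot
ExecStart=%h/lab/swarm/backup-openclaw-vm.sh zap
```

```ini
# ~/.config/systemd/user/openclaw-backup.timer (illustrative)
[Unit]
Description=Daily OpenClaw backup

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

`Persistent=true` makes a missed 03:00 run fire at the next login rather than being skipped.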
### What is backed up

| Included | Excluded |
|---|---|
| `openclaw.json` (main config) | `workspace/` (2.6 GiB conversation history) |
| `secrets.json` (API keys) | `logs/` |
| `credentials/`, `identity/` | `extensions-quarantine/` |
| `memory/`, `agents/` | `*.bak*`, `*.backup-*`, `*.pre-*`, `*.failed` |
| `hooks/`, `cron/`, `telegram/` | `workspace-*/` (provider workspaces) |
### MinIO
Timestamped archives are uploaded to MinIO on every backup run:
| Property | Value |
|---|---|
| Endpoint | http://192.168.153.253:9000 |
| Bucket | s3://zap/backups/ |
| Retention | 7 most recent archives |
| Credentials | ~/.aws/credentials (default profile) |
To list available archives:

```sh
aws s3 ls s3://zap/backups/
```
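The 7-archive retention rule can be illustrated with a small stand-alone sketch (archive names here are fabricated, and `backup-openclaw-vm.sh` may implement pruning differently). Because archive names embed a sortable timestamp, a reverse lexicographic sort puts the newest first, and everything past the first seven is a deletion candidate:

```sh
# Illustrative only: list the archives that fall outside the 7-archive
# retention window. Names are made up; the real script's naming and
# pruning logic may differ.
keep=7
printf 'zap-2024010%d-0300.tar.gz\n' 9 3 7 1 5 2 8 4 6 \
  | sort -r \
  | tail -n +"$((keep + 1))"   # prints the two oldest archives
```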
## LiteLLM
LiteLLM runs as a Docker service (litellm, port 18804) backed by a Postgres database (litellm-db). It acts as a unified OpenAI-compatible proxy over Anthropic, OpenAI, Gemini, ZAI/GLM, and GitHub Copilot.
### Starting

```sh
cd ~/lab/swarm
docker compose --profile api up -d
```
### Credentials and model registration
On first start, `litellm-init` registers API credentials and all models into the DB. It is idempotent — re-running it when the models already exist is a no-op (guarded by a `gpt-4o` sentinel check). To force a re-run (e.g. after adding new models to the script):
```sh
docker compose --profile api run --rm \
  -e FORCE=1 litellm-init
```
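The guard follows a common sentinel pattern; a hedged sketch of the idea (this is not the actual code in `litellm-init-models.sh`, and `$registered` stands in for a query against the LiteLLM DB or API):

```sh
# Sketch of a sentinel-style idempotency guard (assumed shape, not the
# real litellm-init code). $registered stands in for the list of models
# already present in the DB.
registered='["claude-sonnet","gpt-4o"]'
if printf '%s' "$registered" | grep -q '"gpt-4o"' && [ "${FORCE:-0}" != "1" ]; then
  echo "models already registered; skipping"
else
  echo "registering models"
fi
```

Setting `FORCE=1` bypasses the sentinel check, which is why the `docker compose run` above forces a re-run.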
### Adding a new model

- Add an `add_model` (or `add_copilot_model`) call to `litellm-init-models.sh`
- Register it live via the API (no restart needed):
```sh
source .env
curl -X POST http://localhost:18804/model/new \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model_name":"<name>","litellm_params":{"model":"<provider>/<model>","api_key":"os.environ/<KEY_VAR>"}}'
```
### Maintenance scripts

| Script | Purpose |
|---|---|
| `litellm-dedup.sh` | Remove duplicate model DB entries (run with `--dry-run` to preview) |
| `litellm-health-check.sh` | Liveness check + auto-dedup; run by systemd timer |
```sh
# Manual dedup
./litellm-dedup.sh

# Manual health check
./litellm-health-check.sh

# Check maintenance log
tail -f litellm-maintenance.log
```
### Systemd timer
litellm-health-check.timer runs every 6 hours (user session, enabled at install). It checks liveness (restarting the container if unresponsive) and removes any duplicate model entries.
```sh
systemctl --user status litellm-health-check.timer
systemctl --user list-timers litellm-health-check.timer
journalctl --user -u litellm-health-check.service -n 20
```
## Troubleshooting

**Model returns 429 "No deployments available"**

All deployments for that model group are in cooldown (usually from a transient upstream error). Restart litellm to clear:

```sh
docker restart litellm
```
**Model returns an upstream subscription error**

The API key in use does not have access to that model. Check the provider's plan. The model will stay in cooldown until litellm is restarted; consider removing it from the DB if access is not expected.
**Duplicate model entries**

Caused by running `litellm-init` multiple times. Run `./litellm-dedup.sh` to clean up. The health-check timer also auto-deduplicates when `DEDUP=1` (the default).
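The dedup rule amounts to "keep the first occurrence of each model name". An illustrative one-liner of that idea (not `litellm-dedup.sh` itself, which works against the Postgres DB):

```sh
# Illustration of the dedup rule (not the real script): flag the second
# and later occurrences of each model name as removal candidates.
printf '%s\n' gpt-4o gpt-4o claude-sonnet gpt-4o claude-sonnet \
  | awk 'seen[$0]++ { print "duplicate:", $0 }'
```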
## Adding a New Instance

- Add an entry to `ansible/inventory.yml`
- Create `ansible/host_vars/<name>.yml` with VM and OpenClaw variables (copy `host_vars/zap.yml` as a template)
- Run the four provisioning steps above
- Add the instance to `~/.claude/state/openclaw-instances.json`
- Add a backup timer: copy `~/.config/systemd/user/openclaw-backup.{service,timer}`, update the instance name, and reload
## Ansible Role Reference

### `vm` role (`roles/vm/`)

Provisions the KVM/libvirt VM on the hypervisor host. Variables (set in `host_vars`):
| Variable | Description | Example |
|---|---|---|
| `vm_domain` | Libvirt domain name | `"zap [claw]"` |
| `vm_hostname` | Guest hostname | `zap` |
| `vm_memory_mib` | RAM in MiB | `3072` |
| `vm_vcpus` | vCPU count | `2` |
| `vm_disk_path` | qcow2 path on host | `/var/lib/libvirt/images/claw.qcow2` |
| `vm_disk_size` | Disk size | `60G` |
| `vm_mac` | Network MAC address | `52:54:00:01:00:71` |
| `vm_ip` | Static DHCP IP | `192.168.122.182` |
| `vm_virtiofs_source` | Host path to share | `/home/will/lab/swarm` |
| `vm_virtiofs_tag` | Mount tag in guest | `swarm` |
### `openclaw` role (`roles/openclaw/`)

Upstream role from `openclaw-ansible`. Installs and configures OpenClaw on the guest. Key variables:
| Variable | Value |
|---|---|
| `openclaw_install_mode` | `release` |
| `openclaw_ssh_keys` | will's public key |
### `customize.yml` playbook

Post-provision tweaks applied after the upstream role:

- `vm.swappiness = 10`
- `/etc/fstab` entry for the virtiofs `swarm` share
- `loginctl enable-linger openclaw`
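These tweaks map onto ordinary Ansible modules; a hedged sketch of what such tasks might look like (module choices and task names are illustrative, not the playbook's actual contents):

```yaml
# Illustrative tasks only; the real customize.yml may differ.
- name: Set swappiness (live + persisted)
  ansible.posix.sysctl:
    name: vm.swappiness
    value: "10"
    state: present

- name: Mount the virtiofs share via fstab
  ansible.posix.mount:
    src: swarm
    path: /mnt/swarm
    fstype: virtiofs
    state: mounted

- name: Enable lingering for the openclaw user
  ansible.builtin.command: loginctl enable-linger openclaw
  changed_when: false
```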