From 727069e16d1626eac42e5cc70209678586c84351 Mon Sep 17 00:00:00 2001
From: William Valentin
Date: Thu, 12 Mar 2026 13:33:22 -0700
Subject: [PATCH] Document LiteLLM setup, model registration, and maintenance

Add LiteLLM section to README covering: service startup, credential and
model registration (including FORCE=1 for re-runs), adding new models
via API, maintenance scripts, systemd timer, and a troubleshooting guide
for the 429/cooldown and duplicate-entry failure modes encountered in
practice.

Co-Authored-By: Claude Sonnet 4.6
---
 README.md | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/README.md b/README.md
index ab2b8a5..48b9fa6 100644
--- a/README.md
+++ b/README.md
@@ -18,6 +18,12 @@ swarm/
 │   ├── openclaw/                # Upstream role (from openclaw-ansible)
 │   └── vm/                      # VM provisioning role (local)
 ├── openclaw/                    # Live mirror of guest ~/.openclaw/
+├── docker-compose.yaml          # LiteLLM + supporting services
+├── litellm-config.yaml          # LiteLLM static config
+├── litellm-init-credentials.sh  # Register API keys into LiteLLM DB
+├── litellm-init-models.sh       # Register models into LiteLLM DB (idempotent)
+├── litellm-dedup.sh             # Remove duplicate model DB entries
+├── litellm-health-check.sh      # Liveness check + auto-dedup (run by systemd timer)
 ├── backup-openclaw-vm.sh        # Sync openclaw/ + upload to MinIO
 ├── restore-openclaw-vm.sh       # Full VM redeploy from scratch
 └── README.md                    # This file
@@ -147,6 +153,81 @@ To list available archives:
 aws s3 ls s3://zap/backups/
 ```
 
+## LiteLLM
+
+LiteLLM runs as a Docker service (`litellm`, port 18804) backed by a Postgres database (`litellm-db`). It acts as a unified OpenAI-compatible proxy over Anthropic, OpenAI, Gemini, ZAI/GLM, and GitHub Copilot.
+
+### Starting
+
+```bash
+cd ~/lab/swarm
+docker compose --profile api up -d
+```
+
+### Credentials and model registration
+
+On first start, `litellm-init` registers API credentials and all models into the DB. It is idempotent: re-running it when models already exist is a no-op (guarded by a `gpt-4o` sentinel check). To force a re-run (e.g. to register models newly added to the script):
+
+```bash
+docker compose --profile api run --rm \
+  -e FORCE=1 litellm-init
+```
+
+### Adding a new model
+
+1. Add an `add_model` (or `add_copilot_model`) call to `litellm-init-models.sh`
+2. Register it live via the API (no restart needed):
+
+```bash
+source .env
+curl -X POST http://localhost:18804/model/new \
+  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model_name":"","litellm_params":{"model":"/","api_key":"os.environ/"}}'
+```
+
+### Maintenance scripts
+
+| Script | Purpose |
+|--------|---------|
+| `litellm-dedup.sh` | Remove duplicate model DB entries (pass `--dry-run` to preview) |
+| `litellm-health-check.sh` | Liveness check + auto-dedup; run by systemd timer |
+
+```bash
+# Manual dedup
+./litellm-dedup.sh
+
+# Manual health check
+./litellm-health-check.sh
+
+# Check maintenance log
+tail -f litellm-maintenance.log
+```
+
+### Systemd timer
+
+`litellm-health-check.timer` runs every 6 hours (user session, enabled at install). It checks liveness (restarting the container if unresponsive) and removes any duplicate model entries.
+
+```bash
+systemctl --user status litellm-health-check.timer
+systemctl --user list-timers litellm-health-check.timer
+journalctl --user -u litellm-health-check.service -n 20
+```
+
+### Troubleshooting
+
+**Model returns 429 "No deployments available"**
+All deployments for that model group are in cooldown (usually from a transient upstream error). Restart the `litellm` container to clear the cooldown:
+```bash
+docker restart litellm
+```
+
+**Model returns upstream subscription error**
+The API key in use does not have access to that model. Check the provider's plan. The model will stay in cooldown until the container is restarted; consider removing it from the DB if access is not expected.
+
+**Duplicate model entries**
+Caused by running `litellm-init` multiple times. Run `./litellm-dedup.sh` to clean up. The health-check timer also auto-deduplicates when `DEDUP=1` (default).
+
 ## Adding a New Instance
 
 1. Add an entry to `ansible/inventory.yml`