Document LiteLLM setup, model registration, and maintenance
Add LiteLLM section to README covering: service startup, credential and model registration (including FORCE=1 for re-runs), adding new models via API, maintenance scripts, systemd timer, and a troubleshooting guide for the 429/cooldown and duplicate-entry failure modes encountered in practice.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -18,6 +18,12 @@ swarm/
│   ├── openclaw/                  # Upstream role (from openclaw-ansible)
│   └── vm/                        # VM provisioning role (local)
├── openclaw/                      # Live mirror of guest ~/.openclaw/
├── docker-compose.yaml            # LiteLLM + supporting services
├── litellm-config.yaml            # LiteLLM static config
├── litellm-init-credentials.sh    # Register API keys into LiteLLM DB
├── litellm-init-models.sh         # Register models into LiteLLM DB (idempotent)
├── litellm-dedup.sh               # Remove duplicate model DB entries
├── litellm-health-check.sh        # Liveness check + auto-dedup (run by systemd timer)
├── backup-openclaw-vm.sh          # Sync openclaw/ + upload to MinIO
├── restore-openclaw-vm.sh         # Full VM redeploy from scratch
└── README.md                      # This file
@@ -147,6 +153,81 @@ To list available archives:
aws s3 ls s3://zap/backups/
```

## LiteLLM

LiteLLM runs as a Docker service (`litellm`, port 18804) backed by a Postgres database (`litellm-db`). It acts as a unified OpenAI-compatible proxy over Anthropic, OpenAI, Gemini, ZAI/GLM, and GitHub Copilot.

### Starting

```bash
cd ~/lab/swarm
docker compose --profile api up -d
```

### Credentials and model registration

On first start, `litellm-init` registers API credentials and all models into the DB. It is idempotent: when models already exist, re-running it is a no-op (guarded by a `gpt-4o` sentinel check). To force a re-run (e.g. to register models newly added to the script):

```bash
docker compose --profile api run --rm \
  -e FORCE=1 litellm-init
```
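
The `gpt-4o` sentinel guard described above might look roughly like the sketch below. This is an assumption about the shape of the logic, not the contents of `litellm-init-models.sh`: `should_register` is a hypothetical helper, `example-model` is a placeholder name, and the list of existing model names is passed in as plain text rather than fetched from the proxy.

```bash
# Hypothetical sketch of a sentinel-guarded, idempotent init step.
# should_register SENTINEL EXISTING_NAMES
#   EXISTING_NAMES: newline-separated model names already in the DB.
#   Returns 0 (proceed) when FORCE=1 or the sentinel is absent,
#   and 1 (skip) when the sentinel is already registered.
should_register() {
  if [ "${FORCE:-0}" = "1" ]; then
    return 0
  fi
  # -x: match the whole line, -F: treat the sentinel as a literal string
  if printf '%s\n' "$2" | grep -qxF -- "$1"; then
    return 1
  fi
  return 0
}

# Made-up listing in which the sentinel is already present.
existing_models=$(printf '%s\n' "gpt-4o" "example-model")
if should_register "gpt-4o" "$existing_models"; then
  echo "registering models"
else
  echo "models already registered, skipping (set FORCE=1 to override)"
fi
```

Checking a single well-known model keeps the guard cheap, at the cost of not noticing when only some models are missing, which is the case `FORCE=1` is meant to cover.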

### Adding a new model

1. Add an `add_model` (or `add_copilot_model`) call to `litellm-init-models.sh`.
2. Register it live via the API (no restart needed):

```bash
source .env
curl -X POST http://localhost:18804/model/new \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model_name":"<name>","litellm_params":{"model":"<provider>/<model>","api_key":"os.environ/<KEY_VAR>"}}'
```
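
To avoid hand-editing the JSON, the request body can be generated by a small helper. The sketch below is illustrative only: `model_payload` is a hypothetical function, not part of the repository's scripts, and the example arguments are placeholders. It does no JSON escaping, so it assumes the arguments contain no quotes or backslashes.

```bash
# model_payload NAME PROVIDER_MODEL KEY_VAR
# Prints a JSON body in the same shape as the curl call above.
# Hypothetical helper; performs no JSON escaping of its arguments.
model_payload() {
  printf '{"model_name":"%s","litellm_params":{"model":"%s","api_key":"os.environ/%s"}}\n' \
    "$1" "$2" "$3"
}

# Example with placeholder values; prints the payload without sending it.
model_payload "example-model" "openai/example-model" "OPENAI_API_KEY"
```

The output can then be passed to the same request with `-d "$(model_payload ...)"`.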

### Maintenance scripts

| Script | Purpose |
|--------|---------|
| `litellm-dedup.sh` | Remove duplicate model DB entries (run with `--dry-run` to preview) |
| `litellm-health-check.sh` | Liveness check + auto-dedup; run by systemd timer |

```bash
# Manual dedup
./litellm-dedup.sh

# Manual health check
./litellm-health-check.sh

# Check maintenance log
tail -f litellm-maintenance.log
```
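
For intuition, deduplication amounts to keeping one entry per model name and deleting the rest. The sketch below shows only that selection step on a plain-text listing. It is an assumption about the approach, not the contents of `litellm-dedup.sh`: the `duplicate_ids` helper, the two-column `id name` input format, and the ids themselves are all hypothetical.

```bash
# duplicate_ids: reads "id name" pairs on stdin, one per line, and prints
# the id of every entry whose name has already been seen (the first is kept).
duplicate_ids() {
  awk 'seen[$2]++ { print $1 }'
}

# Example listing with made-up ids: gpt-4o appears three times,
# so the second and third occurrences are reported.
printf '%s\n' \
  "id-1 gpt-4o" \
  "id-2 example-model" \
  "id-3 gpt-4o" \
  "id-4 gpt-4o" | duplicate_ids
```

Printing the ids instead of deleting them directly mirrors what a `--dry-run` preview would need: the same selection, with the destructive step left to the caller.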

### Systemd timer

`litellm-health-check.timer` runs every 6 hours as a user-session unit (enabled at install). Each run checks liveness, restarting the container if it is unresponsive, and removes any duplicate model entries.

```bash
systemctl --user status litellm-health-check.timer
systemctl --user list-timers litellm-health-check.timer
journalctl --user -u litellm-health-check.service -n 20
```
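
The liveness decision itself can be sketched as a small function over an HTTP status code. This is a guess at the general shape, not the logic in `litellm-health-check.sh`: `health_action` is a hypothetical name, and the endpoint path and timeout in the comment are assumptions.

```bash
# Hypothetical sketch: decide what to do from an HTTP status code.
# In a real check the status might come from something like:
#   status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
#     http://localhost:18804/health/liveliness)
# (the endpoint path and the timeout are assumptions).

# health_action STATUS: prints "ok" for a 2xx status, "restart" otherwise.
health_action() {
  case "$1" in
    2[0-9][0-9]) echo "ok" ;;
    *)           echo "restart" ;;
  esac
}

# curl reports 000 when no HTTP response was received at all.
for status in 200 503 000; do
  echo "$status -> $(health_action "$status")"
done
```

Keeping the decision separate from the probe makes it possible to exercise the restart path without taking the container down.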

### Troubleshooting

**Model returns 429 "No deployments available"**
All deployments for that model group are in cooldown (usually from a transient upstream error). Restart `litellm` to clear the cooldown:
```bash
docker restart litellm
```

**Model returns upstream subscription error**
The API key in use does not have access to that model. Check the provider's plan. The model will stay in cooldown until `litellm` is restarted; consider removing it from the DB if access is not expected.

**Duplicate model entries**
Caused by running `litellm-init` multiple times. Run `./litellm-dedup.sh` to clean up. The health-check timer also auto-deduplicates when `DEDUP=1` (default).

## Adding a New Instance

1. Add an entry to `ansible/inventory.yml`