swarm-zap/memory/2026-03-03-litellm-test.md
zap b0b4060bf9 chore(workspace): install all 4 bundled hooks + model default update
- Installed hooks: boot-md, session-memory, command-logger, bootstrap-extra-files
- Updated default model to litellm/copilot-claude-sonnet-4.6
- Added memory/startup-health.json + boot-last-run.json from boot check
- Added memory/startup-health.md warning log
- Captured new session memory files (2026-03-04/05)
- Minor MEMORY.md + USER.md: added git preference line
2026-03-05 04:40:46 +00:00


# Session: 2026-03-03 18:00:45 UTC
- **Session Key**: agent:main:main
- **Session ID**: fe1b7054-41fb-462f-8177-f4e8998f988b
- **Source**: gateway:sessions.reset
## Conversation Summary
assistant: [[reply_to_current]] Done — updated.
### What's now in `litellm-config.yaml`
- **Gemini** was already present:
- `gemini-1.5-pro`
- `gemini-1.5-flash`
- **z.ai** added:
- `zai-glm-4.7`
- `zai-glm-4.7-flash`
- both using:
- `api_base: https://api.z.ai/api/coding/paas/v4`
- `api_key: os.environ/ZAI_API_KEY`
If you want, I can also add newer Gemini aliases (e.g. 2.x) and/or switch the Gemini key var from `GOOGLE_API_KEY` to `GEMINI_API_KEY` if that's what your env uses.
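A minimal sketch of what the resulting `litellm-config.yaml` entries might look like, assuming LiteLLM's standard `model_list` schema (the `gemini/` and `openai/` provider prefixes are assumptions; only the alias names, base URL, and key vars come from the session):

```yaml
model_list:
  # Gemini (already present)
  - model_name: gemini-1.5-pro
    litellm_params:
      model: gemini/gemini-1.5-pro
      api_key: os.environ/GOOGLE_API_KEY
  # z.ai (added) -- both entries share the same base URL and key
  - model_name: zai-glm-4.7
    litellm_params:
      model: openai/glm-4.7          # served over an OpenAI-compatible endpoint (assumed)
      api_base: https://api.z.ai/api/coding/paas/v4
      api_key: os.environ/ZAI_API_KEY
  - model_name: zai-glm-4.7-flash
    litellm_params:
      model: openai/glm-4.7-flash
      api_base: https://api.z.ai/api/coding/paas/v4
      api_key: os.environ/ZAI_API_KEY
```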
user: [Tue 2026-03-03 09:37 UTC] Update the file accordingly based on the currently configured providers
assistant: [[reply_to_current]] Done — I updated `litellm-config.yaml` to match the providers currently configured on this host.
### Kept (configured)
- OpenAI-compatible local endpoint (`OPENAI_BASE_URL` + `OPENAI_API_KEY`)
- z.ai (`ZAI_API_KEY`, base `https://api.z.ai/api/coding/paas/v4`)
- `glm-5`
- `glm-4.7`
- `glm-4.7-flash`
- `glm-4.7-flashx`
### Removed (not currently configured)
- Anthropic
- OpenRouter
- Opencode
- Gemini
If you want Gemini included too, I can add it back with `GEMINI_API_KEY` entries as soon as you confirm that key/env is set.
user: [Tue 2026-03-03 09:42 UTC] We have litellm fixed. Can you test it?
assistant: [[reply_to_current]] I tested from here — current status:
- `http://192.168.153.113:18805/v1/models` responds, but returns TTS models (`kokoro`, `tts-1`, etc.), not your LiteLLM model list.
- `http://192.168.153.113:18805/v1/chat/completions` returns `404 Not Found`.
- No local LiteLLM listener appears to be running on this host (`127.0.0.1:4000/8000` not reachable).
So LiteLLM itself is likely on a different port/host (or not started yet).
If you share the exact LiteLLM base URL (e.g. `http://<host>:<port>/v1`), I'll run a real end-to-end test immediately.
user: [Tue 2026-03-03 09:43 UTC]
```
NAME             IMAGE                                     COMMAND                   SERVICE          CREATED       STATUS                    PORTS
brave-search     mcp/brave-search:latest                   "node dist/index.js"      brave-search     2 hours ago   Up 2 hours                0.0.0.0:18802->8000/tcp, [::]:18802->8000/tcp
kokoro-tts       ghcr.io/remsky/kokoro-fastapi-cpu:latest  "./entrypoint.sh"         kokoro-tts       2 hours ago   Up 2 hours                0.0.0.0:18805->8880/tcp, [::]:18805->8880/tcp
litellm          litellm/litellm:latest                    "docker/prod_entrypo…"    litellm          2 hours ago   Up 3 minutes (unhealthy)  0.0.0.0:18804->4000/tcp, [::]:18804->4000/tcp
searxng          searxng/searxng:latest                    "/usr/local/searxng/…"    searxng          2 hours ago   Up 2 hours                0.0.0.0:18803->8080/tcp, [::]:18803->8080/tcp
whisper-server   ghcr.io/ggml-org/whisper.cpp:main         "whisper-server --mo…"    whisper-server   2 hours ago   Up 2 hours (healthy)      0.0.0.0:18801->8080/tcp, [::]:18801->8080/tcp
```
assistant: [[reply_to_current]] Yep — tested. Good news/bad news:
**LiteLLM is reachable** at `:18804` and serving model aliases:
- `gpt-4o`
- `gpt-4-turbo`
- `zai-glm-5`
- `zai-glm-4.7`
- `zai-glm-4.7-flash`
- `zai-glm-4.7-flashx`
**Actual completions fail** (auth upstream):
- `zai-glm-4.7-flash` returns `Authentication parameter not received in Header`
- Health endpoint shows all configured backends unhealthy for missing upstream API keys.
So your LiteLLM config is loaded, but container env is missing provider keys (`ZAI_API_KEY`, and for OpenAI route, `OPENAI_API_KEY` if used upstream directly).
If you want, I can give you a minimal `docker-compose` env block to make it green in one restart.
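A `docker-compose` env block of the kind offered above might look roughly like this (the service name, port mapping, and key names are taken from the session; treating the values as host-environment pass-throughs is an assumption):

```yaml
services:
  litellm:
    image: litellm/litellm:latest
    ports:
      - "18804:4000"
    environment:
      # Upstream provider keys LiteLLM needs to forward requests
      ZAI_API_KEY: ${ZAI_API_KEY}
      OPENAI_API_KEY: ${OPENAI_API_KEY}
```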
user: [Tue 2026-03-03 10:09 UTC] Can you try to hit the litellm API? Here is a virtual key: sk-IlDTV8FaW0vTUL_a4VDhMw
assistant: [[reply_to_current]] Tested with your virtual key.
- `GET /v1/models` ✅ works (returns your configured aliases)
- `POST /v1/chat/completions` ❌ still fails:
- `zai-glm-4.7-flash`: upstream auth missing
- `gpt-4o`: upstream OpenAI API key missing
So your **virtual key is valid for LiteLLM**, but LiteLLM itself still lacks provider creds in its container env.
You need to set (at least):
- `ZAI_API_KEY` (for z.ai models)
- `OPENAI_API_KEY` (for gpt-4o / gpt-4-turbo)
Then restart litellm container and re-test.
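The test described above can be sketched in Python using only the standard library (base URL, port, and virtual key are from this session; the snippet only builds the request, and sending it is shown commented out since it needs the live proxy):

```python
import json
import urllib.request

BASE_URL = "http://192.168.153.113:18804/v1"    # LiteLLM port from `docker ps`
VIRTUAL_KEY = "sk-IlDTV8FaW0vTUL_a4VDhMw"       # virtual key issued by LiteLLM

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against the LiteLLM proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,  # the flash model needed enough tokens to answer
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {VIRTUAL_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending it is a one-liner once the proxy has its upstream keys:
# with urllib.request.urlopen(chat_request("gpt-4o", "Reply with OK")) as r:
#     print(json.loads(r.read())["choices"][0]["message"]["content"])
```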
user: [Tue 2026-03-03 10:41 UTC] ok test now
assistant: [[reply_to_current]] Yep — much better now. I re-tested and it's working.
### Results
- `/v1/models` ✅ returns models (now includes Claude/Gemini/z.ai aliases too).
- `gpt-4o` chat completion ✅ returned `OK`.
- `zai-glm-4.7-flash` chat completion ✅ works too (returned `OK` when given enough tokens).
- `/health` now shows partial recovery: **7 healthy / 4 unhealthy**.
So LiteLLM is up and functional with your virtual key.
If you want, I can run one quick probe to list exactly which 4 models are still unhealthy.
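The quick probe offered above could be sketched like this. It assumes the LiteLLM proxy `/health` payload carries `unhealthy_endpoints` entries with a `model` field (the payload shape is an assumption, and the sample model names below are purely illustrative, not the actual 4 unhealthy models):

```python
def unhealthy_models(health: dict) -> list[str]:
    """List model names from the unhealthy_endpoints of a LiteLLM /health payload."""
    return [ep.get("model", "<unknown>") for ep in health.get("unhealthy_endpoints", [])]

# Illustrative payload mirroring the 7-healthy / 4-unhealthy split reported above
sample = {
    "healthy_count": 7,
    "unhealthy_count": 4,
    "unhealthy_endpoints": [
        {"model": "provider/model-a"},
        {"model": "provider/model-b"},
    ],
}
print(unhealthy_models(sample))
```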
user: [Tue 2026-03-03 11:07 UTC] Test litellm with gpt-5-mini
assistant: [[reply_to_current]] Tested.
`gpt-5-mini` currently fails on your LiteLLM with:
> no healthy deployments for this model (`model=gpt-5-mini`)
So it isn't mapped/configured (or healthy) in your current LiteLLM setup yet.
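If the goal is to expose `gpt-5-mini` through the proxy, a `model_list` entry of roughly this shape would be needed (sketch only; the `openai/` provider prefix and key var are assumptions, consistent with the other OpenAI aliases in this session):

```yaml
model_list:
  - model_name: gpt-5-mini
    litellm_params:
      model: openai/gpt-5-mini
      api_key: os.environ/OPENAI_API_KEY
```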
user: [Tue 2026-03-03 11:08 UTC] Test litellm with openai/gpt-5-mini
assistant: [[reply_to_current]] Tested `openai/gpt-5-mini`.
Result: ❌ fails with:
`OpenAIException - Unknown parameter: 'credential_name'`
So this route is being attempted, but there's a provider-side/request-shape mismatch in the current LiteLLM/OpenAI config for that model.