swarm-zap/memory/incidents/2026-03-03-litellm-test.md


Session: 2026-03-03 18:00:45 UTC

  • Session Key: agent:main:main
  • Session ID: fe1b7054-41fb-462f-8177-f4e8998f988b
  • Source: gateway:sessions.reset

Conversation Summary

assistant: reply_to_current Done — updated.

What's now in litellm-config.yaml

  • Gemini was already present:
    • gemini-1.5-pro
    • gemini-1.5-flash
  • z.ai added:
    • zai-glm-4.7
    • zai-glm-4.7-flash
    • both using:
      • api_base: https://api.z.ai/api/coding/paas/v4
      • api_key: os.environ/ZAI_API_KEY
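In LiteLLM's model_list schema, those entries would look roughly like this (a sketch of the likely shape, not the exact file contents; the openai/ prefix routes through LiteLLM's OpenAI-compatible adapter, which is one common way to wire a custom api_base):

```yaml
model_list:
  - model_name: zai-glm-4.7
    litellm_params:
      model: openai/glm-4.7            # assumed upstream model id
      api_base: https://api.z.ai/api/coding/paas/v4
      api_key: os.environ/ZAI_API_KEY  # resolved from the container environment
  - model_name: zai-glm-4.7-flash
    litellm_params:
      model: openai/glm-4.7-flash
      api_base: https://api.z.ai/api/coding/paas/v4
      api_key: os.environ/ZAI_API_KEY
```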

If you want, I can also add newer Gemini aliases (e.g. 2.x) and/or switch the Gemini key var from GOOGLE_API_KEY to GEMINI_API_KEY if that's what your env uses.

user: [Tue 2026-03-03 09:37 UTC] Update the file accordingly based on the currently configured providers

assistant: reply_to_current Done — I updated litellm-config.yaml to match the providers currently configured on this host.

Kept (configured)

  • OpenAI-compatible local endpoint (OPENAI_BASE_URL + OPENAI_API_KEY)
  • z.ai (ZAI_API_KEY, base https://api.z.ai/api/coding/paas/v4)
    • glm-5
    • glm-4.7
    • glm-4.7-flash
    • glm-4.7-flashx

Removed (not currently configured)

  • Anthropic
  • OpenRouter
  • Opencode
  • Gemini

If you want Gemini included too, I can add it back with GEMINI_API_KEY entries as soon as you confirm that key/env is set.

user: [Tue 2026-03-03 09:42 UTC] We have litellm fixed. Can you test it?

assistant: reply_to_current I tested from here — current status:

  • http://192.168.153.113:18805/v1/models responds, but returns TTS models (kokoro, tts-1, etc.), not your LiteLLM model list.
  • http://192.168.153.113:18805/v1/chat/completions returns 404 Not Found.
  • No local LiteLLM listener appears to be running on this host (127.0.0.1:4000/8000 not reachable).

So LiteLLM itself is likely on a different port/host (or not started yet).

If you share the exact LiteLLM base URL (e.g. http://<host>:<port>/v1), I'll run a real end-to-end test immediately.

user: [Tue 2026-03-03 09:43 UTC]

```
NAME             IMAGE                                      COMMAND                  SERVICE          CREATED       STATUS                     PORTS
brave-search     mcp/brave-search:latest                    "node dist/index.js"     brave-search     2 hours ago   Up 2 hours                 0.0.0.0:18802->8000/tcp, [::]:18802->8000/tcp
kokoro-tts       ghcr.io/remsky/kokoro-fastapi-cpu:latest   "./entrypoint.sh"        kokoro-tts       2 hours ago   Up 2 hours                 0.0.0.0:18805->8880/tcp, [::]:18805->8880/tcp
litellm          litellm/litellm:latest                     "docker/prod_entrypo…"   litellm          2 hours ago   Up 3 minutes (unhealthy)   0.0.0.0:18804->4000/tcp, [::]:18804->4000/tcp
searxng          searxng/searxng:latest                     "/usr/local/searxng/…"   searxng          2 hours ago   Up 2 hours                 0.0.0.0:18803->8080/tcp, [::]:18803->8080/tcp
whisper-server   ghcr.io/ggml-org/whisper.cpp:main          "whisper-server --mo…"   whisper-server   2 hours ago   Up 2 hours (healthy)       0.0.0.0:18801->8080/tcp, [::]:18801->8080/tcp
```

assistant: reply_to_current Yep — tested. Good news/bad news:

LiteLLM is reachable at :18804 and serving model aliases:

  • gpt-4o
  • gpt-4-turbo
  • zai-glm-5
  • zai-glm-4.7
  • zai-glm-4.7-flash
  • zai-glm-4.7-flashx
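The :18804 in play here comes straight from the docker ps PORTS column (0.0.0.0:18804->4000/tcp, i.e. host port 18804 mapped to LiteLLM's in-container 4000). A throwaway sketch of deriving the v1 base URL from such a field — base_url_from_ports is a hypothetical helper, not part of any tool shown here:

```python
import re

def base_url_from_ports(host: str, ports: str) -> str:
    """Derive the host-side /v1 base URL from a `docker ps` PORTS field.

    Example field: "0.0.0.0:18804->4000/tcp, [::]:18804->4000/tcp"
    """
    # Grab the first published host port of a TCP mapping.
    match = re.search(r":(\d+)->\d+/tcp", ports)
    if match is None:
        raise ValueError(f"no published TCP port in: {ports!r}")
    return f"http://{host}:{match.group(1)}/v1"

print(base_url_from_ports("192.168.153.113",
                          "0.0.0.0:18804->4000/tcp, [::]:18804->4000/tcp"))
# http://192.168.153.113:18804/v1
```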

Actual completions fail (auth upstream):

  • zai-glm-4.7-flash returns "Authentication parameter not received in Header"
  • The health endpoint shows all configured backends unhealthy due to missing upstream API keys.

So your LiteLLM config is loaded, but the container environment is missing provider keys (ZAI_API_KEY, and OPENAI_API_KEY if the OpenAI route calls upstream directly).

If you want, I can give you a minimal docker-compose env block to make it green in one restart.

user: [Tue 2026-03-03 10:09 UTC] Can you try to hit the litellm API? Here is a virtual key: sk-IlDTV8FaW0vTUL_a4VDhMw

assistant: reply_to_current Tested with your virtual key.

  • GET /v1/models works (returns your configured aliases)
  • POST /v1/chat/completions still fails:
    • zai-glm-4.7-flash: upstream auth missing
    • gpt-4o: upstream OpenAI API key missing
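For reference, the call being tested is the standard OpenAI-compatible chat completions POST that LiteLLM proxies. A minimal sketch of the request shape, built but not sent (base URL is the one above; the key is a placeholder):

```python
import json
from urllib.request import Request

def chat_request(base_url: str, virtual_key: str, model: str, prompt: str) -> Request:
    """Build (but do not send) a POST to LiteLLM's /v1/chat/completions."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }).encode()
    return Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            # LiteLLM virtual key goes here, not a provider key.
            "Authorization": f"Bearer {virtual_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("http://192.168.153.113:18804", "sk-...",
                   "zai-glm-4.7-flash", "ping")
print(req.full_url)  # http://192.168.153.113:18804/v1/chat/completions
```

Passing this Request to urllib.request.urlopen would perform the actual test once the host is reachable.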

So your virtual key is valid for LiteLLM, but LiteLLM itself still lacks provider creds in its container env.

You need to set (at least):

  • ZAI_API_KEY (for z.ai models)
  • OPENAI_API_KEY (for gpt-4o / gpt-4-turbo)
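The wiring those two bullets describe could look like this in docker-compose — a sketch only: image and port mapping are taken from the docker ps output above, and the values are assumed to come from the host env or an .env file:

```yaml
services:
  litellm:
    image: litellm/litellm:latest
    ports:
      - "18804:4000"
    environment:
      - ZAI_API_KEY=${ZAI_API_KEY}        # z.ai upstream (glm-* models)
      - OPENAI_API_KEY=${OPENAI_API_KEY}  # gpt-4o / gpt-4-turbo upstream
```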

Then restart the litellm container and re-test.

user: [Tue 2026-03-03 10:41 UTC] ok test now

assistant: reply_to_current Yep — much better now. I re-tested and it's working.

Results

  • /v1/models returns models (now includes Claude/Gemini/z.ai aliases too).
  • gpt-4o chat completion returned OK.
  • zai-glm-4.7-flash chat completion works too (returned OK when given enough tokens).
  • /health now shows partial recovery: 7 healthy / 4 unhealthy.
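LiteLLM's /health responds with lists of healthy and unhealthy endpoints; a quick offline sketch of tallying such a response (the sample payload below is made up for illustration, not the real 7/4 output):

```python
import json

def tally_health(raw: str) -> tuple[int, int]:
    """Count (healthy, unhealthy) endpoints in a LiteLLM /health response body."""
    payload = json.loads(raw)
    return (len(payload.get("healthy_endpoints", [])),
            len(payload.get("unhealthy_endpoints", [])))

# Illustrative payload only — model names are placeholders.
sample = json.dumps({
    "healthy_endpoints": [{"model": "gpt-4o"}, {"model": "zai-glm-4.7-flash"}],
    "unhealthy_endpoints": [{"model": "gemini-1.5-pro"}],
})
print(tally_health(sample))  # (2, 1)
```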

So LiteLLM is up and functional with your virtual key.
If you want, I can run one quick probe to list exactly which 4 models are still unhealthy.

user: [Tue 2026-03-03 11:07 UTC] Test litellm with gpt-5-mini

assistant: reply_to_current Tested.

gpt-5-mini currently fails on your LiteLLM with:

no healthy deployments for this model (model=gpt-5-mini)

So it isn't mapped/configured (or healthy) in your current LiteLLM setup yet.

user: [Tue 2026-03-03 11:08 UTC] Test litellm with openai/gpt-5-mini

assistant: reply_to_current Tested openai/gpt-5-mini.

Result: fails with:

OpenAIException - Unknown parameter: 'credential_name'

So this route is being attempted, but there's a provider-side/request-shape mismatch in the current LiteLLM/OpenAI config for that model.