Add llama-swap local LLM setup

- Config at ~/.config/llama-swap/config.yaml
- Systemd user service (auto-starts)
- 6 models: qwen3, coder, glm, gemma, reasoning, gpt-oss
- Endpoint: http://127.0.0.1:8080
Author: William Valentin
Date: 2026-01-26 22:19:41 -08:00
parent f9111eea11
commit fece6b59c5

@@ -59,10 +59,30 @@ Skills define *how* tools work. This file is for *your* specifics — the stuff
- **K8s Tools:** k9s, kubectl, argocd CLI, krew, kubecolor
- **Containers:** Docker, Podman, Distrobox
### Local AI
- **Ollama:** ✅ running
- **llama-swap:** ✅
- **Models:** Qwen3-4b, Gemma3-4b
### Local AI (llama-swap)
- **Endpoint:** `http://127.0.0.1:8080`
- **Service:** `systemctl --user status llama-swap`
- **Config:** `~/.config/llama-swap/config.yaml` (entry shape sketched below)
- **GPU:** RTX 5070 Ti (12GB VRAM)
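Each alias in the table below maps to a `llama-server` command in that config. A minimal sketch of what one entry might look like, assuming llama-swap's documented `models:`/`cmd:` layout and its `${PORT}` placeholder; the gguf path, context size, and `-ngl` value are illustrative, not the real config contents:

```bash
# Hypothetical config entry, written to a scratch file so nothing real is touched;
# the model path and flags below are placeholders.
cat > /tmp/llama-swap-example.yaml <<'EOF'
models:
  "gemma":
    cmd: >
      llama-server --port ${PORT}
      -m /path/to/gemma-3-12b.Q4_K_M.gguf
      -c 8192 -ngl 99
EOF

# After editing the real config, restart the user service and watch the logs:
systemctl --user restart llama-swap
journalctl --user -u llama-swap -f
```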
**Available Models:**

| Alias | Model | Notes |
|-------|-------|-------|
| `qwen3` | Qwen3-30B-A3B | General purpose MoE, 8k ctx |
| `coder` | Qwen3-Coder-30B-A3B | Code specialist MoE |
| `glm` | GLM-4.7-Flash | Fast reasoning |
| `gemma` | Gemma-3-12B | Balanced, fits fully in VRAM |
| `reasoning` | Ministral-3-14B-Reasoning | Reasoning specialist |
| `gpt-oss` | GPT-OSS-20B | Experimental |
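To check which aliases the proxy is currently advertising, the OpenAI-compatible model listing can be queried; this assumes llama-swap exposes `/v1/models` alongside the chat endpoint, and `jq` is only used to pull out the ids:

```bash
# List the configured model aliases from the OpenAI-compatible endpoint
curl -s http://127.0.0.1:8080/v1/models | jq -r '.data[].id'
```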
**Usage:**
```bash
curl http://127.0.0.1:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gemma", "messages": [{"role": "user", "content": "Hello"}]}'
```
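Switching models is just a matter of naming a different alias: llama-swap is expected to stop the running llama-server instance and launch the one mapped to the requested alias, so the first call after a swap is slower while the weights load. For example:

```bash
# Same endpoint, different alias -- the first request triggers the model swap
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "coder", "messages": [{"role": "user", "content": "Write a one-line shell command that counts files in the current directory"}]}'
```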
**Web UI:** http://127.0.0.1:8080/ui

---