Add llama-swap local LLM setup
- Config at ~/.config/llama-swap/config.yaml
- Systemd user service (auto-starts)
- 6 models: qwen3, coder, glm, gemma, reasoning, gpt-oss
- Endpoint: http://127.0.0.1:8080
TOOLS.md
@@ -59,10 +59,30 @@ Skills define *how* tools work. This file is for *your* specifics — the stuff
- **K8s Tools:** k9s, kubectl, argocd CLI, krew, kubecolor
- **Containers:** Docker, Podman, Distrobox

### Local AI

- **Ollama:** ✅ running
- **llama-swap:** ✅
- **Models:** Qwen3-4b, Gemma3-4b

### Local AI (llama-swap)

- **Endpoint:** `http://127.0.0.1:8080`
- **Service:** `systemctl --user status llama-swap` (management commands below)
- **Config:** `~/.config/llama-swap/config.yaml`
- **GPU:** RTX 5070 Ti (12GB VRAM)
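A few commands for managing the unit; these are standard systemd user-service operations, using the unit name `llama-swap` from the status command above:

```bash
# Enable the user service so it starts on login, and start it now
systemctl --user enable --now llama-swap

# Check that the proxy is running
systemctl --user status llama-swap

# Follow the service logs
journalctl --user -u llama-swap -f
```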
**Available Models:**

| Alias | Model | Notes |
|-------|-------|-------|
| `qwen3` | Qwen3-30B-A3B | General-purpose MoE, 8k ctx |
| `coder` | Qwen3-Coder-30B-A3B | Code-specialist MoE |
| `glm` | GLM-4.7-Flash | Fast reasoning |
| `gemma` | Gemma-3-12B | Balanced, fits fully in VRAM |
| `reasoning` | Ministral-3-14B-Reasoning | Reasoning specialist |
| `gpt-oss` | GPT-OSS-20B | Experimental |
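Each alias above corresponds to an entry in `config.yaml`. Below is a minimal sketch of what one entry might look like, assuming the `models:` / `cmd:` / `proxy:` layout from llama-swap's documentation; the GGUF path, upstream port, and `ttl` value are placeholders rather than the actual configuration.

```bash
# Sketch only: writes a single hypothetical entry; the real config defines all six models.
mkdir -p ~/.config/llama-swap
cat > ~/.config/llama-swap/config.yaml <<'EOF'
models:
  "gemma":
    # command llama-swap launches on the first request for this alias
    cmd: llama-server --port 9001 -m /path/to/gemma-3-12b.gguf --ctx-size 8192
    # upstream address llama-swap proxies to once the server is ready
    proxy: http://127.0.0.1:9001
    # unload the model after 5 minutes of inactivity (seconds)
    ttl: 300
EOF

# Restart the user service to pick up config changes
systemctl --user restart llama-swap
```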
**Usage:**
```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma", "messages": [{"role": "user", "content": "Hello"}]}'
```
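To check which aliases the proxy currently exposes, the standard OpenAI-compatible model listing should work (assuming llama-swap serves `/v1/models`, and with `jq` installed):

```bash
# List configured model aliases
curl -s http://127.0.0.1:8080/v1/models | jq -r '.data[].id'
```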
**Web UI:** http://127.0.0.1:8080/ui
---