From fece6b59c5000176a5c1abfcb876e9caab00b879 Mon Sep 17 00:00:00 2001
From: William Valentin
Date: Mon, 26 Jan 2026 22:19:41 -0800
Subject: [PATCH] Add llama-swap local LLM setup

- Config at ~/.config/llama-swap/config.yaml
- Systemd user service (auto-starts)
- 6 models: qwen3, coder, glm, gemma, reasoning, gpt-oss
- Endpoint: http://127.0.0.1:8080
---
 TOOLS.md | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/TOOLS.md b/TOOLS.md
index 7a87087..75839c6 100644
--- a/TOOLS.md
+++ b/TOOLS.md
@@ -59,10 +59,30 @@ Skills define *how* tools work. This file is for *your* specifics — the stuff
 - **K8s Tools:** k9s, kubectl, argocd CLI, krew, kubecolor
 - **Containers:** Docker, Podman, Distrobox
 
-### Local AI
-- **Ollama:** ✅ running
-- **llama-swap:** ✅
-- **Models:** Qwen3-4b, Gemma3-4b
+### Local AI (llama-swap)
+- **Endpoint:** `http://127.0.0.1:8080`
+- **Service:** `systemctl --user status llama-swap`
+- **Config:** `~/.config/llama-swap/config.yaml`
+- **GPU:** RTX 5070 Ti (16GB VRAM)
+
+**Available Models:**
+| Alias | Model | Notes |
+|-------|-------|-------|
+| `qwen3` | Qwen3-30B-A3B | General-purpose MoE, 8k ctx |
+| `coder` | Qwen3-Coder-30B-A3B | Code-specialist MoE |
+| `glm` | GLM-4.7-Flash | Fast reasoning |
+| `gemma` | Gemma-3-12B | Balanced, fits fully in VRAM |
+| `reasoning` | Ministral-3-14B-Reasoning | Reasoning specialist |
+| `gpt-oss` | GPT-OSS-20B | Experimental |
+
+**Usage:**
+```bash
+curl http://127.0.0.1:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "gemma", "messages": [{"role": "user", "content": "Hello"}]}'
+```
+
+**Web UI:** http://127.0.0.1:8080/ui
 
 ---
 
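
Two sketches that could accompany the usage snippet above. First, day-to-day service checks: llama-swap serves the standard OpenAI-compatible endpoints, so `/v1/models` should enumerate the six aliases from the table (only the unit name and port from the patch are assumed here):

```bash
# Manage the systemd user unit named in the commit message
systemctl --user status llama-swap      # is the proxy running?
systemctl --user restart llama-swap     # pick up edits to config.yaml
journalctl --user -u llama-swap -e      # recent logs, e.g. model load errors

# llama-swap is OpenAI-compatible, so the standard listing endpoint
# should return the configured aliases (qwen3, coder, glm, ...)
curl -s http://127.0.0.1:8080/v1/models
```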
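
Second, the commit references `~/.config/llama-swap/config.yaml` without including it in the diff. A minimal sketch of llama-swap's documented shape (an alias mapped to a `llama-server` command, with `${PORT}` substituted by the proxy at launch): the model path, context size, and `ttl` value below are illustrative assumptions, not values from the commit.

```bash
# Illustrative only: writes a sample under /tmp so the real config is untouched
cat > /tmp/llama-swap-config.example.yaml <<'EOF'
# Hypothetical entry; the actual ~/.config/llama-swap/config.yaml may differ
models:
  gemma:
    cmd: >
      llama-server --port ${PORT}
      -m /path/to/gemma-3-12b-Q4_K_M.gguf
      --ctx-size 8192 --n-gpu-layers 99
    ttl: 300   # unload after 5 idle minutes
EOF
```

With entries like that, the `model` field in the chat request selects which backend llama-swap launches, swapping models in and out of VRAM on demand.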