From fece6b59c5000176a5c1abfcb876e9caab00b879 Mon Sep 17 00:00:00 2001
From: William Valentin
Date: Mon, 26 Jan 2026 22:19:41 -0800
Subject: [PATCH] Add llama-swap local LLM setup

- Config at ~/.config/llama-swap/config.yaml
- Systemd user service (auto-starts)
- 6 models: qwen3, coder, glm, gemma, reasoning, gpt-oss
- Endpoint: http://127.0.0.1:8080
---
 TOOLS.md | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/TOOLS.md b/TOOLS.md
index 7a87087..75839c6 100644
--- a/TOOLS.md
+++ b/TOOLS.md
@@ -59,10 +59,30 @@ Skills define *how* tools work. This file is for *your* specifics — the stuff
 - **K8s Tools:** k9s, kubectl, argocd CLI, krew, kubecolor
 - **Containers:** Docker, Podman, Distrobox
 
-### Local AI
-- **Ollama:** ✅ running
-- **llama-swap:** ✅
-- **Models:** Qwen3-4b, Gemma3-4b
+### Local AI (llama-swap)
+- **Endpoint:** `http://127.0.0.1:8080`
+- **Service:** `systemctl --user status llama-swap`
+- **Config:** `~/.config/llama-swap/config.yaml`
+- **GPU:** RTX 5070 Ti (16GB VRAM)
+
+**Available Models:**
+| Alias | Model | Notes |
+|-------|-------|-------|
+| `qwen3` | Qwen3-30B-A3B | General-purpose MoE, 8k ctx |
+| `coder` | Qwen3-Coder-30B-A3B | Code-specialist MoE |
+| `glm` | GLM-4.7-Flash | Fast reasoning |
+| `gemma` | Gemma-3-12B | Balanced, fits fully in VRAM |
+| `reasoning` | Ministral-3-14B-Reasoning | Reasoning specialist |
+| `gpt-oss` | GPT-OSS-20B | Experimental |
+
+**Usage:**
+```bash
+curl http://127.0.0.1:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "gemma", "messages": [{"role": "user", "content": "Hello"}]}'
+```
+
+**Web UI:** http://127.0.0.1:8080/ui
 
 ---
 
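
Two sketches that could accompany the usage snippet above. First, day-to-day service checks: llama-swap serves the standard OpenAI-compatible endpoints, so `/v1/models` should enumerate the six aliases from the table (only the unit name and port from the patch are assumed here):

```bash
# Manage the systemd user unit named in the commit message
systemctl --user status llama-swap      # is the proxy running?
systemctl --user restart llama-swap     # pick up edits to config.yaml
journalctl --user -u llama-swap -e      # recent logs, e.g. model load errors

# llama-swap is OpenAI-compatible, so the standard listing endpoint
# should return the configured aliases (qwen3, coder, glm, ...)
curl -s http://127.0.0.1:8080/v1/models
```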
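
Second, the commit references `~/.config/llama-swap/config.yaml` without including it in the diff. A minimal sketch of llama-swap's documented shape (an alias mapped to a `llama-server` command, with `${PORT}` substituted by the proxy at launch): the model path, context size, and `ttl` value below are illustrative assumptions, not values from the commit.

```bash
# Illustrative only: writes a sample under /tmp so the real config is untouched
cat > /tmp/llama-swap-config.example.yaml <<'EOF'
# Hypothetical entry; the actual ~/.config/llama-swap/config.yaml may differ
models:
  gemma:
    cmd: >
      llama-server --port ${PORT}
      -m /path/to/gemma-3-12b-Q4_K_M.gguf
      --ctx-size 8192 --n-gpu-layers 99
    ttl: 300   # unload after 5 idle minutes
EOF
```

With entries like that, the `model` field in the chat request selects which backend llama-swap launches, swapping models in and out of VRAM on demand.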