Add llama-swap local LLM setup
- Config at ~/.config/llama-swap/config.yaml
- Systemd user service (auto-starts)
- 6 models: qwen3, coder, glm, gemma, reasoning, gpt-oss
- Endpoint: http://127.0.0.1:8080
TOOLS.md
@@ -59,10 +59,30 @@ Skills define *how* tools work. This file is for *your* specifics — the stuff
- **K8s Tools:** k9s, kubectl, argocd CLI, krew, kubecolor
- **Containers:** Docker, Podman, Distrobox

### Local AI

- **Ollama:** ✅ running
- **llama-swap:** ✅
- **Models:** Qwen3-4b, Gemma3-4b

### Local AI (llama-swap)

- **Endpoint:** `http://127.0.0.1:8080`
- **Service:** `systemctl --user status llama-swap` (management commands below)
- **Config:** `~/.config/llama-swap/config.yaml`
- **GPU:** RTX 5070 Ti (12GB VRAM)
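A few commands for managing the unit; these are standard systemd user-service operations, using the unit name `llama-swap` from the status command above:

```bash
# Enable the user service so it starts on login, and start it now
systemctl --user enable --now llama-swap

# Check that the proxy is running
systemctl --user status llama-swap

# Follow the service logs
journalctl --user -u llama-swap -f
```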
**Available Models:**

| Alias | Model | Notes |
|-------|-------|-------|
| `qwen3` | Qwen3-30B-A3B | General-purpose MoE, 8k ctx |
| `coder` | Qwen3-Coder-30B-A3B | Code-specialist MoE |
| `glm` | GLM-4.7-Flash | Fast reasoning |
| `gemma` | Gemma-3-12B | Balanced, fits fully in VRAM |
| `reasoning` | Ministral-3-14B-Reasoning | Reasoning specialist |
| `gpt-oss` | GPT-OSS-20B | Experimental |
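Each alias above corresponds to an entry in `config.yaml`. Below is a minimal sketch of what one entry might look like, assuming the `models:` / `cmd:` / `proxy:` layout from llama-swap's documentation; the GGUF path, upstream port, and `ttl` value are placeholders rather than the actual configuration.

```bash
# Sketch only: writes a single hypothetical entry; the real config defines all six models.
mkdir -p ~/.config/llama-swap
cat > ~/.config/llama-swap/config.yaml <<'EOF'
models:
  "gemma":
    # command llama-swap launches on the first request for this alias
    cmd: llama-server --port 9001 -m /path/to/gemma-3-12b.gguf --ctx-size 8192
    # upstream address llama-swap proxies to once the server is ready
    proxy: http://127.0.0.1:9001
    # unload the model after 5 minutes of inactivity (seconds)
    ttl: 300
EOF

# Restart the user service to pick up config changes
systemctl --user restart llama-swap
```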
**Usage:**
```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma", "messages": [{"role": "user", "content": "Hello"}]}'
```
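To check which aliases the proxy currently exposes, the standard OpenAI-compatible model listing should work (assuming llama-swap serves `/v1/models`, and with `jq` installed):

```bash
# List configured model aliases
curl -s http://127.0.0.1:8080/v1/models | jq -r '.data[].id'
```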
**Web UI:** http://127.0.0.1:8080/ui
---