# LLM Routing Guide

Use the right model for the job. Cost and speed matter.

## Available CLIs

| CLI | Auth | Best For |
|-----|------|----------|
| `claude` | Pro subscription | Complex reasoning, this workspace |
| `opencode` | GitHub Copilot subscription | Code, free Copilot models |
| `gemini` | Google account (free tier available) | Long context, multimodal |

## Model Tiers

### ⚡ Fast & Cheap (Simple Tasks)
```bash
# Quick parsing, extraction, formatting, simple questions
opencode run -m github-copilot/claude-haiku-4.5 "parse this JSON and extract emails"
opencode run -m zai-coding-plan/glm-4.5-flash "summarize in 2 sentences"
gemini -m gemini-2.0-flash "quick question here"
```

**Use for:** Log parsing, data extraction, simple formatting, yes/no questions, summarization

### 🔧 Balanced (Standard Work)
```bash
# Code review, analysis, standard coding tasks
opencode run -m github-copilot/claude-sonnet-4.5 "review this code"
opencode run -m github-copilot/gpt-5-mini "explain this error"
gemini -m gemini-2.5-pro "analyze this architecture"
```

**Use for:** Code generation, debugging, analysis, documentation

### 🧠 Powerful (Complex Reasoning)
```bash
# Complex reasoning, multi-step planning, difficult problems
claude -p --model opus "design a system for X"
opencode run -m github-copilot/gpt-5.2 "complex reasoning task"
opencode run -m github-copilot/gemini-3-pro-preview "architectural decision"
```

**Use for:** Architecture decisions, complex debugging, multi-step planning

### 📚 Long Context
```bash
# Large codebases, long documents, big context windows
gemini -m gemini-2.5-pro "analyze this entire codebase" < large_file.txt
opencode run -m github-copilot/gemini-3-pro-preview "summarize all these files"
```

**Use for:** Analyzing large files, long documents, full codebase understanding

## Quick Reference

| Task | Model | CLI Command |
|------|-------|-------------|
| Parse JSON/logs | haiku | `opencode run -m github-copilot/claude-haiku-4.5 "..."` |
| Simple summary | flash | `gemini -m gemini-2.0-flash "..."` |
| Code review | sonnet | `opencode run -m github-copilot/claude-sonnet-4.5 "..."` |
| Write code | codex | `opencode run -m github-copilot/gpt-5.1-codex "..."` |
| Debug complex issue | sonnet/opus | `claude -p --model sonnet "..."` |
| Architecture design | opus | `claude -p --model opus "..."` |
| Analyze large file | gemini-pro | `gemini -m gemini-2.5-pro "..." < file` |
| Quick kubectl help | flash | `opencode run -m zai-coding-plan/glm-4.5-flash "..."` |

## Cost Optimization Rules

1. **Start small:** Try haiku/flash first and escalate only if the answer is not good enough.
2. **Batch similar tasks:** For complex work, one opus call beats five haiku calls.
3. **Use subscriptions:** GitHub Copilot models are "free" in the sense that the subscription already covers them.
4. **Cache results:** Don't re-ask the same question.
5. **Context matters:** A smaller context is faster and cheaper.
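Rules 1 and 4 above can be sketched as two shell functions. This is an illustration under assumptions, not part of any of the CLIs: `cached_ask`, `ask_with_escalation`, and the `LLM_CACHE_DIR` variable are hypothetical names, and `sha256sum` is assumed to be available (GNU coreutils; on macOS `shasum -a 256` would be the usual substitute). The cache key covers only the command line, so anything piped in on stdin is not part of the key. "Good enough" is reduced here to "the cheap call succeeded and printed something"; a real check would depend on the task.

```bash
#!/usr/bin/env bash
# Hypothetical cache location; override by exporting LLM_CACHE_DIR.
LLM_CACHE_DIR="${LLM_CACHE_DIR:-$HOME/.cache/llm-routing}"

# Rule 4: run a command once per distinct argument list, then replay its output.
cached_ask() {
  local key file
  mkdir -p "$LLM_CACHE_DIR"
  # Hash the whole command line (CLI, model, prompt) to build the cache key.
  key="$(printf '%s\0' "$@" | sha256sum | cut -d' ' -f1)"
  file="$LLM_CACHE_DIR/$key"
  if [ ! -f "$file" ]; then
    # Write to a temp file first so a failed call leaves no cache entry.
    "$@" > "$file.tmp" || { rm -f "$file.tmp"; return 1; }
    mv "$file.tmp" "$file"
  fi
  cat "$file"
}

# Rule 1: try the cheap tier first, fall back to the powerful tier
# if the cheap call fails or prints nothing.
ask_with_escalation() {
  local prompt="$1" out
  out="$(cached_ask opencode run -m github-copilot/claude-haiku-4.5 "$prompt")" || out=""
  if [ -n "$out" ]; then
    printf '%s\n' "$out"
    return 0
  fi
  cached_ask claude -p --model opus "$prompt"
}
```

Calling `ask_with_escalation "explain this error"` twice with the same prompt would hit the model once and read the second answer from the cache directory.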

## Example Workflows

### Triage emails (cheap)
```bash
opencode run -m github-copilot/claude-haiku-4.5 "categorize these emails as urgent/normal/spam"
```

### Code review (balanced)
```bash
opencode run -m github-copilot/claude-sonnet-4.5 "review this PR for issues"
```

### Architectural decision (powerful)
```bash
claude -p --model opus "given these constraints, design the best approach for..."
```

### Summarize long doc (long context)
```bash
gemini -m gemini-2.5-pro "summarize key points" < huge_document.md
```
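Choosing between the cheap tier and the long-context tier can be based on input size. The sketch below is one possible approach: `pick_model_for_file` and `LONG_CONTEXT_BYTES` are hypothetical names, and the 200000-byte default is an arbitrary placeholder, not a documented limit of any model. The function only prints the suggested command prefix and does not call a model.

```bash
#!/usr/bin/env bash
# Hypothetical threshold in bytes; tune it to the models actually in use.
LONG_CONTEXT_BYTES="${LONG_CONTEXT_BYTES:-200000}"

# Print the command prefix suited to the size of the given file.
pick_model_for_file() {
  local file="$1" size
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  # wc -c counts bytes; tr strips the padding some wc versions add.
  size="$(wc -c < "$file" | tr -d ' ')"
  if [ "$size" -gt "$LONG_CONTEXT_BYTES" ]; then
    echo "gemini -m gemini-2.5-pro"
  else
    echo "opencode run -m github-copilot/claude-haiku-4.5"
  fi
}
```

With the default threshold, a short note would be routed to the haiku command and a multi-megabyte document to `gemini-2.5-pro`.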

## For Flynn (Clawdbot)

When spawning sub-agents or doing background work:
- Use `sessions_spawn` with appropriate model hints
- For simple extraction: spawn with default (cheaper model)
- For complex analysis: explicitly request opus

When using exec to call CLIs:
- Prefer `opencode run` for one-shot tasks (covered by the GitHub Copilot subscription)
- Use `claude -p` when you need Claude-specific capabilities
- Use `gemini` for very long context or multimodal input

---

*Principle: Don't use a sledgehammer to hang a picture.*