# LLM Routing Guide

Use the right model for the job. Cost and speed matter.

## Available CLIs
| CLI | Auth | Best For |
|---|---|---|
| `claude` | Pro subscription | Complex reasoning, this workspace |
| `opencode` | GitHub Copilot subscription | Code, free Copilot models |
| `gemini` | Google account (free tier available) | Long context, multimodal |
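Before routing anything, it can help to confirm the three CLIs are actually installed and on `PATH`. A minimal sketch; only `command -v` is used, no CLI-specific flags are assumed:

```bash
#!/usr/bin/env bash
# Verify that each routing CLI is available before dispatching tasks.
for cli in claude opencode gemini; do
  if command -v "$cli" >/dev/null 2>&1; then
    echo "ok:      $cli ($(command -v "$cli"))"
  else
    echo "missing: $cli (install it or skip routes that need it)"
  fi
done
```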
## Model Tiers

### ⚡ Fast & Cheap (Simple Tasks)
```bash
# Quick parsing, extraction, formatting, simple questions
opencode run -m github-copilot/claude-haiku-4.5 "parse this JSON and extract emails"
opencode run -m zai-coding-plan/glm-4.5-flash "summarize in 2 sentences"
gemini -m gemini-2.0-flash "quick question here"
```
**Use for:** Log parsing, data extraction, simple formatting, yes/no questions, summarization

### 🔧 Balanced (Standard Work)
```bash
# Code review, analysis, standard coding tasks
opencode run -m github-copilot/claude-sonnet-4.5 "review this code"
opencode run -m github-copilot/gpt-5-mini "explain this error"
gemini -m gemini-2.5-pro "analyze this architecture"
```
**Use for:** Code generation, debugging, analysis, documentation

### 🧠 Powerful (Complex Reasoning)
```bash
# Complex reasoning, multi-step planning, difficult problems
claude -p --model opus "design a system for X"
opencode run -m github-copilot/gpt-5.2 "complex reasoning task"
opencode run -m github-copilot/gemini-3-pro-preview "architectural decision"
```
**Use for:** Architecture decisions, complex debugging, multi-step planning

### 📚 Long Context
```bash
# Large codebases, long documents, big context windows
gemini -m gemini-2.5-pro "analyze this entire codebase" < large_file.txt
opencode run -m github-copilot/gemini-3-pro-preview "summarize all these files"
```
**Use for:** Analyzing large files, long documents, full codebase understanding

## Quick Reference
| Task | Model | CLI Command |
|---|---|---|
| Parse JSON/logs | haiku | `opencode run -m github-copilot/claude-haiku-4.5 "..."` |
| Simple summary | flash | `gemini -m gemini-2.0-flash "..."` |
| Code review | sonnet | `opencode run -m github-copilot/claude-sonnet-4.5 "..."` |
| Write code | codex | `opencode run -m github-copilot/gpt-5.1-codex "..."` |
| Debug complex issue | sonnet/opus | `claude -p --model sonnet "..."` |
| Architecture design | opus | `claude -p --model opus "..."` |
| Analyze large file | gemini-pro | `gemini -m gemini-2.5-pro "..." < file` |
| Quick kubectl help | flash | `opencode run -m zai-coding-plan/glm-4.5-flash "..."` |
## Cost Optimization Rules
- **Start small:** Try haiku/flash first, escalate only if needed (see the sketch after this list)
- **Batch similar tasks:** For complex work, one opus call beats five haiku calls
- **Use subscriptions:** GitHub Copilot models are "free" with the subscription
- **Cache results:** Don't re-ask the same question
- **Context matters:** Smaller context = faster + cheaper
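A minimal escalation sketch for the "start small" rule, built from the commands above. `titles.txt` and the emptiness check are placeholder assumptions, and it assumes `opencode run` prints the model's reply to stdout:

```bash
#!/usr/bin/env bash
# Start with the cheap tier; escalate only if the answer is unusable.
prompt="categorize these ticket titles as bug/feature/question: $(cat titles.txt)"

# Tier 1: fast & cheap
answer=$(opencode run -m github-copilot/claude-haiku-4.5 "$prompt")

# Placeholder quality gate: empty output. Swap in whatever check fits the task.
if [ -z "$answer" ]; then
  # Tier 2: balanced
  answer=$(opencode run -m github-copilot/claude-sonnet-4.5 "$prompt")
fi

echo "$answer"
```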
## Example Workflows

### Triage emails (cheap)

```bash
opencode run -m github-copilot/claude-haiku-4.5 "categorize these emails as urgent/normal/spam"
```

### Code review (balanced)

```bash
opencode run -m github-copilot/claude-sonnet-4.5 "review this PR for issues"
```

### Architectural decision (powerful)

```bash
claude -p --model opus "given these constraints, design the best approach for..."
```

### Summarize long doc (long context)

```bash
cat huge_document.md | gemini -m gemini-2.5-pro "summarize key points"
```
## For Flynn (Clawdbot)

When spawning sub-agents or doing background work:
- Use `sessions_spawn` with appropriate model hints
- For simple extraction: spawn with the default (cheaper model)
- For complex analysis: explicitly request opus
When using `exec` to call CLIs (see the routing sketch below):

- Prefer `opencode run` for one-shot tasks (GitHub Copilot = included)
- Use `claude -p` when you need Claude-specific capabilities
- Use `gemini` for very long context or multimodal
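A hedged routing helper for `exec` calls, mapping the tiers above onto one function. The tier names and default model choices here are illustrative assumptions, not fixed policy:

```bash
#!/usr/bin/env bash
# route TIER "prompt" - pick a CLI and model by tier, per the guide above.
route() {
  local tier="$1"; shift
  case "$tier" in
    cheap)    opencode run -m github-copilot/claude-haiku-4.5 "$@" ;;
    balanced) opencode run -m github-copilot/claude-sonnet-4.5 "$@" ;;
    powerful) claude -p --model opus "$@" ;;
    long)     gemini -m gemini-2.5-pro "$@" ;;
    *)        echo "unknown tier: $tier" >&2; return 1 ;;
  esac
}

# Usage:
#   route cheap "categorize these emails as urgent/normal/spam"
#   route long  "summarize key points" < huge_document.md
```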
**Principle:** Don't use a sledgehammer to hang a picture.