# LLM Routing Guide

Use the right model for the job. Cost and speed matter.

## Available CLIs

| CLI | Auth | Best For |
|---|---|---|
| `claude` | Pro subscription | Complex reasoning, this workspace |
| `opencode` | GitHub Copilot subscription | Code, free Copilot models |
| `gemini` | Google account (free tier available) | Long context, multimodal |

## Model Tiers

### ⚡ Fast & Cheap (Simple Tasks)

```bash
# Quick parsing, extraction, formatting, simple questions
opencode run -m github-copilot/claude-haiku-4.5 "parse this JSON and extract emails"
opencode run -m zai-coding-plan/glm-4.5-flash "summarize in 2 sentences"
gemini -m gemini-2.0-flash "quick question here"
```

**Use for:** Log parsing, data extraction, simple formatting, yes/no questions, summarization
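For log parsing it usually pays to pre-filter locally so only the relevant lines reach the model. A minimal sketch (the `triage_logs` helper and the log paths are hypothetical, and it assumes `opencode run` accepts piped stdin as context, which may vary by version):

```bash
# Hypothetical helper: filter logs locally first, then send only the matching
# lines to a cheap model (smaller context = faster + cheaper).
triage_logs() {
  grep -h 'ERROR' "$@" | tail -n 200 \
    | opencode run -m github-copilot/claude-haiku-4.5 \
        "group these errors by probable root cause"
}
```

Usage: `triage_logs /var/log/app/*.log` — the `grep`/`tail` pre-filter does the volume reduction for free before any tokens are spent.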

### 🔧 Balanced (Standard Work)

```bash
# Code review, analysis, standard coding tasks
opencode run -m github-copilot/claude-sonnet-4.5 "review this code"
opencode run -m github-copilot/gpt-5-mini "explain this error"
gemini -m gemini-2.5-pro "analyze this architecture"
```

**Use for:** Code generation, debugging, analysis, documentation

### 🧠 Powerful (Complex Reasoning)

```bash
# Complex reasoning, multi-step planning, difficult problems
claude -p --model opus "design a system for X"
opencode run -m github-copilot/gpt-5.2 "complex reasoning task"
opencode run -m github-copilot/gemini-3-pro-preview "architectural decision"
```

**Use for:** Architecture decisions, complex debugging, multi-step planning

### 📚 Long Context

```bash
# Large codebases, long documents, big context windows
gemini -m gemini-2.5-pro "analyze this entire codebase" < large_file.txt
opencode run -m github-copilot/gemini-3-pro-preview "summarize all these files"
```

**Use for:** Analyzing large files, long documents, full codebase understanding

## Quick Reference

| Task | Model | CLI Command |
|---|---|---|
| Parse JSON/logs | haiku | `opencode run -m github-copilot/claude-haiku-4.5 "..."` |
| Simple summary | flash | `gemini -m gemini-2.0-flash "..."` |
| Code review | sonnet | `opencode run -m github-copilot/claude-sonnet-4.5 "..."` |
| Write code | codex | `opencode run -m github-copilot/gpt-5.1-codex "..."` |
| Debug complex issue | sonnet/opus | `claude -p --model sonnet "..."` |
| Architecture design | opus | `claude -p --model opus "..."` |
| Analyze large file | gemini-pro | `gemini -m gemini-2.5-pro "..." < file` |
| Quick kubectl help | flash | `opencode run -m zai-coding-plan/glm-4.5-flash "..."` |
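The table above can be folded into a small dispatch helper so scripts pick a tier consistently. The `route_model` function and its task keywords are illustrative conventions, not part of any of the CLIs:

```bash
# Map a task keyword to a model identifier (hypothetical convention).
route_model() {
  case "$1" in
    parse|summary) echo "github-copilot/claude-haiku-4.5" ;;   # cheap tier
    review|code)   echo "github-copilot/claude-sonnet-4.5" ;;  # balanced tier
    design|plan)   echo "opus" ;;                              # for: claude -p --model
    longdoc)       echo "gemini-2.5-pro" ;;                    # for: gemini -m
    *)             echo "github-copilot/claude-haiku-4.5" ;;   # default to cheapest
  esac
}

# Example: opencode run -m "$(route_model review)" "review this PR"
```

Defaulting the fallthrough case to the cheapest model keeps unknown tasks aligned with rule 1 below (start small).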

## Cost Optimization Rules

1. **Start small** — try haiku/flash first, escalate only if needed
2. **Batch similar tasks** — one opus call > five haiku calls for complex work
3. **Use subscriptions** — GitHub Copilot models are "free" with subscription
4. **Cache results** — don't re-ask the same question
5. **Context matters** — smaller context = faster + cheaper
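Rules 1 and 4 can be combined in a tiny wrapper: try the cheap model first, fall back to a stronger one when the cheap answer comes back empty, and cache replies by prompt hash. Everything here (`ask_cheap_first`, the cache directory, the empty-output escalation heuristic) is a sketch under those assumptions, not an established interface:

```bash
# Sketch: cheap-first with escalation and a file cache keyed by prompt hash.
ask_cheap_first() {
  prompt=$1
  dir="${LLM_CACHE_DIR:-$HOME/.llm-cache}"
  key=$(printf '%s' "$prompt" | sha256sum | cut -d' ' -f1)
  mkdir -p "$dir"
  if [ -f "$dir/$key" ]; then   # rule 4: don't re-ask the same question
    cat "$dir/$key"
    return
  fi
  out=$(opencode run -m github-copilot/claude-haiku-4.5 "$prompt")  # rule 1: start small
  if [ -z "$out" ]; then        # crude "insufficient answer" trigger: no output
    out=$(claude -p --model opus "$prompt")
  fi
  printf '%s\n' "$out" | tee "$dir/$key"
}
```

A real escalation trigger would need a better heuristic than "empty output" (for example, a cheap self-check pass), but the cheap-first shape stays the same.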

## Example Workflows

**Triage emails (cheap)**

```bash
opencode run -m github-copilot/claude-haiku-4.5 "categorize these emails as urgent/normal/spam"
```

**Code review (balanced)**

```bash
opencode run -m github-copilot/claude-sonnet-4.5 "review this PR for issues"
```

**Architectural decision (powerful)**

```bash
claude -p --model opus "given these constraints, design the best approach for..."
```

**Summarize long doc (long context)**

```bash
cat huge_document.md | gemini -m gemini-2.5-pro "summarize key points"
```

## For Flynn (Clawdbot)

When spawning sub-agents or doing background work:

- Use `sessions_spawn` with appropriate model hints
- For simple extraction: spawn with default (cheaper model)
- For complex analysis: explicitly request opus

When using `exec` to call CLIs:

- Prefer `opencode run` for one-shot tasks (GitHub Copilot = included)
- Use `claude -p` when you need Claude-specific capabilities
- Use `gemini` for very long context or multimodal

**Principle:** Don't use a sledgehammer to hang a picture.