feat(external-llm): add external LLM integration (fc-004)
Implements external LLM routing via opencode CLI for:
- GitHub Copilot (gpt-5.2, claude-sonnet-4.5, claude-haiku-4.5, o3, gemini-3-pro)
- Z.AI (glm-4.7 for code generation)
- OpenCode native (big-pickle)

Components:
- mcp/llm-router/invoke.py: Main router with task-based model selection
- mcp/llm-router/delegate.py: Agent delegation helper (respects external mode)
- mcp/llm-router/toggle.py: Enable/disable external-only mode
- mcp/llm-router/providers/: CLI wrappers for opencode and gemini

Features:
- Persistent toggle via state/external-mode.json
- Task routing: reasoning -> gpt-5.2, code-gen -> glm-4.7, long-context -> gemini
- Claude tier mapping: opus -> gpt-5.2, sonnet -> claude-sonnet-4.5, haiku -> claude-haiku-4.5
- Session-start hook announces when external mode is active
- Natural language toggle support via component registry

Plan: gleaming-routing-mercury

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
plans/gleaming-routing-mercury.md (new file, 375 lines)

# Plan: External LLM Integration

## Summary

Integrate external LLMs (via subscription-based access) into the agent system for cost optimization, specialized capabilities, and redundancy. Uses `opencode` CLI for Copilot/Z.AI models and `gemini` CLI for Google models.

## Motivation

- **Cost optimization** — Use cheaper models for simple tasks
- **Specialized capabilities** — Access models with unique strengths (GPT-5.2 for reasoning, GLM 4.7 for code)
- **Redundancy** — Fallback when Claude is unavailable

## Design Decisions

| Decision | Choice |
|----------|--------|
| Provider type | Cloud APIs via subscription (not local) |
| Providers | GitHub Copilot, Z.AI, Google Gemini |
| CLIs | `opencode` (Copilot, Z.AI), `gemini` (Google) |
| Integration | Task-specific routing + agent-level assignment |
| Toggle | State file (persists across sessions) |
| Toggle scope | All agents switch when enabled |

## Task Routing

| Task Type | Model |
|-----------|-------|
| Reasoning chains | copilot/gpt-5.2 |
| Code generation | zai/glm-4.7 |
| Long context | gemini/gemini-3-pro |
| General/fallback | copilot/sonnet-4.5 |

## Claude-to-External Mapping

| Claude Tier | External Equivalent |
|-------------|---------------------|
| opus | copilot/gpt-5.2 |
| sonnet | copilot/sonnet-4.5 |
| haiku | copilot/haiku-4.5 |

## Files to Create

### `~/.claude/state/external-mode.json`

```json
{
  "enabled": false,
  "activated_at": null,
  "reason": null
}
```

### `~/.claude/mcp/llm-router/invoke.py`

Main entry point for invoking external LLMs.

```python
#!/usr/bin/env python3
"""
Invoke external LLM via configured provider.

Usage:
    invoke.py --model copilot/gpt-5.2 -p "prompt"
    invoke.py --task reasoning -p "prompt"
"""

import argparse
import json
from pathlib import Path

STATE_DIR = Path.home() / ".claude/state"


def load_policy():
    with open(STATE_DIR / "model-policy.json") as f:
        return json.load(f)


def resolve_model(args, policy):
    if args.model:
        return args.model
    if args.task and args.task in policy["task_routing"]:
        return policy["task_routing"][args.task]
    return policy["task_routing"]["default"]


def invoke(model: str, prompt: str, policy: dict) -> str:
    model_config = policy["external_models"][model]
    cli = model_config["cli"]
    cli_args = model_config["cli_args"]

    if cli == "opencode":
        from providers.opencode import invoke as opencode_invoke
        return opencode_invoke(cli_args, prompt)
    elif cli == "gemini":
        from providers.gemini import invoke as gemini_invoke
        return gemini_invoke(cli_args, prompt)
    else:
        raise ValueError(f"Unknown CLI: {cli}")


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--prompt", required=True, help="Prompt text")
    parser.add_argument("--model", help="Explicit model (e.g., copilot/gpt-5.2)")
    parser.add_argument("--task", help="Task type for routing")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    args = parser.parse_args()

    policy = load_policy()
    model = resolve_model(args, policy)
    # Note: argument order must match invoke()'s signature (model, prompt, policy)
    result = invoke(model, args.prompt, policy)

    if args.json:
        print(json.dumps({"model": model, "response": result}))
    else:
        print(result)


if __name__ == "__main__":
    main()
```

### `~/.claude/mcp/llm-router/providers/opencode.py`

```python
#!/usr/bin/env python3
"""OpenCode CLI wrapper."""

import subprocess


def invoke(cli_args: list, prompt: str) -> str:
    cmd = ["opencode", "--print"] + cli_args + ["-p", prompt]

    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=300
    )

    if result.returncode != 0:
        raise RuntimeError(f"opencode failed: {result.stderr}")

    return result.stdout.strip()
```

### `~/.claude/mcp/llm-router/providers/gemini.py`

```python
#!/usr/bin/env python3
"""Gemini CLI wrapper."""

import subprocess


def invoke(cli_args: list, prompt: str) -> str:
    cmd = ["gemini"] + cli_args + ["-p", prompt]

    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=300
    )

    if result.returncode != 0:
        raise RuntimeError(f"gemini failed: {result.stderr}")

    return result.stdout.strip()
```

### `~/.claude/mcp/llm-router/delegate.py`

```python
#!/usr/bin/env python3
"""
Agent delegation helper. Routes to external or Claude based on mode.

Usage:
    delegate.py --tier sonnet -p "prompt"
"""

import argparse
import json
import subprocess
from pathlib import Path

STATE_DIR = Path.home() / ".claude/state"


def is_external_mode():
    mode_file = STATE_DIR / "external-mode.json"
    if mode_file.exists():
        with open(mode_file) as f:
            return json.load(f).get("enabled", False)
    return False


def delegate(tier: str, prompt: str) -> str:
    if is_external_mode():
        policy = json.loads((STATE_DIR / "model-policy.json").read_text())
        model = policy["claude_to_external_map"][tier]

        result = subprocess.run(
            [str(Path.home() / ".claude/mcp/llm-router/invoke.py"),
             "--model", model, "-p", prompt],
            capture_output=True, text=True
        )
        return result.stdout
    else:
        result = subprocess.run(
            ["claude", "--print", "--model", tier, prompt],
            capture_output=True, text=True
        )
        return result.stdout


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--tier", required=True, choices=["opus", "sonnet", "haiku"])
    parser.add_argument("-p", "--prompt", required=True)
    args = parser.parse_args()

    print(delegate(args.tier, args.prompt))


if __name__ == "__main__":
    main()
```

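The commit message also lists `mcp/llm-router/toggle.py`, which this plan does not sketch. A minimal version, assuming the `external-mode.json` shape defined above (this is a hypothetical sketch, not the committed implementation):

```python
#!/usr/bin/env python3
"""Sketch of toggle.py: enable/disable external-only mode.

Hypothetical implementation; only the state-file shape is taken from the plan.
"""

import json
import sys
from datetime import datetime, timezone
from pathlib import Path

STATE_FILE = Path.home() / ".claude/state/external-mode.json"


def set_mode(enabled: bool, reason=None) -> dict:
    # Record activation time/reason only while enabled; clear them on disable
    state = {
        "enabled": enabled,
        "activated_at": datetime.now(timezone.utc).isoformat() if enabled else None,
        "reason": reason if enabled else None,
    }
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps(state, indent=2))
    return state


if __name__ == "__main__":
    # toggle.py on|off [reason]
    enabled = sys.argv[1] == "on"
    reason = sys.argv[2] if len(sys.argv) > 2 else None
    print(json.dumps(set_mode(enabled, reason)))
```

Because both `delegate.py` and the session-start hook only read `enabled`, the extra fields are purely informational.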
## Files to Modify

### `~/.claude/state/model-policy.json`

Add sections:

```json
{
  "external_models": {
    "copilot/gpt-5.2": {
      "cli": "opencode",
      "cli_args": ["--provider", "copilot", "--model", "gpt-5.2"],
      "use_cases": ["reasoning", "fallback"],
      "tier": "opus-equivalent"
    },
    "copilot/sonnet-4.5": {
      "cli": "opencode",
      "cli_args": ["--provider", "copilot", "--model", "sonnet-4.5"],
      "use_cases": ["general", "fallback"],
      "tier": "sonnet-equivalent"
    },
    "copilot/haiku-4.5": {
      "cli": "opencode",
      "cli_args": ["--provider", "copilot", "--model", "haiku-4.5"],
      "use_cases": ["simple"],
      "tier": "haiku-equivalent"
    },
    "zai/glm-4.7": {
      "cli": "opencode",
      "cli_args": ["--provider", "zai", "--model", "glm-4.7"],
      "use_cases": ["code-generation"],
      "tier": "sonnet-equivalent"
    },
    "gemini/gemini-3-pro": {
      "cli": "gemini",
      "cli_args": ["-m", "gemini-3-pro"],
      "use_cases": ["long-context"],
      "tier": "opus-equivalent"
    }
  },
  "claude_to_external_map": {
    "opus": "copilot/gpt-5.2",
    "sonnet": "copilot/sonnet-4.5",
    "haiku": "copilot/haiku-4.5"
  },
  "task_routing": {
    "reasoning": "copilot/gpt-5.2",
    "code-generation": "zai/glm-4.7",
    "long-context": "gemini/gemini-3-pro",
    "default": "copilot/sonnet-4.5"
  }
}
```

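Both `invoke.py` and `delegate.py` dereference these keys without guards, so a typo in `task_routing` or `claude_to_external_map` surfaces only as a runtime `KeyError`. A small consistency check, assuming the schema above, can catch that earlier:

```python
def validate_policy(policy: dict) -> list:
    """Return entries that point at models missing from external_models."""
    known = set(policy.get("external_models", {}))
    errors = []
    for task, model in policy.get("task_routing", {}).items():
        if model not in known:
            errors.append(f"task_routing[{task}] -> unknown model {model}")
    for tier, model in policy.get("claude_to_external_map", {}).items():
        if model not in known:
            errors.append(f"claude_to_external_map[{tier}] -> unknown model {model}")
    return errors
```

Running this after editing `model-policy.json` (and failing loudly on a non-empty result) would make step 2 of the implementation order safer.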
### `~/.claude/state/component-registry.json`

Add trigger for external mode:

```json
{
  "id": "external-mode-toggle",
  "type": "skill",
  "triggers": [
    "use external", "switch to external", "external models",
    "stop using claude", "external only", "use copilot",
    "back to claude", "use claude again", "disable external"
  ],
  "action": "toggle-external-mode"
}
```

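The plan leaves trigger matching to whatever consumes the registry. One plausible (hypothetical) approach is a case-insensitive substring scan over the trigger phrases:

```python
def match_trigger(message: str, component: dict):
    """Return the component's action if any trigger phrase appears in message.

    Hypothetical matcher; the registry format above doesn't prescribe one.
    """
    text = message.lower()
    for phrase in component.get("triggers", []):
        if phrase in text:
            return component.get("action")
    return None
```

Substring matching keeps the registry entries short but will also fire on phrases embedded in longer sentences, which is likely the intended behavior for natural-language toggles.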
### `~/.claude/hooks/scripts/session-start.sh`

Add external mode announcement:

```bash
# Check external mode
if [ -f ~/.claude/state/external-mode.json ]; then
    if [ "$(jq -r '.enabled' ~/.claude/state/external-mode.json)" = "true" ]; then
        echo "external-mode:enabled"
    fi
fi
```

## Toggle Interface

### Command-based

```
/pa --external on       # Enable external-only mode
/pa --external off      # Disable, return to Claude
/pa --external status   # Show current mode
```

### Natural language

| User says | Action |
|-----------|--------|
| "switch to external models" | Enable |
| "use copilot for everything" | Enable |
| "go back to claude" | Disable |
| "are we using external?" | Status |
| "use external for this" | One-shot (no persist) |

### Visual indicator

When external mode is active, PA prefixes responses:

```
🔌 [External: copilot/gpt-5.2]

<response>
```

## Implementation Order

1. Create `external-mode.json` state file
2. Extend `model-policy.json` with external models
3. Create `llm-router/` directory and scripts
4. Add provider wrappers (opencode, gemini)
5. Create delegation helper
6. Update PA with toggle commands
7. Add component-registry triggers
8. Update session-start hook
9. Test each provider

## Validation

```bash
# Test router directly
~/.claude/mcp/llm-router/invoke.py --model copilot/gpt-5.2 -p "Say hello"

# Test delegation
~/.claude/mcp/llm-router/delegate.py --tier sonnet -p "What is 2+2?"

# Test toggle
/pa --external on
/pa what time is it?
/pa --external off
```

## Rollback

If issues arise:

1. Set `external-mode.json` → `enabled: false`
2. All operations revert to Claude immediately

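Step 1 can be scripted rather than edited by hand; a sketch, assuming `jq` (which the session-start hook already relies on):

```shell
# Disable external mode: jq has no in-place flag, so write via a temp file.
STATE="$HOME/.claude/state/external-mode.json"
if [ -f "$STATE" ]; then
  jq '.enabled = false | .activated_at = null | .reason = null' "$STATE" > "$STATE.tmp" \
    && mv "$STATE.tmp" "$STATE"
fi
```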
## Future Considerations

- Auto-fallback to external when Claude rate-limited
- Cost tracking per external model
- Response quality comparison metrics
- Additional providers (Mistral, local Ollama)