# Plan: External LLM Integration

## Summary

Integrate external LLMs (via subscription-based access) into the agent system for cost optimization, specialized capabilities, and redundancy. Uses the `opencode` CLI for Copilot/Z.AI models and the `gemini` CLI for Google models.

## Motivation

- **Cost optimization** — Use cheaper models for simple tasks
- **Specialized capabilities** — Access models with unique strengths (GPT-5.2 for reasoning, GLM 4.7 for code)
- **Redundancy** — Fall back when Claude is unavailable

## Design Decisions

| Decision | Choice |
|----------|--------|
| Provider type | Cloud APIs via subscription (not local) |
| Providers | GitHub Copilot, Z.AI, Google Gemini |
| CLIs | `opencode` (Copilot, Z.AI), `gemini` (Google) |
| Integration | Task-specific routing + agent-level assignment |
| Toggle | State file (persists across sessions) |
| Toggle scope | All agents switch when enabled |

## Task Routing

| Task Type | Model |
|-----------|-------|
| Reasoning chains | copilot/gpt-5.2 |
| Code generation | zai/glm-4.7 |
| Long context | gemini/gemini-3-pro |
| General/fallback | copilot/sonnet-4.5 |

## Claude-to-External Mapping

| Claude Tier | External Equivalent |
|-------------|---------------------|
| opus | copilot/gpt-5.2 |
| sonnet | copilot/sonnet-4.5 |
| haiku | copilot/haiku-4.5 |

## Files to Create

### `~/.claude/state/external-mode.json`

```json
{
  "enabled": false,
  "activated_at": null,
  "reason": null
}
```

### `~/.claude/mcp/llm-router/invoke.py`

Main entry point for invoking external LLMs.

```python
#!/usr/bin/env python3
"""
Invoke external LLM via configured provider.

Usage:
    invoke.py --model copilot/gpt-5.2 -p "prompt"
    invoke.py --task reasoning -p "prompt"
"""
import argparse
import json
from pathlib import Path

STATE_DIR = Path.home() / ".claude/state"


def load_policy():
    with open(STATE_DIR / "model-policy.json") as f:
        return json.load(f)


def resolve_model(args, policy):
    """Explicit --model wins; then task routing; then the default route."""
    if args.model:
        return args.model
    if args.task and args.task in policy["task_routing"]:
        return policy["task_routing"][args.task]
    return policy["task_routing"]["default"]


def invoke(model: str, prompt: str, policy: dict) -> str:
    model_config = policy["external_models"][model]
    cli = model_config["cli"]
    cli_args = model_config["cli_args"]
    if cli == "opencode":
        from providers.opencode import invoke as opencode_invoke
        return opencode_invoke(cli_args, prompt)
    elif cli == "gemini":
        from providers.gemini import invoke as gemini_invoke
        return gemini_invoke(cli_args, prompt)
    else:
        raise ValueError(f"Unknown CLI: {cli}")


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--prompt", required=True, help="Prompt text")
    parser.add_argument("--model", help="Explicit model (e.g., copilot/gpt-5.2)")
    parser.add_argument("--task", help="Task type for routing")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    args = parser.parse_args()

    policy = load_policy()
    model = resolve_model(args, policy)
    result = invoke(model, args.prompt, policy)

    if args.json:
        print(json.dumps({"model": model, "response": result}))
    else:
        print(result)


if __name__ == "__main__":
    main()
```

### `~/.claude/mcp/llm-router/providers/opencode.py`

```python
#!/usr/bin/env python3
"""OpenCode CLI wrapper."""
import subprocess


def invoke(cli_args: list, prompt: str) -> str:
    cmd = ["opencode", "--print"] + cli_args + ["-p", prompt]
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=300
    )
    if result.returncode != 0:
        raise RuntimeError(f"opencode failed: {result.stderr}")
    return result.stdout.strip()
```
### `~/.claude/mcp/llm-router/providers/gemini.py`

```python
#!/usr/bin/env python3
"""Gemini CLI wrapper."""
import subprocess


def invoke(cli_args: list, prompt: str) -> str:
    cmd = ["gemini"] + cli_args + ["-p", prompt]
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=300
    )
    if result.returncode != 0:
        raise RuntimeError(f"gemini failed: {result.stderr}")
    return result.stdout.strip()
```

### `~/.claude/mcp/llm-router/delegate.py`

```python
#!/usr/bin/env python3
"""
Agent delegation helper. Routes to external or Claude based on mode.

Usage:
    delegate.py --tier sonnet -p "prompt"
"""
import argparse
import json
import subprocess
from pathlib import Path

STATE_DIR = Path.home() / ".claude/state"


def is_external_mode():
    mode_file = STATE_DIR / "external-mode.json"
    if mode_file.exists():
        with open(mode_file) as f:
            return json.load(f).get("enabled", False)
    return False


def delegate(tier: str, prompt: str) -> str:
    if is_external_mode():
        policy = json.loads((STATE_DIR / "model-policy.json").read_text())
        model = policy["claude_to_external_map"][tier]
        result = subprocess.run(
            [str(Path.home() / ".claude/mcp/llm-router/invoke.py"),
             "--model", model, "-p", prompt],
            capture_output=True, text=True
        )
        return result.stdout
    else:
        result = subprocess.run(
            ["claude", "--print", "--model", tier, prompt],
            capture_output=True, text=True
        )
        return result.stdout


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--tier", required=True,
                        choices=["opus", "sonnet", "haiku"])
    parser.add_argument("-p", "--prompt", required=True)
    args = parser.parse_args()
    print(delegate(args.tier, args.prompt))


if __name__ == "__main__":
    main()
```

## Files to Modify

### `~/.claude/state/model-policy.json`

Add sections:

```json
{
  "external_models": {
    "copilot/gpt-5.2": {
      "cli": "opencode",
      "cli_args": ["--provider", "copilot", "--model", "gpt-5.2"],
      "use_cases": ["reasoning", "fallback"],
      "tier": "opus-equivalent"
    },
    "copilot/sonnet-4.5": {
      "cli": "opencode",
      "cli_args":
        ["--provider", "copilot", "--model", "sonnet-4.5"],
      "use_cases": ["general", "fallback"],
      "tier": "sonnet-equivalent"
    },
    "copilot/haiku-4.5": {
      "cli": "opencode",
      "cli_args": ["--provider", "copilot", "--model", "haiku-4.5"],
      "use_cases": ["simple"],
      "tier": "haiku-equivalent"
    },
    "zai/glm-4.7": {
      "cli": "opencode",
      "cli_args": ["--provider", "zai", "--model", "glm-4.7"],
      "use_cases": ["code-generation"],
      "tier": "sonnet-equivalent"
    },
    "gemini/gemini-3-pro": {
      "cli": "gemini",
      "cli_args": ["-m", "gemini-3-pro"],
      "use_cases": ["long-context"],
      "tier": "opus-equivalent"
    }
  },
  "claude_to_external_map": {
    "opus": "copilot/gpt-5.2",
    "sonnet": "copilot/sonnet-4.5",
    "haiku": "copilot/haiku-4.5"
  },
  "task_routing": {
    "reasoning": "copilot/gpt-5.2",
    "code-generation": "zai/glm-4.7",
    "long-context": "gemini/gemini-3-pro",
    "default": "copilot/sonnet-4.5"
  }
}
```

### `~/.claude/state/component-registry.json`

Add a trigger for external mode:

```json
{
  "id": "external-mode-toggle",
  "type": "skill",
  "triggers": [
    "use external",
    "switch to external",
    "external models",
    "stop using claude",
    "external only",
    "use copilot",
    "back to claude",
    "use claude again",
    "disable external"
  ],
  "action": "toggle-external-mode"
}
```

### `~/.claude/hooks/scripts/session-start.sh`

Add an external mode announcement:

```bash
# Check external mode
if [ -f ~/.claude/state/external-mode.json ]; then
  if [ "$(jq -r '.enabled' ~/.claude/state/external-mode.json)" = "true" ]; then
    echo "external-mode:enabled"
  fi
fi
```

## Toggle Interface

### Command-based

```
/pa --external on      # Enable external-only mode
/pa --external off     # Disable, return to Claude
/pa --external status  # Show current mode
```

### Natural language

| User says | Action |
|-----------|--------|
| "switch to external models" | Enable |
| "use copilot for everything" | Enable |
| "go back to claude" | Disable |
| "are we using external?" | Status |
| "use external for this" | One-shot (no persist) |

### Visual indicator

When external mode is active, PA prefixes responses with:

```
🔌 [External: copilot/gpt-5.2]
```

## Implementation Order

1. Create `external-mode.json` state file
2. Extend `model-policy.json` with external models
3. Create `llm-router/` directory and scripts
4. Add provider wrappers (opencode, gemini)
5. Create delegation helper
6. Update PA with toggle commands
7. Add component-registry triggers
8. Update session-start hook
9. Test each provider

## Validation

```bash
# Test router directly
~/.claude/mcp/llm-router/invoke.py --model copilot/gpt-5.2 -p "Say hello"

# Test delegation
~/.claude/mcp/llm-router/delegate.py --tier sonnet -p "What is 2+2?"

# Test toggle
/pa --external on
/pa what time is it?
/pa --external off
```

## Rollback

If issues arise:

1. Set `external-mode.json` → `enabled: false`
2. All operations revert to Claude immediately

## Future Considerations

- Auto-fallback to external when Claude is rate-limited
- Cost tracking per external model
- Response quality comparison metrics
- Additional providers (Mistral, local Ollama)
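## Appendix: Toggle Action Sketch

The component registry above names a `toggle-external-mode` action, but the plan does not spell out the script behind it. A minimal sketch follows, assuming a hypothetical `toggle.py` alongside the router scripts; the only thing it relies on from the plan is the `external-mode.json` schema (`enabled`, `activated_at`, `reason`).

```python
#!/usr/bin/env python3
"""Sketch of the toggle-external-mode action (hypothetical toggle.py).

Usage: toggle.py on|off|status
"""
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

MODE_FILE = Path.home() / ".claude/state/external-mode.json"


def read_mode() -> dict:
    """Return the current mode state, defaulting to disabled."""
    if MODE_FILE.exists():
        return json.loads(MODE_FILE.read_text())
    return {"enabled": False, "activated_at": None, "reason": None}


def write_mode(enabled: bool, reason=None) -> None:
    """Persist the mode state using the external-mode.json schema."""
    state = {
        "enabled": enabled,
        "activated_at": datetime.now(timezone.utc).isoformat() if enabled else None,
        "reason": reason,
    }
    MODE_FILE.parent.mkdir(parents=True, exist_ok=True)
    MODE_FILE.write_text(json.dumps(state, indent=2) + "\n")


def main() -> None:
    cmd = sys.argv[1] if len(sys.argv) > 1 else "status"
    if cmd == "on":
        write_mode(True, reason="user request")
        print("external-mode:enabled")
    elif cmd == "off":
        write_mode(False)
        print("external-mode:disabled")
    else:
        print("enabled" if read_mode().get("enabled") else "disabled")


if __name__ == "__main__":
    main()
```

Because the same file is read by `delegate.py` and the session-start hook, writing the full schema on every toggle keeps all three consumers consistent; rollback then reduces to `toggle.py off`.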