File: claude-code/plans/gleaming-routing-mercury.md

OpenCode Test, commit df6cf94dae
feat(external-llm): add external LLM integration (fc-004)
Implements external LLM routing via opencode CLI for:
- GitHub Copilot (gpt-5.2, claude-sonnet-4.5, claude-haiku-4.5, o3, gemini-3-pro)
- Z.AI (glm-4.7 for code generation)
- OpenCode native (big-pickle)

Components:
- mcp/llm-router/invoke.py: Main router with task-based model selection
- mcp/llm-router/delegate.py: Agent delegation helper (respects external mode)
- mcp/llm-router/toggle.py: Enable/disable external-only mode
- mcp/llm-router/providers/: CLI wrappers for opencode and gemini

Features:
- Persistent toggle via state/external-mode.json
- Task routing: reasoning -> gpt-5.2, code-gen -> glm-4.7, long-context -> gemini
- Claude tier mapping: opus -> gpt-5.2, sonnet -> claude-sonnet-4.5, haiku -> claude-haiku-4.5
- Session-start hook announces when external mode is active
- Natural language toggle support via component registry

Plan: gleaming-routing-mercury

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Date: 2026-01-08 13:34:35 -08:00


Plan: External LLM Integration

Summary

Integrate external LLMs (via subscription-based access) into the agent system for cost optimization, specialized capabilities, and redundancy. Uses opencode CLI for Copilot/Z.AI models and gemini CLI for Google models.

Motivation

  • Cost optimization — Use cheaper models for simple tasks
  • Specialized capabilities — Access models with unique strengths (GPT-5.2 for reasoning, GLM 4.7 for code)
  • Redundancy — Fallback when Claude is unavailable

Design Decisions

Decision        Choice
--------        ------
Provider type   Cloud APIs via subscription (not local)
Providers       GitHub Copilot, Z.AI, Google Gemini
CLIs            opencode (Copilot, Z.AI), gemini (Google)
Integration     Task-specific routing + agent-level assignment
Toggle          State file (persists across sessions)
Toggle scope    All agents switch when enabled

Task Routing

Task Type          Model
---------          -----
Reasoning chains   copilot/gpt-5.2
Code generation    zai/glm-4.7
Long context       gemini/gemini-3-pro
General/fallback   copilot/sonnet-4.5

Claude-to-External Mapping

Claude Tier   External Equivalent
-----------   -------------------
opus          copilot/gpt-5.2
sonnet        copilot/sonnet-4.5
haiku         copilot/haiku-4.5

Files to Create

~/.claude/state/external-mode.json

{
  "enabled": false,
  "activated_at": null,
  "reason": null
}

~/.claude/mcp/llm-router/invoke.py

Main entry point for invoking external LLMs.

#!/usr/bin/env python3
"""
Invoke external LLM via configured provider.

Usage:
  invoke.py --model copilot/gpt-5.2 -p "prompt"
  invoke.py --task reasoning -p "prompt"
"""

import argparse
import json
from pathlib import Path

STATE_DIR = Path.home() / ".claude/state"

def load_policy():
    with open(STATE_DIR / "model-policy.json") as f:
        return json.load(f)

def resolve_model(args, policy):
    # Precedence: explicit --model, then --task routing, then the configured default.
    if args.model:
        return args.model
    if args.task and args.task in policy["task_routing"]:
        return policy["task_routing"][args.task]
    return policy["task_routing"]["default"]

def invoke(model: str, prompt: str, policy: dict) -> str:
    model_config = policy["external_models"][model]
    cli = model_config["cli"]
    cli_args = model_config["cli_args"]

    # Provider wrappers are imported lazily so a missing CLI only fails when used.
    if cli == "opencode":
        from providers.opencode import invoke as opencode_invoke
        return opencode_invoke(cli_args, prompt)
    elif cli == "gemini":
        from providers.gemini import invoke as gemini_invoke
        return gemini_invoke(cli_args, prompt)
    else:
        raise ValueError(f"Unknown CLI: {cli}")

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-p", "--prompt", required=True, help="Prompt text")
    parser.add_argument("--model", help="Explicit model (e.g., copilot/gpt-5.2)")
    parser.add_argument("--task", help="Task type for routing")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    args = parser.parse_args()

    policy = load_policy()
    model = resolve_model(args, policy)
    result = invoke(model, args.prompt, policy)

    if args.json:
        print(json.dumps({"model": model, "response": result}))
    else:
        print(result)

if __name__ == "__main__":
    main()

~/.claude/mcp/llm-router/providers/opencode.py

#!/usr/bin/env python3
"""OpenCode CLI wrapper."""

import subprocess

def invoke(cli_args: list, prompt: str) -> str:
    cmd = ["opencode", "--print"] + cli_args + ["-p", prompt]

    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=300
    )

    if result.returncode != 0:
        raise RuntimeError(f"opencode failed: {result.stderr}")

    return result.stdout.strip()

~/.claude/mcp/llm-router/providers/gemini.py

#!/usr/bin/env python3
"""Gemini CLI wrapper."""

import subprocess

def invoke(cli_args: list, prompt: str) -> str:
    cmd = ["gemini"] + cli_args + ["-p", prompt]

    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=300
    )

    if result.returncode != 0:
        raise RuntimeError(f"gemini failed: {result.stderr}")

    return result.stdout.strip()

~/.claude/mcp/llm-router/delegate.py

#!/usr/bin/env python3
"""
Agent delegation helper. Routes to external or Claude based on mode.

Usage:
  delegate.py --tier sonnet -p "prompt"
"""

import argparse
import json
import subprocess
from pathlib import Path

STATE_DIR = Path.home() / ".claude/state"

def is_external_mode():
    mode_file = STATE_DIR / "external-mode.json"
    if mode_file.exists():
        with open(mode_file) as f:
            return json.load(f).get("enabled", False)
    return False

def delegate(tier: str, prompt: str) -> str:
    if is_external_mode():
        # Map the requested Claude tier to its configured external equivalent.
        policy = json.loads((STATE_DIR / "model-policy.json").read_text())
        model = policy["claude_to_external_map"][tier]

        result = subprocess.run(
            [str(Path.home() / ".claude/mcp/llm-router/invoke.py"),
             "--model", model, "-p", prompt],
            capture_output=True, text=True
        )
    else:
        result = subprocess.run(
            ["claude", "--print", "--model", tier, prompt],
            capture_output=True, text=True
        )

    if result.returncode != 0:
        raise RuntimeError(f"delegation failed: {result.stderr}")
    return result.stdout.strip()

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--tier", required=True, choices=["opus", "sonnet", "haiku"])
    parser.add_argument("-p", "--prompt", required=True)
    args = parser.parse_args()

    print(delegate(args.tier, args.prompt))

if __name__ == "__main__":
    main()
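
~/.claude/mcp/llm-router/toggle.py

The commit summary lists this file, but the plan above does not spell it out. A minimal sketch, assuming it only rewrites external-mode.json (the --reason flag is a name introduced here, not confirmed by the plan):

#!/usr/bin/env python3
"""Enable/disable external-only mode by rewriting external-mode.json."""

import argparse
import json
from datetime import datetime, timezone
from pathlib import Path

STATE_FILE = Path.home() / ".claude/state/external-mode.json"

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("action", choices=["on", "off", "status"])
    parser.add_argument("--reason", help="Why the mode was toggled")
    args = parser.parse_args()

    if args.action == "status":
        state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {"enabled": False}
        print("enabled" if state.get("enabled") else "disabled")
        return

    # Match the schema shown under Files to Create: enabled, activated_at, reason.
    state = {
        "enabled": args.action == "on",
        "activated_at": datetime.now(timezone.utc).isoformat() if args.action == "on" else None,
        "reason": args.reason,
    }
    STATE_FILE.write_text(json.dumps(state, indent=2))
    print(f"external mode {'enabled' if state['enabled'] else 'disabled'}")

if __name__ == "__main__":
    main()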

Files to Modify

~/.claude/state/model-policy.json

Add sections:

{
  "external_models": {
    "copilot/gpt-5.2": {
      "cli": "opencode",
      "cli_args": ["--provider", "copilot", "--model", "gpt-5.2"],
      "use_cases": ["reasoning", "fallback"],
      "tier": "opus-equivalent"
    },
    "copilot/sonnet-4.5": {
      "cli": "opencode",
      "cli_args": ["--provider", "copilot", "--model", "sonnet-4.5"],
      "use_cases": ["general", "fallback"],
      "tier": "sonnet-equivalent"
    },
    "copilot/haiku-4.5": {
      "cli": "opencode",
      "cli_args": ["--provider", "copilot", "--model", "haiku-4.5"],
      "use_cases": ["simple"],
      "tier": "haiku-equivalent"
    },
    "zai/glm-4.7": {
      "cli": "opencode",
      "cli_args": ["--provider", "zai", "--model", "glm-4.7"],
      "use_cases": ["code-generation"],
      "tier": "sonnet-equivalent"
    },
    "gemini/gemini-3-pro": {
      "cli": "gemini",
      "cli_args": ["-m", "gemini-3-pro"],
      "use_cases": ["long-context"],
      "tier": "opus-equivalent"
    }
  },
  "claude_to_external_map": {
    "opus": "copilot/gpt-5.2",
    "sonnet": "copilot/sonnet-4.5",
    "haiku": "copilot/haiku-4.5"
  },
  "task_routing": {
    "reasoning": "copilot/gpt-5.2",
    "code-generation": "zai/glm-4.7",
    "long-context": "gemini/gemini-3-pro",
    "default": "copilot/sonnet-4.5"
  }
}
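
Since task_routing and claude_to_external_map must both point at entries in external_models, a quick consistency check catches drift after edits. A sketch (this helper is hypothetical, not part of the plan):

#!/usr/bin/env python3
"""Verify model-policy.json routing targets all resolve to defined external models."""

import json
from pathlib import Path

policy = json.loads((Path.home() / ".claude/state/model-policy.json").read_text())
models = set(policy["external_models"])

# Every task route and every tier mapping must name a configured model.
for section in ("task_routing", "claude_to_external_map"):
    for key, target in policy[section].items():
        assert target in models, f"{section}[{key}] -> {target} is not in external_models"

print("model-policy.json is internally consistent")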

~/.claude/state/component-registry.json

Add trigger for external mode:

{
  "id": "external-mode-toggle",
  "type": "skill",
  "triggers": [
    "use external", "switch to external", "external models",
    "stop using claude", "external only", "use copilot",
    "back to claude", "use claude again", "disable external"
  ],
  "action": "toggle-external-mode"
}
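
How these triggers are matched is not specified here; a sketch assuming simple case-insensitive substring matching (match_trigger is a name introduced for illustration):

def match_trigger(user_text: str, triggers: list[str]) -> bool:
    """Return True if any registered trigger phrase appears in the user's message."""
    text = user_text.lower()
    return any(t in text for t in triggers)

# Example: "please switch to external models" matches the "switch to external" trigger.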

~/.claude/hooks/scripts/session-start.sh

Add external mode announcement:

# Check external mode
if [ -f ~/.claude/state/external-mode.json ]; then
  if [ "$(jq -r '.enabled' ~/.claude/state/external-mode.json)" = "true" ]; then
    echo "external-mode:enabled"
  fi
fi

Toggle Interface

Command-based

/pa --external on       # Enable external-only mode
/pa --external off      # Disable, return to Claude
/pa --external status   # Show current mode

Natural language

User says                       Action
---------                       ------
"switch to external models"     Enable
"use copilot for everything"    Enable
"go back to claude"             Disable
"are we using external?"        Status
"use external for this"         One-shot (no persist)

Visual indicator

When external mode is active, PA prefixes responses:

🔌 [External: copilot/gpt-5.2]

<response>
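
A minimal sketch of that prefixing (format_response is a name introduced here; the plan does not specify where the prefix is applied):

def format_response(model: str, response: str) -> str:
    # Prepend the external-mode indicator so it is clear which model answered.
    return f"🔌 [External: {model}]\n\n{response}"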

Implementation Order

  1. Create external-mode.json state file
  2. Extend model-policy.json with external models
  3. Create llm-router/ directory and scripts
  4. Add provider wrappers (opencode, gemini)
  5. Create delegation helper
  6. Update PA with toggle commands
  7. Add component-registry triggers
  8. Update session-start hook
  9. Test each provider

Validation

# Test router directly
~/.claude/mcp/llm-router/invoke.py --model copilot/gpt-5.2 -p "Say hello"

# Test delegation
~/.claude/mcp/llm-router/delegate.py --tier sonnet -p "What is 2+2?"

# Test toggle
/pa --external on
/pa what time is it?
/pa --external off

Rollback

If issues arise:

  1. Set enabled: false in ~/.claude/state/external-mode.json
  2. All operations revert to Claude immediately

Future Considerations

  • Auto-fallback to external when Claude rate-limited
  • Cost tracking per external model
  • Response quality comparison metrics
  • Additional providers (Mistral, local Ollama)