Initial commit: Claude Code config and K8s agent orchestrator design
- Add .gitignore for logs, caches, credentials, and history
- Add K8s agent orchestrator design document
- Include existing Claude Code settings and plugin configs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
docs/plans/2025-12-26-agent-orchestrator-brainstorm.md
# Agent Orchestrator System - Brainstorming Notes

## Overview

User-level Claude Code agent system with orchestrator + specialized subagents + workflows.

Location: `~/.claude/`
## Target Domains (for future expansion)

- **A) DevOps/Infrastructure** - PRIMARY - Raspberry Pi K8s cluster management
- B) Software development - Code generation, refactoring, testing
- C) Research & analysis - Information gathering, summarizing
- D) Personal productivity - Files, notes, tasks, schedules
- E) Multi-domain - General-purpose tasks
## Primary Use Case

- Raspberry Pi Kubernetes cluster management
- App deployment to the cluster
- K8s distribution: **k0s**
- Deployment method: **GitOps with ArgoCD**
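The argocd-operator will mostly be driving the ArgoCD CLI. A minimal sketch of how it might build `argocd app sync` and `argocd app rollback` invocations, defaulting to a dry run so nothing executes until the autonomy rules allow it. The helper names and the dry-run default are assumptions for illustration; `--prune` and the optional history ID are standard `argocd` CLI arguments.

```python
import subprocess
from typing import Optional, Sequence


def argocd_sync_cmd(app: str, prune: bool = False) -> list[str]:
    """Build an `argocd app sync` command for a single Application."""
    cmd = ["argocd", "app", "sync", app]
    if prune:
        cmd.append("--prune")  # also delete resources that were removed from Git
    return cmd


def argocd_rollback_cmd(app: str, history_id: Optional[int] = None) -> list[str]:
    """Build an `argocd app rollback` command; omitting the ID targets the previous revision."""
    cmd = ["argocd", "app", "rollback", app]
    if history_id is not None:
        cmd.append(str(history_id))
    return cmd


def run(cmd: Sequence[str], dry_run: bool = True) -> str:
    """Return the command line in dry-run mode; otherwise execute it and return stdout."""
    if dry_run:
        return " ".join(cmd)
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical app name:
# run(argocd_sync_cmd("home-assistant"))  → "argocd app sync home-assistant"
```

With GitOps the normal path is still a commit to the repo; direct sync and rollback are the escape hatch. Note that ArgoCD rejects `app rollback` while automated sync is enabled on the Application, so the agent would have to turn auto-sync off first.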
## Cluster Hardware

| Node | Hardware | RAM | Role |
|------|----------|-----|------|
| Node 1 | Raspberry Pi 5 | 8GB | Control plane + Worker |
| Node 2 | Raspberry Pi 5 | 8GB | Worker |
| Node 3 | Raspberry Pi 3B+ | 1GB | Worker (tainted, tolerations required) |

**Pi 3 node**: Reserved for lightweight workloads only. Good candidate for dashboard deployment.

**Architecture**: All nodes run arm64 (64-bit OS), so every container image must provide a `linux/arm64` variant.
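The "tainted, tolerations required" rule for Node 3 follows Kubernetes' taint/toleration matching. A small Python model of those rules, which an agent could use to check whether a manifest can land on the Pi 3 before applying it. The taint `workload-class=lightweight:NoSchedule` is hypothetical; these notes don't say which taint is actually set.

```python
from dataclasses import dataclass


@dataclass
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass
class Toleration:
    key: str = ""
    operator: str = "Equal"  # Kubernetes default when the field is omitted
    value: str = ""
    effect: str = ""  # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if `tol` matches `taint` under Kubernetes' matching rules."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # Empty key + Exists tolerates every taint; otherwise the keys must match.
        return tol.key == "" or tol.key == taint.key
    # operator == "Equal": key and value must both match.
    return tol.key == taint.key and tol.value == taint.value


def schedulable(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """A pod may be scheduled only if every NoSchedule/NoExecute taint is tolerated."""
    for taint in taints:
        if taint.effect == "PreferNoSchedule":
            continue  # soft taint: the scheduler merely tries to avoid the node
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


# Hypothetical taint on the Pi 3B+ node and the matching toleration for the dashboard.
PI3_TAINT = Taint(key="workload-class", value="lightweight", effect="NoSchedule")
DASHBOARD_TOLERATIONS = [
    Toleration(key="workload-class", value="lightweight", effect="NoSchedule"),
]
```

A toleration only *permits* scheduling on the tainted node; it does not attract pods there. To actually pin the dashboard to the Pi 3, pair it with a `nodeSelector` (for example on the `kubernetes.io/hostname` label).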
## Workloads

- Self-hosted services (home automation, media, personal tools)
- Development/testing environments
- Infrastructure services (monitoring, logging, databases)
## Agent Tasks (priority order)

1. **Cluster health monitoring** - Detect issues, diagnose, suggest/apply fixes (TOP PRIORITY)
2. Deployment management - Create/update deployments, ArgoCD sync, rollbacks
3. Resource management - Scaling, allocation, cleanup
4. App lifecycle - End-to-end, from "I want to run X" to a running deployment
5. Incident response - Alerting, investigation, remediation
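For the top-priority health-monitoring task, a first pass could be a plain scan of `kubectl get pods -A -o json`. This sketch flags pods whose phase is Failed, Unknown or Pending (Pending can be transient) and containers stuck in common waiting reasons or over a restart threshold. The threshold of 5 and the reason list are assumptions to tune, not values from these notes.

```python
import json
import subprocess

# Non-exhaustive set of container waiting reasons worth surfacing.
BAD_WAITING_REASONS = {"CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull",
                       "CreateContainerConfigError"}


def find_unhealthy_pods(pod_list: dict, restart_threshold: int = 5) -> list[str]:
    """Scan a `kubectl get pods -o json` document and describe pods that look unhealthy."""
    problems = []
    for pod in pod_list.get("items", []):
        ns = pod["metadata"]["namespace"]
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase in ("Failed", "Unknown", "Pending"):
            problems.append(f"{ns}/{name}: phase={phase}")
            continue
        # CrashLoopBackOff pods still report phase=Running, so inspect containers too.
        for cs in status.get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting")
            if waiting and waiting.get("reason") in BAD_WAITING_REASONS:
                problems.append(f"{ns}/{name}: {cs['name']} {waiting['reason']}")
            elif cs.get("restartCount", 0) >= restart_threshold:
                problems.append(f"{ns}/{name}: {cs['name']} restarted {cs['restartCount']}x")
    return problems


def fetch_pods() -> dict:
    """Read all pods in all namespaces via kubectl (needs a working kubeconfig)."""
    out = subprocess.run(["kubectl", "get", "pods", "-A", "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)
```

Keeping the parsing separate from `fetch_pods()` lets the diagnostician run the same check on saved output when the API server is unreachable.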
## Autonomy Model

- **Tiered autonomy**: Safe actions auto-apply, risky actions require confirmation
  - Safe: restart pod, scale replicas, clear completed jobs
  - Risky: delete PVC, modify configs, node operations
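One way to enforce the tiered model is default-deny: only an explicit allow-list of safe actions auto-applies, and everything else, including actions nobody thought to classify, goes through confirmation. The action identifiers and the `apply`/`confirm` callbacks below are hypothetical.

```python
from enum import Enum
from typing import Callable


class Tier(Enum):
    SAFE = "safe"    # auto-apply
    RISKY = "risky"  # requires user confirmation


# Hypothetical identifiers for the "safe" examples in the notes.
# Risky actions (delete_pvc, modify_config, drain_node, ...) need no list:
# anything not allow-listed here falls through to RISKY.
SAFE_ACTIONS = {"restart_pod", "scale_replicas", "clear_completed_jobs"}


def classify(action: str) -> Tier:
    """Default-deny: only explicitly allow-listed actions are SAFE."""
    return Tier.SAFE if action in SAFE_ACTIONS else Tier.RISKY


def execute(action: str, apply: Callable[[str], None],
            confirm: Callable[[str], bool]) -> str:
    """Apply SAFE actions immediately; ask `confirm` before applying RISKY ones."""
    if classify(action) is Tier.RISKY and not confirm(action):
        return f"skipped {action}"
    apply(action)
    return f"applied {action}"
```

The point of default-deny is that adding a new capability to an agent can never silently make it auto-applied; promoting an action to "safe" is a deliberate edit.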
## Interaction Methods

- **Terminal/CLI** - Primary interaction via Claude Code (also fallback when cluster is down)
- **Dashboard/UI** - Web interface deployed on cluster via ArgoCD
- **Push notifications** - Future consideration (Discord/Slack/Telegram)
## Infrastructure Stack

- Monitoring: **Prometheus + Alertmanager + Grafana**
- GitOps repo: **Self-hosted Gitea/Forgejo**
- Workflow triggers: **Scheduled + Event-driven (Alertmanager webhooks)**
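For the event-driven trigger, Alertmanager's `webhook_configs` receiver POSTs a JSON payload with an `alerts` array, each alert carrying `status`, `labels`, `annotations` and `fingerprint`. A sketch that turns firing alerts into orchestrator tasks; the routing table, port and task shape are assumptions, and a long-running receiver like this would presumably belong to the Phase 2 daemon rather than the CLI setup.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical routing: alertname -> subagent that investigates first.
# Names follow kube-prometheus-style rules; adjust to the rules actually deployed.
ROUTES = {
    "KubePodCrashLooping": "k8s-diagnostician",
    "KubeNodeNotReady": "k8s-diagnostician",
    "KubeMemoryOvercommit": "prometheus-analyst",
}
DEFAULT_AGENT = "k8s-diagnostician"


def alerts_to_tasks(payload: dict) -> list[dict]:
    """Turn an Alertmanager webhook payload into one task per firing alert."""
    tasks = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # resolved notifications need no investigation
        labels = alert.get("labels", {})
        name = labels.get("alertname", "unknown")
        tasks.append({
            "agent": ROUTES.get(name, DEFAULT_AGENT),
            "alertname": name,
            "namespace": labels.get("namespace"),
            "summary": alert.get("annotations", {}).get("summary", ""),
            "fingerprint": alert.get("fingerprint"),  # for de-duplicating repeat sends
        })
    return tasks


class AlertmanagerWebhook(BaseHTTPRequestHandler):
    """Minimal receiver for Alertmanager webhook POSTs."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for task in alerts_to_tasks(payload):
            print("queue task:", task)  # placeholder: hand off to the orchestrator here
        self.send_response(200)
        self.end_headers()


def serve(port: int = 9095) -> None:
    """Block forever serving webhooks; the port is an arbitrary choice."""
    HTTPServer(("0.0.0.0", port), AlertmanagerWebhook).serve_forever()
```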
## Implementation Approach

**Phase 1**: Claude Code skills + custom subagent types in `~/.claude/`

**Phase 2 (later)**: Add SDK-based daemon for background automation
## Subagents

1. **k8s-diagnostician** - Cluster health, pod/node status, resource utilization, log analysis
2. **argocd-operator** - App sync, deployments, rollbacks, GitOps operations
3. **prometheus-analyst** - Query metrics, analyze trends, interpret alerts
4. **git-operator** - Commit manifests, create PRs in Gitea, manage GitOps repo
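Phase 1 puts these subagents under `~/.claude/`. Assuming Claude Code's user-level subagent format (one Markdown file per agent in `~/.claude/agents/`, with YAML frontmatter holding `name`, `description` and an optional `model`; worth checking against the current docs), a small generator could keep the four definitions in sync. The system-prompt line is a placeholder, and `sonnet` reflects the default subagent model listed under Model Assignment.

```python
from pathlib import Path

# name -> (description, model); descriptions condensed from the subagent list.
SUBAGENTS = {
    "k8s-diagnostician": ("Cluster health, pod/node status, resource utilization, log analysis", "sonnet"),
    "argocd-operator": ("App sync, deployments, rollbacks, GitOps operations", "sonnet"),
    "prometheus-analyst": ("Query metrics, analyze trends, interpret alerts", "sonnet"),
    "git-operator": ("Commit manifests, create PRs in Gitea, manage GitOps repo", "sonnet"),
}


def render_agent(name: str, description: str, model: str) -> str:
    """Render one subagent definition as Markdown with YAML frontmatter."""
    return (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        f"model: {model}\n"
        "---\n\n"
        f"You are the {name} subagent for a Raspberry Pi k0s cluster.\n"  # placeholder prompt
    )


def write_agents(agents_dir: Path) -> list[Path]:
    """Write one file per subagent, leaving files that already exist untouched."""
    agents_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, (description, model) in SUBAGENTS.items():
        path = agents_dir / f"{name}.md"
        if not path.exists():
            path.write_text(render_agent(name, description, model))
            written.append(path)
    return written
```

Usage: `write_agents(Path.home() / ".claude" / "agents")` creates any missing definitions; hand-edited prompts are never overwritten.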
## Workflow Definitions

- **YAML** - Complex workflows with branching, conditions, and multiple steps
- **Markdown** - Simple workflows written as prose-like descriptions
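The YAML format for complex workflows isn't defined yet. Below is a hypothetical schema, written as the dict that `yaml.safe_load` would return, plus a runner supporting a simple `when: <step>.<key>` condition and a per-step `model` that overrides the workflow's. All field names (`steps`, `when`, `agent`, `model`) are assumptions.

```python
from typing import Callable

# Hypothetical workflow, as yaml.safe_load() might return it.
WORKFLOW = {
    "name": "pod-crash-triage",
    "model": "sonnet",  # per-workflow default
    "steps": [
        {"id": "status", "agent": "k8s-diagnostician", "model": "haiku"},
        {"id": "root-cause", "agent": "k8s-diagnostician", "when": "status.unhealthy", "model": "opus"},
        {"id": "restart", "agent": "k8s-diagnostician", "when": "root-cause.restart_fixes"},
    ],
}


def run_workflow(wf: dict, delegate: Callable[[str, str, str], dict]) -> dict:
    """Run steps in order; `when: <step>.<key>` runs a step only if that earlier result is truthy."""
    results: dict[str, dict] = {}
    for step in wf["steps"]:
        cond = step.get("when")
        if cond:
            ref_step, key = cond.split(".", 1)
            if not results.get(ref_step, {}).get(key):
                continue  # condition false, or the referenced step was skipped
        model = step.get("model", wf.get("model", "sonnet"))  # per-step beats per-workflow
        results[step["id"]] = delegate(step["agent"], step["id"], model)
    return results
```

`delegate(agent, step_id, model)` stands in for however the orchestrator hands a step to a subagent; keeping it a callback makes the control flow testable without a cluster.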
## CLI Tools Available

- kubectl
- argocd CLI
- k0sctl
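Since the agents shell out to these tools, a preflight check that they are on `PATH` avoids confusing failures mid-workflow. A minimal sketch using `shutil.which` (`argocd` is the ArgoCD CLI's binary name):

```python
import shutil
from typing import Sequence

REQUIRED_TOOLS = ("kubectl", "argocd", "k0sctl")


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("all CLI tools found" if not missing else "missing: " + ", ".join(missing))
```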
## Model Assignment

- **Default**: Orchestrator = Opus, Subagents = Sonnet
- **Override levels**:
  1. Per-workflow: specify model in workflow YAML
  2. Per-step: specify model for individual workflow steps
  3. Dynamic: Orchestrator can upgrade or downgrade the model per delegation based on task complexity
- **Cost optimization**: Orchestrator evaluates task complexity and selects the appropriate model
  - Simple queries (get status, list) → Haiku
  - Standard operations (analyze, diagnose) → Sonnet
  - Complex reasoning (root cause, multi-factor decisions) → Opus
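The notes list the override levels but not their precedence. A sketch assuming the most specific explicit setting wins (per-step, then per-workflow) and dynamic selection applies only when nothing is pinned; the keyword heuristic is a hypothetical stand-in for the orchestrator's own complexity judgment.

```python
from typing import Optional

DEFAULT_SUBAGENT_MODEL = "sonnet"
TIER_MODELS = {"simple": "haiku", "standard": "sonnet", "complex": "opus"}

# Hypothetical keyword hints; checked complex-first so "why is X status bad" is complex.
COMPLEX_HINTS = ("root cause", "why", "decide", "trade-off")
SIMPLE_HINTS = ("get ", "list ", "status")


def estimate_complexity(task: str) -> str:
    """Map a task description to "simple", "standard" or "complex"."""
    text = task.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return "complex"
    if any(hint in text for hint in SIMPLE_HINTS):
        return "simple"
    return "standard"


def resolve_model(task: str, step_model: Optional[str] = None,
                  workflow_model: Optional[str] = None, dynamic: bool = True) -> str:
    """Per-step override, then per-workflow, then dynamic selection, then the default."""
    if step_model:
        return step_model
    if workflow_model:
        return workflow_model
    if dynamic:
        return TIER_MODELS[estimate_complexity(task)]
    return DEFAULT_SUBAGENT_MODEL


# resolve_model("List pods in media")                       → "haiku"
# resolve_model("Find the root cause of OOMKilled pods")    → "opus"
# resolve_model("Analyze memory trend on node 2")           → "sonnet"
# resolve_model("List pods in media", step_model="opus")    → "opus"
```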