Initial commit: Claude Code config and K8s agent orchestrator design

- Add .gitignore for logs, caches, credentials, and history
- Add K8s agent orchestrator design document
- Include existing Claude Code settings and plugin configs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 11:16:07 -08:00
commit 216a95cec4
9 changed files with 1116 additions and 0 deletions
--- a/docs/plans/2025-12-26-agent-orchestrator-brainstorm.md
+++ b/docs/plans/2025-12-26-agent-orchestrator-brainstorm.md
@@ -0,0 +1,85 @@
+# Agent Orchestrator System - Brainstorming Notes
+
+## Overview
+User-level Claude Code agent system: an orchestrator, specialized subagents it delegates to, and workflow definitions.
+Location: `~/.claude/`
+
+## Target Domains (for future expansion)
+- **A) DevOps/Infrastructure** - PRIMARY - Raspberry Pi K8s cluster management
+- B) Software development - Code generation, refactoring, testing
+- C) Research & analysis - Information gathering, summarizing
+- D) Personal productivity - Files, notes, tasks, schedules
+- E) Multi-domain - General-purpose tasks
+
+## Primary Use Case
+- Raspberry Pi Kubernetes cluster management
+- App deployment to the cluster
+- K8s distribution: **k0s**
+- Deployment method: **GitOps with ArgoCD**
+
+## Cluster Hardware
+| Node | Hardware | RAM | Role |
+|------|----------|-----|------|
+| Node 1 | Raspberry Pi 5 | 8GB | Control plane + Worker |
+| Node 2 | Raspberry Pi 5 | 8GB | Worker |
+| Node 3 | Raspberry Pi 3B+ | 1GB | Worker (tainted, tolerations required) |
+
+- **Pi 3B+ node**: Reserved for lightweight workloads only. Good candidate for dashboard deployment.
+- **Architecture**: All nodes run arm64 (64-bit OS).
+
+## Workloads
+- Self-hosted services (home automation, media, personal tools)
+- Development/testing environments
+- Infrastructure services (monitoring, logging, databases)
+
+## Agent Tasks (priority order)
+1. **Cluster health monitoring** - Detect issues, diagnose, suggest/apply fixes (TOP PRIORITY)
+2. Deployment management - Create/update deployments, ArgoCD sync, rollbacks
+3. Resource management - Scaling, allocation, cleanup
+4. App lifecycle - End-to-end, from "I want to run X" to a running deployment
+5. Incident response - Alerting, investigation, remediation
+
+## Autonomy Model
+- **Tiered autonomy**: Safe actions auto-apply, risky actions require confirmation
+  - Safe: restart pod, scale replicas, clear completed jobs
+  - Risky: delete PVC, modify configs, node operations
+
+## Interaction Methods
+- **Terminal/CLI** - Primary interaction via Claude Code (also fallback when cluster is down)
+- **Dashboard/UI** - Web interface deployed on cluster via ArgoCD
+- **Push notifications** - Future consideration (Discord/Slack/Telegram)
+
+## Infrastructure Stack
+- Monitoring: **Prometheus + Alertmanager + Grafana**
+- GitOps repo: **Self-hosted Gitea/Forgejo**
+- Workflow triggers: **Scheduled + Event-driven (Alertmanager webhooks)**
+
+## Implementation Approach
+- **Phase 1**: Claude Code skills + custom subagent types in `~/.claude/`
+- **Phase 2 (later)**: Add SDK-based daemon for background automation
+
+## Subagents
+1. **k8s-diagnostician** - Cluster health, pod/node status, resource utilization, log analysis
+2. **argocd-operator** - App sync, deployments, rollbacks, GitOps operations
+3. **prometheus-analyst** - Query metrics, analyze trends, interpret alerts
+4. **git-operator** - Commit manifests, create PRs in Gitea, manage GitOps repo
+
+## Workflow Definitions
+- **YAML** - Complex, multi-step workflows with branching and conditions
+- **Markdown** - Simple workflows described in prose
+
+## CLI Tools Available
+- kubectl
+- argocd CLI
+- k0sctl
+
+## Model Assignment
+- **Default**: Orchestrator = Opus, Subagents = Sonnet
+- **Override levels**:
+  1. Per-workflow: specify model in workflow YAML
+  2. Per-step: specify model for individual workflow steps
+  3. Dynamic: Orchestrator can downgrade/upgrade model per-delegation based on task complexity
+- **Cost optimization**: Orchestrator evaluates task complexity and selects appropriate model
+  - Simple queries (get status, list) → Haiku
+  - Standard operations (analyze, diagnose) → Sonnet
+  - Complex reasoning (root cause, multi-factor decisions) → Opus
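
The Model Assignment rules above could be sketched roughly as follows. This is a minimal illustration, not part of the commit: the function name `select_model`, the tier labels (`simple`, `standard`, `complex`), and the precedence order (per-step over per-workflow over the dynamic complexity choice) are all assumptions, since the notes list the override levels without saying which one wins.

```python
from typing import Optional

# Complexity tiers from the notes: simple -> Haiku, standard -> Sonnet,
# complex -> Opus. The tier labels themselves are hypothetical.
MODEL_BY_COMPLEXITY = {
    "simple": "haiku",
    "standard": "sonnet",
    "complex": "opus",
}

# Subagents default to Sonnet per the notes.
DEFAULT_SUBAGENT_MODEL = "sonnet"


def select_model(
    complexity: Optional[str] = None,
    workflow_model: Optional[str] = None,
    step_model: Optional[str] = None,
) -> str:
    """Pick a model for a single delegation to a subagent.

    Assumed precedence (not stated in the notes): per-step override,
    then per-workflow override, then the complexity tier, then the default.
    """
    if step_model:
        return step_model
    if workflow_model:
        return workflow_model
    if complexity is not None:
        # Unknown tiers fall back to the subagent default.
        return MODEL_BY_COMPLEXITY.get(complexity, DEFAULT_SUBAGENT_MODEL)
    return DEFAULT_SUBAGENT_MODEL


if __name__ == "__main__":
    # A status query with no overrides resolves to the cheapest tier.
    print(select_model(complexity="simple"))
    # A workflow-level override wins over the dynamic choice in this sketch.
    print(select_model(complexity="complex", workflow_model="sonnet"))
```

Keeping the overrides as plain optional arguments means a workflow loader only has to pass through whatever `model` values it finds. If the dynamic choice should instead be able to override a per-workflow setting, only the order of the checks changes.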