
OpenCode Test 216a95cec4 Initial commit: Claude Code config and K8s agent orchestrator design

- Add .gitignore for logs, caches, credentials, and history
- Add K8s agent orchestrator design document
- Include existing Claude Code settings and plugin configs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-26 11:16:07 -08:00

3.5 KiB


Agent Orchestrator System - Brainstorming Notes

Overview

User-level Claude Code agent system with orchestrator + specialized subagents + workflows. Location: ~/.claude/

Target Domains (for future expansion)

A) DevOps/Infrastructure - PRIMARY - Raspberry Pi K8s cluster management
B) Software development - Code generation, refactoring, testing
C) Research & analysis - Information gathering, summarizing
D) Personal productivity - Files, notes, tasks, schedules
E) Multi-domain - General-purpose tasks

Primary Use Case

Raspberry Pi Kubernetes cluster management
App deployment to the cluster
K8s distribution: k0s
Deployment method: GitOps with ArgoCD

Cluster Hardware

Node	Hardware	RAM	Role
Node 1	Raspberry Pi 5	8GB	Control plane + Worker
Node 2	Raspberry Pi 5	8GB	Worker
Node 3	Raspberry Pi 3B+	1GB	Worker (tainted, tolerations required)

Pi 3 node: Reserved for lightweight workloads only. Good candidate for dashboard deployment.
Architecture: All nodes run arm64 (64-bit OS).
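
A minimal sketch of what "tainted, tolerations required" could mean for a workload aimed at the Pi 3 node. The taint key, value, and effect are hypothetical, since these notes do not record them, and the helper name `pi3_pod_spec` and the 128Mi default are also assumptions. The toleration fields and the `kubernetes.io/arch` node label follow the standard Kubernetes pod spec.

```python
# Hypothetical taint: the notes do not say which key/value/effect the Pi 3 node uses.
PI3_TAINT = {"key": "node-class", "value": "low-memory", "effect": "NoSchedule"}


def pi3_pod_spec(image: str, memory_limit: str = "128Mi") -> dict:
    """Build a pod spec fragment that tolerates the Pi 3 taint and caps memory."""
    return {
        "tolerations": [
            {
                "key": PI3_TAINT["key"],
                "operator": "Equal",
                "value": PI3_TAINT["value"],
                "effect": PI3_TAINT["effect"],
            }
        ],
        # Well-known node label; all three nodes in the table are arm64.
        "nodeSelector": {"kubernetes.io/arch": "arm64"},
        "containers": [
            {
                "name": "app",
                "image": image,
                # Keep the limit small: the Pi 3B+ has 1GB of RAM in total.
                "resources": {"limits": {"memory": memory_limit}},
            }
        ],
    }


spec = pi3_pod_spec("example/dashboard:latest")
print(spec["tolerations"][0]["key"])
```

Note that a toleration only permits scheduling onto the tainted node; it does not pin the pod there. Pinning would need an extra node selector or affinity on a label specific to that node.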

Workloads

Self-hosted services (home automation, media, personal tools)
Development/testing environments
Infrastructure services (monitoring, logging, databases)

Agent Tasks (priority order)

Cluster health monitoring - Detect issues, diagnose, suggest/apply fixes (TOP PRIORITY)
Deployment management - Create/update deployments, ArgoCD sync, rollbacks
Resource management - Scaling, allocation, cleanup
App lifecycle - End-to-end "I want to run X" to deployed
Incident response - Alerting, investigation, remediation
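
For the top-priority health monitoring task, the detection step might look like the sketch below. It assumes the input is the parsed output of `kubectl get pods -A -o json` and only inspects the pod phase and any waiting container state. The function name and the choice of "healthy" phases are illustrative, not part of the design.

```python
HEALTHY_PHASES = {"Running", "Succeeded"}


def find_unhealthy_pods(pod_list: dict) -> list[dict]:
    """Return pods whose phase is unhealthy or that have a waiting container."""
    findings = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        reasons = []

        phase = status.get("phase", "Unknown")
        if phase not in HEALTHY_PHASES:
            reasons.append(f"phase={phase}")

        # A waiting container carries a reason such as CrashLoopBackOff
        # or ImagePullBackOff, even when the pod phase is still Running.
        for cs in status.get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting")
            if waiting:
                reasons.append(f"{cs.get('name')}: {waiting.get('reason', 'Waiting')}")

        if reasons:
            findings.append(
                {
                    "namespace": meta.get("namespace"),
                    "name": meta.get("name"),
                    "reasons": reasons,
                }
            )
    return findings
```

In practice the dict would come from something like `json.load(sys.stdin)` with the kubectl output piped in; diagnosis and fixes would then build on the returned findings.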

Autonomy Model

Tiered autonomy: Safe actions auto-apply, risky actions require confirmation
Safe: restart pod, scale replicas, clear completed jobs
Risky: delete PVC, modify configs, node operations
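
One way the tiered model could be encoded, as a sketch. The action identifiers are hypothetical names for the safe examples listed above, and treating every unlisted action as risky is an assumption (fail closed), not something these notes specify.

```python
from enum import Enum


class Tier(Enum):
    SAFE = "safe"
    RISKY = "risky"


# Hypothetical identifiers for the safe examples in the notes.
SAFE_ACTIONS = {"restart_pod", "scale_replicas", "clear_completed_jobs"}


def classify(action: str) -> Tier:
    """Classify an action; anything not explicitly safe is treated as risky."""
    if action in SAFE_ACTIONS:
        return Tier.SAFE
    # Covers delete_pvc, modify_config, node operations, and unknown actions.
    return Tier.RISKY


def may_apply(action: str, confirmed: bool = False) -> bool:
    """Safe actions auto-apply; risky ones need explicit confirmation."""
    return classify(action) is Tier.SAFE or confirmed


print(may_apply("restart_pod"))  # safe action, no confirmation needed
print(may_apply("delete_pvc"))   # risky action, blocked until confirmed
```

An allowlist keeps the default conservative: a new action type has to be added to `SAFE_ACTIONS` deliberately before it can run unattended.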

Interaction Methods

Terminal/CLI - Primary interaction via Claude Code (also fallback when cluster is down)
Dashboard/UI - Web interface deployed on cluster via ArgoCD
Push notifications - Future consideration (Discord/Slack/Telegram)

Infrastructure Stack

Monitoring: Prometheus + Alertmanager + Grafana
GitOps repo: Self-hosted Gitea/Forgejo
Workflow triggers: Scheduled + Event-driven (Alertmanager webhooks)
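
For the event-driven path, a receiver would need to pull the relevant alerts out of an Alertmanager webhook body. The sketch below assumes the standard webhook JSON (a top-level `alerts` list whose entries carry `status`, `labels`, and `annotations`); the function name and the shape of the returned records are illustrative.

```python
def firing_alerts(payload: dict) -> list[dict]:
    """Extract the firing alerts from a parsed Alertmanager webhook payload."""
    result = []
    for alert in payload.get("alerts", []):
        # Resolved alerts are also delivered; skip them here.
        if alert.get("status") != "firing":
            continue
        labels = alert.get("labels", {})
        result.append(
            {
                "alertname": labels.get("alertname", "unknown"),
                "severity": labels.get("severity", "none"),
                "summary": alert.get("annotations", {}).get("summary", ""),
            }
        )
    return result


# Trimmed example payload; real payloads carry more fields.
example = {
    "version": "4",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "NodeMemoryHigh", "severity": "warning"},
            "annotations": {"summary": "Memory above threshold"},
        },
        {"status": "resolved", "labels": {"alertname": "PodRestarting"}},
    ],
}
print(firing_alerts(example))
```

Each returned record could then be mapped to a workflow, for example by alert name or severity.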

Implementation Approach

Phase 1: Claude Code skills + custom subagent types in ~/.claude/
Phase 2 (later): Add SDK-based daemon for background automation

Subagents

k8s-diagnostician - Cluster health, pod/node status, resource utilization, log analysis
argocd-operator - App sync, deployments, rollbacks, GitOps operations
prometheus-analyst - Query metrics, analyze trends, interpret alerts
git-operator - Commit manifests, create PRs in Gitea, manage GitOps repo

Workflow Definitions

YAML - Complex workflows with branching, conditions, multi-step
Markdown - Simple workflows, prose-like descriptions

CLI Tools Available

kubectl
argocd CLI
k0sctl

Model Assignment

Default: Orchestrator = Opus, Subagents = Sonnet
Override levels:
1. Per-workflow: specify model in workflow YAML
2. Per-step: specify model for individual workflow steps
3. Dynamic: Orchestrator can downgrade/upgrade model per-delegation based on task complexity
Cost optimization: Orchestrator evaluates task complexity and selects appropriate model
- Simple queries (get status, list) → Haiku
- Standard operations (analyze, diagnose) → Sonnet
- Complex reasoning (root cause, multi-factor decisions) → Opus
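
The override levels and complexity tiers above could resolve to a model roughly as follows. The precedence is an assumption, since these notes list the levels without ranking them: the sketch lets the most specific setting win (step, then workflow, then the dynamic complexity choice, then the subagent default). The lowercase model names and complexity labels are placeholders.

```python
from typing import Optional

DEFAULT_SUBAGENT_MODEL = "sonnet"

# Complexity tiers from the cost-optimization list.
COMPLEXITY_TO_MODEL = {
    "simple": "haiku",     # get status, list
    "standard": "sonnet",  # analyze, diagnose
    "complex": "opus",     # root cause, multi-factor decisions
}


def resolve_model(
    step_model: Optional[str] = None,
    workflow_model: Optional[str] = None,
    complexity: Optional[str] = None,
) -> str:
    """Pick a model for one delegation (assumed precedence: most specific wins)."""
    if step_model:
        return step_model
    if workflow_model:
        return workflow_model
    if complexity in COMPLEXITY_TO_MODEL:
        return COMPLEXITY_TO_MODEL[complexity]
    return DEFAULT_SUBAGENT_MODEL


print(resolve_model(complexity="simple"))
print(resolve_model(workflow_model="opus", complexity="simple"))
```

If the dynamic choice should be allowed to override a workflow-level setting instead, only the order of the checks changes.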
