claude-code/skills/diagnose.md
OpenCode Test a80f714fc2 feat: Implement Phase 1 K8s agent orchestrator system
Core agent system for Raspberry Pi k0s cluster management:

Agents:
- k8s-orchestrator: Central task delegation and decision making
- k8s-diagnostician: Cluster health, logs, troubleshooting
- argocd-operator: GitOps deployments and rollbacks
- prometheus-analyst: Metrics queries and alert analysis
- git-operator: Manifest management and PR workflows

Workflows:
- cluster-health-check.yaml: Scheduled health assessment
- deploy-app.md: Application deployment guide
- pod-crashloop.yaml: Automated incident response

Skills:
- /cluster-status: Quick health overview
- /deploy: Deploy or update applications
- /diagnose: Investigate cluster issues

Configuration:
- Agent definitions with model assignments (Opus/Sonnet)
- Autonomy rules (safe/confirm/forbidden actions)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 11:25:11 -08:00


Diagnose Issue

Investigate and diagnose problems in the Raspberry Pi Kubernetes cluster.

Usage

/diagnose <issue-description>
/diagnose pod <pod-name> -n <namespace>
/diagnose app <argocd-app-name>
/diagnose node <node-name>
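The four invocation forms above can be told apart with a few token checks. The sketch below is a minimal illustration that assumes the arguments are split with shell-style quoting. The helper parse_diagnose and the DiagnoseTarget class are hypothetical names, not part of the skill itself.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiagnoseTarget:
    kind: str                        # "issue", "pod", "app", or "node"
    name: str                        # free-text description or resource name
    namespace: Optional[str] = None  # only set for pod targets


def parse_diagnose(command: str) -> DiagnoseTarget:
    """Split a /diagnose invocation into a target (hypothetical helper)."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "/diagnose":
        raise ValueError("command must start with /diagnose")
    # Long option flags (e.g. --verbose) are not targets, so drop them here.
    args = [t for t in tokens[1:] if not t.startswith("--")]
    if not args:
        raise ValueError("missing issue description or target")

    kind = args[0]
    if kind == "pod":
        # /diagnose pod <pod-name> -n <namespace>
        if len(args) != 4 or args[2] != "-n":
            raise ValueError("usage: /diagnose pod <pod-name> -n <namespace>")
        return DiagnoseTarget("pod", args[1], namespace=args[3])
    if kind in ("app", "node"):
        # /diagnose app <argocd-app-name>  or  /diagnose node <node-name>
        if len(args) != 2:
            raise ValueError(f"usage: /diagnose {kind} <name>")
        return DiagnoseTarget(kind, args[1])
    # Anything else is treated as a free-text issue description.
    return DiagnoseTarget("issue", " ".join(args))
```

Quoting matters in this sketch. /diagnose "pod keeps crashing" stays a general issue, because shlex keeps the quoted phrase as a single token.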

What it does

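Invokes the k8s-orchestrator agent. The orchestrator plans the investigation, delegates checks to the specialist agents (k8s-diagnostician, argocd-operator, prometheus-analyst, git-operator), and synthesizes their findings into a diagnosis.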

Diagnosis Types

General Issue

/diagnose "my app is returning 503 errors"

The orchestrator will:

  1. Identify relevant resources
  2. Check pod status and logs
  3. Query relevant metrics
  4. Analyze ArgoCD sync state
  5. Provide diagnosis and recommendations
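Below is one possible split of these five steps between the orchestrator and the specialists, written as data. The assignment table and the helper delegated_steps are hypothetical; the real orchestrator decides this per issue. In this split, steps 1 and 5 stay with the orchestrator, matching the "plan investigation" and "Synthesize findings" stages of its role. Steps 2 to 4 do not depend on one another, so they are candidates for parallel delegation.

```python
from typing import Dict, List, Tuple

# Hypothetical assignment of the five general-issue steps to agents.
GENERAL_PLAN: List[Tuple[str, str]] = [
    ("Identify relevant resources", "k8s-orchestrator"),
    ("Check pod status and logs", "k8s-diagnostician"),
    ("Query relevant metrics", "prometheus-analyst"),
    ("Analyze ArgoCD sync state", "argocd-operator"),
    ("Provide diagnosis and recommendations", "k8s-orchestrator"),
]


def delegated_steps(plan: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the steps the orchestrator hands off, keyed by specialist agent."""
    delegated: Dict[str, List[str]] = {}
    for step, agent in plan:
        if agent != "k8s-orchestrator":
            delegated.setdefault(agent, []).append(step)
    return delegated
```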

Pod Diagnosis

/diagnose pod myapp-7d9f8b6c5-x2k4m -n production

Focuses on:

  • Pod status and events
  • Container logs (current and previous)
  • Resource usage vs limits
  • Restart history
  • Related alerts
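Status, restart history, and limits can all be read from the Pod object returned by kubectl get pod <pod-name> -n <namespace> -o json. The sketch below assumes that JSON has already been parsed into a dict. The function name summarize_pod and the shape of its findings are illustrative only. Logs (kubectl logs <pod-name> -n <namespace> --previous) and alerts come from separate calls and are not covered here.

```python
from typing import Any, Dict, List


def summarize_pod(pod: Dict[str, Any]) -> List[str]:
    """Turn a parsed Pod object into short findings (hypothetical helper)."""
    findings: List[str] = [f"phase={pod.get('status', {}).get('phase', 'Unknown')}"]
    # Map container name -> resource limits from the pod spec.
    limits = {
        c["name"]: c.get("resources", {}).get("limits", {})
        for c in pod.get("spec", {}).get("containers", [])
    }
    for cs in pod.get("status", {}).get("containerStatuses", []):
        name = cs["name"]
        line = f"{name}: restarts={cs.get('restartCount', 0)}"
        waiting = cs.get("state", {}).get("waiting")
        if waiting:  # e.g. CrashLoopBackOff, ImagePullBackOff
            line += f", waiting={waiting.get('reason')}"
        last = cs.get("lastState", {}).get("terminated")
        if last:  # why the previous container instance exited
            line += f", last exit={last.get('reason')} (code {last.get('exitCode')})"
        mem = limits.get(name, {}).get("memory")
        if mem:
            line += f", memory limit={mem}"
        findings.append(line)
    return findings
```

A restart count that keeps climbing, combined with a previous exit reason of OOMKilled, points to the "resource usage vs limits" check as the next step.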

ArgoCD App Diagnosis

/diagnose app homepage

Focuses on:

  • Sync status and history
  • Health status of resources
  • Diff between desired and live state
  • Recent sync errors
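Sync status, resource health, and the last sync result are all fields of the Argo CD Application resource. The sketch below assumes Argo CD runs in the argocd namespace, so the object comes from kubectl get application homepage -n argocd -o json. The helper summarize_app is hypothetical. The desired-vs-live diff and the sync history come from argocd app diff homepage and argocd app history homepage, and are not parsed here.

```python
from typing import Any, Dict, List


def summarize_app(app: Dict[str, Any]) -> List[str]:
    """Summarize a parsed Argo CD Application object (hypothetical helper)."""
    status = app.get("status", {})
    findings = [
        f"sync={status.get('sync', {}).get('status', 'Unknown')}",
        f"health={status.get('health', {}).get('status', 'Unknown')}",
    ]
    # Resources that are out of sync or unhealthy are the usual suspects.
    # Some kinds (e.g. ConfigMaps) report no health, so None is accepted.
    for res in status.get("resources", []):
        sync = res.get("status")
        health = res.get("health", {}).get("status")
        if sync != "Synced" or health not in (None, "Healthy"):
            findings.append(f"{res.get('kind')}/{res.get('name')}: sync={sync}, health={health}")
    # Surface the message of a failed last sync operation, if any.
    op = status.get("operationState", {})
    if op.get("phase") in ("Failed", "Error"):
        findings.append(f"last sync {op['phase']}: {op.get('message', '')}")
    return findings
```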

Node Diagnosis

/diagnose node pi5-1

Focuses on:

  • Node conditions
  • Resource pressure
  • Number of running pods
  • System events
  • Disk and network status
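Node conditions and resource pressure are reported in the Node object's status.conditions list (kubectl get node pi5-1 -o json). Running pods on that node can be listed with kubectl get pods -A --field-selector spec.nodeName=pi5-1,status.phase=Running. The sketch below assumes both results are already available. Its helper name node_problems and the pod-capacity check are illustrative additions, not documented skill behavior.

```python
from typing import Any, Dict, List

# For these condition types, status "True" signals a problem.
# Ready is the opposite: anything other than "True" is a problem.
PROBLEM_WHEN_TRUE = {"MemoryPressure", "DiskPressure", "PIDPressure", "NetworkUnavailable"}


def node_problems(node: Dict[str, Any], running_pods: int) -> List[str]:
    """List problems for a parsed Node object (hypothetical helper)."""
    problems: List[str] = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, value = cond.get("type"), cond.get("status")
        if ctype == "Ready" and value != "True":
            problems.append(f"NotReady ({cond.get('reason', 'no reason given')})")
        elif ctype in PROBLEM_WHEN_TRUE and value == "True":
            problems.append(ctype)
    # Compare the running pod count with the node's allocatable pod slots.
    max_pods = int(node.get("status", {}).get("allocatable", {}).get("pods", "0"))
    if max_pods and running_pods >= max_pods:
        problems.append(f"pod capacity reached ({running_pods}/{max_pods})")
    return problems
```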

Investigation Flow

User describes issue
        │
        ▼
┌──────────────────┐
│ k8s-orchestrator │ ─── Analyze issue, plan investigation
└───────┬──────────┘
        │
        ├─────────────────┬─────────────────┬─────────────────┐
        ▼                 ▼                 ▼                 ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ k8s-          │ │ argocd-       │ │ prometheus-   │ │ git-          │
│ diagnostician │ │ operator      │ │ analyst       │ │ operator      │
└───────┬───────┘ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘
        │                 │                 │                 │
        ├─────────────────┴─────────────────┴─────────────────┘
        │
        ▼
┌──────────────────┐
│ k8s-orchestrator │ ─── Synthesize findings
└───────┬──────────┘
        │
        ▼
Diagnosis + Recommendations
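The fan-out and synthesis pattern in this flow can be sketched with a thread pool. In the actual system each specialist is an agent defined in the repo's agent configuration. Here, as an assumption for illustration, each specialist is a plain Python callable that takes the issue text and returns one finding. The function investigate is hypothetical, and its "synthesis" step just lists the findings in a stable order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# A specialist is modeled as a plain function from issue text to a finding.
Specialist = Callable[[str], str]


def investigate(issue: str, specialists: Dict[str, Specialist]) -> str:
    """Fan the issue out to all specialists concurrently, then merge results."""
    with ThreadPoolExecutor(max_workers=max(1, len(specialists))) as pool:
        futures = {name: pool.submit(fn, issue) for name, fn in specialists.items()}
        findings = {name: fut.result() for name, fut in futures.items()}
    # Stand-in for "Synthesize findings": list every finding in a stable order.
    lines = [f"Diagnosis for: {issue}"]
    lines += [f"- {name}: {findings[name]}" for name in sorted(findings)]
    return "\n".join(lines)
```

Running the specialists concurrently keeps total latency close to that of the slowest check rather than the sum of all checks. The orchestrator still waits for every result before synthesizing.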

Output Format

Diagnosis for: [issue description]

Status: [Investigating/Identified/Resolved]

Findings:
1. [Finding with evidence]
2. [Finding with evidence]

Root Cause:
[Explanation of what's causing the issue]

Evidence:
- [Relevant log lines or metrics]
- [Command outputs]

Recommended Actions:
- [SAFE] Action that can be auto-applied
- [CONFIRM] Action requiring approval
- [INFO] Suggestion for manual follow-up

Severity: [Low/Medium/High/Critical]
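The report template above can be produced from a small structured record, which keeps status, severity, and action tags within the documented values. The sketch below is illustrative: the Diagnosis class, its field names, and the ValueError checks are hypothetical. Only the section layout and the allowed values come from the template.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

STATUSES = ("Investigating", "Identified", "Resolved")
SEVERITIES = ("Low", "Medium", "High", "Critical")
ACTION_TAGS = ("SAFE", "CONFIRM", "INFO")


@dataclass
class Diagnosis:
    issue: str
    status: str
    root_cause: str
    severity: str
    findings: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)
    actions: List[Tuple[str, str]] = field(default_factory=list)  # (tag, text)

    def render(self) -> str:
        """Render the report in the documented section order."""
        if self.status not in STATUSES:
            raise ValueError(f"unknown status {self.status!r}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity {self.severity!r}")
        out = [f"Diagnosis for: {self.issue}", "", f"Status: {self.status}", "", "Findings:"]
        out += [f"{i}. {f}" for i, f in enumerate(self.findings, start=1)]
        out += ["", "Root Cause:", self.root_cause, "", "Evidence:"]
        out += [f"- {e}" for e in self.evidence]
        out += ["", "Recommended Actions:"]
        for tag, text in self.actions:
            if tag not in ACTION_TAGS:
                raise ValueError(f"unknown action tag {tag!r}")
            out.append(f"- [{tag}] {text}")
        out += ["", f"Severity: {self.severity}"]
        return "\n".join(out)
```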

Options

  • --verbose - Include full command outputs
  • --logs - Focus on log analysis
  • --metrics - Focus on metrics analysis
  • --quick - Fast surface-level check only
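The four flags are simple on/off switches, so they can be parsed with the standard library's argparse while any other arguments pass through untouched. This is a minimal sketch: the helper parse_options and the target attribute are hypothetical, and how the flags combine (for example --quick together with --logs) is not specified by this skill.

```python
import argparse
from typing import List


def parse_options(argv: List[str]) -> argparse.Namespace:
    """Parse the documented /diagnose flags; other arguments end up in .target."""
    parser = argparse.ArgumentParser(prog="/diagnose", add_help=False)
    parser.add_argument("--verbose", action="store_true", help="include full command outputs")
    parser.add_argument("--logs", action="store_true", help="focus on log analysis")
    parser.add_argument("--metrics", action="store_true", help="focus on metrics analysis")
    parser.add_argument("--quick", action="store_true", help="fast surface-level check only")
    # parse_known_args leaves unrecognized tokens (the target) in a list.
    opts, rest = parser.parse_known_args(argv)
    opts.target = rest
    return opts
```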