Core agent system for Raspberry Pi k0s cluster management:

Agents:
- k8s-orchestrator: Central task delegation and decision making
- k8s-diagnostician: Cluster health, logs, troubleshooting
- argocd-operator: GitOps deployments and rollbacks
- prometheus-analyst: Metrics queries and alert analysis
- git-operator: Manifest management and PR workflows

Workflows:
- cluster-health-check.yaml: Scheduled health assessment
- deploy-app.md: Application deployment guide
- pod-crashloop.yaml: Automated incident response

Skills:
- /cluster-status: Quick health overview
- /deploy: Deploy or update applications
- /diagnose: Investigate cluster issues

Configuration:
- Agent definitions with model assignments (Opus/Sonnet)
- Autonomy rules (safe/confirm/forbidden actions)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
125 lines
3.0 KiB
Markdown
# Diagnose Issue

Investigate and diagnose problems in the Raspberry Pi Kubernetes cluster.

## Usage

```
/diagnose <issue-description>
/diagnose pod <pod-name> -n <namespace>
/diagnose app <argocd-app-name>
/diagnose node <node-name>
```
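
The four invocation forms can be told apart with a small parser. The Python sketch below is illustrative only: `DiagnoseRequest` and `parse_diagnose` are hypothetical names, and the real skill may interpret its arguments differently (for example, it may not fall back to a free-text description the way this sketch does).

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiagnoseRequest:
    kind: str                        # "general", "pod", "app", or "node"
    target: str                      # resource name or free-text issue description
    namespace: Optional[str] = None  # only meaningful for pod requests


def parse_diagnose(command: str) -> DiagnoseRequest:
    """Parse a /diagnose invocation into a structured request (hypothetical helper)."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "/diagnose":
        raise ValueError("expected a command starting with /diagnose")
    args = tokens[1:]
    if not args:
        raise ValueError("missing issue description or target")

    if args[0] == "pod" and len(args) >= 2:
        namespace = None
        if "-n" in args[2:]:
            idx = args.index("-n", 2)
            if idx + 1 >= len(args):
                raise ValueError("-n requires a namespace")
            namespace = args[idx + 1]
        return DiagnoseRequest("pod", args[1], namespace)

    if args[0] in ("app", "node") and len(args) == 2:
        return DiagnoseRequest(args[0], args[1])

    # Assumption: anything else is treated as a free-text issue description.
    return DiagnoseRequest("general", " ".join(args))
```

Using `shlex.split` keeps a quoted description such as `"my app is returning 503 errors"` together as a single argument.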

## What it does

Invokes the k8s-orchestrator, which investigates the issue by coordinating the specialist agents (k8s-diagnostician, argocd-operator, prometheus-analyst, and git-operator).

## Diagnosis Types

### General Issue
```
/diagnose "my app is returning 503 errors"
```
The orchestrator will:
1. Identify relevant resources
2. Check pod status and logs
3. Query relevant metrics
4. Analyze ArgoCD sync state
5. Provide diagnosis and recommendations

### Pod Diagnosis
```
/diagnose pod myapp-7d9f8b6c5-x2k4m -n production
```
Focuses on:
- Pod status and events
- Container logs (current and previous)
- Resource usage vs limits
- Restart history
- Related alerts
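
Most of these focus areas correspond to standard `kubectl` commands. The sketch below shows one plausible mapping; the function name `pod_diagnosis_commands` and the 200-line log tail are assumptions, and the commands the agents actually run are not specified here. Related alerts would come from Prometheus or Alertmanager and are not covered.

```python
from typing import List


def pod_diagnosis_commands(pod: str, namespace: str) -> List[List[str]]:
    """Return kubectl argv lists covering the pod focus areas (illustrative mapping)."""
    ns = ["-n", namespace]
    return [
        # Pod status, restart counts, resource limits, and recent events
        ["kubectl", "describe", "pod", pod, *ns],
        # Logs from the current container instance
        ["kubectl", "logs", pod, *ns, "--tail=200"],
        # Logs from the previous container instance, useful after a crash
        ["kubectl", "logs", pod, *ns, "--previous", "--tail=200"],
        # Live CPU and memory usage; requires metrics-server in the cluster
        ["kubectl", "top", "pod", pod, *ns],
        # Events that reference this pod, sorted by timestamp
        ["kubectl", "get", "events", *ns,
         "--field-selector", f"involvedObject.name={pod}",
         "--sort-by=.lastTimestamp"],
    ]
```

Returning argument lists rather than shell strings means each command can be passed to `subprocess.run` without shell quoting concerns.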

### ArgoCD App Diagnosis
```
/diagnose app homepage
```
Focuses on:
- Sync status and history
- Health status of resources
- Diff between desired and live state
- Recent sync errors
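
With the `argocd` CLI, these checks could look roughly like the following. This is a sketch under assumptions: `argocd_diagnosis_commands` is a hypothetical helper, and the argocd-operator agent may use the Argo CD API or different commands instead.

```python
from typing import Dict, List


def argocd_diagnosis_commands(app: str) -> Dict[str, List[str]]:
    """Map each ArgoCD focus area to an argocd CLI invocation (illustrative only)."""
    return {
        # Sync status, health of each resource, and any reported conditions
        "status": ["argocd", "app", "get", app],
        # Deployment history for the application
        "history": ["argocd", "app", "history", app],
        # Differences between the desired (Git) state and the live state
        "diff": ["argocd", "app", "diff", app],
    }
```

Note that `argocd app diff` signals a detected difference through a non-zero exit code, so a caller should not treat every non-zero status as a failure.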

### Node Diagnosis
```
/diagnose node pi5-1
```
Focuses on:
- Node conditions
- Resource pressure
- Running pods count
- System events
- Disk and network status
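
Node conditions and resource pressure can be read from the node object itself, for example from the output of `kubectl get node <node-name> -o json`. The sketch below flags problem conditions in such an object; `unhealthy_conditions` is a hypothetical helper, not part of the skill.

```python
from typing import Any, Dict, List

# Standard node condition types where status "True" indicates a problem.
PRESSURE_CONDITIONS = {"MemoryPressure", "DiskPressure", "PIDPressure", "NetworkUnavailable"}


def unhealthy_conditions(node: Dict[str, Any]) -> List[str]:
    """Return "Type=Status" strings for every condition that signals trouble."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, status = cond.get("type"), cond.get("status")
        if ctype == "Ready" and status != "True":
            # A node that is not Ready (False or Unknown) cannot run new pods.
            problems.append(f"Ready={status}")
        elif ctype in PRESSURE_CONDITIONS and status == "True":
            problems.append(f"{ctype}={status}")
    return problems
```

An empty list means the node reports `Ready` and no pressure conditions.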

## Investigation Flow

```
User describes issue
         │
         ▼
┌─────────────────┐
│ k8s-orchestrator│ ─── Analyze issue, plan investigation
└────────┬────────┘
         │
    ┌────┼────┬────────┐
    ▼    ▼    ▼        ▼
┌──────┐┌──────┐┌──────┐┌──────┐
│diag- ││argo- ││prom- ││git-  │
│nosti-││cd-   ││etheus││opera-│
│cian  ││oper- ││analy-││tor   │
│      ││ator  ││st    ││      │
└──┬───┘└──┬───┘└──┬───┘└──┬───┘
   │       │       │       │
   └───────┴───────┴───────┘
         │
         ▼
┌─────────────────┐
│ k8s-orchestrator│ ─── Synthesize findings
└────────┬────────┘
         │
         ▼
Diagnosis + Recommendations
```
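
The flow above is a fan-out/fan-in pattern: the orchestrator hands the issue to each specialist, then gathers their findings. A minimal Python sketch of that shape follows. The specialist functions are stand-in stubs with made-up return values, and running them concurrently is an assumption; the source does not say whether the real orchestrator calls the agents in parallel or one at a time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


# Hypothetical stand-ins for the specialist agents; each returns a list of findings.
def diagnostician(issue: str) -> List[str]:
    return [f"k8s-diagnostician looked at pods and logs for: {issue}"]


def argocd_operator(issue: str) -> List[str]:
    return [f"argocd-operator looked at sync state for: {issue}"]


def prometheus_analyst(issue: str) -> List[str]:
    return [f"prometheus-analyst looked at metrics for: {issue}"]


def git_operator(issue: str) -> List[str]:
    return [f"git-operator looked at recent manifest changes for: {issue}"]


SPECIALISTS: Dict[str, Callable[[str], List[str]]] = {
    "k8s-diagnostician": diagnostician,
    "argocd-operator": argocd_operator,
    "prometheus-analyst": prometheus_analyst,
    "git-operator": git_operator,
}


def investigate(issue: str) -> Dict[str, List[str]]:
    """Fan out to every specialist, then gather findings keyed by agent name."""
    with ThreadPoolExecutor(max_workers=len(SPECIALISTS)) as pool:
        futures = {name: pool.submit(fn, issue) for name, fn in SPECIALISTS.items()}
        # result() blocks until that specialist finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}
```

The synthesis step would then take the returned dictionary and turn it into a single diagnosis with recommendations.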

## Output Format

```
Diagnosis for: [issue description]

Status: [Investigating/Identified/Resolved]

Findings:
1. [Finding with evidence]
2. [Finding with evidence]

Root Cause:
[Explanation of what's causing the issue]

Evidence:
- [Relevant log lines or metrics]
- [Command outputs]

Recommended Actions:
- [SAFE] Action that can be auto-applied
- [CONFIRM] Action requiring approval
- [INFO] Suggestion for manual follow-up

Severity: [Low/Medium/High/Critical]
```
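
One way to keep reports consistent with this template is to model the report as a small data structure that validates its fields and renders itself. The sketch below is an assumption about how that could be done; the `Diagnosis` class and its field names are hypothetical, while the allowed status, severity, and action-tag values are taken from the template above.

```python
from dataclasses import dataclass
from typing import List, Tuple

STATUSES = {"Investigating", "Identified", "Resolved"}
SEVERITIES = {"Low", "Medium", "High", "Critical"}
ACTION_TAGS = {"SAFE", "CONFIRM", "INFO"}


@dataclass
class Diagnosis:
    issue: str
    status: str
    findings: List[str]
    root_cause: str
    evidence: List[str]
    actions: List[Tuple[str, str]]  # (tag, description) pairs
    severity: str

    def __post_init__(self) -> None:
        # Reject values the template does not allow.
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")
        for tag, _ in self.actions:
            if tag not in ACTION_TAGS:
                raise ValueError(f"unknown action tag: {tag}")

    def render(self) -> str:
        """Render the report in the section order used by the template."""
        lines = [f"Diagnosis for: {self.issue}", "", f"Status: {self.status}", "", "Findings:"]
        lines += [f"{i}. {finding}" for i, finding in enumerate(self.findings, 1)]
        lines += ["", "Root Cause:", self.root_cause, "", "Evidence:"]
        lines += [f"- {item}" for item in self.evidence]
        lines += ["", "Recommended Actions:"]
        lines += [f"- [{tag}] {text}" for tag, text in self.actions]
        lines += ["", f"Severity: {self.severity}"]
        return "\n".join(lines)
```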

## Options

- `--verbose` - Include full command outputs
- `--logs` - Focus on log analysis
- `--metrics` - Focus on metrics analysis
- `--quick` - Fast surface-level check only
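
These flags can be separated from the rest of the invocation with `argparse`. The sketch below is illustrative: `split_options` is a hypothetical helper, and whether the flags may be combined (for example `--logs` with `--metrics`) is not stated here, so the sketch does not restrict combinations.

```python
import argparse
from typing import List, Tuple


def build_option_parser() -> argparse.ArgumentParser:
    """Build a parser that knows only the four documented flags."""
    parser = argparse.ArgumentParser(prog="/diagnose", add_help=False)
    parser.add_argument("--verbose", action="store_true", help="Include full command outputs")
    parser.add_argument("--logs", action="store_true", help="Focus on log analysis")
    parser.add_argument("--metrics", action="store_true", help="Focus on metrics analysis")
    parser.add_argument("--quick", action="store_true", help="Fast surface-level check only")
    return parser


def split_options(tokens: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Separate the documented flags from the remaining target arguments."""
    # parse_known_args leaves unrecognised tokens (target, -n <namespace>) untouched.
    options, remaining = build_option_parser().parse_known_args(tokens)
    return options, remaining
```

The remaining tokens still carry the diagnosis type and target, so they can be interpreted separately from the flags.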