
# Diagnose Issue

Investigate and diagnose problems in the Raspberry Pi Kubernetes cluster.

## Usage

```
/diagnose <issue-description>
/diagnose pod <pod-name> -n <namespace>
/diagnose app <argocd-app-name>
/diagnose node <node-name>
```
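The four invocation forms can be told apart with a small parser. This is a minimal sketch, not the skill's actual implementation: `DiagnoseRequest` and `parse_diagnose` are hypothetical names. Falling back to a free-text description when a keyword form does not match exactly is an assumed design choice.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiagnoseRequest:
    kind: str                        # "general", "pod", "app", or "node"
    target: str                      # issue text, pod name, ArgoCD app name, or node name
    namespace: Optional[str] = None  # only set for pod diagnosis


def parse_diagnose(args: str) -> DiagnoseRequest:
    """Hypothetical parser for the text that follows `/diagnose`."""
    tokens = shlex.split(args)  # honours quotes, e.g. "my app is returning 503 errors"
    if not tokens:
        raise ValueError("usage: /diagnose <issue-description>")
    head = tokens[0]
    # /diagnose pod <pod-name> -n <namespace>
    if head == "pod" and len(tokens) == 4 and tokens[2] == "-n":
        return DiagnoseRequest("pod", tokens[1], tokens[3])
    # /diagnose app <argocd-app-name>  and  /diagnose node <node-name>
    if head in ("app", "node") and len(tokens) == 2:
        return DiagnoseRequest(head, tokens[1])
    # Assumption: anything else is treated as a free-text issue description
    return DiagnoseRequest("general", " ".join(tokens))
```

With this sketch, `parse_diagnose("pod myapp-7d9f8b6c5-x2k4m -n production")` yields a `pod` request in namespace `production`, while `parse_diagnose("node pi5-1 is slow")` falls back to a general description instead of failing.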

## What it does

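Invokes the `k8s-orchestrator` agent, which plans the investigation, delegates checks to specialist agents (`diagnostician`, `argocd-operator`, `prometheus-analyst`, `git-operator`), and synthesizes their findings into a diagnosis.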

## Diagnosis Types

### General Issue
```
/diagnose "my app is returning 503 errors"
```
The orchestrator will:
1. Identify relevant resources
2. Check pod status and logs
3. Query relevant Prometheus metrics
4. Analyze ArgoCD sync state
5. Provide diagnosis and recommendations
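Steps 1 to 4 map naturally onto read-only commands. The sketch below assumes the workload carries the common `app.kubernetes.io/name` label and that its ArgoCD Application has the same name as that label. `general_plan` is a hypothetical helper, and `kubectl top` (which needs metrics-server) stands in for richer Prometheus queries.

```python
from __future__ import annotations


def general_plan(app: str, namespace: str) -> dict[str, list[list[str]]]:
    """Hypothetical: read-only commands for steps 1-4; step 5 is synthesis."""
    # Assumes pods and services carry the app.kubernetes.io/name=<app> label
    sel = ["-n", namespace, "-l", f"app.kubernetes.io/name={app}"]
    return {
        "identify resources": [["kubectl", "get", "deployments,pods,services", *sel]],
        "pod status and logs": [
            ["kubectl", "get", "pods", *sel, "-o", "wide"],
            ["kubectl", "logs", *sel, "--all-containers", "--tail=100"],
        ],
        # kubectl top needs metrics-server; a stand-in for Prometheus queries
        "metrics": [["kubectl", "top", "pods", *sel]],
        # Assumes the ArgoCD Application is named after the app label
        "argocd sync state": [["argocd", "app", "get", app]],
    }
```

Keeping each command as an argument list lets it be printed with `shlex.join` or passed directly to `subprocess.run` without shell-quoting issues.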

### Pod Diagnosis
```
/diagnose pod myapp-7d9f8b6c5-x2k4m -n production
```
Focuses on:
- Pod status and events
- Container logs (current and previous)
- Resource usage vs limits
- Restart history
- Related alerts
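Apart from related alerts, which come from Prometheus/Alertmanager rather than the Kubernetes API, these checks correspond to standard `kubectl` commands. A minimal sketch; `pod_commands` is a hypothetical helper:

```python
from __future__ import annotations


def pod_commands(pod: str, namespace: str) -> list[list[str]]:
    """Hypothetical: read-only kubectl commands for a single pod."""
    ns = ["-n", namespace]
    return [
        # Status, conditions, restart counts, resource requests/limits and recent events
        ["kubectl", "describe", "pod", pod, *ns],
        # Events for this pod only
        ["kubectl", "get", "events", *ns, "--field-selector", f"involvedObject.name={pod}"],
        # Current container logs
        ["kubectl", "logs", pod, *ns, "--all-containers"],
        # Logs of the previous container instance; fails if nothing has restarted yet
        ["kubectl", "logs", pod, *ns, "--all-containers", "--previous"],
        # Live per-container CPU/memory (needs metrics-server), to compare with limits
        ["kubectl", "top", "pod", pod, *ns, "--containers"],
    ]
```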

### ArgoCD App Diagnosis
```
/diagnose app homepage
```
Focuses on:
- Sync status and history
- Health status of resources
- Diff between desired and live state
- Recent sync errors
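The ArgoCD `Application` resource already records sync status, health, per-resource health, history, and the result of the last operation. The sketch below condenses the JSON from `kubectl get application homepage -n argocd -o json`, assuming ArgoCD is installed in the `argocd` namespace; `summarize_application` is a hypothetical name. The desired-versus-live diff is not part of this JSON: `argocd app diff homepage` produces it and exits with code 1 when differences are found.

```python
from __future__ import annotations

from typing import Any


def summarize_application(app: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical: condense an ArgoCD Application's status for a diagnosis."""
    status = app.get("status", {})
    op = status.get("operationState", {})
    # Resources without a health field (e.g. ConfigMaps) are skipped
    unhealthy = [
        f"{r.get('kind')}/{r.get('name')}: {r['health']['status']}"
        for r in status.get("resources", [])
        if r.get("health", {}).get("status") not in (None, "Healthy")
    ]
    return {
        "sync": status.get("sync", {}).get("status", "Unknown"),
        "health": status.get("health", {}).get("status", "Unknown"),
        "last_operation": op.get("phase"),   # e.g. Succeeded, Failed, Error
        "last_message": op.get("message"),   # sync error text, if any
        "unhealthy_resources": unhealthy,
        "history_entries": len(status.get("history", [])),
    }
```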

### Node Diagnosis
```
/diagnose node pi5-1
```
Focuses on:
- Node conditions
- Resource pressure
- Number of running pods
- System events
- Disk and network status
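Node conditions can be checked mechanically: `Ready` should be `True`, while `MemoryPressure`, `DiskPressure`, `PIDPressure`, and `NetworkUnavailable` should be `False`. A minimal sketch with hypothetical helper names, working on the JSON from `kubectl get node pi5-1 -o json`:

```python
from __future__ import annotations

from typing import Any


def node_problems(node: dict[str, Any]) -> list[str]:
    """Hypothetical: flag unhealthy conditions in `kubectl get node <name> -o json`."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype, cstatus = cond.get("type"), cond.get("status")
        # Ready should be "True"; pressure/availability conditions should be "False"
        expected = "True" if ctype == "Ready" else "False"
        if cstatus != expected:
            msg = cond.get("message")
            problems.append(f"{ctype}={cstatus}" + (f": {msg}" if msg else ""))
    return problems


def running_pods_command(node_name: str) -> list[str]:
    """Hypothetical: list pods running on the node, one name per output line."""
    return ["kubectl", "get", "pods", "--all-namespaces", "-o", "name",
            "--field-selector", f"spec.nodeName={node_name},status.phase=Running"]
```

`kubectl describe node pi5-1` and `kubectl top node pi5-1` cover the remaining checks interactively.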

## Investigation Flow

```
User describes issue
        │
        ▼
┌─────────────────┐
│ k8s-orchestrator│ ─── Analyze issue, plan investigation
└────────┬────────┘
         │
    ┌────┼────┬────────┐
    ▼    ▼    ▼        ▼
┌──────┐┌──────┐┌──────┐┌──────┐
│diag- ││argo- ││prom- ││git-  │
│nosti-││cd-   ││etheus││opera-│
│cian  ││oper- ││analy-││tor   │
│      ││ator  ││st    ││      │
└──┬───┘└──┬───┘└──┬───┘└──┬───┘
   │       │       │       │
   └───────┴───────┴───────┘
                │
                ▼
        ┌─────────────────┐
        │ k8s-orchestrator│ ─── Synthesize findings
        └────────┬────────┘
                 │
                 ▼
        Diagnosis + Recommendations
```
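The fan-out/fan-in shape of this flow can be sketched with a thread pool. The agent callables are hypothetical stand-ins for the specialists in the diagram, each taking the issue description and returning a list of findings. Recording a failed specialist as a finding, rather than aborting the whole investigation, is an assumed design choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Agent = Callable[[str], List[str]]  # issue description -> findings


def investigate(issue: str, agents: Dict[str, Agent]) -> List[str]:
    """Run every specialist concurrently, then merge their findings in order."""
    with ThreadPoolExecutor(max_workers=max(1, len(agents))) as pool:
        futures = {name: pool.submit(agent, issue) for name, agent in agents.items()}
    # Leaving the with-block waits for all agents to finish
    findings: List[str] = []
    for name, future in futures.items():
        try:
            findings += [f"[{name}] {f}" for f in future.result()]
        except Exception as exc:  # one failing specialist should not sink the rest
            findings.append(f"[{name}] failed: {exc}")
    return findings
```

Tagging each finding with its source agent keeps the synthesis step traceable back to the specialist that produced the evidence.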

## Output Format

```
Diagnosis for: [issue description]

Status: [Investigating/Identified/Resolved]

Findings:
1. [Finding with evidence]
2. [Finding with evidence]

Root Cause:
[Explanation of what's causing the issue]

Evidence:
- [Relevant log lines or metrics]
- [Command outputs]

Recommended Actions:
- [SAFE] Action that can be auto-applied
- [CONFIRM] Action requiring approval
- [INFO] Suggestion for manual follow-up

Severity: [Low/Medium/High/Critical]
```
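Because the report has a fixed shape, it can be modeled as a small record type that keeps statuses, severities, and action tags within the vocabulary of the template. A sketch; `Diagnosis` and its field names are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass

STATUSES = ("Investigating", "Identified", "Resolved")
SEVERITIES = ("Low", "Medium", "High", "Critical")
ACTION_TAGS = ("SAFE", "CONFIRM", "INFO")


@dataclass
class Diagnosis:
    issue: str
    status: str
    findings: list[str]
    root_cause: str
    evidence: list[str]
    actions: list[tuple[str, str]]  # (tag, description), tag in ACTION_TAGS
    severity: str

    def __post_init__(self) -> None:
        # Reject values outside the vocabularies used in the report template
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")
        for tag, _ in self.actions:
            if tag not in ACTION_TAGS:
                raise ValueError(f"unknown action tag: {tag}")

    def render(self) -> str:
        """Render the report in the plain-text layout shown above."""
        lines = [f"Diagnosis for: {self.issue}", "", f"Status: {self.status}", "", "Findings:"]
        lines += [f"{i}. {f}" for i, f in enumerate(self.findings, 1)]
        lines += ["", "Root Cause:", self.root_cause, "", "Evidence:"]
        lines += [f"- {e}" for e in self.evidence]
        lines += ["", "Recommended Actions:"]
        lines += [f"- [{tag}] {text}" for tag, text in self.actions]
        lines += ["", f"Severity: {self.severity}"]
        return "\n".join(lines)
```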

## Options

- `--verbose` - Include full command outputs
- `--logs` - Focus on log analysis
- `--metrics` - Focus on metrics analysis
- `--quick` - Fast surface-level check only
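The flags can be separated from the positional arguments before the invocation form is parsed. This sketch assumes flags may appear anywhere in the command and that unknown `--` options are an error; it leaves the interaction of combined flags (for example `--quick --verbose`) to the caller. `split_options` is a hypothetical name.

```python
from __future__ import annotations

import shlex

FLAGS = ("--verbose", "--logs", "--metrics", "--quick")


def split_options(args: str) -> tuple[dict[str, bool], list[str]]:
    """Hypothetical: separate the documented flags from the rest of the arguments."""
    flags = {flag[2:]: False for flag in FLAGS}
    rest: list[str] = []
    for token in shlex.split(args):
        if token in FLAGS:
            flags[token[2:]] = True
        elif token.startswith("--"):
            raise ValueError(f"unknown option: {token}")
        else:
            rest.append(token)  # e.g. "pod", "<pod-name>", "-n", "<namespace>"
    return flags, rest
```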