claude-code/plans/pure-wishing-metcalfe.md

# Cluster Issue Diagnosis Plan

## Issues to Investigate

1. **Critical Alerts** - KubeSchedulerDown, KubeControllerManagerDown
   - Likely false positives (k0s bundles these in k0s-controller)
   - Check if cluster is actually functional

2. **CrashLooping Pod** - Find and diagnose
   - Get pod status across all namespaces
   - Check logs and events

3. **Stuck Deployment** - Find and diagnose
   - List deployments not at desired replica count
   - Check events

4. **Degraded kube-prometheus-stack**
   - Check prometheus/alertmanager pods

## Commands to Run

```bash
# Find crash looping pods
kubectl get pods -A | grep -E 'CrashLoop|Error|ImagePull'

# Find stuck deployments
kubectl get deploy -A -o wide | grep -v '1/1\|2/2\|3/3\|4/4'

# Check prometheus stack
kubectl get pods -n monitoring

# Check scheduler/controller (k0s specific)
kubectl get pods -n kube-system
```