# Cluster Issue Diagnosis Plan ## Issues to Investigate 1. **Critical Alerts** - KubeSchedulerDown, KubeControllerManagerDown - Likely false positives (k0s bundles these in k0s-controller) - Check if cluster is actually functional 2. **CrashLooping Pod** - Find and diagnose - Get pod status across all namespaces - Check logs and events 3. **Stuck Deployment** - Find and diagnose - List deployments not at desired replica count - Check events 4. **Degraded kube-prometheus-stack** - Check prometheus/alertmanager pods ## Commands to Run ```bash # Find crash looping pods kubectl get pods -A | grep -E 'CrashLoop|Error|ImagePull' # Find stuck deployments kubectl get deploy -A -o wide | grep -v '1/1\|2/2\|3/3\|4/4' # Check prometheus stack kubectl get pods -n monitoring # Check scheduler/controller (k0s specific) kubectl get pods -n kube-system ```