Update dashboard manifests and add automation

- Updated deployment with correct Pi 3 tolerations
- Updated ingress for cloudflare-tunnel
- Added crontab example for systemd alternative
- Updated go.sum

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 11:39:40 -08:00
parent c14bae9a12
commit 73512e92a6
7 changed files with 64 additions and 21 deletions
@@ -0,0 +1,34 @@
+# Cluster Issue Diagnosis Plan
+
+## Issues to Investigate
+
+1. **Critical Alerts** - KubeSchedulerDown, KubeControllerManagerDown
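+   - Likely false positives: k0s runs the scheduler and controller manager inside the k0s controller process rather than as pods, so the default scrape targets are missing
+   - Confirm the cluster is actually functional (nodes Ready, new pods get scheduled)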
+
+2. **CrashLooping Pod** - Find and diagnose
+   - Get pod status across all namespaces
+   - Check logs and events
+
+3. **Stuck Deployment** - Find and diagnose
+   - List deployments not at desired replica count
+   - Check events
+
+4. **Degraded kube-prometheus-stack**
+   - Check prometheus/alertmanager pods
+
+## Commands to Run
+
+```bash
+# Find crash looping pods
+kubectl get pods -A | grep -E 'CrashLoop|Error|ImagePull'
+
+# Find stuck deployments (READY column where ready != desired, e.g. 1/3)
+kubectl get deploy -A --no-headers | awk '{ split($3, r, "/"); if (r[1] != r[2]) print }'
+
+# Check prometheus stack
+kubectl get pods -n monitoring
+
+# Check kube-system pods (on k0s the scheduler and controller manager run inside the k0s controller, so no pods for them are expected here)
+kubectl get pods -n kube-system
+```
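
Item 2 of the plan (find crash-looping pods, then check logs and events) could be sketched as a small shell helper. This is only a sketch under stated assumptions: the function names `failing_pods` and `diagnose_pod` are hypothetical and not part of this commit, and the column positions assume the default `kubectl get pods -A --no-headers` layout (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE).

```bash
#!/usr/bin/env bash

# failing_pods (hypothetical helper): reads `kubectl get pods -A --no-headers`
# output on stdin and prints "namespace name status restarts" for every pod
# whose STATUS column indicates a crash loop, error, or image pull failure.
failing_pods() {
  awk '$4 ~ /CrashLoopBackOff|Error|ImagePullBackOff|ErrImagePull/ { print $1, $2, $4, $5 }'
}

# diagnose_pod (hypothetical helper): shows recent logs from the previous
# container instance and the events attached to one pod.
# Requires kubectl and access to the cluster.
diagnose_pod() {
  local ns="$1" pod="$2"
  kubectl logs -n "$ns" "$pod" --previous --tail=50
  kubectl get events -n "$ns" --field-selector "involvedObject.name=$pod"
}
```

Against a live cluster the two pieces might be combined like this, again as an assumption about how the plan would be carried out:

```bash
kubectl get pods -A --no-headers | failing_pods |
  while read -r ns pod status restarts; do
    echo "== $ns/$pod ($status, restarts: $restarts)"
    diagnose_pod "$ns" "$pod"
  done
```

Matching on the STATUS column only, rather than grepping the whole line, avoids false hits on pods whose names happen to contain "Error".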