Update dashboard manifests and add automation

- Updated deployment with correct Pi 3 tolerations
- Updated ingress for cloudflare-tunnel
- Added crontab example as an alternative to systemd
- Updated go.sum

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
OpenCode Test
2025-12-27 11:39:40 -08:00
parent c14bae9a12
commit 73512e92a6
7 changed files with 64 additions and 21 deletions

@@ -0,0 +1,34 @@
# Cluster Issue Diagnosis Plan

## Issues to Investigate

1. **Critical Alerts** - KubeSchedulerDown, KubeControllerManagerDown
   - Likely false positives (k0s bundles these in k0s-controller)
   - Check if cluster is actually functional

2. **CrashLooping Pod** - Find and diagnose
   - Get pod status across all namespaces
   - Check logs and events

3. **Stuck Deployment** - Find and diagnose
   - List deployments not at desired replica count
   - Check events

4. **Degraded kube-prometheus-stack**
   - Check prometheus/alertmanager pods

## Commands to Run

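```bash
# Find crash looping pods
kubectl get pods -A | grep -E 'CrashLoop|Error|ImagePull'

# Find stuck deployments (READY count differs from desired, any replica count)
kubectl get deploy -A -o wide --no-headers | awk '{ split($3, r, "/"); if (r[1] != r[2]) print }'

# Check prometheus stack
kubectl get pods -n monitoring

# Check scheduler/controller (k0s specific: these are bundled in
# k0s-controller, so dedicated pods may not appear here)
kubectl get pods -n kube-system
```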
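
The "Check logs and events" step for crash-looping pods could be scripted along these lines. This is only a sketch: `suggest_pod_followups` is a hypothetical helper name, not part of this commit, and it assumes the default column layout of `kubectl get pods -A --no-headers` (namespace, name, ready, status, ...). It prints the follow-up commands instead of running them, so they can be reviewed first.

```bash
# Hypothetical helper (assumption, not in the original plan): reads the output
# of `kubectl get pods -A --no-headers` on stdin and prints follow-up commands
# for every pod whose STATUS column (field 4) looks unhealthy.
suggest_pod_followups() {
  awk '$4 ~ /CrashLoop|Error|ImagePull/ {
    # Logs from the previous (crashed) container instance
    printf "kubectl logs -n %s %s --previous --tail=50\n", $1, $2
    # describe includes the recent events for the pod
    printf "kubectl describe pod -n %s %s\n", $1, $2
  }'
}

# Usage against a live cluster:
#   kubectl get pods -A --no-headers | suggest_pod_followups
```

Matching on the STATUS field instead of the whole line avoids false hits when a pod or namespace name happens to contain "Error".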