Update dashboard manifests and add automation
- Updated deployment with correct Pi 3 tolerations - Updated ingress for cloudflare-tunnel - Added crontab example for systemd alternative - Updated go.sum 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
34
plans/pure-wishing-metcalfe.md
Normal file
34
plans/pure-wishing-metcalfe.md
Normal file
@@ -0,0 +1,34 @@
|
||||
# Cluster Issue Diagnosis Plan
|
||||
|
||||
## Issues to Investigate
|
||||
|
||||
1. **Critical Alerts** - KubeSchedulerDown, KubeControllerManagerDown
|
||||
- Likely false positives (k0s bundles these in k0s-controller)
|
||||
- Check if cluster is actually functional
|
||||
|
||||
2. **CrashLooping Pod** - Find and diagnose
|
||||
- Get pod status across all namespaces
|
||||
- Check logs and events
|
||||
|
||||
3. **Stuck Deployment** - Find and diagnose
|
||||
- List deployments not at desired replica count
|
||||
- Check events
|
||||
|
||||
4. **Degraded kube-prometheus-stack**
|
||||
- Check prometheus/alertmanager pods
|
||||
|
||||
## Commands to Run
|
||||
|
||||
```bash
|
||||
# Find crash looping pods
|
||||
kubectl get pods -A | grep -E 'CrashLoop|Error|ImagePull'
|
||||
|
||||
# Find stuck deployments
|
||||
kubectl get deploy -A -o wide | grep -v '1/1\|2/2\|3/3\|4/4'
|
||||
|
||||
# Check prometheus stack
|
||||
kubectl get pods -n monitoring
|
||||
|
||||
# Check scheduler/controller (k0s specific)
|
||||
kubectl get pods -n kube-system
|
||||
```
|
||||
Reference in New Issue
Block a user