claude-code/plans/pure-wishing-metcalfe.md
OpenCode Test 73512e92a6 Update dashboard manifests and add automation
- Updated deployment with correct Pi 3 tolerations
- Updated ingress for cloudflare-tunnel
- Added crontab example for systemd alternative
- Updated go.sum

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-27 11:39:40 -08:00

884 B

Cluster Issue Diagnosis Plan

Issues to Investigate

  1. Critical Alerts - KubeSchedulerDown, KubeControllerManagerDown

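    • Likely false positives (k0s runs the scheduler and controller-manager inside the k0s controller process, not as kube-system pods, so the monitoring stack has no pods to scrape)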
    • Check if cluster is actually functional
  2. CrashLooping Pod - Find and diagnose

    • Get pod status across all namespaces
    • Check logs and events
  3. Stuck Deployment - Find and diagnose

    • List deployments not at desired replica count
    • Check events
  4. Degraded kube-prometheus-stack

    • Check prometheus/alertmanager pods
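
The "check logs and events" steps in items 2 and 3 could be sketched as the two helpers below. This is a rough sketch, not a fixed procedure: the function names `diagnose_pod` and `diagnose_deploy` are hypothetical, and the namespace, pod, and deployment names are whatever the search commands turn up.

```shell
# Hypothetical helper: gather the usual evidence for one failing pod.
# Usage: diagnose_pod <namespace> <pod>
diagnose_pod() {
  local ns="$1" pod="$2"
  # Pod spec, container states, and recent events for this pod
  kubectl describe pod "$pod" -n "$ns"
  # Logs from the previous (crashed) container instance
  kubectl logs "$pod" -n "$ns" --previous --tail=50
  # All events in the namespace, oldest first
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
}

# Hypothetical helper: inspect a deployment that is not at its desired replica count.
# Usage: diagnose_deploy <namespace> <deployment>
diagnose_deploy() {
  local ns="$1" deploy="$2"
  # Reports rollout progress; gives up after 30s instead of blocking
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=30s
  # Conditions and events explain why replicas are missing
  kubectl describe deployment "$deploy" -n "$ns"
}
```

For example, `diagnose_pod monitoring <pod-name>` once a crash-looping pod has been identified. `--previous` only returns output if the container has restarted at least once.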

Commands to Run

# Find crash looping pods
kubectl get pods -A | grep -E 'CrashLoop|Error|ImagePull'

# Find stuck deployments (ready count differs from desired, for any replica count)
kubectl get deploy -A -o wide --no-headers | awk '{split($3, r, "/"); if (r[1] != r[2]) print}'

# Check prometheus stack
kubectl get pods -n monitoring

# Check kube-system pods (on k0s the scheduler and controller-manager are not pods, so expect them to be absent here)
kubectl get pods -n kube-system
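
One possible way to confirm that the KubeSchedulerDown and KubeControllerManagerDown alerts are false positives is to look at the leader-election leases those components renew. The sketch below assumes leader election is enabled; on a single-node k0s controller the leases may not exist, in which case a pod that schedules successfully is the simpler evidence. The function name `check_control_plane_leases` is hypothetical.

```shell
# Hypothetical helper: print holder and last renew time of the
# scheduler and controller-manager leader-election leases.
# Assumes leader election is enabled (may not hold on single-node k0s).
check_control_plane_leases() {
  local lease
  for lease in kube-scheduler kube-controller-manager; do
    kubectl get lease "$lease" -n kube-system \
      -o jsonpath='{.metadata.name}{"\t"}{.spec.holderIdentity}{"\t"}{.spec.renewTime}{"\n"}' \
      || echo "$lease: no lease found"
  done
}
```

Running `check_control_plane_leases` twice a few seconds apart should show the renew time advancing if the components are alive.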