name: resource-pressure-response
description: Respond to cluster resource pressure alerts
version: "1.0"

trigger:
  - alert:
      match:
        alertname: KubeMemoryOvercommit
  - alert:
      match:
        alertname: KubeCPUOvercommit
  - manual: true

defaults:
  model: sonnet

steps:
  - name: assess-pressure
    agent: prometheus-analyst
    model: sonnet
    task: |
      Assess current resource pressure:
      - Per-node CPU usage and requests vs limits
      - Per-node memory usage and requests vs limits
      - Identify nodes under most pressure
      - Check for OOM events in last hour

      Focus on Pi cluster constraints:
      - Pi 5 (8GB): Higher capacity
      - Pi 3 (1GB): Very limited, check if overloaded
    output: pressure_analysis

  - name: identify-hogs
    agent: k8s-diagnostician
    model: haiku
    task: |
      Identify resource-heavy workloads:
      - Top 5 pods by CPU usage
      - Top 5 pods by memory usage
      - Any pods exceeding their requests
      - Any pods with no limits set
    output: resource_hogs

  - name: check-scaling
    agent: argocd-operator
    model: haiku
    task: |
      Check if any deployments can be scaled:
      - List deployments with >1 replica
      - Check HPA configurations
      - Identify candidates for scale-down
    output: scaling_options

  - name: recommend-actions
    agent: k8s-orchestrator
    model: sonnet
    task: |
      Recommend resource optimization actions:

      Analysis:
      - Pressure: {{ steps.assess-pressure.output }}
      - Top consumers: {{ steps.identify-hogs.output }}
      - Scaling options: {{ steps.check-scaling.output }}

      Prioritize actions by impact and safety:

      [SAFE] - Can be auto-applied:
      - Clean up completed jobs/pods
      - Identify and report issues

      [CONFIRM] - Require approval:
      - Scale down non-critical deployments
      - Adjust resource limits
      - Evict low-priority pods

      [FORBIDDEN] - Never auto-apply:
      - Delete PVCs
      - Delete critical workloads
    output: recommendations

  - name: cleanup
    agent: k8s-diagnostician
    model: haiku
    task: |
      Perform safe cleanup actions:
      - Delete completed jobs older than 1 hour
      - Delete succeeded pods
      - Delete failed pods older than 24 hours

      Report what was cleaned up.
    output: cleanup_result
    confirm: false

outputs:
  - pressure_analysis
  - recommendations
  - cleanup_result