feat(heartbeat): add process memory and backup health checks
@@ -652,12 +652,14 @@ automation:
   heartbeat:
     enabled: true
     interval: "5m" # Check every 5 minutes
-    checks: [gateway, model, channels, memory, disk]
+    checks: [gateway, model, channels, memory, disk, process_memory, backup]
     notify:
       channel: telegram
       peer: "123456789"
     failure_threshold: 2 # Notify after 2 consecutive failures
     disk_threshold_mb: 100 # Warn when <100MB free
+    process_memory_threshold_mb: 1500 # Warn when RSS memory exceeds threshold
+    backup_failure_threshold: 1 # Warn when backup failures meet threshold
 ```
 
 ### Heartbeat Checks
@@ -669,6 +671,8 @@ automation:
 | `channels` | At least one channel adapter is connected |
 | `memory` | Memory directory is readable and writable |
 | `disk` | Free disk space exceeds threshold |
+| `process_memory` | Flynn process RSS memory usage stays under threshold |
+| `backup` | Backup scheduler consecutive failures stay under threshold |
 
 The monitor sends a notification when failures reach the configured threshold and a recovery notification when all checks pass again.
 
@@ -683,6 +687,8 @@ The monitor sends a notification when failures reach the configured threshold an
 | `notify.peer` | no | Peer/chat ID for notifications |
 | `failure_threshold` | no | Consecutive failures before notifying (default: `2`) |
 | `disk_threshold_mb` | no | Disk space warning threshold in MB (default: `100`) |
+| `process_memory_threshold_mb` | no | RSS memory threshold in MB for `process_memory` check (default: `1500`) |
+| `backup_failure_threshold` | no | Consecutive backup failures threshold for `backup` check (default: `1`) |
 
 ## Gmail Pub/Sub Watcher
 
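Review note: the documented behavior (notify once failures reach `failure_threshold`, then send a recovery notification when all checks pass again) can be sketched roughly as below. This is an illustrative Python sketch, not Flynn's implementation. The names `HeartbeatState`, `record`, `process_memory_ok`, and `backup_ok` are hypothetical. Counting failures per heartbeat cycle, notifying only once per outage, and treating an RSS value exactly at the threshold as passing are all assumptions that the docs do not spell out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


def process_memory_ok(rss_mb: float, threshold_mb: float = 1500) -> bool:
    # Docs: "Warn when RSS memory exceeds threshold".
    # Assumption: a value exactly at the threshold still passes.
    return rss_mb <= threshold_mb


def backup_ok(consecutive_backup_failures: int, threshold: int = 1) -> bool:
    # Docs: "Warn when backup failures meet threshold", so the check
    # fails as soon as the failure count reaches the threshold.
    return consecutive_backup_failures < threshold


@dataclass
class HeartbeatState:
    failure_threshold: int = 2      # documented default
    consecutive_failures: int = 0
    alerted: bool = False

    def record(self, results: Dict[str, bool]) -> Optional[str]:
        """Process one heartbeat cycle; return "failure", "recovery", or None."""
        if all(results.values()):
            was_alerted = self.alerted
            self.consecutive_failures = 0
            self.alerted = False
            # Recovery is only announced if a failure was announced earlier.
            return "recovery" if was_alerted else None

        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold and not self.alerted:
            self.alerted = True  # assumption: notify once per outage
            return "failure"
        return None
```

With the defaults, a single cycle where `process_memory` fails yields no notification, a second consecutive failing cycle yields `"failure"`, and the next fully passing cycle yields `"recovery"`.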