feat(ops): add setup operator pack, heartbeat alert cooldown, and doctor strict mode

William Valentin
2026-02-16 14:57:56 -08:00
parent 030fb13a26
commit 3210e75c94
12 changed files with 274 additions and 17 deletions
+24 -1
@@ -89,6 +89,9 @@ flynn send "What's the weather in London?"
# Check system health
flynn doctor --config ~/.config/flynn/config.yaml
# Treat warnings as failures (useful in CI)
flynn doctor --strict
# Show current config (secrets masked)
flynn config
@@ -705,6 +708,7 @@ automation:
heartbeat:
enabled: true
interval: "5m" # Check every 5 minutes
notify_cooldown: "30m" # Suppress repeated alerts inside cooldown window
checks: [gateway, model, channels, memory, disk, process_memory, backup, provider_errors]
notify:
channel: telegram
@@ -731,6 +735,7 @@ automation:
| `provider_errors` | Model provider error rates stay below threshold |
The monitor sends a notification when failures reach the configured threshold and a recovery notification when all checks pass again.
Repeated failure/recovery notifications are throttled by `notify_cooldown`.
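The throttling rule can be sketched roughly as follows. This is a minimal illustration, not Flynn's actual implementation: the `NotificationThrottle` class and its `should_send` method are hypothetical names, and the sketch assumes that each notification type (`failure`, `recovery`) has its own cooldown window and that a suppressed notification does not restart that window.

```python
from dataclasses import dataclass, field


@dataclass
class NotificationThrottle:
    """Hypothetical helper: suppress repeats of the same notification type."""

    cooldown_seconds: float
    # Maps a notification type ("failure" or "recovery") to when it was last sent.
    last_sent: dict = field(default_factory=dict)

    def should_send(self, kind: str, now: float) -> bool:
        previous = self.last_sent.get(kind)
        if previous is not None and now - previous < self.cooldown_seconds:
            return False  # still inside the cooldown window for this type
        self.last_sent[kind] = now
        return True


throttle = NotificationThrottle(cooldown_seconds=30 * 60)  # notify_cooldown: "30m"
decisions = [
    throttle.should_send("failure", now=0),         # first failure alert
    throttle.should_send("failure", now=10 * 60),   # repeat after 10 minutes
    throttle.should_send("recovery", now=12 * 60),  # different type, own window
    throttle.should_send("failure", now=31 * 60),   # window has elapsed
]
print(decisions)  # [True, False, True, True]
```

Under these assumptions, a flapping check produces at most one alert of each type per cooldown window.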
### Heartbeat Config Fields
@@ -738,7 +743,8 @@ The monitor sends a notification when failures reach the configured threshold an
|-------|----------|-------------|
| `enabled` | no | Enable the heartbeat monitor (default: `false`) |
| `interval` | no | Check interval: `60s`, `5m`, `1h` (default: `5m`) |
| `checks` | no | Which checks to run (default: all five) |
| `notify_cooldown` | no | Minimum time between repeated heartbeat notifications of the same type (default: `30m`) |
| `checks` | no | Which checks to run (default: `gateway, model, channels, memory, disk, process_memory, backup, provider_errors`) |
| `notify.channel` | no | Channel to send failure/recovery notifications |
| `notify.peer` | no | Peer/chat ID for notifications |
| `failure_threshold` | no | Consecutive failures before notifying (default: `2`) |
@@ -748,6 +754,23 @@ The monitor sends a notification when failures reach the configured threshold an
| `provider_error_rate_threshold` | no | Error-rate threshold (0..1) for `provider_errors` check (default: `0.5`) |
| `provider_error_min_calls` | no | Minimum provider calls before applying error-rate threshold (default: `5`) |
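As a rough illustration of how the last two fields might interact, the sketch below evaluates a `provider_errors` check. The function name and signature are hypothetical. It assumes the check passes while fewer than `provider_error_min_calls` calls have been observed, and fails once the error rate reaches the threshold.

```python
def provider_errors_ok(
    errors: int,
    calls: int,
    rate_threshold: float = 0.5,  # provider_error_rate_threshold
    min_calls: int = 5,           # provider_error_min_calls
) -> bool:
    """Return True when the (hypothetical) provider_errors check passes."""
    if calls == 0 or calls < min_calls:
        # Assumption: too few calls to judge, so the check is treated as passing.
        return True
    # Assumption: the rate must stay strictly below the threshold to pass.
    return errors / calls < rate_threshold


print(provider_errors_ok(errors=3, calls=4))   # below min_calls, so passes
print(provider_errors_ok(errors=3, calls=10))  # rate 0.3, passes
print(provider_errors_ok(errors=6, calls=10))  # rate 0.6, fails
```

The minimum-call guard keeps a single failed request after a quiet period from being reported as a 100% error rate.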
### Common Schedules and Routing
- Nightly backups with Telegram alerts:
- `backup.schedule: "0 2 * * *"`
- `backup.notify.channel: telegram`
- Weekday daily briefing to Discord:
- `automation.daily_briefing.schedule: "0 8 * * 1-5"`
- `automation.daily_briefing.output.channel: discord`
- High-frequency heartbeat to Slack:
- `automation.heartbeat.interval: "2m"`
- `automation.heartbeat.notify.channel: slack`
- MinIO sync every 6h to WebChat:
- `automation.minio_sync.interval: "6h"`
- `automation.minio_sync.notify.channel: webchat`
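The dotted paths above appear to be shorthand for nested keys in `config.yaml` (compare the `automation.heartbeat` block shown earlier). The sketch below shows that mapping for the Slack heartbeat example. The `expand` helper is hypothetical and not part of Flynn.

```python
def expand(settings: dict) -> dict:
    """Turn dotted paths like 'a.b.c' into nested dictionaries."""
    root: dict = {}
    for dotted, value in settings.items():
        node = root
        *parents, leaf = dotted.split(".")
        for key in parents:
            # Create intermediate sections on demand and descend into them.
            node = node.setdefault(key, {})
        node[leaf] = value
    return root


config = expand({
    "automation.heartbeat.interval": "2m",
    "automation.heartbeat.notify.channel": "slack",
})
print(config)
# {'automation': {'heartbeat': {'interval': '2m', 'notify': {'channel': 'slack'}}}}
```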
`flynn setup` now includes an Operator Pack option in Automation that preconfigures scheduled backups, heartbeat alerts, a daily briefing, and a default MinIO sync task.
## Gmail Pub/Sub Watcher
Monitor a Gmail inbox and forward new messages into the agent pipeline.