feat(ops): add setup operator pack, heartbeat alert cooldown, and doctor strict mode
@@ -89,6 +89,9 @@ flynn send "What's the weather in London?"
 # Check system health
 flynn doctor --config ~/.config/flynn/config.yaml

+# Treat warnings as failures (useful in CI)
+flynn doctor --strict
+
 # Show current config (secrets masked)
 flynn config

@@ -705,6 +708,7 @@ automation:
   heartbeat:
     enabled: true
     interval: "5m" # Check every 5 minutes
+    notify_cooldown: "30m" # Suppress repeated alerts inside cooldown window
     checks: [gateway, model, channels, memory, disk, process_memory, backup, provider_errors]
     notify:
       channel: telegram
@@ -731,6 +735,7 @@ automation:
 | `provider_errors` | Model provider error rates stay below threshold |

 The monitor sends a notification when failures reach the configured threshold and a recovery notification when all checks pass again.
+Repeated failure/recovery notifications are throttled by `notify_cooldown`.

 ### Heartbeat Config Fields

@@ -738,7 +743,8 @@ The monitor sends a notification when failures reach the configured threshold an
 |-------|----------|-------------|
 | `enabled` | no | Enable the heartbeat monitor (default: `false`) |
 | `interval` | no | Check interval: `60s`, `5m`, `1h` (default: `5m`) |
-| `checks` | no | Which checks to run (default: all five) |
+| `notify_cooldown` | no | Minimum time between repeated heartbeat notifications of the same type (default: `30m`) |
+| `checks` | no | Which checks to run (default: `gateway, model, channels, memory, disk, process_memory, backup, provider_errors`) |
 | `notify.channel` | no | Channel to send failure/recovery notifications |
 | `notify.peer` | no | Peer/chat ID for notifications |
 | `failure_threshold` | no | Consecutive failures before notifying (default: `2`) |
@@ -748,6 +754,23 @@ The monitor sends a notification when failures reach the configured threshold an
 | `provider_error_rate_threshold` | no | Error-rate threshold (0..1) for `provider_errors` check (default: `0.5`) |
 | `provider_error_min_calls` | no | Minimum provider calls before applying error-rate threshold (default: `5`) |

+### Common Schedules and Routing
+
+- Nightly backups to Telegram alerts:
+  - `backup.schedule: "0 2 * * *"`
+  - `backup.notify.channel: telegram`
+- Weekday daily briefing to Discord:
+  - `automation.daily_briefing.schedule: "0 8 * * 1-5"`
+  - `automation.daily_briefing.output.channel: discord`
+- High-frequency heartbeat to Slack:
+  - `automation.heartbeat.interval: "2m"`
+  - `automation.heartbeat.notify.channel: slack`
+- MinIO sync every 6h to WebChat:
+  - `automation.minio_sync.interval: "6h"`
+  - `automation.minio_sync.notify.channel: webchat`
+
+`flynn setup` now includes an Operator Pack option in Automation that preconfigures scheduled backups, heartbeat alerts, a daily briefing, and a default MinIO sync task.
+
 ## Gmail Pub/Sub Watcher

 Monitor a Gmail inbox and forward new messages into the agent pipeline.
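
Note on the heartbeat hunks above: the README text describes two notification rules, `failure_threshold` (consecutive failures before notifying) and `notify_cooldown` (minimum time between repeated notifications of the same type). The Python sketch below is one possible reading of those rules, not Flynn's implementation. The class `HeartbeatNotifier`, its method names, and the choice to re-check the cooldown on every failing run are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HeartbeatNotifier:
    """Hypothetical model of the documented rules; not Flynn's actual code."""

    failure_threshold: int = 2          # documented default for failure_threshold
    cooldown_seconds: float = 30 * 60   # documented default for notify_cooldown ("30m")
    consecutive_failures: int = 0
    alerting: bool = False              # True once the failure threshold has been reached
    last_sent: dict = field(default_factory=dict)  # notification kind -> last send time

    def _allowed(self, kind: str, now: float) -> bool:
        """Return True (and record the send) unless `kind` is still cooling down."""
        last = self.last_sent.get(kind)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self.last_sent[kind] = now
        return True

    def observe(self, all_checks_passed: bool, now: float) -> Optional[str]:
        """Feed one heartbeat result; return "failure", "recovery", or None."""
        if all_checks_passed:
            was_alerting = self.alerting
            self.consecutive_failures = 0
            self.alerting = False
            if was_alerting and self._allowed("recovery", now):
                return "recovery"
            return None

        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.alerting = True
            if self._allowed("failure", now):
                return "failure"
        return None


if __name__ == "__main__":
    notifier = HeartbeatNotifier()
    # Simulated runs at a 5-minute interval: (all checks passed?, timestamp in seconds)
    runs = [(False, 0), (False, 300), (False, 600), (True, 900), (False, 1200), (False, 1500)]
    for ok, t in runs:
        print(t, notifier.observe(ok, t))
```

Under this reading, a service that flaps inside the cooldown window produces one failure alert and one recovery alert, and later alerts of the same kind are dropped until 30 minutes have passed since the last one that was sent.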
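
Note on the `provider_errors` rows above: the table documents `provider_error_rate_threshold` (default `0.5`) and `provider_error_min_calls` (default `5`). A minimal sketch of how those two fields could interact follows. The function name `provider_errors_ok` is hypothetical, and both the handling of an error rate exactly equal to the threshold and the passing result below the minimum call count are assumptions, since the README only says the rate should "stay below threshold".

```python
def provider_errors_ok(
    errors: int,
    calls: int,
    rate_threshold: float = 0.5,  # documented default for provider_error_rate_threshold
    min_calls: int = 5,           # documented default for provider_error_min_calls
) -> bool:
    """Hypothetical check: pass while the provider error rate stays below the threshold."""
    if calls < min_calls:
        # Too few calls to judge; assumed to count as passing.
        return True
    # Assumption: a rate exactly equal to the threshold counts as a failure.
    return errors / calls < rate_threshold


if __name__ == "__main__":
    print(provider_errors_ok(errors=2, calls=4))  # below min_calls, threshold not applied
    print(provider_errors_ok(errors=3, calls=5))  # error rate 0.6
    print(provider_errors_ok(errors=2, calls=5))  # error rate 0.4
```

The minimum-call guard keeps a single failed request right after startup from tripping the check.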