feat(runtime): add talk mode and capture tools

William Valentin
2026-02-16 10:17:24 -08:00
parent a9b38150c0
commit 83b8e38b11
12 changed files with 391 additions and 4 deletions
@@ -16,6 +16,8 @@ Self-hosted personal AI assistant with Telegram and Terminal interfaces.
- **Docker Sandboxing**: Per-session container isolation for tool execution
- **Multi-Agent Routing**: Config-driven agent selection per sender/channel with tool profiles
- **Media Pipeline**: Image analysis, outbound attachments, audio transcription and native audio passthrough across all channels
- **Talk Mode (Wake Phrase)**: Optional wake-phrase gating (`audio.talk_mode`) with timed conversation windows
- **Capture Tools**: `screen.capture` and `camera.capture` tools for host capture workflows
- **Session Transfer**: Move conversations between frontends
- **CLI**: Full command-line interface (`flynn start`, `send`, `doctor`, `completion`, etc.)
- **Shell Completion**: Auto-generated completions for bash, zsh, and fish with `--install` flag
@@ -294,6 +296,10 @@ audio:
| `provider.endpoint` | yes | Whisper-compatible API endpoint |
| `provider.api_key` | no | Bearer token for authentication |
| `provider.model` | no | Model name sent in request (default: `whisper-1`) |
| `talk_mode.enabled` | no | Enable wake-phrase talk mode gating (default: `false`) |
| `talk_mode.wake_phrase` | no | Phrase that activates talk mode (default: `hey flynn`) |
| `talk_mode.timeout_ms` | no | Length of the active listening window after the wake phrase, in milliseconds (default: `120000`, i.e. 2 minutes) |
| `talk_mode.allow_manual_toggle` | no | Enable `/talk on\|off\|status` controls (default: `true`) |
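
The talk-mode options above can be read as a small gate in front of the assistant: input is ignored until the wake phrase is heard, then accepted until the window expires. The following is a minimal sketch of that idea, not Flynn's actual implementation. The class name `TalkModeGate`, its methods, and the choice not to extend the window on each message are all assumptions made for illustration; only the defaults (`hey flynn`, `120000` ms) come from the table.

```python
import time


class TalkModeGate:
    """Hypothetical wake-phrase gate; names and behavior are illustrative only."""

    def __init__(self, wake_phrase="hey flynn", timeout_ms=120000, clock=time.monotonic):
        self.wake_phrase = wake_phrase.lower()
        self.timeout_s = timeout_ms / 1000
        self.clock = clock  # injectable so the window can be tested without sleeping
        self.active_until = float("-inf")  # no window is open initially

    def is_active(self):
        """True while the listening window opened by the wake phrase is still open."""
        return self.clock() < self.active_until

    def accept(self, transcript):
        """Return the text to forward to the assistant, or None to ignore it."""
        text = transcript.strip()
        if text.lower().startswith(self.wake_phrase):
            # The wake phrase opens (or renews) the window; forward what follows it.
            self.active_until = self.clock() + self.timeout_s
            rest = text[len(self.wake_phrase):].lstrip(" ,.!?")
            return rest or None
        if self.is_active():
            # Inside an open window: forward unchanged.
            # Assumption: ordinary messages do not extend the window.
            return text
        return None
```

With the defaults, `TalkModeGate().accept("Hey Flynn, what's on my calendar?")` forwards the question and opens a two-minute window, while the same question without the wake phrase is dropped unless a window is already open.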
Without an `audio` config, voice messages sent to a model that cannot accept audio input produce an error message for the user. For local transcription, you can run a whisper.cpp server:
@@ -314,6 +320,17 @@ docker run -d \
# docker compose up -d
```
### Capture Tools
Flynn includes host capture tools:
- `screen.capture`: captures the current screen and returns a base64-encoded image payload
- `camera.capture`: captures a single camera frame and returns a base64-encoded image payload
Notes:
- These are host-command wrappers and require platform binaries:
  - macOS: `screencapture` (screen), `imagesnap` (camera)
  - Linux: `grim` or ImageMagick `import` (screen), `ffmpeg` (camera)
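
To make the wrapper idea concrete, here is a rough sketch of how a host command could be chosen per platform and its output turned into a base64 payload. It is not Flynn's code: the function names, the `/dev/video0` device, and the specific flags passed to each binary are assumptions for illustration.

```python
import base64
import platform
import shutil
import subprocess
import tempfile
from pathlib import Path


def screen_capture_command(out_path, system=None, which=shutil.which):
    """Build the argv for a screen capture (hypothetical helper)."""
    system = system or platform.system()
    if system == "Darwin":
        return ["screencapture", "-x", out_path]  # -x: capture without the shutter sound
    if system == "Linux":
        if which("grim"):
            return ["grim", out_path]
        if which("import"):
            return ["import", "-window", "root", out_path]  # ImageMagick
        raise RuntimeError("screen capture needs grim or ImageMagick import")
    raise RuntimeError(f"unsupported platform: {system}")


def camera_capture_command(out_path, system=None, device="/dev/video0"):
    """Build the argv for a single camera frame (device path is an assumption)."""
    system = system or platform.system()
    if system == "Darwin":
        return ["imagesnap", out_path]
    if system == "Linux":
        return ["ffmpeg", "-y", "-f", "v4l2", "-i", device, "-frames:v", "1", out_path]
    raise RuntimeError(f"unsupported platform: {system}")


def capture_base64(build_command, suffix=".png"):
    """Run a capture command into a temp file and return its bytes as base64 text."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = str(Path(tmp) / f"capture{suffix}")
        subprocess.run(build_command(out_path), check=True, capture_output=True)
        return base64.b64encode(Path(out_path).read_bytes()).decode("ascii")
```

Calling `capture_base64(screen_capture_command)` on a host with the required binary would return the screenshot as a base64 string; `capture_base64(camera_capture_command, suffix=".jpg")` would do the same for one camera frame.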
## Telegram Commands
| Command | Description |