flynn/TROUBLESHOOTING.md
William Valentin 8a6cd7f559 docs: Add comprehensive documentation for production deployment and contribution
This commit adds 6 new documentation files to fill critical gaps:

- CONTRIBUTING.md: Developer onboarding guide with setup, workflow,
  code style, testing, and adding features

- TROUBLESHOOTING.md: Common issues and solutions for errors,
  model issues, tool issues, channel issues, gateway issues,
  configuration issues, and memory/database issues

- docs/api/PROTOCOL.md: Gateway JSON-RPC protocol documentation
  with connection, authentication, message format, methods,
  events, error codes, and example client implementation

- docs/api/TOOLS.md: Tools API documentation covering tool interface,
  input schema format, result format, tool patterns,
  tool registration, tool policy, execution flow, and
  builtin tools reference

- docs/deployment/PRODUCTION.md: Production deployment guide
  covering Docker deployment, systemd service, security,
  configuration, monitoring, backup & recovery, and
  performance tuning

- docs/performance/TUNING.md: Performance optimization guide
  covering context management, model routing, tool execution,
  memory & embeddings, session management, database
  performance, gateway performance, and resource usage

These files complement the existing excellent documentation
(README.md, AGENTS.md, ARCHITECTURE.md, STRUCTURE.md,
CONVENTIONS.md) to provide complete coverage for users,
developers, and operators.
2026-02-13 16:07:29 -08:00


# Troubleshooting Flynn
This guide covers common issues, error messages, and debugging techniques for Flynn.
## Table of Contents
- [Common Errors](#common-errors)
- [Model Issues](#model-issues)
- [Tool Issues](#tool-issues)
- [Channel Issues](#channel-issues)
- [Gateway Issues](#gateway-issues)
- [Configuration Issues](#configuration-issues)
- [Memory & Database Issues](#memory--database-issues)
- [Debug Mode](#debug-mode)
- [Getting Help](#getting-help)
## Common Errors
### `Error: Environment variable FLYNN_CONFIG is not set`
Flynn can't find your configuration file.
**Solution:**
```bash
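# Create config from template (the target directory must exist first)
mkdir -p ~/.config/flynn
cp config/default.yaml ~/.config/flynn/config.yaml
# Point Flynn at it via the variable named in the error
export FLYNN_CONFIG=~/.config/flynn/config.yaml
# Or specify config file explicitly
flynn start --config ~/my-config.yaml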
```
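To confirm which file would be picked up before starting Flynn, a small check like the following can help. It is a sketch only: the helper name `check_flynn_config` is hypothetical, and it assumes `FLYNN_CONFIG` takes precedence over the default path used in this guide, which is not confirmed behavior.

```bash
# Hypothetical helper: report which config file would be used and whether it is readable.
# Assumption: FLYNN_CONFIG wins over the default path ~/.config/flynn/config.yaml.
check_flynn_config() {
  local path="${FLYNN_CONFIG:-$HOME/.config/flynn/config.yaml}"
  if [ -r "$path" ]; then
    echo "ok: $path"
    return 0
  fi
  echo "missing or unreadable: $path" >&2
  return 1
}
```

Run `check_flynn_config` in the same shell (or service environment) that launches Flynn, since that is the environment whose variables matter.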
### `Error: Failed to load config: ...`
Configuration validation failed. Check your YAML syntax.
**Solution:**
```bash
# Validate your config
flynn doctor --config ~/.config/flynn/config.yaml
# Check YAML syntax
yamllint ~/.config/flynn/config.yaml
```
Common issues:
- Missing quotes around special characters
- Incorrect indentation (YAML nesting is whitespace-sensitive; use spaces, conventionally 2 per level)
- Invalid boolean values (use `true`/`false`, not `yes`/`no`)
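
The boolean pitfall is easy to scan for. YAML 1.1 parsers read bare `yes`/`no`/`on`/`off` as booleans while YAML 1.2 parsers read them as strings, so their meaning depends on the parser. The helper below is a sketch (the name `find_ambiguous_booleans` is made up for illustration) that lists such lines with their line numbers.

```bash
# Hypothetical helper: print "line:content" for values written as bare yes/no/on/off.
# Quoted values such as 'no' are not reported.
find_ambiguous_booleans() {
  grep -niE ':[[:space:]]+(yes|no|on|off)[[:space:]]*(#.*)?$' "$1"
}
```

For example, `find_ambiguous_booleans ~/.config/flynn/config.yaml` prints nothing (and returns non-zero) when no such values are present.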
### `Error: Tool 'xxx' is not allowed by tool policy`
Tool execution blocked by policy configuration.
**Solution:**
1. Check tool policy in config:
```yaml
tools:
  policy: 'full' # or 'coding', 'messaging', 'minimal'
  profiles:
    full:
      allow: ['*']
      deny: []
```
2. Check agent-specific tool overrides:
```yaml
agents:
  my-agent:
    toolPolicy: 'full'
```
3. Verify tool is registered (check `src/daemon/index.ts`)
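
To reason about why a tool is blocked, it can help to test its name against the `allow` and `deny` patterns by hand. The sketch below assumes the patterns behave like shell-style globs (as `'*'` in the example suggests); Flynn's actual matching rules and precedence are not confirmed here, and `tool_matches` is a hypothetical helper.

```bash
# Hypothetical helper: succeed when the tool name matches any of the given glob patterns.
# Assumption: policy patterns are shell-style globs such as '*' or 'file.*'.
tool_matches() {
  local tool="$1" pattern
  shift
  for pattern in "$@"; do
    # $pattern is intentionally unquoted so that '*' acts as a wildcard.
    case "$tool" in
      $pattern) return 0 ;;
    esac
  done
  return 1
}
```

Under these assumptions, `tool_matches shell.exec '*'` succeeds, while `tool_matches shell.exec 'file.*' 'memory.*'` fails, which would correspond to a profile that does not allow `shell.exec`.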
### `Error: Session not found`
Gateway tried to access a non-existent session.
**Solution:**
```bash
# List active sessions
flynn sessions
# Sessions are auto-created on first message
# Ensure you're sending to a valid channel/sender combination
```
## Model Issues
### Model refuses to answer / "I can't help with that"
The model may be rejecting requests due to content filters or safety constraints.
**Solution:**
1. Try rephrasing your request
2. Check if the model has specific content policies
3. Try a different model tier:
```bash
# In TUI, switch models
/model complex
```
### Rate Limit Errors / Too Many Requests
API rate limits exceeded.
**Solution:**
1. Reduce request frequency
2. Add retry configuration in config:
```yaml
models:
  default:
    anthropic:
      apiKey: 'sk-...'
      retry:
        maxAttempts: 3
        initialDelayMs: 1000
        maxDelayMs: 30000
        multiplier: 2
```
3. Switch to a different provider or tier
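
To get a feel for what the retry settings mean in practice, the waits can be computed by hand. This sketch assumes conventional exponential backoff without jitter (delay = `initialDelayMs` × `multiplier`^(retry − 1), capped at `maxDelayMs`); Flynn's actual schedule may differ, and `backoff_delay_ms` is a hypothetical helper.

```bash
# Hypothetical helper: delay in ms before the Nth retry under plain exponential backoff.
# Usage: backoff_delay_ms RETRY [INITIAL_MS] [MAX_MS] [MULTIPLIER]
backoff_delay_ms() {
  local attempt="$1" initial="${2:-1000}" max="${3:-30000}" multiplier="${4:-2}"
  local delay="$initial" i=1
  # Multiply once per earlier retry, stopping as soon as the cap is reached.
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$max" ]; do
    delay=$((delay * multiplier))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$max" ]; then
    delay="$max"
  fi
  echo "$delay"
}
```

With the values from the config example, the first three retries would wait 1000, 2000, and 4000 ms, and from the sixth retry onward the wait stays at the 30000 ms cap.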
### Model Fallback Not Working
Fallback chain not triggering on errors.
**Solution:**
1. Check model router config:
```yaml
models:
  router:
    tiers:
      default: 'anthropic:claude-sonnet-4-20250514'
    fallbackChain:
      - 'github:claude-sonnet-4-5'
      - 'local:ollama:llama3'
```
2. Check error patterns in retry config (errors matching these patterns won't retry):
```yaml
retry:
  nonRetryablePatterns:
    - 'invalid_api_key'
    - 'permission_denied'
```
3. Enable debug logging to see fallback decisions:
```bash
DEBUG='*' flynn start
```
### Context Window Exceeded
Conversation too long for model's context window.
**Solution:**
1. Check compaction settings:
```yaml
agents:
  default:
    compaction:
      thresholdPct: 80
      keepTurns: 4
      summaryMaxTokens: 1024
```
2. Adjust `keepTurns`: higher values preserve more recent history but leave less room in the context window
3. Lower `thresholdPct` to trigger compaction earlier (raising it delays compaction)
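
The effect of `thresholdPct` is easier to judge as a token count. The sketch below assumes `thresholdPct` is the percentage of the model's context window at which compaction starts; that interpretation, the 200000-token window in the usage note, and the helper name are assumptions for illustration.

```bash
# Hypothetical helper: token count at which compaction would start.
# Usage: compaction_trigger_tokens CONTEXT_WINDOW_TOKENS THRESHOLD_PCT
compaction_trigger_tokens() {
  local context_window="$1" threshold_pct="$2"
  echo $((context_window * threshold_pct / 100))
}
```

For a 200000-token window, `compaction_trigger_tokens 200000 80` gives 160000 and `compaction_trigger_tokens 200000 60` gives 120000, so a lower percentage starts compaction sooner.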
## Tool Issues
### Tool Execution Timeout
Tool took too long to complete.
**Solution:**
1. Check timeout config:
```yaml
tools:
  executor:
    defaultTimeoutMs: 30000
```
2. Increase timeout for specific tools (if supported):
```bash
shell.exec --timeout 60000 "long-running-command"
```
3. Check if tool is hanging (stuck process, network issue)
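
To tell a slow command from a hanging one, it can be run outside Flynn under the same time limit. This sketch relies on the `timeout` utility from GNU coreutils (not installed by default on macOS), which exits with status 124 when the limit is hit; `run_with_timeout_ms` is a hypothetical wrapper, not part of Flynn.

```bash
# Hypothetical wrapper: run a command with a limit given in milliseconds.
# Returns the command's exit status, or 124 if GNU timeout had to stop it.
run_with_timeout_ms() {
  local ms="$1" secs status=0
  shift
  # timeout expects seconds; 30000 ms becomes "30.000".
  secs="$(printf '%d.%03d' "$((ms / 1000))" "$((ms % 1000))")"
  timeout "$secs" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after ${ms} ms: $*" >&2
  fi
  return "$status"
}
```

For example, `run_with_timeout_ms 30000 long-running-command` mirrors the `defaultTimeoutMs` value shown in step 1.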
### Tool Permission Denied
Tool can't access file/execute command.
**Solution:**
1. Check file permissions:
```bash
ls -la /path/to/file
chmod +x /path/to/executable
```
2. Check Docker sandbox permissions (if enabled):
```bash
# Ensure Docker daemon is running
docker ps
# Check Flynn's Docker access
groups $USER | grep docker
```
3. Verify hook configuration (hooks may block tools):
```bash
flynn config | grep -A 20 hooks
```
### Tool Output Truncated
Output exceeds maximum size limit.
**Solution:**
Check max output config:
```yaml
tools:
  executor:
    maxOutputBytes: 51200 # 50KB default
```
Increase limit if needed, or use file operations to handle large outputs.
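
Before raising the limit, it is worth measuring how much output a command really produces. The helpers below are a sketch: they count stdout and stderr together, which may not match exactly how Flynn measures output, and both function names are hypothetical.

```bash
# Hypothetical helper: print how many bytes a command writes to stdout and stderr combined.
output_size_bytes() {
  "$@" 2>&1 | wc -c | tr -d '[:space:]'
}

# Hypothetical helper: succeed when the command's output fits within LIMIT bytes.
# Usage: fits_output_limit LIMIT COMMAND [ARGS...]
fits_output_limit() {
  local limit="$1"
  shift
  [ "$(output_size_bytes "$@")" -le "$limit" ]
}
```

For example, `fits_output_limit 51200 cat big-report.txt` fails when the file is larger than the 51200-byte limit shown in the config, which suggests writing the result to a file instead.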
### Tool Not Found
Tool doesn't exist or isn't registered.
**Solution:**
```bash
# List available tools
flynn doctor --config ~/.config/flynn/config.yaml
# Check if tool is in builtin list
grep -r "name: 'your.tool'" src/tools/builtin/
```
If you added a custom tool, ensure it's registered in `src/daemon/index.ts`.
## Channel Issues
### Telegram Bot Not Responding
Telegram bot not receiving or processing messages.
**Solution:**
1. Check bot token in config:
```yaml
channels:
  telegram:
    enabled: true
    token: '123456789:ABCdefGHIjklMNOpqrSTUvwxYZ'
```
2. Verify bot is running:
```bash
# Check Flynn logs for telegram startup
flynn start 2>&1 | grep telegram
```
3. Test bot via Telegram API:
```bash
curl "https://api.telegram.org/bot<YOUR_TOKEN>/getMe"
```
4. Check the `allowedChatIds` whitelist:
```yaml
channels:
  telegram:
    allowedChatIds: ['123456789'] # Your chat ID
```
### Discord Bot Not Joining Channels
Discord bot permissions issues.
**Solution:**
1. Check bot token:
```yaml
channels:
  discord:
    enabled: true
    token: 'MTIzNDU2Nzg5O...ABcDefGhIjKlMnOpQrStUvWxYz'
```
2. Verify guild/channel IDs in whitelist:
```yaml
channels:
  discord:
    allowedGuildIds: ['123456789012345678']
    allowedChannelIds: ['123456789012345678']
```
3. Check bot permissions in Discord server (need `Read Messages`, `Send Messages`, `Embed Links`)
### Slack Webhooks Not Receiving
Slack event subscription issues.
**Solution:**
1. Verify signing secret in config:
```yaml
channels:
  slack:
    enabled: true
    signingSecret: 'a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6'
```
2. Check Slack app permissions and event subscriptions
3. Verify ngrok/tunnel if testing locally
### WhatsApp Bot Not Connecting
WhatsApp Web.js connection issues.
**Solution:**
1. Check phone number whitelist:
```yaml
channels:
  whatsapp:
    enabled: true
    allowedNumbers: ['+1234567890']
```
2. WhatsApp requires QR code scan on first run - run in foreground:
```bash
flynn start # Watch console for QR code
```
3. If running in Docker, ensure it can display QR code or use saved session
## Gateway Issues
### WebSocket Connection Refused
Can't connect to gateway WebSocket.
**Solution:**
1. Check gateway is running:
```bash
curl http://localhost:18800/health
```
2. Check port configuration:
```yaml
gateway:
  enabled: true
  port: 18800
```
3. Check firewall rules:
```bash
sudo ufw allow 18800
```
4. Check auth token if configured:
```yaml
gateway:
  auth:
    token: 'your-secret-token'
```
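When the gateway has only just been started, a refused connection may simply mean it is not listening yet. The sketch below polls the `/health` endpoint from step 1 a few times before giving up; the helper name `wait_for_gateway`, the retry count, and the one-second pause are arbitrary choices for illustration.

```bash
# Hypothetical helper: poll the gateway health endpoint until it answers or attempts run out.
# Usage: wait_for_gateway [URL] [ATTEMPTS]
wait_for_gateway() {
  local url="${1:-http://localhost:18800/health}" attempts="${2:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --silent --fail --max-time 2 "$url" >/dev/null 2>&1; then
      echo "gateway healthy after $i attempt(s)"
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  echo "gateway not reachable at $url" >&2
  return 1
}
```

If `wait_for_gateway` still fails after several attempts, the port, firewall, and auth checks in the steps above are the next things to look at.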
### Gateway Lock Active
Only one WebSocket client allowed at a time.
**Solution:**
```yaml
gateway:
  lock:
    enabled: false # Disable lock
```
Or disconnect existing client first.
### Tailscale Serve Not Exposing
Tailscale integration not working.
**Solution:**
1. Ensure Tailscale is installed and running:
```bash
tailscale status
```
2. Check Tailscale config:
```yaml
gateway:
  tailscaleServe:
    enabled: true
    hostname: 'my-flynn'
    port: 443
```
3. Check Flynn logs for Tailscale errors:
```bash
flynn start 2>&1 | grep -i tailscale
```
## Configuration Issues
### Invalid YAML Syntax
Configuration file has YAML syntax errors.
**Solution:**
```bash
# Validate with YAML linter
pip install yamllint
yamllint ~/.config/flynn/config.yaml
# Or use online YAML validator
# https://www.yamllint.com/
```
Common mistakes:
- Using tabs instead of spaces (YAML requires spaces)
- Incorrect indentation (must be consistent)
- Missing quotes around special characters (`:`, `{`, `}`, `[`, `]`)
- Unclosed brackets or quotes
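
Tabs are hard to spot by eye, so a quick scan can save time. The helper below is a sketch (the name `find_tabs` is made up for illustration) that prints every line containing a tab character along with its line number.

```bash
# Hypothetical helper: list lines that contain a tab character.
# Returns 0 when the file is tab-free and 1 when at least one tab was found.
find_tabs() {
  local tab
  tab="$(printf '\t')"
  if grep -n "$tab" "$1"; then
    return 1
  fi
  return 0
}
```

For example, `find_tabs ~/.config/flynn/config.yaml` prints nothing when the file uses only spaces.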
### Environment Variable Expansion Not Working
`${ENV_VAR}` in config not expanding.
**Solution:**
1. Ensure variable is set:
```bash
echo $MY_API_KEY
```
2. Check variable syntax in config:
```yaml
models:
  default:
    anthropic:
      apiKey: '${ANTHROPIC_API_KEY}' # Correct
      # Not: $ANTHROPIC_API_KEY or "${ANTHROPIC_API_KEY}"
```
3. Use the braced `${VAR}` form; a bare `$VAR` is not expanded:
```yaml
# Wrong
apiKey: $ANTHROPIC_API_KEY
# Correct
apiKey: '${ANTHROPIC_API_KEY}'
```
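A common cause is a variable that is set in the shell but not exported, so `echo` shows it while child processes such as Flynn do not see it. The sketch below lists every `${NAME}` placeholder in a config file that is missing or empty in the exported environment; `list_unset_config_vars` is a hypothetical helper, and it assumes placeholders use exactly the `${NAME}` form shown in this guide.

```bash
# Hypothetical helper: print each ${NAME} placeholder in the file that is not exported
# (or is exported but empty). printenv only sees exported variables.
list_unset_config_vars() {
  grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*\}' "$1" | tr -d '${}' | sort -u |
    while read -r name; do
      if [ -z "$(printenv "$name")" ]; then
        echo "$name"
      fi
    done
}
```

Any name printed by `list_unset_config_vars ~/.config/flynn/config.yaml` needs an `export NAME=...` in the environment that starts Flynn.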
### Secrets Showing in Logs
API keys or sensitive data visible in logs.
**Solution:**
Flynn automatically redacts secrets when showing config with `flynn config`.
If secrets still appear in output, review the following:
- Don't log config manually (use the built-in redaction)
- Don't commit config files to git (add to `.gitignore`)
- Use `FLYNN_CONFIG` env var for sensitive paths
## Memory & Database Issues
### SQLite Database Locked
Can't write to session or vector database.
**Solution:**
1. Check if another Flynn instance is running:
```bash
ps aux | grep flynn
```
2. Kill existing instances:
```bash
pkill flynn
```
3. Check database permissions:
```bash
ls -la ~/.local/share/flynn/*.db
chmod 644 ~/.local/share/flynn/*.db
```
### Memory Not Saving
Memory writes not persisting.
**Solution:**
1. Check memory directory:
```bash
ls -la ~/.local/share/flynn/memory/
```
2. Check namespace format (must be valid filename):
```bash
# Good
memory.write --namespace 'my-knowledge'
# Bad
memory.write --namespace 'my/namespace' # Creates directory
```
3. Check disk space:
```bash
df -h ~/.local/share/flynn/
```
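The namespace rule from step 2 can be checked before writing. This is a sketch of one plausible rule set (non-empty, no `/`, not `.` or `..`); Flynn's actual validation may be stricter, and `is_valid_namespace` is a hypothetical helper.

```bash
# Hypothetical helper: succeed when the namespace could be used as a single filename.
is_valid_namespace() {
  case "$1" in
    '' | . | .. | */*) return 1 ;;
    *) return 0 ;;
  esac
}
```

Under these assumptions, `is_valid_namespace 'my-knowledge'` succeeds and `is_valid_namespace 'my/namespace'` fails.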
### Vector Search Returns No Results
Hybrid search not finding matches.
**Solution:**
1. Ensure embeddings are generated:
```bash
# Check vector database
sqlite3 ~/.local/share/flynn/vectors.db "SELECT COUNT(*) FROM embeddings;"
```
2. Check embedding provider config:
```yaml
memory:
  embeddings:
    provider: 'openai' # or 'gemini', 'ollama', 'llamacpp'
    openai:
      apiKey: '${OPENAI_API_KEY}'
      model: 'text-embedding-3-small'
```
3. Try keyword search only:
```bash
memory.search --query 'my query' --keyword-only
```
## Debug Mode
### Enable Debug Logging
Flynn uses console logging. Enable verbose output:
```bash
# Set DEBUG environment variable (if using debug packages)
DEBUG='*' flynn start
# Or redirect all output to file
flynn start > /tmp/flynn.log 2>&1
# Monitor logs in real-time
tail -f /tmp/flynn.log
```
### Run Doctor Command
Use the built-in diagnostic tool:
```bash
# Full check
flynn doctor
# Validate a specific config file
flynn doctor --config ~/.config/flynn/config.yaml
# Check connectivity
flynn doctor --check connectivity
```
The doctor checks:
- Config syntax and validation
- Model provider connectivity
- Channel adapter status
- Memory and database integrity
- Gateway health
### Test Components Individually
```bash
# Test model connectivity
flynn send "Hello, world!"
# Test specific tool (in TUI)
/file.read /path/to/file
# Test channel
# Send message via platform (Telegram, Discord, etc.)
# Test gateway
curl http://localhost:18800/health
```
### Inspect Session Data
```bash
# View sessions
flynn sessions
# Inspect session database directly
sqlite3 ~/.local/share/flynn/sessions.db "SELECT * FROM sessions LIMIT 5;"
# View session messages
sqlite3 ~/.local/share/flynn/sessions.db "SELECT * FROM messages WHERE session_id = '...';"
```
### Enable Heartbeat Monitoring
Flynn includes a heartbeat monitor that checks system health:
```yaml
automation:
  heartbeat:
    enabled: true
    interval: '5m'
    checks:
      - 'gateway'
      - 'model'
      - 'channels'
      - 'memory'
      - 'disk'
```
Failures will be logged and notifications sent (if configured).
## Getting Help
### Log Your Issue
When reporting issues, include:
1. **Error message** (full stack trace if available)
2. **Flynn version**: `flynn --version`
3. **Configuration** (redact secrets): `flynn config`
4. **Node version**: `node --version`
5. **OS**: `uname -a`
6. **Steps to reproduce**
### Check Existing Issues
Search GitHub issues for similar problems:
- https://github.com/your-repo/flynn/issues
### Ask for Help
- **GitHub Issues**: Bug reports and feature requests
- **GitHub Discussions**: Questions and community help
- **Documentation**: Check README.md, AGENTS.md, and this file
### Provide Diagnostic Output
```bash
# Run doctor and save output
flynn doctor > /tmp/flynn-doctor.txt 2>&1
# Get config (redacted)
flynn config > /tmp/flynn-config.txt
# Get logs (assumes Flynn runs as a systemd unit named flynn)
journalctl -u flynn -n 100 --no-pager > /tmp/flynn-logs.txt
# Attach these files to your issue
```
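Logs and doctor output may not go through the same redaction as `flynn config`, so an extra pass before attaching files is a reasonable precaution. The filter below is a sketch that blanks the values of the secret-bearing keys used in this guide's examples (`apiKey`, `token`, `signingSecret`); it is a hypothetical helper, and other key names or secrets embedded in free-form log lines would need their own patterns.

```bash
# Hypothetical filter: replace the values of known secret keys with a placeholder.
# Reads stdin, writes stdout; lines without those keys pass through unchanged.
redact_secrets() {
  sed -E "s/^([[:space:]]*(apiKey|token|signingSecret)[[:space:]]*:[[:space:]]*).*/\1'[REDACTED]'/"
}
```

For example, `redact_secrets < /tmp/flynn-logs.txt > /tmp/flynn-logs-redacted.txt` produces a copy to attach; it is still worth skimming the result by eye.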
---
Still having issues? Open a GitHub issue with the diagnostic output above.