# Flynn — Operator DX Milestone

## What This Is
A focused quality-of-life milestone for Flynn's operator (you). Flynn is a mature multi-channel AI assistant daemon with 10 model providers, 5 channel adapters, 40+ tools, a web dashboard, automation, sandboxing, and 1077 passing tests. This milestone targets the three biggest sources of friction in developing and operating Flynn: the monolithic daemon wiring file (`daemon/index.ts`), the lack of multi-environment config management, and limited runtime observability.

## Core Value
Make Flynn easier to reason about, configure, and monitor, so that adding features and diagnosing issues take minutes, not hours.

## Requirements

### Validated
- ✓ Multi-channel AI assistant daemon with tool loop — existing
- ✓ 10 model providers (Anthropic, OpenAI, Gemini, Bedrock, GitHub, OpenRouter, Zhipu, xAI, Ollama, llama.cpp) — existing
- ✓ 5 channel adapters (Telegram, Discord, Slack, WhatsApp, WebChat) — existing
- ✓ WebSocket gateway with JSON-RPC protocol — existing
- ✓ Web dashboard SPA (dashboard, chat, sessions, settings, usage) — existing
- ✓ YAML config with Zod validation and env var expansion — existing
- ✓ SQLite session persistence with TTL pruning — existing
- ✓ Memory system with hybrid keyword + vector search — existing
- ✓ Docker sandboxing per session — existing
- ✓ Multi-agent routing with per-agent config — existing
- ✓ Automation: cron, webhooks, heartbeat, Gmail watcher — existing
- ✓ MCP tool server integration — existing
- ✓ Skills system (bundled/managed/workspace) — existing
- ✓ Media pipeline (image analysis, audio transcription, outbound attachments) — existing
- ✓ Context compaction with memory extraction — existing
- ✓ Memory persistence is hybrid: manual (`memory.write`) plus optional auto-extraction during compaction (`memory.auto_extract`) — existing
- ✓ Tool policy profiles with allow/deny lists — existing
- ✓ 1077 tests passing — existing

### Active
- [ ] Decompose `daemon/index.ts` into focused service modules
- [ ] Multi-environment config system (base + overlays)
- [ ] Live operational dashboard with real-time metrics
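A minimal sketch of what the first item could look like, assuming each concern now wired directly in `daemon/index.ts` (model clients, channels, memory, the vector indexer, session pruning) becomes its own module behind a small lifecycle contract. `Service`, `ServiceHost`, and their methods are hypothetical names for illustration, not existing Flynn APIs.

```typescript
// Hypothetical lifecycle contract for a decomposed daemon module (not an existing Flynn API).
interface Service {
  readonly name: string;
  start(): Promise<void>;
  stop(): Promise<void>;
}

// Hypothetical host: starts services in registration order and stops them in
// reverse order, so each module can rely on everything registered before it.
class ServiceHost {
  private readonly services: Service[] = [];
  private readonly started: Service[] = [];

  register(service: Service): this {
    this.services.push(service);
    return this;
  }

  async start(): Promise<void> {
    for (const service of this.services) {
      try {
        await service.start();
      } catch (err) {
        // Roll back: stop whatever already started, then surface the failure.
        await this.stop();
        throw new Error(`service "${service.name}" failed to start: ${String(err)}`);
      }
      this.started.push(service);
    }
  }

  async stop(): Promise<void> {
    // Reverse order; keep shutting down even if one service fails to stop cleanly.
    while (this.started.length > 0) {
      const service = this.started.pop()!;
      try {
        await service.stop();
      } catch (err) {
        console.error(`service "${service.name}" failed to stop:`, err);
      }
    }
  }
}
```

Stopping in reverse registration order is the usual rule for graceful shutdown: if channels are registered after the model clients and memory they depend on, they stop first. Rolling back on a failed start keeps a half-initialized daemon from lingering.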
### Out of Scope

- New channel adapters (Signal, Matrix, Teams, Google Chat) — not the focus of this milestone
- Companion apps (macOS, iOS, Android) — massive scope, different project
- Structured logging framework — would complement the dashboard but adds complexity; evaluate after the dashboard reveals which metrics matter
- Agent intelligence features (sub-agent spawning, planning) — separate milestone
- ESLint / type safety cleanup — worthwhile but not blocking current development

## Context
Flynn has been through rapid feature development (P0-P8 and Tiers 1-4, all completed in about 7 days). The codebase grew fast, and the wiring layer absorbed the resulting complexity. Key context:

- **`daemon/index.ts`** is 1087 lines. It handles model client creation, channel setup, the agent factory, memory initialization, the vector indexer, session pruning, lifecycle management, and graceful shutdown. Every new feature touches this file.
- **Config** is a single YAML file validated by a 400+ line Zod schema. Maintaining dev, Docker, and production setups means duplicating YAML by hand; there is no layering and no environment-specific override mechanism.
- **Observability** is currently limited to `console.log`/`console.error`/`console.warn`. The web dashboard shows basic stats but no real-time metrics: no message trace, no queue depth, no model call latency, no error stream. Debugging requires reading source and tailing stdout.
- The existing web dashboard (a vanilla JS SPA) is functional and can be extended rather than rewritten.
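The base + overlays idea from the Active list can be sketched as a deep merge, assuming each YAML file has already been parsed into a plain object and that only the merged result is passed to the existing Zod schema. `deepMerge`, `resolveConfig`, and the merge rules (objects merge key by key; arrays and scalars from an overlay replace the base value) are illustrative assumptions rather than a settled design.

```typescript
// JSON-like values, as a YAML parser would produce them.
type ConfigValue = string | number | boolean | null | ConfigValue[] | ConfigObject;
interface ConfigObject {
  [key: string]: ConfigValue;
}

function isObject(value: ConfigValue | undefined): value is ConfigObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical merge rule: nested objects merge key by key; arrays and
// scalars from the overlay replace the base value wholesale. Inputs are not mutated.
function deepMerge(base: ConfigObject, overlay: ConfigObject): ConfigObject {
  const result: ConfigObject = { ...base };
  for (const [key, value] of Object.entries(overlay)) {
    const current = result[key];
    result[key] = isObject(current) && isObject(value) ? deepMerge(current, value) : value;
  }
  return result;
}

// Hypothetical entry point: apply overlays left to right (e.g. docker, then local); later files win.
function resolveConfig(base: ConfigObject, overlays: ConfigObject[]): ConfigObject {
  return overlays.reduce((acc, overlay) => deepMerge(acc, overlay), base);
}
```

With no overlays, `resolveConfig` returns the base config as-is, which is one way to satisfy the backwards-compatibility constraint: a single existing config file keeps working without changes.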
## Constraints

- **Tech stack**: TypeScript, Node.js >= 22, pnpm. No new frameworks (the dashboard stays vanilla JS).
- **Backwards compatibility**: Existing config files must continue to work. Decomposition must not change the public API or behavior.
- **Test coverage**: Maintain 1077+ passing tests. New modules need tests.
- **Single-operator**: This is a personal tool. Don't over-engineer for multi-tenant or team scenarios.

## Key Decisions
| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Decompose god file, not rewrite | Preserves working code, reduces risk, can be done incrementally | -- Pending |
| Config overlays over separate files | Environment-specific overrides are less error-prone than maintaining N complete configs | -- Pending |
| Extend existing web dashboard | Already works, no framework dependencies, familiar codebase | -- Pending |
| Skip structured logging for now | Dashboard will reveal what metrics actually matter; avoid premature abstraction | -- Pending |
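In line with the last decision, the metrics called out under Context (model call latency, queue depth, an error stream) could begin as a small in-process collector instead of a logging framework. A minimal sketch, assuming the daemon would push a `snapshot()` to the dashboard over the existing WebSocket gateway; `MetricsCollector`, its methods, and the window sizes are hypothetical. Percentiles use the nearest-rank method over a bounded window, so memory use stays flat.

```typescript
interface ErrorEvent {
  at: number; // epoch milliseconds
  message: string;
}

interface MetricsSnapshot {
  modelCalls: number;
  p50LatencyMs: number | null;
  p95LatencyMs: number | null;
  queueDepth: number;
  recentErrors: ErrorEvent[];
}

// Hypothetical in-process collector (not an existing Flynn module).
// Latencies and errors live in bounded windows; the oldest entries are dropped first.
class MetricsCollector {
  private readonly latencies: number[] = [];
  private readonly errors: ErrorEvent[] = [];
  private readonly latencyWindow: number;
  private readonly errorWindow: number;
  private modelCalls = 0;
  private queueDepth = 0;

  constructor(latencyWindow = 500, errorWindow = 50) {
    this.latencyWindow = latencyWindow;
    this.errorWindow = errorWindow;
  }

  recordModelCall(latencyMs: number): void {
    this.modelCalls += 1;
    this.latencies.push(latencyMs);
    if (this.latencies.length > this.latencyWindow) this.latencies.shift();
  }

  setQueueDepth(depth: number): void {
    this.queueDepth = depth;
  }

  recordError(message: string, at: number = Date.now()): void {
    this.errors.push({ at, message });
    if (this.errors.length > this.errorWindow) this.errors.shift();
  }

  snapshot(): MetricsSnapshot {
    const sorted = [...this.latencies].sort((a, b) => a - b);
    // Nearest-rank percentile over the current window; null until any call is recorded.
    const pct = (p: number): number | null =>
      sorted.length === 0 ? null : sorted[Math.ceil((p / 100) * sorted.length) - 1];
    return {
      modelCalls: this.modelCalls,
      p50LatencyMs: pct(50),
      p95LatencyMs: pct(95),
      queueDepth: this.queueDepth,
      recentErrors: [...this.errors],
    };
  }
}
```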
---

*Last updated: 2026-02-09 after initialization*