# Flynn — Operator DX Milestone

## What This Is

A focused quality-of-life milestone for Flynn's operator (you). Flynn is a mature multi-channel AI assistant daemon with 10 model providers, 5 channel adapters, 40+ tools, a web dashboard, automation, sandboxing, and 1077 passing tests. This milestone targets the three biggest friction points in developing and operating Flynn: the monolithic daemon wiring file, lack of multi-environment config management, and limited runtime observability.

## Core Value

Make Flynn easier to reason about, configure, and monitor — so that adding features and diagnosing issues takes minutes, not hours.

## Requirements

### Validated

- ✓ Multi-channel AI assistant daemon with tool loop — existing
- ✓ 10 model providers (Anthropic, OpenAI, Gemini, Bedrock, GitHub, OpenRouter, Zhipu, xAI, Ollama, llama.cpp) — existing
- ✓ 5 channel adapters (Telegram, Discord, Slack, WhatsApp, WebChat) — existing
- ✓ WebSocket gateway with JSON-RPC protocol — existing
- ✓ Web dashboard SPA (dashboard, chat, sessions, settings, usage) — existing
- ✓ YAML config with Zod validation and env var expansion — existing
- ✓ SQLite session persistence with TTL pruning — existing
- ✓ Memory system with hybrid keyword + vector search — existing
- ✓ Docker sandboxing per session — existing
- ✓ Multi-agent routing with per-agent config — existing
- ✓ Automation: cron, webhooks, heartbeat, Gmail watcher — existing
- ✓ MCP tool server integration — existing
- ✓ Skills system (bundled/managed/workspace) — existing
- ✓ Media pipeline (image analysis, audio transcription, outbound attachments) — existing
- ✓ Context compaction with memory extraction — existing
- ✓ Tool policy profiles with allow/deny lists — existing
- ✓ 1077 tests passing — existing

### Active

- [ ] Decompose daemon/index.ts into focused service modules
- [ ] Multi-environment config system (base + overlays)
- [ ] Live operational dashboard with real-time metrics

### Out of Scope

- New channel adapters (Signal, Matrix, Teams, Google Chat) — not the focus of this milestone
- Companion apps (macOS, iOS, Android) — massive scope, different project
- Structured logging framework — would complement the dashboard but adds complexity; evaluate after dashboard reveals what metrics matter
- Agent intelligence features (sub-agent spawning, planning) — separate milestone
- ESLint / type safety cleanup — worthwhile but not blocking current development

## Context

Flynn has been through rapid feature development (P0-P8, Tiers 1-4 all completed in ~7 days). The codebase grew fast and the wiring layer absorbed complexity. Key context:

- **daemon/index.ts** is 1087 lines — it handles model client creation, channel setup, agent factory, memory initialization, vector indexer, session pruning, lifecycle management, and graceful shutdown. Every new feature touches this file.
- **Config** is a single YAML file validated by a 400+ line Zod schema. Managing dev vs Docker vs production requires manual YAML duplication. No layering, no environment-specific overrides.
- **Observability** is currently console.log/error/warn. The web dashboard shows basic stats but no real-time metrics: no message trace, no queue depth, no model call latency, no error stream. Debugging requires reading source and tailing stdout.
- The existing web dashboard (vanilla JS SPA) is functional and can be extended rather than rewritten.

## Constraints

- **Tech stack**: TypeScript, Node.js >= 22, pnpm. No new frameworks (keep vanilla JS for dashboard).
- **Backwards compatibility**: Existing config files must continue to work. Decomposition must not change public API or behavior.
- **Test coverage**: Maintain 1077+ passing tests. New modules need tests.
- **Single-operator**: This is a personal tool. Don't over-engineer for multi-tenant or team scenarios.

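
To make the "base + overlays" requirement concrete, here is a minimal sketch of how overlay resolution could work on already-parsed YAML. Everything in it is an assumption for illustration: the function names (`mergeConfig`, `resolveConfig`), the config keys (`gateway`, `sandbox`), and the merge rule that arrays are replaced rather than concatenated. None of it is Flynn's existing code. With no overlays the base config is returned unchanged, which is one way to satisfy the backwards-compatibility constraint for existing single-file configs.

```typescript
// Hypothetical sketch: names, keys, and merge rules are assumptions, not Flynn's API.
type ConfigValue =
  | string
  | number
  | boolean
  | null
  | ConfigValue[]
  | { [key: string]: ConfigValue };
type ConfigObject = { [key: string]: ConfigValue };

function isPlainObject(value: ConfigValue | undefined): value is ConfigObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merge `overlay` onto `base`. Nested objects merge key by key;
// scalars and arrays in the overlay replace the base value outright.
function mergeConfig(base: ConfigObject, overlay: ConfigObject): ConfigObject {
  const result: ConfigObject = { ...base };
  for (const [key, overlayValue] of Object.entries(overlay)) {
    const baseValue = result[key];
    result[key] =
      isPlainObject(baseValue) && isPlainObject(overlayValue)
        ? mergeConfig(baseValue, overlayValue)
        : overlayValue;
  }
  return result;
}

// Apply overlays left to right, so later overlays win.
// With zero overlays this returns the base config as-is.
function resolveConfig(base: ConfigObject, ...overlays: ConfigObject[]): ConfigObject {
  return overlays.reduce((acc, overlay) => mergeConfig(acc, overlay), base);
}

// Example: a base config plus a Docker-specific overlay (illustrative keys only).
const baseConfig: ConfigObject = {
  gateway: { host: "127.0.0.1", port: 8080 },
  sandbox: { enabled: false },
};
const dockerOverlay: ConfigObject = {
  gateway: { host: "0.0.0.0" },
  sandbox: { enabled: true },
};
const resolved = resolveConfig(baseConfig, dockerOverlay);
```

In this sketch the merged object would still be passed through the existing Zod schema afterwards, so overlays only need to contain the keys that differ per environment and validation stays in one place.
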
## Key Decisions

| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Decompose god file, not rewrite | Preserves working code, reduces risk, can be done incrementally | -- Pending |
| Config overlays over separate files | Environment-specific overrides are less error-prone than maintaining N complete configs | -- Pending |
| Extend existing web dashboard | Already works, no framework dependencies, familiar codebase | -- Pending |
| Skip structured logging for now | Dashboard will reveal what metrics actually matter; avoid premature abstraction | -- Pending |

---

*Last updated: 2026-02-09 after initialization*