Flynn — Operator DX Milestone

What This Is

A focused quality-of-life milestone for Flynn's operator (you). Flynn is a mature multi-channel AI assistant daemon with 10 model providers, 5 channel adapters, 40+ tools, a web dashboard, automation, sandboxing, and 1077 passing tests. This milestone targets the three biggest friction points in developing and operating Flynn: the monolithic daemon wiring file, lack of multi-environment config management, and limited runtime observability.

Core Value

Make Flynn easier to reason about, configure, and monitor — so that adding features and diagnosing issues takes minutes, not hours.

Requirements

Validated

✓ Multi-channel AI assistant daemon with tool loop — existing
✓ 10 model providers (Anthropic, OpenAI, Gemini, Bedrock, GitHub, OpenRouter, Zhipu, xAI, Ollama, llama.cpp) — existing
✓ 5 channel adapters (Telegram, Discord, Slack, WhatsApp, WebChat) — existing
✓ WebSocket gateway with JSON-RPC protocol — existing
✓ Web dashboard SPA (dashboard, chat, sessions, settings, usage) — existing
✓ YAML config with Zod validation and env var expansion — existing
✓ SQLite session persistence with TTL pruning — existing
✓ Memory system with hybrid keyword + vector search — existing
✓ Docker sandboxing per session — existing
✓ Multi-agent routing with per-agent config — existing
✓ Automation: cron, webhooks, heartbeat, Gmail watcher — existing
✓ MCP tool server integration — existing
✓ Skills system (bundled/managed/workspace) — existing
✓ Media pipeline (image analysis, audio transcription, outbound attachments) — existing
✓ Context compaction with memory extraction — existing
✓ Tool policy profiles with allow/deny lists — existing
✓ 1077 tests passing — existing

Active

Decompose daemon/index.ts into focused service modules
Multi-environment config system (base + overlays)
Live operational dashboard with real-time metrics
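
As a rough illustration of the first Active item, the sketch below shows one way the responsibilities currently wired in daemon/index.ts could sit behind a common start/stop contract. The `Service` interface, `ServiceRegistry` class, and `makeLoggingService` helper are hypothetical names invented for this example, not existing Flynn code; the only assumption taken from this document is that the daemon owns startup ordering and graceful shutdown.

```typescript
// Hypothetical contract for a focused service module (not Flynn's actual API).
interface Service {
  name: string;
  start(): Promise<void>;
  stop(): Promise<void>;
}

// Starts services in registration order and stops them in reverse order,
// so a service is always stopped before the services it was started after.
class ServiceRegistry {
  private services: Service[] = [];
  private started: Service[] = [];

  register(service: Service): void {
    this.services.push(service);
  }

  async startAll(): Promise<void> {
    for (const service of this.services) {
      try {
        await service.start();
        this.started.push(service);
      } catch (err) {
        // Roll back whatever already started, then surface the original error.
        await this.stopAll();
        throw err;
      }
    }
  }

  // Returns the names of services whose stop() threw, so shutdown can
  // continue past a single failing module.
  async stopAll(): Promise<string[]> {
    const failures: string[] = [];
    while (this.started.length > 0) {
      const service = this.started.pop()!;
      try {
        await service.stop();
      } catch {
        failures.push(service.name);
      }
    }
    return failures;
  }
}

// Illustrative stand-in for a real module: it only records lifecycle events.
function makeLoggingService(name: string, log: string[]): Service {
  return {
    name,
    start: async () => {
      log.push(`start:${name}`);
    },
    stop: async () => {
      log.push(`stop:${name}`);
    },
  };
}
```

Under this sketch, daemon/index.ts would shrink to registering modules and calling `startAll()` and `stopAll()`, and each module could be tested in isolation, which fits the constraint that new modules need tests.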

Out of Scope

New channel adapters (Signal, Matrix, Teams, Google Chat) — not the focus of this milestone
Companion apps (macOS, iOS, Android) — massive scope, different project
Structured logging framework — would complement the dashboard but adds complexity; evaluate after dashboard reveals what metrics matter
Agent intelligence features (sub-agent spawning, planning) — separate milestone
ESLint / type safety cleanup — worthwhile but not blocking current development

Context

Flynn has been through rapid feature development (P0-P8, Tiers 1-4 all completed in ~7 days). The codebase grew fast and the wiring layer absorbed complexity. Key context:

daemon/index.ts is 1087 lines — it handles model client creation, channel setup, agent factory, memory initialization, vector indexer, session pruning, lifecycle management, and graceful shutdown. Every new feature touches this file.
Config is a single YAML file validated by a 400+ line Zod schema. Managing dev vs Docker vs production requires manual YAML duplication. No layering, no environment-specific overrides.
Observability is currently limited to console.log, console.error, and console.warn calls. The web dashboard shows basic stats but no real-time metrics: no message trace, no queue depth, no model call latency, no error stream. Debugging requires reading source and tailing stdout.
The existing web dashboard (vanilla JS SPA) is functional and can be extended rather than rewritten.
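
To make the observability gap above concrete, here is a minimal sketch of an in-memory rolling window that could back model call latency and error-rate figures on the dashboard. Everything in it is an assumption for illustration: `RollingMetrics`, `ModelCallSample`, and the choice of a fixed-size window with nearest-rank percentiles are not taken from Flynn's code.

```typescript
// Hypothetical sample shape; field names are illustrative only.
interface ModelCallSample {
  provider: string;
  latencyMs: number;
  ok: boolean;
}

// Keeps the most recent `capacity` samples in memory. No persistence and no
// logging framework, in line with the single-operator constraint.
class RollingMetrics {
  private samples: ModelCallSample[] = [];
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  record(sample: ModelCallSample): void {
    this.samples.push(sample);
    // Drop the oldest sample once the window is full.
    if (this.samples.length > this.capacity) {
      this.samples.shift();
    }
  }

  // Nearest-rank percentile over the current window; null when empty.
  latencyPercentile(p: number): number | null {
    if (this.samples.length === 0) return null;
    const sorted = this.samples.map((s) => s.latencyMs).sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
    return sorted[rank - 1];
  }

  // Fraction of calls in the window that failed; 0 when empty.
  errorRate(): number {
    if (this.samples.length === 0) return 0;
    const failed = this.samples.filter((s) => !s.ok).length;
    return failed / this.samples.length;
  }

  // Plain object that could be serialized and sent to the dashboard.
  snapshot(): { count: number; p50: number | null; p95: number | null; errorRate: number } {
    return {
      count: this.samples.length,
      p50: this.latencyPercentile(50),
      p95: this.latencyPercentile(95),
      errorRate: this.errorRate(),
    };
  }
}
```

A bounded window keeps memory use constant and avoids committing to a metrics schema before the dashboard shows which numbers are actually useful.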

Constraints

Tech stack: TypeScript, Node.js >= 22, pnpm. No new frameworks (keep vanilla JS for dashboard).
Backwards compatibility: Existing config files must continue to work. Decomposition must not change public API or behavior.
Test coverage: Maintain 1077+ passing tests. New modules need tests.
Single-operator: This is a personal tool. Don't over-engineer for multi-tenant or team scenarios.

Key Decisions

Decision	Rationale	Outcome
Decompose god file, not rewrite	Preserves working code, reduces risk, can be done incrementally	-- Pending
Config overlays over separate files	Environment-specific overrides are less error-prone than maintaining N complete configs	-- Pending
Extend existing web dashboard	Already works, no framework dependencies, familiar codebase	-- Pending
Skip structured logging for now	Dashboard will reveal what metrics actually matter; avoid premature abstraction	-- Pending
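
The overlay decision in the table above can be sketched as a recursive merge applied to the parsed YAML before Zod validation. This is a sketch under stated assumptions, not the planned implementation: `mergeConfig`, `isPlainObject`, and the rule that arrays and scalars in the overlay replace the base value wholesale are all choices made for this example, and any config keys used with it are hypothetical.

```typescript
// Shape of a parsed YAML document, restricted to JSON-like values.
type ConfigValue =
  | string
  | number
  | boolean
  | null
  | ConfigValue[]
  | { [key: string]: ConfigValue };
type ConfigObject = { [key: string]: ConfigValue };

function isPlainObject(value: ConfigValue | undefined): value is ConfigObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merges `overlay` onto `base` without mutating either argument.
// Objects merge key by key; arrays and scalars from the overlay replace
// the base value entirely (an assumed policy, chosen for predictability).
// Nested values that the overlay does not touch are shared by reference.
function mergeConfig(base: ConfigObject, overlay: ConfigObject): ConfigObject {
  const result: ConfigObject = { ...base };
  for (const key of Object.keys(overlay)) {
    const baseValue = base[key];
    const overlayValue = overlay[key];
    result[key] =
      isPlainObject(baseValue) && isPlainObject(overlayValue)
        ? mergeConfig(baseValue, overlayValue)
        : overlayValue;
  }
  return result;
}
```

With an empty overlay the result has the same content as the base, so an existing single-file config would behave as it does today, which is what the backwards compatibility constraint requires.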

Last updated: 2026-02-09 after initialization
