
# Flynn — Operator DX Milestone

## What This Is

A focused quality-of-life milestone for Flynn's operator (you). Flynn is a mature multi-channel AI assistant daemon with 10 model providers, 5 channel adapters, 40+ tools, a web dashboard, automation, sandboxing, and 1077 passing tests. This milestone targets the three biggest friction points in developing and operating Flynn: the monolithic daemon wiring file, lack of multi-environment config management, and limited runtime observability.

## Core Value

Make Flynn easier to reason about, configure, and monitor — so that adding features and diagnosing issues takes minutes, not hours.

## Requirements

### Validated

- ✓ Multi-channel AI assistant daemon with tool loop — existing
- ✓ 10 model providers (Anthropic, OpenAI, Gemini, Bedrock, GitHub, OpenRouter, Zhipu, xAI, Ollama, llama.cpp) — existing
- ✓ 5 channel adapters (Telegram, Discord, Slack, WhatsApp, WebChat) — existing
- ✓ WebSocket gateway with JSON-RPC protocol — existing
- ✓ Web dashboard SPA (dashboard, chat, sessions, settings, usage) — existing
- ✓ YAML config with Zod validation and env var expansion — existing
- ✓ SQLite session persistence with TTL pruning — existing
- ✓ Memory system with hybrid keyword + vector search — existing
- ✓ Docker sandboxing per session — existing
- ✓ Multi-agent routing with per-agent config — existing
- ✓ Automation: cron, webhooks, heartbeat, Gmail watcher — existing
- ✓ MCP tool server integration — existing
- ✓ Skills system (bundled/managed/workspace) — existing
- ✓ Media pipeline (image analysis, audio transcription, outbound attachments) — existing
- ✓ Context compaction with memory extraction — existing
- ✓ Memory persistence is hybrid: manual (`memory.write`) plus optional auto-extraction during compaction (`memory.auto_extract`) — existing
- ✓ Tool policy profiles with allow/deny lists — existing
- ✓ 1077 tests passing — existing

### Active

- [ ] Decompose daemon/index.ts into focused service modules
- [ ] Multi-environment config system (base + overlays)
- [ ] Live operational dashboard with real-time metrics

### Out of Scope

- New channel adapters (Signal, Matrix, Teams, Google Chat) — not the focus of this milestone
- Companion apps (macOS, iOS, Android) — massive scope, different project
- Structured logging framework — would complement the dashboard but adds complexity; evaluate after dashboard reveals what metrics matter
- Agent intelligence features (sub-agent spawning, planning) — separate milestone
- ESLint / type safety cleanup — worthwhile but not blocking current development

## Context

Flynn has been through rapid feature development (P0-P8 and Tiers 1-4, all completed in ~7 days). The codebase grew fast, and the wiring layer absorbed the complexity. Key context:

- **daemon/index.ts** is 1087 lines — it handles model client creation, channel setup, agent factory, memory initialization, vector indexer, session pruning, lifecycle management, and graceful shutdown. Every new feature touches this file.
- **Config** is a single YAML file validated by a 400+ line Zod schema. Managing dev vs Docker vs production requires manual YAML duplication. No layering, no environment-specific overrides.
- **Observability** is currently limited to `console.log`/`console.error`/`console.warn` calls. The web dashboard shows basic stats but no real-time metrics: no message trace, no queue depth, no model call latency, no error stream. Debugging requires reading source and tailing stdout.
- The existing web dashboard (vanilla JS SPA) is functional and can be extended rather than rewritten.
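
One possible shape for the decomposition, given that `daemon/index.ts` currently mixes setup, lifecycle management, and graceful shutdown, is a small lifecycle contract that each extracted module implements. The sketch below is illustrative only: `Service`, `ServiceRegistry`, and `recordingService` are hypothetical names, not existing Flynn code, and the real modules may need richer dependencies than a plain ordered list.

```typescript
// Hypothetical lifecycle contract; none of these names come from Flynn's codebase.
interface Service {
  readonly name: string;
  start(): Promise<void>;
  stop(): Promise<void>;
}

class ServiceRegistry {
  private readonly services: readonly Service[];
  private readonly started: Service[] = [];

  constructor(services: readonly Service[]) {
    this.services = services;
  }

  // Start services in declaration order. If one fails to start,
  // stop the ones already running before rethrowing.
  async startAll(): Promise<void> {
    for (const service of this.services) {
      try {
        await service.start();
        this.started.push(service);
      } catch (err) {
        await this.stopAll();
        throw err;
      }
    }
  }

  // Stop in reverse start order so dependents shut down before
  // the services they rely on. A failing stop is logged, not fatal.
  async stopAll(): Promise<void> {
    while (this.started.length > 0) {
      const service = this.started.pop();
      if (!service) break;
      try {
        await service.stop();
      } catch (err) {
        console.error(`failed to stop ${service.name}:`, err);
      }
    }
  }
}

// Example-only helper: a service that just records its lifecycle events.
function recordingService(name: string, events: string[]): Service {
  return {
    name,
    start: async () => {
      events.push(`start:${name}`);
    },
    stop: async () => {
      events.push(`stop:${name}`);
    },
  };
}
```

With this shape, the daemon entry point would shrink to building the list of services and calling `startAll()` and `stopAll()`, and each module could be tested in isolation with stand-ins like `recordingService`.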

## Constraints

- **Tech stack**: TypeScript, Node.js >= 22, pnpm. No new frameworks (keep vanilla JS for dashboard).
- **Backwards compatibility**: Existing config files must continue to work. Decomposition must not change public API or behavior.
- **Test coverage**: Maintain 1077+ passing tests. New modules need tests.
- **Single-operator**: This is a personal tool. Don't over-engineer for multi-tenant or team scenarios.
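
The backwards-compatibility constraint suggests that overlays should be strictly optional: with no overlay, the loaded config is exactly the base file. A minimal sketch of that merge step follows, assuming the YAML has already been parsed into plain objects and that Zod validation runs on the merged result. `mergeConfig` and the `gateway`/`sandbox` keys are hypothetical names chosen for illustration, not Flynn's actual schema, and replacing arrays wholesale is one possible policy among several.

```typescript
// Hypothetical types for already-parsed YAML values.
type ConfigValue =
  | string
  | number
  | boolean
  | null
  | ConfigValue[]
  | { [key: string]: ConfigValue };
type ConfigObject = { [key: string]: ConfigValue };

function isConfigObject(value: ConfigValue | undefined): value is ConfigObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merge `overlay` onto `base` without mutating either input.
// Nested objects merge key by key; scalars and arrays in the overlay
// replace the base value outright. An omitted overlay leaves the base as-is.
function mergeConfig(base: ConfigObject, overlay: ConfigObject = {}): ConfigObject {
  const result: ConfigObject = { ...base };
  for (const [key, value] of Object.entries(overlay)) {
    const existing = result[key];
    result[key] =
      isConfigObject(existing) && isConfigObject(value)
        ? mergeConfig(existing, value)
        : value;
  }
  return result;
}

// Illustrative values only; these keys are assumptions, not Flynn's schema.
const base: ConfigObject = {
  gateway: { port: 8080, host: "127.0.0.1" },
  sandbox: { enabled: true },
};
const dockerOverlay: ConfigObject = {
  gateway: { host: "0.0.0.0" },
};

const merged = mergeConfig(base, dockerOverlay);
console.log(JSON.stringify(merged));
// prints {"gateway":{"port":8080,"host":"0.0.0.0"},"sandbox":{"enabled":true}}
```

Because the overlay only names the keys it changes, a Docker or production variant stays a few lines long instead of duplicating the whole file.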

## Key Decisions

| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Decompose god file, not rewrite | Preserves working code, reduces risk, can be done incrementally | Pending |
| Config overlays instead of separate full config files | Environment-specific overrides are less error-prone than maintaining N complete configs | Pending |
| Extend existing web dashboard | Already works, no framework dependencies, familiar codebase | Pending |
| Skip structured logging for now | Dashboard will reveal what metrics actually matter; avoid premature abstraction | Pending |
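
Extending the dashboard without a logging framework still needs some in-process source of numbers. One lightweight option, sketched here under stated assumptions, is a bounded window of recent samples per metric (for example, model call latency) that the gateway could expose to the dashboard. `LatencyWindow` is a hypothetical class, not part of Flynn, and nearest-rank percentiles are just one simple choice of summary.

```typescript
// Hypothetical in-memory collector: keeps only the most recent `capacity` samples.
class LatencyWindow {
  private readonly samples: number[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  // Record one latency sample in milliseconds, evicting the oldest when full.
  record(ms: number): void {
    this.samples.push(ms);
    if (this.samples.length > this.capacity) {
      this.samples.shift();
    }
  }

  // Nearest-rank percentile over the current window; undefined when empty.
  percentile(p: number): number | undefined {
    if (this.samples.length === 0) return undefined;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
    return sorted[index];
  }

  // Summary suitable for serializing to a dashboard client.
  snapshot(): { count: number; p50: number | undefined; p95: number | undefined } {
    return {
      count: this.samples.length,
      p50: this.percentile(50),
      p95: this.percentile(95),
    };
  }
}
```

A fixed-size window keeps memory bounded for a long-running daemon, and the dashboard can poll `snapshot()` or receive it over the existing WebSocket gateway.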

---
*Last updated: 2026-02-09 after initialization*