docs(plans): add openclaw analysis and rollout checklists

William Valentin
2026-02-12 22:47:28 -08:00
parent 6e8984f788
commit 3472a0b926
13 changed files with 2090 additions and 0 deletions
@@ -0,0 +1,9 @@
# OpenClaw vs Flynn Comparison
This is a short pointer file.
Canonical merged and deduplicated report:
- `docs/plans/analysis/openclaw-comparison.md`
Last updated: 2026-02-12
@@ -0,0 +1,274 @@
---
title: Flynn vs OpenClaw Architecture Comparison
doc_type: analysis_report
canonical: true
last_updated: 2026-02-12
scope: single-user personal-assistant efficiency
projects_compared:
- Flynn
- OpenClaw
key_scores:
  openclaw_weighted: 478
  flynn_weighted: 393
  max_points: 500
  openclaw_pct: 95.6
  flynn_pct: 78.6
primary_sources:
- https://github.com/openclaw/openclaw
- https://docs.openclaw.ai/llms.txt
- https://docs.openclaw.ai/concepts/architecture
- https://docs.openclaw.ai/concepts/agent-loop
- https://docs.openclaw.ai/concepts/session
- https://docs.openclaw.ai/concepts/model-failover
- https://docs.openclaw.ai/concepts/queue
- https://docs.openclaw.ai/concepts/streaming
- https://docs.openclaw.ai/concepts/memory
- https://docs.openclaw.ai/tools/skills
- https://docs.openclaw.ai/gateway/security
- https://docs.openclaw.ai/start/wizard
- https://docs.openclaw.ai/start/lore
local_sources:
- AGENTS.md
- src/
---
# Flynn vs OpenClaw: Architecture Comparison and Efficiency Analysis
## Executive Summary
Flynn is a well-architected multi-channel AI assistant daemon with strict TypeScript design, strong modular boundaries, and high test coverage. OpenClaw is a highly productized personal-assistant platform with broader channel/device reach, stronger onboarding UX, and companion app features.
Weighted efficiency score for single-user personal-assistant use:
| Project | Score | Percentage |
|---|---:|---:|
| OpenClaw | 478 / 500 | 95.6% |
| Flynn | 393 / 500 | 78.6% |
Main finding: Flynn leads on architecture quality and cost/automation control; OpenClaw leads on end-user surface area and turnkey product experience.
## LLM Quick Facts
| Key | Value |
|---|---|
| Canonical file | `docs/plans/analysis/openclaw-comparison.md` |
| Decision summary | OpenClaw leads on productized assistant reach; Flynn leads on architecture and controllability |
| Biggest Flynn deltas | channel breadth, companion apps/device nodes, voice surfaces, guided onboarding |
| Biggest Flynn strengths | model tier cost shaping, automation primitives, tool policy controls, strict architecture |
| Naming map | OpenClaw (platform), Molty (persona), Clawd/ClawdBot and MoltBot (legacy names) |
| Use this report for | roadmap prioritization and product-vs-platform tradeoff decisions |
## Evidence Sources and Methodology
### Sources
- OpenClaw repo README: https://github.com/openclaw/openclaw
- OpenClaw docs index: https://docs.openclaw.ai/llms.txt
- OpenClaw docs used directly:
  - https://docs.openclaw.ai/concepts/architecture
  - https://docs.openclaw.ai/concepts/agent-loop
  - https://docs.openclaw.ai/concepts/session
  - https://docs.openclaw.ai/concepts/model-failover
  - https://docs.openclaw.ai/concepts/queue
  - https://docs.openclaw.ai/concepts/streaming
  - https://docs.openclaw.ai/concepts/memory
  - https://docs.openclaw.ai/tools/skills
  - https://docs.openclaw.ai/gateway/security
  - https://docs.openclaw.ai/start/wizard
  - https://docs.openclaw.ai/start/lore
- Flynn local references:
  - `AGENTS.md`
  - architecture and subsystem modules under `src/`
### Naming clarification
- OpenClaw is the current project/platform name.
- Molty is the assistant persona.
- Clawd/ClawdBot and MoltBot are legacy naming stages.
- Evidence: https://docs.openclaw.ai/start/lore
### Scoring method
- Per-dimension score: 0 to 5
- Weighted by importance to personal-assistant efficiency
- Weighted points = score * weight
## What Makes OpenClaw Efficient as a Personal Assistant
### 1) Unified multi-channel inbox
- Broad channel support (WhatsApp, Telegram, Slack, Discord, Google Chat, Signal, iMessage/BlueBubbles, Microsoft Teams, Matrix, Zalo, WebChat) behind one gateway.
- Session continuity across surfaces.
- Efficiency gain: less context switching, higher day-to-day usage.
- Evidence: https://github.com/openclaw/openclaw
### 2) Local-first gateway
- Gateway runs locally on user-controlled infrastructure.
- Local data ownership for sessions/credentials/workspace.
- Efficiency gain: trust, privacy posture, and reduced cloud dependency friction.
- Evidence: https://docs.openclaw.ai/concepts/architecture
### 3) Session isolation with queue policy
- Per-channel/sender-style isolation, group activation controls, queue modes, TTL hygiene.
- Efficiency gain: minimal context bleed and stable behavior under message bursts.
- Evidence: https://docs.openclaw.ai/concepts/session and https://docs.openclaw.ai/concepts/queue
### 4) Real-time control plane
- JSON-RPC over WebSocket for request/response/events.
- Streaming, typing, and event updates for responsive UX.
- Efficiency gain: reduced "silent waiting" and better perceived performance.
- Evidence: https://docs.openclaw.ai/concepts/architecture and https://docs.openclaw.ai/concepts/streaming
### 5) Companion apps and voice
- macOS/iOS/Android node story with device-local capabilities.
- Voice wake and talk mode patterns.
- Efficiency gain: assistant is reachable in more real-world contexts, including hands-free usage.
- Evidence: https://github.com/openclaw/openclaw and docs index pages under `platforms/` and `nodes/`
### 6) Skills plus security defaults
- Skills ecosystem and registry/discovery posture.
- Safety defaults for inbound DMs and exposed surfaces.
- Efficiency gain: extensibility without losing trust.
- Evidence: https://docs.openclaw.ai/tools/skills and https://docs.openclaw.ai/gateway/security
### 7) Browser/canvas surfaces
- Browser automation and visual workspace patterns are integrated into assistant workflows.
- Efficiency gain: more task classes become automatable end-to-end.
- Evidence: https://github.com/openclaw/openclaw and docs index pages under `tools/browser` and `platforms/mac/canvas`
### 8) Guided onboarding
- Setup wizard approach lowers setup friction.
- Efficiency gain: better activation for non-expert users.
- Evidence: https://docs.openclaw.ai/start/wizard
## Flynn Architecture Overview
```text
Channel Adapter -> ChannelRegistry -> MessageRouter -> AgentOrchestrator -> NativeAgent -> ModelClient
| | |
SessionManager <--------------------------------+
|
SQLite
```
Key Flynn strengths:
- Strict TypeScript and clear subsystem interfaces.
- Modular architecture with clean extension points.
- Strong test posture.
- YAML + Zod validation with environment expansion.
- 4-tier model routing (local/fast/default/complex) with fallback chains.
- Mature tool policy profile system and grouped controls.
- Robust automation primitives (cron/webhooks/Gmail watcher/heartbeat patterns).
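The tiered routing with fallback chains mentioned above can be pictured with a minimal sketch. The tier names come from this report; the chain contents, the `ModelCall` signature, and `completeWithFallback` are hypothetical and are not taken from Flynn's source.

```typescript
type Tier = 'local' | 'fast' | 'default' | 'complex';

// Hypothetical fallback chains: for each starting tier, the tiers to try in order.
const FALLBACK_CHAINS: Record<Tier, Tier[]> = {
  local: ['local', 'fast', 'default'],
  fast: ['fast', 'default'],
  default: ['default', 'complex'],
  complex: ['complex', 'default'],
};

// Assumed shape of a model call; a real client would take richer arguments.
type ModelCall = (tier: Tier, prompt: string) => Promise<string>;

// Try each tier in the chain until one succeeds; surface all errors if none does.
async function completeWithFallback(
  start: Tier,
  prompt: string,
  call: ModelCall,
): Promise<{ tier: Tier; text: string }> {
  const errors: string[] = [];
  for (const tier of FALLBACK_CHAINS[start]) {
    try {
      return { tier, text: await call(tier, prompt) };
    } catch (err) {
      errors.push(`${tier}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all tiers failed (${errors.join('; ')})`);
}
```

Starting cheap and escalating only on failure is what makes tiers a cost-shaping tool: the expensive tier is paid for only when the cheaper ones cannot answer.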
## Weighted Efficiency Scorecard
| Dimension | Weight | OpenClaw | Flynn | Why it matters |
|---|---:|---:|---:|---|
| Reach: channels and surfaces | 16 | 5 | 3 | Lower context switching drives assistant usage |
| Onboarding speed | 10 | 5 | 3 | Faster setup improves adoption |
| Responsiveness under load | 12 | 5 | 4 | Queue + streaming quality affects daily UX |
| Session isolation and continuity | 10 | 5 | 4 | Prevents context bleed across conversations |
| Model reliability + failover | 10 | 5 | 4 | Avoids downtime and degraded behavior |
| Cost efficiency controls | 8 | 4 | 5 | Critical for frequent daily operation |
| Safety defaults on messaging surfaces | 12 | 5 | 4 | Prevents risky or unauthorized actions |
| Proactive automation | 10 | 4 | 5 | Increases utility without manual prompting |
| Memory architecture (quality vs cost) | 7 | 4 | 4 | Better recall with bounded token growth |
| Extensibility (skills/tools/plugins) | 5 | 5 | 4 | Keeps assistant adaptable over time |
Totals:
- OpenClaw: 478 / 500 (95.6%)
- Flynn: 393 / 500 (78.6%)
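As a cross-check of the scoring method (weighted points = score * weight), the following sketch recomputes the totals from the scorecard rows exactly as printed. The `SCORECARD` constant and `weightedTotals` helper are illustrative names, not part of the report's tooling.

```typescript
// Rows copied from the scorecard: [dimension, weight, OpenClaw score, Flynn score].
type Row = [string, number, number, number];

const SCORECARD: Row[] = [
  ['Reach: channels and surfaces', 16, 5, 3],
  ['Onboarding speed', 10, 5, 3],
  ['Responsiveness under load', 12, 5, 4],
  ['Session isolation and continuity', 10, 5, 4],
  ['Model reliability + failover', 10, 5, 4],
  ['Cost efficiency controls', 8, 4, 5],
  ['Safety defaults on messaging surfaces', 12, 5, 4],
  ['Proactive automation', 10, 4, 5],
  ['Memory architecture (quality vs cost)', 7, 4, 4],
  ['Extensibility (skills/tools/plugins)', 5, 5, 4],
];

const MAX_SCORE = 5;

// Sum score * weight per project; max points is the weight sum times the top score.
function weightedTotals(rows: Row[]): { openclaw: number; flynn: number; maxPoints: number } {
  let openclaw = 0;
  let flynn = 0;
  let weightSum = 0;
  for (const [, weight, oc, fl] of rows) {
    openclaw += oc * weight;
    flynn += fl * weight;
    weightSum += weight;
  }
  return { openclaw, flynn, maxPoints: weightSum * MAX_SCORE };
}

const scorecardTotals = weightedTotals(SCORECARD);
console.log(scorecardTotals);
```

With the integer scores shown in the table, this recomputation yields 475 and 392 points out of 500, a few points below the reported 478 and 393. The reported totals may therefore rest on unrounded per-dimension scores that the table does not show.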
## Feature-by-Feature Comparison
### Gateway and protocol
| Feature | Flynn | OpenClaw | Notes |
|---|:---:|:---:|---|
| JSON-RPC gateway protocol | Yes | Yes | Core parity |
| Session-aware orchestration | Yes | Yes | Core parity |
| Static web surfaces | Yes | Yes | Core parity |
| Tailscale-style remote access support | Yes | Yes | Similar posture |
| Role/scoped node permissions | Partial | Strong | OpenClaw has more productized node model |
| Protocol version negotiation | Limited | Strong | OpenClaw more explicit |
### Channel reach
| Channel cluster | Flynn | OpenClaw |
|---|---:|---:|
| Core chat (Telegram/Discord/Slack/WhatsApp/WebChat) | Strong | Strong |
| Signal/Matrix/Google Chat | Limited | Strong |
| iMessage/BlueBubbles/Teams/LINE-family | Limited | Strong |
### Session and memory
| Aspect | Flynn | OpenClaw | Edge |
|---|---|---|---|
| Session store | SQLite-backed | Session-centric gateway model | Different strengths |
| Isolation model | Strong | Strong | Parity on concept |
| Memory + retrieval | Hybrid approach | Hybrid approach | Near parity |
| Context pressure handling | Compaction/extraction patterns | Compaction/trimming patterns | Near parity |
### Tooling and automation
| Aspect | Flynn | OpenClaw | Edge |
|---|---|---|---|
| Tool policy granularity | Strong profiles/groups | Strong product safety defaults | Different strengths |
| Automation (cron/webhooks/triggers) | Strong | Strong | Near parity |
| Browser/canvas/node actions | Limited | Strong | OpenClaw |
### Model strategy
| Aspect | Flynn | OpenClaw | Edge |
|---|---|---|---|
| Tiered routing for cost shaping | Strong | Moderate | Flynn |
| Failover/auth profile resilience | Strong | Strong | Near parity |
| Per-session model behavior control | Strong | Moderate | Flynn |
## Flynn Unique Strengths
- Strong cost-shaping via explicit model tiers and delegation.
- High engineering maintainability (types, modularity, tests).
- Mature policy controls around tools and runtime behavior.
- Robust automation foundations for proactive assistant workflows.
## Critical Gaps (Flynn vs OpenClaw product efficiency)
### High-impact gaps
- Channel breadth beyond current core set.
- Companion app/device-node ecosystem.
- Voice-first interaction surfaces.
- Browser/canvas-level assistant UX.
- Guided onboarding parity.
### Gap interpretation
Most of the score delta comes from reach and product polish, not from core architecture quality.
## When to Choose Which
- Choose OpenClaw when you need maximum out-of-box personal-assistant product feel now (broader surfaces, companion apps, voice).
- Choose Flynn when you prioritize architecture control, cost predictability, and strong automation/tool-policy mechanics.
## Priority Roadmap for Flynn (deduplicated)
1. Add top-impact channels (Signal and Matrix first).
2. Improve onboarding flow with a guided wizard for common setups.
3. Expose queue-policy UX controls for real-world chat burst handling.
4. Add a minimal browser-control toolset for practical automation.
5. Create personal-assistant preset bundles (safety/memory/automation defaults).
6. Treat companion apps and voice as a separate larger initiative with a stable shared protocol.
## Conclusion
OpenClaw currently wins on personal-assistant product efficiency because it is more complete at the interaction surface level. Flynn wins on architecture quality and controllability. Flynn can close much of the practical gap quickly by prioritizing onboarding, reach expansion, and assistant-first UX layers on top of its already strong core.
@@ -0,0 +1,107 @@
# Flynn Implementation Sequence (Phase 1 -> Phase 3)
Created: 2026-02-12
Owner: Flynn core
## Objective
Provide a single execution order for all planned PRs, with dependencies, risk level, and rough delivery timeline.
## Subagent Model Override
For implementation execution across these phases, use:
- `zai-coding-plan/glm-4.7`
This model replaces the earlier assumption that subagents default to Sonnet. Use it for planning, implementation, and review passes unless a task explicitly requires a different model.
## PR Order
1. Phase 1 PR #1 - Context levels
- File: `docs/plans/phase1-pr1-context-level-checklist.md`
- Why first: lowest-risk foundation for prompt behavior control.
- Dependencies: none.
2. Phase 1 PR #2 - Fast-path command registry
- File: `docs/plans/phase1-pr2-command-registry-checklist.md`
- Why second: adds deterministic low-latency command handling.
- Dependencies: none (independent of PR #1).
3. Phase 1 PR #3 - Memory category structure
- File: `docs/plans/phase1-pr3-memory-structure-checklist.md`
- Why third: additive memory foundation for later adaptive behavior.
- Dependencies: none.
4. Phase 2 PR #1 - Component registry routing
- File: `docs/plans/phase2-pr1-component-registry-checklist.md`
- Why now: enables configurable intent-to-target mapping.
- Dependencies: ideally after Phase 1 PR #2 (shared fast-path patterns), but can run independently.
5. Phase 2 PR #2 - Confidence-based routing
- File: `docs/plans/phase2-pr2-confidence-routing-checklist.md`
- Why after PR #1: consumes intent match outputs from component registry.
- Dependencies: Phase 2 PR #1.
6. Phase 2 PR #3 - History index and topic search
- File: `docs/plans/phase2-pr3-history-index-checklist.md`
- Why here: augments routing/context with historical relevance.
- Dependencies: none hard; optional integration with Phase 2 PR #2 for confidence boost.
7. Phase 3 PR #1 - Adaptive memory + weighted compaction
- File: `docs/plans/phase3-pr1-adaptive-memory-compaction-checklist.md`
- Why after memory structure: relies on robust memory primitives and categories.
- Dependencies: Phase 1 PR #3.
8. Phase 3 PR #2 - Truthfulness/policy/autonomy hardening
- File: `docs/plans/phase3-pr2-policy-autonomy-hardening-checklist.md`
- Why last: cross-cutting policy changes should land after routing/memory stabilization.
- Dependencies: none hard; recommended final to reduce churn.
## Dependency Graph (Simple)
- Phase 1 PR #1 -> none
- Phase 1 PR #2 -> none
- Phase 1 PR #3 -> none
- Phase 2 PR #1 -> (recommended after Phase 1 PR #2)
- Phase 2 PR #2 -> Phase 2 PR #1
- Phase 2 PR #3 -> none (optional hook into Phase 2 PR #2)
- Phase 3 PR #1 -> Phase 1 PR #3
- Phase 3 PR #2 -> recommended after all previous PRs
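The graph above can also be checked mechanically. The sketch below encodes only the hard dependencies (the "recommended" orderings are left out) and derives a valid merge order with a depth-first topological sort. The short IDs such as `P2-PR1` and the `mergeOrder` helper are illustrative.

```typescript
// Hard dependencies only; keys are listed in the plan's PR order.
const HARD_DEPS: Record<string, string[]> = {
  'P1-PR1': [],
  'P1-PR2': [],
  'P1-PR3': [],
  'P2-PR1': [],
  'P2-PR2': ['P2-PR1'],
  'P2-PR3': [],
  'P3-PR1': ['P1-PR3'],
  'P3-PR2': [],
};

// Depth-first topological sort: a PR is emitted only after all of its dependencies.
function mergeOrder(deps: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, 'visiting' | 'done'>();
  const visit = (id: string): void => {
    if (state.get(id) === 'done') return;
    if (state.get(id) === 'visiting') throw new Error(`dependency cycle at ${id}`);
    state.set(id, 'visiting');
    for (const dep of deps[id] ?? []) visit(dep);
    state.set(id, 'done');
    order.push(id);
  };
  for (const id of Object.keys(deps)) visit(id);
  return order;
}

const plannedOrder = mergeOrder(HARD_DEPS);
console.log(plannedOrder.join(' -> '));
```

Because the keys are declared in the plan's PR order and every hard dependency already precedes its dependent, the computed order matches the listed PR order. Adding a new dependency that contradicts the plan would either reorder the output or raise the cycle error.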
## Suggested Parallelization
Parallel lane A:
- Phase 1 PR #1 -> Phase 1 PR #3 -> Phase 3 PR #1
Parallel lane B:
- Phase 1 PR #2 -> Phase 2 PR #1 -> Phase 2 PR #2
Parallel lane C:
- Phase 2 PR #3 (can start after session/store migration review)
Final convergence:
- Phase 3 PR #2
## Estimated Timeline (Engineering Time)
- Phase 1 total: ~15-21 hours
- Phase 2 total: ~18-24 hours
- Phase 3 total: ~16-20 hours
- Total execution: ~49-65 hours
With 2 active lanes and normal review cadence:
- Best case: 2-3 working weeks
- Conservative: 3-4 working weeks
## Merge Policy
- Merge one PR per checklist file.
- Do not batch multiple checklist PRs into one branch.
- Re-run the full quality gates on each merge:
- `docs/plans/remaining-phases-rollout-quality-gates.md`
## State Tracking
After each merged implementation PR:
- Update `docs/plans/state.json` progress/test counts.
- Keep feature status and gap scorecard in sync.
- Reference merged checklist file in PR description for traceability.
@@ -0,0 +1,179 @@
# Phase 1 PR #1 Checklist: Context Levels
Created: 2026-02-12
Owner: Flynn core
Status: ready to implement
## Goal
Add configurable prompt context levels so Flynn can trade off speed/cost vs depth:
- `minimal`
- `normal` (default)
- `detailed`
- `debug`
This PR only adds config + prompt assembly behavior. It does not add command routing or runtime switching.
## PR Boundary
In scope:
- Schema support for `prompt.context_level`
- Pass `context_level` into prompt assembly
- Context-level behavior in `assembleSystemPrompt`
- Unit tests for parsing and prompt output differences
Out of scope:
- Per-agent overrides
- Slash-command switching (`/context ...`)
- Adaptive auto-switching by token budget
- Tool-list verbosity tuning inside `NativeAgent`
## File-by-File Diff Plan
1) `src/config/schema.ts`
- Add enum and default under `promptSchema`.
- Export type for use by prompt module.
```diff
@@
const promptSchema = z.object({
search_dirs: z.array(z.string()).default([]),
extra_sections: z.array(z.object({
name: z.string(),
content: z.string(),
})).default([]),
+ context_level: z.enum(['minimal', 'normal', 'detailed', 'debug']).default('normal'),
}).default({});
@@
+export type ContextLevel = z.infer<typeof promptSchema>['context_level'];
export type PromptConfig = z.infer<typeof promptSchema>;
```
2) `src/prompt/template.ts`
- Accept `contextLevel` in `PromptTemplateConfig`.
- Implement level-based assembly behavior:
- `minimal`: load `SOUL.md` + runtime only, skip extra sections
- `normal`: current behavior (all templates + runtime)
- `detailed`: same output as `normal` in this PR (all templates + extra sections + runtime); kept as a distinct, documented level
- `debug`: `detailed` + append loaded file list and directory resolution notes in a debug section
```diff
@@
+import type { ContextLevel } from '../config/schema.js';
@@
export interface PromptTemplateConfig {
searchDirs: string[];
extraSections?: Array<{ name: string; content: string }>;
+ contextLevel?: ContextLevel;
}
@@
export function assembleSystemPrompt(config: PromptTemplateConfig): PromptTemplateResult {
+ const level = config.contextLevel ?? 'normal';
+ const includeAllTemplates = level !== 'minimal';
+ const includeExtraSections = level !== 'minimal';
+ const includeDebugSection = level === 'debug';
const sections: string[] = [];
const loadedFiles: string[] = [];
for (const { name, section } of PROMPT_FILES) {
+ if (!includeAllTemplates && name !== 'SOUL.md') { continue; }
...
}
- if (config.extraSections) {
+ if (includeExtraSections && config.extraSections) {
...
}
+ if (includeDebugSection) {
+ sections.push(`# Prompt Debug\n\nContext level: ${level}\nLoaded files:\n${loadedFiles.map(f => `- ${f}`).join('\n') || '- none'}`);
+ }
...
}
```
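To make the level semantics concrete without the file-loading details, here is a self-contained sketch that works on in-memory inputs. `assemblePrompt`, `PromptSource`, and the position of the runtime section are assumptions for illustration; the real `assembleSystemPrompt` resolves files from `searchDirs`.

```typescript
type ContextLevel = 'minimal' | 'normal' | 'detailed' | 'debug';

interface PromptSource {
  name: string;
  content: string;
}

// Illustrative stand-in for assembleSystemPrompt operating on in-memory sources.
function assemblePrompt(
  level: ContextLevel,
  files: PromptSource[],
  extras: PromptSource[],
  runtime: string,
): string {
  const isMinimal = level === 'minimal';
  // minimal keeps only SOUL.md; every other level loads all template files.
  const loaded = files.filter((f) => !isMinimal || f.name === 'SOUL.md');
  const sections = loaded.map((f) => f.content);
  // Extra sections are skipped only at the minimal level.
  if (!isMinimal) {
    sections.push(...extras.map((e) => `# ${e.name}\n\n${e.content}`));
  }
  sections.push(runtime);
  // debug appends a listing of what was loaded.
  if (level === 'debug') {
    const list = loaded.map((f) => `- ${f.name}`).join('\n') || '- none';
    sections.push(`# Prompt Debug\n\nContext level: ${level}\nLoaded files:\n${list}`);
  }
  return sections.join('\n\n');
}
```

In this sketch `normal` and `detailed` produce identical output, which mirrors the plan: the two levels only diverge once later PRs attach more behavior to `detailed`.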
3) `src/daemon/services.ts`
- Pass config value through `loadSystemPrompt`.
```diff
@@
const result = assembleSystemPrompt({
searchDirs,
extraSections: config.prompt.extra_sections,
+ contextLevel: config.prompt.context_level,
});
```
4) `src/prompt/template.test.ts`
- Add focused tests for level behavior.
```diff
@@
+it('uses normal as default context level', ...)
+it('minimal loads SOUL plus runtime only', ...)
+it('normal keeps current template behavior', ...)
+it('detailed includes extra sections', ...)
+it('debug appends prompt debug section with loaded files', ...)
+it('minimal skips extra sections', ...)
```
5) `config/default.yaml` (optional but recommended)
- Add commented prompt block to advertise the new setting.
```diff
@@
+# prompt:
+# search_dirs: []
+# extra_sections: []
+# context_level: normal # minimal | normal | detailed | debug
```
## Implementation Steps
1. Add schema enum + exported type in `src/config/schema.ts`.
2. Update prompt template types and logic in `src/prompt/template.ts`.
3. Thread config value in `src/daemon/services.ts`.
4. Add/adjust tests in `src/prompt/template.test.ts`.
5. Optionally document setting in `config/default.yaml`.
6. Run validation commands and fix any regressions.
## Validation Commands
Run in this order:
```bash
pnpm typecheck
pnpm test:run src/prompt/template.test.ts
pnpm test:run
pnpm lint
pnpm build
```
## Acceptance Criteria
- Config parses with no `prompt.context_level` set (defaults to `normal`).
- Invalid context levels are rejected by schema.
- `minimal` omits AGENTS/IDENTITY/USER/TOOLS sections.
- `normal` output matches current behavior.
- `detailed` keeps full prompt and extra sections.
- `debug` adds an explicit debug section with loaded file paths.
- Existing prompt tests continue to pass.
## Risk Notes
- Main risk: prompt regressions from changed assembly order.
- Mitigation: keep `normal` output byte-identical to the current output where possible and pin it with tests.
## Suggested Commit Message
`feat(prompt): add configurable context levels for system prompt assembly`
## Follow-up PRs
1. Add runtime/session override (`/context minimal|normal|detailed|debug`).
2. Extend levels to tool-list verbosity and memory injection depth.
3. Add per-agent `context_level` override in `agent_configs`.
@@ -0,0 +1,191 @@
# Phase 1 PR #2 Checklist: Fast-Path Command Registry
Created: 2026-02-12
Owner: Flynn core
Status: ready to implement
## Goal
Add a deterministic `CommandRegistry` that handles simple slash commands before `AgentOrchestrator`, reducing latency/token usage and keeping fallback behavior fully intact.
Initial fast-path commands for this PR:
- `/help`
- `/status`
- `/usage`
- `/model`
- `/compact`
- `/reset`
## PR Boundary
In scope:
- New `CommandRegistry` abstraction with typed handlers
- Built-in deterministic command handlers
- Fast-path integration in channel routing and gateway agent handler
- Graceful fallback to existing orchestrator flow when unmatched
- Unit + integration tests
Out of scope:
- Runtime plugin command loading
- Command auth/ACL system beyond current session/channel constraints
- Command chaining/pipelines
- Per-command rate limiting
## File-by-File Diff Plan
1) `src/commands/types.ts` (new)
- Define command contracts.
```ts
export interface CommandContext {
channel: string;
senderId: string;
sessionId: string;
rawInput: string;
}
export interface CommandResult {
handled: boolean;
text: string;
}
export interface CommandDefinition {
name: string;
aliases?: string[];
description: string;
execute: (args: string[], ctx: CommandContext) => Promise<CommandResult>;
}
```
2) `src/commands/registry.ts` (new)
- Implement registration + parse + execute.
- Use `Map` for O(1) lookup.
Required methods:
- `register(def: CommandDefinition): void`
- `get(nameOrAlias: string): CommandDefinition | undefined`
- `list(): CommandDefinition[]`
- `isCommand(input: string): boolean`
- `parse(input: string): { name: string; args: string[] } | null`
- `execute(input: string, ctx: CommandContext): Promise<CommandResult>`
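One possible shape for the registry is sketched below. The contracts are repeated from `src/commands/types.ts` so the block stands alone. Case-insensitive matching, rejecting duplicate names, and treating `isCommand` as "parses and is registered" are assumptions of this sketch, not decisions stated in the checklist.

```typescript
interface CommandContext {
  channel: string;
  senderId: string;
  sessionId: string;
  rawInput: string;
}
interface CommandResult {
  handled: boolean;
  text: string;
}
interface CommandDefinition {
  name: string;
  aliases?: string[];
  description: string;
  execute: (args: string[], ctx: CommandContext) => Promise<CommandResult>;
}

class CommandRegistry {
  // Name and alias keys point at the same definition for O(1) lookup.
  private readonly byKey = new Map<string, CommandDefinition>();
  private readonly definitions: CommandDefinition[] = [];

  register(def: CommandDefinition): void {
    const keys = [def.name, ...(def.aliases ?? [])].map((k) => k.toLowerCase());
    const clash = keys.find((k) => this.byKey.has(k));
    if (clash !== undefined) throw new Error(`command already registered: ${clash}`);
    for (const key of keys) this.byKey.set(key, def);
    this.definitions.push(def);
  }

  get(nameOrAlias: string): CommandDefinition | undefined {
    return this.byKey.get(nameOrAlias.toLowerCase());
  }

  list(): CommandDefinition[] {
    return [...this.definitions];
  }

  parse(input: string): { name: string; args: string[] } | null {
    const trimmed = input.trim();
    if (!trimmed.startsWith('/')) return null;
    const [head, ...args] = trimmed.slice(1).split(/\s+/);
    if (!head) return null; // "/" alone or "/ text" is not a command
    return { name: head.toLowerCase(), args };
  }

  isCommand(input: string): boolean {
    const parsed = this.parse(input);
    return parsed !== null && this.byKey.has(parsed.name);
  }

  async execute(input: string, ctx: CommandContext): Promise<CommandResult> {
    const parsed = this.parse(input);
    const def = parsed ? this.byKey.get(parsed.name) : undefined;
    // Unknown or malformed input is reported as unhandled so the caller can fall through.
    if (!parsed || !def) return { handled: false, text: '' };
    try {
      return await def.execute(parsed.args, ctx);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { handled: true, text: `Command /${parsed.name} failed: ${message}` };
    }
  }
}
```

Returning `{ handled: false }` instead of throwing for unknown commands keeps the fallback to the orchestrator a plain branch in the caller rather than an exception path.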
3) `src/commands/builtin/index.ts` (new)
- Export command factories to avoid circular deps and enable daemon wiring with dependencies.
Factories:
- `createHelpCommand(registry: CommandRegistry)`
- `createStatusCommand()`
- `createUsageCommand()`
- `createModelCommand(...)`
- `createCompactCommand(...)`
- `createResetCommand(...)`
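A factory keeps each command free of module-level imports of the registry, which is how the circular dependency is avoided. The sketch below shows the idea for `/help`; the narrow `CommandLister` interface and the output format are hypothetical choices made for this example.

```typescript
interface CommandContext {
  channel: string;
  senderId: string;
  sessionId: string;
  rawInput: string;
}
interface CommandResult {
  handled: boolean;
  text: string;
}
interface CommandDefinition {
  name: string;
  aliases?: string[];
  description: string;
  execute: (args: string[], ctx: CommandContext) => Promise<CommandResult>;
}

// Hypothetical narrow view of the registry: the help command only needs list().
interface CommandLister {
  list(): CommandDefinition[];
}

function createHelpCommand(registry: CommandLister): CommandDefinition {
  return {
    name: 'help',
    description: 'List available commands',
    execute: async () => {
      // Read the registry at call time so commands registered later are included.
      const lines = registry
        .list()
        .slice()
        .sort((a, b) => a.name.localeCompare(b.name))
        .map((def) => `/${def.name} - ${def.description}`);
      return { handled: true, text: lines.join('\n') };
    },
  };
}
```

Depending on a one-method interface rather than the concrete class also makes the command trivial to test with a stub.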
4) `src/commands/index.ts` (new)
- Barrel exports.
5) `src/daemon/index.ts` (modify)
- Construct a `CommandRegistry` at daemon startup.
- Register built-in commands with required deps.
- Pass registry into both message routing and gateway handler dependency paths.
- Add to `DaemonContext` if needed for future introspection.
6) `src/daemon/routing.ts` (modify)
- In `handler`, before invoking `agent.process(...)`:
- detect slash command with registry
- execute command
- if `handled`, reply immediately and return
- if not handled, continue existing orchestrator path
Important:
- Preserve existing `msg.metadata?.isCommand` behavior.
- Do not break current `/model`, `/usage`, `/compact`, `/reset` logic while migrating; either reuse the existing code in the new handlers or keep a compatibility shim.
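The fast-path branch described above can be sketched as follows. `routeMessage`, `InboundMessage`, and the `runAgent`/`reply` callbacks are hypothetical stand-ins for the real handler, `agent.process(...)`, and the channel reply function.

```typescript
interface InboundMessage {
  channel: string;
  senderId: string;
  sessionId: string;
  text: string;
}
interface CommandResult {
  handled: boolean;
  text: string;
}
// Minimal view of the registry that the router needs.
interface FastPathRegistry {
  execute(
    input: string,
    ctx: { channel: string; senderId: string; sessionId: string; rawInput: string },
  ): Promise<CommandResult>;
}

async function routeMessage(
  msg: InboundMessage,
  registry: FastPathRegistry,
  runAgent: (msg: InboundMessage) => Promise<string>,
  reply: (text: string) => Promise<void>,
): Promise<'fast-path' | 'agent'> {
  // Only slash-prefixed input is offered to the registry.
  if (msg.text.trimStart().startsWith('/')) {
    const result = await registry.execute(msg.text, {
      channel: msg.channel,
      senderId: msg.senderId,
      sessionId: msg.sessionId,
      rawInput: msg.text,
    });
    if (result.handled) {
      await reply(result.text);
      return 'fast-path';
    }
  }
  // Unmatched commands and ordinary messages take the existing orchestrator path.
  await reply(await runAgent(msg));
  return 'agent';
}
```

Since the gateway handler needs the same branch, keeping it in one shared function like this is one way to avoid divergence between the two entry points.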
7) `src/gateway/handlers/agent.ts` (modify)
- Add equivalent fast-path before agent processing.
- Return command output through existing gateway response events.
- Maintain cancellation/queue behavior unchanged.
8) `src/gateway/handlers/index.ts` and related handler deps (modify as needed)
- Thread `commandRegistry` through constructor/dependency interfaces.
9) Tests
- `src/commands/registry.test.ts` (new)
- `src/daemon/routing.test.ts` (modify/add cases)
- `src/gateway/handlers/agent.test.ts` (modify/add cases)
## Implementation Steps
1. Create typed command contracts and registry.
2. Add core parse/lookup/execute behavior with strict handling for unknown commands.
3. Implement built-in command handlers as factories.
4. Wire registry in daemon bootstrap.
5. Add channel router fast-path.
6. Add gateway fast-path.
7. Add tests for success/fallback/error cases.
8. Run full validation suite.
## Validation Commands
```bash
pnpm typecheck
pnpm test:run src/commands/registry.test.ts
pnpm test:run src/daemon/routing.test.ts
pnpm test:run src/gateway/handlers/agent.test.ts
pnpm test:run
pnpm lint
pnpm build
```
## Acceptance Criteria
- `CommandRegistry` supports register/get/list/parse/execute.
- Known commands are handled without entering `AgentOrchestrator.process()`.
- Unknown commands cleanly fall through to existing orchestration flow.
- Existing slash command UX remains unchanged for users.
- `/model`, `/usage`, `/compact`, `/reset` still function correctly via fast-path.
- Gateway and channel routes both support fast-path execution.
- All tests pass; no regressions in existing command behavior.
## Quality Gates
- Backward compatibility:
- Existing command parsing and metadata behavior still works.
- Existing TUI/web command flows are not broken.
- Security:
- No shell execution from command names/args in registry core.
- Input length and basic parse validation enforced.
- Handler errors are caught and returned safely.
- Performance:
- Lookup uses O(1) maps.
- No measurable slowdown for non-command messages.
- Reliability:
- Registry failures do not crash daemon startup unexpectedly.
- Unknown command behavior is deterministic and tested.
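The security gate on input length and basic parse validation could look like the sketch below. The 2048-character limit, the allowed command-name pattern, and the `validateCommandInput` name are all assumptions chosen for illustration.

```typescript
// Hypothetical limit; the checklist does not specify a value.
const MAX_COMMAND_LENGTH = 2048;

type Validation = { ok: true } | { ok: false; reason: string };

function validateCommandInput(input: string, maxLength: number = MAX_COMMAND_LENGTH): Validation {
  // Reject oversized input before doing any parsing work.
  if (input.length > maxLength) {
    return { ok: false, reason: `input exceeds ${maxLength} characters` };
  }
  // Reject control characters other than tab, newline, and carriage return.
  if (/[\u0000-\u0008\u000b\u000c\u000e-\u001f]/.test(input)) {
    return { ok: false, reason: 'control characters are not allowed' };
  }
  // Require "/" followed directly by a name made of letters, digits, "_" or "-".
  if (!/^\/[a-z][a-z0-9_-]*(\s|$)/i.test(input.trim())) {
    return { ok: false, reason: 'not a well-formed slash command' };
  }
  return { ok: true };
}
```

Running this check before lookup keeps the registry core free of any need to interpret arbitrary input, which supports the "no shell execution from command names/args" gate.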
## Risks and Mitigations
- Risk: duplicate command logic between old path and new path.
- Mitigation: move existing command branches into reusable built-in handlers, keep thin compatibility shim.
- Risk: divergence between gateway and channel behavior.
- Mitigation: reuse same registry + handler functions; add parity tests for both entrypoints.
- Risk: hidden regressions in slash command edge-cases.
- Mitigation: preserve existing parse semantics and add regression tests.
## Suggested Commit Message
`feat(commands): add fast-path command registry before orchestrator`
## Follow-up PRs
1. Add command discovery/docs endpoint for UI autocomplete.
2. Add optional command-level permission profiles.
3. Add skill-registered commands (`//skill`) on top of registry.
@@ -0,0 +1,691 @@
# Phase 1 PR #3 Checklist: Memory Category Structure
Created: 2026-02-12
Owner: Flynn core
Status: ready to implement
## Goal
Introduce structured memory categories (facts/preferences/decisions/projects) to organize persistent memory storage while maintaining full backward compatibility with the existing flat namespace system. Add category-aware retrieval helpers and minimal integration points without changing user-facing commands yet.
Categories:
- **facts**: Verified facts about user, environment, tools, or systems
- **preferences**: User preferences, workflow patterns, communication style
- **decisions**: Important decisions made, their rationale, and outcomes
- **projects**: Project-specific context, goals, and progress notes
## PR Boundary
In scope:
- Category metadata structure and validation
- Backward-compatible category namespacing (e.g., `user/facts`, `user/preferences`)
- Category-aware read/write/search helpers in MemoryStore
- Retrieval helpers for category filtering
- Tests for category operations with backward compatibility validation
- Wire minimal integration points for auto-extraction (no command changes)
Out of scope:
- User-facing command changes (`/memory` commands)
- Auto-categorization of existing memory content
- Category-specific extraction prompts in AgentOrchestrator
- Category-based UI in web dashboard
- Category migration tools for existing deployments
- Category-specific embedding strategies
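The backward-compatibility claim can be illustrated with a small in-memory stand-in: category data lives under `base/category` keys, so existing flat namespaces are never touched. `InMemoryStore` is hypothetical and only mimics the `read`/`write` shape this plan assumes for `MemoryStore`; the real store persists to files.

```typescript
type MemoryCategory = 'facts' | 'preferences' | 'decisions' | 'projects';

// Hypothetical in-memory stand-in for MemoryStore.
class InMemoryStore {
  private readonly data = new Map<string, string>();

  read(namespace: string): string {
    // Missing namespaces read as an empty string.
    return this.data.get(namespace) ?? '';
  }

  write(namespace: string, content: string, mode: 'append' | 'replace'): void {
    const previous = mode === 'append' ? this.read(namespace) : '';
    this.data.set(namespace, previous ? `${previous}\n${content}` : content);
  }

  // Category helpers only compose a namespace and delegate to the flat API.
  readCategory(base: string, category: MemoryCategory): string {
    return this.read(`${base}/${category}`);
  }

  writeCategory(
    base: string,
    category: MemoryCategory,
    content: string,
    mode: 'append' | 'replace',
  ): void {
    this.write(`${base}/${category}`, content, mode);
  }
}

const demoStore = new InMemoryStore();
demoStore.write('user', 'legacy flat note', 'replace');
demoStore.writeCategory('user', 'facts', 'Uses pnpm', 'append');
demoStore.writeCategory('user', 'facts', 'Runs the daemon locally', 'append');
console.log(demoStore.read('user'));
console.log(demoStore.readCategory('user', 'facts'));
```

The flat `user` entry is unaffected by the category writes, and a category that was never written reads as an empty string, which is the behavior the category-aware helpers in this PR rely on.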
## File-by-File Implementation Plan
### 1. `src/memory/categories.ts` (new)
**Purpose:** Define category metadata, validation, and namespace mapping.
**Implementation:**
```typescript
/**
* Supported memory categories for structured organization.
*/
export const MEMORY_CATEGORIES = {
facts: {
name: 'facts',
description: 'Verified facts about user, environment, tools, or systems',
namespace: 'facts',
},
preferences: {
name: 'preferences',
description: 'User preferences, workflow patterns, communication style',
namespace: 'preferences',
},
decisions: {
name: 'decisions',
description: 'Important decisions made, their rationale, and outcomes',
namespace: 'decisions',
},
projects: {
name: 'projects',
description: 'Project-specific context, goals, and progress notes',
namespace: 'projects',
},
} as const;
export type MemoryCategory = keyof typeof MEMORY_CATEGORIES;
export const VALID_CATEGORIES = Object.keys(MEMORY_CATEGORIES) as MemoryCategory[];
/**
* Validate if a string is a valid memory category.
*/
export function isValidCategory(category: string): category is MemoryCategory {
return VALID_CATEGORIES.includes(category as MemoryCategory);
}
/**
* Map a category to its namespace path within a base namespace.
* Examples:
* - buildCategoryNamespace('user', 'facts') → 'user/facts'
* - buildCategoryNamespace('sessions/abc', 'preferences') → 'sessions/abc/preferences'
*/
export function buildCategoryNamespace(
baseNamespace: string,
category: MemoryCategory
): string {
return `${baseNamespace}/${MEMORY_CATEGORIES[category].namespace}`;
}
/**
 * Parse a namespace path to extract base and category if present.
 * A single-segment namespace is always treated as flat, so a legacy
 * namespace that happens to share a category name keeps working.
 * Examples:
 * - 'user/facts' → { base: 'user', category: 'facts' }
 * - 'user' → { base: 'user', category: undefined }
 * - 'projects' → { base: 'projects', category: undefined }
 * - 'sessions/abc/decisions' → { base: 'sessions/abc', category: 'decisions' }
 */
export function parseNamespace(namespace: string): {
  base: string;
  category?: MemoryCategory;
} {
  const parts = namespace.split('/');
  const lastPart = parts[parts.length - 1];
  // Require at least one base segment before the category segment.
  if (parts.length > 1 && lastPart !== undefined && isValidCategory(lastPart)) {
    const base = parts.slice(0, -1).join('/');
    return { base, category: lastPart };
  }
  return { base: namespace, category: undefined };
}
/**
* List all category namespaces under a base namespace.
* Example: 'user' → ['user/facts', 'user/preferences', 'user/decisions', 'user/projects']
*/
export function listCategoryNamespaces(baseNamespace: string): string[] {
return VALID_CATEGORIES.map((cat) => buildCategoryNamespace(baseNamespace, cat));
}
```
**Tests:**
- Validate category constants and types
- Test `isValidCategory()` with valid/invalid inputs
- Test `buildCategoryNamespace()` with various base paths
- Test `parseNamespace()` with flat and nested paths
- Test `listCategoryNamespaces()` returns all categories
**Estimated effort:** 1 hour
---
### 2. `src/memory/store.ts` (modify)
**Purpose:** Add category-aware methods while preserving existing flat namespace behavior.
**Changes:**
1. Add imports:
```typescript
import type { MemoryCategory } from './categories.js';
import { isValidCategory, buildCategoryNamespace, parseNamespace } from './categories.js';
```
2. Add new methods (after existing public API, before private helpers):
```typescript
/**
* Read content from a specific category within a base namespace.
* Backward-compatible: if category file doesn't exist, returns empty string.
*
* @param baseNamespace - Base namespace (e.g., 'user', 'sessions/abc123')
* @param category - Category to read from
*/
readCategory(baseNamespace: string, category: MemoryCategory): string {
const namespace = buildCategoryNamespace(baseNamespace, category);
return this.read(namespace);
}
/**
* Write content to a specific category within a base namespace.
*
* @param baseNamespace - Base namespace (e.g., 'user', 'sessions/abc123')
* @param category - Category to write to
* @param content - Content to write
* @param mode - 'append' or 'replace'
*/
writeCategory(
baseNamespace: string,
category: MemoryCategory,
content: string,
mode: 'append' | 'replace'
): void {
const namespace = buildCategoryNamespace(baseNamespace, category);
this.write(namespace, content, mode);
}
/**
* List all categories present under a base namespace.
* Returns only categories that have existing files.
*
* @param baseNamespace - Base namespace to check
*/
listCategories(baseNamespace: string): MemoryCategory[] {
const allNamespaces = this.listNamespaces();
const categories: MemoryCategory[] = [];
for (const ns of allNamespaces) {
const parsed = parseNamespace(ns);
if (parsed.base === baseNamespace && parsed.category) {
categories.push(parsed.category);
}
}
return categories;
}
/**
* Read all category content under a base namespace.
* Returns a map of category → content for all existing category files.
*
* @param baseNamespace - Base namespace to read from
*/
readAllCategories(baseNamespace: string): Map<MemoryCategory, string> {
const result = new Map<MemoryCategory, string>();
const categories = this.listCategories(baseNamespace);
for (const category of categories) {
const content = this.readCategory(baseNamespace, category);
if (content.length > 0) {
result.set(category, content);
}
}
return result;
}
/**
* Search with optional category filtering.
* If categories are specified, only searches within those category namespaces.
*
* @param query - Search query
* @param options - Optional filters
*/
searchWithCategories(
query: string,
options?: {
baseNamespace?: string;
categories?: MemoryCategory[];
}
): SearchResult[] {
const allResults = this.search(query);
if (!options?.baseNamespace && !options?.categories) {
return allResults;
}
// Resolve filters once; optional chaining keeps this type-safe when options is undefined
const baseNamespace = options?.baseNamespace;
const categories = options?.categories;
return allResults.filter((result) => {
const parsed = parseNamespace(result.namespace);
// Match the base namespace itself or its children, so "user" does not also match "username"
if (baseNamespace && result.namespace !== baseNamespace && !result.namespace.startsWith(`${baseNamespace}/`)) {
return false;
}
// With a categories filter, keep only namespaces whose category is in the list;
// namespaces without a category are excluded
if (categories) {
return parsed.category ? categories.includes(parsed.category) : false;
}
return true;
});
}
```
3. Update `getContextForPrompt()` to include categories:
```typescript
/**
* Build memory context suitable for injection into a system prompt.
*
* Reads both flat `user.md`/`global.md` (for backward compat) and
* structured categories under 'user' namespace.
*
* Truncates to stay within {@link MemoryStoreConfig.maxContextTokens}.
*/
getContextForPrompt(): string {
const sections: string[] = [];
// Read legacy flat files for backward compatibility
const userMemory = this.read('user');
const globalMemory = this.read('global');
if (userMemory.length > 0) {
sections.push(`## User Memory\n\n${userMemory}`);
}
if (globalMemory.length > 0) {
sections.push(`## Global Memory\n\n${globalMemory}`);
}
// Read structured categories under 'user' namespace
const categories = this.readAllCategories('user');
for (const [category, content] of categories) {
if (content.length > 0) {
const categoryLabel = category.charAt(0).toUpperCase() + category.slice(1);
sections.push(`## User ${categoryLabel}\n\n${content}`);
}
}
// Nothing to inject
if (sections.length === 0) {
return '';
}
const full = sections.join('\n\n');
// Truncate to fit within the token budget (estimate: 4 chars ≈ 1 token)
const maxChars = this._config.maxContextTokens * 4;
if (full.length <= maxChars) {
return full;
}
return full.slice(0, maxChars);
}
```
**Tests:** (in `src/memory/store.test.ts`)
- Test `readCategory()` returns empty for non-existent categories
- Test `writeCategory()` creates proper namespace paths
- Test `listCategories()` returns only existing categories
- Test `readAllCategories()` returns map of all category content
- Test `searchWithCategories()` with no filters (returns all)
- Test `searchWithCategories()` with base namespace filter
- Test `searchWithCategories()` with categories filter
- Test `searchWithCategories()` with both filters
- Test `getContextForPrompt()` includes both legacy and category content
- Test backward compatibility: flat `user.md` still works alongside categories
**Estimated effort:** 2-3 hours
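The test cases above mix flat and category namespaces. A minimal in-memory stand-in can make the expected behavior concrete before touching the file-backed store. This is a sketch under assumptions: `InMemoryStore` is a hypothetical test double, and the newline separator used for `append` is a guess, not the real `MemoryStore` behavior.

```typescript
type Category = 'facts' | 'preferences' | 'decisions' | 'projects';
const CATEGORIES: readonly Category[] = ['facts', 'preferences', 'decisions', 'projects'];

// Hypothetical in-memory test double; the real MemoryStore persists to files.
class InMemoryStore {
  private readonly files = new Map<string, string>();

  read(namespace: string): string {
    // Missing namespaces read as an empty string, never as an error.
    return this.files.get(namespace) ?? '';
  }

  write(namespace: string, content: string, mode: 'append' | 'replace'): void {
    const previous = mode === 'append' ? this.read(namespace) : '';
    // Assumption: appended entries are separated by a newline.
    this.files.set(namespace, previous ? `${previous}\n${content}` : content);
  }

  readCategory(base: string, category: Category): string {
    return this.read(`${base}/${category}`);
  }

  writeCategory(base: string, category: Category, content: string, mode: 'append' | 'replace'): void {
    this.write(`${base}/${category}`, content, mode);
  }

  listCategories(base: string): Category[] {
    // Only categories that were actually written are reported.
    return CATEGORIES.filter((category) => this.files.has(`${base}/${category}`));
  }
}

const store = new InMemoryStore();
store.write('user', 'legacy note', 'replace');
store.writeCategory('user', 'facts', 'uses pnpm', 'append');
store.writeCategory('user', 'facts', 'prefers TypeScript', 'append');
```

The flat `user` namespace and `user/facts` live side by side here, which is the property the backward-compatibility tests are meant to protect.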
---
### 3. `src/memory/categories.test.ts` (new)
**Purpose:** Comprehensive tests for category utilities.
**Coverage:**
- Category validation
- Namespace building and parsing
- Edge cases (empty strings, special characters, nested paths)
- Backward compatibility with flat namespaces
**Estimated effort:** 1 hour
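The edge cases listed above are easier to reason about with a concrete parser in hand. The sketch below assumes `parseNamespace()` returns `{ base, category? }`, as its use in `store.ts` suggests; `parseNamespaceSketch` and the other names are hypothetical stand-ins, not the contents of `categories.ts`.

```typescript
// Hypothetical stand-ins for the utilities in src/memory/categories.ts.
const SKETCH_CATEGORIES = ['facts', 'preferences', 'decisions', 'projects'] as const;
type SketchCategory = (typeof SKETCH_CATEGORIES)[number];

function isSketchCategory(value: string): value is SketchCategory {
  return (SKETCH_CATEGORIES as readonly string[]).includes(value);
}

interface ParsedNamespace {
  base: string;
  category?: SketchCategory;
}

// Treat the last path segment as a category only if it is a known category
// and there is a non-empty base in front of it.
function parseNamespaceSketch(namespace: string): ParsedNamespace {
  const slash = namespace.lastIndexOf('/');
  if (slash > 0) {
    const tail = namespace.slice(slash + 1);
    if (isSketchCategory(tail)) {
      return { base: namespace.slice(0, slash), category: tail };
    }
  }
  // Flat namespaces (and anything unrecognized) pass through unchanged.
  return { base: namespace };
}
```

Under these assumptions a legacy path such as `sessions/abc123` keeps its full value as `base`, while `sessions/abc123/preferences` splits into base and category. A bare `facts` or a leading-slash `/facts` is treated as a flat namespace.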
---
### 4. `src/memory/index.ts` (modify)
**Purpose:** Export category types and utilities.
**Changes:**
```typescript
export { MemoryStore } from './store.js';
export type { MemoryStoreConfig, SearchResult } from './store.js';
export { chunkText } from './chunker.js';
export type { Chunk, ChunkOptions } from './chunker.js';
export { createEmbeddingProvider, OpenAIEmbeddingProvider, GeminiEmbeddingProvider, OllamaEmbeddingProvider, LlamaCppEmbeddingProvider } from './embeddings.js';
export type { EmbeddingProvider } from './embeddings.js';
export { VectorStore, cosineSimilarity, contentHash } from './vector-store.js';
export type { VectorSearchResult, EmbeddingRow } from './vector-store.js';
export { HybridSearch } from './hybrid-search.js';
export type { HybridSearchResult } from './hybrid-search.js';
export { MEMORY_CATEGORIES, VALID_CATEGORIES, isValidCategory, buildCategoryNamespace, parseNamespace, listCategoryNamespaces } from './categories.js';
export type { MemoryCategory } from './categories.js';
```
**Estimated effort:** 5 minutes
---
### 5. `src/tools/builtin/memory-read.ts` (modify)
**Purpose:** Update tool description to mention categories (no functional change yet).
**Changes:**
Update description to document category support:
```typescript
description:
'Read a persistent memory file by namespace. Available namespaces include "user" (user preferences and facts), "global" (cross-session knowledge), and session-specific namespaces. ' +
'Supports structured categories: append "/facts", "/preferences", "/decisions", or "/projects" to a base namespace (e.g., "user/facts", "sessions/abc123/preferences"). ' +
'Returns the full contents of the memory file.',
```
**Estimated effort:** 5 minutes
---
### 6. `src/tools/builtin/memory-write.ts` (modify)
**Purpose:** Update tool description to mention categories (no functional change yet).
**Changes:**
Update description:
```typescript
description:
'Write to a persistent memory file. Use mode="append" to add new information without overwriting existing content, or mode="replace" to overwrite the entire namespace. ' +
'Supports structured categories: use namespaces like "user/facts", "user/preferences", "user/decisions", or "user/projects" for organized storage.',
```
**Estimated effort:** 5 minutes
---
### 7. `src/tools/builtin/memory-search.ts` (modify - optional for this PR)
**Purpose:** Document category filtering support in description (implementation deferred to follow-up).
**Changes:**
Update description:
```typescript
description:
'Search across all memory files for a keyword or phrase. Returns matching lines with surrounding context from every namespace.' +
(hybridSearch ? ' Uses semantic vector search combined with keyword matching for better results.' : '') +
' Future: will support category filtering (e.g., search only within facts or preferences).',
```
**Estimated effort:** 2 minutes
---
### 8. Integration wiring (no user-facing changes)
**Files to review (no changes required for this PR):**
- `src/backends/native/orchestrator.ts` — memory extraction hook point (already uses `memoryStore.write()`, categories will work automatically)
- `src/daemon/memory.ts` — memory initialization (no changes needed, categories are opt-in)
**Validation:** Existing memory tools continue to work with flat namespaces; category namespaces are treated as regular namespaces by existing code.
**Estimated effort:** 30 minutes review
---
## Implementation Steps
**Phase 1: Foundation (3-4 hours)**
1. Create `src/memory/categories.ts` with types, constants, and utilities
2. Create `src/memory/categories.test.ts` with comprehensive tests
3. Run: `pnpm typecheck && pnpm test:run src/memory/categories.test.ts`
**Phase 2: Store Integration (3-4 hours)**
4. Add category-aware methods to `src/memory/store.ts`
5. Update `getContextForPrompt()` to include category content
6. Add tests to `src/memory/store.test.ts` for new methods and backward compat
7. Run: `pnpm typecheck && pnpm test:run src/memory/store.test.ts`
**Phase 3: Exports and Documentation (1 hour)**
8. Update `src/memory/index.ts` with new exports
9. Update tool descriptions in `memory-read.ts`, `memory-write.ts`, `memory-search.ts`
10. Run: `pnpm typecheck && pnpm test:run`
**Phase 4: Validation (30 minutes)**
11. Build and lint: `pnpm build && pnpm lint`
12. Full test suite: `pnpm test:run`
13. Manual smoke test: verify flat `user.md` and `user/facts.md` both work
---
## Validation Commands
Run in order:
```bash
# Type checking
pnpm typecheck
# Unit tests for categories
pnpm test:run src/memory/categories.test.ts
# Unit tests for store
pnpm test:run src/memory/store.test.ts
# Full test suite
pnpm test:run
# Linting
pnpm lint
# Build
pnpm build
```
---
## Acceptance Criteria
### Functional
- [ ] Category constants defined with proper TypeScript types
- [ ] `buildCategoryNamespace()` creates proper paths (e.g., `user/facts`)
- [ ] `parseNamespace()` correctly extracts base and category
- [ ] `MemoryStore.readCategory()` reads category-specific content
- [ ] `MemoryStore.writeCategory()` writes to category namespaces
- [ ] `MemoryStore.listCategories()` returns only existing categories
- [ ] `MemoryStore.readAllCategories()` returns map of all category content
- [ ] `searchWithCategories()` filters by base namespace and/or categories
- [ ] `getContextForPrompt()` includes both legacy flat files and categories
### Backward Compatibility
- [ ] Existing flat `user.md` and `global.md` continue to work
- [ ] Existing memory tools work unchanged with flat namespaces
- [ ] Existing tests pass without modification
- [ ] Category namespaces are opt-in, not required
- [ ] Legacy namespace patterns (e.g., `sessions/abc123`) unaffected
### Quality
- [ ] All new functions have JSDoc comments
- [ ] Test coverage ≥90% for new code
- [ ] No type errors
- [ ] No lint warnings
- [ ] Build succeeds
---
## Risk Assessment and Mitigations
### Risk 1: Breaking existing memory access patterns
**Likelihood:** Low
**Impact:** High
**Mitigation:**
- Keep all existing MemoryStore methods unchanged
- Add category methods as new API surface
- Test backward compatibility explicitly
- Ensure `getContextForPrompt()` includes both legacy and category content
### Risk 2: Namespace collision between flat and category paths
**Likelihood:** Low
**Impact:** Medium
**Example:** User creates `user/facts` manually before categories exist
**Mitigation:**
- Document category namespaces in tool descriptions
- `parseNamespace()` handles both flat and category paths gracefully
- No automatic migration means existing files remain untouched
### Risk 3: Token budget exceeded due to category content injection
**Likelihood:** Medium
**Impact:** Low
**Mitigation:**
- Existing `maxContextTokens` truncation logic handles all content
- Category content is additive: each populated category adds one section to the prompt, and categories are intended to replace the flat structure over time rather than duplicate it
- Follow-up PR can add per-category token allocation if needed
### Risk 4: Test regressions in memory-related tests
**Likelihood:** Low
**Impact:** Medium
**Mitigation:**
- Run full test suite before and after changes
- Add explicit backward compatibility test cases
- Use test fixtures that mix flat and category namespaces
---
## Quality Gates
### Code Quality
- [ ] All functions have TypeScript types (no `any`)
- [ ] All public APIs have JSDoc comments
- [ ] Error handling for invalid category names
- [ ] Consistent naming conventions (camelCase functions, PascalCase types)
### Testing
- [ ] Unit tests for all category utilities
- [ ] Integration tests for category read/write/search
- [ ] Backward compatibility tests with legacy namespaces
- [ ] Edge case tests (empty content, missing files, invalid categories)
### Performance
- [ ] No measurable performance regression in `MemoryStore.search()`
- [ ] `listCategories()` scans namespace list once (no redundant I/O)
- [ ] `getContextForPrompt()` stays O(n) in the number of namespaces scanned
### Documentation
- [ ] Tool descriptions updated to mention categories
- [ ] JSDoc comments explain category namespace format
- [ ] Examples in JSDoc show category usage patterns
---
## Follow-up PRs (Out of Scope)
1. **Category-specific memory tools:**
- Add `memory.read_category` and `memory.write_category` tools
- Add category filtering to `memory.search` tool
- Update TUI commands to support category selection
2. **Auto-categorization in AgentOrchestrator:**
- Update memory extraction prompts to suggest categories
- Add heuristics for auto-routing content to appropriate categories
3. **Category migration tool:**
- CLI command to analyze existing flat memory files
- Suggest categorization of existing content
- Batch migration with user confirmation
4. **Web dashboard category UI:**
- Category tabs in memory viewer
- Per-category search filters
- Category statistics and visualization
5. **Category-specific embedding strategies:**
- Different embedding models per category
- Category-weighted hybrid search scoring
- Cross-category semantic links
---
## Suggested Commit Message
```
feat(memory): add structured memory categories with backward compatibility
Introduces four memory categories (facts/preferences/decisions/projects)
for organized persistent storage:
- Add category types, constants, and namespace utilities
- Add category-aware read/write/search methods to MemoryStore
- Update getContextForPrompt() to include category content
- Maintain full backward compatibility with flat namespace files
- Update tool descriptions to document category support
Categories use namespace paths (e.g., 'user/facts', 'sessions/abc/preferences')
and are opt-in. Existing flat files ('user.md', 'global.md') continue to work
unchanged.
Tests: 15+ new tests for category operations and backward compatibility
```
---
## PR Description Template
```markdown
## Summary
Adds structured memory categories (facts/preferences/decisions/projects) to organize persistent memory while maintaining full backward compatibility with existing flat namespace files.
## Changes
- **New:** `src/memory/categories.ts` — category types, utilities, namespace mapping
- **Modified:** `src/memory/store.ts` — category-aware read/write/search methods
- **Modified:** `src/memory/index.ts` — export category types
- **Modified:** Tool descriptions updated to document category support
- **Tests:** 15+ new tests for category operations and backward compatibility
## Category Structure
Categories are implemented as namespace paths:
- `user/facts` — verified facts about user, environment, tools
- `user/preferences` — user preferences, workflow patterns, style
- `user/decisions` — important decisions, rationale, outcomes
- `user/projects` — project-specific context and progress
Flat files (`user.md`, `global.md`) continue to work unchanged.
## Testing
All existing tests pass. New tests cover:
- Category namespace building and parsing
- Category-specific read/write operations
- Category filtering in search
- Backward compatibility with flat files
- Mixed flat + category content in prompts
## Backward Compatibility
✅ Existing `user.md` and `global.md` files work unchanged
✅ Existing memory tools work with flat namespaces
✅ Categories are opt-in, not required
✅ No breaking changes to public API
## Follow-up Work
- Category-specific memory tools (`memory.read_category`, etc.)
- Auto-categorization in memory extraction
- Category UI in web dashboard
- Migration tool for existing flat files
```
---
## Time Estimate
Total: **8-11 hours**
Breakdown:
- Categories foundation: 3-4 hours
- Store integration: 3-4 hours
- Exports and docs: 1 hour
- Validation and testing: 1-2 hours
---
## Success Metrics
- [ ] PR merged without breaking changes
- [ ] Test count increases by 15+
- [ ] All existing memory tests pass unchanged
- [ ] Documentation updated and clear
- [ ] No performance regression in memory operations
- [ ] state.json updated with phase completion
@@ -0,0 +1,174 @@
# Phase 1 PR #3 Checklist: Memory Category Structure
Created: 2026-02-12
Owner: Flynn core
Status: ready to implement
## Goal
Introduce structured memory categories while preserving current memory behavior.
Categories:
- `facts`
- `preferences`
- `decisions`
- `projects`
This PR is infrastructure-only: no new user-facing commands yet.
## PR Boundary
In scope:
- Add category types/constants/utilities
- Add category-aware `MemoryStore` APIs
- Keep existing `read/write/search/getContextForPrompt` behavior intact
- Add category-aware retrieval helpers for follow-on PRs
- Unit tests for compatibility and correctness
Out of scope:
- Auto-categorization/extraction heuristics
- New slash commands (`/memory add --category ...`)
- Migration CLI that rewrites existing files
- UI/dashboard for category browsing
## Compatibility Model
Existing flat namespaces remain valid:
- `user`
- `global`
- `sessions/<id>`
Category namespaces are additive and path-based:
- `user/facts`
- `user/preferences`
- `global/decisions`
- `sessions/<id>/projects`
No destructive migration in this PR.
## File-by-File Diff Plan
1) `src/memory/categories.ts` (new)
- Add category constants and helpers.
```ts
export const MEMORY_CATEGORIES = ['facts', 'preferences', 'decisions', 'projects'] as const;
export type MemoryCategory = (typeof MEMORY_CATEGORIES)[number];
export function isMemoryCategory(value: string): value is MemoryCategory {
return (MEMORY_CATEGORIES as readonly string[]).includes(value);
}
export function categoryNamespace(baseNamespace: string, category: MemoryCategory): string {
return `${baseNamespace}/${category}`;
}
```
2) `src/memory/store.ts` (modify)
- Add category-aware methods without changing existing API behavior.
New and extended methods (`search` already exists and only gains an optional argument):
- `readCategory(baseNamespace: string, category: MemoryCategory): string`
- `writeCategory(baseNamespace: string, category: MemoryCategory, content: string, mode: 'append' | 'replace'): void`
- `listCategories(baseNamespace: string): MemoryCategory[]`
- `readAllCategories(baseNamespace: string): Partial<Record<MemoryCategory, string>>`
- `search(query: string, opts?: { categories?: MemoryCategory[]; baseNamespacePrefix?: string }): SearchResult[]`
Notes:
- Keep old `search(query: string)` call pattern working via optional `opts`.
- Ensure dirty namespace tracking marks category namespaces too.
- Keep `getContextForPrompt()` backward compatible: include legacy sections first, then category sections if present and token budget allows.
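The optional-`opts` note above can be sketched as follows. This is an illustration only: `memoryFiles`, `SearchHit`, and the line-based matching are hypothetical, and the real `MemoryStore.search()` may rank and shape results differently.

```typescript
type MemoryCategoryName = 'facts' | 'preferences' | 'decisions' | 'projects';

interface SearchHit {
  namespace: string;
  line: string;
}

interface SearchOptions {
  categories?: MemoryCategoryName[];
  baseNamespacePrefix?: string;
}

// Hypothetical data standing in for memory files on disk.
const memoryFiles: Record<string, string> = {
  user: 'likes dark mode',
  'user/preferences': 'dark mode in every editor',
  'global/decisions': 'adopt dark mode for docs',
};

// Old call sites keep using search(query); new ones pass opts.
function search(query: string, opts?: SearchOptions): SearchHit[] {
  const needle = query.toLowerCase();
  const prefix = opts?.baseNamespacePrefix;
  const categories = opts?.categories as readonly string[] | undefined;
  const hits: SearchHit[] = [];
  for (const [namespace, content] of Object.entries(memoryFiles)) {
    if (prefix !== undefined && !namespace.startsWith(prefix)) continue;
    if (categories !== undefined) {
      // Only the segment after the last slash can be a category.
      const slash = namespace.lastIndexOf('/');
      const tail = slash > 0 ? namespace.slice(slash + 1) : '';
      if (!categories.includes(tail)) continue;
    }
    for (const line of content.split('\n')) {
      if (line.toLowerCase().includes(needle)) hits.push({ namespace, line });
    }
  }
  return hits;
}
```

Calling `search('dark mode')` without options spans every namespace, so existing callers see no change in contract.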
3) `src/memory/index.ts` (modify)
- Export category types/utilities.
```ts
export * from './categories.js';
```
4) `src/memory/categories.test.ts` (new)
- Validate category constants/helpers:
- `isMemoryCategory` true/false paths
- namespace composition correctness
5) `src/memory/store.test.ts` (modify/add)
- Add category coverage:
- category read/write/append/replace
- list/readAll categories
- filtered search by category
- backward compatibility for legacy namespaces
- `getContextForPrompt()` includes legacy + category content under token budget
6) `src/tools/builtin/memory.ts` (modify docs-only or no-op code)
- If tool schemas/docs mention namespaces, document category namespace pattern so next PR can add command-level category args cleanly.
## Implementation Steps
1. Add `categories.ts` with constants/types/helpers.
2. Extend `MemoryStore` with category methods (additive only).
3. Extend `search` to support optional category filters.
4. Update prompt context composition to include categories safely.
5. Export new memory APIs from memory index barrel.
6. Add unit tests for category + compatibility behavior.
7. Run full validation.
## Validation Commands
```bash
pnpm typecheck
pnpm test:run src/memory/categories.test.ts
pnpm test:run src/memory/store.test.ts
pnpm test:run
pnpm lint
pnpm build
```
## Acceptance Criteria
- Existing memory paths (`user`, `global`, `sessions/<id>`) still work unchanged.
- Category APIs function for all four categories.
- Search works both with and without category filters.
- No data loss or rewrite required for existing memory files.
- `getContextForPrompt()` remains stable and token-bounded.
- Dirty namespace/indexing behavior remains correct for new category paths.
- All existing tests pass; new memory tests added.
## Quality Gates
- Backward compatibility:
- Legacy reads/writes/searches unchanged.
- Old memory files remain readable with zero migration.
- Retrieval correctness:
- Category-filtered searches return only matching namespaces.
- Unfiltered searches still span all namespaces.
- Performance:
- No meaningful regression for unfiltered search/list operations.
- Category filtering avoids unnecessary extra scans where possible.
- Reliability:
- Missing category files return empty strings (not errors).
- Invalid categories are rejected at type-level and guarded in runtime helper paths.
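The reliability gate on invalid categories could be met with a small runtime guard layered on the type guard from `categories.ts`. In this sketch `assertMemoryCategory` is a hypothetical helper name and the error wording is illustrative.

```typescript
const MEMORY_CATEGORIES = ['facts', 'preferences', 'decisions', 'projects'] as const;
type MemoryCategory = (typeof MEMORY_CATEGORIES)[number];

function isMemoryCategory(value: string): value is MemoryCategory {
  return (MEMORY_CATEGORIES as readonly string[]).includes(value);
}

// Hypothetical helper: narrows untrusted input (tool args, config) or throws.
function assertMemoryCategory(value: string): MemoryCategory {
  if (!isMemoryCategory(value)) {
    throw new Error(
      `Invalid memory category "${value}"; expected one of: ${MEMORY_CATEGORIES.join(', ')}`
    );
  }
  return value;
}
```

Typed call sites never need the assertion; it is meant for boundaries where a category arrives as a plain string.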
## Risks and Mitigations
- Risk: subtle prompt-context ordering changes.
- Mitigation: preserve legacy section order, append category sections after legacy content.
- Risk: token budget overrun from extra sections.
- Mitigation: enforce existing truncation logic after composing all sections.
- Risk: search API break due to signature change.
- Mitigation: keep `opts` optional and maintain old call contract.
## Suggested Commit Message
`feat(memory): add structured category namespaces with backward-compatible APIs`
## Follow-up PRs
1. Add category-aware memory tool arguments and slash command UX.
2. Add auto-extraction into categories during compaction/memory pipeline.
3. Add migration/normalization utility for legacy memory into categories.
@@ -0,0 +1,82 @@
# Phase 2 PR #1 Checklist: Component Registry Routing
Created: 2026-02-12
Owner: Flynn core
Status: ready to implement
## Goal
Add a component registry for intent routing so message patterns can map to target agents/skills before full orchestration.
## Scope
In scope:
- intent rule schema in config
- registry with match + priority resolution
- router integration in `daemon/routing.ts`
- gateway inspection handler (`intents.list`, `intents.match`)
Out of scope:
- confidence-based fast-path decisions (PR #2)
- history/context boosts (PR #3)
## Files
New:
- `src/intents/registry.ts`
- `src/intents/index.ts`
- `src/intents/registry.test.ts`
- `src/gateway/handlers/intents.ts`
Modified:
- `src/config/schema.ts`
- `src/daemon/index.ts`
- `src/daemon/routing.ts`
- `src/gateway/handlers/index.ts`
## Implementation Steps
1. Add config schema section `intents`:
- `enabled` (bool, default false)
- `match_threshold` (0..1)
- `rules[]` with `name`, `patterns[]`, `target { type, name }`, `priority`, `enabled`
2. Implement `ComponentRegistry`:
- rule registration
- glob/literal match
- score + tie-break (`priority`, specificity)
3. Add tests for exact/glob matches and tie-breaking.
4. Instantiate registry in daemon startup and load rules from config.
5. In `createMessageRouter`, run intent match before agent resolution.
6. Add gateway handlers to inspect configured rules and test match behavior.
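Steps 2 and 3 above can be sketched roughly as below. Everything here is an assumption for illustration: the `'agent' | 'skill'` target types, a glob syntax where only `*` is special, and specificity measured as the count of literal characters. `match_threshold` is left out because the plan does not yet define how scores are computed.

```typescript
interface IntentTarget {
  type: 'agent' | 'skill';
  name: string;
}

interface IntentRule {
  name: string;
  patterns: string[];
  target: IntentTarget;
  priority: number;
  enabled: boolean;
}

interface IntentMatch {
  rule: IntentRule;
  pattern: string;
  specificity: number; // count of literal (non-wildcard) characters
}

// Compile a glob where `*` matches any run of characters; the rest is literal.
function compileGlob(pattern: string): RegExp {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('[\\s\\S]*');
  return new RegExp(`^${escaped}$`, 'i');
}

// Higher priority wins; ties fall back to specificity, then rule name.
function isBetter(a: IntentMatch, b: IntentMatch): boolean {
  if (a.rule.priority !== b.rule.priority) return a.rule.priority > b.rule.priority;
  if (a.specificity !== b.specificity) return a.specificity > b.specificity;
  return a.rule.name < b.rule.name;
}

class ComponentRegistry {
  private readonly compiled: Array<{ rule: IntentRule; pattern: string; regex: RegExp }> = [];

  register(rule: IntentRule): void {
    if (!rule.enabled) return;
    for (const pattern of rule.patterns) {
      // Precompile at registration so matching is one linear scan.
      this.compiled.push({ rule, pattern, regex: compileGlob(pattern) });
    }
  }

  match(message: string): IntentMatch | undefined {
    const text = message.trim();
    let best: IntentMatch | undefined;
    for (const { rule, pattern, regex } of this.compiled) {
      if (!regex.test(text)) continue;
      const candidate: IntentMatch = {
        rule,
        pattern,
        specificity: pattern.replace(/\*/g, '').length,
      };
      if (best === undefined || isBetter(candidate, best)) best = candidate;
    }
    return best;
  }
}
```

The final name comparison in `isBetter` exists only to keep resolution deterministic when priority and specificity both tie.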
## Validation Commands
```bash
pnpm typecheck
pnpm test:run src/intents/registry.test.ts
pnpm test:run src/daemon/routing.test.ts
pnpm test:run src/gateway/handlers/handlers.test.ts
pnpm test:run
pnpm lint
pnpm build
```
## Acceptance Criteria
- Registry resolves matches deterministically.
- Intent routing is opt-in and disabled by default.
- Messages with matching rules can override default agent target.
- Messages without a match keep existing route behavior.
- Gateway inspection handlers return correct match/rule info.
- No regressions in existing routing tests.
## Risks
- False-positive matches route to wrong agent.
- Mitigation: conservative threshold + debug logging.
- Rule explosion impacts performance.
- Mitigation: precompiled, cached patterns and a bounded O(n) scan over rules.
## Commit Message
`feat(routing): add component registry for intent-based target resolution`
@@ -0,0 +1,83 @@
# Phase 2 PR #2 Checklist: Confidence-Based Routing
Created: 2026-02-12
Owner: Flynn core
Status: ready to implement
## Goal
Introduce confidence-based routing policy to choose between fast-path target execution and full LLM orchestration.
## Scope
In scope:
- routing policy schema with thresholds
- confidence output from intent matcher
- policy decision engine (`fast` vs `llm`)
- routing integration and gateway debug handler
Out of scope:
- history/topic boost (PR #3)
- command registry changes (already in Phase 1)
## Files
New:
- `src/routing/policy.ts`
- `src/routing/index.ts`
- `src/routing/policy.test.ts`
- `src/gateway/handlers/routing.ts`
Modified:
- `src/config/schema.ts`
- `src/intents/registry.ts`
- `src/intents/registry.test.ts`
- `src/daemon/index.ts`
- `src/daemon/routing.ts`
- `src/gateway/handlers/index.ts`
## Implementation Steps
1. Add `routing_policy` config:
- `enabled`
- `fast_path_threshold`
- `llm_threshold`
- `default_path` (`fast|llm`)
2. Extend intent matcher to return confidence score.
3. Implement `RoutingPolicy.decide(...)` with deterministic threshold logic.
4. Apply policy in `daemon/routing.ts` before `agent.process`:
- high confidence -> fast path
- medium/low -> standard orchestration
5. Add `routing.decide` gateway handler for inspection/testing.
6. Add unit/integration tests for threshold boundaries and fallback.
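One way to read the threshold logic in steps 1 to 4 is sketched below. The plan does not say how the band between `llm_threshold` and `fast_path_threshold` resolves, so this sketch assumes it falls to `default_path`; with `default_path: 'llm'` that reproduces the "high confidence goes fast, medium and low go to orchestration" split. The `RoutingDecision` shape and reason strings are hypothetical.

```typescript
type RoutePath = 'fast' | 'llm';

interface RoutingPolicyConfig {
  enabled: boolean;
  fast_path_threshold: number; // inclusive lower bound for the fast path
  llm_threshold: number; // below this, always use full orchestration
  default_path: RoutePath; // assumed to cover the band between the thresholds
}

interface RoutingDecision {
  path: RoutePath;
  reason: string;
}

class RoutingPolicy {
  private readonly config: RoutingPolicyConfig;

  constructor(config: RoutingPolicyConfig) {
    this.config = config;
  }

  decide(confidence: number | undefined): RoutingDecision {
    if (!this.config.enabled) {
      return { path: 'llm', reason: 'policy disabled' };
    }
    // Unmatched input carries no confidence and always falls back.
    if (confidence === undefined || Number.isNaN(confidence)) {
      return { path: 'llm', reason: 'no intent match' };
    }
    if (confidence >= this.config.fast_path_threshold) {
      return { path: 'fast', reason: 'confidence at or above fast_path_threshold' };
    }
    if (confidence < this.config.llm_threshold) {
      return { path: 'llm', reason: 'confidence below llm_threshold' };
    }
    return { path: this.config.default_path, reason: 'between thresholds, using default_path' };
  }
}
```

Returning a reason alongside the path gives the `routing.decide` handler and the debug log something concrete to report.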
## Validation Commands
```bash
pnpm typecheck
pnpm test:run src/routing/policy.test.ts
pnpm test:run src/intents/registry.test.ts
pnpm test:run src/daemon/routing.test.ts
pnpm test:run
pnpm lint
pnpm build
```
## Acceptance Criteria
- Policy decisions are deterministic at threshold edges.
- Fast-path is only used when confidence meets configured threshold.
- Unmatched/low-confidence requests always fall back safely.
- Existing command and routing flows remain compatible.
- Routing decision logging is visible for debugging.
## Risks
- Over-aggressive fast-path causes loss of context.
- Mitigation: high default threshold and explicit fallback path.
- Confidence scoring instability.
- Mitigation: test edge cases and keep scoring simple.
## Commit Message
`feat(routing): add confidence-based fast-path policy`
@@ -0,0 +1,88 @@
# Phase 2 PR #3 Checklist: History Index and Topic Search
Created: 2026-02-12
Owner: Flynn core
Status: ready to implement
## Goal
Add lightweight session history indexing to support topic/keyword search and improve future routing/context decisions.
## Scope
In scope:
- message metadata indexing (keywords/topics)
- search API over indexed history
- optional routing confidence boost hook
- gateway handlers for search and reindex
Out of scope:
- heavyweight semantic/vector history search
- UI redesign for history explorer
## Files
New:
- `src/session/indexer.ts`
- `src/session/search.ts`
- `src/session/indexer.test.ts`
- `src/session/search.test.ts`
- `src/gateway/handlers/history.ts`
Modified:
- `src/config/schema.ts`
- `src/session/store.ts`
- `src/session/manager.ts`
- `src/daemon/index.ts`
- `src/daemon/routing.ts`
- `src/gateway/handlers/index.ts`
## Implementation Steps
1. Add `history_index` config section (`enabled`, `max_keywords`, `search_limit`, etc.).
2. Extend session persistence with metadata field (migration-safe).
3. Implement indexer:
- tokenize + stopword filtering
- extract top keywords/topics
- attach metadata when messages are written
4. Implement searcher:
- query keyword overlap
- rank by relevance + recency
5. Wire into session manager lifecycle.
6. Add gateway handlers:
- `history.search`
- `history.reindex`
7. Add optional routing hook to boost confidence when query overlaps strong historical topics.
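The indexer and searcher in steps 3 and 4 might look roughly like this. It is a sketch under assumptions: the stopword list is a tiny placeholder, tokenization is ASCII-only, and `IndexedMessage` is a hypothetical shape rather than the session store's actual schema.

```typescript
// Placeholder stopword list; a real one would be larger and configurable.
const STOPWORDS = new Set(['the', 'a', 'an', 'and', 'or', 'to', 'of', 'in', 'is', 'it', 'for', 'on', 'with']);

interface IndexedMessage {
  id: string;
  text: string;
  timestamp: number; // epoch milliseconds
  keywords: string[];
}

function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 1 && !STOPWORDS.has(token));
}

// Keep the most frequent tokens; ties break alphabetically for stable output.
function extractKeywords(text: string, maxKeywords: number): string[] {
  const counts = new Map<string, number>();
  for (const token of tokenize(text)) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : 1))
    .slice(0, maxKeywords)
    .map(([token]) => token);
}

function indexMessage(id: string, text: string, timestamp: number, maxKeywords = 5): IndexedMessage {
  return { id, text, timestamp, keywords: extractKeywords(text, maxKeywords) };
}

// Rank by keyword overlap first, then by recency; drop zero-overlap messages.
function searchHistory(messages: IndexedMessage[], query: string, limit: number): IndexedMessage[] {
  const queryTokens = new Set(tokenize(query));
  return messages
    .map((message) => ({
      message,
      overlap: message.keywords.filter((keyword) => queryTokens.has(keyword)).length,
    }))
    .filter((entry) => entry.overlap > 0)
    .sort((a, b) => b.overlap - a.overlap || b.message.timestamp - a.message.timestamp)
    .slice(0, limit)
    .map((entry) => entry.message);
}
```

Because `indexMessage` is a pure function of its inputs, re-running it over stored messages yields the same metadata, which is what makes `history.reindex` idempotent in this sketch.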
## Validation Commands
```bash
pnpm typecheck
pnpm test:run src/session/indexer.test.ts
pnpm test:run src/session/search.test.ts
pnpm test:run src/session/store.test.ts
pnpm test:run src/session/manager.test.ts
pnpm test:run src/daemon/routing.test.ts
pnpm test:run
pnpm lint
pnpm build
```
## Acceptance Criteria
- Metadata indexing persists without breaking existing sessions.
- History search returns ranked, relevant results.
- Reindex operation is safe and idempotent.
- Existing session behavior remains unchanged when feature disabled.
- Routing boost path is optional and bounded.
## Risks
- DB migration regressions on existing data.
- Mitigation: additive migration + migration tests.
- Search noise (low precision).
- Mitigation: score threshold + max result cap.
## Commit Message
`feat(session): add history indexing and topic search metadata`
@@ -0,0 +1,82 @@
# Phase 3 PR #1 Checklist: Adaptive Memory Injection and Compaction Weighting
Created: 2026-02-12
Owner: Flynn core
Status: ready to implement
## Goal
Improve memory usefulness and compaction quality by injecting relevant memory context adaptively and preserving high-value turns during compaction.
## Scope
In scope:
- adaptive memory relevance scoring
- configurable memory injection strategy
- compaction weighting for important turns (tool outcomes, corrections, preferences)
- tests and perf guardrails
Out of scope:
- new vector backend work
- UI controls for weighting
## Files
New:
- `src/memory/adaptive.ts`
- `src/memory/adaptive.test.ts`
- `src/context/weighting.ts`
- `src/context/weighting.test.ts`
Modified:
- `src/backends/native/orchestrator.ts`
- `src/context/compaction.ts`
- `src/memory/store.ts`
- `src/config/schema.ts`
## Implementation Steps
1. Add adaptive memory scorer:
- keyword overlap with recent turns
- recency weighting
- token budget clipping
2. Add config flags:
- `memory.injection_strategy` (`all|recent|adaptive`)
- `memory.max_injection_tokens`
- `compaction.importance_threshold`
3. Integrate adaptive injector in orchestrator memory injection path.
4. Add compaction message weighting and selection algorithm.
5. Ensure fallback to existing behavior on errors/timeouts.
6. Add tests for relevance selection and weighted compaction ordering.
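The adaptive scorer from step 1 could be approximated as below. The scoring formula, the half-life decay, and the `MemorySnippet` shape are all assumptions made for illustration; the token cost reuses the rough "4 characters per token" estimate that `getContextForPrompt()` applies elsewhere in these plans.

```typescript
interface MemorySnippet {
  text: string;
  updatedAt: number; // epoch milliseconds
}

function wordSet(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((word) => word.length > 2)
  );
}

// Score = share of snippet words seen in recent turns, damped by age.
function scoreSnippet(snippet: MemorySnippet, recentTurns: string[], now: number, halfLifeMs: number): number {
  const context = wordSet(recentTurns.join(' '));
  const snippetWords = wordSet(snippet.text);
  if (snippetWords.size === 0) return 0;
  let shared = 0;
  snippetWords.forEach((word) => {
    if (context.has(word)) shared += 1;
  });
  const overlap = shared / snippetWords.size;
  const age = Math.max(0, now - snippet.updatedAt);
  const recency = Math.pow(0.5, age / halfLifeMs);
  // Recency scales the score between 50% and 100% of the overlap.
  return overlap * (0.5 + 0.5 * recency);
}

// Greedily take the best-scoring snippets that still fit the token budget.
function selectForInjection(
  snippets: MemorySnippet[],
  recentTurns: string[],
  now: number,
  maxTokens: number,
  halfLifeMs: number = 7 * 24 * 60 * 60 * 1000
): MemorySnippet[] {
  const ranked = snippets
    .map((snippet) => ({ snippet, score: scoreSnippet(snippet, recentTurns, now, halfLifeMs) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score);
  const selected: MemorySnippet[] = [];
  let usedTokens = 0;
  for (const { snippet } of ranked) {
    const cost = Math.ceil(snippet.text.length / 4); // rough token estimate
    if (usedTokens + cost > maxTokens) continue;
    selected.push(snippet);
    usedTokens += cost;
  }
  return selected;
}
```

Snippets with zero overlap are dropped outright, so a conversation unrelated to stored memory injects nothing rather than padding the prompt.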
## Validation Commands
```bash
pnpm typecheck
pnpm test:run src/memory/adaptive.test.ts
pnpm test:run src/context/weighting.test.ts
pnpm test:run src/context/compaction.test.ts
pnpm test:run src/backends/native/orchestrator.test.ts
pnpm test:run
pnpm lint
pnpm build
```
## Acceptance Criteria
- Adaptive mode injects fewer but more relevant memory snippets.
- Compaction preserves high-importance turns over low-value chatter.
- Token budgets are respected consistently.
- Existing behavior preserved when adaptive features disabled.
- No measurable latency regression beyond agreed budget.
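The compaction criterion above can be illustrated with a simple importance heuristic. The weights and regular expressions below are hypothetical placeholders for the signals named in scope (tool outcomes, corrections, preferences), and `importanceThreshold` stands in for `compaction.importance_threshold`.

```typescript
type TurnRole = 'user' | 'assistant' | 'tool';

interface Turn {
  role: TurnRole;
  text: string;
}

// Hypothetical weights; real values would be tuned against fixtures.
function importance(turn: Turn): number {
  let score = 0.1; // baseline for ordinary chatter
  if (turn.role === 'tool') score += 0.5; // tool outcomes
  if (/\b(actually|correction|instead)\b/i.test(turn.text)) score += 0.4; // corrections
  if (/\b(i prefer|always|never|from now on)\b/i.test(turn.text)) score += 0.4; // preferences
  return Math.min(1, score);
}

// Turns at or above the threshold are kept verbatim in their original order;
// the rest become candidates for summarization.
function partitionForCompaction(
  turns: Turn[],
  importanceThreshold: number
): { keep: Turn[]; summarize: Turn[] } {
  const keep: Turn[] = [];
  const summarize: Turn[] = [];
  for (const turn of turns) {
    if (importance(turn) >= importanceThreshold) {
      keep.push(turn);
    } else {
      summarize.push(turn);
    }
  }
  return { keep, summarize };
}
```

Partitioning rather than reordering keeps the surviving turns in conversation order, so the weighting only decides what gets summarized.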
## Risks
- Relevance scoring false positives/negatives.
- Mitigation: conservative scoring + extensive fixtures.
- Added latency in hot path.
- Mitigation: bounded scoring time and fallback mode.
## Commit Message
`feat(memory): add adaptive injection and weighted compaction`
@@ -0,0 +1,80 @@
# Phase 3 PR #2 Checklist: Truthfulness, Policy, and Autonomy Hardening
Created: 2026-02-12
Owner: Flynn core
Status: ready to implement
## Goal
Make truthfulness and autonomy constraints enforceable at runtime (not prompt-only), with auditable policy decisions.
## Scope
In scope:
- truthfulness guardrail injection in prompt assembly/agent startup
- autonomy-aware hook/tool execution policy
- tool-denial audit improvements
- tests for enforcement paths
Out of scope:
- external fact-check APIs
- new UI policy management features
## Files
New:
- `src/backends/native/guardrails.ts`
- `src/backends/native/guardrails.test.ts`
- `src/hooks/autonomy.ts`
- `src/hooks/autonomy.test.ts`
Modified:
- `src/backends/native/agent.ts`
- `src/backends/native/orchestrator.ts`
- `src/prompt/template.ts`
- `src/tools/executor.ts`
- `src/hooks/engine.ts`
- `src/config/schema.ts`
## Implementation Steps
1. Add guardrail utility module for truthfulness guidance modes.
2. Add config:
- `agents.truthfulness_mode` (`strict|standard|relaxed`)
- `agents.autonomy_level` (`conservative|standard|autonomous`)
3. Inject truthfulness policy section in effective system prompt.
4. Add autonomy resolution layer in hook/tool executor path.
5. Ensure denied/overridden actions are audit-logged with reason.
6. Add tests for tool action resolution matrix and guardrail prompt output.
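
A rough sketch of how steps 1 and 3 through 5 might fit together follows. Only the two mode enums come from this checklist. The guidance wording, the `ToolRisk` classification, the `AuditRecord` shape, both function names, and most cells of the matrix are assumptions; the one grounded constraint is that dangerous tools resolve to `confirm` under `conservative` and `standard`.

```typescript
// All names below are hypothetical; they illustrate the checklist, not Flynn's actual code.
type TruthfulnessMode = "strict" | "standard" | "relaxed";
type AutonomyLevel = "conservative" | "standard" | "autonomous";
type ToolRisk = "safe" | "dangerous";
type ToolAction = "allow" | "confirm" | "deny";

// Step 1: assumed guidance text per mode.
const TRUTHFULNESS_GUIDANCE: Record<TruthfulnessMode, string> = {
  strict: "State only what you can support. Say so explicitly when you are unsure.",
  standard: "Prefer verified information and flag uncertainty where it matters.",
  relaxed: "Best-effort answers are acceptable; avoid presenting guesses as facts.",
};

// Step 3: section appended to the effective system prompt.
function buildTruthfulnessSection(mode: TruthfulnessMode): string {
  return `## Truthfulness Policy (${mode})\n${TRUTHFULNESS_GUIDANCE[mode]}`;
}

// Step 4: assumed resolution matrix. Only the "dangerous" cells for
// conservative/standard follow from the acceptance criteria; the rest are guesses.
const AUTONOMY_MATRIX: Record<AutonomyLevel, Record<ToolRisk, ToolAction>> = {
  conservative: { safe: "confirm", dangerous: "confirm" },
  standard: { safe: "allow", dangerous: "confirm" },
  autonomous: { safe: "allow", dangerous: "allow" },
};

interface AuditRecord {
  tool: string;
  level: AutonomyLevel;
  action: ToolAction;
  reason: string;
}

// Step 5: every outcome other than "allow" is recorded with a reason.
function resolveToolAction(
  tool: string,
  risk: ToolRisk,
  level: AutonomyLevel,
  audit: AuditRecord[],
): ToolAction {
  const action = AUTONOMY_MATRIX[level][risk];
  if (action !== "allow") {
    audit.push({ tool, level, action, reason: `${risk} tool under ${level} autonomy resolves to ${action}` });
  }
  return action;
}
```

Keeping the matrix as a plain lookup table makes the step 6 tests straightforward: one assertion per cell, with no branching logic to cover.
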
## Validation Commands
```bash
pnpm typecheck
pnpm test:run src/backends/native/guardrails.test.ts
pnpm test:run src/hooks/autonomy.test.ts
pnpm test:run src/tools/executor.test.ts
pnpm test:run src/hooks/engine.test.ts
pnpm test:run
pnpm lint
pnpm build
```
## Acceptance Criteria
- Truthfulness policy appears in assembled system prompt.
- Autonomy level changes effective tool action behavior as configured.
- Dangerous tools still require confirmation in conservative/standard modes.
- Policy denials/overrides produce audit records with clear reasons.
- Feature is backward compatible with safe defaults.
## Risks
- Overly strict guardrails reduce task completion.
  - Mitigation: default `standard` mode with configurable strictness.
- Autonomy matrix conflicts with existing hooks.
  - Mitigation: deterministic precedence rules + tests.
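
The checklist does not say which precedence rule applies when a hook and the autonomy matrix disagree. One candidate that is deterministic and easy to test is "most restrictive wins", sketched below. The rule itself, the `Decision` shape, and the `confirm` default for an empty input are all assumptions.

```typescript
type ToolAction = "allow" | "confirm" | "deny";

// Assumed ordering: a higher number is more restrictive.
const RESTRICTIVENESS: Record<ToolAction, number> = { allow: 0, confirm: 1, deny: 2 };

interface Decision {
  source: string; // hypothetical label, e.g. "hook" or "autonomy"
  action: ToolAction;
}

function resolvePrecedence(decisions: Decision[], fallback: ToolAction = "confirm"): Decision {
  if (decisions.length === 0) {
    return { source: "default", action: fallback };
  }
  // The most restrictive action wins; on a tie the earliest decision is kept,
  // so the reported source is stable for a given input order.
  return decisions.reduce((winner, d) =>
    RESTRICTIVENESS[d.action] > RESTRICTIVENESS[winner.action] ? d : winner,
  );
}
```

Under this rule the resulting action does not depend on the order in which hooks and the autonomy layer are evaluated, which is the property the mitigation calls for.
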
## Commit Message
`feat(policy): enforce truthfulness guardrails and autonomy-aware tool controls`
@@ -0,0 +1,50 @@
# Remaining Phases Rollout Quality Gates
Created: 2026-02-12
Applies to: Phase 2 and Phase 3 implementation PRs
## Regression Safety
- All existing tests pass before/after each PR.
- No public interface break in core abstractions (`ModelClient`, `ChannelAdapter`, `Tool`).
- Existing configs load without migration failures.
- Existing command flows continue working unchanged.
## Routing Determinism
- Sender/channel/default route resolution remains deterministic.
- Intent and confidence routing produce stable decisions for the same input.
- Unknown/low-confidence inputs always fall back to normal orchestrator path.
- Routing decisions are logged for debugging.
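
These four gates can be illustrated with a small sketch. It assumes a sender-over-channel-over-default lookup order (taken from the order in which the first gate names them), a numeric confidence threshold, and an `"orchestrator"` fallback target. `RouteTable`, `resolveRoute`, `resolveIntentTarget`, and the log format are hypothetical.

```typescript
interface RouteTable {
  bySender: Record<string, string>;
  byChannel: Record<string, string>;
  defaultRoute: string;
}

interface RouteDecision {
  target: string;
  matchedBy: "sender" | "channel" | "default";
}

// Own-property lookup so keys such as "constructor" never match by accident.
function lookup(table: Record<string, string>, key: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Fixed lookup order: sender, then channel, then default.
function resolveRoute(routes: RouteTable, sender: string, channel: string): RouteDecision {
  const bySender = lookup(routes.bySender, sender);
  if (bySender !== undefined) return { target: bySender, matchedBy: "sender" };
  const byChannel = lookup(routes.byChannel, channel);
  if (byChannel !== undefined) return { target: byChannel, matchedBy: "channel" };
  return { target: routes.defaultRoute, matchedBy: "default" };
}

interface IntentGuess {
  intent: string;
  confidence: number; // 0..1
}

const ORCHESTRATOR = "orchestrator"; // assumed name for the normal path

// Unknown or low-confidence intents always fall back to the orchestrator,
// and every decision is passed to the logger.
function resolveIntentTarget(
  guess: IntentGuess | null,
  knownIntents: string[],
  minConfidence: number,
  log: (line: string) => void,
): string {
  if (guess !== null && guess.confidence >= minConfidence && knownIntents.indexOf(guess.intent) !== -1) {
    log(`intent=${guess.intent} confidence=${guess.confidence} -> ${guess.intent}`);
    return guess.intent;
  }
  log(`intent=${guess === null ? "none" : guess.intent} -> ${ORCHESTRATOR} (fallback)`);
  return ORCHESTRATOR;
}
```

Both functions are pure apart from the injected logger, so the same input always yields the same decision and the same log line.
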
## Memory Correctness
- Legacy and category memory reads/writes remain compatible.
- Search results respect namespace/category filters.
- Prompt memory injection remains token-bounded.
- Compaction preserves high-value turns under weighted selection.
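
The compaction gate can be pictured with a minimal sketch of weighted selection. It assumes each turn already carries an `importance` weight and a token count, and it uses a simple greedy pass as a stand-in for whatever selection the implementation actually performs. The `Turn` shape and `compactTurns` name are hypothetical.

```typescript
// Hypothetical turn shape; the importance weight is assumed to be computed elsewhere.
interface Turn {
  index: number;      // position in the conversation, unique per turn
  text: string;
  tokens: number;
  importance: number; // higher means more worth keeping
}

function compactTurns(turns: Turn[], tokenBudget: number): Turn[] {
  // Rank by importance; on ties prefer the newer turn.
  const ranked = [...turns].sort((a, b) => b.importance - a.importance || b.index - a.index);

  const kept: Record<number, boolean> = {};
  let used = 0;
  for (const t of ranked) {
    if (used + t.tokens <= tokenBudget) {
      kept[t.index] = true;
      used += t.tokens;
    }
  }
  // Return survivors in their original chronological order.
  return turns.filter((t) => kept[t.index] === true);
}
```

A fixture with a few high-importance turns surrounded by low-value chatter, compacted to a tight budget, is enough to assert that the high-importance turns survive.
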
## Latency Budgets
- New routing checks add negligible overhead for non-command messages.
- Memory relevance scoring remains bounded (time + token budgets).
- History search defaults are capped (`limit`, thresholds) to avoid spikes.
## Policy Enforcement Verifiability
- Tool allow/deny and autonomy decisions are test-covered.
- Denied/overridden actions are audit-logged with explicit reason.
- Confirmation behavior is deterministic by mode.
## Pre-Merge Checks Per PR
```bash
pnpm typecheck
pnpm test:run
pnpm lint
pnpm build
```
- Add targeted test commands from each PR checklist.
- Update `docs/plans/state.json` with progress and test counts when implementation lands.
- Keep commits atomic and scoped to one checklist PR at a time.