docs(design): Pi-inspired personal assistant memory design

Two-tier memory model (working memory + long-term store) with a unified user namespace across all channels. Addresses four gaps: cross-session forgetting, compaction context loss, no proactive recall, and channel fragmentation.

Key design decisions:
- user/working namespace written on every compaction (TTL-based expiry)
- user/profile + user/patterns as shared identity across channels
- Session-start injection before first turn (one-time, idempotent)
- Opt-in via memory.user_namespace config; default is unchanged behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 12:23:49 -08:00
parent b5b4cb0a84
commit cc70c3e524
1 changed files with 262 additions and 0 deletions
@@ -0,0 +1,262 @@
+# Pi-Inspired Personal Assistant Memory Design
+
+Date: 2026-02-25
+Status: approved
+Inspired by: [badlogic/pi-mono](https://github.com/badlogic/pi-mono)
+Scope: Flynn-native implementation — no dependency on pi-agent-core
+
+## Problem
+
+Flynn's memory model has four concrete gaps that make it feel like a generic chatbot rather than a personal assistant:
+
+1. **Forgets across sessions** — memory extraction runs but is unreliable; facts don't survive compaction consistently.
+2. **Clunky compaction** — compaction summaries are generic and discarded after context trimming; important personal context is lost.
+3. **No proactive recall** — `buildAdaptiveMemoryContext` exists, but it scores only by keyword overlap with the current message and never surfaces context unprompted.
+4. **Fragmented across channels** — Telegram, Discord, and the gateway each have isolated sessions with no shared sense of "you."
+
+## Design Goals
+
+- Pick up where the last conversation left off, across any channel
+- Never lose recent context to compaction
+- Stable facts (preferences, patterns) persist indefinitely
+- All behavior gated behind config; default is current behavior (opt-in)
+
+## Non-Goals (this phase)
+
+- Multi-user deployments with per-user auth
+- Proactive mid-session memory surfacing (beyond session start)
+- Full vector/semantic replacement of adaptive injection
+
+---
+
+## Architecture
+
+Two-tier memory structure added to the orchestrator:
+
+```
+Long-term store (existing)          Working memory (new)
+  memory/user/profile      ←→         memory/user/working
+  memory/user/patterns                  (TTL: ~14 days)
+  memory/sessions/...                   (replaced per compaction)
+        ↓                                      ↓
+   injected via                         injected wholesale
+   adaptive scoring                     at session start
+   (keyword/vector match)               (always present if fresh)
+```
+
+**Long-term store** — existing `MemoryStore` namespaces, unchanged. Stable facts extracted from conversations, searched adaptively per-turn.
+
+**Working memory** — a new `user/working` namespace written on every compaction. Acts as a "what's been happening lately" snapshot. Injected in full at session start. Expires `working_memory_ttl_days` days after the last write (default 14).
+
+**Unified user namespace** — a canonical `user/*` tree shared across all channels, replacing today's session-scoped isolation.
+
+---
+
+## Section 1: Unified User Namespace
+
+### Namespace Layout
+
+```
+memory/
+  user/
+    profile      ← stable facts: name, timezone, role, preferences
+    patterns     ← recurring behaviors: working style, recurring topics
+    working      ← rolling compaction summary (TTL-based)
+  sessions/
+    telegram:123/...    ← session-specific (unchanged, existing behavior)
+    ws:abc/...
+```
+
+### Identity Model
+
+A single `memory.user_namespace` config key (default: unset) ties all channels together. All channels on the Flynn instance with this config treat memory as belonging to one person. Unset = current session-scoped behavior, unchanged.
+
+This is appropriate for personal assistant deployments (one person, many surfaces). Multi-user is out of scope.
+
+### Config
+
+```yaml
+memory:
+  user_namespace: "user"   # enables shared identity; absent = session-scoped (current)
+```
+
+### Extraction Routing
+
+When `user_namespace` is set:
+- Stable extracted facts → `user/profile`, `user/patterns`
+- Compaction summary → `user/working`
+- Session-specific context → `sessions/<id>/...` (unchanged)
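
The routing rules above can be sketched as a single pure function. This is only an illustration: `routeExtraction`, `ExtractionKind`, and the split between facts (`profile`) and behaviors (`patterns`) are hypothetical names and assumptions, and the unset-config branch assumes everything stays session-scoped while the compaction summary is not persisted.

```typescript
// Hypothetical item kinds; the design names the target namespaces, not these labels.
type ExtractionKind = "stable_fact" | "pattern" | "compaction_summary" | "session_context";

interface MemoryRoutingConfig {
  user_namespace?: string; // unset = session-scoped behavior (current)
}

// Returns the namespace an extracted item should be written to,
// or null when the item is not persisted at all.
function routeExtraction(
  kind: ExtractionKind,
  sessionId: string,
  config: MemoryRoutingConfig,
): string | null {
  const sessionNs = `sessions/${sessionId}`;
  if (kind === "session_context") return sessionNs; // unchanged in both modes

  const ns = config.user_namespace;
  if (!ns) {
    // Assumption: without user_namespace the summary is dropped and
    // extracted facts stay in the session's own namespace.
    return kind === "compaction_summary" ? null : sessionNs;
  }

  const leaf = {
    stable_fact: "profile",
    pattern: "patterns",
    compaction_summary: "working",
  }[kind];
  return `${ns}/${leaf}`;
}
```

Keeping the routing decision in one function makes the opt-in behavior easy to test in isolation from the store.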
+
+---
+
+## Section 2: Working Memory Layer
+
+### Storage
+
+`user/working` namespace in the existing `MemoryStore` (flat file, no new storage engine). File format:
+
+```
+# Working Memory
+Updated: 2026-02-25T11:30:00Z
+Expires: 2026-03-11T11:30:00Z
+
+[compaction summary content]
+```
+
+### Lifecycle
+
+| Event | Action |
+|---|---|
+| Compaction runs | Write summary to `user/working`, replacing previous content |
+| Session starts | Read `user/working`; inject if `Expires` is in the future |
+| `Expires` in the past | File is ignored; overwritten on next compaction |
+| Memory store not configured | Entire feature is a no-op |
+
+No background cleanup job required — expiry is checked lazily on read.
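
A minimal sketch of the lazy expiry check, assuming the header layout from the Storage section. The function names (`formatWorkingMemory`, `parseWorkingMemory`, `readIfFresh`) are hypothetical and not part of the existing `MemoryStore` API; file I/O is left out so the logic stays self-contained.

```typescript
interface WorkingMemory {
  updated: Date;
  expires: Date;
  body: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// ISO-8601 UTC timestamp without milliseconds, e.g. 2026-02-25T11:30:00Z.
function toIsoSeconds(d: Date): string {
  return d.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Serializes a summary with Updated/Expires headers.
function formatWorkingMemory(summary: string, now: Date, ttlDays: number): string {
  const expires = new Date(now.getTime() + ttlDays * DAY_MS);
  return [
    "# Working Memory",
    `Updated: ${toIsoSeconds(now)}`,
    `Expires: ${toIsoSeconds(expires)}`,
    "",
    summary,
  ].join("\n");
}

// Parses the headers; returns null if the file is malformed.
function parseWorkingMemory(text: string): WorkingMemory | null {
  const updatedMatch = /^Updated: (\S+)$/m.exec(text);
  const expiresMatch = /^Expires: (\S+)$/m.exec(text);
  if (!updatedMatch || !expiresMatch) return null;
  const updated = new Date(updatedMatch[1] ?? "");
  const expires = new Date(expiresMatch[1] ?? "");
  if (Number.isNaN(updated.getTime()) || Number.isNaN(expires.getTime())) return null;
  const sep = text.indexOf("\n\n"); // first blank line ends the header
  const body = sep === -1 ? "" : text.slice(sep + 2).trim();
  return { updated, expires, body };
}

// Lazy expiry: a missing, malformed, or expired file yields null.
function readIfFresh(text: string | null, now: Date): string | null {
  if (text === null) return null;
  const parsed = parseWorkingMemory(text);
  if (!parsed || parsed.expires.getTime() <= now.getTime()) return null;
  return parsed.body;
}
```

Because an expired file is simply ignored on read and overwritten by the next compaction, no delete path is needed in this sketch.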
+
+### Size Budget
+
+Capped at `working_memory_max_tokens` (default 1000 tokens). If the compaction summary exceeds the budget, it is truncated before writing, which keeps injection overhead predictable.
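
One way the truncation could look, as a rough sketch. The design does not say how tokens are counted, so this assumes a crude estimate of about four characters per token; `estimateTokens` and `truncateToTokenBudget` are hypothetical helpers, and a real implementation would likely use the model's tokenizer.

```typescript
// Assumption: ~4 characters per token. This is a heuristic, not a tokenizer.
const APPROX_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

// Truncates a summary to fit the budget, preferring to cut at a line break
// so the stored text does not end mid-line.
function truncateToTokenBudget(summary: string, maxTokens: number): string {
  if (estimateTokens(summary) <= maxTokens) return summary;
  const cut = summary.slice(0, maxTokens * APPROX_CHARS_PER_TOKEN);
  const lastBreak = cut.lastIndexOf("\n");
  return (lastBreak > 0 ? cut.slice(0, lastBreak) : cut).trimEnd();
}
```

Truncating before the write (rather than at injection time) means the stored file always fits the budget, whichever channel reads it.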
+
+### Config
+
+```yaml
+memory:
+  working_memory_ttl_days: 14      # expiry window; default 14
+  working_memory_max_tokens: 1000  # injection size cap; default 1000
+```
+
+---
+
+## Section 3: Compaction → Working Memory Flow
+
+### Current Flow
+
+```
+history exceeds threshold
+  → compactHistory() produces summary string
+  → summary replaces trimmed messages in session history
+  → summary string discarded
+```
+
+### New Flow
+
+```
+history exceeds threshold
+  → compactHistory() produces summary string
+  → summary replaces trimmed messages in session history
+  → summary written to user/working (replaces previous)
+  → memory extraction writes facts to user/profile + user/patterns
+```
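
The new flow might be wired roughly as follows. This is a sketch under assumptions: `InMemoryStore` stands in for the real `MemoryStore`, the threshold is a simple message count, and `compactAndPersist` and its `summarize` callback are hypothetical names (the real `compactHistory()` signature is not shown in this document). The extraction step is omitted.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// In-memory stand-in for the real MemoryStore (assumption for illustration).
class InMemoryStore {
  private files = new Map<string, string>();
  write(namespace: string, content: string): void {
    this.files.set(namespace, content); // replaces previous content
  }
  read(namespace: string): string | null {
    return this.files.get(namespace) ?? null;
  }
}

// Compacts history and, when user_namespace is set, also persists the summary.
function compactAndPersist(
  history: Message[],
  keepLast: number,
  summarize: (trimmed: Message[]) => string,
  store: InMemoryStore | null,
  userNamespace?: string,
): Message[] {
  if (history.length <= keepLast) return history; // below threshold: no-op
  const trimmed = history.slice(0, history.length - keepLast);
  const kept = history.slice(history.length - keepLast);
  const summary = summarize(trimmed);
  if (store && userNamespace) {
    store.write(`${userNamespace}/working`, summary);
  }
  // The summary replaces the trimmed messages in session history.
  return [{ role: "system", content: summary }, ...kept];
}
```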
+
+### Compaction Prompt Change
+
+Today's compaction uses a generic summarization prompt. Add a personal-assistant-focused variant that explicitly captures:
+
+- What the user was working on and current status
+- Decisions made and their outcomes
+- Preferences or constraints the user expressed
+- Open threads and follow-up items
+
+This makes `user/working` genuinely useful as a "picking up where we left off" snapshot rather than a generic recap.
+
+The improved prompt is only used when `user_namespace` is set. Existing generic compaction is unchanged otherwise.
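
A sketch of how the prompt selection could be gated. The prompt wording here is placeholder text, not Flynn's actual prompts, and `buildCompactionPrompt` is a hypothetical name; only the four capture points come from the list above.

```typescript
// Placeholder for the existing generic prompt (actual wording not shown in this doc).
const GENERIC_COMPACTION_PROMPT = "Summarize the conversation so far.";

// The four capture points from the design.
const PERSONAL_FOCUS_POINTS = [
  "What the user was working on and current status",
  "Decisions made and their outcomes",
  "Preferences or constraints the user expressed",
  "Open threads and follow-up items",
];

// Returns the personal-assistant variant only when user_namespace is set.
function buildCompactionPrompt(userNamespace?: string): string {
  if (!userNamespace) return GENERIC_COMPACTION_PROMPT;
  return [
    "Summarize the conversation so it can be resumed later. Capture:",
    ...PERSONAL_FOCUS_POINTS.map((point) => `- ${point}`),
  ].join("\n");
}
```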
+
+---
+
+## Section 4: Session Start Injection
+
+### Injection Point
+
+A new one-time `_injectSessionContext()` call in the orchestrator, triggered before the first user message of a new session. Separate from the existing per-turn `_injectMemoryContext()`.
+
+### Injection Order in System Prompt
+
+```
+[base system prompt — SOUL.md / IDENTITY.md / etc.]
+
+--- Who you're talking to ---
+[user/profile content]          ← always injected if present
+
+--- Recent context ---
+[user/working content]          ← injected if not expired
+
+[adaptive per-turn memory injection — unchanged, runs every turn]
+```
+
+### Idempotency
+
+Session-start injection is tracked by a boolean flag on the orchestrator instance. Reconnects to the same session ID do not re-inject.
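
The once-only guard can be illustrated with a small class. `SessionContextInjector` and `injectOnce` are hypothetical names; in the design the flag lives on the orchestrator instance itself, and this sketch assumes one such instance per session.

```typescript
// Wraps a context loader so it runs at most once per instance.
class SessionContextInjector {
  private injected = false;
  private readonly loadContext: () => string | null;

  constructor(loadContext: () => string | null) {
    this.loadContext = loadContext;
  }

  // Returns the session context on the first call and null on every later
  // call (for example after a reconnect to the same session).
  injectOnce(): string | null {
    if (this.injected) return null;
    this.injected = true;
    return this.loadContext();
  }
}
```

Setting the flag before calling the loader means a loader that returns nothing is still not retried, which matches a one-time injection.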
+
+### Graceful Degradation
+
+| Condition | Behavior |
+|---|---|
+| No `user/profile` file | Skip block silently |
+| `user/working` expired | Skip block, log at debug level |
+| Memory store not configured | Entire feature no-ops |
+| `user_namespace` not set | Current behavior, unchanged |
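
The injection order and the degradation rules in the table can be combined in one assembly step. A minimal sketch, assuming the profile and working memory have already been read; `buildSessionContext` and `SessionContextInputs` are hypothetical names, and the logger is a plain callback rather than Flynn's actual logging API.

```typescript
interface SessionContextInputs {
  profile: string | null; // user/profile content, null if the file is absent
  working: { body: string; expires: Date } | null; // user/working, null if absent
}

// Builds the session-start prompt, skipping blocks that are missing or expired.
function buildSessionContext(
  basePrompt: string,
  inputs: SessionContextInputs,
  now: Date,
  logDebug: (msg: string) => void = () => {},
): string {
  const sections = [basePrompt];
  if (inputs.profile) {
    sections.push(`--- Who you're talking to ---\n${inputs.profile}`);
  }
  if (inputs.working) {
    if (inputs.working.expires.getTime() > now.getTime()) {
      sections.push(`--- Recent context ---\n${inputs.working.body}`);
    } else {
      logDebug("user/working expired; skipping recent context");
    }
  }
  return sections.join("\n\n");
}
```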
+
+### Optional Proactive Greeting
+
+When `proactive_session_greeting: true`, include a system instruction on the first turn:
+
+> "If relevant, briefly acknowledge what the user was last working on before responding to their first message."
+
+Off by default. When enabled, it gives the Pi-like "picking up the thread" feel.
+
+```yaml
+memory:
+  proactive_session_greeting: false   # default off
+```
+
+---
+
+## Section 5: Cross-Channel Identity
+
+No per-channel plumbing needed. All channels share the orchestrator config. When `user_namespace` is set, every channel reads/writes `user/*` automatically.
+
+**First message on a new channel** — if the user switches from the web UI to Telegram, the `user/working` content written by web UI sessions is already present, and the new Telegram session injects it at session start, before its first turn. This is the intended behavior.
+
+---
+
+## File-Level Change Summary
+
+| File | Change |
+|---|---|
+| `src/memory/workingMemory.ts` | **New** — read/write/expiry logic for `user/working` |
+| `src/memory/store.ts` | Add `writeWithMetadata()` supporting timestamped/expiry headers |
+| `src/context/compaction.ts` | Add personal-assistant-focused compaction prompt option |
+| `src/backends/native/orchestrator.ts` | Session-start injection + write working memory after compaction |
+| `src/config/schema.ts` | New fields: `user_namespace`, `working_memory_ttl_days`, `working_memory_max_tokens`, `proactive_session_greeting` |
+| `src/daemon/index.ts` | Pass user namespace config through to orchestrator |
+
+---
+
+## Config Reference (full)
+
+```yaml
+memory:
+  # Shared identity namespace. When set, all channels share user/* memory.
+  # Absent (default) = current session-scoped behavior, unchanged.
+  user_namespace: "user"
+
+  # How long working memory stays valid after the last compaction.
+  working_memory_ttl_days: 14
+
+  # Token budget for working memory injection at session start.
+  working_memory_max_tokens: 1000
+
+  # If true, instruct the model to acknowledge prior context on session start.
+  proactive_session_greeting: false
+```
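
For illustration, the defaults above could be resolved like this. The field names match the YAML keys, but `resolveMemoryConfig`, `ResolvedMemoryConfig`, and the validation rule (positive numbers only) are assumptions, not the actual contents of `src/config/schema.ts`.

```typescript
interface MemoryConfigInput {
  user_namespace?: string;
  working_memory_ttl_days?: number;
  working_memory_max_tokens?: number;
  proactive_session_greeting?: boolean;
}

interface ResolvedMemoryConfig {
  userNamespace: string | null; // null = session-scoped behavior (current)
  workingMemoryTtlDays: number;
  workingMemoryMaxTokens: number;
  proactiveSessionGreeting: boolean;
}

// Applies the documented defaults and rejects non-positive numeric values.
function resolveMemoryConfig(raw: MemoryConfigInput = {}): ResolvedMemoryConfig {
  const ttlDays = raw.working_memory_ttl_days ?? 14;
  const maxTokens = raw.working_memory_max_tokens ?? 1000;
  if (!(ttlDays > 0)) throw new Error("working_memory_ttl_days must be positive");
  if (!(maxTokens > 0)) throw new Error("working_memory_max_tokens must be positive");
  return {
    userNamespace: raw.user_namespace?.trim() || null,
    workingMemoryTtlDays: ttlDays,
    workingMemoryMaxTokens: maxTokens,
    proactiveSessionGreeting: raw.proactive_session_greeting ?? false,
  };
}
```

With an empty input the result has `userNamespace: null`, which corresponds to the unchanged default behavior.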
+
+---
+
+## Success Criteria
+
+1. Working memory survives a daemon restart and is injected on next session start.
+2. Switching channels (e.g. Telegram → web UI) injects the same `user/working` content.
+3. `user_namespace` absent = zero behavior change vs today (regression-safe).
+4. Compaction with `user_namespace` set writes to `user/working` on every run.
+5. Expired working memory is silently ignored without error.