From cc70c3e5248e3e1ec8f50c445b2505533ab3bff4 Mon Sep 17 00:00:00 2001
From: William Valentin
Date: Wed, 25 Feb 2026 12:23:49 -0800
Subject: [PATCH] docs(design): Pi-inspired personal assistant memory design

Two-tier memory model (working memory + long-term store) with a unified user namespace across all channels. Addresses four gaps: cross-session forgetting, compaction context loss, no proactive recall, and channel fragmentation.

Key design decisions:
- user/working namespace written on every compaction (TTL-based expiry)
- user/profile + user/patterns as shared identity across channels
- Session-start injection before first turn (one-time, idempotent)
- Opt-in via memory.user_namespace config; default is unchanged behavior

Co-Authored-By: Claude Sonnet 4.6
---
 ...-25-pi-personal-assistant-memory-design.md | 262 ++++++++++++++++++
 1 file changed, 262 insertions(+)
 create mode 100644 docs/plans/2026-02-25-pi-personal-assistant-memory-design.md

diff --git a/docs/plans/2026-02-25-pi-personal-assistant-memory-design.md b/docs/plans/2026-02-25-pi-personal-assistant-memory-design.md
new file mode 100644
index 0000000..8ac26d9
--- /dev/null
+++ b/docs/plans/2026-02-25-pi-personal-assistant-memory-design.md
@@ -0,0 +1,262 @@
+# Pi-Inspired Personal Assistant Memory Design
+
+Date: 2026-02-25
+Status: approved
+Inspired by: [badlogic/pi-mono](https://github.com/badlogic/pi-mono)
+Scope: Flynn-native implementation — no dependency on pi-agent-core
+
+## Problem
+
+Flynn's memory model has four concrete gaps that make it feel like a generic chatbot rather than a personal assistant:
+
+1. **Forgets across sessions** — memory extraction runs but is unreliable; facts don't survive compaction consistently.
+2. **Clunky compaction** — compaction summaries are generic and discarded after context trimming; important personal context is lost.
+3. **No proactive recall** — `buildAdaptiveMemoryContext` exists but only scores by keyword overlap with the current message and never surfaces context unprompted.
+4. **Fragmented across channels** — Telegram, Discord, and the gateway each have isolated sessions with no shared sense of "you."
+
+## Design Goals
+
+- Pick up where the last conversation left off, across any channel
+- Never lose recent context to compaction
+- Stable facts (preferences, patterns) persist indefinitely
+- All behavior gated behind config; default is current behavior (opt-in)
+
+## Non-Goals (this phase)
+
+- Multi-user deployments with per-user auth
+- Proactive mid-session memory surfacing (beyond session start)
+- Full vector/semantic replacement of adaptive injection
+
+---
+
+## Architecture
+
+Two-tier memory structure added to the orchestrator:
+
+```
+Long-term store (existing)        Working memory (new)
+  memory/user/profile       ←→    memory/user/working
+  memory/user/patterns            (TTL: ~14 days)
+  memory/sessions/...             (replaced per compaction)
+         ↓                               ↓
+  injected via                    injected wholesale
+  adaptive scoring                at session start
+  (keyword/vector match)          (always present if fresh)
+```
+
+**Long-term store** — existing `MemoryStore` namespaces, unchanged. Stable facts extracted from conversations, searched adaptively per-turn.
+
+**Working memory** — a new `user/working` namespace written on every compaction. Acts as a "what's been happening lately" snapshot. Injected in full at session start. Expires after N days (default 14).
+
+**Unified user namespace** — a canonical `user/*` tree shared across all channels, replacing today's session-scoped isolation.
+
+---
+
+## Section 1: Unified User Namespace
+
+### Namespace Layout
+
+```
+memory/
+  user/
+    profile    ← stable facts: name, timezone, role, preferences
+    patterns   ← recurring behaviors: working style, recurring topics
+    working    ← rolling compaction summary (TTL-based)
+  sessions/
+    telegram:123/...   ← session-specific (unchanged, existing behavior)
+    ws:abc/...
+```
+
+### Identity Model
+
+A single `memory.user_namespace` config key (default: unset) ties all channels together. All channels on the Flynn instance with this config treat memory as belonging to one person. Unset = current session-scoped behavior, unchanged.
+
+This is appropriate for personal assistant deployments (one person, many surfaces). Multi-user is out of scope.
+
+### Config
+
+```yaml
+memory:
+  user_namespace: "user"   # enables shared identity; absent = session-scoped (current)
+```
+
+### Extraction Routing
+
+When `user_namespace` is set:
+- Stable extracted facts → `user/profile`, `user/patterns`
+- Compaction summary → `user/working`
+- Session-specific context → `sessions//...` (unchanged)
+
+---
+
+## Section 2: Working Memory Layer
+
+### Storage
+
+`user/working` namespace in the existing `MemoryStore` (flat file, no new storage engine). File format:
+
+```
+# Working Memory
+Updated: 2026-02-25T11:30:00Z
+Expires: 2026-03-11T11:30:00Z
+
+[compaction summary content]
+```
+
+### Lifecycle
+
+| Event | Action |
+|---|---|
+| Compaction runs | Write summary to `user/working`, replacing previous content |
+| Session starts | Read `user/working`; inject if `Expires` is in the future |
+| `Expires` in the past | File is ignored; overwritten on next compaction |
+| Memory store not configured | Entire feature is a no-op |
+
+No background cleanup job required — expiry is checked lazily on read.
+
+### Size Budget
+
+Capped at `working_memory_max_tokens` (default 1000 tokens). If the compaction summary exceeds the budget it is truncated before writing. Keeps injection overhead predictable.
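
The storage format, lazy expiry check, and size budget described above could be sketched roughly as follows. This is a minimal illustration, not the planned contents of `src/memory/workingMemory.ts`: the function names `renderWorkingMemory` and `readWorkingMemory`, the options interface, and the four-characters-per-token estimate are all assumptions made for the example.

```typescript
// Hypothetical sketch: names and the chars-per-token heuristic are assumptions,
// not Flynn's actual API.
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const APPROX_CHARS_PER_TOKEN = 4; // rough estimate; a real tokenizer would differ

interface WorkingMemoryOptions {
  ttlDays: number;   // corresponds to working_memory_ttl_days
  maxTokens: number; // corresponds to working_memory_max_tokens
}

// Format a Date as ISO 8601 without milliseconds, matching the header example.
function isoSeconds(d: Date): string {
  return d.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Serialize a compaction summary with Updated/Expires headers,
// truncating the body to the (approximate) token budget first.
function renderWorkingMemory(summary: string, now: Date, opts: WorkingMemoryOptions): string {
  const maxChars = opts.maxTokens * APPROX_CHARS_PER_TOKEN;
  const body = summary.length > maxChars ? summary.slice(0, maxChars) : summary;
  const expires = new Date(now.getTime() + opts.ttlDays * MS_PER_DAY);
  return [
    "# Working Memory",
    `Updated: ${isoSeconds(now)}`,
    `Expires: ${isoSeconds(expires)}`,
    "",
    body,
  ].join("\n");
}

// Lazy expiry check on read: return the body if still fresh, otherwise null.
function readWorkingMemory(file: string, now: Date): string | null {
  const match = /^Expires:\s*(\S+)\s*$/m.exec(file);
  if (!match) return null; // missing header: treat the file as unusable
  const expires = new Date(match[1] ?? "");
  if (Number.isNaN(expires.getTime()) || expires.getTime() <= now.getTime()) return null;
  const headerEnd = file.indexOf("\n\n"); // the blank line separates header from body
  return headerEnd === -1 ? "" : file.slice(headerEnd + 2);
}
```

Because expiry is evaluated only inside the read path, a stale file costs nothing until the next session start, and the next compaction simply overwrites it.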
+
+### Config
+
+```yaml
+memory:
+  working_memory_ttl_days: 14        # expiry window; default 14
+  working_memory_max_tokens: 1000    # injection size cap; default 1000
+```
+
+---
+
+## Section 3: Compaction → Working Memory Flow
+
+### Current Flow
+
+```
+history exceeds threshold
+  → compactHistory() produces summary string
+  → summary replaces trimmed messages in session history
+  → summary string discarded
+```
+
+### New Flow
+
+```
+history exceeds threshold
+  → compactHistory() produces summary string
+  → summary replaces trimmed messages in session history
+  → summary written to user/working (replaces previous)
+  → memory extraction writes facts to user/profile + user/patterns
+```
+
+### Compaction Prompt Change
+
+Today's compaction uses a generic summarization prompt. Add a personal-assistant-focused variant that explicitly captures:
+
+- What the user was working on and current status
+- Decisions made and their outcomes
+- Preferences or constraints the user expressed
+- Open threads and follow-up items
+
+This makes `user/working` genuinely useful as a "picking up where we left off" snapshot rather than a generic recap.
+
+The improved prompt is only used when `user_namespace` is set. Existing generic compaction is unchanged otherwise.
+
+---
+
+## Section 4: Session Start Injection
+
+### Injection Point
+
+A new one-time `_injectSessionContext()` call in the orchestrator, triggered before the first user message of a new session. Separate from the existing per-turn `_injectMemoryContext()`.
+
+### Injection Order in System Prompt
+
+```
+[base system prompt — SOUL.md / IDENTITY.md / etc.]
+
+--- Who you're talking to ---
+[user/profile content]        ← always injected if present
+
+--- Recent context ---
+[user/working content]        ← injected if not expired
+
+[adaptive per-turn memory injection — unchanged, runs every turn]
+```
+
+### Idempotency
+
+Session-start injection is tracked by a boolean flag on the orchestrator instance. Reconnects to the same session ID do not re-inject.
+
+### Graceful Degradation
+
+| Condition | Behavior |
+|---|---|
+| No `user/profile` file | Skip block silently |
+| `user/working` expired | Skip block, log at debug level |
+| Memory store not configured | Entire feature no-ops |
+| `user_namespace` not set | Current behavior, unchanged |
+
+### Optional Proactive Greeting
+
+When `proactive_session_greeting: true`, include a system instruction on the first turn:
+
+> "If relevant, briefly acknowledge what the user was last working on before responding to their first message."
+
+Off by default. Gives the Pi-like "picking up the thread" feel when enabled.
+
+```yaml
+memory:
+  proactive_session_greeting: false   # default off
+```
+
+---
+
+## Section 5: Cross-Channel Identity
+
+No per-channel plumbing needed. All channels share the orchestrator config. When `user_namespace` is set, every channel reads/writes `user/*` automatically.
+
+**First message on a new channel** — if the user switches from web UI to Telegram, `user/working` from web UI sessions is already present. The Telegram session injects it on first turn. This is the intended behavior.
+
+---
+
+## File-Level Change Summary
+
+| File | Change |
+|---|---|
+| `src/memory/workingMemory.ts` | **New** — read/write/expiry logic for `user/working` |
+| `src/memory/store.ts` | Add `writeWithMetadata()` supporting timestamped/expiry headers |
+| `src/context/compaction.ts` | Add personal-assistant-focused compaction prompt option |
+| `src/backends/native/orchestrator.ts` | Session-start injection + write working memory after compaction |
+| `src/config/schema.ts` | New fields: `user_namespace`, `working_memory_ttl_days`, `working_memory_max_tokens`, `proactive_session_greeting` |
+| `src/daemon/index.ts` | Pass user namespace config through to orchestrator |
+
+---
+
+## Config Reference (full)
+
+```yaml
+memory:
+  # Shared identity namespace. When set, all channels share user/* memory.
+  # Absent (default) = current session-scoped behavior, unchanged.
+  user_namespace: "user"
+
+  # How long working memory stays valid after the last compaction.
+  working_memory_ttl_days: 14
+
+  # Token budget for working memory injection at session start.
+  working_memory_max_tokens: 1000
+
+  # If true, instruct the model to acknowledge prior context on session start.
+  proactive_session_greeting: false
+```
+
+---
+
+## Success Criteria
+
+1. Working memory survives a daemon restart and is injected on next session start.
+2. Switching channels (e.g. Telegram → web UI) injects the same `user/working` content.
+3. `user_namespace` absent = zero behavior change vs today (regression-safe).
+4. Compaction with `user_namespace` set writes to `user/working` on every run.
+5. Expired working memory is silently ignored without error.
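
As a rough illustration of the session-start injection from Section 4 (block order, silent skipping of missing blocks, and the one-time flag), one possible shape is sketched here. The class `SessionContextInjector`, its method `takeSessionContext`, and the `SessionMemory` interface are hypothetical names invented for this example; the design only specifies an `_injectSessionContext()` call on the orchestrator.

```typescript
// Hypothetical sketch; class, method, and interface names are illustrative
// assumptions, not the orchestrator's real API.
interface SessionMemory {
  profile: string | null; // contents of user/profile, or null if the file is absent
  working: string | null; // contents of user/working, or null if absent or expired
}

class SessionContextInjector {
  private injected = false; // idempotency flag: context is produced at most once

  // Returns the blocks to append to the base system prompt on the first call,
  // and an empty string on every later call (e.g. after a reconnect).
  takeSessionContext(memory: SessionMemory): string {
    if (this.injected) return "";
    this.injected = true;
    const blocks: string[] = [];
    if (memory.profile) blocks.push(`--- Who you're talking to ---\n${memory.profile}`);
    if (memory.working) blocks.push(`--- Recent context ---\n${memory.working}`);
    return blocks.join("\n\n"); // missing blocks are skipped silently
  }
}
```

Keeping the flag on the instance mirrors the idempotency rule in Section 4: a second call for the same session yields nothing, so the per-turn adaptive injection remains the only thing that changes between turns.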