flynn/docs/plans/2026-02-25-pi-personal-assistant-memory-design.md
William Valentin cc70c3e524 docs(design): Pi-inspired personal assistant memory design
Two-tier memory model (working memory + long-term store) with a unified
user namespace across all channels. Addresses four gaps: cross-session
forgetting, compaction context loss, no proactive recall, and channel
fragmentation.

Key design decisions:
- user/working namespace written on every compaction (TTL-based expiry)
- user/profile + user/patterns as shared identity across channels
- Session-start injection before first turn (one-time, idempotent)
- Opt-in via memory.user_namespace config; default is unchanged behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 12:23:49 -08:00


Pi-Inspired Personal Assistant Memory Design

Date: 2026-02-25
Status: approved
Inspired by: badlogic/pi-mono
Scope: Flynn-native implementation; no dependency on pi-agent-core

Problem

Flynn's memory model has four concrete gaps that make it feel like a generic chatbot rather than a personal assistant:

  1. Forgets across sessions: memory extraction runs but is unreliable; facts don't survive compaction consistently.
  2. Clunky compaction: compaction summaries are generic and discarded after context trimming; important personal context is lost.
  3. No proactive recall: buildAdaptiveMemoryContext exists but scores only by keyword overlap with the current message and never surfaces context unprompted.
  4. Fragmented across channels: Telegram, Discord, and the gateway each have isolated sessions with no shared sense of "you."

Design Goals

  • Pick up where the last conversation left off, across any channel
  • Never lose recent context to compaction
  • Stable facts (preferences, patterns) persist indefinitely
  • All behavior gated behind config; default is current behavior (opt-in)

Non-Goals (this phase)

  • Multi-user deployments with per-user auth
  • Proactive mid-session memory surfacing (beyond session start)
  • Full vector/semantic replacement of adaptive injection

Architecture

Two-tier memory structure added to the orchestrator:

Long-term store (existing)          Working memory (new)
  memory/user/profile      ←→         memory/user/working
  memory/user/patterns                  (TTL: ~14 days)
  memory/sessions/...                   (replaced per compaction)
        ↓                                      ↓
   injected via                         injected wholesale
   adaptive scoring                     at session start
   (keyword/vector match)               (always present if fresh)

Long-term store — existing MemoryStore namespaces, unchanged. Stable facts extracted from conversations, searched adaptively per-turn.

Working memory — a new user/working namespace written on every compaction. Acts as a "what's been happening lately" snapshot. Injected in full at session start. Expires after N days (default 14).

Unified user namespace — a canonical user/* tree shared across all channels, replacing today's session-scoped isolation.


Section 1: Unified User Namespace

Namespace Layout

memory/
  user/
    profile      ← stable facts: name, timezone, role, preferences
    patterns     ← recurring behaviors: working style, recurring topics
    working      ← rolling compaction summary (TTL-based)
  sessions/
    telegram:123/...    ← session-specific (unchanged, existing behavior)
    ws:abc/...

Identity Model

A single memory.user_namespace config key (default: unset) ties all channels together. All channels on the Flynn instance with this config treat memory as belonging to one person. Unset = current session-scoped behavior, unchanged.

This is appropriate for personal assistant deployments (one person, many surfaces). Multi-user is out of scope.

Config

memory:
  user_namespace: "user"   # enables shared identity; absent = session-scoped (current)

Extraction Routing

When user_namespace is set:

  • Stable extracted facts → user/profile, user/patterns
  • Compaction summary → user/working
  • Session-specific context → sessions/<id>/... (unchanged)
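
The routing rules above can be sketched as a small pure function. This is only an illustration: `MemoryKind`, `RoutingConfig`, and `routeNamespace` are hypothetical names, not part of Flynn's existing API, and returning `null` when `user_namespace` is unset simply stands in for "fall through to the current behavior."

```typescript
// Hypothetical classification of content produced by extraction and compaction.
type MemoryKind = "profile" | "pattern" | "compaction_summary" | "session";

interface RoutingConfig {
  // Mirrors memory.user_namespace; undefined means the feature is off.
  userNamespace?: string;
}

// Returns the target namespace, or null when user_namespace is unset
// (the caller then keeps today's session-scoped behavior, not modeled here).
function routeNamespace(
  kind: MemoryKind,
  sessionId: string,
  config: RoutingConfig,
): string | null {
  const ns = config.userNamespace;
  if (!ns) return null;
  switch (kind) {
    case "profile":
      return `${ns}/profile`;
    case "pattern":
      return `${ns}/patterns`;
    case "compaction_summary":
      return `${ns}/working`;
    case "session":
      return `sessions/${sessionId}`;
  }
}
```

Keeping the routing in one function means the opt-in check lives in a single place, which makes the "zero behavior change when unset" criterion easier to test.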

Section 2: Working Memory Layer

Storage

user/working namespace in the existing MemoryStore (flat file, no new storage engine). File format:

# Working Memory
Updated: 2026-02-25T11:30:00Z
Expires: 2026-03-11T11:30:00Z

[compaction summary content]

Lifecycle

| Event | Action |
|---|---|
| Compaction runs | Write summary to user/working, replacing previous content |
| Session starts | Read user/working; inject if Expires is in the future |
| Expires in the past | File is ignored; overwritten on next compaction |
| Memory store not configured | Entire feature is a no-op |

No background cleanup job required — expiry is checked lazily on read.
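
A minimal sketch of the file format and the lazy expiry check, assuming the header layout shown under Storage. The function names are hypothetical and the real `src/memory/workingMemory.ts` may look different. Note that `Date.prototype.toISOString()` emits milliseconds (for example `2026-02-25T11:30:00.000Z`), which parses the same way as the shorter form.

```typescript
interface WorkingMemory {
  updated: Date;
  expires: Date;
  summary: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Builds the file content: title, Updated/Expires headers, blank line, summary.
function serializeWorkingMemory(summary: string, now: Date, ttlDays = 14): string {
  const expires = new Date(now.getTime() + ttlDays * DAY_MS);
  return [
    "# Working Memory",
    `Updated: ${now.toISOString()}`,
    `Expires: ${expires.toISOString()}`,
    "",
    summary.trim(),
    "",
  ].join("\n");
}

// Parses the headers and body; returns null if the file is malformed.
function parseWorkingMemory(text: string): WorkingMemory | null {
  const updated = /^Updated:[ \t]*(\S+)[ \t]*$/m.exec(text);
  const expires = /^Expires:[ \t]*(\S+)[ \t]*$/m.exec(text);
  if (!updated || !expires) return null;
  const updatedAt = new Date(updated[1]);
  const expiresAt = new Date(expires[1]);
  if (Number.isNaN(updatedAt.getTime()) || Number.isNaN(expiresAt.getTime())) return null;
  // The body is everything after the first blank line.
  const bodyStart = text.indexOf("\n\n");
  const summary = bodyStart === -1 ? "" : text.slice(bodyStart + 2).trim();
  return { updated: updatedAt, expires: expiresAt, summary };
}

// Lazy expiry: a missing, malformed, or expired file is treated as absent.
function readFreshWorkingMemory(text: string | null, now: Date): string | null {
  if (text === null) return null;
  const parsed = parseWorkingMemory(text);
  if (!parsed || parsed.expires.getTime() <= now.getTime()) return null;
  return parsed.summary;
}
```

Because expiry is evaluated only in `readFreshWorkingMemory`, an expired file can stay on disk until the next compaction overwrites it.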

Size Budget

Working memory is capped at working_memory_max_tokens (default 1000 tokens). If the compaction summary exceeds the budget, it is truncated before writing. This keeps injection overhead predictable.
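
One way the truncation could work, as a sketch only. The design does not say how tokens are counted, so this assumes a rough heuristic of four characters per token; `estimateTokens` and `truncateToTokenBudget` are hypothetical helpers, and a real tokenizer should replace the heuristic if one is available.

```typescript
// Assumption: roughly 4 characters per token. This is a heuristic, not a tokenizer.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Clips the summary to the budget, preferring to cut at a line break
// so the stored text does not end in the middle of a bullet.
function truncateToTokenBudget(summary: string, maxTokens = 1000): string {
  if (estimateTokens(summary) <= maxTokens) return summary;
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const clipped = summary.slice(0, maxChars);
  const lastBreak = clipped.lastIndexOf("\n");
  // Only back up to the line break if that keeps at least half the budget.
  const cut = lastBreak > maxChars / 2 ? clipped.slice(0, lastBreak) : clipped;
  return cut.trimEnd();
}
```

Truncating from the end keeps the start of the summary; whether the head or the tail matters more depends on how the compaction prompt orders its output, which this design leaves open.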

Config

memory:
  working_memory_ttl_days: 14      # expiry window; default 14
  working_memory_max_tokens: 1000  # injection size cap; default 1000

Section 3: Compaction → Working Memory Flow

Current Flow

history exceeds threshold
  → compactHistory() produces summary string
  → summary replaces trimmed messages in session history
  → summary string discarded

New Flow

history exceeds threshold
  → compactHistory() produces summary string
  → summary replaces trimmed messages in session history
  → summary written to user/working (replaces previous)
  → memory extraction writes facts to user/profile + user/patterns
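
The new flow can be sketched as follows, under several simplifying assumptions: history is a plain array of strings, the summarizer is passed in as a function (standing in for the model call behind compactHistory()), and `NamespaceStore` is a hypothetical synchronous interface rather than the real MemoryStore API. The memory extraction step is left out.

```typescript
// Hypothetical minimal store interface; the real MemoryStore API may differ.
interface NamespaceStore {
  write(namespace: string, content: string): void;
}

interface CompactionResult {
  history: string[];
  summary: string;
}

// Returns null when no compaction is needed.
function compactAndPersist(
  history: string[],
  threshold: number,
  keepRecent: number,
  summarize: (trimmed: string[]) => string,
  store: NamespaceStore | null,
  userNamespace: string | undefined,
): CompactionResult | null {
  if (history.length <= threshold) return null;

  // Summarize everything except the most recent messages.
  const cut = Math.max(0, history.length - keepRecent);
  const summary = summarize(history.slice(0, cut));

  // The summary replaces the trimmed messages in the session history.
  const compacted = [`[summary] ${summary}`, ...history.slice(cut)];

  // New step: persist the summary instead of discarding it (opt-in only).
  if (store && userNamespace) {
    store.write(`${userNamespace}/working`, summary);
  }
  return { history: compacted, summary };
}
```

The only difference from the current flow is the guarded `store.write` call, so with `user_namespace` unset the function behaves like today's compaction.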

Compaction Prompt Change

Today's compaction uses a generic summarization prompt. Add a personal-assistant-focused variant that explicitly captures:

  • What the user was working on and current status
  • Decisions made and their outcomes
  • Preferences or constraints the user expressed
  • Open threads and follow-up items

This makes user/working genuinely useful as a "picking up where we left off" snapshot rather than a generic recap.

The improved prompt is only used when user_namespace is set. Existing generic compaction is unchanged otherwise.
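
A sketch of how the prompt selection might look. The prompt wording here is a placeholder: `GENERIC_PROMPT` is not Flynn's actual compaction prompt, and `buildCompactionPrompt` is a hypothetical helper. Only the four focus items come from the list above.

```typescript
// Placeholder for the existing generic prompt; the real text lives in Flynn.
const GENERIC_PROMPT = "Summarize the conversation so far.";

// The four items the personal-assistant variant should capture.
const PERSONAL_ASSISTANT_FOCUS = [
  "What the user was working on and its current status",
  "Decisions made and their outcomes",
  "Preferences or constraints the user expressed",
  "Open threads and follow-up items",
];

// Chooses the prompt variant: personal-assistant-focused only when
// user_namespace is set, generic otherwise.
function buildCompactionPrompt(userNamespace?: string): string {
  if (!userNamespace) return GENERIC_PROMPT;
  const bullets = PERSONAL_ASSISTANT_FOCUS.map((item) => `- ${item}`).join("\n");
  return (
    "Summarize the conversation so the assistant can pick up where it left off. " +
    `Capture:\n${bullets}`
  );
}
```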


Section 4: Session Start Injection

Injection Point

A new one-time _injectSessionContext() call in the orchestrator, triggered before the first user message of a new session. Separate from the existing per-turn _injectMemoryContext().

Injection Order in System Prompt

[base system prompt — SOUL.md / IDENTITY.md / etc.]

--- Who you're talking to ---
[user/profile content]          ← always injected if present

--- Recent context ---
[user/working content]          ← injected if not expired

[adaptive per-turn memory injection — unchanged, runs every turn]

Idempotency

Session-start injection is tracked by a boolean flag on the orchestrator instance. Reconnects to the same session ID do not re-inject.
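
The flag-based guard could look like the sketch below. `SessionContextInjector` is a hypothetical wrapper used only for illustration; in the design the flag sits directly on the orchestrator instance.

```typescript
class SessionContextInjector {
  private injected = false;
  private readonly loadContext: () => string | null;

  constructor(loadContext: () => string | null) {
    this.loadContext = loadContext;
  }

  // Returns the session context on the first call and null on every later call,
  // so a reconnect that reuses this instance does not inject twice.
  injectOnce(): string | null {
    if (this.injected) return null;
    this.injected = true;
    return this.loadContext();
  }
}
```

In this sketch the flag is set before the load runs, so a load that returns nothing is not retried on later turns. That matches "one-time" literally; whether a failed load should be retried is a choice the design leaves open.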

Graceful Degradation

| Condition | Behavior |
|---|---|
| No user/profile file | Skip block silently |
| user/working expired | Skip block, log at debug level |
| Memory store not configured | Entire feature no-ops |
| user_namespace not set | Current behavior, unchanged |
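
The injection order and the degradation rules can be combined in one pure function, sketched here. `SessionContextInput` and `buildSessionContext` are hypothetical; the caller is assumed to have already read user/profile and user/working from the store.

```typescript
interface SessionContextInput {
  storeConfigured: boolean;
  userNamespace?: string;
  profile: string | null;        // content of user/profile, or null if absent
  working: string | null;        // body of user/working, or null if absent
  workingExpires: Date | null;   // parsed Expires header, if any
}

// Builds the text appended after the base system prompt at session start.
// Returns an empty string whenever the feature should no-op.
function buildSessionContext(
  input: SessionContextInput,
  now: Date,
  debug: (msg: string) => void = () => {},
): string {
  if (!input.storeConfigured || !input.userNamespace) return "";

  const blocks: string[] = [];
  if (input.profile && input.profile.trim()) {
    blocks.push(`--- Who you're talking to ---\n${input.profile.trim()}`);
  }
  if (input.working && input.workingExpires) {
    if (input.workingExpires.getTime() > now.getTime()) {
      blocks.push(`--- Recent context ---\n${input.working.trim()}`);
    } else {
      debug("user/working expired; skipping recent-context block");
    }
  }
  return blocks.join("\n\n");
}
```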

Optional Proactive Greeting

When proactive_session_greeting: true, include a system instruction on the first turn:

"If relevant, briefly acknowledge what the user was last working on before responding to their first message."

Off by default. Gives the Pi-like "picking up the thread" feel when enabled.

memory:
  proactive_session_greeting: false   # default off
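
A small sketch of how the flag might gate the extra instruction. `firstTurnInstructions` is a hypothetical helper, and requiring fresh working memory before adding the instruction is an assumption made here; the design only says the instruction is included when the flag is true.

```typescript
const GREETING_INSTRUCTION =
  "If relevant, briefly acknowledge what the user was last working on " +
  "before responding to their first message.";

// Returns extra system instructions for the first turn of a session.
// Assumption: the greeting is skipped when there is no working memory to refer to.
function firstTurnInstructions(
  proactiveSessionGreeting: boolean,
  hasWorkingMemory: boolean,
): string[] {
  return proactiveSessionGreeting && hasWorkingMemory ? [GREETING_INSTRUCTION] : [];
}
```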

Section 5: Cross-Channel Identity

No per-channel plumbing needed. All channels share the orchestrator config. When user_namespace is set, every channel reads/writes user/* automatically.

First message on a new channel — if the user switches from web UI to Telegram, user/working from web UI sessions is already present. The Telegram session injects it on first turn. This is the intended behavior.


File-Level Change Summary

| File | Change |
|---|---|
| src/memory/workingMemory.ts | New: read/write/expiry logic for user/working |
| src/memory/store.ts | Add writeWithMetadata() supporting timestamped/expiry headers |
| src/context/compaction.ts | Add personal-assistant-focused compaction prompt option |
| src/backends/native/orchestrator.ts | Session-start injection + write working memory after compaction |
| src/config/schema.ts | New fields: user_namespace, working_memory_ttl_days, working_memory_max_tokens, proactive_session_greeting |
| src/daemon/index.ts | Pass user namespace config through to orchestrator |

Config Reference (full)

memory:
  # Shared identity namespace. When set, all channels share user/* memory.
  # Absent (default) = current session-scoped behavior, unchanged.
  user_namespace: "user"

  # How long working memory stays valid after the last compaction.
  working_memory_ttl_days: 14

  # Token budget for working memory injection at session start.
  working_memory_max_tokens: 1000

  # If true, instruct the model to acknowledge prior context on session start.
  proactive_session_greeting: false
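
For illustration, the four fields and their defaults could be resolved as below. The type and function names are hypothetical, the validation rules are assumptions, and the actual `src/config/schema.ts` may use a schema library instead of hand-written checks.

```typescript
// Raw shape of the `memory:` block as it might arrive from parsed YAML.
interface MemoryConfigInput {
  user_namespace?: string;
  working_memory_ttl_days?: number;
  working_memory_max_tokens?: number;
  proactive_session_greeting?: boolean;
}

interface ResolvedMemoryConfig {
  userNamespace: string | undefined; // undefined = session-scoped (current behavior)
  workingMemoryTtlDays: number;
  workingMemoryMaxTokens: number;
  proactiveSessionGreeting: boolean;
}

// Applies the documented defaults. The range checks are assumptions, not part of the design.
function resolveMemoryConfig(raw: MemoryConfigInput = {}): ResolvedMemoryConfig {
  const ttl = raw.working_memory_ttl_days ?? 14;
  const maxTokens = raw.working_memory_max_tokens ?? 1000;
  if (!(ttl > 0)) {
    throw new RangeError("working_memory_ttl_days must be positive");
  }
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new RangeError("working_memory_max_tokens must be a positive integer");
  }
  const ns = raw.user_namespace?.trim();
  return {
    userNamespace: ns ? ns : undefined,
    workingMemoryTtlDays: ttl,
    workingMemoryMaxTokens: maxTokens,
    proactiveSessionGreeting: raw.proactive_session_greeting ?? false,
  };
}
```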

Success Criteria

  1. Working memory survives a daemon restart and is injected on next session start.
  2. Switching channels (e.g. Telegram → web UI) injects the same user/working content.
  3. user_namespace absent = zero behavior change vs today (regression-safe).
  4. Compaction with user_namespace set writes to user/working on every run.
  5. Expired working memory is silently ignored without error.