William Valentin cc70c3e524 docs(design): Pi-inspired personal assistant memory design

Two-tier memory model (working memory + long-term store) with a unified
user namespace across all channels. Addresses four gaps: cross-session
forgetting, compaction context loss, no proactive recall, and channel
fragmentation.

Key design decisions:
- user/working namespace written on every compaction (TTL-based expiry)
- user/profile + user/patterns as shared identity across channels
- Session-start injection before first turn (one-time, idempotent)
- Opt-in via memory.user_namespace config; default is unchanged behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-25 12:23:49 -08:00

Pi-Inspired Personal Assistant Memory Design

Date: 2026-02-25
Status: approved
Inspired by: badlogic/pi-mono
Scope: Flynn-native implementation; no dependency on pi-agent-core

Problem

Flynn's memory model has four concrete gaps that make it feel like a generic chatbot rather than a personal assistant:

Forgets across sessions: memory extraction runs but is unreliable, so facts do not consistently survive compaction.
Clunky compaction: compaction summaries are generic and are discarded after context trimming, so important personal context is lost.
No proactive recall: buildAdaptiveMemoryContext exists, but it scores only by keyword overlap with the current message and never surfaces context unprompted.
Fragmented across channels: Telegram, Discord, and the gateway each have isolated sessions with no shared sense of "you."

Design Goals

Pick up where the last conversation left off, across any channel
Never lose recent context to compaction
Stable facts (preferences, patterns) persist indefinitely
All behavior gated behind config; default is current behavior (opt-in)

Non-Goals (this phase)

Multi-user deployments with per-user auth
Proactive mid-session memory surfacing (beyond session start)
Full vector/semantic replacement of adaptive injection

Architecture

Two-tier memory structure added to the orchestrator:

Long-term store (existing)          Working memory (new)
  memory/user/profile      ←→         memory/user/working
  memory/user/patterns                  (TTL: ~14 days)
  memory/sessions/...                   (replaced per compaction)
        ↓                                      ↓
   injected via                         injected wholesale
   adaptive scoring                     at session start
   (keyword/vector match)               (always present if fresh)

Long-term store: the existing MemoryStore namespaces, unchanged. These hold stable facts extracted from conversations and are searched adaptively on every turn.

Working memory: a new user/working namespace written on every compaction. It acts as a "what's been happening lately" snapshot, is injected in full at session start, and expires after working_memory_ttl_days (default 14).

Unified user namespace: a canonical user/* tree shared across all channels, replacing today's session-scoped isolation.

Section 1: Unified User Namespace

Namespace Layout

memory/
  user/
    profile      ← stable facts: name, timezone, role, preferences
    patterns     ← recurring behaviors: working style, recurring topics
    working      ← rolling compaction summary (TTL-based)
  sessions/
    telegram:123/...    ← session-specific (unchanged, existing behavior)
    ws:abc/...

Identity Model

A single memory.user_namespace config key (default: unset) ties all channels together. When it is set, every channel on the Flynn instance treats memory as belonging to one person. When it is unset, the current session-scoped behavior applies, unchanged.

This is appropriate for personal assistant deployments (one person, many surfaces). Multi-user is out of scope.

Config

memory:
  user_namespace: "user"   # enables shared identity; absent = session-scoped (current)

Extraction Routing

When user_namespace is set:

Stable extracted facts → user/profile, user/patterns
Compaction summary → user/working
Session-specific context → sessions/<id>/... (unchanged)
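
The routing rules above could be expressed as a single lookup. This is a minimal sketch, not Flynn's actual code: `routeNamespace` and the `MemoryKind` labels are hypothetical names, and the fallback when `user_namespace` is unset (facts stay session-scoped, the compaction summary is not persisted) is an assumption based on the "current behavior" described in this document.

```typescript
// Hypothetical labels for what the extractor or compactor produced.
type MemoryKind = "profile" | "patterns" | "working" | "session";

// Returns the namespace to write to, or null when nothing should be persisted.
function routeNamespace(
  kind: MemoryKind,
  sessionId: string,
  userNamespace?: string,
): string | null {
  const sessionScope = `sessions/${sessionId}`;

  // Session-specific context always stays under sessions/<id>.
  if (kind === "session") return sessionScope;

  if (!userNamespace) {
    // Assumption: without a shared namespace, extracted facts remain
    // session-scoped and the compaction summary is not written anywhere.
    return kind === "working" ? null : sessionScope;
  }

  // Shared identity: user/profile, user/patterns, user/working.
  return `${userNamespace}/${kind}`;
}
```

Keeping the decision in one pure function makes the opt-in gate easy to test: every call with `userNamespace` omitted should behave exactly as today.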

Section 2: Working Memory Layer

Storage

Working memory is stored in the user/working namespace of the existing MemoryStore (a flat file; no new storage engine is needed). File format:

# Working Memory
Updated: 2026-02-25T11:30:00Z
Expires: 2026-03-11T11:30:00Z

[compaction summary content]
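
A sketch of how this file might be rendered. `formatWorkingMemory` and `toIsoSeconds` are hypothetical helpers, not existing Flynn functions. The header fields follow the format above, and `Expires` is assumed to be `Updated` plus the TTL in whole days.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Date.prototype.toISOString() includes milliseconds; drop them so the
// timestamps match the second-resolution format used in the header.
function toIsoSeconds(d: Date): string {
  return d.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Renders the working-memory file: title, Updated/Expires header,
// a blank line, then the compaction summary.
function formatWorkingMemory(
  summary: string,
  now: Date,
  ttlDays: number = 14, // mirrors the working_memory_ttl_days default
): string {
  const expires = new Date(now.getTime() + ttlDays * MS_PER_DAY);
  return [
    "# Working Memory",
    `Updated: ${toIsoSeconds(now)}`,
    `Expires: ${toIsoSeconds(expires)}`,
    "",
    summary.trim(),
    "",
  ].join("\n");
}
```

Storing the absolute `Expires` timestamp, rather than recomputing it from `Updated` at read time, means a later change to the TTL setting does not retroactively alter files that were already written.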

Lifecycle

| Event | Action |
| --- | --- |
| Compaction runs | Write summary to `user/working`, replacing previous content |
| Session starts | Read `user/working`; inject if `Expires` is in the future |
| `Expires` in the past | File is ignored; overwritten on next compaction |
| Memory store not configured | Entire feature is a no-op |

No background cleanup job required — expiry is checked lazily on read.
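
Lazy expiry on read could look roughly like the following. This is a sketch under assumptions: `parseWorkingMemory` and `readFreshWorkingMemory` are hypothetical names, the raw file content is passed in as a string (or `null` if the file is missing), and a malformed header is treated the same as an expired file.

```typescript
interface WorkingMemory {
  updated: Date;
  expires: Date;
  body: string;
}

// Parses the Updated/Expires header; returns null if it is missing or invalid.
function parseWorkingMemory(raw: string): WorkingMemory | null {
  const updatedMatch = /^Updated:\s*(\S+)\s*$/m.exec(raw);
  const expiresMatch = /^Expires:\s*(\S+)\s*$/m.exec(raw);
  if (!updatedMatch || !expiresMatch) return null;

  const updated = new Date(updatedMatch[1] ?? "");
  const expires = new Date(expiresMatch[1] ?? "");
  if (Number.isNaN(updated.getTime()) || Number.isNaN(expires.getTime())) {
    return null;
  }

  // The body is everything after the first blank line following the header.
  const headerEnd = raw.indexOf("\n\n");
  const body = headerEnd === -1 ? "" : raw.slice(headerEnd + 2).trim();
  return { updated, expires, body };
}

// Returns the summary only if the file exists, parses, and has not expired.
// No cleanup happens here: a stale file is simply ignored until the next
// compaction overwrites it.
function readFreshWorkingMemory(raw: string | null, now: Date): string | null {
  if (raw === null) return null;
  const parsed = parseWorkingMemory(raw);
  if (parsed === null) return null;
  return parsed.expires.getTime() > now.getTime() ? parsed.body : null;
}
```

Passing `now` in explicitly keeps the expiry check deterministic in tests.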

Size Budget

Working memory is capped at working_memory_max_tokens (default 1000 tokens). If the compaction summary exceeds the budget, it is truncated before writing, which keeps injection overhead predictable.
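
One possible shape for the truncation step is sketched below. The document does not say how tokens are counted, so this assumes a rough heuristic of about 4 characters per token; a real implementation would likely use the model's tokenizer. `estimateTokens` and `truncateToBudget` are hypothetical names.

```typescript
// Assumption: roughly 4 characters per token. Swap in a real tokenizer
// if exact budgets matter.
const APPROX_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

// Truncates a summary so its estimated size fits the token budget.
function truncateToBudget(summary: string, maxTokens: number = 1000): string {
  if (estimateTokens(summary) <= maxTokens) return summary;

  const maxChars = maxTokens * APPROX_CHARS_PER_TOKEN;
  const cut = summary.slice(0, maxChars);

  // Prefer ending on a line break so the last item is not split mid-sentence,
  // but only if that does not throw away more than half of the budget.
  const lastBreak = cut.lastIndexOf("\n");
  const trimmed = lastBreak > maxChars / 2 ? cut.slice(0, lastBreak) : cut;
  return trimmed.trimEnd();
}
```

Truncating from the end keeps whatever the compaction prompt placed first, so the most important items should be listed at the top of the summary.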

Config

memory:
  working_memory_ttl_days: 14      # expiry window; default 14
  working_memory_max_tokens: 1000  # injection size cap; default 1000

Section 3: Compaction → Working Memory Flow

Current Flow

history exceeds threshold
  → compactHistory() produces summary string
  → summary replaces trimmed messages in session history
  → summary string is not persisted anywhere else

New Flow

history exceeds threshold
  → compactHistory() produces summary string
  → summary replaces trimmed messages in session history
  → summary written to user/working (replaces previous)
  → memory extraction writes facts to user/profile + user/patterns
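
The new flow could be wired together as in the sketch below. Everything here is illustrative: `compactAndPersist`, `CompactionDeps`, and the message-count threshold are assumptions (the real threshold may be token-based), and `summarize` merely stands in for `compactHistory()`.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical seams so the flow can be exercised without a model or a store.
interface CompactionDeps {
  summarize: (trimmed: Message[]) => Promise<string>;   // stands in for compactHistory()
  writeWorking: (summary: string) => Promise<void>;     // writes user/working
  extractFacts: (trimmed: Message[]) => Promise<void>;  // writes user/profile + user/patterns
}

async function compactAndPersist(
  history: Message[],
  maxMessages: number,
  keepRecent: number,
  userNamespaceSet: boolean,
  deps: CompactionDeps,
): Promise<Message[]> {
  // Nothing to do until the history exceeds the threshold.
  if (history.length <= maxMessages) return history;

  const splitAt = Math.max(0, history.length - keepRecent);
  const trimmed = history.slice(0, splitAt);
  const recent = history.slice(splitAt);

  // The summary replaces the trimmed messages in the session history.
  const summary = await deps.summarize(trimmed);
  const compacted: Message[] = [{ role: "system", content: summary }, ...recent];

  // New behavior, gated on user_namespace: persist the summary and extract facts.
  if (userNamespaceSet) {
    await deps.writeWorking(summary);
    await deps.extractFacts(trimmed);
  }
  return compacted;
}
```

With `userNamespaceSet` false, the function reduces to the current flow: the summary lives only inside the returned history.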

Compaction Prompt Change

Today's compaction uses a generic summarization prompt. Add a personal-assistant-focused variant that explicitly captures:

What the user was working on and current status
Decisions made and their outcomes
Preferences or constraints the user expressed
Open threads and follow-up items

This makes user/working genuinely useful as a "picking up where we left off" snapshot rather than a generic recap.

The improved prompt is only used when user_namespace is set. Existing generic compaction is unchanged otherwise.
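
Prompt selection could be gated as in this sketch. The prompt wording is placeholder text invented for illustration; the existing generic prompt is not shown in this document, and `selectCompactionPrompt` is a hypothetical name. Only the four focus points come from the list above.

```typescript
// Placeholder: stands in for the existing generic summarization prompt.
const GENERIC_COMPACTION_PROMPT = "Summarize the conversation so far.";

// The four items the personal-assistant variant should capture.
const PERSONAL_FOCUS_POINTS: readonly string[] = [
  "What the user was working on and current status",
  "Decisions made and their outcomes",
  "Preferences or constraints the user expressed",
  "Open threads and follow-up items",
];

// Returns the personal-assistant variant only when user_namespace is set.
function selectCompactionPrompt(userNamespace?: string): string {
  if (!userNamespace) return GENERIC_COMPACTION_PROMPT;
  const bullets = PERSONAL_FOCUS_POINTS.map((p) => `- ${p}`).join("\n");
  return (
    "Summarize the conversation so a personal assistant can pick up " +
    `where it left off. Capture:\n${bullets}`
  );
}
```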

Section 4: Session Start Injection

Injection Point

The orchestrator gains a one-time _injectSessionContext() call, triggered before the first user message of a new session. It is separate from the existing per-turn _injectMemoryContext().

Injection Order in System Prompt

[base system prompt — SOUL.md / IDENTITY.md / etc.]

--- Who you're talking to ---
[user/profile content]          ← always injected if present

--- Recent context ---
[user/working content]          ← injected if not expired

[adaptive per-turn memory injection — unchanged, runs every turn]
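
Assembling the session-start portion of the prompt in this order might look like the sketch below. `buildSessionContext` is a hypothetical helper; it takes already-loaded strings (or `null` when a block is missing or expired) and leaves the per-turn adaptive injection to the existing code path.

```typescript
// Builds the base prompt plus the two session-start blocks, in the order
// shown above. Blocks with no content are skipped silently.
function buildSessionContext(
  basePrompt: string,
  profile: string | null, // user/profile content, if present
  working: string | null, // user/working content, if present and not expired
): string {
  const sections: string[] = [basePrompt.trim()];

  if (profile !== null && profile.trim() !== "") {
    sections.push(`--- Who you're talking to ---\n${profile.trim()}`);
  }
  if (working !== null && working.trim() !== "") {
    sections.push(`--- Recent context ---\n${working.trim()}`);
  }
  return sections.join("\n\n");
}
```

Because missing blocks are dropped rather than rendered empty, a deployment with no `user/profile` file yields a prompt identical to the base prompt.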

Idempotency

Session-start injection is tracked by a boolean flag on the orchestrator instance. Reconnects to the same session ID do not re-inject.
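
The boolean-flag guard could be sketched as follows. `SessionContextInjector` is a hypothetical stand-in for the relevant part of the orchestrator, and it assumes one instance per session, so the flag's lifetime matches the session's.

```typescript
class SessionContextInjector {
  private injected = false;
  private readonly loadContext: () => string | null;

  // loadContext is a hypothetical callback that returns the session-start
  // blocks (profile + working memory), or null when there is nothing to add.
  constructor(loadContext: () => string | null) {
    this.loadContext = loadContext;
  }

  // Appends session context the first time it is called; later calls
  // (e.g. after a reconnect) return the prompt untouched.
  injectOnce(systemPrompt: string): string {
    if (this.injected) return systemPrompt;
    this.injected = true;

    const context = this.loadContext();
    return context ? `${systemPrompt}\n\n${context}` : systemPrompt;
  }
}
```

Setting the flag before loading means a failed or empty load is not retried on the next turn, which keeps the injection strictly one-time.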

Graceful Degradation

| Condition | Behavior |
| --- | --- |
| No `user/profile` file | Skip block silently |
| `user/working` expired | Skip block, log at debug level |
| Memory store not configured | Entire feature no-ops |
| `user_namespace` not set | Current behavior, unchanged |

Optional Proactive Greeting

When proactive_session_greeting: true, include a system instruction on the first turn:

"If relevant, briefly acknowledge what the user was last working on before responding to their first message."

Off by default. Gives the Pi-like "picking up the thread" feel when enabled.

memory:
  proactive_session_greeting: false   # default off

Section 5: Cross-Channel Identity

No per-channel plumbing needed. All channels share the orchestrator config. When user_namespace is set, every channel reads/writes user/* automatically.

First message on a new channel: if the user switches from the web UI to Telegram, the user/working content written by web UI sessions is already present. The Telegram session injects it at session start, before its first turn. This is the intended behavior.
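
The cross-channel handoff can be illustrated with a toy model. `InMemoryStore` and `ChannelSession` are hypothetical stand-ins for the real MemoryStore and per-channel sessions; TTL handling is omitted to keep the sketch focused on the shared namespace.

```typescript
// Toy store keyed by namespace path, standing in for the flat-file MemoryStore.
class InMemoryStore {
  private files = new Map<string, string>();

  write(namespace: string, content: string): void {
    this.files.set(namespace, content);
  }

  read(namespace: string): string | null {
    return this.files.get(namespace) ?? null;
  }
}

class ChannelSession {
  readonly sessionId: string;
  private readonly store: InMemoryStore;
  private readonly userNamespace: string | undefined;

  constructor(sessionId: string, store: InMemoryStore, userNamespace?: string) {
    this.sessionId = sessionId;
    this.store = store;
    this.userNamespace = userNamespace;
  }

  // Called after compaction: persists the summary only when opted in.
  onCompaction(summary: string): void {
    if (this.userNamespace) {
      this.store.write(`${this.userNamespace}/working`, summary);
    }
  }

  // Called at session start: returns whatever any channel last wrote.
  sessionStartContext(): string | null {
    return this.userNamespace
      ? this.store.read(`${this.userNamespace}/working`)
      : null;
  }
}
```

Two sessions built with the same store and namespace, for example `new ChannelSession("ws:abc", store, "user")` and `new ChannelSession("telegram:123", store, "user")`, see each other's working memory without any channel-specific code.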

File-Level Change Summary

| File | Change |
| --- | --- |
| `src/memory/workingMemory.ts` | New: read/write/expiry logic for `user/working` |
| `src/memory/store.ts` | Add `writeWithMetadata()` supporting timestamped/expiry headers |
| `src/context/compaction.ts` | Add personal-assistant-focused compaction prompt option |
| `src/backends/native/orchestrator.ts` | Session-start injection + write working memory after compaction |
| `src/config/schema.ts` | New fields: `user_namespace`, `working_memory_ttl_days`, `working_memory_max_tokens`, `proactive_session_greeting` |
| `src/daemon/index.ts` | Pass user namespace config through to orchestrator |

Config Reference (full)

memory:
  # Shared identity namespace. When set, all channels share user/* memory.
  # Absent (default) = current session-scoped behavior, unchanged.
  user_namespace: "user"

  # How long working memory stays valid after the last compaction.
  working_memory_ttl_days: 14

  # Token budget for working memory injection at session start.
  working_memory_max_tokens: 1000

  # If true, instruct the model to acknowledge prior context on session start.
  proactive_session_greeting: false
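
On the TypeScript side, the four fields and their defaults might be modeled as below. This is a sketch only: how `src/config/schema.ts` actually declares and validates fields is not shown in this document, so `MemoryConfig`, `MEMORY_DEFAULTS`, and `resolveMemoryConfig` are hypothetical, and the validation rules are assumptions.

```typescript
interface MemoryConfig {
  user_namespace?: string;             // absent = session-scoped (current behavior)
  working_memory_ttl_days: number;
  working_memory_max_tokens: number;
  proactive_session_greeting: boolean;
}

// Defaults as listed in the config reference above.
const MEMORY_DEFAULTS = {
  working_memory_ttl_days: 14,
  working_memory_max_tokens: 1000,
  proactive_session_greeting: false,
};

// Merges user-supplied values over the defaults and rejects nonsensical ones.
function resolveMemoryConfig(raw: Partial<MemoryConfig> = {}): MemoryConfig {
  const merged: MemoryConfig = { ...MEMORY_DEFAULTS, ...raw };

  // Assumed validation: both numeric settings must be positive.
  if (!(merged.working_memory_ttl_days > 0)) {
    throw new Error("working_memory_ttl_days must be a positive number");
  }
  if (
    !Number.isInteger(merged.working_memory_max_tokens) ||
    merged.working_memory_max_tokens <= 0
  ) {
    throw new Error("working_memory_max_tokens must be a positive integer");
  }
  return merged;
}
```

Leaving `user_namespace` optional with no default preserves the opt-in contract: an empty config resolves to today's behavior.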

Success Criteria

Working memory survives a daemon restart and is injected on next session start.
Switching channels (e.g. Telegram → web UI) injects the same user/working content.
user_namespace absent = zero behavior change vs today (regression-safe).
Compaction with user_namespace set writes to user/working on every run.
Expired working memory is silently ignored without error.
