Two-tier memory model (working memory + long-term store) with a unified user namespace across all channels. Addresses four gaps: cross-session forgetting, compaction context loss, no proactive recall, and channel fragmentation.

Key design decisions:
- user/working namespace written on every compaction (TTL-based expiry)
- user/profile + user/patterns as shared identity across channels
- Session-start injection before first turn (one-time, idempotent)
- Opt-in via memory.user_namespace config; default is unchanged behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pi-Inspired Personal Assistant Memory Design
Date: 2026-02-25
Status: approved
Inspired by: badlogic/pi-mono
Scope: Flynn-native implementation; no dependency on pi-agent-core
Problem
Flynn's memory model has four concrete gaps that make it feel like a generic chatbot rather than a personal assistant:
- Forgets across sessions: memory extraction runs but is unreliable; facts don't consistently survive compaction.
- Clunky compaction: compaction summaries are generic and discarded after context trimming, so important personal context is lost.
- No proactive recall: buildAdaptiveMemoryContext exists but only scores by keyword overlap with the current message; it never surfaces context unprompted.
- Fragmented across channels: Telegram, Discord, and the gateway each have isolated sessions with no shared sense of "you."
Design Goals
- Pick up where the last conversation left off, across any channel
- Never lose recent context to compaction
- Stable facts (preferences, patterns) persist indefinitely
- All behavior gated behind config; default is current behavior (opt-in)
Non-Goals (this phase)
- Multi-user deployments with per-user auth
- Proactive mid-session memory surfacing (beyond session start)
- Full vector/semantic replacement of adaptive injection
Architecture
Two-tier memory structure added to the orchestrator:
Long-term store (existing)          Working memory (new)
memory/user/profile       ←→        memory/user/working
memory/user/patterns                (TTL: ~14 days)
memory/sessions/...                 (replaced per compaction)
         ↓                                   ↓
injected via adaptive scoring       injected wholesale at session start
(keyword/vector match)              (always present if fresh)
Long-term store: existing MemoryStore namespaces, unchanged. Stable facts extracted from conversations, searched adaptively on every turn.
Working memory: a new user/working namespace written on every compaction. Acts as a "what's been happening lately" snapshot. Injected in full at session start. Expires after a configurable number of days (default 14).
Unified user namespace: when enabled, a canonical user/* tree shared across all channels replaces today's session-scoped isolation for user-level memory.
Section 1: Unified User Namespace
Namespace Layout
memory/
  user/
    profile          ← stable facts: name, timezone, role, preferences
    patterns         ← recurring behaviors: working style, recurring topics
    working          ← rolling compaction summary (TTL-based)
  sessions/
    telegram:123/... ← session-specific (unchanged, existing behavior)
    ws:abc/...
Identity Model
A single memory.user_namespace config key (default: unset) ties all channels together. When it is set, every channel on the Flynn instance treats memory as belonging to one person. When it is unset, memory stays session-scoped, exactly as today.
This is appropriate for personal assistant deployments (one person, many surfaces). Multi-user is out of scope.
Config
memory:
  user_namespace: "user"   # enables shared identity; absent = session-scoped (current)
Extraction Routing
When user_namespace is set:
- Stable extracted facts → user/profile, user/patterns
- Compaction summary → user/working
- Session-specific context → sessions/<id>/... (unchanged)
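A minimal sketch of the routing rule above. ExtractionKind, RoutingConfig, and routeExtraction() are hypothetical names, not existing Flynn code. It assumes the extractor already labels each item as profile, pattern, compaction summary, or session-specific, and that with user_namespace unset the summary is discarded and facts stay session-scoped, matching today's behavior.

```typescript
// Hypothetical categories an extractor might emit; not an existing Flynn type.
type ExtractionKind = "profile" | "pattern" | "compaction_summary" | "session";

// Hypothetical slice of the memory config; userNamespace mirrors memory.user_namespace.
interface RoutingConfig {
  userNamespace?: string;
}

// Sub-namespace under the user tree for each shared kind.
const USER_TARGETS: Record<Exclude<ExtractionKind, "session">, string> = {
  profile: "profile",
  pattern: "patterns",
  compaction_summary: "working",
};

// Returns the namespace to write to, or null when the content is not persisted.
function routeExtraction(kind: ExtractionKind, config: RoutingConfig, sessionId: string): string | null {
  const ns = config.userNamespace;
  if (!ns) {
    // user_namespace unset: current behavior. Summaries are discarded and
    // everything else stays session-scoped.
    return kind === "compaction_summary" ? null : `sessions/${sessionId}`;
  }
  if (kind === "session") {
    return `sessions/${sessionId}`;
  }
  return `${ns}/${USER_TARGETS[kind]}`;
}
```

Because the namespace comes from config rather than from the session ID, a Telegram session and a web UI session both resolve profile facts to the same user/profile path.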
Section 2: Working Memory Layer
Storage
Working memory lives in a user/working namespace in the existing MemoryStore (flat file, no new storage engine). File format:
# Working Memory
Updated: 2026-02-25T11:30:00Z
Expires: 2026-03-11T11:30:00Z
[compaction summary content]
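A minimal sketch of producing this file, assuming a blank line separates the headers from the summary body. WorkingMemoryRecord, makeWorkingMemory(), and serializeWorkingMemory() are hypothetical helpers for src/memory/workingMemory.ts, not existing code. Expiry is computed by adding milliseconds to a UTC timestamp, so daylight-saving changes cannot shift it.

```typescript
// Hypothetical in-memory form of the user/working file.
interface WorkingMemoryRecord {
  updated: Date;
  expires: Date;
  content: string;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Stamps `content` with the write time and an expiry `ttlDays` later (UTC arithmetic).
function makeWorkingMemory(content: string, now: Date, ttlDays: number): WorkingMemoryRecord {
  return { updated: now, expires: new Date(now.getTime() + ttlDays * MS_PER_DAY), content };
}

// ISO-8601 without milliseconds, e.g. 2026-02-25T11:30:00Z, to match the format above.
function isoSeconds(d: Date): string {
  return d.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Renders the record as file text: title, two header lines, blank line, summary body.
function serializeWorkingMemory(record: WorkingMemoryRecord): string {
  return [
    "# Working Memory",
    `Updated: ${isoSeconds(record.updated)}`,
    `Expires: ${isoSeconds(record.expires)}`,
    "",
    record.content,
  ].join("\n");
}
```

For example, a record written at 2026-02-25T11:30:00Z with the default 14-day TTL expires at 2026-03-11T11:30:00Z.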
Lifecycle
| Event | Action |
|---|---|
| Compaction runs | Write summary to user/working, replacing previous content |
| Session starts | Read user/working; inject if Expires is in the future |
| Expires is in the past | File is ignored; overwritten on next compaction |
| Memory store not configured | Entire feature is a no-op |
No background cleanup job required — expiry is checked lazily on read.
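The lazy expiry check can be sketched as a parse followed by a timestamp comparison. ParsedWorkingMemory, parseWorkingMemory(), and readIfFresh() are hypothetical names; the parser assumes the header layout shown under Storage, with Expires as the last header line. Treating an unparseable file like an expired one is an assumption, chosen so it falls under the same "ignore, overwrite on next compaction" rule.

```typescript
// Hypothetical parsed form; only the fields needed for the freshness check.
interface ParsedWorkingMemory {
  expires: Date;
  content: string;
}

// Finds the first "Expires:" header line; the body is whatever follows it.
// Returns null for files without a valid Expires header.
function parseWorkingMemory(text: string): ParsedWorkingMemory | null {
  const match = /^Expires: (\S+)$/m.exec(text);
  if (!match) return null;
  const expires = new Date(match[1] ?? "");
  if (Number.isNaN(expires.getTime())) return null;
  const content = text.slice(match.index + match[0].length).trim();
  return { expires, content };
}

// Lazy expiry check on read: missing, malformed, or expired files all yield null.
function readIfFresh(text: string | undefined, now: Date): string | null {
  if (text === undefined) return null;
  const parsed = parseWorkingMemory(text);
  if (parsed === null || parsed.expires.getTime() <= now.getTime()) return null;
  return parsed.content;
}
```

Since the check runs only when the file is read, a stale file simply sits on disk until the next compaction overwrites it.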
Size Budget
Working memory is capped at working_memory_max_tokens (default 1000 tokens). If the compaction summary exceeds the budget, it is truncated before writing, which keeps injection overhead predictable.
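The design does not say how tokens are counted or which end of the summary survives truncation. This sketch assumes a rough heuristic of about 4 characters per token and keeps the start of the summary, cutting at a word boundary. estimateTokens() and truncateToBudget() are hypothetical helpers; a real implementation would ideally use the model's own tokenizer.

```typescript
// Assumed heuristic: ~4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Returns `summary` unchanged if it fits, otherwise a prefix that fits `maxTokens`.
function truncateToBudget(summary: string, maxTokens: number): string {
  if (estimateTokens(summary) <= maxTokens) return summary;
  const cut = summary.slice(0, maxTokens * CHARS_PER_TOKEN);
  // Index of the last whitespace character, so we avoid splitting a word.
  const lastSpace = cut.search(/\s\S*$/);
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut).trimEnd();
}
```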
Config
memory:
  working_memory_ttl_days: 14      # expiry window; default 14
  working_memory_max_tokens: 1000  # injection size cap; default 1000
Section 3: Compaction → Working Memory Flow
Current Flow
history exceeds threshold
→ compactHistory() produces summary string
→ summary replaces trimmed messages in session history
→ summary string discarded
New Flow
history exceeds threshold
→ compactHistory() produces summary string
→ summary replaces trimmed messages in session history
→ summary written to user/working (replaces previous)
→ memory extraction writes facts to user/profile + user/patterns
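A sketch of the two new steps as a post-compaction hook. NamespaceStore, with its replace() and append() methods, and afterCompaction() are hypothetical; the real MemoryStore API (including the proposed writeWithMetadata()) may look different. Writing the summary before running extraction is an assumed ordering, so a failed extraction cannot lose the snapshot.

```typescript
// Hypothetical minimal store interface; the real MemoryStore API may differ.
interface NamespaceStore {
  replace(namespace: string, content: string): Promise<void>;
  append(namespace: string, entries: string[]): Promise<void>;
}

interface ExtractedFacts {
  profile: string[];
  patterns: string[];
}

interface CompactionHookDeps {
  store?: NamespaceStore; // undefined = memory store not configured
  userNamespace?: string; // mirrors memory.user_namespace
  extractFacts: (history: string) => Promise<ExtractedFacts>;
}

// Runs after compactHistory() has produced `summary` and the session history was trimmed.
async function afterCompaction(summary: string, history: string, deps: CompactionHookDeps): Promise<void> {
  const { store, userNamespace } = deps;
  // Feature disabled: leave the existing session-scoped flow untouched.
  if (!store || !userNamespace) return;

  // Replace the previous snapshot first so an extraction failure cannot lose it.
  await store.replace(`${userNamespace}/working`, summary);

  // Then route stable facts to the shared profile/patterns namespaces.
  const facts = await deps.extractFacts(history);
  if (facts.profile.length > 0) await store.append(`${userNamespace}/profile`, facts.profile);
  if (facts.patterns.length > 0) await store.append(`${userNamespace}/patterns`, facts.patterns);
}
```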
Compaction Prompt Change
Today's compaction uses a generic summarization prompt. Add a personal-assistant-focused variant that explicitly captures:
- What the user was working on and current status
- Decisions made and their outcomes
- Preferences or constraints the user expressed
- Open threads and follow-up items
This makes user/working genuinely useful as a "picking up where we left off" snapshot rather than a generic recap.
The improved prompt is only used when user_namespace is set. Existing generic compaction is unchanged otherwise.
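A sketch of the prompt switch, assuming selection depends only on whether user_namespace is set. GENERIC_PROMPT is a placeholder, not the actual text in src/context/compaction.ts, and the personal-assistant wording is illustrative; only its four bullet points come from this section. selectCompactionPrompt() is a hypothetical name.

```typescript
// Placeholder only: stands in for the existing generic prompt, whose real text is not shown here.
const GENERIC_PROMPT = "Summarize the conversation so far.";

// Illustrative personal-assistant prompt built from the four items listed above.
const PERSONAL_ASSISTANT_PROMPT = [
  "Summarize this conversation as a hand-off note for the next session. Capture:",
  "- What the user was working on and its current status",
  "- Decisions made and their outcomes",
  "- Preferences or constraints the user expressed",
  "- Open threads and follow-up items",
].join("\n");

// The new prompt applies only when memory.user_namespace is set.
function selectCompactionPrompt(userNamespace: string | undefined): string {
  return userNamespace ? PERSONAL_ASSISTANT_PROMPT : GENERIC_PROMPT;
}
```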
Section 4: Session Start Injection
Injection Point
A new one-time _injectSessionContext() call in the orchestrator, triggered before the first user message of a new session. Separate from the existing per-turn _injectMemoryContext().
Injection Order in System Prompt
[base system prompt — SOUL.md / IDENTITY.md / etc.]
--- Who you're talking to ---
[user/profile content] ← always injected if present
--- Recent context ---
[user/working content] ← injected if not expired
[adaptive per-turn memory injection — unchanged, runs every turn]
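The ordering above can be sketched as two small string builders. buildSessionContext() and composeSystemPrompt() are hypothetical names; the sketch assumes sections are joined with newlines and that a missing or blank block is omitted together with its header.

```typescript
// Builds the session-start block; a missing or blank section is omitted with its header.
function buildSessionContext(profile: string | null, working: string | null): string {
  const parts: string[] = [];
  const p = profile?.trim();
  if (p) parts.push("--- Who you're talking to ---", p);
  const w = working?.trim();
  if (w) parts.push("--- Recent context ---", w);
  return parts.join("\n");
}

// Base prompt first, then session context, then the per-turn adaptive memory block.
function composeSystemPrompt(base: string, sessionContext: string, adaptiveMemory: string): string {
  return [base, sessionContext, adaptiveMemory].filter((s) => s.length > 0).join("\n\n");
}
```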
Idempotency
Session-start injection is tracked by a boolean flag on the orchestrator instance. Reconnects to the same session ID do not re-inject.
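A minimal sketch of the one-time flag, assuming one orchestrator instance per session ID that survives reconnects (the idempotency rule implies this but does not state it). SessionContextInjector is a hypothetical name. The flag is flipped before loading, so even a failed load is never retried later in the session, consistent with the non-goal of mid-session surfacing.

```typescript
// Hypothetical helper: hands out the session-start context at most once per instance.
class SessionContextInjector {
  private injected = false;
  private readonly load: () => string;

  constructor(load: () => string) {
    this.load = load;
  }

  // First call returns the loaded context; later calls (e.g. after a reconnect) return null.
  injectOnce(): string | null {
    if (this.injected) return null;
    // Flip the flag before loading so a failed load is never retried mid-session.
    this.injected = true;
    return this.load();
  }
}
```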
Graceful Degradation
| Condition | Behavior |
|---|---|
| No user/profile file | Skip block silently |
| user/working expired | Skip block, log at debug level |
| Memory store not configured | Entire feature no-ops |
| user_namespace not set | Current behavior, unchanged |
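The degradation table maps onto a single decision function. SessionStartState, InjectionPlan, and planSessionInjection() are hypothetical; a null return means the feature is off entirely, while a null field inside the plan means only that block is skipped. The debug callback stands in for whatever logger the orchestrator uses.

```typescript
// Hypothetical snapshot of what the orchestrator knows at session start.
interface SessionStartState {
  storeConfigured: boolean;
  userNamespace?: string;
  profile: string | null; // null = no user/profile file
  working: { content: string; expires: Date } | null; // null = no user/working file
}

// Blocks to inject; a null field means that block is skipped.
interface InjectionPlan {
  profile: string | null;
  working: string | null;
}

// Returns null when the feature is off entirely, otherwise which blocks to inject.
function planSessionInjection(
  state: SessionStartState,
  now: Date,
  debug: (message: string) => void,
): InjectionPlan | null {
  // Memory store not configured, or user_namespace unset: current behavior.
  if (!state.storeConfigured || !state.userNamespace) return null;

  let working: string | null = null;
  if (state.working !== null) {
    if (state.working.expires.getTime() > now.getTime()) {
      working = state.working.content;
    } else {
      debug(`user/working expired at ${state.working.expires.toISOString()}; skipping`);
    }
  }
  // A missing profile is skipped silently (no log).
  return { profile: state.profile, working };
}
```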
Optional Proactive Greeting
When proactive_session_greeting: true, include a system instruction on the first turn:
"If relevant, briefly acknowledge what the user was last working on before responding to their first message."
Off by default. Gives the Pi-like "picking up the thread" feel when enabled.
memory:
  proactive_session_greeting: false  # default off
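A sketch of the greeting toggle. greetingInstructions() is a hypothetical helper, and the instruction string is copied from above. Returning a list is an assumed convenience so the caller can concatenate it with any other first-turn system instructions.

```typescript
// Instruction text copied from the design above.
const GREETING_INSTRUCTION =
  "If relevant, briefly acknowledge what the user was last working on before responding to their first message.";

// Extra system instructions for this turn; empty unless the flag is on and it is the first turn.
function greetingInstructions(proactiveSessionGreeting: boolean, isFirstTurn: boolean): string[] {
  return proactiveSessionGreeting && isFirstTurn ? [GREETING_INSTRUCTION] : [];
}
```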
Section 5: Cross-Channel Identity
No per-channel plumbing needed. All channels share the orchestrator config. When user_namespace is set, every channel reads/writes user/* automatically.
First message on a new channel — if the user switches from web UI to Telegram, user/working from web UI sessions is already present. The Telegram session injects it on first turn. This is the intended behavior.
File-Level Change Summary
| File | Change |
|---|---|
| src/memory/workingMemory.ts | New: read/write/expiry logic for user/working |
| src/memory/store.ts | Add writeWithMetadata() supporting timestamped/expiry headers |
| src/context/compaction.ts | Add personal-assistant-focused compaction prompt option |
| src/backends/native/orchestrator.ts | Session-start injection + write working memory after compaction |
| src/config/schema.ts | New fields: user_namespace, working_memory_ttl_days, working_memory_max_tokens, proactive_session_greeting |
| src/daemon/index.ts | Pass user namespace config through to orchestrator |
Config Reference (full)
memory:
  # Shared identity namespace. When set, all channels share user/* memory.
  # Absent (default) = current session-scoped behavior, unchanged.
  user_namespace: "user"

  # How long working memory stays valid after the last compaction.
  working_memory_ttl_days: 14

  # Token budget for working memory injection at session start.
  working_memory_max_tokens: 1000

  # If true, instruct the model to acknowledge prior context on session start.
  proactive_session_greeting: false
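A sketch of resolving these keys into typed values with the documented defaults. It uses plain TypeScript rather than whatever validation library src/config/schema.ts relies on, and RawMemoryConfig, ResolvedMemoryConfig, positiveOr(), and resolveMemoryConfig() are hypothetical names. Treating an empty user_namespace as unset and rejecting non-positive numbers are assumptions, chosen so a misconfigured value fails loudly instead of producing working memory that expires immediately.

```typescript
// Raw shape as read from YAML (snake_case keys, all optional).
interface RawMemoryConfig {
  user_namespace?: string;
  working_memory_ttl_days?: number;
  working_memory_max_tokens?: number;
  proactive_session_greeting?: boolean;
}

// Resolved shape with defaults applied (hypothetical names).
interface ResolvedMemoryConfig {
  userNamespace: string | undefined; // undefined = session-scoped (current behavior)
  workingMemoryTtlDays: number;
  workingMemoryMaxTokens: number;
  proactiveSessionGreeting: boolean;
}

// Returns `value` if present, the default if absent; throws on non-positive or non-finite input.
function positiveOr(value: number | undefined, fallback: number, key: string): number {
  if (value === undefined) return fallback;
  if (!Number.isFinite(value) || value <= 0) {
    throw new Error(`memory.${key} must be a positive number, got ${value}`);
  }
  return value;
}

function resolveMemoryConfig(raw: RawMemoryConfig = {}): ResolvedMemoryConfig {
  const ns = raw.user_namespace?.trim();
  return {
    userNamespace: ns ? ns : undefined, // empty string treated as unset
    workingMemoryTtlDays: positiveOr(raw.working_memory_ttl_days, 14, "working_memory_ttl_days"),
    workingMemoryMaxTokens: positiveOr(raw.working_memory_max_tokens, 1000, "working_memory_max_tokens"),
    proactiveSessionGreeting: raw.proactive_session_greeting ?? false,
  };
}
```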
Success Criteria
- Working memory survives a daemon restart and is injected on next session start.
- Switching channels (e.g. Telegram → web UI) injects the same user/working content.
- user_namespace absent = zero behavior change vs today (regression-safe).
- Compaction with user_namespace set writes to user/working on every run.
- Expired working memory is silently ignored without error.