docs(design): Pi-inspired personal assistant memory design

Two-tier memory model (working memory + long-term store) with a unified
user namespace across all channels. Addresses four gaps: cross-session
forgetting, compaction context loss, no proactive recall, and channel
fragmentation.

Key design decisions:
- user/working namespace written on every compaction (TTL-based expiry)
- user/profile + user/patterns as shared identity across channels
- Session-start injection before first turn (one-time, idempotent)
- Opt-in via memory.user_namespace config; default is unchanged behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
William Valentin
2026-02-25 12:23:49 -08:00
parent b5b4cb0a84
commit cc70c3e524
@@ -0,0 +1,262 @@
# Pi-Inspired Personal Assistant Memory Design
Date: 2026-02-25
Status: approved
Inspired by: [badlogic/pi-mono](https://github.com/badlogic/pi-mono)
Scope: Flynn-native implementation — no dependency on pi-agent-core
## Problem
Flynn's memory model has four concrete gaps that make it feel like a generic chatbot rather than a personal assistant:
1. **Forgets across sessions** — memory extraction runs but is unreliable; facts don't survive compaction consistently.
2. **Clunky compaction** — compaction summaries are generic and discarded after context trimming; important personal context is lost.
3. **No proactive recall** — `buildAdaptiveMemoryContext` exists but only scores by keyword overlap with the current message; it never surfaces context unprompted.
4. **Fragmented across channels** — Telegram, Discord, and the gateway each have isolated sessions with no shared sense of "you."
## Design Goals
- Pick up where the last conversation left off, across any channel
- Never lose recent context to compaction
- Stable facts (preferences, patterns) persist indefinitely
- All behavior gated behind config; default is current behavior (opt-in)
## Non-Goals (this phase)
- Multi-user deployments with per-user auth
- Proactive mid-session memory surfacing (beyond session start)
- Full vector/semantic replacement of adaptive injection
---
## Architecture
Two-tier memory structure added to the orchestrator:
```
Long-term store (existing)        Working memory (new)
  memory/user/profile      ←→     memory/user/working
  memory/user/patterns            (TTL: ~14 days)
  memory/sessions/...             (replaced per compaction)
          ↓                               ↓
  injected via                    injected wholesale
  adaptive scoring                at session start
  (keyword/vector match)          (always present if fresh)
```
**Long-term store** — existing `MemoryStore` namespaces, unchanged. Stable facts extracted from conversations, searched adaptively per-turn.
**Working memory** — a new `user/working` namespace written on every compaction. Acts as a "what's been happening lately" snapshot. Injected in full at session start. Expires after N days (default 14).
**Unified user namespace** — a canonical `user/*` tree shared across all channels, replacing today's session-scoped isolation.
---
## Section 1: Unified User Namespace
### Namespace Layout
```
memory/
  user/
    profile           ← stable facts: name, timezone, role, preferences
    patterns          ← recurring behaviors: working style, recurring topics
    working           ← rolling compaction summary (TTL-based)
  sessions/
    telegram:123/...  ← session-specific (unchanged, existing behavior)
    ws:abc/...
```
### Identity Model
A single `memory.user_namespace` config key (default: unset) ties all channels together. All channels on the Flynn instance with this config treat memory as belonging to one person. Unset = current session-scoped behavior, unchanged.
This is appropriate for personal assistant deployments (one person, many surfaces). Multi-user is out of scope.
### Config
```yaml
memory:
  user_namespace: "user" # enables shared identity; absent = session-scoped (current)
```
### Extraction Routing
When `user_namespace` is set:
- Stable extracted facts → `user/profile`, `user/patterns`
- Compaction summary → `user/working`
- Session-specific context → `sessions/<id>/...` (unchanged)
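The routing rules above can be sketched as a single lookup. This is a minimal illustration, not Flynn's actual extraction code: `routeExtraction`, `ExtractedItemKind`, and `RoutingConfig` are hypothetical names, and it assumes the value of `user_namespace` is used as the root of the shared tree and that, when the key is unset, the compaction summary is simply not persisted.

```typescript
// Hypothetical types and function; only the namespace paths come from the design.
type ExtractedItemKind = "profile" | "patterns" | "compaction_summary" | "session";

interface RoutingConfig {
  user_namespace?: string; // unset keeps today's session-scoped behavior
}

// Returns the namespace to write to, or null when the item is not persisted.
function routeExtraction(
  kind: ExtractedItemKind,
  sessionId: string,
  config: RoutingConfig,
): string | null {
  const sessionNamespace = `sessions/${sessionId}`;
  const root = config.user_namespace;

  if (!root) {
    // Assumption: without a shared namespace the compaction summary is
    // discarded and everything else stays in the per-session namespace.
    return kind === "compaction_summary" ? null : sessionNamespace;
  }

  switch (kind) {
    case "profile":
      return `${root}/profile`;
    case "patterns":
      return `${root}/patterns`;
    case "compaction_summary":
      return `${root}/working`;
    case "session":
      return sessionNamespace;
  }
}
```

Keeping the decision in one pure function would make the "unset means unchanged" guarantee easy to cover with a regression test.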
---
## Section 2: Working Memory Layer
### Storage
`user/working` namespace in the existing `MemoryStore` (flat file, no new storage engine). File format:
```
# Working Memory
Updated: 2026-02-25T11:30:00Z
Expires: 2026-03-11T11:30:00Z
[compaction summary content]
```
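As a rough sketch of how such a file could be produced, the helper below renders the header and body from a summary, a write time, and a TTL in days. The names `formatWorkingMemory` and `toIsoSeconds` are hypothetical. The layout (title line, `Updated`, `Expires`, then the summary) follows the format above, and `Expires` is assumed to be `Updated` plus the TTL.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Render a UTC timestamp without milliseconds, e.g. 2026-02-25T11:30:00Z.
function toIsoSeconds(d: Date): string {
  return d.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Hypothetical helper: builds the working-memory file text.
// Assumption: Expires = Updated + ttlDays.
function formatWorkingMemory(summary: string, updated: Date, ttlDays: number): string {
  const expires = new Date(updated.getTime() + ttlDays * MS_PER_DAY);
  return [
    "# Working Memory",
    `Updated: ${toIsoSeconds(updated)}`,
    `Expires: ${toIsoSeconds(expires)}`,
    summary.trim(),
  ].join("\n");
}
```

Storing the absolute expiry in the file, rather than only the write time, means a later change to `working_memory_ttl_days` does not retroactively alter files that were already written.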
### Lifecycle
| Event | Action |
|---|---|
| Compaction runs | Write summary to `user/working`, replacing previous content |
| Session starts | Read `user/working`; inject if `Expires` is in the future |
| `Expires` in the past | File is ignored; overwritten on next compaction |
| Memory store not configured | Entire feature is a no-op |
No background cleanup job required — expiry is checked lazily on read.
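A lazy expiry check of this kind might look like the sketch below. `readFreshWorkingMemory` is a hypothetical name. It takes the raw file text (or `null` when the file does not exist) and assumes the header layout from the Storage section, with the summary body following the `Expires:` line.

```typescript
const EXPIRES_PREFIX = "Expires: ";

// Returns the summary body if the file exists and has not expired, else null.
// Malformed files are treated the same as missing ones.
function readFreshWorkingMemory(fileText: string | null, now: Date): string | null {
  if (fileText === null) return null;

  const lines = fileText.split("\n");
  const expiresLine = lines.find((line) => line.startsWith(EXPIRES_PREFIX));
  if (expiresLine === undefined) return null;

  const expiresAt = Date.parse(expiresLine.slice(EXPIRES_PREFIX.length).trim());
  if (Number.isNaN(expiresAt) || expiresAt <= now.getTime()) return null;

  // Everything after the Expires line is the compaction summary.
  const body = lines.slice(lines.indexOf(expiresLine) + 1).join("\n").trim();
  return body.length > 0 ? body : null;
}
```

Because the check runs only on read, an expired file costs nothing until the next session start, and the next compaction overwrites it anyway.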
### Size Budget
Working memory is capped at `working_memory_max_tokens` (default 1000 tokens). If the compaction summary exceeds the budget, it is truncated before writing, which keeps injection overhead predictable.
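One possible shape for the truncation step is sketched below. The design does not say how tokens are counted, so this example assumes a crude four-characters-per-token estimate. `truncateToTokenBudget` and `APPROX_CHARS_PER_TOKEN` are hypothetical, and a real implementation would presumably use the model's tokenizer instead.

```typescript
// Assumption: roughly 4 characters per token. This is a heuristic only.
const APPROX_CHARS_PER_TOKEN = 4;

// Hypothetical helper: clips a summary so its estimated size fits the budget.
function truncateToTokenBudget(summary: string, maxTokens: number): string {
  const maxChars = maxTokens * APPROX_CHARS_PER_TOKEN;
  if (summary.length <= maxChars) return summary;

  const clipped = summary.slice(0, maxChars);
  // Prefer cutting at the last complete line so no bullet is left half-written.
  const lastBreak = clipped.lastIndexOf("\n");
  return (lastBreak > 0 ? clipped.slice(0, lastBreak) : clipped).trim();
}
```

Cutting from the end keeps whatever the summarizer wrote first, so this sketch only makes sense if the compaction prompt puts the most important items at the top.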
### Config
```yaml
memory:
  working_memory_ttl_days: 14 # expiry window; default 14
  working_memory_max_tokens: 1000 # injection size cap; default 1000
```
---
## Section 3: Compaction → Working Memory Flow
### Current Flow
```
history exceeds threshold
→ compactHistory() produces summary string
→ summary replaces trimmed messages in session history
→ summary string discarded
```
### New Flow
```
history exceeds threshold
→ compactHistory() produces summary string
→ summary replaces trimmed messages in session history
→ summary written to user/working (replaces previous)
→ memory extraction writes facts to user/profile + user/patterns
```
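The new flow can be illustrated with a small, synchronous sketch. Everything here is hypothetical scaffolding: `Turn`, `NamespaceStore`, and `compactAndPersist` are invented for the example, the `summarize` callback stands in for the real `compactHistory()` (whose signature is not shown in this document), and the fact-extraction step is omitted.

```typescript
interface Turn {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical minimal view of the memory store.
interface NamespaceStore {
  write(namespace: string, content: string): void;
}

// Compacts all but the last `keepLast` turns and, when a shared namespace is
// configured, also persists the summary to <userNamespace>/working.
function compactAndPersist(
  history: Turn[],
  keepLast: number,
  summarize: (trimmed: Turn[]) => string,
  store: NamespaceStore | null,
  userNamespace: string | undefined,
): Turn[] {
  const cut = Math.max(0, history.length - keepLast);
  if (cut === 0) return history; // nothing to trim

  const summary = summarize(history.slice(0, cut));
  const summaryTurn: Turn = { role: "system", content: summary };
  const compacted: Turn[] = [summaryTurn, ...history.slice(cut)];

  // New step: keep the summary instead of discarding it.
  if (store !== null && userNamespace) {
    store.write(`${userNamespace}/working`, summary);
  }
  return compacted;
}
```

When either the store or the namespace is missing, the function behaves like the current flow, which matches the opt-in requirement.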
### Compaction Prompt Change
Today's compaction uses a generic summarization prompt. Add a personal-assistant-focused variant that explicitly captures:
- What the user was working on and current status
- Decisions made and their outcomes
- Preferences or constraints the user expressed
- Open threads and follow-up items
This makes `user/working` genuinely useful as a "picking up where we left off" snapshot rather than a generic recap.
The improved prompt is only used when `user_namespace` is set. Existing generic compaction is unchanged otherwise.
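A sketch of the prompt selection follows. The prompt wording is illustrative only: `GENERIC_COMPACTION_PROMPT` is a placeholder rather than Flynn's existing prompt, and `selectCompactionPrompt` is a hypothetical helper. Only the four capture points are taken from the list above.

```typescript
// Placeholder text; the real generic prompt is not shown in this document.
const GENERIC_COMPACTION_PROMPT = "Summarize the conversation so far.";

// Illustrative wording built from the four capture points in the design.
const PERSONAL_ASSISTANT_COMPACTION_PROMPT = [
  "Summarize the conversation so the assistant can pick up where it left off.",
  "Capture:",
  "- What the user was working on and its current status",
  "- Decisions made and their outcomes",
  "- Preferences or constraints the user expressed",
  "- Open threads and follow-up items",
].join("\n");

// The personal-assistant variant is used only when user_namespace is set.
function selectCompactionPrompt(userNamespace: string | undefined): string {
  return userNamespace ? PERSONAL_ASSISTANT_COMPACTION_PROMPT : GENERIC_COMPACTION_PROMPT;
}
```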
---
## Section 4: Session Start Injection
### Injection Point
A new one-time `_injectSessionContext()` call in the orchestrator, triggered before the first user message of a new session. Separate from the existing per-turn `_injectMemoryContext()`.
### Injection Order in System Prompt
```
[base system prompt — SOUL.md / IDENTITY.md / etc.]
--- Who you're talking to ---
[user/profile content] ← always injected if present
--- Recent context ---
[user/working content] ← injected if not expired
[adaptive per-turn memory injection — unchanged, runs every turn]
```
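The ordering above could be assembled roughly as follows. `buildSessionStartPrompt` is a hypothetical helper. It assumes the caller has already read `user/profile` and the non-expired `user/working` content, passing `null` for anything missing, and it leaves the per-turn adaptive injection to the existing code path.

```typescript
// Builds the session-start portion of the system prompt.
// Missing or expired blocks are passed as null and skipped.
function buildSessionStartPrompt(
  basePrompt: string,
  profile: string | null,
  freshWorkingMemory: string | null,
): string {
  const sections: string[] = [basePrompt];
  if (profile) {
    sections.push(`--- Who you're talking to ---\n${profile}`);
  }
  if (freshWorkingMemory) {
    sections.push(`--- Recent context ---\n${freshWorkingMemory}`);
  }
  return sections.join("\n\n");
}
```

With both inputs `null`, the result is the base prompt unchanged, so the same function also covers the case where the feature is disabled.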
### Idempotency
Session-start injection is tracked by a boolean flag on the orchestrator instance. Reconnects to the same session ID do not re-inject.
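A minimal sketch of the flag-based guard is shown below. `SessionContextInjector` is a hypothetical stand-in for the relevant part of the orchestrator, and the `buildContext` callback represents whatever reads `user/profile` and `user/working`.

```typescript
// Hypothetical stand-in for the orchestrator's session-start bookkeeping.
class SessionContextInjector {
  private injected = false;

  // Returns the context on the first call and null on every later call,
  // so reconnects to the same session never inject twice.
  injectOnce(buildContext: () => string | null): string | null {
    if (this.injected) return null;
    this.injected = true; // set even if there turns out to be nothing to inject
    return buildContext();
  }
}
```

Setting the flag before building the context means a session that started with nothing to inject will not pick up working memory mid-session. That is one reading of "one-time", and it would need confirming against the real orchestrator.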
### Graceful Degradation
| Condition | Behavior |
|---|---|
| No `user/profile` file | Skip block silently |
| `user/working` expired | Skip block, log at debug level |
| Memory store not configured | Entire feature no-ops |
| `user_namespace` not set | Current behavior, unchanged |
### Optional Proactive Greeting
When `proactive_session_greeting: true`, include a system instruction on the first turn:
> "If relevant, briefly acknowledge what the user was last working on before responding to their first message."
Off by default. Gives the Pi-like "picking up the thread" feel when enabled.
```yaml
memory:
  proactive_session_greeting: false # default off
```
---
## Section 5: Cross-Channel Identity
No per-channel plumbing needed. All channels share the orchestrator config. When `user_namespace` is set, every channel reads/writes `user/*` automatically.
**First message on a new channel** — if the user switches from web UI to Telegram, `user/working` from web UI sessions is already present. The Telegram session injects it at session start, before its first turn. This is the intended behavior.
---
## File-Level Change Summary
| File | Change |
|---|---|
| `src/memory/workingMemory.ts` | **New** — read/write/expiry logic for `user/working` |
| `src/memory/store.ts` | Add `writeWithMetadata()` supporting timestamped/expiry headers |
| `src/context/compaction.ts` | Add personal-assistant-focused compaction prompt option |
| `src/backends/native/orchestrator.ts` | Session-start injection + write working memory after compaction |
| `src/config/schema.ts` | New fields: `user_namespace`, `working_memory_ttl_days`, `working_memory_max_tokens`, `proactive_session_greeting` |
| `src/daemon/index.ts` | Pass user namespace config through to orchestrator |
---
## Config Reference (full)
```yaml
memory:
  # Shared identity namespace. When set, all channels share user/* memory.
  # Absent (default) = current session-scoped behavior, unchanged.
  user_namespace: "user"
  # How long working memory stays valid after the last compaction.
  working_memory_ttl_days: 14
  # Token budget for working memory injection at session start.
  working_memory_max_tokens: 1000
  # If true, instruct the model to acknowledge prior context on session start.
  proactive_session_greeting: false
```
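For illustration, the defaults listed above could be applied with a small resolver like the one below. The interfaces and `resolveMemoryConfig` are hypothetical and independent of whatever validation `src/config/schema.ts` actually uses. Only the key names and default values come from this document.

```typescript
// Raw values as they might arrive from the parsed YAML `memory:` block.
interface MemoryConfigInput {
  user_namespace?: string;
  working_memory_ttl_days?: number;
  working_memory_max_tokens?: number;
  proactive_session_greeting?: boolean;
}

interface ResolvedMemoryConfig {
  user_namespace: string | undefined; // undefined = session-scoped behavior
  working_memory_ttl_days: number;
  working_memory_max_tokens: number;
  proactive_session_greeting: boolean;
}

// Fills in the documented defaults; user_namespace has no default (opt-in).
function resolveMemoryConfig(input: MemoryConfigInput = {}): ResolvedMemoryConfig {
  return {
    user_namespace: input.user_namespace,
    working_memory_ttl_days: input.working_memory_ttl_days ?? 14,
    working_memory_max_tokens: input.working_memory_max_tokens ?? 1000,
    proactive_session_greeting: input.proactive_session_greeting ?? false,
  };
}
```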
---
## Success Criteria
1. Working memory survives a daemon restart and is injected on next session start.
2. Switching channels (e.g. Telegram → web UI) injects the same `user/working` content.
3. `user_namespace` absent = zero behavior change vs today (regression-safe).
4. Compaction with `user_namespace` set writes to `user/working` on every run.
5. Expired working memory is silently ignored without error.