Add Agentic RAG design document

Design for extending Claude agent system with semantic search:
- Two indexes: personal (state files) + external docs
- ChromaDB + sentence-transformers stack
- rag-search skill with search.py CLI
- Daily systemd timer for index refresh
- Ralph loop implementation with Haiku/Sonnet delegation

Added future considerations (fc-043 to fc-046):
- Auto-sync on tool version change
- Broad doc indexing
- K8s deployment
- Query caching

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 14:08:00 -08:00
parent 4fe8957482
commit c21b152de8
2 changed files with 389 additions and 1 deletions
--- a/docs/plans/2025-01-21-agentic-rag-design.md
+++ b/docs/plans/2025-01-21-agentic-rag-design.md
@@ -0,0 +1,388 @@
+# Agentic RAG Design
+
+**Date:** 2025-01-21
+**Status:** Ready for implementation
+**Category:** Agent memory / Knowledge retrieval
+
+## Overview
+
+Add semantic search to the existing Claude agent system, enabling multi-source reasoning that combines personal context (state files, memory, decisions) with external documentation.
+
+### Goals
+
+- Retrieve relevant past decisions and preferences when answering questions
+- Search external docs (k0s, ArgoCD, Prometheus, etc.) for technical reference
+- Cross-reference personal context with official documentation
+- Support iterative query refinement (agentic behavior)
+
+### Non-Goals (Future Considerations)
+
+Deferred to `future-considerations.json`:
+
+- **fc-043**: Auto-sync on tool version change
+- **fc-044**: Broad doc indexing (hundreds of sources)
+- **fc-045**: K8s deployment
+- **fc-046**: Query caching
+
+## Architecture
+
+```
+User question
+     │
+     ▼
+Personal Assistant (existing)
+     │
+     ├── Decides if RAG would help
+     │
+     ▼
+rag-search skill (new)
+     │
+     ├── Query embedding
+     ├── Vector similarity search
+     ├── Return ranked chunks with metadata
+     │
+     ▼
+Claude reasons over results
+     │
+     ├── Good enough? → Answer
+     └── Need more? → Reformulate, search again
+```
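+
+The search/reason/reformulate loop in the diagram can be sketched as follows. This is a minimal sketch, not the skill's actual implementation: `agentic_search`, `min_score`, and `max_rounds` are hypothetical names, and the numeric threshold only stands in for Claude's own judgment of whether the results are good enough.
+
+```python
+from typing import Callable, Dict, List
+
+Result = Dict[str, object]
+
+
+def agentic_search(
+    question: str,
+    search: Callable[[str], List[Result]],
+    reformulate: Callable[[str, List[Result]], str],
+    min_score: float = 0.8,  # assumed relevance threshold
+    max_rounds: int = 3,     # assumed cap on reformulations
+) -> List[Result]:
+    """Search, stop if the top result is good enough, otherwise reformulate."""
+    query = question
+    best: List[Result] = []
+    for _ in range(max_rounds):
+        results = search(query)
+        # Keep the strongest result set seen so far.
+        if results and (not best or results[0]["score"] > best[0]["score"]):
+            best = results
+        # "Good enough?" -> stop and answer.
+        if best and best[0]["score"] >= min_score:
+            break
+        # "Need more?" -> reformulate and search again.
+        query = reformulate(query, results)
+    return best
+```
+
+Capping the number of rounds keeps a poorly matched question from looping indefinitely; the best result set seen so far is returned either way.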
+
+### Two Indexes
+
+| Index | Contents | Update Frequency |
+|-------|----------|------------------|
+| **personal** | `~/.claude/state/` files, memory, decisions, preferences | Daily |
+| **docs** | External documentation (k0s, ArgoCD, etc.) | Daily |
+
+### Why Two Indexes
+
+- Different update triggers (both refresh daily, but docs are also re-indexed manually after a version upgrade)
+- Different retrieval strategies (personal may weight recency)
+- Can query one or both depending on the question
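+
+One way the personal index might weight recency is an exponential decay applied to the similarity score. This is a sketch under assumptions: the function names and the 90-day half-life are hypothetical, and it relies on each personal result carrying an ISO `metadata.date` as in the output format of this design.
+
+```python
+from datetime import date
+from typing import Dict, List
+
+
+def recency_weighted(score: float, chunk_date: date, today: date,
+                     half_life_days: float = 90.0) -> float:
+    """Scale a similarity score by an exponential decay on the chunk's age."""
+    age_days = max((today - chunk_date).days, 0)
+    return score * 0.5 ** (age_days / half_life_days)
+
+
+def rerank_personal(results: List[Dict], today: date,
+                    half_life_days: float = 90.0) -> List[Dict]:
+    """Sort personal results by recency-weighted score, highest first."""
+    def weighted(result: Dict) -> float:
+        chunk_date = date.fromisoformat(result["metadata"]["date"])
+        return recency_weighted(result["score"], chunk_date, today, half_life_days)
+
+    return sorted(results, key=weighted, reverse=True)
+```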
+
+## Components
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        rag-search skill                         │
+│                     (Claude invokes this)                       │
+└─────────────────────┬───────────────────────────────────────────┘
+                      │
+        ┌─────────────┴─────────────┐
+        ▼                           ▼
+┌───────────────────┐     ┌───────────────────┐
+│  Personal Index   │     │    Docs Index     │
+│                   │     │                   │
+│ ~/.claude/state/* │     │ External docs     │
+│ memory/*.json     │     │ (k0s, ArgoCD...)  │
+│ kb.json           │     │                   │
+└────────┬──────────┘     └────────┬──────────┘
+         │                         │
+         └──────────┬──────────────┘
+                    ▼
+         ┌───────────────────┐
+         │   Vector Store    │
+         │   (ChromaDB)      │
+         │                   │
+         │ Collections:      │
+         │  - personal       │
+         │  - docs           │
+         └────────┬──────────┘
+                  │
+                  ▼
+         ┌───────────────────┐
+         │  Embedding Model  │
+         │  (sentence-       │
+         │   transformers)   │
+         └───────────────────┘
+```
+
+### Stack
+
+| Component | Choice | Notes |
+|-----------|--------|-------|
+| Vector store | ChromaDB | Embedded, runs in-process; no separate server needed |
+| Embeddings | sentence-transformers (all-MiniLM-L6-v2) | Runs on arm64, ~90MB |
+| Storage | `~/.claude/data/rag-search/` | Local to workstation |
+
+## Skill Structure
+
+**Location:** `~/.claude/skills/rag-search/`
+
+```
+rag-search/
+├── SKILL.md              # Instructions for Claude
+├── scripts/
+│   ├── search.py         # Main search entry point
+│   ├── index_personal.py # Index state files
+│   ├── index_docs.py     # Index external docs
+│   └── add_doc_source.py # Add new doc source
+└── references/
+    └── sources.json      # Configured doc sources
+```
+
+## Skill Interface
+
+### Invocation
+
+```bash
+# Basic search (both indexes)
+~/.claude/skills/rag-search/scripts/search.py "how did I configure ArgoCD sync?"
+
+# Search specific index
+~/.claude/skills/rag-search/scripts/search.py --index personal "past decisions about caching"
+~/.claude/skills/rag-search/scripts/search.py --index docs "k0s node maintenance"
+
+# Control result count
+~/.claude/skills/rag-search/scripts/search.py --top-k 10 "prometheus alerting rules"
+```
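+
+The flags shown above could be parsed with `argparse` roughly as follows. This is a sketch, not the actual `search.py`: `build_parser` and `collections_for` are hypothetical helpers, and the default of 5 results is an assumption, since the design does not state one.
+
+```python
+import argparse
+from typing import List, Optional
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        prog="search.py",
+        description="Semantic search over the personal and docs indexes",
+    )
+    parser.add_argument("query", help="natural-language query")
+    parser.add_argument(
+        "--index",
+        choices=["personal", "docs"],
+        default=None,
+        help="restrict the search to one index (default: both)",
+    )
+    parser.add_argument(
+        "--top-k",
+        type=int,
+        default=5,  # assumed default, not specified in the design
+        help="number of results to return",
+    )
+    return parser
+
+
+def collections_for(index: Optional[str]) -> List[str]:
+    """Map the --index flag to the collections that should be queried."""
+    return [index] if index else ["personal", "docs"]
+```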
+
+### Output Format
+
+```json
+{
+  "query": "how did I configure ArgoCD sync?",
+  "results": [
+    {
+      "rank": 1,
+      "score": 0.847,
+      "source": "personal",
+      "file": "memory/decisions.json",
+      "chunk": "Decided to use ArgoCD auto-sync with self-heal disabled...",
+      "metadata": {"date": "2025-01-15", "context": "k8s setup"}
+    },
+    {
+      "rank": 2,
+      "score": 0.823,
+      "source": "docs",
+      "file": "argocd/sync-options.md",
+      "chunk": "Auto-sync can be configured with selfHeal and prune options...",
+      "metadata": {"doc_version": "2.9", "url": "https://..."}
+    }
+  ],
+  "searched_collections": ["personal", "docs"],
+  "total_chunks_searched": 1847
+}
+```
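+
+A response of this shape could be assembled by merging the hits from each collection and ranking them by score. This is a minimal sketch: `build_response` is a hypothetical helper, and it assumes each hit already carries a higher-is-better `score` plus the `file`, `chunk`, and `metadata` fields shown above.
+
+```python
+from typing import Dict, List
+
+
+def build_response(query: str, hits_by_collection: Dict[str, List[dict]],
+                   top_k: int, total_chunks: int) -> dict:
+    """Merge per-collection hits into one ranked result list."""
+    merged = []
+    for source, hits in hits_by_collection.items():
+        for hit in hits:
+            # Tag every hit with the collection it came from.
+            merged.append({**hit, "source": source})
+    merged.sort(key=lambda hit: hit["score"], reverse=True)
+
+    results = [
+        {
+            "rank": rank,
+            "score": hit["score"],
+            "source": hit["source"],
+            "file": hit["file"],
+            "chunk": hit["chunk"],
+            "metadata": hit.get("metadata", {}),
+        }
+        for rank, hit in enumerate(merged[:top_k], start=1)
+    ]
+    return {
+        "query": query,
+        "results": results,
+        "searched_collections": list(hits_by_collection),
+        "total_chunks_searched": total_chunks,
+    }
+```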
+
+### SKILL.md Guidance
+
+- Start with a broad query; refine it if the results aren't relevant
+- Cross-reference personal decisions with docs when both appear
+- Cite sources in answers (file + date for personal, URL for docs)
+
+## External Docs Management
+
+### Source Registry
+
+**Location:** `~/.claude/skills/rag-search/references/sources.json`
+
+```json
+{
+  "sources": [
+    {
+      "id": "k0s",
+      "name": "k0s Documentation",
+      "type": "git",
+      "url": "https://github.com/k0sproject/k0s.git",
+      "path": "docs/",
+      "glob": "**/*.md",
+      "version": "v1.30.0",
+      "last_indexed": "2025-01-20T10:00:00Z"
+    },
+    {
+      "id": "argocd",
+      "name": "ArgoCD Documentation",
+      "type": "web",
+      "base_url": "https://argo-cd.readthedocs.io/en/stable/",
+      "pages": ["user-guide/sync-options/", "operator-manual/"],
+      "last_indexed": "2025-01-18T14:30:00Z"
+    }
+  ]
+}
+```
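+
+Before indexing, the registry could be checked for malformed entries. The sketch below is an assumption-laden illustration: `validate_sources` is a hypothetical helper, and the required fields per source type are inferred from the two example entries above rather than from a defined schema.
+
+```python
+from typing import List
+
+# Required keys per source type, inferred from the example entries (assumption).
+REQUIRED = {
+    "git": {"id", "name", "type", "url", "path", "glob"},
+    "web": {"id", "name", "type", "base_url", "pages"},
+}
+
+
+def validate_sources(registry: dict) -> List[str]:
+    """Return a list of problems; an empty list means the registry looks valid."""
+    problems = []
+    seen = set()
+    for position, source in enumerate(registry.get("sources", [])):
+        label = source.get("id", f"#{position}")
+        kind = source.get("type")
+        if kind not in REQUIRED:
+            problems.append(f"{label}: unknown type {kind!r}")
+            continue
+        missing = sorted(REQUIRED[kind] - set(source))
+        if missing:
+            problems.append(f"{label}: missing {', '.join(missing)}")
+        if label in seen:
+            problems.append(f"{label}: duplicate id")
+        seen.add(label)
+    return problems
+```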
+
+### Adding Sources
+
+```bash
+~/.claude/skills/rag-search/scripts/add_doc_source.py \
+  --id "cilium" \
+  --name "Cilium Docs" \
+  --type git \
+  --url "https://github.com/cilium/cilium.git" \
+  --path "Documentation/" \
+  --glob "**/*.md"
+
+# Then index it
+~/.claude/skills/rag-search/scripts/index_docs.py --source cilium
+```
+
+### Update Strategies
+
+| Strategy | Command | When |
+|----------|---------|------|
+| Manual | `index_docs.py --source <id>` | After version upgrade |
+| All sources | `index_docs.py --all` | Periodic refresh |
+
+## Periodic Refresh
+
+A daily systemd user timer on the workstation refreshes both indexes.
+
+### Service
+
+**Location:** `~/.config/systemd/user/rag-index.service`
+
+```ini
+[Unit]
+Description=Refresh RAG search indexes
+After=network-online.target
+
+[Service]
+Type=oneshot
+ExecStart=%h/.claude/skills/rag-search/scripts/index_docs.py --all --quiet
+ExecStartPost=%h/.claude/skills/rag-search/scripts/index_personal.py --quiet
+Environment=PATH=%h/.claude/skills/rag-search/venv/bin:/usr/bin
+
+[Install]
+WantedBy=default.target
+```
+
+### Timer
+
+**Location:** `~/.config/systemd/user/rag-index.timer`
+
+```ini
+[Unit]
+Description=Daily RAG index refresh
+
+[Timer]
+OnCalendar=daily
+Persistent=true
+RandomizedDelaySec=3600
+
+[Install]
+WantedBy=timers.target
+```
+
+### Enable
+
+```bash
+systemctl --user daemon-reload
+systemctl --user enable --now rag-index.timer
+```
+
+### Manual Trigger
+
+```bash
+systemctl --user start rag-index.service
+journalctl --user -u rag-index.service  # View logs
+```
+
+## Resource Requirements
+
+**Target:** Workstation or Pi5 8GB
+
+| Component | RAM | Disk | Notes |
+|-----------|-----|------|-------|
+| Embedding model (all-MiniLM-L6-v2) | ~256MB | ~90MB | Loaded on-demand |
+| ChromaDB | ~100-500MB | Varies | Scales with index size |
+| Index: personal (~50 files) | — | ~5MB | Small, fast to query |
+| Index: docs (10-20 sources) | — | ~100-500MB | Depends on doc volume |
+| Indexing process (peak) | ~1GB | — | During embedding generation |
+
+**Pi3 1GB:** Not suitable for this workload; the indexing peak (~1GB) alone would exhaust its RAM.
+
+## Chunking Strategy
+
+| Index | Strategy |
+|-------|----------|
+| Personal | Per JSON key or logical section (decisions, preferences, facts as separate chunks) |
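+| Docs | ~500 tokens per chunk with overlap, preserve headers as metadata (all-MiniLM-L6-v2 truncates input beyond 256 word pieces by default, so chunk size may need tuning) |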
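+
+Both strategies can be sketched in a few lines. This is an illustration only: `chunk_words` and `chunk_json` are hypothetical helpers, whitespace-separated words stand in for model tokens, and the 50-word overlap is an assumed value, since the design does not specify one.
+
+```python
+import json
+from typing import Dict, List
+
+
+def chunk_words(text: str, size: int = 500, overlap: int = 50) -> List[str]:
+    """Split text into windows of up to `size` words that share `overlap` words."""
+    if not 0 <= overlap < size:
+        raise ValueError("overlap must be >= 0 and smaller than size")
+    words = text.split()
+    step = size - overlap
+    chunks = []
+    for start in range(0, len(words), step):
+        chunks.append(" ".join(words[start:start + size]))
+        # Stop once a window reaches the end of the text.
+        if start + size >= len(words):
+            break
+    return chunks
+
+
+def chunk_json(data: dict, file: str) -> List[Dict[str, str]]:
+    """Emit one chunk per top-level JSON key, tagged with its file and key."""
+    return [
+        {"file": file, "key": key, "chunk": json.dumps(value, ensure_ascii=False)}
+        for key, value in data.items()
+    ]
+```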
+
+## Implementation Notes
+
+### Recommended: Ralph Loop
+
+This design is suitable for Ralph loop implementation:
+- Clear success criteria (tests, functional checks)
+- Iterative refinement expected (tuning chunking, embeddings)
+- Automatic verification possible
+
+### Model Delegation
+
+Use appropriate models for each phase:
+
+| Phase | Task | Model |
+|-------|------|-------|
+| 1 | Set up ChromaDB + embedding model | Haiku |
+| 2 | Write `index_personal.py` | Sonnet |
+| 3 | Write `index_docs.py` | Sonnet |
+| 4 | Write `search.py` | Sonnet |
+| 5 | Write SKILL.md | Haiku |
+| 6 | Integration tests | Sonnet |
+| 7 | End-to-end validation | Sonnet |
+
+### Ralph Invocation
+
+```bash
+/ralph-loop "Implement rag-search skill per docs/plans/2025-01-21-agentic-rag-design.md.
+
+Delegate to appropriate models:
+- Haiku: setup, docs, simple scripts
+- Sonnet: implementation, tests, debugging
+- Opus: only if stuck on complex reasoning
+
+Success criteria:
+1. ChromaDB + embeddings working
+2. Personal index populated from ~/.claude/state
+3. At least one external doc source indexed
+4. search.py returns relevant results
+5. All tests pass
+
+Output <promise>COMPLETE</promise> when done." --max-iterations 30 --completion-promise "COMPLETE"
+```
+
+### When NOT to use Ralph
+
+- Design decisions still needed (use brainstorming first)
+- Requires human judgment mid-implementation
+- One-shot simple tasks
+
+## Workflow Integration
+
+```
+/superpowers:brainstorm
+        │
+        ▼
+   Design doc created
+   (docs/plans/YYYY-MM-DD-*-design.md)
+        │
+        ▼
+   "Ready to implement?"
+        │
+   ┌────┴────┐
+   │         │
+   ▼         ▼
+ Simple    Complex/Iterative
+   │              │
+   ▼              ▼
+ Manual     /ralph-loop
+ or TDD     with design doc
+            as spec
+```
+
+## Summary
+
+| Aspect | Decision |
+|--------|----------|
+| **Architecture** | Extend existing Claude skill system with semantic search |
+| **Indexes** | Two: personal (state files) + docs (external) |
+| **Vector store** | ChromaDB (local, embedded; no separate server) |
+| **Embeddings** | sentence-transformers (all-MiniLM-L6-v2) |
+| **Skill interface** | `rag-search` skill with `search.py` CLI |
+| **Doc management** | `sources.json` registry, git/web fetching |
+| **Refresh** | systemd user timer, daily |
+| **Storage** | `~/.claude/data/rag-search/` |
+| **Hardware** | Runs on workstation (Pi5 8GB capable if needed) |
+| **Implementation** | Ralph loop with Haiku/Sonnet subagent delegation |
--- a/state/future-considerations.json
+++ b/state/future-considerations.json