Add Agentic RAG design document
Design for extending Claude agent system with semantic search:
- Two indexes: personal (state files) + external docs
- ChromaDB + sentence-transformers stack
- rag-search skill with search.py CLI
- Daily systemd timer for index refresh
- Ralph loop implementation with Haiku/Sonnet delegation

Added future considerations (fc-043 to fc-046):
- Auto-sync on tool version change
- Broad doc indexing
- K8s deployment
- Query caching

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
388
docs/plans/2025-01-21-agentic-rag-design.md
Normal file
@@ -0,0 +1,388 @@
# Agentic RAG Design

**Date:** 2025-01-21
**Status:** Ready for implementation
**Category:** Agent memory / Knowledge retrieval

## Overview

Add semantic search to the existing Claude agent system, enabling multi-source reasoning that combines personal context (state files, memory, decisions) with external documentation.

### Goals

- Retrieve relevant past decisions and preferences when answering questions
- Search external docs (k0s, ArgoCD, Prometheus, etc.) for technical reference
- Cross-reference personal context with official documentation
- Support iterative query refinement (agentic behavior)

### Non-Goals (Future Considerations)

Deferred to `future-considerations.json`:

- **fc-043**: Auto-sync on tool version change
- **fc-044**: Broad doc indexing (hundreds of sources)
- **fc-045**: K8s deployment
- **fc-046**: Query caching

## Architecture

```
User question
     │
     ▼
Personal Assistant (existing)
     │
     ├── Decides if RAG would help
     │
     ▼
rag-search skill (new)
     │
     ├── Query embedding
     ├── Vector similarity search
     ├── Return ranked chunks with metadata
     │
     ▼
Claude reasons over results
     │
     ├── Good enough? → Answer
     └── Need more? → Reformulate, search again
```
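
The "good enough / need more" branch at the bottom of the diagram can be sketched as a small loop. This is only an illustration: in the design Claude itself judges result quality, so the `min_score` threshold, the `max_rounds` cap, and the `search` / `reformulate` callables are hypothetical stand-ins rather than part of the planned skill.

```python
from typing import Callable

Result = dict  # one entry of the "results" array returned by the search step


def agentic_search(
    question: str,
    search: Callable[[str], list[Result]],
    reformulate: Callable[[str, list[Result]], str],
    min_score: float = 0.8,  # assumed stand-in for "Claude thinks this is good enough"
    max_rounds: int = 3,     # assumed cap on refinement rounds
) -> list[Result]:
    """Search, stop when the top hit looks good enough, otherwise reformulate."""
    query = question
    best: list[Result] = []
    for _ in range(max_rounds):
        results = search(query)
        # Keep the round whose top-ranked hit scored highest so far.
        if results and (not best or results[0]["score"] > best[0]["score"]):
            best = results
        if best and best[0]["score"] >= min_score:
            break  # "Good enough?" -> answer from these chunks
        query = reformulate(query, results)  # "Need more?" -> search again
    return best
```

Capping the number of rounds keeps a vague question from looping indefinitely; the best round seen so far is returned even if no round clears the threshold.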

### Two Indexes

| Index | Contents | Update Frequency |
|-------|----------|------------------|
| **personal** | `~/.claude/state/` files, memory, decisions, preferences | Daily |
| **docs** | External documentation (k0s, ArgoCD, etc.) | Daily |

### Why Two Indexes

- Update cadence can differ (both refresh daily for now; docs can also be re-indexed manually after a version upgrade)
- Different retrieval strategies (personal may weight recency)
- Can query one or both depending on the question
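
One way the "personal may weight recency" idea could work is to decay personal scores by age before merging them with docs hits. The sketch below assumes each personal result carries an ISO `metadata.date`; the exponential decay and the 90-day half-life are illustrative choices, not decisions made in this design.

```python
from datetime import date


def recency_weight(score: float, item_date: date, today: date,
                   half_life_days: float = 90.0) -> float:
    """Halve the similarity score for every `half_life_days` of age (assumed policy)."""
    age_days = max((today - item_date).days, 0)
    return score * 0.5 ** (age_days / half_life_days)


def merge_results(personal: list[dict], docs: list[dict], today: date) -> list[dict]:
    """Decay personal scores by age, then rank personal and docs hits together."""
    rescored = []
    for r in personal:
        raw = r.get("metadata", {}).get("date")
        if raw is None:
            rescored.append(dict(r))  # undated entries keep their raw score
        else:
            new_score = recency_weight(r["score"], date.fromisoformat(raw), today)
            rescored.append({**r, "score": new_score})
    merged = sorted(rescored + list(docs), key=lambda r: r["score"], reverse=True)
    # Re-number ranks after merging the two collections.
    return [{**r, "rank": i} for i, r in enumerate(merged, start=1)]
```

Docs results are left untouched here because their relevance is tied to the indexed version rather than to age.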

## Components

```
┌─────────────────────────────────────────────────────────────────┐
│                        rag-search skill                         │
│                      (Claude invokes this)                      │
└─────────────────────┬───────────────────────────────────────────┘
                      │
        ┌─────────────┴─────────────┐
        ▼                           ▼
┌───────────────────┐     ┌───────────────────┐
│  Personal Index   │     │    Docs Index     │
│                   │     │                   │
│ ~/.claude/state/* │     │ External docs     │
│ memory/*.json     │     │ (k0s, ArgoCD...)  │
│ kb.json           │     │                   │
└────────┬──────────┘     └────────┬──────────┘
         │                         │
         └──────────┬──────────────┘
                    ▼
           ┌───────────────────┐
           │   Vector Store    │
           │    (ChromaDB)     │
           │                   │
           │ Collections:      │
           │ - personal        │
           │ - docs            │
           └────────┬──────────┘
                    │
                    ▼
           ┌───────────────────┐
           │  Embedding Model  │
           │    (sentence-     │
           │   transformers)   │
           └───────────────────┘
```

### Stack

| Component | Choice | Notes |
|-----------|--------|-------|
| Vector store | ChromaDB | Embedded, runs in-process; no external services |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) | Runs on arm64, ~90MB |
| Storage | `~/.claude/data/rag-search/` | Local to workstation |

## Skill Structure

**Location:** `~/.claude/skills/rag-search/`

```
rag-search/
├── SKILL.md                  # Instructions for Claude
├── scripts/
│   ├── search.py             # Main search entry point
│   ├── index_personal.py     # Index state files
│   ├── index_docs.py         # Index external docs
│   └── add_doc_source.py     # Add new doc source
└── references/
    └── sources.json          # Configured doc sources
```

## Skill Interface

### Invocation

```bash
# Basic search (both indexes)
~/.claude/skills/rag-search/scripts/search.py "how did I configure ArgoCD sync?"

# Search specific index
~/.claude/skills/rag-search/scripts/search.py --index personal "past decisions about caching"
~/.claude/skills/rag-search/scripts/search.py --index docs "k0s node maintenance"

# Control result count
~/.claude/skills/rag-search/scripts/search.py --top-k 10 "prometheus alerting rules"
```
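
The flags above could be parsed with `argparse` roughly as follows. This is a sketch of the CLI surface only: the `both` choice value, the default of 5 for `--top-k`, and the `collections_for` helper are assumptions, since the design only states that omitting `--index` searches both indexes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the invocations shown above."""
    parser = argparse.ArgumentParser(
        prog="search.py",
        description="Semantic search over the personal and docs indexes",
    )
    parser.add_argument("query", help="natural-language query")
    parser.add_argument(
        "--index",
        choices=["personal", "docs", "both"],  # "both" is an assumed spelling
        default="both",
        help="which index to search (default: both)",
    )
    parser.add_argument(
        "--top-k",
        type=int,
        default=5,  # assumed default; the design does not specify one
        help="number of results to return",
    )
    return parser


def collections_for(index: str) -> list[str]:
    """Map the --index value to the ChromaDB collection names to query."""
    return ["personal", "docs"] if index == "both" else [index]
```

For example, `build_parser().parse_args(["--index", "docs", "k0s node maintenance"])` yields `index="docs"` and the default `top_k`.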

### Output Format

```json
{
  "query": "how did I configure ArgoCD sync?",
  "results": [
    {
      "rank": 1,
      "score": 0.847,
      "source": "personal",
      "file": "memory/decisions.json",
      "chunk": "Decided to use ArgoCD auto-sync with self-heal disabled...",
      "metadata": {"date": "2025-01-15", "context": "k8s setup"}
    },
    {
      "rank": 2,
      "score": 0.823,
      "source": "docs",
      "file": "argocd/sync-options.md",
      "chunk": "Auto-sync can be configured with selfHeal and prune options...",
      "metadata": {"doc_version": "2.9", "url": "https://..."}
    }
  ],
  "searched_collections": ["personal", "docs"],
  "total_chunks_searched": 1847
}
```

### SKILL.md Guidance

- Start with a broad query; refine it if the results aren't relevant
- Cross-reference personal decisions with docs when both appear
- Cite sources in answers (file + date for personal, URL for docs)
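
The citation rule in the last bullet can be derived mechanically from the output format. The helper names `cite` and `citations` below are hypothetical, and the fallbacks for a missing date or URL are assumptions; the field names come from the JSON example above.

```python
import json


def cite(result: dict) -> str:
    """Build a citation string for one entry of the "results" array."""
    meta = result.get("metadata", {})
    if result["source"] == "personal":
        # Personal hits: file plus date ("undated" is an assumed fallback).
        return f"{result['file']} ({meta.get('date', 'undated')})"
    # Docs hits: prefer the URL, fall back to the file path if it is absent.
    return meta.get("url") or result["file"]


def citations(raw_output: str) -> list[str]:
    """Parse the JSON printed by search.py and cite results in rank order."""
    results = json.loads(raw_output)["results"]
    return [cite(r) for r in sorted(results, key=lambda r: r["rank"])]
```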

## External Docs Management

### Source Registry

**Location:** `~/.claude/skills/rag-search/references/sources.json`

```json
{
  "sources": [
    {
      "id": "k0s",
      "name": "k0s Documentation",
      "type": "git",
      "url": "https://github.com/k0sproject/k0s.git",
      "path": "docs/",
      "glob": "**/*.md",
      "version": "v1.30.0",
      "last_indexed": "2025-01-20T10:00:00Z"
    },
    {
      "id": "argocd",
      "name": "ArgoCD Documentation",
      "type": "web",
      "base_url": "https://argo-cd.readthedocs.io/en/stable/",
      "pages": ["user-guide/sync-options/", "operator-manual/"],
      "last_indexed": "2025-01-18T14:30:00Z"
    }
  ]
}
```
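
Before indexing, the registry could be checked with a small loader like the one below. The required-field sets are inferred from the two example entries (`git` and `web`) and are an assumption, as is the `load_sources` name; the design does not define a formal schema.

```python
import json

# Fields each source type appears to need, inferred from the examples above.
REQUIRED_FIELDS = {
    "git": {"id", "name", "url", "path", "glob"},
    "web": {"id", "name", "base_url", "pages"},
}


def load_sources(text: str) -> dict[str, dict]:
    """Parse sources.json content and return entries keyed by source id."""
    sources: dict[str, dict] = {}
    for entry in json.loads(text)["sources"]:
        kind = entry.get("type")
        if kind not in REQUIRED_FIELDS:
            raise ValueError(f"unknown source type: {kind!r}")
        missing = REQUIRED_FIELDS[kind] - set(entry)
        if missing:
            raise ValueError(f"source {entry.get('id')!r} is missing {sorted(missing)}")
        if entry["id"] in sources:
            raise ValueError(f"duplicate source id: {entry['id']!r}")
        sources[entry["id"]] = entry
    return sources
```

Failing fast on a malformed entry keeps a typo in `sources.json` from silently dropping a source during the daily refresh.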

### Adding Sources

```bash
~/.claude/skills/rag-search/scripts/add_doc_source.py \
  --id "cilium" \
  --name "Cilium Docs" \
  --type git \
  --url "https://github.com/cilium/cilium.git" \
  --path "Documentation/" \
  --glob "**/*.md"

# Then index it
~/.claude/skills/rag-search/scripts/index_docs.py --source cilium
```

### Update Strategies

| Strategy | Command | When |
|----------|---------|------|
| Manual | `index_docs.py --source <id>` | After version upgrade |
| All sources | `index_docs.py --all` | Periodic refresh |

## Periodic Refresh

A systemd user timer on the workstation refreshes both indexes once a day.

### Service

**Location:** `~/.config/systemd/user/rag-index.service`

```ini
[Unit]
Description=Refresh RAG search indexes
After=network-online.target

[Service]
Type=oneshot
ExecStart=%h/.claude/skills/rag-search/scripts/index_docs.py --all --quiet
ExecStartPost=%h/.claude/skills/rag-search/scripts/index_personal.py --quiet
Environment=PATH=%h/.claude/skills/rag-search/venv/bin:/usr/bin

[Install]
WantedBy=default.target
```

### Timer

**Location:** `~/.config/systemd/user/rag-index.timer`

```ini
[Unit]
Description=Daily RAG index refresh

[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=3600

[Install]
WantedBy=timers.target
```

### Enable

```bash
systemctl --user daemon-reload
systemctl --user enable --now rag-index.timer
```

### Manual Trigger

```bash
systemctl --user start rag-index.service
journalctl --user -u rag-index.service  # View logs
```

## Resource Requirements

**Target:** Workstation or Pi5 8GB

| Component | RAM | Disk | Notes |
|-----------|-----|------|-------|
| Embedding model (all-MiniLM-L6-v2) | ~256MB | ~90MB | Loaded on-demand |
| ChromaDB | ~100-500MB | Varies | Scales with index size |
| Index: personal (~50 files) | n/a | ~5MB | Small, fast to query |
| Index: docs (10-20 sources) | n/a | ~100-500MB | Depends on doc volume |
| Indexing process (peak) | ~1GB | n/a | During embedding generation |

**Pi3 1GB:** Not suitable for this workload.

## Chunking Strategy

| Index | Strategy |
|-------|----------|
| Personal | Per JSON key or logical section (decisions, preferences, facts as separate chunks) |
| Docs | ~500 tokens per chunk with overlap, preserve headers as metadata |
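
Both strategies can be sketched in a few lines. The version below is a rough illustration: it treats whitespace-separated words as a stand-in for tokens, the 50-word overlap is an assumed value (the table only says "with overlap"), and header extraction for metadata is left out.

```python
import json


def chunk_words(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words
    with the previous window. Words approximate tokens here."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def chunk_json_keys(doc: dict) -> list[dict]:
    """Personal index: emit one chunk per top-level JSON key."""
    return [
        {"key": key, "text": json.dumps(value, ensure_ascii=False)}
        for key, value in doc.items()
    ]
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.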

## Implementation Notes

### Recommended: Ralph Loop

This design is suitable for Ralph loop implementation:
- Clear success criteria (tests, functional checks)
- Iterative refinement expected (tuning chunking, embeddings)
- Automatic verification possible

### Model Delegation

Use appropriate models for each phase:

| Phase | Task | Model |
|-------|------|-------|
| 1 | Set up ChromaDB + embedding model | Haiku |
| 2 | Write `index_personal.py` | Sonnet |
| 3 | Write `index_docs.py` | Sonnet |
| 4 | Write `search.py` | Sonnet |
| 5 | Write SKILL.md | Haiku |
| 6 | Integration tests | Sonnet |
| 7 | End-to-end validation | Sonnet |

### Ralph Invocation

```bash
/ralph-loop "Implement rag-search skill per docs/plans/2025-01-21-agentic-rag-design.md.

Delegate to appropriate models:
- Haiku: setup, docs, simple scripts
- Sonnet: implementation, tests, debugging
- Opus: only if stuck on complex reasoning

Success criteria:
1. ChromaDB + embeddings working
2. Personal index populated from ~/.claude/state
3. At least one external doc source indexed
4. search.py returns relevant results
5. All tests pass

Output <promise>COMPLETE</promise> when done." --max-iterations 30 --completion-promise "COMPLETE"
```

### When NOT to use Ralph

- Design decisions still needed (use brainstorming first)
- Requires human judgment mid-implementation
- One-shot simple tasks

## Workflow Integration

```
/superpowers:brainstorm
         │
         ▼
Design doc created
(docs/plans/YYYY-MM-DD-*-design.md)
         │
         ▼
"Ready to implement?"
         │
    ┌────┴────┐
    │         │
    ▼         ▼
 Simple    Complex/Iterative
    │         │
    ▼         ▼
 Manual    /ralph-loop
 or TDD    with design doc
           as spec
```

## Summary

| Aspect | Decision |
|--------|----------|
| **Architecture** | Extend existing Claude skill system with semantic search |
| **Indexes** | Two: personal (state files) + docs (external) |
| **Vector store** | ChromaDB (local, embedded; no external services) |
| **Embeddings** | sentence-transformers (all-MiniLM-L6-v2) |
| **Skill interface** | `rag-search` skill with `search.py` CLI |
| **Doc management** | `sources.json` registry, git/web fetching |
| **Refresh** | systemd user timer, daily |
| **Storage** | `~/.claude/data/rag-search/` |
| **Hardware** | Runs on workstation (Pi5 8GB capable if needed) |
| **Implementation** | Ralph loop with Haiku/Sonnet subagent delegation |
File diff suppressed because one or more lines are too long