Add Agentic RAG design document

Design for extending Claude agent system with semantic search:
- Two indexes: personal (state files) + external docs
- ChromaDB + sentence-transformers stack
- rag-search skill with search.py CLI
- Daily systemd timer for index refresh
- Ralph loop implementation with Haiku/Sonnet delegation

Added future considerations (fc-043 to fc-046):
- Auto-sync on tool version change
- Broad doc indexing
- K8s deployment
- Query caching

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
OpenCode Test
2026-01-04 14:08:00 -08:00
parent 4fe8957482
commit c21b152de8
2 changed files with 389 additions and 1 deletion


@@ -0,0 +1,388 @@
# Agentic RAG Design
**Date:** 2025-01-21
**Status:** Ready for implementation
**Category:** Agent memory / Knowledge retrieval
## Overview
Add semantic search to the existing Claude agent system, enabling multi-source reasoning that combines personal context (state files, memory, decisions) with external documentation.
### Goals
- Retrieve relevant past decisions and preferences when answering questions
- Search external docs (k0s, ArgoCD, Prometheus, etc.) for technical reference
- Cross-reference personal context with official documentation
- Support iterative query refinement (agentic behavior)
### Non-Goals (Future Considerations)
Deferred to `future-considerations.json`:
- **fc-043**: Auto-sync on tool version change
- **fc-044**: Broad doc indexing (hundreds of sources)
- **fc-045**: K8s deployment
- **fc-046**: Query caching
## Architecture
```
User question
      │
      ▼
Personal Assistant (existing)
      │
      ├── Decides if RAG would help
      │
      ▼
rag-search skill (new)
      │
      ├── Query embedding
      ├── Vector similarity search
      ├── Return ranked chunks with metadata
      │
      ▼
Claude reasons over results
      │
      ├── Good enough? → Answer
      └── Need more? → Reformulate, search again
```
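The search, judge, and reformulate cycle at the bottom of the diagram can be sketched as a bounded loop. This is a minimal sketch, not the skill's actual code: `agentic_search`, `is_good_enough`, and `reformulate` are hypothetical names, and in the real system Claude does the judging and rewriting itself. Modeling them as plain callables keeps the control flow testable, and `max_rounds` (an assumed parameter) keeps the loop from searching forever.

```python
from typing import Callable

# One search hit: a dict with at least "score" and "chunk" keys (assumed shape,
# mirroring the search.py output format).
Hit = dict


def agentic_search(
    question: str,
    search: Callable[[str], list[Hit]],
    is_good_enough: Callable[[list[Hit]], bool],
    reformulate: Callable[[str, list[Hit]], str],
    max_rounds: int = 3,
) -> tuple[list[Hit], list[str]]:
    """Search, judge the results, and reformulate until satisfied or out of rounds.

    Returns the last result set and every query that was tried, in order.
    """
    query = question
    tried: list[str] = []
    results: list[Hit] = []
    for _ in range(max_rounds):
        tried.append(query)
        results = search(query)
        if is_good_enough(results):
            break  # "Good enough? -> Answer"
        query = reformulate(query, results)  # "Need more? -> Reformulate, search again"
    return results, tried
```

Capping the rounds matters because reformulation has no natural stopping point; the last result set is returned even when no round was judged good enough.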
### Two Indexes
| Index | Contents | Update Frequency |
|-------|----------|------------------|
| **personal** | `~/.claude/state/` files, memory, decisions, preferences | Daily |
| **docs** | External documentation (k0s, ArgoCD, etc.) | Daily |
### Why Two Indexes
- Different update frequencies
- Different retrieval strategies (personal may weight recency)
- Can query one or both depending on the question
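Because personal results may be weighted by recency while docs results are not, a merge step has to adjust scores before ranking the two lists together. A sketch under stated assumptions: the exponential half-life decay and its 90-day default are illustrative choices, not part of the design, and the `date` key is the one shown in the personal result's metadata in the output example.

```python
from datetime import date
from typing import Optional


def recency_weight(chunk_date: Optional[str], today: date, half_life_days: float = 90.0) -> float:
    """Exponential decay: the weight halves every `half_life_days`. Undated chunks get 1.0."""
    if not chunk_date:
        return 1.0
    age_days = max((today - date.fromisoformat(chunk_date)).days, 0)
    return 0.5 ** (age_days / half_life_days)


def merge_results(personal: list[dict], docs: list[dict], top_k: int, today: date,
                  weight_personal_recency: bool = True) -> list[dict]:
    """Combine hits from both collections into one ranked list of at most top_k items."""
    combined = []
    for hit in personal:
        score = hit["score"]
        if weight_personal_recency:
            score *= recency_weight(hit.get("metadata", {}).get("date"), today)
        combined.append({**hit, "source": "personal", "score": round(score, 3)})
    for hit in docs:
        combined.append({**hit, "source": "docs"})  # docs scores are used unchanged
    combined.sort(key=lambda h: h["score"], reverse=True)
    return [{**h, "rank": i + 1} for i, h in enumerate(combined[:top_k])]
```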
## Components
```
┌─────────────────────────────────────────────┐
│              rag-search skill               │
│            (Claude invokes this)            │
└──────────────────────┬──────────────────────┘
                       │
          ┌────────────┴────────────┐
          ▼                         ▼
┌───────────────────┐     ┌───────────────────┐
│  Personal Index   │     │    Docs Index     │
│                   │     │                   │
│ ~/.claude/state/* │     │   External docs   │
│ memory/*.json     │     │ (k0s, ArgoCD...)  │
│ kb.json           │     │                   │
└─────────┬─────────┘     └─────────┬─────────┘
          │                         │
          └────────────┬────────────┘
                       ▼
             ┌───────────────────┐
             │   Vector Store    │
             │    (ChromaDB)     │
             │                   │
             │ Collections:      │
             │  - personal       │
             │  - docs           │
             └─────────┬─────────┘
                       │
                       ▼
             ┌───────────────────┐
             │  Embedding Model  │
             │    (sentence-     │
             │   transformers)   │
             └───────────────────┘
```
### Stack
| Component | Choice | Notes |
|-----------|--------|-------|
| Vector store | ChromaDB | Embedded mode (`PersistentClient`), no separate server to run |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) | Runs on arm64, ~90MB |
| Storage | `~/.claude/data/rag-search/` | Local to workstation |
## Skill Structure
**Location:** `~/.claude/skills/rag-search/`
```
rag-search/
├── SKILL.md # Instructions for Claude
├── scripts/
│ ├── search.py # Main search entry point
│ ├── index_personal.py # Index state files
│ ├── index_docs.py # Index external docs
│ └── add_doc_source.py # Add new doc source
└── references/
└── sources.json # Configured doc sources
```
## Skill Interface
### Invocation
```bash
# Basic search (both indexes)
~/.claude/skills/rag-search/scripts/search.py "how did I configure ArgoCD sync?"
# Search specific index
~/.claude/skills/rag-search/scripts/search.py --index personal "past decisions about caching"
~/.claude/skills/rag-search/scripts/search.py --index docs "k0s node maintenance"
# Control result count
~/.claude/skills/rag-search/scripts/search.py --top-k 10 "prometheus alerting rules"
```
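The invocation examples map onto a small `argparse` parser. This sketches argument handling only; the explicit `both` choice and the default `--top-k` of 5 are assumptions, since the examples only show that omitting `--index` searches both indexes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="search.py",
        description="Semantic search over the personal and docs indexes.",
    )
    parser.add_argument("query", help="Natural-language search query")
    parser.add_argument(
        "--index",
        choices=["personal", "docs", "both"],
        default="both",  # assumption: no --index means search both collections
        help="Which collection(s) to search",
    )
    parser.add_argument(
        "--top-k",
        type=int,
        default=5,  # assumption: the design does not specify a default result count
        help="Maximum number of results to return",
    )
    return parser


def collections_for(index: str) -> list[str]:
    """Translate the --index flag into the collection names to query."""
    return ["personal", "docs"] if index == "both" else [index]
```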
### Output Format
```json
{
"query": "how did I configure ArgoCD sync?",
"results": [
{
"rank": 1,
"score": 0.847,
"source": "personal",
"file": "memory/decisions.json",
"chunk": "Decided to use ArgoCD auto-sync with self-heal disabled...",
"metadata": {"date": "2025-01-15", "context": "k8s setup"}
},
{
"rank": 2,
"score": 0.823,
"source": "docs",
"file": "argocd/sync-options.md",
"chunk": "Auto-sync can be configured with selfHeal and prune options...",
"metadata": {"doc_version": "2.9", "url": "https://..."}
}
],
"searched_collections": ["personal", "docs"],
"total_chunks_searched": 1847
}
```
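ChromaDB's `collection.query()` returns parallel lists (`documents`, `metadatas`, `distances`), each nested once per query, so `search.py` needs a step that flattens them into the JSON shape above. A hedged sketch: it assumes the collections were created with cosine distance (so `score = 1 - distance`) and that each chunk's path is stored under a hypothetical `file` metadata key.

```python
def to_output(query: str, raw_by_collection: dict[str, dict], top_k: int,
              total_chunks: int) -> dict:
    """Convert per-collection query results into the search.py JSON shape.

    `raw_by_collection` maps a collection name to a dict shaped like a ChromaDB
    `collection.query()` result for a single query: parallel lists under
    "documents", "metadatas", and "distances", each wrapped in an outer list.
    Assumes cosine distance, so score = 1 - distance (higher is better).
    """
    hits = []
    for name, raw in raw_by_collection.items():
        docs, metas, dists = raw["documents"][0], raw["metadatas"][0], raw["distances"][0]
        for doc, meta, dist in zip(docs, metas, dists):
            meta = dict(meta)  # copy so popping "file" does not mutate the input
            hits.append({
                "score": round(1.0 - dist, 3),
                "source": name,
                "file": meta.pop("file", None),  # assumption: path stored in metadata
                "chunk": doc,
                "metadata": meta,
            })
    hits.sort(key=lambda h: h["score"], reverse=True)
    return {
        "query": query,
        "results": [{"rank": i + 1, **h} for i, h in enumerate(hits[:top_k])],
        "searched_collections": list(raw_by_collection),
        "total_chunks_searched": total_chunks,
    }
```

Converting distances to scores in one place keeps "higher is better" consistent across both collections, which the ranking relies on.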
### SKILL.md Guidance
- Start with broad query, refine if results aren't relevant
- Cross-reference personal decisions with docs when both appear
- Cite sources in answers (file + date for personal, URL for docs)
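The citation rule in the last bullet is mechanical enough to sketch. `cite` is a hypothetical helper that works on one result object from the output format; falling back to the file path when a docs chunk has no `url` is an assumption.

```python
def cite(hit: dict) -> str:
    """Format a citation: file + date for personal results, URL for docs results."""
    meta = hit.get("metadata", {})
    if hit["source"] == "personal":
        return f"{hit['file']} ({meta.get('date', 'undated')})"
    # Docs: prefer the original URL; fall back to the indexed file path (assumption).
    return meta.get("url") or hit["file"]
```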
## External Docs Management
### Source Registry
**Location:** `~/.claude/skills/rag-search/references/sources.json`
```json
{
"sources": [
{
"id": "k0s",
"name": "k0s Documentation",
"type": "git",
"url": "https://github.com/k0sproject/k0s.git",
"path": "docs/",
"glob": "**/*.md",
"version": "v1.30.0",
"last_indexed": "2025-01-20T10:00:00Z"
},
{
"id": "argocd",
"name": "ArgoCD Documentation",
"type": "web",
"base_url": "https://argo-cd.readthedocs.io/en/stable/",
"pages": ["user-guide/sync-options/", "operator-manual/"],
"last_indexed": "2025-01-18T14:30:00Z"
}
]
}
```
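Since the indexers read this file unattended, a loader that fails fast on malformed entries is worth having. A minimal sketch: the required keys per `type` are inferred from the two example entries, and `validate_sources` and `load_sources` are hypothetical names.

```python
import json
from pathlib import Path

# Required keys per source type, inferred from the example entries in sources.json.
REQUIRED_KEYS = {
    "git": {"id", "name", "type", "url", "path", "glob"},
    "web": {"id", "name", "type", "base_url", "pages"},
}


def validate_sources(data: dict) -> list[dict]:
    """Check every entry in a parsed sources.json and return the source list."""
    seen: set[str] = set()
    for src in data["sources"]:
        kind = src.get("type")
        if kind not in REQUIRED_KEYS:
            raise ValueError(f"{src.get('id')}: unknown source type {kind!r}")
        missing = REQUIRED_KEYS[kind] - set(src)
        if missing:
            raise ValueError(f"{src.get('id')}: missing keys {sorted(missing)}")
        if src["id"] in seen:
            raise ValueError(f"duplicate source id {src['id']!r}")
        seen.add(src["id"])
    return data["sources"]


def load_sources(path: Path) -> list[dict]:
    """Read and validate the registry file."""
    return validate_sources(json.loads(path.read_text()))
```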
### Adding Sources
```bash
~/.claude/skills/rag-search/scripts/add_doc_source.py \
--id "cilium" \
--name "Cilium Docs" \
--type git \
--url "https://github.com/cilium/cilium.git" \
--path "Documentation/" \
--glob "**/*.md"
# Then index it
~/.claude/skills/rag-search/scripts/index_docs.py --source cilium
```
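`add_doc_source.py` mostly has to turn flags into a registry entry. A sketch covering only the `git` type used in the example; the `--registry` override and the `last_indexed: null` placeholder for never-indexed sources are assumptions added for illustration and testing.

```python
import argparse
import json
from pathlib import Path

# Registry location from the design; --registry (hypothetical) overrides it.
DEFAULT_REGISTRY = Path.home() / ".claude/skills/rag-search/references/sources.json"


def add_source(registry: Path, entry: dict) -> None:
    """Append a source entry to sources.json, refusing duplicate ids."""
    data = json.loads(registry.read_text()) if registry.exists() else {"sources": []}
    if any(src["id"] == entry["id"] for src in data["sources"]):
        raise SystemExit(f"source {entry['id']!r} is already registered")
    data["sources"].append(entry)
    registry.parent.mkdir(parents=True, exist_ok=True)
    registry.write_text(json.dumps(data, indent=2) + "\n")


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(prog="add_doc_source.py")
    parser.add_argument("--id", required=True)
    parser.add_argument("--name", required=True)
    parser.add_argument("--type", choices=["git"], required=True)  # web sources not covered
    parser.add_argument("--url", required=True)
    parser.add_argument("--path", default="")
    parser.add_argument("--glob", default="**/*.md")
    parser.add_argument("--registry", type=Path, default=DEFAULT_REGISTRY)
    args = parser.parse_args(argv)
    add_source(args.registry, {
        "id": args.id, "name": args.name, "type": args.type,
        "url": args.url, "path": args.path, "glob": args.glob,
        "last_indexed": None,  # assumption: filled in by index_docs.py after the first run
    })


if __name__ == "__main__":
    main()
```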
### Update Strategies
| Strategy | Command | When |
|----------|---------|------|
| Manual | `index_docs.py --source <id>` | After version upgrade |
| All sources | `index_docs.py --all` | Periodic refresh |
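The `last_indexed` timestamps in `sources.json` let a periodic `--all` run tell which sources are due. A sketch: `stale_sources` and the one-day threshold (matching the daily timer) are assumptions; the design does not say whether `index_docs.py --all` skips recently refreshed sources.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(sources: list[dict], now: datetime,
                  max_age: timedelta = timedelta(days=1)) -> list[str]:
    """Return ids of sources never indexed or last indexed more than `max_age` ago."""
    stale = []
    for src in sources:
        stamp = src.get("last_indexed")
        if not stamp:
            stale.append(src["id"])  # never indexed
            continue
        # sources.json uses a trailing "Z"; fromisoformat accepts it only from Python 3.11.
        indexed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if now - indexed > max_age:
            stale.append(src["id"])
    return stale
```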
## Periodic Refresh
Daily systemd timer on workstation.
### Service
**Location:** `~/.config/systemd/user/rag-index.service`
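```ini
[Unit]
Description=Refresh RAG search indexes
# network-online.target is a system unit; a user unit cannot order against it,
# so index_docs.py should tolerate a missing network on its own.
[Service]
Type=oneshot
# Assumes the scripts use "#!/usr/bin/env python3" so PATH selects the venv.
Environment=PATH=%h/.claude/skills/rag-search/venv/bin:/usr/bin
# oneshot runs ExecStart= lines in order and stops at the first failure;
# the local personal index goes first so a docs fetch failure cannot skip it.
ExecStart=%h/.claude/skills/rag-search/scripts/index_personal.py --quiet
ExecStart=%h/.claude/skills/rag-search/scripts/index_docs.py --all --quiet
# No [Install] section: the service is started by rag-index.timer.
```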
### Timer
**Location:** `~/.config/systemd/user/rag-index.timer`
```ini
[Unit]
Description=Daily RAG index refresh
[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=3600
[Install]
WantedBy=timers.target
```
### Enable
```bash
systemctl --user daemon-reload
systemctl --user enable --now rag-index.timer
```
### Manual Trigger
```bash
systemctl --user start rag-index.service
journalctl --user -u rag-index.service # View logs
```
## Resource Requirements
**Target:** Workstation or Pi5 8GB
| Component | RAM | Disk | Notes |
|-----------|-----|------|-------|
| Embedding model (all-MiniLM-L6-v2) | ~256MB | ~90MB | Loaded on-demand |
| ChromaDB | ~100-500MB | Varies | Scales with index size |
| Index: personal (~50 files) | — | ~5MB | Small, fast to query |
| Index: docs (10-20 sources) | — | ~100-500MB | Depends on doc volume |
| Indexing process (peak) | ~1GB | — | During embedding generation |
**Pi3 1GB:** Not suitable for this workload.
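A quick sanity check on the disk estimates: all-MiniLM-L6-v2 produces 384-dimensional embeddings, so each chunk costs 384 × 4 = 1,536 bytes as float32, and 100,000 chunks need about 153.6 MB of raw vectors before chunk text, metadata, and the vector index are counted. The helper below is illustrative; only the 384 dimension comes from the model.

```python
# all-MiniLM-L6-v2 produces 384-dimensional embeddings, stored as float32.
EMBEDDING_DIM = 384
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_chunks: int, dim: int = EMBEDDING_DIM) -> int:
    """Lower bound on vector storage: raw float32 embeddings only.

    Excludes chunk text, metadata, and the vector index, which ChromaDB also persists.
    """
    return num_chunks * dim * BYTES_PER_FLOAT32


for n in (1_000, 100_000):
    print(f"{n:>7} chunks -> {raw_vector_bytes(n) / 1e6:.1f} MB of raw vectors")
```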
## Chunking Strategy
| Index | Strategy |
|-------|----------|
| Personal | Per JSON key or logical section (decisions, preferences, facts as separate chunks) |
| Docs | ~200 tokens per chunk with overlap (all-MiniLM-L6-v2 truncates input beyond 256 word pieces, so longer chunks lose their tail); preserve headers as metadata |
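Both strategies fit in a few lines of standard-library Python. A hedged sketch: word counts stand in for tokens (all-MiniLM-L6-v2 actually counts WordPiece tokens and truncates input beyond 256 of them, hence the conservative 180-word default), top-level JSON keys are assumed to be the "logical sections" of personal files, and `chunk_personal`/`chunk_markdown` are hypothetical names.

```python
import json
import re


def chunk_personal(filename: str, text: str) -> list[dict]:
    """One chunk per top-level JSON key (e.g. decisions, preferences, facts)."""
    data = json.loads(text)
    if not isinstance(data, dict):
        # Not keyed: keep the whole document as a single chunk.
        return [{"chunk": json.dumps(data, ensure_ascii=False), "metadata": {"file": filename}}]
    return [
        {"chunk": f"{key}: {json.dumps(value, ensure_ascii=False)}",
         "metadata": {"file": filename, "key": key}}
        for key, value in data.items()
    ]


HEADER = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(filename: str, text: str, max_words: int = 180, overlap: int = 30) -> list[dict]:
    """Split markdown into overlapping word windows tagged with the nearest header.

    Naive: does not special-case fenced code blocks, so "# ..." lines inside one
    are treated as headers.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    chunks: list[dict] = []
    section, words = "", []

    def flush() -> None:
        # Emit windows over the current section; consecutive windows share `overlap` words.
        step = max_words - overlap
        for start in range(0, max(len(words) - overlap, 1), step):
            window = words[start:start + max_words]
            if window:
                chunks.append({"chunk": " ".join(window),
                               "metadata": {"file": filename, "header": section}})

    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            flush()
            section, words = match.group(2).strip(), []
        else:
            words.extend(line.split())
    flush()
    return chunks
```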
## Implementation Notes
### Recommended: Ralph Loop
This design is suitable for Ralph loop implementation:
- Clear success criteria (tests, functional checks)
- Iterative refinement expected (tuning chunking, embeddings)
- Automatic verification possible
### Model Delegation
Use appropriate models for each phase:
| Phase | Task | Model |
|-------|------|-------|
| 1 | Set up ChromaDB + embedding model | Haiku |
| 2 | Write `index_personal.py` | Sonnet |
| 3 | Write `index_docs.py` | Sonnet |
| 4 | Write `search.py` | Sonnet |
| 5 | Write SKILL.md | Haiku |
| 6 | Integration tests | Sonnet |
| 7 | End-to-end validation | Sonnet |
### Ralph Invocation
```bash
/ralph-loop "Implement rag-search skill per docs/plans/2025-01-21-agentic-rag-design.md.
Delegate to appropriate models:
- Haiku: setup, docs, simple scripts
- Sonnet: implementation, tests, debugging
- Opus: only if stuck on complex reasoning
Success criteria:
1. ChromaDB + embeddings working
2. Personal index populated from ~/.claude/state
3. At least one external doc source indexed
4. search.py returns relevant results
5. All tests pass
Output <promise>COMPLETE</promise> when done." --max-iterations 30 --completion-promise "COMPLETE"
```
### When NOT to use Ralph
- Design decisions still needed (use brainstorming first)
- Requires human judgment mid-implementation
- One-shot simple tasks
## Workflow Integration
```
/superpowers:brainstorm
        │
        ▼
Design doc created
(docs/plans/YYYY-MM-DD-*-design.md)
        │
        ▼
"Ready to implement?"
        │
   ┌────┴────┐
   │         │
   ▼         ▼
Simple    Complex/Iterative
   │         │
   ▼         ▼
Manual    /ralph-loop
or TDD    with design doc
          as spec
```
## Summary
| Aspect | Decision |
|--------|----------|
| **Architecture** | Extend existing Claude skill system with semantic search |
| **Indexes** | Two: personal (state files) + docs (external) |
| **Vector store** | ChromaDB (local, embedded; no server) |
| **Embeddings** | sentence-transformers (all-MiniLM-L6-v2) |
| **Skill interface** | `rag-search` skill with `search.py` CLI |
| **Doc management** | `sources.json` registry, git/web fetching |
| **Refresh** | systemd user timer, daily |
| **Storage** | `~/.claude/data/rag-search/` |
| **Hardware** | Runs on workstation (Pi5 8GB capable if needed) |
| **Implementation** | Ralph loop with Haiku/Sonnet subagent delegation |
