
# Agentic RAG Design

**Date:** 2025-01-21
**Status:** Ready for implementation
**Category:** Agent memory / Knowledge retrieval

## Overview

Add semantic search to the existing Claude agent system, enabling multi-source reasoning that combines personal context (state files, memory, decisions) with external documentation.

### Goals

- Retrieve relevant past decisions and preferences when answering questions
- Search external docs (k0s, ArgoCD, Prometheus, etc.) for technical reference
- Cross-reference personal context with official documentation
- Support iterative query refinement (agentic behavior)

### Non-Goals (Future Considerations)

Deferred to `future-considerations.json`:

- **fc-043**: Auto-sync on tool version change
- **fc-044**: Broad doc indexing (hundreds of sources)
- **fc-045**: K8s deployment
- **fc-046**: Query caching

## Architecture

```
User question
     │
     ▼
Personal Assistant (existing)
     │
     ├── Decides if RAG would help
     │
     ▼
rag-search skill (new)
     │
     ├── Query embedding
     ├── Vector similarity search
     ├── Return ranked chunks with metadata
     │
     ▼
Claude reasons over results
     │
     ├── Good enough? → Answer
     └── Need more? → Reformulate, search again
```
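
The loop in the diagram can be sketched as plain control flow. This is a minimal sketch, not the planned implementation: in the real system Claude itself judges the results and reformulates the query, whereas here both steps are injected callables. The names `agentic_search`, `search_fn`, `reformulate_fn`, and the `min_score` and `max_rounds` values are all hypothetical.

```python
from typing import Callable

Result = dict  # one ranked chunk, e.g. {"score": 0.84, "chunk": "..."}


def agentic_search(
    question: str,
    search_fn: Callable[[str], list[Result]],
    reformulate_fn: Callable[[str, list[Result]], str],
    min_score: float = 0.6,  # hypothetical "good enough" threshold
    max_rounds: int = 3,     # hypothetical cap on refinement rounds
) -> tuple[list[Result], list[str]]:
    """Search, check quality, and reformulate until results look good enough."""
    query = question
    tried: list[str] = []
    best: list[Result] = []
    for _ in range(max_rounds):
        tried.append(query)
        results = search_fn(query)
        # Keep the best result set seen so far, judged by its top score.
        if results and (not best or results[0]["score"] > best[0]["score"]):
            best = results
        if best and best[0]["score"] >= min_score:
            break  # good enough: answer from these chunks
        query = reformulate_fn(query, results)  # need more: try a new query
    return best, tried
```

Capping the number of rounds keeps a vague question from turning into an unbounded search loop; the returned `tried` list records every query that was attempted.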

### Two Indexes

| Index | Contents | Update Frequency |
|-------|----------|------------------|
| **personal** | `~/.claude/state/` files, memory, decisions, preferences | Daily |
| **docs** | External documentation (k0s, ArgoCD, etc.) | Daily |

### Why Two Indexes

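- Different rates of change (personal state changes with everyday use; docs change only on upstream releases), even though one daily timer refreshes both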
- Different retrieval strategies (personal may weight recency)
- Can query one or both depending on the question
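
One way the personal index might weight recency is to blend the similarity score with an exponential decay on the chunk's age. This is only a sketch under assumptions: the function name, the 90-day half-life, and the 80/20 blend are hypothetical tuning knobs, and it assumes each personal chunk carries an ISO `date` in its metadata.

```python
from datetime import date


def recency_weighted(score: float, chunk_date: str, today: date,
                     half_life_days: float = 90.0) -> float:
    """Blend a similarity score with an exponential recency decay."""
    age_days = max((today - date.fromisoformat(chunk_date)).days, 0)
    decay = 0.5 ** (age_days / half_life_days)  # 1.0 today, 0.5 after one half-life
    # Hypothetical blend: 80% similarity, 20% recency.
    return 0.8 * score + 0.2 * decay
```

With these values, a decision from today with similarity 0.8 scores 0.84, while the same text dated 90 days earlier scores 0.74.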

## Components

```
┌─────────────────────────────────────────────────────────────────┐
│                        rag-search skill                         │
│                     (Claude invokes this)                       │
└─────────────────────┬───────────────────────────────────────────┘
                      │
        ┌─────────────┴─────────────┐
        ▼                           ▼
┌───────────────────┐     ┌───────────────────┐
│  Personal Index   │     │    Docs Index     │
│                   │     │                   │
│ ~/.claude/state/* │     │ External docs     │
│ memory/*.json     │     │ (k0s, ArgoCD...)  │
│ kb.json           │     │                   │
└────────┬──────────┘     └────────┬──────────┘
         │                         │
         └──────────┬──────────────┘
                    ▼
         ┌───────────────────┐
         │   Vector Store    │
         │   (ChromaDB)      │
         │                   │
         │ Collections:      │
         │  - personal       │
         │  - docs           │
         └────────┬──────────┘
                  │
                  ▼
         ┌───────────────────┐
         │  Embedding Model  │
         │  (sentence-       │
         │   transformers)   │
         └───────────────────┘
```

### Stack

| Component | Choice | Notes |
|-----------|--------|-------|
| Vector store | ChromaDB | Embedded, runs in-process; no separate server to operate |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) | Runs on arm64, ~90MB |
| Storage | `~/.claude/data/rag-search/` | Local to workstation |
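
A minimal sketch of how the two pieces of the stack could be wired together, assuming `chromadb` and `sentence-transformers` are installed in the skill's environment. The helper names `open_collection` and `flatten_hits` are hypothetical, and using cosine space is an assumption (Chroma defaults to squared L2 distance).

```python
def open_collection(name: str, data_dir: str):
    """Open (or create) a persistent Chroma collection using MiniLM embeddings."""
    # Imported lazily so flatten_hits() works without the heavy dependencies.
    import chromadb
    from chromadb.utils import embedding_functions

    embed = embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name="all-MiniLM-L6-v2"
    )
    client = chromadb.PersistentClient(path=data_dir)
    # Assumption: cosine space, so that 1 - distance is a similarity score.
    return client.get_or_create_collection(
        name=name, embedding_function=embed, metadata={"hnsw:space": "cosine"}
    )


def flatten_hits(raw: dict, source: str) -> list[dict]:
    """Flatten the nested lists returned by collection.query() for one query."""
    hits = []
    rows = zip(raw["documents"][0], raw["metadatas"][0], raw["distances"][0])
    for doc, meta, dist in rows:
        hits.append({
            "source": source,
            "chunk": doc,
            "metadata": meta or {},
            "score": round(1.0 - dist, 3),  # cosine distance -> similarity
        })
    return hits
```

A query would then look roughly like `flatten_hits(open_collection("personal", data_dir).query(query_texts=["..."], n_results=5), "personal")`, with `data_dir` set to the expanded storage path from the table.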

## Skill Structure

**Location:** `~/.claude/skills/rag-search/`

```
rag-search/
├── SKILL.md              # Instructions for Claude
├── scripts/
│   ├── search.py         # Main search entry point
│   ├── index_personal.py # Index state files
│   ├── index_docs.py     # Index external docs
│   └── add_doc_source.py # Add new doc source
└── references/
    └── sources.json      # Configured doc sources
```

## Skill Interface

### Invocation

```bash
# Basic search (both indexes)
~/.claude/skills/rag-search/scripts/search.py "how did I configure ArgoCD sync?"

# Search specific index
~/.claude/skills/rag-search/scripts/search.py --index personal "past decisions about caching"
~/.claude/skills/rag-search/scripts/search.py --index docs "k0s node maintenance"

# Control result count
~/.claude/skills/rag-search/scripts/search.py --top-k 10 "prometheus alerting rules"
```
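
The flags above map naturally onto `argparse`. The sketch below is an assumption about how `search.py` might parse them: the explicit `both` choice and the default of 5 results are hypothetical, since the design only shows `--index personal`, `--index docs`, and `--top-k 10`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser matching the invocations shown above."""
    parser = argparse.ArgumentParser(
        prog="search.py", description="Semantic search over personal and docs indexes"
    )
    parser.add_argument("query", help="natural-language question")
    # Assumption: omitting --index searches both collections.
    parser.add_argument("--index", choices=["personal", "docs", "both"], default="both")
    # Assumption: 5 results by default.
    parser.add_argument("--top-k", type=int, default=5)
    return parser


def collections_for(index: str) -> list[str]:
    """Translate the --index value into collection names."""
    return ["personal", "docs"] if index == "both" else [index]
```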

### Output Format

```json
{
  "query": "how did I configure ArgoCD sync?",
  "results": [
    {
      "rank": 1,
      "score": 0.847,
      "source": "personal",
      "file": "memory/decisions.json",
      "chunk": "Decided to use ArgoCD auto-sync with self-heal disabled...",
      "metadata": {"date": "2025-01-15", "context": "k8s setup"}
    },
    {
      "rank": 2,
      "score": 0.823,
      "source": "docs",
      "file": "argocd/sync-options.md",
      "chunk": "Auto-sync can be configured with selfHeal and prune options...",
      "metadata": {"doc_version": "2.9", "url": "https://..."}
    }
  ],
  "searched_collections": ["personal", "docs"],
  "total_chunks_searched": 1847
}
```
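
A payload of this shape could be assembled by merging the hits from each collection, sorting by score, and assigning ranks. This is a sketch with a hypothetical `build_payload` helper; it assumes scores from both collections are comparable, which holds only if they share the same embedding model and distance metric.

```python
import json


def build_payload(query: str, hits_by_collection: dict[str, list[dict]],
                  total_chunks: int, top_k: int = 5) -> str:
    """Merge per-collection hits into the JSON output format."""
    merged = [hit for hits in hits_by_collection.values() for hit in hits]
    merged.sort(key=lambda h: h["score"], reverse=True)
    # Ranks are 1-based and assigned after the cross-collection sort.
    results = [{"rank": i, **hit} for i, hit in enumerate(merged[:top_k], start=1)]
    return json.dumps({
        "query": query,
        "results": results,
        "searched_collections": list(hits_by_collection),
        "total_chunks_searched": total_chunks,
    }, indent=2)
```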

### SKILL.md Guidance

- Start with a broad query; refine it if the results aren't relevant
- Cross-reference personal decisions with docs when both appear
- Cite sources in answers (file + date for personal, URL for docs)
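
The citation rule in the last bullet can be made concrete with a small helper. The function name and the `undated` fallback are hypothetical; the field names follow the output format shown earlier.

```python
def format_citation(result: dict) -> str:
    """Cite file + date for personal results, URL for docs results."""
    meta = result.get("metadata", {})
    if result["source"] == "personal":
        return f'{result["file"]} ({meta.get("date", "undated")})'
    # Fall back to the file path when a docs chunk has no URL recorded.
    return meta.get("url") or result["file"]
```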

## External Docs Management

### Source Registry

**Location:** `~/.claude/skills/rag-search/references/sources.json`

```json
{
  "sources": [
    {
      "id": "k0s",
      "name": "k0s Documentation",
      "type": "git",
      "url": "https://github.com/k0sproject/k0s.git",
      "path": "docs/",
      "glob": "**/*.md",
      "version": "v1.30.0",
      "last_indexed": "2025-01-20T10:00:00Z"
    },
    {
      "id": "argocd",
      "name": "ArgoCD Documentation",
      "type": "web",
      "base_url": "https://argo-cd.readthedocs.io/en/stable/",
      "pages": ["user-guide/sync-options/", "operator-manual/"],
      "last_indexed": "2025-01-18T14:30:00Z"
    }
  ]
}
```
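
Because `git` and `web` sources carry different fields, the indexer would likely benefit from validating the registry before fetching anything. The sketch below infers the required fields per type from the two example entries above; that mapping and the `validate_sources` name are assumptions, not a defined schema.

```python
# Required fields per source type, inferred from the example entries (assumption).
REQUIRED_FIELDS = {
    "git": {"url", "path", "glob"},
    "web": {"base_url", "pages"},
}


def validate_sources(registry: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the registry is valid."""
    problems: list[str] = []
    seen: set[str] = set()
    for src in registry.get("sources", []):
        sid = src.get("id", "<missing id>")
        if sid in seen:
            problems.append(f"{sid}: duplicate id")
        seen.add(sid)
        required = REQUIRED_FIELDS.get(src.get("type"))
        if required is None:
            problems.append(f"{sid}: unknown type {src.get('type')!r}")
            continue
        missing = sorted(required - set(src))
        if missing:
            problems.append(f"{sid}: missing {', '.join(missing)}")
    return problems
```

Collecting all problems rather than failing on the first one lets a single run report every broken entry in `sources.json`.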

### Adding Sources

```bash
~/.claude/skills/rag-search/scripts/add_doc_source.py \
  --id "cilium" \
  --name "Cilium Docs" \
  --type git \
  --url "https://github.com/cilium/cilium.git" \
  --path "Documentation/" \
  --glob "**/*.rst"  # Cilium's docs are reStructuredText, not Markdown

# Then index it
~/.claude/skills/rag-search/scripts/index_docs.py --source cilium
```

### Update Strategies

| Strategy | Command | When |
|----------|---------|------|
| Manual | `index_docs.py --source <id>` | After version upgrade |
| All sources | `index_docs.py --all` | Periodic refresh |

## Periodic Refresh

A systemd user timer on the workstation refreshes both indexes once a day.

### Service

**Location:** `~/.config/systemd/user/rag-index.service`

```ini
[Unit]
Description=Refresh RAG search indexes
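# The user manager may not see the system-level network-online.target,
# in which case this ordering has no effect.
After=network-online.target

[Service]
Type=oneshot
# "-" prefix: a failed docs refresh (e.g. no network) does not block the personal index.
ExecStart=-%h/.claude/skills/rag-search/scripts/index_docs.py --all --quiet
ExecStart=%h/.claude/skills/rag-search/scripts/index_personal.py --quiet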
Environment=PATH=%h/.claude/skills/rag-search/venv/bin:/usr/bin

[Install]
WantedBy=default.target
```

### Timer

**Location:** `~/.config/systemd/user/rag-index.timer`

```ini
[Unit]
Description=Daily RAG index refresh

[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=3600

[Install]
WantedBy=timers.target
```

### Enable

```bash
systemctl --user daemon-reload
systemctl --user enable --now rag-index.timer
```

### Manual Trigger

```bash
systemctl --user start rag-index.service
journalctl --user -u rag-index.service  # View logs
```

## Resource Requirements

**Target:** Workstation or Pi5 8GB

| Component | RAM | Disk | Notes |
|-----------|-----|------|-------|
| Embedding model (all-MiniLM-L6-v2) | ~256MB | ~90MB | Loaded on-demand |
| ChromaDB | ~100-500MB | Varies | Scales with index size |
| Index: personal (~50 files) | — | ~5MB | Small, fast to query |
| Index: docs (10-20 sources) | — | ~100-500MB | Depends on doc volume |
| Indexing process (peak) | ~1GB | — | During embedding generation |

**Pi3 1GB:** Not suitable for this workload; the indexing peak of ~1GB alone would consume the board's entire RAM.
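
As a rough sanity check on the disk figures, the raw vector payload can be estimated from the embedding size: all-MiniLM-L6-v2 produces 384-dimensional vectors, assumed here to be stored as float32. The estimate below deliberately excludes the HNSW index, stored chunk text, and metadata, so real on-disk usage will be higher.

```python
EMBEDDING_DIM = 384      # output dimension of all-MiniLM-L6-v2
BYTES_PER_FLOAT32 = 4    # assumption: vectors stored as float32


def raw_vector_bytes(num_chunks: int) -> int:
    """Lower bound on storage: embedding vectors only."""
    return num_chunks * EMBEDDING_DIM * BYTES_PER_FLOAT32


# The 1847 chunks from the example output come to about 2.8 MB of raw vectors.
print(f"{raw_vector_bytes(1847) / 1e6:.1f} MB")
```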

## Chunking Strategy

| Index | Strategy |
|-------|----------|
| Personal | Per JSON key or logical section (decisions, preferences, facts as separate chunks) |
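| Docs | ~200 tokens per chunk with overlap (all-MiniLM-L6-v2 truncates input beyond 256 word pieces, so ~500-token chunks would be only partly embedded); preserve headers as metadata |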
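
Both strategies can be sketched in a few lines. This is an illustration under assumptions, not the planned indexer: whitespace-separated words stand in for tokens, the `size` and `overlap` defaults are placeholders to tune, and `chunk_json` assumes each state file is a JSON object whose top-level keys are the logical sections.

```python
import json


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def chunk_json(obj: dict, file: str) -> list[dict]:
    """One chunk per top-level key; list values yield one chunk per item."""
    chunks = []
    for key, value in obj.items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            chunks.append({
                "file": file,
                "section": key,
                "chunk": json.dumps(item, ensure_ascii=False),
            })
    return chunks
```

Recording `section` alongside each personal chunk keeps decisions, preferences, and facts distinguishable at query time.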

## Implementation Notes

### Recommended: Ralph Loop

This design is suitable for Ralph loop implementation:
- Clear success criteria (tests, functional checks)
- Iterative refinement expected (tuning chunking, embeddings)
- Automatic verification possible

### Model Delegation

Use appropriate models for each phase:

| Phase | Task | Model |
|-------|------|-------|
| 1 | Set up ChromaDB + embedding model | Haiku |
| 2 | Write `index_personal.py` | Sonnet |
| 3 | Write `index_docs.py` | Sonnet |
| 4 | Write `search.py` | Sonnet |
| 5 | Write SKILL.md | Haiku |
| 6 | Integration tests | Sonnet |
| 7 | End-to-end validation | Sonnet |

### Ralph Invocation

```bash
/ralph-loop "Implement rag-search skill per docs/plans/2025-01-21-agentic-rag-design.md.

Delegate to appropriate models:
- Haiku: setup, docs, simple scripts
- Sonnet: implementation, tests, debugging
- Opus: only if stuck on complex reasoning

Success criteria:
1. ChromaDB + embeddings working
2. Personal index populated from ~/.claude/state
3. At least one external doc source indexed
4. search.py returns relevant results
5. All tests pass

Output <promise>COMPLETE</promise> when done." --max-iterations 30 --completion-promise "COMPLETE"
```

### When NOT to use Ralph

- Design decisions still needed (use brainstorming first)
- Requires human judgment mid-implementation
- One-shot simple tasks

## Workflow Integration

```
/superpowers:brainstorm
        │
        ▼
   Design doc created
   (docs/plans/YYYY-MM-DD-*-design.md)
        │
        ▼
   "Ready to implement?"
        │
   ┌────┴────┐
   │         │
   ▼         ▼
 Simple    Complex/Iterative
   │              │
   ▼              ▼
 Manual     /ralph-loop
 or TDD     with design doc
            as spec
```

## Summary

| Aspect | Decision |
|--------|----------|
| **Architecture** | Extend existing Claude skill system with semantic search |
| **Indexes** | Two: personal (state files) + docs (external) |
| **Vector store** | ChromaDB (local, embedded; no separate server) |
| **Embeddings** | sentence-transformers (all-MiniLM-L6-v2) |
| **Skill interface** | `rag-search` skill with `search.py` CLI |
| **Doc management** | `sources.json` registry, git/web fetching |
| **Refresh** | systemd user timer, daily |
| **Storage** | `~/.claude/data/rag-search/` |
| **Hardware** | Runs on workstation (Pi5 8GB capable if needed) |
| **Implementation** | Ralph loop with Haiku/Sonnet subagent delegation |