claude-code/docs/plans/2025-01-21-agentic-rag-design.md
OpenCode Test c21b152de8 Add Agentic RAG design document
Design for extending Claude agent system with semantic search:
- Two indexes: personal (state files) + external docs
- ChromaDB + sentence-transformers stack
- rag-search skill with search.py CLI
- Daily systemd timer for index refresh
- Ralph loop implementation with Haiku/Sonnet delegation

Added future considerations (fc-043 to fc-046):
- Auto-sync on tool version change
- Broad doc indexing
- K8s deployment
- Query caching

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 14:08:00 -08:00

Agentic RAG Design

Date: 2025-01-21
Status: Ready for implementation
Category: Agent memory / Knowledge retrieval

Overview

Add semantic search to the existing Claude agent system, enabling multi-source reasoning that combines personal context (state files, memory, decisions) with external documentation.

Goals

  • Retrieve relevant past decisions and preferences when answering questions
  • Search external docs (k0s, ArgoCD, Prometheus, etc.) for technical reference
  • Cross-reference personal context with official documentation
  • Support iterative query refinement (agentic behavior)

Non-Goals (Future Considerations)

Deferred to future-considerations.json:

  • fc-043: Auto-sync on tool version change
  • fc-044: Broad doc indexing (hundreds of sources)
  • fc-045: K8s deployment
  • fc-046: Query caching

Architecture

User question
     │
     ▼
Personal Assistant (existing)
     │
     ├── Decides if RAG would help
     │
     ▼
rag-search skill (new)
     │
     ├── Query embedding
     ├── Vector similarity search
     ├── Return ranked chunks with metadata
     │
     ▼
Claude reasons over results
     │
     ├── Good enough? → Answer
     └── Need more? → Reformulate, search again
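
The search-and-refine loop in the diagram can be sketched roughly as follows. In the real system Claude itself judges the results and rewrites the query; here those judgments are stood in for by hypothetical callables (`search`, `is_good_enough`, `reformulate`), and `max_rounds` is an assumed safety limit that the design does not specify.

```python
from typing import Callable

Result = dict  # one ranked chunk, e.g. {"score": 0.84, "chunk": "..."}


def agentic_search(
    question: str,
    search: Callable[[str], list],
    is_good_enough: Callable[[list], bool],
    reformulate: Callable[[str, list], str],
    max_rounds: int = 3,
) -> tuple:
    """Search, judge the results, and retry with a rewritten query if needed.

    Returns the last query that was run together with its results.
    """
    query = question
    results = search(query)
    for _ in range(max_rounds - 1):
        if is_good_enough(results):
            break
        # Not good enough: rewrite the query and search again.
        query = reformulate(query, results)
        results = search(query)
    return query, results
```

Capping the number of rounds keeps a poorly matching question from looping indefinitely; the caller still receives the last attempt and can decide to answer without RAG.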

Two Indexes

| Index | Contents | Update Frequency |
|---|---|---|
| personal | ~/.claude/state/ files, memory, decisions, preferences | Daily |
| docs | External documentation (k0s, ArgoCD, etc.) | Daily |

Why Two Indexes

  • Different update frequencies
  • Different retrieval strategies (personal may weight recency)
  • Can query one or both depending on the question
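
One way the "weight recency" idea could combine with querying both indexes is sketched below. The exponential decay, the 180-day half-life, and the 20% recency share are illustrative assumptions, not values from this design; the result dicts follow the shape shown under Output Format.

```python
from datetime import date


def recency_weight(chunk_date: date, today: date, half_life_days: float = 180.0) -> float:
    """Exponential decay: 1.0 for today, 0.5 after one half-life (assumed value)."""
    age_days = max((today - chunk_date).days, 0)
    return 0.5 ** (age_days / half_life_days)


def merge_results(personal, docs, today, top_k=5, recency_share=0.2):
    """Blend recency into personal scores, then rank both result lists together."""
    merged = []
    for r in personal:
        chunk_date = date.fromisoformat(r["metadata"]["date"])
        blended = (1 - recency_share) * r["score"] + recency_share * recency_weight(chunk_date, today)
        merged.append({**r, "score": round(blended, 3)})
    merged.extend(docs)  # docs keep their raw similarity score
    merged.sort(key=lambda r: r["score"], reverse=True)
    return merged[:top_k]
```

Keeping the two collections separate is what makes this possible: the recency adjustment applies only to personal chunks, while documentation chunks are ranked on similarity alone.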

Components

┌─────────────────────────────────────────────────────────────────┐
│                        rag-search skill                         │
│                     (Claude invokes this)                       │
└─────────────────────┬───────────────────────────────────────────┘
                      │
        ┌─────────────┴─────────────┐
        ▼                           ▼
┌───────────────────┐     ┌───────────────────┐
│  Personal Index   │     │    Docs Index     │
│                   │     │                   │
│ ~/.claude/state/* │     │ External docs     │
│ memory/*.json     │     │ (k0s, ArgoCD...)  │
│ kb.json           │     │                   │
└────────┬──────────┘     └────────┬──────────┘
         │                         │
         └──────────┬──────────────┘
                    ▼
         ┌───────────────────┐
         │   Vector Store    │
         │   (ChromaDB)      │
         │                   │
         │ Collections:      │
         │  - personal       │
         │  - docs           │
         └────────┬──────────┘
                  │
                  ▼
         ┌───────────────────┐
         │  Embedding Model  │
         │  (sentence-       │
         │   transformers)   │
         └───────────────────┘

Stack

| Component | Choice | Notes |
|---|---|---|
| Vector store | ChromaDB | Runs embedded in-process; no external services |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) | Runs on arm64, ~90MB |
| Storage | ~/.claude/data/rag-search/ | Local to workstation |
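
A minimal sketch of how these pieces might fit together, assuming ChromaDB's `PersistentClient` and the `SentenceTransformer` class. The helper names `open_store` and `query_collection` are hypothetical, and the `hnsw:space` metadata key is how ChromaDB releases I am aware of select cosine distance, so it is worth checking against the installed version.

```python
from pathlib import Path

DATA_DIR = Path.home() / ".claude" / "data" / "rag-search"
MODEL_NAME = "all-MiniLM-L6-v2"


def open_store(data_dir: Path = DATA_DIR):
    """Open the on-disk ChromaDB store and load the embedding model.

    Imports are local so the heavy dependencies load only when needed.
    """
    import chromadb
    from sentence_transformers import SentenceTransformer

    client = chromadb.PersistentClient(path=str(data_dir))
    collections = {
        name: client.get_or_create_collection(name, metadata={"hnsw:space": "cosine"})
        for name in ("personal", "docs")
    }
    return collections, SentenceTransformer(MODEL_NAME)


def query_collection(collection, model, query: str, top_k: int = 5) -> list:
    """Embed the query and return the nearest chunks, closest first."""
    embedding = [float(x) for x in model.encode([query])[0]]
    raw = collection.query(query_embeddings=[embedding], n_results=top_k)
    # ChromaDB returns one inner list per query; only one query was sent.
    return [
        {"id": i, "chunk": doc, "metadata": meta, "distance": dist}
        for i, doc, meta, dist in zip(
            raw["ids"][0], raw["documents"][0], raw["metadatas"][0], raw["distances"][0]
        )
    ]
```

Typical use would be `collections, model = open_store()` followed by `query_collection(collections["docs"], model, "k0s node maintenance")`.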

Skill Structure

Location: ~/.claude/skills/rag-search/

rag-search/
├── SKILL.md              # Instructions for Claude
├── scripts/
│   ├── search.py         # Main search entry point
│   ├── index_personal.py # Index state files
│   ├── index_docs.py     # Index external docs
│   └── add_doc_source.py # Add new doc source
└── references/
    └── sources.json      # Configured doc sources

Skill Interface

Invocation

# Basic search (both indexes)
~/.claude/skills/rag-search/scripts/search.py "how did I configure ArgoCD sync?"

# Search specific index
~/.claude/skills/rag-search/scripts/search.py --index personal "past decisions about caching"
~/.claude/skills/rag-search/scripts/search.py --index docs "k0s node maintenance"

# Control result count
~/.claude/skills/rag-search/scripts/search.py --top-k 10 "prometheus alerting rules"
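
The command-line surface above could be parsed with `argparse` along these lines. This is a sketch only: the default of 5 for `--top-k` is an assumption, and `collections_to_search` is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser matching the invocations shown above."""
    parser = argparse.ArgumentParser(
        prog="search.py", description="Semantic search over the RAG indexes"
    )
    parser.add_argument("query", help="natural-language search query")
    parser.add_argument(
        "--index",
        choices=["personal", "docs"],
        default=None,
        help="restrict the search to one index (default: both)",
    )
    parser.add_argument(
        "--top-k", type=int, default=5, help="number of results to return (assumed default)"
    )
    return parser


def collections_to_search(index):
    """Map the --index option to the list of collections to query."""
    return [index] if index else ["personal", "docs"]
```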

Output Format

{
  "query": "how did I configure ArgoCD sync?",
  "results": [
    {
      "rank": 1,
      "score": 0.847,
      "source": "personal",
      "file": "memory/decisions.json",
      "chunk": "Decided to use ArgoCD auto-sync with self-heal disabled...",
      "metadata": {"date": "2025-01-15", "context": "k8s setup"}
    },
    {
      "rank": 2,
      "score": 0.823,
      "source": "docs",
      "file": "argocd/sync-options.md",
      "chunk": "Auto-sync can be configured with selfHeal and prune options...",
      "metadata": {"doc_version": "2.9", "url": "https://..."}
    }
  ],
  "searched_collections": ["personal", "docs"],
  "total_chunks_searched": 1847
}
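
A sketch of how raw hits might be turned into this JSON shape. It assumes each hit carries a cosine `distance` and that the score is `1 - distance`; the design does not state how `score` is derived, so treat that mapping as an assumption. `format_output` is a hypothetical name.

```python
import json


def format_output(query, hits, searched, total_chunks):
    """Rank hits by ascending distance and serialise them in the output format.

    Each hit is a dict with source, file, chunk, metadata and distance keys.
    """
    ranked = sorted(hits, key=lambda h: h["distance"])
    results = [
        {
            "rank": rank,
            "score": round(1.0 - h["distance"], 3),  # assumed: cosine distance -> similarity
            "source": h["source"],
            "file": h["file"],
            "chunk": h["chunk"],
            "metadata": h["metadata"],
        }
        for rank, h in enumerate(ranked, start=1)
    ]
    payload = {
        "query": query,
        "results": results,
        "searched_collections": searched,
        "total_chunks_searched": total_chunks,
    }
    return json.dumps(payload, indent=2)
```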

SKILL.md Guidance

  • Start with a broad query; refine it if the results aren't relevant
  • Cross-reference personal decisions with docs when both appear
  • Cite sources in answers (file + date for personal, URL for docs)
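
The citation rule can be illustrated with a small helper that works on one entry from the `results` array. `cite` is a hypothetical name, and the fallbacks for missing metadata ("undated", file path instead of URL) are assumptions.

```python
def cite(result: dict) -> str:
    """Build a citation: file + date for personal chunks, URL for docs chunks."""
    meta = result.get("metadata", {})
    if result["source"] == "personal":
        when = meta.get("date", "undated")
        return f'{result["file"]} ({when})'
    # Docs chunks: prefer the URL, fall back to the file path if none was stored.
    return meta.get("url") or result["file"]
```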

External Docs Management

Source Registry

Location: ~/.claude/skills/rag-search/references/sources.json

{
  "sources": [
    {
      "id": "k0s",
      "name": "k0s Documentation",
      "type": "git",
      "url": "https://github.com/k0sproject/k0s.git",
      "path": "docs/",
      "glob": "**/*.md",
      "version": "v1.30.0",
      "last_indexed": "2025-01-20T10:00:00Z"
    },
    {
      "id": "argocd",
      "name": "ArgoCD Documentation",
      "type": "web",
      "base_url": "https://argo-cd.readthedocs.io/en/stable/",
      "pages": ["user-guide/sync-options/", "operator-manual/"],
      "last_indexed": "2025-01-18T14:30:00Z"
    }
  ]
}
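
A loader for this registry might validate entries before indexing starts. The sketch below infers the required fields per source type from the two example entries above; that field list and the function names are assumptions, since the design does not define a schema.

```python
import json
from pathlib import Path

# Required keys per source type, inferred from the example entries (assumption).
REQUIRED_BY_TYPE = {
    "git": {"id", "name", "type", "url", "path", "glob"},
    "web": {"id", "name", "type", "base_url", "pages"},
}


def validate_sources(registry: dict) -> list:
    """Return a list of human-readable problems; empty means the registry is valid."""
    problems = []
    seen = set()
    for i, src in enumerate(registry.get("sources", [])):
        label = src.get("id", f"#{i}")
        required = REQUIRED_BY_TYPE.get(src.get("type"))
        if required is None:
            problems.append(f"{label}: unknown type {src.get('type')!r}")
            continue
        missing = sorted(required - src.keys())
        if missing:
            problems.append(f"{label}: missing {', '.join(missing)}")
        if label in seen:
            problems.append(f"{label}: duplicate id")
        seen.add(label)
    return problems


def load_sources(path: Path) -> dict:
    """Read sources.json and fail early if any entry is malformed."""
    registry = json.loads(path.read_text(encoding="utf-8"))
    problems = validate_sources(registry)
    if problems:
        raise ValueError("; ".join(problems))
    return registry
```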

Adding Sources

~/.claude/skills/rag-search/scripts/add_doc_source.py \
  --id "cilium" \
  --name "Cilium Docs" \
  --type git \
  --url "https://github.com/cilium/cilium.git" \
  --path "Documentation/" \
  --glob "**/*.md"

# Then index it
~/.claude/skills/rag-search/scripts/index_docs.py --source cilium
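
Internally, `add_doc_source.py` presumably appends an entry to `sources.json`. One possible shape for that logic is sketched below; the function names, the duplicate-id check, and the `null` initial value for `last_indexed` are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path


def add_source(registry: dict, entry: dict) -> dict:
    """Return a new registry with `entry` appended; reject duplicate ids."""
    if any(s["id"] == entry["id"] for s in registry["sources"]):
        raise ValueError(f"source id already registered: {entry['id']}")
    new_entry = {**entry, "last_indexed": None}  # not indexed yet (assumed convention)
    return {**registry, "sources": [*registry["sources"], new_entry]}


def save_registry(path: Path, registry: dict) -> None:
    """Write atomically: temp file in the same directory, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(registry, f, indent=2)
        f.write("\n")
    os.replace(tmp, path)
```

Writing through a temporary file means an interrupted run cannot leave a half-written registry for the daily refresh to trip over.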

Update Strategies

| Strategy | Command | When |
|---|---|---|
| Manual | index_docs.py --source <id> | After version upgrade |
| All sources | index_docs.py --all | Periodic refresh |

Periodic Refresh

A systemd user timer on the workstation refreshes both indexes once a day.

Service

Location: ~/.config/systemd/user/rag-index.service

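[Unit]
Description=Refresh RAG search indexes
# network-online.target belongs to the system manager; in a user unit this
# ordering may have no effect, so the scripts should tolerate a missing network.
After=network-online.target

[Service]
Type=oneshot
# oneshot units accept several ExecStart= lines, which run in order.
ExecStart=%h/.claude/skills/rag-search/scripts/index_docs.py --all --quiet
ExecStart=%h/.claude/skills/rag-search/scripts/index_personal.py --quiet
# Put the skill's virtualenv first so the scripts pick up its Python.
Environment=PATH=%h/.claude/skills/rag-search/venv/bin:/usr/bin

[Install]
WantedBy=default.target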

Timer

Location: ~/.config/systemd/user/rag-index.timer

[Unit]
Description=Daily RAG index refresh

[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=3600

[Install]
WantedBy=timers.target

Enable

systemctl --user daemon-reload
systemctl --user enable --now rag-index.timer

Manual Trigger

systemctl --user start rag-index.service
journalctl --user -u rag-index.service  # View logs

Resource Requirements

Target: Workstation or Pi5 8GB

| Component | RAM | Disk | Notes |
|---|---|---|---|
| Embedding model (all-MiniLM-L6-v2) | ~256MB | ~90MB | Loaded on-demand |
| ChromaDB | ~100-500MB | Varies | Scales with index size |
| Index: personal (~50 files) | | ~5MB | Small, fast to query |
| Index: docs (10-20 sources) | | ~100-500MB | Depends on doc volume |
| Indexing process (peak) | ~1GB | | During embedding generation |

Pi3 1GB: Not suitable for this workload; indexing alone peaks at about 1GB of RAM.
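
As a rough cross-check on the index sizes, the raw vector payload can be estimated from the chunk count. all-MiniLM-L6-v2 produces 384-dimensional embeddings; the sketch assumes they are stored as 32-bit floats and ignores chunk text, metadata, and HNSW graph overhead, so the real on-disk size will be larger.

```python
EMBEDDING_DIM = 384      # output dimension of all-MiniLM-L6-v2
BYTES_PER_FLOAT32 = 4    # assumed storage precision


def raw_vector_bytes(num_chunks: int) -> int:
    """Lower bound on storage: embedding vectors only."""
    return num_chunks * EMBEDDING_DIM * BYTES_PER_FLOAT32


# The 1847 chunks from the example output need about 2.7 MiB for vectors alone.
print(raw_vector_bytes(1847))  # 2836992
```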

Chunking Strategy

| Index | Strategy |
|---|---|
| Personal | Per JSON key or logical section (decisions, preferences, facts as separate chunks) |
| Docs | ~500 tokens per chunk with overlap, preserve headers as metadata |
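
Both strategies can be sketched in a few lines. The docs chunker below approximates tokens by whitespace-separated words (375 words for roughly 500 tokens is an assumed ratio), and it treats any line starting with `#` as a header, which would misfire on comments inside code blocks. The function names and the 50-word overlap are illustrative choices.

```python
import json


def chunk_markdown(text: str, max_words: int = 375, overlap_words: int = 50) -> list:
    """Split markdown into overlapping word windows, tagged with their section header."""
    if overlap_words >= max_words:
        raise ValueError("overlap_words must be smaller than max_words")
    sections = [("", [])]  # (header, words) pairs; text before any header gets ""
    for line in text.splitlines():
        if line.startswith("#"):
            sections.append((line.lstrip("#").strip(), []))
        else:
            sections[-1][1].extend(line.split())
    chunks = []
    step = max_words - overlap_words
    for header, words in sections:
        for start in range(0, len(words), step):
            chunks.append({"text": " ".join(words[start:start + max_words]), "header": header})
            if start + max_words >= len(words):
                break  # this window already reached the end of the section
    return chunks


def chunk_json(data: dict, file: str) -> list:
    """One chunk per top-level key; list values yield one chunk per item."""
    chunks = []
    for key, value in data.items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            chunks.append({"text": json.dumps(item, ensure_ascii=False), "file": file, "key": key})
    return chunks
```

Chunking within a section rather than across the whole file keeps each chunk's header metadata accurate, at the cost of some short chunks at section ends.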

Implementation Notes

This design is suitable for Ralph loop implementation:

  • Clear success criteria (tests, functional checks)
  • Iterative refinement expected (tuning chunking, embeddings)
  • Automatic verification possible

Model Delegation

Use appropriate models for each phase:

| Phase | Task | Model |
|---|---|---|
| 1 | Set up ChromaDB + embedding model | Haiku |
| 2 | Write index_personal.py | Sonnet |
| 3 | Write index_docs.py | Sonnet |
| 4 | Write search.py | Sonnet |
| 5 | Write SKILL.md | Haiku |
| 6 | Integration tests | Sonnet |
| 7 | End-to-end validation | Sonnet |

Ralph Invocation

/ralph-loop "Implement rag-search skill per docs/plans/2025-01-21-agentic-rag-design.md.

Delegate to appropriate models:
- Haiku: setup, docs, simple scripts
- Sonnet: implementation, tests, debugging
- Opus: only if stuck on complex reasoning

Success criteria:
1. ChromaDB + embeddings working
2. Personal index populated from ~/.claude/state
3. At least one external doc source indexed
4. search.py returns relevant results
5. All tests pass

Output <promise>COMPLETE</promise> when done." --max-iterations 30 --completion-promise "COMPLETE"

When NOT to use Ralph

  • Design decisions still needed (use brainstorming first)
  • Requires human judgment mid-implementation
  • One-shot simple tasks

Workflow Integration

/superpowers:brainstorm
        │
        ▼
   Design doc created
   (docs/plans/YYYY-MM-DD-*-design.md)
        │
        ▼
   "Ready to implement?"
        │
   ┌────┴────┐
   │         │
   ▼         ▼
 Simple    Complex/Iterative
   │              │
   ▼              ▼
 Manual     /ralph-loop
 or TDD     with design doc
            as spec

Summary

| Aspect | Decision |
|---|---|
| Architecture | Extend existing Claude skill system with semantic search |
| Indexes | Two: personal (state files) + docs (external) |
| Vector store | ChromaDB (local, embedded; no external services) |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) |
| Skill interface | rag-search skill with search.py CLI |
| Doc management | sources.json registry, git/web fetching |
| Refresh | systemd user timer, daily |
| Storage | ~/.claude/data/rag-search/ |
| Hardware | Runs on workstation (Pi5 8GB capable if needed) |
| Implementation | Ralph loop with Haiku/Sonnet subagent delegation |