OpenCode Test c21b152de8 Add Agentic RAG design document

Design for extending Claude agent system with semantic search:
- Two indexes: personal (state files) + external docs
- ChromaDB + sentence-transformers stack
- rag-search skill with search.py CLI
- Daily systemd timer for index refresh
- Ralph loop implementation with Haiku/Sonnet delegation

Added future considerations (fc-043 to fc-046):
- Auto-sync on tool version change
- Broad doc indexing
- K8s deployment
- Query caching

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-04 14:08:00 -08:00

Agentic RAG Design

Date: 2025-01-21
Status: Ready for implementation
Category: Agent memory / Knowledge retrieval

Overview

Add semantic search to the existing Claude agent system, enabling multi-source reasoning that combines personal context (state files, memory, decisions) with external documentation.

Goals

Retrieve relevant past decisions and preferences when answering questions
Search external docs (k0s, ArgoCD, Prometheus, etc.) for technical reference
Cross-reference personal context with official documentation
Support iterative query refinement (agentic behavior)

Non-Goals (Future Considerations)

Deferred to future-considerations.json:

fc-043: Auto-sync on tool version change
fc-044: Broad doc indexing (hundreds of sources)
fc-045: K8s deployment
fc-046: Query caching

Architecture

User question
     │
     ▼
Personal Assistant (existing)
     │
     ├── Decides if RAG would help
     │
     ▼
rag-search skill (new)
     │
     ├── Query embedding
     ├── Vector similarity search
     ├── Return ranked chunks with metadata
     │
     ▼
Claude reasons over results
     │
     ├── Good enough? → Answer
     └── Need more? → Reformulate, search again
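
The search-and-refine loop in the diagram can be sketched as below. This is only an illustration under stated assumptions: in the real system Claude judges whether results are good enough, so the `min_score` threshold, the `max_rounds` cap, and the `search` and `reformulate` callables are all hypothetical stand-ins.

```python
from typing import Callable

# Hypothetical types: a search function returns result dicts that each
# carry at least a "score" key, sorted best-first.
SearchFn = Callable[[str], list[dict]]
ReformulateFn = Callable[[str, list[dict]], str]


def agentic_search(
    question: str,
    search: SearchFn,
    reformulate: ReformulateFn,
    min_score: float = 0.8,  # assumed "good enough" threshold
    max_rounds: int = 3,     # assumed cap on refinement rounds
) -> tuple[list[dict], list[str]]:
    """Search, check the top score, and reformulate until good enough."""
    query = question
    tried: list[str] = []
    best: list[dict] = []
    for _ in range(max_rounds):
        tried.append(query)
        results = search(query)
        # Keep the result set whose top hit scored highest so far.
        if results and (not best or results[0]["score"] > best[0]["score"]):
            best = results
        if best and best[0]["score"] >= min_score:
            break  # good enough: answer from these chunks
        query = reformulate(query, results)  # need more: search again
    return best, tried
```

Returning the list of tried queries alongside the results makes it easy to log how many refinement rounds a question needed.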

Two Indexes

Index	Contents	Update Frequency
personal	`~/.claude/state/` files, memory, decisions, preferences	Daily
docs	External documentation (k0s, ArgoCD, etc.)	Daily

Why Two Indexes

Update cadence can diverge: both refresh daily for now, but docs can also be re-indexed manually after a version upgrade
Different retrieval strategies (personal may weight recency)
Can query one or both depending on the question
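
One way the personal index might weight recency is to blend the similarity score with an exponential decay on chunk age. The sketch below is an assumption, not part of the design: the 90-day half-life and the 50/50 blend are illustrative tuning knobs.

```python
from datetime import date


def recency_weighted(score: float, chunk_date: date, today: date,
                     half_life_days: float = 90.0) -> float:
    """Blend similarity with an exponential recency decay.

    half_life_days is an assumed tuning knob: a chunk that is one
    half-life old keeps 75% of its score with the 50/50 blend below.
    """
    age_days = max((today - chunk_date).days, 0)
    decay = 0.5 ** (age_days / half_life_days)
    # Half of the score is age-independent, so old chunks never vanish.
    return score * (0.5 + 0.5 * decay)
```

Keeping an age-independent floor means an old but highly relevant decision can still outrank a recent, weakly related note.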

Components

┌─────────────────────────────────────────────────────────────────┐
│                        rag-search skill                         │
│                     (Claude invokes this)                       │
└─────────────────────┬───────────────────────────────────────────┘
                      │
        ┌─────────────┴─────────────┐
        ▼                           ▼
┌───────────────────┐     ┌───────────────────┐
│  Personal Index   │     │    Docs Index     │
│                   │     │                   │
│ ~/.claude/state/* │     │ External docs     │
│ memory/*.json     │     │ (k0s, ArgoCD...)  │
│ kb.json           │     │                   │
└────────┬──────────┘     └────────┬──────────┘
         │                         │
         └──────────┬──────────────┘
                    ▼
         ┌───────────────────┐
         │   Vector Store    │
         │   (ChromaDB)      │
         │                   │
         │ Collections:      │
         │  - personal       │
         │  - docs           │
         └────────┬──────────┘
                  │
                  ▼
         ┌───────────────────┐
         │  Embedding Model  │
         │  (sentence-       │
         │   transformers)   │
         └───────────────────┘

Stack

Component	Choice	Notes
Vector store	ChromaDB	Embedded (in-process) mode, no separate server to run
Embeddings	sentence-transformers (all-MiniLM-L6-v2)	Runs on arm64, ~90MB
Storage	`~/.claude/data/rag-search/`	Local to workstation
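
A minimal sketch of how this stack could be wired together, assuming `chromadb` and `sentence-transformers` are installed in the skill's venv. The helper names (`open_collections`, `embed`, `add_chunks`, `query`) are hypothetical; the third-party imports are deferred so the helpers can be exercised with stand-ins.

```python
from pathlib import Path

DATA_DIR = Path.home() / ".claude" / "data" / "rag-search"  # storage path from the table
MODEL_NAME = "all-MiniLM-L6-v2"
COLLECTIONS = ("personal", "docs")


def open_collections(data_dir: Path = DATA_DIR) -> dict:
    """Open (or create) the on-disk store and both collections."""
    import chromadb  # third-party

    client = chromadb.PersistentClient(path=str(data_dir))
    return {name: client.get_or_create_collection(name=name) for name in COLLECTIONS}


def embed(texts: list[str]) -> list[list[float]]:
    """Embed texts with sentence-transformers (loads the model on every call)."""
    from sentence_transformers import SentenceTransformer  # third-party

    model = SentenceTransformer(MODEL_NAME)
    return model.encode(texts).tolist()


def add_chunks(collection, ids, chunks, metadatas, embed_fn=embed) -> None:
    """Store chunks with precomputed embeddings."""
    collection.add(ids=ids, documents=chunks, metadatas=metadatas,
                   embeddings=embed_fn(chunks))


def query(collection, text: str, top_k: int = 5, embed_fn=embed) -> dict:
    """Return the top_k nearest chunks for a query string."""
    return collection.query(query_embeddings=embed_fn([text]), n_results=top_k)
```

Passing embeddings explicitly keeps indexing and querying on the same model; a real implementation would likely cache the loaded model rather than reloading it per call.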

Skill Structure

Location: ~/.claude/skills/rag-search/

rag-search/
├── SKILL.md              # Instructions for Claude
├── scripts/
│   ├── search.py         # Main search entry point
│   ├── index_personal.py # Index state files
│   ├── index_docs.py     # Index external docs
│   └── add_doc_source.py # Add new doc source
└── references/
    └── sources.json      # Configured doc sources

Skill Interface

Invocation

# Basic search (both indexes)
~/.claude/skills/rag-search/scripts/search.py "how did I configure ArgoCD sync?"

# Search specific index
~/.claude/skills/rag-search/scripts/search.py --index personal "past decisions about caching"
~/.claude/skills/rag-search/scripts/search.py --index docs "k0s node maintenance"

# Control result count
~/.claude/skills/rag-search/scripts/search.py --top-k 10 "prometheus alerting rules"
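
The flags above map naturally onto `argparse`. The following is a sketch of how `search.py` might parse them; the default of 5 for `--top-k` and the helper name `collections_for` are assumptions, since the design does not specify them.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="search.py", description="Semantic search over the RAG indexes")
    parser.add_argument("query", help="natural-language query")
    parser.add_argument("--index", choices=["personal", "docs"], default=None,
                        help="restrict the search to one index (default: both)")
    parser.add_argument("--top-k", type=int, default=5,  # assumed default
                        help="number of results to return")
    return parser


def collections_for(index: Optional[str]) -> list[str]:
    """Resolve the --index flag to the collections to search."""
    return [index] if index else ["personal", "docs"]
```

Using `choices` makes a mistyped index name fail at parse time instead of producing an empty result set.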

Output Format

{
  "query": "how did I configure ArgoCD sync?",
  "results": [
    {
      "rank": 1,
      "score": 0.847,
      "source": "personal",
      "file": "memory/decisions.json",
      "chunk": "Decided to use ArgoCD auto-sync with self-heal disabled...",
      "metadata": {"date": "2025-01-15", "context": "k8s setup"}
    },
    {
      "rank": 2,
      "score": 0.823,
      "source": "docs",
      "file": "argocd/sync-options.md",
      "chunk": "Auto-sync can be configured with selfHeal and prune options...",
      "metadata": {"doc_version": "2.9", "url": "https://..."}
    }
  ],
  "searched_collections": ["personal", "docs"],
  "total_chunks_searched": 1847
}
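
A sketch of how hits from both collections could be merged into this shape. It assumes each hit already carries `score`, `source`, `file`, `chunk`, and `metadata`; how a vector-store distance is converted into `score` is left open here, and `build_output` is a hypothetical helper name.

```python
import json


def build_output(query: str, hits: list[dict], searched: list[str],
                 total_chunks: int, top_k: int = 5) -> str:
    """Merge hits from all searched collections into the JSON shape above."""
    # Highest score first, then keep only the requested number of results.
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)[:top_k]
    results = [{"rank": i, **hit} for i, hit in enumerate(ranked, start=1)]
    return json.dumps({
        "query": query,
        "results": results,
        "searched_collections": searched,
        "total_chunks_searched": total_chunks,
    }, indent=2)
```

Ranking after the merge means personal and docs hits compete on one scale, which only makes sense if both collections use the same embedding model and distance metric.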

SKILL.md Guidance

Start with broad query, refine if results aren't relevant
Cross-reference personal decisions with docs when both appear
Cite sources in answers (file + date for personal, URL for docs)
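
The citation rule can be made concrete with a small formatter over one result object. This is an illustrative sketch; `cite` is a hypothetical helper, and the fallbacks for a missing date or URL are assumptions.

```python
def cite(result: dict) -> str:
    """Format a citation: file + date for personal hits, URL for docs hits."""
    meta = result.get("metadata", {})
    if result["source"] == "personal":
        return f"{result['file']} ({meta.get('date', 'undated')})"
    # Fall back to the file path when a docs chunk carries no URL.
    return meta.get("url") or result["file"]
```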

External Docs Management

Source Registry

Location: ~/.claude/skills/rag-search/references/sources.json

{
  "sources": [
    {
      "id": "k0s",
      "name": "k0s Documentation",
      "type": "git",
      "url": "https://github.com/k0sproject/k0s.git",
      "path": "docs/",
      "glob": "**/*.md",
      "version": "v1.30.0",
      "last_indexed": "2025-01-20T10:00:00Z"
    },
    {
      "id": "argocd",
      "name": "ArgoCD Documentation",
      "type": "web",
      "base_url": "https://argo-cd.readthedocs.io/en/stable/",
      "pages": ["user-guide/sync-options/", "operator-manual/"],
      "last_indexed": "2025-01-18T14:30:00Z"
    }
  ]
}
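
Before indexing, the registry could be checked for structural problems. The sketch below infers the required fields per source type from the example above, treating `version` and `last_indexed` as optional; that field split and the helper names are assumptions.

```python
import json

# Required keys per source type, inferred from the example registry.
REQUIRED = {
    "git": {"id", "name", "type", "url", "path", "glob"},
    "web": {"id", "name", "type", "base_url", "pages"},
}


def load_sources(path) -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def validate_sources(registry: dict) -> list[str]:
    """Return a list of problems; an empty list means the registry looks sane."""
    problems: list[str] = []
    seen: set[str] = set()
    for i, src in enumerate(registry.get("sources", [])):
        label = src.get("id", f"#{i}")
        kind = src.get("type")
        if kind not in REQUIRED:
            problems.append(f"{label}: unknown type {kind!r}")
            continue
        missing = REQUIRED[kind] - src.keys()
        if missing:
            problems.append(f"{label}: missing {sorted(missing)}")
        if label in seen:
            problems.append(f"{label}: duplicate id")
        seen.add(label)
    return problems
```

Collecting all problems instead of raising on the first one lets a single run report every broken entry.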

Adding Sources

~/.claude/skills/rag-search/scripts/add_doc_source.py \
  --id "cilium" \
  --name "Cilium Docs" \
  --type git \
  --url "https://github.com/cilium/cilium.git" \
  --path "Documentation/" \
  --glob "**/*.rst"

# Then index it
~/.claude/skills/rag-search/scripts/index_docs.py --source cilium
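
Internally, `add_doc_source.py` presumably appends an entry to `sources.json`. A minimal sketch of that step follows, assuming duplicate ids should be rejected and that `last_indexed` starts empty until the indexer runs; both behaviors are assumptions rather than part of the design.

```python
import json
from pathlib import Path


def add_source(registry_path: Path, source: dict) -> None:
    """Append a source to sources.json, refusing duplicate ids."""
    registry = {"sources": []}
    if registry_path.exists():
        registry = json.loads(registry_path.read_text(encoding="utf-8"))
    if any(s["id"] == source["id"] for s in registry["sources"]):
        raise ValueError(f"source id already registered: {source['id']}")
    # last_indexed starts as None; the indexer would set it after a successful run.
    registry["sources"].append({**source, "last_indexed": None})
    # Write to a temp file, then rename, so a crash cannot truncate the registry.
    tmp = registry_path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(registry, indent=2) + "\n", encoding="utf-8")
    tmp.replace(registry_path)
```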

Update Strategies

Strategy	Command	When
Manual	`index_docs.py --source <id>`	After version upgrade
All sources	`index_docs.py --all`	Periodic refresh

Periodic Refresh

A systemd user timer on the workstation refreshes both indexes once a day.

Service

Location: ~/.config/systemd/user/rag-index.service

[Unit]
Description=Refresh RAG search indexes
After=network-online.target

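[Service]
# oneshot: the unit counts as started only once the command has exited
Type=oneshot
ExecStart=%h/.claude/skills/rag-search/scripts/index_docs.py --all --quiet
# ExecStartPost runs only if ExecStart succeeded, so a failed docs refresh also skips the personal refresh
ExecStartPost=%h/.claude/skills/rag-search/scripts/index_personal.py --quiet
# Put the skill's venv first on PATH (assumes the scripts use an env-based shebang)
Environment=PATH=%h/.claude/skills/rag-search/venv/bin:/usr/bin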

[Install]
WantedBy=default.target

Timer

Location: ~/.config/systemd/user/rag-index.timer

[Unit]
Description=Daily RAG index refresh

[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=3600

[Install]
WantedBy=timers.target

Enable

systemctl --user daemon-reload
systemctl --user enable --now rag-index.timer

Manual Trigger

systemctl --user start rag-index.service
journalctl --user -u rag-index.service  # View logs

Resource Requirements

Target: Workstation or Pi5 8GB

Component	RAM	Disk	Notes
Embedding model (all-MiniLM-L6-v2)	~256MB	~90MB	Loaded on-demand
ChromaDB	~100-500MB	Varies	Scales with index size
Index: personal (~50 files)	—	~5MB	Small, fast to query
Index: docs (10-20 sources)	—	~100-500MB	Depends on doc volume
Indexing process (peak)	~1GB	—	During embedding generation

Pi3 1GB: Not suitable for this workload; the ~1GB indexing peak alone would exhaust its RAM.
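
As a rough cross-check on the disk figures, the raw vector payload can be estimated from the chunk count. all-MiniLM-L6-v2 produces 384-dimensional embeddings; storing them as float32 is an assumption here, and the estimate excludes chunk text, metadata, and index overhead.

```python
EMBEDDING_DIM = 384    # output dimension of all-MiniLM-L6-v2
BYTES_PER_FLOAT32 = 4  # assumed storage precision


def raw_vector_bytes(num_chunks: int) -> int:
    """Bytes needed for the vectors alone (no text, metadata, or index overhead)."""
    return num_chunks * EMBEDDING_DIM * BYTES_PER_FLOAT32


print(raw_vector_bytes(1847))               # 2836992 bytes, about 2.7 MiB
print(raw_vector_bytes(100_000) / 2**20)    # about 146.5 MiB
```

The 1847 figure is the chunk count from the sample search output; 100,000 chunks is a hypothetical docs index size chosen to show the scale.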

Chunking Strategy

Index	Strategy
Personal	Per JSON key or logical section (decisions, preferences, facts as separate chunks)
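Docs	Target ~500 tokens per chunk with overlap, headers preserved as metadata; cap chunks at 256 word pieces while using all-MiniLM-L6-v2, which truncates longer input by default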
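
Both strategies can be sketched in a few lines. The code below is an illustration under assumptions: personal files are taken to be JSON objects chunked per top-level key, whitespace-separated words stand in for model tokens, and the overlap of 50 is a placeholder. Because all-MiniLM-L6-v2 truncates input beyond 256 word pieces by default, `size` is left as a parameter.

```python
import json


def chunk_personal(json_text: str, file: str) -> list[dict]:
    """One chunk per top-level JSON key (e.g. decisions, preferences, facts)."""
    data = json.loads(json_text)
    return [
        {"text": json.dumps(value, ensure_ascii=False),
         "metadata": {"file": file, "key": key}}
        for key, value in data.items()
    ]


def chunk_doc(markdown: str, size: int = 500, overlap: int = 50) -> list[dict]:
    """Sliding-window chunks over words, tagged with the header governing the first word."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    tokens: list[tuple[str, str]] = []  # (word, most recent header)
    header = ""
    for line in markdown.splitlines():
        if line.startswith("#"):
            header = line.lstrip("#").strip()
        tokens.extend((word, header) for word in line.split())
    chunks = []
    for start in range(0, len(tokens), size - overlap):
        window = tokens[start:start + size]
        chunks.append({"text": " ".join(word for word, _ in window),
                       "metadata": {"header": window[0][1]}})
        if start + size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks
```

A production chunker would count tokens with the embedding model's own tokenizer, since word counts underestimate word-piece counts.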

Implementation Notes

Recommended: Ralph Loop

This design is suitable for Ralph loop implementation:

Clear success criteria (tests, functional checks)
Iterative refinement expected (tuning chunking, embeddings)
Automatic verification possible

Model Delegation

Assign each phase to the lightest model that can handle it:

Phase	Task	Model
1	Set up ChromaDB + embedding model	Haiku
2	Write `index_personal.py`	Sonnet
3	Write `index_docs.py`	Sonnet
4	Write `search.py`	Sonnet
5	Write SKILL.md	Haiku
6	Integration tests	Sonnet
7	End-to-end validation	Sonnet

Ralph Invocation

/ralph-loop "Implement rag-search skill per docs/plans/2025-01-21-agentic-rag-design.md.

Delegate to appropriate models:
- Haiku: setup, docs, simple scripts
- Sonnet: implementation, tests, debugging
- Opus: only if stuck on complex reasoning

Success criteria:
1. ChromaDB + embeddings working
2. Personal index populated from ~/.claude/state
3. At least one external doc source indexed
4. search.py returns relevant results
5. All tests pass

Output <promise>COMPLETE</promise> when done." --max-iterations 30 --completion-promise "COMPLETE"

When NOT to use Ralph

Design decisions still needed (use brainstorming first)
Requires human judgment mid-implementation
One-shot simple tasks

Workflow Integration

/superpowers:brainstorm
        │
        ▼
   Design doc created
   (docs/plans/YYYY-MM-DD-*-design.md)
        │
        ▼
   "Ready to implement?"
        │
   ┌────┴────┐
   │         │
   ▼         ▼
 Simple    Complex/Iterative
   │              │
   ▼              ▼
 Manual     /ralph-loop
 or TDD     with design doc
            as spec

Summary

Aspect	Decision
Architecture	Extend existing Claude skill system with semantic search
Indexes	Two: personal (state files) + docs (external)
Vector store	ChromaDB (local, embedded; no external service)
Embeddings	sentence-transformers (all-MiniLM-L6-v2)
Skill interface	`rag-search` skill with `search.py` CLI
Doc management	`sources.json` registry, git/web fetching
Refresh	systemd user timer, daily
Storage	`~/.claude/data/rag-search/`
Hardware	Workstation (a Pi5 8GB is also sufficient if needed)
Implementation	Ralph loop with Haiku/Sonnet subagent delegation
