# Agentic RAG Design

**Date:** 2025-01-21
**Status:** Ready for implementation
**Category:** Agent memory / Knowledge retrieval

## Overview

Add semantic search to the existing Claude agent system, enabling multi-source reasoning that combines personal context (state files, memory, decisions) with external documentation.

### Goals

- Retrieve relevant past decisions and preferences when answering questions
- Search external docs (k0s, ArgoCD, Prometheus, etc.) for technical reference
- Cross-reference personal context with official documentation
- Support iterative query refinement (agentic behavior)

### Non-Goals (Future Considerations)

Deferred to `future-considerations.json`:

- **fc-043**: Auto-sync on tool version change
- **fc-044**: Broad doc indexing (hundreds of sources)
- **fc-045**: K8s deployment
- **fc-046**: Query caching

## Architecture

```
User question
      │
      ▼
Personal Assistant (existing)
      │
      ├── Decides if RAG would help
      │
      ▼
rag-search skill (new)
      │
      ├── Query embedding
      ├── Vector similarity search
      ├── Return ranked chunks with metadata
      │
      ▼
Claude reasons over results
      │
      ├── Good enough? → Answer
      └── Need more? → Reformulate, search again
```

### Two Indexes

| Index | Contents | Update Frequency |
|-------|----------|------------------|
| **personal** | `~/.claude/state/` files, memory, decisions, preferences | Daily |
| **docs** | External documentation (k0s, ArgoCD, etc.) | Daily |

### Why Two Indexes

- Different change rates: personal state changes with everyday use, while docs change mainly on upstream version upgrades (both are currently refreshed by the same daily timer)
- Different retrieval strategies (personal may weight recency)
- Can query one or both depending on the question

## Components

```
┌─────────────────────────────────────────────┐
│              rag-search skill               │
│            (Claude invokes this)            │
└──────────────────────┬──────────────────────┘
                       │
          ┌────────────┴────────────┐
          ▼                         ▼
┌───────────────────┐     ┌───────────────────┐
│  Personal Index   │     │    Docs Index     │
│                   │     │                   │
│ ~/.claude/state/* │     │ External docs     │
│ memory/*.json     │     │ (k0s, ArgoCD...)  │
│ kb.json           │     │                   │
└─────────┬─────────┘     └─────────┬─────────┘
          │                         │
          └────────────┬────────────┘
                       ▼
             ┌───────────────────┐
             │   Vector Store    │
             │    (ChromaDB)     │
             │                   │
             │ Collections:      │
             │  - personal       │
             │  - docs           │
             └─────────┬─────────┘
                       │
                       ▼
             ┌───────────────────┐
             │  Embedding Model  │
             │  (sentence-       │
             │   transformers)   │
             └───────────────────┘
```

### Stack

| Component | Choice | Notes |
|-----------|--------|-------|
| Vector store | ChromaDB | Embedded in-process; no separate database server |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) | Runs on arm64; ~90MB; 384-dimensional embeddings |
| Storage | `~/.claude/data/rag-search/` | Local to workstation |

## Skill Structure

**Location:** `~/.claude/skills/rag-search/`

```
rag-search/
├── SKILL.md                  # Instructions for Claude
├── scripts/
│   ├── search.py             # Main search entry point
│   ├── index_personal.py     # Index state files
│   ├── index_docs.py         # Index external docs
│   └── add_doc_source.py     # Add new doc source
└── references/
    └── sources.json          # Configured doc sources
```

## Skill Interface

### Invocation

```bash
# Basic search (both indexes)
~/.claude/skills/rag-search/scripts/search.py "how did I configure ArgoCD sync?"

# Search specific index
~/.claude/skills/rag-search/scripts/search.py --index personal "past decisions about caching"
~/.claude/skills/rag-search/scripts/search.py --index docs "k0s node maintenance"

# Control result count
~/.claude/skills/rag-search/scripts/search.py --top-k 10 "prometheus alerting rules"
```

### Output Format

```json
{
  "query": "how did I configure ArgoCD sync?",
  "results": [
    {
      "rank": 1,
      "score": 0.847,
      "source": "personal",
      "file": "memory/decisions.json",
      "chunk": "Decided to use ArgoCD auto-sync with self-heal disabled...",
      "metadata": {"date": "2025-01-15", "context": "k8s setup"}
    },
    {
      "rank": 2,
      "score": 0.823,
      "source": "docs",
      "file": "argocd/sync-options.md",
      "chunk": "Auto-sync can be configured with selfHeal and prune options...",
      "metadata": {"doc_version": "2.9", "url": "https://..."}
    }
  ],
  "searched_collections": ["personal", "docs"],
  "total_chunks_searched": 1847
}
```

### SKILL.md Guidance

- Start with a broad query; refine it if the results aren't relevant
- Cross-reference personal decisions with docs when both appear
- Cite sources in answers (file + date for personal, URL for docs)

## External Docs Management

### Source Registry

**Location:** `~/.claude/skills/rag-search/references/sources.json`

```json
{
  "sources": [
    {
      "id": "k0s",
      "name": "k0s Documentation",
      "type": "git",
      "url": "https://github.com/k0sproject/k0s.git",
      "path": "docs/",
      "glob": "**/*.md",
      "version": "v1.30.0",
      "last_indexed": "2025-01-20T10:00:00Z"
    },
    {
      "id": "argocd",
      "name": "ArgoCD Documentation",
      "type": "web",
      "base_url": "https://argo-cd.readthedocs.io/en/stable/",
      "pages": ["user-guide/sync-options/", "operator-manual/"],
      "last_indexed": "2025-01-18T14:30:00Z"
    }
  ]
}
```

### Adding Sources

```bash
~/.claude/skills/rag-search/scripts/add_doc_source.py \
  --id "cilium" \
  --name "Cilium Docs" \
  --type git \
  --url "https://github.com/cilium/cilium.git" \
  --path "Documentation/" \
  --glob "**/*.md"

# Then index it
~/.claude/skills/rag-search/scripts/index_docs.py --source cilium
```

### Update Strategies

| Strategy | Command | When |
|----------|---------|------|
| Manual | `index_docs.py --source SOURCE_ID` | After version upgrade |
| All sources | `index_docs.py --all` | Periodic refresh |

## Periodic Refresh

A daily systemd user timer on the workstation refreshes both indexes.

### Service

**Location:** `~/.config/systemd/user/rag-index.service`

```ini
[Unit]
Description=Refresh RAG search indexes
# User units cannot order against system units, so this has no effect in a user
# service; index_docs.py should tolerate the network not being up yet.
After=network-online.target

[Service]
Type=oneshot
# Index the local personal state first so that a network failure while fetching
# docs cannot skip it (ExecStartPost= only runs if ExecStart= succeeded).
# A oneshot service runs multiple ExecStart= lines in order.
ExecStart=%h/.claude/skills/rag-search/scripts/index_personal.py --quiet
ExecStart=%h/.claude/skills/rag-search/scripts/index_docs.py --all --quiet
# Venv first on PATH, so scripts using a `#!/usr/bin/env python3` shebang (assumed) get its interpreter
Environment=PATH=%h/.claude/skills/rag-search/venv/bin:/usr/bin

[Install]
WantedBy=default.target
```

### Timer

**Location:** `~/.config/systemd/user/rag-index.timer`

```ini
[Unit]
Description=Daily RAG index refresh

[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=3600

[Install]
WantedBy=timers.target
```

### Enable

```bash
systemctl --user daemon-reload
systemctl --user enable --now rag-index.timer
```

### Manual Trigger

```bash
systemctl --user start rag-index.service
journalctl --user -u rag-index.service  # View logs
```

## Resource Requirements

**Target:** Workstation or Pi5 8GB

| Component | RAM | Disk | Notes |
|-----------|-----|------|-------|
| Embedding model (all-MiniLM-L6-v2) | ~256MB | ~90MB | Loaded on demand |
| ChromaDB | ~100-500MB | Varies | Scales with index size |
| Index: personal (~50 files) | — | ~5MB | Small, fast to query |
| Index: docs (10-20 sources) | — | ~100-500MB | Depends on doc volume |
| Indexing process (peak) | ~1GB | — | During embedding generation |

**Pi3 1GB:** Not suitable for this workload.
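To sanity-check the disk column, the raw embedding footprint can be estimated by hand. all-MiniLM-L6-v2 produces 384-dimensional vectors; the sketch below additionally assumes float32 storage (4 bytes per dimension) and, for chunk counts, a hypothetical 50-token overlap on the ~500-token docs chunks from the chunking strategy (this design does not fix an overlap size). It counts raw vectors only: ChromaDB's HNSW graph, the stored chunk text, and metadata all add overhead, so treat the result as a lower bound rather than a prediction.

```python
# Back-of-envelope estimate of raw embedding storage for the rag-search indexes.
# Assumptions (not specified in the design): float32 vectors, 50-token chunk overlap.

EMBEDDING_DIM = 384       # all-MiniLM-L6-v2 output dimension
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_chunks: int, dim: int = EMBEDDING_DIM) -> int:
    """Bytes for the embeddings alone; excludes the HNSW graph, chunk text, and metadata."""
    return num_chunks * dim * BYTES_PER_FLOAT32


def chunks_for_corpus(total_tokens: int, chunk_tokens: int = 500, overlap_tokens: int = 50) -> int:
    """Number of sliding-window chunks needed to cover total_tokens."""
    if total_tokens <= 0:
        return 0
    if total_tokens <= chunk_tokens:
        return 1
    stride = chunk_tokens - overlap_tokens
    # Windows start at 0, stride, 2*stride, ...; the last one must start at or after
    # total_tokens - chunk_tokens so that it reaches the end (ceiling division).
    return -(-(total_tokens - chunk_tokens) // stride) + 1


if __name__ == "__main__":
    n = chunks_for_corpus(1_000_000)  # e.g. a hypothetical ~1M-token docs corpus
    print(n, "chunks,", round(raw_vector_bytes(n) / 2**20, 1), "MiB of raw vectors")
```

For the 1,847 chunks in the Output Format example, `raw_vector_bytes(1847)` is 2,836,992 bytes (about 2.7 MiB), which suggests that most of the index disk budget goes to text, metadata, and index structures rather than the vectors themselves.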
## Chunking Strategy

| Index | Strategy |
|-------|----------|
| Personal | Per JSON key or logical section (decisions, preferences, facts as separate chunks) |
| Docs | ~500 tokens per chunk with overlap; preserve headers as metadata |

Note that all-MiniLM-L6-v2 truncates input longer than 256 word pieces by default, so only the first part of a ~500-token docs chunk would be embedded. Either shrink docs chunks to fit that limit or accept that the tail of each chunk does not influence retrieval.

## Implementation Notes

### Recommended: Ralph Loop

This design is suitable for Ralph loop implementation:

- Clear success criteria (tests, functional checks)
- Iterative refinement expected (tuning chunking, embeddings)
- Automatic verification possible

### Model Delegation

Use an appropriate model for each phase:

| Phase | Task | Model |
|-------|------|-------|
| 1 | Set up ChromaDB + embedding model | Haiku |
| 2 | Write `index_personal.py` | Sonnet |
| 3 | Write `index_docs.py` | Sonnet |
| 4 | Write `search.py` | Sonnet |
| 5 | Write SKILL.md | Haiku |
| 6 | Integration tests | Sonnet |
| 7 | End-to-end validation | Sonnet |

### Ralph Invocation

```bash
/ralph-loop "Implement rag-search skill per docs/plans/2025-01-21-agentic-rag-design.md.

Delegate to appropriate models:
- Haiku: setup, docs, simple scripts
- Sonnet: implementation, tests, debugging
- Opus: only if stuck on complex reasoning

Success criteria:
1. ChromaDB + embeddings working
2. Personal index populated from ~/.claude/state
3. At least one external doc source indexed
4. search.py returns relevant results
5. All tests pass

Output COMPLETE when done." --max-iterations 30 --completion-promise "COMPLETE"
```

### When NOT to use Ralph

- Design decisions still needed (use brainstorming first)
- Requires human judgment mid-implementation
- One-shot simple tasks

## Workflow Integration

```
/superpowers:brainstorm
        │
        ▼
Design doc created (docs/plans/YYYY-MM-DD-*-design.md)
        │
        ▼
"Ready to implement?"
        │
   ┌────┴────┐
   │         │
   ▼         ▼
Simple    Complex/Iterative
   │         │
   ▼         ▼
Manual    /ralph-loop
or TDD    with design doc as spec
```

## Summary

| Aspect | Decision |
|--------|----------|
| **Architecture** | Extend existing Claude skill system with semantic search |
| **Indexes** | Two: personal (state files) + docs (external) |
| **Vector store** | ChromaDB (embedded, no separate server) |
| **Embeddings** | sentence-transformers (all-MiniLM-L6-v2) |
| **Skill interface** | `rag-search` skill with `search.py` CLI |
| **Doc management** | `sources.json` registry, git/web fetching |
| **Refresh** | systemd user timer, daily |
| **Storage** | `~/.claude/data/rag-search/` |
| **Hardware** | Runs on workstation (Pi5 8GB capable if needed) |
| **Implementation** | Ralph loop with Haiku/Sonnet subagent delegation |
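As a concrete reading of the `search.py` output contract from the Output Format section, here is a minimal stdlib sketch of merging hits from the `personal` and `docs` collections into one ranked list. It assumes (the design does not say) that both ChromaDB collections use cosine distance, so `score = 1 - distance`, and that scores are comparable across collections only because both share one embedding model. The `Hit` type, `merge_results` function, and the collection sizes in the demo are hypothetical illustrations, not ChromaDB's API.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Hit:
    """One raw vector-store hit (hypothetical shape, not ChromaDB's return type)."""
    file: str
    chunk: str
    distance: float  # assumed cosine distance, so 0.0 means identical direction
    metadata: dict = field(default_factory=dict)


def merge_results(query: str, hits_by_collection: dict, chunk_counts: dict, top_k: int = 5) -> dict:
    """Merge per-collection hits into the Output Format shape, highest score first."""
    # Convert cosine distance to cosine similarity; comparable across collections
    # only because both were embedded with the same model.
    scored = [
        (1.0 - hit.distance, name, hit)
        for name, hits in hits_by_collection.items()
        for hit in hits
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    results = [
        {
            "rank": rank,
            "score": round(score, 3),
            "source": name,
            "file": hit.file,
            "chunk": hit.chunk,
            "metadata": hit.metadata,
        }
        for rank, (score, name, hit) in enumerate(scored[:top_k], start=1)
    ]
    return {
        "query": query,
        "results": results,
        "searched_collections": list(hits_by_collection),
        "total_chunks_searched": sum(chunk_counts[name] for name in hits_by_collection),
    }


if __name__ == "__main__":
    demo = merge_results(
        "how did I configure ArgoCD sync?",
        {
            "personal": [Hit("memory/decisions.json", "Decided to use ArgoCD auto-sync...", 0.153,
                             {"date": "2025-01-15"})],
            "docs": [Hit("argocd/sync-options.md", "Auto-sync can be configured...", 0.177)],
        },
        chunk_counts={"personal": 120, "docs": 1727},  # hypothetical collection sizes
    )
    print(json.dumps(demo, indent=2))
```

Merging after scoring (rather than querying one combined collection) keeps the two indexes independent, which leaves room for the recency weighting on `personal` mentioned under Why Two Indexes.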