# Agentic RAG Design

**Date:** 2025-01-21
**Status:** Ready for implementation
**Category:** Agent memory / Knowledge retrieval

## Overview

Add semantic search to the existing Claude agent system, enabling multi-source reasoning that combines personal context (state files, memory, decisions) with external documentation.

### Goals

- Retrieve relevant past decisions and preferences when answering questions
- Search external docs (k0s, ArgoCD, Prometheus, etc.) for technical reference
- Cross-reference personal context with official documentation
- Support iterative query refinement (agentic behavior)

### Non-Goals (Future Considerations)

Deferred to `future-considerations.json`:

- **fc-043**: Auto-sync on tool version change
- **fc-044**: Broad doc indexing (hundreds of sources)
- **fc-045**: K8s deployment
- **fc-046**: Query caching

## Architecture

```
User question
     │
     ▼
Personal Assistant (existing)
     │
     ├── Decides if RAG would help
     │
     ▼
rag-search skill (new)
     │
     ├── Query embedding
     ├── Vector similarity search
     ├── Return ranked chunks with metadata
     │
     ▼
Claude reasons over results
     │
     ├── Good enough? → Answer
     └── Need more? → Reformulate, search again
```

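The refinement loop at the bottom of the diagram can be sketched roughly as follows. This is an illustration only, not the skill's implementation: `search_fn`, `reformulate_fn`, the `min_score` threshold, and `max_rounds` are hypothetical names and values, and in the real system Claude itself judges whether the results are good enough.

```python
from typing import Callable

Hit = dict  # one search result, e.g. {"score": 0.84, "chunk": "..."}


def agentic_search(
    query: str,
    search_fn: Callable[[str], list[Hit]],
    reformulate_fn: Callable[[str, list[Hit]], str],
    min_score: float = 0.75,  # assumed "good enough" threshold
    max_rounds: int = 3,      # assumed cap on reformulations
) -> tuple[str, list[Hit]]:
    """Search, inspect the best score, and reformulate until good enough."""
    best_query, best_hits = query, []
    for _ in range(max_rounds):
        hits = search_fn(query)
        top = max((h["score"] for h in hits), default=0.0)
        best_top = max((h["score"] for h in best_hits), default=0.0)
        if top > best_top or not best_hits:
            best_query, best_hits = query, hits  # keep the best round so far
        if top >= min_score:
            break  # good enough: answer from these results
        query = reformulate_fn(query, hits)  # need more: search again
    return best_query, best_hits
```

Capping the number of rounds keeps a poorly indexed topic from turning into an unbounded search loop; the function then returns the best round it saw.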
### Two Indexes

| Index | Contents | Update Frequency |
|-------|----------|------------------|
| **personal** | `~/.claude/state/` files, memory, decisions, preferences | Daily |
| **docs** | External documentation (k0s, ArgoCD, etc.) | Daily |

### Why Two Indexes

- Different update triggers (both refresh daily, but docs are also re-indexed manually after a version upgrade)
- Different retrieval strategies (personal may weight recency)
- Can query one or both depending on the question

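As one possible reading of "personal may weight recency", the sketch below blends the similarity score with an exponential decay on the chunk's age. The 90-day half-life and the 80/20 blend are assumed tuning values, and `recency_weighted` is a hypothetical helper; the design does not fix a formula.

```python
from datetime import date


def recency_weighted(score: float, chunk_date: str, today: date,
                     half_life_days: float = 90.0) -> float:
    """Blend similarity with an exponential recency decay.

    chunk_date is an ISO date such as the "date" metadata field.
    half_life_days is an assumed knob: the recency term halves every
    half_life_days of age.
    """
    age_days = max((today - date.fromisoformat(chunk_date)).days, 0)
    recency = 0.5 ** (age_days / half_life_days)  # 1.0 today, 0.5 at the half-life
    return 0.8 * score + 0.2 * recency            # assumed 80/20 blend
```

With this shape, two equally similar decisions are ordered newest first, while a much more similar old decision can still outrank a weakly related recent one.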
## Components

```
┌─────────────────────────────────────────────────────────────────┐
│                        rag-search skill                         │
│                      (Claude invokes this)                      │
└─────────────────────┬───────────────────────────────────────────┘
                      │
        ┌─────────────┴─────────────┐
        ▼                           ▼
┌───────────────────┐     ┌───────────────────┐
│  Personal Index   │     │    Docs Index     │
│                   │     │                   │
│ ~/.claude/state/* │     │ External docs     │
│ memory/*.json     │     │ (k0s, ArgoCD...)  │
│ kb.json           │     │                   │
└────────┬──────────┘     └────────┬──────────┘
         │                         │
         └──────────┬──────────────┘
                    ▼
           ┌───────────────────┐
           │   Vector Store    │
           │    (ChromaDB)     │
           │                   │
           │ Collections:      │
           │ - personal        │
           │ - docs            │
           └────────┬──────────┘
                    │
                    ▼
           ┌───────────────────┐
           │  Embedding Model  │
           │ (sentence-        │
           │  transformers)    │
           └───────────────────┘
```

### Stack

| Component | Choice | Notes |
|-----------|--------|-------|
| Vector store | ChromaDB | Embedded, runs in-process; no external server or service to operate |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2) | Runs on arm64, ~90MB |
| Storage | `~/.claude/data/rag-search/` | Local to workstation |

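A minimal sketch of how this stack could be wired together, assuming ChromaDB's persistent client and its bundled sentence-transformers embedding function. `open_collection` and `to_hits` are hypothetical helper names; the third-party imports are deferred so the module still loads on a machine where the stack is not installed yet.

```python
from pathlib import Path

DATA_DIR = Path.home() / ".claude" / "data" / "rag-search"
MODEL_NAME = "all-MiniLM-L6-v2"


def open_collection(name: str, data_dir: Path = DATA_DIR):
    """Open (or create) a persistent collection that embeds with MiniLM."""
    # Imported lazily: both packages are heavy and optional at import time.
    import chromadb
    from chromadb.utils import embedding_functions

    client = chromadb.PersistentClient(path=str(data_dir))
    embed = embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name=MODEL_NAME
    )
    return client.get_or_create_collection(name=name, embedding_function=embed)


def to_hits(raw: dict, source: str) -> list[dict]:
    """Flatten the nested lists of a single-query Chroma response."""
    docs = raw["documents"][0]
    metas = raw["metadatas"][0]
    dists = raw["distances"][0]  # lower distance means a closer match
    return [
        {"source": source, "chunk": doc, "metadata": meta or {}, "distance": dist}
        for doc, meta, dist in zip(docs, metas, dists)
    ]


# Example (requires chromadb and sentence-transformers to be installed):
#   personal = open_collection("personal")
#   raw = personal.query(query_texts=["past decisions about caching"], n_results=5)
#   hits = to_hits(raw, "personal")
```

Note that Chroma reports distances rather than similarity scores, so a score like the one in the output format would need an explicit conversion step that this sketch leaves out.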
## Skill Structure

**Location:** `~/.claude/skills/rag-search/`

```
rag-search/
├── SKILL.md                  # Instructions for Claude
├── scripts/
│   ├── search.py             # Main search entry point
│   ├── index_personal.py     # Index state files
│   ├── index_docs.py         # Index external docs
│   └── add_doc_source.py     # Add new doc source
└── references/
    └── sources.json          # Configured doc sources
```

## Skill Interface

### Invocation

```bash
# Basic search (both indexes)
~/.claude/skills/rag-search/scripts/search.py "how did I configure ArgoCD sync?"

# Search specific index
~/.claude/skills/rag-search/scripts/search.py --index personal "past decisions about caching"
~/.claude/skills/rag-search/scripts/search.py --index docs "k0s node maintenance"

# Control result count
~/.claude/skills/rag-search/scripts/search.py --top-k 10 "prometheus alerting rules"
```

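The command line above could be parsed along these lines. This is a sketch under assumptions: the explicit `both` choice, the default of 5 results, and the helper names `build_parser` and `collections_for` are not specified by the design.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the invocations shown above."""
    parser = argparse.ArgumentParser(
        prog="search.py",
        description="Semantic search over the personal and docs indexes",
    )
    parser.add_argument("query", help="natural-language question")
    parser.add_argument(
        "--index",
        choices=["personal", "docs", "both"],  # "both" is an assumed spelling
        default="both",
        help="which index to search (default: both)",
    )
    parser.add_argument(
        "--top-k",
        type=int,
        default=5,  # assumed default; the design only shows --top-k 10
        help="number of results to return",
    )
    return parser


def collections_for(index: str) -> list[str]:
    """Map the --index value to the ChromaDB collection names to query."""
    return ["personal", "docs"] if index == "both" else [index]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.query, collections_for(args.index), args.top_k)
```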
### Output Format

```json
{
  "query": "how did I configure ArgoCD sync?",
  "results": [
    {
      "rank": 1,
      "score": 0.847,
      "source": "personal",
      "file": "memory/decisions.json",
      "chunk": "Decided to use ArgoCD auto-sync with self-heal disabled...",
      "metadata": {"date": "2025-01-15", "context": "k8s setup"}
    },
    {
      "rank": 2,
      "score": 0.823,
      "source": "docs",
      "file": "argocd/sync-options.md",
      "chunk": "Auto-sync can be configured with selfHeal and prune options...",
      "metadata": {"doc_version": "2.9", "url": "https://..."}
    }
  ],
  "searched_collections": ["personal", "docs"],
  "total_chunks_searched": 1847
}
```

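One way to produce this structure when both collections are searched is to merge the hits, sort by score, and assign ranks afterwards. `build_response` is a hypothetical helper; it assumes each hit already carries a comparable `score`, which the design does not yet define.

```python
import json


def build_response(query: str, hits: list[dict], collections: list[str],
                   top_k: int, total_chunks: int) -> str:
    """Merge hits from all searched collections into the output format."""
    # Highest score first; ranks are assigned only after merging.
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)[:top_k]
    results = [
        {
            "rank": rank,
            "score": round(hit["score"], 3),
            "source": hit["source"],
            "file": hit["file"],
            "chunk": hit["chunk"],
            "metadata": hit.get("metadata", {}),
        }
        for rank, hit in enumerate(ranked, start=1)
    ]
    return json.dumps(
        {
            "query": query,
            "results": results,
            "searched_collections": collections,
            "total_chunks_searched": total_chunks,
        },
        indent=2,
        ensure_ascii=False,
    )
```

Ranking after the merge means a strong docs hit can outrank a weak personal hit, which matches the interleaved example above.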
### SKILL.md Guidance

- Start with a broad query; refine it if the results aren't relevant
- Cross-reference personal decisions with docs when both appear
- Cite sources in answers (file + date for personal, URL for docs)

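The citation rule in the last bullet can be expressed as a small helper over one entry of the `results` array. `cite` is a hypothetical name, and the fallbacks for missing metadata (`undated`, or the file path when no URL is recorded) are assumptions.

```python
def cite(result: dict) -> str:
    """Format a citation: file + date for personal, URL for docs."""
    meta = result.get("metadata") or {}
    if result["source"] == "personal":
        # Assumed fallback when a personal chunk carries no date.
        return f"{result['file']} ({meta.get('date', 'undated')})"
    # Assumed fallback: cite the file path if no URL was recorded.
    return meta.get("url") or result["file"]
```

For the first result in the example output this yields `memory/decisions.json (2025-01-15)`.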
## External Docs Management

### Source Registry

**Location:** `~/.claude/skills/rag-search/references/sources.json`

```json
{
  "sources": [
    {
      "id": "k0s",
      "name": "k0s Documentation",
      "type": "git",
      "url": "https://github.com/k0sproject/k0s.git",
      "path": "docs/",
      "glob": "**/*.md",
      "version": "v1.30.0",
      "last_indexed": "2025-01-20T10:00:00Z"
    },
    {
      "id": "argocd",
      "name": "ArgoCD Documentation",
      "type": "web",
      "base_url": "https://argo-cd.readthedocs.io/en/stable/",
      "pages": ["user-guide/sync-options/", "operator-manual/"],
      "last_indexed": "2025-01-18T14:30:00Z"
    }
  ]
}
```

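Since the registry is hand-editable, the indexer would probably benefit from validating it before fetching anything. The sketch below infers the required fields per `type` from the two entries above; that field list, and the names `validate_sources` and `load_sources`, are assumptions rather than a settled schema.

```python
import json
from pathlib import Path

# Required keys per source type, inferred from the example entries (assumed).
REQUIRED = {
    "git": {"id", "name", "type", "url", "path", "glob"},
    "web": {"id", "name", "type", "base_url", "pages"},
}


def validate_sources(registry: dict) -> list[str]:
    """Return human-readable problems; an empty list means the registry is valid."""
    problems, seen = [], set()
    for i, src in enumerate(registry.get("sources", [])):
        label = src.get("id", f"#{i}")
        kind = src.get("type")
        if kind not in REQUIRED:
            problems.append(f"{label}: unknown type {kind!r}")
            continue
        missing = REQUIRED[kind] - set(src)
        if missing:
            problems.append(f"{label}: missing {sorted(missing)}")
        if label in seen:
            problems.append(f"{label}: duplicate id")
        seen.add(label)
    return problems


def load_sources(path: Path) -> list[dict]:
    """Read sources.json and fail early if it is malformed."""
    registry = json.loads(path.read_text(encoding="utf-8"))
    problems = validate_sources(registry)
    if problems:
        raise ValueError("; ".join(problems))
    return registry["sources"]
```

Optional fields such as `version` and `last_indexed` are deliberately not required, so a freshly added source passes validation before its first indexing run.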
### Adding Sources

```bash
~/.claude/skills/rag-search/scripts/add_doc_source.py \
  --id "cilium" \
  --name "Cilium Docs" \
  --type git \
  --url "https://github.com/cilium/cilium.git" \
  --path "Documentation/" \
  --glob "**/*.md"

# Then index it
~/.claude/skills/rag-search/scripts/index_docs.py --source cilium
```

### Update Strategies

| Strategy | Command | When |
|----------|---------|------|
| Manual | `index_docs.py --source <id>` | After version upgrade |
| All sources | `index_docs.py --all` | Periodic refresh |

## Periodic Refresh

Daily systemd timer on workstation.

### Service

**Location:** `~/.config/systemd/user/rag-index.service`

```ini
[Unit]
Description=Refresh RAG search indexes
After=network-online.target

[Service]
Type=oneshot
ExecStart=%h/.claude/skills/rag-search/scripts/index_docs.py --all --quiet
ExecStartPost=%h/.claude/skills/rag-search/scripts/index_personal.py --quiet
Environment=PATH=%h/.claude/skills/rag-search/venv/bin:/usr/bin

[Install]
WantedBy=default.target
```

### Timer

**Location:** `~/.config/systemd/user/rag-index.timer`

```ini
[Unit]
Description=Daily RAG index refresh

[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=3600

[Install]
WantedBy=timers.target
```

### Enable

```bash
systemctl --user daemon-reload
systemctl --user enable --now rag-index.timer
```

### Manual Trigger

```bash
systemctl --user start rag-index.service
journalctl --user -u rag-index.service  # View logs
```

## Resource Requirements

**Target:** Workstation or Pi5 8GB

| Component | RAM | Disk | Notes |
|-----------|-----|------|-------|
| Embedding model (all-MiniLM-L6-v2) | ~256MB | ~90MB | Loaded on-demand |
| ChromaDB | ~100-500MB | Varies | Scales with index size |
| Index: personal (~50 files) | n/a | ~5MB | Small, fast to query |
| Index: docs (10-20 sources) | n/a | ~100-500MB | Depends on doc volume |
| Indexing process (peak) | ~1GB | n/a | During embedding generation |

**Pi3 1GB:** Not suitable for this workload: indexing alone peaks at ~1GB of RAM.

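As a sanity check on the disk figures, the raw vector payload can be estimated from the chunk count. all-MiniLM-L6-v2 produces 384-dimensional embeddings; storing them as 4-byte floats is an assumption here, and the on-disk index also holds chunk text, metadata, and the ANN graph, so actual usage will be higher than this lower bound.

```python
EMBEDDING_DIM = 384    # output dimension of all-MiniLM-L6-v2
BYTES_PER_FLOAT = 4    # assumes float32 storage


def raw_vector_bytes(num_chunks: int) -> int:
    """Lower bound on index size: embedding vectors only."""
    return num_chunks * EMBEDDING_DIM * BYTES_PER_FLOAT


def to_mib(num_bytes: int) -> float:
    """Convert bytes to MiB, rounded to two decimals."""
    return round(num_bytes / (1024 * 1024), 2)


# The 1847 chunks from the example output need about 2.71 MiB of raw vectors.
print(to_mib(raw_vector_bytes(1847)))
```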
## Chunking Strategy

| Index | Strategy |
|-------|----------|
| Personal | Per JSON key or logical section (decisions, preferences, facts as separate chunks) |
| Docs | ~500 tokens per chunk with overlap, preserve headers as metadata |

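Both strategies can be sketched in a few lines. The assumptions are significant: `chunk_personal` expects each state file to be a JSON object at the top level, and `chunk_docs` counts whitespace-separated words as a stand-in for tokens, with a hypothetical overlap of 50. Neither function name comes from the design.

```python
import json
import re


def chunk_personal(file_name: str, text: str) -> list[dict]:
    """One chunk per top-level JSON key (decisions, preferences, ...)."""
    data = json.loads(text)  # assumed to be a JSON object, not a list
    return [
        {"file": file_name, "key": key, "chunk": json.dumps(value, ensure_ascii=False)}
        for key, value in data.items()
    ]


def chunk_docs(markdown: str, size: int = 500, overlap: int = 50) -> list[dict]:
    """Sliding-window chunks per section; the section header becomes metadata.

    Words approximate tokens here. Requires size > overlap.
    """
    chunks, header, words = [], "", []

    def flush() -> None:
        # Emit overlapping windows over the words of the current section.
        step = size - overlap
        for start in range(0, len(words), step):
            chunks.append({"header": header, "chunk": " ".join(words[start:start + size])})
            if start + size >= len(words):
                break  # the last window already reached the section end

    for line in markdown.splitlines():
        match = re.match(r"#{1,6}\s+(.*)", line)
        if match:
            flush()  # close the previous section before starting a new one
            header, words = match.group(1).strip(), []
        else:
            words.extend(line.split())
    flush()
    return chunks
```

Restarting the window at every header keeps a chunk from straddling two sections, so the header stored as metadata always describes the whole chunk.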
## Implementation Notes

### Recommended: Ralph Loop

This design suits a Ralph loop implementation:

- Clear success criteria (tests, functional checks)
- Iterative refinement expected (tuning chunking, embeddings)
- Automatic verification possible

### Model Delegation

Use appropriate models for each phase:

| Phase | Task | Model |
|-------|------|-------|
| 1 | Set up ChromaDB + embedding model | Haiku |
| 2 | Write `index_personal.py` | Sonnet |
| 3 | Write `index_docs.py` | Sonnet |
| 4 | Write `search.py` | Sonnet |
| 5 | Write SKILL.md | Haiku |
| 6 | Integration tests | Sonnet |
| 7 | End-to-end validation | Sonnet |

### Ralph Invocation

```bash
/ralph-loop "Implement rag-search skill per docs/plans/2025-01-21-agentic-rag-design.md.

Delegate to appropriate models:
- Haiku: setup, docs, simple scripts
- Sonnet: implementation, tests, debugging
- Opus: only if stuck on complex reasoning

Success criteria:
1. ChromaDB + embeddings working
2. Personal index populated from ~/.claude/state
3. At least one external doc source indexed
4. search.py returns relevant results
5. All tests pass

Output <promise>COMPLETE</promise> when done." --max-iterations 30 --completion-promise "COMPLETE"
```

### When NOT to use Ralph

- Design decisions still needed (use brainstorming first)
- Requires human judgment mid-implementation
- One-shot simple tasks

## Workflow Integration

```
/superpowers:brainstorm
        │
        ▼
Design doc created
(docs/plans/YYYY-MM-DD-*-design.md)
        │
        ▼
"Ready to implement?"
        │
   ┌────┴────┐
   │         │
   ▼         ▼
Simple    Complex/Iterative
   │         │
   ▼         ▼
Manual    /ralph-loop
or TDD    with design doc
          as spec
```

## Summary

| Aspect | Decision |
|--------|----------|
| **Architecture** | Extend existing Claude skill system with semantic search |
| **Indexes** | Two: personal (state files) + docs (external) |
| **Vector store** | ChromaDB (local, embedded; no external services) |
| **Embeddings** | sentence-transformers (all-MiniLM-L6-v2) |
| **Skill interface** | `rag-search` skill with `search.py` CLI |
| **Doc management** | `sources.json` registry, git/web fetching |
| **Refresh** | systemd user timer, daily |
| **Storage** | `~/.claude/data/rag-search/` |
| **Hardware** | Runs on workstation (Pi5 8GB capable if needed) |
| **Implementation** | Ralph loop with Haiku/Sonnet subagent delegation |