Implement rag-search skill for semantic search

Add new skill for semantic search across personal state files and external documentation using ChromaDB and sentence-transformers.

Components:
- search.py: Main search interface (--index, --top-k flags)
- index_personal.py: Index ~/.claude/state files
- index_docs.py: Index external docs (git repos)
- add_doc_source.py: Manage doc sources
- test_rag.py: Test suite (5/5 passing)

Features:
- Two indexes: personal (116 chunks) and docs (k0s: 846 chunks)
- all-MiniLM-L6-v2 embeddings (384 dimensions)
- ChromaDB persistent storage
- JSON output with ranked results and metadata

Documentation:
- Added to component-registry.json with triggers
- Added /rag command alias
- Updated skills/README.md
- Resolved fc-013 (vector database for agent memory)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-04 23:41:38 -08:00
parent c21b152de8
commit 7ca8caeecb
11 changed files with 1781 additions and 155 deletions
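The commit message mentions embeddings and ranked results but does not say how ranking is computed. As a rough illustration only, the sketch below ranks chunks by cosine similarity in plain Python. The metric is an assumption (ChromaDB collections can be configured with other distance functions), the 3-dimensional vectors stand in for the 384-dimensional embeddings, and the chunk IDs and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=5):
    """Rank (chunk_id, vector) pairs by similarity to query_vec, best first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [
        {"rank": i + 1, "id": cid, "score": round(score, 3)}
        for i, (cid, score) in enumerate(scored[:k])
    ]


# Hypothetical chunk IDs with toy vectors (assumption: not real index data).
chunks = [
    ("personal-a", [1.0, 0.0, 0.0]),
    ("docs-b", [0.0, 1.0, 0.0]),
    ("personal-c", [0.6, 0.8, 0.0]),
]
results = top_k([1.0, 0.0, 0.0], chunks, k=2)
print(results)
```

The `k` argument plays the role of the `--top-k` flag: only the best `k` chunks are returned, each with a 1-based rank.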
--- a/skills/rag-search/SKILL.md
+++ b/skills/rag-search/SKILL.md
@@ -0,0 +1,123 @@
+---
+name: rag-search
+description: Semantic search across personal state files and external documentation
+triggers: [search, find, lookup, what did, how did, when did, past decisions, previous, documentation, docs]
+---
+
+# RAG Search Skill
+
+Semantic search across two indexes:
+- **personal**: Your state files, memory, decisions, preferences
+- **docs**: External documentation (k0s, ArgoCD, etc.)
+
+## When to Use
+
+- "What decisions did I make about X?"
+- "How did I configure Y?"
+- "What does the k0s documentation say about Z?"
+- "Find my past notes on..."
+- Cross-referencing personal context with official docs
+
+## Scripts
+
+All scripts use the venv at `~/.claude/skills/rag-search/venv/`.
+
+### Search (Primary Interface)
+
+```bash
+# Search both indexes
+~/.claude/skills/rag-search/venv/bin/python \
+  ~/.claude/skills/rag-search/scripts/search.py "query"
+
+# Search specific index
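+~/.claude/skills/rag-search/venv/bin/python ~/.claude/skills/rag-search/scripts/search.py --index personal "query"
+~/.claude/skills/rag-search/venv/bin/python ~/.claude/skills/rag-search/scripts/search.py --index docs "query"
+
+# Control result count
+~/.claude/skills/rag-search/venv/bin/python ~/.claude/skills/rag-search/scripts/search.py --top-k 10 "query"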
+```
+
+### Index Management
+
+```bash
+# Reindex personal state files
+~/.claude/skills/rag-search/venv/bin/python \
+  ~/.claude/skills/rag-search/scripts/index_personal.py
+
+# Index all doc sources
+~/.claude/skills/rag-search/venv/bin/python \
+  ~/.claude/skills/rag-search/scripts/index_docs.py --all
+
+# Index specific doc source
+~/.claude/skills/rag-search/venv/bin/python ~/.claude/skills/rag-search/scripts/index_docs.py --source k0s
+```
+
+### Adding Doc Sources
+
+```bash
+# Add a git-based doc source
+~/.claude/skills/rag-search/venv/bin/python \
+  ~/.claude/skills/rag-search/scripts/add_doc_source.py \
+  --id "argocd" \
+  --name "ArgoCD Documentation" \
+  --type git \
+  --url "https://github.com/argoproj/argo-cd.git" \
+  --path "docs/" \
+  --glob "**/*.md"
+
+# List configured sources
+~/.claude/skills/rag-search/venv/bin/python ~/.claude/skills/rag-search/scripts/add_doc_source.py --list
+```
+
+## Output Format
+
+Search returns JSON:
+
+```json
+{
+  "query": "your search query",
+  "results": [
+    {
+      "rank": 1,
+      "score": 0.847,
+      "source": "personal",
+      "file": "memory/decisions.json",
+      "chunk": "Relevant text content...",
+      "metadata": {"date": "2025-01-15"}
+    }
+  ],
+  "searched_collections": ["personal", "docs"],
+  "total_chunks_searched": 1847
+}
+```
+
+## Search Strategy
+
+1. **Start broad** - Use general terms first
+2. **Refine if needed** - Add specific keywords if results aren't relevant
+3. **Cross-reference** - When both personal and docs results appear, synthesize them
+4. **Cite sources** - Include file paths and dates in your answers
+
+## Example Workflow
+
+User asks: "How should I configure ArgoCD sync?"
+
+1. Search both indexes:
+   ```bash
+   search.py "ArgoCD sync configuration"
+   ```
+
+2. If personal results exist, prioritize those (user's past decisions)
+
+3. Supplement with docs results for official guidance
+
+4. Synthesize answer:
+   > Based on your previous decision (decisions.json, 2025-01-15), you configured ArgoCD with auto-sync enabled but self-heal disabled. The ArgoCD docs recommend this for production environments where you want automatic deployment but manual intervention for drift correction.
+
+## Maintenance
+
+Indexes should be refreshed periodically:
+- Personal: After significant state changes
+- Docs: After tool version upgrades
+
+A systemd timer can automate this (see design doc for setup).
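
The "Output Format" and "Example Workflow" sections of SKILL.md above describe prioritizing personal results and citing file paths and dates. A minimal sketch of how a caller might do that with the JSON payload follows. The sample payload mirrors the documented shape, but the docs hit (`docs/example.md`) and the helper names `prioritize` and `cite` are hypothetical, and it is an assumption that docs results may lack a `date` in their metadata.

```python
import json

# Sample payload in the documented shape; the first result is a
# hypothetical docs hit added for illustration.
SAMPLE = """
{
  "query": "your search query",
  "results": [
    {"rank": 1, "score": 0.847, "source": "docs",
     "file": "docs/example.md", "chunk": "Official guidance...",
     "metadata": {}},
    {"rank": 2, "score": 0.801, "source": "personal",
     "file": "memory/decisions.json", "chunk": "Relevant text content...",
     "metadata": {"date": "2025-01-15"}}
  ],
  "searched_collections": ["personal", "docs"],
  "total_chunks_searched": 1847
}
"""


def prioritize(results):
    """Put personal hits first, keeping score order within each source."""
    return sorted(
        results,
        key=lambda r: (r["source"] != "personal", -r["score"]),
    )


def cite(result):
    """Format a citation with the file path and, when present, the date."""
    date = result.get("metadata", {}).get("date")
    return f"{result['file']}, {date}" if date else result["file"]


payload = json.loads(SAMPLE)
ordered = prioritize(payload["results"])
citations = [cite(r) for r in ordered]
print(citations)
```

In practice the payload would presumably come from the output of `search.py` rather than an inline string. Sorting on the source before the score keeps the user's own past decisions ahead of official guidance even when a docs chunk scores higher.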