From c82726b691ebcd6324601e139df282d27d2353e2 Mon Sep 17 00:00:00 2001
From: OpenCode Test <test@opencode.ai>
Date: Wed, 7 Jan 2026 11:11:34 -0800
Subject: [PATCH] Add RAG JSON-to-text transformation plan
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Design for improving semantic search quality by transforming JSON
structures into natural language at index time.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 plans/temporal-foraging-milner.md | 110 ++++++++++++++++++++++++++++++
 1 file changed, 110 insertions(+)
 create mode 100644 plans/temporal-foraging-milner.md

diff --git a/plans/temporal-foraging-milner.md b/plans/temporal-foraging-milner.md
new file mode 100644
index 0000000..d22eac0
--- /dev/null
+++ b/plans/temporal-foraging-milner.md
@@ -0,0 +1,110 @@
+# Plan: Improve RAG Personal Index JSON-to-Natural-Language Transformation
+
+## Problem
+
+The RAG personal index produces low-quality matches for semantic queries because it indexes raw JSON structure rather than natural language.
+
+**Example failure:**
+- Query: "how to add a new agent"
+- Expected: Match `system-instructions.json` → `processes.agent-lifecycle.add`
+- Actual: Score 0.479, returns generic agent mentions instead
+
+**Root cause:** The chunker doesn't recognize process structures with `add`/`remove`/`rules`/`requirements` arrays, so they fall through to raw JSON stringification.
+
+## Solution
+
+Enhance `index_personal.py` to transform JSON structures into natural language at index time.
+
+## Files to Modify
+
+1. `~/.claude/skills/rag-search/scripts/index_personal.py` - Main changes
+
+## Implementation
+
+### 1. Add Process Pattern Recognition (lines ~127-138)
+
+Add handling for process objects with action arrays:
+
+```python
+# Process with action arrays (add, remove, rules, requirements, etc.)
+action_keys = ["add", "remove", "rules", "requirements", "steps", "validate"]
+if any(key in item for key in action_keys):
+    parts = []
+    if context:
+        parts.append(f"{context}:")
+    if item.get("description"):
+        parts.append(item["description"])
+
+    for action_key in action_keys:
+        if action_key in item and isinstance(item[action_key], list):
+            action_text = f"To {action_key}: " + ". ".join(map(str, item[action_key]))
+            parts.append(action_text)
+
+    if parts:
+        yield (" ".join(parts), {**base_metadata, "process": context})
+        return
+```
+
+### 2. Improve Context Propagation
+
+When processing nested dicts, pass richer context:
+
+```python
+# In the top-level dict processing (lines ~154-161)
+elif isinstance(value, dict):
+    # Pass the key as context for better chunk text
+    yield from process_item(value, context=key)
+```
+
+This propagation is already in place; the remaining work is to confirm that the new action-array branch receives `context`.
+
+### 3. Handle Key-Value Pairs in Processes
+
+For structures like:
+```json
+"content-principles": {
+  "no-redundancy": "Information lives in one authoritative location",
+  "lean-files": "Keep files concise..."
+}
+```
+
+Transform to: `"content-principles: no-redundancy means information lives in one authoritative location. lean-files means keep files concise..."`
+
+### 4. Add Tests
+
+After reindexing, rerun the failing query as a manual check of transformation quality:
+
+```bash
+# After reindex, verify the failing query now works
+~/.claude/skills/rag-search/scripts/search.py "how to add a new agent" --index personal
+# Should return system-instructions.json with score > 0.7
+```
+
+## Expected Outcome
+
+| Query | Before | After |
+|-------|--------|-------|
+| "how to add a new agent" | 0.479, wrong file | >0.7, system-instructions.json |
+| "agent lifecycle" | Similar | Better match to process |
+| "model selection rules" | Depends | Match model-selection process |
+
+## Validation Steps
+
+1. Run the modified indexer
+2. Test the three queries above
+3. Compare scores and result relevance
+
+## Rollback
+
+If results degrade: `git checkout -- scripts/index_personal.py` (path relative to the `rag-search` skill directory), then reindex.
+
+## Post-Implementation
+
+Add to `future-considerations.json`:
+- RAG indexer debug/verbose mode to inspect what text is being indexed
+
+## Future Considerations (Deferred)
+
+- Natural language templates per JSON schema type
+- LLM-generated summaries of complex structures
+- Caching transformed text alongside original JSON