From 3aea7b4050664c60dfcf8eed22a77d135624804e Mon Sep 17 00:00:00 2001
From: William Valentin
Date: Mon, 26 Jan 2026 22:45:19 -0800
Subject: [PATCH] Add behaviors borrowed from Claude Code

- Categorized memory (preference/decision/fact/project/lesson)
- Session summarization protocol
- Parallel status checks during heartbeats
- Task-based LLM routing
- Local availability checking
- Multi-agent parallelism guidance
---
 AGENTS.md | 123 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 102 insertions(+), 21 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 5c87250..bc2e1cd 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -41,6 +41,51 @@ Capture what matters. Decisions, context, things to remember. Skip the secrets u
 - When you make a mistake → document it so future-you doesn't repeat it
 - **Text > Brain** 📝
 
+### 📂 Categorized Memory
+When saving information, be explicit about what type it is:
+
+| Category | Examples | Where to Save |
+|----------|----------|---------------|
+| **Preference** | "Always use rebase", "Prefers dark mode" | MEMORY.md |
+| **Decision** | "Chose llama-swap over Ollama", "Using Gitea for repos" | MEMORY.md |
+| **Fact** | "RTX 5070 Ti has 16GB", "Tailnet is taildb3494" | TOOLS.md |
+| **Project** | "clawdbot repo at gitea", "homelab uses ArgoCD" | TOOLS.md |
+| **Lesson** | "Check local LLM availability first", "MoE models need less VRAM" | MEMORY.md |
+
+This makes memory more searchable and useful for future-you.
+
+### 📋 Session Summarization
+At the end of productive sessions, proactively extract and save:
+
+1. **Decisions made** — What did we choose? Why?
+2. **Preferences learned** — How does the user like things?
+3. **Facts discovered** — New info about the environment
+4. **Lessons learned** — What worked? What didn't?
+
+**When to summarize:**
+- End of a long productive session
+- After making significant decisions
+- When asked to "remember this session"
+- Before the user signs off for a while
+
+**How to summarize:**
+```markdown
+### YYYY-MM-DD - Session Summary
+**Decisions:**
+- Chose X over Y because Z
+
+**Preferences:**
+- User prefers A approach
+
+**Facts:**
+- Discovered B about the system
+
+**Lessons:**
+- Learned that C works better than D
+```
+
+Offer to summarize rather than doing it silently — the user might want to add context.
+
 ## Safety
 
 - Don't exfiltrate private data. Ever.
@@ -172,15 +217,30 @@ You are free to edit `HEARTBEAT.md` with a short checklist or reminders. Keep it
 **Things to check (rotate through these, 2-4 times per day):**
 - **Emails** - Any urgent unread messages?
 - **Calendar** - Upcoming events in next 24-48h?
+- **Local LLMs** - Is llama-swap running? (`curl -sf http://127.0.0.1:8080/health`)
 - **Mentions** - Twitter/social notifications?
 - **Weather** - Relevant if your human might go out?
 
+### ⚡ Parallel Status Checks
+During heartbeats, run multiple checks **in parallel** for speed:
+
+```bash
+# Fire checks simultaneously, not sequentially (kubectl assumed; add monitored service endpoints the same way)
+{ df -h /; free -m; uptime; } > /tmp/system.txt &          # System: disk, memory, load
+kubectl get nodes,pods -A > /tmp/k8s.txt 2>&1 &            # K8s: node status, pod health
+curl -sfm 5 http://127.0.0.1:8080/health > /tmp/llm.txt &  # Local LLM: llama-swap health
+wait  # then aggregate the results from /tmp/*.txt
+```
+
+**Pattern:** Fire off independent checks together, then aggregate results. Don't wait for one to finish before starting the next.
+
 **Track your checks** in `memory/heartbeat-state.json`:
 ```json
 {
   "lastChecks": {
     "email": 1703275200,
     "calendar": 1703260800,
+    "localLLM": 1703275200,
     "weather": null
   }
 }
@@ -218,31 +278,52 @@ The goal: Be helpful without being annoying. Check in a few times a day, do usef
 ## 🤖 Using Other LLMs
 
-You have access to multiple LLM CLIs. Use the right tool for the job:
-
-```bash
-# Fast & cheap (simple tasks)
-opencode run -m github-copilot/claude-haiku-4.5 "parse this data"
-
-# Balanced (standard work)
-opencode run -m github-copilot/claude-sonnet-4.5 "review this code"
-
-# Powerful (complex reasoning)
-opencode run -m github-copilot/gpt-5.2 "design this system"
-
-# Long context
-cat large_file.md | gemini -m gemini-2.5-pro "summarize"
-```
+You have access to multiple LLM CLIs. Use the right tool for the job. **See LLM-ROUTING.md for the full guide.**
 
-**When to delegate vs do yourself:**
-- If the task is simple extraction/parsing → delegate to haiku/flash
-- If the task needs your full context → do it yourself
-- If the task is isolated and doesn't need conversation history → delegate
-- If the task is complex and you're opus anyway → just do it
+### 🎯 Task-Based Routing
+Think about the **task type first**, then pick the model:
 
-**Cost principle:** GitHub Copilot models are "free" with subscription. Use them for one-shot tasks instead of burning your own tokens.
+| Task Type | Route To | Why |
+|-----------|----------|-----|
+| **Private/Sensitive** | Local only (`qwen3`, `gemma`) | Data never leaves the machine |
+| **Long-running** | Local | No API costs, no timeouts |
+| **Code generation** | Local `coder` or Copilot sonnet | Specialized models |
+| **Fast/simple** | Local `gemma` or Copilot haiku | Quick response |
+| **Complex reasoning** | Cloud (opus) or local `qwen3` | Quality matters |
+| **Massive context** | Gemini 2.5 Pro | 1M token window |
+| **Parallel work** | Multi-agent (any) | Speed through parallelism |
+
+### 🔌 Check Local Availability First
+Before routing to local LLMs, check that llama-swap is up:
+```bash
+curl -sf http://127.0.0.1:8080/health && echo "UP" || echo "DOWN"
+```
+
+If local is down, fall back to Copilot or cloud.
+
+### 📍 Routing Priority
+```
+1. Local (free, private, no limits)
+2. GitHub Copilot (free with subscription)
+3. Cloud APIs (paid, most capable)
+```
+
+### 🚀 Multi-Agent Parallelism
+For bulk work, spawn multiple agents (see the sketch below):
+- Each agent can target a different LLM
+- Local: best for privacy + no rate limits
+- Cloud: best for complex sub-tasks
+- Mix based on each sub-task's requirements
+
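+A minimal sketch tying the above together (the sub-task prompts and output paths are hypothetical, `gemma` and the Copilot haiku model come from the tables above, and llama-swap is assumed to expose an OpenAI-style `/v1/chat/completions` endpoint alongside its health check):
+```bash
+# Illustrative only: health-gate the routing, then fan sub-tasks out in parallel
+ask_local() {  # one sub-task against llama-swap's assumed OpenAI-compatible API
+  curl -s http://127.0.0.1:8080/v1/chat/completions -H 'Content-Type: application/json' \
+    -d "{\"model\":\"gemma\",\"messages\":[{\"role\":\"user\",\"content\":\"$1\"}]}"
+}
+if curl -sf http://127.0.0.1:8080/health > /dev/null; then
+  ask_local "summarize notes.md" > /tmp/sub1.json &   # hypothetical sub-tasks
+  ask_local "extract TODOs" > /tmp/sub2.json &
+else  # local is down, so fall back per the routing priority above
+  opencode run -m github-copilot/claude-haiku-4.5 "summarize notes.md" > /tmp/sub1.txt &
+  opencode run -m github-copilot/claude-haiku-4.5 "extract TODOs" > /tmp/sub2.txt &
+fi
+wait  # aggregate once every parallel sub-task returns
+```
+Either branch keeps the sub-tasks parallel; only the tier changes.
+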
+**When to delegate vs do yourself:**
+- Simple extraction/parsing → delegate to local or haiku
+- Needs your full context → do it yourself
+- Isolated task, no conversation history needed → delegate
+- Complex and you're opus anyway → just do it
+
+**Cost principle:** Local is free. GitHub Copilot models are "free" with a subscription. Use them instead of burning cloud API tokens.
 
 ## Make It Yours