feat: add multi-model delegation (Phase 0) and context compaction (Phase 1)

Phase 0 — Multi-Model Delegation:
- AgentOrchestrator wraps NativeAgent with delegate() for stateless
  single-turn calls to any model tier (fast/default/complex/local)
- DelegationConfig maps task types (compaction, classification, etc.)
  to model tiers
- Delegation prompts for compaction, memory extraction, classification,
  and tool summarisation
- Per-tier usage tracking for cost visibility
- Config schema: agents.delegation and agents.primary_tier
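
Illustrative sketch of the task-to-tier mapping described above. This is not the committed code: the type names, the task identifiers, and the fallback to a primary tier for unmapped tasks are assumptions made for the example. Only the four tier names come from this commit message.

```typescript
// Tier names are taken from the commit message.
type ModelTier = "fast" | "default" | "complex" | "local";

// Hypothetical task identifiers; the real DelegationConfig keys may differ.
type DelegationTask =
  | "compaction"
  | "memory_extraction"
  | "classification"
  | "tool_summarisation";

// Assumed shape: each task may optionally be pinned to a tier.
type DelegationConfig = Partial<Record<DelegationTask, ModelTier>>;

// Resolve the tier for a task, falling back to the primary tier
// when the task has no explicit mapping (an assumed behaviour).
function resolveTier(
  config: DelegationConfig,
  task: DelegationTask,
  primaryTier: ModelTier = "default",
): ModelTier {
  return config[task] ?? primaryTier;
}
```

With a mapping such as `{ compaction: "fast" }`, compaction calls would go to the fast tier while every other task uses the primary tier.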

Phase 1 — Context Compaction:
- Token estimation (char/4 heuristic) with context window lookup
- shouldCompact() threshold check against context window percentage
- compactHistory() splits old/recent messages, delegates summary to
  fast tier, returns CompactionResult
- Automatic compaction in AgentOrchestrator.process() when configured
- Force-compact via orchestrator.compact() with session persistence
- Session.replaceHistory() with atomic SQLite transaction
- /compact TUI command with feedback on compacted token counts
- Config schema: compaction.enabled, threshold_pct, keep_turns,
  summary_max_tokens
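
Illustrative sketch of the char/4 estimate, the threshold check, and the old/recent split listed above. This is not the committed code: the `ChatMessage` shape and all signatures are assumptions, `thresholdPct` is assumed to be on a 0-100 scale, and `keepTurns` is assumed to count messages rather than user/assistant pairs.

```typescript
// Hypothetical message shape used only for this sketch.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// char/4 heuristic: roughly one token per four characters, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Sum the per-message estimates for a whole history.
function estimateHistoryTokens(history: ChatMessage[]): number {
  return history.reduce((sum, m) => sum + estimateTokens(m.content), 0);
}

// True once the estimated usage reaches thresholdPct percent of the window.
// Integer arithmetic avoids floating-point surprises at the boundary.
function shouldCompact(
  history: ChatMessage[],
  contextWindow: number,
  thresholdPct: number,
): boolean {
  return estimateHistoryTokens(history) * 100 >= contextWindow * thresholdPct;
}

// Keep the last keepTurns messages verbatim; everything earlier is the
// portion that would be handed to the fast tier for summarisation.
function splitHistory(
  history: ChatMessage[],
  keepTurns: number,
): { old: ChatMessage[]; recent: ChatMessage[] } {
  const cut = Math.max(0, history.length - keepTurns);
  return { old: history.slice(0, cut), recent: history.slice(cut) };
}
```

The heuristic is cheap and needs no tokenizer, at the cost of accuracy for code-heavy or non-English text, so the threshold should leave some headroom.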

Tests: 385 passing across 50 files (22 new tests in 2 new test files)
William Valentin
2026-02-06 13:17:02 -08:00
parent f7cc87a4bb
commit 306e11bd2e
22 changed files with 1562 additions and 12 deletions
@@ -0,0 +1,94 @@
/**
 * System prompts for delegated tasks.
 *
 * Each prompt is designed for a specific sub-task that the agent farms out
 * to a (usually cheaper/faster) model call. Keep them focused and
 * deterministic so that the caller can parse the output reliably.
 */

/**
 * Instructs a model to summarise conversation history during compaction.
 * The resulting summary replaces the full history to reclaim context window space.
 */
export const COMPACTION_SYSTEM_PROMPT = `You are a conversation summariser. Your job is to condense a conversation history into a concise summary that preserves all important information.
Rules:
- Preserve key facts, decisions, user preferences, and action items.
- Maintain chronological order of events.
- Note any unresolved questions or pending tasks.
- Be concise but thorough — aim for roughly 20% of the original length.
- Use bullet points for clarity.
- Never invent information that is not present in the conversation.
- If the conversation references files, paths, error messages, or specific values, include them verbatim.
- Group related points together under short descriptive headings when it aids readability.
Output format:
Return a markdown summary with bullet points. Do not include any preamble or explanation — output only the summary.`;
/**
 * Instructs a model to extract persistent facts from conversation text.
 * Extracted facts are stored in long-term memory for future sessions.
 */
export const MEMORY_EXTRACTION_PROMPT = `You are a fact extractor. Given a block of conversation text, extract persistent facts worth remembering across sessions.
Categories to extract:
## User
- Name, role, location, timezone, or other personal details explicitly shared.
## Preferences
- Communication style, formatting preferences, tool preferences, workflow habits.
## Technical
- Project names, repositories, tech stacks, conventions, architecture decisions.
- File paths, environment details, deployment targets.
## Decisions
- Explicit decisions made during the conversation (e.g. "we decided to use X instead of Y").
- Rationale for decisions when stated.
Rules:
- Only extract facts that are explicitly stated — never infer or assume.
- Skip transient or session-specific information (e.g. "run this command now", "fix this error today").
- Skip information that is only relevant to the current task and has no long-term value.
- If no facts worth extracting exist, return an empty response.
- Use concise bullet points under each category heading.
- Omit any category that has no entries.
Output format:
Return markdown with the category headings above and bullet points underneath. No preamble.`;
/**
 * Instructs a model to classify an inbound message into a discrete category.
 * The caller uses the label to route the message to the appropriate handler.
 */
export const CLASSIFICATION_PROMPT = `Classify the following message into exactly one of these categories:
- command — a direct instruction to perform an action (e.g. "run tests", "deploy to staging")
- question — a request for information or explanation (e.g. "what does this function do?")
- task — a multi-step objective that requires planning (e.g. "add authentication to the API")
- conversation — casual chat, greetings, acknowledgements, or social interaction
- unclear — the message is ambiguous or lacks enough context to classify
Rules:
- Return ONLY the classification label — a single word, nothing else.
- Do not explain your reasoning.
- If the message fits multiple categories, choose the most specific one (command > task > question > conversation).`;
/**
 * Instructs a model to condense verbose tool output into a compact summary.
 * Used to shrink large tool results before they consume context window space.
 */
export const TOOL_SUMMARISATION_PROMPT = `You are a tool-output summariser. Given the raw output of a tool invocation, produce a compact summary that preserves the essential information.
Rules:
- Preserve the key outcome: success or failure.
- Preserve important data: counts, IDs, names, statuses.
- Preserve all file paths, error codes, error messages, and specific values verbatim.
- Strip boilerplate, redundant lines, decorative formatting, and progress indicators.
- Keep the summary under 500 tokens.
- If the output is already concise, return it as-is rather than paraphrasing.
- Use a structured format (bullet points or short paragraphs) for readability.
Output format:
Return the summarised output directly. No preamble or meta-commentary.`;
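
The file header asks that callers be able to parse each prompt's output reliably. As a hedged illustration for the classification prompt, a caller might normalise the reply as sketched below. `parseClassification`, `LABELS`, and `MessageLabel` are hypothetical names that do not appear in this diff, and treating any unexpected reply as `unclear` is an assumption about the desired fallback.

```typescript
// The five labels listed in CLASSIFICATION_PROMPT.
const LABELS = ["command", "question", "task", "conversation", "unclear"] as const;
type MessageLabel = (typeof LABELS)[number];

// Normalise a raw model reply into one of the known labels.
// Models sometimes add whitespace, capitals, quotes, or a trailing period,
// so those are stripped before matching. Anything else maps to "unclear".
function parseClassification(raw: string): MessageLabel {
  const cleaned = raw
    .trim()
    .toLowerCase()
    .replace(/^[^a-z]+|[^a-z]+$/g, ""); // drop leading/trailing non-letters
  return (LABELS as readonly string[]).includes(cleaned)
    ? (cleaned as MessageLabel)
    : "unclear";
}
```

Falling back to `unclear` keeps routing total: a reply that ignores the single-word rule never reaches a handler under a made-up label.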