feat: add multi-model delegation (Phase 0) and context compaction (Phase 1)

Phase 0: Multi-Model Delegation
- AgentOrchestrator wraps NativeAgent with delegate() for stateless single-turn calls to any model tier (fast/default/complex/local)
- DelegationConfig maps task types (compaction, classification, etc.) to model tiers
- Delegation prompts for compaction, memory extraction, classification, and tool summarisation
- Per-tier usage tracking for cost visibility
- Config schema: agents.delegation and agents.primary_tier

Phase 1: Context Compaction
- Token estimation (char/4 heuristic) with context window lookup
- shouldCompact() threshold check against a percentage of the context window
- compactHistory() splits old/recent messages, delegates the summary to the fast tier, and returns a CompactionResult
- Automatic compaction in AgentOrchestrator.process() when configured
- Force-compact via orchestrator.compact() with session persistence
- Session.replaceHistory() with an atomic SQLite transaction
- /compact TUI command with feedback on compacted token counts
- Config schema: compaction.enabled, threshold_pct, keep_turns, summary_max_tokens

Tests: 385 passing across 50 files (22 new tests in 2 new test files)
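The Phase 0 notes describe DelegationConfig as a map from task types to model tiers, plus per-tier usage tracking. The sketch below shows one way that mapping and tracking could look. The names `DelegationTask`, `resolveTier`, `TierTotals` and `TierUsage`, the task keys, and the rule of falling back to a caller-supplied tier (for example the configured `agents.primary_tier`) when a task is unmapped are all assumptions, not the commit's actual code.

```typescript
// Hypothetical names: the real DelegationConfig shape is not shown in this diff.
type ModelTier = "fast" | "default" | "complex" | "local";
type DelegationTask =
  | "compaction"
  | "memory_extraction"
  | "classification"
  | "tool_summarisation";

// Assumption: tasks that are not listed fall back to a caller-supplied tier.
type DelegationConfig = Partial<Record<DelegationTask, ModelTier>>;

function resolveTier(
  task: DelegationTask,
  config: DelegationConfig,
  fallback: ModelTier,
): ModelTier {
  return config[task] ?? fallback;
}

interface TierTotals {
  calls: number;
  inputTokens: number;
  outputTokens: number;
}

// Accumulates call counts and token usage per tier for cost visibility.
class TierUsage {
  private readonly totals = new Map<ModelTier, TierTotals>();

  record(tier: ModelTier, inputTokens: number, outputTokens: number): void {
    const t = this.totals.get(tier) ?? { calls: 0, inputTokens: 0, outputTokens: 0 };
    t.calls += 1;
    t.inputTokens += inputTokens;
    t.outputTokens += outputTokens;
    this.totals.set(tier, t);
  }

  // Returns a copy so callers cannot mutate the running totals.
  get(tier: ModelTier): TierTotals {
    const t = this.totals.get(tier);
    return t ? { ...t } : { calls: 0, inputTokens: 0, outputTokens: 0 };
  }
}

// Usage
const config: DelegationConfig = { compaction: "fast", classification: "local" };
const summaryTier = resolveTier("compaction", config, "default"); // "fast"
const toolTier = resolveTier("tool_summarisation", config, "default"); // "default"

const usage = new TierUsage();
usage.record(summaryTier, 1200, 150);
usage.record(summaryTier, 800, 100);
// usage.get("fast") returns { calls: 2, inputTokens: 2000, outputTokens: 250 }
console.log(summaryTier, toolTier, usage.get("fast"));
```

Keeping the mapping as a partial record means a config only has to name the tasks it wants to move off the fallback tier.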
2026-02-06 13:17:02 -08:00
parent f7cc87a4bb
commit 306e11bd2e
22 changed files with 1562 additions and 12 deletions
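The Phase 1 notes describe a char/4 token estimate, a threshold check against a percentage of the context window, and a split of the history into old and recent messages before the old part is summarised. A minimal sketch of those three steps follows, assuming a simple `Message` shape and a `CompactionSettings` object mirroring `compaction.enabled`, `threshold_pct` and `keep_turns`. The signature shown for `shouldCompact` and the helpers `estimateTokens`, `estimateHistoryTokens` and `splitHistory` are hypothetical, and counting each message as one "turn" is an assumption.

```typescript
// Hypothetical shapes; the commit's real types may differ.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

interface CompactionSettings {
  enabled: boolean;
  thresholdPct: number; // compact once history reaches this % of the window
  keepTurns: number; // assumption: counted as messages kept verbatim
}

// char/4 heuristic: roughly four characters per token, rounded up.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function estimateHistoryTokens(history: Message[]): number {
  return history.reduce((sum, m) => sum + estimateTokens(m.content), 0);
}

function shouldCompact(
  history: Message[],
  contextWindow: number,
  cfg: CompactionSettings,
): boolean {
  if (!cfg.enabled) return false;
  // tokens / window >= pct / 100, rearranged to stay in integer arithmetic.
  return estimateHistoryTokens(history) * 100 >= contextWindow * cfg.thresholdPct;
}

// Older messages go to the summariser; the last `keepTurns` stay verbatim.
function splitHistory(
  history: Message[],
  keepTurns: number,
): { old: Message[]; recent: Message[] } {
  const cut = Math.max(0, history.length - Math.max(0, keepTurns));
  return { old: history.slice(0, cut), recent: history.slice(cut) };
}

// Usage: 2,100 estimated tokens against a 2,500-token window at 80% (2,000).
const sample: Message[] = [
  { role: "user", content: "a".repeat(4000) },
  { role: "assistant", content: "b".repeat(4000) },
  { role: "user", content: "c".repeat(400) },
];
const settings: CompactionSettings = { enabled: true, thresholdPct: 80, keepTurns: 1 };
if (shouldCompact(sample, 2500, settings)) {
  const parts = splitHistory(sample, settings.keepTurns);
  console.log(`summarise ${parts.old.length} messages, keep ${parts.recent.length}`);
  // prints "summarise 2 messages, keep 1"
}
```

In this sketch only `old` would be sent to the fast tier with the compaction prompt added in this commit, while `recent` is carried over unchanged.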
@@ -0,0 +1,94 @@
+/**
+ * System prompts for delegated tasks.
+ *
+ * Each prompt is designed for a specific sub-task that the agent farms out
+ * to a (usually cheaper/faster) model call. Keep them focused and
+ * deterministic — the caller should be able to parse the output reliably.
+ */
+
+/**
+ * Instructs a model to summarise conversation history during compaction.
+ * The resulting summary replaces the full history to reclaim context window space.
+ */
+export const COMPACTION_SYSTEM_PROMPT = `You are a conversation summariser. Your job is to condense a conversation history into a concise summary that preserves all important information.
+
+Rules:
+- Preserve key facts, decisions, user preferences, and action items.
+- Maintain chronological order of events.
+- Note any unresolved questions or pending tasks.
+- Be concise but thorough — aim for roughly 20% of the original length.
+- Use bullet points for clarity.
+- Never invent information that is not present in the conversation.
+- If the conversation references files, paths, error messages, or specific values, include them verbatim.
+- Group related points together under short descriptive headings when it aids readability.
+
+Output format:
+Return a markdown summary with bullet points. Do not include any preamble or explanation — output only the summary.`;
+
+/**
+ * Instructs a model to extract persistent facts from conversation text.
+ * Extracted facts are stored in long-term memory for future sessions.
+ */
+export const MEMORY_EXTRACTION_PROMPT = `You are a fact extractor. Given a block of conversation text, extract persistent facts worth remembering across sessions.
+
+Categories to extract:
+
+## User
+- Name, role, location, timezone, or other personal details explicitly shared.
+
+## Preferences
+- Communication style, formatting preferences, tool preferences, workflow habits.
+
+## Technical
+- Project names, repositories, tech stacks, conventions, architecture decisions.
+- File paths, environment details, deployment targets.
+
+## Decisions
+- Explicit decisions made during the conversation (e.g. "we decided to use X instead of Y").
+- Rationale for decisions when stated.
+
+Rules:
+- Only extract facts that are explicitly stated — never infer or assume.
+- Skip transient or session-specific information (e.g. "run this command now", "fix this error today").
+- Skip information that is only relevant to the current task and has no long-term value.
+- If no facts worth extracting exist, return an empty response.
+- Use concise bullet points under each category heading.
+- Omit any category that has no entries.
+
+Output format:
+Return markdown with the category headings above and bullet points underneath. No preamble.`;
+
+/**
+ * Instructs a model to classify an inbound message into a discrete category.
+ * The caller uses the label to route the message to the appropriate handler.
+ */
+export const CLASSIFICATION_PROMPT = `Classify the following message into exactly one of these categories:
+
+- command   — a direct instruction to perform an action (e.g. "run tests", "deploy to staging")
+- question  — a request for information or explanation (e.g. "what does this function do?")
+- task      — a multi-step objective that requires planning (e.g. "add authentication to the API")
+- conversation — casual chat, greetings, acknowledgements, or social interaction
+- unclear   — the message is ambiguous or lacks enough context to classify
+
+Rules:
+- Return ONLY the classification label — a single word, nothing else.
+- Do not explain your reasoning.
+- If the message fits multiple categories, choose the most specific one (command > task > question > conversation).`;
+
+/**
+ * Instructs a model to condense verbose tool output into a compact summary.
+ * Used to shrink large tool results before they consume context window space.
+ */
+export const TOOL_SUMMARISATION_PROMPT = `You are a tool-output summariser. Given the raw output of a tool invocation, produce a compact summary that preserves the essential information.
+
+Rules:
+- Preserve the key outcome: success or failure.
+- Preserve important data: counts, IDs, names, statuses.
+- Preserve all file paths, error codes, error messages, and specific values verbatim.
+- Strip boilerplate, redundant lines, decorative formatting, and progress indicators.
+- Keep the summary under 500 tokens.
+- If the output is already concise, return it as-is rather than paraphrasing.
+- Use a structured format (bullet points or short paragraphs) for readability.
+
+Output format:
+Return the summarised output directly. No preamble or meta-commentary.`;
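The file header asks that the caller be able to parse each prompt's output reliably. A minimal sketch of two such parsers follows: one normalises the single-word label requested by `CLASSIFICATION_PROMPT` and falls back to `unclear`, the other groups bullet points under the headings requested by `MEMORY_EXTRACTION_PROMPT`. Both function names and the normalisation rules are assumptions; this file does not show how the delegated output is consumed.

```typescript
// Labels copied from CLASSIFICATION_PROMPT; keep the two lists in sync.
const LABELS = ["command", "question", "task", "conversation", "unclear"] as const;
type MessageLabel = (typeof LABELS)[number];

// The prompt asks for a bare word, but tolerate case, whitespace and punctuation.
// Anything that does not reduce to exactly one known label becomes "unclear".
function parseClassification(raw: string): MessageLabel {
  const word = raw.trim().toLowerCase().replace(/[^a-z]/g, "");
  return (LABELS as readonly string[]).includes(word) ? (word as MessageLabel) : "unclear";
}

// Headings copied from MEMORY_EXTRACTION_PROMPT.
const MEMORY_CATEGORIES = ["User", "Preferences", "Technical", "Decisions"] as const;
type MemoryCategory = (typeof MEMORY_CATEGORIES)[number];
type MemoryFacts = Partial<Record<MemoryCategory, string[]>>;

// Groups "- fact" bullets under their "## Category" heading. Bullets under
// unknown headings are skipped, and an empty response yields an empty object.
function parseMemoryFacts(markdown: string): MemoryFacts {
  const facts: MemoryFacts = {};
  let current: MemoryCategory | null = null;
  for (const line of markdown.split(/\r?\n/)) {
    const heading = line.match(/^##\s+(.+?)\s*$/);
    if (heading) {
      const name = heading[1];
      current = (MEMORY_CATEGORIES as readonly string[]).includes(name)
        ? (name as MemoryCategory)
        : null;
      continue;
    }
    const bullet = line.match(/^\s*[-*]\s+(.+?)\s*$/);
    if (bullet && current !== null) {
      (facts[current] ??= []).push(bullet[1]);
    }
  }
  return facts;
}

// Usage
console.log(parseClassification(" Question.\n")); // prints question
console.log(parseClassification("maybe a task?")); // prints unclear
console.log(JSON.stringify(parseMemoryFacts("## Technical\n- Sessions are stored in SQLite\n")));
// prints {"Technical":["Sessions are stored in SQLite"]}
```

Falling back to `unclear` rather than throwing keeps routing total: a malformed reply from the cheaper model degrades to the "ambiguous" path instead of failing the request.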