# Tier 1 Quick Wins — Design

**Date:** 2026-02-07
**Status:** Draft
**Scope:** 5 additive features, no breaking changes

---

## 1. Per-message thinking mode (`!!think` prefix)

### Trigger

The user prefixes a message with `!!think`. The prefix is stripped before the message reaches the model.

### Data flow

1. The frontend or channel adapter detects the `!!think` prefix, strips it, and sets `thinking: true` on the message metadata.
2. The agent loop passes the `thinking` flag through to `ChatRequest`.
3. Each provider client checks the flag:
   - **Anthropic:** sets `thinking.budget_tokens` (default 4096)
   - **OpenAI/GitHub Models:** sets `reasoning_effort` (default `'medium'`)
   - **Gemini:** sets `thinkingConfig.thinkingBudget` (default 4096)
   - **Bedrock:** sets the Anthropic thinking params (Claude models)
   - **Ollama/llama.cpp:** no-op (flag silently ignored)
4. Thinking/reasoning content returned by the provider is included in the reply: displayed as a collapsible block in TUI/WebChat, omitted in channel adapters.

### Config additions

All optional. These keys set the per-provider defaults used when `!!think` is active:

```yaml
models:
  thinking:
    anthropic:
      budgetTokens: 4096
    openai:
      reasoningEffort: medium  # low | medium | high
    gemini:
      budgetTokens: 4096
```

### Type changes

```typescript
// src/models/types.ts — ChatRequest
export interface ChatRequest {
  messages: Message[];
  system?: string;
  maxTokens?: number;
  tools?: ToolDefinition[];
  thinking?: boolean;  // NEW
}

// src/models/types.ts — ChatResponse
export interface ChatResponse {
  content: string;
  toolCalls?: ToolCall[];
  stopReason?: string;
  usage?: TokenUsage;
  thinkingContent?: string;  // NEW — raw thinking/reasoning output
}
```

### Provider implementation

Each client checks `request.thinking` and maps it to the native API:

- **`anthropic.ts`**: Add `thinking: { type: 'enabled', budget_tokens }` to the `messages.create()` params. `budget_tokens` must be at least 1024 and less than `max_tokens`, so raise `max_tokens` when needed. Parse `thinking` content blocks from the response.
- **`openai.ts`**: Add `reasoning_effort` to `chat.completions.create()` (only reasoning models accept it). Parse reasoning content from the response where the API returns it; OpenAI's Chat Completions API reports reasoning token counts but not the reasoning text itself.
- **`github.ts`**: Same as OpenAI (uses the OpenAI SDK).
- **`gemini.ts`**: Add `thinkingConfig` (with `thinkingBudget`) to `generationConfig`. Set `includeThoughts: true` to receive thought summaries, then parse the parts flagged `thought: true` from the response.
- **`bedrock.ts`**: Pass the Anthropic `thinking` params (inside `additionalModelRequestFields` when using the Bedrock Converse API).
- **`ollama.ts` / `llamacpp.ts`**: Ignore the flag.

### Files affected

- `src/models/types.ts` — Add `thinking` to `ChatRequest`, `thinkingContent` to `ChatResponse`
- `src/models/anthropic.ts` — Wire `budget_tokens`, parse thinking blocks
- `src/models/openai.ts` — Wire `reasoning_effort`, parse reasoning
- `src/models/github.ts` — Pass through to OpenAI client
- `src/models/gemini.ts` — Wire `thinkingConfig`
- `src/models/bedrock.ts` — Wire thinking params
- `src/config/schema.ts` — Add `models.thinking` config section
- `src/backends/native/agent.ts` — Pass `thinking` flag from message metadata to `ChatRequest`
- `src/frontends/tui/commands.ts` — Detect and strip `!!think` prefix
- Channel adapters — Detect and strip `!!think` prefix
- TUI/WebChat — Display `thinkingContent` as a collapsible block

---

## 2. Verbose streaming mode (`/verbose`)

### Trigger

The `/verbose` command toggles a boolean in the frontend's local state. It is not persisted to the session or config.

### Effect when on

- Raw streaming chunks are displayed as they arrive, including tool-call JSON while it is being generated
- Tool arguments and raw results are shown in full (no summarization)

### Scope

TUI and WebChat only. Channel adapters (Telegram, Discord, Slack, WhatsApp) do not support this mode.

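The display rule under "Effect when on" reduces to a pure function of the local flag. A minimal sketch, assuming a hypothetical `StreamChunk` type and hypothetical `toggleVerbose()`/`renderChunk()` helpers (none of these are existing TUI/WebChat names); the off-mode summary (first line, capped at 80 characters) is only a placeholder for whatever the current renderer does:

```typescript
// Hypothetical chunk shape; the real TUI/WebChat stream types may differ.
type StreamChunk =
  | { kind: 'text'; text: string }
  | { kind: 'tool_args_delta'; name: string; json: string } // partial JSON as generated
  | { kind: 'tool_result'; name: string; result: string };

interface FrontendState {
  verbose: boolean; // local to this frontend; never persisted
}

// `/verbose` handler: flip the flag and report the new status.
function toggleVerbose(state: FrontendState): string {
  state.verbose = !state.verbose;
  return `Verbose mode: ${state.verbose ? 'on' : 'off'}`;
}

// Returns the text to display for one chunk ('' means display nothing).
function renderChunk(state: FrontendState, chunk: StreamChunk): string {
  // Assistant text is rendered the same way in both modes.
  if (chunk.kind === 'text') return chunk.text;
  if (chunk.kind === 'tool_args_delta') {
    // On: show tool-call JSON as it streams. Off: hide the partial JSON.
    return state.verbose ? chunk.json : '';
  }
  // Tool result: full in verbose mode, a one-line placeholder summary otherwise.
  if (state.verbose) return `[${chunk.name}] ${chunk.result}`;
  const firstLine = chunk.result.split('\n')[0];
  const summary = firstLine.length > 80 ? `${firstLine.slice(0, 80)}...` : firstLine;
  return `[${chunk.name}] ${summary}`;
}
```

Keeping the decision in one pure function would let the TUI and WebChat share the same rule and test it without a terminal or browser.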
### Implementation

- Add `verbose: boolean` to the TUI and WebChat frontend state (default `false`)
- Add `/verbose` to the command parser; it toggles the flag and prints the current status
- The streaming renderer checks the flag:
  - **On:** emit raw chunks as-is and display full tool-call JSON and results
  - **Off:** current behavior (summarized tool output, clean text display)
- No backend changes; this is purely a display concern

### Files affected

- `src/frontends/tui/commands.ts` — Add `verbose` command type and parsing
- `src/frontends/tui/minimal.ts` — Handle `/verbose`, toggle state, modify streaming display
- `src/gateway/ui/pages/chat.js` — WebChat verbose toggle and raw display mode
- WebSocket message handler — Pass raw chunks when verbose is active

---

## 3. Typing indicators

### When

Immediately on receiving a user message, sustained until the response is fully sent.

### Per-adapter implementation

| Adapter | API | Notes |
|---------|-----|-------|
| **Discord** | `channel.sendTyping()` | Expires after ~10s (or when the bot sends a message). Re-fire on a 9s interval while processing. |
| **Slack** | `assistant.threads.setStatus` (AI assistant threads only; the Web API has no general-purpose bot typing indicator) | Set a status on receipt, clear it on response. Skip outside assistant threads. |
| **WhatsApp** | `sock.sendPresenceUpdate('composing', jid)` | Fire on receipt; send `'paused'` on response. |
| **Telegram** | grammY `sendChatAction('typing')` | Already implemented. No changes needed. |

### Implementation pattern

Each adapter's message handler starts the indicator before dispatching to the agent loop. A cleanup step (clearing the interval or sending a presence update) stops the indicator once the response is sent.

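Because the pattern is the same across adapters, it could be factored into one shared helper. A minimal sketch, assuming a hypothetical `withTypingIndicator()` (not an existing function in the codebase) where each adapter supplies its platform call as `start` and, optionally, a refresh interval and a `stop` call:

```typescript
// Hypothetical helper; `start`/`stop` wrap the platform call
// (e.g. discord.js sendTyping, Baileys sendPresenceUpdate).
async function withTypingIndicator<T>(
  start: () => Promise<unknown>,
  work: () => Promise<T>,
  opts: { refreshMs?: number; stop?: () => Promise<unknown> } = {},
): Promise<T> {
  // Indicator errors are swallowed: a failed "typing..." must never block the reply.
  const fire = (): void => {
    start().catch(() => {});
  };
  fire(); // show the indicator immediately on receipt
  const timer = opts.refreshMs !== undefined ? setInterval(fire, opts.refreshMs) : undefined;
  try {
    return await work();
  } finally {
    // Runs on success and on error, so the indicator never outlives the request.
    if (timer !== undefined) clearInterval(timer);
    if (opts.stop) await opts.stop().catch(() => {});
  }
}
```

Discord would pass `() => msg.channel.sendTyping()` with `refreshMs: 9000`; WhatsApp would pass a `'composing'` presence update as `start`, a `'paused'` update as `stop`, and no refresh interval.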
```typescript
// Pseudocode for Discord adapter
async handleMessage(msg) {
  // sendTyping() lasts ~10s, so refresh it every 9s while the agent works.
  // Errors are ignored: a missing indicator must never block the reply.
  const sendTyping = () => msg.channel.sendTyping().catch(() => {});
  sendTyping(); // immediate first call
  const typingInterval = setInterval(sendTyping, 9000);
  try {
    await this.dispatch(msg);
  } finally {
    clearInterval(typingInterval);
  }
}
```

### Files affected

- `src/channels/discord/adapter.ts` — Add typing interval in message handler
- `src/channels/slack/adapter.ts` — Add typing indicator in message handler
- `src/channels/whatsapp/adapter.ts` — Add presence composing/paused in message handler

---

## 4. Session pruning (TTL-based)

### Config addition

```yaml
sessions:
  ttl: 30d  # duration string. Default: 30d. Set to 0 or false to disable.
```

Supported values: `30d`, `7d`, `12h` (any whole number of hours or days), or `0`/`false` to disable pruning.

### Mechanism

1. On startup, the daemon schedules a periodic timer (every hour).
2. The timer calls `SessionStore.pruneStale(cutoffTimestamp)`, where the cutoff is the current time minus the TTL.
3. A SQLite query finds every `session_id` whose `MAX(created_at) < cutoff`, i.e. whose newest message is older than the TTL. `created_at` and the cutoff must use the same unit; `parseDuration()` returns milliseconds.
4. All messages for stale sessions are deleted.
5. Pruned sessions are evicted from `SessionManager`'s in-memory cache.
6. The result is logged, e.g. `"Pruned 3 stale sessions (TTL: 30d)"`.

### Duration parsing

A small regex parser for duration strings, with no external library. It returns `0` when pruning is disabled and `null` for invalid input:

```typescript
// YAML yields a string for `30d`, a number for `0`, and a boolean for `false`.
function parseDuration(s: string | number | boolean): number | null {
  if (s === false || s === 0 || s === '0') return 0; // pruning disabled
  if (typeof s !== 'string') return null;
  const match = s.match(/^(\d+)(h|d)$/);
  if (!match) return null;
  const [, n, unit] = match;
  return unit === 'h' ? Number(n) * 3_600_000 : Number(n) * 86_400_000;
}
```

### New SessionStore method

```typescript
// Returns the IDs of pruned sessions so SessionManager can evict them.
// Assumes a better-sqlite3-style handle at `this.db` (prepare/transaction).
async pruneStale(beforeTimestamp: number): Promise<string[]> {
  const selectStale = this.db.prepare(`
    SELECT session_id FROM messages
    GROUP BY session_id
    HAVING MAX(created_at) < ?
  `);
  const deleteSession = this.db.prepare('DELETE FROM messages WHERE session_id = ?');

  // Select and delete in one transaction so no session is left half-pruned
  const prune = this.db.transaction((cutoff: number): string[] => {
    const rows = selectStale.all(cutoff) as { session_id: string }[];
    for (const { session_id } of rows) deleteSession.run(session_id);
    return rows.map(r => r.session_id);
  });
  return prune(beforeTimestamp);
}
```

### Files affected

- `src/config/schema.ts` — Add `sessions.ttl` field
- `src/session/store.ts` — Add `pruneStale()` method
- `src/session/manager.ts` — Add `evictSessions(ids)` to clear in-memory cache
- `src/daemon/index.ts` — Schedule pruning timer on startup

---

## 5. Tool groups

### Group definitions

Static map in `policy.ts`:

```typescript
export const TOOL_GROUPS: Record<string, string[]> = {
  'group:fs': ['file.read', 'file.write', 'file.edit', 'file.list'],
  'group:runtime': ['shell.exec', 'process.start', 'process.output', 'process.status', 'process.kill', 'process.list'],
  'group:web': ['web.fetch', 'web.search', 'browser.navigate', 'browser.click', 'browser.type', 'browser.screenshot', 'browser.evaluate'],
  'group:memory': ['memory.read', 'memory.write', 'memory.search'],
};
```

### Resolution

`ToolPolicy` expands `group:*` entries in allow/deny lists before applying filters. Expansion happens early in the resolution pipeline, before any set operations.

```typescript
// Replace each known group name with its members; other names pass through unchanged
function expandGroups(names: string[]): string[] {
  return names.flatMap(n => TOOL_GROUPS[n] ?? [n]);
}
```

An unrecognized `group:*` name passes through unchanged and therefore matches no tool.

Works in all scopes: global allow/deny, per-agent overrides, and per-provider overrides.

### Config usage example

```yaml
tools:
  profile: minimal
  allow: ['group:web']
  agents:
    fast:
      allow: ['group:fs']
      deny: ['shell.exec']
  providers:
    ollama:
      deny: ['group:web']
```

### Files affected

- `src/tools/policy.ts` — Add `TOOL_GROUPS` map, `expandGroups()` helper, integrate into resolution pipeline
- `src/tools/policy.test.ts` — Tests for group expansion in all scopes

---

## Implementation order

Recommended order by independence and risk:

1. **Tool groups** — Isolated to `policy.ts`, no cross-cutting concerns
2. **Typing indicators** — Per-adapter, independent changes
3. **Session pruning** — Self-contained, touches store/manager/daemon
4. **`/verbose`** — Frontend-only, no backend changes
5. **`!!think`** — Largest scope, touches all providers + agent loop + frontends

Tool groups, typing indicators, and session pruning can be implemented in parallel, and `/verbose` is independent of all three. `!!think` should come last, since it depends on understanding the streaming path that `/verbose` touches.