flynn/docs/plans/2026-02-07-tier1-quick-wins-design.md
William Valentin 1c2f54fae3 feat: implement tier 1 quick wins (tool groups, typing, pruning, verbose, think)
Five additive features with no breaking changes:

- Tool groups: group:fs, group:runtime, group:web, group:memory syntactic
  sugar for allow/deny lists in tool policy config
- Typing indicators: Discord sendTyping() and WhatsApp sendStateTyping()
  on message receipt for better UX feedback
- Session pruning: TTL-based auto-cleanup via sessions.ttl config with
  hourly daemon timer and SQLite GROUP BY pruning
- /verbose command: TUI command parser toggle for raw streaming display
- !!think prefix: per-message extended thinking mode wired through
  Anthropic (budget_tokens), OpenAI/GitHub (reasoning_effort), and
  Gemini (thinkingConfig) providers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 13:35:00 -08:00


Tier 1 Quick Wins — Design

Date: 2026-02-07
Status: Draft
Scope: 5 additive features, no breaking changes


1. Per-message thinking mode (!!think prefix)

Trigger

User prefixes a message with !!think. The prefix is stripped before the message reaches the model.

Data flow

  1. Frontend/channel adapter detects !!think prefix, strips it, sets thinking: true on the message metadata
  2. Agent loop passes thinking flag through to ChatRequest
  3. Each provider client checks the flag:
    • Anthropic: sets thinking.budget_tokens (default 4096)
    • OpenAI/GitHub Models: sets reasoning_effort (default 'medium')
    • Gemini: sets thinkingConfig.thinkingBudget (default 4096)
    • Bedrock: sets via Anthropic thinking params
    • Ollama/llama.cpp: no-op (silently ignored)
  4. Response thinking/reasoning content is included in the reply (displayed as a collapsible block in TUI/WebChat, omitted in channel adapters)
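
Step 1 of the data flow can be sketched as a small pure function. This is a minimal sketch, not the adapter code itself: the name parseThinkPrefix and the ParsedMessage shape are hypothetical, and it assumes the prefix must appear as a whole token at the start of the message.

```typescript
// Hypothetical helper: the name and return shape are illustrative only.
interface ParsedMessage {
  text: string;      // message body with the prefix removed
  thinking: boolean; // true when the !!think prefix was present
}

function parseThinkPrefix(raw: string): ParsedMessage {
  const trimmed = raw.trimStart();
  // Require the prefix to be a whole token so "!!thinking" is not matched.
  const match = /^!!think(?:\s+|$)/.exec(trimmed);
  if (!match) return { text: raw, thinking: false };
  return { text: trimmed.slice(match[0].length), thinking: true };
}
```

Keeping the detection in one shared function would let the TUI and every channel adapter strip the prefix the same way.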

Config additions

All optional — controls per-provider defaults when !!think is active:

models:
  thinking:
    anthropic:
      budgetTokens: 4096
    openai:
      reasoningEffort: medium   # low | medium | high
    gemini:
      budgetTokens: 4096

Types changes

// src/models/types.ts — ChatRequest
export interface ChatRequest {
  messages: Message[];
  system?: string;
  maxTokens?: number;
  tools?: ToolDefinition[];
  thinking?: boolean;           // NEW
}

// src/models/types.ts — ChatResponse
export interface ChatResponse {
  content: string;
  toolCalls?: ToolCall[];
  stopReason?: string;
  usage?: TokenUsage;
  thinkingContent?: string;     // NEW — raw thinking/reasoning output
}
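
To illustrate how thinkingContent might be filled, here is a sketch that splits an Anthropic-style list of content blocks into reply text and thinking text. The ContentBlock shape is a simplified local stand-in for the SDK types, and splitThinking is a hypothetical name; other providers would need their own extraction.

```typescript
// Simplified stand-in for the SDK's content block types (assumption).
interface ContentBlock {
  type: string;
  text?: string;     // present on "text" blocks
  thinking?: string; // present on "thinking" blocks
}

function splitThinking(blocks: ContentBlock[]): { content: string; thinkingContent?: string } {
  const text: string[] = [];
  const thinking: string[] = [];
  for (const block of blocks) {
    if (block.type === "text" && block.text !== undefined) {
      text.push(block.text);
    } else if (block.type === "thinking" && block.thinking !== undefined) {
      thinking.push(block.thinking);
    }
    // Other block types (for example tool_use) are skipped here.
  }
  return {
    content: text.join(""),
    // Leave the field undefined when the model produced no thinking output.
    thinkingContent: thinking.length > 0 ? thinking.join("\n") : undefined,
  };
}
```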

Provider implementation

Each client checks request.thinking and maps to native API:

  • anthropic.ts: Add thinking: { type: 'enabled', budget_tokens } to messages.create() params. Parse thinking content blocks from response.
  • openai.ts: Add reasoning_effort to chat.completions.create(). Parse reasoning from response.
  • github.ts: Same as OpenAI (uses OpenAI SDK).
  • gemini.ts: Add thinkingConfig to generationConfig. Parse thinking parts from response.
  • bedrock.ts: Pass the Anthropic thinking params through the Bedrock Converse API request.
  • ollama.ts / llamacpp.ts: Ignore the flag.
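
The per-provider mapping above could be centralized in one helper that returns the extra request fields to merge into each call. This is a sketch under assumptions: thinkingParams, Provider, and ThinkingConfig are hypothetical names; the Gemini field name thinkingBudget is an assumption about the Gemini API's thinkingConfig and should be checked against the SDK version in use; Bedrock is left out because its request shape depends on which Bedrock API the client calls.

```typescript
type Provider = "anthropic" | "openai" | "github" | "gemini" | "ollama" | "llamacpp";

// Mirrors the models.thinking config section; all fields optional.
interface ThinkingConfig {
  anthropic?: { budgetTokens?: number };
  openai?: { reasoningEffort?: "low" | "medium" | "high" };
  gemini?: { budgetTokens?: number };
}

// Returns extra request fields for the provider call, or {} for a no-op.
function thinkingParams(
  provider: Provider,
  thinking: boolean | undefined,
  cfg: ThinkingConfig = {},
): Record<string, unknown> {
  if (!thinking) return {};
  switch (provider) {
    case "anthropic":
      return { thinking: { type: "enabled", budget_tokens: cfg.anthropic?.budgetTokens ?? 4096 } };
    case "openai":
    case "github": // GitHub Models goes through the OpenAI SDK
      return { reasoning_effort: cfg.openai?.reasoningEffort ?? "medium" };
    case "gemini":
      // Field name is an assumption; verify against the Gemini SDK in use.
      return { thinkingConfig: { thinkingBudget: cfg.gemini?.budgetTokens ?? 4096 } };
    default:
      return {}; // ollama / llamacpp: flag is silently ignored
  }
}
```

One thing to watch with Anthropic: budget_tokens has to be smaller than max_tokens, so the client may need to raise maxTokens when the flag is set.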

Files affected

  • src/models/types.ts — Add thinking to ChatRequest, thinkingContent to ChatResponse
  • src/models/anthropic.ts — Wire budget_tokens, parse thinking blocks
  • src/models/openai.ts — Wire reasoning_effort, parse reasoning
  • src/models/github.ts — Pass through to OpenAI client
  • src/models/gemini.ts — Wire thinkingConfig
  • src/models/bedrock.ts — Wire thinking params
  • src/config/schema.ts — Add models.thinking config section
  • src/backends/native/agent.ts — Pass thinking flag from message metadata to ChatRequest
  • src/frontends/tui/commands.ts — Detect and strip !!think prefix
  • Channel adapters — Detect and strip !!think prefix
  • TUI/WebChat — Display thinkingContent as collapsible block

2. Verbose streaming mode (/verbose)

Trigger

/verbose command toggles a boolean in the frontend's local state. Not persisted to session or config.

Effect when on

  • Raw streaming chunks displayed as they arrive, including tool call JSON being generated
  • Tool arguments and raw results shown in full (no summarization)

Scope

TUI and WebChat only. Channel adapters (Telegram, Discord, Slack, WhatsApp) do not support this.

Implementation

  • Add verbose: boolean to TUI and WebChat frontend state (default false)
  • Add /verbose to command parser — toggles the flag, prints current status
  • Streaming renderer checks the flag:
    • On: emit raw chunks as-is, display full tool call JSON and results
    • Off: current behavior (summarized tool output, clean text display)
  • No backend changes — purely a display concern
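
The toggle and the renderer check might look like the sketch below. FrontendState matches the verbose flag described above; ToolEvent, handleCommand, renderToolEvent, and the one-line summary format are hypothetical and only stand in for whatever the current summarized display does.

```typescript
interface FrontendState {
  verbose: boolean; // local only, never persisted
}

// Hypothetical shape for a completed tool call in the stream.
interface ToolEvent {
  name: string;
  args: unknown;
  result: string;
}

// Returns the status line to print, or null if the input is not /verbose.
function handleCommand(input: string, state: FrontendState): string | null {
  if (input.trim() !== "/verbose") return null;
  state.verbose = !state.verbose;
  return `Verbose mode: ${state.verbose ? "on" : "off"}`;
}

function renderToolEvent(ev: ToolEvent, state: FrontendState): string {
  if (state.verbose) {
    // On: full arguments and raw result, no summarization.
    return `${ev.name} ${JSON.stringify(ev.args)}\n${ev.result}`;
  }
  // Off: short summary (format made up for this sketch).
  const preview = ev.result.length > 60 ? ev.result.slice(0, 60) + "..." : ev.result;
  return `[${ev.name}] ${preview}`;
}
```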

Files affected

  • src/frontends/tui/commands.ts — Add verbose command type and parsing
  • src/frontends/tui/minimal.ts — Handle /verbose, toggle state, modify streaming display
  • src/gateway/ui/pages/chat.js — WebChat verbose toggle and raw display mode
  • WebSocket message handler — Pass raw chunks when verbose is active

3. Typing indicators

When

Immediately on receiving a user message. Sustained until the response is fully sent.

Per-adapter implementation

| Adapter  | API                                         | Notes                                                              |
|----------|---------------------------------------------|--------------------------------------------------------------------|
| Discord  | channel.sendTyping()                        | Auto-expires after 10s. Re-fire on a 9s interval while processing. |
| Slack    | Bolt typing indicator API                   | Fire on receipt, cancel on response.                               |
| WhatsApp | sock.sendPresenceUpdate('composing', jid)   | Fire on receipt, send 'paused' on response.                        |
| Telegram | grammY sendChatAction('typing')             | Already implemented. No changes needed.                            |

Implementation pattern

Each adapter's message handler calls sendTyping() before dispatching to the agent loop. A cleanup/cancel mechanism (interval clear or presence update) stops the indicator once the response is sent.

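// Sketch for the Discord adapter (method on the adapter class)
async handleMessage(msg) {
  // sendTyping() returns a promise; swallow failures so a typing error
  // never surfaces as an unhandled rejection or blocks the reply.
  const sendTyping = () => { msg.channel.sendTyping().catch(() => {}); };
  sendTyping(); // immediate first call
  const typingInterval = setInterval(sendTyping, 9000); // refresh before the 10s expiry
  try {
    await this.dispatch(msg);
  } finally {
    clearInterval(typingInterval); // stop refreshing once the response is sent
  }
}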
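
The same pattern can be generalized so that interval-based adapters (Discord) and fire-and-cancel adapters (Slack, WhatsApp) share one wrapper. This is a sketch only: withTyping and its option names are hypothetical, and the adapter calls mentioned in the comments are the ones listed in the table above.

```typescript
interface TypingOptions {
  start: () => Promise<unknown>;  // e.g. () => channel.sendTyping()
  stop?: () => Promise<unknown>;  // e.g. () => sock.sendPresenceUpdate('paused', jid)
  refreshMs?: number;             // set for indicators that expire (Discord: 9000)
}

async function withTyping<T>(opts: TypingOptions, work: () => Promise<T>): Promise<T> {
  // Indicator failures should never fail the actual reply, so swallow them.
  const fire = () => { opts.start().catch(() => {}); };
  fire(); // immediate first call
  const timer = opts.refreshMs ? setInterval(fire, opts.refreshMs) : undefined;
  try {
    return await work();
  } finally {
    if (timer !== undefined) clearInterval(timer);
    if (opts.stop) await opts.stop().catch(() => {});
  }
}
```

Because cleanup sits in finally, the indicator is cancelled even when the agent loop throws.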

Files affected

  • src/channels/discord/adapter.ts — Add typing interval in message handler
  • src/channels/slack/adapter.ts — Add typing indicator in message handler
  • src/channels/whatsapp/adapter.ts — Add presence composing/paused in message handler

4. Session pruning (TTL-based)

Config addition

sessions:
  ttl: 30d    # duration string. Default: 30d. Set to 0 or false to disable.

Supported formats: "30d", "7d", "12h", "0" (disabled).

Mechanism

  1. Daemon startup schedules a periodic timer (every 1 hour)
  2. Timer calls SessionStore.pruneStale(cutoffTimestamp)
  3. SQLite query finds all session_ids where MAX(created_at) < cutoff
  4. Deletes all messages for stale sessions
  5. Evicts pruned sessions from SessionManager's in-memory cache
  6. Logs: "Pruned 3 stale sessions (TTL: 30d)"
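
The steps above can be sketched as a timer plus a single prune pass. The interfaces here are minimal stand-ins for SessionStore and SessionManager, the function names are hypothetical, and the sketch assumes created_at is stored as epoch milliseconds, which the design does not state.

```typescript
// Minimal stand-ins for SessionStore and SessionManager.
interface PrunableStore {
  pruneStale(beforeTimestamp: number): Promise<string[]>;
}
interface SessionCache {
  evictSessions(ids: string[]): void;
}

const HOUR_MS = 60 * 60 * 1000;

// Sessions whose newest message is older than this timestamp are stale.
// Assumes created_at is epoch milliseconds.
function cutoffFor(nowMs: number, ttlMs: number): number {
  return nowMs - ttlMs;
}

// One prune pass: delete stale sessions, then evict them from the cache.
async function pruneOnce(
  store: PrunableStore,
  cache: SessionCache,
  ttlMs: number,
  nowMs: number = Date.now(),
): Promise<number> {
  const pruned = await store.pruneStale(cutoffFor(nowMs, ttlMs));
  if (pruned.length > 0) cache.evictSessions(pruned);
  return pruned.length;
}

// Returns a stop function, or null when pruning is disabled (ttl of 0).
function startPruneTimer(
  store: PrunableStore,
  cache: SessionCache,
  ttlMs: number,
): (() => void) | null {
  if (ttlMs <= 0) return null;
  const timer = setInterval(() => {
    pruneOnce(store, cache, ttlMs)
      .then((n) => { if (n > 0) console.log(`Pruned ${n} stale sessions`); })
      .catch((err) => console.error("Session pruning failed:", err));
  }, HOUR_MS);
  return () => clearInterval(timer);
}
```

Returning a stop function gives the daemon a way to clear the timer on shutdown.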

Duration parsing

Simple regex parser for duration strings — no external library:

function parseDuration(s: string): number | null {
  if (s === '0') return 0; // "0" is a supported value and means pruning is disabled
  const match = s.match(/^(\d+)(h|d)$/);
  if (!match) return null; // unrecognized format
  const [, n, unit] = match;
  return Number(n) * (unit === 'h' ? 3600000 : 86400000); // milliseconds
}

New SessionStore method

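async pruneStale(beforeTimestamp: number): Promise<string[]> {
  // Returns the list of pruned session IDs
  const findStale = db.prepare(`
    SELECT session_id FROM messages
    GROUP BY session_id
    HAVING MAX(created_at) < ?
  `);
  const deleteSession = db.prepare('DELETE FROM messages WHERE session_id = ?');

  // Run the select and all deletes in one transaction so a failure cannot
  // leave sessions half-pruned. db.transaction() assumes better-sqlite3.
  const prune = db.transaction((cutoff: number) => {
    const stale = findStale.all(cutoff) as { session_id: string }[];
    for (const { session_id } of stale) deleteSession.run(session_id);
    return stale.map(r => r.session_id);
  });
  return prune(beforeTimestamp);
}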

Files affected

  • src/config/schema.ts — Add sessions.ttl field
  • src/session/store.ts — Add pruneStale() method
  • src/session/manager.ts — Add evictSessions(ids) to clear in-memory cache
  • src/daemon/index.ts — Schedule pruning timer on startup

5. Tool groups

Group definitions

Static map in policy.ts:

export const TOOL_GROUPS: Record<string, string[]> = {
  'group:fs':      ['file.read', 'file.write', 'file.edit', 'file.list'],
  'group:runtime': ['shell.exec', 'process.start', 'process.output', 'process.status', 'process.kill', 'process.list'],
  'group:web':     ['web.fetch', 'web.search', 'browser.navigate', 'browser.click', 'browser.type', 'browser.screenshot', 'browser.evaluate'],
  'group:memory':  ['memory.read', 'memory.write', 'memory.search'],
};

Resolution

ToolPolicy expands group:* entries in allow/deny lists before applying filters. Expansion happens early in the resolution pipeline, before any set operations.

function expandGroups(names: string[]): string[] {
  return names.flatMap(n => TOOL_GROUPS[n] ?? [n]);
}

Works in all scopes: global allow/deny, per-agent overrides, per-provider overrides.
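
As an illustration of expansion happening before any set operations, the sketch below filters a tool list against allow/deny entries that mix group names and plain tool names. The group map is a trimmed copy for the example, filterTools is a hypothetical name, and the precedence rules (deny wins over allow, an empty allow list allows everything) are assumptions, not something this design specifies.

```typescript
// Trimmed copy of the group map, just enough for the example.
const GROUPS: Record<string, string[]> = {
  "group:fs": ["file.read", "file.write", "file.edit", "file.list"],
  "group:memory": ["memory.read", "memory.write", "memory.search"],
};

// Group names expand to their members; anything else passes through unchanged.
function expand(names: string[]): string[] {
  return names.flatMap((n) => GROUPS[n] ?? [n]);
}

// Assumed precedence: deny wins over allow; empty allow list means "allow all".
function filterTools(available: string[], allow: string[], deny: string[]): string[] {
  const allowed = new Set(expand(allow));
  const denied = new Set(expand(deny));
  return available.filter(
    (tool) => (allowed.size === 0 || allowed.has(tool)) && !denied.has(tool),
  );
}
```

Because unknown names pass through untouched, a misspelled group such as group:fss would be treated as a tool name that never matches; the real resolution pipeline may want to warn on that.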

Config usage example

tools:
  profile: minimal
  allow: ['group:web']
  agents:
    fast:
      allow: ['group:fs']
      deny: ['shell.exec']
  providers:
    ollama:
      deny: ['group:web']

Files affected

  • src/tools/policy.ts — Add TOOL_GROUPS map, expandGroups() helper, integrate into resolution pipeline
  • src/tools/policy.test.ts — Tests for group expansion in all scopes

Implementation order

Recommended order by independence and risk:

  1. Tool groups — Isolated to policy.ts, no cross-cutting concerns
  2. Typing indicators — Per-adapter, independent changes
  3. Session pruning — Self-contained, touches store/manager/daemon
  4. /verbose — Frontend-only, no backend changes
  5. !!think — Largest scope, touches all providers + agent loop + frontends

Items 1-3 in this list (tool groups, typing indicators, session pruning) can be implemented in parallel. Item 4 (/verbose) is independent. Item 5 (!!think) depends on understanding the streaming path touched by /verbose.