flynn/docs/plans/2026-02-07-tier1-quick-wins-design.md

# Tier 1 Quick Wins — Design

**Date:** 2026-02-07
**Status:** Draft
**Scope:** 5 additive features, no breaking changes

---

## 1. Per-message thinking mode (`!!think` prefix)

### Trigger

User prefixes a message with `!!think`. The prefix is stripped before the message reaches the model.

### Data flow

1. Frontend/channel adapter detects `!!think` prefix, strips it, sets `thinking: true` on the message metadata
2. Agent loop passes `thinking` flag through to `ChatRequest`
3. Each provider client checks the flag:
   - **Anthropic:** sets `thinking.budget_tokens` (default 4096)
   - **OpenAI/GitHub Models:** sets `reasoning_effort` (default `'medium'`)
   - **Gemini:** sets `thinkingConfig.thinkingBudget` (default 4096)
   - **Bedrock:** passes the Anthropic thinking params through to the underlying Claude model
   - **Ollama/llama.cpp:** no-op (silently ignored)
4. Response thinking/reasoning content is included in the reply (displayed as a collapsible block in TUI/WebChat, omitted in channel adapters)
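
Step 1 of the flow above could look roughly like the sketch below. The helper name `parseThinkPrefix`, its return shape, and the rule that the prefix must be a whole token (so `!!thinking` is not treated as the prefix) are assumptions for illustration, not existing code.

```typescript
// Hypothetical helper: name, return shape, and token rule are assumptions.
interface ParsedInput {
  text: string;      // message body with the prefix removed
  thinking: boolean; // true when the prefix was present
}

function parseThinkPrefix(raw: string): ParsedInput {
  const trimmed = raw.trimStart();
  // Match the prefix only when followed by whitespace or end of input,
  // so "!!thinking aloud" is left untouched.
  const match = /^!!think(?:\s+|$)/.exec(trimmed);
  if (!match) return { text: raw, thinking: false };
  return { text: trimmed.slice(match[0].length), thinking: true };
}
```

A frontend or channel adapter would call this once per incoming message and copy `thinking` onto the message metadata before dispatching to the agent loop.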

### Config additions

All optional. These settings control the per-provider defaults used when `!!think` is active:

```yaml
models:
  thinking:
    anthropic:
      budgetTokens: 4096
    openai:
      reasoningEffort: medium   # low | medium | high
    gemini:
      budgetTokens: 4096
```

### Types changes

```typescript
// src/models/types.ts — ChatRequest
export interface ChatRequest {
  messages: Message[];
  system?: string;
  maxTokens?: number;
  tools?: ToolDefinition[];
  thinking?: boolean;           // NEW
}

// src/models/types.ts — ChatResponse
export interface ChatResponse {
  content: string;
  toolCalls?: ToolCall[];
  stopReason?: string;
  usage?: TokenUsage;
  thinkingContent?: string;     // NEW — raw thinking/reasoning output
}
```

### Provider implementation

Each client checks `request.thinking` and maps to native API:

- **`anthropic.ts`**: Add `thinking: { type: 'enabled', budget_tokens }` to `messages.create()` params. The request's `max_tokens` must be greater than `budget_tokens`, so raise it when needed. Parse `thinking` content blocks from response.
- **`openai.ts`**: Add `reasoning_effort` to `chat.completions.create()`. Parse reasoning output from the response when the endpoint returns it (Chat Completions may report only reasoning token counts).
- **`github.ts`**: Same as OpenAI (uses OpenAI SDK).
- **`gemini.ts`**: Add `thinkingConfig` to `generationConfig`, with `includeThoughts: true` so that thought parts are returned. Parse thinking parts from response.
- **`bedrock.ts`**: Pass the Anthropic-format thinking params through the Bedrock Converse API (`additionalModelRequestFields`).
- **`ollama.ts` / `llamacpp.ts`**: Ignore the flag.
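
The per-provider mapping above can be summarized as one lookup. This is a minimal sketch: `thinkingParams`, the `Provider` union, and the returned fragments are hypothetical, and each client would still decide where the fragment goes in its own request. The Gemini field is written as `thinkingBudget`, the name used in Google's generation config as far as I know; check it against the SDK version in use.

```typescript
// Hypothetical mapping helper. Config field names follow the YAML in
// "Config additions"; the function name and return shapes are assumptions.
interface ThinkingConfig {
  anthropic?: { budgetTokens?: number };
  openai?: { reasoningEffort?: 'low' | 'medium' | 'high' };
  gemini?: { budgetTokens?: number };
}

type Provider =
  | 'anthropic' | 'openai' | 'github' | 'gemini'
  | 'bedrock' | 'ollama' | 'llamacpp';

function thinkingParams(
  provider: Provider,
  thinking: boolean | undefined,
  cfg: ThinkingConfig = {},
): Record<string, unknown> {
  if (!thinking) return {}; // flag absent: request is unchanged
  switch (provider) {
    case 'anthropic':
    case 'bedrock': // Bedrock reuses the Anthropic-format params
      return { thinking: { type: 'enabled', budget_tokens: cfg.anthropic?.budgetTokens ?? 4096 } };
    case 'openai':
    case 'github': // GitHub Models goes through the OpenAI SDK
      return { reasoning_effort: cfg.openai?.reasoningEffort ?? 'medium' };
    case 'gemini':
      return { thinkingConfig: { thinkingBudget: cfg.gemini?.budgetTokens ?? 4096 } };
    default:
      return {}; // ollama / llamacpp: flag is silently ignored
  }
}
```

Returning an empty object for unsupported providers keeps the call site uniform: every client can spread the result into its request without branching on the flag.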

### Files affected

- `src/models/types.ts` — Add `thinking` to ChatRequest, `thinkingContent` to ChatResponse
- `src/models/anthropic.ts` — Wire `budget_tokens`, parse thinking blocks
- `src/models/openai.ts` — Wire `reasoning_effort`, parse reasoning
- `src/models/github.ts` — Pass through to OpenAI client
- `src/models/gemini.ts` — Wire `thinkingConfig`
- `src/models/bedrock.ts` — Wire thinking params
- `src/config/schema.ts` — Add `models.thinking` config section
- `src/backends/native/agent.ts` — Pass `thinking` flag from message metadata to ChatRequest
- `src/frontends/tui/commands.ts` — Detect and strip `!!think` prefix
- Channel adapters — Detect and strip `!!think` prefix
- TUI/WebChat — Display `thinkingContent` as collapsible block

---

## 2. Verbose streaming mode (`/verbose`)

### Trigger

`/verbose` command toggles a boolean in the frontend's local state. Not persisted to session or config.

### Effect when on

- Raw streaming chunks displayed as they arrive, including tool call JSON being generated
- Tool arguments and raw results shown in full (no summarization)

### Scope

TUI and WebChat only. Channel adapters (Telegram, Discord, Slack, WhatsApp) do not support this.

### Implementation

- Add `verbose: boolean` to TUI and WebChat frontend state (default `false`)
- Add `/verbose` to command parser — toggles the flag, prints current status
- Streaming renderer checks the flag:
  - **On:** emit raw chunks as-is, display full tool call JSON and results
  - **Off:** current behavior (summarized tool output, clean text display)
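- No agent-loop or provider changes: this is a display concern. The only non-frontend touch point is the WebSocket message handler, which must forward raw chunks to WebChat when verbose is on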
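
A minimal sketch of the toggle and the renderer check described above. `FrontendState`, `handleCommand`, `renderToolCall`, and the 80-character summary limit are all hypothetical and stand in for whatever the TUI and WebChat already use.

```typescript
// Hypothetical frontend state and renderer; names and the 80-character
// summary limit are assumptions, not the existing TUI code.
interface FrontendState {
  verbose: boolean; // default false, never persisted
}

// Returns the status line to print, or null if the input is not /verbose.
function handleCommand(input: string, state: FrontendState): string | null {
  if (input.trim() !== '/verbose') return null;
  state.verbose = !state.verbose;
  return `Verbose mode: ${state.verbose ? 'on' : 'off'}`;
}

function renderToolCall(name: string, args: unknown, state: FrontendState): string {
  const json = JSON.stringify(args) ?? '';
  if (state.verbose) return `${name} ${json}`; // full arguments, no summarization
  // Non-verbose: truncate long argument JSON to keep the display clean.
  return json.length > 80 ? `${name} ${json.slice(0, 77)}...` : `${name} ${json}`;
}
```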

### Files affected

- `src/frontends/tui/commands.ts` — Add `verbose` command type and parsing
- `src/frontends/tui/minimal.ts` — Handle `/verbose`, toggle state, modify streaming display
- `src/gateway/ui/pages/chat.js` — WebChat verbose toggle and raw display mode
- WebSocket message handler — Pass raw chunks when verbose is active

---

## 3. Typing indicators

### When

Immediately on receiving a user message. Sustained until the response is fully sent.

### Per-adapter implementation

| Adapter | API | Notes |
|---------|-----|-------|
| **Discord** | `channel.sendTyping()` | Auto-expires after 10s. Re-fire on a 9s interval while processing. |
| **Slack** | Bolt typing indicator API | Fire on receipt, cancel on response. |
| **WhatsApp** | `sock.sendPresenceUpdate('composing', jid)` | Fire on receipt, send `'paused'` on response. |
| **Telegram** | grammY `sendChatAction('typing')` | Already implemented. No changes needed. |

### Implementation pattern

Each adapter's message handler calls `sendTyping()` before dispatching to the agent loop. A cleanup/cancel mechanism (interval clear or presence update) stops the indicator once the response is sent.

```typescript
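// Method sketch for the Discord adapter (`msg` is a discord.js Message)
async handleMessage(msg) {
  // Typing is cosmetic: swallow failures so they never become unhandled rejections
  const sendTyping = () => { msg.channel.sendTyping().catch(() => {}); };
  sendTyping(); // immediate first call
  const typingInterval = setInterval(sendTyping, 9000); // re-fire before the 10s expiry
  try {
    await this.dispatch(msg);
  } finally {
    clearInterval(typingInterval); // stop refreshing once the response is sent (or dispatch fails)
  }
}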
```
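
The same start/refresh/stop pattern applies to every adapter in the table, so it could be factored into one helper. The sketch below is an assumption about how that might look; `withTyping` and `TypingHooks` are hypothetical names.

```typescript
// Hypothetical helper shared by adapters; the name and signature are assumptions.
interface TypingHooks {
  start: () => Promise<unknown>; // e.g. sendTyping() or a 'composing' presence update
  stop?: () => Promise<unknown>; // e.g. a 'paused' presence update; omit if the indicator expires on its own
  refreshMs?: number;            // re-fire interval for indicators that auto-expire
}

async function withTyping<T>(hooks: TypingHooks, work: () => Promise<T>): Promise<T> {
  // Indicator failures must never break the actual reply.
  const fire = () => { hooks.start().catch(() => {}); };
  fire(); // immediate first call
  const timer = hooks.refreshMs !== undefined ? setInterval(fire, hooks.refreshMs) : undefined;
  try {
    return await work();
  } finally {
    if (timer !== undefined) clearInterval(timer);
    if (hooks.stop) await hooks.stop().catch(() => {});
  }
}
```

Under this sketch the Discord adapter would pass `start: () => msg.channel.sendTyping()` with `refreshMs: 9000`, while the WhatsApp adapter would pass `start` and `stop` callbacks that send the `'composing'` and `'paused'` presence updates.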

### Files affected

- `src/channels/discord/adapter.ts` — Add typing interval in message handler
- `src/channels/slack/adapter.ts` — Add typing indicator in message handler
- `src/channels/whatsapp/adapter.ts` — Add presence composing/paused in message handler

---

## 4. Session pruning (TTL-based)

### Config addition

```yaml
sessions:
  ttl: 30d    # duration string. Default: 30d. Set to 0 or false to disable.
```

Supported formats: `"30d"`, `"7d"`, `"12h"`, `"0"` (disabled).

### Mechanism

1. Daemon startup schedules a periodic timer (every 1 hour)
2. Timer calls `SessionStore.pruneStale(cutoffTimestamp)`
3. SQLite query finds all `session_id`s where `MAX(created_at) < cutoff`
4. Deletes all messages for stale sessions
5. Evicts pruned sessions from `SessionManager`'s in-memory cache
6. Logs: `"Pruned 3 stale sessions (TTL: 30d)"`
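
The steps above could be wired together roughly as follows. The two interfaces only mirror the method names used in this document (`pruneStale`, `evictSessions`); their exact shapes, the helper names, and the assumption that `created_at` is stored in epoch milliseconds are all hypothetical.

```typescript
// Hypothetical wiring. Assumes created_at is stored in epoch milliseconds.
interface PrunableStore {
  pruneStale(beforeTimestamp: number): Promise<string[]>;
}
interface EvictableManager {
  evictSessions(ids: string[]): void;
}

const HOUR_MS = 3_600_000;

async function pruneOnce(
  store: PrunableStore,
  manager: EvictableManager,
  ttlMs: number,
  ttlLabel: string,
  now: number = Date.now(),
): Promise<number> {
  const pruned = await store.pruneStale(now - ttlMs); // cutoff = now - TTL
  if (pruned.length > 0) {
    manager.evictSessions(pruned); // keep the in-memory cache consistent with the DB
    console.log(`Pruned ${pruned.length} stale sessions (TTL: ${ttlLabel})`);
  }
  return pruned.length;
}

// Returns a cancel function, or null when pruning is disabled (ttlMs <= 0).
function schedulePruning(
  store: PrunableStore,
  manager: EvictableManager,
  ttlMs: number,
  ttlLabel: string,
): (() => void) | null {
  if (ttlMs <= 0) return null;
  const timer = setInterval(() => {
    pruneOnce(store, manager, ttlMs, ttlLabel)
      .catch((err) => console.error('Session pruning failed:', err));
  }, HOUR_MS);
  return () => clearInterval(timer);
}
```

Splitting `pruneOnce` from the timer keeps the pruning logic testable with a fixed `now`, and lets the daemon run one pass at startup if that turns out to be desirable.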

### Duration parsing

Simple regex parser for duration strings — no external library:

```typescript
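// Returns the TTL in milliseconds, 0 when pruning is disabled ("0"),
// or null when the string is not a supported duration.
function parseDuration(s: string): number | null {
  const trimmed = s.trim();
  if (trimmed === '0') return 0; // "0" is a supported value meaning "disabled"
  const match = trimmed.match(/^(\d+)(h|d)$/);
  if (!match) return null;
  const [, n, unit] = match;
  const unitMs = unit === 'h' ? 3_600_000 : 86_400_000;
  return Number(n) * unitMs;
}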
```
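
One detail worth handling before the parser runs: in YAML, an unquoted `0` or `false` is loaded as a number or boolean, not a string. The sketch below is one possible normalizer for the raw config value; `resolveTtlMs` is a hypothetical name, and throwing on invalid input is an assumption about how config errors should surface.

```typescript
// Hypothetical normalizer; assumes the YAML loader may hand back a
// string, number, or boolean for sessions.ttl.
type TtlSetting = string | number | boolean | undefined;

const DEFAULT_TTL = '30d';

// Returns the TTL in milliseconds, or 0 when pruning is disabled.
// Throws on values that are neither a supported duration nor a disable marker.
function resolveTtlMs(value: TtlSetting): number {
  if (value === undefined) return resolveTtlMs(DEFAULT_TTL);
  const v = typeof value === 'string' ? value.trim() : value;
  if (v === false || v === 0 || v === '0') return 0; // disabled
  const match = typeof v === 'string' ? /^(\d+)(h|d)$/.exec(v) : null;
  if (!match) throw new Error(`Invalid sessions.ttl: ${JSON.stringify(value)}`);
  const unitMs = match[2] === 'h' ? 3_600_000 : 86_400_000;
  return Number(match[1]) * unitMs;
}
```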

### New SessionStore method

```typescript
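async pruneStale(beforeTimestamp: number): Promise<string[]> {
  // Returns the IDs of pruned sessions.
  // Assumes `created_at` uses the same unit as `beforeTimestamp`.
  const stale = db.prepare(`
    SELECT session_id FROM messages
    GROUP BY session_id
    HAVING MAX(created_at) < ?
  `).all(beforeTimestamp) as { session_id: string }[];

  // Prepare once and delete in a single transaction so a failure cannot
  // leave a partial prune. `db.transaction` assumes better-sqlite3.
  const del = db.prepare('DELETE FROM messages WHERE session_id = ?');
  db.transaction(() => {
    for (const { session_id } of stale) del.run(session_id);
  })();
  return stale.map(r => r.session_id);
}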
```

### Files affected

- `src/config/schema.ts` — Add `sessions.ttl` field
- `src/session/store.ts` — Add `pruneStale()` method
- `src/session/manager.ts` — Add `evictSessions(ids)` to clear in-memory cache
- `src/daemon/index.ts` — Schedule pruning timer on startup

---

## 5. Tool groups

### Group definitions

Static map in `policy.ts`:

```typescript
export const TOOL_GROUPS: Record<string, string[]> = {
  'group:fs':      ['file.read', 'file.write', 'file.edit', 'file.list'],
  'group:runtime': ['shell.exec', 'process.start', 'process.output', 'process.status', 'process.kill', 'process.list'],
  'group:web':     ['web.fetch', 'web.search', 'browser.navigate', 'browser.click', 'browser.type', 'browser.screenshot', 'browser.evaluate'],
  'group:memory':  ['memory.read', 'memory.write', 'memory.search'],
};
```

### Resolution

`ToolPolicy` expands `group:*` entries in allow/deny lists before applying filters. Expansion happens early in the resolution pipeline, before any set operations.

```typescript
function expandGroups(names: string[]): string[] {
  return names.flatMap(n => TOOL_GROUPS[n] ?? [n]);
}
```

Works in all scopes: global allow/deny, per-agent overrides, per-provider overrides.
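
To illustrate expansion followed by set operations, here is a self-contained sketch. `TOOL_GROUPS` is trimmed to two groups, `resolveTools` is a hypothetical name for the allow/deny step, and "deny wins over allow" is an assumption about the existing policy semantics.

```typescript
// Self-contained sketch: TOOL_GROUPS is trimmed, and `resolveTools` is a
// hypothetical name. Assumes deny takes precedence over allow.
const TOOL_GROUPS: Record<string, string[]> = {
  'group:fs': ['file.read', 'file.write', 'file.edit', 'file.list'],
  'group:memory': ['memory.read', 'memory.write', 'memory.search'],
};

function expandGroups(names: string[]): string[] {
  // Unknown names (including misspelled groups) pass through unchanged.
  return names.flatMap(n => TOOL_GROUPS[n] ?? [n]);
}

function resolveTools(allow: string[], deny: string[]): string[] {
  const denied = new Set(expandGroups(deny));
  // A Set drops duplicates when a tool is listed both directly and via its group.
  const allowed = Array.from(new Set(expandGroups(allow)));
  return allowed.filter(t => !denied.has(t));
}
```

Because unknown names pass through unchanged, a misspelled entry such as `group:fss` would be treated as a literal tool name that matches nothing. Validating `group:` entries against `TOOL_GROUPS` at config load time may be worth considering.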

### Config usage example

```yaml
tools:
  profile: minimal
  allow: ['group:web']
  agents:
    fast:
      allow: ['group:fs']
      deny: ['shell.exec']
  providers:
    ollama:
      deny: ['group:web']
```

### Files affected

- `src/tools/policy.ts` — Add `TOOL_GROUPS` map, `expandGroups()` helper, integrate into resolution pipeline
- `src/tools/policy.test.ts` — Tests for group expansion in all scopes

---

## Implementation order

Recommended order by independence and risk:

1. **Tool groups** — Isolated to `policy.ts`, no cross-cutting concerns
2. **Typing indicators** — Per-adapter, independent changes
3. **Session pruning** — Self-contained, touches store/manager/daemon
4. **`/verbose`** — Frontend-only, no backend changes
5. **`!!think`** — Largest scope, touches all providers + agent loop + frontends

Items 1–3 of this list (tool groups, typing indicators, session pruning) can be implemented in parallel. `/verbose` is independent of them. `!!think` should come last because it builds on the streaming path that `/verbose` touches.