flynn/docs/plans/2026-02-07-tier1-quick-wins-design.md
William Valentin 1c2f54fae3 feat: implement tier 1 quick wins (tool groups, typing, pruning, verbose, think)
Five additive features with no breaking changes:

- Tool groups: group:fs, group:runtime, group:web, group:memory syntactic
  sugar for allow/deny lists in tool policy config
- Typing indicators: Discord sendTyping() and WhatsApp sendStateTyping()
  on message receipt for better UX feedback
- Session pruning: TTL-based auto-cleanup via sessions.ttl config with
  hourly daemon timer and SQLite GROUP BY pruning
- /verbose command: TUI command parser toggle for raw streaming display
- !!think prefix: per-message extended thinking mode wired through
  Anthropic (budget_tokens), OpenAI/GitHub (reasoning_effort), and
  Gemini (thinkingConfig) providers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 13:35:00 -08:00

# Tier 1 Quick Wins — Design
**Date:** 2026-02-07
**Status:** Draft
**Scope:** 5 additive features, no breaking changes
---
## 1. Per-message thinking mode (`!!think` prefix)
### Trigger
User prefixes a message with `!!think`. The prefix is stripped before the message reaches the model.
### Data flow
1. Frontend/channel adapter detects `!!think` prefix, strips it, sets `thinking: true` on the message metadata
2. Agent loop passes `thinking` flag through to `ChatRequest`
3. Each provider client checks the flag:
   - **Anthropic:** sets `thinking.budget_tokens` (default 4096; the API requires at least 1024 and a value below `max_tokens`)
   - **OpenAI/GitHub Models:** sets `reasoning_effort` (default `'medium'`)
   - **Gemini:** sets `thinkingConfig.thinkingBudget` (default 4096)
   - **Bedrock:** passes Anthropic-style thinking params through the Converse API
   - **Ollama/llama.cpp:** no-op (silently ignored)
4. The response's thinking/reasoning content is included in the reply (displayed as a collapsible block in TUI/WebChat, omitted in channel adapters)
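Step 1 of the data flow can be sketched as a small pure function. This is a minimal illustration, not the actual adapter code: `parseThinkPrefix` and `ParsedInput` are hypothetical names, and the word-boundary rule (so that `!!thinking` is not mistaken for the prefix) is an assumption the design does not state.

```typescript
// Hypothetical helper; the name and return shape are illustrative only.
const THINK_PREFIX = '!!think';

interface ParsedInput {
  text: string;      // message text with the prefix removed
  thinking: boolean; // true when the prefix was present
}

function parseThinkPrefix(raw: string): ParsedInput {
  const trimmed = raw.trimStart();
  if (!trimmed.startsWith(THINK_PREFIX)) {
    return { text: raw, thinking: false };
  }
  const rest = trimmed.slice(THINK_PREFIX.length);
  // Assumption: require whitespace (or end of input) after the prefix,
  // so "!!thinking about it" is passed through untouched.
  if (rest.length > 0 && !/^\s/.test(rest)) {
    return { text: raw, thinking: false };
  }
  return { text: rest.trimStart(), thinking: true };
}
```

Keeping the detection in one shared function would let the TUI and every channel adapter strip the prefix identically before setting `thinking: true` on the message metadata.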
### Config additions
All fields are optional. They set the per-provider defaults used when `!!think` is active:
```yaml
models:
  thinking:
    anthropic:
      budgetTokens: 4096
    openai:
      reasoningEffort: medium # low | medium | high
    gemini:
      budgetTokens: 4096
```
### Types changes
```typescript
// src/models/types.ts: ChatRequest
export interface ChatRequest {
  messages: Message[];
  system?: string;
  maxTokens?: number;
  tools?: ToolDefinition[];
  thinking?: boolean; // NEW
}

// src/models/types.ts: ChatResponse
export interface ChatResponse {
  content: string;
  toolCalls?: ToolCall[];
  stopReason?: string;
  usage?: TokenUsage;
  thinkingContent?: string; // NEW: raw thinking/reasoning output
}
```
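As an illustration of how `thinkingContent` might be populated, the sketch below splits Anthropic-style response content blocks into visible text and thinking output. The block shapes are a trimmed model of the Messages API response, and `splitContent` is a hypothetical helper name, not something defined in this design.

```typescript
// Trimmed block shapes, modeled on Anthropic Messages API response content.
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'thinking'; thinking: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown };

// Hypothetical helper: separates visible text from thinking output.
function splitContent(blocks: ContentBlock[]): { content: string; thinkingContent?: string } {
  const text: string[] = [];
  const thinking: string[] = [];
  for (const block of blocks) {
    if (block.type === 'text') text.push(block.text);
    else if (block.type === 'thinking') thinking.push(block.thinking);
    // tool_use blocks map to toolCalls elsewhere and are skipped here.
  }
  const result: { content: string; thinkingContent?: string } = { content: text.join('') };
  // Leave the field unset when the model produced no thinking output.
  if (thinking.length > 0) result.thinkingContent = thinking.join('\n');
  return result;
}
```

Leaving `thinkingContent` unset rather than empty lets frontends decide whether to render the collapsible block with a simple presence check.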
### Provider implementation
Each client checks `request.thinking` and maps to native API:
- **`anthropic.ts`**: Add `thinking: { type: 'enabled', budget_tokens }` to `messages.create()` params. Parse `thinking` content blocks from response.
- **`openai.ts`**: Add `reasoning_effort` to `chat.completions.create()`. Parse `reasoning` from response.
- **`github.ts`**: Same as OpenAI (uses OpenAI SDK).
- **`gemini.ts`**: Add `thinkingConfig` to `generationConfig`. Parse thinking parts from response.
- **`bedrock.ts`**: Pass Anthropic-style thinking params through the Bedrock Converse API (`additionalModelRequestFields`).
- **`ollama.ts` / `llamacpp.ts`**: Ignore the flag.
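The per-provider mapping above could be centralized in one function that returns the extra request fields for each client to merge in. This is a sketch under assumptions: `thinkingParams`, `ThinkingConfig`, and `DEFAULT_THINKING` are hypothetical names; the Gemini field is written as `thinkingBudget`, which is my reading of the public Gemini API rather than something this design confirms; Bedrock is left out because its request wrapping differs.

```typescript
type Provider = 'anthropic' | 'openai' | 'github' | 'gemini' | 'ollama' | 'llamacpp';

// Mirrors the optional models.thinking config section.
interface ThinkingConfig {
  anthropic: { budgetTokens: number };
  openai: { reasoningEffort: 'low' | 'medium' | 'high' };
  gemini: { budgetTokens: number };
}

const DEFAULT_THINKING: ThinkingConfig = {
  anthropic: { budgetTokens: 4096 },
  openai: { reasoningEffort: 'medium' },
  gemini: { budgetTokens: 4096 },
};

// Hypothetical helper: extra request fields for a provider call.
function thinkingParams(
  provider: Provider,
  thinking: boolean,
  cfg: ThinkingConfig = DEFAULT_THINKING,
): Record<string, unknown> {
  if (!thinking) return {};
  switch (provider) {
    case 'anthropic':
      return { thinking: { type: 'enabled', budget_tokens: cfg.anthropic.budgetTokens } };
    case 'openai':
    case 'github':
      return { reasoning_effort: cfg.openai.reasoningEffort };
    case 'gemini':
      // Assumed field name; this object belongs inside generationConfig.
      return { thinkingConfig: { thinkingBudget: cfg.gemini.budgetTokens } };
    default:
      // Ollama and llama.cpp silently ignore the flag.
      return {};
  }
}
```

A client would spread the result into its request, for example `{ ...baseParams, ...thinkingParams('anthropic', request.thinking ?? false) }`, so the no-op providers need no special casing.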
### Files affected
- `src/models/types.ts` — Add `thinking` to ChatRequest, `thinkingContent` to ChatResponse
- `src/models/anthropic.ts` — Wire `budget_tokens`, parse thinking blocks
- `src/models/openai.ts` — Wire `reasoning_effort`, parse reasoning
- `src/models/github.ts` — Pass through to OpenAI client
- `src/models/gemini.ts` — Wire `thinkingConfig`
- `src/models/bedrock.ts` — Wire thinking params
- `src/config/schema.ts` — Add `models.thinking` config section
- `src/backends/native/agent.ts` — Pass `thinking` flag from message metadata to ChatRequest
- `src/frontends/tui/commands.ts` — Detect and strip `!!think` prefix
- Channel adapters — Detect and strip `!!think` prefix
- TUI/WebChat — Display `thinkingContent` as collapsible block
---
## 2. Verbose streaming mode (`/verbose`)
### Trigger
`/verbose` command toggles a boolean in the frontend's local state. Not persisted to session or config.
### Effect when on
- Raw streaming chunks displayed as they arrive, including tool call JSON being generated
- Tool arguments and raw results shown in full (no summarization)
### Scope
TUI and WebChat only. Channel adapters (Telegram, Discord, Slack, WhatsApp) do not support this.
### Implementation
- Add `verbose: boolean` to TUI and WebChat frontend state (default `false`)
- Add `/verbose` to command parser — toggles the flag, prints current status
- Streaming renderer checks the flag:
  - **On:** emit raw chunks as-is, display full tool call JSON and results
  - **Off:** current behavior (summarized tool output, clean text display)
- No backend changes — purely a display concern
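The toggle and the renderer check can be sketched as two small functions over local frontend state. All names here (`FrontendState`, `handleVerboseCommand`, `renderToolCall`) and the 40-character summary cutoff are hypothetical; the current summarization logic is not described in this document.

```typescript
interface FrontendState {
  verbose: boolean; // local only, never persisted to session or config
}

// Hypothetical command handler: returns a status line,
// or null when the input is not the /verbose command.
function handleVerboseCommand(state: FrontendState, input: string): string | null {
  if (input.trim() !== '/verbose') return null;
  state.verbose = !state.verbose;
  return `Verbose mode: ${state.verbose ? 'on' : 'off'}`;
}

// Hypothetical renderer: full JSON when verbose, a one-line summary otherwise.
function renderToolCall(state: FrontendState, name: string, args: unknown): string {
  if (state.verbose) {
    return `${name} ${JSON.stringify(args, null, 2)}`;
  }
  const compact = JSON.stringify(args) ?? '';
  // Assumed cutoff: truncate long argument lists in summary mode.
  const summary = compact.length > 40 ? `${compact.slice(0, 37)}...` : compact;
  return `${name}(${summary})`;
}
```

Because the flag only changes how chunks are rendered, both functions can be unit tested without a running agent loop.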
### Files affected
- `src/frontends/tui/commands.ts` — Add `verbose` command type and parsing
- `src/frontends/tui/minimal.ts` — Handle `/verbose`, toggle state, modify streaming display
- `src/gateway/ui/pages/chat.js` — WebChat verbose toggle and raw display mode
- WebSocket message handler — Pass raw chunks when verbose is active
---
## 3. Typing indicators
### When
Immediately on receiving a user message. Sustained until the response is fully sent.
### Per-adapter implementation
| Adapter | API | Notes |
|---------|-----|-------|
| **Discord** | `channel.sendTyping()` | Auto-expires after 10s. Re-fire on a 9s interval while processing. |
| **Slack** | Bolt typing indicator API | Fire on receipt, cancel on response. |
| **WhatsApp** | `sock.sendPresenceUpdate('composing', jid)` | Fire on receipt, send `'paused'` on response. |
| **Telegram** | grammY `sendChatAction('typing')` | Already implemented. No changes needed. |
### Implementation pattern
Each adapter's message handler calls `sendTyping()` before dispatching to the agent loop. A cleanup/cancel mechanism (interval clear or presence update) stops the indicator once the response is sent.
```typescript
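// Sketch of a method inside the Discord adapter class
async handleMessage(msg) {
  // A failed typing call is cosmetic and must not break message handling
  const sendTyping = () => msg.channel.sendTyping().catch(() => {});
  void sendTyping(); // immediate first call
  // Discord clears the indicator after about 10s, so refresh every 9s
  const typingInterval = setInterval(sendTyping, 9000);
  try {
    await this.dispatch(msg);
  } finally {
    clearInterval(typingInterval);
  }
}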
```
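The same start, refresh, and cancel pattern applies to the other adapters, so it could be factored into a shared helper. The sketch below is one possible shape; `withTypingIndicator` and `TypingHooks` are hypothetical names, and swallowing indicator errors is an assumption about the desired behavior.

```typescript
type Hook = () => unknown;

interface TypingHooks {
  start: Hook;        // fire (or re-fire) the indicator
  stop?: Hook;        // optional explicit cancel, e.g. a 'paused' presence update
  refreshMs?: number; // re-fire interval for indicators that expire on their own
}

// Hypothetical helper: shows the indicator for as long as work() is pending.
async function withTypingIndicator<T>(hooks: TypingHooks, work: () => Promise<T>): Promise<T> {
  // Indicator failures are cosmetic: swallow sync throws and async rejections.
  const safe = (fn: Hook): void => {
    try {
      void Promise.resolve(fn()).catch(() => {});
    } catch {
      // ignored on purpose
    }
  };
  safe(hooks.start); // immediate first call
  const timer = hooks.refreshMs !== undefined
    ? setInterval(() => safe(hooks.start), hooks.refreshMs)
    : undefined;
  try {
    return await work();
  } finally {
    if (timer !== undefined) clearInterval(timer);
    if (hooks.stop) safe(hooks.stop);
  }
}
```

With this shape, the Discord adapter would pass `start: () => msg.channel.sendTyping()` and `refreshMs: 9000`, while the WhatsApp adapter would pass `start` and `stop` callbacks wrapping `sock.sendPresenceUpdate('composing', jid)` and `sock.sendPresenceUpdate('paused', jid)`.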
### Files affected
- `src/channels/discord/adapter.ts` — Add typing interval in message handler
- `src/channels/slack/adapter.ts` — Add typing indicator in message handler
- `src/channels/whatsapp/adapter.ts` — Add presence composing/paused in message handler
---
## 4. Session pruning (TTL-based)
### Config addition
```yaml
sessions:
ttl: 30d # duration string. Default: 30d. Set to 0 or false to disable.
```
Supported units are hours (`h`) and days (`d`), for example `"30d"`, `"7d"`, or `"12h"`. `"0"` disables pruning.
### Mechanism
1. Daemon startup schedules a periodic timer (every 1 hour)
2. Timer calls `SessionStore.pruneStale(cutoffTimestamp)`
3. SQLite query finds all `session_id`s where `MAX(created_at) < cutoff`
4. Deletes all messages for stale sessions
5. Evicts pruned sessions from `SessionManager`'s in-memory cache
6. Logs: `"Pruned 3 stale sessions (TTL: 30d)"`
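The steps above can be sketched as a single pruning pass plus an hourly scheduler. This is an illustration under assumptions: `StaleStore` and `SessionCache` are minimal stand-ins for `SessionStore` and `SessionManager`, `runPrunePass` and `startPruneTimer` are hypothetical names, and `created_at` is assumed to hold epoch milliseconds.

```typescript
// Minimal stand-ins for SessionStore and SessionManager.
interface StaleStore {
  pruneStale(beforeTimestamp: number): Promise<string[]>;
}
interface SessionCache {
  evictSessions(ids: string[]): void;
}

// Hypothetical: one pruning pass. ttlMs <= 0 means pruning is disabled.
async function runPrunePass(
  store: StaleStore,
  cache: SessionCache,
  ttlMs: number,
  ttlLabel: string,
  now: number = Date.now(),
  log: (line: string) => void = console.log,
): Promise<number> {
  if (ttlMs <= 0) return 0;
  const cutoff = now - ttlMs; // assumes created_at is epoch milliseconds
  const pruned = await store.pruneStale(cutoff);
  if (pruned.length > 0) {
    cache.evictSessions(pruned);
    log(`Pruned ${pruned.length} stale sessions (TTL: ${ttlLabel})`);
  }
  return pruned.length;
}

// Hypothetical: run the pass every hour; returns a function that cancels the timer.
function startPruneTimer(
  store: StaleStore,
  cache: SessionCache,
  ttlMs: number,
  ttlLabel: string,
): () => void {
  const HOUR_MS = 60 * 60 * 1000;
  const timer = setInterval(() => {
    runPrunePass(store, cache, ttlMs, ttlLabel).catch((err) => {
      console.error('Session pruning failed', err);
    });
  }, HOUR_MS);
  return () => clearInterval(timer);
}
```

Taking `now` and `log` as parameters keeps the pass deterministic in tests, and returning a cancel function gives the daemon a way to stop the timer on shutdown.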
### Duration parsing
Simple regex parser for duration strings — no external library:
```typescript
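// Returns the duration in milliseconds, 0 when pruning is disabled, or null if unparseable
function parseDuration(s: string): number | null {
  if (s === '0') return 0; // "0" disables pruning
  const match = s.match(/^(\d+)(h|d)$/);
  if (!match) return null;
  const [, n, unit] = match;
  return unit === 'h' ? Number(n) * 3600000 : Number(n) * 86400000;
}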
```
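Because the config comment allows `0` or `false` as well as duration strings, a YAML loader may hand the parser a number or boolean rather than a string. The sketch below shows one way to normalize all of those cases; `resolveTtlMs` is a hypothetical name, and treating `true` or a bare non-zero number as invalid is an assumption.

```typescript
const HOUR_MS = 3600000;
const DAY_MS = 86400000;

// Hypothetical normalizer: returns the TTL in milliseconds, 0 when pruning
// is disabled, or null when the value cannot be interpreted.
function resolveTtlMs(value: string | number | boolean | undefined): number | null {
  if (value === undefined) return 30 * DAY_MS; // documented default: 30d
  if (value === false || value === 0) return 0; // explicitly disabled
  if (typeof value !== 'string') return null;   // assumption: true / bare numbers are invalid
  const s = value.trim();
  if (s === '0') return 0;
  const match = /^(\d+)(h|d)$/.exec(s);
  if (!match) return null;
  const n = Number(match[1]);
  return match[2] === 'h' ? n * HOUR_MS : n * DAY_MS;
}
```

Returning `null` for unrecognized values lets schema validation reject a typo such as `30m` at startup instead of silently disabling pruning.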
### New SessionStore method
```typescript
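async pruneStale(beforeTimestamp: number): Promise<string[]> {
  // A session is stale when its newest message is older than the cutoff
  const stale = db.prepare(`
    SELECT session_id FROM messages
    GROUP BY session_id
    HAVING MAX(created_at) < ?
  `).all(beforeTimestamp) as { session_id: string }[];
  // Prepare the DELETE once and reuse it for every stale session
  const deleteMessages = db.prepare('DELETE FROM messages WHERE session_id = ?');
  for (const { session_id } of stale) {
    deleteMessages.run(session_id);
  }
  // Return the pruned session IDs so the caller can evict them from memory
  return stale.map(r => r.session_id);
}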
```
### Files affected
- `src/config/schema.ts` — Add `sessions.ttl` field
- `src/session/store.ts` — Add `pruneStale()` method
- `src/session/manager.ts` — Add `evictSessions(ids)` to clear in-memory cache
- `src/daemon/index.ts` — Schedule pruning timer on startup
---
## 5. Tool groups
### Group definitions
Static map in `policy.ts`:
```typescript
export const TOOL_GROUPS: Record<string, string[]> = {
  'group:fs': ['file.read', 'file.write', 'file.edit', 'file.list'],
  'group:runtime': ['shell.exec', 'process.start', 'process.output', 'process.status', 'process.kill', 'process.list'],
  'group:web': ['web.fetch', 'web.search', 'browser.navigate', 'browser.click', 'browser.type', 'browser.screenshot', 'browser.evaluate'],
  'group:memory': ['memory.read', 'memory.write', 'memory.search'],
};
```
### Resolution
`ToolPolicy` expands `group:*` entries in allow/deny lists before applying filters. Expansion happens early in the resolution pipeline, before any set operations.
```typescript
function expandGroups(names: string[]): string[] {
  return names.flatMap(n => TOOL_GROUPS[n] ?? [n]);
}
```
Works in all scopes: global allow/deny, per-agent overrides, per-provider overrides.
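To show where expansion sits relative to the set operations, here is a minimal sketch of a single-scope filter. The group map is a trimmed copy for illustration, `filterTools` is a hypothetical name, and the semantics (deny wins over allow; an empty allow list allows everything) are assumptions, since the existing `ToolPolicy` rules are not spelled out here.

```typescript
// Trimmed copy of the group map, for illustration only.
const TOOL_GROUPS: Record<string, string[]> = {
  'group:fs': ['file.read', 'file.write', 'file.edit', 'file.list'],
  'group:memory': ['memory.read', 'memory.write', 'memory.search'],
};

function expandGroups(names: string[]): string[] {
  return names.flatMap((n) => TOOL_GROUPS[n] ?? [n]);
}

// Hypothetical single-scope filter. Assumed semantics: deny wins over allow,
// and an empty allow list means every tool is allowed.
function filterTools(available: string[], allow: string[], deny: string[]): string[] {
  // Expand groups first so the set operations only ever see concrete tool names.
  const allowed = new Set(expandGroups(allow));
  const denied = new Set(expandGroups(deny));
  return available.filter(
    (tool) => (allowed.size === 0 || allowed.has(tool)) && !denied.has(tool),
  );
}
```

Note that `expandGroups` passes unknown `group:` names through unchanged, so a misspelled group silently matches nothing; validating group names at config load time may be worth considering.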
### Config usage example
```yaml
tools:
  profile: minimal
  allow: ['group:web']
  agents:
    fast:
      allow: ['group:fs']
      deny: ['shell.exec']
  providers:
    ollama:
      deny: ['group:web']
```
### Files affected
- `src/tools/policy.ts` — Add `TOOL_GROUPS` map, `expandGroups()` helper, integrate into resolution pipeline
- `src/tools/policy.test.ts` — Tests for group expansion in all scopes
---
## Implementation order
Recommended order by independence and risk:
1. **Tool groups** — Isolated to `policy.ts`, no cross-cutting concerns
2. **Typing indicators** — Per-adapter, independent changes
3. **Session pruning** — Self-contained, touches store/manager/daemon
4. **`/verbose`** — Frontend-only, no backend changes
5. **`!!think`** — Largest scope, touches all providers + agent loop + frontends
Steps 1 to 3 in this order can be implemented in parallel. Step 4 (`/verbose`) is independent. Step 5 (`!!think`) depends on understanding the streaming path touched by `/verbose`.