Commit Graph

28 Commits

Author SHA1 Message Date
William Valentin 67058c8719 feat(security): harden tool provenance and skill isolation 2026-02-15 10:16:55 -08:00
William Valentin 151b48310e orchestrator: recover from context overflow on fallback 2026-02-13 21:19:02 -08:00
William Valentin 46099664f0 feat(gateway): wire safe-point runtime cancellation for agent.cancel 2026-02-13 08:51:14 -08:00
William Valentin 9f81c01603 feat(session): persist model tier overrides per session
Store per-session config in SQLite and route /model and /reset through command fast-paths so channel sessions keep independent model selection across reconnects and restarts.
2026-02-13 01:04:26 -08:00
William Valentin 90ce622080 feat(policy): enforce truthfulness and autonomy guardrails
Add runtime truthfulness modes and autonomy-level tool gating with audit metadata for overrides/denials.

Wire policy through prompt assembly, tool execution context, and daemon/gateway agent paths; update tests and planning state for Phase 3 PR #2 completion.
2026-02-12 16:06:45 -08:00
William Valentin a8a2c59313 feat: implement model persistence with per-session overrides
- Add session_config SQLite table for per-session settings
- Update routing to support session override → agent config → global default resolution chain
- Upgrade WebChat SessionBridge from NativeAgent to AgentOrchestrator
- Add /model, /local, /cloud commands to Telegram adapter
- Add /model command to WebChat gateway handlers
- Clear session overrides on /reset command
- Pass memoryStore and config through to SessionBridge
- Add comprehensive tests for all new functionality

Fixes model persistence bug where TUI model changes didn't affect WebChat/Telegram sessions. Now:
- TUI /model sets global default (persists across restarts, affects all new sessions)
- WebChat/Telegram /model sets session override (only that conversation, cleared on /reset)
- WebChat sessions gain AgentOrchestrator features (delegation, compaction, memory)
2026-02-11 21:51:38 -08:00
William Valentin d62e836b5d feat(audit): Add core audit logging infrastructure
- Add AuditLogger class with rotation support
- Add audit configuration to config schema
- Instrument tool execution with full audit logging
- Instrument session lifecycle (create, message, delete, transfer, compact)
- Add audit logger initialization in daemon
- Add cron scheduler audit logging

Audit events captured:
- tool.start/success/error/denied
- session.create/message/delete/transfer/compact
- cron.trigger/add/remove

All logs go to ~/.local/share/flynn/audit.log (JSON lines)
with rotation (10MB files, 30-day retention)
2026-02-11 15:58:07 -08:00
William Valentin 6090508bad style: auto-fix ESLint issues (curly braces and formatting)
- Add curly braces to all if/else/for/while statements
- Fix indentation and trailing spaces
- Auto-fixed 372 linting errors using eslint --fix
- Remaining issues are warnings only (non-null assertions, explicit any types)
2026-02-11 10:30:24 -08:00
William Valentin 01c3175fdb fix: normalize OpenAI/GitHub finish_reason to Flynn stopReason conventions
OpenAI-compatible providers return 'stop' and 'tool_calls' as finish_reason
values, but Flynn's agent loop expects Anthropic-style 'end_turn' and
'tool_use'. This caused the agent to exit the tool loop prematurely when
falling back to GitHub Copilot (due to Anthropic API quota exhaustion).

- openai.ts: Map 'stop' → 'end_turn', 'length' → 'max_tokens', tool_calls
  with actual tools → 'tool_use', tool_calls without tools → 'end_turn'
- github.ts: Handle edge case where finish_reason is 'tool_calls' but no
  tools were parsed
- agent.ts: Accept both 'tool_use' and 'tool_calls' as valid stop reasons
  (belt-and-suspenders), extract toolCalls to local variable for TS narrowing
- openai.test.ts: Update expectations to match new normalized values
2026-02-11 09:49:36 -08:00
William Valentin 1aab006a7f feat: improve agent loop resilience — same-tool nudging and error handling
- agent.ts: track consecutive calls to the same tool (ignoring args) and
  inject a nudge after 4 repeats telling the model to summarize and respond,
  preventing local models from endlessly retrying searches with slight
  query variations
- agent.ts: wrap the entire tool loop iteration in try-catch so model/network
  errors don't crash the daemon — returns a descriptive error message instead
- Tests for both: nudge triggers after 4 same-tool calls, error recovery
  persists to history
2026-02-11 09:33:30 -08:00
William Valentin bf9ca690f3 fix(agent): detect repeated tool call loops and make max_iterations configurable
Local LLMs often get stuck calling the same tool repeatedly because they
lack the sophistication to synthesize results. The agent loop had no
safeguard — it re-executed whatever the model requested up to 10 times.

Add fingerprint-based loop detection: if the same tool+args combination
repeats 3 consecutive times, break the loop and return the last results.
Also add agents.max_iterations to the config schema so the iteration
limit is user-configurable (default: 10).
2026-02-10 19:35:09 -08:00
William Valentin 796e143d61 fix(agent): inject tool inventory note when tools change mid-session
Stale session history can cause the model to follow old "I can't do
that" patterns even when new tools are available. NativeAgent now tracks
a tool fingerprint and appends a system prompt note listing current
tools when the inventory changes, resetting on session reset.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 11:41:31 -08:00
William Valentin 9be8f76bc7 feat: implement Tier 3 features — lane queue, credential redaction, token dashboard, xAI, Voyage AI
- Lane Queue: per-session FIFO queue in gateway replacing reject-when-busy (9 tests)
- Credential Redaction: redactConfig() expanded to cover 18+ secret fields (16 tests)
- Web UI Token Dashboard: system.tokenUsage endpoint + Usage page with summary cards
- xAI (Grok) Provider: OpenAI-compatible client with model pricing
- Voyage AI Embeddings: new embedding provider with configurable dimensions (5 tests)
- Update gap analysis: 90→95 match (70%→74%), Tier 3 section marked DONE
- Update state.json: test count 1001→1034, add tier3_completion entry

Total: 1034 tests passing across 85 files, typecheck clean
2026-02-09 10:32:57 -08:00
William Valentin 1c2f54fae3 feat: implement tier 1 quick wins (tool groups, typing, pruning, verbose, think)
Five additive features with no breaking changes:

- Tool groups: group:fs, group:runtime, group:web, group:memory syntactic
  sugar for allow/deny lists in tool policy config
- Typing indicators: Discord sendTyping() and WhatsApp sendStateTyping()
  on message receipt for better UX feedback
- Session pruning: TTL-based auto-cleanup via sessions.ttl config with
  hourly daemon timer and SQLite GROUP BY pruning
- /verbose command: TUI command parser toggle for raw streaming display
- !!think prefix: per-message extended thinking mode wired through
  Anthropic (budget_tokens), OpenAI/GitHub (reasoning_effort), and
  Gemini (thinkingConfig) providers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 13:35:00 -08:00
William Valentin 6bb424cddc feat: add agent tools and sanitize tool names for Anthropic API
Add 8 new agent-callable tools (sessions.list/history/create/delete,
agents.list, message.send, cron.list/trigger) and sanitize tool names
at the API boundary (dots → underscores) to comply with Anthropic's
`^[a-zA-Z0-9_-]{1,128}` requirement. Reverse-maps sanitized names
back to internal names for hook callbacks and tool execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 12:23:09 -08:00
William Valentin b9bfee9c5b feat: add outbound attachment support with media.send tool
Introduces OutboundAttachment type on OutboundMessage, an
OutboundAttachmentCollector (push/drain pattern), and a media.send
tool that queues files for outbound delivery. Each channel adapter
(Telegram, Discord, Slack, WhatsApp) sends attachments after the
text reply. Includes 15 tests for collector and tool.
2026-02-07 09:09:00 -08:00
William Valentin a515912537 feat: add multimodal media pipeline for image support across all providers and channels
Widen Message.content from string to string | MessageContentPart[] to support
multimodal content. Add Attachment type to channel layer, media conversion
utilities, and image extraction to all channel adapters (Telegram, Discord,
Slack, WhatsApp). Update all model clients (Anthropic, OpenAI, Gemini, Bedrock)
to convert structured content to provider-specific formats. Fix downstream
consumers (tokens, compaction, TUI, local models) to handle the widened type
via getMessageText() helper.
2026-02-06 17:17:21 -08:00
William Valentin ee0af0cc06 feat: add tool allow/deny profiles with per-agent and per-provider filtering
Implements configurable tool filtering with four built-in profiles
(minimal, messaging, coding, full), global and per-agent/per-provider
allow/deny lists with glob pattern support, and defense-in-depth
enforcement at both tool listing and execution time.

New: src/tools/policy.ts (ToolPolicy engine), src/tools/policy.test.ts (37 tests)
Modified: config schema, tool registry, tool executor, NativeAgent,
AgentOrchestrator, daemon wiring, gateway tool handler, test mocks
2026-02-06 15:30:34 -08:00
William Valentin 4316dbd3be feat: add P2 features — retry policy, prompt templating, usage tracking, tech debt cleanup
- Extract shared splitMessage() into channels/utils.ts (dedup 4 adapters)
- Add Slack user name resolution with caching (users.info API)
- Add withRetry() with exponential backoff + jitter, isRetryable() filter
- Wire retry config into ModelRouter.chat() (non-streaming only)
- Add assembleSystemPrompt() multi-file template system (SOUL/AGENTS/IDENTITY/USER/TOOLS.md)
- Add usage tracking accumulators in NativeAgent + AgentOrchestrator
- Add estimateCost() with per-model pricing table
- Add /usage TUI command with full usage report formatting
- Add retrySchema and promptSchema to config schema

Tests: 569 passing, typecheck clean
2026-02-06 15:12:35 -08:00
William Valentin 7a35b22458 feat: wire up all Phase 2-6 features into daemon and config
Integrate all new features into the shared infrastructure:
- Config schema: add memory, discord, slack, process, web_search schemas
- Daemon wiring: memory store init, tool registration, channel adapters
- Orchestrator: memory injection into system prompt, extraction on compaction
- Agent: add setSystemPrompt() for dynamic prompt updates
- Channel/tool index: export new adapters and tool factories
- Add @slack/bolt, discord.js, turndown, linkedom, @mozilla/readability deps
- Update state.json with Phase 3b completion (494 tests passing)
2026-02-06 14:24:39 -08:00
William Valentin 306e11bd2e feat: add multi-model delegation (Phase 0) and context compaction (Phase 1)
Phase 0 — Multi-Model Delegation:
- AgentOrchestrator wraps NativeAgent with delegate() for stateless
  single-turn calls to any model tier (fast/default/complex/local)
- DelegationConfig maps task types (compaction, classification, etc.)
  to model tiers
- Delegation prompts for compaction, memory extraction, classification,
  and tool summarisation
- Per-tier usage tracking for cost visibility
- Config schema: agents.delegation and agents.primary_tier

Phase 1 — Context Compaction:
- Token estimation (char/4 heuristic) with context window lookup
- shouldCompact() threshold check against context window percentage
- compactHistory() splits old/recent messages, delegates summary to
  fast tier, returns CompactionResult
- Automatic compaction in AgentOrchestrator.process() when configured
- Force-compact via orchestrator.compact() with session persistence
- Session.replaceHistory() with atomic SQLite transaction
- /compact TUI command with feedback on compacted token counts
- Config schema: compaction.enabled, threshold_pct, keep_turns,
  summary_max_tokens

Tests: 385 passing across 50 files (22 new tests in 2 new test files)
2026-02-06 13:17:02 -08:00
William Valentin e4b7f96d33 fix: provider-aware model routing with fallback visibility
- Extract createClientFromConfig() to dispatch on provider field instead
  of hardcoding all tiers as AnthropicClient
- Add fallback/fallbackReason metadata to ChatResponse and ChatStreamEvent
  so callers know when a fallback model was used
- Enhance doctor check to report full model stack and warn on missing
  API keys for cloud providers
- Log fallback warnings in NativeAgent and display them in TUI
- Support tier names and local_providers entries in fallback_chain
- Add 8 tests for createClientFromConfig covering all provider types
2026-02-06 09:58:56 -08:00
William Valentin ad7fc241f1 feat(telegram): display tool execution status messages
Telegram bot now shows tool status during execution:
- Sends status message when tool starts (tool name + args snippet)
- Edits status message with result on completion
- Keeps typing indicator active during tool execution
- Adds setOnToolUse() to NativeAgent for per-message callback control
2026-02-05 17:53:54 -08:00
William Valentin 4f87643341 feat(agent): add iterative tool use loop with max iterations
Rewrites NativeAgent.process() from single-turn to an iterative tool
loop. When toolRegistry and toolExecutor are provided, the agent calls
the model, executes any requested tool calls, feeds results back, and
loops until the model returns a text response or max iterations hit.

- Backward compatible: works exactly as before without tools
- Supports onToolUse callback for frontend status display
- Max iterations (default 10) prevents infinite loops
- Handles multiple tool calls per model response
- 5 new tests (8 total)
2026-02-05 17:48:38 -08:00
William Valentin f891c7aee8 fix: add API key/auth token support across all model clients 2026-02-05 10:56:40 -08:00
William Valentin fb7575f850 refactor: integrate SessionManager into daemon and agent
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:43:09 -08:00
William Valentin 6e6c263e14 feat: integrate model router, session persistence, and hook engine
- NativeAgent now loads/saves messages to SessionStore
- Daemon creates ModelRouter with fallback chain support
- Telegram bot handles confirmation callbacks from HookEngine
- Session data stored in ~/.local/share/flynn/sessions.db

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:05:42 -08:00
William Valentin 69309e58bc feat: add native agent with conversation history 2026-02-02 20:56:46 -08:00