Commit Graph

370 Commits

Author SHA1 Message Date
William Valentin bd754d520e feat(skills): add dry-run installer planning surface 2026-02-12 17:56:51 -08:00
William Valentin 81d04357a1 feat(skills): validate manifest installer specs 2026-02-12 17:52:53 -08:00
William Valentin bd29afeaff chore(skills): improve watcher event observability 2026-02-12 17:40:41 -08:00
William Valentin 333e33f30f feat(skills): target watcher updates with safe fallback 2026-02-12 17:36:32 -08:00
William Valentin 2fb5c9adab feat(skills): reload registry on watcher change events 2026-02-12 17:30:23 -08:00
William Valentin b773e2bbf3 feat(skills): enable watcher wiring through daemon lifecycle 2026-02-12 17:18:22 -08:00
William Valentin 95091cc198 feat(skills): add debounced watcher foundation for phase 2 2026-02-12 17:15:46 -08:00
William Valentin 0a19f01639 feat(doctor): surface skill directory health in diagnostics 2026-02-12 17:05:04 -08:00
William Valentin fc3d2ab4d8 feat(skills): add refresh summary for discovery health 2026-02-12 17:02:23 -08:00
William Valentin 2d753321b3 feat(skills): guard uninstall with explicit confirmation 2026-02-12 16:59:50 -08:00
William Valentin d5b7d72e5d feat(skills): add install dispatch for local skill setup 2026-02-12 16:50:25 -08:00
William Valentin 0d84a6bccc feat(skills): add info command for skill inspection 2026-02-12 16:44:46 -08:00
William Valentin b3e5aee333 feat(skills): expose list command for skill visibility 2026-02-12 16:42:00 -08:00
William Valentin 90ce622080 feat(policy): enforce truthfulness and autonomy guardrails
Add runtime truthfulness modes and autonomy-level tool gating with audit metadata for overrides/denials.

Wire policy through prompt assembly, tool execution context, and daemon/gateway agent paths; update tests and planning state for Phase 3 PR #2 completion.
2026-02-12 16:06:45 -08:00
William Valentin a8a2c59313 feat: implement model persistence with per-session overrides
- Add session_config SQLite table for per-session settings
- Update routing to support session override → agent config → global default resolution chain
- Upgrade WebChat SessionBridge from NativeAgent to AgentOrchestrator
- Add /model, /local, /cloud commands to Telegram adapter
- Add /model command to WebChat gateway handlers
- Clear session overrides on /reset command
- Pass memoryStore and config through to SessionBridge
- Add comprehensive tests for all new functionality

Fixes model persistence bug where TUI model changes didn't affect WebChat/Telegram sessions. Now:
- TUI /model sets global default (persists across restarts, affects all new sessions)
- WebChat/Telegram /model sets session override (only that conversation, cleared on /reset)
- WebChat sessions gain AgentOrchestrator features (delegation, compaction, memory)
2026-02-11 21:51:38 -08:00
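The resolution chain described in a8a2c59313 (session override, then agent config, then global default) can be sketched roughly as below. This is an illustration only: `SessionOverrides`, `AgentConfig`, and `resolveModel` are hypothetical names, and an in-memory Map stands in for the session_config SQLite table.

```typescript
// Hypothetical types; the commit log describes behaviour, not the real API.
interface AgentConfig {
  model?: string;
}

// In-memory stand-in for the per-session settings table.
class SessionOverrides {
  private readonly models = new Map<string, string>();

  set(sessionId: string, model: string): void {
    this.models.set(sessionId, model);
  }

  get(sessionId: string): string | undefined {
    return this.models.get(sessionId);
  }

  // Mirrors "clear session overrides on /reset".
  reset(sessionId: string): void {
    this.models.delete(sessionId);
  }
}

// First defined value wins: session override, agent config, global default.
function resolveModel(
  sessionId: string,
  overrides: SessionOverrides,
  agent: AgentConfig,
  globalDefault: string,
): string {
  return overrides.get(sessionId) ?? agent.model ?? globalDefault;
}
```

With this shape, a /model command in one WebChat or Telegram conversation affects only that session id, and /reset makes the session fall back to the agent or global setting.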
William Valentin c62dad2e2e docs: update state.json with native audio support feature and test count (1369) 2026-02-11 18:27:50 -08:00
William Valentin fae3565480 docs(skills): add skills infrastructure plan
- Three-phase plan for skills system improvements
- Phase 1: Command Dispatch (flynn skills CLI commands)
- Phase 2: Skills Watcher (auto-reload with chokidar)
- Phase 3: Installer Specs (auto-install brew/node/go/download)
- Model strategy: glm-4.7-flash for mechanical, glm-4.7 for complex
- Estimated 8-11 hours total
2026-02-11 14:48:21 -08:00
William Valentin 85d7a6bfec test: add stopReason edge case tests; update state.json with recent fixes
- Added tests for finish_reason 'tool_calls' with empty array → 'end_turn'
- Added test for finish_reason 'length' → 'max_tokens'
- Updated state.json with 4 new entries for today's fixes (SOUL.md, message
  normalization, agent loop resilience, stopReason normalization)
- Test count: 1329 → 1331
2026-02-11 09:51:19 -08:00
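A minimal sketch of the stopReason normalization exercised by the tests in 85d7a6bfec. Only two mappings come from the log ('tool_calls' with an empty array to 'end_turn', and 'length' to 'max_tokens'). The 'tool_use' value for a non-empty array, the fallback to 'end_turn', and the function name are assumptions.

```typescript
type StopReason = "end_turn" | "tool_use" | "max_tokens";

// Hypothetical helper: maps an OpenAI-style finish_reason onto internal stop reasons.
function normalizeStopReason(
  finishReason: string | null,
  toolCalls: readonly unknown[] = [],
): StopReason {
  // From the log: 'length' becomes 'max_tokens'.
  if (finishReason === "length") return "max_tokens";

  if (finishReason === "tool_calls") {
    // From the log: an empty tool_calls array is treated as a normal end of turn.
    // Assumption: a non-empty array maps to 'tool_use'.
    return toolCalls.length > 0 ? "tool_use" : "end_turn";
  }

  // Assumption: anything else ('stop', null, unknown values) ends the turn.
  return "end_turn";
}
```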
William Valentin 5270234bbb feat: improve tool usage guidance in SOUL.md and add cron.create/cron.delete tools
- SOUL.md: list all available tools (web.search, memory.*, cron.*, etc.)
  and add Tool Usage Rules section enforcing 'act, don't narrate'
- cron.ts: add getJob(), addJob(), removeJob() to CronScheduler for
  runtime (ephemeral) cron job management
- cron tools: add cron.create and cron.delete tools, enhance cron.list
  to show schedule/output/message details
- policy.ts: add cron tools to messaging and coding profiles, add
  group:cron to tool groups

Fixes issue where models would narrate tool intent ('let me search...')
then stop without actually calling tools.
2026-02-11 09:32:36 -08:00
William Valentin 27ee3b2c10 feat(webchat): add copy and edit buttons on chat messages
Copy button on all messages (clipboard API with checkmark feedback).
Edit button on user messages populates the input textarea.
Buttons appear on hover (desktop) or always visible (mobile).
2026-02-10 20:53:49 -08:00
William Valentin 4c8ba3f20c feat(webchat): add slash commands, autocomplete popup, and web search button
Add 6 slash commands (/help, /reset, /compact, /usage, /status, /model)
with autocomplete popup (arrow keys, Enter/Tab/Escape navigation).
Search button toggles web search mode by prepending instruction to message.
Backend agent.send extended with metadata for server-side command routing.
2026-02-10 20:45:14 -08:00
William Valentin bf9ca690f3 fix(agent): detect repeated tool call loops and make max_iterations configurable
Local LLMs often get stuck calling the same tool repeatedly because they
lack the sophistication to synthesize results. The agent loop had no
safeguard — it re-executed whatever the model requested up to 10 times.

Add fingerprint-based loop detection: if the same tool+args combination
repeats 3 consecutive times, break the loop and return the last results.
Also add agents.max_iterations to the config schema so the iteration
limit is user-configurable (default: 10).
2026-02-10 19:35:09 -08:00
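The fingerprint-based loop detection from bf9ca690f3 could look roughly like this. It is a sketch under assumptions: `ToolCall`, `LoopDetector`, and the key-sorted serialization are hypothetical, and only the "same tool+args, 3 consecutive times" rule comes from the log.

```typescript
// Hypothetical shape of a tool call requested by the model.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Serialize with sorted keys so argument order does not change the fingerprint.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return "{" + keys.map((k) => JSON.stringify(k) + ":" + stableStringify(obj[k])).join(",") + "}";
  }
  return JSON.stringify(value);
}

function fingerprint(call: ToolCall): string {
  return call.name + ":" + stableStringify(call.args);
}

class LoopDetector {
  private last: string | null = null;
  private streak = 0;
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // Returns true once the same fingerprint has been recorded `threshold` times in a row.
  record(call: ToolCall): boolean {
    const fp = fingerprint(call);
    this.streak = fp === this.last ? this.streak + 1 : 1;
    this.last = fp;
    return this.streak >= this.threshold;
  }
}
```

An agent loop would call `record()` before executing each requested tool and break out when it returns true, separately from the configurable agents.max_iterations cap.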
William Valentin 4ce8e81c01 fix(gmail): sanitize HTML entities and tags in tool output
Gmail API returns snippets with HTML entities (&amp;, &#39;, <br>, etc.)
that leaked into LLM responses as raw HTML. Added shared sanitizeHtml()
utility in src/utils/html.ts and applied it to gmail tool snippets,
HTML body fallback, and gmail watcher snippets.
2026-02-10 16:30:14 -08:00
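For illustration, a sanitizer in the spirit of the sanitizeHtml() utility mentioned in 4ce8e81c01 might strip tags and then decode entities in a single pass. The actual src/utils/html.ts is not shown in the log, so the entity table and the whitespace handling here are assumptions.

```typescript
// Small, assumed subset of named entities; a real utility may cover more.
const NAMED_ENTITIES: Record<string, string> = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
  nbsp: " ",
};

function sanitizeHtml(input: string): string {
  // Turn line breaks into spaces, then drop any remaining tags.
  const withoutTags = input.replace(/<br\s*\/?>/gi, " ").replace(/<[^>]*>/g, "");

  // Decode entities in one pass so "&amp;lt;" becomes "&lt;" rather than "<".
  const decoded = withoutTags.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match: string, body: string) => {
    if (body.startsWith("#")) {
      const isHex = body.charAt(1).toLowerCase() === "x";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      return Number.isFinite(code) && code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    const named = NAMED_ENTITIES[body.toLowerCase()];
    return named === undefined ? match : named;
  });

  // Collapse whitespace left behind by removed tags.
  return decoded.replace(/\s+/g, " ").trim();
}
```

Stripping tags before decoding means an encoded "&lt;b&gt;" in a snippet survives as literal text instead of being removed as markup.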
William Valentin 4317492e4b docs: update state.json with TUI fullscreen improvements and test count (1268) 2026-02-10 13:29:14 -08:00
William Valentin ff03f74404 feat(cli): add gmail-auth command for OAuth2 token setup
Implements `flynn gmail-auth` to complete the OAuth2 flow that
GmailWatcher references but was never built. Supports local callback
server (default) and --manual paste mode. Adds Gmail health check
to `flynn doctor`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 10:33:01 -08:00
William Valentin f9446a4d67 docs: update gap analysis and state.json for setup wizard
Mark onboard wizard as MATCH (100/128, 78%). Update test count to 1151.
Add setup-wizard plan entry to state.json.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 09:44:52 -08:00
William Valentin 48fab11066 docs: add setup wizard implementation plan
9 tasks with TDD approach: prompt helpers, config builder, provider/channel
flows, menu sections, orchestrator, CLI wiring, integration tests. ~29 new
tests, 13 new files, 0 new dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 09:19:21 -08:00
William Valentin 6b426a1e52 docs: add setup wizard design
Interactive setup wizard with two entry points: auto-trigger on
first run (no config detected) and explicit `flynn setup` command.
Minimal-first flow for quick start, menu-driven for reconfiguration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 09:14:45 -08:00
William Valentin bab3f26ef6 docs: update pairing docs with SQLite persistence and TUI execution details 2026-02-09 22:09:30 -08:00
William Valentin 99b7e743f4 docs: update state.json with pairing persistence and TUI wiring 2026-02-09 22:05:21 -08:00
William Valentin 7065b5e650 docs: update state.json with 03-01 metrics backend completion
- Test count: 1087 → 1107
- Added operator_dx_milestone tracking
- Updated next_up with GSD phase 3 status
2026-02-09 21:31:30 -08:00
William Valentin 7565d55551 docs: update state.json with log-level system feature 2026-02-09 21:23:57 -08:00
William Valentin e86653fc14 docs: update state.json and gap analysis scorecard for Tier 4 completion (77% match rate) 2026-02-09 18:30:42 -08:00
William Valentin 9be8f76bc7 feat: implement Tier 3 features — lane queue, credential redaction, token dashboard, xAI, Voyage AI
- Lane Queue: per-session FIFO queue in gateway replacing reject-when-busy (9 tests)
- Credential Redaction: redactConfig() expanded to cover 18+ secret fields (16 tests)
- Web UI Token Dashboard: system.tokenUsage endpoint + Usage page with summary cards
- xAI (Grok) Provider: OpenAI-compatible client with model pricing
- Voyage AI Embeddings: new embedding provider with configurable dimensions (5 tests)
- Update gap analysis: 90→95 match (70%→74%), Tier 3 section marked DONE
- Update state.json: test count 1001→1034, add tier3_completion entry

Total: 1034 tests passing across 85 files, typecheck clean
2026-02-09 10:32:57 -08:00
William Valentin ffa63a435e docs: update test count to 1001 2026-02-07 17:45:03 -08:00
William Valentin a0f5584220 docs: update CHANGELOG, state.json, and default config for local model tool calling 2026-02-07 17:27:27 -08:00
William Valentin fcbab1e1ee docs: document system.info tool and runtime context in README, CHANGELOG, and gap analysis 2026-02-07 16:27:16 -08:00
William Valentin 8bf88049bf feat: add runtime context awareness — system.info tool + date/time in system prompt
- assembleSystemPrompt() now injects '# Runtime Context' with current date/time
- New system.info tool: date, time, hostname, platform, arch, uptime, memory, Node.js version
- Tool available in all profiles (minimal/messaging/coding/full)
- 983 tests passing (+7 new)
2026-02-07 16:22:17 -08:00
William Valentin be3363fdc8 docs: update state.json and gap analysis — file.patch + Gmail (87/116 = 75%) 2026-02-07 15:40:45 -08:00
William Valentin 308e7f228e docs: update state.json with tier 2 completion stats 2026-02-07 14:53:50 -08:00
William Valentin 93c0d64e8d docs: update gap analysis — mark Tier 2 as complete (85/116 = 73%) 2026-02-07 14:46:23 -08:00
William Valentin b50c140d25 feat: add Docker support and inbound webhooks (Tier 2)
- Dockerfile: multi-stage build (node:22-alpine), better-sqlite3 native deps handled
- .dockerignore + docker-compose.yml for deployment
- FLYNN_DATA_DIR env var support in daemon, CLI, and TUI
- WebhookHandler: ChannelAdapter for HTTP POST /webhooks/:name
- Per-webhook HMAC auth, template rendering ({{body}}, {{json.field}})
- Config schema: automation.webhooks array with name/secret/message/output
- Gateway routes webhook requests before static files (bypasses gateway auth)
- 23 new tests for webhook functionality, 874 total tests passing
2026-02-07 14:36:05 -08:00
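The webhook template placeholders named in b50c140d25 ({{body}} and {{json.field}}) could be rendered along these lines. This is a hedged sketch: `renderTemplate` is a hypothetical name, and the handling of missing fields, nested paths, and unknown placeholders is assumed rather than taken from the project.

```typescript
// Renders {{body}} as the raw request body and {{json.a.b}} as a field of the parsed JSON body.
function renderTemplate(template: string, rawBody: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    parsed = undefined; // Non-JSON bodies still work with {{body}}.
  }

  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, key: string) => {
    if (key === "body") return rawBody;

    if (key.startsWith("json.")) {
      let current: unknown = parsed;
      for (const part of key.slice("json.".length).split(".")) {
        if (current === null || typeof current !== "object") return "";
        current = (current as Record<string, unknown>)[part];
      }
      // Assumption: missing fields render as an empty string.
      if (current === undefined || current === null) return "";
      return typeof current === "object" ? JSON.stringify(current) : String(current);
    }

    // Assumption: unknown placeholders are left untouched.
    return match;
  });
}
```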
William Valentin d5694649bf docs: update gap analysis for tier 1 implementation (65% → 69%)
Mark 5 features as MATCH: tool groups, session pruning, /think,
/verbose, typing indicators. Update scorecard (80/116 features),
remove completed Tier 1 section from remaining gaps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 13:36:22 -08:00
William Valentin 1c2f54fae3 feat: implement tier 1 quick wins (tool groups, typing, pruning, verbose, think)
Five additive features with no breaking changes:

- Tool groups: group:fs, group:runtime, group:web, group:memory syntactic
  sugar for allow/deny lists in tool policy config
- Typing indicators: Discord sendTyping() and WhatsApp sendStateTyping()
  on message receipt for better UX feedback
- Session pruning: TTL-based auto-cleanup via sessions.ttl config with
  hourly daemon timer and SQLite GROUP BY pruning
- /verbose command: TUI command parser toggle for raw streaming display
- !!think prefix: per-message extended thinking mode wired through
  Anthropic (budget_tokens), OpenAI/GitHub (reasoning_effort), and
  Gemini (thinkingConfig) providers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 13:35:00 -08:00
William Valentin 6bb424cddc feat: add agent tools and sanitize tool names for Anthropic API
Add 8 new agent-callable tools (sessions.list/history/create/delete,
agents.list, message.send, cron.list/trigger) and sanitize tool names
at the API boundary (dots → underscores) to comply with Anthropic's
`^[a-zA-Z0-9_-]{1,128}` requirement. Reverse-maps sanitized names
back to internal names for hook callbacks and tool execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 12:23:09 -08:00
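The name sanitization in 6bb424cddc (dots to underscores at the API boundary, with a reverse map back to internal names) can be sketched as follows. The function names and the collision check are assumptions. The pattern is the one quoted in the commit message with an end anchor added.

```typescript
// Pattern from the commit message, anchored at the end (an assumption).
const API_NAME_PATTERN = /^[a-zA-Z0-9_-]{1,128}$/;

function sanitizeToolName(internalName: string): string {
  return internalName.replace(/\./g, "_");
}

interface NameMaps {
  toApi: Map<string, string>;
  toInternal: Map<string, string>;
}

// Builds both directions up front so hook callbacks and tool execution
// can translate API names back to internal ones.
function buildNameMaps(internalNames: readonly string[]): NameMaps {
  const toApi = new Map<string, string>();
  const toInternal = new Map<string, string>();

  for (const name of internalNames) {
    const apiName = sanitizeToolName(name);
    if (!API_NAME_PATTERN.test(apiName)) {
      throw new Error(`Tool name "${name}" is not valid after sanitization`);
    }
    const existing = toInternal.get(apiName);
    if (existing !== undefined && existing !== name) {
      // "a.b" and "a_b" would otherwise map to the same API name.
      throw new Error(`Tool names "${existing}" and "${name}" collide as "${apiName}"`);
    }
    toApi.set(name, apiName);
    toInternal.set(apiName, name);
  }

  return { toApi, toInternal };
}
```

Keeping an explicit reverse map, rather than turning underscores back into dots, avoids mangling internal names that already contain underscores.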
William Valentin 130711a377 docs: update documentation for P7 web UI dashboard completion
- README: add Web UI Dashboard section, update features list with all
  current capabilities (multi-channel, media pipeline, sandboxing, etc.),
  expand model providers table, update architecture diagram
- CHANGELOG: add P7 entries (dashboard SPA, 4 new gateway handlers)
- state.json: add P7 entry with all 6 phases and file lists, update
  overall_progress to reflect P0-P7 completion
- web-ui-dashboard.md: mark as completed with detailed phase outcomes
2026-02-07 10:11:54 -08:00
William Valentin 22230a3e3f feat: add web UI dashboard SPA with dashboard, chat, sessions, and settings pages
- Add SPA shell with hash-based router, sidebar navigation, and WebSocket RPC client
- Add dashboard page with system health cards, channel status, and auto-refresh
- Add chat page with session selector, streaming tool events, and markdown rendering
- Add sessions page with list, history viewer, and delete functionality
- Add settings page with hook pattern editor, tool list, and config viewer
- Add backend handlers: sessions.delete, sessions.switch, system.channels, system.usage
- Wire channelRegistry into gateway server for channel status reporting
- Extend static file server with .mjs, .png, .ico, .woff2 content types
2026-02-07 10:07:45 -08:00
William Valentin f7d889e35e docs: update state.json with P6 completion and overall progress 2026-02-07 09:11:49 -08:00
William Valentin d4530a7034 feat: add runtime provider/model switching via /model <tier> <provider/model>
- ModelRouter: add setClient(), labels map, getLabel(), getAllLabels()
- TUI commands: parse /model <tier> <provider/model> syntax with autocompletion
- TUI minimal: handle provider switching via createClientFromConfig factory
- Daemon: wire initial labels into router config
- Fix /model alias mappings (opus=complex, sonnet=default, haiku=fast)
- Add design doc and update state.json with feature status
2026-02-06 23:42:14 -08:00
William Valentin cfdd448495 docs: add Docker sandbox and multi-agent routing design/implementation plans 2026-02-06 16:52:38 -08:00