21 KiB
21 KiB
Flynn Codebase Audit Report
Date: 2026-02-16
Scope: Production-risk-first audit of bugs, code improvements, and feature opportunities.
Remediation Status (2026-02-16)
- ✅ F-001 addressed: chat markdown rendering now sanitizes HTML before DOM insertion in
src/gateway/ui/pages/chat.js(and legacysrc/gateway/ui/chat.html). - ✅ F-006 addressed: inbound HTTP request bodies now enforce a configurable max-size limit (
server.max_request_body_bytes) with413 Payload Too Largeresponses. - ✅ F-007 addressed:
ToolExecutortimeout timer handles are now cleared infinally, preventing orphan timers on fast/failed tool calls. - ✅ F-016 addressed: gateway + webhook body readers are consolidated in shared utility
src/utils/httpBody.tswith size-limit enforcement. - ✅ F-005 addressed: ESLint JS globals now include
FileReader, removing UI false-positive lint failures for attachment handling code. - ✅ F-010 addressed:
session.compactaudit events now emit actual message counts formessages_before/messages_after(tokens remain in token fields). - ✅ F-012 addressed: synthetic repeated-tool nudge no longer emits invalid
tool_result.tool_use_id; nudge is injected as plain user text guidance. - ✅ F-009 addressed: gateway now enforces per-connection WebSocket ingress rate limits with deterministic throttle errors and close-on-repeated-violation behavior.
- ✅ F-008 addressed: WhatsApp Chromium launch is now sandboxed by default; no-sandbox mode is behind explicit
whatsapp.no_sandbox: trueopt-in. - ✅ F-014 addressed:
ModelRouter.setOnTierChangenow preserves existing listeners instead of replacing them, removing destructive listener-setter behavior. - ✅ F-002 addressed:
config.patchnow supports durable persistence via atomic write + backup when daemon has a concrete config path, and response includespersisted/persistErrorso UI can distinguish runtime-only vs disk-persisted updates. - ✅ F-003 addressed: tool execution now has an
AbortSignalcontract, executor triggers abort on timeout, high-risk tools (shell.exec, sandbox docker exec,process.start, browser tools,web.fetch,web.search) respond to cancellation, and executor regression tests verify cancellable tools do not apply side effects after timeout. - ✅ F-015 addressed: retry defaults no longer classify timeout-style failures as non-retryable, improving resilience for transient timeout conditions.
- ✅ F-011 addressed: Slack user-name resolution now uses bounded TTL+LRU caching to prevent unbounded growth.
- ◑ F-013 partially addressed: reset-command normalization is now shared across Discord/Slack/WhatsApp adapters via
src/channels/utils.ts, reducing duplicated command-parsing logic.
Executive Summary
Current health snapshot:
pnpm typecheck: passingpnpm build: passingpnpm test:run: passing (140/140files,1773/1773tests)pnpm lint: failing (148 errors,530 warnings)
Top conclusions:
- A critical Web UI security issue exists in markdown rendering (unsanitized HTML insertion).
- Runtime configuration edits from the settings page appear non-persistent across restart.
- Tool timeout behavior likely allows underlying side effects to continue after timeout.
- Gateway request-body handling and WebSocket ingress controls need abuse protections.
- Lint quality gates are currently broken at scale, reducing CI signal quality.
Methodology and Scope
Method used:
- Static inspection of core runtime, gateway, UI, and tooling modules.
- Command validation via:
pnpm -s typecheckpnpm -s buildpnpm -s test:runpnpm -s lint
Scope priorities:
- Security and production reliability
- Runtime correctness and operational safety
- Maintainability and engineering throughput
- New features with measurable user/ops value
Non-goals:
- No code changes in this step
- No architecture rewrite proposals without incremental path
Findings (Bugs and Risks)
F-001 Critical: Web UI markdown XSS surface in chat rendering
- Severity: Critical
- Impact: Arbitrary script/HTML injection in browser clients if model/tool output includes malicious markup.
- Evidence:
src/gateway/ui/pages/chat.js:63src/gateway/ui/pages/chat.js:610
- Detail:
- Assistant/system content is rendered via
marked.parse(...)then inserted usinginnerHTML. - No sanitizer or trusted-types/CSP-backed cleanup is applied before insertion.
- Assistant/system content is rendered via
- Repro trigger:
- Assistant response containing HTML payload (for example, event handlers or scriptable tags) in markdown output path.
- Recommended fix:
- Introduce HTML sanitization (DOMPurify or equivalent) between markdown parse and DOM insertion.
- Add restrictive CSP for UI surface.
- Disable raw HTML passthrough in markdown parser if not required.
- Tests to add:
- Unit tests for sanitization of script/event-handler payloads.
- Integration test confirming rendered output excludes executable content.
F-002 High: Settings hook edits are runtime-only and likely not durable
- Severity: High
- Impact: Operators may believe settings were saved permanently when they only changed in-memory state.
- Evidence:
src/gateway/ui/pages/settings.js:148src/gateway/handlers/config.ts:135
- Detail:
config.patchmutates the activeconfigobject.- No file write-back or persistence layer call is present in handler path.
- Repro trigger:
- Save hook patterns from UI, restart process, observe values reset to file-backed config.
- Recommended fix:
- Add persisted config patch pipeline (atomic write + validation + backup).
- Return
persisted: true|falsein response to avoid misleading UI status.
- Tests to add:
- Restart-persistence integration test for
hooks.confirm/log/silent. - Error path test for rejected file write with unchanged runtime config.
- Restart-persistence integration test for
F-003 High: Tool timeout does not cancel underlying execution
- Severity: High
- Impact: Timed-out tools can continue executing side effects after caller has received timeout failure.
- Evidence:
src/tools/executor.ts:228
- Detail:
- Timeout is implemented with
Promise.race(...). - No cancellation token/abort path is propagated to long-running tool implementations.
- Timeout is implemented with
- Repro trigger:
- Invoke a long-running side-effecting tool, timeout occurs, side effect completes anyway.
- Recommended fix:
- Add abort signal support to tool contract and propagate cancellation in executor.
- Enforce cancellation support for high-risk tools (
shell.exec,process.start, web/browser mutating tools).
- Tests to add:
- Timeout + cancellation conformance tests for cancellable tools.
- Regression tests verifying no post-timeout side effects.
Remediation update (2026-02-16):
- Abort propagation now covers executor -> context signal ->
process.start, browser tools, and web fetch/search tools. - Added aborted-signal regression tests for these tool paths.
- Added timeout regression coverage in
src/tools/executor.test.tsto verify side effects are prevented after timeout-triggered abort for cancellable tools.
F-004 Medium: Lint quality gate is broken and concentrated in key runtime files
- Severity: Medium
- Impact: CI noise, reduced confidence in static analysis, and slower defect detection.
- Evidence:
pnpm -s lint=>148 errors,530 warnings- Error concentration:
src/daemon/models.ts(90 errors)src/cli/tui.ts(25 errors)src/daemon/routing.ts(14 errors)src/gateway/ui/pages/settings.js(14 errors)
- Detail:
- Most errors are style/indentation, but gate failure blocks meaningful lint signal.
- High warning volume includes many
anyand non-null assertions in runtime paths.
- Recommended fix:
- Two-stage lint recovery:
- Clear all error-level violations in top files.
- Burn down warning debt in runtime (non-test) modules by policy.
- Two-stage lint recovery:
- Tests to add:
- CI check enforcing
eslinterrors = 0. - Secondary threshold check for warning reduction trend.
- CI check enforcing
F-005 Medium: ESLint browser globals mismatch causes avoidable UI lint failures
- Severity: Medium
- Impact: False-positive lint failures and friction for frontend changes.
- Evidence:
src/gateway/ui/pages/chat.js:127(FileReadernot defined in lint)eslint.config.jsbrowser globals block does not includeFileReader.
- Recommended fix:
- Add missing browser globals used by UI pages, or set UI files to browser env with parser options.
- Tests to add:
- Lint regression case for UI page using file attachment APIs.
F-006 High: Request body reader has no size limit (memory DoS risk)
- Severity: High
- Impact: Large POST bodies can force unbounded buffering and memory pressure/crash.
- Evidence:
src/gateway/server.ts:446
- Detail:
readRequestBodyconcatenates all chunks without max-size guard.- Path is used by inbound handlers (webhooks/Gmail push route handling).
- Repro trigger:
- Send very large request body to exposed POST endpoint.
- Recommended fix:
- Add strict max body size (for example 1 MB default, configurable).
- Stop reading and destroy request when threshold is exceeded.
- Return
413 Payload Too Large.
- Tests to add:
- Body-under-limit succeeds.
- Body-over-limit terminates early and returns 413.
F-007 Medium: Tool timeout race leaves orphan timers
- Severity: Medium
- Impact: Pending timeouts accumulate under load, increasing event-loop overhead.
- Evidence:
src/tools/executor.ts:228
- Detail:
- Timeout promise created in
Promise.raceis never cleared when tool resolves quickly. - This is distinct from the cancellation gap in F-003.
- Timeout promise created in
- Recommended fix:
- Store timer handle and
clearTimeout(...)infinallyafter race settles.
- Store timer handle and
- Tests to add:
- Repeated short tool calls do not grow active timer count.
F-008 Medium: WhatsApp channel launches Chromium without sandbox
- Severity: Medium
- Impact: Browser compromise has higher blast radius when sandbox is disabled.
- Evidence:
src/channels/whatsapp/adapter.ts:95
- Detail:
- Puppeteer launch args include
--no-sandboxand--disable-setuid-sandbox.
- Puppeteer launch args include
- Recommended fix:
- Default to sandboxed mode.
- Gate no-sandbox behind explicit high-trust config with warning logs.
- Tests to add:
- Config test verifies secure default launch args.
F-009 Medium: Gateway WebSocket ingress lacks rate limiting
- Severity: Medium
- Impact: Abuse can cause resource starvation and model API cost spikes.
- Evidence:
src/gateway/server.ts:291
- Detail:
- Incoming WebSocket messages are processed without per-connection throttling.
- Recommended fix:
- Add token-bucket or leaky-bucket rate limit by connection/session/IP.
- Add backoff/close behavior for repeated violations.
- Tests to add:
- Burst traffic triggers throttling and does not starve normal traffic.
F-010 Medium: Compaction audit event logs token counts in message-count fields
- Severity: Medium
- Impact: Audit accuracy and observability metrics are corrupted.
- Evidence:
src/backends/native/orchestrator.ts:311
- Detail:
sessionCompactevent uses token counts where message counts are expected.
- Recommended fix:
- Emit actual pre/post message counts from compaction result.
- Tests to add:
- Audit event schema/value validation for compaction events.
F-011 Low: Slack user-name cache has no eviction strategy
- Severity: Low
- Impact: Slow unbounded memory growth in long-running Slack deployments.
- Evidence:
src/channels/slack/adapter.ts:66
- Recommended fix:
- Use LRU/TTL cache with maximum entry count.
Remediation update (2026-02-16):
SlackAdapteruser-name cache now has TTL expiry and max-entry LRU eviction behavior.- Added cache regression tests for cache-hit reuse and TTL refresh in
src/channels/slack/adapter.test.ts.
F-012 Low-Medium: Synthetic tool nudge uses invalid tool_use_id
- Severity: Low-Medium
- Impact: Strict providers may reject tool_result blocks that reference unknown IDs.
- Evidence:
src/backends/native/agent.ts:347
- Recommended fix:
- Emit nudge as plain assistant text or tie to valid preceding tool-use ID.
F-013 Low: Channel adapters duplicate pairing/reset/mention/chunking logic
- Severity: Low
- Impact: Behavior drift and higher maintenance cost across channel implementations.
- Evidence:
src/channels/discord/adapter.tssrc/channels/slack/adapter.tssrc/channels/whatsapp/adapter.ts
- Recommended fix:
- Extract shared middleware utilities for common inbound/outbound behaviors.
Remediation update (2026-02-16):
- Added shared
normalizeResetCommandText()utility and migrated Discord/Slack/WhatsApp adapters to use it, reducing repeated reset-command parsing logic.
F-014 Low: ModelRouter listener API has destructive setter footgun
- Severity: Low
- Impact:
setOnTierChangesilently replaces listeners added through append API. - Evidence:
src/models/router.ts:69
- Recommended fix:
- Deprecate/remove setter or explicitly document destructive semantics.
F-015 Low: Retry defaults classify timeouts as non-retryable
- Severity: Low
- Impact: Reduces resilience for transient timeout failures.
- Evidence:
src/models/retry.ts:29
- Recommended fix:
- Remove timeout markers from non-retryable defaults or make behavior configurable.
F-016 Low: Duplicate request-body utility logic in two modules
- Severity: Low
- Impact: Duplicate maintenance surface and inconsistent future behavior.
- Evidence:
src/gateway/server.ts:446src/automation/webhooks.ts:17
- Recommended fix:
- Centralize in shared utility with size-limit support.
Remediation update (2026-02-16):
- Request body parsing is centralized in
src/utils/httpBody.tsand consumed by both gateway server and webhook handler paths.
F-017 Low: NativeAgent history getter returns mixed mutability semantics
- Severity: Low
- Impact: Caller behavior differs by storage path (by-reference vs copied array).
- Evidence:
src/backends/native/agent.ts:72
- Recommended fix:
- Return immutable copy consistently across both paths.
F-018 Low: Session pruning interval has no jitter
- Severity: Low
- Impact: Multi-instance deployments can synchronize prune spikes.
- Evidence:
src/daemon/index.ts:91
- Recommended fix:
- Add randomized jitter window to pruning schedule.
F-019 Low: _restoreHistory no-op when no session is attached
- Severity: Low
- Impact: Sessionless recovery path cannot rollback conversational state.
- Evidence:
src/backends/native/orchestrator.ts:499
- Recommended fix:
- Keep explicit in-memory rollback fallback for sessionless mode.
Code Improvements (Prioritized Backlog)
| Priority | Item | Why | Effort | Dependencies | Acceptance Criteria |
|---|---|---|---|---|---|
| P0 | Secure markdown render pipeline in gateway UI | Closes critical XSS class | M | UI markdown stack + sanitizer lib | All assistant/tool markdown is sanitized before innerHTML; security tests pass |
| P0 | Add request-body hard limits in gateway/webhooks | Prevents memory DoS on exposed POST endpoints | S | Shared body reader utility | Oversized payloads are rejected with 413 and no unbounded buffering |
| P0 | Persistent config patch service | Prevents config drift and operator confusion | M | Config loader/writer + schema validation | config.patch writes atomically, survives restart, reports persistence status |
| P1 | Tool execution cancellation contract | Prevents post-timeout side effects | L | Tool type updates + executor + tool impls | Timeout triggers cancellation and verified no continued side effects |
| P1 | Fix timeout timer leak in ToolExecutor | Prevents event-loop timer buildup under load | S | ToolExecutor timeout refactor | Timer is always cleared when tool execution settles |
| P1 | WebSocket ingress rate limiting | Mitigates abuse and cost/resource exhaustion | M | Gateway connection/session identity | Excess message bursts are throttled or rejected deterministically |
| P1 | Lint gate restoration program | Recovers static-analysis trust | M | ESLint config + high-error files | pnpm lint returns 0 errors on main branch |
| P2 | Reduce runtime any/non-null usage |
Lowers hidden runtime risk | M | Typed wrappers for channel/model SDKs | Warning count reduced in non-test runtime files with no behavior regressions |
| P2 | Harden WhatsApp browser sandbox defaults | Reduce browser compromise blast radius | M | Channel config schema + adapter launch options | Secure-by-default launch path without --no-sandbox |
| P2 | Decompose oversized modules | Improves maintainability and review velocity | L | Module boundary design | src/daemon/routing.ts, src/frontends/tui/minimal.ts, src/cli/skills.ts split into cohesive units |
Potential New Features (Prioritized Roadmap)
R-001 Runtime Config Persistence and Versioning
- Problem: Runtime edits are not clearly durable.
- Feature: Versioned config store with atomic patching and rollback.
- User value: Safe live updates with auditability.
- Effort: Medium
- Dependencies: Config writer, schema validator, backup policy.
- Acceptance criteria:
- Every patch yields version ID.
- Restart restores latest committed version.
- Rollback endpoint reverts to previous version safely.
R-002 Tool Cancellation and Kill-Switch API
- Problem: Timeout without cancellation can leave unsafe side effects.
- Feature:
AbortSignal-aware tool interface +tool.cancelcontrol path. - User value: Safer automation under failure conditions.
- Effort: Large
- Dependencies: Executor, tool contracts, process/browser tools.
- Acceptance criteria:
- Cancel request stops active cancellable tools within SLA.
- Audit events include cancel reason and final state.
R-003 Security-Hardened Markdown Renderer
- Problem: Model output is untrusted content.
- Feature: Sanitized markdown renderer with strict allowed-tags policy.
- User value: Safe web chat experience by default.
- Effort: Medium
- Dependencies: UI renderer + CSP headers.
- Acceptance criteria:
- Red-team payload test suite passes.
- No executable markup reaches final DOM.
R-004 Quality Health Endpoint
- Problem: No first-class quality telemetry for lint/build/test status.
- Feature:
system.qualityendpoint exposing latest CI/local gate results. - User value: Faster operator diagnosis and release confidence.
- Effort: Small
- Dependencies: Build metadata collection.
- Acceptance criteria:
- Endpoint reports lint/typecheck/test/build state with timestamp.
- Dashboard view consumes and displays status.
R-005 Config Drift Indicator in Dashboard
- Problem: Runtime and file-backed config can diverge invisibly.
- Feature: Drift warning badge and diff preview in settings page.
- User value: Reduces operational mistakes.
- Effort: Medium
- Dependencies: Persisted config support and hash/diff helper.
- Acceptance criteria:
- UI shows
in-syncvsdriftedstate. - Operator can preview and reconcile diff.
- UI shows
R-006 Tool Safety Simulator (Dry-Run Policy Check)
- Problem: Policy/hook changes are hard to validate before production use.
- Feature: Simulate tool calls against policy, autonomy, and hooks without execution.
- User value: Safer policy rollout.
- Effort: Medium
- Dependencies: Tool policy engine and hook decision surface.
- Acceptance criteria:
- Given tool name/args/context, simulator returns allow/deny + reason chain.
- Simulation traces can be exported for review.
R-007 Gateway Abuse Protection Pack
- Problem: WebSocket and POST ingress currently lack explicit anti-abuse controls.
- Feature: Unified ingress controls (body-size limits, per-connection rate limits, adaptive penalties).
- User value: More predictable reliability and lower abuse-induced cost.
- Effort: Medium
- Dependencies: Gateway server, shared request parsing, metrics.
- Acceptance criteria:
- Oversized payloads rejected early.
- Rate violations are metered and observable.
- Healthy traffic retains low latency under synthetic abuse tests.
30/60/90-Day Execution Sequence
0-30 Days
- Ship markdown sanitization and CSP hardening.
- Add POST body-size limits and shared body reader.
- Restore lint error baseline to zero in mainline.
- Add config patch persistence contract proposal and tests.
31-60 Days
- Implement persisted
config.patchwith versioning and rollback. - Introduce cancellable tool interface and migrate high-risk tools first.
- Implement WebSocket rate limiting and abuse observability.
- Add dashboard drift indicator.
61-90 Days
- Complete cancellation coverage across remaining tools.
- Launch quality health endpoint and dashboard integration.
- Add policy simulation tool and operator workflow docs.
Appendix: Command Evidence
Commands run:
pnpm -s typecheck
pnpm -s build
pnpm -s test:run
pnpm -s lint
Observed outcomes:
- Typecheck/build/test: passing.
- Lint: failing with
148 errorsand530 warnings.
Top lint error concentration snapshot:
src/daemon/models.ts: 90 errorssrc/cli/tui.ts: 25 errorssrc/daemon/routing.ts: 14 errorssrc/gateway/ui/pages/settings.js: 14 errors