Flynn Codebase Audit + Improvement Report

Date: 2026-02-24
Branch: feature/full-audit-hardening-and-config-consolidation

Executive Summary

I audited core Flynn wiring across config -> daemon -> router/providers -> automation/tools -> CLI/docs, then implemented high-safety fixes for auth hardening, router correctness, provider alignment, and config consolidation.

High-impact outcomes:

Google OAuth runtime is now centralized with store-first token loading, legacy token migration, and refresh persistence.
Model router fallback behavior now matches retry/fallback intent (no duplicate fallback attempts, retry policy applied on fallback paths).
OpenAI OAuth mode now fails fast when tools are requested (prevents silent non-executable tool output).
PaaS config is now generated from canonical default.yaml + overlay to prevent template drift.
flynn doctor now validates all Google automation services, not just Gmail.

Breaking behavior changes introduced:

models.fallback_chain schema default changed from ['anthropic'] to [] (avoids invalid fallback entries by default).
OpenAI OAuth requests with tools now throw explicit errors instead of returning warning text.

Findings (With File Pointers)

HIGH Google token persistence path caused runtime failures in restricted environments.

Cause: migration/store writes to ~/.config/flynn/auth.json could fail and abort tool execution.
Evidence: src/google/oauth.ts, src/auth/google.ts
Fix: auth store writes now tolerate known filesystem permission errors and preserve token-file compatibility.
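
A minimal sketch of the tolerant-write idea described in this finding, not the code in src/auth/google.ts. The function name `tryPersistTokens`, the injectable `write` parameter, and the specific set of tolerated error codes are assumptions made for illustration.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed set of "restricted filesystem" error codes; the real list may differ.
const TOLERATED_CODES = new Set(["EACCES", "EPERM", "EROFS"]);

type Writer = (filePath: string, data: string) => void;

// Default writer: create the parent directory, then write with owner-only permissions.
function defaultWrite(filePath: string, data: string): void {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, data, { mode: 0o600 });
}

/**
 * Hypothetical helper: returns true when the tokens were persisted and false
 * when the write was skipped because of a tolerated permission error.
 * Any other error is rethrown so that real failures stay visible.
 */
export function tryPersistTokens(
  storePath: string,
  tokens: unknown,
  write: Writer = defaultWrite,
): boolean {
  try {
    write(storePath, JSON.stringify(tokens, null, 2));
    return true;
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code !== undefined && TOLERATED_CODES.has(code)) {
      return false; // caller continues with in-memory tokens or the legacy token file
    }
    throw err;
  }
}
```

Returning a boolean instead of throwing lets a tool call proceed with the tokens it already holds, while unexpected errors such as a full disk still propagate.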

HIGH OpenAI OAuth mode accepted tool-bearing requests without executable tool support.

Cause: OAuth Codex backend path did not support Flynn tool execution semantics.
Evidence: src/models/openai.ts
Fix: explicit throw when tools are present, enabling router fallback or config correction.
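
The fail-fast behavior can be sketched roughly as follows. The `ChatRequest` shape, the `UnsupportedToolsError` class, and `assertNoToolsForOAuth` are hypothetical names for illustration; the actual types in src/models/openai.ts may look different.

```typescript
// Hypothetical request shapes, reduced to what the guard needs.
interface ToolDefinition {
  name: string;
  description?: string;
}

interface ChatRequest {
  messages: { role: string; content: string }[];
  tools?: ToolDefinition[];
}

// Hypothetical error type so a router can recognize this failure and fall back.
export class UnsupportedToolsError extends Error {
  constructor(provider: string, toolCount: number) {
    super(
      `${provider} cannot execute tools (${toolCount} requested); ` +
        `use a tool-capable provider or remove tools from the request`,
    );
    this.name = "UnsupportedToolsError";
  }
}

// Guard intended to run before dispatching to the OAuth backend path.
export function assertNoToolsForOAuth(request: ChatRequest): void {
  const toolCount = request.tools?.length ?? 0;
  if (toolCount > 0) {
    throw new UnsupportedToolsError("openai (oauth)", toolCount);
  }
}
```

An empty or missing `tools` array passes the guard, so plain chat requests are unaffected.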

HIGH Router fallback execution did not fully match retry policy and could repeat the same failing client.

Cause: the retry policy wrapped only the primary path, so fallback clients were retried inconsistently and the same client could be attempted more than once.
Evidence: src/models/router.ts
Fix: attempted-client tracking and retry wrapping now apply to tier/global fallback chat paths and streaming fallback paths.
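
As an illustration of attempted-client tracking combined with retry wrapping, here is a simplified sketch. `ChatClient`, `withRetry`, and `chatWithFallback` are assumed names; the real router also handles tiers and streaming, which are omitted here.

```typescript
// Hypothetical minimal client interface.
interface ChatClient {
  id: string;
  chat(prompt: string): Promise<string>;
}

// Retry a single operation up to maxAttempts times, rethrowing the last error.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Try the primary client, then each fallback, applying the same retry policy
// to every client and skipping any client id that was already attempted.
export async function chatWithFallback(
  primary: ChatClient,
  fallbacks: ChatClient[],
  prompt: string,
  maxAttempts = 2,
): Promise<string> {
  const attempted = new Set<string>();
  let lastError: unknown = new Error("no clients configured");
  for (const client of [primary, ...fallbacks]) {
    if (attempted.has(client.id)) continue; // avoid repeating a failing client
    attempted.add(client.id);
    try {
      return await withRetry(() => client.chat(prompt), maxAttempts);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

With a primary that also appears in the fallback list, the failing client is called `maxAttempts` times in total rather than once per list entry.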

MEDIUM flynn doctor had incomplete feature wiring checks for Google services.

Cause: only Gmail automation health was validated.
Evidence: src/cli/doctor.ts
Fix: added service checks for Calendar/Docs/Drive/Tasks with auth-store and token-file detection.
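
One way such per-service checks could be structured is sketched below. The service identifiers, the `TokenSource` labels, and `checkGoogleServices` are assumptions for illustration and are not taken from src/cli/doctor.ts.

```typescript
// Assumed service identifiers; the real keys in Flynn's config may differ.
const GOOGLE_SERVICES = ["gmail", "calendar", "docs", "drive", "tasks"] as const;
type GoogleService = (typeof GOOGLE_SERVICES)[number];

type TokenSource = "auth-store" | "token-file" | "missing";

// For each service, report where a token was found, checking the auth store
// first and the legacy token file second.
export function checkGoogleServices(
  storeServices: Set<string>,
  tokenFileExists: (service: GoogleService) => boolean,
): Record<GoogleService, TokenSource> {
  const report = {} as Record<GoogleService, TokenSource>;
  for (const service of GOOGLE_SERVICES) {
    if (storeServices.has(service)) {
      report[service] = "auth-store";
    } else if (tokenFileExists(service)) {
      report[service] = "token-file";
    } else {
      report[service] = "missing";
    }
  }
  return report;
}
```

Passing the token-file probe as a function keeps the check testable without touching the filesystem.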

MEDIUM Config profile overlap risk (manual config/paas.yaml drift from canonical defaults).

Cause: duplicated full config template with independent edits.
Evidence: config/profiles/paas.overlay.yaml, scripts/generate-config-profiles.mjs
Fix: canonical+overlay generation model, profile drift check, and sync test.
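
The canonical+overlay model amounts to a deep merge plus a comparison against the committed file. The sketch below works on plain objects and JSON text rather than YAML to stay dependency-free; `applyOverlay` and `hasDrifted` are hypothetical names, and the real generator in scripts/generate-config-profiles.mjs may treat arrays or key ordering differently.

```typescript
type ConfigValue =
  | string
  | number
  | boolean
  | null
  | ConfigValue[]
  | { [key: string]: ConfigValue };
type ConfigObject = { [key: string]: ConfigValue };

function isObject(value: ConfigValue | undefined): value is ConfigObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Deep-merge overlay onto base: nested objects merge key by key,
// while scalars and arrays in the overlay replace the base value.
export function applyOverlay(base: ConfigObject, overlay: ConfigObject): ConfigObject {
  const result: ConfigObject = { ...base };
  for (const key of Object.keys(overlay)) {
    const existing = result[key];
    const incoming = overlay[key];
    result[key] =
      isObject(existing) && isObject(incoming) ? applyOverlay(existing, incoming) : incoming;
  }
  return result;
}

// Drift check: regenerate the profile and compare it with the committed text.
export function hasDrifted(
  base: ConfigObject,
  overlay: ConfigObject,
  committedText: string,
): boolean {
  return JSON.stringify(applyOverlay(base, overlay), null, 2) !== committedText;
}
```

Because the overlay carries only the keys that differ, a change to the canonical defaults flows into the generated profile on the next run, and the drift check fails if the committed file was edited by hand.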

MEDIUM Default fallback chain schema value conflicted with router semantics.

Cause: default ['anthropic'] is not a tier/local-provider key in current router semantics.
Evidence: src/config/schema.ts
Fix: schema default set to [] to avoid spurious invalid fallback entries.

LOW Provider capability type list lagged configured providers.

Cause: ModelProvider union omitted vercel, minimax, moonshot, synthetic.
Evidence: src/models/capabilities.ts
Fix: union updated and test coverage expanded.

LOW Audit logger path expansion bug for ~.

Cause: the logger configured the rotator with the expanded path, but the write stream used the raw path.
Evidence: src/audit/logger.ts
Fix: normalized path now used consistently by logger and rotator.
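
A small sketch of normalizing the path once and sharing it between both consumers. `expandTilde` and `resolveAuditPaths` are illustrative names, not necessarily those used in src/audit/logger.ts.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Expand a leading "~" or "~/" to the given home directory.
// Forms such as "~user/..." are returned unchanged.
export function expandTilde(rawPath: string, home: string = os.homedir()): string {
  if (rawPath === "~") return home;
  if (rawPath.startsWith("~/")) return path.join(home, rawPath.slice(2));
  return rawPath;
}

// Resolve the path a single time so that the write stream and the rotator
// cannot disagree about where the audit log lives.
export function resolveAuditPaths(rawPath: string): { streamPath: string; rotatorPath: string } {
  const resolved = expandTilde(rawPath);
  return { streamPath: resolved, rotatorPath: resolved };
}
```

Node's filesystem APIs do not expand `~` themselves, which is why a raw `~/...` path passed to a write stream ends up relative to the working directory.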

INFO Log-pattern analysis could not be completed from repository artifacts.

Cause: no runtime .log / audit JSONL artifacts present in workspace snapshot.
Evidence: repository scan returned no log files under repo root.
Mitigation: recommendations added below for repeatable log collection/analysis workflow.

Recommended Changes (Prioritized)

P0 Keep OpenAI OAuth tool rejection behavior and enforce documented fallback guidance.
P0 Keep Google auth centralized; avoid introducing new per-tool OAuth duplication.
P1 Add a shared Google auth CLI factory to remove duplicated *-auth command flow code.
P1 Add optional XDG_CONFIG_HOME/override support for auth store paths for containerized/sandboxed environments.
P1 Add periodic log export + analyzer command (error-rate, latency, provider fallback frequency) so reliability trends are measurable from CI/dev snapshots.
P2 Introduce a provider capability matrix module consumed by router/doctor/docs from one source of truth.
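
For the XDG_CONFIG_HOME/override recommendation above, path resolution could look roughly like this. The `FLYNN_AUTH_STORE` variable and the function name are hypothetical; only the `~/.config/flynn/auth.json` default comes from this report.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Resolution order in this sketch:
//   1. explicit override (FLYNN_AUTH_STORE, a hypothetical variable name)
//   2. $XDG_CONFIG_HOME/flynn/auth.json, if XDG_CONFIG_HOME is an absolute path
//   3. ~/.config/flynn/auth.json
export function resolveAuthStorePath(
  env: Record<string, string | undefined>,
  home: string = os.homedir(),
): string {
  const override = env["FLYNN_AUTH_STORE"];
  if (override) return override;

  const xdg = env["XDG_CONFIG_HOME"];
  // The XDG Base Directory specification treats relative paths as invalid.
  const configBase = xdg && path.isAbsolute(xdg) ? xdg : path.join(home, ".config");
  return path.join(configBase, "flynn", "auth.json");
}
```

In a container, setting XDG_CONFIG_HOME to a writable volume would move the auth store without any change to Flynn's config files.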

Implemented Changes (Diff Summary)

Commits in this branch:

5b95eb1 fix(audit): expand tilde paths for audit log output
076379b refactor(config): generate paas profile from default overlay
00b2d64 feat(google-auth): centralize oauth token store and service checks
092a9ba fix(router): align fallback semantics and oauth provider behavior

Notable file groups:

Audit hardening: src/audit/logger.ts, src/audit/logger.test.ts
Config consolidation: config/profiles/paas.overlay.yaml, config/paas.yaml, scripts/generate-config-profiles.mjs, src/config/profileTemplates.test.ts, docs/deployment/PAAS.md, package.json
Google auth hardening: src/auth/google.ts, src/google/oauth.ts, Google tool modules, Gmail watcher, Google auth CLI commands, src/cli/doctor.ts
Router/provider correctness: src/models/router.ts, src/models/openai.ts, src/config/schema.ts, src/models/capabilities.ts
Documentation additions: Google OAuth runbook and agent-facing repo map docs

Validation executed:

Focused suites (420 tests) across changed modules passed.
pnpm lint passed (warnings only, 0 errors).
pnpm typecheck passed.
pnpm config:profiles:check passed.

Remaining TODOs / Risks

No runtime log corpus was available for empirical recurring-error/perf bottleneck analysis.
Google auth CLI commands still contain duplicated flow logic across service-specific command files.
Auth store remains plaintext on disk (permissions are set, but no at-rest encryption).
Provider capability behavior is still partially split across provider clients + capability utility; further normalization is recommended.
