
# Flynn Codebase Audit + Improvement Report

Date: 2026-02-24
Branch: `feature/full-audit-hardening-and-config-consolidation`

## Executive Summary

I audited core Flynn wiring across config -> daemon -> router/providers -> automation/tools -> CLI/docs, then implemented high-safety fixes for auth hardening, router correctness, provider alignment, and config consolidation.

High-impact outcomes:
- Google OAuth runtime is now centralized with store-first token loading, legacy token migration, and refresh persistence.
- Model router fallback behavior now matches retry/fallback intent (no duplicate fallback attempts, retry policy applied on fallback paths).
- OpenAI OAuth mode now fails fast when tools are requested (prevents silent non-executable tool output).
- PaaS config is now generated from canonical `default.yaml` + overlay to prevent template drift.
- `flynn doctor` now validates all Google automation services, not just Gmail.

Breaking behavior changes introduced:
- `models.fallback_chain` schema default changed from `['anthropic']` to `[]` (avoids invalid fallback entries by default).
- OpenAI OAuth requests with tools now throw explicit errors instead of returning warning text.

## Findings (With File Pointers)

1. `HIGH` Google token persistence path caused runtime failures in restricted environments.
- Cause: migration/store writes to `~/.config/flynn/auth.json` could fail and abort tool execution.
- Evidence: [src/google/oauth.ts](src/google/oauth.ts:111), [src/auth/google.ts](src/auth/google.ts:108)
- Fix: auth store writes now tolerate known filesystem permission errors and preserve token-file compatibility.
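
The tolerant-write behavior described in this fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/auth/google.ts`: the helper names `writeTolerantly` and `writeAuthStore`, the set of tolerated error codes, and the `0o600`/`0o700` modes are all hypothetical.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Store location named in the finding above.
export const DEFAULT_STORE_PATH = path.join(os.homedir(), ".config", "flynn", "auth.json");

// Assumed set of "known" permission-style errors; the real list may differ.
const TOLERATED_CODES = new Set(["EACCES", "EPERM", "EROFS"]);

/**
 * Run a filesystem write. Returns true on success and false when the write
 * failed with a tolerated error code, so the caller can keep using the
 * legacy token file. Any other error is rethrown.
 */
export function writeTolerantly(write: () => void): boolean {
  try {
    write();
    return true;
  } catch (err) {
    const code = (err as { code?: string } | null)?.code;
    if (code !== undefined && TOLERATED_CODES.has(code)) {
      return false; // degrade gracefully instead of aborting tool execution
    }
    throw err; // unexpected failures should still surface
  }
}

/** Hypothetical writer: owner-only directory and file permissions. */
export function writeAuthStore(storePath: string, tokens: unknown): void {
  fs.mkdirSync(path.dirname(storePath), { recursive: true, mode: 0o700 });
  fs.writeFileSync(storePath, JSON.stringify(tokens, null, 2), { mode: 0o600 });
}
```

A caller would wrap the write, for example `writeTolerantly(() => writeAuthStore(DEFAULT_STORE_PATH, tokens))`, and fall back to the token file when the result is `false`.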

2. `HIGH` OpenAI OAuth mode accepted tool-bearing requests without executable tool support.
- Cause: OAuth Codex backend path did not support Flynn tool execution semantics.
- Evidence: [src/models/openai.ts](src/models/openai.ts:236)
- Fix: explicit throw when tools are present, enabling router fallback or config correction.
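
A minimal sketch of the fail-fast guard, assuming a simplified request shape. The names `AuthMode`, `ChatRequest`, `UnsupportedToolsError`, and `assertToolsSupported` are hypothetical and do not come from `src/models/openai.ts`.

```typescript
// Hypothetical, simplified types for illustration only.
export type AuthMode = "api_key" | "oauth";

export interface ToolDefinition {
  name: string;
}

export interface ChatRequest {
  prompt: string;
  tools?: ToolDefinition[];
}

export class UnsupportedToolsError extends Error {
  constructor(provider: string) {
    super(`${provider} OAuth mode cannot execute tools; use a fallback provider or change the auth mode`);
    this.name = "UnsupportedToolsError";
  }
}

/**
 * Throw before any network call when the request carries tools that the
 * OAuth backend cannot execute. An explicit error lets a router move on
 * to a fallback instead of returning warning text as if it were a reply.
 */
export function assertToolsSupported(mode: AuthMode, request: ChatRequest): void {
  const toolCount = request.tools?.length ?? 0;
  if (mode === "oauth" && toolCount > 0) {
    throw new UnsupportedToolsError("openai");
  }
}
```

Throwing rather than warning is what makes the failure visible to the caller: a thrown error can be caught and routed, while warning text in a normal response cannot be distinguished from model output.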

3. `HIGH` Router fallback execution did not fully match retry policy and could repeat the same failing client.
- Cause: the retry policy wrapped only the primary path, fallback clients were retried inconsistently, and the same fallback could be attempted more than once.
- Evidence: [src/models/router.ts](src/models/router.ts:90)
- Fix: attempted-client tracking and retry wrapping now apply to tier/global fallback chat paths and streaming fallback paths.
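
The combination of attempted-client tracking and uniform retry wrapping can be illustrated with the sketch below. It is a simplified, non-streaming model of the idea, not the logic in `src/models/router.ts`; the `ChatClient` interface, `withRetry`, `chatWithFallback`, and the fixed attempt count are assumptions.

```typescript
// Hypothetical minimal client interface for illustration.
export interface ChatClient {
  id: string;
  chat(prompt: string): Promise<string>;
}

/** Retry a call up to maxAttempts times (at least once); rethrow the last error. */
export async function withRetry<T>(fn: () => Promise<T>, maxAttempts: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= Math.max(1, maxAttempts); attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

/**
 * Try the primary client, then each fallback in order. Every client gets
 * the same retry policy, and a client id that was already attempted is
 * skipped so a failing client is never run twice in one request.
 */
export async function chatWithFallback(
  primary: ChatClient,
  fallbacks: ChatClient[],
  prompt: string,
  maxAttempts = 2,
): Promise<string> {
  const attempted = new Set<string>();
  let lastError: unknown = new Error("no clients attempted");
  for (const client of [primary, ...fallbacks]) {
    if (attempted.has(client.id)) continue; // duplicate entry in the chain
    attempted.add(client.id);
    try {
      return await withRetry(() => client.chat(prompt), maxAttempts);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

With `maxAttempts = 2`, a chain of `[a, a, b]` where `a` always fails calls `a` twice in total rather than four times before moving on to `b`.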

4. `MEDIUM` `flynn doctor` had incomplete feature wiring checks for Google services.
- Cause: only Gmail automation health was validated.
- Evidence: [src/cli/doctor.ts](src/cli/doctor.ts:663), [src/cli/doctor.ts](src/cli/doctor.ts:723)
- Fix: added service checks for Calendar/Docs/Drive/Tasks with auth-store and token-file detection.

5. `MEDIUM` Manually maintained `config/paas.yaml` risked drifting from the canonical defaults (config profile overlap).
- Cause: duplicated full config template with independent edits.
- Evidence: [config/profiles/paas.overlay.yaml](config/profiles/paas.overlay.yaml:1), [scripts/generate-config-profiles.mjs](scripts/generate-config-profiles.mjs:10)
- Fix: canonical+overlay generation model, profile drift check, and sync test.
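
As a rough illustration of the canonical+overlay model, the sketch below deep-merges an overlay onto a base object and compares the result with a committed profile. It operates on already-parsed objects and is not the implementation in `scripts/generate-config-profiles.mjs`. The names `applyOverlay` and `hasDrifted` are hypothetical, and replacing arrays wholesale is an assumed merge rule.

```typescript
export type Plain = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Plain {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/**
 * Deep-merge overlay onto base without mutating either input.
 * Nested objects merge key by key; scalars and arrays in the overlay
 * replace the base value (assumed rule).
 */
export function applyOverlay(base: Plain, overlay: Plain): Plain {
  const result: Plain = { ...base };
  for (const [key, value] of Object.entries(overlay)) {
    const current = result[key];
    result[key] =
      isPlainObject(current) && isPlainObject(value) ? applyOverlay(current, value) : value;
  }
  return result;
}

/**
 * Drift check: a committed profile has drifted when it no longer equals
 * the regenerated base + overlay result. Comparison is by serialized
 * form, so key order matters.
 */
export function hasDrifted(committed: Plain, base: Plain, overlay: Plain): boolean {
  return JSON.stringify(committed) !== JSON.stringify(applyOverlay(base, overlay));
}
```

Because the overlay holds only the keys that differ, a change to the canonical defaults flows into the generated profile automatically, and the drift check fails if someone edits the generated file by hand.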

6. `MEDIUM` Default fallback chain schema value conflicted with router semantics.
- Cause: default `['anthropic']` is not a tier/local-provider key in current router semantics.
- Evidence: [src/config/schema.ts](src/config/schema.ts:181)
- Fix: schema default set to `[]` to avoid spurious invalid fallback entries.

7. `LOW` Provider capability type list lagged configured providers.
- Cause: `ModelProvider` union omitted `vercel`, `minimax`, `moonshot`, `synthetic`.
- Evidence: [src/models/capabilities.ts](src/models/capabilities.ts:8)
- Fix: union updated and test coverage expanded.

8. `LOW` Audit logger path expansion bug for `~`.
- Cause: the logger configured the rotator with the expanded path, but the write stream used the raw (unexpanded) path.
- Evidence: [src/audit/logger.ts](src/audit/logger.ts:44), [src/audit/logger.ts](src/audit/logger.ts:57)
- Fix: normalized path now used consistently by logger and rotator.
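
The point of this fix is to expand `~` once and hand the same resolved path to every consumer. A minimal sketch, assuming hypothetical helpers `expandTilde` and `resolveAuditTargets` (the actual logger code in `src/audit/logger.ts` may be structured differently):

```typescript
import * as os from "node:os";
import * as path from "node:path";

/**
 * Expand a leading "~" or "~/" to the given home directory.
 * Forms such as "~user/..." are left untouched.
 */
export function expandTilde(filePath: string, home: string = os.homedir()): string {
  if (filePath === "~") return home;
  if (filePath.startsWith("~/")) return path.join(home, filePath.slice(2));
  return filePath;
}

/**
 * Resolve the configured path once and return it for both consumers,
 * so the rotator and the write stream cannot disagree.
 */
export function resolveAuditTargets(
  rawPath: string,
  home: string = os.homedir(),
): { rotatorPath: string; streamPath: string } {
  const resolved = expandTilde(rawPath, home);
  return { rotatorPath: resolved, streamPath: resolved };
}
```

Node's `fs` APIs do not expand `~` themselves, which is why a raw `~/...` path passed to a write stream ends up relative to the working directory instead of the home directory.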

9. `INFO` Log-pattern analysis could not be completed from repository artifacts.
- Cause: no runtime `.log` / audit JSONL artifacts present in workspace snapshot.
- Evidence: repository scan returned no log files under repo root.
- Mitigation: recommendations added below for repeatable log collection/analysis workflow.

## Recommended Changes (Prioritized)

1. `P0` Keep OpenAI OAuth tool rejection behavior and enforce documented fallback guidance.
2. `P0` Keep Google auth centralized; avoid introducing new per-tool OAuth duplication.
3. `P1` Add a shared Google auth CLI factory to remove duplicated `*-auth` command flow code.
4. `P1` Add optional `XDG_CONFIG_HOME`/override support for auth store paths for containerized/sandboxed environments.
5. `P1` Add periodic log export + analyzer command (error-rate, latency, provider fallback frequency) so reliability trends are measurable from CI/dev snapshots.
6. `P2` Introduce a provider capability matrix module consumed by router/doctor/docs from one source of truth.
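
For recommendation 4, one possible resolution order is sketched below: an explicit override first, then `XDG_CONFIG_HOME`, then the current `~/.config` default. The function name `resolveAuthStorePath` and the `FLYNN_AUTH_STORE` variable are hypothetical; only `XDG_CONFIG_HOME` and the `flynn/auth.json` location come from this report.

```typescript
import * as os from "node:os";
import * as path from "node:path";

type Env = Record<string, string | undefined>;

/**
 * Resolve the auth store path.
 *   1. FLYNN_AUTH_STORE (hypothetical override) wins when set.
 *   2. XDG_CONFIG_HOME is used when set to an absolute path; the XDG
 *      Base Directory spec says relative values should be ignored.
 *   3. Otherwise fall back to <home>/.config.
 */
export function resolveAuthStorePath(
  env: Env = process.env,
  home: string = os.homedir(),
): string {
  const override = env["FLYNN_AUTH_STORE"];
  if (override) return override;

  const xdg = env["XDG_CONFIG_HOME"];
  const configRoot = xdg && path.isAbsolute(xdg) ? xdg : path.join(home, ".config");
  return path.join(configRoot, "flynn", "auth.json");
}
```

Keeping `~/.config` as the last step means existing installations resolve to the same file as before, while containers can redirect the store by setting a single environment variable.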

## Implemented Changes (Diff Summary)

Commits in this branch:
- `5b95eb1` `fix(audit): expand tilde paths for audit log output`
- `076379b` `refactor(config): generate paas profile from default overlay`
- `00b2d64` `feat(google-auth): centralize oauth token store and service checks`
- `092a9ba` `fix(router): align fallback semantics and oauth provider behavior`

Notable file groups:
- Audit hardening: `src/audit/logger.ts`, `src/audit/logger.test.ts`
- Config consolidation: `config/profiles/paas.overlay.yaml`, `config/paas.yaml`, `scripts/generate-config-profiles.mjs`, `src/config/profileTemplates.test.ts`, `docs/deployment/PAAS.md`, `package.json`
- Google auth hardening: `src/auth/google.ts`, `src/google/oauth.ts`, Google tool modules, Gmail watcher, Google auth CLI commands, `src/cli/doctor.ts`
- Router/provider correctness: `src/models/router.ts`, `src/models/openai.ts`, `src/config/schema.ts`, `src/models/capabilities.ts`
- Documentation additions: Google OAuth runbook and agent-facing repo map docs

Validation executed:
- Focused suites (420 tests) across changed modules passed.
- `pnpm lint` passed (warnings only, 0 errors).
- `pnpm typecheck` passed.
- `pnpm config:profiles:check` passed.

## Remaining TODOs / Risks

- No runtime log corpus was available for empirical recurring-error/perf bottleneck analysis.
- Google auth CLI commands still contain duplicated flow logic across service-specific command files.
- Auth store remains plaintext on disk (permissions are set, but no at-rest encryption).
- Provider capability behavior is still partially split across provider clients + capability utility; further normalization is recommended.