flynn/docs/plans/phase3-pr2-policy-autonomy-hardening-checklist.md
2026-02-12 22:47:28 -08:00

Phase 3 PR #2 Checklist: Truthfulness, Policy, and Autonomy Hardening

Created: 2026-02-12
Owner: Flynn core
Status: ready to implement

Goal

Make truthfulness and autonomy constraints enforceable at runtime (not prompt-only), with auditable policy decisions.

Scope

In scope:

  • truthfulness guardrail injection in prompt assembly/agent startup
  • autonomy-aware hook/tool execution policy
  • tool-denial audit improvements
  • tests for enforcement paths

Out of scope:

  • external fact-check APIs
  • new UI policy management features

Files

New:

  • src/backends/native/guardrails.ts
  • src/backends/native/guardrails.test.ts
  • src/hooks/autonomy.ts
  • src/hooks/autonomy.test.ts

Modified:

  • src/backends/native/agent.ts
  • src/backends/native/orchestrator.ts
  • src/prompt/template.ts
  • src/tools/executor.ts
  • src/hooks/engine.ts
  • src/config/schema.ts

Implementation Steps

  1. Add guardrail utility module for truthfulness guidance modes.
  2. Add config:
    • agents.truthfulness_mode (strict|standard|relaxed)
    • agents.autonomy_level (conservative|standard|autonomous)
  3. Inject truthfulness policy section in effective system prompt.
  4. Add autonomy resolution layer in hook/tool executor path.
  5. Ensure denied/overridden actions are audit-logged with a reason.
  6. Add tests for tool action resolution matrix and guardrail prompt output.
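
Steps 1 to 3 could be sketched roughly as follows. This is a minimal illustration, not the actual contents of src/backends/native/guardrails.ts: the function names (resolvePolicyConfig, buildTruthfulnessSection, assembleSystemPrompt), the section heading, and the guidance wording are all hypothetical. Only the two config keys and their allowed values come from this plan, and the default autonomy level shown here is an assumption.

```typescript
// Hypothetical sketch of a guardrail utility; names and rule text are placeholders.
type TruthfulnessMode = "strict" | "standard" | "relaxed";
type AutonomyLevel = "conservative" | "standard" | "autonomous";

interface AgentPolicyConfig {
  truthfulness_mode: TruthfulnessMode;
  autonomy_level: AutonomyLevel;
}

// Omitted keys fall back to these values so existing configs stay valid.
// "standard" for autonomy_level is an assumption; the plan only names
// standard as the default truthfulness mode.
const DEFAULT_POLICY: AgentPolicyConfig = {
  truthfulness_mode: "standard",
  autonomy_level: "standard",
};

function resolvePolicyConfig(partial: Partial<AgentPolicyConfig> = {}): AgentPolicyConfig {
  return {
    truthfulness_mode: partial.truthfulness_mode ?? DEFAULT_POLICY.truthfulness_mode,
    autonomy_level: partial.autonomy_level ?? DEFAULT_POLICY.autonomy_level,
  };
}

// Placeholder guidance per mode; the real wording is a product decision.
const GUIDANCE: Record<TruthfulnessMode, string[]> = {
  strict: [
    "State only what the provided context or tool output supports.",
    "Say explicitly when information is missing or uncertain.",
  ],
  standard: [
    "Distinguish verified facts from inference.",
    "Do not claim an action succeeded unless a tool result confirms it.",
  ],
  relaxed: ["Flag speculation that could be mistaken for fact."],
};

// Renders the policy section that gets appended to the system prompt.
function buildTruthfulnessSection(mode: TruthfulnessMode): string {
  const rules = GUIDANCE[mode].map((rule) => `- ${rule}`);
  return [`## Truthfulness policy (${mode})`, ...rules].join("\n");
}

// Appends the policy section after the base prompt, separated by a blank line.
function assembleSystemPrompt(basePrompt: string, config: AgentPolicyConfig): string {
  const section = buildTruthfulnessSection(config.truthfulness_mode);
  return `${basePrompt.trimEnd()}\n\n${section}`;
}
```

Keeping the section builder a pure function of the mode makes the "guardrail prompt output" tests in step 6 simple string comparisons, with no agent startup required.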

Validation Commands

pnpm typecheck
pnpm test:run src/backends/native/guardrails.test.ts
pnpm test:run src/hooks/autonomy.test.ts
pnpm test:run src/tools/executor.test.ts
pnpm test:run src/hooks/engine.test.ts
pnpm test:run
pnpm lint
pnpm build

Acceptance Criteria

  • Truthfulness policy appears in assembled system prompt.
  • Autonomy level changes effective tool action behavior as configured.
  • Dangerous tools still require confirmation in conservative/standard modes.
  • Policy denials/overrides produce audit records with clear reasons.
  • Feature is backward compatible: existing configs work unchanged, and omitted settings fall back to safe defaults.
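
One way the autonomy criteria above might translate into a resolution function is sketched below. The action names (allow, confirm, deny), the ToolRequest and AuditRecord shapes, and the exact matrix are assumptions made for illustration. The plan itself only requires that dangerous tools keep their confirmation step in conservative and standard modes and that denials and overrides are audited with a reason.

```typescript
// Hypothetical sketch of an autonomy resolution layer; not the actual src/hooks/autonomy.ts.
type AutonomyLevel = "conservative" | "standard" | "autonomous";
type ToolAction = "allow" | "confirm" | "deny";

interface ToolRequest {
  tool: string;          // tool identifier
  dangerous: boolean;    // whether the tool is classified as dangerous
  requested: ToolAction; // action asked for by the tool's base policy
}

interface AuditRecord {
  tool: string;
  level: AutonomyLevel;
  requested: ToolAction;
  effective: ToolAction;
  reason: string;
}

interface Resolution {
  action: ToolAction;
  audit?: AuditRecord; // present only for denials and overrides
}

function resolveToolAction(level: AutonomyLevel, req: ToolRequest): Resolution {
  let effective: ToolAction = req.requested;
  let reason = "base policy applied unchanged";

  if (req.requested === "deny") {
    // A base-policy denial is never relaxed by the autonomy level.
    reason = "denied by base tool policy";
  } else if (level === "conservative") {
    effective = "confirm";
    reason = "conservative autonomy requires confirmation for every tool";
  } else if (req.dangerous && level === "standard") {
    effective = "confirm";
    reason = "dangerous tool requires confirmation at standard autonomy";
  } else if (!req.dangerous && level === "autonomous") {
    effective = "allow";
    reason = "autonomous level skips confirmation for non-dangerous tools";
  }

  const resolution: Resolution = { action: effective };
  // Audit every denial, and every case where the effective action differs
  // from what the base policy requested.
  if (effective === "deny" || effective !== req.requested) {
    resolution.audit = {
      tool: req.tool,
      level,
      requested: req.requested,
      effective,
      reason,
    };
  }
  return resolution;
}
```

For example, resolveToolAction("standard", { tool: "shell.exec", dangerous: true, requested: "allow" }) resolves to confirm and carries an audit record, while the same call for a non-dangerous tool resolves to allow with no audit record. The tool name shell.exec is made up for the example.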

Risks

  • Overly strict guardrails reduce task completion.
    • Mitigation: default standard mode with configurable strictness.
  • Autonomy matrix conflicts with existing hooks.
    • Mitigation: deterministic precedence rules + tests.
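
The "deterministic precedence rules" mitigation could take the form of a most-restrictive-wins merge between hook decisions and the autonomy decision. The sketch below is one possible rule set, not the one this plan commits to. The Decision shape, the mergeDecisions name, and the fallback to confirm when no decision exists are all assumptions.

```typescript
// Hypothetical precedence rule: the most restrictive decision wins,
// and ties go to the decision that appears first in the list.
type ToolAction = "allow" | "confirm" | "deny";

interface Decision {
  source: string; // e.g. a hook name or "autonomy"
  action: ToolAction;
  reason: string;
}

const RESTRICTIVENESS: Record<ToolAction, number> = {
  allow: 0,
  confirm: 1,
  deny: 2,
};

function mergeDecisions(decisions: Decision[]): Decision {
  if (decisions.length === 0) {
    // Assumed fail-safe: with no policy input, ask for confirmation.
    return { source: "default", action: "confirm", reason: "no policy decision available" };
  }
  // Strict comparison keeps the earlier decision on ties, so the result
  // depends only on the input order, never on evaluation timing.
  return decisions.reduce((winner, next) =>
    RESTRICTIVENESS[next.action] > RESTRICTIVENESS[winner.action] ? next : winner,
  );
}
```

Because the winning Decision keeps its source and reason, the same object can feed the audit log directly when a hook and the autonomy layer disagree.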

Commit Message

feat(policy): enforce truthfulness guardrails and autonomy-aware tool controls