# Phase 3 PR #2 Checklist: Truthfulness, Policy, and Autonomy Hardening Created: 2026-02-12 Owner: Flynn core Status: ready to implement ## Goal Make truthfulness and autonomy constraints enforceable at runtime (not prompt-only), with auditable policy decisions. ## Scope In scope: - truthfulness guardrail injection in prompt assembly/agent startup - autonomy-aware hook/tool execution policy - tool-denial audit improvements - tests for enforcement paths Out of scope: - external fact-check APIs - new UI policy management features ## Files New: - `src/backends/native/guardrails.ts` - `src/backends/native/guardrails.test.ts` - `src/hooks/autonomy.ts` - `src/hooks/autonomy.test.ts` Modified: - `src/backends/native/agent.ts` - `src/backends/native/orchestrator.ts` - `src/prompt/template.ts` - `src/tools/executor.ts` - `src/hooks/engine.ts` - `src/config/schema.ts` ## Implementation Steps 1. Add guardrail utility module for truthfulness guidance modes. 2. Add config: - `agents.truthfulness_mode` (`strict|standard|relaxed`) - `agents.autonomy_level` (`conservative|standard|autonomous`) 3. Inject truthfulness policy section in effective system prompt. 4. Add autonomy resolution layer in hook/tool executor path. 5. Ensure denied/overridden actions are audit-logged with reason. 6. Add tests for tool action resolution matrix and guardrail prompt output. ## Validation Commands ```bash pnpm typecheck pnpm test:run src/backends/native/guardrails.test.ts pnpm test:run src/hooks/autonomy.test.ts pnpm test:run src/tools/executor.test.ts pnpm test:run src/hooks/engine.test.ts pnpm test:run pnpm lint pnpm build ``` ## Acceptance Criteria - Truthfulness policy appears in assembled system prompt. - Autonomy level changes effective tool action behavior as configured. - Dangerous tools still require confirmation in conservative/standard modes. - Policy denials/overrides produce audit records with clear reasons. - Feature is backward compatible with safe defaults. ## Risks - Overly strict guardrails reduce task completion. - Mitigation: default `standard` mode with configurable strictness. - Autonomy matrix conflicts with existing hooks. - Mitigation: deterministic precedence rules + tests. ## Commit Message `feat(policy): enforce truthfulness guardrails and autonomy-aware tool controls`