Phase 3 PR #2 Checklist: Truthfulness, Policy, and Autonomy Hardening
Created: 2026-02-12
Owner: Flynn core
Status: ready to implement
Goal
Make truthfulness and autonomy constraints enforceable at runtime (not prompt-only), with auditable policy decisions.
Scope
In scope:
- truthfulness guardrail injection in prompt assembly/agent startup
- autonomy-aware hook/tool execution policy
- tool-denial audit improvements
- tests for enforcement paths
Out of scope:
- external fact-check APIs
- new UI policy management features
Files
New:
- src/backends/native/guardrails.ts
- src/backends/native/guardrails.test.ts
- src/hooks/autonomy.ts
- src/hooks/autonomy.test.ts
Modified:
- src/backends/native/agent.ts
- src/backends/native/orchestrator.ts
- src/prompt/template.ts
- src/tools/executor.ts
- src/hooks/engine.ts
- src/config/schema.ts
Implementation Steps
- Add guardrail utility module for truthfulness guidance modes.
- Add config:
agents.truthfulness_mode (strict|standard|relaxed)
agents.autonomy_level (conservative|standard|autonomous)
- Inject truthfulness policy section in effective system prompt.
- Add autonomy resolution layer in hook/tool executor path.
- Ensure denied/overridden actions are audit-logged with reason.
- Add tests for tool action resolution matrix and guardrail prompt output.
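As a rough illustration of the guardrail and prompt-injection steps above, the following TypeScript sketch maps a truthfulness mode to a policy section and appends it to a base system prompt. Everything here is an assumption made for illustration: the function names (`parseTruthfulnessMode`, `buildTruthfulnessSection`, `assembleSystemPrompt`), the section heading, and the guidance wording are hypothetical and not taken from the Flynn codebase. Only the three mode values and the `standard` default come from this checklist.

```typescript
// Hypothetical sketch: names and guidance wording are assumptions, not Flynn's API.
type TruthfulnessMode = "strict" | "standard" | "relaxed";

const TRUTHFULNESS_MODES: readonly TruthfulnessMode[] = ["strict", "standard", "relaxed"];

// Placeholder guidance per mode; the real wording would live in guardrails.ts.
const TRUTHFULNESS_GUIDANCE: Record<TruthfulnessMode, string[]> = {
  strict: [
    "Do not state anything you cannot support from context or tool output.",
    "Say explicitly when you do not know.",
  ],
  standard: [
    "Distinguish verified facts from inferences.",
    "Flag uncertainty when it affects the answer.",
  ],
  relaxed: ["Prefer accurate answers; brief speculation is acceptable if labeled."],
};

// Unknown or missing config values fall back to "standard" (the safe default).
function parseTruthfulnessMode(raw: unknown): TruthfulnessMode {
  const found = TRUTHFULNESS_MODES.find((mode) => mode === raw);
  return found ?? "standard";
}

// Render the policy section for one mode as a heading plus bullet lines.
function buildTruthfulnessSection(mode: TruthfulnessMode): string {
  const bullets = TRUTHFULNESS_GUIDANCE[mode].map((line) => `- ${line}`);
  return [`## Truthfulness policy (${mode})`, ...bullets].join("\n");
}

// Append the policy section to the base prompt to form the effective system prompt.
function assembleSystemPrompt(basePrompt: string, rawMode: unknown): string {
  const mode = parseTruthfulnessMode(rawMode);
  return `${basePrompt.trimEnd()}\n\n${buildTruthfulnessSection(mode)}\n`;
}
```

Falling back to `standard` when the config value is missing or invalid is one way to satisfy the backward-compatibility criterion, since existing configs without `agents.truthfulness_mode` would keep working.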
Validation Commands
pnpm typecheck
pnpm test:run src/backends/native/guardrails.test.ts
pnpm test:run src/hooks/autonomy.test.ts
pnpm test:run src/tools/executor.test.ts
pnpm test:run src/hooks/engine.test.ts
pnpm test:run
pnpm lint
pnpm build
Acceptance Criteria
- Truthfulness policy appears in assembled system prompt.
- Autonomy level changes effective tool action behavior as configured.
- Dangerous tools still require confirmation in conservative/standard modes.
- Policy denials/overrides produce audit records with clear reasons.
- Feature is backward compatible with safe defaults.
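The autonomy and audit criteria above could be met by a small resolution function with a fixed precedence order. The sketch below is one possible shape, not the planned implementation: the types (`ToolRequest`, `AuditRecord`), the function name `resolveToolAction`, the `allow`/`confirm`/`deny` action set, and the specific precedence rules are all assumptions. In particular, how `autonomous` mode treats dangerous tools is not specified in this checklist; the sketch simply keeps the hook's action in that case.

```typescript
// Hypothetical sketch: types, names, and precedence rules are assumptions.
type AutonomyLevel = "conservative" | "standard" | "autonomous";
type ToolAction = "allow" | "confirm" | "deny";

interface ToolRequest {
  tool: string;
  dangerous: boolean;
  hookAction: ToolAction; // action proposed by the existing hook engine
}

interface AuditRecord {
  tool: string;
  level: AutonomyLevel;
  requested: ToolAction;
  effective: ToolAction;
  reason: string;
}

// Rules are checked top to bottom and the first match wins, so the result
// is deterministic for any (request, level) pair.
function resolveToolAction(
  req: ToolRequest,
  level: AutonomyLevel,
  audit: AuditRecord[],
): ToolAction {
  let effective: ToolAction = req.hookAction;
  let reason = "hook action kept";

  if (req.hookAction === "deny") {
    // 1. A hook denial is never overridden.
    reason = "denied by hook";
  } else if (req.dangerous && level !== "autonomous") {
    // 2. Dangerous tools need confirmation in conservative/standard modes.
    effective = "confirm";
    reason = `dangerous tool requires confirmation at ${level} level`;
  } else if (level === "conservative" && req.hookAction === "allow") {
    // 3. Assumed: conservative mode confirms every tool call.
    effective = "confirm";
    reason = "conservative level confirms every tool call";
  } else if (level === "autonomous" && !req.dangerous && req.hookAction === "confirm") {
    // 4. Assumed: autonomous mode skips confirmation for non-dangerous tools.
    effective = "allow";
    reason = "autonomous level skips confirmation for non-dangerous tools";
  }

  // Only denials and overrides are audit-logged, each with its reason.
  if (effective === "deny" || effective !== req.hookAction) {
    audit.push({ tool: req.tool, level, requested: req.hookAction, effective, reason });
  }
  return effective;
}
```

Passing the audit sink in as a parameter keeps the function pure enough to test as a matrix: each (level, dangerous, hookAction) combination can be asserted against both the returned action and the records appended.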
Risks
- Overly strict guardrails reduce task completion.
- Mitigation: default to standard mode, with configurable strictness.
- Autonomy matrix conflicts with existing hooks.
- Mitigation: deterministic precedence rules + tests.
Commit Message
feat(policy): enforce truthfulness guardrails and autonomy-aware tool controls