# Safe-By-Default Personal Agent

This document describes Flynn's "OpenClaw-style" safety boundary: how skills declare capabilities, how those capabilities are enforced at runtime, how high-risk execution is sandboxed by default, how prompt injection is mitigated, and what gets logged (without leaking secrets).

If you're looking for API-level tool contracts, see `docs/api/TOOLS.md`.

## Overview

Flynn is built around a strict separation of:

- **Conversation** (LLM output)
- **Capabilities** (tools)
- **Policy** (what tools are allowed, under what conditions)

This milestone adds a skill capability layer and hardens the tool loop.

Core principles:

- Capability declarations beat intentions: skills get only what they declare.
- Deny by default: a skill without a `permissions` manifest has no tool access.
- Treat fetched/tool content as untrusted data, not instructions.
- Never leak secrets into audit logs.

## Skills: Capability Manifests

Each skill lives in a directory with:

- `SKILL.md` (instructions injected into the system prompt)
- `manifest.json` (metadata + optional capabilities)

The capability declaration is `manifest.json.permissions`.

See: `src/skills/types.ts`.

### `permissions` Schema (manifest.json)

```json
{
  "permissions": {
    "tool_groups": ["group:web", "group:memory"],
    "tools": ["web.fetch", "web.search"],
    "fs": {
      "read": ["/home/will/Documents/**"],
      "write": ["/home/will/Documents/notes/**"]
    },
    "net": [
      { "host": "api.todoist.com", "ports": [443] },
      { "host": "*.github.com", "ports": [443] }
    ],
    "secrets": ["gmail", "web_search"],
    "execution_environment": "sandbox"
  }
}
```

Fields:

- `tool_groups`: tool-group allowlist using names from `src/tools/policy.ts` (`group:web`, `group:fs`, etc.)
- `tools`: explicit tool-name/pattern allowlist (glob). If present, it overrides `tool_groups`.
- `fs.read` / `fs.write`: allowed path globs (checked for `file.*` tools).
- `net`: allowed hosts (glob) and optional port list (best-effort enforcement for `web.fetch`).
- `secrets`: secret scopes allowed for this skill (used to gate credentialed tools).
- `execution_environment`: `sandbox` (default) or `host` (escape hatch for high-risk operations).

### Backward Compatibility

Skills without `permissions` still load, but:

- If a skill is activated (via routing) and it has no `permissions` block, **it has no tool access**.
- This is deliberate: skills should be auditable capability packages.

## Registry Trust Model (ClawHub / Community Catalogs)

Registry catalogs are discovery metadata, not trusted code.

- Flynn supports registry discovery and install-by-id via `flynn skills registry *` and `flynn skills install --registry-id`.
- Registry metadata fields such as `publisher`, `homepage`, and `sha256` are treated as **declared/unverified**.
- Non-local registry sources require explicit operator confirmation (`--confirm`) during install.
- Resolved sources (local/git/archive) are still routed through the same skill scanner and installer safety gates.
- Registry-driven installs emit dedicated audit events (`skills.registry_install`) including registry id/source and outcome.

Operationally: treat a registry as a candidate index. Trust is established by your own review and scanner outcomes, not by catalog claims alone.

## Runtime Enforcement

Enforcement happens in two places:

1. **Tool listing / exposure** (ToolPolicy)
2. **Tool execution** (ToolExecutor), as defense in depth

### ToolPolicy: Restricting Available Tools

When a skill context is active, the tool allow set is intersected with the skill's declared allowlist.

See: `src/tools/policy.ts`.

Important behaviors:

- If `skillName` is set but `skillPermissions` is missing, ToolPolicy returns an empty allowed set.
- If `permissions.tools` is present, it overrides `permissions.tool_groups`.

### ToolExecutor: Enforcing Paths, Network, Secrets, and Injection Guards

See: `src/tools/executor.ts`.
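
For contrast with the execution-time checks in this section, the listing-time ToolPolicy rules described above (empty set when permissions are missing, `tools` overriding `tool_groups`, intersection with the base allow set) can be sketched roughly as follows. This is a minimal illustration, not the code in `src/tools/policy.ts`: the `PolicyContext` shape, the `TOOL_GROUPS` table contents, and the `resolveAllowedTools` and `globToRegExp` names are all assumptions made for the example.

```typescript
// Hypothetical shapes; the real definitions live in src/skills/types.ts
// and src/tools/policy.ts and may differ.
interface SkillPermissions {
  tool_groups?: string[];
  tools?: string[];
}

interface PolicyContext {
  skillName?: string;
  skillPermissions?: SkillPermissions;
}

// Illustrative group membership only (assumed, not taken from the source).
const TOOL_GROUPS: Record<string, string[]> = {
  "group:web": ["web.fetch", "web.search"],
  "group:fs": ["file.read", "file.list", "file.write"],
};

// Turn a simple glob such as "browser.*" into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function resolveAllowedTools(baseAllowed: string[], ctx: PolicyContext): string[] {
  // No skill context: the base profile applies unchanged.
  if (!ctx.skillName) return baseAllowed;

  // Skill active but no permissions block: deny by default.
  const perms = ctx.skillPermissions;
  if (!perms) return [];

  // An explicit `tools` list overrides `tool_groups`.
  const patterns: string[] = [];
  if (perms.tools !== undefined) {
    patterns.push(...perms.tools);
  } else {
    for (const group of perms.tool_groups ?? []) {
      patterns.push(...(TOOL_GROUPS[group] ?? []));
    }
  }

  // Intersection: a tool must be in the base set AND match the skill allowlist.
  const matchers = patterns.map(globToRegExp);
  return baseAllowed.filter((name) => matchers.some((re) => re.test(name)));
}
```

Because the result is always a filter over `baseAllowed`, a skill manifest can narrow the base tool profile but never widen it.
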

When a skill context is active (`ToolPolicyContext.skillName`):

- Filesystem writes are blocked outside `permissions.fs.write`.
- Filesystem reads are blocked outside `permissions.fs.read` (for `file.read`/`file.list`).
- Credentialed tools require that their `requiredSecretScopes` be present in the skill's allowed scopes.
- If untrusted content has been seen, obviously malicious argument markers can block high-risk tool calls.

## Skill Routing (Intents)

Skills can be activated via intent rules.

See:

- Config schema: `src/config/schema.ts` (`intents.rules[].target.type = 'skill'`)
- Routing: `src/daemon/routing.ts`

Example config:

```yaml
intents:
  enabled: true
  match_threshold: 0.7
  rules:
    - name: "web-research"
      patterns: ["research *", "look up *"]
      target: { type: skill, name: my-web-skill }
      enabled: true
```

When an intent routes to a skill:

- `toolPolicyContext.skillName` and `toolPolicyContext.skillPermissions` are set
- High-risk execution defaults to sandbox (when available)

## Sandbox-By-Default (High-Risk Tools)

In skill context, high-risk tools are not allowed to run on the host unless the skill explicitly opts in.

High-risk tools include:

- `shell.exec`
- `process.start`
- `process.kill`
- `file.write`, `file.edit`, `file.patch`
- all `browser.*`

Behavior:

- Default (`execution_environment` omitted or `sandbox`):
  - If Docker sandbox is enabled and available, `shell.exec` and `process.start` run inside the per-session sandbox container.
  - If sandbox is not available, host execution for high-risk tools is denied for skill contexts.
- Escape hatch (`execution_environment: host`): high-risk tools are permitted to run on host (still subject to tool policy + hooks/autonomy).

Note: today, only `shell.exec` and `process.start` are replaced with sandboxed implementations. Other high-risk tools are blocked-by-default in skill contexts unless host mode is explicitly allowed.

## Prompt Injection Mitigation

Flynn uses a practical defense-in-depth approach:

1. System prompt guidance: fetched/tool content is treated as untrusted data.
2. Provenance tagging: tool results are wrapped in provenance markers.
3. Tool-call guard: when untrusted content has been observed, tool calls with obvious injection markers are blocked.

### Provenance Wrapping

Tool results returned to the model are wrapped like:

```text
[provenance=fetched_content tool=web.fetch untrusted=true]
...tool output...
[/provenance]
```

See: `src/backends/native/agent.ts`.

### Tool-Call Guard

When `ToolPolicyContext.untrustedContent` is true:

- High-risk tool calls whose args contain obvious markers (e.g. `rm -rf`, `ignore previous`, `exfiltrate`) are blocked.
- Network tools (`web.fetch`, `web.search`) refuse arguments containing secret-like fields.

See: `src/tools/executor.ts`.

## Secret Scopes

Tools can declare which secret scopes they require:

- `Tool.requiredSecretScopes?: string[]`

Skills declare which scopes they are allowed to use:

- `manifest.json.permissions.secrets?: string[]`

Enforcement:

- In skill context, if a tool requires scopes not allowed by the skill, ToolExecutor denies the tool.
- Outside skill context, secrets are treated as "ambient" (allowed) to preserve backward compatibility.

See:

- `src/tools/types.ts`
- `src/tools/executor.ts`
- Examples: `src/tools/builtin/gmail.ts`, `src/tools/builtin/gcal.ts`, `src/tools/builtin/web-search.ts`

## Audit Logging (Without Secret Leaks)

Tool execution is audited, but sensitive values are redacted before writing to disk.
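
To make "redacted before writing to disk" concrete, here is a minimal sketch of a redaction pass over tool arguments that also counts how many redactions it applied. It is an assumption-laden illustration rather than the contents of `src/audit/redact.ts`: the `redact` function, the `RedactionResult` shape, and both patterns (`SENSITIVE_KEY`, `TOKEN_LIKE`) are hypothetical.

```typescript
// Hypothetical result shape; the count would feed an audit field
// such as `redactions_applied`.
interface RedactionResult {
  value: unknown;
  redactionsApplied: number;
}

// Assumed patterns for illustration only; real rules may be broader.
const SENSITIVE_KEY = /(?:password|secret|token|api[_-]?key)/i;
const TOKEN_LIKE = /\b(?:sk|ghp)_[A-Za-z0-9]{16,}\b/g;

function redact(input: unknown): RedactionResult {
  let count = 0;

  const walk = (value: unknown, key?: string): unknown => {
    if (typeof value === "string") {
      // Strings stored under a sensitive-looking key are replaced wholesale.
      if (key !== undefined && SENSITIVE_KEY.test(key)) {
        count += 1;
        return "[REDACTED]";
      }
      // Otherwise only token-shaped substrings are masked.
      return value.replace(TOKEN_LIKE, () => {
        count += 1;
        return "[REDACTED_TOKEN]";
      });
    }
    if (Array.isArray(value)) {
      return value.map((item) => walk(item, key));
    }
    if (value !== null && typeof value === "object") {
      const out: Record<string, unknown> = {};
      for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
        out[k] = walk(v, k);
      }
      return out;
    }
    // Numbers, booleans, null, and undefined pass through untouched.
    return value;
  };

  return { value: walk(input), redactionsApplied: count };
}
```

The function returns a redacted copy instead of mutating its input, so the tool itself still receives the original arguments while only the copy is handed to the logger.
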

See:

- `src/audit/logger.ts`
- `src/audit/types.ts`
- `src/audit/redact.ts`

Notable fields:

- `execution_id`: a per-tool-call UUID for correlation
- `execution_environment`: `host` or `sandbox`
- `skill_name`: active skill (if any)
- `redactions_applied`: count of redaction operations
- `tool.approval`: emitted when a confirm hook is resolved

Example tool start event (JSONL):

```json
{
  "timestamp": 0,
  "level": "debug",
  "event_type": "tool.start",
  "event": {
    "tool_name": "shell.exec",
    "execution_id": "...",
    "execution_environment": "sandbox",
    "skill_name": "my-web-skill",
    "redactions_applied": 1,
    "tool_args": { "command": "echo [REDACTED_TOKEN]" }
  }
}
```

## Recommended Operator Defaults

- Enable Docker sandboxing (`sandbox.enabled: true`).
- Enable DM pairing (`pairing.enabled: true`) on any messaging surface.
- Use a conservative tool profile for general chat (`tools.profile: messaging`).
- Use skill intent routing for specialized workflows and keep skill permissions narrow.
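
As a rough aid for applying the defaults above, the following sketch checks a parsed config object for the three settings named in the list and reports any that deviate. Only the key names `sandbox.enabled`, `pairing.enabled`, and `tools.profile` come from this document; the `OperatorConfig` type and the `checkOperatorDefaults` helper are hypothetical and not part of Flynn.

```typescript
// Hypothetical partial view of the config; the real schema is in
// src/config/schema.ts and contains much more.
interface OperatorConfig {
  sandbox?: { enabled?: boolean };
  pairing?: { enabled?: boolean };
  tools?: { profile?: string };
}

// Returns one warning per recommended default that is not in effect.
function checkOperatorDefaults(config: OperatorConfig): string[] {
  const warnings: string[] = [];
  if (config.sandbox?.enabled !== true) {
    warnings.push("sandbox.enabled should be true");
  }
  if (config.pairing?.enabled !== true) {
    warnings.push("pairing.enabled should be true");
  }
  if (config.tools?.profile !== "messaging") {
    warnings.push("tools.profile should be 'messaging' for general chat");
  }
  return warnings;
}
```

A missing key is treated the same as a non-recommended value, which matches the deny-by-default stance used elsewhere in this document.
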