Safe-By-Default Personal Agent
This document describes Flynn's "OpenClaw-style" safety boundary: how skills declare capabilities, how those capabilities are enforced at runtime, how high-risk execution is sandboxed by default, how prompt injection is mitigated, and what gets logged (without leaking secrets).
If you're looking for API-level tool contracts, see docs/api/TOOLS.md.
Overview
Flynn is built around a strict separation of:
- Conversation (LLM output)
- Capabilities (tools)
- Policy (what tools are allowed, under what conditions)
This milestone adds a skill capability layer and hardens the tool loop.
Core principles:
- Capability declarations beat intentions: skills get only what they declare.
- Deny by default: a skill without a permissions manifest has no tool access.
- Treat fetched/tool content as untrusted data, not instructions.
- Never leak secrets into audit logs.
Skills: Capability Manifests
Each skill lives in a directory with:
- SKILL.md (instructions injected into the system prompt)
- manifest.json (metadata + optional capabilities)
The capability declaration is manifest.json.permissions.
See: src/skills/types.ts.
permissions Schema (manifest.json)
{
  "permissions": {
    "tool_groups": ["group:web", "group:memory"],
    "tools": ["web.fetch", "web.search"],
    "fs": {
      "read": ["/home/will/Documents/**"],
      "write": ["/home/will/Documents/notes/**"]
    },
    "net": [
      { "host": "api.todoist.com", "ports": [443] },
      { "host": "*.github.com", "ports": [443] }
    ],
    "secrets": ["gmail", "web_search"],
    "execution_environment": "sandbox"
  }
}
Fields:
- tool_groups: tool-group allowlist using names from src/tools/policy.ts (group:web, group:fs, etc.)
- tools: explicit tool-name/pattern allowlist (glob). If present, it overrides tool_groups.
- fs.read / fs.write: allowed path globs (checked for file.* tools).
- net: allowed hosts (glob) and optional port list (best-effort enforcement for web.fetch).
- secrets: secret scopes allowed for this skill (used to gate credentialed tools).
- execution_environment: sandbox (default) or host (escape hatch for high-risk operations).
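The field list above can be read as a type plus two small helpers. The following is a minimal sketch, not the definitions in src/skills/types.ts: the names SkillPermissions, NetRule, effectiveEnvironment, and isNetAllowed are hypothetical. It also assumes that "*" in a host glob matches any run of characters and that a rule without ports allows any port.

```typescript
// Hypothetical shapes mirroring the "permissions" block above; the real
// definitions in src/skills/types.ts may differ.
type ExecutionEnvironment = "sandbox" | "host";

interface NetRule {
  host: string; // host glob, e.g. "*.github.com"
  ports?: number[]; // optional port allowlist; omitted means any port (assumption)
}

interface SkillPermissions {
  tool_groups?: string[];
  tools?: string[];
  fs?: { read?: string[]; write?: string[] };
  net?: NetRule[];
  secrets?: string[];
  execution_environment?: ExecutionEnvironment;
}

// Documented default: an omitted execution_environment means "sandbox".
function effectiveEnvironment(p: SkillPermissions): ExecutionEnvironment {
  return p.execution_environment ?? "sandbox";
}

// Convert a host glob into a case-insensitive RegExp ("*" matches any run of characters).
function hostGlobToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

// True when some net rule matches the host and, if the rule lists ports, the port too.
function isNetAllowed(p: SkillPermissions, host: string, port: number): boolean {
  return (p.net ?? []).some(
    (rule) =>
      hostGlobToRegExp(rule.host).test(host) &&
      (rule.ports === undefined || rule.ports.indexOf(port) !== -1),
  );
}
```

With the example manifest, isNetAllowed would accept api.github.com on port 443 and reject the same host on port 80. A skill with no net list gets no network access in this sketch.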
Backward Compatibility
Skills without permissions still load, but:
- If a skill is activated (via routing) and it has no permissions block, it has no tool access.
- This is deliberate: skills should be auditable capability packages.
Registry Trust Model (ClawHub / Community Catalogs)
Registry catalogs are discovery metadata, not trusted code.
- Flynn supports registry discovery and install-by-id via flynn skills registry * and flynn skills install --registry-id.
- Registry metadata fields such as publisher, homepage, and sha256 are treated as declared/unverified.
- Non-local registry sources require explicit operator confirmation (--confirm) during install.
- Resolved sources (local/git/archive) are still routed through the same skill scanner and installer safety gates.
- Registry-driven installs emit dedicated audit events (skills.registry_install) including registry id/source and outcome.
Operationally: treat a registry as a candidate index. Trust is established by your own review and scanner outcomes, not by catalog claims alone.
Runtime Enforcement
Enforcement happens in two places:
- Tool listing / exposure (ToolPolicy)
- Tool execution (ToolExecutor) — defense in depth
ToolPolicy: Restricting Available Tools
When a skill context is active, the tool allow set is intersected with the skill's declared allowlist.
See: src/tools/policy.ts.
Important behaviors:
- If skillName is set but skillPermissions is missing, ToolPolicy returns an empty allowed set.
- If permissions.tools is present, it overrides permissions.tool_groups.
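The two behaviors above can be sketched as a single filtering function. This is an illustrative sketch only, not the code in src/tools/policy.ts: allowedToolsForSkill, ToolAllowlist, and the TOOL_GROUPS membership table are assumptions made for the example.

```typescript
// Hypothetical group membership; the real table lives in src/tools/policy.ts.
const TOOL_GROUPS: Record<string, string[]> = {
  "group:web": ["web.fetch", "web.search"],
  "group:fs": ["file.read", "file.list", "file.write"],
};

interface ToolAllowlist {
  tools?: string[];
  tool_groups?: string[];
}

// Convert a tool-name glob such as "web.*" into an anchored RegExp.
function toolGlobToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Intersect the base allow set with what the active skill declares.
function allowedToolsForSkill(
  base: string[],
  skillName: string | undefined,
  perms: ToolAllowlist | undefined,
): string[] {
  if (skillName === undefined) return base; // no skill context: nothing to intersect
  if (perms === undefined) return []; // skill without permissions: deny by default

  let patterns: string[] = [];
  if (perms.tools !== undefined) {
    patterns = perms.tools; // explicit tools override tool_groups
  } else {
    for (const group of perms.tool_groups ?? []) {
      patterns = patterns.concat(TOOL_GROUPS[group] ?? []);
    }
  }
  const matchers = patterns.map(toolGlobToRegExp);
  return base.filter((tool) => matchers.some((m) => m.test(tool)));
}
```

Because the result is always a subset of the base set, a skill manifest can narrow what the surrounding profile allows but never widen it.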
ToolExecutor: Enforcing Paths, Network, Secrets, and Injection Guards
See: src/tools/executor.ts.
When a skill context is active (ToolPolicyContext.skillName):
- Filesystem writes are blocked outside permissions.fs.write.
- Filesystem reads are blocked outside permissions.fs.read (for file.read/file.list).
- Credentialed tools require that their requiredSecretScopes be present in the skill's allowed scopes.
- If untrusted content has been seen, high-risk tool calls whose arguments contain obviously malicious markers can be blocked.
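As an illustration of the filesystem checks, the sketch below matches a requested path against the skill's path globs. It is not the logic in src/tools/executor.ts: isPathAllowed and its helpers are hypothetical. It assumes absolute POSIX paths, that "**" crosses directory boundaries while "*" does not, and that ".." segments are collapsed before matching.

```typescript
// Collapse "." and ".." segments so traversal cannot escape an allowed prefix.
// Assumes an absolute POSIX path.
function normalizePosixPath(p: string): string {
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// "**" matches across "/" boundaries; a single "*" stays within one path segment.
function pathGlobToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // placeholder so the single-* rule skips "**"
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp("^" + pattern + "$");
}

// Deny unless the normalized path matches at least one allowed glob.
function isPathAllowed(path: string, allowedGlobs: string[] | undefined): boolean {
  const normalized = normalizePosixPath(path);
  return (allowedGlobs ?? []).some((g) => pathGlobToRegExp(g).test(normalized));
}
```

Normalizing first matters: without it, a path such as /home/will/Documents/notes/../../.ssh/id_rsa would match the notes glob textually while resolving to a file outside it.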
Skill Routing (Intents)
Skills can be activated via intent rules.
See:
- Config schema: src/config/schema.ts (intents.rules[].target.type = 'skill')
- Routing: src/daemon/routing.ts
Example config:
intents:
  enabled: true
  match_threshold: 0.7
  rules:
    - name: "web-research"
      patterns: ["research *", "look up *"]
      target: { type: skill, name: my-web-skill }
      enabled: true
When an intent routes to a skill:
- toolPolicyContext.skillName and toolPolicyContext.skillPermissions are set
- High-risk execution defaults to sandbox (when available)
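A rough sketch of how a routed intent could populate the policy context follows. It is not the code in src/daemon/routing.ts: routeToSkill, IntentRule, and RoutedContext are hypothetical names, patterns are treated as simple case-insensitive globs, and match_threshold scoring is deliberately not modeled.

```typescript
interface IntentRule {
  name: string;
  patterns: string[];
  target: { type: "skill"; name: string };
  enabled: boolean;
}

// Hypothetical stand-in for the skillName/skillPermissions pair on toolPolicyContext.
interface RoutedContext {
  skillName: string;
  skillPermissions: object | undefined;
}

// Treat "research *" as a case-insensitive glob over the whole message (assumption).
function intentPatternToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

// Return the context for the first enabled rule that matches, or undefined.
function routeToSkill(
  message: string,
  rules: IntentRule[],
  permissionsBySkill: Record<string, object | undefined>,
): RoutedContext | undefined {
  for (const rule of rules) {
    if (!rule.enabled) continue;
    if (rule.patterns.some((p) => intentPatternToRegExp(p).test(message))) {
      return {
        skillName: rule.target.name,
        // May be undefined: a skill without a permissions block gets no tools.
        skillPermissions: permissionsBySkill[rule.target.name],
      };
    }
  }
  return undefined;
}
```

Note that a match against a skill with no permissions block still sets skillName, which is what triggers the deny-by-default behavior described earlier.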
Sandbox-By-Default (High-Risk Tools)
In skill context, high-risk tools are not allowed to run on the host unless the skill explicitly opts in.
High-risk tools include:
- shell.exec
- process.start
- process.kill
- file.write, file.edit, file.patch
- all browser.*
Behavior:
- Default (execution_environment omitted or sandbox):
  - If Docker sandbox is enabled and available, shell.exec and process.start run inside the per-session sandbox container.
  - If sandbox is not available, host execution for high-risk tools is denied for skill contexts.
- Escape hatch (execution_environment: host): high-risk tools are permitted to run on host (still subject to tool policy + hooks/autonomy).
Note: today, only shell.exec and process.start are replaced with sandboxed implementations. Other high-risk tools are blocked-by-default in skill contexts unless host mode is explicitly allowed.
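The behavior and the note above amount to a small decision table for skill contexts, sketched below. decideSkillExecution and the two sets are hypothetical names, and the sketch assumes that tools outside the high-risk list simply run on the host.

```typescript
type Decision = "sandbox" | "host" | "deny";

// High-risk tools named in this section (browser.* is handled by prefix).
const HIGH_RISK = new Set([
  "shell.exec", "process.start", "process.kill",
  "file.write", "file.edit", "file.patch",
]);

// Tools that currently have a sandboxed implementation, per the note above.
const SANDBOXED = new Set(["shell.exec", "process.start"]);

function isHighRiskTool(tool: string): boolean {
  return HIGH_RISK.has(tool) || tool.indexOf("browser.") === 0;
}

// Decide where a tool call made inside a skill context may run.
function decideSkillExecution(
  tool: string,
  env: "sandbox" | "host" | undefined,
  sandboxAvailable: boolean,
): Decision {
  if (!isHighRiskTool(tool)) return "host"; // assumption: low-risk tools are not sandboxed
  if (env === "host") return "host"; // explicit escape hatch
  if (sandboxAvailable && SANDBOXED.has(tool)) return "sandbox";
  return "deny"; // no sandbox, or no sandboxed implementation for this tool
}
```

A "host" decision here only means the sandbox rule does not block the call. Tool policy and hooks/autonomy would still apply afterwards.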
Prompt Injection Mitigation
Flynn uses a practical defense-in-depth approach:
- System prompt guidance: fetched/tool content is treated as untrusted data.
- Provenance tagging: tool results are wrapped in provenance markers.
- Tool-call guard: when untrusted content has been observed, tool calls with obvious injection markers are blocked.
Provenance Wrapping
Tool results returned to the model are wrapped like:
[provenance=fetched_content tool=web.fetch untrusted=true]
...tool output...
[/provenance]
See: src/backends/native/agent.ts.
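A wrapper producing the format above could look like the following sketch. wrapWithProvenance is a hypothetical name, not a function known to exist in src/backends/native/agent.ts. The escaping of an embedded closing marker is an extra hardening step assumed for this example and is not described in this document.

```typescript
// Wrap tool output in provenance markers before it is returned to the model.
function wrapWithProvenance(
  tool: string,
  output: string,
  provenance: string = "fetched_content",
): string {
  // Assumption: neutralize any closing marker inside the output so untrusted
  // text cannot terminate the wrapper early.
  const safe = output.split("[/provenance]").join("[/provenance-escaped]");
  return `[provenance=${provenance} tool=${tool} untrusted=true]\n${safe}\n[/provenance]`;
}
```

The markers do not enforce anything by themselves. They give the system prompt guidance a concrete boundary to refer to.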
Tool-Call Guard
When ToolPolicyContext.untrustedContent is true:
- High-risk tool calls whose args contain obvious markers (e.g. rm -rf, ignore previous, exfiltrate, etc.) are blocked.
- Network tools (web.fetch, web.search) refuse arguments containing secret-like fields.
See: src/tools/executor.ts.
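The guard can be pictured as a cheap substring and key check that runs before execution. The sketch below is illustrative only: guardToolCall is a hypothetical name, the marker list contains just the three examples given above, and "secret-like fields" is interpreted here as argument names matching a small pattern, which is an assumption.

```typescript
const INJECTION_MARKERS = ["rm -rf", "ignore previous", "exfiltrate"];
const NETWORK_TOOLS = ["web.fetch", "web.search"];
// Assumption: "secret-like" is judged by argument name.
const SECRET_LIKE_KEY = /token|secret|password|api[_-]?key/i;

interface GuardResult {
  allowed: boolean;
  reason: string;
}

function guardToolCall(
  tool: string,
  args: Record<string, unknown>,
  untrustedContent: boolean,
  highRisk: boolean,
): GuardResult {
  // The guard is only active once untrusted content has been observed.
  if (!untrustedContent) return { allowed: true, reason: "no untrusted content observed" };

  if (highRisk) {
    const flat = JSON.stringify(args).toLowerCase();
    for (const marker of INJECTION_MARKERS) {
      if (flat.indexOf(marker) !== -1) {
        return { allowed: false, reason: `injection marker: ${marker}` };
      }
    }
  }
  if (NETWORK_TOOLS.indexOf(tool) !== -1) {
    for (const key of Object.keys(args)) {
      if (SECRET_LIKE_KEY.test(key)) {
        return { allowed: false, reason: `secret-like field: ${key}` };
      }
    }
  }
  return { allowed: true, reason: "no markers found" };
}
```

A check like this only catches obvious cases, which is why it sits alongside provenance tagging and capability limits rather than replacing them.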
Secret Scopes
Tools can declare which secret scopes they require:
Tool.requiredSecretScopes?: string[]
Skills declare which scopes they are allowed to use:
manifest.json.permissions.secrets?: string[]
Enforcement:
- In skill context, if a tool requires scopes not allowed by the skill, ToolExecutor denies the tool.
- Outside skill context, secrets are treated as "ambient" (allowed) to preserve backward compatibility.
See:
- src/tools/types.ts
- src/tools/executor.ts
- Examples: src/tools/builtin/gmail.ts, src/tools/builtin/gcal.ts, src/tools/builtin/web-search.ts
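The enforcement rule reduces to a set difference between a tool's required scopes and the skill's allowed scopes. A minimal sketch, with missingSecretScopes as a hypothetical helper name:

```typescript
// Return the scopes a tool requires that the active skill does not allow.
// An empty result means the call may proceed as far as secrets are concerned.
function missingSecretScopes(
  requiredSecretScopes: string[] | undefined,
  skillName: string | undefined,
  allowedSecrets: string[] | undefined,
): string[] {
  // Outside skill context, secrets are ambient (allowed) for backward compatibility.
  if (skillName === undefined) return [];
  const allowed = new Set(allowedSecrets ?? []);
  return (requiredSecretScopes ?? []).filter((scope) => !allowed.has(scope));
}
```

For example, a tool requiring the gmail scope would be denied for a skill that declares only web_search, because the result would contain gmail.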
Audit Logging (Without Secret Leaks)
Tool execution is audited, but sensitive values are redacted before writing to disk.
See:
- src/audit/logger.ts
- src/audit/types.ts
- src/audit/redact.ts
Notable fields:
- execution_id: a per-tool-call UUID for correlation
- execution_environment: host or sandbox
- skill_name: active skill (if any)
- redactions_applied: count of redaction operations
- tool.approval: emitted when a confirm hook is resolved
Example tool start event (JSONL):
{
  "timestamp": 0,
  "level": "debug",
  "event_type": "tool.start",
  "event": {
    "tool_name": "shell.exec",
    "execution_id": "...",
    "execution_environment": "sandbox",
    "skill_name": "my-web-skill",
    "redactions_applied": 1,
    "tool_args": { "command": "echo [REDACTED_TOKEN]" }
  }
}
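To show how tool_args and redactions_applied in the event above could be produced together, here is a minimal sketch. It is not the logic in src/audit/redact.ts: redactToolArgs is a hypothetical name, and TOKEN_PATTERN covers only a couple of illustrative token shapes.

```typescript
// Hypothetical token shapes (e.g. "sk-..." or "ghp_..."); a real redactor would cover more.
const TOKEN_PATTERN = /\b(?:sk|ghp)[-_][A-Za-z0-9]{8,}\b/g;

// Recursively copy a value, replacing token-like substrings and counting replacements.
function redactValue(value: unknown, counter: { count: number }): unknown {
  if (typeof value === "string") {
    return value.replace(TOKEN_PATTERN, () => {
      counter.count += 1;
      return "[REDACTED_TOKEN]";
    });
  }
  if (Array.isArray(value)) {
    return value.map((item) => redactValue(item, counter));
  }
  if (typeof value === "object" && value !== null) {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = redactValue(source[key], counter);
    }
    return out;
  }
  return value;
}

// Produce the redacted args plus the count recorded as redactions_applied.
function redactToolArgs(args: Record<string, unknown>): {
  tool_args: Record<string, unknown>;
  redactions_applied: number;
} {
  const counter = { count: 0 };
  const tool_args = redactValue(args, counter) as Record<string, unknown>;
  return { tool_args, redactions_applied: counter.count };
}
```

The function returns a redacted copy and leaves the original arguments untouched, so the tool itself still receives the real values while only the audit record is scrubbed.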
Recommended Operator Defaults
- Enable Docker sandboxing (sandbox.enabled: true).
- Enable DM pairing (pairing.enabled: true) on any messaging surface.
- Use a conservative tool profile for general chat (tools.profile: messaging).
- Use skill intent routing for specialized workflows and keep skill permissions narrow.