Safe-By-Default Personal Agent

This document describes Flynn's "OpenClaw-style" safety boundary: how skills declare capabilities, how those capabilities are enforced at runtime, how high-risk execution is sandboxed by default, how prompt injection is mitigated, and what gets logged (without leaking secrets).

If you're looking for API-level tool contracts, see docs/api/TOOLS.md.

Overview

Flynn is built around a strict separation of:

  • Conversation (LLM output)
  • Capabilities (tools)
  • Policy (what tools are allowed, under what conditions)

This milestone adds a skill capability layer and hardens the tool loop.

Core principles:

  • Capability declarations beat intentions: skills get only what they declare.
  • Deny by default: a skill without a permissions manifest has no tool access.
  • Treat fetched/tool content as untrusted data, not instructions.
  • Never leak secrets into audit logs.

Skills: Capability Manifests

Each skill lives in a directory with:

  • SKILL.md (instructions injected into the system prompt)
  • manifest.json (metadata + optional capabilities)

The capability declaration is manifest.json.permissions.

See: src/skills/types.ts.

permissions Schema (manifest.json)

{
  "permissions": {
    "tool_groups": ["group:web", "group:memory"],
    "tools": ["web.fetch", "web.search"],
    "fs": {
      "read": ["/home/will/Documents/**"],
      "write": ["/home/will/Documents/notes/**"]
    },
    "net": [
      { "host": "api.todoist.com", "ports": [443] },
      { "host": "*.github.com", "ports": [443] }
    ],
    "secrets": ["gmail", "web_search"],
    "execution_environment": "sandbox"
  }
}

Fields:

  • tool_groups: tool-group allowlist using names from src/tools/policy.ts (group:web, group:fs, etc.)
  • tools: explicit tool-name/pattern allowlist (glob). If present, it overrides tool_groups.
  • fs.read / fs.write: allowed path globs (checked for file.* tools).
  • net: allowed hosts (glob) and optional port list (best-effort enforcement for web.fetch).
  • secrets: secret scopes allowed for this skill (used to gate credentialed tools).
  • execution_environment: sandbox (default) or host (escape hatch for high-risk operations).
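
As a rough illustration of the fields above, the manifest could be modeled with types like the following. This is only a sketch: the names SkillPermissions, NetRule, effectiveEnvironment, hostMatches, and isNetAllowed are hypothetical, the real definitions live in src/skills/types.ts and may differ, and the host-glob semantics (a `*` matching any run of characters) are an assumption.

```typescript
// Hypothetical shapes mirroring the manifest.json example above.
// The real types are in src/skills/types.ts and may differ.
type ExecutionEnvironment = "sandbox" | "host";

interface NetRule {
  host: string; // host glob, e.g. "*.github.com"
  ports?: number[]; // optional port allowlist
}

interface SkillPermissions {
  tool_groups?: string[];
  tools?: string[];
  fs?: { read?: string[]; write?: string[] };
  net?: NetRule[];
  secrets?: string[];
  execution_environment?: ExecutionEnvironment;
}

// Omitting execution_environment means "sandbox".
function effectiveEnvironment(p: SkillPermissions): ExecutionEnvironment {
  return p.execution_environment ?? "sandbox";
}

// Assumption: "*" in a host glob matches any run of characters.
function hostMatches(pattern: string, host: string): boolean {
  const escaped = pattern
    .toLowerCase()
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`).test(host.toLowerCase());
}

// A request is allowed when some rule matches the host and, if the rule
// lists ports, the port as well. No net rules means no network access.
function isNetAllowed(p: SkillPermissions, host: string, port: number): boolean {
  return (p.net ?? []).some((rule) => {
    const portOk = rule.ports === undefined || rule.ports.includes(port);
    return hostMatches(rule.host, host) && portOk;
  });
}
```

With the example manifest, isNetAllowed would accept api.github.com on port 443 and reject the same host on port 80.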

Backward Compatibility

Skills without permissions still load, but:

  • If a skill is activated (via routing) and it has no permissions block, it has no tool access.
  • This is deliberate: skills should be auditable capability packages.

Registry Trust Model (ClawHub / Community Catalogs)

Registry catalogs are discovery metadata, not trusted code.

  • Flynn supports registry discovery and install-by-id via flynn skills registry * and flynn skills install --registry-id.
  • Registry metadata fields such as publisher, homepage, and sha256 are treated as declared/unverified.
  • Non-local registry sources require explicit operator confirmation (--confirm) during install.
  • Resolved sources (local/git/archive) are still routed through the same skill scanner and installer safety gates.
  • Registry-driven installs emit dedicated audit events (skills.registry_install) including registry id/source and outcome.

Operationally: treat a registry as a candidate index. Trust is established by your own review and scanner outcomes, not by catalog claims alone.

Runtime Enforcement

Enforcement happens in two places:

  1. Tool listing / exposure (ToolPolicy)
  2. Tool execution (ToolExecutor) — defense in depth

ToolPolicy: Restricting Available Tools

When a skill context is active, the tool allow set is intersected with the skill's declared allowlist.

See: src/tools/policy.ts.

Important behaviors:

  • If skillName is set but skillPermissions is missing, ToolPolicy returns an empty allowed set.
  • If permissions.tools is present, it overrides permissions.tool_groups.
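
The two behaviors above can be sketched as a small filter. This is not the code in src/tools/policy.ts: the TOOL_GROUPS table (including which tools belong to each group), the allowedTools function, and the glob semantics are assumptions made for illustration.

```typescript
// Hypothetical group table; real groups are defined in src/tools/policy.ts.
const TOOL_GROUPS: Record<string, string[]> = {
  "group:web": ["web.fetch", "web.search"],
  "group:fs": ["file.read", "file.list", "file.write"],
};

interface SkillPermissions {
  tool_groups?: string[];
  tools?: string[];
}

interface ToolPolicyContext {
  skillName?: string;
  skillPermissions?: SkillPermissions;
}

// Assumption: "*" in a tool pattern matches any run of characters.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function allowedTools(baseAllowed: string[], ctx: ToolPolicyContext): string[] {
  // No skill context: the base allow set is returned unchanged.
  if (ctx.skillName === undefined) return baseAllowed;

  // Skill context without a permissions block: deny by default.
  const perms = ctx.skillPermissions;
  if (perms === undefined) return [];

  // An explicit tools list overrides tool_groups entirely.
  if (perms.tools !== undefined) {
    const patterns = perms.tools.map(globToRegExp);
    return baseAllowed.filter((t) => patterns.some((re) => re.test(t)));
  }

  // Otherwise expand the declared groups and intersect with the base set.
  const fromGroups = new Set<string>();
  for (const group of perms.tool_groups ?? []) {
    for (const tool of TOOL_GROUPS[group] ?? []) fromGroups.add(tool);
  }
  return baseAllowed.filter((t) => fromGroups.has(t));
}
```

Because the result is always a filter over baseAllowed, a skill can narrow the active tool set but never widen it.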

ToolExecutor: Enforcing Paths, Network, Secrets, and Injection Guards

See: src/tools/executor.ts.

When a skill context is active (ToolPolicyContext.skillName):

  • Filesystem writes are blocked outside permissions.fs.write.
  • Filesystem reads are blocked outside permissions.fs.read (for file.read/file.list).
  • Credentialed tools require that their requiredSecretScopes be present in the skill's allowed scopes.
  • If untrusted content has been seen, high-risk tool calls whose arguments contain obviously malicious markers can be blocked.
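
To make the filesystem checks concrete, here is a minimal sketch of a path-glob test, assuming absolute POSIX paths and a glob dialect where `**` crosses directory boundaries and `*` does not. The helper names (normalizePosix, pathGlobToRegExp, isPathAllowed) are hypothetical, and the sketch does not resolve symlinks, which a real executor would also need to consider.

```typescript
// Collapse "." and ".." segments so traversal cannot escape an allowed root.
function normalizePosix(path: string): string {
  const out: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// Assumed dialect: "**" matches across "/", a single "*" stays in one segment.
function pathGlobToRegExp(glob: string): RegExp {
  let out = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*") {
      if (glob.charAt(i + 1) === "*") {
        out += ".*";
        i++; // consume the second "*"
      } else {
        out += "[^/]*";
      }
    } else if (/[.+?^${}()|[\]\\]/.test(c)) {
      out += "\\" + c; // escape regex metacharacters
    } else {
      out += c;
    }
  }
  return new RegExp("^" + out + "$");
}

// A path is allowed only if its normalized form matches a declared glob.
function isPathAllowed(globs: string[] | undefined, path: string): boolean {
  const normalized = normalizePosix(path);
  return (globs ?? []).some((g) => pathGlobToRegExp(g).test(normalized));
}
```

Normalizing before matching is the important design choice here: without it, a path such as /home/will/Documents/notes/../../.ssh/id_rsa would textually match a notes/** glob.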

Skill Routing (Intents)

Skills can be activated via intent rules.

See:

  • Config schema: src/config/schema.ts (intents.rules[].target.type = 'skill')
  • Routing: src/daemon/routing.ts

Example config:

intents:
  enabled: true
  match_threshold: 0.7
  rules:
    - name: "web-research"
      patterns: ["research *", "look up *"]
      target: { type: skill, name: my-web-skill }
      enabled: true

When an intent routes to a skill:

  • toolPolicyContext.skillName and toolPolicyContext.skillPermissions are set
  • High-risk execution defaults to sandbox (when available)
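
The routing step can be pictured roughly as follows. This sketch is an assumption-heavy simplification: it treats patterns as case-insensitive wildcards, deliberately ignores match_threshold (whose scoring semantics are not described here), and uses hypothetical names (IntentRule, routeToSkill, a manifests lookup table). The actual logic is in src/daemon/routing.ts.

```typescript
interface SkillPermissions {
  tools?: string[];
  tool_groups?: string[];
}

interface IntentRule {
  name: string;
  patterns: string[];
  target: { type: "skill"; name: string };
  enabled: boolean;
}

interface ToolPolicyContext {
  skillName: string;
  // Left undefined when the skill has no permissions block,
  // which downstream policy treats as "no tool access".
  skillPermissions: SkillPermissions | undefined;
}

// Assumption: "*" matches any run of characters, case-insensitively.
function patternMatches(pattern: string, message: string): boolean {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`, "i").test(message.trim());
}

// Returns the policy context for the first enabled rule that matches,
// or undefined when no rule routes the message to a skill.
function routeToSkill(
  message: string,
  rules: IntentRule[],
  manifests: Record<string, SkillPermissions | undefined>,
): ToolPolicyContext | undefined {
  for (const rule of rules) {
    if (!rule.enabled) continue;
    if (rule.patterns.some((p) => patternMatches(p, message))) {
      return { skillName: rule.target.name, skillPermissions: manifests[rule.target.name] };
    }
  }
  return undefined;
}
```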

Sandbox-By-Default (High-Risk Tools)

In skill context, high-risk tools are not allowed to run on the host unless the skill explicitly opts in.

High-risk tools include:

  • shell.exec
  • process.start
  • process.kill
  • file.write, file.edit, file.patch
  • all browser.*

Behavior:

  • Default (execution_environment omitted or sandbox):
    • If Docker sandbox is enabled and available, shell.exec and process.start run inside the per-session sandbox container.
    • If sandbox is not available, host execution for high-risk tools is denied for skill contexts.
  • Escape hatch (execution_environment: host): high-risk tools are permitted to run on host (still subject to tool policy + hooks/autonomy).

Note: today, only shell.exec and process.start have sandboxed implementations. Other high-risk tools are blocked by default in skill contexts unless host mode is explicitly allowed.
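
The behavior described above reduces to a small decision table. The sketch below encodes it for skill contexts only; decideExecution and the Decision type are hypothetical names, and tool policy, hooks, and autonomy checks are assumed to run separately.

```typescript
type Decision = "sandbox" | "host" | "deny";

// High-risk tools listed in this document; browser.* is matched by prefix.
const HIGH_RISK_TOOLS = [
  "shell.exec",
  "process.start",
  "process.kill",
  "file.write",
  "file.edit",
  "file.patch",
];

// Per the note above, only these have sandboxed implementations today.
const SANDBOXED_TOOLS = ["shell.exec", "process.start"];

function isHighRisk(tool: string): boolean {
  return HIGH_RISK_TOOLS.includes(tool) || tool.startsWith("browser.");
}

// Where may a tool call run when a skill context is active?
function decideExecution(
  tool: string,
  executionEnvironment: "sandbox" | "host" | undefined,
  sandboxAvailable: boolean,
): Decision {
  // Tools that are not high-risk are unaffected by this rule.
  if (!isHighRisk(tool)) return "host";

  // Escape hatch: the skill explicitly opted in to host execution.
  if (executionEnvironment === "host") return "host";

  // Default (omitted or "sandbox"): run in the sandbox when one exists
  // and the tool has a sandboxed implementation.
  if (sandboxAvailable && SANDBOXED_TOOLS.includes(tool)) return "sandbox";

  // No sandbox, or no sandboxed implementation for this tool.
  return "deny";
}
```

Note that the fall-through case is "deny", not "host": an unavailable sandbox never silently downgrades to host execution.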

Elevated Mode (Break Glass)

Flynn supports a time-bounded /elevate escape hatch for host execution of sensitive tools.

  • Session keys: elevation.until_ms, elevation.id, elevation.reason
  • Command UX requires explicit confirmation (--yes / --confirm)
  • Expiry is automatic (TTL-based) and emits audit events
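
A TTL-based elevation check might look like the sketch below. It models the session keys as fields of a hypothetical ElevationState object; the elevate and isElevationActive names, the id format, and the error message are assumptions, and the centralized implementation is in src/security/elevation.ts.

```typescript
// Hypothetical in-memory view of the elevation.* session keys.
interface ElevationState {
  until_ms: number; // absolute expiry time, in epoch milliseconds
  id: string;
  reason: string;
}

// Grant elevation for ttlMs milliseconds. Explicit confirmation is required,
// mirroring the --yes / --confirm requirement of the command UX.
function elevate(
  reason: string,
  ttlMs: number,
  nowMs: number,
  confirmed: boolean,
): ElevationState {
  if (!confirmed) {
    throw new Error("elevation requires explicit confirmation");
  }
  // Assumption: any unique id works; a timestamp-based one keeps this sketch simple.
  return { until_ms: nowMs + ttlMs, id: `elev-${nowMs}`, reason };
}

// Elevation is active only strictly before its expiry; no cleanup step is
// needed for a call to be rejected after the TTL has passed.
function isElevationActive(state: ElevationState | undefined, nowMs: number): boolean {
  return state !== undefined && nowMs < state.until_ms;
}
```

Passing nowMs in explicitly, instead of reading the clock inside the functions, keeps expiry behavior easy to test.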

Implementation is centralized in src/security/elevation.ts and reused by:

  • src/daemon/routing.ts (channel command fast path)
  • src/gateway/handlers/agent.ts (websocket/gateway command fast path)
  • src/frontends/tui/minimal.ts and src/frontends/tui/components/App.tsx (TUI command surfaces)
  • src/backends/native/agent.ts (per-tool-call elevation context resolution)

Tool enforcement remains in src/tools/executor.ts:

  • host-sensitive tools are denied when elevation is required but inactive
  • elevated host high-risk calls still require explicit confirmation via hooks

Prompt Injection Mitigation

Flynn uses a practical defense-in-depth approach:

  1. System prompt guidance: fetched/tool content is treated as untrusted data.
  2. Provenance tagging: tool results are wrapped in provenance markers.
  3. Tool-call guard: when untrusted content has been observed, tool calls with obvious injection markers are blocked.

Provenance Wrapping

Tool results returned to the model are wrapped like:

[provenance=fetched_content tool=web.fetch untrusted=true]
...tool output...
[/provenance]

See: src/backends/native/agent.ts.
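
A wrapper producing the marker format shown above could be as small as this sketch. The function name wrapWithProvenance is hypothetical, and the step that neutralizes a closing marker embedded in the tool output is my own addition, not something the source says Flynn does.

```typescript
const CLOSE_MARKER = "[/provenance]";

// Wrap a tool result so the model can tell where untrusted data begins and ends.
function wrapWithProvenance(
  tool: string,
  output: string,
  provenance: string,
  untrusted: boolean,
): string {
  // Assumption (not from the source): rewrite any embedded closing marker so
  // fetched content cannot end the provenance block early.
  const safeOutput = output.split(CLOSE_MARKER).join("[/provenance-escaped]");
  const header = `[provenance=${provenance} tool=${tool} untrusted=${untrusted}]`;
  return `${header}\n${safeOutput}\n${CLOSE_MARKER}`;
}
```

For example, wrapWithProvenance("web.fetch", "...tool output...", "fetched_content", true) reproduces the three-line block shown above.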

Tool-Call Guard

When ToolPolicyContext.untrustedContent is true:

  • High-risk tool calls whose args contain obvious markers (e.g. rm -rf, ignore previous, exfiltrate) are blocked.
  • Network tools (web.fetch, web.search) refuse arguments containing secret-like fields.

See: src/tools/executor.ts.
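
The guard can be illustrated with the sketch below. The marker list reuses the three examples named above; the secret-like key pattern, the choice to inspect only top-level argument keys, and the guardToolCall name are assumptions rather than the behavior of src/tools/executor.ts.

```typescript
const HIGH_RISK_TOOLS = [
  "shell.exec",
  "process.start",
  "process.kill",
  "file.write",
  "file.edit",
  "file.patch",
];
const NETWORK_TOOLS = ["web.fetch", "web.search"];

// Markers taken from the examples in this document; a real list is likely longer.
const INJECTION_MARKERS = ["rm -rf", "ignore previous", "exfiltrate"];

// Assumption: "secret-like fields" means argument keys that look like credentials.
const SECRET_LIKE_KEY = /token|secret|password|api[_-]?key/i;

interface GuardResult {
  allowed: boolean;
  reason?: string;
}

function guardToolCall(
  tool: string,
  args: Record<string, unknown>,
  untrustedContent: boolean,
): GuardResult {
  // The guard only engages after untrusted content has been observed.
  if (!untrustedContent) return { allowed: true };

  const highRisk = HIGH_RISK_TOOLS.includes(tool) || tool.startsWith("browser.");
  if (highRisk) {
    // Search the serialized arguments, case-insensitively, for any marker.
    const haystack = JSON.stringify(args).toLowerCase();
    const hit = INJECTION_MARKERS.find((m) => haystack.includes(m));
    if (hit !== undefined) return { allowed: false, reason: `injection marker: ${hit}` };
  }

  if (NETWORK_TOOLS.includes(tool)) {
    // Only top-level keys are inspected in this sketch.
    const key = Object.keys(args).find((k) => SECRET_LIKE_KEY.test(k));
    if (key !== undefined) return { allowed: false, reason: `secret-like field: ${key}` };
  }

  return { allowed: true };
}
```

Substring matching of this kind only catches obvious cases, which is why it is one layer among several and not a standalone defense.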

Secret Scopes

Tools can declare which secret scopes they require:

  • Tool.requiredSecretScopes?: string[]

Skills declare which scopes they are allowed to use:

  • manifest.json.permissions.secrets?: string[]

Enforcement:

  • In skill context, if a tool requires scopes not allowed by the skill, ToolExecutor denies the tool.
  • Outside skill context, secrets are treated as "ambient" (allowed) to preserve backward compatibility.

See:

  • src/tools/types.ts
  • src/tools/executor.ts
  • Examples: src/tools/builtin/gmail.ts, src/tools/builtin/gcal.ts, src/tools/builtin/web-search.ts
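
The scope check itself amounts to a subset test. The following sketch assumes a hypothetical missingScopes helper and a simplified context object; it is meant to show the rule, not the code in src/tools/executor.ts.

```typescript
interface Tool {
  name: string;
  requiredSecretScopes?: string[];
}

interface SecretContext {
  skillName?: string;
  allowedSecrets?: string[]; // from manifest.json.permissions.secrets
}

// Returns the scopes a tool needs that the active skill may not use.
// An empty result means the call may proceed.
function missingScopes(tool: Tool, ctx: SecretContext): string[] {
  // Outside skill context, secrets are ambient (allowed).
  if (ctx.skillName === undefined) return [];

  const allowed = ctx.allowedSecrets ?? [];
  return (tool.requiredSecretScopes ?? []).filter((scope) => !allowed.includes(scope));
}
```

Returning the missing scopes, and not just a boolean, lets a denial message or audit event say exactly which scope was absent.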

Audit Logging (Without Secret Leaks)

Tool execution is audited, but sensitive values are redacted before writing to disk.

See:

  • src/audit/logger.ts
  • src/audit/types.ts
  • src/audit/redact.ts

Notable fields and events:

  • execution_id: a per-tool-call UUID for correlation
  • execution_environment: host or sandbox
  • skill_name: active skill (if any)
  • redactions_applied: count of redaction operations
  • tool.approval: event type emitted when a confirm hook is resolved

Example tool start event (JSONL):

{
  "timestamp": 0,
  "level": "debug",
  "event_type": "tool.start",
  "event": {
    "tool_name": "shell.exec",
    "execution_id": "...",
    "execution_environment": "sandbox",
    "skill_name": "my-web-skill",
    "redactions_applied": 1,
    "tool_args": { "command": "echo [REDACTED_TOKEN]" }
  }
}
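
To show how tool_args and redactions_applied in the event above could be produced, here is a minimal redaction sketch. The token pattern is a made-up example and redactArgs is a hypothetical name; the real rules live in src/audit/redact.ts and are likely broader.

```typescript
// Hypothetical pattern for token-looking strings (a short prefix, a
// separator, then 8+ token characters). Real redaction rules will differ.
const TOKEN_PATTERN = /\b(?:sk|ghp|xoxb)[-_][A-Za-z0-9_-]{8,}\b/g;

interface Counter {
  n: number;
}

// Replace every token-looking substring and count each replacement.
function redactString(s: string, counter: Counter): string {
  return s.replace(TOKEN_PATTERN, () => {
    counter.n += 1;
    return "[REDACTED_TOKEN]";
  });
}

// Walk strings, arrays, and plain objects; other values pass through unchanged.
function redactValue(v: unknown, counter: Counter): unknown {
  if (typeof v === "string") return redactString(v, counter);
  if (Array.isArray(v)) return v.map((item) => redactValue(item, counter));
  if (v !== null && typeof v === "object") {
    const source = v as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      out[key] = redactValue(source[key], counter);
    }
    return out;
  }
  return v;
}

// Produce the two audit fields from raw tool arguments, before anything
// is written to disk.
function redactArgs(args: Record<string, unknown>): {
  tool_args: Record<string, unknown>;
  redactions_applied: number;
} {
  const counter: Counter = { n: 0 };
  const tool_args = redactValue(args, counter) as Record<string, unknown>;
  return { tool_args, redactions_applied: counter.n };
}
```

Redacting a copy, as redactValue does, leaves the original arguments intact for the tool call itself while only the sanitized form reaches the log.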

Recommended settings:

  • Enable Docker sandboxing (sandbox.enabled: true).
  • Enable DM pairing (pairing.enabled: true) on any messaging surface.
  • Use a conservative tool profile for general chat (tools.profile: messaging).
  • Use skill intent routing for specialized workflows and keep skill permissions narrow.