William Valentin baa53f91d9 refactor(security): unify elevated mode handling across surfaces

2026-02-19 11:41:53 -08:00


Safe-By-Default Personal Agent

This document describes Flynn's "OpenClaw-style" safety boundary: how skills declare capabilities, how those capabilities are enforced at runtime, how high-risk execution is sandboxed by default, how prompt injection is mitigated, and what gets logged (without leaking secrets).

If you're looking for API-level tool contracts, see docs/api/TOOLS.md.

Overview

Flynn is built around a strict separation of:

Conversation (LLM output)
Capabilities (tools)
Policy (what tools are allowed, under what conditions)

This milestone adds a skill capability layer and hardens the tool loop.

Core principles:

Capability declarations beat intentions: skills get only what they declare.
Deny by default: a skill whose manifest has no permissions block has no tool access.
Treat fetched/tool content as untrusted data, not instructions.
Never leak secrets into audit logs.

Skills: Capability Manifests

Each skill lives in a directory with:

SKILL.md (instructions injected into the system prompt)
manifest.json (metadata + optional capabilities)

The capability declaration is manifest.json.permissions.

See: src/skills/types.ts.

`permissions` Schema (manifest.json)

{
  "permissions": {
    "tool_groups": ["group:web", "group:memory"],
    "tools": ["web.fetch", "web.search"],
    "fs": {
      "read": ["/home/will/Documents/**"],
      "write": ["/home/will/Documents/notes/**"]
    },
    "net": [
      { "host": "api.todoist.com", "ports": [443] },
      { "host": "*.github.com", "ports": [443] }
    ],
    "secrets": ["gmail", "web_search"],
    "execution_environment": "sandbox"
  }
}

Fields:

tool_groups: tool-group allowlist using names from src/tools/policy.ts (group:web, group:fs, etc.)
tools: explicit tool-name/pattern allowlist (glob). If present, it overrides tool_groups.
fs.read / fs.write: allowed path globs (checked for file.* tools).
net: allowed hosts (glob) and optional port list (best-effort enforcement for web.fetch).
secrets: secret scopes allowed for this skill (used to gate credentialed tools).
execution_environment: sandbox (default) or host (escape hatch for high-risk operations).
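For readers wiring this up in TypeScript, the schema above can be mirrored as a type. This is a minimal sketch, not the definition in src/skills/types.ts. The names SkillPermissions, NetRule, and readPermissions are hypothetical. The sketch encodes only what this section states: an omitted execution_environment means sandbox, and a missing permissions block is reported as undefined so that callers can deny all tools.

```typescript
// Hypothetical mirror of manifest.json "permissions"; the real types live in
// src/skills/types.ts and may differ in naming and validation.
type ExecutionEnvironment = "sandbox" | "host";

interface NetRule {
  host: string; // host glob, e.g. "*.github.com"
  ports?: number[]; // optional port allowlist
}

interface SkillPermissions {
  tool_groups?: string[];
  tools?: string[]; // when present, overrides tool_groups
  fs?: { read?: string[]; write?: string[] };
  net?: NetRule[];
  secrets?: string[];
  execution_environment?: ExecutionEnvironment;
}

// Returns undefined when the manifest has no permissions block; in skill
// context that means "no tool access", not "everything allowed".
function readPermissions(manifest: { permissions?: SkillPermissions }): SkillPermissions | undefined {
  const p = manifest.permissions;
  if (p === undefined) return undefined;
  // manifest.json is untrusted input, so check the enum at runtime as well.
  const env: string = p.execution_environment ?? "sandbox";
  if (env !== "sandbox" && env !== "host") {
    throw new Error(`invalid execution_environment: ${env}`);
  }
  return { ...p, execution_environment: env as ExecutionEnvironment };
}
```

Returning undefined rather than an empty object keeps "no permissions block" distinguishable from "an empty permissions block" at the call site.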

Backward Compatibility

Skills without permissions still load, but:

If a skill is activated (via routing) and it has no permissions block, it has no tool access.
This is deliberate: skills should be auditable capability packages.

Registry Trust Model (ClawHub / Community Catalogs)

Registry catalogs are discovery metadata, not trusted code.

Flynn supports registry discovery and install-by-id via flynn skills registry * and flynn skills install --registry-id.
Registry metadata fields such as publisher, homepage, and sha256 are treated as declared/unverified.
Non-local registry sources require explicit operator confirmation (--confirm) during install.
Resolved sources (local/git/archive) are still routed through the same skill scanner and installer safety gates.
Registry-driven installs emit dedicated audit events (skills.registry_install) including registry id/source and outcome.

Operationally: treat a registry as a candidate index. Trust is established by your own review and scanner outcomes, not by catalog claims alone.

Runtime Enforcement

Enforcement happens in two places:

Tool listing / exposure (ToolPolicy)
Tool execution (ToolExecutor) — defense in depth

ToolPolicy: Restricting Available Tools

When a skill context is active, the tool allow set is intersected with the skill's declared allowlist.

See: src/tools/policy.ts.

Important behaviors:

If skillName is set but skillPermissions is missing, ToolPolicy returns an empty allowed set.
If permissions.tools is present, it overrides permissions.tool_groups.
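The two rules above can be sketched as a pure function over a base allow set. This is an illustration, not the code in src/tools/policy.ts. allowedTools, PolicyContext, and the GROUPS table are hypothetical, and the group membership shown is an assumption. Tool patterns are treated as simple globs in which * matches any run of characters.

```typescript
// Hypothetical group table; the real group names and members are defined in
// src/tools/policy.ts.
const GROUPS: Record<string, string[]> = {
  "group:web": ["web.fetch", "web.search"],
  "group:fs": ["file.read", "file.list", "file.write", "file.edit", "file.patch"],
};

interface PolicyContext {
  skillName?: string;
  skillPermissions?: { tools?: string[]; tool_groups?: string[] };
}

// Simple glob: escape regex metacharacters, then let "*" match anything.
function toolGlob(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

function allowedTools(base: Set<string>, ctx: PolicyContext): Set<string> {
  // No skill active: the base policy applies unchanged.
  if (ctx.skillName === undefined) return new Set(base);
  const perms = ctx.skillPermissions;
  // Skill active but no permissions block: deny by default.
  if (perms === undefined) return new Set<string>();

  let skillAllows: (name: string) => boolean;
  if (perms.tools !== undefined) {
    // An explicit tools list overrides tool_groups entirely.
    const patterns = perms.tools.map(toolGlob);
    skillAllows = (name) => patterns.some((re) => re.test(name));
  } else {
    const names = new Set<string>();
    for (const group of perms.tool_groups ?? []) {
      for (const tool of GROUPS[group] ?? []) names.add(tool);
    }
    skillAllows = (name) => names.has(name);
  }
  // Intersect: a skill can only narrow the base allow set, never widen it.
  return new Set(Array.from(base).filter(skillAllows));
}
```

Because the result is always a subset of base, a skill manifest cannot grant a tool that the surrounding tool profile already denies.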

ToolExecutor: Enforcing Paths, Network, Secrets, and Injection Guards

See: src/tools/executor.ts.

When a skill context is active (ToolPolicyContext.skillName is set):

Filesystem writes are blocked outside permissions.fs.write.
Filesystem reads are blocked outside permissions.fs.read (for file.read and file.list).
Requests made by web.fetch are checked against permissions.net (best-effort).
Credentialed tools are denied unless every scope in their requiredSecretScopes appears in the skill's permissions.secrets.
If untrusted content has been seen, high-risk tool calls whose arguments contain obvious injection markers are blocked.
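The path and network checks reduce to glob matching after normalization. This is a minimal sketch under several assumptions: paths are absolute, "**" spans directories while "*" stays within one segment, and a host "*" matches exactly one DNS label. pathAllowed and netAllowed are hypothetical names, not the functions in src/tools/executor.ts. Normalizing before matching is what stops notes/../../.ssh style traversal. A production check would also need to resolve symlinks, which this sketch does not do.

```typescript
// Collapse "." and ".." so traversal cannot escape an allowed prefix.
// Relative paths are rejected in this sketch. Symlinks are NOT resolved.
function normalizeAbsolute(path: string): string | undefined {
  if (!path.startsWith("/")) return undefined;
  const out: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

const escapeRe = (s: string): string => s.replace(/[.+?^${}()|[\]\\]/g, "\\$&");

// "**" matches across directories; "*" matches within one path segment.
function pathGlob(pattern: string): RegExp {
  let re = "";
  for (let i = 0; i < pattern.length; i++) {
    if (pattern[i] === "*" && pattern[i + 1] === "*") {
      re += ".*";
      i++;
    } else if (pattern[i] === "*") {
      re += "[^/]*";
    } else {
      re += escapeRe(pattern[i]);
    }
  }
  return new RegExp("^" + re + "$");
}

function pathAllowed(path: string, globs: string[] | undefined): boolean {
  const norm = normalizeAbsolute(path);
  if (norm === undefined || globs === undefined) return false;
  return globs.some((g) => pathGlob(g).test(norm));
}

interface NetRule {
  host: string;
  ports?: number[];
}

// Host globs: "*" matches exactly one DNS label, so "*.github.com" matches
// "api.github.com" but not "github.com" or "github.com.evil.io".
function hostGlob(pattern: string): RegExp {
  const parts = pattern.toLowerCase().split("*").map(escapeRe);
  return new RegExp("^" + parts.join("[^.]+") + "$");
}

// The caller is assumed to have parsed the URL into host and port already.
function netAllowed(host: string, port: number, rules: NetRule[] | undefined): boolean {
  if (rules === undefined) return false;
  const h = host.toLowerCase();
  return rules.some(
    (r) => hostGlob(r.host).test(h) && (r.ports === undefined || r.ports.indexOf(port) !== -1),
  );
}
```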

Skill Routing (Intents)

Skills can be activated via intent rules.

See:

Config schema: src/config/schema.ts (intents.rules[].target.type = 'skill')
Routing: src/daemon/routing.ts

Example config:

intents:
  enabled: true
  match_threshold: 0.7
  rules:
    - name: "web-research"
      patterns: ["research *", "look up *"]
      target: { type: skill, name: my-web-skill }
      enabled: true

When an intent routes to a skill:

toolPolicyContext.skillName and toolPolicyContext.skillPermissions are set
High-risk execution defaults to sandbox (when available)

Sandbox-By-Default (High-Risk Tools)

In skill context, high-risk tools are not allowed to run on the host unless the skill explicitly opts in.

High-risk tools include:

shell.exec
process.start
process.kill
file.write, file.edit, file.patch
all browser.*

Behavior:

Default (execution_environment omitted or sandbox):
- If Docker sandbox is enabled and available, shell.exec and process.start run inside the per-session sandbox container.
- If sandbox is not available, host execution for high-risk tools is denied for skill contexts.
Escape hatch (execution_environment: host): high-risk tools are permitted to run on host (still subject to tool policy + hooks/autonomy).

Note: today, only shell.exec and process.start are replaced with sandboxed implementations. Other high-risk tools are blocked-by-default in skill contexts unless host mode is explicitly allowed.
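Inside a skill context, the placement rule above can be written as a small decision function. This is a sketch under stated assumptions: placeSkillToolCall and its option names are hypothetical, and the function models only what this section describes. Tool policy, hooks, autonomy, and elevation still apply on top and are not captured here.

```typescript
type Placement = "sandbox" | "host" | "deny";

// High-risk tools as listed above, plus every browser.* tool.
const HIGH_RISK = new Set([
  "shell.exec", "process.start", "process.kill",
  "file.write", "file.edit", "file.patch",
]);
// Only these currently have sandboxed implementations.
const SANDBOXED_IMPLS = new Set(["shell.exec", "process.start"]);

function isHighRisk(tool: string): boolean {
  return HIGH_RISK.has(tool) || tool.startsWith("browser.");
}

// Models skill context only; outside a skill this rule does not apply.
function placeSkillToolCall(
  tool: string,
  opts: { executionEnvironment?: "sandbox" | "host"; sandboxAvailable: boolean },
): Placement {
  if (!isHighRisk(tool)) return "host"; // low-risk tools are unaffected by this rule
  if (opts.executionEnvironment === "host") return "host"; // explicit escape hatch
  // Default (omitted or "sandbox"): sandbox if an implementation exists, else deny.
  if (opts.sandboxAvailable && SANDBOXED_IMPLS.has(tool)) return "sandbox";
  return "deny";
}
```

The important design property is the fallthrough: when no sandboxed implementation exists, the answer is "deny", never a silent fallback to the host.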

Elevated Mode (Break Glass)

Flynn supports a time-bounded /elevate escape hatch for host execution of sensitive tools.

Session keys: elevation.until_ms, elevation.id, elevation.reason
Command UX requires explicit confirmation (--yes / --confirm)
Expiry is automatic (TTL-based) and emits audit events
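A TTL-based elevation check can be sketched using those session keys. This is an illustration, not src/security/elevation.ts. The Session type, grantElevation, and checkElevation are hypothetical. The caller is assumed to supply the current time and the elevation id, and to emit the expiry audit event when checkElevation reports expired.

```typescript
// Session state as a flat key/value map using the documented key names.
type Session = Record<string, string | number | undefined>;

function grantElevation(
  session: Session, ttlMs: number, reason: string, id: string, confirmed: boolean, nowMs: number,
): void {
  // Mirrors the command UX: no explicit --yes / --confirm, no elevation.
  if (!confirmed) throw new Error("elevation requires explicit confirmation");
  session["elevation.until_ms"] = nowMs + ttlMs;
  session["elevation.id"] = id;
  session["elevation.reason"] = reason;
}

// Active while now < until_ms. On the first check after expiry, clears the
// keys and reports expired=true so the caller can emit an audit event once.
function checkElevation(session: Session, nowMs: number): { active: boolean; expired: boolean } {
  const until = session["elevation.until_ms"];
  if (typeof until !== "number") return { active: false, expired: false };
  if (nowMs < until) return { active: true, expired: false };
  delete session["elevation.until_ms"];
  delete session["elevation.id"];
  delete session["elevation.reason"];
  return { active: false, expired: true };
}
```

Passing the clock in explicitly keeps expiry deterministic and testable, and clearing the keys on expiry means the audit event fires once rather than on every subsequent tool call.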

Implementation is centralized in src/security/elevation.ts and reused by:

src/daemon/routing.ts (channel command fast path)
src/gateway/handlers/agent.ts (websocket/gateway command fast path)
src/frontends/tui/minimal.ts and src/frontends/tui/components/App.tsx (TUI command surfaces)
src/backends/native/agent.ts (per-tool-call elevation context resolution)

Tool enforcement remains in src/tools/executor.ts:

host-sensitive tools are denied when elevation is required but inactive
elevated host high-risk calls still require explicit confirmation via hooks

Prompt Injection Mitigation

Flynn uses a practical defense-in-depth approach:

System prompt guidance: fetched/tool content is treated as untrusted data.
Provenance tagging: tool results are wrapped in provenance markers.
Tool-call guard: when untrusted content has been observed, tool calls with obvious injection markers are blocked.

Provenance Wrapping

Tool results returned to the model are wrapped like:

[provenance=fetched_content tool=web.fetch untrusted=true]
...tool output...
[/provenance]

See: src/backends/native/agent.ts.
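A wrapper that produces the format above is straightforward. The sketch below is hypothetical: wrapProvenance is not the function in src/backends/native/agent.ts. It also adds one hardening step that this document does not state. Marker look-alikes inside the tool output are defanged, so fetched text cannot close the wrapper early or forge an untrusted=false header.

```typescript
function wrapProvenance(tool: string, kind: string, untrusted: boolean, output: string): string {
  // Assumption, not documented behavior: defang "[provenance" / "[/provenance"
  // in the payload by swapping the leading bracket.
  const safe = output.replace(/\[(\/?provenance)/gi, "($1");
  return `[provenance=${kind} tool=${tool} untrusted=${untrusted}]\n${safe}\n[/provenance]`;
}
```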

Tool-Call Guard

When ToolPolicyContext.untrustedContent is true:

High-risk tool calls whose arguments contain obvious markers (for example rm -rf, ignore previous, exfiltrate) are blocked.
Network tools (web.fetch, web.search) refuse arguments containing secret-like fields.

See: src/tools/executor.ts.
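The guard can be sketched as a predicate over the tool name and its arguments. The marker list and the secret-like key pattern below are illustrative assumptions, not the lists in src/tools/executor.ts, and guardToolCall is a hypothetical name. Pattern matching like this catches only obvious cases, which is why it sits behind the other layers rather than replacing them.

```typescript
// Illustrative subset of "obvious" injection markers.
const INJECTION_MARKERS: RegExp[] = [/\brm\s+-rf\b/i, /ignore\s+(all\s+)?previous/i, /exfiltrat/i];
// Illustrative pattern for argument keys that look like credentials.
const SECRET_KEY = /(token|secret|password|passwd|api[_-]?key|authorization|cookie)/i;

const HIGH_RISK = new Set(["shell.exec", "process.start", "process.kill", "file.write", "file.edit", "file.patch"]);
const NETWORK_TOOLS = new Set(["web.fetch", "web.search"]);

// Recursively looks for credential-like keys anywhere in the arguments.
function hasSecretLikeField(value: unknown): boolean {
  if (Array.isArray(value)) return value.some(hasSecretLikeField);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
      if (SECRET_KEY.test(key) || hasSecretLikeField(obj[key])) return true;
    }
  }
  return false;
}

// Returns a denial reason, or undefined if the call may proceed.
function guardToolCall(tool: string, args: unknown, untrustedContent: boolean): string | undefined {
  if (!untrustedContent) return undefined; // guard engages only after untrusted content
  const highRisk = HIGH_RISK.has(tool) || tool.startsWith("browser.");
  if (highRisk && INJECTION_MARKERS.some((re) => re.test(JSON.stringify(args) ?? ""))) {
    return "blocked: injection marker in high-risk tool arguments";
  }
  if (NETWORK_TOOLS.has(tool) && hasSecretLikeField(args)) {
    return "blocked: secret-like field in network tool arguments";
  }
  return undefined;
}
```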

Secret Scopes

Tools can declare which secret scopes they require:

Tool.requiredSecretScopes?: string[]

Skills declare which scopes they are allowed to use:

manifest.json.permissions.secrets?: string[]

Enforcement:

In skill context, if a tool requires scopes not allowed by the skill, ToolExecutor denies the tool.
Outside skill context, secrets are treated as "ambient" (allowed) to preserve backward compatibility.

See:

src/tools/types.ts
src/tools/executor.ts
Examples: src/tools/builtin/gmail.ts, src/tools/builtin/gcal.ts, src/tools/builtin/web-search.ts
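The scope rule amounts to a subset check that applies only in skill context. This is a minimal sketch: secretScopesAllowed and the ToolLike and ScopeContext shapes are hypothetical, not the code in src/tools/executor.ts. The scope strings used when exercising it (gmail, web_search) are taken from the manifest example earlier in this document.

```typescript
interface ToolLike {
  name: string;
  requiredSecretScopes?: string[];
}

interface ScopeContext {
  skillName?: string;
  allowedSecrets?: string[]; // from manifest.json permissions.secrets
}

function secretScopesAllowed(tool: ToolLike, ctx: ScopeContext): boolean {
  // Outside skill context, secrets are ambient (backward compatibility).
  if (ctx.skillName === undefined) return true;
  const allowed = new Set(ctx.allowedSecrets ?? []);
  // Every required scope must be declared by the skill; tools with no
  // required scopes pass trivially.
  return (tool.requiredSecretScopes ?? []).every((scope) => allowed.has(scope));
}
```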

Audit Logging (Without Secret Leaks)

Tool execution is audited, but sensitive values are redacted before writing to disk.

See:

src/audit/logger.ts
src/audit/types.ts
src/audit/redact.ts

Notable fields and events:

execution_id: a per-tool-call UUID for correlation
execution_environment: host or sandbox
skill_name: active skill (if any)
redactions_applied: count of redaction operations
tool.approval: emitted when a confirm hook is resolved

Example tool start event (JSONL):

{
  "timestamp": 0,
  "level": "debug",
  "event_type": "tool.start",
  "event": {
    "tool_name": "shell.exec",
    "execution_id": "...",
    "execution_environment": "sandbox",
    "skill_name": "my-web-skill",
    "redactions_applied": 1,
    "tool_args": { "command": "echo [REDACTED_TOKEN]" }
  }
}
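Redaction with a counter can be sketched as a recursive walk over the tool arguments, performed before anything is serialized. The token patterns and the SENSITIVE_KEY rule below are illustrative assumptions, not the rules in src/audit/redact.ts, and redact is a hypothetical name. The returned count is the kind of number that would feed redactions_applied.

```typescript
// Illustrative token shapes only; a real rule set would be larger.
const TOKEN_PATTERNS: RegExp[] = [
  /\bgh[pousr]_[A-Za-z0-9]{20,}\b/g, // GitHub-style tokens
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, // bearer credentials
];
// Values under keys like these are replaced wholesale.
const SENSITIVE_KEY = /(password|secret|token|api[_-]?key|authorization)/i;

function redact(input: unknown): { value: unknown; count: number } {
  let count = 0;
  const walk = (v: unknown, key?: string): unknown => {
    if (key !== undefined && SENSITIVE_KEY.test(key) && v !== null && v !== undefined) {
      count++;
      return "[REDACTED]";
    }
    if (typeof v === "string") {
      let s = v;
      for (const re of TOKEN_PATTERNS) {
        // Each match is one redaction operation.
        s = s.replace(re, () => {
          count++;
          return "[REDACTED_TOKEN]";
        });
      }
      return s;
    }
    if (Array.isArray(v)) return v.map((item) => walk(item));
    if (v !== null && typeof v === "object") {
      const obj = v as Record<string, unknown>;
      const out: Record<string, unknown> = {};
      for (const k of Object.keys(obj)) out[k] = walk(obj[k], k);
      return out;
    }
    return v; // numbers, booleans, null pass through unchanged
  };
  const value = walk(input);
  return { value, count };
}
```

Redacting the argument object before it is written, rather than filtering log lines afterwards, means the raw secret never reaches the JSONL file at all.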

Recommended Operator Defaults

Enable Docker sandboxing (sandbox.enabled: true).
Enable DM pairing (pairing.enabled: true) on any messaging surface.
Use a conservative tool profile for general chat (tools.profile: messaging).
Use skill intent routing for specialized workflows and keep skill permissions narrow.
