# Safe-By-Default Personal Agent

This document describes Flynn's "OpenClaw-style" safety boundary: how skills declare capabilities, how those capabilities are enforced at runtime, how high-risk execution is sandboxed by default, how prompt injection is mitigated, and what gets logged (without leaking secrets).

If you're looking for API-level tool contracts, see `docs/api/TOOLS.md`.

## Overview

Flynn is built around a strict separation of:

- **Conversation** (LLM output)
- **Capabilities** (tools)
- **Policy** (what tools are allowed, under what conditions)

This milestone adds a skill capability layer and hardens the tool loop.

Core principles:

- Capability declarations beat intentions: skills get only what they declare.
- Deny by default: a skill without a `permissions` manifest has no tool access.
- Treat fetched/tool content as untrusted data, not instructions.
- Never leak secrets into audit logs.

## Skills: Capability Manifests

Each skill lives in a directory with:

- `SKILL.md` (instructions injected into the system prompt)
- `manifest.json` (metadata + optional capabilities)

Capabilities are declared in the `permissions` object of `manifest.json`.

See: `src/skills/types.ts`.

### `permissions` Schema (manifest.json)

```json
{
  "permissions": {
    "tool_groups": ["group:web", "group:memory"],
    "tools": ["web.fetch", "web.search"],
    "fs": {
      "read": ["/home/will/Documents/**"],
      "write": ["/home/will/Documents/notes/**"]
    },
    "net": [
      { "host": "api.todoist.com", "ports": [443] },
      { "host": "*.github.com", "ports": [443] }
    ],
    "secrets": ["gmail", "web_search"],
    "execution_environment": "sandbox"
  }
}
```

Fields:

- `tool_groups`: tool-group allowlist using names from `src/tools/policy.ts` (`group:web`, `group:fs`, etc.)
- `tools`: explicit tool-name/pattern allowlist (glob). If present, it overrides `tool_groups` (so in the example above, `tool_groups` has no effect).
- `fs.read` / `fs.write`: allowed path globs (checked for `file.*` tools).
- `net`: allowed hosts (glob) and optional port list (best-effort enforcement for `web.fetch`).
- `secrets`: secret scopes allowed for this skill (used to gate credentialed tools).
- `execution_environment`: `sandbox` (default) or `host` (escape hatch for high-risk operations).
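
The fields above map naturally onto a TypeScript shape. The sketch below is illustrative only: the interface is modeled on the JSON example rather than copied from `src/skills/types.ts`, and `hostMatches` / `isNetAllowed` are hypothetical helpers showing one way the `net` allowlist (host glob plus optional ports) could be checked. It assumes `*.` matches one or more leading labels and that a missing `net` list means no network access.

```typescript
// Hypothetical shape modeled on the manifest.json example; the real
// definitions live in src/skills/types.ts and may differ.
interface NetRule {
  host: string;     // exact host or a glob such as "*.github.com"
  ports?: number[]; // omitted => any port
}

interface SkillPermissions {
  tool_groups?: string[];
  tools?: string[];
  fs?: { read?: string[]; write?: string[] };
  net?: NetRule[];
  secrets?: string[];
  execution_environment?: "sandbox" | "host";
}

// "*.github.com" matches "api.github.com" but not the bare "github.com".
function hostMatches(pattern: string, host: string): boolean {
  const p = pattern.toLowerCase();
  const h = host.toLowerCase();
  if (p.startsWith("*.")) {
    const suffix = p.slice(1); // ".github.com"
    return h.endsWith(suffix) && h.length > suffix.length;
  }
  return p === h;
}

function isNetAllowed(perms: SkillPermissions, host: string, port: number): boolean {
  // Assumption: no net rules declared => no network access (deny by default).
  return (perms.net ?? []).some(
    (rule) =>
      hostMatches(rule.host, host) &&
      (rule.ports === undefined || rule.ports.includes(port)),
  );
}

const example: SkillPermissions = {
  net: [
    { host: "api.todoist.com", ports: [443] },
    { host: "*.github.com", ports: [443] },
  ],
  execution_environment: "sandbox",
};

console.log(isNetAllowed(example, "api.github.com", 443)); // true
console.log(isNetAllowed(example, "api.github.com", 80));  // false
console.log(isNetAllowed(example, "github.com", 443));     // false
```

Note the wildcard design choice: a bare `github.com` is not covered by `*.github.com`, so a skill that needs both must list both.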

### Backward Compatibility

Skills without `permissions` still load, but:

- If a skill is activated (via routing) and it has no `permissions` block, **it has no tool access**.
- This is deliberate: skills should be auditable capability packages.

## Registry Trust Model (ClawHub / Community Catalogs)

Registry catalogs are discovery metadata, not trusted code.

- Flynn supports registry discovery and install-by-id via `flynn skills registry *` and `flynn skills install --registry-id`.
- Registry metadata fields such as `publisher`, `homepage`, and `sha256` are treated as **declared/unverified**.
- Non-local registry sources require explicit operator confirmation (`--confirm`) during install.
- Resolved sources (local/git/archive) are still routed through the same skill scanner and installer safety gates.
- Registry-driven installs emit dedicated audit events (`skills.registry_install`) including registry id/source and outcome.

Operationally: treat a registry as a candidate index. Trust is established by your own review and scanner outcomes, not by catalog claims alone.

## Runtime Enforcement

Enforcement happens in two places:

1. **Tool listing / exposure** (ToolPolicy)
2. **Tool execution** (ToolExecutor) — defense in depth

### ToolPolicy: Restricting Available Tools

When a skill context is active, the tool allow set is intersected with the skill's declared allowlist.

See: `src/tools/policy.ts`.

Important behaviors:

- If `skillName` is set but `skillPermissions` is missing, ToolPolicy returns an empty allowed set.
- If `permissions.tools` is present, it overrides `permissions.tool_groups`.
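
A minimal sketch of the two rules above, assuming a hypothetical `TOOL_GROUPS` table (the real group names and members live in `src/tools/policy.ts`) and simple `*` globs over dotted tool names. `resolveAllowedTools` is an illustrative name, not Flynn's API: it intersects the base allow set with the skill's declaration and returns an empty set when a skill has no `permissions` block.

```typescript
// Hypothetical group table; the real groups are defined in src/tools/policy.ts.
const TOOL_GROUPS: Record<string, string[]> = {
  "group:web": ["web.fetch", "web.search"],
  "group:fs": ["file.read", "file.list", "file.write"],
};

interface PolicyContext {
  skillName?: string;
  skillPermissions?: { tool_groups?: string[]; tools?: string[] };
}

// Convert a simple tool glob ("web.*") into an anchored regex.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function resolveAllowedTools(baseAllowed: string[], ctx: PolicyContext): Set<string> {
  if (!ctx.skillName) return new Set(baseAllowed); // no skill: base policy applies
  const perms = ctx.skillPermissions;
  if (!perms) return new Set(); // skill without a permissions block: no tools

  let declared: (tool: string) => boolean;
  if (perms.tools !== undefined) {
    // `tools` overrides `tool_groups` when present.
    const patterns = perms.tools.map(globToRegExp);
    declared = (tool) => patterns.some((re) => re.test(tool));
  } else {
    const members = new Set((perms.tool_groups ?? []).flatMap((g) => TOOL_GROUPS[g] ?? []));
    declared = (tool) => members.has(tool);
  }
  // Intersection: a skill can only narrow the base set, never widen it.
  return new Set(baseAllowed.filter(declared));
}

const base = ["web.fetch", "web.search", "file.read", "shell.exec"];
console.log(Array.from(resolveAllowedTools(base, { skillName: "my-web-skill" }))); // []
console.log(
  Array.from(
    resolveAllowedTools(base, {
      skillName: "my-web-skill",
      skillPermissions: { tool_groups: ["group:fs"], tools: ["web.*"] },
    }),
  ),
); // ["web.fetch", "web.search"]
```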

### ToolExecutor: Enforcing Paths, Network, Secrets, and Injection Guards

See: `src/tools/executor.ts`.

When a skill context is active (`ToolPolicyContext.skillName`):

- Filesystem writes are blocked outside `permissions.fs.write`.
- Filesystem reads are blocked outside `permissions.fs.read` (for `file.read`/`file.list`).
- Credentialed tools require that their `requiredSecretScopes` be present in the skill's allowed scopes.
- If untrusted content has been seen, high-risk tool calls whose arguments contain obviously malicious markers are blocked.
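
The path checks can be sketched as glob matching over normalized absolute paths. This is an assumption-laden illustration, not the code in `src/tools/executor.ts`: `isPathAllowed` is a hypothetical helper, `**` is assumed to cross directory boundaries while `*` stays within one segment, and only lexical `..` normalization is shown (symlinks are not resolved here).

```typescript
type FsOp = "read" | "write";
interface FsPermissions {
  read?: string[];
  write?: string[];
}

// Collapse "." and ".." lexically so "/a/Documents/../.ssh" cannot slip past "/a/Documents/**".
function normalizeAbsolute(p: string): string | null {
  if (!p.startsWith("/")) return null; // assumption: only absolute POSIX paths are checked
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

function pathGlobToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob.charAt(i);
    if (c === "*" && glob.charAt(i + 1) === "*") {
      re += ".*"; // "**" crosses directory boundaries
      i++;
    } else if (c === "*") {
      re += "[^/]*"; // "*" stays within one path segment
    } else {
      re += c.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${re}$`);
}

function isPathAllowed(fs: FsPermissions | undefined, op: FsOp, rawPath: string): boolean {
  const normalized = normalizeAbsolute(rawPath);
  if (normalized === null) return false;
  const globs = fs?.[op] ?? []; // nothing declared => denied
  return globs.some((g) => pathGlobToRegExp(g).test(normalized));
}

const fsPerms: FsPermissions = {
  read: ["/home/will/Documents/**"],
  write: ["/home/will/Documents/notes/**"],
};
console.log(isPathAllowed(fsPerms, "read", "/home/will/Documents/todo.md"));        // true
console.log(isPathAllowed(fsPerms, "write", "/home/will/Documents/todo.md"));       // false
console.log(isPathAllowed(fsPerms, "read", "/home/will/Documents/../.ssh/id_rsa")); // false
```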

## Skill Routing (Intents)

Skills can be activated via intent rules.

See:

- Config schema: `src/config/schema.ts` (`intents.rules[].target.type = 'skill'`)
- Routing: `src/daemon/routing.ts`

Example config:

```yaml
intents:
  enabled: true
  match_threshold: 0.7
  rules:
    - name: "web-research"
      patterns: ["research *", "look up *"]
      target: { type: skill, name: my-web-skill }
      enabled: true
```

When an intent routes to a skill:

- `toolPolicyContext.skillName` and `toolPolicyContext.skillPermissions` are set
- High-risk execution defaults to sandbox (when available)

## Sandbox-By-Default (High-Risk Tools)

In skill context, high-risk tools are not allowed to run on the host unless the skill explicitly opts in.

High-risk tools include:

- `shell.exec`
- `process.start`
- `process.kill`
- `file.write`, `file.edit`, `file.patch`
- all `browser.*`

Behavior:

- Default (`execution_environment` omitted or `sandbox`):
  - If Docker sandbox is enabled and available, `shell.exec` and `process.start` run inside the per-session sandbox container.
  - If the sandbox is not available, host execution of high-risk tools is denied in skill contexts.
- Escape hatch (`execution_environment: host`): high-risk tools are permitted to run on host (still subject to tool policy + hooks/autonomy).

Note: today, only `shell.exec` and `process.start` are replaced with sandboxed implementations. Other high-risk tools are blocked by default in skill contexts unless host mode is explicitly allowed.
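
The behavior above reduces to a small decision function. This is a hypothetical sketch (`decideExecution` and its parameters are illustrative names, not Flynn's API) that assumes the high-risk list above is complete and treats `browser.*` as a prefix match. Hooks, autonomy, and elevation checks that apply afterwards are out of scope here.

```typescript
type Decision =
  | { kind: "host" }
  | { kind: "sandbox" }
  | { kind: "deny"; reason: string };

const HIGH_RISK = new Set([
  "shell.exec",
  "process.start",
  "process.kill",
  "file.write",
  "file.edit",
  "file.patch",
]);
// Only these currently have sandboxed replacements.
const SANDBOXABLE = new Set(["shell.exec", "process.start"]);

function isHighRisk(tool: string): boolean {
  return HIGH_RISK.has(tool) || tool.startsWith("browser.");
}

function decideExecution(
  tool: string,
  inSkillContext: boolean,
  executionEnvironment: "sandbox" | "host" | undefined,
  sandboxAvailable: boolean,
): Decision {
  // This rule only governs high-risk tools in skill context; other gates may still apply.
  if (!inSkillContext || !isHighRisk(tool)) return { kind: "host" };
  // Escape hatch: host mode, still subject to tool policy and hooks elsewhere.
  if (executionEnvironment === "host") return { kind: "host" };
  // Default (omitted or "sandbox"): sandbox when possible, otherwise deny.
  if (sandboxAvailable && SANDBOXABLE.has(tool)) return { kind: "sandbox" };
  return { kind: "deny", reason: `${tool} needs a sandbox or an explicit host opt-in` };
}

console.log(decideExecution("shell.exec", true, undefined, true).kind);  // "sandbox"
console.log(decideExecution("file.write", true, "sandbox", true).kind);  // "deny"
```

Denying (rather than silently falling back to host) when no sandbox exists keeps the failure visible to the operator.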

## Elevated Mode (Break Glass)

Flynn supports a time-bounded `/elevate` escape hatch for host execution of sensitive tools.

- Session keys: `elevation.until_ms`, `elevation.id`, `elevation.reason`
- Command UX requires explicit confirmation (`--yes` / `--confirm`)
- Expiry is automatic (TTL-based) and emits audit events

Implementation is centralized in `src/security/elevation.ts` and reused by:

- `src/daemon/routing.ts` (channel command fast path)
- `src/gateway/handlers/agent.ts` (websocket/gateway command fast path)
- `src/frontends/tui/minimal.ts` and `src/frontends/tui/components/App.tsx` (TUI command surfaces)
- `src/backends/native/agent.ts` (per-tool-call elevation context resolution)

Tool enforcement remains in `src/tools/executor.ts`:

- host-sensitive tools are denied when elevation is required but inactive
- elevated host high-risk calls still require explicit confirmation via hooks
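
A time-bounded elevation can be modeled directly on the session keys listed above. The sketch below is hypothetical (`grantElevation`, `checkElevation`, and the flat `Map` session are assumptions, not the API of `src/security/elevation.ts`); it takes the clock as a parameter and reports expiry through a `justExpired` flag where a real implementation would emit an audit event.

```typescript
type Session = Map<string, string | number>;

interface ElevationStatus {
  active: boolean;
  id?: string;
  reason?: string;
  justExpired: boolean; // a caller would emit an expiry audit event when true
}

function grantElevation(
  session: Session,
  id: string,
  reason: string,
  ttlMs: number,
  nowMs: number,
  confirmed: boolean, // mirrors the explicit --yes / --confirm requirement
): void {
  if (!confirmed) throw new Error("elevation requires explicit confirmation");
  session.set("elevation.until_ms", nowMs + ttlMs);
  session.set("elevation.id", id);
  session.set("elevation.reason", reason);
}

function checkElevation(session: Session, nowMs: number): ElevationStatus {
  const until = session.get("elevation.until_ms");
  if (typeof until !== "number") return { active: false, justExpired: false };
  if (nowMs < until) {
    return {
      active: true,
      id: String(session.get("elevation.id") ?? ""),
      reason: String(session.get("elevation.reason") ?? ""),
      justExpired: false,
    };
  }
  // TTL elapsed: clear the keys so expiry is reported exactly once.
  session.delete("elevation.until_ms");
  session.delete("elevation.id");
  session.delete("elevation.reason");
  return { active: false, justExpired: true };
}

const session: Session = new Map();
grantElevation(session, "elev-1", "fix disk", 60000, 1000, true);
console.log(checkElevation(session, 30000).active);      // true
console.log(checkElevation(session, 61000).justExpired); // true
console.log(checkElevation(session, 62000).justExpired); // false
```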

## Prompt Injection Mitigation

Flynn uses a practical defense-in-depth approach:

1. System prompt guidance: fetched/tool content is treated as untrusted data.
2. Provenance tagging: tool results are wrapped in provenance markers.
3. Tool-call guard: when untrusted content has been observed, tool calls with obvious injection markers are blocked.

### Provenance Wrapping

Tool results returned to the model are wrapped like:

```text
[provenance=fetched_content tool=web.fetch untrusted=true]
...tool output...
[/provenance]
```

See: `src/backends/native/agent.ts`.
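
A minimal sketch of producing the wrapper shown above. `wrapToolResult` is a hypothetical name, and the provenance label is passed in because how Flynn picks it per tool is not described here. One hardening step in the sketch is an assumption rather than documented Flynn behavior: literal `[provenance` / `[/provenance` sequences inside the output are defanged so fetched content cannot forge or close the wrapper early.

```typescript
function wrapToolResult(
  tool: string,
  provenance: string,
  untrusted: boolean,
  output: string,
): string {
  // Assumed hardening: turn "[provenance..." and "[/provenance..." in the body into "(...".
  const neutralized = output.replace(/\[(\/?provenance)/g, "($1");
  return `[provenance=${provenance} tool=${tool} untrusted=${untrusted}]\n${neutralized}\n[/provenance]`;
}

console.log(wrapToolResult("web.fetch", "fetched_content", true, "hello"));
// [provenance=fetched_content tool=web.fetch untrusted=true]
// hello
// [/provenance]
```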

### Tool-Call Guard

When `ToolPolicyContext.untrustedContent` is true:

- High-risk tool calls whose args contain obvious markers (e.g. `rm -rf`, `ignore previous`, `exfiltrate`) are blocked.
- Network tools (`web.fetch`, `web.search`) refuse arguments containing secret-like fields.

See: `src/tools/executor.ts`.
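
The guard can be sketched as two substring/regex checks that only arm once untrusted content has been seen. Everything here is illustrative: `guardToolCall` is a hypothetical name, the marker list is just the examples above (deliberately incomplete), `SECRET_KEY_PATTERN` is an assumed heuristic for "secret-like fields", and only top-level argument keys are inspected. Like the real guard, it targets obvious cases and is easy to evade; it is one layer, not a boundary.

```typescript
const INJECTION_MARKERS = ["rm -rf", "ignore previous", "exfiltrate"];
// Assumed heuristic for secret-like argument names.
const SECRET_KEY_PATTERN = /(api[_-]?key|token|secret|password|authorization)/i;
const NETWORK_TOOLS = new Set(["web.fetch", "web.search"]);

type Args = Record<string, unknown>;

// Returns a block reason, or null when the call may proceed.
function guardToolCall(
  tool: string,
  args: Args,
  untrustedContent: boolean,
  highRisk: boolean,
): string | null {
  if (!untrustedContent) return null; // guard arms only after untrusted content is observed
  if (highRisk) {
    const haystack = JSON.stringify(args).toLowerCase();
    const hit = INJECTION_MARKERS.find((m) => haystack.includes(m));
    if (hit) return `blocked: argument contains injection marker "${hit}"`;
  }
  if (NETWORK_TOOLS.has(tool)) {
    const key = Object.keys(args).find((k) => SECRET_KEY_PATTERN.test(k));
    if (key) return `blocked: ${tool} argument "${key}" looks secret-like`;
  }
  return null;
}

console.log(guardToolCall("shell.exec", { command: "rm -rf ~" }, true, true));
// blocked: argument contains injection marker "rm -rf"
```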

## Secret Scopes

Tools can declare which secret scopes they require:

- `Tool.requiredSecretScopes?: string[]`

Skills declare which scopes they are allowed to use:

- `manifest.json.permissions.secrets?: string[]`

Enforcement:

- In skill context, if a tool requires scopes not allowed by the skill, ToolExecutor denies the tool.
- Outside skill context, secrets are treated as "ambient" (allowed) to preserve backward compatibility.

See:

- `src/tools/types.ts`
- `src/tools/executor.ts`
- Examples: `src/tools/builtin/gmail.ts`, `src/tools/builtin/gcal.ts`, `src/tools/builtin/web-search.ts`
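
The enforcement rule above amounts to a set difference between required and allowed scopes. This is a hedged sketch: `ToolLike`, `checkSecretScopes`, and the tool name `gmail.send` are hypothetical, while the scope names come from the manifest example earlier in this document.

```typescript
interface ToolLike {
  name: string;
  requiredSecretScopes?: string[];
}

type ScopeCheck = { ok: true } | { ok: false; missing: string[] };

function checkSecretScopes(
  tool: ToolLike,
  skillName: string | undefined,
  allowedScopes: string[] | undefined,
): ScopeCheck {
  // Outside skill context, secrets stay "ambient" for backward compatibility.
  if (skillName === undefined) return { ok: true };
  const allowed = new Set(allowedScopes ?? []); // no `secrets` declared => nothing allowed
  const missing = (tool.requiredSecretScopes ?? []).filter((s) => !allowed.has(s));
  return missing.length === 0 ? { ok: true } : { ok: false, missing };
}

// Hypothetical credentialed tool requiring the "gmail" scope.
const gmailTool: ToolLike = { name: "gmail.send", requiredSecretScopes: ["gmail"] };

console.log(checkSecretScopes(gmailTool, "my-web-skill", ["web_search"]));
// { ok: false, missing: [ 'gmail' ] }
```

Returning the missing scopes (rather than a bare boolean) gives the denial message and audit log something concrete to report.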

## Audit Logging (Without Secret Leaks)

Tool execution is audited, but sensitive values are redacted before writing to disk.

See:

- `src/audit/logger.ts`
- `src/audit/types.ts`
- `src/audit/redact.ts`

Notable fields:

- `execution_id`: a per-tool-call UUID for correlation
- `execution_environment`: `host` or `sandbox`
- `skill_name`: active skill (if any)
- `redactions_applied`: count of redaction operations
- `tool.approval` (an event type, not a field): emitted when a confirm hook is resolved

Example tool start event (JSONL):

```json
{
  "timestamp": 0,
  "level": "debug",
  "event_type": "tool.start",
  "event": {
    "tool_name": "shell.exec",
    "execution_id": "...",
    "execution_environment": "sandbox",
    "skill_name": "my-web-skill",
    "redactions_applied": 1,
    "tool_args": { "command": "echo [REDACTED_TOKEN]" }
  }
}
```
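
Redaction before writing can be sketched as a recursive walk that replaces sensitive values and counts replacements, yielding both the redacted `tool_args` and a `redactions_applied` count like the event above. The patterns are assumptions for illustration, not the rules in `src/audit/redact.ts`; `[REDACTED_TOKEN]` is taken from the example event, while `[REDACTED]` for secret-named keys is a hypothetical placeholder.

```typescript
interface RedactionResult {
  value: unknown;
  count: number;
}

// Assumed heuristics; the real rules live in src/audit/redact.ts.
const SECRET_KEY = /(password|secret|token|api[_-]?key|authorization)/i;
const TOKEN_LIKE = /\b(?:sk|ghp|xox[bp])[-_][A-Za-z0-9_-]{8,}\b/g;

function redact(input: unknown): RedactionResult {
  let count = 0;
  const walk = (v: unknown, key?: string): unknown => {
    if (typeof v === "string") {
      // Whole value is dropped when the field name itself looks secret.
      if (key !== undefined && SECRET_KEY.test(key)) {
        count++;
        return "[REDACTED]";
      }
      // Otherwise scrub token-shaped substrings inside free text.
      return v.replace(TOKEN_LIKE, () => {
        count++;
        return "[REDACTED_TOKEN]";
      });
    }
    if (Array.isArray(v)) return v.map((item) => walk(item));
    if (v !== null && typeof v === "object") {
      const out: Record<string, unknown> = {};
      for (const [k, child] of Object.entries(v)) out[k] = walk(child, k);
      return out;
    }
    return v; // numbers, booleans, null pass through unchanged
  };
  return { value: walk(input), count };
}

const { value, count } = redact({ command: "echo sk-abcdef123456" });
console.log(JSON.stringify(value), count); // {"command":"echo [REDACTED_TOKEN]"} 1
```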

## Recommended Operator Defaults

- Enable Docker sandboxing (`sandbox.enabled: true`).
- Enable DM pairing (`pairing.enabled: true`) on any messaging surface.
- Use a conservative tool profile for general chat (`tools.profile: messaging`).
- Use skill intent routing for specialized workflows and keep skill permissions narrow.