docs: add safety docs and OpenClaw gap roadmap
@@ -0,0 +1,240 @@
# Safe-By-Default Personal Agent

This document describes Flynn's "OpenClaw-style" safety boundary: how skills declare capabilities, how those capabilities are enforced at runtime, how high-risk execution is sandboxed by default, how prompt injection is mitigated, and what gets logged (without leaking secrets).

If you're looking for API-level tool contracts, see `docs/api/TOOLS.md`.

## Overview

Flynn is built around a strict separation of:

- **Conversation** (LLM output)
- **Capabilities** (tools)
- **Policy** (what tools are allowed, under what conditions)

This milestone adds a skill capability layer and hardens the tool loop.

Core principles:

- Capability declarations beat intentions: skills get only what they declare.
- Deny by default: a skill without a `permissions` block in its manifest has no tool access.
- Treat fetched/tool content as untrusted data, not instructions.
- Never leak secrets into audit logs.

## Skills: Capability Manifests

Each skill lives in a directory with:

- `SKILL.md` (instructions injected into the system prompt)
- `manifest.json` (metadata + optional capabilities)

The capability declaration is `manifest.json.permissions`.

See: `src/skills/types.ts`.

### `permissions` Schema (manifest.json)

```json
{
  "permissions": {
    "tool_groups": ["group:web", "group:memory"],
    "tools": ["web.fetch", "web.search"],
    "fs": {
      "read": ["/home/will/Documents/**"],
      "write": ["/home/will/Documents/notes/**"]
    },
    "net": [
      { "host": "api.todoist.com", "ports": [443] },
      { "host": "*.github.com", "ports": [443] }
    ],
    "secrets": ["gmail", "web_search"],
    "execution_environment": "sandbox"
  }
}
```

Fields:

- `tool_groups`: tool-group allowlist using names from `src/tools/policy.ts` (`group:web`, `group:fs`, etc.)
- `tools`: explicit tool-name/pattern allowlist (glob). If present, it overrides `tool_groups`.
- `fs.read` / `fs.write`: allowed path globs (checked for `file.*` tools).
- `net`: allowed hosts (glob) and optional port list (best-effort enforcement for `web.fetch`).
- `secrets`: secret scopes allowed for this skill (used to gate credentialed tools).
- `execution_environment`: `sandbox` (default) or `host` (escape hatch for high-risk operations).

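
As a reading aid, the schema above could be typed roughly as follows. This is a sketch under assumptions: the names `SkillPermissions`, `NetRule`, and `effectiveEnvironment` are hypothetical, and the actual definitions in `src/skills/types.ts` may differ.

```typescript
// Hypothetical types mirroring the documented manifest fields.
// The real definitions live in src/skills/types.ts and may differ.
type ExecutionEnvironment = "sandbox" | "host";

interface NetRule {
  host: string;      // host glob, e.g. "*.github.com"
  ports?: number[];  // optional port allowlist
}

interface SkillPermissions {
  tool_groups?: string[];
  tools?: string[];  // if present, overrides tool_groups
  fs?: { read?: string[]; write?: string[] };
  net?: NetRule[];
  secrets?: string[];
  execution_environment?: ExecutionEnvironment;
}

// An omitted execution_environment is treated as "sandbox" (the documented default).
function effectiveEnvironment(p: SkillPermissions): ExecutionEnvironment {
  return p.execution_environment ?? "sandbox";
}
```

Every field is optional in this sketch, so an empty `permissions` object grants nothing beyond the sandbox default.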
### Backward Compatibility

Skills without `permissions` still load, but:

- If a skill is activated (via routing) and it has no `permissions` block, **it has no tool access**.
- This is deliberate: skills should be auditable capability packages.

## Runtime Enforcement

Enforcement happens in two places:

1. **Tool listing / exposure** (ToolPolicy)
2. **Tool execution** (ToolExecutor), as defense in depth

### ToolPolicy: Restricting Available Tools

When a skill context is active, the tool allow set is intersected with the skill's declared allowlist.

See: `src/tools/policy.ts`.

Important behaviors:

- If `skillName` is set but `skillPermissions` is missing, ToolPolicy returns an empty allowed set.
- If `permissions.tools` is present, it overrides `permissions.tool_groups`.

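
The intersection and the two behaviors above can be sketched as below. This is an illustration, not the code in `src/tools/policy.ts`: `allowedTools`, `globToRegExp`, `PolicyContext`, and the `groups` map passed in are hypothetical, and only `*` is treated as a wildcard.

```typescript
// Convert a simple glob such as "web.*" into an anchored RegExp.
// Assumption: "*" is the only wildcard; everything else is literal.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

// Hypothetical subset of the policy context described in this document.
interface PolicyContext {
  skillName?: string;
  skillPermissions?: { tools?: string[]; tool_groups?: string[] };
}

function allowedTools(
  base: string[],
  ctx: PolicyContext,
  groups: Record<string, string[]>, // group name -> tool patterns (illustrative)
): string[] {
  if (!ctx.skillName) return base; // no skill context: base set unchanged
  const perms = ctx.skillPermissions;
  if (!perms) return []; // skill without permissions: deny by default
  // `tools`, when present, overrides `tool_groups`.
  const patterns =
    perms.tools ?? (perms.tool_groups ?? []).flatMap((g) => groups[g] ?? []);
  const matchers = patterns.map(globToRegExp);
  // Intersect: keep only base tools matched by the skill's allowlist.
  return base.filter((name) => matchers.some((re) => re.test(name)));
}
```

Because the result is always a filter over `base`, a skill can only narrow the tool set in this sketch, never widen it.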
### ToolExecutor: Enforcing Paths, Network, Secrets, and Injection Guards

See: `src/tools/executor.ts`.

When a skill context is active (`ToolPolicyContext.skillName`):

- Filesystem writes are blocked outside `permissions.fs.write`.
- Filesystem reads are blocked outside `permissions.fs.read` (for `file.read`/`file.list`).
- Credentialed tools require that their `requiredSecretScopes` be present in the skill's allowed scopes.
- If untrusted content has been seen, high-risk tool calls whose arguments contain obviously malicious markers can be blocked.

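
A minimal sketch of the path check, assuming `**` matches across directory separators and `*` matches within a single segment. The helper names are hypothetical, and rejecting any path that contains a `..` segment is an assumption of this sketch, not something the source states.

```typescript
// Convert a path glob to an anchored RegExp.
// Assumption: "**" spans "/" boundaries, "*" stays within one path segment.
function pathGlobToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // placeholder so "**" survives the "*" pass
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, ".*");
  return new RegExp(`^${pattern}$`);
}

// Returns true only when the path matches at least one allowed glob.
function isPathAllowed(path: string, globs: string[] | undefined): boolean {
  if (!globs || globs.length === 0) return false; // nothing declared: deny
  // Assumption of this sketch: refuse ".." segments rather than resolve them.
  if (path.split("/").includes("..")) return false;
  return globs.some((g) => pathGlobToRegExp(g).test(path));
}
```

With the example manifest, `isPathAllowed("/home/will/Documents/notes/a.md", ["/home/will/Documents/notes/**"])` is `true`, while a path outside that tree is rejected.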
## Skill Routing (Intents)

Skills can be activated via intent rules.

See:

- Config schema: `src/config/schema.ts` (`intents.rules[].target.type = 'skill'`)
- Routing: `src/daemon/routing.ts`

Example config:

```yaml
intents:
  enabled: true
  match_threshold: 0.7
  rules:
    - name: "web-research"
      patterns: ["research *", "look up *"]
      target: { type: skill, name: my-web-skill }
      enabled: true
```

When an intent routes to a skill:

- `toolPolicyContext.skillName` and `toolPolicyContext.skillPermissions` are set
- High-risk execution defaults to sandbox (when available)

## Sandbox-By-Default (High-Risk Tools)

In skill context, high-risk tools are not allowed to run on the host unless the skill explicitly opts in.

High-risk tools include:

- `shell.exec`
- `process.start`
- `process.kill`
- `file.write`, `file.edit`, `file.patch`
- all `browser.*`

Behavior:

- Default (`execution_environment` omitted or `sandbox`):
  - If the Docker sandbox is enabled and available, `shell.exec` and `process.start` run inside the per-session sandbox container.
  - If the sandbox is not available, host execution of high-risk tools is denied in skill contexts.
- Escape hatch (`execution_environment: host`): high-risk tools are permitted to run on the host (still subject to tool policy + hooks/autonomy).

Note: today, only `shell.exec` and `process.start` are replaced with sandboxed implementations. Other high-risk tools are blocked by default in skill contexts unless host mode is explicitly allowed.

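
The behavior above reduces to a small decision table for a tool call made in skill context. The sketch below is illustrative only: `decideExecution` and its option names are hypothetical, and returning `"host"` for tools that are not high-risk is a simplifying assumption.

```typescript
type Decision = "sandbox" | "host" | "deny";

// Tools that currently have sandboxed implementations, per the note above.
const SANDBOXED_TOOLS = new Set(["shell.exec", "process.start"]);

const HIGH_RISK_TOOLS = new Set([
  "shell.exec", "process.start", "process.kill",
  "file.write", "file.edit", "file.patch",
]);

function isHighRisk(tool: string): boolean {
  return HIGH_RISK_TOOLS.has(tool) || tool.startsWith("browser.");
}

// Models a tool call made while a skill context is active.
function decideExecution(
  tool: string,
  opts: { executionEnvironment?: "sandbox" | "host"; sandboxAvailable: boolean },
): Decision {
  if (!isHighRisk(tool)) return "host"; // assumption: other tools are unaffected
  if (opts.executionEnvironment === "host") return "host"; // explicit escape hatch
  if (opts.sandboxAvailable && SANDBOXED_TOOLS.has(tool)) return "sandbox";
  return "deny"; // no sandbox, or no sandboxed implementation for this tool
}
```

Under these assumptions, `file.write` is denied even when the sandbox is available, because only `shell.exec` and `process.start` have sandboxed implementations today.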
## Prompt Injection Mitigation

Flynn uses a practical defense-in-depth approach:

1. System prompt guidance: fetched/tool content is treated as untrusted data.
2. Provenance tagging: tool results are wrapped in provenance markers.
3. Tool-call guard: when untrusted content has been observed, tool calls with obvious injection markers are blocked.

### Provenance Wrapping

Tool results returned to the model are wrapped like:

```text
[provenance=fetched_content tool=web.fetch untrusted=true]
...tool output...
[/provenance]
```

See: `src/backends/native/agent.ts`.

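
A wrapper producing this format might look like the sketch below. `wrapWithProvenance` is a hypothetical name, and the step that neutralizes a closing marker embedded in the tool output is an assumption added for illustration; the source does not say whether the real wrapper in `src/backends/native/agent.ts` does this.

```typescript
const CLOSE_MARKER = "[/provenance]";

// Wrap raw tool output in provenance markers before returning it to the model.
function wrapWithProvenance(
  toolName: string,
  output: string,
  provenance: string = "fetched_content",
  untrusted: boolean = true,
): string {
  // Assumption (not from the source): rewrite any embedded closing marker so
  // fetched content cannot appear to end the untrusted region early.
  const safeOutput = output.split(CLOSE_MARKER).join("[/provenance-escaped]");
  const header = `[provenance=${provenance} tool=${toolName} untrusted=${untrusted}]`;
  return `${header}\n${safeOutput}\n${CLOSE_MARKER}`;
}
```

Calling `wrapWithProvenance("web.fetch", "hello")` yields the three-line form shown above, with `hello` as the middle line.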
### Tool-Call Guard

When `ToolPolicyContext.untrustedContent` is true:

- High-risk tool calls whose args contain obvious markers (e.g. `rm -rf`, `ignore previous`, `exfiltrate`) are blocked.
- Network tools (`web.fetch`, `web.search`) refuse arguments containing secret-like fields.

See: `src/tools/executor.ts`.

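
The two guard rules can be sketched as follows. The marker list reuses the examples given above; the secret-like key pattern, the top-level-keys-only check, and every name in the block are assumptions for illustration rather than the contents of `src/tools/executor.ts`.

```typescript
// Markers taken from the examples in this document; the real list may be longer.
const INJECTION_MARKERS: RegExp[] = [/rm\s+-rf/i, /ignore previous/i, /exfiltrate/i];
// Assumption: what counts as a "secret-like" field name.
const SECRET_LIKE_KEY = /token|secret|password|api[_-]?key/i;

const NETWORK_TOOLS = new Set(["web.fetch", "web.search"]);
const HIGH_RISK = new Set([
  "shell.exec", "process.start", "process.kill",
  "file.write", "file.edit", "file.patch",
]);

interface GuardResult {
  allowed: boolean;
  reason?: string;
}

function guardToolCall(
  tool: string,
  args: Record<string, unknown>,
  ctx: { untrustedContent: boolean },
): GuardResult {
  if (!ctx.untrustedContent) return { allowed: true }; // guard is inactive
  if (HIGH_RISK.has(tool) || tool.startsWith("browser.")) {
    const serialized = JSON.stringify(args);
    const hit = INJECTION_MARKERS.find((re) => re.test(serialized));
    if (hit) return { allowed: false, reason: `injection marker ${hit}` };
  }
  // Only top-level argument names are inspected in this sketch.
  if (NETWORK_TOOLS.has(tool) && Object.keys(args).some((k) => SECRET_LIKE_KEY.test(k))) {
    return { allowed: false, reason: "secret-like field in network tool arguments" };
  }
  return { allowed: true };
}
```

A marker check like this only catches obvious cases, which is why the document positions it as one layer alongside provenance tagging and system prompt guidance.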
## Secret Scopes

Tools can declare which secret scopes they require:

- `Tool.requiredSecretScopes?: string[]`

Skills declare which scopes they are allowed to use:

- `manifest.json.permissions.secrets?: string[]`

Enforcement:

- In skill context, if a tool requires scopes not allowed by the skill, ToolExecutor denies the tool.
- Outside skill context, secrets are treated as "ambient" (allowed) to preserve backward compatibility.

See:

- `src/tools/types.ts`
- `src/tools/executor.ts`
- Examples: `src/tools/builtin/gmail.ts`, `src/tools/builtin/gcal.ts`, `src/tools/builtin/web-search.ts`

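
The scope check amounts to a set difference. In the sketch below, `missingScopes` and the shape of its context argument are hypothetical; only `requiredSecretScopes` and `permissions.secrets` come from the document.

```typescript
// Returns the scopes a tool requires that the active skill has not been granted.
// An empty result means the call may proceed as far as secrets are concerned.
function missingScopes(
  requiredSecretScopes: string[] | undefined,
  ctx: { skillName?: string; allowedScopes?: string[] },
): string[] {
  // Outside skill context, secrets are ambient: nothing is ever missing.
  if (!ctx.skillName) return [];
  const allowed = new Set(ctx.allowedScopes ?? []);
  return (requiredSecretScopes ?? []).filter((scope) => !allowed.has(scope));
}
```

For example, a tool requiring `["gmail"]` invoked by a skill that declares only `["web_search"]` yields `["gmail"]`, which an executor would treat as a denial.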
## Audit Logging (Without Secret Leaks)

Tool execution is audited, but sensitive values are redacted before writing to disk.

See:

- `src/audit/logger.ts`
- `src/audit/types.ts`
- `src/audit/redact.ts`

Notable fields:

- `execution_id`: a per-tool-call UUID for correlation
- `execution_environment`: `host` or `sandbox`
- `skill_name`: active skill (if any)
- `redactions_applied`: count of redaction operations
- `tool.approval` (an event type rather than a field): emitted when a confirm hook is resolved

Example tool start event (JSONL):

```json
{
  "timestamp": 0,
  "level": "debug",
  "event_type": "tool.start",
  "event": {
    "tool_name": "shell.exec",
    "execution_id": "...",
    "execution_environment": "sandbox",
    "skill_name": "my-web-skill",
    "redactions_applied": 1,
    "tool_args": { "command": "echo [REDACTED_TOKEN]" }
  }
}
```

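
One way to produce `tool_args` and `redactions_applied` as in the event above is a recursive pass that counts replacements. This is a sketch, not the code in `src/audit/redact.ts`: the token pattern is a made-up example, and `redact` and `redactString` are hypothetical names.

```typescript
// Hypothetical token shape for illustration; real patterns will differ.
const TOKEN_PATTERN = /\b(?:sk|ghp|xoxb)[-_][A-Za-z0-9_-]{8,}\b/g;

function redactString(s: string): { text: string; count: number } {
  let count = 0;
  const text = s.replace(TOKEN_PATTERN, () => {
    count += 1;
    return "[REDACTED_TOKEN]";
  });
  return { text, count };
}

// Walk strings, arrays, and plain objects, redacting token-like substrings
// and returning how many replacements were made in total.
function redact(value: unknown): { value: unknown; count: number } {
  if (typeof value === "string") {
    const r = redactString(value);
    return { value: r.text, count: r.count };
  }
  if (Array.isArray(value)) {
    let count = 0;
    const out = value.map((v) => {
      const r = redact(v);
      count += r.count;
      return r.value;
    });
    return { value: out, count };
  }
  if (value !== null && typeof value === "object") {
    let count = 0;
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) {
      const r = redact(v);
      count += r.count;
      out[k] = r.value;
    }
    return { value: out, count };
  }
  return { value, count: 0 }; // numbers, booleans, null, undefined pass through
}
```

The returned `count` maps onto `redactions_applied`, so a reader of the log can tell that something was removed without seeing what it was.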
## Recommended Operator Defaults

- Enable Docker sandboxing (`sandbox.enabled: true`).
- Enable DM pairing (`pairing.enabled: true`) on any messaging surface.
- Use a conservative tool profile for general chat (`tools.profile: messaging`).
- Use skill intent routing for specialized workflows and keep skill permissions narrow.