docs: add safety docs and OpenClaw gap roadmap

William Valentin
2026-02-15 10:17:07 -08:00
parent 28304ac397
commit f2cdd1abd2
14 changed files with 3869 additions and 40 deletions
+206
@@ -0,0 +1,206 @@
# Agent-Oriented Project Diagram
This is a high-signal, agent-oriented view of Flynn's structure and execution flow.
If you're new to the codebase, start here, then jump to the referenced files.
## Big Picture (Runtime Data Flow)
```text
Inbound Message
(Telegram/Discord/Slack/WhatsApp/WebChat)
        |
        v
  ChannelAdapter  ->  ChannelRegistry
        |                    |
        |                    v
        |          createMessageRouter()
        |                    |
        |                    v
        |             SessionManager
        |                    |
        |                    v
        |            AgentOrchestrator
        |                    |
        |                    v
        |               NativeAgent
        |                    |
        |           ModelRouter.chat()
        |                    |
        |                    v
        |               ModelClient
        |
        +----> (optional) PairingManager gate for unknown senders

Tool Calls (inside NativeAgent loop)
  NativeAgent -> ToolRegistry (policy-filtered) -> ToolExecutor
       |                                                 |
       |                                                 v
       |                                       HookEngine + autonomy
       |                                                 |
       |                                                 v
       |                                          Tool.execute()
       |                                                 |
       |                                                 v
       +-----------------------------------------> AuditLogger (redacted)

Outbound Reply
  -> ChannelAdapter.send() (text + optional attachments)
```
Key files:
- Routing + per-session agent creation: `src/daemon/routing.ts`
- Orchestration: `src/backends/native/orchestrator.ts`
- Tool loop: `src/backends/native/agent.ts`
- Model routing: `src/models/router.ts`
- Tool policy + execution: `src/tools/policy.ts`, `src/tools/executor.ts`
## Component Graph (Agent-Safety Boundary)
```text
                      +---------------------------+
                      |          Config           |
                      |    (Zod schema + YAML)    |
                      |   src/config/schema.ts    |
                      +-------------+-------------+
                                    |
                                    v
+------------------------+   +-------------+   +-----------------------+
|     SkillRegistry      |   | ToolPolicy  |   |      HookEngine       |
|      src/skills/*      |   | src/tools/* |   |      src/hooks/*      |
+-----------+------------+   +------+------+   +-----------+-----------+
            |                       |                      |
            | (system prompt)       | (allow/deny)         | (confirm/log/silent)
            v                       v                      v
+------------------------+   +-------------+   +-----------------------+
|     System Prompt      |   | ToolRegistry|   |     ToolExecutor      |
| src/daemon/services.ts |   | src/tools/* |   | src/tools/executor.ts |
+-----------+------------+   +------+------+   +-----------+-----------+
            |                       |                      |
            v                       |                      |
+------------------------+          |                      v
|   AgentOrchestrator    |          |              +---------------+
|     src/backends/*     |          +------------> |  AuditLogger  |
+-----------+------------+                         |  src/audit/*  |
            |                                      +---------------+
            v
+------------------------+
|      NativeAgent       |
|     src/backends/*     |
+-----------+------------+
            |
            v
+------------------------+
|      ModelRouter       |
|      src/models/*      |
+------------------------+
```
## Skills + Capabilities (What Gets Enforced)
Skills are local directories with:
- `SKILL.md` (instructions injected into the system prompt)
- `manifest.json` (metadata + optional `permissions`)
### Skill permissions enforcement points
- Tool availability: `ToolPolicy.resolveAllowedNames()` intersects allowed tools with `manifest.json.permissions`.
- Tool execution (defense in depth): `ToolExecutor.execute()` enforces:
  - fs allowlists (`permissions.fs.read` / `permissions.fs.write`)
  - net allowlists (best-effort for `web.fetch`)
  - secret scopes (tools declare `requiredSecretScopes`, skills allow `permissions.secrets`)
  - injection guard when untrusted content is present
Important default:
- If a request is routed into a skill context but the skill's `manifest.json` has no `permissions` block, **tool access is denied**.
Key files:
- Skill manifest types: `src/skills/types.ts`
- Loader validation: `src/skills/loader.ts`
- Policy intersection: `src/tools/policy.ts`
- Executor enforcement: `src/tools/executor.ts`
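The tool-availability rule above can be sketched as a pure function. Outside skill context, the policy result passes through unchanged. A skill without `permissions` gets nothing. Otherwise, the skill's grants are intersected with what policy already allows.

This is a minimal sketch, not the code in `src/tools/policy.ts`. The simplified types, the `TOOL_GROUPS` contents, and the name `resolveAllowedNamesSketch` are assumptions.

```ts
// Simplified, hypothetical re-declarations; the real types live in
// src/skills/types.ts and src/tools/policy.ts.
interface SkillPermissions {
  tool_groups?: string[];
  tools?: string[];
}

interface PolicyContextSketch {
  skillName?: string;
  skillPermissions?: SkillPermissions;
}

// Hypothetical group table: the real TOOL_GROUPS contents may differ.
const TOOL_GROUPS: Record<string, string[]> = {
  'group:web': ['web.fetch', 'web.search'],
  'group:fs': ['file.read', 'file.write'],
};

// Narrow an already policy-filtered tool list for a skill context.
function resolveAllowedNamesSketch(
  allowedByPolicy: string[],
  ctx: PolicyContextSketch,
): string[] {
  // Outside skill context, the base policy result stands.
  if (!ctx.skillName) return allowedByPolicy;
  // Skill context without a permissions block: deny every tool.
  const perms = ctx.skillPermissions;
  if (!perms) return [];
  // Expand tool_groups, then add individually listed tools.
  const granted = new Set<string>(perms.tools ?? []);
  for (const group of perms.tool_groups ?? []) {
    for (const name of TOOL_GROUPS[group] ?? []) granted.add(name);
  }
  // Intersect: a skill can only narrow what policy already allows.
  return allowedByPolicy.filter((name) => granted.has(name));
}

const demo = resolveAllowedNamesSketch(
  ['shell.exec', 'web.fetch', 'web.search', 'file.write'],
  { skillName: 'url-summarizer', skillPermissions: { tool_groups: ['group:web'] } },
);
console.log(demo); // [ 'web.fetch', 'web.search' ]
```

Intersecting rather than unioning means a manifest can only shrink the tool set. It can never grant a tool that the global policy has already denied.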
## Sandbox Execution (High-Risk Tools)
Flynn supports per-session Docker sandboxes.
Where sandboxing is applied today:
- `shell.exec` and `process.start` can be replaced with sandboxed implementations.
- Replacement is wired in `src/daemon/routing.ts` by cloning the ToolRegistry and swapping the tool implementations.
Skill context default:
- High-risk tool execution defaults to `sandbox` in skill context (when available).
- A skill can opt into host execution only by setting `permissions.execution_environment: "host"`.
Key files:
- Sandbox lifecycle: `src/sandbox/manager.ts`, `src/sandbox/docker.ts`
- Sandboxed tool wrappers: `src/sandbox/tools.ts`
- Wiring: `src/daemon/routing.ts`
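The clone-and-swap wiring and the skill-context default can be sketched as follows. This assumes a registry with `clone()` and `replace(tool)` methods, like those listed for `ToolRegistry` in the symbol index.

The names `ToolLike`, `ToolRegistrySketch`, `pickExecutionEnvironment`, and `registryForSession` are hypothetical. The behavior when the sandbox is disabled is also an assumption.

```ts
// Hypothetical, simplified shapes; the real Tool type lives in src/tools/types.ts.
type ExecutionEnvironment = 'host' | 'sandbox';

interface ToolLike {
  name: string;
  execute: (args: unknown) => Promise<string>;
}

class ToolRegistrySketch {
  private readonly tools = new Map<string, ToolLike>();

  register(tool: ToolLike): void {
    this.tools.set(tool.name, tool);
  }

  get(name: string): ToolLike | undefined {
    return this.tools.get(name);
  }

  // Copy the name -> tool map so per-session swaps never touch the shared registry.
  clone(): ToolRegistrySketch {
    const copy = new ToolRegistrySketch();
    for (const tool of this.tools.values()) copy.register(tool);
    return copy;
  }

  replace(tool: ToolLike): void {
    if (!this.tools.has(tool.name)) throw new Error(`unknown tool: ${tool.name}`);
    this.tools.set(tool.name, tool);
  }
}

const SANDBOXABLE = ['shell.exec', 'process.start'];

// Skill context: sandbox unless the manifest says "host". Assumption: with the
// sandbox disabled, this sketch simply keeps the host tools.
function pickExecutionEnvironment(
  sandboxEnabled: boolean,
  manifestEnv?: ExecutionEnvironment,
): ExecutionEnvironment {
  if (!sandboxEnabled) return 'host';
  return manifestEnv === 'host' ? 'host' : 'sandbox';
}

function registryForSession(
  shared: ToolRegistrySketch,
  env: ExecutionEnvironment,
  makeSandboxed: (tool: ToolLike) => ToolLike, // stands in for src/sandbox/tools.ts
): ToolRegistrySketch {
  if (env === 'host') return shared;
  const session = shared.clone();
  for (const name of SANDBOXABLE) {
    const tool = session.get(name);
    if (tool) session.replace(makeSandboxed(tool));
  }
  return session;
}
```

Cloning per session keeps the shared registry untouched. As a result, one sandboxed session cannot change which `shell.exec` implementation another session sees.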
## Prompt Injection Hardening (Practical)
Flynn treats content provenance as part of the control boundary:
- `web.fetch`, `web.search`, and `browser.content` outputs are treated as untrusted "fetched_content".
- Tool results are wrapped in provenance markers inside the tool loop.
- Once untrusted content is seen, ToolExecutor applies stricter gating (blocks obvious injection patterns for high-risk tools).
Key files:
- Provenance wrapping: `src/backends/native/agent.ts`
- Tool-call guard: `src/tools/executor.ts`
- System prompt safety guidance: `src/daemon/services.ts`
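Here is a minimal sketch of the two halves: marking tool output with its provenance, and gating high-risk calls once fetched content has entered the conversation. The marker format, the high-risk set, and the regex patterns are all assumptions for illustration. The real markers are added in `src/backends/native/agent.ts`, and the real guard lives in `src/tools/executor.ts`.

```ts
// Hypothetical tool classifications; the real lists may differ.
const UNTRUSTED_SOURCE_TOOLS = new Set(['web.fetch', 'web.search', 'browser.content']);
const HIGH_RISK_TOOLS = new Set(['shell.exec', 'process.start', 'file.write']);

// Hypothetical "obvious injection" patterns. No `g` flag: RegExp.test with `g` is stateful.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /curl[^\n|]*\|\s*(ba)?sh/i,
];

interface LoopState {
  untrustedContent: boolean;
}

// Wrap a tool result in provenance markers; fetched content flips the session flag.
function wrapToolResult(toolName: string, output: string, state: LoopState): string {
  const kind = UNTRUSTED_SOURCE_TOOLS.has(toolName) ? 'fetched_content' : 'tool_output';
  if (kind === 'fetched_content') state.untrustedContent = true;
  return `<${kind} tool="${toolName}">\n${output}\n</${kind}>`;
}

// Once untrusted content is present, block high-risk calls whose args look injected.
function guardToolCall(
  toolName: string,
  args: unknown,
  state: LoopState,
): { allowed: boolean; reason?: string } {
  if (!state.untrustedContent || !HIGH_RISK_TOOLS.has(toolName)) return { allowed: true };
  const text = JSON.stringify(args ?? {});
  const hit = INJECTION_PATTERNS.find((pattern) => pattern.test(text));
  return hit ? { allowed: false, reason: `matched ${hit.source}` } : { allowed: true };
}
```

Pattern matching only catches obvious injections. That is why this guard sits behind policy filtering, skill permissions, and confirmation hooks rather than replacing them.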
## Mermaid (For Fast Visual Scanning)
If your renderer supports Mermaid, this is the same information as a sequence diagram.
```mermaid
sequenceDiagram
autonumber
participant U as User
participant CA as ChannelAdapter
participant CR as ChannelRegistry
participant SM as SessionManager
participant AR as AgentOrchestrator
participant NA as NativeAgent
participant MR as ModelRouter
participant MC as ModelClient
participant TP as ToolPolicy/Registry
participant TE as ToolExecutor
participant HE as HookEngine
participant AL as AuditLogger
U->>CA: message
CA->>CR: onMessage(InboundMessage)
CR->>SM: getSession(channel, sender)
SM-->>CR: Session
CR->>AR: getOrCreateAgent(session + routing)
AR->>NA: process(userMessage)
NA->>MR: chat(messages + tools)
MR->>MC: provider request
MC-->>MR: response (content or tool_calls)
MR-->>NA: ChatResponse
alt model requests tool use
NA->>TP: filtered tool list (skill + policy)
NA->>TE: execute(tool, args, context)
TE->>HE: confirm/log/silent (autonomy)
HE-->>TE: approved/denied
TE->>AL: audit (redacted)
TE-->>NA: ToolResult
NA->>MR: chat(tool_result blocks)
end
NA-->>AR: assistant response
AR-->>CR: OutboundMessage
CR-->>CA: send()
CA-->>U: reply
```
+250
@@ -0,0 +1,250 @@
# Contributor Map (Agent-Oriented)
This is a fast navigation guide for contributors (human or AI). It answers:
- Where do I add a new tool?
- Where do I add a new skill?
- Where do I change routing/policy?
- What tests should I run?
For the execution-flow diagram, see `docs/architecture/AGENT_DIAGRAM.md`.
## 30-Second Repo Tour
```text
src/
  daemon/       Start-up wiring, service init, message routing
  backends/     Native agent + orchestrator (tool loop lives here)
  tools/        Tool interfaces, policy, executor, builtins
  skills/       Skill loader/registry + install/watch infra
  hooks/        Confirm/log/silent policy + autonomy resolution
  sandbox/      Docker sandbox manager + sandboxed tool wrappers
  models/       Provider clients + model router + retry/cost/capabilities
  channels/     Chat adapters + pairing gate
  gateway/      WebSocket JSON-RPC server + web UI + handlers
  memory/       Hybrid search + embeddings + persistence
  session/      SQLite store + session mgmt
  cli/          CLI entrypoints + setup wizard
  automation/   Cron/webhooks/heartbeat/gmail watcher
docs/
  api/          Tool and gateway protocol docs
  security/     Capability model, sandboxing, injection resistance
  architecture/ Diagrams + contributor maps
config/
  default.yaml  Example configuration
```
## Adding a New Tool
### Where code goes
- Builtins live in `src/tools/builtin/`.
- Core types live in `src/tools/types.ts`.
- Tools are registered through the daemon wiring (see existing patterns in `src/daemon/index.ts`).
### Minimal tool skeleton
```ts
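import type { Tool, ToolResult } from '../types.js';

export const myTool: Tool = {
  name: 'my.tool',
  description: 'What it does (model-facing).',
  // Optional: gate credentialed actions
  requiredSecretScopes: ['my_scope'],
  inputSchema: {
    type: 'object',
    properties: {
      foo: { type: 'string', description: '...' },
    },
    required: ['foo'],
  },
  execute: async (rawArgs: unknown): Promise<ToolResult> => {
    // Args come from the model: validate them before use.
    const foo = (rawArgs as { foo?: unknown } | null)?.foo;
    if (typeof foo !== 'string') {
      return { success: false, output: '', error: 'foo must be a string' };
    }
    // ... do the actual work with `foo` ...
    return { success: true, output: 'ok' };
  },
};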
```
### Security checklist
- If the tool calls an external service or uses credentials:
  - set `requiredSecretScopes` on the tool.
  - ensure skill permissions can gate it (`manifest.json.permissions.secrets`).
- If it reads/writes files:
  - use `file.*` tools rather than bespoke FS access.
  - skills can restrict FS paths via `permissions.fs`.
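The secret-scope gate boils down to a subset check: every scope a tool declares in `requiredSecretScopes` must appear in the skill's `permissions.secrets`.

This is a minimal sketch, assuming exact string matching. `ScopedTool` and `missingSecretScopes` are hypothetical names. Treating an absent grant list as unconstrained (outside skill context) is also an assumption.

```ts
// Hypothetical shape; mirrors the requiredSecretScopes field on Tool.
interface ScopedTool {
  name: string;
  requiredSecretScopes?: string[];
}

// Scopes the tool needs but the skill did not grant (exact string match assumed).
// `granted === undefined` models "no skill constraint applies" (an assumption).
function missingSecretScopes(tool: ScopedTool, granted: string[] | undefined): string[] {
  if (granted === undefined) return [];
  const allowed = new Set(granted);
  return (tool.requiredSecretScopes ?? []).filter((scope) => !allowed.has(scope));
}

// A skill whose manifest grants no secrets cannot run the skeleton tool above.
console.log(missingSecretScopes({ name: 'my.tool', requiredSecretScopes: ['my_scope'] }, []));
// [ 'my_scope' ]
```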
### Tests to add
- Unit tests in `src/tools/builtin/<tool>.test.ts`.
- If you touch policy/executor logic: add tests in `src/tools/policy.test.ts` or `src/tools/executor.test.ts`.
## Adding a New Skill
### What a skill is
A skill is a package with:
- `SKILL.md`: instructions injected into the system prompt.
- `manifest.json`: metadata + capability declarations.
Where skills live:
- Bundled skills: `skills/`
- Managed skills (installed by Flynn): configured skill directory (see config)
Skill loading:
- Loader: `src/skills/loader.ts`
- Registry: `src/skills/registry.ts`
- Watcher (optional): `src/skills/watcher.ts`
### Capability permissions
If the skill is used via routing (intent target type `skill`), add `permissions` to `manifest.json`.
Without `permissions`, a skill is still loadable, but in skill context it has no tool access.
Reference: `docs/security/SAFE_PERSONAL_AGENT.md`.
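For orientation, here is the data a web-only skill manifest might carry. It is expressed as a typed object whose fields follow `SkillManifest` and `SkillPermissions` as listed in the TypeScript map, followed by a small validation pass.

The interfaces are simplified re-declarations, and the `tier` value is a guess. The validation rules are assumptions about what a loader should reject; the real checks live in `src/skills/loader.ts`.

```ts
// Simplified re-declarations of the types in src/skills/types.ts.
interface SkillNetPermission {
  host: string;
  ports?: number[];
}
interface SkillFsPermissions {
  read?: string[];
  write?: string[];
}
interface SkillPermissions {
  tool_groups?: string[];
  tools?: string[];
  fs?: SkillFsPermissions;
  net?: SkillNetPermission[];
  secrets?: string[];
  execution_environment?: 'host' | 'sandbox';
}
interface SkillManifest {
  name: string;
  description: string;
  version: string;
  tier: string;
  permissions?: SkillPermissions;
}

// The same data a manifest.json for a web-only skill would carry.
const urlSummarizer: SkillManifest = {
  name: 'url-summarizer',
  description: 'Summarize the page at a URL.',
  version: '0.1.0',
  tier: 'managed', // assumed value; check the loader for valid tiers
  permissions: {
    tool_groups: ['group:web'],
    net: [{ host: '*', ports: [443] }],
    execution_environment: 'sandbox',
  },
};

// Hypothetical validation pass over parsed JSON: collect every problem.
function validatePermissions(p: SkillPermissions | undefined): string[] {
  const errors: string[] = [];
  // No permissions block: still loadable, but no tool access in skill context.
  if (p === undefined) return errors;
  const env: unknown = p.execution_environment; // parsed JSON may hold any value
  if (env !== undefined && env !== 'host' && env !== 'sandbox') {
    errors.push(`execution_environment must be "host" or "sandbox", got ${String(env)}`);
  }
  for (const rule of p.net ?? []) {
    if (!rule.host) errors.push('net rule is missing host');
    for (const port of rule.ports ?? []) {
      if (!Number.isInteger(port) || port < 1 || port > 65535) errors.push(`invalid port: ${port}`);
    }
  }
  return errors;
}

console.log(validatePermissions(urlSummarizer.permissions)); // []
```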
## Routing: Agents vs Skills vs Default
Where routing decisions happen:
- Inbound routing: `src/daemon/routing.ts`
Inputs to routing:
- Channel + sender (agent router)
- Intent registry (regex rules) — can target `agent` or `skill`
- Metadata overrides (gateway / channel adapters)
If you need a new routing rule type:
- Intent targets live in the intent registry/types (see `src/intents/registry.ts`).
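Here is a minimal sketch of first-match regex routing, assuming each intent rule pairs a pattern with a target of type `agent` or `skill`. `IntentRule`, `IntentRegistrySketch`, the `null` fallthrough to default routing, and the example rules are all assumptions. The real types live in `src/intents/registry.ts`.

```ts
// Hypothetical shapes; the real intent types live in src/intents/registry.ts.
type IntentTarget = { type: 'agent' | 'skill'; name: string };

interface IntentRule {
  pattern: RegExp; // avoid the `g` flag: RegExp.test with `g` keeps state between calls
  target: IntentTarget;
}

class IntentRegistrySketch {
  private readonly rules: IntentRule[];

  constructor(rules: IntentRule[]) {
    this.rules = rules;
  }

  // First matching rule wins; null means "fall through to default routing".
  match(text: string): IntentTarget | null {
    const rule = this.rules.find((r) => r.pattern.test(text));
    return rule ? rule.target : null;
  }
}

// Example rules; the agent name 'ops' is made up.
const intents = new IntentRegistrySketch([
  { pattern: /^(summarize|tldr)\s+https?:\/\//i, target: { type: 'skill', name: 'url-summarizer' } },
  { pattern: /^deploy\b/i, target: { type: 'agent', name: 'ops' } },
]);

console.log(intents.match('summarize https://example.com'));
// { type: 'skill', name: 'url-summarizer' }
```

Anchoring patterns with `^` keeps a rule from firing on a message that merely mentions the keyword mid-sentence.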
## Tool Policy + Execution
You will usually touch these files for capability/security work:
- Tool allowlisting: `src/tools/policy.ts`
- Tool runtime enforcement + audit: `src/tools/executor.ts`
- Confirmation/autonomy: `src/hooks/engine.ts`, `src/hooks/autonomy.ts`
In skill context:
- `ToolPolicyContext.skillName` and `.skillPermissions` are set in `src/daemon/routing.ts`.
- ToolPolicy filters available tools.
- ToolExecutor enforces fs/net/secret/injection restrictions at call time, even if a call reaches it without going through the filtered tool list.
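The fs part of that runtime enforcement can be sketched as a glob check on a normalized path. This sketch assumes absolute POSIX paths and a two-token glob dialect: `*` matches within one path segment, and `**` matches across segments. The helper names and the `/workspace` paths are hypothetical, and the real matcher in `src/tools/executor.ts` may differ.

Normalizing before matching matters. Without it, `/workspace/../etc/passwd` would satisfy a `/workspace/**` rule. Note that the sketch does not resolve symlinks.

```ts
interface FsPermissions {
  read?: string[];
  write?: string[];
}
type FsMode = 'read' | 'write';

// Collapse "." and ".." segments of an absolute POSIX path (no symlink resolution).
function normalizeAbsolute(p: string): string | null {
  if (!p.startsWith('/')) return null; // relative paths are rejected in this sketch
  const out: string[] = [];
  for (const seg of p.split('/')) {
    if (seg === '' || seg === '.') continue;
    if (seg === '..') out.pop();
    else out.push(seg);
  }
  return '/' + out.join('/');
}

// Assumed glob dialect: `**` matches across segments, `*` within one segment.
function globToRegExp(glob: string): RegExp {
  let source = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === '*' && glob.charAt(i + 1) === '*') {
      source += '.*';
      i++; // consume the second '*'
    } else if (ch === '*') {
      source += '[^/]*';
    } else {
      source += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  return new RegExp(`^${source}$`);
}

function isPathAllowed(rawPath: string, mode: FsMode, fs: FsPermissions | undefined): boolean {
  if (!fs) return false; // assumption: no fs grant in skill context means no fs access
  const path = normalizeAbsolute(rawPath);
  if (path === null) return false;
  const globs = (mode === 'read' ? fs.read : fs.write) ?? [];
  return globs.some((glob) => globToRegExp(glob).test(path));
}

console.log(isPathAllowed('/workspace/../etc/passwd', 'read', { read: ['/workspace/**'] })); // false
```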
## Sandbox
Sandbox components:
- Docker sandbox manager: `src/sandbox/manager.ts`
- Docker implementation: `src/sandbox/docker.ts`
- Sandboxed tool wrappers: `src/sandbox/tools.ts`
- Tool replacement wiring: `src/daemon/routing.ts`
Notes:
- Today the sandbox wiring replaces `shell.exec` and `process.start` when sandbox is enabled.
- In skill context, high-risk execution defaults to sandbox unless the skill opts into host execution.
## Gateway / API Surface
Gateway protocol docs:
- `docs/api/PROTOCOL.md`
Gateway handlers:
- `src/gateway/handlers/` (JSON-RPC methods)
Useful places to start:
- `src/gateway/server.ts` (server lifecycle)
- `src/gateway/protocol.ts` (types)
## Tests + Commands
Common checks:
```bash
pnpm typecheck
pnpm lint
pnpm test:run
```
Targeted tests for safety boundary changes:
- Tool policy: `pnpm test:run src/tools/policy.test.ts`
- Tool executor: `pnpm test:run src/tools/executor.test.ts`
- Skill loader: `pnpm test:run src/skills/loader.test.ts`
- Routing: `pnpm test:run src/daemon/routing.test.ts`
## First 3 PRs to Pick Up (Good Agent On-Ramps)
These are small, high-leverage changes that teach you the architecture quickly.
### PR 1: Add a new "narrow" skill + permissions
Goal: add a skill that can only do one bounded thing (example: summarize a URL).
Deliverables:
- `skills/url-summarizer/SKILL.md`
- `skills/url-summarizer/manifest.json` with permissions:
  - `tool_groups: ["group:web"]`
  - `net: [{"host":"*","ports":[443]}]` (or narrower if you prefer)
  - `execution_environment: "sandbox"` (default)
Acceptance:
- Skill loads (`flynn doctor` / skills list)
- In skill context, `shell.exec` is not available
- `web.fetch` works for https URLs
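The `net` rule in this deliverable is what lets `web.fetch` reach https URLs while refusing plain http. Here is a minimal sketch of matching a URL against such rules. It assumes that `host: "*"` is a wildcard, that other hosts match exactly, and that a URL without an explicit port uses 443 for https and 80 for http.

`isUrlAllowed` is hypothetical. The doc describes the real net check as best-effort.

```ts
interface SkillNetPermission {
  host: string;
  ports?: number[];
}

function isUrlAllowed(rawUrl: string, rules: SkillNetPermission[]): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable input is denied
  }
  if (url.protocol !== 'https:' && url.protocol !== 'http:') return false;
  // URL.port is '' when the URL relies on its scheme's default port.
  const port = url.port !== '' ? Number(url.port) : url.protocol === 'https:' ? 443 : 80;
  const host = url.hostname; // the URL parser lowercases http(s) hostnames
  return rules.some((rule) => {
    const hostOk = rule.host === '*' || rule.host.toLowerCase() === host;
    const portOk = rule.ports === undefined || rule.ports.includes(port);
    return hostOk && portOk;
  });
}

const rules: SkillNetPermission[] = [{ host: '*', ports: [443] }];
console.log(isUrlAllowed('https://example.com/article', rules)); // true
console.log(isUrlAllowed('http://example.com/article', rules)); // false
```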
### PR 2: Route into the skill via intents
Goal: make it easy to invoke the skill without special UI.
Deliverables:
- Add an `intents.rules[]` entry targeting `type: skill`
- Patterns like: `summarize *`, `tldr *`
Acceptance:
- A message like `summarize https://example.com` routes to the skill
- Tool list is capability-filtered for that skill context
### PR 3: Add an end-to-end safety test
Goal: lock in behavior so future refactors don't weaken the boundary.
Deliverables:
- A test that asserts: when routed to a skill context with web-only permissions:
  - `ToolPolicy` excludes `shell.exec` and `file.write`
  - `ToolExecutor` denies a direct attempt to call `file.write` outside allowed fs globs
Suggested test locations:
- `src/tools/policy.test.ts`
- `src/tools/executor.test.ts`
## Where to Add What (Cheat Sheet)
```text
New tool ................... src/tools/builtin/ + register in daemon
Tool allow/deny logic ...... src/tools/policy.ts
Tool runtime enforcement ... src/tools/executor.ts
New skill .................. skills/<name>/{SKILL.md,manifest.json}
Skill loader/validation .... src/skills/loader.ts
Skill routing (intents) .... src/daemon/routing.ts + config intents
Sandbox behavior ........... src/sandbox/* + src/daemon/routing.ts
Confirmation UX ............ src/hooks/* + frontends/gateway
Web UI changes ............. src/gateway/ui/
```
+143
@@ -0,0 +1,143 @@
# Symbol Index (Agent Quick-Jump)
This is a curated index of the most important exported types and functions, organized for fast navigation.
It is intentionally short: if something isn't here, it's probably not a primary control surface.
See also:
- `docs/architecture/TYPESCRIPT_MAP.md` (conceptual map + diagrams)
- `docs/architecture/AGENT_DIAGRAM.md` (runtime flow)
## Daemon Entry Points
- `src/daemon/index.ts`
  - Creates/wires: config, tool registry, tool executor, skill registry, model router, gateway, channels.
- `src/daemon/routing.ts`
  - `createMessageRouter(deps)`
    - Main router factory used by channel adapters + gateway.
## Routing / Intents / Agents
- `src/agents/router.ts`
  - `AgentRouter.resolve(channel, senderId)`
    - Picks an agent config for a sender/channel.
- `src/agents/registry.ts`
  - `AgentConfigRegistry.get(name)`
    - Loads agent configs.
- `src/intents/registry.ts`
  - `IntentRegistry.match(text)`
    - Matches text to intent rules (targets: agent or skill).
## Native Agent Loop
- `src/backends/native/orchestrator.ts`
  - `AgentOrchestrator.process(message, options)`
    - Top-level entry for “run agent on this message”.
- `src/backends/native/agent.ts`
  - `NativeAgent` (class)
    - Internal hot path: tool loop that:
      - asks model
      - executes tools
      - returns tool results
      - repeats
## Models
- `src/models/router.ts`
  - `ModelRouter.chat(request)`
    - Chooses tier/provider fallback chain.
- `src/models/types.ts`
  - Core request/response types shared by providers.
## Tools
- `src/tools/types.ts`
  - `Tool`
  - `ToolResult`
- `src/tools/registry.ts`
  - `ToolRegistry.register(tool)`
  - `ToolRegistry.list()`
  - `ToolRegistry.clone()` / `ToolRegistry.replace(tool)` (used for sandbox substitution)
- `src/tools/policy.ts`
  - `ToolPolicy` (class)
  - `ToolPolicy.resolveAllowedNames(allToolNames, context)`
  - `ToolPolicyContext` (type)
  - `TOOL_GROUPS` (group expansion)
- `src/tools/executor.ts`
  - `ToolExecutor.execute(toolName, args, context)`
    - Central enforcement point:
      - policy allow/deny
      - hooks/autonomy confirmations
      - skill fs/net/secret constraints
      - untrusted-content injection guard
      - audit events w/ redaction
## Skills
- `src/skills/types.ts`
  - `SkillManifest`
  - `SkillPermissions`
- `src/skills/loader.ts`
  - `loadSkill(dir, tier)`
  - `loadAllSkills(...)`
  - Validates manifests.
- `src/skills/registry.ts`
  - `SkillRegistry.get(name)`
  - `SkillRegistry.getSystemPromptAdditions()`
## Hooks / Approval
- `src/hooks/engine.ts`
  - `HookEngine.getAction(toolName)`
  - `HookEngine.requestConfirmation(toolName, args)`
- `src/hooks/autonomy.ts`
  - `resolveAutonomy(toolName, baseAction, autonomyLevel)`
## Sandbox
- `src/sandbox/manager.ts`
  - Manages per-session sandbox lifecycle.
- `src/sandbox/tools.ts`
  - Creates sandboxed tool implementations for `shell.exec` / `process.start`.
- `src/sandbox/docker.ts`
  - Docker-specific implementation.
## Audit
- `src/audit/index.ts`
  - `auditLogger` singleton.
- `src/audit/types.ts`
  - Event types (`tool.start`, `tool.success`, `tool.denied`, `tool.approval`, ...)
- `src/audit/logger.ts`
  - `AuditLogger` methods: `toolStart`, `toolDenied`, `toolApproval`, ...
- `src/audit/redact.ts`
  - `redactForAudit(value)`
## Gateway
- `src/gateway/server.ts`
  - WebSocket server lifecycle.
- `src/gateway/handlers/*`
  - JSON-RPC methods grouped by area.
Protocol:
- `docs/api/PROTOCOL.md`
+186
@@ -0,0 +1,186 @@
# TypeScript Map (Types + Hot Functions)
This doc is optimized for AI agents: it names the core TypeScript types and the handful of functions/methods that actually control behavior.
For runtime flow diagrams, see:
- `docs/architecture/AGENT_DIAGRAM.md`
- `docs/architecture/CONTRIBUTOR_MAP.md`
## Core Domain Types (What Matters)
### Messages
- `InboundMessage` / `OutboundMessage`
  - Used by channel adapters and the message router.
  - Source: `src/channels/types.ts`, `src/daemon/routing.ts`
### Tools
- `Tool`
  - A single capability callable by the model.
  - Source: `src/tools/types.ts`
- `ToolResult`
  - Return value from a tool.
  - Source: `src/tools/types.ts`
- `ToolPolicyContext`
  - Dynamic context used to decide tool availability and enforcement.
  - Source: `src/tools/policy.ts`
Key fields to know:
- `agent`, `provider`, `autonomyLevel`
- `skillName`, `skillPermissions`
- `executionEnvironment` (`host` or `sandbox`)
- `untrustedContent` (tightens guards after fetched content appears)
### Skills
- `SkillManifest`
  - `manifest.json` parsed and validated.
  - Source: `src/skills/types.ts`
- `SkillPermissions`
  - Capability declarations that get enforced at runtime.
  - Source: `src/skills/types.ts`
### Audit
- `AuditEventType` and tool events (`tool.start`, `tool.success`, `tool.denied`, `tool.approval`)
  - Source: `src/audit/types.ts`
## Hot Functions / Methods (Where Behavior Lives)
If you only read a handful of definitions, read these six:
- `createMessageRouter()`
  - Routes inbound messages, resolves intent targets (agent vs skill).
  - File: `src/daemon/routing.ts`
- `getOrCreateAgent()` (inner helper)
  - Builds per-session `AgentOrchestrator` and sets `toolPolicyContext`.
  - File: `src/daemon/routing.ts`
- `AgentOrchestrator.process()`
  - Runs the agent loop and streams output.
  - File: `src/backends/native/orchestrator.ts`
- `NativeAgent.toolLoop()`
  - The core loop: model -> tool calls -> tool results -> model -> final response.
  - Adds provenance markers to tool results.
  - File: `src/backends/native/agent.ts`
- `ToolPolicy.resolveAllowedNames()`
  - Computes the available tool set for a given context.
  - Enforces skill capability intersection (deny-by-default for skill context).
  - File: `src/tools/policy.ts`
- `ToolExecutor.execute()`
  - Defense-in-depth enforcement + hooks/autonomy + auditing.
  - Enforces skill fs/net/secret scope + injection guard.
  - File: `src/tools/executor.ts`
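The model, tool calls, tool results, model cycle in `NativeAgent.toolLoop()` can be sketched with a stubbed model. Everything here is hypothetical. `ModelTurn`, `Message`, the `maxIterations` cap, and the plain-text `[tool:...]` marker stand in for the real `ChatResponse`, tool-result blocks, and provenance markers in `src/backends/native/agent.ts`. The `execute` callback stands in for `ToolExecutor.execute()`.

```ts
interface ToolCall {
  id: string;
  name: string;
  args: unknown;
}
type ModelTurn = { content: string; toolCalls: ToolCall[] };
type Message = { role: 'user' | 'assistant' | 'tool'; content: string; toolCallId?: string };

type ChatFn = (messages: Message[]) => Promise<ModelTurn>;
type ExecuteFn = (call: ToolCall) => Promise<string>;

async function toolLoopSketch(
  userMessage: string,
  chat: ChatFn,
  execute: ExecuteFn,
  maxIterations = 8,
): Promise<string> {
  const messages: Message[] = [{ role: 'user', content: userMessage }];
  for (let i = 0; i < maxIterations; i++) {
    const turn = await chat(messages);
    messages.push({ role: 'assistant', content: turn.content });
    // No tool calls: the model has produced its final answer.
    if (turn.toolCalls.length === 0) return turn.content;
    for (const call of turn.toolCalls) {
      // In the real loop this call goes through ToolExecutor (policy, hooks, audit).
      const output = await execute(call);
      messages.push({ role: 'tool', toolCallId: call.id, content: `[tool:${call.name}]\n${output}` });
    }
  }
  throw new Error('tool loop exceeded maxIterations');
}

// Usage with a stubbed model that asks for one tool call, then answers.
const fakeChat: ChatFn = async (msgs) =>
  msgs.some((m) => m.role === 'tool')
    ? { content: 'done', toolCalls: [] }
    : { content: '', toolCalls: [{ id: 'c1', name: 'web.fetch', args: { url: 'https://example.com' } }] };

toolLoopSketch('summarize https://example.com', fakeChat, async () => 'page text').then((answer) => {
  console.log(answer); // done
});
```

The iteration cap is a simple guard against a model that keeps requesting tools forever.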
## Diagram: Key Types (Mermaid)
```mermaid
classDiagram
class Tool {
+string name
+string description
+JSONSchema inputSchema
+string[]? requiredSecretScopes
+execute(args): Promise~ToolResult~
}
class ToolResult {
+boolean success
+string output
+string? error
}
class ToolPolicyContext {
+string? agent
+string? provider
+string? autonomyLevel
+string? sessionId
+string? channel
+string? sender
+string? tier
+string? skillName
+SkillPermissions? skillPermissions
+string? executionEnvironment
+boolean? untrustedContent
+string[]? allowedSecretScopes
}
class SkillManifest {
+string name
+string description
+string version
+string tier
+SkillPermissions? permissions
}
class SkillPermissions {
+string[]? tool_groups
+string[]? tools
+SkillFsPermissions? fs
+SkillNetPermission[]? net
+string[]? secrets
+string? execution_environment
}
class AuditToolEvent {
+string tool_name
+string? execution_id
+string? execution_environment
+string? skill_name
+number? redactions_applied
}
Tool --> ToolResult
ToolPolicyContext --> SkillPermissions
SkillManifest --> SkillPermissions
```
## Diagram: Control Flow (Tool Call Path)
```mermaid
flowchart TD
A[Model proposes tool call] --> B[ToolPolicy filters allowed tools]
B --> C[ToolExecutor.execute]
C --> D{Allowed by policy?}
D -- no --> X[Denied + audit]
D -- yes --> E{Hooks/autonomy confirm?}
E -- denied --> X
E -- approved --> F{Skill constraints}
F -- violation --> X
F -- ok --> G{Untrusted content guard}
G -- blocked --> X
G -- ok --> H[Tool.execute]
H --> I["Audit (redacted)"]
I --> J[ToolResult returned to model]
```
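Here is the same path as code: a minimal sketch of an `execute()` pipeline whose checks run in the order the flowchart shows, with every denial audited. The checks are injected callbacks, and the audit event names reuse `tool.start`, `tool.denied`, and `tool.success` from `src/audit/types.ts`. Everything else (`PipelineChecks`, `executeSketch`, the result shape) is hypothetical. Argument redaction and error handling are omitted.

```ts
type Verdict = { ok: true } | { ok: false; reason: string };

interface AuditEvent {
  type: 'tool.start' | 'tool.success' | 'tool.denied';
  tool: string;
  reason?: string;
}

interface ToolResultSketch {
  success: boolean;
  output: string;
  error?: string;
}

// Hypothetical hooks for each gate in the flowchart, in order.
interface PipelineChecks {
  policyAllows: (tool: string) => boolean;
  confirm: (tool: string, args: unknown) => Promise<boolean>; // hooks + autonomy
  skillConstraints: (tool: string, args: unknown) => Verdict; // fs/net/secret scopes
  injectionGuard: (tool: string, args: unknown) => Verdict; // untrusted-content guard
}

async function executeSketch(
  tool: string,
  args: unknown,
  run: (args: unknown) => Promise<string>,
  checks: PipelineChecks,
  audit: (event: AuditEvent) => void,
): Promise<ToolResultSketch> {
  // Every denial path records an audit event before returning.
  const deny = (reason: string): ToolResultSketch => {
    audit({ type: 'tool.denied', tool, reason });
    return { success: false, output: '', error: reason };
  };

  if (!checks.policyAllows(tool)) return deny('not allowed by policy');
  if (!(await checks.confirm(tool, args))) return deny('not approved');
  const skill = checks.skillConstraints(tool, args);
  if (!skill.ok) return deny(skill.reason);
  const guard = checks.injectionGuard(tool, args);
  if (!guard.ok) return deny(guard.reason);

  audit({ type: 'tool.start', tool });
  const output = await run(args); // error handling omitted in this sketch
  audit({ type: 'tool.success', tool });
  return { success: true, output };
}
```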
## Diagram: Module Entry Points
These are the places you typically jump to first.
```text
src/daemon/index.ts
  - wires together: config + skillRegistry + toolRegistry + toolExecutor + router
src/daemon/routing.ts
  - inbound routing + intent match + per-session agent construction
src/backends/native/agent.ts
  - tool loop (the actual "agent")
src/tools/policy.ts
  - tool allow/deny resolution (+ skill capability intersection)
src/tools/executor.ts
  - enforcement + hooks + auditing
```