docs: add safety docs and OpenClaw gap roadmap

William Valentin
2026-02-15 10:17:07 -08:00
parent 28304ac397
commit f2cdd1abd2
14 changed files with 3869 additions and 40 deletions
@@ -0,0 +1,206 @@
# Agent-Oriented Project Diagram
This is a high-signal, agent-oriented view of Flynn's structure and execution flow.
If you're new to the codebase, start here, then jump to the referenced files.
## Big Picture (Runtime Data Flow)
```text
Inbound Message
(Telegram/Discord/Slack/WhatsApp/WebChat)
      |
      v
ChannelAdapter -> ChannelRegistry
      |                  |
      |                  v
      |         createMessageRouter()
      |                  |
      |                  v
      |            SessionManager
      |                  |
      |                  v
      |           AgentOrchestrator
      |                  |
      |                  v
      |              NativeAgent
      |                  |
      |          ModelRouter.chat()
      |                  |
      |                  v
      |              ModelClient
      |
      +----> (optional) PairingManager gate for unknown senders

Tool Calls (inside NativeAgent loop)
NativeAgent -> ToolRegistry (policy-filtered) -> ToolExecutor
     |                                                 |
     |                                                 v
     |                                       HookEngine + autonomy
     |                                                 |
     |                                                 v
     |                                           Tool.execute()
     |                                                 |
     |                                                 v
     +------------------------------------> AuditLogger (redacted)

Outbound Reply
-> ChannelAdapter.send() (text + optional attachments)
```
Key files:
- Routing + per-session agent creation: `src/daemon/routing.ts`
- Orchestration: `src/backends/native/orchestrator.ts`
- Tool loop: `src/backends/native/agent.ts`
- Model routing: `src/models/router.ts`
- Tool policy + execution: `src/tools/policy.ts`, `src/tools/executor.ts`
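The tool loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not Flynn's implementation: the `Message`, `ToolCall`, and `ChatResponse` shapes, the `runToolLoop` name, and the `maxSteps` budget are all assumptions made for the example, and the sketch is synchronous for brevity where real model calls would be asynchronous.

```typescript
// Hypothetical shapes for illustration; the real types live in
// src/backends/native/agent.ts and may differ.
interface ToolCall { name: string; args: Record<string, unknown> }
interface ChatResponse { content: string; toolCalls: ToolCall[] }
interface Message { role: "user" | "assistant" | "tool"; content: string }

type Chat = (messages: Message[]) => ChatResponse;
type ExecuteTool = (call: ToolCall) => string;

// Call the model, run any requested tools, feed the results back, and
// repeat until the model answers without tool calls or the budget runs out.
function runToolLoop(
  userMessage: string,
  chat: Chat,
  executeTool: ExecuteTool,
  maxSteps = 8, // assumed safety budget, not a documented Flynn setting
): string {
  const messages: Message[] = [{ role: "user", content: userMessage }];
  for (let step = 0; step < maxSteps; step++) {
    const response = chat(messages);
    if (response.toolCalls.length === 0) {
      return response.content; // final assistant reply
    }
    messages.push({ role: "assistant", content: response.content });
    for (const call of response.toolCalls) {
      messages.push({ role: "tool", content: executeTool(call) });
    }
  }
  throw new Error("tool loop exceeded maxSteps");
}
```

In this sketch `executeTool` stands in for the whole `ToolRegistry -> ToolExecutor -> HookEngine` path, so policy filtering and auditing are out of scope here.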
## Component Graph (Agent-Safety Boundary)
```text
                      +---------------------------+
                      |          Config           |
                      |    (Zod schema + YAML)    |
                      |   src/config/schema.ts    |
                      +-------------+-------------+
                                    |
                                    v
+------------------------+   +-------------+   +-----------------------+
|     SkillRegistry      |   | ToolPolicy  |   |      HookEngine       |
|      src/skills/*      |   | src/tools/* |   |      src/hooks/*      |
+-----------+------------+   +------+------+   +-----------+-----------+
            |                       |                      |
            | (system prompt)       | (allow/deny)         | (confirm/log/silent)
            v                       v                      v
+------------------------+   +-------------+   +-----------------------+
|     System Prompt      |   | ToolRegistry|   |     ToolExecutor      |
| src/daemon/services.ts |   | src/tools/* |   | src/tools/executor.ts |
+-----------+------------+   +------+------+   +-----------+-----------+
            |                       |                      |
            v                       |                      |
+------------------------+          |                      v
|   AgentOrchestrator    |          |               +-------------+
|     src/backends/*     |          +-------------> | AuditLogger |
+-----------+------------+                          | src/audit/* |
            |                                       +-------------+
            v
+------------------------+
|      NativeAgent       |
|     src/backends/*     |
+-----------+------------+
            |
            v
+------------------------+
|      ModelRouter       |
|      src/models/*      |
+------------------------+
```
## Skills + Capabilities (What Gets Enforced)
Skills are local directories with:
- `SKILL.md` (instructions injected into the system prompt)
- `manifest.json` (metadata + optional `permissions`)
### Skill permissions enforcement points
- Tool availability: `ToolPolicy.resolveAllowedNames()` intersects allowed tools with `manifest.json.permissions`.
- Tool execution (defense in depth): `ToolExecutor.execute()` enforces:
  - fs allowlists (`permissions.fs.read` / `permissions.fs.write`)
  - net allowlists (best-effort for `web.fetch`)
  - secret scopes (tools declare `requiredSecretScopes`, skills allow `permissions.secrets`)
  - injection guard when untrusted content is present
Important default:
- If a request is routed into a skill context but the skill has no `permissions` manifest, **tool access is denied**.
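A minimal sketch of the policy intersection and the deny-by-default rule. The standalone `resolveAllowedNames` function below is written for illustration and is not the real `ToolPolicy` method. Of the manifest fields shown, only `fs`, `secrets`, and `execution_environment` are named in this document; the `tools` list is an assumption about how a skill might enumerate the tools it needs.

```typescript
// Hypothetical manifest shape. `tools` is an assumed field; the real
// definitions are in src/skills/types.ts.
interface SkillPermissions {
  tools?: string[];
  fs?: { read?: string[]; write?: string[] };
  secrets?: string[];
  execution_environment?: "sandbox" | "host";
}
interface SkillManifest { name: string; permissions?: SkillPermissions }

// Returns the tool names a request may use.
// - No skill context: the policy's allow list applies unchanged.
// - Skill context without `permissions`: every tool is denied.
// - Otherwise: intersect the policy's list with the skill's list.
function resolveAllowedNames(
  policyAllowed: readonly string[],
  skill?: SkillManifest,
): string[] {
  if (skill === undefined) {
    return [...policyAllowed];
  }
  if (skill.permissions === undefined) {
    return [];
  }
  const granted = new Set(skill.permissions.tools ?? []);
  return policyAllowed.filter((name) => granted.has(name));
}
```

The intersection means a skill can only narrow what the global policy already allows; listing a tool in the manifest never grants access the policy denies.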
Key files:
- Skill manifest types: `src/skills/types.ts`
- Loader validation: `src/skills/loader.ts`
- Policy intersection: `src/tools/policy.ts`
- Executor enforcement: `src/tools/executor.ts`
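Two of the executor-side checks listed above, the fs allowlist and the secret scopes, might look like the following. This is a sketch under stated assumptions: `isPathAllowed` and `hasSecretScopes` are hypothetical helper names, and paths are assumed to be absolute and already normalized (no `..` segments or symlinks to resolve).

```typescript
// Hypothetical helpers; the real checks are in src/tools/executor.ts.

// True when `target` is one of the allowlisted roots or lies beneath one.
// Assumes absolute, normalized paths without trailing slashes on `target`.
function isPathAllowed(target: string, roots: readonly string[]): boolean {
  return roots.some((root) => {
    const base = root.endsWith("/") ? root : root + "/";
    return target === root || target.startsWith(base);
  });
}

// True when every scope a tool declares in `requiredSecretScopes`
// appears in the skill's `permissions.secrets`.
function hasSecretScopes(
  required: readonly string[],
  granted: readonly string[],
): boolean {
  return required.every((scope) => granted.includes(scope));
}
```

Appending the separator before the prefix check keeps a root such as `/workspace/notes` from also matching `/workspace/notes-private`.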
## Sandbox Execution (High-Risk Tools)
Flynn supports per-session Docker sandboxes.
Where sandboxing is applied today:
- `shell.exec` and `process.start` can be replaced with sandboxed implementations.
- Replacement is wired in `src/daemon/routing.ts` by cloning the ToolRegistry and swapping the tool implementations.
Skill context default:
- High-risk tool execution defaults to `sandbox` in skill context (when available).
- A skill can opt into host execution only by setting `permissions.execution_environment: "host"`.
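The clone-and-swap wiring and the skill-context default can be sketched as below. The `Tool` interface, the `Map`-based registry, and both function names are assumptions for illustration and do not reflect Flynn's actual `ToolRegistry` API. The "when available" fallback is not modeled, since this document does not say what happens when no sandbox exists.

```typescript
// Hypothetical types; the real wiring is in src/daemon/routing.ts.
interface Tool {
  name: string;
  execute(args: Record<string, unknown>): string;
}
type ExecutionEnvironment = "sandbox" | "host";

const HIGH_RISK_TOOL_NAMES = ["shell.exec", "process.start"];

// In skill context, high-risk tools run in the sandbox unless the skill
// explicitly sets permissions.execution_environment to "host".
function resolveEnvironment(requested?: ExecutionEnvironment): ExecutionEnvironment {
  return requested === "host" ? "host" : "sandbox";
}

// Clone the registry and replace high-risk tools with sandboxed versions.
// The original registry is left untouched so other sessions keep theirs.
function withSandboxedTools(
  registry: ReadonlyMap<string, Tool>,
  sandboxed: ReadonlyMap<string, Tool>,
): Map<string, Tool> {
  const clone = new Map(registry);
  for (const name of HIGH_RISK_TOOL_NAMES) {
    const replacement = sandboxed.get(name);
    if (replacement !== undefined && clone.has(name)) {
      clone.set(name, replacement);
    }
  }
  return clone;
}
```

Cloning rather than mutating fits the per-session model: each session can hold a registry bound to its own sandbox without affecting the shared one.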
Key files:
- Sandbox lifecycle: `src/sandbox/manager.ts`, `src/sandbox/docker.ts`
- Sandboxed tool wrappers: `src/sandbox/tools.ts`
- Wiring: `src/daemon/routing.ts`
## Prompt Injection Hardening (Practical)
Flynn treats content provenance as part of the control boundary:
- `web.fetch`, `web.search`, and `browser.content` outputs are treated as untrusted "fetched_content".
- Tool results are wrapped in provenance markers inside the tool loop.
- Once untrusted content is seen, ToolExecutor applies stricter gating (blocks obvious injection patterns for high-risk tools).
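As an illustration of the provenance idea, the sketch below wraps tool output in a marker and gates high-risk calls once untrusted content has been seen. Everything here is hypothetical: the marker format, the `tool_output` label, the function names, and the two regular expressions are placeholders, and a real guard would need a far more careful pattern set.

```typescript
// Hypothetical names and marker format; the real code is in
// src/backends/native/agent.ts and src/tools/executor.ts.
const UNTRUSTED_SOURCES = new Set(["web.fetch", "web.search", "browser.content"]);
const GUARDED_TOOLS = new Set(["shell.exec", "process.start"]);

// Wrap a tool result so the model can see where the text came from.
function wrapToolResult(tool: string, output: string): string {
  const provenance = UNTRUSTED_SOURCES.has(tool) ? "fetched_content" : "tool_output";
  return `<tool_result tool="${tool}" provenance="${provenance}">\n${output}\n</tool_result>`;
}

// Deliberately tiny example patterns, not a real detection list.
const SUSPICIOUS_PATTERNS = [
  /ignore (all )?previous instructions/i,
  /curl[^|]*\|\s*(ba)?sh/i,
];

// Once untrusted content has entered the session, block high-risk calls
// whose arguments match an obvious injection pattern.
function shouldBlock(tool: string, args: string, sawUntrusted: boolean): boolean {
  if (!sawUntrusted || !GUARDED_TOOLS.has(tool)) {
    return false;
  }
  return SUSPICIOUS_PATTERNS.some((pattern) => pattern.test(args));
}
```

Pattern matching of this kind only catches the obvious cases, which is why it sits alongside the policy, sandbox, and hook layers rather than replacing them.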
Key files:
- Provenance wrapping: `src/backends/native/agent.ts`
- Tool-call guard: `src/tools/executor.ts`
- System prompt safety guidance: `src/daemon/services.ts`
## Mermaid (For Fast Visual Scanning)
If your renderer supports Mermaid, the sequence diagram below shows the same runtime flow as the Big Picture section.
```mermaid
sequenceDiagram
    autonumber
    participant U as User
    participant CA as ChannelAdapter
    participant CR as ChannelRegistry
    participant SM as SessionManager
    participant AR as AgentOrchestrator
    participant NA as NativeAgent
    participant MR as ModelRouter
    participant MC as ModelClient
    participant TP as ToolPolicy/Registry
    participant TE as ToolExecutor
    participant HE as HookEngine
    participant AL as AuditLogger
    U->>CA: message
    CA->>CR: onMessage(InboundMessage)
    CR->>SM: getSession(channel, sender)
    SM-->>CR: Session
    CR->>AR: getOrCreateAgent(session + routing)
    AR->>NA: process(userMessage)
    NA->>MR: chat(messages + tools)
    MR->>MC: provider request
    MC-->>MR: response (content or tool_calls)
    MR-->>NA: ChatResponse
    alt model requests tool use
        NA->>TP: filtered tool list (skill + policy)
        NA->>TE: execute(tool, args, context)
        TE->>HE: confirm/log/silent (autonomy)
        HE-->>TE: approved/denied
        TE->>AL: audit (redacted)
        TE-->>NA: ToolResult
        NA->>MR: chat(tool_result blocks)
    end
    NA-->>AR: assistant response
    AR-->>CR: OutboundMessage
    CR-->>CA: send()
    CA-->>U: reply
```