will/flynn

Files

T

William Valentin 892668cb2f docs(plan): OpenClaw-style personal agent without OpenClaw risks

2026-02-14 09:51:17 -08:00

7.9 KiB

Raw Blame History

OpenClaw-Style Personal Agent (Without OpenClaw Risks) — Milestone Plan

Date: 2026-02-14 Scope: Make Flynn feel as efficient as a productized personal assistant (OpenClaw-like), while explicitly avoiding the structural security risks of a broad “skills marketplace + many surfaces + powerful tools” system. Context: Flynn already has strong core architecture (multi-channel daemon, gateway/UI, tool policy, Docker sandboxing, memory, automation). This milestone is about tightening the trust boundary and making the safe path the default path.

Goals

Safe-by-default extension model: “skills” are locally-auditable, capability-declared, and policy-enforced.
Sandbox-by-default for risk: any skill/tool execution with meaningful blast radius runs in an isolation boundary by default.
Prompt-injection resistance: fetched/untrusted content cannot directly drive tool calls or secret exfiltration.
Operational clarity: high-quality audit trails of tool calls, approvals, and materialized changes.
Product efficiency parity: fast onboarding and “works everywhere I already talk” without expanding the attack surface indiscriminately.

Non-Goals (for this milestone)

A public “skills marketplace” or auto-installing third-party skills.
Unbounded remote device/node execution.
Full browser RPA parity (can be a later milestone).

Threat Model (Practical)

We assume:

An attacker can send messages on at least one channel surface (DM or group), and can embed prompt-injection payloads in web pages, email bodies, PDFs, etc.
The assistant has access to local files, network, and credential material (directly or indirectly).

We must prevent:

Unauthorized tool execution (filesystem writes, shell, browser automation, network posts).
Data exfiltration (secrets and private data leaving the machine via model/tool).
Supply-chain compromise via skills (untrusted code/commands running because “it’s a skill”).

Design Principles

Capability declarations beat intentions: a skill’s declared permissions determine what it can do, not the LLM’s narrative.
Deny by default, allow narrowly: explicit whitelists for tools, paths, and network hosts.
Untrusted content isolation: fetched content is treated as data, never as instructions.
Make approvals boring: human-in-the-loop prompts are short, scoped, and explain impact.

Milestone Structure (PR-Sized Slices)

PR 1: Capability Manifests + Policy Binding (Skills)

Outcome: Every skill has a manifest that declares what it is allowed to do, and Flynn enforces that at runtime.

Work items

Define skills/manifest.json schema extension:
- permissions.tool_groups: e.g. ["group:fs", "group:web", "group:runtime"]
- permissions.tools: explicit allowlist patterns, overrides tool_groups
- permissions.fs: allowlisted path globs (read/write separate)
- permissions.net: allowlisted host globs + ports (optional)
- permissions.secrets: named secret scopes (no ambient access)
Bind skill permissions to tool policy evaluation (skill execution context carries a policy profile).
Add a “capability diff” display for installs/enables:
- “This skill requests: write access to ~/Documents/notes/**, network access to api.todoist.com:443, tool web.search.”
Tests:
- Skill with insufficient permissions cannot invoke denied tool or write outside allowed paths.
- Skill with allowed permissions succeeds.

Acceptance

A skill cannot call a denied tool even if the model tries.
A skill cannot read/write outside declared fs scopes.

PR 2: Sandbox-by-Default Enforcement for High-Risk Tools

Outcome: Risky actions run inside isolation by default, with a clear escape hatch for trusted local workflows.

Work items

Define a “risk tier” mapping for tool groups:
- low: pure compute / formatting
- medium: web fetch/search
- high: filesystem writes, shell/process, browser automation, credentialed API actions
Enforce:
- high tools must execute inside the session sandbox (Docker) unless policy explicitly allows “host mode”.
Add a per-session “execution environment” indicator in gateway/UI (host vs sandbox).
Tests:
- Attempting high tool in host mode without explicit allow fails deterministically.

Acceptance

By default, tools with meaningful blast radius do not run on the host.

PR 3: Prompt Injection Firewall (Content Provenance + Tool Gating)

Outcome: Untrusted content can be used as context but not as control.

Work items

Introduce provenance tags on content:
- user_message vs fetched_content vs tool_output vs memory
Update prompts to:
- Explicitly forbid following instructions found inside fetched_content/tool_output.
- Require a short “tool intent” explanation before tool use when untrusted content is present.
Add a “tool-call guard” layer:
- If the model’s tool call arguments contain obvious injection markers (“ignore previous”, “exfiltrate”, “send to”, etc.) and the request originated from untrusted content, force a confirmation step or deny.
- Reject tool calls that attempt to reference secrets directly in arguments.
Tests:
- A web page that contains “run shell command X” does not trigger shell tool use.
- Model attempts to copy-paste secrets into a network request are blocked/redacted.

Acceptance

Untrusted content cannot directly cause tool execution without an explicit, policy-consistent path.

PR 4: Secret Scoping + Audit Logging (Operator-Grade)

Outcome: Secrets are scoped and auditable; tool usage is traceable without leaking secrets into logs.

Work items

Add a “secret scope” mechanism for tools and skills:
- Tools declare requiredSecretScopes[]
- Skills declare permissions.secrets[]
- Resolution: only scoped secrets are provided to tool execution; no ambient env pass-through.
Extend audit log events:
- tool start/end, policy decision, sandbox vs host, approval prompts, redactions applied
- stable ids for correlation per session
Tests:
- Secrets never appear in logs (even on tool failure).
- Tool without declared scope cannot obtain secret.

Acceptance

You can answer “what ran, with what permissions, in what environment” for any session.

PR 5: Product Efficiency Layer (Minimal Surfaces, Max Habit)

Outcome: Make Flynn feel “always there” without adding unnecessary new attack surface.

Work items

Tighten onboarding defaults:
- choose 2-3 “recommended surfaces” and ensure they are first-class (WebChat + one messaging channel).
- ensure pairing/auth flows are the default for unknown senders.
Add “assistant ergonomics”:
- a small set of default automations that are safe (summaries, reminders) behind explicit opt-in.
- fast-path help for configuration gaps (already used for voice transcription, continue pattern).
Tests:
- Setup wizard produces a config with safe defaults: pairing on, conservative tool profile, sandbox for high-risk.

Acceptance

A new user can get to a useful assistant quickly, and default config does not expose dangerous capabilities.

Sequencing

PR 1 (capabilities) and PR 2 (sandbox enforcement) first: they set the trust boundary.
PR 3 (injection firewall) next: it hardens the most common real-world failure mode.
PR 4 (secrets + audit) to make the system operable and trustworthy long-term.
PR 5 (efficiency layer) last: UX improvements on top of hardened foundations.

Definition of Done (Milestone)

Skills are capability-declared and enforced by policy.
High-risk tools default to sandbox execution.
Untrusted content is isolated and cannot drive tool execution without policy-compliant intent + gating.
Secrets are scoped and never leak into logs.
New-user path yields a safe, useful personal assistant in under ~10 minutes on WebChat.

7.9 KiB Raw Blame History Unescape Escape

OpenClaw-Style Personal Agent (Without OpenClaw Risks) — Milestone Plan

Goals

Non-Goals (for this milestone)

Threat Model (Practical)

Design Principles

Milestone Structure (PR-Sized Slices)

PR 1: Capability Manifests + Policy Binding (Skills)

PR 2: Sandbox-by-Default Enforcement for High-Risk Tools

PR 3: Prompt Injection Firewall (Content Provenance + Tool Gating)

PR 4: Secret Scoping + Audit Logging (Operator-Grade)

PR 5: Product Efficiency Layer (Minimal Surfaces, Max Habit)

Sequencing

Definition of Done (Milestone)

7.9 KiB

Raw Blame History