diff --git a/docs/plans/2026-02-14-openclaw-style-personal-agent-without-openclaw-risks-plan.md b/docs/plans/2026-02-14-openclaw-style-personal-agent-without-openclaw-risks-plan.md
new file mode 100644
index 0000000..ecf4e43
--- /dev/null
+++ b/docs/plans/2026-02-14-openclaw-style-personal-agent-without-openclaw-risks-plan.md
@@ -0,0 +1,159 @@
+# OpenClaw-Style Personal Agent (Without OpenClaw Risks) — Milestone Plan
+
+**Date:** 2026-02-14
+**Scope:** Make Flynn feel as efficient as a productized personal assistant (OpenClaw-like), while explicitly avoiding the structural security risks of a broad “skills marketplace + many surfaces + powerful tools” system.
+**Context:** Flynn already has strong core architecture (multi-channel daemon, gateway/UI, tool policy, Docker sandboxing, memory, automation). This milestone is about tightening the trust boundary and making the safe path the default path.
+
+## Goals
+
+1. **Safe-by-default extension model**: “skills” are locally-auditable, capability-declared, and policy-enforced.
+2. **Sandbox-by-default for risk**: any skill/tool execution with meaningful blast radius runs in an isolation boundary by default.
+3. **Prompt-injection resistance**: fetched/untrusted content cannot directly drive tool calls or secret exfiltration.
+4. **Operational clarity**: high-quality audit trails of tool calls, approvals, and materialized changes.
+5. **Product efficiency parity**: fast onboarding and “works everywhere I already talk” without expanding the attack surface indiscriminately.
+
+## Non-Goals (for this milestone)
+
+- A public “skills marketplace” or auto-installing third-party skills.
+- Unbounded remote device/node execution.
+- Full browser RPA parity (can be a later milestone).
+
+## Threat Model (Practical)
+
+We assume:
+- An attacker can send messages on at least one channel surface (DM or group), and can embed prompt-injection payloads in web pages, email bodies, PDFs, etc.
+- The assistant has access to local files, network, and credential material (directly or indirectly).
+
+We must prevent:
+- **Unauthorized tool execution** (filesystem writes, shell, browser automation, network posts).
+- **Data exfiltration** (secrets and private data leaving the machine via model/tool).
+- **Supply-chain compromise via skills** (untrusted code/commands running because “it’s a skill”).
+
+## Design Principles
+
+- **Capability declarations beat intentions**: a skill’s declared permissions determine what it can do, not the LLM’s narrative.
+- **Deny by default, allow narrowly**: explicit whitelists for tools, paths, and network hosts.
+- **Untrusted content isolation**: fetched content is treated as data, never as instructions.
+- **Make approvals boring**: human-in-the-loop prompts are short, scoped, and explain impact.
+
+## Milestone Structure (PR-Sized Slices)
+
+### PR 1: Capability Manifests + Policy Binding (Skills)
+
+**Outcome:** Every skill has a manifest that declares *what it is allowed to do*, and Flynn enforces that at runtime.
+
+**Work items**
+- Define `skills/manifest.json` schema extension:
+  - `permissions.tool_groups`: e.g. `["group:fs", "group:web", "group:runtime"]`
+  - `permissions.tools`: explicit allowlist patterns, overrides tool_groups
+  - `permissions.fs`: allowlisted path globs (read/write separate)
+  - `permissions.net`: allowlisted host globs + ports (optional)
+  - `permissions.secrets`: named secret scopes (no ambient access)
+- Bind skill permissions to tool policy evaluation (skill execution context carries a policy profile).
+- Add a “capability diff” display for installs/enables:
+  - “This skill requests: write access to `~/Documents/notes/**`, network access to `api.todoist.com:443`, tool `web.search`.”
+- Tests:
+  - Skill with insufficient permissions cannot invoke denied tool or write outside allowed paths.
+  - Skill with allowed permissions succeeds.
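Reviewer sketch for the PR 1 work items above: one way the manifest's `permissions` block could be evaluated at runtime, written in Python. The `fs` shape (`{"read": [...], "write": [...]}`), the class `SkillPermissions`, and the helpers `tool_allowed` and `path_allowed` are assumptions made for illustration, not Flynn's actual API. The plan only states that tools are allowlisted by pattern and that read and write globs are kept separate.

```python
import fnmatch
import posixpath
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillPermissions:
    """Hypothetical in-memory form of a manifest's `permissions` block."""

    tools: tuple = ()
    fs_read: tuple = ()
    fs_write: tuple = ()

    @classmethod
    def from_manifest(cls, manifest: dict) -> "SkillPermissions":
        perms = manifest.get("permissions", {})
        fs = perms.get("fs", {})  # assumed shape: {"read": [...], "write": [...]}
        return cls(
            tools=tuple(perms.get("tools", [])),
            fs_read=tuple(fs.get("read", [])),
            fs_write=tuple(fs.get("write", [])),
        )


def tool_allowed(perms: SkillPermissions, tool: str) -> bool:
    """Deny by default: a tool must match at least one allowlist pattern."""
    return any(fnmatch.fnmatchcase(tool, pattern) for pattern in perms.tools)


def path_allowed(perms: SkillPermissions, path: str, mode: str) -> bool:
    """Check a path against the read or write globs for the given mode."""
    # Collapse ".." segments first so traversal cannot escape an allowed prefix.
    normalized = posixpath.normpath(path)
    globs = perms.fs_write if mode == "write" else perms.fs_read
    return any(fnmatch.fnmatchcase(normalized, glob) for glob in globs)


manifest = {
    "permissions": {
        "tools": ["web.search"],
        "fs": {"read": [], "write": ["/home/u/notes/**"]},
    }
}
perms = SkillPermissions.from_manifest(manifest)
```

With this manifest, `tool_allowed(perms, "shell.exec")` is false because nothing in the allowlist matches, and a write to `/home/u/notes/../.ssh/id_rsa` is rejected because the path is normalized before matching. A real implementation would also need to expand `~` and resolve symlinks, which this sketch leaves out.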
+
+**Acceptance**
+- A skill cannot call a denied tool even if the model tries.
+- A skill cannot read/write outside declared fs scopes.
+
+---
+
+### PR 2: Sandbox-by-Default Enforcement for High-Risk Tools
+
+**Outcome:** Risky actions run inside isolation by default, with a clear escape hatch for trusted local workflows.
+
+**Work items**
+- Define a “risk tier” mapping for tool groups:
+  - `low`: pure compute / formatting
+  - `medium`: web fetch/search
+  - `high`: filesystem writes, shell/process, browser automation, credentialed API actions
+- Enforce:
+  - `high` tools must execute inside the session sandbox (Docker) unless policy explicitly allows “host mode”.
+- Add a per-session “execution environment” indicator in gateway/UI (host vs sandbox).
+- Tests:
+  - Attempting `high` tool in host mode without explicit allow fails deterministically.
+
+**Acceptance**
+- By default, tools with meaningful blast radius do not run on the host.
+
+---
+
+### PR 3: Prompt Injection Firewall (Content Provenance + Tool Gating)
+
+**Outcome:** Untrusted content can be used as *context* but not as *control*.
+
+**Work items**
+- Introduce provenance tags on content:
+  - `user_message` vs `fetched_content` vs `tool_output` vs `memory`
+- Update prompts to:
+  - Explicitly forbid following instructions found inside `fetched_content`/`tool_output`.
+  - Require a short “tool intent” explanation before tool use when untrusted content is present.
+- Add a “tool-call guard” layer:
+  - If the model’s tool call arguments contain obvious injection markers (“ignore previous”, “exfiltrate”, “send to”, etc.) and the request originated from untrusted content, force a confirmation step or deny.
+  - Reject tool calls that attempt to reference secrets directly in arguments.
+- Tests:
+  - A web page that contains “run shell command X” does not trigger shell tool use.
+  - Model attempts to copy-paste secrets into a network request are blocked/redacted.
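Reviewer sketch for the PR 3 "tool-call guard" item above, in Python. The names `Provenance`, `guard_tool_call`, and the three-way `allow`/`confirm`/`deny` result are hypothetical. The sketch also approximates "the request originated from untrusted content" as "untrusted content is present in the current context", which is a simplifying assumption rather than something the plan specifies.

```python
from enum import Enum


class Provenance(Enum):
    USER_MESSAGE = "user_message"
    FETCHED_CONTENT = "fetched_content"
    TOOL_OUTPUT = "tool_output"
    MEMORY = "memory"


# The plan forbids following instructions found in these two content kinds.
UNTRUSTED = {Provenance.FETCHED_CONTENT, Provenance.TOOL_OUTPUT}

# Marker phrases taken from the plan; a real list would be longer.
INJECTION_MARKERS = ("ignore previous", "exfiltrate", "send to")


def _strings(value):
    """Yield every string found in a nested dict/list argument structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from _strings(key)
            yield from _strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from _strings(item)


def guard_tool_call(args: dict, context_provenance: set, secret_values: list) -> str:
    """Return "deny", "confirm", or "allow" for a proposed tool call."""
    text = "\n".join(_strings(args)).lower()
    # Secrets must never appear literally in tool arguments.
    if any(secret and secret.lower() in text for secret in secret_values):
        return "deny"
    # Injection markers only escalate when untrusted content is in play.
    if context_provenance & UNTRUSTED and any(m in text for m in INJECTION_MARKERS):
        return "confirm"
    return "allow"
```

Substring matching on marker phrases is easy to evade, so a guard like this is best read as one layer next to the provenance-aware prompts and policy checks listed above, not as a complete defense.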
+
+**Acceptance**
+- Untrusted content cannot directly cause tool execution without an explicit, policy-consistent path.
+
+---
+
+### PR 4: Secret Scoping + Audit Logging (Operator-Grade)
+
+**Outcome:** Secrets are scoped and auditable; tool usage is traceable without leaking secrets into logs.
+
+**Work items**
+- Add a “secret scope” mechanism for tools and skills:
+  - Tools declare `requiredSecretScopes[]`
+  - Skills declare `permissions.secrets[]`
+  - Resolution: only scoped secrets are provided to tool execution; no ambient env pass-through.
+- Extend audit log events:
+  - tool start/end, policy decision, sandbox vs host, approval prompts, redactions applied
+  - stable ids for correlation per session
+- Tests:
+  - Secrets never appear in logs (even on tool failure).
+  - Tool without declared scope cannot obtain secret.
+
+**Acceptance**
+- You can answer “what ran, with what permissions, in what environment” for any session.
+
+---
+
+### PR 5: Product Efficiency Layer (Minimal Surfaces, Max Habit)
+
+**Outcome:** Make Flynn feel “always there” without adding unnecessary new attack surface.
+
+**Work items**
+- Tighten onboarding defaults:
+  - choose 2-3 “recommended surfaces” and ensure they are first-class (WebChat + one messaging channel).
+  - ensure pairing/auth flows are the default for unknown senders.
+- Add “assistant ergonomics”:
+  - a small set of default automations that are safe (summaries, reminders) behind explicit opt-in.
+  - fast-path help for configuration gaps (already used for voice transcription, continue pattern).
+- Tests:
+  - Setup wizard produces a config with safe defaults: pairing on, conservative tool profile, sandbox for high-risk.
+
+**Acceptance**
+- A new user can get to a useful assistant quickly, and default config does not expose dangerous capabilities.
+
+## Sequencing
+
+1. PR 1 (capabilities) and PR 2 (sandbox enforcement) first: they set the trust boundary.
+2. PR 3 (injection firewall) next: it hardens the most common real-world failure mode.
+3. PR 4 (secrets + audit) to make the system operable and trustworthy long-term.
+4. PR 5 (efficiency layer) last: UX improvements on top of hardened foundations.
+
+## Definition of Done (Milestone)
+
+- Skills are capability-declared and enforced by policy.
+- High-risk tools default to sandbox execution.
+- Untrusted content is isolated and cannot drive tool execution without policy-compliant intent + gating.
+- Secrets are scoped and never leak into logs.
+- New-user path yields a safe, useful personal assistant in under ~10 minutes on WebChat.
+
diff --git a/docs/plans/state.json b/docs/plans/state.json
index 570b629..7c3a19c 100644
--- a/docs/plans/state.json
+++ b/docs/plans/state.json
@@ -4,6 +4,12 @@
   "description": "Tracks the status of all Flynn plans and implementation phases",
   "plans": {
+    "openclaw-style-personal-agent-without-openclaw-risks": {
+      "file": "2026-02-14-openclaw-style-personal-agent-without-openclaw-risks-plan.md",
+      "status": "planned",
+      "date": "2026-02-14",
+      "summary": "Milestone plan to reach OpenClaw-style personal-assistant efficiency with a safer trust boundary: capability-declared skills, sandbox-by-default for high-risk tools, prompt-injection firewall, secret scoping, and audit logging."
+    },
     "openclaw-feature-gap-analysis": {
       "file": "2026-02-06-openclaw-feature-gap-analysis.md",
       "status": "completed",