# OpenClaw-Style Personal Agent (Without OpenClaw Risks) — Milestone Plan

**Date:** 2026-02-14

**Scope:** Make Flynn feel as efficient as a productized personal assistant (OpenClaw-like), while explicitly avoiding the structural security risks of a broad “skills marketplace + many surfaces + powerful tools” system.

**Context:** Flynn already has strong core architecture (multi-channel daemon, gateway/UI, tool policy, Docker sandboxing, memory, automation). This milestone is about tightening the trust boundary and making the safe path the default path.

## Goals

1. **Safe-by-default extension model**: “skills” are locally-auditable, capability-declared, and policy-enforced.
2. **Sandbox-by-default for risk**: any skill/tool execution with meaningful blast radius runs in an isolation boundary by default.
3. **Prompt-injection resistance**: fetched/untrusted content cannot directly drive tool calls or secret exfiltration.
4. **Operational clarity**: high-quality audit trails of tool calls, approvals, and materialized changes.
5. **Product efficiency parity**: fast onboarding and “works everywhere I already talk” without expanding the attack surface indiscriminately.

## Non-Goals (for this milestone)

- A public “skills marketplace” or auto-installing third-party skills.
- Unbounded remote device/node execution.
- Full browser RPA parity (can be a later milestone).

## Threat Model (Practical)

We assume:

- An attacker can send messages on at least one channel surface (DM or group), and can embed prompt-injection payloads in web pages, email bodies, PDFs, etc.
- The assistant has access to local files, network, and credential material (directly or indirectly).

We must prevent:

- **Unauthorized tool execution** (filesystem writes, shell, browser automation, network posts).
- **Data exfiltration** (secrets and private data leaving the machine via the model or a tool).
- **Supply-chain compromise via skills** (untrusted code/commands running because “it’s a skill”).

## Design Principles

- **Capability declarations beat intentions**: a skill’s declared permissions determine what it can do, not the LLM’s narrative.
- **Deny by default, allow narrowly**: explicit allowlists for tools, paths, and network hosts.
- **Untrusted content isolation**: fetched content is treated as data, never as instructions.
- **Make approvals boring**: human-in-the-loop prompts are short, scoped, and explain impact.

## Milestone Structure (PR-Sized Slices)
### PR 1: Capability Manifests + Policy Binding (Skills)

**Outcome:** Every skill has a manifest that declares *what it is allowed to do*, and Flynn enforces that at runtime.

**Work items**

- Define `skills/manifest.json` schema extension:
  - `permissions.tool_groups`: e.g. `["group:fs", "group:web", "group:runtime"]`
  - `permissions.tools`: explicit allowlist patterns; overrides `tool_groups`
  - `permissions.fs`: allowlisted path globs (read and write listed separately)
  - `permissions.net`: allowlisted host globs + ports (optional)
  - `permissions.secrets`: named secret scopes (no ambient access)
- Bind skill permissions to tool policy evaluation (skill execution context carries a policy profile).
- Add a “capability diff” display for installs/enables:
  - “This skill requests: write access to `~/Documents/notes/**`, network access to `api.todoist.com:443`, tool `web.search`.”
- Tests:
  - A skill with insufficient permissions cannot invoke a denied tool or write outside allowed paths.
  - A skill with allowed permissions succeeds.

**Acceptance**

- A skill cannot call a denied tool even if the model tries.
- A skill cannot read/write outside declared fs scopes.

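The manifest-to-policy binding described above can be sketched roughly as follows. This is an illustrative sketch, not Flynn's implementation: the tool names, the group membership table, the `SkillPermissions` class, and the helper functions are all hypothetical. It also assumes one reading of “overrides”: when `permissions.tools` is non-empty, it alone decides and `tool_groups` is ignored. Paths are normalized before glob matching so that `..` segments cannot escape a declared scope; symlink resolution and `~` expansion are left out.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
import posixpath

# Hypothetical group membership; a real mapping would live in the tool policy.
TOOL_GROUPS = {
    "group:fs": {"fs.read", "fs.write"},
    "group:web": {"web.search", "web.fetch"},
    "group:runtime": {"shell.exec"},
}

# Example manifest fragment using the permission keys from the work items.
MANIFEST = {
    "name": "notes-sync",  # hypothetical skill name
    "permissions": {
        "tool_groups": ["group:web"],
        "tools": ["web.search", "fs.*"],
        "fs": {"read": ["/home/u/notes/**"], "write": ["/home/u/notes/**"]},
    },
}


@dataclass(frozen=True)
class SkillPermissions:
    tool_groups: tuple = ()
    tools: tuple = ()
    fs_read: tuple = ()
    fs_write: tuple = ()

    @classmethod
    def from_manifest(cls, manifest):
        perms = manifest.get("permissions", {})
        fs = perms.get("fs", {})
        return cls(
            tool_groups=tuple(perms.get("tool_groups", [])),
            tools=tuple(perms.get("tools", [])),
            fs_read=tuple(fs.get("read", [])),
            fs_write=tuple(fs.get("write", [])),
        )


def tool_allowed(perms, tool):
    """Deny by default; an explicit tools list takes precedence over groups."""
    if perms.tools:
        return any(fnmatchcase(tool, pattern) for pattern in perms.tools)
    return any(tool in TOOL_GROUPS.get(g, set()) for g in perms.tool_groups)


def path_allowed(perms, path, write=False):
    """Match a normalized path against the read or write globs."""
    # Collapse "a/../b" first so traversal cannot slip past a glob.
    normalized = posixpath.normpath(path)
    globs = perms.fs_write if write else perms.fs_read
    return any(fnmatchcase(normalized, g) for g in globs)
```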
---
### PR 2: Sandbox-by-Default Enforcement for High-Risk Tools

**Outcome:** Risky actions run inside isolation by default, with a clear escape hatch for trusted local workflows.

**Work items**

- Define a “risk tier” mapping for tool groups:
  - `low`: pure compute / formatting
  - `medium`: web fetch/search
  - `high`: filesystem writes, shell/process, browser automation, credentialed API actions
- Enforce:
  - `high` tools must execute inside the session sandbox (Docker) unless policy explicitly allows “host mode”.
- Add a per-session “execution environment” indicator in gateway/UI (host vs sandbox).
- Tests:
  - Attempting a `high` tool in host mode without an explicit allow fails deterministically.

**Acceptance**

- By default, tools with meaningful blast radius do not run on the host.

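One way to picture the risk-tier enforcement above is a small decision function. The group names, `PolicyError`, and `choose_environment` are hypothetical and only illustrate the rule. Two assumptions are made beyond the text: unknown tool groups are treated as `high`, and a `high` tool requested in host mode fails outright rather than being silently rerouted into the sandbox.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical tool-group names mapped onto the three tiers from the plan.
RISK_BY_GROUP = {
    "group:format": Risk.LOW,
    "group:web": Risk.MEDIUM,
    "group:fs_write": Risk.HIGH,
    "group:runtime": Risk.HIGH,
    "group:browser": Risk.HIGH,
}


class PolicyError(Exception):
    """Raised when a tool may not run in the requested environment."""


def choose_environment(tool_group, session_env, allow_host_high_risk=False):
    """Return "host" or "sandbox" for this call, or raise PolicyError."""
    # Assumption: anything not in the table is handled as high risk.
    risk = RISK_BY_GROUP.get(tool_group, Risk.HIGH)
    if risk is not Risk.HIGH:
        return session_env
    if session_env == "sandbox":
        return "sandbox"
    if allow_host_high_risk:
        return "host"  # explicit escape hatch for trusted local workflows
    raise PolicyError(f"{tool_group} is high risk and host mode is not allowed")
```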
---
### PR 3: Prompt Injection Firewall (Content Provenance + Tool Gating)

**Outcome:** Untrusted content can be used as *context* but not as *control*.

**Work items**

- Introduce provenance tags on content:
  - `user_message` vs `fetched_content` vs `tool_output` vs `memory`
- Update prompts to:
  - Explicitly forbid following instructions found inside `fetched_content`/`tool_output`.
  - Require a short “tool intent” explanation before tool use when untrusted content is present.
- Add a “tool-call guard” layer:
  - If the model’s tool call arguments contain obvious injection markers (“ignore previous”, “exfiltrate”, “send to”, etc.) and the request originated from untrusted content, force a confirmation step or deny.
  - Reject tool calls that attempt to reference secrets directly in arguments.
- Tests:
  - A web page that contains “run shell command X” does not trigger shell tool use.
  - A model attempt to copy secrets into a network request is blocked or redacted.

**Acceptance**

- Untrusted content cannot directly cause tool execution without an explicit, policy-consistent path.

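The provenance tags and the tool-call guard could fit together along these lines. This is a hedged sketch: `guard_tool_call`, the three decision strings, and the idea of passing known secret values into the guard are assumptions made for illustration. The marker list is taken from the examples above and is only a heuristic; it is not a complete defense on its own.

```python
import re
from enum import Enum


class Provenance(Enum):
    USER_MESSAGE = "user_message"
    FETCHED_CONTENT = "fetched_content"
    TOOL_OUTPUT = "tool_output"
    MEMORY = "memory"


# The plan names these two tags as sources whose instructions must be ignored.
UNTRUSTED = {Provenance.FETCHED_CONTENT, Provenance.TOOL_OUTPUT}

# Markers quoted in the work items; matching is a heuristic only.
INJECTION_MARKERS = re.compile(r"ignore previous|exfiltrate|send to", re.IGNORECASE)


def _strings(value):
    """Yield every string nested inside dicts, lists, and tuples."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from _strings(item)


def guard_tool_call(args, provenances, secret_values=()):
    """Return "allow", "confirm", or "deny" for a proposed tool call."""
    texts = list(_strings(args))
    # Secrets must never appear literally in tool arguments.
    if any(s and s in text for s in secret_values for text in texts):
        return "deny"
    untrusted_present = bool(UNTRUSTED & set(provenances))
    if untrusted_present and any(INJECTION_MARKERS.search(t) for t in texts):
        return "confirm"  # force a human confirmation step
    return "allow"
```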
---
### PR 4: Secret Scoping + Audit Logging (Operator-Grade)

**Outcome:** Secrets are scoped and auditable; tool usage is traceable without leaking secrets into logs.

**Work items**

- Add a “secret scope” mechanism for tools and skills:
  - Tools declare `requiredSecretScopes[]`
  - Skills declare `permissions.secrets[]`
  - Resolution: only scoped secrets are provided to tool execution; no ambient env pass-through.
- Extend audit log events:
  - tool start/end, policy decision, sandbox vs host, approval prompts, redactions applied
  - stable IDs for correlation per session
- Tests:
  - Secrets never appear in logs (even on tool failure).
  - A tool without a declared scope cannot obtain a secret.

**Acceptance**

- You can answer “what ran, with what permissions, in what environment” for any session.

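Scoped secret resolution and redacted audit events might look something like the sketch below. The function names, the `[REDACTED]` placeholder, the event fields, and the `session:sequence` ID format are all hypothetical. For simplicity, the sketch assumes one secret value per scope and a secret store that is a plain dictionary.

```python
import json


class ScopeError(Exception):
    """Raised when a tool requires a scope the skill was not granted."""


def resolve_secrets(store, required_scopes, granted_scopes):
    """Hand a tool only the secrets it declared AND the skill was granted."""
    missing = [s for s in required_scopes if s not in granted_scopes]
    if missing:
        raise ScopeError(f"scopes not granted: {missing}")
    # A tool that declares no scopes receives nothing (no ambient access).
    return {scope: store[scope] for scope in required_scopes}


def redact(text, secret_values):
    """Replace every known secret value before the text reaches a log."""
    # Longest first, so a secret containing a shorter one is fully removed.
    for value in sorted(secret_values, key=len, reverse=True):
        if value:
            text = text.replace(value, "[REDACTED]")
    return text


def audit_event(session_id, seq, kind, environment, detail, secret_values):
    """Build one JSON log line with a stable per-session event ID."""
    event = {
        "id": f"{session_id}:{seq}",
        "session": session_id,
        "kind": kind,                # e.g. "tool_start", "tool_end"
        "environment": environment,  # "sandbox" or "host"
        "detail": redact(detail, secret_values),
    }
    return json.dumps(event, sort_keys=True)
```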
---
### PR 5: Product Efficiency Layer (Minimal Surfaces, Max Habit)

**Outcome:** Make Flynn feel “always there” without adding unnecessary new attack surface.

**Work items**

- Tighten onboarding defaults:
  - choose 2-3 “recommended surfaces” and ensure they are first-class (WebChat + one messaging channel).
  - ensure pairing/auth flows are the default for unknown senders.
- Add “assistant ergonomics”:
  - a small set of default automations that are safe (summaries, reminders) behind explicit opt-in.
  - fast-path help for configuration gaps (already used for voice transcription; continue that pattern).
- Tests:
  - Setup wizard produces a config with safe defaults: pairing on, conservative tool profile, sandbox for high-risk.

**Acceptance**

- A new user can get to a useful assistant quickly, and default config does not expose dangerous capabilities.

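The safe-defaults test for the setup wizard could be expressed as a check over the generated config. The sketch below is only an illustration: `OnboardingConfig`, its field names, and the `"conservative"` profile label are hypothetical stand-ins for whatever the wizard actually emits.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OnboardingConfig:
    # Hypothetical fields mirroring the three safe defaults named in the test.
    pairing_required: bool = True
    tool_profile: str = "conservative"
    sandbox_high_risk: bool = True
    enabled_automations: tuple = ()  # automations stay off until opted in


def unsafe_settings(cfg):
    """List every way a config departs from the safe defaults."""
    problems = []
    if not cfg.pairing_required:
        problems.append("pairing is off for unknown senders")
    if cfg.tool_profile != "conservative":
        problems.append(f"tool profile is {cfg.tool_profile!r}")
    if not cfg.sandbox_high_risk:
        problems.append("high-risk tools may run on the host")
    return problems
```

A wizard test would then assert that `unsafe_settings(generated_config)` is empty, while a deliberately loosened config (for example via `replace(cfg, pairing_required=False)`) reports the specific gap.
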
## Sequencing

1. PR 1 (capabilities) and PR 2 (sandbox enforcement) first: they set the trust boundary.
2. PR 3 (injection firewall) next: it hardens the most common real-world failure mode.
3. PR 4 (secrets + audit) to make the system operable and trustworthy long-term.
4. PR 5 (efficiency layer) last: UX improvements on top of hardened foundations.

## Definition of Done (Milestone)

- Skills are capability-declared and enforced by policy.
- High-risk tools default to sandbox execution.
- Untrusted content is isolated and cannot drive tool execution without policy-compliant intent + gating.
- Secrets are scoped and never leak into logs.
- New-user path yields a safe, useful personal assistant in under ~10 minutes on WebChat.