diff --git a/docs/plans/2026-02-18-openclaw-analysis.md b/docs/plans/2026-02-18-openclaw-analysis.md
index 021de83..69bc944 100644
--- a/docs/plans/2026-02-18-openclaw-analysis.md
+++ b/docs/plans/2026-02-18-openclaw-analysis.md
@@ -1,221 +1,275 @@
 ---
-title: OpenClaw Strategic Analysis for Flynn
+title: OpenClaw Strategic Analysis for Flynn (Phase 2 Synthesis)
 doc_type: strategy_analysis
 created: 2026-02-18
 updated: 2026-02-18
-scope: why OpenClaw feels efficient as a personal assistant, and what Flynn should adopt next
-supersedes:
- - docs/plans/2026-02-06-openclaw-feature-gap-analysis.md
+scope: source-backed synthesis of OpenClaw effectiveness patterns and a prioritized Flynn roadmap
+complements:
  - docs/plans/analysis/openclaw-comparison.md
+related:
+ - docs/plans/2026-02-06-openclaw-feature-gap-analysis.md
 sources:
  - https://github.com/openclaw/openclaw
- - https://docs.openclaw.ai/llms.txt
  - https://docs.openclaw.ai/start/lore
  - https://docs.openclaw.ai/concepts/architecture
- - https://docs.openclaw.ai/concepts/agent-loop
- - https://docs.openclaw.ai/concepts/session
  - https://docs.openclaw.ai/concepts/queue
  - https://docs.openclaw.ai/concepts/streaming
  - https://docs.openclaw.ai/concepts/memory
  - https://docs.openclaw.ai/concepts/model-failover
- - https://docs.openclaw.ai/tools/skills
- - https://docs.openclaw.ai/start/wizard
- - README.md
- - src/channels/index.ts
+ - https://docs.openclaw.ai/concepts/agent-loop
+ - https://docs.openclaw.ai/tools/clawhub
+ - docs/plans/analysis/openclaw-comparison.md
+ - docs/plans/2026-02-06-openclaw-feature-gap-analysis.md
  - src/companion/runtimeClient.ts
- - src/tools/policy.ts
+ - src/gateway/lane-queue.ts
+ - src/gateway/handlers/agent.ts
+ - src/gateway/session-bridge.ts
+ - src/models/router.ts
+ - src/context/compaction.ts
 ---
-# OpenClaw Strategic Analysis for Flynn
+# OpenClaw Strategic Analysis for Flynn (Phase 2)
-## 1. Background: ClawdBot -> MoltBot -> OpenClaw
+This document complements the canonical weighted comparison in `docs/plans/analysis/openclaw-comparison.md`.
-OpenClaw, MoltBot, and ClawdBot refer to the same project lineage (branding evolution, not separate products). OpenClaw docs explicitly preserve this history in the lore/start documentation and position OpenClaw as the current identity.
+Purpose:
+- keep the Feb 12 scorecard as the baseline,
+- add source-level findings that materially change roadmap priority,
+- translate those findings into concrete Flynn implementation targets.
-Strategic implication for Flynn: comparisons should treat these names as one continuous product strategy, not three separate benchmarks.
+## 1) Background: ClawdBot -> MoltBot -> OpenClaw
-## 2. What Makes OpenClaw Effective as a Personal Assistant
+From OpenClaw lore documentation:
+- Clawd/Clawdbot phase preceded January 2026.
+- First molt on January 27, 2026 (rename away from Clawd naming after trademark pressure).
+- Final rename to OpenClaw on January 30, 2026.
-This section focuses on behavior and product dynamics, not just a feature checklist.
+Interpretation for Flynn planning:
+- OpenClaw/MoltBot/ClawdBot should be treated as one continuous product strategy and code lineage.
+- The identity shifts were brand changes, not architecture resets.
-### Principle 1: "Always there" presence
+## 2) What Makes OpenClaw Efficient: 8 Mechanisms
-OpenClaw emphasizes ambient availability across user surfaces. The practical effect is low-friction invocation: users do not need to open a specific app and re-establish context every time.
+### 2.1 Unified multi-channel inbox
-Why this matters:
-- Reduces cognitive/context-switch overhead.
-- Increases daily engagement frequency.
+OpenClaw's gateway model centralizes many chat surfaces under one runtime.
-### Principle 2: Proactive push, not only reactive chat
+Why this improves assistant efficiency:
+- less context switching,
+- more opportunities for the assistant to be used in-place.
-OpenClaw architecture and docs emphasize scheduled/event-driven agent behavior (cron, queue/session controls, streaming/event surfaces). The assistant can initiate useful updates instead of waiting for prompts.
+### 2.2 Queue policy as a UX primitive
-Why this matters:
-- Personal assistants feel valuable when they surface information at the right moment.
-- Proactive loops create compounding utility (briefings, alerts, follow-ups).
+OpenClaw queue docs define behavior modes beyond plain FIFO, including `collect`, `followup`, `steer`, `steer-backlog`, and legacy `interrupt` semantics.
-### Principle 3: Workflow-oriented execution with user control
+Why this improves assistant efficiency:
+- users can steer/reshape ongoing work without waiting for full completion,
+- long-running turns feel controllable instead of blocking.
-OpenClaw's agent-loop and queue/session model prioritize reliable multi-step execution with explicit control points.
+### 2.3 Local-first gateway architecture
-Why this matters:
-- Multi-step operations are where assistants save real time.
-- Human checkpoints preserve trust when actions are high-impact.
+OpenClaw docs emphasize local gateway operation and user-controlled state.
-### Principle 4: Ecosystem leverage (skills/community)
+Why this improves assistant efficiency:
+- trust and privacy increase willingness to connect more accounts/tools,
+- lower friction for daily persistent use.
-OpenClaw's skills posture and public ecosystem framing reduce integration bottlenecks by allowing capability growth outside core maintainers.
+### 2.4 Streaming + chunking tuned for messaging surfaces
-Why this matters:
-- Ecosystem breadth often beats in-house implementation speed.
-- Users get niche integrations without waiting for core releases.
+OpenClaw streaming docs describe bounded chunking (`minChars`/`maxChars`), break-preference logic, and markdown/code-fence-safe splitting.
-### Principle 5: Automation that can operate beyond API-only integrations
+Why this improves assistant efficiency:
+- faster perceived response,
+- fewer formatting regressions during long outputs.
-OpenClaw's workflow/tooling strategy includes browser-driven paths for non-API systems.
+### 2.5 Companion/node + voice surfaces
-Why this matters:
-- Many real workflows are blocked by missing APIs.
-- Browser-native automation unlocks "last mile" personal-assistant utility.
+OpenClaw platform narrative strongly emphasizes ambient access (desktop/mobile/voice-facing surfaces).
-### Principle 6: Memory designed for continuity
+Why this improves assistant efficiency:
+- the assistant is available in more contexts (hands-free/mobile),
+- proactive behavior has a reliable delivery surface.
-OpenClaw's memory framing is continuity-first: avoid repeated onboarding of the assistant to user preferences/projects.
+### 2.6 Memory model optimized for continuity
-Why this matters:
-- A personal assistant that forgets details behaves like a stateless chatbot.
-- Continuity directly affects user trust and perceived intelligence.
+OpenClaw memory docs define a dual pattern:
+- daily append-only logs (`memory/YYYY-MM-DD.md`),
+- curated long-term memory (`MEMORY.md`).
-## 3. Flynn Current State (Baseline + Present Capabilities)
+Why this improves assistant efficiency:
+- better day-to-day recall without conflating temporary and durable facts.
-### 3.1 Baseline parity reference
+### 2.7 Model failover with auth-profile rotation
-The canonical checklist-based parity snapshot in `docs/plans/2026-02-06-openclaw-feature-gap-analysis.md` records:
-- 101/128 matched features (79%)
-- 27/128 missing features (21%)
+OpenClaw model-failover docs show two-stage resilience:
+- rotate auth profiles within provider first,
+- then fallback across models/providers.
-That baseline is still useful for trend tracking, but several entries are now stale versus current Flynn code/README (for example channel breadth and companion-node groundwork have expanded).
+They also document session profile stickiness and exponential cooldown/disable behavior.
-### 3.2 Where Flynn already matches or exceeds
+Why this improves assistant efficiency:
+- fewer hard failures under rate-limit/auth instability,
+- better cache behavior from per-session pinning.
-Flynn already has strong fundamentals and in several areas exceeds OpenClaw's documented posture:
+### 2.8 Ecosystem + hook surface
-- MCP integration depth (tool bridging + lifecycle): `src/mcp/*`
-- Explicit multi-tier model routing and failover controls: `src/models/router.ts`, `src/daemon/models.ts`
-- Fine-grained tool policy profiles/groups and per-context controls: `src/tools/policy.ts`
-- Strong ops/automation primitives (cron, webhooks, heartbeat, backups, Gmail watcher): `src/automation/*`
-- Broad channel adapter layer with consistent interfaces: `src/channels/index.ts`
-- SQLite-backed session persistence and gateway session tooling: `src/session/*`, `src/gateway/*`
+OpenClaw docs show a public skill ecosystem (ClawHub) and lifecycle hooks (`agent:bootstrap`, model/prompt/tool/message hooks).
-### 3.3 Why Flynn still feels behind as a "personal assistant"
+Why this improves assistant efficiency:
+- ecosystem expands capability coverage faster than core-only development,
+- hooks allow behavior shaping without forking core runtime.
-The remaining delta is less about core engine quality and more about assistant product behavior:
-- ambient presence,
-- proactive delivery loops,
-- workflow interaction model,
-- ecosystem/network effects,
-- visible day-to-day assistant ergonomics.
+## 3) New Findings That Change Flynn Gap Priority
-## 4. Prioritized Gap Table (What Actually Reduces Assistant Effectiveness)
+## 3.1 Companion protocol gap is mostly client-side
-| Gap | Type | Impact | Effort | Why it hurts assistant feel |
-|---|---|---:|---:|---|
-| Proactive announce/delivery mode as first-class behavior | Design pattern + feature | High | Medium | Keeps Flynn reactive by default |
-| Voice output (TTS) across channels with voice input | Product behavior | High | Medium | Voice-in without voice-out feels incomplete |
-| Event/reaction automation layer (pattern -> action) | Design pattern + feature | High | High | Limits autonomous "watch and act" behavior |
-| Workflow approval gates (pause/resume with user consent) | Interaction model | High | Medium/High | Multi-step tasks lack robust human-in-loop checkpoints |
-| Memory extraction cadence beyond compaction windows | Design pattern | Medium | Low/Medium | Important context is captured late or inconsistently |
-| Registry-backed skill discovery UX | Ecosystem | Medium | Medium | Limits capability growth velocity |
-| Companion/PWA push surface maturity | Product surface | Medium | Medium/High | Reduces always-on presence and proactive reach |
+Flynn already has substantial gateway/node protocol support:
+- node registration/capabilities/status/location/push-token,
+- canvas artifact RPCs,
+- typed runtime client and event subscriptions.
-## 5. Recommendations (Tier A / B / C)
+Evidence:
+- `src/companion/runtimeClient.ts`
+- `src/gateway/protocol.ts`
+- `src/gateway/server.ts`
-## Tier A (Next implementation wave)
+Conclusion:
+- "No companion" is primarily a shipped-client-product gap, not a missing server protocol foundation.
-### A1. Proactive Announce Mode
+## 3.2 Queue modes: naming parity exists, runtime semantics are partial
-Implement a first-class `announce` delivery pattern for automation jobs so Flynn can push outbound updates without requiring an inbound conversational trigger.
+Validated in Flynn code:
+- Queue mode enum includes `collect`, `followup`, `steer`, `steer_backlog`, `interrupt` (`src/gateway/lane-queue.ts`).
+- `agent.cancel` explicitly cancels queued work and requests active-run cancellation (`src/gateway/handlers/agent.ts`, `src/gateway/session-bridge.ts`).
+
+Critical nuance:
+- In-lane `interrupt` mode currently rejects queued entries but does not itself abort already-running active work.
+- `LaneQueue.cancel` explicitly states active work is not interrupted.
+
+Conclusion:
+- Flynn has strong queue controls, but OpenClaw-style "interrupt current run immediately on new message" behavior is only partially represented unless paired with explicit cancel flows.
+
+## 3.3 Daily memory log pattern is a low-effort, high-impact add
+
+Validated in Flynn code:
+- auto extraction is currently tied to compaction flow (`src/context/compaction.ts`), not a first-class daily-log convention.
+
+Conclusion:
+- a `memory/YYYY-MM-DD`-style append path can improve continuity without architectural upheaval.
+
+## 3.4 Auth-profile rotation is a meaningful resilience gap
+
+Validated in Flynn code:
+- router failover is client/provider-level (`src/models/router.ts`) with tier and fallback chains,
+- no equivalent first-class per-provider profile rotation/stickiness layer.
+
+Conclusion:
+- Flynn can gain robustness by adding profile-level key/token rotation before cross-provider fallback.
+
+## 4) Flynn Current State: Leads vs Lags
+
+### Where Flynn leads or is highly competitive
+
+- MCP integration depth and bridge model.
+- Multi-tier routing controls and explicit tool policy system.
+- Strong automation/ops primitives (cron/webhook/heartbeat/backup).
+- Wide channel support in current adapters.
+- SQLite session persistence and gateway observability.
+
+### Where Flynn still lags on "assistant feel"
+
+- default proactive delivery behavior and ambient surfaces,
+- interaction-level control for in-flight runs (steer/interrupt semantics),
+- continuity ergonomics (daily memory capture patterns),
+- profile-level auth failover resilience,
+- ecosystem-network effects (public skill discovery/install loops).
+
+## 5) Prioritized Roadmap (Tier A/B/C)
+
+## Tier A: high impact, feasible next
+
+### A1. Queue interrupt/steer execution semantics hardening
+
+Goal:
+- make queue modes behaviorally match their names, including active-run interrupt semantics.
+
+Implementation anchors:
+- `src/gateway/lane-queue.ts`
+- `src/gateway/handlers/agent.ts`
+- `src/gateway/session-bridge.ts`
+- `src/backends/native/agent.ts`
+
+### A2. Daily memory logs + proactive extraction cadence
+
+Goal:
+- add daily append memory path and post-task extraction path (not only compaction-time extraction).
+
+Implementation anchors:
+- `src/context/compaction.ts`
+- `src/memory/store.ts`
+- `src/tools/builtin/memory-write.ts`
+- `src/backends/native/orchestrator.ts`
+
+### A3. Proactive announce delivery mode
+
+Goal:
+- first-class outbound push mode for automation jobs that do not depend on active chat turns.
 Implementation anchors:
 - `src/automation/cron.ts`
 - `src/automation/webhooks.ts`
 - `src/config/schema.ts`
-- channel adapters for explicit "notification-style" delivery behavior
+- relevant channel adapters in `src/channels/*`
-### A2. Voice Output (TTS)
+### A4. TTS voice output
-Add configurable TTS pipeline and channel-aware voice response policy.
+Goal:
+- make voice interaction bidirectional on channels that support audio output.
 Implementation anchors:
-- new `tts` config block in `src/config/schema.ts`
-- voice renderer service + adapter integration (`src/channels/*`)
-- per-session/command-level toggle for voice output strategy
+- `src/config/schema.ts` (new `tts` block)
+- `src/tools/builtin/*` (provider integration surface)
+- channel adapters in `src/channels/*`
-### A3. Proactive Memory Quality Loop
+### A5. Auth profile rotation before provider fallback
-Add lightweight post-task extraction and daily memory journaling in addition to current compaction-based extraction.
+Goal:
+- support multi-profile credentials per provider with session stickiness and cooldowns.
 Implementation anchors:
-- `src/memory/*`
-- `src/context/compaction.ts`
-- tooling hooks around tool-heavy exchanges in `src/backends/native/*`
+- `src/models/router.ts`
+- `src/daemon/models.ts`
+- `src/config/schema.ts`
+- auth store modules in `src/auth/*`
-### A4. Reactions/Event Automation
+## Tier B: meaningful medium-scope improvements
-Add declarative event-to-action rules for reactive automation that is not purely schedule-based.
+- Guided onboarding upgrades in `src/cli/setup/*` (channel-specific test loops + safer defaults).
+- Minimal companion client (macOS first) using existing gateway protocol.
+- Safety preset packs for personal-assistant mode (pairing/tool profile/sandbox defaults).
+- Registry-backed skill discovery UX via existing skills framework (`src/skills/*`).
+- Chunking quality upgrade in `src/channels/utils.ts` toward paragraph/sentence/code-fence-aware splitting.
-Implementation anchors:
-- extend `src/automation/*` with reactions engine
-- config schema for reaction rules
-- audit visibility for reaction triggers/actions
+## Tier C: defer / large scope
-## Tier B (High value, moderate scope)
+- Full native companion suite (iOS/Android parity).
+- Full canvas-first UX expansion beyond current artifact API.
+- Marketplace-scale ClawHub-equivalent infrastructure.
+- Advanced always-on wake-word runtime across platforms.
-### B1. Skill Discovery/Registry Index
+## 6) Updated Takeaway Since Feb 12 Comparison
-Build a registry-backed discovery and install UX for skills (CLI + in-chat exposure), leveraging existing Flynn skill scaffolding.
+What changed versus `docs/plans/analysis/openclaw-comparison.md`:
+- Companion gap is narrower than previously scored on backend foundations (protocol already present).
+- Queue control gap is now better understood: Flynn has mode vocabulary, but semantics need tightening for true interrupt/steer behavior.
+- Memory and failover priorities shift upward because they are high-leverage and relatively contained in scope.
-### B2. Workflow Approval Gates
+Practical recommendation:
+- prioritize Tier A behavior-layer upgrades before chasing long-tail parity items.
+- this path improves "personal assistant effectiveness" faster than broad surface-area expansion.
-Extend existing hooks/autonomy model to support durable await-approval checkpoints in long-running workflows.
+## Evidence and Confidence Notes
-### B3. PWA Push for WebChat
-
-Add service worker + push notifications for WebChat to create a lightweight always-on surface before full native companions.
-
-## Tier C (Defer unless strategic priority changes)
-
-- Full native companion apps (macOS/iOS/Android)
-- Rich canvas-first workspace UX expansion
-- Typed workflow runtime on Lobster-like scope
-- Marketplace-scale public skill ecosystem infrastructure
-
-## 6. Updated Scorecard: The 21% Gap That Matters
-
-The historical 21% "missing" set is not equally important. Strategic weighting for personal-assistant effectiveness:
-
-| Gap bucket | Share of checklist gap | User-impact weight |
-|---|---:|---:|
-| Always-on/proactive behavior (announce, reactions, push) | Medium | Very High |
-| Workflow interaction quality (approval gates, pause/resume) | Small/Medium | High |
-| Voice/ambient UX (TTS + surfaced presence) | Small/Medium | High |
-| Companion surfaces | Medium | Medium/High |
-| Ecosystem scale (skill registry/network effects) | Medium | Medium |
-| Long-tail parity items (additional providers/channels) | Medium | Low/Medium |
-
-Conclusion:
-- Flynn can materially close the "assistant feel" gap without full OpenClaw parity.
-- The highest ROI is behavior-layer upgrades (proactive + workflow + voice + memory cadence), not another broad feature sweep.
-
-## Implementation Guidance for Follow-on Plans
-
-When converting Tier A items into build plans, require each proposal to include:
-- explicit config schema and migration/backward compatibility strategy,
-- audit/observability events,
-- failure mode handling (queue pressure, retries, idempotency),
-- security posture (pairing, confirmation hooks, sandbox/elevation interactions),
-- user-facing UX acceptance criteria ("assistant feel" outcomes, not only API behavior).
-
-## Notes on Evidence Quality
-
-This document prioritizes official OpenClaw docs/repo and Flynn code/docs. External press/community claims (for example exact ecosystem-size numbers reported by third parties) should be treated as non-authoritative unless mirrored in official project channels.
+- High confidence: findings directly validated in Flynn source files listed above.
+- High confidence: OpenClaw concepts drawn from official docs pages under `docs.openclaw.ai`.
+- Caution: external media/community metrics (for example exact ecosystem-size counts) change quickly and should not drive core roadmap priority without official-source confirmation.
diff --git a/docs/plans/state.json b/docs/plans/state.json
index e11bc39..745cc52 100644
--- a/docs/plans/state.json
+++ b/docs/plans/state.json
@@ -5157,7 +5157,7 @@
  "status": "completed",
  "date": "2026-02-18",
  "updated": "2026-02-18",
- "summary": "Added a standalone strategic analysis document comparing Flynn with OpenClaw beyond raw feature parity, including naming lineage clarification (ClawdBot -> MoltBot -> OpenClaw), six personal-assistant effectiveness principles, prioritized design/feature gaps, and a Tier A/B/C recommendation stack for Flynn.",
+ "summary": "Expanded the OpenClaw strategic analysis into a Phase-2 synthesis document that explicitly complements the canonical weighted comparison, captures eight documented OpenClaw efficiency mechanisms, and adds code-validated Flynn findings (companion protocol readiness, queue interrupt semantics gap, memory cadence gap, auth-profile rotation gap) with a concrete Tier A/B/C roadmap and file-level implementation anchors.",
  "files_modified": [
  "docs/plans/2026-02-18-openclaw-analysis.md",
  "docs/plans/state.json"