docs(plans): add personal assistant productization roadmap

2026-02-26 12:45:06 -08:00
parent adcef2249b
commit e887c3c964
2 changed files with 191 additions and 3 deletions
@@ -0,0 +1,176 @@
+# Flynn Personal Assistant Productization Plan (Post-Gap Rebaseline)
+
+Date: 2026-02-26  
+Status: proposed roadmap  
+Scope: ship the remaining product-layer capabilities that make Flynn feel like a daily autonomous personal assistant
+
+## Rebaseline (What Is Already Done)
+
+The following were previously treated as gaps but are already implemented in Flynn:
+
+1. Broader channels are present (`src/channels/index.ts` includes Signal, Matrix, Teams, BlueBubbles, LINE, Feishu, Zalo, Mattermost, Google Chat).
+2. Guided onboarding is present (`flynn setup` and `flynn onboard` in `src/cli/setup.ts` and `src/cli/onboard.ts`).
+3. Browser automation baseline is present (`browser.navigate/click/type/screenshot/content/eval` in `src/tools/builtin/browser/tools.ts`).
+4. Companion protocol/runtime foundation is present (`src/companion/runtimeClient.ts`, `src/companion/platformClients.ts`).
+5. Talk mode + wake phrase baseline is present (`src/daemon/routing.ts`, `audio.talk_mode` schema support).
+
+## Remaining Product Gaps (Now)
+
+1. No shipped end-user companion apps (desktop/mobile) despite protocol readiness.
+2. Voice UX is functional but not yet a polished, end-to-end daily-driver experience across surfaces.
+3. Browser tools exist but lack task-level reliability primitives (checkpoints/retries/guardrails) for autonomous workflows.
+4. Onboarding lacks a "first success" guided path that validates real integrations live during setup.
+
+## Product Goal
+
+Within 8-10 weeks, ship a stable "Personal Assistant Mode" that supports:
+
+1. Always-available access from primary user surfaces (desktop + mobile companion path).
+2. Reliable hands-free capture/reply loops with graceful degradation.
+3. Safe browser-assisted task execution for common personal workflows.
+4. First-run setup that gets a new user from zero to first successful automated task in under 15 minutes.
+
+## Success Metrics
+
+1. `activation_rate_7d` (setup complete -> first automated action within 7 days): >= 65%.
+2. Companion reconnect success in soak: >= 99.5%.
+3. Voice turn completion (reply delivered as audio or text fallback): >= 99.9%.
+4. Browser workflow completion on canonical scripts: >= 90% with explicit failure reasons.
+5. Time-to-first-successful-automation from `flynn setup`: median <= 15 minutes.
+
+## Roadmap
+
+## Phase 1 (Weeks 1-2): Companion MVP Surfaces (Desktop First)
+
+### Deliverables
+
+1. Ship a macOS menu-bar companion reference app using existing runtime protocol.
+2. Ship a minimal mobile companion shell (iOS + Android) for registration, status, push token, and message handoff.
+3. Add signed release artifacts and installation docs.
+
+### Implementation Anchors
+
+1. `src/companion/runtimeClient.ts`
+2. `src/companion/platformClients.ts`
+3. `src/cli/companion.ts`
+4. `src/gateway/handlers/node.ts`
+5. `docs/api/PROTOCOL.md`
+
+### Tests
+
+1. Extend `src/companion/platformClients.integration.test.ts` for reconnect, background wake, and token refresh flows.
+2. Add end-to-end gateway fixture tests for node lifecycle transitions and push-token updates.
+
+### Exit Criteria
+
+1. A user can install companion, pair, receive assistant response notifications, and reopen after disconnect without manual repair.
+
+## Phase 2 (Weeks 3-4): Voice Daily-Driver Reliability
+
+### Deliverables
+
+1. Unify wake/talk controls across TUI, gateway UI, and companion surfaces.
+2. Add robust TTS provider fallback policy with per-provider health tracking.
+3. Add interruption-safe voice run control (cancel/replace behavior) consistent with text runs.
+
+### Implementation Anchors
+
+1. `src/daemon/routing.ts`
+2. `src/models/tts.ts`
+3. `src/gateway/handlers/agent.ts`
+4. `src/gateway/protocol.ts`
+5. `src/gateway/ui/pages/chat.js`
+
+### Tests
+
+1. Expand `src/daemon/routing.test.ts` with concurrent voice/text interruption cases.
+2. Add provider-failure matrix tests for TTS fallback.
+3. Add protocol/UI tests for voice run-state rendering.
+
+### Exit Criteria
+
+1. No dropped assistant replies when voice synthesis fails; response falls back to text deterministically.
+
+## Phase 3 (Weeks 5-7): Browser Task Automation Reliability Layer
+
+### Deliverables
+
+1. Add browser workflow primitives: `browser.wait_for`, `browser.assert`, `browser.extract`, and retry wrappers.
+2. Add task checkpoints with resumable execution state for long workflows.
+3. Add guardrails: domain allowlists, explicit high-risk confirmation hooks, and bounded execution budgets.
+
+### Implementation Anchors
+
+1. `src/tools/builtin/browser/tools.ts`
+2. `src/tools/executor.ts`
+3. `src/tools/policy.ts`
+4. `src/config/schema.ts`
+5. `src/hooks/*`
+
+### Tests
+
+1. Add deterministic browser tool tests for retry/checkpoint/error classification.
+2. Add policy tests for domain budget/confirm behavior.
+3. Add integration tests that replay canonical user tasks (form fill, booking-like flow, account portal navigation).
+
+### Exit Criteria
+
+1. Canonical browser workflows pass at >= 90% in CI replay suite with auditable failures.
+
+## Phase 4 (Weeks 8-10): Onboarding 2.0 + First-Success Funnel
+
+### Deliverables
+
+1. Add "Personal Assistant Mode" wizard preset focused on practical defaults.
+2. Add live connectivity checks during setup (model, channel, memory, automation).
+3. Add a post-setup guided first task that confirms end-to-end assistant operation.
+
+### Implementation Anchors
+
+1. `src/cli/setup.ts`
+2. `src/cli/onboard.ts`
+3. `src/cli/doctor.ts`
+4. `README.md`
+5. `docs/architecture/AGENT_DIAGRAM.md`
+6. `docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md`
+7. `docs/api/PROTOCOL.md`
+
+### Tests
+
+1. Extend `src/cli/setup/integration.test.ts` with live-check and first-task paths.
+2. Add regression coverage for failed-check remediation prompts.
+
+### Exit Criteria
+
+1. New user reaches first successful automated task from clean install in median <= 15 minutes.
+
+## Cross-Cutting Controls (All Phases)
+
+1. Feature flags for each phase with canary rollout and rollback paths.
+2. Audit event coverage for all new autonomous/voice/browser behaviors.
+3. Every implementation PR must include tests, docs, and `docs/plans/state.json` updates.
+4. Diagram review/update is mandatory for code changes affecting flow semantics.
+
+## Execution Order and Parallelization
+
+1. Run Phase 1 and Phase 2 in parallel after finalizing shared run-state contracts.
+2. Start Phase 3 once Phase 2 interruption semantics are stable.
+3. Start Phase 4 onboarding work in parallel with late Phase 3 once APIs stabilize.
+
+## Top Risks and Mitigations
+
+1. Risk: companion client reliability drifts from gateway contract.  
+Mitigation: contract tests pinned to `docs/api/PROTOCOL.md` event schema.
+2. Risk: voice experience appears flaky under provider variability.  
+Mitigation: deterministic fallback policy + provider health scoring + explicit user-visible degrade messaging.
+3. Risk: browser autonomy creates brittle flows.  
+Mitigation: checkpoint/retry primitives, strict policy defaults, and explicit risk confirmation hooks.
+4. Risk: roadmap spread too wide.  
+Mitigation: desktop-first companion scope, canonical-task suite, and hard phase exit gates before expansion.
+
+## Definition of Done (Roadmap Complete)
+
+1. Companion desktop and mobile shells are shippable with documented install/run paths.
+2. Voice and text paths share deterministic run-control semantics and pass reliability gates.
+3. Browser automations run through a resilient workflow layer, not raw primitive chaining only.
+4. Onboarding produces measurable first-success outcomes and reduced drop-off.
@@ -1,6 +1,6 @@
 {
  "version": "1.0",
-  "updated_at": "2026-02-25",
+  "updated_at": "2026-02-26",
  "description": "Tracks the status of all Flynn plans and implementation phases",
  "plans": {
    "phase0-ticket-0.1-audit-schema-extension": {
@@ -6784,6 +6784,17 @@
        "docs/plans/state.json"
      ],
      "test_status": "docs only"
+    },
+    "personal-assistant-productization-plan-2026-02-26": {
+      "status": "proposed",
+      "date": "2026-02-26",
+      "updated": "2026-02-26",
+      "summary": "Rebaselined Flynn's OpenClaw-style personal-assistant gaps and defined an execution-ready 8-10 week productization roadmap focused on shipped companion apps, voice daily-driver reliability, browser workflow reliability, and onboarding first-success funnel metrics.",
+      "files_modified": [
+        "docs/plans/2026-02-26-personal-assistant-productization-plan.md",
+        "docs/plans/state.json"
+      ],
+      "test_status": "planning/docs update only; no runtime code changes"
    }
  },
  "overall_progress": {
@@ -6802,7 +6813,7 @@
    "tier2_completion": "4/4 (100%) \u2014 inbound webhooks, vector memory search, Dockerfile, heartbeat monitor",
    "tier3_completion": "5/5 (100%) \u2014 lane queue, credential redaction, web UI token dashboard, xAI (Grok) provider, Voyage AI embeddings",
    "tier4_completion": "4/4 (100%) \u2014 gateway lock, shell completion, Tailscale Serve/Funnel, DM pairing codes",
-    "feature_gap_scorecard": "~114/130 match (~88%), 1 partial (~1%), ~14 missing (~11%) — deferred: channel adapters (Matrix, Zalo, LINE, Feishu, Mattermost), providers (MiniMax/Moonshot, Vercel AI Gateway, OAuth subscription auth), native companion apps (macOS menu bar, iOS node, Android node, location access), ecosystem (ClawHub registry), infra (Bonjour/mDNS discovery), features (skill/plugin safety scanner, elevated mode)",
+    "feature_gap_scorecard": "rebaselined 2026-02-26 — channel breadth, setup wizard, and baseline browser automation are implemented; remaining high-impact personal-assistant gaps center on shipped companion apps (desktop/mobile), voice UX polish, browser workflow reliability primitives, and first-success onboarding funnel optimization.",
    "operator_dx_milestone": "Phase 3 (Live Ops Dashboard): 2/2 plans complete \u2014 milestone done",
    "dashboard_observability": "completed \u2014 service health graphs + core service log viewer added to web UI via observability RPCs and bounded backend sampling",
    "gmail_auth_cli": "flynn gmail-auth command implemented with OAuth2 flow, doctor check, config routed to Telegram",
@@ -6834,7 +6845,8 @@
    "deeper_surfaces_phase2_reactions_v2": "completed \u2014 reaction engine now uses priority/cooldown with non-blocking rules, recursion guard, and routing-level cooldown skip logging",
    "deeper_surfaces_phase3_companion_canvas_voice": "completed \u2014 companion reconnect resilience (auto-reconnect with backoff, pending-wait cancellation on disconnect), canvas artifact persistence (SQLite-backed store, daemon-restart durability), voice TTS fallback coverage (text-only reply on TTS failure, no dropped responses)",
    "deeper_surfaces_phase4_rollout": "completed \u2014 phase 4 rollout and operator readiness plan documented: canary rollout plan by feature flag/surface, explicit rollback playbook, operator docs and architecture/protocol docs synchronized",
-    "post_phase_test_fixes": "completed \u2014 fixed 4 test failures introduced by phases 1-3: iOS/Android push listNodes (missing publishHeartbeat before platform-filtered query), server.test agent.send (run_state events now precede done; added sendAndWaitForDone helper), httpBody 413 (req.destroy() closed socket before response could be sent; replaced with Connection: close header on 413 responses)"
+    "post_phase_test_fixes": "completed \u2014 fixed 4 test failures introduced by phases 1-3: iOS/Android push listNodes (missing publishHeartbeat before platform-filtered query), server.test agent.send (run_state events now precede done; added sendAndWaitForDone helper), httpBody 413 (req.destroy() closed socket before response could be sent; replaced with Connection: close header on 413 responses)",
+    "personal_assistant_productization_plan": "proposed \u2014 8-10 week phased roadmap defined (companion MVP surfaces, voice reliability hardening, browser workflow reliability layer, onboarding 2.0 first-success funnel) with measurable exit gates."
  },
  "soul_md_and_cron_create": {
    "date": "2026-02-11",