feat(subagents): add multi-turn subagent session runtime

2026-02-26 13:07:34 -08:00
parent e887c3c964
commit 2171346116
21 changed files with 1111 additions and 12 deletions
@@ -20,6 +20,7 @@ The following were previously treated as gaps but are already implemented in Fly
 2. Voice UX is functional but not yet a polished, end-to-end daily-driver experience across surfaces.
 3. Browser tools exist but lack task-level reliability primitives (checkpoints/retries/guardrails) for autonomous workflows.
 4. Onboarding lacks a "first success" guided path that validates real integrations live during setup.
+5. Subagent sessions are now available (`subagent.*`) but need lifecycle hardening (TTL/budgeting/UI visibility) for larger autonomous workflows.

 ## Product Goal

@@ -0,0 +1,51 @@
+# Subagents Support Plan (Flynn)
+
+Date: 2026-02-26  
+Status: phase 1 implemented  
+Scope: add OpenClaw-style multi-turn subagent session support in Flynn without changing channel surface scope (Telegram-first)
+
+## Constraints
+
+1. Keep channel scope unchanged (Telegram remains default for now).
+2. Deliver subagent capability through the existing native tool loop.
+3. Keep gateway protocol additive-only (no new JSON-RPC methods required).
+
+## Phase 1 (Implemented in this change)
+
+1. Added a subagent runtime manager (`src/backends/native/subagents.ts`) that can:
+   - spawn child sessions,
+   - send follow-up turns,
+   - list active child sessions,
+   - cancel in-flight child runs,
+   - delete child sessions.
+2. Added new tools:
+   - `subagent.spawn`
+   - `subagent.send`
+   - `subagent.list`
+   - `subagent.cancel`
+   - `subagent.delete`
+3. Wired tools into per-session router orchestration (`src/daemon/routing.ts`).
+4. Added config guardrails under `agents.subagents`:
+   - `enabled`
+   - `max_active_sessions`
+5. Added policy/profile support so `subagent.*` is controlled through `group:agents` and tool profiles.
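
The Phase 1 lifecycle and the `agents.subagents` guardrails above can be pictured with a small in-memory sketch. This is not the code in `src/backends/native/subagents.ts`: the class name `SubagentManagerSketch`, the `Responder` callback, the `child-N` id scheme, and the camel-cased `maxActiveSessions` field are all assumptions made for illustration. Cancellation is left out because the sketch runs each turn synchronously, so there is never an in-flight run to cancel.

```typescript
// Hypothetical sketch only: every name and shape here is an illustrative
// assumption, not the real SubagentManager API.

interface SubagentsConfig {
  enabled: boolean;
  maxActiveSessions: number; // stands in for agents.subagents.max_active_sessions
}

interface Turn {
  role: "user" | "assistant";
  content: string;
}

interface ChildSession {
  id: string;
  history: Turn[];
}

// Stand-in for a model call; a real runtime would be asynchronous.
type Responder = (history: readonly Turn[]) => string;

class SubagentManagerSketch {
  private sessions: Record<string, ChildSession> = {};
  private nextId = 1;
  private config: SubagentsConfig;
  private respond: Responder;

  constructor(config: SubagentsConfig, respond: Responder) {
    this.config = config;
    this.respond = respond;
  }

  // Create a child session, enforce the guardrails, and run its first turn.
  spawn(task: string): string {
    if (!this.config.enabled) throw new Error("subagents are disabled");
    if (this.list().length >= this.config.maxActiveSessions) {
      throw new Error("max active subagent sessions reached");
    }
    const id = "child-" + this.nextId++;
    this.sessions[id] = { id, history: [] };
    this.send(id, task);
    return id;
  }

  // Follow-up turn: the child keeps its own history between calls.
  send(id: string, message: string): string {
    const session = this.lookup(id);
    session.history.push({ role: "user", content: message });
    const reply = this.respond(session.history);
    session.history.push({ role: "assistant", content: reply });
    return reply;
  }

  // Ids of all child sessions that have not been deleted.
  list(): string[] {
    return Object.keys(this.sessions);
  }

  // Remove the session together with its history.
  delete(id: string): boolean {
    if (!Object.prototype.hasOwnProperty.call(this.sessions, id)) return false;
    delete this.sessions[id];
    return true;
  }

  private lookup(id: string): ChildSession {
    if (!Object.prototype.hasOwnProperty.call(this.sessions, id)) {
      throw new Error("unknown subagent session: " + id);
    }
    return this.sessions[id];
  }
}
```

Keeping the limit check inside `spawn` means that `send` on an existing child never fails because of the session cap. Whether the real manager draws the line in the same place is not stated in this change.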
+
+## Phase 2 (Next)
+
+1. Add per-subagent TTL/idle eviction and auto-cleanup metrics.
+2. Add optional transcript export/summarization (`subagent.summary`).
+3. Add per-subagent tool-profile override (read-only by default for risky workloads).
+4. Add parent-child trace IDs in audit events for easier debugging.
+
+## Phase 3 (Stretch)
+
+1. Add queue semantics for child sessions (`followup` vs `interrupt` per subagent).
+2. Add explicit resource budgets (token/time) per child session.
+3. Add UI affordances in gateway chat for subagent session inspection.
+
+## Acceptance Criteria (Phase 1)
+
+1. Parent agent can spawn and continue a child subagent across multiple turns.
+2. Child session state is isolated, and `subagent.delete` clears the child's history.
+3. Recursion-capable tools (`agent.delegate`, `council.run`, `subagent.*`) are removed from child tool registries.
+4. Tests cover manager lifecycle, tool behavior, config parsing, and policy profile inclusion.
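
Acceptance criterion 3 could be met with a simple name filter applied when a child's tool set is built. The sketch below assumes tools are identified by plain string names and that a trailing `.*` means "everything under this prefix". The helper names `isRecursionTool` and `childToolNames` are hypothetical, and the real registry type in Flynn is not shown in this change.

```typescript
// Hypothetical helpers: a sketch of filtering recursion-capable tools out of
// a child's tool list. Only the three patterns come from the plan.

const RECURSION_TOOL_PATTERNS: readonly string[] = [
  "agent.delegate",
  "council.run",
  "subagent.*",
];

// True when `name` matches an exact pattern or a "prefix.*" wildcard.
function isRecursionTool(name: string): boolean {
  return RECURSION_TOOL_PATTERNS.some((pattern) => {
    if (pattern.slice(-2) === ".*") {
      const prefix = pattern.slice(0, -1); // "subagent.*" becomes "subagent."
      return name.indexOf(prefix) === 0;
    }
    return name === pattern;
  });
}

// Tool names a child may see: the parent's list minus recursion tooling.
function childToolNames(parentTools: readonly string[]): string[] {
  return parentTools.filter((name) => !isRecursionTool(name));
}
```

Matching on the `subagent.` prefix including the dot keeps an unrelated tool such as a hypothetical `subagents.report` from being filtered by accident.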
@@ -6795,10 +6795,40 @@
        "docs/plans/state.json"
      ],
      "test_status": "planning/docs update only; no runtime code changes"
+    },
+    "subagents-support-phase1": {
+      "status": "completed",
+      "date": "2026-02-26",
+      "updated": "2026-02-26",
+      "summary": "Implemented Phase 1 subagent support: added a SubagentManager with multi-turn child sessions, new `subagent.*` tools (spawn/send/list/cancel/delete), routing wiring, config guardrails, policy/profile integration, docs/diagram updates, and focused test coverage.",
+      "files_modified": [
+        "src/backends/native/subagents.ts",
+        "src/backends/native/subagents.test.ts",
+        "src/backends/native/index.ts",
+        "src/backends/index.ts",
+        "src/tools/builtin/subagents.ts",
+        "src/tools/builtin/subagents.test.ts",
+        "src/tools/builtin/index.ts",
+        "src/tools/index.ts",
+        "src/tools/policy.ts",
+        "src/tools/policy.test.ts",
+        "src/config/schema.ts",
+        "src/config/schema.test.ts",
+        "src/daemon/routing.ts",
+        "config/default.yaml",
+        "README.md",
+        "docs/api/PROTOCOL.md",
+        "docs/architecture/AGENT_DIAGRAM.md",
+        "docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md",
+        "docs/plans/2026-02-26-subagents-support-plan.md",
+        "docs/plans/2026-02-26-personal-assistant-productization-plan.md",
+        "docs/plans/state.json"
+      ],
+      "test_status": "pnpm test:run src/backends/native/subagents.test.ts src/tools/builtin/subagents.test.ts src/tools/policy.test.ts src/config/schema.test.ts passing"
    }
  },
  "overall_progress": {
-    "total_test_count": 2525,
+    "total_test_count": 2531,
    "all_tests_passing": true,
    "p0_completion": "3/3 (100%)",
    "p1_completion": "4/4 (100%)",
@@ -6813,7 +6843,7 @@
    "tier2_completion": "4/4 (100%) \u2014 inbound webhooks, vector memory search, Dockerfile, heartbeat monitor",
    "tier3_completion": "5/5 (100%) \u2014 lane queue, credential redaction, web UI token dashboard, xAI (Grok) provider, Voyage AI embeddings",
    "tier4_completion": "4/4 (100%) \u2014 gateway lock, shell completion, Tailscale Serve/Funnel, DM pairing codes",
-    "feature_gap_scorecard": "rebaselined 2026-02-26 — channel breadth, setup wizard, and baseline browser automation are implemented; remaining high-impact personal-assistant gaps center on shipped companion apps (desktop/mobile), voice UX polish, browser workflow reliability primitives, and first-success onboarding funnel optimization.",
+    "feature_gap_scorecard": "rebaselined 2026-02-26 — channel breadth, setup wizard, baseline browser automation, and phase-1 multi-turn subagent sessions (`subagent.*`) are implemented; remaining high-impact personal-assistant gaps center on shipped companion apps (desktop/mobile), voice UX polish, browser workflow reliability primitives, and first-success onboarding funnel optimization.",
    "operator_dx_milestone": "Phase 3 (Live Ops Dashboard): 2/2 plans complete \u2014 milestone done",
    "dashboard_observability": "completed \u2014 service health graphs + core service log viewer added to web UI via observability RPCs and bounded backend sampling",
    "gmail_auth_cli": "flynn gmail-auth command implemented with OAuth2 flow, doctor check, config routed to Telegram",
@@ -6846,7 +6876,8 @@
    "deeper_surfaces_phase3_companion_canvas_voice": "completed \u2014 companion reconnect resilience (auto-reconnect with backoff, pending-wait cancellation on disconnect), canvas artifact persistence (SQLite-backed store, daemon-restart durability), voice TTS fallback coverage (text-only reply on TTS failure, no dropped responses)",
    "deeper_surfaces_phase4_rollout": "completed \u2014 phase 4 rollout and operator readiness plan documented: canary rollout plan by feature flag/surface, explicit rollback playbook, operator docs and architecture/protocol docs synchronized",
    "post_phase_test_fixes": "completed \u2014 fixed 4 test failures introduced by phases 1-3: iOS/Android push listNodes (missing publishHeartbeat before platform-filtered query), server.test agent.send (run_state events now precede done; added sendAndWaitForDone helper), httpBody 413 (req.destroy() closed socket before response could be sent; replaced with Connection: close header on 413 responses)",
-    "personal_assistant_productization_plan": "proposed \u2014 8-10 week phased roadmap defined (companion MVP surfaces, voice reliability hardening, browser workflow reliability layer, onboarding 2.0 first-success funnel) with measurable exit gates."
+    "personal_assistant_productization_plan": "proposed \u2014 8-10 week phased roadmap defined (companion MVP surfaces, voice reliability hardening, browser workflow reliability layer, onboarding 2.0 first-success funnel) with measurable exit gates.",
+    "subagents_support": "completed \u2014 phase-1 subagent runtime support added with `subagent.spawn/send/list/cancel/delete`, per-parent child-session orchestration, config guardrails (`agents.subagents.*`), and focused regression tests."
  },
  "soul_md_and_cron_create": {
    "date": "2026-02-11",