feat(subagents): add idle ttl cleanup and summary tool
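
The idle TTL cleanup named in the subject can be pictured roughly as below. This is a minimal sketch under stated assumptions, not the code in `src/backends/native/subagents.ts`: `IdleSessionRegistry`, `touch`, `has`, and `sweep` are hypothetical names, and the only details taken from this commit are the config key `agents.subagents.idle_ttl_ms` and the `subagent:<parent>:<id>` session namespace.

```typescript
// Hypothetical sketch: these names are illustrative, not taken from the Flynn codebase.
interface ChildSession {
  id: string; // e.g. "subagent:<parentSessionId>:<childId>"
  lastActiveAt: number; // epoch milliseconds of the most recent activity
}

class IdleSessionRegistry {
  private readonly sessions = new Map<string, ChildSession>();
  private readonly idleTtlMs: number;
  private readonly now: () => number;

  // idleTtlMs stands in for agents.subagents.idle_ttl_ms (assumed to be milliseconds).
  // The clock is injectable so the eviction logic can be tested without real timers.
  constructor(idleTtlMs: number, now: () => number = () => Date.now()) {
    this.idleTtlMs = idleTtlMs;
    this.now = now;
  }

  // Register a session, or refresh its idle timer if it already exists.
  touch(id: string): void {
    this.sessions.set(id, { id, lastActiveAt: this.now() });
  }

  has(id: string): boolean {
    return this.sessions.has(id);
  }

  // Remove every session that has been idle longer than the TTL and return the evicted ids.
  sweep(): string[] {
    const cutoff = this.now() - this.idleTtlMs;
    const evicted: string[] = [];
    for (const [id, session] of this.sessions) {
      if (session.lastActiveAt < cutoff) {
        this.sessions.delete(id); // deleting the current key during Map iteration is safe
        evicted.push(id);
      }
    }
    return evicted;
  }
}
```

In a design like this, each spawn or send would call `touch`, and `sweep` would run either on a timer or lazily before listing sessions; which trigger the real manager uses is not stated in this diff.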

2026-02-26 13:12:53 -08:00
parent 2171346116
commit b679261683
17 changed files with 226 additions and 15 deletions
@@ -41,7 +41,7 @@ The gateway serialises agent work **per session**, not per WebSocket connection:
 - The gateway `agent.send` command path and channel-router path use the same runtime backend-mode command service; `flynn tui` forwards `/runtime ...` through this gateway path for parity.
 - Backend routing and fallback outcomes are emitted to audit logs (`backend.route`, `backend.success`, `backend.fallback`) for rollout evaluation; this telemetry is outside JSON-RPC response payloads.
 - Session-start memory injection (`user/profile` + `user/working`) is server-side and controlled by `memory.user_namespace`; it does not affect protocol payloads.
-- Multi-turn child agents are exposed through tool calls (`subagent.spawn/send/list/cancel/delete`) inside the agent loop; they do not add new JSON-RPC methods.
+- Multi-turn child agents are exposed through tool calls (`subagent.spawn/send/list/cancel/delete/summary`) inside the agent loop; they do not add new JSON-RPC methods.

 This is implemented via a per-lane queue (`LaneQueue`) in the gateway server, and used by `agent.send` and `agent.cancel`.

@@ -137,7 +137,7 @@ Tool Calls (inside NativeAgent loop)
                 +---------------------------> AuditLogger (redacted)

 Subagent sessions (multi-turn child agents)
-  parent AgentOrchestrator -> subagent.* tools -> SubagentManager
+  parent AgentOrchestrator -> subagent.* tools -> SubagentManager (TTL cleanup)
  SubagentManager -> child AgentOrchestrator (session namespace: subagent:<parent>:<id>)
  child AgentOrchestrator -> NativeAgent/tool loop (same policy engine, recursion tools removed)

@@ -17,7 +17,7 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`.
 - Backend routing outcomes are auditable via `backend.route` / `backend.success` / `backend.fallback`, which enables offline canary evaluation without changing gateway protocol methods.
 - Run lifecycle/cancel intent and reaction decisions are emitted to audit logs, and aggregated into `system.metrics` counters (runStates, cancelLatencyMs, reactions) for dashboards.
 - Reaction matching is deterministic (priority + cooldown + recursion guard) before intent/agent routing.
-- `subagent.*` tools create child orchestrators scoped to the parent conversation (`subagent:<parentSessionId>:<childId>`); this is tool-loop behavior, not a separate gateway RPC session lane.
+- `subagent.*` tools create child orchestrators scoped to the parent conversation (`subagent:<parentSessionId>:<childId>`) with idle TTL cleanup; this is tool-loop behavior, not a separate gateway RPC session lane.
 - Companion `node.*` registration is per WebSocket connection; reconnects must re-register capabilities before invoking node RPC methods.
 - Canvas artifacts are persisted per session under the gateway data directory for UI recovery across restarts.
 - TTS output is best-effort; synthesis failures fall back to text-only responses.
@@ -20,7 +20,7 @@ The following were previously treated as gaps but are already implemented in Fly
 2. Voice UX is functional but not yet a polished, end-to-end daily-driver experience across surfaces.
 3. Browser tools exist but lack task-level reliability primitives (checkpoints/retries/guardrails) for autonomous workflows.
 4. Onboarding lacks a "first success" guided path that validates real integrations live during setup.
-5. Subagent sessions are now available (`subagent.*`) but need lifecycle hardening (TTL/budgeting/UI visibility) for larger autonomous workflows.
+5. Subagent sessions are now available (`subagent.*`) with idle TTL cleanup and transcript summary support, but still need budgeting/UI visibility for larger autonomous workflows.

 ## Product Goal

@@ -1,7 +1,7 @@
 # Subagents Support Plan (Flynn)

 Date: 2026-02-26  
-Status: phase 1 implemented  
+Status: phase 1 implemented, phase 2 partially implemented  
 Scope: add OpenClaw-style multi-turn subagent session support in Flynn without changing channel surface scope (Telegram-first)

 ## Constraints
@@ -32,10 +32,10 @@ Scope: add OpenClaw-style multi-turn subagent session support in Flynn without c

 ## Phase 2 (Next)

-1. Add per-subagent TTL/idle eviction and auto-cleanup metrics.
-2. Add optional transcript export/summarization (`subagent.summary`).
-3. Add per-subagent tool-profile override (read-only by default for risky workloads).
-4. Add parent-child trace IDs in audit events for easier debugging.
+1. Add per-subagent TTL/idle eviction and auto-cleanup metrics. (implemented: TTL eviction)
+2. Add optional transcript export/summarization (`subagent.summary`). (implemented)
+3. Add per-subagent tool-profile override (read-only by default for risky workloads). (pending)
+4. Add parent-child trace IDs in audit events for easier debugging. (pending)

 ## Phase 3 (Stretch)

@@ -6800,7 +6800,7 @@
      "status": "completed",
      "date": "2026-02-26",
      "updated": "2026-02-26",
-      "summary": "Implemented Phase 1 subagent support: added a SubagentManager with multi-turn child sessions, new `subagent.*` tools (spawn/send/list/cancel/delete), routing wiring, config guardrails, policy/profile integration, docs/diagram updates, and focused test coverage.",
+      "summary": "Implemented Phase 1 and partial Phase 2 subagent support: added a SubagentManager with multi-turn child sessions, idle TTL cleanup, new `subagent.*` tools (spawn/send/list/cancel/delete/summary), routing wiring, config guardrails, policy/profile integration, docs/diagram updates, and focused test coverage.",
      "files_modified": [
        "src/backends/native/subagents.ts",
        "src/backends/native/subagents.test.ts",
@@ -6824,11 +6824,11 @@
        "docs/plans/2026-02-26-personal-assistant-productization-plan.md",
        "docs/plans/state.json"
      ],
-      "test_status": "pnpm test:run src/backends/native/subagents.test.ts src/tools/builtin/subagents.test.ts src/tools/policy.test.ts src/config/schema.test.ts passing"
+      "test_status": "pnpm test:run src/backends/native/subagents.test.ts src/tools/builtin/subagents.test.ts src/tools/policy.test.ts src/config/schema.test.ts src/daemon/routing.test.ts passing + pnpm typecheck"
    }
  },
  "overall_progress": {
-    "total_test_count": 2531,
+    "total_test_count": 2533,
    "all_tests_passing": true,
    "p0_completion": "3/3 (100%)",
    "p1_completion": "4/4 (100%)",
@@ -6843,7 +6843,7 @@
    "tier2_completion": "4/4 (100%) \u2014 inbound webhooks, vector memory search, Dockerfile, heartbeat monitor",
    "tier3_completion": "5/5 (100%) \u2014 lane queue, credential redaction, web UI token dashboard, xAI (Grok) provider, Voyage AI embeddings",
    "tier4_completion": "4/4 (100%) \u2014 gateway lock, shell completion, Tailscale Serve/Funnel, DM pairing codes",
-    "feature_gap_scorecard": "rebaselined 2026-02-26 — channel breadth, setup wizard, baseline browser automation, and phase-1 multi-turn subagent sessions (`subagent.*`) are implemented; remaining high-impact personal-assistant gaps center on shipped companion apps (desktop/mobile), voice UX polish, browser workflow reliability primitives, and first-success onboarding funnel optimization.",
+    "feature_gap_scorecard": "rebaselined 2026-02-26 — channel breadth, setup wizard, baseline browser automation, and partial phase-2 subagent support (`subagent.*` + idle TTL cleanup + transcript summary) are implemented; remaining high-impact personal-assistant gaps center on shipped companion apps (desktop/mobile), voice UX polish, browser workflow reliability primitives, and first-success onboarding funnel optimization.",
    "operator_dx_milestone": "Phase 3 (Live Ops Dashboard): 2/2 plans complete \u2014 milestone done",
    "dashboard_observability": "completed \u2014 service health graphs + core service log viewer added to web UI via observability RPCs and bounded backend sampling",
    "gmail_auth_cli": "flynn gmail-auth command implemented with OAuth2 flow, doctor check, config routed to Telegram",
@@ -6877,7 +6877,7 @@
    "deeper_surfaces_phase4_rollout": "completed \u2014 phase 4 rollout and operator readiness plan documented: canary rollout plan by feature flag/surface, explicit rollback playbook, operator docs and architecture/protocol docs synchronized",
    "post_phase_test_fixes": "completed \u2014 fixed 4 test failures introduced by phases 1-3: iOS/Android push listNodes (missing publishHeartbeat before platform-filtered query), server.test agent.send (run_state events now precede done; added sendAndWaitForDone helper), httpBody 413 (req.destroy() closed socket before response could be sent; replaced with Connection: close header on 413 responses)",
    "personal_assistant_productization_plan": "proposed \u2014 8-10 week phased roadmap defined (companion MVP surfaces, voice reliability hardening, browser workflow reliability layer, onboarding 2.0 first-success funnel) with measurable exit gates.",
-    "subagents_support": "completed \u2014 phase-1 subagent runtime support added with `subagent.spawn/send/list/cancel/delete`, per-parent child-session orchestration, config guardrails (`agents.subagents.*`), and focused regression tests."
+    "subagents_support": "completed \u2014 phase-1 plus partial phase-2 subagent runtime support added with `subagent.spawn/send/list/cancel/delete/summary`, per-parent child-session orchestration, idle TTL cleanup (`agents.subagents.idle_ttl_ms`), config guardrails, and focused regression tests."
  },
  "soul_md_and_cron_create": {
    "date": "2026-02-11",