diff --git a/WIP.subagent-reliability.md b/WIP.subagent-reliability.md index 05f9672..b65efea 100644 --- a/WIP.subagent-reliability.md +++ b/WIP.subagent-reliability.md @@ -1,15 +1,30 @@ # WIP.subagent-reliability.md ## Status -Status: `open` +Status: `follow-up` Owner: `zap` Opened: `2026-03-13` +Last updated: `2026-03-13` ## Purpose Investigate and improve subagent / ACP delegation reliability, including timeout behavior, runtime failures, and delayed/duplicate completion-event noise. -## Why now -This is the highest-leverage remaining open reliability item because it affects trust in delegation and the usability of fresh implementation runs. +## Current state +- The core reliability thread tracked in this WIP is now **fixed and live-verified** on `external/openclaw-upstream` branch `fix/subagent-wait-error-outcome`. +- Verified fixed: + - subagent persistence / announcement handling for terminal assistant-provider failures + - raw `agent.wait` semantics for the live direct gateway path +- Key upstream commits on this branch: + - `2a2ed0d6f` — `fix(subagents): derive outcome from terminal assistant errors` + - `5a328d22b` — `fix(agent): surface terminal run errors in wait semantics` + - `f9a78e8f7` — `fix(gateway): honor terminal assistant errors in live wait path` + +## Why this file is still open +- The broader delegation reliability task is not fully done yet. +- Remaining follow-up work is now narrower: + 1. ACP-specific Claude/Codex runtime failures + 2. optional separate `/subagents log` UX cleanup + 3. push/PR the focused upstream reliability branch when desired ## Related tasks - `task-20260304-2215-subagent-reliability` — in progress @@ -21,19 +36,10 @@ This is the highest-leverage remaining open reliability item because it affects - User explicitly wants subagent tooling reliability fixed and completion-event spam prevented. - Fresh-session implementation discipline and monitoring thresholds were already documented locally. -## Goals for this pass -1. Establish the current failure modes with concrete evidence. -2. Separate ACP-specific failures from generic subagent/session issues. -3. Determine what is already fixed versus still broken. -4. Produce a concrete recommendation and, if feasible in one pass, implement the highest-confidence fix. -5. Update task/memory state with evidence before ending. - -## Suggested investigation plan -1. Review current OpenClaw docs and local memory around subagent/ACP failures. -2. Reproduce or inspect recent failures using session/task evidence instead of guessing. -3. Check current runtime status / relevant logs / known local patches. -4. If the issue is in OpenClaw core, work in `external/openclaw-upstream/` on a focused branch. -5. Validate with the smallest reliable reproduction possible. +## Immediate baton +- Do **not** reopen the solved `agent.wait` investigation unless a fresh repro appears. +- If this project is resumed next, start with ACP-specific Claude/Codex runtime failures. +- Treat `/subagents log` UX edits as a separate branch/pass so they do not muddy the reliability fix branch. ## Evidence gathered so far - Fresh subagent run failed immediately when an explicit `glm-5` choice resolved into the Z.AI provider path before any useful task execution.