docs(wip): record success probe and next failure-path pass

2026-03-13 16:40:06 +00:00
parent 5dbbc30834
commit 08c1981faa
2 changed files with 26 additions and 11 deletions
--- a/HANDOFF.md
+++ b/HANDOFF.md
@@ -4,7 +4,7 @@
 Immediate baton-pass for the next fresh implementation session.

 ## Current objective
-Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. Focus on current failure modes, what is already fixed, and the highest-confidence next fix.
+Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. The current target is to verify the new upstream fix for subagent error/outcome handling in real runs, then continue with any remaining real runtime failures.

 ## Use these state files first
 1. `WIP.subagent-reliability.md` — canonical state for this pass
@@ -22,18 +22,20 @@ Investigate and improve subagent / ACP delegation reliability with evidence-firs
 - User still wants actual subagent reliability improved, not just UI noise hidden.
 - Prior ACP failures included Claude/Codex runtime exits.
 - Fresh-session implementation discipline is now the expected approach for non-trivial work.
+- One failure mode is already understood: in this setup, requesting `glm-5` can route to a GLM-5 provider/entitlement path that is unavailable.
+- A deeper bug was also identified: a subagent run could finish with terminal assistant errors yet still be recorded as successful with no frozen result.
+- An upstream patch for that error/outcome handling now exists in `external/openclaw-upstream` on branch `fix/subagent-wait-error-outcome` with targeted tests passing.

 ## Highest-priority next actions
-1. Inspect prior task/session evidence and current runtime state.
-2. Reproduce or otherwise concretely characterize present failures.
-3. Split findings into:
-   - ACP runtime issues
-   - generic subagent/session issues
-   - completion-event / delivery issues
-4. If a fix is feasible now, implement the smallest high-confidence fix and validate it.
+1. Done: the success path is now verified on a real, fresh `gpt-5.4` subagent run.
+2. Find and run the smallest safe, controlled failure repro on a valid model/runtime (`gpt-5.4` preferred) to confirm that:
+   - a failing child run is stored as `error` rather than `ok`
+   - a successful child run stores a useful frozen result / announcement payload
+3. Re-check whether ACP-specific Claude/Codex runtime failures are still reproducible after separating them from the generic subagent reporting bug.
+4. If another core bug appears, continue in `external/openclaw-upstream/` on a focused branch with targeted validation.
 5. Update WIP + memory + tasks before ending.

 ## Success criteria
- Clear current-state diagnosis.
- Evidence-backed fix or sharply scoped next fix plan.
+- Real-run verification of the new error/outcome fix.
+- Clear separation between resolved reporting bug(s) and any still-open ACP/runtime failures.
 - State files updated with paths, commands, and outcomes.