# HANDOFF.md ## Purpose Immediate baton-pass for the next fresh implementation session. ## Current objective Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. The failure-path proof for the new subagent outcome handling is captured, and a focused upstream `agent.wait` semantics fix is now implemented/tested on branch `fix/subagent-wait-error-outcome`; the remaining follow-up is deployment/live verification, not root-cause discovery. ## Use these state files first 1. `WIP.subagent-reliability.md` — canonical state for this pass 2. `memory/tasks.json` — task tracking for reliability items 3. `memory/2026-03-04-subagent-delegation.md` — earlier delegation context 4. `memory/2026-03-13.md` if present, otherwise append today’s evidence there 5. `external/openclaw-upstream/` — for any core-runtime fix work ## Related tasks - `task-20260304-2215-subagent-reliability` — in progress - `task-20260304-211216-acp-claude-codex` — open ## Known truths - TUI noise suppression was already patched locally and upstreamed earlier. - User still wants actual subagent reliability improved, not just UI noise hidden. - Prior ACP failures included Claude/Codex runtime exits. - Fresh-session implementation discipline is now the expected approach for non-trivial work. - One explicit failure mode is already understood: requesting `glm-5` can route into an unavailable GLM-5 provider/entitlement path in this setup. - A deeper bug was also identified: a subagent run could finish with terminal assistant errors yet still be recorded as successful with no frozen result. - An upstream patch for that error/outcome handling now exists in `external/openclaw-upstream` on branch `fix/subagent-wait-error-outcome` with targeted tests passing. ## Highest-priority next actions 1. Treat the live `gpt-5.4` failure repro as proven for subagent persistence/announcement handling: - run id `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf` - child transcript `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl` - `runs.json` now stores `outcome.status: "error"`, `endedReason: "subagent-error"`, and a non-null `frozenResultText` 2. For raw gateway `agent.wait`, use the new upstream diagnosis/fix rather than re-arguing semantics: - decision: the previous `status:"ok"` was a bug, not intended layering - cause: `src/commands/agent.ts` fallback lifecycle emission used `phase:"end"` even when resolved run `meta.stopReason:"error"` - fix: `src/commands/agent.ts` now emits lifecycle `phase:"error"` with extracted terminal error text in that case; `src/gateway/server-methods/agent-wait-dedupe.ts` also maps resolved agent payloads with terminal `stopReason:"error"` / `aborted:true` to `error` / `timeout` - targeted validation passed: `pnpm -C external/openclaw-upstream test -- --run src/commands/agent.test.ts src/gateway/server-methods/agent-wait-dedupe.test.ts src/gateway/server-methods/server-methods.test.ts` 3. If continuing, do a low-noise live verification on the patched gateway/runtime for the same failure class, then report whether raw `agent.wait` now returns `status:"error"` as expected. 4. Re-check whether ACP-specific Claude/Codex runtime failures are still reproducible after separating them from the generic subagent outcome bug. 5. Leave the dirty `/subagents log` UX diff out of this branch unless you intentionally spin a separate focused pass; it regression-passed `src/auto-reply/reply/commands.test.ts` but still lacks dedicated feature coverage. ## Success criteria - Real-run verification of the new error/outcome fix. ✅ done for subagent persistence/announcement handling. - Clear separation between resolved reporting bug(s) and any still-open ACP/runtime failures. - Explicit decision on whether raw `agent.wait` behavior is acceptable or requires a follow-up fix. - State files updated with paths, commands, and outcomes.