docs(reliability): record live agent.wait fix verification

2026-03-13 19:24:14 +00:00
parent f2b99841af
commit 8998e7535e
4 changed files with 59 additions and 29 deletions
--- a/HANDOFF.md
+++ b/HANDOFF.md
@@ -4,7 +4,7 @@
 Immediate baton-pass for the next fresh implementation session.

 ## Current objective
-Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. The failure-path proof for the new subagent outcome handling is captured, but the focused upstream `agent.wait` semantics fix on branch `fix/subagent-wait-error-outcome` did **not** hold in a fresh live source-gateway repro, so the remaining work is a narrower root-cause follow-up on the still-open live `agent.wait => ok` path.
+Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. The subagent persistence/announcement fix and the raw `agent.wait` semantics fix are now both live-verified on branch `fix/subagent-wait-error-outcome`. Remaining work is follow-up: commit/push/PR the branch when desired, re-check ACP-specific runtime failures, and handle any unrelated `/subagents log` UX work only in a separate focused pass.

 ## Use these state files first
 1. `WIP.subagent-reliability.md` — canonical state for this pass
@@ -27,25 +27,19 @@ Investigate and improve subagent / ACP delegation reliability with evidence-firs
 - An upstream patch for that error/outcome handling now exists in `external/openclaw-upstream` on branch `fix/subagent-wait-error-outcome` with targeted tests passing.

 ## Highest-priority next actions
-1. Treat the live `gpt-5.4` failure repro as proven for subagent persistence/announcement handling:
-   - run id `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
-   - child transcript `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
-   - `runs.json` now stores `outcome.status: "error"`, `endedReason: "subagent-error"`, and a non-null `frozenResultText`
-2. Treat raw gateway `agent.wait` as still **open** despite the current follow-up fix branch.
-   - decisive live source-gateway repro:
-     - gateway launch: `OPENCLAW_SKIP_CHANNELS=1 CLAWDBOT_SKIP_CHANNELS=1 pnpm exec tsx src/index.ts gateway run --port 18902 --bind loopback --auth none --allow-unconfigured`
-     - session key: `agent:main:subagent:agent-wait-gpt53-live-source-1773427981586`
-     - run id: `gwc-live-agent-wait-gpt53-source-1773427981614`
-     - `agent.wait`: `{"runId":"gwc-live-agent-wait-gpt53-source-1773427981614","status":"ok","endedAt":1773427984243}`
-     - last assistant: `provider:"openai-codex" model:"gpt-5.3-codex" stopReason:"error" errorMessage contains context_length_exceeded`
-   - this is the current canonical blocker evidence for the still-open live path
-3. Most likely remaining gap to investigate next:
-   - `src/commands/agent.ts` only applies the new fallback correction when `!lifecycleEnded`
-   - `lifecycleEnded` flips true on any inner lifecycle `phase:"end"` or `phase:"error"`
-   - `src/gateway/server-methods/agent-job.ts` resolves/caches `phase:"end"` as terminal `status:"ok"`
-   - so an inner lifecycle emitter is still the likeliest place where terminal assistant/provider failures are being marked `end` too early on the live direct gateway path
-4. Re-check whether ACP-specific Claude/Codex runtime failures are still reproducible after separating them from the generic subagent outcome bug.
-5. Leave the dirty `/subagents log` UX diff out of this branch unless you intentionally spin a separate focused pass; it regression-passed `src/auto-reply/reply/commands.test.ts` but still lacks dedicated feature coverage.
+1. Treat the reliability fixes as live-verified on this branch:
+   - subagent persistence/announcement proof:
+     - run id `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
+     - child transcript `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
+     - `runs.json` stores `outcome.status: "error"`, `endedReason: "subagent-error"`, and a non-null `frozenResultText`
+   - raw `agent.wait` live-fix proof:
+     - gateway launch: `OPENCLAW_SKIP_CHANNELS=1 CLAWDBOT_SKIP_CHANNELS=1 pnpm exec tsx src/index.ts gateway run --port 18903 --bind loopback --auth none --allow-unconfigured`
+     - run id: `gwc-live-agent-wait-gpt53-source-fixed2-1773429512008`
+     - final `agent` response: `finalStatus:"error"`
+     - `agent.wait`: `{"runId":"gwc-live-agent-wait-gpt53-source-fixed2-1773429512008","status":"error","endedAt":1773429514106,"error":"LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again."}`
+2. Commit/push/PR the focused upstream reliability branch when ready.
+3. Re-check whether ACP-specific Claude/Codex runtime failures are still reproducible now that the generic subagent outcome and `agent.wait` bugs have been separated out and fixed.
+4. Leave the dirty `/subagents log` UX diff out of this branch unless you intentionally spin up a separate focused pass; it regression-passed `src/auto-reply/reply/commands.test.ts` but still lacks dedicated feature coverage.

 ## Success criteria
 - Real-run verification of the new error/outcome fix. ✅ done for subagent persistence/announcement handling.
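Reviewer note: the hunk above replaces an `agent.wait` payload that reported `status:"ok"` for a failed run with one that reports `status:"error"` plus an `error` message. A minimal sketch of how that before/after evidence could be checked mechanically follows. The `AgentWaitResult` interface and the `checkFailedWait` helper are hypothetical names introduced here for illustration; the field names are inferred only from the two payloads quoted in this diff, not from the gateway's actual type definitions.

```typescript
// Hypothetical shape, inferred only from the payloads quoted in HANDOFF.md.
interface AgentWaitResult {
  runId: string;
  status: string;
  endedAt?: number;
  error?: string;
}

// Returns a list of problems. An empty list means the payload looks like a
// correctly reported terminal failure for the expected run.
function checkFailedWait(raw: string, expectedRunId: string): string[] {
  const problems: string[] = [];
  const result = JSON.parse(raw) as AgentWaitResult;
  if (result.runId !== expectedRunId) {
    problems.push(`unexpected runId: ${result.runId}`);
  }
  if (result.status !== "error") {
    problems.push(`expected status "error", got "${result.status}"`);
  }
  if (typeof result.error !== "string" || result.error.length === 0) {
    problems.push("missing error message");
  }
  return problems;
}

// Payload recorded before the fix (removed side of the diff).
const before =
  '{"runId":"gwc-live-agent-wait-gpt53-source-1773427981614","status":"ok","endedAt":1773427984243}';

// Payload recorded after the fix (added side of the diff).
const after =
  '{"runId":"gwc-live-agent-wait-gpt53-source-fixed2-1773429512008","status":"error","endedAt":1773429514106,"error":"LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again."}';

const beforeProblems = checkFailedWait(before, "gwc-live-agent-wait-gpt53-source-1773427981614");
const afterProblems = checkFailedWait(after, "gwc-live-agent-wait-gpt53-source-fixed2-1773429512008");

console.log("before:", beforeProblems);
console.log("after:", afterProblems);
```

Under these assumptions the pre-fix payload yields two problems (wrong status, no error message) and the post-fix payload yields none, which matches the claim that the raw `agent.wait` path is now live-verified.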