docs(reliability): record live agent.wait fix verification

2026-03-13 19:24:14 +00:00
parent f2b99841af
commit 8998e7535e
4 changed files with 59 additions and 29 deletions
--- a/WIP.subagent-reliability.md
+++ b/WIP.subagent-reliability.md
@@ -107,13 +107,34 @@ This is the highest-leverage remaining open reliability item because it affects
  - earlier temporary gateway runs reinforced the same mismatch:
    - stale dist gateway repro run `gwc-live-agent-wait-gpt53-1773427893583` also returned `status:"ok"` while transcript stopReason remained `error`
    - temp `gpt-5.4` session repro on the same temp gateway returned `status:"error"`, but only because that runtime reported `FailoverError: Unknown model: openai-codex/gpt-5.4`; that is useful as transport sanity, but **not** the canonical live semantics proof
- Most likely remaining code-path gap (high-confidence from source inspection):
-  - `src/commands/agent.ts` only applies the new fallback correction when `!lifecycleEnded`
-  - `lifecycleEnded` is set as soon as any inner lifecycle callback reports `phase:"end"` or `phase:"error"`
-  - `src/gateway/server-methods/agent-job.ts` immediately caches/resolves `phase:"end"` as terminal `status:"ok"`
-  - so if an inner embedded lifecycle emitter still reports `phase:"end"` for a run whose final assistant message later has `stopReason:"error"`, `agent.wait` will still resolve `ok` before the dedupe/result-meta rescue path matters
-  - likely next target: identify the inner lifecycle emitter that is still producing `phase:"end"` on this direct gateway path and either convert that event to `phase:"error"` for terminal assistant failures or make `agent.wait` prefer final dedupe/result-meta over earlier lifecycle `end` when both exist for the same run
- Side assessment on unrelated dirty upstream work: the `/subagents log` UX diff in `src/auto-reply/reply/commands-subagents/action-log.ts` + `shared.ts` is logically coherent and passed `pnpm test -- --run src/auto-reply/reply/commands.test.ts` (`44 tests`), but it is still out-of-scope for this reliability pass because there is no dedicated coverage for the new tool-only log behavior and it would muddy the focused branch.
+- The final focused live-fix pass on 2026-03-13 closed the remaining `agent.wait` bug.
+  - root cause confirmed: the live direct gateway path could receive an inner `agent_end` event carrying a terminal assistant error without a preceding `message_end`; that left the assistant state stale or empty, so the embedded lifecycle handler still emitted lifecycle `phase:"end"`
+  - upstream fix extends the embedded subscribe lifecycle handler to recover the terminal assistant message from `agent_end.messages` or the session transcript when state is stale, then emit lifecycle `phase:"error"` with a friendly error string instead of `end`
+  - upstream fix also updates the direct gateway `agent` RPC handler to observe lifecycle events for the run and derive the final RPC payload/terminal status from observed lifecycle + resolved result metadata, instead of blindly caching `status:"ok"` when the outer RPC resolves
+  - files changed for the final fix:
+    - `src/agents/pi-embedded-subscribe.e2e-harness.ts`
+    - `src/agents/pi-embedded-subscribe.handlers.lifecycle.ts`
+    - `src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts`
+    - `src/agents/pi-embedded-subscribe.handlers.ts`
+    - `src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts`
+    - `src/gateway/server-methods/agent.ts`
+    - `src/gateway/server-methods/server-methods.test.ts`
+- Final targeted validation passed:
+  - `pnpm -C /home/openclaw/.openclaw/workspace/external/openclaw-upstream test -- --run src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts src/commands/agent.test.ts src/gateway/server-methods/agent-wait-dedupe.test.ts src/gateway/server-methods/server-methods.test.ts`
+  - result: `108 tests` passed across `5` files
+- Final decisive live source-gateway repro after the fix:
+  - gateway launch: `OPENCLAW_SKIP_CHANNELS=1 CLAWDBOT_SKIP_CHANNELS=1 pnpm exec tsx src/index.ts gateway run --port 18903 --bind loopback --auth none --allow-unconfigured`
+  - run id: `gwc-live-agent-wait-gpt53-source-fixed2-1773429512008`
+  - session key: `agent:main:subagent:agent-wait-gpt53-live-source-fixed2-1773429512008`
+  - final `agent` response with `expectFinal: true` returned:
+    - `finalStatus: "error"`
+    - `finalSummary: "LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again."`
+  - matching `agent.wait` returned:
+    - `{"runId":"gwc-live-agent-wait-gpt53-source-fixed2-1773429512008","status":"error","endedAt":1773429514106,"error":"LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again."}`
+- Net status now:
+  - subagent persistence/announcement fix: live-verified ✅
+  - raw `agent.wait` semantics fix: live-verified ✅
+- Side assessment on unrelated dirty upstream work: the `/subagents log` UX diff in `src/auto-reply/reply/commands-subagents/action-log.ts` + `shared.ts` is logically coherent and passed `pnpm test -- --run src/auto-reply/reply/commands.test.ts` (`44 tests`), but it is still out of scope for this focused reliability pass because there is no dedicated coverage for the new tool-only log behavior and it would muddy the focused branch.

 ## Constraints
 - Prefer evidence over theory.
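
The fixed `agent.wait` flow recorded in the notes above can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the upstream code. Every type and function name here (`AssistantMessage`, `AgentEndEvent`, `ResultMeta`, `resolveTerminalAssistant`, `lifecycleForAgentEnd`, `deriveWaitResult`) is hypothetical, the session-transcript fallback is omitted, and the placeholder error text is invented. It models only the described behavior: recover the terminal assistant message from `agent_end.messages` when no `message_end` was captured, emit `phase:"error"` instead of `end` for a terminal assistant error, and let an observed error (or error result metadata) win over `status:"ok"`.

```typescript
// Hypothetical shapes: the real OpenClaw types are not shown in these notes.
interface AssistantMessage {
  role: "assistant";
  stopReason?: string; // e.g. "stop" or "error"
  errorMessage?: string;
}

interface OtherMessage {
  role: "user" | "toolResult";
}

interface AgentEndEvent {
  type: "agent_end";
  messages: Array<AssistantMessage | OtherMessage>;
}

interface LifecycleEvent {
  runId: string;
  phase: "end" | "error";
  endedAt: number;
  error?: string;
}

// Hypothetical stand-in for the resolved result metadata.
interface ResultMeta {
  stopReason?: string;
  error?: string;
}

interface WaitResult {
  runId: string;
  status: "ok" | "error";
  endedAt: number;
  error?: string;
}

// Prefer the assistant captured via message_end; if that state is stale or
// empty, fall back to the last assistant message carried on agent_end.
// (The session-transcript fallback from the notes is omitted here.)
function resolveTerminalAssistant(
  captured: AssistantMessage | undefined,
  event: AgentEndEvent,
): AssistantMessage | undefined {
  if (captured?.stopReason) return captured;
  for (const m of [...event.messages].reverse()) {
    if (m.role === "assistant") return m; // narrowed to AssistantMessage
  }
  return undefined;
}

// Map agent_end to a lifecycle event: a terminal assistant error becomes
// phase "error" instead of "end". The fallback text is a placeholder.
function lifecycleForAgentEnd(
  runId: string,
  captured: AssistantMessage | undefined,
  event: AgentEndEvent,
  now: number,
): LifecycleEvent {
  const terminal = resolveTerminalAssistant(captured, event);
  if (terminal?.stopReason === "error") {
    const error = terminal.errorMessage ?? "LLM request failed";
    return { runId, phase: "error", endedAt: now, error };
  }
  return { runId, phase: "end", endedAt: now };
}

// Derive the agent.wait payload from observed lifecycle events plus result
// metadata, rather than treating the first "end" as status "ok".
function deriveWaitResult(
  runId: string,
  observed: LifecycleEvent[],
  meta: ResultMeta | undefined,
  now: number,
): WaitResult {
  const forRun = observed.filter((e) => e.runId === runId);
  const failed = forRun.find((e) => e.phase === "error");
  if (failed) {
    return { runId, status: "error", endedAt: failed.endedAt, error: failed.error };
  }
  const endedAt = forRun.find((e) => e.phase === "end")?.endedAt ?? now;
  if (meta?.stopReason === "error") {
    return { runId, status: "error", endedAt, error: meta.error };
  }
  return { runId, status: "ok", endedAt };
}

// Usage: agent_end arrives with no preceding message_end (captured = undefined).
const agentEnd: AgentEndEvent = {
  type: "agent_end",
  messages: [
    { role: "user" },
    { role: "assistant", stopReason: "error", errorMessage: "context window exceeded" },
  ],
};
const lifecycle = lifecycleForAgentEnd("run-1", undefined, agentEnd, 1000);
console.log(JSON.stringify(deriveWaitResult("run-1", [lifecycle], undefined, 2000)));
// {"runId":"run-1","status":"error","endedAt":1000,"error":"context window exceeded"}
```

The sketch keeps two layers on purpose: the lifecycle mapping stops a stale `end` from being emitted at all, and the wait-side derivation still lets error result metadata override a plain `end`, which mirrors the two files the notes say were changed.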