docs(subagents): record live gpt-5.4 failure verification

2026-03-13 18:12:11 +00:00
parent 08c1981faa
commit 95135eb5f1
4 changed files with 44 additions and 12 deletions
@@ -23,6 +23,18 @@
 - Targeted validation passed:
  - `pnpm -C /home/openclaw/.openclaw/workspace/external/openclaw-upstream test -- --run src/agents/tools/sessions-helpers.terminal-text.test.ts src/agents/subagent-registry.persistence.test.ts src/gateway/server-methods/server-methods.test.ts`
  - result: `50 tests` passed across `3` files
- Follow-up still needed: rerun a real delegated subagent using a known-working model entitlement (`gpt-5.4` preferred for now) to verify successful runs leave a useful frozen result and failed runs now persist as `error`.
+- Real success-path verification later passed on `gpt-5.4` with run `23750d80-b481-4f50-b219-cc9245be405f` and final child result `SUCCESS-PROBE-OK`.
+- Real failure-path verification later also passed on valid `gpt-5.4` by intentionally triggering a `context_length_exceeded` provider error with a token-dense oversized task payload.
+  - child run: `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
+  - child session: `agent:main:subagent:4c0dd686-cd2e-4cba-b80b-2fbf309a4594`
+  - child transcript: `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
+  - transcript terminal assistant entry recorded `provider:"openai-codex"`, `model:"gpt-5.4"`, `stopReason:"error"`, `errorMessage:"Codex error: {...context_length_exceeded...}"`
+  - matching `~/.openclaw/subagents/runs.json` now correctly stored:
+    - `outcome.status: "error"`
+    - `outcome.error: "Codex error: {...context_length_exceeded...}"`
+    - `endedReason: "subagent-error"`
+    - `frozenResultText: "Codex error: {...context_length_exceeded...}"`
+- Important remaining nuance: raw gateway `agent.wait` for that same failed child still returned `status:"ok"` with only `endedAt`, so the current fix is verified for subagent outcome persistence/announcements but not for lower-level `agent.wait` semantics.
+- Side note: unrelated dirty `/subagents log` UX changes in `external/openclaw-upstream` regression-passed `src/auto-reply/reply/commands.test.ts` (44 tests) but were intentionally left out-of-scope for this focused reliability pass.
 - Will also explicitly requested that zap keep a light eye on active subagents and check whether they look stuck instead of assuming they are fine until completion.
 - Will explicitly reinforced on 2026-03-13 that once planning is done, zap should use subagents ASAP and start implementation in a fresh session rather than continuing to implement inside the long-lived main chat.
@@ -27,7 +27,8 @@
      "Patch timestamp: 2026-03-04T22:31:50Z",
      "Upstream patch committed in external/openclaw-upstream on branch fix/tui-hide-internal-runtime-context commit 0f66a4547 (suppress internal runtime completion context blocks in TUI formatter).",
      "Validation: pnpm test:fast completed successfully (812 files / 6599 tests passing) at 2026-03-04T22:53:29Z",
-      "2026-03-13: confirmed corrected LiteLLM run was still failing (child transcript showed assistant 429/plan error for GLM-5) while runs.json incorrectly stored outcome.status=ok and frozenResultText=null; implemented upstream branch fix/subagent-wait-error-outcome to derive terminal subagent outcome from latest assistant error state, with targeted validation (50 tests passed across 3 files). Still needs live rerun with a known-working model such as gpt-5.4."
+      "2026-03-13: confirmed corrected LiteLLM run was still failing (child transcript showed assistant 429/plan error for GLM-5) while runs.json incorrectly stored outcome.status=ok and frozenResultText=null; implemented upstream branch fix/subagent-wait-error-outcome to derive terminal subagent outcome from latest assistant error state, with targeted validation (50 tests passed across 3 files).",
+      "2026-03-13 later: live gpt-5.4 success repro passed (run 23750d80-b481-4f50-b219-cc9245be405f). Live gpt-5.4 failure repro also passed for subagent persistence/announcement handling: child run b50cb91f-6219-44f7-9d2f-a1264ac7ceaf ended with transcript stopReason=error + context_length_exceeded, and runs.json now stored outcome.status=error / endedReason=subagent-error / frozenResultText non-null. Remaining open nuance: raw agent.wait for that same failed child still returned status=ok."
    ]
  },
  {