docs(subagents): record live gpt-5.4 failure verification

2026-03-13 18:12:11 +00:00
parent 08c1981faa
commit 95135eb5f1
4 changed files with 44 additions and 12 deletions
--- a/memory/2026-03-13.md
+++ b/memory/2026-03-13.md
@@ -23,6 +23,18 @@
 - Targeted validation passed:
  - `pnpm -C /home/openclaw/.openclaw/workspace/external/openclaw-upstream test -- --run src/agents/tools/sessions-helpers.terminal-text.test.ts src/agents/subagent-registry.persistence.test.ts src/gateway/server-methods/server-methods.test.ts`
  - result: `50 tests` passed across `3` files
- Follow-up still needed: rerun a real delegated subagent using a known-working model entitlement (`gpt-5.4` preferred for now) to verify successful runs leave a useful frozen result and failed runs now persist as `error`.
+- Real success-path verification later passed on `gpt-5.4` with run `23750d80-b481-4f50-b219-cc9245be405f` and final child result `SUCCESS-PROBE-OK`.
+- Real failure-path verification later also passed on valid `gpt-5.4` by intentionally triggering a `context_length_exceeded` provider error with a token-dense oversized task payload.
+  - child run: `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
+  - child session: `agent:main:subagent:4c0dd686-cd2e-4cba-b80b-2fbf309a4594`
+  - child transcript: `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
+  - transcript terminal assistant entry recorded `provider:"openai-codex"`, `model:"gpt-5.4"`, `stopReason:"error"`, `errorMessage:"Codex error: {...context_length_exceeded...}"`
+  - matching `~/.openclaw/subagents/runs.json` now correctly stored:
+    - `outcome.status: "error"`
+    - `outcome.error: "Codex error: {...context_length_exceeded...}"`
+    - `endedReason: "subagent-error"`
+    - `frozenResultText: "Codex error: {...context_length_exceeded...}"`
+- Important remaining nuance: raw gateway `agent.wait` for that same failed child still returned `status:"ok"` with only `endedAt`, so the current fix is verified for subagent outcome persistence/announcements but not for lower-level `agent.wait` semantics.
+- Side note: unrelated dirty `/subagents log` UX changes in `external/openclaw-upstream` regression-passed `src/auto-reply/reply/commands.test.ts` (44 tests) but were intentionally left out-of-scope for this focused reliability pass.
 - Will also explicitly requested that zap keep a light eye on active subagents and check whether they look stuck instead of assuming they are fine until completion.
 - Will explicitly reinforced on 2026-03-13 that once planning is done, zap should use subagents ASAP and start implementation in a fresh session rather than continuing to implement inside the long-lived main chat.