docs(subagents): record live gpt-5.4 failure verification

zap
2026-03-13 18:12:11 +00:00
parent 08c1981faa
commit 95135eb5f1
4 changed files with 44 additions and 12 deletions

@@ -61,8 +61,24 @@ This is the highest-leverage remaining open reliability item because it affects
- result: successful completion with a real final child result (`SUCCESS-PROBE-OK`)
- A later GLM-5 probe was invalid for entitlement reasons and was terminated; it should not be treated as the canonical failure-path verification.
  - killed/failed run id: `4965775c-4764-41e9-a77a-692f1ab4c2fd`
- Remaining gap: we still need a controlled failure-path verification on a valid model/runtime so we can confirm failed child runs persist/announce as `error` rather than fake `ok`.
- Next step: continue in a fresh `gpt-5.4` subagent session, find the smallest safe controlled-failure repro that does not depend on unavailable GLM-5 access, run it, and update WIP/HANDOFF with exact evidence.
- Live failure-path verification on a valid working model/runtime is now complete on `gpt-5.4`.
  - spawned child run: `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
  - requester session: `agent:main:subagent-reliability-failure-hex-1773425126098`
  - child session: `agent:main:subagent:4c0dd686-cd2e-4cba-b80b-2fbf309a4594`
  - child transcript: `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
  - terminal child assistant message (transcript line 6) recorded:
    - `provider: "openai-codex"`
    - `model: "gpt-5.4"`
    - `stopReason: "error"`
    - `errorMessage: "Codex error: {\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"code\":\"context_length_exceeded\",\"message\":\"Your input exceeds the context window of this model. Please adjust your input and try again.\",\"param\":\"input\"},\"sequence_number\":2}"`
  - matching `~/.openclaw/subagents/runs.json` record now correctly persisted:
    - `outcome.status: "error"`
    - `outcome.error: "Codex error: {...context_length_exceeded...}"`
    - `endedReason: "subagent-error"`
    - `frozenResultText: "Codex error: {...context_length_exceeded...}"`
- Important nuance from the same live repro: raw gateway `agent.wait` still returned `{"runId":"b50cb91f-6219-44f7-9d2f-a1264ac7ceaf","status":"ok","endedAt":1773425130881}` for that failed child. So the current fix is verified for persisted/announced **subagent outcomes**, but **not** for the lower-level `agent.wait` RPC semantics.
- Side assessment of unrelated dirty upstream work: the `/subagents log` UX diff in `src/auto-reply/reply/commands-subagents/action-log.ts` and `shared.ts` is logically coherent and passed `pnpm test -- --run src/auto-reply/reply/commands.test.ts` (`44 tests`). It stays out of scope for this reliability pass: the new tool-only log behavior has no dedicated coverage, and including it would muddy the focused branch.
- Next step if continuing core work: decide whether `agent.wait` itself should report terminal assistant errors as `status: "error"`, or whether the current contract is acceptable now that subagent registry persistence and announcements are fixed.
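
To make the `agent.wait` nuance above concrete, here is a minimal TypeScript sketch of the mismatch, using the values recorded in the live repro. The type names and the helpers `deriveOutcome`, `extractErrorCode`, and `waitDisagrees` are hypothetical and exist only for illustration; they are not the project's real definitions, and the `"Codex error: {json}"` parsing assumes the message format quoted in the transcript.

```typescript
// Hypothetical shapes: field names mirror the evidence quoted above,
// but these types are illustrative, not the project's real definitions.
interface TerminalAssistantMessage {
  provider: string;
  model: string;
  stopReason: string;
  errorMessage?: string;
}

interface PersistedOutcome {
  status: "ok" | "error";
  error?: string;
}

interface AgentWaitResult {
  runId: string;
  status: string;
  endedAt: number;
}

// Derive the outcome a registry could persist from the terminal message:
// a terminal `stopReason: "error"` becomes an error outcome, anything else ok.
function deriveOutcome(msg: TerminalAssistantMessage): PersistedOutcome {
  if (msg.stopReason === "error") {
    return { status: "error", error: msg.errorMessage ?? "unknown error" };
  }
  return { status: "ok" };
}

// Pull the provider error code out of a "Codex error: {json}" string, if any.
function extractErrorCode(errorMessage: string): string | undefined {
  const start = errorMessage.indexOf("{");
  if (start === -1) return undefined;
  try {
    const parsed = JSON.parse(errorMessage.slice(start));
    const code = parsed?.error?.code;
    return typeof code === "string" ? code : undefined;
  } catch {
    return undefined; // payload was not valid JSON
  }
}

// True when the raw wait status contradicts the persisted outcome.
function waitDisagrees(wait: AgentWaitResult, outcome: PersistedOutcome): boolean {
  return wait.status !== outcome.status;
}

// Values taken from the live repro recorded above.
const terminal: TerminalAssistantMessage = {
  provider: "openai-codex",
  model: "gpt-5.4",
  stopReason: "error",
  errorMessage:
    "Codex error: " +
    JSON.stringify({
      type: "error",
      error: {
        type: "invalid_request_error",
        code: "context_length_exceeded",
        message:
          "Your input exceeds the context window of this model. Please adjust your input and try again.",
        param: "input",
      },
      sequence_number: 2,
    }),
};

const wait: AgentWaitResult = {
  runId: "b50cb91f-6219-44f7-9d2f-a1264ac7ceaf",
  status: "ok",
  endedAt: 1773425130881,
};

const outcome = deriveOutcome(terminal);
console.log(outcome.status, extractErrorCode(terminal.errorMessage ?? ""));
console.log("agent.wait disagrees with persisted outcome:", waitDisagrees(wait, outcome));
```

A check along these lines could serve as a regression probe whichever way the contract question is decided: if `agent.wait` is changed to report `error`, `waitDisagrees` should become false for this run; if the current contract is kept, the disagreement is the expected, documented behavior.
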
## Constraints
- Prefer evidence over theory.