docs(subagents): record live gpt-5.4 failure verification
@@ -61,8 +61,24 @@ This is the highest-leverage remaining open reliability item because it affects
- result: successful completion with a real final child result (`SUCCESS-PROBE-OK`)
- A later GLM-5 probe was invalid for entitlement reasons and was terminated; it should not be treated as the canonical failure-path verification.
  - killed/failed run id: `4965775c-4764-41e9-a77a-692f1ab4c2fd`
- Remaining gap: we still need a controlled failure-path verification on a valid model/runtime so we can confirm failed child runs persist/announce as `error` rather than fake `ok`.
- Next step: continue in a fresh `gpt-5.4` subagent session, find the smallest safe controlled-failure repro that does not depend on unavailable GLM-5 access, run it, and update WIP/HANDOFF with exact evidence.
- Live failure-path verification on a valid working model/runtime is now complete on `gpt-5.4`.
  - spawned child run: `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
  - requester session: `agent:main:subagent-reliability-failure-hex-1773425126098`
  - child session: `agent:main:subagent:4c0dd686-cd2e-4cba-b80b-2fbf309a4594`
  - child transcript: `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
  - terminal child assistant message (transcript line 6) recorded:
    - `provider: "openai-codex"`
    - `model: "gpt-5.4"`
    - `stopReason: "error"`
    - `errorMessage: "Codex error: {\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"code\":\"context_length_exceeded\",\"message\":\"Your input exceeds the context window of this model. Please adjust your input and try again.\",\"param\":\"input\"},\"sequence_number\":2}"`
  - matching `~/.openclaw/subagents/runs.json` record now correctly persisted:
    - `outcome.status: "error"`
    - `outcome.error: "Codex error: {...context_length_exceeded...}"`
    - `endedReason: "subagent-error"`
    - `frozenResultText: "Codex error: {...context_length_exceeded...}"`
- Important nuance from the same live repro: raw gateway `agent.wait` still returned `{"runId":"b50cb91f-6219-44f7-9d2f-a1264ac7ceaf","status":"ok","endedAt":1773425130881}` for that failed child. So the current fix is verified for persisted/announced **subagent outcomes**, but **not** for the lower-level `agent.wait` RPC semantics.
- Side assessment on unrelated dirty upstream work: the `/subagents log` UX diff in `src/auto-reply/reply/commands-subagents/action-log.ts` + `shared.ts` is logically coherent and passed `pnpm test -- --run src/auto-reply/reply/commands.test.ts` (`44 tests`), but it is still out-of-scope for this reliability pass because there is no dedicated coverage for the new tool-only log behavior and it would muddy the focused branch.
- Next step if continuing core work: decide whether `agent.wait` itself should downgrade terminal assistant errors to `status: "error"`, or whether the current contract is acceptable now that subagent registry persistence/announcements are fixed.
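
To make the `agent.wait` question above concrete, here is a minimal TypeScript sketch of the mismatch observed in the live repro. All type and function names (`TerminalAssistantMessage`, `PersistedOutcome`, `WaitResult`, `outcomeFromTerminalMessage`, `extractErrorCode`, `waitDisagrees`) are hypothetical and only mirror the field names quoted in these notes; this is not the actual registry or gateway code.

```typescript
// Hypothetical shapes, modeled only on the fields quoted in the notes above.
interface TerminalAssistantMessage {
  stopReason: string;
  errorMessage?: string;
}

interface PersistedOutcome {
  status: "ok" | "error";
  error?: string;
}

interface WaitResult {
  runId: string;
  status: string;
  endedAt: number;
}

// Assumed rule: a terminal assistant message with stopReason "error"
// should persist as an error outcome, never as "ok".
function outcomeFromTerminalMessage(msg: TerminalAssistantMessage): PersistedOutcome {
  if (msg.stopReason === "error") {
    return { status: "error", error: msg.errorMessage ?? "unknown error" };
  }
  return { status: "ok" };
}

// Pull the provider error code out of a message shaped like
// 'Codex error: {...json...}'. Returns undefined if no JSON payload parses.
function extractErrorCode(errorMessage: string): string | undefined {
  const start = errorMessage.indexOf("{");
  if (start < 0) return undefined;
  try {
    const parsed = JSON.parse(errorMessage.slice(start));
    const code = parsed?.error?.code;
    return typeof code === "string" ? code : undefined;
  } catch {
    return undefined;
  }
}

// True when the raw wait result and the persisted outcome report
// different statuses for the same run (the gap described above).
function waitDisagrees(wait: WaitResult, outcome: PersistedOutcome): boolean {
  return wait.status !== outcome.status;
}
```

Fed the values from the repro (a terminal message with `stopReason: "error"` and a wait result with `status: "ok"`), `waitDisagrees` returns `true`. A check of this shape could serve as a regression probe whichever way the contract decision goes.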
## Constraints
- Prefer evidence over theory.