From 8983f45d4e2c97930759a39fb2a907dff9fb58e6 Mon Sep 17 00:00:00 2001 From: zap Date: Fri, 13 Mar 2026 00:25:59 +0000 Subject: [PATCH] docs(state): correct glm-5 entitlement note --- WIP.subagent-reliability.md | 18 ++++++++++++++---- memory/2026-03-13.md | 13 ++++++++++++- 2 files changed, 26 insertions(+), 5 deletions(-) diff --git a/WIP.subagent-reliability.md b/WIP.subagent-reliability.md index 2aac8d0..67b7599 100644 --- a/WIP.subagent-reliability.md +++ b/WIP.subagent-reliability.md @@ -36,10 +36,20 @@ This is the highest-leverage remaining open reliability item because it affects 5. Validate with the smallest reliable reproduction possible. ## Evidence gathered so far -- Fresh subagent run failed immediately with provider auth error for `zai` before any task execution. -- Current installed agent auth profiles include `openai-codex:default`, `litellm:default`, and `github-copilot:github`; there is no `zai` profile configured. -- Root cause for this immediate repro appears to be an incorrect explicit spawn model choice (`glm-5` alias → `zai/glm-5`) rather than missing auth propagation between agents. -- Next step after confirming the model-selection issue: prefer `gpt-5.4` for fresh subagent reliability/debug passes for now, per Will's instruction, and continue separating real runtime issues from operator/config mistakes. +- Fresh subagent run failed immediately when an explicit `glm-5` choice resolved into the Z.AI provider path before any useful task execution. +- Current installed agent auth profile keys inspected in agent stores include `openai-codex:default`, `litellm:default`, and `github-copilot:github`. +- Will clarified that Z.AI auth does exist, but this account is not entitled for `glm-5`. +- Root cause for this immediate repro is therefore best described as a provider/model entitlement mismatch caused by the explicit spawn model choice, not missing auth propagation between agents. +- A later "corrected" run using `litellm/glm-5` also did not succeed: child transcript `~/.openclaw/agents/main/sessions/1615a980-cf92-4d5e-845a-a2abe77c0418.jsonl` contains repeated assistant `stopReason:"error"` entries with `429 ... subscription plan does not yet include access to GLM-5`, while `~/.openclaw/subagents/runs.json` recorded that run (`776a8b51-6fdc-448e-83bc-55418814a05b`) as `outcome.status: "ok"` with `frozenResultText: null`. +- This separates the problems: + - ACP/operator/model-selection issue: explicit `glm-5` → `zai/glm-5` without auth (already understood). + - Generic subagent completion/reporting issue: terminal assistant errors can still be stored/announced as successful completion with no frozen result. +- Implemented upstream patch on branch `fix/subagent-wait-error-outcome` in `external/openclaw-upstream` so subagent completion paths inspect the latest assistant terminal message and treat terminal assistant errors as `outcome.status: "error"` rather than `ok`. +- Validation completed for targeted non-E2E coverage: + - `pnpm -C external/openclaw-upstream test -- --run src/agents/tools/sessions-helpers.terminal-text.test.ts src/agents/subagent-registry.persistence.test.ts src/gateway/server-methods/server-methods.test.ts` + - result: passed (`50 tests` across `3` files). +- E2E-style `subagent-announce.format.e2e.test.ts` coverage was updated but the normal Vitest include rules exclude `*.e2e.test.ts`; direct `pnpm test -- --run ...e2e...` confirms exclusion rather than executing that file. +- Next step after this patch: rerun a real subagent with a known-working model (`gpt-5.4` or another actually entitled model) and confirm `runs.json` stores `error` on terminal assistant failure and a useful frozen result on success. ## Constraints - Prefer evidence over theory. diff --git a/memory/2026-03-13.md b/memory/2026-03-13.md index 96a8761..2eb1d79 100644 --- a/memory/2026-03-13.md +++ b/memory/2026-03-13.md @@ -3,7 +3,8 @@ ## Subagent reliability investigation - Fresh implementation subagent launch for subagent/ACP reliability failed immediately before doing any task work. - Failure mode: delegated run was spawned with model `glm-5`, which resolved to provider model `zai/glm-5`. -- Current agent auth profiles across installed agents include `openai-codex:default`, `litellm:default`, and `github-copilot:github`; there is no `zai` auth profile configured in agent auth stores. +- Current installed agent auth profile keys inspected in agent stores include `openai-codex:default`, `litellm:default`, and `github-copilot:github`. +- Will clarified on 2026-03-13 that Z.AI auth does exist in the environment, but the account is not entitled for `glm-5`. - Verified by inspecting agent auth profile keys under: - `/home/openclaw/.openclaw/agents/*/agent/auth-profiles.json` - Relevant OpenClaw docs confirm: @@ -13,4 +14,14 @@ - Conclusion: the immediate failure was caused by an incorrect explicit model selection in the spawn request, not by missing auth propagation between agents. - Corrective action: retry fresh delegation with `litellm/glm-5` (the intended medium-tier routed model for delegated implementation work in this setup). - Will explicitly requested on 2026-03-13 to use `gpt-5.4` for subagents for now while debugging delegation reliability. +- New evidence from the corrected run: `~/.openclaw/agents/main/sessions/1615a980-cf92-4d5e-845a-a2abe77c0418.jsonl` shows repeated assistant `stopReason:"error"` entries with `429 ... GLM-5 not included in current subscription plan`, but `~/.openclaw/subagents/runs.json` recorded run `776a8b51-6fdc-448e-83bc-55418814a05b` as `outcome.status: "ok"` and `frozenResultText: null`. +- That separates ACP/runtime choice problems from a generic subagent completion/reporting bug: a terminal assistant error can still be persisted/announced as success with no useful result. +- Implemented upstream fix on branch `external/openclaw-upstream@fix/subagent-wait-error-outcome`: + - added assistant terminal-outcome helper so empty-content assistant errors still yield usable terminal text + - subagent registry now downgrades `agent.wait => ok` to `error` when the child session's terminal assistant message is actually an error + - subagent announce flow now reports terminal assistant errors as failed outcomes instead of successful `(no output)` completions +- Targeted validation passed: + - `pnpm -C /home/openclaw/.openclaw/workspace/external/openclaw-upstream test -- --run src/agents/tools/sessions-helpers.terminal-text.test.ts src/agents/subagent-registry.persistence.test.ts src/gateway/server-methods/server-methods.test.ts` + - result: `50 tests` passed across `3` files +- Follow-up still needed: rerun a real delegated subagent using a known-working model entitlement (`gpt-5.4` preferred for now) to verify successful runs leave a useful frozen result and failed runs now persist as `error`. - Will also explicitly requested that zap keep a light eye on active subagents and check whether they look stuck instead of assuming they are fine until completion.