docs(state): correct glm-5 entitlement note

This commit is contained in:
zap
2026-03-13 00:25:59 +00:00
parent 7669d5787d
commit 8983f45d4e
2 changed files with 26 additions and 5 deletions


@@ -36,10 +36,20 @@ This is the highest-leverage remaining open reliability item because it affects
5. Validate with the smallest reliable reproduction possible.
## Evidence gathered so far
- Fresh subagent run failed immediately with provider auth error for `zai` before any task execution.
- Current installed agent auth profiles include `openai-codex:default`, `litellm:default`, and `github-copilot:github`; there is no `zai` profile configured.
- Root cause for this immediate repro appears to be an incorrect explicit spawn model choice (`glm-5` alias → `zai/glm-5`) rather than missing auth propagation between agents.
- Next step after confirming the model-selection issue: prefer `gpt-5.4` for fresh subagent reliability/debug passes for now, per Will's instruction, and continue separating real runtime issues from operator/config mistakes.
- Fresh subagent run failed immediately when an explicit `glm-5` choice resolved into the Z.AI provider path before any useful task execution.
- Current installed agent auth profile keys inspected in agent stores include `openai-codex:default`, `litellm:default`, and `github-copilot:github`.
- Will clarified that Z.AI auth does exist, but this account is not entitled for `glm-5`.
- Root cause for this immediate repro is therefore best described as a provider/model entitlement mismatch caused by the explicit spawn model choice, not missing auth propagation between agents.
- A later "corrected" run using `litellm/glm-5` also did not succeed:
  - child transcript `~/.openclaw/agents/main/sessions/1615a980-cf92-4d5e-845a-a2abe77c0418.jsonl` contains repeated assistant `stopReason:"error"` entries with `429 ... subscription plan does not yet include access to GLM-5`
  - `~/.openclaw/subagents/runs.json` nonetheless recorded that run (`776a8b51-6fdc-448e-83bc-55418814a05b`) as `outcome.status: "ok"` with `frozenResultText: null`.
- This separates the problems:
  - ACP/operator/model-selection issue: explicit `glm-5` → `zai/glm-5` without auth (already understood).
  - Generic subagent completion/reporting issue: terminal assistant errors can still be stored/announced as successful completion with no frozen result.
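The transcript evidence above suggests a simple mechanical check. As a hypothetical sketch (the `role` and `stopReason` field names come from the entries quoted above; the rest of the session `.jsonl` schema is my assumption, not the real OpenClaw format), a terminal assistant error could be detected like this:

```typescript
import { readFileSync } from "node:fs";

// Assumed per-line entry shape for a session .jsonl transcript; only the
// fields quoted in the notes (role, stopReason) are relied on.
interface SessionEntry {
  role?: string;
  stopReason?: string;
  content?: string;
}

// Find the last assistant entry in the transcript, scanning from the end.
function lastAssistantEntry(path: string): SessionEntry | undefined {
  const entries = readFileSync(path, "utf8")
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as SessionEntry);
  return entries.reverse().find((e) => e.role === "assistant");
}

// True when the child session ended on an assistant error, as in the
// repeated stopReason:"error" entries described above.
function terminalAssistantErrored(path: string): boolean {
  return lastAssistantEntry(path)?.stopReason === "error";
}
```

A check like this is what lets the completion path distinguish "run finished" from "run finished on an error" before announcing success.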
- Implemented upstream patch on branch `fix/subagent-wait-error-outcome` in `external/openclaw-upstream` so subagent completion paths inspect the latest assistant terminal message and treat terminal assistant errors as `outcome.status: "error"` rather than `ok`.
- Validation completed for targeted non-E2E coverage:
  - `pnpm -C external/openclaw-upstream test -- --run src/agents/tools/sessions-helpers.terminal-text.test.ts src/agents/subagent-registry.persistence.test.ts src/gateway/server-methods/server-methods.test.ts`
  - result: passed (`50 tests` across `3` files).
- E2E-style `subagent-announce.format.e2e.test.ts` coverage was updated, but the normal Vitest include rules exclude `*.e2e.test.ts`; a direct `pnpm test -- --run ...e2e...` invocation confirms the exclusion rather than executing that file.
- Next step after this patch: rerun a real subagent with a known-working model (`gpt-5.4` or another actually entitled model) and confirm `runs.json` stores `error` on terminal assistant failure and a useful frozen result on success.
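The "ok but empty" pattern found in `runs.json` can also be audited directly. This is a hypothetical sketch: it assumes the file holds an array of run records shaped like the fields quoted in these notes (`outcome.status`, `frozenResultText`); the real layout may differ.

```typescript
import { readFileSync } from "node:fs";

// Assumed record shape, reconstructed from the fields quoted above.
interface RecordedRun {
  id: string;
  outcome: { status: "ok" | "error"; frozenResultText: string | null };
}

// Return ids of runs recorded as successful but carrying no frozen result:
// the suspicious pattern that flagged run 776a8b51-... in these notes.
function suspiciousRuns(runsJsonPath: string): string[] {
  const runs = JSON.parse(readFileSync(runsJsonPath, "utf8")) as RecordedRun[];
  return runs
    .filter((r) => r.outcome.status === "ok" && r.outcome.frozenResultText === null)
    .map((r) => r.id);
}
```

Running a check like this after the patched build should return an empty list for runs that genuinely succeeded.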
## Constraints
- Prefer evidence over theory.


@@ -3,7 +3,8 @@
## Subagent reliability investigation
- Fresh implementation subagent launch for subagent/ACP reliability failed immediately before doing any task work.
- Failure mode: delegated run was spawned with model `glm-5`, which resolved to provider model `zai/glm-5`.
- Current agent auth profiles across installed agents include `openai-codex:default`, `litellm:default`, and `github-copilot:github`; there is no `zai` auth profile configured in agent auth stores.
- Current installed agent auth profile keys inspected in agent stores include `openai-codex:default`, `litellm:default`, and `github-copilot:github`.
- Will clarified on 2026-03-13 that Z.AI auth does exist in the environment, but the account is not entitled for `glm-5`.
- Verified by inspecting agent auth profile keys under:
  - `/home/openclaw/.openclaw/agents/*/agent/auth-profiles.json`
- Relevant OpenClaw docs confirm:
@@ -13,4 +14,14 @@
- Conclusion: the immediate failure was caused by an incorrect explicit model selection in the spawn request, not by missing auth propagation between agents.
- Corrective action: retry fresh delegation with `litellm/glm-5` (the intended medium-tier routed model for delegated implementation work in this setup).
- Will explicitly requested on 2026-03-13 to use `gpt-5.4` for subagents for now while debugging delegation reliability.
- New evidence from the corrected run:
  - `~/.openclaw/agents/main/sessions/1615a980-cf92-4d5e-845a-a2abe77c0418.jsonl` shows repeated assistant `stopReason:"error"` entries with `429 ... GLM-5 not included in current subscription plan`
  - `~/.openclaw/subagents/runs.json` nonetheless recorded run `776a8b51-6fdc-448e-83bc-55418814a05b` as `outcome.status: "ok"` and `frozenResultText: null`.
- That separates ACP/runtime choice problems from a generic subagent completion/reporting bug: a terminal assistant error can still be persisted/announced as success with no useful result.
- Implemented upstream fix on branch `external/openclaw-upstream@fix/subagent-wait-error-outcome`:
  - added assistant terminal-outcome helper so empty-content assistant errors still yield usable terminal text
  - subagent registry now downgrades `agent.wait => ok` to `error` when the child session's terminal assistant message is actually an error
  - subagent announce flow now reports terminal assistant errors as failed outcomes instead of successful `(no output)` completions
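The downgrade rule described above can be sketched as a pure reconciliation step. This is illustrative only: the type names, field names, and function are my invention, not the actual `openclaw-upstream` API.

```typescript
type OutcomeStatus = "ok" | "error";

// Hypothetical shapes mirroring the fields quoted in these notes.
interface RunOutcome {
  status: OutcomeStatus;
  frozenResultText: string | null;
}

interface TerminalAssistantMessage {
  stopReason?: string;
  text?: string;
}

// Demote a recorded "ok" to "error" when the child session's terminal
// assistant message was itself an error, preserving any terminal text so
// the announce flow reports a useful failure instead of "(no output)".
function reconcileOutcome(
  recorded: RunOutcome,
  terminal: TerminalAssistantMessage | undefined,
): RunOutcome {
  if (recorded.status === "ok" && terminal?.stopReason === "error") {
    return { status: "error", frozenResultText: terminal.text ?? null };
  }
  return recorded;
}
```

The key property is that a genuinely successful run passes through unchanged, so the fix only affects the pathological "terminal error stored as ok" case.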
- Targeted validation passed:
  - `pnpm -C /home/openclaw/.openclaw/workspace/external/openclaw-upstream test -- --run src/agents/tools/sessions-helpers.terminal-text.test.ts src/agents/subagent-registry.persistence.test.ts src/gateway/server-methods/server-methods.test.ts`
  - result: `50 tests` passed across `3` files
- Follow-up still needed: rerun a real delegated subagent using a known-working model entitlement (`gpt-5.4` preferred for now) to verify successful runs leave a useful frozen result and failed runs now persist as `error`.
- Will also explicitly requested that zap keep a light eye on active subagents and check whether they look stuck instead of assuming they are fine until completion.