2026-03-13

Subagent reliability investigation

A fresh implementation subagent, launched to work on subagent/ACP reliability, failed immediately, before doing any task work.
Failure mode: the delegated run was spawned with model glm-5, which resolved to the provider model zai/glm-5.
Auth profile keys currently present in the inspected agent stores include openai-codex:default, litellm:default, and github-copilot:github.
Will clarified on 2026-03-13 that Z.AI auth does exist in the environment, but the account is not entitled to glm-5.
The key list above was verified by inspecting the agent auth profile files under:
- /home/openclaw/.openclaw/agents/*/agent/auth-profiles.json
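The per-agent key inspection can be scripted. Below is a minimal TypeScript sketch for Node. It assumes that each auth-profiles.json keeps its entries under a top-level "profiles" object whose ids follow the observed provider:name form. That file layout and the helper names profileKeysFrom and collectAuthProfileKeys are assumptions for illustration, not taken from OpenClaw.

```typescript
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// ASSUMPTION (hypothetical layout): profile entries live under a top-level
// "profiles" object; if it is missing, top-level keys are scanned instead.
// Only keys containing ":" are kept, matching observed ids like "litellm:default".
function profileKeysFrom(json: unknown): string[] {
  if (typeof json !== "object" || json === null) return [];
  const root = json as Record<string, unknown>;
  const profiles = root["profiles"];
  const source =
    typeof profiles === "object" && profiles !== null
      ? (profiles as Record<string, unknown>)
      : root;
  return Object.keys(source)
    .filter((key) => key.includes(":"))
    .sort();
}

// Walk <agentsRoot>/<agentId>/agent/auth-profiles.json for every agent directory
// and map each agent id to its sorted profile keys. Agents without the file are skipped.
function collectAuthProfileKeys(agentsRoot: string): Map<string, string[]> {
  const result = new Map<string, string[]>();
  for (const entry of readdirSync(agentsRoot, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const file = join(agentsRoot, entry.name, "agent", "auth-profiles.json");
    if (!existsSync(file)) continue;
    result.set(entry.name, profileKeysFrom(JSON.parse(readFileSync(file, "utf8"))));
  }
  return result;
}
```

For example, collectAuthProfileKeys("/home/openclaw/.openclaw/agents") would return one sorted key list per agent. Listing keys only shows which providers have credentials. It cannot show plan entitlement, so it would not have revealed the glm-5 restriction.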
Relevant OpenClaw docs confirm:
- subagent spawns inherit caller model when sessions_spawn.model is omitted
- provider/model auth errors like No API key found for provider "zai" occur when a provider model is selected without matching auth
- multi-agent auth is per-agent via ~/.openclaw/agents/<agentId>/agent/auth-profiles.json
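Taken together, the inheritance and per-agent auth rules amount to a simple pre-spawn check. The TypeScript sketch below is hedged in several ways. The SpawnRequest shape is hypothetical. The alias map is an assumed stand-in for whatever mechanism turned glm-5 into zai/glm-5. The provider:name key convention is inferred only from the keys listed above. None of this is OpenClaw's actual spawn API.

```typescript
// HYPOTHETICAL shapes and names: not OpenClaw's actual spawn API.
interface SpawnRequest {
  model?: string; // omitted => inherit the caller's model
}

interface ModelRef {
  provider: string;
  model: string;
}

// Split "provider/model" on the first "/"; bare names like "glm-5" are rejected.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`expected "provider/model", got "${ref}"`);
  }
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}

// An explicit model wins; otherwise the caller's model is inherited.
// ASSUMPTION: the alias map models how a bare name gets a provider prefix.
function resolveSpawnModel(
  req: SpawnRequest,
  callerModel: string,
  aliases: ReadonlyMap<string, string> = new Map(),
): ModelRef {
  const raw = req.model ?? callerModel;
  return parseModelRef(aliases.get(raw) ?? raw);
}

// True if any auth profile key of the form "provider:name" belongs to the provider.
function hasAuthFor(provider: string, profileKeys: readonly string[]): boolean {
  return profileKeys.some((key) => key.split(":")[0] === provider);
}
```

With the keys listed earlier, an explicit glm-5 that resolves to zai/glm-5 fails hasAuthFor, while litellm/glm-5 passes. A presence check like this cannot detect a missing plan entitlement. That only shows up as a provider error once the run starts.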
Conclusion: the immediate failure was caused by an incorrect explicit model selection in the spawn request, not by missing auth propagation between agents.
Corrective action: retry a fresh delegation with litellm/glm-5 (the intended medium-tier routed model for delegated implementation work in this setup).
Will explicitly requested on 2026-03-13 to use gpt-5.4 for subagents for now while debugging delegation reliability.
New evidence from the corrected run: the session transcript ~/.openclaw/agents/main/sessions/1615a980-cf92-4d5e-845a-a2abe77c0418.jsonl shows repeated assistant entries with stopReason:"error" and 429 ... GLM-5 not included in current subscription plan. Yet ~/.openclaw/subagents/runs.json recorded run 776a8b51-6fdc-448e-83bc-55418814a05b with outcome.status: "ok" and frozenResultText: null.
This separates ACP/runtime choice problems from a more general subagent completion/reporting bug: a run whose terminal assistant message is an error can still be persisted and announced as a success with no useful result.
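To spot this mismatch, check the transcript's terminal assistant message instead of trusting the recorded run status. The TypeScript sketch below assumes a hypothetical entry shape: one JSON object per line, with the message either at the top level or nested under "message", and assistant messages carrying role "assistant" and a stopReason. The real OpenClaw transcript format may differ.

```typescript
// HYPOTHETICAL entry shape: not OpenClaw's documented transcript format.
interface AssistantEntry {
  role: "assistant";
  stopReason?: string;
  errorMessage?: string;
  content?: unknown;
}

interface TranscriptSummary {
  terminal?: AssistantEntry; // last assistant message in the file
  assistantErrors: number; // assistant messages with stopReason "error"
}

// Accept the message at top level or nested under "message" (both assumed).
function asAssistant(value: unknown): AssistantEntry | undefined {
  if (typeof value !== "object" || value === null) return undefined;
  const outer = value as Record<string, unknown>;
  const inner = outer["message"];
  const msg =
    typeof inner === "object" && inner !== null
      ? (inner as Record<string, unknown>)
      : outer;
  return msg["role"] === "assistant" ? (msg as unknown as AssistantEntry) : undefined;
}

// Scan JSONL text line by line, skipping blank or unparseable lines.
function summarizeTranscript(jsonl: string): TranscriptSummary {
  const summary: TranscriptSummary = { assistantErrors: 0 };
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: AssistantEntry | undefined;
    try {
      entry = asAssistant(JSON.parse(line));
    } catch {
      continue; // tolerate partial or corrupt lines
    }
    if (!entry) continue;
    summary.terminal = entry;
    if (entry.stopReason === "error") summary.assistantErrors += 1;
  }
  return summary;
}
```

Read the session file with readFileSync(path, "utf8") and pass the text in. A terminal stopReason of "error" alongside outcome.status: "ok" in runs.json is the mismatch described here.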
Implemented an upstream fix on branch external/openclaw-upstream@fix/subagent-wait-error-outcome:
- added assistant terminal-outcome helper so empty-content assistant errors still yield usable terminal text
- subagent registry now downgrades agent.wait => ok to error when the child session's terminal assistant message is actually an error
- subagent announce flow now reports terminal assistant errors as failed outcomes instead of successful (no output) completions
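The three changes above can be summarized as a single reconciliation rule. This is a minimal sketch of that rule, not the code on the branch. The WaitStatus, TerminalAssistant, and RunOutcome types, and the terminalText, reconcileOutcome, and announceLine names, are hypothetical.

```typescript
// HYPOTHETICAL types and names: an illustration, not the branch's implementation.
type WaitStatus = "ok" | "error" | "timeout";

interface TerminalAssistant {
  stopReason?: string;
  errorMessage?: string;
  text?: string; // concatenated assistant text; may be empty
}

interface RunOutcome {
  status: WaitStatus;
  resultText: string | null;
}

// Terminal-text helper: prefer assistant text; for an empty-content error,
// fall back to the error message so callers still get usable text.
function terminalText(msg: TerminalAssistant | undefined): string | null {
  if (!msg) return null;
  const text = msg.text?.trim() ?? "";
  if (text !== "") return text;
  if (msg.stopReason === "error") {
    return msg.errorMessage?.trim() || "assistant error (no message)";
  }
  return null;
}

// Downgrade a wait result of "ok" to "error" when the child's terminal
// assistant message is actually an error; other statuses pass through.
function reconcileOutcome(
  waitStatus: WaitStatus,
  msg: TerminalAssistant | undefined,
): RunOutcome {
  const isError = msg?.stopReason === "error";
  const status: WaitStatus = waitStatus === "ok" && isError ? "error" : waitStatus;
  return { status, resultText: terminalText(msg) };
}

// Announce wording: failures are reported as failures, never as an
// "ok (no output)" completion.
function announceLine(outcome: RunOutcome): string {
  if (outcome.status === "ok") {
    return `completed: ${outcome.resultText ?? "(no output)"}`;
  }
  return `failed (${outcome.status}): ${outcome.resultText ?? "no details"}`;
}
```

In this sketch, agent.wait itself is left alone. Only the interpretation of its "ok" result is corrected, using the child session's terminal message.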
Targeted validation passed:
- pnpm -C /home/openclaw/.openclaw/workspace/external/openclaw-upstream test -- --run src/agents/tools/sessions-helpers.terminal-text.test.ts src/agents/subagent-registry.persistence.test.ts src/gateway/server-methods/server-methods.test.ts
- result: 50 tests passed across 3 files
Follow-up still needed: rerun a real delegated subagent using a known-working model entitlement (gpt-5.4 preferred for now) to verify successful runs leave a useful frozen result and failed runs now persist as error.
Will also explicitly requested that zap keep a light eye on active subagents and check whether they look stuck instead of assuming they are fine until completion.
