swarm-zap/memory/2026-03-13.md

2026-03-13

Subagent reliability investigation

  • A fresh implementation subagent launched for the subagent/ACP reliability work failed immediately, before doing any task work.
  • Failure mode: the delegated run was spawned with model glm-5, which resolved to the provider model zai/glm-5.
  • Auth profile keys currently installed in the agent stores include openai-codex:default, litellm:default, and github-copilot:github.
  • Will clarified on 2026-03-13 that Z.AI auth does exist in the environment, but the account is not entitled to glm-5.
  • The installed profile keys were verified by inspecting the agent auth profile files under:
    • /home/openclaw/.openclaw/agents/*/agent/auth-profiles.json
  • Relevant OpenClaw docs confirm:
    • subagent spawns inherit caller model when sessions_spawn.model is omitted
    • provider/model auth errors like No API key found for provider "zai" occur when a provider model is selected without matching auth
    • multi-agent auth is per-agent via ~/.openclaw/agents/<agentId>/agent/auth-profiles.json
  • Conclusion: the immediate failure was caused by an incorrect explicit model selection in the spawn request, not by missing auth propagation between agents.
  • Corrective action: retry fresh delegation with litellm/glm-5 (the intended medium-tier routed model for delegated implementation work in this setup).
  • Will explicitly requested on 2026-03-13 that subagents use gpt-5.4 for now while delegation reliability is being debugged.
  • New evidence from the corrected run: ~/.openclaw/agents/main/sessions/1615a980-cf92-4d5e-845a-a2abe77c0418.jsonl shows repeated assistant stopReason:"error" entries with 429 ... GLM-5 not included in current subscription plan, yet ~/.openclaw/subagents/runs.json recorded run 776a8b51-6fdc-448e-83bc-55418814a05b with outcome.status: "ok" and frozenResultText: null.
  • This separates the ACP/runtime selection problems from a generic subagent completion/reporting bug: a terminal assistant error can still be persisted and announced as a success with no useful result.
  • Implemented an upstream fix on branch external/openclaw-upstream@fix/subagent-wait-error-outcome:
    • added assistant terminal-outcome helper so empty-content assistant errors still yield usable terminal text
    • subagent registry now downgrades agent.wait => ok to error when the child session's terminal assistant message is actually an error
    • subagent announce flow now reports terminal assistant errors as failed outcomes instead of successful (no output) completions
  • Targeted validation passed:
    • pnpm -C /home/openclaw/.openclaw/workspace/external/openclaw-upstream test -- --run src/agents/tools/sessions-helpers.terminal-text.test.ts src/agents/subagent-registry.persistence.test.ts src/gateway/server-methods/server-methods.test.ts
    • result: 50 tests passed across 3 files
  • Follow-up still needed: rerun a real delegated subagent on a model with a known-working entitlement (gpt-5.4 preferred for now) to verify that successful runs leave a useful frozen result and that failed runs now persist as error.
  • Will also explicitly requested that zap keep a light eye on active subagents and check whether they look stuck instead of assuming they are fine until completion.
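
The outcome downgrade noted above (treating agent.wait => ok as error when the child session's terminal assistant message is an error) can be sketched roughly as follows. This is a Python illustration only, not the upstream TypeScript change. The transcript entry shape (a message object with role, content, and stopReason), the errorMessage field, and all function names are assumptions made for the example; only stopReason:"error" comes from the session log recorded in these notes.

```python
import json
from typing import Iterable, Optional


def last_assistant_message(jsonl_lines: Iterable[str]) -> Optional[dict]:
    """Return the last assistant message in a session transcript, if any."""
    last = None
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a partially written trailing line
        if not isinstance(entry, dict):
            continue
        # Assumed shape: the message is nested under "message",
        # or the entry itself is the message.
        msg = entry.get("message", entry)
        if isinstance(msg, dict) and msg.get("role") == "assistant":
            last = msg
    return last


def message_text(msg: dict) -> str:
    """Concatenate text from a message whose content is a string or a list of parts."""
    content = msg.get("content")
    if isinstance(content, str):
        return content.strip()
    if isinstance(content, list):
        parts = [
            part["text"]
            for part in content
            if isinstance(part, dict) and isinstance(part.get("text"), str)
        ]
        return "".join(parts).strip()
    return ""


def resolve_outcome(wait_status: str, jsonl_lines: Iterable[str]) -> dict:
    """Downgrade a reported status to "error" when the terminal assistant message failed."""
    msg = last_assistant_message(jsonl_lines)
    if msg is not None and msg.get("stopReason") == "error":
        # Empty-content errors still need usable terminal text.
        # "errorMessage" is a hypothetical field name.
        text = (
            message_text(msg)
            or msg.get("errorMessage")
            or "assistant ended with an error and produced no output"
        )
        return {"status": "error", "text": str(text)}
    text = message_text(msg) if msg is not None else ""
    return {"status": wait_status, "text": text or None}
```

To use it on a real run, one would open the child session's .jsonl file and pass the file object together with the status reported by the wait call. The result would then be compared against what runs.json recorded for the same run.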