# WIP.subagent-reliability.md
## Status
Status: `open`

Owner: `zap`

Opened: `2026-03-13`
## Purpose
Investigate and improve subagent / ACP delegation reliability, including timeout behavior, runtime failures, and delayed/duplicate completion-event noise.
## Why now
This is the highest-leverage remaining open reliability item because it affects trust in delegation and the usability of fresh implementation runs.
## Related tasks
- `task-20260304-2215-subagent-reliability`: in progress
- `task-20260304-211216-acp-claude-codex`: open
## Known context
- Prior work already patched TUI formatting to suppress internal runtime completion context blocks.
- An upstream patch exists in `external/openclaw-upstream` on branch `fix/tui-hide-internal-runtime-context` (commit `0f66a4547`).
- The user explicitly wants subagent tooling reliability fixed and completion-event spam prevented.
- Fresh-session implementation discipline and monitoring thresholds were already documented locally.
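For the duplicate and delayed completion-event noise, one common technique is to make announcement idempotent per run. The following is a minimal sketch, assuming each completion event carries a stable run ID. `CompletionEvent` and `CompletionAnnouncer` are hypothetical names, and the code is not taken from the OpenClaw codebase.

```typescript
// Hypothetical shape of a subagent completion event.
interface CompletionEvent {
  runId: string;
  status: "ok" | "error";
}

// Forwards each run's completion at most once; later deliveries for a run
// that was already announced (duplicates or delayed retries) are dropped.
class CompletionAnnouncer {
  private readonly announced = new Set<string>();
  private readonly announce: (event: CompletionEvent) => void;

  constructor(announce: (event: CompletionEvent) => void) {
    this.announce = announce;
  }

  // Returns true if the event was forwarded, false if it was suppressed.
  handle(event: CompletionEvent): boolean {
    if (this.announced.has(event.runId)) return false;
    this.announced.add(event.runId);
    this.announce(event);
    return true;
  }
}
```

The in-memory `Set` grows with every run and is lost on restart. A long-lived gateway would need to bound it or persist it alongside the run registry.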
## Goals for this pass
1. Establish the current failure modes with concrete evidence.
2. Separate ACP-specific failures from generic subagent/session issues.
3. Determine what is already fixed versus still broken.
4. Produce a concrete recommendation and, if feasible in one pass, implement the highest-confidence fix.
5. Update task/memory state with evidence before ending.
## Suggested investigation plan
1. Review current OpenClaw docs and local memory around subagent/ACP failures.
2. Reproduce or inspect recent failures using session/task evidence instead of guessing.
3. Check current runtime status, relevant logs, and known local patches.
4. If the issue is in OpenClaw core, work in `external/openclaw-upstream/` on a focused branch.
5. Validate with the smallest reliable reproduction possible.
## Evidence gathered so far
- A fresh subagent run failed immediately, before any useful task execution, when an explicit `glm-5` model choice resolved to the Z.AI provider path.
- The auth profile keys currently present in the installed agents' stores include `openai-codex:default`, `litellm:default`, and `github-copilot:github`.
- Will clarified that Z.AI auth does exist, but this account is not entitled to `glm-5`.
- The root cause of this immediate repro is therefore best described as a provider/model entitlement mismatch caused by the explicit spawn model choice, not as missing auth propagation between agents.
- A later "corrected" run using `litellm/glm-5` also failed. The child transcript `~/.openclaw/agents/main/sessions/1615a980-cf92-4d5e-845a-a2abe77c0418.jsonl` contains repeated assistant `stopReason:"error"` entries with `429 ... subscription plan does not yet include access to GLM-5`, while `~/.openclaw/subagents/runs.json` recorded that run (`776a8b51-6fdc-448e-83bc-55418814a05b`) as `outcome.status: "ok"` with `frozenResultText: null`.
- This separates the problems:
  - ACP/operator/model-selection issue: an explicit `glm-5` resolves to `zai/glm-5`, which this account is not entitled to use (already understood).
  - Generic subagent completion/reporting issue: terminal assistant errors can still be stored and announced as a successful completion with no frozen result.
- Implemented an upstream patch on branch `fix/subagent-wait-error-outcome` in `external/openclaw-upstream` so that subagent completion paths inspect the latest terminal assistant message and record terminal assistant errors as `outcome.status: "error"` rather than `ok`.
- Validation completed for targeted non-E2E coverage:
  - `pnpm -C external/openclaw-upstream test -- --run src/agents/tools/sessions-helpers.terminal-text.test.ts src/agents/subagent-registry.persistence.test.ts src/gateway/server-methods/server-methods.test.ts`
  - result: passed (`50 tests` across `3` files).
- The E2E-style `subagent-announce.format.e2e.test.ts` coverage was updated, but the default Vitest include rules exclude `*.e2e.test.ts`. Running `pnpm test -- --run ...e2e...` directly only confirms the exclusion; it does not execute that file.
- Next step after this patch: rerun a real subagent with a known-working model (`gpt-5.4` or another model this account is actually entitled to) and confirm that `runs.json` stores `error` on a terminal assistant failure and a useful frozen result on success.
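The entitlement-versus-auth distinction drawn in the evidence is easy to lose when a failure surfaces only as an HTTP status. Below is a minimal TypeScript sketch of a triage helper, assuming the child run's error text is available as a plain string. `classifyProviderError`, `FailureKind`, and every pattern except the GLM-5 plan message are hypothetical and not part of OpenClaw.

```typescript
// Hypothetical categories for a child run's terminal provider error.
type FailureKind = "entitlement" | "auth" | "rate_limit" | "other";

// Illustrative patterns only. The first is taken from the GLM-5 error seen in
// the child transcript; the others are assumptions that should be checked
// against real provider responses before being relied on.
const ENTITLEMENT_PATTERNS: RegExp[] = [
  /plan does not (yet )?include access/i,
  /not entitled/i,
];
const AUTH_PATTERNS: RegExp[] = [/\b401\b/, /invalid api key/i, /unauthorized/i];

function classifyProviderError(message: string): FailureKind {
  // Entitlement is checked first because the observed GLM-5 failure arrives
  // as HTTP 429, which would otherwise look like ordinary rate limiting.
  if (ENTITLEMENT_PATTERNS.some((p) => p.test(message))) return "entitlement";
  if (AUTH_PATTERNS.some((p) => p.test(message))) return "auth";
  if (/\b429\b/.test(message) || /rate limit/i.test(message)) return "rate_limit";
  return "other";
}
```

Ordering matters here. A classifier keyed only on status codes would file the GLM-5 rejection under rate limiting and hide the entitlement mismatch.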
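The property the rerun should confirm can also be checked offline by comparing a run record against its child transcript. This is a sketch under stated assumptions, not the upstream patch. The flat `TranscriptEntry` shape (`role`, `stopReason`) and the `RunRecord` fields reuse the names quoted in the evidence, but the real transcript JSONL and `runs.json` layouts may nest them differently. `lastAssistantEntry`, `expectedStatus`, and `auditRun` are hypothetical helpers.

```typescript
// Assumed, simplified shape of one transcript line. The real OpenClaw JSONL
// may nest these fields (for example under a message object).
interface TranscriptEntry {
  role?: string;
  stopReason?: string;
}

// Assumed subset of a runs.json record, using the field names noted above.
interface RunRecord {
  runId: string;
  outcome: { status: "ok" | "error" };
  frozenResultText: string | null;
}

// Returns the last assistant entry in a JSONL transcript, skipping blank,
// unparseable, and non-object lines.
function lastAssistantEntry(jsonl: string): TranscriptEntry | undefined {
  let last: TranscriptEntry | undefined;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: TranscriptEntry | null = null;
    try {
      entry = JSON.parse(line) as TranscriptEntry | null;
    } catch {
      continue;
    }
    if (entry !== null && typeof entry === "object" && entry.role === "assistant") {
      last = entry;
    }
  }
  return last;
}

// A terminal assistant error means the run failed; in this sketch anything
// else (including a transcript with no assistant entry) counts as "ok".
function expectedStatus(jsonl: string): "ok" | "error" {
  return lastAssistantEntry(jsonl)?.stopReason === "error" ? "error" : "ok";
}

// Lists violations of the two properties the rerun is meant to confirm.
function auditRun(record: RunRecord, jsonl: string): string[] {
  const problems: string[] = [];
  const expected = expectedStatus(jsonl);
  if (record.outcome.status !== expected) {
    problems.push(
      `${record.runId}: stored "${record.outcome.status}" but transcript implies "${expected}"`,
    );
  }
  if (expected === "ok" && !record.frozenResultText) {
    problems.push(`${record.runId}: reported success without a frozen result`);
  }
  return problems;
}
```

The run recorded as `776a8b51-6fdc-448e-83bc-55418814a05b` is a candidate test case. If its transcript ends in an assistant `stopReason:"error"` entry, as the repeated errors suggest, this audit would flag the stored `ok` as a mismatch.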
## Constraints
- Prefer evidence over theory.
- Do not claim a fix without concrete validation.
- Keep the main session clean; use this file as the canonical baton.
## Success criteria
- Clear diagnosis of the current reliability problem(s).
- At least one of:
  - implemented fix with validation, or
  - sharply scoped next fix plan with exact evidence and files.
- `memory/2026-03-13.md` (or current daily note), `memory/tasks.json`, and this WIP updated.