docs(state): seed subagent reliability investigation
54
WIP.subagent-reliability.md
Normal file
@@ -0,0 +1,54 @@
# WIP.subagent-reliability.md

## Status
Status: `open`
Owner: `zap`
Opened: `2026-03-13`

## Purpose
Investigate and improve subagent / ACP delegation reliability, including timeout behavior, runtime failures, and delayed/duplicate completion-event noise.

## Why now
This is the highest-leverage remaining open reliability item because it affects trust in delegation and the usability of fresh implementation runs.

## Related tasks
- `task-20260304-2215-subagent-reliability`: in progress
- `task-20260304-211216-acp-claude-codex`: open

## Known context
- Prior work already patched TUI formatting to suppress internal runtime completion context blocks.
- An upstream patch exists in `external/openclaw-upstream` on branch `fix/tui-hide-internal-runtime-context`, commit `0f66a4547`.
- The user explicitly wants subagent tooling reliability fixed and completion-event spam prevented.
- Fresh-session implementation discipline and monitoring thresholds were already documented locally.
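
One way to think about suppressing duplicate completion events is a small filter keyed on run identity. The sketch below is illustrative only: the event shape and the `session_id` / `run_id` field names are assumptions, not the actual OpenClaw event schema.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def dedupe_completions(events: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield each completion event once, keeping the first occurrence.

    Assumes (hypothetically) that every event carries 'session_id' and
    'run_id' keys that together identify one subagent run.
    """
    seen: Set[Tuple[str, str]] = set()
    for event in events:
        key = (event["session_id"], event["run_id"])
        if key in seen:
            continue  # delayed or repeated completion for a run already reported
        seen.add(key)
        yield event
```

A filter like this only hides the repeats; it does not explain why they are emitted, so it would complement the investigation rather than replace it.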

## Goals for this pass
1. Establish the current failure modes with concrete evidence.
2. Separate ACP-specific failures from generic subagent/session issues.
3. Determine what is already fixed versus still broken.
4. Produce a concrete recommendation and, if feasible in one pass, implement the highest-confidence fix.
5. Update task/memory state with evidence before ending.
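
The separation called for in goal 2 could be tracked with a simple triage helper. This is a minimal sketch under stated assumptions: the `Failure` fields and the bucket names are hypothetical and would need to be mapped onto whatever the session/task evidence actually records.

```python
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    OPERATOR_CONFIG = "operator/config mistake"
    ACP_CANDIDATE = "possibly ACP-specific (confirm with a non-ACP repro)"
    GENERIC_RUNTIME = "generic subagent/session issue"


@dataclass
class Failure:
    # All fields are hypothetical placeholders for real evidence.
    summary: str
    via_acp: bool            # did the run go through ACP delegation?
    before_task_start: bool  # did it fail before any task execution?
    config_error: bool       # e.g. auth error for an unconfigured provider


def triage(failure: Failure) -> Bucket:
    """Assign a failure to a first-guess bucket; evidence should override it."""
    if failure.before_task_start and failure.config_error:
        return Bucket.OPERATOR_CONFIG
    if failure.via_acp:
        return Bucket.ACP_CANDIDATE
    return Bucket.GENERIC_RUNTIME
```

The ACP bucket is deliberately labeled a candidate: a failure seen through ACP may still be a generic session issue until a non-ACP run rules that out.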

## Suggested investigation plan
1. Review current OpenClaw docs and local memory around subagent/ACP failures.
2. Reproduce or inspect recent failures using session/task evidence instead of guessing.
3. Check current runtime status / relevant logs / known local patches.
4. If the issue is in OpenClaw core, work in `external/openclaw-upstream/` on a focused branch.
5. Validate with the smallest reliable reproduction possible.

## Evidence gathered so far
- A fresh subagent run failed immediately with a provider auth error for `zai`, before any task execution.
- Current installed agent auth profiles include `openai-codex:default`, `litellm:default`, and `github-copilot:github`; there is no `zai` profile configured.
- The root cause of this immediate repro appears to be an incorrect explicit spawn model choice (the `glm-5` alias resolves to `zai/glm-5`), not missing auth propagation between agents.
- Next step, once the model-selection issue is confirmed: prefer `gpt-5.4` for fresh subagent reliability/debug passes for now, per Will's instruction, and keep separating real runtime issues from operator/config mistakes.
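
The model-selection failure above could be caught before spawning with a small preflight check. The sketch below hard-codes the alias and profile names from this note; the `provider:name` profile format, the `provider/model` naming, and all function names are assumptions for illustration, not OpenClaw APIs.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical data mirroring the evidence above; not read from a real config.
MODEL_ALIASES = {"glm-5": "zai/glm-5"}
AUTH_PROFILES = ["openai-codex:default", "litellm:default", "github-copilot:github"]


def resolve_model(name: str) -> str:
    """Expand an alias to its provider/model form; pass other names through."""
    return MODEL_ALIASES.get(name, name)


def provider_of(model: str) -> Optional[str]:
    """Return the provider prefix of 'provider/model', or None if absent."""
    provider, sep, _ = model.partition("/")
    return provider if sep else None


def configured_providers(profiles: List[str]) -> Set[str]:
    """Assume profile ids look like 'provider:name'; keep the provider part."""
    return {profile.split(":", 1)[0] for profile in profiles}


def preflight(model_name: str, profiles: List[str]) -> Tuple[bool, str]:
    """Check that the requested model's provider has an auth profile."""
    resolved = resolve_model(model_name)
    provider = provider_of(resolved)
    if provider is None:
        return True, f"{resolved}: no explicit provider prefix, nothing to check"
    if provider in configured_providers(profiles):
        return True, f"{resolved}: auth profile found for {provider}"
    return False, f"{resolved}: no auth profile for provider {provider!r}"
```

With the data above, `preflight("glm-5", AUTH_PROFILES)` reports a failure for provider `zai`, which matches the observed repro and distinguishes an operator/config mistake from an auth-propagation bug.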

## Constraints
- Prefer evidence over theory.
- Do not claim a fix without concrete validation.
- Keep the main session clean; use this file as the canonical baton.

## Success criteria
- Clear diagnosis of the current reliability problem(s).
- At least one of:
  - implemented fix with validation, or
  - sharply scoped next fix plan with exact evidence and files.
- `memory/2026-03-13.md` (or current daily note), `memory/tasks.json`, and this WIP updated.