docs(wip): tighten subagent reliability baton
This commit is contained in:
@@ -1,15 +1,30 @@
|
|||||||
# WIP.subagent-reliability.md
|
# WIP.subagent-reliability.md
|
||||||
|
|
||||||
## Status
|
## Status
|
||||||
Status: `open`
|
Status: `follow-up`
|
||||||
Owner: `zap`
|
Owner: `zap`
|
||||||
Opened: `2026-03-13`
|
Opened: `2026-03-13`
|
||||||
|
Last updated: `2026-03-13`
|
||||||
|
|
||||||
## Purpose
|
## Purpose
|
||||||
Investigate and improve subagent / ACP delegation reliability, including timeout behavior, runtime failures, and delayed/duplicate completion-event noise.
|
Investigate and improve subagent / ACP delegation reliability, including timeout behavior, runtime failures, and delayed/duplicate completion-event noise.
|
||||||
|
|
||||||
## Why now
|
## Current state
|
||||||
This is the highest-leverage remaining open reliability item because it affects trust in delegation and the usability of fresh implementation runs.
|
- The core reliability thread tracked in this WIP is now **fixed and live-verified** on `external/openclaw-upstream` branch `fix/subagent-wait-error-outcome`.
|
||||||
|
- Verified fixed:
|
||||||
|
- subagent persistence / announcement handling for terminal assistant-provider failures
|
||||||
|
- raw `agent.wait` semantics for the live direct gateway path
|
||||||
|
- Key upstream commits on this branch:
|
||||||
|
- `2a2ed0d6f` — `fix(subagents): derive outcome from terminal assistant errors`
|
||||||
|
- `5a328d22b` — `fix(agent): surface terminal run errors in wait semantics`
|
||||||
|
- `f9a78e8f7` — `fix(gateway): honor terminal assistant errors in live wait path`
|
||||||
|
|
||||||
|
## Why this file is still open
|
||||||
|
- The broader delegation reliability task is not fully done yet.
|
||||||
|
- Remaining follow-up work is now narrower:
|
||||||
|
1. ACP-specific Claude/Codex runtime failures
|
||||||
|
2. optional separate `/subagents log` UX cleanup
|
||||||
|
3. push/PR the focused upstream reliability branch when desired
|
||||||
|
|
||||||
## Related tasks
|
## Related tasks
|
||||||
- `task-20260304-2215-subagent-reliability` — in progress
|
- `task-20260304-2215-subagent-reliability` — in progress
|
||||||
@@ -21,19 +36,10 @@ This is the highest-leverage remaining open reliability item because it affects
|
|||||||
- User explicitly wants subagent tooling reliability fixed and completion-event spam prevented.
|
- User explicitly wants subagent tooling reliability fixed and completion-event spam prevented.
|
||||||
- Fresh-session implementation discipline and monitoring thresholds were already documented locally.
|
- Fresh-session implementation discipline and monitoring thresholds were already documented locally.
|
||||||
|
|
||||||
## Goals for this pass
|
## Immediate baton
|
||||||
1. Establish the current failure modes with concrete evidence.
|
- Do **not** reopen the solved `agent.wait` investigation unless a fresh repro appears.
|
||||||
2. Separate ACP-specific failures from generic subagent/session issues.
|
- If this project is resumed next, start with ACP-specific Claude/Codex runtime failures.
|
||||||
3. Determine what is already fixed versus still broken.
|
- Treat `/subagents log` UX edits as a separate branch/pass so they do not muddy the reliability fix branch.
|
||||||
4. Produce a concrete recommendation and, if feasible in one pass, implement the highest-confidence fix.
|
|
||||||
5. Update task/memory state with evidence before ending.
|
|
||||||
|
|
||||||
## Suggested investigation plan
|
|
||||||
1. Review current OpenClaw docs and local memory around subagent/ACP failures.
|
|
||||||
2. Reproduce or inspect recent failures using session/task evidence instead of guessing.
|
|
||||||
3. Check current runtime status / relevant logs / known local patches.
|
|
||||||
4. If the issue is in OpenClaw core, work in `external/openclaw-upstream/` on a focused branch.
|
|
||||||
5. Validate with the smallest reliable reproduction possible.
|
|
||||||
|
|
||||||
## Evidence gathered so far
|
## Evidence gathered so far
|
||||||
- Fresh subagent run failed immediately when an explicit `glm-5` choice resolved into the Z.AI provider path before any useful task execution.
|
- Fresh subagent run failed immediately when an explicit `glm-5` choice resolved into the Z.AI provider path before any useful task execution.
|
||||||
|
|||||||
Reference in New Issue
Block a user