HANDOFF.md
Purpose
Immediate baton-pass for the next fresh implementation session.
Current objective
Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. Both the subagent persistence/announcement fix and the raw `agent.wait` semantics fix are now live-verified on branch `fix/subagent-wait-error-outcome`, so the next work should stay tightly scoped to the ACP-specific Claude/Codex follow-up. This pass already narrowed that thread to a real parser bug in the bundled acpx for Claude-style JSON-RPC auth failures and landed a focused fix with tests. What remains is end-to-end validation on the OpenClaw ACP path (or a fresh repro of the older exit-code crash notes), plus the usual commit/push/PR cleanup when desired.
Use these state files first
- `WIP.subagent-reliability.md`: canonical state for this pass
- `memory/tasks.json`: task tracking for reliability items
- `memory/2026-03-04-subagent-delegation.md`: earlier delegation context
- `memory/2026-03-13.md` if present; otherwise append today's evidence there
- `external/openclaw-upstream/`: for any core-runtime fix work
Related tasks
- `task-20260304-2215-subagent-reliability`: in progress
- `task-20260304-211216-acp-claude-codex`: open
Known truths
- TUI noise suppression was already patched locally and upstreamed earlier.
- User still wants actual subagent reliability improved, not just UI noise hidden.
- Historical ACP notes included `Claude: acpx exited with code 1` and `Codex: acpx exited with code 5`, but those exact crashes were not reproduced in the latest pass.
- Fresh-session implementation discipline is now the expected approach for non-trivial work.
- One explicit failure mode is already understood: requesting `glm-5` can route into an unavailable GLM-5 provider/entitlement path in this setup.
- A deeper bug was also identified and fixed earlier: a subagent run could finish with terminal assistant errors yet still be recorded as successful with no frozen result.
- Current host state for ACP follow-up:
  - the bundled plugin-local `acpx` exists and runs
  - `~/.openclaw/openclaw.json` currently has no explicit `acp` block or enabled `acpx` plugin entry, so this pass used the smallest direct acpx repro path instead of a full OpenClaw ACP session
- New confirmed acpx/runtime bug from this pass:
  - the Codex direct acpx path works
  - the Claude direct acpx path returns top-level JSON-RPC auth errors (`Authentication required`) yet exits with code `0`
  - `extensions/acpx/src/runtime-internals/events.ts` previously dropped that JSON-RPC error shape during prompt streaming, so OpenClaw could falsely treat the turn as successful
  - a focused upstream fix for that runtime bug now exists on `fix/subagent-wait-error-outcome`, with targeted tests passing
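The runtime bug above comes down to one mapping rule on the prompt stream. A minimal sketch in TypeScript, not the code in `extensions/acpx/src/runtime-internals/events.ts` (which is not reproduced in this note): `mapPromptStreamLine` and the `AcpRuntimeEvent` union are hypothetical names, and it assumes the stream carries one JSON-RPC 2.0 message per line. It shows a top-level `error` member becoming a `type:"error"` event instead of being skipped, so a process exit code of `0` can no longer hide the failure.

```typescript
// Sketch only: AcpRuntimeEvent and mapPromptStreamLine are hypothetical names,
// not the real events.ts API. Assumes one JSON-RPC 2.0 message per line.

type JsonRpcError = { code: number; message: string; data?: unknown };

// Assumed subset of runtime events, just enough for this illustration.
type AcpRuntimeEvent =
  | { type: "message"; method: string; params: unknown }
  | { type: "error"; message: string; code?: number };

function mapPromptStreamLine(line: string): AcpRuntimeEvent | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return null; // not JSON: ignored in this sketch
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const msg = parsed as Record<string, unknown>;

  // Requests and notifications carry a method name; pass them through.
  if (typeof msg.method === "string") {
    return { type: "message", method: msg.method, params: msg.params };
  }

  // Top-level error response, e.g. {"jsonrpc":"2.0","id":3,"error":{...}}.
  // Per this note, the old code dropped this shape; surface it as an error event.
  const err = msg.error;
  if (typeof err === "object" && err !== null) {
    const e = err as Partial<JsonRpcError>;
    return {
      type: "error",
      message: typeof e.message === "string" ? e.message : "JSON-RPC error",
      code: typeof e.code === "number" ? e.code : undefined,
    };
  }

  // Successful results are not turned into events in this sketch.
  return null;
}

// Usage: -32000 is a placeholder code, not the one Claude actually returns.
const ev = mapPromptStreamLine(
  '{"jsonrpc":"2.0","id":3,"error":{"code":-32000,"message":"Authentication required"}}',
);
console.log(ev?.type); // prints: error
```

Returning `null` for results and unparseable lines keeps the sketch focused on the one shape that was previously lost; the real runtime presumably handles those cases in its own way.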
Highest-priority next actions
- Treat the generic reliability fixes as live-verified on this branch:
  - subagent persistence/announcement proof:
    - run id `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
    - child transcript `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
    - `runs.json` stores `outcome.status: "error"`, `endedReason: "subagent-error"`, and a non-null `frozenResultText`
  - raw `agent.wait` live-fix proof:
    - gateway launch: `OPENCLAW_SKIP_CHANNELS=1 CLAWDBOT_SKIP_CHANNELS=1 pnpm exec tsx src/index.ts gateway run --port 18903 --bind loopback --auth none --allow-unconfigured`
    - run id: `gwc-live-agent-wait-gpt53-source-fixed2-1773429512008`
    - final `agent` response: `finalStatus:"error"`
    - `agent.wait`: `{"runId":"gwc-live-agent-wait-gpt53-source-fixed2-1773429512008","status":"error","endedAt":1773429514106,"error":"LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again."}`
- Treat the ACP follow-up as partially closed, not fully done:
  - the live direct bundled-acpx Codex repro now works and returns `OK`
  - the live direct bundled-acpx Claude repro returns JSON-RPC auth errors with process exit code `0`
  - the focused upstream fix now maps top-level JSON-RPC prompt errors into ACP runtime `type:"error"` events instead of silently dropping them
  - targeted validation passed:
    - `cd external/openclaw-upstream && pnpm exec vitest run extensions/acpx/src/runtime-internals/events.test.ts extensions/acpx/src/runtime.test.ts extensions/acpx/src/runtime-internals/control-errors.test.ts`
    - result: 22 tests passed across 3 files
- Next, do end-to-end OpenClaw ACP validation if/when ACP is explicitly enabled here:
  - confirm or add the needed `acp`/`acpx` config in `~/.openclaw/openclaw.json` (or the equivalent current config path)
  - run the smallest real OpenClaw ACP turn/session and confirm that Claude auth failures now surface as terminal errors instead of false success
  - reopen the old `acpx exited with code 1/5` thread only if a fresh repro appears
- Commit/push/PR the focused upstream reliability branch when ready.
- Leave the dirty `/subagents log` UX diff out of this branch unless you intentionally spin up a separate focused pass; it passes the `src/auto-reply/reply/commands.test.ts` regression tests but still lacks dedicated feature coverage.
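The pass/fail criteria recorded in the proofs above can be written down as a small check to reuse during the end-to-end ACP validation. A minimal sketch in TypeScript, assuming the payloads are already loaded as JSON: `checkWaitResult` and `checkRunRecord` are hypothetical helpers, and `WaitResult` / `RunRecord` are assumed subsets holding only the fields named in this note, not the full OpenClaw schemas.

```typescript
// Sketch only: helper names are hypothetical, and both types are assumed
// subsets holding just the fields this handoff records.

type WaitResult = {
  runId: string;
  status: string; // "error" in the live proof; other values not documented here
  endedAt?: number;
  error?: string;
};

type RunRecord = {
  outcome?: { status?: string };
  endedReason?: string;
  frozenResultText?: string | null;
};

// Raw agent.wait must report a terminal error with a message, not success.
function checkWaitResult(wait: WaitResult): string[] {
  const problems: string[] = [];
  if (wait.status !== "error") {
    problems.push(`status is "${wait.status}", expected "error"`);
  }
  if (!wait.error) problems.push("missing error message");
  return problems;
}

// A runs.json entry must record the failure and keep a frozen result.
function checkRunRecord(record: RunRecord): string[] {
  const problems: string[] = [];
  if (record.outcome?.status !== "error") problems.push('outcome.status is not "error"');
  if (record.endedReason !== "subagent-error") problems.push('endedReason is not "subagent-error"');
  if (record.frozenResultText == null) problems.push("frozenResultText is null or missing");
  return problems;
}

// Usage with the agent.wait payload recorded in this note.
const recorded: WaitResult = {
  runId: "gwc-live-agent-wait-gpt53-source-fixed2-1773429512008",
  status: "error",
  endedAt: 1773429514106,
  error:
    "LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again.",
};
console.log(checkWaitResult(recorded)); // []
```

Returning a list of problems rather than a boolean keeps the failure reason visible when the check runs against a new ACP turn. For a Claude auth failure on the ACP path the expected `endedReason` may differ from `subagent-error`; this note does not record it.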
Success criteria
- Real-run verification of the new error/outcome fix. ✅ done for subagent persistence/announcement handling.
- Clear separation between resolved reporting bug(s) and any still-open ACP/runtime failures.
- Explicit decision on whether raw `agent.wait` behavior is acceptable or requires a follow-up fix.
- State files updated with paths, commands, and outcomes.