From d08d8fe661793d4344d9347a636255b207102e1a Mon Sep 17 00:00:00 2001
From: zap <zap@local>
Date: Fri, 13 Mar 2026 20:10:40 +0000
Subject: [PATCH] docs(reliability): record acpx follow-up evidence

---
 HANDOFF.md                  | 33 ++++++++++++++++++++++++--------
 WIP.subagent-reliability.md | 38 +++++++++++++++++++++++++++++++++++--
 memory/2026-03-13.md        | 37 ++++++++++++++++++++++++++++++++++++
 memory/tasks.json           |  9 ++++++---
 4 files changed, 104 insertions(+), 13 deletions(-)

diff --git a/HANDOFF.md b/HANDOFF.md
index 8c126ac..5a97575 100644
--- a/HANDOFF.md
+++ b/HANDOFF.md
@@ -4,7 +4,7 @@
 Immediate baton-pass for the next fresh implementation session.
 
 ## Current objective
-Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. The subagent persistence/announcement fix and the raw `agent.wait` semantics fix are now both live-verified on branch `fix/subagent-wait-error-outcome`; the next work should shift to follow-up cleanup (commit/push/PR when desired, ACP-specific runtime failures, and any unrelated `/subagents log` UX work only in a separate focused pass).
+Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. The subagent persistence/announcement fix and the raw `agent.wait` semantics fix are now both live-verified on branch `fix/subagent-wait-error-outcome`; the next work should stay tightly scoped to ACP-specific Claude/Codex follow-up. This pass already narrowed that thread to a real bundled-acpx parser bug for Claude-style JSON-RPC auth failures and landed a focused fix/tests. The remaining work is end-to-end OpenClaw ACP-path validation (or a fresh repro of the older exit-code crash notes) plus normal commit/push/PR cleanup when desired.
 
 ## Use these state files first
 1. `WIP.subagent-reliability.md` — canonical state for this pass
@@ -20,14 +20,21 @@ Investigate and improve subagent / ACP delegation reliability with evidence-firs
 ## Known truths
 - TUI noise suppression was already patched locally and upstreamed earlier.
 - User still wants actual subagent reliability improved, not just UI noise hidden.
-- Prior ACP failures included Claude/Codex runtime exits.
+- Historical ACP notes included `Claude: acpx exited with code 1` and `Codex: acpx exited with code 5`, but those exact crashes were **not** reproduced in the latest pass.
 - Fresh-session implementation discipline is now the expected approach for non-trivial work.
 - One explicit failure mode is already understood: requesting `glm-5` can route into an unavailable GLM-5 provider/entitlement path in this setup.
-- A deeper bug was also identified: a subagent run could finish with terminal assistant errors yet still be recorded as successful with no frozen result.
-- An upstream patch for that error/outcome handling now exists in `external/openclaw-upstream` on branch `fix/subagent-wait-error-outcome` with targeted tests passing.
+- A deeper bug was also identified and fixed earlier: a subagent run could finish with terminal assistant errors yet still be recorded as successful with no frozen result.
+- Current host state for ACP follow-up:
+  - bundled plugin-local `acpx` exists and runs
+  - `~/.openclaw/openclaw.json` currently has no explicit `acp` block / enabled `acpx` plugin entry, so this pass used the smallest direct acpx repro path instead of a full OpenClaw ACP session
+- New confirmed acpx/runtime bug from this pass:
+  - Codex direct acpx path works
+  - Claude direct acpx path returns top-level JSON-RPC auth errors (`Authentication required`) and exits `0`
+  - `extensions/acpx/src/runtime-internals/events.ts` previously dropped that JSON-RPC error shape during prompt streaming, so OpenClaw could falsely treat the turn as successful
+- A focused upstream fix for that runtime bug now exists on `fix/subagent-wait-error-outcome` with targeted tests passing.
 
 ## Highest-priority next actions
-1. Treat the reliability fixes as live-verified on this branch:
+1. Treat the generic reliability fixes as live-verified on this branch:
    - subagent persistence/announcement proof:
      - run id `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
      - child transcript `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
@@ -37,9 +44,19 @@ Investigate and improve subagent / ACP delegation reliability with evidence-firs
      - run id: `gwc-live-agent-wait-gpt53-source-fixed2-1773429512008`
      - final `agent` response: `finalStatus:"error"`
      - `agent.wait`: `{"runId":"gwc-live-agent-wait-gpt53-source-fixed2-1773429512008","status":"error","endedAt":1773429514106,"error":"LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again."}`
-2. Commit/push/PR the focused upstream reliability branch when ready.
-3. Re-check whether ACP-specific Claude/Codex runtime failures are still reproducible now that the generic subagent outcome/agent.wait bugs are separated and fixed.
-4. Leave the dirty `/subagents log` UX diff out of this branch unless you intentionally spin a separate focused pass; it regression-passed `src/auto-reply/reply/commands.test.ts` but still lacks dedicated feature coverage.
+2. Treat the ACP follow-up as partially closed, not fully done:
+   - live direct bundled-acpx Codex repro now works and returns `OK`
+   - live direct bundled-acpx Claude repro returns JSON-RPC auth errors with process exit `0`
+   - focused upstream fix now maps top-level JSON-RPC prompt errors into ACP runtime `type:"error"` events instead of silently dropping them
+   - targeted validation passed:
+     - `cd external/openclaw-upstream && pnpm exec vitest run extensions/acpx/src/runtime-internals/events.test.ts extensions/acpx/src/runtime.test.ts extensions/acpx/src/runtime-internals/control-errors.test.ts`
+     - result: `22` tests passed across `3` files
+3. Next, do end-to-end OpenClaw ACP validation if/when ACP is explicitly enabled here:
+   - confirm or add the needed `acp` / `acpx` config in `~/.openclaw/openclaw.json` (or equivalent current config path)
+   - run the smallest real OpenClaw ACP turn/session and confirm Claude auth failures now surface as terminal errors instead of false success
+   - only reopen the old `acpx exited with code 1/5` thread if a fresh repro appears
+4. Commit/push/PR the focused upstream reliability branch when ready.
+5. Leave the dirty `/subagents log` UX diff out of this branch unless you intentionally spin a separate focused pass; it regression-passed `src/auto-reply/reply/commands.test.ts` but still lacks dedicated feature coverage.
 
 ## Success criteria
 - Real-run verification of the new error/outcome fix. ✅ done for subagent persistence/announcement handling.
diff --git a/WIP.subagent-reliability.md b/WIP.subagent-reliability.md
index b65efea..cfe68ea 100644
--- a/WIP.subagent-reliability.md
+++ b/WIP.subagent-reliability.md
@@ -22,7 +22,7 @@ Investigate and improve subagent / ACP delegation reliability, including timeout
 ## Why this file is still open
 - The broader delegation reliability task is not fully done yet.
 - Remaining follow-up work is now narrower:
-  1. ACP-specific Claude/Codex runtime failures
+  1. ACP-specific Claude/Codex runtime failures / final live OpenClaw ACP validation
   2. optional separate `/subagents log` UX cleanup
   3. push/PR the focused upstream reliability branch when desired
 
@@ -38,7 +38,8 @@ Investigate and improve subagent / ACP delegation reliability, including timeout
 
 ## Immediate baton
 - Do **not** reopen the solved `agent.wait` investigation unless a fresh repro appears.
-- If this project is resumed next, start with ACP-specific Claude/Codex runtime failures.
+- If this project is resumed next, start with **real OpenClaw ACP-path validation** of the new acpx JSON-RPC error handling (or capture a fresh Claude/Codex end-to-end repro if ACP still is not configured here).
+- Treat the historical `acpx exited with code 1/5` note as unresolved-but-unreproduced; do not spend more time on it without fresh evidence.
 - Treat `/subagents log` UX edits as a separate branch/pass so they do not muddy the reliability fix branch.
 
 ## Evidence gathered so far
@@ -141,6 +142,39 @@ Investigate and improve subagent / ACP delegation reliability, including timeout
   - subagent persistence/announcement fix: live-verified ✅
   - raw `agent.wait` semantics fix: live-verified ✅
 - Side assessment on unrelated dirty upstream work: the `/subagents log` UX diff in `src/auto-reply/reply/commands-subagents/action-log.ts` + `shared.ts` is logically coherent and passed `pnpm test -- --run src/auto-reply/reply/commands.test.ts` (`44 tests`), but it is still out-of-scope for this focused reliability pass because there is no dedicated coverage for the new tool-only log behavior and it would muddy the focused branch.
+- ACP follow-up pass on 2026-03-13 found a **new live-reproducible runtime bug** in the bundled `extensions/acpx` layer:
+  - current host state does **not** expose a global `acpx` binary on PATH, but the bundled plugin-local runtime exists and works at `~/.local/share/pnpm/.../openclaw/extensions/acpx/node_modules/.bin/acpx`
+  - current `~/.openclaw/openclaw.json` does not contain an explicit `acp` block or enabled `acpx` plugin entry, so this pass used the smallest direct runtime repro path instead of a full `sessions_spawn(runtime:"acp")` OpenClaw run
+  - live direct Codex repro now succeeds:
+    - command: bundled `acpx --format json --json-strict --timeout 15 codex exec 'reply with OK only'`
+    - result: clean JSON-RPC/session stream ending with `agent_message_chunk: "OK"`, `id:2 result:{stopReason:"end_turn"}`, process `exit=0`
+  - live direct Claude repro does **not** crash, but returns top-level JSON-RPC auth errors and still exits 0:
+    - command: bundled `acpx --format json --json-strict --timeout 20 claude exec 'reply with OK only'`
+    - stdout included:
+      - `{"jsonrpc":"2.0","id":2,"error":{"code":-32000,"message":"Authentication required"}}`
+      - `{"jsonrpc":"2.0","id":null,"error":{"code":-32000,"message":"Authentication required"}}`
+    - process `exit=0`
+  - source inspection showed `extensions/acpx/src/runtime-internals/events.ts` ignored that top-level JSON-RPC error shape during prompt streaming, so `runtime.runTurn()` could silently treat Claude auth failure as success (`done`) when no typed `error` event or non-zero exit was emitted
+- Implemented the smallest focused upstream runtime fix on branch `fix/subagent-wait-error-outcome`:
+  - `extensions/acpx/src/runtime-internals/events.ts`
+    - `toAcpxErrorEvent()` now recognizes top-level JSON-RPC `error` responses via `parseControlJsonError()`
+    - `parsePromptEventLine()` now maps those JSON-RPC errors into ACP runtime `type:"error"` events instead of dropping them
+  - regression coverage added:
+    - `extensions/acpx/src/runtime-internals/events.test.ts` — top-level JSON-RPC prompt error parsing
+    - `extensions/acpx/src/runtime-internals/test-fixtures.ts` — mock prompt path for clean-exit JSON-RPC auth error
+    - `extensions/acpx/src/runtime.test.ts` — `runTurn()` emits error and does **not** emit `done` for the Claude-style auth failure shape
+- Targeted validation for the ACP follow-up fix passed:
+  - `cd external/openclaw-upstream && pnpm exec vitest run extensions/acpx/src/runtime-internals/events.test.ts extensions/acpx/src/runtime.test.ts extensions/acpx/src/runtime-internals/control-errors.test.ts`
+  - result: `3` files passed, `22` tests passed
+- Current interpretation of the old Claude/Codex ACP bug after this pass:
+  - historical notes still say `Claude: acpx exited with code 1`, `Codex: acpx exited with code 5`
+  - those exact exit-code crashes were **not** reproduced today
+  - current live state is narrower and better understood:
+    - Codex ACP path works directly
+    - Claude ACP path currently fails for auth, and OpenClaw previously mishandled that failure shape in the acpx runtime layer
+- Remaining open ACP follow-up after this fix:
+  - validate the patched runtime through the real OpenClaw ACP path (`sessions_spawn(runtime:"acp")`) once ACP is explicitly enabled/configured here, or whenever a fresh end-to-end repro is available
+  - only reopen the historical `acpx exited with code 1/5` line if a fresh repro appears
 
 ## Constraints
 - Prefer evidence over theory.
diff --git a/memory/2026-03-13.md b/memory/2026-03-13.md
index 730b393..56dd951 100644
--- a/memory/2026-03-13.md
+++ b/memory/2026-03-13.md
@@ -88,6 +88,43 @@
   - subagent persistence/announcement fix: live-verified ✅
   - raw `agent.wait` semantics fix: live-verified ✅
 - Side note: unrelated dirty `/subagents log` UX changes in `external/openclaw-upstream` regression-passed `src/auto-reply/reply/commands.test.ts` (44 tests) but were intentionally left out-of-scope for this focused reliability pass.
+
+## ACP Claude/Codex follow-up (post-`agent.wait` fix)
+- Historical deferred task `task-20260304-211216-acp-claude-codex` still referenced old failures `Claude: acpx exited with code 1` and `Codex: acpx exited with code 5`, but those exact crashes were **not** reproduced in the latest focused pass.
+- Current host state check:
+  - `claude` installed: `/home/linuxbrew/.linuxbrew/bin/claude` (`2.1.63`)
+  - `codex` installed: `/home/linuxbrew/.linuxbrew/bin/codex` (`0.107.0`)
+  - no global `acpx` on PATH, but bundled plugin-local runtime exists at `~/.local/share/pnpm/.../openclaw/extensions/acpx/node_modules/.bin/acpx`
+  - current `~/.openclaw/openclaw.json` only showed `plugins.entries.telegram.enabled=true`; no explicit `acp` block / `acpx` plugin entry was present, so the smallest reliable repro path used the bundled `acpx` directly rather than a full OpenClaw ACP session
+- Live direct bundled-acpx repro results:
+  - Codex command:
+    - `.../acpx --format json --json-strict --timeout 15 codex exec 'reply with OK only'`
+    - result: clean JSON-RPC/session stream ended with `agent_message_chunk: "OK"`, `id:2 result:{stopReason:"end_turn"}`, process `exit=0`
+  - Claude command:
+    - `.../acpx --format json --json-strict --timeout 20 claude exec 'reply with OK only'`
+    - stdout included top-level JSON-RPC errors:
+      - `{"jsonrpc":"2.0","id":2,"error":{"code":-32000,"message":"Authentication required"}}`
+      - `{"jsonrpc":"2.0","id":null,"error":{"code":-32000,"message":"Authentication required"}}`
+    - process still exited `0`
+- Source-level finding in `external/openclaw-upstream/extensions/acpx/src/runtime-internals/events.ts`:
+  - prompt parsing handled typed `{type:"error"}` lines but dropped top-level JSON-RPC `error` responses
+  - that meant `runtime.runTurn()` could treat a Claude auth failure as success (`done`) when the agent emitted JSON-RPC errors yet exited cleanly
+- Implemented focused upstream fix on branch `fix/subagent-wait-error-outcome`:
+  - `extensions/acpx/src/runtime-internals/events.ts`
+    - `toAcpxErrorEvent()` now also recognizes top-level JSON-RPC `error` responses via `parseControlJsonError()`
+    - `parsePromptEventLine()` now emits ACP runtime `type:"error"` events for that shape instead of dropping it
+  - added regression coverage:
+    - `extensions/acpx/src/runtime-internals/events.test.ts`
+    - `extensions/acpx/src/runtime-internals/test-fixtures.ts`
+    - `extensions/acpx/src/runtime.test.ts`
+- Targeted validation passed:
+  - `cd /home/openclaw/.openclaw/workspace/external/openclaw-upstream && pnpm exec vitest run extensions/acpx/src/runtime-internals/events.test.ts extensions/acpx/src/runtime.test.ts extensions/acpx/src/runtime-internals/control-errors.test.ts`
+  - result: `22` tests passed across `3` files
+- Net status after this pass:
+  - old `acpx exited with code 1/5` reports remain historical evidence only
+  - Codex ACP direct runtime path works today
+  - Claude ACP direct runtime path currently fails for auth, and OpenClaw had a real bug in how the bundled acpx runtime parsed that failure shape
+  - remaining follow-up is end-to-end OpenClaw ACP-path validation once ACP is explicitly configured here (or if a fresh exit-code repro appears)
 - Will also explicitly requested that zap keep a light eye on active subagents and check whether they look stuck instead of assuming they are fine until completion.
 - Will explicitly reinforced on 2026-03-13 that once planning is done, zap should use subagents ASAP and start implementation in a fresh session rather than continuing to implement inside the long-lived main chat.
 - Will explicitly asked on 2026-03-13 for more frequent checks on active subagent runs; zap should inspect/steer sooner instead of waiting for long silent stretches.
diff --git a/memory/tasks.json b/memory/tasks.json
index dbe8eb4..17e9895 100644
--- a/memory/tasks.json
+++ b/memory/tasks.json
@@ -5,11 +5,14 @@
     "title": "Fix ACP runtime failures for Claude Code and Codex agents",
     "owner": "zap",
     "priority": "high",
-    "status": "open",
-    "details": "Both ACP runs failed during this session (Claude: acpx exited with code 1, Codex: acpx exited with code 5). Investigate acpx/ACP runtime failure path and restore reliable delegation for claude/codex agents.",
+    "status": "in-progress",
+    "details": "Historical evidence said Claude/Codex ACP runs failed with `acpx exited with code 1/5`. Latest focused pass narrowed the live issue: direct bundled `acpx` now shows Codex working, while Claude returns top-level JSON-RPC `Authentication required` errors and exits 0. A focused upstream fix now makes the bundled acpx runtime surface those JSON-RPC prompt errors instead of silently treating them as success. Remaining work: validate through the real OpenClaw ACP session path once ACP is explicitly configured here, or capture a fresh repro of the older exit-code crashes.",
     "notes": [
       "Reported by Will on 2026-03-04.",
-      "Added as deferred follow-up while immediate LiteLLM route fix was applied directly."
+      "Added as deferred follow-up while immediate LiteLLM route fix was applied directly.",
+      "2026-03-13 follow-up: exact historical `acpx exited with code 1/5` crashes were not reproduced. Live direct bundled-acpx repros showed Codex success and Claude top-level JSON-RPC auth errors with clean exit 0.",
+      "2026-03-13 follow-up: fixed bundled acpx prompt parsing in external/openclaw-upstream so top-level JSON-RPC error responses now emit ACP runtime error events instead of being dropped. Targeted validation passed: 22 tests across events/control-errors/runtime test files.",
+      "2026-03-13 remaining step: validate the fix through a real OpenClaw ACP session once `acp`/`acpx` is explicitly enabled in local config, or wait for a fresh end-to-end repro of the older exit-code failures."
     ]
   },
   {