docs(reliability): record acpx follow-up evidence

2026-03-13 20:10:40 +00:00
parent 3cfa7a158c
commit d08d8fe661
4 changed files with 104 additions and 13 deletions
--- a/WIP.subagent-reliability.md
+++ b/WIP.subagent-reliability.md
@@ -22,7 +22,7 @@ Investigate and improve subagent / ACP delegation reliability, including timeout
 ## Why this file is still open
 - The broader delegation reliability task is not fully done yet.
 - Remaining follow-up work is now narrower:
-  1. ACP-specific Claude/Codex runtime failures
+  1. ACP-specific Claude/Codex runtime failures / final live OpenClaw ACP validation
  2. optional separate `/subagents log` UX cleanup
  3. push/PR the focused upstream reliability branch when desired

@@ -38,7 +38,8 @@ Investigate and improve subagent / ACP delegation reliability, including timeout

 ## Immediate baton
 - Do **not** reopen the solved `agent.wait` investigation unless a fresh repro appears.
- If this project is resumed next, start with ACP-specific Claude/Codex runtime failures.
+- If this project is resumed next, start with **real OpenClaw ACP-path validation** of the new acpx JSON-RPC error handling (or capture a fresh Claude/Codex end-to-end repro if ACP still is not configured here).
+- Treat the historical `acpx exited with code 1/5` note as unresolved-but-unreproduced; do not spend more time on it without fresh evidence.
 - Treat `/subagents log` UX edits as a separate branch/pass so they do not muddy the reliability fix branch.

 ## Evidence gathered so far
@@ -141,6 +142,39 @@ Investigate and improve subagent / ACP delegation reliability, including timeout
  - subagent persistence/announcement fix: live-verified ✅
  - raw `agent.wait` semantics fix: live-verified ✅
 - Side assessment on unrelated dirty upstream work: the `/subagents log` UX diff in `src/auto-reply/reply/commands-subagents/action-log.ts` + `shared.ts` is logically coherent and passed `pnpm test -- --run src/auto-reply/reply/commands.test.ts` (`44 tests`), but it is still out-of-scope for this focused reliability pass because there is no dedicated coverage for the new tool-only log behavior and it would muddy the focused branch.
+- ACP follow-up pass on 2026-03-13 found a **new live-reproducible runtime bug** in the bundled `extensions/acpx` layer:
+  - current host state does **not** expose a global `acpx` binary on PATH, but the bundled plugin-local runtime exists and works at `~/.local/share/pnpm/.../openclaw/extensions/acpx/node_modules/.bin/acpx`
+  - current `~/.openclaw/openclaw.json` does not contain an explicit `acp` block or enabled `acpx` plugin entry, so this pass used the smallest direct runtime repro path instead of a full `sessions_spawn(runtime:"acp")` OpenClaw run
+  - live direct Codex repro now succeeds:
+    - command: bundled `acpx --format json --json-strict --timeout 15 codex exec 'reply with OK only'`
+    - result: clean JSON-RPC/session stream ending with `agent_message_chunk: "OK"`, `id:2 result:{stopReason:"end_turn"}`, process `exit=0`
+  - live direct Claude repro does **not** crash, but returns top-level JSON-RPC auth errors and still exits 0:
+    - command: bundled `acpx --format json --json-strict --timeout 20 claude exec 'reply with OK only'`
+    - stdout included:
+      - `{"jsonrpc":"2.0","id":2,"error":{"code":-32000,"message":"Authentication required"}}`
+      - `{"jsonrpc":"2.0","id":null,"error":{"code":-32000,"message":"Authentication required"}}`
+    - process `exit=0`
+  - source inspection showed `extensions/acpx/src/runtime-internals/events.ts` ignored that top-level JSON-RPC error shape during prompt streaming, so `runtime.runTurn()` could silently treat Claude auth failure as success (`done`) when no typed `error` event or non-zero exit was emitted
+- Implemented the smallest focused upstream runtime fix on branch `fix/subagent-wait-error-outcome`:
+  - `extensions/acpx/src/runtime-internals/events.ts`
+    - `toAcpxErrorEvent()` now recognizes top-level JSON-RPC `error` responses via `parseControlJsonError()`
+    - `parsePromptEventLine()` now maps those JSON-RPC errors into ACP runtime `type:"error"` events instead of dropping them
+  - regression coverage added:
+    - `extensions/acpx/src/runtime-internals/events.test.ts` — top-level JSON-RPC prompt error parsing
+    - `extensions/acpx/src/runtime-internals/test-fixtures.ts` — mock prompt path for clean-exit JSON-RPC auth error
+    - `extensions/acpx/src/runtime.test.ts` — `runTurn()` emits error and does **not** emit `done` for the Claude-style auth failure shape
+- Targeted validation for the ACP follow-up fix passed:
+  - `cd external/openclaw-upstream && pnpm exec vitest run extensions/acpx/src/runtime-internals/events.test.ts extensions/acpx/src/runtime.test.ts extensions/acpx/src/runtime-internals/control-errors.test.ts`
+  - result: `3` files passed, `22` tests passed
+- Current interpretation of the old Claude/Codex ACP bug after this pass:
+  - historical notes still say `Claude: acpx exited with code 1`, `Codex: acpx exited with code 5`
+  - those exact exit-code crashes were **not** reproduced today
+  - current live state is narrower and better understood:
+    - Codex ACP path works directly
+    - Claude ACP path currently fails for auth, and OpenClaw previously mishandled that failure shape in the acpx runtime layer
+- Remaining open ACP follow-up after this fix:
+  - validate the patched runtime through the real OpenClaw ACP path (`sessions_spawn(runtime:"acp")`) once ACP is explicitly enabled/configured here, or whenever a fresh end-to-end repro is available
+  - only reopen the historical `acpx exited with code 1/5` line if a fresh repro appears

 ## Constraints
 - Prefer evidence over theory.