diff --git a/HANDOFF.md b/HANDOFF.md
index 378b894..8c126ac 100644
--- a/HANDOFF.md
+++ b/HANDOFF.md
@@ -4,7 +4,7 @@ Immediate baton-pass for the next fresh implementation session.
 
 
 ## Current objective
-Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. The failure-path proof for the new subagent outcome handling is captured, but the focused upstream `agent.wait` semantics fix on branch `fix/subagent-wait-error-outcome` did **not** hold in a fresh live source-gateway repro, so the remaining work is a narrower root-cause follow-up on the still-open live `agent.wait => ok` path.
+Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. The subagent persistence/announcement fix and the raw `agent.wait` semantics fix are now both live-verified on branch `fix/subagent-wait-error-outcome`; the next work should shift to follow-up cleanup (commit/push/PR when desired, ACP-specific runtime failures, and any unrelated `/subagents log` UX work only in a separate focused pass).
 
 ## Use these state files first
 1. `WIP.subagent-reliability.md` — canonical state for this pass
@@ -27,25 +27,19 @@ Investigate and improve subagent / ACP delegation reliability with evidence-firs
 - An upstream patch for that error/outcome handling now exists in `external/openclaw-upstream` on branch `fix/subagent-wait-error-outcome` with targeted tests passing.
 
 ## Highest-priority next actions
-1. Treat the live `gpt-5.4` failure repro as proven for subagent persistence/announcement handling:
-   - run id `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
-   - child transcript `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
-   - `runs.json` now stores `outcome.status: "error"`, `endedReason: "subagent-error"`, and a non-null `frozenResultText`
-2. Treat raw gateway `agent.wait` as still **open** despite the current follow-up fix branch.
-   - decisive live source-gateway repro:
-     - gateway launch: `OPENCLAW_SKIP_CHANNELS=1 CLAWDBOT_SKIP_CHANNELS=1 pnpm exec tsx src/index.ts gateway run --port 18902 --bind loopback --auth none --allow-unconfigured`
-     - session key: `agent:main:subagent:agent-wait-gpt53-live-source-1773427981586`
-     - run id: `gwc-live-agent-wait-gpt53-source-1773427981614`
-     - `agent.wait`: `{"runId":"gwc-live-agent-wait-gpt53-source-1773427981614","status":"ok","endedAt":1773427984243}`
-     - last assistant: `provider:"openai-codex" model:"gpt-5.3-codex" stopReason:"error" errorMessage contains context_length_exceeded`
-   - this is the current canonical blocker evidence for the still-open live path
-3. Most likely remaining gap to investigate next:
-   - `src/commands/agent.ts` only applies the new fallback correction when `!lifecycleEnded`
-   - `lifecycleEnded` flips true on any inner lifecycle `phase:"end"` or `phase:"error"`
-   - `src/gateway/server-methods/agent-job.ts` resolves/caches `phase:"end"` as terminal `status:"ok"`
-   - so an inner lifecycle emitter is still the likeliest place where terminal assistant/provider failures are being marked `end` too early on the live direct gateway path
-4. Re-check whether ACP-specific Claude/Codex runtime failures are still reproducible after separating them from the generic subagent outcome bug.
-5. Leave the dirty `/subagents log` UX diff out of this branch unless you intentionally spin a separate focused pass; it regression-passed `src/auto-reply/reply/commands.test.ts` but still lacks dedicated feature coverage.
+1. Treat the reliability fixes as live-verified on this branch:
+   - subagent persistence/announcement proof:
+     - run id `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
+     - child transcript `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
+     - `runs.json` stores `outcome.status: "error"`, `endedReason: "subagent-error"`, and a non-null `frozenResultText`
+   - raw `agent.wait` live-fix proof:
+     - gateway launch: `OPENCLAW_SKIP_CHANNELS=1 CLAWDBOT_SKIP_CHANNELS=1 pnpm exec tsx src/index.ts gateway run --port 18903 --bind loopback --auth none --allow-unconfigured`
+     - run id: `gwc-live-agent-wait-gpt53-source-fixed2-1773429512008`
+     - final `agent` response: `finalStatus:"error"`
+     - `agent.wait`: `{"runId":"gwc-live-agent-wait-gpt53-source-fixed2-1773429512008","status":"error","endedAt":1773429514106,"error":"LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again."}`
+2. Commit/push/PR the focused upstream reliability branch when ready.
+3. Re-check whether ACP-specific Claude/Codex runtime failures are still reproducible now that the generic subagent outcome/agent.wait bugs are separated and fixed.
+4. Leave the dirty `/subagents log` UX diff out of this branch unless you intentionally spin a separate focused pass; it regression-passed `src/auto-reply/reply/commands.test.ts` but still lacks dedicated feature coverage.
 
 ## Success criteria
 - Real-run verification of the new error/outcome fix. ✅ done for subagent persistence/announcement handling.
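For orientation, the status-precedence idea behind the `agent.wait` fix can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the upstream code: `LifecycleEvent`, `ResultMeta`, `WaitResult`, and `deriveTerminalStatus` are hypothetical names, with field names borrowed from the `agent.wait` payloads and transcript fields quoted in these notes. It shows the rule the notes describe: a lifecycle `phase:"end"` on its own no longer yields `status:"ok"` when the resolved result metadata reports `stopReason:"error"`.

```typescript
// Hypothetical shapes: field names follow the payloads quoted in these notes,
// but these are NOT the upstream types from src/gateway/server-methods/.
type LifecyclePhase = "end" | "error";

interface LifecycleEvent {
  runId: string;
  phase: LifecyclePhase;
  error?: string;
}

interface ResultMeta {
  stopReason?: string; // "error" when the final assistant turn failed
  errorMessage?: string;
}

interface WaitResult {
  runId: string;
  status: "ok" | "error";
  endedAt: number;
  error?: string;
}

// Derive one run's terminal status from everything observed for it, instead of
// caching the first lifecycle "end" as "ok". An observed "error" phase wins,
// then an error stopReason in the resolved result metadata; only otherwise "ok".
function deriveTerminalStatus(
  runId: string,
  events: LifecycleEvent[],
  meta: ResultMeta | undefined,
  endedAt: number,
): WaitResult {
  const errorEvent = events.find((e) => e.runId === runId && e.phase === "error");
  if (errorEvent) {
    return { runId, status: "error", endedAt, error: errorEvent.error };
  }
  if (meta?.stopReason === "error") {
    return { runId, status: "error", endedAt, error: meta.errorMessage };
  }
  return { runId, status: "ok", endedAt };
}

// Usage: mirrors the pre-fix failure mode (lifecycle "end" plus an error stopReason).
const result = deriveTerminalStatus(
  "run-1",
  [{ runId: "run-1", phase: "end" }],
  { stopReason: "error", errorMessage: "context_length_exceeded" },
  1773429514106,
);
console.log(result.status); // prints: error
```

Keeping the decision in one pure function makes the precedence easy to unit-test separately from the gateway's event plumbing.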
diff --git a/WIP.subagent-reliability.md b/WIP.subagent-reliability.md
index 57f5e06..05f9672 100644
--- a/WIP.subagent-reliability.md
+++ b/WIP.subagent-reliability.md
@@ -107,13 +107,34 @@ This is the highest-leverage remaining open reliability item because it affects
   - earlier temporary gateway runs reinforced the same mismatch:
     - stale dist gateway repro run `gwc-live-agent-wait-gpt53-1773427893583` also returned `status:"ok"` while transcript stopReason remained `error`
     - temp `gpt-5.4` session repro on the same temp gateway returned `status:"error"`, but only because that runtime reported `FailoverError: Unknown model: openai-codex/gpt-5.4`; that is useful as transport sanity, but **not** the canonical live semantics proof
-- Most likely remaining code-path gap (high-confidence from source inspection):
-  - `src/commands/agent.ts` only applies the new fallback correction when `!lifecycleEnded`
-  - `lifecycleEnded` is set as soon as any inner lifecycle callback reports `phase:"end"` or `phase:"error"`
-  - `src/gateway/server-methods/agent-job.ts` immediately caches/resolves `phase:"end"` as terminal `status:"ok"`
-  - so if an inner embedded lifecycle emitter still reports `phase:"end"` for a run whose final assistant message later has `stopReason:"error"`, `agent.wait` will still resolve `ok` before the dedupe/result-meta rescue path matters
-  - likely next target: identify the inner lifecycle emitter that is still producing `phase:"end"` on this direct gateway path and either convert that event to `phase:"error"` for terminal assistant failures or make `agent.wait` prefer final dedupe/result-meta over earlier lifecycle `end` when both exist for the same run
-- Side assessment on unrelated dirty upstream work: the `/subagents log` UX diff in `src/auto-reply/reply/commands-subagents/action-log.ts` + `shared.ts` is logically coherent and passed `pnpm test -- --run src/auto-reply/reply/commands.test.ts` (`44 tests`), but it is still out-of-scope for this reliability pass because there is no dedicated coverage for the new tool-only log behavior and it would muddy the focused branch.
+- The final focused live-fix pass on 2026-03-13 closed the remaining `agent.wait` bug.
+  - root cause confirmed: the live direct gateway path could receive an inner `agent_end` event carrying a terminal assistant error without a preceding `message_end`, which left stale/empty assistant state and still emitted lifecycle `phase:"end"`
+  - upstream fix extends the embedded subscribe lifecycle handler to recover the terminal assistant from `agent_end.messages` or the session transcript when state is stale, then emit lifecycle `phase:"error"` with a friendly error string instead of `end`
+  - upstream fix also updates the direct gateway `agent` RPC handler to observe lifecycle events for the run and derive the final RPC payload/terminal status from observed lifecycle + resolved result metadata, instead of blindly caching `status:"ok"` when the outer RPC resolves
+  - files changed for the final fix:
+    - `src/agents/pi-embedded-subscribe.e2e-harness.ts`
+    - `src/agents/pi-embedded-subscribe.handlers.lifecycle.ts`
+    - `src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts`
+    - `src/agents/pi-embedded-subscribe.handlers.ts`
+    - `src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts`
+    - `src/gateway/server-methods/agent.ts`
+    - `src/gateway/server-methods/server-methods.test.ts`
+- Final targeted validation passed:
+  - `pnpm -C /home/openclaw/.openclaw/workspace/external/openclaw-upstream test -- --run src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts src/commands/agent.test.ts src/gateway/server-methods/agent-wait-dedupe.test.ts src/gateway/server-methods/server-methods.test.ts`
+  - result: `108 tests` passed across `5` files
+- Final decisive live source-gateway repro after the fix:
+  - gateway launch: `OPENCLAW_SKIP_CHANNELS=1 CLAWDBOT_SKIP_CHANNELS=1 pnpm exec tsx src/index.ts gateway run --port 18903 --bind loopback --auth none --allow-unconfigured`
+  - run id: `gwc-live-agent-wait-gpt53-source-fixed2-1773429512008`
+  - session key: `agent:main:subagent:agent-wait-gpt53-live-source-fixed2-1773429512008`
+  - final `agent` response with `expectFinal: true` returned:
+    - `finalStatus: "error"`
+    - `finalSummary: "LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again."`
+  - matching `agent.wait` returned:
+    - `{"runId":"gwc-live-agent-wait-gpt53-source-fixed2-1773429512008","status":"error","endedAt":1773429514106,"error":"LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again."}`
+- Net status now:
+  - subagent persistence/announcement fix: live-verified ✅
+  - raw `agent.wait` semantics fix: live-verified ✅
+- Side assessment on unrelated dirty upstream work: the `/subagents log` UX diff in `src/auto-reply/reply/commands-subagents/action-log.ts` + `shared.ts` is logically coherent and passed `pnpm test -- --run src/auto-reply/reply/commands.test.ts` (`44 tests`), but it is still out-of-scope for this focused reliability pass because there is no dedicated coverage for the new tool-only log behavior and it would muddy the focused branch.
 
 ## Constraints
 - Prefer evidence over theory.
diff --git a/memory/2026-03-13.md b/memory/2026-03-13.md
index 25c2892..730b393 100644
--- a/memory/2026-03-13.md
+++ b/memory/2026-03-13.md
@@ -73,6 +73,20 @@
 - Net status at end of this pass:
   - subagent persistence/announcement fix: live-verified
   - raw `agent.wait` follow-up fix: tests passed, but live source-gateway repro still failed; do not mark this closed
+- Final focused live-fix pass on 2026-03-13 closed the remaining raw `agent.wait` bug.
+  - root cause: the live direct gateway path could receive `agent_end` carrying a terminal assistant error without a preceding `message_end`, leaving stale/empty assistant state and still emitting lifecycle `phase:"end"`
+  - final upstream fix taught embedded subscribe lifecycle handling to recover the terminal assistant from `agent_end.messages` / session transcript and emit lifecycle `phase:"error"`, and taught the gateway `agent` RPC handler to derive terminal status from observed lifecycle + final result metadata instead of blindly caching `ok`
+  - final targeted validation passed:
+    - `pnpm -C /home/openclaw/.openclaw/workspace/external/openclaw-upstream test -- --run src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts src/commands/agent.test.ts src/gateway/server-methods/agent-wait-dedupe.test.ts src/gateway/server-methods/server-methods.test.ts`
+    - result: `108 tests` passed across `5` files
+  - decisive live source-gateway repro after the final fix:
+    - gateway: source-run on port `18903`
+    - run id: `gwc-live-agent-wait-gpt53-source-fixed2-1773429512008`
+    - final `agent` response returned `finalStatus:"error"`
+    - matching `agent.wait` returned `status:"error"` with the same context-window error text
+- Net status now:
+  - subagent persistence/announcement fix: live-verified ✅
+  - raw `agent.wait` semantics fix: live-verified ✅
 - Side note: unrelated dirty `/subagents log` UX changes in `external/openclaw-upstream` regression-passed `src/auto-reply/reply/commands.test.ts` (44 tests) but were intentionally left out-of-scope for this focused reliability pass.
 - Will also explicitly requested that zap keep a light eye on active subagents and check whether they look stuck instead of assuming they are fine until completion.
 - Will explicitly reinforced on 2026-03-13 that once planning is done, zap should use subagents ASAP and start implementation in a fresh session rather than continuing to implement inside the long-lived main chat.
diff --git a/memory/tasks.json b/memory/tasks.json
index 0823ba6..dbe8eb4 100644
--- a/memory/tasks.json
+++ b/memory/tasks.json
@@ -30,7 +30,8 @@
         "2026-03-13: confirmed corrected LiteLLM run was still failing (child transcript showed assistant 429/plan error for GLM-5) while runs.json incorrectly stored outcome.status=ok and frozenResultText=null; implemented upstream branch fix/subagent-wait-error-outcome to derive terminal subagent outcome from latest assistant error state, with targeted validation (50 tests passed across 3 files).",
         "2026-03-13 later: live gpt-5.4 success repro passed (run 23750d80-b481-4f50-b219-cc9245be405f). Live gpt-5.4 failure repro also passed for subagent persistence/announcement handling: child run b50cb91f-6219-44f7-9d2f-a1264ac7ceaf ended with transcript stopReason=error + context_length_exceeded, and runs.json now stored outcome.status=error / endedReason=subagent-error / frozenResultText non-null. Remaining open nuance: raw agent.wait for that same failed child still returned status=ok.",
         "2026-03-13 later: traced raw agent.wait=status:ok-on-terminal-error to an upstream bug in commands/agent.ts fallback lifecycle emission (phase:end emitted even when resolved run meta.stopReason=error). Added focused upstream fix plus dedupe-path handling/tests on branch fix/subagent-wait-error-outcome; targeted validation passed (81 tests across commands/agent.test.ts, gateway/server-methods/agent-wait-dedupe.test.ts, gateway/server-methods/server-methods.test.ts). Live verification of the new agent.wait behavior remains open.",
-        "2026-03-13 final live pass: a fresh source-run gateway on port 18902 still returned agent.wait status=ok for run gwc-live-agent-wait-gpt53-source-1773427981614 even though the same session's terminal assistant message had provider=openai-codex model=gpt-5.3-codex stopReason=error with context_length_exceeded. Most likely remaining gap: an inner lifecycle emitter still marks the live direct gateway path as phase:end early enough that waitForAgentJob resolves ok before dedupe/result-meta rescue logic can win."
+        "2026-03-13 final live pass: a fresh source-run gateway on port 18902 still returned agent.wait status=ok for run gwc-live-agent-wait-gpt53-source-1773427981614 even though the same session's terminal assistant message had provider=openai-codex model=gpt-5.3-codex stopReason=error with context_length_exceeded. Most likely remaining gap: an inner lifecycle emitter still marks the live direct gateway path as phase:end early enough that waitForAgentJob resolves ok before dedupe/result-meta rescue logic can win.",
+        "2026-03-13 final focused pass: closed the remaining raw agent.wait bug. Root cause was the live direct gateway path receiving agent_end with a terminal assistant error but no preceding message_end, leaving stale assistant state and still emitting lifecycle phase:end. Final fix updated embedded subscribe lifecycle handling to recover terminal assistant errors from agent_end/session state and updated gateway server-methods/agent.ts to derive final RPC status from observed lifecycle + resolved result metadata. Validation passed (108 tests across 5 files). Live source-gateway repro on port 18903 then returned finalStatus:error and agent.wait status:error for run gwc-live-agent-wait-gpt53-source-fixed2-1773429512008."
       ]
     },
     {
@@ -135,7 +136,7 @@
       "notes": [
         "Added from tool wishlist on 2026-03-11.",
         "Completed 2026-03-12: Gmail and Calendar are live via n8n action bus with approval gating and audit history.",
-        "Drive/Docs/Sheets evaluated and deferred in WIP.drive-docs-sheets.md \u2014 revisit only when concrete use cases appear.",
+        "Drive/Docs/Sheets evaluated and deferred in WIP.drive-docs-sheets.md — revisit only when concrete use cases appear.",
         "Live workflow id: Jwi54VWMdlLqYnRo.",
         "Evidence: WIP.md + memory/2026-03-12.md + WIP.drive-docs-sheets.md."
       ]
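The lifecycle-side half of the fix can be sketched the same way. This is the case where `agent_end` arrives carrying a terminal assistant error but no `message_end` preceded it. The sketch is hypothetical and is not the upstream `pi-embedded-subscribe.handlers.lifecycle.ts` code: `AssistantMessage`, `SubscribeState`, the `sawMessageEnd` flag, `findLastAssistant`, `friendlyError`, and `resolveAgentEnd` are illustrative names. The friendly text reuses the context-window message seen in the live repro, and the generic fallback string is an assumption.

```typescript
// Hypothetical message shape; stopReason/errorMessage mirror transcript fields
// quoted in these notes, but this is not the upstream type.
interface AssistantMessage {
  role: string;
  stopReason?: string;
  errorMessage?: string;
}

interface SubscribeState {
  sawMessageEnd: boolean; // hypothetical flag: true once message_end fired for this run
  lastAssistant?: AssistantMessage;
}

type LifecycleOutcome = { phase: "end" } | { phase: "error"; error: string };

function findLastAssistant(messages: AssistantMessage[]): AssistantMessage | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m && m.role === "assistant") return m;
  }
  return undefined;
}

// Placeholder mapping: only the context-window case from the live repro is covered.
function friendlyError(raw: string | undefined): string {
  if (raw?.includes("context_length_exceeded")) {
    return "LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again.";
  }
  return raw ?? "Agent run failed."; // assumed generic fallback text
}

// On agent_end: trust tracked state only if message_end was seen; otherwise
// recover the terminal assistant from agent_end's messages, then the transcript.
function resolveAgentEnd(
  state: SubscribeState,
  agentEndMessages: AssistantMessage[],
  transcript: AssistantMessage[],
): LifecycleOutcome {
  const terminal =
    state.sawMessageEnd && state.lastAssistant
      ? state.lastAssistant
      : findLastAssistant(agentEndMessages) ?? findLastAssistant(transcript);
  if (terminal?.stopReason === "error") {
    return { phase: "error", error: friendlyError(terminal.errorMessage) };
  }
  return { phase: "end" };
}

// Usage: agent_end carries the failed turn, but message_end never fired.
const outcome = resolveAgentEnd(
  { sawMessageEnd: false },
  [{ role: "user" }, { role: "assistant", stopReason: "error", errorMessage: "context_length_exceeded" }],
  [],
);
console.log(outcome.phase); // prints: error
```

Emitting `phase:"error"` at this point means any waiter that trusts lifecycle events already sees the failure, independent of the gateway-side precedence check.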