docs(wip): record success probe and next failure-path pass
22
HANDOFF.md
@@ -4,7 +4,7 @@
 Immediate baton-pass for the next fresh implementation session.
 
 ## Current objective
-Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. Focus on current failure modes, what is already fixed, and the highest-confidence next fix.
+Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. The current target is to verify the newly landed upstream fix for subagent error/outcome handling and then continue on any remaining real runtime failures.
 
 ## Use these state files first
 1. `WIP.subagent-reliability.md` — canonical state for this pass
@@ -22,18 +22,20 @@ Investigate and improve subagent / ACP delegation reliability with evidence-firs
 - User still wants actual subagent reliability improved, not just UI noise hidden.
 - Prior ACP failures included Claude/Codex runtime exits.
 - Fresh-session implementation discipline is now the expected approach for non-trivial work.
+- One explicit failure mode is already understood: requesting `glm-5` can route into an unavailable GLM-5 provider/entitlement path in this setup.
+- A deeper bug was also identified: a subagent run could finish with terminal assistant errors yet still be recorded as successful with no frozen result.
+- An upstream patch for that error/outcome handling now exists in `external/openclaw-upstream` on branch `fix/subagent-wait-error-outcome` with targeted tests passing.
 
 ## Highest-priority next actions
-1. Inspect prior task/session evidence and current runtime state.
-2. Reproduce or otherwise concretely characterize present failures.
-3. Split findings into:
-   - ACP runtime issues
-   - generic subagent/session issues
-   - completion-event / delivery issues
-4. If a fix is feasible now, implement the smallest high-confidence fix and validate it.
+1. The success side is now verified on a real fresh `gpt-5.4` subagent run.
+2. Find and execute the smallest safe controlled-failure repro on a valid model/runtime (`gpt-5.4` preferred) so we can confirm:
+   - a failing child run is stored as `error` rather than `ok`
+   - a successful child run stores a useful frozen result / announcement payload
+3. Re-check whether ACP-specific Claude/Codex runtime failures are still reproducible after separating them from the generic subagent reporting bug.
+4. If another core bug appears, continue in `external/openclaw-upstream/` on a focused branch with targeted validation.
 5. Update WIP + memory + tasks before ending.
 
 ## Success criteria
-- Clear current-state diagnosis.
-- Evidence-backed fix or sharply scoped next fix plan.
+- Real-run verification of the new error/outcome fix.
+- Clear separation between resolved reporting bug(s) and any still-open ACP/runtime failures.
 - State files updated with paths, commands, and outcomes.
@@ -49,7 +49,20 @@ This is the highest-leverage remaining open reliability item because it affects
 - `pnpm -C external/openclaw-upstream test -- --run src/agents/tools/sessions-helpers.terminal-text.test.ts src/agents/subagent-registry.persistence.test.ts src/gateway/server-methods/server-methods.test.ts`
 - result: passed (`50 tests` across `3` files).
 - E2E-style `subagent-announce.format.e2e.test.ts` coverage was updated but the normal Vitest include rules exclude `*.e2e.test.ts`; direct `pnpm test -- --run ...e2e...` confirms exclusion rather than executing that file.
-- Next step after this patch: rerun a real subagent with a known-working model (`gpt-5.4` or another actually entitled model) and confirm `runs.json` stores `error` on terminal assistant failure and a useful frozen result on success.
+- Tried to take over live verification directly in the main session on 2026-03-13:
+  - confirmed upstream branch `fix/subagent-wait-error-outcome` is present with commit `2a2ed0d6f`
+  - confirmed normal packaged gateway was healthy before attempting runtime verification
+  - first direct hot-swap attempt was interrupted at gateway stop time; systemd restored the packaged gateway cleanly
+  - no patched upstream gateway was left running after that attempt
+- Current state: upstream patch + targeted tests are real.
+- Real subagent success verification now completed on `gpt-5.4`:
+  - run id: `23750d80-b481-4f50-b219-cc9245be405f`
+  - child session: `agent:main:subagent:ad2cc776-2527-4078-ab83-0220dbd09509`
+  - result: successful completion with a real final child result (`SUCCESS-PROBE-OK`)
+- A later GLM-5 probe was invalid for entitlement reasons and was terminated; it should not be treated as the canonical failure-path verification.
+  - killed/failed run id: `4965775c-4764-41e9-a77a-692f1ab4c2fd`
+- Remaining gap: we still need a controlled failure-path verification on a valid model/runtime so we can confirm failed child runs persist/announce as `error` rather than fake `ok`.
+- Next step: continue in a fresh `gpt-5.4` subagent session, find the smallest safe controlled-failure repro that does not depend on unavailable GLM-5 access, run it, and update WIP/HANDOFF with exact evidence.
 
 ## Constraints
 - Prefer evidence over theory.
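The remaining failure-path check described in the handoff (a failed child run persists as `error`, a successful one carries a useful frozen result) could be spot-checked with a small helper. The following is only a minimal sketch: the `RunRecord` shape, its field names (`runId`, `outcome`, `frozenResult`), and the helper names are hypothetical, since the actual `runs.json` schema is not shown in this diff.

```typescript
// Hypothetical record shape; the real runs.json schema may differ.
interface RunRecord {
  runId: string;
  outcome: "ok" | "error";
  frozenResult?: string | null;
}

type Verdict =
  | "error-recorded"             // failure persisted as `error`
  | "ok-with-result"             // success with a non-empty frozen result
  | "suspect-ok-without-result"; // the pattern the handoff describes as a bug

// Classify one stored run according to the two properties to confirm.
function classifyRun(record: RunRecord): Verdict {
  if (record.outcome === "error") {
    return "error-recorded";
  }
  const text = (record.frozenResult ?? "").trim();
  return text.length > 0 ? "ok-with-result" : "suspect-ok-without-result";
}

// Return the ids of runs that claim success but carry no frozen result.
function findSuspectRuns(records: RunRecord[]): string[] {
  return records
    .filter((r) => classifyRun(r) === "suspect-ok-without-result")
    .map((r) => r.runId);
}
```

Under these assumptions, a controlled-failure run should classify as `error-recorded`, and any `suspect-ok-without-result` entry would suggest the reporting bug is still reachable.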