# HANDOFF.md

## Purpose
Immediate baton-pass for the next fresh implementation session.

## Current objective
Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. The failure-path proof for the new subagent outcome handling is captured. However, the focused upstream `agent.wait` semantics fix on branch `fix/subagent-wait-error-outcome` did **not** hold in a fresh live source-gateway repro: raw `agent.wait` still returned `status: "ok"` for a run whose last assistant message ended in an error. The remaining work is a narrower root-cause follow-up on that still-open live `agent.wait => ok` path.

## Use these state files first
1. `WIP.subagent-reliability.md`: canonical state for this pass
2. `memory/tasks.json`: task tracking for reliability items
3. `memory/2026-03-04-subagent-delegation.md`: earlier delegation context
4. `memory/2026-03-13.md` if present; otherwise append today’s evidence there
5. `external/openclaw-upstream/`: for any core-runtime fix work

## Related tasks
- `task-20260304-2215-subagent-reliability`: in progress
- `task-20260304-211216-acp-claude-codex`: open

## Known truths
- TUI noise suppression was already patched locally and upstreamed earlier.
- User still wants actual subagent reliability improved, not just UI noise hidden.
- Prior ACP failures included Claude/Codex runtime exits.
- Fresh-session implementation discipline is now the expected approach for non-trivial work.
- One explicit failure mode is already understood: requesting `glm-5` can route into an unavailable GLM-5 provider/entitlement path in this setup.
- A deeper bug was also identified: a subagent run could finish with terminal assistant errors yet still be recorded as successful with no frozen result.
- An upstream patch for that error/outcome handling now exists in `external/openclaw-upstream` on branch `fix/subagent-wait-error-outcome` with targeted tests passing.

## Highest-priority next actions
1. Treat the live `gpt-5.4` failure repro as proven for subagent persistence/announcement handling:
   - run id `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
   - child transcript `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
   - `runs.json` now stores `outcome.status: "error"`, `endedReason: "subagent-error"`, and a non-null `frozenResultText`
2. Treat raw gateway `agent.wait` as still **open** despite the current follow-up fix branch.
   - decisive live source-gateway repro:
     - gateway launch: `OPENCLAW_SKIP_CHANNELS=1 CLAWDBOT_SKIP_CHANNELS=1 pnpm exec tsx src/index.ts gateway run --port 18902 --bind loopback --auth none --allow-unconfigured`
     - session key: `agent:main:subagent:agent-wait-gpt53-live-source-1773427981586`
     - run id: `gwc-live-agent-wait-gpt53-source-1773427981614`
     - `agent.wait`: `{"runId":"gwc-live-agent-wait-gpt53-source-1773427981614","status":"ok","endedAt":1773427984243}`
     - last assistant: `provider:"openai-codex" model:"gpt-5.3-codex" stopReason:"error" errorMessage contains context_length_exceeded`
   - this is the current canonical blocker evidence for the still-open live path
3. Most likely remaining gap to investigate next:
   - `src/commands/agent.ts` only applies the new fallback correction when `!lifecycleEnded`
   - `lifecycleEnded` flips true on any inner lifecycle `phase:"end"` or `phase:"error"`
   - `src/gateway/server-methods/agent-job.ts` resolves/caches `phase:"end"` as terminal `status:"ok"`
   - so an inner lifecycle emitter is still the likeliest place where terminal assistant/provider failures are being marked `end` too early on the live direct gateway path
4. Re-check whether ACP-specific Claude/Codex runtime failures are still reproducible after separating them from the generic subagent outcome bug.
5. Leave the dirty `/subagents log` UX diff out of this branch unless you intentionally spin a separate focused pass; it regression-passed `src/auto-reply/reply/commands.test.ts` but still lacks dedicated feature coverage.

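A minimal TypeScript sketch of the gap hypothesized in item 3, to make the failure shape concrete. The type and function names (`TerminalPhase`, `LastAssistant`, `resolveFromPhaseOnly`, `resolveWithAssistantCheck`) are hypothetical and are not taken from `src/commands/agent.ts` or `src/gateway/server-methods/agent-job.ts`; the real code may be structured differently. It only illustrates why resolving from the lifecycle phase alone yields `ok` for the evidence in item 2, and what an additional check against the last assistant `stopReason` would change.

```typescript
// Hypothetical shapes for illustration only; the upstream types may differ.
type TerminalPhase = "end" | "error";
type WaitStatus = "ok" | "error";

interface LastAssistant {
  stopReason?: string; // e.g. "error" in the item 2 repro
  errorMessage?: string; // e.g. a message containing context_length_exceeded
}

// Behaviour as described in item 3: the phase alone decides the status,
// so phase "end" is always resolved as terminal "ok".
function resolveFromPhaseOnly(phase: TerminalPhase): WaitStatus {
  return phase === "error" ? "error" : "ok";
}

// Assumed stricter variant: a terminal assistant error overrides phase "end".
function resolveWithAssistantCheck(
  phase: TerminalPhase,
  last?: LastAssistant,
): WaitStatus {
  if (phase === "error") return "error";
  if (last?.stopReason === "error") return "error";
  return "ok";
}

// Shape of the item 2 evidence: lifecycle says "end", assistant says "error".
const reproLast: LastAssistant = {
  stopReason: "error",
  errorMessage: "context_length_exceeded",
};
const phaseOnly = resolveFromPhaseOnly("end");
const withCheck = resolveWithAssistantCheck("end", reproLast);
```

With the item 2 evidence (phase `end`, `stopReason: "error"`), the first function returns `ok` and the second returns `error`. Whether the correction belongs in the inner lifecycle emitter or in the wait resolver is still the open question.
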
## Success criteria
- Real-run verification of the new error/outcome fix. ✅ done for subagent persistence/announcement handling.
- Clear separation between resolved reporting bug(s) and any still-open ACP/runtime failures.
- Explicit decision on whether raw `agent.wait` behavior is acceptable or requires a follow-up fix.
- State files updated with paths, commands, and outcomes.
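
For the first criterion, the persisted-record check can be sketched as below. Assumptions: the run entry has already been parsed from `runs.json`, and `endedReason` and `frozenResultText` sit at the top level of the entry next to `outcome`. The actual nesting was not recorded in this note, so adjust the hypothetical `RunRecord` shape to match the file before relying on it.

```typescript
// Hypothetical record shape; field nesting is an assumption, not confirmed.
interface RunRecord {
  outcome?: { status?: string };
  endedReason?: string;
  frozenResultText?: string | null;
}

// Returns a list of human-readable problems; an empty list means the record
// matches what a correctly persisted failed subagent run should look like.
function failedRunProblems(run: RunRecord): string[] {
  const problems: string[] = [];
  if (run.outcome?.status !== "error") {
    problems.push(`outcome.status is ${String(run.outcome?.status)}, expected "error"`);
  }
  if (run.endedReason !== "subagent-error") {
    problems.push(`endedReason is ${String(run.endedReason)}, expected "subagent-error"`);
  }
  if (run.frozenResultText == null) {
    problems.push("frozenResultText is missing or null");
  }
  return problems;
}
```

Running this over the entry for a known failing run id gives a quick pass/fail signal to paste into the state files alongside the paths and commands.
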