Files
swarm-zap/HANDOFF.md

55 lines
4.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# HANDOFF.md
## Purpose
Immediate baton-pass for the next fresh implementation session.
## Current objective
Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. The failure-path proof for the new subagent outcome handling is captured, but the focused upstream `agent.wait` semantics fix on branch `fix/subagent-wait-error-outcome` did **not** hold in a fresh live source-gateway repro, so the remaining work is a narrower root-cause follow-up on the still-open live `agent.wait => ok` path.
## Use these state files first
1. `WIP.subagent-reliability.md` — canonical state for this pass
2. `memory/tasks.json` — task tracking for reliability items
3. `memory/2026-03-04-subagent-delegation.md` — earlier delegation context
4. `memory/2026-03-13.md` if present, otherwise append todays evidence there
5. `external/openclaw-upstream/` — for any core-runtime fix work
## Related tasks
- `task-20260304-2215-subagent-reliability` — in progress
- `task-20260304-211216-acp-claude-codex` — open
## Known truths
- TUI noise suppression was already patched locally and upstreamed earlier.
- User still wants actual subagent reliability improved, not just UI noise hidden.
- Prior ACP failures included Claude/Codex runtime exits.
- Fresh-session implementation discipline is now the expected approach for non-trivial work.
- One explicit failure mode is already understood: requesting `glm-5` can route into an unavailable GLM-5 provider/entitlement path in this setup.
- A deeper bug was also identified: a subagent run could finish with terminal assistant errors yet still be recorded as successful with no frozen result.
- An upstream patch for that error/outcome handling now exists in `external/openclaw-upstream` on branch `fix/subagent-wait-error-outcome` with targeted tests passing.
## Highest-priority next actions
1. Treat the live `gpt-5.4` failure repro as proven for subagent persistence/announcement handling:
- run id `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
- child transcript `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
- `runs.json` now stores `outcome.status: "error"`, `endedReason: "subagent-error"`, and a non-null `frozenResultText`
2. Treat raw gateway `agent.wait` as still **open** despite the current follow-up fix branch.
- decisive live source-gateway repro:
- gateway launch: `OPENCLAW_SKIP_CHANNELS=1 CLAWDBOT_SKIP_CHANNELS=1 pnpm exec tsx src/index.ts gateway run --port 18902 --bind loopback --auth none --allow-unconfigured`
- session key: `agent:main:subagent:agent-wait-gpt53-live-source-1773427981586`
- run id: `gwc-live-agent-wait-gpt53-source-1773427981614`
- `agent.wait`: `{"runId":"gwc-live-agent-wait-gpt53-source-1773427981614","status":"ok","endedAt":1773427984243}`
- last assistant: `provider:"openai-codex" model:"gpt-5.3-codex" stopReason:"error" errorMessage contains context_length_exceeded`
- this is the current canonical blocker evidence for the still-open live path
3. Most likely remaining gap to investigate next:
- `src/commands/agent.ts` only applies the new fallback correction when `!lifecycleEnded`
- `lifecycleEnded` flips true on any inner lifecycle `phase:"end"` or `phase:"error"`
- `src/gateway/server-methods/agent-job.ts` resolves/caches `phase:"end"` as terminal `status:"ok"`
- so an inner lifecycle emitter is still the likeliest place where terminal assistant/provider failures are being marked `end` too early on the live direct gateway path
4. Re-check whether ACP-specific Claude/Codex runtime failures are still reproducible after separating them from the generic subagent outcome bug.
5. Leave the dirty `/subagents log` UX diff out of this branch unless you intentionally spin a separate focused pass; it regression-passed `src/auto-reply/reply/commands.test.ts` but still lacks dedicated feature coverage.
## Success criteria
- Real-run verification of the new error/outcome fix. ✅ done for subagent persistence/announcement handling.
- Clear separation between resolved reporting bug(s) and any still-open ACP/runtime failures.
- Explicit decision on whether raw `agent.wait` behavior is acceptable or requires a follow-up fix.
- State files updated with paths, commands, and outcomes.