swarm-zap/HANDOFF.md

# HANDOFF.md

## Purpose
Immediate baton-pass for the next fresh implementation session.

## Current objective
Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. The subagent persistence/announcement fix and the raw `agent.wait` semantics fix are now both live-verified on branch `fix/subagent-wait-error-outcome`; the next work should shift to follow-up cleanup (commit/push/PR when desired, ACP-specific runtime failures, and any unrelated `/subagents log` UX work only in a separate focused pass).

## Use these state files first
1. `WIP.subagent-reliability.md` — canonical state for this pass
2. `memory/tasks.json` — task tracking for reliability items
3. `memory/2026-03-04-subagent-delegation.md` — earlier delegation context
4. `memory/2026-03-13.md` if present, otherwise append today’s evidence there
5. `external/openclaw-upstream/` — for any core-runtime fix work

## Related tasks
- `task-20260304-2215-subagent-reliability` — in progress
- `task-20260304-211216-acp-claude-codex` — open

## Known truths
- TUI noise suppression was already patched locally and upstreamed earlier.
- User still wants actual subagent reliability improved, not just UI noise hidden.
- Prior ACP failures included Claude/Codex runtime exits.
- Fresh-session implementation discipline is now the expected approach for non-trivial work.
- One explicit failure mode is already understood: requesting `glm-5` can route into an unavailable GLM-5 provider/entitlement path in this setup.
- A deeper bug was also identified: a subagent run could finish with terminal assistant errors yet still be recorded as successful with no frozen result.
- An upstream patch for that error/outcome handling now exists in `external/openclaw-upstream` on branch `fix/subagent-wait-error-outcome` with targeted tests passing.

## Highest-priority next actions
1. Treat the reliability fixes as live-verified on this branch:
   - subagent persistence/announcement proof:
     - run id `b50cb91f-6219-44f7-9d2f-a1264ac7ceaf`
     - child transcript `~/.openclaw/agents/main/sessions/f114b831-000b-4070-a539-85c68d2b7057.jsonl`
     - `runs.json` stores `outcome.status: "error"`, `endedReason: "subagent-error"`, and a non-null `frozenResultText`
   - raw `agent.wait` live-fix proof:
     - gateway launch: `OPENCLAW_SKIP_CHANNELS=1 CLAWDBOT_SKIP_CHANNELS=1 pnpm exec tsx src/index.ts gateway run --port 18903 --bind loopback --auth none --allow-unconfigured`
     - run id: `gwc-live-agent-wait-gpt53-source-fixed2-1773429512008`
     - final `agent` response: `finalStatus:"error"`
     - `agent.wait`: `{"runId":"gwc-live-agent-wait-gpt53-source-fixed2-1773429512008","status":"error","endedAt":1773429514106,"error":"LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again."}`
2. Commit/push/PR the focused upstream reliability branch when ready.
3. Re-check whether ACP-specific Claude/Codex runtime failures are still reproducible now that the generic subagent outcome/agent.wait bugs are separated and fixed.
4. Leave the dirty `/subagents log` UX diff out of this branch unless you intentionally spin a separate focused pass; it regression-passed `src/auto-reply/reply/commands.test.ts` but still lacks dedicated feature coverage.

## Success criteria
- Real-run verification of the new error/outcome fix. ✅ done for subagent persistence/announcement handling.
- Clear separation between resolved reporting bug(s) and any still-open ACP/runtime failures.
- Explicit decision on whether raw `agent.wait` behavior is acceptable or requires a follow-up fix.
- State files updated with paths, commands, and outcomes.