diff --git a/HANDOFF.md b/HANDOFF.md
index 699dd41..f065abb 100644
--- a/HANDOFF.md
+++ b/HANDOFF.md
@@ -4,92 +4,36 @@ Immediate baton-pass for the next fresh implementation session.
 
 ## Current objective
 
-The Gmail + Calendar n8n action-bus WIP is complete and live. Next fresh session should review `WIP.drive-docs-sheets.md` and decide whether Drive / Docs / Sheets need action-bus verbs at all, while preserving the approval/history contract that now exists for Gmail + Calendar.
+Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. Focus on current failure modes, what is already fixed, and the highest-confidence next fix.
 
 ## Use these state files first
-1. `WIP.md` — completed Google Workspace + n8n implementation record
-2. `WIP.drive-docs-sheets.md` — proposed next-phase decision WIP
-3. `memory/2026-03-12.md` — detailed execution history and evidence
-4. `memory/tasks.json` — task status tracking
+1. `WIP.subagent-reliability.md` — canonical state for this pass
+2. `memory/tasks.json` — task tracking for reliability items
+3. `memory/2026-03-04-subagent-delegation.md` — earlier delegation context
+4. `memory/2026-03-13.md` if present (create it otherwise); append today’s evidence there
+5. `external/openclaw-upstream/` — for any core-runtime fix work
 
-## What is already true
-- `openclaw-action` is live in n8n and active.
-- Google auth via `gog` is working headlessly through local env auto-load.
-- Local automation env lives in `/home/openclaw/.openclaw/credentials/gog.env` and stays out of git.
-- Host bridge exists at `skills/n8n-webhook/scripts/resolve-approval-with-gog.py`.
-- Real approval-routed Gmail draft and Calendar event flows have both been verified multiple times end-to-end with cleanup.
+## Related tasks
+- `task-20260304-2215-subagent-reliability` — in progress
+- `task-20260304-211216-acp-claude-codex` — open
 
-## Fresh-session proof completed (2026-03-12 19:44Z)
-- Gmail draft flow (`send_email_draft`):
-  - approval id: `approval-mmnvn4t2-w2rjlwz2`
-  - draft id: `r-3319106208870238577`
-  - subject: `[zap n8n e2e] Gmail draft test 20260312T194450Z`
-  - verified via `gog gmail drafts get`
-  - cleaned via `gog gmail drafts delete --force`
-- Calendar event flow (`create_calendar_event`):
-  - approval id: `approval-mmnvn6i8-e9eq8gdf`
-  - event id: `m7prri8vk2opuo6loq3qgtvsv4`
-  - title: `[zap n8n e2e] Calendar test 20260312T194450Z`
-  - verified via `gog calendar get primary `
-  - cleaned via `gog calendar delete primary --force`
-
-## Gmail pass 1 completed in this handoff cycle
-- Added workflow actions:
-  - `list_email_drafts`
-  - `delete_email_draft`
-  - `send_gmail_draft` (alias: `send_approved_email`)
-- Added host bridge executors:
-  - `email_list_drafts` (`gog gmail drafts list`)
-  - `email_draft_delete` (`gog gmail drafts delete`)
-  - `email_draft_send` (`gog gmail drafts send`)
-- Added explicit approval metadata in workflow responses (`approval.policy`, `approval.required`, `approval.mutation_level`).
-- Updated docs/test payloads/validator to match the expanded Gmail contract.
-
-## Calendar pass 2 completed in this handoff cycle
-- Added workflow actions:
-  - `list_upcoming_events`
-  - `update_calendar_event`
-  - `delete_calendar_event`
-- Added host bridge executors:
-  - `calendar_list_events` (`gog calendar events`)
-  - `calendar_event_update` (`gog calendar update`)
-  - `calendar_event_delete` (`gog calendar delete`)
-- Preserved explicit approval policy:
-  - read-only calendar listing stays `low`
-  - mutating calendar update/delete stay `high`
-- Added docs/test payloads/validator coverage for the expanded calendar contract.
+## Known truths
+- TUI noise suppression was already patched locally and upstreamed earlier.
+- User still wants actual subagent reliability improved, not just UI noise hidden.
+- Prior ACP failures included Claude/Codex runtime exits.
+- Fresh-session implementation discipline is now the expected approach for non-trivial work.
 
 ## Highest-priority next actions
-1. Review `WIP.drive-docs-sheets.md` and make a go / no-go call per surface: Drive, Docs, Sheets.
-2. If any new Google actions are added, keep approval defaults explicit by family (`notification`, `gmail`, `calendar`, `manual`, and any new family names).
-3. Preserve compact operator reporting (`pending_compact`, `history_compact`, `summary_line`, `result_refs`) for any new approval-backed actions.
-4. Keep the live deployment habit: after implementation, sync the live workflow and run a safe smoke test instead of trusting static validation alone.
+1. Inspect prior task/session evidence and current runtime state.
+2. Reproduce or otherwise concretely characterize present failures.
+3. Split findings into:
+   - ACP runtime issues
+   - generic subagent/session issues
+   - completion-event / delivery issues
+4. If a fix is feasible now, implement the smallest high-confidence fix and validate it.
+5. Update WIP + memory + tasks before ending.
 
-## Success criteria for the next session
-- Clear go/no-go decision on expanding beyond Gmail + Calendar.
-- Any new verbs inherit the same safe approval defaults and low-noise history contract.
-- `WIP.md` and memory updated with concrete evidence.
-- Meaningful commit(s) captured.
-
-## Relevant files
-- `WIP.md`
-- `HANDOFF.md`
-- `skills/n8n-webhook/assets/openclaw-action.workflow.json`
-- `skills/n8n-webhook/scripts/call-action.sh`
-- `skills/n8n-webhook/scripts/resolve-approval-with-gog.py`
-- `skills/n8n-webhook/references/openclaw-action.md`
-- `memory/2026-03-12.md`
-- `memory/tasks.json`
-- `/home/openclaw/.openclaw/credentials/gog.env` (local-only)
-
-## Relevant branch / commits
-- branch: `feat/n8n-action-bus-v2`
-- latest checkpoints before this handoff include:
-  - `ffe7a6b` — add operator approval runbook
-  - `249e671` — add compact approval history views
-  - `afa48a3` — bridge approvals to gog executors
-  - `044e36f` — auto-load local gog automation env
-  - `06fa582` — track google workspace and n8n plan
-
-## Operator note
-Use the live n8n public API/webhook surface directly when it is the right path. Do not act blocked on n8n API access.
\ No newline at end of file
+## Success criteria
+- Clear current-state diagnosis.
+- Evidence-backed fix or sharply scoped next fix plan.
+- State files updated with paths, commands, and outcomes.
diff --git a/WIP.subagent-reliability.md b/WIP.subagent-reliability.md
new file mode 100644
index 0000000..2aac8d0
--- /dev/null
+++ b/WIP.subagent-reliability.md
@@ -0,0 +1,54 @@
+# WIP.subagent-reliability.md
+
+## Status
+Status: `open`
+Owner: `zap`
+Opened: `2026-03-13`
+
+## Purpose
+Investigate and improve subagent / ACP delegation reliability, including timeout behavior, runtime failures, and delayed/duplicate completion-event noise.
+
+## Why now
+This is the highest-leverage remaining open reliability item because it affects trust in delegation and the usability of fresh implementation runs.
+
+## Related tasks
+- `task-20260304-2215-subagent-reliability` — in progress
+- `task-20260304-211216-acp-claude-codex` — open
+
+## Known context
+- Prior work already patched TUI formatting to suppress internal runtime completion context blocks.
+- Upstream patch exists in `external/openclaw-upstream` on branch `fix/tui-hide-internal-runtime-context` commit `0f66a4547`.
+- User explicitly wants subagent tooling reliability fixed and completion-event spam prevented.
+- Fresh-session implementation discipline and monitoring thresholds were already documented locally.
+
+## Goals for this pass
+1. Establish the current failure modes with concrete evidence.
+2. Separate ACP-specific failures from generic subagent/session issues.
+3. Determine what is already fixed versus still broken.
+4. Produce a concrete recommendation and, if feasible in one pass, implement the highest-confidence fix.
+5. Update task/memory state with evidence before ending.
+
+## Suggested investigation plan
+1. Review current OpenClaw docs and local memory around subagent/ACP failures.
+2. Reproduce or inspect recent failures using session/task evidence instead of guessing.
+3. Check current runtime status / relevant logs / known local patches.
+4. If the issue is in OpenClaw core, work in `external/openclaw-upstream/` on a focused branch.
+5. Validate with the smallest reliable reproduction possible.
+
+## Evidence gathered so far
+- Fresh subagent run failed immediately with provider auth error for `zai` before any task execution.
+- Current installed agent auth profiles include `openai-codex:default`, `litellm:default`, and `github-copilot:github`; there is no `zai` profile configured.
+- Root cause for this immediate repro appears to be an incorrect explicit spawn model choice (`glm-5` alias → `zai/glm-5`) rather than missing auth propagation between agents.
+- Next step after confirming the model-selection issue: prefer `gpt-5.4` for fresh subagent reliability/debug passes for now, per Will's instruction, and continue separating real runtime issues from operator/config mistakes.
+
+## Constraints
+- Prefer evidence over theory.
+- Do not claim a fix without concrete validation.
+- Keep the main session clean; use this file as the canonical baton.
+
+## Success criteria
+- Clear diagnosis of the current reliability problem(s).
+- At least one of:
+  - implemented fix with validation, or
+  - sharply scoped next fix plan with exact evidence and files.
+- `memory/2026-03-13.md` (or current daily note), `memory/tasks.json`, and this WIP updated.
diff --git a/memory/2026-03-13.md b/memory/2026-03-13.md
new file mode 100644
index 0000000..8bd089d
--- /dev/null
+++ b/memory/2026-03-13.md
@@ -0,0 +1,15 @@
+# 2026-03-13
+
+## Subagent reliability investigation
+- Fresh implementation subagent launch for subagent/ACP reliability failed immediately before doing any task work.
+- Failure mode: delegated run was spawned with model `glm-5`, which resolved to provider model `zai/glm-5`.
+- Current agent auth profiles across installed agents include `openai-codex:default`, `litellm:default`, and `github-copilot:github`; there is no `zai` auth profile configured in agent auth stores.
+- Verified by inspecting agent auth profile keys under:
+  - `/home/openclaw/.openclaw/agents/*/agent/auth-profiles.json`
+- Relevant OpenClaw docs confirm:
+  - subagent spawns inherit caller model when `sessions_spawn.model` is omitted
+  - provider/model auth errors like `No API key found for provider "zai"` occur when a provider model is selected without matching auth
+  - multi-agent auth is per-agent via `~/.openclaw/agents//agent/auth-profiles.json`
+- Conclusion: the immediate failure was caused by an incorrect explicit model selection in the spawn request, not by missing auth propagation between agents.
+- Corrective action: retry fresh delegation with `litellm/glm-5` (the intended medium-tier routed model for delegated implementation work in this setup).
+- Will explicitly requested on 2026-03-13 to use `gpt-5.4` for subagents for now while debugging delegation reliability.
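The manual verification recorded in the memory note (read the auth profile keys under each agent's `auth-profiles.json`, then compare them with the provider of the spawn model) can be scripted as a pre-spawn sanity check. This is a minimal sketch under stated assumptions, not OpenClaw's own logic: it assumes each `auth-profiles.json` is a JSON object whose profile keys look like `provider:name` (either at the top level or under a `profiles` object), and that a fully qualified model ref looks like `provider/model`. The helper names (`profile_keys`, `check_model`, `collect_keys`) are hypothetical, and alias resolution such as `glm-5` → `zai/glm-5` happens inside OpenClaw and is not reproduced here.

```python
"""Hypothetical pre-spawn check: does the model's provider have an auth profile?

Assumptions (not taken from OpenClaw docs):
- each auth-profiles.json holds a JSON object whose profile keys look like
  "provider:name" (e.g. "litellm:default"), either at the top level or
  nested under a "profiles" object;
- a fully qualified model ref looks like "provider/model".
"""
import glob
import json
import os


def profile_keys(path):
    """Return the set of profile keys found in one auth-profiles.json file."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        return set()
    # Assumed layout: prefer a nested "profiles" mapping if one exists.
    profiles = data.get("profiles", data)
    return set(profiles) if isinstance(profiles, dict) else set()


def providers_with_auth(keys):
    """Map keys like 'litellm:default' to their provider prefix 'litellm'."""
    return {key.split(":", 1)[0] for key in keys if ":" in key}


def check_model(model_ref, keys):
    """Return (has_auth, provider) for a 'provider/model' reference."""
    if "/" not in model_ref:
        # A bare alias such as "glm-5" must be resolved first; report unknown.
        return False, None
    provider = model_ref.split("/", 1)[0]
    return provider in providers_with_auth(keys), provider


def collect_keys(pattern):
    """Union of profile keys across every file matching the glob pattern."""
    keys = set()
    for path in glob.glob(os.path.expanduser(pattern)):
        keys |= profile_keys(path)
    return keys


if __name__ == "__main__":
    keys = collect_keys("~/.openclaw/agents/*/agent/auth-profiles.json")
    for ref in ("zai/glm-5", "litellm/glm-5"):
        has_auth, provider = check_model(ref, keys)
        print(f"{ref}: provider={provider} auth={'yes' if has_auth else 'MISSING'}")
```

With the profile keys recorded above (`openai-codex:default`, `litellm:default`, `github-copilot:github`), `zai/glm-5` reports a missing provider while `litellm/glm-5` passes, which matches the diagnosis in the memory note. Treating a bare alias as unknown is deliberate: the check only stays trustworthy on the already-resolved ref.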