docs(state): seed subagent reliability investigation

2026-03-13 00:12:43 +00:00
parent 841365e020
commit 3bb3888340
3 changed files with 95 additions and 82 deletions
@@ -4,92 +4,36 @@
 Immediate baton-pass for the next fresh implementation session.

 ## Current objective
-The Gmail + Calendar n8n action-bus WIP is complete and live. Next fresh session should review `WIP.drive-docs-sheets.md` and decide whether Drive / Docs / Sheets need action-bus verbs at all, while preserving the approval/history contract that now exists for Gmail + Calendar.
+Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. Focus on current failure modes, what is already fixed, and the highest-confidence next fix.

 ## Use these state files first
-1. `WIP.md` — completed Google Workspace + n8n implementation record
-2. `WIP.drive-docs-sheets.md` — proposed next-phase decision WIP
-3. `memory/2026-03-12.md` — detailed execution history and evidence
-4. `memory/tasks.json` — task status tracking
+1. `WIP.subagent-reliability.md` — canonical state for this pass
+2. `memory/tasks.json` — task tracking for reliability items
+3. `memory/2026-03-04-subagent-delegation.md` — earlier delegation context
+4. `memory/2026-03-13.md` if present; otherwise create it and append today’s evidence there
+5. `external/openclaw-upstream/` — for any core-runtime fix work

-## What is already true
- `openclaw-action` is live in n8n and active.
- Google auth via `gog` is working headlessly through local env auto-load.
- Local automation env lives in `/home/openclaw/.openclaw/credentials/gog.env` and stays out of git.
- Host bridge exists at `skills/n8n-webhook/scripts/resolve-approval-with-gog.py`.
- Real approval-routed Gmail draft and Calendar event flows have both been verified multiple times end-to-end with cleanup.
+## Related tasks
+- `task-20260304-2215-subagent-reliability` — in progress
+- `task-20260304-211216-acp-claude-codex` — open

-## Fresh-session proof completed (2026-03-12 19:44Z)
- Gmail draft flow (`send_email_draft`):
-  - approval id: `approval-mmnvn4t2-w2rjlwz2`
-  - draft id: `r-3319106208870238577`
-  - subject: `[zap n8n e2e] Gmail draft test 20260312T194450Z`
-  - verified via `gog gmail drafts get`
-  - cleaned via `gog gmail drafts delete --force`
- Calendar event flow (`create_calendar_event`):
-  - approval id: `approval-mmnvn6i8-e9eq8gdf`
-  - event id: `m7prri8vk2opuo6loq3qgtvsv4`
-  - title: `[zap n8n e2e] Calendar test 20260312T194450Z`
-  - verified via `gog calendar get primary <eventId>`
-  - cleaned via `gog calendar delete primary <eventId> --force`
-
-## Gmail pass 1 completed in this handoff cycle
- Added workflow actions:
-  - `list_email_drafts`
-  - `delete_email_draft`
-  - `send_gmail_draft` (alias: `send_approved_email`)
- Added host bridge executors:
-  - `email_list_drafts` (`gog gmail drafts list`)
-  - `email_draft_delete` (`gog gmail drafts delete`)
-  - `email_draft_send` (`gog gmail drafts send`)
- Added explicit approval metadata in workflow responses (`approval.policy`, `approval.required`, `approval.mutation_level`).
- Updated docs/test payloads/validator to match the expanded Gmail contract.
-
-## Calendar pass 2 completed in this handoff cycle
- Added workflow actions:
-  - `list_upcoming_events`
-  - `update_calendar_event`
-  - `delete_calendar_event`
- Added host bridge executors:
-  - `calendar_list_events` (`gog calendar events`)
-  - `calendar_event_update` (`gog calendar update`)
-  - `calendar_event_delete` (`gog calendar delete`)
- Preserved explicit approval policy:
-  - read-only calendar listing stays `low`
-  - mutating calendar update/delete stay `high`
- Added docs/test payloads/validator coverage for the expanded calendar contract.
+## Known truths
+- TUI noise suppression was already patched locally and upstreamed earlier.
+- User still wants actual subagent reliability improved, not just UI noise hidden.
+- Prior ACP failures included Claude/Codex runtime exits.
+- Fresh-session implementation discipline is now the expected approach for non-trivial work.

 ## Highest-priority next actions
-1. Review `WIP.drive-docs-sheets.md` and make a go / no-go call per surface: Drive, Docs, Sheets.
-2. If any new Google actions are added, keep approval defaults explicit by family (`notification`, `gmail`, `calendar`, `manual`, and any new family names).
-3. Preserve compact operator reporting (`pending_compact`, `history_compact`, `summary_line`, `result_refs`) for any new approval-backed actions.
-4. Keep the live deployment habit: after implementation, sync the live workflow and run a safe smoke test instead of trusting static validation alone.
+1. Inspect prior task/session evidence and current runtime state.
+2. Reproduce or otherwise concretely characterize present failures.
+3. Split findings into:
+   - ACP runtime issues
+   - generic subagent/session issues
+   - completion-event / delivery issues
+4. If a fix is feasible now, implement the smallest high-confidence fix and validate it.
+5. Update WIP + memory + tasks before ending.

-## Success criteria for the next session
- Clear go/no-go decision on expanding beyond Gmail + Calendar.
- Any new verbs inherit the same safe approval defaults and low-noise history contract.
- `WIP.md` and memory updated with concrete evidence.
- Meaningful commit(s) captured.
-
-## Relevant files
- `WIP.md`
- `HANDOFF.md`
- `skills/n8n-webhook/assets/openclaw-action.workflow.json`
- `skills/n8n-webhook/scripts/call-action.sh`
- `skills/n8n-webhook/scripts/resolve-approval-with-gog.py`
- `skills/n8n-webhook/references/openclaw-action.md`
- `memory/2026-03-12.md`
- `memory/tasks.json`
- `/home/openclaw/.openclaw/credentials/gog.env` (local-only)
-
-## Relevant branch / commits
- branch: `feat/n8n-action-bus-v2`
- latest checkpoints before this handoff include:
-  - `ffe7a6b` — add operator approval runbook
-  - `249e671` — add compact approval history views
-  - `afa48a3` — bridge approvals to gog executors
-  - `044e36f` — auto-load local gog automation env
-  - `06fa582` — track google workspace and n8n plan
-
-## Operator note
-Use the live n8n public API/webhook surface directly when it is the right path. Do not act blocked on n8n API access.
+## Success criteria
+- Clear current-state diagnosis.
+- Evidence-backed fix or sharply scoped next fix plan.
+- State files updated with paths, commands, and outcomes.
@@ -0,0 +1,54 @@
+# WIP.subagent-reliability.md
+
+## Status
+Status: `open`
+Owner: `zap`
+Opened: `2026-03-13`
+
+## Purpose
+Investigate and improve subagent / ACP delegation reliability, including timeout behavior, runtime failures, and delayed/duplicate completion-event noise.
+
+## Why now
+This is the highest-leverage remaining open reliability item because it affects trust in delegation and the usability of fresh implementation runs.
+
+## Related tasks
+- `task-20260304-2215-subagent-reliability` — in progress
+- `task-20260304-211216-acp-claude-codex` — open
+
+## Known context
+- Prior work already patched TUI formatting to suppress internal runtime completion context blocks.
+- Upstream patch exists in `external/openclaw-upstream` on branch `fix/tui-hide-internal-runtime-context`, commit `0f66a4547`.
+- User explicitly wants subagent tooling reliability fixed and completion-event spam prevented.
+- Fresh-session implementation discipline and monitoring thresholds were already documented locally.
+
+## Goals for this pass
+1. Establish the current failure modes with concrete evidence.
+2. Separate ACP-specific failures from generic subagent/session issues.
+3. Determine what is already fixed versus still broken.
+4. Produce a concrete recommendation and, if feasible in one pass, implement the highest-confidence fix.
+5. Update task/memory state with evidence before ending.
+
+## Suggested investigation plan
+1. Review current OpenClaw docs and local memory around subagent/ACP failures.
+2. Reproduce or inspect recent failures using session/task evidence instead of guessing.
+3. Check current runtime status / relevant logs / known local patches.
+4. If the issue is in OpenClaw core, work in `external/openclaw-upstream/` on a focused branch.
+5. Validate with the smallest reliable reproduction possible.
+
+## Evidence gathered so far
+- Fresh subagent run failed immediately with a provider auth error for `zai`, before any task execution.
+- Current installed agent auth profiles include `openai-codex:default`, `litellm:default`, and `github-copilot:github`; there is no `zai` profile configured.
+- Root cause for this immediate repro appears to be an incorrect explicit spawn model choice (`glm-5` alias → `zai/glm-5`) rather than missing auth propagation between agents.
+- Next step after confirming the model-selection issue: prefer `gpt-5.4` for fresh subagent reliability/debug passes for now, per Will's instruction, and continue separating real runtime issues from operator/config mistakes.
+
+## Constraints
+- Prefer evidence over theory.
+- Do not claim a fix without concrete validation.
+- Keep the main session clean; use this file as the canonical baton.
+
+## Success criteria
+- Clear diagnosis of the current reliability problem(s).
+- At least one of:
+  - implemented fix with validation, or
+  - sharply scoped next fix plan with exact evidence and files.
+- `memory/2026-03-13.md` (or current daily note), `memory/tasks.json`, and this WIP updated.
@@ -0,0 +1,15 @@
+# 2026-03-13
+
+## Subagent reliability investigation
+- Fresh implementation subagent launch for the subagent/ACP reliability pass failed immediately, before doing any task work.
+- Failure mode: delegated run was spawned with model `glm-5`, which resolved to provider model `zai/glm-5`.
+- Current agent auth profiles across installed agents include `openai-codex:default`, `litellm:default`, and `github-copilot:github`; there is no `zai` auth profile configured in agent auth stores.
+- Verified by inspecting agent auth profile keys under:
+  - `/home/openclaw/.openclaw/agents/*/agent/auth-profiles.json`
+- Relevant OpenClaw docs confirm:
+  - subagent spawns inherit caller model when `sessions_spawn.model` is omitted
+  - provider/model auth errors like `No API key found for provider "zai"` occur when a provider model is selected without matching auth
+  - multi-agent auth is per-agent via `~/.openclaw/agents/<agentId>/agent/auth-profiles.json`
+- Conclusion: the immediate failure was caused by an incorrect explicit model selection in the spawn request, not by missing auth propagation between agents.
+- Corrective action: retry fresh delegation with `litellm/glm-5` (the intended medium-tier routed model for delegated implementation work in this setup).
+- Will explicitly requested on 2026-03-13 to use `gpt-5.4` for subagents for now while debugging delegation reliability.
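
The failure recorded above (an explicit spawn model resolving to a provider with no auth profile) could be caught before spawning with a small pre-flight check. The following is a minimal sketch, not part of OpenClaw: the function names are hypothetical, and it assumes that profile ids have the `provider:name` shape seen in the ids listed above and that the model has already been resolved to a `provider/model` ref.

```python
from typing import Iterable, Optional, Set


def provider_of(model_ref: str) -> str:
    """Return the provider part of a resolved 'provider/model' ref.

    A bare alias such as 'glm-5' is rejected, because the provider it
    resolves to is not visible in the alias itself.
    """
    provider, sep, model = model_ref.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected a resolved 'provider/model' ref, got {model_ref!r}")
    return provider


def providers_with_auth(profile_ids: Iterable[str]) -> Set[str]:
    """Collect providers from profile ids (assumed shape: 'provider:name')."""
    return {profile_id.split(":", 1)[0] for profile_id in profile_ids}


def missing_auth_provider(model_ref: str, profile_ids: Iterable[str]) -> Optional[str]:
    """Return the provider that lacks an auth profile, or None if one exists."""
    provider = provider_of(model_ref)
    return None if provider in providers_with_auth(profile_ids) else provider


# Profile ids as recorded in the notes above.
PROFILES = ["openai-codex:default", "litellm:default", "github-copilot:github"]

if __name__ == "__main__":
    for ref in ("zai/glm-5", "litellm/glm-5"):
        missing = missing_auth_provider(ref, PROFILES)
        status = f"no auth profile for provider {missing!r}" if missing else "ok"
        print(f"{ref}: {status}")
```

Run against the recorded profile ids, the check flags `zai/glm-5` and accepts `litellm/glm-5`, which matches the conclusion that the failure was a model-selection mistake and not an auth-propagation problem. Reading the ids from each agent's `auth-profiles.json` is left out on purpose, since the file's internal layout is not described in these notes.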