docs(state): seed subagent reliability investigation
108
HANDOFF.md
@@ -4,92 +4,36 @@
Immediate baton-pass for the next fresh implementation session.

## Current objective
The Gmail + Calendar n8n action-bus WIP is complete and live. Next fresh session should review `WIP.drive-docs-sheets.md` and decide whether Drive / Docs / Sheets need action-bus verbs at all, while preserving the approval/history contract that now exists for Gmail + Calendar.
Investigate and improve subagent / ACP delegation reliability with evidence-first debugging. Focus on current failure modes, what is already fixed, and the highest-confidence next fix.

## Use these state files first
1. `WIP.md` — completed Google Workspace + n8n implementation record
2. `WIP.drive-docs-sheets.md` — proposed next-phase decision WIP
3. `memory/2026-03-12.md` — detailed execution history and evidence
4. `memory/tasks.json` — task status tracking
1. `WIP.subagent-reliability.md` — canonical state for this pass
2. `memory/tasks.json` — task tracking for reliability items
3. `memory/2026-03-04-subagent-delegation.md` — earlier delegation context
4. `memory/2026-03-13.md` if present, otherwise append today’s evidence there
5. `external/openclaw-upstream/` — for any core-runtime fix work

## What is already true
- `openclaw-action` is live in n8n and active.
- Google auth via `gog` is working headlessly through local env auto-load.
- Local automation env lives in `/home/openclaw/.openclaw/credentials/gog.env` and stays out of git.
- Host bridge exists at `skills/n8n-webhook/scripts/resolve-approval-with-gog.py`.
- Real approval-routed Gmail draft and Calendar event flows have both been verified multiple times end-to-end with cleanup.
## Related tasks
- `task-20260304-2215-subagent-reliability` — in progress
- `task-20260304-211216-acp-claude-codex` — open

## Fresh-session proof completed (2026-03-12 19:44Z)
- Gmail draft flow (`send_email_draft`):
  - approval id: `approval-mmnvn4t2-w2rjlwz2`
  - draft id: `r-3319106208870238577`
  - subject: `[zap n8n e2e] Gmail draft test 20260312T194450Z`
  - verified via `gog gmail drafts get`
  - cleaned via `gog gmail drafts delete --force`
- Calendar event flow (`create_calendar_event`):
  - approval id: `approval-mmnvn6i8-e9eq8gdf`
  - event id: `m7prri8vk2opuo6loq3qgtvsv4`
  - title: `[zap n8n e2e] Calendar test 20260312T194450Z`
  - verified via `gog calendar get primary <eventId>`
  - cleaned via `gog calendar delete primary <eventId> --force`

## Gmail pass 1 completed in this handoff cycle
- Added workflow actions:
  - `list_email_drafts`
  - `delete_email_draft`
  - `send_gmail_draft` (alias: `send_approved_email`)
- Added host bridge executors:
  - `email_list_drafts` (`gog gmail drafts list`)
  - `email_draft_delete` (`gog gmail drafts delete`)
  - `email_draft_send` (`gog gmail drafts send`)
- Added explicit approval metadata in workflow responses (`approval.policy`, `approval.required`, `approval.mutation_level`).
- Updated docs/test payloads/validator to match the expanded Gmail contract.

## Calendar pass 2 completed in this handoff cycle
- Added workflow actions:
  - `list_upcoming_events`
  - `update_calendar_event`
  - `delete_calendar_event`
- Added host bridge executors:
  - `calendar_list_events` (`gog calendar events`)
  - `calendar_event_update` (`gog calendar update`)
  - `calendar_event_delete` (`gog calendar delete`)
- Preserved explicit approval policy:
  - read-only calendar listing stays `low`
  - mutating calendar update/delete stay `high`
- Added docs/test payloads/validator coverage for the expanded calendar contract.
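
The calendar approval policy above (read-only listing stays `low`, mutating update/delete stay `high`) can be sketched as a small lookup. This is a minimal illustration, not the workflow's actual code: only the action names and the three metadata field names come from this handoff. The `approval_metadata` helper, the rule that only `high` actions require approval, the `"calendar"` policy value, and the fail-closed default for unknown actions are all assumptions.

```python
from typing import Dict

# Mutation levels per calendar action, as stated in the handoff.
CALENDAR_MUTATION_LEVELS: Dict[str, str] = {
    "list_upcoming_events": "low",    # read-only
    "update_calendar_event": "high",  # mutating
    "delete_calendar_event": "high",  # mutating
}


def approval_metadata(action: str) -> Dict[str, object]:
    """Build an approval block for a calendar action (hypothetical helper)."""
    # Assumption: unknown actions fail closed and are treated as mutating.
    level = CALENDAR_MUTATION_LEVELS.get(action, "high")
    return {
        "policy": "calendar",         # assumed: policy named after the family
        "required": level == "high",  # assumed: only high-level actions gate
        "mutation_level": level,
    }
```

Keeping the levels in one table makes the "explicit by family" default easy to audit when new verbs are added.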
## Known truths
- TUI noise suppression was already patched locally and upstreamed earlier.
- User still wants actual subagent reliability improved, not just UI noise hidden.
- Prior ACP failures included Claude/Codex runtime exits.
- Fresh-session implementation discipline is now the expected approach for non-trivial work.

## Highest-priority next actions
1. Review `WIP.drive-docs-sheets.md` and make a go / no-go call per surface: Drive, Docs, Sheets.
2. If any new Google actions are added, keep approval defaults explicit by family (`notification`, `gmail`, `calendar`, `manual`, and any new family names).
3. Preserve compact operator reporting (`pending_compact`, `history_compact`, `summary_line`, `result_refs`) for any new approval-backed actions.
4. Keep the live deployment habit: after implementation, sync the live workflow and run a safe smoke test instead of trusting static validation alone.
1. Inspect prior task/session evidence and current runtime state.
2. Reproduce or otherwise concretely characterize present failures.
3. Split findings into:
   - ACP runtime issues
   - generic subagent/session issues
   - completion-event / delivery issues
4. If a fix is feasible now, implement the smallest high-confidence fix and validate it.
5. Update WIP + memory + tasks before ending.
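
One way to keep the three-way split in step 3 honest is to record each finding with its category and an evidence pointer, and reject anything that lacks either. The sketch below is illustrative only: the `Finding` type, the category identifiers, and `group_findings` are hypothetical names, not part of any existing state file or tool.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical identifiers for the three buckets named in step 3.
CATEGORIES = ("acp_runtime", "subagent_session", "completion_delivery")


@dataclass(frozen=True)
class Finding:
    category: str  # one of CATEGORIES
    summary: str   # one-line description of the failure
    evidence: str  # path, command, or log reference backing the claim


def group_findings(findings: List[Finding]) -> Dict[str, List[Finding]]:
    """Group findings by category, refusing uncategorized or unevidenced ones."""
    grouped: Dict[str, List[Finding]] = {name: [] for name in CATEGORIES}
    for finding in findings:
        if finding.category not in grouped:
            raise ValueError(f"unknown category: {finding.category!r}")
        if not finding.evidence.strip():
            raise ValueError(f"finding without evidence: {finding.summary!r}")
        grouped[finding.category].append(finding)
    return grouped
```

Operator or config mistakes would need their own bucket if they are tracked the same way; the three categories here simply mirror the list above.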

## Success criteria for the next session
- Clear go/no-go decision on expanding beyond Gmail + Calendar.
- Any new verbs inherit the same safe approval defaults and low-noise history contract.
- `WIP.md` and memory updated with concrete evidence.
- Meaningful commit(s) captured.

## Relevant files
- `WIP.md`
- `HANDOFF.md`
- `skills/n8n-webhook/assets/openclaw-action.workflow.json`
- `skills/n8n-webhook/scripts/call-action.sh`
- `skills/n8n-webhook/scripts/resolve-approval-with-gog.py`
- `skills/n8n-webhook/references/openclaw-action.md`
- `memory/2026-03-12.md`
- `memory/tasks.json`
- `/home/openclaw/.openclaw/credentials/gog.env` (local-only)

## Relevant branch / commits
- branch: `feat/n8n-action-bus-v2`
- latest checkpoints before this handoff include:
  - `ffe7a6b` — add operator approval runbook
  - `249e671` — add compact approval history views
  - `afa48a3` — bridge approvals to gog executors
  - `044e36f` — auto-load local gog automation env
  - `06fa582` — track google workspace and n8n plan

## Operator note
Use the live n8n public API/webhook surface directly when it is the right path. Do not act blocked on n8n API access.
## Success criteria
- Clear current-state diagnosis.
- Evidence-backed fix or sharply scoped next fix plan.
- State files updated with paths, commands, and outcomes.
54
WIP.subagent-reliability.md
Normal file
@@ -0,0 +1,54 @@
# WIP.subagent-reliability.md

## Status
Status: `open`
Owner: `zap`
Opened: `2026-03-13`

## Purpose
Investigate and improve subagent / ACP delegation reliability, including timeout behavior, runtime failures, and delayed/duplicate completion-event noise.

## Why now
This is the highest-leverage remaining open reliability item because it affects trust in delegation and the usability of fresh implementation runs.

## Related tasks
- `task-20260304-2215-subagent-reliability` — in progress
- `task-20260304-211216-acp-claude-codex` — open

## Known context
- Prior work already patched TUI formatting to suppress internal runtime completion context blocks.
- Upstream patch exists in `external/openclaw-upstream` on branch `fix/tui-hide-internal-runtime-context` commit `0f66a4547`.
- User explicitly wants subagent tooling reliability fixed and completion-event spam prevented.
- Fresh-session implementation discipline and monitoring thresholds were already documented locally.

## Goals for this pass
1. Establish the current failure modes with concrete evidence.
2. Separate ACP-specific failures from generic subagent/session issues.
3. Determine what is already fixed versus still broken.
4. Produce a concrete recommendation and, if feasible in one pass, implement the highest-confidence fix.
5. Update task/memory state with evidence before ending.

## Suggested investigation plan
1. Review current OpenClaw docs and local memory around subagent/ACP failures.
2. Reproduce or inspect recent failures using session/task evidence instead of guessing.
3. Check current runtime status / relevant logs / known local patches.
4. If the issue is in OpenClaw core, work in `external/openclaw-upstream/` on a focused branch.
5. Validate with the smallest reliable reproduction possible.

## Evidence gathered so far
- Fresh subagent run failed immediately with provider auth error for `zai` before any task execution.
- Current installed agent auth profiles include `openai-codex:default`, `litellm:default`, and `github-copilot:github`; there is no `zai` profile configured.
- Root cause for this immediate repro appears to be an incorrect explicit spawn model choice (`glm-5` alias → `zai/glm-5`) rather than missing auth propagation between agents.
- Next step after confirming the model-selection issue: prefer `gpt-5.4` for fresh subagent reliability/debug passes for now, per Will's instruction, and continue separating real runtime issues from operator/config mistakes.
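
The failure described above (alias `glm-5` resolving to `zai/glm-5` with no `zai` profile) is the kind of mistake a pre-spawn check could catch before a run is launched. The following is a minimal sketch under stated assumptions, not OpenClaw's actual resolution logic: it assumes model references have the form `provider/model`, that profile keys have the form `provider:name`, and that the alias table is a plain mapping. The function names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def resolve_model(model: str, aliases: Dict[str, str]) -> str:
    """Expand an alias such as 'glm-5' to a provider-qualified reference."""
    return aliases.get(model, model)


def provider_of(model_ref: str) -> Optional[str]:
    """Return the provider prefix of 'provider/model', or None if absent."""
    provider, sep, _ = model_ref.partition("/")
    return provider if sep else None


def preflight(
    model: str, aliases: Dict[str, str], profile_keys: Iterable[str]
) -> Optional[str]:
    """Return an error message, or None if the provider has an auth profile."""
    resolved = resolve_model(model, aliases)
    provider = provider_of(resolved)
    if provider is None:
        return f"model {resolved!r} has no provider prefix"
    # Assumption: the part before ':' in a profile key names the provider.
    configured = {key.split(":", 1)[0] for key in profile_keys}
    if provider not in configured:
        return f"no auth profile for provider {provider!r} (model {resolved!r})"
    return None


if __name__ == "__main__":
    aliases = {"glm-5": "zai/glm-5"}  # alias mapping reported in this WIP
    profiles = ["openai-codex:default", "litellm:default", "github-copilot:github"]
    print(preflight("glm-5", aliases, profiles))          # an error message
    print(preflight("litellm/glm-5", aliases, profiles))  # None
```

A check like this separates operator/config mistakes from real runtime failures early, which is the distinction this pass is trying to draw.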

## Constraints
- Prefer evidence over theory.
- Do not claim a fix without concrete validation.
- Keep the main session clean; use this file as the canonical baton.

## Success criteria
- Clear diagnosis of the current reliability problem(s).
- At least one of:
  - implemented fix with validation, or
  - sharply scoped next fix plan with exact evidence and files.
- `memory/2026-03-13.md` (or current daily note), `memory/tasks.json`, and this WIP updated.
15
memory/2026-03-13.md
Normal file
@@ -0,0 +1,15 @@
# 2026-03-13

## Subagent reliability investigation
- Fresh implementation subagent launch for subagent/ACP reliability failed immediately before doing any task work.
- Failure mode: delegated run was spawned with model `glm-5`, which resolved to provider model `zai/glm-5`.
- Current agent auth profiles across installed agents include `openai-codex:default`, `litellm:default`, and `github-copilot:github`; there is no `zai` auth profile configured in agent auth stores.
- Verified by inspecting agent auth profile keys under:
  - `/home/openclaw/.openclaw/agents/*/agent/auth-profiles.json`
- Relevant OpenClaw docs confirm:
  - subagent spawns inherit caller model when `sessions_spawn.model` is omitted
  - provider/model auth errors like `No API key found for provider "zai"` occur when a provider model is selected without matching auth
  - multi-agent auth is per-agent via `~/.openclaw/agents/<agentId>/agent/auth-profiles.json`
- Conclusion: the immediate failure was caused by an incorrect explicit model selection in the spawn request, not by missing auth propagation between agents.
- Corrective action: retry fresh delegation with `litellm/glm-5` (the intended medium-tier routed model for delegated implementation work in this setup).
- Will explicitly requested on 2026-03-13 to use `gpt-5.4` for subagents for now while debugging delegation reliability.
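
The profile-key inspection and the model-inheritance rule noted above could be repeated with a short script along these lines. It is a sketch only: the internal schema of `auth-profiles.json` is not documented in this note, so the code assumes that profile keys are JSON object keys containing a colon (as in `litellm:default`) and collects them wherever they appear. `collect_profile_keys` and `effective_model` are hypothetical helper names.

```python
import glob
import json
import os
from typing import Optional, Set


def _walk(node: object, keys: Set[str]) -> None:
    """Collect every dict key containing ':' anywhere in the JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(key, str) and ":" in key:
                keys.add(key)
            _walk(value, keys)
    elif isinstance(node, list):
        for item in node:
            _walk(item, keys)


def collect_profile_keys(agents_root: str) -> Set[str]:
    """Gather profile keys from <agents_root>/*/agent/auth-profiles.json."""
    keys: Set[str] = set()
    pattern = os.path.join(agents_root, "*", "agent", "auth-profiles.json")
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            _walk(json.load(handle), keys)
    return keys


def effective_model(requested: Optional[str], caller_model: str) -> str:
    """Per the docs cited above: an omitted spawn model inherits the caller's."""
    return requested if requested else caller_model


if __name__ == "__main__":
    root = os.path.expanduser("~/.openclaw/agents")
    for key in sorted(collect_profile_keys(root)):
        print(key)
```

Because auth is per-agent, a stricter variant would report keys per agent directory rather than as one merged set.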