docs(agent): clarify stuck-subagent monitoring

This commit is contained in:
zap
2026-03-13 00:21:17 +00:00
parent 3bb3888340
commit 7669d5787d
2 changed files with 4 additions and 0 deletions

View File

@@ -192,14 +192,17 @@ Handoff rule:
Subagent drift / stuck rule: Subagent drift / stuck rule:
- if a fresh implementation subagent is no longer making crisp progress, inspect before waiting longer - if a fresh implementation subagent is no longer making crisp progress, inspect before waiting longer
- default stance: keep a light eye on active fresh subagents instead of assuming they are fine until completion
- monitoring cadence for fresh implementation runs: - monitoring cadence for fresh implementation runs:
- do not routine-poll in the first 5 minutes unless the task is very small or something already looks wrong - do not routine-poll in the first 5 minutes unless the task is very small or something already looks wrong
- at ~5 minutes, if the run is still active, do one lightweight status check - at ~5 minutes, if the run is still active, do one lightweight status check
- at ~10 minutes, if still active, inspect the child session/history once for concrete evidence of edits/tests/commits - at ~10 minutes, if still active, inspect the child session/history once for concrete evidence of edits/tests/commits
- if the user explicitly asks to keep an eye on it, do sparse follow-up checks and answer plainly whether it looks productively running or stuck
- treat these as intervention triggers: - treat these as intervention triggers:
- the run is still active after a reasonable window for the task and has not updated `WIP.md` - the run is still active after a reasonable window for the task and has not updated `WIP.md`
- the run is looping on broad reads/re-verification without landing state updates or commits - the run is looping on broad reads/re-verification without landing state updates or commits
- the completion result is unusable, missing evidence, or obviously unrelated to the assigned pass - the completion result is unusable, missing evidence, or obviously unrelated to the assigned pass
- a status inspection shows repeated low-value tool churn without advancing files/tests/state
- concrete time thresholds: - concrete time thresholds:
- narrow/scoped pass (single docs/config/script task): suspiciously long at ~12 minutes, intervene by ~15 minutes unless recent inspection shows crisp progress - narrow/scoped pass (single docs/config/script task): suspiciously long at ~12 minutes, intervene by ~15 minutes unless recent inspection shows crisp progress
- medium implementation pass (like one bounded feature slice): suspiciously long at ~20 minutes, intervene by ~25 minutes unless recent inspection shows crisp progress - medium implementation pass (like one bounded feature slice): suspiciously long at ~20 minutes, intervene by ~25 minutes unless recent inspection shows crisp progress

View File

@@ -13,3 +13,4 @@
- Conclusion: the immediate failure was caused by an incorrect explicit model selection in the spawn request, not by missing auth propagation between agents. - Conclusion: the immediate failure was caused by an incorrect explicit model selection in the spawn request, not by missing auth propagation between agents.
- Corrective action: retry fresh delegation with `litellm/glm-5` (the intended medium-tier routed model for delegated implementation work in this setup). - Corrective action: retry fresh delegation with `litellm/glm-5` (the intended medium-tier routed model for delegated implementation work in this setup).
- Will explicitly requested on 2026-03-13 to use `gpt-5.4` for subagents for now while debugging delegation reliability. - Will explicitly requested on 2026-03-13 to use `gpt-5.4` for subagents for now while debugging delegation reliability.
- Will also explicitly requested that zap keep a light eye on active subagents and check whether they look stuck instead of assuming they are fine until completion.