2026-03-05
Council skill created and iterated
- Built `skills/council/` — multi-perspective advisory council using subagents.
- Design decisions (agreed with Will):
- Implemented as a skill (not standalone agents).
- 3 advisors (Pragmatist, Visionary, Skeptic) + 1 Referee = 4 subagents total.
- Referee is a separate subagent (not the session model) — can use a stronger model tier.
- Default flow: Parallel + Synthesis. Sequential and Debate flows also available.
- Final output includes individual advisor perspectives (collapsed/summarized) + referee verdict.
- Model tier chosen per-invocation based on topic complexity.
- Two live tests run:
- Test 1: Parallel single-round on "Do LLM agents think?" — worked well.
- Test 2: Parallel 3-round debate on same topic — richer output, positions evolved meaningfully across rounds.
- Post-test iteration: updated skill with configurable parameters: `flow` (parallel/sequential/debate), `rounds` (1-5), `tier` (light/medium/heavy)
- Round-specific prompt templates (opening, rebuttal, final position)
- Multi-round referee template that tracks position evolution
- Word count guidance that decreases per round to control token cost
- Subagent labeling convention: `council-r{round}-{role}`
- Files: `SKILL.md`, `references/prompts.md`, `scripts/council.sh` (reference doc).
- TODOs in `memory/tasks.json`:
- Revisit advisor personality depth (richer backstories).
- Revisit skill name ("council" is placeholder).
- Experiment with different round counts and flows for optimal depth/cost tradeoffs.
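The configurable parameters and labeling convention above can be sketched roughly as follows. This is a hypothetical illustration only — the `CouncilConfig` dataclass, its defaults, and the word-count decay value are assumptions, not the skill's actual implementation:

```python
from dataclasses import dataclass

ROLES = ["pragmatist", "visionary", "skeptic"]  # + a separate referee subagent

@dataclass
class CouncilConfig:
    flow: str = "parallel"   # parallel | sequential | debate
    rounds: int = 3          # 1-5
    tier: str = "medium"     # light | medium | heavy

def round_labels(cfg: CouncilConfig) -> list[str]:
    # Follows the council-r{round}-{role} labeling convention
    return [
        f"council-r{r}-{role}"
        for r in range(1, cfg.rounds + 1)
        for role in ROLES
    ]

def word_budget(round_num: int, base: int = 400, decay: float = 0.7) -> int:
    # Word-count guidance decreasing per round to control token cost
    # (base and decay values here are made up for illustration)
    return int(base * decay ** (round_num - 1))
```

For example, `round_labels(CouncilConfig(rounds=2))` starts with `council-r1-pragmatist`, and the per-round word budget shrinks geometrically across rounds.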
Council experiments completed
- Ran all 3 flow types on same topic ("Should AI assistants have persistent memory?"):
- Parallel 1-round (Experiment 1): Fast, clean, independent perspectives. 4 subagent calls, ~60k tokens.
- Sequential 1-round (Experiment 2): Tighter dialogue — later advisors build on earlier responses. 4 calls, ~55k tokens. Less redundancy.
- Debate/Parallel 3-round (Experiment 3): Richest output. Positions evolved significantly across rounds (Visionary backed off always-on, Skeptic softened on trajectory). 10 calls, ~130k tokens.
- Key findings:
- 3 rounds is the sweet spot for depth — positions converge by round 3.
- Sequential is most token-efficient for focused topics.
- Parallel 3-round is best depth-to-cost ratio for substantive topics.
- Debate and parallel 3-round are mechanically identical — differ only in prompt tone.
- Updated SKILL.md with experimental findings, recommended defaults by use case, cost profiles.
- New TODOs added: unify debate/parallel flows, test 2-round sufficiency, test mixed model tiers.
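The call counts in the experiments above follow a simple pattern: one invocation per advisor per round, plus a single referee call. A minimal sketch (function name and signature are hypothetical):

```python
ADVISORS = 3  # Pragmatist, Visionary, Skeptic

def call_count(rounds: int, advisors: int = ADVISORS) -> int:
    # Each advisor runs once per round regardless of flow (parallel,
    # sequential, and debate are mechanically identical here),
    # plus one referee call at the end.
    return rounds * advisors + 1

# Matches the experiment logs:
#   1-round (parallel or sequential) -> 4 calls
#   3-round (parallel/debate)        -> 10 calls
```

This also makes the depth-to-cost tradeoff explicit: rounds scale cost linearly, so testing 2-round sufficiency (per the TODO) would land at 7 calls.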