docs(council): add experimental findings from all 3 flow types

- Tested parallel 1-round, sequential 1-round, debate/parallel 3-round
- 3 rounds is sweet spot: positions converge, meaningful evolution
- Sequential most token-efficient; parallel 3-round best depth-to-cost
- Debate and parallel 3-round mechanically identical (prompt tone differs)
- Added cost profiles, recommended defaults by use case
- Updated TODOs: unify flows, test 2-round, test mixed model tiers
2026-03-05 16:39:32 +00:00
parent da36000050
commit 3e198bcbb3
2 changed files with 45 additions and 1 deletions
@@ -23,3 +23,16 @@
  - Revisit advisor personality depth (richer backstories).
  - Revisit skill name ("council" is placeholder).
  - Experiment with different round counts and flows for optimal depth/cost tradeoffs.
+
+## Council experiments completed
+- Ran all 3 flow types on same topic ("Should AI assistants have persistent memory?"):
+  1. **Parallel 1-round** (Experiment 1): Fast, clean, independent perspectives. 4 subagent calls, ~60k tokens.
+  2. **Sequential 1-round** (Experiment 2): Tighter dialogue — later advisors build on earlier. 4 calls, ~55k tokens. Less redundancy.
+  3. **Debate/Parallel 3-round** (Experiment 3): Richest output. Positions evolved significantly across rounds (Visionary backed off always-on, Skeptic softened on trajectory). 10 calls, ~130k tokens.
+- Key findings:
+  - 3 rounds is the sweet spot for depth — positions converge by round 3.
+  - Sequential is most token-efficient for focused topics.
+  - Parallel 3-round has the best depth-to-cost ratio for substantive topics.
+  - Debate and parallel 3-round are mechanically identical — differ only in prompt tone.
+- Updated SKILL.md with experimental findings, recommended defaults by use case, cost profiles.
+- New TODOs added: unify debate/parallel flows, test 2-round sufficiency, test mixed model tiers.
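
The call counts and approximate token totals reported in the notes above can be compared with a little arithmetic. The following is a minimal sketch, not part of the commit: the `FlowRun` class, the `RUNS` list, and the `relative_cost` helper are hypothetical names introduced for illustration, and the only data used are the rounded figures quoted in the notes (4 calls at ~60k tokens, 4 calls at ~55k, 10 calls at ~130k).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRun:
    """One experiment from the notes (hypothetical container, not from SKILL.md)."""
    name: str
    calls: int           # subagent calls reported for the run
    approx_tokens: int   # rounded token total reported for the run

    @property
    def tokens_per_call(self) -> float:
        # Average tokens consumed per subagent call.
        return self.approx_tokens / self.calls


# Figures copied from the experiment notes; all are approximate.
RUNS = [
    FlowRun("parallel 1-round", 4, 60_000),
    FlowRun("sequential 1-round", 4, 55_000),
    FlowRun("debate/parallel 3-round", 10, 130_000),
]


def relative_cost(run: FlowRun, baseline: FlowRun) -> float:
    """Token total of `run` as a multiple of the baseline run's total."""
    return run.approx_tokens / baseline.approx_tokens


if __name__ == "__main__":
    baseline = RUNS[0]  # parallel 1-round as the reference point
    for run in RUNS:
        print(
            f"{run.name}: {run.tokens_per_call:,.0f} tokens/call, "
            f"{relative_cost(run, baseline):.2f}x baseline"
        )
```

Under these rounded figures the 3-round run costs a little over twice the parallel 1-round baseline in total tokens, while its average per call is the lowest of the three. Depth is not measured here, so this only covers the cost side of the depth-to-cost comparison.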