swarm-zap/skills/council/SKILL.md
zap 3e198bcbb3 docs(council): add experimental findings from all 3 flow types
- Tested parallel 1-round, sequential 1-round, debate/parallel 3-round
- 3 rounds is sweet spot: positions converge, meaningful evolution
- Sequential most token-efficient; parallel 3-round best depth-to-cost
- Debate and parallel 3-round mechanically identical (prompt tone differs)
- Added cost profiles, recommended defaults by use case
- Updated TODOs: unify flows, test 2-round, test mixed model tiers
2026-03-05 16:39:32 +00:00


name: council
description: Convene a council of AI advisor agents with distinct perspectives to deliberate on a topic, then synthesize their views into a verdict. Use when: (1) user asks for multi-perspective analysis, (2) wants to brainstorm with diverse viewpoints, (3) requests a council or advisors' opinion, (4) needs a balanced decision on a complex question. Supports parallel (default), sequential, and debate flows with configurable round count. NOT for: simple factual lookups, single-perspective tasks, or quick one-liner answers.

Council Skill

Spawn a council of 3 advisor subagents + 1 referee subagent to deliberate on a topic. Each advisor has a distinct personality/lens. The referee synthesizes their output into a final verdict with collapsed advisor perspectives.

Parameters

| Parameter | Default | Description |
|---|---|---|
| flow | parallel | parallel, sequential, or debate |
| rounds | 1 | Number of deliberation rounds (1-5). Round 1 = opening positions. Round 2+ = rebuttals where advisors see and respond to each other. |
| tier | light | Model tier: light, medium, or heavy (see Model Selection) |

Quick reference:

  • flow=parallel, rounds=1 — fast single-shot, all advisors in parallel, then referee (default)
  • flow=parallel, rounds=3 — parallel opening + 2 rebuttal rounds + referee (recommended for depth)
  • flow=sequential, rounds=1 — each advisor sees prior outputs, then referee
  • flow=debate, rounds=3 — parallel opening + cross-advisor rebuttals + referee synthesis
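
The parameters above could be captured in a small validated config object. This is a minimal sketch, not the skill's actual code: `CouncilConfig`, `resolved_rounds`, and `validate` are hypothetical names, and the debate-specific rules (minimum 2 rounds, default 3) are taken from the Debate flow description.

```python
from dataclasses import dataclass
from typing import Optional

FLOWS = ("parallel", "sequential", "debate")
TIERS = ("light", "medium", "heavy")


@dataclass(frozen=True)
class CouncilConfig:
    """Hypothetical container for the three documented parameters."""

    flow: str = "parallel"
    rounds: Optional[int] = None  # None means "use the flow's default"
    tier: str = "light"

    def resolved_rounds(self) -> int:
        """Apply the documented defaults: 3 for debate, 1 otherwise."""
        if self.rounds is not None:
            return self.rounds
        return 3 if self.flow == "debate" else 1

    def validate(self) -> None:
        """Raise ValueError if any parameter is outside the documented range."""
        if self.flow not in FLOWS:
            raise ValueError(f"unknown flow: {self.flow!r}")
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier!r}")
        rounds = self.resolved_rounds()
        if not 1 <= rounds <= 5:
            raise ValueError("rounds must be between 1 and 5")
        if self.flow == "debate" and rounds < 2:
            raise ValueError("debate needs at least 2 rounds")
```

Leaving `rounds` unset lets each flow pick its own default, so `CouncilConfig(flow="debate")` resolves to 3 rounds without the caller having to know that rule.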

Advisor Roster (default)

| Role | Lens | System stance |
|---|---|---|
| Pragmatist | Feasibility, cost, effort | "Can we actually do this?" |
| Visionary | Long-term potential, innovation | "What if we went bigger?" |
| Skeptic | Risk, failure modes, edge cases | "What could go wrong?" |

The referee is a separate agent: balanced, fair, synthesis-oriented.

Flows

1. Parallel + Synthesis (default)

Single-round version (rounds=1):

  1. Spawn all 3 advisors simultaneously via sessions_spawn (mode=run).
  2. Each advisor receives the same topic prompt with their personality instructions.
  3. Wait for all 3 to complete (push-based).
  4. Spawn the referee with all 3 advisor outputs.
  5. Referee produces the final verdict.
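
The single-round steps can be sketched with `asyncio`. Everything here is illustrative: `spawn_agent` is a hypothetical stand-in for `sessions_spawn` (mode=run) that returns canned text, the prompt wording is invented, and `asyncio.gather` only models "wait for all 3 to complete", not the push-based mechanism itself.

```python
import asyncio

ADVISORS = ("Pragmatist", "Visionary", "Skeptic")


async def spawn_agent(role: str, prompt: str) -> str:
    """Hypothetical stand-in for sessions_spawn (mode=run); returns canned text."""
    await asyncio.sleep(0)  # yield control, as a real spawn would while waiting
    return f"[{role}] response to: {prompt[:40]}"


async def run_parallel_single_round(topic: str) -> dict:
    """Run 3 advisors concurrently, then hand their outputs to the referee."""
    # Steps 1-3: start all advisors at once and wait for every result.
    outputs = await asyncio.gather(
        *(
            spawn_agent(role, f"As the {role}, give your opening position on: {topic}")
            for role in ADVISORS
        )
    )
    positions = dict(zip(ADVISORS, outputs))
    # Steps 4-5: the referee sees all three outputs and produces the verdict.
    referee_prompt = "Synthesize a verdict from:\n" + "\n".join(
        f"{role}: {text}" for role, text in positions.items()
    )
    verdict = await spawn_agent("Referee", referee_prompt)
    return {"positions": positions, "verdict": verdict}
```

A caller would run it with `asyncio.run(run_parallel_single_round("some topic"))`.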

Multi-round version (rounds=N):

  1. Round 1: Spawn all 3 advisors in parallel with opening position prompt.
  2. Collect all outputs.
  3. Round 2..N: For each rebuttal round, respawn all 3 advisors in parallel. Each receives:
    • Their own prior position(s)
    • All other advisors' prior round output
    • Round-specific instructions (rebuttal prompt for middle rounds, final position prompt for last round)
  4. Collect outputs after each round.
  5. Referee: Spawn referee with the full debate transcript (all rounds, all advisors).
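
What each advisor receives in a rebuttal round can be assembled from the collected outputs. The sketch below assumes a `history` list with one dict per completed round, keyed by advisor role. That layout and the names `round_kind` and `build_round_input` are hypothetical.

```python
from typing import Dict, List

# history[r][role] holds the text that `role` produced in round r + 1.
History = List[Dict[str, str]]


def round_kind(round_no: int, total_rounds: int) -> str:
    """Round 1 opens; the last round is final; anything between is a rebuttal."""
    if round_no == 1:
        return "opening"
    return "final" if round_no == total_rounds else "rebuttal"


def build_round_input(role: str, history: History, round_no: int, total_rounds: int) -> dict:
    """Collect what one advisor receives at the start of round `round_no` (>= 2)."""
    previous = history[-1]  # outputs of the round that just finished
    return {
        # The advisor's own positions from every earlier round.
        "own_prior": [entry[role] for entry in history],
        # Everyone else's output from the immediately preceding round.
        "others_prior": {r: text for r, text in previous.items() if r != role},
        # Which prompt template to use for this round.
        "instructions": round_kind(round_no, total_rounds),
    }
```

With `rounds=2` there is no middle round, so round 2 is treated as the final round under this scheme.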

2. Sequential Rounds

Single-round (rounds=1):

  1. Spawn advisors one at a time, each seeing prior advisor outputs.
  2. Spawn referee with full thread.

Multi-round (rounds=N):

  1. Round 1: Advisors go sequentially, each seeing prior advisors in that round.
  2. Round 2..N: Each advisor sees ALL prior round outputs before giving their rebuttal/final take.
  3. Referee: Gets the full thread.
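
A single sequential round amounts to a loop in which each prompt embeds the thread so far. In this sketch `spawn` is a hypothetical blocking callable standing in for `sessions_spawn`, and the prompt layout is invented for illustration.

```python
from typing import Callable, List, Tuple

ADVISORS = ("Pragmatist", "Visionary", "Skeptic")


def run_sequential_round(topic: str, spawn: Callable[[str, str], str]) -> List[Tuple[str, str]]:
    """Run one sequential round; `spawn(role, prompt)` is a hypothetical blocking call."""
    thread: List[Tuple[str, str]] = []
    for role in ADVISORS:
        # Each advisor sees everything said before them in this round.
        seen = "\n".join(f"{r}: {text}" for r, text in thread)
        prompt = f"Topic: {topic}\nPrior advisors:\n{seen or '(none yet)'}"
        thread.append((role, spawn(role, prompt)))
    return thread  # the referee would receive this full thread
```

The speaking order is fixed by `ADVISORS` here, so the Skeptic goes last and sees both earlier positions.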

3. Debate

Always multi-round (minimum rounds=2, default rounds=3 for this flow):

  1. Round 1: Parallel opening takes.
  2. Round 2..N-1: Cross-rebuttals — each advisor responds to all others.
  3. Round N: Final positions.
  4. Referee: Gets full debate transcript, notes evolution of positions.

Model Selection

Pick model tier based on topic complexity:

  • light (casual brainstorm, simple pros/cons): default model for advisors and referee.
  • medium (architecture decisions, strategy): default model for advisors, stronger model for referee.
  • heavy (critical decisions, deep analysis): stronger model for all agents.

The caller (main agent) determines tier before spawning.
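
The tier rules reduce to a lookup table. In this sketch `"default"` and `"stronger"` are placeholders, since the skill does not name concrete models, and `model_for` is a hypothetical helper.

```python
from typing import Dict

# "default" and "stronger" are placeholders; the source does not name models.
TIER_MODELS: Dict[str, Dict[str, str]] = {
    "light": {"advisor": "default", "referee": "default"},
    "medium": {"advisor": "default", "referee": "stronger"},
    "heavy": {"advisor": "stronger", "referee": "stronger"},
}


def model_for(tier: str, role: str) -> str:
    """Return the model class for a role ("advisor" or "referee") at a tier."""
    try:
        return TIER_MODELS[tier][role]
    except KeyError:
        raise ValueError(f"unknown tier/role: {tier!r}/{role!r}") from None
```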

Round-Specific Prompt Guidance

See references/prompts.md for all prompt templates. Key points:

  • Round 1 (Opening): Full advisor system prompt + topic. Ask for opening position.
  • Middle rounds (Rebuttals): Include prior positions from ALL advisors. Ask: where do you agree, push back, or change your mind? Keep shorter (200-300 words).
  • Final round: Ask for final synthesis — what changed, what held firm, final recommendation in 2-3 sentences. Keep shortest (150-250 words).
  • Referee (multi-round): Include the FULL debate transcript organized by round. Ask referee to note position evolution, not just final states.
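
One way to give the referee a transcript organized by round is sketched below. It assumes one dict of advisor outputs per round. The function name and the exact text layout are hypothetical, and the real templates live in references/prompts.md.

```python
from typing import Dict, List


def format_transcript(history: List[Dict[str, str]]) -> str:
    """Render history[r][role] as a transcript grouped by round, oldest first."""
    sections = []
    for round_no, outputs in enumerate(history, start=1):
        lines = [f"Round {round_no}"]
        lines += [f"{role}: {text}" for role, text in outputs.items()]
        sections.append("\n".join(lines))
    # Blank line between rounds so position changes are easy to trace.
    return "\n\n".join(sections)
```

Grouping by round rather than by advisor keeps each exchange together, which is what the referee needs in order to note how positions evolved.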

Experimental Findings

Tested all 3 flows on the same topic ("Should AI assistants have persistent memory?"):

Parallel 1-round vs Parallel 3-round

  • 1-round: Fast, good for quick takes. Advisors give independent positions, referee synthesizes. Clean but no cross-pollination — advisors can't respond to each other's arguments.
  • 3-round: Significantly richer. Positions evolved meaningfully — the Visionary stepped back from always-on after engaging with Skeptic's arguments, the Skeptic softened on trajectory. Referee captured evolution. Best overall depth-to-cost ratio.
  • Takeaway: 3 rounds is the sweet spot. 1 round works for quick brainstorms. More than 3 rounds likely hits diminishing returns, since positions had converged by round 3.

Sequential vs Parallel

  • Sequential: Later advisors build directly on earlier ones — less redundancy, more focused rebuttals. The Skeptic (speaking last) gave the sharpest response because they could address both prior positions directly. But earlier advisors can't respond to later ones without extra rounds.
  • Parallel: Advisors are more independent, sometimes overlapping. But each brings a genuinely uninfluenced perspective in round 1, which can surface blind spots that sequential misses.
  • Takeaway: Sequential produces tighter dialogue within a single round (3 advisors + 1 referee = 4 calls). Parallel gives more independent coverage, but a single parallel round (also 4 calls) has no cross-talk, so it needs multiple rounds for depth (3 advisors x 3 rounds + 1 referee = 10 calls).
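
The call counts quoted above follow from one formula: each advisor is spawned once per round, plus one referee call at the end. A small sketch, with a hypothetical function name:

```python
def subagent_calls(rounds: int, advisors: int = 3, referees: int = 1) -> int:
    """Total spawns for a run: every advisor once per round, then the referee."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    return advisors * rounds + referees


print(subagent_calls(1))  # 4
print(subagent_calls(3))  # 10
```

The count depends only on the number of rounds, not on the flow, so sequential and parallel cost the same number of calls at equal `rounds`.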

Debate (parallel 3-round) vs Parallel 3-round

  • The flows are mechanically identical in our implementation. The distinction is mainly about prompt framing — debate prompts emphasize direct engagement ("respond to the Visionary's claim that...") while parallel rebuttal prompts are more general ("where do you agree or push back?").
  • Takeaway: These can be unified. The "debate" label is useful for user-facing intent ("I want them to argue") but doesn't need a separate mechanical flow.
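
If the flows were unified as suggested, "debate" could be resolved into a mechanical flow plus a prompt tone before orchestration starts. This is only a sketch of that idea: `normalize_flow` and the tone labels `"combative"` and `"general"` are hypothetical, not part of the current implementation.

```python
from typing import Tuple


def normalize_flow(flow: str) -> Tuple[str, str]:
    """Map a user-facing flow to (mechanical flow, prompt tone)."""
    if flow == "debate":
        # Mechanically the same as parallel; only the prompt framing differs.
        return ("parallel", "combative")
    if flow in ("parallel", "sequential"):
        return (flow, "general")
    raise ValueError(f"unknown flow: {flow!r}")
```

The orchestration code would then branch only on the mechanical flow, and the tone would select which rebuttal template to load.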

Cost profile (approximate, per run on default model tier)

  • Parallel 1-round: ~4 subagent calls, ~60k tokens total
  • Sequential 1-round: ~4 subagent calls, ~55k tokens total (slightly less due to no parallel redundancy)
  • Parallel/Debate 3-round: ~10 subagent calls, ~130k tokens total

Recommended defaults by use case

  • Quick brainstorm: flow=parallel, rounds=1 — fast, cheap, good enough for casual topics
  • Balanced analysis: flow=parallel, rounds=3 — best depth-to-cost ratio, recommended default for substantive topics
  • Tight dialogue: flow=sequential, rounds=1 — fewest calls, good for focused topics where building on each other matters
  • Deep dive: flow=debate, rounds=3 — same as parallel 3-round with more combative prompting
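
The use-case recommendations and the approximate token figures can be combined into one lookup. The sketch below uses hypothetical names (`PRESETS`, `APPROX_TOKENS`, `plan`); the token numbers are the rough per-run estimates quoted in this section, not measured constants.

```python
from typing import Dict, Tuple

# Use case -> (flow, rounds), as recommended above.
PRESETS: Dict[str, Tuple[str, int]] = {
    "quick brainstorm": ("parallel", 1),
    "balanced analysis": ("parallel", 3),
    "tight dialogue": ("sequential", 1),
    "deep dive": ("debate", 3),
}

# Approximate totals from the cost profile (default model tier).
APPROX_TOKENS: Dict[Tuple[str, int], int] = {
    ("parallel", 1): 60_000,
    ("sequential", 1): 55_000,
    ("parallel", 3): 130_000,
    ("debate", 3): 130_000,
}


def plan(use_case: str) -> dict:
    """Return the recommended parameters and rough token cost for a use case."""
    flow, rounds = PRESETS[use_case]
    return {"flow": flow, "rounds": rounds, "approx_tokens": APPROX_TOKENS[(flow, rounds)]}
```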

Implementation

Read scripts/council.sh for the orchestration logic. For programmatic invocation, the main agent calls sessions_spawn directly following the patterns above.

Configuration

Advisor personalities can be customized per-invocation by overriding the roster. Default roster and prompt templates live in references/prompts.md.
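
A per-invocation override could look like the sketch below. It assumes that an override replaces the whole roster rather than merging with the default; the source does not specify which. `Advisor` and `resolve_roster` are hypothetical names, and the default entries mirror the roster table above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Advisor:
    lens: str
    stance: str


DEFAULT_ROSTER: Dict[str, Advisor] = {
    "Pragmatist": Advisor("Feasibility, cost, effort", "Can we actually do this?"),
    "Visionary": Advisor("Long-term potential, innovation", "What if we went bigger?"),
    "Skeptic": Advisor("Risk, failure modes, edge cases", "What could go wrong?"),
}


def resolve_roster(override: Optional[Dict[str, Advisor]] = None) -> Dict[str, Advisor]:
    """Use the override roster when given, otherwise a copy of the default."""
    # Assumption: an override replaces the default roster wholesale.
    return dict(override) if override else dict(DEFAULT_ROSTER)
```

Returning a copy keeps callers from mutating `DEFAULT_ROSTER` by accident between invocations.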

TODO (revisit later)

  • Revisit subagent personality depth — richer backstories, communication styles
  • Revisit skill name — "council" works for now
  • Consider unifying debate and parallel flows (mechanically identical, differ only in prompt tone)
  • Explore whether 2 rounds is sufficient for most topics (vs 3)
  • Test with different model tiers for advisors vs referee