swarm-zap/memory/2026-03-05.md
zap e7051a617f docs(council): add Flynn council pipeline fix plan
- 5-phase plan: config, structured output, bridge caps, E2E run, zap integration
- Work to happen on fix/council-pipeline branch in ~/flynn
- Goal: get Flynn's dual-council working so zap can delegate to it
2026-03-05 19:00:58 +00:00

2026-03-05

Council skill created and iterated

  • Built skills/council/ — multi-perspective advisory council using subagents.
  • Design decisions (agreed with Will):
    • Implemented as a skill (not standalone agents).
    • 3 advisors (Pragmatist, Visionary, Skeptic) + 1 Referee = 4 subagents total.
    • Referee is a separate subagent (not the session model) — can use a stronger model tier.
    • Default flow: Parallel + Synthesis. Sequential and Debate flows also available.
    • Final output includes individual advisor perspectives (collapsed/summarized) + referee verdict.
    • Model tier is chosen per invocation based on topic complexity.
  • Two live tests run:
    • Test 1: Parallel single-round on "Do LLM agents think?" — worked well.
    • Test 2: Parallel 3-round debate on same topic — richer output, positions evolved meaningfully across rounds.
  • Post-test iteration: updated skill with configurable parameters:
    • flow (parallel/sequential/debate), rounds (1-5), tier (light/medium/heavy)
    • Round-specific prompt templates (opening, rebuttal, final position)
    • Multi-round referee template that tracks position evolution
    • Word count guidance that decreases per round to control token cost
    • Subagent labeling convention: council-r{round}-{role}
  • Files: SKILL.md, references/prompts.md, scripts/council.sh (reference doc).
  • TODOs in memory/tasks.json:
    • Revisit advisor personality depth (richer backstories).
    • Revisit skill name ("council" is placeholder).
    • Experiment with different round counts and flows for optimal depth/cost tradeoffs.
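
The configurable parameters above can be sketched as a small planning helper. This is a minimal Python sketch, not the skill's actual implementation. The names CouncilConfig, template_for, label and plan_calls, the lowercase role names, the template identifiers, the default tier, and the mapping of rounds to templates (round 1 opening, last round final position, anything in between rebuttal) are all assumptions. It validates flow/rounds/tier against the documented options and lists every subagent call with its council-r{round}-{role} label.

```python
from dataclasses import dataclass

# Allowed values from the notes; lowercase role names are an assumption.
FLOWS = ("parallel", "sequential", "debate")
TIERS = ("light", "medium", "heavy")
ADVISORS = ("pragmatist", "visionary", "skeptic")
REFEREE = "referee"


@dataclass(frozen=True)
class CouncilConfig:
    flow: str = "parallel"  # default flow is Parallel + Synthesis
    rounds: int = 1         # notes allow 1-5 rounds
    tier: str = "medium"    # hypothetical default; tier is picked per invocation

    def __post_init__(self) -> None:
        # Reject values outside the documented options.
        if self.flow not in FLOWS:
            raise ValueError(f"unknown flow: {self.flow!r}")
        if not 1 <= self.rounds <= 5:
            raise ValueError(f"rounds must be between 1 and 5, got {self.rounds}")
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier!r}")


def template_for(round_no: int, rounds: int) -> str:
    """Assumed mapping: round 1 opens, the last round states final positions."""
    if round_no == 1:
        return "opening"
    if round_no == rounds:
        return "final_position"
    return "rebuttal"


def label(round_no: int, role: str) -> str:
    """Subagent label in the council-r{round}-{role} convention."""
    return f"council-r{round_no}-{role}"


def plan_calls(cfg: CouncilConfig) -> list[tuple[str, str]]:
    """List (label, template) for every subagent call, advisors first, referee last.

    The flow does not change the call list here; it only decides whether a
    round's advisor calls run concurrently (parallel/debate) or one after
    another with earlier outputs visible (sequential).
    """
    calls = []
    for r in range(1, cfg.rounds + 1):
        for role in ADVISORS:
            calls.append((label(r, role), template_for(r, cfg.rounds)))
    # One referee call at the end; multi-round runs get the evolution-tracking template.
    # Tagging the referee with the final round number is an assumption.
    referee_template = "referee_multiround" if cfg.rounds > 1 else "referee"
    calls.append((label(cfg.rounds, REFEREE), referee_template))
    return calls


for lbl, tpl in plan_calls(CouncilConfig(flow="debate", rounds=3)):
    print(lbl, tpl)
```

With rounds=3 the plan holds 3 × 3 advisor calls plus one referee call (10 in total), and a 1-round plan holds 4, which lines up with the call counts reported in the experiments.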

Council experiments completed

  • Ran all 3 flow types on same topic ("Should AI assistants have persistent memory?"):
    1. Parallel 1-round (Experiment 1): Fast, clean, independent perspectives. 4 subagent calls, ~60k tokens.
    2. Sequential 1-round (Experiment 2): Tighter dialogue — later advisors build on earlier. 4 calls, ~55k tokens. Less redundancy.
    3. Debate/Parallel 3-round (Experiment 3): Richest output. Positions evolved significantly across rounds (Visionary backed off always-on, Skeptic softened on trajectory). 10 calls, ~130k tokens.
  • Key findings:
    • 3 rounds is the sweet spot for depth — positions converge by round 3.
    • Sequential is most token-efficient for focused topics.
    • Parallel 3-round is best depth-to-cost ratio for substantive topics.
    • Debate and parallel 3-round are mechanically identical — differ only in prompt tone.
  • Updated SKILL.md with experimental findings, recommended defaults by use case, cost profiles.
  • New TODOs added: unify debate/parallel flows, test 2-round sufficiency, test mixed model tiers.
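
The cost figures above reduce to simple arithmetic: 3 advisor calls per round plus one referee call gives 4 calls for one round and 10 for three. The Python sketch below uses only the approximate token counts reported above to derive tokens per call for each experiment, then makes a naive linear projection for a 2-round run (relevant to the 2-round TODO). The helper names are hypothetical, and the projection assumes per-call cost stays constant as rounds are added, which has not been tested; treat it as a rough estimate, not a result.

```python
# Approximate figures copied from the experiment notes above.
EXPERIMENTS = {
    "parallel-1r": {"rounds": 1, "calls": 4, "tokens": 60_000},
    "sequential-1r": {"rounds": 1, "calls": 4, "tokens": 55_000},
    "parallel-3r": {"rounds": 3, "calls": 10, "tokens": 130_000},
}

ADVISORS = 3  # Pragmatist, Visionary, Skeptic


def calls_for(rounds: int) -> int:
    """Three advisor calls per round plus one referee call at the end."""
    return ADVISORS * rounds + 1


def tokens_per_call(name: str) -> float:
    """Average tokens per subagent call for one reported experiment."""
    e = EXPERIMENTS[name]
    return e["tokens"] / e["calls"]


def project_tokens(rounds: int, per_call: float) -> int:
    """Naive linear projection; assumes per-call cost does not change with rounds."""
    return round(calls_for(rounds) * per_call)


for name in EXPERIMENTS:
    print(f"{name}: {tokens_per_call(name):,.0f} tokens/call")

print("2-round projection:", project_tokens(2, tokens_per_call("parallel-3r")))
```

From the reported numbers this gives 15,000, 13,750 and 13,000 tokens per call, so a 2-round parallel run (7 calls) would come to about 91k tokens if per-call cost held steady.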

Flynn council pipeline — fix plan written

  • Reviewed Flynn codebase (~/flynn/src/councils/): orchestrator, types, scaffold, preflight, and tool all exist. Unit tests pass (mocked), but the pipeline has never been run against real models.
  • Pipeline architecture: dual D/P groups (freethinker + arbiter + optional grounder each), bridge exchange between groups, meta arbiter merge, full trace/conversation logging.
  • Created memory/plans/flynn-council-fix.md — 5-phase plan:
    1. Config & agent setup (define 5 required agents in config)
    2. Structured output compatibility (JSON schema support varies by provider)
    3. Bridge & cap validation (defaults may be too tight for real output)
    4. End-to-end run with real models
    5. Integration with zap (CLI command or ACP agent)
  • Will wants to keep zap's "light" council skill AND have Flynn's deterministic pipeline available for delegation.
  • Work to happen on feature branch fix/council-pipeline.
  • Estimated effort: 1-2 focused sessions.
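
One reading of the architecture is that the five required agents are the freethinker and arbiter of each group plus the meta arbiter (2 × 2 + 1 = 5), with grounders optional. That is an inference from these notes, not something confirmed in Flynn's code. The Python sketch below encodes that reading as checks for Phases 1 and 3: which required agents are missing from a config, and which outputs would overrun a bridge cap. The Group class, helper functions and all agent names are hypothetical, and the cap is assumed to be measured in characters; Flynn's real config format and cap units may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Group:
    """One council group (D or P): a freethinker and an arbiter, plus an optional grounder."""
    name: str
    freethinker: str
    arbiter: str
    grounder: Optional[str] = None

    def required_agents(self) -> list[str]:
        return [self.freethinker, self.arbiter]

    def all_agents(self) -> list[str]:
        return self.required_agents() + ([self.grounder] if self.grounder else [])


def required_agents(groups: list[Group], meta_arbiter: str) -> list[str]:
    """Two groups x (freethinker + arbiter) + one meta arbiter = 5 required agents."""
    return [a for g in groups for a in g.required_agents()] + [meta_arbiter]


def missing_agents(groups: list[Group], meta_arbiter: str, configured: set[str]) -> list[str]:
    """Phase 1 check: required agents absent from the configured set."""
    return [a for a in required_agents(groups, meta_arbiter) if a not in configured]


def cap_overruns(outputs: dict[str, str], cap: int) -> dict[str, int]:
    """Phase 3 check: how many characters each output exceeds the bridge cap by."""
    return {name: len(text) - cap for name, text in outputs.items() if len(text) > cap}


# Hypothetical agent names; the real ones would come from Flynn's config.
D = Group("D", "d-freethinker", "d-arbiter", grounder="d-grounder")
P = Group("P", "p-freethinker", "p-arbiter")
META = "meta-arbiter"

configured = {"d-freethinker", "d-arbiter", "p-freethinker", "meta-arbiter"}
print(required_agents([D, P], META))
print(missing_agents([D, P], META, configured))
print(cap_overruns({"D": "x" * 1200, "P": "y" * 800}, cap=1000))
```

Keeping the grounder out of the required list means a config with no grounders still passes the Phase 1 check, matching "optional grounder" in the architecture line.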