zap e7051a617f docs(council): add Flynn council pipeline fix plan

- 5-phase plan: config, structured output, bridge caps, E2E run, zap integration
- Work to happen on fix/council-pipeline branch in ~/flynn
- Goal: get Flynn's dual-council working so zap can delegate to it

2026-03-05 19:00:58 +00:00

2026-03-05

Council skill created and iterated

Built skills/council/ — multi-perspective advisory council using subagents.
Design decisions (agreed with Will):
- Implemented as a skill (not standalone agents).
- 3 advisors (Pragmatist, Visionary, Skeptic) + 1 Referee = 4 subagents total.
- Referee is a separate subagent (not the session model) — can use a stronger model tier.
- Default flow: Parallel + Synthesis. Sequential and Debate flows also available.
- Final output includes individual advisor perspectives (collapsed/summarized) + referee verdict.
- Model tier chosen per-invocation based on topic complexity.
Two live tests run:
- Test 1: Parallel single-round on "Do LLM agents think?" — worked well.
- Test 2: Parallel 3-round debate on same topic — richer output, positions evolved meaningfully across rounds.
Post-test iteration: updated skill with configurable parameters:
- flow (parallel/sequential/debate), rounds (1-5), tier (light/medium/heavy)
- Round-specific prompt templates (opening, rebuttal, final position)
- Multi-round referee template that tracks position evolution
- Word count guidance that decreases per round to control token cost
- Subagent labeling convention: council-r{round}-{role}
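The parameters and labeling convention above can be sketched as a small validated config. This is a minimal illustration, not the contents of SKILL.md: the class name, the lowercase role names, the default values for rounds and tier, and the referee's label are all assumptions; only the allowed values, the default parallel flow, and the council-r{round}-{role} pattern come from the notes.

```python
from dataclasses import dataclass

FLOWS = ("parallel", "sequential", "debate")
TIERS = ("light", "medium", "heavy")
# Role spellings are assumed; the notes name Pragmatist, Visionary, Skeptic.
ADVISORS = ("pragmatist", "visionary", "skeptic")


@dataclass(frozen=True)
class CouncilConfig:
    """Hypothetical container for the skill's per-invocation parameters."""
    flow: str = "parallel"   # default flow per the notes
    rounds: int = 1          # assumed default
    tier: str = "medium"     # assumed default; the notes pick a tier per invocation

    def __post_init__(self) -> None:
        if self.flow not in FLOWS:
            raise ValueError(f"unknown flow: {self.flow!r}")
        if not 1 <= self.rounds <= 5:
            raise ValueError("rounds must be between 1 and 5")
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier!r}")


def subagent_labels(config: CouncilConfig) -> list[str]:
    """Build council-r{round}-{role} labels for every advisor in every round."""
    labels = [
        f"council-r{rnd}-{role}"
        for rnd in range(1, config.rounds + 1)
        for role in ADVISORS
    ]
    # Assumption: the referee runs once and is labeled with the final round.
    labels.append(f"council-r{config.rounds}-referee")
    return labels
```

For example, `subagent_labels(CouncilConfig(rounds=2))` yields seven labels, starting with `council-r1-pragmatist` and ending with `council-r2-referee`.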
Files: SKILL.md, references/prompts.md, scripts/council.sh (reference doc).
TODOs in memory/tasks.json:
- Revisit advisor personality depth (richer backstories).
- Revisit skill name ("council" is placeholder).
- Experiment with different round counts and flows for optimal depth/cost tradeoffs.

Council experiments completed

Ran all 3 flow types on same topic ("Should AI assistants have persistent memory?"):
1. Parallel 1-round (Experiment 1): Fast, clean, independent perspectives. 4 subagent calls, ~60k tokens.
2. Sequential 1-round (Experiment 2): Tighter dialogue — later advisors build on earlier. 4 calls, ~55k tokens. Less redundancy.
3. Debate/Parallel 3-round (Experiment 3): Richest output. Positions evolved significantly across rounds (Visionary backed off always-on, Skeptic softened on trajectory). 10 calls, ~130k tokens.
Key findings:
- 3 rounds is the sweet spot for depth — positions converge by round 3.
- Sequential is most token-efficient for focused topics.
- Parallel 3-round has the best depth-to-cost ratio for substantive topics.
- Debate and parallel 3-round are mechanically identical — differ only in prompt tone.
Updated SKILL.md with experimental findings, recommended defaults by use case, cost profiles.
New TODOs added: unify debate/parallel flows, test 2-round sufficiency, test mixed model tiers.
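The call counts reported for the three experiments are consistent with three advisors per round plus a single referee call at the end. The sketch below checks that arithmetic and derives a rough tokens-per-call figure from the approximate totals in the notes. The function names and experiment keys are hypothetical, and the single-referee-call assumption is inferred from the counts rather than stated in the notes.

```python
ADVISORS_PER_ROUND = 3  # Pragmatist, Visionary, Skeptic


def total_calls(rounds: int, advisors: int = ADVISORS_PER_ROUND) -> int:
    """Subagent calls for a run: every advisor each round, plus one referee call."""
    return advisors * rounds + 1


def tokens_per_call(total_tokens: int, rounds: int) -> int:
    """Rough average token cost per subagent call."""
    return round(total_tokens / total_calls(rounds))


# (rounds, approximate total tokens) as recorded in the experiment notes.
EXPERIMENTS = {
    "parallel-1-round": (1, 60_000),
    "sequential-1-round": (1, 55_000),
    "parallel-3-round": (3, 130_000),
}

for name, (rounds, tokens) in EXPERIMENTS.items():
    print(name, total_calls(rounds), tokens_per_call(tokens, rounds))
```

This gives 4, 4, and 10 calls, matching the notes, with roughly 15,000, 13,750, and 13,000 tokens per call respectively. The per-call cost drops in the 3-round run, which fits the decreasing word count guidance per round.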

Flynn council pipeline — fix plan written

Reviewed Flynn codebase (~/flynn/src/councils/): orchestrator, types, scaffold, preflight, and tool all exist. Unit tests pass, but only against mocks; the pipeline has never been run against real models.
Pipeline architecture: dual D/P groups (freethinker + arbiter + optional grounder each), bridge exchange between groups, meta arbiter merge, full trace/conversation logging.
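One reading of this architecture is that each of the two groups needs a freethinker and an arbiter, with the grounder optional, and a single meta arbiter sits on top, which would give five required agents. The sketch below enumerates them and reports which are missing from a configuration. The agent identifiers (such as `D-freethinker` and `meta-arbiter`) are hypothetical and are not taken from Flynn's code or config format.

```python
GROUPS = ("D", "P")
REQUIRED_ROLES = ("freethinker", "arbiter")
OPTIONAL_ROLES = ("grounder",)
META_ARBITER = "meta-arbiter"  # hypothetical identifier


def required_agents() -> list[str]:
    """Agents the pipeline cannot run without, under the reading above."""
    agents = [f"{group}-{role}" for group in GROUPS for role in REQUIRED_ROLES]
    agents.append(META_ARBITER)
    return agents


def optional_agents() -> list[str]:
    """Agents that may be configured per group but are not mandatory."""
    return [f"{group}-{role}" for group in GROUPS for role in OPTIONAL_ROLES]


def missing_agents(configured: set[str]) -> list[str]:
    """Required agents absent from the configured set, in pipeline order."""
    return [agent for agent in required_agents() if agent not in configured]
```

A preflight-style check could call `missing_agents` with the names found in the config and refuse to start when the result is non-empty.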
Created memory/plans/flynn-council-fix.md — 5-phase plan:
1. Config & agent setup (define 5 required agents in config)
2. Structured output compatibility (JSON schema support varies by provider)
3. Bridge & cap validation (defaults may be too tight for real output)
4. End-to-end run with real models
5. Integration with zap (CLI command or ACP agent)
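For phase 2, one common way to cope with providers that lack JSON schema support is to parse the reply leniently and then validate the required keys by hand. The following is a sketch of that fallback only, not Flynn's implementation: the function name and the `position` and `confidence` keys are hypothetical, since the actual schema is not shown in these notes.

```python
import json

# Hypothetical required fields; the real schema is not shown in the notes.
REQUIRED_KEYS = ("position", "confidence")


def parse_structured(raw: str, required_keys: tuple[str, ...] = REQUIRED_KEYS) -> dict:
    """Parse a JSON object from model output, tolerating surrounding prose or fences."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: take the span from the first "{" to the last "}".
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model output")
        data = json.loads(raw[start:end + 1])
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return data
```

With this in place, a reply such as `Sure! {"position": "yes", "confidence": 0.7}` parses to the same dictionary as the bare JSON would, while a reply with no object or with missing keys raises `ValueError` so the orchestrator can retry or fail loudly.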
Will wants to keep zap's "light" council skill AND have Flynn's deterministic pipeline available for delegation.
Work to happen on feature branch fix/council-pipeline.
Estimated effort: 1-2 focused sessions.
