# 2026-03-05

## Council skill created and iterated
- Built `skills/council/` — multi-perspective advisory council using subagents.
- Design decisions (agreed with Will):
  - Implemented as a **skill** (not standalone agents).
  - 3 advisors (Pragmatist, Visionary, Skeptic) + 1 Referee = 4 subagents total.
  - Referee is a separate subagent (not the session model) — can use a stronger model tier.
  - Default flow: **Parallel + Synthesis**. Sequential and Debate flows also available.
  - Final output includes individual advisor perspectives (collapsed/summarized) + referee verdict.
  - Model tier chosen per-invocation based on topic complexity.
- Two live tests run:
  - Test 1: Parallel single-round on "Do LLM agents think?" — worked well.
  - Test 2: Parallel 3-round debate on same topic — richer output, positions evolved meaningfully across rounds.
- Post-test iteration: updated skill with configurable parameters:
  - `flow` (parallel/sequential/debate), `rounds` (1-5), `tier` (light/medium/heavy)
  - Round-specific prompt templates (opening, rebuttal, final position)
  - Multi-round referee template that tracks position evolution
  - Word count guidance that decreases per round to control token cost
  - Subagent labeling convention: `council-r{round}-{role}`
- Files: `SKILL.md`, `references/prompts.md`, `scripts/council.sh` (reference doc).
- TODOs in `memory/tasks.json`:
  - Revisit advisor personality depth (richer backstories).
  - Revisit skill name ("council" is placeholder).
  - Experiment with different round counts and flows for optimal depth/cost tradeoffs.
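
Sketch of the configurable parameters and the labeling convention above, for reference. This is not the skill's actual code (the skill lives in `SKILL.md` and prompt templates). The class and function names, the lowercase role spellings, and the choice to label only advisors are assumptions made for illustration.

```python
from dataclasses import dataclass

# Values taken from the notes above; lowercase spellings are an assumption.
ADVISORS = ("pragmatist", "visionary", "skeptic")
FLOWS = ("parallel", "sequential", "debate")
TIERS = ("light", "medium", "heavy")


@dataclass(frozen=True)
class CouncilConfig:
    """Hypothetical container for the per-invocation parameters."""
    rounds: int
    tier: str
    flow: str = "parallel"  # default flow per the design decisions

    def __post_init__(self):
        if self.flow not in FLOWS:
            raise ValueError(f"unknown flow: {self.flow}")
        if not 1 <= self.rounds <= 5:
            raise ValueError("rounds must be between 1 and 5")
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier}")


def subagent_label(round_no: int, role: str) -> str:
    """Build a label following the council-r{round}-{role} convention."""
    return f"council-r{round_no}-{role}"


def advisor_labels(config: CouncilConfig) -> list[str]:
    """One label per advisor per round, in round order."""
    return [
        subagent_label(round_no, role)
        for round_no in range(1, config.rounds + 1)
        for role in ADVISORS
    ]


if __name__ == "__main__":
    cfg = CouncilConfig(rounds=3, tier="heavy", flow="debate")
    print(advisor_labels(cfg)[:3])
```

With `rounds=3` this yields nine advisor labels, from `council-r1-pragmatist` through `council-r3-skeptic`.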

## Council experiments completed
- Ran all 3 flow types on same topic ("Should AI assistants have persistent memory?"):
  1. **Parallel 1-round** (Experiment 1): Fast, clean, independent perspectives. 4 subagent calls, ~60k tokens.
  2. **Sequential 1-round** (Experiment 2): Tighter dialogue — later advisors build on earlier. 4 calls, ~55k tokens. Less redundancy.
  3. **Debate/Parallel 3-round** (Experiment 3): Richest output. Positions evolved significantly across rounds (Visionary backed off always-on, Skeptic softened on trajectory). 10 calls, ~130k tokens.
- Key findings:
  - 3 rounds is the sweet spot for depth — positions converge by round 3.
  - Sequential is the most token-efficient flow for focused topics (~55k tokens vs ~60k for parallel 1-round).
  - Parallel 3-round has the best depth-to-cost ratio for substantive topics.
  - Debate and parallel 3-round are mechanically identical; they differ only in prompt tone.
- Updated SKILL.md with experimental findings, recommended defaults by use case, cost profiles.
- New TODOs added: unify debate/parallel flows, test 2-round sufficiency, test mixed model tiers.
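
Quick arithmetic behind the call counts and cost figures above. The sketch assumes every round calls all 3 advisors and the referee runs once at the end, which matches the 4 and 10 calls recorded. The token totals are the approximate figures from the experiments, and the function names are made up for illustration.

```python
ADVISOR_COUNT = 3  # Pragmatist, Visionary, Skeptic


def subagent_calls(rounds: int, advisors: int = ADVISOR_COUNT, referee_calls: int = 1) -> int:
    """Advisor calls for every round plus a single referee pass (assumed)."""
    return advisors * rounds + referee_calls


# (rounds, approximate total tokens) as recorded in the experiments above.
EXPERIMENTS = {
    "parallel-1-round": (1, 60_000),
    "sequential-1-round": (1, 55_000),
    "parallel-3-round": (3, 130_000),
}


def tokens_per_call(rounds: int, total_tokens: int) -> float:
    """Average tokens spent per subagent call."""
    return total_tokens / subagent_calls(rounds)


if __name__ == "__main__":
    for name, (rounds, total) in EXPERIMENTS.items():
        calls = subagent_calls(rounds)
        print(f"{name}: {calls} calls, ~{tokens_per_call(rounds, total):.0f} tokens/call")
```

On these numbers the 3-round run costs roughly 13k tokens per call against roughly 15k for parallel 1-round, so per-call cost falls slightly as rounds are added even though the total roughly doubles.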

## Flynn council pipeline — fix plan written
- Reviewed Flynn codebase (`~/flynn/src/councils/`): orchestrator, types, scaffold, preflight, and tool all exist. Unit tests pass, but only against mocks; the pipeline has never been run against real models.
- Pipeline architecture: dual D/P groups (freethinker + arbiter + optional grounder each), bridge exchange between groups, meta arbiter merge, full trace/conversation logging.
- Created `memory/plans/flynn-council-fix.md` — 5-phase plan:
  1. Config & agent setup (define 5 required agents in config)
  2. Structured output compatibility (JSON schema support varies by provider)
  3. Bridge & cap validation (defaults may be too tight for real output)
  4. End-to-end run with real models
  5. Integration with zap (CLI command or ACP agent)
- Will wants to keep zap's "light" council skill AND have Flynn's deterministic pipeline available for delegation.
- Work to happen on feature branch `fix/council-pipeline`.
- Estimated effort: 1-2 focused sessions.
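
One reading of the "5 required agents" in phase 1: each of the two groups needs a freethinker and an arbiter (grounder optional), plus one meta arbiter. The sketch below checks a config against that reading. The agent IDs and function names are hypothetical, and this is not Flynn's actual preflight code.

```python
GROUPS = ("D", "P")
REQUIRED_ROLES = ("freethinker", "arbiter")
OPTIONAL_ROLES = ("grounder",)
META_ARBITER = "meta-arbiter"  # hypothetical ID


def required_agents() -> list[str]:
    """Agent IDs that must be configured: 2 groups x 2 roles + meta arbiter."""
    ids = [f"{group}-{role}" for group in GROUPS for role in REQUIRED_ROLES]
    ids.append(META_ARBITER)
    return ids


def missing_agents(configured: set[str]) -> list[str]:
    """Required agent IDs that are absent from the configured set."""
    return [agent for agent in required_agents() if agent not in configured]


if __name__ == "__main__":
    configured = {"D-freethinker", "D-arbiter", "P-freethinker", "D-grounder"}
    print(len(required_agents()))      # 5
    print(missing_agents(configured))  # ['P-arbiter', 'meta-arbiter']
```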