2026-03-05
Council skill created and iterated
- Built `skills/council/`: multi-perspective advisory council using subagents.
- Design decisions (agreed with Will):
  - Implemented as a skill (not standalone agents).
  - 3 advisors (Pragmatist, Visionary, Skeptic) + 1 Referee = 4 subagents total.
  - Referee is a separate subagent (not the session model), so it can use a stronger model tier.
  - Default flow: Parallel + Synthesis. Sequential and Debate flows also available.
  - Final output includes individual advisor perspectives (collapsed/summarized) + referee verdict.
  - Model tier chosen per-invocation based on topic complexity.
- Two live tests run:
  - Test 1: Parallel single-round on "Do LLM agents think?". Worked well.
  - Test 2: Parallel 3-round debate on same topic. Richer output, positions evolved meaningfully across rounds.
- Post-test iteration: updated skill with configurable parameters:
  - `flow` (parallel/sequential/debate), `rounds` (1-5), `tier` (light/medium/heavy)
  - Round-specific prompt templates (opening, rebuttal, final position)
  - Multi-round referee template that tracks position evolution
  - Word count guidance that decreases per round to control token cost
  - Subagent labeling convention: `council-r{round}-{role}`
- Files: `SKILL.md`, `references/prompts.md`, `scripts/council.sh` (reference doc).
- TODOs in `memory/tasks.json`:
  - Revisit advisor personality depth (richer backstories).
  - Revisit skill name ("council" is placeholder).
  - Experiment with different round counts and flows for optimal depth/cost tradeoffs.
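
The configurable parameters and the labeling convention above can be sketched in a few lines of Python. This is only an illustration, not the contents of `SKILL.md` or `scripts/council.sh`: the class and function names are hypothetical, the lowercase role slugs are assumed, the `rounds` and `tier` defaults are assumed (the notes give only the allowed ranges), and the rule mapping rounds to templates (first round is the opening, last round of a multi-round run is the final position, anything between is a rebuttal) is a guess at how the three templates are assigned.

```python
from dataclasses import dataclass

# Advisor roles named in the notes; the lowercase slugs are an assumption.
ADVISORS = ("pragmatist", "visionary", "skeptic")
FLOWS = ("parallel", "sequential", "debate")
TIERS = ("light", "medium", "heavy")


@dataclass(frozen=True)
class CouncilConfig:
    """Hypothetical container for the three per-invocation parameters."""
    flow: str = "parallel"  # default flow per the design decisions
    rounds: int = 1         # assumed default; the notes only give the 1-5 range
    tier: str = "medium"    # assumed default; the tier is chosen per invocation

    def __post_init__(self) -> None:
        if self.flow not in FLOWS:
            raise ValueError(f"unknown flow: {self.flow!r}")
        if not 1 <= self.rounds <= 5:
            raise ValueError("rounds must be between 1 and 5")
        if self.tier not in TIERS:
            raise ValueError(f"unknown tier: {self.tier!r}")


def subagent_label(round_no: int, role: str) -> str:
    """Apply the council-r{round}-{role} labeling convention."""
    return f"council-r{round_no}-{role}"


def template_for_round(round_no: int, total_rounds: int) -> str:
    """Assumed mapping of rounds to the three prompt templates."""
    if round_no == 1:
        return "opening"
    if round_no == total_rounds:
        return "final position"
    return "rebuttal"


def advisor_labels(config: CouncilConfig) -> list[str]:
    """Labels for every advisor subagent, ordered by round and then by role."""
    return [
        subagent_label(round_no, role)
        for round_no in range(1, config.rounds + 1)
        for role in ADVISORS
    ]
```

For example, `advisor_labels(CouncilConfig(flow="debate", rounds=3, tier="heavy"))` yields nine labels, from `council-r1-pragmatist` through `council-r3-skeptic`. The referee is left out here because the notes do not say how its label is formed.
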
Council experiments completed
- Ran all 3 flow types on same topic ("Should AI assistants have persistent memory?"):
  - Parallel 1-round (Experiment 1): Fast, clean, independent perspectives. 4 subagent calls, ~60k tokens.
  - Sequential 1-round (Experiment 2): Tighter dialogue, since later advisors build on earlier ones. 4 calls, ~55k tokens. Less redundancy.
  - Debate/Parallel 3-round (Experiment 3): Richest output. Positions evolved significantly across rounds (Visionary backed off always-on, Skeptic softened on trajectory). 10 calls, ~130k tokens.
- Key findings:
  - 3 rounds is the sweet spot for depth: positions converge by round 3.
  - Sequential is most token-efficient for focused topics.
  - Parallel 3-round has the best depth-to-cost ratio for substantive topics.
  - Debate and parallel 3-round are mechanically identical; they differ only in prompt tone.
- Updated SKILL.md with experimental findings, recommended defaults by use case, cost profiles.
- New TODOs added: unify debate/parallel flows, test 2-round sufficiency, test mixed model tiers.
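
The call counts reported for the three experiments are consistent with a simple cost model: each of the 3 advisors runs once per round and the referee runs once at the end. The sketch below assumes that model (it is inferred from the 4-call and 10-call figures, not taken from the skill itself) and uses the approximate token totals from the notes to get a rough per-call average. The function names are hypothetical.

```python
ADVISORS = 3  # Pragmatist, Visionary, Skeptic


def subagent_calls(rounds: int, advisors: int = ADVISORS) -> int:
    """Assumed model: every advisor runs once per round, plus one referee call."""
    return advisors * rounds + 1


# (rounds, approximate total tokens) as recorded for each experiment
EXPERIMENTS = {
    "parallel, 1 round": (1, 60_000),
    "sequential, 1 round": (1, 55_000),
    "debate/parallel, 3 rounds": (3, 130_000),
}


def tokens_per_call(rounds: int, total_tokens: int) -> float:
    """Rough average tokens per subagent call under the assumed call model."""
    return total_tokens / subagent_calls(rounds)


if __name__ == "__main__":
    for name, (rounds, tokens) in EXPERIMENTS.items():
        calls = subagent_calls(rounds)
        print(f"{name}: {calls} calls, ~{tokens_per_call(rounds, tokens):,.0f} tokens/call")
```

On these figures the 3-round run makes 2.5 times as many calls as the single parallel round but uses only about 2.2 times the tokens, which fits the note that word-count guidance shrinks per round.
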
Flynn council pipeline — fix plan written
- Reviewed Flynn codebase (`~/flynn/src/councils/`): orchestrator, types, scaffold, preflight, tool all exist. Unit tests pass (mocked). Never run against real models.
- Pipeline architecture: dual D/P groups (freethinker + arbiter + optional grounder each), bridge exchange between groups, meta arbiter merge, full trace/conversation logging.
- Created `memory/plans/flynn-council-fix.md`, a 5-phase plan:
  - Config & agent setup (define 5 required agents in config)
  - Structured output compatibility (JSON schema support varies by provider)
  - Bridge & cap validation (defaults may be too tight for real output)
  - End-to-end run with real models
  - Integration with zap (CLI command or ACP agent)
- Will wants to keep zap's "light" council skill AND have Flynn's deterministic pipeline available for delegation.
- Work to happen on feature branch `fix/council-pipeline`.
- Estimated effort: 1-2 focused sessions.
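
As a rough illustration of the first plan phase, the pipeline's agent roster can be modeled as two groups plus a meta arbiter, with a preflight-style check that every required agent is configured. This is a Python sketch under stated assumptions, not Flynn's code: the type names, the agent names in the usage note, and the reading of "5 required agents" as two freethinkers, two arbiters, and one meta arbiter (grounders being optional) are all guesses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Group:
    """One side of the dual D/P structure (hypothetical shape)."""
    freethinker: str
    arbiter: str
    grounder: Optional[str] = None  # optional per the architecture notes


@dataclass(frozen=True)
class CouncilAgents:
    d_group: Group
    p_group: Group
    meta_arbiter: str

    def required_agents(self) -> list[str]:
        """Assumed reading of the 5 required agents: 2 x (freethinker + arbiter) + meta arbiter."""
        return [
            self.d_group.freethinker,
            self.d_group.arbiter,
            self.p_group.freethinker,
            self.p_group.arbiter,
            self.meta_arbiter,
        ]


def missing_agents(council: CouncilAgents, configured: set[str]) -> list[str]:
    """Return required agents that are absent from the configured set, in roster order."""
    return [name for name in council.required_agents() if name not in configured]
```

With a roster such as `CouncilAgents(Group("d-freethinker", "d-arbiter"), Group("p-freethinker", "p-arbiter"), "meta-arbiter")`, an empty result from `missing_agents` would mean the config phase is complete for the required agents. Grounders would need a separate check if they are enabled.
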
- 2026-03-05T21:36Z: Ran `openclaw security audit --deep` on request to clear stale-audit warning.
  - Result: 1 critical, 2 warn, 1 info.
  - Critical: plugin `acpx.bak` code-safety issue (dangerous exec pattern).
  - Warnings: missing `plugins.allow` allowlist; extension tools reachable under permissive policy.
  - Updated `memory/startup-health.json` + `memory/startup-health.md` to mark freshness restored and record findings.
- 2026-03-05T21:41Z: Quarantined stale extension folder `~/.openclaw/extensions/acpx.bak` to `~/.openclaw/extensions-quarantine/acpx.bak.20260305T214139Z` (no deletion).
- 2026-03-05T21:42Z: Re-ran `openclaw security audit --deep`: now 0 critical, 0 warn, 1 info.
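
The quarantine step above (move the folder, append a UTC timestamp, delete nothing) could be scripted roughly as follows. This is a minimal sketch, not an openclaw feature: the function names are hypothetical, and the `%Y%m%dT%H%M%SZ` suffix format is inferred from the `acpx.bak.20260305T214139Z` folder name in the log.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def quarantine_name(name: str, when: datetime) -> str:
    """Append a compact UTC timestamp, e.g. acpx.bak.20260305T214139Z."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{name}.{stamp}"


def quarantine(src: Path, quarantine_dir: Path, when: Optional[datetime] = None) -> Path:
    """Move src into quarantine_dir under a timestamped name; nothing is deleted."""
    when = when or datetime.now(timezone.utc)
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    dest = quarantine_dir / quarantine_name(src.name, when)
    if dest.exists():
        # Refuse to overwrite an earlier quarantine entry.
        raise FileExistsError(dest)
    shutil.move(str(src), str(dest))
    return dest
```

Moving rather than deleting keeps the folder available for inspection or restore, and the timestamp suffix lets the same extension be quarantined more than once without a name collision.
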