# 2026-03-05
## Council skill created and iterated
- Built `skills/council/` — multi-perspective advisory council using subagents.
- Design decisions (agreed with Will):
  - Implemented as a **skill** (not standalone agents).
  - 3 advisors (Pragmatist, Visionary, Skeptic) + 1 Referee = 4 subagents total.
  - Referee is a separate subagent (not the session model) — can use a stronger model tier.
  - Default flow: **Parallel + Synthesis**. Sequential and Debate flows also available.
  - Final output includes individual advisor perspectives (collapsed/summarized) + referee verdict.
  - Model tier chosen per-invocation based on topic complexity.
- Two live tests run:
  - Test 1: Parallel single-round on "Do LLM agents think?" — worked well.
  - Test 2: Parallel 3-round debate on the same topic — richer output; positions evolved meaningfully across rounds.
- Post-test iteration: updated the skill with configurable parameters:
  - `flow` (parallel/sequential/debate), `rounds` (1-5), `tier` (light/medium/heavy)
  - Round-specific prompt templates (opening, rebuttal, final position)
  - Multi-round referee template that tracks position evolution
  - Word-count guidance that decreases per round to control token cost
  - Subagent labeling convention: `council-r{round}-{role}`
- Files: `SKILL.md`, `references/prompts.md`, `scripts/council.sh` (reference doc).
- TODOs in `memory/tasks.json`:
  - Revisit advisor personality depth (richer backstories).
  - Revisit skill name ("council" is a placeholder).
  - Experiment with different round counts and flows for optimal depth/cost tradeoffs.
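The parameters and labeling convention above could be wired together roughly as follows. This is a minimal illustrative sketch, not the actual skill code: `validate_config`, `plan_labels`, and the lowercase role names are assumptions; only the parameter ranges and the `council-r{round}-{role}` convention come from the notes.

```python
# Sketch (illustrative, not the real skill code) of how the council's
# configurable parameters and subagent labeling convention fit together.

ADVISORS = ["pragmatist", "visionary", "skeptic"]  # assumed lowercase role names


def validate_config(flow: str, rounds: int, tier: str) -> None:
    """Check an invocation against the documented parameter ranges."""
    assert flow in {"parallel", "sequential", "debate"}
    assert 1 <= rounds <= 5
    assert tier in {"light", "medium", "heavy"}


def subagent_label(round_num: int, role: str) -> str:
    """Labeling convention from the notes: council-r{round}-{role}."""
    return f"council-r{round_num}-{role}"


def plan_labels(flow: str, rounds: int, tier: str) -> list[str]:
    """Enumerate the subagent labels a run would spawn: one label per
    advisor per round, plus a single referee after the final round."""
    validate_config(flow, rounds, tier)
    labels = [
        subagent_label(r, role)
        for r in range(1, rounds + 1)
        for role in ADVISORS
    ]
    labels.append(subagent_label(rounds, "referee"))
    return labels
```

Under these assumptions, `plan_labels("parallel", 1, "light")` yields 4 labels and `plan_labels("parallel", 3, "medium")` yields 10, matching the call counts recorded in the experiments below.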
|
## Council experiments completed
- Ran all 3 flow types on the same topic ("Should AI assistants have persistent memory?"):
  1. **Parallel 1-round** (Experiment 1): Fast, clean, independent perspectives. 4 subagent calls, ~60k tokens.
  2. **Sequential 1-round** (Experiment 2): Tighter dialogue — later advisors build on earlier ones. 4 calls, ~55k tokens. Less redundancy.
  3. **Debate/Parallel 3-round** (Experiment 3): Richest output. Positions evolved significantly across rounds (Visionary backed off always-on; Skeptic softened on trajectory). 10 calls, ~130k tokens.
- Key findings:
  - 3 rounds is the sweet spot for depth — positions converge by round 3.
  - Sequential is most token-efficient for focused topics.
  - Parallel 3-round has the best depth-to-cost ratio for substantive topics.
  - Debate and parallel 3-round are mechanically identical — they differ only in prompt tone.
- Updated `SKILL.md` with experimental findings, recommended defaults by use case, and cost profiles.
- New TODOs added: unify debate/parallel flows, test 2-round sufficiency, test mixed model tiers.
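The call counts above fit a simple pattern (advisors × rounds + one referee), and per-call token averages can be back-solved from the recorded totals. A hedged sketch: the token figures are the approximate totals from the log, not remeasured, and the helper names are illustrative.

```python
# Back-of-envelope comparison of the three experiment runs recorded above.
# Token totals are the approximate figures from the log; everything else
# (names, the call-count formula) is an inferred sketch.

EXPERIMENTS = {
    "parallel-1round":   {"calls": 4,  "tokens_k": 60},
    "sequential-1round": {"calls": 4,  "tokens_k": 55},
    "parallel-3round":   {"calls": 10, "tokens_k": 130},
}


def subagent_calls(advisors: int, rounds: int) -> int:
    """Call-count pattern implied by the log: one call per advisor per
    round, plus a single referee synthesis call at the end."""
    return advisors * rounds + 1


def tokens_per_call(name: str) -> float:
    """Average tokens (thousands) per subagent call for a recorded run."""
    run = EXPERIMENTS[name]
    return run["tokens_k"] / run["calls"]
```

Note that `subagent_calls(3, 1) == 4` and `subagent_calls(3, 3) == 10` reproduce the recorded counts, and the 3-round run averages fewer tokens per call (13k vs. 15k for parallel 1-round), consistent with the decreasing word-count guidance per round.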
|
- 2026-03-05T21:36Z: Ran `openclaw security audit --deep` on request to clear a stale-audit warning.
  - Result: 1 critical, 2 warn, 1 info.
  - Critical: plugin `acpx.bak` code-safety issue (dangerous exec pattern).
  - Warnings: missing `plugins.allow` allowlist; extension tools reachable under a permissive policy.
  - Updated `memory/startup-health.json` + `memory/startup-health.md` to mark freshness restored and record the findings.
- 2026-03-05T21:41Z: Quarantined stale extension folder `~/.openclaw/extensions/acpx.bak` to `~/.openclaw/extensions-quarantine/acpx.bak.20260305T214139Z` (no deletion).
- 2026-03-05T21:42Z: Re-ran `openclaw security audit --deep`: now 0 critical, 0 warn, 1 info.
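The quarantine step (move with a UTC-timestamp suffix, never delete) can be sketched like this. The path layout and timestamp format mirror the log entry; the `quarantine` helper itself is hypothetical, not the actual tooling used.

```python
# Sketch of the quarantine-not-delete move recorded above: the suspect
# folder is moved into a sibling quarantine directory under a name with
# a UTC timestamp suffix, so nothing is destroyed. Illustrative only.
import shutil
from datetime import datetime, timezone
from pathlib import Path


def quarantine(folder: Path, quarantine_root: Path) -> Path:
    """Move `folder` to quarantine_root/<name>.<UTC timestamp>."""
    # Same timestamp shape as the log entry, e.g. 20260305T214139Z.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    quarantine_root.mkdir(parents=True, exist_ok=True)
    dest = quarantine_root / f"{folder.name}.{stamp}"
    shutil.move(str(folder), str(dest))  # a move, not a delete
    return dest
```

The design choice matters for auditability: because the folder is moved rather than removed, a false positive in the security audit can be reversed by moving it back.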