From e7051a617f921fcdace87c910fd36879adbfe610 Mon Sep 17 00:00:00 2001
From: zap
Date: Thu, 5 Mar 2026 19:00:58 +0000
Subject: [PATCH] docs(council): add Flynn council pipeline fix plan

- 5-phase plan: config, structured output, bridge caps, E2E run, zap integration
- Work to happen on fix/council-pipeline branch in ~/flynn
- Goal: get Flynn's dual-council working so zap can delegate to it
---
 memory/2026-03-05.md              |  13 +++
 memory/plans/flynn-council-fix.md | 128 ++++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+)
 create mode 100644 memory/plans/flynn-council-fix.md

diff --git a/memory/2026-03-05.md b/memory/2026-03-05.md
index 1060e6f..89b59f6 100644
--- a/memory/2026-03-05.md
+++ b/memory/2026-03-05.md
@@ -36,3 +36,16 @@
 - Debate and parallel 3-round are mechanically identical — differ only in prompt tone.
 - Updated SKILL.md with experimental findings, recommended defaults by use case, cost profiles.
 - New TODOs added: unify debate/parallel flows, test 2-round sufficiency, test mixed model tiers.
+
+## Flynn council pipeline — fix plan written
+- Reviewed Flynn codebase (`~/flynn/src/councils/`): orchestrator, types, scaffold, preflight, tool all exist. Unit tests pass (mocked). Never run against real models.
+- Pipeline architecture: dual D/P groups (freethinker + arbiter + optional grounder each), bridge exchange between groups, meta arbiter merge, full trace/conversation logging.
+- Created `memory/plans/flynn-council-fix.md` — 5-phase plan:
+  1. Config & agent setup (define 5 required agents in config)
+  2. Structured output compatibility (JSON schema support varies by provider)
+  3. Bridge & cap validation (defaults may be too tight for real output)
+  4. End-to-end run with real models
+  5. Integration with zap (CLI command or ACP agent)
+- Will wants to keep zap's "light" council skill AND have Flynn's deterministic pipeline available for delegation.
+- Work to happen on feature branch `fix/council-pipeline`.
+- Estimated effort: 1-2 focused sessions.
diff --git a/memory/plans/flynn-council-fix.md b/memory/plans/flynn-council-fix.md
new file mode 100644
index 0000000..5985b95
--- /dev/null
+++ b/memory/plans/flynn-council-fix.md
@@ -0,0 +1,128 @@
+# Flynn Council Pipeline — Fix Plan
+
+**Goal**: Get Flynn's dual-council pipeline (`council.run`) working against real models so zap can delegate council tasks to Flynn as an external agent.
+
+**Branch**: `fix/council-pipeline` (off `main`)
+
+**Status**: The orchestrator code, types, schemas, tool registration, TUI `/council` command, and preflight check all exist. Unit tests pass (mocked). But the pipeline has never run successfully against real models.
+
+---
+
+## Phase 1: Configuration & Agent Setup
+
+**Problem**: The council requires 5 named agents in `agent_configs` that don't exist in the default config (everything is commented out).
+
+**Tasks**:
+1. Uncomment and populate the `councils` block in `config/default.yaml` with `enabled: true`.
+2. Define the 5 required agent configs:
+   - `council_d_arbiter` — D-group arbiter (feasibility-focused, structured JSON output)
+   - `council_d_freethinker` — D-group freethinker (ideation, boring-but-true)
+   - `council_p_arbiter` — P-group arbiter (novelty-focused, structured JSON output)
+   - `council_p_freethinker` — P-group freethinker (ideation, weird-is-fine)
+   - `council_meta_arbiter` — Meta merge agent (selects across both groups)
+3. Each agent needs:
+   - A `system_prompt` that matches the pipeline's expected behavior (JSON-only output, role-specific framing)
+   - A `model_tier` (start with `default` for all; upgrade meta to `complex` after first success)
+4. Decide whether to add grounder/writer agents or skip them initially (recommendation: skip; they're optional).
+
+**Acceptance**: `flynn tui` → `/council preflight` shows all agents resolved, tiers probed OK, no `[agent_missing]` flags.
+
+---
+
+## Phase 2: Structured Output Compatibility
+
+**Problem**: The orchestrator demands strict JSON schema output (`responseFormat: jsonSchemaFormat(...)`) from every agent call. Support for this varies by provider, and some models comply inconsistently. The pipeline has JSON repair + agent-based recovery, but if the underlying model doesn't support `response_format: json_schema`, it may fail before repair kicks in.
+
+**Tasks**:
+1. Verify which models/providers in Flynn's config support `response_format` with `json_schema` type.
+   - OpenAI GPT-4o+: yes
+   - Anthropic Claude: no native `json_schema` (uses prompt-based JSON)
+   - Copilot/OpenRouter: depends on underlying model
+   - Ollama: partial support
+2. Check how Flynn's model router handles `responseFormat` for providers that don't support it — does it silently drop it, error, or adapt?
+   - File: `src/models/` — check provider adapters
+3. If needed, make the `responseFormat` parameter gracefully degrade:
+   - For providers without `json_schema` support, rely on the system prompt directive ("Return JSON only...") + the existing `parseWithAgentRecovery` fallback
+   - Don't hard-fail if the provider ignores `responseFormat`
+4. Test with the actual configured model to confirm JSON output parses correctly through the Zod schemas.
+
+**Acceptance**: A single group round (D, round 1) completes without `repair_failed` or `parse_failed` using the configured model.
+
+---
+
+## Phase 3: Bridge & Cap Validation
+
+**Problem**: `enforceBridgeCaps()` throws hard on any cap violation (`cap_exceeded`), which kills the entire run. Real model output is likely to exceed the tight defaults (e.g., `bridge_entry_max_chars: 300`).
+
+**Tasks**:
+1. Review default cap values and increase if they're too restrictive for real output:
+   - `bridge_packet_max_chars: 2500` — may need 4000-5000
+   - `bridge_entry_max_chars: 300` — may need 500-800
+   - `bridge_field_max_bullets: 6` — probably fine
+2. Consider making `enforceBridgeCaps` truncate rather than throw — trim entries to max chars, drop excess bullets, with a trace warning.
+3. Alternatively, add a `strict_bridge: false` config option that allows soft enforcement.
+
+**Acceptance**: A 2-round run completes without `bridge_validation_failed` stop reason.
+
+---
+
+## Phase 4: End-to-End Run
+
+**Tasks**:
+1. Run `/council preflight` — confirm clean.
+2. Run `/council ` — e.g., "What's the best approach to add persistent memory to an AI assistant?"
+3. Verify:
+   - Pipeline reaches `max_rounds` or `convergence` stop reason (not an error).
+   - Both D and P groups produce shortlists.
+   - Meta merge produces `selected_primary` and `selected_secondary`.
+   - Artifacts are written to `~/.local/share/flynn/councils/`.
+   - Markdown summary is human-readable and useful.
+4. Fix any issues surfaced during the run (likely: JSON format, cap overflow, agent prompt tuning).
+
+**Acceptance**: At least one clean end-to-end run with real models, artifacts saved, readable output.
+
+---
+
+## Phase 5: Integration with Zap (OpenClaw)
+
+**Goal**: Let zap delegate council tasks to Flynn via external agent invocation.
+
+**Tasks**:
+1. Determine the integration path:
+   - **Option A**: Flynn exposes a CLI command (`flynn council run --task "..."`) that zap can call via `exec`.
+   - **Option B**: Flynn exposes an HTTP endpoint for council runs (if the gateway supports it).
+   - **Option C**: Zap uses `sessions_spawn` to invoke Flynn as an ACP agent with a council task.
+2. Implement the chosen path (likely Option A, the simplest):
+   - Add `flynn council run --task "" [--max-rounds N] [--output json|markdown]` CLI subcommand.
+   - Output the markdown summary to stdout, JSON to a file.
+3. Update zap's council skill to support a `backend: flynn` option that delegates to Flynn instead of spawning subagents.
+
+**Acceptance**: Zap can invoke `flynn council run --task "..."` and get structured output back.
+
+---
+
+## Estimated Work
+
+| Phase | Effort | Risk |
+|-------|--------|------|
+| 1. Config & agents | Small (config-only) | Low |
+| 2. Structured output | Medium (may need provider adapter changes) | Medium — depends on model JSON compliance |
+| 3. Bridge caps | Small (config + maybe truncation logic) | Low |
+| 4. E2E run | Medium (iterative debugging) | Medium — real models are unpredictable |
+| 5. Zap integration | Medium (new CLI command + skill update) | Low |
+
+**Total**: ~1-2 focused sessions.
+
+---
+
+## Open Questions
+- Which model tier to use for council agents? Start with `default` (cheapest), upgrade once confirmed working.
+- Should we keep the scaffold system or skip it for now? Recommendation: skip (`scaffold_path` unset), use system prompts only.
+- Do we need the writer agents? Recommendation: skip for v1; the meta arbiter output is sufficient.
+
+## TODO (from earlier council skill work)
+- Revisit subagent personality depth
+- Revisit skill name ("council")
+- Consider unifying debate and parallel flows
+- Experiment with 2-round sufficiency
+- Test with different model tiers for advisors vs referee
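
Phase 3, task 2 of the plan (truncate rather than throw) could look roughly like the TypeScript below. This is a minimal, self-contained sketch under stated assumptions, not Flynn's code. `softEnforceBridgeCaps`, `BridgePacket` (field name mapped to a bullet list), `BridgeCaps`, and measuring packet size as serialized JSON length are all hypothetical. The function applies the three caps in order: bullets per field, characters per entry, then total packet size. When a cap is exceeded, it returns trace warnings instead of throwing `cap_exceeded`.

```typescript
// Hypothetical shape: bridge field name -> list of bullet entries.
// Flynn's real bridge packet type may differ.
type BridgePacket = Record<string, string[]>;

interface BridgeCaps {
  packetMaxChars: number; // cf. bridge_packet_max_chars
  entryMaxChars: number; // cf. bridge_entry_max_chars
  fieldMaxBullets: number; // cf. bridge_field_max_bullets
}

interface SoftCapResult {
  packet: BridgePacket;
  warnings: string[]; // meant for the run trace instead of throwing cap_exceeded
}

// Cut an entry to at most maxChars, marking the cut with a one-char ellipsis.
function truncateEntry(entry: string, maxChars: number): string {
  if (entry.length <= maxChars) return entry;
  if (maxChars <= 0) return "";
  return entry.slice(0, maxChars - 1) + "\u2026";
}

// Assumption: packet size is measured as the length of its JSON serialization.
function packetSize(packet: BridgePacket): number {
  return JSON.stringify(packet).length;
}

function softEnforceBridgeCaps(input: BridgePacket, caps: BridgeCaps): SoftCapResult {
  const warnings: string[] = [];
  const packet: BridgePacket = {};

  // Per-field caps: drop excess bullets, then truncate over-long entries.
  for (const [field, bullets] of Object.entries(input)) {
    let kept = bullets;
    if (kept.length > caps.fieldMaxBullets) {
      warnings.push(`${field}: dropped ${kept.length - caps.fieldMaxBullets} bullet(s) over cap`);
      kept = kept.slice(0, caps.fieldMaxBullets);
    }
    packet[field] = kept.map((entry, i) => {
      if (entry.length > caps.entryMaxChars) {
        warnings.push(`${field}[${i}]: truncated ${entry.length} -> ${caps.entryMaxChars} chars`);
      }
      return truncateEntry(entry, caps.entryMaxChars);
    });
  }

  // Packet cap: while still too large, drop the last bullet of the field whose
  // bullets are longest in total, until it fits or nothing is left to drop.
  while (packetSize(packet) > caps.packetMaxChars) {
    let longest: string | undefined;
    let longestLen = -1;
    for (const [field, bullets] of Object.entries(packet)) {
      const len = bullets.join("").length;
      if (bullets.length > 0 && len > longestLen) {
        longest = field;
        longestLen = len;
      }
    }
    if (longest === undefined) break;
    packet[longest] = (packet[longest] ?? []).slice(0, -1);
    warnings.push(`${longest}: dropped last bullet to fit packet cap`);
  }

  return { packet, warnings };
}
```

With the current defaults (`{ packetMaxChars: 2500, entryMaxChars: 300, fieldMaxBullets: 6 }`), an over-long entry is cut to 299 characters plus an ellipsis instead of aborting the run. Dropping bullets from the longest field first is one possible priority. Trimming every field evenly would be an equally defensible alternative.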