zap e7051a617f docs(council): add Flynn council pipeline fix plan

- 5-phase plan: config, structured output, bridge caps, E2E run, zap integration
- Work to happen on fix/council-pipeline branch in ~/flynn
- Goal: get Flynn's dual-council working so zap can delegate to it

2026-03-05 19:00:58 +00:00

Flynn Council Pipeline — Fix Plan

Goal: Get Flynn's dual-council pipeline (council.run) working against real models so zap can delegate council tasks to Flynn as an external agent.

Branch: fix/council-pipeline (off main)

Status: The orchestrator code, types, schemas, tool registration, TUI /council command, and preflight check all exist. Unit tests pass against mocks, but the pipeline has never run successfully against real models.

Phase 1: Configuration & Agent Setup

Problem: The council requires five named agents in agent_configs, but the default config defines none of them (the whole block is commented out).

Tasks:

Uncomment and populate councils block in config/default.yaml with enabled: true.
Define the 5 required agent configs:
- council_d_arbiter — D-group arbiter (feasibility-focused, structured JSON output)
- council_d_freethinker — D-group freethinker (ideation, boring-but-true)
- council_p_arbiter — P-group arbiter (novelty-focused, structured JSON output)
- council_p_freethinker — P-group freethinker (ideation, weird-is-fine)
- council_meta_arbiter — Meta merge agent (selects across both groups)
Each agent needs:
- A system_prompt that matches the pipeline's expected behavior (JSON-only output, role-specific framing)
- A model_tier (start with default for all; upgrade meta to complex after first success)
Decide whether to add grounder/writer agents or skip them initially (recommendation: skip, they're optional).

Acceptance: flynn tui → /council preflight shows all agents resolved, tiers probed OK, no [agent_missing] flags.
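The agent requirements above can be checked mechanically before a run. The following is a minimal sketch, not Flynn's actual preflight code: the `AgentConfig` shape and the `findCouncilConfigProblems` helper are hypothetical, and only the five agent names, the `system_prompt` and `model_tier` fields, and the `agent_missing` flag come from this plan.

```typescript
// Hypothetical shape: the plan only says each agent needs these two fields.
interface AgentConfig {
  system_prompt: string;
  model_tier: string;
}

// The five agent names this plan lists as required.
const REQUIRED_COUNCIL_AGENTS = [
  "council_d_arbiter",
  "council_d_freethinker",
  "council_p_arbiter",
  "council_p_freethinker",
  "council_meta_arbiter",
] as const;

// Returns one problem string per missing or incomplete agent.
// An empty array means every required agent is present and populated.
function findCouncilConfigProblems(
  agentConfigs: Record<string, Partial<AgentConfig> | undefined>,
): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_COUNCIL_AGENTS) {
    const cfg = agentConfigs[name];
    if (cfg === undefined) {
      problems.push(`${name}: agent_missing`);
      continue;
    }
    if (!cfg.system_prompt || cfg.system_prompt.trim() === "") {
      problems.push(`${name}: system_prompt is empty`);
    }
    if (!cfg.model_tier) {
      problems.push(`${name}: model_tier is unset`);
    }
  }
  return problems;
}
```

Running a check like this against the parsed config before starting the TUI would surface a missing agent without spending a model call.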

Phase 2: Structured Output Compatibility

Problem: The orchestrator demands strict JSON schema output (responseFormat: jsonSchemaFormat(...)) from every agent call. Support for this varies by provider, and some models follow the schema inconsistently. The pipeline has JSON repair and agent-based recovery, but if the underlying model doesn't support response_format: json_schema, the call may fail before repair kicks in.

Tasks:

Verify which models/providers in Flynn's config support response_format with json_schema type.
- OpenAI GPT-4o+: yes
- Anthropic Claude: no native json_schema (uses prompt-based JSON)
- Copilot/OpenRouter: depends on underlying model
- Ollama: partial support
Check how Flynn's model router handles responseFormat for providers that don't support it — does it silently drop it, error, or adapt?
- File: src/models/ — check provider adapters
If needed, make the responseFormat parameter gracefully degrade:
- For providers without json_schema support, rely on the system prompt directive ("Return JSON only...") + the existing parseWithAgentRecovery fallback
- Don't hard-fail if the provider ignores responseFormat
Test with the actual configured model to confirm JSON output parses correctly through the Zod schemas.

Acceptance: A single group round (D, round 1) completes without repair_failed or parse_failed using the configured model.
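The graceful-degradation idea above could look roughly like this. This is a sketch under stated assumptions: `JsonSupport`, `ChatRequest`, `buildRequest`, and `extractJsonObject` are hypothetical names, and the real adapters in src/models/ and the existing parseWithAgentRecovery fallback may be structured quite differently.

```typescript
// Hypothetical capability flag; real provider adapters may expose this differently.
type JsonSupport = "json_schema" | "none";

// Hypothetical request shape used only for this illustration.
interface ChatRequest {
  system: string;
  user: string;
  responseFormat?: { type: "json_schema"; schema: object };
}

// Sends responseFormat only when the provider supports it. Otherwise the
// parameter is dropped and the system prompt carries the JSON-only directive.
function buildRequest(
  system: string,
  user: string,
  schema: object,
  support: JsonSupport,
): ChatRequest {
  if (support === "json_schema") {
    return { system, user, responseFormat: { type: "json_schema", schema } };
  }
  return {
    system: `${system}\n\nReturn JSON only, with no prose or code fences.`,
    user,
  };
}

// Parses the span from the first "{" to the last "}" in a reply, which
// tolerates replies wrapped in code fences or short surrounding prose.
function extractJsonObject(reply: string): unknown {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("parse_failed: no JSON object found in reply");
  }
  return JSON.parse(reply.slice(start, end + 1));
}
```

The parsed value would still need to pass the pipeline's schema validation; this step only removes wrapping that a prompt-based provider might add.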

Phase 3: Bridge & Cap Validation

Problem: enforceBridgeCaps() throws on any cap violation (cap_exceeded), which aborts the entire run. Real model output is likely to exceed the tight defaults (e.g., bridge_entry_max_chars: 300).

Tasks:

Review default cap values and increase if they're too restrictive for real output:
- bridge_packet_max_chars: 2500 — may need 4000-5000
- bridge_entry_max_chars: 300 — may need 500-800
- bridge_field_max_bullets: 6 — probably fine
Consider making enforceBridgeCaps truncate rather than throw — trim entries to max chars, drop excess bullets, with a trace warning.
Alternatively, add a strict_bridge: false config option that allows soft enforcement.

Acceptance: A 2-round run completes without bridge_validation_failed stop reason.
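The truncate-instead-of-throw option could be sketched as follows. The `softEnforceFieldCaps` function and its types are hypothetical and do not reflect the existing enforceBridgeCaps() signature; only the cap names are taken from the defaults listed above. The sketch covers the per-entry and per-field caps and leaves bridge_packet_max_chars out for brevity.

```typescript
// Cap names follow the defaults listed in this plan.
interface BridgeCaps {
  bridge_entry_max_chars: number;
  bridge_field_max_bullets: number;
}

// Hypothetical return shape: the capped bullets plus trace warnings.
interface SoftCapResult {
  bullets: string[];
  warnings: string[];
}

// Applies caps to one bridge field without throwing: excess bullets are
// dropped first, then each remaining entry is cut to the character limit.
function softEnforceFieldCaps(
  field: string,
  bullets: string[],
  caps: BridgeCaps,
): SoftCapResult {
  const warnings: string[] = [];
  let kept = bullets;
  if (kept.length > caps.bridge_field_max_bullets) {
    const dropped = kept.length - caps.bridge_field_max_bullets;
    warnings.push(`${field}: dropped ${dropped} excess bullet(s)`);
    kept = kept.slice(0, caps.bridge_field_max_bullets);
  }
  const trimmed = kept.map((entry, i) => {
    if (entry.length <= caps.bridge_entry_max_chars) {
      return entry;
    }
    warnings.push(
      `${field}[${i}]: truncated from ${entry.length} to ${caps.bridge_entry_max_chars} chars`,
    );
    return entry.slice(0, caps.bridge_entry_max_chars);
  });
  return { bullets: trimmed, warnings };
}
```

Returning the warnings rather than logging them directly would let the caller attach them to the run trace, and a strict_bridge flag could decide whether a non-empty warning list is escalated to an error.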

Phase 4: End-to-End Run

Tasks:

Run /council preflight — confirm clean.
Run /council <simple test task> — e.g., "What's the best approach to add persistent memory to an AI assistant?"
Verify:
- Pipeline reaches max_rounds or convergence stop reason (not an error).
- Both D and P groups produce shortlists.
- Meta merge produces selected_primary and selected_secondary.
- Artifacts are written to ~/.local/share/flynn/councils/.
- Markdown summary is human-readable and useful.
Fix any issues surfaced during the run (likely: JSON format, cap overflow, agent prompt tuning).

Acceptance: At least one clean end-to-end run with real models, artifacts saved, readable output.
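The verification checklist above can be turned into a small assertion helper for repeated runs. This is only a sketch: the `CouncilRunResult` shape and the shortlist field names are assumptions made for illustration, while the stop reasons and the selected_primary and selected_secondary names come from this plan.

```typescript
// Hypothetical result shape; the real run output may be organized differently.
interface CouncilRunResult {
  stop_reason: string;
  d_shortlist: string[];
  p_shortlist: string[];
  selected_primary?: string;
  selected_secondary?: string;
}

// Stop reasons this plan treats as a clean finish.
const CLEAN_STOP_REASONS = ["max_rounds", "convergence"];

// Returns a list of failed checks; an empty list means the run looks clean.
function checkCouncilRun(result: CouncilRunResult): string[] {
  const failures: string[] = [];
  if (CLEAN_STOP_REASONS.indexOf(result.stop_reason) === -1) {
    failures.push(`unexpected stop reason: ${result.stop_reason}`);
  }
  if (result.d_shortlist.length === 0) {
    failures.push("D group produced no shortlist");
  }
  if (result.p_shortlist.length === 0) {
    failures.push("P group produced no shortlist");
  }
  if (!result.selected_primary) {
    failures.push("meta merge did not set selected_primary");
  }
  if (!result.selected_secondary) {
    failures.push("meta merge did not set selected_secondary");
  }
  return failures;
}
```

Readability of the markdown summary and the presence of artifacts on disk still need a manual look; the helper only covers the structural checks.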

Phase 5: Integration with Zap (OpenClaw)

Goal: Let zap delegate council tasks to Flynn via external agent invocation.

Tasks:

Determine the integration path:
- Option A: Flynn exposes a CLI command (flynn council run --task "...") that zap can call via exec.
- Option B: Flynn exposes an HTTP endpoint for council runs (if gateway supports it).
- Option C: Zap uses sessions_spawn to invoke Flynn as an ACP agent with a council task.
Implement the chosen path (likely Option A as simplest):
- Add flynn council run --task "<task>" [--max-rounds N] [--output json|markdown] CLI subcommand.
- Output the markdown summary to stdout, JSON to a file.
Update zap's council skill to support a backend: flynn option that delegates to Flynn instead of spawning subagents.

Acceptance: Zap can invoke flynn council run --task "..." and get structured output back.
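For Option A, the caller side might assemble the command line like this. The `buildCouncilArgs` helper and `CouncilRunOptions` type are hypothetical; the flags mirror the proposed flynn council run subcommand, which does not exist yet.

```typescript
// Hypothetical options type mirroring the proposed CLI flags.
interface CouncilRunOptions {
  task: string;
  maxRounds?: number;
  output?: "json" | "markdown";
}

// Builds the argument vector for `flynn council run`. Keeping the task as a
// single array element avoids shell quoting problems for tasks that contain
// spaces or quotes, provided the caller spawns the process without a shell.
function buildCouncilArgs(opts: CouncilRunOptions): string[] {
  if (opts.task.trim() === "") {
    throw new Error("task must not be empty");
  }
  const args = ["council", "run", "--task", opts.task];
  if (opts.maxRounds !== undefined) {
    if (Math.floor(opts.maxRounds) !== opts.maxRounds || opts.maxRounds < 1) {
      throw new Error("maxRounds must be a positive integer");
    }
    args.push("--max-rounds", String(opts.maxRounds));
  }
  if (opts.output !== undefined) {
    args.push("--output", opts.output);
  }
  return args;
}
```

In Node.js, the resulting array could be passed to `execFile("flynn", args, callback)` from the `child_process` module, which runs the binary directly rather than through a shell.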

Estimated Work

Phase	Effort	Risk
1. Config & agents	Small (config-only)	Low
2. Structured output	Medium (may need provider adapter changes)	Medium — depends on model JSON compliance
3. Bridge caps	Small (config + maybe truncation logic)	Low
4. E2E run	Medium (iterative debugging)	Medium — real models are unpredictable
5. Zap integration	Medium (new CLI command + skill update)	Low

Total: ~1-2 focused sessions.

Open Questions

Which model tier to use for council agents? Start with default (cheapest), upgrade after confirmed working.
Should we keep the scaffold system or skip it for now? Recommendation: skip (scaffold_path unset), use system prompts only.
Do we need the writer agents? Recommendation: skip for v1, the meta arbiter output is sufficient.

TODO (from earlier council skill work)

Revisit subagent personality depth
Revisit skill name ("council")
Consider unifying debate and parallel flows
Experiment with 2-round sufficiency
Test with different model tiers for advisors vs referee
