swarm-zap/memory/plans/flynn-council-fix.md
zap e7051a617f docs(council): add Flynn council pipeline fix plan
- 5-phase plan: config, structured output, bridge caps, E2E run, zap integration
- Work to happen on fix/council-pipeline branch in ~/flynn
- Goal: get Flynn's dual-council working so zap can delegate to it
2026-03-05 19:00:58 +00:00

Flynn Council Pipeline — Fix Plan

Goal: Get Flynn's dual-council pipeline (council.run) working against real models so zap can delegate council tasks to Flynn as an external agent.

Branch: fix/council-pipeline (off main)

Status: The orchestrator code, types, schemas, tool registration, TUI /council command, and preflight check all exist. Unit tests pass (mocked). But the pipeline has never run successfully against real models.


Phase 1: Configuration & Agent Setup

Problem: The council requires 5 named agents in agent_configs that don't exist in the default config (everything is commented out).

Tasks:

  1. Uncomment and populate councils block in config/default.yaml with enabled: true.
  2. Define the 5 required agent configs:
    • council_d_arbiter — D-group arbiter (feasibility-focused, structured JSON output)
    • council_d_freethinker — D-group freethinker (ideation, boring-but-true)
    • council_p_arbiter — P-group arbiter (novelty-focused, structured JSON output)
    • council_p_freethinker — P-group freethinker (ideation, weird-is-fine)
    • council_meta_arbiter — Meta merge agent (selects across both groups)
  3. Each agent needs:
    • A system_prompt that matches the pipeline's expected behavior (JSON-only output, role-specific framing)
    • A model_tier (start with default for all; upgrade meta to complex after first success)
  4. Decide whether to add grounder/writer agents or skip them initially (recommendation: skip, they're optional).

Acceptance: Running /council preflight inside flynn tui shows all agents resolved, tiers probed OK, and no [agent_missing] flags.
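
As a rough illustration of the agent-resolution part of that check, the sketch below reports which of the five required agents are absent from agent_configs. The AgentConfig shape and the findMissingCouncilAgents helper are assumptions made for this example, not Flynn's actual preflight code.

```typescript
// Hypothetical config shape; Flynn's real agent config type may carry more fields.
interface AgentConfig {
  system_prompt: string;
  model_tier: string;
}

// The five agent names this plan lists as required.
const REQUIRED_COUNCIL_AGENTS = [
  "council_d_arbiter",
  "council_d_freethinker",
  "council_p_arbiter",
  "council_p_freethinker",
  "council_meta_arbiter",
];

// Returns the required agent names that are absent or have an empty system prompt.
function findMissingCouncilAgents(
  agentConfigs: Record<string, AgentConfig | undefined>,
): string[] {
  return REQUIRED_COUNCIL_AGENTS.filter((name) => {
    const cfg = agentConfigs[name];
    return cfg === undefined || cfg.system_prompt.trim() === "";
  });
}
```

An empty result corresponds to "all agents resolved"; anything else is the list that would surface as [agent_missing] flags.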


Phase 2: Structured Output Compatibility

Problem: The orchestrator demands strict JSON schema output (responseFormat: jsonSchemaFormat(...)) from every agent call. Support for this varies by provider and model, and some honor it only inconsistently. The pipeline has JSON repair + agent-based recovery, but if the underlying model doesn't support response_format: json_schema, the call may fail before repair kicks in.

Tasks:

  1. Verify which models/providers in Flynn's config support response_format with json_schema type.
    • OpenAI GPT-4o+: yes
    • Anthropic Claude: no native json_schema (uses prompt-based JSON)
    • Copilot/OpenRouter: depends on underlying model
    • Ollama: partial support
  2. Check how Flynn's model router handles responseFormat for providers that don't support it — does it silently drop it, error, or adapt?
    • File: src/models/ — check provider adapters
  3. If needed, make the responseFormat parameter gracefully degrade:
    • For providers without json_schema support, rely on the system prompt directive ("Return JSON only...") + the existing parseWithAgentRecovery fallback
    • Don't hard-fail if the provider ignores responseFormat
  4. Test with the actual configured model to confirm JSON output parses correctly through the Zod schemas.

Acceptance: A single group round (D, round 1) completes without repair_failed or parse_failed using the configured model.
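
One possible shape for the graceful degradation in task 3 is sketched below. It assumes a simplified ChatRequest type and a per-provider supportsJsonSchema flag, both invented for this example; the real provider adapters under src/models/ will look different. The extractJson helper is a best-effort pre-step, not a replacement for the existing repair and parseWithAgentRecovery path.

```typescript
// Hypothetical request shape; Flynn's real adapter types may differ.
interface ChatRequest {
  system: string;
  user: string;
  responseFormat?: { type: "json_schema"; schema: object };
}

const JSON_ONLY_DIRECTIVE = "Return JSON only, with no prose or code fences.";

// If the provider lacks json_schema support, drop responseFormat and make
// sure the system prompt carries the JSON-only directive instead.
function adaptRequest(req: ChatRequest, supportsJsonSchema: boolean): ChatRequest {
  if (supportsJsonSchema || req.responseFormat === undefined) return req;
  const system = req.system.includes(JSON_ONLY_DIRECTIVE)
    ? req.system
    : `${req.system}\n\n${JSON_ONLY_DIRECTIVE}`;
  // Rebuild the request without responseFormat rather than mutating the input.
  return { system, user: req.user };
}

// Best-effort extraction of a JSON object from a reply that may be wrapped in
// prose. Returns undefined when nothing parses, so the caller can hand the
// raw text to a recovery step.
function extractJson(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return undefined;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return undefined;
  }
}
```

The parsed value would still need to pass the Zod schemas; this sketch only covers getting from a loosely formatted reply to something parseable.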


Phase 3: Bridge & Cap Validation

Problem: enforceBridgeCaps() throws hard on any cap violation (cap_exceeded), which kills the entire run. Real model output is likely to exceed the tight defaults (e.g., bridge_entry_max_chars: 300).

Tasks:

  1. Review default cap values and increase if they're too restrictive for real output:
    • bridge_packet_max_chars: 2500 — may need 4000-5000
    • bridge_entry_max_chars: 300 — may need 500-800
    • bridge_field_max_bullets: 6 — probably fine
  2. Consider making enforceBridgeCaps truncate rather than throw — trim entries to max chars, drop excess bullets, with a trace warning.
  3. Alternatively, add a strict_bridge: false config option that allows soft enforcement.

Acceptance: A 2-round run completes without bridge_validation_failed stop reason.
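
Tasks 2 and 3 could be combined roughly as follows: truncate by default, collect trace warnings, and throw only when strict_bridge is set. The BridgePacket shape (named fields holding bullet strings) and the applyBridgeCaps name are assumptions made for this sketch; the real enforceBridgeCaps signature is not shown in this plan, and bridge_packet_max_chars is left out for brevity.

```typescript
// Hypothetical packet shape: named fields, each holding a list of bullet strings.
type BridgePacket = Record<string, string[]>;

interface BridgeCaps {
  bridge_entry_max_chars: number;
  bridge_field_max_bullets: number;
  strict_bridge: boolean; // proposed option; true keeps the current throw-on-violation behavior
}

interface CapResult {
  packet: BridgePacket;
  warnings: string[];
}

// Soft enforcement: drop excess bullets and trim long entries, recording a
// warning for each change. In strict mode any violation throws instead.
function applyBridgeCaps(packet: BridgePacket, caps: BridgeCaps): CapResult {
  const warnings: string[] = [];
  const out: BridgePacket = {};
  for (const field of Object.keys(packet)) {
    const bullets = packet[field] ?? [];
    const excess = bullets.length - caps.bridge_field_max_bullets;
    if (excess > 0) warnings.push(`${field}: dropped ${excess} bullet(s)`);
    out[field] = bullets.slice(0, caps.bridge_field_max_bullets).map((entry, i) => {
      if (entry.length <= caps.bridge_entry_max_chars) return entry;
      warnings.push(`${field}[${i}]: truncated from ${entry.length} chars`);
      return entry.slice(0, caps.bridge_entry_max_chars);
    });
  }
  if (caps.strict_bridge && warnings.length > 0) {
    throw new Error(`cap_exceeded: ${warnings.join("; ")}`);
  }
  return { packet: out, warnings };
}
```

Returning the warnings instead of logging them directly leaves it to the orchestrator to decide how they appear in the trace.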


Phase 4: End-to-End Run

Tasks:

  1. Run /council preflight — confirm clean.
  2. Run /council <simple test task> — e.g., "What's the best approach to add persistent memory to an AI assistant?"
  3. Verify:
    • Pipeline stops with a max_rounds or convergence stop reason, not an error.
    • Both D and P groups produce shortlists.
    • Meta merge produces selected_primary and selected_secondary.
    • Artifacts are written to ~/.local/share/flynn/councils/.
    • Markdown summary is human-readable and useful.
  4. Fix any issues surfaced during the run (likely: JSON format, cap overflow, agent prompt tuning).

Acceptance: At least one clean end-to-end run with real models, artifacts saved, readable output.
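
The checks in step 3 that do not touch the filesystem could be automated along these lines. The CouncilRunResult shape is inferred from the checklist and is an assumption; the real council.run output may name or nest these fields differently.

```typescript
// Hypothetical result shape, inferred from the verification checklist.
interface CouncilRunResult {
  stop_reason: string;
  d_shortlist: string[];
  p_shortlist: string[];
  selected_primary?: string;
  selected_secondary?: string;
}

// Stop reasons the plan treats as a clean finish.
const CLEAN_STOP_REASONS = ["max_rounds", "convergence"];

// Returns a list of human-readable problems; an empty list means the run
// passed every check covered here.
function checkRun(result: CouncilRunResult): string[] {
  const problems: string[] = [];
  if (CLEAN_STOP_REASONS.indexOf(result.stop_reason) === -1) {
    problems.push(`unexpected stop reason: ${result.stop_reason}`);
  }
  if (result.d_shortlist.length === 0) problems.push("D group produced no shortlist");
  if (result.p_shortlist.length === 0) problems.push("P group produced no shortlist");
  if (!result.selected_primary) problems.push("meta merge is missing selected_primary");
  if (!result.selected_secondary) problems.push("meta merge is missing selected_secondary");
  return problems;
}
```

Artifact presence and summary readability still need a manual look; this only covers the structural checks.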


Phase 5: Integration with Zap (OpenClaw)

Goal: Let zap delegate council tasks to Flynn via external agent invocation.

Tasks:

  1. Determine the integration path:
    • Option A: Flynn exposes a CLI command (flynn council run --task "...") that zap can call via exec.
    • Option B: Flynn exposes an HTTP endpoint for council runs (if gateway supports it).
    • Option C: Zap uses sessions_spawn to invoke Flynn as an ACP agent with a council task.
  2. Implement the chosen path (likely Option A as simplest):
    • Add flynn council run --task "<task>" [--max-rounds N] [--output json|markdown] CLI subcommand.
    • Write the markdown summary to stdout and the JSON result to a file.
  3. Update zap's council skill to support a backend: flynn option that delegates to Flynn instead of spawning subagents.

Acceptance: Zap can invoke flynn council run --task "..." and get structured output back.
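
For Option A, the zap side mostly needs to assemble the argument list for the proposed subcommand. The sketch below assumes the flags exactly as drafted in this plan (--task, --max-rounds, --output); the subcommand does not exist yet, and buildCouncilArgs is a hypothetical helper name.

```typescript
// Options mirror the flags proposed in this plan; nothing here is implemented in Flynn yet.
interface CouncilRunOptions {
  maxRounds?: number;
  output?: "json" | "markdown";
}

// Builds the argv (without the "flynn" binary name) for the proposed
// `flynn council run` subcommand.
function buildCouncilArgs(task: string, opts: CouncilRunOptions = {}): string[] {
  if (task.trim() === "") throw new Error("task must not be empty");
  const args = ["council", "run", "--task", task];
  if (opts.maxRounds !== undefined) {
    if (!Number.isInteger(opts.maxRounds) || opts.maxRounds < 1) {
      throw new Error("maxRounds must be a positive integer");
    }
    args.push("--max-rounds", String(opts.maxRounds));
  }
  if (opts.output !== undefined) args.push("--output", opts.output);
  return args;
}
```

Passing the task as a separate argv element (for example to Node's child_process.execFile) rather than interpolating it into a shell string avoids quoting problems when the task text contains quotes or other shell metacharacters.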


Estimated Work

| Phase | Effort | Risk |
| --- | --- | --- |
| 1. Config & agents | Small (config-only) | Low |
| 2. Structured output | Medium (may need provider adapter changes) | Medium: depends on model JSON compliance |
| 3. Bridge caps | Small (config + maybe truncation logic) | Low |
| 4. E2E run | Medium (iterative debugging) | Medium: real models are unpredictable |
| 5. Zap integration | Medium (new CLI command + skill update) | Low |

Total: ~1-2 focused sessions.


Open Questions

  • Which model tier to use for council agents? Start with default (cheapest), upgrade after confirmed working.
  • Should we keep the scaffold system or skip it for now? Recommendation: skip (scaffold_path unset), use system prompts only.
  • Do we need the writer agents? Recommendation: skip for v1, the meta arbiter output is sufficient.

TODO (from earlier council skill work)

  • Revisit subagent personality depth
  • Revisit skill name ("council")
  • Consider unifying debate and parallel flows
  • Experiment with 2-round sufficiency
  • Test with different model tiers for advisors vs referee