From 165e987dbcdfa292677f9fee2fed82ca07698eac Mon Sep 17 00:00:00 2001
From: zap
Date: Wed, 11 Mar 2026 20:35:51 +0000
Subject: [PATCH] chore(memory): remove flynn-related notes

---
 memory/2026-03-05.md                       |  12 --
 memory/incidents/2026-03-04-local-patch.md |   9 --
 memory/plans/flynn-council-fix.md          | 128 ---------------------
 memory/projects.md                         |  10 --
 memory/roadmap.md                          |   1 -
 memory/tasks.json                          |  15 ---
 6 files changed, 175 deletions(-)
 delete mode 100644 memory/plans/flynn-council-fix.md

diff --git a/memory/2026-03-05.md b/memory/2026-03-05.md
index f4b53df..0410d3c 100644
--- a/memory/2026-03-05.md
+++ b/memory/2026-03-05.md
@@ -37,18 +37,6 @@
 - Updated SKILL.md with experimental findings, recommended defaults by use case, cost profiles.
 - New TODOs added: unify debate/parallel flows, test 2-round sufficiency, test mixed model tiers.
 
-## Flynn council pipeline — fix plan written
-- Reviewed Flynn codebase (`~/flynn/src/councils/`): orchestrator, types, scaffold, preflight, tool all exist. Unit tests pass (mocked). Never run against real models.
-- Pipeline architecture: dual D/P groups (freethinker + arbiter + optional grounder each), bridge exchange between groups, meta arbiter merge, full trace/conversation logging.
-- Created `memory/plans/flynn-council-fix.md` — 5-phase plan:
-  1. Config & agent setup (define 5 required agents in config)
-  2. Structured output compatibility (JSON schema support varies by provider)
-  3. Bridge & cap validation (defaults may be too tight for real output)
-  4. End-to-end run with real models
-  5. Integration with zap (CLI command or ACP agent)
-- Will wants to keep zap's "light" council skill AND have Flynn's deterministic pipeline available for delegation.
-- Work to happen on feature branch `fix/council-pipeline`.
-- Estimated effort: 1-2 focused sessions.
 - 2026-03-05T21:36Z: Ran `openclaw security audit --deep` on request to clear stale-audit warning.
   - Result: 1 critical, 2 warn, 1 info.
   - Critical: plugin `acpx.bak` code-safety issue (dangerous exec pattern).
diff --git a/memory/incidents/2026-03-04-local-patch.md b/memory/incidents/2026-03-04-local-patch.md
index 68a1c9e..a70d1e8 100644
--- a/memory/incidents/2026-03-04-local-patch.md
+++ b/memory/incidents/2026-03-04-local-patch.md
@@ -16,15 +16,6 @@ user: Sender (untrusted metadata):
 }
 ```
 
-[Wed 2026-03-04 02:02 UTC] So between the PR and ~/flynn you are not able to find a clean solution?
-assistant: [[reply_to_current]] Short answer: **not a clean, production-safe fix yet**.
-
-I found:
-- Your Flynn endpoint clue (good),
-- OpenClaw issues/PRs that match this exact failure mode (also good),
-- but no merged upstream fix we can apply cleanly right now.
-
-Given your “no extra dependencies” constraint, the only remaining path is a **local patch** (cherry-pick PR logic into your installed OpenClaw). That can work, but it’s not “clean” because it’s a custom fork patch you’ll need to maintain across updates.
 
 So:
 - ✅ Diagnosis is solid.
diff --git a/memory/plans/flynn-council-fix.md b/memory/plans/flynn-council-fix.md
deleted file mode 100644
index 5985b95..0000000
--- a/memory/plans/flynn-council-fix.md
+++ /dev/null
@@ -1,128 +0,0 @@
-# Flynn Council Pipeline — Fix Plan
-
-**Goal**: Get Flynn's dual-council pipeline (`council.run`) working against real models so zap can delegate council tasks to Flynn as an external agent.
-
-**Branch**: `fix/council-pipeline` (off `main`)
-
-**Status**: The orchestrator code, types, schemas, tool registration, TUI `/council` command, and preflight check all exist. Unit tests pass (mocked). But the pipeline has never run successfully against real models.
-
----
-
-## Phase 1: Configuration & Agent Setup
-
-**Problem**: The council requires 5 named agents in `agent_configs` that don't exist in the default config (everything is commented out).
-
-**Tasks**:
-1. Uncomment and populate `councils` block in `config/default.yaml` with `enabled: true`.
-2. Define the 5 required agent configs:
-   - `council_d_arbiter` — D-group arbiter (feasibility-focused, structured JSON output)
-   - `council_d_freethinker` — D-group freethinker (ideation, boring-but-true)
-   - `council_p_arbiter` — P-group arbiter (novelty-focused, structured JSON output)
-   - `council_p_freethinker` — P-group freethinker (ideation, weird-is-fine)
-   - `council_meta_arbiter` — Meta merge agent (selects across both groups)
-3. Each agent needs:
-   - A `system_prompt` that matches the pipeline's expected behavior (JSON-only output, role-specific framing)
-   - A `model_tier` (start with `default` for all; upgrade meta to `complex` after first success)
-4. Decide whether to add grounder/writer agents or skip them initially (recommendation: skip, they're optional).
-
-**Acceptance**: `flynn tui` → `/council preflight` shows all agents resolved, tiers probed OK, no `[agent_missing]` flags.
-
----
-
-## Phase 2: Structured Output Compatibility
-
-**Problem**: The orchestrator demands strict JSON schema output (`responseFormat: jsonSchemaFormat(...)`) from every agent call. Most models handle this poorly or inconsistently. The pipeline has JSON repair + agent-based recovery, but if the underlying model doesn't support `response_format: json_schema`, it may fail before repair kicks in.
-
-**Tasks**:
-1. Verify which models/providers in Flynn's config support `response_format` with `json_schema` type.
-   - OpenAI GPT-4o+: yes
-   - Anthropic Claude: no native `json_schema` (uses prompt-based JSON)
-   - Copilot/OpenRouter: depends on underlying model
-   - Ollama: partial support
-2. Check how Flynn's model router handles `responseFormat` for providers that don't support it — does it silently drop it, error, or adapt?
-   - File: `src/models/` — check provider adapters
-3. If needed, make the `responseFormat` parameter gracefully degrade:
-   - For providers without `json_schema` support, rely on the system prompt directive ("Return JSON only...") + the existing `parseWithAgentRecovery` fallback
-   - Don't hard-fail if the provider ignores `responseFormat`
-4. Test with the actual configured model to confirm JSON output parses correctly through the Zod schemas.
-
-**Acceptance**: A single group round (D, round 1) completes without `repair_failed` or `parse_failed` using the configured model.
-
----
-
-## Phase 3: Bridge & Cap Validation
-
-**Problem**: `enforceBridgeCaps()` throws hard on any cap violation (`cap_exceeded`), which kills the entire run. Real model output is likely to exceed the tight defaults (e.g., `bridge_entry_max_chars: 300`).
-
-**Tasks**:
-1. Review default cap values and increase if they're too restrictive for real output:
-   - `bridge_packet_max_chars: 2500` — may need 4000-5000
-   - `bridge_entry_max_chars: 300` — may need 500-800
-   - `bridge_field_max_bullets: 6` — probably fine
-2. Consider making `enforceBridgeCaps` truncate rather than throw — trim entries to max chars, drop excess bullets, with a trace warning.
-3. Alternatively, add a `strict_bridge: false` config option that allows soft enforcement.
-
-**Acceptance**: A 2-round run completes without `bridge_validation_failed` stop reason.
-
----
-
-## Phase 4: End-to-End Run
-
-**Tasks**:
-1. Run `/council preflight` — confirm clean.
-2. Run `/council ` — e.g., "What's the best approach to add persistent memory to an AI assistant?"
-3. Verify:
-   - Pipeline reaches `max_rounds` or `convergence` stop reason (not an error).
-   - Both D and P groups produce shortlists.
-   - Meta merge produces `selected_primary` and `selected_secondary`.
-   - Artifacts are written to `~/.local/share/flynn/councils/`.
-   - Markdown summary is human-readable and useful.
-4. Fix any issues surfaced during the run (likely: JSON format, cap overflow, agent prompt tuning).
-
-**Acceptance**: At least one clean end-to-end run with real models, artifacts saved, readable output.
-
----
-
-## Phase 5: Integration with Zap (OpenClaw)
-
-**Goal**: Let zap delegate council tasks to Flynn via external agent invocation.
-
-**Tasks**:
-1. Determine the integration path:
-   - **Option A**: Flynn exposes a CLI command (`flynn council run --task "..."`) that zap can call via `exec`.
-   - **Option B**: Flynn exposes an HTTP endpoint for council runs (if gateway supports it).
-   - **Option C**: Zap uses `sessions_spawn` to invoke Flynn as an ACP agent with a council task.
-2. Implement the chosen path (likely Option A as simplest):
-   - Add `flynn council run --task "" [--max-rounds N] [--output json|markdown]` CLI subcommand.
-   - Output the markdown summary to stdout, JSON to a file.
-3. Update zap's council skill to support a `backend: flynn` option that delegates to Flynn instead of spawning subagents.
-
-**Acceptance**: Zap can invoke `flynn council run --task "..."` and get structured output back.
-
----
-
-## Estimated Work
-
-| Phase | Effort | Risk |
-|-------|--------|------|
-| 1. Config & agents | Small (config-only) | Low |
-| 2. Structured output | Medium (may need provider adapter changes) | Medium — depends on model JSON compliance |
-| 3. Bridge caps | Small (config + maybe truncation logic) | Low |
-| 4. E2E run | Medium (iterative debugging) | Medium — real models are unpredictable |
-| 5. Zap integration | Medium (new CLI command + skill update) | Low |
-
-**Total**: ~1-2 focused sessions.
-
----
-
-## Open Questions
-- Which model tier to use for council agents? Start with `default` (cheapest), upgrade after confirmed working.
-- Should we keep the scaffold system or skip it for now? Recommendation: skip (`scaffold_path` unset), use system prompts only.
-- Do we need the writer agents? Recommendation: skip for v1, the meta arbiter output is sufficient.
-
-## TODO (from earlier council skill work)
-- Revisit subagent personality depth
-- Revisit skill name ("council")
-- Consider unifying debate and parallel flows
-- Experiment with 2-round sufficiency
-- Test with different model tiers for advisors vs referee
diff --git a/memory/projects.md b/memory/projects.md
index 210cc34..6d1c352 100644
--- a/memory/projects.md
+++ b/memory/projects.md
@@ -73,16 +73,6 @@ Lightweight registry of active multi-session work.
 - Related memory:
   - `memory/tasks.json`
 
-### Flynn council pipeline
-- Status: active
-- Goal: get Flynn's council pipeline working against real models, then make it delegable from zap
-- Current state:
-  - tracked as a task in `memory/tasks.json`
-  - implementation notes live in `memory/plans/flynn-council-fix.md`
-- Related memory:
-  - `memory/plans/flynn-council-fix.md`
-  - `memory/tasks.json`
-
 ## Current roadmap focus
 
 ### Now
diff --git a/memory/roadmap.md b/memory/roadmap.md
index d804db9..8f6dbee 100644
--- a/memory/roadmap.md
+++ b/memory/roadmap.md
@@ -69,7 +69,6 @@ Useful, but lower urgency or more dependent on other foundations.
 - lightweight people-context layer
 - Home Assistant / smart-home integration
 - password-manager / secret lookup integration (high trust surface; do carefully)
-- Flynn council pipeline follow-up
 - council personality depth / council naming revisit
 
 ## Guiding principle
diff --git a/memory/tasks.json b/memory/tasks.json
index 83a8fa4..0756f4b 100644
--- a/memory/tasks.json
+++ b/memory/tasks.json
@@ -53,21 +53,6 @@
         "From council skill brainstorm session 2026-03-05."
       ]
     },
-    {
-      "id": "task-20260305-1901-flynn-council-pipeline",
-      "created_at": "2026-03-05T19:01:00Z",
-      "title": "Fix Flynn council pipeline for real-model use + zap delegation",
-      "owner": "zap",
-      "priority": "medium",
-      "status": "open",
-      "details": "Get Flynn's dual-council pipeline (council.run) working against real models. Then integrate so zap can delegate council tasks to Flynn as an external agent. Work on feature branch fix/council-pipeline in ~/flynn.",
-      "notes": [
-        "Plan: memory/plans/flynn-council-fix.md",
-        "5 phases: config/agent setup, structured output compat, bridge cap fixes, E2E run, zap integration.",
-        "Flynn codebase: ~/flynn/src/councils/ — orchestrator exists, unit tests pass (mocked), never run against real models.",
-        "Estimated effort: 1-2 focused sessions."
-      ]
-    },
     {
       "id": "task-20260311-1908-calendar-access",
       "created_at": "2026-03-11T19:08:00Z",