chore(memory): remove flynn-related notes
@@ -37,18 +37,6 @@
|
|||||||
- Updated SKILL.md with experimental findings, recommended defaults by use case, cost profiles.
|
- Updated SKILL.md with experimental findings, recommended defaults by use case, cost profiles.
|
||||||
- New TODOs added: unify debate/parallel flows, test 2-round sufficiency, test mixed model tiers.
|
- New TODOs added: unify debate/parallel flows, test 2-round sufficiency, test mixed model tiers.
|
||||||
|
|
||||||
## Flynn council pipeline — fix plan written
|
|
||||||
- Reviewed Flynn codebase (`~/flynn/src/councils/`): orchestrator, types, scaffold, preflight, tool all exist. Unit tests pass (mocked). Never run against real models.
|
|
||||||
- Pipeline architecture: dual D/P groups (freethinker + arbiter + optional grounder each), bridge exchange between groups, meta arbiter merge, full trace/conversation logging.
|
|
||||||
- Created `memory/plans/flynn-council-fix.md` — 5-phase plan:
|
|
||||||
1. Config & agent setup (define 5 required agents in config)
|
|
||||||
2. Structured output compatibility (JSON schema support varies by provider)
|
|
||||||
3. Bridge & cap validation (defaults may be too tight for real output)
|
|
||||||
4. End-to-end run with real models
|
|
||||||
5. Integration with zap (CLI command or ACP agent)
|
|
||||||
- Will wants to keep zap's "light" council skill AND have Flynn's deterministic pipeline available for delegation.
|
|
||||||
- Work to happen on feature branch `fix/council-pipeline`.
|
|
||||||
- Estimated effort: 1-2 focused sessions.
|
|
||||||
- 2026-03-05T21:36Z: Ran `openclaw security audit --deep` on request to clear stale-audit warning.
|
- 2026-03-05T21:36Z: Ran `openclaw security audit --deep` on request to clear stale-audit warning.
|
||||||
- Result: 1 critical, 2 warn, 1 info.
|
- Result: 1 critical, 2 warn, 1 info.
|
||||||
- Critical: plugin `acpx.bak` code-safety issue (dangerous exec pattern).
|
- Critical: plugin `acpx.bak` code-safety issue (dangerous exec pattern).
|
||||||
|
|||||||
@@ -16,15 +16,6 @@ user: Sender (untrusted metadata):
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
[Wed 2026-03-04 02:02 UTC] So between the PR and ~/flynn you are not able to find a clean solution?
|
|
||||||
assistant: [[reply_to_current]] Short answer: **not a clean, production-safe fix yet**.
|
|
||||||
|
|
||||||
I found:
|
|
||||||
- Your Flynn endpoint clue (good),
|
|
||||||
- OpenClaw issues/PRs that match this exact failure mode (also good),
|
|
||||||
- but no merged upstream fix we can apply cleanly right now.
|
|
||||||
|
|
||||||
Given your “no extra dependencies” constraint, the only remaining path is a **local patch** (cherry-pick PR logic into your installed OpenClaw). That can work, but it’s not “clean” because it’s a custom fork patch you’ll need to maintain across updates.
|
|
||||||
|
|
||||||
So:
|
So:
|
||||||
- ✅ Diagnosis is solid.
|
- ✅ Diagnosis is solid.
|
||||||
|
|||||||
@@ -1,128 +0,0 @@
|
|||||||
# Flynn Council Pipeline — Fix Plan
|
|
||||||
|
|
||||||
**Goal**: Get Flynn's dual-council pipeline (`council.run`) working against real models so zap can delegate council tasks to Flynn as an external agent.
|
|
||||||
|
|
||||||
**Branch**: `fix/council-pipeline` (off `main`)
|
|
||||||
|
|
||||||
**Status**: The orchestrator code, types, schemas, tool registration, TUI `/council` command, and preflight check all exist. Unit tests pass (mocked). But the pipeline has never run successfully against real models.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Phase 1: Configuration & Agent Setup
|
|
||||||
|
|
||||||
**Problem**: The council requires 5 named agents in `agent_configs` that don't exist in the default config (everything is commented out).
|
|
||||||
|
|
||||||
**Tasks**:
|
|
||||||
1. Uncomment and populate `councils` block in `config/default.yaml` with `enabled: true`.
|
|
||||||
2. Define the 5 required agent configs:
|
|
||||||
- `council_d_arbiter` — D-group arbiter (feasibility-focused, structured JSON output)
|
|
||||||
- `council_d_freethinker` — D-group freethinker (ideation, boring-but-true)
|
|
||||||
- `council_p_arbiter` — P-group arbiter (novelty-focused, structured JSON output)
|
|
||||||
- `council_p_freethinker` — P-group freethinker (ideation, weird-is-fine)
|
|
||||||
- `council_meta_arbiter` — Meta merge agent (selects across both groups)
|
|
||||||
3. Each agent needs:
|
|
||||||
- A `system_prompt` that matches the pipeline's expected behavior (JSON-only output, role-specific framing)
|
|
||||||
- A `model_tier` (start with `default` for all; upgrade meta to `complex` after first success)
|
|
||||||
4. Decide whether to add grounder/writer agents or skip them initially (recommendation: skip, they're optional).
|
|
||||||
|
|
||||||
**Acceptance**: `flynn tui` → `/council preflight` shows all agents resolved, tiers probed OK, no `[agent_missing]` flags.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Phase 2: Structured Output Compatibility
|
|
||||||
|
|
||||||
**Problem**: The orchestrator demands strict JSON schema output (`responseFormat: jsonSchemaFormat(...)`) from every agent call. Most models handle this poorly or inconsistently. The pipeline has JSON repair + agent-based recovery, but if the underlying model doesn't support `response_format: json_schema`, it may fail before repair kicks in.
|
|
||||||
|
|
||||||
**Tasks**:
|
|
||||||
1. Verify which models/providers in Flynn's config support `response_format` with `json_schema` type.
|
|
||||||
- OpenAI GPT-4o+: yes
|
|
||||||
- Anthropic Claude: no native `json_schema` (uses prompt-based JSON)
|
|
||||||
- Copilot/OpenRouter: depends on underlying model
|
|
||||||
- Ollama: partial support
|
|
||||||
2. Check how Flynn's model router handles `responseFormat` for providers that don't support it — does it silently drop it, error, or adapt?
|
|
||||||
- File: `src/models/` — check provider adapters
|
|
||||||
3. If needed, make the `responseFormat` parameter gracefully degrade:
|
|
||||||
- For providers without `json_schema` support, rely on the system prompt directive ("Return JSON only...") + the existing `parseWithAgentRecovery` fallback
|
|
||||||
- Don't hard-fail if the provider ignores `responseFormat`
|
|
||||||
4. Test with the actual configured model to confirm JSON output parses correctly through the Zod schemas.
|
|
||||||
|
|
||||||
**Acceptance**: A single group round (D, round 1) completes without `repair_failed` or `parse_failed` using the configured model.
|
|
||||||
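The graceful-degradation idea in Phase 2 (fall back to prompt-directed JSON when a provider ignores `responseFormat`) could look roughly like the sketch below. It is only an illustration: `extractJson`, `tryParse` and `ParseResult` are hypothetical names, not part of Flynn's codebase, and the sketch does not reproduce the existing `parseWithAgentRecovery` logic. It tries a direct parse first, then a Markdown-fenced block, then the outermost brace pair.

```typescript
// Hypothetical helper names; nothing here is taken from Flynn's source.
type ParseResult =
  | { ok: true; value: unknown; source: "direct" | "fenced" | "braces" }
  | { ok: false; error: string };

// JSON.parse never returns undefined, so undefined can signal failure.
function tryParse(text: string): unknown | undefined {
  try {
    return JSON.parse(text);
  } catch {
    return undefined;
  }
}

function extractJson(raw: string): ParseResult {
  const trimmed = raw.trim();

  // 1. The model obeyed "Return JSON only" and the text parses as-is.
  const direct = tryParse(trimmed);
  if (direct !== undefined) return { ok: true, value: direct, source: "direct" };

  // 2. The model wrapped the JSON in a Markdown code fence.
  const tick3 = "`".repeat(3);
  const fenceRe = new RegExp(tick3 + "(?:json)?\\s*([\\s\\S]*?)" + tick3, "i");
  const fence = fenceRe.exec(trimmed);
  if (fence && fence[1] !== undefined) {
    const fenced = tryParse(fence[1].trim());
    if (fenced !== undefined) return { ok: true, value: fenced, source: "fenced" };
  }

  // 3. The model added chatter around a single JSON object.
  const start = trimmed.indexOf("{");
  const end = trimmed.lastIndexOf("}");
  if (start !== -1 && end > start) {
    const sliced = tryParse(trimmed.slice(start, end + 1));
    if (sliced !== undefined) return { ok: true, value: sliced, source: "braces" };
  }

  return { ok: false, error: "no parseable JSON found" };
}
```

The returned `value` would still need to go through the pipeline's Zod schemas. A lenient extractor like this only decides whether there is anything to validate before the more expensive agent-based recovery runs.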
|
|
||||||
---
|
|
||||||
|
|
||||||
## Phase 3: Bridge & Cap Validation
|
|
||||||
|
|
||||||
**Problem**: `enforceBridgeCaps()` throws hard on any cap violation (`cap_exceeded`), which kills the entire run. Real model output is likely to exceed the tight defaults (e.g., `bridge_entry_max_chars: 300`).
|
|
||||||
|
|
||||||
**Tasks**:
|
|
||||||
1. Review default cap values and increase if they're too restrictive for real output:
|
|
||||||
- `bridge_packet_max_chars: 2500` — may need 4000-5000
|
|
||||||
- `bridge_entry_max_chars: 300` — may need 500-800
|
|
||||||
- `bridge_field_max_bullets: 6` — probably fine
|
|
||||||
2. Consider making `enforceBridgeCaps` truncate rather than throw — trim entries to max chars, drop excess bullets, with a trace warning.
|
|
||||||
3. Alternatively, add a `strict_bridge: false` config option that allows soft enforcement.
|
|
||||||
|
|
||||||
**Acceptance**: A 2-round run completes without `bridge_validation_failed` stop reason.
|
|
||||||
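A minimal sketch of the "truncate rather than throw" option from Phase 3, under stated assumptions: the packet shape (`Record<string, string[]>`), the function name `softEnforceBridgeCaps` and the warning strings are all hypothetical, and only the three cap names come from the plan. Flynn's real `enforceBridgeCaps()` and bridge types may look quite different.

```typescript
// Hypothetical shapes; Flynn's actual bridge packet types may differ.
interface BridgeCaps {
  bridge_packet_max_chars: number;
  bridge_entry_max_chars: number;
  bridge_field_max_bullets: number;
}

// Assumed layout: field name -> list of bullet strings.
type BridgePacket = Record<string, string[]>;

interface SoftCapResult {
  packet: BridgePacket;
  warnings: string[]; // would be forwarded to the trace log
}

function softEnforceBridgeCaps(packet: BridgePacket, caps: BridgeCaps): SoftCapResult {
  const warnings: string[] = [];
  const out: BridgePacket = {};
  let total = 0; // characters kept so far across the whole packet

  for (const [field, bullets] of Object.entries(packet)) {
    const excess = bullets.length - caps.bridge_field_max_bullets;
    if (excess > 0) {
      warnings.push(`${field}: dropped ${excess} bullet(s)`);
    }

    const kept: string[] = [];
    for (const bullet of bullets.slice(0, caps.bridge_field_max_bullets)) {
      let entry = bullet;
      if (entry.length > caps.bridge_entry_max_chars) {
        entry = entry.slice(0, caps.bridge_entry_max_chars);
        warnings.push(`${field}: truncated entry to ${caps.bridge_entry_max_chars} chars`);
      }
      // Skip entries that would push the packet past its total budget.
      if (total + entry.length > caps.bridge_packet_max_chars) {
        warnings.push(`${field}: dropped entry, packet budget exhausted`);
        continue;
      }
      total += entry.length;
      kept.push(entry);
    }
    out[field] = kept;
  }

  return { packet: out, warnings };
}
```

A `strict_bridge: false` option, as floated in task 3, could then choose between this soft path and the existing hard failure, so the strict behavior stays available for tests.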
|
|
||||||
---
|
|
||||||
|
|
||||||
## Phase 4: End-to-End Run
|
|
||||||
|
|
||||||
**Tasks**:
|
|
||||||
1. Run `/council preflight` — confirm clean.
|
|
||||||
2. Run `/council <simple test task>` — e.g., "What's the best approach to add persistent memory to an AI assistant?"
|
|
||||||
3. Verify:
|
|
||||||
- Pipeline reaches `max_rounds` or `convergence` stop reason (not an error).
|
|
||||||
- Both D and P groups produce shortlists.
|
|
||||||
- Meta merge produces `selected_primary` and `selected_secondary`.
|
|
||||||
- Artifacts are written to `~/.local/share/flynn/councils/`.
|
|
||||||
- Markdown summary is human-readable and useful.
|
|
||||||
4. Fix any issues surfaced during the run (likely: JSON format, cap overflow, agent prompt tuning).
|
|
||||||
|
|
||||||
**Acceptance**: At least one clean end-to-end run with real models, artifacts saved, readable output.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Phase 5: Integration with Zap (OpenClaw)
|
|
||||||
|
|
||||||
**Goal**: Let zap delegate council tasks to Flynn via external agent invocation.
|
|
||||||
|
|
||||||
**Tasks**:
|
|
||||||
1. Determine the integration path:
|
|
||||||
- **Option A**: Flynn exposes a CLI command (`flynn council run --task "..."`) that zap can call via `exec`.
|
|
||||||
- **Option B**: Flynn exposes an HTTP endpoint for council runs (if gateway supports it).
|
|
||||||
- **Option C**: Zap uses `sessions_spawn` to invoke Flynn as an ACP agent with a council task.
|
|
||||||
2. Implement the chosen path (likely Option A as simplest):
|
|
||||||
- Add `flynn council run --task "<task>" [--max-rounds N] [--output json|markdown]` CLI subcommand.
|
|
||||||
- Output the markdown summary to stdout, JSON to a file.
|
|
||||||
3. Update zap's council skill to support a `backend: flynn` option that delegates to Flynn instead of spawning subagents.
|
|
||||||
|
|
||||||
**Acceptance**: Zap can invoke `flynn council run --task "..."` and get structured output back.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Estimated Work
|
|
||||||
|
|
||||||
| Phase | Effort | Risk |
|
|
||||||
|-------|--------|------|
|
|
||||||
| 1. Config & agents | Small (config-only) | Low |
|
|
||||||
| 2. Structured output | Medium (may need provider adapter changes) | Medium — depends on model JSON compliance |
|
|
||||||
| 3. Bridge caps | Small (config + maybe truncation logic) | Low |
|
|
||||||
| 4. E2E run | Medium (iterative debugging) | Medium — real models are unpredictable |
|
|
||||||
| 5. Zap integration | Medium (new CLI command + skill update) | Low |
|
|
||||||
|
|
||||||
**Total**: ~1-2 focused sessions.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
## Open Questions
|
|
||||||
- Which model tier to use for council agents? Start with `default` (cheapest), upgrade after confirmed working.
|
|
||||||
- Should we keep the scaffold system or skip it for now? Recommendation: skip (`scaffold_path` unset), use system prompts only.
|
|
||||||
- Do we need the writer agents? Recommendation: skip for v1, the meta arbiter output is sufficient.
|
|
||||||
|
|
||||||
## TODO (from earlier council skill work)
|
|
||||||
- Revisit subagent personality depth
|
|
||||||
- Revisit skill name ("council")
|
|
||||||
- Consider unifying debate and parallel flows
|
|
||||||
- Experiment with 2-round sufficiency
|
|
||||||
- Test with different model tiers for advisors vs referee
|
|
||||||
@@ -73,16 +73,6 @@ Lightweight registry of active multi-session work.
|
|||||||
- Related memory:
|
- Related memory:
|
||||||
- `memory/tasks.json`
|
- `memory/tasks.json`
|
||||||
|
|
||||||
### Flynn council pipeline
|
|
||||||
- Status: active
|
|
||||||
- Goal: get Flynn's council pipeline working against real models, then make it delegable from zap
|
|
||||||
- Current state:
|
|
||||||
- tracked as a task in `memory/tasks.json`
|
|
||||||
- implementation notes live in `memory/plans/flynn-council-fix.md`
|
|
||||||
- Related memory:
|
|
||||||
- `memory/plans/flynn-council-fix.md`
|
|
||||||
- `memory/tasks.json`
|
|
||||||
|
|
||||||
## Current roadmap focus
|
## Current roadmap focus
|
||||||
|
|
||||||
### Now
|
### Now
|
||||||
|
|||||||
@@ -69,7 +69,6 @@ Useful, but lower urgency or more dependent on other foundations.
|
|||||||
- lightweight people-context layer
|
- lightweight people-context layer
|
||||||
- Home Assistant / smart-home integration
|
- Home Assistant / smart-home integration
|
||||||
- password-manager / secret lookup integration (high trust surface; do carefully)
|
- password-manager / secret lookup integration (high trust surface; do carefully)
|
||||||
- Flynn council pipeline follow-up
|
|
||||||
- council personality depth / council naming revisit
|
- council personality depth / council naming revisit
|
||||||
|
|
||||||
## Guiding principle
|
## Guiding principle
|
||||||
|
|||||||
@@ -53,21 +53,6 @@
|
|||||||
"From council skill brainstorm session 2026-03-05."
|
"From council skill brainstorm session 2026-03-05."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"id": "task-20260305-1901-flynn-council-pipeline",
|
|
||||||
"created_at": "2026-03-05T19:01:00Z",
|
|
||||||
"title": "Fix Flynn council pipeline for real-model use + zap delegation",
|
|
||||||
"owner": "zap",
|
|
||||||
"priority": "medium",
|
|
||||||
"status": "open",
|
|
||||||
"details": "Get Flynn's dual-council pipeline (council.run) working against real models. Then integrate so zap can delegate council tasks to Flynn as an external agent. Work on feature branch fix/council-pipeline in ~/flynn.",
|
|
||||||
"notes": [
|
|
||||||
"Plan: memory/plans/flynn-council-fix.md",
|
|
||||||
"5 phases: config/agent setup, structured output compat, bridge cap fixes, E2E run, zap integration.",
|
|
||||||
"Flynn codebase: ~/flynn/src/councils/ — orchestrator exists, unit tests pass (mocked), never run against real models.",
|
|
||||||
"Estimated effort: 1-2 focused sessions."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"id": "task-20260311-1908-calendar-access",
|
"id": "task-20260311-1908-calendar-access",
|
||||||
"created_at": "2026-03-11T19:08:00Z",
|
"created_at": "2026-03-11T19:08:00Z",
|
||||||
|
|||||||