docs(eval): record applied runtime rollback in decision log
@@ -208,7 +208,8 @@ Track all tool-adjacent/risky prompts that were force-routed to native (`no_tool
 - `empty_assistant_text`
 Window B shows fallback recovery (0%) but fails minimum sample/baseline gates, so it does not overturn Window A.
 Window D confirms guardrail compatibility behavior in controlled probes, but guard compatibility alone is insufficient to justify expansion.
-- Next cohort/config delta: route canary users back to native path (remove/disable `pi_embedded` canary routing) until latency/fallback defects are remediated and a fresh canary is re-approved.
+- Next cohort/config delta: route canary users back to native path until latency/fallback defects are remediated and a fresh canary is re-approved.
+Applied operational rollback in runtime config: `~/.config/flynn/config.yaml` now has `agent_configs.pi_canary.backend: native` (backup created as `~/.config/flynn/config.yaml.bak-rollback-20260223-224801`).
 
 ## Diagram/Protocol Impact Review
 
@@ -7,7 +7,7 @@
 "status": "completed",
 "date": "2026-02-24",
 "updated": "2026-02-24",
-"summary": "Completed formal Pi embedded canary evaluation with audit-log summaries, minimum-sample thresholds, and controlled guard-coverage probes. Decision: `rollback` canary routing for now (do not expand) due to failed latency/fallback gates in Window A despite verified guard behavior in controlled probes.",
+"summary": "Completed formal Pi embedded canary evaluation with audit-log summaries, minimum-sample thresholds, and controlled guard-coverage probes. Final decision: `rollback` canary routing (no expansion) due to failed latency/fallback gates in Window A despite verified guard behavior in controlled probes. Operational config rollback was applied by switching `agent_configs.pi_canary.backend` to `native` in runtime config.",
 "files_modified": [
 "src/audit/backendCanarySummary.ts",
 "src/audit/backendCanarySummary.test.ts",
@@ -6492,7 +6492,7 @@
 "remaining_phases_completion": "Phase 1: 3/3 (100%) — context levels, command registry, memory structure. Phase 2: 3/3 (100%) — component registry, confidence routing, history index. Phase 3: 2/2 (100%) — adaptive memory/compaction, truthfulness/autonomy hardening",
 "next_up": "Track OpenClaw evolution regularly for inspiration and feature ideas",
 "pi_embedded_canary_spike": "completed — added optional pi_embedded backend adapter, canary-safe no-tools routing guard, backend success/fallback latency audit events, and docs/diagram updates while native remains default",
-"pi_embedded_evaluation_phase": "completed — final decision rollback: Window A failed latency/fallback gates (p50 +259ms, p95 +5695ms, fallback 25%, categories: pi_module_interface/empty_assistant_text); Window B remained sample-insufficient; controlled probes verified guard coverage (pi_no_tools_mode/capability_query/attachments_present each hit once)"
+"pi_embedded_evaluation_phase": "completed — final decision rollback (applied in runtime config): Window A failed latency/fallback gates (p50 +259ms, p95 +5695ms, fallback 25%, categories: pi_module_interface/empty_assistant_text); Window B remained sample-insufficient; controlled probes verified guard coverage (pi_no_tools_mode/capability_query/attachments_present each hit once)"
 },
 "soul_md_and_cron_create": {
 "date": "2026-02-11",
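Reviewer note: the decision recorded in this commit (a failed latency/fallback gate in Window A forces rollback, and a sample-insufficient Window B cannot overturn it) can be sketched roughly as below. This is only an illustration of that reasoning, not the logic in `src/audit/backendCanarySummary.ts`. Every type, function name, threshold, and sample count here is a hypothetical assumption; only the Window A deltas (p50 +259ms, p95 +5695ms, fallback 25%) and the Window B fallback rate (0%) are taken from the log.

```typescript
// Hypothetical sketch: names, thresholds, and sample counts are assumptions,
// not values from the repository.
type GateResult = "pass" | "fail" | "insufficient";
type Decision = "expand" | "hold" | "rollback";

interface WindowStats {
  name: string;
  samples: number;      // canary requests observed (assumed values below)
  p50DeltaMs: number;   // canary p50 latency minus native baseline
  p95DeltaMs: number;   // canary p95 latency minus native baseline
  fallbackRate: number; // fraction of canary requests that fell back, 0..1
}

interface Gates {
  minSamples: number;
  maxP50DeltaMs: number;
  maxP95DeltaMs: number;
  maxFallbackRate: number;
}

// A window below the minimum sample size is never scored as pass or fail.
function evaluateWindow(w: WindowStats, g: Gates): GateResult {
  if (w.samples < g.minSamples) return "insufficient";
  const withinGates =
    w.p50DeltaMs <= g.maxP50DeltaMs &&
    w.p95DeltaMs <= g.maxP95DeltaMs &&
    w.fallbackRate <= g.maxFallbackRate;
  return withinGates ? "pass" : "fail";
}

// Any failed window forces rollback; insufficient windows cannot overturn it.
function decide(windows: WindowStats[], g: Gates): Decision {
  const results = windows.map((w) => evaluateWindow(w, g));
  if (results.some((r) => r === "fail")) return "rollback";
  if (results.some((r) => r === "pass")) return "expand";
  return "hold";
}

// Assumed gate thresholds, for illustration only.
const gates: Gates = {
  minSamples: 20,
  maxP50DeltaMs: 200,
  maxP95DeltaMs: 1000,
  maxFallbackRate: 0.05,
};

// Latency deltas and fallback rate from the decision log; sample count assumed.
const windowA: WindowStats = {
  name: "A", samples: 40, p50DeltaMs: 259, p95DeltaMs: 5695, fallbackRate: 0.25,
};

// Fallback rate (0%) from the log; sample count and latency deltas assumed.
const windowB: WindowStats = {
  name: "B", samples: 3, p50DeltaMs: 0, p95DeltaMs: 0, fallbackRate: 0,
};

const decision: Decision = decide([windowA, windowB], gates);
console.log(decision);
```

Under these assumed thresholds, Window A fails all three gates and Window B is ignored as sample-insufficient, which matches the `rollback` decision recorded above. With Window B alone the sketch would return `hold` rather than `expand`.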