diff --git a/README.md b/README.md
index 2fdb6e9..f56f5a1 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,7 @@ Self-hosted personal AI assistant with Telegram and Terminal interfaces.
 - **Capture Tools**: `screen.capture` and `camera.capture` tools for host capture workflows
 - **Session Transfer**: Move conversations between frontends
 - **CLI**: Full command-line interface (`flynn start`, `send`, `doctor`, `completion`, etc.)
+- **Optional Pi Embedded Backend**: Canary-only in-process Pi runtime path (`pi_embedded`) with native fallback
 - **Shell Completion**: Auto-generated completions for bash, zsh, and fish with `--install` flag
 - **Cron Scheduling**: Automated messages on cron schedules with output routing
 - **Daily Briefing Automation**: Optional built-in morning briefing preset (calendar + inbox + tasks summary prompt)
@@ -351,15 +352,22 @@ backends:
   claude_code: { enabled: false, path: /usr/local/bin/claude, args: [], timeout_ms: 120000 }
   opencode: { enabled: false, path: /usr/local/bin/opencode, args: [], timeout_ms: 120000 }
   gemini: { enabled: false, path: /usr/local/bin/gemini, args: [], timeout_ms: 120000 }
+  pi_embedded:
+    enabled: false
+    timeout_ms: 120000
+    no_tools_mode: true
+    model: openclaw-default
+    system_prompt_mode: hybrid # flynn | pi_default | hybrid
+    module: "@badlogic/pi-agent-core" # optional module override
 ```

-Each external backend also supports `retries` and `retry_delay_ms` for transient CLI failures.
+`pi_embedded` is intended for canary migration cohorts. In spike mode (`no_tools_mode: true`), Flynn keeps tool-oriented turns on native and only routes plain-text turns to Pi.

 When `args` is non-empty:
 - use `{prompt}` in an argument to inject the full generated prompt directly into argv.
-- if `{prompt}` is not present, Flynn writes the prompt to stdin.
+- if `{prompt}` is not present, Flynn appends backend-specific prompt args.

-If multiple external backends are enabled, set `backends.default` to choose explicitly. If omitted, Flynn selects by priority: `codex` -> `claude_code` -> `opencode` -> `gemini`.
+If multiple external backends are enabled, set `backends.default` to choose explicitly. If omitted, Flynn selects by priority: `codex` -> `claude_code` -> `opencode` -> `gemini` -> `pi_embedded`.

 You can also route specific named agents to a backend:

@@ -367,7 +375,10 @@ You can also route specific named agents to a backend:
 agent_configs:
   coder:
     model_tier: complex
-    backend: codex # native | codex | claude_code | opencode | gemini
+    backend: codex # native | codex | claude_code | opencode | gemini | pi_embedded
+  pi_canary:
+    model_tier: default
+    backend: pi_embedded
 ```

 ### Native Audio Support
diff --git a/docs/api/PROTOCOL.md b/docs/api/PROTOCOL.md
index e69e370..7a6da1e 100644
--- a/docs/api/PROTOCOL.md
+++ b/docs/api/PROTOCOL.md
@@ -36,6 +36,7 @@ The gateway serialises agent work **per session**, not per WebSocket connection:
 - Requests for different sessions can run in parallel.
 - Lane policy is configurable (`collect`, `followup`, `steer`, `steer_backlog`, `interrupt`) with per-channel and per-session overrides.
 - Session-local overrides can be managed at runtime via `agent.send` commands: `/queue`, `/queue set ...`, `/queue reset`.
+- Backend selection for a turn is server-side (`native` by default, optional external backends per config: `claude_code`, `opencode`, `codex`, `gemini`, `pi_embedded`) and does not change JSON-RPC method signatures.

 This is implemented via a per-lane queue (`LaneQueue`) in the gateway server, and used by `agent.send` and `agent.cancel`.
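Reviewer note on the README and PROTOCOL hunks above: they describe a selection order (`backends.default` if set, otherwise `codex` -> `claude_code` -> `opencode` -> `gemini` -> `pi_embedded`, with `native` as the server-side default) and a spike-mode guard that keeps tool-oriented turns on native. The sketch below illustrates that behaviour in TypeScript. Every name in it (`BackendsConfig`, `selectDefaultBackend`, `routeTurn`, `piNoToolsMode`, `turnNeedsTools`) is hypothetical and not taken from `src/daemon/routing.ts`. It also assumes that a `default` pointing at a disabled backend falls through to the priority list, which the diff does not state.

```typescript
// Illustrative sketch only: all identifiers here are hypothetical.
type ExternalBackend = "codex" | "claude_code" | "opencode" | "gemini" | "pi_embedded";
type Backend = "native" | ExternalBackend;

interface BackendsConfig {
  // Mirrors `backends.default` from the README hunk.
  default?: ExternalBackend;
  // Mirrors each backend's `enabled` flag.
  enabled: Partial<Record<ExternalBackend, boolean>>;
  // Mirrors `pi_embedded.no_tools_mode`.
  piNoToolsMode: boolean;
}

// Priority order quoted in the README hunk.
const PRIORITY: readonly ExternalBackend[] = [
  "codex",
  "claude_code",
  "opencode",
  "gemini",
  "pi_embedded",
];

function selectDefaultBackend(cfg: BackendsConfig): Backend {
  // An explicit default wins when it is enabled.
  // Assumption: a disabled default falls through to the priority list.
  if (cfg.default !== undefined && cfg.enabled[cfg.default] === true) {
    return cfg.default;
  }
  // Otherwise take the first enabled backend in priority order, else native.
  return PRIORITY.find((name) => cfg.enabled[name] === true) ?? "native";
}

function routeTurn(cfg: BackendsConfig, turnNeedsTools: boolean): Backend {
  const chosen = selectDefaultBackend(cfg);
  // Spike-mode guard: tool-oriented turns stay on native while
  // no_tools_mode is true; only plain-text turns reach Pi.
  if (chosen === "pi_embedded" && cfg.piNoToolsMode && turnNeedsTools) {
    return "native";
  }
  return chosen;
}
```

The sketch models only the `pi_embedded` guard. Per-agent `backend` overrides from `agent_configs` and any fallback on backend failure are out of scope here.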
diff --git a/docs/architecture/AGENT_DIAGRAM.md b/docs/architecture/AGENT_DIAGRAM.md
index 1e4b2ce..31f437b 100644
--- a/docs/architecture/AGENT_DIAGRAM.md
+++ b/docs/architecture/AGENT_DIAGRAM.md
@@ -31,6 +31,7 @@ flowchart LR
     SM[SessionManager\nSQLite]
     OR[AgentOrchestrator]
     NA[NativeAgent\n(tool loop)]
+    EB[Optional External Backends\nclaude_code/opencode/codex/gemini/pi_embedded]
     MR[ModelRouter]
     TP[ToolPolicy + ToolRegistry]
     TE[ToolExecutor\nhooks + enforcement + audit]
@@ -60,7 +61,9 @@ flowchart LR
   CA --> RT
   RT --> SM
   RT --> OR
+  RT --> EB
   OR --> NA
+  EB --> MP
   NA --> MR
   MR --> MP

@@ -107,6 +110,9 @@ ChannelAdapter -> ChannelRegistry
 | v
 | ModelClient
 |
+ +----> (optional, non-tool turns) ExternalBackend
+ (claude_code/opencode/codex/gemini/pi_embedded)
+ |
 +----> (optional) PairingManager gate for unknown senders

 Tool Calls (inside NativeAgent loop)
@@ -130,6 +136,7 @@ Key files:
 - Routing + per-session agent creation: `src/daemon/routing.ts`
 - Orchestration: `src/backends/native/orchestrator.ts`
 - Tool loop: `src/backends/native/agent.ts`
+- External backend adapters: `src/backends/external.ts`, `src/backends/piEmbedded.ts`
 - Model routing: `src/models/router.ts`
 - Tool policy + execution: `src/tools/policy.ts`, `src/tools/executor.ts`

diff --git a/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md b/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md
index d87c87c..874a92d 100644
--- a/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md
+++ b/docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md
@@ -10,6 +10,7 @@ If you only want the protocol surface, see `docs/api/PROTOCOL.md`.
 - Each connection is attached to a `sessionId`.
 - Agent work is queued per `sessionId` (FIFO), not per connection.
 - Sessions persist in SQLite via `SessionManager` even if clients disconnect.
+- Once dequeued, message routing may execute the native orchestrator path or an optional external backend path (`claude_code`, `opencode`, `codex`, `gemini`, `pi_embedded`) depending on agent/backend config.
 ## Component Map

@@ -30,7 +31,7 @@ flowchart LR
   subgraph CORE[Flynn Core]
     SM[SessionManager\nin-memory cache + SQLite]
     SS[SessionStore\nSQLite tables]
-    AO[AgentOrchestrator]
+    AO[AgentOrchestrator / External Backends]
   end

   WS --> GS
diff --git a/docs/plans/state.json b/docs/plans/state.json
index cb17b72..1b97795 100644
--- a/docs/plans/state.json
+++ b/docs/plans/state.json
@@ -3,6 +3,33 @@
   "updated_at": "2026-02-24",
   "description": "Tracks the status of all Flynn plans and implementation phases",
   "plans": {
+    "pi-embedded-backend-canary-spike": {
+      "status": "completed",
+      "date": "2026-02-24",
+      "updated": "2026-02-24",
+      "summary": "Implemented a Pi embedded canary spike with a new optional `pi_embedded` backend, guarded no-tools canary routing, backend success/fallback latency telemetry in audit logs, focused backend/schema/routing tests, and architecture/protocol documentation updates while keeping native orchestration as the default path.",
+      "files_modified": [
+        "src/backends/piEmbedded.ts",
+        "src/backends/piEmbedded.test.ts",
+        "src/backends/external.ts",
+        "src/backends/index.ts",
+        "src/daemon/index.ts",
+        "src/daemon/routing.ts",
+        "src/daemon/routing.test.ts",
+        "src/config/schema.ts",
+        "src/config/schema.test.ts",
+        "src/agents/registry.ts",
+        "src/audit/types.ts",
+        "src/audit/logger.ts",
+        "config/default.yaml",
+        "README.md",
+        "docs/architecture/AGENT_DIAGRAM.md",
+        "docs/architecture/GATEWAY_SESSIONS_AND_QUEUE.md",
+        "docs/api/PROTOCOL.md",
+        "docs/plans/state.json"
+      ],
+      "test_status": "pnpm test:run src/backends/piEmbedded.test.ts src/config/schema.test.ts src/daemon/routing.test.ts + pnpm typecheck + pnpm lint (warnings only) passing"
+    },
     "full-audit-hardening-and-config-consolidation": {
       "status": "completed",
       "date": "2026-02-24",
@@ -6403,7 +6430,7 @@
     }
   },
   "overall_progress": {
-    "total_test_count": 1982,
+    "total_test_count": 1989,
     "all_tests_passing": true,
     "p0_completion": "3/3 (100%)",
     "p1_completion": "4/4 (100%)",
@@ -6433,7 +6460,8 @@
     "model_router_correctness": "completed — fallback paths now avoid duplicate clients, apply retry policy consistently, and reject unsupported OpenAI OAuth tool requests early",
     "native_audio_support": "completed — smart routing for native audio (Gemini/OpenAI/GitHub) vs Whisper transcription fallback, plus 2026-02-23 arg hydration hardening, tool.args_rewritten audit metric, transient fetch retry/timeout hardening, localhost->127.0.0.1 fallback for transcription endpoint connectivity, and whisper docker-compose entrypoint arg fix for port 18801",
     "remaining_phases_completion": "Phase 1: 3/3 (100%) — context levels, command registry, memory structure. Phase 2: 3/3 (100%) — component registry, confidence routing, history index. Phase 3: 2/2 (100%) — adaptive memory/compaction, truthfulness/autonomy hardening",
-    "next_up": "Track OpenClaw evolution regularly for inspiration and feature ideas"
+    "next_up": "Track OpenClaw evolution regularly for inspiration and feature ideas",
+    "pi_embedded_canary_spike": "completed — added optional pi_embedded backend adapter, canary-safe no-tools routing guard, backend success/fallback latency audit events, and docs/diagram updates while native remains default"
   },
   "soul_md_and_cron_create": {
     "date": "2026-02-11",