docs: add Codex CLI subagent runbook

2026-02-22 17:15:16 -08:00
parent 982375b65e
commit 0775c9ede2
1 changed files with 102 additions and 0 deletions
@@ -0,0 +1,102 @@
+# Codex CLI as a Flynn subagent (runbook)
+
+This runbook documents how Flynn uses the **Codex CLI** as an external “subagent” for certain tasks.
+
+It is intentionally pragmatic and should be updated as we learn.
+
+## Goals
+
+- Use Codex CLI reliably in **non-interactive** contexts (Flynn backends, scripts).
+- Prefer predictable, easy-to-ingest output.
+- Treat subagent output as *advice* that Flynn digests and verifies when possible.
+
+## Key constraints
+
+### Interactive mode requires a TTY
+Running `codex` with no subcommand launches the interactive TUI, which fails in non-interactive contexts (e.g., when spawned by a backend runner) with errors like:
+
+- `Error: stdin is not a terminal`
+
+**Therefore, Flynn must use `codex exec`**.
+
+Reference: https://developers.openai.com/codex/noninteractive/
+
+## Default invocation
+
+### Plain-text (default)
+Flynn defaults to **plain-text stdout** (no JSON event stream) because it is the most compatible with external-backend execution and the easiest to digest.
+
+Pattern:
+
+```bash
+codex exec --ephemeral -m gpt-5.3-codex "<PROMPT>"
+```
+
+Behavioral expectations:
+- **stdout**: final assistant message (what Flynn consumes)
+- **stderr**: logs/progress (ignored unless debugging)
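+
+A minimal wrapper sketch, assuming a bash-based runner. It returns stdout (the final message) to the caller and appends stderr to a log file so progress output never mixes into the result. The function name `flynn_codex_exec` and the `FLYNN_CODEX_MODEL` / `FLYNN_CODEX_LOG` variables are hypothetical. They are not part of Codex or Flynn.
+
+```bash
+# Hypothetical wrapper: the name and environment variables are illustrative only.
+flynn_codex_exec() {
+  local prompt="$1"
+  local model="${FLYNN_CODEX_MODEL:-gpt-5.3-codex}"
+  local log="${FLYNN_CODEX_LOG:-/dev/null}"
+  # stdout: final assistant message, passed through to the caller.
+  # stderr: progress/logs, appended to the log file for later debugging.
+  codex exec --ephemeral -m "$model" "$prompt" 2>>"$log"
+}
+
+# Usage (hypothetical):
+#   answer="$(flynn_codex_exec "Summarize the failing test.")" || echo "codex exec failed" >&2
+```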
+
+### Why `--ephemeral`
+Use `--ephemeral` for backend/subagent usage to avoid persisting sessions to disk.
+
+## When to use `--json`
+
+Codex supports `--json`, which emits an event stream as JSONL. Flynn opts into it only when it is clearly beneficial, e.g.:
+
+- debugging / auditing where event-level structure matters
+- programmatic extraction where the plain-text output is ambiguous
+
+Pattern:
+
+```bash
+codex exec --ephemeral --json -m gpt-5.3-codex "<PROMPT>"
+```
+
+Notes:
+- `--json` changes stdout to a stream of **JSONL events**. These must be parsed, so the output is not a drop-in replacement for plain text.
+- If/when Flynn grows a dedicated parser for Codex JSONL, we can consider making `--json` the default.
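+
+If Flynn later consumes `--json`, a parser might look like this sketch. It assumes `jq` is installed and that the final answer arrives as an `item.completed` event whose `item.type` is `agent_message`. Check both that event shape and the hypothetical helper name `codex_last_agent_message` against real `codex exec --json` output before relying on this.
+
+```bash
+# Hypothetical helper: read Codex JSONL events on stdin, print the last agent message.
+# ASSUMPTION: the final answer arrives as an event shaped like
+#   {"type":"item.completed","item":{"type":"agent_message","text":"..."}}
+codex_last_agent_message() {
+  # -s slurps the whole JSONL stream into one array; -r prints the text unquoted.
+  # Prints nothing if no matching event is found.
+  jq -rs 'map(select(.type == "item.completed" and .item.type == "agent_message"))
+          | last | .item.text // empty'
+}
+
+# Usage (hypothetical):
+#   codex exec --ephemeral --json -m gpt-5.3-codex "<PROMPT>" | codex_last_agent_message
+```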
+
+## Model selection
+
+Default model:
+- `gpt-5.3-codex`
+
+Policy:
+- Keep a single default unless a task clearly benefits from a different one.
+- If a run fails with a model-availability error, confirm it with a small smoke test (`codex exec -m <MODEL> "test"`), then update this runbook and the config.
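+
+The smoke test can be wrapped as a reusable function. This sketch assumes `codex exec` exits non-zero when the model is unavailable, so verify that on your installed version. `codex_model_smoke_test` is a hypothetical name, and `--ephemeral` is added to match the backend default above.
+
+```bash
+# Hypothetical smoke test: returns 0 if a trivial prompt succeeds on the given model.
+# ASSUMPTION: `codex exec` exits non-zero on model-availability errors.
+codex_model_smoke_test() {
+  local model="$1"
+  if codex exec --ephemeral -m "$model" "test" >/dev/null 2>&1; then
+    echo "ok: $model"
+  else
+    echo "FAILED: $model (check the model name and account access)" >&2
+    return 1
+  fi
+}
+
+# Usage: codex_model_smoke_test gpt-5.3-codex
+```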
+
+## Prompting guidance (subagent hygiene)
+
+- Put the **task** first.
+- Specify the expected **output format** when necessary.
+- Provide only relevant context; avoid leaking secrets.
+- If Codex is being used to draft code changes, ask it for:
+  - exact file paths
+  - minimal diffs/patches
+  - assumptions and risks
+
+## How Flynn digests Codex output
+
+When Flynn uses Codex:
+
+1. **Digest**: summarize the useful pieces and discard irrelevant content.
+2. **Verify** where possible (local grep/tests/lint) before claiming correctness.
+3. **Integrate** into the final response as:
+   - a patch/commit
+   - a concise plan
+   - extracted structured data
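+
+For the **Verify** step on code changes, the following sketch gates a Codex-drafted patch behind `git apply --check` and a caller-supplied check command. If the check fails, it reverts the patch. It assumes the patch was saved to a file as a unified diff and that you are inside the target git working tree. `verify_codex_patch` is a hypothetical helper.
+
+```bash
+# Hypothetical gate: apply a Codex-drafted patch only if it applies cleanly,
+# then run the check command given as the remaining arguments ("$@").
+verify_codex_patch() {
+  local patch_file="$1"; shift
+  git apply --check "$patch_file" || { echo "patch does not apply cleanly" >&2; return 1; }
+  git apply "$patch_file" || return 1
+  if ! "$@"; then
+    echo "check failed; reverting patch" >&2
+    git apply -R "$patch_file"
+    return 1
+  fi
+}
+
+# Usage (hypothetical): verify_codex_patch /tmp/codex.patch make test
+```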
+
+Raw Codex output is shown only when:
+- Will asks for it
+- there’s ambiguity or a failure that requires inspection
+
+## Troubleshooting
+
+- `stdin is not a terminal`
+  - You invoked interactive `codex` instead of `codex exec`.
+
+- Output is noisy / includes progress
+  - Ensure you’re reading stdout only; progress typically goes to stderr.
+
+- Need machine-readable, structured output
+  - Use `--json` and add a parser (or request an explicit format in the prompt).
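+
+Two small diagnostics cover the cases above. Both are hypothetical helpers. `codex_invocation_hint` checks whether stdin is a terminal, which is the condition behind `stdin is not a terminal`. `is_json_object` assumes `jq` is installed. It checks that plain-text output requested as a single JSON object actually parses as exactly one object before Flynn ingests it.
+
+```bash
+# Hypothetical diagnostic: interactive `codex` needs a TTY on stdin; `codex exec` does not.
+codex_invocation_hint() {
+  if [ -t 0 ]; then
+    echo "stdin is a TTY: interactive codex may work here"
+  else
+    echo "stdin is not a TTY: use codex exec"
+  fi
+}
+
+# Hypothetical check (requires jq): succeed only if "$1" is exactly one JSON object.
+is_json_object() {
+  # -s collects every JSON value into an array; -e sets the exit status from the result.
+  printf '%s' "$1" | jq -es 'length == 1 and (.[0] | type) == "object"' >/dev/null 2>&1
+}
+```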