From 0775c9ede22dbf5c5b925d4fd6d0a920d197c1bf Mon Sep 17 00:00:00 2001
From: William Valentin
Date: Sun, 22 Feb 2026 17:15:16 -0800
Subject: [PATCH] docs: add Codex CLI subagent runbook

---
 docs/runbooks/CODEX_CLI_SUBAGENT.md | 102 ++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)
 create mode 100644 docs/runbooks/CODEX_CLI_SUBAGENT.md

diff --git a/docs/runbooks/CODEX_CLI_SUBAGENT.md b/docs/runbooks/CODEX_CLI_SUBAGENT.md
new file mode 100644
index 0000000..ff02969
--- /dev/null
+++ b/docs/runbooks/CODEX_CLI_SUBAGENT.md
@@ -0,0 +1,102 @@
+# Codex CLI as a Flynn subagent (runbook)
+
+This runbook documents how Flynn uses the **Codex CLI** as an external “subagent” for certain tasks.
+
+It is intentionally pragmatic and should be updated as we learn.
+
+## Goals
+
+- Use Codex CLI reliably in **non-interactive** contexts (Flynn backends, scripts).
+- Prefer predictable, easy-to-ingest output.
+- Treat subagent output as *advice* that Flynn digests and verifies when possible.
+
+## Key constraints
+
+### Interactive mode requires a TTY
+Running `codex` (no subcommand) is the interactive TUI and fails in non-interactive usage (e.g., from a backend runner) with errors like:
+
+- `Error: stdin is not a terminal`
+
+**Therefore, Flynn must use `codex exec`**.
+
+Reference: https://developers.openai.com/codex/noninteractive/
+
+## Default invocation
+
+### Plain-text (default)
+Flynn’s default is **plain-text stdout** (no JSON streaming) because it is the most compatible with external-backend execution and easiest to digest.
+
+Pattern:
+
+```bash
+codex exec --ephemeral -m gpt-5.3-codex ""
+```
+
+Behavioral expectations:
+- **stdout**: final assistant message (what Flynn consumes)
+- **stderr**: logs/progress (ignored unless debugging)
+
+### Why `--ephemeral`
+Use `--ephemeral` for backend/subagent usage to avoid persisting sessions to disk.
+
+## When to use `--json`
+
+Codex supports `--json` to emit an event stream (JSONL).
+Flynn will only opt into this when it is clearly beneficial, e.g.:
+
+- debugging / auditing where event-level structure matters
+- programmatic extraction where the plain-text output is ambiguous
+
+Pattern:
+
+```bash
+codex exec --ephemeral --json -m gpt-5.3-codex ""
+```
+
+Notes:
+- `--json` changes stdout format to **JSONL events**, which is not as drop-in as plain text.
+- If/when Flynn grows a dedicated parser for Codex JSONL, we can consider making `--json` the default.
+
+## Model selection
+
+Default model:
+- `gpt-5.3-codex`
+
+Policy:
+- Keep a single default unless a task clearly benefits from a different one.
+- If a run fails with model availability errors, verify via a small `codex exec -m "test"` smoke test and update this runbook + config.
+
+## Prompting guidance (subagent hygiene)
+
+- Put the **task** first.
+- Specify the expected **output format** when necessary.
+- Provide only relevant context; avoid leaking secrets.
+- If Codex is being used to draft code changes, ask it for:
+  - exact file paths
+  - minimal diffs/patches
+  - assumptions and risks
+
+## How Flynn digests Codex output
+
+When Flynn uses Codex:
+
+1. **Digest**: summarize the useful pieces and discard irrelevant content.
+2. **Verify** where possible (local grep/tests/lint) before claiming correctness.
+3. **Integrate** into the final response as:
+   - a patch/commit
+   - a concise plan
+   - extracted structured data
+
+Raw Codex output is shown only when:
+- Will asks for it
+- there’s ambiguity or a failure that requires inspection
+
+## Troubleshooting
+
+- `stdin is not a terminal`
+  - You invoked interactive `codex` instead of `codex exec`.
+
+- Output is noisy / includes progress
+  - Ensure you’re reading stdout only; progress typically goes to stderr.
+
+- Need deterministic structured output
+  - Use `--json` and add a parser (or request an explicit format in the prompt).
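
The plain-text invocation and the stdout/stderr split described in the runbook could be wrapped roughly as follows. This is a minimal sketch under stated assumptions, not Flynn's actual backend code: the function name `run_codex`, the `CODEX_MODEL` environment variable, and the optional log-file argument are hypothetical. Only `codex exec`, `--ephemeral`, `-m`, and the default model name come from the runbook.

```bash
#!/usr/bin/env bash
# Sketch only: run_codex and CODEX_MODEL are hypothetical names,
# not part of Flynn or of the Codex CLI.

# Run one non-interactive Codex task.
#   $1  prompt text (required)
#   $2  file that receives stderr (logs/progress); defaults to /dev/null
# stdout of this function is whatever `codex exec` prints on stdout,
# which the runbook describes as the final assistant message.
run_codex() {
  local prompt="$1"
  local log_file="${2:-/dev/null}"
  local model="${CODEX_MODEL:-gpt-5.3-codex}"

  # Refuse to start a run with nothing to do.
  if [ -z "$prompt" ]; then
    echo "run_codex: empty prompt" >&2
    return 2
  fi

  # Always use `codex exec`: bare `codex` is the interactive TUI
  # and fails without a terminal.
  codex exec --ephemeral -m "$model" "$prompt" 2>"$log_file"
}
```

A caller would capture only stdout, for example `answer="$(run_codex "Summarize the README in three bullets" /tmp/codex.log)"`, and look at the log file only when debugging. Keeping stderr out of the captured value is what makes the output easy to digest without a JSONL parser.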