Codex CLI as a Flynn subagent (runbook)
This runbook documents how Flynn uses the Codex CLI as an external “subagent” for certain tasks.
It is intentionally pragmatic and should be updated as we learn.
Goals
- Use Codex CLI reliably in non-interactive contexts (Flynn backends, scripts).
- Prefer predictable, easy-to-ingest output.
- Treat subagent output as advice that Flynn digests and verifies when possible.
Key constraints
Interactive mode requires a TTY
Running codex with no subcommand starts the interactive TUI, which fails in non-interactive contexts (e.g., from a backend runner) with errors like:
Error: stdin is not a terminal
Therefore, Flynn must use codex exec.
Reference: https://developers.openai.com/codex/noninteractive/
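One way to catch this before launch is a small guard in the backend runner. The sketch below is illustrative only: `ensure_noninteractive_safe` is a hypothetical helper (not part of Flynn or Codex), and it assumes the command is passed as an argv list whose second element, when present, is the subcommand.

```python
import sys


def ensure_noninteractive_safe(argv, stdin_is_tty=None):
    """Return argv unchanged, or raise if it would start the TUI without a TTY.

    Hypothetical helper: assumes argv[0] is the codex binary and argv[1],
    when present, is the subcommand.
    """
    if stdin_is_tty is None:
        # Ask the real stdin when the caller does not override the answer.
        stdin_is_tty = sys.stdin.isatty()
    uses_exec = len(argv) >= 2 and argv[1] == "exec"
    if not uses_exec and not stdin_is_tty:
        raise ValueError("interactive codex needs a TTY; use 'codex exec' instead")
    return argv
```

Passing `stdin_is_tty` explicitly keeps the check easy to exercise in tests, where the real stdin may or may not be a terminal.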
Default invocation
Plain-text (default)
Flynn’s default is plain-text stdout (no JSON streaming) because it is the most compatible with external-backend execution and easiest to digest.
Pattern:
codex exec --ephemeral -m gpt-5.3-codex "<PROMPT>"
Behavioral expectations:
- stdout: final assistant message (what Flynn consumes)
- stderr: logs/progress (ignored unless debugging)
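The stdout/stderr split above could be wired up roughly as follows. This is a minimal sketch, not Flynn's actual runner: `run_codex` and its `runner` parameter are hypothetical names, and it assumes a non-zero exit code should be treated as a failure.

```python
import subprocess

DEFAULT_MODEL = "gpt-5.3-codex"


def run_codex(prompt, model=DEFAULT_MODEL, runner=subprocess.run):
    """Run `codex exec` once and return the final assistant message from stdout."""
    argv = ["codex", "exec", "--ephemeral", "-m", model, prompt]
    result = runner(
        argv,
        capture_output=True,  # keep stdout and stderr in separate buffers
        text=True,            # decode both streams to str
        check=True,           # assumption: non-zero exit raises CalledProcessError
    )
    # stderr (logs/progress) is deliberately dropped here; inspect it when debugging.
    return result.stdout.strip()
```

The `runner` argument exists only so the function can be tested without a Codex installation; in normal use it stays at its default, `subprocess.run`.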
Why --ephemeral
Use --ephemeral for backend/subagent usage to avoid persisting sessions to disk.
When to use --json
Codex supports --json to emit an event stream (JSONL). Flynn will only opt into this when it is clearly beneficial, e.g.:
- debugging / auditing where event-level structure matters
- programmatic extraction where the plain-text output is ambiguous
Pattern:
codex exec --ephemeral --json -m gpt-5.3-codex "<PROMPT>"
Notes:
- --json changes stdout format to JSONL events, which is not as drop-in as plain text.
- If/when Flynn grows a dedicated parser for Codex JSONL, we can consider making --json the default.
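A first step toward such a parser might look like the sketch below. It only splits the stream into JSON objects and makes no assumption about which fields Codex events carry; `parse_jsonl_events` is a hypothetical name, and any schema-specific extraction would have to be checked against the Codex documentation.

```python
import json


def parse_jsonl_events(stdout_text):
    """Split JSONL stdout into (events, unparsed_lines) without assuming a schema."""
    events, unparsed = [], []
    for line in stdout_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            unparsed.append(line)  # keep for inspection instead of failing the run
            continue
        if isinstance(value, dict):
            events.append(value)
        else:
            unparsed.append(line)  # valid JSON, but not an event object
    return events, unparsed
```

Returning the unparsed lines separately, rather than raising, lets Flynn still use whatever events did parse and surface the rest when debugging.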
Model selection
Default model:
gpt-5.3-codex
Policy:
- Keep a single default unless a task clearly benefits from a different one.
- If a run fails with model availability errors, verify via a small codex exec -m <MODEL> "test" smoke test and update this runbook + config.
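The smoke test can be scripted so it is quick to repeat after a config change. The following is a sketch under assumptions: `model_is_available` is a hypothetical helper, and it treats any non-zero exit code as failure without trying to recognize specific error messages.

```python
import subprocess


def model_is_available(model, runner=subprocess.run):
    """Return (ok, stderr) for a tiny `codex exec -m <MODEL> "test"` run."""
    result = runner(
        ["codex", "exec", "-m", model, "test"],
        capture_output=True,
        text=True,
    )
    # Assumption: a non-zero exit code signals failure. stderr is returned so
    # the caller can confirm it really was a model-availability error.
    return result.returncode == 0, result.stderr
```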
Prompting guidance (subagent hygiene)
- Put the task first.
- Specify the expected output format when necessary.
- Provide only relevant context; avoid leaking secrets.
- If Codex is being used to draft code changes, ask it for:
  - exact file paths
  - minimal diffs/patches
  - assumptions and risks
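The guidance above can be encoded in a small prompt builder so every code-change request has the same shape. This is an illustrative sketch only: `build_patch_prompt`, its parameters, and the section labels are hypothetical, and secrets still have to be filtered out of `context` by the caller.

```python
def build_patch_prompt(task, context="", output_format=""):
    """Assemble a prompt: task first, then format, then context, then patch asks."""
    parts = [task.strip()]
    if output_format:
        parts.append("Output format: " + output_format.strip())
    if context:
        # Caller is responsible for passing only relevant, secret-free context.
        parts.append("Context:\n" + context.strip())
    parts.append(
        "When proposing code changes, include:\n"
        "- exact file paths\n"
        "- minimal diffs/patches\n"
        "- assumptions and risks"
    )
    return "\n\n".join(parts)
```

For example, `build_patch_prompt("Fix the flaky retry test", output_format="unified diff")` yields a prompt that opens with the task and ends with the three asks.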
How Flynn digests Codex output
When Flynn uses Codex:
- Digest: summarize the useful pieces and discard irrelevant content.
- Verify where possible (local grep/tests/lint) before claiming correctness.
- Integrate into the final response as:
  - a patch/commit
  - a concise plan
  - extracted structured data
Raw Codex output is shown only when:
- Will asks for it
- there’s ambiguity or a failure that requires inspection
Troubleshooting
- stdin is not a terminal
  - You invoked interactive codex instead of codex exec.
- Output is noisy / includes progress
  - Ensure you’re reading stdout only; progress typically goes to stderr.
- Need deterministic structured output
  - Use --json and add a parser (or request an explicit format in the prompt).