flynn/docs/runbooks/CODEX_CLI_SUBAGENT.md
2026-02-22 17:15:16 -08:00

Codex CLI as a Flynn subagent (runbook)

This runbook documents how Flynn uses the Codex CLI as an external “subagent” for certain tasks.

It is intentionally pragmatic and should be updated as we learn.

Goals

  • Use Codex CLI reliably in non-interactive contexts (Flynn backends, scripts).
  • Prefer predictable, easy-to-ingest output.
  • Treat subagent output as advice that Flynn digests and verifies when possible.

Key constraints

Interactive mode requires a TTY

Running codex with no subcommand starts the interactive TUI, which fails in non-interactive contexts (e.g., when launched from a backend runner) with errors like:

  • Error: stdin is not a terminal

Therefore, Flynn must use codex exec.

Reference: https://developers.openai.com/codex/noninteractive/
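
As a minimal sketch of the constraint above, a wrapper script could check whether stdin is a terminal before choosing an entry point. The helper name codex_mode is hypothetical; the only real mechanism used is the standard POSIX test [ -t 0 ].

```shell
# Hypothetical helper: report which Codex entry point suits the current context.
# `[ -t 0 ]` is true only when stdin (file descriptor 0) is attached to a terminal.
codex_mode() {
  if [ -t 0 ]; then
    echo "interactive"   # a TTY is present, so the TUI (`codex`) could start
  else
    echo "exec"          # no TTY: only `codex exec` is usable
  fi
}
```

In a backend runner, stdin is normally a pipe or /dev/null, so this reports exec, which matches the rule that Flynn must use codex exec.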

Default invocation

Plain-text (default)

Flynn's default is plain-text stdout (no JSON streaming) because it is the most compatible with external-backend execution and the easiest to digest.

Pattern:

codex exec --ephemeral -m gpt-5.3-codex "<PROMPT>"

Behavioral expectations:

  • stdout: final assistant message (what Flynn consumes)
  • stderr: logs/progress (ignored unless debugging)
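
The stdout/stderr split above could be wired up roughly as follows. This is a sketch, not Flynn's actual runner: run_codex_plain and the CODEX_MODEL override are hypothetical names, and the flags are the ones already shown in this runbook.

```shell
# Hypothetical wrapper: run one non-interactive Codex call.
#   $1 = prompt
#   $2 = file that receives stderr (logs/progress), kept only for debugging
# Prints only stdout (the final assistant message) so the caller can capture it.
# CODEX_MODEL is an assumed, optional override; the default matches this runbook.
run_codex_plain() {
  codex exec --ephemeral -m "${CODEX_MODEL:-gpt-5.3-codex}" "$1" 2>"$2"
}

# Usage (requires the codex CLI on PATH):
#   answer="$(run_codex_plain "Summarize README.md" /tmp/codex.log)"
```

Keeping stderr in a separate file means the captured answer stays clean while the logs remain available when a run needs debugging.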

Why --ephemeral

Use --ephemeral for backend/subagent usage to avoid persisting sessions to disk.

When to use --json

Codex supports --json to emit an event stream (JSONL). Flynn will only opt into this when it is clearly beneficial, e.g.:

  • debugging / auditing where event-level structure matters
  • programmatic extraction where the plain-text output is ambiguous

Pattern:

codex exec --ephemeral --json -m gpt-5.3-codex "<PROMPT>"

Notes:

  • --json changes the stdout format to JSONL events, so it is not a drop-in replacement for plain-text output.
  • If/when Flynn grows a dedicated parser for Codex JSONL, we can consider making --json the default.
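
Until such a parser exists, one low-risk option is to store the raw event stream for later inspection. The sketch below assumes nothing about event field names; capture_codex_events is a hypothetical helper, and it only counts lines, one per JSONL event.

```shell
# Hypothetical helper: save the raw JSONL event stream for auditing.
#   $1 = prompt
#   $2 = file that receives the events (one JSON object per line)
# Prints the number of captured lines and returns codex's exit status.
# No event schema is assumed; real parsing is left to a proper JSON tool.
capture_codex_events() {
  rc=0
  codex exec --ephemeral --json -m gpt-5.3-codex "$1" >"$2" || rc=$?
  wc -l <"$2" | tr -d ' '
  return "$rc"
}
```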

Model selection

Default model:

  • gpt-5.3-codex

Policy:

  • Keep a single default unless a task clearly benefits from a different one.
  • If a run fails with model availability errors, verify via a small codex exec -m <MODEL> "test" smoke test and update this runbook + config.
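
The smoke test in the policy above could be scripted along these lines. It assumes, without confirmation from this runbook, that codex exec exits non-zero when a model is unavailable; codex_smoke_test is a hypothetical helper name.

```shell
# Hypothetical helper: check whether `codex exec` accepts a model name.
#   $1 = model name to test
# ASSUMPTION: codex exits non-zero on model availability errors.
codex_smoke_test() {
  if codex exec --ephemeral -m "$1" "test" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "unavailable: $1"
    return 1
  fi
}

# Usage (requires the codex CLI on PATH):
#   codex_smoke_test gpt-5.3-codex
```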

Prompting guidance (subagent hygiene)

  • Put the task first.
  • Specify the expected output format when necessary.
  • Provide only relevant context; avoid leaking secrets.
  • If Codex is being used to draft code changes, ask it for:
    • exact file paths
    • minimal diffs/patches
    • assumptions and risks
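
As an illustration of this guidance, a prompt for code-change drafts might be assembled like this. The helper build_patch_prompt and the exact wording of the prompt are hypothetical; only the ordering (task first, then output format, then context) comes from the list above.

```shell
# Hypothetical helper: assemble a prompt that puts the task first,
# then the requested output format, then any (already secret-free) context.
#   $1 = task
#   $2 = context (optional)
build_patch_prompt() {
  printf '%s\n\n' "Task: $1"
  printf '%s\n' "Respond with:" \
    "- exact file paths" \
    "- minimal diffs/patches" \
    "- assumptions and risks"
  if [ -n "${2:-}" ]; then
    printf '\n%s\n' "Context: $2"
  fi
}

# Usage:
#   codex exec --ephemeral -m gpt-5.3-codex "$(build_patch_prompt "Fix the typo in README.md")"
```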

How Flynn digests Codex output

When Flynn uses Codex:

  1. Digest: summarize the useful pieces and discard irrelevant content.
  2. Verify where possible (local grep/tests/lint) before claiming correctness.
  3. Integrate into the final response as:
    • a patch/commit
    • a concise plan
    • extracted structured data

Raw Codex output is shown only when:

  • Will asks for it
  • there's ambiguity or a failure that requires inspection

Troubleshooting

  • stdin is not a terminal
    • You invoked interactive codex instead of codex exec.
  • Output is noisy / includes progress
    • Ensure you're reading stdout only; progress typically goes to stderr.
  • Need deterministic structured output
    • Use --json and add a parser (or request an explicit format in the prompt).