flynn/docs/runbooks/CODEX_CLI_SUBAGENT.md
2026-02-22 17:15:16 -08:00

# Codex CLI as a Flynn subagent (runbook)
This runbook documents how Flynn uses the **Codex CLI** as an external “subagent” for certain tasks.
It is intentionally pragmatic and should be updated as we learn.
## Goals
- Use Codex CLI reliably in **non-interactive** contexts (Flynn backends, scripts).
- Prefer predictable, easy-to-ingest output.
- Treat subagent output as *advice* that Flynn digests and verifies when possible.
## Key constraints
### Interactive mode requires a TTY
Running `codex` with no subcommand launches the interactive TUI, which fails in non-interactive contexts (e.g., from a backend runner) with errors like:
- `Error: stdin is not a terminal`
**Therefore, Flynn must use `codex exec`**.
Reference: https://developers.openai.com/codex/noninteractive/
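A minimal sketch of a guard that enforces this rule. The wrapper name `flynn_codex` is hypothetical (it is not part of Codex or Flynn); it refuses any invocation other than `codex exec` when stdin is not a terminal, so the failure surfaces with a clearer message:

```bash
# Hypothetical wrapper (name is an assumption, not an existing Flynn helper).
# Fails fast if anything other than `codex exec` is attempted without a TTY.
flynn_codex() {
  if [ "${1:-}" != "exec" ] && ! [ -t 0 ]; then
    echo "flynn_codex: stdin is not a terminal; use 'codex exec' instead" >&2
    return 2
  fi
  codex "$@"
}
```

Callers would then write `flynn_codex exec --ephemeral -m gpt-5.3-codex "<PROMPT>"` instead of calling `codex` directly.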
## Default invocation
### Plain-text (default)
Flynn's default is **plain-text stdout** (no JSON streaming) because it is the most compatible with external-backend execution and the easiest to digest.
Pattern:
```bash
codex exec --ephemeral -m gpt-5.3-codex "<PROMPT>"
```
Behavioral expectations:
- **stdout**: final assistant message (what Flynn consumes)
- **stderr**: logs/progress (ignored unless debugging)
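The stdout/stderr split above can be sketched as a small wrapper. The function name `run_codex_plain` and the `FLYNN_CODEX_MODEL` variable are assumptions made for illustration; only the `codex exec --ephemeral -m` invocation comes from this runbook:

```bash
# Hypothetical helper: run Codex in plain-text mode.
# stdout (the final assistant message) is passed through to the caller;
# stderr (logs/progress) is appended to a log file, or discarded by default.
run_codex_plain() {
  local prompt="$1"
  local log="${2:-/dev/null}"
  # FLYNN_CODEX_MODEL is an assumed override; the runbook default is used otherwise.
  codex exec --ephemeral -m "${FLYNN_CODEX_MODEL:-gpt-5.3-codex}" "$prompt" 2>>"$log"
}
```

For example, `answer=$(run_codex_plain "<PROMPT>" /tmp/codex.log)` keeps only the final message in `answer` and leaves progress output in the log for debugging.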
### Why `--ephemeral`
Use `--ephemeral` for backend/subagent usage to avoid persisting sessions to disk.
## When to use `--json`
Codex supports `--json` to emit an event stream (JSONL). Flynn will only opt into this when it is clearly beneficial, e.g.:
- debugging / auditing where event-level structure matters
- programmatic extraction where the plain-text output is ambiguous
Pattern:
```bash
codex exec --ephemeral --json -m gpt-5.3-codex "<PROMPT>"
```
Notes:
- `--json` changes stdout format to **JSONL events**, which is not as drop-in as plain text.
- If/when Flynn grows a dedicated parser for Codex JSONL, we can consider making `--json` the default.
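Until a dedicated parser exists, one low-risk way to use `--json` for auditing is to save the raw event stream and inspect it afterwards. The sketch below assumes nothing about the event schema; the function name `codex_events_to_file` is hypothetical, and it only reports how many non-empty lines (one JSON event per line, per the JSONL format) were captured:

```bash
# Hypothetical helper: capture the JSONL event stream to a file for later auditing.
# Prints the number of non-empty lines captured; no event fields are assumed.
codex_events_to_file() {
  local prompt="$1"
  local out="$2"
  codex exec --ephemeral --json -m gpt-5.3-codex "$prompt" >"$out" || return $?
  # grep -c . counts lines containing at least one character.
  grep -c . "$out"
}
```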
## Model selection
Default model:
- `gpt-5.3-codex`
Policy:
- Keep a single default unless a task clearly benefits from a different one.
- If a run fails with model availability errors, verify via a small `codex exec -m <MODEL> "test"` smoke test and update this runbook + config.
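The smoke test from the policy above could be wrapped as follows. This is a sketch that assumes `codex exec` exits non-zero when the model is unavailable; the function name `codex_smoke_test` is hypothetical:

```bash
# Hypothetical helper: check that a model responds to a trivial prompt.
# Assumes a non-zero exit status from `codex exec` signals failure.
codex_smoke_test() {
  local model="$1"
  if codex exec --ephemeral -m "$model" "test" >/dev/null 2>&1; then
    echo "ok: $model"
  else
    echo "unavailable or failing: $model" >&2
    return 1
  fi
}
```

Running `codex_smoke_test gpt-5.3-codex` before changing the default gives a quick signal on whether the runbook and config need updating.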
## Prompting guidance (subagent hygiene)
- Put the **task** first.
- Specify the expected **output format** when necessary.
- Provide only relevant context; avoid leaking secrets.
- If Codex is being used to draft code changes, ask it for:
  - exact file paths
  - minimal diffs/patches
  - assumptions and risks
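The guidance above can be sketched as a prompt builder. The function name `build_codex_prompt` and the exact section labels are illustrative assumptions, not a format Codex requires:

```bash
# Hypothetical helper: assemble a prompt with the task first, then the
# expected output format, then the code-change checklist, then context.
build_codex_prompt() {
  local task="$1"
  local format="$2"
  local context="$3"
  printf 'Task: %s\n\n' "$task"
  printf 'Output format: %s\n\n' "$format"
  printf 'For any code change, include:\n'
  printf '%s\n' '- exact file paths' '- minimal diffs/patches' '- assumptions and risks'
  # Pass only relevant context; never include secrets here.
  printf '\nContext:\n%s\n' "$context"
}
```

The result can be passed straight to Codex, e.g. `codex exec --ephemeral -m gpt-5.3-codex "$(build_codex_prompt "<TASK>" "unified diff" "<CONTEXT>")"`.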
## How Flynn digests Codex output
When Flynn uses Codex:
1. **Digest**: summarize the useful pieces and discard irrelevant content.
2. **Verify** where possible (local grep/tests/lint) before claiming correctness.
3. **Integrate** into the final response as:
   - a patch/commit
   - a concise plan
   - extracted structured data
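As one example of a cheap verification step, the sketch below checks that file paths named in Codex output actually exist before Flynn relies on them. The function name `missing_paths` is hypothetical, and how the paths are extracted from the output is left open:

```bash
# Hypothetical helper: read one path per line on stdin and print those that
# do not exist. Returns 1 if any path is missing, 0 otherwise.
missing_paths() {
  local status=0
  local path
  while IFS= read -r path; do
    [ -z "$path" ] && continue   # skip blank lines
    if [ ! -e "$path" ]; then
      echo "$path"
      status=1
    fi
  done
  return $status
}
```

A non-empty result is a signal to treat the suggestion with suspicion rather than integrate it as-is.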
Raw Codex output is shown only when:
- Will asks for it
- there's ambiguity or a failure that requires inspection
## Troubleshooting
- `stdin is not a terminal`
  - You invoked interactive `codex` instead of `codex exec`.
- Output is noisy / includes progress
  - Ensure you're reading stdout only; progress typically goes to stderr.
- Need deterministic structured output
  - Use `--json` and add a parser (or request an explicit format in the prompt).