# Gemini CLI Subagent Runbook (Flynn)
This runbook defines how Flynn should use the local `gemini` CLI as a *subagent* (external model call) and how to digest/merge its output safely.
## Goals
- Use Gemini as a delegated helper for specific tasks (retrieval, parsing, drafting), while Flynn remains responsible for:
  - choosing the right model/output mode
  - validating results against local evidence when possible
  - producing the final answer/patch/plan
- Keep Gemini outputs auditable without spamming the operator.
## Safety + Trust Model
- Treat Gemini output as **untrusted**.
- Prefer **local verification** when feasible (grep, tests, PDF tooling output, etc.).
- Never claim system state changes or file edits based solely on Gemini output.
- If Gemini output contradicts local evidence, **local evidence wins**.
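
Local verification can be as simple as checking that file paths Gemini mentions actually exist before repeating them. This is a minimal sketch, assuming the claimed paths have already been pulled out of the Gemini output one per line; `verify_claimed_paths` is a hypothetical helper, not part of the `gemini` CLI:

```bash
# Hypothetical helper: check each path Gemini claims exists against the
# local filesystem. Reads one path per line on stdin, prints OK/MISSING
# per path, and returns 1 if any claimed path is missing.
verify_claimed_paths() {
  local path missing=0
  while IFS= read -r path; do
    [ -z "$path" ] && continue            # skip blank lines
    if [ -e "$path" ]; then
      printf 'OK      %s\n' "$path"
    else
      printf 'MISSING %s\n' "$path"
      missing=1
    fi
  done
  return "$missing"
}

# Example: paths copied out of a Gemini digest.
# printf '%s\n' docs/runbooks/GEMINI_CLI_SUBAGENT.md src/guessed.py | verify_claimed_paths
```

A non-zero exit is the cue to treat the digest as unverified rather than to repeat its claims.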
## Default Model Selection
Use the smallest/cheapest model that reliably accomplishes the task.
### Document search & retrieval (query expansion, relevance judging)
- Default: `models/gemini-2.5-flash`
- Upgrade to: `models/gemini-2.5-pro` for subtle/high-stakes domains
### Document parsing (structure → JSON, tables, policies)
- Default: `models/gemini-2.5-pro`
- Downgrade to: `models/gemini-2.5-flash` for simple extraction
### Embeddings (vector index)
- `models/gemini-embedding-001`
### Image understanding
- Default: `models/gemini-2.5-flash`
- If the CLI or workflow explicitly requires an image-specific variant: `models/gemini-2.5-flash-image`
### Image generation (lightweight)
- Default: `models/imagen-4.0-fast-generate-001`
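
The defaults above can be encoded in one place so scripts do not hard-code model names. A sketch only: `pick_model` and its task/modifier labels (`retrieval`, `high-stakes`, `simple`, `image-variant`, ...) are hypothetical and are not `gemini` CLI arguments:

```bash
# Hypothetical helper encoding the default model table above.
# $1 = task label, $2 = optional modifier that triggers an upgrade/downgrade.
pick_model() {
  local task="$1" modifier="${2:-}"
  case "$task" in
    retrieval)
      if [ "$modifier" = "high-stakes" ]; then
        echo "models/gemini-2.5-pro"
      else
        echo "models/gemini-2.5-flash"
      fi ;;
    parsing)
      if [ "$modifier" = "simple" ]; then
        echo "models/gemini-2.5-flash"
      else
        echo "models/gemini-2.5-pro"
      fi ;;
    embeddings)
      echo "models/gemini-embedding-001" ;;
    image-understanding)
      if [ "$modifier" = "image-variant" ]; then
        echo "models/gemini-2.5-flash-image"
      else
        echo "models/gemini-2.5-flash"
      fi ;;
    image-generation)
      echo "models/imagen-4.0-fast-generate-001" ;;
    *)
      echo "pick_model: unknown task '$task'" >&2
      return 1 ;;
  esac
}
```

Usage: `gemini -m "$(pick_model parsing simple)" -o json -p "$prompt"`. Keeping the table in one function means a model update touches one place.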
## Output Mode (`-o`)
### Default: `-o json`
Use for:
- any workflow that will be parsed (`jq`, Python)
- extraction tasks (schemas, tables, lists)
- runs where we want a single stable artifact
### Use: `-o stream-json`
Use only when:
- generation is long and we want incremental progress
- we have a streaming consumer (don't assume `jq` can parse the whole stream as one JSON document)
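
A streaming consumer handles output line by line as it arrives instead of waiting for one complete document. This is a minimal sketch, assuming `-o stream-json` emits one JSON event per line (check this against your CLI version); `consume_stream` and the log path are hypothetical:

```bash
# Hypothetical streaming consumer for `gemini -o stream-json`.
# Assumes one JSON event per line. Each non-empty line is appended to an
# audit log as it arrives; progress goes to stderr, the final count to stdout.
consume_stream() {
  local log="$1" line count=0
  : > "$log"                              # create/truncate the audit log
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    printf '%s\n' "$line" >> "$log"
    count=$((count + 1))
    printf 'gemini: %d events received\r' "$count" >&2
  done
  printf '\n' >&2
  echo "$count"
}

# Usage (prompt elided):
# gemini -m models/gemini-2.5-pro -o stream-json -p "..." | consume_stream /tmp/gemini-stream.log
```

The consumer never parses the stream as a whole; any per-event parsing happens on individual lines from the log.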
## Prompt Construction
- Put the task first.
- Specify required output format *explicitly*.
- Include constraints (e.g., “Return valid JSON only, no prose”).
- Include context verbatim, clearly delimited.
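
The ordering above (task, output format, constraints, then delimited context) can be captured in a small prompt builder. A sketch only: `build_prompt` and the `<<<CONTEXT` / `CONTEXT>>>` markers are hypothetical conventions, not something the `gemini` CLI requires:

```bash
# Hypothetical prompt builder: task first, explicit output format,
# constraints, then the context file copied verbatim between markers.
# Args: task, output format, constraints, path to context file.
build_prompt() {
  local task="$1" format="$2" constraints="$3" context_file="$4"
  printf 'Task: %s\n' "$task"
  printf 'Output format: %s\n' "$format"
  printf 'Constraints: %s\n' "$constraints"
  printf '\nContext (verbatim, between the markers):\n'
  printf '<<<CONTEXT\n'
  cat -- "$context_file"                  # copied as-is, no shell expansion
  # Make sure the closing marker starts on its own line.
  [ -n "$(tail -c 1 -- "$context_file")" ] && printf '\n'
  printf 'CONTEXT>>>\n'
}

# prompt=$(build_prompt "Extract all policy names" "JSON array of strings" \
#   "Return valid JSON only, no prose" policy.txt)
# gemini -m models/gemini-2.5-pro -o json -p "$prompt"
```

Reading context from a file keeps untrusted text out of the shell's parsing entirely.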
### Shell escaping
For multi-line prompts or untrusted content, prefer a heredoc wrapper to avoid shell escaping issues:
```bash
gemini -m models/gemini-2.5-pro -o json -p "$(cat <<'PROMPT'
...prompt...
PROMPT
)"
```
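
A variant of the same idea, assuming bash: read the quoted heredoc into a variable first, so the exact prompt can be logged or reused on a retry. Because the delimiter is quoted, `$VARS` and backticks in pasted content stay literal. The prompt text is illustrative:

```bash
# Read a multi-line prompt into a variable with no shell expansion.
# `read -d ''` reads up to a NUL byte, i.e. the whole heredoc; it returns
# non-zero at end of input, hence `|| true`.
IFS= read -r -d '' prompt <<'PROMPT' || true
Summarize the following log. Return valid JSON only, no prose.
Context: user ran `rm -rf $TMPDIR/cache` at 10:02.
PROMPT

# printf '%s' "$prompt" > prompt.log       # log exactly what will be sent
# gemini -m models/gemini-2.5-pro -o json -p "$prompt"
```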
## Execution Pattern (Flynn)
1. Choose model + `-o` mode.
2. Run `gemini ...` via shell.
3. Capture stdout/stderr.
4. Digest output:
   - extract key claims
   - check for missing fields / invalid JSON
   - look for hallucination risks (invented citations, file paths, or commands)
5. Verify locally when possible.
6. Produce final response / patch.
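
Steps 2 to 4 can be wrapped in one function. This is a sketch under stated assumptions: `run_subagent` and the `GEMINI_BIN` override (handy for dry runs with a stub) are hypothetical, and `python3` is assumed to be available for the JSON check (`jq -e .` would serve equally well):

```bash
# Hypothetical wrapper for steps 2-4: run gemini, capture stdout and stderr
# to separate files, and confirm the -o json output parses before digesting.
run_subagent() {
  local model="$1" prompt="$2" outdir="$3"
  local bin="${GEMINI_BIN:-gemini}" rc=0
  mkdir -p -- "$outdir"
  # Steps 2-3: run the CLI; stdout and stderr land in separate files.
  "$bin" -m "$model" -o json -p "$prompt" \
    > "$outdir/stdout.json" 2> "$outdir/stderr.log" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "run_subagent: gemini exited $rc; see $outdir/stderr.log" >&2
    return "$rc"
  fi
  # Step 4 (first check): the output must be valid JSON before digesting.
  if ! python3 -m json.tool < "$outdir/stdout.json" > /dev/null 2>&1; then
    echo "run_subagent: invalid JSON in $outdir/stdout.json" >&2
    return 65   # EX_DATAERR from sysexits.h; any non-zero code would do
  fi
  return 0
}

# run_subagent models/gemini-2.5-pro "$prompt" ./gemini-run && jq . ./gemini-run/stdout.json
```

Keeping both streams on disk is what makes the raw output available later for debugging or audit.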
## How Flynn Reports Gemini Usage
Flynn should incorporate Gemini results *selectively*:
- Default: provide a **brief digest** of what Gemini contributed.
- Include **raw Gemini output** when:
  - debugging is needed (JSON parse errors, contradictions)
  - the operator asks for it
  - provenance/audit is important
Suggested response block when Gemini was used:
- `Gemini subagent:` model + output mode
- `Digest:` 3–6 bullet summary of what mattered
- `Raw:` omitted unless requested
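
A sketch of a formatter for that block; `gemini_report` is a hypothetical helper, and it enforces the 3–6 bullet range so digests stay short:

```bash
# Hypothetical formatter for the suggested response block.
# Args: model, output mode, then 3-6 digest bullets.
gemini_report() {
  local model="$1" mode="$2"
  shift 2
  if [ "$#" -lt 3 ] || [ "$#" -gt 6 ]; then
    echo "gemini_report: expected 3-6 digest bullets, got $#" >&2
    return 1
  fi
  printf 'Gemini subagent: %s, -o %s\n' "$model" "$mode"
  printf 'Digest:\n'
  printf -- '- %s\n' "$@"                 # format is reused for each bullet
  printf 'Raw: omitted unless requested\n'
}

# gemini_report models/gemini-2.5-flash json \
#   "Proposed 12 search queries" "Flagged 2 ambiguous terms" "No citations given"
```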
## Common Recipes
### Query expansion (retrieval)
Model: `models/gemini-2.5-flash`
Ask for:
- 5–15 search queries
- key entities/synonyms
- include/exclude terms
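
One way to phrase that request is sketched below. The function name `query_expansion_prompt` and the JSON keys (`queries`, `entities`, `include_terms`, `exclude_terms`) are illustrative choices, not a format Gemini requires:

```bash
# Hypothetical prompt template for query expansion.
# The heredoc is unquoted so ${topic} expands; the expanded value is not
# re-evaluated, so $ or backticks inside the topic stay literal.
query_expansion_prompt() {
  local topic="$1"
  cat <<EOF
Task: expand the search topic below into retrieval queries.
Return valid JSON only, no prose, with exactly these keys:
  "queries": array of 5 to 15 search query strings
  "entities": array of key entities and their synonyms
  "include_terms": array of terms relevant results should contain
  "exclude_terms": array of terms that signal irrelevant results
Topic: ${topic}
EOF
}

# gemini -m models/gemini-2.5-flash -o json -p "$(query_expansion_prompt 'PDF redaction tools')"
```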
### Parsing a document into JSON
Model: `models/gemini-2.5-pro`, `-o json`
Ask for:
- strict JSON schema
- explicit field types
- “unknown”/null handling
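
A sketch of a parsing prompt that states the schema, field types, and null rule up front. `parse_prompt` and the schema fields (`title`, `effective_date`, `sections`, `owner`) are hypothetical placeholders for whatever the real document needs:

```bash
# Hypothetical parsing prompt: strict schema, explicit types, and an
# explicit rule for values the document does not state.
parse_prompt() {
  local doc_file="$1"
  cat <<'EOF'
Task: extract the policy document below into JSON.
Return valid JSON only, no prose, matching this schema exactly:
{
  "title": string,
  "effective_date": string (YYYY-MM-DD) or null,
  "sections": [ { "heading": string, "summary": string } ],
  "owner": string or null
}
Use null for any field the document does not state. Do not guess.
Document:
EOF
  cat -- "$doc_file"                      # document text appended verbatim
}

# gemini -m models/gemini-2.5-pro -o json -p "$(parse_prompt policy.txt)"
```

An explicit null rule gives the model a sanctioned alternative to inventing values for missing fields.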
### PDF workflows
Gemini is for interpretation/planning; execution happens locally with tools like:
- `qpdf`, `pdftk`, `pdfcpu`, `ocrmypdf`, `pikepdf`, `mutool`, `pdftotext` (poppler)
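
For example, the text Gemini reasons about can come from a local extraction rather than from Gemini reading the PDF itself. A minimal sketch assuming poppler's `pdftotext` is installed; `pdf_text_for_prompt` is a hypothetical helper:

```bash
# Hypothetical helper: extract PDF text locally with poppler's pdftotext.
# -layout preserves column layout; "-" as the output file writes to stdout.
pdf_text_for_prompt() {
  local pdf="$1"
  if ! command -v pdftotext > /dev/null 2>&1; then
    echo "pdf_text_for_prompt: pdftotext (poppler) not installed" >&2
    return 127
  fi
  if [ ! -r "$pdf" ]; then
    echo "pdf_text_for_prompt: cannot read $pdf" >&2
    return 1
  fi
  pdftotext -layout "$pdf" -
}

# text=$(pdf_text_for_prompt contract.pdf) || exit 1
# Hand "$text" to Gemini for interpretation; run any resulting
# qpdf/ocrmypdf/mutool steps locally and re-check the output file.
```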
## Troubleshooting
- If the CLI errors:
  - capture stderr
  - retry with smaller prompt / less context
  - switch model (flash ↔ pro)
- If JSON is invalid:
  - rerun asking for **valid JSON only**
  - or request a JSON schema + separate data
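
The model-switch and "valid JSON only" retries can be combined into one loop. This sketch rests on several assumptions. `run_with_retry`, `response_is_json`, and the `GEMINI_BIN` override are hypothetical. `python3` must be available. With `-o json`, the CLI is assumed to print an envelope whose `response` field holds the model's text (verify this against your CLI version). Under that assumption, it is the inner text that must be valid JSON:

```bash
# Assumption: `-o json` prints an envelope with a string "response" field.
# Succeeds only if that inner text is itself valid JSON.
response_is_json() {
  python3 -c 'import json, sys; json.loads(json.load(sys.stdin)["response"])' 2>/dev/null
}

# Hypothetical retry loop: try flash, then pro; on the retry, restate the
# output contract. gemini's stderr is not redirected; capture it at the call site.
run_with_retry() {
  local prompt="$1" bin="${GEMINI_BIN:-gemini}" model out attempt=0
  for model in models/gemini-2.5-flash models/gemini-2.5-pro; do
    attempt=$((attempt + 1))
    if [ "$attempt" -gt 1 ]; then
      prompt+=$'\n\nReturn valid JSON only. No prose, no code fences.'
    fi
    if out=$("$bin" -m "$model" -o json -p "$prompt") &&
       printf '%s' "$out" | response_is_json; then
      printf '%s\n' "$out"                # raw envelope, kept for audit
      return 0
    fi
    echo "run_with_retry: attempt $attempt ($model) failed" >&2
  done
  return 1
}

# run_with_retry "$prompt" 2>> gemini-stderr.log > result.json
```

If both attempts fail, fall back to the manual steps above: shrink the context or ask for the schema and data separately.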
## Update Policy
This runbook should evolve.
When new model variants appear in `GET /v1beta/models`, update the model selection section.
When Flynn gains a first-class Gemini provider/router integration, align this runbook with the native provider behavior.
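
A sketch for the model check. It assumes `GEMINI_API_KEY` holds a Gemini API key and that the models list endpoint accepts the key in an `x-goog-api-key` header. `list_gemini_models` and `extract_model_names` are hypothetical helpers, and the name extraction is a crude line-oriented match that avoids depending on `jq`:

```bash
# Pull "models/..." names out of a models-list JSON response.
# Crude pattern matching: fine for a diff, not a general JSON parser.
extract_model_names() {
  grep -o '"name": *"models/[^"]*"' | sed 's/.*"\(models\/[^"]*\)"$/\1/' | sort -u
}

# List model names visible to the current API key (GET /v1beta/models).
list_gemini_models() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "list_gemini_models: GEMINI_API_KEY is not set" >&2
    return 1
  fi
  curl -fsS -H "x-goog-api-key: ${GEMINI_API_KEY}" \
    "https://generativelanguage.googleapis.com/v1beta/models?pageSize=1000" \
    | extract_model_names
}

# list_gemini_models > models-now.txt && diff models-last.txt models-now.txt
```

A non-empty diff is the signal to revisit the model selection section.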