flynn/docs/runbooks/GEMINI_CLI_SUBAGENT.md
2026-02-22 16:53:56 -08:00


Gemini CLI Subagent Runbook (Flynn)

This runbook defines how Flynn should use the local gemini CLI as a subagent (external model call) and how to digest/merge its output safely.

Goals

  • Use Gemini as a delegated helper for specific tasks (retrieval, parsing, drafting), while Flynn remains responsible for:
    • choosing the right model/output mode
    • validating results against local evidence when possible
    • producing the final answer/patch/plan
  • Keep Gemini outputs auditable without spamming the operator.

Safety + Trust Model

  • Treat Gemini output as untrusted.
  • Prefer local verification when feasible (grep, tests, PDF tooling output, etc.).
  • Never claim system state changes or file edits based solely on Gemini output.
  • If Gemini output contradicts local evidence, local evidence wins.

Default Model Selection

Use the smallest/cheapest model that reliably accomplishes the task.

Document search & retrieval (query expansion, relevance judging)

  • Default: models/gemini-2.5-flash
  • Upgrade to: models/gemini-2.5-pro for subtle/high-stakes domains

Document parsing (structure → JSON, tables, policies)

  • Default: models/gemini-2.5-pro
  • Downgrade to: models/gemini-2.5-flash for simple extraction

Embeddings (vector index)

  • models/gemini-embedding-001

Image understanding

  • Default: models/gemini-2.5-flash
  • If the CLI/workflow requires an explicit image variant: models/gemini-2.5-flash-image

Image generation (lightweight)

  • Default: models/imagen-4.0-fast-generate-001
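The selection rules above can be kept in one lookup table so the choice is explicit and easy to update. This is a minimal sketch: the task labels (`retrieval`, `parsing`, and so on) and the `choose_model` helper are hypothetical names, and only the model IDs come from this runbook.

```python
# Task labels are hypothetical; model IDs are copied from this runbook.
MODEL_TABLE = {
    "retrieval": {
        "default": "models/gemini-2.5-flash",
        "alternate": "models/gemini-2.5-pro",  # subtle/high-stakes domains
    },
    "parsing": {
        "default": "models/gemini-2.5-pro",
        "alternate": "models/gemini-2.5-flash",  # simple extraction
    },
    "embedding": {"default": "models/gemini-embedding-001"},
    "image_understanding": {
        "default": "models/gemini-2.5-flash",
        "alternate": "models/gemini-2.5-flash-image",  # explicit image variant
    },
    "image_generation": {"default": "models/imagen-4.0-fast-generate-001"},
}


def choose_model(task, use_alternate=False):
    """Return the model ID for a task, falling back to the default
    when the task has no alternate entry."""
    entry = MODEL_TABLE[task]
    if use_alternate:
        return entry.get("alternate", entry["default"])
    return entry["default"]
```

An unknown task label raises `KeyError`, which is deliberate here: a silent fallback would hide a routing mistake.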

Output Mode (-o)

Default: -o json

Use for:

  • any workflow that will be parsed (jq, Python)
  • extraction tasks (schemas, tables, lists)
  • runs where we want a single stable artifact

Use: -o stream-json

Use only when:

  • generation is long and we want incremental progress
  • we have a streaming consumer (don't assume jq can parse the whole stream as one document)
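The difference between the two modes matters mostly on the consumer side. The sketch below assumes that `-o json` yields one JSON document on stdout and that `-o stream-json` yields one JSON object per line; the second assumption should be checked against the installed CLI version. Both function names are hypothetical.

```python
import json


def parse_json_output(stdout):
    """Parse a run that produced a single JSON document."""
    return json.loads(stdout)


def iter_stream_events(lines):
    """Yield one parsed object per non-blank line.

    Assumes line-delimited JSON; `lines` can be any iterable of
    strings, such as a pipe read line by line.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield json.loads(line)
```

Because `iter_stream_events` is a generator, a caller can act on each event as it arrives instead of waiting for the run to finish.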

Prompt Construction

  • Put the task first.
  • Specify required output format explicitly.
  • Include constraints (e.g., “Return valid JSON only, no prose”).
  • Include context verbatim, clearly delimited.
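One way to apply these four rules consistently is a small prompt builder. This is only a sketch: the section labels and the `<<<CONTEXT` / `CONTEXT>>>` markers are arbitrary choices for illustration, not a format Gemini requires.

```python
def build_prompt(task, output_format, constraints, context):
    """Assemble a prompt: task first, then format, constraints,
    and the context copied verbatim between explicit markers."""
    parts = [
        "TASK: " + task,
        "OUTPUT FORMAT: " + output_format,
    ]
    if constraints:
        parts.append("CONSTRAINTS:\n" + "\n".join("- " + c for c in constraints))
    # The context is inserted unchanged; the markers only delimit it.
    parts.append(
        "CONTEXT (verbatim, between markers):\n<<<CONTEXT\n"
        + context
        + "\nCONTEXT>>>"
    )
    return "\n\n".join(parts)
```

For example, `build_prompt("Extract the refund policy", "JSON object", ["Return valid JSON only, no prose"], document_text)` produces a prompt whose first line is the task.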

Shell escaping

For multi-line prompts or untrusted content, prefer a heredoc wrapper to avoid shell escaping issues:

gemini -m models/gemini-2.5-pro -o json -p "$(cat <<'PROMPT'
...prompt...
PROMPT
)"
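When the call is made from Python rather than an interactive shell, passing the arguments as a list avoids shell quoting entirely. The sketch below uses only the `-m`, `-o`, and `-p` flags shown in this runbook; the helper names and the 300-second timeout are assumptions.

```python
import subprocess


def build_gemini_argv(model, output_mode, prompt):
    """Return the argument list for one gemini call.

    No shell is involved, so quotes, backticks, and $(...) inside
    the prompt reach the CLI as literal text.
    """
    return ["gemini", "-m", model, "-o", output_mode, "-p", prompt]


def run_gemini(model, output_mode, prompt, timeout_s=300):
    """Run the CLI and capture stdout/stderr as text.

    The timeout value is an arbitrary default for this sketch.
    """
    return subprocess.run(
        build_gemini_argv(model, output_mode, prompt),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # the caller inspects returncode and stderr
    )
```

`run_gemini(...)` returns a `CompletedProcess`, so `returncode`, `stdout`, and `stderr` are all available for the digest step.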

Execution Pattern (Flynn)

  1. Choose model + -o mode.
  2. Run gemini ... via shell.
  3. Capture stdout/stderr.
  4. Digest output:
    • extract key claims
    • check for missing fields / invalid JSON
    • look for hallucination risks (citations? file paths? commands?)
  5. Verify locally when possible.
  6. Produce final response / patch.
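Steps 4 and 5 can be sketched as a single digest function. It assumes stdout already holds the JSON payload Flynn asked for; if the CLI wraps the model text in an envelope object, unwrap that first. The field names passed in are hypothetical and depend on the prompt.

```python
import json
import os


def digest(stdout, required_fields, path_fields=()):
    """Return (data, issues) for one Gemini run.

    issues lists invalid JSON, missing fields, and file paths
    claimed by Gemini that do not exist locally.
    """
    try:
        data = json.loads(stdout)
    except json.JSONDecodeError as exc:
        return None, ["invalid JSON: " + exc.msg]
    if not isinstance(data, dict):
        return data, ["top-level value is not an object"]

    issues = []
    for field in required_fields:
        if field not in data:
            issues.append("missing field: " + field)
    # Local evidence wins: a path Gemini names must exist on disk.
    for field in path_fields:
        value = data.get(field)
        if isinstance(value, str) and not os.path.exists(value):
            issues.append("path does not exist locally: " + value)
    return data, issues
```

An empty `issues` list does not make the output trusted; it only means the cheap structural checks passed.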

How Flynn Reports Gemini Usage

Flynn should incorporate Gemini results selectively:

  • Default: provide a brief digest of what Gemini contributed.
  • Include raw Gemini output when:
    • debugging is needed (JSON parse errors, contradictions)
    • the operator asks for it
    • provenance/audit is important

Suggested response block when Gemini was used:

  • Gemini subagent: model + output mode
  • Digest: 3-6 bullet summary of what mattered
  • Raw: omitted unless requested
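A small formatter keeps this block consistent across responses. The function name and exact layout are assumptions for illustration; only the three labels come from the suggested block above.

```python
def format_gemini_report(model, output_mode, digest_bullets, raw=None):
    """Render the usage block; raw output is included only when given."""
    lines = [
        "Gemini subagent: {}, -o {}".format(model, output_mode),
        "Digest:",
    ]
    lines.extend("  - " + bullet for bullet in digest_bullets)
    if raw is None:
        lines.append("Raw: omitted unless requested")
    else:
        lines.append("Raw:\n" + raw)
    return "\n".join(lines)
```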

Common Recipes

Query expansion (retrieval)

Model: models/gemini-2.5-flash

Ask for:

  • 5-15 search queries
  • key entities/synonyms
  • include/exclude terms

Parsing a document into JSON

Model: models/gemini-2.5-pro, -o json

Ask for:

  • strict JSON schema
  • explicit field types
  • “unknown”/null handling
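Asking for a strict schema is only half the job; the result still needs a local check. The sketch below validates field presence, type, and null handling against a hand-written schema. The schema shape (`name -> (type, nullable)`) and the example field names are hypothetical, not part of the Gemini CLI.

```python
def validate_record(record, schema):
    """Check a parsed record against {field: (expected_type, nullable)}.

    Returns a list of error strings; an empty list means the record
    matches the schema.
    """
    errors = []
    for name, (expected_type, nullable) in schema.items():
        if name not in record:
            errors.append(name + ": missing")
            continue
        value = record[name]
        if value is None:
            if not nullable:
                errors.append(name + ": null not allowed")
            continue
        # bool is a subclass of int in Python, so reject it explicitly
        # unless the schema asks for bool.
        wrong_type = not isinstance(value, expected_type) or (
            isinstance(value, bool) and expected_type is not bool
        )
        if wrong_type:
            errors.append(
                "{}: expected {}, got {}".format(
                    name, expected_type.__name__, type(value).__name__
                )
            )
    return errors
```

With `schema = {"title": (str, False), "pages": (int, True)}`, a record such as `{"title": "Policy", "pages": None}` passes, because `pages` is declared nullable.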

PDF workflows

Gemini is for interpretation/planning; execution happens locally with tools like:

  • qpdf, pdftk, pdfcpu, ocrmypdf, pikepdf, mutool, pdftotext (poppler)
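Before planning a PDF workflow around these tools, it helps to know which ones are actually installed. A minimal sketch, assuming the command-line tools are looked up on `PATH` and that `pikepdf`, being a Python library, is detected by import lookup instead:

```python
import importlib.util
import shutil

# Command-line tools named in this runbook.
CLI_TOOLS = ["qpdf", "pdftk", "pdfcpu", "ocrmypdf", "mutool", "pdftotext"]


def available_pdf_tools():
    """Map each tool name to True/False depending on local availability."""
    found = {name: shutil.which(name) is not None for name in CLI_TOOLS}
    # pikepdf is a Python package, so check for an importable module.
    found["pikepdf"] = importlib.util.find_spec("pikepdf") is not None
    return found
```

The result can be included in the prompt context so Gemini plans only with tools that exist locally.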

Troubleshooting

  • If the CLI errors:
    • capture stderr
    • retry with smaller prompt / less context
    • switch model (flash ↔ pro)
  • If JSON is invalid:
    • rerun asking for valid JSON only
    • or request the JSON schema and the data as two separate outputs
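The CLI-error branch above can be sketched as a retry loop: try the full prompt, then a smaller one, then move to the next model. The `run` callable and its `(returncode, stdout, stderr)` return shape are assumptions made so the logic can be exercised without the real CLI; `shrink` is a hypothetical caller-supplied function that trims the prompt.

```python
def run_with_fallback(run, prompt, models, shrink=None):
    """Try each model in order, retrying once with a smaller prompt.

    run(model, prompt) must return (returncode, stdout, stderr).
    Returns (model, stdout, errors); model and stdout are None when
    every attempt failed. errors collects (model, stderr) pairs.
    """
    errors = []
    for model in models:
        attempts = [prompt]
        if shrink is not None:
            attempts.append(shrink(prompt))
        for candidate in attempts:
            code, out, err = run(model, candidate)
            if code == 0:
                return model, out, errors
            errors.append((model, err))  # keep stderr for the report
    return None, None, errors
```

The collected `errors` list preserves stderr from every failed attempt, which is what the operator needs when debugging.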

Update Policy

This runbook should evolve.

When new model variants appear in GET /v1beta/models, update the model selection section. When Flynn gains a first-class Gemini provider/router integration, align this runbook with the native provider behavior.