The model family
| Model | Context | Strength | When to use |
|---|---|---|---|
| gemini-2.0-flash | 1M tokens | Fast, multimodal, cheap, native tool use | Default for most tasks |
| gemini-1.5-pro | 2M tokens | Top reasoning quality, huge context | Complex analysis, long docs/video |
| gemini-1.5-flash | 1M tokens | Cheaper and faster than pro, good quality | High-throughput simple tasks |
| gemini-1.5-flash-8b | 1M tokens | Cheapest, classification/extraction | Bulk simple operations |
Gemini Ultra was announced earlier, but the public API lineup centers on 2.0 Flash and 1.5 Pro/Flash, so don't write code that depends on an "Ultra" model being available.
Context up to 2M tokens
1.5 Pro accepts up to 2 million tokens. That is roughly 2 hours of video, ~19 hours of audio, or about 1.4 million words of text (a token is shorter than a word on average). This changes the architecture: it is sometimes cheaper and simpler to put the whole document into context than to build a RAG pipeline.
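A minimal sketch of the "does it fit in context?" check, runnable offline. The limits are the rounded figures from the table above. The 4-characters-per-token ratio, the `reserve` parameter, and the helper names `estimate_tokens` and `fits_in_context` are assumptions made for illustration; for real numbers, use the SDK's token counting rather than this heuristic.

```python
# Context limits in tokens, using the rounded figures from the table above.
CONTEXT_LIMITS = {
    "gemini-2.0-flash": 1_000_000,
    "gemini-1.5-pro": 2_000_000,
    "gemini-1.5-flash": 1_000_000,
    "gemini-1.5-flash-8b": 1_000_000,
}

# ASSUMPTION: rough average for English text, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough offline token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def fits_in_context(text: str, model: str, reserve: int = 8_192) -> bool:
    """True if the estimated prompt plus a reserve for the reply fits.

    `reserve` is a hypothetical safety margin for instructions and output.
    """
    return estimate_tokens(text) + reserve <= CONTEXT_LIMITS[model]
```

If the check fails for every model, that is a signal to chunk the input or fall back to retrieval rather than a single large prompt.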
Cost is computed separately for input and output tokens, and for 1.5 Pro the rate increases beyond 128k tokens of context. Before choosing a model, estimate average input size, request frequency, and whether you need the 2M context.
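The estimate described above can be sketched as a small calculator. Everything here is hypothetical: the `Rates` class, both function names, and especially the prices, which are placeholders and not real Gemini pricing. It also assumes that once the prompt exceeds the threshold, the whole request is billed at the higher rate; check the current pricing page before relying on that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens. Values passed in are PLACEHOLDERS, not real prices."""
    input_short: float
    output_short: float
    input_long: float
    output_long: float
    long_threshold: int = 128_000  # tier boundary mentioned in the text


def request_cost(rates: Rates, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, with input and output billed separately."""
    # ASSUMPTION: a prompt above the threshold moves the whole request
    # to the higher tier.
    long_prompt = input_tokens > rates.long_threshold
    in_rate = rates.input_long if long_prompt else rates.input_short
    out_rate = rates.output_long if long_prompt else rates.output_short
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def monthly_cost(rates: Rates, avg_input: int, avg_output: int,
                 requests_per_day: int, days: int = 30) -> float:
    """Rough monthly spend from average request size and frequency."""
    return request_cost(rates, avg_input, avg_output) * requests_per_day * days
```

Running this with your own average input size and request frequency shows quickly whether crossing the 128k boundary dominates the bill.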
Selection heuristic
Need cheap and fast, typical task -> gemini-2.0-flash
Deep analysis / >1M context -> gemini-1.5-pro
Bulk classification/extraction -> gemini-1.5-flash-8b
Need Live API (realtime voice) -> gemini-2.0-flash (live)
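The heuristic above can be written as a function, as a sketch only. The name `pick_model`, its parameters, and the order in which the rules are checked are assumptions; the source lists the rules without saying which wins when several apply.

```python
def pick_model(*, needs_live: bool = False, context_tokens: int = 0,
               deep_analysis: bool = False, bulk_simple: bool = False) -> str:
    """Map task requirements to a model name from the table.

    ASSUMPTION: rules are checked in this priority order.
    """
    if needs_live:
        # Realtime voice goes through the Live API on 2.0 Flash.
        return "gemini-2.0-flash"
    if deep_analysis or context_tokens > 1_000_000:
        # Only 1.5 Pro offers more than 1M tokens of context.
        return "gemini-1.5-pro"
    if bulk_simple:
        # Cheapest option for classification and extraction at volume.
        return "gemini-1.5-flash-8b"
    # Cheap and fast default for typical tasks.
    return "gemini-2.0-flash"
```

For example, `pick_model(context_tokens=1_500_000)` returns `"gemini-1.5-pro"`, and calling it with no arguments returns the default `"gemini-2.0-flash"`.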
List models programmatically
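```python
import google.generativeai as genai  # assumes genai.configure(api_key=...) was already called

# Print every model that supports generateContent, with its token limits.
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name, m.input_token_limit, m.output_token_limit)
```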
Don't hardcode "the latest" model string blindly: aliases like gemini-2.0-flash point to the current stable version, while -001/-exp suffixes pin a specific snapshot for reproducibility.
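One way to enforce this in code is a small guard that rejects moving aliases where reproducibility matters. This is a sketch under the naming convention described above (a trailing `-001`-style number or `-exp`). The regex and the helper names `is_pinned` and `require_pinned` are assumptions, and real model names may follow other patterns.

```python
import re

# ASSUMPTION: a pinned snapshot ends in a three-digit version or "-exp",
# following the convention described in the text.
_PINNED_SUFFIX = re.compile(r"-(\d{3}|exp)$")


def is_pinned(model_name: str) -> bool:
    """True if the name looks like a specific snapshot rather than an alias."""
    return bool(_PINNED_SUFFIX.search(model_name))


def require_pinned(model_name: str) -> str:
    """Return the name unchanged, or raise if it is a moving alias."""
    if not is_pinned(model_name):
        raise ValueError(f"{model_name!r} is an alias; pin a snapshot such as -001")
    return model_name
```

A guard like this fits evaluation or regression-test configs, while interactive or prototype code can keep using the alias.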