Choosing a model in 2025

gpt-4o, gpt-4o-mini, o1/o3 (reasoning), o4-mini — when to use which. Real cost table and selection criteria.


Practical exercise

What to do after this lesson

Take one of your tasks (e.g. ticket classification). Run 20 examples on gpt-4o-mini and on gpt-4o. Compare accuracy and total usage. Decide whether gpt-4o justifies the price difference.
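One way to make this comparison concrete is to record, for each of the 20 examples, the predicted label, the expected label, and the token counts from the response's `usage` object, then summarize each model's batch. The sketch below assumes exact-match grading and the list prices from this lesson; `GradedRun`, `summarize`, and `costPerExtraCorrect` are hypothetical helper names, not part of any SDK.

```typescript
// One graded example: the model's label, the expected label, and token usage
// copied from the API response (usage.prompt_tokens / usage.completion_tokens).
interface GradedRun {
  predicted: string;
  expected: string;
  promptTokens: number;
  completionTokens: number;
}

// Price per 1M tokens in USD (illustrative values; verify before relying on them).
interface Price {
  inputPerM: number;
  outputPerM: number;
}

interface Summary {
  accuracy: number; // share of exact label matches, 0..1
  costUsd: number; // total cost of the batch
}

// Accuracy and total cost for one model's batch of runs.
function summarize(runs: GradedRun[], price: Price): Summary {
  if (runs.length === 0) return { accuracy: 0, costUsd: 0 };
  const correct = runs.filter((r) => r.predicted === r.expected).length;
  const inputTokens = runs.reduce((sum, r) => sum + r.promptTokens, 0);
  const outputTokens = runs.reduce((sum, r) => sum + r.completionTokens, 0);
  const costUsd =
    (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1e6;
  return { accuracy: correct / runs.length, costUsd };
}

// Extra dollars paid per additional correct answer when moving from the cheap
// model to the strong one. Returns Infinity if the strong model is not more accurate.
function costPerExtraCorrect(cheap: Summary, strong: Summary, n: number): number {
  const extraCorrect = (strong.accuracy - cheap.accuracy) * n;
  return extraCorrect > 0
    ? (strong.costUsd - cheap.costUsd) / extraCorrect
    : Infinity;
}
```

If the stronger model fixes only one or two of the 20 examples, the cost per extra correct answer shows directly what that improvement costs at your volume.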

Ready-to-use prompt

Template for this lesson

Copy and adapt to your context. Text in angle brackets should be replaced.

My task: <description: classification / extraction / generation / reasoning>.
Volume: <requests/day>, avg input <N> tokens, output <M>.
Budget: <$/month>.

Tell me which OpenAI model to start with (gpt-4o-mini / gpt-4o /
o4-mini / o1), which metrics to measure, and what signal warrants
moving to a higher model.

Model lineup 2025

| Model | Class | Strong at | Speed | Price (in / out, $/1M tokens) |
|---|---|---|---|---|
| gpt-4o | multimodal flagship | text+images+audio, fast general intelligence | fast | ~$2.50 / $10.00 |
| gpt-4o-mini | cheap default | classification, simple tasks, high volume | very fast | ~$0.15 / $0.60 |
| o1 | reasoning | hard math, logic, multi-step planning | slow | ~$15 / $60 |
| o3 | top reasoning | the heaviest tasks, agentic chains | slow | ~$10 / $40 (tier-dependent) |
| o4-mini | cheap reasoning | reasoning on a tight budget | medium | ~$1.10 / $4.40 |

Verify exact prices at openai.com/api/pricing, since they change. The orders of magnitude are stable: at the prices in the table, gpt-4o-mini is roughly 7× cheaper than o4-mini, 17× cheaper than gpt-4o, and 100× cheaper than o1.

How to choose

gpt-4o-mini is the default. Start with it for classification, data extraction, short answers, and routing. As a rough rule of thumb, it can cover about 80% of production load.

gpt-4o when you need a "smart generalist": quality text generation, vision (image analysis), multimodality, complex instructions.

o-models (o1/o3/o4-mini) only when the task truly requires reasoning: proofs, multi-step planning, complex code with edge cases, puzzles. They are slow (seconds to tens of seconds) and expensive due to hidden reasoning tokens.

o4-mini is the compromise: reasoning but cheaper than o1/o3. A good pick for agentic pipelines that need to "think" on a limited budget.
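The selection rules above can be written down as a small routing function. This is a minimal sketch, not an official mapping: `TaskProfile` and `pickStartingModel` are hypothetical names, the task categories are a simplification of the criteria in this section, and choosing o3 rather than o1 for heavy reasoning is an assumption.

```typescript
type TaskKind =
  | "classification"
  | "extraction"
  | "routing"
  | "generation"
  | "vision"
  | "reasoning";

interface TaskProfile {
  kind: TaskKind;
  tightBudget?: boolean; // only consulted for reasoning tasks
}

// Returns the model to START with; move up only when a quality metric demands it.
function pickStartingModel(task: TaskProfile): string {
  switch (task.kind) {
    case "classification":
    case "extraction":
    case "routing":
      return "gpt-4o-mini"; // cheap default for simple, high-volume work
    case "generation":
    case "vision":
      return "gpt-4o"; // generalist: quality text, images, complex instructions
    case "reasoning":
      // Assumption: o3 for the heaviest tasks; o1 would fill the same slot.
      return task.tightBudget ? "o4-mini" : "o3";
    default:
      return "gpt-4o-mini"; // unknown task kinds fall back to the default
  }
}
```

For example, `pickStartingModel({ kind: "reasoning", tightBudget: true })` returns `"o4-mini"`.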

Reasoning models: API differences

They don't support temperature, top_p, presence_penalty (ignored or rejected).

They use max_completion_tokens, not max_tokens.

They spend invisible reasoning tokens — counted in usage.completion_tokens_details.reasoning_tokens and billed.

There's a reasoning_effort parameter: "low" | "medium" | "high" — a tradeoff between speed/cost and depth.

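```typescript
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment by default.
const client = new OpenAI();

const r = await client.chat.completions.create({
  model: "o4-mini",
  reasoning_effort: "medium", // "low" | "medium" | "high"
  max_completion_tokens: 4000, // used instead of max_tokens; covers reasoning + visible output
  messages: [{ role: "user", content: "Prove that √2 is irrational." }],
});

// Reasoning tokens are billed even though they never appear in the response text.
console.log(r.usage?.completion_tokens_details?.reasoning_tokens);
```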
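If the same code path serves both regular and reasoning models, the parameter differences listed above can be isolated in one helper. The sketch below is an illustration under assumptions: `ChatOptions`, `isReasoningModel`, and `buildChatParams` are hypothetical names, and detecting reasoning models by an ID that starts with "o" plus a digit is a naming heuristic, not something the API guarantees.

```typescript
interface ChatOptions {
  model: string;
  prompt: string;
  maxOutputTokens: number;
  temperature?: number; // dropped for reasoning models
  reasoningEffort?: "low" | "medium" | "high"; // used only for reasoning models
}

// Assumption: IDs such as "o1", "o3", "o4-mini" denote reasoning models.
function isReasoningModel(model: string): boolean {
  return /^o\d/.test(model);
}

// Builds a request body for chat.completions.create with the right token-limit
// parameter and without sampling parameters that reasoning models do not support.
function buildChatParams(opts: ChatOptions): Record<string, unknown> {
  const base = {
    model: opts.model,
    messages: [{ role: "user", content: opts.prompt }],
  };
  if (isReasoningModel(opts.model)) {
    return {
      ...base,
      max_completion_tokens: opts.maxOutputTokens,
      reasoning_effort: opts.reasoningEffort ?? "medium",
    };
  }
  return {
    ...base,
    max_tokens: opts.maxOutputTokens,
    ...(opts.temperature !== undefined ? { temperature: opts.temperature } : {}),
  };
}
```

Keeping this logic in one place means switching a task from gpt-4o-mini to o4-mini is a one-line change to the `model` field rather than an edit to every call site.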

Practical cost rule of thumb

100k requests at ~500 input + 200 output tokens:

gpt-4o-mini: ≈ (50M × $0.15 + 20M × $0.60) / 1M ≈ $19.5

gpt-4o: ≈ (50M × $2.50 + 20M × $10) / 1M ≈ $325

o1: ≈ (50M × $15 + 20M × $60) / 1M ≈ $1950 (plus reasoning tokens on top)

A 100× difference — that's why you always start with mini and raise the model only where a quality metric demands it.
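The arithmetic above generalizes to a small estimator you can rerun with your own volumes. This is a sketch with the approximate prices from this lesson hard-coded; `ModelPrice`, `PRICES`, and `estimateCostUsd` are hypothetical names, and the optional reasoning-token argument reflects that reasoning tokens are billed at the output rate.

```typescript
// Approximate USD prices per 1M tokens, copied from this lesson's table.
// Assumption: these are illustrative; check the official pricing page before use.
interface ModelPrice {
  inputPerM: number;
  outputPerM: number;
}

const PRICES: Record<string, ModelPrice> = {
  "gpt-4o-mini": { inputPerM: 0.15, outputPerM: 0.6 },
  "gpt-4o": { inputPerM: 2.5, outputPerM: 10 },
  "o1": { inputPerM: 15, outputPerM: 60 },
  "o4-mini": { inputPerM: 1.1, outputPerM: 4.4 },
};

// Estimated cost of `requests` calls with the given average token counts.
// Reasoning tokens are billed as output tokens, so they are added to the output side.
function estimateCostUsd(
  model: string,
  requests: number,
  inputTokensPerRequest: number,
  outputTokensPerRequest: number,
  reasoningTokensPerRequest: number = 0,
): number {
  const price: ModelPrice | undefined = PRICES[model];
  if (!price) throw new Error(`No price configured for model: ${model}`);
  const inputTokens = requests * inputTokensPerRequest;
  const outputTokens =
    requests * (outputTokensPerRequest + reasoningTokensPerRequest);
  return (inputTokens * price.inputPerM + outputTokens * price.outputPerM) / 1e6;
}
```

Calling `estimateCostUsd("gpt-4o-mini", 100_000, 500, 200)` reproduces the ≈ $19.5 figure above; passing a nonzero `reasoningTokensPerRequest` for o1 shows how quickly hidden reasoning tokens push the total past $1950.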
