Choosing a model in 2025
gpt-4o, gpt-4o-mini, o1/o3 (reasoning), o4-mini — when to use which. Real cost table and selection criteria.
Take one of your own tasks (e.g. ticket classification). Run the same 20 labeled examples on gpt-4o-mini and on gpt-4o. Compare accuracy and total token usage, then decide whether gpt-4o's accuracy gain justifies the price difference.
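One way to score the exercise, as a minimal sketch: it assumes you have already collected each model's answer and token counts per example (the Chat Completions response reports them under `usage.prompt_tokens` and `usage.completion_tokens`). The `RunResult` record, the function names, and the price parameters are hypothetical, introduced only for illustration; no prices are hard-coded, so pass in the current per-1M-token rates from the pricing page.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One model's output for one labeled example (hypothetical record shape)."""
    predicted: str
    expected: str
    input_tokens: int
    output_tokens: int


def summarize(results, price_in_per_1m, price_out_per_1m):
    """Return accuracy, total tokens, and cost in dollars for one model's run.

    Prices are dollars per 1M tokens and must be supplied by the caller.
    """
    if not results:
        raise ValueError("results must not be empty")
    # Case- and whitespace-insensitive exact match; adjust for your task.
    correct = sum(
        r.predicted.strip().lower() == r.expected.strip().lower() for r in results
    )
    tokens_in = sum(r.input_tokens for r in results)
    tokens_out = sum(r.output_tokens for r in results)
    cost = (tokens_in * price_in_per_1m + tokens_out * price_out_per_1m) / 1_000_000
    return {
        "accuracy": correct / len(results),
        "input_tokens": tokens_in,
        "output_tokens": tokens_out,
        "cost_usd": cost,
    }


def cost_per_extra_correct(cheap, strong, n_examples):
    """Extra dollars paid per additional correct answer with the stronger model.

    Returns None when the stronger model is not more accurate.
    """
    extra_correct = (strong["accuracy"] - cheap["accuracy"]) * n_examples
    if extra_correct <= 0:
        return None
    return (strong["cost_usd"] - cheap["cost_usd"]) / extra_correct
```

Call `summarize` once per model on the same 20 examples, then pass both summaries to `cost_per_extra_correct`. A result of `None` means the stronger model bought no extra accuracy on this sample. With only 20 examples, a difference of one or two correct answers may well be noise, so treat the number as a first signal rather than a verdict.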
Copy and adapt to your context. Replace the text in angle brackets with your own values.
My task: <description: classification / extraction / generation / reasoning>. Volume: <requests/day>, avg input <N> tokens, avg output <M> tokens. Budget: <$/month>. Tell me which OpenAI model to start with (gpt-4o-mini / gpt-4o / o4-mini / o1), which metrics to measure, and what signal warrants moving to a more capable model.
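Before sending the prompt, it may help to check the numbers yourself. The sketch below turns the same placeholders (requests/day, average input and output tokens, budget) into a monthly cost estimate. The function names and the 30-day month are assumptions made for illustration, and the per-1M-token prices are parameters you fill in from the current pricing page, not values taken from this text.

```python
def monthly_cost_usd(requests_per_day, avg_input_tokens, avg_output_tokens,
                     price_in_per_1m, price_out_per_1m, days=30):
    """Estimate monthly spend in dollars from volume and per-1M-token prices."""
    per_request = (
        avg_input_tokens * price_in_per_1m
        + avg_output_tokens * price_out_per_1m
    ) / 1_000_000
    return requests_per_day * days * per_request


def fits_budget(monthly_cost, budget_usd):
    """True when the estimated monthly cost does not exceed the budget."""
    return monthly_cost <= budget_usd


# Example with placeholder prices of $1 (input) and $2 (output) per 1M tokens:
# 1000 requests/day, 500 input and 100 output tokens per request.
estimate = monthly_cost_usd(1000, 500, 100, 1.0, 2.0)
print(f"estimated monthly cost: ${estimate:.2f}, within $50 budget: {fits_budget(estimate, 50)}")
```

Run the estimate once per candidate model with that model's prices. For reasoning models, measure the average output tokens from real responses rather than guessing from the visible answer length, since reasoning tokens are billed as output.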