
Cheap, previous-gen "mini" models are wildly underrated
For many workloads they feel close enough to today's top models that the cost gap becomes the dominant variable, not the quality gap.
How Much Cheaper Are Minis?
Here's what "x times cheaper" actually looks like at the token level for previous-gen minis versus current-gen flagships. Prices are per 1M tokens.[1][2][3][4]
OpenAI: GPT-4o-mini vs GPT-5
| Metric | GPT-4o-mini | GPT-5 | Savings factor |
|---|---|---|---|
| Input (/1M) | $0.075 [1] | $0.625 [1] | 8.3x cheaper |
| Output (/1M) | $0.30 [1] | $5.00 [1] | 16.7x cheaper |
A "normal" job with 700k input and 300k output tokens costs about $0.14 on GPT-4o-mini versus $1.94 on GPT-5, close to a 14x difference for something that often feels similar for drafting, summarisation, and light coding.[1]
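The per-job arithmetic is easy to reproduce from the table. A minimal sketch using the per-1M-token prices listed above (the 700k/300k split is this article's example workload, not a standard):

```python
# Per-1M-token prices (USD) from the table above.
PRICES = {
    "gpt-4o-mini": {"input": 0.075, "output": 0.30},
    "gpt-5":       {"input": 0.625, "output": 5.00},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one job, given token counts and per-1M prices."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

mini = job_cost("gpt-4o-mini", 700_000, 300_000)   # -> 0.1425
flagship = job_cost("gpt-5", 700_000, 300_000)     # -> 1.9375
print(f"${mini:.2f} vs ${flagship:.2f} ({flagship / mini:.1f}x)")  # $0.14 vs $1.94 (13.6x)
```

Swap in any provider's input/output prices to rerun the comparison for a different model pair or token mix.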
Anthropic: Claude 3.5 Haiku vs Claude 4 Opus
| Metric | Claude 3.5 Haiku | Claude 4 Opus | Savings factor |
|---|---|---|---|
| Input (/1M) | $0.25 [2] | $3.00 [3] | 12x cheaper |
| Output (/1M) | $1.25 [2] | $15.00 [3] | 12x cheaper |
Run the same 700k/300k job and you're looking at roughly $0.55 on Haiku versus $6.60 on Opus, again about 12x cheaper while still being "good enough" for a lot of production tasks.[2][3]
Google: Gemini 1.5 Flash vs Gemini 2.5 Pro
| Metric | Gemini 1.5 Flash | Gemini 2.5 Pro | Savings factor |
|---|---|---|---|
| Input (/1M) | $0.075 [4] | $1.25 [4] | 16.7x cheaper |
| Output (/1M) | $0.30 [4] | $3.75 [4] | 12.5x cheaper |
Here the same job comes out around $0.14 on 1.5 Flash versus $2.00 on 2.5 Pro, about 14x cheaper, a gap that holds even for output-heavy workloads like report generation or multi-turn agents.[4]
Why This Matters In Practice
Once you look at the multipliers, a pattern emerges: previous-gen minis are often 8-17x cheaper per token while still being perfectly viable as default "daily driver" models. For internal tools, batch jobs, RAG systems, and agents, that price curve means you can either cut your bill by an order of magnitude or run 10x more experiments for the same budget.[1][2][3][4]
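The "10x more experiments" framing can be made concrete by flipping the cost function around: given a fixed budget, how many of the example 700k/300k jobs does each model buy? A sketch, assuming an illustrative $100 budget (the prices are from the tables above; the budget figure is not from the article):

```python
# Per-1M-token (input, output) prices in USD, from the tables above.
PRICES = {
    "gpt-4o-mini":      (0.075, 0.30),
    "gpt-5":            (0.625, 5.00),
    "claude-3.5-haiku": (0.25,  1.25),
    "claude-4-opus":    (3.00,  15.00),
    "gemini-1.5-flash": (0.075, 0.30),
    "gemini-2.5-pro":   (1.25,  3.75),
}

def jobs_per_budget(model: str, budget_usd: float,
                    input_tokens: int = 700_000, output_tokens: int = 300_000) -> int:
    """Whole number of example jobs a budget buys on a given model."""
    in_price, out_price = PRICES[model]
    per_job = (input_tokens / 1e6) * in_price + (output_tokens / 1e6) * out_price
    return int(budget_usd // per_job)

for model in PRICES:
    print(f"{model:18s} {jobs_per_budget(model, 100.0):4d} jobs per $100")
```

On these numbers, $100 buys roughly 700 jobs on the minis versus 15-50 on the flagships, which is the order-of-magnitude experiment budget the paragraph above is pointing at.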
The top models (GPT-5, Claude Opus, Gemini Pro) are still worth it for the genuinely hard stuff: complex reasoning, messy multi-step workflows, high-stakes outputs, or when latency and retries matter more than pure cost. But if you route everything through them by default, you are almost certainly burning money you do not need to spend.[5][6][7]
A Simple Routing Mental Model
A practical pattern that falls out of these numbers:
- Default to previous-gen mini (GPT-4o-mini, Claude 3.5 Haiku, Gemini 1.5 Flash) for 80-90% of traffic: chatbots, CRUD apps, summarisation, log analysis, boilerplate code, content drafts.[8][9][10]
- Escalate to current-gen top models only when:
  - The prompt is clearly "hard" (nested reasoning, long tool chains, high ambiguity).
  - The user or system flags low confidence and asks for a second opinion.
  - The domain is high-risk or high-value (legal, medical, financial, exec-visible outputs).
With that split, you keep the UX of frontier models where it matters, and let the cheap workhorses quietly do everything else for a tenth (or less) of the price.
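The split above can be sketched as a tiny router. The function name, difficulty score, and threshold here are all illustrative assumptions, not a real API; in practice "difficulty" might come from a heuristic, a classifier, or the mini model's own confidence signal:

```python
# Illustrative router; names, domains, and the 0.8 threshold are assumptions.
HIGH_RISK_DOMAINS = {"legal", "medical", "financial", "exec"}

def pick_model(prompt_difficulty: float, domain: str, low_confidence: bool) -> str:
    """Route to a flagship only when one of the escalation criteria fires."""
    if domain in HIGH_RISK_DOMAINS:
        return "flagship"            # high-risk or high-value output
    if low_confidence:
        return "flagship"            # second-opinion escalation
    if prompt_difficulty > 0.8:      # clearly "hard": nested reasoning, long tool chains
        return "flagship"
    return "previous-gen-mini"       # default for the 80-90% of ordinary traffic

print(pick_model(0.3, "support", False))  # previous-gen-mini
print(pick_model(0.3, "legal", False))    # flagship
```

The point of keeping the router this dumb is that every branch maps directly onto one of the bullets above; anything fancier (learned routers, cascades with automatic retry) is an optimisation on top of the same shape.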
References & Further Reading
- [1] OpenAI Pricing
- [2] Claude Haiku Cost & Capabilities
- [3] Anthropic: Claude Opus 4.5
- [4] Gemini API Pricing
- [5] Introducing GPT-5
- [6] LLMs: Bigger is Not Always Better
- [7] Claude Log: Model Comparison
- [8] GPT-4o-mini Evaluation Report
- [9] Claude 3.5 Haiku Comparison
- [10] Gemini 2.5 Flash vs 1.5 Pro