Pricing pages lie. Every AI company shows you the per-token rate, but none of them tell you what it actually costs to get useful work done. So I ran 100 real-world tasks across GPT-5, Claude Opus 4.7, and Gemini 2.5 Ultra and tracked the total cost per completed task.
The cheapest model was not the one with the lowest token price.
📧 Want more like this? Get our free The Ultimate AI Tool Database: 200+ Tools Rated & Ranked — Downloaded 5,000+ times
The Test Methodology
100 tasks split across 5 categories (20 each):
- Code generation: Build functional components from specifications
- Long document analysis: Extract insights from 20-50 page documents
- Creative writing: Marketing copy, blog posts, email sequences
- Data transformation: Parse, clean, and restructure datasets
- Multi-step reasoning: Complex problem-solving requiring chains of logic
I measured four things: total tokens consumed (including retries), time to completion, output accuracy (graded by a separate evaluator), and total API cost per successful task.
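The last metric is the one the rest of this post leans on. Here's a minimal sketch of how it can be computed, assuming every API call is logged as one attempt with its token counts. The `Attempt` record, its fields, and the $5/$15 per-million-token prices are hypothetical placeholders, not any provider's actual rates.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    task_id: str
    input_tokens: int
    output_tokens: int
    succeeded: bool  # did the evaluator accept this attempt's output?

def attempt_cost(a: Attempt, in_price: float, out_price: float) -> float:
    """Dollar cost of one API call; prices are USD per million tokens."""
    return (a.input_tokens * in_price + a.output_tokens * out_price) / 1_000_000

def cost_per_successful_task(attempts: list[Attempt], in_price: float, out_price: float) -> float:
    """All spend, retries and failures included, divided by tasks that eventually succeeded."""
    total = sum(attempt_cost(a, in_price, out_price) for a in attempts)
    solved = {a.task_id for a in attempts if a.succeeded}
    return total / len(solved) if solved else float("inf")

# Hypothetical log: task "t1" needed one retry, task "t2" passed first time.
log = [
    Attempt("t1", input_tokens=2_000, output_tokens=1_000, succeeded=False),
    Attempt("t1", input_tokens=2_500, output_tokens=1_000, succeeded=True),
    Attempt("t2", input_tokens=1_500, output_tokens=800, succeeded=True),
]
print(round(cost_per_successful_task(log, in_price=5.0, out_price=15.0), 4))  # 0.036
```

Dividing by solved tasks rather than by attempts means a model that burns tokens on retries, or on tasks it never finishes, pays for it in this number.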
The Results
Code Generation (20 tasks)
| Model | Success Rate | Avg Cost/Task | Avg Time |
|---|---|---|---|
| Claude Opus 4.7 | 95% | $0.18 | 45s |
| GPT-5 | 90% | $0.31 | 38s |
| Gemini 2.5 Ultra | 85% | $0.24 | 52s |
Claude led code generation on both accuracy and cost. Higher first-pass accuracy meant fewer retries, which kept its cost per task the lowest of the three ($0.18, versus $0.31 for GPT-5) despite a similar per-token rate. GPT-5 was faster but needed more correction cycles.
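Why can a higher success rate beat a lower price? If every attempt costs the same and passes independently with probability p, the number of attempts until success is geometric with mean 1/p, so the expected cost per solved task is the per-attempt cost divided by p. Here's a sketch under that simplifying assumption (real retries are neither independent nor equally priced), with hypothetical numbers rather than figures from this test:

```python
def expected_cost_per_success(cost_per_attempt: float, p_success: float) -> float:
    """Mean spend until the first accepted answer.

    Assumes attempts are independent and each passes with the same probability,
    so the number of attempts is geometric with mean 1 / p_success.
    """
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    return cost_per_attempt / p_success

# Hypothetical models: A is 20% cheaper per attempt but passes first time less often.
a = expected_cost_per_success(cost_per_attempt=0.08, p_success=0.50)
b = expected_cost_per_success(cost_per_attempt=0.10, p_success=0.90)
print(f"A: ${a:.3f}  B: ${b:.3f}")  # A: $0.160  B: $0.111
```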
Long Document Analysis (20 tasks)
| Model | Success Rate | Avg Cost/Task | Avg Time |
|---|---|---|---|
| Gemini 2.5 Ultra | 95% | $0.08 | 30s |
| Claude Opus 4.7 | 90% | $0.22 | 55s |
| GPT-5 | 85% | $0.28 | 42s |
Gemini crushed this category. The 2M-token context window meant zero chunking overhead, and Google priced input tokens aggressively. For pure document processing, Gemini was roughly 3x cheaper than the competition ($0.08 per task, versus $0.22 for Claude and $0.28 for GPT-5).
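To see where chunking overhead comes from, here's a rough sketch that counts billed input tokens, assuming fixed-size chunks with some overlap and an instruction prompt resent with every chunk. All sizes are hypothetical, and a real pipeline usually adds per-chunk outputs and a final merge call on top, so this understates the gap.

```python
import math

def chunked_input_tokens(doc_tokens: int, prompt_tokens: int,
                         chunk_size: int, overlap: int) -> int:
    """Input tokens billed when a document is split into overlapping chunks
    and the same instruction prompt is sent with every chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    if doc_tokens <= chunk_size:
        return doc_tokens + prompt_tokens  # fits in one call, no chunking
    step = chunk_size - overlap
    n_chunks = math.ceil((doc_tokens - overlap) / step)
    # Each chunk boundary re-sends `overlap` tokens; each chunk re-sends the prompt.
    return doc_tokens + (n_chunks - 1) * overlap + n_chunks * prompt_tokens

# Hypothetical: 40k-token document, 500-token prompt, 8k chunks with 500-token overlap.
single = chunked_input_tokens(40_000, 500, chunk_size=2_000_000, overlap=0)
chunked = chunked_input_tokens(40_000, 500, chunk_size=8_000, overlap=500)
print(single, chunked)  # 40500 45500
```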
Creative Writing (20 tasks)
| Model | Success Rate | Avg Cost/Task | Avg Time |
|---|---|---|---|
| Claude Opus 4.7 | 90% | $0.15 | 35s |
| GPT-5 | 90% | $0.19 | 28s |
| Gemini 2.5 Ultra | 80% | $0.12 | 40s |
Close race between Claude and GPT-5. Claude produced more natural, less “AI-sounding” copy. GPT-5 was faster. Gemini was cheapest but output quality was noticeably more generic.
Data Transformation (20 tasks)
| Model | Success Rate | Avg Cost/Task | Avg Time |
|---|---|---|---|
| GPT-5 | 95% | $0.14 | 25s |
| Claude Opus 4.7 | 90% | $0.16 | 32s |
| Gemini 2.5 Ultra | 90% | $0.10 | 35s |
GPT-5 edged ahead here. Structured output mode and function calling made data transformation tasks cleaner with fewer parsing errors.
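Every parse failure in a data pipeline is a retry you pay for. Here's a minimal, standard-library-only sketch of the check such a pipeline might run on each response; the schema fields are hypothetical. A provider's structured-output or function-calling mode doesn't replace this check; it just makes it fail less often.

```python
import json
from typing import Optional

# Hypothetical target schema for one transformed record.
REQUIRED_FIELDS = {"id": int, "name": str, "email": str}

def parse_record(raw: str) -> Optional[dict]:
    """Return the record if the model output is a JSON object with the expected
    field types; None means the output is unusable and the call must be retried."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None
    return data

good = '{"id": 7, "name": "Ada", "email": "ada@example.com"}'
chatty = 'Sure! Here is the record: {"id": 7}'
print(parse_record(good) is not None, parse_record(chatty) is None)  # True True
```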
Multi-Step Reasoning (20 tasks)
| Model | Success Rate | Avg Cost/Task | Avg Time |
|---|---|---|---|
| Claude Opus 4.7 | 90% | $0.42 | 90s |
| GPT-5 | 85% | $0.55 | 75s |
| Gemini 2.5 Ultra | 80% | $0.35 | 85s |
This was the most expensive category for every model; extended thinking tokens add up fast. Claude had the highest accuracy and Gemini was the cheapest, while GPT-5 landed between them on accuracy but cost the most at $0.55 per task.
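The arithmetic behind that: reasoning tokens are generally billed at the output-token rate even though they never appear in the final answer. A back-of-envelope sketch under that assumption, with hypothetical prices and token counts:

```python
def call_cost(input_tokens: int, visible_output_tokens: int, thinking_tokens: int,
              in_price_per_m: float, out_price_per_m: float) -> float:
    """USD cost of one call, assuming thinking tokens are billed at the output rate."""
    billed_output = visible_output_tokens + thinking_tokens
    return (input_tokens * in_price_per_m + billed_output * out_price_per_m) / 1_000_000

# Hypothetical call: same prompt and answer length, with and without 12k thinking tokens.
plain = call_cost(3_000, 800, 0, in_price_per_m=5.0, out_price_per_m=15.0)
thinking = call_cost(3_000, 800, 12_000, in_price_per_m=5.0, out_price_per_m=15.0)
print(f"${plain:.3f} vs ${thinking:.3f}")  # $0.027 vs $0.207
```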
Total Cost Across All 100 Tasks
- Gemini 2.5 Ultra: $17.80 (cheapest overall)
- Claude Opus 4.7: $22.60 (best quality-adjusted value)
- GPT-5: $29.40 (most expensive when factoring retries)
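Each total is the average cost per task in each category multiplied by 20 tasks, summed over the five categories. This snippet re-derives all three from the tables above as a sanity check:

```python
# Average cost per task (USD) from the five tables above, in category order:
# code, documents, creative, data, reasoning.
AVG_COST = {
    "Claude Opus 4.7": [0.18, 0.22, 0.15, 0.16, 0.42],
    "GPT-5": [0.31, 0.28, 0.19, 0.14, 0.55],
    "Gemini 2.5 Ultra": [0.24, 0.08, 0.12, 0.10, 0.35],
}
TASKS_PER_CATEGORY = 20

totals = {model: round(sum(costs) * TASKS_PER_CATEGORY, 2) for model, costs in AVG_COST.items()}
for model, total in sorted(totals.items(), key=lambda kv: kv[1]):
    print(f"{model}: ${total:.2f}")
# Gemini 2.5 Ultra: $17.80
# Claude Opus 4.7: $22.60
# GPT-5: $29.40
```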
The Real Takeaway
There is no single “best” model. The smart play in April 2026 is routing:
- Documents and analysis – Gemini 2.5 Ultra
- Code and reasoning – Claude Opus 4.7
- Data transformation – GPT-5
- Creative writing – Claude or GPT-5 depending on voice preference
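A routing layer can start as a plain lookup from task category to model. Here's a minimal sketch of the plan above, priced with the per-task averages from the tables. The category keys, the `route` helper, and its Claude fallback for unknown categories are hypothetical choices, and creative writing is pinned to Claude only to make the arithmetic concrete.

```python
# Routing plan from the list above; category keys are illustrative.
ROUTES = {
    "code": "Claude Opus 4.7",
    "documents": "Gemini 2.5 Ultra",
    "creative": "Claude Opus 4.7",  # or GPT-5, depending on voice preference
    "data": "GPT-5",
    "reasoning": "Claude Opus 4.7",
}

# Average cost per task (USD) from the tables above.
AVG_COST = {
    "Claude Opus 4.7": {"code": 0.18, "documents": 0.22, "creative": 0.15, "data": 0.16, "reasoning": 0.42},
    "GPT-5": {"code": 0.31, "documents": 0.28, "creative": 0.19, "data": 0.14, "reasoning": 0.55},
    "Gemini 2.5 Ultra": {"code": 0.24, "documents": 0.08, "creative": 0.12, "data": 0.10, "reasoning": 0.35},
}
TASKS_PER_CATEGORY = 20

def route(category: str) -> str:
    """Pick a model for a task category; unknown categories fall back to Claude."""
    return ROUTES.get(category, "Claude Opus 4.7")

routed_total = round(sum(AVG_COST[route(c)][c] * TASKS_PER_CATEGORY for c in ROUTES), 2)
print(f"Routed plan: ${routed_total:.2f}")  # Routed plan: $19.40
```

Priced this way, the plan comes to $19.40 for the 100 tasks, versus $22.60 on Claude alone and $29.40 on GPT-5 alone. Gemini alone was still cheaper at $17.80; what routing buys is the highest success rate in every category (creative writing being a Claude/GPT-5 tie).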
Tools like OpenRouter and LiteLLM make multi-model routing dead simple. If you are still sending everything to one model, you are overpaying by 30-50%.
Get the Full Cost Comparison Spreadsheet
Download our updated April 2026 AI pricing tracker with all 100 task results, cost breakdowns, and routing recommendations.