GPT-5 vs Claude Opus 4.7 vs Gemini 2.5 Ultra: I Ran 100 Real Tasks and Tracked Every Dollar Spent - April 2026 Cost-Per-Output Showdown


Pricing pages lie. Every AI company shows you the per-token rate, but none of them tells you what it actually costs to get useful work done. So I ran 100 real-world tasks across GPT-5, Claude Opus 4.7, and Gemini 2.5 Ultra and tracked the total cost per completed task.

The cheapest model was not the one with the lowest token price.

The Test Methodology

100 tasks split across 5 categories (20 each):

Code generation: Build functional components from specifications
Long document analysis: Extract insights from 20-50 page documents
Creative writing: Marketing copy, blog posts, email sequences
Data transformation: Parse, clean, and restructure datasets
Multi-step reasoning: Complex problem-solving requiring chains of logic

For each task I measured total tokens consumed (including retries), time to completion, output accuracy (graded by a separate evaluator), and total API cost per successful task.
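As a rough sketch of the bookkeeping described above, the snippet below sums token spend across every attempt (retries included) and divides by the number of tasks that eventually passed. The `Attempt` record, the `cost_per_successful_task` helper, and the per-million-token prices are hypothetical placeholders, not the rates or tooling used in this test.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One API call for a task; a task may have several attempts (retries)."""
    task_id: str
    input_tokens: int
    output_tokens: int
    passed: bool  # verdict from the separate evaluator


def attempt_cost(a: Attempt, usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Dollar cost of a single attempt at the given per-million-token prices."""
    return (a.input_tokens * usd_per_m_input + a.output_tokens * usd_per_m_output) / 1_000_000


def cost_per_successful_task(attempts, usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Total spend (all attempts, failed ones included) divided by tasks that passed."""
    total = sum(attempt_cost(a, usd_per_m_input, usd_per_m_output) for a in attempts)
    succeeded = {a.task_id for a in attempts if a.passed}
    if not succeeded:
        raise ValueError("no successful tasks to average over")
    return total / len(succeeded)


# Illustrative numbers only: task "a" needs one retry, task "b" passes first time.
log = [
    Attempt("a", 2_000, 1_000, passed=False),
    Attempt("a", 2_500, 1_000, passed=True),
    Attempt("b", 2_000, 1_000, passed=True),
]
# Placeholder prices: $1 per million input tokens, $10 per million output tokens.
result = cost_per_successful_task(log, 1.0, 10.0)
print(f"${result:.5f} per successful task")
```

The failed first attempt on task "a" still counts toward the bill, which is why a model with a low token price but more retries can end up costing more per finished task.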

The Results

Code Generation (20 tasks)

Model	Success Rate	Avg Cost/Task	Avg Time
Claude Opus 4.7	95%	$0.18	45s
GPT-5	90%	$0.31	38s
Gemini 2.5 Ultra	85%	$0.24	52s

Claude dominated code generation. Higher first-pass accuracy meant fewer retries, which drove the cost down despite a similar per-token rate. GPT-5 was faster but needed more correction cycles.

Long Document Analysis (20 tasks)

Model	Success Rate	Avg Cost/Task	Avg Time
Gemini 2.5 Ultra	95%	$0.08	30s
Claude Opus 4.7	90%	$0.22	55s
GPT-5	85%	$0.28	42s

Gemini crushed this category. The 2M-token context window meant zero chunking overhead, and Google priced input tokens aggressively. For pure document processing, Gemini came in at roughly a third of the per-task cost of the other two ($0.08 vs. $0.22 and $0.28).
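To see why a large context window can cut cost, here is a minimal sketch of chunking overhead: when a document does not fit in one call, it is split into overlapping chunks and the instructions are resent with every chunk. The function name, the overlap, and the token counts are illustrative assumptions, not measurements from this test.

```python
import math


def billed_input_tokens(doc_tokens: int, window: int, prompt_tokens: int, overlap: int = 0) -> int:
    """Estimate total input tokens billed to process one document.

    Each call carries the instructions (prompt_tokens) plus one chunk of the
    document. Consecutive chunks share `overlap` tokens so context is not lost
    at chunk boundaries.
    """
    budget = window - prompt_tokens  # room left for document text per call
    if budget <= overlap:
        raise ValueError("window too small for the prompt and overlap")
    if doc_tokens <= budget:
        return prompt_tokens + doc_tokens  # single call, no chunking overhead
    step = budget - overlap  # new document tokens covered by each extra call
    calls = 1 + math.ceil((doc_tokens - budget) / step)
    # Every call resends the prompt; every call after the first resends the overlap.
    return calls * prompt_tokens + doc_tokens + (calls - 1) * overlap


# Hypothetical 40,000-token document with a 500-token instruction prompt.
one_shot = billed_input_tokens(40_000, window=2_000_000, prompt_tokens=500)
chunked = billed_input_tokens(40_000, window=16_500, prompt_tokens=500, overlap=1_000)
print(one_shot, chunked)
```

This only counts resent input tokens. A chunked pipeline usually also needs an extra call to merge the per-chunk answers, which the sketch leaves out.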

Creative Writing (20 tasks)

Model	Success Rate	Avg Cost/Task	Avg Time
Claude Opus 4.7	90%	$0.15	35s
GPT-5	90%	$0.19	28s
Gemini 2.5 Ultra	80%	$0.12	40s

Close race between Claude and GPT-5. Claude produced more natural, less “AI-sounding” copy. GPT-5 was faster. Gemini was cheapest, but its output was noticeably more generic.

Data Transformation (20 tasks)

Model	Success Rate	Avg Cost/Task	Avg Time
GPT-5	95%	$0.14	25s
Claude Opus 4.7	90%	$0.16	32s
Gemini 2.5 Ultra	90%	$0.10	35s

GPT-5 edged ahead on success rate here, though Gemini was cheaper per task. Structured output mode and function calling made data transformation tasks cleaner, with fewer parsing errors.

Multi-Step Reasoning (20 tasks)

Model	Success Rate	Avg Cost/Task	Avg Time
Claude Opus 4.7	90%	$0.42	90s
GPT-5	85%	$0.55	75s
Gemini 2.5 Ultra	80%	$0.35	85s

The most expensive category across the board. Extended thinking tokens add up fast. Claude had the highest accuracy, Gemini was cheapest, and GPT-5 sat in the middle on accuracy but was the most expensive.

Total Cost Across All 100 Tasks

Gemini 2.5 Ultra: $17.80 (cheapest overall)
Claude Opus 4.7: $22.60 (best quality-adjusted value)
GPT-5: $29.40 (most expensive when factoring retries)
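The totals above can be reproduced from the per-category tables by multiplying each average cost by 20 tasks and summing. The sketch below does that and also divides each total by the number of passed tasks. That second figure is one possible way to adjust for quality and is an assumption here, not necessarily how the “quality-adjusted” label above was derived.

```python
TASKS_PER_CATEGORY = 20

# (success rate, average cost per task in USD), copied from the tables above.
# Category order: code, documents, writing, data, reasoning.
RESULTS = {
    "Claude Opus 4.7": [(0.95, 0.18), (0.90, 0.22), (0.90, 0.15), (0.90, 0.16), (0.90, 0.42)],
    "GPT-5": [(0.90, 0.31), (0.85, 0.28), (0.90, 0.19), (0.95, 0.14), (0.85, 0.55)],
    "Gemini 2.5 Ultra": [(0.85, 0.24), (0.95, 0.08), (0.80, 0.12), (0.90, 0.10), (0.80, 0.35)],
}


def total_cost(model: str) -> float:
    """Sum of (average cost per task x 20 tasks) over the five categories."""
    return sum(cost * TASKS_PER_CATEGORY for _, cost in RESULTS[model])


def successes(model: str) -> int:
    """Number of tasks (out of 100) that passed, from the success rates."""
    return sum(round(rate * TASKS_PER_CATEGORY) for rate, _ in RESULTS[model])


def cost_per_success(model: str) -> float:
    """Total spend divided by passed tasks (an assumed quality adjustment)."""
    return total_cost(model) / successes(model)


for model in RESULTS:
    print(f"{model}: ${total_cost(model):.2f} total, "
          f"{successes(model)}/100 passed, ${cost_per_success(model):.3f} per success")
```

On the published numbers this gives 91, 89, and 86 passed tasks for Claude, GPT-5, and Gemini respectively, and roughly $0.248, $0.330, and $0.207 per passed task.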

The Real Takeaway

There is no single “best” model. The smart play in April 2026 is routing:

Documents and analysis – Gemini 2.5 Ultra
Code and reasoning – Claude Opus 4.7
Data transformation – GPT-5
Creative writing – Claude or GPT-5 depending on voice preference

Tools like OpenRouter and LiteLLM make multi-model routing dead simple. If you are still sending everything to one model, you may be overpaying: on these numbers, running all 100 tasks on GPT-5 costs roughly 45-50% more than the routing above would.
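A minimal sketch of the routing idea, using plain Python rather than any particular gateway: a lookup table maps each task category to the model recommended above, and the per-category averages from the tables are used to estimate what the routed mix would cost. The names `ROUTES`, `pick_model`, and `routed_total` are hypothetical, and the model strings are labels from this article, not provider API identifiers.

```python
TASKS_PER_CATEGORY = 20

# Average cost per task in USD, copied from the result tables above.
AVG_COST = {
    "Claude Opus 4.7": {"code": 0.18, "documents": 0.22, "writing": 0.15, "data": 0.16, "reasoning": 0.42},
    "GPT-5": {"code": 0.31, "documents": 0.28, "writing": 0.19, "data": 0.14, "reasoning": 0.55},
    "Gemini 2.5 Ultra": {"code": 0.24, "documents": 0.08, "writing": 0.12, "data": 0.10, "reasoning": 0.35},
}

# Category -> model, following the routing list above.
ROUTES = {
    "documents": "Gemini 2.5 Ultra",
    "code": "Claude Opus 4.7",
    "reasoning": "Claude Opus 4.7",
    "data": "GPT-5",
    "writing": "Claude Opus 4.7",  # or "GPT-5", depending on voice preference
}


def pick_model(category: str) -> str:
    """Return the model label a task of this category should be sent to."""
    try:
        return ROUTES[category]
    except KeyError:
        raise ValueError(f"no route for category {category!r}") from None


def routed_total() -> float:
    """Estimated spend for all 100 tasks when each category uses its routed model."""
    return sum(AVG_COST[pick_model(cat)][cat] * TASKS_PER_CATEGORY for cat in ROUTES)


def single_model_total(model: str) -> float:
    """Estimated spend for all 100 tasks when everything goes to one model."""
    return sum(AVG_COST[model].values()) * TASKS_PER_CATEGORY


print(f"routed: ${routed_total():.2f}")
for model in AVG_COST:
    print(f"{model} only: ${single_model_total(model):.2f}")
```

On the averages reported above, this routing comes to $19.40 for the 100 tasks, against $29.40 for GPT-5 alone, $22.60 for Claude alone, and $17.80 for Gemini alone. The routed mix is not the lowest possible bill; it spends slightly more than Gemini alone in exchange for the higher success rates in each category. Wiring this into OpenRouter or LiteLLM would mean replacing the labels with each provider's actual model identifiers, which are not shown here.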

Get the Full Cost Comparison Spreadsheet

Download our updated April 2026 AI pricing tracker with all 100 task results, cost breakdowns, and routing recommendations.

