Token Cost Velocity 2023-2026: Three Labs, Three Tiers, and Where the Curve Is Headed

Builds-on: ai-token-economics-and-open-source-competition
Related: anthropic-unit-economics-and-the-power-user-loss, anthropic-subsidy-stress-test, the-efficiency-counterthesis, the-data-center-convergence, ai-infrastructure-endgame-indicators


The Question

How fast has frontier-LLM pricing actually fallen since 2023? Where is the work-outcome-per-dollar ratio heading? Are prices broadly compressing, or is there a structural divergence between commodity and frontier tiers that the headline "10x/year" narrative misses?

This is the head-to-head reference doc. Three providers (Anthropic, OpenAI, Google), three capability tiers (Haiku-class, Sonnet-class, Opus-class), three years (May 2023 → May 2026).


TL;DR


Part 1: Three-Year Pricing History

Prices in USD per 1M tokens (input / output), standard tier, list pricing at launch. Mid-life price cuts called out separately.

Anthropic Claude

| Model | Release | Input / Output (USD per 1M tokens) | Context | Note |
| --- | --- | --- | --- | --- |
| Claude 2 | 2023-07 | $8 / $24 | 100K | First broadly available API |
| Claude 2.1 | 2023-11 | $8 / $24 | 200K | 2x context, same price |
| Claude 3 Haiku | 2024-03 | $0.25 / $1.25 | 200K | Cheapest Claude ever |
| Claude 3 Sonnet | 2024-03 | $3 / $15 | 200K | Workhorse anchor |
| Claude 3 Opus | 2024-03 | $15 / $75 | 200K | Frontier anchor |
| Claude 3.5 Sonnet | 2024-06 | $3 / $15 | 200K | Beat Opus at Sonnet price |
| Claude 3.5 Haiku | 2024-11 | $1 / $5 | 200K | 4x hike vs Haiku 3 (cut to $0.80/$4 Dec 2024) |
| Claude 3.7 Sonnet | 2025-02 | $3 / $15 | 200K | First hybrid reasoning |
| Claude Opus 4 | 2025-05 | $15 / $75 | 200K | Frontier held |
| Claude Sonnet 4 | 2025-05 | $3 / $15 | 200K/1M | |
| Claude Opus 4.1 | 2025-08 | $15 / $75 | 200K | Held |
| Claude Sonnet 4.5 | 2025-09 | $3 / $15 | 200K/1M | |
| Claude Haiku 4.5 | 2025-10 | $1 / $5 | 200K | |
| Claude Opus 4.5 | 2025-11 | $5 / $25 | 200K/1M | 67% Opus-tier price cut in response to GPT-5 |
| Claude Sonnet 4.6 | 2026-02 | $3 / $15 | 200K/1M | |
| Claude Opus 4.7 | 2026-04 | $5 / $25 | 200K/1M | New tokenizer uses up to 35% more tokens for fixed text (per Anthropic). Sticker unchanged from 4.5/4.6. |

Caching/Batch: cache writes 1.25× input, cache reads 0.1× input (90% off), Batch API 50% off both. Stacks to ~95% off for cached batch workloads.
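A minimal sketch of how the discounts above combine, assuming they stack multiplicatively (which is what the ~95% figure implies) and treating the Opus 4.7 tokenizer note as a simple multiplier on list price. The function names are illustrative and not part of any Anthropic SDK.

```python
def discount_multiplier(cached_read: bool = False, batch: bool = False) -> float:
    """Return the fraction of list input price paid after discounts.

    Assumption: cache-read and batch discounts multiply together.
    """
    multiplier = 1.0
    if cached_read:
        multiplier *= 0.1  # cache reads bill at 0.1x the input price
    if batch:
        multiplier *= 0.5  # Batch API is 50% off
    return multiplier


def tokenizer_adjusted_price(list_price: float, token_inflation: float) -> float:
    """Effective price per unit of text when a tokenizer emits more tokens.

    token_inflation=0.35 means 35% more tokens for the same text.
    """
    return list_price * (1.0 + token_inflation)


if __name__ == "__main__":
    stacked = discount_multiplier(cached_read=True, batch=True)
    print(f"cached + batch: {stacked:.2f}x list ({1 - stacked:.0%} off)")
    # Opus 4.7 input at $5/M with the worst-case 35% token inflation
    print(f"effective Opus 4.7 input: ${tokenizer_adjusted_price(5.0, 0.35):.2f}/M")
```

Cache writes (1.25× input) are left out of the sketch, so the ~95% figure applies only to the cached-read portion of a batch request.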

OpenAI

| Model | Release | Input / Output (USD per 1M tokens) | Context | Note |
| --- | --- | --- | --- | --- |
| GPT-4 (8K) | 2023-03 | $30 / $60 | 8K | Original frontier |
| GPT-4 32K | 2023-03 | $60 / $120 | 32K | Retired |
| GPT-3.5 Turbo | 2023-03 → 2024-01 | $2/$2 → $0.50/$1.50 | 16K | Multiple cuts |
| GPT-4 Turbo | 2023-11 | $10 / $30 | 128K | 3x cut at DevDay |
| GPT-4o | 2024-05 | $5 / $15 → $2.50/$10 (2024-10) | 128K | |
| GPT-4o-mini | 2024-07 | $0.15 / $0.60 | 128K | Replaced GPT-3.5 Turbo |
| o1-preview / o1 | 2024-09/12 | $15 / $60 | 200K | First reasoning |
| o3-mini | 2025-01 | $1.10 / $4.40 | 200K | |
| GPT-4.5 | 2025-02 | $75 / $150 | 128K | Wound down July 2025 |
| GPT-4.1 family | 2025-04 | $2/$8, $0.40/$1.60, $0.10/$0.40 | 1M | Three sizes |
| o3 | 2025-04 | $10/$40 → $2/$8 (Jun 2025, 80% cut) | 200K | |
| o3-pro | 2025-06 | $20 / $80 | 200K | |
| GPT-5 | 2025-08 | $1.25 / $10 | 400K | Frontier price collapse |
| GPT-5 mini / nano | 2025-08 | $0.25/$2, $0.05/$0.40 | 400K | |
| GPT-5.1 | 2025-11 | $1.25 / $10 | 400K | |
| GPT-5.2 | 2025-12 | $0.875 / $7 | 400K | |
| GPT-5.2 Pro | 2025-12 | $21 / $168 | 400K | New premium tier |
| GPT-5.4 | 2026-Q1 | $2.50 / $15 | 1M | |
| GPT-5.5 / Pro | 2026-Q2 | premium tier ~$30/$180 | 1M | Frontier sticker reversing |

Google Gemini

| Model | Release | Input / Output (USD per 1M tokens) | Context | Note |
| --- | --- | --- | --- | --- |
| Gemini 1.0 Pro | 2023-12 | $0.50 / $1.50 | 32K | |
| Gemini 1.5 Pro | 2024-04 | $3.50 / $10.50 (<128K) | 1M | Launch |
| Gemini 1.5 Pro (cut) | 2024-10 | $1.25 / $5 | 2M | 64% input cut |
| Gemini 1.5 Flash | 2024-05 | $0.075 / $0.30 | 1M | |
| Gemini 1.5 Flash-8B | 2024-10 | $0.0375 / $0.15 | 1M | 50% under 1.5 Flash |
| Gemini 2.0 Flash | 2024-12 | $0.10 / $0.40 | 1M | |
| Gemini 2.5 Pro | 2025-03 | $1.25/$10 (<200K), $2.50/$15 (>200K) | 1M | |
| Gemini 2.5 Flash | 2025-06 | $0.30 / $2.50 | 1M | |
| Gemini 2.5 Flash-Lite | 2025-07 | $0.10 / $0.40 | 1M | |
| Gemini 3 Pro | 2025-11 | $2/$12 (<200K), $4/$18 (>200K) | 1M+ | |
| Gemini 3.1 Pro | 2026-Q1 | $2 / $12 | 1M+ | |

Part 2: Tier-by-Tier Compression

Frontier Tier — Opus / GPT-5 Pro / Gemini Ultra-class

The "absolute top capability money can buy" tier.

| Date | Cheapest frontier-capable | Input / Output (USD per 1M tokens) |
| --- | --- | --- |
| 2023-03 | GPT-4 (8K) | $30 / $60 |
| 2024-03 | Claude 3 Opus | $15 / $75 |
| 2024-09 | o1-preview | $15 / $60 |
| 2025-04 | o3 (launch) | $10 / $40 |
| 2025-06 | o3 (post-cut) | $2 / $8 |
| 2025-08 | GPT-5 | $1.25 / $10 |
| 2025-11 | Claude Opus 4.5 / Gemini 3 Pro | $5/$25 or $2/$12 |
| 2026-05 | Opus 4.7 / GPT-5.5 / Gemini 3.1 Pro | ~$2-5 / $12-25 |
| 2026-05 | GPT-5.5 Pro (top reasoning) | ~$30 / $180 |

Two diverging lines. The median frontier price dropped ~85-95% on input (from $30 to roughly $2-5 per M). The absolute top reasoning tier (GPT-5.2 Pro, GPT-5.5 Pro, Opus 4.7 with reasoning) is back to 2023 GPT-4 prices or higher, and its effective price is higher still because reasoning output and tokenizer changes increase the number of billed tokens per task. This is the most important divergence in the entire dataset.

Workhorse Tier — Sonnet / GPT-4o / Gemini Pro-class

| Date | Reference model | Input / Output (USD per 1M tokens) |
| --- | --- | --- |
| 2023-03 | GPT-4 | $30 / $60 |
| 2023-11 | GPT-4 Turbo | $10 / $30 |
| 2024-03 | Claude 3 Sonnet | $3 / $15 |
| 2024-05 | GPT-4o (launch) | $5 / $15 |
| 2024-06 | Claude 3.5 Sonnet | $3 / $15 |
| 2024-10 | GPT-4o cut / Gemini 1.5 Pro cut | $2.50/$10 ; $1.25/$5 |
| 2025-04 | GPT-4.1 | $2 / $8 |
| 2025-08 | GPT-5 | $1.25 / $10 |
| 2026-05 | Sonnet 4.6 / GPT-5.4 / Gemini 3.1 Pro | $1.25-3 / $8-15 |

Roughly 90-96% reduction on input and 75-87% on output over about three years (from $30/$60 to $1.25-3 / $8-15). The notable structural feature: Anthropic has anchored Sonnet at $3/$15 since Claude 3 Sonnet launched in March 2024, while shipping Sonnet 3 → 3.5 → 3.7 → 4 → 4.5 → 4.6. Capability went up ~4-6x at constant price; effective intelligence-per-dollar improved that much without a sticker change.

Budget Tier — Haiku / 4o-mini / Flash-class

| Date | Reference model | Input / Output (USD per 1M tokens) |
| --- | --- | --- |
| 2023-03 | GPT-3.5 Turbo | $2 / $2 |
| 2023-12 | Gemini 1.0 Pro | $0.50 / $1.50 |
| 2024-03 | Claude 3 Haiku | $0.25 / $1.25 |
| 2024-05 | Gemini 1.5 Flash | $0.075 / $0.30 |
| 2024-07 | GPT-4o-mini | $0.15 / $0.60 |
| 2024-10 | Gemini 1.5 Flash-8B | $0.0375 / $0.15 |
| 2025-04 | GPT-4.1 nano | $0.10 / $0.40 |
| 2025-07 | Gemini 2.5 Flash-Lite | $0.10 / $0.40 |
| 2025-08 | GPT-5 nano | $0.05 / $0.40 |
| 2026-05 | GPT-5 nano / Flash-8B-equiv | $0.05 / $0.30 |

~97.5% reduction on input ($2 → $0.05) and ~85% on output ($2 → $0.30). Prices are approaching the marginal serving cost of the underlying hardware. This is the tier where deflation looks complete.
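The tier percentages can be rechecked directly from the table endpoints. A small sketch, assuming the first and last rows of the workhorse and budget tables are the right endpoints (for the workhorse tier, the low end of the 2026-05 range); the dictionary labels are illustrative.

```python
def pct_reduction(start_price: float, end_price: float) -> float:
    """Percentage drop from start_price to end_price."""
    return (1.0 - end_price / start_price) * 100.0


# (start input, start output, end input, end output) in USD per 1M tokens,
# copied from the tier tables in this part.
TIER_ENDPOINTS = {
    "workhorse (GPT-4 -> cheapest 2026-05)": (30.0, 60.0, 1.25, 8.0),
    "budget (GPT-3.5 Turbo -> 2026-05)": (2.0, 2.0, 0.05, 0.30),
}

for tier, (in_start, out_start, in_end, out_end) in TIER_ENDPOINTS.items():
    print(
        f"{tier}: input -{pct_reduction(in_start, in_end):.1f}%, "
        f"output -{pct_reduction(out_start, out_end):.1f}%"
    )
```

On these endpoints the headline figures track the input price; output prices fall less (roughly 87% for workhorse and 85% for budget).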


Part 3: Intelligence-Per-Dollar Studies

Five sources independently measuring capability-per-dollar trajectory.

OpenRouter (the production-traffic empiricist)

State of AI 2025 (Nov 2025, with a16z, arXiv 2601.10088). Empirical study of >100T tokens routed through OpenRouter Nov 2024-Nov 2025.

Directional claim: No durable monopoly on the commodity tier. The price-setter is whoever shipped a cheaper model crossing a capability threshold last quarter. Pluralism replaced the GPT-4 hegemony.

Artificial Analysis (the benchmark-normalized scorer)

Intelligence Index v4.0 (Jan 2026): equal-weighted average across Agents, Coding, Scientific Reasoning, General. Four pillars × 25% each. 10 evals (GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt). Explicitly dropped MMLU-Pro, AIME 2025, LiveCodeBench (saturated/contaminated).

Directional claim: Pareto frontier moves down-and-right every quarter. Reasoning models converging toward sub-$1/M for index-60+ capability.

Epoch AI (the most rigorous source)

Methodology: lowest-priced model exceeding capability threshold on six benchmarks (MMLU, GPQA Diamond, MATH-500, MATH Level 5, HumanEval, Chatbot Arena Elo). Log-linear regression. 3:1 input/output weighting.
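A rough sketch of the two mechanical pieces of that methodology: a 3:1 blended price and a log-linear fit. This is not Epoch's code or data. It assumes "3:1" means a weight of 3 on input and 1 on output, and it uses the workhorse-tier rows from Part 2 as a stand-in for "cheapest model above a capability threshold".

```python
import math


def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens (3:1 input:output by default)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


def annual_decline_factor(points: list[tuple[float, float]]) -> float:
    """Fit ln(price) = a + b * years by least squares; return exp(-b).

    A return value of 10.0 means the fitted price falls 10x per year.
    """
    xs = [years for years, _ in points]
    ys = [math.log(price) for _, price in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return math.exp(-sxy / sxx)


# (months since 2023-03, input, output) from the workhorse-tier table in Part 2.
WORKHORSE = [
    (0, 30.0, 60.0),    # GPT-4
    (8, 10.0, 30.0),    # GPT-4 Turbo
    (12, 3.0, 15.0),    # Claude 3 Sonnet
    (19, 1.25, 5.0),    # Gemini 1.5 Pro (post-cut)
    (25, 2.0, 8.0),     # GPT-4.1
    (29, 1.25, 10.0),   # GPT-5
]

points = [(months / 12.0, blended_price(inp, out)) for months, inp, out in WORKHORSE]
print(f"fitted decline: {annual_decline_factor(points):.1f}x per year")
```

Because these rows hold capability roughly constant only by tier label, the fitted rate is a sticker-price trend and will understate the fixed-capability decline that Epoch measures.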

Critical caveat from Epoch: "The fastest price drops in that range have occurred in the past year, so it's less clear that those will persist." Translation: 2024-2025 acceleration may be price-war dynamics, not Moore's-Law durability. Most epistemically honest source on this.

Stanford AI Index 2024-2026

Famous chart: GPT-3.5-equivalent inference cost fell from $20.00/MTok (Nov 2022) to $0.07/MTok (Oct 2024), a ~280x drop in 23 months, or roughly 19x/year annualized. The 2026 report extends the trajectory.

a16z — "LLMflation" (Appenzeller, Nov 2024)

Most conservative published estimate. 10x/year for equivalent capability. Anchor: GPT-3 ($60/M, MMLU 42, Nov 2021) → Llama 3.2 3B ($0.06/M, same MMLU, 2024) = 1,000x in 3 years. Six drivers: GPU economics, FP16→FP4 quantization, software optimization, smaller models, better tuning, open-source pressure.
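The Stanford and a16z figures can be put on the same per-year basis with one line of arithmetic. A sketch, assuming Nov 2022 to Oct 2024 counts as 23 months and the a16z anchor spans 36 months; the function name is illustrative.

```python
def annualized_factor(total_factor: float, months: float) -> float:
    """Convert a total price decline over `months` into a per-year factor."""
    return total_factor ** (12.0 / months)


stanford = annualized_factor(20.00 / 0.07, 23)  # Nov 2022 -> Oct 2024
a16z = annualized_factor(60.0 / 0.06, 36)       # GPT-3 (Nov 2021) -> Llama 3.2 3B (2024)

print(f"Stanford AI Index: {stanford:.1f}x per year")
print(f"a16z LLMflation:   {a16z:.1f}x per year")
```

The a16z anchor works out to 10x/year by construction (1,000x over three years); the Stanford endpoints come out a little under 20x/year.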

Consensus

Across all five sources, central tendency reconciles to:


Part 4: The Cost Stack and Hardware Velocity

What's driving the decline beneath the retail price.

Marginal cost of a token (GB200 NVL72 reference)

GB200 NVL72 delivers ~10x the tokens/watt of Hopper for MoE workloads.

Hardware velocity

| Generation | Year | Inference TPS/W vs prior |
| --- | --- | --- |
| H100 → H200 | 2023→2024 | 1.4-1.7x |
| H200 → B200 | 2024→2025 | 3-5x dense, up to 10x MoE |
| B200 → GB300 NVL72 | 2025 | 1.5x cost/token improvement at long context |
| GB300 → Rubin NVL72 | 2026 | ~10x throughput/W |
| Rubin → Rubin Ultra | 2027 | +3.5x perf/W over B300 |
| Rubin Ultra → Feynman | 2028 | TBD |

Hardware alone: 2-3x/year sustained perf-per-watt through 2028. Beyond that, gains compress as 2nm/A14 nodes hit physical walls and HBM stack height tops out.

Algorithmic efficiency

Larger contributor than hardware.

Decomposition: ~2-3x/year from hardware, ~3-5x/year from algorithms = headline ~10x/year. The other ~5x in the most aggressive numbers is competitive subsidy.
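The decomposition is multiplicative, which a short sketch makes explicit. It assumes the hardware and algorithmic gains are independent and compound; the helper name is illustrative.

```python
import math


def compound(*ranges: tuple[float, float]) -> tuple[float, float]:
    """Multiply the low ends and the high ends of several (low, high) ranges."""
    low = math.prod(lo for lo, _ in ranges)
    high = math.prod(hi for _, hi in ranges)
    return low, high


hardware = (2.0, 3.0)     # x/year from hardware
algorithms = (3.0, 5.0)   # x/year from algorithms

low, high = compound(hardware, algorithms)
midpoint = math.sqrt(low * high)  # geometric midpoint of the combined range
print(f"combined: {low:.0f}-{high:.0f}x/year, geometric midpoint {midpoint:.1f}x")
```

The combined range is 6-15x/year, with a geometric midpoint near the headline ~10x. A 50x/year estimate then leaves a residual factor of about 5x that this note attributes to subsidy.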


Part 5: The Subsidy Gap

How much of the retail decline is real cost compression vs labs selling below cost?

Reconciliation: roughly half of the headline 10x/year retail decline is real cost compression; the other half is subsidy. When subsidies normalize in 2027-2028 (forced by IPO timing or VC return requirements), the retail curve flattens even though the cost curve keeps falling.

The frontier-vs-commodity divergence makes more sense in this frame: subsidy gets allocated to whichever tier maximizes share capture. Commodity tier is where price wars are happening; frontier tier doesn't need subsidy because it's rationed.


Part 6: Frontier vs Commodity — The Real Story

The headline number ("token prices fell 99%") hides the most important structural shift.

Commodity tier (anything that was frontier 18-24 months ago):

Frontier tier (current best reasoning models):

Net: the spread between cheapest-capable and frontier is widening, not compressing. This is the inverse of the popular "AI is being commoditized" narrative. The trailing edge is being commoditized; the leading edge is becoming a rationed product.

This matches the thesis in the-data-center-convergence: capex concentration at the frontier, ratepayer socialization at the commodity tier, and divergent unit economics across the curve.


Part 7: Forecasts 2026-2030

Gartner (March 2026)

Inference cost on a 1T-parameter LLM falls >90% by 2030 vs 2025. LLMs in 2030 will be ~100x more cost-efficient than 2022 equivalents. Caveat: agentic workloads consume 5-30x more tokens per task, so total spend rises.

Epoch AI extrapolation

If halving-every-2-months holds, 2026→2030 implies a fixed-quality cost decline of ~10^7 (24 halvings, 2^24 ≈ 1.7 × 10^7). Realistic ceiling is ~100-1000x as physical/algorithmic floors bind. Epoch itself flags the trend "may not persist."
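The extrapolation is a straight compounding of halvings. A sketch, assuming a 48-month horizon (2026 to 2030) and a constant halving period; the function name is illustrative.

```python
def decline_from_halvings(horizon_months: float, halving_months: float) -> float:
    """Total cost-decline factor if cost halves every `halving_months`."""
    return 2.0 ** (horizon_months / halving_months)


per_year = decline_from_halvings(12, 2)     # six halvings per year
four_years = decline_from_halvings(48, 2)   # 24 halvings from 2026 to 2030

print(f"per year:   {per_year:.0f}x")
print(f"2026->2030: {four_years:.2e}x")
```

A 2-month halving period is equivalent to 64x/year, so the 100-1000x "realistic ceiling" corresponds to the trend holding for only a little over a year before flattening.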

Capital markets

BofA/Goldman: $1.6-1.7T cumulative datacenter capex by 2030. Omdia: capex peak ~2027, possible drop in 2028 "bubble scenario." Jensen Huang (Nvidia) pulled the $1T annual buildout forward to 2028 from 2030.

Credible 2026-2030 range for commodity-tier cost per M tokens

| Scenario | Decline by 2030 | Driver |
| --- | --- | --- |
| Bull (Epoch trend holds) | 100-1000x | No physical walls, continued open-source pressure |
| Base (Gartner) | ~10x (90% drop) | Hardware + algorithmic compounding, partial subsidy unwind |
| Bear (capex peak + energy shock + subsidy unwind) | 3-5x through 2028, flat after | Hormuz / Iran shock + HBM supply + retrenchment |

Time-to-equivalence for current frontier reaching current commodity prices

GPT-4o-mini current commodity price: ~$0.15/$0.60 per M. GPT-5-class frontier reasoning: ~$10-15/$50-75 per M blended. Spread: 50-100x.

Planning estimate: GPT-5-equivalent at $0.15-0.30/M blended by mid-2027 to early-2028.
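The time needed to close a price spread follows from the spread and an assumed annual decline rate. A sketch using the 50-100x spread above; the 10x and 20x/year rates are assumptions chosen to bracket the a16z-style rate and a faster one, not figures from a specific source.

```python
import math


def years_to_close(spread: float, annual_decline: float) -> float:
    """Years for a price to fall by `spread` at `annual_decline` x per year."""
    return math.log(spread) / math.log(annual_decline)


for spread in (50.0, 100.0):
    for rate in (10.0, 20.0):
        print(f"{spread:.0f}x spread at {rate:.0f}x/year: "
              f"{years_to_close(spread, rate):.1f} years")
```

At 10x/year a 50-100x spread takes about 1.7 to 2.0 years to close, so landing any earlier than that requires a decline faster than 10x/year.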


Part 8: Counter-Forces That Could Stall or Reverse the Curve

  1. Energy shock. World Bank: energy prices up 24% in 2026, highest since 2022. Brent at ~$154 if Hormuz stays closed 12 weeks. Power becomes the binding constraint, not silicon. See hormuz-to-ai-repricing-causal-chain.
  2. Capex peak 2027-2028. Omdia: 2027 is the critical year because revenue commitments are due. If inference revenue doesn't catch up to capex, 2028 sees retrenchment.
  3. Subsidy unwind. OpenAI on path to profitability only in 2029. IPO or down-round forces price normalization. Anthropic's $100B AWS commit is a floor regardless of inference unit economics.
  4. Reasoning + agentic token explosion. Reasoning multiplies consumption 3-7x; agentic 5-30x per task. A simple classification costs $0.01 in chat, $0.10-0.50 as agentic workflow. Per-token cost falls, total spend rises.
  5. HBM supply. SK hynix sold out of 2026 HBM. Micron same. Supply rationing is a floor under inference COGS through 2027.
  6. Geopolitical/subsidy distortion. Chinese-model pricing (10-20x below US frontier) reflects subsidy + state strategy, not just cost structure. Removing that lever changes the curve materially.
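Item 4 is a ratio of two multipliers, which a short sketch makes concrete. It pairs the 5-30x agentic token multiplier with a 10x per-token price decline (the Gartner base case); combining the two this way is an illustration, not a forecast from either source.

```python
def spend_ratio(price_decline: float, token_multiplier: float) -> float:
    """Spend per task relative to today.

    price_decline: factor by which per-token price falls (10.0 = 90% cheaper).
    token_multiplier: factor by which tokens consumed per task grow.
    """
    return token_multiplier / price_decline


for tokens in (5.0, 30.0):
    ratio = spend_ratio(price_decline=10.0, token_multiplier=tokens)
    print(f"{tokens:.0f}x tokens at 10x cheaper: {ratio:.1f}x today's spend per task")
```

With a 10x price decline, spend per task falls only when token consumption grows less than 10x; at the 30x end of the agentic range it triples.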

What This Analysis Can't Resolve

