Token Cost Velocity 2023-2026: Three Labs, Three Tiers, and Where the Curve Is Headed
Builds-on: ai-token-economics-and-open-source-competition
Related: anthropic-unit-economics-and-the-power-user-loss, anthropic-subsidy-stress-test, the-efficiency-counterthesis, the-data-center-convergence, ai-infrastructure-endgame-indicators
The Question
How fast has frontier-LLM pricing actually fallen since 2023? Where is the work-outcome-per-dollar ratio heading? Are prices broadly compressing, or is there a structural divergence between commodity and frontier tiers that the headline "10x/year" narrative misses?
This is the head-to-head reference doc. Three providers (Anthropic, OpenAI, Google), three capability tiers (Haiku-class, Sonnet-class, Opus-class), three years (May 2023 → May 2026).
TL;DR
- Commodity tier (Haiku/4o-mini/Flash): ~97.5% reduction in input price over 3 years ($2/M → $0.05/M for the cheapest credible model); output prices fell less (~80-85%). Approaching the marginal-serving-cost floor.
- Workhorse tier (Sonnet/GPT-5/Gemini Pro): ~96% reduction in input price. $30/M → $1.25/M. Anthropic has anchored Sonnet at $3/$15 since March 2024 while roughly quadrupling capability.
- Frontier tier (Opus/GPT-5 Pro/Gemini 3 Pro): Mixed. Opus held $15/$75 from March 2024 through Aug 2025, cut to $5/$25 in Nov 2025 (67% drop). Opus 4.5, 4.6, and 4.7 all hold the $5/$25 sticker — but Opus 4.7 shipped a new tokenizer that uses up to 35% more tokens for the same fixed text (Anthropic's own pricing-page disclosure), and GPT-5.5 Pro pushed the top-reasoning sticker back to ~$30/$180. The frontier curve is flat-on-sticker but rising on effective per-task cost.
- Headline velocity: Median 50x/year (Epoch); a16z conservative 10x/year; Stanford ~280x over 22 months at GPT-3.5 tier. The honest planning number is ~10x/year for equivalent capability, decelerating toward 3-5x/year as easy compression exhausts.
- The structural finding: Commodity and frontier curves have decoupled. Commodity is in textbook deflation; frontier is becoming rationed and is now flat-to-rising. The popular "AI is being commoditized" narrative is true for the trailing edge and false for the leading edge.
- Floor estimate: GPT-4-equivalent capability at $0.05-0.10/M blended by end-2026 (already there for some workloads via Flash-Lite / nano tiers). The cheapest model exceeding Intelligence Index 60 has collapsed to ~$0.20/$0.50 per MTok (Grok 4 Fast).
- Subsidy: Roughly half the retail decline is real cost compression (hardware + algorithms); the other half is providers selling below cost to capture share. OpenAI burning $14B in 2026 against $13B revenue. Anthropic's path is different (Trainium take-or-pay) but still subsidized.
Part 1: Three-Year Pricing History
Prices in USD per 1M tokens (input / output), standard tier, list pricing at launch. Mid-life price cuts called out separately.
Anthropic Claude
| Model | Release | Input / Output | Context | Note |
|---|---|---|---|---|
| Claude 2 | 2023-07 | $8 / $24 | 100K | First broadly available API |
| Claude 2.1 | 2023-11 | $8 / $24 | 200K | 2x context, same price |
| Claude 3 Haiku | 2024-03 | $0.25 / $1.25 | 200K | Cheapest Claude ever |
| Claude 3 Sonnet | 2024-03 | $3 / $15 | 200K | Workhorse anchor |
| Claude 3 Opus | 2024-03 | $15 / $75 | 200K | Frontier anchor |
| Claude 3.5 Sonnet | 2024-06 | $3 / $15 | 200K | Beat Opus at Sonnet price |
| Claude 3.5 Haiku | 2024-11 | $1 / $5 | 200K | 4x hike vs Haiku 3 (cut to $0.80/$4 Dec 2024) |
| Claude 3.7 Sonnet | 2025-02 | $3 / $15 | 200K | First hybrid reasoning |
| Claude Opus 4 | 2025-05 | $15 / $75 | 200K | Frontier held |
| Claude Sonnet 4 | 2025-05 | $3 / $15 | 200K/1M | |
| Claude Opus 4.1 | 2025-08 | $15 / $75 | 200K | Held |
| Claude Sonnet 4.5 | 2025-09 | $3 / $15 | 200K/1M | |
| Claude Haiku 4.5 | 2025-10 | $1 / $5 | 200K | |
| Claude Opus 4.5 | 2025-11 | $5 / $25 | 200K/1M | 67% Opus-tier price cut in response to GPT-5 |
| Claude Sonnet 4.6 | 2026-02 | $3 / $15 | 200K/1M | |
| Claude Opus 4.7 | 2026-04 | $5 / $25 | 200K/1M | New tokenizer uses up to 35% more tokens for fixed text (per Anthropic). Sticker unchanged from 4.5/4.6. |
Caching/Batch: cache writes 1.25× input, cache reads 0.1× input (90% off), Batch API 50% off both. Stacks to ~95% off for cached batch workloads.
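A minimal sketch of how these multipliers combine on the input side, assuming they multiply as the line above implies (cache read 0.1× times batch 0.5× = 0.05×, hence ~95% off). `effective_input_price` and its parameters are hypothetical names, and the sketch ignores cache lifetimes and other billing details.

```python
def effective_input_price(list_price, cached_fraction, batch=False, write_fraction=0.0):
    """Blended USD per 1M input tokens under prompt caching and the Batch API.

    list_price      -- standard input price per 1M tokens (e.g. 5.0 for Opus 4.5+)
    cached_fraction -- share of input tokens served as cache reads (billed 0.1x)
    write_fraction  -- share of input tokens written to cache (billed 1.25x)
    batch           -- apply the 50% Batch API discount on top (assumed to stack)
    """
    if cached_fraction + write_fraction > 1.0:
        raise ValueError("cached + written fractions cannot exceed 1")
    uncached = 1.0 - cached_fraction - write_fraction
    multiplier = uncached * 1.0 + cached_fraction * 0.10 + write_fraction * 1.25
    if batch:
        multiplier *= 0.5
    return list_price * multiplier


# Fully cached batch input on a $5/M model: 0.10 * 0.50 = 0.05 of list (~95% off).
print(effective_input_price(5.0, cached_fraction=1.0, batch=True))
# Sonnet-priced workload, 80% cache reads, 10% cache writes, no batch.
print(effective_input_price(3.0, cached_fraction=0.8, write_fraction=0.1))
```

Cache writes cost more than uncached input, so the savings depend on how often a cached prefix is reused before it expires.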
OpenAI
| Model | Release | Input / Output | Context | Note |
|---|---|---|---|---|
| GPT-4 (8K) | 2023-03 | $30 / $60 | 8K | Original frontier |
| GPT-4 32K | 2023-03 | $60 / $120 | 32K | Retired |
| GPT-3.5 Turbo | 2023-03→2024-01 | $2/$2 → $0.50/$1.50 | 16K | Multiple cuts |
| GPT-4 Turbo | 2023-11 | $10 / $30 | 128K | 3x cut at DevDay |
| GPT-4o | 2024-05 | $5 / $15 → $2.50/$10 (2024-10) | 128K | |
| GPT-4o-mini | 2024-07 | $0.15 / $0.60 | 128K | Replaced GPT-3.5 Turbo |
| o1-preview / o1 | 2024-09/12 | $15 / $60 | 200K | First reasoning |
| o3-mini | 2025-01 | $1.10 / $4.40 | 200K | |
| GPT-4.5 | 2025-02 | $75 / $150 | 128K | Wound down July 2025 |
| GPT-4.1 family | 2025-04 | $2/$8, $0.40/$1.60, $0.10/$0.40 | 1M | Three sizes |
| o3 | 2025-04 | $10/$40 → $2/$8 (Jun 2025, 80% cut) | 200K | |
| o3-pro | 2025-06 | $20 / $80 | 200K | |
| GPT-5 | 2025-08 | $1.25 / $10 | 400K | Frontier price collapse |
| GPT-5 mini / nano | 2025-08 | $0.25/$2, $0.05/$0.40 | 400K | |
| GPT-5.1 | 2025-11 | $1.25 / $10 | 400K | |
| GPT-5.2 | 2025-12 | $0.875 / $7 | 400K | |
| GPT-5.2 Pro | 2025-12 | $21 / $168 | 400K | New premium tier |
| GPT-5.4 | 2026-Q1 | $2.50 / $15 | 1M | |
| GPT-5.5 / Pro | 2026-Q2 | premium tier ~$30/$180 | 1M | Frontier sticker reversing |
Google Gemini
| Model | Release | Input / Output | Context | Note |
|---|---|---|---|---|
| Gemini 1.0 Pro | 2023-12 | $0.50 / $1.50 | 32K | |
| Gemini 1.5 Pro | 2024-04 | $3.50 / $10.50 (<128K) | 1M | Launch |
| Gemini 1.5 Pro (cut) | 2024-10 | $1.25 / $5 | 2M | 64% input cut |
| Gemini 1.5 Flash | 2024-05 | $0.075 / $0.30 | 1M | |
| Gemini 1.5 Flash-8B | 2024-10 | $0.0375 / $0.15 | 1M | 50% under 1.5 Flash |
| Gemini 2.0 Flash | 2024-12 | $0.10 / $0.40 | 1M | |
| Gemini 2.5 Pro | 2025-03 | $1.25/$10 (<200K), $2.50/$15 (>200K) | 1M | |
| Gemini 2.5 Flash | 2025-06 | $0.30 / $2.50 | 1M | |
| Gemini 2.5 Flash-Lite | 2025-07 | $0.10 / $0.40 | 1M | |
| Gemini 3 Pro | 2025-11 | $2/$12 (<200K), $4/$18 (>200K) | 1M+ | |
| Gemini 3.1 Pro | 2026-Q1 | $2 / $12 | 1M+ | |
Part 2: Tier-by-Tier Compression
Frontier Tier — Opus / GPT-5 Pro / Gemini Ultra-class
The "absolute top capability money can buy" tier.
| Date | Cheapest frontier-capable | Input / Output |
|---|---|---|
| 2023-03 | GPT-4 (8K) | $30 / $60 |
| 2024-03 | Claude 3 Opus | $15 / $75 |
| 2024-09 | o1-preview | $15 / $60 |
| 2025-04 | o3 (launch) | $10 / $40 |
| 2025-06 | o3 (post-cut) | $2 / $8 |
| 2025-08 | GPT-5 | $1.25 / $10 |
| 2025-11 | Claude Opus 4.5 / Gemini 3 Pro | $5/$25 or $2/$12 |
| 2026-05 | Opus 4.7 / GPT-5.5 / Gemini 3.1 Pro | ~$2-5 / $12-25 |
| 2026-05 | GPT-5.5 Pro (top reasoning) | ~$30 / $180 |
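Two diverging lines. Input price for median frontier capability fell roughly 83-93% (GPT-4's $30 → $2-5 today). The absolute top reasoning tier (GPT-5.2 Pro, GPT-5.5 Pro) is back at 2023 GPT-4 sticker levels on input ($21-30 vs $30) and far above them on output ($168-180 vs $60). Opus 4.7 holds a $5/$25 sticker, but its new tokenizer (up to 35% more tokens for the same text) and output-rate billing of thinking tokens raise effective per-task cost. This is the most important divergence in the entire dataset.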
Workhorse Tier — Sonnet / GPT-4o / Gemini Pro-class
| Date | Reference model | Input / Output |
|---|---|---|
| 2023-03 | GPT-4 | $30 / $60 |
| 2023-11 | GPT-4 Turbo | $10 / $30 |
| 2024-03 | Claude 3 Sonnet | $3 / $15 |
| 2024-05 | GPT-4o (launch) | $5 / $15 |
| 2024-06 | Claude 3.5 Sonnet | $3 / $15 |
| 2024-10 | GPT-4o cut / Gemini 1.5 Pro cut | $2.50/$10 ; $1.25/$5 |
| 2025-04 | GPT-4.1 | $2 / $8 |
| 2025-08 | GPT-5 | $1.25 / $10 |
| 2026-05 | Sonnet 4.6 / GPT-5.4 / Gemini 3.1 Pro | $1.25-3 / $8-15 |
~96% reduction in input price over 36 months (output fell ~83%, $60 → $10). The notable structural feature: Anthropic has held Sonnet at $3/$15 since its March 2024 launch while shipping Sonnet 3 → 3.5 → 3.7 → 4 → 4.5 → 4.6. Capability rose roughly 4-6x at constant price, so intelligence-per-dollar improved by about that much without a sticker change.
Budget Tier — Haiku / 4o-mini / Flash-class
| Date | Reference model | Input / Output |
|---|---|---|
| 2023-03 | GPT-3.5 Turbo | $2 / $2 |
| 2023-12 | Gemini 1.0 Pro | $0.50 / $1.50 |
| 2024-03 | Claude 3 Haiku | $0.25 / $1.25 |
| 2024-05 | Gemini 1.5 Flash | $0.075 / $0.30 |
| 2024-07 | GPT-4o-mini | $0.15 / $0.60 |
| 2024-10 | Gemini 1.5 Flash-8B | $0.0375 / $0.15 |
| 2025-04 | GPT-4.1 nano | $0.10 / $0.40 |
| 2025-07 | Gemini 2.5 Flash-Lite | $0.10 / $0.40 |
| 2025-08 | GPT-5 nano | $0.05 / $0.40 |
| 2026-05 | GPT-5 nano / Flash-8B-equiv | $0.05 / $0.30 |
~97.5% reduction on input ($2 → $0.05); output fell ~80-85%. Approaching the marginal serving cost of the underlying hardware. This is the tier where deflation looks complete.
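The headline percentage depends on which side of the price sheet is quoted. A minimal sketch comparing input, output, and 3:1 blended reductions (the weighting Epoch uses, described in Part 3) for the first and last rows of the workhorse and budget tables. Because the 2026-05 budget row mixes models, GPT-5 nano's own $0.05/$0.40 list price is assumed as the endpoint; the helper names are illustrative.

```python
def blended(input_price, output_price, input_weight=3, output_weight=1):
    """Blended USD per 1M tokens with a 3:1 input:output weighting."""
    total = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total


def pct_drop(old, new):
    """Percentage reduction from an old price to a new one."""
    return 100.0 * (1.0 - new / old)


# (start, end) list prices in USD per 1M tokens, taken from the tier tables above.
TIERS = {
    "workhorse: GPT-4 -> GPT-5": ((30.0, 60.0), (1.25, 10.0)),
    "budget: GPT-3.5 Turbo -> GPT-5 nano": ((2.0, 2.0), (0.05, 0.40)),
}

for name, ((in0, out0), (in1, out1)) in TIERS.items():
    print(f"{name}: input {pct_drop(in0, in1):.1f}%, "
          f"output {pct_drop(out0, out1):.1f}%, "
          f"blended {pct_drop(blended(in0, out0), blended(in1, out1)):.1f}%")
```

Output prices fell less than input prices in both tiers, so blended reductions (roughly 91% and 93%) sit below the input-only headline.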
Part 3: Intelligence-Per-Dollar Studies
Five sources independently measuring capability-per-dollar trajectory.
OpenRouter (the production-traffic empiricist)
State of AI 2025 (Nov 2025, with a16z, ArXiv 2601.10088). Empirical study of >100T tokens routed through OpenRouter Nov 2024-Nov 2025.
- OpenRouter throughput grew 4x YoY: ~5T tokens/week → >20T tokens/week
- Programming use case went from 11% to >50% of total throughput in 2025
- Open-weight models reached ~1/3 of all tokens by late 2025
- Chinese models hit ~45-61% of top-10 token volume by Feb-Apr 2026 (MiMo-V2-Pro, Qwen 3.6, Kimi, MiniMax, DeepSeek). Cited as 10-20x cheaper than US frontier equivalents.
- Distribution flattened from near-monopoly (GPT-4 era) to 5-7 model pluralism — no single model >25% share
Directional claim: No durable monopoly on the commodity tier. The price-setter is whoever shipped a cheaper model crossing a capability threshold last quarter. Pluralism replaced the GPT-4 hegemony.
Artificial Analysis (the benchmark-normalized scorer)
Intelligence Index v4.0 (Jan 2026): equal-weighted average across Agents, Coding, Scientific Reasoning, General. Four pillars × 25% each. 10 evals (GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt). Explicitly dropped MMLU-Pro, AIME 2025, LiveCodeBench (saturated/contaminated).
- "Cost to Run Intelligence Index" metric: Grok 4.3 ~$395, GLM-5 ~$547
- Grok 4 Fast: 61M tokens to complete index vs Gemini 2.5 Pro 93M vs Grok 4 full 120M — token-efficiency is now a first-class axis, not just per-token price
- DeepSeek V3.2 (Reasoning) at $0.28/$0.42 per MTok sits on the Pareto frontier
- "Lowest-priced model exceeding Intelligence Index 60" collapsed to Grok 4 Fast at $0.20/$0.50/M
Directional claim: the Pareto frontier shifts toward more capability per dollar every quarter. Reasoning models are converging toward sub-$1/M pricing for index-60+ capability.
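Token efficiency matters because cost-to-run is token count times price. A minimal sketch using the token totals quoted above, under a loud simplifying assumption: every reported token is billed at the model's output rate and input tokens are ignored, so these are illustrations, not Artificial Analysis's published cost figures. The function names are hypothetical.

```python
def cost_to_run(output_tokens_m, output_price, input_tokens_m=0.0, input_price=0.0):
    """USD to run an eval suite: token counts in millions times USD per 1M tokens."""
    return output_tokens_m * output_price + input_tokens_m * input_price


def breakeven_price(ref_tokens_m, ref_price, other_tokens_m):
    """Per-1M price at which a model using other_tokens_m tokens costs the same
    as a reference model using ref_tokens_m tokens at ref_price."""
    return ref_tokens_m * ref_price / other_tokens_m


# Illustrative only (assumption: all index tokens billed at output rates).
grok_4_fast = cost_to_run(61, 0.50)       # 61M tokens at $0.50/M
gemini_2_5_pro = cost_to_run(93, 10.00)   # 93M tokens at $10/M (<200K prompt tier)
# A model needing 120M tokens (Grok 4 full's count) must price below this
# per 1M tokens to undercut Grok 4 Fast on the same suite.
grok_4_full_breakeven = breakeven_price(61, 0.50, 120)
print(grok_4_fast, gemini_2_5_pro, round(grok_4_full_breakeven, 3))
```

A model that uses twice the tokens has to be more than twice as cheap per token to win on total cost, which is why token efficiency now sits alongside sticker price.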
Epoch AI (the most rigorous source)
Methodology: lowest-priced model exceeding capability threshold on six benchmarks (MMLU, GPQA Diamond, MATH-500, MATH Level 5, HumanEval, Chatbot Arena Elo). Log-linear regression. 3:1 input/output weighting.
- Median decline: ~50x/year across benchmarks
- Range: 9x to 900x/year depending on task
- GPT-4-level performance on PhD-level science (GPQA Diamond): 40x/year
- Pre-training compute efficiency doubles every 7.6 months
- Inference cost at fixed performance: halving every 2 months
Critical caveat from Epoch: "The fastest price drops in that range have occurred in the past year, so it's less clear that those will persist." Translation: 2024-2025 acceleration may be price-war dynamics, not Moore's-Law durability. Most epistemically honest source on this.
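The log-linear method reduces to regressing log price on date and reading the annual fold-decline off the slope. A minimal sketch with plain least squares, not Epoch's code. The price series is hypothetical, built to fall exactly 10x/year so the fit can be checked. Real use would first restrict to the cheapest model above a fixed capability threshold at each date, because mixing capability levels (as the budget-tier table does) distorts the slope.

```python
import math


def annual_decline_factor(years, prices):
    """OLS fit of log10(price) = a + b * year; returns the per-year fold-decline 10**(-b)."""
    if len(years) != len(prices) or len(years) < 2:
        raise ValueError("need at least two (year, price) pairs")
    logs = [math.log10(p) for p in prices]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx  # log10 change in price per year (negative when prices fall)
    return 10 ** (-slope)


# Hypothetical capability-matched series of 3:1 blended prices, built to fall 10x/year.
years = [2023.25, 2023.75, 2024.25, 2024.75, 2025.25]
prices = [40.0 * 10 ** -(y - 2023.25) for y in years]
print(round(annual_decline_factor(years, prices), 2))
```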
Stanford AI Index 2024-2026
Famous chart: GPT-3.5-equivalent inference cost fell from $20.00/MTok (Nov 2022) to $0.07/MTok (Oct 2024), ~280x in 22 months or roughly 22x/year geometric. The 2026 report extends the trajectory.
- Hardware cost declining ~30%/year
- Energy efficiency improving ~40%/year
- Open-weight vs closed gap narrowed from 8% to 1.7% in one year
a16z — "LLMflation" (Appenzeller, Nov 2024)
Most conservative published estimate. 10x/year for equivalent capability. Anchor: GPT-3 ($60/M, MMLU 42, Nov 2021) → Llama 3.2 3B ($0.06/M, same MMLU, 2024) = 1,000x in 3 years. Six drivers: GPU economics, FP16→FP4 quantization, software optimization, smaller models, better tuning, open-source pressure.
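Turning a multi-month total into a per-year rate is a geometric annualization: factor^(12/months). A minimal sketch applying it to the Stanford and a16z endpoints and to Epoch's 2-month halving time; the helper names are illustrative and the inputs are the figures quoted in this part.

```python
def annualized_factor(total_factor, months):
    """Convert a total fold-decline over `months` into an equivalent per-year factor."""
    return total_factor ** (12.0 / months)


def halving_to_annual(halving_months):
    """Per-year fold-decline implied by a price that halves every `halving_months`."""
    return 2.0 ** (12.0 / halving_months)


stanford = annualized_factor(20.00 / 0.07, 22)  # ~285x over 22 months
a16z = annualized_factor(60.0 / 0.06, 36)        # 1,000x over 3 years
epoch_halving = halving_to_annual(2)             # halving every 2 months
print(f"Stanford {stanford:.1f}x/yr, a16z {a16z:.1f}x/yr, Epoch halving {epoch_halving:.0f}x/yr")
```

On these inputs the Stanford endpoints work out to about 22x/year, a16z to exactly 10x/year, and a 2-month halving time to 2^6 = 64x/year.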
Consensus
Across all five sources, central tendency reconciles to:
- 10x/year is the conservative, robust, MMLU-anchored claim
- 50x/year is the geometric central tendency across most benchmarks
- 900x/year is the outlier for tasks where a small model just barely crossed a threshold
- Honest planning number: ~10x/year for equivalent capability, decelerating toward 3-5x/year as easy compression exhausts (2027-2028)
Part 4: The Cost Stack and Hardware Velocity
What's driving the decline beneath the retail price.
Marginal cost of a token (GB200 NVL72 reference)
- GPU compute: 50-60% (B200 sustains ~60k tokens/sec on gpt-oss, ~$0.02/M at NVIDIA reference config)
- HBM memory + bandwidth: 20-25% (decode phase is memory-bound, not compute-bound)
- Networking: 5-10% (NVLink at 130 TB/s aggregate in GB200 NVL72; 260 TB/s in the next-generation Vera Rubin NVL72)
- Energy + cooling: 10-15% (PUE 1.09-1.15 with liquid cooling vs 1.56 industry avg)
- Datacenter shell + software overhead: 5-10%
GB200 NVL72 delivers ~10x the tokens/watt of Hopper for MoE workloads.
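Serving cost per token is hourly accelerator cost divided by hourly throughput. A minimal sketch, assuming one accelerator at a sustained rate; the function names are hypothetical. The ~60k tokens/s and ~$0.02/M figures come from the compute bullet above. The gross-up step is only meaningful if that $0.02/M covers GPU compute alone, which the source does not state, so treat it as a hypothetical.

```python
SECONDS_PER_HOUR = 3600
TOKENS_PER_MILLION = 1_000_000


def cost_per_million_tokens(accelerator_hour_usd, tokens_per_second):
    """USD per 1M generated tokens for one accelerator at a sustained throughput."""
    millions_per_hour = tokens_per_second * SECONDS_PER_HOUR / TOKENS_PER_MILLION
    return accelerator_hour_usd / millions_per_hour


def implied_hourly_usd(cost_per_million, tokens_per_second):
    """Inverse: hourly accelerator cost consistent with a quoted $/1M tokens."""
    return cost_per_million * tokens_per_second * SECONDS_PER_HOUR / TOKENS_PER_MILLION


def gross_up(compute_cost_per_million, compute_share):
    """Scale a compute-only $/1M figure to the whole cost stack, given compute's share."""
    if not 0 < compute_share <= 1:
        raise ValueError("compute_share must be in (0, 1]")
    return compute_cost_per_million / compute_share


# ~60k tokens/s at ~$0.02/M implies roughly $4.32 per accelerator-hour (216M tokens/hour).
hourly = implied_hourly_usd(0.02, 60_000)
# Hypothetical: if $0.02/M were GPU compute only, at a 55% share of the stack.
stack_total = gross_up(0.02, 0.55)
print(round(hourly, 2), round(stack_total, 4))
```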
Hardware velocity
| Generation | Year | Inference gain vs prior (TPS/W unless noted) |
|---|---|---|
| H100 → H200 | 2023→2024 | 1.4-1.7x |
| H200 → B200 | 2024→2025 | 3-5x dense, up to 10x MoE |
| B200 → GB300 NVL72 | 2025 | 1.5x cost/token improvement at long context |
| GB300 → Rubin NVL72 | 2026 | ~10x throughput/W |
| Rubin → Rubin Ultra | 2027 | +3.5x perf/W over B300 |
| Rubin Ultra → Feynman | 2028 | TBD |
Hardware alone: 2-3x/year sustained perf-per-watt through 2028. Beyond that, gains compress as 2nm/A14 nodes hit physical walls and HBM stack height tops out.
Algorithmic efficiency
Larger contributor than hardware.
- Epoch: algorithmic cost at fixed performance halving every 2 months (2^6 = 64x/year)
- Densing Law (Tsinghua/Mianbi, ratified by Meta 2026): model density doubles every 3.3 months
- Stack of techniques: FP8 + FlashAttention-3 + continuous batching + speculative decoding = 5-8x cost-efficiency over naive FP16 on the same H100
- DeepSeek V3: $5.576M for the final training run (2.79M H800 GPU-hours at an assumed $2/GPU-hour). R1 distillation transfers reasoning to dense smaller models at ~5% of conventional cost. Single biggest algorithmic-efficiency data point of the cycle.
Decomposition: ~2-3x/year from hardware, ~3-5x/year from algorithms = headline ~10x/year. The other ~5x in the most aggressive numbers is competitive subsidy.
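These drivers compound multiplicatively, so the decomposition is a product rather than a sum, and "half" of a multiplicative decline is most naturally read in log terms. A minimal sketch using the paragraph's own factors; the function names are hypothetical.

```python
import math


def combined_factor(*annual_factors):
    """Independent per-year fold-declines compound multiplicatively."""
    result = 1.0
    for factor in annual_factors:
        result *= factor
    return result


def log_share(component, total):
    """Fraction of a total fold-decline attributable to one component, in log terms."""
    return math.log(component) / math.log(total)


low = combined_factor(2, 3)          # hardware 2x * algorithms 3x = 6x/year
high = combined_factor(3, 5)         # hardware 3x * algorithms 5x = 15x/year
aggressive = combined_factor(10, 5)  # ~10x real compression * ~5x subsidy = 50x/year
real_share = log_share(10, aggressive)
print(low, high, aggressive, round(real_share, 2))
```

In log terms, ~10x of real compression accounts for about 59% of a 50x/year headline, with subsidy supplying the rest.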
Part 5: The Subsidy Gap
How much of the retail decline is real cost compression vs labs selling below cost?
- OpenAI 2025: ~$8.4B inference COGS against $3.7B revenue, i.e. ~$2.27 of inference cost per $1 of revenue (a loss of ~$1.27 per $1). 2026 projected: $14B loss on $13B revenue at a 33% gross margin. Profitability target: 2029.
- Anthropic April 2026: Amazon expanded its investment to $25B ($5B immediately, $20B gated on milestones) in exchange for a >$100B, 10-year Anthropic commitment to AWS, plus 5GW of new Trainium capacity. Trainium runs 30-40% below H100 cost. Anthropic run-rate revenue passed $30B by April 2026 (up from ~$9B at end-2025). That reframes the deal from "survival" to "buildout."
- Microsoft-OpenAI precedent: subsidies historically last ~5 years before lab leverage forces renegotiation. Anthropic is ~3 years in.
Reconciliation: in log terms, roughly half of the aggressive headline retail decline is real cost compression (~10x/year of the ~50x/year figure); the rest is subsidy. When subsidies normalize in 2027-2028 (forced by IPO timing or VC return requirements), the retail curve flattens even though the cost curve keeps falling.
The frontier-vs-commodity divergence makes more sense in this frame: subsidy gets allocated to whichever tier maximizes share capture. Commodity tier is where price wars are happening; frontier tier doesn't need subsidy because it's rationed.
Part 6: Frontier vs Commodity — The Real Story
The headline number ("token prices fell 99%") hides the most important structural shift.
Commodity tier (anything that was frontier 18-24 months ago):
- Falling 10-50x/year
- Approaches marginal serving cost
- GPT-4 quality at ~$0.40/M in 2026 vs ~$20/M in late 2022 (50x in 3 years)
- Stanford's flagship chart applies here
Frontier tier (current best reasoning models):
- Sticker prices are flat-to-rising; effective prices are clearly rising
- GPT-5.5 Pro: ~$30/$180 per M (sticker explicitly higher than GPT-5)
- GPT-5.2 Pro: $21/$168 per M
- Opus 4.7: $5/$25 sticker held — but Anthropic shipped a new tokenizer at 4.7 that uses up to 35% more tokens for the same fixed text (Anthropic pricing-page disclosure; 1.0x-1.35x depending on content type, with code/structured-data/non-English at the high end). Effective per-task cost rises 0-35% on the same workload despite the unchanged sticker. Extended thinking tokens bill at standard output rate, so higher reasoning effort compounds the tokenizer effect.
- "Fast mode" beta on Opus 4.6+ adds an explicit premium tier at $30/$150 per M (6x standard) — not the default, but it's a price-discrimination knob that didn't exist on prior Opus generations
- Reasoning models burn 60-120M tokens to complete the Intelligence Index — even at "cheap" per-token pricing, total cost-to-run is rising because token consumption per task is rising
- Training capex per Epoch: growing 2-3x/year, $1B+ runs by 2027
Net: the spread between cheapest-capable and frontier is widening, not compressing. This is the inverse of the popular "AI is being commoditized" narrative. The trailing edge is being commoditized; the leading edge is becoming a rationed product.
This matches the thesis in the-data-center-convergence: capex concentration at the frontier, ratepayer socialization at the commodity end, divergent unit economics across the curve.
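The Opus 4.7 bullet above describes two effects that stack on an unchanged sticker: tokenizer inflation and output-rate thinking tokens. A minimal per-task sketch, assuming the 1.0-1.35x tokenizer multiplier applies uniformly to input, output, and thinking tokens (in practice it varies by content type). The token counts and `task_cost_usd` are hypothetical; prices are the $5/$25 standard and $30/$150 fast-mode rates quoted above.

```python
def task_cost_usd(input_tokens, output_tokens, thinking_tokens=0,
                  input_price=5.0, output_price=25.0, tokenizer_multiplier=1.0):
    """USD for one task on a per-1M-token price sheet.

    Token counts are measured with the previous tokenizer; tokenizer_multiplier
    rescales them (assumed uniform). Thinking tokens bill at the output rate.
    """
    billed_in = input_tokens * tokenizer_multiplier
    billed_out = (output_tokens + thinking_tokens) * tokenizer_multiplier
    return (billed_in * input_price + billed_out * output_price) / 1_000_000


# Hypothetical task: 10k input, 2k visible output, 4k thinking tokens (old-tokenizer counts).
old = task_cost_usd(10_000, 2_000, thinking_tokens=4_000)
new = task_cost_usd(10_000, 2_000, thinking_tokens=4_000, tokenizer_multiplier=1.35)
fast = task_cost_usd(10_000, 2_000, thinking_tokens=4_000, input_price=30.0,
                     output_price=150.0, tokenizer_multiplier=1.35)
print(f"${old:.3f} -> ${new:.3f} (+{100 * (new / old - 1):.0f}%), fast mode ${fast:.3f}")
```

Raising reasoning effort grows `thinking_tokens`, which the multiplier then inflates, so the two effects compound on the same sticker price.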
Part 7: Forecasts 2026-2030
Gartner (March 2026)
Inference cost on a 1T-parameter LLM falls >90% by 2030 vs 2025. LLMs in 2030 will be ~100x more cost-efficient than 2022 equivalents. Caveat: agentic workloads consume 5-30x more tokens per task, so total spend rises.
Epoch AI extrapolation
If halving every 2 months holds, 2026→2030 implies a 2^24 ≈ 1.7×10^7 fixed-quality cost decline; a ~10^12 decline would only follow from Epoch's 900x/year upper range. Realistic ceiling is ~100-1000x as physical/algorithmic floors bind. Epoch itself flags the trend "may not persist."
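The compounding behind those figures, assuming four full years at each constant rate:

$$
\begin{aligned}
\text{halving every 2 months:}\quad & \left(2^{12/2}\right)^{4} = 64^{4} = 2^{24} \approx 1.7\times 10^{7} \\
\text{Epoch upper range (900x/year):}\quad & 900^{4} \approx 6.6\times 10^{11} \approx 10^{12}
\end{aligned}
$$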
Capital markets
BofA/Goldman: $1.6-1.7T cumulative datacenter capex by 2030. Omdia: capex peak ~2027, possible drop in 2028 "bubble scenario." NVIDIA's Jensen Huang pulled forward the $1T annual buildout to 2028 from 2030.
Credible 2026-2030 range for commodity-tier cost per M tokens
| Scenario | Decline by 2030 | Driver |
|---|---|---|
| Bull (Epoch trend holds) | 100-1000x | No physical walls, continued open-source pressure |
| Base (Gartner) | ~10x (90% drop) | Hardware + algorithmic compounding, partial subsidy unwind |
| Bear (capex peak + energy shock + subsidy unwind) | 3-5x through 2028, flat after | Hormuz / Iran shock + HBM supply + retrenchment |
Time-to-equivalence for current frontier reaching current commodity prices
GPT-4o-mini current commodity price: ~$0.15/$0.60 per M. GPT-5-class frontier reasoning: ~$10-15 / $50-75 per M (input/output). Spread: roughly 50-100x.
- At Epoch median 50x/year: mid-to-late 2027
- At a16z 10x/year: early-to-mid 2028
- At decelerating 5x/year: late 2028 to early 2029
Planning estimate: GPT-5-equivalent at $0.15-0.30/M blended by mid-2027 to early-2028.
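The dates above follow from solving spread = rate^years, i.e. years = ln(spread) / ln(rate). A minimal sketch assuming a constant annual decline and the 50-100x spread quoted above; `years_to_close` is a hypothetical helper, and calendar dates come from adding the result to May 2026.

```python
import math


def years_to_close(spread, annual_decline):
    """Years for a price `spread` times higher to fall to today's cheaper price,
    assuming a constant per-year fold-decline."""
    if spread <= 1:
        return 0.0
    if annual_decline <= 1:
        raise ValueError("annual_decline must exceed 1 for the gap to close")
    return math.log(spread) / math.log(annual_decline)


for rate in (50, 10, 5):
    lo, hi = years_to_close(50, rate), years_to_close(100, rate)
    print(f"{rate}x/year: {lo:.1f}-{hi:.1f} years after May 2026")
```

Deceleration matters more than the exact spread: halving the spread from 100x to 50x saves under half a year at 5x/year.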
Part 8: Counter-Forces That Could Stall or Reverse the Curve
- Energy shock. World Bank: energy prices up 24% in 2026, highest since 2022. Brent at ~$154 if Hormuz stays closed 12 weeks. Power becomes the binding constraint, not silicon. See hormuz-to-ai-repricing-causal-chain.
- Capex peak 2027-2028. Omdia: 2027 is the critical year because revenue commitments are due. If inference revenue doesn't catch up to capex, 2028 sees retrenchment.
- Subsidy unwind. OpenAI on path to profitability only in 2029. IPO or down-round forces price normalization. Anthropic's $100B AWS commit is a floor regardless of inference unit economics.
- Reasoning + agentic token explosion. Reasoning multiplies consumption 3-7x; agentic 5-30x per task. A simple classification costs $0.01 in chat, $0.10-0.50 as agentic workflow. Per-token cost falls, total spend rises.
- HBM supply. SK hynix sold out of 2026 HBM. Micron same. Supply rationing is a floor under inference COGS through 2027.
- Geopolitical/subsidy distortion. Chinese-model pricing (10-20x below US frontier) reflects subsidy + state strategy, not just cost structure. Removing that lever changes the curve materially.
What This Analysis Can't Resolve
- Whether the algorithmic-efficiency curve breaks before the hardware curve does. Densing Law's 3.3-month doubling is extraordinary and has no clear physical analog — it might stall at any time.
- Whether reasoning-token consumption growth permanently swamps per-token deflation. If a PhD-level task costs 100M tokens at frontier reasoning, "intelligence per dollar" hasn't changed even if per-token price drops.
- Whether the frontier-vs-commodity divergence is structural or cyclical. Could be permanent (rationed compute = rationed top capability) or could collapse if open-weight catches up to closed-frontier (DeepSeek-V3.2 already on the Pareto frontier).
- Subsidy timing. Microsoft-OpenAI precedent says ~5 years; if true, retail price normalization hits 2028-2029. But the precedent may not fully bind either lab's economics.
- HBM as the chokepoint. If HBM4/4e supply doesn't expand fast enough, the hardware velocity assumption breaks regardless of algorithmic progress.
Sources
Primary pricing data
- Anthropic Pricing (contains the new-tokenizer disclosure for Opus 4.7) and announcement pages for Opus 4.7, Opus 4.5, Haiku 4.5, Sonnet 4.6, 3.7 Sonnet
- Finout: Opus 4.7 Pricing — The Real Cost Story
- byteiota: 35% Cost Inflation Hits API Users
- OpenAI API pricing and model release notes; GPT-4.1; GPT-5.2; o1-preview; 4o-mini
- Google Gemini API pricing; Gemini 1.5 Pro price cut; 2.5 Flash-Lite GA; Gemini 3 Pro docs
Intelligence-per-dollar studies
- OpenRouter State of AI 2025; OpenRouter rankings; ArXiv 2601.10088
- Artificial Analysis Intelligence Index; methodology
- Epoch AI inference price trends; persistence question; inference economics
- Stanford AI Index 2025; 2026
- a16z LLMflation
- Simon Willison LLM pricing tag; llm-prices repo
- Latent Space Jeff Dean interview
- The Price of Progress (arXiv 2511.23455)
Cost stack and hardware
- NVIDIA Blackwell InferenceMAX
- SemiAnalysis InferenceMAX
- Silicon Data LLM cost per token
- Introl HBM evolution
- NVIDIA Rubin announcement; Vera Rubin NVL72
- Densing Law (arXiv)
- DeepSeek inference cost explained
Subsidy and unit economics
- Anthropic-Amazon $25B expansion; About Amazon
- OpenAI 2026 forecast / $14B burn
- Gartner 2030 inference cost forecast
- BofA $1.7T datacenter forecast
- DCD $1.6T by 2030