Token Cost Velocity 2023-2026: Three Labs, Three Tiers, and Where the Curve Is Headed

Builds-on: ai-token-economics-and-open-source-competition Related: anthropic-unit-economics-and-the-power-user-loss, anthropic-subsidy-stress-test, the-efficiency-counterthesis, the-data-center-convergence, ai-infrastructure-endgame-indicators

The Question

How fast has frontier-LLM pricing actually fallen since 2023? Where is the work-outcome-per-dollar ratio heading? Are prices broadly compressing, or is there a structural divergence between commodity and frontier tiers that the headline "10x/year" narrative misses?

This is the head-to-head reference doc. Three providers (Anthropic, OpenAI, Google), three capability tiers (Haiku-class, Sonnet-class, Opus-class), three years (May 2023 → May 2026).

TL;DR

Commodity tier (Haiku/4o-mini/Flash): ~99% reduction in 3 years. $2/M → $0.05/M for the cheapest credible model. Approaching marginal-serving-cost floor.
Workhorse tier (Sonnet/GPT-5/Gemini Pro): ~96% reduction. $30/M → $1.25/M. Anthropic anchored Sonnet at $3/$15 for three years while quadrupling capability.
Frontier tier (Opus/GPT-5 Pro/Gemini 3 Pro): Mixed. Opus held $15/$75 from March 2024 through Aug 2025, cut to $5/$25 in Nov 2025 (67% drop). Opus 4.5, 4.6, and 4.7 all hold the $5/$25 sticker — but Opus 4.7 shipped a new tokenizer that uses up to 35% more tokens for the same fixed text (Anthropic's own pricing-page disclosure), and GPT-5.5 Pro pushed the top-reasoning sticker back to ~$30/$180. The frontier curve is flat-on-sticker but rising on effective per-task cost.
Headline velocity: Median 50x/year (Epoch); a16z conservative 10x/year; Stanford ~280x over 22 months at GPT-3.5 tier. The honest planning number is ~10x/year for equivalent capability, decelerating toward 3-5x/year as easy compression exhausts.
The structural finding: Commodity and frontier curves have decoupled. Commodity is in textbook deflation; frontier is becoming rationed and is now flat-to-rising. The popular "AI is being commoditized" narrative is true for the trailing edge and false for the leading edge.
Floor estimate: GPT-4-equivalent capability at $0.05-0.10/M blended by end-2026 (already there for some workloads via Flash-Lite / nano tiers). The cheapest model exceeding Intelligence Index 60 has collapsed to ~$0.20/$0.50 per MTok (Grok 4 Fast).
Subsidy: Roughly half the retail decline is real cost compression (hardware + algorithms); the other half is providers selling below cost to capture share. OpenAI burning $14B in 2026 against $13B revenue. Anthropic's path is different (Trainium take-or-pay) but still subsidized.

Part 1: Three-Year Pricing History

Prices in USD per 1M tokens (input / output), standard tier, list pricing at launch. Mid-life price cuts called out separately.

Anthropic Claude

Model	Release	Input / Output	Context	Note
Claude 2	2023-07	$8 / $24	100K	First broadly available API
Claude 2.1	2023-11	$8 / $24	200K	2x context, same price
Claude 3 Haiku	2024-03	$0.25 / $1.25	200K	Cheapest Claude ever
Claude 3 Sonnet	2024-03	$3 / $15	200K	Workhorse anchor
Claude 3 Opus	2024-03	$15 / $75	200K	Frontier anchor
Claude 3.5 Sonnet	2024-06	$3 / $15	200K	Beat Opus at Sonnet price
Claude 3.5 Haiku	2024-11	$1 / $5	200K	4x hike vs Haiku 3 (cut to $0.80/$4 Dec 2024)
Claude 3.7 Sonnet	2025-02	$3 / $15	200K	First hybrid reasoning
Claude Opus 4	2025-05	$15 / $75	200K	Frontier held
Claude Sonnet 4	2025-05	$3 / $15	200K/1M
Claude Opus 4.1	2025-08	$15 / $75	200K	Held
Claude Sonnet 4.5	2025-09	$3 / $15	200K/1M
Claude Haiku 4.5	2025-10	$1 / $5	200K
Claude Opus 4.5	2025-11	$5 / $25	200K/1M	67% Opus-tier price cut in response to GPT-5
Claude Sonnet 4.6	2026-02	$3 / $15	200K/1M
Claude Opus 4.7	2026-04	$5 / $25	200K/1M	New tokenizer uses up to 35% more tokens for fixed text (per Anthropic). Sticker unchanged from 4.5/4.6.

Caching/Batch: cache writes 1.25× input, cache reads 0.1× input (90% off), Batch API 50% off both. Stacks to ~95% off for cached batch workloads.

OpenAI

Model	Release	Input / Output	Context	Note
GPT-4 (8K)	2023-03	$30 / $60	8K	Original frontier
GPT-4 32K	2023-03	$60 / $120	32K	Retired
GPT-3.5 Turbo	2023-03→2024-01	$2/$2 → $0.50/$1.50	16K	Multiple cuts
GPT-4 Turbo	2023-11	$10 / $30	128K	3x cut at DevDay
GPT-4o	2024-05	$5 / $15 → $2.50/$10 (2024-10)	128K
GPT-4o-mini	2024-07	$0.15 / $0.60	128K	Replaced GPT-3.5 Turbo
o1-preview / o1	2024-09/12	$15 / $60	200K	First reasoning
o3-mini	2025-01	$1.10 / $4.40	200K
GPT-4.5	2025-02	$75 / $150	128K	Wound down July 2025
GPT-4.1 family	2025-04	$2/$8, $0.40/$1.60, $0.10/$0.40	1M	Three sizes
o3	2025-04	$10/$40 → $2/$8 (Jun 2025, 80% cut)	200K
o3-pro	2025-06	$20 / $80	200K
GPT-5	2025-08	$1.25 / $10	400K	Frontier price collapse
GPT-5 mini / nano	2025-08	$0.25/$2, $0.05/$0.40	400K
GPT-5.1	2025-11	$1.25 / $10	400K
GPT-5.2	2025-12	$0.875 / $7	400K
GPT-5.2 Pro	2025-12	$21 / $168	400K	New premium tier
GPT-5.4	2026-Q1	$2.50 / $15	1M
GPT-5.5 / Pro	2026-Q2	premium tier ~$30/$180	1M	Frontier sticker reversing

Google Gemini

Model	Release	Input / Output	Context	Note
Gemini 1.0 Pro	2023-12	$0.50 / $1.50	32K
Gemini 1.5 Pro	2024-04	$3.50 / $10.50 (<128K)	1M	Launch
Gemini 1.5 Pro (cut)	2024-10	$1.25 / $5	2M	64% input cut
Gemini 1.5 Flash	2024-05	$0.075 / $0.30	1M
Gemini 1.5 Flash-8B	2024-10	$0.0375 / $0.15	1M	50% under 1.5 Flash
Gemini 2.0 Flash	2024-12	$0.10 / $0.40	1M
Gemini 2.5 Pro	2025-03	$1.25/$10 (<200K), $2.50/$15 (>200K)	1M
Gemini 2.5 Flash	2025-06	$0.30 / $2.50	1M
Gemini 2.5 Flash-Lite	2025-07	$0.10 / $0.40	1M
Gemini 3 Pro	2025-11	$2/$12 (<200K), $4/$18 (>200K)	1M+
Gemini 3.1 Pro	2026-Q1	$2 / $12	1M+

Part 2: Tier-by-Tier Compression

Frontier Tier — Opus / GPT-5 Pro / Gemini Ultra-class

The "absolute top capability money can buy" tier.

Date	Cheapest frontier-capable	Input / Output
2023-03	GPT-4 (8K)	$30 / $60
2024-03	Claude 3 Opus	$15 / $75
2024-09	o1-preview	$15 / $60
2025-04	o3 (launch)	$10 / $40
2025-06	o3 (post-cut)	$2 / $8
2025-08	GPT-5	$1.25 / $10
2025-11	Claude Opus 4.5 / Gemini 3 Pro	$5/$25 or $2/$12
2026-05	Opus 4.7 / GPT-5.5 / Gemini 3.1 Pro	~$2-5 / $12-25
2026-05	GPT-5.5 Pro (top reasoning)	~$30 / $180

Two diverging lines: the median frontier capability dropped ~85-95%. The absolute top reasoning tier (GPT-5.2 Pro, GPT-5.5 Pro, Opus 4.7 with reasoning) is back to 2023 GPT-4 prices or higher and now uses output-weighted tokenization that makes effective price even higher. This is the most important divergence in the entire dataset.

Workhorse Tier — Sonnet / GPT-4o / Gemini Pro-class

Date	Reference model	Input / Output
2023-03	GPT-4	$30 / $60
2023-11	GPT-4 Turbo	$10 / $30
2024-03	Claude 3 Sonnet	$3 / $15
2024-05	GPT-4o (launch)	$5 / $15
2024-06	Claude 3.5 Sonnet	$3 / $15
2024-10	GPT-4o cut / Gemini 1.5 Pro cut	$2.50/$10 ; $1.25/$5
2025-04	GPT-4.1	$2 / $8
2025-08	GPT-5	$1.25 / $10
2026-05	Sonnet 4.6 / GPT-5.4 / Gemini 3.1 Pro	$1.25-3 / $8-15

~96% reduction over 36 months. The notable structural feature: Anthropic anchored Sonnet at $3/$15 for the entire window while shipping Sonnet 3 → 3.5 → 3.7 → 4 → 4.5 → 4.6. Capability went up ~4-6x at constant price; effective intelligence-per-dollar improved that much without a sticker change.

Budget Tier — Haiku / 4o-mini / Flash-class

Date	Reference model	Input / Output
2023-03	GPT-3.5 Turbo	$2 / $2
2023-12	Gemini 1.0 Pro	$0.50 / $1.50
2024-03	Claude 3 Haiku	$0.25 / $1.25
2024-05	Gemini 1.5 Flash	$0.075 / $0.30
2024-07	GPT-4o-mini	$0.15 / $0.60
2024-10	Gemini 1.5 Flash-8B	$0.0375 / $0.15
2025-04	GPT-4.1 nano	$0.10 / $0.40
2025-07	Gemini 2.5 Flash-Lite	$0.10 / $0.40
2025-08	GPT-5 nano	$0.05 / $0.40
2026-05	GPT-5 nano / Flash-8B-equiv	$0.05 / $0.30

~97.5% reduction. Approaching the marginal serving cost of the underlying hardware. This is the tier where deflation looks complete.

Part 3: Intelligence-Per-Dollar Studies

Five sources independently measuring capability-per-dollar trajectory.

OpenRouter (the production-traffic empiricist)

State of AI 2025 (Nov 2025, with a16z, ArXiv 2601.10088). Empirical study of >100T tokens routed through OpenRouter Nov 2024-Nov 2025.

OpenRouter throughput grew 4x YoY: ~5T tokens/week → >20T tokens/week
Programming use case went from 11% to >50% of total throughput in 2025
Open-weight models reached ~1/3 of all tokens by late 2025
Chinese models hit ~45-61% of top-10 token volume by Feb-Apr 2026 (MiMo-V2-Pro, Qwen 3.6, Kimi, MiniMax, DeepSeek). Cited as 10-20x cheaper than US frontier equivalents.
Distribution flattened from near-monopoly (GPT-4 era) to 5-7 model pluralism — no single model >25% share

Directional claim: No durable monopoly on the commodity tier. The price-setter is whoever shipped a cheaper model crossing a capability threshold last quarter. Pluralism replaced the GPT-4 hegemony.

Artificial Analysis (the benchmark-normalized scorer)

Intelligence Index v4.0 (Jan 2026): equal-weighted average across Agents, Coding, Scientific Reasoning, General. Four pillars × 25% each. 10 evals (GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity's Last Exam, GPQA Diamond, CritPt). Explicitly dropped MMLU-Pro, AIME 2025, LiveCodeBench (saturated/contaminated).

"Cost to Run Intelligence Index" metric: Grok 4.3 ~$395, GLM-5 ~$547
Grok 4 Fast: 61M tokens to complete index vs Gemini 2.5 Pro 93M vs Grok 4 full 120M — token-efficiency is now a first-class axis, not just per-token price
DeepSeek V3.2 (Reasoning) at $0.28/$0.42 per MTok sits on the Pareto frontier
"Lowest-priced model exceeding Intelligence Index 60" collapsed to Grok 4 Fast at $0.20/$0.50/M

Directional claim: Pareto frontier moves down-and-right every quarter. Reasoning models converging toward sub-$1/M for index-60+ capability.

Epoch AI (the most rigorous source)

Methodology: lowest-priced model exceeding capability threshold on six benchmarks (MMLU, GPQA Diamond, MATH-500, MATH Level 5, HumanEval, Chatbot Arena Elo). Log-linear regression. 3:1 input/output weighting.

Median decline: ~50x/year across benchmarks
Range: 9x to 900x/year depending on task
GPT-4-level performance on PhD-level science (GPQA Diamond): 40x/year
Pre-training compute efficiency doubles every 7.6 months
Cost to inference at fixed performance: halving every 2 months

Critical caveat from Epoch: "The fastest price drops in that range have occurred in the past year, so it's less clear that those will persist." Translation: 2024-2025 acceleration may be price-war dynamics, not Moore's-Law durability. Most epistemically honest source on this.

Stanford AI Index 2024-2026

Famous chart: GPT-3.5-equivalent inference cost from $20.00/MTok (Nov 2022) to $0.07/MTok (Oct 2024) — ~280x in 22 months, roughly 50x/year geometric. 2026 report extends the trajectory.

Hardware cost declining ~30%/year
Energy efficiency improving ~40%/year
Open-weight vs closed gap narrowed from 8% to 1.7% in one year

a16z — "LLMflation" (Appenzeller, Nov 2024)

Most conservative published estimate. 10x/year for equivalent capability. Anchor: GPT-3 ($60/M, MMLU 42, Nov 2021) → Llama 3.2 3B ($0.06/M, same MMLU, 2024) = 1,000x in 3 years. Six drivers: GPU economics, FP16→FP4 quantization, software optimization, smaller models, better tuning, open-source pressure.

Consensus

Across all five sources, central tendency reconciles to:

10x/year is the conservative, robust, MMLU-anchored claim
50x/year is the geometric central tendency across most benchmarks
900x/year is the outlier for tasks where a small model just barely crossed a threshold
Honest planning number: ~10x/year for equivalent capability, decelerating toward 3-5x/year as easy compression exhausts (2027-2028)

Part 4: The Cost Stack and Hardware Velocity

What's driving the decline beneath the retail price.

Marginal cost of a token (GB200 NVL72 reference)

GPU compute: 50-60% (B200 sustains ~60k tokens/sec on gpt-oss, ~$0.02/M at NVIDIA reference config)
HBM memory + bandwidth: 20-25% (decode phase is memory-bound, not compute-bound)
Networking: 5-10% (NVLink 260 TB/s in Vera Rubin NVL72)
Energy + cooling: 10-15% (PUE 1.09-1.15 with liquid cooling vs 1.56 industry avg)
Datacenter shell + software overhead: 5-10%

GB200 NVL72 delivers ~10x the tokens/watt of Hopper for MoE workloads.

Hardware velocity

Generation	Year	Inference TPS/W vs prior
H100 → H200	2023→2024	1.4-1.7x
H200 → B200	2024→2025	3-5x dense, up to 10x MoE
B200 → GB300 NVL72	2025	1.5x cost/token improvement at long context
GB300 → Rubin NVL72	2026	~10x throughput/W
Rubin → Rubin Ultra	2027	+3.5x perf/W over B300
Rubin Ultra → Feynman	2028	TBD

Hardware alone: 2-3x/year sustained perf-per-watt through 2028. Beyond that, gains compress as 2nm/A14 nodes hit physical walls and HBM stack height tops out.

Algorithmic efficiency

Larger contributor than hardware.

Epoch: algorithmic cost-at-fixed-performance halving every 2 months (~100x/year)
Densing Law (Tsinghua/Mianbi, ratified by Meta 2026): model density doubles every 3.3 months
Stack of techniques: FP8 + Flash Attention 3 + continuous batching + speculative decoding = 5-8x cost-efficiency over naive FP16 on the same H100
DeepSeek V3: $5.576M pre-train, 2.79M GPU-hours on H800. R1 distillation transfers reasoning to dense smaller models at ~5% of conventional cost. Single biggest algorithmic-efficiency data point of the cycle.

Decomposition: ~2-3x/year from hardware, ~3-5x/year from algorithms = headline ~10x/year. The other ~5x in the most aggressive numbers is competitive subsidy.

Part 5: The Subsidy Gap

How much of the retail decline is real cost compression vs labs selling below cost?

OpenAI 2025: ~$8.4B inference COGS against $3.7B revenue. Losing ~$2 per $1 of inference revenue. 2026 projected $14B loss on $13B revenue. 33% gross margin. Profitability target: 2029.
Anthropic April 2026: Amazon expanded deal to $25B in ($5B immediate, $20B milestone-gated) for >$100B Anthropic commit over 10 years on AWS + 5GW new Trainium capacity. Trainium runs 30-40% below H100 cost. Anthropic run-rate revenue passed $30B by April 2026 (up from ~$9B end-2025). Reframes the deal from "survival" to "buildout."
Microsoft-OpenAI precedent: subsidies historically last ~5 years before lab leverage forces renegotiation. Anthropic is ~3 years in.

Reconciliation: roughly half of the headline 10x/year retail decline is real cost compression; the other half is subsidy. When subsidies normalize in 2027-2028 (forced by IPO timing or VC return requirements), the retail curve flattens even though the cost curve keeps falling.

The frontier-vs-commodity divergence makes more sense in this frame: subsidy gets allocated to whichever tier maximizes share capture. Commodity tier is where price wars are happening; frontier tier doesn't need subsidy because it's rationed.

Part 6: Frontier vs Commodity — The Real Story

The headline number ("token prices fell 99%") hides the most important structural shift.

Commodity tier (anything that was frontier 18-24 months ago):

Falling 10-50x/year
Approaches marginal serving cost
GPT-4 quality at ~$0.40/M in 2026 vs ~$20/M in late 2022 (50x in 3 years)
Stanford's flagship chart applies here

Frontier tier (current best reasoning models):

Sticker prices are flat-to-rising; effective prices are clearly rising
GPT-5.5 Pro: ~$30/$180 per M (sticker explicitly higher than GPT-5)
GPT-5.2 Pro: $21/$168 per M
Opus 4.7: $5/$25 sticker held — but Anthropic shipped a new tokenizer at 4.7 that uses up to 35% more tokens for the same fixed text (Anthropic pricing-page disclosure; 1.0x-1.35x depending on content type, with code/structured-data/non-English at the high end). Effective per-task cost rises 0-35% on the same workload despite the unchanged sticker. Extended thinking tokens bill at standard output rate, so higher reasoning effort compounds the tokenizer effect.
"Fast mode" beta on Opus 4.6+ adds an explicit premium tier at $30/$150 per M (6x standard) — not the default, but it's a price-discrimination knob that didn't exist on prior Opus generations
Reasoning models burn 60-120M tokens to complete the Intelligence Index — even at "cheap" per-token pricing, total cost-to-run is rising because token consumption per task is rising
Training capex per Epoch: growing 2-3x/year, $1B+ runs by 2027

Net: the spread between cheapest-capable and frontier is widening, not compressing. This is the inverse of the popular "AI is being commoditized" narrative. The trailing edge is being commoditized; the leading edge is becoming a rationed product.

This matches the the-data-center-convergence thesis — capex concentration at the frontier, ratepayer socialization at the commodity, divergent unit economics across the curve.

Part 7: Forecasts 2026-2030

Gartner (March 2026)

Inference cost on a 1T-parameter LLM falls >90% by 2030 vs 2025. LLMs in 2030 will be ~100x more cost-efficient than 2022 equivalents. Caveat: agentic workloads consume 5-30x more tokens per task, so total spend rises.

Epoch AI extrapolation

If halving-every-2-months holds, 2026→2030 implies ~10^12 fixed-quality cost decline. Realistic ceiling is ~100-1000x as physical/algorithmic floors bind. Epoch itself flags the trend "may not persist."

Capital markets

BofA/Goldman: $1.6-1.7T cumulative datacenter capex by 2030. Omdia: capex peak ~2027, possible drop in 2028 "bubble scenario." Jensen pulled forward $1T annual buildout to 2028 from 2030.

Credible 2026-2030 range for commodity-tier cost per M tokens

Scenario	Decline by 2030	Driver
Bull (Epoch trend holds)	100-1000x	No physical walls, continued open-source pressure
Base (Gartner)	~10x (90% drop)	Hardware + algorithmic compounding, partial subsidy unwind
Bear (capex peak + energy shock + subsidy unwind)	3-5x through 2028, flat after	Hormuz / Iran shock + HBM supply + retrenchment

Time-to-equivalence for current frontier reaching current commodity prices

GPT-4o-mini current commodity price: ~$0.15/$0.60 per M. GPT-5-class frontier reasoning: ~$10-15/$50-75 per M blended. Spread: 50-100x.

At Epoch median 50x/year: mid-to-late 2027
At a16z 10x/year: early-to-mid 2028
At decelerating 5x/year: 2029

Planning estimate: GPT-5-equivalent at $0.15-0.30/M blended by mid-2027 to early-2028.

Part 8: Counter-Forces That Could Stall or Reverse the Curve

Energy shock. World Bank: energy prices up 24% in 2026, highest since 2022. Brent at ~$154 if Hormuz stays closed 12 weeks. Power becomes the binding constraint, not silicon. See hormuz-to-ai-repricing-causal-chain.
Capex peak 2027-2028. Omdia: 2027 is the critical year because revenue commitments are due. If inference revenue doesn't catch up to capex, 2028 sees retrenchment.
Subsidy unwind. OpenAI on path to profitability only in 2029. IPO or down-round forces price normalization. Anthropic's $100B AWS commit is a floor regardless of inference unit economics.
Reasoning + agentic token explosion. Reasoning multiplies consumption 3-7x; agentic 5-30x per task. A simple classification costs $0.01 in chat, $0.10-0.50 as agentic workflow. Per-token cost falls, total spend rises.
HBM supply. SK hynix sold out of 2026 HBM. Micron same. Supply rationing is a floor under inference COGS through 2027.
Geopolitical/subsidy distortion. Chinese-model pricing (10-20x below US frontier) reflects subsidy + state strategy, not just cost structure. Removing that lever changes the curve materially.

What This Analysis Can't Resolve

Whether the algorithmic-efficiency curve breaks before the hardware curve does. Densing Law's 3.3-month doubling is extraordinary and has no clear physical analog — it might stall at any time.
Whether reasoning-token consumption growth permanently swamps per-token deflation. If a Phd-level task costs 100M tokens at frontier reasoning, "intelligence per dollar" hasn't changed even if per-token price drops.
Whether the frontier-vs-commodity divergence is structural or cyclical. Could be permanent (rationed compute = rationed top capability) or could collapse if open-weight catches up to closed-frontier (DeepSeek-V3.2 already on the Pareto frontier).
Subsidy timing. Microsoft-OpenAI precedent says ~5 years; if true, retail price normalization hits 2028-2029. But neither lab has fully precedent-binding economics.
HBM as the chokepoint. If HBM4/4e supply doesn't expand fast enough, the hardware velocity assumption breaks regardless of algorithmic progress.

Sources

Primary pricing data

Anthropic Pricing (contains the new-tokenizer disclosure for Opus 4.7) and announcement pages for Opus 4.7, Opus 4.5, Haiku 4.5, Sonnet 4.6, 3.7 Sonnet
Finout: Opus 4.7 Pricing — The Real Cost Story
byteiota: 35% Cost Inflation Hits API Users
OpenAI API pricing and model release notes; GPT-4.1; GPT-5.2; o1-preview; 4o-mini
Google Gemini API pricing; Gemini 1.5 Pro price cut; 2.5 Flash-Lite GA; Gemini 3 Pro docs

Token Cost Velocity 2023-2026: Three Labs, Three Tiers, and Where the Curve Is Headed

Token Cost Velocity 2023-2026: Three Labs, Three Tiers, and Where the Curve Is Headed

The Question

TL;DR

Part 1: Three-Year Pricing History

Anthropic Claude

OpenAI

Google Gemini

Part 2: Tier-by-Tier Compression

Frontier Tier — Opus / GPT-5 Pro / Gemini Ultra-class

Workhorse Tier — Sonnet / GPT-4o / Gemini Pro-class

Budget Tier — Haiku / 4o-mini / Flash-class

Part 3: Intelligence-Per-Dollar Studies

OpenRouter (the production-traffic empiricist)

Artificial Analysis (the benchmark-normalized scorer)

Epoch AI (the most rigorous source)

Stanford AI Index 2024-2026

a16z — "LLMflation" (Appenzeller, Nov 2024)

Consensus

Part 4: The Cost Stack and Hardware Velocity

Marginal cost of a token (GB200 NVL72 reference)

Hardware velocity

Algorithmic efficiency

Part 5: The Subsidy Gap

Part 6: Frontier vs Commodity — The Real Story

Part 7: Forecasts 2026-2030

Gartner (March 2026)

Epoch AI extrapolation

Capital markets

Credible 2026-2030 range for commodity-tier cost per M tokens

Time-to-equivalence for current frontier reaching current commodity prices

Part 8: Counter-Forces That Could Stall or Reverse the Curve

What This Analysis Can't Resolve

Sources

Primary pricing data

Intelligence-per-dollar studies

Cost stack and hardware

Subsidy and unit economics

Counter-forces