Vault
research

Anthropic Unit Economics and the Power-User Loss

Created

Anthropic Unit Economics and the Power-User Loss

Builds-on: ai-token-economics-and-open-source-competition Related: the-efficiency-counterthesis Related: ai-infrastructure-endgame-indicators Related: why-the-market-refuses-to-crash Related: hormuz-to-ai-repricing-causal-chain


The Question

A common pattern in 2026: a Claude Max user on the $100/month plan consumes roughly $1,000 of API-equivalent tokens in a month. That's a 10x retail-to-price ratio. Anthropic disclosed in late August 2025 that the worst cases were "tens of thousands of dollars in model usage on a $200 plan" — a 100x ratio at the long tail.

How bad is that for Anthropic at scale? The answer turns out to be more interesting than the retail markup implies.

The Retail-vs-COGS Gap

Retail API pricing as of April 2026:

Model Input ($/MTok) Output ($/MTok)
Haiku 4.5 $1 $5
Sonnet 4.6 $3 $15
Opus 4.7 $15 $75

Cached input runs up to 90% off (~$0.30–1.50/MTok). Batch API is 50% off.

SemiAnalysis (Dylan Patel) estimates true blended COGS for Opus 4.7 on agentic workloads at roughly $0.99/MTok — versus the $5 (input) / $25 (output) sticker. The gap exists because agentic traffic skews ~300:1 input-to-output and runs 90%+ cache hit rates. On that workload mix, retail pricing carries a 70–80% gross margin.

The implication for the $1,000-on-$100-plan case: at SemiAnalysis's COGS estimate, $1,000 of retail Opus consumption is closer to $40–$80 of actual compute cost. Anthropic's compute is then further subsidized 30–60% via the Trainium/TPU stack (more on that below). So the real loss to Anthropic on this user is on the order of $20–60/month, not $900.

That's still loss-making. But the headline 10:1 ratio at retail dramatically overstates the bleed because retail token pricing was set to capture margin from chat-style traffic with shorter contexts and lower cache hit rates, not from agentic workloads where the cost-to-revenue ratio is genuinely good.

Anthropic's Actual Financials, 2025–2026

Revenue trajectory (Bloomberg, Sacra, Epoch AI):

Funding (Anthropic press releases, CNBC, TechCrunch):

Losses and burn:

Gross margin trajectory:

Gross margin trajectory is the load-bearing assumption in the entire bull case. If accelerator throughput gains don't outpace chip cost growth at the projected rate, the GM target slips, and the path to profitability slips with it.

Claude Max Plan Mechanics

The plan tiers as currently structured:

Tier Price Sonnet hours/wk Opus hours/wk
Max 5x $100/mo 140–280 15–35
Max 20x $200/mo 240–480 24–40

Plus a 5-hour rolling window cap (~88K tokens for Max 5x).

The Aug 28, 2025 hard cap. This is the operationally important date. Anthropic introduced weekly rate limits explicitly because "less than 5% of subscribers" were running Claude Code 24/7 and burning "tens of thousands in model usage on a $200 plan." Overage now requires API-rate top-ups.

That announcement is the visible signal that consumer subscription pricing was structurally incompatible with full-throttle agentic usage. The cap is the price of admission to the bull-case GM trajectory: without it, the inference-cost overrun that forced the 10pp GM cut would have been worse.

The AWS / Google Compute Subsidy

This is the part of the picture that retail-pricing analysis tends to ignore.

This is not gross margin. It's not on the income statement as a subsidy. But it's why the COGS numbers work at all. If AWS or Google ever lose interest in proving their custom silicon — or if Nvidia closes the price gap — the entire unit-economic structure rotates against Anthropic at once.

The Power-Law User Distribution

Anthropic's August 2025 statement (<5% of subscribers driving the abuse) is the public version of a long-tail power law that exists across all consumer AI products.

Tom Tunguz's data on agentic coding tiers gives shape to it:

Usage power-law is even steeper than revenue power-law. The same ~5% who drive the bulk of inference cost are mostly not on the highest-revenue plans — they're on Max trying to extract API-equivalent value from a flat-rate subscription.

So the loss isn't evenly distributed across Max subscribers. It's concentrated on a narrow tail. Capping that tail (which is what Aug 28 did) doesn't materially affect the median Max user's experience but eliminates the structurally-loss-making segment.

OpenAI Precedent

Sam Altman, January 6, 2025 on X: "insane thing: we are currently losing money on openai pro subscriptions! people use it much more than we expected." He set the $200 price personally and expected margin.

OpenAI's response since:

OpenAI's experience is the prior. Same pattern, same response (caps and throttles), same direction on breakeven (later, not sooner). Anthropic has the benefit of watching this happen and reacting earlier.

Sustainability — The Two Theses

The expansion thesis (SemiAnalysis, Anthropic management):

The structural-loss thesis (Ed Zitron / wheresyoured.at, MBI hyperscaler analysis):

The honest read is that both theses are partially right. The expansion thesis correctly captures the COGS-vs-retail gap and the trajectory of accelerator throughput. The structural-loss thesis correctly captures that the original consumer pricing was set without a realistic model of agentic-tail behavior, and that the AWS/Google subsidy is not permanent. The Aug 2025 caps are the operational compromise: keep the flat-rate plan as the front door, gate the long tail.

What the $1,000-on-$100 User Actually Costs

Pulling the numbers together for the framing case:

Across roughly 5% of Max subscribers exhibiting this pattern, that's the bleed. Manageable when amortized against the 95% of Max users who consume far less than their plan price, and especially manageable now that the Aug 2025 caps prevent the worst-tail $5–10K/month outliers.

The headline-friendly framing ("a 10:1 retail-to-price loss ratio!") meaningfully overstates the actual P&L impact. The structural pricing problem is real but smaller than retail markup math suggests, and Anthropic has already operationally addressed the worst of it.

Open Questions

  1. At what subscriber growth rate does the long tail re-overrun the cap? The Aug 2025 caps were set against a particular usage distribution. If agentic-workload patterns generalize (i.e., more of the top quartile starts behaving like the prior <5%), the caps tighten or three-part tariffs kick in.

  2. When does AWS/Google decide Trainium/TPU is "proved"? The implicit subsidy is the largest single unmodeled lever. A 2027–2028 transition where compute pricing normalizes to market would compress GM by 15–25pp at current scale.

  3. Does the IPO price need 70% GM, or can it land at 55%? The valuation arithmetic is sensitive to the GM exit. SemiAnalysis assumes 70%+. If actual GM tracks 55–60%, the $400–500B IPO target becomes harder to defend.

  4. Does Anthropic's three-part tariff (base + included + overage) become the industry standard, or do consumer plans evolve back toward flat rates? OpenAI is moving the same direction. The longer the pattern holds, the more durable the unit-economic improvement.

  5. What does the "Opus 4.7 produces 35% more tokens per prompt" effect (Finout) do to retail GM at sticker? That's a stealth price increase for output-heavy usage. Tracks how a model lab can maintain headline pricing while quietly improving margin.


Sources