AI Token Economics and Open-Source Competition (April 2026)
Related: execution-plan-phase-0-1-2
Informs: Projects/sigil, Projects/edge-llm
Research question: What is the real state of AI economics in early 2026? Are token prices falling, is it sustainable, and what does it mean for builders?
1. Token Prices Are Falling Fast — Hard Data
Confirmed. The decline is dramatic and accelerating.
| Model tier | ~Early 2024 | ~Early 2026 | Drop |
|---|---|---|---|
| GPT-4 class (input/1M tokens) | $30 | $1.75 (GPT-5.2) | ~94% |
| Claude Opus class (input/1M) | $15 | $5 (Opus 4.5/4.6) | ~67% |
| Google frontier (input/1M) | $7 | $2 (Gemini 3 Pro) | ~71% |
| Budget tier (input/1M) | $0.50+ | $0.05 (GPT-5 nano) | ~90% |
OpenAI's ChatGPT API went from $0.03/1K tokens (2024) to $0.002/1K tokens (2026), a 93% reduction (equivalently, $30 to $2 per 1M tokens). Industry-wide, LLM API prices dropped ~80% between early 2025 and early 2026. Several providers have cut prices more than once a year.
Confidence: HIGH. Multiple independent pricing trackers confirm this. The numbers are public and verifiable.
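The "Drop" column can be re-derived from the two price columns. The sketch below recomputes it from the figures in the table; the function and dictionary names are illustrative, not taken from any pricing tracker.

```python
def percent_drop(old_price: float, new_price: float) -> float:
    """Return the percentage decline from old_price to new_price."""
    if old_price <= 0:
        raise ValueError("old_price must be positive")
    return (old_price - new_price) / old_price * 100


# Input prices per 1M tokens (USD), copied from the table above:
# (early 2024, early 2026).
TIERS = {
    "GPT-4 class": (30.00, 1.75),
    "Claude Opus class": (15.00, 5.00),
    "Google frontier": (7.00, 2.00),
    "Budget tier": (0.50, 0.05),
}

for name, (early_2024, early_2026) in TIERS.items():
    print(f"{name}: ~{percent_drop(early_2024, early_2026):.0f}% drop")
```

The same function applied to the per-1K figures ($0.03 to $0.002) gives the ~93% reduction quoted above.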
2. Compute Cost Has Fallen, But Not as Fast as Prices
Partially confirmed. Compute costs are falling — but token prices are falling faster than underlying costs, creating a subsidy gap.
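- Epoch AI data: LLM inference costs declined ~10x annually as a headline rate. GPT-4-equivalent performance costs ~$0.40/M tokens now vs $20 in late 2022, a 50x drop over ~3 years. Note that 50x over 3 years works out to roughly 3.7x per year, so this specific tier fell more slowly than the headline rate.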
- Gartner (March 2026): Inference on a 1T-parameter LLM will cost providers 90%+ less by 2030 vs 2025.
- NVIDIA Blackwell: Open-source models on Blackwell achieve up to 10x cost reduction per token vs prior generation.
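A multi-year drop and an annual rate are easy to conflate, so it helps to convert one into the other. The sketch below annualizes the GPT-4-equivalent figures from the Epoch AI bullet, assuming the period is exactly 3 years (the note only says "~3 years"); the function name is illustrative.

```python
def annualized_factor(total_factor: float, years: float) -> float:
    """Convert a total cost-reduction factor into a constant per-year factor."""
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1 / years)


# Figures from the bullet above: $20 per 1M tokens (late 2022) vs ~$0.40 now.
total_drop = 20.0 / 0.40                      # about 50x overall
per_year = annualized_factor(total_drop, 3)   # assumption: exactly 3 years

print(f"Total drop: {total_drop:.0f}x")
print(f"Implied annual factor: {per_year:.1f}x per year")

# For comparison, a sustained 10x annual decline compounds to 1000x in 3 years.
print(f"10x per year over 3 years: {10 ** 3}x")
```

On these assumptions the GPT-4-equivalent tier fell about 3.7x per year, which is a useful sanity check when a source quotes both a total drop and an annual rate.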
But here's the catch: Token prices are being driven below cost by competitive pressure. The labs are not just passing through efficiency gains — they're subsidizing on top of that. The Artefact analysis calls this "the token cost illusion" — per-token prices drop, but total bills rise 320% because consumption scales faster than cost falls.
Confidence: MEDIUM-HIGH. The directional trend is clear. Exact cost structures are proprietary. The subsidy claim is supported by financial data (see next section).
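The "token cost illusion" is just multiplication: the bill is price times volume, so it rises whenever volume grows faster than price falls. The sketch below works that through. Pairing the ~80% industry-wide price drop with the 320% bill increase is an illustrative assumption (the two figures come from different sources and may cover different periods), and "rise 320%" is read here as the bill becoming 4.2x its starting value.

```python
def bill_multiplier(price_drop_pct: float, usage_multiplier: float) -> float:
    """New bill as a multiple of the old bill, given a price drop and usage growth."""
    return (1 - price_drop_pct / 100) * usage_multiplier


def implied_usage_multiplier(price_drop_pct: float, bill_rise_pct: float) -> float:
    """Usage growth needed to turn a price drop into the given bill increase."""
    if price_drop_pct >= 100:
        raise ValueError("price_drop_pct must be below 100")
    return (1 + bill_rise_pct / 100) / (1 - price_drop_pct / 100)


# Illustrative pairing (an assumption, see the note above):
# prices fall 80% while the total bill rises 320%.
usage_growth = implied_usage_multiplier(price_drop_pct=80, bill_rise_pct=320)
print(f"Implied usage growth: ~{usage_growth:.0f}x")

# Sanity check in the other direction: 80% cheaper tokens, 21x the volume.
print(f"Bill multiple: {bill_multiplier(80, 21):.1f}x")
```

Under those assumptions, token consumption would have to grow roughly 21x for the bill to rise that much while unit prices fall.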
3. Labs Are Selling Below Cost — Evidence Is Strong
Confirmed. This is not speculation.
- OpenAI generated $3.7B in revenue in 2025 while losing an estimated $5B, a loss of about $1.35 for every $1 earned (roughly $2.35 spent per $1 of revenue). Gross margins collapsed from ~40% to ~33% in 2025 as inference costs quadrupled with scaling usage.
- Anthropic is nearing $20B annualized revenue as of early 2026 (up from $1B at start of 2025), but is reported to lose money on heavy individual users. One analysis framed it as: "Anthropic is losing money on you every month."
- The pattern is classic platform economics: 5-8 years of heavy subsidies, win the market, optimize for margin later. AI is estimated to be in year 3-4 of this cycle.
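The $1.35 figure in the OpenAI bullet can be read as either a loss ratio or a spend ratio, and the two differ by exactly $1. The sketch below computes both from the reported revenue and loss, assuming total spend equals revenue plus net loss (a simplification that ignores other income and financing effects).

```python
def loss_per_revenue_dollar(revenue: float, net_loss: float) -> float:
    """Dollars lost for every dollar of revenue."""
    return net_loss / revenue


def spend_per_revenue_dollar(revenue: float, net_loss: float) -> float:
    """Dollars spent for every dollar of revenue, assuming spend = revenue + loss."""
    return (revenue + net_loss) / revenue


# Figures from the bullet above, in billions of USD.
REVENUE_B = 3.7
NET_LOSS_B = 5.0

print(f"Loss per $1 earned:  ${loss_per_revenue_dollar(REVENUE_B, NET_LOSS_B):.2f}")
print(f"Spend per $1 earned: ${spend_per_revenue_dollar(REVENUE_B, NET_LOSS_B):.2f}")
```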
The implication for builders: current API pricing is artificially low. Plan for prices to eventually stabilize higher, or for free tiers to shrink. The subsidy window is real but finite.
Confidence: HIGH. OpenAI's losses are from reported financials. Anthropic's revenue trajectory is from Axios reporting. The below-cost pricing pattern is consistent across all major labs.
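One way to "plan for prices to stabilize higher" is to stress-test unit economics against a price multiplier. The sketch below does this for a hypothetical product: the $20/month revenue per user and 5M input tokens per user are made-up numbers, only the $1.75 per 1M input price comes from the table in section 1, and output-token costs are ignored for simplicity.

```python
def gross_margin(revenue_per_user: float,
                 million_tokens_per_user: float,
                 api_price_per_million: float,
                 price_multiplier: float = 1.0) -> float:
    """Gross margin (fraction of revenue) for one user-month of API usage."""
    inference_cost = million_tokens_per_user * api_price_per_million * price_multiplier
    return (revenue_per_user - inference_cost) / revenue_per_user


def breakeven_multiplier(revenue_per_user: float,
                         million_tokens_per_user: float,
                         api_price_per_million: float) -> float:
    """Price multiplier at which inference cost equals revenue (margin hits zero)."""
    return revenue_per_user / (million_tokens_per_user * api_price_per_million)


# Hypothetical product; only PRICE comes from the pricing table.
REVENUE = 20.0   # USD per user per month (assumption)
USAGE = 5.0      # million input tokens per user per month (assumption)
PRICE = 1.75     # USD per 1M input tokens

for multiplier in (1.0, 2.0, 3.0):
    margin = gross_margin(REVENUE, USAGE, PRICE, multiplier)
    print(f"API price x{multiplier:.0f}: gross margin {margin:.1%}")

print(f"Margin reaches zero at x{breakeven_multiplier(REVENUE, USAGE, PRICE):.2f}")
```

With these placeholder numbers the product is comfortable today, thin if prices double, and underwater if they triple, which is the kind of headroom check the subsidy argument calls for.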
4. Open-Source Models Are Genuinely Competitive Now
Confirmed, with nuance. 2025-2026 has been the tipping point.
DeepSeek is the standout:
- V3: 671B MoE (37B active), outperforms many larger proprietary models on reasoning benchmarks
- R1: 97.3% on MATH-500 (highest open-model score)
- V3.2-Speciale: Gold-medal at IMO 2025, IOI 2025, and ICPC World Finals
- API pricing undercuts proprietary alternatives by 50-90%
Llama 4 (April 2025):
- Scout and Maverick use MoE architecture — only 17B active parameters despite 109B-400B total
- Enterprise-grade performance at ~10x lower cost than proprietary APIs
Mistral continues to optimize for efficiency with MoE architecture.
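The efficiency argument for MoE models comes down to how small the active parameter count is relative to the total. The sketch below computes that fraction for the models listed above, reading the "109B-400B" range as Scout at 109B and Maverick at 400B total. The active fraction is only a rough proxy for per-token compute; all weights still have to be held in memory.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of a MoE model's parameters used for each token."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


# (total parameters, active parameters), in billions, from the lists above.
MOE_MODELS = {
    "DeepSeek V3": (671, 37),
    "Llama 4 Scout": (109, 17),
    "Llama 4 Maverick": (400, 17),
}

for name, (total_b, active_b) in MOE_MODELS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {active_b}B of {total_b}B active ({share:.1%})")
```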
The bottom line from multiple analyses: Open-source LLMs in 2026 are good enough for the vast majority of applications. Unless you specifically need absolute frontier capabilities (GPT-5.2 Pro, Claude Opus 4.6 at full power), an open model will serve at a fraction of the cost. At scale, self-hosting approaches 1/100th the per-token cost of proprietary APIs.
Confidence: HIGH. Benchmark data is public. The competitive gap has objectively narrowed. The cost advantage of self-hosted open-source is real at scale.
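"At scale" is doing a lot of work in the self-hosting claim, because self-hosting swaps a per-token price for a fixed infrastructure cost. A minimal break-even sketch follows. The $10,000/month fixed cost and $0.25 per 1M marginal cost are hypothetical placeholders, not measured figures; the $1.75 API price is the GPT-5.2 input price from section 1.

```python
import math


def breakeven_million_tokens(fixed_monthly_cost: float,
                             api_price_per_million: float,
                             selfhost_marginal_per_million: float) -> float:
    """Monthly volume (in millions of tokens) above which self-hosting is cheaper.

    Returns math.inf if self-hosting never wins on marginal cost alone.
    """
    saving_per_million = api_price_per_million - selfhost_marginal_per_million
    if saving_per_million <= 0:
        return math.inf
    return fixed_monthly_cost / saving_per_million


# Hypothetical deployment (assumptions, not measured figures).
FIXED_COST = 10_000.0     # USD per month for GPUs, hosting, and ops
SELFHOST_MARGINAL = 0.25  # USD per 1M tokens (power, wear, etc.)
API_PRICE = 1.75          # USD per 1M input tokens, from the table in section 1

volume = breakeven_million_tokens(FIXED_COST, API_PRICE, SELFHOST_MARGINAL)
print(f"Break-even: ~{volume:,.0f}M tokens per month")
```

Below the break-even volume the API is cheaper even at unsubsidized prices; against a budget-tier API price lower than the self-hosting marginal cost, the function returns infinity.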
5. Enterprise AI Lock-In Is Weaker Than Expected
Mostly confirmed — traditional switching costs are eroding.
- a16z's enterprise survey shows companies are moving away from fine-tuning to avoid model lock-in. Prompts port more easily between models than fine-tuned weights.
- Morningstar analysis found switching-cost moats in tech are under pressure — close to half of downgraded companies previously relied on switching-cost moats.
- AI itself is reducing switching friction: agents can assist with migration work that used to be painful.
The new lock-in isn't the model — it's "representation switching costs." Companies become dependent on a vendor's way of defining and structuring reality (data schemas, workflow representations). This is more durable than software or cloud switching costs.
Enterprise consolidation trend: VCs predict enterprises will spend more on AI in 2026 but through fewer vendors — concentrating budgets rather than spreading them. Companies with proprietary data and products that can't be easily replicated are the most defensible.
Confidence: MEDIUM. The data on switching-cost erosion is from credible sources (a16z, Morningstar). The "representation lock-in" concept is newer and more theoretical.
6. Enterprise AI ROI — Real But Uneven
Partially confirmed. Real gains exist, but they're concentrated and hard-won.
- AI adoption hit 78% of enterprises in 2025.
- Average reported ROI: $3.70 per dollar invested. Top performers: $10.30 per dollar.
- Dell AI Factory early adopters: up to 2.6x ROI in first year.
- Cost savings of 26-31% in supply chain, finance, and customer operations.
- 66% of organizations report productivity/efficiency gains.
The reality check:
- Only 5% of enterprises see "real returns," according to one analysis (MasterOfCode, 2026).
- Productivity gains of 10-15% only materialize after formal job redesign and structured enablement — often dozens of hours of training per employee.
- 30% of respondents still cite "lack of clarity on ROI" as a top challenge.
- Most organizations achieve satisfactory ROI within 2-4 years — far longer than typical 7-12 month technology payback expectations.
The shift: Direct financial impact (revenue + profitability) nearly doubled to 21.7% of primary responses, indicating enterprises are starting to measure AI by P&L impact rather than just vibes.
Confidence: MEDIUM. The headline numbers come from Deloitte, PwC, and a16z — credible sources. But self-reported survey data from enterprises tends to be optimistic. The "5% see real returns" counter-narrative is plausible.
7. The Thin Wrapper Die-Off Is Happening
Confirmed with hard data.
- SimpleClosure's 2025 "State of Startup Shutdowns" report: 2.5x year-over-year increase in Series A shutdowns, with AI wrappers catastrophically over-represented.
- The market is filtering aggressively for: proprietary data advantage, real unit economics, deep workflow integration.
- Multiple analyses describe a "Great AI Filter" — the 2023-2024 cycle rewarded speed and UX; the 2025-2026 cycle is punishing lack of defensibility.
- Specific example: Wuri (founded 2022, shut down 2025) — consumer app pivoted to enterprise AI wrappers, still died.
What survives: Companies with proprietary data, deep enterprise integrations, or infrastructure-layer products. The application layer is being squeezed from both sides — model providers moving up-stack, and enterprises building in-house.
Confidence: HIGH. SimpleClosure data is concrete. The pattern is visible across multiple independent reports. The "99% will die" headlines are hyperbolic, but directionally correct — the correction is real.
8. The Cisco/Nvidia Analogy — Alive and Contentious
The argument (Michael Burry, November 2025):
- Hyperscalers' promise of ~$3T in AI infrastructure spending over 3 years mirrors telcos laying fiber in the early 2000s on the premise that "internet traffic doubles every 100 days."
- Less than 5% of U.S. fiber capacity was operational when the early-2000s crash hit. Burry believes AI demand assumptions are similarly optimistic.
- Direct quote: "And once again there is a Cisco at the center of it all... Its name is Nvidia."
- Nvidia's purchase obligations surged from $16.1B to $95.2B year-over-year (~$117B including supply-related obligations).
- Burry predicted a 2025-2026 bust, noting that market peaks precede capex completion.
Nvidia's response: Nvidia directly addressed Burry in an internal memo, pushing back on bubble allegations. Key counter-arguments: AI demand is real and growing, inference demand is scaling, and the use cases are broader than telecom.
Current status (April 2026): The bust hasn't arrived on Burry's timeline, and infrastructure spend continues to accelerate. The analogy remains live: the open question is timing, not whether overbuild is possible.
Confidence: MEDIUM. Burry's pattern-matching is historically informed but the timing prediction hasn't played out (yet). Nvidia's financials remain strong. The overbuild risk is real but the demand side is harder to dismiss than 2000s fiber.
Synthesis: What This Means for Builders
- The subsidy window is real and finite. Current API prices are below cost. If you're building on APIs, your unit economics look better today than they will in 2-3 years. Plan accordingly.
- Open-source is the pressure valve. Even when subsidies end and API prices normalize, open-source models provide a credible alternative for most use cases. Self-hosting at scale is dramatically cheaper. This caps how much labs can eventually charge.
- The moat is not the model. It's the data, the workflow integration, and the switching cost you create around your specific representation of the problem. Pure API wrappers die. Deep integrations survive.
- Enterprise ROI is real but slow. The "AI will transform everything overnight" narrative is wrong. Real returns take 2-4 years of structured enablement. Most enterprises are still in early innings.
- The infrastructure overbuild question is unresolved. If demand plateaus before capex completes, there will be a correction. If agents and inference-heavy applications scale as projected, the infrastructure gets absorbed. This is genuinely uncertain.
- For Sigil specifically: The thin wrapper die-off validates the approach of building deep workflow integration rather than a model wrapper. The human-in-the-loop spec layer is defensible precisely because it creates representation switching costs. But the ROI data suggests enterprise sales cycles will be long.
- For edge-llm: The open-source competitive landscape makes browser-native inference more viable than ever. Models that fit in-browser are now genuinely useful, not toy demos.
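For the edge-llm point, whether a model "fits in-browser" is mostly a question of weight memory: parameters times bits per weight. The sketch below is a back-of-envelope check only. The 1.2x overhead factor (for KV cache, activations, and runtime buffers) and the 4 GB budget are hypothetical placeholders, not browser limits taken from any specification.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB: params * bits / 8 bytes."""
    return params_billions * bits_per_weight / 8


def fits_in_budget(params_billions: float,
                   bits_per_weight: float,
                   budget_gb: float,
                   overhead_factor: float = 1.2) -> bool:
    """True if weights plus an assumed runtime overhead fit the memory budget."""
    needed_gb = weight_memory_gb(params_billions, bits_per_weight) * overhead_factor
    return needed_gb <= budget_gb


BUDGET_GB = 4.0  # hypothetical memory budget for a browser tab

for params_b, bits in ((1, 16), (3, 4), (8, 4)):
    size = weight_memory_gb(params_b, bits)
    verdict = "fits" if fits_in_budget(params_b, bits, BUDGET_GB) else "too large"
    print(f"{params_b}B params at {bits}-bit: ~{size:.1f} GB weights, {verdict}")
```

Quantization is what moves small models into range: under these assumptions a 3B model at 4 bits needs about 1.5 GB for weights, while an 8B model at the same precision overshoots the budget once overhead is included.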
Sources
- AI API Pricing Comparison 2026 - IntuitionLabs
- AI Pricing 2026: Costs Drop 40-70%
- OpenAI vs Anthropic API Pricing Comparison 2026 - Finout
- LLM Inference Price Trends - Epoch AI
- Inference Unit Economics - Introl
- Gartner: AI Inference Costs to Drop 90% by 2030
- Is AI Really Getting Cheaper? The Token Cost Illusion - Artefact
- LLMflation - a16z
- OpenAI Lost $5B on $3.7B Revenue - AI Automation Global
- Anthropic Is Losing Money on You Every Month
- Anthropic Turns Tables on OpenAI in Enterprise Revenue - Axios
- NVIDIA Blackwell Reduces Cost Per Token
- Open-Source LLMs Compared 2026 - Till Freitag
- Open Source LLM Comparison 2026 - AskToDo
- How 100 Enterprise CIOs Are Building AI in 2025 - a16z
- Representation Switching Costs - Raktim Singh
- VCs Predict Enterprises Will Spend More Through Fewer Vendors - TechCrunch
- AI ROI: Why Only 5% See Real Returns - MasterOfCode
- State of AI in the Enterprise 2026 - Deloitte
- State of Startup Shutdowns 2025 - SimpleClosure
- The 2025 Startup Shutdown - Yahoo Finance
- Michael Burry Nvidia/Cisco Warning - Fortune
- Nvidia Pushes Back on AI Bubble Allegations - CNBC
- Nvidia Warning Could Shake AI Buildout - TheStreet