AI Token Economics and Open-Source Competition (April 2026)

Related: execution-plan-phase-0-1-2
Informs: Projects/sigil, Projects/edge-llm

Research question: What is the real state of AI economics in early 2026? Are token prices falling, is it sustainable, and what does it mean for builders?


1. Token Prices Are Falling Fast — Hard Data

Confirmed. The decline is dramatic and accelerating.

| Model tier | ~Early 2024 | ~Early 2026 | Drop |
| --- | --- | --- | --- |
| GPT-4 class (input/1M tokens) | $30 | $1.75 (GPT-5.2) | ~94% |
| Claude Opus class (input/1M tokens) | $15 | $5 (Opus 4.5/4.6) | ~67% |
| Google frontier (input/1M tokens) | $7 | $2 (Gemini 3 Pro) | ~71% |
| Budget tier (input/1M tokens) | $0.50+ | $0.05 (GPT-5 nano) | ~90% |

OpenAI's ChatGPT API went from $0.03/1K tokens (2024) to $0.002/1K tokens (2026), a 93% reduction. Industry-wide, LLM API prices dropped ~80% between early 2025 and early 2026. Several providers have made significant cuts more than once a year.
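As a sanity check, the drop percentages can be recomputed from the prices listed above. This is a minimal Python sketch: the function names and the dictionary layout are illustrative assumptions, and the prices are copied directly from this section.

```python
def percent_drop(old: float, new: float) -> float:
    """Percentage reduction when a price moves from `old` to `new`."""
    return (old - new) / old * 100


def per_1k_to_per_1m(price_per_1k: float) -> float:
    """Convert a per-1K-token price to a per-1M-token price."""
    return price_per_1k * 1000


# (early 2024, early 2026) input prices in USD per 1M tokens, as listed above.
PRICES = {
    "GPT-4 class": (30.00, 1.75),
    "Claude Opus class": (15.00, 5.00),
    "Google frontier": (7.00, 2.00),
    "Budget tier": (0.50, 0.05),
}

# Rounded percentage drop per tier.
drops = {tier: round(percent_drop(old, new)) for tier, (old, new) in PRICES.items()}

# The per-1K figures quoted in the paragraph above.
per_1k_drop = round(percent_drop(0.03, 0.002))

for tier, drop in drops.items():
    print(f"{tier}: ~{drop}%")
print(f"Per-1K example: ~{per_1k_drop}%")
```

The recomputed values match the Drop column (94, 67, 71, 90) and the 93% figure. Note that $0.03/1K is the same price as $30/1M, so the per-1K example and the first table row describe the same starting point in different units.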

Confidence: HIGH. Multiple independent pricing trackers confirm this. The numbers are public and verifiable.

2. Compute Cost Has Fallen, But Not as Fast as Prices

Partially confirmed. Compute costs are falling — but token prices are falling faster than underlying costs, creating a subsidy gap.

The catch: competitive pressure is driving token prices below cost. The labs are not just passing through efficiency gains; they are subsidizing on top of them. The Artefact analysis calls this "the token cost illusion": per-token prices drop, but total bills rise 320% because consumption scales faster than per-token prices fall.
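The arithmetic behind the "token cost illusion" can be sketched in a few lines. A bill is price times volume, so a falling price and a rising bill together imply a specific growth in consumption. The sketch below assumes "rise 320%" means an increase of 320% (a 4.2x bill), and it pairs that with the ~80% industry-wide price drop from section 1. Those two figures come from different sources, so combining them is an illustration, not a measured result. The function name is hypothetical.

```python
def required_volume_multiplier(price_drop: float, bill_increase: float) -> float:
    """Consumption growth implied by a price drop and a bill increase.

    bill = price * volume, so
    volume_multiplier = bill_multiplier / price_multiplier.

    price_drop:    fractional price reduction, e.g. 0.80 for an 80% drop
    bill_increase: fractional bill increase, e.g. 3.20 for a 320% rise
    """
    if not 0 <= price_drop < 1:
        raise ValueError("price_drop must be in [0, 1)")
    price_multiplier = 1.0 - price_drop
    bill_multiplier = 1.0 + bill_increase
    return bill_multiplier / price_multiplier


# Assumed pairing: ~80% price drop (section 1) with a 320% bill increase.
multiplier = required_volume_multiplier(price_drop=0.80, bill_increase=3.20)
print(f"Implied consumption growth: ~{multiplier:.0f}x")
```

Under these assumptions consumption would have to grow roughly 21x for bills to rise that much while prices fall, which is the sense in which usage outruns price cuts.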

Confidence: MEDIUM-HIGH. The directional trend is clear. Exact cost structures are proprietary. The subsidy claim is supported by financial data (see next section).

3. Labs Are Selling Below Cost — Evidence Is Strong

Confirmed. This is not speculation.

The implication for builders: current API pricing is artificially low. Plan for prices to eventually stabilize higher, or for free tiers to shrink. The subsidy window is real but finite.

Confidence: HIGH. OpenAI's losses are from reported financials. Anthropic's revenue trajectory is from Axios reporting. The below-cost pricing pattern is consistent across all major labs.

4. Open-Source Models Are Genuinely Competitive Now

Confirmed, with nuance. 2025-2026 has been the tipping point.

DeepSeek is the standout:

Llama 4 (April 2025):

Mistral continues to optimize for efficiency with MoE architecture.

The bottom line from multiple analyses: Open-source LLMs in 2026 are good enough for the vast majority of applications. Unless you specifically need absolute frontier capabilities (GPT-5.2 Pro, Claude Opus 4.6 at full power), an open model will serve at a fraction of the cost. At scale, self-hosting approaches 1/100th the per-token cost of proprietary APIs.
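The "at scale" qualifier matters because self-hosting is mostly a fixed cost, so the per-token figure depends on how busy the hardware is. Here is a rough break-even sketch. Every number in it (monthly hardware cost, throughput, utilization, API price) is a hypothetical placeholder, not a figure from the analyses above, and the function names are illustrative.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # simplifying assumption: 30-day month


def self_host_cost_per_1m(monthly_cost_usd: float,
                          tokens_per_sec: float,
                          utilization: float) -> float:
    """Effective USD cost per 1M tokens for a fixed-cost deployment.

    utilization is the fraction of the month spent serving tokens (0..1].
    """
    if tokens_per_sec <= 0 or not 0 < utilization <= 1:
        raise ValueError("tokens_per_sec must be > 0 and utilization in (0, 1]")
    tokens_per_month = tokens_per_sec * utilization * SECONDS_PER_MONTH
    return monthly_cost_usd / (tokens_per_month / 1e6)


def break_even_tokens_per_month(monthly_cost_usd: float,
                                api_price_per_1m: float) -> float:
    """Monthly token volume above which the fixed cost beats the API bill."""
    return monthly_cost_usd / api_price_per_1m * 1e6


# Hypothetical inputs: $2,000/month of hardware, 2,000 tokens/sec peak
# throughput, and an API priced at $5 per 1M tokens.
busy = self_host_cost_per_1m(2000.0, 2000.0, utilization=0.5)
idle = self_host_cost_per_1m(2000.0, 2000.0, utilization=0.05)
threshold = break_even_tokens_per_month(2000.0, 5.0)

print(f"Self-host at 50% utilization: ${busy:.2f} per 1M tokens")
print(f"Self-host at 5% utilization:  ${idle:.2f} per 1M tokens")
print(f"Break-even volume: {threshold / 1e6:.0f}M tokens/month")
```

With these placeholder inputs the same hardware is cheaper than the API when busy and more expensive when mostly idle. That is why the cost advantage shows up at scale and not for low-volume workloads.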

Confidence: HIGH. Benchmark data is public. The competitive gap has objectively narrowed. The cost advantage of self-hosted open-source is real at scale.

5. Enterprise AI Lock-In Is Weaker Than Expected

Mostly confirmed — traditional switching costs are eroding.

The new lock-in isn't the model — it's "representation switching costs." Companies become dependent on a vendor's way of defining and structuring reality (data schemas, workflow representations). This is more durable than software or cloud switching costs.

Enterprise consolidation trend: VCs predict enterprises will spend more on AI in 2026 but through fewer vendors — concentrating budgets rather than spreading them. Companies with proprietary data and products that can't be easily replicated are the most defensible.

Confidence: MEDIUM. The data on switching-cost erosion is from credible sources (a16z, Morningstar). The "representation lock-in" concept is newer and more theoretical.

6. Enterprise AI ROI — Real But Uneven

Partially confirmed. Real gains exist, but they're concentrated and hard-won.

The reality check:

The shift: Direct financial impact (revenue + profitability) nearly doubled to 21.7% of primary responses, indicating enterprises are starting to measure AI by P&L impact rather than just vibes.

Confidence: MEDIUM. The headline numbers come from Deloitte, PwC, and a16z — credible sources. But self-reported survey data from enterprises tends to be optimistic. The "5% see real returns" counter-narrative is plausible.

7. The Thin Wrapper Die-Off Is Happening

Confirmed with hard data.

What survives: Companies with proprietary data, deep enterprise integrations, or infrastructure-layer products. The application layer is being squeezed from both sides — model providers moving up-stack, and enterprises building in-house.

Confidence: HIGH. SimpleClosure data is concrete. The pattern is visible across multiple independent reports. The "99% will die" headlines are hyperbolic, but directionally correct — the correction is real.

8. The Cisco/Nvidia Analogy — Alive and Contentious

The argument (Michael Burry, November 2025):

Nvidia's response: Nvidia directly addressed Burry in an internal memo, pushing back on bubble allegations. Key counter-arguments: AI demand is real and growing, inference demand is scaling, and the use cases are broader than telecom.

Current status (April 2026): The bust hasn't arrived on Burry's timeline, and infrastructure spend continues to accelerate. The analogy remains live: it's a question of timing, not of whether overbuild is possible.

Confidence: MEDIUM. Burry's pattern-matching is historically informed, but the timing prediction hasn't played out (yet). Nvidia's financials remain strong. The overbuild risk is real, but the demand side is harder to dismiss than it was for fiber in the early 2000s.


Synthesis: What This Means for Builders

  1. The subsidy window is real and finite. Current API prices are below cost. If you're building on APIs, your unit economics look better today than they will in 2-3 years. Plan accordingly.

  2. Open-source is the pressure valve. Even when subsidies end and API prices normalize, open-source models provide a credible alternative for most use cases. Self-hosting at scale approaches 1/100th the per-token cost of proprietary APIs (section 4). This caps how much labs can eventually charge.

  3. The moat is not the model. It's the data, the workflow integration, and the switching cost you create around your specific representation of the problem. Pure API wrappers die. Deep integrations survive.

  4. Enterprise ROI is real but slow. The "AI will transform everything overnight" narrative is wrong. Real returns take 2-4 years of structured enablement. Most enterprises are still in early innings.

  5. The infrastructure overbuild question is unresolved. If demand plateaus before capex completes, there will be a correction. If agents and inference-heavy applications scale as projected, the infrastructure gets absorbed. This is genuinely uncertain.

  6. For Sigil specifically: The thin wrapper die-off validates the approach of building deep workflow integration rather than a model wrapper. The human-in-the-loop spec layer is defensible precisely because it creates representation switching costs. But the ROI data suggests enterprise sales cycles will be long.

  7. For edge-llm: The open-source competitive landscape makes browser-native inference more viable than ever. Models that fit in-browser are now genuinely useful, not toy demos.
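Points 1 and 2 above can be turned into a quick stress test. It shows how gross margin changes if API prices normalize upward, and how a self-hosted alternative caps the damage. This is a sketch under stated assumptions: the revenue, usage, price multiplier, and self-hosting cost are all hypothetical placeholders, and the function names are illustrative.

```python
from typing import Optional


def effective_price(api_price_per_1m: float,
                    price_multiplier: float,
                    self_host_price_per_1m: Optional[float] = None) -> float:
    """Price per 1M tokens after normalization, capped by self-hosting.

    If a self-hosted option exists, a builder would not pay more than it
    costs, so it acts as a ceiling on the effective price.
    """
    normalized = api_price_per_1m * price_multiplier
    if self_host_price_per_1m is None:
        return normalized
    return min(normalized, self_host_price_per_1m)


def gross_margin(revenue_per_user: float,
                 tokens_per_user_millions: float,
                 price_per_1m: float) -> float:
    """Gross margin as a fraction of revenue, counting only token cost."""
    cost = tokens_per_user_millions * price_per_1m
    return (revenue_per_user - cost) / revenue_per_user


# Hypothetical product: $20/user/month revenue, 2M tokens/user/month,
# API at $5 per 1M tokens today.
today = gross_margin(20.0, 2.0, effective_price(5.0, 1.0))
# Hypothetical normalization: prices triple once subsidies end.
tripled = gross_margin(20.0, 2.0, effective_price(5.0, 3.0))
# Same scenario, with a hypothetical $8 per 1M self-hosted fallback.
capped = gross_margin(20.0, 2.0, effective_price(5.0, 3.0, self_host_price_per_1m=8.0))

print(f"Margin today:            {today:.0%}")
print(f"Margin if prices triple: {tripled:.0%}")
print(f"Margin with self-host cap: {capped:.0%}")
```

In this toy scenario a product that looks healthy at subsidized prices goes underwater if prices triple, and stays positive once the self-hosted ceiling applies. The specific numbers are made up; the useful part is rerunning the calculation with real revenue and usage figures.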


Sources