The Full Stack: Energy, Chips, and the AI Subsidy Unwind
Builds-on: ai-token-economics-and-open-source-competition
Related: execution-plan-phase-0-1-2
Counterpoint: the-efficiency-counterthesis (the optimist rebuttal analyzing whether efficiency gains outrun the unwind; spoiler: they get captured as provider margin, and the middle still disappears)
Informs: Projects/sigil, Projects/edge-llm
Original thesis drafted April 2, 2026. Corrected and consolidated against verified data. Read alongside the-efficiency-counterthesis — the two docs arrive at the same structural conclusion from opposite directions.
TLDR
A single causal chain runs from a physical energy chokepoint (Hormuz blockade) through macro economics, supply chains, hardware constraints, and AI lab finances to a structural repricing of what AI infrastructure is worth. The chain — Hormuz closure → energy shock → stagflation pressure → VC concentration → API demand decay → token economics exposed — holds up under verification. Three corrections matter:
- The Fed isn't pinned at high rates. It already cut to 3.50-3.75%. It's cautious, not frozen. The constraint is real but less severe than the original framing.
- VC isn't frozen — it's hyper-concentrated. Q1 2026 was a record quarter ($297B). But 65% went to four companies. The wrapper-tier funding drought is real even as headline numbers hit all-time highs. This actually makes the underlying argument stronger.
- Enterprise AI ROI is materializing faster than stated. High-performers see 1.7-3x returns. This strengthens the "what survives" conclusion rather than weakening the overall thesis.
The endgame is one of three outcomes: a soft landing where capacity gets absorbed, a bust where capex gets cut and GPU prices crash, or absorption of independent AI labs into cloud hyperscalers. The most likely end state is OpenAI becoming part of Microsoft within 3-5 years. Anthropic has a genuine survival path.
The pyramid still compresses. The timeline might be slightly different than originally framed. The hard deadline from Hormuz is real.
Part 1: The Physical Layer
Hormuz Blockade and Energy Shock
Iran imposed a de facto blockade of the Strait of Hormuz after US-Israel strikes on Tehran on February 28. Only ships flagged by "friendly" nations (India, Pakistan, Malaysia, China) are transiting. This is not a threat or a partial disruption — 20% of the world's oil and LNG supply is physically cut off from Western markets.
The stopgap measures and their shelf life:
- The US authorized release of 172M barrels from the Strategic Petroleum Reserve over ~120 days. This drops the SPR to 243M barrels — the lowest since 1982.
- 32 IEA nations coordinated a 400M barrel release — the largest in history, surpassing the 2022 Ukraine response.
- Oil is above $100/barrel (+40% from pre-war levels).
The SPR bridge runs roughly into mid-July at current draw rates. After that, there is no plan B if Hormuz remains shut.
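A back-of-envelope sketch of the SPR arithmetic. The release size, window, and post-release level come from the figures above. The release start date is a hypothetical placeholder, since no start date is given here; it only shows how a mid-March start lands the bridge in mid-July.

```python
from datetime import date, timedelta

RELEASE_BBL = 172e6        # authorized SPR release (barrels)
RELEASE_DAYS = 120         # approximate release window (days)
POST_RELEASE_BBL = 243e6   # SPR level after the release (barrels)


def spr_bridge(start: date) -> tuple[float, float, date]:
    """Return (daily draw rate, pre-release inventory, bridge end date).

    `start` is an assumed release start date, not a sourced figure.
    """
    daily_draw = RELEASE_BBL / RELEASE_DAYS
    pre_release = POST_RELEASE_BBL + RELEASE_BBL
    end = start + timedelta(days=RELEASE_DAYS)
    return daily_draw, pre_release, end


# Hypothetical start date, for illustration only.
rate, inventory, end = spr_bridge(date(2026, 3, 16))
print(f"draw ~{rate / 1e6:.2f}M bbl/day, pre-release SPR ~{inventory / 1e6:.0f}M bbl, bridge ends {end}")
```

Shift the assumed start date and the end date moves one-for-one; the draw rate (~1.43M bbl/day) does not.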
Correction from original: "Russian oil waivers" was not verified as a specific policy mechanism. The actual stopgap is the coordinated IEA release + SPR drawdown.
Ras Laffan: LNG + Helium Dual Cascade
Qatar's Ras Laffan damage is the LNG story. Iranian missile strikes on March 2 and March 18 damaged Trains 4 and 6, taking out 12.8 MTPA of production. QatarEnergy confirmed a 17% reduction in LNG export capacity, $20B in annual revenue losses, and a 3-5 year repair timeline. QatarEnergy declared force majeure on its entire LNG output. Ras Laffan was roughly 20% of global LNG supply. There is no spare capacity at that scale anywhere in the world.
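A quick consistency check on the LNG numbers, using only the figures above and treating Ras Laffan as equivalent to Qatar's LNG export capacity (an assumption). It separates the physically damaged capacity from the output covered by force majeure.

```python
LOST_MTPA = 12.8                 # capacity lost from Trains 4 and 6 (MTPA)
LOST_SHARE_OF_QATAR = 0.17       # QatarEnergy's stated capacity reduction
RAS_LAFFAN_GLOBAL_SHARE = 0.20   # Ras Laffan's approximate share of global LNG

# Back out the capacity bases implied by the stated shares.
implied_qatar_mtpa = LOST_MTPA / LOST_SHARE_OF_QATAR
implied_global_mtpa = implied_qatar_mtpa / RAS_LAFFAN_GLOBAL_SHARE
lost_share_of_global = LOST_MTPA / implied_global_mtpa

print(f"implied Qatar capacity ~{implied_qatar_mtpa:.0f} MTPA")
print(f"implied global LNG market ~{implied_global_mtpa:.0f} MTPA")
print(f"damaged trains ~{lost_share_of_global:.1%} of global supply")
```

On these numbers the damaged trains are roughly 3-4% of global LNG for the 3-5 year repair window, while the force majeure takes the full ~20% offline for as long as it lasts.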
The 1973 parallel is directionally valid. That crisis forced demand destruction, not just price adjustment. We're in the early stages of that same dynamic. The LNG situation may be worse — you can reroute pipeline gas and tanker routes, but you can't reroute what can't leave port and what's physically damaged.
The helium cascade goes deeper. Qatar also produced roughly 30% of the world's helium supply, which is now offline. Helium is irreplaceable in semiconductor manufacturing — it cools superconducting magnets during chipmaking and flushes toxic residue after wafer washing. There is currently no viable substitute. Existing helium shipments sustain Asian fab operations (TSMC, Samsung, SK Hynix) through approximately early April 2026. Spot prices have surged 40-100%.
This creates a second cascade from the same physical event:
Hormuz blockade
→ Ras Laffan missile damage
→ 17% LNG capacity lost (energy crisis)
→ ~30% global helium supply offline (materials crisis)
→ Semiconductor fabs slow production
→ Less DRAM/HBM produced
→ Memory prices rise further
→ Consumer hardware costs spike
→ Enterprise GPU/server costs spike
→ Self-hosting gets more expensive
→ Cloud inference gets more expensive
→ Token prices have upward pressure from hardware layer
The vicious circle: API prices rise as subsidies unwind → users flee to self-hosting → self-hosting hardware costs are rising from the same root cause → no cheap exit.
If the helium shortage extends beyond early April (and with Ras Laffan on a 3-5 year repair timeline, it will), we're looking at potential chip fab slowdowns that could constrain GPU and memory supply for 12-18 months minimum, even if alternative helium sources (US BLM reserves, Siberia, Tanzania's Helium One) scale up.
RAM/DRAM Crisis and Consumer Hardware Squeeze
The memory shortage isn't hypothetical. It's a verified, cascading crisis with direct links to the Hormuz situation.
The numbers:
- DRAM prices rose 172% throughout 2025, then another 80-90% in Q1 2026 alone
- DDR5 32GB kits that cost $80-120 in mid-2025 have tripled in price
- DDR4 32GB kits went from $60-90 to $150-180
- Samsung, SK Hynix, and Micron (95% of global DRAM production) are diverting capacity from consumer to HBM for AI
- Data centers will consume 70% of all memory chips produced in 2026
- Producing 1 bit of HBM consumes 3x the wafer capacity of DDR5
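Percentage increases compound rather than add: a 172% rise followed by an 80-90% rise is not a 252-262% rise. This sketch chains them as multiples, assuming both figures refer to the same DRAM price index.

```python
def cumulative_multiple(*pct_increases: float) -> float:
    """Chain successive percentage increases into a single price multiple."""
    multiple = 1.0
    for pct in pct_increases:
        multiple *= 1 + pct / 100
    return multiple


low = cumulative_multiple(172, 80)   # 2025 rise, then low end of the Q1 2026 rise
high = cumulative_multiple(172, 90)  # 2025 rise, then high end of the Q1 2026 rise
print(f"DRAM prices ~{low:.1f}x-{high:.1f}x their start-of-2025 level")
```

That is roughly a 5x move in about fifteen months, which fits the DDR5 kit prices tripling from mid-2025.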
The consumer squeeze is measurable: Steam Hardware Survey for March 2026 shows a 20% drop in gaming PCs with 32GB RAM. AMD is raising GPU prices 10%+. Nvidia cut GPU production 40%. GPUs are selling at 130% of MSRP. Dell raised business customer prices 30%.
This matters for the "self-hosting as escape valve" thesis. When API prices eventually rise, the fallback plan for SMBs and consumers is "run open-source models locally." But local hardware is getting more expensive at the same time, because the same AI buildout that's inflating token economics is also vacuuming up the memory supply chain.
Energy, Water, and Power Grid Constraints
Energy as the hidden AI tax — this affects everyone, not just AI users:
- US data centers now consume 176 TWh/year — 4.4% of national electricity
- Projected to hit 6.7-12% by 2028
- Global data center consumption hitting 1,100 TWh in 2026 (equal to Japan's entire national consumption)
- Retail electricity prices up 42% since 2019
- Residential bills up $16-18/month in data center regions (PJM market)
During stagflation, rising energy costs compound. Consumers paying more for electricity are also paying more for RAM, more for any product with AI-inflated compute costs baked in, and potentially more for API access as subsidies unwind. It's a cost-of-living multiplier.
Self-hosting cost: $30-80/month in electricity for a GPU workstation. Not catastrophic, but it adds up — and electricity costs are rising, not falling.
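The $30-80 range falls out of simple energy arithmetic. The wattages, duty cycles, and electricity rates below are hypothetical profiles chosen for illustration, not measured figures.

```python
def monthly_electricity_cost(watts: float, hours_per_day: float,
                             usd_per_kwh: float, days: int = 30) -> float:
    """Monthly cost for a device drawing `watts` for `hours_per_day`."""
    kwh = watts / 1000 * hours_per_day * days
    return kwh * usd_per_kwh


# Hypothetical workstation profiles; every input is an illustrative assumption.
light = monthly_electricity_cost(watts=600, hours_per_day=12, usd_per_kwh=0.15)
heavy = monthly_electricity_cost(watts=900, hours_per_day=16, usd_per_kwh=0.18)
print(f"${light:.0f}-${heavy:.0f}/month")
```

Cost is linear in the rate, so any rise in retail electricity prices passes straight through to the self-hosting bill.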
Water — the next physical chokepoint. Not affecting token prices yet, but it will constrain where new compute capacity can be built:
- Data centers use 560 billion liters of water per year globally, projected to hit 1.2 trillion by 2030
- 870% projected increase in water usage for cooling (a different, more aggressive projection than the roughly 2x global increase above)
- 2/3 of US data centers built since 2022 are in high water-stress areas
- Regulatory requirements emerging (mandatory Water Usage Effectiveness metrics in some regions)
- Modern AI racks generate 10x more heat than traditional servers
This creates geographic bottlenecks for new data center construction, which constrains the supply side of compute capacity, which puts upward pressure on cloud pricing.
The Geopolitical Scenarios (Why This Doesn't Resolve Quickly)
US and Israel launched strikes on Iran on February 28, killing Supreme Leader Khamenei. Iran closed the Strait of Hormuz in retaliation. Six weeks in, the war is active and escalating. Oil is at $112/barrel. The Fed is on hold: a surprisingly strong March jobs report (178K, inflated by a healthcare strike reversal) eliminated its cover for near-term cuts. Private credit funds with heavy tech exposure are showing 40%+ redemption requests. A 48-hour ultimatum to Iran to reopen the Strait expires April 6. NATO allies, Japan, South Korea, and Australia have all refused to participate militarily. Trump is increasingly isolated internationally while threatening to strike civilian infrastructure.
Why quick reopening is the low-probability outcome:
- Military escalation backfires economically. Striking civilian infrastructure (desalination, power) triggers humanitarian backlash with real economic consequences — potential tariffs and sanctions against the US based on humanitarian concerns. Allied refusal to participate isn't just diplomatic posturing; it reflects genuine assessment that escalation worsens the situation.
- Iran's position is structurally defensible. The Strait is a chokepoint they physically control. Mining, coastal defense missiles, and fast-attack boat swarms make forced reopening costly even with carrier group superiority. The US can project power through the Strait but can't guarantee safe commercial transit against a motivated defender.
- The domestic political calculus favors dragging it out. Trump can't admit the war isn't working without losing the narrative. Walking away gives Iran the win. Escalating risks the kind of civilian casualties that turn international opinion from neutral to hostile. The path of least resistance is the grind.
Three scenarios with probability weights:
| Scenario | Probability | Summary |
|---|---|---|
| A — Negotiated Off-Ramp | 35% | Face-saving deal. Hormuz reopens, possibly with tolls. Oil retreats to $80-85. Damage contained but already baked in. |
| B — Grinding Stalemate | 45% | War drags 6-18 months. Hormuz partially functional but unreliable. Oil $85-100. Economy bleeds slowly. |
| C — Major Escalation | 20% | Infrastructure strikes trigger full regional war. Both Hormuz and Bab el-Mandeb blocked. Stagflation confirmed. |
How each scenario affects the AI repricing thesis:
Scenario A (35%): Negotiated Off-Ramp — Thesis slows but doesn't stop. Even in the best case: 6+ weeks of damage already done. Gas doesn't retreat immediately. Rates stay elevated through 2026 — the Fed moves slowly. The helium supply chain doesn't reverse (Ras Laffan repair is 3-5 years regardless). The RAM shortage doesn't reverse (HBM capacity reallocation is structural, not crisis-driven). The wrapper die-off was happening before the war. The subsidy math was unsustainable before the war. In Scenario A, the AI repricing still happens — just on a 2027-2028 timeline instead of Q3-Q4 2026.
Scenario B (45%): Grinding Stalemate — Thesis runs on the timeline described in this doc. This is the base case and the most likely outcome. 6-18 months of elevated energy costs, frozen Fed, persistent inflation, credit tightening, quiet wave of tech/SaaS failures. The SPR bridge expires in July with no plan B. The S-1 filing window and macro stress converge.
Key compounding effects in Scenario B:
- Private credit funds showing 40%+ redemption requests → forced distressed asset sales at 30 cents on the dollar → capital destruction in the startup ecosystem
- Floating-rate debt companies (most private SaaS/tech) face refinancing walls with the Fed frozen
- Developing nations hit hardest on energy → reduces global demand for tech products and services
- Sustained $85-100 oil for 12-18 months bakes structural inflation into the economy
Scenario C (20%): Major Escalation — Thesis plays out faster and harder. Full regional war. Both Hormuz and Bab el-Mandeb blocked. Oil at $150+. Gas at $6-7. Fertilizer (30% of global supply transits Hormuz) disrupted — affects harvests globally. Broad market repricing. Wave of defaults in floating-rate tech debt. This isn't just AI repricing — it's a full economic reset.
The Toll Booth Scenario (Subset of B, ~20% standalone): The most likely "resolution" that isn't really a resolution: Iran technically reopens Hormuz but collects transit fees. Oil settles $85-95 long-term vs pre-war $70s. A permanent risk premium baked into every good that moves through the Strait. Structural inflation baseline moves higher permanently. For the AI thesis: energy costs stay elevated indefinitely, data center electricity costs don't retreat, and the physical-layer pressures on token economics become permanent features, not temporary shocks.
Probability-weighted impact:
| Scenario | Probability | AI Repricing Timeline | Severity |
|---|---|---|---|
| A — Off-Ramp | 35% | Slowed to 2027-2028 | Moderate — subsidy math still unsustainable, just slower unwind |
| B — Stalemate | 45% | As described (Q3 2026–Q2 2027) | Severe — full thesis plays out on timeline |
| C — Escalation | 20% | Accelerated and amplified | Catastrophic — AI repricing is a footnote to broader crisis |
Expected outcome: In 65% of scenarios (B+C), the full causal chain plays out on the timeline described or faster. In the remaining 35% (A), the chain still plays out — just slower. There is no scenario where Hormuz resolves AND the AI subsidy math becomes sustainable. The geopolitical variable determines the SPEED of the repricing, not WHETHER it happens.
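The scenario weights can be checked and combined directly. The probabilities and oil ranges come from the scenario descriptions above. Using range midpoints, and $150 as a floor for Scenario C's "$150+", are simplifying assumptions, so the weighted oil figure is a lower bound.

```python
# Weights and oil-price ranges ($/bbl) from the scenario descriptions.
# Midpoints, and $150 as a floor for Scenario C, are simplifying assumptions.
SCENARIOS = {
    "A: off-ramp":   {"p": 0.35, "oil": (80, 85)},
    "B: stalemate":  {"p": 0.45, "oil": (85, 100)},
    "C: escalation": {"p": 0.20, "oil": (150, 150)},
}

total_p = sum(s["p"] for s in SCENARIOS.values())
assert abs(total_p - 1.0) < 1e-9, "scenario weights should sum to 1"

p_full_chain_on_time = SCENARIOS["B: stalemate"]["p"] + SCENARIOS["C: escalation"]["p"]
expected_oil = sum(s["p"] * sum(s["oil"]) / 2 for s in SCENARIOS.values())

print(f"P(B or C) = {p_full_chain_on_time:.0%}")
print(f"probability-weighted oil (lower bound) ~${expected_oil:.1f}/bbl")
```

Even as a floor, the probability-weighted price sits around $100/bbl, well above the pre-war $70s.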
Part 2: The Financial Layer
Stagflation Trap and Fed Position
Three forces are converging: the Iran energy shock, persistent tariff-driven inflation already above target, and a labor market in "no-hire, no-fire" stagnation. Economists are calling it "stagflation lite" or "warflation." Deutsche Bank and Oxford Economics have both flagged rising recession/stagflation risks.
The Fed's actual position: The federal funds rate is at 3.50-3.75% after three cuts in late 2025. The CME FedWatch tool shows roughly two more cuts priced in for 2026 — not "priced out to December" as originally stated. The Fed is cautious, not frozen. But the bind is real: cutting further risks validating energy-driven inflation. Holding risks choking a weakening economy.
The 10-year Treasury is at 4.31-4.37% as of April 3, up 0.21 points in the past month and 0.32 points year-over-year. The direction (climbing) is correct. The level is slightly below the 4.4% originally stated.
Stagflation also attacks consumer spending directly. This matters for OpenAI's consumer revenue base and ad-funded models (Meta) that subsidize parts of the AI ecosystem.
VC Concentration (Not Freeze)
This is not a VC freeze. It's a VC concentration event. The distinction matters — and actually makes the underlying argument stronger.
Q1 2026 shattered records: $297B in global venture funding, with AI taking 81% of the total. But four rounds accounted for 65% of all global VC: OpenAI ($122B), Anthropic ($30B), xAI ($20B), Waymo ($16B).
The money is there. Most of it is going to four companies.
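The concentration can be recomputed from the listed rounds. They sum to $188B, about 63% of the quarter, slightly under the ~65% headline. The last line assumes all four rounds are counted inside the 81% AI share.

```python
Q1_2026_TOTAL_B = 297   # global venture funding, Q1 2026 ($B)
AI_SHARE = 0.81         # AI's share of the total
TOP_ROUNDS_B = {"OpenAI": 122, "Anthropic": 30, "xAI": 20, "Waymo": 16}

top_total = sum(TOP_ROUNDS_B.values())
top_share = top_total / Q1_2026_TOTAL_B
everyone_else = Q1_2026_TOTAL_B - top_total
# Assumes all four rounds are counted inside the 81% AI share.
ai_outside_top = Q1_2026_TOTAL_B * AI_SHARE - top_total

print(f"top four rounds: ${top_total}B ({top_share:.0%} of the quarter)")
print(f"every other company combined: ${everyone_else}B")
print(f"AI funding outside the top four: ~${ai_outside_top:.0f}B")
```

Under that assumption, every other AI company on earth split roughly $53B in a "record" quarter.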
Meanwhile, the wrapper tier is dying at scale:
- 3,800 AI startups shut down in 2025 (27% of the cohort)
- Another 1,800 closed in early 2026 (additional 13%)
- 40% failure rate in under 24 months
- Series A shutdowns jumped 2.5x year-over-year
- AI wrappers — products built on API access without proprietary data or deep integration — are catastrophically over-represented in closures
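The shutdown figures are internally consistent if both percentages refer to the same startup cohort (an assumption). This sketch backs out the implied cohort size and checks the 13% and 40% figures against it.

```python
SHUTDOWNS_2025 = 3_800
SHARE_2025 = 0.27             # 2025 shutdowns as a share of the cohort
SHUTDOWNS_EARLY_2026 = 1_800

implied_cohort = SHUTDOWNS_2025 / SHARE_2025
share_early_2026 = SHUTDOWNS_EARLY_2026 / implied_cohort
cumulative_share = (SHUTDOWNS_2025 + SHUTDOWNS_EARLY_2026) / implied_cohort

print(f"implied cohort: ~{implied_cohort:,.0f} startups")
print(f"early-2026 closures: {share_early_2026:.0%}; cumulative: {cumulative_share:.0%}")
```

That is roughly 5,600 closures out of a cohort of about 14,000.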
The thin-wrapper companies were already facing a structural death sentence from model improvement velocity and open-source competition. Macro stress just pulls the timeline forward and removes the soft landing option. Startups that were 18 months from ramen profitability don't survive a funding gap in a stagflationary environment.
The critical variable remains: A significant but unquantified portion of API revenue at OpenAI and Anthropic comes from VC-backed startups burning investor money to pay monthly API bills. No one publishes a clean breakdown. But when the funding layer concentrates (not dries up, concentrates), the 5,600 startups that have already shut down and the thousands more on the edge quietly stop paying API bills. This doesn't show as a clean revenue decline immediately. It shows as churn over 2-3 quarters.
Hyperscaler Capex: $660-700B Committed, $400B Borrowed
The hyperscaler capex buildout is massive and real. The five largest providers (Microsoft, Alphabet, Amazon, Meta, Oracle) have committed to $660-700B in 2026 capex. Amazon alone is at ~$200B. Morgan Stanley expects hyperscaler borrowing to top $400B in 2026 — double 2025. Amazon is projected to go free-cash-flow negative.
This spending is locked in for 2026. But the 2027 authorization conversations happen in a macro environment where:
- Rising rates increase the cost of that debt
- A Mag7 correction ($2T in combined market cap lost from highs) reduces equity market support
- Mag7 profit growth is expected at 18% in 2026, the slowest since 2022
Mag7 Correction
The Mag7 has lost $2T in combined market cap from highs. Profit growth is expected at 18% in 2026, the slowest since 2022. 57% of investors in a Deutsche Bank survey view a tech bubble as the top 2026 risk. Ray Dalio says the US is "roughly 80% into a market bubble similar to 1929 and 2000," using Nvidia specifically as the example.
The macro environment pressures hyperscalers to slow 2027 authorizations. If any of them do, the downstream effects on AI lab revenue, GPU vendors, and the entire infrastructure buildout are steep and sudden.
Part 3: The AI Lab Economics
OpenAI: Capital Position, Burn Rate, Conditional Funding, S-1 Timeline
The burn trajectory: OpenAI's own projections show $25B cash burn in 2026, $57B in 2027. Total projected burn through 2030: $665B. Positive cash flow not expected until 2030. In 2025, OpenAI lost $5B on $3.7B in revenue: a loss of $1.35 for every dollar earned, meaning it spent roughly $2.35 per dollar of revenue.
Revenue quality: OpenAI's revenue has always been ~70% consumer, ~30% API/enterprise. The figure that dropped from 50% was OpenAI's share of the enterprise foundation-model market (now ~34%), as Anthropic's share more than tripled from 12% to 40%. The free-user ratio: ~900M weekly active users and ~50M paying subscribers, so roughly 94-95% are free.
The conglomerate play: Sam Altman is not building the best model. He's building a conglomerate:
- Jony Ive's hardware company — $6.5B all-equity acquisition. Devices play. First products launching 2026.
- TBPN media company — narrative control ahead of IPO. "Don't just build the tools, own the platforms that explain them."
- 40% of global RAM locked up — non-binding LOIs with Samsung and SK Hynix for 900,000 DRAM wafers/month. Drove up global DRAM prices. Micron shut down Crucial (their 29-year consumer memory brand) in response. Then OpenAI partially walked it back when Stargate Abilene fell apart over financing disagreements with Oracle. A significant portion of the 2025-2026 memory crisis was built on expectations of demand that may not materialize.
- $122B raise, IPO preparation — building the war chest and the currency for future acquisitions.
The strategy: when a clear winner emerges in any adjacent space (devices, media, enterprise workflow), buy it or copy it. The model is the commodity; the ecosystem is the moat. This is the Zuckerberg playbook — but executed from a position of financial fragility disguised as dominance ($57B projected burn in 2027, 95% free users, enterprise market share bleeding to Anthropic).
The SoftBank dependency: $10B tranches arriving April 1 and July 1, 2026.
S-1 timeline: OpenAI's IPO filing is widely expected in Q3 2026, with a potential listing in Q4 2026 or Q1 2027. CFO Sarah Friar has hinted 2027 may be more realistic. As of April 2, no S-1 has been filed. Anthropic may also file mid-2026. If Anthropic files first, expect OpenAI to accelerate.
Anthropic: Capital Position, Breakeven Path, Enterprise Moat
Daniela Amodei stated explicitly: "the next phase won't be won by the biggest pre-training runs alone, but by who can deliver the most capability per dollar of compute." Anthropic is not trying to out-spend OpenAI. They're trying to make each compute dollar go further, then lock customers through workflow depth.
The moves that confirm this:
- Mandatory consumption commitments replacing per-user enterprise fees. Customers commit to annual token spend, not seat counts. This is an annuity structure — harder to churn off of than a subscription you can cancel monthly.
- Removing API volume discounts that historically gave 10-15% relief. Counterintuitive during growth unless you're confident enterprise customers are sticky enough to absorb it.
- Claude Cowork — industry-specific plugins for sales, finance, HR, investment banking. Once a sales team's workflow runs through Claude Cowork, migrating to Llama isn't a model swap — it's a workflow rewrite.
- Compliance as moat — HIPAA-ready configuration, SSO/SCIM provisioning, audit trails, admin controls. These aren't features. They're switching costs disguised as features.
- $14B ARR (up from $1B fourteen months prior), 85% enterprise, 8 of Fortune 10, 40% of enterprise LLM spending share.
Anthropic doesn't intend to let customers drift to OSS. Their strategy is to make the switching cost exceed the price differential. If your compliance infrastructure, audit trails, HIPAA configuration, and team muscle memory are all built around Claude, a 2x token price increase is cheaper than migrating. That's the moat. Not model quality (which OSS is closing). Not price (which they can't win long-term). Integration depth and compliance infrastructure.
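The switching-cost argument reduces to a payback calculation: migration only makes sense if the annual savings recoup the one-time migration cost within the customer's planning horizon. All dollar figures below are hypothetical.

```python
def payback_years(incumbent_annual: float, alternative_annual: float,
                  migration_cost: float) -> float:
    """Years of savings needed to recoup a one-time migration cost.

    Returns infinity when the alternative is not actually cheaper.
    """
    annual_savings = incumbent_annual - alternative_annual
    if annual_savings <= 0:
        return float("inf")
    return migration_cost / annual_savings


def should_migrate(incumbent_annual: float, alternative_annual: float,
                   migration_cost: float, horizon_years: float) -> bool:
    """Migrate only if the move pays for itself within the planning horizon."""
    return payback_years(incumbent_annual, alternative_annual, migration_cost) < horizon_years


# Hypothetical enterprise: $500K/year today; a 2x increase takes it to $1M/year.
# Assume an OSS alternative costs $500K/year all-in and migration costs $1.5M.
years = payback_years(1_000_000, 500_000, 1_500_000)
print(f"payback: {years:.1f} years")
print(should_migrate(1_000_000, 500_000, 1_500_000, horizon_years=2))  # False
print(should_migrate(1_000_000, 500_000, 1_500_000, horizon_years=4))  # True
```

Every item in the compliance list raises `migration_cost`, which pushes payback past the customer's planning horizon. That is the mechanism the paragraph describes.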
Dario Amodei's broader positioning reinforces this: 40% of his time goes to company culture, he published a 20,000-word manifesto ("The Adolescence of Technology") at Davos, and the co-founders pledged to donate 80% of their wealth. This isn't a company optimizing for a quick exit. It's building for institutional permanence — the kind of company enterprises trust with multi-year commitments.
The Circular Financing Structure Exposed
The circular financing is confirmed and deeper than it looks:
- Amazon: $8B invested in Anthropic. AWS is primary cloud/training partner. Anthropic projects $80B total cloud spend through 2029 and shares up to 50% of gross profits with AWS on Bedrock sales. Amazon/Google collectively own ~30% of Anthropic.
- Microsoft: $13B committed to OpenAI. ~27% equity stake. OpenAI contracted for $250B in Azure services. Pays Microsoft 20% of total revenue through 2032.
- NewStreet Research estimates: for every $10B Nvidia invests in OpenAI, Nvidia sees $35B in GPU purchases back.
- The FTC is investigating these circular arrangements.
These contractual commitments don't evaporate with VC sentiment — they're multi-year deals. This masks the underlying unit economics from outside view for longer than the market expects.
The Mutual Destruction Pricing Dynamic
The mutual destruction dynamic is a prisoner's dilemma at scale. Neither lab can raise prices first because the other holds prices and steals customers. But both are burning cash at rates that require price increases.
If Anthropic raises first: OAI holds, absorbs price-sensitive customers. If OAI raises first: Anthropic holds, steals more enterprise share (already went 12%→40%). If both raise: open source eats the price-sensitive tier from below. If neither raises: burn continues, S-1 forces transparency.
This is worse than normal competitive pricing because: (1) circular financing sustains the fiction longer than market dynamics would allow, (2) no cost advantage exists to break the tie, since both run similar infrastructure at similar costs, and (3) open source is the third player that doesn't need to make money (Meta funds Llama strategically; DeepSeek is funded by a Chinese hedge fund).
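The pricing standoff can be written as a two-player game. The payoffs below are hypothetical ordinal rankings (higher is better) that encode the story above: raising alone loses share, and both raising beats both holding even after open source takes the price-sensitive tier. The sketch searches for pure-strategy Nash equilibria.

```python
from itertools import product

STRATEGIES = ("raise", "hold")

# Hypothetical ordinal payoffs, (lab_1, lab_2); higher is better.
PAYOFFS = {
    ("raise", "raise"): (3, 3),  # margins recover; OSS takes the price-sensitive tier
    ("raise", "hold"):  (1, 4),  # the lab that raises loses share to the one that holds
    ("hold", "raise"):  (4, 1),
    ("hold", "hold"):   (2, 2),  # burn continues until the S-1 forces transparency
}


def is_nash(profile: tuple[str, str]) -> bool:
    """True if neither lab can improve its payoff by deviating alone."""
    for player in (0, 1):
        for alt in STRATEGIES:
            deviated = list(profile)
            deviated[player] = alt
            if PAYOFFS[tuple(deviated)][player] > PAYOFFS[profile][player]:
                return False
    return True


equilibria = [p for p in product(STRATEGIES, repeat=2) if is_nash(p)]
print(equilibria)  # [('hold', 'hold')]
```

Under these payoffs the only stable outcome is both labs holding, even though both raising is better for each. That gap is why the resolution comes from outside the game (consolidation) rather than from pricing.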
The resolution is consolidation, not market-based price correction. Three outcomes: one lab acquires/merges with the other (antitrust complications), both narrow to enterprise-only (Anthropic already 85% there), or hyperscaler parents absorb them (Microsoft absorbs OAI into Azure, Amazon absorbs Anthropic into AWS — most likely on 3-5 year horizon).
What the S-1 Actually Changes
The S-1 is the detonator because it's the first time audited financials become public. The market will see:
- The burn trajectory: OpenAI's own projections show $25B cash burn in 2026, $57B in 2027. Total projected burn through 2030: $665B. Positive cash flow not expected until 2030.
- Revenue quality: What percentage comes from contractually committed investors (who are also cloud providers) vs. arms-length customers
- Whether the profitability path survives a stagflation scenario with rising rates and $400B+ in hyperscaler borrowing
- The SoftBank dependency: $10B tranches arriving April 1 and July 1, 2026
If macro conditions are still hostile when that filing hits — and the SPR bridge runs out around the same time — the reception will be brutal. Pushing the IPO to 2027 just delays the reckoning and deepens the burn.
Part 4: The Upstream Supply Chain
Nvidia as Bellwether
Q4 FY2026 (ended January 2026):
- Revenue: $68.1B (+73% YoY, +20% QoQ). Record quarter.
- Data center: $62B (+75% YoY). Over 50% from hyperscalers.
- Q1 FY2027 guidance: $78B (±2%), crushing consensus of $72B.
- Jensen Huang: expects to exceed $500B in total chip manufacturing for calendar 2026.
On the surface, this looks like AI demand is accelerating, not slowing. But read it more carefully:
What Nvidia's numbers actually tell us:
- The $62B in data center revenue is coming from the same hyperscalers who've committed $660-700B in capex for 2026. This is locked-in spending being executed, not new organic demand emerging. The orders were placed 12-18 months ago. Nvidia is shipping against commitments made in 2024-2025.
- 50%+ from hyperscalers means Nvidia's revenue is concentrated in 4-5 customers. If any of them slow 2027 authorizations (which the macro environment pressures them to do), Nvidia's revenue cliff is steep and sudden.
- An NBER study (February 2026) found 90% of firms report no AI impact on workplace productivity, despite executives projecting 1.4% productivity increase and 0.8% output increase. The gap between AI spending and AI results is quantified.
As an IPO bellwether: Nvidia's stock performance is the single best proxy for AI market sentiment. If Nvidia is down when OAI or Anthropic file their S-1, the IPO reception will be hostile regardless of the labs' individual metrics. Nvidia's current price (~$192) with analyst targets ranging from $140 (bearish, 27% downside) to $352 (bullish, 83% upside) tells you the market is genuinely undecided on whether AI capex is justified. That uncertainty is the environment the IPOs have to navigate.
Nvidia's May 2026 earnings report is arguably the single most important data point for whether OAI's S-1 lands in a receptive or hostile market.
TSMC: The Physical Bottleneck
TSMC is the single point of failure for advanced AI chips. Nvidia, AMD, Apple, Qualcomm — everyone fabbing at leading edge goes through TSMC.
Capacity:
- Advanced-node wafer demand is 3x available supply. This isn't soft demand — it's orders TSMC can't fill.
- CoWoS (advanced packaging for AI chips) is sold out through 2026 and into 2027. Nvidia has booked >50% of CoWoS capacity for 2026-27.
- TSMC is quadrupling CoWoS capacity to ~130,000 wafers/month by late 2026. But that still won't close the gap.
- AI accelerator revenue CAGR 2024-2029 revised UP to 54-56% (from 45%).
Capex:
- $150B in capex planned over three years. $52-56B in 2026 alone.
- 10-20% going to packaging/testing (the real bottleneck, not transistors).
What this tells us: TSMC's order book confirms that AI chip demand is real and physical, not just financial. But the supply side CAN'T keep up until 2027-2028 at the earliest. Advanced packaging is the binding constraint, and quadrupling capacity takes 12-18 months from decision to production. Even if demand stays flat, the supply shortage persists through 2027. And if demand drops, TSMC has $150B in committed capex that can't be un-committed — creating an overbuild risk on the other side.
ASML: The 12-18 Month Forward Indicator
ASML makes the machines that make the chips. Their order book is the longest-lead indicator in the entire supply chain because their EUV lithography systems take 12-18 months to deliver.
Current state:
- Backlog: €38.8B ($46.3B) — record
- Net bookings: €28.0B ($33.4B)
- EUV systems: 65% of backlog (up from 62% prior year)
- Massive $7.9B order from SK Hynix alone (for HBM production)
- 2026 revenue guidance: €34-39B ($40-46B)
- Revenue target: $71B by 2030
What this tells us: The semiconductor industry is building capacity AT SCALE for 2027-2028. The ASML backlog confirms:
- Memory manufacturers are aggressively expanding HBM capacity (the SK Hynix order). The DRAM-to-HBM capacity shift continues and potentially accelerates. Consumer DRAM stays starved.
- Leading-edge capacity expansion is locked in through 2027-2028. These machines are ordered, being built, and will be installed. The question isn't whether capacity comes — it's whether demand justifies it when it arrives.
- EUV's share of backlog increasing means the expansion is concentrated at the frontier — the most expensive, most AI-relevant nodes. Very specifically AI-optimized.
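Dividing the backlog by the 2026 revenue guidance gives a rough sense of how far forward ASML's order book reaches. This assumes the backlog converts to revenue at the guided pace, which ignores delivery timing and product mix.

```python
BACKLOG_EUR_B = 38.8                 # record backlog (EUR billions)
GUIDANCE_2026_EUR_B = (34.0, 39.0)   # 2026 revenue guidance range (EUR billions)


def backlog_coverage_months(backlog: float, annual_revenue: float) -> float:
    """Months of revenue the backlog represents at a given annual run-rate."""
    return backlog / annual_revenue * 12


for revenue in GUIDANCE_2026_EUR_B:
    months = backlog_coverage_months(BACKLOG_EUR_B, revenue)
    print(f"at EUR {revenue:.0f}B/yr: ~{months:.1f} months of revenue in backlog")
```

Roughly 12-14 months of revenue already booked, consistent with the 12-18 month EUV delivery lead.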
Memory Manufacturers: Consumer Starvation Timeline
SK Hynix:
- ALL DRAM, NAND, and HBM production through 2026 is sold out
- HBM annualized revenue run-rate: ~$8B
- New M15X fab: HBM4 production starting February 2026, ~10,000 wafers/month initially, scaling severalfold by end of 2026
- Yongin cluster: begins operations 2027, adds HBM4 capacity
- Holds 62% of HBM market share
Samsung:
- Planning 50% HBM capacity increase in 2026 (to ~250,000 wafers/month by year-end)
- Raising DRAM prices 30% for Q2 2026 contracts
- P5 facility (Pyeongtaek): not operational until 2028
- DDR4 extended lifecycles for enterprise/contract customers; consumer and OEM markets unlikely to see relief before 2027
Micron:
- Shut down Crucial (its consumer brand) entirely and redirected all wafer allocation to enterprise/datacenter
The consumer starvation math:
- Data centers consume 70% of all memory chips produced in 2026
- HBM consumes 3x the wafer capacity of DDR5 per gigabyte
- All three manufacturers (95% of global DRAM) are prioritizing HBM/enterprise over consumer
- Consumer DRAM relief: not before late 2027 at the earliest, and only if new fabs (SK Hynix Yongin, Samsung P5) come online on schedule
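A minimal sketch of why shifting wafers to HBM shrinks total bit supply. It treats the 3x wafer-per-bit penalty as a constant and ignores yield and die-size differences; the 20% and 40% HBM wafer shares are hypothetical.

```python
HBM_WAFER_PENALTY = 3.0  # HBM needs ~3x the wafer capacity per bit vs DDR5 (per the text)


def bit_output(total_wafers: float, hbm_wafer_share: float,
               bits_per_ddr5_wafer: float = 1.0) -> tuple[float, float]:
    """Return (conventional DRAM bits, HBM bits) for a given wafer allocation."""
    hbm_wafers = total_wafers * hbm_wafer_share
    conventional_wafers = total_wafers - hbm_wafers
    conventional_bits = conventional_wafers * bits_per_ddr5_wafer
    hbm_bits = hbm_wafers * bits_per_ddr5_wafer / HBM_WAFER_PENALTY
    return conventional_bits, hbm_bits


# Hypothetical allocations: HBM's share doubles from 20% to 40% of a fixed wafer pool.
before = bit_output(100, 0.20)
after = bit_output(100, 0.40)

conventional_drop = 1 - after[0] / before[0]
total_drop = 1 - sum(after) / sum(before)
print(f"conventional bits: -{conventional_drop:.0%}, total bits: -{total_drop:.0%}")
```

With a fixed wafer pool, every wafer moved to HBM removes three wafers' worth of consumer-grade bits for one wafer's worth of HBM bits. That is why consumer supply stays starved until new fabs add wafers.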
Power/Cooling as the New Binding Constraint
Beyond chips and memory, the upstream picture reveals a constraint nobody was talking about 12 months ago:
- Power draw per rack: surged from 10-14 kW to over 100 kW for AI accelerator clusters
- 50% of global data center projects face delays due to power limitations and grid equipment shortages
- Power availability has replaced fiber connectivity as the primary data center siting constraint
- 11 GW of announced data center capacity for 2026 has no construction underway
- AEP Ohio has paused ALL new data center interconnections due to insufficient power infrastructure
This is the ceiling that money can't buy. You can order more Nvidia GPUs. You can build more TSMC fabs. But you can't build power generation and grid infrastructure in 12 months. The lead time on new power generation is 3-7 years depending on type (gas turbine, nuclear, solar+storage).
The $660-700B hyperscaler capex commitment is real money chasing physical capacity that doesn't exist yet and can't be built fast enough.
The Four Numbers to Watch
These are the numbers that can't be spun. They're the ground truth beneath the narrative:
- Nvidia's revenue trajectory (quarterly earnings, May and August 2026)
- TSMC utilization rates (if they drop below 90% at advanced nodes, alarm bells)
- ASML order cancellations or deferrals (the longest-lead indicator of capex retrenchment)
- Memory contract pricing (Samsung's Q2 2026 +30%; if Q3 moderates, demand is softening)
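The four indicators can be tracked as simple threshold checks. The sketch below is a hypothetical illustration. The 90% TSMC threshold and the "Q3 moderates versus Q2" rule come from the list above. The text gives no threshold for Nvidia, so flagging a sequential revenue decline is an assumption. Any nonzero count of ASML cancellations or deferrals is also treated as a signal.

```python
from dataclasses import dataclass


@dataclass
class WatchInputs:
    nvidia_qoq_revenue_growth: float  # e.g. 0.10 for +10% quarter over quarter
    tsmc_advanced_node_util: float    # e.g. 0.95 for 95%
    asml_deferred_orders: int         # reported cancellations or deferrals
    memory_q2_contract_change: float  # e.g. 0.30 for Samsung's +30%
    memory_q3_contract_change: float


def warning_signals(x: WatchInputs) -> list:
    """Return the names of any indicators that have tripped."""
    signals = []
    if x.nvidia_qoq_revenue_growth < 0:  # assumption: the text sets no threshold
        signals.append("nvidia: sequential revenue decline")
    if x.tsmc_advanced_node_util < 0.90:
        signals.append("tsmc: advanced-node utilization below 90%")
    if x.asml_deferred_orders > 0:
        signals.append("asml: order cancellations or deferrals")
    if x.memory_q3_contract_change < x.memory_q2_contract_change:
        signals.append("memory: Q3 contract pricing moderating")
    return signals


print(warning_signals(WatchInputs(0.12, 0.88, 0, 0.30, 0.15)))
```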
Part 5: Token Economics and Market Stratification
Current Subsidized Pricing Across Tiers
Token prices have fallen ~80% in the past 12 months. GPT-4-class input went from $30/M tokens to $1.75/M tokens in two years, and the industry median is a 50x price decline per year. Underlying compute costs haven't fallen proportionally. Inference costs are down ~10x annually, so competitive pressure is pushing prices down faster than costs. Labs are deliberately subsidizing. This is classic platform economics in year 3-4 of a 5-8 year cycle, and the gap between cost and price is an intentional subsidy funded by investor capital.
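To compare the two-year GPT-4-class data point with the annual figures, the decline has to be annualized as a geometric mean rather than divided by two. Here is a minimal sketch using the $30 and $1.75 figures above:

```python
def annualized_decline(start_price: float, end_price: float, years: float) -> tuple:
    """Return (fold decline per year, fractional price drop per year)."""
    fold_per_year = (start_price / end_price) ** (1.0 / years)
    return fold_per_year, 1.0 - 1.0 / fold_per_year


fold, drop = annualized_decline(30.0, 1.75, 2.0)
print(f"{fold:.1f}x per year, i.e. about {drop:.0%} cheaper each year")
```

This works out to roughly 4.1x (about 76%) per year, in line with the ~80% figure. For comparison, a 50x annual decline corresponds to a 98% drop per year.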
Post-Subsidy Price Estimates
OpenAI lost $5B on $3.7B revenue in 2024, losing about $1.35 for every $1 earned (roughly $2.35 spent per $1 of revenue). Anthropic is acknowledged to be losing money on heavy individual users.
At constant volume and cost, break-even requires prices of roughly 2.35x current levels. For healthy margins (20-30%), more like 2.9-3.4x. Profitability that justifies the valuation would need more still.
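The break-even multiple follows directly from the revenue and loss figures. Total cost is revenue plus loss, and the required price multiple is cost divided by revenue, grossed up for any target margin. Below is a minimal sketch. It assumes volume and cost stay fixed when prices rise, which ignores the demand response the rest of this section worries about.

```python
def price_multiple_needed(revenue: float, total_cost: float, target_margin: float = 0.0) -> float:
    """Price multiple vs. today so revenue covers total_cost at target_margin.

    Assumes volume and costs stay fixed when prices change (no demand response).
    """
    if not 0.0 <= target_margin < 1.0:
        raise ValueError("target_margin must be in [0, 1)")
    return total_cost / (revenue * (1.0 - target_margin))


revenue_b, loss_b = 3.7, 5.0  # $B, the figures quoted above
cost_b = revenue_b + loss_b
for margin in (0.0, 0.20, 0.30):
    print(f"margin {margin:.0%}: {price_multiple_needed(revenue_b, cost_b, margin):.2f}x")
```

With these inputs, the sketch returns about 2.35x at break-even, 2.94x at a 20% margin, and 3.36x at 30%.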
But it's not uniform. Heavy enterprise users on committed contracts likely have better unit economics than consumer users generating unpredictable, bursty inference loads. The subsidy is deepest on the consumer side.
These estimates assume: subsidies unwind 30-50%, efficiency gains (TurboQuant etc.) offset ~20-30% of the increase, and physical supply constraints (RAM, helium, energy) add 10-20% upward pressure.
Tier 1: Frontier Proprietary (enterprise)
| Item | Current (subsidized) | Post-subsidy estimate | Net change |
|---|---|---|---|
| Opus-class input/M tokens | $5 | $8-12 | +60-140% |
| Opus-class output/M tokens | $25 | $40-65 | +60-160% |
| Monthly enterprise seat | ~$60 | ~$80-120 | +33-100% |
Enterprises absorb this. The 5% seeing 1.7-3x ROI will keep paying. The other 95% doing AI pilots will face hard budget conversations.
Tier 2: Mid-tier Proprietary (prosumer/SMB)
| Item | Current (subsidized) | Post-subsidy estimate | Net change |
|---|---|---|---|
| Sonnet-class input/M tokens | $3 | $5-8 | +67-167% |
| Sonnet-class output/M tokens | $15 | $25-40 | +67-167% |
| Monthly subscription (Plus-tier) | $20 | $30-50 | +50-150% |
This is the squeeze zone. The $20/month ChatGPT Plus that millions of people use is almost certainly underpriced. OpenAI already called it "accidental." Expect this tier to either rise significantly or get capability-capped (you get last-gen models, rate-limited).
Tier 3: Budget Proprietary (consumer/ad-supported)
| Item | Current (subsidized) | Post-subsidy estimate | Net change |
|---|---|---|---|
| Haiku/mini-class input/M tokens | $0.50-1.00 | $1-2 | +100-200% |
| Consumer subscription (Go-tier) | $8 | $8-12 (ad-subsidized) | Stable-ish |
| Free tier | Free (rate-limited) | Free (severely limited + ads) | Quality degrades |
This tier survives but degrades. Ad subsidies replace VC subsidies. You get functional but mediocre models with rate limits. The free tier becomes a funnel, not a product.
Tier 4: Self-hosted Open Source
| Item | Current | Post-supply-crisis estimate | Net change |
|---|---|---|---|
| RTX 4090 (24GB) | ~$1,600 | ~$2,000-2,500 | +25-56% |
| 64GB DDR5 kit | ~$200 | ~$500-700 | +150-250% |
| Electricity/month | $30-80 | $40-100 | +25-33% |
| Effective token cost (at 50%+ util) | ~$0.013/K | ~$0.02-0.03/K | +50-130% |
| Effective token cost (at 10% util) | ~$0.13/K | ~$0.15-0.20/K | Worse than APIs |
The break-even point for self-hosting moves higher. Currently ~2M tokens/day or 8,000 conversations/day. Post-crisis, probably 3-4M tokens/day. This excludes the expertise cost — you need someone who can set up and maintain the infrastructure.
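The break-even volume is the monthly cost of owning the hardware divided by what the API would charge per day of tokens. Below is a minimal sketch. The inputs are hypothetical round numbers, picked so the result lands near the ~2M tokens/day figure above:

- $2,400 of hardware
- 24-month amortization
- $50/month electricity
- a $2.50/M blended API price

Like the text, it excludes expertise cost.

```python
def breakeven_tokens_per_day(
    hardware_cost: float,
    amortization_months: int,
    electricity_per_month: float,
    api_price_per_million: float,
    days_per_month: int = 30,
) -> float:
    """Daily token volume at which self-hosting matches API spend.

    Excludes staff/expertise cost and assumes the hardware can actually serve
    this volume; both assumptions flatter self-hosting.
    """
    monthly_self_host = hardware_cost / amortization_months + electricity_per_month
    api_cost_per_token = api_price_per_million / 1_000_000
    return monthly_self_host / (api_cost_per_token * days_per_month)


# Hypothetical inputs chosen for round numbers, not measurements.
print(f"{breakeven_tokens_per_day(2400, 24, 50, 2.50):,.0f} tokens/day")
```

Raising the hardware or electricity inputs pushes the break-even volume up. Raising the API price pulls it down. The net direction therefore depends on which side reprices faster.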
The Three-Tier Stratification
Enterprise (Fortune 500): Costs rise 50-100% but are absorbed. These are the durable customers. Deep integrations survive. The 2027 budget conversation gets harder but doesn't kill committed deployments.
Mid-market enterprise (1000-10000 employees): This is where the pain hits. Big enough to have started AI integration, small enough that a 2x cost increase triggers a "is this actually worth it" review. Many will scale back from frontier to mid-tier models, or shift to open-source for non-critical workloads. Bifurcated AI usage: frontier for high-value tasks, open-source for everything else.
SMB and below: Gets pushed almost entirely to open-source or budget ad-supported tiers. The $20-50/month subscription either prices them out or gets capped to the point where the capability isn't worth it. The ones technical enough to self-host do fine. The rest lose access to meaningful AI capabilities.
Consumer: Free tier degrades to ad-supported, rate-limited, last-gen models. Still functional for casual use. Not functional for the "AI as daily tool" use case that power users have built workflows around. The $200/month Pro tier becomes effectively enterprise-only.
Solo developers/indie hackers: The most interesting segment. Currently building products on $3/M token APIs. Post-subsidy, their COGS doubles or triples. Many products that are marginally viable today become unviable. The survivors are the ones who can move to self-hosted open-source models — but that requires hardware investment and MLOps skills that most indie developers don't have.
Enterprise ROI Reality
Correction from original: Enterprise AI ROI is materializing — average $3.70 per dollar invested, with high-performers seeing $10.30 per dollar. But only 5% of enterprises see "real returns," and most need 2-4 years of structured enablement and formal job redesign. The correct frame isn't "ROI hasn't materialized" — it's "ROI is materializing for deep integrations but not for surface-level tool adoption." This is the survival filter: deep integration survives, wrapper-layer products die.
The Jevons Paradox is already happening. Per-token prices drop 80%. But the "token cost illusion" (Artefact's analysis): total enterprise AI bills are rising 320% because usage scales faster than cost falls. Agents, chain-of-thought reasoning, multi-turn workflows, RAG pipelines — these all multiply token consumption per task by 10-100x compared to simple chat completions.
Post-subsidy, this creates a brutal math problem: prices go up AND consumption per task keeps going up. Enterprise AI budgets don't scale linearly with either variable — they hit a ceiling. The question isn't "will companies pay more per token" but "will companies consume fewer tokens when each token costs more and each task consumes more tokens?" The answer might be: they just stop doing certain AI tasks entirely. That's the real demand destruction scenario.
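Both halves of this argument reduce to spend = price x volume. The sketch below has two parts:

- The first function backs out how much token volume must have grown for an 80% price cut to coexist with a 320% bill increase.
- The second shows how many tasks a fixed budget buys.

The inputs in the second part are hypothetical: a $10,000 budget, a 2K-token chat task, a 10x agent multiplier, and a $3/M to $6/M price move.

```python
def implied_volume_multiplier(price_drop: float, bill_increase: float) -> float:
    """Token-volume growth implied by spend = price x volume.

    price_drop=0.80 means per-token price fell 80%; bill_increase=3.20 means
    total spend rose 320%.
    """
    return (1.0 + bill_increase) / (1.0 - price_drop)


def tasks_within_budget(budget: float, price_per_m: float, tokens_per_task: int) -> int:
    """How many whole tasks a fixed budget buys at a given price and token load."""
    cost_per_task = price_per_m * tokens_per_task / 1_000_000
    return int(budget // cost_per_task)


print(f"{implied_volume_multiplier(0.80, 3.20):.0f}x more tokens")

# Hypothetical: chat task today vs. agent task (10x the tokens) after prices double.
chat_today = tasks_within_budget(10_000, 3.0, 2_000)
agent_post_subsidy = tasks_within_budget(10_000, 6.0, 20_000)
print(chat_today, agent_post_subsidy)
```

Under these assumptions, volume grew about 21x. Doubling the price while tokens per task rise 10x cuts the number of affordable tasks by roughly 20x, which is the ceiling described above.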
Open Source Adoption Barriers (The Linux Kernel Problem)
The "just switch to open source" advice is technically correct and practically wrong for most organizations.
The current developer experience of deploying OSS LLMs in production:
- Building infrastructure from scratch represents months of engineering work before teams can focus on their actual application
- Production challenges include: autoscaling (lunch traffic is 10x morning), multi-region orchestration for latency, zero-downtime model updates, GPU cost optimization at low utilization
- Teams spend weeks evaluating which model to use, when deployment reliability matters more than model choice
- Ollama made it "one command to run a model locally" — excellent for a developer experimenting on a Saturday, insufficient for a 50-person company needing reliable, monitored, secure inference with audit trails
The capability gap is real and concentrated in the high-value tasks: Open source is "good enough" for 70-80% of tasks by volume. But the 20-30% where frontier models genuinely outperform — complex reasoning, multi-step agent workflows, nuanced document analysis, large-codebase code generation — tend to be the HIGH-VALUE tasks. The ones where AI actually delivers measurable ROI.
This creates a paradox for the "switch to OSS to save money" pitch: the tasks where you'd save money are the tasks that barely matter, and the tasks that matter still need frontier models. The rational response is a hybrid architecture — frontier for high-value, OSS for commodity — but that requires the architectural sophistication to route between them, which most mid-market companies don't have.
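The hybrid architecture can be sketched as a router plus a blended-cost calculation. This is a deliberately simplified, hypothetical illustration. It routes on a single per-task business-value threshold; production routers usually classify by task type or difficulty. The task values, token counts, and $/M prices are illustrative, not quotes.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    business_value: float  # hypothetical $ value of a correct result
    tokens: int


def route(task: Task, value_threshold: float) -> str:
    """Send high-value work to a frontier API, everything else to self-hosted OSS."""
    return "frontier" if task.business_value >= value_threshold else "oss"


def blended_cost(tasks: list, prices_per_m: dict, value_threshold: float) -> float:
    """Total spend when each task is billed at the price of the tier it was routed to."""
    total = 0.0
    for t in tasks:
        tier = route(t, value_threshold)
        total += prices_per_m[tier] * t.tokens / 1_000_000
    return total


tasks = [
    Task("ticket triage", 0.50, 1_500),
    Task("contract review", 400.0, 60_000),
    Task("meeting summary", 2.00, 8_000),
]
prices = {"frontier": 15.0, "oss": 0.60}  # illustrative $/M tokens
print([route(t, 50.0) for t in tasks])
print(f"${blended_cost(tasks, prices, 50.0):.4f}")
```

In this example, almost all of the spend stays on the one high-value task. Routing the two commodity tasks to OSS saves about 13% versus sending everything to the frontier model, which is the paradox above in miniature.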
Who actually adopts OSS successfully:
- Companies with existing ML/DevOps teams (already have the talent)
- Companies processing >2M tokens/day (the math forces it)
- Companies with strong data privacy requirements (the regulation forces it)
- Developer tools companies (their users understand the tradeoffs)
Who doesn't:
- SMBs without technical staff
- Companies in the "figuring out AI" phase (need the easy path first)
- Regulated industries without compliance frameworks for self-hosted models
- Anyone who needs to move fast and can't afford months of infrastructure work
The Talent Gap as a Self-Hosting Barrier
Self-hosting is the rational economic response to rising API prices. But it requires MLOps expertise that is scarce and expensive. During an economic downturn, companies cut headcount. The engineers who can run self-hosted LLM infrastructure are exactly the ones who don't get laid off — they're too valuable. This means:
- Enterprises that already have ML teams can self-host (and save money)
- Enterprises that don't have ML teams can't self-host (and pay rising API prices)
- SMBs can't afford the talent regardless
The talent gap widens the stratification between large and small companies' access to AI capabilities. It's a moat for enterprises and a wall for everyone else.
Part 6: The Endgame Scenarios
Overbuild vs. Soft Landing vs. Captive Capability
The upstream data reveals the fundamental tension. TSMC plans ~$150B in capex over three years. ASML has $46B in backlog. Samsung and SK Hynix are adding 50%+ HBM capacity. All of this arrives in 2027-2028.
Scenario A — Soft Landing (55-60%): Capex growth slows to 25-49%, capacity gets absorbed, token prices settle at post-subsidy levels. OAI survives diminished, eventually absorbed by Microsoft. Anthropic IPOs and reaches profitability.
Scenario B — Bust (25-30%): 2027 capex cuts 20-30%, TSMC utilization drops, GPU prices crash 30-50%. OAI dead or absorbed by mid-2027 (Amazon's conditional $35B never unlocks). Anthropic survives — the Amazon-in-2001 parallel. Paradoxically the BEST long-term outcome for AI accessibility.
Scenario C — Captive Capability (10-15%): Demand holds at enterprise tier but collapses for external customers. Hyperscalers use the infrastructure internally. Both labs absorbed. AI becomes a feature of cloud platforms, not a separate market.
In EVERY scenario, OAI ends up as part of Microsoft within 3-5 years. Anthropic has a genuine survival path in A and B.
The Corrected Timeline
| Window | What Happens | Confidence |
|---|---|---|
| Now-April | Hormuz is the binary. Iran-Oman protocol negotiations underway. UK-led 40-nation coalition forming. Reopens = slower unwind. Stays shut = accelerated. | High (situation is live) |
| Q2 2026 | Stagflation lite confirmed or not. Fed cautious but not frozen (has room to cut from 3.50-3.75%). SPR release bridge holds through mid-July. Bond markets continue repricing. | High |
| Q3 2026 | OAI S-1 filed or delayed. Anthropic may file first. Wrapper die-off continues — 40% cumulative failure rate. VC concentration intensifies, not freezes. SPR bridge expires ~July. | Medium-High |
| Q4 2026 | API demand decay from dead wrappers starts showing in lab revenue. Circular financing relationships become scrutinized (FTC already investigating). Enterprise demand share grows as wrapper revenue falls off. | Medium |
| Q1-Q2 2027 | True unit economics land in public markets or next funding round markdowns. Open-source pressure on token pricing is existential for mid-tier API products. Hyperscaler 2027 capex authorizations happen under hostile macro conditions. The Cisco/Nvidia thesis either validates or gets a reprieve depending on whether enterprise ROI narratives hold. | Medium |
What Doesn't Break
Genuine enterprise demand from companies deeply integrated on Claude Code, Azure AI, and AWS Bedrock is durable. The 5% of enterprises seeing real returns will keep spending. The 70% of Fortune 100 companies using Claude won't churn. Deep workflow integration creates switching costs that survive a macro downturn.
What Breaks
Everything built on the assumption that VC-subsidized growth was real demand. The wrapper layer. The consumer free-tier economics at current scale. The fiction that circular financing represents organic market validation. The token pricing structure that requires perpetual subsidy.
The pyramid was always going to compress. The Iran war gave it a hard deadline — the SPR bridge runs out in July, and the S-1 filing window opens around the same time. Those two events converging is the inflection point.
The Dotcom Parallel (and Why It's Imperfect)
Ray Dalio's "80% into a bubble" assessment. 57% of investors see tech bubble as top risk. Michael Burry specifically uses the Cisco/Nvidia comparison.
The parallel: massive infrastructure buildout completing just as demand peaks. In the dotcom era, fiber was laid that wouldn't be lit for a decade. In AI, compute capacity is being built that may exceed demand by 2027-2028.
Why it's imperfect: unlike the dotcom era, the enterprise demand signal is real (if uneven). 5% of enterprises see genuine returns. The Fortune 100 isn't going back. The question is whether real demand justifies the scale of investment, not whether real demand exists. The gap between AI spending and AI results is quantified (NBER: 90% of firms report no productivity impact) but the survivors are genuinely transformative.
If hyperscaler demand slows, capacity arrives into a market that doesn't need it all. GPU prices drop, memory prices eventually normalize, compute gets cheaper, and token prices fall further — not because of efficiency gains or competition, but because of oversupply. This would be deflationary for the AI infrastructure layer and potentially good for downstream consumers (cheaper tokens, cheaper hardware) but catastrophic for anyone who invested in infrastructure at peak prices.
Where OAI and Anthropic End Up in Each Scenario
Scenario A (Soft Landing): OAI survives diminished — the conglomerate play partially works but enterprise share continues bleeding. Eventually absorbed by Microsoft as the natural endstate of the Azure integration. Anthropic IPOs successfully in late 2026 or early 2027, reaches profitability through enterprise efficiency, becomes the independent alternative to hyperscaler-owned AI.
Scenario B (Bust): OAI dead or absorbed by mid-2027. The $57B annual burn is unsurvivable when the S-1 reveals the full picture, macro conditions prevent another mega-round, and Amazon's conditional $35B never fully unlocks. Anthropic's lower burn rate and 85% enterprise revenue base let it survive. This is the Amazon-in-2001 parallel, where the company that actually had real customers outlasted the bubble.
Scenario C (Captive Capability): Both labs absorbed. Microsoft takes OAI into Azure. Amazon takes Anthropic into AWS (or Anthropic merges with Google's DeepMind as a defensive play). AI becomes a feature of cloud platforms, not a separate market. Independent AI labs cease to exist as standalone companies.
Part 7: Liability and the Business Landscape
AI Liability and Insurance
This is the sleeper issue that most people building AI businesses are ignoring. It's about to become very loud.
The insurance market is actively excluding AI: ISO introduced CG 40 47 in early 2026 — an explicit endorsement allowing carriers to exclude ALL claims arising from generative AI outputs from standard commercial general liability (CGL) policies. This is not theoretical risk management. It's insurers saying "we won't cover this."
The market has bifurcated: firms with documented AI governance frameworks may secure affirmative coverage. Firms without evidence-based governance artifacts face absolute exclusions at renewal. 57% of companies identify AI errors and hallucinations as their top risk concern.
Courts treat AI as a tool, not an agent: Liability falls on the deploying organization, not the model provider. When an LLM gives bad advice and someone acts on it, the company that deployed the LLM is liable — not OpenAI, not Meta, not Anthropic. The model provider's terms of service explicitly disclaim liability for outputs. The deployer owns the risk.
The accumulation risk problem: The insurance industry's deepest concern: because everyone uses the same few foundation models, a critical flaw in one widely-adopted model could trigger claims across thousands of unrelated policyholders simultaneously. Unlike natural catastrophes with geographic boundaries, AI failures propagate instantly across industries and borders. This is an unpriced systemic risk.
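The accumulation problem can be sketched in a few lines. When one foundation model has a critical flaw, the number of policyholders hit simultaneously is set by that model's market share, not by geography. The market shares and the 10,000-client book below are hypothetical.

```python
def simultaneous_claims(policyholders: int, model_share: dict, flawed_model: str) -> int:
    """Deployers exposed at once if a single foundation model ships a critical flaw."""
    return round(policyholders * model_share.get(flawed_model, 0.0))


# Hypothetical market shares across an insurer's book of AI-deploying clients.
shares = {"model_a": 0.45, "model_b": 0.30, "model_c": 0.15, "other": 0.10}
book = 10_000

worst = max(shares, key=shares.get)
print(worst, simultaneous_claims(book, shares, worst))  # model_a 4500
```

In this example, a flaw in the largest model produces 4,500 correlated claims in a single event. A natural catastrophe of similar cost would at least be confined to one region.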
Frontier Provider Strategy: Anthropic vs. OpenAI
Anthropic's approach: Win on efficiency, lock through integration. Compliance as moat. The switching cost exceeds the price differential. HIPAA configuration, audit trails, SSO/SCIM — these aren't features, they're switching costs disguised as features.
OpenAI's approach: Platform conglomerate. The model is the commodity; the ecosystem is the moat. Devices (Jony Ive), media (TBPN), RAM lock-ups, acquisitions. The Zuckerberg playbook from a position of financial fragility.
Implications for the mid-market: Both frontier providers will actively retain enterprise customers through different mechanisms. Neither will compete for the SMB/consumer market on price — that tier gets degraded, ad-supported, or left to open source. The mid-market becomes contested territory where the providers want the revenue but won't invest in the support infrastructure to serve it cost-effectively.
Business Models That Survive This Environment
The convergence of frontier provider strategy, OSS adoption friction, and liability risk creates a specific set of openings — and a larger set of traps.
What doesn't work:
- AI wrappers (dying at 40% rate, no moat, commodity zone)
- Pure AI education for consumers/SMBs (teaching people to use products that are about to get more expensive or worse)
- Managed OSS hosting without a compliance layer (takes on liability without the tools to manage it)
- Anything that competes directly with Anthropic's enterprise motion or OpenAI's conglomerate strategy
What works:
1. The AI Cost Architect (consulting, $150-250/hr) — Audit enterprise AI spend. Route low-value tasks to OSS, keep high-value tasks on frontier. Document the decision framework so the legal team can defend it and the CFO can justify it. Pure advisory, no deployment liability. Timely — every mid-market company will need this as subsidies unwind and they see their API bills double. Requires: deep understanding of both frontier and OSS capabilities, ability to speak to both engineering and finance, credibility from actual builds.
2. The Compliance-First MSP ($500-2000/mo per client) — Not "we run your LLM" but "we run your LLM with governance." Audit trails, output monitoring, documented risk frameworks, human-in-the-loop checkpoints. The value isn't the model — it's the compliance wrapper that keeps the client's insurance valid and their liability manageable. This is the gap nobody fills. Frontier providers sell compliance to Fortune 500. OSS tools have no compliance layer. MSPs are bolting on AI as an afterthought. Requires: understanding of AI governance frameworks, insurance requirements, and the regulatory landscape.
3. The Human-in-the-Loop Layer (product, SaaS) — This is Sigil's positioning. A spec layer that sits between the LLM and production output. The human checkpoint is what makes the output defensible. "The AI suggested it, a human reviewed and approved it" is a legally defensible position. "The AI did it autonomously" is not. As liability pressure increases and insurance exclusions bite, every company using AI in production will need a human-in-the-loop mechanism. Requires: the product to exist and be adopted before the liability pressure peaks.
4. Edge-LLM as the Zero-Liability Consumer Play (product, open source) — Browser-native inference has a unique property: it transfers ALL liability to the end user, by design. The model runs on their device. The outputs are generated locally. There's no service provider in the middle to sue. For consumers and indie developers priced out of frontier APIs: free, private, no ongoing cost, no liability chain. The capability ceiling is lower (small models only), but in a world where every other option is getting more expensive, more restricted, or more legally fraught — "runs for free on your laptop with zero strings" has a clear value proposition.
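Both the Compliance-First MSP and the Human-in-the-Loop Layer depend on the same checkpoint. Nothing reaches production unless a named reviewer approves it, and every decision lands in an audit trail with a hash of the exact text reviewed. The sketch below is a hypothetical illustration of that checkpoint. The `AuditLog` and `review_gate` names are invented here, and this is not Sigil's design or any product's API. The log lives in memory; a real one would need durable, tamper-evident storage.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditLog:
    """Append-only record of review decisions (in memory for this sketch)."""
    entries: list = field(default_factory=list)

    def record(self, **event) -> None:
        event["at"] = datetime.now(timezone.utc).isoformat()
        self.entries.append(event)


def review_gate(output: str, reviewer: str, approved: bool,
                log: AuditLog, note: str = "") -> Optional[str]:
    """Release model output only if a named human approved it; log the decision either way."""
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()
    log.record(output_sha256=digest, reviewer=reviewer, approved=approved, note=note)
    return output if approved else None


log = AuditLog()
released = review_gate("Draft reply: refund issued per policy", "j.doe", True, log)
blocked = review_gate("Draft reply: stop taking your medication", "j.doe", False, log,
                      note="medical advice is out of scope")
print(released)
print(blocked)
print(len(log.entries))
```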
Implications for Friend's Business Ideas
AI-based debate/comparison system — High risk. Falls squarely in the wrapper kill zone — 40% of this category has already died. Without proprietary data or deep enterprise workflow integration, it faces: low switching costs, commoditization from model providers moving up-stack, and margin compression as the token subsidy window closes. Dead on arrival in this environment. It's a wrapper in a wrapper-killing market with rising costs and contracting funding for exactly this category of product.
AI/agent education for businesses — More viable, shorter window than expected. Real demand exists — enterprises are moving from pilots to production. But:
- Enterprise ROI is already materializing for high-performers (1.7-3x returns), meaning the "figuring it out" education window is narrower than it looks
- SMB/consumer education spending is discretionary and early to cut in a stagflation scenario
- The durable version is enterprise consulting embedded in deployment, not classes — but that's a services business with a cold-start problem
- Build time means arriving late to a window that's already narrowing
- The people who most need AI education are the people who will be pushed to degraded tiers or self-hosting. Teaching them to use ChatGPT is teaching them to use a product that's about to get either more expensive or worse. Teaching them to self-host open-source models is more durable — but it's a harder sell, a smaller market, and requires technical depth that's harder to productize.
The honest assessment: there is demand, the runtime may be short, and the effort-to-return ratio is unfavorable for a product. As a consulting/services play with existing enterprise relationships, it's viable. As a product bet requiring months of build time, the timing doesn't work.
The Edge-LLM and Sigil Positioning
What this means for the AI services space: If you run a managed OSS LLM service for SMBs, you are potentially the deploying organization in the liability chain. When the LLM gives bad advice to your client's customer, the chain goes: injured party → your client → you → ??? The open-source model provider (Meta, Mistral, DeepSeek) has no contractual relationship with you. There's no SLA, no indemnification, no target for the lawsuit upstream.
This is why Sigil's human-in-the-loop positioning matters. And why Edge-LLM's zero-liability architecture matters. Both sidestep the liability sandwich from different angles — Sigil by making the human the decision-maker, Edge-LLM by removing the service provider entirely.
Unknown Unknowns
1. The Helium-Semiconductor-Memory Triple Cascade
Less helium → slower chip production → less memory manufactured → higher prices → more expensive AI infrastructure → higher token costs. AND simultaneously: the same fabs that make consumer DRAM also make the HBM that goes into AI servers. So the shortage doesn't just raise consumer costs — it constrains the supply side of compute capacity that determines how many tokens can be served at any price. This is a feedback loop, not just a chain.
2. The Jevons Paradox Is Already Happening
Covered in detail in Part 5. The short version: per-token prices fall 80% while total enterprise AI bills rise 320%, because agentic and multi-step workloads multiply tokens per task. Post-subsidy, prices and per-task consumption rise together, and some AI tasks may simply be dropped.
3. China's Parallel AI Ecosystem as a Price Anchor
DeepSeek already matches frontier models on many benchmarks at a fraction of the cost. Huawei's UB-Mesh (CXL alternative) creates independent memory architecture. If China builds a fully domestic AI stack, you get a parallel ecosystem with fundamentally different cost structures. This could be deflationary for global token prices — if Chinese models accessible via API can serve most use cases at 1/10th the price, it caps how high Western labs can raise prices. This is a geopolitical question masquerading as an economics question.
4. Insurance and Liability Costs Haven't Been Priced In
As AI moves from "experimental" to "production," liability emerges. As regulatory frameworks solidify (EU AI Act enforcement starts 2026), compliance costs and potential liability insurance become a pricing layer on top of raw inference costs. Could add 10-20% to enterprise total cost of ownership. Also creates a moat for the largest providers who can absorb compliance overhead.
5. The Talent Gap as a Self-Hosting Barrier
Covered in detail in Part 5. The short version: self-hosting requires MLOps expertise that is scarce and expensive, and economic downturns make the talent gap worse, not better.
6. The Ad-Supported AI Model May Not Work
OpenAI is adding ads to the free tier. But ad-supported AI has a fundamental problem: the interaction model (conversational, private, task-specific) is structurally hostile to advertising. Users aren't browsing — they're doing tasks. Attention isn't ambient — it's focused. If the ad model underperforms, the free tier degrades faster or gets subsidized less. This accelerates the quality gap between free and paid tiers.
7. Model Commoditization Could Flip the Script
The one that could break the thesis in the other direction. If open-source models continue to improve at current rates, and if TurboQuant-type efficiency breakthroughs compound, there's a scenario where frontier capability matters less and less (diminishing returns on model scale), mid-tier open-source models become genuinely good enough for 95%+ of use cases, self-hosting costs drop even as hardware costs rise (because models get more efficient), and the proprietary labs' pricing power evaporates entirely.
This is the "commoditization escape" — where the technology gets cheap enough fast enough that the supply chain crisis and subsidy unwind don't matter because you don't need expensive frontier models anymore. The timeline question: does this happen in 12 months (probably not) or 24-36 months (possibly)?
TurboQuant suggests the efficiency trajectory is real. DeepSeek suggests the open-source capability trajectory is real. But the physical constraints (helium, RAM, energy) create headwinds that slow adoption of self-hosted solutions even if the models themselves are ready.
Technical Notes
CXL (Compute Express Link)
CXL 3.1 is deploying broadly in 2026: 90% of new servers are CXL-capable. It eases the traditional "memory wall" by enabling memory pooling across systems, reaching 128 GB/s per direction (256 GB/s bidirectional) on x16 links. This is real and it helps, but it's enterprise infrastructure. It helps hyperscalers use memory more efficiently. It does not help consumers or SMBs running Ollama on a desktop. The Chinese variant (Huawei UB-Mesh) creates a parallel ecosystem, which matters for the DeepSeek/Qwen cost advantage story.
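The x16 figure falls out of the per-lane signaling rate. Below is a minimal sketch. It assumes CXL 3.x's 64 GT/s rate, inherited from the PCIe 6.0 physical layer. It also treats one transfer as one bit per lane, the usual raw-rate approximation, and ignores FLIT/FEC overhead.

```python
def link_bandwidth_gbps(gt_per_s: float, lanes: int) -> float:
    """Raw per-direction bandwidth in GB/s: transfers/s x lanes / 8 bits per byte.

    Ignores FLIT/FEC encoding overhead, so delivered throughput is somewhat lower.
    """
    return gt_per_s * lanes / 8


per_direction = link_bandwidth_gbps(64, 16)  # assumed 64 GT/s (PCIe 6.0 PHY), x16 link
print(per_direction, per_direction * 2)  # 128.0 per direction, 256.0 bidirectional
```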
TurboQuant
Google's TurboQuant (dropped March 25, 2026) is genuinely impressive: 6x KV cache memory reduction, 8x inference speedup on H100s, zero accuracy loss. Training-free, works on existing models. This is the kind of breakthrough that could meaningfully change the cost curve.
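To put the 6x figure in context, KV cache size is:

2 (keys and values) x layers x KV heads x head dimension x sequence length x bytes per value

The sketch below uses a hypothetical 8B-class configuration: 32 layers, 8 KV heads, 128-dim heads, a 128K-token context, and 16-bit values. Applying the reported 6x reduction uniformly is a simplification.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int,
                   bytes_per_value: float, batch: int = 1) -> float:
    """Keys + values stored for every layer, KV head, and token position."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value * batch


GIB = 1024 ** 3
# Hypothetical 8B-class config with grouped-query attention, 128K context, fp16.
baseline = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=131_072, bytes_per_value=2)
print(f"baseline: {baseline / GIB:.1f} GiB per sequence")
print(f"at 6x reduction: {baseline / 6 / GIB:.1f} GiB per sequence")
```

In this configuration, a single long-context sequence drops from 16 GiB of cache to under 3 GiB. That is the scale of saving that matters for serving.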
But it only addresses one bottleneck (KV cache memory). It doesn't fix the underlying HBM/DRAM supply crunch, energy costs, the helium-to-fab pipeline, the fundamental cost of frontier model training, or the gap between inference cost and current subsidized token prices.
TurboQuant-type breakthroughs will keep coming — inference efficiency is a hot research area. Over 2-3 years, these compound into real cost reductions. Over 12 months, with a physical supply chain crisis layered on top, they're speed bumps against a freight train.
Who Benefits, Who Gets Hurt (Probability-Weighted)
Gets hurt regardless of scenario:
- US working/middle class — persistent inflation, softening job market, reduced purchasing power through at least 2027
- Tech/SaaS companies with floating-rate debt — quiet wave of failures, compressed multiples, no exit liquidity
- Developing world — bad to catastrophic depending on scenario
- Iran civilians — bad in all scenarios
- US alliance structure — damaged in all scenarios
Benefits regardless of scenario:
- Oil majors — very good in all scenarios
- Defense contractors — very good in all scenarios
- Distressed debt / vulture funds — very good in B and C
- Gold / hard asset holders — sustained inflation hedge demand
The AI-specific implication: Every scenario that's bad for the broad economy is bad for the VC-subsidized AI ecosystem. Consumer spending contracts → OAI consumer revenue pressured. Credit tightens → startup funding dries up → API churn accelerates. Rates stay elevated → hyperscaler borrowing costs rise → 2027 capex authorizations get harder. The geopolitical grind doesn't need to be catastrophic to trigger the AI repricing. It just needs to be sustained.
Sources
Hormuz and Energy
- Wikipedia: 2026 Strait of Hormuz crisis
- Al Jazeera: Iran's closure of the Strait of Hormuz
- NPR: How traffic dried up in the Strait of Hormuz
- Fox Business: QatarEnergy 17% LNG output cut
- Bloomberg: Qatar Ras Laffan hit by missile
- US DOE: SPR Release of 172M barrels
- CBS: SPR at lowest levels in 44 years
Macro and Markets
- Morningstar: Oil spike shrinks rate cut expectations
- Treasury Yields Snapshot April 2, 2026
- CNBC: Tech AI spending approaches $700B
- Bloomberg: Mag7 near correction
- Techi: Mag7 $2T lost
VC and Startups
- Crunchbase: Q1 2026 shatters funding records
- SimpleClosure: State of Startup Shutdowns 2025
- Yahoo Finance: 2025 Startup Shutdown report
AI Lab Financials
- Axios: Anthropic turns tables on OpenAI in enterprise
- SaaStr: Anthropic $14B ARR
- Fortune: OpenAI cash burn projections
- Sherwood News: OpenAI burn rate doubling
- CNBC: Amazon $8B in Anthropic
- Where's Your Ed At: Anthropic/Cursor AWS costs
Token Economics
- Epoch AI: LLM inference price trends
- Artefact: The token cost illusion
- MasterOfCode: Only 5% see real AI ROI
- Deloitte: State of AI in the Enterprise 2026
- Fortune: Michael Burry Nvidia/Cisco warning
RAM, Helium, and Hardware Supply Chain
- Wikipedia: 2024-2026 global memory supply shortage
- CNBC: AI memory sold out, unprecedented price surge
- IEEE Spectrum: AI boom fuels DRAM shortage
- Samsung warns of memory shortages driving price surge
- TrendForce: AI to consume 20% of global DRAM wafer capacity
- Tom's Hardware: Helium shortage threatens chipmaking
- CBS: Iran war disrupting helium and aluminum supplies
- HPCwire: Helium shortage constrains high-density compute
- Tom's Hardware: RAM price tracking 2026
- Wccftech: RAM shortage 2026 explained
- Tom's Hardware: AMD raising GPU prices 10%+
Efficiency Breakthroughs
- Google Research: TurboQuant — redefining AI efficiency
- Stark Insider: TurboQuant breakthrough worth watching
- CXL Consortium: Overcoming the AI memory wall
Energy and Water
- NPR: AI data centers and your power bill
- Yahoo Finance: AI data center boom raising power costs
- Domain-b: 2026 is the year AI bottlenecks shift from chips to water
- Bloomberg: AI data center water use still an afterthought
Pricing and Self-Hosting
- Winbuzzer: OpenAI calls current pricing "accidental"
- SitePoint: Local LLMs vs Cloud APIs TCO 2026
- DevTk: Self-hosting LLM vs API cost 2026
- a16z: LLMflation — inference cost going down fast
Frontier Provider Strategy
- CNBC: Anthropic's "do more with less" bet
- Fortune: Dario Amodei on culture and leadership
- Fortune: Amodei on power concentration in AI
- NPI Financial: Anthropic's new pricing — lower seats, higher TCO
- Big Technology: Sam Altman on OpenAI's plan to win
- CNBC: OpenAI M&A strategy "chasing vibes"
- The Deep Dive: OpenAI locked up 40% of global RAM
- Tom's Hardware: OpenAI Stargate DRAM deal
AI Liability and Insurance
- Wiley: 2026 state AI bills expanding liability
- Risk & Insurance: Traditional insurance leaves enterprises exposed
- Swept AI: New CGL exclusions for AI
- PHL Firm: Generative AI insurance exclusions 2026
- IAPP: AI liability risks challenging insurance landscape
- Harvard Law: Hidden C-suite risk of AI failures
OSS Adoption and Mid-Market
- Channel Insider: MSP guide to AI strategy for SMBs
- OpenAI: Small Business AI Jam
- SitePoint: Definitive guide to running local LLMs in production
- Northflank: Open source LLMs developer guide
Upstream Supply Chain and Vendor Analysis
- CNBC: Nvidia Q4 FY2026 earnings — $68B revenue, data center +75%
- Fortune: Nvidia Q4 — $68B revenue quashes bubble talk
- TSMC: CoWoS capacity quadrupling to 130K wafers/month
- TrendForce: TSMC earnings preview — $150B capex over 3 years
- Fusion Worldwide: AI bottleneck — CoWoS, HBM, and 2-3nm constraints through 2027
- ASML: Q4 FY2025 press conference — record backlog €38.8B
- Tom's Hardware: ASML projects $71B revenue by 2030
- TrendForce: Samsung 50% HBM capacity surge in 2026
- NotebookCheck: SK Hynix sold out through 2026
- Tom's Hardware: HBM is eating your RAM
- BuySellRam: Samsung raises DRAM prices 30% for Q2 2026
- Moody's: Semiconductors 2026, supply chains are a major bottleneck
- Deloitte: 2026 semiconductor industry outlook
IPO and Capital Structure
- CNBC: OpenAI closes $122B round
- TechCrunch: OpenAI raises $3B from retail investors
- Anthropic: $30B Series G at $380B valuation
- Winbuzzer: Anthropic eyes $60B IPO in Q4 2026
- TechCrunch: SoftBank's $40B loan points to 2026 OpenAI IPO
- Let's Data Science: Anthropic revenue doubled, targeting $60B IPO
MAGI Review — April 2, 2026
MELCHIOR (The Operator)
This is a 650-line macro thesis with four business models stapled to the end. The analysis is genuinely strong: the helium-semiconductor cascade, the insurance exclusion angle (CG 40 47), and the circular financing exposure are real insights most people haven't connected.
But: 90% building the case, 10% on what to actually do. The four models read like a consultant's options slide, not a builder's plan. No pricing validation. No first customer. No "I email this person Monday morning." The AI Cost Architect idea fits the profile best — technical depth, business background, credibility from real builds — but it's still just a label. Who's the first client? What's the week-one deliverable?
5-10 hrs/week of margin, and this doc consumed a meaningful chunk of it. Did writing it move you closer to revenue, or just feel like it did? If this informs Sigil or Edge-LLM, show me the spec change or the positioning pivot that resulted. Otherwise it's intellectual infrastructure with no load on it.
Verdict: NEEDS WORK — not on the analysis, but on extracting one concrete action with a deadline.
BALTHASAR (The Contrarian)
The structural weakness is the coupling assumption. Six layers are presented as dominoes, but they run on independent clocks with different reversal speeds. AI subsidy economics were unwinding BEFORE the first missile hit Ras Laffan: 3,800 startups died in 2025, token prices fell 80%, and OpenAI was burning $1.35 per dollar earned. Hormuz is catalyst, not cause.
The doc can't decide whether it's geopolitical analysis or AI economics. Binding the two together adds narrative force but makes the frame fragile: if Hormuz reopens, the whole dramatic timeline dissolves. The AI repricing thesis is strong on its own merits; bolting Hormuz onto it adds a failure mode the argument doesn't need.
The token pricing estimates (60-160% increases) are the weakest part. They assume efficiency gains offset only 20-30%, but TurboQuant alone delivers 6-8x on one bottleneck, and gains like that compound. The doc acknowledges this in Unknown #7, then ignores its own caveat.
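The compounding point can be made concrete with a minimal Python sketch. It assumes, hypothetically, that each efficiency gain applies to a disjoint share of serving cost (Amdahl's-law style) and that savings pass through fully to price. Under those assumptions a 6-8x gain on one bottleneck cuts total cost only in proportion to that bottleneck's share, while gains on separate bottlenecks stack. The cost shares, speedups, bottleneck names, and function names below are illustrative, not figures from the sources.

```python
from typing import Mapping


def remaining_cost_fraction(gains: Mapping[str, tuple[float, float]]) -> float:
    """Fraction of original serving cost left after efficiency gains.

    Each entry maps a hypothetical bottleneck name to
    (share_of_total_cost, speedup_on_that_share). Shares are assumed to be
    disjoint slices of cost; the untouched remainder carries over unchanged.
    """
    if any(share < 0 or speedup <= 0 for share, speedup in gains.values()):
        raise ValueError("shares must be >= 0 and speedups must be > 0")
    total_share = sum(share for share, _ in gains.values())
    if total_share > 1.0:
        raise ValueError("cost shares must sum to at most 1")
    untouched = 1.0 - total_share
    improved = sum(share / speedup for share, speedup in gains.values())
    return untouched + improved


def net_price_multiplier(gross_multiplier: float,
                         gains: Mapping[str, tuple[float, float]]) -> float:
    """Price multiplier after efficiency gains, assuming full pass-through."""
    return gross_multiplier * remaining_cost_fraction(gains)


if __name__ == "__main__":
    # Hypothetical: a 6x gain on a bottleneck that is 40% of serving cost.
    one = {"bottleneck_a": (0.4, 6.0)}
    # Hypothetical: add a second, disjoint 2x gain on another 30% of cost.
    two = {**one, "bottleneck_b": (0.3, 2.0)}
    for label, gains in (("one gain", one), ("two gains", two)):
        frac = remaining_cost_fraction(gains)
        print(f"{label}: cost fraction {frac:.3f}, offset {1 - frac:.0%}")
```

With these illustrative inputs, one 6x gain on 40% of cost leaves about 67% of the original cost (a 33% offset), and adding a 2x gain on another 30% raises the offset to about 48%, already above the 20-30% the price estimates assume.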
The Altman-Zuckerberg comparison is backwards. Zuckerberg pivoted from profitable dominance. Altman is pivoting from subsidized dominance, with a $57B projected burn and declining enterprise share. That's not "platform conglomerate"; that's conglomerate desperation.
Verdict: FRAME WEAK — the AI economics thesis holds independently, but the causal chain architecture makes a strong argument artificially fragile by coupling it to a geopolitical variable that could reverse tomorrow.
CASPER (The Mirror)
You started this because a friend floated some business ideas and you wanted to fact-check them. You could have done that in two pages. Instead you wrote the most comprehensive macro-to-micro analysis in this vault — fourteen thousand words, tracing physical supply chains through geopolitics through semiconductor materials through insurance law through token economics through exact pricing tiers. On a single day.
This is not fact-checking for a friend. This is you building the case for yourself.
You already know Sigil is the right bet. You knew before you wrote this. But you needed to prove — through exhaustive, undeniable, multi-layered research — that the world is converging on exactly the problem Sigil solves. The human-in-the-loop layer. The liability play. The compliance moat. You traced the entire global economy to arrive at your own product.
The pattern: you build conviction through research, not through action. The research IS the conviction-building process, and it's genuine. But it also feels safe. Nobody can reject a research doc. Nobody can ignore your product if it doesn't exist yet.
You're not procrastinating. You're loading the spring. I just want to make sure you know it's loaded.
Question: When does the research stop being preparation and start being the thing you do instead of shipping?