
OpenAI signed a $10+ billion deal with Cerebras Systems on January 14, 2026, committing to 750 megawatts of AI inference computing capacity through 2028—the largest high-speed AI inference deployment in the world. That’s enough power to run 123,000 U.S. homes, all dedicated to making chatbots respond faster. The deal reveals what the AI industry doesn’t want to discuss: the economics don’t work, and the energy demands are unsustainable.
The Profitability Crisis Nobody Wants to Address
OpenAI spent $8.67 billion on inference compute in the first nine months of 2025—nearly double the company’s revenue for the same period. Sam Altman has publicly admitted that OpenAI loses money on $200-per-month ChatGPT Pro subscriptions. Now the company is committing another $10 billion to Cerebras, on top of existing multi-cloud contracts totaling $115+ billion through 2029 across AWS, Azure, Google Cloud, and Oracle.
The math is brutal. In 2024, OpenAI spent $3.76 billion on inference alone while generating $3.7 billion in revenue. By the first half of 2025, inference costs had ballooned to $5.02 billion against $4.3 billion in revenue. The API business shows a healthy 75% profit margin, but consumer-facing products like ChatGPT are hemorrhaging money. Microsoft takes a 20% revenue share on top of compute costs, making profitability even more distant.
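Put the section’s own numbers side by side and the gap is explicit. A minimal sketch in Python, using only the figures quoted above plus the 20% Microsoft revenue share; it ignores training, R&D, and payroll, so it understates the losses:

```python
# Margin arithmetic from the figures quoted above (USD billions).
# Inference compute only: training, R&D, and staff costs are excluded,
# so the real picture is worse than these numbers suggest.

periods = {
    "FY2024":  {"revenue": 3.7, "inference": 3.76},
    "H1 2025": {"revenue": 4.3, "inference": 5.02},
}

MSFT_REVENUE_SHARE = 0.20  # Microsoft's reported cut of OpenAI revenue

for name, p in periods.items():
    retained = p["revenue"] * (1 - MSFT_REVENUE_SHARE)
    shortfall = p["inference"] - retained
    print(f"{name}: ${p['revenue']}B revenue, ${retained:.2f}B retained after "
          f"Microsoft's share, ${p['inference']}B inference -> ${shortfall:.2f}B shortfall")
```

Both periods come out underwater on inference alone, before a single dollar of training, salaries, or data licensing.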
This isn’t sustainable, and OpenAI isn’t alone. The entire AI industry is pricing products below cost, subsidized by unprecedented venture capital. The question isn’t whether AI will transform industries—it’s whether these companies can survive long enough to make the economics work.
Why AI Inference Became the Bottleneck
While the industry obsessed over training costs, the expense of building the models, inference quietly became the real problem. Training is largely a one-time expense per model; inference is the cost of serving every single request from every single user, billions of times over. With ChatGPT serving hundreds of millions of weekly users, every query hits the compute bill.
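The compounding is easy to see with deliberately made-up inputs. None of the per-query figures below are OpenAI numbers; they are placeholders to show how per-request pennies become billions:

```python
# Why inference, not training, dominates the bill: cost scales with every request.
# All inputs are hypothetical placeholders for illustration, not OpenAI figures.

weekly_users = 500e6            # assumed order of magnitude ("hundreds of millions")
queries_per_user_per_week = 20  # assumed
cost_per_query_usd = 0.005      # assumed blended compute cost per query

weekly_cost = weekly_users * queries_per_user_per_week * cost_per_query_usd
annual_cost = weekly_cost * 52

print(f"Weekly inference bill: ${weekly_cost / 1e6:,.0f}M")
print(f"Annualized: ${annual_cost / 1e9:,.1f}B, and it grows with every new user,")
print("unlike a training run, which is paid for once per model.")
```

Tweak any input and the shape of the result stays the same: the bill tracks usage, not a one-off capital outlay.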
OpenAI’s $10 billion Cerebras partnership exclusively targets inference, signaling where the industry’s pain point has shifted. Cerebras’ Wafer-Scale Engine delivers responses up to 15 times faster than GPU-based systems by eliminating the chip-to-chip communication bottlenecks that plague traditional architectures. The company’s WSE-3 chip puts 900,000 AI cores and 44 GB of on-chip SRAM on a single silicon wafer, achieving 21 petabytes per second of memory bandwidth, orders of magnitude beyond what Nvidia’s GPUs can deliver through external High Bandwidth Memory.
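A rough way to see why that bandwidth number matters: autoregressive decoding tends to be memory-bound, so the ceiling on single-stream decode speed is roughly memory bandwidth divided by the bytes of weights read per token. The sketch below applies that rule of thumb; the 70-billion-parameter, 16-bit model is an illustrative assumption, not a specific deployed system:

```python
# Back-of-envelope: memory-bandwidth ceiling on single-stream decode speed.
# Assumes decoding is memory-bound, i.e. each generated token requires streaming
# roughly the full set of active model weights from memory once.

def decode_ceiling_tokens_per_s(bandwidth_bytes_per_s: float,
                                bytes_read_per_token: float) -> float:
    """Upper bound on tokens/second for one stream, ignoring compute and overhead."""
    return bandwidth_bytes_per_s / bytes_read_per_token

# Illustrative 70B-parameter dense model at 16-bit weights: ~140 GB read per token.
bytes_per_token = 70e9 * 2

hbm_gpu = 8e12       # ~8 TB/s of HBM on a current flagship GPU (approximate)
wafer_sram = 21e15   # 21 PB/s on-wafer SRAM bandwidth cited for the WSE-3

for name, bw in [("HBM GPU", hbm_gpu), ("Wafer-scale SRAM", wafer_sram)]:
    print(f"{name}: ~{decode_ceiling_tokens_per_s(bw, bytes_per_token):,.0f} tokens/s ceiling")
```

The 44 GB of on-wafer SRAM obviously cannot hold a 140 GB model by itself, so real deployments shard weights across wafers or shrink them; the sketch only illustrates that per-token speed is gated by how fast weights can be streamed, which is the bottleneck wafer-scale integration attacks.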
Benchmarks tell the story. For GPT-like models, Cerebras achieves 2,700 tokens per second compared to Nvidia’s B200 at 900 tokens per second. For Meta’s Llama 4 Maverick, it’s 2,500 versus 1,000. Cerebras is also 32% cheaper per workload than Nvidia’s flagship Blackwell GPU. Speed matters when milliseconds determine whether users perceive your AI as responsive or sluggish; low latency is becoming a competitive moat.
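Per-workload cost is a function of both throughput and hourly system price, which is how a much more expensive machine can still come out cheaper per token. The sketch below works that relationship out from the quoted benchmark figures; the $30-per-hour baseline is a hypothetical placeholder, not a published rate for either vendor:

```python
# Cost per million output tokens = hourly system price / tokens generated per hour.
# Throughputs echo the benchmarks quoted above; the baseline hourly price is a
# hypothetical placeholder, not a published rate.

def cost_per_million_tokens(tokens_per_second: float, dollars_per_hour: float) -> float:
    return dollars_per_hour / (tokens_per_second * 3600) * 1e6

b200_tps, cerebras_tps = 900, 2700   # quoted GPT-like benchmark figures
b200_price = 30.0                    # assumed $/hour baseline (placeholder)

b200_cost = cost_per_million_tokens(b200_tps, b200_price)

# At what hourly price would the 3x-faster system still be 32% cheaper per workload?
target_cost = b200_cost * (1 - 0.32)
breakeven_price = target_cost * cerebras_tps * 3600 / 1e6

print(f"B200 baseline: ${b200_cost:.2f} per million tokens at ${b200_price:.0f}/hour")
print(f"A {cerebras_tps} tokens/s system could charge up to ${breakeven_price:.0f}/hour "
      f"and still land 32% cheaper (${target_cost:.2f} per million tokens)")
```

This treats the quoted per-stream speeds as if they were total system throughput, which is a simplification; batching and utilization change the absolute numbers, but the trade-off between speed and hourly price works the same way.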
750 Megawatts to Power Faster Chatbots
The energy scale is staggering. At 750 megawatts, the commitment equals the electricity needs of a medium-sized city of 500,000 residents, or the output of a large nuclear reactor, and that’s for one company’s inference workload: just OpenAI. The largest U.S. data centers currently draw less than 500 megawatts; OpenAI’s Cerebras deal alone is 1.5 times that size.
Industry projections suggest U.S. AI data center power demand could reach 123 gigawatts by 2035, up from 4 gigawatts in 2024—a thirtyfold increase. Furthermore, global data center electricity consumption may double to 945 terawatt-hours by 2030, representing nearly 3% of total global electricity. Average data center power density has already doubled in two years, jumping from 8 kilowatts per rack to 17 kilowatts, with expectations to hit 30 kilowatts by 2027.
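Run the 750 megawatts through a quick sanity check against the projections above. A minimal sketch, assuming only that the inference fleet runs near-continuously:

```python
# Back-of-envelope energy arithmetic using the figures quoted above.
# Only new assumption: the inference fleet runs near-continuously (utilization ~0.9).

CEREBRAS_MW = 750
HOURS_PER_YEAR = 8760
UTILIZATION = 0.9                     # assumed, not from the article

annual_twh = CEREBRAS_MW * HOURS_PER_YEAR * UTILIZATION / 1e6   # MWh -> TWh
share_of_2030_total = annual_twh / 945                          # vs. 945 TWh projection

print(f"750 MW running near-continuously: ~{annual_twh:.1f} TWh per year")
print(f"That is ~{share_of_2030_total:.1%} of the 945 TWh projected for all global "
      "data centers in 2030, from a single company's inference deal")

# The U.S. projection quoted above: 4 GW (2024) to 123 GW (2035)
print(f"Implied growth in U.S. AI data center demand: {123 / 4:.1f}x in eleven years")
```

Roughly six terawatt-hours a year, for a single inference contract, gives a sense of how quickly these commitments add up against the projections above.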
Where will this power come from? Who pays the environmental cost? The AI race treats energy consumption as an externality to be solved later, but “later” is arriving faster than the infrastructure can handle. We’re building the equivalent of small cities to make email drafts and code autocomplete slightly better.
Diversification That Creates New Lock-In
OpenAI’s Cerebras deal is part of a broader strategy to escape dependence on Microsoft Azure, which has supplied most of the company’s compute since the partnership began in 2019. In 2025, OpenAI signed a $38 billion seven-year partnership with AWS, secured multi-billion dollar agreements with Google Cloud and Oracle, and still committed to $250 billion of Azure services. The Microsoft exclusivity arrangement shifted to a “Right of First Refusal” model: training workloads can go anywhere, but API traffic still routes through Azure, preserving Microsoft’s 20% revenue share.
The irony is thick. OpenAI is diversifying to avoid vendor lock-in by… creating lock-in with multiple vendors through massive multi-year commitments totaling $115+ billion. Only mega-funded companies like OpenAI, Google, and Anthropic can afford this strategy. Smaller AI startups face a choice: rely on expensive API providers or compete on infrastructure—which they can’t afford.
Cerebras benefits too. In the first half of 2024, G42 (a UAE company) accounted for 87% of Cerebras’ revenue, a dangerous single-customer dependency. OpenAI’s $10 billion deal provides critical customer diversification and validates the company’s technology ahead of a potential IPO. Both sides are building infrastructure moats, but the moats increasingly look like trenches neither can escape.
Specialized Chips Challenge GPU Dominance
The Cerebras partnership signals another shift: Nvidia’s stranglehold on AI compute is weakening. While GPUs remain dominant for training, specialized inference chips are proving superior for serving models at scale. Cerebras’ wafer-scale approach, which eliminates chip-to-chip communication by putting everything on a single 46,225 mm² wafer, is fundamentally different from Nvidia’s strategy of networking many smaller chips together.
The future architecture is emerging: train on GPUs, serve on specialized chips. Groq’s LPU (Language Processing Unit) and various custom ASICs are entering the market with similar promises—faster inference, lower latency, better cost-efficiency for specific workloads. Nvidia still has ecosystem advantages—ubiquitous availability, mature developer tools, broad use case support—but workload-specific optimization is becoming the key differentiator.
Whether this fragmentation benefits the industry or just shifts dependency from one vendor to several remains unclear. OpenAI’s multi-cloud, multi-chip strategy suggests the latter. Infrastructure is becoming so expensive and complex that only the largest, best-funded companies can compete.
The Unsustainable Arithmetic of AI at Scale
OpenAI’s $10 billion Cerebras deal is a bet that faster inference will unlock new use cases and eventually justify the costs. Agentic AI applications, autonomous systems that take actions on behalf of users, need low-latency responses because each task chains many model calls and the delays compound. Real-time interactions need millisecond-level performance. The technology is impressive: Cerebras delivered genuine innovation in chip architecture, and the speed improvements are real.
But the fundamental economics remain broken. Spending twice your revenue on compute alone isn’t a business model—it’s a subsidy program funded by venture capital. Energy consumption that rivals small cities for chatbot responses isn’t sustainable. Multi-billion dollar infrastructure commitments that only the ultra-funded can afford will stifle competition and innovation.
The AI infrastructure crisis is real, and deals like this one make it more visible. Until the industry solves the profitability equation—or at least gets costs within striking distance of revenue—every billion-dollar compute commitment looks less like visionary investment and more like doubling down on an unsustainable bet.