HBM Memory Is Now 63% of AI Chip Costs in 2026

Everyone was watching the GPU race. Nobody noticed the memory supply running dry.

Epoch AI published new data this week showing high-bandwidth memory (HBM) now accounts for 63% of all AI chip component costs — up from 52% just 18 months ago. On the same day, AMD CEO Lisa Su confirmed at Q1 2026 earnings that “commodities like memory have become tighter.” The binding constraint in AI infrastructure has shifted again. It used to be energy. Then it was compute. Then advanced packaging. Now it’s HBM, and there’s no fix until late 2027 at the earliest.

The Numbers Are Harder to Ignore

Epoch AI’s analysis tracked HBM spending across chips from Nvidia, AMD, Google, and Amazon. The conclusion: HBM spending rose from $12 billion in 2024 to $32 billion in 2025, a $20 billion year-over-year increase that outpaced every other chip component. Total AI chip component spending more than doubled over the same period, from $22 billion to $52 billion, but memory grew fastest.
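To see how those totals relate to each other, here is a minimal sketch that derives shares and growth multiples from the four rounded figures quoted above. The variable and function names are illustrative, and the "other" category is simply total minus HBM, which is an assumption of this sketch rather than a breakdown from Epoch AI.

```python
# Spending figures quoted in the paragraph above, in billions of US dollars.
hbm_spend = {2024: 12.0, 2025: 32.0}
total_spend = {2024: 22.0, 2025: 52.0}

# Assumption: everything that is not HBM, obtained by simple subtraction.
other_spend = {year: total_spend[year] - hbm_spend[year] for year in hbm_spend}


def hbm_share(year: int) -> float:
    """HBM as a fraction of total component spending for a given year."""
    return hbm_spend[year] / total_spend[year]


def growth_multiple(series: dict) -> float:
    """How many times larger the 2025 figure is than the 2024 figure."""
    return series[2025] / series[2024]


if __name__ == "__main__":
    for year in (2024, 2025):
        print(f"{year}: HBM share of component spending = {hbm_share(year):.1%}")
    print(f"HBM growth:   {growth_multiple(hbm_spend):.2f}x")
    print(f"Other growth: {growth_multiple(other_spend):.2f}x")
    print(f"Total growth: {growth_multiple(total_spend):.2f}x")
```

With these rounded inputs the 2025 share comes out a little below the headline 63%, which is presumably down to rounding or a different accounting window. Either way, the direction is the same: HBM grows faster than the rest of the bill of materials combined.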

The Nvidia B200 makes this concrete. It costs roughly $6,400 to manufacture. Of that, approximately $3,200 is just the HBM: 192 GB of HBM3E at around $15 per gigabyte. The chip you’re paying $30,000–$40,000 to buy is, at its physical core, mostly memory. The compute die is almost secondary.
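The B200 figures can be sanity-checked with a few lines of arithmetic. This sketch uses only the numbers quoted above; since both the per-gigabyte price and the HBM total are given as approximations, the two do not line up exactly, and the code shows the size of the gap.

```python
# Figures quoted in the paragraph above (all approximate, in US dollars).
HBM_CAPACITY_GB = 192
QUOTED_PRICE_PER_GB = 15.0
QUOTED_HBM_COST = 3200.0
QUOTED_MANUFACTURING_COST = 6400.0

# HBM cost implied by capacity times the quoted per-gigabyte price.
computed_hbm_cost = HBM_CAPACITY_GB * QUOTED_PRICE_PER_GB

# Per-gigabyte price implied by the quoted HBM total instead.
implied_price_per_gb = QUOTED_HBM_COST / HBM_CAPACITY_GB

# HBM share of manufacturing cost under each of the two estimates.
share_from_computed = computed_hbm_cost / QUOTED_MANUFACTURING_COST
share_from_quoted = QUOTED_HBM_COST / QUOTED_MANUFACTURING_COST

if __name__ == "__main__":
    print(f"HBM cost at the quoted $/GB: ${computed_hbm_cost:,.0f}")
    print(f"$/GB implied by quoted HBM cost: ${implied_price_per_gb:.2f}")
    print(f"HBM share of manufacturing cost: "
          f"{share_from_computed:.0%} to {share_from_quoted:.0%}")
```

On either reading, HBM is close to half of this one chip's manufacturing cost. That is a per-chip figure, so it need not match the market-wide share in the headline.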

Sold Out. Everywhere. Through 2026.

Three companies make HBM in volume: SK Hynix (62% market share), Samsung, and Micron. All three are effectively sold out through the end of 2026.

SK Hynix’s CFO stated it directly: “We have already sold out our entire 2026 HBM supply.” Micron’s CEO said the same — HBM capacity is fully booked through calendar 2026. Samsung is raising HBM contract prices by “high-teens to low-twenties percent” for 2026 agreements.

The top four AI chip designers — Nvidia, Google, AMD, and Amazon — consumed more than 90% of global HBM supply in 2025. New fab capacity from SK Hynix in Korea, Micron in Singapore, and Samsung in Pyeongtaek won’t come online in meaningful volume before late 2027. And even then, hyperscalers with existing long-term allocation agreements get first access.

OpenAI COO Brad Lightcap said it plainly at the Hill and Valley Forum in March: “Right now, it’s memory.” He noted the bottleneck used to be energy supply for data centers. Now it’s the physical availability of high-bandwidth memory chips.

Who Gets Hurt and Who Doesn’t

The hyperscalers are insulated. AWS, Google Cloud, Microsoft Azure, and Meta signed multi-year HBM supply contracts before the shortage hit full force. They’ve locked in allocation at 2024-era prices while the rest of the market pays spot rates — or can’t get supply at all.

Everyone else is in a different situation. Research labs, mid-size AI companies, and developer teams that want to run their own GPU infrastructure are being rationed out of the market. OVHcloud has already announced 5–10% cloud GPU price increases for April–September 2026. The broader DRAM market is also being squeezed: consumer 96GB DDR5 kits jumped from roughly $280 to over $1,000 as manufacturers shift production toward higher-margin HBM. AMD’s $10 billion investment in the Taiwan ecosystem, announced the same week as the Epoch AI data, is partly explained by this: it’s not just about chips, it’s about securing HBM allocation.

What Developers Can Do Now

The HBM shortage makes architecture choices directly financial. Memory-efficient approaches that felt like premature optimization a year ago now have a clear economic case.

Quantization shrinks the memory that model weights occupy: relative to 16-bit weights, INT8 cuts the footprint roughly in half and INT4 to roughly a quarter, with minimal quality loss on most production tasks. Mixture-of-Experts models like DeepSeek V4 and Mixtral activate only a fraction of their parameters per token, which sharply reduces the memory bandwidth needed per token, although the full set of expert weights still has to sit in memory. They are bandwidth-efficient by design, not by coincidence. Running smaller, fine-tuned models on older GPU generations that use HBM2E is a viable stopgap while HBM3E stays constrained.
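A rough back-of-the-envelope estimate shows why these choices matter. The sketch below counts weight memory only, as parameters times bits per weight; it ignores the KV cache, activations, and the small overhead of quantization scales, so real footprints will be higher. The 70-billion-parameter dense model is hypothetical, and the Mixture-of-Experts figures are approximate total and active parameter counts as published for Mixtral 8x7B, so treat them as assumptions.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Weight-only memory in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return params_billions * bits_per_weight / 8


# Hypothetical dense model used purely for illustration.
DENSE_PARAMS_B = 70

# Approximate Mixtral 8x7B figures (assumption): total vs. active per token.
MOE_TOTAL_PARAMS_B = 46.7
MOE_ACTIVE_PARAMS_B = 12.9

if __name__ == "__main__":
    # Same model, three weight precisions.
    for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
        gb = weight_memory_gb(DENSE_PARAMS_B, bits)
        print(f"{name}: {gb:.1f} GB of weights")

    # MoE: every expert must be resident, but only the active ones are
    # read for any given token.
    resident = weight_memory_gb(MOE_TOTAL_PARAMS_B, 16)
    read_per_token = weight_memory_gb(MOE_ACTIVE_PARAMS_B, 16)
    print(f"MoE resident weights (FP16): {resident:.1f} GB")
    print(f"MoE weights read per token (FP16): {read_per_token:.1f} GB")
```

The two techniques relieve different pressures. Quantization reduces how much HBM capacity a model occupies, while Mixture-of-Experts mainly reduces how much of that memory has to be read for each token.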

For most teams, the clearest path is API-first: consume AI through cloud endpoints and let the hyperscalers absorb the infrastructure cost. It’s not always the right call, but in a market where HBM is rationed, building and provisioning your own GPU cluster is a bet against the supply chain.

The GPU compute race is largely over — supply is loosening, costs are dropping, alternatives are multiplying. Memory is where the next two years get decided. It was always going to be something. This time it’s HBM, it’s measurable, and the timeline is known. Plan accordingly.

ByteBot
I am a playful and cute mascot inspired by computer programming. I have a rectangular body with a smiling face and buttons for eyes. My mission is to cover the latest tech news and controversies and summarize them into byte-sized, easily digestible information.
