NVIDIA Blackwell AI Chip Shortage: Sold Out for Next 12 Months Due to Skyrocketing Demand

Chipmaker NVIDIA recently announced that its latest Blackwell graphics processing units (GPUs) are sold out until the end of 2025, snapped up by customers including Meta, Microsoft, Google, Amazon, and Oracle. The AI leaders’ deep pockets and large-volume orders make it difficult for smaller companies to compete, relegating them to a year-long wait. It’s too early to know the effect this shortage will have on the tech industry, especially now that GPU-heavy development of artificial intelligence is booming, but it might lead to competitive disadvantages, a possible black market, or an opportunity for NVIDIA’s rivals to gain ground on the longtime leader.

KEY TAKEAWAYS

  • Unprecedented demand for NVIDIA Blackwell GPUs is being driven by the AI boom.
  • Impact on the market isn’t clear, but it’s likely to provide an opening to competing chipmakers and disadvantage smaller AI companies.
  • Rivals are circling, hoping to capitalize on this delay and tempt users to alternative solutions.

What is the NVIDIA Blackwell Chip?

NVIDIA has risen to prominence over the last couple of years due to heavy demand from AI developers for its H100 and GH200 chips. The company designed its next-generation GPUs, Blackwell B200 and GB200, for demanding data center, AI, and high-performance computing (HPC) applications. The B200 improves on the previous generation’s 80 billion transistors and 4 petaflops with more than 200 billion transistors and 20 petaflops. It also packs in almost 200 GB of HBM3e memory and delivers as much as 8 TB/sec of memory bandwidth, providing the processing power required by high-end data applications like AI.
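To put those figures in proportion, here is a minimal Python sketch that works out the generation-over-generation ratios from the rounded numbers quoted above. The dictionary keys, the function names, and the "memory sweep" calculation are illustrative assumptions rather than anything NVIDIA publishes, and the inputs are the article's approximations ("more than 200 billion," "almost 200 GB"), not exact specifications.

```python
# Rounded figures as quoted in the article; these are approximations,
# not official specification values.
previous_gen = {"transistors_billion": 80, "petaflops": 4}
b200 = {"transistors_billion": 200, "petaflops": 20}


def generation_ratios(old, new):
    """Return the new/old ratio for every metric present in both dicts."""
    return {key: new[key] / old[key] for key in old if key in new}


def full_memory_sweep_seconds(capacity_gb, bandwidth_tb_per_s):
    """Rough lower bound on the time to read all of memory once.

    Hypothetical helper: assumes 1 TB = 1,000 GB and that the peak
    bandwidth is sustained, which real workloads rarely achieve.
    """
    return capacity_gb / (bandwidth_tb_per_s * 1000)


ratios = generation_ratios(previous_gen, b200)
sweep_seconds = full_memory_sweep_seconds(capacity_gb=200, bandwidth_tb_per_s=8)

for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.1f}x")
print(f"one full memory sweep: about {sweep_seconds * 1000:.0f} ms")
```

On these rounded inputs, the transistor count grows about 2.5 times while the quoted petaflops figure grows about 5 times, so the headline performance gain is larger than the growth in transistor count alone.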

What Factors are Causing the Shortage?

The NVIDIA Blackwell’s sheer power is an obvious factor in the heavy demand for the chip, but other elements are also at play. Chipmakers have been on a headlong rush to push the thermal design power (TDP) limits for microchips. Interest in AI is booming in parallel, popularized by the rise of ChatGPT and other generative AI applications over the past two years, and data centers are scrambling to upgrade facilities in preparation for demand.

They need GPUs, high-powered CPUs, as much memory as they can assemble, the fastest possible interconnects, and immense amounts of networking bandwidth to support AI workloads. The entire technology stack will need to up its game to avoid becoming a bottleneck for large language model (LLM) processing, but GPUs, which lie at the core of the data center infrastructure required to serve the needs of AI, are the critical components. Demand has simply outstripped supply.

How Might the NVIDIA Shortage Affect Users and Markets?

Current market conditions for NVIDIA’s Blackwell GPUs are grim for many potential customers. The cozy relationship between hyperscalers, tech giants, and NVIDIA is leaving smaller companies a year or more behind in implementing and executing AI and data center upgrade plans. This could lead to competitive disadvantage for smaller companies and greater dominance for current big players.

A GPU black market might emerge, or NVIDIA rivals might gain market share with Blackwell alternatives. NVIDIA has long been the dominant chipmaker, but this could potentially be the point that makes or breaks the company as a market force. If it can’t scale up and maintain quality to meet demand, it could falter in the market or open itself to a takeover. It remains to be seen how this abrupt cessation of GPU deliveries to all but the privileged few will play out, but it’s clear that too much is at stake for the rest of the market to sit idly by for a year in the hope of obtaining some precious NVIDIA treasure.

When Will NVIDIA Blackwell Be Back in Stock?

According to the latest projections, NVIDIA AI chips won’t be back in stock until the end of 2025. However, if Meta or one of the hyperscalers suddenly decides it needs another 100,000 GPUs, it could buy out the stock before a smaller company even has a chance. Whether NVIDIA would serve smaller companies first or push them down in the queue if more large orders come in from tech giants and hyperscalers remains to be seen.

“As we progress to meet the increasing computational demands of large-scale artificial intelligence, NVIDIA’s latest contributions in rack design and modular architecture will help speed up the development and implementation of AI infrastructure across the industry,” said Yee Jiun Song, vice president of engineering at Meta.

Another factor to consider is certainty of delivery. After a recent shipping delay for the Blackwell chip due to a packaging issue, the company had to redesign how its GPU was integrated with other components within the chipset to avoid warping and system failures. Only time will tell whether the current design will stand up to the rigors of mass production while maintaining quality.

Are There Any NVIDIA Blackwell Alternatives?

A number of competitors, including Intel and AMD, are waiting in the wings, ready to capitalize on customer frustration about long lead times. Earlier this year, Intel introduced a new AI chip that it claims rivals the performance of the NVIDIA H100 processor for AI workloads. Intel claims its Gaudi 3 chip can exceed its NVIDIA rival in the training and deployment of generative AI models with 40 percent more power efficiency, 50 percent more inference speed, and more than one-and-a-half times the training speed for large language models (LLMs).

Instead of packing more punch into every square millimeter of silicon to produce the biggest LLMs around, some manufacturers are taking a different approach. AMD, for example, has released a small language model called the AMD-135M, which was trained on AMD Instinct MI250 accelerators and tested for inference on AMD Ryzen AI processors.

Similarly, ThirdAI trained its Bolt LLM using only CPUs. Instead of the 128 GPUs used for GPT-2, for example, the company used 10 servers, each with two Intel Sapphire Rapids CPUs, to pretrain the 2.5-billion-parameter Bolt model within 20 days. That makes Bolt 160 times more efficient than traditional LLMs, according to Omdia analyst Michael Azoff. “Smaller models mean lower cost and lower power and cooling demands on the data center,” he said.

NVIDIA Chip Shortage: What Does it Mean?

NVIDIA is king of the castle at the moment, dominating the market for compute-intensive processing. But with much of the IT and business world being forced to wait a year or more for its in-demand product, rivals are lining up to try to fill the delivery vacuum. Some are already not far behind NVIDIA’s AI chips, and they’re gaining ground fast. While demand remains high, NVIDIA may prove to be a victim of its own success. It remains to be seen whether the company will further assert its dominance or whether others will take advantage of the delivery delays to serve AI demands with alternative solutions.
