Why is OpenAI partnering with Cerebras?
TSMC's manufacturing capacity rules everything around me.
A few weeks ago, OpenAI announced a massive new partnership with Cerebras, the wafer-scale AI chip company. This came as a big surprise to those of us who have been watching Cerebras for years -- until recently, the AI chip startup had no major customers other than the UAE-based G42, which was also an investor in the company. That lack of customers primarily stemmed from the total cost of ownership (TCO) of Cerebras systems.
Like other AI accelerators that lack high-bandwidth memory (HBM), Cerebras needs to network together many chips to host a single large model. The relatively small Llama 3.1 70B LLM takes 4 racks of Cerebras wafer-scale systems, each of which costs between two and three million dollars. An Nvidia DGX H100 can run the same workload at a tenth of the cost. And the TCO difference only grows for larger models, which require more and more Cerebras chips to keep performance from plummeting as model weights spill off-chip.
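To put rough numbers on that gap, here's a quick back-of-envelope sketch. It uses only the figures cited above (4 racks at roughly $2-3M each, and the claim that a DGX H100 handles the same workload at about a tenth of the cost); none of these are vendor quotes.

```python
# Back-of-envelope cost comparison for serving Llama 3.1 70B,
# using only the approximate figures cited in this post.

racks = 4                          # Cerebras wafer-scale racks for the 70B model
cost_per_rack = (2e6, 3e6)         # roughly $2-3M per rack

cerebras_low = racks * cost_per_rack[0]     # ~$8M
cerebras_high = racks * cost_per_rack[1]    # ~$12M

# Per the comparison above, a single Nvidia DGX H100 serves the same
# workload at roughly a tenth of that outlay.
dgx_estimate = (cerebras_low + cerebras_high) / 2 / 10   # ~$1M

print(f"Cerebras buildout: ~${cerebras_low/1e6:.0f}M to ~${cerebras_high/1e6:.0f}M")
print(f"DGX H100 (~1/10):  ~${dgx_estimate/1e6:.0f}M")
```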
Cerebras would claim that this cost overhead is worth it for the ultra-low latency that they provide. They’re not wrong that Cerebras chips, by storing all of their data in SRAM, offer significantly lower inference latency than equivalent Nvidia solutions. The same is true for other SRAM-based inference chips, like those from d-Matrix and the recently acquired Groq. But players like SambaNova and Furiosa also offer relatively high-speed inference without the same cost tradeoff, thanks to their inclusion of HBM. So if Cerebras’ chips are so expensive, and their low-latency inference isn’t that unique, why is OpenAI partnering with them?
In my opinion, what actually makes Cerebras unique is that, after the Groq acquisition, they are the only major AI chip startup that has already deployed chips at scale and isn’t limited by TSMC’s chip-on-wafer-on-substrate (CoWoS) manufacturing capacity.
CoWoS, HBM, and SRAM
As I discussed in my article on Nvidia’s Groq acquisition, Nvidia’s chip manufacturing volume is limited by TSMC’s CoWoS manufacturing capacity. CoWoS packaging is expensive and difficult, and its production volume is limited. Nvidia isn’t the only company fighting for that limited volume; AMD uses CoWoS packaging for their AI chips as well. Startups are competing for it too: SambaNova already uses CoWoS packaging, and others like Furiosa, Etched, and MatX probably will as well. CoWoS capacity shortages are the norm, and they are predicted to persist for years.
CoWoS packaging is key to the HBM-based architectures that Nvidia, AMD, and startups like SambaNova and Furiosa are selling. With massive, low-latency banks of HBM, these chips can store LLM weights, biases, and KV cache values off-chip while still accessing them relatively quickly. This high memory density enables a datacenter to serve small models like Llama 3.1 70B with less than one rack of servers, offering a cost-efficient way to deliver inference.
On the other hand, SRAM-based architectures, like the ones developed by Cerebras, Groq, and d-Matrix, have to store all that data on-chip. On-chip SRAM is far, far less dense and far, far more expensive per gigabyte than off-chip HBM. That’s why it takes roughly ten million dollars to build out a Cerebras system for even a small model. The flip side is that SRAM is much faster than HBM, which lets these chips run inference workloads faster than their HBM-based counterparts.
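A rough sketch of the arithmetic shows why the chip counts balloon. The ~44 GB of on-chip SRAM per wafer is my own assumption based on Cerebras’s published WSE-3 specs, and 80 GB is the HBM capacity of a single H100, so treat this as an illustration rather than a sizing guide.

```python
# Back-of-envelope: why a 70B-parameter model needs many SRAM-only chips.
# The per-wafer SRAM figure is my assumption from Cerebras's public WSE-3
# specs; KV cache, activations, and replication would push counts higher.

params = 70e9                # Llama 3.1 70B
bytes_per_param = 2          # FP16/BF16 weights
weights_gb = params * bytes_per_param / 1e9       # ~140 GB of weights alone

sram_per_wafer_gb = 44       # assumed on-chip SRAM per wafer-scale engine
hbm_per_h100_gb = 80         # HBM on a single H100 GPU

print(f"Weights:              ~{weights_gb:.0f} GB")
print(f"Wafers needed (SRAM): ~{weights_gb / sram_per_wafer_gb:.1f}")
print(f"H100s needed (HBM):   ~{weights_gb / hbm_per_h100_gb:.1f}")
```

Even before accounting for KV cache and batching, the weights alone spill across multiple wafers, while a single 8-GPU HBM server holds them comfortably.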
For years, analysts and the market have been looking at this tradeoff and deciding it wasn’t worth it. Nobody wanted to pay orders of magnitude more for a system that was slightly faster. That’s why Groq had to pivot from selling chips, which nobody would buy, to running their own inference clusters. It’s also why Cerebras’ only major customer was G42, the Emirati company that was also one of their major investors.
But the limited supply of CoWoS packaging changes the equation. If every bit of TSMC capacity is used for Nvidia and AMD chips that are already flying off the shelves, what do you do when you want even more compute? You might need to start buying chips with a worse cost-to-performance tradeoff.
The AI bubble doesn’t care about cost
OpenAI is in talks to raise as much as $100 billion. Anthropic is raising $20 billion. And existing tech giants like Meta and Google are pouring countless billions into their own AI efforts. Put simply, the AI giants have more money than they know what to do with. And many of them see the AI race as a winner-take-all effort. Any small advantage now could be worth trillions of dollars in the future if it gives a company a substantial competitive edge in developing and deploying the next models sooner.
What Cerebras is offering is just that. It’s not actually a better chip than what Nvidia has -- in fact, it offers a worse cost-to-performance tradeoff. But the demand for AI chips is so high that “almost as good as Nvidia but still worse” has become a multi-billion dollar value proposition. Every single one of Nvidia’s chips is already being deployed in AI datacenters, and they can’t make more unless TSMC adds additional CoWoS manufacturing capacity. Cerebras can soak up the excess demand for compute that spills past that limited TSMC capacity.
As a brief aside -- Cerebras is not the only player offering all-SRAM inference chips. But Groq was just acquired by Nvidia, and the only other player in this space, d-Matrix, doesn’t have chips deployed at scale in the way Cerebras does. If OpenAI wants to source additional compute, even at a relatively high cost, it makes sense to pick the player with the lowest risk. In this case, that’s Cerebras.
Ultimately, as the demand for compute grows, the cost-efficiency and power-efficiency of different chips start to matter less and less, and the availability of those chips starts to matter more. This explains Nvidia’s acquisition of Groq, and it explains OpenAI’s Cerebras deal. Sure, these chips might be slightly worse than Nvidia’s H100s and B200s at serving large models at scale, but Nvidia’s chips are a limited resource. When the demand for AI chips is so high that Nvidia is running out of stock, “almost as good as Nvidia” becomes a viable strategy.

