SoftBank is building an AI datacenter empire
Ampere, Graphcore, and ARM are all under one roof.
Recently, SoftBank announced that it would be acquiring Ampere, a designer of ARM-based server processors, for $6.5 billion. And this isn’t SoftBank’s first foray into the world of silicon: in 2016, it acquired ARM itself for £24.3 billion, and in 2024, it acquired the AI chipmaker Graphcore for an undisclosed sum. Each of these companies could be a compelling investment in its own right, but SoftBank is clearly assembling them to exploit their synergies. I think that Masa Son and SoftBank are trying to build an AI datacenter empire to rival Nvidia. Maybe it’s even related to Project Stargate, the $500B partnership between OpenAI, SoftBank, Oracle, and MGX to build and deploy new AI infrastructure.
One of Nvidia’s key strengths is the raw power of its GPUs. Most startups struggle to compete on performance; some may be competitive purely on model inference, but unseating Nvidia on training is even harder. At the same time, as many folks in tech are acutely aware, the CUDA software stack is also a huge moat for Nvidia: most state-of-the-art AI workloads are written against CUDA, which only runs on Nvidia GPUs. But more recently, Nvidia has been pursuing another strategy as well: providing turnkey datacenter systems, with its DGX line as the prime example.
Nvidia’s DGX Strategy
In the early days of AI, if you wanted to build a datacenter to run AI workloads, you would buy a bunch of Nvidia Tesla GPUs, install them in server racks, and network all of those servers together manually with Ethernet cables. This worked well enough for relatively small models, but as models grew too large to fit comfortably on individual GPUs, bottlenecks started to show up. And as bottlenecks arose, Nvidia developed solutions.
First, all of the GPUs in a server had to talk to the same CPU over PCIe. Not only does PCIe have limited bandwidth, but forcing every transfer to be coordinated by the CPU can slow down system performance. So in 2014, Nvidia introduced NVLink, a data link with higher bandwidth than PCIe that also enables direct GPU-to-GPU communication over a mesh network.
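To make the bandwidth gap concrete, here’s a rough back-of-the-envelope sketch. The ~16 GB/s and ~80 GB/s figures are approximate per-direction peak rates for PCIe 3.0 x16 and first-generation NVLink (four links, as on the P100); real-world throughput is lower, so treat this as an illustration of scale rather than a benchmark.

```python
# Back-of-the-envelope: idealized time to move data between GPUs.
# Approximate per-direction peak bandwidths (GB/s):
PCIE3_X16_GBPS = 16.0   # PCIe 3.0 x16
NVLINK1_GBPS = 80.0     # NVLink 1.0, 4 links (e.g., P100)

def transfer_seconds(gigabytes: float, bandwidth_gbps: float) -> float:
    """Idealized transfer time at a given peak bandwidth."""
    return gigabytes / bandwidth_gbps

# Moving a 10 GB chunk of weights/activations between GPUs:
payload_gb = 10.0
print(f"PCIe 3.0 x16: {transfer_seconds(payload_gb, PCIE3_X16_GBPS):.3f} s")  # 0.625 s
print(f"NVLink 1.0:   {transfer_seconds(payload_gb, NVLINK1_GBPS):.3f} s")    # 0.125 s
```

Multiply that gap across the thousands of inter-GPU transfers in a single training step and the motivation for NVLink becomes clear.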
As models grew large enough that they couldn’t even fit on individual servers, Ethernet became a bottleneck too. This one was harder for Nvidia to solve internally, but they knew that the InfiniBand networking standard, with its high-bandwidth, low-latency links, would offer a way to scale AI datacenters to greater and greater size. So, in 2019, they acquired Mellanox, the primary manufacturer of InfiniBand hardware.
Finally, Nvidia recognized that the x86 CPUs in its servers were limiting datacenter performance on highly parallel applications. So, in 2023, it launched Grace, an in-house line of ARM-based datacenter-class CPUs. With a large number of efficient cores running in parallel, Grace is better suited to the AI workloads Nvidia is increasingly focused on.
Now, the ideal AI datacenter uses Nvidia GPUs, Nvidia CPUs, Nvidia short-distance networking (NVLink), and Nvidia long-distance networking (InfiniBand). So Nvidia did what made the most sense and built an entire datacenter that customers can buy directly from them: the DGX SuperPOD. They call it a “turnkey AI supercomputer”, and they’re not wrong.
While customers like AWS and Microsoft will probably keep building their own datacenter deployments, the SuperPOD lets large enterprise customers, who are often lagging in deploying and leveraging AI, easily acquire and deploy AI on-premises at scale. I think this could be a massive win for Nvidia; as AI penetrates large legacy enterprises, helping them deploy large models on-premises in a turnkey way could become a major revenue driver.
Could SoftBank make a turnkey datacenter?
Now, let’s look at SoftBank. They have a compelling AI chip business from the Graphcore acquisition. While Graphcore’s chips don’t blow Nvidia’s out of the water, they are a fairly compelling silicon offering with decent performance, and Graphcore also offers a datacenter-scale mesh networking solution called IPU-Fabric. With the Ampere acquisition, SoftBank adds a high-quality, efficient, high-core-count server CPU to its technology portfolio. So, could SoftBank deploy a competitor to Nvidia’s DGX SuperPOD?
I think SoftBank has a shot at it. Truly competing with Nvidia will be hard, given the maturity of CUDA, InfiniBand, and the rest of Nvidia’s stack, but by building on Graphcore’s existing IPU-POD multi-rack solutions and Ampere’s AmpereOne server-class CPUs, SoftBank could assemble an alternative that makes sense for some customers. If that solution is more affordable or more power-efficient than a DGX SuperPOD, or has the upper hand on some other metric, it could drive significant revenue as legacy enterprises try to deploy large AI models on-premises.
But SoftBank may have a couple of other tricks up its sleeve to make its turnkey datacenter products more than a second-tier alternative to Nvidia’s SuperPOD: its ownership of ARM, and its participation in Project Stargate.
Stargate, ARM, and SoftBank
SoftBank is the primary financial backer of OpenAI’s Project Stargate, a $500B plan to build new AI infrastructure in the United States. If SoftBank is spending billions of dollars helping OpenAI build new datacenters, I think it has a reasonable shot at steering some of those datacenters toward SoftBank hardware. And if OpenAI and SoftBank form a tight partnership, the next generation of chips from Graphcore and Ampere could easily be tailored to OpenAI’s exact needs. OpenAI is already working on custom silicon; with support from the experienced silicon teams at Graphcore and Ampere, one could imagine Project Stargate datacenters filled with hardware designed by SoftBank-owned companies specifically for OpenAI.
SoftBank has another trick up its sleeve: its ownership of ARM. All of Nvidia’s CPU solutions currently use ARM processor IP. If SoftBank really wants to pull out the big guns in competing with Nvidia, it could have ARM refuse to license its IP to Nvidia in the future. This would grind Nvidia’s CPU development to a halt and give SoftBank’s Ampere a shot at developing a clear winner in the world of AI datacenter CPUs. And ARM hasn’t been afraid to play hardball before; it is currently locked in a legal battle with Qualcomm, and is starting to actively compete against its own customers by building chips in-house. Ultimately, SoftBank’s acquisitions of Graphcore and Ampere give it a shot at building the turnkey datacenters of the future, but it’s the partnership with OpenAI and the ownership of ARM that may give SoftBank the upper hand over Nvidia.