AI chip startups don’t need to be great to get acquired.
Meta might buy Furiosa just because they're good enough.
Recently, Forbes broke the news that Meta is considering buying the South Korean AI chip startup Furiosa AI. For those of us outside of Korea, Furiosa has been overshadowed by the American players in the AI chip space, like SambaNova Systems, Cerebras, and Groq. But they’ve been claiming that their Tensor Contraction Processor, RNGD, achieves impressive inference performance. Clearly, that impressive performance has attracted Meta, who thinks Furiosa can outperform Nvidia, right?
Well, the story is a little more complicated than that. Furiosa’s RNGD uses a flexible architecture that can maximize data re-use when operating on tensors of many different shapes, shapes that would normally tank the utilization of a more traditional systolic array. But when we actually dig into the performance numbers, RNGD underperforms Nvidia on raw computational power. Let’s take a look.
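To see why shape flexibility matters, here’s a toy sketch (my own illustration, not Furiosa’s actual mapping) of how a fixed 128×128 systolic array wastes cycles whenever an output tile doesn’t fill the array:

```python
import math

# Toy model of systolic-array utilization: an A x A array computes
# A x A output tiles, so any partially-filled tile wastes cycles.
# (Simplified; real mappings also tile the reduction dimension.)
def utilization(m: int, n: int, array: int = 128) -> float:
    tiles = math.ceil(m / array) * math.ceil(n / array)
    return (m * n) / (tiles * array * array)

for m, n in [(4096, 4096), (96, 96), (130, 130)]:
    print(f"{m}x{n} output: {utilization(m, n):.0%} utilization")

# 4096x4096 output: 100% utilization
# 96x96 output: 56% utilization
# 130x130 output: 26% utilization
```

Awkward shapes like these show up constantly in real inference graphs, and that utilization gap is exactly what Furiosa’s tensor-contraction approach is aiming at.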
Furiosa RNGD’s Performance Numbers
Let’s start with the top-line numbers. The H100 delivers 989 BF16 TFLOPs and 1979 INT8 TOPs, while the RNGD offers only 256 TFLOPs and 512 TOPs, respectively. Plus, the B200 is coming out soon and will significantly eclipse the H100 on inference FLOPs with its double-die package and support for additional low-precision data formats.
Furiosa’s saving grace is power efficiency: the RNGD consumes 150W to the H100’s 700W. Some of that efficiency gain is likely due to Furiosa’s lower memory capacity (48GB vs 80GB) and lower memory bandwidth (1.5TB/s vs 3.35TB/s). Memory capacity and bandwidth are significant factors in an AI accelerator’s overall power consumption and performance, so Furiosa likely isn’t getting a free lunch here.
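Running the performance-per-watt arithmetic on those spec-sheet numbers (peak figures, so treat these as upper bounds rather than measured throughput):

```python
# Spec-sheet efficiency comparison using the numbers quoted above.
# Peak FLOPs are rarely sustained in practice, so these are upper
# bounds on efficiency, not measured results.
chips = {
    "H100": {"bf16_tflops": 989, "int8_tops": 1979, "watts": 700},
    "RNGD": {"bf16_tflops": 256, "int8_tops": 512,  "watts": 150},
}

for name, s in chips.items():
    print(f"{name}: {s['bf16_tflops'] / s['watts']:.2f} BF16 TFLOPs/W, "
          f"{s['int8_tops'] / s['watts']:.2f} INT8 TOPs/W")

# H100: 1.41 BF16 TFLOPs/W, 2.83 INT8 TOPs/W
# RNGD: 1.71 BF16 TFLOPs/W, 3.41 INT8 TOPs/W
```

On paper, RNGD comes out roughly 20% ahead per watt: a real but modest edge, not a step change.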
It’s hard to determine the effect of this reduced memory bandwidth from Furiosa’s public benchmarks, which focus on smaller models (Llama 3.1 70B and Llama 3.1 8B) with relatively small context windows. The largest input length they report performance numbers for is 2048 tokens; in practical applications, context windows may stretch to tens or hundreds of thousands of tokens. Those long context windows make memory capacity and bandwidth a key performance bottleneck, and Furiosa may not be able to keep up.
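A quick KV-cache estimate shows why. Using Llama 3.1 70B’s published configuration (80 layers, 8 KV heads under grouped-query attention, head dimension 128) and assuming a BF16 cache, per-sequence memory grows linearly with context length:

```python
# Rough KV-cache sizing for Llama 3.1 70B, using the public model
# config (80 layers, 8 KV heads via GQA, head dim 128) and assuming
# a BF16 (2-byte) cache. Serving stacks vary, so this is an estimate.
N_LAYERS, N_KV_HEADS, HEAD_DIM, BYTES = 80, 8, 128, 2

def kv_cache_gb(context_tokens: int) -> float:
    # 2x for keys and values, per layer, per KV head
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES
    return context_tokens * per_token / 1e9

for ctx in (2_048, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_gb(ctx):6.1f} GB KV cache")

#    2048 tokens ->    0.7 GB KV cache
#   32768 tokens ->   10.7 GB KV cache
#  131072 tokens ->   42.9 GB KV cache
```

A single 128K-token sequence would nearly fill one RNGD card’s 48GB by itself, before the model’s ~140GB of BF16 weights are even accounted for.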
However, it’s not all bad on the memory front. Compared to other AI companies like Groq and d-Matrix, Furiosa’s chips are equipped with HBM, which means that they can run fairly large models on reasonable amounts of hardware. Furiosa runs Llama 3.1 70B efficiently on 8 accelerator cards. The same workload takes d-Matrix an entire 64-chip rack to run efficiently, and takes Groq a whopping 576 chips, spread across multiple racks, to run at all.
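The raw capacity math makes the difference obvious. Here’s a back-of-the-envelope sketch assuming BF16 weights, using RNGD’s 48GB figure from above and Groq’s published ~230MB of per-LPU SRAM (Groq presumably closes the gap to its 576-chip figure by quantizing weights below 16 bits):

```python
import math

# How many chips does it take just to hold Llama 3.1 70B's weights?
# ~70.6B parameters at 2 bytes each (BF16) is ~141GB. RNGD's 48GB of
# HBM is from the spec sheet above; ~230MB of SRAM per LPU is Groq's
# published figure.
weight_gb = 70.6 * 2  # ~141 GB

for chip, mem_gb in {"Furiosa RNGD (HBM)": 48, "Groq LPU (SRAM)": 0.23}.items():
    print(f"{chip}: >= {math.ceil(weight_gb / mem_gb)} chips for weights alone")

# Furiosa RNGD (HBM): >= 3 chips for weights alone
# Groq LPU (SRAM): >= 614 chips for weights alone
```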
Overall, the performance of Furiosa’s chips is a mixed bag. They offer good-but-not-great compute, with potential efficiency gains, but the system-level impact of their reduced memory bandwidth on larger workloads is unclear. So why is Meta considering buying them? Well, for hyperscalers like Meta, AI chip startups have more strategic value than raw chip performance. Owning Furiosa could help Meta reduce their reliance on Nvidia chips.
The Strategic Value of Furiosa
First off, H100s are extremely expensive, running customers around $25,000 for each GPU. The B200 is projected to be even pricier, at $30-40k each. And most of this money is gross profit; the cost to manufacture an H100 is only a few thousand dollars. At the same time, these state-of-the-art GPUs are very hard for companies to get their hands on, with wait times sometimes stretching to multiple months.
Even worse, Nvidia also has the ability to put their thumb on the scales of next-generation AI development by picking who gets priority access to their newest GPUs. Overall, it makes sense why Meta might want to break free from Nvidia’s tyranny by bringing AI chip capabilities in-house. But why are they considering buying Furiosa, rather than just spinning up their own chip team like Google did for their TPUs?
As a matter of fact, Meta does already have an internal silicon team, which has developed a series of AI chips called the Meta Training and Inference Accelerator, or MTIA. Importantly, though, MTIA is targeted towards recommender workloads:
This chip’s architecture is fundamentally focused on providing the right balance of compute, memory bandwidth, and memory capacity for serving ranking and recommendation models. (source)
These are valuable workloads for Meta, who are constantly leveraging low-latency ML to recommend ads, Instagram reels, and more. But they’re not the massive LLM workloads that Nvidia chips are best at, workloads that demand large amounts of off-chip memory and high memory bandwidth. So while MTIA is a useful asset for Meta, it doesn’t fundamentally reduce their dependence on Nvidia for generative AI workloads.
Specifically, MTIA is focused on smaller workloads that can mostly fit in on-chip SRAM, with off-chip memory provided by relatively low-speed LPDDR5. Furiosa’s chips, on the other hand, leverage HBM3, which delivers far higher memory bandwidth. HBM3 is notoriously hard for chip makers to source and integrate, which is part of the reason some AI chip startups, like Groq, Cerebras, and d-Matrix, eschewed HBM entirely. That makes Furiosa a distinctive acquisition target: they’re one of the few AI chip startups that have managed to secure an HBM3 supply and successfully integrate it into their products.
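A simple roofline sketch shows what’s at stake. If decoding a token means streaming essentially every weight byte from memory, single-stream decode speed is capped at bandwidth divided by model size. The 1.5TB/s figure is RNGD’s from above; the 0.2TB/s LPDDR5 figure is illustrative, not MTIA’s actual spec:

```python
# Roofline for single-stream LLM decode: each generated token streams
# (roughly) every weight byte through the chip, so tokens/s is capped
# near bandwidth / model size. Sharding across N chips multiplies the
# aggregate bandwidth by N. 1.5 TB/s is RNGD's spec from above; the
# LPDDR5 number is illustrative, not MTIA's actual figure.
MODEL_GB = 141  # Llama 3.1 70B in BF16

for mem, tb_per_s in {"HBM3 (RNGD)": 1.5, "LPDDR5 (illustrative)": 0.2}.items():
    print(f"{mem}: <= {tb_per_s * 1000 / MODEL_GB:.1f} tokens/s per chip")

# HBM3 (RNGD): <= 10.6 tokens/s per chip
# LPDDR5 (illustrative): <= 1.4 tokens/s per chip
```

Sharding across more chips raises the aggregate bandwidth, which is how Furiosa’s 8-card setup reaches usable decode speeds; an LPDDR5-based design would need far more chips to hit the same ceiling.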
Ultimately, even if Furiosa’s chips underperform Nvidia’s, they offer Meta some unique advantages as an acquisition target. Furiosa would reduce Meta’s dependence on Nvidia for high-performance, LLM-focused chips equipped with HBM. And if Meta does acquire Furiosa, I don’t think it will be the last acquisition of an AI chip startup by a hyperscaler. As hyperscalers try to liberate themselves from total dependence on Nvidia for AI chips, they may end up buying chip startups, not because those chips are particularly amazing, but because of the strategic value of having in-house LLM-focused silicon.