Last week, news broke that Untether AI, an AI inference chip startup, is shutting down. Their engineering team is joining AMD, but their core product lines of AI inference chips and software aren’t coming with them to their acquirer. Obviously, I hope this is a good outcome for the engineers at Untether, but I can’t say I’m particularly surprised. While Untether had good technology and impressive performance for small neural networks, they ran into the same pitfall that many AI chip startups have in the past few years. If a company isn’t focused on large, generative models like diffusion models or LLMs, there’s not a huge market for their chip.
Untether’s First Chip
Untether was founded in 2018 -- notably, before the generative AI boom kicked off by ChatGPT and Stable Diffusion. Their original goal was to “untether” AI from datacenters and enable AI inference at the network edge. This was a bit different from what some other edge AI startups were working on at the time: Untether didn’t want to put their AI chips in edge devices like laptops and ultra-low-power sensors. Instead, they wanted to bring edge AI to distributed servers at the periphery of a network. For datacenter AI workloads like recommender models, avoiding routing requests to the network core could reduce latency and improve reliability.
Over time, though, that mission expanded. Their 2019 fundraising announcement focused on automotive and traditional cloud use cases, alongside a mention of more conventional battery-powered edge devices. At the time, most AI chip startups were focusing either on high-power datacenter chips or on ultra-low-power chips for always-on sensors, so it made sense for Untether to target the niches other players were avoiding. And their first chip, runAI, followed through on their promises, delivering up to 2 PetaOPS per accelerator card at as much as 8 TOPS/watt.
The rise of GenAI
Untether’s second chip, speedAI, was released in 2022, delivering a significant efficiency improvement -- 30 TOPS/watt compared to runAI’s 8 TOPS/watt. But it arrived at an unfortunate time: speedAI was announced just months before the release of ChatGPT refocused the entire AI industry around large language models and generative AI. With no HBM, mediocre chip-to-chip connectivity, and limited on- and off-chip memory, speedAI was doomed to be an also-ran in the LLM inference world.
So, Untether doubled down on vision inference, a market that their chips could still succeed in. They partnered with GM to work on autonomous vehicle systems. They released reference applications for their tsn200 accelerator card focused on “smart city” use cases like pedestrian and vehicle detection. And they kept doubling down on vision and sensing applications, collaborating with ARM to develop automotive solutions for driver assistance and self-driving cars, and taking aim at markets in robotics, surveillance, agriculture, and machine inspection.
But while Untether was focusing on vision applications for surveillance cameras, the world was moving on. Large language models became the darlings of the AI world, while computer vision became an afterthought, and for good reason. AI-powered surveillance cameras and cars were still a relatively small market, and I still don’t fully understand what a “smart city” even is. LLMs, on the other hand, were clearly disrupting huge industries, and quickly. Because of this, AI chip startups began focusing less on vision and convolutional networks and more on accelerating LLMs. And building LLM accelerators requires different considerations than vision accelerators: running an LLM quickly is mostly a matter of building a large, fast matrix-multiply engine and keeping it supplied with as much data as possible. Untether’s chips were good at matrix multiplication, but lacked the memory capacity, memory hierarchy, and chip-to-chip connectivity to actually tackle LLMs.
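To make that concrete, here’s a rough, roofline-style sketch of why LLM inference is bottlenecked by memory rather than raw matrix-multiply throughput. The numbers (70B parameters, one byte per weight, 1 TB/s of off-chip bandwidth) are illustrative assumptions for this back-of-the-envelope estimate, not specs for any real chip:

```python
# Back-of-the-envelope sketch: single-stream LLM decode speed is bounded by
# memory bandwidth, because each generated token must stream the full set of
# model weights from memory once. All numbers below are illustrative.

def decode_tokens_per_sec(params_billions: float,
                          bytes_per_param: float,
                          mem_bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/sec for one decode stream, assuming every
    token requires reading all weights from off-chip memory once."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return (mem_bandwidth_gb_s * 1e9) / weight_bytes

# A 70B-parameter model at 1 byte/weight needs ~70 GB just for weights --
# far too large for the few hundred MB of on-chip SRAM typical of edge
# inference chips, so off-chip bandwidth becomes the limiting factor.
speed = decode_tokens_per_sec(70, 1.0, 1000)
print(f"{speed:.1f} tokens/sec")  # roughly 14 tokens/sec per stream
```

Under these assumptions, even a chip with enormous TOPS is capped at around 14 tokens/sec per stream by bandwidth alone -- which is why HBM and fast chip-to-chip links matter so much more for LLMs than for small vision models.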
When Untether finally announced MLPerf results in 2024, they focused on ResNet-50 inference performance. While their competitors were releasing impressive tokens-per-second results for Llama-70B, Untether was bragging about power efficiency on a model from 2015.
AI chip startups need to see the future
Untether had good technology. They had a fantastic team -- there’s a reason AMD hired all of them. But they missed one of the biggest developments in the AI world: the rise of large generative models. If they had started a couple years later, and hadn’t taped out their second-generation silicon before the launch of ChatGPT, things could have easily turned out differently.
This is why I think one of the most important things an AI chip startup can do is keep an ear to the ground for future developments in AI, and build chips flexible enough to run whatever might be coming down the pipeline. Untether AI called their shot -- betting that vision models at the network edge were the future -- and built chips that weren’t flexible enough to run transformers. Maybe the companies focusing exclusively on transformers will meet a similar fate soon enough.