YC recently released their latest Request for Startups, and one specific item has already been getting some buzz in chip design communities. It’s exciting that an organization as high-profile as YC is taking chip design more seriously… or at least it’s exciting until you actually start reading what their proposal is. Unfortunately, YC seems to have fundamentally misunderstood the key challenges in the world of chip design, and where both LLMs and AI tools more broadly can help.
YC makes the following argument:
Design of customized digital systems [...] has typically been costly because of the amount of custom design, development and testing necessary to bring such a system online. With the advent of large language models, these costs are coming down significantly, such that ever more specialized types of computation could be done.
We know there is a clear engineering trade-off: it is possible to optimize especially specialized algorithms or calculations such as cryptocurrency mining, data compression, or special-purpose encryption tasks such that the same computation would happen faster (5x to 100x), and using less energy (10x to 100x).
If Gary Tan and YC believe that LLMs will be able to design chips 100x better than humans currently can, they’re significantly underestimating the difficulty of chip design, and the expertise of chip designers. While LLMs can sometimes write functional Verilog, their output is still well below what a competent human engineer produces. More importantly, LLMs aren’t capable of designing novel chip architectures, which are the primary driver of performance improvements for modern accelerator chips. At best, they pump out mediocre Verilog code.
The more charitable version of YC’s proposal is that LLM-based tools will significantly reduce the cost of chip design, which will make hardware acceleration more viable for applications where it’s currently cost-prohibitive. Unfortunately, the semiconductor industry already tried something similar. It was called high-level synthesis, and it resoundingly failed.
High-level synthesis, all over again.
High-level synthesis, or HLS, was born in 1998, when Forte Design Systems was founded. They developed a tool called Cynthesizer, which could automatically translate SystemC to Verilog given certain timing constraints. More generally, HLS tools let engineers write code in a higher-level language like C, C++, or Scala, and automatically convert that code to Verilog, which can in turn be synthesized into logic gates for ASICs and FPGAs. The idea is to make silicon development cheaper and more accessible, especially for teams that don’t have access to deep Verilog expertise.
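To give a sense of what this looks like in practice, here’s a minimal sketch of the kind of C++ an engineer might hand to an HLS tool such as Vitis HLS: a fixed-length dot product annotated with a pragma hinting that the loop should become a pipelined datapath. The pragma follows Vitis conventions; the function and array names are purely illustrative.

```cpp
// Minimal HLS-style C++ sketch: a fixed-length dot product.
// An HLS tool treats the pragma as a hint and emits Verilog that
// implements the loop as a pipelined multiply-accumulate datapath.
#include <cstdint>

constexpr int N = 64;

int32_t dot_product(const int32_t a[N], const int32_t b[N]) {
    int32_t acc = 0;
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1  // ask for one new multiply-accumulate per clock cycle
        acc += a[i] * b[i];
    }
    return acc;
}
```

The appeal is obvious: a software engineer can write this in minutes, whereas hand-written RTL for the same block means explicitly managing pipeline registers, handshaking, and control logic. The catch, as we’ll see, is the quality of the Verilog that comes out the other end.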
Despite Cynthesizer’s cool name, it never really caught on. Forte was still desperately trying to argue that HLS was the future in 2011. In 2014, they were acquired by Cadence, and Cynthesizer was rolled into Cadence’s new HLS tool, Stratus.
Xilinx, now AMD, has been a major proponent of HLS, specifically targeting FPGA acceleration. This makes some sense, as FPGA development is usually done by smaller teams, where the additional firepower HLS promises could be super useful. Their Vitis tool is one of the best HLS tools in the industry -- unfortunately, that’s a bit of a low bar.
The current state of HLS is best summed up by Lan Huang in the introduction to his survey of state-of-the-art HLS tools:
The performance of HLS tools still has limitations. For example, designers remain exposed to various aspects of hardware design, development cycles are still time consuming, and the quality of results (QoR) of HLS tools is far behind that of RTL flows.
Other researchers have reported similar disappointing results in their own surveys.
Ultimately, while HLS makes designers more productive, it reduces the performance of the designs they make. And if you’re designing high-value chips in a crowded market, like AI accelerators, performance is one of the major metrics you’re expected to compete on. So it makes sense to spend the additional upfront cost to hire talented RTL designers, build chips with better performance, and deliver more successful products.
What new kinds of accelerators could LLMs build?
So HLS tools failed to gain any traction for high-value, high-volume chips, where performance and efficiency requirements necessitated human engineers writing high-quality Verilog. LLMs, which also produce poor-quality Verilog, will likely face the same challenges. But what about other kinds of chips? Could LLMs help there?
The few successful applications of HLS enabled engineers without silicon expertise to leverage hardware acceleration. For example, HLS has seen success in FPGA acceleration of genomics workloads and computational fluid dynamics (CFD) workloads.
In both cases, HLS is valuable because the engineers developing the hardware accelerators don’t have the RTL experience to build optimized RTL-level designs from scratch. But this is only the case because the markets for hardware-accelerated genomics and CFD are relatively small. GPUs already serve these use cases somewhat well, and neither use-case would drive even a fraction of the sales volume of AI chips or cryptographic accelerators. If these markets were bigger, it would make economic sense to dedicate talented hardware engineers to build optimized silicon.
The idea that YC is proposing falls into the same trap. LLMs can reduce the cost to develop silicon. This might make dedicated ASICs for applications like genomics economically viable, when they previously weren’t. But they weren’t viable for a reason: there isn’t a huge market there. Ultimately, trying to leverage LLMs to build hardware accelerators for underserved applications is a loser’s bet.1 If an application doesn’t warrant hardware acceleration yet, it’s probably because it’s a small market, and that makes it a poor target for a startup.
So what can LLMs do in chip design?
If LLM-based chip design primarily unlocks low-value markets, why are so many startups and established EDA companies trying to leverage LLMs? Well, it turns out that LLMs are also pretty valuable when it comes to chips for lucrative markets -- but they won’t be doing most of the design work. LLM copilots for Verilog are, at best, mediocre. But leveraging an LLM to write small snippets of simple code can still save engineers time, and ultimately save their employers money.
More importantly, though, the chip design world is experiencing a massive talent shortage in verification. Normally, you want two verification engineers for every designer, because silicon bugs are incredibly costly. But in the modern era, good verification engineers are hard to come by.
If you could leverage LLMs to make verification faster, easier, or more effective, that could be massively valuable to established semiconductor companies and semiconductor startups alike. But ensuring that LLMs can actually understand and reason about chip design specifications is no small task, which is why I’m skeptical of startups like Bronco or Instachip who are naively throwing fine-tuned LLMs at chip verification.
First of all, chip specifications are often complex, incomplete, labyrinthine monstrosities -- they were written by people like me, after all. But more importantly, successful verification of a chip requires an LLM to understand the expected internal state of the chip. In essence, the language model needs to have some formal model of the chip: think AlphaProof, but for semiconductor verification instead of math. This is one of the things we’re working on at Normal Computing, alongside our novel thermodynamic chip architectures.
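To make that concrete, here’s a toy sketch of what “a model of the chip” means in a conventional verification flow: a software golden model run in lockstep with the design, with the RTL’s outputs checked against it cycle by cycle. Everything here is illustrative (a made-up saturating accumulator and a hand-rolled check loop rather than a real framework like UVM or cocotb), not a description of what we’re building at Normal.

```cpp
// Toy golden-model check: a software reference for a saturating accumulator,
// compared cycle by cycle against outputs captured from an RTL simulation.
// All names are illustrative.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Reference model: what the hardware is supposed to compute each cycle.
struct SatAccModel {
    int32_t acc = 0;
    int32_t step(int32_t in) {
        int64_t next = static_cast<int64_t>(acc) + in;
        acc = static_cast<int32_t>(
            std::clamp<int64_t>(next, INT32_MIN, INT32_MAX));
        return acc;
    }
};

// Compare a trace of DUT (design-under-test) outputs against the model.
bool check_trace(const std::vector<int32_t>& inputs,
                 const std::vector<int32_t>& dut_outputs) {
    SatAccModel model;
    for (size_t i = 0; i < inputs.size(); ++i) {
        int32_t expected = model.step(inputs[i]);
        if (expected != dut_outputs[i]) {
            std::printf("Mismatch at cycle %zu: expected %d, got %d\n",
                        i, expected, dut_outputs[i]);
            return false;
        }
    }
    return true;
}
```

The hard part for an LLM isn’t writing a check loop like this; it’s deriving what the reference model should do from a sprawling, informal specification, and recognizing when the spec itself is ambiguous or incomplete.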
Ultimately, LLMs will make chip design cheaper. But this will primarily benefit three kinds of companies: large semiconductor companies who can reduce their verification workforce, conventional chip startups who can operate leaner teams, and then the EDA software startups selling LLM-based tools. LLMs won’t build 100x better chips, or enable hardware startups to tackle markets lacking hardware acceleration, because the economics just don’t make sense.
If a market is growing, low-quality LLM-designed chips may help startups get a foothold affordably. But once the market is large enough to start justifying full chip design teams, human-designed chips will easily beat LLM-designed chips on performance, at least for the foreseeable future.