My most popular post of all time was about why Y Combinator is wrong about LLMs for chip design. Put simply, I don’t think LLMs are able to generate high-performance or high-efficiency chip designs, because the process of designing high-performance chips is incredibly unforgiving. The entire “vibe coding” trend is about generating a large quantity of mediocre code, while the challenge of chip design is writing a relatively small amount of extremely high-quality, extremely performance-sensitive code in a specialized language called Verilog. And so far, I’ve been proven right. By and large, LLMs kinda suck at writing Verilog.
Never fear, though! Reinforcement learning (aka RL) will come to the rescue, right? RL has proven invaluable when it comes to making normally error-prone AI agents solve unforgiving tasks, from playing board games to constructing complex mathematical proofs. Why can’t we leverage RL systems to train AI models to write the sort of high-quality, performance-sensitive Verilog that is required to deliver high-performance chips?
Well, it turns out that it’s really, really hard, but not for the reasons you might think. Put simply, existing chip design tools are just too slow to allow any RL system to learn anything useful in a reasonable amount of time.
How would RL help?
Reinforcement learning is a training methodology for AI models that lets them improve at a specific task through trial and error. It’s particularly useful for tasks that are not well-represented in the training set, but that a model can attempt quickly and get clear pass/fail feedback on. A great example is mathematical proofs. Google’s AlphaProof learned to solve IMO math problems by repeatedly trying solutions and checking whether they were logically valid.
One of the key pieces of the RL training process is the “RL environment”. These environments are sandboxes that allow AI models to perform tasks and see how well they do. In the case of AlphaProof, the environment was an automated theorem prover that could check if the model’s generated proof was correct. For coding models, the environment could measure how many unit tests a model’s code passes.
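To make that concrete, here’s a minimal sketch of what a coding-focused RL environment might look like. The class and method names are my own illustration, not any particular framework’s API; the reward is simply the fraction of test cases the model’s code passes.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    reward: float   # fraction of test cases passed
    feedback: str   # error text the trainer can log or feed back to the model


class CodingEnvironment:
    """Toy RL environment: reward = fraction of test cases a candidate function passes."""

    def __init__(self, function_name: str, test_cases: list[tuple[tuple, object]]):
        self.function_name = function_name
        self.test_cases = test_cases  # list of (args, expected_output) pairs

    def evaluate(self, candidate_code: str) -> EpisodeResult:
        namespace: dict = {}
        try:
            exec(candidate_code, namespace)        # run the model's generated code
            fn = namespace[self.function_name]
        except Exception as e:
            return EpisodeResult(reward=0.0, feedback=f"did not compile: {e}")

        passed = 0
        for args, expected in self.test_cases:
            try:
                if fn(*args) == expected:
                    passed += 1
            except Exception:
                pass                               # a crash counts as a failed test
        return EpisodeResult(reward=passed / len(self.test_cases),
                             feedback=f"{passed}/{len(self.test_cases)} passed")


# Example: reward a model for writing a correct `add` function.
env = CodingEnvironment("add", [((1, 2), 3), ((-1, 1), 0)])
print(env.evaluate("def add(a, b):\n    return a + b").reward)   # 1.0
```

The important part is the shape: the model proposes something, the environment scores it, and that score is the only feedback the training loop needs.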
In the world of chip design, the environment seems pretty obvious. Most industry-standard EDA tools can measure a design’s correctness, speed, size, and power consumption. So we should be able to train a model to write Verilog code, floorplan a chip, design SPICE netlists, or lay out analog circuits, and then measure the performance of those circuits using these industry-standard chip design and analysis tools. But in practice, this is extremely difficult to actually pull off.
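Here’s a sketch of what that reward might look like for a Verilog-writing agent. The `FlowReport` fields and the `run_flow` callable are placeholders for whatever synthesis and timing flow you actually have on hand; none of this is a real vendor API, just an illustration of collapsing tool reports into one scalar.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FlowReport:
    passes_checks: bool      # did the design elaborate and pass functional checks?
    worst_slack_ns: float    # worst slack from static timing analysis (negative = violation)
    area_um2: float          # post-synthesis cell area
    power_mw: float          # estimated dynamic + leakage power


def chip_reward(verilog: str,
                run_flow: Callable[[str], FlowReport],
                target_area_um2: float,
                target_power_mw: float) -> float:
    """Collapse an EDA flow report into a single scalar reward for RL."""
    report = run_flow(verilog)       # hours of synthesis + place-and-route hide inside this call
    if not report.passes_checks:
        return 0.0                   # a broken design earns nothing
    reward = 1.0
    reward += min(report.worst_slack_ns, 0.0)                    # penalize timing violations
    reward += max(0.0, 1 - report.area_um2 / target_area_um2)    # bonus for small designs
    reward += max(0.0, 1 - report.power_mw / target_power_mw)    # bonus for low power
    return reward
```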
Why is it so hard to use RL for chip design?
If RL allows AI models to effectively learn from a reward signal, and chip design tools have such clear reward signals, why hasn’t RL resulted in an AI chip design breakthrough? Well, the problem is that chip design tools run extremely slowly. The process of going from Verilog to a set of logic gates, called synthesis, can take hours for a large design. Actually placing those logic gates on a chip layout and wiring them all up, called place and route, can take even longer. And then analyzing the timing and power information from that final design takes a while, too.
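Some rough, back-of-the-envelope numbers show why that’s fatal for RL throughput. Every figure below is an illustrative assumption, not a measurement from any real flow:

```python
# Illustrative assumptions, not measurements:
rollouts_needed = 1_000_000   # RL runs often need on the order of 10^5 to 10^6 attempts
hours_per_rollout = 2         # one synthesis + place-and-route + timing run on a modest block
parallel_seats = 500          # simultaneous tool licenses/machines, already a huge budget

serial_years = rollouts_needed * hours_per_rollout / (24 * 365)
wall_clock_days = rollouts_needed * hours_per_rollout / parallel_seats / 24

print(f"serial: ~{serial_years:.0f} years")                        # ~228 years
print(f"with {parallel_seats} seats: ~{wall_clock_days:.0f} days") # ~167 days
```

Compare that to a unit-test or theorem-prover environment, where a rollout takes seconds, and the bottleneck is obvious.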
The same problem exists in the analog design world. Analog designs feature fewer transistors, but they need to be laid out and simulated with far more precision than digital logic gates. Circuit simulators need to solve complicated differential equations and take manufacturing variability into account, or a circuit that works in simulation may not work when it’s actually baked into a physical chip. Often, signing off on an analog circuit design requires hours or even days’ worth of simulation.
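A similarly rough, made-up tally for a single analog block shows how those hours add up:

```python
# Purely illustrative numbers for one analog block:
pvt_corners = 15            # process/voltage/temperature combinations to check
monte_carlo_samples = 200   # samples per corner to cover manufacturing variability
minutes_per_transient = 3   # one SPICE-level transient simulation

total_hours = pvt_corners * monte_carlo_samples * minutes_per_transient / 60
print(f"~{total_hours:.0f} hours of simulation")   # ~150 hours, i.e. days of wall-clock time
```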
How do we go faster?
If the biggest challenge stopping RL from being useful for chip design is how slow the tools are, why not just make the tools faster? Well, a handful of new tools may make that possible.
Silimate offers an AI-powered PPA (power, performance, area) prediction tool that can estimate a design’s power consumption and timing characteristics without having to go through the conventional synthesis and place-and-route process. Their methodology is far faster than conventional methods, and could form the basis of an RL environment fast enough for a model to actually get good at writing Verilog. However, an AI-powered reward function is also risky: the model being trained could end up reward hacking, writing code that the PPA prediction tool thinks will perform well but that would actually perform poorly when passed through the conventional, slower synthesis and place-and-route tools.
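One common way to hedge against that failure mode, and this is my own illustration rather than anything Silimate has described, is to train against the fast predictor but periodically audit rollouts with the slow, trusted flow and watch for divergence:

```python
import random
from typing import Callable, Optional


def audited_reward(verilog: str,
                   fast_proxy: Callable[[str], float],
                   slow_ground_truth: Callable[[str], float],
                   audit_rate: float = 0.01,
                   divergence_log: Optional[list] = None) -> float:
    """Train on the fast predictor, but spot-check it against the real flow."""
    proxy_score = fast_proxy(verilog)
    if random.random() < audit_rate:                 # audit roughly 1% of rollouts
        true_score = slow_ground_truth(verilog)      # the hours-long synthesis + P&R flow
        if divergence_log is not None:
            divergence_log.append(proxy_score - true_score)
        return true_score                            # audited rollouts get the real reward
    return proxy_score
```

The audit doesn’t make training fast again, but it bounds how far the model can drift from reality between ground-truth checks.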
There are also companies like Partcl who are tackling the problem of accelerating silicon tooling more directly. By accelerating place-and-route and static timing analysis using GPUs, they can speed up key aspects of a chip design RL environment by orders of magnitude. However, building a complete set of chip design tools with full compatibility with modern PDKs is really, really hard -- and making it fast enough to enable RL is even harder. On the analog design side, there’s Dash Crystal, who are also building GPU-accelerated simulation tools.
Ultimately, I have faith that we’ll eventually get chip design tools fast enough to enable RL. Whether or not the engineers at these startups are even thinking about RL, faster simulation and analysis tools for chip design are valuable in and of themselves. So while it may take a lot of time and effort to develop the full suite of tools necessary for a high-speed, high-performance RL environment for chip design, all of the intermediate steps are still valuable to us human chip designers, because they make our current workflows that much faster.