Side channel attacks on AI chips are very real.
BarraCUDA and other attacks show that GPU side channels are practical.
I’ve talked a lot about side-channel attacks on cryptographic systems. These sorts of attacks usually focus on compromising low-cost, commodity cryptographic hardware, like the chips you’d find in a credit card or a YubiKey. But more recently, industry leaders in the AI world, including OpenAI, have been calling for more focus on preventing side-channel attacks on GPUs and AI chips. Unfortunately, thus far, practical proposals for secure AI hardware are still in the research stage. This isn’t the worst thing, though, as practical, real-world side-channel attacks on AI chips and GPUs haven’t been demonstrated yet… right?
Wrong.
In October 2024, a group of researchers from Radboud University released BarraCUDA, an attack capable of extracting neural network weights and biases from an Nvidia Jetson chip over electromagnetic side channels. And another group of researchers from UC Riverside demonstrated remote side-channel attacks on GPUs, enabling a process on a shared GPU to spy on other running processes. Today, we’ll take a deeper look at these attacks, and what they imply for AI chip design and security.
BarraCUDA: Stealing Weights over Side-Channels
BarraCUDA takes the traditional electromagnetic side-channel attack, normally used to attack cryptographic hardware, and extends it to attacks on AI models. By measuring the electromagnetic radiation given off by Nvidia Jetson Nano and Orin Nano chips while they’re performing inference, the researchers were able to recover the weights and biases of the neural network those chips were running.
At a high level, this attack relies on how AI chips calculate large multiply-accumulate (MACC) operations, which are essential for the matrix multiplications underpinning all modern deep neural networks. On a GPU, each MACC is divided into a sequence of partial sums. As the authors write, each partial sum “depend[s] on the previous sum and a small number of weights and inputs. Consequently, assuming the previous sum and the inputs are known, the adversary can guess the weights, use these to predict leakage, and correlate the prediction with the measured side-channel trace.”
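To make that concrete, here’s a minimal correlation-analysis sketch in Python. This is my own illustration, not the authors’ tooling: the trace shapes, the 8-bit Hamming-weight leakage model, and the recover_weight helper are all assumptions, and the real attack targets floating-point MACCs with careful trace alignment, which is far more involved.

```python
import numpy as np

def hamming_weight(values):
    """Number of set bits in each (8-bit) value, a common leakage model."""
    return np.unpackbits(values.astype(np.uint8)[:, None], axis=1).sum(axis=1)

def pearson_corr(prediction, traces):
    """Correlate one prediction vector with every sample point (column) of the traces."""
    p = prediction - prediction.mean()
    t = traces - traces.mean(axis=0)
    return (p @ t) / (np.linalg.norm(p) * np.linalg.norm(t, axis=0))

def recover_weight(traces, inputs, prev_sums, candidates):
    """
    traces:     (n_traces, n_samples) measured EM traces
    inputs:     (n_traces,) known inputs for this MACC step
    prev_sums:  (n_traces,) already-recovered previous partial sums
    candidates: iterable of guessed weight values (toy integer model)
    Returns the guess whose predicted leakage correlates best with the traces.
    """
    best_guess, best_corr = None, -1.0
    for w in candidates:
        predicted = prev_sums + w * inputs          # next partial sum under this guess
        leakage = hamming_weight(predicted)         # predicted leakage of that intermediate
        corr = np.abs(pearson_corr(leakage.astype(float), traces))
        if corr.max() > best_corr:
            best_guess, best_corr = w, corr.max()
    return best_guess, best_corr
```

In practice, something like recover_weight would run once per weight, feeding each recovered partial sum forward as the known prev_sums for the next step.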
If we examine Figure 4 from the paper, we can clearly see the t-value of the weight leakage increase early on during the computation of the convolution block. This is the step where the GPU is performing all of the partial sums inside of the MACC, from which the attackers extract the weights. The biases, on the other hand, leak near the end of the computation, because they are usually added towards the end of the partial sum process, after all the weights and inputs have been accumulated.
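The t-values in a figure like that come from standard leakage-assessment statistics. Here’s a hedged sketch of how such a curve is typically computed, using Welch’s t-test between two groups of traces (say, fixed versus random inputs); the array shapes and the threshold are conventions, not the paper’s exact pipeline.

```python
import numpy as np

def welch_t_values(group_a, group_b):
    """
    Welch's t-statistic at every sample point of the traces.
    group_a, group_b: (n_traces, n_samples) traces recorded under two
    conditions (e.g. fixed vs. random inputs). |t| above roughly 4.5 at
    any point is the usual rule of thumb for a leak worth investigating.
    """
    mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)
    var_a, var_b = group_a.var(axis=0, ddof=1), group_b.var(axis=0, ddof=1)
    n_a, n_b = len(group_a), len(group_b)
    return (mean_a - mean_b) / np.sqrt(var_a / n_a + var_b / n_b)
```

Plotting |t| across the trace is what produces the pattern described above: weight-dependent leakage peaking early in the convolution, and bias-dependent leakage only appearing near the end.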
These attacks are surprisingly efficient, too. BarraCUDA could extract the weights from an Orin Nano with only one day of trace collection, about 5 minutes per weight. At that rate, a large language model is out of reach, but for small edge models, BarraCUDA could be a real threat.
When the researchers told Nvidia about this attack, Nvidia’s response was simple: they recommended that users prevent physical access to Jetson and Orin devices. But not every attack requires physical access in the first place; some side channels can be exploited remotely.
Remote GPU Side-Channels
A group of researchers from UC Riverside found one such remote side-channel attack. A single GPU often runs multiple processes at once: on a personal computer this happens when rendering several windows, and in the cloud it happens when multiple users share virtualized resources.
For this attack, the researchers intentionally colocated a “spy” process on the same GPU as the “target” process. When this happens, both processes will share the same cache hierarchy. That means that, if the target process is causing a large number of cache misses, this can be observed by the spy process.
In the case of a neural network, the structure of the network can significantly impact its memory access patterns. Usually, a model will start to experience a higher number of cache misses when computation of one layer is complete. This enables the spy process to determine the size and structure of the different layers of the target network.
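As a rough illustration of the analysis side (not the UC Riverside measurement code), here’s a sketch that takes a spy process’s contention trace, probe latencies or cache-miss counts sampled over time, and flags the spikes that would mark layer boundaries. The trace itself and the z-score threshold are made-up assumptions.

```python
import numpy as np

def find_layer_boundaries(contention_trace, z_threshold=3.0):
    """
    contention_trace: 1-D array of the spy's probe latencies (or miss counts),
    sampled at a fixed rate while the victim runs inference on the same GPU.
    Returns sample indices where contention spikes well above its usual level,
    which is where we would expect transitions between layers.
    """
    trace = np.asarray(contention_trace, dtype=float)
    z_scores = (trace - trace.mean()) / trace.std()
    spikes = np.flatnonzero(z_scores > z_threshold)
    # Collapse runs of adjacent spike samples into a single boundary estimate each.
    groups = np.split(spikes, np.where(np.diff(spikes) > 1)[0] + 1)
    return [int(g[0]) for g in groups if g.size > 0]
```

The spacing between the flagged boundaries then hints at the relative size of each layer, which is the kind of structural information the attack recovers.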
In theory, this attack could be prevented by flushing the cache on every process switch. However, that would significantly hurt overall system performance, and without sacrificing that performance, it seems unlikely that these sorts of side-channel attacks can be prevented.
So what do we do about it?
GPUs and AI chips clearly have security flaws. Not only can attackers with physical access steal model weights off the chip, but they could potentially do so remotely. And this isn’t an easy problem to solve. Even if GPUs sacrifice cache performance to prevent remote attacks, physical attacks are still a major concern.
Even worse, physical side channels can be turned into remote side channels through an attack called Hertzbleed. High-performance chips, like AI chips and GPUs, use dynamic frequency scaling to maximize performance and efficiency: the chip speeds up or slows down its clock based on its current power consumption, so it’s always running as fast as its power envelope allows. But that means power-consumption side channels, which normally require physical access to measure, can now be observed simply by measuring program runtime.
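Here’s a toy experiment in the spirit of Hertzbleed, not a reproduction of it: time the same arithmetic on operands with very different switching activity. Whether any gap actually shows up depends entirely on the chip, its frequency-scaling policy, and the workload; every parameter below is made up for illustration.

```python
import time
import numpy as np

def timed_workload(data, iters=200):
    """Run a fixed multiply-accumulate workload and return wall-clock seconds."""
    start = time.perf_counter()
    acc = np.zeros_like(data)
    for _ in range(iters):
        acc = acc * data + data
    return time.perf_counter() - start

rng = np.random.default_rng(0)
low_activity = np.zeros(1_000_000)        # operands that flip very few bits
high_activity = rng.random(1_000_000)     # operands with dense, varied bit patterns

# Under aggressive frequency scaling, a data-dependent difference in power draw
# can surface as a data-dependent difference in runtime, measured with no probe at all.
low_times = [timed_workload(low_activity) for _ in range(20)]
high_times = [timed_workload(high_activity) for _ in range(20)]
print(f"median (low activity):  {np.median(low_times):.4f} s")
print(f"median (high activity): {np.median(high_times):.4f} s")
```

The point is only that the “probe” here is a timer, something any remote attacker already has.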
There are proposals for truly secure AI hardware using masked logic or approximate computation, but these architectures still have their limitations. Approximate computation methods are far more efficient, but introduce some error into the network output. On the other hand, exact methods require much more expensive masking techniques. More importantly, though, both methods are only being investigated by a handful of researchers worldwide.
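To give a feel for what masking costs, here’s a minimal sketch of additive masking applied to a single linear layer. This is my own illustration, not any published secure-accelerator design: because a dot product is linear in the weights, splitting each weight into two random shares works cleanly, and the expensive part that exact proposals wrestle with is pushing shares through nonlinear activations at hardware speed.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_weights(w):
    """Split the weights into two additive shares: w = share0 + share1."""
    share0 = rng.standard_normal(w.shape)
    return share0, w - share0

def masked_linear(x, share0, share1, bias):
    """
    Compute x @ w + bias without ever operating on the unmasked weights.
    Each partial result depends on only one randomized share, so no single
    intermediate value reveals w on its own.
    """
    y0 = x @ share0           # uses only the random share
    y1 = x @ share1           # uses only the complementary share
    return y0 + y1 + bias     # recombine at the end

# Hypothetical usage: an 8-to-4 linear layer.
w = rng.standard_normal((8, 4))
b = rng.standard_normal(4)
x = rng.standard_normal((1, 8))
s0, s1 = mask_weights(w)
assert np.allclose(masked_linear(x, s0, s1, b), x @ w + b)
```

Even this toy version doubles the multiply-accumulate work while only covering the easy, linear case, which is roughly the trade-off space those proposals are exploring.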
Given that AI chips are being deployed en masse, and have clear, practical security concerns, building truly secure AI chips shouldn’t just be the focus of a small number of researchers. I think existing AI chip and GPU companies need to take these challenges seriously. And as AI gets deployed more and more in sensitive organizations with data privacy concerns, I also think there’s a major opportunity for new startups to deliver end-to-end secure AI hardware and software products.