A couple weeks ago, I got a ChipWhisperer Husky. It’s a compact, rugged little hacking tool designed to excel at two specific things: side-channel power analysis, and fault injection. Over the next few months, I’m going to go step-by-step through the process of hacking embedded software and hardware. I’ll post updates every couple weeks, and hopefully end up building some open-source, side-channel secure systems to help get new engineers interested in hardware security as a field!
But before we start that process, we need to start even simpler: what is side-channel analysis? And why is it important?
What is side-channel security?
When people think about hacking into a computer system, they usually think of software security. By finding a bug in a program, or a backdoor into a system, or even by cracking a password, you can get access to data you otherwise shouldn’t be able to access.
But what if the software you’re trying to attack doesn’t have these vulnerabilities? Sure, all software has bugs, but in modern software development, authentication and authorization are usually handled by standardized, well-tested packages that are generally secure. In that case, attackers can turn to the hardware that the software runs on, which opens up a whole new class of vulnerabilities.
For example, let’s say you have an algorithm that performs a computation using a secret key. If your code branches on secret data, with if statements or loops whose conditions depend on the key, then different key values take different paths through the program. Those paths can execute different numbers of instructions, and they also interact differently with the CPU’s branch predictor. Often, this results in data-dependent execution time, where the runtime of your program depends on the value of the secret key. That means that a clever attacker can measure how long your program takes to run, and use that data to extract the secret key.
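To make this concrete, here’s a minimal sketch of the classic example: comparing a secret against a guess. The function names are my own, and I’m using the loop iteration count as a stand-in for execution time (Python itself makes no constant-time guarantees). The early-exit version leaks how many leading bytes of the guess are correct; the constant-time version always examines every byte.

```python
def leaky_equal(secret: bytes, guess: bytes) -> tuple[bool, int]:
    """Early-exit comparison. Returns (result, bytes_examined).

    bytes_examined is a proxy for runtime: it grows with the length of
    the correct prefix, which is exactly the information that leaks.
    """
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:  # data-dependent branch: bails out at the first mismatch
            return False, steps
    return True, steps


def constant_time_equal(secret: bytes, guess: bytes) -> tuple[bool, int]:
    """Examines every byte and accumulates differences without branching on them."""
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    diff = 0
    for s, g in zip(secret, guess):
        steps += 1
        diff |= s ^ g  # no early exit, so the iteration count never depends on the data
    return diff == 0, steps


secret = b"hunter2!"
print(leaky_equal(secret, b"xxxxxxxx"))          # (False, 1)
print(leaky_equal(secret, b"huntxxxx"))          # (False, 5)
print(constant_time_equal(secret, b"huntxxxx"))  # (False, 8)
```

An attacker who can time `leaky_equal` can recover the secret one byte at a time by finding the guess that takes longest. In real Python code, the standard library’s `hmac.compare_digest` is the usual way to compare secrets.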
This sort of attack is called a side-channel attack. Your program’s logic may be perfectly secure, but it’s leaking secret information through other means, called “side channels”, that attackers can pick up on. There are a number of ways for attackers to leverage side channels to compromise secure systems, but the most common, and one of the most powerful, is differential power analysis, or DPA.
What is differential power analysis?
Most “simple” side-channel analyses, like timing analysis, can be defeated by using algorithms that run in a constant amount of time, or in a randomized amount of time. But in 1999, a team at Cryptography Research, Inc. led by Paul Kocher introduced a new, more powerful kind of side-channel analysis: differential power analysis.
When data moves around inside a computer chip, the voltages on physical wires have to change. Every time a wire switches from a logic 0 to a logic 1, the chip has to charge that wire’s capacitance, which draws a small amount of current from the power supply. If you carefully measure how much power a chip consumes while it performs a computation, that measurement will depend on the data being processed.
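Attackers usually summarize this physics with a simple leakage model. Here’s a sketch of two common ones, assuming an 8-bit bus: counting the wires that rise from 0 to 1, or counting every wire that flips (the Hamming distance between the old and new values). These are modeling assumptions, not a description of any particular chip; the function names are my own.

```python
def hamming_weight(x: int) -> int:
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def rising_transitions(prev: int, new: int) -> int:
    """Wires that switch 0 -> 1, each of which charges a capacitance."""
    return hamming_weight(~prev & new)


def hamming_distance(prev: int, new: int) -> int:
    """Wires that switch in either direction; a widely used simplified power model."""
    return hamming_weight(prev ^ new)


print(rising_transitions(0x00, 0xFF))  # 8: every wire charges
print(rising_transitions(0x0F, 0xF0))  # 4: only the upper nibble charges
print(hamming_distance(0x0F, 0xF0))    # 8: every wire flips
print(hamming_distance(0x3C, 0x3C))    # 0: nothing switches
```

Either way, the modeled power draw is a function of the data, which is the whole foothold that power analysis needs.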
If you perform this measurement many times, for different pieces of data, you can reconstruct a lot of information about the underlying computation. For example, you can use differential power analysis to recover the secret key for a block cipher like AES.
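Here’s a self-contained sketch of how that works on a single AES key byte, using simulated measurements instead of real ChipWhisperer captures. I’m using correlation power analysis (CPA), a widely used variant of DPA, and assuming each “trace” is one number: the Hamming weight of the first-round S-box output plus Gaussian noise. The attacker tries all 256 key guesses, predicts the leakage for each, and keeps the guess whose predictions correlate best with the measurements. All function names and parameters here are hypothetical.

```python
import random


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def build_aes_sbox() -> list[int]:
    """AES S-box: multiplicative inverse (0 maps to 0), then the affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for i in range(1, 5):
            s ^= ((inv << i) | (inv >> (8 - i))) & 0xFF  # XOR in left-rotations by 1..4
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_aes_sbox()


def hw(x: int) -> int:
    return bin(x).count("1")


def simulate_traces(key_byte: int, n: int, noise: float, rng: random.Random):
    """Simulated one-point traces: HW(SBOX[p ^ k]) plus Gaussian noise."""
    plaintexts = [rng.randrange(256) for _ in range(n)]
    traces = [hw(SBOX[p ^ key_byte]) + rng.gauss(0.0, noise) for p in plaintexts]
    return plaintexts, traces


def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def recover_key_byte(plaintexts: list[int], traces: list[float]) -> int:
    """Score every key guess by |correlation| between predicted and measured leakage."""
    scores = []
    for guess in range(256):
        predicted = [hw(SBOX[p ^ guess]) for p in plaintexts]
        scores.append(abs(pearson(predicted, traces)))
    return max(range(256), key=scores.__getitem__)


rng = random.Random(1)
pts, trs = simulate_traces(0x2B, 500, 1.0, rng)
print(hex(recover_key_byte(pts, trs)))
```

Real traces contain thousands of samples per encryption, so in practice you compute this correlation at every sample point and look for the peak. Repeating the attack for each of the 16 key bytes recovers the full AES-128 key.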
These attacks are extremely hard to defend against. Luckily, since 1999, hardware security researchers have been hard at work developing techniques that can secure chips against these attacks.
How do you defend against side-channel analysis?
The most powerful technique for protecting hardware against differential power analysis is gate-level masking. The idea behind gate-level masking is to modify the physical circuit so that, no matter what data is being processed, the circuit’s power consumption is statistically independent of that data; to an attacker, it just looks random. The simplest masking scheme XORs each data bit with a random bit called a mask bit. Because the mask is uniformly random, the masked value on the wire is uniformly random too, which lets you send data along wires without side-channel leakage.
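Here’s a minimal sketch of that XOR masking on a byte, with hypothetical helper names. The key property to check is that, averaged over every possible mask, the masked wire value has the same Hamming weight (4) no matter what secret byte it carries.

```python
import secrets


def mask(x: int, bits: int = 8) -> tuple[int, int]:
    """Split x into (masked value, mask) using a fresh, uniformly random mask."""
    m = secrets.randbits(bits)
    return x ^ m, m


def unmask(masked: int, m: int) -> int:
    """XORing with the same mask again recovers the original value."""
    return masked ^ m


def hamming_weight(x: int) -> int:
    return bin(x).count("1")


def average_wire_hw(x: int, bits: int = 8) -> float:
    """Average Hamming weight of the masked wire value over every possible mask."""
    n = 1 << bits
    return sum(hamming_weight(x ^ m) for m in range(n)) / n


masked, m = mask(0xA7)
print(unmask(masked, m) == 0xA7)                        # True
print([average_wire_hw(x) for x in (0x00, 0x5A, 0xFF)])  # [4.0, 4.0, 4.0]
```

Since `x ^ m` takes every byte value exactly once as `m` ranges over all masks, a Hamming-weight power model sees the same distribution for every `x`.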
Masking at the gate level was proposed by Elena Trichina in 2003, who introduced an AND gate that could securely compute on masked values. That way, the AND gate’s power consumption would be independent of its unmasked inputs, and arbitrary logic functions could be computed securely.
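Here’s a single-bit sketch following the commonly described structure of Trichina’s masked AND gate. The inputs are the masked bits `a_m = a ^ m_a` and `b_m = b ^ m_b`, plus a fresh output mask `m_q`; the gate XORs the four partial products into `m_q` one at a time, starting with `m_q` itself, so every intermediate wire stays masked. This assumes an idealized circuit where all signals arrive simultaneously; the names are my own.

```python
from itertools import product


def trichina_and(a_m: int, b_m: int, m_a: int, m_b: int, m_q: int) -> tuple[int, list[int]]:
    """Returns q_m = (a & b) ^ m_q and the intermediate wire values, in XOR order."""
    t1 = m_q ^ (a_m & b_m)  # the fresh mask goes in first
    t2 = t1 ^ (a_m & m_b)
    t3 = t2 ^ (m_a & b_m)
    q_m = t3 ^ (m_a & m_b)
    return q_m, [t1, t2, t3, q_m]


def check_trichina_and() -> bool:
    """Exhaustively check correctness, and that each wire is balanced for every (a, b)."""
    for a, b in product((0, 1), repeat=2):
        ones = [0, 0, 0, 0]
        for m_a, m_b, m_q in product((0, 1), repeat=3):
            q_m, wires = trichina_and(a ^ m_a, b ^ m_b, m_a, m_b, m_q)
            if (q_m ^ m_q) != (a & b):
                return False
            for i, w in enumerate(wires):
                ones[i] += w
        # Each wire should read 1 for exactly half of the 8 mask choices.
        if ones != [4, 4, 4, 4]:
            return False
    return True


print(check_trichina_and())  # True
```

The balance check is what “secure” means here: for every possible pair of secret inputs, each wire is 1 exactly half the time, so its switching activity tells an attacker nothing on its own.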
However, Trichina’s scheme had some flaws. In 2005, researchers found an attack that leveraged glitches in CMOS logic to extract secret data from masked implementations. Essentially, if the inputs to a masked gate arrive at different times, the gate can briefly compute intermediate values that are no longer masked, so the circuit may not remain secure. Since then, researchers have proposed multiple techniques for mitigating this vulnerability. We’re not going to go through the entire evolution here; instead, we’ll jump right to modern gate-level masking techniques. These techniques roughly fall into three camps, each with its own advantages and disadvantages.
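To see why glitches break the masked AND gate, here’s a deliberately simplified model. Rather than simulating gate delays, I just assume the fresh mask `m_q` arrives late, so for a moment the XOR chain only sees its first two partial products. That transient value works out to `(a_m & b_m) ^ (a_m & m_b) == a_m & b`, which depends directly on the unmasked input `b`. The function names are hypothetical.

```python
from itertools import product


def early_partial_sum(a_m: int, b_m: int, m_b: int) -> int:
    """Hypothetical glitch: the first XOR stages settle before m_q arrives."""
    return (a_m & b_m) ^ (a_m & m_b)


def prob_glitch_wire_high(b: int) -> float:
    """Fraction of (a, m_a, m_b) choices for which the glitchy wire briefly reads 1."""
    outcomes = [
        early_partial_sum(a ^ m_a, b ^ m_b, m_b)
        for a, m_a, m_b in product((0, 1), repeat=3)
    ]
    return sum(outcomes) / len(outcomes)


print(prob_glitch_wire_high(0))  # 0.0: with b = 0 the transient wire never rises
print(prob_glitch_wire_high(1))  # 0.5: with b = 1 it rises half the time
```

Even though every wire settles to a properly masked value, the transient activity is biased by the secret bit `b`, and a power trace captures those transients just as well as the settled values.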
Threshold Implementation
Threshold implementations (TI) have been around since 2006. They’re theoretically well-understood and remain secure even in the presence of glitches. They’re also easy to implement, and they don’t require special circuit layout techniques to work. However, threshold implementations are usually much larger: a TI design might be 5-10x larger than one masked using a different scheme. As a result, they’re very common in academia, but less so in industry.
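Here’s a sketch of a commonly cited three-share threshold implementation of an AND gate. Each input is split into three shares that XOR to the real value, and each output share is computed from only two of the three input share indices (“non-completeness”), so even if glitches combine everything feeding one output share, that logic never sees a complete set of shares. Uniformity, the other TI requirement, is not covered by this sketch. Note that one AND gate has become nine, which hints at where the area overhead comes from.

```python
from itertools import product


def ti_and(x: tuple[int, int, int], y: tuple[int, int, int]) -> tuple[int, int, int]:
    """Three-share AND: (z1 ^ z2 ^ z3) == (x1 ^ x2 ^ x3) & (y1 ^ y2 ^ y3)."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    z1 = (x2 & y2) ^ (x2 & y3) ^ (x3 & y2)  # never touches share 1
    z2 = (x3 & y3) ^ (x1 & y3) ^ (x3 & y1)  # never touches share 2
    z3 = (x1 & y1) ^ (x1 & y2) ^ (x2 & y1)  # never touches share 3
    return z1, z2, z3


def verify_ti_and() -> bool:
    """Exhaustively check correctness and non-completeness over all 64 share inputs."""
    for xs in product((0, 1), repeat=3):
        for ys in product((0, 1), repeat=3):
            z = ti_and(xs, ys)
            if (z[0] ^ z[1] ^ z[2]) != ((xs[0] ^ xs[1] ^ xs[2]) & (ys[0] ^ ys[1] ^ ys[2])):
                return False
            # Flipping input share i (of x or of y) must not change output share i.
            for i in range(3):
                for flip_x, flip_y in ((1, 0), (0, 1)):
                    xf = list(xs)
                    yf = list(ys)
                    xf[i] ^= flip_x
                    yf[i] ^= flip_y
                    if ti_and(tuple(xf), tuple(yf))[i] != z[i]:
                        return False
    return True


print(verify_ti_and())  # True
```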
Dual-Rail Precharge Masked Logic
There are a number of dual-rail, precharged, masked logic schemes, all of which are much smaller than TI while offering similar levels of security. These schemes combine three elements to achieve high security at a low cost. First, dual-rail logic represents every logical value with two physical wires, exactly one of which is 1 at any given time during evaluation. This helps equalize the power consumption of the logic. For example, we can represent a 1 as 01 and a 0 as 10.
At the same time, precharging divides each logic operation into two phases: a precharge phase, where all wires are set to 1, and an evaluation phase, where the logic is actually computed. Because all of the logic gates are monotonic, each wire switches at most once per evaluation, which prevents glitches. Finally, a random mask is applied to all of the logic to randomize the computation further.
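Here’s an idealized sketch of the dual-rail and precharge parts (the masking layer is left out), using the conventions above: 1 is encoded as 01, 0 as 10, and both rails are precharged to 1. A dual-rail AND needs two monotonic gates: the true rail is the AND of the true rails, and the false rail is the OR of the false rails. The model ignores real-world effects like mismatched wire capacitance; the names are my own.

```python
PRECHARGE = (1, 1)  # (false rail, true rail): both high during the precharge phase


def encode(v: int) -> tuple[int, int]:
    """Dual-rail encoding from the text: 1 -> 01, 0 -> 10."""
    return (1 - v, v)


def decode(pair: tuple[int, int]) -> int:
    if pair not in ((0, 1), (1, 0)):
        raise ValueError(f"not a valid evaluated dual-rail value: {pair}")
    return pair[1]


def dr_and(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Dual-rail AND: false rail = a_f OR b_f, true rail = a_t AND b_t."""
    a_f, a_t = a
    b_f, b_t = b
    return (a_f | b_f, a_t & b_t)


def switching_events(before: tuple[int, ...], after: tuple[int, ...]) -> int:
    """Number of wires that change value between two states."""
    return sum(x != y for x, y in zip(before, after))


for a in (0, 1):
    for b in (0, 1):
        out = dr_and(encode(a), encode(b))
        print(a, b, decode(out), switching_events(PRECHARGE, out))  # last column is always 1
```

Precharge propagates through the gate (`dr_and(PRECHARGE, PRECHARGE)` is still `(1, 1)`), and during evaluation exactly one output rail falls, whatever the inputs. By contrast, a single-rail AND output that resets to 0 only switches when `a & b == 1`, so its transition count reveals the result.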
The current state-of-the-art dual-rail, precharged, masked logic scheme is LMDPL. To my knowledge, it’s used in most of Rambus’s side-channel secure products. Companies other than Rambus are limited in their ability to leverage this technique, though, as it’s patented by Rambus and they will absolutely sue you if you infringe on it.
Delay-Based Masking
Finally, there is a suite of techniques that introduce additional registers into the datapath to force signals to arrive in a certain order. This is probably the simplest solution to glitching in secure circuits: these techniques don’t require dual-rail operation or precharging. As a result, they are often the most area-efficient. However, for complex circuits, the extra registers needed to enforce signal arrival order can add many cycles of latency.
Domain-Oriented Masking (DOM) has been used to build extremely compact AES implementations, while SecAND2 masking has demonstrated very compact DES implementations. These techniques are also well suited to bit-sliced ALUs.
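Here’s a sketch of a first-order, two-share DOM AND gate as I understand its general structure. Each input is split into two shares (“domains”). Products within a domain are computed directly, while cross-domain products are blinded with a fresh random bit `r` and stored in a register before being combined. Since this is software, the register is just a comment marking the cycle boundary; the names are hypothetical.

```python
from itertools import product


def dom_and(a0: int, a1: int, b0: int, b1: int, r: int) -> tuple[int, int]:
    """Two-share DOM AND: returns (z0, z1) with z0 ^ z1 == (a0 ^ a1) & (b0 ^ b1)."""
    # Cycle 1: blind each cross-domain product with r and latch it in a register,
    # so glitches cannot carry the unblinded product into the next stage.
    reg0 = (a0 & b1) ^ r
    reg1 = (a1 & b0) ^ r
    # Cycle 2: each domain combines its inner-domain product with its registered term.
    z0 = (a0 & b0) ^ reg0
    z1 = (a1 & b1) ^ reg1
    return z0, z1


def verify_dom_and() -> bool:
    """Exhaustively check correctness over all share values and random bits."""
    for a0, a1, b0, b1, r in product((0, 1), repeat=5):
        z0, z1 = dom_and(a0, a1, b0, b1, r)
        if (z0 ^ z1) != ((a0 ^ a1) & (b0 ^ b1)):
            return False
    return True


print(verify_dom_and())  # True
```

The design choice here is exactly the trade-off described above: one fresh random bit and one register stage buy glitch resistance without dual-rail wiring, at the cost of an extra cycle of latency per AND.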
The next step: crack an AES core.
Now that we know what side-channel security is, I want to actually try to leverage the ChipWhisperer to hack a simple chip. We’re going to be starting with the AES block cipher, implemented on either a microcontroller or an FPGA. I’ll walk through the process of cracking the core on the blog, while also demonstrating the gate-level masking techniques above to build a more secure AES core. Hopefully, with the gate-level masking techniques, I’ll be able to build a core that I can’t crack!
Stay tuned!