On April 10th, a researcher by the name of Yilei Chen published a preprint called Quantum Algorithms for Lattice Problems, which presents a quantum algorithm for solving the Learning with Errors (LWE) problem efficiently. This worried a lot of people -- our current post-quantum cryptographic algorithms rely on the LWE problem, so if there’s a polynomial-time quantum algorithm to solve it, that would break all of the post-quantum algorithms we’ve spent years developing and standardizing, right? Thankfully, that’s not the case.

Firstly, this is a single-author, 64-page preprint full of dense math. There could very well be an error in one of the proofs. But even if Chen’s paper is correct, this algorithm only works for instances of the LWE problem with certain modulus-noise ratios. The paper itself mentions that this technique is unable to break CRYSTALS-Kyber, the post-quantum encryption algorithm being standardized by NIST.

But *why* doesn’t this break Kyber and our other post-quantum cryptographic algorithms? What is a “modulus-noise ratio”? To understand that, you need to understand the LWE problem itself.

## Learning with Errors

The LWE problem was introduced by Oded Regev in his seminal 2005 paper. This wasn’t the first time cryptographers built an algorithm based on lattices. An earlier lattice-based cryptographic scheme called NTRU was developed in 1998, but LWE has proved more popular over time for a few reasons. Firstly, there’s a proof that the security of LWE rests on well-understood lattice problems; such a proof doesn’t exist for NTRU. More importantly, LWE is really easy to build cryptosystems with. As a matter of fact, the majority of the lattice-based cryptosystems submitted to NIST’s post-quantum standardization competition were based on variants of LWE.

So, how does LWE work? We start by picking a prime modulus and a dimension for our problem, which we’ll call *q* and *n* respectively. We’ll pick an *n*-vector **s** with elements between 0 and *q*, and we’ll sample an *m*×*n* matrix **A** such that its elements are integers uniformly distributed between 0 and *q*.[^1] Finally, we also sample an “error” vector **e**. In the “classic” LWE formulation, we use a discrete Gaussian distribution with standard deviation *αq*, but other distributions with the same standard deviation also work.

Using these values, we can calculate a new vector **b** like so:

**b** = **As** + **e** (mod *q*)

Given **A**, **s**, and **e**, it’s really easy to calculate **b**. But it turns out that, given just **A** and **b**, it’s very hard to calculate the value of **s**. This “one-way” property of the LWE problem is what makes it really useful in cryptography. I’m going to skip over the specifics of how to turn this problem into a cryptosystem, but there’s a good explanation in Regev’s LWE review.
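
The construction above can be sketched in a few lines of Python. This is a toy instance with made-up, insecure parameters (real schemes use much larger *n* and *q*, and a proper error distribution), just to show how **b** is formed:

```python
# Toy LWE instance -- illustrative only, parameters far too small to be secure.
import random

q = 97   # modulus (a small prime; real schemes use much larger values)
n = 4    # secret dimension
m = 6    # number of samples (rows of A)

random.seed(0)

# Secret vector s, uniform mod q
s = [random.randrange(q) for _ in range(n)]

# Public matrix A, uniform mod q
A = [[random.randrange(q) for _ in range(n)] for _ in range(m)]

# Small error vector e (uniform in {-1, 0, 1} here, as a stand-in for a
# discrete Gaussian or centered binomial distribution)
e = [random.choice([-1, 0, 1]) for _ in range(m)]

# b = A*s + e (mod q)
b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q for i in range(m)]

print(b)
```

Given **A** and **b** alone, recovering **s** means untangling the small errors from the linear system, which is exactly what makes the problem hard.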

## Why doesn’t this attack break Kyber?

Kyber specifically uses a structured variant of LWE called “Module-LWE” (a generalization of Ring-LWE), where we work with polynomials rather than plain vectors. At a high level, though, the parameters are the same: we have a prime *q*, a dimension *n*, and an error distribution with standard deviation *αq*. For Kyber, our prime is 3329, and our dimensions are either 512, 768, or 1024.[^2] Kyber generates its errors using a centered binomial distribution, defined by the parameter η₂ = 2. This is the same as a (shifted) binomial distribution with n = 4, so our standard deviation *αq* turns out to just be 1.
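
A quick way to see why the standard deviation comes out to 1 is to sample the centered binomial distribution directly. This is my own sketch (the sample count is arbitrary): draw 2η fair bits and subtract one half-sum from the other, then estimate the variance empirically.

```python
# Sampling a centered binomial distribution with eta = 2, as Kyber does:
# draw eta fair bits, draw eta more, and take the difference of the sums.
import random

random.seed(1)

def cbd(eta=2):
    a = sum(random.getrandbits(1) for _ in range(eta))
    b = sum(random.getrandbits(1) for _ in range(eta))
    return a - b  # value in {-eta, ..., eta}

samples = [cbd() for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(var, 2))  # variance is eta/2 = 1, so the standard deviation is 1
```

The variance of this distribution is η/2, which for η = 2 gives a standard deviation of exactly 1.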

In Chen’s paper, the polynomial-time algorithm only works if the value of *q* is larger than approximately (*αq*)^4 *n*^2. In formal notation:

*q* > Õ((*αq*)^4 · *n*^2)

In Kyber, *αq* is 1. So for this algorithm to solve Kyber’s underlying lattice problem, *q* would need to be larger than ~260,000.[^3] Because the actual *q* used in Kyber is 3329, Chen’s algorithm wouldn’t be able to break Kyber.
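
We can sanity-check that arithmetic by plugging Kyber’s parameters into the bound (ignoring log factors, as above):

```python
# Checking Chen's (log-factor-free) condition q > (alpha*q)^4 * n^2
# against Kyber's parameters.
alpha_q = 1  # standard deviation of Kyber's error distribution
q = 3329     # Kyber's modulus

for n in (512, 768, 1024):
    bound = alpha_q ** 4 * n ** 2
    print(n, bound, q > bound)  # q is far below the bound in every case
```

Even at the smallest dimension, the bound is 512² = 262,144, nearly two orders of magnitude above Kyber’s actual modulus.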

If Kyber had significantly smaller errors, or a much larger prime relative to those errors, it would be vulnerable. This makes sense intuitively: the larger the **e** vector used when constructing an LWE problem, the harder that problem is to solve. As a matter of fact, if **e** were reduced to zero, the problem would be easy to solve even for a classical, non-quantum computer.
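
To make that concrete, here’s a minimal sketch (with made-up toy numbers) of the zero-error case. With **e** = 0, **b** = **As** (mod *q*) is just a linear system, and ordinary Gauss-Jordan elimination over the integers mod *q* recovers the secret; no quantum computer required.

```python
# With e = 0, recovering s from (A, b) is plain linear algebra mod q.
q = 97
A = [[3, 1],
     [5, 2]]    # invertible mod q (det = 1)
s = [10, 20]    # the "secret"
b = [sum(A[i][j] * s[j] for j in range(2)) % q for i in range(2)]

# Gauss-Jordan elimination on the augmented matrix [A | b], mod q
n = 2
M = [A[i][:] + [b[i]] for i in range(n)]
for col in range(n):
    # find a row with a nonzero pivot in this column and swap it up
    piv = next(r for r in range(col, n) if M[r][col] % q)
    M[col], M[piv] = M[piv], M[col]
    inv = pow(M[col][col], -1, q)  # modular inverse (Python 3.8+)
    M[col] = [x * inv % q for x in M[col]]
    # eliminate this column from every other row
    for r in range(n):
        if r != col and M[r][col]:
            f = M[r][col]
            M[r] = [(x - f * y) % q for x, y in zip(M[r], M[col])]

recovered = [M[i][n] for i in range(n)]
print(recovered)  # [10, 20] -- the secret falls right out
```

The entire hardness of LWE lives in the error term; strip it away and the “one-way” property evaporates.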

Because this was a short post, I’ve decided to cover one more piece of news, and turn this post into the first *zach’s tech blog double feature!*

## Groq is No Longer Selling Hardware

Groq has announced that they’re no longer selling their hardware to third parties. Instead, they’re acting as an AI cloud services and API provider. First off, I just wanna say that I called it!

> [Groq has] the best hardware infrastructure for serving APIs, and the huge API players might try to step onto Groq’s turf and make hardware, so Groq is vertically integrating and providing an API themselves.

Like I’ve discussed before, Groq’s value proposition is pretty clear: they offer the lowest latency on LLMs, if you’re willing to install multiple racks of Groq systems to run those LLMs. Because of that, the upfront cost of their hardware is incredibly high, and you need to be serving a lot of volume to actually make the economics work out. Those economics make the most sense for API providers and their cloud partners. But those companies, including Amazon, Google, and Microsoft, are trying to build chips of their own, and probably won’t want Groq’s. So instead, Groq is going head to head with those big players. It’s a bold bet, but it might pay off.

What are the potential risks with this strategy? First off, there’s the risk that serving a generic model as fast as possible isn’t actually the best way to deliver AI features to users. If different organizations use different models for different tasks, it wouldn’t necessarily make sense to devote hundreds of chips and tens of racks to each model.

But now that Groq is squarely competing with OpenAI, Anthropic, Microsoft, and Google, I can see another weakness a bit more clearly. So far, Groq has only demonstrated open-source models on their chips. They may have scaling issues getting good performance on larger models, as their optical interconnect only scales up to 264 chips, while running GPT-4 on their hardware would require over 10,000 chips. But even if they solve the interconnect bottleneck, there’s a bigger problem: they don’t have a GPT-4-class model to run.

If Groq isn’t selling hardware to organizations that have GPT-4 class models, the only way for Groq to run a GPT-4-class model is to make one themselves. And that’s very, very difficult and expensive. Not only do you need significant capital investment in model training, you also need a group of amazing AI researchers, who are both expensive and hard to hire. Also, Groq’s own hardware is optimized for inference, not training, so they may need to rely on expensive Nvidia chips to train such a model. And if Groq can’t execute on their own GPT-4 class model, they’re going to be stuck in a different product class: serving mediocre models, just really fast.

[^1]: It turns out that *m*, the number of rows in the **A** matrix, doesn’t affect security significantly, so we’re not going to discuss it.

[^2]: Chen’s paper indicates that the Kyber module sizes are {3,4,5}. I’m guessing this is a typo; Kyber’s standard module sizes are {2,3,4}.

[^3]: I’m leaving out logarithmic factors for simplicity. In practice, *q* would need to be even bigger.