# Solving problems with lattice reduction

Suppose that we are given a system of modular equations

$\begin{array}{rcl} a_0 \cdot x + b_0\cdot y & = & c_0 \pmod q \\ a_1 \cdot x + b_1\cdot y & = & c_1 \pmod q \\ & \ldots & \\ a_n\cdot x + b_n\cdot y & = & c_n \pmod q \\ \end{array}$

Trivially, this can be solved for unknown $x$ and $y$ using a simple Gaussian elimination step, i.e., writing the equations as

$\mathbf{A} \begin{pmatrix}x & y\end{pmatrix}^T = \mathbf{c} \iff \begin{pmatrix}x & y\end{pmatrix}^T = \mathbf{A}^{-1} \mathbf{c}$.
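As a minimal sketch of the single-modulus case (with assumed toy values for $q$ and the coefficients), the two-unknown system can be solved with Cramer's rule and one modular inversion:

```python
# Solve the system  a0*x + b0*y = c0 (mod q),  a1*x + b1*y = c1 (mod q)
# by inverting the 2x2 coefficient matrix modulo q. Toy parameters assumed.
def solve_2x2_mod(a0, b0, c0, a1, b1, c1, q):
    det = (a0 * b1 - a1 * b0) % q
    det_inv = pow(det, -1, q)  # requires gcd(det, q) == 1 (Python 3.8+)
    # Cramer's rule, reduced modulo q
    x = (c0 * b1 - c1 * b0) * det_inv % q
    y = (a0 * c1 - a1 * c0) * det_inv % q
    return x, y

print(solve_2x2_mod(3, 5, 59, 7, 11, 76, 101))  # → (17, 42)
```

Note that this only works when the determinant is invertible modulo $q$, which is exactly the kind of operation that breaks down once the equations live under different moduli.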

This is perfectly fine as long as the equations share a common modulus $q$, but what about when the equations share unknowns but are defined under different moduli? Let us take a real-world scenario from the realm of cryptography.

Example: DSA using linear congruential generator (LCG)

The DSA (Digital Signature Algorithm) has two functions $\textsf{Sign}$ and $\textsf{Verify}$. To sign a message (invoking $\textsf{Sign}$), the following steps are performed:

1. Let $H$ be a hash function and $m$ the message to be signed.
2. Generate a (random) ephemeral value $k$ where $0 < k < q$.
3. Compute $r=\left(g^{k}\bmod\,p\right)\bmod\,q$. If $r=0$, go to step 2.
4. Compute $s=k^{-1}\left(H\left(m\right)+r \cdot x\right)\bmod\,q$. If $s=0$, go to step 2.
5. The signature is $\left(r,s\right)$.

To verify a signature (invoking $\textsf{Verify}$), the following steps are performed:

1. If $0 < r < q$ or $0 < s < q$ is not satisfied, reject the signature.
2. Compute $w = s^{-1} \bmod\,q$.
3. Compute $u_1 = H\left(m\right) \cdot w\, \bmod\,q$.
4. Compute $u_2 = r \cdot w\, \bmod\,q$.
5. Compute $v = \left(g^{u_1}y^{u_2} \bmod\,p\right) \bmod\,q$.
6. If $v = r$, accept. Otherwise, reject.
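The two procedures can be sketched directly from the steps above. The parameters below are assumed toy values (far too small to be secure), and the "hash" is just a small integer standing in for $H(m)$:

```python
# Toy DSA with assumed parameters p = 23, q = 11, g = 2 (g has order q mod p).
p, q, g = 23, 11, 2
x = 7                 # private key
y = pow(g, x, p)      # public key

def sign(h, k):
    # steps 2-5 of Sign; h = H(m), k is the ephemeral value
    r = pow(g, k, p) % q
    s = pow(k, -1, q) * (h + r * x) % q
    assert r != 0 and s != 0
    return r, s

def verify(h, r, s):
    # steps 1-6 of Verify
    if not (0 < r < q and 0 < s < q):
        return False
    w = pow(s, -1, q)
    u1 = h * w % q
    u2 = r * w % q
    v = (pow(g, u1, p) * pow(y, u2, p) % p) % q
    return v == r

r, s = sign(5, 3)     # fixed toy nonce k = 3; real k must be random
```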

In the case of using an LCG as pseudorandom-number generator for the values $k$, two consecutively generated values (we assume one signature was generated right after the other) will be correlated as $a \cdot k_1 + b = k_2 \pmod M$ for some public parameters $a,b,M$. Assuming that $M \neq q$ and writing $c = b$, we obtain the equations

$\begin{array}{rclc} s_1 \cdot k_1 - r_1\cdot x & = & H(m_1) &\pmod q \\ s_2 \cdot k_2 - r_2\cdot x & = & H(m_2) &\pmod q \\ -a\cdot k_1 + 1\cdot k_2 & = & c & \pmod M \\ \end{array}$

where the first two equations come from the fourth step of $\textsf{Sign}$ and the third from the LCG relation.
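As a quick sanity check on the LCG relation (with assumed toy parameters $a$, $b$, $M$), two consecutive outputs indeed satisfy $-a \cdot k_1 + k_2 \equiv c \pmod M$:

```python
# Assumed toy LCG parameters; any values illustrate the relation equally well.
M, a, b = 2**16, 4145, 12345

def lcg_next(k):
    return (a * k + b) % M

k1 = 31337 % M        # first nonce
k2 = lcg_next(k1)     # second nonce
c = b % M             # the constant named c in the equations above
assert (-a * k1 + k2) % M == c
```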

Chinese Remainder Theorem (CRT)

A well-known theorem in number theory is the Chinese Remainder Theorem (commonly referred to with the acronym CRT), which deals with simple equations over different and pairwise-prime moduli. For instance,

$\begin{array}{rcl} x & = & 2 \pmod 3 \\x & = & 3 \pmod 5 \\ x & = & 2 \pmod 7 \end{array}$

which has the solution $x = 23$. For actual multivariate equations over different moduli, however, we hit a brick wall: the operations we need, such as row reduction and modular inversion, no longer work.
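For the single-unknown case, CRT itself is easy to implement. A minimal sketch (assuming pairwise-coprime moduli), applied to the system above:

```python
# Classic CRT: combine residues under pairwise-coprime moduli into a single
# residue modulo the product of the moduli.
from math import prod

def crt(residues, moduli):
    N = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Ni = N // m
        # Ni * (Ni^-1 mod m) is 1 mod m and 0 mod every other modulus
        total += r * Ni * pow(Ni, -1, m)
    return total % N

print(crt([2, 3, 2], [3, 5, 7]))  # → 23
```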

Lattice reduction

A lattice is a discrete subspace formed by all integer combinations of a set of basis vectors. For instance, two vectors $\mathbf b_1$ and $\mathbf b_2$ span a two-dimensional lattice. The basis is not unique: many pairs of vectors generate the same lattice, but a shortest possible basis can be found using the LLL algorithm. If the two column vectors $\mathbf b_1$ and $\mathbf b_2$ are basis vectors, then the corresponding lattice is denoted $\mathcal{L}(\mathbf B)$. Here, $\mathbf B = (\mathbf b_1 ~ \mathbf b_2)$.

The problem of finding the shortest vector is called $\textsf{SVP}$. Sloppily formulated: for a given lattice $\mathcal{L}(\mathbf B)$, find the shortest (in terms of Euclidean norm) non-zero vector. For a reduced basis, the answer is simply the shortest basis vector; starting from a different, less orthogonal basis, the problem turns out to be much trickier.

A related problem is to find the closest vector, which is commonly called $\textsf{CVP}$. In this scenario, we are given a vector $\mathbf t$ in the vector space (it need not lie in $\mathcal{L}(\mathbf B)$) and we want to find the vector in $\mathcal{L}(\mathbf B)$ closest to it. There is a simple reduction from $\textsf{SVP}$ to $\textsf{CVP}$ by setting $\mathbf t = \mathbf 0$ (and excluding the trivial answer $\mathbf 0$). There is also a reduction in the other direction, which involves extending the basis of $\mathcal{L}(\mathbf B)$ to also include $\mathbf t$. This is called embedding.
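In dimension two, $\textsf{SVP}$ can be solved exactly with Lagrange (Gauss) reduction, the two-dimensional ancestor of LLL. A small sketch, with an assumed toy basis:

```python
# Lagrange (Gauss) reduction: repeatedly shorten the longer basis vector
# against the shorter one until no further improvement is possible.
def lagrange_reduce(b1, b2):
    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1]
    if dot(b1, b1) > dot(b2, b2):
        b1, b2 = b2, b1
    while True:
        # subtract the best integer multiple of b1 from b2
        mu = round(dot(b1, b2) / dot(b1, b1))
        b2 = (b2[0] - mu * b1[0], b2[1] - mu * b1[1])
        if dot(b2, b2) >= dot(b1, b1):
            return b1, b2
        b1, b2 = b2, b1

# Toy basis assumed: it generates all of Z^2 (determinant 1), so the reduced
# basis vectors both have length 1.
b1, b2 = lagrange_reduce((5, 8), (8, 13))
```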

Let us return to the example of DSA and different-moduli equations. The equations we got from before

$\begin{array}{rclc} s_1 \cdot k_1 - r_1\cdot x & = & H(m_1) &\pmod q \\ s_2 \cdot k_2 - r_2\cdot x & = & H(m_1) &\pmod q \\ -a\cdot k_1 + 1\cdot k_2 & = & c & \pmod M \\ \end{array}$

can be formulated as basis vectors. In matrix form, we have

$\begin{pmatrix} -r_1 & s_1 & 0 & q & 0 & 0 \\ -r_2 & 0 & s_2 & 0 & q & 0 \\ 0 & -a & 1 & 0 & 0 & M\end{pmatrix} \cdot \mathbf y = \begin{pmatrix}H(m_1) \\ H(m_2) \\ c\end{pmatrix}$

or (by embedding technique):

$\begin{pmatrix} -r_1 & s_1 & 0 & q & 0 & 0 & H(m_1) \\ -r_2 & 0 & s_2 & 0 & q & 0 & H(m_2)\\ 0 & -a & 1 & 0 & 0 & M & c \end{pmatrix} \cdot \begin{pmatrix}\mathbf y \\ -1 \end{pmatrix} = \mathbf 0$
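To see that the embedded system really does vanish over the integers, here is a numerical sanity check with an assumed toy instance ($p = 23$, $q = 11$, $g = 2$, private key $x = 7$, LCG parameters $M = 16$, $a = 5$, $b = c = 3$, nonces $k_1 = 3$, $k_2 = 2$), for which the signatures work out to $(r_1, s_1) = (8, 2)$ and $(r_2, s_2) = (4, 5)$ with $H(m_1) = 5$ and $H(m_2) = 4$:

```python
# Assumed toy instance consistent with the DSA + LCG equations above.
q, M = 11, 16
a, c = 5, 3                  # LCG relation: -a*k1 + k2 = c (mod M)
r1, s1, h1 = 8, 2, 5
r2, s2, h2 = 4, 5, 4
x, k1, k2 = 7, 3, 2          # the secrets we would like to recover

B = [
    [-r1, s1, 0, q, 0, 0, h1],
    [-r2, 0, s2, 0, q, 0, h2],
    [0,  -a, 1, 0, 0, M, c],
]
# t1, t2, t3 absorb the modular reductions so equality holds over Z
t1 = (h1 - (-r1 * x + s1 * k1)) // q
t2 = (h2 - (-r2 * x + s2 * k2)) // q
t3 = (c - (-a * k1 + k2)) // M
yvec = [x, k1, k2, t1, t2, t3, -1]
assert all(sum(b * v for b, v in zip(row, yvec)) == 0 for row in B)
```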

The idea is now to force the values corresponding to the unknowns $x, k_1, k_2$ to be close to guesses $x', k_1', k_2'$ in the normed space. By adding additional constraint rows we may perform the following minimization:

$\min_{x,k_1,k_2}\left\|\begin{pmatrix} -r_1 & s_1 & 0 & q & 0 & 0 & H(m_1) \\ -r_2 & 0 & s_2 & 0 & q & 0 & H(m_2)\\ 0 & -a & 1 & 0 & 0 & M & c \\ 1/\gamma_x & 0 & 0 & 0 & 0 & 0 & x'/\gamma_x \\ 0 & 1/\gamma_{k_1} & 0 & 0 & 0 & 0 & k_1'/\gamma_{k_1} \\ 0 & 0 & 1/\gamma_{k_2} & 0 & 0 & 0 & k_2'/\gamma_{k_2}\end{pmatrix} \cdot \begin{pmatrix}\mathbf y \\ -1 \end{pmatrix} \right\|$

where $\gamma_x = \min(x',q-x')$, $\gamma_{k_1} = \min(k_1',M-k_1')$ and $\gamma_{k_2} = \min(k_2',M-k_2')$. Finding a closest approximation (preferably using LLL together with Babai's nearest-plane algorithm) yields a solution to the DSA equations using LCG.

Example: Knapsack

The knapsack problem (sometimes subset-sum problem) is stated as follows. Given a weight $t$ and $n$ items/weights $\{w_1,w_2,\dots,w_n\}$, find the sequence $x_1, x_2, \dots, x_n \in \{0,1\}^n$ such that

$\sum_{i=1}^n w_i \cdot x_i = t.$

It is not too hard to prove that this is NP-complete, but we omit the reduction here.

In a cryptographic setting, this can be used to encode data in the sequence $x_1, x_2, \dots, x_n$. This is called the Merkle-Hellman public-key cryptosystem. It is easy to see that encryption actually is the encoding procedure mentioned above. The decryption procedure is a bit more involved; in fact, it requires a special class of instances: if the weights can be transformed into a super-increasing sequence, retrieving the sequence $x_1, x_2, \dots, x_n$ becomes trivial.

Think of it this way. Assume that the weights are super-increasing, i.e., each weight is larger than the sum of all weights before it; in particular $w_1 + w_2 + \cdots + w_{n-1} < w_n$. Then, if the target sum $t$ (the ciphertext) is larger than $w_1 + w_2 + \cdots + w_{n-1}$, we know that $x_n = 1$, since the smaller weights cannot reach $t$ on their own; otherwise (assuming a unique solution) $x_n = 0$. We can now remove $x_n$ and $w_n$ from the equation and solve for the remaining weights with a recalculated $t' = t - w_n \cdot x_n$. This procedure is repeated until all weights have been handled (and $t = 0$).
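The peeling procedure above is just a greedy scan from the largest weight down. A minimal sketch, with assumed toy weights:

```python
# Greedy decoding for a super-increasing weight sequence: take each weight
# from largest to smallest whenever it still fits under the target t.
def decode_superincreasing(weights, t):
    x = [0] * len(weights)
    for i in reversed(range(len(weights))):
        if t >= weights[i]:
            x[i] = 1
            t -= weights[i]
    assert t == 0, "no solution"
    return x

print(decode_superincreasing([2, 3, 7, 14, 30], 39))  # → [1, 0, 1, 0, 1]
```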

Merkle-Hellman provided a way to transform the super-increasing sequence into a seemingly hard one. We omit the details here, but the procedure can be found all over the internet.

It turns out we can use lattice reduction for this problem. We create a basis matrix $\mathbf A$ in the following way

$\mathbf A = \begin{pmatrix} 1 & 0 & \cdots & 0 & w_1 \\ 0 & 1 & \cdots & 0 & w_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & w_n \\ 0 & 0 & \cdots & 0 & -t\end{pmatrix}$

and perform lattice reduction on it. Since the LLL/BKZ algorithm finds a basis of short vectors, it will look for integer combinations of the rows in which every entry is small; in particular, the large right-most entry $\sum_i x_i w_i - t$ of such a combination must vanish. A short vector of the form $(x_1, \dots, x_n, 0)$ is thus precisely a solution to the knapsack problem.
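To see why a solution shows up as a short vector, here is a sanity check (toy instance assumed) that the integer combination $(x_1, \dots, x_n, 1)$ of the rows of $\mathbf A$ produces $(x_1, \dots, x_n, 0)$ exactly when $\sum_i x_i w_i = t$:

```python
# Toy knapsack instance (assumed) and its known solution.
w = [2, 3, 7, 14, 30]
t = 39
x = [1, 0, 1, 0, 1]          # satisfies 2 + 7 + 30 = 39
n = len(w)

# rows of A: identity block with the weights appended, plus the row (0,...,0,-t)
rows = [[int(i == j) for j in range(n)] + [w[i]] for i in range(n)]
rows.append([0] * n + [-t])

# combine the rows with coefficients (x_1, ..., x_n, 1)
coeffs = x + [1]
combo = [sum(c * row[j] for c, row in zip(coeffs, rows)) for j in range(n + 1)]
print(combo)  # → [1, 0, 1, 0, 1, 0]
```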

Of course, the solution must only contain values in $\{0,1\}$. Depending on the instance, this may or may not be the case. So, how do we penalize the algorithm for choosing values outside the allowed set?

A new approach*

Let us create a new basis matrix $\mathbf A'$ in the following manner:

$\mathbf A'_i = \begin{pmatrix} 1 & 0 & \cdots & 0 & w_1 & \alpha\\ 0 & 1 & \cdots & 0 & w_2 & \alpha\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & w_n & \alpha\\ 0 & 0 & \cdots & 0 & -t & -\alpha \cdot i\end{pmatrix}$

The algorithm performs the following steps:

1. Randomly (or deterministically) pick a guess $i$ on the number of non-zero values in the solution.
2. Update the matrix and run LLL/BKZ on $\mathbf A'_i$.
3. If a satisfiable reduced-basis vector is found, return it. Otherwise, goto 1.

It does not guarantee that an incorrect solution is penalized, but it increases the probability of it (it reduces the set of ‘false’ basis vectors). We omit a formal proof, but think of it this way: assume that $\mathbf v$ is a false solution and a reduced-basis vector of $\mathbf A$. In $\mathbf A'_i$, its coefficients also have to sum to $i$, the number of non-zero values in the correct solution. Assume all false vectors appear randomly (they do not, but let us assume it!). Then, for a correct guess of $i$, the probability of the vector $\mathbf v$ surviving is ${n \choose i} / 2^n$. If $i = \epsilon \cdot n$, this is approximately $2^{n(H(\epsilon)-1)}$, where $H$ here denotes the binary entropy function; a noticeable reduction for most values of $\epsilon$.

Here is an example of a CTF challenge which could be solved with the above technique.

* I do not know if this should be attributed to someone.

Example: Ring-LWE

TBA