# Breaking affine ciphers – a matrix approach

An affine cipher is one of the remnants of classical cryptography, easily broken with today's computational power. The cipher defines a symbol mapping $f :\{A,B,\ldots\} \mapsto \mathbb{Z}_n$. Each ciphertext symbol is then computed as $y = a \cdot x + b \bmod n$, where $a \in \mathbb{Z}^*_n$ and $b \in \mathbb{Z}_n$. Decryption is done by computing $x = (y - b) \cdot a^{-1} \bmod n$.
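To make the definitions concrete, here is a minimal sketch of encryption and decryption over the 29-symbol alphabet used later in this post (the names `encrypt` and `decrypt` are my own):

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ,."
N = len(ALPHABET)  # 29 is prime, so every a != 0 has an inverse mod N

def encrypt(plaintext, a, b):
    # y = a*x + b (mod N), applied symbol by symbol
    return "".join(ALPHABET[(a * ALPHABET.index(c) + b) % N] for c in plaintext)

def decrypt(ciphertext, a, b):
    # x = (y - b) * a^{-1} (mod N); pow(a, -1, N) is the modular inverse (Python 3.8+)
    a_inv = pow(a, -1, N)
    return "".join(ALPHABET[((ALPHABET.index(c) - b) * a_inv) % N] for c in ciphertext)
```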

In this blog post, I will show how to break this cipher faster than by trying all keys. Let us first sketch the general idea. Given an expected distribution $\hat{P}$ of the symbols and an observed distribution $P$, the integral $\int (\hat{P}(x) - P(x))^2\, dx$ defines a statistical distance between the distributions (this corresponds to the squared Euclidean distance), which we would like to minimize. Now, clearly

$(\hat{P}(x) - P(x))^2 = \hat{P}(x)^2 - 2\hat{P}(x)P(x) + P(x)^2$.

Trivially, $\hat{P}(x)^2$ and $P(x)^2$ remain constant over any key pair $(a,b)$ (the key merely permutes the symbols), so instead of minimizing the above, we can maximize $\hat{P}(x)P(x)$. Therefore, the minimization problem can be turned into a maximization problem $\max_{a,b} \int \hat{P}(x)P_{a,b}(x)\, dx$. Cool.
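As a quick numerical sanity check (my own illustration, not from the post): among candidate vectors that are permutations of one another, the one minimizing the squared distance to $\hat{P}$ is exactly the one maximizing the dot product, since the squared terms are the same for every candidate.

```python
import random

random.seed(0)
p_hat = [random.random() for _ in range(29)]
p = p_hat[:]
random.shuffle(p)  # the "observed" vector is a permutation of p_hat

def sq_dist(q):
    # squared Euclidean distance to the expected vector
    return sum((ph - qi) ** 2 for ph, qi in zip(p_hat, q))

def dot(q):
    # the cross term we maximize instead
    return sum(ph * qi for ph, qi in zip(p_hat, q))

# candidate "keys": a few re-orderings of the observed vector
candidates = []
for _ in range(5):
    c = p[:]
    random.shuffle(c)
    candidates.append(c)

best_by_dist = min(candidates, key=sq_dist)
best_by_dot = max(candidates, key=dot)
```

Both selections pick the same candidate, as the argument above predicts.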

In terms of our cipher, which is discrete, the maximization problem becomes a sum $\max_{a,b} \sum \hat{P}(x)P_{a,b}(x)$. The observant reader may notice that this looks like a term in a matrix multiplication. There is just one caveat: in a matrix product each index appears in only one factor, while here both key parameters act on $P$. There is an easy way to get around this. Instead of applying both transformations to $P$, we may split them between the two distributions. So by instead computing

$\max_{a,b} \sum \hat{P}_a(x) P_{b}(x)$,

we have achieved what we desired. This means that we shuffle $\hat{P}$ with $a$ and ${P}$ with $b$. Let us express this in Python. The expected distribution of the alphabet ABCDEFGHIJKLMNOPQRSTUVWXYZ ,. may be as follows (depending on the observation):

```python
P_hat = {' ': 0.05985783763561542, ',': 0.0037411148522259637, '.': 0.0028058361391694723,
         'A': 0.0764122708567153, 'C': 0.02600074822297044, 'B': 0.012065095398428732,
         'E': 0.11878039655817432, 'D': 0.03974934530490086, 'G': 0.018892630003741116,
         'F': 0.020856715301159744, 'I': 0.0651889263000374, 'H': 0.05695847362514029,
         'K': 0.00720164609053498, 'J': 0.0014029180695847362, 'M': 0.02254021698466143,
         'L': 0.03769173213617658, 'O': 0.07023943135054246, 'N': 0.06313131313131314,
         'Q': 0.0009352787130564909, 'P': 0.01805087916199027, 'S': 0.05920314253647587,
         'R': 0.0560231949120838, 'U': 0.025813692480359144, 'T': 0.08473625140291807,
         'W': 0.022072577628133184, 'V': 0.00916573138795361, 'Y': 0.01842499064721287,
         'X': 0.0014029180695847362, 'Z': 0.0006546950991395436}
```
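Such a table can be estimated from any sample corpus. A minimal sketch (the helper name `distribution` is my own):

```python
from collections import Counter

def distribution(text, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ ,."):
    # relative frequency of each alphabet symbol occurring in the text
    filtered = [c for c in text.upper() if c in alphabet]
    counts = Counter(filtered)
    return {symbol: count / len(filtered) for symbol, count in counts.items()}

P_sample = distribution("THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.")
```

A realistic $\hat{P}$ should of course come from a much larger corpus in the expected plaintext language.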


The transformations are done by computing the matrices:

```python
import numpy

look_up = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ,."  # the symbol alphabet
N = len(look_up)
X = numpy.zeros((N, N))  # priori_dist is the expected distribution P_hat above
Y = numpy.zeros((N, N))  # dist is the observed ciphertext distribution P

# compute first matrix for transformed P_hat
for i in range(1, N):  # i = 0 is excluded since a must be invertible
    for element in priori_dist:
        X[i, (look_up.index(element) * i) % N] = priori_dist[element]

# compute second matrix for transformed P
for j in range(N):
    for element in dist:
        Y[(look_up.index(element) - j) % N, j] = dist[element]
```


Here, the $i$th row in $X$ corresponds to $\hat{P}$ transformed by $a = i$. Moreover, the $j$th column in $Y$ corresponds to ${P}$ transformed by $b = j$.

As we can see from the construction, $Y$ is only shifted (by the subtraction of $b$), while in $X$ the indices are reordered by multiplication with the row index $i$. Taking advantage of the matrix multiplication property, we may now compute $Z=XY$.

Any entry in $Z$ is $Z_{a,b} = \sum_x X_{a,x} Y_{x,b}$, so finding a maximum element in $Z$ is equivalent to solving

$\max_{a,b} \sum_x X_{a,x} Y_{x,b}$.

Looks familiar? It should. This is the maximization problem we stated earlier.

Therefore, we may solve the problem using

```python
Z = numpy.dot(X, Y)
a, b = numpy.unravel_index(Z.argmax(), Z.shape)
```


This breaks the affine cipher.
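Putting it all together, here is an end-to-end sketch on a toy example: encrypt a sample text with a known key, then recover the key with the matrix method above (the helper `frequencies` and the sample key are my own):

```python
import numpy

look_up = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ,."
N = len(look_up)

def frequencies(text):
    return {c: text.count(c) / len(text) for c in set(text)}

# toy example: a sample text encrypted with the (hypothetical) key (5, 11)
plain = ("THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG. "
         "PACK MY BOX WITH FIVE DOZEN LIQUOR JUGS.")
cipher = "".join(look_up[(5 * look_up.index(c) + 11) % N] for c in plain)

priori_dist = frequencies(plain)  # stands in for the expected P_hat
dist = frequencies(cipher)        # the observed ciphertext distribution

X = numpy.zeros((N, N))
Y = numpy.zeros((N, N))
for i in range(1, N):
    for element in priori_dist:
        X[i, (look_up.index(element) * i) % N] = priori_dist[element]
for j in range(N):
    for element in dist:
        Y[(look_up.index(element) - j) % N, j] = dist[element]

Z = numpy.dot(X, Y)
a, b = numpy.unravel_index(Z.argmax(), Z.shape)
```

With exact, key-permuted distributions as here, the true key attains the maximum; with real ciphertext and an estimated $\hat{P}$, the argmax is a best guess rather than a guarantee.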

## Some notes on complexity

So, what is the complexity of the matrix approach? Computing the matrices takes $O(N^2)$ modular operations. The matrix multiplication naively takes $O(N^3)$ operations, but for large $N$ this can be done faster: for instance, Strassen's algorithm takes $O(N^{2.807})$, and asymptotically faster algorithms exist. Taking advantage of the symmetry and structure of the matrices could probably decrease the complexity further. The matrix multiplication dominates, so it determines the total complexity of this approach.

Compare this with brute-force guessing of the key: there are $O(N^2)$ keys to try, and computing the distance for each guess takes $O(N)$ operations, which in total yields $O(N^3)$. It should be noted that the complexity of this approach may be reduced by trying the keys $(a,b)$ in an order that minimizes the expected number of tries.
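For reference, the brute-force baseline can be sketched as follows (the function name and structure are my own):

```python
from math import gcd

def brute_force(p_hat, p, look_up):
    """Try every key pair; return the one whose encryption best matches p."""
    N = len(look_up)
    best_d, best_key = float("inf"), None
    for a in range(1, N):
        if gcd(a, N) != 1:  # a must lie in Z_N^*
            continue
        for b in range(N):
            # under key (a, b), plaintext symbol x maps to a*x + b mod N
            d = sum((p_hat.get(look_up[x], 0.0)
                     - p.get(look_up[(a * x + b) % N], 0.0)) ** 2
                    for x in range(N))
            if d < best_d:
                best_d, best_key = d, (a, b)
    return best_key
```

Each of the $O(N^2)$ keys is scored in $O(N)$ time, matching the $O(N^3)$ total above.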

An example implementation for id0-rsa.pub is available on github.