An affine cipher is one of the remnants of classical cryptography, easily broken with today's computational power. The cipher defines a symbol mapping from the alphabet to $\mathbb{Z}_N$. Each cipher symbol is then computed as $c = a \cdot x + b \bmod N$, where $\gcd(a, N) = 1$ and $b \in \mathbb{Z}_N$. Decryption is then done by computing $x = a^{-1} \cdot (c - b) \bmod N$.
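To make the setting concrete, here is a minimal sketch of the cipher over a 29-symbol alphabet (the `encrypt`/`decrypt` helpers and the example key are my own illustration, not part of the original post):

```python
look_up = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.'
N = len(look_up)  # 29 is prime, so every a in 1..N-1 has an inverse mod N

def encrypt(plaintext, a, b):
    # c = a*x + b (mod N), applied symbol by symbol
    return ''.join(look_up[(a * look_up.index(s) + b) % N] for s in plaintext)

def decrypt(ciphertext, a, b):
    # x = a^{-1} * (c - b) (mod N)
    a_inv = pow(a, -1, N)  # modular inverse (Python 3.8+)
    return ''.join(look_up[(a_inv * (look_up.index(s) - b)) % N] for s in ciphertext)
```

Encrypting and then decrypting with the same key pair round-trips any text over the alphabet.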

In this blog post, I will show how to break this cipher in time faster than trying all keys. Let us first sketch the general idea. Consider an expected distribution $\hat{P}$ of the symbols and a given distribution $P$. The integral $\int (\hat{P}(x) - P(x))^2\, dx$ defines a statistical distance between the distributions (this would correspond to the squared Euclidean distance), which we would like to minimize. Now, clearly

$$\int \left(\hat{P}(x) - P(x)\right)^2 dx = \int \hat{P}(x)^2\, dx - 2\int \hat{P}(x)P(x)\, dx + \int P(x)^2\, dx.$$

Trivially, $\int \hat{P}(x)^2\, dx$ and $\int P(x)^2\, dx$ remain constant over any keypair $(a, b)$, so instead of minimizing the above, we can maximize the cross term $\int \hat{P}(x)P(x)\, dx$. Therefore, the minimization problem $\min_{a,b} \int (\hat{P}(x) - P(x))^2\, dx$ can be turned into the maximization problem $\max_{a,b} \int \hat{P}(x)P(x)\, dx$. Cool.
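As a quick numerical sanity check (my own illustration, not from the post): over candidate distributions that are reorderings of the same values, the squared terms are constant, so ranking by squared distance and ranking by the cross term select the same candidate.

```python
import itertools
import numpy

p_hat = numpy.array([0.5, 0.3, 0.2])
# all reorderings of the same values: the squared terms are constant across them
candidates = [numpy.array(q) for q in itertools.permutations([0.5, 0.3, 0.2])]

distances = [((p_hat - q) ** 2).sum() for q in candidates]
cross_terms = [(p_hat * q).sum() for q in candidates]

# minimizing the distance selects the same candidate as maximizing the cross term
assert numpy.argmin(distances) == numpy.argmax(cross_terms)
```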

In terms of our cipher, which is discrete, the maximization problem is the sum $\sum_x \hat{P}(x) \cdot P(a \cdot x + b \bmod N)$. The observant reader may notice that this looks like a term in a matrix multiplication. There is just one caveat: the indices $a$ and $b$ appear only in one factor. There is an easy way to get around this. Instead of applying both transformations to $P$ only, we may split them between the two distributions. So by instead computing

$$\sum_k \hat{P}(a^{-1} \cdot k \bmod N) \cdot P(k + b \bmod N),$$

we have achieved what we desired (substitute $k = a \cdot x \bmod N$ to recover the original sum). This means that we shuffle $\hat{P}$ with $a$ and $P$ with $b$. Let us interpret this in Python. The expected distribution of an alphabet `ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.`

may be as follows (depending on the observation):

```python
P_hat = {' ': 0.05985783763561542, ',': 0.0037411148522259637, '.': 0.0028058361391694723, 'A': 0.0764122708567153, 'C': 0.02600074822297044, 'B': 0.012065095398428732, 'E': 0.11878039655817432, 'D': 0.03974934530490086, 'G': 0.018892630003741116, 'F': 0.020856715301159744, 'I': 0.0651889263000374, 'H': 0.05695847362514029, 'K': 0.00720164609053498, 'J': 0.0014029180695847362, 'M': 0.02254021698466143, 'L': 0.03769173213617658, 'O': 0.07023943135054246, 'N': 0.06313131313131314, 'Q': 0.0009352787130564909, 'P': 0.01805087916199027, 'S': 0.05920314253647587, 'R': 0.0560231949120838, 'U': 0.025813692480359144, 'T': 0.08473625140291807, 'W': 0.022072577628133184, 'V': 0.00916573138795361, 'Y': 0.01842499064721287, 'X': 0.0014029180695847362, 'Z': 0.0006546950991395436}
```
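The distribution above was presumably estimated from a sample of English text; given any observed text over the same alphabet, such a dictionary can be computed along these lines (`distribution` is my own hypothetical helper, not from the original code):

```python
from collections import Counter

look_up = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.'

def distribution(text):
    # relative frequency of every alphabet symbol in the observed text
    counts = Counter(text)
    return {symbol: counts[symbol] / len(text) for symbol in look_up}
```

The returned frequencies sum to 1 as long as the text contains only symbols from the alphabet.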

The transformations are done by computing the matrices

```python
import numpy

N = len(look_up)
X = numpy.zeros((N, N))
Y = numpy.zeros((N, N))

# compute first matrix for transformed P_hat
for i in range(1, N):
    for element in priori_dist:
        X[i, (look_up.index(element) * i) % N] = priori_dist[element]

# compute second matrix for transformed P
for j in range(N):
    for element in dist:
        Y[(look_up.index(element) - j) % N, j] = dist[element]
```

Here, the $a$-th row in $X$ corresponds to $\hat{P}$ with its indices multiplied by $a$. Moreover, the $b$-th column in $Y$ corresponds to $P$ shifted by $b$.

For some distribution, they may look like

As we can see, $P$ is only shifted (by the addition of $b$), while in $\hat{P}$ the indices are reordered by multiplication with the row index $a$. Taking advantage of the matrix multiplication property, we may now compute $Z = X \cdot Y$.

Any entry in $Z$ is $Z_{a,b} = \sum_k \hat{P}(a^{-1} \cdot k \bmod N) \cdot P(k + b \bmod N)$, so finding a maximum element in $Z$ is equivalent to solving

$$\arg\max_{a,b} \sum_x \hat{P}(x) \cdot P(a \cdot x + b \bmod N).$$

Looks familiar? It should. This is our maximization problem, which we stated earlier.

Therefore, we may solve the problem using

```python
Z = numpy.dot(X, Y)
a, b = numpy.unravel_index(Z.argmax(), Z.shape)
```

This breaks the affine cipher.
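Putting the pieces together, here is a self-contained end-to-end sketch: I encrypt a sample sentence with a key of my own choosing, estimate both distributions from the texts themselves (in practice $\hat{P}$ would come from a reference corpus), and let the matrix product point out the key. The plaintext, the key, and the `distribution` helper are illustrative assumptions, not from the original post.

```python
import numpy
from collections import Counter

look_up = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.'
N = len(look_up)

def distribution(text):
    # relative symbol frequencies over the alphabet (helper, not from the post)
    counts = Counter(text)
    return {symbol: counts[symbol] / len(text) for symbol in look_up}

plaintext = 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG, AND THE DOG BARKS BACK.'
a_true, b_true = 5, 7  # illustrative key
ciphertext = ''.join(look_up[(a_true * look_up.index(s) + b_true) % N]
                     for s in plaintext)

priori_dist = distribution(plaintext)   # stands in for the expected P_hat
dist = distribution(ciphertext)         # observed ciphertext distribution P

X = numpy.zeros((N, N))
Y = numpy.zeros((N, N))
for i in range(1, N):                   # row i: P_hat with indices scaled by i
    for element in priori_dist:
        X[i, (look_up.index(element) * i) % N] = priori_dist[element]
for j in range(N):                      # column j: P shifted by -j
    for element in dist:
        Y[(look_up.index(element) - j) % N, j] = dist[element]

Z = numpy.dot(X, Y)
a, b = numpy.unravel_index(Z.argmax(), Z.shape)
# the argmax entry of Z recovers the key pair used for encryption
```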

**Some notes on complexity**

So, what is the complexity of the matrix approach? Computing the matrices takes $\mathcal{O}(N^2)$ modular operations. The matrix multiplication naively takes $\mathcal{O}(N^3)$ operations, but for large $N$ this can be done faster. For instance, Strassen's algorithm takes $\mathcal{O}(N^{2.807})$, and even faster algorithms exist. Also, taking advantage of symmetry and structure could probably decrease the complexity further. The multiplication dominates, so this is also the total complexity of the approach.

Compare this with brute-force guessing of the key (taking $\mathcal{O}(N^2)$ guesses, one per pair $(a, b)$), where each guess requires computing a distance in $\mathcal{O}(N)$ operations, which in total yields $\mathcal{O}(N^3)$. It should be noted that the complexity of this approach may be reduced by picking the candidate keys $(a, b)$ in an order which minimizes the expected number of tries.
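For comparison, the brute-force baseline could be sketched as follows (my own illustration; `brute_force` and its inputs are assumptions): walk all key pairs and evaluate the squared distance for each.

```python
look_up = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.'
N = len(look_up)

def brute_force(priori_dist, dist):
    # O(N^2) key guesses, each with an O(N) distance evaluation: O(N^3) total
    best, best_key = float('inf'), None
    for a in range(1, N):
        for b in range(N):
            # squared distance between P_hat and the ciphertext distribution
            # viewed through the candidate key (a, b)
            d = sum((priori_dist[look_up[x]]
                     - dist[look_up[(a * x + b) % N]]) ** 2
                    for x in range(N))
            if d < best:
                best, best_key = d, (a, b)
    return best_key
```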

**Example implementation** for id0-rsa.pub: github