Breaking affine ciphers – a matrix approach

An affine cipher is a one the remnants of classical cryptography, easily broken with todays computational power. The cipher defines a symbol mapping from f :\{A,B,\ldots,\} \mapsto \mathbb{Z}_n. Each cipher symbol is then computed as a \cdot x + b \rightarrow y, where a \in \mathbb{Z}^*_n and b \in \mathbb{Z}_n. Decryption is then done by computing x= (y - b) \cdot a^{-1}.

In this blog post, I will show how to break this cipher in time faster than trying all keys. Let us first sketch the general idea. Consider an expected distribution \hat{P} of the symbols and a given distribution P, the integral \int (\hat{P}(x) - P(x))^2 dx defines a statistical distance between the distributions (this would correspond to the Euclidian distance), which we would like to minimize. Now, clearly

(\hat{P}(x) - P(x))^2 = \hat{P}(x)^2 - \hat{P}(x)P(x) + P(x)^2.

Trivially, \hat{P}(x)^2 and P(x)^2 remains constant over any keypair (a,b), so instead of minimizing the above, we can maximize \hat{P}(x)P(x). Therefore, the minimization problem can be turned into a maximization problem \max_{a,b} \int \hat{P}(x)P_{a,b}(x) dx. Cool.

In terms of our cipher, which is discrete, the minimization problem is a sum \max_{a,b} \sum \hat{P}(x)P_{a,b}(x). The observant reader may notice that this looks like a term in a matrix multiplication. There is just one caveat; the indices corresponding appear only in one term. There is an easy way to get around this. Instead of applying transformations on only P, we may split them among the two. So by instead computing

\max_{a,b} \sum \hat{P}_a(x) P_{b}(x),

we have achieved what we desired. This means that we shuffle \hat{P} with a and {P} with b. Let us interpret this as Python. The expected distribution of an alphabet ABCDEFGHIJKLMNOPQRSTUVWXYZ ,. may be as follows (depending on the observation):

P_hat = {' ': 0.05985783763561542, ',': 0.0037411148522259637, '.': 0.0028058361391694723, 'A': 0.0764122708567153, 'C': 0.02600074822297044, 'B': 0.012065095398428732, 'E': 0.11878039655817432, 'D': 0.03974934530490086, 'G': 0.018892630003741116, 'F': 0.020856715301159744, 'I': 0.0651889263000374, 'H': 0.05695847362514029, 'K': 0.00720164609053498, 'J': 0.0014029180695847362, 'M': 0.02254021698466143, 'L': 0.03769173213617658, 'O': 0.07023943135054246, 'N': 0.06313131313131314, 'Q': 0.0009352787130564909, 'P': 0.01805087916199027, 'S': 0.05920314253647587, 'R': 0.0560231949120838, 'U': 0.025813692480359144, 'T': 0.08473625140291807, 'W': 0.022072577628133184, 'V': 0.00916573138795361, 'Y': 0.01842499064721287, 'X': 0.0014029180695847362, 'Z': 0.0006546950991395436}

The transformations are done by computing the matrices

# compute first matrix for transformed P_hat
for i in range(1, N):
    for element in priori_dist:
        X[i, (look_up.index(element) * i) % N] = priori_dist[element]

# compute second matrix for transformed P
for j in range(N):
    for element in dist:
        Y[(look_up.index(element) - j) % N, j] = dist[element]

Here, the ith row in X corresponds to \hat{P} transformed by a = i. Moreover, the jth row in Y corresponds {P} transformed by b = j.

For some distribution, they may look like


As we can see, X is only shifted (by the addition), while in Y the indices are reordered by multiplication with row index i. Taking advantage of the matrix multiplication property, we may now compute Z=XY.


Any entry in Z is Z_{a,b} = \sum_x X_{a,x} Y_{x,b} so finding a maximum element in Z is equivalent to saying

\max_{a,b} \sum_x X_{a,x} Y_{x,b}.

Looks familiar? It should. This is our maximization problem, which we stated earlier.

Therefore, we may solve the problem using

Z =, Y)
a, b = numpy.unravel_index(Z.argmax(), Z.shape)

This breaks the affine cipher.

Some notes on complexity

So, what is the complexity of the matrix approach? Computing the matrices takes O(N^2) modular operations. The matrix multiplication takes naively O(N^3) operations, but for large N this can be achieved faster. For instance Strassen takes O(N^{2.807}) but faster algorithms exist. Also, taking advantage of symmetry and structure could probably decrease the complexity further. This is the total complexity of this approach.

Compare this with brute-force guessing of the key (taking O(N^2) guesses) and for each guess, compute the distance taking O(N) operations, which in total yields O(N^3). It should be noted that complexity of this approach may be reduced by picking a,b in an order which minimizes the number of tries.

Example implementation for the github

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s