An affine cipher is one of the remnants of classical cryptography, easily broken with today's computational power. The cipher defines a symbol mapping from the alphabet to $\mathbb{Z}_N$. Each cipher symbol is then computed as $c = a \cdot x + b \bmod N$, where $\gcd(a, N) = 1$ and $b \in \mathbb{Z}_N$. Decryption is then done by computing $x = a^{-1} \cdot (c - b) \bmod N$.
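To make the setting concrete, here is a minimal sketch of the cipher over a 29-symbol alphabet (the `encrypt`/`decrypt` helpers and the example key are my own illustration, not part of the original post):

```python
look_up = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.'
N = len(look_up)  # 29 is prime, so every a in 1..N-1 has an inverse mod N

def encrypt(plaintext, a, b):
    # c = a*x + b (mod N), applied symbol by symbol
    return ''.join(look_up[(a * look_up.index(s) + b) % N] for s in plaintext)

def decrypt(ciphertext, a, b):
    # x = a^{-1} * (c - b) (mod N)
    a_inv = pow(a, -1, N)  # modular inverse (Python 3.8+)
    return ''.join(look_up[(a_inv * (look_up.index(s) - b)) % N] for s in ciphertext)
```

Encrypting and then decrypting with the same key pair round-trips any text over the alphabet.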

In this blog post, I will show how to break this cipher in time faster than trying all keys. Let us first sketch the general idea. Consider an expected distribution $\hat{P}$ of the symbols and a given distribution $P$. The integral $\int (\hat{P}(x) - P(x))^2\, dx$ defines a statistical distance between the distributions (this would correspond to the squared Euclidean distance), which we would like to minimize. Now, clearly

$$\int \left(\hat{P}(x) - P(x)\right)^2 dx = \int \hat{P}(x)^2\, dx - 2\int \hat{P}(x)P(x)\, dx + \int P(x)^2\, dx.$$

Trivially, $\int \hat{P}(x)^2\, dx$ and $\int P(x)^2\, dx$ remain constant over any keypair $(a, b)$, so instead of minimizing the above, we can maximize the cross term $\int \hat{P}(x)P(x)\, dx$. Therefore, the minimization problem $\min_{a,b} \int (\hat{P}(x) - P(x))^2\, dx$ can be turned into the maximization problem $\max_{a,b} \int \hat{P}(x)P(x)\, dx$. Cool.
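As a quick numerical sanity check (my own illustration, not from the post): over candidate distributions that are reorderings of the same values, the squared terms are constant, so ranking by squared distance and ranking by the cross term select the same candidate.

```python
import itertools
import numpy

p_hat = numpy.array([0.5, 0.3, 0.2])
# all reorderings of the same values: the squared terms are constant across them
candidates = [numpy.array(q) for q in itertools.permutations([0.5, 0.3, 0.2])]

distances = [((p_hat - q) ** 2).sum() for q in candidates]
cross_terms = [(p_hat * q).sum() for q in candidates]

# minimizing the distance selects the same candidate as maximizing the cross term
assert numpy.argmin(distances) == numpy.argmax(cross_terms)
```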

In terms of our cipher, which is discrete, the maximization problem is the sum $\sum_x \hat{P}(x) \cdot P(a \cdot x + b \bmod N)$. The observant reader may notice that this looks like a term in a matrix multiplication. There is just one caveat: the indices $a$ and $b$ appear only in one factor. There is an easy way to get around this. Instead of applying both transformations to $P$ only, we may split them between the two distributions. So by instead computing

$$\sum_k \hat{P}(a^{-1} \cdot k \bmod N) \cdot P(k + b \bmod N),$$

we have achieved what we desired (substitute $k = a \cdot x \bmod N$ to recover the original sum). This means that we shuffle $\hat{P}$ with $a$ and $P$ with $b$. Let us interpret this in Python. The expected distribution of an alphabet `ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.`

may be as follows (depending on the observation):

```python
P_hat = {' ': 0.05985783763561542, ',': 0.0037411148522259637, '.': 0.0028058361391694723, 'A': 0.0764122708567153, 'C': 0.02600074822297044, 'B': 0.012065095398428732, 'E': 0.11878039655817432, 'D': 0.03974934530490086, 'G': 0.018892630003741116, 'F': 0.020856715301159744, 'I': 0.0651889263000374, 'H': 0.05695847362514029, 'K': 0.00720164609053498, 'J': 0.0014029180695847362, 'M': 0.02254021698466143, 'L': 0.03769173213617658, 'O': 0.07023943135054246, 'N': 0.06313131313131314, 'Q': 0.0009352787130564909, 'P': 0.01805087916199027, 'S': 0.05920314253647587, 'R': 0.0560231949120838, 'U': 0.025813692480359144, 'T': 0.08473625140291807, 'W': 0.022072577628133184, 'V': 0.00916573138795361, 'Y': 0.01842499064721287, 'X': 0.0014029180695847362, 'Z': 0.0006546950991395436}
```
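The distribution above was presumably estimated from a sample of English text; given any observed text over the same alphabet, such a dictionary can be computed along these lines (`distribution` is my own hypothetical helper, not from the original code):

```python
from collections import Counter

look_up = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.'

def distribution(text):
    # relative frequency of every alphabet symbol in the observed text
    counts = Counter(text)
    return {symbol: counts[symbol] / len(text) for symbol in look_up}
```

The returned frequencies sum to 1 as long as the text contains only symbols from the alphabet.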

The transformations are done by computing the matrices

```python
import numpy

N = len(look_up)
X = numpy.zeros((N, N))
Y = numpy.zeros((N, N))

# compute first matrix for transformed P_hat
for i in range(1, N):
    for element in priori_dist:
        X[i, (look_up.index(element) * i) % N] = priori_dist[element]

# compute second matrix for transformed P
for j in range(N):
    for element in dist:
        Y[(look_up.index(element) - j) % N, j] = dist[element]
```

Here, the $a$-th row in $X$ corresponds to $\hat{P}$ with its indices multiplied by $a$. Moreover, the $b$-th column in $Y$ corresponds to $P$ shifted by $b$.

For some distribution, they may look like

As we can see, $P$ is only shifted (by the addition of $b$), while in $\hat{P}$ the indices are reordered by multiplication with the row index $a$. Taking advantage of the matrix multiplication property, we may now compute $Z = X \cdot Y$.

Any entry in $Z$ is $Z_{a,b} = \sum_k \hat{P}(a^{-1} \cdot k \bmod N) \cdot P(k + b \bmod N)$, so finding a maximum element in $Z$ is equivalent to solving

$$\arg\max_{a,b} \sum_x \hat{P}(x) \cdot P(a \cdot x + b \bmod N).$$

Looks familiar? It should. This is our maximization problem, which we stated earlier.

Therefore, we may solve the problem using

```python
Z = numpy.dot(X, Y)
a, b = numpy.unravel_index(Z.argmax(), Z.shape)
```

This breaks the affine cipher.
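Putting the pieces together, here is a self-contained end-to-end sketch: I encrypt a sample sentence with a key of my own choosing, estimate both distributions from the texts themselves (in practice $\hat{P}$ would come from a reference corpus), and let the matrix product point out the key. The plaintext, the key, and the `distribution` helper are illustrative assumptions, not from the original post.

```python
import numpy
from collections import Counter

look_up = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.'
N = len(look_up)

def distribution(text):
    # relative symbol frequencies over the alphabet (helper, not from the post)
    counts = Counter(text)
    return {symbol: counts[symbol] / len(text) for symbol in look_up}

plaintext = 'THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG, AND THE DOG BARKS BACK.'
a_true, b_true = 5, 7  # illustrative key
ciphertext = ''.join(look_up[(a_true * look_up.index(s) + b_true) % N]
                     for s in plaintext)

priori_dist = distribution(plaintext)   # stands in for the expected P_hat
dist = distribution(ciphertext)         # observed ciphertext distribution P

X = numpy.zeros((N, N))
Y = numpy.zeros((N, N))
for i in range(1, N):                   # row i: P_hat with indices scaled by i
    for element in priori_dist:
        X[i, (look_up.index(element) * i) % N] = priori_dist[element]
for j in range(N):                      # column j: P shifted by -j
    for element in dist:
        Y[(look_up.index(element) - j) % N, j] = dist[element]

Z = numpy.dot(X, Y)
a, b = numpy.unravel_index(Z.argmax(), Z.shape)
# the argmax entry of Z recovers the key pair used for encryption
```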

**Some notes on complexity**

So, what is the complexity of the matrix approach? Computing the matrices takes $\mathcal{O}(N^2)$ modular operations. The matrix multiplication naively takes $\mathcal{O}(N^3)$ operations, but for large $N$ this can be done faster. For instance, Strassen's algorithm takes $\mathcal{O}(N^{2.807})$, and even faster algorithms exist. Also, taking advantage of symmetry and structure could probably decrease the complexity further. The multiplication dominates, so this is also the total complexity of the approach.

Compare this with brute-force guessing of the key (taking $\mathcal{O}(N^2)$ guesses, one per pair $(a, b)$), where each guess requires computing a distance in $\mathcal{O}(N)$ operations, which in total yields $\mathcal{O}(N^3)$. It should be noted that the complexity of this approach may be reduced by picking the candidate keys $(a, b)$ in an order which minimizes the expected number of tries.
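For comparison, the brute-force baseline could be sketched as follows (my own illustration; `brute_force` and its inputs are assumptions): walk all key pairs and evaluate the squared distance for each.

```python
look_up = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ ,.'
N = len(look_up)

def brute_force(priori_dist, dist):
    # O(N^2) key guesses, each with an O(N) distance evaluation: O(N^3) total
    best, best_key = float('inf'), None
    for a in range(1, N):
        for b in range(N):
            # squared distance between P_hat and the ciphertext distribution
            # viewed through the candidate key (a, b)
            d = sum((priori_dist[look_up[x]]
                     - dist[look_up[(a * x + b) % N]]) ** 2
                    for x in range(N))
            if d < best:
                best, best_key = d, (a, b)
    return best_key
```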

**Example implementation** for id0-rsa.pub: github