Computing sum of divisors for numbers up to N

After fiddling with Project Euler, I started to wonder how to compute the sum of divisors in the most efficient way (although most of the problems in PE do not require any super-sophisticated solution). The trivial approach is to do trial division and add up the divisors sequentially. This may be good enough for small N, but as N grows this approach quickly becomes infeasible.

The next thing one may think of is to generate a list of prime numbers (using a sieve) and, for each number, do trial division by the numbers in the list. This is faster than the trivial approach, but still requires cN \cdot |\mathcal{P}_N| = cN \cdot \pi(N) \sim \mathcal{O}(N^2/\log N) operations. Since the sieve itself requires time \mathcal{O}(N\cdot \log N), we cannot hope to do better than that. But can we match it? Turns out we can.

First, we need to establish a simple number-theoretical fact: \sigma(n) is the divisor sum function, which satisfies \sigma(p^i) = 1 + p + \cdots + p^i (and, thus, \sigma(p^i) = \sigma(p^{i-1}) + p^i) and, for distinct primes p \neq q, \sigma(p^i \cdot q^j) = \sigma(p^i) \cdot \sigma(q^j).
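These properties are easy to check numerically; here is a standalone sketch using plain trial division:

```python
def sigma(n):
    # sum of divisors, by plain trial division
    return sum(d for d in range(1, n + 1) if n % d == 0)

assert sigma(8) == 1 + 2 + 4 + 8               # sigma(p^i) = 1 + p + ... + p^i
assert sigma(8) == sigma(4) + 2 ** 3           # sigma(p^i) = sigma(p^(i-1)) + p^i
assert sigma(12) == sigma(4) * sigma(3) == 28  # multiplicativity, gcd(4, 3) = 1
```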

The solution is quite simple; consider the first element of the prime set \mathcal{P}, which is 2. We pick out the even numbers n_i = 2 \cdot m and for each of them compute

\sigma(n_i) = \sigma(2\cdot m) = \underbrace{\sigma(2)}_{\text{store this in }x_i} \cdot ~\sigma(m).

Then for 2^2, compute \sigma(2^2) = \sigma(2) + 2^2, and so forth up to 2^i < N. This requires N/2 + N/4 + \cdots operations.

\begin{array}{cccccccccccc} 1 & 2 & 3& 4 &5& 6 &7& 8 &9& 10 &11& 12 \\ & \sigma(2) & & \sigma(2) & & \sigma(2) & & \sigma(2) & & \sigma(2) & & \sigma(2) \\ & & & +& & &&+& & && + \\ & & & 2^2& & && 2^2& & && 2^2 \\ & & & & & &&+& & && \\ & & & & & && 2^3& & && \\ \hline & \sigma(2) & & \sigma(4) & & \sigma(2) & & \sigma(8) & & \sigma(2) & & \sigma(4) \end{array}

Running it for all powers and primes, we use in total N \cdot (\sum_{p\in\mathcal{P}} \frac{1}{p} + \sum_{p\in\mathcal{P}} \frac{1}{p^2} + \cdots) operations. The higher powers contribute little, since \sum_{i > 1} \frac{1}{p^i} = \frac{1}{p(p-1)} summed over all primes converges to a constant. So the complexity is \mathcal{O}(N \cdot \sum_{p\in\mathcal{P}} \frac{1}{p}). The prime harmonic sum is bounded by the full harmonic sum, \sum_{p \leq N} \frac{1}{p} < \sum_{k \leq N} \frac{1}{k} < 1 + \log N (in fact, by Mertens' theorem it grows only as \log \log N). Since we only use primes up to N, the final complexity is \mathcal{O}(N \cdot \log N). So we are done 🙂

As always, let us look at a Python representation:

def sum_of_divisors(N):
    N += 1 # extend limit to inclusive
    primes = primetools.sieve(N) # all primes below N
    numbers = [1]*N
    for q in primes:
        tmp = {}
        mult = q
        while mult < N:
            # add the term q^k to the sigma(q^k)-factor of every multiple of q^k
            for i in range(mult, N, mult):
                if i in tmp: tmp[i] += mult
                else: tmp[i] = 1 + mult
            mult *= q
        # multiply the finished prime-power factors into the result
        for j in tmp: numbers[j] *= tmp[j]
    return numbers # numbers[n] = sigma(n) for n >= 1
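For reference, here is a self-contained variant of the same idea, with a minimal Eratosthenes sieve standing in for the primetools.sieve helper assumed above (the name sigma_table is mine):

```python
def sieve(n):
    # plain Eratosthenes: all primes below n
    is_prime = [True] * n
    is_prime[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, n, p):
                is_prime[m] = False
    return [p for p in range(n) if is_prime[p]]

def sigma_table(N):
    N += 1  # make the limit inclusive
    numbers = [1] * N
    for q in sieve(N):
        tmp = {}
        mult = q
        while mult < N:
            # add the term q^k to the sigma(q^k)-factor of every multiple of q^k
            for i in range(mult, N, mult):
                if i in tmp: tmp[i] += mult
                else: tmp[i] = 1 + mult
            mult *= q
        # multiply the finished prime-power factors together (multiplicativity)
        for j in tmp:
            numbers[j] *= tmp[j]
    return numbers  # numbers[n] = sigma(n) for n >= 1

assert sigma_table(12)[1:] == [1, 3, 4, 7, 6, 12, 8, 15, 13, 18, 12, 28]
```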

Below is a plot of the timing of the code. Since the code stores data in memory, some undesired effects come into play for larger values (due to the need to shuffle data back and forth); these values are therefore not present in the graph.


Bounded by the sieving complexity, the algorithm cannot be any faster as long as it uses the sieve. Can we do better? Note that the algorithm returns a list of sums. If we are really interested in the sum of the sums, we may instead compute \sum_{i=1}^{N}\sigma(i) = \sum_{i=1}^{N} i \cdot \lfloor \frac{N}{i} \rfloor, which is \mathcal{O}(N) and uses basically no memory. This can be written as

def sum_of_sum_of_divisors(N):
    div_sum = 0
    for i in range(1, N + 1):
        div_sum += i * (N // i)
    return div_sum

How does it work? For each i, the number of integers in [1, N] divisible by i is exactly \lfloor \frac{N}{i} \rfloor. Each such multiple receives a contribution i from its divisor i, so the divisor i contributes i \cdot \lfloor \frac{N}{i} \rfloor in total. Summing over all possible divisors i, we end up with the sum of the sums of divisors.
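The identity is cheap to cross-check against direct divisor sums for small N (a standalone sketch):

```python
def sigma(n):
    # sum of divisors, by trial division
    return sum(d for d in range(1, n + 1) if n % d == 0)

# sum of sigma(1..N) equals sum of i * floor(N / i)
for N in range(1, 60):
    assert sum(sigma(i) for i in range(1, N + 1)) == \
           sum(i * (N // i) for i in range(1, N + 1))
```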

Turns out, we can actually do it in \mathcal{O}(\sqrt N). Here is how:

import math

def square_root_time_sum_of_sum_of_divisors(N):
    global dcached # cache, initialized as an empty dict further below
    if N in dcached: return dcached[N]
    div_sum, i = 0, 1
    q = int(math.sqrt(N))
    # left part: small divisors i <= sqrt(N), term by term
    while i <= q:
        div_sum += i * (N // i)
        i += 1
    # right part: large divisors, grouped by the value i of floor(N/j)
    i = 1
    while i <= N // (q + 1):
        m = N // i
        k = N // (i + 1)
        div_sum += i * (m * (m + 1) - k * (k + 1)) // 2
        i += 1
    dcached[N] = div_sum
    return div_sum

Consider the below example. In the left table, we have the sum \sum_{i=1}^8 i \cdot \lfloor \frac{8}{i} \rfloor and in the right table \sum_{i=1}^8T(\lfloor 8/i\rfloor) where T(n) = \frac12 \cdot n(n+1):

\begin{array}{c | cc cc cc cc | c} i & & & & & & & & & i \cdot \lfloor \frac{8}{i} \rfloor\\ \hline 1 & \bf{1} & \bf{1} & \bf{1} & \bf{1} & \bf{1} & \bf{1} & \bf{1} & \bf{1} & 8 \cdot 1\\ 2 & \bf{2} & \bf{2} & \bf 2 & \bf 2 & & & & & 2 \cdot 4\\\hline 3 & 3 & 3 & & & & & & & 3 \cdot 2\\ 4 & 4 & 4 & & & & & & & 4 \cdot 2\\ 5 & 5 & & & & & & & & 5 \cdot 1\\ 6 & 6 & & & & & & & & 6 \cdot 1\\ 7 & 7 & & & & & & & & 7 \cdot 1\\ 8 & 8 & & & & & & & & 8 \cdot 1\\ \end{array} \iff \begin{array}{c | cc |cc cc cc | c} i & & & & & & & & & T(\lfloor \frac{8}{i} \rfloor)\\ \hline 1 & 1 & 2 &\bf 3 & \bf 4 & \bf 5 & \bf 6 & \bf 7 & \bf 8 & \frac12 \cdot 9 \cdot 8\\ 2 & 1 & 2 &\bf 3 & \bf 4 & & & & & \frac12 \cdot 5\cdot 4\\ 3 & 1 & 2 & & & & & & & \frac12 \cdot 3\cdot 2\\ 4 & 1 & 2 & & & & & & & \frac12 \cdot 3\cdot 2\\ 5 & 1 & & & & & & & & 1\\ 6 & 1 & & & & & & & & 1\\ 7 & 1 & & & & & & & & 1\\ 8 & 1 & & & & & & & & 1\\ \end{array}

Clearly, it is the same table/matrix but transposed, so the sums of the elements are the same. As we notice in the matrices above, the rows become sparser with increasing i. Think about it! What if we were to compute only the bold numbers and add them up? Then we only need a few terms of the form i \cdot \lfloor \frac{N}{i} \rfloor from the left table and a few of the form T(\lfloor \frac{N}{i} \rfloor) from the right. The point where we stop computing the left part and start computing the right part can be chosen arbitrarily, but the optimal choice is \sqrt{N}: then the total number of terms is \lfloor \frac{N}{\sqrt{N}+1}\rfloor + \lfloor \sqrt{N} \rfloor \approx 2\sqrt{N}.
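The two-part summation can be sketched standalone and checked against the naive O(N) version:

```python
import math

def divisor_sum_naive(N):
    # sum_{i=1}^{N} i * floor(N / i)
    return sum(i * (N // i) for i in range(1, N + 1))

def divisor_sum_sqrt(N):
    q = math.isqrt(N)
    total = 0
    # left part: the terms i * floor(N/i) for i <= sqrt(N)
    for i in range(1, q + 1):
        total += i * (N // i)
    # right part: remaining divisors grouped by the value i = floor(N/j);
    # the j with floor(N/j) = i form the interval (floor(N/(i+1)), floor(N/i)]
    for i in range(1, N // (q + 1) + 1):
        m, k = N // i, N // (i + 1)
        total += i * (m * (m + 1) - k * (k + 1)) // 2
    return total

for N in range(1, 300):
    assert divisor_sum_sqrt(N) == divisor_sum_naive(N)
```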

Example: Project Euler 23
This is a simple problem, where we are supposed to compute the sum of all numbers which cannot be expressed as a sum of two abundant numbers. Using the above code, we can solve the problem in about 100 ms using pypy:

def abundant_numbers(N):
    numbers = sum_of_divisors(N)
    abundants = []
    for i in range(0, N):
        if 2*i < numbers[i]: abundants.append(i)
    return abundants[1:]

N = 28123

abundants = abundant_numbers(N)
s = [0]*N

for x in abundants:
    for y in abundants:
        ls = x + y
        if ls < N: s[ls] = 1
        else: break

print sum(i for i in range(0, N) if not s[i])

Example: Project Euler 439
This problem is quite nice, because it requires a very well-thought-out approach to be feasible to solve (it is also among the hardest problems on PE). Here is the question (notation slightly altered):

Let \sigma(k) be the sum of all divisors of k. We define the function S(N) = \sum_{1 \leq i \leq N}\sum_{1 \leq j \leq N} \sigma(i \cdot j). For example, S(3) =\sigma(1) +\sigma(2) +\sigma(3) +\sigma(2) +\sigma(4) +\sigma(6) +\sigma(3) +\sigma(6) +\sigma(9) = 59.

You are given that S(10^3) = 563576517282 and S(10^5) \bmod 10^9= 215766508. Find S(10^{11}) \bmod 10^9.

So, running our algorithm would be fine for lower values, but if we tried to store (10^{11})^2 \approx 2^{73} values in memory, we would run into a wall very fast (this is about 7.5 \cdot 10^{10} TB, assuming that each number is a 64-bit integer; note that the modular reduction can be applied at every step). Clearly, we need to use the memoryless approach.

Let us first disregard the fact that \sigma(i \cdot j) =\sigma(i) \cdot\sigma(j) only holds when \gcd(i,j) = 1. Then S(N) = (\sum_{i=1}^{N} i \cdot \lfloor \frac{N}{i} \rfloor)^2 - C(N), where C(N) is a correction term. In the case of S(3), we have

\begin{array}{rcccccc} (\sigma(1) +\sigma(2) +\sigma(3))^2 = & \sigma(1)\cdot\sigma(1) &+& \sigma(1)\cdot\sigma(2) &+& \sigma(1)\cdot\sigma(3) & + \\ &\sigma(2)\cdot\sigma(1) &+& \sigma(2)\cdot\sigma(2) &+& \sigma(2)\cdot\sigma(3) & +\\&\sigma(3)\cdot\sigma(1) &+& \sigma(3)\cdot\sigma(2) &+& \sigma(3)\cdot\sigma(3)\\\\ = & \sigma(1\cdot 1) &+& \sigma(1 \cdot 2) &+& \sigma(1 \cdot 3)&+\\&\sigma(2 \cdot 1) &+& \underline{\sigma(2)\cdot\sigma(2)} &+& \sigma(2 \cdot 3)&+\\ &\sigma(3 \cdot 1) &+&\sigma(3 \cdot 2) &+& \underline{\sigma(3)\cdot\sigma(3)} \end{array}

OK, so what about C(N)? We need to account for the pairs for which \gcd(i,j) > 1. First, consider the error made on prime powers, \sigma(p^i)\cdot\sigma(p^j) - \sigma(p^{i+j}). We have \sigma(p^i) = (p^{i+1}-1)/(p-1), so

\sigma(p^i)\cdot\sigma(p^j) - \sigma(p^{i+j}) = \frac{(p^{i+1}-1)(p^{j+1}-1) - (p-1)(p^{i+j+1}-1)}{(p-1)^2} = \frac{p(p^i-1)(p^j-1)}{(p-1)^2}

In particular, \sigma(p)\cdot\sigma(p) - \sigma(p^2) = p. Let us consider the case \gcd(i,j) = p. The (negative) contribution from this set is

p \cdot  \underbrace{\left[ \left(\sum_{i=1}^Ni\left\lfloor \frac{N}{i \cdot p} \right\rfloor\right)^2 - C({N}/{p})\right]}_{=S(N/p)}.

Running over all possible gcds (all numbers p), we get

S(N) = \left(\sum_{i=1}^{N} i \cdot \lfloor \frac{N}{i} \rfloor\right)^2 - \underbrace{\sum_{p=2}^N p \cdot S(\lfloor N/p \rfloor)}_{C(N)}
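Before relying on the recurrence, we can sanity-check it against the brute-force definition of S(N) for small N (a standalone, unmemoized sketch):

```python
def sigma(n):
    return sum(d for d in range(1, n + 1) if n % d == 0)

def S_brute(N):
    # the definition: sum of sigma(i * j) over all pairs (i, j)
    return sum(sigma(i * j) for i in range(1, N + 1)
                            for j in range(1, N + 1))

def S_rec(N):
    # the recurrence: squared divisor-sum total minus the gcd correction
    if N == 0:
        return 0
    dv = sum(i * (N // i) for i in range(1, N + 1))
    return dv * dv - sum(p * S_rec(N // p) for p in range(2, N + 1))

assert S_brute(3) == 59  # the value given in the problem statement
for N in range(1, 8):
    assert S_rec(N) == S_brute(N)
```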

The following code computes the smaller instances in a matter of milliseconds, but for N = 10^{11} it takes too long (as we are aiming for the sub-minute mark):

cached = {}
dcached = {}
mod = 10**9

def S(N):
    global cached
    if N in cached: return cached[N]
    dv = square_root_time_sum_of_sum_of_divisors(N)
    S_sum, i = 0, 2
    while i <= N:
        # max recursion depth log2(10^11) ~ 33
        S_sum += (i * S(N // i)) % mod
        i += 1
    cached[N] = (dv * dv - S_sum) % mod
    return cached[N]

OK. Further optimization needed… note that

\begin{array}{rcc} S(3) = & 1 \cdot (\sigma(1/1) +\sigma(2/1) +\sigma(3/1))^2 & - \\ & 2 \cdot (\sigma(1/2) +\sigma(2/2) +\sigma(3/2))^2 &- \\ & 3 \cdot (\sigma(1/3) +\sigma(2/3) +\sigma(3/3))^2 \end{array}

Using the same approach as before, we can derive that

S(N) = \sum_{i=1}^N i \cdot \mu(i) \cdot \left(\sum_{j=1}^{N} j \cdot \lfloor \frac{N}{i \cdot j} \rfloor\right)^2

where \mu(\cdot) is the Möbius function, defined

\mu(n) = \left\{\begin{array}{rl} -1 & n \textnormal{ is square-free with an odd number of prime factors,}\\ 1 & n \textnormal{ is square-free with an even number of prime factors,}\\ 0 & n \textnormal{ has a squared prime factor.} \end{array}\right.

We can compute this in \mathcal{O}(N^{3/2}) time. The Möbius function can be pre-computed in \mathcal{O}(N \cdot \log N) like so:

def mobius(N):
    L = [1]*(N+1)
    i = 2
    while i <= N:
        # only primes still have L[i] == 1 or -1 at their turn
        if L[i] == 1 or L[i] == -1:
            L[i] = -L[i]
            sq = i * i
            j = i + i
            while j <= N:
                # we use a factor 2 to mark non-primes
                if abs(L[j]) == 2:
                    L[j] = -L[j]
                else:
                    L[j] = -L[j]*2
                j += i
            # multiples of i^2 are not square-free
            j = sq
            while j <= N:
                L[j] = 0
                j += sq
        i += 1
    # remove the non-prime markers
    i = 2
    while i <= N:
        if abs(L[i]) == 2:
            L[i] = L[i] // 2
        i += 1
    return L
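The table can be cross-checked against a direct trial-factorization definition of \mu (a standalone sketch):

```python
def mu(n):
    # mu by trial factorization: 0 if a square divides n,
    # otherwise (-1)^(number of prime factors)
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

assert [mu(n) for n in range(1, 13)] == [1, -1, -1, 0, -1, 1, -1, 0, 0, 1, -1, 0]

# the classical summatory identity: sum over d | n of mu(d) is 1 iff n == 1
for n in range(1, 100):
    assert sum(mu(d) for d in range(1, n + 1) if n % d == 0) == (1 if n == 1 else 0)
```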

Problem of dice

The initial purpose of this post was to give a proper proof of a problem posted on Twitter by @jamestanton (it is hard within the 140-char limit), but the post was later extended to cover some other problems.

The Twitter-problem
Show that a four-sided die and a nine-sided die cannot be used to simulate the probability distribution of the product of outcomes of two six-sided dice. In the original wording:

Is there a 4-sided die & a 9-sided die that together roll each of the products 1,2,3,…,30,36 w the same prob as two ordinary 6-sided dice?

We make an argument by contradiction, considering only the possible outcomes without taking the actual probabilities into account.

Obviously, to reach the same outcomes as for two normal dice \{1,2,3,4,5,6\} \times \{1,2,3,4,5,6\}, we need both dice to have the identity \{1\} (otherwise, we will not be able to reach 1 \cdot 1 = 1). So, \{1,*,*,*\} \times \{1,*,*,*,*,*,*,*,*\}.

Now, consider the prime 5. It must be on both dice, or we would have \mathbb{P}(5^2\cdot b)>0, b>1. So, \{1,5,*,*\} \times \{1,5,*,*,*,*,*,*,*\}. Also, since 5 appears on both dice, neither die can contain a product of powers of the primes \{2,3,5\} (e.g., 2^2 \cdot 3) that does not appear on the original dice, because then impossible products could be reached.

Hence, 6 must be on both dice, giving \{1,5,6,*\} \times \{1,5,6,*,*,*,*,*,*\}. There are 6 sides left on the larger die but more even products to cover, so 2 must also be on each die: \{1,5,6,2\} \times \{1,5,6,2,*,*,*,*,*\}. Now, there is no space left for 3 on the smaller die. This means that 3^2 must be on the larger die, but then \mathbb{P}(3^2\cdot 5)>0, which is a contradiction.

(@FlashDiaz gave a shorter proof)

Project Euler 205

Peter has nine four-sided (pyramidal) dice, each with faces numbered 1, 2, 3, 4.
Colin has six six-sided (cubic) dice, each with faces numbered 1, 2, 3, 4, 5, 6.

Peter and Colin roll their dice and compare totals: the highest total wins. The result is a draw if the totals are equal.

What is the probability that Pyramidal Pete beats Cubic Colin? Give your answer rounded to seven decimal places in the form 0.abcdefg.

The probability functions of the nine four-sided dice and the six six-sided dice are given by the generating functions \frac{1}{4^9} \cdot (x^1+x^2+x^3+x^4)^9 and \frac{1}{6^6} \cdot (y^1+y^2+y^3+y^4+y^5+y^6)^6, respectively.

Let X_1,...,X_9 be i.i.d. random variables taking values in the range [1,4] and let Y_1,...,Y_6 be i.i.d. random variables taking values in the range [1,6]. We want to determine the probability \rho = \mathbb{P}(X_1+...+X_9 > Y_1+...+Y_6).

The distributions can be computed as

def rec_compute_dist(sides, nbr, side_sum):
    global dist
    if nbr == 1:
        for i in range(1, sides+1):
            dist[side_sum+i] += 1
    else:
        for i in range(1, sides+1):
            rec_compute_dist(sides, nbr-1, side_sum+i)

dist = [0]*37
rec_compute_dist(4, 9, 0)
dist_49 = dist

dist = [0]*37
rec_compute_dist(6, 6, 0)
dist_66 = dist

To determine \rho, we may express it as

\begin{array}{rl} \rho = & \sum_{t=6}^{36} \mathbb{P}(X_1+...+X_9 > t| Y_1+...+Y_6 = t)\cdot \mathbb{P}(Y_1+...+Y_6 = t) \\\\ = & \sum_{t=6}^{36} \mathbb{P}(X_1+...+X_9 > t)\cdot \mathbb{P}(Y_1+...+Y_6 = t) \end{array} .
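The whole computation also fits in a short standalone sketch, convolving the distributions iteratively instead of by recursion (function names are mine):

```python
def dist_of_sum(sides, count):
    # outcome counts for the sum of `count` dice with the given number of sides
    dist = {0: 1}
    for _ in range(count):
        new = {}
        for total, ways in dist.items():
            for face in range(1, sides + 1):
                new[total + face] = new.get(total + face, 0) + ways
        dist = new
    return dist

pete = dist_of_sum(4, 9)   # nine four-sided dice
colin = dist_of_sum(6, 6)  # six six-sided dice

wins = sum(wp * wc for sp, wp in pete.items()
                   for sc, wc in colin.items() if sp > sc)
rho = wins / (4 ** 9 * 6 ** 6)
print(round(rho, 7))
```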

Computing the sum using the following code,

probability = 0
for i in range(6,36+1):
    for j in range(i+1,36+1):
        probability += dist_66[i]*dist_49[j]

print 1.0 * probability/(6**6 * 4**9)

we obtain the answer. Great 🙂

Project Euler 240

There are 1111 ways in which five six-sided dice (sides numbered 1 to 6) can be rolled so that the top three sum to 15. Some examples are:

\begin{array}{rcl} D_1,D_2,D_3,D_4,D_5 &=& 4,3,6,3,5\\ D_1,D_2,D_3,D_4,D_5 &=& 4,3,3,5,6\\ D_1,D_2,D_3,D_4,D_5 &=& 3,3,3,6,6\\ D_1,D_2,D_3,D_4,D_5 &=& 6,6,3,3,3 \end{array}

In how many ways can twenty twelve-sided dice (sides numbered 1 to 12) be rolled so that the top ten sum to 70?
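The 1111 stated in the problem can be verified by brute force over all 6^5 = 7776 rolls:

```python
from itertools import product

# count the rolls of five d6 whose top three faces sum to 15
count = sum(1 for roll in product(range(1, 7), repeat=5)
            if sum(sorted(roll)[-3:]) == 15)
print(count)
```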

Let us first consider the simpler problem of the top ten dice, \left\{ d_1+\cdots+d_{10}=70 \right\}; the remaining ten dice are then restricted to values less than or equal to the minimum value among the top ten. Let n_i denote the number of i's among the top ten. Then n_1 \cdot 1 + n_2 \cdot 2 + \cdots + n_{12} \cdot 12 = 70, where n_1 + n_2 + \cdots + n_{12} = 10, n_i \geq 0.

All histograms of top-ten dice can be computed with

from copy import copy
d = [0] * 12
possible = []

def rec_compute(i, j, sum):
    global d
    if j == 0:
        if sum == 70:
            possible.append(copy(d))
        return
    while i > 0:
        if sum + i <= 70:
            d[i - 1] += 1
            rec_compute(i, j - 1, sum + i)
            d[i - 1] -= 1
        i -= 1

rec_compute(12, 10, 0)

The code exhausts all solutions in 200ms. Call any solution H. For instance H = [0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0]. The remaining dice can take any values in the range [1, j], where j is the left-most non-zero index (starting from 1). The number of configurations for this particular solution is then given by

20! \cdot \left((10+H_7)!H_6!H_5!H_4!H_3!H_2!H_1!\right)^{-1}, where \sum^7_1 H_i = 10.

Unfortunately, there is no good analytical way of computing this. So, the easiest way is to enumerate all possible H_i. Disregarding H_7, we count all permutations of a given histogram in the same way (hence, we can make the search space a lot smaller) and then use the multiplicity to determine the exact number. All in all, the following code gives our answer:

from math import factorial as fact
from functools import reduce

DICE, REMAINING = 20, 10

def configurations(i, j, x, s, l):
    if sum(x) == s:
        # we can stop here as the sum cannot get smaller;
        # multiplicity = number of ways to assign the parts in x to l faces
        multiplicity = fact(l) / fact(l-len(x)) / \
                       reduce(lambda m, n: m * n, \
                       [fact(x.count(y)) for y in set(x)])

        return fact(DICE) * multiplicity / \
               reduce(lambda m, n: m * n, \
               [fact(y) for y in x])

    if j == 0 or i == 0: return 0
    return configurations(i-1, j, x, s, l) + \
           configurations(i, j-1, x + [i], s, l)

S = 0
for H in possible:
    min_index = next((i for i, \
                x in enumerate(H) if x), None)

    for j in range(0, REMAINING+1):
        u = reduce(lambda m, n: m * n, \
            [fact(y) for y in H])

        # mutually exclusive -- may add instead of union
        if j < REMAINING:
            q = configurations(REMAINING-j, min_index, \
                [], REMAINING-j, min_index) / u
        else:
            q = fact(DICE) / u
        H[min_index] += 1
        S += q
print S

Breaking affine ciphers – a matrix approach

An affine cipher is one of the remnants of classical cryptography, easily broken with today's computational power. The cipher defines a symbol mapping f :\{A,B,\ldots\} \to \mathbb{Z}_n. Each ciphertext symbol is then computed as y = a \cdot x + b, where a \in \mathbb{Z}^*_n and b \in \mathbb{Z}_n. Decryption is done by computing x = (y - b) \cdot a^{-1}.
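Concretely, for n = 26 and the plain A-Z alphabet, the cipher may be sketched as follows (a standalone illustration; the key (5, 8) is arbitrary):

```python
N = 26  # alphabet size

def encrypt(msg, a, b):
    # y = a*x + b (mod N), letters mapped A=0, ..., Z=25
    return ''.join(chr((a * (ord(c) - 65) + b) % N + 65) for c in msg)

def decrypt(ct, a, b):
    a_inv = pow(a, -1, N)  # modular inverse; requires gcd(a, N) = 1
    return ''.join(chr((a_inv * (ord(c) - 65 - b)) % N + 65) for c in ct)

ct = encrypt('ATTACKATDAWN', 5, 8)
assert decrypt(ct, 5, 8) == 'ATTACKATDAWN'
print(ct)
```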

In this blog post, I will show how to break this cipher faster than by trying all keys. Let us first sketch the general idea. Given an expected distribution \hat{P} of the symbols and an observed distribution P, the integral \int (\hat{P}(x) - P(x))^2 dx defines a statistical distance between the distributions (corresponding to the squared Euclidean distance), which we would like to minimize over the key. Now, clearly

(\hat{P}(x) - P(x))^2 = \hat{P}(x)^2 - 2\hat{P}(x)P(x) + P(x)^2.

Trivially, \hat{P}(x)^2 and P(x)^2 remain constant over any key pair (a,b), so instead of minimizing the above, we can maximize the cross term \hat{P}(x)P(x). Therefore, the minimization problem can be turned into the maximization problem \max_{a,b} \int \hat{P}(x)P_{a,b}(x) dx. Cool.

In terms of our cipher, which is discrete, the maximization problem is a sum, \max_{a,b} \sum \hat{P}(x)P_{a,b}(x). The observant reader may notice that this looks like an entry of a matrix product. There is just one caveat: both key parameters act on the same distribution. There is an easy way to get around this. Instead of applying both transformations to P, we may split them between the two distributions. So by instead computing

\max_{a,b} \sum \hat{P}_a(x) P_{b}(x),

we have achieved what we desired. This means that we shuffle \hat{P} with a and P with b. Let us express this in Python. The expected distribution of the alphabet ABCDEFGHIJKLMNOPQRSTUVWXYZ ,. may be as follows (depending on the observation):

P_hat = {' ': 0.05985783763561542, ',': 0.0037411148522259637, '.': 0.0028058361391694723, 'A': 0.0764122708567153, 'C': 0.02600074822297044, 'B': 0.012065095398428732, 'E': 0.11878039655817432, 'D': 0.03974934530490086, 'G': 0.018892630003741116, 'F': 0.020856715301159744, 'I': 0.0651889263000374, 'H': 0.05695847362514029, 'K': 0.00720164609053498, 'J': 0.0014029180695847362, 'M': 0.02254021698466143, 'L': 0.03769173213617658, 'O': 0.07023943135054246, 'N': 0.06313131313131314, 'Q': 0.0009352787130564909, 'P': 0.01805087916199027, 'S': 0.05920314253647587, 'R': 0.0560231949120838, 'U': 0.025813692480359144, 'T': 0.08473625140291807, 'W': 0.022072577628133184, 'V': 0.00916573138795361, 'Y': 0.01842499064721287, 'X': 0.0014029180695847362, 'Z': 0.0006546950991395436}

The transformations are done by computing the matrices

# compute first matrix for transformed P_hat
for i in range(1, N):
    for element in priori_dist:
        X[i, (look_up.index(element) * i) % N] = priori_dist[element]

# compute second matrix for transformed P
for j in range(N):
    for element in dist:
        Y[(look_up.index(element) - j) % N, j] = dist[element]

Here, the ith row in X corresponds to \hat{P} transformed by a = i. Moreover, the jth column in Y corresponds to P transformed by b = j.

For some distribution, they may look like


As we can see, Y is only shifted (by the addition of b), while in X the indices are reordered by multiplication with the row index i. Taking advantage of the matrix multiplication property, we may now compute Z=XY.


Any entry in Z is Z_{a,b} = \sum_x X_{a,x} Y_{x,b} so finding a maximum element in Z is equivalent to saying

\max_{a,b} \sum_x X_{a,x} Y_{x,b}.

Looks familiar? It should. This is our maximization problem, which we stated earlier.

Therefore, we may solve the problem using

Z =, Y)
a, b = numpy.unravel_index(Z.argmax(), Z.shape)

This breaks the affine cipher.
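To see the whole pipeline end to end, here is a small pure-Python mock-up of the matrix maximization (no numpy; the distribution and the key are made up for the illustration):

```python
N = 26

# a made-up, strictly decreasing plaintext distribution over Z_26
weights = [2 * N - x for x in range(N)]
total = sum(weights)
P_hat = [w / total for w in weights]

a_true, b_true = 11, 7          # encryption key; gcd(11, 26) = 1
P = [0.0] * N                   # induced ciphertext distribution
for x in range(N):
    P[(a_true * x + b_true) % N] += P_hat[x]

# X[i][(x * i) % N] = P_hat(x): P_hat shuffled by the multiplier a = i
X = [[0.0] * N for _ in range(N)]
for i in range(1, N):
    for x in range(N):
        X[i][(x * i) % N] = P_hat[x]

# Y[(y - j) % N][j] = P(y): P shifted by the offset b = j
Y = [[0.0] * N for _ in range(N)]
for j in range(N):
    for y in range(N):
        Y[(y - j) % N][j] = P[y]

# Z = XY; the maximal entry of Z sits at the key (a, b)
best, key = -1.0, None
for i in range(1, N):
    for j in range(N):
        z = sum(X[i][m] * Y[m][j] for m in range(N))
        if z > best:
            best, key = z, (i, j)

print(key)
```

Since the ciphertext distribution here is an exact permutation of P_hat, the correct key aligns the two distributions perfectly and, by the rearrangement inequality, strictly dominates every other entry of Z.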

Some notes on complexity

So, what is the complexity of the matrix approach? Computing the matrices takes O(N^2) modular operations. The matrix multiplication naively takes O(N^3) operations, but for large N this can be done faster: for instance, Strassen's algorithm takes O(N^{2.807}), and even faster algorithms exist. Also, taking advantage of symmetry and structure could probably decrease the complexity further. The multiplication dominates, so this is the total complexity of the approach.

Compare this with brute-force guessing of the key (taking O(N^2) guesses), where for each guess we compute the distance in O(N) operations, which in total yields O(N^3). It should be noted that the complexity of this approach may be reduced by trying the candidates (a,b) in an order that minimizes the expected number of tries.

An example implementation can be found on GitHub.

Custom terminal for Vagrant/SSH

Short story: I wanted to distinguish my terminal windows between local sessions, ssh sessions and vagrant sessions.


set_th () {
  osascript -e "tell app \"Terminal\" to set current settings of first window to settings set \"$1\""
}

set_id () {
  osascript -e "tell app \"Terminal\" to set current settings of first window to $1 $2 $3 $4" #$@ does not work!
}

get_id () {
  cur_id=$(osascript -e "tell app \"Terminal\" to get current settings of first window")
}

ssh () {
    get_id
    set_th $SSH_THEME
    /usr/bin/ssh "$@"
    set_id $cur_id
}

vagrant () {
    if [ "$1" = "ssh" ]; then
        get_id
        set_th $VAGRANT_THEME
        /opt/vagrant/bin/vagrant "$@"
        set_id $cur_id
    else
        /opt/vagrant/bin/vagrant "$@"
    fi
}
The code saves the current theme in a temporary variable before switching. So, when the session ends, the terminal switches back to the original theme instead of a fixed one.
Putting the above code in your .bash_profile gives the following nice behavior:

Color coding your sessions is a great way to visualize things and make sure you do not take any unwanted action by mistake 🙂

Of course, the code can be used to wrap any application. For instance, one could use it to make the interactive interpreter of Python/Sage or terminal sessions using torsocks appear in different colors or even fonts.

Re-mapping KBT Pure Pro in OS X

For my everyday-use computer, I use a modded KBT Pure Pro; this is a small mechanical keyboard with an aluminium base and backlighting, perfect for programming and typing. The size of the keyboard is 60 % of a normal one, making it suitable for spatially constrained workspaces. In my experience, it is also more ergonomic. Below is a comparison of the Pure Pro and a wireless Apple keyboard. For those in the process of buying a keyboard, I recommend this one 🙂


For quite a while, I have used Linux on this computer. But after installing OS X, the keyboard map went wack, so to speak: many keys were mapped incorrectly. Using Ukelele, I created a customized layout with correct mappings (don't mind the duplicate keys):


The layout covers all keys and can be found here. NOTE: this is a layout for KBT Pure Pro with British ISO layout and not ANSI.

BackdoorCTF16 – Collision Course

With 350 points and a description as follows:

In today’s world, hash collisions are becoming more and more popular. That is why, one must rely on standardized hashing techniques, such as bcrypt. However, n00bster shall never learn, and he has implemented his own hash function which he proudly calls foobar. Attached is an implementation of the hash function and a file with which you are supposed to find a collision. He believes that you will not be able to find a collision for the file, especially since he hasn’t even given you the hashing algorithm, but has packaged it as a black box application. Prove to him that he is wrong.

Note: Multiple collisions are possible, but only one of them is a valid flag.
You will realize you’ve gotten it once you do.

The hash of the given file is:

9513aaa552e32e2cad6233c4f13a728a5c5b8fc879febfa9cb39d71cf48815e10ef77664050388a3

So, we start off by looking at the binary. Using Hopper, we obtain the following pseudo code by decompilation:

int hash(int input) {
    eax = _rotr(input ^ 0x24f50094, 
               (input ^ 0x24f50094) & 0xf);
    eax = _rotl(eax + 0x2219ab34, 
                eax + 0x2219ab34 & 0xf);
    eax = eax * 0x69a2c4fe;
    return eax;
}

int main() {
    // cleaned-up decompiler output: the state is XORed with the
    // hash of each 32-bit input block, rotated, and printed
    esp = (esp & 0xfffffff0) - 0x20;
    printf("\nBar:");
    while (*(esp + 0x10) < *(esp + 0x18)) {
        eax = *(input + *(esp + 0x10) * 0x4);
        *(esp + 0x14) = *(esp + 0x14) ^ hash(eax);
        *(esp + 0x14) = _rotr(*(esp + 0x14), 0x7);
        printf("%08lx", *(esp + 0x14));
        *(esp + 0x10) = *(esp + 0x10) + 0x1;
    }
    eax = putchar(0xa);
    return eax;
}
We can sketch the above code as a block scheme: each 32-bit input block is run through hash(), XORed into the running state, and the state is rotated 7 steps before being printed.

The first thing to note is that we can find an infinite number of collisions just by appending arbitrary data after the 10 blocks. However, this is not interesting to us, although it completely defeats the conditions for a safe cryptographic hash function.

This Merkle-Damgård-like structure allows us to solve the blocks iteratively, starting from the first. Here is how. For the first block, we find an input x to the function H such that H(x), rotated 7 steps, equals block 0 (here denoted B_0). Hence, the problem we solve is to find an x such that H(x) \ll 7 = B_0. This is a simple thing for Z3. Then, we take the next block and solve (H(x) \oplus B_0) \ll 7 = B_1 for x, and so forth. Implemented in Python/Z3, it may look like the following:

from z3 import *
import binascii, string, itertools
bits = 32
mask = 2**bits - 1
allowed_chars = string.printable
def convert_to_hex(s):
   return ''.join([hex(ord(x))[2:].zfill(2) for x in s[::-1]])
def convert_to_string(h):
   return ''.join([chr(int(x, 16)) for x in list(map(''.join, zip(*[iter(hex(h)[2:])]*2)))[::-1]])
def rot(val, steps):
   return (val << (bits-steps)) | LShR(val, steps)
def hash_foobar(input):
   eax = rot(input ^ 0x24f50094, (input ^ 0x24f50094) & 0xf)
   eax = rot(eax + 0x2219ab34, bits - (eax + 0x2219ab34 & 0xf))
   eax = eax * 0x69a2c4fe
   return eax & mask
def break_iteratively(hashdata, i):
   if i == 0: prev_block = 0
   else: prev_block = hashdata[i-1]
   s = Solver()
   j = BitVec('current_block', bits)
   eax = rot(prev_block ^ hash_foobar(j), 7)
   s.add(eax == hashdata[i])
   block_preimages = []
   while s.check() == sat:
       sol = s.model()
       s.add(j != sol[j].as_long())
       block_string = convert_to_string(sol[j].as_long())
       if all(c in allowed_chars for c in block_string):
           block_preimages.append(block_string)
   return block_preimages
known = '9513aaa552e32e2cad6233c4f13a728a5c5b8fc879febfa9cb39d71cf48815e10ef77664050388a3' # this the hash of the file
data = list(map(''.join, zip(*[iter(known)]*8)))
hashdata = [int(x, 16) for x in data]
print '[+] Hash:', ''.join(data)
print '[+] Found potential hashes:\n'
for x in itertools.product(*[break_iteratively(hashdata, i) for i in range(10)]):
   print ' * ' + ''.join(x)

This code is surprisingly fast, thanks to Z3, and runs in 0.3 seconds. Taking all possible collisions into consideration…

[+] Hash: 9513aaa552e32e2cad6233c4f13a728a5c5b8fc879febfa9cb39d71cf48815e10ef77664050388a3
[+] Found potential hashes:

 * CTFEC0nstra1nts_m4keth_fl4g}
 * CTFEC0nstra1nts_m4keth_nl4g}
 * CTFEC0nstra1nws_m4keth_fl4g}
 * CTFEC0nstra1nws_m4keth_nl4g}
 * CTFEC0nstra9nts_m4keth_fl4g}
 * CTFEC0nstra9nts_m4keth_nl4g}
 * CTFEC0nstra9nws_m4keth_fl4g}
 * CTFEC0nstra9nws_m4keth_nl4g}
 * CTF{C0nstra1nts_m4keth_fl4g}
 * CTF{C0nstra1nts_m4keth_nl4g}
 * CTF{C0nstra1nws_m4keth_fl4g}
 * CTF{C0nstra1nws_m4keth_nl4g}
 * CTF{C0nstra9nts_m4keth_fl4g}
 * CTF{C0nstra9nts_m4keth_nl4g}
 * CTF{C0nstra9nws_m4keth_fl4g}
 * CTF{C0nstra9nws_m4keth_nl4g}

…we finally conclude that the flag is the SHA-256 of C0nstra1nts_m4keth_fl4g.

BackdoorCTF16 – Baby

Worth 200 points, this challenge was presented with the following:

z3r0c00l has a safe repository of files. The filename is signed using z3r0c00l’s private key (using the PKCS-1 standard). Anyone willing to read a file, has to ask for a signature from z3r0c00l. But z3r0c00l is currently unavailable.

Can you still access a file named “flag” on z3rc00l’s repository?

nc 9001

Let us take a look at the public key… 3072 bits and public exponent e = 3. Hmm… a small exponent is usually not good practice. First, I tried computing the roots of x^3 - s \bmod n, where s is the signature and n is the modulus, but then I realized that this was not the way to go. What if we use a non-modular cube root, plain old Babylonian style? After looking around, I also realized that this is Bleichenbacher's e = 3 attack, which I probably should have known about. There is a lot of information about this attack (therefore, I will not describe it here) and, of course, lots of people have already written code for it. Being lazy/efficient, I rewrote a functional piece of code into the following:

from libnum import *
from gmpy2 import mpz, iroot, powmod, mul, t_mod
import hashlib, binascii, rsa, os
def get_bit(n, b):
    """ Returns the b-th rightmost bit of n """
    return ((1 << b) & n) >> b
def set_bit(n, b, x):
    """ Returns n with the b-th rightmost bit set to x """
    if x == 0: return ~(1 << b) & n
    if x == 1: return (1 << b) | n
def cube_root(n):
    return int(iroot(mpz(n), 3)[0])
snelhest = hashlib.sha256('flag')
ASN1_blob = rsa.pkcs1.HASH_ASN1['SHA-256']
suffix = b'\x00' + ASN1_blob + snelhest.digest()
sig_suffix = 1
for b in range(len(suffix)*8):
    if get_bit(sig_suffix ** 3, b) != get_bit(s2n(suffix), b):
        sig_suffix = set_bit(sig_suffix, b, 1)
while True:
    prefix = b'\x00\x01' + os.urandom(3072//8 - 2)
    sig_prefix = n2s(cube_root(s2n(prefix)))[:-len(suffix)] + b'\x00' * len(suffix)
    sig = sig_prefix[:-len(suffix)] + n2s(sig_suffix)
    if b'\x00' not in n2s(s2n(sig) ** 3)[:-len(suffix)]: break
print hex(s2n(sig))[2:-1]

OK, so let's try it. The service accepts the forged signature, and we get access to the file.