After fiddling with Project Euler, I started to wonder how to compute the sum of divisors in the most efficient way (albeit, most the problems in PE does not require any super-sophisticated solution). The trivial approach is doing division and adding up the divisors sequentially. This may be good enough for small , but as grows this approach gives an infeasible time complexity.

The next thing one may think of is to generate a list of prime numbers (using a sieve) and for each number try all the numbers in the list. This is faster than the trivial approach, but still requires . Since the sieve requires time , we do not need to do better than that. But can we match it? Turns out we can.

First, we need to establish a simple number theoretical fact: the is the divisor counting function which has the properties (and, thus, ) and .

The solution is quite simple; consider the first element in the prime set . That is . We split the range into even numbers and for even compute

.

Then for , compute , and so forth up to . The requires operations.

Running it for all powers and primes, we use in total operations. Now, we have . So the complexity is . This is a harmonic series and treating it as an integral, we have . Of course, we do only generate primes up to , so the final complexity is . So we are done 🙂

As always, let us look at a Python representation:

def sum_of_divisors(N): N += 1 # extend limit to inclusive primes = primetools.sieve(N) numbers = [1]*N for q in primes: tmp = {} mult = q while mult <= N: for i in range(mult, N, mult): if i in tmp: tmp[i] += mult else: tmp[i] = 1 + mult mult *= q for j in tmp: numbers[j] *= tmp[j] return numbers[2:]

Below is a plot of the timing of the code. Since the code stores data in memory, some undesired effects comes into play for largers values (due to the need of shuffling data back and forth) and is therefore not present in the graph.

Bounded by the sieving complexity, the algoritm cannot be any faster using the sieve. Can we do better? Note that the algorithm returns a list of sums. If we are really interested in the sum of sums, we may instead compute , which is and uses basically no memory. This can be written as

def sum_of_sum_of_divisors(N): div_sum = 0 for i in range(1, N): div_sum += i * (N / i) return div_sum

How does it work? For each , it computes the cardinality of the set of numbers divisible by ,i.e., , which is exactly . This contributes with for every element, so the total contribution is . Doing it for all possible divisors, we end up with the sum of sum of divisors.

Turns out, we can actually do it in . Here is how:

def square_root_time_sum_of_sum_of_divisors(N): global dcached if N in dcached: return dcached[N] div_sum, i = 0, 1 q = int(math.sqrt(N)) while i <= q: div_sum += (i * (N / i)) i += 1 i = 1 while i <= N / (q + 1): m = N / i k = N / (i + 1) div_sum += (i * (m * (m + 1) - k * (k + 1)) / 2) i += 1 dcached[N] = div_sum return div_sum

Consider the below example. In the left table, we have the sum and in the right table where :

Clearly, it is the same table/matrix but transposed, so the sum of elements are the same. As we notice in the matrices above, their rows become sparser for increasing . Think about it! What if we were to compute only the bold numbers and adding them? Then we need only two function calls each for left and right. The point where we stop computing left and start to compute right can be chosen arbitrarily, but the optimal is . Then .

**Example: Project Euler 23**

This is a simple problem, where we are supposed to compute all numbers which cannot be expressed as a sum of two abundant numbers. Using the above code, we can solve the problem in about 100 ms using pypy:

def abundant_numbers(N): numbers = sum_of_divisors(N) abundants = [] for i in range(0, N): if 2*i < numbers[i]: abundants.append(i) return abundants[1:] N = 28123 abundants = abundant_numbers(N) s = [0]*N for x in abundants: for y in abundants: ls = x + y if ls < N: s[ls] = 1 else: break print sum(i for i in range(0, N) if not s[i])

**Example: Project Euler 439**

This problem is quite nice, because it requires a very well-thoughtout approach for it to be feasible solving it (it is also among the hardest problems on PE). Here is the question (notation slightly altered):

Let be the sum of all divisors of . We define the function . For example,

You are given that and . Find .

So, running our algorithm would be fine for lower values, but if we would try to store values in memory, we will run into a wall very fast (this is about TBs assuming that each number is a 64-bit integer – note the modular reduction which can be done in every step). Clearly, we need to use the memoryless approach.

Let us first disregard the fact that only holds when . Then . The case of , we will have

OK, so what about ? We can estimate the how many numbers in the product for which it that holds . Take, . We have , so

In particular, . Let us consider the case . The (negative) contribution from this set

.

Running over all possible gcds (all numbers ), we get

The following code computes the smaller results in matter of milliseconds, but for , it takes too long (as we are aiming for the sub-minute mark):

cached = {} dcached = {} mod = 10**9 def S(N): global cached if N in cached: return cached[N] dv = square_root_time_sum_of_sum_of_divisors(N) S_sum, i = 0, 2 while i <= N: # max recursion depth log2(10^11) ~ 33 S_sum += (i * S(N / i)) % mod i += 1 cached[N] = (dv * dv - S_sum) % mod return cached[N]

OK. Further optimization needed… note that

Using the same approach as before, we can derive that

where is the Möbius function, defined

We can compute this in time. The Möbius function can be pre-computed in like so:

def mobius(N): L = [1]*(N+1) i = 2 while i <= N: # if i is not sq. free, then no multiple is if L[i] == 1 or L[i] == -1: L[i] = -L[i] sq = i * i j = i + i while j <= N: # we use 2 to mark non-primes if abs(L[j]) == 2: L[j] = -L[j] else: L[j] = -L[j]*2 j += i j = sq while j <= N: L[j] = 0 j += sq i += 1 i = 2 while i <= (N): if abs(L[i]) == 2: L[i] = L[i] / 2 i += 1 return L