Computing sum of divisors for numbers up to N

After fiddling with Project Euler, I started to wonder how to compute the sum of divisors in the most efficient way (albeit, most the problems in PE does not require any super-sophisticated solution). The trivial approach is doing division and adding up the divisors sequentially. This may be good enough for small N, but as N grows this approach gives an infeasible time complexity.

The next thing one may think of is to generate a list of prime numbers (using a sieve) and for each number try all the numbers in the list. This is faster than the trivial approach, but still requires cN \cdot |\mathcal{P}_N| = cN \cdot \pi(N) \sim \mathcal{O}(N^2/\log N). Since the sieve requires time \mathcal{O}(N\cdot \log N), we do not need to do better than that. But can we match it? Turns out we can.

First, we need to establish a simple number theoretical fact: the \sigma(n) is the divisor counting function which has the properties \sigma(p^i) = 1 + p + \cdots p^i (and, thus, \sigma(p^i) = \sigma(p^{i-1}) + p^i) and  \sigma(p^i \cdot q^j) = \sigma(p^i) \cdot \sigma(q^j).

The solution is quite simple; consider the first element in the prime set \mathcal{P}. That is 2. We split the range into even numbers and for even  i compute

\sigma(n_i) = \sigma(2\cdot m) = \underbrace{\sigma(2)}_{\text{store this in }x_i} \cdot ~\sigma(m).

Then for 2^2, compute  \sigma(2^2) = \sigma(2) + 2^2, and so forth up to 2^i < N. The requires N/2 + N/4 + \cdots operations.

\begin{array}{cccccccccccc} 1 & 2 & 3& 4 &5& 6 &7& 8 &9& 10 &11& 12 \\ & \sigma(2) & & \sigma(2) & & \sigma(2) & & \sigma(2) & & \sigma(2) & & \sigma(2) \\ & & & +& & &&+& & && + \\ & & & 2^2& & && 2^2& & && 2^2 \\ & & & & & &&+& & && \\ & & & & & && 2^3& & && \\ \hline & \sigma(2) & & \sigma(4) & & \sigma(2) & & \sigma(8) & & \sigma(2) & & \sigma(4) \end{array}

Running it for all powers and primes, we use in total N \cdot (\sum_{p\in\mathcal{P}} \frac{1}{p} + \sum_{p\in\mathcal{P}} \frac{1}{p^2} + \cdots) operations. Now, we have \sum_{i > 1} \frac{1}{p^i} = \frac{1}{p(p-1)}. So the complexity is \mathcal{O}(N \cdot \sum_{p\in\mathcal{P}} \frac{1}{p}). This is a harmonic series and treating it as an integral, we have \int_{p\in\mathcal{P}} \frac{1}{p} < \log {p}. Of course, we do only generate primes up to N, so the final complexity is \mathcal{O}(N \cdot \log N). So we are done 🙂

As always, let us look at a Python representation:

def sum_of_divisors(N):
    N += 1 # extend limit to inclusive
    primes = primetools.sieve(N)
    numbers = [1]*N
    for q in primes:
        tmp = {}
        mult = q
        while mult <= N:
            for i in range(mult, N, mult):
                if i in tmp: tmp[i] += mult
                else: tmp[i] = 1 + mult
            mult *= q
        for j in tmp: numbers[j] *= tmp[j]
    return numbers[2:]

Below is a plot of the timing of the code. Since the code stores data in memory, some undesired effects comes into play for largers values (due to the need of shuffling data back and forth) and is therefore not present in the graph.

timing_figure_1

Bounded by the sieving complexity, the algoritm cannot be any faster using the sieve. Can we do better? Note that the algorithm returns a list of sums. If we are really interested in the sum of sums, we may instead compute \sum_{i}^{N}\sigma(i) = \sum_{i}^{N} i \cdot \lfloor \frac{N}{i} \rfloor, which is \mathcal{O}(N) and uses basically no memory. This can be written as

def sum_of_sum_of_divisors(N):
    div_sum = 0
    for i in range(1, N):
        div_sum += i * (N / i)
    return div_sum

How does it work? For each i, it computes the cardinality of the set of numbers divisible by i,i.e., \sum_{i|j, 1 \leq j \leq N} i, which is exactly  \lfloor \frac{N}{i} \rfloor. This contributes with i for every element, so the total contribution is  i\lfloor \frac{N}{i} \rfloor. Doing it for all possible divisors, we end up with the sum of sum of divisors.

Turns out, we can actually do it in \mathcal{O}(\sqrt N). Here is how:

def square_root_time_sum_of_sum_of_divisors(N):
    global dcached
    if N in dcached: return dcached[N]
    div_sum, i = 0, 1
    q = int(math.sqrt(N))
    while i <= q:
        div_sum += (i * (N / i))
        i += 1
    i = 1
    while i <= N / (q + 1):
        m = N / i
        k = N / (i + 1)
        div_sum += (i * (m * (m + 1) - k * (k + 1)) / 2)
        i += 1
    dcached[N] = div_sum
    return div_sum

Consider the below example. In the left table, we have the sum \sum_{i=1}^8 i \cdot \lfloor \frac{8}{i} \rfloor and in the right table \sum_{i=1}^8T(\lfloor 8/i\rfloor) where T(n) = \frac12 \cdot n(n+1):

\begin{array}{c | cc cc cc cc | c} i & & & & & & & & & i \cdot \lfloor \frac{8}{i} \rfloor\\ \hline 1 & \bf{1} & \bf{1} & \bf{1} & \bf{1} & \bf{1} & \bf{1} & \bf{1} & \bf{1} & 8 \cdot 1\\ 2 & \bf{2} & \bf{2} & \bf 2 & \bf 2 & & & & & 2 \cdot 4\\\hline 3 & 3 & 3 & & & & & & & 3 \cdot 2\\ 4 & 4 & 4 & & & & & & & 4 \cdot 2\\ 5 & 5 & & & & & & & & 5 \cdot 1\\ 6 & 6 & & & & & & & & 6 \cdot 1\\ 7 & 7 & & & & & & & & 7 \cdot 1\\ 8 & 8 & & & & & & & & 8 \cdot 1\\ \end{array} \iff \begin{array}{c | cc |cc cc cc | c} i & & & & & & & & & T(\lfloor \frac{8}{i} \rfloor)\\ \hline 1 & 1 & 2 &\bf 3 & \bf 4 & \bf 5 & \bf 6 & \bf 7 & \bf 8 & \frac12 \cdot 9 \cdot 8\\ 2 & 1 & 2 &\bf 3 & \bf 4 & & & & & \frac12 \cdot 5\cdot 4\\ 3 & 1 & 2 & & & & & & & \frac12 \cdot 3\cdot 2\\ 4 & 1 & 2 & & & & & & & \frac12 \cdot 3\cdot 2\\ 5 & 1 & & & & & & & & 1\\ 6 & 1 & & & & & & & & 1\\ 7 & 1 & & & & & & & & 1\\ 8 & 1 & & & & & & & & 1\\ \end{array}

Clearly, it is the same table/matrix but transposed, so the sum of elements are the same. As we notice in the matrices above, their rows become sparser for increasing i. Think about it! What if we were to compute only the bold numbers and adding them? Then we need only two function calls each for left and right. The point where we stop computing left and start to compute right can be chosen arbitrarily, but the optimal is \sqrt{N}. Then \lfloor \frac{N}{\sqrt{N}+1}\rfloor + \lfloor \sqrt{N} \rfloor \approx 2\sqrt{N}.

Example: Project Euler 23
This is a simple problem, where we are supposed to compute all numbers which cannot be expressed as a sum of two abundant numbers. Using the above code, we can solve the problem in about 100 ms using pypy:

def abundant_numbers(N):
    numbers = sum_of_divisors(N)
    abundants = []
    for i in range(0, N):
        if 2*i < numbers[i]: abundants.append(i)
    return abundants[1:]

N = 28123

abundants = abundant_numbers(N)
s = [0]*N

for x in abundants:
    for y in abundants:
        ls = x + y
        if ls < N: s[ls] = 1
        else: break

print sum(i for i in range(0, N) if not s[i])

Example: Project Euler 439
This problem is quite nice, because it requires a very well-thoughtout approach for it to be feasible solving it (it is also among the hardest problems on PE). Here is the question (notation slightly altered):

Let \sigma(k) be the sum of all divisors of k. We define the function S(N) = \sum_{1 \leq i \leq N}\sum_{1 \leq j \leq N} d(i \cdot j). For example, S(3) =\sigma(1) +\sigma(2) +\sigma(3) +\sigma(2) +\sigma(4) +\sigma(6) +\sigma(3) +\sigma(6) +\sigma(9) = 59

You are given that S(10^3) = 563576517282 and S(10^5) \bmod 10^9= 215766508. Find S(10^{11}) \bmod 10^9.

So, running our algorithm would be fine for lower values, but if we would try to store (10^{11})^2 \approx 2^{73} values in memory, we will run into a wall very fast (this is about 7.5 \cdot 10^{10} TBs assuming that each number is a 64-bit integer – note the modular reduction which can be done in every step). Clearly, we need to use the memoryless approach.

Let us first disregard the fact that \sigma(i \cdot j) =\sigma(i) \cdot\sigma(j) only holds when \gcd(i,j) = 1. Then S(N) = (\sum_{i}^{N} i \cdot \lfloor \frac{n}{i} \rfloor)^2 - C(N). The case of S(3), we will have

\begin{array}{rcccccc} (\sigma(1) +\sigma(2) +\sigma(3))^2 = & \sigma(1)\cdot\sigma(1) &+& \sigma(1)\cdot\sigma(2) &+& \sigma(1)\cdot\sigma(3) & + \\ &\sigma(2)\cdot\sigma(1) &+& \sigma(2)\cdot\sigma(2) &+& \sigma(2)\cdot\sigma(3) & +\\&\sigma(3)\cdot\sigma(1) &+& \sigma(3)\cdot\sigma(2) &+& \sigma(3)\cdot\sigma(3)\\\\ = & \sigma(1\cdot 1) &+& \sigma(1 \cdot 2) &+& \sigma(1 \cdot 3)&+\\&\sigma(2 \cdot 1) &+& \underline{\sigma(2)\cdot\sigma(2)} &+& \sigma(2 \cdot 3)&+\\ &\sigma(3 \cdot 1) &+&\sigma(3 \cdot 2) &+& \underline{\sigma(3)\cdot\sigma(3)} \end{array}

OK, so what about C(N)? We can estimate the how many numbers in the product for which it that holds \gcd(i,j) > 1. Take, \sigma(p^i)\cdot\sigma(p^j) - \sigma(p^{i+j}). We have \sigma(p^i) = (p^{i+1}-1)/(p-1), so

\sigma(p^i)\cdot\sigma(p^j) - \sigma(p^{i+j}) = \frac{(p^{i+1}-1)(p^{j+1}-1) - p^{i+j+1}-1}{p-1} = \frac{p^{i+j+2}- p^{i+j+1}-p^{i+1}-p^{j+1}}{p-1}

In particular, \sigma(p)\cdot\sigma(p) - \sigma(p^2) = p. Let us consider the case \gcd(i,j) = p. The (negative) contribution from this set

p \cdot  \underbrace{\left[ \left(\sum_{i=2}^Ni\left\lfloor \frac{N}{i \cdot p} \right\rfloor\right)^2 - C({N}/{p})\right]}_{=S(N/p)}.

Running over all possible gcds (all numbers p), we get

S(N) = \left(\sum_{i=1}^{N} i \cdot \lfloor \frac{N}{i} \rfloor\right)^2 - \underbrace{\sum_{p=2}^N p \cdot S(\lfloor N/p \rfloor)}_{C(N)}

The following code computes the smaller results in matter of milliseconds, but for N = 10^{11}, it takes too long (as we are aiming for the sub-minute mark):

cached = {}
dcached = {}
mod = 10**9

def S(N):
    global cached
    if N in cached: return cached[N]
    dv = square_root_time_sum_of_sum_of_divisors(N)
    S_sum, i = 0, 2
    while i <= N:
        # max recursion depth log2(10^11) ~ 33
        S_sum += (i * S(N / i)) % mod
        i += 1
    cached[N] = (dv * dv - S_sum) % mod
    return cached[N]

OK. Further optimization needed… note that

\begin{array}{rcc} S(3) = & 1 \cdot (\sigma(1/1) +\sigma(2/1) +\sigma(3/1))^2 & - \\ & 2 \cdot (\sigma(1/2) +\sigma(2/2) +\sigma(3/2))^2 &- \\ & 3 \cdot (\sigma(1/3) +\sigma(2/3) +\sigma(3/3))^2 \end{array}

Using the same approach as before, we can derive that

S(N) = \sum_{i=1}^N i \cdot \mu(i) \cdot \left(\sum_{j=1}^{N} j \cdot \lfloor \frac{N}{i \cdot j} \rfloor\right)^2

where \mu(\cdot) is the Möbius function, defined

\mu(n) = \left\{\begin{array}{rl} -1 & n \textnormal{ has an odd~number of primefactors,}\\ 1 & n \textnormal{ has an even~number of primefactors,}\\ 0 & n \textnormal{ has a squared primefactor.} \end{array}\right.

We can compute this in \mathcal{O}(N^{3/2}) time. The Möbius function can be pre-computed in \mathcal{O}(N \cdot \log N) like so:

def mobius(N):
    L = [1]*(N+1)
    i = 2
    while i <= N:
        # if i is not sq. free, then no multiple is
        if L[i] == 1 or L[i] == -1:
            L[i] = -L[i]
            sq = i * i
            j = i + i
            while j <= N:
                # we use 2 to mark non-primes
                if abs(L[j]) == 2:
                    L[j] = -L[j]
                else:
                    L[j] = -L[j]*2
                j += i
            j = sq
            while j <= N:
                L[j] = 0
                j += sq
        i += 1
    i = 2
    while i <= (N):
        if abs(L[i]) == 2:
            L[i] = L[i] / 2
        i += 1
    return L
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s