Quick and dirty way to calculate large binomial coefficients in Python

A trivial, yet very fast way to approximate binomial coefficients is to use the logarithm rules from a basic calculus course. We have that

n!=n\times (n-1)\times\cdots\times 2\times 1\iff\log n!=\log (n\times(n-1)\times\cdots\times 2 \times 1)=\log n+\log (n-1)+\cdots+\log 2+\log 1=\sum_{i=2}^n \log i

Therefore,

n! = e^{\sum_{i=2}^n \log i}.

Example: In Python, that is

from math import log
from math import exp

def log_fac(n):
	# log(n!) = log 2 + log 3 + ... + log n
	total = 0
	for i in range(2, n + 1): total += log(i)
	return total

def log_binomial(n, k): return log_fac(n) - log_fac(k) - log_fac(n - k)

To compute, for instance, {80000 \choose 30}, write print(exp(log_binomial(80000, 30))), which outputs 4.64170748337e+114 in a matter of milliseconds. We can improve this further. Here is code that does a bit less calculation, although the improvement is marginal:

def log_fac(m, n):
	# log(m) + log(m+1) + ... + log(n)
	total = 0
	for i in range(m, n + 1): total += log(i)
	return total

def log_binomial(n, k):
	# cancel the larger factorial in the denominator against n!
	if k > (n - k):
		return log_fac(k + 1, n) - log_fac(2, n - k)
	else:
		return log_fac(n - k + 1, n) - log_fac(2, k)
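As a quick sanity check (assuming the definitions above are in scope), the two-argument version agrees with the first one:

print(exp(log_binomial(80000, 30)))  # roughly 4.64170748337e+114, same as before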

Sergey Shashkov suggested in the comments the following exact computation of the binomial coefficient:

def pow_binomial(n, k):
	# generate all primes up to N with a simple sieve of Eratosthenes
	def eratosthenes_simple_numbers(N):
		yield 2
		nonsimp = set()
		for i in range(3, N + 1, 2):
			if i not in nonsimp:
				nonsimp |= {j for j in range(i * i, N + 1, 2 * i)}
				yield i
	# exponent of the prime p in a! (Legendre's formula)
	def calc_pow_in_factorial(a, p):
		res = 0
		while a:
			a //= p
			res += a
		return res
	# multiply p^(exponent in n! minus exponents in k! and (n-k)!) over all primes p <= n
	ans = 1
	for p in eratosthenes_simple_numbers(n):
		ans *= p ** (calc_pow_in_factorial(n, p) - calc_pow_in_factorial(k, p) - calc_pow_in_factorial(n - k, p))
	return ans
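To convince yourself that this really is exact, you can compare it against the standard library (assuming Python 3.8+, where math.comb is available):

from math import comb

assert pow_binomial(80000, 30) == comb(80000, 30)  # exact integer equality
print(pow_binomial(80000, 30))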

The code above generates a set \mathcal P containing all primes up to n. Then, for each p \in\mathcal P it computes the number of times that p divides n!, k! and (n-k)!. Define \eta(n,p) as the number of times p divides n!. The code then computes the difference z = \eta(n,p) - \eta(k,p) - \eta(n-k,p) and multiplies the running product by p^z. By the fundamental theorem of arithmetic, we have that

{n \choose k} = \prod_{p \in \mathcal{P}}  p^{\eta(n,p) - \eta(k,p) - \eta(n-k,p)}

Another way of visualizing it is to write {n \choose k} as

\frac{p_1^{e_1}p_2^{e_2}\cdots}{(p_1^{e'_1}p_2^{e'_2}\cdots) (p_1^{\hat e_1}p_2^{\hat e_2}\cdots)} = \frac{p_1^{e_1}}{p_1^{e'_1}p_1^{\hat e_1}}\cdot\frac{p_2^{e_2}}{p_2^{e'_2}p_2^{\hat e_2}}\cdots = p_1^{e_1 -e'_1 - \hat e_1} \cdot p_2^{e_2 -e'_2 - \hat e_2} \cdots

To count the number of times p occurs in n!, it first computes \lfloor n/p \rfloor, which is the number of multiples of p in the range [2,n]. Then, it computes \lfloor n/p^2\rfloor (the multiples of p^2, each contributing one extra factor of p) and so forth until p^i > n. Adding all cardinalities, we get \eta(n,p) = \sum_i \lfloor n/p^i \rfloor, which is Legendre's formula.
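For example, with n = 10 and p = 2 this gives \eta(10,2) = \lfloor 10/2 \rfloor + \lfloor 10/4 \rfloor + \lfloor 10/8 \rfloor = 5 + 2 + 1 = 8, and indeed 2^8 is the exact power of two dividing 10! = 3628800 = 2^8 \cdot 14175.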

The reason it runs faster than computing \frac{n!}{k! (n-k)!} directly is that it omits multiplying in factors of n! that we know will cancel out when dividing by k!(n-k)!.

Running each version 100 times, the average computation times are as follows:

Exact approach: 38.8791418076 ms
Non-exact log approach: 15.0316905975 ms

The approximate log computation still seems to be a bit faster, although the difference is quite small. For excessively large binomial coefficients, we can exploit a crude form of Stirling’s approximation, \log x! \approx x \log x.

def log_fac2(m, n):
	# Stirling-style approximation of log(m) + log(m+1) + ... + log(n)
	return n*log(n) - m*log(m)

def log_binomial2(n, k):
	if k > (n - k):
		return log_fac2(n - k + 1, n) - log_fac2(2, k)
	else:
		return log_fac2(k + 1, n) - log_fac2(2, n - k)

This runs in about 1.18 μs, at the cost of lower precision.
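A minimal way to see the trade-off, assuming both versions above are defined (the exact output values are not reproduced here):

print(log_binomial(80000, 30))   # exact sum of logarithms
print(log_binomial2(80000, 30))  # O(1) Stirling-style approximation; expect some loss of accuracy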

Computing factorials and binomial coefficients modulo a prime

A quite efficient way to compute the factorial modulo a prime p is to use a precomputed table containing a sparse subset of the function's image (if you are familiar with the baby-step giant-step algorithm for computing discrete logarithms, the two approaches are very similar). For instance, let the table contain the values {0: 1, 10000000: 682498929, 20000000: 491101308, 30000000: 76479948, 40000000 ... }, up to the maximum value (the modulus p). This means that any single query requires at most 10000000 multiplications. If the distance between the points in the table is \sqrt{p}, we can compute a factorial mod p in time \mathcal{O}(\sqrt{p}). Let us for simplicity assume that p = 1000000007, so \sqrt{p} \approx 31623.

Define the factorial function as follows:

from math import sqrt

mod = 1000000007
pre_computed = {}
sqrt_mod = int(sqrt(mod))

def factorial(n):
    # n! mod p: multiply downwards from n until a precomputed multiple of sqrt_mod is reached
    if n >= mod:
        return 0  # n! contains the factor p, so n! ≡ 0 (mod p)
    f = 1
    for i in reversed(range(1, n + 1)):
        if i % sqrt_mod == 0:
            if i in pre_computed:
                f = f * pre_computed[i] % mod
                break
        f = f * i % mod
        if f == 0:
            break
    return f

Then, we compute the table as follows:

def create_precomputed_table():
    # store i! mod p for every multiple of sqrt_mod; each call reuses the previously stored entry
    for i in range(0, mod, sqrt_mod):
        pre_computed[i] = factorial(i)

create_precomputed_table()

Obviously, we do this only once (it is a precomputation, after all). The precomputation requires \mathcal{O}(p) time in total, since we compute \sqrt{p} table entries and each one costs \mathcal{O}(\sqrt{p}), and \sqrt{p} \cdot \sqrt{p} = p.
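Once the table is built, a single query is cheap. For example (the particular argument is arbitrary):

print(factorial(123456789))  # at most roughly sqrt(p) ≈ 31623 multiplications, i.e. O(sqrt(p)) time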

Here is some code to compute the modular inverse (you can use basically any implementation):

def extended_gcd(aa, bb):
    # returns (g, x, y) such that aa*x + bb*y = g = gcd(aa, bb)
    lastremainder, remainder = abs(aa), abs(bb)
    x, lastx, y, lasty = 0, 1, 1, 0
    while remainder:
        lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder)
        x, lastx = lastx - quotient*x, x
        y, lasty = lasty - quotient*y, y
    return lastremainder, lastx * (-1 if aa < 0 else 1), lasty * (-1 if bb < 0 else 1)

def modinv(a, m):
    # multiplicative inverse of a modulo m; only exists when gcd(a, m) = 1
    g, x, y = extended_gcd(a, m)
    if g != 1:
        raise ValueError
    return x % m
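A quick sanity check (any small example will do):

assert modinv(3, 7) == 5              # since 3 * 5 = 15 ≡ 1 (mod 7)
assert 3 * modinv(3, mod) % mod == 1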

Finally, we compute the binomial coefficient as follows

def binomial(n, k):
    # n choose k modulo mod; assumes 0 <= k <= n < mod
    if n < k: return 0
    return factorial(n) * modinv(factorial(n - k) * factorial(k), mod) % mod

which also requires \mathcal{O}(\sqrt{p}) time.
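For example (assuming the precomputed table above has been created; the arguments are arbitrary):

print(binomial(10**8, 10**4))  # n choose k mod 1000000007, in O(sqrt(p)) time per factorial call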


6 thoughts on “Quick and dirty way to calculate large binomial coefficients in Python”

  1. What about using the recursive formula to avoid calculating the factorials explicitly? Something like

    def log_binomial(n, k):
        b = 0
        for i in range(1, k + 1): b += log(n - (k - i)) - log(i)
        return b

  2. The same time, but the exact result. And it does work for {80000 \choose 40000}.

    def pow_binomial(n, k):
        def eratosthenes_simple_numbers(N):
            yield 2
            nonsimp = set()
            for i in range(3, N + 1, 2):
                if i not in nonsimp:
                    nonsimp |= {j for j in range(i * i, N + 1, 2 * i)}
                    yield i
        def calc_pow_in_factorial(a, p):
            res = 0
            while a:
                a //= p
                res += a
            return res
        ans = 1
        for p in eratosthenes_simple_numbers(n + 1):
            ans *= p ** (calc_pow_in_factorial(n, p) - calc_pow_in_factorial(k, p) - calc_pow_in_factorial(n - k, p))
        return ans
