# Quick and dirty way to calculate large binomial coefficients in Python

This is a trivial, yet very fast approximation of calculating binomial coefficients is to use the logarithm rules we got from the basic course in calculus. We have that $n!=n\times (n-1)\times\cdots\times 2\times 1\iff\log n!=\log (n\times(n-1)\times\cdots\times 2 \times 1)$ $\log n+\log (n-1)+\cdots+\log 2+\log 1=\sum_{i=2}^n \log i$

Therefore, $n! = e^{\sum_{i=2}^n \log i}$.

Example: In Python, that is

from math import log
from math import exp

def log_fac(n):
sum = 0
for i in xrange(2,n+1): sum += log(i)
return sum

def log_binomial(n,k): return (log_fac(n)-log_fac(k)-log_fac(n-k))


To compute for instance ${80000 \choose 30}$, write print exp(log_binomial(80000,30)) which outputs 4.64170748337e+114 in a matter of milliseconds. We can improve this further. Here’s a code which does a bit less calculation, although the improvement is marginal:

def log_fac(m,n):
sum = 0
for i in xrange(m,n+1): sum += log(i)
return sum

def log_binomial(n,k):
if k > (n-k):
return (log_fac(n-k+1,n)-log_fac(2,k))
else:
return (log_fac(k+1,n)-log_fac(2,n-k))


Sergey Shashkov suggested in the comments the following exact computation of the binomial coefficient:

def pow_binomial(n, k):
def eratosthenes_simple_numbers(N):
yield 2
nonsimp = set()
for i in xrange(3, N + 1, 2):
if i not in nonsimp:
nonsimp |= {j for j in xrange(i * i, N + 1, 2 * i)}
yield i
def calc_pow_in_factorial(a, p):
res = 0
while a:
a //= p
res += a
return res
ans = 1
for p in eratosthenes_simple_numbers(n):
ans *= p ** (calc_pow_in_factorial(n, p) - calc_pow_in_factorial(k, p) - calc_pow_in_factorial(n - k, p))
return ans


The code above generates a set $\mathcal P$ containing all primes up to $n$. Then, for each $p \in\mathcal P$  it computes the number of times that $p$ divides $n!$, $k!$ and $(n-k)!$. Define $\eta(n,p)$ as the number of times $p$ divides $n$. It then computes the difference $z = \eta(n,p) - \eta(k,p) - \eta(n-k,p)$ and multiplies the product with $p^z$. By the fundamental theorem of arithmetic, we have that ${n \choose k} = \prod_{p \in \mathcal{P}} \big(\eta(n,p) - \eta(k,p) - \eta(n-k,p)\big)$

Another way of visualizing it is to write ${n \choose k}$ as $\frac{p_1^{e_1}p_2^{e_2}\cdots}{(p_1^{e'_1}p_2^{e'_2}\cdots) (p_1^{\hat e_1}p_2^{\hat e_2}\cdots)} = \frac{p_1^{e_1}}{p_1^{e'_1}p_1^{\hat e_1}}\cdot\frac{p_2^{e_2}}{p_2^{e'_2}p_2^{\hat e_2}}\cdots = p_1^{e_1 -e'_1 - \hat e_1} \cdot p_2^{e_2 -e'_2 - \hat e_2} \cdots$

To count the number of times $p$ occurs in $n!$, it first computes $\lfloor n/p \rfloor$, which is the number of times $p$ occurs once in the range $[2,n]$. Then, it computes $\lfloor n/p^2\rfloor$ and so forth until $p^2 \geq n$. Adding all cardinalities, we get $\sum_i \lfloor n/p^i \rfloor$.

The reason why it runs faster than computing $\frac{n!}{k! (n-k)!}$ is that it omitts multiplying with factors of $n!$ that we know will cancel out from division by $k!(n-k)!$.

Running the code 100 times each, the average time to compute is as follows

Exact approach: 38.8791418076 ms Non-exact log approach: 15.0316905975 ms 

It still seems that the approximate log computation is a bit faster, although the difference is quite small. For excessively large binomial coefficients, we can exploit Stirling’s approximation: $\log x! = x \log x$.

def log_fac2(m,n):
return n*log(n)-m*log(m)

def log_binomial2(n,k):
if k > (n-k):
return (log_fac2(n-k+1,n)-log_fac2(2,k))
else:
return (log_fac2(k+1,n)-log_fac2(2,n-k))


This runs in about 1.18 μs, at the cost of lower precision.

Computing factorial and binomial coefficients modulo prime

A quite efficient way to compute the factorial modulo a prime $p$ can be achieved by using a precomputed table containing a sparse subset of the function image (if you are familiar with the baby-step giant-step algorithm for computing discrete logarithms, these two approaches are very similar). For instance, let the table contain values {0: 1, 10000000: 682498929, 20000000: 491101308, 30000000: 76479948, 40000000 ... }, up to the maximum value (the modulus $p$). This means that we at most need to compute 10000000 values. If the distance between the points in the table is $\sqrt{p}$, we can compute the a factorial mod $p$ in time $\mathcal{O}(\sqrt{p})$. Let us for simplicity assume that $p = 1000000007$.

Define the factorial function as follows:

pre_computed = {}
sqrt_mod = int(sqrt(mod))
def factorial(n):
if n >= mod:
return 0
f = 1
for i in reversed(xrange(1, n + 1)):
if i % sqrt_mod == 0:
if i in pre_computed:
f = f * pre_computed[i] % mod
break
f = f * i % mod
if f == 0:
break
return f


Then, we compute the table as follows:

def create_precomputed_table():
for i in range(0, mod, sqrt_mod):
pre_computed[i] = factorial(i)

create_precomputed_table()


Obviously, we do this only once (it is a precomputation, right?). This precomputation requires $\mathcal{O}({p})$ time (we have to compute all values in between and $\sqrt{p} \cdot \sqrt{p} = p$).

Here is some code to compute the modular inverse (you can use basically any implementation):

def extended_gcd(aa, bb):
lastremainder, remainder = abs(aa), abs(bb)
x, lastx, y, lasty = 0, 1, 1, 0
while remainder:
lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder)
x, lastx = lastx - quotient*x, x
y, lasty = lasty - quotient*y, y
return lastremainder, lastx * (-1 if aa < 0 else 1), lasty * (-1 if bb < 0 else 1)

def modinv(a, m):
g, x, y = extended_gcd(a, m)
if g != 1:
raise ValueError
return x % m


Finally, we compute the binomial coefficient as follows

def binomial(n, k):
if n < k: return 0
return factorial(n) * modinv(factorial(n - k) * factorial(k), mod) % mod


which also requires $\mathcal{O}(\sqrt{p})$ time.

## 6 thoughts on “Quick and dirty way to calculate large binomial coefficients in Python”

1. Charlie says:

What about using the recursive formula to avoid calculating the factorials explicitly? Something like

# def log_binomial(n,k)
# b = 1
# for i in range(1,k): b += log(n-(k-i)) – log(i)
# return b

1. Carl Löndahl says:

Yeah, why not? Nice 🙂

2. diegocaro says:

If you replace range by xrange, the script will require less memory

1. Carl Löndahl says:

True! I can’t see why you shouldn’t. I didn’t know this until now, thanks!

3. Sergey Shashkov says:

The same time, but the exact result.
And it does work for (80000 40000).

# def pow_binomial(n, k):
# ….def eratosthenes_simple_numbers(N):
# ……..yield 2
# ……..nonsimp = set()
# ……..for i in range(3, N + 1, 2):
# …………if i not in nonsimp:
# …………….nonsimp |= {j for j in range(i * i, N + 1, 2 * i)}
# …………….yield i
# ….def calc_pow_in_factorial(a, p):
# ……..res = 0
# ……..while a:
# …………a //= p
# …………res += a
# ……..return res
# ….ans = 1
# ….for p in eratosthenes_simple_numbers(n+1):
# ……..ans *= p ** (calc_pow_in_factorial(n, p) – calc_pow_in_factorial(k, p) – calc_pow_in_factorial(n – k, p))
# ….return ans

1. Carl Löndahl says:

Yes, this works very well! Thank you, I have added it to the post.