A trivial, yet very fast, way to approximate binomial coefficients is to use the logarithm rules from a basic calculus course. We have that

$$\log n! = \log 1 + \log 2 + \cdots + \log n = \sum_{i=2}^{n} \log i.$$

Therefore,

$$\log \binom{n}{k} = \log \frac{n!}{k!\,(n-k)!} = \log n! - \log k! - \log (n-k)!.$$

**Example**: In Python, that is

```python
from math import log
from math import exp

def log_fac(n):
    sum = 0
    for i in xrange(2, n + 1):
        sum += log(i)
    return sum

def log_binomial(n, k):
    return log_fac(n) - log_fac(k) - log_fac(n - k)
```

To compute, for instance, $\binom{80000}{30}$, write `print exp(log_binomial(80000,30))`, which outputs `4.64170748337e+114` in a matter of milliseconds. We can improve this further. Here is code that does a bit less computation, although the improvement is marginal:

```python
def log_fac(m, n):
    sum = 0
    for i in xrange(m, n + 1):
        sum += log(i)
    return sum

def log_binomial(n, k):
    if k > (n - k):
        return log_fac(n - k + 1, n) - log_fac(2, k)
    else:
        return log_fac(k + 1, n) - log_fac(2, n - k)
```

**Sergey Shashkov** suggested in the comments the following exact computation of the binomial coefficient:

```python
def pow_binomial(n, k):
    def eratosthenes_simple_numbers(N):
        yield 2
        nonsimp = set()
        for i in xrange(3, N + 1, 2):
            if i not in nonsimp:
                nonsimp |= {j for j in xrange(i * i, N + 1, 2 * i)}
                yield i
    def calc_pow_in_factorial(a, p):
        res = 0
        while a:
            a //= p
            res += a
        return res
    ans = 1
    for p in eratosthenes_simple_numbers(n):
        ans *= p ** (calc_pow_in_factorial(n, p) - calc_pow_in_factorial(k, p) - calc_pow_in_factorial(n - k, p))
    return ans
```

The code above generates a set containing all primes up to $n$. Then, for each prime $p$, it computes the number of times that $p$ divides $n!$, $k!$ and $(n-k)!$. Define $e_p(m)$ as the number of times $p$ divides $m!$. It then computes the difference $e_p(n) - e_p(k) - e_p(n-k)$ and multiplies the running product with $p^{\,e_p(n) - e_p(k) - e_p(n-k)}$. By the *fundamental theorem of arithmetic*, we have that

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \prod_{p \leq n} p^{\,e_p(n) - e_p(k) - e_p(n-k)}.$$

Another way of visualizing it is to write $n!$ as

$$n! = \prod_{p \leq n} p^{\,e_p(n)}.$$

To count the number of times $p$ occurs in $n!$, it first computes $\lfloor n/p \rfloor$, which is the number of multiples of $p$ in the range $1, \ldots, n$, each contributing at least one factor $p$. Then, it computes $\lfloor n/p^2 \rfloor$ and so forth, until $\lfloor n/p^i \rfloor = 0$. Adding all these cardinalities, we get $e_p(n) = \sum_{i \geq 1} \lfloor n/p^i \rfloor$ (this is Legendre's formula).
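As a quick sanity check of this counting argument (my own addition, written in Python 3 syntax), we can compare the Legendre-style loop from `calc_pow_in_factorial` against a brute-force count for small inputs:

```python
from math import factorial

def calc_pow_in_factorial(a, p):
    # Legendre's formula: floor(a/p) + floor(a/p^2) + ...
    res = 0
    while a:
        a //= p
        res += a
    return res

def brute_count(a, p):
    # count the exponent of p in a! the slow way
    f, count = factorial(a), 0
    while f % p == 0:
        f //= p
        count += 1
    return count

for a, p in [(10, 2), (10, 3), (25, 5), (100, 7)]:
    assert calc_pow_in_factorial(a, p) == brute_count(a, p)
```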

The reason why it runs faster than computing $n!/(k!\,(n-k)!)$ directly is that it omits multiplying with factors of $n!$ that we know will cancel out from the division by $k!\,(n-k)!$.

Running each code 100 times, the average time to compute the binomial coefficient is as follows:

`Exact approach: 38.8791418076 ms`

`Non-exact log approach: 15.0316905975 ms`

It still seems that the approximate log computation is a bit faster, although the difference is quite small. For excessively large binomial coefficients, we can exploit Stirling's approximation, $\log n! \approx n \log n - n$:

```python
def log_fac2(m, n):
    return n * log(n) - m * log(m)

def log_binomial2(n, k):
    if k > (n - k):
        return log_fac2(n - k + 1, n) - log_fac2(2, k)
    else:
        return log_fac2(k + 1, n) - log_fac2(2, n - k)
```

This runs in about `1.18 μs`, at the cost of lower precision.
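A middle ground between the exact log sum and the crude Stirling formula (my addition, not part of the original timings; Python 3 syntax) is Python's built-in `math.lgamma`, which returns $\log \Gamma(x)$; since $n! = \Gamma(n+1)$, it yields $\log n!$ in constant time with much better precision:

```python
from math import lgamma, exp

def log_binomial_lgamma(n, k):
    # log C(n,k) = log Gamma(n+1) - log Gamma(k+1) - log Gamma(n-k+1)
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

print(exp(log_binomial_lgamma(10, 3)))  # close to the exact value 120
```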

**Computing factorial and binomial coefficients modulo prime**

A quite efficient way to compute the factorial modulo a prime $p$ can be achieved by using a precomputed table containing a sparse subset of the function image (if you are familiar with the baby-step giant-step algorithm for computing discrete logarithms, the two approaches are very similar). For instance, let the table contain the values `{0: 1, 10000000: 682498929, 20000000: 491101308, 30000000: 76479948, 40000000 ... }`, up to the maximum value (the modulus $p$). This means that we need to compute at most 10000000 values per query. If the distance between the points in the table is $d$, we can compute a factorial mod $p$ in $O(d)$ time. Let us for simplicity assume that $d = \lfloor\sqrt{p}\rfloor$.

Define the `factorial` function as follows:

```python
from math import sqrt

# `mod` is assumed to be a prime, defined elsewhere
pre_computed = {}
sqrt_mod = int(sqrt(mod))

def factorial(n):
    if n >= mod:
        return 0
    f = 1
    for i in reversed(xrange(1, n + 1)):
        if i % sqrt_mod == 0:
            if i in pre_computed:
                f = f * pre_computed[i] % mod
                break
        f = f * i % mod
        if f == 0:
            break
    return f
```

Then, we compute the table as follows:

```python
def create_precomputed_table():
    for i in range(0, mod, sqrt_mod):
        pre_computed[i] = factorial(i)

create_precomputed_table()
```

Obviously, we do this only once (it is a precomputation, right?). The precomputation requires $O(p)$ time, since we have to compute all values between $0$ and $p$.

Here is some code to compute the modular inverse (you can use basically any implementation):

```python
def extended_gcd(aa, bb):
    lastremainder, remainder = abs(aa), abs(bb)
    x, lastx, y, lasty = 0, 1, 1, 0
    while remainder:
        lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder)
        x, lastx = lastx - quotient * x, x
        y, lasty = lasty - quotient * y, y
    return lastremainder, lastx * (-1 if aa < 0 else 1), lasty * (-1 if bb < 0 else 1)

def modinv(a, m):
    g, x, y = extended_gcd(a, m)
    if g != 1:
        raise ValueError
    return x % m
```
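Since the modulus is prime, an alternative to the extended GCD (not used in the code above, but worth mentioning) is Fermat's little theorem: $a^{p-2} \equiv a^{-1} \pmod p$ for $a \not\equiv 0 \pmod p$, which Python's three-argument `pow` handles directly:

```python
def modinv_fermat(a, p):
    # Fermat's little theorem: a^(p-2) is the inverse of a mod a prime p
    return pow(a, p - 2, p)

# e.g. 3 * modinv_fermat(3, 7) % 7 == 1
```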

Finally, we compute the binomial coefficient as follows

```python
def binomial(n, k):
    if n < k:
        return 0
    return factorial(n) * modinv(factorial(n - k) * factorial(k), mod) % mod
```

which also requires $O(\sqrt{p})$ time.
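To see the whole pipeline in action, here is a self-contained sketch of the scheme above (my condensation, in Python 3 syntax, with a deliberately small prime so the table stays tiny, and a Fermat inverse instead of `modinv`), checked against a direct computation:

```python
from math import sqrt, factorial

mod = 10007  # a small prime, for illustration only
sqrt_mod = int(sqrt(mod))
pre_computed = {}

def fact_mod(n):
    # walk down from n, multiplying, until we hit a precomputed checkpoint
    if n >= mod:
        return 0
    f = 1
    for i in reversed(range(1, n + 1)):
        if i % sqrt_mod == 0 and i in pre_computed:
            f = f * pre_computed[i] % mod
            break
        f = f * i % mod
    return f

for i in range(0, mod, sqrt_mod):
    pre_computed[i] = fact_mod(i)

def binomial_mod(n, k):
    if n < k:
        return 0
    inv = pow(fact_mod(n - k) * fact_mod(k) % mod, mod - 2, mod)  # Fermat inverse
    return fact_mod(n) * inv % mod

assert binomial_mod(20, 5) == factorial(20) // (factorial(5) * factorial(15)) % mod
```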

What about using the recursive formula to avoid calculating the factorials explicitly? Something like

```python
def log_binomial(n, k):
    b = 0
    for i in range(1, k + 1):
        b += log(n - (k - i)) - log(i)
    return b
```

Yeah, why not? Nice 🙂

If you replace `range` by `xrange`, the script will require less memory.

True! I can’t see why you shouldn’t. I didn’t know this until now, thanks!

The same time, but the exact result.

And it does work for (80000 40000).

```python
def pow_binomial(n, k):
    def eratosthenes_simple_numbers(N):
        yield 2
        nonsimp = set()
        for i in range(3, N + 1, 2):
            if i not in nonsimp:
                nonsimp |= {j for j in range(i * i, N + 1, 2 * i)}
                yield i
    def calc_pow_in_factorial(a, p):
        res = 0
        while a:
            a //= p
            res += a
        return res
    ans = 1
    for p in eratosthenes_simple_numbers(n + 1):
        ans *= p ** (calc_pow_in_factorial(n, p) - calc_pow_in_factorial(k, p) - calc_pow_in_factorial(n - k, p))
    return ans
```

Yes, this works very well! Thank you, I have added it to the post.