This is a trivial, yet very fast approximation of calculating binomial coefficients is to use the logarithm rules we got from the basic course in calculus. We have that
Therefore,
.
Example: In Python, that is
from math import log from math import exp def log_fac(n): sum = 0 for i in xrange(2,n+1): sum += log(i) return sum def log_binomial(n,k): return (log_fac(n)-log_fac(k)-log_fac(n-k))
To compute for instance , write
print exp(log_binomial(80000,30))
which outputs 4.64170748337e+114
in a matter of milliseconds. We can improve this further. Here’s a code which does a bit less calculation, although the improvement is marginal:
def log_fac(m,n): sum = 0 for i in xrange(m,n+1): sum += log(i) return sum def log_binomial(n,k): if k > (n-k): return (log_fac(n-k+1,n)-log_fac(2,k)) else: return (log_fac(k+1,n)-log_fac(2,n-k))
Sergey Shashkov suggested in the comments the following exact computation of the binomial coefficient:
def pow_binomial(n, k): def eratosthenes_simple_numbers(N): yield 2 nonsimp = set() for i in xrange(3, N + 1, 2): if i not in nonsimp: nonsimp |= {j for j in xrange(i * i, N + 1, 2 * i)} yield i def calc_pow_in_factorial(a, p): res = 0 while a: a //= p res += a return res ans = 1 for p in eratosthenes_simple_numbers(n): ans *= p ** (calc_pow_in_factorial(n, p) - calc_pow_in_factorial(k, p) - calc_pow_in_factorial(n - k, p)) return ans
The code above generates a set containing all primes up to
. Then, for each
it computes the number of times that
divides
,
and
. Define
as the number of times
divides
. It then computes the difference
and multiplies the product with
. By the fundamental theorem of arithmetic, we have that
Another way of visualizing it is to write as
To count the number of times occurs in
, it first computes
, which is the number of times
occurs once in the range
. Then, it computes
and so forth until
. Adding all cardinalities, we get
.
The reason why it runs faster than computing is that it omitts multiplying with factors of
that we know will cancel out from division by
.
Running the code 100 times each, the average time to compute is as follows
Exact approach: 38.8791418076 ms
Non-exact log approach: 15.0316905975 ms
It still seems that the approximate log computation is a bit faster, although the difference is quite small. For excessively large binomial coefficients, we can exploit Stirling’s approximation: .
def log_fac2(m,n): return n*log(n)-m*log(m) def log_binomial2(n,k): if k > (n-k): return (log_fac2(n-k+1,n)-log_fac2(2,k)) else: return (log_fac2(k+1,n)-log_fac2(2,n-k))
This runs in about 1.18 μs
, at the cost of lower precision.
Computing factorial and binomial coefficients modulo prime
A quite efficient way to compute the factorial modulo a prime can be achieved by using a precomputed table containing a sparse subset of the function image (if you are familiar with the baby-step giant-step algorithm for computing discrete logarithms, these two approaches are very similar). For instance, let the table contain values
{0: 1, 10000000: 682498929, 20000000: 491101308, 30000000: 76479948, 40000000 ... }
, up to the maximum value (the modulus ). This means that we at most need to compute 10000000 values. If the distance between the points in the table is
, we can compute the a factorial mod
in time
. Let us for simplicity assume that
.
Define the factorial
function as follows:
pre_computed = {} sqrt_mod = int(sqrt(mod)) def factorial(n): if n >= mod: return 0 f = 1 for i in reversed(xrange(1, n + 1)): if i % sqrt_mod == 0: if i in pre_computed: f = f * pre_computed[i] % mod break f = f * i % mod if f == 0: break return f
Then, we compute the table as follows:
def create_precomputed_table(): for i in range(0, mod, sqrt_mod): pre_computed[i] = factorial(i) create_precomputed_table()
Obviously, we do this only once (it is a precomputation, right?). This precomputation requires time (we have to compute all values in between and
).
Here is some code to compute the modular inverse (you can use basically any implementation):
def extended_gcd(aa, bb): lastremainder, remainder = abs(aa), abs(bb) x, lastx, y, lasty = 0, 1, 1, 0 while remainder: lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder) x, lastx = lastx - quotient*x, x y, lasty = lasty - quotient*y, y return lastremainder, lastx * (-1 if aa < 0 else 1), lasty * (-1 if bb < 0 else 1) def modinv(a, m): g, x, y = extended_gcd(a, m) if g != 1: raise ValueError return x % m
Finally, we compute the binomial coefficient as follows
def binomial(n, k): if n < k: return 0 return factorial(n) * modinv(factorial(n - k) * factorial(k), mod) % mod
which also requires time.
What about using the recursive formula to avoid calculating the factorials explicitly? Something like
# def log_binomial(n,k)
# b = 1
# for i in range(1,k): b += log(n-(k-i)) – log(i)
# return b
Yeah, why not? Nice 🙂
If you replace range by xrange, the script will require less memory
True! I can’t see why you shouldn’t. I didn’t know this until now, thanks!
The same time, but the exact result.
And it does work for (80000 40000).
# def pow_binomial(n, k):
# ….def eratosthenes_simple_numbers(N):
# ……..yield 2
# ……..nonsimp = set()
# ……..for i in range(3, N + 1, 2):
# …………if i not in nonsimp:
# …………….nonsimp |= {j for j in range(i * i, N + 1, 2 * i)}
# …………….yield i
# ….def calc_pow_in_factorial(a, p):
# ……..res = 0
# ……..while a:
# …………a //= p
# …………res += a
# ……..return res
# ….ans = 1
# ….for p in eratosthenes_simple_numbers(n+1):
# ……..ans *= p ** (calc_pow_in_factorial(n, p) – calc_pow_in_factorial(k, p) – calc_pow_in_factorial(n – k, p))
# ….return ans
Yes, this works very well! Thank you, I have added it to the post.