Calculating Combinatorials

Posted on May 29, 2009, under general.

A question came up recently, over at RedBrick, about how to efficiently calculate very large combinatorials. A combinatorial of the form:

c = N! / (k! * (N - k)!)

describes how many combinations (c) there are if you pick k items out of N. For example if you pick 5 people at random out of a team of 10, there are 252 potential combinations. A straight forward code implementation of the formula looks like:

1
2
3
4
5
6
7
8
def fact(n, acumulator=1):
    if not n:
        return acumulator
    else:
        return fact(n - 1, acumulator * n)

def combinatorial(n, k):
    return fact(n) / fact(k) / fact(n - k)

But this is very inefficient, each call to factorial is O(N). Surprisingly, google couldn’t find any clear alternatives, so to fix that, here is the more efficient way to calculate it:

1
2
3
4
5
def combinatorial(n, k):
    c = 1
    for i in range(k + 1, n + 1):
        c *= float(i) / (i - k)
    return c

This implementation runs in O(N-K), which is at least 3 times faster than the initial implementation and usually much more so.

Here’s how it works;

First, re-organise the formula as:

c = ((N!) / (k!)) / (N -k)!

Next, it should be obvious that N! / k! is the same thing as N * N – 1 * N – 2 * … k + 1. So we construct our loop to compute that:

1
2
3
c = 1
for i in range(k + 1, n + 1):
    c *= i

but we still need to divide by (N – k)!. We already have a for loop, of exactly that number of iterations, so we can reuse it. (N – K)! is the same thing as SUM(i – k) , as we iterate i. Since division and multiplication are commutative in this context, and the order never matters we place the division directly in-line within the loop. Lastly, since we’re now dividing, we might end up with fractions at intermediary stages, so we use floating point.

1
2
3
c = 1
for i  in range(k, n):
    c *= float(i) / (i - k)

Another great property of these kinds of operations is that are they partitionable, if we have W workers, we can give each (N – K) / W values to compute, and then multiply their respective answers. But I’ll leave that as an exercise for the reader.

9 Replies to "Calculating Combinatorials"

gravatar

JD  on May 29, 2009

I was thinking about combining the division and multiplication when you posted your original solution, but wouldn’t ending up with floats slow down the entire operation?

gravatar

colmmacc  on May 29, 2009

Not in comparison to what’s saved though. In a real implementation, not shortened for a blog post, the multiplication and division will be performed with arbitrary precision arithmetic libraries anyway :-)

Mostly just wanted a correct illustrative snippet people could try on their own machine quickly.

gravatar

JD  on May 29, 2009

And a brief test proves you correct… in python at least :)

It’s a good point that applying a little bit of brain to make calculations easier for a computer is often time well spent.

gravatar

Adam  on May 29, 2009

I was interested by the decision to do the calculation in one loop rather
than two, as this forces you to work in arbitrary precision rationals
rather than integers. My implementation of your algorithm is:

comb :: Rational -> Rational -> Rational
comb n k = prod $ map f [k+1..n]
where f i = i / (i – k)

but doing two loops and dividing the result has the advantage of allowing
you to do integer arithmetic only (at the expense of dealing with larger
numbers).

comb :: Integer -> Integer -> Integer
comb n k = x `div` y
where x = prod [k+1..n]
y = prod $ map (\i -> (i-k)) [k+1..n]

For my quick and unscientific test this is 30% faster doing comb 100000 50000.

gravatar

dwmalone  on June 2, 2009

For largeish numbers, then using Stirling’s formula may give you OK estimates. Largeish may not be as large as you expect too.

(I guess you know that you basically get this by estimating log(n!) with an integral.)

gravatar

Fergal Daly  on December 10, 2009

Throw in a

k = max(k, n – k)

at the beginning to knock 50% off the expected number of loop iterations (assuming you’re calling it multiple times and all values of k are equally likely).

Without it, running this on k=0, …, n would loop 0 + 1 + 2 + … + n = n*(n+1)/2 times.

With it, that’s 2*(0 + 1 + 2 + … + n/2-1 + n/2) = (n/2) * (n/2+1) which is pretty much half of n*(n+1)/2 .

gravatar

Fergal Daly  on December 10, 2009

Also, you can avoid floats while keeping just 1 loop.. Just change

c *= float(i) / (i – k)

to

c *= i
c /= i – k

and you’re done. You won’t get fractions because

comb(n, k) = comb(n, k – 1) * (n – k + 1) / k

(just write it out as factorials and cancel things)

so c in your loop takes the values

comb(n, 0)
comb(n, 1)
comb(n, 2)

comb(n, k)

which are, of course, all integers.

By the way, I think the last code snippet in the main post should be range(k + 1, n + 1).

gravatar

colmmacc  on December 13, 2009

The final result won’t have fractions, but intermediate calculations will – I started the c *= i and it doesn’t pass tests, at least in python :-)

c * i / (i – k)

can clearly result in fractions! There’s no reason k should be a whole divisor of c * i :-) You’re right about the last snippet! stupid typo.

gravatar

Fergal Daly  on December 14, 2009

The cs definitely are all integers. For random integers

c * i / (i – k)

can yield fractions however c is not random. It’s a product of consecutive numbers divided by another product of just as many, smaller consecutive numbers and i and (i – k) are the next numbers in those repective sequences. This is always an integer, in fact it is comb(i, i – k) (I got that wrong in the earlier post, when I said comb(n, i))

Also, I think when you say

There’s no reason k should be a whole divisor of c * i

you are assuming that c*i/(i-k) is whole if and only if k divides i. This is not the case e.g. c=5, i=6, k=2: 5 * 6 / (6 – 2) = 30/4 does not divide even though k does divide i. Also c=5, i=6, k=4: 30/2 = 15 despite the fact that k does not divide i. That kind of thing hold on the numerator e.g. (9 + x)/3 is only whole when 3 divides x.

The comment box ate my attempt to do a bit of maths and post my code, so I posted it over here.

Leave a Comment