Calculating Combinatorials

Posted on May 29, 2009, under general.

A question came up recently, over at RedBrick, about how to efficiently calculate very large combinatorials. A combinatorial of the form:

c = N! / (k! * (N - k)!)

describes how many combinations (c) there are if you pick k items out of N. For example if you pick 5 people at random out of a team of 10, there are 252 potential combinations. A straight forward code implementation of the formula looks like:

1
2
3
4
5
6
7
8
def fact(n, acumulator=1):
    if not n:
        return acumulator
    else:
        return fact(n - 1, acumulator * n)

def combinatorial(n, k):
    return fact(n) / fact(k) / fact(n - k)

But this is very inefficient, each call to factorial is O(N). Surprisingly, google couldn’t find any clear alternatives, so to fix that, here is the more efficient way to calculate it:

1
2
3
4
5
def combinatorial(n, k):
    c = 1
    for i in range(k + 1, n + 1):
        c *= float(i) / (i - k)
    return c

This implementation runs in O(N-K), which is at least 3 times faster than the initial implementation and usually much more so.

Here’s how it works;

First, re-organise the formula as:

c = ((N!) / (k!)) / (N -k)!

Next, it should be obvious that N! / k! is the same thing as N * N – 1 * N – 2 * … k + 1. So we construct our loop to compute that:

1
2
3
c = 1
for i in range(k + 1, n + 1):
    c *= i

but we still need to divide by (N – k)!. We already have a for loop, of exactly that number of iterations, so we can reuse it. (N – K)! is the same thing as SUM(i – k) , as we iterate i. Since division and multiplication are commutative in this context, and the order never matters we place the division directly in-line within the loop. Lastly, since we’re now dividing, we might end up with fractions at intermediary stages, so we use floating point.

1
2
3
c = 1
for i  in range(k, n):
    c *= float(i) / (i - k)

Another great property of these kinds of operations is that are they partitionable, if we have W workers, we can give each (N – K) / W values to compute, and then multiply their respective answers. But I’ll leave that as an exercise for the reader.

Blasphemy in a nutshell

Posted on May 17, 2009, under general, humour.

Optimising strlen()

Posted on March 1, 2009, under coding, general.

The learner-driver problem

Posted on February 11, 2009, under general.

Broken state programming

Posted on February 1, 2009, under general.

The Wild Mountain Thyme

Posted on January 5, 2009, under creative commons, music.

The ticking cat

Posted on January 4, 2009, under creative commons, music.

HOWTO: Making an amazing Canvas print with reads

Posted on September 20, 2008, under general, photography.

New place, new job!

Posted on September 1, 2008, under general.

Apachecon US registration now open!

Posted on August 3, 2008, under apache, general.