Calculating Combinatorials
Posted on May 29, 2009, under general.
A question came up recently, over at RedBrick, about how to efficiently calculate very large combinatorials. A combinatorial of the form:
c = N! / (k! * (N - k)!)
describes how many combinations (c) there are if you pick k items out of N. For example if you pick 5 people at random out of a team of 10, there are 252 potential combinations. A straight forward code implementation of the formula looks like:
1 2 3 4 5 6 7 8 | def fact(n, acumulator=1): if not n: return acumulator else: return fact(n - 1, acumulator * n) def combinatorial(n, k): return fact(n) / fact(k) / fact(n - k) |
But this is very inefficient, each call to factorial is O(N). Surprisingly, google couldn’t find any clear alternatives, so to fix that, here is the more efficient way to calculate it:
1 2 3 4 5 | def combinatorial(n, k): c = 1 for i in range(k + 1, n + 1): c *= float(i) / (i - k) return c |
This implementation runs in O(N-K), which is at least 3 times faster than the initial implementation and usually much more so.
Here’s how it works;
First, re-organise the formula as:
c = ((N!) / (k!)) / (N -k)!
Next, it should be obvious that N! / k! is the same thing as N * N – 1 * N – 2 * … k + 1. So we construct our loop to compute that:
1 2 3 | c = 1 for i in range(k + 1, n + 1): c *= i |
but we still need to divide by (N – k)!. We already have a for loop, of exactly that number of iterations, so we can reuse it. (N – K)! is the same thing as SUM(i – k) , as we iterate i. Since division and multiplication are commutative in this context, and the order never matters we place the division directly in-line within the loop. Lastly, since we’re now dividing, we might end up with fractions at intermediary stages, so we use floating point.
1 2 3 | c = 1 for i in range(k, n): c *= float(i) / (i - k) |
Another great property of these kinds of operations is that are they partitionable, if we have W workers, we can give each (N – K) / W values to compute, and then multiply their respective answers. But I’ll leave that as an exercise for the reader.
Blasphemy in a nutshell
Optimising strlen()
The learner-driver problem
Broken state programming
The Wild Mountain Thyme
Posted on January 5, 2009, under creative commons, music.
The ticking cat
Posted on January 4, 2009, under creative commons, music.
HOWTO: Making an amazing Canvas print with reads
Posted on September 20, 2008, under general, photography.