Calculating Combinatorials
Posted on May 29, 2009, under general.
A question came up recently, over at RedBrick, about how to efficiently calculate very large combinatorials. A combinatorial of the form:
c = N! / (k! * (N - k)!)
describes how many combinations (c) there are if you pick k items out of N. For example if you pick 5 people at random out of a team of 10, there are 252 potential combinations. A straight forward code implementation of the formula looks like:
1 2 3 4 5 6 7 8 | def fact(n, acumulator=1): if not n: return acumulator else: return fact(n - 1, acumulator * n) def combinatorial(n, k): return fact(n) / fact(k) / fact(n - k) |
But this is very inefficient, each call to factorial is O(N). Surprisingly, google couldn’t find any clear alternatives, so to fix that, here is the more efficient way to calculate it:
1 2 3 4 5 | def combinatorial(n, k): c = 1 for i in range(k + 1, n + 1): c *= float(i) / (i - k) return c |
This implementation runs in O(N-K), which is at least 3 times faster than the initial implementation and usually much more so.
Here’s how it works;
First, re-organise the formula as:
c = ((N!) / (k!)) / (N -k)!
Next, it should be obvious that N! / k! is the same thing as N * N – 1 * N – 2 * … k + 1. So we construct our loop to compute that:
1 2 3 | c = 1 for i in range(k + 1, n + 1): c *= i |
but we still need to divide by (N – k)!. We already have a for loop, of exactly that number of iterations, so we can reuse it. (N – K)! is the same thing as SUM(i – k) , as we iterate i. Since division and multiplication are commutative in this context, and the order never matters we place the division directly in-line within the loop. Lastly, since we’re now dividing, we might end up with fractions at intermediary stages, so we use floating point.
1 2 3 | c = 1 for i in range(k, n): c *= float(i) / (i - k) |
Another great property of these kinds of operations is that are they partitionable, if we have W workers, we can give each (N – K) / W values to compute, and then multiply their respective answers. But I’ll leave that as an exercise for the reader.
9 Replies to "Calculating Combinatorials"
colmmacc on May 29, 2009
Not in comparison to what’s saved though. In a real implementation, not shortened for a blog post, the multiplication and division will be performed with arbitrary precision arithmetic libraries anyway :-)
Mostly just wanted a correct illustrative snippet people could try on their own machine quickly.
JD on May 29, 2009
And a brief test proves you correct… in python at least :)
It’s a good point that applying a little bit of brain to make calculations easier for a computer is often time well spent.
Adam on May 29, 2009
I was interested by the decision to do the calculation in one loop rather
than two, as this forces you to work in arbitrary precision rationals
rather than integers. My implementation of your algorithm is:
comb :: Rational -> Rational -> Rational
comb n k = prod $ map f [k+1..n]
where f i = i / (i – k)
but doing two loops and dividing the result has the advantage of allowing
you to do integer arithmetic only (at the expense of dealing with larger
numbers).
comb :: Integer -> Integer -> Integer
comb n k = x `div` y
where x = prod [k+1..n]
y = prod $ map (\i -> (i-k)) [k+1..n]
For my quick and unscientific test this is 30% faster doing comb 100000 50000.
dwmalone on June 2, 2009
For largeish numbers, then using Stirling’s formula may give you OK estimates. Largeish may not be as large as you expect too.
(I guess you know that you basically get this by estimating log(n!) with an integral.)
Fergal Daly on December 10, 2009
Throw in a
k = max(k, n – k)
at the beginning to knock 50% off the expected number of loop iterations (assuming you’re calling it multiple times and all values of k are equally likely).
Without it, running this on k=0, …, n would loop 0 + 1 + 2 + … + n = n*(n+1)/2 times.
With it, that’s 2*(0 + 1 + 2 + … + n/2-1 + n/2) = (n/2) * (n/2+1) which is pretty much half of n*(n+1)/2 .
Fergal Daly on December 10, 2009
Also, you can avoid floats while keeping just 1 loop.. Just change
c *= float(i) / (i – k)
to
c *= i
c /= i – k
and you’re done. You won’t get fractions because
comb(n, k) = comb(n, k – 1) * (n – k + 1) / k
(just write it out as factorials and cancel things)
so c in your loop takes the values
comb(n, 0)
comb(n, 1)
comb(n, 2)
…
comb(n, k)
which are, of course, all integers.
By the way, I think the last code snippet in the main post should be range(k + 1, n + 1).
colmmacc on December 13, 2009
The final result won’t have fractions, but intermediate calculations will – I started the c *= i and it doesn’t pass tests, at least in python :-)
c * i / (i – k)
can clearly result in fractions! There’s no reason k should be a whole divisor of c * i :-) You’re right about the last snippet! stupid typo.
Fergal Daly on December 14, 2009
The cs definitely are all integers. For random integers
c * i / (i – k)
can yield fractions however c is not random. It’s a product of consecutive numbers divided by another product of just as many, smaller consecutive numbers and i and (i – k) are the next numbers in those repective sequences. This is always an integer, in fact it is comb(i, i – k) (I got that wrong in the earlier post, when I said comb(n, i))
Also, I think when you say
There’s no reason k should be a whole divisor of c * i
you are assuming that c*i/(i-k) is whole if and only if k divides i. This is not the case e.g. c=5, i=6, k=2: 5 * 6 / (6 – 2) = 30/4 does not divide even though k does divide i. Also c=5, i=6, k=4: 30/2 = 15 despite the fact that k does not divide i. That kind of thing hold on the numerator e.g. (9 + x)/3 is only whole when 3 divides x.
The comment box ate my attempt to do a bit of maths and post my code, so I posted it over here.
JD on May 29, 2009
I was thinking about combining the division and multiplication when you posted your original solution, but wouldn’t ending up with floats slow down the entire operation?