Period Pain part 2

Posted on September 27, 2009, under general.

Last week I wrote about problems with periodicity, but that was only half of the problem. Before moving on to the second half, it seems like a good time to post some clarifications.

I wrote that using some locally unique, well-distributed value, such as a MAC address, was better than choosing a random number once. But crucially, I left out how to do such a thing. A few commenters asked what the best way might be, and some offered good examples.

To be a bit more rigorous about it, the great people at HEAnet provided me with an anonymised list (the prefixes had been stripped) of over 200,000 IPv6 addresses that have used ftp.heanet.ie in the last month. Included in that list were over 150,000 EUI-64 style addresses, which look like this:

2001:880:18:1:214:4fff:fe02:e6ee

The last four 16-bit groups (the low 64 bits) include a slightly modified version of the user’s MAC address. The details are straightforward, but you can take it from me that “214:4fff:fe02:e6ee” corresponds to a MAC address of “00:14:4F:02:E6:EE”, and that the md5sum of that string is “d32227ed9a3bf7d8714590f837884286”.
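
For anyone who wants the details anyway, here is a minimal sketch of the conversion in plain shell. The function name eui64_to_mac is made up for illustration, and it assumes the input is exactly the last four groups of the address with no "::" compression. EUI-64 inserts ff:fe in the middle of the MAC address and flips one bit (0x02) in the first byte, so the sketch undoes both.

```shell
# Convert the interface-identifier half of an EUI-64 style IPv6 address
# (its last four 16-bit groups) back to a 48-bit MAC address.
eui64_to_mac() {
    # Split "214:4fff:fe02:e6ee" on ':' into $1..$4.
    old_ifs=$IFS
    IFS=:
    set -- $1            # unquoted on purpose, so the shell splits on ':'
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1

    g1=$((0x$1)) g2=$((0x$2)) g3=$((0x$3)) g4=$((0x$4))

    # The middle two bytes must be the ff:fe filler that EUI-64 inserts.
    [ $((g2 & 0xff)) -eq 255 ] && [ $((g3 >> 8)) -eq 254 ] || return 1

    # Flip the universal/local bit (0x02) in the first byte and drop ff:fe.
    printf '%02x:%02x:%02x:%02x:%02x:%02x\n' \
        $(( (g1 >> 8) ^ 2 )) $(( g1 & 0xff )) $(( g2 >> 8 )) \
        $(( g3 & 0xff )) $(( g4 >> 8 )) $(( g4 & 0xff ))
}
```

Running eui64_to_mac 214:4fff:fe02:e6ee prints 00:14:4f:02:e6:ee.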

MAC addresses and the hashes are both really just numbers: a 48-bit number and a 128-bit number respectively. Bash arithmetic is 64-bit, so it handles a 48-bit MAC address natively. A 128-bit hash silently wraps to its low 64 bits, which are still well distributed. If you need a well-distributed number between, say, 0 and 999, then the mod operator is perfect:

# Bash accepts a 128-bit hex constant, but keeps only the low 64 bits
colmmacc@infiltrator (~) $ echo $(( 0xd32227ed9a3bf7d8714590f837884286 ))
8162089295436857990

# Use the MAC address directly to pick a number
colmmacc@infiltrator (~) $ MACADDR=`/sbin/ifconfig | grep HWad \
                               | awk '{print $5 }' | head -1`
colmmacc@infiltrator (~) $ echo $((  16#${MACADDR//:/}  % 1000 )) | sed 's/^-//'
174

# Use the md5sum of the MAC address to pick a number
colmmacc@infiltrator (~) $  echo $((  `echo $MACADDR | md5sum |\
                             cut -d\  -f1| sed 's/^/0x/'` \
                             % 1000 )) | sed 's/^-//'
363

As per one of the comments on the previous post, from brady, getting rid of any minus sign (the last sed operation) is a cheap form of abs(). The sign can appear because the 128-bit hash wraps into a signed 64-bit value.
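
If you would rather not depend on how the shell treats oversized numbers, one option is to feed it only part of the hash. This is a minimal sketch, assuming md5sum is available as in the commands above; hex_bucket and string_bucket are illustrative names. Note that printf '%s' hashes the string without the trailing newline that echo adds, so the buckets will differ from the ones printed above.

```shell
# Reduce a hex digest to a bucket in [0, buckets).
# Only the first 15 hex digits (60 bits) are used, so the value always
# fits in a signed 64-bit integer and can never come out negative.
hex_bucket() {
    prefix=$(printf '%s' "$1" | cut -c1-15)
    echo $(( 0x$prefix % $2 ))
}

# Hash an arbitrary string (for example a MAC address) into a bucket.
string_bucket() {
    digest=$(printf '%s' "$1" | md5sum | cut -d' ' -f1)
    hex_bucket "$digest" "$2"
}
```

Something like string_bucket "$MACADDR" 1000 then needs no sed at the end, because there is never a minus sign to strip.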

But which is better: randomness, MAC addresses or the md5sums? To get rid of any temporal bias, I’ve graphed the distribution of the above operations for 18,365 real world MAC addresses from one day’s worth of requests to ftp.heanet.ie.

[Figure: MAC address distributions]

MD5sums come out slightly ahead of raw MAC addresses (a stddev of 4.6 compared to 4.81), and perform essentially as well as random numbers should (a stddev of around 4.55). Using the MAC addresses on their own, without md5summing, should be good enough for most purposes too.
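
The kind of comparison involved is easy to sketch. Given a stream of bucket numbers (one per line, each between 0 and N-1), the following computes the population standard deviation of the per-bucket counts. Whether the figures above were computed exactly this way is an assumption, and bucket_stddev is an illustrative name.

```shell
# Read bucket numbers on stdin and print the standard deviation of the
# number of hits per bucket, across buckets 0 .. n-1.
bucket_stddev() {
    awk -v n="$1" '
        { count[$1 + 0]++; total++ }      # "+ 0" normalises e.g. "007" to 7
        END {
            mean = total / n
            for (i = 0; i < n; i++) {     # empty buckets count as 0 hits
                d = count[i] - mean
                sumsq += d * d
            }
            printf "%.2f\n", sqrt(sumsq / n)
        }'
}
```

For example, printf '0\n0\n0\n1\n' | bucket_stddev 2 prints 1.00: the two buckets hold 3 and 1 hits against a mean of 2. A lower figure means the hits are spread more evenly.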

So why not use a random number? Well, two reasons.

  1. It’s harder – you have to store the state somewhere. A MAC address, on the other hand, is already stored state: you can look it up each time, and it will be relatively stable.
  2. A lot of the time automated tasks are being installed at provisioning time – there isn’t actually that much real entropy available, so the randomness either tends to be weak, or you contribute to exhausting entropy and denying it to more useful things.
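
To make the first reason concrete, here is a sketch of deriving a stable delay from the MAC address every time a job runs, with nothing stored and no entropy consumed. The function name stable_delay is illustrative. The commented wiring underneath is a Linux-specific assumption, and eth0 is just an example interface name.

```shell
# Print a stable number in [0, window) derived from a MAC address.
# A 48-bit MAC fits comfortably in shell arithmetic, so no hashing
# or sign-stripping is needed.
stable_delay() {
    mac_hex=$(printf '%s' "$1" | tr -d ':')
    echo $(( 0x$mac_hex % $2 ))
}

# Example wiring (assumed path and interface name):
#   mac=$(cat /sys/class/net/eth0/address)
#   sleep "$(stable_delay "$mac" 1800)"    # wait up to 30 minutes
```

With the example address, stable_delay 00:14:4F:02:E6:EE 1000 gives 174, the same number as the direct MAC calculation above.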

And lastly, James pointed out that from the point of view of a single host, half an hour of jitter doesn’t really matter. He’s dead right – and of course the combined effects do matter for the distributed system – and the next post will be about how to exploit that property to get better scheduling.

3 Replies to "Period Pain part 2"

pixelbeat  on September 28, 2009

Note $(( 128to_signed_64bit )) is done by all shells

Also you need to strip the ‘:’. Doing as 1 cmd:

echo $(( 0x$(/sbin/ifconfig | sed -n 's/://g;s/.*HWaddr \(.*\)/\1/p;T;q') % 1000 )) | sed 's/^-//'

dwmalone  on September 29, 2009

Note that bash chewed your 128 bit number – the result in decimal had fewer digits than the one in hex. I think 0xd32227ed9a3bf7d8714590f837884286 is actually 280644455042589680170892351931054310022.

/~colmmacc/ » Period Pain 3  on November 26, 2009

[...] promised, though it’s been a while coming, I wrote that there’d be a followup on scheduling periodic [...]
