Archive for September, 2009
Period Pain part 2
Posted on September 27, 2009, under general.
Last week I wrote about problems with periodicity but it was only half of the problem. But before moving on to the second half, it seems like a good time to post with some clarifications.
I wrote that using some locally unique well-distributed value, such as a mac address, was better than choosing a random number once. But crucially, I left out how to do such a thing. A few commenters asked what the best way might be, including some good examples.
To be a bit more rigourous about it, and make sure, the great people at HEAnet provided me with an anonymous list (the prefixes had been stripped) of over 200,000 IPv6 addresses that have used ftp.heanet.ie in the last month. Included in that list were over 150,000 EUI-64 style addresses, which look like this …
2001:880:18:1:214:4fff:fe02:e6ee
the last 4 octets include a slightly modified version of the user’s MAC address. The details are straightforward, but you can take it from me that “214:4fff:fe02:e6ee” corresponds to a MAC address of “00:14:4F:02:E6:EE”, and that the md5sum of that string is “d32227ed9a3bf7d8714590f837884286″.
Mac addresses, and the hash, are both really just numbers. A 48-bit number and a 128-bit number respectively. Bash can handle these kind of numbers natively, and if you need a well-distributed number between say 0 and 999 then the mod operator is perfect:
# Prove that bash can handle even 128-bit numbers
colmmacc@infiltrator (~) $ echo $(( 0xd32227ed9a3bf7d8714590f837884286 ))
8162089295436857990
# Use the MAC address directly to pick a number
colmmacc@infiltrator (~) $ MACADDR=`/sbin/ifconfig | grep HWad \
| awk '{print $5 }' | head -1`
colmmacc@infiltrator (~) $ echo $(( 16#$MACADDR % 1000 )) | sed 's/^-//'
174
# Use the md5sum of the MAC address to pick a number
colmmacc@infiltrator (~) $ echo $(( `echo $MACADDR | md5sum |\
cut -d\ -f1| sed 's/^/0x/'` \
% 1000 )) | sed 's/^-//'
363
As per one of the comments on the previous post, from brady, getting rid of any minus sign (the last sed operation), is a cheap form of abs().
But, which is better; randomness, mac addresses or the md5sums? To get rid of any temporal bias, I’ve graphed the distribution of the above operations for 18,365 real world MAC addresses from one day’s worth of requests to ftp.heanet.ie.
MD5Sums come out slightly ahead (stddev of 4.6 compared to 4.81), and essentially performed just as well as how random numbers should do (around a stddev of 4.55). Using the MAC addresses on their own, without md5suming should be good enough for most purposes too.
So why not use a random number? Well two reasons.
- It’s harder – you have to store the state somewhere. A mac address on the other hand is already stored state, if you can look it up each time and it will be relatively stable.
- A lot of the time automated tasks are being installed at provisioning time – there isn’t actually that much real entropy available, so the randomness either tends to be weak, or you contribute to exhausting entropy and denying it to more useful things.
And lastly, James pointed out that from the point of view of a single host half an hour of jitter doesn’t really matter. He’s dead right – and of course the combined effects do matter for the distributed system – and the next post will be how to exploit that property to get better scheduling.
Period Pain
Imagine it was your job – along with 1,000 other people – to pick a number between 1 and 60. You can use any method you like (though you must use the same one), but if more than 30 of you choose the the same number, those of you who did would be shot. Would you let the group pick numbers at random?
Probably not, there’s always a chance it could go horribly wrong. And that chance? We could derive the correct p-value for having to shoot people easily enough from the uniform distribution, but forget that, let’s do it with code. It’s a lot easier to understand – and it’s a good habit too.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | import random count = 0 # Simulate 10,000 runs of 1,000 people # picking a number between 0 and 59. for runs in range(10000): numbers = [0] * 60 for people in range(1000): r = random.randint(0, 59) numbers[ r ] += 1 if sorted(numbers)[-1] >= 30: count += 1 print count/10000.0 |
If you run this, hopefully you’ll get an answer that’s around “0.1″. In other words, about 10% of the time we expect to have at least one number being chosen by 30 or more people. Those don’t seem like great odds, and I wouldn’t play Russian roulette with lives like that.
Yet this is almost exactly what we do with a lot of automated tasks in some large-scale distributed systems. A pattern than can be observed over and over again is that someone writes a scheduled task that runs once an hour, day, month or whatever but then sleeps a random amount of time (typically between 0 and 3600 seconds) when it starts- in a very naive attempt to distribute impact across the larger system evenly.

The impacts can be pretty serious – it might be extended load on a dependent service, or it might simply be too many busy nodes in a cluster. Or it might be a two-day global outage of a popular Voip service. Too many things happening at once is usually a bad thing in distributed systems.
Security and anti-virus updates, apt/yum/ports repositories and auditing tools in particular seem to get this pattern wrong. Here’s a good example, from Ubuntu’s daily apt script:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | # sleep for a random intervall of time (default 30min) # (some code taken from cron-apt, thanks) random_sleep() { RandomSleep=1800 eval $(apt-config shell RandomSleep APT::Periodic::RandomSleep) if [ $RandomSleep -eq 0 ]; then return fi if [ -z "$RANDOM" ] ; then # A fix for shells that do not have this bash feature. RANDOM=$(dd if=/dev/urandom count=1 2> /dev/null | cksum | cut -c"1-5") fi TIME=$(($RANDOM % $RandomSleep)) sleep $TIME } |
This is a very bad pattern – I’d go so far as to say that it’s actually worse than letting everything happen at the same time in the first place. It has 2 particularly dangerous qualities;
-
It increases the effective period of the task
If a task is running once an hour – it’s probably because we need to do something once an hour. That may seem tautological, but there’s subtlety. The clock-time hours we define as humans are arbitrary, by once an hour we should really mean “at least once in a 60 minute interval”, not “once between 1PM and 2PM”.
If we pick a random sleep time, we might end up running just under two hours apart. If at 1PM we pick 0 sleep seconds, and then at 2PM two we pick 3599 sleep seconds – look, we just ran two real hours apart!. Unsurprisingly the converse can happen, and we’ll run just 1 second apart, doing heavens knows what to load along the way.
-
The system as a whole has to cope with dangerous load spikes
Using our earlier example, If we allocated the numbers evenly, each value would get chosen only 16 or 17 times. We could plan for an impact of say 20 running at once. But as we’ve seen, if we pick random numbers every time, then 1 in every 10 runs, we’re going to have to cope with an impact of 30. That’s 50% more load, because of a one line error!
If this task is running every hour, then about 1 in every 7 weeks, we’re going to have to deal with an impact of 40 or more. And it will appear totally out of the blue, it’s a random occurrence. Nice! To take it to the extreme, there is a small – but finite – chance all all 1,000 systems choosing the same value. But avoiding this is why we are spreading the load in the first place, so why leave it to chance?
I used to run a busy Ubuntu mirror, and every day between 6:25 and 6:55 we’d see a gigantic wedge of load that could be distributed a lot more evenly. Though I think the worst of these problems have now been fixed.
The optimal fix for the problem is simple; coordinate – use a central allocator, or a gossip protocol, to ensure that every slot has at most N/M consumers.
This isn’t always possible though – Open Source security updates are usually polled from relatively dumb mirror servers, that don’t keep enough state to be able to divide load like this. But there are still two better approaches.
Firstly, you can pick a random number just once, and then reuse the same random number every time. This guarantees that the task does actually run once an hour, and it makes load predictable for the distributed system as a whole.
Secondly, you can use any uniformly distributed unique piece of data, local to the node, to choose your number. I like to use an MD5 hash of the MAC address, but even the hostname would do.
Probability-wise, the two approaches are identical, but in the real world you get fewer collisions with the latter – probably due to the similar state of entropy identical systems will be in when you mass boot a fleet of several hundred. In both cases, where aberrations emerge due to bad luck, they at least only have to be identified and fixed once. We’re no longer playing the load lottery once an hour.
But this is only half the story … what about other automated tasks with their own periods … how do we avoid colliding with them? And how do we architect timings in distributed systems in general to make them a lot easier to debug. That’s what the next blog post will be about. Stay tuned.
N+1
I rarely write introspective or meta blog posts, in fact I rarely even use the word “I” on this blog (one of the habits you should develop as a manager or team-member is to use the word “we” almost all of the time) … so I hope you’ll forgive this brief, obnoxious, self-centered, round-up of the last year. I’ll try to smatter it with some useful observations to make up for it.
Today it’s one year since I started working for Amazon.com, and officially moved in to my new place and hence back to Dublin. Before that, I’d been spending most of my time in Amsterdam and traveling. By just September last year, I’d managed 78 flights in 2008. In the year since, it’s been a much more comfortable, and relatively conservative, 24. Some visits to Seattle, New York, New Orleans, Barcelona, Bristol and London keeping me occupied and worldly.

I’ve learned a lot in the last year, a lot more and in very different directions, than I expected. Amazon turns out to be even more interesting than I had anticipated, and getting thrown in at the deep end early on – with a new Amazon Web Service to help build and support – has been an amazing experience. Before I started, I naively thought that scale was mostly about understanding distributed systems problems and performant designs. It is about those things, but not mostly.
In reality (well, in my opinion) the hard problems of scale are the incredible attention to detail and testing it requires, because when your rapidly-changing code is handling billions of requests – even a tiny fault that is triggered on one in every million requests will get you paged in no time at all. I’m not sure that there is any other way to learn it than having to support it for real. I recommend it.
I’m getting a chance to work on stuff I enjoy, building some big things and making some important and critical code go really really fast. It’s always nice to feel that features exist – that the universe is different in some way – because of work you’ve been involved in. That should always be the measure of progress; “How has the universe been changed?”, everything else is meta-work.
I am learning technical things at Amazon; when I started I had never written a line of perl, now I’ve written entire perl frameworks, a very basic perl compiler, a bytecode analyzer, perform low-level code-reviews, and teach perl once a week. The experience has reinforced my opinion that the notion of “knowing programming language X” is itself a broken anti-pattern.
That said, Perl was comparatively difficult to learn. Coming to it with about 15 years programming experience in many languages, it still took about 5 weeks to become proficient in it to the point that I really understood all of the magic symbols, operators and patterns in front of me at a fundamental level. At a guess, it took just 2 weeks to reach the same level of proficiency in python.
But that learning is self-driven, the real learning experience at Amazon is how things are organised and managed. How a huge multinational can structure and orientate itself such that things can happen incredibly efficiently and quickly is fascinating to observe and participate in. It’s like getting a free MBA. I can recommend that too.
On the college front – it’s been a strange year, as it was also my final year doing my BSc. in Computer Science in Trinity College, Dublin. It was tough going, mostly just to stick through it, but I think worth it in the end. Somehow came out with a first class honours degree. I’ve decided not to progress with a part-time postgrad just yet – it seems like a good time to try not doing so much college for a change, but maybe next year.
In the last year, I’ve learned two new instruments, banjo – for the fun of it – to a level where I can now keep up in sessions – and piano – to a lesser level but I can now arrange and play relatively intricate pieces. My place has worked out very well, and a year on I’m still very happy to be living here, it’s ideal. Ups and downs in my personal life in the last year have been extreme, but I’ve learned a lot from those too and am mostly the better for it.
But now that I have some more free time, I have to admit I’m not entirely sure what to do with it. I’m finding things to do with my evenings, and catching up with friends properly, but still have itches to be a bit more productive. I haven’t done as much Apache stuff as I’d have liked in the last 2 years, but the urge to write a webserver from scratch using a more functional programming oriented approach (though not necessarily a lamba-calculus derived language) is strong … and also pointless.
This past year also saw the final, conclusive, victory of sense in the Irish E-voting debacle. In short; we were correct all along, and the system has been completely abandoned. I have to admit it was a bit fun to be able to gloat about it on the radio.
So now … what next for my free time … well maybe you can help. To further compound the impression of arrogance and self-obsession I have no doubt created, it’s like this; I’m pretty clever. I’ve got above average maths, analytical and linguistics skills. I’m an expert programmer/developer and builder of things. I’m a fairly decent musician and photographer with some basic sense of style, design and composition.
Oh god there’s more! I’m politically knowledgeable, and I know how to manipulate the political and press systems and strategise (with a proven track record through two lobby groups). I’ve worked for two start-ups, and I know how to make things happen. I’m a quick learner, and I know that when something is worthwhile, realistic and interesting I can throw a huge amount of effort and pragmatism right at it. I like doing cool things for free, and earn just enough to be able to.
Now all that said, to moderate things I’m also basically an introvert and can be pretty awkward and quiet around too many unfamiliar people, am about the last person you’d ever want to take to a bar (I don’t drink for a start), know close to nothing about popular culture and have read maybe only 4 or 5 fiction books in the last 10 years. So there’s a strong counter-argument that I’m also an illiterate anti-social bore to be kept in mind here too. But thankfully, not a terrible one.
But with all of that in mind, what does need doing? Especially in Ireland/Dublin – because it would be nice to involve meeting people and getting better at that whole social thing. Any ideas? the bigger the better. Any technical, political, or social gaps that really need filling? anybody need some help? What would you, or indeed, jesus do? There are a few ideas knocking around already, I’ll be sure to update when they are more concrete. How should we change the universe?