Fylde Alexander Guitar

Posted on August 18, 2009, under general, music.

Almost 6 months ago now, after a lot of trialling, some borrowing and a serious think, I ordered a new guitar. This time, it’s a Fylde Alexander guitar. It’s hand-made and, now that it’s arrived, absolutely gorgeous and a joy to play.


First things first, though: I’ve managed to figure out how to work the H2 recorder to get decent sound. Have a listen; these two tunes are a slow, waltz-like “Heritage Close” and a hornpipe I can never remember the name of:


Direct MP3 link.

The tuning is DADGAD, played low to get a feel for the bass. As an experiment, I’ve also made a video. The sound quality isn’t as good, but you can get an idea of the kind of pull-offs and ornamentation the guitar enables; maybe this will get across just how much easier it is to play.

I also have one other recording, done through the pickup, of a jig in E minor whose name I also can’t remember. It gives a good sense of how great the pickup sounds.


Direct MP3 link.

I ordered the guitar through Monastery Music. If you’re ever thinking about getting a hand-made folk instrument, I’d give Fylde a serious look. Very, very happy with it so far.

Hurling, the Musical

Posted on July 18, 2009, under general, humour.

Now that Riverdance is coming to a close after 15 years as a raving success, we need another kitsch Irish musical to uplift our times and kickstart an economic recovery. So, as an attempt, I give you “Hurling: The Musical”, set to pipes, harps and the beat of a merry Irish heart.

Act 1:

When the curtain rises, we open on a misty morning in the time of Romantic Ireland, where chieftains and warrior poets abound. Our hero, Setanta, is in an epic athletic struggle. It’s him against 5 other players, each armed with sticks. A ball is thrown around, and the movement of play is in the style of an interpretive dance halfway between West Side Story dance-fighting and the well-oiled movements of a samurai. Gradually it becomes clear to the audience that the aim isn’t just to hit each other with the sticks (though that’s encouraged) but to get the ball past them. The programme will probably have some patronising comparisons to ice hockey in it anyway.

Insert a song about how wonderful it is to be young and carefree. Blow by blow, Setanta takes two players out with a skillful hit of the ball, skips around the other three, and pucks the ball along a guide-wire above the audience – where it explodes in a mini dazzle of green and sulfur.

Cut to Setanta’s uncle, Conor, leader of the brave and valiant Red Knights. He and the Red Knights are having a piss-up at Culain’s. This calls for nothing less than an over-the-top homo-erotic “Oh how great are we, we bunch of fighting men” number, immediately followed by a drinking song. Disgusted with their own drunkenness, Conor decides to call upon his nephew Setanta, the goody two-shoes with an ancient pioneer pin, to set an example. We might have an awful joke about using the “serf message service” to tell him, but only on the Broadway run.

So back to Setanta, singing a travelling song as he skips along on his way to his uncle. Oh how great it is to be young and carefree. But the mood suddenly changes when he arrives at the castle: his idiot drunk of an uncle has forgotten to ask Culain to keep the hound in. And this is no ordinary hound; this is a hound that’s represented by 3 expert dancers, Chinese dragon style. More fight dancing, some explosive and aggressive tapping of feet, and then a face-off. Cornered, battle-scarred and weary, our hero takes his hurl and pucks a ball straight into the hound’s mouth. The hound dies with a huge groan. The music reaches an epic climax, the pipes roar.

The Red Knights come out, now sobered up, and Culain falls to his knees at the sight of his dead hound. There might be space for a “Man’s best friend” lament. Culain explains to Setanta that he now has a debt of honour, and that he should guard the castle in the hound’s place. Setanta, young and carefree, doesn’t show much interest at first. Then, from behind the guards, emerges Caoimhe: gorgeous locks of long curly red hair and a shapely, fit body that says “I dance a musical 10 times a week”. Spotlight on Setanta, who then sings a moving love-at-first-sight, oh-how-great-it-is-to-be-young-and-carefree ballad. Never mind honour, there’s a woman to be impressed. Setanta signs up, and we end act 1 with some displays and dancing as Setanta, now Cú Chulainn, basically acts as a bouncer. “Sorry, you can’t come in dressed like that. No tunic.”

Act 2:

It’s modern Ireland. To provide context, the background might feature some unfinished building sites and an eight-lane motorway with a 60 km/h speed limit; it’s up to the set dresser. To the same music as the opening of act 1, we come across our hero playing a game of hurling. Again, it’s one against 5. But now the player is not Setanta, but a modern Camógaí player, Caoimhe. This time she takes out three players, gives the other two the run-around, and of course sings a song about how great it is to be young and carefree. She seems even better than Setanta was, though mostly it’s a wire-acrobatics kit that’s letting her jump higher.

Watching from the sidelines is Setanta, now the local hurling captain, and he’s enthralled. Once Caoimhe is done trashing the 5 players, he pleads with her: would she like to go out? He asks her to come see him play in the local final on Sunday. She looks torn and, spotlit, sings to us about how she’d love to say yes, but just can’t. She lets him down, and says she’s sorry, she can’t make it. Setanta feels the gentle hand of a put-down and walks off in a bit of a mood, singing a song about this happening every week, but that he’ll persevere.

Caoimhe, now joined by her Camógaí teammates, sings another song about how much she really likes Setanta, but hates the fact that they can’t play together on the same team. It’s a song that is simultaneously full of sexual innuendo (all about playing together breathlessly) and yet speaks to the importance of being on the same team, and to raising a family as the implicit reason for any healthy Irish relationship. She wants to get it on, but would also like to give birth to an entire team of patriotic hurlers.

So Caoimhe reveals her secret: big surprise, she’s been disguising herself as a guy and playing midfield on the senior men’s team, alongside Setanta, for months. It’s a musical; it’s OK for it to make no sense that he never recognised her in all of that time. She laments with her Camógaí teammates about how unfair it all is, but gets some girl-power reinforcement and validation from the chorus, and sings about how crazy it is that Ireland is so modern in so many ways, while the spectre of sexism is still there on the field of hurling.

And then, to our climactic scene – the Sunday game. It’s a tight one, it’s the last 5 minutes and it’s an even score. Insert tension here. Setanta and Caoimhe both miss a few chances. But at the last minute, after some epic dance-fighting, Setanta passes the ball to Caoimhe, who scores the winning point. Setanta and Caoimhe embrace in what he thinks is a brotherly hug of fellowship.

But of course, Caoimhe removes her helmet and reveals the curly red locks. The secret is out. At first, Setanta is shocked (think “you finally made a monkey out of me”) but quickly gets over it – oh how great it is to be young and carefree. End with a number that ramps up both the sexual innuendo and happy thoughts of Catholic family.

Fin.

Liberally sprinkle in some jokes about teams going on strike, questions about players getting paid, some post-modern recession references and a dancing at the Crossroads jig in the middle, and I think there’s the potential for a winning formula.

Photo courtesy of Eoin Campbell and is CC licensed.

Methodist Medicine Religion

Posted on July 14, 2009, under general.

In a very short period of time, the president is going to sign into law Ireland’s new blasphemy legislation – making offending the superstitious worthy of a fine. Forget all of the offence this itself is causing, we should see it as an opportunity. So, I’d like to announce a new religion; the Methodist Medicine Religion (MMR, for short).

Broadly speaking, we fully agree with the methods of modern evidence-based medicine – but for pragmatic reasons – completely disagree with the supposed mechanisms of efficacy. Biochemical explanations are in fact a sleight of hand maintained by God, to test our faith in the truth; energy fields and planetary motions.

Despite the plain evidence that MMR is effective and safe, and the obviously benign intentions of God, there is a disturbing trend of naysayers who cast doubt upon our religion. This dubious form of speech is manifestly a danger that deserves to be regulated. Such doubts cause real disease, sterility and death, but naturally our objection is merely on the grounds that it offends us, a more serious problem. Who’s with me?

(Image courtesy of wardabamby.)

Calculating Combinatorials

Posted on May 29, 2009, under general.

A question came up recently, over at RedBrick, about how to efficiently calculate very large combinatorials. A combinatorial of the form:

c = N! / (k! * (N - k)!)

describes how many combinations (c) there are if you pick k items out of N. For example, if you pick 5 people at random out of a team of 10, there are 252 potential combinations. A straightforward code implementation of the formula looks like:

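def fact(n, accumulator=1):
    # Tail-recursive factorial. Python does not eliminate tail calls,
    # so very large n will hit the interpreter's recursion limit.
    if not n:
        return accumulator
    else:
        return fact(n - 1, accumulator * n)

def combinatorial(n, k):
    # Integer division (//) keeps the result exact in Python 3.
    return fact(n) // fact(k) // fact(n - k)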

But this is inefficient: each call to the factorial function is O(N), and the intermediate factorials are enormous numbers. Surprisingly, Google couldn’t find any clear alternatives, so to fix that, here is a more efficient way to calculate it:

def combinatorial(n, k):
    c = 1
    # Multiply in N!/k! and divide out (N - k)!, one factor at a time.
    for i in range(k + 1, n + 1):
        c *= float(i) / (i - k)
    return c

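This implementation runs in O(N - k), which is at least 3 times faster than the initial implementation and usually much more so. The trade-off is that it returns a floating-point number, which is only an approximation once the result gets very large.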

Here’s how it works:

First, re-organise the formula as:

c = (N! / k!) / (N - k)!

Next, it should be clear that N! / k! is the same thing as N * (N - 1) * (N - 2) * … * (k + 1). So we construct our loop to compute that:

c = 1
for i in range(k + 1, n + 1):
    c *= i

But we still need to divide by (N - k)!. We already have a for loop with exactly that number of iterations, so we can reuse it: (N - k)! is the same thing as PRODUCT(i - k) as i runs from k + 1 to N. Since multiplication and division commute here, the order doesn’t matter, so we place the division directly in-line within the loop. Lastly, since we’re now dividing, we might end up with fractions at intermediate stages, so we use floating point.

c = 1
for i in range(k + 1, n + 1):
    # multiply in the next factor of N!/k!, divide out the next factor of (N - k)!
    c *= float(i) / (i - k)
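If an exact integer result matters more than raw speed, the floating point can be avoided. A sketch in C (the name `binomial` is my own): keep the same running product, but multiply before dividing. After the j-th step the running value is C(k + j, j), a whole number, so every division is exact. It also uses the symmetry C(N, k) = C(N, N - k) so that the loop runs min(k, N - k) times. With unsigned 64-bit integers it can overflow once min(k, N - k) * C(N, k) no longer fits in 64 bits.

```c
#include <stdint.h>

/* Exact C(n, k) using the same running product, in unsigned 64-bit integers.
   After step j the running value is C(k + j, j), so dividing by j is exact
   as long as we multiply first. Can overflow when min(k, n - k) * C(n, k)
   does not fit in 64 bits. */
uint64_t binomial(unsigned n, unsigned k)
{
    if (k > n)
        return 0;
    if (k < n - k)
        k = n - k;              /* C(n, k) == C(n, n - k): loop over the smaller side */

    uint64_t c = 1;
    for (unsigned j = 1; j <= n - k; j++)
        c = c * (k + j) / j;    /* c becomes C(k + j, j) */
    return c;
}
```

For example, `binomial(10, 5)` returns 252, matching the team-of-10 example above.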

Another great property of these kinds of operations is that they are partitionable: if we have W workers, we can give each (N - k) / W values to compute, and then multiply their respective answers. But I’ll leave that as an exercise for the reader.
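One possible shape for that exercise, sketched in C with hypothetical names: the factors i / (i - k) for i = k + 1 .. N are split into W contiguous chunks, each chunk’s product is computed independently, and the partial products are multiplied together. The “workers” below simply run one after another; handing each `partial_product` call to its own thread or machine is the part left out.

```c
/* One worker's share: the product of i / (i - k) for i in [lo, hi). */
static double partial_product(unsigned lo, unsigned hi, unsigned k)
{
    double p = 1.0;
    for (unsigned i = lo; i < hi; i++)
        p *= (double)i / (double)(i - k);
    return p;
}

/* C(n, k) with the factors i = k + 1 .. n split into `workers` contiguous
   chunks. Each chunk is independent; here they run sequentially. */
double combinatorial_partitioned(unsigned n, unsigned k, unsigned workers)
{
    if (k > n)
        return 0.0;
    if (workers == 0)
        workers = 1;

    unsigned long long factors = n - k;   /* total number of loop iterations */
    double c = 1.0;

    for (unsigned w = 0; w < workers; w++) {
        /* chunk w covers i in [lo, hi); the last chunk ends at n + 1 */
        unsigned lo = k + 1 + (unsigned)(factors * w / workers);
        unsigned hi = k + 1 + (unsigned)(factors * (w + 1) / workers);
        c *= partial_product(lo, hi, k);
    }
    return c;
}
```

Because multiplication is associative, the chunk boundaries do not change the result beyond ordinary floating-point rounding.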

Blasphemy in a nutshell

Posted on May 17, 2009, under general, humour.

For reasons best understood by our Minister for Justice (who seems to be going it alone on this one), Ireland may shortly introduce a new offence of blasphemous libel. The Irish blogosphere and Twitter are both alight with incandescent disapproval.

I’m not entirely sure what to make of the proposal. I think it’s an amazingly dumb idea, not only for the straightforward civil rights reasons, but also because I can’t see the courts or anybody else implementing it in the real world. It is political tokenism in its barest, stupidest form.

Normally, I’m not one to set out to offend people … live and let live, I say. But faced with only a limited amount of time to safely go on the record on the topic, here’s my own summary of roughly where the major beliefs (that I’ve read enough about to have an opinion on) are in relation to each other:

Religions compared

Don’t take it too seriously … it’s just an approximate summary from my own personal musings. Though it’s not arbitrary, I can back it all up.

I do happen to think that it’s a lot more likely that aliens exist than, say … angels. Pantheism is less contradictory than atheism, because at least the former can ascribe the existence of the universe to “Um, magic” with a straight face. The Abrahamic religions naturally get progressively more insane, as they inherit myths and superstitions from each other. And Quakers, well, they just make the nicest cereals. Mormonism? See the Golden Plates.

Saying that a religion or belief is “bad” or “good” is nonsense. Christianity and Islam, for example, are brilliant, wonderful, positive movements for good. They encourage great values and positive work. But at the same time, they can be a force for harm. Major branches discourage life-saving devices like condoms and deny many rights to women. Hence the spread between good and evil. I’ve trended them more towards evil only because of the dark ages and the stifling effects on the progress of mankind.

Disclaimer:

Though it shouldn’t need to be said, the above is mainly humour. The categories are waaaaaay too broad. The comparison completely ignores any implicit value faith may have (for its own sake), and the scales are unscientific. Sanity or insanity shouldn’t be attributed to any actual practitioners (except maybe Richard Dawkins, who might be both non-contradictory and a little insane); these are just big, rough averages.

If you’re happy with your beliefs, then I am happy for you. If you support the idea that blasphemy should be illegal, get a life.

Optimising strlen()

Posted on March 1, 2009, under coding, general.

Optimisation has a bit of a bad rep these days. Good advice, such as Hoare’s dictum that “Premature optimization is the root of all evil”, has led to a stern outlook on adding obfuscated mess to gain efficiency. Sometimes going really, really fast just makes you really, really insane.

Economically, the convenience of developers has won out over efficiency. Very high-level languages are near ubiquitous. Very few people have to squeeze implementations into so many CPU operations, or such and such an amount of memory. On the contrary, click-dragging many millions of CPU operations and many millions of bytes of memory in the form of some library, form-control or icon in an even more inefficient IDE is the norm.

But as others have put far better, advocating the complete absence of optimisation is advocating a fallacy. Focused, intelligent and, crucially, well-documented optimisation is a very important skill that can save lots of time and money. To give three concrete examples:

1. Deriving optimal buffer sizes

At HEAnet, when we were scaling ftp.heanet.ie to cope with over 50,000 concurrent downloads from a single host, we ran a lot of experiments to figure out the optimal read buffer sizes.

This is very rarely done; most buffers are chosen arbitrarily, with 4k and 8k being common for various reasons. Our experiments showed that the optimal buffer size was actually around 40k, and when we made this change within Apache we measured a 25% capacity improvement.

2. Using integers to compare strings

The objective of a contract I was involved in some years ago was to analyse a lot of HTTP data, from a load-balancer, to try and determine some statistics about it. Our input was billions of requests and getting the reports as quickly as possible was important (it fed into a health-check system, with a rolling average).

One of the key stages of the process was the very start of the request, because we would branch on the type of request (GET, HEAD, CONNECT, PUT, POST, etc.). A branch here was bad for two reasons: it interfered with pipelining and with the CPU caches, which is especially wasteful when most requests were GETs anyway.

Re-ordering the branches so that the GET case was a “fall-through” (i.e. it didn’t involve a jump) helped a little, but there was still some inefficiency going on. The string comparison itself seemed to be wasting some of the L1 cache on us.

So, as a crazy solution, we treated the first 4 bytes of the request as an integer and used integer comparison. On x86, “GET” == 5522759, “HEAD” == 1145128264 and so on. The first four bytes just happen to be unique among HTTP methods, and the check can happen directly in the CPU registers without having to dereference pointers.

The app got about twice as fast, but this was probably due to some other variable now fitting in the L1 cache. Plenty of explanatory comments in the code made this an acceptable, if still crazy, optimisation to make. It also taught everyone involved exactly what SIGBUS really is.
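A minimal sketch of the idea in C, not the original contract code: each method name is packed into a 32-bit word with `memcpy`, and the request’s first four bytes are compared against those words. Using `memcpy` rather than casting the request pointer to `uint32_t *` sidesteps misaligned loads (the SIGBUS above) and strict-aliasing problems, and comparing against words built the same way keeps the code independent of byte order. The enum and function names are hypothetical, and the request is assumed to have at least four readable bytes.

```c
#include <stdint.h>
#include <string.h>

enum http_method { METHOD_OTHER, METHOD_GET, METHOD_HEAD, METHOD_POST };

/* Load the first four bytes at p into a 32-bit word, in memory order.
   memcpy avoids misaligned loads and aliasing issues. */
static uint32_t first_word(const char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Classify a request by comparing its first four bytes as one integer.
   Assumes req has at least four readable bytes. */
enum http_method classify_request(const char *req)
{
    uint32_t w = first_word(req);

    if (w == first_word("GET "))   /* the common case, tested first */
        return METHOD_GET;
    if (w == first_word("HEAD"))
        return METHOD_HEAD;
    if (w == first_word("POST"))
        return METHOD_POST;
    return METHOD_OTHER;
}
```

Note that the real first four bytes of a GET request are “GET ” (with the trailing space), which is why the sketch compares against that rather than against the three-letter name.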

3. Replacing inefficient calls

Sometimes the simplest optimisations are the most effective. At my current job, we managed to speed one of our external services up by about 20% just by replacing the top 3 calls that showed up in gprof with more standard implementations that are implemented in x86 assembly.

This is classic optimisation, run a profiler and go for a targeted attack, but it can still be amazing what it can achieve.

Unfortunately, one of the problems with optimisation is that it’s hard. Being able to profile something is relatively straightforward and repeatable, but even the basic step of knowing that your application is slow is non-trivial. Sure, there might be a high-level SLA, but you can always try throwing hardware at a problem.

Knowing that something is slow, or large, involves being able to make educated estimates about implementations from first principles. Optimisers can approach a problem and think “well, we’re accessing X much data and performing about Y basic CPU operations, so the fundamental limit is probably around Z”. That’s just step 1.

The other key part is that optimisers tend to have a deep, intuitive understanding of systems at every level. They know not just a programming language, but the intricacies of how it is implemented. They’ll know how memory managers, file-systems, compilers, CPUs, networks and all sorts of things in-between actually work. This takes years, and quite a few “ahhhh” moments at 3AM in the middle of some nightmarish failure scenario.

strlen()

So where to start? One good place to look is standard libraries. They contain many implementations of very basic routines that are called so often that they have to be optimal. Additionally, the really common routines are implemented in nearly every language, and you can look at the trade-offs made within each. It’s a good way to learn a lot. One great case study is strlen().

Before we go on, let’s get some things defined. For the most part here, we’re going to look at C-style strings. A quick refresher: in C, strings are just zero-terminated arrays of bytes (char is the C type for a character). They look like this:

/* A C string initialised as an array */
char hello1[] = { 'h', 'e', 'l', 'l', 'o', 0 };

/* The same C string */
char hello2[] = "hello";

/* A pointer to a C string (string literals are read-only, hence const) */
const char * hello3 = "hello";

strlen(), as the name suggests, returns the length of a string:

/**
 * Return the length of a string.
 *
 * @param str The string to measure
 * @return    The string's length
 */

size_t strlen(const char * str);

Some important things: strlen(“hello”) returns 5, not 6. The zero at the end doesn’t count as part of the string. Zero is the only thing that defines the end of a string; all sorts of unprintable and control characters can be inside a string.

When someone is learning to program, a typical first attempt at a strlen implementation will be something like this:

Method 1: an iterative for-loop

size_t strlen(const char * str)
{
    size_t len;

    for (len = 0; str[len]; len++);

    return len;
}

This method is easy to understand, and read, but it’s not very good. It runs in time O(n) and involves a lot of jumps.

The next thing programmers usually realise is that they don’t have to use array indices, that pointers are enough.

Method 2: sacrifice a variable and readability

size_t strlen(const char * str)
{
    const char * ptr;

    /* Advance a pointer to the terminator instead of indexing. */
    for (ptr = str; *ptr; ++ptr);

    return ptr - str;
}

Depending on how clever the compiler is, this code may be slightly faster, because there won’t be so many additions to the base pointer. We increment the pointer only, and use the difference between it and the base-pointer as the length. This code is less readable though, and probably counts as premature optimisation.

This method is still O(n) and still involves a lot of jumps. Let’s see what we can do about that next.

Method 3: partially unroll the loop

size_t strlen(const char * str)
{
    const char * ptr = str;

    /* Test four bytes per loop iteration; ptr ends one past the zero. */
    while (1)
    {
        if (!*(ptr++)) break;
        if (!*(ptr++)) break;
        if (!*(ptr++)) break;
        if (!*(ptr++)) break;
    }

    return (ptr - 1) - str;
}

Now we’re into serious unreadability territory (and, by the way, the above is how djb implements strlen). But we have actually gained some efficiency: for every jump the loop creates, we now test for the 0 value 4 times. The choice of 4 is arbitrary here, but an interesting exercise would be to vary this number and test each possibility.

More tests will interfere with pipelines, but there will be fewer jumps. Who knows where the sweet spot is? Still, though this implementation is faster, it’s still O(n). Unfortunately, in C we’re doomed to an O(n) implementation, best case, but we’re still not done … we can do something about the very size of n.

Method 4: word-wise checks

Just like the example I gave earlier, where we used integers to compare 4 bytes of a string all at once, we can do something even cleverer with strlen. We can construct a test that, with one small exception, can determine whether any byte within a long is zero. This works for 2-, 4- and 8-byte word sizes.

It works by setting up a bit pattern like “01111110 11111110 11111110 11111111” and then adding an arbitrary word to it. Anything except 0 in any of the bytes should cause a carry into the neighbouring byte. So by performing an addition, and then testing the “hole” bits (the zeros in the pattern), we can tell that there was likely a zero. There is one case where we’ll get a false positive, which can happen when bit 31 is set, but that’s easy to check for.

It’s too long to reproduce here, but you can see the entire glibc implementation as a great example.

With this method we have divided our problem space: N is now N/4 or N/8, and the checks happen much more quickly. Though with really small strings, this method may actually be worse. The reason is that word-wise checks also have to be word-aligned, so if the string starts on a misaligned byte we may have to step through up to 3 characters one at a time (7 for 64-bit words) to become word-aligned.

Additionally when we do find a zero, we still have to perform 4 tests to figure out exactly which byte it was. If you had a lot of mis-aligned 7 byte strings, this method would be highly sub-optimal.
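A sketch of the word-wise idea in C. It swaps glibc’s magic-bits addition for a closely related, widely used test, `(v - 0x01010101) & ~v & 0x80808080`, which is nonzero exactly when some byte of `v` is zero, so there is no false positive to re-check. To stay within defined C behaviour it is written as a bounded, strnlen-style function: it only reads whole words that lie inside a buffer of known capacity, loads them with `memcpy` (so alignment stops being a correctness issue), and finishes byte by byte. A real strlen, like glibc’s, instead aligns the pointer and may read past the terminator within the final word. Function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one of the four bytes of v is 0x00. */
static int has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

/* Length of the string in buf, reading at most cap bytes (strnlen semantics). */
size_t strnlen_words(const char *buf, size_t cap)
{
    size_t i = 0;

    /* Four bytes per step, as long as a whole word fits inside the buffer. */
    while (cap - i >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, buf + i, sizeof w);   /* alignment-safe load */
        if (has_zero_byte(w))
            break;                       /* the zero is somewhere in this word */
        i += sizeof w;
    }

    /* Byte by byte: pinpoints the zero, or handles a short tail. */
    while (i < cap && buf[i] != '\0')
        i++;

    return i;
}
```

The final byte loop is the “4 tests to figure out exactly which byte it was” step described above.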

Method 5: Outsource the problem

In the years-long war between RISC and “kitchen sink” architectures, the latter appears to have won. x86 is king, and that means we usually have plenty of instructions at our disposal. Two such instructions are “repnz” (repeat while not zero) and “scasb” (scan string byte), which combined allow us to instruct the CPU to go find the next zero in a range of memory, using whatever tricks it likes.

The CPU can implement any of the approaches we’ve already talked about, but in principle much more is possible. Memory, and SRAM in particular, can be designed to make parallelised searches possible: the chips could electrically probe many sequences of bits all at once and feed the results into a tree of gates that makes sure the earliest match wins, in O(log n), in hardware. If you’re using strlen() on an x86 host right now, something along those lines may be what’s happening, although modern C libraries often use SIMD comparisons (SSE2/AVX2) rather than repnz scasb.
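A sketch of handing the scan to the CPU, using GCC/Clang extended inline assembly: the count register starts at -1 (effectively unlimited), AL holds the zero byte to look for, and `repnz scasb` advances the destination pointer until it finds it. The length is then recovered from how far the counter was decremented. It assumes an x86 target and a clear direction flag (which the usual x86 calling conventions guarantee on function entry); on any other target it falls back to a plain loop. The function name is hypothetical, and this is an illustration of the instruction, not a claim about how any particular C library implements strlen.

```c
#include <stddef.h>

/* strlen via "repnz scasb" on x86 with GCC/Clang; a plain loop elsewhere. */
size_t strlen_scasb(const char *s)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    size_t count = (size_t)-1;   /* RCX/ECX: scan "forever" */
    const char *p = s;           /* RDI/EDI: scan pointer */

    /* Compare AL (0) with *p, advance p and decrement count,
       repeating until the byte matches. */
    __asm__ volatile ("repnz scasb"
                      : "+D"(p), "+c"(count)
                      : "a"(0)
                      : "memory", "cc");

    /* count was decremented once per byte scanned, terminator included. */
    return ~count - 1;
#else
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
#endif
}
```

For a string of length L, L + 1 bytes are scanned, so the counter ends at -1 - (L + 1); its bitwise complement is L + 1, hence the final `- 1`.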

But what else can we do? What if we make some trade-offs about the very nature of C strings?

Method 6: Add a cache

C is rightly criticised for how it handles strings. It’s pretty dumb, and C buffer overflows remain a major source of security problems. There really isn’t any excuse. Treating strings as arbitrary regions of zero-terminated memory has one main advantage: it allows strings to be separated in place. One can take “Hello World” and make it “Hello” and “World” with the very simple insertion of a zero byte.

But strings are not separated a whole lot, and it tends to be a one-time operation. If we throw away this “feature”, there is a much faster way to get a string’s length: just keep recording it in a cache.

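typedef struct {
    char * str;
    size_t len;   /* cached length, kept up to date by every string function */
} string;

/* Renamed from strlen so it doesn't clash with the standard declaration. */
size_t string_len(const string * s)
{
    return s->len;
}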

Of course, we needn’t modify the API here. Instead of creating a new type, we could just keep a static index of pointer values (say, a hash map) and record the lengths there. We could even let the compiler do some of this for us: marking a length function as pure (for example with GCC’s __attribute__((pure))) lets the compiler reuse the result of repeated calls on the same, unmodified string.

But we do need to enforce the API. Now everything that deals with strings has to update this cache, and we can’t permit the programmer to mess with the string internally. Our string functions are the only valid way.
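A minimal sketch of what “everything that deals with strings has to update this cache” means in practice. The type and function names (`counted_str`, `cs_init`, `cs_append`, `cs_len`, `cs_free`) are hypothetical, not an existing library’s API: the only O(n) length computation happens when a plain C string is imported, appends update `len` as they go, and the buffer stays zero-terminated so it can still be handed to ordinary C functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-caching string: len is kept in sync by every operation. */
typedef struct {
    char  *data;   /* always zero-terminated, for interoperability */
    size_t len;    /* cached length: O(1) to read */
    size_t cap;    /* bytes allocated for data, terminator included */
} counted_str;

/* Import a C string; the only O(n) length computation happens here. */
int cs_init(counted_str *s, const char *src)
{
    s->len = strlen(src);
    s->cap = s->len + 1;
    s->data = malloc(s->cap);
    if (!s->data)
        return -1;
    memcpy(s->data, src, s->cap);   /* copies the terminator too */
    return 0;
}

/* Append n bytes, growing the buffer geometrically and updating len. */
int cs_append(counted_str *s, const char *src, size_t n)
{
    if (s->len + n + 1 > s->cap) {
        size_t new_cap = 2 * s->cap;
        if (new_cap < s->len + n + 1)
            new_cap = s->len + n + 1;
        char *p = realloc(s->data, new_cap);
        if (!p)
            return -1;
        s->data = p;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

size_t cs_len(const counted_str *s) { return s->len; }

void cs_free(counted_str *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

The catch is the one described above: writing a zero byte directly into `data` would silently desynchronise `len`, so callers have to go through these functions.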

This is actually the norm. There are quite a few APIs for doing this in C, and it’s also what djb does in his code. Almost all modern languages take this approach and store each string’s length alongside its data. Finally, we have a real O(1) solution.

What next?

But it doesn’t end there. Every solution above had trade-offs, and there are plenty of other ways of doing it.

Some text processors, for example, are designed to accommodate lots of string concatenation, so strings are optimistically allocated much more memory than they need, with lots of zeroes at the end. For these strings, if we know how much memory is allocated, we can implement strlen as a word-wise binary search, which will be O(log n).
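A sketch of that search in C, under the assumption stated above: the buffer holds a string with no interior zeros, followed only by zero bytes up to a known capacity. For simplicity it searches byte by byte rather than word-wise. The invariant is that every byte before `lo` is non-zero and the byte at `hi` is zero (or `hi` equals the capacity). The function name is hypothetical.

```c
#include <stddef.h>

/* Assumes buf[i] != 0 for i < len and buf[i] == 0 for len <= i < cap. */
size_t strlen_padded(const char *buf, size_t cap)
{
    size_t lo = 0, hi = cap;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (buf[mid] == '\0')
            hi = mid;        /* the terminator is at or before mid */
        else
            lo = mid + 1;    /* mid is still inside the string */
    }
    return lo;
}
```

If the padding assumption is violated (say, stale non-zero bytes after the terminator), the answer can be wrong, which is exactly the trade-off this kind of trick makes.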

Some XML processors, for another example, know that a very high percentage of the time a closing tag will be exactly the same length as the name of the opening tag + 3 (‘<’, ‘/’ and ‘>’), so they take that as a best guess, skip straight to that length and see if they find what they expect.
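A sketch of that guess in C, assuming the parser has just reached a ‘<’ while inside an element whose name it already knows. The helper name and signature are hypothetical: it checks for “</name>” of exactly name_len + 3 bytes and returns 0 on a miss, so the caller can fall back to a full parse.

```c
#include <stddef.h>
#include <string.h>

/* p points at a '<'; name/name_len is the currently open element.
   Returns name_len + 3 if p starts with "</name>", else 0. */
size_t guess_closing_tag(const char *p, const char *name, size_t name_len)
{
    if (p[0] != '<' || p[1] != '/')
        return 0;
    /* strncmp stops at a terminator in p, so a short input is not over-read. */
    if (strncmp(p + 2, name, name_len) != 0)
        return 0;
    /* Skip straight to the predicted end of the tag. */
    return p[name_len + 2] == '>' ? name_len + 3 : 0;
}
```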

Other times, optimisers perform statistical analysis on their inputs and find out where the peaks are, and optimise the tests to go for those paths first. There really is an endless series of possibilities, and this is for something as simple as getting the length of a string. Underneath every little problem is a world of opportunity and a wealth of material from very clever people. (The Varnish source code, and the Architect Notes are also a great source of inspiration).

A few years ago, at Joost, we came across exactly this problem: an application that was spending about 10% of its time inside strlen. Within an hour or so of rewriting some primitives, that was down to less than a hundredth of a percent. You can’t drag and drop that kind of improvement in an IDE.

The learner-driver problem

Posted on February 11, 2009, under general.

That humans find statistics counter-intuitive is not news; there are many problems with distinctly unsettling results. The Unfinished Game problem and the Monty Hall problem are two of my favourites, but I thought I’d add another one. It’s not as good as those problems, but it can be insightful.

L driver

The problem

“Government data shows that 20% of qualified drivers are recently-qualified, but that one third of accidents are caused by this same group. How much more likely is Bob, a recently-qualified driver, to have caused an accident than Alice, a longer-qualified driver?”

On the face of it, this problem appears simple. If the 1-in-5 group is responsible for 1 in 3 accidents, then they are twice as likely to cause an accident as their complement. If this isn’t clear from the arithmetic, we can put numbers on it: 99 accidents and 1000 drivers, where 200 drivers produce 33 accidents and 800 cause 66. So the rates are 0.165 accidents per recently-qualified driver, and 0.0825 accidents per longer-qualified driver. It might take a minute of thought and maybe a calculator, but no bother, right? Wrong.

The real answer

The naive answer is, well, naive. We can’t make that kind of statement with any validity, because we haven’t accounted for the fact that a single driver can cause multiple accidents: accidents aren’t spread independently and evenly across drivers. We weren’t comparing probabilities, just abstract rates.

Let’s again suppose that there are 1000 drivers and 99 accidents. But suppose that of the 200 recently-qualified drivers, only 1 really, really bad driver caused all 33 accidents (the rest are more cautious), and that of the 800 longer-qualified drivers, 66 caused one each. The group means are still the same as above, but now the likelihood of Alice or Bob being one of those accident-prone drivers is radically different. Bob has only a 1/200 probability of having caused an accident, and Alice a 66/800 one.
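A few lines of C make the two views concrete. This is a sketch with hypothetical names: it takes one group’s per-driver accident counts and reports the accidents-per-driver rate (what the naive answer compares), the fraction of drivers with at least one accident (what the question actually asks about), and the variance of the counts, whose square root is the standard deviation. Fed the extreme split above, the recently-qualified group has twice the rate (0.165 against 0.0825) but a much lower chance that any given member has caused an accident (1/200 against 66/800), and a far larger spread.

```c
#include <stddef.h>

/* Summary of one group's per-driver accident counts. */
struct group_stats {
    double rate;      /* accidents per driver: the naive comparison */
    double p_any;     /* fraction of drivers with at least one accident */
    double variance;  /* population variance of accidents per driver */
};

struct group_stats summarise(const int *accidents, size_t drivers)
{
    double sum = 0.0, sum_sq = 0.0;
    size_t with_accident = 0;

    for (size_t i = 0; i < drivers; i++) {
        sum += accidents[i];
        sum_sq += (double)accidents[i] * accidents[i];
        if (accidents[i] > 0)
            with_accident++;
    }

    struct group_stats s;
    s.rate = sum / drivers;
    s.p_any = (double)with_accident / drivers;
    s.variance = sum_sq / drivers - s.rate * s.rate;   /* E[x^2] - E[x]^2 */
    return s;
}
```

Any other distribution with the same totals gives the same `rate` for each group, while `p_any` and `variance` move around freely; that gap is the whole point of the problem.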

Now this is the most extreme case, but it’s valid along the continuum. The point is really that without knowing about the distribution of accidents within the groups, we can’t make comparisons between particular members of those groups. In other words; the question is an abuse of statistics.

It’s a great starting off point, because it immediately gets at exactly what a standard deviation really means, it validates very basic group statistics and it gets the right people excited. Most statisticians get it straight away, nearly everyone else doesn’t and it’s memorable.

Even more interestingly, according to my friends in the insurance industry, it turns out that the standard deviation of accidents for recently-qualified drivers really is higher than for the more experienced/jaded. The probable explanation is that there is more of a mix of brazen young nutcases, extremely cautious prudes and people who, having qualified, then barely drive at all.

Photo from flickr user tz1_1zt.

Broken state programming

Posted on February 1, 2009, under general.

Cory Doctorow has shared some of his thoughts on how to write productively, and daily, in the modern age of distractions and interruptions. Cory was our keynote speaker at ApacheCon San Diego, and he was kind enough to give JM and me a great deal of advice and contacts when we met up with him, as well as some very cool business cards. He’s been friendly ever since, too.

Fixing

But what struck me most about Cory, apart from his friendliness and generosity, was that he appeared to have an attention span divided into micro-slices, with a cosmic ability to schedule between them. That’s a compliment: he was doing many things at once, still giving each more coverage than should be reasonable, and certainly wasn’t ignoring anyone. “Wow, how does this guy write so much?” must occur to a lot of people who meet Cory.

A few months later, when Suw Charman, who shared an office with Cory, came over to visit, she seemed all the more awed by Cory’s productivity for the proximity. But after reading his essay, I think one of his tricks is key, and through coincidence or common influence I’ve been using that same trick to avoid programming stalls.

Procrastination and Paranoia.

I don’t quite know how, but over the last few years I seem to have become a professional software developer. I certainly don’t mind, I quite like programming, and enjoy making things go fast. But nobody ever sat me down and taught me what it is to be a programmer. I’ve read plenty, but opinions differ, and there are many common memes that don’t suit.

waiting

I don’t think that programmers need Zen-like bubbles of concentration in order to get things done. Life is a series of interruptions, and when stuff breaks, or somebody calls, or whatever, you have to attend to it. I’ve had a personal office, and I hated it. I’ve worked from home, and it’s not so great. I hate isolation, and it doesn’t make me work any better. A pair of headphones is bubble enough for me, and please interrupt any time; you’re more important than something that can be restarted later. For some reason or another, I don’t find it hard to restart.

What I do find hard is starting in the first place. Sometimes it’s procrastination. I don’t know whether it’s out of laziness (though I’ve worked 120-hour weeks, so there is some cause for doubt), or a daunting sense of imperfection or pointlessness (“how can I start on something that won’t be perfect?”), or what. But it’s the hardest part. And I don’t just mean starting on day 1; actually that’s not so bad. I mean starting on every new feature, or even on every day’s work.

There were three common fixes: 1.) for no explicable reason at all, I would find myself in the mood to start; 2.) I would get sufficiently paranoid about everyone thinking I was a complete, lazy failure that I’d start something in a sort of half-panic; 3.) there’d be some insane deadline or enough external pressure that getting it done was imperative. That was that. “Glad I’m not a real developer,” I’d think to myself.

Leave your code in a broken state.

While thinking through Joel on Software’s “every developer needs an office” principle a few months ago, I was trying to get to the root of why I didn’t feel it applied generally. “I don’t seem to have much problem restarting after interruptions” … hmmm … and I asked myself “Why?” The answer: “Well, duh, I haven’t finished what I’m doing, it doesn’t work, and it needs to be fixed. It’s obvious.”

Fixing a broken bike

And like a lightbulb that has, no doubt, gone off for you a lot more quickly than it did for me, I thought “well, if that makes it easy to restart things, why don’t I use it to start things too?” So instead of an obsessive-compulsive drive to at least leave the office with my code compiling cleanly, with “make” exiting with zero, or whatever other neat little unit of “fixedness” there is for the project, I now feel better if I leave it broken.

The more broken the better. If I make any kind of late-day rush, it’s to half-add something: a broken function stub that at least reminds me what I had in mind, or a bunch of debug printfs (shush, you; I use real debuggers plenty) that could simply never be left in a release. All the better if it stops the dev build from even running.

Then, when the next day or the next feature comes, there it is: a starting point, a glaring imperfection that has to be fixed. And once you’ve fixed that, well, the editor is already open, and you might as well move on to the very next thing. I find it a lot easier. Of course, it requires care: you have to really break it properly, not just add a FIXME that you may forget about, and you never check broken code in to the real branch. But for me, at least, it has worked; or at the very least, I feel better.
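For illustration only, here is a minimal sketch of what “breaking it properly” might look like, written in Python rather than in whatever language a real project would use. The function parse_port and its behaviour are hypothetical, not taken from any real codebase: the easy case already works, a deliberately noisy debug print marks the function as unfinished, and the unwritten case raises Python’s built-in NotImplementedError, so the first run the next morning fails at exactly the spot where work should resume.

```python
# Hypothetical example: a half-added feature, left deliberately broken at the
# end of the day. Unlike a "# FIXME" comment, it can't be quietly forgotten:
# the first call that reaches the unfinished path fails loudly.


def parse_port(text: str) -> int:
    """Turn a port string into a number.

    Numeric ports work already; service names such as "http" are
    tomorrow's starting point.
    """
    print(f"DEBUG parse_port({text!r})")  # noisy on purpose: never ship this
    if text.isdigit():
        return int(text)
    # Stub for the part I had in mind but haven't written yet.
    raise NotImplementedError(f"WIP: service names like {text!r} not handled yet")
```

Under these assumptions, parse_port("8080") returns 8080, while parse_port("http") stops with the WIP error. The reminder lives in code that fails rather than in a comment that can be skimmed past; keeping it off the real branch is still a matter of workflow, not code.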

OK, so all of this is probably on page 4 of about a hundred basic time-management books I haven’t read, and on page 1 of “how to be a competent developer”, but nobody ever taught it to me … so I thought I’d share: try leaving your code broken, and see if it helps you start any more quickly.

Update:
Based on comments and offline feedback, I can already see at least two camps emerging. These are caricatures, but they establish the range of the spectrum.

For some developers, programming is also an art form. This camp holds that something like OO design, say, can be elegant and artful and have an aesthetic; or that the choices made between the limited number of ways in which you can code something are akin to poetry. This camp hates distraction but, in a cruel twist of fate, is also prone to distracting ideological arguments about programming.

The other end of the scale, which I’m closer to, sees code as a mere tool for the job, with the primary focus being on doing a task well, quickly and efficiently. Like a good wrench, it’s allowed to get oily, as long as that helps to do the task better. I suspect people closer to this end of the spectrum cope with interruption more easily.

Another trend in the comments is that what really makes the difference when it comes to procrastination is interest. If you’re really interested in the task, you’ll be motivated. This may be true, but I also think it’s possible to make nearly any programming task interesting. If the underlying objective is boring, just write it in a new way, or better than it’s ever been done before, or try using a new language. Even if it’s something really boring and humdrum, try automating it, or write new analyser checks for it, and make that the real task … and so on.

The Wild Mountain Thyme

Posted on January 5, 2009, under creative commons, music.

The Wild Mountain Thyme is a well-known folk song from Scotland that I’ve been playing for a few years now, with various singers. It’s always been one of my favourite songs, and it has a wonderfully simple melody, so I’ve recorded it.

MP3.

That’s me singing, and the arrangement is my own, which I’ve tried to make more interesting than the usual version.

The guitar and vocal were recorded simultaneously, with a dodgy USB attachment that won’t let me record two channels without them being a stereo pair, which is why the vocal is in the left channel. The bouzouki part was recorded later (it’s what you can hear in the right channel). The guitar is tuned DADGAD, the bouzouki is in ADAD, and the song is in F major.

The ticking cat

Posted on January 4, 2009, under creative commons, music.

I’ve been playing with GarageBand again, and trying to get more music recorded, with some success. First up is a new tune called “The Ticking Cat”, a nice (hopefully) simple reel.

MP3.

This time the melody was played first, on a banjo (which I’m steadily learning, so excuse the poor playing), and the accompaniment was recorded over it on a DADGAD guitar. The tune is in A major, and it’s named after a funky-looking metronome.