The learner-driver problem

Posted on February 11, 2009, under general.

That humans find statistics counter-intuitive is not news; there are many problems with distinctly unsettling results. The Unfinished Game problem and The Monty Hall Problem are two of my favourites, but I thought I’d add another one. It’s not as good as those problems, but it can be insightful.

The problem

“Government data shows that 20% of qualified drivers are recently-qualified, but that one third of accidents are caused by this same group. How much more likely is Bob, a recently-qualified driver, to have caused an accident than Alice, a longer-qualified driver?”

On the face of it, this problem appears simple. If the 1 in 5 group is responsible for 1 in 3 accidents, then they are twice as likely to cause an accident as their complement. If this isn’t clear from arithmetic, we can put numbers on it. Say there are 99 accidents and 1000 drivers: the 200 recently-qualified drivers cause 33 accidents, and the 800 longer-qualified drivers cause 66. So the rates are 0.165 accidents per recently-qualified driver, and 0.0825 accidents per longer-qualified driver. It might take a minute of thought and maybe a calculator, but no bother, right? Wrong.
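For anyone who would rather check the naive arithmetic than trust it, here is a minimal Python sketch. The group names and the 1000-driver, 99-accident figures are just the illustrative numbers from above, not real data.

```python
from fractions import Fraction

# Illustrative figures from the example above, not real data.
drivers = {"recent": 200, "longer": 800}
accidents = {"recent": 33, "longer": 66}

def accidents_per_driver(group):
    """Mean number of accidents per driver in a group (a rate, not a probability)."""
    return Fraction(accidents[group], drivers[group])

recent_rate = accidents_per_driver("recent")
longer_rate = accidents_per_driver("longer")
ratio = recent_rate / longer_rate

print(float(recent_rate))  # 0.165
print(float(longer_rate))  # 0.0825
print(ratio)               # 2
```

Note that the function returns a group average. Nothing in it says how those accidents are shared out among individual drivers.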

The real answer

The naive answer is, well... naive. We can’t make that kind of statement with any validity, because we haven’t accounted for the fact that a single driver can cause multiple accidents, so accidents do not map one-to-one onto drivers. We weren’t comparing probabilities, just abstract rates.

Let’s again suppose that there are 1000 drivers and 99 accidents, but that of the 200 recently-qualified drivers only one really, really bad driver caused all 33 accidents (the rest are more cautious), and that of the 800 longer-qualified drivers, 66 caused one each. The group means are still the same as above, but now the likelihood of Alice or Bob being one of those accident-prone drivers is radically different. Bob has only a 1/200 probability of having caused an accident, and Alice 66/800, which makes Alice 16.5 times as likely as Bob to have caused one.
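The extreme case can be spelled out as a per-driver list of accident counts. This is a sketch under the same made-up numbers; the `summarize` helper is a name invented here for illustration.

```python
from fractions import Fraction

def summarize(accidents_by_driver):
    """Return (mean accidents per driver, share of drivers with at least one accident)."""
    n = len(accidents_by_driver)
    mean = Fraction(sum(accidents_by_driver), n)
    p_any = Fraction(sum(1 for a in accidents_by_driver if a > 0), n)
    return mean, p_any

# Recently-qualified: one driver causes all 33 accidents, the other 199 cause none.
recent = [33] + [0] * 199
# Longer-qualified: 66 drivers cause one accident each, the other 734 cause none.
longer = [1] * 66 + [0] * 734

recent_mean, recent_p = summarize(recent)
longer_mean, longer_p = summarize(longer)

print(recent_mean, longer_mean)    # 33/200 33/400
print(recent_p, longer_p)          # 1/200 33/400
print(float(longer_p / recent_p))  # 16.5
```

The means match the naive calculation exactly, yet the probability of a given driver having caused an accident now points the other way.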

Now this is the most extreme case, but it’s valid along the continuum. The point is really that without knowing about the distribution of accidents within the groups, we can’t make comparisons between particular members of those groups. In other words, the question is an abuse of statistics.
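To see the continuum, here is a rough sketch that shares the recently-qualified group's 33 accidents equally among a varying number of culprit drivers. The `spread` helper and the equal-sharing assumption are mine, chosen only to keep the arithmetic simple; real accident data would not be this tidy.

```python
from statistics import mean, pstdev

GROUP_SIZE = 200  # recently-qualified drivers, as in the example
ACCIDENTS = 33    # accidents attributed to that group

def spread(culprits):
    """Share all accidents equally among `culprits` drivers; everyone else has none."""
    if ACCIDENTS % culprits != 0:
        raise ValueError("culprits must divide the accident count evenly")
    per_culprit = ACCIDENTS // culprits
    return [per_culprit] * culprits + [0] * (GROUP_SIZE - culprits)

for culprits in (1, 3, 11, 33):
    counts = spread(culprits)
    # Share of drivers in the group who caused at least one accident.
    p_any = sum(1 for c in counts if c > 0) / GROUP_SIZE
    print(f"{culprits:2d} culprits: mean={mean(counts):.3f}  "
          f"P(caused an accident)={p_any:.3f}  std dev={pstdev(counts):.3f}")
```

The mean stays at 0.165 in every row, while the probability that a given driver caused an accident moves between 1/200 and 33/200. The standard deviation is the only summary figure here that changes with the distribution.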

It’s a great starting-off point, because it immediately gets at exactly what a standard deviation really means, it validates very basic group statistics, and it gets the right people excited. Most statisticians get it straight away, nearly everyone else doesn’t, and it’s memorable.

Even more interestingly, according to my friends in the insurance industry, it turns out that the standard deviation of accidents for recently-qualified drivers really is higher than for the more experienced/jaded. The probable explanation is that there is more of a mix of brazen young nutcases, extremely cautious prudes and people who, having qualified, then barely drive at all.

Photo from flickr user tz1_1zt.

3 Replies to "The learner-driver problem"

Justin Mason  on February 11, 2009

Nice demo! I’d never thought of it that way.

quick typo fix though: ‘twice as likely to cause an accident than their compliment’; that should be ‘complement’.

colmmacc  on February 11, 2009

Thanks! I’ve fixed it, I’m always making that one :-)

JibberJim  on February 11, 2009

Also – bad drivers who lose their licence through the courts for dangerous driving or similar, return to the roads as recently qualified, so serial accident causers are actually removed from the longer qualified pool if they create more accidents.
