MSS: December 2008

Wednesday, 31 December 2008

Ep 95: Merry Christmas from Mr Science!

Reindeer and Santa Claus are the topic of this week's Mr Science Show. With Christmas here, we thought we'd look at some Christmas news, and this week we take a look at reindeer facts and the problems Santa is having at the North Pole. Due to global warming, and the global financial crisis, Santa has had to put his North Pole residence up for auction and is currently looking for a new place in Lapland.

Merry Christmas from the Mr Science Show!

Listen to this podcast here:

And remember to tell us your science highlights from 2008 to go into the running for some sciencey prizes. Answers will also contribute to our year-in-review podcast early in 2009. Let us know here.

Tuesday, 30 December 2008

The curse of the duck

Cricket fans love their stats. Even the most casual follower can rattle off the batting averages of their favourite players or tell you how many wickets such-and-such a bowler took in the last test. The most passionate followers can recite each scorecard from this year's Wisden.

The recent news of the great Indian batsman Sachin Tendulkar surpassing West Indian Brian Lara's record number of test runs has given maths-loving cricket geeks another opportunity to pull out their calculators and Excel spreadsheets. I'm openly one of these nuts and did just that.

At the time of writing, Tendulkar had scored 12,027 runs across 247 innings, to overtake Lara's 11,953 from 232 innings. After a little investigation, I found that despite his outstanding average of over 54 runs per dismissal, Tendulkar's most common score in test cricket is ... zero!

This was quite a shock — the most prolific run-scorer in test cricket has been out for nought (a duck in cricket parlance) 14 times, well ahead of his second most common score — which incidentally is the next lowest you can get: one!

This is completely counter-intuitive, so I took this investigation further. Australian cricketer Sir Donald Bradman is universally regarded as the best batsman ever to have played the game. His average, an astounding 99.94, is so far above every other batsman in the history of the game that he is often acclaimed as not only the best cricketer ever, but the best player ever of any sport. His average is so iconic in Australia that the post office box number of the ABC (the Australian version of the BBC) is 9994 in every capital city. If it wasn't for the fact that much more test cricket is played nowadays than in the early 1900s, and for World War II interrupting his career for six years, Bradman would have scored many more than the 6996 runs he did score.

So, guess what Bradman's most common score was?

That's right, zero!

Indeed, looking at every innings by the most prolific batsmen in test history, from Tendulkar at number 1 to Bradman at number 34, the most common score is zero, and by quite a long way too. The following figures show the distribution of scores from these top batsmen: the horizontal axis shows the number of runs and the vertical axis shows the frequency of dismissals at that number of runs. The first chart shows every score between 0 and 100, and the second uses five-run-wide bins to show scores up to 250. The data include only scores where the batsman was dismissed, so not-out scores are excluded.

Scores plotted against dismissal frequency.

Scores in bins of five plotted against dismissal frequency.
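The tabulation behind charts like these can be sketched in a few lines of Python. This is only an illustration: the `innings` list is made-up data, not the cricinfo data used in this post, and the function names are hypothetical. It counts dismissals at each score, drops not-out innings, and groups the counts into five-run bins.

```python
from collections import Counter

# Hypothetical innings records as (runs, was_dismissed). Not real data.
innings = [(0, True), (0, True), (4, True), (12, False), (37, True),
           (4, True), (103, True), (58, False), (1, True), (7, True),
           (0, True)]

def dismissal_counts(records):
    """Count dismissals at each exact score, ignoring not-out innings."""
    return Counter(runs for runs, dismissed in records if dismissed)

def binned_counts(counts, width=5):
    """Group exact-score counts into bins 0-4, 5-9, ... keyed by bin start."""
    binned = Counter()
    for score, n in counts.items():
        binned[(score // width) * width] += n
    return binned

counts = dismissal_counts(innings)
bins = binned_counts(counts)
print(counts.most_common(1))   # most frequent dismissal score and its count
print(sorted(bins.items()))    # (bin start, dismissals) pairs
```

Dividing each count by the total number of dismissals turns the frequencies into the probabilities used in the model below.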

Model cricket

A closer look at these distributions shows that they very closely fit what is known as an exponential distribution. An exponential distribution has the form

$y=\lambda e^{-\lambda x}.$

In this case y is the probability of being dismissed at score x with λ being constant.
A common trick when looking at distributions involving exponentials is to take logarithms of both sides to get

$\ln(y) = \ln(\lambda) - \lambda x.$

The graph of this function, plotting ln(y) against x, is now a straight line with slope -λ. If the statistical data fits the exponential distribution, then the plot of the logarithm of the frequency of dismissals against the score at which dismissal happened should look roughly like a straight line.

A straight line fitted to the data. The blue dots represent observed data and the black line represents the model.

A straight line fitted to the data from the second chart above. The blue dots represent observed data and the black line represents the model.

There is a very strong straight line fit in both charts. Using a standard technique called least-squares regression, we can find the straight line that best fits the data. We can determine λ from the coefficient of x in the equation of this line, and in our case this gives λ equal to 0.023.

The mean of an exponential distribution, a sort of average, is 1/λ. In our case this gives a mean of around 43 - the same as we observe in the raw data. One can make the interesting observation that there is no such thing as the "nervous nineties": players do not "choke" and get out in the 90s, nervous before scoring a glorious test century, any more than they get out at any other score. Indeed, you could argue the opposite given the probability troughs at 94, 98 and in the 190s. You can also see that the probability of being dismissed for a duck is higher than you might expect for an exponential distribution.
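For readers who want to try the fit themselves, here is a minimal sketch of a least-squares line through the points (score, ln(frequency)), written in plain Python. It is not the spreadsheet calculation used for this post: the function name is hypothetical, and the data are synthetic frequencies generated from the model itself with λ = 0.023, so the fit simply recovers that value.

```python
import math

def fit_exponential_rate(scores, freqs):
    """Least-squares line through (score, ln(freq)); returns (lam, intercept).

    Scores with zero frequency are skipped because ln(0) is undefined.
    """
    pts = [(x, math.log(y)) for x, y in zip(scores, freqs) if y > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, intercept  # the slope of the line is -lambda

# Synthetic check: frequencies generated from y = lam * exp(-lam * x).
true_lam = 0.023
scores = list(range(0, 101))
freqs = [true_lam * math.exp(-true_lam * x) for x in scores]

lam, intercept = fit_exponential_rate(scores, freqs)
print(f"lambda ~ {lam:.3f}, mean ~ {1 / lam:.1f}")
```

With real data the points scatter around the line, and sparsely populated high scores carry as much weight as the well-populated low ones, so a weighted fit may be worth considering.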

So what?

Now, so far you might be thinking that all of this is only of passing statistical interest. So what if cricket scores follow an exponential distribution? Well, I'm glad you asked!
Let’s turn for a second to a different distribution, the geometric distribution. You will be familiar with this distribution from a simple 50/50 coin toss. The geometric distribution describes the number of coin tosses you need before a head (or tail) first turns up. The probability of your first head turning up on your kth toss is described as

$\mathrm{Prob}(\text{first head on } k\text{th toss}) = (1-p)^{k-1}p,$

where p is the probability of a head turning up on each toss, that is, 0.5. The distribution is memory-less, which is one of its key descriptors. No matter what has gone before, even if you have fluked 100 tails in a row, the probability of a head turning up on the 101st throw is still p.
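The memory-less property can be checked numerically. The sketch below (function names are illustrative only) computes the probability that the first head arrives k tosses from now, given that the first n tosses were all tails, and compares it with the unconditional probability of a first head on toss k.

```python
def prob_first_head_on(k, p=0.5):
    """P(first head on toss k) for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

def prob_no_head_in_first(n, p=0.5):
    """P(the first n tosses are all tails)."""
    return (1 - p) ** n

def conditional_prob(k, n, p=0.5):
    """P(first head on toss n + k | first n tosses were tails)."""
    return prob_first_head_on(n + k, p) / prob_no_head_in_first(n, p)

# After 100 tails in a row, the chance of a head on the next toss is still p.
print(conditional_prob(1, 100))   # 0.5
print(prob_first_head_on(1))      # 0.5
```

The factor (1-p)^n cancels in the division, which is the memory-less property written as arithmetic.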

The geometric distribution only works for integer values of k, that is, you can only throw a coin 2, 3, 100 etc. times and not 2.5 times. The exponential distribution is the continuous equivalent of this distribution, extending it to work for all non-negative numbers, not just integers. Given that cricket batting scores seem to fit an exponential distribution, this means that we can picture cricket batting scores on a geometric distribution, with the probability of being dismissed at score k given by

$\mathrm{Prob}(\text{dismissed at score } k) = (1-p)^{k}p.$

Can you spot the profound result here?

Remembering that the geometric distribution is memory-less, you can interpret this as saying that no matter what score you are currently on, you have the same chance p of getting out on that score as you do on any other score! Like a coin toss, the probability of you being dismissed on each score does not depend on what has gone before. A model which assumes that there is no memory is known as a constant hazard model.
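To see the constant hazard directly, the sketch below computes the probability of being dismissed at score k given that the batsman has reached k, under the geometric model above. The function names are hypothetical, and p = 0.022 is used purely as an illustrative value.

```python
def prob_dismissed_at(k, p):
    """P(dismissed at score k) under the geometric model, k = 0, 1, 2, ..."""
    return (1 - p) ** k * p

def prob_reaching(k, p):
    """P(the batsman reaches score k), i.e. survives every score below k."""
    return (1 - p) ** k

def hazard(k, p):
    """P(dismissed at score k | reached score k)."""
    return prob_dismissed_at(k, p) / prob_reaching(k, p)

p = 0.022  # illustrative per-score dismissal probability
for score in (0, 10, 50, 99, 150):
    print(score, round(hazard(score, p), 6))  # 0.022 at every score
```

Like the model itself, this treats the score as climbing one step at a time, which is a simplification of how runs are actually scored.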

This seems to go against every cricketing manual I have ever read. Accepted cricketing wisdom says that a batsman is more dangerous when (s)he "has the eye in" and has scored 10 or 20 runs. Our result seems to suggest that, apart from when a batsman is on 0, you have just as much chance of dismissing him or her on the current score as on any other score.

The next question to ask is, what is the probability of dismissing a batsman on the current score (that is, what is p in the above equation)? The mean of a geometric distribution is

$mean = \frac{1-p}{p}.$

Knowing that the mean of the exponential distribution is 1/λ, and transferring this to the geometric distribution, we get

$p = \frac{\lambda }{\lambda + 1}.$

For λ = 0.023 this gives p = 0.022. Therefore, if you were to turn the television on now and find the cricket coverage, the chance that the batsman you are watching gets out on the current score is 2.2%.
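For anyone who wants the algebra spelled out, equating the two means and solving for p gives the following (a sketch using the rounded value of λ quoted above):

$\frac{1-p}{p} = \frac{1}{\lambda} \;\Rightarrow\; \lambda (1-p) = p \;\Rightarrow\; p = \frac{\lambda}{\lambda + 1}, \qquad p = \frac{0.023}{1.023} \approx 0.022.$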

Scores near zero

The biggest deviation from the geometric distribution is for scores near zero. According to our data, the chance of being dismissed for a duck is 6.9% — around 3 times more than expected for a geometric (or exponential) distribution. But by the time the batsman has scored two or three runs, the geometric distribution starts to fit well. There is a small peak at four runs, perhaps because you can relatively easily get to four before you become comfortable — it only takes one streaky shot to the boundary. Whilst you can get to three with one shot, you are more likely to have played a few shots and so may be comparatively more "set".

The data and the geometric distribution. The blue dots represent observed data and the black line represents the model.

An analysis of scores near zero has been completed by Brendon J. Brewer from the University of New South Wales in Getting your eye in: A Bayesian analysis of early dismissals in cricket. Brewer indeed found that batsmen are more vulnerable at the beginning of their innings.

By assuming a constant hazard model, Brewer determined the effective average of a batsman before they have scored — that is, assuming a constant hazard model with probability p of dismissal equal to that of their chance of being dismissed for a duck, Brewer determined the mean of this new distribution.

In our data from the best batsmen of all time, dismissal for a duck occurred with a 6.9% chance. The mean of a geometric distribution built around this probability is

$\frac{1-0.069}{0.069} = 13.5.$

This means that even though our batsmen have a mean of about 43, before they've scored they bat like cricketers with a mean of 13.5. Even the best batsmen bat like tail-enders before they get off the mark!
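The effective-average calculation is easy to reproduce. The sketch below uses two hypothetical helper functions: one turns a per-score dismissal probability into the mean of the matching geometric distribution, and the other goes the opposite way. The inputs 0.069 and 0.023 are the figures quoted in this post.

```python
def effective_average(p_out):
    """Mean of a geometric model with per-score dismissal probability p_out."""
    return (1 - p_out) / p_out

def dismissal_probability(average):
    """Inverse: the per-score dismissal probability implied by a given mean."""
    return 1 / (average + 1)

print(round(effective_average(0.069), 1))          # 13.5, from the duck rate
print(round(dismissal_probability(1 / 0.023), 3))  # 0.022, from the mean 1/lambda
```

The second function is just the formula for p rearranged: a batsman with a long-run mean of 1/λ has a per-score dismissal chance of λ/(λ + 1).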

Conclusions

What should we take away from this analysis?

The conclusion seems to be that there is a very small window in the beginning of a batsman's innings in which there is a greater chance of dismissal than there ordinarily is. This makes sense — batsmen take some time to acclimatise to the game conditions. But this is a small window — once the batsman has scored about three runs, you have the same chance of dismissal whatever the current score. Interestingly, tiredness does not seem to play a part — the exponential distribution holds well out to 250 runs (quite a few hours of batting).

It should be remembered that this analysis was completed on the top 34 run scorers of all time (5953 innings) and so represents the best ever batsmen. Lesser batsmen are more likely to get low scores, so perhaps this window is slightly wider for them. But if we turn to the greatest of the great, Bradman, the window is essentially one run. His effective average before he had scored was a very mediocre nine runs. After he had scored two runs, this effective average had risen to 69. You had to get Bradman out very early!

More information

The data was retrieved from cricinfo during the second test between Australia and India on the 19th of October 2008;
Not-out scores were removed from the analysis;
The exponential distribution does break down a little for scores above 250 as there simply isn't enough data;
Yes, Marc has scored a duck in his cricket career.

Monday, 22 December 2008

Ep 94: The Geek Pop Virtual Music Festival

Geek Pop is the world’s only sci-pop festival - a free online music event featuring songs about science. The festival brings together science-inspired artists from around the globe in a gleeful celebration of geek culture. In 2009, Geek Pop will take place from 6 to 15 March.

This week on the podcast I spoke to Hayley Birch, the organiser of Geek Pop, about the festival, where the idea came from and what type of music we can look forward to.

You can subscribe to Geek Pop updates by sending an email to news@geekpop.co.uk with the subject SUBSCRIBE ME RIGHT UP. Or register your attendance at the Facebook event.

We've looked at the various scientific aspects of music in the past on Mr Science, just check out our music label.

Listen to this podcast here:

And remember to tell us your science highlights from 2008 to go into the running for some sciencey prizes. Answers will also contribute to our year-in-review podcast early in 2009. Let us know here.

Monday, 15 December 2008

Ep 93: Communicating Mathematics with the Masses

Some people find mathematics perplexing, whilst others find it beautiful. This week on the podcast, I spoke to two young men professionally employed to communicate mathematics to school-children around the country. Both these guys have a long history in science communication - I got to be good friends with them back in my science circus days.

Jamos McAlester travels Australia with Tenix Questacon Maths Squad, which is an outreach program of Questacon – The National Science and Technology Centre. The Maths Squad aims to inspire students and teachers about maths, and show how science and technology, and in particular maths, play an important role in our everyday lives. The Maths Squad also offers professional development workshops for teachers. Initially started in 1976, it has now visited thousands of towns across Australia, and there aren't too many places in Australia that Jamos hasn't been. The program makes almost 500 puzzle-based activities accessible to students and aims to highlight the broad and narrative nature of maths and its essential and pervasive range of applications. Jamos has a particular love of maths and thinks that people often find maths boring because it is taught out of context:

"calculations are the spelling of maths, not the story."

Marcus Finlay is a proactive, scientifically inclined primary school teacher from Melbourne. As opposed to most teachers, Marcus inspires his students about science and maths rather than running away from the topics, and lists his class's attempts to build model tsunamis in the classroom as his science highlight of 2008. Back in 2001, Marcus and I wrote The Marco Show about a couple of wizards who sang songs and turned themselves into dogs - you can read about this ridiculous show on Mr Science from 2006.

I spoke to Marcus and Jamos from The Mathematical Association of Victoria's annual conference, where both were giving keynote addresses on the communication of mathematics.

Listen to this podcast here:

And remember to let us know your science highlight from 2008. You will go into the running for some sciencey prizes and we'll take a look at your highlights in a podcast episode in early 2009. See the form here.

Friday, 12 December 2008

What are your scientific highlights from 2008?

As we are nearing the end of 2008, and closing in on 100 podcast episodes of the Mr Science Show, I thought it time to ask you the reader/listener about your scientific highlight of 2008. Is it the opening - and closing - of the Large Hadron Collider? Is it the discovery of water ice on Mars? Or perhaps breakthroughs in cancer treatment?

Whatever it is, let us know and everyone that writes in will go into the running for a prize. It'll be a good prize too - either some science magazines or books, something from the Mr Science store, or whatever I can get my hands on. Fill in the form below, leave a comment, or send us an email, and your thoughts will be included in an upcoming podcast reflecting on the science year that was.

THIS COMPETITION IS NOW CLOSED

Wednesday, 3 December 2008

Going mobile and QR codes

For those of you who need your content on-the-go, we've mobilised Mr Science. You can now see the Mr Science Show in a mobile phone compatible format at m.mrscienceshow.com. And for those of you with an iPhone, we have made a site especially for you here. To create these sites, we used MoFuse.

Mobile sites deliberately cut back on data-rich content, so pictures are smaller and many of the other widgets, which are easy to download on your desktop but add up on expensive mobile data plans, are disabled. At the moment, as mp3s are disabled, if you wish to download the podcast, you need to do it through the normal desktop version of this site (which you can see on your mobile, but contains all the data-rich stuff). I would like to make it such that if you visit Mr Science on your mobile, your phone immediately redirects you to the mobile site, but alas there is no way yet to do that (well, not in Blogger; there is for WordPress blogs).

QR Codes

QR Codes (Quick Response Codes) are two-dimensional bar codes (like the one on the left) originally developed by Japanese company Denso-Wave. Having just been to Japan, I can assure you they are very popular there. They are similar to normal bar codes, but are more easily customised and allow you to access internet sites and download content to your mobile without having to type in a URL or go searching through Google.

We created a QR code for the Mr Science mobile site - it's that black and white bar-code looking thing on the left.

To use a QR code, you need a camera in your phone - they all do these days - and you need QR software - grab it from i-nigma if you don't already have it. Open up the i-nigma application and hold your camera near the code. Your mobile then reads and analyses the code and takes you to the page it is encoding. Try it on our QR code - it works on a computer screen.

In Japan, everything from restaurant menus to competitions on train station walls has an associated QR code, so you can enter the competition or order from the menu on-the-go. You could use a QR code as a business card if you liked. Various places around the world are starting to use QR codes. The Brooklyn Public Library uses the codes to identify each of their branches, which they then add to flyers and posters. The Powerhouse Museum used QR codes for Sydney Design 08.

For more information, see bibliothekia and ReadWriteWeb.