Wednesday, 11 February 2009

Poor correlations, or why it's not the fault of Aussie cricketers

Not too long ago we published an article showing what looked to be a stark correlation between the price of oil and the fortunes of the Australian cricket team.

As the oil price and the Australian cricket team have both declined in recent times, it's time we updated that chart.

And unfortunately, as you can see from the graph below, we can't blame the cricketers for the price of oil (or the economic recession as seen in the Moir cartoon to the right).

Correlations between data sets can occur for three reasons:
1. There is a direct cause and effect relationship between the two sets - for example, if it rains a lot in one week, then umbrella sales go up - the level of rainfall has caused an increase in umbrella sales;
2. There is an underlying reason for the two data sets to move together, as opposed to one causing the other - for example, the heavy rain has also caused more road accidents - umbrella sales and road accidents may look correlated, but one is not causing the other. In some cases you would need to look through a few degrees of separation to find the underlying cause;
3. There is no cause and effect and no underlying reason for the correlation - it's simply a coincidence or the work of a devious statistician, as we have here. Scales and time periods are also often changed to make it look like there is a correlation.
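
To make the second case concrete, here is a minimal sketch in Python. All the numbers are made up for illustration (the rainfall range, the coefficients 2.0 and 0.5, and the noise levels are our assumptions, not real data). Rainfall drives both umbrella sales and road accidents, so the two look strongly correlated, yet once the rainfall effect is subtracted the relationship disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical weekly rainfall (mm): the common underlying cause.
rainfall = [rng.uniform(0, 100) for _ in range(500)]

# Both series depend on rainfall plus independent noise (invented coefficients).
umbrella_sales = [2.0 * r + rng.gauss(0, 20) for r in rainfall]
road_accidents = [0.5 * r + rng.gauss(0, 5) for r in rainfall]

# Umbrellas and accidents look strongly correlated...
r_raw = pearson(umbrella_sales, road_accidents)

# ...but after removing the rainfall effect only unrelated noise is left.
umbrella_resid = [u - 2.0 * r for u, r in zip(umbrella_sales, rainfall)]
accident_resid = [a - 0.5 * r for a, r in zip(road_accidents, rainfall)]
r_resid = pearson(umbrella_resid, accident_resid)

print(f"raw correlation:              {r_raw:.2f}")
print(f"after removing rainfall:      {r_resid:.2f}")
```

In real data you would not know the coefficients in advance and would have to estimate them, but the idea is the same: account for the common cause first, then see what correlation remains.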

If we take the original oil and cricket data and put them on the same x- and y-scales as before, you can easily see that whilst they are both now trending down, the correlation is no longer strong. The original correlation depended on quite a bit of manipulation of the x- and y-scales, which means the data sets no longer line up now that more data has been added. And as the cricket team's success is measured as an average win percentage over the last 40 games, it cannot drop as suddenly as the oil price. Still, it's fun to speculate and play with the data.
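
The point about the 40-game average can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the results list is invented (40 wins followed by 10 losses), and anything that is not a win, including a draw, is counted as a non-win. Once the window is full, a single result can move the figure by at most 1/40, or 2.5 percentage points.

```python
from collections import deque


def rolling_win_pct(results, window=40):
    """Win percentage over the most recent `window` games, after each game."""
    recent = deque(maxlen=window)  # old games fall off automatically
    pcts = []
    for won in results:
        recent.append(1 if won else 0)
        pcts.append(100.0 * sum(recent) / len(recent))
    return pcts


# Hypothetical worst case: a perfect 40-game run, then 10 straight losses.
results = [True] * 40 + [False] * 10
pct = rolling_win_pct(results)

# Largest single-game change once the 40-game window is full.
max_step = max(abs(pct[i] - pct[i - 1]) for i in range(40, len(pct)))

print(f"after 40 wins:        {pct[39]:.1f}%")
print(f"after 1 loss:         {pct[40]:.1f}%")
print(f"after 10 losses:      {pct[49]:.1f}%")
print(f"largest one-game move: {max_step:.1f} points")
```

Even ten losses in a row only takes the figure from 100% to 75%, whereas a commodity price has no such built-in smoothing and can fall much faster over the same period.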