Tuesday 30 June 2009

Correlation of the Week: Ashes success and El Nino

As we have shown on this blog a number of times (see here, here and here for starters), cricket fans love their maths. So it should come as no surprise that another cricket/maths story has recently come out, this time from the University of Reading, linking cricket success with the weather! I only blog my maths/cricket geekiness; these guys have research funding!

Manoj Joshi has shown that the El Nino Southern Oscillation (ENSO) phenomenon has a significant effect on the results of The Ashes cricket series between Australia and England when the series is held in Australia. The Australian Cricket team is more likely to succeed after El Nino years, while the English cricket team does better following La Nina years (the opposite phase). Their study, Could El Niño Southern Oscillation affect the results of the Ashes series in Australia?, was published in the journal Weather.

I didn't quite believe this at first, so I took their data, redid the maths, and it turns out that they are correct! However, the media interpretations of these results are, not surprisingly, a little over the top. Whilst there is a significant correlation between the state of El Nino in the year before the Ashes series and the result, the correlation itself is weak. This is an important point to keep in mind with any correlation - strength and significance are two different things - even ScienceDaily got this wrong in its reporting on the topic. There is a nice explanation of these ideas here.

Strength refers to how well the data sets move with each other, while significance refers to how unlikely it is that the correlation occurred by chance. For example, you can easily get a strong correlation between two data sets if you have only a small amount of data. But as you lack data, it is unlikely that the relationship will actually be significant. In our case, however, the correlation is quite weak, but the relationship is significant. The conclusion to this study should be that ENSO plays a very small role in determining the results of Ashes series in Australia, that other factors are likely to be more important, and that simple noise and randomness will probably have more of an effect than the phase of ENSO. It is only over time that this correlation can be teased out. The study does admit this, with Joshi saying:

"There are of course many different factors governing the outcome of any given sporting contest, which would act as noise in this analysis."

But I think his statement that "the study could even influence whether the England touring team should include more fast bowlers or more 'swing' bowlers" is probably a little bold (and to his credit he does admit this)!
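To get a feel for how easy it is to find a strong correlation in a tiny data set, here is a rough Python sketch (mine, not from the paper). It draws pairs of completely unrelated random data sets and counts how often they happen to show a correlation of 0.8 or stronger. The threshold of 0.8, the 20,000 trials and the function names are all my own assumptions, chosen purely for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chance_of_strong_r(n_points, threshold=0.8, trials=20000, seed=1):
    """Fraction of unrelated random data sets whose |r| reaches the threshold.

    The threshold, trial count and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Two independent samples: any correlation between them is pure chance
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]
        if abs(pearson_r(xs, ys)) >= threshold:
            hits += 1
    return hits / trials


small = chance_of_strong_r(5)
large = chance_of_strong_r(32)
print(f"5 points:  |r| >= 0.8 in {small:.1%} of unrelated data sets")
print(f"32 points: |r| >= 0.8 in {large:.1%} of unrelated data sets")
```

With only 5 points, a "strong" correlation should turn up by pure chance in a noticeable share of the trials, while with 32 points it should essentially never happen. That is why a strong correlation from a handful of points tells you so little.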

So, how does this all work?

There are two phases of ENSO - during El Nino, the eastern equatorial Pacific Ocean warms by about 1 degree. For Australia this means low rainfall and high temperatures. La Nina is the reverse, with more rain and a drop in temperature. The study analysed the results of all Ashes series held in Australia from 1882-2007 and found that during El Nino years, the Australian team won 13 out of 17 series (76%), but only five out of the 13 played in La Nina years (38%). England has only won one Ashes series in the last 100 years following an El Nino event - the Bodyline series in 1932/33. The author speculates that pitch conditions can affect the outcome of a match, with the drier pitches of El Nino favouring Australia's fast bowlers and the damper conditions of La Nina suiting England's slower swing bowlers.
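As a rough sanity check on those win rates, here is a small Python sketch. It is my own addition, not the method used in the paper: it recomputes the two percentages and then asks, via a one-sided Fisher exact test, how likely it would be to see Australia win at least 13 of the 17 El Nino series if its 18 wins were scattered at random across the 30 series counted here. Lumping every non-win (including draws) into one category is an assumption on my part.

```python
from math import comb

# Counts quoted in the post (series played in Australia)
aus_wins_el_nino, series_el_nino = 13, 17
aus_wins_la_nina, series_la_nina = 5, 13

rate_el_nino = aus_wins_el_nino / series_el_nino
rate_la_nina = aus_wins_la_nina / series_la_nina
print(f"Australian win rate, El Nino years: {rate_el_nino:.0%}")
print(f"Australian win rate, La Nina years: {rate_la_nina:.0%}")

total_series = series_el_nino + series_la_nina
total_aus_wins = aus_wins_el_nino + aus_wins_la_nina


def prob_wins_in_el_nino(k):
    """Hypergeometric probability that exactly k Australian wins fall in El Nino series."""
    return (comb(total_aus_wins, k)
            * comb(total_series - total_aus_wins, series_el_nino - k)
            / comb(total_series, series_el_nino))


# One-sided Fisher exact test: chance of 13 or more El Nino wins under random allocation
p_value = sum(prob_wins_in_el_nino(k)
              for k in range(aus_wins_el_nino, series_el_nino + 1))
print(f"One-sided p-value: {p_value:.3f}")
```

This only looks at two categories of year, so it throws away the size of the Nino 3 anomaly that the correlation analysis uses. Treat it as a cross-check, not a replacement.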

Now to the maths. I have reproduced the results from the paper in our chart as you can see here. On the y-axis is the series result (English wins minus Australian wins). On the x-axis is the Nino 3 index, which is the mean monthly temperature anomaly in the eastern tropical Pacific: 5S-5N; 150W-90W. Of course, all the dots should be on integer values of y - some were shifted in the original paper for ease of viewing. The correlation is still correct.

Ashes result vs El Nino effect

What we can see here is a very weak correlation - the R2 value is only 0.1. R2 is the coefficient of determination and gives some information about goodness of fit. A value this low is generally taken to mean there is very little relationship at all. One interpretation is that only about 10% of the variation in series results can be explained by the Nino 3 index. The paper itself quotes R (=-0.31) as opposed to R2, but to judge how much of the variation a relationship actually explains, you need R2.
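As a quick check on the arithmetic, squaring the R quoted in the paper gives the R2 shown on the chart (rounded to one decimal place):

$$R^2 = (-0.31)^2 = 0.0961 \approx 0.1$$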

To test for significance, Joshi generated 10,000 sets of random numbers to represent the Nino 3 index - each set had 32 members (the same as the number of Ashes series) and a normal distribution with a mean of zero and a standard deviation of 0.8, similar to ENSO observations. The study found that the chance of R being more negative than -0.31 was 5%, which is the level generally accepted as being significant.
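The procedure described above can be sketched in a few lines of Python. This is my own reconstruction from the description, not the paper's code, and the series results in it are HYPOTHETICAL placeholders, not the real Ashes results; swap in the real figures to repeat the test properly. The seed and the variable names are also my own assumptions.

```python
import math
import random

N_SERIES = 32        # number of Ashes series in the study
N_SETS = 10000       # number of random "Nino 3" data sets
SIGMA = 0.8          # standard deviation of the random index
OBSERVED_R = -0.31   # correlation quoted in the paper


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# HYPOTHETICAL placeholder results (English wins minus Australian wins),
# cycling through -3..3. These are NOT the real series results.
results = [(i % 7) - 3 for i in range(N_SERIES)]

rng = random.Random(2009)  # fixed seed so the run is repeatable
more_negative = 0
for _ in range(N_SETS):
    # One fake Nino 3 record: 32 normal draws with mean 0 and sd 0.8
    fake_index = [rng.gauss(0, SIGMA) for _ in range(N_SERIES)]
    if pearson_r(fake_index, results) < OBSERVED_R:
        more_negative += 1

fraction = more_negative / N_SETS
print(f"Random sets with R below {OBSERVED_R}: {fraction:.1%}")
```

Because the fake index is pure normal noise that is unrelated to the results, the fraction should depend very little on which results you feed in, so even with placeholder data it should land in the same region as the roughly 5% reported.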

There is, however, an easier way to do this - you can use t-tests (as we used in our earlier Correlation of the Week on vampire and zombie movies). To generate your t-statistic from the correlation coefficient r, you use the formula:

$$t = r\sqrt{\frac{N-2}{1-r^2}}$$

where N is the number of sample points (32). When you do this, you get a t value of -1.78. Our null hypothesis is r=0 (that is, there is no correlation). When you look this t-value up in a t-distribution table with N-2 = 30 degrees of freedom, you find that it is more negative than the critical value of -1.70 for a one-tailed test at the 5% level, which means it is significant. What all this means is that there is a significant but very weak correlation. I wouldn't put any money on either team based on this result! In any case, Australia is going to win....
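For anyone who wants to check the t-test without a calculator, here is a minimal Python sketch. It assumes the standard t-statistic for a correlation coefficient, t = r * sqrt((N - 2) / (1 - r^2)), and hard-codes two one-tailed 5% critical values copied from a standard t-table. The function names and the second example (r = 0.8 from only 5 points) are my own, added for contrast.

```python
import math

# One-tailed 5% critical values from a standard t-table,
# keyed by degrees of freedom (N - 2). Only the two needed here.
CRITICAL_T = {3: 2.353, 30: 1.697}


def t_statistic(r, n):
    """t-statistic for testing a correlation r from n points against r = 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def is_significant(r, n):
    """True if |t| exceeds the one-tailed 5% critical value for n - 2 d.o.f."""
    return abs(t_statistic(r, n)) > CRITICAL_T[n - 2]


ashes_t = t_statistic(-0.31, 32)   # weak correlation, 32 series
small_t = t_statistic(0.8, 5)      # strong correlation, only 5 points

print(f"Ashes: t = {ashes_t:.2f}, significant: {is_significant(-0.31, 32)}")
print(f"Small sample: t = {small_t:.2f}, significant: {is_significant(0.8, 5)}")
```

With r rounded to -0.31 the sketch gives a t value within a whisker of the figure above (any small difference presumably comes from rounding r), and it lands just beyond the critical value. The made-up small sample shows the opposite case: a much stronger correlation that still fails the test because there are so few points.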

3 comments:

  1. Correlation Significanc vs. Strength; Explained!

    Thank you. You don't know how hard it was to find an explanation of a weak correlation with strong significance.

  2. yes.. we know how to spell up here.. Correlation Significance.
