Tuesday 30 June 2009

Correlation of the Week: Ashes success and El Nino

As we have shown on this blog a number of times (see here, here and here for starters), cricket fans love their maths. So it should come as no surprise that another cricket/maths story has recently come out, this time from the University of Reading, linking cricket success with the weather! I only blog about my maths/cricket geekiness - these guys have research funding!

Manoj Joshi has shown that the El Nino Southern Oscillation (ENSO) phenomenon has a significant effect on the results of The Ashes cricket series between Australia and England when the series is held in Australia. The Australian cricket team is more likely to succeed after El Nino years, while the English cricket team does better following La Nina years (the opposite phase). The study, Could El Niño Southern Oscillation affect the results of the Ashes series in Australia?, was published in the journal Weather.

I didn't quite believe this at first, so I took their data, redid the maths, and it turns out that they are correct! However, the media interpretations of these results are, not surprisingly, a little over the top. Whilst there is a significant correlation between the state of El Nino in the year before an Ashes series and the result, the correlation itself is weak. This is an important point to keep in mind with any correlation - strength and significance are two different things - even sciencedaily got this wrong in its reporting on the topic. There is a nice explanation of these ideas here.

Strength refers to how closely the data sets move with each other; significance refers to how likely it is that the correlation occurred by chance. For example, you can easily get a strong correlation between two data sets if you have only a small amount of data, but with so little data, it is unlikely that the relationship will actually be significant. In our case, however, the correlation is quite weak, but the relationship is significant. The conclusion of this study should be that ENSO plays a very small role in determining the results of Ashes series in Australia, that other factors are likely to be more important, and that simple noise and randomness will probably have more of an effect than the phase of ENSO. It is only over time that this correlation can be teased out. The study does admit this, with Joshi saying:

"There are of course many different factors governing the outcome of any given sporting contest, which would act as noise in this analysis."

But I think his statement that "the study could even influence whether the England touring team should include more fast bowlers or more 'swing' bowlers" is probably a little bold (and to his credit he does admit this)!
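To make the strength-versus-significance distinction concrete, here is a small Python sketch using the standard t-statistic for a correlation coefficient (the same test that appears later in this post). The particular r and N values are made up purely for illustration:

```python
import math

def t_statistic(r, n):
    """t-statistic for testing whether a Pearson correlation r,
    measured from n data points, differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Strong but NOT significant: r = 0.7 from only 5 points.
# The one-tailed 5% critical value for 3 degrees of freedom is 2.353 (from tables).
t_small = t_statistic(0.7, 5)     # about 1.70 - below 2.353, so not significant

# Weak but significant: r = 0.2 from 1000 points.
# The one-tailed 5% critical value for 998 degrees of freedom is about 1.646.
t_large = t_statistic(0.2, 1000)  # about 6.45 - far beyond 1.646, so significant
```

So a correlation can be strong yet meaningless, or weak yet very real - it all depends on how much data sits behind it.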

So, how does this all work?

There are two phases of ENSO - during El Nino, the eastern equatorial Pacific Ocean warms by about 1 degree, which for Australia means low rainfall and high temperatures. La Nina is the reverse, with more rain and a drop in temperature. The study analysed the results of all Ashes series held in Australia from 1882-2007 and found that during El Nino years, the Australian team won 13 out of 17 series (76%), but only five of the 13 played in La Nina years (38%). England has won only one Ashes series in the last 100 years following an El Nino event - the Bodyline series in 1932/33. The author speculates that pitch conditions can affect the outcome of a match, with the drier pitches of El Nino years favouring Australia's fast bowlers and the damper conditions of La Nina suiting England's slower swing bowlers.

Now to the maths. I have reproduced the results from the paper in our chart as you can see here. On the y-axis is the series result (English wins minus Australian wins). On the x-axis is the Nino 3 index, which is the mean monthly temperature anomaly in the eastern tropical Pacific: 5S-5N; 150W-90W. Of course, all the dots should be on integer values of y - some were shifted in the original paper for ease of viewing. The correlation is still correct.

Ashes result vs El Nino effect

What we can see here is a very weak correlation - the R2 value is only 0.1. R2 is the coefficient of determination and gives some information about goodness of fit. A value this low is generally taken to suggest a very weak relationship, if any. One interpretation is to say that about 10% of the variance in the series results can be explained by the Nino 3 index. The paper itself quotes R (=-0.31) rather than R2, but to determine whether a relationship is strong or not, you need R2.

To test for significance, Joshi generated 10,000 sets of random numbers to represent the Nino 3 index - each set had 32 members (the same as the number of Ashes series) and a normal distribution with a mean of zero and a standard deviation of 0.8, similar to ENSO observations. He found that the chance of R being more negative than -0.31 was 5%, which is the level generally accepted as significant.
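You can reproduce this kind of Monte Carlo test in a few lines of Python. One caveat: Joshi correlated his random indices against the actual series results, which I don't have to hand, so this sketch correlates two random series instead - under the null hypothesis of no relationship, that gives the same distribution of R:

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials = 10_000
n_series = 32  # number of Ashes series in the study

count = 0
for _ in range(n_trials):
    # Random stand-in for the Nino 3 index: mean 0, sd 0.8, as in the paper
    x = rng.normal(0.0, 0.8, n_series)
    # Random stand-in for the series results under the null hypothesis
    y = rng.normal(0.0, 1.0, n_series)
    r = np.corrcoef(x, y)[0, 1]
    if r <= -0.31:
        count += 1

fraction = count / n_trials  # comes out near 0.05, matching the paper
```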

There is, however, an easier way to do this - you can use a t-test (as we used in our earlier Correlation of the Week on vampire and zombie movies). To generate your t-statistic, you use the formula:

\[ t = \frac{r\sqrt{N-2}}{\sqrt{1-r^2}}, \]

where N is the number of sample points (32) and r is the correlation coefficient. When you do this, you get a t value of -1.78. If our null hypothesis is r=0 (that is, there is no correlation), looking this t-value up in a t-distribution table with N-2 = 30 degrees of freedom shows that it is more negative than the critical value of -1.70 in a one-tailed test, which means the correlation is significant. What all this means is that there is a statistically significant, but very weak, correlation. I wouldn't put any money on either team based on this result! In any case, Australia is going to win....
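For the record, the whole calculation fits in a few lines of Python (the tiny difference between -1.79 here and the -1.78 quoted above is just rounding):

```python
import math

r = -0.31   # the correlation reported in the paper
n = 32      # number of Ashes series

# R-squared, the coefficient of determination discussed above
r_squared = r ** 2                                # about 0.096, i.e. roughly 0.1

# t-statistic for testing the null hypothesis of no correlation
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # about -1.79

# One-tailed 5% critical value for n - 2 = 30 degrees of freedom (from tables)
t_crit = -1.697

significant = t < t_crit  # True: weak, but statistically significant
```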

Monday 29 June 2009

Beauty and the Geek comes to Australia

Ever seen the American show Beauty and the Geek, produced by twitterer, actor and producer Ashton Kutcher? The show takes a bunch of horribly awkward, socially inept, level-23 dragon overlords - sorry, Dungeons and Dragons aficionados - and puts them in a house with a group of stunning looking, but generally slow-on-the-uptake, girls to see what happens. Each geek is teamed up with a beauty and they live together in a room throughout the show, with challenges in each episode being used to eliminate the couples. The challenges test what the contestants are bad at - the beauties are tested in academic topics (the geeks are supposed to tutor them) and the geeks are tested with social questions (the beauties are likewise supposed to teach them about the real world). At the end of the show, one couple wins a swag of money and there is a very American resolution with the contestants saying how much they've learnt and grown.

So yes, it's ridiculous and patronising - but it's compelling viewing and coming to Australia!

If you want to apply to be a contestant on the show, download the application forms on the Channel 7 Beauty and the Geek website.

Thursday 25 June 2009

Awesome Illusion

This illusion has been doing the rounds this week (see Bad Astronomy and Richard Wiseman for a couple of science blogs I like that picked it up), but it's so good I thought it needed to be posted here also.

Look carefully at the image below. Do you see a couple of spirals, one blue and one green? Well, take a closer look - in actual fact, the blue and green are the same colour!

Don't believe me? Copy the image and open it up in Photoshop or Paint and take a closer look....

You will notice that the orange curves move through the "green" spirals, but not the blue. And the purple curves don't move through the green.

If we blow this picture up even more, we can see that the colours are becoming more and more similar.

The blue and green appear to be different colours because our brain works out colours by comparing them to other surrounding colours and it does a bit of mixing. When we look at the "blue" spiral, we also take in the purple curves moving through it. This makes it look more blue. When we look at the "green" spiral, we take in the orange curves, which makes it look more green.

I know that's not a great explanation, so I'd be happy to hear a better one!

Edit by westius 2/7/09: If you doubt this illusion, check out this image - I've replaced some of the colours - you can clearly see now that the 'blue' and 'green' are the same:

Wednesday 24 June 2009

Furthering your science communication education

Interested in science communication? Love travelling? Then joining the 2010 Shell Questacon Science Circus might be for you. Their slogan this year is "Got your science degree, now get your backpack!"

If you are a recent science or technology graduate and enjoy science communication, then you can earn a prestigious Graduate Diploma from ANU, work with Questacon, the National Science and Technology Centre, and travel Australia communicating science in cities, towns, outback stations, indigenous communities, schools, nursing homes, just about everywhere. As far as I can tell, there aren't any science communication courses like this anywhere in the world.

Your fees are paid for, and you also receive an additional scholarship. You are trained in science communication by top scientists, journos, radio presenters and performers.

I did the course in 2001 and it was fantastic. To get in, you'll need to be as good looking as these folk and this lot. Not only have I now seen places in Australia that barely anyone gets to see, but I had a ball and made contacts in science communication fields. Plus, you are on tour much of the year and believe it or not, when you aren't, Canberra is a pretty good place to live.

Applications close 31 August - see the circus website for more information.

For my reflections on the 2001 year, see my Science Circus flickr photo set (I really should scan some more photos and put them up - ah, a time before digital cameras were popular...) and this ridiculous story and podcast - if you're game, listen to the podcast of that episode to hear me sing... I have improved my sound recording since then, but not my singing.

If you happen to be at the other end of your science communication post-grad education, the Public Communication of Science and Technology Network is collecting information about PhD theses completed since 2000 on topics in science communication and closely related areas. This information, with a classification by topic area and country, will be published on their website and a review will be presented at the PCST11 conference in 2010.

This exercise is intended to provide an overview of current and recent research in the growing field of science communication that will support those starting in the field. It is also intended to promote networking between researchers in science communication.

Information is sought on doctoral theses that addressed mainly or exclusively topics such as science in media, public communication by scientists and scientific institutions, history of popularisation, science museums and science centres, science festivals and events, public consultation in science-related policy, risk communication on science-related issues, and so on.

For each thesis, they seek the following details:
  • Title;
  • author;
  • affiliation (university or similar);
  • date of completion;
  • supervisor(s);
  • examiner(s);
  • abstract (approx 250 words);
  • where accessible (online or print).
Brian Trench of Dublin City University and Maarten van der Sanden of Delft University of Technology are leading this project. Summaries of theses should be sent by 1 October to M.C.A.vanderSanden@tudelft.nl

And if you do have a science communication PhD, make sure you dance it!

Sunday 14 June 2009

Sumo vs Chess - how their ranking systems work

At the very heart of sport is a fierce battle in which the combatants strive to outwit and outplay each other. Each thrust is matched by a parry and in the end, there can only be one winner. The rules of each sport dictate how that winner is determined, and, whether it is football, tennis, golf or chess, it is those who perform best on the day who take home the glory.

But it's not easy to get to the all-important final. True sporting supremacy cannot be decided in a one-off winner-takes-all event. Indeed, many championships are specifically designed such that the ultimate winners are not decided by the results of single events but rather by accumulated results over time. It is easy to rank teams within structured leagues like the English Premier League, as each team plays all opponents, so that eventually the best teams rise to the top. But it is not always that easy — indeed, more often than not, it is very difficult. Keeping with the football example, how can we compare international football teams such as Australia and England who very rarely play each other? International football sides do not play as often as club teams, so we can't have the same confidence that the team with the most wins over a year, or even the highest percentage of wins, is the best team in the world. And how can we determine which wins are the most important?

Different sports have developed their own way of dealing with the subtleties of ranking their players and teams, and this article is the first of a series looking at these ranking systems. In future instalments we'll be looking at rugby, football, cricket, tennis and golf, but to start off, we explore the ranking systems of two very different sports: chess and sumo wrestling.


It should not surprise anyone that, of all sports, chess has the most mathematically elegant ranking system — it is, of course, a very cerebral pursuit. The ranking system, called the ELO system, was developed by Arpad Elo, a Hungarian-born American physics professor and chess guru.

Unlike in other sports such as tennis, where rating points can be awarded subjectively — for example, an important tennis tournament might be worth ten times more than another — Elo's idea was to mathematically estimate, based on observation, actual player ability.

Elo's original assumption was that a player's performance varies from game to game in a way that can be described by the normal distribution: if you measure a player's skill numerically and plot how often he or she achieves each skill value against the possible skill values, you will get a bell-shaped curve as shown in figure 1. The peak of the curve is the mean of the distribution and represents the player's true skill, while the tails represent untypically good or bad performances. So while a player can have very good and very bad days, on average they perform somewhere near their mean value. The aim of the ELO system is to estimate the mean value for each player by looking at how often they win, lose and draw against other players with different abilities — this gives you their rating. We can use the player ratings to predict the probability of one player beating another, and the smaller your chance of winning, the more rating points you get if you do win.

Elo's original model used normal distributions of player ability; however, observations have shown that chess performance is probably not normally distributed — weaker players actually have a greater winning chance than Elo's original model predicted. The modern ELO system is therefore based on the logistic distribution, which gives the lower-rated player a greater, and more accurate, chance of winning.

To see how this system works in practice, let's have a closer look at the maths. Each player's ability is modelled as having a standard deviation of 200 rating points. The standard deviation is a measure of the spread of the data around the mean. In our case a low standard deviation would mean that the player's performance never strays far off the mean, while a high standard deviation means that he or she occasionally has pretty drastic off-days, both in the negative and positive sense. Traditionally in chess, ability categories like grand master, master and so on, spanned 200 rating points each, and this may be the reason why the value 200 was chosen for the standard deviation.

Based on these assumptions it's possible to work out the expected scores of a player $A$ playing against player $B$. If player $A$ has a rating of $R_A$ and player $B$ has a rating of $R_B$, the exact formula for the expected score of player $A$ is

\[ E_A = \frac{1}{1+10^{\frac{R_B-R_A}{400}}}. \]

Similarly, the expected score for player $B$ is

\[ E_B = \frac{1}{1+10^{\frac{R_A-R_B}{400}}}. \]
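In code, the expected score is a one-liner. As a sanity check, the values below match the worked tournament example later in this post:

```python
def expected_score(r_a, r_b):
    """Expected score for a player rated r_a against a player rated r_b,
    under the logistic ELO model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Equal ratings give a 50-50 expectation
expected_score(1600, 1600)  # 0.5

# A 1784-rated player against a 2314-rated player expects about 0.045,
# and against a 1700-rated player about 0.619
expected_score(1784, 2314)
expected_score(1784, 1700)
```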

The following chart shows the probabilities involved in a game of chess based on Elo's original normal distribution, and the modified version employing the logistic model. The horizontal axis measures the difference between the rating of player A and player B and the vertical axis gives the chance of a win for player A. There is little difference between the two curves except in the tails, where the logistic curve gives the lower rated player a greater chance of winning.

[Graph: the chance of a win for player A against the rating difference, for both the normal and logistic models]

Player ratings are updated at the end of each tournament. Imagine you are a player with rating 1784. In a tournament you play 5 games with the following results:

  1. Win against player rated 2314;
  2. Lose against player rated 1700;
  3. Draw against player rated 1302;
  4. Win against player rated 1492;
  5. Lose against player rated 1927.

As you had two wins and a draw, you scored 2.5. However, your expected score, as calculated from the above formula, was 0.045 + 0.619 + 0.941 + 0.843 + 0.306 = 2.754. Therefore you didn't do as well as was expected.

When a player’s tournament score is better than the expected score, the ELO system adjusts the player’s rating upward. Similarly, when a player’s tournament score is less than the expected score, the rating is adjusted downward. Supposing player $A$ was expected to score $E_A$ points but actually scored $S_A$ points, their new rating $R^\prime_A$ is

\[ R^\prime_A = R_A + K (S_A - E_A), \]

where $K$ is a constant which causes much debate in the chess world. Some chess tournaments use $K=16$ for masters players and $K=32$ for weaker players. This means that unusually good or bad performances weigh more heavily for weak players than they do for masters.

For our player with rating 1784, the new rating becomes

\[ R^\prime_A = 1784 + 32(2.5-2.754) \approx 1776. \]
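Putting the two formulas together, a minimal Python implementation of the whole update (using K = 32, as in the example) reproduces the numbers above:

```python
def expected_score(r_a, r_b):
    """Expected score for a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def updated_rating(rating, opponent_ratings, actual_score, k=32):
    """New ELO rating: old rating + K * (actual score - expected score)."""
    expected = sum(expected_score(rating, opp) for opp in opponent_ratings)
    return rating + k * (actual_score - expected)

# The tournament example from the post: rating 1784,
# two wins and a draw for an actual score of 2.5
new = updated_rating(1784, [2314, 1700, 1302, 1492, 1927], 2.5)  # about 1776
```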

As with all ranking systems, there are controversies surrounding the ELO system. These include

  1. Mixing: Like all ranking systems, the ELO system works best when players play often against many different people. Imagine a chess club whose members generally play among themselves. Their ratings therefore reflect how good each member is compared to their club's other members. But if that club then plays against a second club in a tournament, there is every chance that one of the clubs is considerably stronger than the other. This however will not be reflected in the ratings before competition. Only after some time will the ratings reflect player ability across both clubs rather than just within each club.
  2. Initial score: How should a new player be ranked, and how credible is that rank?
  3. The constant K: What is the right value of K? If K is too low, it is harder to win points, but if K is too high the system becomes too sensitive.
  4. Selective play: In order to maintain their rankings, players may selectively play weaker players.
  5. Time: How should a player who is not playing anymore, or plays infrequently, be ranked?


Sumo masters are decided in a way that's hard to quantify.

The ranking method of sumo wrestling is almost the complete opposite of chess — much like the sports themselves! Whilst some mathematics feeds into the ranking formulation, much of what determines a sumo's rank, especially in the upper ranks, cannot be quantified.

There are six divisions in sumo wrestling — makuuchi, juryo, makushita, sandanme, jonidan and jonokuchi. The top division, makuuchi, is very popular in Japan and has a complex inner ranking system, with yokozuna the ultimate rank. The following figure shows the breakdown of sumo ranks, with the numbers in brackets representing the number of wrestlers at that level.

[Figure: the breakdown of sumo ranks across the six divisions, from jonokuchi up to yokozuna]

Sumo wrestlers fight in tournaments called basho. These tournaments run for 15 days and, depending on the division, consist of seven or 15 bouts. Grand Sumo tournaments, known as honbasho, determine sumo rankings and there are six throughout the year. In these tournaments, the wrestlers fight within their divisions — sekitori fight 15 matches whilst wrestlers in the lower divisions fight seven. As there are more wrestlers than there are matches, sumo elders called oyakata determine the match-ups the day before.

In general, rising to sekiwake, the third level of the highest division, requires that you win more bouts than you lose in tournaments. If you have a positive winning record in a tournament, you will move up, and vice versa. If your winning record is 13-2, you will climb higher than someone with an 8-7 record.

The jump from the third division, makushita, to the second division, juryo, is perhaps the most important rank distinction in sumo. Juryo is the first rank of sekitori, and the ultimate aim of most wrestlers. Wrestlers lower than this rank have to do chores for their superiors and are essentially sumo slaves. Non-sekitori wrestlers become tsukebito (personal valets) for the higher ranked wrestlers. Those at the very top of the table, yokozuna, typically have four tsukebito while everyone else in the sekitori class normally has two or three depending on prestige and seniority. There is probably no other sport in which the difference between ranks is so important — tsukebito need to accompany their superiors wherever they go, and while sekitori can relax and hang out with their fan clubs at the end of the day, or go home to their apartments, the junior wrestlers must clean the sumo stables and live in communal dormitories. The difference in salary is also huge — juryo rank receives a base salary of ¥1,036,000 (around £7,000) plus considerable add-ons and bonuses, while there is no salary below this rank.

Given this massive discrepancy, you can see why maintaining a rank of sekitori is very important for a sumo wrestler. Indeed, the ancient sport has recently been tainted by a match-fixing controversy. A study by Steven Levitt and Mark Duggan, popularised in the book Freakonomics, showed that 80% of wrestlers with 7-7 records win their matches at the end of tournaments, when you would expect this percentage to be closer to 50%. The authors conclude that those who already have 8 wins, and have therefore secured their ranking, collude with those who are 7-7 and let them win.
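We can check how surprising that 80% figure is with a quick back-of-the-envelope calculation. The bout count below (100) is my own made-up illustration - the actual sample in the study is different - but it shows why a result like this can't be put down to chance:

```python
import math

# Hypothetical sample: suppose we observed 100 end-of-tournament bouts in which
# a 7-7 wrestler faced an already-safe opponent (100 is just for illustration;
# the real study's sample size differs).
n_bouts = 100
wins = 80  # the reported ~80% win rate for the 7-7 wrestler

# Exact binomial probability of 80 or more wins out of 100 if bouts were 50-50
p_chance = sum(math.comb(n_bouts, k) for k in range(wins, n_bouts + 1)) / 2 ** n_bouts
# p_chance is vanishingly small - far below any conventional significance level
```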

The rank of yokozuna is the highest and most venerable position in the sumo world. The yokozuna is the Grand Champion. Ozeki are also held in very high regard, and there are always at least two ozeki — there is no minimum on the number of yokozuna at any one time, however there may be none. There are currently two yokozuna.

To achieve these ranks, you must do more than have a positive honbasho record. By the time a wrestler reaches the rank of sekiwake, he has been able to maintain a positive honbasho record for some time. If a sekiwake starts to accumulate 10-5 or better records and occasionally upsets a yokozuna, the sumo administrative board (called the sumo kyokai) will consider a promotion to ozeki. One of the benefits of ozeki is that there is no automatic demotion based on match results — demotion back to sekiwake requires losing records in two consecutive tournaments.

If an ozeki starts winning honbasho, he may be judged by the promotion council (the yokozuna shingi iinkai) for possible further promotion. And here is where the real mysteries of sumo ranking come in. The promotion council can recommend the ozeki to the riji-kai (board of directors) of the sumo kyokai. The first thing they consider are the previous three honbasho. Out of those 45 bouts, 38 is the minimum number of wins needed to be considered and the ozeki should have won two consecutive tournaments — but this is not all! The wrestlers must show respect for the sumo kyokai's rules and traditions, and towards past wrestlers, and also possess a character and attitude appropriate for a yokozuna. They must have hinkaku (dignity and grace) and have mastered basic sumo techniques such as shiko (the way a sumo holds his foot aloft before pounding it into the ground) and suri-ashi (the technique of keeping the bottom of each foot always touching the ground while moving). Once the riji-kai approves the promotion, it needs to be finally decided by the banzuke hensei kaigi (ranking arranging committee).

Famously in 1991, Hawaiian-born Samoan wrestler Konishiki, the heaviest wrestler ever in top-flight sumo, was denied yokozuna even though he had won two championships in a row. The chairman of the promotion council said, "We wanted to make doubly sure that Konishiki is worthy to be a grand champion. Therefore, we decided to wait for another tournament." It was speculated at the time that a foreign-born sumo wrestler could never make yokozuna as they could not possess the required cultural understanding. Since then however, there have been foreign-born yokozuna.

A yokozuna cannot be demoted and is expected to retire if his performance starts to dip.

As you can see, mathematics hardly comes into sumo; rather, the rankings are based on trust and veneration for those in charge. Most other sports, however, do require a more objective evaluation of their stars, and we'll have a look at some of them in future articles of this series.

For more, see Plus Magazine. For more on chess, see Chess rankings, ChessBase and Knowledgerush, and for Sumo see the Sumo FAQ.

Thursday 4 June 2009

Ep 107: Ranking Cricketers

Cricket is one of the world's most statistical sports, and mathematicians in cricket-loving nations love nothing more than delving into the minutiae of the numbers and diving into averages, strike-rates and custom-made measures of batting and bowling effectiveness.

For many people, including me, cricket isn't just a sport, it is a way of life.

These words could easily have come from me, but are actually the words of Rob Eastaway, a cricket-loving mathematician from the UK and originator of the official International Cricket Council cricket ratings, which rank not only teams but players within each team. In this week's podcast, I chat to Rob about how you mathematically rank cricketers.

Listen to this podcast here - note a few audio issues, see below:

Ranking individual batsmen and bowlers is no small task. A common method of comparing batsmen is the batting average, which is the average score a batsman compiles each time he comes in to bat. This method, however, has a number of issues, as it does not take into consideration the opposition, the playing conditions, or how recently the runs were scored. How can you compare a score of 60 against a world-class opponent on a dodgy pitch with a score of 150 against a lowly rated team in easy batting conditions? This is what Eastaway's ranking system attempts to do - and the maths is quite difficult (far more difficult, in Rob's words, than the Duckworth-Lewis method of determining the winner in a rain-affected game!)

As well as taking into consideration the strength of the opposition and the playing conditions, the ranking system places a greater emphasis on recent performances. The overall system has a number of feedback loops - the individual player ratings contribute to a team's rating, which affects how many rating points an opposition player can earn against that team - remember, a score of 50 against tough opposition will be worth more than 50 against low-class opponents. Similarly, how each player in a match performs influences how many points are on offer. For example, a score of 45 out of an overall team score of 100 will be more highly valued than a score of 45 out of 450. As such, large amounts of historical data are used to come up with the final numbers. Limited-overs cricket has the additional dimension of strike-rate - a batsman who scores his runs quickly will be rated more highly than a slow scorer.
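The exact ICC algorithms are unpublished, but the ingredients described above - opposition strength, a batsman's share of the team total, and a recency weighting - can be sketched in a toy model. Everything here (the weights, the 0-1000 rating scale, the decay factor) is my own invention for illustration, not Eastaway's actual system:

```python
def toy_batting_points(runs, opposition_rating, team_total, max_rating=1000):
    """Toy illustration only (NOT the real ICC algorithm): scale runs by
    opposition strength and by the batsman's share of the team total,
    so 45 out of 100 is worth more than 45 out of 450."""
    opposition_factor = opposition_rating / max_rating  # assumed 0..1 scale
    share_factor = runs / max(team_total, 1)
    return runs * (0.5 + opposition_factor) * (0.5 + share_factor)

def toy_rating(points_per_innings, decay=0.9):
    """Exponentially weight recent innings more heavily (most recent first)."""
    weights = [decay ** i for i in range(len(points_per_innings))]
    return sum(w * p for w, p in zip(weights, points_per_innings)) / sum(weights)

# 45 out of 100 beats 45 out of 450 against the same opposition
toy_batting_points(45, 800, 100) > toy_batting_points(45, 800, 450)  # True

# 50 against strong opposition beats 50 against weak opposition
toy_batting_points(50, 900, 300) > toy_batting_points(50, 400, 300)  # True
```

Even this crude version shows the feedback idea: change an opponent's rating and every score against them is revalued.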

The system was first developed in 1987 by Eastaway with former English cricketer Ted Dexter and colleague Gordon Vince, and at first the system was greeted with scepticism by many cricket lovers. Nowadays, however, it has gained credibility and has even been used by international cricketers to help negotiate their contracts - for example, Michael Bevan was for a long time rated number one in the One Day International version of the rankings and used this in contract negotiations, however he could not secure a Test spot. Known originally as the Deloittes Ratings and in later years the PwC Ratings, the system was officially adopted by the International Cricket Council in January 2005.

With a background in Operations Research and a love of cricket, Eastaway is essentially my idol! You can read more about the maths of his system in his article Howzat in Plus Magazine.

As many of the equations used in this system are now copyrighted, you can't find the exact algorithms published anywhere. However, if you are a big nerd like me, you might like the book Deloitte Ratings: The Complete Guide to Test Cricket in the Eighties by Marcus Berkmann. The book details the ratings changes after each Test series in the 80s, and the appendix contains many of the equations which underpin the system. I was given this book when I was 10 and didn't much understand it back then, but I was very happy to find it in storage when I returned from the UK, and I now find it a maths-cricket-nerd's delight!

I'm fascinated to know if they will come up with something for Twenty20. The ultra-shortened form of the game brings in loads more complexities, not least of which is that unless you are an opening batsman, you may not even get a bat! Here's hoping Australia can win the World Twenty20 - oh and The Ashes! If only I was in the UK this summer!

I hope you enjoy this podcast - however, please note there are some audio issues. I had a great chat with Rob in a cafe in London, however my recording equipment was set on the wrong setting and so captured a lot more background noise than I had hoped! So please hang in there - this was one of my very favourite interviews. Rob is a fascinating person and had some really interesting observations on maths and sport. I really shouldn't have gone to The Chemical Brothers the night before, I probably would have had the microphone on the right setting!

Listen to this podcast here: