Monday, 11 August 2008

Modelling Olympic Gold

After every Olympics, there is speculation about which country performed best. Should we really be surprised when China, with its huge population, and the US, with its combination of high GDP and population, top the medal table? Can we take a look at the medal tables and see which countries did indeed perform better than expected?

This is a shorter version of an article I wrote up over at Plus, so to read more, especially about some of the maths involved, see the article Harder, better, faster, stronger.

In terms of total medals won, the same five countries topped both the 2000 Sydney Olympics table and the 2004 Athens Olympics table:

Position | Country (2000) | 2000 Medal Count | Country (2004) | 2004 Medal Count
1 | United States | 92 | United States | 103

By and large the same countries rise to the top at each Olympics, but a quick look at the medal tables suggests two obvious variables that may play a part in a country's Olympic success: population and Gross Domestic Product (GDP). A large population gives a country more athletes to draw from, while GDP can be taken to represent a country's prosperity, with a prosperous country more likely to spend money on frivolous activities such as sport. Adjusting for population, we see that the top 5 countries have changed, except for Australia, which over-performed for its population:

Position | Country (2000) | 2000 Medal Count | Population ('000s) per medal | Country (2004) | 2004 Medal Count | Population ('000s) per medal

India, with its huge population, under-performed in 2004, with one medal per one billion people; however, with its rising GDP we may expect it to come nearer the top of future lists. Looking at GDP per medal, we find a new top 5, with Australia dropping out, but Cuba, Jamaica and the Bahamas again performing well:

Position | Country (2000) | 2000 Medal Count | GDP ($ '000,000s) per medal | Country (2004) | 2004 Medal Count | GDP ($ '000,000s) per medal
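The two adjustments above are simple ratios: population per medal and GDP per medal, ranked from smallest to largest. A minimal sketch follows; the country names and figures are made up for illustration and are not taken from the 2000 or 2004 medal tables, and the helper name per_medal is hypothetical.

```python
# Illustrative only: these countries and figures are invented,
# not taken from the real medal tables.
def per_medal(rows, key):
    """Rank countries by `key` per medal, smallest ratio first."""
    ranked = []
    for row in rows:
        if row["medals"] > 0:  # a country with no medals has no finite ratio
            ranked.append((row["country"], row[key] / row["medals"]))
    return sorted(ranked, key=lambda pair: pair[1])


rows = [
    {"country": "Largeland", "population_thousands": 300_000,
     "gdp_millions": 12_000_000, "medals": 100},
    {"country": "Midland", "population_thousands": 20_000,
     "gdp_millions": 600_000, "medals": 50},
    {"country": "Smallisle", "population_thousands": 300,
     "gdp_millions": 6_000, "medals": 2},
]

by_population = per_medal(rows, "population_thousands")
by_gdp = per_medal(rows, "gdp_millions")

for country, ratio in by_population:
    print(f"{country}: {ratio:,.0f} thousand people per medal")
```

A small ratio means a country needed fewer people (or fewer dollars) for each medal, which is why small nations with a handful of medals can top these lists.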

Looking at simple plots of medal tally against population and GDP for the 2004 games, it can quickly be seen that linear models of these variables will be unsatisfactory:

The extreme values of GDP and population suggest that logarithms should be used. This makes practical sense: a country with a large population does not get to enter more athletes in the Olympics than less populous countries, and whilst a large population gives a strong base from which to draw quality athletes, this effect diminishes as population increases. With regard to GDP, countries occasionally produce athletes with so much natural ability that no amount of money spent on training the opposition could defeat them. Findings in the report Do elite sports systems mean more Olympic medals? by Simon Geoffrey, Martina Kerim, Peren Arin, Nitha Palakshappa and Sylvie Chetty from the Department of Commerce at Massey University back this up, with the authors suggesting that "the extraordinary talent required in winning a gold medal cannot be surpassed by the employment of an elite sports system."

Looking at the countries that received more than 15 medals in 2004, plots of the logarithm of medal count against the logarithms of population and GDP show a roughly linear relationship. Using linear regression (a form of analysis that fits a straight line to the data by minimising the sum of the squared vertical distances between the data points and the fitted line), we can find a straight line that fits well. The R² values of these fits (R² is a statistical measure, between 0 and 1, of how much of the variation in the data the line explains) are above 0.5, suggesting that, while this is not high enough to establish a strong relationship, we may be on to something:
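To make the regression step concrete, here is a rough sketch of a least-squares fit of log medal count against log population, along with its R² value. The data are invented for illustration (they are not the 2004 figures), and the function name fit_line is hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (unexplained variation) / (total variation)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up populations (in thousands) and medal counts, for illustration only.
populations = [5_000, 20_000, 60_000, 150_000, 1_000_000]
medals = [17, 30, 33, 60, 63]

log_pop = [math.log(p) for p in populations]
log_medals = [math.log(m) for m in medals]
slope, intercept, r_squared = fit_line(log_pop, log_medals)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

A slope below 1 on the log-log scale corresponds to the diminishing returns described earlier: doubling the population less than doubles the expected medal count.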

Using a linear combination of the logarithms of GDP and population, we can come up with a fitted line:

We can see that Cuba, Australia and Russia all fall above the line of best fit, and so performed well relative to the other countries that received more than 15 medals. This could be explained by Cuba's famous tradition of boxers and the spending of Australia and Russia on sport.
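Falling above the line simply means having a positive residual: the actual log medal count is larger than the fitted one. The sketch below fits log medals against a linear combination of log population and log GDP by solving the least-squares normal equations, then lists the countries with positive residuals. All figures and country names are invented, and the helper names (solve3, fit_two_logs, residuals) are hypothetical.

```python
import math


def solve3(a, b):
    """Solve a 3x3 system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def fit_two_logs(rows):
    """Least squares for log(medals) = c + a*log(pop) + b*log(gdp)."""
    design = [[1.0, math.log(r["pop"]), math.log(r["gdp"])] for r in rows]
    target = [math.log(r["medals"]) for r in rows]
    # Normal equations: (X^T X) coeffs = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]
    xty = [sum(d[i] * t for d, t in zip(design, target)) for i in range(3)]
    return solve3(xtx, xty)


def residuals(rows, coeffs):
    """Actual minus fitted log medal count for each country."""
    c, a, b = coeffs
    return {
        r["country"]: math.log(r["medals"])
        - (c + a * math.log(r["pop"]) + b * math.log(r["gdp"]))
        for r in rows
    }


# Invented data for illustration only.
rows = [
    {"country": "A", "pop": 10_000, "gdp": 100_000, "medals": 20},
    {"country": "B", "pop": 50_000, "gdp": 2_000_000, "medals": 40},
    {"country": "C", "pop": 300_000, "gdp": 10_000_000, "medals": 90},
    {"country": "D", "pop": 1_000_000, "gdp": 2_000_000, "medals": 60},
    {"country": "E", "pop": 20_000, "gdp": 600_000, "medals": 50},
]

coeffs = fit_two_logs(rows)
res = residuals(rows, coeffs)
over_performers = sorted(name for name, r in res.items() if r > 0)
print("Above the fitted line:", over_performers)
```

Because the fit includes a constant term, the residuals sum to zero, so some countries must land above the line and others below it; the interesting question is which ones.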

The danger with any such fitted model is that you can fit anything to anything after the event — the challenge is to come up with a worthwhile representative model that can not only let teams know how they are doing now, but can predict how they may do in the future.

In the paper Who wins the Olympic games: Economic development and medal totals, Andrew B. Bernard and Meghan R. Busse from The National Bureau of Economic Research developed a model that includes population, GDP, whether the country was the Olympic host and whether the country was formerly part of the Soviet Union or Eastern Bloc. They found that a host country wins an extra 1.8% of the medals on offer compared with what it would win otherwise, and similarly found that former Soviet Union or Eastern Bloc countries, because of their forced mobilisation of resources towards sport, and countries with planned economies, won more than 3% more of the available medals than equivalent western countries. Their model is formulated as:

where M is a country's medal count, N is the population, Y is the GDP, C, alpha and beta are constants, and Host, Soviet and Planned are terms equal to zero or some fixed value depending on whether the country was the host, part of the Soviet bloc, or had a planned economy.
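As a rough illustration of how such a model behaves, the sketch below assumes a log-linear form built from the symbols just defined: M = C + alpha*log(N) + beta*log(Y) + Host + Soviet + Planned. That shape is an assumption borrowed from the regression earlier in this post; the paper's exact specification may differ (for instance in how GDP or the medal count is scaled), and every coefficient value here is invented.

```python
import math


def predicted_medals(n, y, c, alpha, beta, host=0.0, soviet=0.0, planned=0.0):
    """Assumed log-linear form; not necessarily the paper's exact model.

    n: population, y: GDP; host, soviet and planned are zero unless the
    corresponding condition applies to the country.
    """
    return c + alpha * math.log(n) + beta * math.log(y) + host + soviet + planned


# Hypothetical coefficients and inputs, chosen only to show the mechanics.
C, ALPHA, BETA = -50.0, 2.0, 4.0
HOST_EFFECT = 10.0

away = predicted_medals(20_000, 600_000, C, ALPHA, BETA)
home = predicted_medals(20_000, 600_000, C, ALPHA, BETA, host=HOST_EFFECT)
print(f"away: {away:.1f}, home: {home:.1f}, difference: {home - away:.1f}")
```

In a model of this shape, hosting shifts the prediction by a fixed amount regardless of the country's size or wealth, since the Host term is simply added on.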

In their more developed models, the authors included terms to represent how countries performed at previous Olympic games — perhaps to represent the experience gained by athletes competing at multiple games. Their overall conclusion is that whilst GDP is the best single variable for predicting medal tallies, other factors such as being the host country need to be included. Indeed, their model predicted that Australia would win 17 more medals than otherwise when it hosted the Sydney Olympics — the model was only one short of the actual 18 extra medals Australia did win.

With this in mind, it is hard to look past China, as host country and with vast amounts of money pouring into Olympic sports for just this occasion, topping the medal tally.

  • Data from the World Bank and the International Olympic Committee was used in the analysis. Due to doping scandals, the medal tables may change but are accurate at the time of writing.

    1. Forget the issue of modeling Olympic gold and the impact on the home team for the moment. Now that the Olympics are 10+ days old, the most fascinating statistical element of these games is the massive differential between Gold, Silver and Bronze for China. Given that Gold, Silver and Bronze occur with equal probability and that most sports will have more than one Chinese contender, what are the dynamics that would create such an unbalanced differential? Why is it that the Chinese are capable of developing the best athlete, but are then unable to also produce the second or third best athlete? If you look at the current medal table, you will see that no other nation experiences such a drastic differential. Statistically, the more medals you win, the more this should all balance itself out. Has such a drastic differential ever been reported in other Olympics, and under which conditions?

    2. That is a really good question. One explanation is that China is the home team, and statistically the home team increases its gold medal share over and above its overall medal share (which also goes up). So at an "away" Olympics, teams tend to get equal amounts of gold, silver and bronze, but at a "home" Olympics, more golds are won.

      I am planning to write an article on the effect of the home advantage at the Olympics, so stay tuned!