Press Release: Kellyville Ridge Man scores a perfect 15 in Opal
Kellyville Ridge NSW. Local resident Tim Surendonk is celebrating today after scoring the coveted perfect 15 in Opal. As Tim explains it, a perfect 15 occurs when a user of the Opal card pays the absolute minimum amount for unrestricted travel for 6 days in a week (Tuesday-Sunday).
The Opal rules allow unlimited travel after reaching 8 paid journeys, something which the average commuter will only attain after 4 days of to-work-and-back travel. You may think that this would be easy to do--just take 8 consecutive paid trips on trains, ferries or buses--but Opal rules make it difficult. Each trip must be separated by at least 60 minutes in order to qualify as a separate journey, and the day's paid journeys max out after $15 of expenditure--any trips after that aren't "paid-for" and so do not count towards the 8 trips.
Mr Surendonk managed his perfect 15 after carefully taking 8 minimum bus trips, each separated by at least an hour, paying $2.10 for the first 7 and 30c for that last trip.
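The arithmetic behind the perfect 15 can be checked with a short sketch. It assumes only the figures quoted in the release (a $2.10 minimum bus fare, a $15 daily cap and 8 paid journeys); the function and constant names are made up for illustration and are not part of any Opal system.

```python
# Amounts are held in cents to avoid floating-point rounding.
DAILY_CAP_CENTS = 1500        # $15 daily cap, as quoted in the release
MIN_BUS_FARE_CENTS = 210      # $2.10 minimum bus fare, as quoted in the release


def charge_day(fares_cents, cap_cents=DAILY_CAP_CENTS):
    """Return the amount actually charged for each trip once the cap is applied."""
    charged = []
    total = 0
    for fare in fares_cents:
        amount = min(fare, cap_cents - total)  # never charge past the cap
        charged.append(amount)
        total += amount
    return charged


charges = charge_day([MIN_BUS_FARE_CENTS] * 8)
paid_journeys = sum(1 for c in charges if c > 0)
print(charges, sum(charges) / 100, paid_journeys)
```

Under these assumptions the eighth trip is only charged the 30c left under the cap, which is why it still counts as a paid journey.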
"It wasn't easy", Mr Surendonk said. "I pretty much had to devote a whole day of annual leave to achieve it, but I did get to go to the Doctor and return some library books" he admitted.
To celebrate, Mr Surendonk took his kids down to Wollongong on Tuesday for a holiday outing. "I had to pay for them, but my trip was free! Thank you Gladys Berejiklian".
Mr. Surendonk's Opal Statement.
Thanks to Tim, PhD in logic, for writing his own press release! I work with some great folk. Tim did end up getting a letter from the minister congratulating him. More in the Rouse Hill Times.
Friday 7 November 2014
Sunday 13 July 2014
Ep 155: Fact or Fiction with ANSTO
The Australian Nuclear Science and Technology Organisation undertakes research and development in nuclear science and technology. This has wide application including nuclear medicine, atmospheric monitoring, materials engineering, neutron scattering and climate change research.
ANSTO is also very active in science communication, and one of their major community engagement projects is Fact or Fiction, a 90 minute show where the audience watch clips of classic sci-fi hits before voting on whether the technology featured is actual science fact or pure science fiction. Once the audience voting has been conducted, an ANSTO scientist critiques the science featured in the film. They have also run a Fact or Fiction Survey, the results of which are illustrative of the general public understanding of science in everyday life. Another effort ANSTO is conducting is Neural Knitworks, where knitted neurons join together to create a textile brain installation.
I spoke with Rod Dowler from ANSTO's Discovery Centre about their science communication efforts, and in particular, Fact or Fiction. Listen to this show here:
In the podcast, we mentioned a song about hoverboards. I would have loved to have put it in the show, but that wouldn't be legal. So if you'd like to hear it, stream it below or buy it from iTunes right here:
Songs in the podcast:
- Is Nuclear Power The Answer? - Karstenholymoly / CC BY-NC 3.0
- Sci-fi funeral - Asmus Koefoed / CC BY-NC 3.0
- The Unbroken Thread (The Molecules of Life Remix) - morgantj / CC BY 3.0
Labels:
Art,
Movies,
Podcast,
Science Communication
Friday 20 June 2014
ABC Radio - June - Mars One
I've been doing quite a bit of regular radio with the ABC recently (ABC Riverina and ABC Central West), so I thought it would be a good idea to put up a post each month on what we've spoken about.
The main topic this month was the Mars One project, which plans to establish a permanent human settlement on Mars. This is an incredibly optimistic project, made even more interesting by the fact that it is going to be funded by a reality TV show, which will track the training and lives of the astronauts, and presumably follow them into space. A number of Australians are still in the running to be part of the final four who make it.
There are seemingly innumerable ethical issues with this project, notwithstanding the fact that at $6 Billion, it seems unbelievably cheaper than NASA estimates (~$100 Billion) for similar projects. Check out the company's FAQs - they have addressed a number of questions that immediately come to mind (food, air, fuel etc). The one outstanding question for me is - what if it goes wrong? Does the company abandon the astronauts on Mars? Does NASA have an obligation to go back to get them? What if everyone stops watching the reality TV show? I watched (well, glanced at) Big Brother One but can't tell you a lot about the following umpteen seasons...
One thing we do know is that long distance space travel is no place for extroverts. Unsurprisingly, a NASA-funded study has found that extroverts will probably drive their space companions bananas if kept in confined quarters for too long.
Labels:
Astronomy and Space,
Science Communication
Saturday 31 May 2014
Some life analysis with Twitter
There was a great post recently on Flowing Data, The Change My Son Brought, Seen Through Personal Data. It got me thinking about what my life looks like through personal data, and probably the best source of data since the advent of smartphones is Twitter. Twitter recently made it possible to download your personal archive and it makes for some interesting analysis. Along with RSS feeds, Twitter is my major source of online news, education and entertainment, and it is also useful for personal communications and microblogging.
Downloading your personal archive is easy, but you need to do a little manipulation before you can play with it. My tweets were time-stamped in UTC (I'm not sure why - perhaps by default, perhaps because of my location settings), so I had to adjust them for time zone changes due to daylight saving and overseas trips (I didn't bother with domestic trips as I don't have an easy record of them, and they don't make too much difference - an hour here and there).
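A minimal sketch of that adjustment in Python, using the standard zoneinfo module so that daylight saving is handled automatically. The timestamp format and the home time zone are assumptions for illustration; check them against your own archive.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library from Python 3.9


def to_local(timestamp_utc, tz_name="Australia/Sydney"):
    """Convert a UTC timestamp string to local time in the named zone.

    The "YYYY-MM-DD HH:MM:SS +0000" format is an assumption about the archive.
    """
    parsed = datetime.strptime(timestamp_utc, "%Y-%m-%d %H:%M:%S %z")
    return parsed.astimezone(ZoneInfo(tz_name))


summer = to_local("2014-01-15 21:30:00 +0000")  # daylight saving in force
winter = to_local("2014-06-15 21:30:00 +0000")  # standard time
print(summer.isoformat(), winter.isoformat())
```

For overseas trips, the same function could be called with a different tz_name for tweets that fall inside the dates of the trip.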
The following has a dot for every tweet I've written since the end of 2010. Take note that the x-axis is quite long (3.5 years) and the dots are quite large (bigger than a day). I haven't annotated it, but it is interesting to spot life events - the birth of my children, various periods of leave and holidays, over-tweeting during The Ashes etc. There are auto-tweets that came out at the same time each week (which I've now stopped as they're annoying). There was a definite shift in the time I rise in the morning after December 2010 when my son was born and a surge in late-night tweets after my daughter was born in 2013.
Breaking it down is a little more interesting. The following shows tweet frequency for work days and non-work days (weekends, leave) since the start of 2013. On a work day, I tweet mainly on the train. I usually catch a train around 7 or 8am and the return train around 5 or 6pm. During work hours there is a trickle through coffee breaks and lunch, and after dinner is another peak. This type of profile aligns somewhat with the findings of other social media studies (Yellow Social Media Report – 2014 - thanks @problogger), although the amount I tweet on the train is more than the norm, whilst the amount I tweet at work is less (although it is a great way to horizon scan the various fields of science in which I work, once you follow the right people).
Non-work days follow a different profile, at least until after dinner. There's a slightly later rise in the morning, dips when we would be attempting to get out of the house, a dip at an earlier dinner time and a large peak in the evening once the kids are in bed. This peak is higher than a work day, in which time I might be preparing for the next day or falling asleep on the couch. By about 10pm it is basically the same till 6am the next day.
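The hourly profiles described above can be sketched as a simple count of tweets per hour of day, split by day type. This version treats Monday to Friday as work days and ignores leave, which is a simplification of the split used for the charts; the sample times are made up.

```python
from collections import Counter
from datetime import datetime


def hourly_profile(local_times):
    """Count tweets per hour of day, separately for weekdays and weekends."""
    work, non_work = Counter(), Counter()
    for t in local_times:
        # weekday() is 0 for Monday through 6 for Sunday
        bucket = work if t.weekday() < 5 else non_work
        bucket[t.hour] += 1
    return work, non_work


# Hypothetical local timestamps, for illustration only.
sample = [
    datetime(2014, 5, 30, 7, 10),   # Friday, morning train
    datetime(2014, 5, 30, 7, 45),   # Friday, morning train
    datetime(2014, 5, 30, 17, 20),  # Friday, return train
    datetime(2014, 5, 31, 21, 5),   # Saturday, kids in bed
]
work, non_work = hourly_profile(sample)
print(dict(work), dict(non_work))
```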
I'm posting this at about 9am on a weekend, having written it at about 10pm last night - that fits the curve pretty well. If you are a social media marketer (of which, at last count, there are 1,083,645,638 on Twitter), target my work trips (although I'm sure you know this from all that stunning big data analysis you do). The downside of this is that the train trip is too short to read anything of any length, which would explain why the 140 characters of Twitter spikes at these times.
Labels:
Science Communication,
Visualisation
Saturday 26 April 2014
Ep 154: Blogging, podcasting, royal jelly and using chocolate to determine the speed of light
Over the Easter break, I spoke with Lish Fejer on ABC 666 Canberra on her Experimentarium segment. We spoke on various things to do with science blogging and podcasting, and matters Easter related including:
- Royal Jelly (the Royals were in town, a great link if ever I've seen one),
- Determining the speed of light using your microwave and left-over Easter chocolate.
On determining the speed of light using a microwave, see the post Instascience by Tom Gordon in which he uses paper. We used chocolate and it worked pretty well, albeit very messily. You will enjoy trying this at home, and failing just gives you another shot! Note in the broadcast I mentioned that the speed of light was 2.97 x 10^8 m/s when it's actually 2.99792 x 10^8 m/s (please forgive such a grievous error...)
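For anyone trying it, the calculation is a one-liner. With the turntable removed, the melted spots sit at the hot spots of the standing wave, which are half a wavelength apart, so the speed is twice the spot spacing multiplied by the oven's frequency. The sketch below assumes the common 2.45 GHz frequency (check the label on your oven) and a made-up spacing of 6 cm.

```python
MICROWAVE_FREQUENCY_HZ = 2.45e9  # typical domestic oven; an assumption, check the label
ACCEPTED_SPEED_M_PER_S = 2.99792e8


def speed_of_light(spot_spacing_m, frequency_hz=MICROWAVE_FREQUENCY_HZ):
    """Estimate c from the distance between neighbouring melted spots."""
    wavelength_m = 2.0 * spot_spacing_m  # hot spots are half a wavelength apart
    return wavelength_m * frequency_hz


estimate = speed_of_light(0.06)  # 6 cm is a hypothetical measurement
error_percent = 100 * abs(estimate - ACCEPTED_SPEED_M_PER_S) / ACCEPTED_SPEED_M_PER_S
print(f"{estimate:.3e} m/s, {error_percent:.1f}% off")
```

A few millimetres of error in measuring melted chocolate moves the answer by several percent, so don't expect more than two significant figures.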
Listen to this show here - the audio is courtesy ABC 666 Canberra:
Here is a nicely produced video on how to do this - I started out making one and made a mess of my kitchen.
Labels:
Biology,
Food,
Physics,
Podcast,
Science Communication
Sunday 30 March 2014
arXiv trawl: March 2014 - Astrobiology
This month's arXiv trawl brings us to astrobiology.
The Habitable Epoch of the Early Universe
In recent weeks, the world of cosmology has been buzzing with the news that gravitational waves - remnants of the Big Bang - may have been detected by the BICEP2 experiment. But did life come not long afterwards?
Abraham Loeb from Harvard University has posited in his paper The Habitable Epoch of the Early Universe that conditions were ripe for life just 10 million years after the Big Bang. Life on Earth is about 3.5 billion years old, and it took about 1 billion years to first appear after the Earth was formed. So to think that life could have formed only 10 million years after the Big Bang - a blink of the eye in cosmological terms - certainly goes against conventional thinking.
To come to this conclusion, Loeb looked at the conditions needed for life to take hold. Astrobiologists talk of the Goldilocks zone - an orbit around a star where a planet is not too hot or too cold to have liquid water, just like here on Earth. On Earth, we are aided by an atmosphere that keeps temperatures mild. But there are other ways that a planet can stay warm enough for liquid water - tidal heating, for example, is thought to maintain a liquid ocean under the surface of Jupiter's moon Europa.
Loeb postulates another mechanism, one that I presume wouldn't be particularly good for your health if we could replicate it, but nonetheless would keep water in liquid form. In fact, he proposes that the whole Universe would have had these conditions. The cosmic background radiation is the afterglow of the Big Bang, fills the Universe and these days has a temperature of about 3K. But it wasn't always this cold, and moments after the Big Bang it would have had a temperature of billions of degrees (more actually). It has been cooling since then, and around 10-17 million years after the origin of the Universe, the cosmic radiation would have made the Universe nice and balmy with liquid water.
But even if the temperature was right, is 10-17 million years long enough for rocky planets to form on which life can live? And were there enough heavy elements around to get the chemistry of life going? Loeb thinks maybe. Matter was pretty evenly spread around the Universe at this age, but some areas would have been more dense than others. A region where the matter was more or less dense than the average is called a perturbation. Assuming these perturbations had a Gaussian distribution (the classic bell-shaped normal distribution), massive stars of Hydrogen and Helium could have formed at the very edge of the distribution - 8.5 standard deviations from the mean. This is pretty unlikely; if you've done management courses you'll know that 6 standard deviations from the mean (that is, Six Sigma) is what you are aiming for when detecting defects. If you're making a product, a Six Sigma event would happen roughly every few hundred million products. An 8.5 sigma event would occur less than once every few hundred trillion products.
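These tail probabilities are easy to check for a pure Gaussian. The sketch below is my own calculation rather than anything taken from Loeb's paper. It counts deviations on either side of the mean by default, and it ignores the 1.5 sigma shift that industrial Six Sigma programmes usually build in.

```python
import math


def tail_probability(sigmas, two_sided=True):
    """Probability that a Gaussian variable lies at least `sigmas` from its mean."""
    one_sided = 0.5 * math.erfc(sigmas / math.sqrt(2))
    return 2 * one_sided if two_sided else one_sided


for k in (6.0, 8.5):
    p = tail_probability(k)
    print(f"{k} sigma: p = {p:.2e}, roughly 1 in {1 / p:.2e}")
```

On these assumptions, the 8.5 sigma event comes out rarer still than the "few hundred trillion" quoted above.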
What this means is that if the density of the early Universe had Gaussian perturbations (and it's not settled science that it did), it's not very likely such stars could have formed and in the process created heavy elements, but the Universe is a big place!
Imagine then that there were rocky planets formed from exploding first generation stars that contained the elements of life in this temperate Universe. Could there be life? 10 million years is not a long time for life to form - it took a billion years on Earth, and then longer to evolve. But let's say it did. Could it still be out there? Well, the issue with having cosmic radiation as life's heat source is that it cooled down over time and when the Universe was 17 million years old, it wouldn't have been warm enough for liquid water. So the planets on which life resided needed to have their own heat source, and then perhaps life could have escaped through panspermia. Loeb recommends that astronomers look for biosignatures in really old stars to further investigate, something that is now technically possible as we discover more early generation stars.
Loeb also makes a more philosophical point. Some proponents of the anthropic principle claim that various fundamental physical constants are what they are - some say "fine tuned" - because they must be those values to bring about life. Loeb argues that anthropic arguments are weak, at least with regard to the cosmological constant, which describes the density of energy in the Universe, as this habitable epoch would have existed for various values of the constant.
I was actually going to post a few other astrobiology arXiv papers, but I think this is enough for one Saturday! I'd be interested to hear what others think of this idea that life could have existed so early in the Universe's life.
References:
- Abraham Loeb (2013). The Habitable Epoch of the Early Universe. arXiv: 1312.0613v2
Tuesday 18 March 2014
Copper Nanotubes
It's not often a chemistry journal article will make you laugh out loud. From Structural and electronic properties of chiral single-wall copper nanotubes - enjoy.
Abstract:
The structural, energetic and electronic properties of chiral (n, m) (3⩽n⩽6, n/2⩽m⩽n) single-wall copper nanotubes (CuNTs) have been investigated by using projector-augmented wave method based on density-functional theory. The (4, 3) CuNT is energetically stable and should be observed experimentally in both free-standing and tip-suspended conditions, whereas the (5, 5) and (6, 4) CuNTs should be observed in free-standing and tip-suspended conditions, respectively. The number of conductance channels in the CuNTs does not always correspond to the number of atomic strands comprising the nanotube. Charge density contours show that there is an enhanced interatomic interaction in CuNTs compared with Cu bulk. Current transporting states display different periods and chirality, the combined effects of which lead to weaker chiral currents on CuNTs.
References:
- Duan, Y., Zhang, J., & Xu, K. (2014). Structural and electronic properties of chiral single-wall copper nanotubes. Science China Physics, Mechanics and Astronomy, 57 (4), 644-651. DOI: 10.1007/s11433-013-5387-8
Cricket is a matter of life and death
A guest post by Bernard Kachoyan
Ever thought of batting as a life and death struggle against hostile forces? It always seemed that way when I batted. Well you might be more accurate than you think.
The experience of a batsman can be described as a microcosm of life: when you go out to bat you are “born”, when you get out you “die”. But what happens when you are Not Out (NO)? In that case you simply leave the sample pool: you live for a while, then you stop being measured. In the parlance of statistics, this becomes “censored” data. In medical research the “born” moment is equivalent to when a patient is first monitored (e.g. survival times of cancer patients after diagnosis). The question in medicine becomes: what is the “survival function”, the probability that a patient survives for X years after the start of observation? And how does the life expectancy curve of one population differ from another (in particular, do people treated in a particular way differ from a control group)?
These types of problems are commonly addressed using Kaplan-Meier (KM) estimators. In economics, they can be used to measure the length of time people remain unemployed after a job loss. In engineering, they can be used to measure the time until failure of machine parts. Here we will apply those ideas to batting in cricket.
An important property of the KM estimate is that it is non-parametric in the sense that it does not assume any type of Normal distribution in the data, something which is patently untrue for this type of data. It also only uses the data itself to generate a survival curve (the term given to the survival function after it is drawn on a chart) and associated confidence limits. Hence the KM survival curve may look odd in that it declines in a series of steps at the observation times and the function between sampled observations is constant. However, when a large enough sample is taken, the KM approaches the true survival function for that population.
An important advantage of the KM method is that it can take into account censored data, particularly censoring if a patient withdraws from a study, i.e. is lost from the sample before the final outcome is observed. This makes it perfect for dealing with the NOs as described above.
When referring to batsmen, “death” means getting out, being “censored” means completing the innings before getting out (remaining Not Out) and “time” means number of runs scored (t_j = scoring j runs). The idea of the KM estimator is pretty simple.
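In its standard textbook (product-limit) form, the estimator steps through the scores at which dismissals occurred. At each one it multiplies the running survival probability by 1 - d_j/n_j, where d_j is the number of dismissals on that score and n_j is the number of innings that had reached it. A Not Out reduces n_j for later scores without causing a step. The sketch below is a minimal illustration of that formula, not the code behind the figures in this post, and the innings data is made up.

```python
from collections import Counter


def kaplan_meier(innings):
    """Kaplan-Meier survival curve for a list of (runs, dismissed) innings.

    Returns (score, survival) pairs, one per score at which a dismissal occurred.
    """
    dismissals = Counter(runs for runs, dismissed in innings if dismissed)
    leaving = Counter(runs for runs, _ in innings)  # innings ending on each score
    at_risk = len(innings)                          # innings that reached this score
    survival = 1.0
    curve = []
    for score in sorted(leaving):
        d = dismissals.get(score, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((score, survival))
        at_risk -= leaving[score]  # dismissed and not-out innings both leave the pool
    return curve


# Hypothetical innings: (runs scored, True if dismissed / False if Not Out).
innings = [(0, True), (12, True), (12, False), (35, True), (50, False), (101, True)]
curve = kaplan_meier(innings)
for score, s in curve:
    print(score, round(s, 4))
```

Note that the Not Out on 50 produces no step of its own, but it does make the final step at 101 larger because only one innings is left at risk by then.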
Such KM curves have attractive properties, which perhaps explain their popularity in medical research for over half a century. They are fairly easy to calculate and they provide a visual depiction of all of the raw data—including the times of actual failure, yet still give a sense of the underlying probability model.
Let’s now apply the KM estimator to some cricket statistics. In this case I have arbitrarily chosen the batting statistics of Steve Waugh, Sachin Tendulkar (up to 2010 to keep roughly the same number of innings as Waugh) and Don Bradman. Without consideration of the censored data (the Not Outs), the curve simply reverts to the percentage of scores greater than a certain number of runs - the value on the x axis. This is shown in Figure 1. Bradman of course is still clearly in a class of his own.
If we now properly include the NOs in the formulation we get survival curves as shown in Figure 2. I have omitted the Tendulkar curves here for clarity. As expected, the survival rates go up as the NOs do not indicate a true “death”. In Steve Waugh’s case, the increase is noticeable (I didn’t say “significant”!) since he has a large number of NOs compared to most batsmen within his number of test innings.
This is shown more starkly in Figure 3, where I have plotted both the censored and uncensored curves for Waugh and Tendulkar. I have plotted them on a logarithmic scale to highlight differences. It can be seen that Waugh’s censored survival curve (cf the raw curve) tracks Tendulkar’s very closely until a score of about 100. This reflects the large number of Waugh’s Not Outs (43 vs 29 in roughly the same number of innings, 260 vs 278). The divergence of the curves after that not only reflects the propensity of Tendulkar to go on to big scores, but also that a large number of Tendulkar’s Not Outs were after he had already scored a century (15 vs 2 for Waugh).
Figure 1 and Figure 2
Figure 3
The basic KM methodology has been around since the 1950s and of course has been extended in various ways by professional statisticians and alternative methods proposed. But its simplicity means it is still widely used.
There are several drawbacks, some of which can be seen in Figures 1-3. Firstly, the vertical drop at specific times is drawn from the data, and should not be seen as indicating particular “danger times”. This is particularly evident at larger scores where the naturally small sample size means that there are fewer data points (i.e. scores where a batsman actually gets out). So some sort of smoothing of the curve is necessary to provide an estimate of the true underlying functional dependency.
This reduction in the sample at large values also means the effect of each individual failure on the size of the step-down increases.
Another drawback of the KM method is that the estimate of the probability of surviving each “danger time” depends only on the number of patients at risk at that time. So if there are censored values the actual time between the last failure and the time of censoring is not considered.
It is natural at this point to question the underlying assumption of the KM method that the patients (i.e. innings) are independent. It is common to talk in cricket about form slumps or purple patches. This can be examined statistically by considering the autocorrelation function of the scores, shown in Figure 4 assuming stationarity, where Waugh has been omitted for clarity. The figure clearly shows no evidence for time/innings correlation and although strictly speaking a lack of correlation does not imply true independence, it is evidence that the innings can be considered independent for the purposes of this analysis.
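A sample autocorrelation function of this kind can be sketched in a few lines. The version below uses the usual estimator for a stationary series and treats every score the same whether or not the batsman was dismissed, which is my simplifying assumption; the scores are made up.

```python
def autocorrelation(scores, max_lag):
    """Sample autocorrelation of a sequence of scores for lags 0..max_lag."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [s - mean for s in scores]
    variance = sum(d * d for d in deviations)
    return [
        sum(deviations[i] * deviations[i + lag] for i in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


# Hypothetical run of consecutive innings scores.
scores = [12, 0, 45, 101, 7, 33, 0, 64, 18, 5]
acf = autocorrelation(scores, max_lag=4)
print([round(r, 3) for r in acf])
```

As a rough rule of thumb, values inside plus or minus 1.96 divided by the square root of the number of innings are consistent with no correlation, so a form slump would have to push the low-lag values well outside that band to show up.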
Figure 4
The question that naturally arises is whether we can say anything statistically about whether the difference between survival curves is significant (cf treated vs control groups in medicine). Confidence intervals can be placed on the derived curves using the so-called Greenwood formula, dating back to the 1920s, or its more modern variations. These will suffer the drawback of being less accurate in the tail of the curves, where by definition the sample size is smallest. Not only will the formulas return a greater error because of that, the validity per se comes into question as the expressions rely on a normal approximation (through the central limit theorem), hence can only be considered valid while the number of remaining innings is bigger than, say, 20 or so.
Unfortunately, as we have seen above it is in the tails of the curve where the distinctions between very good and great batsmen are often found.
Similarly a number of ways of comparing curves exist in the statistical literature, such as the Kolmogorov–Smirnov test, the log-rank test or the Cox proportional hazards test. These can rapidly become very mathematically complicated, especially if we want to try and distinguish one part of the curve specifically (say the high end).
Although I haven’t done the hard yards in this article, my intuition tells me we might be hard pressed to prove statistically significant differences between the Waugh and Tendulkar corrected survival curves. This is the drawback of applying statistical tests in areas where their applicability is not clear.
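As a rough illustration of the Greenwood approach: the variance of the KM estimate at a given score is taken as S(t)^2 multiplied by the sum of d_j / (n_j (n_j - d_j)) over the dismissal scores up to t, and an approximate 95% interval is the estimate plus or minus 1.96 standard errors. The sketch below uses the plain, untransformed interval clipped to the range 0 to 1, with made-up innings; it is not the calculation behind any figure in this post.

```python
import math
from collections import Counter


def km_with_greenwood(innings, z=1.96):
    """KM curve with Greenwood intervals: rows of (score, survival, lower, upper)."""
    dismissals = Counter(runs for runs, dismissed in innings if dismissed)
    leaving = Counter(runs for runs, _ in innings)
    at_risk = len(innings)
    survival, var_sum = 1.0, 0.0
    rows = []
    for score in sorted(leaving):
        d = dismissals.get(score, 0)
        if d and at_risk > d:
            survival *= 1.0 - d / at_risk
            var_sum += d / (at_risk * (at_risk - d))  # Greenwood's running sum
            se = survival * math.sqrt(var_sum)
            rows.append((score, survival,
                         max(0.0, survival - z * se), min(1.0, survival + z * se)))
        elif d:
            # Every remaining innings ended here: survival is 0 and the
            # Greenwood term is undefined, so report a degenerate interval.
            survival = 0.0
            rows.append((score, 0.0, 0.0, 0.0))
        at_risk -= leaving[score]
    return rows


# Hypothetical innings: (runs scored, True if dismissed / False if Not Out).
innings = [(0, True), (12, True), (12, False), (35, True), (50, False), (101, True)]
rows = km_with_greenwood(innings)
for score, s, lo, hi in rows:
    print(score, round(s, 3), round(lo, 3), round(hi, 3))
```

With only six innings the intervals are enormous, which is the tail problem described above in miniature.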
In any case, it can be seen that batting can most certainly be considered a true life and death struggle.
Ever thought of batting as a life and death struggle against hostile forces? It always seemed that way when I batted. Well you might be more accurate than you think.
The experience of a batsman can be described as a microcosm of life: when you go out to bat you are “born”, when you get out you “die”. But what happens when you are Not Out (NO)? Here things are more subtle: when you are Not Out you simply leave the sample pool, that is, you live for a while and then you stop being measured. In the parlance of statistics, this becomes “censored” data. In medical research the “born” moment is equivalent to when a patient is first monitored (e.g. survival times of cancer patients after diagnosis). The question in medicine becomes: what is the “survival function”, the probability that a patient survives for X years after the start of observation? And how does the life expectancy curve of one population differ from another (in particular, do people treated in a particular way fare differently from a control group)?
These types of problems are commonly addressed using Kaplan-Meier (KM) estimators. In economics, they can be used to measure the length of time people remain unemployed after a job loss. In engineering, they can be used to measure the time until failure of machine parts. Here we will apply those ideas to batting in cricket.
An important property of the KM estimate is that it is non-parametric: it does not assume that the data follow a Normal (or any other particular) distribution, an assumption that would be patently untrue for this type of data. It also only uses the data itself to generate a survival curve (the term given to the survival function after it is drawn on a chart) and associated confidence limits. Hence the KM survival curve may look odd in that it declines in a series of steps at the observation times and the function between sampled observations is constant. However, when a large enough sample is taken, the KM estimate approaches the true survival function for that population.
An important advantage of the KM method is that it can take into account censored data, particularly the censoring that occurs when a patient withdraws from a study, i.e. is lost from the sample before the final outcome is observed. This makes it perfect for dealing with the NOs as described above.
When referring to batsmen, “death” means getting out, being “censored” means completing the innings before getting out (remaining NOT OUT) and “time” means number of runs scored (t_j = scoring j runs). The idea of the KM estimator is pretty simple.
- The conditional probability that an individual dies in the time interval from t_i to t_{i+1}, given survival up to time t_i, is estimated as d_i/n_i, where d_i is the number who die at time t_i, and n_i is the number alive just before time t_i, including those who will die at time t_i
- Then the conditional probability that an individual survives beyond t_{i+1} is (n_i – d_i)/n_i
- When there is no censoring, n_i is just the number of survivors just prior to time t_i. With censoring, n_i is the number of survivors minus the number of losses (censored cases). It is only those surviving cases that are still being observed (have not yet been censored) that are "at risk" of an observed death
- The KM estimator of the survivor function at time t, for t_j ≤ t < t_{j+1}, is then formally:
$$\hat{S}(t) = \prod_{i=1}^{j} \frac{n_i - d_i}{n_i}$$
Such KM curves have attractive properties, which perhaps explain their popularity in medical research for over half a century. They are fairly easy to calculate and they provide a visual depiction of all of the raw data—including the times of actual failure, yet still give a sense of the underlying probability model.
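To give a feel for how little calculation is involved, here is a minimal Python sketch of the product described above. It is my own illustration rather than the code used to draw the figures: the function name `kaplan_meier`, the input layout (a list of scores and a matching list of Not Out flags) and the sample innings are all assumptions. It also assumes that an innings left Not Out on a given score still counts as "at risk" for dismissals on that same score.

```python
from collections import Counter


def kaplan_meier(scores, not_out):
    """Return the KM survival curve as a list of (runs, survival) steps.

    scores  -- runs scored in each innings
    not_out -- True where the innings ended Not Out (censored)
    """
    deaths = Counter(s for s, no in zip(scores, not_out) if not no)
    censored = Counter(s for s, no in zip(scores, not_out) if no)

    at_risk = len(scores)  # n_i: innings still "alive" just before this score
    survival = 1.0
    curve = []
    for runs in sorted(set(scores)):
        d = deaths.get(runs, 0)  # d_i: dismissals on exactly this score
        if d:
            survival *= (at_risk - d) / at_risk
            curve.append((runs, survival))
        # Dismissed and Not Out innings both leave the sample after this score.
        at_risk -= d + censored.get(runs, 0)
    return curve


# Hypothetical innings: five scores, two of them Not Out.
sample_scores = [0, 10, 10, 25, 40]
sample_not_out = [False, False, True, False, True]
for runs, s in kaplan_meier(sample_scores, sample_not_out):
    print(runs, round(s, 3))
```

For the made-up sample the curve steps down at 0, 10 and 25 runs. The Not Out innings on 40 never produces a step, which is exactly how censoring lifts the curve compared with treating every innings as a dismissal.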
Let’s now apply the KM estimator to some cricket statistics. In this case I have arbitrarily chosen the batting statistics of Steve Waugh, Sachin Tendulkar (up to 2010, to keep roughly the same number of innings as Waugh) and Don Bradman. If the censored data (the Not Outs) are ignored, the curve simply reverts to the percentage of scores greater than a given number of runs (the value on the x axis). This is shown in Figure 1. Bradman, of course, is still clearly in a class of his own.
If we now properly include the NOs in the formulation we get survival curves as shown in Figure 2. I have omitted the Tendulkar curves here for clarity. As expected, the survival rates go up as the NOs do not indicate a true “death”. In Steve Waugh’s case, the increase is noticeable (I didn’t say “significant”!) since he has a large number of NOs compared to most batsmen within his number of test innings.
This is shown more starkly in Figure 3, where I have plotted both the censored and uncensored curves for Waugh and Tendulkar. I have plotted them on a logarithmic scale to highlight differences. It can be seen that Waugh’s censored survival curve (cf. the raw curve) tracks Tendulkar’s very closely until a score of about 100. This reflects the large number of Waugh’s NOT OUTS (43 vs 29 in roughly the same number of innings, 260 vs 278). The divergence of the curves after that reflects not only the propensity of Tendulkar to go on to big scores, but also that a large number of Tendulkar’s not outs came after he had already scored a century (15 vs 2 for Waugh).
Figure 1 and Figure 2
Figure 3
The basic KM methodology has been around since the 1950s and of course has been extended in various ways by professional statisticians, and alternative methods have been proposed. But its simplicity means it is still widely used.
There are several drawbacks, some of which can be seen in Figures 1-3. Firstly, the vertical drop at specific times is drawn from the data, and should not be seen as indicating particular “danger times”. This is particularly evident at larger scores, where the naturally small sample size means that there are fewer data points (i.e. scores where a batsman actually gets out). So some sort of smoothing of the curve is necessary to provide an estimate of the true underlying functional dependency.
This reduction in the sample size at large values also means the effect of each individual failure on the size of the step-down increases.
Another drawback of the KM method is that the estimate of the probability of surviving each “danger time” depends only on the number of patients at risk at that time. So if there are censored values the actual time between the last failure and the time of censoring is not considered.
It is natural at this point to question the underlying assumption of the KM method that the patients (i.e. innings) are independent. It is common in cricket to talk about form slumps or purple patches. This can be examined statistically by considering the autocorrelation function of the scores, shown in Figure 4 assuming stationarity, where Waugh has been omitted for clarity. The figure shows no clear evidence of time/innings correlation and, although strictly speaking a lack of correlation does not imply true independence, it is evidence that the innings can be considered independent for the purposes of this analysis.
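For anyone wanting to try the same check on their own favourite batsman, a sample autocorrelation function can be sketched in a few lines of Python. This is an illustration under my own assumptions (the function name `autocorrelation` and the toy scores are made up), not the calculation behind Figure 4. It uses the usual estimator that divides the lagged covariance by the lag-zero variance.

```python
def autocorrelation(scores, max_lag):
    """Sample autocorrelation of a sequence of scores for lags 0..max_lag."""
    n = len(scores)
    mean = sum(scores) / n
    centred = [s - mean for s in scores]
    variance = sum(c * c for c in centred)
    if variance == 0:
        raise ValueError("all scores are identical, autocorrelation is undefined")

    acf = []
    for lag in range(max_lag + 1):
        # Covariance between each innings and the one `lag` innings later.
        cov = sum(centred[i] * centred[i + lag] for i in range(n - lag))
        acf.append(cov / variance)
    return acf


# Hypothetical run of consecutive scores.
toy_scores = [12, 45, 0, 103, 7, 33, 58, 4, 21, 76]
print([round(r, 2) for r in autocorrelation(toy_scores, 3)])
```

As a rule of thumb, for n uncorrelated innings the values at non-zero lags should mostly fall within about ±1.96/√n, so a "form slump" would show up as early lags sitting well outside that band.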
Figure 4
The question that naturally arises is whether we can say anything statistically about whether the difference between survival curves is significant (cf. treated vs control groups in medicine). Confidence intervals can be placed on the derived curves using the so-called Greenwood formula, dating back to the 1920s, or its more modern variations. These suffer the drawback of being less accurate in the tail of the curves, where by definition the sample size is smallest. Not only do the formulas return a greater error there, but their validity per se comes into question, as the expressions rely on a normal approximation (through the central limit theorem) and hence can only be considered valid while the number of remaining innings is bigger than, say, 20 or so.
Unfortunately, as we have seen above, it is in the tails of the curve where the distinctions between very good and great batsmen are often found.
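As a rough sketch of what the Greenwood formula involves, the variance of the KM estimate at time t is approximated by S(t)² times the running sum of d_i / (n_i (n_i − d_i)) over the dismissal scores up to t. The Python below is my own illustration, not the method used for the figures. The function name `km_with_greenwood`, the 1.96 multiplier for a 95% interval and the clipping of the limits to the range 0 to 1 are all assumptions I have made for the example.

```python
import math
from collections import Counter


def km_with_greenwood(scores, not_out, z=1.96):
    """Return (runs, survival, lower, upper) rows using Greenwood's formula."""
    deaths = Counter(s for s, no in zip(scores, not_out) if not no)
    censored = Counter(s for s, no in zip(scores, not_out) if no)

    at_risk = len(scores)
    survival = 1.0
    var_sum = 0.0  # running sum of d_i / (n_i * (n_i - d_i))
    rows = []
    for runs in sorted(set(scores)):
        d = deaths.get(runs, 0)
        if d:
            survival *= (at_risk - d) / at_risk
            if at_risk > d:  # the term is undefined once nobody survives
                var_sum += d / (at_risk * (at_risk - d))
            std_err = survival * math.sqrt(var_sum)
            lower = max(0.0, survival - z * std_err)
            upper = min(1.0, survival + z * std_err)
            rows.append((runs, survival, lower, upper))
        at_risk -= d + censored.get(runs, 0)
    return rows


# Hypothetical innings, as (score, Not Out flag) pairs split into two lists.
for row in km_with_greenwood([0, 10, 10, 25, 40], [False, False, True, False, True]):
    print(tuple(round(x, 3) for x in row))
```

With only five made-up innings the intervals are enormous, which is the point made above: once few innings remain at risk, the normal approximation behind these limits should not be trusted.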
Similarly, a number of ways of comparing curves exist in the statistical literature, such as the Kolmogorov–Smirnov test, the log-rank test or the Cox proportional hazards model. These can rapidly become very mathematically complicated, especially if we want to try and distinguish one part of the curve specifically (say the high end).
Although I haven’t done the hard yards in this article, my intuition tells me we might be hard pressed to prove statistically significant differences between the Waugh and Tendulkar corrected survival curves. This is the drawback of applying statistical tests in areas where their applicability is not clear.
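To show the flavour of one of these comparisons, here is a bare-bones Python sketch of the two-sample log-rank statistic. At every score where a dismissal occurs it compares the dismissals observed for one batsman with the number expected if both batsmen shared the same survival curve. This is only an illustration: the function name `logrank_statistic`, the (score, not_out) tuple layout and the toy data are my assumptions, and it has not been run on the Waugh or Tendulkar data.

```python
def logrank_statistic(group_a, group_b):
    """Log-rank chi-square statistic for two lists of (score, not_out) tuples."""
    pooled = group_a + group_b
    death_times = sorted({s for s, no in pooled if not no})

    observed_a = expected_a = variance = 0.0
    for t in death_times:
        n_a = sum(1 for s, _ in group_a if s >= t)  # at risk in group A
        n_b = sum(1 for s, _ in group_b if s >= t)  # at risk in group B
        d_a = sum(1 for s, no in group_a if s == t and not no)
        d_b = sum(1 for s, no in group_b if s == t and not no)
        n, d = n_a + n_b, d_a + d_b

        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of the dismissals in group A
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    if variance == 0:
        return 0.0
    return (observed_a - expected_a) ** 2 / variance


# Hypothetical batsmen: one who always falls cheaply, one who goes on.
tail_ender = [(1, False), (2, False), (3, False)]
opener = [(50, False), (60, False), (70, False)]
print(round(logrank_statistic(tail_ender, opener), 2))
```

Under the hypothesis of identical curves the statistic is approximately chi-square distributed with one degree of freedom, so values above about 3.84 would be significant at the 5% level. Note that the test weights the whole curve equally, so it does not by itself address the harder problem of singling out the high end.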
In any case, it can be seen that batting can most certainly be considered a true life and death struggle.
Friday 14 March 2014
What to do with old swimming caps
Since the start of the 2013/2014 ocean swimming season, around 20,000 swimming caps have been handed out to competitors at the various ocean swims in NSW. If you are a regular ocean swimmer, it doesn’t take too long before you have more caps than you know what to do with. Some may be used again in the pool, and some given to friends and family, but the vast majority of these caps will end up in land-fill having spent most of the season at the bottom of your swimming bag.
This year at the 2014 Coogee Island Challenge, we are running a swimming cap “amnesty”. Bring down your old caps that you no longer use, or donate after your swim, and we will save your caps from an ignominious end in land-fill or as rubbish scattered on the beach. There are no companies that recycle swimming caps, mostly made of latex or silicone, so the collected caps will be donated to the following organisations for re-use (which is better than recycling anyway):
- The Frugal Forest is a One Off Makery project, supported by Midwaste, the Australia Council for the Arts and Glasshouse Port Macquarie. Drawing in artists, musicians, scientists, community, business and industry, we aim to build an intricately detailed forest entirely from salvage. Why? Because nothing is wasted in the forest, and we could really learn from that.
- Reverse Garbage is Australia’s largest creative reuse centre, committed to diverting resources from landfill – approximately 35,000 cubic metres or 100 football fields’ worth per year. Almost 40 years from its founding, Reverse Garbage is now an internationally recognised, award-winning environmental co-operative committed to promoting sustainability through the reuse of resources, as well as providing support to other community, creative and educational organisations.
- Various Council Pools to be used by those needing a cap on the day.
Where, specifically: The oceanswims.com marquee.
When: 13 April 8.30-11.30am
There seems to be a gap in the market here for an entrepreneurial organic chemist. Swimming caps are usually made from latex or silicone (rubbery materials), which are polymers. Are there any options at all for recycling such materials? From what I read, it's just not economical, but I wonder whether that will change as the raw starting materials for such products (that is, petroleum) become rarer and so more expensive.
Tuesday 11 March 2014
Ep 153: Complex Network Analysis in Cricket
Complex network analysis is an area of network science and part of graph theory that can be used to rank things, one of the most famous examples of which is the Google PageRank algorithm. But it can also be applied to sport. Cricket is a sport in which it is difficult to rank teams (there are three forms of the game, the various countries do not play each other very often etc.), whilst it is notoriously difficult to rank individual players (for how the ICC do it, see Ep 107: Ranking Cricketers).
Satyam Mukherjee at Northwestern University became a bit famous when The Economist picked up his work (more famous than when we picked it up!) and he has published extensively on complex network analysis as applied to cricket rankings. I had a very interesting chat with Satyam about his various works concerning the evaluation of cricket strategy, leadership, team and individual performance, and the papers we discuss in the podcast are listed below. One of the more interesting findings was that left-handed captains and batsmen are generally ranked higher than their right-handed counterparts, whilst this is not true for left-handed bowlers.
Tune in to this episode here:
Songs in the podcast:
- Satyam Mukherjee (2013). Ashes 2013 - A network theory analysis of Cricket strategies. arXiv: 1308.5470v1
- Satyam Mukherjee (2013). Left handedness and Leadership in Interactive Contests. arXiv: 1303.6686v1
- Satyam Mukherjee (2012). Quantifying individual performance in Cricket - A network analysis of Batsmen and Bowlers. arXiv: 1208.5184v2
- Satyam Mukherjee (2012). Complex Network Analysis in Cricket : Community structure, player's role and performance index. arXiv: 1206.4835v4
- Satyam Mukherjee (2012). Identifying the greatest team and captain - A complex network approach to cricket matches. arXiv: 1201.1318v2
Thursday 20 February 2014
instascience
#instascience is a project by @kip_stewart and @kickstartphysics which involves short, sharp science videos on Instagram. Here are a couple of the videos; the project is worth checking out.
Monday 10 February 2014
World beer consumption and scientific productivity
The above is posted without any commentary from me and comes from a site called figshare. This site is part of the open data movement, which posits that scientific data should be available for everyone to analyse and then draw their own conclusions. Hence, I leave it to you to follow the references, grab the data and see what you think. You might like to keep in mind the idea that correlation does not equal causation: there might just be something driving both GDP and beer drinking.
References:
Christopher J. Lortie (2010). Letter to the Editor: A global comment on scientific publications, productivity, people, and beer Scientometrics DOI: 10.1007/s11192-009-0077-z
Christopher Lortie (2013). World beer consumption & scientific productivity figshare DOI: 10.6084/m9.figshare.664162
Friday 31 January 2014
arXiv trawl: January 2014 - Social Media
An interesting way of keeping your ear to the ground regarding the latest happenings in the scientific world is to monitor the arXiv. The arXiv (pronounced “archive”) is a repository for electronic preprints of scientific papers. Scholarly peer review of scientific papers can take a long time, so many scientists use archives like this to share their findings and to seek comment on their work before official publication. As such, the content of the arXiv is many and varied; there are weird and wonderful topics, and papers in various states of review. Some will never get published anywhere else, whilst others are seminal (for example, Perelman’s proof of the Poincaré conjecture). But by the very definition of preprint, they are all calling for comment. I dived in recently, and here are some highlights of the last few months on the arXiv concerning social media.
Because MySpace --> Facebook
Researchers from Princeton, in the report Epidemiological modeling of online social network dynamics, modelled the rise and fall of MySpace by likening it to a disease and used epidemiological methods to model how it infected the population, and how the population eventually became immune. The number of times the term “MySpace” was searched for in Google was used as a measure of the site's popularity (or how infected the population was). This data was sourced from Google Trends. If you would like to read more about the maths involved, check out Sick of Facebook? Read on… in Plus. As you can see below, they fitted a nice curve to the data. Cute.
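To make the disease analogy a little more concrete, here is a minimal Python sketch of the textbook SIR (susceptible, infected, recovered) model, stepped forward with simple Euler updates. I should stress that this is the generic classroom version and not necessarily the variant fitted in the paper; the function name `simulate_sir` and every parameter value are hypothetical and chosen only so that the curve rises and then falls.

```python
def simulate_sir(beta, gamma, s0, i0, r0, steps, dt=0.1):
    """Euler-step the classic SIR equations and return a list of (S, I, R)."""
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, conserved by the model
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n * dt  # people "catching" the site
        new_recoveries = gamma * i * dt         # users becoming "immune"
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: 1000 people, 10 early adopters.
history = simulate_sir(beta=0.5, gamma=0.1, s0=990, i0=10, r0=0, steps=1000)
peak_users = max(i for _, i, _ in history)
print(round(peak_users), round(history[-1][1]))
```

Fitting a curve like this to search data means tuning the rate parameters until the "infected" line tracks the observed popularity, which is a separate step not shown here.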
All good so far. Stories concerning social media are favourites of conventional media, and naturally this was picked up: Facebook could fade out like a disease. What the newspapers focused on was the work fitting the epidemiological model to Facebook data (Google searches for “Facebook”) and the conclusion that Facebook is heading for a "rapid decline", and between 2015 and 2017 will lose 80% of its users.
There are two questions that arise from this:
1) Is Google Trends data a good measure of the popularity of a website?
2) If the MySpace data fits this curve, does that mean the Facebook data will too?
Facebook was made aware of this study, and their reply was pretty excellent. They did their own study using Google Trends on searches for "Princeton" and found that:
“Princeton will have only half its current enrollment by 2018, and by 2021 it will have no students at all, agreeing with the previous graph of scholarly scholarliness. Based on our robust scientific analysis, future generations will only be able to imagine this now-rubble institution that once walked this earth.
While we are concerned for Princeton University, we are even more concerned about the fate of the planet — Google Trends for "air" have also been declining steadily, and our projections show that by the year 2060 there will be no air left.”
Thanks to FlowingData for the link to Facebook's reply.
It’s also a nice example of a study that will never be followed up by the conventional media. Even if this work makes an entirely accurate prediction of Facebook’s future, there will be no follow up newspaper article in 2020.
How to get yourself retweeted, sort of
Everybody likes to be popular. To this end, Ronald Hochreiter and Christoph Waldhauser authored A Genetic Algorithm to Optimize a Tweet for Retweetability in which they look at the factors that make a tweet popular. They developed a Twitter-like network of connected people and pushed tweets into the network to see which ones were retweeted. They modified the tweets each time they ran the simulation using a genetic algorithm. These algorithms work like evolution. Each tweet had a number of “chromosomes” that were modified with random mutations each time the model was run, with those mutations that brought about better results (that is, more retweets) kept, and those that didn’t, further randomly mutated. The chromosomes of the tweet concerned the polarity of the tweet (I think this means whether it’s positive or negative, it’s not well explained), how emotional the tweet is, the length of the tweet, the time of day it’s sent, and the number of URLs and hashtags contained.
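For a feel of the keep-the-winners, mutate-the-losers loop, here is a toy Python sketch. It is emphatically not the authors' model: there is no simulated network, the scoring function `toy_fitness` is entirely made up, and the three "chromosomes" (length, hashtags, urls) are just a hypothetical subset of the ones listed above.

```python
import random


def toy_fitness(tweet):
    """Made-up score standing in for retweets; NOT a result from the paper."""
    return (-abs(tweet["length"] - 100)
            - 10 * abs(tweet["hashtags"] - 1)
            - 10 * abs(tweet["urls"] - 1))


def mutate(tweet, rng):
    """Return a copy of the tweet with one randomly chosen gene nudged."""
    child = dict(tweet)
    gene = rng.choice(sorted(child))
    step = rng.choice([-1, 1]) * (10 if gene == "length" else 1)
    child[gene] = max(0, child[gene] + step)
    return child


def evolve(population, generations, rng):
    """Keep the better half each generation and re-mutate the worse half."""
    for _ in range(generations):
        ranked = sorted(population, key=toy_fitness, reverse=True)
        keepers = ranked[: len(ranked) // 2]
        losers = ranked[len(ranked) // 2:]
        population = keepers + [mutate(t, rng) for t in losers]
    return max(population, key=toy_fitness)


rng = random.Random(0)  # fixed seed so the run is repeatable
start = [{"length": rng.randrange(10, 140),
          "hashtags": rng.randrange(0, 5),
          "urls": rng.randrange(0, 3)} for _ in range(6)]
print(evolve(start, 200, rng))
```

Because the best tweets always survive into the next generation, the top score can never get worse, while the random mutations give the stragglers a chance to stumble onto something better.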
So how should you compose your next tweet? Well, unfortunately the paper doesn’t really say. It shows some results that don’t translate particularly well into reality. Their conclusions are that the genetic algorithm works pretty well, and that more work is needed. Fair enough, there are plenty of papers out there that simply outline how a new model works rather than having exciting results; I’ve written a few myself.
Don’t share your mobile phone number on social networks
Call Me MayBe: Understanding Nature and Risks of Sharing Mobile Numbers on Online Social Networks wins this month’s award for best reference to a pop song in a scientific paper title. The researchers examined how sensitive personal information spreads around social networks by collecting 76347 unique mobile numbers posted by 85905 users on Twitter and Facebook. They then used these mobile numbers to gain sensitive information about their owners from other social networks.
This in itself is an interesting study of how easy it is to collect personal information online, but they didn’t leave it there. They then communicated the observed risks to the owners by calling them up with the mobile numbers they found. Some users were surprised to know about the online presence of their number, while others had intentionally posted it online for business purposes. They found that 38.3% of users who were unaware of the online presence of their number had posted their number themselves on the social network.
Where’s my hoverboard?
Searching the Internet for evidence of time travellers makes the bold claim in its abstract that, regarding the search for time travellers online, it is “perhaps the most comprehensive to date”. Essentially what they did was search the web for information that shouldn’t have been known at the time of publishing – only time travellers from the future could have possessed such prescient knowledge. The two events they were looking for evidence of were the viewing of Comet ISON and the inauguration of Pope Francis – both big events that people in the future would know and care about. To do this, they needed to look for information published before these events occurred. They found that Bing and Facebook were no good for this study as they didn’t make clear at what date the information was published, or the date could be easily edited. So they used Twitter, on which tweets are nicely time-stamped. They called for time travellers to use the hashtag #ICanChangeThePast2 in September 2013 and looked at tweets before this time. They also examined Google Trends for searches a time traveller might have made.
Disappointingly, they found no evidence that any time travellers concerned themselves with posting on Twitter or doing Google searches.
I am going to go for a run before I press publish on this post. So, if there are any time travellers out there, come and join me at Erskineville Oval at 1pm Thursday 30th January.
(Edit: There were two people and a dog down at the oval. The dog chased and barked at me in a very knowing fashion. The time-travellers of the future are apparently long haired, short brown dachshunds.)
References:
- John Cannarella, & Joshua A. Spechler (2014). Epidemiological modeling of online social network dynamics. arXiv: 1401.4208v1
- Ronald Hochreiter, & Christoph Waldhauser (2014). A Genetic Algorithm to Optimize a Tweet for Retweetability Proceedings of MENDEL 2013: 13-18. 2013. arXiv: 1401.4857v1
- Prachi Jain, & Ponnurangam Kumaraguru (2013). Call Me MayBe: Understanding Nature and Risks of Sharing Mobile Numbers on Online Social Networks. arXiv: 1312.3441v1
- Robert J. Nemiroff, & Teresa Wilson (2013). Searching the Internet for evidence of time travelers. arXiv: 1312.7128v1
Thursday 16 January 2014
I think you've had enough, Mr. Bond
James Bond is likely to be impotent, at high risk of liver disease, and the fact he likes his martini "shaken, not stirred" is because of alcohol-induced tremors.
If you weren't already convinced that a real-life James Bond would be a terrible spy - he tells people his actual name for goodness sake - the article Were James Bond’s drinks shaken because of alcohol induced tremor? outlines the likely health issues Britain's most famous fictional spy would be suffering in real life due to his outrageous alcoholism.
The researchers read all 14 James Bond books and noted down each time he had a drink, and how much. They also noted when he was unable to drink - for instance, when incarcerated. Not including the days when he was unable to drink, they found his weekly alcohol consumption to be 92 units (standard drinks in Australia - 10 ml of pure alcohol), over four times the recommended amount. His maximum daily consumption was 49.8 units. Out of the 87.5 days he was able to drink, he only had 12.5 alcohol-free days.
This type of behaviour is not consistent with his Lothario character, given that his sexual function is likely to be severely impaired by his drinking. It's also not particularly consistent with his ability to shoot straight outside of the bedroom, where sobriety is a necessity to defeat the bad guys. On the other hand, drinking is likely to have decreased his risk aversion, and previous studies have shown that drinking encourages unsafe sex.
But before you become too crushed by the fact that a fictional hero might not actually be scientifically sound, perhaps Bond is smarter than the authors of this report suspect. A 1999 report Shaken, not stirred: bioanalytical study of the antioxidant activities of martinis found that shaken martinis have superior antioxidant activity and this could have decreased his risk of cataracts and cardiovascular disease. There is hope. I don't feel as bad as I did when I discovered that Santa Claus is a fat, diabetic drunk.
Here's Bond in drinking action in Casino Royale.
And here's a handy infographic:
References:
Graham Johnson, Indra Neil Guha & Patrick Davies (2013). Were James Bond’s drinks shaken because of alcohol induced tremor? BMJ DOI: 10.1136/bmj.f7255
Trevithick CC, Chartrand MM, Wahlman J, Rahman F, Hirst M, & Trevithick JR (1999). Shaken, not stirred: bioanalytical study of the antioxidant activities of martinis. BMJ (Clinical research ed.), 319 (7225), 1600-2 PMID: 10600955