Sunday 30 March 2014

arXiv trawl: March 2014 - Astrobiology

This month's arXiv trawl brings us to astrobiology.

The Habitable Epoch of the Early Universe

In recent weeks, the world of cosmology has been buzzing with the news that gravitational waves - remnants of the Big Bang - may have been detected by the BICEP2 experiment. But did life come not long afterwards?

Abraham Loeb from Harvard University has posited in his paper The Habitable Epoch of the Early Universe that conditions were ripe for life just 10 million years after the Big Bang. Life on Earth is about 3.5 billion years old, and it took about 1 billion years to first appear after the Earth was formed. So to think that life could have formed only 10 million years after the Big Bang - a blink of the eye in cosmological terms - certainly goes against conventional thinking.

To come to this conclusion, Loeb looked at the conditions needed for life to take hold. Astrobiologists talk of the Goldilocks zone - an orbit around a star where a planet is not too hot or too cold to have liquid water, just like here on Earth. On Earth, we are aided by an atmosphere that keeps temperatures mild. But there are other ways that a planet can stay warm enough for liquid water - tidal heating, for example, is thought to maintain a liquid ocean under the surface of Jupiter's moon Europa.

Loeb postulates another mechanism, one that I presume wouldn't be particularly good for your health if we could replicate it, but nonetheless would keep water in liquid form. In fact, he proposes that the whole Universe would have had these conditions. The cosmic background radiation is the afterglow of the Big Bang, fills the Universe and these days has a temperature of about 3K. But it wasn't always this cold, and moments after the Big Bang it would have had a temperature of billions of degrees (more actually). It has been cooling since then, and around 10-17 million years after the origin of the Universe, the cosmic radiation would have made the Universe nice and balmy with liquid water.
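As a rough check on those numbers, here is a minimal sketch of when the background radiation would have passed through the boiling and freezing points of water. It assumes the standard scaling of the radiation temperature with redshift, T = T0 (1 + z), and a matter-dominated approximation for the age of the Universe. The values for the Hubble constant and matter density are my own illustrative assumptions, not figures from the paper.

```python
import math

T0_KELVIN = 2.725          # present-day background temperature
H0_KM_S_MPC = 67.7         # assumed Hubble constant
OMEGA_MATTER = 0.31        # assumed matter density parameter
KM_PER_MPC = 3.0857e19
SECONDS_PER_MYR = 3.156e13


def one_plus_z(temperature_kelvin):
    """Stretch factor (1 + z) at which the radiation had this temperature."""
    return temperature_kelvin / T0_KELVIN


def age_myr(stretch):
    """Approximate age in millions of years, ignoring radiation and dark energy."""
    hubble_time_myr = KM_PER_MPC / H0_KM_S_MPC / SECONDS_PER_MYR
    return (2.0 / 3.0) * hubble_time_myr / math.sqrt(OMEGA_MATTER) * stretch ** -1.5


age_at_boiling = age_myr(one_plus_z(373.15))   # water boils at 100 C
age_at_freezing = age_myr(one_plus_z(273.15))  # water freezes at 0 C
print(f"Liquid water window: {age_at_boiling:.1f} to {age_at_freezing:.1f} million years")
```

Even with this crude approximation, the window comes out close to the 10-17 million years quoted above.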

But even if the temperature was right, is 10-17 million years long enough for rocky planets to form on which life can live? And were there enough heavy elements around to get the chemistry of life going? Loeb thinks maybe. Matter was pretty evenly spread around the Universe at this age, but some areas would have been more dense than others. A region where the matter was more or less dense than the average is called a perturbation. Assuming these perturbations had a Gaussian distribution (the classic bell-shaped normal distribution), massive stars of Hydrogen and Helium could have formed at the very edge of the distribution - 8.5 standard deviations from the mean. This is pretty unlikely; if you've done management courses you'll know that 6 standard deviations from the mean (that is, Six Sigma) is what you are aiming for when detecting defects. If you're making a product, a Six Sigma event would happen roughly every few hundred million products. An 8.5 sigma event would occur less than once every few hundred trillion products.
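To put numbers on those odds, here is a small sketch that computes the chance of a Gaussian value landing at least k standard deviations from the mean, in either direction. Counting both tails is my assumption; counting one tail only would halve the probabilities.

```python
import math


def two_sided_tail(k_sigma):
    """Probability that a Gaussian value lies at least k_sigma from the mean."""
    return math.erfc(k_sigma / math.sqrt(2.0))


p_six = two_sided_tail(6.0)
p_eight_half = two_sided_tail(8.5)
print(f"6 sigma:   about 1 in {1.0 / p_six:.2e}")
print(f"8.5 sigma: about 1 in {1.0 / p_eight_half:.2e}")
```

The first figure is a few hundred million, and the second is far rarer than one in a few hundred trillion.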

What this means is that if the density of the early Universe had Gaussian perturbations (and it's not settled science that it did), it's not very likely such stars could have formed and in the process created heavy elements, but the Universe is a big place!

Imagine then that there were rocky planets formed from exploding first generation stars that contained the elements of life in this temperate Universe. Could there be life? 10 million years is not a long time for life to form - it took a billion years on Earth, and then longer to evolve. But let's say it did. Could it still be out there? Well, the issue with having cosmic radiation as life's heat source is that it cooled down over time and when the Universe was 17 million years old, it wouldn't have been warm enough for liquid water. So the planets on which life resided needed to have their own heat source, and then perhaps life could have escaped through panspermia. Loeb recommends that astronomers look for biosignatures in really old stars to further investigate, something that is now technically possible as we discover more early generation stars.

Loeb also makes a more philosophical point. Some proponents of the anthropic principle claim that various fundamental physical constants are what they are - some say "fine tuned" - because they must be those values to bring about life. Loeb argues that anthropic arguments are weak, at least with regard to the cosmological constant, which describes the density of energy in the Universe, as this habitable epoch would have existed for various values of the constant.

I was actually going to post a few other astrobiology arXiv papers, but I think this is enough for one Saturday! I'd be interested to hear what others think of this idea that life could have existed so early in the Universe's life.

  1. Abraham Loeb (2013). The Habitable Epoch of the Early Universe. arXiv: 1312.0613v2

Tuesday 18 March 2014

Copper Nanotubes

It's not often a chemistry journal article will make you laugh out loud. From Structural and electronic properties of chiral single-wall copper nanotubes - enjoy.

The structural, energetic and electronic properties of chiral (n, m) (3⩽n⩽6, n/2⩽m⩽n) single-wall copper nanotubes (CuNTs) have been investigated by using projector-augmented wave method based on density-functional theory. The (4, 3) CuNT is energetically stable and should be observed experimentally in both free-standing and tip-suspended conditions, whereas the (5, 5) and (6, 4) CuNTs should be observed in free-standing and tip-suspended conditions, respectively. The number of conductance channels in the CuNTs does not always correspond to the number of atomic strands comprising the nanotube. Charge density contours show that there is an enhanced interatomic interaction in CuNTs compared with Cu bulk. Current transporting states display different periods and chirality, the combined effects of which lead to weaker chiral currents on CuNTs.

  • Duan, Y., Zhang, J., & Xu, K. (2014). Structural and electronic properties of chiral single-wall copper nanotubes Science China Physics, Mechanics and Astronomy, 57 (4), 644-651 DOI: 10.1007/s11433-013-5387-8

Cricket is a matter of life and death

A guest post by Bernard Kachoyan

Ever thought of batting as a life and death struggle against hostile forces? It always seemed that way when I batted. Well you might be more accurate than you think.

The experience of a batsman can be described as a microcosm of life: when you go out to bat you are “born”, when you get out you “die”. But what happens when you are Not Out (NO)? More subtly, when you are Not Out you simply leave the sample pool, that is you live for a while then you stop being measured. In the parlance of statistics, this becomes “censored” data. In medical research the “born” moment is equivalent to when a patient is first being monitored (e.g. survival times of cancer patients after diagnosis). The question in medicine becomes, what is the “survival function”, the probability that a patient survives for X years after the start of observation? And how does the life expectancy curve of one population differ from another; in particular, are people treated in a particular way different to a control group?

These types of problems are commonly addressed using Kaplan-Meier (KM) estimators. In economics, they can be used to measure the length of time people remain unemployed after a job loss. In engineering, they can be used to measure the time until failure of machine parts. Here we will apply those ideas to batting in cricket.

An important property of the KM estimate is that it is non-parametric in the sense that it does not assume any type of Normal distribution in the data, something which is patently untrue for this type of data. It also only uses the data itself to generate a survival curve (the term given to the survival function after it is drawn on a chart) and associated confidence limits. Hence the KM survival curve may look odd in that it declines in a series of steps at the observation times and the function between sampled observations is constant. However, when a large enough sample is taken, the KM approaches the true survival function for that population.

An important advantage of the KM method is that it can take into account censored data, particularly the censoring that occurs when a patient withdraws from a study, i.e. is lost from the sample before the final outcome is observed. This makes it perfect for dealing with the NOs as described above.

When referring to batsmen, “death” means getting out, being “censored” means completing the innings before getting out (remaining NOT OUT) and “time” means number of runs scored (t_j = scoring j runs). The idea of the KM estimator is pretty simple.
  1. The conditional probability that an individual dies in the time interval from t_i to t_{i+1}, given survival up to time t_i, is estimated as d_i/n_i, where d_i is the number who die at time t_i and n_i is the number alive just before time t_i, including those who will die at time t_i.
  2. Then the conditional probability that an individual survives beyond t_{i+1} is (n_i – d_i)/n_i.
  3. When there is no censoring, n_i is just the number of survivors just prior to time t_i. With censoring, n_i is the number of survivors minus the number of losses (censored cases). It is only those surviving cases that are still being observed (have not yet been censored) that are "at risk" of an observed death.
  4. The KM estimator of the survivor function at time t for t_j ≤ t < t_{j+1} is then formally:

     $$\hat{S}(t) = \prod_{i=1}^{j} \frac{n_i - d_i}{n_i}$$
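The steps above can be sketched in a few lines of Python. This is only a minimal illustration: the function name is my own and the sample scores are made up, not taken from any real batsman. An innings that ends Not Out on a given score is treated as still "at risk" at that score, which is the usual convention.

```python
from collections import Counter


def kaplan_meier(innings):
    """Return the survival curve as a list of (runs, probability) steps.

    innings is a list of (runs, dismissed) pairs; dismissed is False for a Not Out.
    """
    dismissals = Counter(runs for runs, dismissed in innings if dismissed)
    endings = Counter(runs for runs, _ in innings)  # outs and not outs together
    at_risk = len(innings)
    survival = 1.0
    curve = []
    for score in sorted(endings):
        d = dismissals.get(score, 0)
        if d:
            # Multiply in the conditional probability of surviving this score.
            survival *= (at_risk - d) / at_risk
            curve.append((score, survival))
        # Everyone whose innings ended here (out or not out) leaves the pool.
        at_risk -= endings[score]
    return curve


# Hypothetical scores for illustration only.
sample_innings = [(0, True), (12, True), (12, False), (35, True), (50, False), (101, True)]
curve = kaplan_meier(sample_innings)
for runs, probability in curve:
    print(f"P(score > {runs}) = {probability:.3f}")
```

Note that the Not Out on 50 produces no step in the curve; it only shrinks the pool of innings still at risk.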

Such KM curves have attractive properties, which perhaps explain their popularity in medical research for over half a century. They are fairly easy to calculate and they provide a visual depiction of all of the raw data, including the times of actual failure, yet still give a sense of the underlying probability model.

Let’s now apply the KM estimator to some cricket statistics. In this case I have arbitrarily chosen the batting statistics of Steve Waugh, Sachin Tendulkar (up to 2010 to keep roughly the same number of innings as Waugh) and Don Bradman. Without considering the censored data (the Not Outs), the curve simply reverts to the percentage of scores less than or equal to a certain number of runs - the value on the x axis. This is shown in Figure 1. Bradman of course is still clearly in a class of his own.

If we now properly include the NOs in the formulation we get survival curves as shown in Figure 2. I have omitted the Tendulkar curves here for clarity. As expected, the survival rates go up as the NOs do not indicate a true “death”. In Steve Waugh’s case, the increase is noticeable (I didn’t say “significant”!) since he has a large number of NOs compared to most batsmen within his number of test innings.

This is shown more starkly in Figure 3, where I have plotted both the censored and uncensored curves for Waugh and Tendulkar. I have plotted them on a logarithmic scale to highlight differences. It can be seen that Waugh’s censored survival curve (cf the raw curve) tracks Tendulkar’s very closely until a score of about 100. This reflects the large number of Waugh’s NOT OUTS (43 vs 29 in roughly the same number of innings, 260 vs 278). The divergence of the curves after that not only reflects the propensity of Tendulkar to go on to big scores, but also that a large number of Tendulkar’s not outs were after he had already scored a century (15 vs 2 for Waugh).

Figure 1 and Figure 2

Figure 3

The basic KM methodology has been around since the 1950s and of course has been extended in various ways by professional statisticians, and alternative methods have been proposed. But its simplicity means it is still widely used.

There are several drawbacks, some of which can be seen in Figures 1-3. Firstly, the vertical drop at specific times is drawn from the data, and should not be seen as indicating particular “danger times”. This is particularly evident at larger scores where the naturally small sample size means that there are fewer data points (i.e. scores where a batsman actually gets out). So some sort of smoothing of the curve is necessary to provide an estimate of the true underlying functional dependency.

This reduction in the sample size at large values also means the effect of each individual failure on the size of the step-down increases.

Another drawback of the KM method is that the estimate of the probability of surviving each “danger time” depends only on the number of patients at risk at that time. So if there are censored values the actual time between the last failure and the time of censoring is not considered.

It is natural at this point to question the underlying assumption of the KM method that the patients (i.e. innings) are independent. It is common to talk in cricket about form slumps or purple patches. This can be examined statistically by considering the autocorrelation function of the scores, shown in Figure 4 assuming stationarity, where Waugh has been omitted for clarity. The figure clearly shows no evidence for time/innings correlation and although strictly speaking a lack of correlation does not imply true independence, it is evidence that the innings can be considered independent for the purposes of this analysis.
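For readers who want to try this on their own data, here is a minimal sketch of a sample autocorrelation function. It is not the code used for Figure 4, and the alternating scores are a made-up extreme case chosen so the pattern is obvious.

```python
def autocorrelation(scores, max_lag):
    """Sample autocorrelation of a sequence of scores for lags 0..max_lag."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]
    total = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t + lag] for t in range(n - lag)) / total
        for lag in range(max_lag + 1)
    ]


# Hypothetical batsman who alternates between 10 and a duck.
alternating_scores = [10, 0, 10, 0, 10, 0, 10, 0]
acf = autocorrelation(alternating_scores, 2)
print(acf)
```

A batsman with genuine form slumps would show positive values at small lags; values scattered close to zero at every lag beyond zero are what independent innings would look like.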

Figure 4

The question that naturally arises is whether we can say anything statistically about whether the difference between survival curves is significant (cf treated vs control groups in medicine). Confidence intervals can be placed on the derived curves using the so-called Greenwood formula, dating back to the 1920s, or its more modern variations. These suffer the drawback of being less accurate in the tail of the curves, where by definition the sample size is smallest. Not only will the formulas return a greater error because of that, but their validity per se comes into question, as the expressions rely on a normal approximation (through the central limit theorem) and hence can only be considered valid when the number of remaining innings is bigger than, say, 20 or so.
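As a rough illustration of the Greenwood formula, the variance of the KM estimate at a given score is approximated by S(t)² times the sum of d_i / (n_i (n_i − d_i)) over all dismissal scores up to t. The sketch below returns the standard error at each step. The function name and sample scores are made up, and the last step (where nobody is left at risk) is reported with zero error simply to avoid dividing by zero.

```python
import math


def km_with_greenwood(innings):
    """Return (runs, survival, standard_error) at each dismissal score."""
    at_risk = len(innings)
    survival = 1.0
    greenwood_sum = 0.0
    rows = []
    for score in sorted({runs for runs, _ in innings}):
        endings = [dismissed for runs, dismissed in innings if runs == score]
        d = sum(endings)  # number dismissed on exactly this score
        if d and d < at_risk:
            survival *= (at_risk - d) / at_risk
            greenwood_sum += d / (at_risk * (at_risk - d))
            rows.append((score, survival, survival * math.sqrt(greenwood_sum)))
        elif d:
            # Everyone still at risk is dismissed here; the formula is undefined.
            survival = 0.0
            rows.append((score, 0.0, 0.0))
        at_risk -= len(endings)
    return rows


# Hypothetical scores for illustration only.
sample = [(0, True), (12, True), (12, False), (35, True), (50, False), (101, True)]
rows = km_with_greenwood(sample)
for runs, survival, error in rows:
    print(f"{runs:>3}: S = {survival:.3f} +/- {1.96 * error:.3f}")
```

With only six innings the intervals are enormous, which is exactly the small-sample problem described above.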

Unfortunately, as we have seen above it is in the tails of the curve where the distinctions between very good and great batsmen are often found.

Similarly, a number of ways of comparing curves exist in the statistical literature, such as the Kolmogorov–Smirnov test, the Log-rank test or the Cox proportional hazards test. These can rapidly become very mathematically complicated, especially if we want to try and distinguish one part of the curve specifically (say the high end).

Although I haven’t done the hard yards in this article, my intuition tells me we might be hard pressed to prove statistically significant differences between the Waugh and Tendulkar corrected survival curves. This is the drawback of applying statistical tests in areas where their applicability is not clear.

In any case, it can be seen that batting can most certainly be considered a true life and death struggle.

Friday 14 March 2014

What to do with old swimming caps

Since the start of the 2013/2014 ocean swimming season, around 20,000 swimming caps have been handed out to competitors at the various ocean swims in NSW. If you are a regular ocean swimmer, it doesn’t take too long before you have more caps than you know what to do with. Some may be used again in the pool, and some given to friends and family, but the vast majority of these caps will end up in land-fill having spent most of the season at the bottom of your swimming bag.

This year at the 2014 Coogee Island Challenge, we are running a swimming cap “amnesty”. Bring down your old caps that you no longer use, or donate after your swim, and we will save your caps from an ignominious end in land-fill or as rubbish scattered on the beach. There are no companies that recycle swimming caps, mostly made of latex or silicone, so the collected caps will be donated to the following organisations for re-use (which is better than recycling anyway):
  • The Frugal Forest is a One Off Makery project, supported by Midwaste, the Australia Council for the Arts and Glasshouse Port Macquarie. Drawing in artists, musicians, scientists, community, business and industry, we aim to build an intricately detailed forest entirely from salvage. Why? Because nothing is wasted in the forest, and we could really learn from that.
  • Reverse Garbage is Australia’s largest creative reuse centre, committed to diverting resources from landfill – approximately 35,000 cubic metres or 100 football fields’ worth per year. Almost 40 years from its founding, Reverse Garbage is now an internationally recognised, award-winning environmental co-operative committed to promoting sustainability through the reuse of resources, as well as providing support to other community, creative and educational organisations. 
  • Various Council Pools to be used by those needing a cap on the day.
Where: Coogee Island Challenge Ocean Swim, Coogee Beach
Where, specifically: The marquee.
When: 13 April 8.30-11.30am

There seems to be a gap in the market here for an entrepreneurial organic chemist. Swimming caps are usually made from latex or silicone (rubbery materials) which are polymers. Are there any options at all for recycling such materials? From what I read, it's just not economical, but I wonder whether that will change as the raw starting materials for such products (that is, petroleum) become rarer and so more expensive.

Tuesday 11 March 2014

Ep 153: Complex Network Analysis in Cricket

Complex network analysis is an area of network science and part of graph theory that can be used to rank things, one of the most famous examples of which is the Google PageRank algorithm. But it can also be applied to sport. Cricket is a sport in which it is difficult to rank teams (there are three forms of the game, the various countries do not play each other very often etc.), whilst it is notoriously difficult to rank individual players (for how the ICC do it, see Ep 107: Ranking Cricketers).
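To give a flavour of how a PageRank-style ranking works, here is a minimal sketch using plain power iteration. It assumes each match is a directed link from the losing team to the winning team, so that "rank" flows towards winners. The team names and results are hypothetical, and this is an illustration of the general idea rather than the exact method used in the papers.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Rank nodes of a directed graph given as (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for loser, winner in edges:
        out_links[loser].append(winner)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            # An unbeaten team has no outgoing links; spread its rank evenly.
            targets = out_links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical results: (loser, winner).
results = [("Team B", "Team A"), ("Team C", "Team A"), ("Team C", "Team B")]
ranking = pagerank(results)
for team in sorted(ranking, key=ranking.get, reverse=True):
    print(f"{team}: {ranking[team]:.3f}")
```

The appeal over a simple win count is that beating a highly ranked team is worth more than beating a lowly ranked one.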

Satyam Mukherjee at Northwestern University became a bit famous when The Economist picked up his work (more famous than when we picked it up!) and he has published extensively on complex network analysis as applied to cricket rankings. I had a very interesting chat with Satyam about his various works concerning the evaluation of cricket strategy, leadership, team and individual performance, and the papers we discuss in the podcast are listed below. One of the more interesting findings was that left-handed captains and batsmen are generally ranked higher than their right-handed counterparts, whilst this is not true for left-handed bowlers.

Papers discussed in the podcast:
  • Satyam Mukherjee (2013). Ashes 2013 - A network theory analysis of Cricket strategies. arXiv: 1308.5470v1
  • Satyam Mukherjee (2013). Left handedness and Leadership in Interactive Contests. arXiv: 1303.6686v1
  • Satyam Mukherjee (2012). Quantifying individual performance in Cricket - A network analysis of Batsmen and Bowlers. arXiv: 1208.5184v2
  • Satyam Mukherjee (2012). Complex Network Analysis in Cricket : Community structure, player's role and performance index. arXiv: 1206.4835v4
  • Satyam Mukherjee (2012). Identifying the greatest team and captain - A complex network approach to cricket matches. arXiv: 1201.1318v2