Friday, 31 January 2014

arXiv trawl: January 2014 - Social Media



An interesting way of keeping your ear to the ground regarding the latest happenings in the scientific world is to monitor the arXiv. The arXiv (pronounced “archive”) is a repository for electronic preprints of scientific papers. Scholarly peer review of scientific papers can take a long time, so many scientists use archives like this to share their findings and to seek comment on their work before official publication. As such, the contents of the arXiv are many and varied; there are weird and wonderful topics, and papers in various states of review. Some will never get published anywhere else, whilst others are seminal (for example, Perelman’s proof of the Poincaré conjecture). But by the very definition of preprint, they are all calling for comment. I dived in recently, and here are some highlights of the last few months on the arXiv concerning social media.

Because MySpace --> Facebook

Researchers from Princeton, in the report Epidemiological modeling of online social network dynamics, modelled the rise and fall of MySpace by likening it to a disease and used epidemiological methods to model how it infected the population, and how the population eventually became immune. The number of times the term “MySpace” was searched for in Google was used as a measure of the site's popularity (or how infected the population was). This data was sourced from Google Trends. If you would like to read more about the maths involved, check out Sick of Facebook? Read on… in Plus. As you can see below, they fitted a nice curve to the data. Cute.
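For a feel of what an epidemic-style model of a social network looks like, here is a minimal sketch in Python. It is not the authors' code. It assumes three groups (potential users S, active users I, and people who have left R) and assumes that people abandon the site through contact with those who have already left, so the leaving rate depends on both I and R. The parameter values and the function name simulate_irsir are made up for illustration and are not fitted to any Google Trends data.

```python
def simulate_irsir(beta, nu, s0, i0, r0, days, dt=0.01):
    """Integrate an SIR-style model with 'infectious recovery' using Euler steps.

    S: potential users, I: active users, R: people who have left the network.
    Assumption: users leave through contact with people who already left,
    so the leaving term is nu * I * R / N instead of the classic gamma * I.
    Returns a list of (day, S, I, R) tuples, one per simulated day.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population, conserved by the equations
    steps_per_day = round(1 / dt)
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            joining = beta * s * i / n * dt  # S -> I: adoption by contact
            leaving = nu * i * r / n * dt    # I -> R: abandonment by contact
            s -= joining
            i += joining - leaving
            r += leaving
        history.append((day, s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative values only: 1000 people, a handful of early adopters,
    # and one person who has already quit.
    run = simulate_irsir(beta=0.5, nu=0.3, s0=990, i0=9, r0=1, days=120)
    for day, s, i, r in run[::20]:
        print(f"day {day:3d}: active users = {i:7.1f}")
```

With these invented values the number of active users climbs, peaks and then collapses, which is the rise-and-fall shape the study fits to the search data.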



All good so far. Stories concerning social media are favourites of conventional media, and naturally this was picked up: Facebook could fade out like a disease. What the newspapers focused on was the work fitting the epidemiological model to Facebook data (Google searches for “Facebook”) and the conclusion that Facebook is heading for a "rapid decline", and that between 2015 and 2017 it will lose 80% of its users.

There are two questions that arise from this:

1) Is Google Trends data a good measure of the popularity of a website?
2) Does the fact that the MySpace data fits this curve mean that the Facebook data will?

Facebook was made aware of this study, and their reply was pretty excellent. They did their own study using Google Trends on searches for "Princeton" and found that:

“Princeton will have only half its current enrollment by 2018, and by 2021 it will have no students at all, agreeing with the previous graph of scholarly scholarliness. Based on our robust scientific analysis, future generations will only be able to imagine this now-rubble institution that once walked this earth.

While we are concerned for Princeton University, we are even more concerned about the fate of the planet — Google Trends for "air" have also been declining steadily, and our projections show that by the year 2060 there will be no air left.”



Thanks to FlowingData for the link to Facebook's reply.

It’s also a nice example of a study that will never be followed up by the conventional media. Even if this work makes an entirely accurate prediction of Facebook’s future, there will be no follow-up newspaper article in 2020.

How to get yourself retweeted, sort of

Everybody likes to be popular. To this end, Ronald Hochreiter and Christoph Waldhauser authored A Genetic Algorithm to Optimize a Tweet for Retweetability, in which they look at the factors that make a tweet popular. They developed a Twitter-like network of connected people and pushed tweets into the network to see which ones were retweeted. They modified the tweets each time they ran the simulation using a genetic algorithm. These algorithms work like evolution. Each tweet had a number of “chromosomes” that were modified with random mutations each time the model was run. Mutations that brought about better results (that is, more retweets) were kept, and those that didn’t were randomly mutated further. The chromosomes of the tweet concerned the polarity of the tweet (I think this means whether it’s positive or negative; it’s not well explained), how emotional the tweet is, the length of the tweet, the time of day it’s sent, and the number of URLs and hashtags it contains.
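To make the evolutionary idea concrete, here is a rough sketch in Python. It is not the authors' implementation: the gene ranges, the function names and above all the toy_retweets scoring function are invented for illustration and stand in for their simulated Twitter-like network. The sketch keeps the better half of the population each round and refills the rest with mutated copies of the survivors (no crossover).

```python
import random

# Hypothetical ranges for each "chromosome"; the paper's encoding may differ.
GENES = {
    "polarity": (-1.0, 1.0),     # negative .. positive
    "emotionality": (0.0, 1.0),  # flat .. very emotional
    "length": (1, 140),          # characters
    "hour": (0, 23),             # time of day sent
    "urls": (0, 3),
    "hashtags": (0, 5),
}


def sample_gene(rng, low, high):
    """Draw an integer for integer ranges, otherwise a float."""
    return rng.randint(low, high) if isinstance(low, int) else rng.uniform(low, high)


def random_tweet(rng):
    return {name: sample_gene(rng, low, high) for name, (low, high) in GENES.items()}


def mutate(tweet, rng, rate=0.3):
    """Copy a tweet, re-drawing each gene with probability `rate`."""
    child = dict(tweet)
    for name, (low, high) in GENES.items():
        if rng.random() < rate:
            child[name] = sample_gene(rng, low, high)
    return child


def toy_retweets(tweet):
    """Made-up fitness function; a stand-in for the simulated network."""
    score = 10.0
    score -= abs(tweet["length"] - 100) / 10
    score -= abs(tweet["hour"] - 18) / 3
    score += 2 * tweet["emotionality"] + tweet["polarity"]
    score -= abs(tweet["hashtags"] - 2) + abs(tweet["urls"] - 1)
    return score


def evolve(fitness, generations=50, population_size=20, seed=0):
    """Return the best tweet found and the best score seen in each generation."""
    rng = random.Random(seed)
    population = [random_tweet(rng) for _ in range(population_size)]
    best_scores = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_scores.append(fitness(population[0]))
        survivors = population[: population_size // 2]  # keep the better half
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    return population[0], best_scores


if __name__ == "__main__":
    best, scores = evolve(toy_retweets)
    print(best)
    print(scores[0], scores[-1])
```

Because the survivors are carried over unchanged, the best score can never get worse from one generation to the next. Whatever tweet this produces only reflects the invented scoring function, not real retweet behaviour.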

So how should you compose your next tweet? Well, unfortunately the paper doesn’t really say. It shows some results that don’t translate particularly well into reality. Their conclusions are that the genetic algorithm works pretty well, and that more work is needed. Fair enough, there are plenty of papers out there that simply outline how a new model works rather than having exciting results; I’ve written a few myself.

Don’t share your mobile phone number on social networks

Call Me MayBe: Understanding Nature and Risks of Sharing Mobile Numbers on Online Social Networks wins this month’s award for best reference to a pop song in a scientific paper title. The researchers examined how sensitive personal information spreads around social networks by collecting 76347 unique mobile numbers posted by 85905 users on Twitter and Facebook. They then used these mobile numbers to gain sensitive information about their owners from other social networks.

This in itself is an interesting study of how easy it is to collect personal information online, but they didn’t leave it there. They then communicated the observed risks to the owners by calling them up with the mobile numbers they found. Some users were surprised to know about the online presence of their number, while others had intentionally posted it online for business purposes. They found that 38.3% of users who were unaware of the online presence of their number had posted their number themselves on the social network.

Where’s my hoverboard?

Searching the Internet for evidence of time travellers makes the bold claim in its abstract that, regarding the search for time travellers online, it is “perhaps the most comprehensive to date”. Essentially what they did was search the web for information that shouldn’t have been known at the time of publishing – only time travellers from the future could have possessed such prescient knowledge. The two events they were looking for evidence of were the viewing of Comet ISON and the inauguration of Pope Francis – both big events that people in the future would know and care about. To do this, they needed to look for information published before these events occurred. They found that Bing and Facebook were no good for this study as they didn’t make clear at what date the information was published, or the date could be easily edited. So they used Twitter, on which tweets are nicely time-stamped. They called for time travellers to use the hashtag #ICanChangeThePast2 in September 2013 and looked at tweets before this time. They also examined Google Trends for searches a time traveller might have made.
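The core of the search boils down to a filter: keep anything that mentions the event but carries a time-stamp from before the event could have been known. Here is a minimal sketch in Python of that idea. The post format (a dict with "text" and "created_at"), the function name prescient_posts and the sample posts are all hypothetical; this is not the authors' pipeline and it does not call any real Twitter API. The cutoff used is 13 March 2013, the day Pope Francis was elected.

```python
from datetime import datetime, timezone


def prescient_posts(posts, term, known_from):
    """Return posts that mention `term` and are time-stamped before `known_from`.

    `posts` is assumed to be an iterable of dicts with a "text" string and a
    timezone-aware "created_at" datetime (a made-up format for this sketch).
    """
    needle = term.lower()
    return [
        post for post in posts
        if needle in post["text"].lower() and post["created_at"] < known_from
    ]


if __name__ == "__main__":
    cutoff = datetime(2013, 3, 13, tzinfo=timezone.utc)
    # Invented sample posts, purely for illustration.
    sample = [
        {"text": "Looking forward to Pope Francis!",
         "created_at": datetime(2012, 6, 1, tzinfo=timezone.utc)},
        {"text": "Pope Francis has been elected",
         "created_at": datetime(2013, 3, 14, tzinfo=timezone.utc)},
        {"text": "Nice weather today",
         "created_at": datetime(2012, 1, 1, tzinfo=timezone.utc)},
    ]
    for post in prescient_posts(sample, "pope francis", cutoff):
        print(post["created_at"].date(), post["text"])
```

A filter like this only works if the time-stamps can be trusted, which is why sources with editable or unclear dates are no use for the exercise.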

Disappointingly, they found no evidence that any time travellers concerned themselves with posting on Twitter or doing Google searches.

I am going to go for a run before I press publish on this post. So, if there are any time travellers out there, come and join me at Erskineville Oval at 1pm Thursday 30th January.

(Edit: There were two people and a dog down at the oval. The dog chased and barked at me in a very knowing fashion. The time travellers of the future are apparently long-haired, short, brown dachshunds.)

References:

  • John Cannarella, & Joshua A. Spechler (2014). Epidemiological modeling of online social network dynamics. arXiv: 1401.4208v1
  • Ronald Hochreiter, & Christoph Waldhauser (2014). A Genetic Algorithm to Optimize a Tweet for Retweetability. Proceedings of MENDEL 2013: 13-18. arXiv: 1401.4857v1
  • Prachi Jain, & Ponnurangam Kumaraguru (2013). Call Me MayBe: Understanding Nature and Risks of Sharing Mobile Numbers on Online Social Networks. arXiv: 1312.3441v1  
  • Robert J. Nemiroff, & Teresa Wilson (2013). Searching the Internet for evidence of time travelers. arXiv: 1312.7128v1