Thursday, 9 October 2008, data mining and mashups

I've recently been putting together a Guide to Web 2.0 for The Helix Magazine, and one of the most interesting aspects has been exploring the various mashups and applications of Last.fm. Last.fm is a brilliant online music service and currently my favourite "web 2.0" application. Once you download a plugin for iTunes (or whatever music player you have) that "scrobbles" each song you play (that is, tells Last.fm what you are listening to), a picture of your music taste builds up, and people with similar listening tastes are found. Artists are recommended to you according to your tastes, charts of your songs are built up, and "radio stations" perfectly tailored to you can be streamed online. But it is better than radio, as there are no ads and you like every song.

By the way, I am westius on Last.fm.

Millions of songs are scrobbled every day by users. This data builds up a massive database of user music preferences, and because of its API, it is possible to access that information and develop interesting tools.

As users can tag their music with genres that they think aptly describe their songs and artists, it is possible to determine your own tag cloud of musical preferences. Using an excellent script, I came up with my own tag cloud, as you can see here.

It is possible from such tag clouds to examine how listeners fall into different categories through a process known as Data Mining. Data mining is essentially the process of sorting through enormous amounts of data and picking out the relevant stuff. Using principal components analysis - a mathematical technique which reduces multidimensional data sets to lower dimensions for analysis - and k-means clustering - an algorithm to cluster n objects into k groups - Liekens came up with 5 broad groups of listeners:
  1. Electronic/pop
  2. Rock
  3. Indie
  4. Metal
  5. Hip-hop
Clearly this list does not reflect everyone on Last.fm (where are the classical music listeners?), but it does reflect the majority. I was surprised that Indie is a group in itself and am intrigued by the bundling of electronic and pop together. There are some tweaks to the maths you can make that could come up with different groups, and better results might be possible with a bigger data set. Hip-hop listeners were the most clearly defined group. You can read more about the maths and how these groups are separated in the original article.
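For the curious, here is a rough sketch of what the PCA-then-k-means idea looks like in Python with NumPy. This is not Liekens' code: the three tags, the made-up listener data and the farthest-point starting centroids are all my own assumptions for illustration, and a real analysis would use many more tags and listeners.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto their top principal components."""
    Xc = X - X.mean(axis=0)  # centre each tag column
    # Rows of Vt are the principal directions, ordered by variance explained.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(X, k, n_iter=100):
    """Plain k-means (Lloyd's algorithm) with farthest-point starting centroids."""
    centroids = [X[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen centroid.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assignment step: each listener joins the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Made-up data: 60 listeners, 3 hypothetical tags (electronic, rock, hip-hop).
rng = np.random.default_rng(42)
prototypes = np.array([[0.8, 0.1, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8]])
true_groups = np.repeat(np.arange(3), 20)
X = prototypes[true_groups] + rng.normal(0, 0.03, size=(60, 3))

reduced = pca(X, 2)               # 3 tag dimensions down to 2
labels, _ = kmeans(reduced, 3)    # cluster the listeners into k = 3 groups
print(labels)
```

The choice of k and the number of components kept are exactly the kind of "tweaks to the maths" that can change which groups come out.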

Another interesting thing you can do is compare your music tastes with your friends'. This pic is a difference cloud comparing my music tastes with those of my good friend intranation. We have a roughly 40% similarity in music genre tastes. The green tags are the genres that I have more of in my collection, and the red tags are the genres that intranation listens to more than me. No real surprises there.
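A difference cloud like this can be sketched in a few lines of Python. I don't know how the original tool actually computes its similarity figure, so the overlap measure below is just one plausible assumption, and the tag counts are invented for illustration.

```python
def tag_shares(counts):
    """Turn raw tag counts into each tag's share of the total."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def difference_cloud(mine, theirs):
    """Return (per-tag share difference, overlap similarity between 0 and 1)."""
    a, b = tag_shares(mine), tag_shares(theirs)
    tags = set(a) | set(b)
    # Positive means I listen to the tag more ("green"), negative means they do ("red").
    diff = {t: a.get(t, 0.0) - b.get(t, 0.0) for t in tags}
    # Overlap: the share of listening that both profiles have in common.
    similarity = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in tags)
    return diff, similarity

# Hypothetical tag counts for two listeners.
mine = {"indie": 50, "rock": 30, "electronic": 20}
theirs = {"rock": 40, "hip-hop": 40, "indie": 20}

diff, similarity = difference_cloud(mine, theirs)
for tag, d in sorted(diff.items(), key=lambda item: -item[1]):
    print(f"{tag:12s} {d:+.2f}")
print(f"similarity: {similarity:.0%}")
```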

Mashups are all the rage at the moment. The term refers to web applications that combine data from more than one source into a single integrated tool. For instance, Domain, an Australian real-estate site, adds data from Google Maps to provide location information. My current favourite mashup is idiomag. idiomag is a digital music magazine that personalises its content according to your interests in music, which it learns from your profile. It gives you stories and reviews of the artists and genres you like, helps you discover new music and mashes in video and audio from YouTube and other sources. idiomag aggregates music articles from over 100 different sources. You can also rate the articles, so if you receive something you don't like, you won't get it again. I subscribe to the RSS feed of my personalised idiomag magazine and so far it's been great: it has included reviews of music DVDs by artists I like and schedules of when bands will be playing and appearing on TV. Good stuff.

I will probably put out a few more blogs like this as I explore this world of mashups. And for podcast listeners, yes hopefully I will get one of them out soon too!


  1. Try to play and win with us.real online gambling only here Do not be afraid to play and take risks. Who does not risk that does not drink champagne.

  2. Thanks for such a great article here. I was searching for something like this for quite a long time and at last I’ve found it on your blog. It was definitely interesting for me to read  about their market situation nowadays.
    Data science training in Bangalore
    Data science online training

  3. ExcelR offers a mixed learning model where members can benefit themselves study hall, teacher drove online sessions and e-learning (recorded sessions) with a solitary enlistment. data science course in pune

  4. Well, The information which you posted here is very helpful & it is very useful for the needy like me.., Wonderful information you posted here. Thank you so much for helping me out to find the Data analytics course in Mumbai Organisations and introducing reputed stalwarts in the industry dealing with data analyzing & assorting it in a structured and precise manner. Keep up the good work. Looking forward to view more from you.

  5. I feel very grateful that I read this. It is very helpful and very informative and I really learned a lot from it.
    machine learning courses in Bangalore

  6. Such a very useful article. I have learn some new information.thanks for sharing.
    data scientist course in mumbai

  7. I really enjoy simply reading all of your weblogs. Simply wanted to inform you that you have people like me who appreciate your work. Definitely a great post. Hats off to you! The information that you have provided is very helpful.
    Data science course in mumbai