What the World is looking for

Why is Andre Agassi

Large-scale sociological research has never been this easy. Google’s search suggestions are based on what people actually search for, which makes them a fairly good reflection of what people are currently interested in and a powerful tool for research. (You could also save these results and look at them over time to see trends in these preferences, but that’s a topic for a different day.)

So, to learn what questions people are asking about Andre Agassi, just go to Google’s search box and type “Why is Andre Agassi” and wait for a second. (People want to know why he’s famous, why he’s bald, why he broke up with Brooke Shields, and why he wore a wig.)

Or, to see what India is interested in learning, just type “How to” on google.co.in and you’ll find – perhaps to your surprise – that Indians want to learn:

  • how to kiss
  • how to lose weight
  • how to download youtube videos
  • how to get pregnant (clearly less important than kissing well)

Search for How to on Google India

On the other hand, the UK wants to know

  • how to make loom bands (but why?)
  • how to lose weight
  • how to make pancakes (which may not be a good idea if you want to lose weight)
  • how to write a cv

Search for How to on Google UK

The US wants to learn

  • how to train your dragon 2 (that’s the animated film)
  • how to tie a tie
  • how to hard boil eggs
  • how to lose weight

Search for How to on Google US

What’s clear is that people in all three countries have losing weight among their top four priorities, but their other preferences vary quite a bit.

At Gramener, we put together a compilation of the search results for common questions.

Search for questions on Google

There are several nuggets in here. The world is generally curious about why Salman Khan is not married, and why he’s not in jail. But the preference and order of questions varies from country to country.

Why is Salman Khan

The focus on inventions varies a lot across regions too. Indians are the only ones who seem concerned about who invented zero. For the British, football comes ahead of the Internet and electricity.

Who invented

You can explore these and more at https://gramener.com/search/

If you find any interesting query patterns, please let us know either in the comments below or via Twitter. We’ll add them here.

The language of tweets

This post is part of the output of the Bangalore Fifth Elephant Hacknight.

followers

What you see above are the words most often used on Twitter by Indians. (Click for a larger image). The size of the bubble indicates how often the word is used.

We were looking at whether people with a large number of followers use specific words that people with few followers do not. The words on the left (coloured red) are used mainly by people with few followers. The words on the right (coloured green) are used mainly by people with many followers.

(At this point, it’s worth discussing the dataset. These are one week’s worth of geocoded tweets, mainly around India (but including Pakistan, Nepal, etc.). It’s interesting that there were just 80,000 geocoded tweets in this period – and many of them were FourSquare entries.)

It’s interesting that people with few followers often talk about “know”, “high” and “traffic”. People with many followers use significantly more hashtags. Whether this is a cause or an effect of having many followers is, of course, debatable. But the correlation is quite definite.

It also appears that those with more followers are polite. The “good morning”s and “thank you”s sit well to the right. Those with more followers are more likely to say “good” than “bad”, and vice versa. Perhaps there’s something about having Twitter followers that leads to happiness – or is it the other way around?

replies

This picture shows you the words more often used in replies (on the left, in red) when compared to new tweets (on the right, in green).

“haha” and “lol” appear rather prominently in replies. Either folks who reply are an amused bunch, or it’s the funny tweets that get more replies. A lot of replies are also to thank people. The dominance of Mumbai, Maharashtra and Delhi on the right is most easily explained by the presence of the words “@foursquare” and “mayor” – most of these tweets appear to be FourSquare related.

morning

The above shows the words used in the morning (up to 12 noon) vs the evening. Clearly, people mention “morning” in the morning – often, but not always, in the context of “good morning”. The evenings, at least in this week, were dominated by Euro 2012.

The visualisation used above is a document contrast diagram. Each word is drawn as a bubble, whose size represents its frequency. The horizontal position determines whether the word is closer to one aspect or another – e.g. replies on the left vs new tweets on the right. This is a very quick and easy way of understanding what characterises an aspect (e.g. which words are often used with good vs bad), as well as the context in which words are used.
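To make the idea concrete, here is a minimal sketch of how the bubble size and horizontal position could be computed. The post does not give the exact formula behind the diagrams, so the position used here (the word's share of usage on the right-hand side, after normalising for corpus size) is an assumption, as are the function name `contrast_positions` and the made-up sample tweets.

```python
from collections import Counter


def contrast_positions(left_docs, right_docs):
    """Return {word: (frequency, position)}.

    frequency is the total count across both sets (the bubble size).
    position runs from 0.0 (used only on the left) to 1.0 (used only on the right).
    """
    left = Counter(w for doc in left_docs for w in doc.lower().split())
    right = Counter(w for doc in right_docs for w in doc.lower().split())
    # Normalise by corpus size so the larger set does not pull every word to its side.
    left_total = sum(left.values()) or 1
    right_total = sum(right.values()) or 1
    result = {}
    for word in left.keys() | right.keys():
        left_rate = left[word] / left_total
        right_rate = right[word] / right_total
        position = right_rate / (left_rate + right_rate)
        result[word] = (left[word] + right[word], position)
    return result


# Hypothetical sample data: replies on the left, new tweets on the right.
replies = ["haha thanks", "lol thanks a lot"]
new_tweets = ["good morning mumbai", "traffic in mumbai"]
layout = contrast_positions(replies, new_tweets)
```

Words that appear on both sides land somewhere in between, which is what lets you read the context of a word off its horizontal position.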

Student browsing patterns

This is a guest post by Rahul Gonsalves of Pixelogue.

About a week ago, Anand suggested that we spend a day some weekend working collaboratively on data visualisation. I jumped at the chance to spend a day working with and learning from him, and this is how we found ourselves at the Gramener office on a Sunday morning.

We decided to look at a dataset that Anand has blogged about before – computer usage of MSIT students at CIHL, a consortium of universities based out of IIIT, Hyderabad. Over a period of seven weeks, students’ computer usage was tracked. The data includes application usage and duration, internet browsing patterns, and even keystrokes, broken down by user. If this data sounds like a privacy landmine, that’s because it is! The only consolation is that all the students involved in the study consented to have their usage tracked, and so were presumably aware of what was happening.

We decided to look at a subset of this data, their internet usage, and to try to answer questions such as: What websites do people browse at different times of day? Are there interesting patterns that emerge? Do “social” websites constitute a significant portion of their browsing time?

We created an interactive visualisation, as well as an Excel-based one. The interactive version is available at http://gramener.com/siteusage/

In Excel, the variables at our disposal included:

  1. User
  2. URL
  3. Time of browsing

We pulled the data into Excel, and had the following table:

excel-1

We then split up the time values in Excel into their component pieces (hour and minute), so that 22-11-2011 10:19 becomes:

excel-2

You can see the raw data and the formulas used in the following screenshot:

excel-3

We combined the hour and minute into a value which we called “Minute of the Day”, which is simply the number of minutes elapsed since 12 AM: 1 AM is 60, 2 AM is 120, 3 AM is 180 and so forth.
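For readers who prefer code to spreadsheets, the same calculation can be sketched in Python. We did this step in Excel, so the function name `minute_of_day` and the format string are assumptions, chosen to match the 22-11-2011 10:19 example above.

```python
from datetime import datetime


def minute_of_day(timestamp, fmt="%d-%m-%Y %H:%M"):
    """Return the number of minutes elapsed since midnight for a timestamp string."""
    moment = datetime.strptime(timestamp, fmt)
    return moment.hour * 60 + moment.minute


example = minute_of_day("22-11-2011 10:19")  # 10 * 60 + 19 = 619
```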

We then used a pivot table to tabulate the domains accessed by frequency, which gave us the top 10 most accessed domains. (Facebook, unsurprisingly, was 2nd, right behind a local address, 10.10.10.68, which is presumably a development server.)

excel-4
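Outside Excel, a rough equivalent of this pivot table is a frequency count over the domain of each URL. The sketch below assumes every URL includes its scheme (such as http://); the function name `top_domains` and the sample URLs are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_domains(urls, n=10):
    """Return the n most frequently accessed domains as (domain, count) pairs."""
    domains = Counter(urlsplit(url).hostname for url in urls)
    return domains.most_common(n)


# Hypothetical sample of browsing records.
sample_urls = [
    "http://10.10.10.68/app",
    "http://10.10.10.68/login",
    "http://10.10.10.68/",
    "http://www.facebook.com/",
    "http://www.facebook.com/home.php",
    "http://www.google.com/search?q=excel",
]
ranking = top_domains(sample_urls, n=2)
```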

We arranged the domains on the horizontal axis, with the hour of day listed on the y-axis, as below:

At this point, Anand pulls out his Excel magic and pulls in the number of times within that hour that a particular domain was accessed. COUNTIFS counts the number of times the domain was accessed at that particular minute. IFERROR ensures that errors are counted as zeroes. (This formula works only in Excel 2007 and later.)

excel-6
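The actual formula is in the screenshot above. As a rough Python analogue, and not a translation of that formula, the sketch below counts visits per hour and domain, with empty cells coming out as zero (the role IFERROR plays in Excel). It assumes each visit is a (domain, Minute of the Day) pair and that cells are bucketed by hour; the name `usage_grid` is hypothetical.

```python
from collections import Counter


def usage_grid(visits, domains):
    """Build a 24-row grid of visit counts.

    visits: iterable of (domain, minute_of_day) pairs.
    domains: column order for the grid.
    Returns one row per hour, each row holding one count per domain.
    """
    counts = Counter((minute // 60, domain) for domain, minute in visits)
    # A Counter returns 0 for missing keys, so empty cells need no special handling.
    return [[counts[(hour, domain)] for domain in domains] for hour in range(24)]


# Hypothetical sample visits.
visits = [("facebook.com", 619), ("facebook.com", 630), ("10.10.10.68", 60)]
grid = usage_grid(visits, ["10.10.10.68", "facebook.com"])
```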

The result of applying this formula across the whole table is given below:

excel-7

Using the conditional formatting tools, we are able to apply a colour scale that changes the cell background colour — a darker green implies a higher frequency while a lighter colour implies a lower incidence at that point in time.

excel-8

The extreme preponderance of the top hit (the local dev server, 10.10.10.68) led to a not very useful visualisation, with only the highest values being marked out.

excel-9

Using a logarithmic scale helps give a better heatmap, as can be seen in the following screenshots.

excel-a
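To see why the logarithm helps, here is a small sketch that rescales a grid of counts. The post does not say which logarithm was used or how empty cells were handled, so the base 10 and the "add 1 before taking the log" step (which keeps zero counts at zero) are assumptions, as is the name `log_scale`.

```python
import math


def log_scale(grid):
    """Compress large counts so one dominant column does not wash out the rest."""
    return [[math.log10(1 + count) for count in row] for row in grid]


# A dominant value of 999 next to a modest 9: raw ratio 111x, log ratio 3x.
scaled = log_scale([[0, 9, 999]])
```

After scaling, the colour scale has to span a range of 0 to 3 rather than 0 to 999, so the smaller values become visible.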

We finally arrived at the following heatmap, which offers some insights into the ways that the students on this particular course spent their time.

excel-b

We talked about different ways of depicting this data, which resulted in the following interactive visualization of the way a student spends his or her time on an average day in Hyderabad. We hope you enjoy it!