Student browsing patterns

This is a guest post by Rahul Gonsalves of Pixelogue.

About a week ago, Anand suggested that we spend a day some weekend working collaboratively on data visualisation. I jumped at the chance to spend a day working and learning from him and this is how we found ourselves at the Gramener office on a Sunday morning.

We decided to look at a dataset that Anand has blogged about before – computer usage of MSIT students at CIHL, a consortium of universities based out of IIIT, Hyderabad. Over a period of seven weeks, students’ computer usage was tracked. The data includes application usage and duration, internet browsing patterns, and even keystrokes, broken down by user. If this data sounds like a privacy landmine, that’s because it is! The only consolation is that all the students involved in the study consented to have their usage tracked, and so were presumably aware of what was happening.

We decided to look at a subset of this data – at their internet usage and to try and answer the following question: What websites do people browse at different times of day? Are there interesting patterns that emerge? Do “social” websites constitute a significant portion of their browsing time? etc.

We created an interactive visualisation, as well as an Excel based one. The interactive version is available at http://gramener.com/siteusage/

On Excel, the variables at our disposal included:

  1. User
  2. URL
  3. Time of browsing

We pulled the data into Excel, and had the following table:

excel-1

We then split up the time values in Excel into their component pieces (hour and minute), so that 22-11-2011 10:19 becomes:

excel-2

You can see the raw data and the formulas used in the following screenshot:

excel-3

We combined the hour into a value which we called “Minute of the Day”, which is merely a numeral value of the minute from 12AM. 1am is 60, 2am is 120, 3am is 180 and so forth.

We then used a pivot table to plot the domain accessed by frequency, which allowed us to generate the top 10 most accessed domains (Facebook, unsurprisingly was 2nd, right behind a local address 10.10.10.68, which is presumably a development server.)

excel-4

We arranged the domains on the horizontal axis, with the hour of day listed on the y-axis, as below:

At this point in time, Anand pulls out his Excel magic, and pulls in the number of times within that hour that a particular domain was accessed. COUNTIFS looks counts the number of times the domain was accessed at that particular minute. IFERROR ensures that errors are counted as zeroes. (This formula works only in Excel 2007 and later.)

excel-6

The results of applying this particular formula across the whole table is given below:

excel-7

Using the conditional formatting tools, we are able to apply a colour scale that changes the cell background colour — a darker green implies a higher frequency while a lighter colour implies a lower incidence at that point in time.

excel-8

The extreme preponderance of the top hit (the local dev server, 10.10.10.68) led to a not very useful visualisation, with only the highest values being marked out.

excel-9

Using a logarithmetic scale helps give a better heatmap, as can be seen in the following screenshots.

excel-a

We finally arrived at the following heatmap, which offers some insights into the ways that the students at this particular course spent their time.

excel-b

We talked about different ways of depicting this data, which resulted in the following interactive visualization of the way a student spends his or her time on an average day in Hyderabad. We hope you enjoy it!

Data science news

Visualization provides decision-makers with the big picture

“Good visualizations can help people to make good decisions, such as selecting a suitable candidate in elections,” researcher Mikko Berg explains.

Berg’s dissertation belongs to the field of media technology and combines cognitive science, psychology and vision research. The dissertation explores how the brain processes visualization when the person handles complex data.

Berg examined how the use of visualization facilitates understanding by observing how people used the online candidate selectors created for the EU and communal elections held in Finland in 2004.

Predictive Analytics: The Race is On

High-performance analytics (HPA) allows modelers to work faster with full sets of data—not sample sets. This can provide the type of nuanced analytics that allows an insurer to be more successful in high-risk situations, create innovative—and profitable—new means of insuring customers, and respond to large-scale disasters in a more relevant and cost-effective way.

Intelligent use of Big Data

Business Intelligence could unlock of some of the estimated 85% of corporation data that is unstructured according to leading industry figure Bill Inmon.

Big Data Bubbles Up Trouble!

Today Big Data has made the impossible, possible. Collecting and analyzing unstructured data types such as social media data, web click streams, network and data center logs is no longer a daunting task. While Hadoop and MapReduce are the technologies behind the scenes to crunch massive volumes of data, advanced visualizations have become the art that show us the best (and worst) parts about our data.

British Columbia health care on the cutting edge of Big Data

Big Data can reveal the overlooked negative effects of a drug or, conversely, unexpected benefits and new treatments. It can lead us to insights into how a drug has worked over a vast population, enabling researchers to determine which doses are most effective, which are wrong-headed or perhaps dangerous. Tracking Big Data can help you find out if a medical procedure really gets the results we think and desire. Or it might reveal diagnostic and treatment mistakes we are making and suggest overlooked ways of improving health care. Plug in genetics, a factor that often influences how medicines and treatments work on an individual basis, and you are looking at the possibility of bespoke, genomic medicine – health care tailored to an individual, in much the same way as a tailor customizes a suit or dress.

Big Data can and will save lives. And it will almost certainly save money.

Big Data for Retail is Flying Off the Shelves

The holy grail of retail has been to anticipate what consumers need even before they realize they need it. There’s no better way to beat the competition than to make an attractive offer and get a customer’s business before they even realize they need your product, or consider evaluating alternatives.

Big Data Analytics Gold for the Call Center

Corporate call centers and call center providers are embracing new analytic tools to dig deeper into the big data they generate.

Visualising Sensex returns in Excel

You don’t need specialised data visualisation software. Excel is a powerful tool in itself. For example, you could re-create and extend our When to Invest visualisations in Excel. Here’s an example:

bse-sensex-returns-visualisation

Each row indicates a month in which you’ve made an investment. The columns indicate how long you’ve held the investment for. For example, the 10th row, 10th column shows the returns you’d make if you invested in the 10th month (Oct 1991) and held it for 10 months (until Aug 1992). Red indicates negative returns, green indicates positive returns.

Here’s a tutorial on how you could do this.

First, let’s get the data

Let’s start by downloading the Sensex data. Visit the BSE Historical Indices page, choose the SENSEX and select a monthly feed for Jan 1991.

bse-indices

Click submit to see the results for the year 1991.

bse-sensex-monthly

Copy and paste the table into Excel.

Now, repeat this for each year from 1991 to 2012, and you’ll have the full database of monthly Sensex prices. (It’s possible to scrape, copying and pasting 21 pages is faster.)

bse-excel-data

Next, let’s calculate the returns

To find the returns after 1 month, 2 months, etc, we can use the OFFSET formula:

offset-formula

OFFSET($B2, $C1, 0) says:

  • Start with cell B2 (and when I copy and paste, always get me data from column B)
  • Move down C1 rows (and when I copy and paste, always get the offset from row 1)
  • Move right 0 columns

As we copy and paste the formula, we’ll get the opening price 1 month, 2 months, 3 months after the starting month. (The 1, 2, 3… months come from the first row).

Now, we need to calculate the annualised return from this price. That’s done by the following formula:

=POWER(OFFSET($B2,C$1,0)/$B2, 12/C$1)-1

return-formula

Put another way, this is the same as:

=POWER( Future_Amount / Initial_Amount, 12/Number_of_months) – 1

Copying and pasting this formula to all the cells gives you a sheet that shows the returns for every period.

returns

Lastly, let’s colour it.

If you want to see what the returns look like as a gradient, select the cells, go to Conditional Formatting – Color Scales and choose one of the scales there.

conditional-formatting

You may then want to tweak the colours for each value. Go to any of the cells, then select Conditional Formatting – Manage Rules, and double-click the “Graded Color Scale” rule. Then you can change what Red, Yellow and Green stand for. (In the example below, they’ve been set to –50%, 0% and +50% returns respectively.)

formats

This will give you the visualisation below when you zoom out.

bse-sensex-returns-visualisation

Another way of looking at the same data is to only show those periods where the return was more than, say, the average inflation rate (let’s say 11%).

formats2

… this is what the result will look like.

bse-sensex-returns-inflation

We’re not trying to endorse Excel as a product: We admire all spreadsheets. Excel is merely our abbreviation for “Excel and other spreadsheets”. The same principles would apply equally well to Open Office, Google Docs, or any other spreadsheet of your choice.

There’s a good chance Excel is already on your machine. You’re likely to be familiar with it. Why bother learning a new tool, or a programming language, when you’re already familiar with something?

The point is, most tools can visualise well. But you need to have the imagination.