Colouring the calendar

Sometimes, just view­ing a time series as a sim­ple graph isn’t enough.

The graph be­low shows the daily vis­it­ors to a lead­ing Indian web­site in 2011. The over­all trends are ap­par­ent. There was a dip in Mar-Apr, and again in Oct, fol­lowed by a steady rise in November.

analytics-line

But what’s also apparent is a weekly cyclicality: a steady pattern of rises and falls, several times a month, that obscures this trend.

Yet there’s considerable insight within that cyclicality that a calendar heatmap can bring out. Here is the same data on a calendar heatmap. This is simply a calendar on which the values are plotted as a range of colours: red for fewer visitors, green for more visitors.

analytics-calendar

analytics-october

Those dips you saw on the line graph? Those were Sundays, when browsing activity drops consistently. However, as you can see above, not all Sundays are equal. July 31st and August 7th, though they were Sundays, had considerable traffic. Similarly, weekdays can also experience dips. June 23rd is an example of a somewhat unusual dip, and so is October 26th, which was Diwali.
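The layout behind a calendar heatmap is simple enough to sketch. The snippet below is only an illustration, not the code used for the charts here: the function names `calendar_cells` and `red_green`, and the linear red-to-green scale, are assumptions. Each date is placed in a grid (one row per week, one column per weekday) and given a colour based on where its value falls between the lowest and highest values.

```python
from datetime import date, timedelta


def red_green(t):
    """Blend linearly from red (t = 0) to green (t = 1); returns a hex colour."""
    r = round(255 * (1 - t))
    g = round(255 * t)
    return f"#{r:02x}{g:02x}00"


def calendar_cells(daily_values):
    """Map {date: value} to {date: (week_row, weekday_col, colour)}."""
    lo, hi = min(daily_values.values()), max(daily_values.values())
    first = min(daily_values)
    # The Monday of the week containing the first date anchors row 0.
    start = first - timedelta(days=first.weekday())
    cells = {}
    for day, value in daily_values.items():
        row = (day - start).days // 7        # which week the day falls in
        col = day.weekday()                  # 0 = Monday ... 6 = Sunday
        t = 0.5 if hi == lo else (value - lo) / (hi - lo)
        cells[day] = (row, col, red_green(t))
    return cells


# Illustrative numbers only, not the website's actual traffic.
visits = {date(2011, 1, 1): 100, date(2011, 1, 2): 50, date(2011, 1, 3): 150}
for day, cell in sorted(calendar_cells(visits).items()):
    print(day, cell)
```

Because every Sunday lands in the same column, a consistently low weekday shows up as a red stripe, and an exception stands out as a differently coloured cell within that stripe.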

Calendar heat­maps provide a way of ex­plor­ing in­form­a­tion at a far rich­er level of de­tail than tra­di­tion­al line graphs or bar graphs do.

Firstly, they focus on weekly trends. In businesses where there is a weekly cyclicality, it becomes much easier to spot an unusual weekday. In the month of August (see below), it’s fairly obvious from both graphs that August 14th had a bad dip. What becomes clear from the calendar map, but not from the line graph, is that August 13th was a relatively bad Saturday, and August 16th was a relatively bad Tuesday.

analytics-Aug
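One way to put a number on "a relatively bad Saturday" is to compare each day with the average for the same weekday. This is a minimal sketch under that assumption; the function name `weekday_deviation` and the sample figures are made up for illustration, and the calendar heatmap itself does not require this step.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekday_deviation(daily_values):
    """Return {date: value divided by the mean value for that weekday}.

    Assumes every weekday has a non-zero mean (true for visitor counts
    unless the site had no traffic at all on that weekday).
    """
    by_weekday = defaultdict(list)
    for day, value in daily_values.items():
        by_weekday[day.weekday()].append(value)
    baseline = {wd: mean(vals) for wd, vals in by_weekday.items()}
    return {day: value / baseline[day.weekday()]
            for day, value in daily_values.items()}


# Four Saturdays in August 2011, with made-up visitor counts.
saturdays = {
    date(2011, 8, 6): 100,
    date(2011, 8, 13): 60,
    date(2011, 8, 20): 100,
    date(2011, 8, 27): 100,
}
for day, ratio in sorted(weekday_deviation(saturdays).items()):
    print(day, round(ratio, 2))
```

A ratio well below 1 marks a day that was weak even by the standards of its own weekday, which is what the eye picks up when scanning down a column of the calendar.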

analytics-october

Secondly, they focus on individual days. It’s a lot easier to see the exact date on which an event occurred. For example, the graph alongside shows a big dip in October. The most significant drop was in the last week, specifically on October 26th. Once you know the date, it’s easy to associate the change in behaviour with Diwali as its cause.

On the line graph be­low, you can see the ma­jor dip in October. However, map­ping this spe­cific­ally to Diwali is a far tougher task.

analytics-line

Below is another calendar heatmap, this time showing the percentage of visitors from New Delhi. Consider the month of August. We saw from the earlier calendar map that there was a decline in traffic between August 13 and 16. If that decrease was uniform across cities, the colours below would be uniform too. However, New Delhi’s percentage share also declines on these days, so the drop there was steeper than elsewhere.

analytics-calendar-delhi-pc

Apparently, the people in New Delhi are more likely to spend the day outside on Independence Day than people in most other cities! In fact, they seem to spend the whole of August avoiding browsing. However, the same cannot be said of Diwali. Delhi-ites are as likely or unlikely to be browsing during Diwali as the denizens of any other city.
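The reasoning about percentage share can be checked with a little arithmetic. In this sketch the function name `city_share` and all the numbers are made up for illustration: when traffic halves everywhere, a city’s share stays the same, and the share only falls when that city drops more steeply than the rest.

```python
def city_share(city_visits, total_visits):
    """Return the city's percentage share of visits for each day."""
    return {day: 100.0 * city_visits[day] / total_visits[day]
            for day in total_visits
            if total_visits[day]}          # skip days with no traffic at all


# Made-up numbers: total traffic halves on the holiday in both scenarios.
total = {"normal day": 1000, "holiday": 500}
uniform_dip = {"normal day": 100, "holiday": 50}    # city falls in step with the rest
steeper_dip = {"normal day": 100, "holiday": 40}    # city falls more than the rest

print(city_share(uniform_dip, total))
print(city_share(steeper_dip, total))
```

In the first scenario the share is identical on both days, so the heatmap colour would not change. In the second, the holiday cell would turn redder, which is the pattern seen for New Delhi in mid-August.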

The next time you look at data with weekly pat­terns, where you need to fig­ure out quickly when ex­actly the num­bers rose or fell, do try out a cal­en­dar heat­map.

Data visualisation course at IIIT

We are offering a Data Visualisation course at IIIT Hyderabad and JNTU Hyderabad as part of the Master of Science in Information Technology (MSIT) outreach programme. This programme is offered by a consortium of universities in collaboration with Carnegie Mellon, with the support of the State Government of Andhra Pradesh.

Through this partnership, Gramener is collaborating to create course content, design a curriculum that meets industry standards, and jointly execute projects on predictive analytics and data visualisation.

The course has 5 mod­ules:

  1. Handling big data
    • How to scrape data from ex­tern­al sources
    • How to parse and trans­form it in­to a form­at you need
  2. Analysis
    • Segmentation
    • Predictive ana­lyt­ics
  3. Vector graph­ics
    • Drawing graphs us­ing SVG
    • Tools to ma­nip­u­late SVG
  4. Templates
    • Programmatically cre­at­ing graphs us­ing tem­plates
    • Using data to drive the tem­plates
  5. Gramener visu­al­isa­tion server
    • Using lib­rar­ies to cre­ate visu­al­isa­tions

The course is also avail­able on­line to those who are in­ter­ested. You may email us at contact@gramener.com to ac­cess the con­tent, ex­er­cises and videos.

The top Indian one-day batsmen

I’ve always been curious about who among India’s prolific one-day run-getters had a good strike rate. The picture below shows the top 50 ODI run-getters for India.

batting-plain-summary

Each little square indicates one player. The size indicates the number of runs scored, and the colour indicates the average strike rate. (Red is low, green is high.)

Firstly, you can see that Sachin, apart from being a prolific run-getter, has a slightly above-average strike rate. The same can’t be said of the next three: Saurav, Azhar and Rahul. The three after them, however (Yuvraj, Sehwag and Dhoni), score as fast as Sachin or faster, especially Sehwag.

We do have a few slow scorers there, such as Sunil Gavaskar, Ravi Shastri, Mohinder Amarnath and Dilip Vengsarkar, but allowance must be made for the increase in run rate over time:

In the 1975 World Cup, for in­stance, the av­er­age run-rate in the en­tire tour­na­ment was 3.91 runs per over; in the next edi­tion, in 1979, it dropped to 3.54. Compare that with the run-rate in the most re­cent edi­tion of the World Cup, when the over­all tour­na­ment scor­ing rate ex­ceeded five for the first time, and it’s ob­vi­ous that the way the ODI is played has changed hugely over 35 years.

For Indian batsmen, the strike rate seems to go up by about 3.4% every decade. Adjusting for that, this is what the picture looks like:

batting-adjusted-summary

These play­ers do look a bit bet­ter now, but they’re still fairly slow. The big ex­cep­tion of that gen­er­a­tion was Kapil Dev. His strike rate is the only one that rivals Virender Sehwag’s rate today.

Based on this pic­ture, if I were to pick the top 3 fast run-getters across time, I’d pick Kapil Dev, Sehwag and Yusuf Pathan. The slowest would prob­ably be Mohinder Amarnath, Manoj Prabhakar and Sadagopan Ramesh.
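For the curious, here is one way the decade adjustment might be applied. This is a sketch under stated assumptions, not necessarily the method behind the pictures above: it assumes the 3.4% growth compounds, that each player can be placed at a single representative year, and that everyone is scaled to a reference year of 2011. The function name `adjusted_strike_rate` and the reference year are hypothetical.

```python
GROWTH_PER_DECADE = 0.034  # from the text: about 3.4% every decade


def adjusted_strike_rate(strike_rate, era_year, reference_year=2011):
    """Scale a strike rate from era_year to reference_year.

    Assumes compound growth of GROWTH_PER_DECADE per ten years; the
    reference year of 2011 is an illustrative choice.
    """
    decades = (reference_year - era_year) / 10
    return strike_rate * (1 + GROWTH_PER_DECADE) ** decades


# A strike rate of 100 in three different eras (illustrative, not real players).
for year in (1981, 2001, 2011):
    print(year, round(adjusted_strike_rate(100, year), 1))
```

Under these assumptions a player from thirty years before the reference year gains roughly 10%, which is enough to shift a colour but not enough to turn a slow scorer into a fast one.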

We can drill a little deep­er in­to their per­form­ance, at a match-level. In the pic­ture be­low, each box is a match, col­our coded by strike rate.

batting-plain-detailed

A pat­tern emerges here: higher totals (on the top left for each play­er) are scored at a higher strike rate. This isn’t par­tic­u­larly sur­pris­ing, how­ever.

Another in­ter­est­ing view is to see how our bats­men fare again­st the rest of the world. On an ad­jus­ted basis, this is what it looks like:

batting-adjusted-world

Shahid Afridi, with an average strike rate of 115, stands way above the rest, and the second player on this list is Sehwag. Interestingly, Afridi has only slightly fewer runs than the legendary Viv Richards, but he has scored them at a much faster rate than even the master blaster.

Visit gramener.com/cricket to see the crick­et visu­al­isa­tion live.

The visualisation you just saw is a treemap. It’s a very powerful way of comparing the relative importance of elements in a hierarchy. Some other ways you can use treemaps in a business context are:

  • Profitability by busi­ness unit. The col­our in­dic­ates profits, and size in­dic­ates sales. Large un­prof­it­able units stand out clearly.
  • Sales growth by cat­egory. The size in­dic­ates the cat­egory sales, and col­our in­dic­ates growth over a peri­od.
  • Risk by cus­tom­er seg­ment. The size in­dic­ates the ex­pos­ure to each seg­ment / sub-segment. The col­our in­dic­ates de­gree of risk.
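The size-and-colour idea behind each of these can be sketched in a few lines. The example below uses the simplest possible layout, a single row of strips whose widths are proportional to size (often called a "slice" layout). It is not the layout used in the cricket visualisation, which packs boxes more squarely, and the function name `slice_layout` and the business-unit figures are made up for illustration.

```python
def slice_layout(items, width, height):
    """Lay out (name, size, colour_value) items as side-by-side strips.

    Returns a list of (name, x, y, w, h, colour_value), largest item first.
    Each strip's area is proportional to the item's size.
    """
    total = sum(size for _, size, _ in items)
    rects, x = [], 0.0
    for name, size, colour_value in sorted(items, key=lambda item: -item[1]):
        w = width * size / total
        rects.append((name, x, 0.0, w, height, colour_value))
        x += w
    return rects


# Made-up business units: size = sales, colour value = profit margin.
units = [("Retail", 100, -0.20), ("Wholesale", 300, 0.10)]
for rect in slice_layout(units, width=400, height=100):
    print(rect)
```

The colour value would then be mapped onto a red-to-green scale, so a large strip with a negative margin shows up as a big red block.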