Visualising Text

Anand and Ganes spoke yesterday on Visualising Text at The Fifth Elephant – a conference on data – at NIMHANS, Bangalore.

Here’s the video:

… and here’re the slides:

… and some pictures:


The reaction on Twitter was rather encouraging:

(Image: word cloud of the tweets)

gkjohn 27 Jul 11:09 The talk I have been waiting to hear at #the5el @sanand0 on Visualising Text. http://funnel.hasgeek.com/5el/268-visualising-text

Rathinap 27 Jul 11:17 #the5el Ganes and Anand on “Visualising Text” Expecting info around toolsets for text processing and analysis

harikt 27 Jul 11:17 Visualising text #the5el http://twitpic.com/ac799b

gkjohn 27 Jul 11:19 In other news @sanand0 has typed out the text from every single Calvin & Hobbes comic strip. #the5el

zeusisdead 27 Jul 11:20 RT @gkjohn: In other news @sanand0 has typed out the text from every single Calvin & Hobbes comic strip. #the5el

gkjohn 27 Jul 11:20 But, thanks to the joys that are lawyrs, @sanand0 was asked to take the Calvin & Hobbes text archive down. http://www.s-anand.net/blog/the-calvin-and-hobbes-search-takedown/ #the5el

varunbansal84 27 Jul 11:22 Visualising text by Anand S #the5el http://twitter.com/varunbansal84/status/228744741297192960/photo/1

ramkrsna 27 Jul 11:23 Walk in to see the Calvin & Hobbes text tag cloud, “Bill” Boyd Watterson comes alive #the5el

varunbansal84 27 Jul 11:25 Anand S @sanand0 Chief Data Scientist at Gramener on stage at #the5el

_mekin 27 Jul 11:28 RT @gkjohn In other news @sanand0 has typed out the text from every single Calvin & Hobbes comic strip. #the5el

Rathinap 27 Jul 11:29 Anand and Ganes receiving applause after applause #the5el for Text mining speech

ishan_srivastav 27 Jul 11:30 RT @gkjohn: In other news @sanand0 has typed out the text from every single Calvin & Hobbes comic strip. #the5el

konarkmodi 27 Jul 11:31 Awesome text visualizations at #the5el cool examples ” Calvin and Hobbes to bank legders to #hasgeek job boards ” loving it !!

sharad_ag 27 Jul 11:32 Interesting visualizing text talk by anand #the5el

vijay750 27 Jul 11:38 S.Anand showing some insightful word visualizations at #the5el.

gkjohn 27 Jul 11:42 “Applying Sentiment Analysis to the Bible” http://www.openbible.info/blog/2011/10/applying-sentiment-analysis-to-the-bible/ via @sanand0 #the5el

_mekin 27 Jul 11:47 Loving @sanand0‘s talk. Interesting experiments with visualization of text, a great presentatIon & useful resources to takeaway #the5el

simplysaru 27 Jul 11:47 RT @gkjohn: “Applying Sentiment Analysis to the Bible” http://www.openbible.info/blog/2011/10/applying-sentiment-analysis-to-the-bible/ via @sanand0 #the5el

jackerhack 27 Jul 11:48 RT @_mekin: Loving @sanand0‘s talk. Interesting experiments with visualization of text, a great presentatIon & useful resources to takeaway #the5el

erraj99 27 Jul 11:50 RT @konarkmodi: Awesome text visualizations at #the5el cool examples ” Calvin and Hobbes to bank legders to #hasgeek job boards ” loving it !!

gkjohn 27 Jul 11:50 Explore character appearances and prominent relationships in the Mahabharatha http://gramener.com/mahabharatha/ by @sanand0 #the5el

ivabz 27 Jul 11:53 @gkjohn Corrections – Link : http://gramener.com/mahabharatha/closeness @sanand0

jaidevd 27 Jul 11:53 Which names score more at SSC / HSC? Find out at @sanand0 ‘s talk on Visualizing Text #the5el

simplysaru 27 Jul 11:54 RT @gkjohn: Explore character appearances and prominent relationships in the Mahabharatha http://gramener.com/mahabharatha/ by @sanand0 #the5el

ivabz 27 Jul 11:54 @gkjohn is he using D3.js to visualize it? can ask question on behalf? #the5el @sanand0

govindk 27 Jul 11:56 Kesri & Anand are rocking the house #5el #fifthelement awesome @sanand0

ramkrsna 27 Jul 11:57 Loved the Ganes Kesari’s TN board analysis with names #the5el.

gkjohn 27 Jul 11:58 @ivabz I’m too far away to ask @sanand0 Sorry! #the5el

gkjohn 27 Jul 12:00 Heh. @sanand0 is an Excel fan. #the5el

Heryerdeonline 27 Jul 12:00 RT @_mekin: Loving @sanand0‘s talk. Interesting experiments with visualization of text, a great presentatIon & useful resources to…

harikt 27 Jul 12:05 Anand crowded at #the5el. Awesome presentation. Hats of to u guys. http://twitpic.com/ac7mm1

prabhatsaraswat 27 Jul 13:15 sad I missed the visualization talk!!, can I see a recap somewhere? #5el #the5el

arpiit 27 Jul 13:15 Please ask him to put it up! @sanand0 RT: @harikt: Anand crowded at #the5el. Awesome presentation. http://twitpic.com/ac7mm1

jackerhack 27 Jul 13:16 RT @ashwan: At the @the5el Open House. I’m all alone. *sniff*

freegeek 27 Jul 13:17 RT @dorait: An Exploration in Analysis and Visualization – updated. My talk at @the5el https://www.slideshare.net/login?from_source=http%3A%2F%2Fwww.slideshare.net%2Fupload

tuxtoti 27 Jul 13:17 The social graph of Mahabharata was pretty cool. #5el @sanand0

ylohia 27 Jul 13:22 Ooh, interesting. RT @gkjohn: “Applying Sentiment Analysis to the Bible” http://www.openbible.info/blog/2011/10/applying-sentiment-analysis-to-the-bible/ via @sanand0 #the5el

ravim85 27 Jul 13:31 met @sanand0 couple of days back n was wondering whats his twitter id. Thanks to @thej. It was my pleasure meeting Anand

anushayadav 27 Jul 13:31 RT @gkjohn: Explore character appearances and prominent relationships in the Mahabharatha http://gramener.com/mahabharatha/ by @sanand0 #the5el

pacificleo 27 Jul 13:33 RT @ylohia: Ooh, interesting. RT @gkjohn: “Applying Sentiment Analysis to the Bible” http://www.openbible.info/blog/2011/10/applying-sentiment-analysis-to-the-bible/ via @sanand0 #the5el

arpith 27 Jul 13:37 RT @gkjohn: But, thanks to the joys that are lawyrs, @sanand0 was asked to take the Calvin & Hobbes text archive down. http://www.s-anand.net/blog/the-calvin-and-hobbes-search-takedown/ #the5el

srprabhu 27 Jul 13:40 Excellent talk on Visualising Text by @sanand0 at @the5el conference…impressed to hear about work done on Calvin&Hobbes and toolsets

sanand0 27 Jul 13:42 Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

jackerhack 27 Jul 13:42 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

hasgeek 27 Jul 13:42 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

arpiit 27 Jul 13:43 @pjain @abhishemgupta RT: @hasgeek: RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

gkjohn 27 Jul 13:44 Always a pleasure listening to @sanand0 speak. Thanks #the5el Next time – keynote.

sankarshan 27 Jul 13:46 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

aberdeenterrier 27 Jul 13:48 RT @gkjohn: Explore character appearances and prominent relationships in the Mahabharatha http://gramener.com/mahabharatha/ by @sanand0 #the5el

vaamarnath 27 Jul 13:49 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

tanish2k 27 Jul 13:50 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

tanish2k 27 Jul 13:50 RT @gkjohn: Always a pleasure listening to @sanand0 speak. Thanks #the5el Next time – keynote.

gkjohn 27 Jul 13:50 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

srraja 27 Jul 13:52 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

rrichard09 27 Jul 13:55 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

followsiddharth 27 Jul 13:58 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

anushayadav 27 Jul 14:00 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

rhetonik 27 Jul 14:01 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

RaisonD 27 Jul 14:05 @sanand0 Your talk was brilliant! #the5el

prabhatsaraswat 27 Jul 14:06 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

PraveenaSridhar 27 Jul 14:09 “@sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text” cc @tiwarisac

govindk 27 Jul 14:16 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

ivabz 27 Jul 14:53 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

t3rmin4t0r 27 Jul 15:33 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

harshkumar1 27 Jul 15:38 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

harshkumar1 27 Jul 15:38 RT @gkjohn: Always a pleasure listening to @sanand0 speak. Thanks #the5el Next time – keynote.

t3rmin4t0r 27 Jul 15:38 At the #5el conf – a lot of stuff is vague and fluffy with clouds. The @flipkart and @sanand0 talks (and his t-shirt) stood out so far.

harshkumar1 27 Jul 15:40 Looking forward for a key note from @sanand0 @ #the5el

pmandrek 27 Jul 17:05 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

swaroop7 27 Jul 17:11 Text visualization by @sanand0 is the best talk Ive attended today with Gokul @RamaVattigunta @Kamaleshbc at #the5el @the5el

harikt 27 Jul 17:16 RT @sanand0: Slides of our talk on Visualising Text at #the5el http://www.slideshare.net/gramener/visualising-text

ivabz 27 Jul 17:20 Like all said, @sanand0 is Data-Rockstar at #the5el. @the5el

ivabz 27 Jul 18:21 @sanand0 really cool. @gkjohn

nikarjunagi 27 Jul 19:40 Great hadoop stack walk thru by @vinayakh, amazing visualization stories by @sanand0 with bonus insight into y so few birthdays in august!

goldbone 27 Jul 21:01 RT @gkjohn: Explore character appearances and prominent relationships in the Mahabharatha http://gramener.com/mahabharatha/ by @sanand0 #the5el

karthik_sripal 27 Jul 21:40 RT @sanand0: Thanks @t3rmin4t0r — for the story behind our T-shirt, see http://blog.gramener.com/361/common-birthdays 🙂

Interview: Analytics India Magazine

This is the original version of the two-part interview that appeared on Analytics India Magazine as:

In an interview with Analytics India Magazine, Anand S talks about Gramener and the topic of his talk at The Fifth Elephant, “Visualizing Text”.

What, according to you, are currently the most important and specific Data Visualisation needs for the industry?

Consider this price and sales table for four cities:

(Table: price and sales figures for the four cities)

Can you figure out how each city is performing? Notice that the averages are the same for every city.

Now take a look at the same data, plotted.

(Chart: the same price and sales data, plotted for each city)

The patterns are a lot clearer now, and you can quickly see that:

  • The four cities behave completely differently and need different strategies for growth.
  • Delhi is price sensitive, while Bangalore and Hyderabad are not.
  • Hyderabad and Mumbai each have at least one data point that looks like an aberration.
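The table’s file name suggests it is Anscombe’s quartet with the four datasets relabelled as cities. Assuming that (and assuming x plays the role of price and y of sales), here is a short sketch showing that the usual summary statistics cannot tell the four apart. The mapping of datasets to specific cities is not given, so they keep Anscombe’s labels I to IV.

```python
from statistics import mean

# Anscombe's quartet (F. J. Anscombe, 1973). ASSUMPTION: the four-city table is
# this data relabelled, with x as price and y as sales. Which dataset is which
# city is not stated, so the datasets keep their usual names I..IV.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(xs, ys):
    """Return (mean x, mean y, Pearson correlation, least-squares slope of y on x)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy / (sxx * syy) ** 0.5, sxy / sxx


STATS = {name: summary(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, (mx, my, r, slope) in STATS.items():
        print(f"{name:>3}: mean x={mx:.2f}  mean y={my:.2f}  r={r:.3f}  slope={slope:.3f}")
```

Every row comes out with a mean x of 9, a mean y of about 7.50, a correlation of about 0.816 and a slope of about 0.5. Only the plot reveals the differences.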

This illustrates the problem the industry faces today: far larger quantities of data, still presented as plain tables.

The human mind is much better suited to processing pictures than numbers. Data visualisation is about communicating the same message to our minds as a picture story rather than as a table of numbers.

Where do you see the bulk of your business coming from?

From companies that have large volumes of data. So far, in our case, it’s been from Banking, Healthcare, Pharma, Retail, Telecom and Utilities.

However, two things have surprised us.

First, several small well-run enterprises tend to have much more data than we expect — driven by the dropping cost of data collection infrastructure.

Second, there is much more variation within companies than across companies. For example, even in large utilities, there’s much more information in sales and operations than there is in administration or HR. We find more in common between the sales data of an FMCG company and the sales data of a bank than between the sales and HR data of any single organisation.

As a result, we’ve become more functionally focused than industry focused — with the bulk of our business coming from sales, operations, and finance, in that order.

Gramener operates in a very niche area. What does a typical cycle, from requirement gathering to delivery, look like in Data Visualisation?

We build a series of “templates” for clients. A template takes data and transforms it into a visualisation. Here’s what a typical cycle looks like:

  1. Brainstorm with clients on the key problem areas
  2. Get an anonymised dataset in that area — the larger the better
  3. Work offline (without any preconceptions of existing reports) and create a series of visuals
  4. Share with the client, understand the decisions they need to take, and rework accordingly
  5. Implement the visualisation as a template in their environment

You’ll notice that we’re combining two things: independent discovery (working without the influence of existing material) and a review process — both of which are essential to delivering a useful product.

How do you see Data Visualisation evolving today in the industry as a whole?

The first entry point for data visualisation is through in-house tools — typically Excel.

Slowly, industries recognise the value of taking this further and start looking for improved tools in this area. A number of product companies are catering to this need — Tableau, Spotfire and Qlikview in the data exploration space; R and SAS in the analytics space; Microstrategy and Cognos in the Business Intelligence space; and Gramener’s visualisation server in the data visualisation space.

Often, this is outsourced, just as analytics is increasingly outsourced today. Data analytics firms will slowly evolve to include visualisation as part of their core offerings.

What are the most important contemporary trends that you see emerging in this space across the globe?

There are three worth mentioning:

  1. Unstructured data. Data visualisation is no longer in the realm of pure numbers. Text analysis is relatively mature and is being applied routinely to various problems. Even a pure text corpus like The Mahabharatha can be visualised. Images, audio and video are rapidly becoming analysable and visualisable.
  2. Cognitive research. What we know of the human eye and brain is increasingly making its way into practical visualisations. For example, most men can name only 11 colours (women can name about 15), but can differentiate between over a million colours when placed next to each other. So, while a heatmap that places regions adjacently can be coloured with millions of shades, a bubble chart should have 11 colours at most. Such rules of thumb are now baked into the software people build for data visualisation.
  3. Mobility. Tablets and phones are increasingly the most popular way to consume information. A strong trend is to embrace this medium with its limitations (e.g. screen size and form factor) as well as its advantages (e.g. touch screen, geo-location).
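The colour rule of thumb in the cognitive research point can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular product implements it. We assume the 11 basic English colour terms (as catalogued by Berlin and Kay) stand in for the colours most people can name. A bubble chart with more categories than that lumps the least frequent ones into a grey “Other”, while a heatmap cell can take any shade along a continuous scale.

```python
# ASSUMPTION: the 11 basic English colour terms stand in for "the colours most
# people can name". The last one, grey, doubles as the "Other" bucket.
BASIC_COLOURS = ["red", "green", "blue", "yellow", "orange", "purple",
                 "pink", "brown", "black", "white", "grey"]


def categorical_palette(counts):
    """Map categories to nameable colours for a bubble chart.

    counts: dict of category -> frequency. With more categories than colours,
    the most frequent categories keep their own colour and all the rest share
    the final colour (grey) as "Other".
    """
    limit = len(BASIC_COLOURS)
    ranked = sorted(counts, key=counts.get, reverse=True)
    if len(ranked) <= limit:
        return dict(zip(ranked, BASIC_COLOURS))
    palette = dict(zip(ranked[: limit - 1], BASIC_COLOURS))
    for category in ranked[limit - 1:]:
        palette[category] = BASIC_COLOURS[-1]
    return palette


def sequential_shade(value, lo, hi):
    """Heatmap cells sit side by side, so a continuous white-to-green shade works."""
    t = (value - lo) / (hi - lo) if hi > lo else 0.0
    t = min(max(t, 0.0), 1.0)          # clamp to the scale
    fade = round(255 * (1 - t))        # 255 at the low end, 0 at the high end
    return f"#{fade:02x}ff{fade:02x}"
```

With 15 programming languages, for example, the ten most common get their own colour and the remaining five share grey, so the chart never needs more than 11 nameable colours.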

Would you like to share any example of a visualisation insight that generated a huge positive impact for your client?

This is an anonymised version of our very first visualisation.

We were working with a telecom client who provided us with minutes-of-usage data. We plotted this time-series on a calendar, creating the Calendar Map you see above. Red cells show days with lower usage, and green cells show days with higher usage.

This made it possible to spot a number of patterns that were relatively hidden until then. For example, on this calendar map, it’s obvious that call volumes are lower on Sundays. But 31st July was a relatively good Sunday, with high call volumes. That’s tough to spot on a line graph because it’s not high in absolute terms — just high for a Sunday.
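One way to make “high for a Sunday” computable is to divide each day’s usage by the average for its weekday before colouring the cell. This is an assumption for illustration, not necessarily what the original calendar map did. The data below is entirely hypothetical (July 2011 is chosen only because 31 July that year was a Sunday).

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def weekday_relative(usage):
    """Divide each day's usage by the average usage for that weekday.

    usage: dict of datetime.date -> minutes of usage.
    A ratio above 1 means busier than a typical day of the same weekday.
    """
    by_weekday = defaultdict(list)
    for day, minutes in usage.items():
        by_weekday[day.weekday()].append(minutes)
    baseline = {wd: mean(values) for wd, values in by_weekday.items()}
    return {day: minutes / baseline[day.weekday()] for day, minutes in usage.items()}


def cell_colour(ratio, band=0.1):
    """Green if well above normal for its weekday, red if well below, else neutral."""
    if ratio > 1 + band:
        return "green"
    if ratio < 1 - band:
        return "red"
    return "neutral"


# HYPOTHETICAL data: every non-Sunday carries 1000 minutes and Sundays 600,
# except Sunday 31 July at 800, which is still below every other day in absolute terms.
usage = {}
for offset in range(31):
    day = date(2011, 7, 1) + timedelta(days=offset)
    usage[day] = 600 if day.weekday() == 6 else 1000
usage[date(2011, 7, 31)] = 800

ratios = weekday_relative(usage)
colours = {day: cell_colour(r) for day, r in ratios.items()}
```

Here 31 July is the only green cell (800 against a Sunday average of 640), even though a line graph would show it well below every weekday.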

With this visualisation, our client discovered a number of insights in calling pattern behaviour of their customer segments. For example, the share of rural traffic rises on Sundays, mainly because urban traffic falls while rural traffic is unaffected. It also made it possible for them to identify specific days on which their competitors’ call volumes shot up, and helped them identify which competitor’s campaigns were proving effective against them.

The visualisation is now integrated into their strategic review process.

Are there some favorite instances you’ve seen of data being interpreted in a visual manner?

Here’s a visualisation of the social network of geeks across different cities in India. An interactive version of this is available at http://gramener.com/codersearch. We built this to identify who would be a good candidate to hire, and to decide which city is the best hunting ground for geeks.

Each circle represents a developer. The size indicates the number of followers they have on Github. The colour indicates the language they code in. Networks of followers are connected by lines and clustered together.

   

This is an instance of transforming relatively unstructured data into quantitative metrics (distance between a pair of people, density of a network, etc.) and displaying them purely visually, without any numbers. As a result, it conveys far more richness and meaning to viewers, and does so intuitively.
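The two metrics named above can be sketched on a follower graph. The edges and names below are hypothetical, not GitHub data. The sketch assumes each follow is listed once, counts followers for the bubble size, computes directed density, and measures distance as the fewest hops between two people while ignoring the direction of follows.

```python
from collections import deque

# HYPOTHETICAL follower edges (follower, followed); each pair listed once.
FOLLOWS = [("asha", "ravi"), ("ravi", "meena"), ("meena", "asha"),
           ("kiran", "ravi"), ("dev", "kiran")]


def build_graph(edges):
    """Return (people, follower counts, undirected neighbour sets)."""
    people = sorted({person for edge in edges for person in edge})
    followers = {p: 0 for p in people}
    neighbours = {p: set() for p in people}
    for src, dst in edges:
        followers[dst] += 1          # used for bubble size
        neighbours[src].add(dst)     # treat a follow as a connection both ways
        neighbours[dst].add(src)
    return people, followers, neighbours


def density(n_people, n_edges):
    """Directed density: share of all possible follow relationships that exist."""
    return n_edges / (n_people * (n_people - 1)) if n_people > 1 else 0.0


def distance(neighbours, a, b):
    """Fewest hops between a and b (breadth-first search); None if unconnected."""
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == b:
            return hops
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


people, followers, neighbours = build_graph(FOLLOWS)
```

With five people and five follows, the density is 5 / (5 × 4) = 0.25, and dev is three hops from meena (dev, kiran, ravi, meena).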

Another instance is this visualisation of the entire history of batting in Indian one-day cricket. The size of the box represents the number of runs scored by the player. The colour indicates the speed at which they scored those runs (red is slow, green is fast).

It’s evident that among the big scorers, Sehwag is India’s fastest run-getter. Clicking on a player shows a second drill-down featuring every match they’ve played. An interactive version of this is available at http://gramener.com/cricket/batting-India-plain

This compresses over 150 pages of information into a single sheet without any loss. Part of the power of data visualisation comes in this ability to compress information and compactly convey insights.
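The two encodings in the cricket treemap can be sketched as follows: box area proportional to runs, colour from red (slow) to green (fast). We assume speed means strike rate (runs per 100 balls) and scale colours between the slowest and fastest players shown. The player names and figures are hypothetical, not real statistics. The sketch computes only each box’s share of area and its colour, not the treemap layout itself.

```python
def strike_rate(runs, balls):
    """Runs scored per 100 balls faced."""
    return 100.0 * runs / balls


def red_to_green(t):
    """t in [0, 1]: 0 gives pure red (slow), 1 gives pure green (fast)."""
    t = min(max(t, 0.0), 1.0)
    return f"#{round(255 * (1 - t)):02x}{round(255 * t):02x}00"


# HYPOTHETICAL career totals: name -> (runs, balls faced).
PLAYERS = {"Player A": (8000, 7000), "Player B": (5000, 6500), "Player C": (3000, 2500)}


def treemap_cells(players):
    """Area share (by runs) and colour (by relative strike rate) for each player."""
    total_runs = sum(runs for runs, _ in players.values())
    rates = {name: strike_rate(r, b) for name, (r, b) in players.items()}
    lo, hi = min(rates.values()), max(rates.values())
    cells = {}
    for name, (runs, _) in players.items():
        t = (rates[name] - lo) / (hi - lo) if hi > lo else 0.5
        cells[name] = {
            "area_share": runs / total_runs,
            "strike_rate": rates[name],
            "colour": red_to_green(t),
        }
    return cells


cells = treemap_cells(PLAYERS)
```

Player A gets the biggest box (half the runs), but Player C, with the highest strike rate, is the greenest.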

What are the most significant challenges you face being in the forefront of Data Visualisation space?

Our biggest challenge is recruiting.

Data visualisation requires a combination of statistics, programming and design, so we need people with all three skills. Finding good people with even one of these talents is hard enough; the combination is near impossible. As a result, we’re spending considerable time building our internal training programme, and hope to churn out more people with the skills we need.

Another challenge is the corporate procurement cycle.

Almost every person on the business side understands the value of data visualisation instantly. Once we reach the procurement team, however, there is a learning curve around how to classify data visualisation (software? service? consulting?) and how to price it (by number of users? reports? templates? rows of data?). We spend a fair bit of time educating our corporate customers and evolving our commercial models.

What is your projection of the growth of the Data Visualisation practice in the future?

Depending on the report you look at, the number is anywhere from $5 billion to $50 billion. We don’t have enough data to predict this with confidence. But of the 100+ organisations we’ve met, every single one has clearly expressed a desire for data visualisation. Whatever the market size is, it’s large enough for us and a number of other players put together.

What are the next plans for Gramener?

Growing the team is our immediate priority. We’re 60 people strong, spread across Hyderabad, Coimbatore and Bangalore. To meet our pipeline, we need to rapidly recruit and train many more.

We’re moving into visualisations of non-quantitative data. There’s a lot more text out there than numbers, and it’s possible to mine information from that. For example, even a pure-text corpus like the Mahabharatha lends itself to social network analysis.
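A common way to turn a text corpus into a social network is to count how often two characters are named in the same passage and use those counts as edge weights. The sketch below does this. It is not necessarily how the gramener.com/mahabharatha visualisation was built. The passages are toy sentences invented for illustration, and names are matched as whole words, so spelling variants and epithets would need extra handling.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence(passages, characters):
    """Count how often each pair of characters is named in the same passage."""
    pairs = Counter()
    for text in passages:
        present = sorted(
            name for name in characters
            if re.search(rf"\b{re.escape(name)}\b", text)
        )
        # Every pair of characters present in this passage gets one more link.
        pairs.update(combinations(present, 2))
    return pairs


# HYPOTHETICAL toy passages; a real corpus would be split by chapter or verse.
PASSAGES = [
    "Arjuna spoke to Krishna before the battle.",
    "Krishna advised Yudhishthira and Arjuna.",
    "Bhima fought Duryodhana with a mace.",
]
CHARACTERS = ["Krishna", "Arjuna", "Yudhishthira", "Bhima", "Duryodhana"]

edges = cooccurrence(PASSAGES, CHARACTERS)
```

The resulting counts, for example Arjuna and Krishna appearing together twice, can be used as edge weights in the same kind of network view as the developer graph.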

We’re also expanding our partner network and working with systems integrators. Our plan is to focus more on the product, and work with implementation partners to actually create the visualisations.

Anything else you wish to add?

Data visualisation is a skill, and we’re also trying to develop this skill in the community. We’re partnering with IIIT, JNTU, etc. on a course on data visualisation that’s offered to programmers. We’re working on a course on data visualisation for non-programmers as well.

We’re also involved in the data community, with groups such as datameet. The problem used to be “How do we get the data?” Today, it’s more “What do we do with all this data?” We’re hoping to work with such groups, helping them understand how to analyse and visualise large-scale data.

If you’d like to learn more, please feel free to reach out to us: http://gramener.com/contact

Tomorrow’s business leader

Gramener won episode 3 of Lufthansa’s Pioneering Spirit, chosen as the ‘best pioneer of tomorrow’ in the industry with a score of 85/100.


Pioneering Spirit, a Lufthansa production, is a unique television series that showcases the inventions of business pioneers and gives entrepreneurs a once-in-a-lifetime opportunity to be selected as ‘pioneers of tomorrow’.

In each episode, the three entrepreneurs with the most compelling ideas present their business plans face to face with a doyen of their industry, competing for the title.

Gramener’s vision was presented by COO Naveen Gattu and was selected as the best among the competitors after a rigorous assessment by Mr. William Bissell (Managing Director, FabIndia), the judge for the episode.

You may watch the repeat telecast of Gramener winning on ET Now on these dates:

6:30 pm on 25th July, Wednesday
6:30 pm on 28th July, Saturday
6:30 pm on 29th July, Sunday