Creating a Data Story through Clustering

Understanding associations within indicators

Writeup by Analytics team

Problem Statement

Technology has been a major enabler of innovation as well as business growth over the last decade or two. With the rise of the internet and mobile technology, and with governments investing in technology infrastructure, smaller countries have been able to make their mark and become significant players in the world economy. The World Bank approached Gramener to help understand the relationship between technology and other relevant indicators, so as to develop a compelling story that could be showcased on its website.


Prior to data collection, a list of questions and draft stories was prepared and shared with the World Bank. These draft stories included, but were not limited to, the impact of tourism on GDP, the impact of government investment on the latest technology, and the impact of science and education on a country’s ability to innovate. After the list of stories was identified, data was taken from the World Bank website in the form of indicators for different countries.

Data was organized by country, with each country having indicators whose values were measured as ‘ranks’ or ‘indices’. Each country was also tied back to a region and an economic group. Four income groups were identified: “high income”, “upper middle income”, “lower middle income”, and “low income”.

Gramener’s approach

Based on the finalized story, relevant indicators across technology, innovation, business, and entrepreneurship were identified across datasets, and countries were grouped by region as well as by income group. Each country was visualized.

K-means clustering was performed on business, technology, and innovation indicators to identify four distinct groups: “most favorable”, “favorable”, “somewhat favorable”, and “least favorable”. The analysis was presented in the form of a data story, with each pane showing a scatter-plot of two indicators. Relevant insights from each plot were highlighted and summarized.
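To make the clustering step concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the code used for the World Bank story. The country names, the scores, the 0-100 "higher is better" scale, the deterministic starting centroids, and the rule that ranks clusters by their centroid totals are all assumptions made for illustration.

```python
# Hypothetical indicator scores (business, technology, innovation), 0-100,
# higher is better. These values are invented for illustration only.
COUNTRIES = {
    "Country A": (90, 88, 92),
    "Country B": (85, 91, 87),
    "Country C": (70, 66, 68),
    "Country D": (65, 71, 69),
    "Country E": (45, 48, 42),
    "Country F": (50, 44, 47),
    "Country G": (20, 25, 18),
    "Country H": (15, 22, 24),
}

LABELS = ["most favorable", "favorable", "somewhat favorable", "least favorable"]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm. Requires 2 <= k <= len(points).

    Starting centroids are taken at evenly spaced positions in the list of
    points sorted by total score (an assumption, chosen so the sketch is
    deterministic; real implementations usually use random or k-means++ starts).
    """
    ordered = sorted(points, key=sum)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    assignments = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assignments


def label_countries(countries, labels=LABELS):
    """Cluster countries and name clusters from best to worst centroid total."""
    names = list(countries)
    points = [countries[name] for name in names]
    centroids, assignments = kmeans(points, len(labels))
    order = sorted(range(len(centroids)), key=lambda c: sum(centroids[c]), reverse=True)
    label_of = {cluster: labels[rank] for rank, cluster in enumerate(order)}
    return {name: label_of[a] for name, a in zip(names, assignments)}


groups = label_countries(COUNTRIES)
for name, group in groups.items():
    print(f"{name}: {group}")
```

K-means itself only returns unnamed clusters; attaching names such as “most favorable” requires a separate ordering rule like the one assumed here. If the indicators are ranks (where lower is better) rather than scores, the ordering would need to be reversed.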

Scatter-plots were created to identify relationships between the various indicators, and the relationships that stood out were expanded on. These scatter-plots could be viewed by region as well as by income group. Finally, the entire analysis was collated into one compelling data story.

By Income Group
By Region

Visualizations in the story were interactive, and the user could select a custom list of indicators to visualize on the scatter-plot.
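The by-income-group (or by-region) view of a pair of indicators can also be summarized numerically. The sketch below groups country records and computes a Pearson correlation for each group. The writeup does not say that a correlation coefficient was used, so treat this as one possible companion to the scatter-plots. The record layout and field names (`income_group`, `technology`, `innovation`) and all values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return None  # a constant indicator has no defined correlation
    return cov / (spread_x * spread_y)


def correlation_by_group(records, group_key, x_key, y_key, min_points=3):
    """Return {group: correlation} for groups with at least min_points records."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[group_key]].append((record[x_key], record[y_key]))
    result = {}
    for group, pairs in grouped.items():
        if len(pairs) >= min_points:
            xs, ys = zip(*pairs)
            result[group] = pearson(list(xs), list(ys))
    return result


# Hypothetical records: one row per country, two indicators each.
RECORDS = [
    {"country": "Country A", "income_group": "high income", "technology": 90, "innovation": 85},
    {"country": "Country B", "income_group": "high income", "technology": 80, "innovation": 78},
    {"country": "Country C", "income_group": "high income", "technology": 70, "innovation": 74},
    {"country": "Country D", "income_group": "low income", "technology": 30, "innovation": 20},
    {"country": "Country E", "income_group": "low income", "technology": 25, "innovation": 28},
    {"country": "Country F", "income_group": "low income", "technology": 20, "innovation": 22},
]

for group, r in correlation_by_group(RECORDS, "income_group", "technology", "innovation").items():
    print(f"{group}: r = {r:.2f}")
```

Passing a different `group_key` (for example a hypothetical `region` field) gives the by-region view. A correlation describes association only and does not, on its own, show that one indicator drives the other.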

Benefit to the Client

Cause-and-effect relationships were brought out by the analysis, which was represented visually in the form of a compelling data story. The final data story was published on the World Bank website.

Rwanda Rising

Rwanda today is synonymous with ‘growth’. The Sub-Saharan African nation has made great strides in rebuilding an economy that was once in ruins. The recently published ‘Ease of Doing Business’ rankings by the World Bank Group are a testament to Rwanda’s growing economic stature.

Analysis: Bhasker Reddy

Design: Ankita Dash


Talking Movies – ‘OK’, how about Talking Data!!

By Amit Pishe

During childhood, we were all at some point pushed or encouraged to take part in a storytelling competition. Stories had key characters and depicted inanimate things through words, which created a picture or a sense of motion in our minds. Stories were used to teach and to learn.

Not much has changed today: the same ‘storytelling’ concept is applied to data and datasets, in what we now call Narratives or Data Narratives.

Big Data is the buzzword; however, visualising and explaining an entire dashboard can be challenging, and not all the information displayed is needed by every user.

Data Narratives come in handy for passing on all the relevant information succinctly, in a form that is easy to grasp. Narratives are essentially ‘talking data’ for data-driven stories: they highlight insights, trends, and unique patterns, explore the factors shaping the data, and provide crisp, concise information. In a visualization dashboard they go a step further and describe the characteristics of the data.

Leveraging Data Narratives and writing one (in general):

  1. Understand the overall data components (metrics and dimensions) and map each one individually to weave a storyline for a given chart or dashboard. The storyline is the central theme of the narration. Always ask yourself why you are considering only that particular data component.
  2. A storyline can be told in multiple ways; pick one version and compare how it scores against the rest. Are all the data points included?
  3. The flow of information and insight should transition smoothly throughout the story, without ending abruptly.
  4. Keep the target audience in mind (BFSI, healthcare, media, telecom, etc.) and provide information accordingly.
  5. Get to know the inputs, the analysis of the data, the result orientation, and finally the conclusion.

Sample template/use case (Investment Banking): a report for a business user (Racey) from an India-based mutual fund investment banking firm.

Wealth Management Report

Date: dd-mm-yyyy

Dear Racey,

Information about your portfolio

a. Portfolio update: last quarter vs. current month growth/loss. Net worth at the end of the financial year. Peer rating compared to other investors. Benchmarking of the performance.

b. Equity portfolio: information on returns on investments, the fund index, and NIFTY. Types of investments in different segments (mid cap, advantage funds, MIP, and so on). Display a chart comparing CRISIL, NIFTY, and current portfolio returns.

Similar investors at our bank have seen a return of 8.3%. Crisil Composite Bond Fund Index has given a return of 5.4%. NIFTY has given a return of 2.0%. Your portfolio has given a return of 15.8%, i.e. on par with or better than Similar investors at our bank, Crisil Composite Bond Fund Index and NIFTY.

c. Fixed income portfolio details: these investments (~1% of total portfolio) are mostly in intermediate-term funds; this investment is driven by short-term plan growth investment options.

d. Your Portfolio description through charts spread across segments

Your equity investments (99% of total portfolio) are mostly in large-cap, growth-oriented funds. Your portfolio is dominated largely by Canara Robeco Treasury Advantage Fund – Institutional Plan- Growth option, UTI – Treasury Advantage Fund – Institutional-Growth and HSBC MIP – Savings – Growth.

e. Suggestion on investment: based on the average cash balance in your savings account (~15% of savings into xxx plan), I would suggest investing in Pure Value fund growth options.

For the detailed report, please refer

Gist: In the sample, the data components are underlined (Portfolio, Equity portfolio, Fixed income portfolio and so on). The full dashboard would have many other charts; the key data components provide a holistic view without going into too much detail.

  1. The main summary highlights how the user’s portfolio is performing over a variety of time periods, provides benchmarking information, and analyses the user’s saving patterns.
  2. Based on the savings, suggest an investment as the conclusion.
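
The benchmark comparison in item b of the sample report is the kind of sentence that can be generated directly from the data. Below is a minimal sketch of such a template, reusing the figures quoted in the sample. The function names and the wording chosen for the "below benchmark" case are assumptions, not part of any existing narrative tool.

```python
def join_names(names):
    """Join names as 'A', 'A and B', or 'A, B and C'."""
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def benchmark_sentence(portfolio_return, benchmarks):
    """Build one narrative sentence comparing a portfolio with its benchmarks.

    benchmarks maps a benchmark name to its return in percent.
    """
    matched = [name for name, value in benchmarks.items() if portfolio_return >= value]
    missed = [name for name, value in benchmarks.items() if portfolio_return < value]
    sentence = f"Your portfolio has given a return of {portfolio_return:.1f}%"
    if matched and missed:
        sentence += (
            f", i.e. on par with or better than {join_names(matched)}, "
            f"but below {join_names(missed)}"
        )
    elif matched:
        sentence += f", i.e. on par with or better than {join_names(matched)}"
    elif missed:
        sentence += f", i.e. below {join_names(missed)}"
    return sentence + "."


# Figures taken from the sample report above.
BENCHMARKS = {
    "Similar investors at our bank": 8.3,
    "Crisil Composite Bond Fund Index": 5.4,
    "NIFTY": 2.0,
}

print(benchmark_sentence(15.8, BENCHMARKS))
```

Keeping the comparison logic separate from the wording means the same data can drive differently phrased narratives for different audiences.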