Predictive Modelling Of Stakeholder Behaviors Using Past (Large) Datasets


Stakeholder here refers to consumers, voters, patients and subjects of any heterogeneous or homogeneous group. In short: groups of people who interact with a business or institution.


Gramener’s corporate customers store rich data about their stakeholders – be they consumers, patients/subjects, shareholders or voters/participants of a survey. Their past behavior and actions contain plenty of information: preferences for certain products, triggers that led them to switch to a competitor’s offering, or motivations to become an experimental subject in the lifecycle of drug discovery. In this document Gramener explains how it has used past data to predict the behavior of a corporate conglomerate’s shareholders.

Note 1: To protect the identity of this customer, Gramener is refraining from stating the exact nature of their business while ensuring the essence of the predictive analysis is preserved.

Note 2: It is important to note that principles used here are equally applicable in predicting the behavior of stakeholders in any nature of business irrespective of industry, size or geography.

Intended Audience: This write-up makes references to data modelling and statistical methods. Familiarity with these concepts is not a prerequisite to appreciate the essence of the message. However, readers who would like to know the details of these techniques may refer to Appendix A – ‘Project Details & Execution’.

Business Case

A US-based business conglomerate welcomes its shareholders to participate in corporate decision making. This is achieved by shareholders voting to express their opinions on various issues. The conglomerate would like to use past data to predict the voting behavior of these shareholders. A high voting percentage reflects the level of engagement these shareholders have with the company.

Past Data is the key resource

This customer has plenty of information on its shareholders’ past behavior: terabytes of data with hundreds of columns and billions of rows. They would like to predict and influence the voting percentages by analyzing this past data. Gramener, with its analytics and visualization solutions, donned the mantle of both consultant and doer for this predictive exercise.

Approach in brief

The approach can be split into five broad headers

Consolidating currently recognized variables & data


A good predictive model needs to recognize all the variables which influence the problem in a holistic way. Hence, creating an exhaustive list of influencing variables is a critical first step to study the problem at hand.

Supplementing extrinsic variables as additional influencers

While starting a predictive model, it is very important to look beyond the currently known variables for any other large influencers. With this in mind, and in discussion with the customer, Gramener synthesized extrinsic and relevant data which was added to the columns considered along with the known variables. For example, though the customer knew details at a shareholder level, similar data for the industry and the competitive landscape was scraped from the internet and included.

Dimensionality reduction

While considering all influencing variables was critical, it increased the dimensions of the problem beyond manageable levels, with over 600 variables. Delineating those variables which influence the voting percentage above a threshold level helped reduce the problem to a manageable dimension. Prioritizing and identifying these impactful variables was done by striking a balance between analytical techniques and inputs from business users.

Data Modelling & Choice of Algorithms

While there are many time-tested statistical models for predictive modelling, and a natural choice would have been a ‘classical’ application of such an algorithm, Gramener tempered the choice of algorithms with business and practical considerations: the selection was moderated with domain knowledge, customer preferences and practical constraints. Details of this approach are provided in Appendix A.

Simplified consumption

The outputs were implemented using Gramener’s visualization engine, which produced visuals for uniform consumption and action across the board.

Inherent Challenges & Gramener’s value add

Selection of right tools & methods while too many variables influence outcomes

At the start, Gramener had to deal with over 600 variables. Neither is the magnitude of each variable’s impact on the outcome uniform, nor can a pattern be gleaned by merely studying the variables. Segregating the statistically significant variables was not easy given the volume of data.

Analytical techniques helped to reduce the number of variables judiciously during the preprocessing and elimination stages. A misstep at this stage would have muted the impact of some important variables, leading to unintended consequences and wrongful conclusions.

Right proportions of analytical techniques and business inputs

Over-dependence on existing business biases, or over-reliance on analytical techniques, could both have led to wrong outcomes.

Striking the right balance was critical: using existing business intuition and strengthening it with guidance from analytical methods. Experience with similar projects, the team’s expertise in both descriptive and predictive aspects of data, and the ability to tweak analytical models based on business considerations all helped achieve this balance.

Keeping data models relevant to the business scenario

Rather than going by variables and data alone, the emphasis was on understanding the customer’s motivation for doing the predictive modelling.

The customer would not have been able to act on some of the recommendations irrespective of what the statistical model suggested, so these limitations were built into the algorithms. This ensured that the levers identified to influence the desired outcomes did not lack practical grounding. For example, communication channels like snail mail, which were traditionally unreliable, were eliminated from the algorithm despite the collected data having a lot of information about them.

Key Conclusion

From a set of past data, Gramener’s methodology helped this company arrive at plausible actions and practical insights for predicting the target metric – voting % in this case. The fact that the outputs were visual further helped the teams consume these insights without any loss in translation.

General Conclusion

This is an example of Gramener’s work on how predictive models and algorithms can be used to understand and influence stakeholder behavior from large data sets. These same techniques are applicable in many business scenarios to predict the likely:

  • Enrollment of subjects in a clinical trial
  • Adoption of a new product by consumers
  • Churn of loyal customers from a telecom network
  • Impact of a new advertisement campaign on brand loyalists

Appendix A

Project Details & Execution

Further details on the project execution are provided here, meant for readers who are familiar with Big Data and statistical vocabulary. The execution approach can be split into the following phases.

Application of known & extrinsic variables from Varied data sources

Structured Data – Variables and associated data existed in multiple data sources within the company. Assimilating this data from the various sources was the first step towards understanding the problem better. With terabytes of data and billions of rows, the fastest way to process it was with automated queries. This reduced human-intervention errors and sped up the first-cut analysis, which helped in understanding the problem’s dimensions: the number of variables, the need for data cleansing, data volumes, etc.
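The first-cut profiling described above can be sketched as follows. This is a minimal illustration, not the actual queries used on the project: the column names and the tiny inline sample are hypothetical stand-ins for one extract of the shareholder data, and the real exercise ran equivalent automated queries over billions of rows.

```python
import csv
import io

# Hypothetical sample standing in for one extract of the shareholder data.
raw = io.StringIO(
    "shareholder_id,region,shares,voted\n"
    "1,NE,100,1\n"
    "2,SW,,0\n"
    "3,NE,250,1\n"
)

# First-cut profiling: row counts and missing values per column, so the data
# cleansing effort can be sized before any modelling starts.
profile = {}
for row in csv.DictReader(raw):
    for col, value in row.items():
        stats = profile.setdefault(col, {"rows": 0, "missing": 0})
        stats["rows"] += 1
        stats["missing"] += 1 if value == "" else 0

print(profile["shares"])  # rows seen and missing count for 'shares'
```

Automating this kind of pass over every extract is what removes the human-intervention errors mentioned above.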

Unstructured Data – External data from unstructured sources was collated and appended to the structured data taken from the corporate database. This brought new dimensions into the analysis and increased the possibility of a holistic approach to generating insights. (For example, publicly available competitor shareholder information was brought in to be used along with the existing data.)
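A simplified sketch of pulling metrics out of an unstructured public page is shown below. The HTML snippet and metric names are hypothetical; real scraping would fetch live pages (and respect their terms of use) rather than parse an inline string.

```python
from html.parser import HTMLParser

# Hypothetical snippet standing in for a public competitor page.
page = """
<table>
  <tr><td class="metric">competitor_voting_pct</td><td class="value">34</td></tr>
  <tr><td class="metric">competitor_shareholders</td><td class="value">120000</td></tr>
</table>
"""

class MetricParser(HTMLParser):
    """Collect the text of consecutive <td> cells as (metric, value) pairs."""
    def __init__(self):
        super().__init__()
        self.cells = []
        self._capture = False
    def handle_starttag(self, tag, attrs):
        self._capture = tag == "td"
    def handle_data(self, data):
        if self._capture and data.strip():
            self.cells.append(data.strip())
            self._capture = False

parser = MetricParser()
parser.feed(page)
# Pair up cells: even positions are metric names, odd positions are values.
external = dict(zip(parser.cells[::2], parser.cells[1::2]))
print(external)
```

The resulting dictionary can then be appended as extra columns alongside the structured shareholder data.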

Dimensionality reduction through Modularized Variable Pre-processing

The problem dimension extended to over 600 variables and their associated data. This made the problem very difficult to manage and untenable for any meaningful processing. The dimensions had to be reduced while ensuring all the impactful variables and associated data were still in consideration. Modularizing and pairing the different variables made the analysis quick and repeatable for identifying all the impactful factors.

Quantitative Measures: Group means (90th-percentile methodology) and multivariate analysis were some of the techniques used to test the impact of a variable on the sensitivity of the identified target metric, voting %. This class of techniques was found to be less taxing on the hardware, yet very effective in quantifying the magnitude of each variable’s impact on the target metric.
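The group-means idea can be sketched in a few lines: bucket records by a candidate variable, compute the mean of the target in each bucket, and score the variable by how far those means spread. The records below are hypothetical, and this is only an illustration of the principle, not the project's actual scoring.

```python
import statistics

# Hypothetical records: (communication_channel, voting_pct).
records = [
    ("email", 28), ("email", 30), ("email", 26),
    ("mail",  12), ("mail",  14),
    ("phone", 22), ("phone", 20),
]

# Group the target metric by the candidate variable's values.
groups = {}
for channel, pct in records:
    groups.setdefault(channel, []).append(pct)

group_means = {k: statistics.mean(v) for k, v in groups.items()}

# Impact score: spread of the group means. A variable whose groups barely
# differ on the target metric is a candidate for elimination.
impact = max(group_means.values()) - min(group_means.values())
print(group_means, impact)
```

Ranking all 600+ variables by such a score is cheap on hardware, which is why this class of technique suits the preprocessing stage.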

Significance Testing Measures: Tests such as the t-test and the Wald test were further used to separate the random influence of a variable from a real effect. This clearly established the significance of each variable on the target metric.
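As an illustration of the t-test step, a Welch two-sample t-statistic can be computed directly from its textbook formula. The two samples below are hypothetical voting percentages for shareholders with and without some trait; a large |t| suggests the difference in group means is unlikely to be random noise (a full test would also compare t against the relevant distribution).

```python
import math
import statistics

# Hypothetical voting-percentage samples, split on a candidate trait.
with_trait    = [28, 30, 26, 29, 27]
without_trait = [12, 14, 13, 15, 11]

def welch_t(a, b):
    """Welch two-sample t-statistic: mean difference over its standard error."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

t = welch_t(with_trait, without_trait)
print(round(t, 2))
```

Variables whose t-statistics fall near zero are the ones whose apparent influence is plausibly random, and they can be dropped.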

Choice of Data Models and Algorithms: Constructing the predictive model

Preprocessing and the delineation of impactful variables helped reduce the dimensions of the problem; advanced analytics could now be done on the most meaningful set of variables.

The two main purposes for data model building were:

  1. Cluster all the voters into common logical groups based on profiles
  2. Prescribe actionable ways to predict and improve voter participation

Data models were constructed for each of the voters involved. This helped cluster voters into groups based on the outcome of their participation. Shareholders with similar traits leading to low or high voting participation could now be targeted to make marketing campaigns more effective. For example, shareholders belonging to a certain geography, income group and education level, with a low number of outstanding shares, had generally voted less frequently.
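The clustering idea above can be sketched as a simple group-by on discretized traits: compute the participation rate per profile group, and flag low-participation groups as campaign targets. The shareholder records and trait names are hypothetical, and this stands in for the richer models the project actually built.

```python
# Hypothetical shareholder profiles with a binary voting outcome.
shareholders = [
    {"region": "NE", "income": "low",  "shares": 40,  "voted": 0},
    {"region": "NE", "income": "low",  "shares": 55,  "voted": 0},
    {"region": "NE", "income": "low",  "shares": 30,  "voted": 1},
    {"region": "SW", "income": "high", "shares": 900, "voted": 1},
    {"region": "SW", "income": "high", "shares": 700, "voted": 1},
]

# Cluster on discretized traits and collect voting outcomes per cluster.
clusters = {}
for s in shareholders:
    key = (s["region"], s["income"])
    clusters.setdefault(key, []).append(s["voted"])

participation = {k: sum(v) / len(v) for k, v in clusters.items()}

# Low-participation clusters become the targets for marketing campaigns.
targets = [k for k, rate in participation.items() if rate < 0.5]
print(participation, targets)
```

In practice the grouping came from fitted models rather than a hand-picked key, but the campaign-targeting logic is the same.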

The data models thus helped identify levers which may improve voting participation. For example, giving shareholders more days to vote would improve their participation rate from 18% to 22%.

For predictive modelling, decision trees were found to be transparent in their working and visual in their outputs. However, the decision tree algorithms were not administered as-is. A judicious choice of split levels helped tweak the algorithm to suit the problem at hand. For example, some split levels were ignored to accommodate practical considerations: lead times (number of days) below certain split levels were discarded despite their statistical significance, since it was not practical for communication to reach the voters within those lead times.
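The tempering of split levels can be illustrated with a tiny split search. The data and the minimum lead time below are hypothetical; the point is only that candidate thresholds below a practical floor are excluded before the best split is chosen, exactly as described above.

```python
# Hypothetical data: (lead_time_days, voted).
data = [(2, 0), (3, 0), (5, 0), (9, 1), (12, 1), (15, 1), (20, 1)]

# Practical constraint: communication cannot reach voters faster than this,
# so splits below it are discarded even if statistically attractive.
MIN_PRACTICAL_LEAD_TIME = 7

def split_score(threshold, rows):
    """Weighted misclassification if each side predicts its majority class."""
    left = [v for d, v in rows if d < threshold]
    right = [v for d, v in rows if d >= threshold]
    errors = 0
    for side in (left, right):
        if side:
            majority = max(set(side), key=side.count)
            errors += sum(1 for v in side if v != majority)
    return errors

candidates = sorted({d for d, _ in data})
practical = [t for t in candidates if t >= MIN_PRACTICAL_LEAD_TIME]
best = min(practical, key=lambda t: split_score(t, data))
print(best, split_score(best, data))
```

A production tree applies this filter at every node; the sketch shows the idea for a single split on one variable.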

Outputs as Visuals for easy consumption: Exploratory and Interactive

The most impactful insights from the analysis were condensed into visuals. From a set of past data, we now had a set of visually consumable outputs as actions and insights. The impact of these actions on the predictability of voting % was clear, since the visual representation made them easily understood by all. This led to meaningful, action-oriented discussions among the customer’s teams, since the mental models were common.

Visuals were also exploratory: apart from predictive modelling, Gramener’s visual outputs helped users of all skill levels better explore how each variable influenced the outcome. For example, sending a communication on Tuesdays had the maximum impact on the voting % – this was not intuitively known before the analysis. When it was visually represented, it became evident to all and led to further exploration of the impact of other business days on the outcome.

Interactive decision trees: The decision trees were mines of insights, which were converted into interactive web links. Users could interact with these links to explore the contexts they were particularly interested in. For example, one user may be interested in the impact of a communication channel, while another may want to see the impact of a geographic cluster of voters.

Data science news

How businesses can benefit from visual analytics

Enterprises seek innovative techniques that help them draw attention to key messages and allow them to make informed business decisions in complex situations. Visual analytics is one such method that allows decision makers to gain insight into complex problems. It simplifies data values, makes them easy to understand, and helps enterprises communicate important messages and insights which would otherwise be difficult to grasp without deep technical expertise.

The practice of presenting information visually is nothing new, and the industry has witnessed a steady progression in techniques over the years: starting with hand-drawn charts and tables, followed by spreadsheets giving rise to graphs such as bar graphs, pie charts, and line graphs.

Data Visualization: Providing Data Insights that Cause Change

Great data insights don’t mean much if the folks controlling change don’t understand them or don’t have the time to pore over columns of data. Enter data visualization: the key to getting data insights to cause change that improves your market performance.

Data visualization
Data visualization is a little like herding cattle — it’s expensive and time-consuming, but, ultimately, necessary if you want to generate profits from your cows.
Of course, data visualization is only one means of corralling big data into something useful. Data analysis using statistical tools to generate descriptive, predictive, and prescriptive data analysis also synthesizes meaning from big data.
Even with data analysis, data visualization makes it easier to see not only descriptive data like height, age, and income, but predictive analytics reflecting the relationships among data, and prescriptive data showing the best alternative solutions.

Big Data Analytics, Mobile Technologies And Robotics Defining The Future Of Digital Factories 

47% of manufacturers expect big data analytics to have a major impact on company performance, making it core to the future of digital factories.

36% expect mobile technologies and applications to improve their company’s financial performance today and in the future.
49% expect advanced analytics to reduce operational costs and utilize assets efficiently.

Gartner Predicts Three Big Data Trends for Business Intelligence

Big data has given businesses a window into valuable streams of information from customer purchasing habits to inventory status. However, internal data streams give only a limited picture, especially with the growth of digital business.

Three trends Gartner has identified describe information’s ability to transform business processes over the next few years.

No. 1: By 2020, information will be used to reinvent, digitalize or eliminate 80% of business processes and products from a decade earlier.
No. 2: By 2017, more than 30% of enterprise access to broadly based big data will be via intermediary data broker services, serving context to business decisions.
No. 3: By 2017, more than 20% of customer-facing analytic deployments will provide product tracking information leveraging the IoT.


Paint by Numbers. Harness the Power of Data Visualization 

The spreadsheet is an object of obsession for data analysts and quantitative thinkers. But for business leaders, sorting through countless rows and columns of raw data isn’t ideal because their time is precious.

Data is crucial to your business’s success since it can drive some of your most important decisions. But to get anything out of it, the data has to tell a story. And that’s where data visualization comes in. The easiest way to tell a data story may be with visuals (an infographic, a chart or a graph), and this can be very effective.

When data storytelling is done well, it engages the intended audience and allows for quick, clear communication. It can guide your business in the right direction and help many parties make informed decisions.

Why Graphic Content and Data Visualization Are Good for Business

In the content-hungry world of online media, there’s more potential than ever for evergreen graphic and data visualization content.
Various marketing channels crowded with competing messaging means marketers are constantly under pressure to create compelling content. Online platforms used to highlight messaging and promote branded content require a steady infusion of fresh content to keep audiences engaged. Developing a continuous stream of fresh content that generates audience interest can be challenging for any brand. However, it can also be an opportunity in terms of visual content marketing.

Such evergreen content can be used to create engagement over time. The need to constantly create compelling content is driven by people scouring the Internet for information that can help them understand market trends, compare competing value propositions, or aid decision-making. In many cases, brands are sitting on a lot of data that could position them as thought leaders, provide the insights audiences are seeking, and attract people to their business. While many brands use static infographics to share market intelligence, they can also use existing standards for graphics processing and open-source technologies to turn the same data into engaging, interactive visualizations.

2015 Will Be the Year of Turning Big Data into Actionable Insights

We are living in the era of Big Data. The amount of data being collected has been growing exponentially, with no sign of slowing down anytime soon. It is predicted that in 2015, worldwide spending on Big Data-related software, hardware, and services will grow to $125 billion (IDC). Big Data provides a treasure trove of possibilities for marketers, but we have spent so much time fine-tuning our data collection that we have not moved beyond basic forms of analysis. This will be the year that companies small and large will take their data analytics to the next level. Turning data into truly actionable insights will supercharge your marketing and make your data more precious than ever.

Gone is the time when Business Intelligence (BI) meant an expensive, clunky tool from a software company like Oracle or SAP. BI tools have become more user-friendly and have put more emphasis on data visualization. Having a data scientist who can run extremely complex data models is great, but spreadsheets can be hard to present to your board of directors. That is why it is crucial for analysts to offer great communication and storytelling. Analysts need to present data in a way that makes sense to almost anybody. New BI and data visualization tools are making this much more attainable, as well as becoming easier to access through the cloud. According to Gartner, “by 2016, 25% of net-new BI and analytics platform deployments will be in the form of subscriptions to cloud services.” BI and data visualization are becoming easier to use and more readily available. This will open up the opportunity for companies of all sizes to gain incredible insights from their data.