Invasion of the info age Indiana



Ishan Srivastava

They talk incessantly about mining, scraping and digging out. An innocent bystander might mistake the discussion for one about geological activity. But this is a crowd passionate about only one thing: data. Loads and loads of it. From geeks wearing T-shirts with “Hacker” written across them to social scientists and NGO activists in kurtas. From 14-year-old hackers to grey-haired policy researchers. They usually meet over mailing lists, Google groups and video calls. But now the movement is getting more visible with events in Gurgaon, Bangalore, Hyderabad and Chennai.

And if you sit through these discussions, you will be surprised at what’s thrown up. You will, for instance, find out that when it rains, the groundwater level actually goes down in Orissa even as it goes up in Rajasthan (as expected). That students born in August and September, on average, do better than other students in the 10th and 12th exams in India, while those born in June score the lowest on average. It is the excitement of finding such counter-intuitive facts that brings enthusiasts together. Some just work towards procuring this information while others work to derive meaningful trends from it. Earlier, if you wanted to find such trends, you would have to join a research firm and work with data provided by it. “Today, availability of such information has converted every inquisitive person into a scientist and every computer into a laboratory,” says S Anand, chief data scientist at startup Gramener.com and a key person behind the movement. “Official reports don’t always address your questions. Working with data allows you to ask questions which matter to you and find your own answers.”

Some are also driven by the gaps in information that exist in our society. Many large cities in India, including metros like Chennai, don’t even have an updated list of bus stops or bus routes. Some people at data meets set out to document this information and make it available to the public, free, in a usable form.

It all started in January last year when Anand was at Infosys along with his colleague and friend Thejesh G N. The duo dabbled in data analysis and visualisation techniques before getting in touch with other like-minded people. The interactions started with mailing lists and soon led to Skype sessions. A common theme was where to get more data. “We were of the view that in India, if we looked hard enough for data, we would find it,” he says.

Data is obtained through various means. It can be publicly available information or information accessed by making use of the RTI Act. Other sources include research from books and outreach programmes, i.e. collecting data from the field.

Beginning in October last year, the group also started holding small meetings, with about 10-20 participants at the first few sessions. Its first formal event, the Open Data Camp (ODC), was held at Google’s office in Bangalore on March 24 this year. Around 250 people registered, of whom about 150 attended. Another took place in Hyderabad on June 23 at the Indian School of Business campus. The group has been growing online, too, with 2-3 registrations on its mailing list every day. “The community is expanding mainly through word of mouth,” says Nisha Thompson, project manager at India Water Portal and a key figure behind the organisation of data meets.

Data meets in various cities are independent of each other, have different organisers and have their own focus, based on members’ interests and backgrounds. While people interested in the tech aspects dominate the Pune and Gurgaon sessions, people in Bangalore focus more on finding sources of data and their application. In Hyderabad, they focus on corporate data, while the Chennai crowd is predominantly drawn from the social sector. The common thread that binds the community, albeit loosely, is shared interests without a larger defined agenda.

In terms of focus, August last year was a turning point for most such groups. It was the first time they were approached by NGOs who needed data as well as analysis of the information they had. This was the perfect collaboration. “It went from a geek forum to a geek-plus-NGO forum,” says Anand.

Soon, individuals with a strong background in technology who excelled in techniques like scraping (pulling data from web pages and other readable formats such as PDF) were joined by social researchers and activists who saw this exercise as an effective tool to create more transparency and accountability in governance. Transparent Chennai, a non-profit organisation that takes up pedestrians’ problems, slum issues, accountability of councillors and public toilets, was one such organisation. “Data about the poor is simply not collected and most of what is provided to us can just be a bundle of papers,” says Nithya Raman, founder of Transparent Chennai and also a speaker at a recent data meet in Bangalore.

The NGO has been working with data meets to create data sets, such as measuring access to water and mapping out people living in recognised and unrecognised slums. More often than not, the result is an improvement upon existing government data, which is refined further to make it relevant for more users. In the case of slums, for instance, their data suggests that more people live in unrecognised slums, with no clear rights, than the official database shows. “We are also inviting municipal corporation officials to attend data meets from now on,” says Raman. “From research to action, this is how we are trying to close the loop.”

However, it doesn’t mean that this is the only path that the data community will take. “We have deliberately kept the community loose. We don’t want to set directions. Even if we try to, it won’t work,” says Anand. “It is primarily a knowledge-sharing platform driven by people’s interests.”

Along with larger events, there are a number of smaller events, too, which provide a platform for focused discussion and also bring interested data novices into the fold. They may go by various names (hackathons, scrapathons, designjams, datajams) and aim at different groups of people, but what brings them all together is an unrelenting love for data.

Tech community comes together to talk all things technical



Deepa Kurup

The hot field: Rahul Kulkarni of Google speaking on ‘Crunching big data, Google scale’ at The Fifth Elephant, a two-day conference on big data in Bangalore.


Data enthusiasts from start-ups, big corporations and business enterprises spent two full days at The Fifth Elephant, a big data conference that commenced here on Friday.

The sessions, mostly technical, focussed on infrastructural needs and technologies revolving around managing big data, the hot field of analytics and the critical but less-talked-about area of data visualisation.

The event, organised by tech event management firm HasGeek, had around 700 participants registered, and speakers ranged from industry heads — including representatives and researchers from Google, RedHat, Oracle and Flipkart — to techies who’re working with interesting ideas and evolving technologies in the start-up ecosystem in Bangalore, and in the country.

Though big data itself is an emerging field and several conferences are held by corporations along these lines, it is this participation from niche and lesser-known players doing equally fascinating work that distinguishes this event from the others.

As one young researcher and aspiring entrepreneur from Chennai, attending the event “at his own expense”, puts it: “The event is interesting because we get to meet people that otherwise you only hear of as doing promising work in tech forums. I’ve always wanted to start up on my own, and I am in the process of doing the groundwork for that. So, an event like this gives me an opportunity to perhaps meet others and figure out how.” That this is an event attended by the ‘tech community’, as opposed to ‘just company representatives’, is what made it worth his while to travel from Chennai.

The sessions presented a wide range of topics, mostly technical. For instance, there was a session by Prabhu Ramachandran, a researcher from the Indian Institute of Technology, Bombay, on his work with Mayavi, a free, Python-based 3D data visualiser for scientific computing. This open-source library has evolved over the past decade and is widely used by scientists across research domains. While his was an inspiring tale, several sessions also focussed on the business aspect of data analytics.

Biswajit Pal, Subhasish Mishra and Manav Shroff from Hewlett-Packard had a presentation on how HP analyses the buying patterns of its customers to predict what they would buy next year, helping them plan for inventory. Another interesting talk was by S. Anand and Ganes Kesari on text visualisation.

Their presentation used several examples from their own work to show how textual data can be visually represented to help the viewer recognise patterns and make meaning from vast quantities of text.


With a good mish-mash of technology and the application of technologies in the world of business, the event had a fair share of entrepreneurs attending it.

Sriharsha Nagaraj, director of business development at Compassites, one of the participants, said: “I attend a lot of conferences. But what’s exciting here is that there’s a good focus on business and what people who wish to use the benefits of these technologies want.” This is unique to conferences in Bangalore, where events see a lot of community presence and support, unlike, say, in Gurgaon or other places where tech conferences are dominated by company heads rather than those dabbling in cutting-edge technologies, he adds. The broad appeal of this conference also perhaps has to do with the community-centric approach to designing the conference that HasGeek is known for.

“Our events are organised through a community-driven open voting process. Here, all the speakers have to submit proposals to a public system that we call the ‘funnel’ where their talks are up-voted or down-voted depending on the interest the community has,” says Kiran Jonnalagadda, founder-member of HasGeek.

“With this, we ensure that speakers coming to talk at an event like The Fifth Elephant are focussing on the real insights and processes of working with data that the community expects,” he says.

He is extremely pleased with the turnout, which he believes is a “clear indicator of just how excited people are about the possibilities with gaining insights from data”.

Big data that’s worth big bucks



Deepa Kurup

Realising its worth: More and more businesses, even in India, are looking to crunch their large data sets to see what works and what doesn’t. File photo: AP


Huge amounts of data are being crunched to create meaningful information

Last week, an official business meet between the chiefs of social media giant Facebook and retail behemoth Walmart raised a few eyebrows.

While officially it was given to understand that Walmart, a retail major which lags behind others such as Amazon in online retail, was looking to enhance its social media presence, tech forums deliberated on the real purpose of the “relationship meet” — data. With over 800 million users, and needless to say, a lot of intricate and often geo-tagged personal data uploaded by them, Facebook presents a data trove like none other, and Walmart, which has been on the ball as far as technology goes, knows that. Just a few months ago, Walmart’s acquisition of ‘Social Calendar’ — a hugely popular Facebook app that people use to track birthdays — was also, obviously, about getting access to and using data, mostly personal, to make better and more customised business decisions.

Today, companies, both at home and globally, are waking up to the value of data. The growing interest in big data has obviously to do with the fact that it is worth big bucks. Driven by the explosion of social media, the all-pervasive use of mobile networks and cloud storage, data has gotten bigger and bigger, so much so that the term ‘big data’ — used in tech parlance to refer to data sets that are large and tough to manage — has come to be known as one that has no prescribed upper limit.

As storage capacity, computing power and parallel processing capabilities expand, the value of data is being realised better. That is, huge amounts of data (generated within the enterprise, or about it online or on social media) are being crunched to create insights or meaningful information. And increasingly, this process, which used to take hours and even days, is now being done in real time. While tools such as Hadoop allowed for large-scale batch analysis of data, Google’s Dremel and the open-source implementations developing in this ecosystem allow for ad-hoc querying of big data in real time.

Around half a decade ago, when analytics was still very much in its infancy, a popular and provocative article asked if analytics signalled the ‘end of theory’. In the petabyte age, the article pondered, will scientific analysis based on hypothesis, modelling and testing be rendered obsolete? Is theory not relevant anymore?

Today, big data enthusiasts agree. An ‘analyst’ is more of a “tool expert”, or someone proficient in using various data analytical tools, and there is a lot of demand in the market for someone who can do this well, says Rahul Kulkarni, senior product manager at Google India.



“And people are seeing the value in that. Earlier, people were not enthusiastic about storing data, but now they know that data contains insights that can aid crucial decision-making,” he explains. Earlier, taking this data and analysing it was a two to three week cycle, but now most of this is possible in real time, and the benefits of that are immense, he says.

However, several obstacles limit companies’ ability to turn this massive amount of unstructured data into profit, points out Mitesh Agarwal, Chief Technology Officer and Director, System Solution Consulting, Oracle India. The most prominent among them is a lack of understanding of how to add big data capabilities to the overall information architecture to build an all-pervasive big data architecture. “When big data is distilled and analysed in combination with traditional enterprise data, enterprises can develop a more thorough and insightful understanding of their business, which can lead to enhanced productivity, a stronger competitive position and greater innovation, all of which can have a significant impact on the bottom line.”

Technology-wise, companies are now focussing on ways to make the analytics and query interface as simple as possible. While internally Google uses Dremel to do this for its own processes, for its clients, Google provides analytics as a service. “What we attempt to deliver is analytics interfaces that are so simple that a marketing officer can use it to pose ad-hoc queries to the data set, and be able to extract information that can be used meaningfully,” Mr. Kulkarni explains.


With analytics an emerging tech field, several Indian companies, big and small, have their eyes set on it. The bigger outsourcers, such as Wipro, TCS and Infosys, are into analytics services; several larger global companies, across segments ranging from automobile to pharmaceutical, are getting their analytics done here.

Apart from them, many smaller companies and start-ups are into analytics services, and in some sense it is a natural progression from business process outsourcing to knowledge process outsourcing to analytics, says S. Anand, Chief Data Scientist at Gramener, a data visualisation company. His company is into analytics products and specialises in the emerging tech field of data visualisation.

“During the nineties, the services model did well and the products model in IT did not pick up. That seems to be changing, and in a field like analytics, it now appears we may have the advantage on both,” he says.