Here’s the video of the data wrangling session (1 hour).
Among other things, this session analyses the fascinating OKCupid dataset, where thousands of users have answered over 2,000 questions, covering:
- Cognitive questions (What causes Earth’s seasons?)
- Opinions (Do you think reptiles are cool?)
- Politics (Do you think taxes are justified?)
- Preferences (Do you enjoy gossip?)
- Sex (Do you read erotic fiction?)
This blog post is not about the talk, however. Before we started, we asked the audience what they would like covered, and this post is about what they asked.
Many people were interested in how projects are executed. For example, How do I execute a project? What happens behind the scenes? Can you show us the discarded analyses / visuals? What challenges do you face with big data? This workshop wasn’t meant for that, but we’ll plan one to talk about this.
Another popular theme was what resources are required for analysis. For example, What technologies do we use? What tools should we learn? What statistical / machine learning techniques should we know? This is a topic in itself, but we will reiterate that you can be a good analyst without learning statistics and with just Excel.
A third set of questions was about how we analyse data. For example, How do we clean data? How do we analyse data in a new domain? How do we determine the dependency between variables? This was the focus of the session.
A fourth set of questions emerged after the talk, and in one-on-one conversations. They were all about how to get started. For example, I’m new to data analytics: how do I get started? I’m experienced in other fields, but want to enter analytics: how do I get started? I’m looking for projects in analytics: how do I get started?
Our advice is:
- Practice. The 10,000 hour rule applies. The more datasets you analyse, the more you learn. Contests such as those on Kaggle or CrowdAnalytix help.
- Work together. Join communities such as DataMeet or DataKind to find others like you.
- Learn. Courses help you learn in a structured way. There are several of these, including those from EdX, IIMB, ISB, Jigsaw, Udacity and UpGrad.