Getting started with analytics

We held a workshop on Data Wrangling and EDA with UpGrad. The slides for wrangling and EDA, and the analyses on Cricket and Bank Marketing, are online.

Kathir explaining EDA at UpGrad

Here’s the video of the data wrangling session (1 hour).

Among oth­er things, this ses­sion ana­lyses the fas­cin­at­ing OKCupid data­set, where thou­sands of users have answered over 2,000 ques­tions, cov­er­ing:

  • Cognitive ques­tions (What causes Earth’s sea­sons?)
  • Opinions (Do you think rep­tiles are cool?)
  • Politics (Do you think taxes are jus­ti­fied?)
  • Preferences (Do you en­joy gos­sip?)
  • Sex (Do you read erot­ic fic­tion?)

This blog post is not about the talk, however. Before we started, we asked the audience what they would like covered, and this post is about what they asked.

Many people were in­ter­ested in how pro­jects are ex­ecuted. For ex­ample, How do I ex­ecute a pro­ject? What hap­pens be­hind the scenes? Can you show us the dis­carded ana­lyses / visu­als? What chal­lenges do you face with big data? This work­shop wasn’t meant for that, but we’ll plan one to talk about this.

Another popular theme was what resources are required for analysis. For example, What technologies do we use? What tools should we learn? What statistical / machine learning techniques should we know? This is a topic in itself, but we will reiterate that you can be a good analyst without learning statistics and with just Excel.

A third set of questions was about how we analyse data. For example, How do we clean data? How do we analyse data in a new domain? How do we determine the dependency between variables? This was the focus of the session.

A fourth set of questions emerged after the talk and in one-on-one conversations. They were all about how to get started. For example: I’m new to data analytics. How do I get started? I’m experienced in other fields but want to enter analytics. How do I get started? I’m looking for projects in analytics. How do I get started?

Our ad­vice is:

  1. Practice. The 10,000-hour rule applies. The more datasets you analyse, the more you learn. Contests such as those from Kaggle or CrowdAnalytix help.
  2. Work to­geth­er. Join com­munit­ies such as DataMeet or DataKind to find oth­ers like you.
  3. Learn. Courses help you learn in a structured way. There are several of these, including from EdX, IIMB, ISB, Jigsaw, Udacity and UpGrad.
