Sharing analysis

Here are three links that you should go through this week:

  1. If you’re looking for ways to share your analysis, RStudio 1.0 is out. Its biggest feature is R Notebooks, which work much like Jupyter notebooks. At Gramener, we use RStudio Server to collaborate. Airbnb’s Knowledge Repo is another option.
  2. If you’re filtering data, be aware of Simpson’s paradox. It explains how Derek Jeter’s combined batting average can be higher than David Justice’s, even though Justice had the better average in every individual year.
  3. Prepare for data science interviews with this compilation of 109 data science interview questions.
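To see how Simpson’s paradox arises, here is a small worked example. It uses the hits and at-bats for Jeter and Justice in the 1995 and 1996 seasons, as commonly cited in write-ups of the paradox (for example, Wikipedia’s article). The dictionary layout and the helper names `average` and `combined_average` are illustrative choices, not taken from any particular source.

```python
# Hits and at-bats per season, as commonly cited for the Jeter/Justice
# example of Simpson's paradox (1995 and 1996 seasons).
seasons = {
    "Derek Jeter": {1995: (12, 48), 1996: (183, 582)},
    "David Justice": {1995: (104, 411), 1996: (45, 140)},
}


def average(hits, at_bats):
    """Batting average: hits divided by at-bats."""
    return hits / at_bats


def combined_average(record):
    """Pool hits and at-bats across all seasons, then divide once."""
    hits = sum(h for h, _ in record.values())
    at_bats = sum(ab for _, ab in record.values())
    return average(hits, at_bats)


for player, record in seasons.items():
    # Per-season averages, rounded to the usual three decimal places.
    yearly = {year: round(average(h, ab), 3) for year, (h, ab) in record.items()}
    print(player, yearly, "combined:", round(combined_average(record), 3))
```

Justice is ahead in each season (.253 vs .250, then .321 vs .314), yet Jeter’s pooled average is higher (about .310 vs .270). The reason is that most of Jeter’s at-bats fall in his strong 1996 season, while most of Justice’s fall in his weaker 1995 season. Averaging within groups and averaging the pooled data can therefore point in opposite directions.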

Speaking of Simpson’s paradox, be wary of statistical significance as well:

Awesome public datasets

Here are three links you should go through this week:

  1. A catalogue of open public datasets, grouped by domain. (With this list, you won’t be short of sample data for any domain.)
  2. How do you stay up to date with the latest in data science? A short collection of newsletters, rated by frequency and quality.
  3. What mistakes do we make when deciding based on data? The cognitive bias cheat sheet lists dozens of our biases and condenses them into a poster.

Be warned – all three links suffer from this same problem:

Automated data science platforms

Here are three data stories this week that are worth your time:

  1. Will automated data science platforms take over? A discussion on Reddit (12 min read)
  2. Which are the top LinkedIn groups for data science? An analysis on KDnuggets (7 min read)
  3. A history of the last 1,000 years of European royal families (visualisation)

We’ll be sharing links every week. (But that could be extrapolation.)