Sharing analysis

Here are three links that you should go through this week:

  1. If you’re looking for ways to share your analysis, RStudio 1.0 is out. The biggest feature is R Notebooks, which are like Jupyter notebooks. At Gramener, we’re using RStudio Server to collaborate. Airbnb’s Knowledge Repo is another option.
  2. If you’re filtering data, be aware of Simpson’s paradox. It explains how Derek Jeter’s combined batting average can be higher than David Justice’s even though Justice had the higher average in each individual year.
  3. Prepare for data science interviews with this compilation of 109 data science interview questions.
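
The Simpson’s paradox in item 2 is easy to check with a little arithmetic. Below is a minimal sketch in Python, assuming the hits and at-bats usually quoted for the 1995 and 1996 seasons in this example; treat the figures as illustrative and check them against the linked article. The names `seasons`, `average` and `combined` are made up for this sketch.

```python
# (hits, at_bats) per season. These are the figures commonly quoted for
# this example; they are an assumption here, not taken from the newsletter.
seasons = {
    "Derek Jeter": {1995: (12, 48), 1996: (183, 582)},
    "David Justice": {1995: (104, 411), 1996: (45, 140)},
}


def average(hits, at_bats):
    """Batting average: hits divided by at-bats."""
    return hits / at_bats


def combined(player):
    """Average over all seasons: pool the hits and at-bats, then divide."""
    total_hits = sum(hits for hits, _ in seasons[player].values())
    total_at_bats = sum(at_bats for _, at_bats in seasons[player].values())
    return average(total_hits, total_at_bats)


# Per-season averages: Justice is ahead in both years.
for year in (1995, 1996):
    for player, record in seasons.items():
        print(year, player, round(average(*record[year]), 3))

# Pooled averages: Jeter is ahead once the seasons are combined.
for player in seasons:
    print("combined", player, round(combined(player), 3))
```

The reversal comes from the uneven at-bats: most of Jeter’s at-bats fall in his strong season, while most of Justice’s fall in his weaker one, so pooling the seasons weights them very differently.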

Speaking of Simpson’s paradox, be wary of statistical significance as well:

Awesome public datasets

Here are three links you should go through this week:

  1. A catalogue of open public datasets, grouped by domain. (With this list, you won’t be short of sample data for any domain.)
  2. How to stay aware of the latest in data science? A short collection of newsletters with their frequency and quality.
  3. What mistakes do we make when deciding based on data? The cognitive bias cheat sheet lists dozens of biases we have and condenses them into a poster.

Be warned – all three links suffer from this same problem:

Automated data science platforms

Here are three data stories this week that are worth your time:

  1. Will automated data science platforms take over? A discussion on reddit (12 min read)
  2. Which are the top LinkedIn groups for data science? An analysis on KDnuggets (7 min read)
  3. A history of the last 1,000 years of European royal families (visualisation)

We’ll be sharing links every week. (But that could be extrapolation.)