Here are three links that you should go through this week:
- If you’re looking for ways to share your analysis, RStudio 1.0 is out. The biggest feature is R Notebooks, which are like Jupyter notebooks. At Gramener, we’re using RStudio Server to collaborate. Airbnb’s Knowledge Repo is another option.
- If you’re filtering data, be aware of Simpson’s paradox. It explains how Derek Jeter’s combined batting average can be higher than David Justice’s even though Justice had the higher average in every individual year.
- Prepare for data science interviews with this compilation of 109 data science interview questions.
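To see how Simpson’s paradox plays out with the Jeter and Justice example, here is a minimal Python sketch. The hits and at-bats are the commonly cited figures for the 1995 and 1996 seasons; they are an assumption for illustration, not numbers taken from the linked article, and the helper names (`records`, `yearly_averages`, `combined_average`) are made up for this example.

```python
from fractions import Fraction

# (hits, at_bats) per season. Commonly cited figures, assumed here
# for illustration; they are not taken from the linked article.
records = {
    "Derek Jeter":   {1995: (12, 48),   1996: (183, 582)},
    "David Justice": {1995: (104, 411), 1996: (45, 140)},
}


def yearly_averages(player):
    """Batting average for each season, as exact fractions."""
    return {year: Fraction(hits, at_bats)
            for year, (hits, at_bats) in records[player].items()}


def combined_average(player):
    """Batting average over all seasons pooled together."""
    seasons = list(records[player].values())
    total_hits = sum(hits for hits, _ in seasons)
    total_at_bats = sum(at_bats for _, at_bats in seasons)
    return Fraction(total_hits, total_at_bats)


for player in records:
    per_year = ", ".join(
        f"{year}: {float(avg):.3f}"
        for year, avg in yearly_averages(player).items()
    )
    combined = float(combined_average(player))
    print(f"{player}: {per_year}, combined: {combined:.3f}")
```

Justice comes out ahead in each season, but Jeter comes out ahead once the seasons are pooled. The reversal happens because the two players’ at-bats are spread very unevenly across the years, so the pooled average weights each player’s seasons differently.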
Speaking of Simpson’s paradox, be wary of statistical significance as well:
Here are three links you should go through this week:
- A catalogue of open public datasets, grouped by domain. (With this list, you won’t be short of sample data for any domain.)
- How to stay aware of the latest in data science? A short collection of newsletters with their frequency and quality.
- What mistakes do we make when deciding based on data? The cognitive bias cheat sheet lists dozens of biases we have and condenses them into a poster.
Be warned – all three links suffer from this same problem:
Here are three data stories this week that are worth your time:
- Will automated data science platforms take over? A discussion on reddit (12 min read)
- Which are the top LinkedIn groups for data science? An analysis on KDnuggets (7 min read)
- A history of the last 1,000 years of European royal families (visualisation)
We’ll be sharing links every week. (But that could be extrapolation.)