One of our colleagues nearly lost a piece of analysis recently. Here’s the most boring / valuable advice she can get on how to organise analysis — or any form of work for that matter. Of course, you could always learn git.
RAM matters. Our experience is that RAM is the biggest bottleneck with large datasets. Things speed up an order of magnitude when all your processing is in-memory. A 16GB RAM is an ideal configuration. Do not go below 8GB.
Big drives. The next biggest driver is the hard disk speed. But you don’t necessarily need an SSD. If your data fits in memory, then most data access is sequential. An SSD is only ~2X faster than a regular hard disk, but much more expensive. (If you’re running a database, then an SSD makes more sense.) For hard disks, larger hard disks are also faster due to higher storage density. So prefer the 1 TB disks.
The CPU doesn’t matter. Make sure you have more cores than data intensive processes, but other than that, it’s not an issue.
However, one common theme we find is that heavy data science work happens on the cloud, not on the laptop. That’s what you need to be looking for — a good cloud environment that you can connect to.
For example, this Frontanalytics report recommends a basic laptop with long battery life, the ability multi-task (i.e. multiple cores), and a backlit keyboard for the night.