Which charting library to use?

Here are three links you should go through this week.

  1. What I learned recreating one chart using 24 tools is an excellent comparison by Lisa of 12 visualisation applications and 12 libraries, with a good summary of which tool to use when.
  2. Can we predict flu deaths with ML and R? Read this R notebook for a step-by-step walk-through of predicting whether a patient will survive or not. (There’s also a part 2 that improves on this model.)
  3. One of our colleagues nearly lost a piece of analysis recently. Here’s the most boring / valuable advice she can get on how to organise analysis, or any form of work for that matter. Of course, you could always learn git.

    If that doesn’t fix it, git.txt contains the phone number of a friend of mine who understands git. Just wait through a few minutes of ‘It’s really pretty simple, just think of branches as…’ and eventually you’ll learn the commands that will fix everything.

Gramener wins at Express IT awards

Gramener has won the Silver prize in the prestigious Analytics Solutions category at this year’s Express IT Awards. The awards were judged by an eminent jury panel comprising corporate strategists, academicians and thought leaders from the IT industry. Team Gramener is proud to receive the award from the chief guest of the event, Honourable Union Minister Mrs Nirmala Seetharaman.

A Data Scientist’s Laptop

What configuration should a data scientist go for?

A KDnuggets poll indicates a 3-4 core 5-16GB Windows machine.

A StackExchange thread recommends a 16GB RAM, 1TB SSD Linux system with a GPU.

A Quora thread converges around 16GB RAM.

RAM matters. Our experience is that RAM is the biggest bottleneck with large datasets. Things speed up by an order of magnitude when all your processing is in-memory. 16GB of RAM is ideal. Do not go below 8GB.
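As a rough way to act on this, here is a minimal Python sketch that checks whether a dataset is likely to be processed fully in memory. The function name `fits_in_memory`, the 3x working-copy overhead and the 25% of RAM reserved for the OS are all our own illustrative assumptions, not measured figures; tune them to your workload.

```python
GB = 1024 ** 3  # bytes in one gigabyte (binary)


def fits_in_memory(data_bytes, ram_bytes, overhead=3.0, reserve_fraction=0.25):
    """Return True if a dataset can probably be processed entirely in RAM.

    overhead: assumed multiple of the raw size needed for working copies
              (joins, intermediate frames); a guess, not a measurement.
    reserve_fraction: assumed share of RAM left for the OS and other apps.
    """
    usable = ram_bytes * (1 - reserve_fraction)
    return data_bytes * overhead <= usable


if __name__ == "__main__":
    # 2 GB of data on a 16 GB machine: needs ~6 GB, ~12 GB usable
    print(fits_in_memory(2 * GB, 16 * GB))
    # 5 GB of data on an 8 GB machine: needs ~15 GB, ~6 GB usable
    print(fits_in_memory(5 * GB, 8 * GB))
```

If the check fails for your typical dataset, that is the point at which more RAM (or a bigger cloud machine) is likely to pay off.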

Big drives. The next biggest driver is hard disk speed. But you don’t necessarily need an SSD. If your data fits in memory, then most data access is sequential, where an SSD is only ~2X faster than a regular hard disk, but much more expensive. (If you’re running a database, then an SSD makes more sense.) Among hard disks, larger ones are also faster due to higher storage density. So prefer the 1 TB disks.

The CPU doesn’t matter. Make sure you have more cores than data-intensive processes, but other than that, it’s not an issue.
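The cores rule of thumb is easy to check on the machine you already have. This is only a sketch: `enough_cores` is a hypothetical helper of ours, and it relies on Python's standard `os.cpu_count()`, which reports logical cores and can return `None` on some platforms.

```python
import os


def enough_cores(data_processes, cores=None):
    """Return True if there are more cores than data-intensive processes.

    cores: override for the core count; defaults to os.cpu_count(),
           falling back to 1 when the count cannot be determined.
    """
    if cores is None:
        cores = os.cpu_count() or 1
    return cores > data_processes


if __name__ == "__main__":
    # Example: planning to run 3 heavy jobs side by side on this machine
    print(enough_cores(3))
```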

However, one common theme we find is that heavy data science work happens on the cloud, not on the laptop. That’s what you need to look for: a good cloud environment that you can connect to.

For example, this Frontanalytics report recommends a basic laptop with long battery life, the ability to multi-task (i.e. multiple cores), and a backlit keyboard for the night.

Maybe you just need a USB port in your arm.

Damn. Not only did he not install it, he sutured a 'Vista-Ready' sticker onto my arm.