How does one create new data visualisations? Apart from the art, is there a science to it?
Let’s explore a few popular charts: the vertical bar graph, the horizontal bar graph, the stacked bar, the variwide (or Marimekko) chart, the waterfall, the scatterplot, the treemap, and so on.
The first thing you’ll observe is that each of these is a series of rectangles. (We’re treating the dots on the scatterplot as little squares.) The only things that vary across these charts are the position, size and colour of the rectangles.
That gives us a hint. Perhaps there are many ways of creating visualisations just by changing the position, size and colour of rectangles. For example, the horizontal bar graph can be constructed as follows:
- The x position is constant for each rectangle. It starts at zero.
- The width is proportional to the value of the series.
- The y position is proportional to the index of the values (1,2,3,…).
- The height is constant for each of the bars.
- The colour is constant too.
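The five rules above translate almost line for line into code. The following is a minimal sketch, not a plotting routine: the `Rect` type, the `horizontal_bar` function and the default colour name are hypothetical names introduced here for illustration, and the output is plain data that any drawing library could consume.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """One rectangle of a chart (illustrative type, not from any library)."""
    x: float
    y: float
    width: float
    height: float
    colour: str


def horizontal_bar(values, bar_height=1.0, colour="steelblue"):
    """Return one rectangle per value, following the five rules above."""
    return [
        Rect(
            x=0.0,                 # constant: every bar starts at zero
            y=index * bar_height,  # proportional to the index (1, 2, 3, ...)
            width=float(value),    # proportional to the value
            height=bar_height,     # constant for every bar
            colour=colour,         # constant too
        )
        for index, value in enumerate(values, start=1)
    ]


rects = horizontal_bar([30, 10, 20])
for rect in rects:
    print(rect)
```

Only `y` and `width` depend on the data here; everything else is fixed, which is exactly what makes this chart a plain bar graph.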
Whereas if we look at a horizontal stacked bar, then:
- The x position is proportional to the cumulative value of the series.
- The width is proportional to the value of the series.
- The y position is constant at zero.
- The height is constant for each of the bars.
- The colour is based on the index of the values (distinct colours labelled 1,2,3,…).
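The stacked bar differs mainly in how x is computed. Here is a small sketch of that running total, assuming "cumulative value" means the sum of all the segments that come before the current one, so the first segment starts at zero. The function name and the tuple layout are illustrative choices, not part of any library.

```python
from itertools import accumulate


def horizontal_stacked_bar(values, bar_height=1.0):
    """Return (x, y, width, height, colour_index) tuples for one stacked bar."""
    # Each segment starts where the previous ones end: the running total
    # of all earlier values (0 for the first segment).
    starts = [0, *accumulate(values)][:-1]
    return [
        (start, 0, value, bar_height, index)
        for index, (start, value) in enumerate(zip(starts, values), start=1)
    ]


segments = horizontal_stacked_bar([30, 10, 20])
for segment in segments:
    print(segment)
```

The colour is left as an index (1, 2, 3, …) rather than an actual colour, so that a palette can be chosen separately.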
Generalising this, we can construct a table like this that shows the structure of various visualisations:
| Chart | x | width | y | height | colour |
|---|---|---|---|---|---|
| Vertical bar chart | index | constant | constant | value | constant |
| Stacked bar | index | constant | cumulative | value | index |
| Waterfall | index | constant | cumulative | value | constant |
| Scatterplot | value | constant | value | constant | index |
| Horizontal bar chart | constant | value | index | constant | constant |
| Variwide | cumulative | value | constant | value | constant |
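One way to see that the table is really a specification is to treat each row as data and let a single function turn it into rectangles. The sketch below does that for three of the rows, under some simplifying assumptions that are not in the original: a single series of values feeds every attribute, "constant" means 0 for positions and colour and 1 for sizes, and "cumulative" means the running total before each value. The scatterplot is left out because it needs two separate series. All names (`CHARTS`, `DEFAULTS`, `column`, `build`) are hypothetical.

```python
from itertools import accumulate

# Three rows copied from the table; each maps an attribute to a kind of mapping.
CHARTS = {
    "Vertical bar chart": {"x": "index", "width": "constant", "y": "constant",
                           "height": "value", "colour": "constant"},
    "Waterfall": {"x": "index", "width": "constant", "y": "cumulative",
                  "height": "value", "colour": "constant"},
    "Horizontal bar chart": {"x": "constant", "width": "value", "y": "index",
                             "height": "constant", "colour": "constant"},
}

# Assumed values for "constant": positions and colour at 0, sizes at 1.
DEFAULTS = {"x": 0, "y": 0, "width": 1, "height": 1, "colour": 0}


def column(attr, kind, values):
    """Compute one attribute for every rectangle."""
    if kind == "constant":
        return [DEFAULTS[attr]] * len(values)
    if kind == "index":
        return list(range(1, len(values) + 1))
    if kind == "value":
        return list(values)
    if kind == "cumulative":
        return [0, *accumulate(values)][:-1]
    raise ValueError(f"unknown mapping: {kind}")


def build(chart, values):
    """Return one dict of attributes per rectangle for the named chart."""
    columns = {attr: column(attr, kind, values)
               for attr, kind in CHARTS[chart].items()}
    return [dict(zip(columns, row)) for row in zip(*columns.values())]


for rect in build("Waterfall", [30, 10, 20]):
    print(rect)
```

Changing a chart then means editing one word in a dictionary, not writing new drawing code.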
That leads to a line of thought: what if we tweaked this table? Would we get new visualisations that might be interesting?
Let’s experiment with a few.
What if we took the waterfall chart, and made the constant widths proportional to value, instead? The waterfall chart shows a cumulative series of values (e.g. percentages). This new chart – a cascade chart – allows us to depict each bar’s relative importance as well as value.
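A rough sketch of how such a cascade chart could be laid out follows. It makes two assumptions beyond the text: the width comes from a second series (here called `weights`, standing in for "relative importance"), and x becomes the running total of those widths so that the bars sit side by side instead of overlapping. The function name and sample numbers are made up for illustration.

```python
from itertools import accumulate


def cascade(values, weights):
    """Lay out a waterfall whose bar widths vary with a second series."""
    xs = [0, *accumulate(weights)][:-1]  # assumed: bars placed side by side
    ys = [0, *accumulate(values)][:-1]   # waterfall: start where the last bar ended
    return [
        {"x": x, "width": w, "y": y, "height": v}
        for x, w, y, v in zip(xs, weights, ys, values)
    ]


# Hypothetical data: three percentages and their relative importance.
steps = cascade(values=[40, 35, 25], weights=[5, 1, 2])
for step in steps:
    print(step)
```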
What if we kept the width, height and y constant, and just let the x values vary as the index? It would just be a row of boxes. But we’d have the option of colouring them with a value. This could be useful when showing performance along a discrete series (e.g. attendance by weekday).
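As an illustration, the row of boxes could be generated as below. This is a sketch under stated assumptions: the attendance figures are invented, and the colour scale is a simple linear grey ramp (light for low, dark for high), which is just one of many possible choices. `shade` and `box_row` are hypothetical helper names.

```python
def shade(value, low, high):
    """Map a value in [low, high] to a hex grey: light for low, dark for high."""
    t = 0.0 if high == low else (value - low) / (high - low)
    level = round(255 * (1 - t))
    return f"#{level:02x}{level:02x}{level:02x}"


def box_row(values, size=1.0):
    """Return equal-sized boxes in a row; only x and colour vary."""
    low, high = min(values), max(values)
    return [
        {"x": i * size, "y": 0.0, "width": size, "height": size,
         "colour": shade(v, low, high)}
        for i, v in enumerate(values)
    ]


# Made-up attendance percentages by weekday.
attendance = {"Mon": 82, "Tue": 95, "Wed": 90, "Thu": 88, "Fri": 70}
boxes = box_row(list(attendance.values()))
for day, box in zip(attendance, boxes):
    print(day, box["colour"])
```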
What if we allowed the x, y, width, height and colour each to vary with a different value? The graph looks like a scatterplot, but every dimension here – position, size, colour, even aspect ratio – indicates some informational measure.
This chart can show the position and spread of two metrics. For example, if the X-axis were sales and the Y-axis were price, each rectangle could show the distribution of price and sales in a branch, with the colour indicating the growth of the branch.
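To make the branch example concrete, here is one possible reading in code. It assumes that "position and spread" is summarised by the minimum and the range of each metric, so each rectangle spans from the lowest to the highest observation; a mean and standard deviation would be an equally valid choice. The function name and all figures are hypothetical.

```python
def branch_rect(sales, prices, growth):
    """Summarise one branch as a rectangle spanning its sales and price ranges."""
    return {
        "x": min(sales),                      # where the sales range starts
        "width": max(sales) - min(sales),     # spread of sales
        "y": min(prices),                     # where the price range starts
        "height": max(prices) - min(prices),  # spread of price
        "colour": growth,                     # to be mapped to a colour scale
    }


# Invented observations for a single branch.
rect = branch_rect(sales=[120, 150, 180], prices=[9.0, 10.0, 12.0], growth=0.08)
print(rect)
```

A wide, flat rectangle would then indicate a branch whose sales vary a lot at a fairly stable price, and a tall, narrow one the opposite.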
Just using the combinations discussed above, there are 75 possible types of visualisations – many of which are meaningful in different circumstances. And this is just using rectangles.
What we’ve done here is map data to the attributes of a visualisation. This is part of a generalised approach to graphics, similar to that covered by Leland Wilkinson’s Grammar of Graphics and implemented in libraries like ggplot2 or D3. Once we establish that basic concept – that a chart is a mapping of data to visual attributes – the variety of charts you’ll be able to create is unlimited, and you move from being a user of charts to a composer of data-driven visualisations.