How does one create new data visualisations? Apart from the art, is there a science to it?

Let’s explore a few popular charts: the vertical bar graph, the horizontal bar graph, the stacked bar, the variwide or Marimekko chart, the waterfall, the scatterplot, the treemap, and so on.

The first thing you’ll observe is that all of these are a series of rectangles. (We’re treating the dots on the scatterplot as little squares.) The only thing that varies across these charts is the position and size of the rectangles – and the colour as well.

That gives us a hint. Perhaps there are many ways of creating visualisations just by changing the position, size and colour of rectangles. For example, the horizontal bar graph can be constructed as follows:

- The **x position** is **constant** for each rectangle. It starts at zero.
- The **width** is proportional to the **value** of the series.
- The **y position** is proportional to the **index** of the values (1,2,3,…).
- The **height** is **constant** for each of the bars.
- The **colour** is **constant** too.
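
As a minimal sketch, the five rules above can be written as a function that turns a series into rectangles. The `Rect` type, the function name, the default bar height and the colour name are all illustrative assumptions, not part of any charting library.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float
    colour: str


def horizontal_bars(values, bar_height=0.8, colour="steelblue"):
    """Return one rectangle per value, following the five rules listed above."""
    return [
        Rect(
            x=0.0,               # x position: constant, starts at zero
            y=float(index),      # y position: proportional to the index (1, 2, 3, ...)
            width=float(value),  # width: proportional to the value
            height=bar_height,   # height: constant (0.8 is an arbitrary choice)
            colour=colour,       # colour: constant
        )
        for index, value in enumerate(values, start=1)
    ]


bars = horizontal_bars([30, 10, 20])
```

Any drawing library that can draw a filled rectangle could render `bars`; only the mapping from data to attributes matters here.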

For a horizontal stacked bar, by contrast:

- The **x position** is proportional to the **cumulative** value of the series.
- The **width** is proportional to the **value** of the series.
- The **y position** is **constant** at zero.
- The **height** is **constant** for each of the bars.
- The **colour** is based on the **index** of the values (distinct colours labelled 1,2,3,…).
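
The stacked version can be sketched the same way. This assumes that "cumulative" means the running total of all *earlier* values, so that each segment starts where the previous one ends; the function name and dictionary keys are illustrative.

```python
from itertools import accumulate


def stacked_segments(values, bar_height=0.8):
    """Return one segment per value, laid end to end along the x axis."""
    # Assumption: the left edge of a segment is the sum of the values before it.
    starts = [0, *accumulate(values)][:-1]
    return [
        {
            "x": start,            # x position: cumulative value
            "width": value,        # width: proportional to the value
            "y": 0,                # y position: constant at zero
            "height": bar_height,  # height: constant
            "colour": index,       # colour: picked by index (1, 2, 3, ...)
        }
        for index, (start, value) in enumerate(zip(starts, values), start=1)
    ]


segments = stacked_segments([30, 10, 20])
```

Compared with the plain horizontal bar, only the rules for x, y and colour change; the width rule is identical.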

Generalising this, we can construct a table like this that shows the structure of various visualisations:

Chart | x | width | y | height | colour |
---|---|---|---|---|---|
Vertical bar chart | index | constant | constant | value | constant |
Stacked bar | index | constant | cumulative | value | index |
Waterfall | index | constant | cumulative | value | constant |
Scatterplot | value | constant | value | constant | index |
Horizontal bar chart | constant | value | index | constant | constant |
Variwide | cumulative | value | constant | value | constant |
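
One way to make the table executable is to treat each row as a specification and resolve it with a single generic function. The sketch below is an illustration under stated assumptions: the names `CHARTS`, `CONSTANTS`, `encode` and `rectangles` are hypothetical, "constant" is taken to mean 0 for positions and 1 for sizes and colour, and "cumulative" is the running total of earlier values. It also works from a single series, so a scatterplot (which really needs two measures) would get the same numbers for x and y.

```python
from itertools import accumulate

ATTRIBUTES = ("x", "width", "y", "height", "colour")

# Rows copied from the table, in ATTRIBUTES order.
CHARTS = {
    "vertical bar":   ("index", "constant", "constant", "value", "constant"),
    "stacked bar":    ("index", "constant", "cumulative", "value", "index"),
    "waterfall":      ("index", "constant", "cumulative", "value", "constant"),
    "scatterplot":    ("value", "constant", "value", "constant", "index"),
    "horizontal bar": ("constant", "value", "index", "constant", "constant"),
    "variwide":       ("cumulative", "value", "constant", "value", "constant"),
}

# Assumed defaults for "constant": positions sit at 0, sizes and colour at 1.
CONSTANTS = {"x": 0, "width": 1, "y": 0, "height": 1, "colour": 1}


def encode(values, rule, attribute):
    """Turn a series into one number per rectangle for a single attribute."""
    if rule == "constant":
        return [CONSTANTS[attribute]] * len(values)
    if rule == "index":
        return list(range(1, len(values) + 1))
    if rule == "value":
        return list(values)
    if rule == "cumulative":
        return [0, *accumulate(values)][:-1]
    raise ValueError(f"unknown rule: {rule}")


def rectangles(chart, values):
    """Build the rectangles for one row of the table."""
    columns = [
        encode(values, rule, attribute)
        for attribute, rule in zip(ATTRIBUTES, CHARTS[chart])
    ]
    return [dict(zip(ATTRIBUTES, row)) for row in zip(*columns)]


print(rectangles("variwide", [30, 10, 20]))
```

Adding a new chart type is then a matter of adding one row to `CHARTS`, which is exactly the kind of tweak the table invites.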

That leads to a line of thought: what if we tweaked this table? Would we get new visualisations that might be interesting?

Let’s experiment with a few.

What if we took the waterfall chart and made the widths proportional to **value** instead of **constant**? The waterfall chart shows a cumulative series of values (e.g. percentages). This new chart – a cascade chart – allows us to depict each bar’s relative importance as well as its value.

What if we kept the **width**, **height** and **y** constant, and just let the **x** values vary as the **index**? It would just be a row of boxes. But we’d have the option of **colour**ing them with a **value**. This could be useful when showing performance along a discrete series (e.g. attendance by weekday).
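
A rough sketch of this row of boxes follows. The attendance figures are made up for illustration, and the linear blend between two RGB colours is just one assumed way of turning a value into a colour.

```python
def colour_strip(values, low=(255, 255, 255), high=(0, 90, 160)):
    """Return a row of unit boxes whose colour encodes the value."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    boxes = []
    for index, value in enumerate(values, start=1):
        t = (value - lo) / span  # 0 for the smallest value, 1 for the largest
        rgb = tuple(round(a + (b - a) * t) for a, b in zip(low, high))
        boxes.append({
            "x": index,                          # x: varies with the index
            "y": 0,                              # y: constant
            "width": 1,                          # width: constant
            "height": 1,                         # height: constant
            "colour": "#%02x%02x%02x" % rgb,     # colour: varies with the value
        })
    return boxes


attendance = [92, 88, 95, 81, 70]  # hypothetical attendance, Monday to Friday
boxes = colour_strip(attendance)
```

The lowest value comes out white and the highest a dark blue, so a dip on one weekday stands out without any bars at all.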

What if we allowed **x**, **y**, **width**, **height** and **colour** to each vary with a different **value**? The graph looks like a scatterplot, but every dimension here – position, size, colour, even aspect ratio – indicates some informational measure.

This chart can, for example, show the position *and* spread of two metrics. If the X-axis were sales and the Y-axis were price, each rectangle could show the distribution of price and sales in a branch, with the colour indicating the branch’s growth.
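
To make the branch example concrete, here is a small sketch in which each rectangle spans the observed range of sales and prices for one branch. Using the minimum-to-maximum range as the "spread" is an assumption (a standard deviation or interquartile range would work equally well), and the branch figures are invented.

```python
def spread_rect(sales, prices, growth):
    """Return one rectangle covering a branch's range of sales and prices."""
    return {
        "x": min(sales),                        # x: where the sales range starts
        "width": max(sales) - min(sales),       # width: spread of sales
        "y": min(prices),                       # y: where the price range starts
        "height": max(prices) - min(prices),    # height: spread of prices
        "colour": growth,                       # colour: growth of the branch
    }


# Hypothetical data for two branches: (sales, prices, growth).
branches = {
    "North": ([120, 150, 180], [9.0, 10.5, 12.0], 0.08),
    "South": ([60, 65, 70], [14.0, 18.0, 22.0], -0.02),
}
rects = {name: spread_rect(*data) for name, data in branches.items()}
```

Here the North rectangle would be wide and short (sales vary a lot, prices little), while the South rectangle would be narrow and tall, so the aspect ratio itself carries information.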

Just using the combinations discussed above, there are 75 possible types of visualisations – many of which are meaningful in different circumstances. And this is just using rectangles.

What we’ve done here is map data to the attributes of a visualisation. This is part of a generalised approach to graphics, similar to that covered by Leland Wilkinson’s Grammar of Graphics and implemented in libraries like ggplot2 or D3. Once we establish that basic concept – that a chart is a mapping of attributes to data – the variety of charts you’ll be able to create is unlimited, and you move from being a user of charts to a composer of data-driven visualisations.