The Grammar of Graphics
Data and Aesthetics
Meaning in data visualization, just like meaning in natural language, is built upon a set of rules (a grammar). In natural language, there are different grammatical elements, such as “noun”, “verb”, “adverb” and so on. The “grammar of graphics” similarly has its own elements. The first three are essential:
|Element|Description|
|---|---|
|Data|The dataset to be plotted|
|Aesthetics|The scales onto which we map our data|
|Geometries|The visual elements used for our data|
The first and most fundamental element is the data to be visualized. Next, the aesthetics govern how the data are mapped onto the plot's scales. Then the geometries describe the type of visual element used to draw the data. There are many different geometries, or geoms, such as points, lines and bars.
In the ggplot below, you can see the 3 essential elements:
- The dataset to be visualized, `mtcars`.
- The aesthetics, `aes()`, which tell ggplot that the variable `wt` is to be mapped to the x axis and the variable `mpg` is to be mapped to the y axis.
- The geom called `geom_point()`, which adds a layer of points to show the selected variables.
```r
library(tidyverse)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
```
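Because each element of the grammar is a separate component, the same plot can also be assembled one piece at a time with `+`. The sketch below should draw the same scatter plot as the code above; the object names `p_data`, `p_aes` and `p_full` are purely illustrative.

```r
library(ggplot2)

# Data: start a plot from the mtcars data frame (no mappings or layers yet)
p_data <- ggplot(mtcars)

# Aesthetics: map wt to the x axis and mpg to the y axis
p_aes <- p_data + aes(x = wt, y = mpg)

# Geometries: add a layer of points that uses those mappings
p_full <- p_aes + geom_point()

# Printing the object draws the plot
p_full
```

Keeping the pieces in separate objects makes it easy to try different geoms on the same data and aesthetics.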
Aesthetics: Colour, Size, Shape & Transparency
Now let’s add more aesthetic detail by colouring the points according to the number of cylinders, `cyl`:
```r
ggplot(mtcars, aes(x = wt, y = mpg, colour = cyl)) +
  geom_point()
```
Because `cyl` is stored as a number, ggplot treats it as a continuous variable, so the points are coloured along a gradient from dark to light blue.
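A quick way to see why this happens is to check how `cyl` is stored in `mtcars`, and what `factor()` does to it. This uses only base R, since `mtcars` ships with R itself:

```r
# cyl is a plain number, which is why ggplot picks a continuous colour scale
class(mtcars$cyl)          # "numeric"

# ...even though it only takes three distinct values
sort(unique(mtcars$cyl))   # 4 6 8

# factor() turns those values into discrete categories (levels)
levels(factor(mtcars$cyl)) # "4" "6" "8"
```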
Recall from the previous notes that we forced ggplot to treat `cyl` as a factor. We can do so again here; note how the plot changes:
```r
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point()
```
This discrete colouring is more appropriate and makes it easier to pick out cars by their number of cylinders. It shows immediately that cars with more cylinders tend to have lower fuel efficiency.
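The pattern seen in the plot can be cross-checked numerically. A minimal sketch using dplyr (one of the tidyverse packages) that averages `mpg` within each cylinder group; the name `mpg_by_cyl` is just illustrative:

```r
library(dplyr)

# Average fuel efficiency (mpg) and number of cars for each cylinder count
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

mpg_by_cyl
```

If the visual impression is right, `mean_mpg` should fall as `cyl` rises from 4 to 6 to 8.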
Let’s add even more detail to the aesthetics. This time, we will map `size` to the variable `disp` (engine displacement, a measure of engine size):
```r
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl), size = disp)) +
  geom_point()
```
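To see concretely what “mapping data onto scales” produces, ggplot2's `layer_data()` function returns the values ggplot computes for each point of a layer. A sketch for the plot above (`p` and `d` are illustrative names):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl), size = disp)) +
  geom_point()

# layer_data() returns the data for the first layer after the scales have been
# applied: each mapped aesthetic becomes a column (colour as a hex code,
# size as a point size on ggplot's own scale)
d <- layer_data(p)
head(d[, c("x", "y", "colour", "size")])
```

Each row is one car: `x` and `y` are still the raw `wt` and `mpg` values, while `colour` and `size` have been translated from `cyl` and `disp` into visual properties.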
We can also distinguish cars by another variable, `gear` (the number of forward gears), by mapping it to the `shape` aesthetic. Shape only works with a categorical variable, so we tell ggplot to treat `gear` as a factor.
```r
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl), size = disp,
                   shape = factor(gear))) +
  geom_point()
```
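The requirement that `shape` be categorical can be checked directly: leaving `gear` as a number should make ggplot2 refuse to build the plot. A minimal sketch (`p_bad` and `result` are illustrative names; the exact wording of the error message varies between ggplot2 versions):

```r
library(ggplot2)

# gear is stored as a number, so this maps a continuous variable to shape
p_bad <- ggplot(mtcars, aes(x = wt, y = mpg, shape = gear)) +
  geom_point()

# The problem only surfaces when the plot is built (e.g. when it is printed),
# so ggplot_build() is wrapped in tryCatch() to capture the error object
result <- tryCatch(ggplot_build(p_bad), error = function(e) e)

inherits(result, "error")
conditionMessage(result)
```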
Finally, we can distinguish cars by transparency using the `alpha` aesthetic. Unlike `shape`, `alpha` can handle continuous variables; here we map it to `wt`:
```r
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl), size = disp,
                   shape = factor(gear), alpha = wt)) +
  geom_point()
```
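Transparency is itself a number between 0 (fully transparent) and 1 (fully opaque), so ggplot2 can rescale a continuous variable straight onto it. A sketch that isolates the `alpha = wt` mapping and inspects it with `layer_data()` (`p` and `d` are illustrative names):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, alpha = wt)) +
  geom_point()

d <- layer_data(p)

# alpha is rescaled from wt, so heavier cars get larger (more opaque) values
range(d$alpha)

# x holds the raw wt values; a correlation close to 1 confirms the rescaling
cor(d$x, d$alpha)
```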