This lesson is being piloted (Beta version)

Customising plots created with ggplot2

Overview

Teaching: 60 min
Exercises: 30 min
Questions
  • How can I add custom titles and labels to my plot?

  • How can I combine multiple panels into a single plot?

  • How can I change the overall appearance of a plot?

Objectives
  • Change the default title and labels in a plot.

  • Use facetting to increase the readability of complex plots.

  • Use themes to adjust the overall appearance of a plot.

We start by loading the required package. ggplot2 is included in the tidyverse package, so loading tidyverse also loads ggplot2.

library(tidyverse)

If you don’t have the data loaded in your current R session, you’ll have to import it into R before you can proceed.

interviews_plotting <- read_csv("data_output/interviews_plotting.csv")
Parsed with column specification:
cols(
  .default = col_logical(),
  key_ID = col_integer(),
  village = col_character(),
  interview_date = col_datetime(format = ""),
  no_membrs = col_integer(),
  years_liv = col_integer(),
  respondent_wall_type = col_character(),
  rooms = col_integer(),
  memb_assoc = col_character(),
  affect_conflicts = col_character(),
  liv_count = col_integer(),
  items_owned = col_character(),
  no_meals = col_integer(),
  months_lack_food = col_character(),
  instanceID = col_character(),
  number_month_lack_food = col_integer(),
  number_items = col_integer()
)
See spec(...) for full column specifications.

Adding Labels and Titles

By default, the axis labels on a plot are determined by the names of the variables being plotted. This is a reasonable default but often not sufficient for publication-quality figures. However, ggplot2 offers many customisation options, such as specifying the axis labels and adding a title to the plot, with relatively few lines of code. We will add more informative x- and y-axis labels to our plot of the proportion of wall types by village, and also add a title. In one of the previous lessons you’ve already seen how to change the y-axis label with ylab(). Unsurprisingly, the x-axis label can be changed with xlab(). The ggtitle() function allows you to set the plot title.

ggplot(data = interviews_plotting, aes(fill = respondent_wall_type, x = village)) +
    geom_bar(position = "fill") +
    # label each segment with its count, centred vertically, in white text
    stat_count(geom = "text",
               aes(label = after_stat(count)),
               position = position_fill(vjust = 0.5), colour = "white") +
    ylab("Proportion") + xlab("Village") + ggtitle("Proportion of wall type by village")

[Figure: proportion of wall type by village, with axis labels and a title]
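The three functions used above can also be combined into a single labs() call, which appears again later in this lesson. The sketch below is self-contained: it uses a small, made-up data frame (the hypothetical toy_interviews, standing in for interviews_plotting), so the counts mean nothing and only the code pattern matters.

```r
library(ggplot2)

# Hypothetical toy data standing in for interviews_plotting
toy_interviews <- data.frame(
  village = c("Chirodzo", "Chirodzo", "God", "God", "Ruaca", "Ruaca"),
  respondent_wall_type = c("burntbricks", "muddaub", "muddaub",
                           "sunbricks", "burntbricks", "cement")
)

# labs() sets both axis labels and the title in one call,
# equivalent to xlab() + ylab() + ggtitle()
p <- ggplot(data = toy_interviews, aes(fill = respondent_wall_type, x = village)) +
  geom_bar(position = "fill") +
  labs(x = "Village", y = "Proportion",
       title = "Proportion of wall type by village")

# Type p (or print(p)) to draw the plot
```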

Customising legends

That is a bit better, but the legend still uses the variable name as its title. The guides() function allows you to modify many aspects of the legends present in the plot. Each plot can contain multiple guides, one for each aesthetic that is mapped to a variable. In this bar plot, fill is mapped to respondent_wall_type. This is a categorical variable, for which ggplot generates a legend (continuous variables get a colour bar instead). Use guides(fill = guide_legend()) to modify its appearance.

ggplot(data = interviews_plotting, aes(fill = respondent_wall_type, x = village)) +
    geom_bar(position = "fill") +
    stat_count(geom = "text",
               aes(label = after_stat(count)),
               position = position_fill(vjust = 0.5), colour = "white") +
    ylab("Proportion") + xlab("Village") + ggtitle("Proportion of wall type by village") +
    # set the title of the legend for the fill aesthetic
    guides(fill = guide_legend(title = "Wall type"))

[Figure: the same bar plot with the legend titled "Wall type"]
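Because a legend title is just the label of an aesthetic, it can also be set with labs() by naming that aesthetic (here fill). This is a sketch of an alternative to the guides() call above, using a small, hypothetical toy data frame in place of interviews_plotting.

```r
library(ggplot2)

# Hypothetical toy data standing in for interviews_plotting
toy_interviews <- data.frame(
  village = c("Chirodzo", "Chirodzo", "God", "God", "Ruaca", "Ruaca"),
  respondent_wall_type = c("burntbricks", "muddaub", "muddaub",
                           "sunbricks", "burntbricks", "cement")
)

# labs(fill = ...) sets the title of the legend for the fill aesthetic
p <- ggplot(data = toy_interviews, aes(fill = respondent_wall_type, x = village)) +
  geom_bar(position = "fill") +
  labs(x = "Village", y = "Proportion", fill = "Wall type")
```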

Using guides() allows you to easily modify many aspects of the legend. However, it does not change the category labels, because the legend simply reflects the mapping of data values to visual properties. To change the category labels you have to adjust that mapping directly. The functions used to define the mapping all have names of the form scale_<aesthetic>_<type>(). Here you are dealing with a discrete scale for the fill aesthetic, so the correct function to use is scale_fill_discrete(). Its labels argument allows you to set the category labels. The labels are assigned to the categories in order, so list them in the same (here alphabetical) order as the categories.

ggplot(data = interviews_plotting, aes(fill = respondent_wall_type, x = village)) +
    geom_bar(position = "fill") +
    stat_count(geom = "text",
               aes(label = after_stat(count)),
               position = position_fill(vjust = 0.5), colour = "white") +
    ylab("Proportion") + xlab("Village") + ggtitle("Proportion of wall type by village") +
    # labels are matched to the categories in alphabetical order
    scale_fill_discrete(labels = c("burnt bricks", "cement", "mud daub", "sun bricks")) +
    guides(fill = guide_legend(title = "Wall type"))

[Figure: the same bar plot with readable wall type labels in the legend]
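Since the labels are matched to the categories by position, it can help to check the category order before typing them. The sketch below (on a small, hypothetical toy data frame) does that with levels(factor(...)), and also uses the scale's name argument, which sets the legend title in the same call so that a separate guides() call is not needed.

```r
library(ggplot2)

# Hypothetical toy data standing in for interviews_plotting
toy_interviews <- data.frame(
  village = c("Chirodzo", "Chirodzo", "God", "God", "Ruaca", "Ruaca"),
  respondent_wall_type = c("burntbricks", "muddaub", "muddaub",
                           "sunbricks", "burntbricks", "cement")
)

# For a character column the categories are sorted alphabetically,
# the same order that factor() uses for its levels
wall_levels <- levels(factor(toy_interviews$respondent_wall_type))
print(wall_levels)
# "burntbricks" "cement" "muddaub" "sunbricks"

# name = legend title; labels = one label per category, in the order above
p <- ggplot(data = toy_interviews, aes(fill = respondent_wall_type, x = village)) +
  geom_bar(position = "fill") +
  scale_fill_discrete(name = "Wall type",
                      labels = c("burnt bricks", "cement", "mud daub", "sun bricks"))
```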

Facetting

Rather than creating a single plot with side-by-side bars, you may want to create multiple plots. This is especially useful when you want to display several variables at once.

ggplot2 has a special technique called facetting that allows you to split one plot into multiple panels based on a categorical variable (factor) in the dataset. Let’s use this to show the relationship between irrigation association membership and wall type separately for each village.

ggplot(data = interviews_plotting, aes(fill = memb_assoc, x = respondent_wall_type)) +
    geom_bar(position = "fill") +
    stat_count(geom = "text",
               aes(label = after_stat(count)),
               position = position_fill(vjust = 0.5), colour = "white") +
    ylab("Proportion") + xlab("Wall type") + ggtitle("Proportion of irrigation association membership by wall type") +
    # one panel per village
    facet_wrap(~ village)

[Figure: proportion of association membership by wall type, one panel per village]
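The facetting variable can be given either as a one-sided formula (~ village, as above) or with vars(village); the two forms are equivalent. This sketch uses a small, hypothetical toy data frame with three villages and checks, via the PANEL column that layer_data() returns, that one panel is created per village.

```r
library(ggplot2)

# Hypothetical toy data: three villages, two wall types, yes/no membership
toy_interviews <- data.frame(
  village = rep(c("Chirodzo", "God", "Ruaca"), each = 4),
  respondent_wall_type = rep(c("burntbricks", "muddaub"), times = 6),
  memb_assoc = rep(c("yes", "no", "no", "yes"), times = 3)
)

# vars(village) is equivalent to the formula ~ village
p <- ggplot(data = toy_interviews, aes(fill = memb_assoc, x = respondent_wall_type)) +
  geom_bar(position = "fill") +
  facet_wrap(vars(village))

# layer_data() returns the computed layer data; PANEL records each row's facet
panels <- unique(layer_data(p, 1)$PANEL)
length(panels)
```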

As you can see, this uses association membership to colour the bars, with one stacked bar for each wall type and one panel per village. While this is a generally useful technique, and would be even more useful with data from more villages, the result in this particular case could be better. The three panels appear a bit cramped and the wall type labels are hard to read.

Let’s try spreading the panels across two rows instead.

ggplot(data = interviews_plotting, aes(fill = memb_assoc, x = respondent_wall_type)) +
    geom_bar(position = "fill") +
    stat_count(geom = "text",
               aes(label = after_stat(count)),
               position = position_fill(vjust = 0.5), colour = "white") +
    ylab("Proportion") + xlab("Wall type") + ggtitle("Proportion of irrigation association membership by wall type") +
    # arrange the panels in two rows
    facet_wrap(~ village, nrow = 2)

[Figure: the faceted bar plot with panels arranged in two rows]
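facet_wrap() also has an ncol argument, the counterpart of nrow. For example, ncol = 1 stacks the panels in a single column, giving each panel the full width of the plot, which may help when category labels are long. A sketch on a small, hypothetical toy data frame:

```r
library(ggplot2)

# Hypothetical toy data: three villages, two wall types, yes/no membership
toy_interviews <- data.frame(
  village = rep(c("Chirodzo", "God", "Ruaca"), each = 4),
  respondent_wall_type = rep(c("burntbricks", "muddaub"), times = 6),
  memb_assoc = rep(c("yes", "no", "no", "yes"), times = 3)
)

# ncol = 1 stacks the three village panels vertically
p <- ggplot(data = toy_interviews, aes(fill = memb_assoc, x = respondent_wall_type)) +
  geom_bar(position = "fill") +
  facet_wrap(~ village, ncol = 1)
```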

That looks a bit better, and the wider panels also make the wall type labels easier to read.

Exercise

Adjust the legend title and wall type labels as you did before. Which scale_*() function do you have to use to adjust the labels now?

Solution

ggplot(data = interviews_plotting, aes(fill = memb_assoc, x = respondent_wall_type)) +
    geom_bar(position = "fill") +
    stat_count(geom = "text",
               aes(label = after_stat(count)),
               position = position_fill(vjust = 0.5), colour = "white") +
    ylab("Proportion") + xlab("Wall type") +
    ggtitle("Proportion of irrigation association membership by wall type") +
    # wall type is now on the x axis, so its labels are set with the x scale
    scale_x_discrete(labels = c("burnt bricks", "cement", "mud daub", "sun bricks")) +
    guides(fill = guide_legend(title = "Association member")) +
    facet_wrap(~ village, nrow = 2)

[Figure: the faceted bar plot with a custom legend title and wall type labels]

Themes

A good strategy for dealing with x-axis labels that are too long for the available space is to rotate them by 45°. The placement and appearance of the labels is controlled by the theme. A ggplot theme determines the appearance of all parts of a plot that aren’t related to the data. Individual components of a theme can be adjusted with the theme() function, which accepts optional arguments for all components. Values for these arguments are typically created with functions whose names have the form element_<type>(). To change the rotation of the x-axis labels (axis.text.x) you’ll need element_text(), which allows you to set the angle. One other adjustment is needed: by default the centre of each text label is lined up with its corresponding tick mark. That works well for horizontal labels but not once they have been rotated. Set hjust = 1 to align the end of each label with the tick mark instead.

ggplot(data = interviews_plotting, aes(fill = memb_assoc, x = respondent_wall_type)) +
    geom_bar(position = "fill") +
    stat_count(geom = "text",
               aes(label = after_stat(count)),
               position = position_fill(vjust = 0.5), colour = "white") +
    ylab("Proportion") + xlab("Wall type") +
    ggtitle("Proportion of irrigation association membership by wall type") +
    facet_wrap(~ village, nrow = 2) +
    # rotate the x-axis labels and right-align them with their tick marks
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

[Figure: the faceted bar plot with x-axis labels rotated by 45°]
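If you use ggplot2 3.3.0 or later, there is an alternative to editing the theme: the axis guide itself can rotate the labels with guide_axis(angle = 45). According to the ggplot2 documentation, it should then pick suitable hjust and vjust values automatically. A sketch on a small, hypothetical toy data frame:

```r
library(ggplot2)

# Hypothetical toy data: three villages, two wall types, yes/no membership
toy_interviews <- data.frame(
  village = rep(c("Chirodzo", "God", "Ruaca"), each = 4),
  respondent_wall_type = rep(c("burntbricks", "muddaub"), times = 6),
  memb_assoc = rep(c("yes", "no", "no", "yes"), times = 3)
)

# Rotate the x-axis labels through the axis guide instead of theme()
p <- ggplot(data = toy_interviews, aes(fill = memb_assoc, x = respondent_wall_type)) +
  geom_bar(position = "fill") +
  facet_wrap(~ village, nrow = 2) +
  scale_x_discrete(guide = guide_axis(angle = 45))
```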

This approach is great if you just want to tweak one or two things, but for large-scale changes it quickly becomes tedious. Fortunately, ggplot provides several predefined themes that make large-scale changes to the appearance of a plot easy.

Let’s take a look at one of these. As you’ll have noticed, ggplot uses a grey background for plots. That works well enough on screen but may be undesirable in print. You can set the background to white using the function theme_bw().

ggplot(data = interviews_plotting, aes(x = respondent_wall_type, y = liv_count, 
                                       fill = memb_assoc, color = memb_assoc)) +
  geom_boxplot(alpha = 0.5) +
  theme_bw()

[Figure: box plots of livestock count by wall type and association membership, using theme_bw()]
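Adding theme_bw() to each plot works, but if you want every plot in the session to use it, theme_set() changes the default theme. It returns the previously active theme, so you can restore it later. A minimal sketch:

```r
library(ggplot2)

# Make theme_bw() the default theme for plots drawn in this session;
# theme_set() invisibly returns the theme that was active before
old_theme <- theme_set(theme_bw())

# ... create and draw your plots here ...

# Restore the previous default when you no longer need it
theme_set(old_theme)
```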

In addition to theme_bw(), which changes the plot background to white, ggplot2 comes with several other themes which can be useful to quickly change the look of your visualization. The complete list of themes is available at https://ggplot2.tidyverse.org/reference/ggtheme.html. theme_minimal() and theme_light() are popular, and theme_void() can be useful as a starting point to create a new hand-crafted theme.
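One way to compare several of these themes is to build the plot once and add each complete theme to it in turn. The sketch below uses a small, hypothetical toy data frame in place of interviews_plotting; the list names are arbitrary.

```r
library(ggplot2)

# Hypothetical toy data standing in for interviews_plotting
toy_interviews <- data.frame(
  respondent_wall_type = rep(c("burntbricks", "muddaub"), times = 4),
  memb_assoc = rep(c("yes", "no"), each = 4),
  liv_count = c(1, 2, 3, 2, 4, 1, 2, 3)
)

base_plot <- ggplot(toy_interviews, aes(x = respondent_wall_type, y = liv_count,
                                        fill = memb_assoc, color = memb_assoc)) +
  geom_boxplot(alpha = 0.5)

# A named list of complete themes to compare
themes <- list(bw = theme_bw(), minimal = theme_minimal(),
               light = theme_light(), void = theme_void())

# Add each theme to the same base plot, giving one variant per theme
plots <- lapply(themes, function(th) base_plot + th)

# Draw one variant at a time, e.g. print(plots$minimal)
```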

The ggthemes package provides a wide variety of options (including an Excel 2003 theme). The ggplot2 extensions website provides a list of packages that extend the capabilities of ggplot2, including additional themes.

Exercise

Experiment with at least two different themes. Build the previous plot using each of those themes. Which do you like best?

Customization

Exercise

You already know how to customise things like axis labels and point sizes. Change your plot to incorporate those changes.

Solution

ggplot(data = interviews_plotting, aes(x = respondent_wall_type, y = liv_count,
                                       fill = memb_assoc, color = memb_assoc)) +
  geom_boxplot(alpha = 0.5) +
  # the same title for fill and color merges the two legends into one
  labs(x = "Wall type", y = "Number of livestock owned",
       fill = "Association membership",
       color = "Association membership") +
  theme_bw()

[Figure: the box plot with custom axis and legend labels]

In addition to adjusting the labels, you can also change the font and its size. This can be useful to improve readability and to meet the requirements of a publisher. If you are on Windows, you may have to install the extrafont package and follow the instructions included in the README for this package.

Exercise

Take a look at the ggplot2 cheat sheet or the ggplot2 reference. Can you figure out how to change the font size?

Solution

ggplot(data = interviews_plotting, aes(x = respondent_wall_type, y = liv_count,
                                       fill = memb_assoc, color = memb_assoc)) +
  geom_boxplot(alpha = 0.5) +
  labs(x = "Wall type", y = "Number of livestock owned",
       fill = "Association membership",
       color = "Association membership") +
  theme_bw() + theme(text = element_text(size = 16))

[Figure: the box plot with a larger font size]
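The complete themes also accept a base_size argument, which scales all text in the theme, and a base_family argument for the font family. The sketch below assumes the generic family "serif", which R's standard graphics devices map to a serif font; other family names depend on the fonts installed on your system. The toy data frame is hypothetical.

```r
library(ggplot2)

# Hypothetical toy data standing in for interviews_plotting
toy_interviews <- data.frame(
  respondent_wall_type = rep(c("burntbricks", "muddaub"), times = 4),
  memb_assoc = rep(c("yes", "no"), each = 4),
  liv_count = c(1, 2, 3, 2, 4, 1, 2, 3)
)

p <- ggplot(toy_interviews, aes(x = respondent_wall_type, y = liv_count,
                                fill = memb_assoc, color = memb_assoc)) +
  geom_boxplot(alpha = 0.5) +
  labs(x = "Wall type", y = "Number of livestock owned") +
  # base_size sets the base font size; base_family assumes a generic "serif" font
  theme_bw(base_size = 16, base_family = "serif")
```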

If you prefer your changes to the default theme, you can save them as an object and easily apply them to other plots you create. We also add plot.title = element_text(hjust = 0.5) to centre the title; this only has a visible effect on plots that have a title:

grey_theme <- theme(axis.text.x = element_text(colour = "grey20", size = 12,
                                               angle = 45, hjust = 0.5, vjust = 0.5),
                    axis.text.y = element_text(colour = "grey20", size = 12),
                    text = element_text(size = 16),
                    plot.title = element_text(hjust = 0.5))


ggplot(data = interviews_plotting, aes(x = respondent_wall_type, y = liv_count,
                                       fill = memb_assoc, color = memb_assoc)) +
  geom_boxplot(alpha = 0.5) +
  labs(x = "Wall type", y = "Number of livestock owned",
       fill = "Association membership",
       color = "Association membership") +
  theme_bw() + grey_theme

[Figure: the box plot using the saved custom theme]
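Because theme objects can be added together, you can also store theme_bw() and your adjustments as a single object, so each plot needs only one addition. The sketch below does that (the name my_theme and the toy data are hypothetical) and gives the plot a title, so that the centring from plot.title = element_text(hjust = 0.5) is visible.

```r
library(ggplot2)

# A complete theme plus adjustments, stored as one reusable object
my_theme <- theme_bw() +
  theme(axis.text.x = element_text(colour = "grey20", size = 12,
                                   angle = 45, hjust = 0.5, vjust = 0.5),
        axis.text.y = element_text(colour = "grey20", size = 12),
        text = element_text(size = 16),
        plot.title = element_text(hjust = 0.5))  # centre the title

# Hypothetical toy data standing in for interviews_plotting
toy_interviews <- data.frame(
  respondent_wall_type = rep(c("burntbricks", "muddaub"), times = 4),
  memb_assoc = rep(c("yes", "no"), each = 4),
  liv_count = c(1, 2, 3, 2, 4, 1, 2, 3)
)

p <- ggplot(toy_interviews, aes(x = respondent_wall_type, y = liv_count,
                                fill = memb_assoc, color = memb_assoc)) +
  geom_boxplot(alpha = 0.5) +
  labs(title = "Livestock owned by wall type (toy data)") +
  my_theme
```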

Exercise

With all of this information in hand, take some time to explore the dataset. Can you identify an aspect of the data that may be interesting to visualise?

Create an informative and visually appealing plot that showcases this aspect of the data. The ggplot cheat sheet and the R graph gallery may provide some inspiration.

Here are some ideas you might explore:

  • Take another look at the scatterplot of household size and number of items owned you created this morning. Can you use geom_count() to improve this plot?
  • Create a plot showing how often respondents have been involved in conflicts with other irrigators (affect_conflicts). Does this differ between those that are members of an irrigation association and those that aren’t?
    • What type of plot is best suited for this?
    • Is the order in which ggplot() presents the factor levels appropriate? How could you change that?
  • To what extent does the number of months a household has not had sufficient food during the last year (number_month_lack_food) affect the number of meals members of the household have per day?
    • Does this differ between villages?
    • Note that the responses for number_month_lack_food are always recorded as whole months. Can you adjust the grid lines in the plot to only occur at values that are valid responses?

Note: Feel free to transform the data or compute derived variables as necessary.

Key Points

  • ggplot2 allows plots to be customised in many ways.

  • The appearance of plot elements can be adjusted individually or via themes.