Ordinations (Data Reduction and Visualization)

43 General Graphing Principles

Learning Objectives

To introduce the quick.metaMDS() function which encodes desired default settings for metaMDS() arguments.

To review how to create graphics with base R capabilities, including symbol shapes, colors, and combining multiple graphics in a single figure.

To review how to create graphics with ggplot2, including adjusting aesthetics, themes, and using faceting to create separate but related graphics.

To identify ways to save graphics as digital files.

Key Packages

library(vegan)
library(tidyverse)

Introduction

One of the advantages of software like R is that it can perform sophisticated analyses yet also has extremely powerful graphical capabilities.  This allows us to conduct our statistical analyses and to customize the ordinations that visualize patterns.

These notes include examples using the base R graphing capabilities and the (more powerful) features of ggplot2.  As you will glimpse here, virtually every element of a graph can be customized.

Aside: The quick.metaMDS() Function

We’ll use an NMDS ordination to illustrate the general graphing principles.  We saw in the NMDS chapter that the metaMDS() function includes many arguments.  We may want to change some arguments from their defaults (e.g., autotransform), and we usually want to keep those settings consistent throughout our code.  Rather than typing them out every time, we can hard-code them in a function:

quick.metaMDS <- function(dataframe, dimensions, n.try = 40) {
  require(vegan)

  metaMDS(comm = dataframe,
    autotransform = FALSE,
    distance = "bray",
    engine = "monoMDS",
    k = dimensions,
    weakties = TRUE,
    model = "global",
    maxit = 300,
    try = n.try,
    trymax = 100,
    wascores = TRUE)
}

The arguments that change in this function are dataframe, dimensions, and n.try, which are passed to the comm, k, and try arguments of metaMDS(), respectively.  To use this function, we need to specify dataframe and the desired number of dimensions.  The n.try argument refers to the minimum number of starts from new random coordinates.  It has a default of 40; we can override this value if desired (e.g., n.try = 50), but if we don’t specify it the function will execute with 40 starts from new random coordinates.

To adjust other aspects of the NMDS ordination, we can either:

  • Permanently change them within the code of the function (e.g., edit function code so that trymax = 200, or so that distance = "euclidean")
  • Include them in the set of arguments associated with quick.metaMDS()
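As a minimal sketch of the second option: the variant below promotes distance to an argument with its own default so that it can be changed per call.  The name quick.metaMDS2 is hypothetical (it is not in the course repository), vegan's built-in dune dataset stands in for the oak data, and trace = 0 is added here only to keep the console output short.

```r
library(vegan)

# Hypothetical variant of quick.metaMDS(): 'distance' becomes an argument
# with a default, so it can be overridden without editing the function.
quick.metaMDS2 <- function(dataframe, dimensions, n.try = 40,
                           distance = "bray") {
  metaMDS(comm = dataframe,
          autotransform = FALSE,
          distance = distance,
          engine = "monoMDS",
          k = dimensions,
          weakties = TRUE,
          model = "global",
          maxit = 300,
          try = n.try,
          trymax = 100,
          wascores = TRUE,
          trace = 0)   # assumption: suppress progress messages
}

data(dune)     # example community data shipped with vegan (not the oak data)
set.seed(42)   # make the random starts reproducible

# Use the default distance measure ...
dune.bray <- quick.metaMDS2(dataframe = dune, dimensions = 2)

# ... or override it, along with the number of random starts
dune.euc <- quick.metaMDS2(dataframe = dune, dimensions = 2,
                           n.try = 20, distance = "euclidean")
```

The trade-off is that every argument promoted in this way lengthens the function signature, so it is probably worth doing only for settings that genuinely vary between analyses.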

The quick.metaMDS() function is available in the GitHub repository.  Save it to the ‘functions’ subfolder within the folder where your R project is located.

Oak Example

We’ll use the oak plant community dataset to illustrate graphical procedures.  Use the load.oak.data.R script to load and make initial adjustments to the oak plant community dataset.  The resulting object, Oak1, contains abundances of the 103 most abundant species, each relativized by its maximum.

source("scripts/load.oak.data.R")

 

We’ll conduct an NMDS ordination of the Oak1 object and use that as our example below.  We’ll load the quick.metaMDS() function and then call it:

source("functions/quick.metaMDS.R")

Oak1.z <- quick.metaMDS(dataframe = Oak1,
dimensions = 3)

Base R Graphing

The base R graphing capabilities are helpful to know in their own right, but are also helpful because they are the basis for other approaches such as ggplot2.  For example, the shapes assigned to points in ggplot() are identified using the numerical codes shown below for plotting character (pch).

For more information about general plotting and graphics in R, see Venables & Ripley (2000), Dalgaard (2008), Murrell (2006), and Sarkar (2008).  In addition, there are numerous helpful websites (e.g., https://www.statmethods.net/graphs/index.html).

 

As we’ve seen numerous times, R contains functions whose actions depend on the class of the object being manipulated.  For example, the plot() function can be applied to class ‘metaMDS’ objects:

plot(Oak1.z)

First two axes of a 3-dimensional NMDS ordination of the oak plant community dataset.  Stands are shown by open circles and species centroids by red ‘+’ signs.

We accepted all defaults during this action; one of those defaults is to plot both the site coordinates and species coordinates.  We can specify that we only want to display the site coordinates:

plot(Oak1.z,
display = "sites")

First two axes of a 3-dimensional NMDS ordination of the oak plant community dataset.  Stands are shown by open circles.

Plotting Functions

To customize graphics, it is helpful to break the graphing down into a series of functions.  Several commonly used functions are shown in the following table.  See the help files for arguments and details.

Function Purpose
plot() Plot graph.  Can specify whether to plot with points (default; type = "p") or axes only (no data; type = "n").  The latter type gives you control over the points and other elements via subsequent functions.  When applied to a vegan ordination object, textual labels (type = "t") are also available.
points() Plot points onto existing graph.
text() Add text to existing graph.
lines() Add line(s) to existing graph.
legend() Add legend to existing graph.
title() Add title(s) to existing graph.
abline() Add reference lines to existing graph.  Can specify the intercept (a) and slope (b) of the line, the y-value for a horizontal line (h), and the x-value for a vertical line (v).
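The functions in this table can be combined as in the following sketch, which builds a graph in layers.  The coordinates are simulated purely for illustration; they are not the oak data, and the axis labels and legend text are placeholders.

```r
# Simulated coordinates standing in for ordination scores
set.seed(1)
xy <- data.frame(x = rnorm(12), y = rnorm(12))

# Empty frame: axes are scaled to the data but nothing is drawn yet
plot(xy$x, xy$y, type = "n", xlab = "Axis 1", ylab = "Axis 2")

# Dotted reference lines through the origin
abline(h = 0, v = 0, lty = 3)

# Add the points, then label each one just above its symbol
points(xy$x, xy$y, pch = 19)
text(xy$x, xy$y, labels = seq_len(nrow(xy)), pos = 3, cex = 0.8)

# Join the vertices of the convex hull with a dashed line
hull <- chull(xy$x, xy$y)
lines(xy$x[hull], xy$y[hull], lty = 2)

legend("topleft", legend = "Simulated site", pch = 19)
title(main = "Building a plot in layers")
```

Because each element is added by a separate call, any one of them can be commented out or restyled without touching the others.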

Key Plotting Arguments

Each of the above functions has associated arguments as described in the help files.  However, they also have a ‘…’ at the end of the list of arguments.  This means that we can also specify additional arguments.  In particular, we can draw on the arguments from other graphing functions.  The following – and many more – arguments are available; see the help files for functions such as par(), which sets global parameters, for details.

Argument Description
main Text to print as main title for the figure.
sub Text to print as a sub-title for the figure.
cex Scaling for symbols and text.  Default is 1; 1.5 is 50% larger while 0.5 is 50% smaller.
pch Plotting character – either an integer specifying a symbol or a single character to be used as the default in plotting points.  The ‘examples’ section of the help file for the points() function includes code to produce this visual list of options:

 

col Default plotting color.  See below for details.
xaxt Axis type for horizontal axis: "n" = none; "s" = standard (default)
yaxt Axis type for vertical axis: "n" = none; "s" = standard (default)
xlim The limits of the horizontal axis.  Unless otherwise specified, R automatically sets the limits of x to be slightly larger than the minimum and maximum values of the data being plotted. Specify as xlim = c(x1, x2).  Note that x1 > x2 is allowed and leads to a ‘reversed axis’.
ylim The limits of the vertical axis.  Unless otherwise specified, R automatically sets the limits of y so that they are slightly larger than the minimum and maximum values of the data being plotted. Specify as ylim = c(y1, y2).  Note that y1 > y2 is allowed and leads to a ‘reversed axis’.
xlab Text to print as the label on the horizontal axis.
ylab Text to print as the label on the vertical axis.
lty Line type:

  • 0 = blank
  • 1 = solid (default)
  • 2 = dashed
  • 3 = dotted
  • 4 = dotdash
  • 5 = longdash
  • 6 = twodash
lwd Line width: default is 1, 2 is twice as thick.
asp Aspect ratio, calculated as y/x.
pin A vector giving the current plot dimensions, (width,height), in inches.
font Font style:

  • 1 = plain
  • 2 = bold
  • 3 = italic
  • 4 = bold italic
  • 5 = symbol
family Font family. The actual font may differ depending on whether you are running R on Windows or Mac.  On Windows, the default options are "serif" (Times New Roman), "sans" (Arial), "mono" (Courier New), and "symbol" (Symbol).
ps Font point size.  The default depends on the graphics device (typically 12).  Text size = ps * cex.
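Several of these arguments are illustrated together in the sketch below.  The data are arbitrary numbers chosen only to give the arguments something to act on.

```r
x <- 1:10
y <- x^2

plot(x, y,
     main = "Key plotting arguments",
     xlab = "x (reversed axis)",
     ylab = "y",
     xlim = c(10, 1),   # x1 > x2, so the horizontal axis is reversed
     pch = 21,          # circle with separate outline and fill colors
     cex = 1.5,         # symbols 50% larger than the default
     col = "black",     # outline color
     bg = "grey80",     # fill color (used by pch 21 to 25)
     font.main = 3)     # italic main title

# Dashed line, twice the default width
lines(x, y, lty = 2, lwd = 2)
```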

Choosing Colors

Colors are an extremely important aspect of visualizations.  They can be identified in multiple ways:

  • By integer number, for each named color
  • By name, for each named color (see below)
  • By hexadecimal string for many more colors.  Each hexadecimal string has the form “#RRGGBB” where each of the pairs RR, GG, BB consists of two hexadecimal digits giving a value in the range 00 to FF for that color (red, green, blue).

For example, ‘chartreuse’ can be specified as col = colors()[47] or col = "chartreuse" or col = "#7FFF00".

In recent versions of RStudio, the name or code of a specified color is highlighted in that color when it is typed in a script or in the Console.
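A quick way to confirm that the three specifications refer to the same color is to convert each to its red, green, and blue components with col2rgb().  This is a small check using only base R; the object names are arbitrary.

```r
by.index <- colors()[47]   # position within the vector of named colors
by.name  <- "chartreuse"
by.hex   <- "#7FFF00"

# One column per specification; rows are red, green, and blue (0 to 255)
rgb.all <- col2rgb(c(by.index, by.name, by.hex))
rgb.all

# Draw one bar per specification to compare them visually
barplot(rep(1, 3),
        col = c(by.index, by.name, by.hex),
        names.arg = c("index", "name", "hex"),
        axes = FALSE)
```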

Named Colors

There are 657 named colors within R.  224 of these are shades of grey and gray (either spelling is accepted).  This image identifies the non-grey/gray colors by name:

I created the above image using the following code, which is tweaked from that available at http://sape.inf.usi.ch/quick-reference/ggplot2/colour.  That website also includes nice graphics showing other aspects of color (hue, saturation, chroma, RGB combinations, etc.).

color.list <- data.frame(color = colors()) %>%
filter(! str_detect(color, "grey|gray"))

d <- data.frame(color = color.list,
y = seq(0, nrow(color.list)-1) %% 55,
x = seq(0, nrow(color.list)-1) %/% 55)

ggplot(data = d) +
scale_x_continuous(name="", breaks=NULL, expand=c(0, 0)) +
scale_y_continuous(name="", breaks=NULL, expand=c(0, 0)) +
scale_fill_identity() +
geom_rect(aes(xmin=x, xmax=x+1, ymin=y, ymax=y+1), fill="white") +
geom_rect(aes(xmin=x+0.05, xmax=x+0.95, ymin=y, ymax=y+1, fill=color)) +
geom_text(aes(x=x+0.5, y=y+0.5, label=color),
colour="black", hjust=0.5, vjust=0.5, size=3)

ggsave("graphics/colors.png", width = 6.5, height = 6.5,
units = "in", dpi = 600)

Color Palettes (ColorBrewer)

It is very common to need multiple colors to distinguish groups in a graphic.  The ColorBrewer website (http://colorbrewer2.org/) is a valuable source of information about this.

ColorBrewer provides color palettes for a large number of scenarios, including sequential or diverging or qualitative classes.  You simply specify how many classes you want to show.  In addition, you can filter palettes based on other criteria such as:

  • Colorblind safe
  • Print friendly – suitable for color printing
  • Photocopy safe – suitable for greyscale printing or photocopying

The colors within the selected palette are identified on the screen by their hexadecimal string.

If using ggplot2, palettes can be called by name using scale_color_brewer() and scale_fill_brewer().
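As a brief sketch of calling a palette by name, the example below applies the qualitative "Dark2" ColorBrewer palette to the built-in iris dataset (used here only because it has a three-level grouping variable; it is unrelated to the oak data).

```r
library(ggplot2)

g <- ggplot(data = iris,
            aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species), size = 2) +
  scale_color_brewer(palette = "Dark2")   # palette referenced by name

g
```

Swapping the palette argument for another ColorBrewer name changes the whole color scheme without altering any other part of the code.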

An Example in Base R

We can use these arguments to build a custom figure in stages.  First, we’ll create an empty plot with no axes.  This step is important because it scales the axes according to the coordinates of the sites even though the units are not displayed on either axis.  Then we’ll add a title and the individual points using separate functions:

plot(Oak1.z,
display = "sites",
type = "n",

xlab = "",
xaxt = "n",
ylab = "",
yaxt = "n")

title(main = "Oak1 NMDS (no rare species)")

points(Oak1.z$points,
pch = 21,
cex = 3,
col = "red",
bg = "yellow",
lwd = 2)

First two axes of a 3-dimensional NMDS ordination of the oak plant community dataset.  Stands are shown by yellow circles.

Textual Labels

When textual labels are added to an ordination, the graph can easily become difficult to read where points are in close proximity and labels therefore overlap.  Several vegan functions help with this problem:

  • ordilabel() – uses opaque labels; can choose a variable with which to prioritize the order in which labels are assigned.
  • ordipointlabel() – automatically optimizes the location of the labels to improve readability:
      ordipointlabel(Oak1.z, display = "sites")
  • orditorp() – adds text only where it does not cover already-present labels, and points otherwise.
  • orditkplot() – produces an ordination in which you can manually select and move labels to improve readability.  You can then save the output as an EPS file or dump it into R, where it is saved as an object that you can then plot.
      orditkplot(Oak1.z, display = "sites")
    (note that this function opens a new graphical window)

Combining Multiple Graphs in a Single Figure

In addition to customizing an individual graph, we can build a single figure that contains multiple graphs (or other images).

The mfcol and mfrow arguments within par() are particularly important in this regard.  Both arguments require a vector in the form c(nr, nc) specifying the number of rows (nr) and columns (nc) to create; each element in this matrix of figures will be a separate image.  The arguments differ in terms of whether graphs are drawn working down the columns (mfcol) or across the rows (mfrow).

To use this capability, you execute par() first and specify either mfcol or mfrow.  You can then draw the figures themselves.

par(mfcol = c(1,2))

plot(Oak1.z)

plot(Oak1.z, display = "sites")

First two axes of a 3-dimensional NMDS ordination of the oak plant community dataset.  Stands are shown by open symbols in both graphics. The red plus signs in the left graphic indicate the positions of species centroids within the ordination space. The scales vary slightly between the two graphics.

These changes to the graphical window remain in effect until the graphic window is closed or mfrow/mfcol is reset:

par(mfcol = c(1,1))
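An alternative to resetting by hand, sketched below with placeholder plots, relies on the fact that par() invisibly returns the previous values of whatever settings it changes.  Storing that return value lets us restore the earlier layout without needing to remember what it was.

```r
# Store the previous settings while switching to a 1 x 2 layout
op <- par(mfrow = c(1, 2))

plot(1:10, main = "Left panel")
plot(10:1, main = "Right panel")

# Restore whatever layout was in effect beforehand
par(op)
```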

Graphing with ggplot2

The ggplot2 package is, in my opinion, one of the game-changers within R.  It has changed the way many people think about and create graphics.  Once you are familiar with its structure, it is much more powerful and versatile than the base R plotting capabilities.  I highly recommend it.

 

Chang (2013) focuses on this package, and the help files for this package are online (https://ggplot2.tidyverse.org/reference/) and include many visual examples.  Another source of information is chapter 1 from Wickham & Grolemund (2017).  Finally, the package’s cheatsheet (https://rstudio.github.io/cheatsheets/data-visualization.pdf) provides a compact visual summary of its capabilities.

As a measure of its broad appeal, dozens of other packages have been written that organize data for use in ggplot2 or that automate the production of particular types of graphics within the ggplot2 ‘universe’.  A few of these are highlighted in the chapter about visualizing and interpreting ordinations.

The Grammar of Graphics

In ggplot2, graphical objects have a particular ‘grammar’.  Each graphic can be described as a series of components.  Figures 1 and 2 below (from Wickham 2010) illustrate how a graphical object can be created by combining these elements.

 

The advantage of this grammar is that each element can be controlled separately.  For example, the geometric object can be changed from points to lines without having to change other elements of the graphic.  Similarly, a bar chart and a pie chart differ simply in their coordinate system.

 

The functions that provide other capabilities in ggplot2 are in several classes, including:

  • geoms – geometric objects, such as whether to plot the data as points, lines, bars, etc.
  • guides – axes and legends
  • scales – how to customize appearance of geoms
  • facets – multiple panels in same graph
  • themes – formatting of axes, panel background, etc.

 

For example, the ggplot2 cheatsheet provides this template:

ggplot(data = <Data>) +
<Geom_Function>(mapping = aes(<Mappings>),
stat = <Stat>,
position = <Position>) +
<Coordinate_Function> +
<Facet_Function> +
<Scale_Function> +
<Theme_Function>

In the cheatsheet, the ggplot() call with its data, the geom function, and the aesthetic mappings are marked as required; all other elements are optional.

We will create a simple graphic and then use it as a template while covering other aspects of ggplot2 grammar.

ggplot2::ggplot()

The workhorse function is ggplot() – this is where the data are identified and the broad ‘structure’ of the graph (e.g., variables that will define the x and y axes) is established.  The elements are then modified or added in other functions.  Functions are linked together by adding a trailing ‘+’ after each function – this way, the whole set of functions is executed as a unit.  This approach also makes it easy to selectively turn elements off by commenting the relevant line out.

 

Here is an example, focusing just on the site scores.  For convenience, we begin by combining these scores with the Oak_explan dataframe:

Oak_explan <- merge(x = Oak_explan,
y = Oak1.z$points,
by = "row.names")

library(ggplot2)

p <- ggplot(data = Oak_explan,
aes(x = MDS1, y = MDS2)) +
geom_point()

p

First two axes of a 3-dimensional NMDS ordination of the oak plant community dataset.  Stands are shown by black points.

Note that we saved the above graph to an object (p); this allows us to call p and tweak it without having to repeat those lines of code.

Aesthetics

Every element of a graphic can be customized.  Elements can be ‘hard-coded’ (applied consistently throughout) or can vary with the data.  This distinction can be confusing when learning ggplot2: elements that vary with the data must be referenced within the aes() (aesthetic) function within that element.

 

For example, we can redraw p with the same color scheme as in the base plotting above:

p +
geom_point(size = 3,
  shape = 21,
  color = "red",
  fill = "yellow")

First two axes of a 3-dimensional NMDS ordination of the oak plant community dataset.  Stands are shown by yellow points.

In this case, I changed the symbology in the image printed on the screen but I did not change the object p.

 

However, if we wanted the symbol color to vary amongst the levels of a grouping variable, we would specify the color within the aes() function and assign it to that grouping variable:

p + geom_point(aes(color = GrazCurr))
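The distinction between hard-coded and mapped elements can be seen in the following sketch, which uses the built-in mtcars dataset rather than the oak data.  A common slip is to place a constant inside aes(): ggplot2 then treats "red" as a one-level grouping variable and assigns it the first color of the default palette instead of drawing red points.

```r
library(ggplot2)

base.plot <- ggplot(data = mtcars, aes(x = wt, y = mpg))

# Hard-coded: every point is drawn in red
fixed <- base.plot + geom_point(color = "red")

# Constant inside aes(): treated as data, so the points are NOT red
mapped.constant <- base.plot + geom_point(aes(color = "red"))

# Mapped to a grouping variable: one color per level
mapped.variable <- base.plot + geom_point(aes(color = factor(am)))

fixed
mapped.constant
mapped.variable
```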

Scales

The colors that are assigned to the levels can be specified by including a scale_color_*() function.  The * in this function name reflects the fact that there are many options, including:

  • scale_color_brewer() – use one of the pre-defined ColorBrewer color schemes.
  • scale_color_discrete() – assign a different color to each level, using a default color scheme.
  • scale_color_manual() – manually specify the colors to be used.  Assigned to the levels in the order that they are recognized in the data.

For example if we want to manually set the colors for the two levels of our current grazing status variable:

p + geom_point(aes(color = GrazCurr)) +
scale_color_manual(values = c("red", "blue"))

This color scheme is shown in the graphic that is saved to a file near the end of this chapter.
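Because an unnamed vector of colors is matched to the levels by position, it can be safer to name each color with the level it belongs to.  The sketch below assumes a two-level variable with levels "No" and "Yes", and uses simulated coordinates rather than the oak ordination.

```r
library(ggplot2)

# Simulated scores with a two-level grouping variable
set.seed(1)
demo <- data.frame(MDS1 = rnorm(20),
                   MDS2 = rnorm(20),
                   GrazCurr = rep(c("No", "Yes"), each = 10))

# Colors are matched to levels by name, so their order here does not matter
g <- ggplot(data = demo, aes(x = MDS1, y = MDS2)) +
  geom_point(aes(color = GrazCurr)) +
  scale_color_manual(values = c(Yes = "blue", No = "red"))

g
```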

 

Similar options exist for other aesthetic parameters, including:

  • scale_fill_*()
  • scale_linetype_*()
  • scale_linewidth_*()
  • scale_shape_*()
  • scale_size_*()

See the help files for more information about these and other scale_*() functions.

Themes

Every element of a ggplot graphic can be controlled.  The visual appearance of the graphic is set through the theme, which controls its many elements:

Theme elements in ggplot2. Image from https://henrywang.nl/ggplot2-theme-elements-demonstration/.

 

Personally, I like graphs that have a white background and dark borders.  There is a pre-defined theme, theme_bw(), for this in ggplot2.  Pre-defined themes can be used directly or combined with additional customizations.  For example, we can omit the axis labels and numerals as is common when drawing an NMDS ordination.

 

Furthermore, typing out all aspects of the code every time is cumbersome and can easily lead to inconsistencies among graphics.  One way to create consistent graphics is to create an object that contains our desired settings for theme():

theme.custom <-  theme_bw() +
theme(axis.line = element_line(),
  axis.ticks = element_blank(),
  axis.text = element_blank(),
  axis.title = element_blank())

Note that theme.custom is an object, not a function, and therefore should not have parentheses after it.

 

Now, we can call this object whenever we want to apply these thematic settings to a graphic:

p +
theme.custom

First two axes of a 3-dimensional NMDS ordination of the oak plant community dataset.  Stands are shown by black points.

 

Another feature of ggplot2 is that there are intelligent default values, and those defaults are hierarchical.  For example, we suppressed the text, ticks, and titles of both axes in theme.custom above.  To suppress an element on just the x axis, the corresponding argument within theme() can be changed to its ‘*.x’ version.  Try adjusting theme.custom by changing the axis.text argument to axis.text.x and re-creating the above image; it will now show the numerals on the vertical axis but not on the horizontal axis.
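One way to carry out this exercise is sketched below.  The object name theme.custom.x is hypothetical, and the built-in mtcars dataset is used so that the example runs on its own.

```r
library(ggplot2)

# Same settings as theme.custom, except that axis text is blanked
# only on the horizontal axis
theme.custom.x <- theme_bw() +
  theme(axis.line = element_line(),
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        axis.title = element_blank())

g <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme.custom.x

g
```

The vertical axis keeps its numerals because axis.text.y still inherits from the axis.text settings of theme_bw().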

Combining Multiple Graphs in a Single Figure

In ggplot2, multiple graphs are easily combined using faceting.  Faceting simply creates a separate graph for each level of the grouping variable.  For example, suppose we want to view the current grazing statuses separately:

ggplot(data = Oak_explan, aes(x = MDS1, y = MDS2)) +
geom_point(size = 3, shape = 2) +
facet_grid(cols = vars(GrazCurr)) +
theme.custom

First two axes of a 3-dimensional NMDS ordination of the oak plant community dataset.  Stands are shown by open triangles.  Although all stands were part of a single ordination, they are graphed separately based on whether or not they experienced grazing when data were collected.

These graphics are based on the same data object and have the same scales for each axis.

 

We can easily create very different graphics just by choosing whether and how to facet.  Faceting can be done by columns (cols; as was done above), or by rows, or both.  A single variable with a large number of levels can be shown via the facet_wrap() function.

There are also arguments that allow scales to vary among facets.
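The sketch below combines both ideas, using the mpg dataset that ships with ggplot2 (chosen only because its class variable has several levels).  facet_wrap() arranges one panel per level, and scales = "free_y" lets the vertical axis vary among panels while the horizontal axis stays shared.

```r
library(ggplot2)

g <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(facets = vars(class),   # one panel per vehicle class
             scales = "free_y") +    # vertical axis varies among panels
  theme_bw()

g
```

Free scales make patterns within each panel easier to see, at the cost of making comparisons among panels harder, so the default of shared scales is often preferable for ordinations.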

 

The geofacet package allows you to facet in a manner that mimics the original geographic parameters.  For example, state-level data could be facetted with the facets approximating the actual arrangement of the states on the earth’s surface.

 

Other packages build on the ggplot2 platform to provide additional features for creating more complex sets of graphics.  Examples include the cowplot and ggpubr packages; the latter provides the ggarrange() function.

Saving Graphics

There are several ways to save graphics produced in R.

First, you can export images directly from the RStudio Plots window.  This opens a pop-up window in which you can choose the file type, image name, and image size (width and height, in pixels).  This can work well for a small number of graphs but is laborious if you are creating multiple graphs.

Second, you can zoom in on the graph from the RStudio Plots window, manually resize it if desired, and then right click and choose either ‘Save image as’ or ‘Copy Image’.  This is more convenient, but it is hard to ensure that multiple graphs are the same size.

Third, you can directly save the image to a file that you specify, while also hard-coding the dimensions that you want.  The file will be saved in the working directory as specified in your R project.  The code to do so differs between base R and ggplot2.

Saving Graphics in Base R

Different functions exist to save the image in different graphical formats: bmp(), jpeg(), png(), and tiff().  The basic approach is to:

  • Create a ‘graphic device’ with the file name and formatting instructions
  • Plot the image
  • Close the graphic device via dev.off()

Since the image is plotted directly to the file and not displayed, you have to open and inspect it.  You may have to tweak settings and re-run your script to get the graphic to display as desired.

 

For example, let’s save an ordination in which the stands are color-coded by their current grazing status:

png("graphics/baseR.NMDS5.png",
width = 4,
height = 3,
units = "in",
pointsize = 10,
res = 800)

plot(Oak1.z,
display= "sites",
type = "n",
xaxt = "n",
yaxt = "n",
xlab = "",
ylab = "")

points(Oak1.z$points[Oak$GrazCurr == "No",],
pch = 19,
col = "red")

points(Oak1.z$points[Oak$GrazCurr == "Yes",],
pch = 15,
col = "blue")

legend(x = "topleft",
pch = c(19,15),
col = c("red", "blue"),
legend = c("No (ungrazed)", "Yes (grazed)"))

dev.off()

First two axes of a 3-dimensional NMDS ordination of the oak plant community dataset.  Stands are shown by points, the color and shape of which indicate whether or not they experienced grazing when data were collected.
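If many figures need to be saved with the same dimensions, the open-plot-close sequence can be wrapped in a small helper.  The function below is a hypothetical sketch (save.png is not part of base R or the course repository); it writes to a temporary folder and uses on.exit() so that the device is closed even if the plotting code fails.

```r
# Hypothetical helper: open a png device, run the plotting code, close the device
save.png <- function(filename, plot.code,
                     width = 4, height = 3, res = 300) {
  png(filename, width = width, height = height,
      units = "in", res = res)
  on.exit(dev.off())   # runs when the function exits, even after an error
  plot.code()          # plot.code is a function containing the plotting calls
  invisible(filename)
}

# Example use with placeholder data, saved to a temporary folder
out.file <- file.path(tempdir(), "demo.png")
save.png(out.file, function() {
  plot(1:10, pch = 19, col = "red")
  legend("topleft", legend = "Example", pch = 19, col = "red")
})

file.exists(out.file)
```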

Saving Graphics in ggplot2 (ggplot2::ggsave())

The ggsave() function:

  • Saves the last graph that was displayed to the specified filename
  • Recognizes the desired file format by the suffix provided in filename; options include .eps, .ps, .tex, .pdf, .jpeg, .tiff, .png, and .bmp.
  • Allows you to control the:
    • Size of the image (width, height)
    • Units in which size is specified (units = c("in", "cm", "mm"))
    • Resolution of image in dots per inch (dpi).

 

For example, let’s save an ordination in which the stands are color-coded by their current grazing status:

ggplot(data = Oak_explan,
  aes(x = MDS1, y = MDS2)) +
geom_point(aes(shape = GrazCurr, color = GrazCurr),
size = 2)+

labs(shape = "Current Grazing",
color = "Current Grazing") +
scale_color_manual(values = c("red", "blue")) +
scale_shape_manual(values = c(19, 15)) +
theme.custom

ggsave(filename = "graphics/ggplot.NMDS6.png",
width = 4,
height = 3,
units = "in",
dpi = 800)

First two axes of a 3-dimensional NMDS ordination of the oak plant community dataset.  Stands are shown by points, the color and shape of which indicate whether or not they experienced grazing when data were collected.
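By default ggsave() saves the last graph displayed, but it also accepts a plot argument.  Naming the object explicitly, as in this sketch with the built-in mtcars dataset and a temporary file, avoids saving the wrong graphic when several have been drawn.

```r
library(ggplot2)

g <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Save to a temporary folder; in practice this would be a path such as
# one within the project's graphics folder
out.file <- file.path(tempdir(), "demo.ggplot.png")

ggsave(filename = out.file,
       plot = g,          # save this object rather than the last plot shown
       width = 4,
       height = 3,
       units = "in",
       dpi = 300)
```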

Conclusions

R is incredibly powerful and flexible software for creating graphics based on the same data objects that are analyzed in it.

The base R plotting capabilities are helpful to know, but even more impressive graphics can be created easily via ggplot2 and associated packages.

References

Dalgaard, P. 2008. Introductory statistics with R. Second edition (First edition, 2002). Springer, New York, NY.

Murrell, P. 2006. R graphics. Chapman & Hall/CRC, Boca Raton, FL.

Sarkar, D. 2008. Lattice: multivariate data visualization with R. Springer, New York, NY.

Venables, W.N., and B.D. Ripley. 2000. Modern applied statistics with S. Springer-Verlag, New York, NY.

Wang, H. 2020. ggplot2 Theme Elements Demonstration. https://henrywang.nl/ggplot2-theme-elements-demonstration/

Wickham, H. 2010. A layered grammar of graphics. Journal of Computational and Graphical Statistics 19(1):3-28. DOI:10.1198/jcgs.2009.07098


License


Applied Multivariate Statistics in R Copyright © 2024 by Jonathan D. Bakker is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
