Transformations

Jonathan D. Bakker

Learning Objectives

To consider how transformations relate to the research questions being addressed.

To illustrate how to transform data in R.

Monotonic Transformations

Monotonic transformations are applied identically to all data elements. This means that the action taken on an individual element is unchanged whether you consider it alone or as part of a set. For example, calculating the square root of a value is unaffected by whether that value is part of a set. This is contrasted with relativizations, where the result of the action depends on other elements in the set.
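The distinction can be sketched with a made-up vector (the values and object names below are hypothetical, not from the oak dataset). The square root of an element is the same whether it is computed alone or as part of a set, whereas dividing by the set maximum, used here as one simple relativization, changes when the set changes.

```r
# Hypothetical abundance values for illustration only
x <- c(1, 4, 9, 16)

# Transformation: each result depends only on its own element
sqrt(x)[2]   # square root of 4, computed as part of the set
sqrt(x[2])   # square root of 4, computed alone

# Relativization by maximum: the result depends on the other elements
rel_full   <- x / max(x)            # 4 is divided by 16
rel_subset <- x[1:3] / max(x[1:3])  # 4 is divided by 9
rel_full[2]
rel_subset[2]
```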

There are many potential transformations that can be applied to data; we will review the most common ones here. McCune & Grace (2002, p. 67) note that transformations can be conducted for statistical or ecological reasons. However, many of the techniques we will cover do not require normality and other assumptions of parametric techniques. Thus, we can focus our transformations on the ecological questions that we seek to answer.

If you apply a transformation to univariate data such as an explanatory variable, the data should generally be back-transformed to the original units for presentation – as is true for all types of analyses.
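As a minimal sketch of back-transformation, assume a hypothetical vector of counts that was transformed as log10(x + 1). The inverse undoes each step in reverse order: raise 10 to the transformed value, then subtract 1.

```r
# Hypothetical counts for illustration only
counts <- c(0, 3, 12, 150)

# Forward transformation: add 1, then take the base-10 logarithm
log_counts <- log10(counts + 1)

# Back-transformation: invert each step in reverse order
back <- 10^log_counts - 1

# Check that the original units are recovered (within floating-point tolerance)
all.equal(back, counts)
```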

Roots (square root, cube root, etc.)

Root transformations can be applied to count data, which generally follow a Poisson distribution. Vegetation work rarely uses higher-order roots, but studies in other systems do. For example, marine benthic studies may include organisms from phyla that span several orders of magnitude in abundance – there might be one starfish but tens of thousands of smaller invertebrates. Fourth-root transformations are often applied to this type of data so that the numerically dominant smaller taxa do not overwhelm comparisons among sample units.
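The effect of higher-order roots can be sketched with invented counts (the taxa and numbers below are hypothetical). Comparing the ratio of the largest to the smallest value shows how much each root compresses the range.

```r
# Hypothetical counts: one starfish, many smaller invertebrates
abund <- c(starfish = 1, amphipod = 500, polychaete = 20000)

sqrt(abund)    # square root
abund^(1/4)    # fourth root

# Ratio of largest to smallest value under each option
range_ratio <- c(raw    = max(abund) / min(abund),
                 sqrt   = max(sqrt(abund)) / min(sqrt(abund)),
                 fourth = max(abund^(1/4)) / min(abund^(1/4)))
round(range_ratio, 1)
```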

Logarithms

Biomass or ratio data are often log-transformed. This commonly involves base-10 or natural logarithms (make sure to note which you use!).

If your data include zeroes, you may need to add a small value to all data because log(0) is undefined (R returns -Inf). This can be done manually, or you can use an existing function such as log1p(), which calculates the natural logarithm of 1 + x.
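A short sketch of the zero problem and two ways around it, using hypothetical biomass values. Note that log1p() uses the natural logarithm, so it is not interchangeable with log10(x + 1); expm1() is its inverse.

```r
# Hypothetical biomass values, including a zero
biomass <- c(0, 0.5, 10, 250)

log(biomass)         # the zero becomes -Inf

# Option 1: add a constant manually (here 1) before taking the log
log10_manual <- log10(biomass + 1)   # base 10
ln_manual    <- log(biomass + 1)     # natural log

# Option 2: log1p() computes the natural log of (1 + x)
ln_log1p <- log1p(biomass)
all.equal(ln_log1p, ln_manual)

# expm1() back-transforms log1p()
expm1(ln_log1p)
```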

Arcsin-square root

The arcsin-square root transformation is used with proportional data such as percent cover. It only works for values between 0 and 1, so percentages must first be divided by 100; negative values and values > 1 cannot be transformed.
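A minimal sketch with hypothetical percent cover values. It assumes cover is recorded on a 0 to 100 scale, so it is converted to a proportion before the transformation is applied; the result is in radians.

```r
# Hypothetical percent cover values (0 to 100 scale)
cover_pct <- c(0, 5, 25, 80, 100)

# Convert to proportions and confirm they fall between 0 and 1
cover_prop <- cover_pct / 100
stopifnot(all(cover_prop >= 0 & cover_prop <= 1))

# Arcsin-square root transformation (result in radians)
asin_cover <- asin(sqrt(cover_prop))
range(asin_cover)

# Values outside 0 to 1 cannot be transformed: R returns NaN with a warning
out_of_range <- suppressWarnings(asin(sqrt(1.2)))
is.nan(out_of_range)
```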

Some authors strongly discourage using this transformation for univariate analyses (Warton & Hui 2011), but I have not seen this recommendation carried over to multivariate contexts.

Binary

A binary transformation converts continuous data to 0 or 1 based on whether a criterion is met. This is often used to convert abundance data to presence/absence. Another example of this as a transformation would be to evaluate whether abundance data exceed a static value such as ‘5 individuals’ or ‘5% cover’.

Depending on the criterion, a binary adjustment can also be a type of relativization (see the ‘Relativizations’ chapter).
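The sketch below applies three criteria to a hypothetical abundance vector. The first two use static thresholds and are therefore transformations. The third compares each value to the mean of the set, which is offered here only as an assumed example of a criterion that depends on the other elements.

```r
# Hypothetical abundances for one species across five sample units
abund <- c(0, 2, 7, 0, 12)

# Static criteria: the result for each element depends only on that element
pres_abs <- ifelse(abund > 0, 1, 0)   # presence/absence
over_5   <- ifelse(abund > 5, 1, 0)   # more than 5 individuals

# Set-dependent criterion: the threshold changes if the set changes
above_mean <- ifelse(abund > mean(abund), 1, 0)

rbind(abund, pres_abs, over_5, above_mean)
```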

Transformations

Transformations are applied identically to all elements within an object.

Applications in R

In R, transformations are easily performed by applying a function to a matrix; the function is automatically applied to every element in the matrix. The transformed data are generally assigned to a new object so that the original data remain intact.

Here are the above transformations:

R Function	Note
`sqrt(x)` or `x^(1/2)`	Square root of `x`
`x^(1/4)`	Fourth root of `x`
`log10(x)`	Logarithm (base 10) of `x`
`log(x)`	Natural logarithm of `x`
`log1p(x)`	Natural logarithm of (`x` + 1)
`asin(sqrt(x))`	Arcsin square root of `x` (with `x` as a proportion between 0 and 1)
`ifelse(x > 0, 1, 0)`	Convert `x` to presence/absence
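To see the element-wise behavior on a whole object, here is a small sketch using a made-up matrix (the plot and species names are hypothetical). The function is applied to every cell, the dimensions and names carry over, and the original object is left untouched because the result is assigned to a new name.

```r
# Hypothetical 2 x 3 abundance matrix (filled column by column)
m <- matrix(c(0, 1, 4, 9, 16, 25), nrow = 2,
            dimnames = list(c("plot1", "plot2"), c("sp1", "sp2", "sp3")))

# Transform every element and store the result in a new object
m_sqrt <- sqrt(m)

m_sqrt   # transformed values
m        # original values remain intact
```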

Oak Plant Communities Example

Let’s illustrate these transformations using our oak plant communities dataset. Begin by opening the R project and loading the data:

Oak <- read.csv("data/Oak_data_47x216.csv", header = TRUE, row.names = 1)
Oak_species <- read.csv("data/Oak_species_189x5.csv", header = TRUE)

Create separate objects for the response and explanatory data:

Oak_abund <- Oak[ , colnames(Oak) %in% Oak_species$SpeciesCode]
Oak_explan <- Oak[ , ! colnames(Oak) %in% Oak_species$SpeciesCode]

See the ‘Loading Data’ chapter if you do not understand what these actions accomplished.
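For readers without the data files at hand, the column-splitting logic can be sketched on toy stand-ins. The objects `toy` and `toy_species` and their contents are hypothetical; only the `SpeciesCode` column name mirrors the real species file. The `drop = FALSE` argument is an addition that keeps the result a data frame even when only one column is selected.

```r
# Hypothetical stand-ins for Oak and Oak_species
toy <- data.frame(Elevation = c(100, 150),
                  SPP1 = c(3, 0),
                  SPP2 = c(0, 8),
                  row.names = c("Stand01", "Stand02"))
toy_species <- data.frame(SpeciesCode = c("SPP1", "SPP2"))

# TRUE for columns whose names appear in the species list
is_species <- colnames(toy) %in% toy_species$SpeciesCode

# Response data: species columns; explanatory data: everything else
toy_abund  <- toy[ , is_species, drop = FALSE]
toy_explan <- toy[ , !is_species, drop = FALSE]

colnames(toy_abund)
colnames(toy_explan)
```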

Transforming the Response Variables

Applying a transformation to an object automatically applies it to each element within the object. Let’s apply a square root transformation to our response variables:

Sqrt_Oak_abund <- sqrt(Oak_abund)

Compare the two objects to verify that the data changed as intended.

Any value in Sqrt_Oak_abund can be reproduced by applying the same function to the corresponding single value in Oak_abund, without reference to any other value – this is one way to see that this was a transformation rather than a relativization.
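One way to carry out this comparison is sketched below on a toy data frame, since the checks do not depend on the oak data themselves. The objects `toy_abund` and `sqrt_toy` are hypothetical stand-ins for Oak_abund and Sqrt_Oak_abund.

```r
# Hypothetical stand-in for Oak_abund
toy_abund <- data.frame(SPP1 = c(0, 4, 9),
                        SPP2 = c(1, 16, 25),
                        row.names = c("Stand01", "Stand02", "Stand03"))

# Same call as in the text, applied to the whole data frame
sqrt_toy <- sqrt(toy_abund)

# The structure is unchanged
identical(dim(sqrt_toy), dim(toy_abund))

# Spot check: one cell equals the function applied to that cell alone
sqrt_toy["Stand02", "SPP2"] == sqrt(toy_abund["Stand02", "SPP2"])

# Full check: squaring the transformed data recovers the original
all.equal(as.matrix(sqrt_toy)^2, as.matrix(toy_abund))
```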

Transforming Explanatory Variables

There are many types of explanatory variables – continuously distributed predictors, experimental factors, etc. It therefore would generally not make sense to apply the same transformation to a matrix of explanatory variables.

It does make sense to transform individual variables. Each explanatory variable can be evaluated separately to determine which type of transformation, if any, is appropriate.

Let’s transform the number of large oak trees. This is a count, so we’ll use a log transformation:

Oak_explan$log_Quga <- log10(Oak_explan$Quga.gt60cm + 1)

I added one to all values to account for the possibility that a stand may not have had any large oak trees.

In fact, this variable is already present in the data frame as the variable ‘LogQuga.gt60cm’. Compare these two variables to verify that our calculation was done correctly. The existing variable is reported to two decimal places, so we’ll round our variable to match:

Oak_explan$log_Quga <- round(Oak_explan$log_Quga, 2)
rownames(Oak_explan[which(Oak_explan$log_Quga != Oak_explan$LogQuga.gt60cm), ])

Verify the outcome of the above comparison by changing from a test for inequality (!=) to a test for equality (==).
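Both versions of the check can be sketched on a toy data frame (the object `toy` and its column names are hypothetical). Because exact comparison of decimal numbers can occasionally be affected by floating-point representation, the sketch also counts differences larger than a small tolerance; the tolerance of 1e-8 is an arbitrary choice.

```r
# Hypothetical stand-in with a count and a pre-calculated log variable
toy <- data.frame(Quga = c(0, 3, 9),
                  LogQuga = c(0.00, 0.60, 1.00),
                  row.names = c("Stand01", "Stand02", "Stand03"))

# Recalculate and round to two decimal places
toy$log_Quga <- round(log10(toy$Quga + 1), 2)

# Rows where the two variables differ, then rows where they match
rownames(toy[which(toy$log_Quga != toy$LogQuga), ])
rownames(toy[which(toy$log_Quga == toy$LogQuga), ])

# Tolerance-based alternative: number of rows that differ meaningfully
n_diff <- sum(abs(toy$log_Quga - toy$LogQuga) > 1e-8)
n_diff
```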

Concluding Thoughts

Decisions about whether and how to transform the data can strongly affect the conclusions of subsequent analyses. Most of the techniques that we are using in this course make minimal statistical assumptions, which means that adjustments do not have to be made for statistical reasons but rather can focus on the ecological questions of interest.

Transformations can be applied to both response variables and explanatory variables. Response variables are often transformed en masse, while explanatory variables are transformed individually. Each explanatory variable can be evaluated separately to determine which type of transformation, if any, is appropriate.

Transformations should be scripted rather than permanently changing the raw data file. Scripting ensures flexibility to try other adjustments, skip them entirely, etc.

The transformations that have been discussed here are for continuously distributed variables. For categorical explanatory variables, other actions may be required such as combining similar categories together or restricting analyses to focus on a subset of the categories. These decisions should be based on the objectives of the analysis and the ecological questions that you seek to answer.

References

McCune, B., and J.B. Grace. 2002. Analysis of ecological communities. MjM Software Design, Gleneden Beach, OR.

Warton, D.I., and F.K. Hui. 2011. The arcsine is asinine: the analysis of proportions in ecology. Ecology 92:3-10.

License

Applied Multivariate Statistics in R Copyright © 2024 by Jonathan D. Bakker is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.