Classification

Introduction

Thus far, we’ve covered some essential background material (data adjustments, matrix algebra, distance measures) and how to compare observations from a priori groups (PERMANOVA and other techniques). The next important topic is how to identify groups a posteriori; in other words, how to classify observations into groups.

Again, we will focus on techniques that are distance-based.

We’ll consider three broad categories of techniques:

  • Cluster analysis – to identify groups (clusters) of observations with similar sets of responses. Cluster analyses can be done hierarchically or for a pre-specified number of groups (k-means). The resulting groups can be used in a variety of ways. For more detailed information, see Legendre & Legendre (2012, ch. 8) and Borcard et al. (2018, ch. 4).
  • Discriminant analysis (DA) – to assess how well observations were classified into groups, and how many were misclassified. This is the focus of chapter 8 of Manly & Navarro Alberto (2017). DA can be conducted using a linear model or based on distances.
  • Classification and regression trees – to classify observations into groups based on a response (or matrix of responses) together with one or more explanatory variables. We will introduce this concept using a univariate response (De’ath & Fabricius 2000) and then extend it to a multivariate response (De’ath 2002). Random forests are an extension of this technique that is often used for univariate responses with large sample sizes.
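
To make the idea of distance-based classification concrete, here is a minimal sketch of hierarchical (agglomerative) clustering. It is only an illustration, not the approach developed later in this book: it is written in plain Python so that it is self-contained, it assumes Euclidean distances and single linkage, and the function name `single_linkage` and the six observations are hypothetical. In R, the same steps would normally be handled by built-in functions such as `dist()`, `hclust()`, and `cutree()`.

```python
from itertools import combinations
from math import dist


def single_linkage(points, k):
    """Merge the two closest clusters repeatedly until only k remain."""
    # Pairwise Euclidean distances between observations (the distance matrix).
    d = {frozenset((i, j)): dist(points[i], points[j])
         for i, j in combinations(range(len(points)), 2)}
    # Start with every observation in its own cluster.
    clusters = [{i} for i in range(len(points))]
    while len(clusters) > k:
        # Single linkage: the distance between two clusters is the
        # smallest distance between any pair of their members.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(d[frozenset((i, j))]
                               for i in clusters[ab[0]]
                               for j in clusters[ab[1]]),
        )
        clusters[a] |= clusters[b]  # merge cluster b into cluster a
        del clusters[b]             # a < b, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical data: six observations measured on two response variables.
obs = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
       (5.0, 5.1), (5.2, 4.8), (4.9, 5.3)]
groups = single_linkage(obs, k=2)
print(groups)
```

The groups are identified purely from the distances among observations; no grouping variable is supplied in advance. Stopping at a chosen `k` plays the same role as cutting a dendrogram into a pre-specified number of groups.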

References

Borcard, D., F. Gillet, and P. Legendre. 2018. Numerical ecology with R. Second edition. Springer, New York, NY.

De’ath, G., and K.E. Fabricius. 2000. Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178-3192.

De’ath, G. 2002. Multivariate regression trees: a new technique for modeling species-environment relationships. Ecology 83:1105-1117.

Legendre, P., and L. Legendre. 2012. Numerical ecology. Third English edition. Elsevier, Amsterdam, The Netherlands.

Manly, B.F.J., and J.A. Navarro Alberto. 2017. Multivariate statistical methods: a primer. Fourth edition. CRC Press, Boca Raton, FL.

License

Applied Multivariate Statistics in R Copyright © 2024 by Jonathan D. Bakker is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.