Appendix 1: Order of Data Adjustments

In their Tables 9.3 (species data) and 9.4 (environmental data), McCune & Grace (2002) provided a very helpful summary of the order in which data adjustments should be made.  Those tables are summarized here in a slightly updated form.

Species Data

This assumes that the data are in a matrix with sample units as rows and species as columns.
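To make this layout concrete, the following is a minimal Python sketch (standard library only) of a sample unit by species matrix, together with the descriptive statistics named in step 1 of the table below and the rare-species deletion of step 2. The function names (`cv`, `skewness`, `describe`, `drop_rare`) and the data are hypothetical. Beta diversity is computed here as Whittaker's index (total species richness divided by mean richness per sample unit, minus 1), and skewness as a simple moment ratio; both are assumptions, since the table does not specify the formulas.

```python
import statistics


def cv(values):
    """Coefficient of variation, expressed as a percentage of the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def skewness(values):
    """Moment-based skewness (m3 / m2**1.5); 0.0 for a constant column."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def describe(matrix):
    """Descriptive statistics for a sample unit x species matrix."""
    columns = list(zip(*matrix))
    richness = [sum(1 for v in row if v > 0) for row in matrix]
    gamma = sum(1 for col in columns if any(v > 0 for v in col))
    return {
        # Assumption: Whittaker's beta diversity, gamma / mean alpha - 1.
        "beta_diversity": gamma / statistics.mean(richness) - 1,
        "avg_skewness": statistics.mean([skewness(col) for col in columns]),
        "cv_row_totals": cv([sum(row) for row in matrix]),
        "cv_col_totals": cv([sum(col) for col in columns]),
    }


def drop_rare(matrix, min_fraction=0.05):
    """Drop species present in fewer than min_fraction of the sample units."""
    n = len(matrix)
    keep = [j for j, col in enumerate(zip(*matrix))
            if sum(1 for v in col if v > 0) / n >= min_fraction]
    return [[row[j] for j in keep] for row in matrix], keep


# Hypothetical data: 25 sample units (rows) by 3 species (columns).
# The third species occurs in only 1 of 25 sample units (4%).
species = [[i % 4 + 1, (i * 3) % 5, 7 if i == 0 else 0] for i in range(25)]

before = describe(species)
trimmed, kept = drop_rare(species)
after = describe(trimmed)  # step 1 is repeated after each adjustment
```

Comparing `before` and `after` shows how a single adjustment changes the summary statistics; in this made-up example, removing the one rare species lowers both beta diversity and the average skewness of the columns.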

1. Action to be considered: Calculate descriptive statistics (beta diversity, average skewness of columns, CV of row totals, CV of column totals). Repeat after each step below.
Criteria: Always

2. Action to be considered: Delete rare species (<5% of sample units)
Criteria: Unless contrary to study goals

3. Action to be considered: Monotonic transformation. If applied to species, usually applied uniformly to all of them so that all are scaled the same.
Criteria: Consider:

· Average skewness of columns

· Data range: how many orders of magnitude?

· Beta diversity

4. Action to be considered: Row or column relativizations
Criteria: Appropriateness depends on the question being addressed.  Regardless of whether you relativize or not, you should briefly state and justify your decision.

· Are units for all variables the same?

· Is relativization built into distance measure or subsequent analyses?

· CV of row totals

· CV of column totals

5. Action to be considered: Check for outliers based on average distance of each point from all other points.  Calculate standard deviation of these average distances.  Describe outliers and take steps to reduce influence, if necessary.
Criteria:

SD          Degree of problem
<2          No problem
2-2.3       Weak outlier
2.3-3       Moderate outlier
>3          Strong outlier
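The outlier check in step 5 can be sketched as follows, in plain Python with the standard library only. This is an illustration under stated assumptions rather than a definitive implementation: the distance measure defaults to Bray-Curtis (the table does not name one), "SD" is read as the number of standard deviations by which a sample unit's average distance exceeds the mean of all average distances, only unusually large average distances are flagged, and values falling exactly on a class boundary are assigned to the higher class except at 3. The function names and data are hypothetical.

```python
import statistics


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two rows of abundances."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # assumption: two empty sample units count as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def classify(z):
    """Map a deviation (in SD units) onto the classes in the table above."""
    if z > 3:
        return "strong outlier"
    if z >= 2.3:
        return "moderate outlier"
    if z >= 2:
        return "weak outlier"
    return "no problem"


def outlier_check(matrix, distance=bray_curtis):
    """Return (deviation in SD units, class) for each sample unit (row)."""
    n = len(matrix)
    # Average distance of each sample unit from all other sample units.
    avg_dist = [
        sum(distance(matrix[i], matrix[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    mean = statistics.mean(avg_dist)
    sd = statistics.stdev(avg_dist)
    if sd == 0:
        return [(0.0, "no problem") for _ in avg_dist]
    return [((d - mean) / sd, classify((d - mean) / sd)) for d in avg_dist]


# Hypothetical data: the last sample unit shares no species with the others.
species = [
    [10, 5, 0],
    [12, 4, 0],
    [9, 6, 0],
    [11, 5, 0],
    [10, 6, 0],
    [8, 5, 0],
    [0, 0, 20],
]

results = outlier_check(species)
for i, (z, label) in enumerate(results):
    print(f"sample unit {i}: {z:+.2f} SD, {label}")
```

In this example the last sample unit is about 2.26 SD above the mean and is classed as a weak outlier. Note that with n sample units a single point can deviate by at most (n - 1) / sqrt(n) sample standard deviations, so very small data sets cannot produce moderate or strong outliers under this scheme.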

 

Environmental Data

This assumes that the data are in a matrix with sample units as rows and quantitative environmental variables as columns.

1. Action to be considered: Calculate descriptive statistics (skewness and range for each column). Repeat after each step below.
Criteria: Always

2. Action to be considered: Monotonic transformation. Apply to individual variables (columns) depending on need.
Criteria: Consider log or square root transformation for variables with skewness > 1 or spanning several orders of magnitude.

Consider arcsine square root transformation for proportion data.

3. Action to be considered: Column relativizations
Criteria: Consider column relativization (e.g., by norm or standard deviate) if environmental variables are to be used in a distance-based analysis that does not automatically relativize the variables.

Not necessary for analyses that use the variables one at a time or for analyses with built-in standardization (e.g., PCA of a correlation matrix).

4. Action to be considered: Check for univariate outliers and take corrective steps if necessary.
Criteria: Examine scatterplots or frequency distributions, or relativize by standard deviate and check for high absolute values.
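The four steps above can be sketched for a single environmental variable as follows, in plain Python with the standard library only. The variable name `soil_n`, its values, and the cutoff of 2 for a "high" absolute standard deviate are hypothetical; the table gives no numeric cutoff, and the skewness formula (a simple moment ratio) is likewise an assumption. The log transformation assumes that all values are positive.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness (m3 / m2**1.5); 0.0 for a constant column."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def standardize(values):
    """Relativize by standard deviate: (x - mean) / SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def arcsine_sqrt(p):
    """Arcsine square root transformation of a proportion (result in radians)."""
    return math.asin(math.sqrt(p))


# Hypothetical values of one environmental variable across 8 sample units.
soil_n = [1.2, 1.5, 2.0, 2.4, 3.1, 4.0, 5.5, 48.0]

# Step 1: descriptive statistics.
raw_skew = skewness(soil_n)
raw_range = (min(soil_n), max(soil_n))

# Step 2: log-transform only if the variable is strongly skewed.
values = [math.log10(v) for v in soil_n] if raw_skew > 1 else list(soil_n)
new_skew = skewness(values)  # step 1 repeated after the adjustment

# Steps 3 and 4: relativize by standard deviate, then flag high absolute values.
z_scores = standardize(values)
CUTOFF = 2.0  # hypothetical threshold, not taken from the table
flagged = [i for i, z in enumerate(z_scores) if abs(z) > CUTOFF]
```

Here the log transformation reduces the skewness but the last value still stands out after standardization, so it would be flagged for a closer look. For a matrix with several variables, the same steps would be applied column by column, since each variable may need a different transformation.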

 

License


Applied Multivariate Statistics in R Copyright © 2024 by Jonathan D. Bakker is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
