ANOVA / MANOVA

Jonathan D. Bakker

Learning Objectives

To review the structure of ANOVA and MANOVA.
To review the assumptions of ANOVA and MANOVA when testing for differences among a priori groups.

Introduction

Analysis of variance (ANOVA) is an extremely popular and versatile technique for univariate analyses. Multivariate analysis of variance (MANOVA) is less commonly used and more restrictive in its assumptions. Nevertheless, the structure of these techniques is a helpful starting point for the permutation-based techniques that we focus on.

Assumptions of ANOVA and MANOVA

ANOVA and MANOVA are parametric techniques, meaning that they make assumptions about the parameters (distributional form) of the population from which samples are taken. For our purposes here, the key assumptions of ANOVA are that:

Errors are normally distributed
Variances are equal among groups

With MANOVA, these assumptions are more restrictive:

Errors conform to multivariate normality (i.e., are normally distributed in all dimensions)
The entire variance-covariance matrix is homogeneous (i.e., variances of all variables and covariances between all pairs of variables are equal across all groups)

(See the discussion about exchangeable units for another assumption, independence, that is relevant for both parametric and permutation tests.)

ANOVA is optimal and preferred for data that meet these assumptions but can be far from optimal when data do not meet them. Of course, it is also limited to univariate responses.

MANOVA is designed for multivariate responses but is even more sensitive than ANOVA to deviations from its assumptions. It is therefore often inappropriate for community-level ecological data.
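The equal-variance assumption listed above can be inspected informally by comparing the sample variance of each group. The following is a minimal sketch in Python using only the standard library. The data values and group names are hypothetical, and the max/min variance ratio is shown only as a rough descriptive summary, not as a formal test of homogeneity.

```python
from statistics import variance

# Hypothetical data: one response variable measured in three a priori groups.
groups = {
    "A": [4.1, 5.0, 4.6, 5.3],
    "B": [6.2, 5.8, 6.9, 6.5],
    "C": [5.1, 9.4, 2.7, 7.8],
}

# Sample variance (n - 1 denominator) of each group.
group_vars = {name: variance(values) for name, values in groups.items()}

# Ratio of the largest to the smallest group variance. Values near 1 are
# consistent with equal variances; large values suggest the assumption
# deserves a closer look.
var_ratio = max(group_vars.values()) / min(group_vars.values())

for name, v in group_vars.items():
    print(f"group {name}: variance = {v:.3f}")
print(f"max/min variance ratio = {var_ratio:.1f}")
```

For multivariate data, the analogous check would compare whole variance-covariance matrices among groups, which is part of why the MANOVA assumption is harder to satisfy.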

Key Takeaways

ANOVA is a method of partitioning variation to different sources and expressing those patterns in a test statistic. Many other techniques have a similar structure.

Structure of ANOVA (and MANOVA)

Although ANOVA is not optimal for all types of data, its basic structure remains insightful in two ways:

As an example of how to partition variation
As an example of how a test statistic can represent that partitioning

Partitioning Variation

Recall that ANOVA focuses on the variation within a dataset. In a simple one-way ANOVA, three sources of variation are distinguished:

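SST: total variation within data (i.e., total sum of squares), calculated as the sum of the squared deviations between every observation and the grand mean of the data
SSB: variation between groups, calculated as the sum of the squared deviations between each group mean and the grand mean, with each group weighted by its sample size
SSW: variation within groups, calculated as the sum of the squared deviations between every observation and the mean of its group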

These sources of variability are related:
SST = SSB + SSW

In words, we have determined how much of the total variation (SST) is due to differences between the groups (SSB) and how much of it is due to variation within groups (SSW). The variation within the groups is the residual or unexplained variation.
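The partitioning can be verified numerically. The sketch below, in Python with no external libraries, computes the three sums of squares for a small hypothetical dataset and confirms that SST = SSB + SSW. The data and the helper function `mean` are illustrative assumptions rather than anything taken from the text.

```python
# Hypothetical data: one response variable in three groups of equal size.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0],
    "C": [1.0, 2.0, 3.0],
}

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# SST: squared deviations of every observation from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)

# SSB: squared deviations of each group mean from the grand mean,
# weighted by the number of observations in the group.
ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())

# SSW: squared deviations of every observation from its own group mean.
ssw = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

print(f"SST = {sst}, SSB = {ssb}, SSW = {ssw}")
print("SST equals SSB + SSW:", abs(sst - (ssb + ssw)) < 1e-9)
```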

MANOVA partitions the variation similarly to ANOVA, except that it accounts for both the variance within variables and the covariance between pairs of variables.
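One way to picture the multivariate version is to replace each sum of squares with a matrix of sums of squares and cross-products (SSCP), so that the total matrix is the sum of a between-group matrix and a within-group matrix. The following Python sketch illustrates this for two hypothetical response variables. The data, the names `T`, `B`, and `W`, and the helper functions are assumptions made for illustration.

```python
# Hypothetical data: each observation is a (variable 1, variable 2) pair.
groups = {
    "A": [(1.0, 2.0), (2.0, 4.0), (3.0, 3.0)],
    "B": [(4.0, 7.0), (5.0, 6.0), (6.0, 8.0)],
}

def col_means(rows):
    """Mean of each variable (column) across the given observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def sscp(rows, center, weight=1.0):
    """Weighted sum of outer products of (row - center), as a p x p list."""
    p = len(center)
    m = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - center[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                m[i][j] += weight * d[i] * d[j]
    return m

def add(a, b):
    """Element-wise sum of two matrices stored as nested lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

all_rows = [r for rows in groups.values() for r in rows]
grand = col_means(all_rows)
p = len(grand)

# Total SSCP: deviations of every observation from the grand means.
T = sscp(all_rows, grand)

# Between-group and within-group SSCP matrices, accumulated over groups.
B = [[0.0] * p for _ in range(p)]
W = [[0.0] * p for _ in range(p)]
for rows in groups.values():
    centroid = col_means(rows)
    B = add(B, sscp([centroid], grand, weight=len(rows)))
    W = add(W, sscp(rows, centroid))

print("T     =", T)
print("B + W =", add(B, W))
```

The diagonal elements of each matrix are the familiar sums of squares for each variable, and the off-diagonal elements carry the covariance information that ANOVA does not consider.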

This idea of partitioning variability is central to several of the techniques we will be discussing.

Every Test Requires a Test Statistic

The ANOVA test statistic describes how the variation is partitioned. The F-statistic is calculated as the ratio of the variation between groups to the variation within groups, with each term divided by its degrees of freedom (df):
[latex]F = \frac{SSB / (t - 1)}{SSW / (N - t)}[/latex]
where t is the number of groups and N is the total sample size, so that SSB has t - 1 df and SSW has N - t df (note that this equation is for a balanced one-way ANOVA; the details would differ slightly for other designs).
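As a worked illustration of this formula, the Python sketch below computes F for a small hypothetical balanced dataset. The function name `one_way_f` and the data are assumptions introduced here, and the code covers only the one-way case described above.

```python
def one_way_f(groups):
    """F-statistic for a one-way design; `groups` is a list of lists."""
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)   # N: total sample size
    t = len(groups)             # t: number of groups
    grand_mean = sum(all_values) / n_total

    # Between-group and within-group sums of squares.
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    # Divide each sum of squares by its degrees of freedom, then take the ratio.
    return (ssb / (t - 1)) / (ssw / (n_total - t))

# Hypothetical balanced data: three groups of three observations.
data = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [1.0, 2.0, 3.0],
]

f_stat = one_way_f(data)
print(f"F = {f_stat}")
```

Here SSB = 54 with 2 df and SSW = 6 with 6 df, so the ratio of the two mean squares is 27 / 1.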

As in ANOVA, every test requires a test statistic. The choice of test statistic is made by whoever develops the test.

As a parametric test, ANOVA assesses the significance of this test statistic by comparing the value obtained from the actual data with the theoretical distribution of its possible values (the F distribution) under the null hypothesis. Our tests are permutation-based and therefore compare the test statistic with a set of possible values obtained by permuting the data.
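The permutation idea can be sketched generically: compute the test statistic for the observed data, repeatedly shuffle the observations among groups, recompute the statistic each time, and report the proportion of values at least as large as the observed one. The Python sketch below does this with the one-way F-statistic. It is a minimal illustration under stated assumptions, not the specific procedure of any particular test: the function names, the data, the number of permutations (999), and the random seed are all hypothetical choices.

```python
import random

def one_way_f(groups):
    """F-statistic for a one-way design; `groups` is a list of lists."""
    all_values = [x for g in groups for x in g]
    n_total, t = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ssb / (t - 1)) / (ssw / (n_total - t))

def permutation_p_value(groups, n_perm=999, seed=42):
    """Return the observed F and a permutation-based P-value."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    f_obs = one_way_f(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]

    n_extreme = 0
    for _ in range(n_perm):
        # Shuffle observations, then cut them back into groups of the
        # original sizes; this reassigns observations to groups at random.
        rng.shuffle(pooled)
        permuted, start = [], 0
        for size in sizes:
            permuted.append(pooled[start:start + size])
            start += size
        # Small tolerance guards against floating-point ties.
        if one_way_f(permuted) >= f_obs - 1e-12:
            n_extreme += 1

    # The observed arrangement counts as one of the possible permutations.
    return f_obs, (n_extreme + 1) / (n_perm + 1)

# Hypothetical data: three groups of three observations.
data = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [1.0, 2.0, 3.0],
]

f_obs, p_value = permutation_p_value(data)
print(f"observed F = {f_obs}, permutation P = {p_value}")
```

Because the reference distribution is built from the data themselves, no assumption about the theoretical distribution of the statistic is needed. The units being shuffled must still be exchangeable.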

License

Applied Multivariate Statistics in R Copyright © 2024 by Jonathan D. Bakker is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.