Foundational Concepts

8 Matrix Algebra Basics

Learning Objectives

To describe the basics of matrix algebra.

To continue using R.

Resources

Gotelli & Ellison (2004, appendix)

Introduction

This chapter takes a detour to cover some key mathematical concepts that are foundational to everything that follows in this course.  Some of this will seem abstract initially, but it’s followed by a worked regression example and an introduction to eigenanalysis, which forms the basis for principal component analysis (PCA)!

By convention, mathematicians use bold uppercase text (X) to indicate a matrix and bold lowercase text (x) to indicate a vector.  However, these conventions don’t apply in R: bold formatting isn’t available, and we can name a matrix object like we name any other object.

Vectors

Recall that a scalar is a single numerical value:
x <- 5

A vector is a series of n scalars.  Mathematically, a vector is assumed to be a column vector (i.e., n rows x 1 column), though see below for some nuance about this.

Vectors can be created in several ways.  One way is to combine a series of values using the c() function (note that this is a lower-case c):
v <- c(1,3,5,7,9); v

[1] 1 3 5 7 9

Another way is by specifying a sequence of values:
v <- seq(from = 1, to = 9, by = 2); v # Equivalent
v <- seq(1, 9, 2) #same, without argument names

These commands are extremely versatile.

Vectors can be indexed using square brackets [], just as we do for a matrix.  However, a matrix is indexed with respect to [row, column], while a vector is indexed simply by the position of an element within it:
v[3]

 

Nuances About Vectors in R

R displays vectors as row vectors (i.e., 1 row x n columns) as this takes less screen space than a column vector (i.e., n rows x 1 column). See the output of v above for an example of this.  Mathematically, however, vectors are always assumed to be column vectors (n x 1).

R attempts to make all operations work: “if you use a vector in an operation that succeeds or fails depending on the vector’s orientation, R will assume that you want the operation to succeed and will proceed as if the vector has the necessary orientation” (Ellner & Guckenheimer 2011, p. 13).  This can cause problems because an operation can work even when you don’t want it to, or where it should not mathematically!
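A minimal illustration of this behavior, using the vector v created above, an identity matrix created here for the example, and the matrix multiplication operator %*% (introduced later in this chapter):
M <- diag(x = 5)  # a 5 x 5 identity matrix
M %*% v           # works: v is treated as a 5 x 1 column vector
v %*% M           # also works: v is treated as a 1 x 5 row vector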

Matrices

A matrix is a two-dimensional object consisting of m vectors.  Since each vector is of length n, the size of the matrix is n x m.  Note that all vectors have to be the same length.

 

The matrix() function can be used to create a matrix from a sequence of numbers.  The nrow argument tells R how many rows to include.
A <- matrix(data = c(3,-1,0,4,5,2), nrow = 3); A

     [,1] [,2]
[1,]    3    4
[2,]   -1    5
[3,]    0    2

 

Alternatively, you could use the ncol argument to specify how many columns to include.  You don’t need to specify both because R determines the other dimension based on the number of elements included in the data argument.  If you specify a number of rows (or columns) that does not evenly divide the number of elements in the data argument, R will display a warning message and recycle the values to complete the matrix:
A1 <- matrix(data = 1:9, nrow = 4); A1

 

The matrix() function includes other arguments besides nrow and ncol.  The arguments are described in the R help:
?matrix

Note the default values for each argument.  For example, the byrow argument defaults to FALSE, meaning that data are not added one row at a time but rather are added on a column-by-column basis.  If you set byrow = TRUE, data are added on a row-by-row basis:
B <- matrix(data = c(3,-1,0,4,5,2), nrow = 3, byrow = TRUE); B

Compare objects A and B to verify that the data have been added to the matrix in different orders.

 

Recall that we use square brackets to index elements in a matrix (or data frame), and we always discuss elements relative to [row, column].  This indexing enables you to call the elements that you’re interested in.  These elements can be viewed, assigned to another object name, or manipulated:
A[3 , 1]
C <- B[1:2 , ]; C
B[1 , ] <- B[1 , ] * 10; B  # What does this do?
B[B > 2] # And this?

 

The size of a matrix is important for operations such as matrix addition, subtraction, and multiplication.  Its size is expressed by its dimensions, which can be obtained using the dim() function, or by requesting the number of rows or columns specifically:
dim(A)
nrow(A)
ncol(A)

Although a vector can mathematically be considered a matrix of n x 1 dimensions, as noted above, these functions return NULL for vectors; use length() instead.  If you want to force a vector to be recognized as a matrix, you have to convert it to that class.
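For example, using the vector v from above (v.mat is a new object name introduced here for illustration):
length(v)             # 5
dim(v)                # NULL - a plain vector has no dim attribute
v.mat <- as.matrix(v) # coerce v to a 5 x 1 matrix
dim(v.mat)            # 5 1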

 

Matrices can be combined with respect to rows or columns:
rbind(A,B) # stacked
cbind(A,B) # side-by-side

However, matrices can only be combined if the appropriate dimension is the same for each.  For example, two matrices that have the same number of rows (and any number of columns) can be combined with respect to columns (i.e., side-by-side):
cbind(A, B)
cbind(A, C) # Doesn’t work!

The number of columns in each matrix doesn’t affect cbind().  However, the opposite is true with rbind(): the matrices must have the same number of columns, and the number of rows doesn’t matter.
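For example, A (3 rows x 2 columns) and C (2 rows x 2 columns) have the same number of columns, so rbind() can stack them even though cbind() could not combine them above:
rbind(A, C) # a 5 x 2 matrix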

 

Keep in mind that there must be a logical reason to combine the matrices in a particular order.  If A and B are plot x species matrices, what are we assuming when we combine them using cbind()?  In addition, with rbind() note that the columns that are being stacked together must contain the same class of data.

Matrix Addition and Subtraction

Matrix addition and subtraction are performed element-by-element.  Therefore, the matrices to be added or subtracted must be of identical dimensions – both rows and columns.  The number of rows does not have to equal the number of columns, however.

To demonstrate this, we’ll begin by re-assigning A, B, and C to correspond to Equation A.1 from Gotelli & Ellison (2004):
A <- matrix(data = c(3,-1,0,4,5,2), nrow = 3)
B <- matrix(data = c(4,1,-2,8,-3,6), nrow = 3)
C <- A + B; C

Matrices are commutative with respect to addition – the order of the terms doesn’t matter:
A + B
B + A

Matrices are also associative with respect to addition – the order in which the operations are conducted, as determined by which terms are in parentheses, doesn’t matter:
(A + B) + C
A + (B + C)

However, matrices are not commutative or associative with respect to subtraction (Gotelli & Ellison are incorrect in the first edition of their book; I haven’t checked if this has been fixed in later editions):
A - B
B - A

(A - B) - C
A - (B - C)

Scalar and Matrix Multiplication

Two types of multiplication are possible with vectors and matrices: element-by-element and matrix.

Scalar Multiplication

Element-by-element multiplication is the standard type of multiplication (indicated by *) and is the default in R.  This is most easily seen by multiplying a matrix by a scalar (single value).  Each element in the matrix is simply multiplied by the scalar.  This process is commutative, associative, and distributive.  For example, here is Equation A.2 from Gotelli & Ellison (2004):
k <- 3
A <- matrix(data = c(1,2,3,0,2,0,-5,3), nrow = 2)
k * A
A * k

c <- 5
c * (k * A)
(c * k) * A

B <- matrix(data = c(1,2,3,0,2,0,-5,3), nrow = 2)
k * (A + B)
k * A + k * B

This type of multiplication can even be applied to matrices, if those matrices are of the same size.  The value in one position in the matrix is simply multiplied by the value in the same position in the other matrix.  For example:
A * A
Inspect the result to verify that we have simply squared every value.

Matrix Multiplication

Matrix multiplication is a bit more complicated than scalar multiplication – in fact, it is not always possible.  The number of columns in the first matrix must equal the number of rows in the second matrix – if they are not equal, matrix multiplication cannot occur.

One way to figure out whether two matrices can be matrix multiplied is by writing down the dimensions of each matrix.  For example, here are the matrices from Equation A.3 of Gotelli & Ellison (2004):
A <- matrix(data = c(1,2,3,0,3,-2), nrow = 2) # A is a 2x3 matrix
B <- matrix(data = c(1,3,-1,-2,2,3,2,2,3), nrow = 3) # B is a 3x3 matrix

Matrix multiplying A and B involves combining a (2×3) matrix and a (3×3) matrix.

  • Since the inside numbers are the same, they can be matrix multiplied.
  • The outside dimensions indicate the dimensions of the resulting matrix (here, 2 rows x 3 columns).

Why?  Well, it relates to how matrix multiplication is actually conducted.  Recall that we always refer to rows first, columns second.  In matrix multiplication, each element in a row of the first matrix is multiplied by the corresponding element in a column of the second matrix, and the products are summed together.  If the inner dimensions of the two matrices aren’t equal, there will be elements in one matrix that cannot be multiplied by an element in the other matrix.

The calculation for the first row of A and the first column of B in our example is:

Element locations: ([1,1] * [1,1]) + ([1,2] * [2,1]) + ([1,3] * [3,1])

Data: (1 * 1) + (3 * 3) + (3 * -1) = 7
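We can check this calculation in R by multiplying the first row of A element-wise by the first column of B and summing the products (a hand check, not part of the original example):
sum(A[1, ] * B[, 1]) # 7, which will be element [1,1] of the product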

This process is illustrated graphically for two values below.

Figure: Example of how matrix multiplication is the sum of the products of the elements in a row of one matrix (A) and the elements in a column of another matrix (B). From Wikipedia.

 

The matrix multiplication symbol (%*%) tells R to repeat this calculation for each pair of elements in the matrices:
A %*% B

Assuming the matrices are of the proper dimensions to actually be multiplied, matrix multiplication is associative:
A %*% (B %*% B)
(A %*% B) %*% B

However, it is not commutative:
A %*% B
B %*% A # Doesn’t work

 

Matrix Multiplication Terminology

The order of the matrices matters during matrix multiplication.  As a result, the terminology needs to be precise.  For example, the equation AB can be equivalently described as:

  • The premultiplication of B by A
  • The postmultiplication of A by B

Matrix Transposition and Symmetry

In some cases, matrix multiplication is not possible in one configuration but would be possible if the rows and columns of one of the matrices were reversed.  This reversal is known as transposition, and is easily done in R:
A    # 2x3 matrix
t(A) # 3x2 matrix
The transpose of A is written as A′ or Aᵀ.

 

We saw above that BA did not work, but compare this with BAᵀ:
B %*% t(A)     # Why does this work?
Note that the dimensionality of BAᵀ, and the elements within this product, differ from those in AB.

 

Many matrices are square – the two dimensions are the same.  Square matrices can have many desirable properties.  In particular, square matrices can be symmetric, meaning that they are identical to their transpose.  This can be verified by visually comparing a matrix with its transpose or by subtracting one from the other (if symmetric, what should the result be?).
A <- matrix(data = c(1,2,2,2,4,3,2,3,-4), nrow = 3) # What are the dimensions of this matrix?
A
t(A)
A - t(A)

 

However, not all square matrices are symmetric:
B <- matrix(data = c(1,3,-1,-2,2,3,2,2,3), nrow = 3)
B
t(B)
B - t(B)
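Base R also provides the isSymmetric() function as a direct test (it is not used in the rest of this chapter):
isSymmetric(A) # TRUE
isSymmetric(B) # FALSE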

 

We will see many symmetric square matrices (diagonal matrices, identity matrices, variance-covariance matrices, distance matrices, etc.) throughout this course.

 

The diagonal elements of a matrix are often important, and can be easily extracted:
diag(x = A)

 

This command can also be used to modify the values on the diagonal of a matrix:
A
diag(x = A) <- 10; A

 

Key Features of (Some) Matrices

Matrices with the following features are particularly useful:

  • Square
  • Symmetric (identical to their transpose)

All symmetric matrices are square, but not all square matrices are symmetric.

Matrix Inversion

An identity matrix is a symmetric square matrix with 1s on the diagonal and 0s elsewhere.  It can be created using the diag() function.  Since an identity matrix is square, the number of rows equals the number of columns, so you only need to specify one dimension:
I <- diag(x = 4); I
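As a quick check (using the 3 x 3 matrix B defined earlier), multiplying a matrix by a conformable identity matrix returns the matrix unchanged:
B %*% diag(x = 3) # identical to B
diag(x = 3) %*% B # also identical to B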

 

Identity matrices are particularly important with respect to matrix inversion.  The inverse of matrix A is a matrix that, when matrix multiplied by A, yields an identity matrix of the same dimensions.  The inverse of A is written as A⁻¹.  The solve() function can determine the inverse of a matrix:
A
solve(A)
A %*% solve(A) # Verify that the product is an identity matrix
Use the round() function if necessary to eliminate tiny nonzero values that arise from floating-point arithmetic.
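For example (the number of digits shown here is arbitrary):
round(A %*% solve(A), digits = 3)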

 

Inversions only exist for square matrices, but not all square matrices have an inverse.
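As a quick illustration (this singular matrix is not from the text), a square matrix whose determinant is zero has no inverse, and solve() will return an error for it:
S <- matrix(data = c(1, 2, 2, 4), nrow = 2) # second column is twice the first
det(S) # 0, so S is singular
# solve(S) produces an error because no inverse exists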

 

Sometimes we need the matrix that, when premultiplied by A, yields something other than an identity matrix.  We can find it by adding a second argument to the solve() function: solve(A, B) returns the matrix X that satisfies AX = B (i.e., X = A⁻¹B).  Suppose we have the equation AX = B and need to calculate X:
A <- matrix(data = c(1,2,2,2,4,3,2,3,-4), nrow = 3)
B <- matrix(data = c(1,2,3,2,3,1,3,2,1), nrow = 3)
X <- solve(A, B) # solves A %*% X = B for X
A %*% X # Verify the result! What should this be?

 

A matrix is orthogonal if its inverse and its transpose are identical.  An identity matrix is an example of an orthogonal matrix.
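A rotation matrix is a classic example of an orthogonal matrix (this example is not from the text; the object names and the angle are arbitrary):
theta <- pi / 3
rot <- matrix(data = c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
round(solve(rot) - t(rot), digits = 10) # all zeros: the inverse equals the transpose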

Concluding Thoughts

Matrix algebra is an incredibly powerful tool when dealing with multivariate data, and we will be using these concepts throughout the quarter.  For example, distance measures convert observations of m variables on n sample units (i.e., an n x m data matrix) into an n x n symmetric square distance matrix.
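As a minimal preview (using the built-in dist() function with made-up random data; the object names are arbitrary):
X <- matrix(data = rnorm(20), nrow = 5) # 5 sample units x 4 variables
D <- as.matrix(dist(X)) # Euclidean distances by default
dim(D)         # 5 5
isSymmetric(D) # TRUE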

References

Ellner, S.P., and J. Guckenheimer. 2011. An introduction to R for dynamic models in biology. http://www.cam.cornell.edu/~dmb/DynamicModelsLabsInR.pdf

Gotelli, N.J., and A.M. Ellison. 2004. A primer of ecological statistics. Sinauer Associates, Sunderland, MA.

License


Applied Multivariate Statistics in R Copyright © 2024 by Jonathan D. Bakker is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
