Linear Algebra Review and Reference
Zico Kolter (updated by Chuong Do)
October 7, 2008
Contents
1 Basic Concepts and Notation
  1.1 Basic Notation
2 Matrix Multiplication
  2.1 Vector-Vector Products
  2.2 Matrix-Vector Products
  2.3 Matrix-Matrix Products
3 Operations and Properties
  3.1 The Identity Matrix and Diagonal Matrices
  3.2 The Transpose
  3.3 Symmetric Matrices
  3.4 The Trace
  3.5 Norms
  3.6 Linear Independence and Rank
  3.7 The Inverse
  3.8 Orthogonal Matrices
  3.9 Range and Nullspace of a Matrix
  3.10 The Determinant
  3.11 Quadratic Forms and Positive Semidefinite Matrices
  3.12 Eigenvalues and Eigenvectors
  3.13 Eigenvalues and Eigenvectors of Symmetric Matrices
4 Matrix Calculus
  4.1 The Gradient
  4.2 The Hessian
  4.3 Gradients and Hessians of Quadratic and Linear Functions
  4.4 Least Squares
  4.5 Gradients of the Determinant
  4.6 Eigenvalues as Optimization


1 Basic Concepts and Notation

Linear algebra provides a way of compactly representing and operating on sets of linear equations. For example, consider the following system of equations:

$$
\begin{aligned}
4x_1 - 5x_2 &= -13 \\
-2x_1 + 3x_2 &= 9.
\end{aligned}
$$

These are two equations in two variables, so, as you know from high school algebra, you can find a unique solution for $x_1$ and $x_2$ (unless the equations are degenerate, for example if the second equation is simply a multiple of the first; in the case above there is in fact a unique solution). In matrix notation, we can write the system more compactly as

$$
Ax = b
$$

with

$$
A = \begin{bmatrix} 4 & -5 \\ -2 & 3 \end{bmatrix}, \quad b = \begin{bmatrix} -13 \\ 9 \end{bmatrix}.
$$

As we will see shortly, there are many advantages (including the obvious space savings) to analyzing linear equations in this form.
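As a quick check of the example system, here is a minimal Python sketch that solves a 2 × 2 system by Cramer's rule. The helper name `solve_2x2` is made up for this illustration and does not come from the notes, and the use of exact rational arithmetic (`fractions.Fraction`) is an assumption made to avoid rounding.

```python
from fractions import Fraction

def solve_2x2(A, b):
    """Solve a 2x2 system Ax = b by Cramer's rule (hypothetical helper)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0:
        # Degenerate case: one equation is a multiple of the other.
        raise ValueError("no unique solution")
    x1 = Fraction(b[0] * a22 - a12 * b[1], det)
    x2 = Fraction(a11 * b[1] - b[0] * a21, det)
    return x1, x2

A = [[4, -5], [-2, 3]]
b = [-13, 9]
x1, x2 = solve_2x2(A, b)  # substituting back reproduces both equations
```

Cramer's rule is used here only because it is short for two unknowns; larger systems would normally be solved by elimination.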

1.1 Basic Notation

We use the following notation:

  • By $A \in \mathbb{R}^{m \times n}$ we denote a matrix with $m$ rows and $n$ columns, where the entries of $A$ are real numbers.
  • By $x \in \mathbb{R}^n$ we denote a vector with $n$ entries. By convention, an $n$-dimensional vector is often thought of as a matrix with $n$ rows and 1 column, known as a column vector. If we want to explicitly represent a row vector (a matrix with 1 row and $n$ columns) we typically write $x^T$ (here $x^T$ denotes the transpose of $x$, which we will define shortly).
  • The $i$th element of a vector $x$ is denoted $x_i$:

$$
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.
$$

2.1 Vector-Vector Products

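Given two vectors $x, y \in \mathbb{R}^n$, the quantity $x^T y$, sometimes called the inner product or dot product of the vectors, is a real number given by

$$
x^T y \in \mathbb{R} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \sum_{i=1}^{n} x_i y_i.
$$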

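Observe that inner products are really just a special case of matrix multiplication. Note that it is always the case that $x^T y = y^T x$.

Given vectors $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$ (not necessarily of the same size), $x y^T \in \mathbb{R}^{m \times n}$ is called the outer product of the vectors. It is a matrix whose entries are given by $(x y^T)_{ij} = x_i y_j$, i.e.,

$$
x y^T \in \mathbb{R}^{m \times n} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix} = \begin{bmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_m y_1 & x_m y_2 & \cdots & x_m y_n \end{bmatrix}.
$$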

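As an example of how the outer product can be useful, let $\mathbf{1} \in \mathbb{R}^n$ denote an $n$-dimensional vector whose entries are all equal to 1. Furthermore, consider the matrix $A \in \mathbb{R}^{m \times n}$ whose columns are all equal to some vector $x \in \mathbb{R}^m$. Using outer products, we can represent $A$ compactly as

$$
A = \begin{bmatrix} | & | & & | \\ x & x & \cdots & x \\ | & | & & | \end{bmatrix} = \begin{bmatrix} x_1 & x_1 & \cdots & x_1 \\ x_2 & x_2 & \cdots & x_2 \\ \vdots & \vdots & \ddots & \vdots \\ x_m & x_m & \cdots & x_m \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} = x \mathbf{1}^T.
$$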
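The two products in this section can be sketched in a few lines of plain Python. The function names `inner` and `outer` and the sample vectors are illustrative choices, not part of the notes; vectors are represented as lists and matrices as lists of rows.

```python
def inner(x, y):
    """Inner product x^T y of two vectors of the same length."""
    if len(x) != len(y):
        raise ValueError("inner product needs vectors of equal length")
    return sum(xi * yi for xi, yi in zip(x, y))

def outer(x, y):
    """Outer product x y^T: entry (i, j) is x_i * y_j."""
    return [[xi * yj for yj in y] for xi in x]

x = [1, 2, 3]
y = [4, 5, 6]
dot_xy = inner(x, y)               # 1*4 + 2*5 + 3*6
repeated = outer(x, [1, 1])        # x 1^T: every column equals x
```

The last line mirrors the identity $A = x \mathbf{1}^T$: taking the outer product with an all-ones vector copies $x$ into each column.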

2.2 Matrix-Vector Products

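Given a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $x \in \mathbb{R}^n$, their product is a vector $y = Ax \in \mathbb{R}^m$. There are a couple of ways of looking at matrix-vector multiplication, and we will look at each of them in turn. If we write $A$ by rows, then we can express $Ax$ as

$$
y = Ax = \begin{bmatrix} \text{---}\ a_1^T\ \text{---} \\ \text{---}\ a_2^T\ \text{---} \\ \vdots \\ \text{---}\ a_m^T\ \text{---} \end{bmatrix} x = \begin{bmatrix} a_1^T x \\ a_2^T x \\ \vdots \\ a_m^T x \end{bmatrix}.
$$

In other words, the $i$th entry of $y$ is equal to the inner product of the $i$th row of $A$ and $x$, $y_i = a_i^T x$. Alternatively, let's write $A$ in column form. In this case we see that

$$
y = Ax = \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} \\ a_1 \\ \\ \end{bmatrix} x_1 + \begin{bmatrix} \\ a_2 \\ \\ \end{bmatrix} x_2 + \cdots + \begin{bmatrix} \\ a_n \\ \\ \end{bmatrix} x_n.
$$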

In other words, $y$ is a linear combination of the columns of $A$, where the coefficients of the linear combination are given by the entries of $x$.

So far we have been multiplying on the right by a column vector, but it is also possible to multiply on the left by a row vector. This is written $y^T = x^T A$ for $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^m$, and $y \in \mathbb{R}^n$. As before, we can express $y^T$ in two obvious ways, depending on whether we express $A$ in terms of its rows or columns. In the first case we express $A$ in terms of its columns, which gives

$$
y^T = x^T A = x^T \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix} = \begin{bmatrix} x^T a_1 & x^T a_2 & \cdots & x^T a_n \end{bmatrix},
$$

which demonstrates that the $i$th entry of $y^T$ is equal to the inner product of $x$ and the $i$th column of $A$. Finally, expressing $A$ in terms of rows we get the final representation of the vector-matrix product,

$$
\begin{aligned}
y^T = x^T A &= \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix} \begin{bmatrix} \text{---}\ a_1^T\ \text{---} \\ \text{---}\ a_2^T\ \text{---} \\ \vdots \\ \text{---}\ a_m^T\ \text{---} \end{bmatrix} \\
&= x_1 \begin{bmatrix} \text{---}\ a_1^T\ \text{---} \end{bmatrix} + x_2 \begin{bmatrix} \text{---}\ a_2^T\ \text{---} \end{bmatrix} + \cdots + x_m \begin{bmatrix} \text{---}\ a_m^T\ \text{---} \end{bmatrix},
\end{aligned}
$$

so we see that $y^T$ is a linear combination of the rows of $A$, where the coefficients for the linear combination are given by the entries of $x$.
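The row and column viewpoints give the same numbers, which the following plain-Python sketch makes concrete. The function names (`matvec_rows`, `matvec_cols`, `vecmat`) and the sample data are assumptions made for illustration only.

```python
def matvec_rows(A, x):
    """Row view: y_i is the inner product of row i of A with x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matvec_cols(A, x):
    """Column view: y is a combination of the columns of A weighted by x."""
    y = [0] * len(A)
    for j, xj in enumerate(x):
        for i in range(len(A)):
            y[i] += A[i][j] * xj   # add x_j times column j
    return y

def vecmat(x, A):
    """Left multiplication y^T = x^T A: a combination of the rows of A."""
    y = [0] * len(A[0])
    for xi, row in zip(x, A):
        for j, a in enumerate(row):
            y[j] += xi * a         # add x_i times row i
    return y

A = [[1, 2, 3], [4, 5, 6]]         # 2 x 3
y_rows = matvec_rows(A, [1, 0, -1])
y_cols = matvec_cols(A, [1, 0, -1])
y_left = vecmat([1, 2], A)         # 1 * row 1 + 2 * row 2
```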

2.3 Matrix-Matrix Products

Armed with this knowledge, we can now look at four different (but, of course, equivalent) ways of viewing the matrix-matrix multiplication C = AB as defined at the beginning of this section. First, we can view matrix-matrix multiplication as a set of vector-vector products. The most obvious viewpoint, which follows immediately from the definition, is that the (i, j)th

It may seem like overkill to dissect matrix multiplication to such a large degree, especially when all these viewpoints follow immediately from the initial definition we gave (in about a line of math) at the beginning of this section. However, virtually all of linear algebra deals with matrix multiplications of some kind, and it is worthwhile to spend some time trying to develop an intuitive understanding of the viewpoints presented here. In addition to this, it is useful to know a few basic properties of matrix multiplication at a higher level:

  • Matrix multiplication is associative: (AB)C = A(BC).
  • Matrix multiplication is distributive: A(B + C) = AB + AC.
  • Matrix multiplication is, in general, not commutative; that is, it can be the case that $AB \neq BA$. (For example, if $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times q}$, the matrix product $BA$ does not even exist if $m$ and $q$ are not equal!)

If you are not familiar with these properties, take the time to verify them for yourself. For example, to check the associativity of matrix multiplication, suppose that $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$, and $C \in \mathbb{R}^{p \times q}$. Note that $AB \in \mathbb{R}^{m \times p}$, so $(AB)C \in \mathbb{R}^{m \times q}$. Similarly, $BC \in \mathbb{R}^{n \times q}$, so $A(BC) \in \mathbb{R}^{m \times q}$. Thus, the dimensions of the resulting matrices agree. To show that matrix multiplication is associative, it suffices to check that the $(i, j)$th entry of $(AB)C$ is equal to the $(i, j)$th entry of $A(BC)$. We can verify this directly using the definition of matrix multiplication:

$$
\begin{aligned}
((AB)C)_{ij} &= \sum_{k=1}^{p} (AB)_{ik} C_{kj} = \sum_{k=1}^{p} \left( \sum_{l=1}^{n} A_{il} B_{lk} \right) C_{kj} \\
&= \sum_{k=1}^{p} \left( \sum_{l=1}^{n} A_{il} B_{lk} C_{kj} \right) = \sum_{l=1}^{n} \left( \sum_{k=1}^{p} A_{il} B_{lk} C_{kj} \right) \\
&= \sum_{l=1}^{n} A_{il} \left( \sum_{k=1}^{p} B_{lk} C_{kj} \right) = \sum_{l=1}^{n} A_{il} (BC)_{lj} = (A(BC))_{ij}.
\end{aligned}
$$

Here, the first two and last two equalities simply use the definition of matrix multiplication, the third and fifth equalities use the distributive property of scalar multiplication over addition, and the fourth equality uses the commutativity and associativity of scalar addition. This technique for proving matrix properties by reduction to simple scalar properties will come up often, so make sure you're familiar with it.
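The properties above can also be spot-checked numerically. The sketch below is a minimal illustration with a hypothetical helper `matmul` written directly from the entrywise definition; the matrices are arbitrary examples, and a numerical check on one instance is of course not a proof.

```python
def matmul(A, B):
    """Entrywise definition: (AB)_ik = sum over l of A_il * B_lk."""
    inner_dim = len(B)
    if any(len(row) != inner_dim for row in A):
        raise ValueError("inner dimensions do not match")
    return [[sum(A[i][l] * B[l][k] for l in range(inner_dim))
             for k in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 4], [5, 6]]        # 3 x 2
B = [[1, 0, 2], [-1, 3, 1]]         # 2 x 3
C = [[2], [0], [1]]                 # 3 x 1

left = matmul(matmul(A, B), C)      # (AB)C
right = matmul(A, matmul(B, C))     # A(BC)

# A pair of square matrices for which the two orders of multiplication differ.
P = [[0, 1], [0, 0]]
Q = [[0, 0], [1, 0]]
PQ, QP = matmul(P, Q), matmul(Q, P)
```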

3 Operations and Properties

In this section we present several operations and properties of matrices and vectors. Hopefully a great deal of this will be review for you, so the notes can just serve as a reference for these topics.

3.1 The Identity Matrix and Diagonal Matrices

The identity matrix, denoted $I \in \mathbb{R}^{n \times n}$, is a square matrix with ones on the diagonal and zeros everywhere else. That is,

$$
I_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}
$$

It has the property that for all $A \in \mathbb{R}^{m \times n}$,

$$
AI = A = IA.
$$

Note that in some sense, the notation for the identity matrix is ambiguous, since it does not specify the dimension of $I$. Generally, the dimensions of $I$ are inferred from context so as to make matrix multiplication possible. For example, in the equation above, the $I$ in $AI = A$ is an $n \times n$ matrix, whereas the $I$ in $A = IA$ is an $m \times m$ matrix.

A diagonal matrix is a matrix where all non-diagonal elements are 0. This is typically denoted $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$, with

$$
D_{ij} = \begin{cases} d_i & i = j \\ 0 & i \neq j \end{cases}
$$

Clearly, $I = \mathrm{diag}(1, 1, \ldots, 1)$.

3.2 The Transpose

The transpose of a matrix results from “flipping” the rows and columns. Given a matrix $A \in \mathbb{R}^{m \times n}$, its transpose, written $A^T \in \mathbb{R}^{n \times m}$, is the $n \times m$ matrix whose entries are given by $(A^T)_{ij} = A_{ji}$.

We have in fact already been using the transpose when describing row vectors, since the transpose of a column vector is naturally a row vector. The following properties of transposes are easily verified:

  • $(A^T)^T = A$
  • $(AB)^T = B^T A^T$
  • $(A + B)^T = A^T + B^T$
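The transpose properties listed above can be checked on a small example. The helpers `transpose` and `matmul` below are illustrative names, not from the notes, and the matrices are arbitrary.

```python
def transpose(A):
    """(A^T)_ij = A_ji: rows become columns."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2, 3], [4, 5, 6]]          # 2 x 3
B = [[1, 0], [2, 1], [0, -1]]       # 3 x 2

lhs = transpose(matmul(A, B))                 # (AB)^T
rhs = matmul(transpose(B), transpose(A))      # B^T A^T
```

Note the reversed order on the right-hand side: $A^T B^T$ would have shape $3 \times 3$ here, while $(AB)^T$ is $2 \times 2$.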

3.3 Symmetric Matrices

A square matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A = A^T$. It is anti-symmetric if $A = -A^T$. It is easy to show that for any matrix $A \in \mathbb{R}^{n \times n}$, the matrix $A + A^T$ is symmetric and the

Here, the first and last two equalities use the definition of the trace operator and matrix multiplication. The fourth equality, where the main work occurs, uses the commutativity of scalar multiplication in order to reverse the order of the terms in each product, and the commutativity and associativity of scalar addition in order to rearrange the order of the summation.

3.5 Norms

A norm of a vector $\|x\|$ is informally a measure of the “length” of the vector. For example, we have the commonly-used Euclidean or $\ell_2$ norm,

$$
\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}.
$$

Note that $\|x\|_2^2 = x^T x$. More formally, a norm is any function $f : \mathbb{R}^n \to \mathbb{R}$ that satisfies 4 properties:

  1. For all $x \in \mathbb{R}^n$, $f(x) \geq 0$ (non-negativity).
  2. $f(x) = 0$ if and only if $x = 0$ (definiteness).
  3. For all $x \in \mathbb{R}^n$, $t \in \mathbb{R}$, $f(tx) = |t| f(x)$ (homogeneity).
  4. For all $x, y \in \mathbb{R}^n$, $f(x + y) \leq f(x) + f(y)$ (triangle inequality).

Other examples of norms are the $\ell_1$ norm,

$$
\|x\|_1 = \sum_{i=1}^{n} |x_i|
$$

and the $\ell_\infty$ norm, $\|x\|_\infty = \max_i |x_i|$.

In fact, all three norms presented so far are examples of the family of $\ell_p$ norms, which are parameterized by a real number $p \geq 1$, and defined as

$$
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.
$$

Norms can also be defined for matrices, such as the Frobenius norm,

$$
\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2} = \sqrt{\mathrm{tr}(A^T A)}.
$$

Many other norms exist, but they are beyond the scope of this review.
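The norm definitions above translate almost directly into code. The following is a small sketch using only the Python standard library; the function names `lp_norm` and `frobenius_norm` are assumptions for this example.

```python
import math

def lp_norm(x, p):
    """l_p norm of a vector for p >= 1; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(xi) for xi in x)
    if p < 1:
        raise ValueError("p must be at least 1 for a norm")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def frobenius_norm(A):
    """Square root of the sum of squared entries of a matrix."""
    return math.sqrt(sum(a * a for row in A for a in row))

x = [3, -4]
norm1 = lp_norm(x, 1)            # |3| + |-4|
norm2 = lp_norm(x, 2)            # sqrt(9 + 16)
norm_inf = lp_norm(x, math.inf)  # max(|3|, |-4|)
fro = frobenius_norm([[1, 2], [2, 4]])
```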

3.6 Linear Independence and Rank

A set of vectors $\{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^m$ is said to be (linearly) independent if no vector can be represented as a linear combination of the remaining vectors. Conversely, if one vector belonging to the set can be represented as a linear combination of the remaining vectors, then the vectors are said to be (linearly) dependent. That is, if

$$
x_n = \sum_{i=1}^{n-1} \alpha_i x_i
$$

for some scalar values $\alpha_1, \ldots, \alpha_{n-1} \in \mathbb{R}$, then we say that the vectors $x_1, \ldots, x_n$ are linearly dependent; otherwise, the vectors are linearly independent. For example, the vectors

$$
x_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 4 \\ 1 \\ 5 \end{bmatrix}, \quad x_3 = \begin{bmatrix} 2 \\ -3 \\ -1 \end{bmatrix}
$$

are linearly dependent because $x_3 = -2x_1 + x_2$.

The column rank of a matrix $A \in \mathbb{R}^{m \times n}$ is the size of the largest subset of columns of $A$ that constitute a linearly independent set. With some abuse of terminology, this is often referred to simply as the number of linearly independent columns of $A$. In the same way, the row rank is the largest number of rows of $A$ that constitute a linearly independent set.

For any matrix $A \in \mathbb{R}^{m \times n}$, it turns out that the column rank of $A$ is equal to the row rank of $A$ (though we will not prove this), and so both quantities are referred to collectively as the rank of $A$, denoted as $\mathrm{rank}(A)$. The following are some basic properties of the rank:

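  • For $A \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A) \leq \min(m, n)$. If $\mathrm{rank}(A) = \min(m, n)$, then $A$ is said to be full rank.
  • For $A \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A) = \mathrm{rank}(A^T)$.
  • For $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$, $\mathrm{rank}(AB) \leq \min(\mathrm{rank}(A), \mathrm{rank}(B))$.
  • For $A, B \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A + B) \leq \mathrm{rank}(A) + \mathrm{rank}(B)$.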
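One way to compute a rank by hand or in code is Gaussian elimination: the rank is the number of pivots found. The sketch below is an illustration under that approach (the notes themselves do not prescribe an algorithm); the helper name `rank` and the use of exact `Fraction` arithmetic are assumptions. The example matrix has three columns chosen so that the third equals $-2$ times the first plus the second.

```python
from fractions import Fraction

def rank(A):
    """Rank of a matrix (list of rows) via Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    r = 0                                   # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return r

# Columns: (1, 2, 3), (4, 1, 5), and -2 * first + second = (2, -3, -1).
A = [[1, 4, 2],
     [2, 1, -3],
     [3, 5, -1]]
A_T = [list(col) for col in zip(*A)]
rank_A, rank_A_T = rank(A), rank(A_T)
```

Because one column depends on the other two, the rank is less than 3, and the transpose has the same rank, consistent with the second property in the list.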

3.7 The Inverse

The inverse of a square matrix $A \in \mathbb{R}^{n \times n}$ is denoted $A^{-1}$, and is the unique matrix such that $A^{-1} A = I = A A^{-1}$.

Note that not all matrices have inverses. Non-square matrices, for example, do not have inverses by definition. However, for some square matrices A, it may still be the case that

It can be shown that if $\{x_1, \ldots, x_n\}$ is a set of $n$ linearly independent vectors, where each $x_i \in \mathbb{R}^n$, then $\mathrm{span}(\{x_1, \ldots, x_n\}) = \mathbb{R}^n$. In other words, any vector $v \in \mathbb{R}^n$ can be written as a linear combination of $x_1$ through $x_n$. The projection of a vector $y \in \mathbb{R}^m$ onto the span of $\{x_1, \ldots, x_n\}$ (here we assume $x_i \in \mathbb{R}^m$) is the vector $v \in \mathrm{span}(\{x_1, \ldots, x_n\})$ such that $v$ is as close as possible to $y$, as measured by the Euclidean norm $\|v - y\|_2$. We denote the projection as $\mathrm{Proj}(y; \{x_1, \ldots, x_n\})$ and can define it formally as

$$
\mathrm{Proj}(y; \{x_1, \ldots, x_n\}) = \operatorname{argmin}_{v \in \mathrm{span}(\{x_1, \ldots, x_n\})} \|y - v\|_2.
$$

The range (sometimes also called the columnspace) of a matrix $A \in \mathbb{R}^{m \times n}$, denoted $\mathcal{R}(A)$, is the span of the columns of $A$. In other words,

$$
\mathcal{R}(A) = \{v \in \mathbb{R}^m : v = Ax,\ x \in \mathbb{R}^n\}.
$$

Making a few technical assumptions (namely that $A$ is full rank and that $n < m$), the projection of a vector $y \in \mathbb{R}^m$ onto the range of $A$ is given by

$$
\mathrm{Proj}(y; A) = \operatorname{argmin}_{v \in \mathcal{R}(A)} \|v - y\|_2 = A (A^T A)^{-1} A^T y.
$$

This last equation should look extremely familiar, since it is almost the same formula we derived in class (and which we will soon derive again) for the least squares estimation of parameters. Looking at the definition for the projection, it should not be too hard to convince yourself that this is in fact the same objective that we minimized in our least squares problem (except for a squaring of the norm, which doesn't affect the optimal point) and so these problems are naturally very connected. When $A$ contains only a single column, $a \in \mathbb{R}^m$, this gives the special case for a projection of a vector onto a line:

$$
\mathrm{Proj}(y; a) = \frac{a a^T}{a^T a} y.
$$
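The single-column case is easy to try out. The sketch below computes $\frac{a a^T}{a^T a} y$ as the scalar $\frac{a^T y}{a^T a}$ times $a$, which is the same quantity regrouped; the function name `project_onto_line` and the integer test vectors are illustrative assumptions.

```python
from fractions import Fraction

def project_onto_line(y, a):
    """Projection of y onto the line spanned by a non-zero vector a."""
    a_dot_a = sum(ai * ai for ai in a)
    if a_dot_a == 0:
        raise ValueError("a must be non-zero")
    coeff = Fraction(sum(ai * yi for ai, yi in zip(a, y)), a_dot_a)
    return [coeff * ai for ai in a]

y = [1, 2, 3]
a = [1, 1, 1]
proj = project_onto_line(y, a)

# The residual y - proj should be orthogonal to a.
residual = [yi - pi for yi, pi in zip(y, proj)]
residual_dot_a = sum(ri * ai for ri, ai in zip(residual, a))
```

Orthogonality of the residual to $a$ is what makes the projection the closest point on the line.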

The nullspace of a matrix $A \in \mathbb{R}^{m \times n}$, denoted $\mathcal{N}(A)$, is the set of all vectors that equal 0 when multiplied by $A$, i.e.,

$$
\mathcal{N}(A) = \{x \in \mathbb{R}^n : Ax = 0\}.
$$

Note that vectors in $\mathcal{R}(A)$ are of size $m$, while vectors in $\mathcal{N}(A)$ are of size $n$, so vectors in $\mathcal{R}(A^T)$ and $\mathcal{N}(A)$ are both in $\mathbb{R}^n$. In fact, we can say much more. It turns out that

$$
\{w : w = u + v,\ u \in \mathcal{R}(A^T),\ v \in \mathcal{N}(A)\} = \mathbb{R}^n \quad \text{and} \quad \mathcal{R}(A^T) \cap \mathcal{N}(A) = \{0\}.
$$

In other words, $\mathcal{R}(A^T)$ and $\mathcal{N}(A)$ are subspaces that share only the zero vector and together span the entire space $\mathbb{R}^n$. Sets of this type are called orthogonal complements, and we denote this $\mathcal{R}(A^T) = \mathcal{N}(A)^\perp$.

3.10 The Determinant

The determinant of a square matrix $A \in \mathbb{R}^{n \times n}$ is a function $\det : \mathbb{R}^{n \times n} \to \mathbb{R}$, and is denoted $|A|$ or $\det A$ (like the trace operator, we usually omit parentheses). Algebraically, one could write down an explicit formula for the determinant of $A$, but this unfortunately gives little intuition about its meaning. Instead, we'll start out by providing a geometric interpretation of the determinant and then visit some of its specific algebraic properties afterwards.

Given a matrix

$$
\begin{bmatrix} \text{---}\ a_1^T\ \text{---} \\ \text{---}\ a_2^T\ \text{---} \\ \vdots \\ \text{---}\ a_n^T\ \text{---} \end{bmatrix},
$$

consider the set of points $S \subset \mathbb{R}^n$ formed by taking all possible linear combinations of the row vectors $a_1, \ldots, a_n \in \mathbb{R}^n$ of $A$, where the coefficients of the linear combination are all between 0 and 1; that is, the set $S$ is the restriction of $\mathrm{span}(\{a_1, \ldots, a_n\})$ to only those linear combinations whose coefficients $\alpha_1, \ldots, \alpha_n$ satisfy $0 \leq \alpha_i \leq 1$, $i = 1, \ldots, n$. Formally,

$$
S = \left\{ v \in \mathbb{R}^n : v = \sum_{i=1}^{n} \alpha_i a_i \ \text{where}\ 0 \leq \alpha_i \leq 1,\ i = 1, \ldots, n \right\}.
$$

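The absolute value of the determinant of $A$, it turns out, is a measure of the “volume” of the set $S$.² For example, consider the $2 \times 2$ matrix

$$
A = \begin{bmatrix} 1 & 3 \\ 3 & 2 \end{bmatrix}.
$$

Here, the rows of the matrix are

$$
a_1 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad a_2 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}.
$$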

The set $S$ corresponding to these rows is shown in Figure 1. For two-dimensional matrices, $S$ generally has the shape of a parallelogram. In our example, the value of the determinant is $|A| = -7$ (as can be computed using the formulas shown later in this section), so the area of the parallelogram is 7. (Verify this for yourself!)

In three dimensions, the set $S$ corresponds to an object known as a parallelepiped (a three-dimensional box with skewed sides, such that every face has the shape of a parallelogram). The absolute value of the determinant of the $3 \times 3$ matrix whose rows define $S$ gives the three-dimensional volume of the parallelepiped. In even higher dimensions, the set $S$ is an object known as an $n$-dimensional parallelotope.

² Admittedly, we have not actually defined what we mean by “volume” here, but hopefully the intuition should be clear enough. When $n = 2$, our notion of “volume” corresponds to the area of $S$ in the Cartesian plane. When $n = 3$, “volume” corresponds with our usual notion of volume for a three-dimensional object.

  • For $A \in \mathbb{R}^{n \times n}$, $|A| = |A^T|$.
  • For $A, B \in \mathbb{R}^{n \times n}$, $|AB| = |A||B|$.
  • For $A \in \mathbb{R}^{n \times n}$, $|A| = 0$ if and only if $A$ is singular (i.e., non-invertible). (If $A$ is singular then it does not have full rank, and hence its columns are linearly dependent. In this case, the set $S$ corresponds to a “flat sheet” within the $n$-dimensional space and hence has zero volume.)
  • For $A \in \mathbb{R}^{n \times n}$ and $A$ non-singular, $|A^{-1}| = 1/|A|$.

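Before giving the general definition for the determinant, we define, for $A \in \mathbb{R}^{n \times n}$, $A_{\backslash i, \backslash j} \in \mathbb{R}^{(n-1) \times (n-1)}$ to be the matrix that results from deleting the $i$th row and $j$th column from $A$. The general (recursive) formula for the determinant is

$$
\begin{aligned}
|A| &= \sum_{i=1}^{n} (-1)^{i+j} a_{ij} |A_{\backslash i, \backslash j}| \quad (\text{for any } j \in 1, \ldots, n) \\
&= \sum_{j=1}^{n} (-1)^{i+j} a_{ij} |A_{\backslash i, \backslash j}| \quad (\text{for any } i \in 1, \ldots, n)
\end{aligned}
$$

with the initial case that $|A| = a_{11}$ for $A \in \mathbb{R}^{1 \times 1}$. If we were to expand this formula completely for $A \in \mathbb{R}^{n \times n}$, there would be a total of $n!$ ($n$ factorial) different terms. For this reason, we hardly ever explicitly write the complete equation of the determinant for matrices bigger than $3 \times 3$. However, the equations for determinants of matrices up to size $3 \times 3$ are fairly common, and it is good to know them:

$$
\begin{aligned}
|[a_{11}]| &= a_{11} \\
\left| \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \right| &= a_{11} a_{22} - a_{12} a_{21} \\
\left| \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \right| &= a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} \\
&\quad - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} - a_{13} a_{22} a_{31}
\end{aligned}
$$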

The classical adjoint (often just called the adjoint) of a matrix $A \in \mathbb{R}^{n \times n}$ is denoted $\mathrm{adj}(A)$, and defined as

$$
\mathrm{adj}(A) \in \mathbb{R}^{n \times n}, \quad (\mathrm{adj}(A))_{ij} = (-1)^{i+j} |A_{\backslash j, \backslash i}|
$$

(note the switch in the indices $A_{\backslash j, \backslash i}$). It can be shown that for any nonsingular $A \in \mathbb{R}^{n \times n}$,

$$
A^{-1} = \frac{1}{|A|} \mathrm{adj}(A).
$$

While this is a nice “explicit” formula for the inverse of a matrix, we should note that, numerically, there are in fact much more efficient ways of computing the inverse.
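The recursive formula and the adjoint-based inverse can be written down almost verbatim. The sketch below expands along the first row and is meant only to mirror the definitions (its cost grows like $n!$, so it is not a practical algorithm); the helper names `minor`, `det`, `adjugate`, and `inverse` are assumptions, and the matrices are examples picked for easy hand-checking.

```python
from fractions import Fraction

def minor(A, i, j):
    """A with row i and column j deleted (zero-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))

def adjugate(A):
    """Classical adjoint: entry (i, j) uses the minor with j and i swapped."""
    n = len(A)
    if n == 1:
        return [[1]]
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def inverse(A):
    """Inverse as adj(A) / |A|, in exact rational arithmetic."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(v, d) for v in row] for row in adjugate(A)]

A = [[1, 3], [3, 2]]                    # 1*2 - 3*3 = -7
B = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
det_A, det_B = det(A), det(B)
A_inv = inverse(A)
```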

3.11 Quadratic Forms and Positive Semidefinite Matrices

Given a square matrix $A \in \mathbb{R}^{n \times n}$ and a vector $x \in \mathbb{R}^n$, the scalar value $x^T A x$ is called a quadratic form. Written explicitly, we see that

$$
x^T A x = \sum_{i=1}^{n} x_i (Ax)_i = \sum_{i=1}^{n} x_i \left( \sum_{j=1}^{n} A_{ij} x_j \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j.
$$

Note that

$$
x^T A x = (x^T A x)^T = x^T A^T x = x^T \left( \frac{1}{2} A + \frac{1}{2} A^T \right) x,
$$

where the first equality follows from the fact that the transpose of a scalar is equal to itself, and the last equality follows from the fact that we are averaging two quantities which are themselves equal. From this, we can conclude that only the symmetric part of $A$ contributes to the quadratic form. For this reason, we often implicitly assume that the matrices appearing in a quadratic form are symmetric. We give the following definitions:

  • A symmetric matrix $A \in \mathbb{S}^n$ is positive definite (PD) if for all non-zero vectors $x \in \mathbb{R}^n$, $x^T A x > 0$. This is usually denoted $A \succ 0$ (or just $A > 0$), and often times the set of all positive definite matrices is denoted $\mathbb{S}^n_{++}$.
  • A symmetric matrix $A \in \mathbb{S}^n$ is positive semidefinite (PSD) if for all vectors $x \in \mathbb{R}^n$, $x^T A x \geq 0$. This is written $A \succeq 0$ (or just $A \geq 0$), and the set of all positive semidefinite matrices is often denoted $\mathbb{S}^n_{+}$.
  • Likewise, a symmetric matrix $A \in \mathbb{S}^n$ is negative definite (ND), denoted $A \prec 0$ (or just $A < 0$) if for all non-zero $x \in \mathbb{R}^n$, $x^T A x < 0$.
  • Similarly, a symmetric matrix $A \in \mathbb{S}^n$ is negative semidefinite (NSD), denoted $A \preceq 0$ (or just $A \leq 0$) if for all $x \in \mathbb{R}^n$, $x^T A x \leq 0$.
  • Finally, a symmetric matrix $A \in \mathbb{S}^n$ is indefinite if it is neither positive semidefinite nor negative semidefinite, i.e., if there exist $x_1, x_2 \in \mathbb{R}^n$ such that $x_1^T A x_1 > 0$ and $x_2^T A x_2 < 0$.
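A short numerical illustration of the quadratic form, written as a plain-Python sketch: it evaluates the double sum directly and shows that replacing a matrix by its symmetric part leaves the value unchanged. The helper names `quad_form` and `symmetric_part` and the example matrices are assumptions for this example.

```python
def quad_form(A, x):
    """x^T A x as the double sum of A_ij * x_i * x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def symmetric_part(A):
    """(A + A^T) / 2, the only part of A the quadratic form sees."""
    n = len(A)
    return [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]

A = [[2, 3], [-1, 2]]              # not symmetric
x = [1, -2]
q_full = quad_form(A, x)
q_sym = quad_form(symmetric_part(A), x)

# An indefinite example: the form takes both signs.
D = [[1, 0], [0, -1]]
q_pos = quad_form(D, [1, 0])
q_neg = quad_form(D, [0, 1])
```

Evaluating the form at a handful of vectors can demonstrate indefiniteness, as with `D` here, but it cannot by itself certify that a matrix is positive definite.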

It should be obvious that if $A$ is positive definite, then $-A$ is negative definite and vice versa. Likewise, if $A$ is positive semidefinite then $-A$ is negative semidefinite and vice versa. If $A$ is indefinite, then so is $-A$.

One important property of positive definite and negative definite matrices is that they are always full rank, and hence, invertible. To see why this is the case, suppose that some matrix $A \in \mathbb{R}^{n \times n}$ is not full rank. Then, suppose that the $j$th column of $A$ is expressible as a linear combination of the other $n - 1$ columns:

$$
a_j = \sum_{i \neq j} x_i a_i,
$$

in practice to numerically compute the eigenvalues and eigenvectors (remember that the complete expansion of the determinant has $n!$ terms); it is rather a mathematical argument.

The following are properties of eigenvalues and eigenvectors (in all cases assume $A \in \mathbb{R}^{n \times n}$ has eigenvalues $\lambda_1, \ldots, \lambda_n$ and associated eigenvectors $x_1, \ldots, x_n$):

  • The trace of $A$ is equal to the sum of its eigenvalues,

$$
\mathrm{tr} A = \sum_{i=1}^{n} \lambda_i.
$$

  • The determinant of $A$ is equal to the product of its eigenvalues,

$$
|A| = \prod_{i=1}^{n} \lambda_i.
$$

  • If $A$ is diagonalizable, the rank of $A$ is equal to the number of non-zero eigenvalues of $A$.
  • If $A$ is non-singular then $1/\lambda_i$ is an eigenvalue of $A^{-1}$ with associated eigenvector $x_i$, i.e., $A^{-1} x_i = (1/\lambda_i) x_i$. (To prove this, take the eigenvector equation, $A x_i = \lambda_i x_i$, and left-multiply each side by $A^{-1}$.)
  • The eigenvalues of a diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ are just the diagonal entries $d_1, \ldots, d_n$.

We can write all the eigenvector equations simultaneously as

$$
AX = X\Lambda
$$

where the columns of $X \in \mathbb{R}^{n \times n}$ are the eigenvectors of $A$ and $\Lambda$ is a diagonal matrix whose entries are the eigenvalues of $A$, i.e.,

$$
X \in \mathbb{R}^{n \times n} = \begin{bmatrix} | & | & & | \\ x_1 & x_2 & \cdots & x_n \\ | & | & & | \end{bmatrix}, \quad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n).
$$

If the eigenvectors of $A$ are linearly independent, then the matrix $X$ will be invertible, so $A = X \Lambda X^{-1}$. A matrix that can be written in this form is called diagonalizable.
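To see the trace, determinant, and eigenvector equations on a concrete case, here is a minimal sketch restricted to symmetric $2 \times 2$ matrices, where the eigenvalues have a closed form from the quadratic characteristic polynomial. This restriction, the helper name `eig_sym_2x2`, and the example matrix are all assumptions made to keep the example dependency-free; general matrices would need a numerical library.

```python
import math

def eig_sym_2x2(A):
    """Eigenpairs of a symmetric 2x2 matrix [[a, b], [b, d]], largest first."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    if b == 0:
        # Already diagonal: eigenvectors are the coordinate axes.
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
    else:
        mean = (a + d) / 2
        radius = math.hypot((a - d) / 2, b)
        pairs = []
        for lam in (mean + radius, mean - radius):
            # (b, lam - a) solves (A - lam I) v = 0; normalize to unit length.
            norm = math.hypot(b, lam - a)
            pairs.append((lam, (b / norm, (lam - a) / norm)))
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs

A = [[2.0, 1.0], [1.0, 2.0]]
(lam1, u1), (lam2, u2) = eig_sym_2x2(A)

# Residual of the eigenvector equation A u1 = lam1 u1.
residual = [sum(A[i][j] * u1[j] for j in range(2)) - lam1 * u1[i]
            for i in range(2)]
trace_A = A[0][0] + A[1][1]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
```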

3.13 Eigenvalues and Eigenvectors of Symmetric Matrices

Two remarkable properties come about when we look at the eigenvalues and eigenvectors of a symmetric matrix $A \in \mathbb{S}^n$. First, it can be shown that all the eigenvalues of $A$ are real. Secondly, the eigenvectors of $A$ can be chosen to be orthonormal, i.e., the matrix $X$ defined above is an orthogonal matrix (for this reason, we denote the matrix of eigenvectors as $U$ in this case).

We can therefore represent $A$ as $A = U \Lambda U^T$, remembering from above that the inverse of an orthogonal matrix is just its transpose. Using this, we can show that the definiteness of a matrix depends entirely on the sign of its eigenvalues. Suppose $A \in \mathbb{S}^n = U \Lambda U^T$. Then

$$
x^T A x = x^T U \Lambda U^T x = y^T \Lambda y = \sum_{i=1}^{n} \lambda_i y_i^2
$$

where $y = U^T x$ (and since $U$ is full rank, any vector $y \in \mathbb{R}^n$ can be represented in this form). Because $y_i^2$ is always nonnegative, the sign of this expression depends entirely on the $\lambda_i$'s. If all $\lambda_i > 0$, then the matrix is positive definite; if all $\lambda_i \geq 0$, it is positive semidefinite. Likewise, if all $\lambda_i < 0$ or $\lambda_i \leq 0$, then $A$ is negative definite or negative semidefinite respectively. Finally, if $A$ has both positive and negative eigenvalues, it is indefinite.

An application where eigenvalues and eigenvectors come up frequently is in maximizing some function of a matrix. In particular, for a matrix $A \in \mathbb{S}^n$, consider the following maximization problem,

$$
\max_{x \in \mathbb{R}^n} x^T A x \quad \text{subject to } \|x\|_2^2 = 1
$$

i.e., we want to find the vector (of norm 1) which maximizes the quadratic form. Assuming the eigenvalues are ordered as $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n$, the optimal $x$ for this optimization problem is $x_1$, the eigenvector corresponding to $\lambda_1$. In this case the maximal value of the quadratic form is $\lambda_1$. Similarly, the optimal solution to the minimization problem,

$$
\min_{x \in \mathbb{R}^n} x^T A x \quad \text{subject to } \|x\|_2^2 = 1
$$

is $x_n$, the eigenvector corresponding to $\lambda_n$, and the minimal value is $\lambda_n$. This can be proved by appealing to the eigenvector-eigenvalue form of $A$ and the properties of orthogonal matrices. However, in the next section we will see a way of showing it directly using matrix calculus.
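The claim about the constrained maximum and minimum can be checked by brute force in two dimensions, where every unit vector has the form $(\cos t, \sin t)$. The sketch below samples that circle on a grid; the example matrix (with eigenvalues 3 and 1, from its characteristic polynomial $\lambda^2 - 4\lambda + 3$), the grid size, and the helper name `quad_form` are assumptions for this illustration, and a grid search is only a sanity check, not a proof.

```python
import math

def quad_form(A, x):
    """x^T A x as a double sum."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

A = [[2.0, 1.0], [1.0, 2.0]]       # symmetric, eigenvalues 3 and 1

# Evaluate the quadratic form on unit vectors (cos t, sin t).
N = 3600
values = []
for k in range(N):
    t = 2 * math.pi * k / N
    values.append(quad_form(A, (math.cos(t), math.sin(t))))

largest, smallest = max(values), min(values)
```

For this matrix the form equals $2 + \sin 2t$ on the unit circle, so the sampled extremes land on the two eigenvalues.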

4 Matrix Calculus

While the topics in the previous sections are typically covered in a standard course on linear algebra, one topic that does not seem to be covered very often (and which we will use extensively) is the extension of calculus to the vector setting. Despite the fact that all the actual calculus we use is relatively trivial, the notation can often make things look much more difficult than they are. In this section we present some basic definitions of matrix calculus and provide a few examples.

4.1 The Gradient

Suppose that $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ is a function that takes as input a matrix $A$ of size $m \times n$ and returns a real value. Then the gradient of $f$ (with respect to $A \in \mathbb{R}^{m \times n}$) is the matrix of