Linear Algebra Notes: Eigenvalues, Eigenvectors, and Diagonalization, Study notes of Linear Algebra

An introduction to linear algebra, covering topics such as matrix-vector multiplication, linear combinations of columns, and standard matrices of linear transformations. The author explains the dot product and how matrices can be used to represent linear equations and transformations. The document also touches on matrix-matrix multiplication and its definition. The notes are concise and provide a good foundation for further study of linear algebra.

Typology: Study notes

2022/2023

Uploaded on 05/11/2023



Linear Algebra Notes

Nikhil Srivastava

February 9, 2015

Scalars are lowercase, matrices are uppercase, and vectors are lowercase bold. All vectors are column vectors (i.e., a vector in $\mathbb{R}^n$ is an $n \times 1$ matrix), unless transposed.

1 Column Picture of Matrix Vector Multiplication

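Suppose $A$ is an $m \times n$ matrix with rows $\mathbf{r}_1^T, \ldots, \mathbf{r}_m^T$ and columns $\mathbf{c}_1, \ldots, \mathbf{c}_n$. In high school, we are taught to think of the matrix-vector product $A\mathbf{x}$ as taking dot products of $\mathbf{x}$ with rows of $A$:

$$A\mathbf{x} = \begin{bmatrix} \mathbf{r}_1^T \\ \mathbf{r}_2^T \\ \vdots \\ \mathbf{r}_m^T \end{bmatrix} \mathbf{x} = \begin{bmatrix} \mathbf{r}_1^T\mathbf{x} \\ \mathbf{r}_2^T\mathbf{x} \\ \vdots \\ \mathbf{r}_m^T\mathbf{x} \end{bmatrix} = \begin{bmatrix} (\mathbf{r}_1 \cdot \mathbf{x}) \\ (\mathbf{r}_2 \cdot \mathbf{x}) \\ \vdots \\ (\mathbf{r}_m \cdot \mathbf{x}) \end{bmatrix}.$$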

This makes sense if we regard matrices as essentially a convenient notation to represent linear equations, since each linear equation naturally gives a dot product.

A different perspective is to view $A\mathbf{x}$ as taking a linear combination of the columns $\mathbf{c}_1, \ldots, \mathbf{c}_n$ of $A$, with coefficients equal to the entries of $\mathbf{x}$:

$$A\mathbf{x} = [\mathbf{c}_1 | \mathbf{c}_2 | \ldots | \mathbf{c}_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \ldots + x_n\mathbf{c}_n.$$
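The two pictures of the product can be compared side by side. The following is a minimal sketch in plain Python; the function names `matvec_rows` and `matvec_columns` and the sample matrix are illustrative choices, not part of the notes.

```python
def matvec_rows(A, x):
    """Row picture: entry i of Ax is the dot product of row i of A with x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def matvec_columns(A, x):
    """Column picture: Ax = x_1*c_1 + ... + x_n*c_n, a combination of the columns."""
    m, n = len(A), len(x)
    result = [0] * m
    for j in range(n):
        column_j = [A[i][j] for i in range(m)]   # c_j, the j-th column of A
        for i in range(m):
            result[i] += x[j] * column_j[i]      # add x_j * c_j to the running sum
    return result


# A 2 x 3 matrix stored as a list of rows (sample data for illustration only).
A = [[1, 2, 3],
     [4, 5, 6]]
x = [1, 0, -1]

print(matvec_rows(A, x))     # [-2, -2]
print(matvec_columns(A, x))  # [-2, -2]
```

Both functions perform the same multiplications; they differ only in the order in which the products are grouped.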

This view makes it transparent that matrices can be used to represent linear transformations with respect to a pair of bases. Suppose $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation, and let $\mathbf{e}_1, \ldots, \mathbf{e}_n$ be the standard basis of $\mathbb{R}^n$. Then we can write any $\mathbf{x} \in \mathbb{R}^n$ as

$$\mathbf{x} = x_1\mathbf{e}_1 + \ldots + x_n\mathbf{e}_n$$

for some coefficients $x_i$; indeed, this is what we mean when we identify $\mathbf{x}$ with its standard coordinate vector:

$$[\mathbf{x}] = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

Since $T$ is linear, it is completely determined by its values on the basis:

$$T(\mathbf{x}) = T(x_1\mathbf{e}_1 + \ldots + x_n\mathbf{e}_n) = x_1 T(\mathbf{e}_1) + \ldots + x_n T(\mathbf{e}_n).$$

Since these vectors are equal, their coordinate vectors in the standard basis of the "output" vector space $\mathbb{R}^m$ must also be equal:

$$[T(\mathbf{x})] = [T(x_1\mathbf{e}_1 + \ldots + x_n\mathbf{e}_n)] = x_1[T(\mathbf{e}_1)] + \ldots + x_n[T(\mathbf{e}_n)].$$

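But since matrix-vector multiplication is the same thing as taking linear combinations, we may write this as

$$[T(\mathbf{x})] = \Big[\, [T(\mathbf{e}_1)] \,\Big|\, [T(\mathbf{e}_2)] \,\Big|\, \ldots \,\Big|\, [T(\mathbf{e}_n)] \,\Big] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},$$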

where $[\mathbf{v}]$ denotes the standard coordinate vector of $\mathbf{v}$. Note that anything that appears inside a matrix must be some sort of coordinate vector: it does not make sense to put an abstract vector inside a matrix whose entries are numbers. However, as before, we will identify $\mathbf{v}$ with its standard coordinate vector $[\mathbf{v}]$ and drop the brackets when we are working in the standard basis. With this identification, we can write:

$$T(\mathbf{x}) = \Big[\, T(\mathbf{e}_1) \,\Big|\, T(\mathbf{e}_2) \,\Big|\, \ldots \,\Big|\, T(\mathbf{e}_n) \,\Big] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

The matrix above is called the standard matrix of $T$, and is denoted by $[T]$. It is a complete description of $T$ with respect to the standard basis. One of the remarkable things about linear transformations is that they have such compact descriptions; this is not at all true of arbitrary functions from $\mathbb{R}^n$ to $\mathbb{R}^m$.

Remark. The column view of matrix-vector multiplication also explains why matrix-matrix multiplication is defined the way it is (which usually seems mysterious the first time you see it). It is the unique definition for which

$$[S \circ T] = [S][T],$$

where $(S \circ T)(\mathbf{v}) = S(T(\mathbf{v}))$ denotes the composition of two linear transformations $S$ and $T$. More concretely, for any matrix $A$ and column vectors $\mathbf{c}_1, \ldots, \mathbf{c}_n$, it is the unique definition for which

$$A \Big[\, \mathbf{c}_1 \,\Big|\, \mathbf{c}_2 \,\Big|\, \ldots \,\Big|\, \mathbf{c}_n \,\Big] = \Big[\, A\mathbf{c}_1 \,\Big|\, A\mathbf{c}_2 \,\Big|\, \ldots \,\Big|\, A\mathbf{c}_n \,\Big].$$
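The standard matrix and the composition rule can be checked on a small case. The sketch below builds a standard matrix column by column from the images of the standard basis vectors, then confirms that the standard matrix of a composition equals the product of the standard matrices. The maps `T` and `S` and the helper names are hypothetical examples chosen only for illustration.

```python
def standard_matrix(T, n):
    """Return the standard matrix of T as a list of rows; column j is T(e_j)."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]   # j-th standard basis vector
        columns.append(T(e_j))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def matmul(P, Q):
    """Matrix product; column j of PQ is P applied to column j of Q."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def T(v):
    """A sample linear map from R^2 to R^3 (hypothetical)."""
    x, y = v
    return [x + y, 2 * x, 3 * y]


def S(w):
    """A sample linear map from R^3 to R^2 (hypothetical)."""
    a, b, c = w
    return [a - c, b + c]


def S_after_T(v):
    """The composition (S o T)(v) = S(T(v))."""
    return S(T(v))


M_T = standard_matrix(T, 2)            # [[1, 1], [2, 0], [0, 3]]
M_S = standard_matrix(S, 3)            # [[1, 0, -1], [0, 1, 1]]
M_ST = standard_matrix(S_after_T, 2)   # [[1, -2], [2, 3]]

print(M_ST == matmul(M_S, M_T))        # True
```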

which since $B$ is invertible is equivalent to

$$B[T]_B B^{-1}[\mathbf{x}] = [T(\mathbf{x})].$$

Thus, we must have

$$B[T]_B B^{-1} = [T],$$

or equivalently

$$[T]_B = B^{-1}[T]B,$$

which are the explicit formulas for change of basis of matrices/linear transformations.
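As a small check of the change of basis formula, here is a sketch that computes $[T]_B = B^{-1}[T]B$ exactly with rational arithmetic. It assumes, as the formulas above suggest, that $B$ has the new basis vectors as its columns; the matrices and helper names are illustrative choices only.

```python
from fractions import Fraction


def matmul(P, Q):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def inverse_2x2(M):
    """Exact inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


T_std = [[2, 1],
         [1, 2]]      # [T] in the standard basis (sample data)
B = [[1, 1],
     [0, 1]]          # columns are the basis vectors b1 = (1, 0), b2 = (1, 1)

T_B = matmul(inverse_2x2(B), matmul(T_std, B))
print(T_B == [[1, 0], [1, 3]])                           # True

# Going back: B [T]_B B^{-1} recovers [T].
print(matmul(B, matmul(T_B, inverse_2x2(B))) == T_std)   # True
```

The columns of the result can be read off directly: $T(\mathbf{b}_1) = (2, 1) = 1\cdot\mathbf{b}_1 + 1\cdot\mathbf{b}_2$ and $T(\mathbf{b}_2) = (3, 3) = 0\cdot\mathbf{b}_1 + 3\cdot\mathbf{b}_2$.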

3 Diagonalization

The standard basis $\mathbf{e}_1, \ldots, \mathbf{e}_n$ of $\mathbb{R}^n$ is completely arbitrary, and as such it is just a convention invented by humans so that they can start writing vectors in some basis. But linear transformations that occur in nature and elsewhere often come equipped with a much better basis, given by their eigenvectors.

Let $T : V \to V$ be a linear operator (i.e., a linear transformation from a vector space to itself). If $\mathbf{v} \neq \mathbf{0}$ and $T(\mathbf{v}) = \lambda\mathbf{v}$ for some scalar $\lambda$ (which may be complex), then $\mathbf{v}$ is called an eigenvector of $T$ and $\lambda$ is called an eigenvalue of $T$. There is an analogous definition for square matrices $A$, in which we ask that $A\mathbf{v} = \lambda\mathbf{v}$. Note that $T(\mathbf{v}) = \lambda\mathbf{v}$ if and only if $[T][\mathbf{v}] = \lambda[\mathbf{v}]$, so in particular the operator $T$ and the matrix $[T]$ have the same eigenvalues. This fact holds in every basis (see HW3 question 6), so eigenvalues are intrinsic to operators and do not depend on the choice of basis used to write the matrix.

The eigenvalues of a matrix may be computed by solving the characteristic equation $\det(\lambda I - A) = 0$. Since the left-hand side is a polynomial of degree $n$ in $\lambda$ for an $n \times n$ matrix, the fundamental theorem of algebra tells us that it must have $n$ complex roots counted with multiplicity, whence every $n \times n$ matrix has $n$ eigenvalues (possibly complex and possibly repeated). Once the eigenvalues are known, the corresponding eigenvectors can be obtained by solving the systems of linear equations $(\lambda I - A)\mathbf{v} = \mathbf{0}$. See your Math 54 text for more information on how to compute these things.

In any case, something wonderful happens when we have an operator/matrix with a basis of eigenvectors, sometimes called an eigenbasis. For instance, let $T : \mathbb{R}^n \to \mathbb{R}^n$ be such an operator. Then, we have

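$$T(\mathbf{b}_1) = \lambda_1\mathbf{b}_1, \quad T(\mathbf{b}_2) = \lambda_2\mathbf{b}_2, \quad \ldots, \quad T(\mathbf{b}_n) = \lambda_n\mathbf{b}_n,$$

for linearly independent $\mathbf{b}_1, \ldots, \mathbf{b}_n$. Thus, we may write an arbitrary $\mathbf{x} \in \mathbb{R}^n$ in this basis:

$$\mathbf{x} = x'_1\mathbf{b}_1 + x'_2\mathbf{b}_2 + \ldots + x'_n\mathbf{b}_n,$$

and then

$$\begin{aligned} T(\mathbf{x}) &= T(x'_1\mathbf{b}_1 + x'_2\mathbf{b}_2 + \ldots + x'_n\mathbf{b}_n) \\ &= x'_1 T(\mathbf{b}_1) + x'_2 T(\mathbf{b}_2) + \ldots + x'_n T(\mathbf{b}_n) && \text{by linearity of } T \\ &= x'_1\lambda_1\mathbf{b}_1 + x'_2\lambda_2\mathbf{b}_2 + \ldots + x'_n\lambda_n\mathbf{b}_n. \end{aligned}$$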

So, applying the transformation $T$ is tantamount to multiplying each coefficient $x'_i$ by $\lambda_i$. In particular, $T$ acts on each coordinate completely independently by scalar multiplication, and there are no interactions between the coordinates. This is about as simple as a linear transformation can be. If we write down the matrix of $T$ in the basis $B$ consisting of its eigenvectors, we find that $[T]_B$ is a diagonal matrix with the eigenvalues $\lambda_1, \ldots, \lambda_n$ on the diagonal. Appealing to the change of basis formula we derived in the previous section, this means that

$$[T] = B[T]_B B^{-1}.$$

In general, this can be done for any square matrix $A$ that has a basis of eigenvectors, since every $A$ is equal to $[T]$ for the linear transformation $T(\mathbf{x}) = A\mathbf{x}$. Using the letter $D$ to denote the diagonal matrix of eigenvalues, this gives

$$A = BDB^{-1}.$$

Factorizing a matrix in this way is called diagonalization, and a matrix which can be diagonalized (i.e., one with a basis of eigenvectors) is called diagonalizable. Not all matrices are diagonalizable, but the vast majority of them are.
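To make the recipe concrete, the following sketch diagonalizes a $2 \times 2$ matrix by hand: it finds the roots of the characteristic polynomial, picks one eigenvector per eigenvalue, and checks that $BDB^{-1}$ reproduces $A$. The helper names and the sample matrix are illustrative; the eigenvector shortcut $(b, \lambda - a)$ is specific to $2 \times 2$ matrices whose upper-right entry $b$ is nonzero, and the sketch assumes the two eigenvalues are distinct.

```python
import cmath


def eigenvalues_2x2(A):
    """Roots of det(lambda*I - A) = lambda^2 - tr(A)*lambda + det(A)."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)   # complex sqrt handles complex eigenvalues
    return (trace + root) / 2, (trace - root) / 2


def eigenvector_2x2(A, lam):
    """One nonzero solution of (lambda*I - A)v = 0, assuming A[0][1] != 0."""
    (a, b), _ = A
    return [b, lam - a]


def matmul(P, Q):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[4, 1],
     [2, 3]]                                   # sample matrix with eigenvalues 5 and 2
lam1, lam2 = eigenvalues_2x2(A)
v1, v2 = eigenvector_2x2(A, lam1), eigenvector_2x2(A, lam2)

B = [[v1[0], v2[0]],
     [v1[1], v2[1]]]                           # eigenvectors as columns
D = [[lam1, 0],
     [0, lam2]]                                # eigenvalues on the diagonal

reconstructed = matmul(B, matmul(D, inverse_2x2(B)))
max_error = max(abs(reconstructed[i][j] - A[i][j])
                for i in range(2) for j in range(2))
print(max_error < 1e-12)                       # True
```

For larger matrices one would normally rely on a numerical library rather than the characteristic polynomial, which is used here only because it mirrors the text.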

4 Coupled Oscillator Example

Here is an example of diagonalization in action. Suppose we have two unit masses connected by springs with spring constant $k$ as in Figure 12.1 of the book. Let $x_1(t)$ and $x_2(t)$ be the positions of the masses at time $t$. Then, subject to initial positions and velocities $x_1(0), x_2(0), \dot{x}_1(0), \dot{x}_2(0)$, the system is governed by the coupled differential equations:

$$\frac{\partial^2 x_1(t)}{\partial t^2} = -k x_1(t) - k(x_1(t) - x_2(t)),$$

$$\frac{\partial^2 x_2(t)}{\partial t^2} = -k x_2(t) - k(x_2(t) - x_1(t)).$$

It is not immediately obvious how to solve this because each second derivative depends on both of the variables. We will now show how to reduce it to the diagonal case. We can write this as a single differential equation in the vector-valued function

$$\mathbf{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$

as follows:

$$\frac{\partial^2 \mathbf{x}(t)}{\partial t^2} = A\mathbf{x}(t),$$

where

$$A = \begin{bmatrix} -2k & k \\ k & -2k \end{bmatrix}.$$

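which in terms of $\mathbf{x}$ is just

$$\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} \cos(\sqrt{k}\,t) + \cos(\sqrt{3k}\,t) \\ \cos(\sqrt{k}\,t) - \cos(\sqrt{3k}\,t) \end{bmatrix}.$$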
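A quick numerical sanity check of this solution is sketched below. It reads the displayed formula as $x_1 = \cos(\sqrt{k}\,t) + \cos(\sqrt{3k}\,t)$ and $x_2 = \cos(\sqrt{k}\,t) - \cos(\sqrt{3k}\,t)$, approximates the second time derivative by a central finite difference, and compares it with $A\mathbf{x}(t)$ for $A = \begin{bmatrix} -2k & k \\ k & -2k \end{bmatrix}$. The value of `k`, the step size `h`, and the function names are assumptions made for the illustration.

```python
import math


def x(t, k):
    """Candidate solution: x1 = cos(sqrt(k) t) + cos(sqrt(3k) t), x2 = the difference."""
    slow = math.cos(math.sqrt(k) * t)
    fast = math.cos(math.sqrt(3 * k) * t)
    return [slow + fast, slow - fast]


def second_derivative(t, k, h=1e-4):
    """Central finite-difference approximation of the second time derivative of x."""
    ahead, here, behind = x(t + h, k), x(t, k), x(t - h, k)
    return [(ahead[i] - 2 * here[i] + behind[i]) / (h * h) for i in range(2)]


def A_times(v, k):
    """Right-hand side A v with A = [[-2k, k], [k, -2k]]."""
    return [-2 * k * v[0] + k * v[1], k * v[0] - 2 * k * v[1]]


k = 2.0          # hypothetical spring constant
residual = 0.0
for step in range(50):
    t = 0.1 * step
    lhs = second_derivative(t, k)
    rhs = A_times(x(t, k), k)
    residual = max(residual, abs(lhs[0] - rhs[0]), abs(lhs[1] - rhs[1]))

print(residual < 1e-4)   # True: the two sides agree up to discretization error
print(x(0.0, k))         # [2.0, 0.0], the initial positions this formula corresponds to
```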

Explicit Matrix Notation. Some people understand things better if they are written in terms of explicit matrix factorizations. If we diagonalize $A = BDB^{-1}$, we can rewrite our equation as

$$\frac{\partial^2}{\partial t^2}\mathbf{x}(t) = BDB^{-1}\mathbf{x}(t).$$

Defining

$$\mathbf{a}(t) = B^{-1}\mathbf{x}(t) \quad \text{(this is the same as (∗), in matrix notation)},$$

this becomes

$$\frac{\partial^2}{\partial t^2}B\mathbf{a}(t) = BD\mathbf{a}(t).$$

Since $B$ is a fixed matrix that does not depend on $t$, it commutes with the derivative and we have

$$B\frac{\partial^2}{\partial t^2}\mathbf{a}(t) = BD\mathbf{a}(t).$$

Multiplying both sides by $B^{-1}$ gives

$$\frac{\partial^2}{\partial t^2}\mathbf{a}(t) = D\mathbf{a}(t),$$

which is the same diagonal system we solved above. The eigenvectors of A are called the normal modes of the system.
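For reference, the eigenvalues and eigenvectors of this particular $A$ can be verified by direct multiplication. The choice of eigenvectors $(1, 1)$ and $(1, -1)$ below, and hence the matrix $B$, is one possible normalization assumed here for illustration; any nonzero multiples would serve equally well.

```latex
% Direct check of the two eigenpairs of A = [[-2k, k], [k, -2k]]:
\[
\begin{bmatrix} -2k & k \\ k & -2k \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} -k \\ -k \end{bmatrix}
= -k \begin{bmatrix} 1 \\ 1 \end{bmatrix},
\qquad
\begin{bmatrix} -2k & k \\ k & -2k \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \end{bmatrix}
= \begin{bmatrix} -3k \\ 3k \end{bmatrix}
= -3k \begin{bmatrix} 1 \\ -1 \end{bmatrix}.
\]
% With these eigenvectors as columns (an assumed normalization):
\[
B = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},
\qquad
D = \begin{bmatrix} -k & 0 \\ 0 & -3k \end{bmatrix},
\]
% so the diagonal system for a(t) = B^{-1} x(t) decouples into
\[
\frac{\partial^2 a_1(t)}{\partial t^2} = -k\, a_1(t),
\qquad
\frac{\partial^2 a_2(t)}{\partial t^2} = -3k\, a_2(t),
\]
% whose solutions oscillate at angular frequencies \sqrt{k} and \sqrt{3k}.
```

In the first mode the two masses move together; in the second they move in opposite directions.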

5 Symmetric and Orthogonal Matrices

Not all matrices are diagonalizable, but there are some very important classes that are. A real matrix $A$ is called symmetric if $A = A^T$. To state the main theorem about diagonalizing symmetric matrices, we will need a definition.

Definition. A collection of vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$ is called orthonormal if they are pairwise orthogonal, i.e.,

$$(\mathbf{v}_i \cdot \mathbf{v}_j) = 0 \quad \text{for } i \neq j,$$

and they are unit vectors:

$$(\mathbf{v}_i \cdot \mathbf{v}_i) = 1.$$

In matrix notation, an orthonormal set of vectors has the property that

$$V^T V = I,$$

where $V = [\mathbf{v}_1 | \ldots | \mathbf{v}_n]$ is a matrix with the $\mathbf{v}_i$ as columns; such a matrix is called an orthogonal matrix. This last identity implies that

$$V^{-1} = V^T,$$

which reveals one of the very desirable properties of orthonormal bases: for any $\mathbf{x}$, the change of basis is simply

$$[\mathbf{x}]_V = V^{-1}[\mathbf{x}] = V^T[\mathbf{x}] = \begin{bmatrix} (\mathbf{v}_1 \cdot \mathbf{x}) \\ \vdots \\ (\mathbf{v}_n \cdot \mathbf{x}) \end{bmatrix}.$$

That is, the coefficients are given by dot products

$$\mathbf{x} = (\mathbf{x} \cdot \mathbf{v}_1)\mathbf{v}_1 + (\mathbf{x} \cdot \mathbf{v}_2)\mathbf{v}_2 + \ldots + (\mathbf{x} \cdot \mathbf{v}_n)\mathbf{v}_n,$$

which are much easier to calculate than solving linear equations as in Section 2.

Theorem. If $A$ is a real symmetric matrix, then it is diagonalizable. Moreover, all of its eigenvalues are real and it has an eigenbasis of orthonormal eigenvectors. Thus, $A = VDV^T$ for some orthogonal $V$ and diagonal $D$.
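The following sketch illustrates both points on a small symmetric matrix: coordinates in an orthonormal basis come from dot products, and $VDV^T$ rebuilds $A$. The matrix and its eigenvectors are sample data chosen for the illustration. The product $VDV^T$ is expanded as the sum $\lambda_1\mathbf{v}_1\mathbf{v}_1^T + \lambda_2\mathbf{v}_2\mathbf{v}_2^T$, which is the same matrix written entry by entry.

```python
import math


def dot(u, w):
    """Real dot product of two vectors."""
    return sum(a * b for a, b in zip(u, w))


s = 1 / math.sqrt(2)
v1, v2 = [s, s], [s, -s]        # orthonormal eigenvectors of A below
eigenvalues = [3, 1]            # A v1 = 3 v1 and A v2 = 1 v2
A = [[2, 1],
     [1, 2]]                    # a sample real symmetric matrix

# Orthonormality: unit length and mutually orthogonal.
is_orthonormal = (abs(dot(v1, v1) - 1) < 1e-12
                  and abs(dot(v2, v2) - 1) < 1e-12
                  and abs(dot(v1, v2)) < 1e-12)
print(is_orthonormal)           # True

# Coordinates of x in the basis {v1, v2} are just dot products.
x = [3.0, 1.0]
c1, c2 = dot(x, v1), dot(x, v2)
rebuilt = [c1 * v1[i] + c2 * v2[i] for i in range(2)]
print(all(abs(rebuilt[i] - x[i]) < 1e-12 for i in range(2)))   # True

# Entry (i, j) of V D V^T equals sum over eigenpairs of lambda * v[i] * v[j].
VDVt = [[sum(lam * v[i] * v[j] for lam, v in zip(eigenvalues, (v1, v2)))
         for j in range(2)] for i in range(2)]
max_error = max(abs(VDVt[i][j] - A[i][j]) for i in range(2) for j in range(2))
print(max_error < 1e-12)        # True
```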

6 Complex Inner Products, Hermitian Matrices, and Unitary Matrices

There is an important class of complex matrices which are also diagonalizable with orthogonal eigenvectors. However, the notion of orthogonality (which is a geometric notion induced by the dot product) is different for complex vectors. To see that the usual real dot product is deficient in the complex case, consider that

$$\begin{bmatrix} 1 \\ i \end{bmatrix} \cdot \begin{bmatrix} 1 \\ i \end{bmatrix} = 1^2 + i^2 = 0,$$

so we have a nonzero vector which is orthogonal to itself. This makes no sense geometrically. It turns out that there is a way to redefine the dot product which recovers all the nice geometric properties that we have in the real case:

$$\langle \mathbf{x} | \mathbf{y} \rangle = x_1^* y_1 + x_2^* y_2 + \ldots + x_n^* y_n.$$

This is the same as the real dot product, except we take the complex conjugate (denoted by $*$) of the entries of the first vector $\mathbf{x}$. Note that if both $\mathbf{x}$ and $\mathbf{y}$ are real then this doesn't change anything. We will refer to this as an "inner product" to distinguish it from the real dot product.

With the inner product in hand, we say that a set of vectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$ in $\mathbb{C}^n$ is orthonormal if

$$\langle \mathbf{u}_i | \mathbf{u}_j \rangle = 0 \quad \text{for } i \neq j$$

and

$$\langle \mathbf{u}_i | \mathbf{u}_i \rangle = 1.$$
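The difference between the two products is easy to see numerically with Python's built-in complex numbers. This is a minimal sketch; the function names and the pair `u1`, `u2` are illustrative choices.

```python
import math


def naive_dot(x, y):
    """Real-style dot product with no conjugation: x_1 y_1 + ... + x_n y_n."""
    return sum(a * b for a, b in zip(x, y))


def inner(x, y):
    """Inner product <x|y> = conj(x_1) y_1 + ... + conj(x_n) y_n."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


v = [1 + 0j, 1j]                 # the vector (1, i) from the text

print(naive_dot(v, v) == 0)      # True: a nonzero vector "orthogonal to itself"
print(inner(v, v) == 2)          # True: the inner product gives |1|^2 + |i|^2 = 2

# A sample orthonormal pair in C^2 under the inner product.
s = 1 / math.sqrt(2)
u1 = [s + 0j, s * 1j]            # (1, i) / sqrt(2)
u2 = [s + 0j, -s * 1j]           # (1, -i) / sqrt(2)

print(abs(inner(u1, u2)) < 1e-12)        # True: orthogonal
print(abs(inner(u1, u1) - 1) < 1e-12)    # True: unit length
print(abs(inner(u2, u2) - 1) < 1e-12)    # True: unit length
```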

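  1. $\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$ (triangle inequality).
  2. If $\langle \mathbf{x} | \mathbf{y} \rangle = 0$ then $\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2$ (Pythagorean theorem).
  3. $|\langle \mathbf{x} | \mathbf{y} \rangle| \le \|\mathbf{x}\|\|\mathbf{y}\|$ (Cauchy-Schwarz inequality).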
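The three properties in the list above can be spot-checked on concrete complex vectors. The sketch below assumes the norm is defined by $\|\mathbf{x}\| = \sqrt{\langle \mathbf{x} | \mathbf{x} \rangle}$, which is the usual convention but is not shown in this excerpt; the sample vectors are arbitrary.

```python
import math


def inner(x, y):
    """Inner product <x|y> = conj(x_1) y_1 + ... + conj(x_n) y_n."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def norm(x):
    """Norm induced by the inner product (assumed definition): sqrt(<x|x>)."""
    return math.sqrt(inner(x, x).real)


def add(x, y):
    """Entrywise sum of two vectors."""
    return [a + b for a, b in zip(x, y)]


x = [1 + 2j, -1j, 3 + 0j]
y = [2 - 1j, 1 + 1j, -2j]

# 1. Triangle inequality.
print(norm(add(x, y)) <= norm(x) + norm(y))       # True

# 3. Cauchy-Schwarz inequality.
print(abs(inner(x, y)) <= norm(x) * norm(y))      # True

# 2. Pythagorean theorem, on an orthogonal pair.
p = [1 + 0j, 1j]
q = [1 + 0j, -1j]
print(inner(p, q) == 0)                           # True: p and q are orthogonal
print(abs(norm(add(p, q)) ** 2 - (norm(p) ** 2 + norm(q) ** 2)) < 1e-12)   # True
```

A few examples do not prove the inequalities, of course; they only show what each statement asserts.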

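Item (2) is a simple exercise, and item (1) can be easily derived from item (3). Here is the proof of item (3): first observe that it is equivalent to show that

$$\left| \left\langle \frac{\mathbf{x}}{\|\mathbf{x}\|} \,\middle|\, \frac{\mathbf{y}}{\|\mathbf{y}\|} \right\rangle \right| = \frac{|\langle \mathbf{x} | \mathbf{y} \rangle|}{\|\mathbf{x}\|\|\mathbf{y}\|} \le 1,$$

where the first equality is because of linearity in both coordinates with respect to real scalars. So it suffices to show that

$$|\langle \mathbf{x} | \mathbf{y} \rangle| \le 1$$

for all unit vectors $\mathbf{x}, \mathbf{y}$. We now compute

$$\langle \mathbf{x} - \mathbf{y} | \mathbf{x} - \mathbf{y} \rangle = \langle \mathbf{x} | \mathbf{x} \rangle + \langle \mathbf{y} | \mathbf{y} \rangle - \langle \mathbf{y} | \mathbf{x} \rangle - \langle \mathbf{x} | \mathbf{y} \rangle.$$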