Eigenvalues and Eigenvectors, Linear Differential Equations | CSE 494, Study notes of Computer Science

Arizona State University (ASU) - Tempe Computer Science

Material Type: Notes; Class: Social Media Mining; Subject: Computer Science and Engineering; University: Arizona State University - Tempe; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-tgo 🇺🇸

CNS 185: A Brief Review of Linear Algebra

An understanding of linear algebra is critical as a stepping-off point for understanding neural networks. This handout includes basic definitions, then quickly progresses to elementary but powerful techniques such as eigenbases. For your private edification, a few exercises are included, identified by bullets •; some exercises come with hints and answers. Take what you will from this handout, but be forewarned that future problem sets will require most of the concepts developed here, so it behooves you to be comfortable with them. Don't delay in asking a TA if you can't figure it out on your own.

1 STARTING DEFINITIONS

1.1 Matrix structure

Matrices are most often represented as rectangular arrays of scalars.^1 The $m \times n$ matrix $A$ has $m$ rows and $n$ columns. The subscript notation $A_i$ is used to reference the $i$th row of the matrix, and $A_{ij}$ is used to reference the scalar in the $i$th row and $j$th column of $A$.

For example, $A$ is a $2 \times 3$ matrix:

$$A = \begin{pmatrix} 4 & 2 & -5 \\ 1 & 0 & -8 \end{pmatrix}, \qquad A_{13} = -5$$

A column vector (quite often referred to simply as a vector) is an $n \times 1$ matrix, where $n$ is referred to as the dimension of the vector. The scalar $v_i$ is the $i$th element of vector $v$.

A matrix with the same number of rows and columns is, not surprisingly, referred to as a square matrix. A commonly used notational convention is to use capital letters (i.e., $A$) to denote matrices and lower case letters (i.e., $v$) to denote vectors.

1.2 Matrix transpose

The transpose of $m \times n$ matrix $A$ is denoted $A^T$. $A^T$ is an $n \times m$ matrix whose elements are:

$$(A^T)_{ij} = A_{ji}$$

The transpose of a column vector is called a row vector. An object twice transposed will produce the original object: $(A^T)^T = A$.

1.3 Addition and multiplication

Adding two matrices $A$ and $B$ results in a matrix whose elements are the sums of the corresponding elements from $A$ and $B$:

$$\text{If } C = A + B, \text{ then } C_{ij} = A_{ij} + B_{ij}$$

^1 In the examples presented, scalars will be real numbers, but in general they can be complex.


($A$ and $B$ must have the same dimensions to be able to add them together.) Addition is commutative and associative, just like regular addition.
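The transpose and addition rules above can be sketched in a few lines of plain Python. This is only an illustration: matrices are stored as lists of rows, the helper names `transpose` and `add` are made up for this sketch, and the second matrix `B` is a hypothetical example, not one from the handout. Note that Python indexes from 0, so the handout's $A_{13}$ is `A[0][2]`.

```python
def transpose(a):
    """Return the transpose: element (i, j) of the result is a[j][i]."""
    return [list(column) for column in zip(*a)]


def add(a, b):
    """Element-wise sum of two matrices that have identical dimensions."""
    same_rows = len(a) == len(b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[4, 2, -5],
     [1, 0, -8]]      # the 2 x 3 example matrix from section 1.1
B = [[1, 1, 1],
     [2, 2, 2]]       # hypothetical second matrix, for illustration only

print(A[0][2])                      # A_13 in the handout's 1-based notation
print(transpose(A))                 # a 3 x 2 matrix
print(transpose(transpose(A)) == A) # transposing twice gives A back
print(add(A, B) == add(B, A))       # addition is commutative
```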

A matrix $A$ multiplied by a scalar $k$ produces a new matrix $B = kA$ whose elements are the elements of $A$ each multiplied by $k$. Multiplying two matrices together is more complicated: multiplying $m \times n$ matrix $A$ by $n \times p$ matrix $B$ produces an $m \times p$ matrix $C = AB$ whose elements are defined to be:

$$\text{If } C = AB, \text{ then } C_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk}$$

For example, multiplying a $4 \times 2$ matrix by a $2 \times 3$ matrix produces a $4 \times 3$ matrix.

Note that matrix multiplication can only be performed between two matrices $A$ and $B$ if $A$ has exactly as many columns as $B$ has rows.

Like ordinary multiplication, matrix multiplication is associative and distributive, but unlike ordinary multiplication, it is not commutative:

$$AB \neq BA, \text{ in general}$$

From the definitions of multiplication and transpose, we derive the following identity:

$$(AB)^T = B^T A^T$$
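The multiplication rule and the two properties just stated can be checked numerically. The sketch below is a direct transcription of the sum $C_{ik} = \sum_j A_{ij} B_{jk}$ into plain Python; the function names and all matrices are hypothetical, chosen only to illustrate the shapes and identities.

```python
def matmul(a, b):
    """C = AB with C[i][k] = sum over j of a[i][j] * b[j][k]."""
    n = len(b)  # number of rows of B must equal number of columns of A
    if any(len(row) != n for row in a):
        raise ValueError("A must have as many columns as B has rows")
    p = len(b[0])
    return [[sum(a[i][j] * b[j][k] for j in range(n)) for k in range(p)]
            for i in range(len(a))]


def transpose(a):
    """Element (i, j) of the result is a[j][i]."""
    return [list(column) for column in zip(*a)]


# Hypothetical 4 x 2 and 2 x 3 matrices: the product is 4 x 3.
P = [[1, 0], [0, 1], [2, 2], [3, -1]]
Q = [[1, 2, 3], [4, 5, 6]]
PQ = matmul(P, Q)
print(len(PQ), len(PQ[0]))

# Hypothetical square matrices showing that AB and BA differ in general.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(matmul(A, B), matmul(B, A))

# The transpose identity (AB)^T = B^T A^T.
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))
```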

1.4 Inner product

The inner product (also known as the dot product) of $n$-dimensional vectors $x$ and $y$ is defined as $x^T y$, which is a scalar.^2 By our definitions of matrix transpose and matrix multiplication, this means that the inner product is the sum of the products of corresponding elements from the two vectors:

$$x^T y = \sum_{i=1}^{n} x_i y_i$$

If the inner product of two vectors is zero, they are said to be orthogonal, which has the usual geometric connotation of perpendicularity.
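As a small illustration of the definition, the following sketch computes the inner product of real vectors stored as Python lists and uses it to test orthogonality. The function name `inner` and the three vectors are hypothetical examples.

```python
def inner(x, y):
    """Inner (dot) product: the sum of products of corresponding elements."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xi * yi for xi, yi in zip(x, y))


x = [1, 2, -1]
y = [3, 0, 3]
z = [1, 1, 1]

print(inner(x, y))   # 1*3 + 2*0 + (-1)*3, so x and y are orthogonal
print(inner(x, z))   # non-zero, so x and z are not orthogonal
```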

1.5 Square matrices

The diagonal of an $n \times n$ square matrix $A$ consists of the elements $A_{ii}$ running diagonally from the top left corner to the bottom right. A diagonal matrix is a matrix which has zeroes everywhere off the diagonal.

^2 When working with complex vectors, we use the inner product $x^{\dagger} y$, which returns a real value when $y = x$. $x^{\dagger}$ is the complex conjugate of the transpose of $x$. The complex conjugate $\bar{x}$ has $\mathrm{Re}[\bar{x}_i] = \mathrm{Re}[x_i]$ and $\mathrm{Im}[\bar{x}_i] = -\mathrm{Im}[x_i]$.

• Find a condition for which the equation

$$Ax = \lambda x$$

($\lambda$ a scalar) has interesting non-zero solutions for $x$.

• Use this condition to write an equation that $\lambda$ must satisfy in order to get non-zero solutions to the following equation:

[equation (1): a $2 \times 2$ matrix equation in the unknowns $x_1$ and $x_2$; the matrix entries are not recoverable from the preview]

• Solve the equation you got for $\lambda$. You will get two possible values. Using one of them, find values for $x_1$ and $x_2$ that satisfy equation (1).

Note that if $x$ satisfies $Ax = \lambda x$, so does any scalar multiple $\alpha x$. So you won't be able to solve for a unique $x_1$ and $x_2$, just for the direction that $x$ should lie in. That direction is called the eigenvector direction, and all vectors parallel to it are eigenvectors with the same eigenvalue.

• Now use the other $\lambda$ to find the other eigenvector direction.

To restate, if $x \neq 0$ and $Ax = \lambda x$ then $\lambda$ is an eigenvalue of $A$ and $x$ is an eigenvector of $A$ with eigenvalue $\lambda$.

For larger square matrices, we can find eigenvalues and eigenvectors using the same approach you used for the $2 \times 2$ matrix. First, we look for values of $\lambda$ such that $A - \lambda I$ is singular, i.e., $|A - \lambda I| = 0$. Using the formula for determinants, this leads to a polynomial of degree $n$, which is called the characteristic polynomial of $A$. You will recall from algebra that every polynomial of degree $n$ has exactly $n$ (not necessarily distinct) complex roots (some of which may be real, of course). Therefore, every $n \times n$ matrix $A$ has exactly $n$ (not necessarily distinct and possibly complex) eigenvalues. Once the eigenvalues are known, the eigenvectors can be determined.
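For the $2 \times 2$ case the procedure above can be written out explicitly, since the characteristic polynomial is the quadratic $\lambda^2 - (\mathrm{Tr}\,A)\lambda + \det A = 0$. The sketch below solves it with the quadratic formula and then reads an eigenvector direction off a row of $A - \lambda I$. The function names and the matrix `A` are hypothetical illustrations (the matrix is deliberately not one of the handout's exercises), and `cmath` is used so that complex eigenvalues are handled as well.

```python
import cmath


def eigenvalues_2x2(a):
    """Roots of det(A - lam*I) = lam^2 - tr(A)*lam + det(A) = 0."""
    (p, q), (r, s) = a
    tr = p + s
    det = p * s - q * r
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


def eigenvector_2x2(a, lam, tol=1e-12):
    """One non-zero solution x of (A - lam*I) x = 0 (a direction only)."""
    (p, q), (r, s) = a
    # If the first row (p - lam, q) is non-zero, (q, lam - p) is orthogonal
    # to it; the second row is then satisfied because A - lam*I is singular.
    if abs(q) > tol or abs(p - lam) > tol:
        return (q, lam - p)
    # Otherwise use the second row (r, s - lam) in the same way.
    if abs(r) > tol or abs(s - lam) > tol:
        return (lam - s, r)
    # A - lam*I is the zero matrix: every vector is an eigenvector.
    return (1, 0)


A = [[2, 1],
     [1, 2]]   # hypothetical example matrix

for lam in eigenvalues_2x2(A):
    print(lam, eigenvector_2x2(A, lam))
```

Any non-zero multiple of a returned vector is an equally valid eigenvector, which is why only the direction is meaningful.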

• Find a matrix $A$ for which 0 is an eigenvalue, and find all the eigenvectors.

A common convention is to choose eigenvectors to be unit vectors,^4 i.e. $x^T x = 1$.

2.1 Some words about eigenvalues

If you write the eigenvector directions as column vectors and put them side by side, you get a new matrix; call it $E$. Convince yourself that since the columns of $E$ are eigenvectors, the following is true:

$$AE = E \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$$

where $\lambda_1$ is the first column's eigenvalue, and $\lambda_2$ is the second column's eigenvalue. We can multiply on the right by $E^{-1}$ (assuming $E$ is invertible) to get

$$A = E \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} E^{-1}$$

This expression helps us describe the true significance of the matrix $A$, and why eigenvalues are so important.

^4 If the eigenvectors are complex, there is no obvious way to find a unique representation for the eigenvector by normalizing, since an eigenvector multiplied by any complex number is still an eigenvector, with the same eigenvalue. For convenience one can set $x^{\dagger} x = 1$.

Multiplying on the left by $A$ is the same as performing the following sequence of operations:

First, multiply by $E^{-1}$. Think of this as doing a linear change of coordinates. That is, we change coordinates to some special coordinate system.

In that coordinate system, multiply by the diagonal matrix of eigenvalues. But this is a particularly easy matrix to multiply with, since the coordinates don't mix! We can easily visualize what is going on: the first coordinate gets stretched (or squeezed) by $\lambda_1$, the second by $\lambda_2$.

Then go back to your original coordinates, by multiplying by the inverse of $E^{-1}$, namely $E$.

That is, there is some special coordinate system in which multiplying by $A$ just stretches the two coordinates independently. Clearly this is the natural coordinate system for the problem, the one we want to be thinking in. Many times it is enough to know the eigenvalues: we know we could always transform the problem into the special system if we wanted to. We just pretend that we've already done the transformation.
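The three-step reading of $A = E \Lambda E^{-1}$ can be traced numerically. In the sketch below the matrix `A`, its eigenvector matrix `E`, the eigenvalues and the test vector `x` are all hypothetical illustrative values (the eigenvectors were worked out by hand for this particular `A`), and the helper names are made up for this sketch.

```python
def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1],
     [1, 2]]              # hypothetical matrix with eigenvalues 3 and 1
E = [[1, 1],
     [1, -1]]             # columns are eigenvectors of A
eigenvalues = [3, 1]      # same order as the columns of E

x = [2.0, 0.0]

# Step 1: change to the special coordinates, x' = E^{-1} x.
coords = matvec(inverse_2x2(E), x)
# Step 2: stretch each coordinate independently by its eigenvalue.
stretched = [lam * c for lam, c in zip(eigenvalues, coords)]
# Step 3: go back to the original coordinates by multiplying with E.
result = matvec(E, stretched)

print(result)         # same vector as applying A directly
print(matvec(A, x))
```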

This illustrates a very important concept that cannot be stressed enough: the real guts of a matrix, what it really does, don't depend on what coordinate system we use to describe it. Here, if $A$ is a (symmetric) positive definite matrix, then $E$ is an orthonormal matrix and represents simply a rotation.^5 Who cares if we rotate coordinates around? They're our coordinates, not the physical problem's. The eigenvalues are what really matter.

• Find the eigenvalues and eigenvectors of the following two matrices:

[two $2 \times 2$ matrices; their entries are not recoverable from the preview]

The arguments we used above all rely on our casual assumption that $E$ is invertible. This is usually an acceptable assumption to make, for the sorts of matrices commonly encountered in neural network theory. But it can be helpful to have some understanding of the other possibilities.^6

2.2 Some more useful facts about eigenvectors

In the following, assume when necessary that $E$ is invertible, with eigenvectors $e_1 \ldots e_n$ and corresponding eigenvalues $\lambda_1 \ldots \lambda_n$.

• If $C$ has eigenvectors and eigenvalues $\{e_i, \lambda_i\}$, then the matrix $B = C - \alpha I$ has eigenvectors and eigenvalues $\{e_i, \lambda_i - \alpha\}$.

• If $C$ is a real symmetric matrix (i.e. $C_{ij} = C_{ji}$), then all the eigenvalues of $C$ are real. We can also choose all the eigenvectors to be real.^7
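The eigenvalue-shift fact stated above, for a matrix of the form $B = C - \alpha I$, follows in one line from the definition of an eigenvector. The derivation below is a sketch in this section's notation, with $\alpha$ standing for an arbitrary scalar shift:

```latex
B e_i = (C - \alpha I)\, e_i
      = C e_i - \alpha e_i
      = \lambda_i e_i - \alpha e_i
      = (\lambda_i - \alpha)\, e_i
```

So each $e_i$ is still an eigenvector, and only its eigenvalue moves, by exactly $\alpha$.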

^5 Positive definite matrix: a matrix $M$ such that $x^T M x > 0$ for all non-zero $x$. Orthonormal matrix: one where $M^T M = I$. The important point is that if these conditions are satisfied, the matrix $E$ is just a coordinate rotation and/or a reflection. (Though reflections aren't properly rotations, we almost always include them when we say, abusing notation, "rotation matrix".)

^6 A quick summary: Suppose the $n$ (possibly complex) eigenvalues of $M$ are distinct. Then to each eigenvalue there is a unique (up to multiplication by a complex number) eigenvector, and all the eigenvectors are linearly independent (i.e. they span $\mathbb{C}^n$). Now suppose there are $m$ eigenvalues with the same value $\lambda$. In this case, unfortunately, there might not be $m$ linearly independent eigenvectors all with the same eigenvalue $\lambda$, in which case $E$ must be singular. In both these cases, there is nothing special about the eigenvalue 0; the issue is only whether an eigenvalue is a multiple root of the characteristic polynomial.

^7 Hint: Start with a potentially complex eigenvalue $\lambda$ and its potentially complex eigenvector $x$, satisfying $Cx = \lambda x$.

… disturbance to $x$, would $x$ return to the fixed point, or would it shoot off in some direction? The stability of fixed points is of great practical importance in a world full of natural small random disturbances. For example, the bottom of a spherical bowl is a stable fixed point: fruit stays down there. But the top of a glass sphere is an unstable fixed point: we could very carefully balance an apple on top of it, but any small disturbance, and the apple will fall off.

• For $\frac{dx}{dt} = \lambda x$, convince yourself that $x = 0$ is a stable fixed point if $\lambda < 0$ and is unstable if $\lambda > 0$.

• The phrasing and solution to the above problem assume $\lambda$ is real. What if $\lambda$ is complex? Convince yourself that equation 2 still holds. The imaginary part of $\lambda$ just represents an oscillation ($e^{i\omega t} = \cos \omega t + i \sin \omega t$). So the condition above, to be completely general, should really read "stable fixed point if the real part of $\lambda < 0$, unstable if the real part of $\lambda > 0$". What happens if the real part of $\lambda = 0$ exactly?
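The role of the real part can be seen by evaluating the standard solution $x(t) = x(0)\,e^{\lambda t}$ of $\frac{dx}{dt} = \lambda x$ for a complex $\lambda$. The sketch below does this with Python's `cmath`; the function names and the sample values of $\lambda$ are hypothetical, and the zero-real-part case is deliberately left unclassified because the text poses it as an exercise.

```python
import cmath


def solution(x0, lam, t):
    """x(t) = x0 * exp(lam * t), the solution of dx/dt = lam * x."""
    return x0 * cmath.exp(lam * t)


def classify(lam):
    """Apply the rule from the text; return None when Re(lam) is exactly 0."""
    real_part = complex(lam).real
    if real_part < 0:
        return "stable"
    if real_part > 0:
        return "unstable"
    return None  # case left as an exercise in the text


for lam in (-1, 0.5, -1 + 3j, 0.5 + 2j):
    # |x(t)| depends only on the real part; the imaginary part only rotates
    # x(t) in the complex plane, which is the oscillation.
    print(lam, classify(lam), abs(solution(1.0, lam, 10.0)))
```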

Now consider the following equation:

$$\ddot{x} = -\beta \dot{x} - \omega x$$

One of the nice things about linear differential equations is that we can always take a single $n$-th order equation and turn it into $n$ coupled first-order equations by rewriting some of the variables. So we define $x_1 \equiv \dot{x}$, $x_2 \equiv x$, to get the equivalent equations

$$\begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \end{pmatrix} = \begin{pmatrix} -\beta & -\omega \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

• Convince yourself that these equations indeed represent the same system.

You will have noticed that we already wrote this down in matrix form. We will now get a chance to use what we saw in section 2. Call the vector on the left $\dot{x}$, the matrix on the right hand side $A$, and the vector on the right hand side $x$, so the equation is $\dot{x} = Ax$.
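The rewriting of one second-order equation as two coupled first-order equations can be checked numerically. The sketch below uses a generic equation $\ddot{x} = a\dot{x} + b x$ with hypothetical coefficients `a` and `b` (not values from the handout) and a simple forward-Euler integrator, chosen here only for brevity. With $x_1 \equiv \dot{x}$ and $x_2 \equiv x$, both formulations should trace out the same trajectory.

```python
def step_second_order(x, v, a, b, dt):
    """One forward-Euler step of x'' = a*x' + b*x, tracking x and v = x'."""
    return x + dt * v, v + dt * (a * v + b * x)


def step_first_order(state, matrix, dt):
    """One forward-Euler step of state' = matrix * state for a 2-vector."""
    x1, x2 = state
    (m11, m12), (m21, m22) = matrix
    return (x1 + dt * (m11 * x1 + m12 * x2),
            x2 + dt * (m21 * x1 + m22 * x2))


a, b = -0.5, -1.0            # hypothetical coefficients
A = [[a, b],
     [1.0, 0.0]]             # matrix form with x1 = x', x2 = x

x, v = 1.0, 0.0              # initial position and velocity
state = (v, x)               # the same initial condition as (x1, x2)
dt = 0.01

for _ in range(1000):
    x, v = step_second_order(x, v, a, b, dt)
    state = step_first_order(state, A, dt)

print((v, x))
print(state)                 # agrees with the line above
```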

We said that we can find special coordinates where our matrix doesn't mix coordinates (that is, it is a diagonal matrix). Suppose that we find matrices $E$ and $\Lambda$, where $\Lambda$ is a diagonal matrix that holds the eigenvalues, such that $A = E \Lambda E^{-1}$, as in section 2.1. Then

$$\dot{x} = E \Lambda E^{-1} x$$

Multiplying on the left by $E^{-1}$, and remembering that as a linear operation it commutes with differentiation by time, we get

$$\frac{d}{dt}\left(E^{-1} x\right) = \Lambda \left(E^{-1} x\right)$$

Let's just say that we define new coordinates $x' = E^{-1} x$. Then we get an equation that looks like

$$\begin{pmatrix} \dot{x}'_1 \\ \dot{x}'_2 \end{pmatrix} = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix}$$

But this is just two completely separate equations, each one in the simple single-variable form we saw at the beginning of this section! We know how to solve that, and how to know whether their fixed point is stable; and since these equations are the same as our original ones (simply represented in different coordinates), if these are stable so are the original ones, and vice versa.
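Because the decoupled equations inherit their stability from the eigenvalues, the stability of the origin for a two-dimensional linear system $\dot{x} = Ax$ can be read off the real parts of the eigenvalues of $A$. The sketch below does this for two hypothetical matrices, neither of which is taken from the handout's exercises; it assumes, as the text does, that $A$ can be diagonalized, and it returns `None` for the borderline case where no real part is positive but at least one is zero.

```python
import cmath


def eigenvalues_2x2(a):
    """Roots of the characteristic polynomial of a 2 x 2 matrix."""
    (p, q), (r, s) = a
    tr = p + s
    det = p * s - q * r
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


def fixed_point_stability(a):
    """Classify the fixed point (0, 0) of x' = A x from the eigenvalues."""
    real_parts = [lam.real for lam in eigenvalues_2x2(a)]
    if all(re < 0 for re in real_parts):
        return "stable"      # every decoupled coordinate decays
    if any(re > 0 for re in real_parts):
        return "unstable"    # at least one decoupled coordinate grows
    return None              # borderline case, not covered by the rule


damped = [[0, 1],
          [-4, -1]]          # hypothetical: complex eigenvalues, Re < 0
saddle = [[1, 2],
          [2, 1]]            # hypothetical: eigenvalues 3 and -1

print(eigenvalues_2x2(damped), fixed_point_stability(damped))
print(eigenvalues_2x2(saddle), fixed_point_stability(saddle))
```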

• In the following system,

[a $2 \times 2$ linear system giving $\dot{x}_1, \dot{x}_2$ in terms of $x_1, x_2$; the matrix entries are not recoverable from the preview]

is the fixed point (0,0) stable or unstable? Why? You need to consider both equations at once.

• In equation 3, if $\beta = 3$ and $\omega = 1$, is (0,0) stable or unstable?

• How about if $\beta = 2$ and $\omega = 2$?

Note that the fixed point does not always have to be at (0,0). We just put it there in these examples for simplicity. The eigenvalue analysis still holds, however.

4 THE TRACE IS THE SUM OF THE EIGENVALUES

Take an $n \times n$ matrix $A$. Then

$$\mathrm{Tr}\,A \equiv \sum_i A_{ii}$$

is called the Trace of $A$. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $A$, with corresponding eigenvectors $e_1, \ldots, e_n$. Then $\mathrm{Tr}\,A = \sum_i \lambda_i$. We will show this below in two ways.

First note what this means for dynamical systems: if the matrix that describes the linearization about a given fixed point of the dynamics has Trace equal to zero, then either (1) all its eigenvalues have zero real part; or (2) some have a negative real part and some have a positive real part. In the second (more usual) case, therefore, the fixed point is a saddle and is unstable.

Method 1 (easy but not beautiful)

$$\mathrm{Tr}(AB) = \sum_i \sum_j A_{ij} B_{ji}$$

simply from the definition of matrix multiplication. The order in which we do the sums doesn't matter, however, so we can quickly see that

$$\mathrm{Tr}(AB) = \sum_j \sum_i A_{ij} B_{ji} = \sum_j \sum_i B_{ji} A_{ij} = \mathrm{Tr}(BA)$$

that is, Trace is commutative.

Now recall (if necessary from the basic math class) that $A$ can be written as

$$A = E \Lambda E^{-1}$$

where the columns of $E$ are the eigenvectors of $A$ and $\Lambda$ is a diagonal matrix with the eigenvalues of $A$ as its diagonal elements. (We have assumed $E$ is invertible.) Then

$$\mathrm{Tr}\,A = \mathrm{Tr}(E \Lambda E^{-1}) = \mathrm{Tr}((E \Lambda) E^{-1}) = \mathrm{Tr}(E^{-1}(E \Lambda)) = \mathrm{Tr}((E^{-1} E) \Lambda) = \mathrm{Tr}\,\Lambda$$

This last is just the sum of the eigenvalues.
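Both steps of Method 1 can be spot-checked numerically. The sketch below verifies $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$ and $\mathrm{Tr}\,A = \lambda_1 + \lambda_2$ for hypothetical $2 \times 2$ matrices; the helper names are made up for this sketch, and the eigenvalues come from the quadratic characteristic polynomial, so this is a check on one example, not a proof.

```python
import cmath


def trace(m):
    """Sum of the diagonal elements of a square matrix."""
    return sum(m[i][i] for i in range(len(m)))


def matmul(a, b):
    """C = AB with C[i][k] = sum over j of a[i][j] * b[j][k]."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for i in range(len(a))]


def eigenvalues_2x2(a):
    """Roots of lam^2 - tr(A)*lam + det(A) = 0."""
    (p, q), (r, s) = a
    tr = p + s
    det = p * s - q * r
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


A = [[1, 2],
     [3, 4]]       # hypothetical matrices, for illustration only
B = [[0, 1],
     [5, -2]]

print(trace(matmul(A, B)), trace(matmul(B, A)))   # equal, although AB != BA
print(trace(A), sum(eigenvalues_2x2(A)))          # trace vs. eigenvalue sum
```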

Method 2 (much more interesting concepts here)

Recall that for any square matrix the eigenvalues are found by obtaining the solutions to the characteristic polynomial:

$$\det(A - \lambda I) = 0$$
