





Material Type: Notes; Class: Social Media Mining; Subject: Computer Science and Engineering; University: Arizona State University - Tempe; Term: Unknown 1989;






CNS 185: A Brief Review of Linear Algebra
An understanding of linear algebra is critical as a stepping-off point for understanding neural networks. This handout includes basic definitions, then quickly progresses to elementary but powerful techniques such as eigenbases. For your private edification, a few exercises are included, identified by bullets; some exercises come with hints and answers. Take what you will from this handout, but be forewarned that future problem sets will require most of the concepts developed here, so it behooves you to be comfortable with them. Don't delay in asking a TA if you can't figure it out on your own.
Matrices are most often represented as rectangular arrays of scalars.^1 The $m \times n$ matrix $A$ has $m$ rows and $n$ columns. The subscript notation $A_i$ is used to reference the $i$th row of the matrix, and $A_{ij}$ is used to reference the scalar in the $i$th row and $j$th column of $A$.
For example, $A$ is a $2 \times 3$ matrix:
A column vector (quite often referred to simply as a vector) is an $n \times 1$ matrix, where $n$ is referred to as the dimension of the vector. The scalar $v_i$ is the $i$th element of vector $v$.
A matrix with the same number of rows and columns is, not surprisingly, referred to as a square matrix. A commonly used notational convention is to use capital letters (e.g., $A$) to denote matrices and lower case letters (e.g., $v$) to denote vectors.
The transpose of an $m \times n$ matrix $A$ is denoted $A^T$. $A^T$ is an $n \times m$ matrix whose elements are:
$$(A^T)_{ij} = A_{ji}$$
The transpose of a column vector is called a row vector. An object twice transposed will produce the original object: $(A^T)^T = A$.
Adding two matrices $A$ and $B$ results in a matrix whose elements are the sums of the corresponding elements from $A$ and $B$:
If $C = A + B$, then $C_{ij} = A_{ij} + B_{ij}$
($A$ and $B$ must have the same dimensions to be able to add them together.) Addition is commutative and associative, just like regular addition.
^1 In the examples presented, scalars will be real numbers, but in general they can be complex.
A matrix $A$ multiplied by a scalar $k$ produces a new matrix $B = kA$ whose elements are the elements of $A$ each multiplied by $k$. Multiplying two matrices together is more complicated: multiplying an $m \times n$ matrix $A$ by an $n \times p$ matrix $B$ produces an $m \times p$ matrix $C = AB$ whose elements are defined to be:
If $C = AB$, then $C_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk}$
In this example, a $4 \times 2$ matrix is multiplied by a $2 \times 3$ matrix to produce a $4 \times 3$ matrix:
Note that matrix multiplication can only be performed between two matrices $A$ and $B$ if $A$ has exactly as many columns as $B$ has rows.
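The element formula and the shape requirement can be sketched in a few lines of plain Python. This is only an illustration: the function name `matmul` and the numeric entries are assumptions chosen for the sketch, not the handout's own example, and matrices are stored as lists of rows.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    m, n, p = len(A), len(B), len(B[0])
    # A must have exactly as many columns as B has rows.
    if any(len(row) != n for row in A):
        raise ValueError("A must have as many columns as B has rows")
    # C[i][k] is the sum over j of A[i][j] * B[j][k].
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
            for i in range(m)]


# Illustrative values (hypothetical): a 4 x 2 matrix times a 2 x 3 matrix.
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
B = [[1, 0, 2], [0, 1, 3]]
C = matmul(A, B)  # a 4 x 3 matrix
```

Trying `matmul(B, B)` raises `ValueError`, because a 2 x 3 matrix has three columns while its right-hand partner has only two rows.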
Like ordinary multiplication, matrix multiplication is associative and distributive, but unlike ordinary multiplication, it is not commutative:
$AB \neq BA$, in general
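A small worked case makes the failure of commutativity concrete. The two matrices below are illustrative choices, not taken from the handout:

```latex
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}
\quad\Longrightarrow\quad
AB = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\neq
\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = BA
```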
From the definitions of multiplication and transpose, we derive the following identity:
$$(AB)^T = B^T A^T$$
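The standard identity linking transpose and product is $(AB)^T = B^T A^T$, which is presumably the one meant here. A sketch of the index-level argument, for an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$:

```latex
\begin{align*}
\big((AB)^T\big)_{ki} &= (AB)_{ik} && \text{definition of transpose} \\
&= \sum_{j=1}^{n} A_{ij} B_{jk} && \text{definition of the product} \\
&= \sum_{j=1}^{n} (B^T)_{kj} (A^T)_{ji} && \text{transpose each factor} \\
&= (B^T A^T)_{ki} && \text{definition of the product}
\end{align*}
```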
The inner product (also known as the dot product) of $n$-dimensional vectors $x$ and $y$ is defined as $x^T y$, which is a scalar.^2 By our definitions of matrix transpose and matrix multiplication, this means that the inner product is the sum of the products of corresponding elements from the two vectors:
$$x^T y = \sum_{i=1}^{n} x_i y_i$$
If the inner product of two vectors is zero, they are said to be orthogonal, which has the usual geometric connotation of perpendicularity.
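As a quick illustration of the sum-of-products definition and the orthogonality test, here is a minimal sketch in plain Python. The helper name `inner` and the sample vectors are assumptions made for the example, and only real vectors are handled.

```python
def inner(x, y):
    """Inner product of two real vectors of equal dimension."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    # Sum of the products of corresponding elements.
    return sum(xi * yi for xi, yi in zip(x, y))


# Illustrative vectors (hypothetical values).
x = [1, 2, 3]
y = [4, -5, 2]
orthogonal = inner(x, y) == 0  # 4 - 10 + 6 = 0, so x and y are orthogonal
```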
The diagonal of an $n \times n$ square matrix $A$ consists of the elements $A_{ii}$ running diagonally from the top left corner to the bottom right. A diagonal matrix is a matrix which has zeroes everywhere off the diagonal.
^2 When working with complex vectors, we use the inner product $x^{*} y$, which returns a real value when $y = x$. $x^{*}$ is the complex conjugate of the transpose of $x$. The complex conjugate $\bar{x}$ has $\mathrm{Re}[\bar{x}_i] = \mathrm{Re}[x_i]$ and $\mathrm{Im}[\bar{x}_i] = -\mathrm{Im}[x_i]$.
Find a condition for which the equation
$$Ax = \lambda x$$
($\lambda$ a scalar) has interesting non-zero solutions for $x$.
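Use this condition to write an equation that $\lambda$ must satisfy in order to get non-zero solutions to
$$A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \lambda \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \qquad (1)$$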
Solve the equation you got for $\lambda$. You will get two possible values. Using one of them, find values for $x_1$ and $x_2$ that satisfy equation (1).
Note that if $x$ satisfies $Ax = \lambda x$, so does any scalar multiple of $x$. So you won't be able to solve for a unique $x_1$ and $x_2$, just for the direction that $x$ should lie in. That direction is called the eigenvector direction, and all vectors parallel to it are eigenvectors with the same eigenvalue.
Now use the other $\lambda$ to find the other eigenvector direction.
To restate, if $x \neq 0$ and $Ax = \lambda x$ then $\lambda$ is an eigenvalue of $A$ and $x$ is an eigenvector of $A$ with eigenvalue $\lambda$.
For larger square matrices, we can find eigenvalues and eigenvectors using the same approach you used for the $2 \times 2$ matrix. First, we look for values of $\lambda$ such that $A - \lambda I$ is singular, i.e., $|A - \lambda I| = 0$. Using the formula for determinants, this leads to a polynomial of degree $n$ in $\lambda$, which is called the characteristic polynomial of $A$. You will recall from algebra that every polynomial of degree $n$ has exactly $n$ (not necessarily distinct) complex roots (some of which may be real, of course). Therefore, every $n \times n$ matrix $A$ has exactly $n$ (not necessarily distinct and possibly complex) eigenvalues. Once the eigenvalues are known, the eigenvectors can be determined.
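For the $2 \times 2$ case the characteristic polynomial is a quadratic, $\lambda^2 - (a + d)\lambda + (ad - bc) = 0$ for a matrix with rows $(a, b)$ and $(c, d)$, so the recipe can be sketched directly with the quadratic formula. The function names and the test matrix below are assumptions made for illustration; `cmath` is used so that complex roots come out naturally.

```python
import cmath


def eigenvalues_2x2(A):
    """Roots of the characteristic polynomial of a 2 x 2 matrix A."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    # Solve lambda^2 - trace * lambda + det = 0 with the quadratic formula.
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


def eigenvector_2x2(A, lam, tol=1e-12):
    """One non-zero x with (A - lam I) x = 0, given an eigenvalue lam."""
    (a, b), (c, d) = A
    # (b, lam - a) is annihilated by both rows of A - lam I when lam is an
    # eigenvalue; fall back to (lam - d, c) if that candidate is zero.
    if abs(b) > tol or abs(lam - a) > tol:
        return (b, lam - a)
    if abs(c) > tol or abs(lam - d) > tol:
        return (lam - d, c)
    return (1, 0)  # A - lam I is the zero matrix: every vector works


# Illustrative matrix (hypothetical values).
A = [[2, 1], [1, 2]]
lam1, lam2 = eigenvalues_2x2(A)   # 3 and 1, returned as complex numbers
x = eigenvector_2x2(A, lam1)      # a vector along the direction (1, 1)
```

The returned vector is not normalized; only its direction is meaningful, in line with the remark that eigenvectors are determined up to scale.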
Find a matrix $A$ for which 0 is an eigenvalue, and find all the eigenvectors.
A common convention is to choose eigenvectors to be unit vectors^4, i.e. $x^T x = 1$.
If you write the eigenvector directions as column vectors and put them side by side, you get a new matrix; call it $E$. Convince yourself that since the columns of $E$ are eigenvectors, the following is true:
$$AE = E \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$$
where $\lambda_1$ is the first column's eigenvalue, and $\lambda_2$ is the second column's eigenvalue. We can multiply on the right by $E^{-1}$ (assuming $E$ is invertible) to get
$$A = E \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} E^{-1}$$
This expression helps us describe the true significance of the matrix $A$, and why eigenvalues are so important.
^4 If the eigenvectors are complex, there is no obvious way to find a unique representation for the eigenvector by normalizing, since an eigenvector multiplied by any complex number is still an eigenvector, with the same eigenvalue. For convenience one can set $x^{*} x = 1$.
Multiplying on the left by $A$ is the same as performing the following sequence of operations: multiply by $E^{-1}$, then by the diagonal matrix of eigenvalues, then by $E$. But the diagonal matrix is a particularly easy matrix to multiply with, since the coordinates don't mix! We can easily visualize what is going on: the first coordinate gets stretched (or squeezed) by $\lambda_1$, the second by $\lambda_2$.
That is, there is some special coordinate system in which multiplying by $A$ just stretches the two coordinates independently. Clearly this is the natural coordinate system for the problem, the one we want to be thinking in. Many times it is enough to know the eigenvalues: we know we could always transform the problem into the special system if we wanted to. We just pretend that we've already done the transformation.
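The change-of-coordinates picture can be checked numerically. The sketch below assumes an illustrative symmetric matrix whose eigenvectors $(1, 1)$ and $(1, -1)$ have eigenvalues 3 and 1; the matrix, the variable names, and the helper functions are all hypothetical choices for the example. It rebuilds $A$ from its eigenvectors and eigenvalues, then applies $A$ to a vector in three steps: into the special coordinates, stretch each coordinate, and back out.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    n = len(B)
    return [[sum(A[i][j] * B[j][k] for j in range(n))
             for k in range(len(B[0]))] for i in range(len(A))]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1], [1, 2]]        # illustrative matrix
E = [[1, 1], [1, -1]]       # eigenvectors as columns
Lam = [[3, 0], [0, 1]]      # matching eigenvalues on the diagonal
E_inv = inverse_2x2(E)

# Rebuild A as E * Lam * E^{-1}.
reconstructed = matmul(matmul(E, Lam), E_inv)

# Apply A to a column vector in three steps.
x = [[3], [1]]
in_eigen_coords = matmul(E_inv, x)          # coordinates along the eigenvectors
stretched = matmul(Lam, in_eigen_coords)    # each coordinate scaled separately
back = matmul(E, stretched)                 # return to the original coordinates
direct = matmul(A, x)                       # same result in one step
```

Here `in_eigen_coords` is `[[2.0], [1.0]]`, the stretch gives `[[6.0], [1.0]]`, and both `back` and `direct` equal the column vector with entries 7 and 5.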
This illustrates a very important concept that cannot be stressed enough: the real guts of a matrix, what it really does, don't depend on what coordinate system we use to describe it. Here, if $A$ is a positive definite matrix, then $E$ is an orthonormal matrix and represents simply a rotation.^5 Who cares if we rotate coordinates around? They're our coordinates, not the physical problem's. The eigenvalues are what really matter.
Find the eigenvalues and eigenvectors of the following two matrices:
and
The arguments we used above all rely on our casual assumption that $E$ is invertible. This is usually an acceptable assumption to make, for the sorts of matrices commonly encountered in neural network theory. But it can be helpful to have some understanding of the other possibilities.^6
In the following, assume when necessary that $E$ is invertible, with eigenvectors $e_1, \ldots, e_n$ and corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$.
If $C$ has eigenvectors and eigenvalues $\{e_i, \lambda_i\}$, then the matrix $B = C - \alpha I$ has eigenvectors and eigenvalues $\{e_i, \lambda_i - \alpha\}$.
If $C$ is a real symmetric matrix (i.e. $C_{ij} = C_{ji}$), then all the eigenvalues of $C$ are real. We can also choose all the eigenvectors to be real.^7
^5 Positive definite matrix: a matrix $M$ such that $x^T M x > 0$ for all non-zero $x$. Orthonormal matrix: one where $M^T M = I$. The important point is that if these conditions are satisfied, the matrix $E$ is just a coordinate rotation and/or a reflection. (Though reflections aren't properly rotations, we almost always include them when we say, abusing notation, "rotation matrix".)
^6 A quick summary: Suppose the $n$ (possibly complex) eigenvalues of $M$ are distinct. Then to each eigenvalue there is a unique (up to multiplication by a complex number) eigenvector, and all the eigenvectors are linearly independent (i.e. they span $\mathbb{C}^n$). Now suppose there are $m$ eigenvalues with the same value $\lambda$. In this case, unfortunately, there might not be $m$ linearly independent eigenvectors all with the same eigenvalue $\lambda$, in which case $E$ must be singular. In both these cases, there is nothing special about the eigenvalue 0; the issue is only whether an eigenvalue is a multiple root of the characteristic polynomial.
^7 Hint: Start with a potentially complex eigenvalue $\lambda$ and its potentially complex eigenvector $x$, satisfying $Cx = \lambda x$.
disturbance to $x$, would $x$ return to the fixed point, or would it shoot off in some direction? The stability of fixed points is of great practical importance in a world full of natural small random disturbances. For example, the bottom of a spherical bowl is a stable fixed point: fruit stays down there. But the top of a glass sphere is an unstable fixed point: we could very carefully balance an apple on top of it, but any small disturbance, and the apple will fall off.
For $dx/dt = \lambda x$, convince yourself that $x = 0$ is a stable fixed point if $\lambda < 0$ and is unstable if $\lambda > 0$.
The phrasing and solution to the above problem assume $\lambda$ is real. What if $\lambda$ is complex? Convince yourself that equation 2 still holds. The imaginary part of $\lambda$ just represents an oscillation ($e^{i\omega t} = \cos \omega t + i \sin \omega t$). So the condition above, to be completely general, should really read "stable fixed point if the real part of $\lambda < 0$, unstable if the real part of $\lambda > 0$". What happens if the real part of $\lambda = 0$ exactly?
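To see the role of the real part numerically, one can evaluate the closed-form solution of $dx/dt = \lambda x$, which is $x(t) = x(0) e^{\lambda t}$, for a complex $\lambda$. The sketch below assumes that this is the solution the text refers to; the function name `x_at` and the sample values of $\lambda$ are hypothetical.

```python
import cmath


def x_at(x0, lam, t):
    """Value at time t of the solution of dx/dt = lam * x with x(0) = x0."""
    return x0 * cmath.exp(lam * t)


# Same oscillation frequency (imaginary part 2), opposite signs of the real part.
decaying = abs(x_at(1.0, -1 + 2j, 5.0))    # magnitude e^{-5}: shrinks toward 0
growing = abs(x_at(1.0, 0.5 + 2j, 5.0))    # magnitude e^{2.5}: moves away from 0
```

The magnitude depends only on the real part of $\lambda$, since $|e^{\lambda t}| = e^{\mathrm{Re}(\lambda) t}$; the imaginary part only rotates the value in the complex plane.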
Now consider the following equation: x = x_ ! x
One of the nice things about linear differential equations is that we can always take a single $n$-th order equation and turn it into $n$ coupled first-order equations by rewriting some of the variables. So we define $x_1 \equiv \dot{x}$, $x_2 \equiv x$, to get the equivalent equations
x_ 1 x _ 2
x 1 x 2
Convince yourself that these equations indeed represent the same system.
You will have noticed that we already wrote this down in matrix form. We will now get a chance to use what we saw in section 2. Call the vector on the left $\dot{x}$, the matrix on the right hand side $A$, and the vector on the right hand side $x$, so the equation is $\dot{x} = Ax$.
We said that we can find special coordinates where our matrix doesn't mix coordinates (that is, it is a diagonal matrix). Suppose that we find matrices $E$ and $\Lambda$, where $\Lambda$ is a diagonal matrix that holds the eigenvalues, such that $A = E \Lambda E^{-1}$, as in section 2.1. Then
$$\dot{x} = E \Lambda E^{-1} x$$
Multiplying on the left by $E^{-1}$, and remembering that as a linear operation it commutes with differentiation by time, we get
$$\frac{d}{dt}(E^{-1} x) = \Lambda (E^{-1} x)$$
Let's just say that we define new coordinates $x' = E^{-1} x$. Then we get an equation that looks like
$$\begin{pmatrix} \dot{x}'_1 \\ \dot{x}'_2 \end{pmatrix} = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix}$$
But this is just two completely separate equations, each one in the simple single-variable form we saw at the beginning of this section! We know how to solve that, and how to know whether their fixed point is stable; and since these equations are the same as our original ones (simply represented in different coordinates), if these are stable so are the original ones, and vice-versa.
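Since stability of the decoupled equations depends only on the signs of the real parts of the eigenvalues, a stability check for a two-variable linear system at the origin reduces to computing two eigenvalues. A minimal sketch follows, with hypothetical function names and illustrative matrices that are not the ones from the exercises; it treats the origin as stable only when every eigenvalue has a strictly negative real part.

```python
import cmath


def eigenvalues_2x2(A):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


def is_stable(A):
    """True if every eigenvalue of A has a strictly negative real part."""
    return all(lam.real < 0 for lam in eigenvalues_2x2(A))


# Illustrative systems (hypothetical values) of the form dx/dt = A x.
decaying = is_stable([[-1, 2], [0, -3]])    # eigenvalues -1 and -3
saddle = is_stable([[0, 1], [1, 0]])        # eigenvalues +1 and -1
spiral = is_stable([[-1, -5], [5, -1]])     # eigenvalues -1 + 5i and -1 - 5i
```

Here `decaying` and `spiral` are `True` and `saddle` is `False`: a single eigenvalue with positive real part is enough to make the fixed point unstable.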
x_ 1 x _ 2
x 1 x 2
is the fixed point (0,0) stable or unstable? Why? You need to consider both equations at once.
In equation 3, if = 3 and! = 1, is (0,0) stable or unstable?
How ab out if = 2 and! = 2?
Note that the fixed point doesn't always have to be at (0,0). We just put it there in these examples for simplicity. The eigenvalue analysis still holds, however.
Take an $n$ by $n$ matrix $A$. Then
$$\mathrm{Tr}\, A \equiv \sum_i A_{ii}$$
is called the Trace of $A$. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $A$, with corresponding eigenvectors $e_1, \ldots, e_n$. Then $\mathrm{Tr}\, A = \sum_i \lambda_i$. We will show this below in two ways.
First note what this means for dynamical systems: if the matrix that describes the linearization about a given fixed point of the dynamics has Trace equal to zero, then either (1) all its eigenvalues have zero real part; or (2) some have a negative real part and some have a positive real part. In the second (more usual) case, therefore, the fixed point is a saddle and is unstable.
$$\mathrm{Tr}(AB) = \sum_i \sum_j A_{ij} B_{ji}$$
simply from the definition of matrix multiplication. The order in which we do the sums doesn't matter, however, so we can quickly see that
$$\mathrm{Tr}(AB) = \sum_j \sum_i A_{ij} B_{ji} = \sum_j \sum_i B_{ji} A_{ij} = \mathrm{Tr}(BA)$$
that is, Trace is commutative.
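A numerical spot check can make the point that the traces agree even when the products themselves differ. The matrices and helper names below are illustrative assumptions, not part of the handout.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    n = len(B)
    return [[sum(A[i][j] * B[j][k] for j in range(n))
             for k in range(len(B[0]))] for i in range(len(A))]


def trace(M):
    """Sum of the diagonal elements of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))


# Illustrative matrices (hypothetical values).
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 2]]
AB = matmul(A, B)   # [[10, 5], [20, 11]]
BA = matmul(B, A)   # [[3, 4], [11, 18]]
same_trace = trace(AB) == trace(BA)   # both traces equal 21
```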
Now recall (if necessary from the basic math class) that $A$ can be written as
$$A = E \Lambda E^{-1}$$
where the columns of $E$ are the eigenvectors of $A$ and $\Lambda$ is a diagonal matrix with the eigenvalues of $A$ as its diagonal elements. (We have assumed $E$ is invertible.) Then
$$\mathrm{Tr}\, A = \mathrm{Tr}(E \Lambda E^{-1}) = \mathrm{Tr}((E \Lambda) E^{-1}) = \mathrm{Tr}(E^{-1} (E \Lambda)) = \mathrm{Tr}((E^{-1} E) \Lambda) = \mathrm{Tr}\, \Lambda$$
This last is just the sum of the eigenvalues.
Recall that for any square matrix the eigenvalues are found by obtaining the solutions to the characteristic polynomial: $\det(A - \lambda I) = 0$
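For the $2 \times 2$ case the link between the characteristic polynomial and the trace can be written out explicitly. The following is a sketch under the assumption that $A$ has rows $(a, b)$ and $(c, d)$ (symbols introduced here for illustration) and eigenvalues $\lambda_1, \lambda_2$; it is not necessarily the argument the handout goes on to give.

```latex
\begin{align*}
\det(A - \lambda I)
  &= \det\begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix}
   = \lambda^2 - (a + d)\lambda + (ad - bc), \\
(\lambda - \lambda_1)(\lambda - \lambda_2)
  &= \lambda^2 - (\lambda_1 + \lambda_2)\lambda + \lambda_1 \lambda_2 .
\end{align*}
```

Both expressions are the same monic quadratic with roots $\lambda_1$ and $\lambda_2$, so matching the coefficients of $\lambda$ gives $\lambda_1 + \lambda_2 = a + d = \mathrm{Tr}\, A$.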