Matrix Properties - Matrix Computation - Lecture Slides, Slides of Advanced Computer Architecture

These lecture slides are easy to follow and help build an understanding of matrix computation. The key points discussed in these slides are: Matrix Properties, Singular Value Decomposition, Geometric Interpretation, Uniqueness, Unit Vector, Linearly Independent Vector, Right Singular Vector, Diagonal Matrix, Low-Rank Approximation, Eckart-Young Theorem

Typology: Slides

2012/2013

Uploaded on 04/27/2013

ashalata
Lecture 5

Overview

Matrix properties via singular value decomposition (SVD)

Geometric interpretation of SVD

Applications

Uniqueness

Suppose that, in addition to $v_1$, there is another linearly independent vector $w$ with $\|w\|_2 = 1$ and $\|Aw\|_2 = \sigma_1$.

Define a unit vector $v_2$, orthogonal to $v_1$, as a linear combination of $v_1$ and $w$:

$$v_2 = \frac{w - (v_1^\top w)\,v_1}{\|w - (v_1^\top w)\,v_1\|_2}$$

Since $\|A\|_2 = \sigma_1$, we have $\|Av_2\|_2 \le \sigma_1$. This must be an equality: $w = c\,v_1 + s\,v_2$ for some constants $c$ and $s$ with $|c|^2 + |s|^2 = 1$, so otherwise we would have $\|Aw\|_2 < \sigma_1$.

Hence $v_2$ is a second right singular vector of $A$ corresponding to $\sigma_1$, so $v_1$ can fail to be unique only when $\sigma_1$ is a repeated singular value.

Once $\sigma_1$, $v_1$, and $u_1$ are determined, the remainder of the SVD is determined by the action of $A$ on the space orthogonal to $v_1$.

When $\sigma_1$ is simple, $v_1$ is unique up to a sign, so the orthogonal space is uniquely defined and so are the remaining singular values.

Matrix properties via SVD

Theorem

The rank of $A$ is $r$, the number of nonzero singular values.

Proof.

The rank of a diagonal matrix is equal to the number of its nonzero entries, and in the SVD, $A = U \Sigma V^\top$, where $U$ and $V$ are of full rank. Thus,

$$\operatorname{rank}(A) = \operatorname{rank}(\Sigma) = r$$
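As a rough numerical illustration of the rank theorem, the sketch below counts the singular values that exceed a relative tolerance. The function name `rank_from_svd`, the tolerance `tol`, and the example matrix are assumptions made for this illustration and do not come from the slides; in floating point a "zero" singular value usually shows up as a tiny nonzero number, which is why a tolerance is needed.

```python
import numpy as np

def rank_from_svd(A, tol=1e-10):
    """Count singular values above tol * sigma_1 (tol is an illustrative choice)."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values in descending order
    if s.size == 0 or s[0] == 0:
        return 0
    return int(np.sum(s > tol * s[0]))

# The third row is the sum of the first two, so the rank should be 2.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])
print(rank_from_svd(A))
```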

Theorem

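$\|A\|_2 = \sigma_1$, and $\|A\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_r^2}$

Proof.

As $U$ and $V$ are orthogonal and $A = U \Sigma V^\top$, $\|A\|_2 = \|\Sigma\|_2$. By definition,

$$\|\Sigma\|_2 = \max_{\|x\|_2 = 1} \|\Sigma x\|_2 = \max_i \{|\sigma_i|\} = \sigma_1$$

Likewise, $\|A\|_F = \|\Sigma\|_F$, and by definition

$$\|\Sigma\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_r^2}$$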
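The two norm identities can be checked numerically. The following is a minimal sketch using NumPy; the 2-by-2 example matrix is an arbitrary choice made here, not one from the slides.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
s = np.linalg.svd(A, compute_uv=False)  # singular values, descending

two_norm = np.linalg.norm(A, 2)       # induced 2-norm
fro_norm = np.linalg.norm(A, 'fro')   # Frobenius norm

# Both comparisons should hold up to rounding error.
print(np.isclose(two_norm, s[0]))
print(np.isclose(fro_norm, np.sqrt(np.sum(s ** 2))))
```

For this matrix $A^\top A$ has eigenvalues 45 and 5, so $\sigma_1 = \sqrt{45}$ and $\|A\|_F = \sqrt{50}$.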

Low-rank approximation

Theorem

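(Eckart-Young 1936) Let $A = U \Sigma V^\top = U \operatorname{diag}(\sigma_1, \dots, \sigma_r, 0, \dots, 0)\, V^\top$.

For any $\nu$ with $0 \le \nu \le r$, let $A_\nu = \sum_{i=1}^{\nu} \sigma_i u_i v_i^\top$. Then

$$\|A - A_\nu\|_2 = \min_{\operatorname{rank}(B) \le \nu} \|A - B\|_2 = \sigma_{\nu+1}$$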

Proof.

Suppose there is some $B$ with $\operatorname{rank}(B) \le \nu$ such that $\|A - B\|_2 < \|A - A_\nu\|_2 = \sigma_{\nu+1}$. Then there exists an $(n - \nu)$-dimensional subspace $W \subseteq \mathbb{R}^n$ such that $w \in W \Rightarrow Bw = 0$. Then, for any nonzero $w \in W$,

$$\|Aw\|_2 = \|(A - B)w\|_2 \le \|A - B\|_2 \|w\|_2 < \sigma_{\nu+1} \|w\|_2$$

Thus $W$ is an $(n - \nu)$-dimensional subspace where $\|Aw\|_2 < \sigma_{\nu+1} \|w\|_2$. But there is a $(\nu + 1)$-dimensional subspace where $\|Aw\|_2 \ge \sigma_{\nu+1} \|w\|_2$, namely the space spanned by the first $\nu + 1$ right singular vectors of $A$. Since the sum of the dimensions of these two spaces exceeds $n$, there must be a nonzero vector lying in both, and this is a contradiction.
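The statement of the theorem can be explored numerically with a truncated SVD. The sketch below is an illustration only: the helper name `truncated_svd` and the 4-by-3 test matrix are assumptions introduced here. It prints the 2-norm error of each $A_\nu$ next to $\sigma_{\nu+1}$, which should agree up to rounding.

```python
import numpy as np

def truncated_svd(A, nu):
    """Return A_nu, the sum of the first nu terms sigma_i * u_i * v_i^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first nu columns of U by the singular values, then multiply by V^T.
    return (U[:, :nu] * s[:nu]) @ Vt[:nu, :]

A = np.array([[4.0, 0.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
s = np.linalg.svd(A, compute_uv=False)

for nu in range(len(s)):
    err = np.linalg.norm(A - truncated_svd(A, nu), 2)
    # sigma_{nu+1} is stored at index nu because Python indexing starts at 0.
    print(nu, err, s[nu])
```

Any other matrix of rank at most $\nu$ should give an error at least as large; for example, a rank-one matrix that keeps only the entry $A_{11}$ cannot beat $A_1$.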

Low-rank approximation

Theorem

$A$ is the sum of $r$ rank-one matrices:

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^\top$$
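The rank-one expansion can be reproduced directly from the factors returned by an SVD routine. This is a small sketch under the assumption that NumPy is available; the example matrix and the tolerance used to count nonzero singular values are illustrative choices.

```python
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12))  # number of nonzero singular values (illustrative tolerance)

# One rank-one matrix sigma_i * u_i * v_i^T per nonzero singular value.
terms = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r)]
A_rebuilt = sum(terms)

print(np.allclose(A, A_rebuilt))
```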

Theorem

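(Eckart-Young 1936) Let $A = U \Sigma V^\top = U \operatorname{diag}(\sigma_1, \dots, \sigma_r, 0, \dots, 0)\, V^\top$.

For any $\nu$ with $0 \le \nu \le r$, let $A_\nu = \sum_{i=1}^{\nu} \sigma_i u_i v_i^\top$. Then

$$\|A - A_\nu\|_2 = \min_{\operatorname{rank}(B) \le \nu} \|A - B\|_2 = \sigma_{\nu+1}$$

Proof.

Let $\Sigma_\nu = U^\top (A - A_\nu) V$, so that $A - A_\nu = U \Sigma_\nu V^\top$. With $p = \min(m, n)$,

$$A - A_\nu = U \left( \operatorname{diag}(\sigma_1, \dots, \sigma_\nu, \sigma_{\nu+1}, \dots, \sigma_p) - \operatorname{diag}(\sigma_1, \dots, \sigma_\nu, 0, \dots, 0) \right) V^\top$$

$$= U \operatorname{diag}(0, \dots, 0, \sigma_{\nu+1}, \dots, \sigma_p)\, V^\top$$

Consequently $\|A - A_\nu\|_2 = \|\Sigma_\nu\|_2 = \sigma_{\nu+1}$.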

Sensitivity of square systems

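If

$$A = \sum_{i=1}^{n} \sigma_i u_i v_i^\top = U \Sigma V^\top$$

is the SVD of $A$, then

$$x = A^{-1} b = (U \Sigma V^\top)^{-1} b = \sum_{i=1}^{n} \frac{u_i^\top b}{\sigma_i}\, v_i$$

The magnitude of $\sigma_n$ has a bearing on the sensitivity of the $Ax = b$ problem.

The solution $x$ becomes increasingly sensitive to perturbations in $A$ or $b$ as $\sigma_n$ gets smaller, because the last term of the sum is divided by $\sigma_n$.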
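To make the role of $\sigma_n$ concrete, the sketch below evaluates the SVD expansion of $x$ for a nearly singular 2-by-2 system and then perturbs $b$ slightly. The function name `solve_via_svd`, the matrix, and the perturbation `db` are assumptions chosen for this illustration.

```python
import numpy as np

def solve_via_svd(A, b):
    """Evaluate x = sum_i (u_i^T b / sigma_i) v_i for a square nonsingular A."""
    U, s, Vt = np.linalg.svd(A)
    return Vt.T @ ((U.T @ b) / s)

# Nearly singular example: sigma_n is tiny, so 1/sigma_n amplifies changes in b.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
db = np.array([0.0, 1e-4])  # small perturbation of the right-hand side

x = solve_via_svd(A, b)
x_pert = solve_via_svd(A, b + db)
print(x, x_pert)
```

Solving the two systems by hand gives $x = (1, 1)$ and, after the perturbation, $x = (0, 2)$: a change of order $10^{-4}$ in $b$ moves the solution by an amount of order one.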

Condition

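Consider the parameterized system

$$(A + \varepsilon F)\,x(\varepsilon) = b + \varepsilon f, \qquad x(0) = x$$

where $F \in \mathbb{R}^{n \times n}$ and $f \in \mathbb{R}^n$.

If $A$ is nonsingular, then $x(\varepsilon)$ is differentiable in a neighborhood of zero.

Moreover, $\dot{x}(0) = A^{-1}(f - Fx)$, and the Taylor series expansion is

$$x(\varepsilon) = x + \varepsilon\,\dot{x}(0) + O(\varepsilon^2)$$

Using any vector norm and a consistent matrix norm,

$$\frac{\|x(\varepsilon) - x\|}{\|x\|} \le |\varepsilon|\,\|A^{-1}\| \left\{ \frac{\|f\|}{\|x\|} + \|F\| \right\} + O(\varepsilon^2)$$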
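The derivative formula and the first-order bound can be sanity-checked with a finite difference. In the sketch below, the matrices `A` and `F`, the vectors `b` and `f`, and the step `eps` are all hypothetical values picked for illustration; the $O(\varepsilon^2)$ term is simply dropped, which is reasonable here only because `eps` is very small.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
F = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # hypothetical perturbation direction for A
f = np.array([1.0, -1.0])    # hypothetical perturbation direction for b
eps = 1e-6

x = np.linalg.solve(A, b)
x_eps = np.linalg.solve(A + eps * F, b + eps * f)

# Derivative at zero, x'(0) = A^{-1} (f - F x), versus a finite difference.
x_dot = np.linalg.solve(A, f - F @ x)
fd = (x_eps - x) / eps

# First-order bound in the 2-norm, with the O(eps^2) term omitted.
lhs = np.linalg.norm(x_eps - x) / np.linalg.norm(x)
rhs = eps * np.linalg.norm(np.linalg.inv(A), 2) * (
    np.linalg.norm(f) / np.linalg.norm(x) + np.linalg.norm(F, 2))

print(np.allclose(fd, x_dot, atol=1e-4), lhs <= rhs)
```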

Condition number (cont’d)

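Note that $\kappa(\cdot)$ depends on the underlying norm

$$\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\sigma_1(A)}{\sigma_n(A)}$$

Thus, the 2-norm condition of $A$ measures the elongation of the hyperellipsoid $\{Ax : \|x\|_2 = 1\}$

If $\kappa(A)$ is large, then $A$ is said to be an ill-conditioned matrix

Any two condition numbers $\kappa_\alpha(\cdot)$ and $\kappa_\beta(\cdot)$ are equivalent in the sense that constants $c_1$ and $c_2$ can be found such that $c_1 \kappa_\alpha(A) \le \kappa_\beta(A) \le c_2 \kappa_\alpha(A)$. For $A \in \mathbb{R}^{n \times n}$:

$$\frac{1}{n}\,\kappa_2(A) \le \kappa_1(A) \le n\,\kappa_2(A)$$

$$\frac{1}{n}\,\kappa_\infty(A) \le \kappa_2(A) \le n\,\kappa_\infty(A)$$

$$\frac{1}{n^2}\,\kappa_1(A) \le \kappa_\infty(A) \le n^2\,\kappa_1(A)$$

For any p-norm, we have $\kappa(A) \ge 1$, and matrices with small condition numbers are said to be well-conditioned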
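As a quick check of these relations, the sketch below computes $\kappa_2$ from the singular values and compares it with $\kappa_1$ and $\kappa_\infty$ obtained from `numpy.linalg.cond`. The 2-by-2 matrix is an arbitrary example chosen here.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
n = A.shape[0]
s = np.linalg.svd(A, compute_uv=False)

kappa2 = s[0] / s[-1]                   # sigma_1 / sigma_n
kappa1 = np.linalg.cond(A, 1)           # ||A||_1 * ||A^{-1}||_1
kappa_inf = np.linalg.cond(A, np.inf)   # ||A||_inf * ||A^{-1}||_inf

print(np.isclose(kappa2, np.linalg.cond(A, 2)))
print(kappa2 / n <= kappa1 <= n * kappa2)
print(kappa_inf / n <= kappa2 <= n * kappa_inf)
print(kappa1 / n**2 <= kappa_inf <= n**2 * kappa1)
```

For this matrix $\|A\|_1 = 6$ and $\|A^{-1}\|_1 = 3.5$, so $\kappa_1(A) = 21$; similarly $\kappa_\infty(A) = 7 \cdot 3 = 21$.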

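Minimum norm least squares solution

Theorem

The minimum norm least squares solution to a linear system $Ax = b$, that is, the shortest vector $x$ that achieves $\min_x \|Ax - b\|$, is unique, and is given by

$$\hat{x} = V \Sigma^{\dagger} U^\top b$$

where $\Sigma^{\dagger}$ is the $n \times m$ matrix

$$\Sigma^{\dagger} = \begin{pmatrix} 1/\sigma_1 & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & 1/\sigma_r & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix}$$

The matrix $A^{\dagger} = V \Sigma^{\dagger} U^\top$ is the pseudoinverse of $A$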
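The formula $A^{\dagger} = V \Sigma^{\dagger} U^\top$ translates almost line by line into code. The sketch below is one possible implementation, not a reference one: the name `pinv_via_svd` and the relative cutoff `tol` for deciding which singular values count as nonzero are assumptions made here. NumPy also ships `numpy.linalg.pinv`, which serves the same purpose.

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Form A^+ = V Sigma^+ U^T, inverting only singular values above tol * sigma_1."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    keep = s > tol * s[0]          # singular values treated as nonzero
    s_inv[keep] = 1.0 / s[keep]
    return (Vt.T * s_inv) @ U.T    # scale the columns of V, then apply U^T

# Rank-deficient 3x2 example: the second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 0.0, 1.0])

A_pinv = pinv_via_svd(A)
x_hat = A_pinv @ b
print(x_hat)
```

Because this $A$ equals the outer product of $(1, 2, 3)$ and $(1, 2)$, the minimum norm solution can be worked out by hand as $\hat{x} = \tfrac{4}{70}(1, 2)$, which gives a simple check on the output.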

Minimum norm solution (cont’d)

With $c = U^\top b$ and $y = V^\top x$, the optimal $y$ has the following components

$$y_i = \frac{c_i}{\sigma_i} \quad \text{for } i = 1, \dots, r$$

$$y_i = 0 \quad \text{for } i = r + 1, \dots, n$$

In vector form

$$y = \Sigma^{\dagger} c$$

Notice there is no other choice for $y$, which is therefore unique: minimum residual forces the choice of $y_1, \dots, y_r$, and minimum norm solution forces the other entries of $y$

The minimum norm least squares solution is

$$\hat{x} = V y = V \Sigma^{\dagger} c = V \Sigma^{\dagger} U^\top b$$

The squared residual is

$$\|A\hat{x} - b\|^2 = \|\Sigma y - c\|^2 = \sum_{i=r+1}^{m} c_i^2 = \sum_{i=r+1}^{m} (u_i^\top b)^2$$

which is the squared norm of the projection of $b$ onto the orthogonal complement of the range of $A$
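The residual formula can be verified on a small overdetermined system. The sketch below follows the steps above under the usual reading that $c = U^\top b$ and $y = V^\top x$; the example data and the rank tolerance are assumptions made for illustration.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])

U, s, Vt = np.linalg.svd(A)      # full SVD, so U is m x m
r = int(np.sum(s > 1e-12))       # numerical rank (illustrative tolerance)
c = U.T @ b                      # c = U^T b

y = np.zeros(A.shape[1])
y[:r] = c[:r] / s[:r]            # y_i = c_i / sigma_i for i <= r, zero otherwise
x_hat = Vt.T @ y                 # x_hat = V y

residual_sq = np.linalg.norm(A @ x_hat - b) ** 2
tail_sq = np.sum(c[r:] ** 2)     # sum over i = r+1, ..., m of (u_i^T b)^2
print(residual_sq, tail_sq)
```

Here the third equation reads $0 = 3$ and cannot be satisfied, so both quantities should equal 9 up to rounding.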

Least squares solution of homogeneous linear systems

Consider $Ax = 0$, or $\min_{\|x\| = 1} \|Ax\|$, with $A = U \Sigma V^\top$. The solution is

$$x = \alpha_1 v_{n-k+1} + \cdots + \alpha_k v_n$$

where $k$ is the largest integer such that $\sigma_{n-k+1} = \cdots = \sigma_n$, and $\alpha_1^2 + \cdots + \alpha_k^2 = 1$

Consider the unit-norm least squares solution

$$\|Ax\| = \|U \Sigma V^\top x\| = \|\Sigma V^\top x\| = \|\Sigma y\|$$

where $y = V^\top x$ (note that $\|y\| = 1$)

Thus we seek the unit norm vector $y$ that minimizes

$$\sigma_1^2 y_1^2 + \cdots + \sigma_n^2 y_n^2$$

which is achieved by concentrating all the mass of $y$ on the components with the smallest $\sigma$:

$$y_1 = \cdots = y_{n-k} = 0$$

and thus $x = V y = y_{n-k+1} v_{n-k+1} + \cdots + y_n v_n$, with $\alpha_1 = y_{n-k+1}, \dots, \alpha_k = y_n$
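In the common case $k = 1$ (the smallest singular value is not repeated), the recipe reduces to taking the last right singular vector. The sketch below illustrates that case only, assuming $A$ has at least as many rows as columns; the function name `min_unit_norm_solution` and the example matrix are hypothetical.

```python
import numpy as np

def min_unit_norm_solution(A):
    """Return a unit vector minimizing ||Ax|| and the value sigma_n (assumes m >= n)."""
    _, s, Vt = np.linalg.svd(A)
    # Rows of Vt are the right singular vectors; the last one belongs to sigma_n.
    return Vt[-1, :], s[-1]

# Every row is orthogonal to (1, -2, 1), so that direction spans the null space.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [2.0, 3.0, 4.0]])
x, sigma_n = min_unit_norm_solution(A)
print(x, np.linalg.norm(A @ x))
```

The returned vector is determined only up to sign, so it should match $\pm(1, -2, 1)/\sqrt{6}$.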