Numerical Linear Algebra, Lecture Notes - Mathematics - 2, Study notes of Mathematical Methods for Numerical Analysis and Optimization

Least Square Approximation, Singular Value Decomposition, QR decomposition , Back substitution

Typology: Study notes

2010/2011

Uploaded on 09/09/2011

luber-1
luber-1 🇬🇧

4.8

(12)

293 documents

1 / 9

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
4 Least-Squares Approximation
We will first deal with the problem of solving an overdetermined system of the form
Ax=b,(5)
where ARm×nand bRmare given and m>n. The latter means that we have more
equations than unknowns, which makes the system overdetermined. Hence, instead, we
may try to change the problem to solving
min
xRn"Axb"2.(6)
For the time being, we will assume that the rank of the matrix rank(A)=dim{Ax:x
Rn}is rank(A)=n, i.e. the matrix has full rank.
Then, to solve (6) we can look for critical points of the function
f(x)="Axb"2
2=xTATAx2bTAx+bTb.
Setting the gradient to zero yields
0=f(x)=2ATAx2ATb=0.
Since rank(A)=n, we also have rank(ATA)=n(as we will prove later), which means
that the matrix ATAis invertible. Thus we have in principle the following result.
Theorem 4.1 (Normal Equations) The solution of the least-squares problem (6) with
a matrix ARm×n, which has full rank, is the unique solution of the Normal Equations
(ATA)x=ATb.
Proof: The solution to the normal equations is indeed a critical point. Furthermore, since
the Hessian of the function fis given by the positive definite matrix ATA, we have a local
minimum, which is actually a global one, since fis a convex function. !
The normal equations are usually not a good way of computing the solution to the least-
squares problem since the involved matrix ATAis usually badly conditioned.
We are now going to investigate two better ways of computing the solution to the least-
squares problem. To this end, the following result concerning orthogonal matrices is ex-
tremely helpful.
Definition 4.2 A matrix QRn×nis called orthogonal if QTQ=I.
Note that an orthogonal matrix is always invertible and that its rows/columns form an
orthonormal basis of Rn.
10
pf3
pf4
pf5
pf8
pf9

Partial preview of the text

Download Numerical Linear Algebra, Lecture Notes - Mathematics - 2 and more Study notes Mathematical Methods for Numerical Analysis and Optimization in PDF only on Docsity!

4 Least-Squares Approximation

We will first deal with the problem of solving an overdetermined system of the form

Ax = b, (5)

where A ∈ R

m×n and b ∈ R

m are given and m > n. The latter means that we have more

equations than unknowns, which makes the system overdetermined. Hence, instead, we

may try to change the problem to solving

min

x∈R n

‖Ax − b‖ 2

For the time being, we will assume that the rank of the matrix rank(A) = dim{Ax : x ∈

R

n } is rank(A) = n, i.e. the matrix has full rank.

Then, to solve (6) we can look for critical points of the function

f (x) = ‖Ax − b‖

2

2

= x

T A

T Ax − 2 b

T Ax + b

T b.

Setting the gradient to zero yields

0 = ∇f (x) = 2A

T Ax − 2 A

T b = 0.

Since rank(A) = n, we also have rank(A

T A) = n (as we will prove later), which means

that the matrix A

T A is invertible. Thus we have in principle the following result.

Theorem 4.1 (Normal Equations) The solution of the least-squares problem (6) with

a matrix A ∈ R

m×n , which has full rank, is the unique solution of the Normal Equations

(A

T A)x = A

T b.

Proof: The solution to the normal equations is indeed a critical point. Furthermore, since

the Hessian of the function f is given by the positive definite matrix A

T A, we have a local

minimum, which is actually a global one, since f is a convex function.!

The normal equations are usually not a good way of computing the solution to the least-

squares problem since the involved matrix A

T A is usually badly conditioned.

We are now going to investigate two better ways of computing the solution to the least-

squares problem. To this end, the following result concerning orthogonal matrices is ex-

tremely helpful.

Definition 4.2 A matrix Q ∈ R

n×n is called orthogonal if Q

T Q = I.

Note that an orthogonal matrix is always invertible and that its rows/columns form an

orthonormal basis of R

n .

Proposition 4.3 Let Q ∈ R

n×n be orthogonal and A ∈ R

n×n then ‖Qx‖ 2 = ‖x‖ 2 for all

x ∈ R

n and ‖QA‖ 2 = ‖A‖ 2.

Proof: We have

‖Qx‖

2

2 = (Qx)

T (Qx) = x

T Q

T Qx = x

T x = ‖x‖

2

2

Similarly,

‖QA‖ 2 = sup

‖x‖ 2 =

‖QAx‖ 2 = sup

‖x‖ 2 =

‖Ax‖ 2 = ‖A‖ 2.

5 Singular Value Decomposition

One way of interpreting the singular value decomposition, which we want to discuss now,

is to find new orthonormal basis vectors in R

m and R

n , which allow to rewrite the linear

mapping A : R

n → R

m in a geometrically simpler form.

To understand it, we need to recall a few facts from linear algebra. We already introduced

the rank of a matrix. We also need the null space

ker(A) = {x ∈ R

n : Ax = 0}

and the intrinsic relations

n = dim ker(A) + rank(A), rank(A) = rank(A

T ).

Note that an element x ∈ ker(A) satisfies Ax = 0 and hence also A

T Ax = 0, i.e. it belongs

to ker(A

T A). If, on the other hand x ∈ ker(A

T A) then A

T Ax = 0, thus 0 = x

T (A

T Ax) =

‖Ax‖

2 2 and hence x ∈ ker(A). This means

ker(A) = ker(A

T A).

However, the dimension formula above immediately gives

rank(A) = rank(A

T ) = rank(A

T A) = rank(AA

T ),

This means that the symmetric and positive semi-definite matrices A

T A ∈ R

n×n and

AA

T ∈ R

m×m have the same rank.

As before, we choose for A

T A a system of eigenvalues λ 1 ≥ λ 2 ≥... ≥ λ n ≥ 0 and

associated eigenvectors v 1 ,... , v n

∈ R

n satisfying A

T Av j = λ j v j and v

T

j

v k = δ jk

. Exactly

the first r = rank(A) eigenvalues are positive and the remaining ones are zero.

According to the considerations we just made, also the matrix AA

T must have exactly r

positive eigenvalues. We will now see that, interestingly, the positive eigenvalues of AA

T

Theorem 5.2 The general solution to the least-squares problem (6) is given by

x =

r ∑

j=

cj

σ j

v j

m ∑

j=r+

α j v j =: x

m ∑

j=r+

α j v j

Here, σ j denotes the singular values of the matrix A and v j are the columns of the matrix V

from the singular value decomposition. Moreover, c = U

T b, and the α j are free to choose.

From all possible solutions, the solution x

is the solution having minimal Euclidean norm.

Proof: The representation of the general form follows from x = V y. The solution x

has

minimal norm, since we have for a general solution that

‖x‖

2

2

= ‖x

2

2

m ∑

j=r+

α

2

j

≥ ‖x

2

2

using the fact that the v j form an orthonormal system.!

Example 5.3 Let us consider the system Ax = b with the following input:

A =

, b =

We can hopefully immediately see that the norm-minimal solution is given by x

= (0, 2)

T

and that the unavoidable error is (

2

  • 3

2 )

1 / 2. Now, we have

A

T A =

so that λ 1 = 4 and λ 2 = 0. Hence, A has the singular value σ 1 = 2. Corresponding

eigenvectors are v 1 = e 2 and v 2 = e 1

. This defines the other eigenvector u 1

1

σ 1

Av 1

T , and hence we can choose u 2 = e 2 and u 3 = e 3. Thus, we have

V =

, U = I =

Furthermore, c = U

T b = b and y 1 = c 1 /2 = 2, y 2 = 0, y 3 = 0, which gives x

= 2v 1

T .

There are efficient methods available to compute the singular value decomposition, like

the Golub-Reinsch bidiagonalisation algorithm. However, we will not discuss them here.

Instead, we will discuss an alternative decomposition, which will be useful in various ways.

6 The QR decomposition

The goal of this section is to prove, for every matrix A ∈ R

m×n , m ≥ n, the existence of a

factorisation of the form

A = QR,

where Q ∈ R

m×m is orthogonal and R ∈ R

m×n is an upper triangular matrix.

Definition 6.1 A matrix R ∈ R

m×n is called upper triangular if rij = 0 for all i > j, i.e.

if

R =

We will do this using elementary reflections.

Definition 6.2 A Householder matrix is a matrix of the form

H(w) = I − 2 ww

T ∈ R

m×m ,

where w ∈ R

m satisfies either ‖w‖ 2 = 1 or w = 0.

Note that:

  1. H(w) is symmetric.
  2. H(w)H(w) = I (check), so that H(w) is orthogonal. Hence, a product of House-

holder matrices is orthogonal.

  1. det(H(w)) = −1 for w )= 0.
  2. Storing H(w) only requires storing the m elements of w.
  3. The general form of a Householder matrix is given by

H(w) = I − 2

ww

T

w

T w

for an arbitrary vector w )= 0. Here, we find both the inner product w

T w ∈ R and

the outer product ww

T ∈ R

m×m .

such that ‖v‖ 2 = 2a

T w. Hence, the application of H(w) to a becomes

H(w)a = a − 2(a

T w)w = a − ‖v‖ 2

v

‖v‖ 2

= a − v = ‖a‖ 2 e 1

Note that we could also project onto −e 1 by using v = a + ‖a‖ 2 e 1

. In general, one should

pick the sign to avoid numerical cancellation, i.e. one should use v = a + sign(a 1 )‖a‖ 2 e 1.

The cost of a matrix-vector multiplication is usually O(m

2 ). Here, however, we can com-

pute the product between a Householder matrix and a vector in O(m) time. To compute

H(w)a = a − 2(a

T w)w, we first compute the number λ = a

T w in O(m) time, then multi-

ply this number with each component of w to form the vector λw and finally we subtract

twice this vector from the vector a. All of this can be done in O(m) time.

Theorem 6.4 For every A ∈ R

m×m , m ≥ n, there exist an orthogonal matrix Q ∈ R

m×m

and an upper triangular matrix R ∈ R

m×n such that A = QR.

Proof: Let a 1

∈ R

m be the first column of A. We find a first Householder matrix H 1

R

m×m such that

H

1

A =

α 1

. A

1

with an (m − 1) × (m − 1) matrix A 1. Considering the first column a˜ 2 ∈ R

m− 1 of this

matrix, we can find a second Householder matrix

H

2

∈ R

(m−1)×(m−1) such that

H

2 a˜ 2

α 2 e 1

∈ R

m− 1

. The matrix

H

2

H

2

is easily seen to be orthogonal and we derive

H

2

H

1

A =

α 1 ∗... ∗

0 α 2

. A 2

and we can proceed in this form to establish

H

n− 1

H

n− 2

· · · H

2

H

1

A = R,

where R is an upper triangular matrix. Since all H j are orthogonal so is there product and

also their inverse.!

Note that the proof is actually constructive and describes a numerical method for deriving

the QR factorisation of a matrix A.

Returning once again to our least-squares problem (6), we see that we can now produce a

solution by noting that

‖Ax − b‖

2

2

= ‖QRx − b‖

2

2

= ‖Rx − c‖

2

2

R

1

x −

c 1

c 2

2

= ‖R

1 x − c 1

2

2

  • ‖c 2

2

2

where we have set c = Q

T b ∈ R

m and where R 1 ∈ R

n×n denotes the upper part of the

upper triangular matrix R.

The solution to the least-squares problem is therefore given by the solution of

R

1 x = c 1

where R 1

∈ R

n×n is a square upper-triangular matrix. This system can be solved quite

easily, as we will discuss in the next section.

7 Back Substitution

We are dealing now with the following problem. We seek a solution x ∈ R

n of the equation

Rx = b, (7)

where b ∈ R

n is given and R ∈ R

n×n is an upper triangular matrix.

The system has a unique solution if the determinant of R is different from zero. Since

we have an upper triangular matrix, the determinant equals the product of all diagonal

entries. Hence, R is invertible if rii )= 0 for 1 ≤ i ≤ n.

Assuming this, we see that the last equation of (7) is simply given by

rnnxn = bn,

which resolves to x n = b n /r nn

. Next, the equation from row n − 1 is

rn− 1 ,n− 1 xn− 1 + rn− 1 ,nxn = bn− 1.

But we already know x n

. Hence, we can compute x n− 1 = (b n− 1 − r n− 1 ,n x n )/r n− 1 ,n− 1

The general algorithm for computing the solution in this situation is called back substitu-

tion.