Linear Algebra, Lecture Notes - Mathematics - 6, Study notes of Mathematics

Linear Squares, Solution Useful Properties of Matrix, Orthogonal Basis, QR factorization, Operation count and robustness of QR

Typology: Study notes

2010/2011

Uploaded on 09/07/2011

andreasge
andreasge 🇬🇧

4.2

(12)

236 documents

1 / 11

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
IB Paper 7: Linear Algebra Handout 6
Tom Hynes
7. Least Squares Solution of Ax = b and QR factorisation
Suppose we have carried out an experiment, in which
the parameter b has been measured at different times t,
b = 0.25 at t = 1
b = 1.0 at t = 0
b = 1.25 at t = 1
b = 3.50 at t = 2
and that we are seeking to fit a relationship to the
data:-
DtCb
+
=
or for a quadratic fit
2
b C Dt Et
= + +
where the constants C, D and E are to be found.
In matrix form the linear case is general case general quadratic case
1 1 0.25
1 0 1
1 1 1.25
1 2 3.5
C
D
=
1
2
1
2
1
1
... ..
.
1
.
.
m
m
b
b
C
D
t
b
t
t
=
2
1 1
1
2
2
2 2
2
1
1
...
... ... ...
1
m
m m
t t
b
C
b
t t D
E
b
t t
=
A x = b
These equations are obviously inconsistent and there is no way C and D (and E) can be found to
solve all of them. We need, instead, to find
x
which represents in some sense "the best fit".
as close as possible to
A x b
Now the number of columns in A is the number of arbitrary constants in the function used for the
fit, and the number of rows is the number of data points. For least squares problems, then, the
m × n matrix A usually has the following properties:-
(i) m > n (often m >> n)
(ii) the columns of A are independent. (rank of A is n.)
We will assume that (i) and (ii) hold.
b1 = 0.25
b
2
= 1
b3 = 1.25
b4 = 3.5
0
1
1
2
1
2
3
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Linear Algebra, Lecture Notes - Mathematics - 6 and more Study notes Mathematics in PDF only on Docsity!

IB Paper 7: Linear Algebra Handout 6

Tom Hynes

7. Least Squares Solution of Ax = b and QR factorisation

Suppose we have carried out an experiment, in which

the parameter b has been measured at different times t ,

b = 0.25 at t = − 1

b = 1.0 at t = 0

b = 1.25 at t = 1

b = 3.50 at t = 2

and that we are seeking to fit a relationship to the

data:-

b = C + Dt

or for a quadratic fit

2 b = C + Dt + Et

where the constants C , D and E are to be found.

In matrix form the linear case is general case general quadratic case

C
D

1

2

1

2

m m

b

C b

D

t

b

t

t

2

(^1 1 )

2

2 2 2

2

... ... ...^ ...

m m m

t t (^) b

C

t t b D

E

b t t

  ^ 

A x = b

These equations are obviously inconsistent and there is no way C and D (and E ) can be found to

solve all of them. We need, instead, to find x which represents in some sense "the best fit".

A x as close as possible to b

Now the number of columns in A is the number of arbitrary constants in the function used for the

fit, and the number of rows is the number of data points. For least squares problems, then, the

m × n matrix A usually has the following properties:-

(i) m > n (often m >> n )

(ii) the columns of A are independent. (rank of A is n .)

We will assume that (i) and (ii) hold.

b 1 = 0.

b 2

b 3 = 1.

b 4

The least squares solution for x (= x ) minimises

2 Axb = Axb. Axb

and this can be multiplied out and then partial differentiation used to find the minimum. A, perhaps

more intuitive way, is based on geometrical reasoning.

This starts by noting that

A x = 1 1 2 2 3 3

x a + x a + x a + ...

lies in column space, so the nearest point will be at

the end of the “perpendicular” dropped from b onto

column space.

We saw in sections 5.3 that column space and the

left nullspace of A were orthogonal complements.

i.e. that for any vector

col left

b = b + b

where 0 col left

b. b =

So we need to get rid of left

b and just concentrate

on col

b. We can do this by multiplying the original

problem by

t A.

t t A A x A b

col left

t t A b A b

The solution of this is x.

For the specific example described above

A Ax A b

t t = 

C
D

i.e.

C
D

 C = 1 and D = 1

Best fit is b = 1 + t

Were we lucky that AA

t turned out to be invertible/non-singular?

Column space

Left Null Space

b m

A x = b col

b left

orthogonal

A

t y = 0

for y in here

Left Null space of A is

the Null Space of A

T

y in left-null space

t A y 0

7.2 Orthogonal basis of Column Space - the Gram-Schmidt process.

The equation A Ax Ab

t t = is fine, but we have to do quite a lot of work to follow through with

this method when m and n are large (forming AA

t

alone takes ( )

2 2 m − 1 n operations, before we

even set about solving the equation).

The reason for multiplying by

t A is so that we

can remove the part of b that is not in the column

space of A. Another way of doing this is to

project b directly onto column space.

col

b = λ 1

a 1

  • λ 2

a 2

Finding the λ’s, however, is a major exercise.

Because the a ’s are not orthogonal, dotting with

a 1

, etc. doesn’t help,

a 1

. b = λ 1

a 1

. a 1

  • λ 2

a 1

. a 2

If we do this with all of the a ’s we will have a matrix to invert for the λ’s.

Think how much easier this would be if column space was aligned with our co-ordinate directions,

so that, i , j , k , l ... lay in column space (and the other co-ordinate base vectors m , n , ... lay in left-

null space).

We would then write

1 2 3

b = b i + b j + b k + ...

and simply strip off the ones outside column space. Moreover, if we didn’t have them already, we

would generate the coefficients by

1

b = i. b , 2

b = j. b etc.

The Gram-Schmidt procedure is a way of

generating a set of mutually orthogonal unit

vectors (orthogonal + unit = orthonormal) from

an arbitrary set. Armed with these, taking

projections is much easier.

col

b = α 1

q 1

  • α 2

q 2

And to find the α’s, we simply employ

q 1

. b = α 1

, etc

We start with a 1

, a 2

, ... , a n

, the columns of A and derive the q ’s as follows:-

b

b col

a 1

a 2

Column space

b

q 1

q 2

b col

  1. Turn the first one into a unit vector

1

1

1 a

a

q = q 1

is in column space

Remember, the notation means the "length" of an n -dimensional vector

2 2

2

2

1 n

d = d + d + ... + d , generalised in the obvious fashion.

  1. Take a 2

and form q 2

by first subtracting off the bit that's parallel to a 1

and then normalising

2 2 1 2 1

a a q. a q

2

2

2 a

a

q ~

Check

q 2

is in column space because it is a linear combination of a 2

and q 1

  1. Repeat this process for the other a 's.

3 3 1 3 1 2 3 2

a a q. a q q. a q

3

3

3 a

a

q ~

= etc.

Check

q 1 . a  3 = q 1 . a 3 − ( q 1 . a 3 ) q 1. q 1 − ( q 2. a 3 ) q 1 . q 2 = 0  q 1 . q 3 = 0

q 2. a  3 = q 2. a 3 − ( q 1 . a 3 ) q 2. q 1 − ( q 2. a 3 ) q 2. q 2 = 0  q 2. q 3 = 0

Note that, since the columns of A are independent, we never have k

a = 0. So this Gram-Schmidt

orthogonalisation process , will always furnish an orthonormal set of n vectors.

Any vector in the column space can, by definition, be written as a linear combination of the a 's and

so as a linear combination of the q 's. i.e. q 1

, q 2

, ... , q n

is an (orthonormal) basis for the column

space of A.

a 1

a 2

q 2

q 1

a 2

( q 1

.a 2

) q 1

1 2 1 2 (^1 2 ) 1 1

1 2

....

.

= − =

 =

q a q a q a q q 0

q q 0



7.3 QR factorisation of A

If we assemble the three vectors a 1

, a 2

, a 3

from the previous section as the columns of a matrix A ,

and vectors q 1

, q 2

, q 3

as those of a matrix Q , then we have

A =

and

Q

Then

i.e. We have constructed another matrix factorisation

A = Q R

The matrix Q has mutually orthogonal unit vectors and the matrix R is upper triangular.

Before writing down the general form of this factorisation (we have done it for a 3 × 3 one), we can

tidy up the relationship between the a 's and the q 's. You can see that a 3

for example satisfies

3 1 3 1 2 3 2 3

a q a q q a q a

1 3 1 2 3 2 3 3

q a q q a q a q

Taking the dot product with q 3

gives a neater formula for 3

a

3 3 3

q. a = a 

so that

3 1 3 1 2 3 2 3 3 3

a = q. a q + q. a q + q. a q

The general formula is clear

1 1 1 1

a = q. a q

2 1 2 1 2 2 2

a = q. a q + q. a q

3 1 3 1 2 3 2 3 3 3

a = q. a q + q. a q + q. a q

etc.

Has to be this.

's are like , and

, etc.

x y z

x

= + +

 =

q i j k

c i j k

i. c

Writing the Gram-Schmidt process as a relationship between matrices (see Section 2.6) :-

A =

n

a a a 1 2

and Q =

n

q q q 1 2

Then

A =

a (^) 1 a 2 a n

q (^) 1 q 2 q n

1 1 1 2 1 3 1

2 2 2 3 2

3 3 3

n

n

n

n n

q a q a q a q a

q a q a q a

q a q a

q a

i.e. A = Q R

For the general case A is m × n , Q is the same shape as A ( m × n ), R is n × n.

Works for any A , provided rank( A ) = n.

The columns of Q are mutually orthogonal vectors which span the column space of A.

The matrix R is square, upper triangular with non-zero elements down the diagonal. It therefore

has rank n and is invertible. See section 4.2 where we discussed this issue for L.

7.4 The Matrix Q

We met square matrices like Q in Part IA Maths and studied all of their properties. We have to be

careful here, because these matrices are in general rectangular with m > n. It is still true that

Q Q = I

t

since QQ

t

n

n

q q q

q

q

q

1 2

2

1

n n n n

n

n

q q q q q q

q q q q q q

q q q q q q

1 2

2 1 2 2 2

1 1 1 2 1

and the q 's are orthogonal unit vectors.

But this does not imply that Q

− 1 = Q

t

. These matrices are not square ( in general )

Q does not have an inverse ( in general )

Note also that QQI

t unless Q is square.

q i

. q j

= 0 ij

= 1 i = j

n × m (^) m × n

n × n

m × n n × m

m × m

A = Q R

Step 2

Solve R x Qb

t = by back-substitution.

C
D
 ^ ^ 
=  ^  =
  ^ 
 ^ 
C = 1, D = 1

7.6 Operation Count and Robustness of QR

QR factorisation is more costly than LU decomposition (the cost is primarily in the Gram-Schmidt

process). LU is thus preferable for solving sets of consistent equations. For inconsistent equations

(i.e. a genuine least-squares problem), QR is more cost effective than solving A Ax Ab

t t = by LU

decomposition. In addition, the matrix AA

t is often numerically poorly conditioned, so it is a not

a good idea to go via A Ax Ab

t t =.

The Gram-Schmidt process can, for large n , become ill-conditioned (you are finding the q 's by a

process of subtracting a large number of things and then normalising to unity). There are other

ways of finding a Q , but these are beyond the scope of this course.

7.7 Projection onto Column Space

In section 7.1, we showed that col

b , the projection of b onto the column space of A ,

satisfies

( )

1 t t

col

b = A A A A b

i.e. Pb = col

b where the projection matrix P is given by

( )

t

1 t P AAA A

=

This rather complicated expression for P , was the reason

that we developed the QR method.

b

q 1

q 2

b col

There are a number of other applications were it is useful to be able to easily project onto column

space, and the QR decomposition should give us a much simpler expression for this projection.

If we have performed the decomposition A = QR , then

1 t t t t

P = QR R Q QR R Q

1 t t t

= QR R R R Q

1 1 t t t

− − = QRR R R Q

i.e. P

t = QQ

This is as expected (!) since

b col = ( q b q 1. ) 1 + ( q 2 . b q ) 2 + ... +( q n. b ) q n

col

b =

q b

q b

q b

q q q

n

n

2

1

1 2

b

q

q

q

q q q

n

n ... ... ...

...

2

1

1 2

= QQb

t

Key Points from Lecture 6

QR Decomposition

A = QR , where Q us the same shape as A and the columns of Q are orthonormal, and R is

square, upper-triangular and invertible. When m = n and so all matrices are square, Q is an

orthogonal matrix.

Least squares solution of Ax = b using QR

Solve R x Qb

t = by back-substitution.

N.B. Q

t

Q = I P = QQ

t

≠≠≠≠ I

You can now do Examples Paper 7/8 Q1-

R is square and upper triangular

so

1 t 1 and

− − R R exist

T T T T

1 1 2 2 1 1 2 2 3 3

or.. .... ... col n n n n

b = q q b + q q b + + q q b = q q + q q + q q + + q q b