






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Linear Squares, Solution Useful Properties of Matrix, Orthogonal Basis, QR factorization, Operation count and robustness of QR
Typology: Study notes
1 / 11
This page cannot be seen from the preview
Don't miss anything!







Tom Hynes
Suppose we have carried out an experiment, in which
the parameter b has been measured at different times t ,
b = 0.25 at t = − 1
b = 1.0 at t = 0
b = 1.25 at t = 1
b = 3.50 at t = 2
and that we are seeking to fit a relationship to the
data:-
b = C + Dt
or for a quadratic fit
2 b = C + Dt + Et
where the constants C , D and E are to be found.
In matrix form the linear case is general case general quadratic case
1
2
1
2
m m
b
C b
t
b
t
t
2
(^1 1 )
2
2 2 2
2
m m m
t t (^) b
t t b D
b t t
A x = b
These equations are obviously inconsistent and there is no way C and D (and E ) can be found to
solve all of them. We need, instead, to find x which represents in some sense "the best fit".
A x as close as possible to b
Now the number of columns in A is the number of arbitrary constants in the function used for the
fit, and the number of rows is the number of data points. For least squares problems, then, the
m × n matrix A usually has the following properties:-
(i) m > n (often m >> n )
(ii) the columns of A are independent. (rank of A is n .)
We will assume that (i) and (ii) hold.
b 1 = 0.
b 2
b 3 = 1.
b 4
The least squares solution for x (= x ) minimises
2 Ax − b = Ax − b. Ax − b
and this can be multiplied out and then partial differentiation used to find the minimum. A, perhaps
more intuitive way, is based on geometrical reasoning.
This starts by noting that
A x = 1 1 2 2 3 3
x a + x a + x a + ...
lies in column space, so the nearest point will be at
the end of the “perpendicular” dropped from b onto
column space.
We saw in sections 5.3 that column space and the
left nullspace of A were orthogonal complements.
i.e. that for any vector
col left
b = b + b
where 0 col left
b. b =
So we need to get rid of left
b and just concentrate
on col
b. We can do this by multiplying the original
problem by
t A.
t t A A x A b
col left
t t A b A b
The solution of this is x.
For the specific example described above
A Ax A b
t t =
i.e.
C = 1 and D = 1
Best fit is b = 1 + t
Were we lucky that AA
t turned out to be invertible/non-singular?
Column space
Left Null Space
b m
A x = b col
b left
orthogonal
t y = 0
for y in here
Left Null space of A is
the Null Space of A
T
y in left-null space
t A y 0
The equation A Ax Ab
t t = is fine, but we have to do quite a lot of work to follow through with
this method when m and n are large (forming AA
t
2 2 m − 1 n operations, before we
even set about solving the equation).
The reason for multiplying by
t A is so that we
can remove the part of b that is not in the column
space of A. Another way of doing this is to
project b directly onto column space.
col
b = λ 1
a 1
a 2
Finding the λ’s, however, is a major exercise.
Because the a ’s are not orthogonal, dotting with
a 1
, etc. doesn’t help,
a 1
. b = λ 1
a 1
. a 1
a 1
. a 2
If we do this with all of the a ’s we will have a matrix to invert for the λ’s.
Think how much easier this would be if column space was aligned with our co-ordinate directions,
so that, i , j , k , l ... lay in column space (and the other co-ordinate base vectors m , n , ... lay in left-
null space).
We would then write
1 2 3
b = b i + b j + b k + ...
and simply strip off the ones outside column space. Moreover, if we didn’t have them already, we
would generate the coefficients by
1
b = i. b , 2
b = j. b etc.
The Gram-Schmidt procedure is a way of
generating a set of mutually orthogonal unit
vectors (orthogonal + unit = orthonormal) from
an arbitrary set. Armed with these, taking
projections is much easier.
col
b = α 1
q 1
q 2
And to find the α’s, we simply employ
q 1
. b = α 1
, etc
We start with a 1
, a 2
, ... , a n
, the columns of A and derive the q ’s as follows:-
b
b col
a 1
a 2
Column space
b
q 1
q 2
b col
1
1
1 a
a
q = q 1
is in column space
Remember, the notation means the "length" of an n -dimensional vector
2 2
2
2
1 n
d = d + d + ... + d , generalised in the obvious fashion.
and form q 2
by first subtracting off the bit that's parallel to a 1
and then normalising
2 2 1 2 1
a a q. a q
2
2
2 a
a
q ~
Check
q 2
is in column space because it is a linear combination of a 2
and q 1
3 3 1 3 1 2 3 2
a a q. a q q. a q
3
3
3 a
a
q ~
= etc.
Check
Note that, since the columns of A are independent, we never have k
a = 0. So this Gram-Schmidt
orthogonalisation process , will always furnish an orthonormal set of n vectors.
Any vector in the column space can, by definition, be written as a linear combination of the a 's and
so as a linear combination of the q 's. i.e. q 1
, q 2
, ... , q n
is an (orthonormal) basis for the column
space of A.
a 1
a 2
q 2
q 1
a 2
( q 1
.a 2
) q 1
1 2
....
.
= − =
=
q a q a q a q q 0
q q 0
If we assemble the three vectors a 1
, a 2
, a 3
from the previous section as the columns of a matrix A ,
and vectors q 1
, q 2
, q 3
as those of a matrix Q , then we have
and
Then
i.e. We have constructed another matrix factorisation
The matrix Q has mutually orthogonal unit vectors and the matrix R is upper triangular.
Before writing down the general form of this factorisation (we have done it for a 3 × 3 one), we can
tidy up the relationship between the a 's and the q 's. You can see that a 3
for example satisfies
3 1 3 1 2 3 2 3
a q a q q a q a
1 3 1 2 3 2 3 3
q a q q a q a q
Taking the dot product with q 3
gives a neater formula for 3
a
3 3 3
q. a = a
so that
3 1 3 1 2 3 2 3 3 3
a = q. a q + q. a q + q. a q
The general formula is clear
1 1 1 1
a = q. a q
2 1 2 1 2 2 2
a = q. a q + q. a q
3 1 3 1 2 3 2 3 3 3
a = q. a q + q. a q + q. a q
etc.
Has to be this.
's are like , and
, etc.
x y z
x
= + +
=
q i j k
c i j k
i. c
Writing the Gram-Schmidt process as a relationship between matrices (see Section 2.6) :-
n
a a a 1 2
and Q =
n
q q q 1 2
Then
a (^) 1 a 2 a n
q (^) 1 q 2 q n
1 1 1 2 1 3 1
2 2 2 3 2
3 3 3
n
n
n
n n
q a q a q a q a
q a q a q a
q a q a
q a
i.e. A = Q R
For the general case A is m × n , Q is the same shape as A ( m × n ), R is n × n.
Works for any A , provided rank( A ) = n.
The columns of Q are mutually orthogonal vectors which span the column space of A.
The matrix R is square, upper triangular with non-zero elements down the diagonal. It therefore
has rank n and is invertible. See section 4.2 where we discussed this issue for L.
We met square matrices like Q in Part IA Maths and studied all of their properties. We have to be
careful here, because these matrices are in general rectangular with m > n. It is still true that
t
since QQ
n
n
q q q
q
q
q
1 2
2
1
n n n n
n
n
q q q q q q
q q q q q q
q q q q q q
1 2
2 1 2 2 2
1 1 1 2 1
and the q 's are orthogonal unit vectors.
But this does not imply that Q
− 1 = Q
t
. These matrices are not square ( in general )
Q does not have an inverse ( in general )
Note also that QQ ≠ I
t unless Q is square.
q i
. q j
= 0 i ≠ j
= 1 i = j
n × m (^) m × n
n × n
m × n n × m
m × m
Step 2
Solve R x Qb
t = by back-substitution.
QR factorisation is more costly than LU decomposition (the cost is primarily in the Gram-Schmidt
process). LU is thus preferable for solving sets of consistent equations. For inconsistent equations
(i.e. a genuine least-squares problem), QR is more cost effective than solving A Ax Ab
t t = by LU
decomposition. In addition, the matrix AA
t is often numerically poorly conditioned, so it is a not
a good idea to go via A Ax Ab
t t =.
The Gram-Schmidt process can, for large n , become ill-conditioned (you are finding the q 's by a
process of subtracting a large number of things and then normalising to unity). There are other
ways of finding a Q , but these are beyond the scope of this course.
In section 7.1, we showed that col
b , the projection of b onto the column space of A ,
satisfies
( )
1 t t
col
−
b = A A A A b
i.e. Pb = col
b where the projection matrix P is given by
( )
t
1 t P AAA A
−
=
This rather complicated expression for P , was the reason
that we developed the QR method.
b
q 1
q 2
b col
There are a number of other applications were it is useful to be able to easily project onto column
space, and the QR decomposition should give us a much simpler expression for this projection.
If we have performed the decomposition A = QR , then
1 t t t t
−
P = QR R Q QR R Q
1 t t t
−
= QR R R R Q
1 1 t t t
− − = QRR R R Q
i.e. P
t = QQ
This is as expected (!) since
col
b =
q b
q b
q b
q q q
n
n
2
1
1 2
b
q
q
q
q q q
n
n ... ... ...
...
2
1
1 2
= QQb
t
A = QR , where Q us the same shape as A and the columns of Q are orthonormal, and R is
square, upper-triangular and invertible. When m = n and so all matrices are square, Q is an
orthogonal matrix.
Solve R x Qb
t = by back-substitution.
t
t
R is square and upper triangular
so
1 t 1 and
− − R R exist
T T T T
1 1 2 2 1 1 2 2 3 3
or.. .... ... col n n n n
b = q q b + q q b + + q q b = q q + q q + q q + + q q b