





Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Least Square Approximation, Singular Value Decomposition, QR decomposition , Back substitution
Typology: Study notes
1 / 9
This page cannot be seen from the preview
Don't miss anything!






We will first deal with the problem of solving an overdetermined system of the form
Ax = b, (5)
where A ∈ R
m×n and b ∈ R
m are given and m > n. The latter means that we have more
equations than unknowns, which makes the system overdetermined. Hence, instead, we
may try to change the problem to solving
min
x∈R n
‖Ax − b‖ 2
For the time being, we will assume that the rank of the matrix rank(A) = dim{Ax : x ∈
n } is rank(A) = n, i.e. the matrix has full rank.
Then, to solve (6) we can look for critical points of the function
f (x) = ‖Ax − b‖
2
2
= x
T A
T Ax − 2 b
T Ax + b
T b.
Setting the gradient to zero yields
0 = ∇f (x) = 2A
T Ax − 2 A
T b = 0.
Since rank(A) = n, we also have rank(A
T A) = n (as we will prove later), which means
that the matrix A
T A is invertible. Thus we have in principle the following result.
Theorem 4.1 (Normal Equations) The solution of the least-squares problem (6) with
a matrix A ∈ R
m×n , which has full rank, is the unique solution of the Normal Equations
T A)x = A
T b.
Proof: The solution to the normal equations is indeed a critical point. Furthermore, since
the Hessian of the function f is given by the positive definite matrix A
T A, we have a local
minimum, which is actually a global one, since f is a convex function.!
The normal equations are usually not a good way of computing the solution to the least-
squares problem since the involved matrix A
T A is usually badly conditioned.
We are now going to investigate two better ways of computing the solution to the least-
squares problem. To this end, the following result concerning orthogonal matrices is ex-
tremely helpful.
Definition 4.2 A matrix Q ∈ R
n×n is called orthogonal if Q
T Q = I.
Note that an orthogonal matrix is always invertible and that its rows/columns form an
orthonormal basis of R
n .
Proposition 4.3 Let Q ∈ R
n×n be orthogonal and A ∈ R
n×n then ‖Qx‖ 2 = ‖x‖ 2 for all
x ∈ R
n and ‖QA‖ 2 = ‖A‖ 2.
Proof: We have
‖Qx‖
2
2 = (Qx)
T (Qx) = x
T Q
T Qx = x
T x = ‖x‖
2
2
Similarly,
‖QA‖ 2 = sup
‖x‖ 2 =
‖QAx‖ 2 = sup
‖x‖ 2 =
‖Ax‖ 2 = ‖A‖ 2.
One way of interpreting the singular value decomposition, which we want to discuss now,
is to find new orthonormal basis vectors in R
m and R
n , which allow to rewrite the linear
mapping A : R
n → R
m in a geometrically simpler form.
To understand it, we need to recall a few facts from linear algebra. We already introduced
the rank of a matrix. We also need the null space
ker(A) = {x ∈ R
n : Ax = 0}
and the intrinsic relations
n = dim ker(A) + rank(A), rank(A) = rank(A
T ).
Note that an element x ∈ ker(A) satisfies Ax = 0 and hence also A
T Ax = 0, i.e. it belongs
to ker(A
T A). If, on the other hand x ∈ ker(A
T A) then A
T Ax = 0, thus 0 = x
T (A
T Ax) =
‖Ax‖
2 2 and hence x ∈ ker(A). This means
ker(A) = ker(A
T A).
However, the dimension formula above immediately gives
rank(A) = rank(A
T ) = rank(A
T A) = rank(AA
T ),
This means that the symmetric and positive semi-definite matrices A
T A ∈ R
n×n and
T ∈ R
m×m have the same rank.
As before, we choose for A
T A a system of eigenvalues λ 1 ≥ λ 2 ≥... ≥ λ n ≥ 0 and
associated eigenvectors v 1 ,... , v n
n satisfying A
T Av j = λ j v j and v
T
j
v k = δ jk
. Exactly
the first r = rank(A) eigenvalues are positive and the remaining ones are zero.
According to the considerations we just made, also the matrix AA
T must have exactly r
positive eigenvalues. We will now see that, interestingly, the positive eigenvalues of AA
T
Theorem 5.2 The general solution to the least-squares problem (6) is given by
x =
r ∑
j=
cj
σ j
v j
m ∑
j=r+
α j v j =: x
m ∑
j=r+
α j v j
Here, σ j denotes the singular values of the matrix A and v j are the columns of the matrix V
from the singular value decomposition. Moreover, c = U
T b, and the α j are free to choose.
From all possible solutions, the solution x
is the solution having minimal Euclidean norm.
Proof: The representation of the general form follows from x = V y. The solution x
has
minimal norm, since we have for a general solution that
‖x‖
2
2
= ‖x
‖
2
2
m ∑
j=r+
α
2
j
≥ ‖x
‖
2
2
using the fact that the v j form an orthonormal system.!
Example 5.3 Let us consider the system Ax = b with the following input:
, b =
We can hopefully immediately see that the norm-minimal solution is given by x
= (0, 2)
T
and that the unavoidable error is (
2
2 )
1 / 2. Now, we have
T A =
so that λ 1 = 4 and λ 2 = 0. Hence, A has the singular value σ 1 = 2. Corresponding
eigenvectors are v 1 = e 2 and v 2 = e 1
. This defines the other eigenvector u 1
1
σ 1
Av 1
T , and hence we can choose u 2 = e 2 and u 3 = e 3. Thus, we have
Furthermore, c = U
T b = b and y 1 = c 1 /2 = 2, y 2 = 0, y 3 = 0, which gives x
= 2v 1
T .
There are efficient methods available to compute the singular value decomposition, like
the Golub-Reinsch bidiagonalisation algorithm. However, we will not discuss them here.
Instead, we will discuss an alternative decomposition, which will be useful in various ways.
The goal of this section is to prove, for every matrix A ∈ R
m×n , m ≥ n, the existence of a
factorisation of the form
where Q ∈ R
m×m is orthogonal and R ∈ R
m×n is an upper triangular matrix.
Definition 6.1 A matrix R ∈ R
m×n is called upper triangular if rij = 0 for all i > j, i.e.
if
We will do this using elementary reflections.
Definition 6.2 A Householder matrix is a matrix of the form
H(w) = I − 2 ww
T ∈ R
m×m ,
where w ∈ R
m satisfies either ‖w‖ 2 = 1 or w = 0.
Note that:
holder matrices is orthogonal.
H(w) = I − 2
ww
T
w
T w
for an arbitrary vector w )= 0. Here, we find both the inner product w
T w ∈ R and
the outer product ww
T ∈ R
m×m .
such that ‖v‖ 2 = 2a
T w. Hence, the application of H(w) to a becomes
H(w)a = a − 2(a
T w)w = a − ‖v‖ 2
v
‖v‖ 2
= a − v = ‖a‖ 2 e 1
Note that we could also project onto −e 1 by using v = a + ‖a‖ 2 e 1
. In general, one should
pick the sign to avoid numerical cancellation, i.e. one should use v = a + sign(a 1 )‖a‖ 2 e 1.
The cost of a matrix-vector multiplication is usually O(m
2 ). Here, however, we can com-
pute the product between a Householder matrix and a vector in O(m) time. To compute
H(w)a = a − 2(a
T w)w, we first compute the number λ = a
T w in O(m) time, then multi-
ply this number with each component of w to form the vector λw and finally we subtract
twice this vector from the vector a. All of this can be done in O(m) time.
Theorem 6.4 For every A ∈ R
m×m , m ≥ n, there exist an orthogonal matrix Q ∈ R
m×m
and an upper triangular matrix R ∈ R
m×n such that A = QR.
Proof: Let a 1
m be the first column of A. We find a first Householder matrix H 1
m×m such that
1
α 1
1
with an (m − 1) × (m − 1) matrix A 1. Considering the first column a˜ 2 ∈ R
m− 1 of this
matrix, we can find a second Householder matrix
2
(m−1)×(m−1) such that
2 a˜ 2
α 2 e 1
m− 1
. The matrix
2
2
is easily seen to be orthogonal and we derive
2
1
α 1 ∗... ∗
0 α 2
and we can proceed in this form to establish
n− 1
n− 2
2
1
where R is an upper triangular matrix. Since all H j are orthogonal so is there product and
also their inverse.!
Note that the proof is actually constructive and describes a numerical method for deriving
the QR factorisation of a matrix A.
Returning once again to our least-squares problem (6), we see that we can now produce a
solution by noting that
‖Ax − b‖
2
2
= ‖QRx − b‖
2
2
= ‖Rx − c‖
2
2
1
x −
c 1
c 2
2
1 x − c 1
2
2
2
2
where we have set c = Q
T b ∈ R
m and where R 1 ∈ R
n×n denotes the upper part of the
upper triangular matrix R.
The solution to the least-squares problem is therefore given by the solution of
1 x = c 1
where R 1
n×n is a square upper-triangular matrix. This system can be solved quite
easily, as we will discuss in the next section.
We are dealing now with the following problem. We seek a solution x ∈ R
n of the equation
Rx = b, (7)
where b ∈ R
n is given and R ∈ R
n×n is an upper triangular matrix.
The system has a unique solution if the determinant of R is different from zero. Since
we have an upper triangular matrix, the determinant equals the product of all diagonal
entries. Hence, R is invertible if rii )= 0 for 1 ≤ i ≤ n.
Assuming this, we see that the last equation of (7) is simply given by
rnnxn = bn,
which resolves to x n = b n /r nn
. Next, the equation from row n − 1 is
rn− 1 ,n− 1 xn− 1 + rn− 1 ,nxn = bn− 1.
But we already know x n
. Hence, we can compute x n− 1 = (b n− 1 − r n− 1 ,n x n )/r n− 1 ,n− 1
The general algorithm for computing the solution in this situation is called back substitu-
tion.