





Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Introduction, Computational Cost, Norms Stability and condition number
Typology: Study notes
1 / 9
This page cannot be seen from the preview
Don't miss anything!






1 Introduction
In this course, we are concerned with the following two questions:
When A is 3 × 3, this should be o.k. But what if A is 10^6 × 106 ???? To be more precise, we have to find algorithms to solve these problems and we have to analyse these algorithms. For example, we need to know :
We will use the notation, A ∈ Rm×n, this is a m × n (m rows, n columns) matrix with real valued entries,
A := {aij } :=
a 11 a 12 · · · a 1 n a 21 a 22 · · · a 2 n · · am 1 am 2 · · · amn
We will also denote the columns of a matrix A ∈ Rm×n^ by aj , j = 1, 2 ,... , n. Hence, we have A :=
a 1 a 2 · · · an
2 Computational Cost
Before developing algorithms to solve linear equations, we will introduce concepts to analyse the cost of such algorithms and their stability. The multiplication of a matrix A ∈ Rm×n^ with a vector x ∈ Rn^ results into a vector b = Ax ∈ Rm^ with components
bi = (Ax)i =
∑^ n
j=
aij xj , 1 ≤ i ≤ m.
space. When developing algorithms it is important to consider both resources, time and space, and it might sometimes be necessary to sacrifice something of one resource to gain in the other. It is now easy to see that for the matrix-matrix multiplication C = AB, we would require
time(Ax) = O(mnp).
Hence, for square systems with m = n = p, doubling the input size results in eight times longer computations! Finally, let us see how additional information can be used to reduce the cost. If, for example, we have a tridiagonal matrix, i.e. a matrix which has only nonzero entries on the diagonal and the sub- and super-diagonal, i.e. which is of the form
where ∗ denotes the only non-zero entries, then a matrix-vector multiplication reduces to
(Ax)i =
∑^ n
j=
aij xj = ai,i− 1 xi− 1 + aiixi + ai,i+1xi+1.
Hence the time for the full matrix-vector product is now O(n). Also the matrix can be stored by exploiting its special form.
3 Norms, Stability, Condition Number
We will still stick to the example of matrix-vector multiplication. This time, however, our focus will be on what happens if our given data is inexact, for example, if it contains noise from measurements or if the representation of the real numbers on the computer causes rounding errors. For simplicity, we will still assume that the matrix A ∈ Rm×n^ is exactly given but that the vector x is given as something of the form x + ∆x, where ∆x is a vector with hopefully small entries representing the error. Hence, instead of computing b := Ax we are computing (b + ∆b) = A(x + ∆x) and we have to analyse the error ∆b in the output. Before we can do this, we have to introduce a theoretical machinery, which helps us in doing this.
Definition 2 Let V be a vector space. A function ‖ · ‖ : V → R is called a norm on V if for all x, y ∈ V and α ∈ R we have that
A unit vector with respect to ‖ · ‖ is a vector x ∈ V such that ‖x‖ = 1.
Here is a list of the most important vector norms for x ∈ Rn
∑n i=1 |xi|,
∑n i=1 |xi|
∑n i=1 |xi| p)^1 /p, for 1 ≤ p < ∞,
Example 3 The unit ball with respect to the ‖ · ‖p norm is defined to be the set Bp := {x : ‖x‖p ≤ 1 }. For R^2 the next figure shows B 1 , B 2 , B∞.
A norm allows us to compute the “length” of a vector. Though we are given different norms on Rn, for example, the following remark shows that these do not differ too much.
Remark 4 On Rn^ all norms are equivalent, i.e. for two norms ‖ · ‖ and ‖ · ‖∗ on Rn, there are two constants c 1 , c 2 > 0 such that
c 1 ‖x‖ ≤ ‖x‖∗ ≤ c 2 ‖x‖, x ∈ Rn.
For example,
‖x‖ 2 ≤ ‖x‖ 1 ≤
n‖x‖ 2 ‖x‖∞ ≤ ‖x‖ 2 ≤
n‖x‖∞ ‖x‖∞ ≤ ‖x‖ 1 ≤ n‖x‖∞.
Theorem 6 The induced matrix norm is indeed a norm.
In many cases, we will suppress index notation and simply also write ‖A‖, when the chosen vector-norm is not relevant. Coming back to our example, we immediately see that the constant c 2 can be chosen to be c 2 = ‖A‖. For the constant c 1 note that
‖x‖ = ‖A−^1 Ax‖ ≤ ‖A−^1 ‖‖Ax‖.
Hence, c 1 can be chosen to be c 1 = 1/‖A‖. Thus (1) becomes
‖∆b‖ ‖b‖
‖∆x‖ ‖x‖
Definition 7 The number κ(A) := ‖A‖‖A−^1 ‖ is called the condition number of the matrix A.
Hence, our conclusion so far is that a large condition number of a matrix A is extremely problematic when it comes to propagating input or rounding errors. The induced norms, we are mostly interested in, are given by the standard ℓp vector norms defined previously. Hence, for 1 ≤ p ≤ ∞ we define the matrix induced p-norms by
‖A‖p = sup x 6 =
‖Ax‖p ‖x‖p
This expression is of course only of limited use if we cannot really compute the matrix norms explicitly. Fortunately, this is in the most important cases possible. To state it we need to remember a few things from linear algebra.
Definition 8 A square matrix A ∈ Rn×n^ is called symmetric if AT^ = A, i.e. if aij = aji. The matrix is called positive semi-definite if for all x ∈ Rn, we have
xT^ Ax =
∑^ n
i=
∑^ n
j=
xixj aij ≥ 0.
It is called positive definite if the above expression is positive for all x 6 = 0.
If a matrix A ∈ Rn×n^ is symmetric and positive semi-definite then there there is an orthonormal basis of Rn^ consisting of eigenvectors of A. This means that there are real values λ 1 ,... , λn and vectors z 1 ,... , zn, which satisfy
Azj = λj zj , zTj zk = δjk.
Finally, for a matrix A ∈ Rm×n, the matrix AT^ A ∈ Rn×n^ is positive semi-definite, simply because of xT^ (AT^ A)x = (Ax)T^ (Ax) = ‖Ax‖^22 ≥ 0.
Theorem 9 Let A ∈ Rn^ and let λ 1 > 0 be the largest eigenvalue of AT^ A. Then
‖A‖ 1 = max 1 ≤j≤n
∑m
i=
|aij |, (2)
λ 1 , (3)
‖A‖∞ = max 1 ≤i≤m
∑n
j=
|aij |. (4)
Proof: Consider the 1-norm of a matrix A ∈ Rm×n^ , let x ∈ Rn, then
‖Ax‖ 1 =
∑^ m
i=
|(Ax)i| =
∑^ m
i=
∑^ n
j=
aij xj
∑^ m
i=
∑^ n
j=
|aij ||xj |
∑^ n
j=
( (^) m ∑
i=
|aij |
|xj |
max 1 ≤j≤n
∑m
i=
|aij |
) (^) n ∑
j=
|xj |
=: C‖x‖ 1 ,
showing that ‖A‖ 1 ≤ C. Next, let us pick an index k such that C =
∑n i=1 |aik|^ and x = ek the kth unit vector. Then, we have ‖x‖ 1 = 1 and
‖Ax‖ 1 = ‖Aek‖ 1 =
∑^ m
i=
|(Aek)i| =
∑^ m
i=
|aik| = C,
which shows ‖A‖ 1 = C. In the case of the 2-norm of a matrix A ∈ Rm×n, we have
‖Ax‖^22 = xT^ AT^ Ax.
Since, AT^ A is symmetric and positive semi-definite there are n orthonormal eigenvectors, {zi}ni=1 corresponding to the real, non-negative eigenvalues λ 1 ≥ λ 2 ≥... ≥ λn ≥ 0. Let x = α 1 z 1 + · · · + αnzn so that ‖x‖^22 = xT^ x = |α 1 |^2 + · · · + |αn|^2. Furthermore,
xT^ AT^ Ax = λ 1 |α 1 |^2 + · · · + λn|αn|^2 ≤ λ 1 xT^ x.
Hence, ‖A‖^22 ≤ λ 1. Finally, choosing x = z 1 gives equality. Therefore, ‖A‖^22 = λ 1. The case of the ∞-norm is an exercise.