Numerical Linear Algebra, Lecture Notes - Mathematics - 1, Study notes of Mathematical Methods for Numerical Analysis and Optimization

Introduction, Computational Cost, Norms Stability and condition number

Typology: Study notes

2010/2011

Uploaded on 09/09/2011

luber-1
luber-1 🇬🇧

4.8

(12)

293 documents

1 / 9

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Numerical Linear Algebra
Holger Wendland
1
pf3
pf4
pf5
pf8
pf9

Partial preview of the text

Download Numerical Linear Algebra, Lecture Notes - Mathematics - 1 and more Study notes Mathematical Methods for Numerical Analysis and Optimization in PDF only on Docsity!

Numerical Linear Algebra

Holger Wendland

1 Introduction

In this course, we are concerned with the following two questions:

  1. How do we find/approximate the solution, x, of the linear system of equations Ax = b?
  2. How do we find/approximate the eigenvalue, λ, and eigenvector, x, of the matrix A, i.e. Ax = λx?

When A is 3 × 3, this should be o.k. But what if A is 10^6 × 106 ???? To be more precise, we have to find algorithms to solve these problems and we have to analyse these algorithms. For example, we need to know :

  1. How expensive is the algorithm? How much time (and space) does it require to solve the problem?
  2. How stable is the algorithm? If we slightly change the input, i.e. the matrix A and/or the right-hand side b, how does the solution change?
  3. Can we exploit the structure of the matrix A, if it has a special structure?

We will use the notation, A ∈ Rm×n, this is a m × n (m rows, n columns) matrix with real valued entries,

A := {aij } :=

a 11 a 12 · · · a 1 n a 21 a 22 · · · a 2 n · · am 1 am 2 · · · amn

We will also denote the columns of a matrix A ∈ Rm×n^ by aj , j = 1, 2 ,... , n. Hence, we have A :=

a 1 a 2 · · · an

2 Computational Cost

Before developing algorithms to solve linear equations, we will introduce concepts to analyse the cost of such algorithms and their stability. The multiplication of a matrix A ∈ Rm×n^ with a vector x ∈ Rn^ results into a vector b = Ax ∈ Rm^ with components

bi = (Ax)i =

∑^ n

j=

aij xj , 1 ≤ i ≤ m.

space. When developing algorithms it is important to consider both resources, time and space, and it might sometimes be necessary to sacrifice something of one resource to gain in the other. It is now easy to see that for the matrix-matrix multiplication C = AB, we would require

time(Ax) = O(mnp).

Hence, for square systems with m = n = p, doubling the input size results in eight times longer computations! Finally, let us see how additional information can be used to reduce the cost. If, for example, we have a tridiagonal matrix, i.e. a matrix which has only nonzero entries on the diagonal and the sub- and super-diagonal, i.e. which is of the form

A =

where ∗ denotes the only non-zero entries, then a matrix-vector multiplication reduces to

(Ax)i =

∑^ n

j=

aij xj = ai,i− 1 xi− 1 + aiixi + ai,i+1xi+1.

Hence the time for the full matrix-vector product is now O(n). Also the matrix can be stored by exploiting its special form.

3 Norms, Stability, Condition Number

We will still stick to the example of matrix-vector multiplication. This time, however, our focus will be on what happens if our given data is inexact, for example, if it contains noise from measurements or if the representation of the real numbers on the computer causes rounding errors. For simplicity, we will still assume that the matrix A ∈ Rm×n^ is exactly given but that the vector x is given as something of the form x + ∆x, where ∆x is a vector with hopefully small entries representing the error. Hence, instead of computing b := Ax we are computing (b + ∆b) = A(x + ∆x) and we have to analyse the error ∆b in the output. Before we can do this, we have to introduce a theoretical machinery, which helps us in doing this.

Definition 2 Let V be a vector space. A function ‖ · ‖ : V → R is called a norm on V if for all x, y ∈ V and α ∈ R we have that

  1. ‖x‖ ≥ 0 , and ‖x‖ = 0 if and only if x = 0 ,
  1. ‖x + y‖ ≤ ‖x‖ + ‖y‖, triangle inequality.
  2. ‖αx‖ = |α|‖x‖.

A unit vector with respect to ‖ · ‖ is a vector x ∈ V such that ‖x‖ = 1.

Here is a list of the most important vector norms for x ∈ Rn

  1. ‖x‖ 1 :=

∑n i=1 |xi|,

  1. ‖x‖ 2 := (

∑n i=1 |xi|

2 )^1 /^2 ,

  1. ‖x‖p := (

∑n i=1 |xi| p)^1 /p, for 1 ≤ p < ∞,

  1. ‖x‖∞ := max 1 ≤i≤n |xi|,

Example 3 The unit ball with respect to the ‖ · ‖p norm is defined to be the set Bp := {x : ‖x‖p ≤ 1 }. For R^2 the next figure shows B 1 , B 2 , B∞.

B∞

B 2

B 1

A norm allows us to compute the “length” of a vector. Though we are given different norms on Rn, for example, the following remark shows that these do not differ too much.

Remark 4 On Rn^ all norms are equivalent, i.e. for two norms ‖ · ‖ and ‖ · ‖∗ on Rn, there are two constants c 1 , c 2 > 0 such that

c 1 ‖x‖ ≤ ‖x‖∗ ≤ c 2 ‖x‖, x ∈ Rn.

For example,

‖x‖ 2 ≤ ‖x‖ 1 ≤

n‖x‖ 2 ‖x‖∞ ≤ ‖x‖ 2 ≤

n‖x‖∞ ‖x‖∞ ≤ ‖x‖ 1 ≤ n‖x‖∞.

Theorem 6 The induced matrix norm is indeed a norm.

In many cases, we will suppress index notation and simply also write ‖A‖, when the chosen vector-norm is not relevant. Coming back to our example, we immediately see that the constant c 2 can be chosen to be c 2 = ‖A‖. For the constant c 1 note that

‖x‖ = ‖A−^1 Ax‖ ≤ ‖A−^1 ‖‖Ax‖.

Hence, c 1 can be chosen to be c 1 = 1/‖A‖. Thus (1) becomes

‖∆b‖ ‖b‖

≤ ‖A‖‖A−^1 ‖

‖∆x‖ ‖x‖

Definition 7 The number κ(A) := ‖A‖‖A−^1 ‖ is called the condition number of the matrix A.

Hence, our conclusion so far is that a large condition number of a matrix A is extremely problematic when it comes to propagating input or rounding errors. The induced norms, we are mostly interested in, are given by the standard ℓp vector norms defined previously. Hence, for 1 ≤ p ≤ ∞ we define the matrix induced p-norms by

‖A‖p = sup x 6 =

‖Ax‖p ‖x‖p

This expression is of course only of limited use if we cannot really compute the matrix norms explicitly. Fortunately, this is in the most important cases possible. To state it we need to remember a few things from linear algebra.

Definition 8 A square matrix A ∈ Rn×n^ is called symmetric if AT^ = A, i.e. if aij = aji. The matrix is called positive semi-definite if for all x ∈ Rn, we have

xT^ Ax =

∑^ n

i=

∑^ n

j=

xixj aij ≥ 0.

It is called positive definite if the above expression is positive for all x 6 = 0.

If a matrix A ∈ Rn×n^ is symmetric and positive semi-definite then there there is an orthonormal basis of Rn^ consisting of eigenvectors of A. This means that there are real values λ 1 ,... , λn and vectors z 1 ,... , zn, which satisfy

Azj = λj zj , zTj zk = δjk.

Finally, for a matrix A ∈ Rm×n, the matrix AT^ A ∈ Rn×n^ is positive semi-definite, simply because of xT^ (AT^ A)x = (Ax)T^ (Ax) = ‖Ax‖^22 ≥ 0.

Theorem 9 Let A ∈ Rn^ and let λ 1 > 0 be the largest eigenvalue of AT^ A. Then

‖A‖ 1 = max 1 ≤j≤n

∑m

i=

|aij |, (2)

‖A‖ 2 =

λ 1 , (3)

‖A‖∞ = max 1 ≤i≤m

∑n

j=

|aij |. (4)

Proof: Consider the 1-norm of a matrix A ∈ Rm×n^ , let x ∈ Rn, then

‖Ax‖ 1 =

∑^ m

i=

|(Ax)i| =

∑^ m

i=

∑^ n

j=

aij xj

∑^ m

i=

∑^ n

j=

|aij ||xj |

∑^ n

j=

( (^) m ∑

i=

|aij |

|xj |

max 1 ≤j≤n

∑m

i=

|aij |

) (^) n ∑

j=

|xj |

=: C‖x‖ 1 ,

showing that ‖A‖ 1 ≤ C. Next, let us pick an index k such that C =

∑n i=1 |aik|^ and x = ek the kth unit vector. Then, we have ‖x‖ 1 = 1 and

‖Ax‖ 1 = ‖Aek‖ 1 =

∑^ m

i=

|(Aek)i| =

∑^ m

i=

|aik| = C,

which shows ‖A‖ 1 = C. In the case of the 2-norm of a matrix A ∈ Rm×n, we have

‖Ax‖^22 = xT^ AT^ Ax.

Since, AT^ A is symmetric and positive semi-definite there are n orthonormal eigenvectors, {zi}ni=1 corresponding to the real, non-negative eigenvalues λ 1 ≥ λ 2 ≥... ≥ λn ≥ 0. Let x = α 1 z 1 + · · · + αnzn so that ‖x‖^22 = xT^ x = |α 1 |^2 + · · · + |αn|^2. Furthermore,

xT^ AT^ Ax = λ 1 |α 1 |^2 + · · · + λn|αn|^2 ≤ λ 1 xT^ x.

Hence, ‖A‖^22 ≤ λ 1. Finally, choosing x = z 1 gives equality. Therefore, ‖A‖^22 = λ 1. The case of the ∞-norm is an exercise.