
















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
These lecture slides are very easy to understand and very helpful to built a concept about the Matrix computation.The key points discuss in these slides are:Conjugate Gradient, Convergence Rate, Optimality of Conjugate Gradients, Arbitrary Point, Monotonicity Property, Optimization Algorithm, Nonlinear Equation, Quadratic Function, Lanczos Iteration, Ritz Values
Typology: Slides
1 / 24
This page cannot be seen from the preview
Don't miss anything!

















Conjugate gradient Convergence rate of conjugate gradient Preconditioning
An inner product of rn and a vector in Kn is zero A crucial property that makes CG powerful It implies that ‖e‖^2 A = e> n Aen + (∆x)>A(∆x) Only the second term depends on ∆x and since A is positive definite, the first term is larger or equal to 0 The second term is 0 if and only if ∆x = 0 , i.e., xn = x Thus ‖e‖A is minimal if and only if xn = x as claimed The monotonicity property is a consequence of the inclusion Kn ⊆ Kn+1, and since Kn is a subset of IRm^ of dimension n as long as convergence has not yet been achieved, convergence must be achieved in at most m steps That is, each step of conjugate direction cuts down the error term component by component
The guarantee that the CG iteration converges in at most m steps is void in floating point arithmetic For arbitrary matrices A on a real computer, no decisive reduction in ‖en‖A will necessarily be observed at all when n = m In practice, however, CG is used not for arbitrary matrices but for matrices whose spectra are well behaved (partially due to preconditioning) that convergence to a desired accuracy is achieved for n m The theoretical exact convergence at n = m has no relevance to this use of the CG iteration in scientific computing
Cannot use ‖e‖A or ‖e‖^2 A as neither can be evaluated without knowing x∗ On the other hand, given A and b and x ∈ IRm, the quantity
φ(x) =
x>Ax − x>b
can certainly be evaluated as
‖en‖^2 A = e> n Aen = (x∗ − xn)>A(x∗ − xn) = x> n Axn − 2 x> n Ax∗ + x>∗ Ax∗ = x> n Axn − 2 x> n b + x>∗ b = 2 φ(xn) + constant
Like ‖e‖^2 A, it must achieve its minimum uniquely at x = x∗
The CG iteration can be interpreted as an iterative process for minimizing the quadratic function φ(x) of x ∈ IRm At each step, an iterate xn = xn− 1 + αnpn− 1 is computed that minimizes φ(x) over all x in the one dimensional space xn− 1 + 〈pn− 1 〉 It can be readily confirmed that the formula
αn =
r> n− 1 rn− 1 p> n− 1 Apn− 1
ensures αn is optimal in the sense among all step lengths α What makes the CG iteration remarkable is the choice of the search direction pn− 1 , which has the special property that minimizing φ(x) over xn− 1 + 〈pn− 1 〉 actually minimizes it over all of Kn
Connection between Krylov subspace iteration and polynomials of matrices The Arnoldi and Lanczos iterations solve the Arnoldi/Lanczos approximation problem Find pn^ ∈ Pn^ such that ‖pn(A)b‖ = minimum The GMRES iteration solves the GMRES approximation problem Find pn ∈ Pn such that ‖pn(A)b‖ = minimum For CG, the appropriate approximation problem involves the A-norm of the error Find pn ∈ Pn such that ‖pn(A)e 0 ‖A = minimum where e 0 denotes the initial error e 0 = x∗ − x 0 = x∗, and Pn is again defined as GMRES (i.e., the set of polynomials p of degree ≤ n with p(0) = 1)
If the CG iteration has not already converged before step n (i.e., rn− 1 6 = 0 ), then ‖pn(A)e 0 ‖A has a unique solution pn ∈ Pn, and the iterate xn has error en = pn(A)e 0 for this same polynomial pn. Consequently, we have ‖en‖A ‖e 0 ‖A
= inf p∈Pn
‖p(A)e 0 ‖A ‖e 0 ‖A
≤ inf p∈Pn
max λ∈Λ(A)
|p(λ)| (2)
where Λ(A) denotes the spectrum of A
From theorem in the last lecture, it follows that en = p(A)e 0 for some p ∈ Pn The equality is a consequence of (2) and monotonic convergence (1)
First, we suppose that the eigenvalues are perfectly clustered but assume nothing about the locations of these clusters
If A has only n distinct eigenvalues, then the CG iteration converges in at most n steps
This is a corollary of (2), since a polynomial p(x) =
∏n j=1(1^ −^ x/λj^ )^ ∈^ Pn^ exists that is zero at any specified set of n points {λj } At the other extreme, suppose we know nothing about any clustering of the eigenvalues but only that their distances from the origin vary by at most a factor κ ≥ 1 In other words, suppose we know only the 2-norm condition number κ = λmax /λmin, where λmax and λmin are the extreme eigenvalues of A
Let the CG iteration be applied to a symmetric positive definite matrix problem Ax = b, where A has 2-norm condition number κ. Then the A-norm of the errors satisfy
‖en‖A ‖e 0 ‖A
κ + 1 √ κ − 1
)n
κ + 1 √ κ − 1
)−n] ≤ 2
κ − 1 √ κ + 1
)n
See text for proof using Chebyshev polynomials Since (^) √ κ − 1 √ κ + 1
κ as κ → ∞, it implies that if κ is large but not too large, convergence to a specified tolerance can be expected in O(
κ) iterations An upper bound, and convergence may be faster for special right hand sides or if the spectrum is clustered
10 −16 0 10 20
10 −
10 −
10 −
100
104
By &%^ yH .Kg By( MU ' ^04 :9^ yy^ ! 8 yO^0 y, 8y^ yoyE8y^ y H,h=j"y^ MN( $ '0V y Mu % ^ PL"Z ) %h,"5:"#*y^ yEhm % " EXYy^ 0/ h-^ )*y.+^ MFor τ = 0.01, A has 3,092 nonzero entries and κ ≈ 1 .06, the CG convergence takes place in 9 steps For τ = 0.05, A has 13,062 nonzero entries with κ ≈ 1 .83, and convergence takes place in 19 steps For τ = 0.1, A has 25,526 nonzero entries with κ ≈ 10 .3 and the process converges in 20 steps For τ = 0.2 with 50,834 nonzero entries, there is no convergence at all For this example, the CG beats Cholesky factorization by a factor of about 700 in terms of operation counts
The convergence of a matrix iteration depends on the properties of the matrix - the eigenvalues, the singular values, or sometimes other information In many cases, the problem of interest can be transformed so that the properties of the matrix are improved drastically The process of preconditioning is essential to most successful applications of iterative methods
Two extreme cases: I (^) If M = A, then (5) is the same as (3), and nothing has been gained I (^) If M = I , then (4) is the same as (3), and then it is a trivial solution Between these two extremes lie the useful preconditioners, I (^) structured enough (5) can be solved quickly I (^) but close enough to A in some sense that an iteration for (4) converges more quickly than an iteration for (3) What does it mean for M to be “close enough” to A? If the eigenvalues of M−^1 A are close to 1 and ‖M−^1 A − I ‖ 2 is small, then any of the iterations we have discussed can be expected to converge quickly However, preconditioners that do not satisfy such a strong condition may also perform well A simple rule of thumb: preconditioner M is good if M−^1 A is not too far from normal and its eigenvalues are clustered
What we have described may be more precisely terms as left preconditioner Another idea is to transform Ax = b into AM−^1 y = b with x = M−^1 y in which case M is called a right preconditioner If A is Hermitian positive definite, then it is usual to preserve this property in preconditioning Suppose M is also Hermitian positive definite, with M = CC ∗^ for some C , then (3) is equivalent to
[C −^1 AC −∗]C ∗x = C −^1 b
The matrix in brackets is Hermitian positive definite, so this equation can be solved by conjugate gradient or related iterations At the same time, since C −^1 AC −∗^ is similar to C −∗C −^1 A = M−^1 A, it is enough to examine the eigenvalues of the non-Hermitian matrix M−^1 A to investigate convergence