



















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
The jacobi-davidson method for linear eigenproblems, focusing on the use of harmonic ritz values and the associated algorithm. The authors compare the convergence behavior and efficiency of the jacobi-davidson method with harmonic ritz values to the standard lanczos method, davidson's method with diagonal preconditioning, and the standard jacobi-davidson method with the same diagonal preconditioning. The document also discusses the relationship between harmonic ritz values and other methods, such as shift and invert techniques.
Typology: Papers
1 / 27
This page cannot be seen from the preview
Don't miss anything!




















SIAM REVIEW ©c 2000 Society for Industrial and Applied Mathematics Vol. 42, No. 2, pp. 267–
Gerard L. G. Sleijpen† Henk A. Van der Vorst†
Abstract. In this paper we propose a new method for the iterative computation of a few of the extremal eigenvalues of a symmetric matrix and their associated eigenvectors. The method is based on an old and almost unknown method of Jacobi. Jacobi’s approach, combined with Davidson’s method, leads to a new method that has improved convergence properties and that may be used for general matrices. We also propose a variant of the new method that may be useful for the computation of nonextremal eigenvalues as well.
Key words. eigenvalues and eigenvectors, Davidson’s method, Jacobi iterations, harmonic Ritz values
AMS subject classifications. 65F15, 65N
PII. S
∗Published electronically April 24, 2000. This paper originally appeared in SIAM Journal on Matrix Analysis and Applications, Volume 17, Number 2, 1996, pages 401–425. http://www.siam.org/journals/sirev/42-2/36308.html †Mathematical Institute, Utrecht University, P.O. Box 80.010, 3508 TA Utrecht, Netherlands ([email protected], [email protected]).
267
The outline of this paper is as follows. In section 2 we briefly describe the meth- ods of Davidson and Jacobi, and we show that the original Davidson’s method may be viewed as an accelerated Gauss–Jacobi iteration method. Likewise, more recent approaches that include other preconditioners M can be interpreted as accelerated standard iteration methods associated with the splitting A = M − N. In section 3 we propose the new approach, which is essentially a combination of the JOCC approach and Davidson’s method for creating more general subspaces. The difference between this approach and Davidson’s method may seem very subtle but it is fundamental. Whereas in Davidson’s method accurate preconditioners M (accurate in the sense that they approximate the inverse of the given operator very well) may lead to stagnation or very slow convergence, the new approach takes advantage of such preconditioners, even if they are exact inverses. It should be stressed that in this approach we do not precondition the given eigensystem (neither does Davidson), but we precondition an auxiliary system for the corrections for the eigenapproximations. The behavior of the method is further discussed in section 4. There we see that for a specific choice the speed of convergence of the approximated eigenvalue is quadratic (and for symmetric problems even cubic). In practice, this requires the exact solution of a correction equation, but as we will demonstrate by simple examples (section 6), this may be relaxed. We suggest using approximate solutions for the correction equations. This idea may be further exploited for the construction of efficient inner- outer iteration schemes or by using preconditioners similar to those suggested for the Davidson method. In section 5 we discuss the harmonic Ritz values, and we show how these can be used in combination with our new algorithm for the determination of “interior” eigen- values. We conclude with some simple but illustrative numerical examples in section
2.1. Davidson’s Method. The main idea behind Davidson’s method is the follow- ing one. Suppose we have some subspace K of dimension k, over which the projected matrix A has a Ritz value θk (e.g., θk is the largest Ritz value) and a corresponding Ritz vector uk. Let us assume that an orthogonal basis for K is given by the vectors v 1 , v 2 ,... , vk. Quite naturally the problem of how to expand the subspace in order to find a successful update for uk arises. To that end we compute the residual rk = Auk −θkuk. Then Davidson, in his original paper [4], suggests computing t from (DA − θkI)t = rk, where DA is the diagonal of the matrix A. The vector t is made orthogonal to the basis vectors v 1 ,... , vk, and the resulting vector is chosen as the new vk+1, by which K is expanded. It has been reported that this method can be quite successful in finding dominant eigenvalues of (strongly) diagonally dominant matrices. The matrix (DA − θkI)−^1 can be viewed as a preconditioner for the vector rk. Davidson [6] suggests that his algorithm (more precisely, the Davidson–Liu variant of it) may be interpreted as a Newton–Raphson scheme, and this has been used as an argument to explain its fast convergence. It is tempting to also see the preconditioner as an approximation for (A − θkI)−^1 , and, indeed, this approach has been followed for the construction of more complicated preconditioners (see, e.g., [17], [3], [15], [18]). However, note that
with z 1 = 0, θ 1 = α, and y 1 = −(D −αI)−^1 b. However, in connection with Davidson’s method, representation (4) simplifies the discussion.
2.2.2. Short Discussion on the JOCC Method. Jacobi was well aware of the fact that the Jacobi iteration converges (fast) if the matrix is (strongly) diagonally dominant. Therefore, he proposed to perform a number of steps of the Jacobi diago- nalization method in order to obtain a (strongly) diagonally dominant matrix before applying the JOCC method. Since he was interested in the eigenvalue closest to α, he did enough steps using the diagonalization method to obtain a diagonally dominant F − αI so that F − θkI was also diagonally dominant. This can be done provided that λ is a simple eigenvalue of A. The application of the diagonalization method can be viewed as a technique to improve the initial guess e 1 ; i.e., the given matrix is rotated so that e 1 is closer to the (rotated) eigenvector u. These rotations were done only at the start of the iteration process, and this process was carried out with fixed F and D. However, note that in Jacobi’s approach we are looking, at all stages, for the orthogonal complement to the initial approximation u 1 = e 1. We do not take into account that better approximations uk = (1, zTk )T^ become available in the process and that it may be more efficient to try to compute the orthogonal complement u − (uT^ uk)uk. In the JOCC framework an improved approximation would have led to a similar situation as in (1) if we would have applied plane rotations on A, such that uk would have been rotated to e 1 by these rotations. Therefore, what Jacobi did only in the first step of the process could have been done at each step of the iteration process. This is an exciting idea, since in Jacobi’s approach the rotations were motivated by the desire to obtain stronger diagonal dominance, whereas our discussion suggests that one might take advantage of the information in the iteration process. Of course, this would have led to a different operator F in each step, and this is an important observation for the formulation of our new algorithm.
2.3. Davidson’s Method as an Accelerated JOCC Method. We will apply David- son’s method for the same problem as before. In particular we will assume that A is as in (1) and that u 1 = e 1. The eigenvector approximations produced in the kth step of JOCC, as well as in the Davidson method, are denoted by uk. We assume that uk is scaled such that its first coordinate is 1: uk = (1, zTk )T^. Let θk be the associated approximation to the eigenvalue. It will be clear from the context to which process an approximation refers. The residual is given by
rk = (A − θkI)uk =
α − θk + cT^ zk (F − θkI)zk + b
Davidson proposes computing tk from
(7) (DA − θkI)tk = −rk,
where DA is the diagonal of A. For the component ̂yk := (0, yTk )T^ of tk orthogonal to u 1 it follows, with D the diagonal of F , that
(8) (D − θkI)yk = −(F − θkI)zk − b = (D − F )zk − (D − θkI)zk − b,
or equivalently,
(9) (D − θkI)(zk + yk) = (D − F )zk − b.
Comparing (9) with (4), we see that zk + yk is the zk+1 that we would have obtained with one step of JOCC starting at zk. But after this point, Davidson’s method is an improvement over the JOCC method because instead of taking ûk+1 := (1, (zk + yk)T^ )T^ = uk + ̂yk as the next approximating eigenvector (as in JOCC), Davidson suggests computing the Ritz vector of A with respect to the subspace computed so far (that is, over the subspace spanned by the old approximations u 1 ,... , uk and the new ̂uk+1). See section 5.1 for a definition of Ritz vectors and Ritz values. Actually, Davidson selects the correction tk, but
span(u 1 ,... , uk, ̂uk+1) = span(u 1 ,... , uk, tk).
For the computation of the Ritz vector it is convenient to have an orthonormal basis, and that is precisely what is constructed in Davidson’s method. This orthonormal basis v 1 ,... , vk+1 appears if one orthonormalizes u 1 ,... , uk, tk by Gram–Schmidt. The (k + 1)th step in either JOCC or Davidson’s method can be summarized as follows. JOCC. Jacobi computes the component ŷk of tk orthogonal to u 1 and takes uk+1 = uk + ̂yk, θk+1 = eT 1 Auk+1. Unlike Davidson’s approach, Jacobi only computes components that are orthogonal to u 1 = e 1. However, in view of the orthogonalization step, the components in the u 1 -direction (as well as in new directions) automatically vanish in Davidson’s method. Davidson’s method. Davidson computes the component vk+1 of tk orthogonal to u 1 ,... , uk and takes for uk+1 (and θk+1) the Ritz vector (respectively, Ritz value) of A with respect to the space spanned by v 1 ,... , vk+1. Davidson exploits the complete subspace constructed so far, while Jacobi only takes a simple linear combination of the last vector zk and the last correction yk (or, taking the u 1 -component into account, of u 1 , the last vector uk, and the last correction tk). Although span(u 1 ,... , uk, ̂uk+1) = span(u 1 ,... , uk+1), the Davidson approximations do not span the same subspace as the Jacobi approximations since the eigenvalue approximations θk of Jacobi and of Davidson are different. Consequently, the methods have different corrections tk and also different components orthogonal to the uj. Note that our formulation makes clear that Davidson’s method also attempts to find the orthogonal update (0, zT^ )T^ for the initial guess u 1 = e 1 , and it does so by a clever orthogonalization procedure. However, just as in the JOCC approach, the process works with fixed operators (in particular DA; other variants use different approximations for A) and not with operators associated with the orthogonal com- plement of the current approximation uk (see also section 2.2.2). This characterizes the difference between our algorithm (of section 3) and Davidson’s method.
(10) A = B + Auku∗ k + uku∗ kA − θkuku∗ k.
Algorithm 1 The Jacobi–Davidson Method.
(I − u u ∗) (A − θI) (I − u u ∗) t = −r.
very modest precision, but this is certainly not an optimal choice in many practical situations. However, our experiments illustrate that the better we approximate the solution t of (13), the faster convergence we obtain (and no stagnation as in Davidson’s method). The algorithm for the improved Davidson method then becomes Algorithm 1 (in the style of [26]). We have skipped indices for variables that overwrite old values in an iteration step, e.g., u instead of uk. We do not discuss implementation issues, but we note that the computational costs for Algorithm 1 are about the same as for Davidson’s method (provided that the same amount of computational work is spent to approximate the solutions of the involved linear systems).
(14) (I − uku∗ k)(A − θkI)(I − uku∗ k)t = −rk and t ⊥ uk
(see Algorithm 1, in which we have skipped the index k).
Equation (14) can be solved approximately by selecting some more easily in- vertible approximation for the operator (I − uku∗ k)(A − θkI)(I − uku∗ k), or by some (preconditioned) iterative method. If any approximation (preconditioner) is available, then this will most often be an approximation for A − θkI. However, the formulation in (14) is not very suitable for incorporating available approximations for A − θkI. We will first discuss how to construct approximate so- lutions orthogonal to uk straight from a given approximation for A − θkI (one-step approximation: section 4.1). Then we will propose how to compute such an approxi- mated solution efficiently by a preconditioned iterative scheme (iterative approxima- tion: section 4.2).
4.1. One-Step Approximations. A more convenient formulation for (14) is ob- tained as follows. We are interested in determining t ⊥ uk, and for this t we have that
(I − uku∗ k)t = t,
and then it follows from (14) that
(15) (A − θkI)t − εuk = −rk
or
(A − θkI)t = εuk − rk.
When we have a suitable preconditioner M ≈ A − θkI available, then we can compute an approximation ˜t for t:
(16)^ ˜t = εM −^1 uk − M −^1 rk.
The value of ε is determined by the requirement that ˜t should be orthogonal to uk:
ε =
u∗ kM −^1 rk u∗ kM −^1 uk
Equation (16) leads to several interesting observations. (a) If we choose ε = 0, then we obtain the Davidson method (with preconditioner M ). In this case ˜t will not be orthogonal to uk. (b) If we choose ε as in (17), then we have an instance of the Jacobi–Davidson method. This approach has already been suggested in [19]. In that paper the method is obtained from a first-order correction approach for the eigensystem. Further experiments with this approach are reported in [28]. Note that this method requires two operations with the preconditioning matrix per iteration. (c) If M = A − θkI, then (16) reduces to
t = ε(A − θkI)−^1 uk − uk.
Since t is made orthogonal to uk afterwards, this choice is equivalent with t = (A − θkI)−^1 uk. In this case the method is just mathematically equivalent to (accelerated) shift and invert iteration (with optimal shift). For symmetric A this is the (accelerated) inverse Rayleigh quotient method, which converges cubically [22]. In the unsymmetric case we have quadratical convergence [22], [20]. In view of the speed of convergence of shift and invert methods, it may hardly be worthwhile to accelerate them in a “Davidson” manner: the
and we see that ˜t will make a very small angle with uk ≈ e 1 (since a 11 ≈ θk). This implies that ˜t − (˜t∗uk)uk may have only few significant digits, and then it may be a waste of effort to expand the subspace with this (insignificant) vector. However, in the Jacobi–Davidson method we compute the new vector as
˜t = ε(DA − θkI)−^1 uk − (DA − θkI)−^1 rk.
The factor ε is well determined, and note that for our strongly diagonally dominant model problem we have in good approximation that
‖ε(DA − θkI)−^1 uk‖ 2.
‖uk‖ 2 ‖(DA − θkI)−^1 rk‖ 2 ‖uk‖ 2 ‖(DA − θkI)−^1 uk‖ 2 ‖(DA − θkI)−^1 uk‖ 2
= ‖(DA − θkI)−^1 rk‖ 2.
Furthermore, since uk ⊥ rk, we have that ε(DA − θkI)−^1 uk and (DA − θkI)−^1 rk are not in the same direction, and therefore there will be hardly any cancellation in the computation of ˜t. This means that ˜t is well determined in finite precision and ˜t ⊥ uk. Our discussion can be adapted for nondiagonally dominant matrices as well, when we restrict ourselves to situations where the approximations uk and θk are sufficiently close to their limit values and where we have good preconditioners (e.g., inner iteration methods). We will illustrate our observations by a simple example taken from [3]: Exam- ple 5.1. In that example the matrix A of dimension 1000 has diagonal elements aj,j = j. The elements on the sub- and superdiagonal (aj− 1 ,j and aj,j+1) are all equal to 0.5, as well as the elements a 1 , 1000 and a 1000 , 1. For this matrix we compute the largest eigenvalue (≈ 1000 .225641) with (a) the standard Lanczos method, (b) Davidson’s method with diagonal precondition- ing ((DA − θkI)−^1 ), and (c) the Jacobi–Davidson method with the same diagonal preconditioning, carried out as in (16). With the same starting vector as in [3] we obtain, of course, the same results: a slowly converging Lanczos process, a much faster Davidson process, and Jacobi– Davidson is just as fast as Davidson in this case. The reason for this is that the starting vectors e 1 + e 1000 for Davidson and ≈ e 1 + 2000e 1000 for Lanczos are quite good for these processes, and the values for ε, which describe the difference between (b) and (c), are very small in this case. Shift and invert Lanczos with shift 1001 takes five steps for full convergence, whereas Jacobi–Davidson with exact inverse for A−θkI takes three steps. In Table 1 we see the effects when we take a slightly different starting vector u 1 = (0. 01 , 0. 01 ,... , 0. 01 , 1)T^ ; that is, we have taken a starting vector that still has a large component in the dominating eigenvector. This is reflected by the fact that the Ritz value in the first step of all three methods is equal to 954. 695 · · ·. In practical situations we will often not have such good starting vectors available. The Lanczos process converges slowly again, as might be expected for this uniformly distributed spectrum. In view of our discussion in section 2.3 we may interpret the new starting vector as a good starting vector for a perturbed matrix A. This implies that the diagonal preconditioner may not be expected to be a very good preconditioner. This is reflected by the very poor convergence behavior of Davidson’s method. The difference with the Jacobi–Davidson is now quite notable (see the values of ε), and for this method we observe rather fast convergence again. Although this example may seem quite artificial, it displays quite nicely the be- havior that we have seen in our experiments, as well as what we have tried to explain in our discussion.
Table 1 Approximation errors λ − θk.
Iteration Lanczos Davidson Jacobi–Davidson |ε| 0 0.45e+02 0.45e+02 0.45e+ 1 0.56e+01 0.40e+02 0.25e+02 0.50e+ 2 0.16e+01 0.40e+02 0.74e+01 0.12e+ 3 0.71e+00 0.40e+02 0.15e+01 0.11e+ 4 0.43e+00 0.40e+02 0.14e+01 0.14e+ 5 0.32e+00 0.40e+02 0.55e–01 0.49e+ 6 0.26e+00 0.39e+02 0.13e–02 0.72e– 7 0.24e+00 0.38e+02 0.29e–04 0.29e– 8 0.22e+00 0.37e+02 0.33e–06 0.14e– 9 0.21e+00 0.36e+02 0.25e–08 0.34e– 10 0.20e+00 0.36e+ 11 0.19e+00 0.35e+ 12 0.19e+00 0.34e+ 13 0.18e+00 0.33e+ 14 0.17e+00 0.32e+ 15 0.16e+00 0.31e+
In conclusion, Davidson’s method works well in these situations where ˜t does not have a strong component in the direction of uk. Shift and invert approaches work well if the component in the direction of u in uk is strongly increased. However, in this case this component may dominate so strongly (when we have a good precondi- tioner) that it prevents us from reconstructing in finite precision arithmetic a relevant orthogonal expansion for the search space. In this respect, the Jacobi–Davidson is a compromise between the Davidson method and the accelerated (inexact) shift and invert method, since the factor ε properly controls the influence of uk and makes sure that we construct the orthogonal expansion of the subspace correctly. In this view Jacobi–Davidson offers the best of two worlds, and this will be illustrated by our numerical examples.
4.2. Iterative Approximations. If a preconditioned iterative method is used to solve (14), then in each step of the linear solver, a “preconditioning equation” has to be solved. If M is some approximation of A − θkI, then the projected matrix
Md := (I − uk u∗ k) M (I − uk u∗ k)
can be taken as an approximation of (I − uk u∗ k)(A − θkI)(I − uk u∗ k) and, in each iterative step, we will have to solve an equation of the form Mdz = y, where y is some given vector orthogonal to uk and z ⊥ uk has to be computed. Of course, z can be computed as (cf. (16)–(17))
(M (^) d− 1 y =) z = αM −^1 uk − M −^1 y with α =
u∗ kM −^1 y u∗ kM −^1 uk
In this approach, we have to solve, except for the first iteration step, only one sys- tem involving M in each iteration step. The vector M −^1 uk and the inner product u∗ kM −^1 uk, to be computed only once, can also be used in all steps of the iteration process for (14). The use of a (preconditioned) Krylov subspace iteration method for (14) does not lead to the same result as when we apply this iterative method to the two equations in (16) separately. For instance, if p is a polynomial such that p(A − θkI) ≈ (A − θkI)−^1
for symmetric matrices, these harmonic Ritz values exhibit a monotonic convergence behavior with respect to the eigenvalues with smallest absolute value. This further supports the observation made in [15] that, for the approximation of “interior” eigen- values (close to some μ ∈ C), more regular convergence behavior with the harmonic Ritz values (of A − μI) than with the Ritz values may be expected. In [21] the harmonic Ritz values for symmetric matrices are discussed. The non- symmetric case has been considered in [31], [8]. However, in all these papers the dis- cussion is restricted to harmonic Ritz values of A with respect to Krylov subspaces. In [15] the harmonic Ritz values are considered for more general subspaces associated with symmetric matrices. The approach is based on a generalized Rayleigh–Ritz pro- cedure, and it is pointed out in [15] that the harmonic Ritz values are to be preferred for the Davidson method when aiming for interior eigenvalues. In connection with the Jacobi–Davidson method for unsymmetric matrices, we propose a slightly more general approach based on projections. To this end, as well as for the introduction of notations that we will also need later, we discuss to some extent the harmonic Ritz values in section 5.1.
5.1. Harmonic Ritz Values on General Subspaces. Ritz Values. If Vk is a linear subspace of C n, then θk is a Ritz value of A with respect to Vk with Ritz vector uk if
(20) uk ∈ Vk, uk 6 = 0, Auk − θuk ⊥ Vk.
How well the Ritz pair (θk, uk) approximates an eigenpair (λ, w) of A depends on the angle between w and Vk. In practical computations one usually computes Ritz values with respect to a growing sequence of subspaces Vk (that is, Vk ⊂ Vk+1 and dim(Vk) < dim(Vk+1)). If A is normal, then any Ritz value is in the convex hull of the spectrum of A: any Ritz value is a mean (convex combination) of eigenvalues. For normal matrices, at least, this helps to explain the often regular convergence of extremal Ritz values with respect to extremal eigenvalues. For further discussions on the convergence behavior of Ritz values (for symmetric matrices), see [23] and [30]. Harmonic Ritz Values. A value θ˜k ∈ C is a harmonic Ritz value of A with respect to some linear subspace Wk if θ˜− k 1 is a Ritz value of A−^1 with respect to Wk [21].
For normal matrices, θ˜− k 1 is in the convex hull of the collection of λ−^1 ’s, where λ is an eigenvalue of A: any harmonic Ritz value is a harmonic mean of eigenvalues. This property explains their name and, at least for normal matrices, it explains why we may expect a more regular convergence behavior of harmonic Ritz values with respect to the eigenvalues that are closest to the origin. Of course, we would like to avoid computing A−^1 or solving linear systems in- volving A. The following theorem gives a clue about how that can be done. Theorem 5.1. Let Vk be some k-dimensional subspace with basis v 1 ,... , vk. A value θ˜k ∈ C is a harmonic Ritz value of A with respect to the subspace Wk := AVk if and only if
(21) A˜uk − ˜θk u˜k ⊥ AVk for some ˜uk ∈ Vk, ˜uk 6 = 0.
If w 1 ,... , wk span AVk then,^4 with Vk := [v 1 |... |vk], Wk := [w 1 |... |wk], and H˜k := (W (^) k∗ Vk)−^1 W (^) k∗ AVk, (^4) If AVk has dimension less than k, then this subspace contains an eigenvector of A; this situation is often referred to as a lucky breakdown. We do not consider this situation here.
property (21) is equivalent to
(22) H^ ˜ks = θ˜ks for some s ∈ Ck, s 6 = 0 (and u˜k = Vks);
the eigenvalues of the k × k matrix H˜k are the harmonic Ritz values of A. Proof. By (20), (˜θ k− 1 , Au˜k) is a Ritz pair of A−^1 with respect to AVk if and only if
(A−^1 − ˜θ k− 1 I)A˜uk ⊥ AVk
for a ˜uk ∈ Vk, u˜k 6 = 0. Since (A−^1 − ˜θ k− 1 I)A˜uk = −θ˜− k 1 (Au˜k − θ˜k ˜uk) we have the first property of the theorem. For the second part of the theorem, note that (21) is equivalent to
(23) AVks − ˜θkVks ⊥ Wk for an s 6 = 0,
which is equivalent to
W (^) k∗ AVks − ˜θk(W (^) k∗ Vk)s = 0
or H˜ks − θ˜ks = 0. We will call the vector ˜uk in (21) the harmonic Ritz vector associated with the harmonic Ritz value ˜θk, and (˜θk, ˜uk) is a harmonic Ritz pair. In the context of Krylov subspace methods (Arnoldi or Lanczos), Vk is the Krylov subspace Kk(A; v 1 ). The vj are orthonormal and such that v 1 ,... , vi span Ki(A; v 1 ) for i = 1, 2 ,.... Then AVk = Vk+1Hk+1,k, with Hk+1,k a (k + 1) × k upper Hessenberg matrix. The elements of Hk+1,k follow from the orthogonalization procedure for the Krylov subspace basis vectors. In this situation, with Hk,k the upper k × k block of Hk+1,k, we see that
(W (^) k∗ Vk)−^1 W (^) k∗ AVk = (H k∗+1,kV (^) k∗+1Vk)−^1 H k∗+1, 1 V (^) k∗+1Vk+1Hk+1,k = H k,k∗^ −^1 H k∗+1,kHk+1,k.
Since Hk+1,k = Hk,k + βek+1e∗ k, where β is equal to the element in position (k + 1, k) of Hk+1,k, the harmonic Ritz values can be computed from a matrix that is a rank-one update of Hk,k:
H^ ˜k = H∗ k,k^ −^1 H k∗+1,kHk+1,k = Hk,k + |β|^2 H k,k∗^ −^1 eke∗ k.
In [8], Freund was interested in quasi-kernel polynomials (e.g., GMRES and quasi- minimal residual (QMR) polynomials). The zeros of these polynomials are harmonic Ritz values with respect to Krylov subspaces. This follows from Corollary 5.3 in [8], where these zeros are shown to be the eigenvalues of H∗ k,k^ −^1 H k∗+1,kHk+1,k. However,
in that paper these zeros are not interpreted as the Ritz values of A−^1 with respect to some Krylov subspace. In the context of Davidson’s method we have more general subspaces Vk and Wk = AVk. According to Theorem 5.1 we have to construct the matrix H˜k (which will not be Hessenberg in general), and this can be accomplished by either construct- ing an orthonormal basis for AVk (similar to Arnoldi’s method) or by constructing biorthogonal bases for Vk and AVk (similar to the bi-Lanczos method). We will con- sider this in more detail in section 5.2.
It is not necessary to invert W (^) k∗ Vk since the harmonic Ritz values are simply the inverses of the eigenvalues of W (^) k∗ Vk. The construction of an orthogonal basis for AVk can be done with modified Gram–Schmidt. Finally, note that H˜ k− 1 = W (^) k∗ Vk = W (^) k∗ A−^1 Wk, and we explicitly have the matrix of the projection of A−^1 with respect to an orthonormal basis of Wk. This again reveals how the harmonic Ritz values appear as inverses of Ritz values of A−^1 with respect to AVk.
5.3. A Restart Strategy. In the Jacobi–Davidson algorithm (Algorithm 1), just as for the original Davidson method, it is suggested to restart simply by taking the Ritz vector um computed so far as a new initial guess. However, the process may construct a new search space that has considerable overlap with the previous one; this phenomenon is well known for the restarted power method and the restarted Arnoldi (without deflation), and it may lead to a reduced speed of convergence or even to stagnation. One may try to prevent this by retaining a part of the search space, i.e., by returning to step 3 of Algorithm 1 with a well-chosen -dimensional subspace of the span of v 1 ,... , vm for some > 1. With our simple restart, we expect that the process will also construct vectors with strong components in directions of eigenvectors associated with eigenvalues close to the wanted eigenvalue. And this is just the kind of information that we have discarded at the point of restart. This suggests a strategy of retaining Ritz vectors associated with the Ritz values closest to this eigenvalue as well (including the Ritz vector um that is the approxima- tion for the desired eigenvector). In Algorithm 1, these would be the largest Ritz values. A similar restart strategy can be used for the harmonic Ritz values and, say, biorthogonalization: for the initial matrices Vand W, after restart we should take care that W= AV and the matrices are biorthogonal (i.e., W (^) ∗ V should be lower triangular).
5.4. The Use of the Harmonic Ritz Values. According to our approaches for the computation of harmonic Ritz values in section 5.2, there are two variants for an algorithm that exploits the convergence properties of the harmonic Ritz values towards the eigenvalues closest to the origin. Of course, these algorithms can also be used to compute eigenvalues that are close to some μ ∈ C. In that case one should work with A − μI instead of A. We start with the variant based on biorthogonalization. 5.4.1. Jacobi–Davidson with Biorthogonal Basis. When working with harmonic Ritz values, we have to be careful in applying Jacobi’s expansion technique. If (θ˜k, u˜k) is a harmonic Ritz pair of A then rk = Au˜k − θ˜k ˜uk is orthogonal to A˜uk, whereas in our discussion about the new Jacobi–Davidson method with regular Ritz values (cf. section 3) the vector rk was orthogonal to uk. However, we can follow Jacobi’s approach here as well by using a skew basis or an oblique projection. The update for ˜uk should be in the space orthogonal to ̂u := Au˜k. If λ is the eigenvalue of A closest to the harmonic Ritz value θ˜k, then the optimal update is the solution v of
(B − λI)v = −rk, where B :=
˜uk û ∗ ̂ u ∗^ u˜k
u˜k ̂u ∗ û ∗^ ˜uk
In our algorithm we will solve (27) approximately. To be more precise, we solve approximately ( I −
˜uk û ∗ ̂ u ∗^ ˜uk
(A − ˜θkI)
˜uk ̂u ∗ ̂ u ∗^ u˜k
(28) t = −rk.
Note that ̂u can be computed without an additional matrix vector product with A since ̂u = Au˜k = AVks = WkUks (if WkUk = AVk, where Uk is a matrix of order k). The above considerations lead to Algorithm 2.
Algorithm 2 The Jacobi–Davidson algorithm with harmonic Ritz values and biorthogonalization.
˜u û ∗ ̂ u ∗^ u˜
(A − ˜θI)
u˜ ̂u ∗ ̂ u ∗^ ˜u
t = −r.
eigenvalues of H˜m and the matrix Y = [y 1 |... |y] with columns representing the associated eigenvectors. Orthogonalize Y with respect to Lm (Y= ZR with R upper triangular and Z ∗ LmZ lower triangular): z 1 = y 1 and zj+1 = yj+1 − Zj (Z j∗ LmZj )−^1 Z j∗ Lmyj+1 for j = 1,... , ` − 1.:= VmZ, W:= WmZ, L:= Z∗ LmZ, Ĥ := Z ∗ Ĥ mZ , and goto 3.5.4.2. Jacobi–Davidson with Orthogonal Basis. If we want to work with an orthonormal basis for AVk, then we may proceed as follows. Let v 1 , v 2 ,... , vk be the Jacobi–Davidson vectors obtained after k steps. Then we orthonormalize the set Av 1 , Av 2 ,... , Avk (as in (25)). The eigenvalues of H˜k (cf. (26)) are the harmonic Ritz values of A, and let ˜θk be the one of interest, with
-14 0 5 10 15 20 25 30 35 40
0
2
4 Convergence plot for example 1
Number of iterations of (Jacobi)Davidson
log10 of the residual norm
-12 0 5 10 15 20 25 30 35 40
0
2
4 Convergence plot for example 1
Number of iterations of (Jacobi)Davidson
log10 of the residual norm
Fig. 1 Convergence of the residuals with exact inverses.
Fig. 2 Convergence of the residuals with ap- proximate inverses.
operator B − θkI, then we observe (in the lower solid curve) very fast (cubical?) convergence, just as expected. Of course, solving the involved linear systems exactly is usually too expensive, and it might be more realistic to investigate what happens if we take more crude approximations for the operators involved. For the Davidson method we take five steps of GMRES for the approximation of the solution of (A − θkI)v = rk, and for Jacobi–Davidson we also take five steps of GMRES for the approximate solution of the projected system (13). The results are given in Figure 2 (the dashed curve represents the results for the Davidson method). Again we see that for this diagonally dominant matrix it is attractive to work with the Jacobi–Davidson method. Note that the five GMRES steps for the ap- proximate inversion step are not sufficient to have quadratical convergence, but the linear convergence takes place with a very small convergence factor. The commencing convergence, around iteration 12, in the Davidson method gets lost due to rounding errors. The expansion vectors, after iteration 15, are close to the search subspace Vk, and modified Gram–Schmidt does not lead to an orthogonal basis; applying modified Gram–Schmidt twice restores the convergence here (see also Example 5). We also learn from this example that Krylov subspace methods may not be ex- pected to make good preconditioners for Davidson’s method: with a few steps one suffers from the fact that A − θkI has a very small eigenvalue, and if carried out to high precision (almost) stagnation is to be expected. The Jacobi–Davidson method does not have these problems, since the projected operator B − θkI does not have a small eigenvalue (unless the eigenvalue λ is close to some other eigenvalue of A). Example 2. Our second example is still highly artificial, but here we try to mimic more or less what happens when a matrix is not diagonally dominant. The matrix A is constructed as B = Q 1 AQ 1 , with
A = tridiag(− 1 , 2 , − 1)
and where Q 1 is a Householder transformation. The order of the matrices is 100. Note that the distribution of the eigenvalues at the upper and lower ends of the spectrum is not particularly favorable for Krylov subspace methods since they are not well separated in a relative sense.
For those who wish to repeat our experiments we add that the Householder vector h was chosen with elements hj =
j + .45, j = 1, 2 ,... , 100. The starting vector for the iteration algorithms was chosen as a vector with all elements equal to 1. Furthermore, we restarted the outer iterations after each 20 steps, which represents a usual strategy in practical situations. The Davidson algorithm (with DA − θkI) needed 565 iteration steps in order to find the largest eigenvalue
λ = 3. 9990325...
to almost working precision. In the Jacobi–Davidson algorithm we did the inner iterations, necessary for solving (13) approximately, with 5 steps of GMRES. This time we needed 65 outer iterations (i.e., 320 inner iteration steps). The inner iteration method (GMRES), as well as the number of steps (5 steps), was chosen arbitrarily. In actual computations one may choose any appropriate means to approximate the solution of the projected system (13), such as, e.g., the incomplete LU (ILU) decomposition. Example 3. This example illustrates that our new algorithms (in sections 3 and 5) may also be used for the computation of interior eigenvalues. In this example we compute an approximation for the eigenvalue of smallest absolute value. For this purpose the Jacobi–Davidson algorithm that uses Ritz values (section 3) is modified; instead of computing the largest eigenpair (θk, uk) of Hk we compute the one with smallest absolute value for the Ritz value. Again we take a simple matrix: A is the 100 × 100 diagonal matrix with spectrum { t^2 − 0. 8 | t =
j 100
, j = 1,... , 100
All coordinates of the starting vector v 1 are equal to 1. We solve the projected equations approximately using eight steps of GMRES. We do not use restarts for the (outer) iterations. Note that our algorithms, like any Krylov subspace method, do not take advantage of the fact that A is diagonal as long as we do not use diagonal preconditioning. Indeed, with D = QT^ AQ and y = Qv, we see that Ki(D; v) = QT^ Ki(A; y), so that except for an orthogonal transformation the same subspaces are generated. In particular, the Rayleigh quotients, of which the Ritz values are the local extrema for symmetric matrices, are the same for both subspaces. This means that if the starting vectors are properly related as y = Qv, then the Krylov method with D and v leads to the same convergence history as the method with A and y. We have also used the harmonic Ritz value variant of Jacobi–Davidson as in sec- tion 5. Figures 3 and 5 show the convergence history of log 10 ‖rk‖ 2 , where the residual rk, at step k, is either Auk −θkuk for the Ritz pair with Ritz value of smallest absolute value, or A˜uk − ˜θk ˜uk for the harmonic Ritz pair with harmonic Ritz value of smallest absolute value. Along the vertical axis we have log 10 ‖rk‖ 2 , and we have the iteration number k along the horizontal axis. Figures 4 and 6 show the convergence history of all (harmonic) Ritz values (indicated by ·) in the interval indicated along the vertical axis. Again, we have the number of iteration steps along the horizontal axis. We have marked the positions of the eigenvalues (with +) on the right sides of Figures 4 and 6. For this example, the algorithm based on Ritz values (see Figures 3 and 4) con- verges about as fast as the algorithm based on harmonic Ritz values (see Figures 5 and 6), but the convergence history with harmonic Ritz values is much smoother. The difference in smoothness seems to also be typical for other examples. This fact