Lecture Notes on Numerical Optimization | MATH 693A, Study notes of Mathematics

Material Type: Notes; Subject: Mathematics; University: San Diego State University; Term: Fall 2009;

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010

koofers-user-f91
koofers-user-f91 🇺🇸

8 documents

1 / 19

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Recap
Trust-Region Newton
Numerical Optimization
Lecture Notes #15
Practical Newton Methods Trust-Region Newton Methods
Peter Blomgren,
Department of Mathematics and Statistics
Dynamical Systems Group
Computational Sciences Research Center
San Diego State University
San Diego, CA 92182-7720
http://terminus.sdsu.edu/
Fall 2009
Peter Blomgren, h[email protected]iTrust-Region Newton Methods (1/19)
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13

Partial preview of the text

Download Lecture Notes on Numerical Optimization | MATH 693A and more Study notes Mathematics in PDF only on Docsity!

Recap Trust-Region Newton

Numerical Optimization

Lecture Notes

Practical Newton Methods — Trust-Region Newton Methods

Peter Blomgren,

[email protected]

Department of Mathematics and Statistics Dynamical Systems Group Computational Sciences Research Center San Diego State University San Diego, CA 92182- http://terminus.sdsu.edu/

Fall 2009

Recap Trust-Region Newton

Outline

(^1) Recap Hessian Modifications Trust Region Algorithm

(^2) Trust-Region Newton Newton-Dogleg Newton-2D-Subspace-Minimization Newton-Nearly Exact Solution Trust-Region Newton-(P)CG

Recap Trust-Region Newton Hessian Modifications Trust Region Algorithm

Recall: The Trust Region Algorithm

Algorithm: Trust Region

[ 1] Set k = 1, ∆b > 0 , ∆ 0 ∈ (0, ∆)b, and η ∈ [0, 14 ] [ 2] While optimality condition not satisfied [ 3] Get ¯pk (approximate solution, Today’s Discussion) [ 4] Evaluate ρk [ 5] if ρk < (^14) [ 6] ∆k+1 = 14 ∆k [ 7] else [ 8] if ρk > 34 and ‖¯pk ‖ = ∆k [ 9] ∆k+1 = min(2∆k , ∆)b [10] else [11] ∆k+1 = ∆k [12] endif [13] endif [14] if ρk > η [15] ¯xk+1 = ¯xk + ¯pk [16] else [17] ¯xk+1 = ¯xk [18] endif [19] k = k + 1 [20] End-While

Recap Trust-Region Newton Newton-2D-Subspace-Minimization Newton-Nearly Exact Solution Trust-Region Newton-(P)CG

Trust-Region Methods: Bk not Positive Definite is OK(?)

Trust-region methods do not require that the model Hessian is positive definite. It is possible to use the exact Hessian Bk = ∇^2 f (¯xk ) directly and find the search direction ¯pk by solving the trust-region subproblem

min p ¯∈Rn^

f (¯xk ) + ∇f (¯xk )T^ ¯p +

¯pT^ Bk ¯p, ‖p¯‖ ≤ ∆k

Some of the techniques we discussed, e.g. dogleg, require that Bk is positive definite. We have seen quite few ideas floating around, lets review what we have seen in the context of our methods: (i) the dogleg method, (ii) 2D-subspace minimization, (iii) nearly exact solution, and (iv) the CG method.

Recap Trust-Region Newton Newton-2D-Subspace-Minimization Newton-Nearly Exact Solution Trust-Region Newton-(P)CG

Newton-Dogleg 2 of 2

In order to make the dogleg method work for general (non-positive definite) Bk s we can use the Hessian modification from last time to replace Bk → (Bk + Ek ) ︸ ︷︷ ︸ Pos.Def and use this matrix in the dogleg solution. There is a price to pay. When the matrix Bk is modified, the importance of different directions are changed in different ways. This may negatively impact the benefits of the trust-region approach. Modifications of the type Ek = τ I behave somewhat more predictably than modifications of the type Ek = diag(τ 1 , τ 2 ,... , τn). Usage of the dogleg method for non-convex problems is somewhat dicey, and even though it may work it is not the preferred method.

Recap Trust-Region Newton Newton-2D-Subspace-Minimization Newton-Nearly Exact Solution Trust-Region Newton-(P)CG

Newton-2D-Subspace-Minimization

In much the same way we modified the dogleg method, we can

adapt the 2D-subspace minimization subproblem to work in the

case of indefinite Bk

¯p^ min∈Rn^ f^ (¯xk^ ) +^ ∇f^ (¯xk^ )T^ ¯p^ +

1 2 ¯pT^ Bk ¯p, ‖¯p‖ ≤ ∆k , p¯ ∈ span(∇f (¯xk ), ¯pB^ )

can be applied when Bk is positive definite, and with a modified

B˜k = (Bk + Ek ) which is positive definite in the case when Bk is

not positive definite:

min ¯p∈Rn^ f (¯xk ) + ∇f (¯xk )T^ ¯p +^1 2 ¯pT^ B˜k ¯p, ‖¯p‖ ≤ ∆k , p¯ ∈ span(∇f (¯xk ), ¯pB˜^ )

Recap Trust-Region Newton Newton-2D-Subspace-Minimization Newton-Nearly Exact Solution Trust-Region Newton-(P)CG

Trust-Region Newton-CG 1 of 4

The trust-region subproblem

min p ¯∈Rn^ f (¯xk ) + ∇f (¯xk )T^ ¯p +

¯pT^ Bk ¯p, ‖p¯‖ ≤ ∆k

can be solved using the Conjugate Gradient (CG) method, with two additional termination criteria (one of which we have seen already). For each subproblem we must solve

Bk ¯pk = −∇f (¯xk )

We apply CG with the following stopping criteria (standard) The system has been solved to desired accuracy. (previous) Negative curvature encountered. (new) Size of the approximate solution exceeds the trust-region radius.

Recap Trust-Region Newton Newton-2D-Subspace-Minimization Newton-Nearly Exact Solution Trust-Region Newton-(P)CG

Trust-Region Newton-CG 2 of 4

In the case of negative curvature we follow the direction to the boundary of the trust region; we get Steihaug’s Method

Algorithm: CG-Steihaug Given ǫ > 0 ; set ¯p 0 = 0, ¯r 0 = ∇f (¯xk ), ¯d 0 = −¯r 0 if( ‖¯r 0 ‖ < ǫ ) return(¯p 0 ) while( TRUE ) if( ¯dTj B¯dj ≤ 0 ) % Negative Curvature Find τ ≥ 0 such that ¯p = ¯pj + τ ¯dj satisfies ‖¯p‖ = ∆ return(¯p) endif αj = ¯rTj ¯rj /¯dTj B¯dj , ¯pj+1 = ¯pj + αj ¯dj if( ‖¯pj+1‖ ≥ ∆ ) % Step outside trust region Find τ ≥ 0 such that ¯p = ¯pj + τ ¯dj satisfies ‖¯p‖ = ∆ return(¯p) endif ¯rj+1 = ¯rj + αj B¯dj if( ‖¯rj+1‖ ≤ ǫ‖¯r 0 ‖ ) return(¯pj+1) βj+1 = ¯rTj+1¯rj+1/¯rTj ¯rj , d¯j+1 = −¯rj+1 + βj+1d¯j end-while

Recap Trust-Region Newton Newton-2D-Subspace-Minimization Newton-Nearly Exact Solution Trust-Region Newton-(P)CG

Trust-Region Newton-CG 4 of 4

Less-than-good properties of TR-Newton-CG: Any direction of

negative curvature is accepted — the accepted direction can give

an insignificant reduction in the model.

There is an extension of CG known as Lanczos method, and it is

possible to build a TR-Newton-Lanczos algorithm which does not

terminate when encountering the first direction of curvature, but

continues to search for a direction of sufficient negative curvature.

TR-Newton-Lanczos is more robust, but comes at a cost of a more

expensive solution of the subproblem.

We leave the discussion of the Lanczos algorithm to Math 643 (to

be offered in ∼Spring 2049).

Recap Trust-Region Newton Newton-2D-Subspace-Minimization Newton-Nearly Exact Solution Trust-Region Newton-(P)CG

Trust-Region Newton-PCG(M) 1 of 5

As we have seen in other (very similar) settings, adding preconditioning to the CG-solver can cut the number of iterations quite drastically. It would seem like a good (and natural) idea to add preconditioning to the Trust-Region Newton-CG scheme. We have to be a little careful... For the standard CG-Steihaug, the following is true

Theorem The sequence of vectors generated by CG -Steihaug satisfies

0 = ‖¯p 0 ‖ 2 < ‖¯p 1 ‖ 2 < · · · < ‖¯pj ‖ 2 < ‖¯pj+1‖ 2 <... ‖¯p‖ 2 ≤ ∆

This does not hold for preconditioned PCG(M)-Steihaug. This means that the sequence can leave the trust region, and then come back!

Recap Trust-Region Newton Newton-2D-Subspace-Minimization Newton-Nearly Exact Solution Trust-Region Newton-(P)CG

Trust-Region Newton-PCG(M) 3 of 5

As usual, we never make this change of variables explicitly. Instead the CG-Steihaug algorithm is modified so that the wherever we have a multiplication by D−^1 or D−T^ we solve the appropriate linear system. Note, if D−T^ Bk D−^1 = I the preconditioning is perfect. Usually

D−T^ Bk D−^1 = I + E

and if we multiply by DT^ from the left and D from the right we see

Bk = D︸ ︷︷T^ D ︸ M

+ D︸ T︷︷^ RD ︸

R

So that M ≈ Bk , and R captures the “inexactness” of the preconditioning.

Recap Trust-Region Newton Newton-2D-Subspace-Minimization Newton-Nearly Exact Solution Trust-Region Newton-(P)CG

Trust-Region Newton-PCG(M) 4 of 5

We can get a good general-purpose preconditioner by using a variant of the Cholesky factorization, LLT^ = Bk. We have discussed two ideas in connection with the Cholesky factorization — last time, we talked about how to modify it to get an approximate factorization of an indefinite matrix, i.e.

[L, LT^ ] =

choldecomp(Bk ) = cholesky(Bk + diag(τ 1 , τ 2 ,... , τn)) modelhess(Bk ) = cholesky(Bk + λI )

We have also (in general terms) talked about the incomplete Cholesky factorization, which preserves the sparsity pattern of Bk by not allowing fill-ins. Putting the two together we get something like the algorithm on the next slide... (do not implement this one!)

Recap Trust-Region Newton Newton-2D-Subspace-Minimization Newton-Nearly Exact Solution Trust-Region Newton-(P)CG

Comments

We have looked at Newton methods (with quadratic convergence, if and only if we implement and solve all the subproblems in the right way) for both the linesearch and trust-region approach, and have developed quite a powerful framework of algorithms that are suitable and quite stable for large problems. Are we done??? — Not quite! We have 4 main topics left

  1. Estimation of derivatives — how to proceed if the gradient and/or the Hessian is not available in analytic form.
  2. Quasi-Newton methods — how to proceed if the Hessian is not available (too expensive).
  3. Application to Nonlinear Least Squares problems.
  4. Application to Nonlinear Equations. — If we can minimize, we can also solve ¯F(¯x) = ¯ 0.