Solving Complex Equations with Nonlinear Methods and Scientific Computing - Prof. Dianne P, Study notes of Mathematics

A lecture note from the book 'scientific computing with case studies' by dianne p. O’leary. It focuses on the topic of nonlinear equations and the methods used to solve them, including newton's method, newton-like methods, and globally-convergent continuation methods. The document also includes case studies on determining orientations of a truss and parameters in an ecological model of a colony of flour beetles.

Typology: Study notes

Pre 2010

Uploaded on 07/29/2009

koofers-user-6kw
koofers-user-6kw 🇺🇸

9 documents

1 / 12

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Scientific Computing with Case Studies
SIAM Press, 2009
http://www.cs.umd.edu/users/oleary/SCCSwebpage
Lecture Notes for Unit VI
Nonlinear Equations and Continuation Methods
Dianne P. O’Leary
c
2008
The problem
Given a function F:Rn Rn, find a point x Rnsuch that
F(x) = 0.
Note: The one-dimensional case (n= 1) is covered in CMSC/AMSC 460. The
best software for this problem is some variant on Richard Brent’s zeroin,
available in Netlib. In Matlab, it is called fzero.
We’ll assume from now on that n > 1.
Goals
To develop algorithms for solving nonlinear systems of equations.
Note: Solving nonlinear equations is a close kin to solving optimization problems.
Easier than optimization, since a “local solution” is just fine.
Harder than optimization, since there is no natural merit function f(x)to
measure our progress.
Important note: If Fis a polynomial in the variables x, then use special purpose
software that enables you to find all of the solutions reliably.
Example of a polynomial system:
x2y3+xy = 2
2xy2+x2y+xy = 0
Pointers: Watson’s homotopy method; Traub’s software.
What we know
1
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Solving Complex Equations with Nonlinear Methods and Scientific Computing - Prof. Dianne P and more Study notes Mathematics in PDF only on Docsity!

Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/SCCSwebpage Lecture Notes for Unit VI Nonlinear Equations and Continuation Methods Dianne P. O’Leary ©^ c 2008

The problem

Given a function F : Rn^ → Rn, find a point x ∈ Rn^ such that

F(x) = 0.

Note: The one-dimensional case (n = 1) is covered in CMSC/AMSC 460. The best software for this problem is some variant on Richard Brent’s zeroin, available in Netlib. In Matlab, it is called fzero.

We’ll assume from now on that n > 1.

Goals

To develop algorithms for solving nonlinear systems of equations.

Note: Solving nonlinear equations is a close kin to solving optimization problems.

  • Easier than optimization, since a “local solution” is just fine.
  • Harder than optimization, since there is no natural merit function f (x) to measure our progress.

Important note: If F is a polynomial in the variables x, then use special purpose software that enables you to find all of the solutions reliably.

Example of a polynomial system:

x^2 y^3 + xy = 2 2 xy^2 + x^2 y + xy = 0

Pointers: Watson’s homotopy method; Traub’s software.

What we know

We already know a lot about solving nonlinear equations. The main tools are Newton’s method and Newton-like methods.

Differences from the methods we have studied:

  • Instead of the Hessian matrix, we have the Jacobian matrix of first derivatives: Jik =

∂Fi ∂xk

This matrix is generally not symmetric.

  • Line searches are more difficult to guide, since we can’t measure progress using the function f (x). Some attempts have been made to use

‖F(x)‖

as a merit function, but there are difficulties with this approach.

The Plan

  • Newton-like methods (for easy problems): Section 24.
  • Globally-convergent continuation methods (for hard problems): Section
  • Case study: Chapter 25 compares these methods on the problem of determining orientations of a truss.
  • Case study: Chapter 26 uses these methods to determine parameters in an ecological model of a colony of flour beetles.

Note: The nonlinear least squares problem, discussed in Section 24.2 and Chapter 13, is one approach to solving nonlinear equations for which

  • there are more equations than unknowns.
  • measurement error contaminates the equations.

Newton-like methods

  • applied to nonlinear least squares
  • applied to nonlinear equations

Now we can fit a linear function to F:

F(x(k)^ + p) ≈ F(x(k)) + J(x(k))p = 0 → J(x(k))p = −F(x(k)).

  • There is no notion of “downhill” because there is no function f.
  • There is no notion of “progress” because there is no function f.

But if we still find the Newton direction from solving J(x(k))p(k)^ = −F(x(k)), then we are still fitting a linear function to F.

As for the line search, we will just eliminate it and set αk = 1, or we will use ‖F(x)‖ as a merit function.

The resulting algorithm is Newton’s method for nonlinear equations:

Until x(k)^ is a good enough solution,

Find a direction p(k)^ by solving J(x(k))p(k)^ = −F(x(k)). Set x(k+1)^ = x(k)^ + αkp(k)^ where αk = 1 or αk is determined by a line search.

Convergence result: Under some “standard assumptions” (2nd derivatives exist, simple zero, etc.), the algorithm is guaranteed to converge to a solution if started close enough to it, and the convergence rate is quadratic.

Newton-like methods

  • Finite Difference Newton method: If J is not available, we can approximate it using finite differences. Again, this is not recommended; use the next method instead.
  • Inexact Newton method: Instead of solving the linear system J(x(k))p(k)^ = −F(x(k)) exactly, we can use an iterative method to obtain an approximate solution. The usual choice of iterative method is GMRES, a relative of conjugate gradients, and matrix-vector products are evaluated by differencing F.
  • Quasi-Newton methods: If storage is not a problem, then instead of the inexact Newton method, we can store and update an approximation to J. The usual formula is Broyden’s method:

B(k+1)^ = B(k)^ +

(y − B(k)s)sT sT^ s

where y = F(x(k+1)) − F(x(k)) and s = x(k+1)^ − x(k). Then

B(k+1)s = y Secant condition

and if sT^ v = 0 , then B(k+1)v = B(k)v.

Convergence result: As in the minimization case, we get superlinear convergence if the direction is (asymptotically) close enough to the Newton direction.

Globally-convergent continuation methods

Globally-convergent continuation methods

The basic idea:

  • Want to solve F(x) = 0.
  • Know the solution to some easy problem Fa(x) = 0. Example: Fa(x) = x − a for some constant vector a.
  • Formulate the problem

ρa(λ, x) = λF(x) + (1 − λ)Fa(x)

where λ is a real number in the interval [0, 1].

  • Note that the solution to ρa(0, x) = 0 is known (for our example, it is just a), and the solution to ρa(1, x) = 0 is the solution to our desired problem.

We have just seen one example of how to construct a homotopy function ρa(λ, x) so that solving ρa(λ, x) = 0 is easy when λ = 0 and solves the desired problem when λ = 1.

There are many other useful ways to define a homotopy, and we will use some alternatives in the case studies.

The basic algorithm

Initialize λ = 0 and x = solution to Fa(x) = 0.

While λ < 1 ,

Parameterized Sard’s Theorem: Let U = Rn^ × [0, 1) × Rn^ and define a function ρ : U → Rn^ in C^2 which is transversal to zero on U. (We’ll call its variables (a, λ, z).) Choose a point a ∈ Rn^ and define

ρa(λ, z) = ρ(a, λ, z).

Then the map ρa is transversal to zero on [0, 1) × Rn^ for almost all a.

What “for almost all” means:

  • “except for a set of measure zero”.
  • If we choose a point a at random, it has probability zero of giving a function that is not transversal to zero.

Why we need this:

  • If we encounter a zero that does not have a full-rank Jacobian, then it may be impossible to continue the algorithm: - The solution path may bifurcate. - The solution path may end.
  • If we write a computer program and choose a at random, it will almost never fail to give a function that is transversal to zero, and if it does, choosing a different a will almost always work.

Our solution curve is almost always “nice”

Theorem: In addition to the hypotheses above, assume that the equation ρa(0, z) = 0 has a unique solution z 0. Then for almost all a ∈ Rm, there is a curve γ of solutions to ρa(λ, z) = 0 , emanating from (0, z 0 ), with the properties:

  • The Jacobian matrix has (full) rank n at all points in γ.
  • The curve γ is smooth.
  • The curve does not intersect itself or any other solution curve.
  • The curve does not bifurcate.
  • The curve either goes to infinity or reaches the hyperplane λ = 1.

In summary, the curve is very well-behaved! In fact, we can hope to walk along it.

Where does the solution curve go?

Theorem: Under the hypotheses above,

  • If the solution curve γ is bounded, then it has an accumulation point (1, ˆz).
  • If the Jacobian is full rank at the accumulation point, then γ has finite length.

In other words, the curve is going just where we want to go, and it gets there finitely, as long as the Jacobians are bounded away from rank-deficient matrices.

Picture. The above theorem means that we have a solution curve that starts at λ = 0 and goes to our desired solution at λ = 1. But there may be a countable number of other solution curves, some closed and some with two endpoints at λ = 1.

Note that we need to be careful not to step onto one of these other curves.

An example: convex optimization

Suppose we want to find x∗^ to solve the problem

min x f (x)

where f : Rn^ → R^1 is a C^2 convex function that is bounded below. (This last assumption assures that a bounded solution exists.)

We define the mapping

ρa(λ, x) = λ∇f (x) + (1 − λ)(x − a)

Will this homotopy generate a solution curve that terminates in (1, x∗)?

It is sufficient to verify the hypotheses of our three theorems:

  • Let U = Rm^ × [0, 1) × Rn^ and define a function ρ : U → Rn in C^2 which is transversal to zero on U. Let’s look at the Jacobian matrix for the mapping. The Jacobian with respect to the x variables is

Jx = λH(x) + (1 − λ)I ,

where H is the Hessian matrix of f. Now H is positive semi-definite since f is convex, so Jx has rank n for λ ∈ [0, 1), so the complete Jacobian has full-rank, as required.

Note: This is a nice example, and the result means that the continuation method is one approach to solving convex optimization problems, but such problems are generally too easy to require such heavy machinery. Use one of the Newton-type algorithms to solve convex optimization problems.

What we have accomplished

To solve the nonlinear system of equations F(x) = 0 , under the above hypotheses, all we need to do is to follow the solution curve

ρa(λ, x) = λF(x) + (1 − λ)Fa(x) = 0

until λ = 1.

Following the solution curve

We want to follow the solution curve from λ = 0 to λ = 1.

Note first that we don’t need to be too careful when following the curve – we don’t care about the values at any points in between λ = 0 and λ = 1.

Note second that we need to be careful enough that we don’t start walking along any other solution curve, or we will lose our way!

Method 1

Given a function ρa, we set λ = 0 and ˆx = a. Then

ρa(λ, ˆx) = 0.

Until λ = 1,

Increase λ a little bit. Solve ρa(λ, x) = 0 using your favorite algorithm (Newton-like, etc.) with ˆx as a starting guess. Call the solution ˆx.

Upon termination, we have computed ˆx so that ρa(1, ˆx) = 0 , so ˆx solves the problem F(x) = 0.

Issues:

  • choosing the stepsize in λ.
  • choosing the tolerance in the nonlinear equation solver.

Method 2

Issues of choosing a stepsize and tolerance are troublesome, and we also face them in Unit 5, in solving ordinary differential equations (ode’s). Can we use our ode machinery to solve the homotopy problem?

Let’s differentiate our curve. We could differentiate with respect to λ, but then the derivative would fail to exist at any turning point and we might get in trouble.

Instead, we’ll introduce a new independent variable and differentiate with respect to it. The most convenient one is s = arc length.

Now ρa(λ(s), x(s)) = 0 ,

so d ds ρa(λ(s), x(s)) = 0

and we have the intial conditions

λ(0) = 0 , x(0) = a.

When the solution curve reaches λ(s) = 1, we are finished; the resulting x(s) solves our nonlinear system.

For uniqueness, we normalize so that ∥ ∥ ∥ ∥(^

dλ ds

dx ds

∥ = 1^.

What does this system of ode’s look like when Fa(x) = x − a?

dλ ds

F(x) + λJ(x)

dx ds

dλ ds

(x − a) + (1 − λ)

dx ds

dλ ds

∑^ n

i=

dxi ds

The system is implicit.

In principle, we could just plug the system into our favorite ode solver, and this is good for prototyping algorithms, but in practice the ode solver should be tailored to this problem: