






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
An introduction to finite difference methods for solving first order initial value problems of ordinary differential equations (odes). Problem stability, stability analysis, and the relationship between problem stability and numerical stability. The document also introduces forward euler and backward euler methods, and discusses their stability regions. Students of mathematics, engineering, or computer science may find this document useful for understanding the basics of odes and numerical methods.
Typology: Study notes
1 / 12
This page cannot be seen from the preview
Don't miss anything!







In these notes we will look at one type of numerical method (finite difference methods) for one problem–a first order initial value problem of the form
y′(t) = f (y, t), (1a) y(t 0 ) = y 0 , (1b)
where in general y ∈ Cm^ is a vector, and f (y, t) ∈ Cm^ is a vector valued, possibly nonlinear, function^1. The goal will be to find a numerical approximation to the solution y(t) over some fixed interval t = [t 0 , T ], as a discrete set of values Yi ∈ Cm, i = 0,... , N where Yi is a pointwise approximation to y(ti). This means that we have broken the domain t = [t 0 , T ] into a set of points t 0 , t 1 ,... , tN , with tN = T , and we find an approximation to the continuous solution at those discrete points (see Figure 1). Note that the system (1) is more general that it might first appear, since you can turn a system of ODEs with higher-order derivatives into a first-order system by adding components to the solution vector y(t). For instance, consider the following example of a set of two second order ODEs
x′′(t) = h(x, z, x′, z′), (2a) z′′(t) = g(x, z, x′, z′), (2b)
where x and z are scalar functions, with initial conditions x(0) = a, z(0) = b, x′(0) = c and z′(0) = c. By letting
y =
y 1 y 2 y 3 y 4
x z x′ z′
we can derive a first order system the governs the vector y(t):
y′^ = f (y), (4) (^1) We will assume that f satisfies properties such that our solution is unique and multiply differentiable, etc. etc. but not concern ourselves with such issues here.
Figure 1: Numerical approximation for y(t) is the set of values Yi, i = 0,... , N , where Yi is the discrete approximation to y(ti), for i = 0,... , N.
where
f (y) =
x′ z′ h(x, z, x′, z′) g(x, z, x′, z′)
y 2 y 3 h(y) g(y)
with initial conditions
y(0) =
a b c d
So by being able to solve (1), we really can solve any higher-order nonlinear system. Additionally, we can eliminate the direct dependence of f (y, t) on t, by letting t be a component of the solution vector y ∈ Cm. That is, we add another component to y, which is t. The governing ODE for that last component is just (ym+1)′^ = t′^ = 1. This is not necessary to use the numerical methods we’ll
If the norm of the solution to (11) grows in time, the original problem (7) is unstable, if it damps to zero in time the original problem is stable, and if it remains constant the original problem is sometimes said to be “neutrally stable.” In order to analyze (11) for nonlinear functions, we will do a form of linear stability analysis. The idea is as follows. We know ¯y(t 0 ), it is just the initial condition of the original problem (7). We want to consider arbitrary (yet “small”) values for . We then do a Taylor expansion of the right hand side of the ODE in (11), about the state ¯y which gives
f (¯y) − f (˜y) = f ′(¯y)(¯y − y˜) + O(z^2 ) = f ′(¯y)z + O(z^2 ). (12)
(Again, in the case of systems where y is a vector, the term f ′^ refers to the Jacobian matrix in the Taylor expansion of f. Now, if is small enough we can ignore O(z^2 ) initially, which implies that z(t) is initially governed by something close to
z′(t) = f ′(¯y)z, (13a) z(t 0 ) = . (13b)
This is a linear system for z resembling
z′(t) = λz, (14a) z(t 0 ) = z 0 = , (14b)
the solution to which is of course z(t) = z 0 eλ(t−t^0 ). If <(λ) < 0, then z(t) decays to zero, meaning that ˜y approaches ¯y, and O(z^2 ) terms in (12) can continue to be neglected. Now, f ′(¯y) is really a function of time in general, since ¯y does not need to be stationary. So problem stability can change as the solution evolves if the sign of the real part of f ′(¯y) changes. (For problems where ¯y is constant, such as if it is a critical point of f , stability doesn’t change as the solution evolves.) If y ∈ Cm, we might write (13) as z′(t) = Az, (15a) z(t 0 ) = z 0 = , (15b)
where A ∈ Cm×m^ is the m by m Jacobian f ′(¯y). The solution to (15) is
z(t) = eAtz 0 , (16)
where eAt^ ∈ Cm×m^ is the matrix exponential, defined by the series
eAt^ = I + At +
A^2 t^2 2
Aktk k!
If A is diagonalizable
Ak^ = (RΛR−^1 )k^ = (RΛR−^1 )(RΛR−^1 ) · · · (RΛR−^1 ) = RΛkR−^1 , (18)
since all of the R−^1 R terms are the identity. We can therefore write the series for the matrix exponential as
eAt^ = R(I + Λt + Λ^2 t^2 2
Λktk k!
⇒ R−^1 eAtR = I + Λt +
Λ^2 t^2 2
Λktk k!
The significance of this, is that eΛt^ is diagonal, since all the terms in its series are diagonal (the products of diagonal matrices are still diagonal.) That is
eΛt^ =
eλ^1 eλ 2
... eλ m
where λj^ is the jth^ eigenvalue of A. So we have
z(t) = eAtz 0 ⇒ R−^1 z(t) = R−^1 eAtz 0 ⇒ w(t) = R−^1 eAtRw 0 , (21a) ⇒ w(t) = eΛtw 0 , (21b)
where w 0 = R−^1 z 0. In other words, by changing to the basis of eigenvectors, we see that each component wj^ of the solution obeys the scalar equation
wj^ (t) = eλ j (^) t w 0 j. (22)
Therefore, if the real part of all of the eigenvalues of f ′(¯y) have negative real parts, each component of w(t) decays, so z(t) decays, and ˜y approaches ¯y. When looking at a general nonlinear system then, it is the eigenvalues of the Jacobian at any state ¯y that determine problem stability. When doing linear stability analysis, terms such as “small” when referring to , depend on the relative size of first and higher order terms in the Taylor expansion (12), which depends on the specific problem at hand since it depends on the size of f ′′(¯y) in relation to f ′(¯y) etc. etc. Don’t be too concerned about whether this analysis is always valid. It depends on the problem. For some problems linear stability analysis is sufficient and others it may not be. (A good example is when the real part of all eigenvalues is zero. Stability then depends on higher order terms in the Taylor expansion.) For our purposes, we are content to examine how our numerical methods will behave on stable systems. This section therefore, is really just an explanation of why later we will focus on the test problem
u′(t) = λu, (23a) u(t 0 ) = u 0. (23b)
If we consider how a numerical method treats (23), it gives us an indication of how it will treat more general problems where the true solution is perturbed from ¯y(t) to ˜y(t) at any time.
3 Finite Difference Methods
Finite difference methods provide formulas to generate a numerical sequence
U 0 , U 1 , U 2 ,... , UN , (24)
as an approximation to the solution u(t) of the ODE:
u′(t) = f (u), (25a) u(t 0 ). (25b)
Schemes that require solving implicit functions at each time-step (due to the f (Un+1)) such as Backward Euler, are called implicit. Implicit schemes therefore come with an added cost, however, as we will see when we talk about stability, they convey some benefit, and are in fact imperative for some ODEs. Another implicit scheme is the Trapezoid Method:
Un+1 = Un + ∆t 2
(f (Un) + f (Un+1)) , (32a) U 0 = u(t 0 ). (32b)
The Trapezoid Method (32) would also need Newton iteration (31). The methods we have discussed so far are all 1-step methods, since Un+1 only depends directly on Un. More generally, we might define Linear Multistep Methods, of the form
∑^ r
j=
αj Un+j =
∑^ r
j=
βj f (Un+j ), (33)
where αj and βj are constants that define this general r-step method. Even the 1-step methods we’ve looked at above are Linear Multistep Methods, with r = 1. Note that if βr = 0 the method is explicit, otherwise it is implicit. An example of an explicit 2-step method (r=2), is the Midpoint Method:
Un+1 = Un− 1 + 2∆tf (Un). (34)
You might notice that we need to find U 1 some other way, before we can start using (34). Typically some other 1-step method would be used to find U 1 , and then then (34) is used for U 2 , U 3 ,... , UN. More generally, an r-step method must find the first U 0 , U 1 ,... , Ur− 1 starting values using other methods. However, since only a small number of steps are used to generate these values, it doesn’t change the properties of the method overall.
The global error of our numerical method is simply the distance between the numerical solution and the actual solution in some norm. It is defined at each point tn, n = 0, 1 ,... , N. That is
en = ||Un − u(tn)||. (35)
Ultimately we are interested in the global error, however, it can be difficult to ascertain the global error directly, since of course we don’t know the true solution generally. The truncation error is another type of measure of the methods accuracy. Note that our numerical solution sequence {Un}Nn=0 is the solution to the difference equation
∑^ r
j=
αj Un+j −
∑^ r
j=
βj f (Un+j ) = 0. (36)
However, if we plug the true solution into (36) as
∑^ r
j=
αj u(tn+j ) −
∑^ r
j=
βj f (u(tn+j )), (37)
it will not be equal to zero. That is, if we turn the true solution into a discrete set of points u(t 0 ), u(t 1 ),... , u(tN ) by evaluating the true solution at the points t 0 , t 1 ,... , tN , it won’t satisfy the same difference equation (36) that the numerical solution satisfies. The truncation error is a measure of how far the difference equation is from the true ODE. This is most easily explained by example. Consider Forward Euler (28). To find the truncation error, we arrange (28) so that it resembles the actual ODE, not (26). That is, we rearrange Forward Euler as
Un+1 − Un ∆t − f (Un) = 0. (38)
The numerical solution is the solution to this difference equation. Now, the truncation error is defined by
u(tn+1) − u(tn) ∆t
− f (u(tn)) = τn. (39)
The truncation error tells us how good or bad our difference equation approximates the actual ODE. Now, we need to find a more informative value for τn. The trick is to use Taylor expansions of u(t), about the point tn. Note that
u(tn+1) − u(tn) ∆t = u′(tn) +
u′′(tn)∆t 2
u′′′(tn)∆t^2 3!
Because of the presence of the ∆t term on the right hand side of (45), the left hand side is known as a “first-order” approximation to u′(tn). Now, we know from the ODE that f (u(tn)) = u′(tn). So plugging that and (45) into (39) gives
τn =
u′′(tn)∆t 2
u′′′(tn)∆t^2 3!
In fact, we often write that
τn = u′′(tn)∆t 2
since that indicates the lowest order error. Since this involves ∆t, it is known as a “first-order” method. If the lowest order term was a constant times ∆t^2 , we would say that it is a “second-order” method, and so on. Therefore, “higher-order” methods are more accurate, since the truncation error is smaller as ∆t gets small. As one more example, we’ll look at the Trapezoid Method. We write the method as
Un+1 − Un ∆t
(f (Un) + f (Un+1)) = 0. (43)
Convergence of a method means that as ∆t → 0, UN → u(tN ). More precisely, if we let t 0 = 0, we say that a method on a problem (25) is convergent, if
lim ∆t→ 0 ,N ∆t=T
||UN − u(tN )|| = 0. (49)
We say that a method is convergent, if (49) is satisfied for all T , and all problems (25) (where f (u) guarantees a unique differentiable solution for all finite T .)^2
4 Numerical Stability
Numerical stability addresses the question of how numerical errors accumulate. There are different types of numerical stability. We will only look at one type in any detail known as absolute stability. However, here are the main types of stability as a summary.
4.1.1 Zero Stability
Zero stability addresses whether convergence is achieved. Therefore, it is only concerned with the stability of the method in the limit as ∆t → 0. In fact, we can say that for all Linear Multistep Methods (33)
Consistency + Zero Stability ⇔ Convergence. (50)
Furthermore, the global error will typically be of the same order as the truncation error. We always hope to use a convergent method, so we always expect zero stability. (For nonlinear PDEs, this might even be a lofty goal!). Determining zero stability of a method will be explained when we look at a more general type of stability known as absolute stability.
4.1.2 Absolute Stability
Absolute stability is a more general type of stability theory used to investigate how error accumu- lates even for finite ∆t, not necessarily in the limit ∆t → 0. This is often much more practical and useful, since in practice we always use some finite ∆t in applications. Absolute stability is more relevant, particularly in the field of scientific computation, where scientists simulate “real world” problems. Unfortunately, in practice, often it is erroneously assumed that convergence is all that matters. As we will see, a zero-stable convergent method can be unstable for all finite ∆t. (You saw this in your homework—Euler’s method is a convergent method, but is unstable for the pen- dulum problem.) We will look at absolute stability in detail in Section 5. Absolute stability theory is usually confined to stable problems. For unstable problems, absolute stability of a numerical method will not tell you much about global error.
(^2) From ODE theory, f (u) must be Lipschitz continuous for such a result.
4.1.3 Relative Stability
One might ask, as evidenced in class, “what if I’m dealing with an unstable problem.” Relative stability tells you how the relative error in the solution grows or decays, even if the solution is unstable. Additionally, one might hope that even for a stable problem, the error should decay faster than the true deviation ¯y − ˜y, which itself decays for a stable problem. Relative stability can be a complicated subject. If you are interested, you might do a Google search on “order stars and relative stability.” We obviously won’t address this fascinating topic. (No, “fascinating” wasn’t meant as sarcasm.)
5 Absolute Stability
As mentioned, absolute stability tells how error decays for a finite time-step ∆t. It is usually only informative for stable, or neutrally stable problems. We consider the scalar “test problem”
u′(t) = λu. (51)
Remember that we are thinking of this test problem as the governing equation for the difference between the true solution and a perturbed solution. Since we are considering linear methods, the numerical solution will treat this deviation similarly to the way that it treats (51). Conveniently then, we will isolate ourselves to (51). The idea is best introduced by example. To adopt standard notation in the literature, let’s denote ∆t = k. For Forward Euler, the numerical solution satisfies the difference equation
Un+1 = Un + kf (Un) = Un + kλUn. (52)
Solving for Un+1 gives
Un+1 + (1 + kλ)Un. (53)
Iterating, this implies that the numerical solution at any tn is
Un+1 = (1 + kλ)nU 0. (54)
Now, we look at values of kλ such that the solution either grows or decays. We allow λ to be complex. Now, if |1+kλ| > 1, the modulus of Un ∈ C grows as n increases. It decays if |1+kλ| < 1, and remains constant if |1 + kλ| = 1. We can graph these regions of the complex plane z = λk. The stable region |1 + kλ| < 1 is separated from the unstable region |1 + kλ| > 1, by the unit circle |1 + kλ| = 1, centered at kλ = −1, and intersecting the origin. This unit circle is known as the stability region for Forward Euler. For a given λ, this means that k must be chosen such that kλ is in the stability region, if we expect the solution to (51) to decay. This is somewhat nonintuitive—for very stable problems, where <(λ) << 0, we would need a very small step size k to put λk in the stable region guaranteeing a decaying numerical solution. Otherwise the numerical solution would grow in magnitude, even for a true solution that is rapidly decaying to zero! Forward Euler is therefore usually not a good choice for problems with eigenvalues large in magnitude, with negative real parts.