

An analysis of rounding errors in floating point arithmetic, focusing on the behavior of addition, subtraction, multiplication, and division. It discusses the concept of catastrophic cancellation and its impact on the approximation of sums and differences of large numbers. The document also introduces forward and backward error analysis as methods to understand and quantify errors in floating point computations.


$$\hat{x} = x(1 + \tau_1) \quad\text{and}\quad \hat{y} = y(1 + \tau_2), \qquad |\tau_i| \le \tau \ll 1,$$
where the $\tau_i$ may be relative errors incurred in collecting the data
from the original source or in previous operations.
Question: how do the four basic arithmetic operations behave?
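$$
\begin{aligned}
\mathrm{fl}(\hat{x} + \hat{y}) &= (\hat{x} + \hat{y})(1 + \delta), \qquad |\delta| \le \tfrac{1}{2}\epsilon \\
&= x(1 + \tau_1)(1 + \delta) + y(1 + \tau_2)(1 + \delta) \\
&= x + y + x\,(\tau_1 + \delta + O(\tau\epsilon)) + y\,(\tau_2 + \delta + O(\tau\epsilon)) \\
&= (x + y)\left(1 + \frac{x}{x + y}\,(\tau_1 + \delta + O(\tau\epsilon)) + \frac{y}{x + y}\,(\tau_2 + \delta + O(\tau\epsilon))\right) \\
&\equiv (x + y)(1 + \hat{\delta}),
\end{aligned}
$$
where $\hat{\delta}$ can be bounded as follows:
$$
|\hat{\delta}| \le \frac{|x| + |y|}{|x + y|}\left(\tau + \tfrac{1}{2}\epsilon + O(\tau\epsilon)\right).
$$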
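Three possible cases:
1. If $x$ and $y$ have the same sign, i.e., $xy > 0$, then $|x + y| = |x| + |y|$, which
implies $|\hat{\delta}| \le \tau + \tfrac{1}{2}\epsilon + O(\tau\epsilon) \ll 1$.
Thus $\mathrm{fl}(\hat{x} + \hat{y})$ approximates $x + y$ well.
2. If $x \approx -y$, then $|x + y| \approx 0$ and $(|x| + |y|)/|x + y| \gg 1$, which implies
that $|\hat{\delta}|$ could be nearly or much bigger than 1. Thus $\mathrm{fl}(\hat{x} + \hat{y})$ may
turn out to have nothing to do with the true $x + y$. This is the so-called
catastrophic cancellation, which happens when a floating point number is
subtracted from another nearly equal floating point number. Cancellation
causes relative errors or uncertainties already present in $\hat{x}$ and $\hat{y}$ to be
magnified.
3. In general, if $(|x| + |y|)/|x + y|$ is not too big, $\mathrm{fl}(\hat{x} + \hat{y})$ provides a good
approximation to $x + y$.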
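The cases above can be checked numerically. The following Python sketch is an illustration and not part of the original notes: it perturbs $x$ and $y$ by relative errors of size `tau`, adds them in IEEE double precision, and compares the observed relative error with the amplification factor $(|x| + |y|)/|x + y|$. The helper names `amplification` and `relative_error_of_sum`, the value `tau = 1e-10`, and the test inputs are assumptions chosen for the demonstration.

```python
from fractions import Fraction

def amplification(x, y):
    """Return (|x| + |y|) / |x + y|, evaluated exactly with rationals."""
    fx, fy = Fraction(x), Fraction(y)
    return float((abs(fx) + abs(fy)) / abs(fx + fy))

def relative_error_of_sum(x, y, tau1, tau2):
    """Relative error of fl(xhat + yhat) with respect to the true x + y."""
    # Forming the perturbed inputs adds roundings of order 1e-16,
    # negligible next to the perturbations used here.
    xhat = x * (1.0 + tau1)
    yhat = y * (1.0 + tau2)
    computed = Fraction(xhat + yhat)    # fl(xhat + yhat), converted exactly
    exact = Fraction(x) + Fraction(y)   # true x + y
    return float(abs(computed - exact) / abs(exact))

tau = 1e-10  # hypothetical size of the input errors

# Case 1: same sign, amplification factor equals 1.
same_sign = relative_error_of_sum(1.0, 2.0, tau, -tau)

# Case 2: x is close to -y, amplification factor is about 2e9.
y_near = -(1.0 - 1e-9)
cancelling = relative_error_of_sum(1.0, y_near, tau, -tau)

print("same sign:", same_sign, "factor:", amplification(1.0, 2.0))
print("x ~ -y   :", cancelling, "factor:", amplification(1.0, y_near))
```

With these inputs the same-sign error stays at the level of `tau`, while in the cancelling case an input error of 1e-10 grows into a relative error of order 0.1 in the sum.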
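$$
\begin{aligned}
\mathrm{fl}(\hat{x} \times \hat{y}) &= (\hat{x} \times \hat{y})(1 + \delta) = xy\,(1 + \tau_1)(1 + \tau_2)(1 + \delta) \equiv xy\,(1 + \hat{\delta}_\times), \\
\mathrm{fl}(\hat{x} / \hat{y}) &= (\hat{x} / \hat{y})(1 + \delta) = (x/y)(1 + \tau_1)(1 + \tau_2)^{-1}(1 + \delta) \equiv (x/y)(1 + \hat{\delta}_\div),
\end{aligned}
$$
where
$$
\hat{\delta}_\times = \tau_1 + \tau_2 + \delta + O(\tau\epsilon), \qquad
\hat{\delta}_\div = \tau_1 - \tau_2 + \delta + O(\tau\epsilon).
$$
Thus $|\hat{\delta}_\times| \le 2\tau + \tfrac{1}{2}\epsilon + O(\tau\epsilon)$ and
$|\hat{\delta}_\div| \le 2\tau + \tfrac{1}{2}\epsilon + O(\tau\epsilon)$.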
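As a quick sanity check of these two bounds, the sketch below (an illustration, not taken from the notes) multiplies and divides perturbed inputs in IEEE double precision and measures the relative error exactly with rational arithmetic. The constant `U = 2**-53` stands in for the bound on $|\delta|$ and is an assumption about the arithmetic in use; the inputs and perturbations are hypothetical.

```python
from fractions import Fraction

U = 2.0 ** -53  # unit roundoff of IEEE double precision (assumed arithmetic)

def rel_err(computed, exact):
    """Relative error of a double against an exact rational value."""
    return float(abs(Fraction(computed) - exact) / abs(exact))

x, y = 3.0, 7.0
tau1, tau2 = 1e-10, -2e-10           # hypothetical input errors
tau = max(abs(tau1), abs(tau2))

# Forming xhat and yhat adds roundings of order U, negligible next to tau.
xhat = x * (1.0 + tau1)
yhat = y * (1.0 + tau2)

err_mul = rel_err(xhat * yhat, Fraction(x) * Fraction(y))   # about |tau1 + tau2|
err_div = rel_err(xhat / yhat, Fraction(x) / Fraction(y))   # about |tau1 - tau2|
bound = 2 * tau + U

print("multiplication:", err_mul)
print("division      :", err_div)
print("bound         :", bound)
```

Unlike addition, neither operation has an amplification factor: both errors stay below roughly $2\tau$ regardless of the values of $x$ and $y$.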
Example 1. Computing $\sqrt{n + 1} - \sqrt{n}$ in the straightforward way causes a
substantial loss of significant digits for large $n$:
n          fl(√(n+1))                fl(√n)                    fl(fl(√(n+1)) − fl(√n))
1.00e+10   1.00000000004999994e+05   1.00000000000000000e+05   4.99999441672116518e-06
1.00e+11   3.16227766018419061e+05   3.16227766016837908e+05   1.58115290105342865e-06
1.00e+12   1.00000000000050000e+06   1.00000000000000000e+06   5.00003807246685028e-07
1.00e+13   3.16227766016853740e+06   3.16227766016837955e+06   1.57859176397323608e-07
1.00e+14   1.00000000000000503e+07   1.00000000000000000e+07   5.02914190292358398e-08
1.00e+15   3.16227766016838104e+07   3.16227766016837917e+07   1.86264514923095703e-08
1.00e+16   1.00000000000000000e+08   1.00000000000000000e+08   0.00000000000000000e+00
Catastrophic cancellation can sometimes be avoided if a formula is properly
reformulated. In the present case, one can compute $\sqrt{n + 1} - \sqrt{n}$ almost to
full precision by using the equality
$$
\sqrt{n + 1} - \sqrt{n} = \frac{1}{\sqrt{n + 1} + \sqrt{n}}.
$$
Consequently, the computed results are
n          fl(1/(√(n+1) + √n))
1.00e+10   4.999999999875000e-06
1.00e+11   1.581138830080237e-06
1.00e+12   4.999999999998749e-07
1.00e+13   1.581138830084150e-07
1.00e+14   4.999999999999987e-08
1.00e+15   1.581138830084189e-08
1.00e+16   5.000000000000000e-09
In fact, one can show that
$\mathrm{fl}\big(1/(\sqrt{n + 1} + \sqrt{n})\big) = (\sqrt{n + 1} - \sqrt{n})(1 + \delta)$, where
$|\delta| \le 5\epsilon + O(\epsilon^2)$ (try it!).
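The comparison in Example 1 can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the program used to produce the tables: the function names `naive`, `reformulated`, `reference`, and `rel_err` are hypothetical, and the reference values come from 50-digit arithmetic in Python's standard `decimal` module.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # 50 significant digits for the reference values

def naive(n):
    """sqrt(n + 1) - sqrt(n), evaluated directly in double precision."""
    return math.sqrt(n + 1.0) - math.sqrt(n)

def reformulated(n):
    """1 / (sqrt(n + 1) + sqrt(n)), the form without subtraction."""
    return 1.0 / (math.sqrt(n + 1.0) + math.sqrt(n))

def reference(n):
    """High-precision value of sqrt(n + 1) - sqrt(n)."""
    d = Decimal(n)
    return (d + 1).sqrt() - d.sqrt()

def rel_err(value, n):
    """Relative error of a double-precision result against the reference."""
    ref = reference(n)
    return float(abs(Decimal(value) - ref) / ref)

for k in range(10, 17):
    n = float(10 ** k)
    print(f"{n:.2e}  naive {rel_err(naive(n), n):.1e}  "
          f"reformulated {rel_err(reformulated(n), n):.1e}")
```

The relative error of the naive formula grows with $n$ until every digit is lost at $n = 10^{16}$, where $n + 1$ rounds back to $n$, while the reformulated expression stays within a few units of roundoff.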
Example 2. Consider the function
$$
f(x) = \frac{1 - \cos x}{x^2} = \frac{1}{2}\left(\frac{\sin(x/2)}{x/2}\right)^2.
$$
Note that $0 \le f(x) < 1/2$ for all $x \ne 0$.
Compare the computed values for $x = 1.2 \times 10^{-5}$ using the above two
expressions (assume that the value of $\cos x$ is rounded to 10 significant figures).
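One way to carry out the comparison in Example 2 is sketched below in Python. The helper `round_sig` is hypothetical and simulates a 10-significant-figure value by formatting and re-parsing the number; rounding $\sin(x/2)$ to 10 figures as well is an assumption made here so that both expressions start from equally accurate inputs.

```python
import math

def round_sig(value, digits):
    """Round value to the given number of significant decimal digits."""
    return float(f"{value:.{digits - 1}e}")

x = 1.2e-5

# First expression: (1 - cos x) / x^2, with cos x known to 10 figures.
c = round_sig(math.cos(x), 10)
f_direct = (1.0 - c) / x ** 2

# Second expression: (1/2) * (sin(x/2) / (x/2))^2, with sin(x/2) to 10 figures.
s = round_sig(math.sin(x / 2), 10)
f_stable = 0.5 * (s / (x / 2)) ** 2

print("direct:", f_direct)
print("stable:", f_stable)
```

The first expression returns a value near 0.69, which violates the bound $f(x) < 1/2$: the rounded cosine is correct to 10 figures, but the subtraction $1 - \cos x$ cancels all of them and exposes the rounding error. The second expression involves no subtraction and stays within about $10^{-10}$ of $1/2$.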
We illustrate the basic idea through a simple example. Consider the
computation of an inner product of two vectors $x, y \in \mathbb{R}^3$,
$$
x^T y \overset{\mathrm{def}}{=} x_1 y_1 + x_2 y_2 + x_3 y_3,
$$
assuming that the $x_i$'s and $y_j$'s are already floating point numbers. It is likely that
$\mathrm{fl}(x^T y)$ is computed in the following order:
$$
\mathrm{fl}(x^T y) = \mathrm{fl}\Big(\mathrm{fl}\big(\mathrm{fl}(x_1 y_1) + \mathrm{fl}(x_2 y_2)\big) + \mathrm{fl}(x_3 y_3)\Big).
$$
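The evaluation order above can be mimicked directly in Python, where each floating point operation is rounded once. The sketch below is an illustration only: the function names and test vectors are hypothetical, `U = 2**-53` assumes IEEE double precision, and the bound $\gamma_3\,|x|^T|y|$ with $\gamma_3 = 3U/(1 - 3U)$ is the standard textbook estimate for a three-term inner product, which is not stated in the text shown here.

```python
from fractions import Fraction

U = 2.0 ** -53  # unit roundoff of IEEE double precision (assumed arithmetic)

def inner3(x, y):
    """fl(fl(fl(x1*y1) + fl(x2*y2)) + fl(x3*y3)): one rounding per operation."""
    return (x[0] * y[0] + x[1] * y[1]) + x[2] * y[2]

def exact_inner(x, y):
    """Exact inner product of the stored doubles, using rational arithmetic."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))

x = [0.1, 0.2, 0.3]     # hypothetical data, already rounded to doubles
y = [0.7, -0.4, 0.9]

computed = inner3(x, y)
exact = exact_inner(x, y)
abs_error = abs(Fraction(computed) - exact)

# Standard estimate: |fl(x^T y) - x^T y| <= gamma_3 * |x|^T |y|.
scale = sum(abs(Fraction(a) * Fraction(b)) for a, b in zip(x, y))
bound = 3 * U / (1 - 3 * U) * scale

print("computed :", computed)
print("abs error:", float(abs_error))
print("bound    :", float(bound))
```

The error is measured against the exact inner product of the stored doubles, which matches the assumption that the $x_i$ and $y_j$ are already floating point numbers.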