Analysis of Error Propagation in Floating Point Computations: Round-off Errors, Lecture notes of Mathematics for Computing

An introduction to round-off errors in floating point computations, focusing on their impact on the final result. It includes examples of calculations with round-off errors, together with their absolute and relative sizes, and discusses methods to estimate and minimize round-off errors.

Typology: Lecture notes

2018/2019

Uploaded on 02/17/2019

mandela.quashie1

1.6 Round-off errors in floating point computations.
1.6.1 Round-off errors.
When people or computers do computations with floating point numbers, they usually round the result of each arithmetic operation to a certain fixed number of digits of precision. This introduces additional errors into the final result, called round-off errors. Usually round-off errors are insignificant compared to measurement errors or truncation errors, but sometimes they are actually larger. This happens when the result of an addition or subtraction is significantly smaller in magnitude than the numbers being added or subtracted. In some cases the round-off errors can be serious enough to make the final result meaningless.
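The effect is easy to reproduce on a computer. The short Python sketch below uses ordinary binary double precision floats (about 16 significant decimal digits), not the decimal arithmetic of the examples that follow. The values and variable names are illustrative choices and are not taken from these notes. The sum is formed with a tiny relative error. After 1 is subtracted, however, the same absolute error is large compared with what remains.

```python
# Cancellation in IEEE 754 double precision (Python floats).
small = 1e-15

# 1 + small is rounded to the nearest double; the relative rounding error
# of this sum is at most about 1.1e-16.
total = 1.0 + small

# Subtracting 1 is exact here, but it leaves a result about as small as
# `small`, so the earlier absolute rounding error is now large relative to it.
recovered = total - 1.0

relative_error = abs(recovered - small) / small
print(f"recovered = {recovered!r}")               # prints recovered = 1.1102230246251565e-15
print(f"relative error = {relative_error:.3f}")   # prints relative error = 0.110
```

A relative error of about 11% appears in a computation whose individual operations are accurate to about 16 digits.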
Example 1. An object moves along a straight line so that its position $x$ at time $t$ is given by $x = t^3$. Let $t_0 = 10$ and $t_1 = 10 + h$ be two times, and let $x_0 = t_0^3 = 10^3 = 1000$ and $x_1 = t_1^3 = (10+h)^3$ be the corresponding positions. The displacement $\Delta x$ is the change in position, i.e. $\Delta x = x_1 - x_0 = (10+h)^3 - 1000$. Suppose $h = 0.014$.
a. Compute $\Delta x$ exactly.
b. Compute $\Delta x$ doing the calculations using four digit decimal floating point arithmetic. What is the error in the result?
c. An alternative formula for $\Delta x$ is $\Delta x = 3t_0^2 h + 3t_0 h^2 + h^3 = 300h + 30h^2 + h^3$. Compute $\Delta x$ using this alternative formula, again doing the calculations using four digit decimal floating point arithmetic. What is the error in the result? How does this compare with part b?
Solution. a. Compute $\Delta x$ exactly.
$10 + h = 10 + 0.014 = 10.014$
$(10 + h)^2 = (10.014)^2 = 100.280196$
$(10 + h)^3 = (100.280196)(10.014) = 1004.205882744$
$\Delta x = 1004.205882744 - 1000 = 4.205882744$
b. Compute $\Delta x$ rounding results to four digits after each operation. In the following, $\to$ indicates rounding and a subscript $a$ indicates an approximate value.
$10 + h = 10.014 \to (10 + h)_a = 10.01$
$[(10 + h)_a]^2 = (10.01)^2 = 100.2001 \to [(10 + h)^2]_a = 100.2$
$[(10 + h)^2]_a (10 + h)_a = (100.2)(10.01) = 1003.002 \to [(10 + h)^3]_a = 1003$
$[(10 + h)^3]_a - 1000 = 1003 - 1000 = 3 \to [\Delta x]_a = 3$
Absolute error $= 4.205882744 - 3 = 1.205882744$
Relative error $= 1.205882744/3 \approx 0.4 = 40\%$
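Part b can be replayed mechanically. The following is a minimal sketch that assumes Python's `decimal` module, with its context precision set to 4, is a fair stand-in for "four digit decimal floating point arithmetic". By default `decimal` rounds half to even. No ties occur in this example, so the results match the hand computation. The helpers `delta_x_direct` and `delta_x_exact` are hypothetical names introduced for illustration. The exact value of part a is computed with rational arithmetic from `fractions`.

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def delta_x_direct(h: str, digits: int = 4) -> Decimal:
    """Hypothetical helper: (10 + h)^3 - 1000, rounding every intermediate
    result to `digits` significant decimal digits (part b)."""
    with localcontext() as ctx:
        ctx.prec = digits              # every arithmetic result is rounded to this
        s = Decimal(10) + Decimal(h)   # (10 + h)_a
        sq = s * s                     # [(10 + h)^2]_a
        cube = sq * s                  # [(10 + h)^3]_a
        return cube - 1000             # [Delta x]_a


def delta_x_exact(h: str) -> Fraction:
    """Exact value of (10 + h)^3 - 1000 using rational arithmetic (part a)."""
    return (10 + Fraction(h)) ** 3 - 1000


approx = delta_x_direct("0.014")
exact = delta_x_exact("0.014")
abs_err = abs(exact - Fraction(approx))

print(approx)                              # 3
print(float(exact))                        # 4.205882744
print(float(abs_err))                      # 1.205882744
print(float(abs_err / Fraction(approx)))   # about 0.40, relative to the computed value as in the notes
```

The string argument `"0.014"` is deliberate. It keeps $h$ exact instead of passing it through a binary float first.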
c. Compute $\Delta x$ using the alternative formula.
$10 + h = 10.014 \to (10 + h)_a = 10.01$
$300h = (300)(0.014) = 4.2$
$h^2 = (0.014)^2 = 0.000196$
$30h^2 = (30)(0.000196) = 0.00588$
$h^3 = (0.014)^3 = 0.000002744$
$300h + 30h^2 + h^3 = 4.205882744 \to [\Delta x]_a = 4.206$
Absolute error $= |4.205882744 - 4.206| = 0.000117256$
Relative error $= 0.000117256/4.206 \approx 0.00003 = 0.003\%$
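Part c can be checked in the same way. This sketch assumes that Python's `decimal` module with context precision 4 models four digit decimal arithmetic. The helper `delta_x_expanded` is a hypothetical name. Unlike the hand computation, which rounds only the final sum, the sketch rounds after every operation. For $h = 0.014$ both approaches give 4.206.

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def delta_x_expanded(h: str, digits: int = 4) -> Decimal:
    """Hypothetical helper: 300h + 30h^2 + h^3 with every intermediate result
    rounded to `digits` significant decimal digits (part c)."""
    with localcontext() as ctx:
        ctx.prec = digits
        hd = Decimal(h)
        h2 = hd * hd               # h^2
        h3 = h2 * hd               # h^3
        total = 300 * hd           # 300h
        total = total + 30 * h2    # + 30h^2
        total = total + h3         # + h^3
        return total


h = Fraction("0.014")
exact = 300 * h + 30 * h**2 + h**3     # 4.205882744, as in part a
approx = delta_x_expanded("0.014")
abs_err = abs(exact - Fraction(approx))

print(approx)                              # 4.206
print(float(abs_err))                      # the absolute error 0.000117256
print(float(abs_err / Fraction(approx)))   # about 0.00003
```

No step subtracts two nearly equal numbers. Each rounding therefore costs only a relative error of at most about $5 \times 10^{-4}$, and the final result keeps almost all four of its digits.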

This is much better than the result in part b.

This example illustrates that sometimes one formula for computing a quantity is better than another, mathematically equivalent, formula from the standpoint of round-off error. It also raises several questions. Why did the result in part b have such a large relative error, while the result in part c did not? Would the same be true if we did the calculations with more digits of precision? Is there a way to describe or predict how large the round-off error might be before we do the computation?

Estimating the round-off error in a given computation is often difficult. One way is to repeat the computation with more digits of precision. By comparing the two values one can estimate the round-off error in the value obtained with fewer digits of precision. Another way is to estimate the round-off error at each step of the computation.
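The first approach can be tried on Example 1. This is a minimal sketch that assumes Python's `decimal` module, with context precision $d$, stands in for $d$-digit decimal arithmetic. The helper `delta_x_direct` is a hypothetical name. The part b formula is evaluated with 4, 6 and 8 digits. Each result is then compared with a 14-digit run, which happens to be exact for $h = 0.014$.

```python
from decimal import Decimal, localcontext


def delta_x_direct(h: str, digits: int) -> Decimal:
    """Hypothetical helper: (10 + h)^3 - 1000 with each operation rounded to
    `digits` significant decimal digits (the formula of part b)."""
    with localcontext() as ctx:
        ctx.prec = digits
        s = Decimal(10) + Decimal(h)
        return s * s * s - 1000   # evaluated as (s*s)*s, rounding after each step


h = "0.014"
reference = delta_x_direct(h, 14)   # enough digits to be exact for this h
estimates = {}
for digits in (4, 6, 8):
    value = delta_x_direct(h, digits)
    estimates[digits] = reference - value   # estimated round-off error of this run
    print(f"{digits:2d} digits: {value}  estimated error {estimates[digits]}")
```

More digits shrink the error, but they do not remove the cancellation. In each run the relative error of the result stays far larger than the unit round-off for that number of digits.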

If we look carefully we can see that in the computation in part b we lost about three digits of precision when we did the final subtraction 1003 – 1000 = 3. Until then the intermediate results had almost four digits of precision. The two numbers we subtracted, 1003 and 1000, were close in the relative sense so the result was a number, 3, that was much smaller than either. The computation in part c did not involve the subtraction of two nearly equal numbers, so the only round-off errors were small in the relative sense.

Example 2. Redo part b of Example 1 with a general $h$ which is small with respect to 10, if the computations are done on a computer with machine epsilon equal to $\varepsilon$. (In parts b and c of Example 1 one has $\varepsilon = 5 \times 10^{-4} = 0.0005$.) For simplicity you may make approximations in the calculation of the error.

Solution. Recall from section 1.5 that the relative error between a number $x$ and its rounded value is no more than $\varepsilon$. The first thing we do in the calculation of $(10+h)^3 - 1000$ is to round $h$. The rounded value of $h$ may have a relative error as large as $\varepsilon$ and an absolute error as large as $h\varepsilon$. The next thing to do is to compute $10 + h$. Before rounding, the absolute error might be as much as the absolute error in $h$, which is $h\varepsilon$. The relative error might be as much as $h\varepsilon/(10 + h)$, which is about $h\varepsilon/10$. Rounding introduces another relative error of approximately $\varepsilon$, which is added to $h\varepsilon/10$, giving $\varepsilon + h\varepsilon/10 \approx \varepsilon$. Next we multiply $10 + h$ by itself to get $(10 + h)^2$ and then multiply $(10 + h)^2$ by $10 + h$ to get $(10 + h)^3$. Each step requires a multiplication and a rounding. In a multiplication the relative errors of the factors add (approximately), and the rounding adds an additional relative error of $\varepsilon$. So the computed $(10 + h)^2$ may have a relative error of about $\varepsilon + \varepsilon + \varepsilon = 3\varepsilon$. The computed value of $(10 + h)^3$ may then have a relative error of about $3\varepsilon + \varepsilon + \varepsilon = 5\varepsilon$ and an absolute error of about $5\varepsilon(10 + h)^3$, which is about $5000\varepsilon$. Finally, we compute $(10 + h)^3 - 1000$. In subtraction the absolute errors add. So before rounding, the value of $(10 + h)^3 - 1000$ has an absolute error of about $5000\varepsilon$ and a relative error of about $5000\varepsilon/[(10 + h)^3 - 1000]$. Since $(10 + h)^3 - 1000 \approx 300h$, the relative error of $(10 + h)^3 - 1000$ is about $5000\varepsilon/(300h) \approx 17\varepsilon/h$. In part b one had $h = 0.014$, so the worst case relative error is about $1200\varepsilon$. If $\varepsilon = 5 \times 10^{-4}$, then the relative error might be as large as 0.6. In fact it was only about 0.4, which is about 2/3 of the worst case.
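The estimate can be compared with actual four digit computations for several values of $h$. This sketch assumes that Python's `decimal` module with context precision 4 models four digit decimal arithmetic. The helper names are hypothetical. Unlike the notes, the sketch measures the relative error against the exact value rather than the computed one, so for $h = 0.014$ it reports about 0.29 instead of 0.4.

```python
from decimal import Decimal, localcontext
from fractions import Fraction

EPS = Fraction(5, 10**4)  # epsilon for 4-digit decimal arithmetic (5 x 10^-4)


def four_digit_direct(h: str) -> Fraction:
    """Hypothetical helper: (10 + h)^3 - 1000 in 4-digit decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = 4
        s = Decimal(10) + Decimal(h)
        result = s * s * s - 1000   # rounded after each operation
    return Fraction(result)


def estimated_rel_error(h: str) -> Fraction:
    """The estimate 5000*eps / (300h), i.e. about 17*eps/h."""
    return 5000 * EPS / (300 * Fraction(h))


rows = []
for h in ("0.14", "0.014", "0.0014"):
    exact = (10 + Fraction(h)) ** 3 - 1000
    approx = four_digit_direct(h)
    rel_err = abs(approx - exact) / exact   # relative to the exact value
    rows.append((h, approx, rel_err, estimated_rel_error(h)))
    print(f"h = {h:>6}: computed {float(approx):g}, "
          f"relative error {float(rel_err):.3g}, "
          f"estimate {float(estimated_rel_error(h)):.3g}")
```

For $h = 0.0014$ the four digit computation returns 0, so every digit is lost. This is consistent with an estimate larger than 1.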

Example 3. The equation $y = \sqrt{4 - x^2}$ describes the top half of the circle of radius 2 centered at the origin. If one starts at $x$ on the $x$ axis and goes left to $x = 0$, then the change in the $y$ values is $\Delta y = 2 - \sqrt{4 - x^2}$. Suppose $x = 0.0016$.

a. Compute $\Delta y$ exactly.

b. Compute $\Delta y$ doing the calculations using four digit decimal floating point arithmetic. What is the error in the result?

c. An alternative formula for $\Delta y$ is $\Delta y = x^2/(2 + \sqrt{4 - x^2})$. Compute $\Delta y$ using this alternative formula, again doing the calculations using four digit decimal floating point arithmetic. What is the error in the result? How does this compare with part b?

Solution. a. Compute $\Delta y$ exactly.

$4 - x = 4 - 0.0016 = 3.9984$, $\sqrt{3.9984} = 1.999599960\ldots$

Example 4. Consider the calculation of $y = 1 + x + x^2/2! + x^3/3! + \cdots + x^n/n!$ discussed in section 1.1. Let's estimate the round-off error in the computation when $x = -5.5$ and $n = 25$, and the calculations are done with decimal floating point numbers with six digits of precision. In this case $\varepsilon = 5 \times 10^{-6}$.

Solution. The answer depends somewhat on the algorithm used to compute the sum. Note that $y = y_{25}$, where

$y_j = 1 + x + x^2/2! + x^3/3! + \cdots + x^j/j! = 1 + t_1 + t_2 + t_3 + \cdots + t_j = y_{j-1} + t_j$

where

$t_j = x^j/j! = q_j/f_j, \qquad q_j = x^j = x\,q_{j-1}, \qquad f_j = j! = j\,f_{j-1}$

Suppose one uses the following algorithm.

x = -5.5; n = 25; y[0] = 1; q[0] = 1; f[0] = 1;
for j = 1 to n do begin
    q[j] = x * q[j-1];
    f[j] = j * f[j-1];
    t[j] = q[j] / f[j];
    y[j] = y[j-1] + t[j]
end
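A runnable version of this algorithm follows, as a sketch. It assumes that Python's `decimal` module with context precision $d$ models $d$-digit decimal floating point arithmetic. The helper `exp_partial_sum` is a hypothetical name. The rounding rule (`decimal` defaults to round half to even) and other details may differ from those used in Example 9.5.2a. The six digit result therefore need not agree digit for digit with the value quoted below.

```python
from decimal import Decimal, localcontext


def exp_partial_sum(x: str, n: int, digits: int) -> Decimal:
    """Hypothetical helper: y_n = 1 + x + x^2/2! + ... + x^n/n! via the
    recurrences of the algorithm above, rounding every operation to
    `digits` significant decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        xd = Decimal(x)
        q = Decimal(1)  # q_0 = x^0
        f = Decimal(1)  # f_0 = 0!
        y = Decimal(1)  # y_0
        for j in range(1, n + 1):
            q = xd * q  # q_j = x q_{j-1}
            f = j * f   # f_j = j f_{j-1}
            t = q / f   # t_j = q_j / f_j
            y = y + t   # y_j = y_{j-1} + t_j
    return y


for digits in (6, 10, 14):
    print(f"{digits:2d} digits: {exp_partial_sum('-5.5', 25, digits)}")
```

Keeping $q_j$ and $f_j$ separate mirrors the pseudocode. An alternative, not used in these notes, is to update $t_j = t_{j-1}x/j$ directly, which changes how the rounding errors accumulate.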

One way to estimate the round-off error is to first do the computation using six digits of precision and then with more digits of precision. This is done in Example 9.5.2a in Section 1.9.5 below. With six digits of precision one obtains 0.00405471, with 10 digits one obtains 0.00408674, and with 14 digits one obtains 0.00408673. It appears that the true value is about 0.0040867, so the six digit calculation is off by about $3 \times 10^{-5}$, which is about a 1% error.

Another way to estimate the round-off error is to estimate the error at each stage of the computation. This can be somewhat complicated, as we saw in Example 2. First consider the error in $q_j = x^j$. The values of $q_0$ and $q_1$ can be represented exactly. In this particular example the values of $q_2$ and $q_3$ can also be represented exactly, but if $x$ had some other value this might not be true. So we shall give an estimate that holds for any value of $x$. To get $q_j$ we multiply $q_{j-1}$ by $x$ and round. Before rounding, the relative error in $x q_{j-1}$ is no more than the relative error in $q_{j-1}$. Rounding introduces an additional relative error of no more than $\varepsilon$. So, writing $\varepsilon(q_{j-1})$ for the relative error in $q_{j-1}$, the relative error in $q_j$ is no more than about $\varepsilon(q_{j-1}) + \varepsilon$. Since $q_1$ is exact, it follows that the relative error in $q_j$ is no more than approximately $(j-1)\varepsilon$. Similarly, since $f_1 = 1$ is exact, the relative error in $f_j$ is no more than approximately $(j-1)\varepsilon$.

Now consider the error in $t_j = q_j/f_j$. Before rounding, the relative error in $t_j$ is no more than approximately the sum of the relative errors in $q_j$ and $f_j$, which is $2(j-1)\varepsilon$. Rounding introduces an additional relative error of no more than approximately $\varepsilon$, so the relative error in $t_j$ is no more than about $(2j-1)\varepsilon$. The absolute error in $t_j$ is no more than about $(2j-1)\varepsilon|t_j|$.

Finally consider the error in $y_j = y_{j-1} + t_j$. Before rounding, the absolute error in $y_j$ is no more than the sum of the absolute errors in $y_{j-1}$ and $t_j$. Rounding introduces an additional absolute error of no more than $|y_j|\varepsilon$. Since $y_0 = 1$ is exact, the absolute error in $y_j$ is no more than about $b_j\varepsilon$, where $b_j = \sum_{k=1}^{j} (2k-1)|t_k| + \sum_{k=1}^{j} |y_k|$.

This value is computed for $j = 25$ in Example 2 in section 1.6.2. One obtains $b_{25} \approx 2559$, so the error in $y_{25}$ is bounded by about $b_{25}\varepsilon \approx 0.013$. The terms $|t_j|$ start at 5.5 for $j = 1$ and go 15…, 27…, 38…, 41 for $j = 2, 3, 4, 5$ and then start to decrease. The terms $(2j-1)|t_j|$ start at 45… for $j = 2$ and go 138…, 266…, 377…, 422 for $j = 3, 4, 5, 6$ and then start to decrease. The terms $|y_j|$ start at 4.5 for $j = 1$ and go 10…, 17…, 21… for $j = 2, 3, 4$ and then start to decrease. It turns out that the main contributions to $b_{25}$ are the terms $(2k-1)|t_k|$ for $k$ between 3 and 12. This estimate of the error is quite a bit larger than the one obtained above by repeating the computations using more digits, because it assumes the worst possible case at each step.
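The following sketches one way $b_{25}$ might be evaluated. It assumes that $b_j$ accumulates two bounds at each step $k$, following the error analysis above: $(2k-1)|t_k|$ from the error in $t_k$, and $|y_k|$ from rounding $y_k$. The helper name `bound_coefficient` is hypothetical. The quantities are evaluated in ordinary double precision. The computation in section 1.6.2 is not reproduced here, so the result should be close to the quoted $b_{25} \approx 2559$ but need not equal it exactly.

```python
def bound_coefficient(x: float, n: int) -> float:
    """Hypothetical helper: b_n = sum over k = 1..n of (2k-1)|t_k| + |y_k|,
    with t_k = x^k/k! and y_k = 1 + t_1 + ... + t_k, in double precision."""
    t = 1.0  # t_0 = x^0 / 0!
    y = 1.0  # y_0 = 1 is exact, so it contributes no error
    b = 0.0
    for k in range(1, n + 1):
        t = t * x / k                        # t_k from t_{k-1}
        y = y + t                            # y_k = y_{k-1} + t_k
        b += (2 * k - 1) * abs(t) + abs(y)   # error bound contributed at step k
    return b


eps = 5e-6  # epsilon for six-digit decimal arithmetic
b25 = bound_coefficient(-5.5, 25)
print(f"b_25 is about {b25:.0f}; bound on the error in y_25: {b25 * eps:.4f}")
```

Printing the individual terms inside the loop shows the pattern described above, with the $(2k-1)|t_k|$ terms for $k$ between 3 and 12 dominating the sum.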