Floating Point Number Representation and Error Analysis: A Problem-Solving Approach

Solutions to problems related to floating point number representation, error analysis, and algorithm implementation. Topics include binary mantissa format, base-ten conversion, machine epsilon, bisection algorithm, false position method, and finite difference approximation of derivatives.

Typology: Exams

Pre 2010

Uploaded on 08/04/2009

koofers-user-7o3
ME2016A, Spring 2004, Dr. Ferri Name: _____Solution_______
Test 1, February 9
This is a closed-book, closed-notes test. There are 6 problems for a total of 40 points.
Honor Pledge:
On my honor, I pledge that I have neither given nor received any inappropriate aid in the
preparation of this test.
_______________________________
Signature
Problem 1 (10) ____10___
Problem 2 (15) ____15___
Problem 3 (8) ______8___
Problem 4-6 (7) ____7____
Total (40) ________40____

Problem 1. (10 points)

In the text, we studied a particular format for floating point numbers, $x = m \cdot 2^{e}$, where the binary mantissa must have the form $m = (0.1\ldots)_2$. In other words, the first digit of m after the decimal point must be a "1". To avoid wasting bits this way, most computers use an alternate format:

$$x = (1 + f)\,2^{e}$$

where f can be any binary number such that $0 \le f \le (0.1111\ldots1)_2$.

Consider a 9-bit computer where the first bit is a sign bit for x (0 = positive, 1 = negative), the next bit is a sign bit for the exponent e, the next 3 bits are for the exponent itself, and the next 4 are the digits after the decimal point in f.

(a) What is the base-ten number given by the following register values:

[Register diagram not reproduced. The solution to (a) implies: sign of x = 0, sign of e = 0, exponent bits = 011, fraction bits = 0101.]

(b) What is the smallest positive number that can be represented on this computer?

(c) What is the largest positive number that can be represented on this computer?

(d) What is the smallest number ε such that 1 + ε > 1?

(a) $f = (0.0101)_2 = (1/4 + 1/16)_{10} = (5/16)_{10}$; $e = +(011)_2 = (+3)_{10}$

$x = (1 + f)\,2^{e} = (1 + 5/16)\,2^{3} = (10.5)_{10}$
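The decoding in part (a) can be checked with a short Python sketch. The function name `decode` is hypothetical, and the bit string `000110101` is an assumption reconstructed from the solution values (positive x, positive exponent, exponent bits 011, fraction bits 0101), since the register diagram itself is not reproduced here.

```python
def decode(bits: str) -> float:
    """Decode a 9-bit register string for the format x = (1 + f) * 2**e.

    Assumed layout, taken from the problem statement: 1 sign bit for x,
    1 sign bit for e, 3 exponent-magnitude bits, 4 fraction bits of f.
    """
    if len(bits) != 9 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of nine 0/1 characters")
    sign_x = -1 if bits[0] == "1" else 1
    sign_e = -1 if bits[1] == "1" else 1
    e = sign_e * int(bits[2:5], 2)   # exponent magnitude lies in 0..7
    f = int(bits[5:9], 2) / 16       # fraction lies in 0/16..15/16
    return sign_x * (1 + f) * 2.0 ** e


# Assumed register contents for part (a): f = 5/16, e = +3
print(decode("000110101"))  # prints 10.5
```

Every value in this format is a small multiple of a power of two, so ordinary double-precision arithmetic represents it exactly and the comparison with 10.5 is exact.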

(b) Smallest: $f = 0$, $e = -7$ $\Rightarrow$ $x = (1 + 0)\,2^{-7} = 2^{-7}$

(c) Largest: $f = (0.1111)_2 = (1/2 + 1/4 + 1/8 + 1/16)_{10} = (15/16)_{10}$; $e = +7$

$x = (1 + f)\,2^{+7} = (31/16)_{10}\,2^{+7} = (248)_{10}$
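Parts (b) and (c) can also be confirmed by brute force, since the format has only 2 × 8 × 16 = 256 positive bit patterns. The following Python sketch enumerates them; the helper name `value` is hypothetical.

```python
from itertools import product


def value(sign_e: int, e_mag: int, f_num: int) -> float:
    """Positive number (1 + f_num/16) * 2**(sign_e * e_mag)."""
    return (1 + f_num / 16) * 2.0 ** (sign_e * e_mag)


# Every positive value: exponent sign, 3-bit magnitude, 4-bit fraction
positives = [
    value(sign_e, e_mag, f_num)
    for sign_e, e_mag, f_num in product((1, -1), range(8), range(16))
]

smallest = min(positives)
largest = max(positives)
print(smallest, largest)  # prints 0.0078125 248.0
```

The minimum is 2^-7 = 1/128 and the maximum is 248, matching the hand calculation. Because the leading 1 is implicit, this format has no bit pattern for zero.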

(d) Machine epsilon: $\varepsilon = 2^{1-t}$, where t is the number of bits in m. In this case, t = 4 (for f) plus 1 = 5.

$\varepsilon = 2^{-4} = (1/16)_{10}$

Check: $1 = (1.0000)_2 \times 2^{0}$ and $1/16 = (1.0000)_2 \times 2^{-4}$. To add, shift the decimal point:

  1.0000 × 2^0
+ 0.0001 × 2^0
--------------
  1.0001 × 2^0

Anything smaller than 1/16 will not register when added to 1.
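The check above can be simulated in Python by chopping each result to 4 fraction bits. This is a sketch under the assumption that the machine chops (truncates) rather than rounds, which is what the formula ε = 2^(1-t) and the remark about values smaller than 1/16 imply. The helper name `chop` is hypothetical.

```python
import math


def chop(x: float, frac_bits: int = 4) -> float:
    """Chop positive x to the form (1.ffff)_2 * 2**e with frac_bits bits of f."""
    if x <= 0:
        raise ValueError("this sketch handles positive numbers only")
    _, exp = math.frexp(x)                  # x = m * 2**exp with 0.5 <= m < 1
    scale = 2.0 ** (frac_bits - (exp - 1))  # moves f's last bit to the units place
    return math.floor(x * scale) / scale


# Halve eps until adding the next candidate to 1 no longer registers
eps = 1.0
while chop(1 + eps / 2) > 1:
    eps /= 2

print(eps)               # prints 0.0625
print(chop(1 + 1 / 32))  # prints 1.0
```

The loop stops at 1/16, and adding 1/32 to 1 is chopped back to exactly 1, in agreement with the result for part (d).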

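[Figures from Problem 2 (problem text not included in this excerpt). Plot 1: error (ε_t, ε_a) versus iteration. Plot 2: f versus x, labeled "False-position", marking the true root and the estimates (x_r)^1 and (x_r)^2.]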

Problem 3. (8 points) The formula for the first forward finite difference approximation of a derivative is

$$f'(x_i) = \frac{f(x_{i+1}) - f(x_i)}{h} - \frac{R_1}{h}$$

where $R_1$ is the Taylor-series remainder. It is assumed that the spacing between the $x_i$ coordinates is constant; i.e., $x_{i+1} - x_i = h$ for all i. The 4 lines of Matlab code below generate a sequence of points defining the curve $f(x) = \sin(10x)$.

(a) Add some lines of Matlab code that will take the vectors x and f (already in the workspace) and the variables h and n to approximate the first derivative of f at each $x_i$ using the expression above. Your code MUST contain a loop.

(b) The approximation generated by your answer to (a) will have some error relative to the exact result. If h is reduced by a factor of 10, should the error go up or down? By what factor will it change? Explain your reasoning.

h = 0.1;
x = 0:h:50;
f = sin(10*x);
n = length(x);

(a)

fprime = zeros(1, n-1);              % preallocate: one estimate per interval
for k = 1:(n-1)
    fprime(k) = (f(k+1) - f(k))/h;   % forward difference at x(k)
end

(b) We know that $R_1 = \frac{f''(\xi)}{2}\,h^2 = O(h^2)$. Therefore, $\frac{R_1}{h} = O(h)$. If h is reduced by a factor of 10, the truncation error will also be reduced by a factor of 10.
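The O(h) behavior can be observed numerically. The Python sketch below mirrors the forward-difference loop for f(x) = sin(10x) and compares it with the exact derivative 10 cos(10x). The function name `max_forward_diff_error`, the interval [0, 5], and the step sizes 1e-3 and 1e-4 are assumptions chosen for illustration. Smaller steps than the exam's h = 0.1 are used because at h = 0.1 the product 10h is 1, so the leading error term does not yet dominate.

```python
import math


def max_forward_diff_error(h: float, x_end: float = 5.0) -> float:
    """Largest |forward difference - exact derivative| for f(x) = sin(10x)."""
    n = round(x_end / h) + 1
    x = [k * h for k in range(n)]
    f = [math.sin(10 * xk) for xk in x]
    worst = 0.0
    for k in range(n - 1):
        fprime = (f[k + 1] - f[k]) / h    # forward difference at x[k]
        exact = 10 * math.cos(10 * x[k])  # analytic derivative
        worst = max(worst, abs(fprime - exact))
    return worst


err_coarse = max_forward_diff_error(1e-3)
err_fine = max_forward_diff_error(1e-4)
print(err_coarse, err_fine, err_coarse / err_fine)
```

The printed ratio should be close to 10: reducing h by a factor of 10 reduces the truncation error by about the same factor, as the first-order estimate predicts.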