Numerical Computing: Test Solutions for MATH/CSCI 4800, Exams of Mathematics

Solutions to Test I of the numerical computing course (MATH/CSCI 4800) at the university level. It covers topics such as floating-point systems, machine epsilon, matrix computations, root finding by fixed-point iteration and bisection, and error analysis. Students will learn about the smallest and largest positive numbers in a floating-point system, the concept of machine epsilon, and how to compute the determinant of a matrix using floating-point approximations. Additionally, they will explore the concepts of propagated data error, rounding error, truncation error, computational error, conditioning, and stability.

Typology: Exams

2011/2012

Uploaded on 02/17/2012

koofers-user-rc6-1

MATH/CSCI 4800: NUMERICAL COMPUTING
TEST I SOLUTIONS

1. Consider a hypothetical, normalized floating-point system of positive numbers with base 2, precision 4, and exponent range [−2, 3]. A typical number in this system is represented as $(1.d_1 d_2 d_3)_2 \times 2^E$. Let rounding be done by rounding to nearest.

(a) Write down, in the form of a fraction, the smallest positive number and the largest positive number in this system. What is the significance of each?

(b) Define the term machine epsilon for a general floating-point number system, and write down its value as a fraction for the system at hand.

(c) Consider the matrix

$$A = \begin{bmatrix} 17/16 & 1/4 \\ 3/4 & 15/16 \end{bmatrix}.$$

i. Write down fl(A), the matrix in which the entries of A are replaced by their floating-point approximations in the above system.

ii. Compute the value of the determinant of fl(A). In carrying out this computation, assume that each floating-point operation is performed exactly and then rounded appropriately for representation in the system. Note that

$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - cb.$$

(a) The smallest positive number is $\text{UFL} = (1.000)_2 \times 2^{-2} = 1/4$.

The largest positive number is $\text{OFL} = (1.111)_2 \times 2^{3} = 15$. UFL and OFL bound the interval of positive numbers representable in the system. A number larger than the OFL is assigned the value Inf by the system (overflow), and a number smaller than the UFL may be assigned the value zero (underflow).

(b) Machine epsilon is the spacing between the number 1 and the next larger floating-point number on the machine. In the present system,

$$\epsilon_{\text{mach}} = (1.001)_2 - (1.000)_2 = 2^{-3} = 1/8.$$
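As a quick check of parts (a) and (b), the Python sketch below enumerates every number $(1.d_1 d_2 d_3)_2 \times 2^E$ with $-2 \le E \le 3$ using exact rational arithmetic from the standard `fractions` module, then reads off UFL, OFL, and the gap between 1 and its successor. The helper name `toy_floats` is our own, chosen only for illustration.

```python
from fractions import Fraction

# Toy system of Problem 1: base 2, precision 4 (a leading 1 plus three
# fraction digits d1 d2 d3), exponent range [-2, 3].
PRECISION, EMIN, EMAX = 4, -2, 3

def toy_floats():
    """Return all positive numbers (1.d1d2d3)_2 * 2**E of the system, sorted."""
    values = set()
    steps = 2 ** (PRECISION - 1)             # 8 possible digit patterns d1 d2 d3
    for e in range(EMIN, EMAX + 1):
        for k in range(steps):               # k encodes d1 d2 d3 as an integer
            significand = 1 + Fraction(k, steps)
            values.add(significand * Fraction(2) ** e)
    return sorted(values)

floats = toy_floats()
ufl, ofl = floats[0], floats[-1]
# Machine epsilon as defined above: gap between 1 and the next larger number.
eps_mach = floats[floats.index(Fraction(1)) + 1] - 1

print(len(floats), ufl, ofl, eps_mach)   # 48 1/4 15 1/8
```

Exact fractions are used instead of Python floats so that the printed values match the hand-derived fractions with no binary-to-decimal noise.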

(c) The entries of the matrix and their floating-point equivalents, obtained by rounding to nearest, are given below.

$a = 17/16 = (1.0001)_2$; $\mathrm{fl}(17/16) = (1.001)_2 = 9/8$ (but see below),
$b = 1/4 = (1.000)_2 \times 2^{-2}$; $\mathrm{fl}(1/4) = 1/4$,
$c = 3/4 = (1.100)_2 \times 2^{-1}$; $\mathrm{fl}(3/4) = 3/4$,
$d = 15/16 = (1.111)_2 \times 2^{-1}$; $\mathrm{fl}(15/16) = 15/16$.

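Thus,

$$\mathrm{fl}(A) = \begin{bmatrix} 9/8 & 1/4 \\ 3/4 & 15/16 \end{bmatrix},$$

$$\det(\mathrm{fl}(A)) = \mathrm{fl}\left[\mathrm{fl}\left(\tfrac{9}{8} \times \tfrac{15}{16}\right) - \mathrm{fl}\left(\tfrac{3}{4} \times \tfrac{1}{4}\right)\right] = \mathrm{fl}\left[\mathrm{fl}\left(1 + \tfrac{7}{128}\right) - \mathrm{fl}\left(\tfrac{3}{16}\right)\right].$$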

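Since $7/128 < 1/16 = \epsilon_{\text{mach}}/2$ (half the spacing between 1 and 9/8), the exactly computed product $9/8 \times 15/16 = 1 + 7/128$ rounds to 1; and since $3/16 < 1/4 = \text{UFL}$, the product $3/16$ underflows and, following the convention of part (a), is set to zero. The above therefore reduces simply to $\det(\mathrm{fl}(A)) = \mathrm{fl}(1 - 0) = 1$. In this computation, the number $17/16 = (1.0001)_2$ has been rounded up to $(1.001)_2 = 9/8$ by dropping the 4th digit after the radix point and changing the 3rd digit from 0 to 1. Note, however, that $17/16$ lies precisely halfway between the machine numbers 1 and 9/8. If one follows the IEEE convention (round half to even), this tie should be rounded so that the last digit of its binary machine representation is zero, implying that $\mathrm{fl}(17/16) = (1.000)_2 = 1$. With this choice we get

$$\mathrm{fl}(A) = \begin{bmatrix} 1 & 1/4 \\ 3/4 & 15/16 \end{bmatrix},$$

$$\det(\mathrm{fl}(A)) = \mathrm{fl}\left[\mathrm{fl}\left(1 \times \tfrac{15}{16}\right) - \mathrm{fl}\left(\tfrac{3}{4} \times \tfrac{1}{4}\right)\right] = \mathrm{fl}\left[\mathrm{fl}\left(\tfrac{15}{16}\right) - \mathrm{fl}\left(\tfrac{3}{16}\right)\right].$$

Proceeding as above, and noting that $15/16 = (1.111)_2 \times 2^{-1}$ is exactly representable on the machine, the result becomes $\det(\mathrm{fl}(A)) = 15/16$. Either result will be accepted.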
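The hand computation in part (c) can be replayed with exact rational arithmetic. The sketch below is our own illustration; the names `fl` and `det_fl` are hypothetical helpers, not part of the test. It rounds to nearest in the toy system with a selectable tie rule, flushes values below UFL to zero as assumed in part (a), raises an error rather than returning Inf on overflow, and rounds after every operation in $ad - cb$. For comparison it also prints the exact determinant of A.

```python
import math
from fractions import Fraction

# Toy system of Problem 1: base 2, precision 4, exponent range [-2, 3].
PRECISION, EMIN, EMAX = 4, -2, 3
SCALE = 2 ** (PRECISION - 1)                           # 8 significands per binade
UFL = Fraction(2) ** EMIN                              # 1/4
OFL = (2 - Fraction(1, SCALE)) * Fraction(2) ** EMAX   # 15

def fl(x, ties="even"):
    """Round x >= 0 to nearest in the toy system.

    ties="even": IEEE-style round half to even.
    ties="up":   exact halfway cases are rounded upward.
    Assumption: values below UFL are flushed to zero (no subnormals), and
    overflow raises OverflowError instead of producing Inf.
    """
    x = Fraction(x)
    if x < 0:
        raise ValueError("this sketch only handles non-negative values")
    if x < UFL:
        return Fraction(0)
    e = 0                                    # find E with 1 <= x / 2**E < 2
    while x / Fraction(2) ** e >= 2:
        e += 1
    while x / Fraction(2) ** e < 1:
        e -= 1
    scaled = x / Fraction(2) ** e * SCALE    # in [8, 16); candidates are integers
    m = math.floor(scaled)
    rem = scaled - m
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and (ties == "up" or m % 2 == 1)):
        m += 1
    result = Fraction(m, SCALE) * Fraction(2) ** e
    if result > OFL:
        raise OverflowError("result exceeds OFL")
    return result

def det_fl(A, ties):
    """fl(fl(a*d) - fl(c*b)), with the entries of A rounded first."""
    (a, b), (c, d) = [[fl(v, ties) for v in row] for row in A]
    return fl(fl(a * d, ties) - fl(c * b, ties), ties)

A = [[Fraction(17, 16), Fraction(1, 4)],
     [Fraction(3, 4), Fraction(15, 16)]]
exact = A[0][0] * A[1][1] - A[1][0] * A[0][1]
print("exact:", exact)                       # exact: 207/256
for ties in ("up", "even"):
    print(ties, det_fl(A, ties))             # "up 1", then "even 15/16"
```

Both floating-point answers differ noticeably from the exact value 207/256, which is the point of the exercise: with only four significant bits, a single rounding decision on a tie changes the result.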

2. (a) Show that $f(x) = e^x + x - 2$ has a single real zero in the interval [0, 1].

(b) Suppose that this zero is to be found numerically by fixed-point iteration, that is, by solving an equation of the form $x = g(x)$. Let a candidate for g be $g(x) = x - b f(x)$, where b is a constant to be found. Determine the interval in which b must lie if the iteration $x_n = g(x_{n-1})$ is to be guaranteed to converge, starting from any $x_0 \in [0, 1]$.

(c) Using bisection starting from the initial bracket [0, 1], estimate the number of iterations that would be required to find the root of $f(x) = e^x + x - 2 = 0$ to an accuracy of $10^{-6}$.

(a) Since f is continuous, with $f(0) = -1 < 0$ and $f(1) = e - 1 > 0$, the intermediate value theorem guarantees at least one root in [0, 1]. Also, $f'(x) = e^x + 1 > 0$, so f is strictly increasing and can have only one root.

(b) The condition to impose is $|g'(x)| < 1$ for all $x \in [0, 1]$. Since $g'(x) = 1 - b f'(x)$, we have

$$-1 < 1 - b f'(x) < 1,$$

or, $0 < b f'(x) < 2$. Since $f'(x)$ has been shown above to be positive, the left inequality implies that $b > 0$, and the right inequality reduces to

$$b < \frac{2}{f'(x)} \quad \text{for } x \in [0, 1].$$

In the interval [0, 1], the maximum value of $f'(x)$ is $1 + e$, attained at $x = 1$. Therefore the above inequality is satisfied for all $x \in [0, 1]$ if

$$0 < b < \frac{2}{1 + e}.$$
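A short numerical check of part (b), under our own choice $b = 0.5$ (any value inside the interval above would serve; the function names are hypothetical): iterating $g(x) = x - b f(x)$ from both ends of [0, 1] should converge to the same root.

```python
import math

def f(x):
    """f(x) = e^x + x - 2 from Problem 2."""
    return math.exp(x) + x - 2

def fixed_point(b, x0, tol=1e-12, max_iter=500):
    """Iterate x_n = g(x_{n-1}) with g(x) = x - b*f(x); return (x, n)."""
    x = x0
    for n in range(1, max_iter + 1):
        x_new = x - b * f(x)
        if abs(x_new - x) < tol:       # stop when successive iterates agree
            return x_new, n
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")

b_max = 2 / (1 + math.e)   # upper limit from part (b), roughly 0.538
b = 0.5                    # our choice inside (0, b_max)
for x0 in (0.0, 1.0):
    root, n = fixed_point(b, x0)
    print(f"x0 = {x0}: root = {root:.6f} after {n} iterations")
```

The condition $|g'(x)| < 1$ on [0, 1] is sufficient for convergence; the sketch only illustrates one admissible b and does not probe values outside the interval.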

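(c) In bisection the error after n iterations satisfies

$$e_n \le \frac{\delta}{2^{n+1}},$$

where δ is the width of the initial bracket. Here δ = 1. Therefore, the error tolerance is met if

$$\frac{\delta}{2^{n+1}} \le 10^{-6},$$

i.e.,

$$n \ge -1 + \frac{6 \ln 10}{\ln 2} \approx 18.93,$$

so 19 iterations suffice.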
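The iteration count can also be confirmed directly. This sketch (helper names are ours) computes the smallest n with $\delta/2^{n+1} \le$ tol, then halves the bracket [0, 1] that many times and returns the midpoint of the final bracket, which is the estimate the error bound refers to.

```python
import math

def f(x):
    """f(x) = e^x + x - 2 from Problem 2."""
    return math.exp(x) + x - 2

def bisection_steps(delta, tol):
    """Smallest integer n with delta / 2**(n + 1) <= tol."""
    return math.ceil(math.log2(delta / tol) - 1)

def bisect(a, b, n):
    """Halve the sign-changing bracket [a, b] n times; return the final midpoint."""
    fa = f(a)
    for _ in range(n):
        m = (a + b) / 2
        fm = f(m)
        if fa * fm <= 0:      # root lies in [a, m]
            b = m
        else:                 # root lies in [m, b]
            a, fa = m, fm
    return (a + b) / 2

n = bisection_steps(1.0, 1e-6)
estimate = bisect(0.0, 1.0, n)
print(n, estimate)
```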

(e) conditioning, (f) stability.

(a) Propagated data error is the difference between the exact solution of a problem based on the correct input and the exact result of the same problem based on the perturbed input. If the problem corresponds to evaluation of the function f, and $x$ and $\hat{x}$ are the exact and perturbed inputs respectively, then the propagated data error equals $f(\hat{x}) - f(x)$.
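To see how propagated data error sits alongside the computational error defined in part (d), consider a small illustration of our own, not taken from the test: approximate $\sin(\pi/8)$ using the perturbed input $\hat{x} = 3/8$ (π replaced by 3) and the crude numerical model $\sin t \approx t$. The total error then splits exactly into computational error $\hat{f}(\hat{x}) - f(\hat{x})$ plus propagated data error $f(\hat{x}) - f(x)$.

```python
import math

def f(t):
    return math.sin(t)   # the mathematical problem: evaluate sin

def f_hat(t):
    return t             # a crude numerical model: sin t ~ t

x = math.pi / 8          # true input
x_hat = 3 / 8            # perturbed input (pi replaced by 3)

propagated_data_error = f(x_hat) - f(x)
computational_error = f_hat(x_hat) - f(x_hat)
total_error = f_hat(x_hat) - f(x)

print(f"propagated data error: {propagated_data_error:+.4f}")   # -0.0164
print(f"computational error:   {computational_error:+.4f}")     # +0.0087
print(f"total error:           {total_error:+.4f}")             # -0.0077
```

Here the two contributions have opposite signs and partially cancel, so a small total error does not by itself mean that each component is small.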

(b) Rounding error is the difference between the result of a numerical model using exact arithmetic and the result of the same model using finite-precision arithmetic.

(c) Truncation error is the difference between the true result of the mathematical model and the result obtained from the numerical model using exact arithmetic. Recall that mathematical models are typically continuous and involve the notion of a limit, as in an integral (the limit of a sum) or a derivative (the limit of a difference quotient). A computer cannot carry out such infinite processes and must replace them with finite ones, for example by truncating an infinite series or by replacing a derivative with a difference quotient. Such a replacement of a continuous process by a discrete one is necessarily approximate, and leads to discretization or truncation errors.
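A common way to see truncation and rounding error together, chosen here by us as an illustration: approximate the derivative of $e^x$ at $x = 1$ by a forward difference quotient with step h, in IEEE double precision. For large h the truncation error, roughly proportional to h, dominates; as h shrinks, rounding error in $f(x+h) - f(x)$, roughly proportional to $\epsilon_{\text{mach}}/h$, takes over, so the total error first falls and then rises again.

```python
import math

def forward_difference(f, x, h):
    """Numerical model of f'(x): the difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

x = 1.0
exact = math.exp(x)   # true derivative of e^x at x = 1 (the mathematical model)
for k in range(1, 16, 2):
    h = 10.0 ** (-k)
    error = abs(forward_difference(math.exp, x, h) - exact)
    print(f"h = 1e-{k:02d}   error = {error:.2e}")
```

The printed column is the computational error of part (d) below, the sum of the truncation and rounding contributions; no single h removes both.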

(d) Computational error is the difference between the exact result of the mathematical model and the computed result obtained from the numerical model using finite-precision arithmetic. It is the sum of the truncation error and the rounding error.

(e) Conditioning refers to the sensitivity of a mathematical problem to data errors. It is a property of the mathematical model itself and is unrelated to the computational procedure. This sensitivity is measured by the condition number κ, defined as the magnitude of the ratio of the relative output error to the relative input error. A large condition number implies high sensitivity, and the problem is then said to be ill-conditioned. A small or moderate condition number implies low sensitivity, and the problem is called well-conditioned.

(f) Stability is a term applied to algorithms in much the same way as conditioning is applied to mathematical models: an algorithm is unstable if rounding errors introduced at an early stage of the calculation are magnified to unacceptable levels.
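The distinction between (e) and (f) can be illustrated with two standard examples of our own choosing, not from the test. The first part estimates κ by perturbing the input and comparing relative changes, following the definition in (e). The second part evaluates the well-conditioned function $(1 - \cos x)/x^2$, which tends to 1/2 as $x \to 0$, with two algebraically equivalent formulas: the naive one rounds $\cos x$ to 1.0 early on and the subsequent subtraction and division magnify that rounding error, as described in (f). The function names are hypothetical.

```python
import math

def condition_number(f, x, rel_pert=1e-8):
    """Estimate kappa = |relative output change| / |relative input change|."""
    x_hat = x * (1 + rel_pert)
    rel_in = (x_hat - x) / x
    rel_out = (f(x_hat) - f(x)) / f(x)
    return abs(rel_out / rel_in)

# Ill-conditioned near x = 1: f(x) = x - 1 has kappa = |x / (x - 1)|.
print(condition_number(lambda t: t - 1, 1.001))   # about 1001
# Well-conditioned: f(x) = sqrt(x) has kappa = 1/2 for every x > 0.
print(condition_number(math.sqrt, 4.0))           # about 0.5

# Stability: a well-conditioned problem evaluated by two algorithms.
def naive(x):
    return (1 - math.cos(x)) / x**2          # for x = 1e-8, cos(x) rounds to 1.0

def rewritten(x):
    return 2 * math.sin(x / 2) ** 2 / x**2   # uses 1 - cos x = 2 sin^2(x/2)

x = 1e-8
print(naive(x), rewritten(x))   # naive collapses to 0.0; rewritten stays near 0.5
```

The rewritten formula avoids subtracting nearly equal numbers, so it is the stable choice; the problem itself did not change, only the algorithm.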