IEEE Floating-Point Format: Bit Patterns and Values

This document describes the IEEE single and double floating-point formats, including bit patterns and their corresponding decimal values. It covers normal numbers, subnormal numbers, signed zeros, positive and negative infinities, and NaN values. The document also includes a table comparing the range and precision of different storage formats.

Typology: Exams

2021/2022

Uploaded on 09/27/2022

ekanga
Table 1: Values Represented by Bit Patterns in IEEE Single Format

Single-Format Bit Pattern                                   Value
0 < e < 255                                                 (-1)^s × 2^(e-127) × 1.f  (normal numbers)
e = 0; f ≠ 0 (at least one bit in f is nonzero)             (-1)^s × 2^(-126) × 0.f  (subnormal numbers)
e = 0; f = 0 (all bits in f are zero)                       (-1)^s × 0.0  (signed zero)
s = 0; e = 255; f = 0 (all bits in f are zero)              +INF (positive infinity)
s = 1; e = 255; f = 0 (all bits in f are zero)              -INF (negative infinity)
s = u; e = 255; f ≠ 0 (at least one bit in f is nonzero)    NaN (Not-a-Number)

Bit Patterns in Single-Storage Format and their IEEE Values

Common Name                         Bit Pattern (Hex)   Decimal Value
+0                                  00000000            0.0
-0                                  80000000            -0.0
1                                   3f800000            1.0
2                                   40000000            2.0
maximum normal number               7f7fffff            3.40282347e+38
minimum positive normal number      00800000            1.17549435e-38
maximum subnormal number            007fffff            1.17549421e-38
minimum positive subnormal number   00000001            1.40129846e-45
+∞                                  7f800000            Infinity
−∞                                  ff800000            -Infinity
Not-a-Number                        7fc00000            NaN


Table 2: Values Represented by Bit Patterns in IEEE Double Format

Double-Format Bit Pattern                                   Value
0 < e < 2047                                                (-1)^s × 2^(e-1023) × 1.f  (normal numbers)
e = 0; f ≠ 0 (at least one bit in f is nonzero)             (-1)^s × 2^(-1022) × 0.f  (subnormal numbers)
e = 0; f = 0 (all bits in f are zero)                       (-1)^s × 0.0  (signed zero)
s = 0; e = 2047; f = 0 (all bits in f are zero)             +INF (positive infinity)
s = 1; e = 2047; f = 0 (all bits in f are zero)             -INF (negative infinity)
s = u; e = 2047; f ≠ 0 (at least one bit in f is nonzero)   NaN (Not-a-Number)

Bit Patterns in Double-Storage Format and their IEEE Values

Common Name                     Bit Pattern (Hex)     Decimal Value
+0                              00000000 00000000     0.0
-0                              80000000 00000000     -0.0
1                               3ff00000 00000000     1.0
2                               40000000 00000000     2.0
max normal number               7fefffff ffffffff     1.7976931348623157e+308
min positive normal number      00100000 00000000     2.2250738585072014e-308
max subnormal number            000fffff ffffffff     2.2250738585072009e-308
min positive subnormal number   00000000 00000001     4.9406564584124654e-324
+∞                              7ff00000 00000000     Infinity
−∞                              fff00000 00000000     -Infinity
Not-a-Number                    7ff80000 00000000     NaN

Figure 1: The floating-point number line

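#include <stdio.h>

int main(void)
{
    float y, z;

    /* Neither constant is exactly representable in single precision,
       so each is rounded to the nearest float when it is stored. */
    y = 838861.2;
    z = 1.3;

    printf("y: %18.11f\n", y);
    printf("z: %18.11f\n", z);
    return 0;
}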

The output from this program should be similar to:

y: 838861.18750000000

z:      1.29999995232

Standards: POSIX, BSD 4.3, ISO 9899

acos arccosine, returns value in [0, π]

asin arcsine, returns value in [−π/2, π/2]

atan arctangent, returns value in [−π/2, π/2]

atan2 arctangent of y/x; takes y and x separately so that their signs select the quadrant, returns value in [−π, π]

ceil smallest integral value not less than x

cos Cosine

cosh Hyperbolic cosine

exp exponential function, e raised to the power x

fabs absolute value of floating-point number

floor largest integral value not greater than x

fmod floating-point remainder function

frexp split floating-point number into a normalized fraction and an integral power of 2

ldexp multiply floating-point number by integral power of 2

log Natural log

log10 Log base ten

modf extract signed integral and fractional values from floating-point number

pow Raise number to a power

sin Sine

sinh Hyperbolic sine

sqrt Square root of a number

tan Tangent

tanh Hyperbolic tangent