
















































































These are the lecture slides for Computer Arithmetic, which cover Speedup Methods, Addition and Counting, Carry-Lookahead Adders, Variations in Fast Adders, Multioperand Addition, Ripple-Carry Adders, Analysis of Carry Propagation, etc. Key points are: Real Arithmetic, Floating-Point Operations, Floating-Point Representations, Errors and Error Control, Precise and Certifiable Arithmetic, Conversions and Exceptions, Rounding Schemes
Typology: Slides

















































































Parts and Chapters
I. Number Representation
   1. Numbers and Arithmetic
   2. Representing Signed Numbers
   3. Redundant Number Systems
   4. Residue Number Systems
II. Addition / Subtraction
   5. Basic Addition and Counting
   6. Carry-Lookahead Adders
   7. Variations in Fast Adders
   8. Multioperand Addition
III. Multiplication
   9. Basic Multiplication Schemes
   10. High-Radix Multipliers
   11. Tree and Array Multipliers
   12. Variations in Multipliers
IV. Division
   13. Basic Division Schemes
   14. High-Radix Dividers
   15. Variations in Dividers
   16. Division by Convergence
V. Real Arithmetic
   17. Floating-Point Representations
   18. Floating-Point Operations
   19. Errors and Error Control
   20. Precise and Certifiable Arithmetic
VI. Function Evaluation
   21. Square-Rooting Methods
   22. The CORDIC Algorithms
   23. Variations in Function Evaluation
   24. Arithmetic by Table Lookup
VII. Implementation Topics
   25. High-Throughput Arithmetic
   26. Low-Power Arithmetic
   27. Fault-Tolerant Arithmetic
   28. Past, Present, and Future
Parts II through IV together cover the Elementary Operations.
Review floating-point numbers, arithmetic, and errors:
Chapter 17 Floating-Point Representations
Chapter 18 Floating-Point Operations
Chapter 19 Errors and Error Control
Chapter 20 Precise and Certifiable Arithmetic
Floating-Point Representations: Topics
Topics in This Chapter
17.1 Floating-Point Numbers
17.2 The IEEE Floating-Point Standard
17.3 Basic Floating-Point Algorithms
17.4 Conversions and Exceptions
17.5 Rounding Schemes
17.6 Logarithmic Number Systems
17.1 Floating-Point Numbers
No finite number system can represent all real numbers.
Various systems can be used for a subset of real numbers:
Fixed-point      $\pm w.f$               Low precision and/or range
Rational         $\pm p / q$             Difficult arithmetic
Floating-point   $\pm s \times b^{e}$    Most common scheme
Logarithmic      $\pm \log_b x$          Limiting case of floating-point
Fixed-point numbers
$x = (0000\,0000.0000\,1001)_{\text{two}}$   Small number
$y = (1001\,0000.0000\,0000)_{\text{two}}$   Large number
Square of neither number representable
Floating-point numbers
$x = \pm s \times b^{e}$, or $\pm \text{significand} \times \text{base}^{\text{exponent}}$
$x = 1.001 \times 2^{-5}$, $y = 1.001 \times 2^{+7}$
A floating-point number comes with two signs:
Number sign, usually appears as a separate bit
Exponent sign, usually embedded in the biased exponent
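The claim that neither square is representable can be checked with a small sketch. It assumes an unsigned fixed-point format with 8 integer bits and 8 fractional bits, as the two bit patterns suggest; the helper name `fits_fixed_point` is hypothetical.

```python
from fractions import Fraction

INT_BITS = 8    # assumed bits to the left of the radix point
FRAC_BITS = 8   # assumed bits to the right of the radix point

def fits_fixed_point(value: Fraction) -> bool:
    """True if value is exactly representable as an unsigned 8.8 fixed-point number."""
    scaled = value * 2**FRAC_BITS
    return scaled.denominator == 1 and 0 <= scaled.numerator < 2**(INT_BITS + FRAC_BITS)

x = Fraction(0b1001, 2**8)   # (0000 0000.0000 1001)two = 1.001two * 2**-5
y = Fraction(0b1001 << 4)    # (1001 0000.0000 0000)two = 1.001two * 2**7

print(fits_fixed_point(x), fits_fixed_point(y))          # True True
print(fits_fixed_point(x * x), fits_fixed_point(y * y))  # False False
```

The square of x needs 16 fractional bits and the square of y needs 15 integer bits, so both fall outside the format, whereas a floating-point format only has to adjust the exponent.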
17.2 The IEEE Floating-Point Standard
Short (32-bit) format: sign (1 bit); exponent (8 bits, bias = 127, range $-126$ to $127$); significand (23 bits for fractional part, plus hidden 1 in integer part)
Long (64-bit) format: sign (1 bit); exponent (11 bits, bias = 1023, range $-1022$ to $1023$); significand (52 bits for fractional part, plus hidden 1 in integer part)
Fig. 17.3 The IEEE standard floating-point number representation formats.
IEEE 754-2008 Standard (supersedes IEEE 754-1985)
Also includes half- & quad-word binary, plus some decimal formats
Overview of IEEE 754-2008 Standard Formats
Feature                  Single / Short                                  Double / Long
––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Word width (bits)        32                                              64
Significand bits         23 + 1 hidden                                   52 + 1 hidden
Significand range        $[1, 2 - 2^{-23}]$                              $[1, 2 - 2^{-52}]$
Exponent bits            8                                               11
Exponent bias            127                                             1023
Zero ($\pm 0$)           $e + \text{bias} = 0$, $f = 0$                  $e + \text{bias} = 0$, $f = 0$
Denormal                 $e + \text{bias} = 0$, $f \ne 0$                $e + \text{bias} = 0$, $f \ne 0$
                         represents $\pm 0.f \times 2^{-126}$            represents $\pm 0.f \times 2^{-1022}$
Infinity ($\pm\infty$)   $e + \text{bias} = 255$, $f = 0$                $e + \text{bias} = 2047$, $f = 0$
Not-a-number (NaN)       $e + \text{bias} = 255$, $f \ne 0$              $e + \text{bias} = 2047$, $f \ne 0$
Ordinary number          $e + \text{bias} \in [1, 254]$                  $e + \text{bias} \in [1, 2046]$
                         $e \in [-126, 127]$                             $e \in [-1022, 1023]$
                         represents $1.f \times 2^{e}$                   represents $1.f \times 2^{e}$
min                      $2^{-126} \approx 1.2 \times 10^{-38}$          $2^{-1022} \approx 2.2 \times 10^{-308}$
max                      $\approx 2^{128} \approx 3.4 \times 10^{38}$    $\approx 2^{1024} \approx 1.8 \times 10^{308}$
––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Table 17.1 Some features of the IEEE 754-2008 standard floating-point number representation formats
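As an illustration of the short-format rows of the table, the following sketch splits a value into its three fields and classifies it by biased exponent. The function names `decode_single` and `classify` are hypothetical, and Python's standard `struct` module is used only to obtain the 32-bit pattern.

```python
import struct

def decode_single(value: float):
    """Return (sign, biased exponent, fraction) of value stored in the 32-bit format."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF   # 8 exponent bits, bias = 127
    fraction = bits & 0x7FFFFF         # 23 fraction bits (hidden 1 not stored)
    return sign, biased_exp, fraction

def classify(biased_exp: int, fraction: int) -> str:
    """Classify a short-format number using the rules in the table."""
    if biased_exp == 0:
        return "zero" if fraction == 0 else "denormal"
    if biased_exp == 255:
        return "infinity" if fraction == 0 else "NaN"
    return "ordinary"

for v in (1.0, -0.15625, 2.0**-126, 2.0**-149, float("inf")):
    sign, biased_exp, fraction = decode_single(v)
    print(v, sign, biased_exp, hex(fraction), classify(biased_exp, fraction))
```

For example, 1.0 decodes to sign 0, biased exponent 127, and fraction 0, that is, $1.0 \times 2^{0}$.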
Special Operands and Subnormals
Operations on special operands:
Ordinary number $\div\ (+\infty) = \pm 0$
$(+\infty) \times$ Ordinary number $= \pm\infty$
NaN + Ordinary number = NaN
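These rules can be observed directly, assuming Python floats are IEEE double-format numbers (true on common platforms). This is a small demonstration, not part of the slides.

```python
import math

inf = math.inf
nan = math.nan

print(5.0 / inf)               # ordinary number divided by +infinity gives +0
print(-5.0 / inf)              # the sign of the zero follows the operand signs
print(inf * 5.0, inf * -5.0)   # +infinity times an ordinary number gives +/- infinity
print(nan + 5.0)               # NaN propagates through ordinary arithmetic
```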
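Biased exponent value 0: $\pm 0$ and subnormals ($\pm 0.f \times 2^{-126}$)
Biased exponent values 1 to 254: ordinary FLP numbers ($\pm 1.f \times 2^{e}$)
Biased exponent value 255: $\pm\infty$ and NaN
Subnormals (denormals) fill the gap between 0 and min $= 2^{-126}$, with the same spacing as the smallest ordinary numbers: $(1.00\ldots01 - 1.00\ldots00) \times 2^{-126} = 2^{-149}$
Fig. 17.4 Subnormals in the IEEE single-precision format.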
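The spacing of $2^{-149}$ can be confirmed by building single-format values from raw bit patterns. A minimal sketch; the helper name `single_from_bits` is hypothetical.

```python
import struct

def single_from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE single-format number."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest_subnormal = single_from_bits(0x00000001)  # biased exponent 0, f = 0...01
smallest_ordinary = single_from_bits(0x00800000)   # biased exponent 1, f = 0
next_ordinary = single_from_bits(0x00800001)       # biased exponent 1, f = 0...01

print(smallest_subnormal == 2.0**-149)                   # True
print(smallest_ordinary == 2.0**-126)                    # True
print(next_ordinary - smallest_ordinary == 2.0**-149)    # True
```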
Single extended: at least 11 exponent bits, at least 32 significand bits
Double extended: at least 15 exponent bits, at least 64 significand bits
Bias is unspecified, but exponent range must include:
Single extended $[-1022, 1023]$
Double extended $[-16\,382, 16\,383]$
17.3 Basic Floating-Point Algorithms
Addition
$(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = (\pm s_1 \times b^{e_1}) + (\pm s_2 / b^{e_1 - e_2}) \times b^{e_1} = (\pm s_1 \pm s_2 / b^{e_1 - e_2}) \times b^{e_1} = \pm s \times b^{e}$
Assume $e_1 \ge e_2$; alignment shift (preshift) is needed if $e_1 > e_2$
Example: operands with exponents 5 and 1. The operand with the smaller exponent is preshifted to exponent 5 before the significands are added; the extra bits of the sum are then rounded off.
Rounding, overflow, and underflow issues discussed later
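The alignment step can be sketched for positive binary operands as follows. Assumptions: significands are stored as integers with 8 fractional bits, all extra bits produced by the preshift are retained until the end, and the sum is rounded to nearest with ties to even. The function names and the operand values are illustrative choices (only the exponents 5 and 1 come from the example above); signs, overflow, and underflow are ignored.

```python
FRAC_BITS = 8   # assumed number of fractional significand bits

def round_nearest_even(value: int, shift: int) -> int:
    """Drop the low `shift` bits of value, rounding to nearest, ties to even."""
    if shift <= 0:
        return value << -shift
    quotient, remainder = divmod(value, 1 << shift)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and quotient & 1):
        quotient += 1
    return quotient

def fp_add(sig1: int, e1: int, sig2: int, e2: int):
    """Add two positive normalized numbers (sig / 2**FRAC_BITS) * 2**e."""
    if e1 < e2:                          # ensure e1 >= e2
        sig1, e1, sig2, e2 = sig2, e2, sig1, e1
    d = e1 - e2                          # alignment shift amount
    # Preshifting sig2 right by d while keeping its d extra bits is the same
    # as adding on a grid that is d bits finer than sig1's grid.
    total = (sig1 << d) + sig2
    shift = total.bit_length() - (FRAC_BITS + 1)
    sig = round_nearest_even(total, shift)   # round off the extra bits
    e = e2 + shift
    if sig == 2 << FRAC_BITS:            # rounding overflowed to 10.00...0
        sig >>= 1
        e += 1
    return sig, e

# 1.00101101two * 2**5  +  1.11101101two * 2**1
sig, e = fp_add(0b100101101, 5, 0b111101101, 1)
print(bin(sig), e)   # 0b101001100 5, i.e. 1.01001100two * 2**5
```

A hardware adder would normally keep only a few extra bits (guard, round, sticky) rather than all d of them; keeping them all simply makes the sketch easy to verify.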
Floating-Point Multiplication and Division
Multiplication
$(\pm s_1 \times b^{e_1}) \times (\pm s_2 \times b^{e_2}) = (\pm s_1 \times s_2) \times b^{e_1 + e_2}$
Because $s_1 \times s_2 \in [1, 4)$, postshifting may be needed for normalization
Overflow or underflow can occur during multiplication or normalization
Division
$(\pm s_1 \times b^{e_1}) / (\pm s_2 \times b^{e_2}) = (\pm s_1 / s_2) \times b^{e_1 - e_2}$
Because $s_1 / s_2 \in (0.5, 2)$, postshifting may be needed for normalization
Overflow or underflow can occur during division or normalization
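The postshift conditions can be illustrated with a minimal binary sketch. Assumptions: each operand is a hypothetical `(sign, sig, exp)` triple with an integer significand of 8 fractional bits in [1, 2), extra result bits are simply chopped instead of rounded, and overflow and underflow checks are left out.

```python
FRAC_BITS = 8              # assumed fractional bits in each significand
ONE = 1 << FRAC_BITS       # integer encoding of the significand 1.0

def fp_mul(a, b):
    """Multiply two (sign, sig, exp) triples; extra product bits are chopped."""
    (sign1, sig1, e1), (sign2, sig2, e2) = a, b
    product = sig1 * sig2          # 2 * FRAC_BITS fractional bits, value in [1, 4)
    e = e1 + e2
    shift = FRAC_BITS
    if product >= 2 * ONE * ONE:   # product in [2, 4): postshift right by one
        shift += 1
        e += 1
    return sign1 ^ sign2, product >> shift, e

def fp_div(a, b):
    """Divide two (sign, sig, exp) triples; extra quotient bits are chopped."""
    (sign1, sig1, e1), (sign2, sig2, e2) = a, b
    e = e1 - e2
    quotient = (sig1 << FRAC_BITS) // sig2        # value in (0.5, 2)
    if quotient < ONE:             # quotient in (0.5, 1): postshift left by one
        quotient = (sig1 << (FRAC_BITS + 1)) // sig2
        e -= 1
    return sign1 ^ sign2, quotient, e

def to_float(t):
    """Convert a triple back to a Python float for checking."""
    sign, sig, e = t
    return (-1.0) ** sign * sig / ONE * 2.0**e

print(to_float(fp_mul((0, 384, 1), (1, 384, 2))))   # (1.5 * 2) * -(1.5 * 4) = -18.0
print(to_float(fp_div((0, 384, 3), (0, 256, 1))))   # (1.5 * 8) / (1.0 * 2) = 6.0
```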
17.4 Conversions and Exceptions
Conversions from fixed- to floating-point
Conversions between floating-point formats
Conversion from high to lower precision: Rounding
The IEEE 754-2008 standard includes five rounding modes:
Round to nearest, ties away from 0 (rtna)
Round to nearest, ties to even (rtne) [default rounding mode]
Round toward zero (inward)
Round toward $+\infty$ (upward)
Round toward $-\infty$ (downward)
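The five modes can be compared by analogy using Python's `decimal` module, whose rounding constants correspond to them when rounding a value to an integer. This is only an analogy: it rounds decimal values rather than binary significands, and the mapping table `MODES` and the function `round_to_integer` are names chosen for this sketch.

```python
from decimal import (Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN,
                     ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR)

MODES = {
    "rtna": ROUND_HALF_UP,         # to nearest, ties away from 0
    "rtne": ROUND_HALF_EVEN,       # to nearest, ties to even
    "toward zero": ROUND_DOWN,     # inward
    "upward": ROUND_CEILING,       # toward +infinity
    "downward": ROUND_FLOOR,       # toward -infinity
}

def round_to_integer(x: str, mode: str) -> int:
    """Round the decimal string x to an integer under the named mode."""
    return int(Decimal(x).quantize(Decimal("1"), rounding=MODES[mode]))

for x in ("2.5", "-2.5", "3.5", "2.3"):
    print(x, {mode: round_to_integer(x, mode) for mode in MODES})
```

The midpoint inputs 2.5 and -2.5 are the ones on which rtna and rtne disagree; the three directed modes differ on every inexact input.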
Exceptions in Floating-Point Arithmetic
Divide by zero
Overflow
Underflow
Inexact result: Rounded value not the same as original
Invalid operation: examples include
Addition $(+\infty) + (-\infty)$
Multiplication $0 \times \infty$
Division $0 / 0$ or $\infty / \infty$
Square-rooting operand $< 0$
Invalid operations produce NaN as their results
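Some of these cases can be tried in Python, assuming its floats are IEEE double-format numbers. Note that CPython itself raises an exception for float division by zero and for `math.sqrt` of a negative operand instead of returning NaN, so those two cases are shown separately.

```python
import math

inf = math.inf

invalid_results = {
    "(+inf) + (-inf)": inf + (-inf),
    "0 * inf": 0.0 * inf,
    "inf / inf": inf / inf,
}
for name, value in invalid_results.items():
    print(name, "is NaN:", math.isnan(value))

# Language-level behavior of CPython, not of the IEEE standard:
trapped = {}
for label, operation in (("0 / 0", lambda: 0.0 / 0.0),
                         ("sqrt(-1)", lambda: math.sqrt(-1.0))):
    try:
        operation()
    except (ZeroDivisionError, ValueError) as exc:
        trapped[label] = type(exc).__name__
print(trapped)
```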
Truncation or Chopping
Fig. 17.5 Truncation or chopping of a signed-magnitude number (same as round toward 0). [Plot of chop(x) versus x.]
Fig. 17.6 Truncation or chopping of a 2’s-complement number (same as downward-directed rounding). [Plot of chop(x) = down(x) versus x.]
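The difference between the two figures shows up when the fractional bits of a negative fixed-point value are dropped. A minimal sketch, assuming 2 fractional bits; the function names are hypothetical, and Python's `>>` on negative integers behaves as an arithmetic (2's-complement) shift.

```python
FRAC_BITS = 2   # assumed number of fractional bits to drop

def chop_signed_magnitude(sign: int, magnitude: int) -> int:
    """Chop a signed-magnitude value (magnitude in units of 2**-FRAC_BITS)."""
    return (-1) ** sign * (magnitude >> FRAC_BITS)   # moves toward 0

def chop_twos_complement(value: int) -> int:
    """Chop a 2's-complement value (in units of 2**-FRAC_BITS)."""
    return value >> FRAC_BITS                        # moves downward

# -2.75 is 11 quarter-units with a negative sign, or the integer -11.
print(chop_signed_magnitude(1, 11))   # -2
print(chop_twos_complement(-11))      # -3
# +2.75 chops to 2 in both representations.
print(chop_signed_magnitude(0, 11), chop_twos_complement(11))   # 2 2
```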
Round to Nearest Number
Fig. 17.7 Rounding of a signed-magnitude value to the nearest number.
Rounding has a slight upward bias. Consider rounding $(x_{k-1} x_{k-2} \ldots x_1 x_0 . x_{-1} x_{-2})_{\text{two}}$ to an integer $(y_{k-1} y_{k-2} \ldots y_1 y_0 .)_{\text{two}}$. The four possible cases, and their representation errors, are:
$x_{-1} x_{-2}$   Round   Error
00               down    0
01               down    $-0.25$
10               up      $0.5$
11               up      $0.25$
With equal prob., mean = 0.125
For certain calculations, the probability of getting a midpoint value can be much higher than $2^{-l}$
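The mean error can be recomputed by enumerating nonnegative values with two fractional bits. The comparison with ties-to-even rounding is an addition for illustration; the function names are chosen for this sketch.

```python
from fractions import Fraction
from math import floor

HALF = Fraction(1, 2)

def rtna(x: Fraction) -> int:
    """Round a nonnegative value to nearest, ties away from 0."""
    return floor(x + HALF)

def rtne(x: Fraction) -> int:
    """Round a nonnegative value to nearest, ties to even."""
    lower = floor(x)
    diff = x - lower
    if diff > HALF or (diff == HALF and lower % 2 == 1):
        return lower + 1
    return lower

# All patterns x0.x-1x-2, so both even and odd integer parts are covered.
values = [Fraction(n, 4) for n in range(8)]

def mean_error(rounder) -> Fraction:
    return sum(rounder(x) - x for x in values) / len(values)

print(mean_error(rtna))   # 1/8
print(mean_error(rtne))   # 0
```

Only the midpoint case differs between the two schemes: ties to even sends half of the midpoints down, which cancels the bias over this set of inputs.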
[Plot axes: rtn(x) and rtna(x) versus x.]