Real Arithmetic - Computer Arithmetic - Lecture Slides, Slides of Computer Science

These are the Lecture Slides of Computer Arithmetic which includes Speedup Methods, Addition and Counting, Carry-Lookahead Adders, Variations in Fast Adder, Multioperand Addition, Ripple-Carry Adders, Analysis of Carry Propagation etc. Key important points are: Real Arithmetic, Floating-Point Operations, Floating-Point Representations, Errors and Error Control, Precise and Certifiable Arithmetic, Conversions and Exceptions, Rounding Schemes

Typology: Slides

2012/2013

Uploaded on 03/22/2013

dhimant
Part V
Real Arithmetic
Parts and Chapters

I. Number Representation
  1. Numbers and Arithmetic
  2. Representing Signed Numbers
  3. Redundant Number Systems
  4. Residue Number Systems
II. Addition / Subtraction
  5. Basic Addition and Counting
  6. Carry-Lookahead Adders
  7. Variations in Fast Adders
  8. Multioperand Addition
III. Multiplication
  9. Basic Multiplication Schemes
  10. High-Radix Multipliers
  11. Tree and Array Multipliers
  12. Variations in Multipliers
IV. Division
  13. Basic Division Schemes
  14. High-Radix Dividers
  15. Variations in Dividers
  16. Division by Convergence
V. Real Arithmetic
  17. Floating-Point Representations
  18. Floating-Point Operations
  19. Errors and Error Control
  20. Precise and Certifiable Arithmetic
VI. Function Evaluation
  21. Square-Rooting Methods
  22. The CORDIC Algorithms
  23. Variations in Function Evaluation
  24. Arithmetic by Table Lookup
VII. Implementation Topics
  25. High-Throughput Arithmetic
  26. Low-Power Arithmetic
  27. Fault-Tolerant Arithmetic
  28. Reconfigurable Arithmetic
Appendix: Past, Present, and Future

Parts II to IV cover the elementary operations.
V Real Arithmetic

Topics in This Part

Chapter 17 Floating-Point Representations

Chapter 18 Floating-Point Operations

Chapter 19 Errors and Error Control

Chapter 20 Precise and Certifiable Arithmetic

Review floating-point numbers, arithmetic, and errors:

  • How to combine wide range with high precision
  • Format and arithmetic ops; the IEEE standard
  • Causes and consequences of computation errors
  • When can we trust computation results?

Floating-Point Representations: Topics

Topics in This Chapter

17.1 Floating-Point Numbers

17.2 The IEEE Floating-Point Standard

17.3 Basic Floating-Point Algorithms

17.4 Conversions and Exceptions

17.5 Rounding Schemes

17.6 Logarithmic Number Systems

17.1 Floating-Point Numbers

No finite number system can represent all real numbers. Various systems can be used for a subset of real numbers:

  • Fixed-point, $\pm w.f$: low precision and/or range
  • Rational, $\pm p/q$: difficult arithmetic
  • Floating-point, $\pm s \times b^e$: most common scheme
  • Logarithmic, $\pm \log_b x$: limiting case of floating-point

Fixed-point numbers

$x = (0000\ 0000.0000\ 1001)_{\text{two}}$   Small number
$y = (1001\ 0000.0000\ 0000)_{\text{two}}$   Large number

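Floating-point numbers

$x = \pm s \times b^e$, or $\pm$ significand $\times$ base$^{\text{exponent}}$

A floating-point number comes with two signs:

  • Number sign, usually appears as a separate bit
  • Exponent sign, usually embedded in the biased exponent

The square of neither fixed-point number (x and y above) is representable in the same fixed-point format. As floating-point numbers, they become:

$x = 1.001 \times 2^{-5}$
$y = 1.001 \times 2^{+7}$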
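The fixed-point versus floating-point contrast can be checked numerically. The following is a minimal sketch, assuming an unsigned 16-bit word with 8 integer and 8 fractional bits for the fixed-point format; the helper names `fits_fixed_point` and `normalized` are illustrative and not part of the slides.

```python
from fractions import Fraction
import math

FRAC_BITS = 8    # bits after the binary point (assumed 8.8 layout)
TOTAL_BITS = 16  # total width of the unsigned fixed-point word

def fits_fixed_point(value):
    """True if value is exactly representable as an unsigned 8.8 fixed-point number."""
    scaled = Fraction(value) * 2**FRAC_BITS
    return scaled.denominator == 1 and 0 <= scaled < 2**TOTAL_BITS

def normalized(value):
    """Return (s, e) with value = s * 2**e and 1 <= s < 2."""
    m, e = math.frexp(value)      # frexp gives 0.5 <= m < 1
    return 2 * m, e - 1

x = Fraction(0b0000000000001001, 2**FRAC_BITS)   # (0000 0000.0000 1001)two
y = Fraction(0b1001000000000000, 2**FRAC_BITS)   # (1001 0000.0000 0000)two

print(fits_fixed_point(x), fits_fixed_point(x * x))   # x fits, x squared does not
print(fits_fixed_point(y), fits_fixed_point(y * y))   # y fits, y squared does not
print(normalized(float(x)), normalized(float(y)))     # significand 1.125 = (1.001)two
```

The square of x needs more fractional bits than the word has, and the square of y exceeds the largest representable value, whereas the normalized forms only need a small exponent field to cover both.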

17.2 The IEEE Floating-Point Standard

Each format consists of three fields: Sign, Exponent, Significand.

  • Short (32-bit) format: sign bit; 8-bit exponent, bias = 127, range –126 to 127; 23 bits for fractional part of significand (plus hidden 1 in integer part)
  • Long (64-bit) format: sign bit; 11-bit exponent, bias = 1023, range –1022 to 1023; 52 bits for fractional part of significand (plus hidden 1 in integer part)

Fig. 17.3 The IEEE standard floating-point number representation formats.

IEEE 754-2008 Standard (supersedes IEEE 754-1985)

Also includes half- & quad-word binary, plus some decimal formats
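To make the field layout concrete, the following sketch splits a short-format number into its sign, biased exponent, and fraction fields using Python's standard `struct` module. The function name `decode_single` is an illustrative choice, not something defined in the slides.

```python
import struct

def decode_single(x):
    """Return (sign, biased_exponent, fraction) of x in the 32-bit short format."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign = bits >> 31                 # 1 bit
    biased = (bits >> 23) & 0xFF      # 8 bits, bias = 127
    fraction = bits & 0x7FFFFF        # 23 bits; the hidden 1 is not stored
    return sign, biased, fraction

print(decode_single(1.0))        # 1.0 = +1.0 x 2^0, so the biased exponent is 127
print(decode_single(-0.15625))   # -0.15625 = -(1.01)two x 2^-3
```

Subtracting the bias from the second field gives the true exponent, for example 124 – 127 = –3 for the second value.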

Overview of IEEE 754-2008 Standard Formats

Feature              Single / Short                          Double / Long
--------------------------------------------------------------------------------------------
Word width (bits)    32                                      64
Significand bits     23 + 1 hidden                           52 + 1 hidden
Significand range    $[1, 2 - 2^{-23}]$                      $[1, 2 - 2^{-52}]$
Exponent bits        8                                       11
Exponent bias        127                                     1023
Zero ($\pm 0$)       e + bias = 0, f = 0                     e + bias = 0, f = 0
Denormal             e + bias = 0, $f \ne 0$                 e + bias = 0, $f \ne 0$
                     represents $\pm 0.f \times 2^{-126}$    represents $\pm 0.f \times 2^{-1022}$
Infinity ($\pm\infty$)  e + bias = 255, f = 0                e + bias = 2047, f = 0
Not-a-number (NaN)   e + bias = 255, $f \ne 0$               e + bias = 2047, $f \ne 0$
Ordinary number      e + bias $\in [1, 254]$                 e + bias $\in [1, 2046]$
                     $e \in [-126, 127]$                     $e \in [-1022, 1023]$
                     represents $1.f \times 2^{e}$           represents $1.f \times 2^{e}$
min                  $2^{-126} \approx 1.2 \times 10^{-38}$  $2^{-1022} \approx 2.2 \times 10^{-308}$
max                  $\approx 2^{128} \approx 3.4 \times 10^{38}$   $\approx 2^{1024} \approx 1.8 \times 10^{308}$
--------------------------------------------------------------------------------------------

Table 17.1 Some features of the IEEE 754-2008 standard floating-point number representation formats
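The min and max rows of Table 17.1 can be reproduced from the exponent range and significand range alone. The sketch below assumes Python floats are IEEE long-format (double) numbers, which holds on common platforms; `roundtrip_single` is an illustrative helper that stores a value in the short format and reads it back.

```python
import struct
import sys

def roundtrip_single(x):
    """Store x in the 32-bit short format and read it back as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Smallest and largest ordinary numbers: 1.0 x 2^emin and (2 - 2^-p) x 2^emax
single_min = 2.0 ** -126
single_max = (2 - 2.0 ** -23) * 2.0 ** 127
double_min = 2.0 ** -1022
double_max = (2 - 2.0 ** -52) * 2.0 ** 1023

print(single_min, single_max)
print(double_min == sys.float_info.min, double_max == sys.float_info.max)
print(roundtrip_single(single_min) == single_min,
      roundtrip_single(single_max) == single_max)
```

The last line confirms that both short-format extremes survive a round trip through 32 bits without change.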

Special Operands and Subnormals

Operations on special operands:

  • Ordinary number $\div\ (+\infty) = \pm 0$
  • $(+\infty)\ \times$ Ordinary number $= \pm\infty$
  • NaN + Ordinary number = NaN

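Biased exponent values and what they encode:

  • 0: $\pm 0$ and subnormals ($\pm 0.f \times 2^{-126}$)
  • 1 to 254: ordinary FLP numbers ($\pm 1.f \times 2^{e}$), with e = –126 to 127
  • 255: $\pm\infty$ and NaN

Subnormals (denormals) fill the gap between 0 and min = $2^{-126}$ with uniform spacing:

$(1.00\ldots01 - 1.00\ldots00) \times 2^{-126} = 2^{-149}$

Fig. 17.4 Subnormals in the IEEE single-precision format.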
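The special encodings and the subnormal spacing can be inspected by building short-format bit patterns directly. This is a small sketch; `single_from_bits` is an illustrative helper, and the NaN pattern chosen is just one of the many patterns with biased exponent 255 and nonzero fraction.

```python
import math
import struct

def single_from_bits(bits):
    """Interpret a 32-bit integer as a short-format number."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

smallest_subnormal = single_from_bits(0x00000001)  # biased exponent 0, f = 00...01
min_normal = single_from_bits(0x00800000)          # biased exponent 1, f = 0
pos_inf = single_from_bits(0x7F800000)             # biased exponent 255, f = 0
nan = single_from_bits(0x7FC00000)                 # biased exponent 255, f != 0

print(smallest_subnormal == 2.0 ** -149)   # spacing of the subnormals
print(min_normal == 2.0 ** -126)           # smallest ordinary number
print(1.0 / pos_inf, pos_inf * 3.0, math.isnan(nan + 1.0))
```

The last line mirrors the three rules for special operands: an ordinary number divided by infinity gives zero, infinity times an ordinary number stays infinite, and NaN propagates through addition.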

Extended Formats

Extended formats widen the fields of the short and long formats:

  • Single extended: $\ge 11$ exponent bits, $\ge 32$ significand bits
  • Double extended: $\ge 15$ exponent bits, $\ge 64$ significand bits

Bias is unspecified, but exponent range must include:

  • Single extended: [–1022, 1023]
  • Double extended: [–16 382, 16 383]

17.3 Basic Floating-Point Algorithms

Addition

$$(\pm s_1 \times b^{e_1}) + (\pm s_2 \times b^{e_2}) = (\pm s_1 \times b^{e_1}) + (\pm s_2 / b^{e_1 - e_2}) \times b^{e_1} = (\pm s_1 \pm s_2 / b^{e_1 - e_2}) \times b^{e_1} = \pm s \times b^{e}$$

Assume $e_1 \ge e_2$; alignment shift (preshift) is needed if $e_1 > e_2$

Example:

Numbers to be added (operand with smaller exponent to be preshifted):

$x = 2^{5} \times 1.00101101$
$y = 2^{1} \times 1.11101101$

Operands after alignment shift (the last four bits of y are extra bits to be rounded off):

$x = 2^{5} \times 1.00101101$
$y = 2^{5} \times 0.000111101101$

Result of addition:

$s = 2^{5} \times 1.010010111101$
$s = 2^{5} \times 1.01001100$   Rounded sum

Rounding, overflow, and underflow issues discussed later
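The alignment, addition, and rounding steps can be sketched with integer significands. This is an illustration under stated assumptions, not the hardware algorithm: both operands are positive and normalized, the base is 2, rounding is to nearest with ties to even, and the function name `fp_add` and its `frac_bits` parameter are hypothetical.

```python
def fp_add(s1, e1, s2, e2, frac_bits=8):
    """Add two positive binary FLP numbers.

    Each operand is (integer significand, exponent), where the significand
    carries frac_bits fractional bits: value = s / 2**frac_bits * 2**e.
    """
    if e1 < e2:                       # make operand 1 the one with larger exponent
        s1, e1, s2, e2 = s2, e2, s1, e1
    shift = e1 - e2                   # alignment shift (preshift) amount
    # Scale operand 1 up instead of shifting operand 2 down, so that the
    # extra bits are kept until the rounding step.
    total = (s1 << shift) + s2
    extra, e = shift, e1
    while total >> (frac_bits + extra) >= 2:   # normalize to [1, 2)
        extra += 1
        e += 1
    if extra:                         # round off the extra bits, ties to even
        half = 1 << (extra - 1)
        rem = total & ((1 << extra) - 1)
        total >>= extra
        if rem > half or (rem == half and total & 1):
            total += 1
        if total >> frac_bits >= 2:   # rounding produced 10.00...0
            total >>= 1
            e += 1
    return total, e

s, e = fp_add(0b100101101, 5, 0b111101101, 1)
print(bin(s), e)   # significand bits 1.01001100 with exponent 5
```

Running it on the slide's operands reproduces the rounded sum $2^{5} \times 1.01001100$.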

Floating-Point Multiplication and Division

Multiplication

$$(\pm s_1 \times b^{e_1}) \times (\pm s_2 \times b^{e_2}) = (\pm s_1 \times s_2) \times b^{e_1 + e_2}$$

Because $s_1 \times s_2 \in [1, 4)$, postshifting may be needed for normalization

Overflow or underflow can occur during multiplication or normalization

Division

$$(\pm s_1 \times b^{e_1}) / (\pm s_2 \times b^{e_2}) = (\pm s_1 / s_2) \times b^{e_1 - e_2}$$

Because $s_1 / s_2 \in (0.5, 2)$, postshifting may be needed for normalization

Overflow or underflow can occur during division or normalization
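The need for postshifting can be seen in a small model that uses exact rational significands. This sketch assumes base 2, positive operands with significands in [1, 2), and no rounding or exponent-range checks; the names `normalize`, `fp_mul`, and `fp_div` are illustrative.

```python
from fractions import Fraction

def normalize(s, e):
    """Postshift so that the significand lies in [1, 2), adjusting the exponent."""
    while s >= 2:        # product in [2, 4): shift right by one position
        s /= 2
        e += 1
    while s < 1:         # quotient in (0.5, 1): shift left by one position
        s *= 2
        e -= 1
    return s, e

def fp_mul(s1, e1, s2, e2):
    """Multiply significands, add exponents, then normalize."""
    return normalize(s1 * s2, e1 + e2)

def fp_div(s1, e1, s2, e2):
    """Divide significands, subtract exponents, then normalize."""
    return normalize(s1 / s2, e1 - e2)

print(fp_mul(Fraction(3, 2), 3, Fraction(3, 2), 4))   # 9/4 is postshifted to 9/8
print(fp_div(Fraction(1), 0, Fraction(3, 2), 0))      # 2/3 is postshifted to 4/3
```

In both cases a single one-position shift suffices, which matches the ranges [1, 4) and (0.5, 2) stated above.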

17.4 Conversions and Exceptions

Conversions from fixed- to floating-point

Conversions between floating-point formats

Conversion from high to lower precision: Rounding

The IEEE 754-2008 standard includes five rounding modes:

  • Round to nearest, ties away from 0 (rtna)
  • Round to nearest, ties to even (rtne) [default rounding mode]
  • Round toward zero (inward)
  • Round toward $+\infty$ (upward)
  • Round toward $-\infty$ (downward)
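The five modes can be compared side by side using Python's `decimal` module, whose rounding constants behave analogously when rounding to an integer. This is only an illustration of the mode definitions on decimal values, not of binary floating-point hardware; the mode labels in `MODES` and the helper `round_to_integer` are assumptions made for this sketch.

```python
from decimal import (Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN,
                     ROUND_DOWN, ROUND_CEILING, ROUND_FLOOR)

MODES = {
    "rtna": ROUND_HALF_UP,      # nearest, ties away from 0
    "rtne": ROUND_HALF_EVEN,    # nearest, ties to even
    "inward": ROUND_DOWN,       # toward zero
    "upward": ROUND_CEILING,    # toward +infinity
    "downward": ROUND_FLOOR,    # toward -infinity
}

def round_to_integer(text, mode):
    """Round the decimal string text to an integer under the named mode."""
    return int(Decimal(text).quantize(Decimal(1), rounding=MODES[mode]))

for value in ("2.5", "-2.5", "2.25"):
    print(value, {mode: round_to_integer(value, mode) for mode in MODES})
```

The midpoint inputs 2.5 and –2.5 are where rtna and rtne differ; for 2.25 the two round-to-nearest modes agree.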

Exceptions in Floating-Point Arithmetic

Divide by zero

Overflow

Underflow

Inexact result: Rounded value not the same as original

Invalid operation: examples include

  • Addition: $(+\infty) + (-\infty)$
  • Multiplication: $0 \times \infty$
  • Division: $0 / 0$ or $\infty / \infty$
  • Square-rooting: operand < 0

These produce NaN as their results
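Several of these exceptions can be observed from Python, with one caveat: Python raises its own exceptions for float division by zero and for `math.sqrt` of a negative operand rather than returning the IEEE default result. The sketch below assumes Python floats are IEEE long-format numbers; the helper `raised` is illustrative.

```python
import math

def raised(operation):
    """Return the name of the Python exception raised by operation, or None."""
    try:
        operation()
    except (ZeroDivisionError, ValueError, OverflowError) as exc:
        return type(exc).__name__
    return None

inf = math.inf
invalid_results = [inf + (-inf), 0.0 * inf, inf / inf]   # invalid operations
overflowed = 1e308 * 10        # overflow: result too large for the format
underflowed = 5e-324 / 2       # underflow: result too small for the format
inexact = (0.1 + 0.2 != 0.3)   # inexact: rounded values differ from exact ones

print(all(math.isnan(r) for r in invalid_results))
print(overflowed, underflowed, inexact)
print(raised(lambda: 1.0 / 0.0), raised(lambda: math.sqrt(-1.0)))
```

The first line shows the invalid operations yielding NaN, while the last line shows the two cases that Python chooses to trap.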

Truncation or Chopping

Fig. 17.5 Truncation or chopping of a signed-magnitude number (same as round toward 0).

[Plot of chop(x) versus x, with both axes running from –4 to 4.]

Fig. 17.6 Truncation or chopping of a 2’s-complement number (same as downward-directed rounding).

[Plot of chop(x) = down(x) versus x, with both axes running from –4 to 4.]
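The difference between the two chopping behaviors shows up only for negative values. The sketch below assumes a fixed-point word with 2 fractional bits and inputs that are exact multiples of 1/4; the function names are illustrative.

```python
import math

FRAC_BITS = 2   # assumed number of fractional bits in the fixed-point word

def chop_signed_magnitude(x):
    """Drop the fractional bits of the magnitude: rounds toward 0."""
    return math.trunc(x)

def chop_twos_complement(x):
    """Drop the low-order bits of a 2's-complement word: rounds downward."""
    word = int(x * 2**FRAC_BITS)    # exact when x is a multiple of 1/4
    return word >> FRAC_BITS        # arithmetic shift discards fractional bits

for x in (2.75, -2.75):
    print(x, chop_signed_magnitude(x), chop_twos_complement(x))
```

For 2.75 both functions return 2, whereas for –2.75 the signed-magnitude chop gives –2 and the 2's-complement chop gives –3.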

Round to Nearest Number

Fig. 17.7 Rounding of a signed-magnitude value to the nearest number.

Rounding has a slight upward bias. Consider rounding $(x_{k-1} x_{k-2} \ldots x_1 x_0 . x_{-1} x_{-2})_{\text{two}}$ to an integer $(y_{k-1} y_{k-2} \ldots y_1 y_0 .)_{\text{two}}$. The four possible cases and their representation errors are:

$x_{-1} x_{-2}$   Round   Error
00               down    0
01               down    –0.25
10               up      0.5
11               up      0.25

With equal probability for the four cases, the mean error is 0.125.

For certain calculations, the probability of getting a midpoint value can be much higher than $2^{-l}$ (l being the number of fractional bits)
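The bias can be checked by enumerating all values with two fractional bits over a small range. This sketch uses exact fractions and compares rounding with ties away from 0 against ties to even for nonnegative inputs; the helper names `rtna`, `rtne`, and `mean_error` are illustrative.

```python
from fractions import Fraction
import math

def rtna(x):
    """Round a nonnegative value to nearest, ties away from 0."""
    return math.floor(x + Fraction(1, 2))

def rtne(x):
    """Round to nearest, ties to even (Python's round on Fraction does this)."""
    return round(x)

def mean_error(rounder, values):
    """Average of (rounded - exact) over the given values."""
    return sum(rounder(v) - v for v in values) / len(values)

# All values 0, 1/4, 2/4, ..., 15/4: every two-bit fraction pattern occurs equally often
values = [Fraction(k, 4) for k in range(16)]

print(mean_error(rtna, values))   # upward bias of 1/8
print(mean_error(rtne, values))   # the midpoint errors cancel
```

With ties away from 0, every midpoint contributes +0.5, giving the mean of 0.125; with ties to even, midpoints alternate between –0.5 and +0.5 and the mean drops to 0.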

[Plot of rtn(x) versus x, with both axes running from –4 to 4.]

rtna( x )