Carry Save Adder Implementation - Lecture Notes | EE 457, Study notes of Computer Architecture and Organization

University of Southern California (USC), Computer Architecture and Organization

Prof. Gandhi Puvvada

Carry save adder Material Type: Notes; Professor: Puvvada; Class: Computer Systems Organization; Subject: Electrical Engineering; University: University of Southern California; Term: Fall 2010;

Typology: Study notes

Pre 2010

Uploaded on 12/12/2010

ECEN 6263 Advanced VLSI Design

Carry Save Adder Implementation, December 11, 2004

Carry Save Adder Implementation

Now that we have seen that the carry save adder trees are most efficiently implemented by putting together the (3,2) blocks, we must still address the issue of how to implement the (3,2) block (carry save adder) efficiently. Functionally, the carry save adder is identical to the full adder. The full adder is usually implemented with a reduced delay from Cin to Cout because the carry chain is the critical delay path in adders. Unfortunately, there is no single carry chain in the carry save adder trees in multipliers. Thus, it does not pay to make the delay shorter for one input by sacrificing delay on other inputs for carry save adders. Instead, carry save adders are normally implemented by treating the 3 inputs equally and trying to minimize delay from each input to the outputs. We have

$$S = A \oplus B \oplus C = \bar{A}\bar{B}C + \bar{A}B\bar{C} + A\bar{B}\bar{C} + ABC$$

$$C_{out} = AB + AC + BC$$
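The two equations above can be checked exhaustively. The following is a minimal Python sketch, not part of the original notes; the function names `csa_bit` and `csa_word` are hypothetical. It confirms that a (3,2) block compresses three input bits into a 2-bit count, and that applying the block bitwise to three words leaves two words with the same total.

```python
from itertools import product


def csa_bit(a, b, c):
    """One (3,2) block: return (sum, carry) for three input bits."""
    s = a ^ b ^ c                          # S = A xor B xor C
    cout = (a & b) | (a & c) | (b & c)     # C_out = AB + AC + BC
    return s, cout


# Exhaustive check: the three inputs are counted as a 2-bit number.
for a, b, c in product((0, 1), repeat=3):
    s, cout = csa_bit(a, b, c)
    assert a + b + c == 2 * cout + s


def csa_word(x, y, z):
    """Apply the (3,2) block bitwise to three non-negative integers.

    Returns a (sum_word, carry_word) pair; no carry ripples between
    bit positions, the carries are only shifted left by one.
    """
    s = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return s, carry


s_word, carry_word = csa_word(13, 7, 9)
assert s_word + carry_word == 13 + 7 + 9
```

Because all three inputs are treated identically in both expressions, there is no preferred "carry-in" input in this model, which mirrors the point made in the text about equalizing delay from every input.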

As we can see from the expanded version of the exclusive-OR function for the sum, $S$, both the uncomplemented and complemented forms are required for each input (there is a transmission gate XOR circuit that does not require the complemented inputs, but we won't consider it here). If we want to avoid putting extra inverters in our carry paths to produce the complemented inputs, the best thing to do is to have each carry save adder produce both uncomplemented and complemented outputs, which can then be used as inputs by the next stage of carry save adders. Due to symmetries in the logic functions for the carry $C_{out}$ and the sum $S$, producing $C_{out}$, $\bar{C}_{out}$, $S$ and $\bar{S}$ does not take as much circuitry as one might think. The idea is to find common sub-functions for which we may use the same transistors to implement parts of more than one output function.

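$$S = A \oplus B \oplus C = A\,\overline{(B \oplus C)} + \bar{A}\,(B \oplus C)$$

$$\bar{S} = \overline{A \oplus B \oplus C} = \bar{A} \oplus B \oplus C = \bar{A}\,\overline{(B \oplus C)} + A\,(B \oplus C)$$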


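$$\begin{aligned}
C_{out} &= AB + BC + AC \\
&= AB + (A + \bar{A})BC + A(B + \bar{B})C \\
&= AB + \bar{A}BC + A\bar{B}C \\
&= AB + (\bar{A}B + A\bar{B})C
\end{aligned}$$

$$\begin{aligned}
\bar{C}_{out} &= \overline{AB + BC + AC} \\
&= (\bar{A} + \bar{B})(\bar{B} + \bar{C})(\bar{A} + \bar{C}) \\
&= \bar{A}\bar{B}\bar{A} + \bar{A}\bar{B}\bar{C} + \bar{A}\bar{C}\bar{A} + \bar{A}\bar{C}\bar{C} + \bar{B}\bar{B}\bar{A} + \bar{B}\bar{B}\bar{C} + \bar{B}\bar{C}\bar{A} + \bar{B}\bar{C}\bar{C} \\
&= \bar{A}\bar{B} + \bar{A}\bar{C} + \bar{B}\bar{C} \\
&= \bar{A}\bar{B} + \bar{A}(B + \bar{B})\bar{C} + (A + \bar{A})\bar{B}\bar{C} \\
&= \bar{A}\bar{B} + \bar{A}B\bar{C} + A\bar{B}\bar{C} \\
&= \bar{A}\bar{B} + (\bar{A}B + A\bar{B})\bar{C}
\end{aligned}$$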

In both cases, we see that the functions have common sub-functions, and that the common part is gated by a complementary input.

These two properties allow the transistors for the common part to be shared. Consider full CMOS gates for $f$ and $\bar{f}$ with a common part, C (here C denotes the shared block, not the carry), which is gated by $I$ and $\bar{I}$.

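[Figure: full CMOS gates for $f$ and $\bar{f}$ sharing the common block C and its dual C′ (C′ = dual of C), gated by $I$ and $\bar{I}$, with separate uncommon parts.]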

Here it is obvious why the gating signals $I$, $\bar{I}$ must be complementary to avoid shorting $f$ to $\bar{f}$!
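The shared sub-function forms of the sum and carry can be verified by brute force. This is a small Python sketch under my own naming (`check_shared_subfunctions` is hypothetical and not from the notes): it checks that $S$ and $\bar{S}$ both reuse $B \oplus C$ and its complement gated by $A$ and $\bar{A}$, and that the carry and its complement both reuse $\bar{A}B + A\bar{B}$ gated by the third input and its complement.

```python
from itertools import product


def check_shared_subfunctions():
    """Exhaustively verify the factored forms of S, S', C_out and C_out'."""
    for a, b, c in product((0, 1), repeat=3):
        na, nb, nc = 1 - a, 1 - b, 1 - c
        s = a ^ b ^ c
        cout = (a & b) | (b & c) | (a & c)

        # Sum pair: common part (B xor C) and its complement,
        # gated by the complementary pair A / not-A.
        bxc = b ^ c
        assert s == (a & (1 - bxc)) | (na & bxc)
        assert 1 - s == (na & (1 - bxc)) | (a & bxc)

        # Carry pair: common part (A'B + AB'), gated by C in the
        # true output and by not-C in the complemented output.
        common = (na & b) | (a & nb)
        assert cout == (a & b) | (common & c)
        assert 1 - cout == (na & nb) | (common & nc)
    return True


check_shared_subfunctions()
```

In each pair the common term appears in both outputs but is enabled by opposite polarities of the gating input, so at most one output can be using the shared block for any given input combination.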

The transistor count may be further reduced by using logic gate design styles that eliminate the pMOS pull-up block, which is made possible when synthesizing both $f$ and $\bar{f}$. Common blocks in $f$ and $\bar{f}$ may still be shared as above.

[Figure: CVSL [2] and CPL [1] gate structures, each producing $f$ and $\bar{f}$.]

In both cases, the $f$ and $\bar{f}$ blocks are synthesized with nMOSFETs only (no pMOSFETs). CVSL eliminates having to duplicate $f$ and $\bar{f}$ with pMOSFETs by using cross-coupled pMOSFETs, which force $f$ and $\bar{f}$ to opposite values. The problem is that the cross-couple is slow, as we saw last semester. Consider switching $f$ from high to low. At the beginning of the switching transient the pMOS cross-couple has not yet switched, so we have the following situation.

[Figure: CVSL gate at the start of the switching transient, with devices labelled on, off, and open; the $f$ pull-up turns off only later in the transient.]

The nMOSFET that just turned on must fight the pMOSFET that is still turned on to bring the $f$ output low enough to turn on the other pMOSFET, which then causes the first pMOSFET to turn off. This can take a considerable amount of time, so that typical CVSL gates are not much faster than full CMOS gates even though the input gate load is 1/ that of full CMOS.

The Complementary Pass Logic (CPL) method overcomes the speed problem by using inverters as level detectors for the two nMOS pass transistor blocks. There is no cross-couple circuit and no fighting of logic levels. However, an nMOS pass circuit is notoriously slow at passing high logic levels. This can be compensated by adjusting the inverter crossover voltage, $V_{inv}$, to a lower than usual value, as discussed for partial swing logic last semester. In fact, CPL is just non-full-swing pass transistor logic where both a logic function and its complement are implemented simultaneously. This is very useful for arithmetic circuits such as multipliers and adders. CPL gates, as originally presented in [1], can be improved somewhat. The AND/NAND gate should be changed as follows.

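[Figure: original AND/NAND and improved AND/NAND CPL gates, both producing $AB$ and $\overline{AB}$. In the improved gate, the two pass inputs tied to $B$ and $\bar{B}$ in the original are tied to the constants 0 and 1.]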

The revised form has a much smaller load on the B input, and is much faster. As usual, the inverters do not need to be included in every gate; they are inserted where needed to prevent $n^2$ delay through $n$ transistors in series. For example, two 2-input XOR gates can be cascaded to make a 3-input XOR gate, and an inverter need not be inserted between the two XOR gates.

[Figure: 3-input XOR built from two cascaded 2-input CPL XOR gates, with inputs $A$, $B$, $C$ and output $S$.]
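The CPL gates discussed above can be modelled at the logic level as pairs of pass transistors steered by a signal and its complement. The sketch below is a rough behavioural model in Python, not the circuits from [1]: the helper `pass_pair` and the gate functions are hypothetical names, the wiring of the original AND/NAND is my reading of the figure labels, and electrical effects (threshold drops, level-restoring inverters) are ignored.

```python
from itertools import product


def pass_pair(sel, when_sel, when_not_sel):
    """Two nMOS pass FETs: one gated by sel, the other by its complement."""
    return when_sel if sel else when_not_sel


def cpl_and_original(a, b):
    """Assumed original form: B and not-B are also used as pass inputs."""
    return pass_pair(b, a, b), pass_pair(b, 1 - a, 1 - b)


def cpl_and_improved(a, b):
    """Improved form: the B / not-B pass inputs become constants 0 and 1."""
    return pass_pair(b, a, 0), pass_pair(b, 1 - a, 1)


def cpl_xor(a, b):
    """2-input XOR/XNOR pair: B steers not-A or A to the output."""
    return pass_pair(b, 1 - a, a), pass_pair(b, a, 1 - a)


def cpl_xor3(a, b, c):
    """Cascade two XOR stages; the first-stage outputs feed the pass
    inputs (not the gates) of the second stage, so this model needs
    no inverter between the stages."""
    x, nx = cpl_xor(a, b)
    return pass_pair(c, nx, x), pass_pair(c, x, nx)


for a, b in product((0, 1), repeat=2):
    expected = (a & b, 1 - (a & b))
    assert cpl_and_original(a, b) == expected
    assert cpl_and_improved(a, b) == expected

for a, b, c in product((0, 1), repeat=3):
    assert cpl_xor3(a, b, c) == (a ^ b ^ c, 1 - (a ^ b ^ c))
```

In this model the improved AND/NAND uses `b` only as a steering signal, never as a passed value, which is one way to see why the load on the B input drops.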

The three-input XOR gate can be used to produce the sum output for the carry save adder. The CPL three-input XOR gate has the same number of transistors as the folded CVSL three-input XOR gate [2]. The structure of the circuits is almost the same which can be

[Figure: carry circuits in CPL and in folded CVSL, with inputs $A$, $B$, $C$ and outputs $C_{out}$, $\bar{C}_{out}$.]

It is interesting to note that the folded CVSL carry circuit from [2], which has only 6 pass FETs in it, cannot be made into a CPL circuit. When $A = B = 1$ in the CVSL circuit, a parallel combination of pass FETs controlled by $C$ and $\bar{C}$ gives a valid logic 0, but in CPL it does not.

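[Figure: parallel pass FETs controlled by $C$ and $\bar{C}$; the combination is labelled invalid in one case and a valid 0 in the other.]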

The above circuits are optimized implementations for the (3,2) carry save adder building block cell. It is also possible to optimize other building block cells, for example the (4,2) compressor. The (4,2) compressor has 4 explicit inputs plus one hidden carry for a total of 5 inputs. The sum bit output of the (4,2) compressor is the exclusive-OR of all 5 inputs. If the (4,2) compressor is made from two (3,2) blocks, then the 5-input XOR gets implemented by four 2-input XOR gates in series. A tree of XOR gates would be faster [3]. Similarly, a tree of gates can be found for the other (4,2) outputs which would be faster than that obtained from two cascaded (3,2) circuits.
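The depth argument can be made concrete by counting 2-input XOR gates on the longest path. The following Python sketch is illustrative only: the function names are hypothetical, each signal is modelled as a `(value, depth)` pair, and the particular tree shape is one plausible arrangement rather than the one from [3].

```python
from itertools import product


def xor2(x, y):
    """2-input XOR on (value, depth) pairs.

    depth counts the XOR gates on the longest path to this signal.
    """
    return (x[0] ^ y[0], max(x[1], y[1]) + 1)


def xor5_cascaded(a, b, c, d, e):
    """5-input XOR as two cascaded (3,2) sum circuits."""
    s1 = xor2(xor2(a, b), c)       # first (3,2) block: two XORs in series
    return xor2(xor2(s1, d), e)    # second (3,2) block: two more in series


def xor5_tree(a, b, c, d, e):
    """5-input XOR with the first two XORs evaluated in parallel."""
    return xor2(xor2(xor2(a, b), xor2(c, d)), e)


for bits in product((0, 1), repeat=5):
    inputs = [(bit, 0) for bit in bits]    # primary inputs arrive at depth 0
    chain_value, chain_depth = xor5_cascaded(*inputs)
    tree_value, tree_depth = xor5_tree(*inputs)
    assert chain_value == tree_value == sum(bits) % 2
    assert chain_depth == 4
    assert tree_depth == 3
```

Both arrangements use four 2-input XOR gates; only the longest path changes, from four gates in the cascaded form to three in this tree.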

[Figure: 5-input XOR from cascaded (3,2) compressors, and 5-input XOR optimized tree, each with inputs $A$, $B$, $C$, $D$, $E$.]

The CPL gate for the XOR tree might look like the following. Note that it is necessary to add the inverters before the internal XOR outputs can be used to control the gate of a pass FET.

[Figure: CPL gate for the 5-input XOR tree, with inputs $A$, $B$, $C$, $D$, $E$ and outputs $S$, $\bar{S}$.]

$V_{Tp} = V_{Tn} + V_{Tn}(\text{pass})$. Such a large $V_{Tp}$ would make the p-channel devices very slow.

There is another way to make $V_{Tp} = V_{Tn} + V_{Tn}(\text{pass})$. That is to make the nMOS pass FETs differently than the regular nMOSFETs in the inverter. A "native" nMOSFET is easy to fabricate with a threshold $V_{Tn}' \approx 0$. If the native nMOSFET is used for the pass transistors, body effect increases the threshold to only a few tenths of a volt. Thus, it is possible to satisfy

$$V_{Tp} = V_{Tn} + V_{Tn}'(\text{pass})$$

without increasing $V_{Tp}$ very much.

[1] K. Yano et al., "A 3.8-ns 16X16-b Multiplier Using Complementary Pass-Transistor Logic," IEEE J. Solid-State Circuits, vol. 25, pp. 388-394, Apr. 1990.

[2] P. Song and G. De Micheli, "Circuit and Architecture Trade-offs for High-Speed Multiplication," IEEE J. Solid-State Circuits, vol. 26, pp. 1184-1198, Sep. 1991.

[3] N. Nagamatsu et al., "A 15-ns 32X32-b CMOS Multiplier with an Improved Parallel Structure," IEEE J. Solid-State Circuits, vol. 25, pp. 494-497, Apr. 1990.
