Pipelining in Digital Circuits: A Case Study from MIT 6.004, Slides of Computer Fundamentals

Baddi University of Emerging Sciences and Technologies Computer Fundamentals

Part of the MIT 6.004 Spring 2009 course materials, focusing on pipelining in digital circuits. The slides discuss the concept of pipelining, its advantages and disadvantages, and various pipelining methodologies, and include examples and summaries to help students understand the concept.

Typology: Slides

2012/2013

Uploaded on 04/18/2013


L08 - Pipelining 1

6.004 – Spring 2009 3/3/09

Pipelining

what Seymour Cray taught the laundry industry

I’ve got 3 months worth of laundry to do tonight…

Funny, considering that he’s only got one outfit…

Due Thursday: Lab #3

modified 2/23/09 10:45

Forget circuits… let’s solve a “Real Problem”

Device: Washer

Function: Fill, Agitate, Spin

WasherPD = 30 mins

Device: Dryer

Function: Heat, Spin

DryerPD = 60 mins

INPUT: dirty laundry

OUTPUT: 6 more weeks

One load at a time

Total = WasherPD + DryerPD = 90 mins

Everyone knows that the real reason that MIT students put off doing laundry so long is not because they procrastinate, are lazy, or even have better things to do.

The fact is, doing one load at a time is not smart.

Doing N loads of laundry

Here’s how they do laundry at Harvard, the “combinational” way.

Total = N*(WasherPD + DryerPD) = N*90 mins

(Of course, this is just an urban legend. No one at Harvard actually does laundry. The butlers all arrive on Wednesday morning, pick up the dirty laundry and return it all pressed and starched in time for afternoon tea)

Figure (by MIT OpenCourseWare): Step 1, Step 2, Step 3, Step 4, ...


Doing N Loads… the MIT way

MIT students “pipeline” the laundry process. That’s why we wait!

Total = N * Max(WasherPD, DryerPD) = N*60 mins

Actually, it’s more like N*60 + 30 if we account for the startup transient correctly. When doing pipeline analysis, we’re mostly interested in the “steady state” where we assume we have an infinite supply of inputs.
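The startup transient can be checked with a few lines of Python. This is a minimal sketch rather than anything from the original slides: the function names are hypothetical, and it assumes a finished wash can be set aside so that the washer is free for the next load right away.

```python
WASHER_PD = 30  # minutes, from the slide
DRYER_PD = 60   # minutes, from the slide


def sequential_total(n_loads):
    """One load at a time: wash, then dry, N times over."""
    return n_loads * (WASHER_PD + DRYER_PD)


def pipelined_total(n_loads):
    """Wash load i+1 while load i is drying (assumed scheduling)."""
    washer_free = 0  # time at which the washer can start the next load
    dryer_free = 0   # time at which the dryer can start the next load
    finish = 0
    for _ in range(n_loads):
        wash_done = washer_free + WASHER_PD
        dry_start = max(wash_done, dryer_free)  # wait if the dryer is busy
        finish = dry_start + DRYER_PD
        washer_free = wash_done
        dryer_free = finish
    return finish


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, sequential_total(n), pipelined_total(n))
```

For every N of at least 1, `pipelined_total(N)` comes out as N*60 + 30: the 30 minutes is the first wash, during which the dryer has nothing to do.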

Performance Measures

Latency: The delay from when an input is established until the output associated with that input becomes valid.

(Harvard Laundry = 90 mins)

(MIT Laundry = 120 mins)

Assuming that the wash is started as soon as possible and waits (wet) in the washer until dryer is available.

Throughput: The rate at which inputs or outputs are processed.

(Harvard Laundry = 1/90 outputs/min)

(MIT Laundry = 1/60 outputs/min)
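The 120-minute latency follows from the wet load occupying the washer until the dryer frees up. The sketch below models that assumption explicitly; the function names are hypothetical and the scheduling rule is an interpretation of the note above, not something the slides spell out.

```python
WASHER_PD = 30  # minutes
DRYER_PD = 60   # minutes


def laundry_schedule(n_loads):
    """Return a (start, finish) pair per load.

    Assumption: a washed load stays in the washer until the dryer is
    free, so the next wash cannot start before that hand-off.
    """
    schedule = []
    washer_free = 0
    dryer_free = 0
    for _ in range(n_loads):
        start = washer_free
        wash_done = start + WASHER_PD
        dry_start = max(wash_done, dryer_free)
        finish = dry_start + DRYER_PD
        washer_free = dry_start  # washer is emptied at the hand-off
        dryer_free = finish
        schedule.append((start, finish))
    return schedule


def steady_state(schedule):
    """Latency of the last load and the gap between the last two outputs."""
    (_, prev_finish), (start, finish) = schedule[-2], schedule[-1]
    return finish - start, finish - prev_finish


if __name__ == "__main__":
    latency, minutes_per_load = steady_state(laundry_schedule(5))
    print(latency, minutes_per_load)  # prints "120 60"
```

The first load still takes only 90 minutes; the 120-minute figure is the steady-state value that every later load settles into, with one load finishing every 60 minutes.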

Okay, back to circuits…

Figure: circuit with blocks F, G and H, input X and output P(X); timing diagram of X, F(X), G(X) and P(X).

For combinational logic: latency = tPD, throughput = 1/tPD.

We can’t get the answer faster, but are we making effective use of our hardware at all times?

F & G are “idle”, just holding their outputs stable while H performs its computation

Pipelined Circuits

use registers to hold H’s input stable!

Figure: circuit with blocks F (15), G (20) and H (25), input X and output P(X).

Now F & G can be working on input Xi+1 while H is performing its computation on Xi. We’ve created a 2-stage pipeline: if we have a valid input X during clock cycle j, P(X) is valid during clock j+2.

Suppose F, G, H have propagation delays of 15, 20, 25 ns and we are using ideal zero-delay registers:

unpipelined: latency 45, throughput 1/45

2-stage pipeline: latency 50 (worse), throughput 1/25 (better)

Figure (by MIT OpenCourseWare): Step 1, Step 2, Step 3, ...
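The four numbers in the comparison can be reproduced with a short calculation. This is a sketch under one assumption that the slide's figures imply but the extracted text does not state: F and G both work on X in parallel and feed H, so the combinational delay is max(F, G) + H. The function names are hypothetical.

```python
from fractions import Fraction


def unpipelined(t_f, t_g, t_h):
    """Latency and throughput of the purely combinational circuit."""
    t_pd = max(t_f, t_g) + t_h  # longest path from X to P(X)
    return t_pd, Fraction(1, t_pd)


def two_stage(t_f, t_g, t_h):
    """Registers between {F, G} and H, and on the output (ideal registers)."""
    clock = max(max(t_f, t_g), t_h)  # slowest stage sets the clock period
    return 2 * clock, Fraction(1, clock)


if __name__ == "__main__":
    print(unpipelined(15, 20, 25))  # (45, Fraction(1, 45))
    print(two_stage(15, 20, 25))    # (50, Fraction(1, 25))
```

Latency gets worse because the first stage only needs 20 ns but is given a full 25 ns clock period; throughput improves because a new input is accepted every 25 ns instead of every 45 ns.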

Pipeline Example

Figure: circuit with blocks A, B and C between input X and output Y; extracted labels: 2, 1, 2.

LATENCY and THROUGHPUT for:

0-pipe:
1-pipe:
2-pipe:
3-pipe:

OBSERVATIONS:

• 1-pipeline improves neither L or T.
• T improved by breaking long combinational paths, allowing faster clock.
• Too many stages cost L, don’t improve T.
• Back-to-back registers are often required to keep pipeline well-formed.

Pipelining Summary

Advantages:

• Allows us to increase thruput, by breaking up long combinational paths and (hence) increasing clock frequency

Disadvantages:

• May increase latency...
• Only as good as the weakest link: slowest step constrains system thruput.

Isn’t there a way around this “weak link” problem?

This bottleneck is the only problem

Pipelined Components

but... but... How can I pipeline a clothes dryer???

Pipelined systems can be hierarchical:

• Replacing a slow combinational component with a k-pipe version may increase clock frequency
• Must account for new pipeline stages in our plan

Figure: circuit with blocks A’ (2-pipe), B and C between input X and output Y; 4-stage pipeline, thruput=

How do 6.004 Aces do Laundry?

They work around the bottleneck. First, they find a place with twice as many dryers as washers.

Throughput = ______ loads/min

Latency = ______ mins/load

Figure (by MIT OpenCourseWare): Step 1, Step 2, Step 3, Step 4
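The effect of the extra dryer can be explored with a small simulation. The slide leaves its blanks unfilled, so treat this as a sketch under stated assumptions: one washer (30 mins), two dryers (60 mins each), and a wet load that stays in the washer until some dryer is free. The function name is hypothetical.

```python
import heapq

WASHER_PD = 30  # minutes
DRYER_PD = 60   # minutes


def aces_schedule(n_loads, n_dryers=2):
    """Return a (start, finish) pair per load with several dryers."""
    dryers = [0] * n_dryers  # min-heap: time at which each dryer is free
    heapq.heapify(dryers)
    washer_free = 0
    schedule = []
    for _ in range(n_loads):
        start = washer_free
        wash_done = start + WASHER_PD
        # Take whichever dryer frees up first.
        dry_start = max(wash_done, heapq.heappop(dryers))
        finish = dry_start + DRYER_PD
        heapq.heappush(dryers, finish)
        washer_free = dry_start  # washer is emptied at the hand-off
        schedule.append((start, finish))
    return schedule


if __name__ == "__main__":
    for start, finish in aces_schedule(4):
        print(start, finish, finish - start)
```

Under these assumptions every load takes 90 minutes from start to finish and a load completes every 30 minutes, because the two dryers alternate and the washer is never kept waiting.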

Back to our bottleneck...

Recall our earlier example...

Figure: pipelined circuit with components A (4 ns), B (3 ns), C (8 ns), D (4 ns), E (2 ns) and F (5 ns); T = 1/8ns, L = 24ns.

• C – the slowest component – limits clock period to 8 ns.
• HENCE throughput limited to 1/8ns.

We could improve throughput by

• Finding a pipelined version of C; OR ...
• interleaving multiple copies of C!

Circuit Interleaving

We can simulate a pipelined version of a slow component by replicating the critical element and alternating inputs between the various copies.

Figure: interleaved component C’ built from two copies of C (C0 and C1), each behind a latch (D, G, Q), followed by a mux (inputs 1 and 0) and an output register (D, Q); input Xi; a clocked flip-flop (clk, Q) drives the select signal.

This is a simple 2-state FSM that alternates between 0 and 1 on each clock

When Q is 1 the lower path is combinational (the latch is open), yet the output of the upper path will be enabled onto the input of the output register ready for the NEXT clock edge. Meanwhile, the other latch maintains the input from the last clock.

Figure: timing diagram with traces C0 output, C1 output and Mux output, alternating between Ceven and Codd values.

“It acts like a 2-stage pipeline”

Figure: five snapshots of the interleaved circuit, with inputs Xi, X0, X1, X2 and X3; the outputs C(X0) and C(X1) appear in the last two snapshots.

Latency = 2 clocks

• Clock period 0: X0 presented at input, propagates thru upper latch, C0.
• Clock period 1: X1 presented at input, propagates thru lower latch, C1. C0(X0) propagates to register inputs.
• Clock period 2: X2 presented at input, propagates thru upper latch, C0. C0(X0) loaded into register, appears at output.

N registers… N-way interleave

2-Clock Martinizing: “In by ti, out by ti+2”

N-way interleaving is equivalent to N pipeline Stages...
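The clock-by-clock description above can be mimicked with a behavioral model. This is a sketch only: it ignores real propagation delays and latch timing, and the function and parameter names are hypothetical. It simply hands inputs to the copies in rotation and reads each copy's result back when its turn comes round again.

```python
def interleave(component, inputs, n_way=2):
    """Return the value visible at the output during each clock period.

    Entry i is component(inputs[i - n_way]); the first n_way entries are
    None because no result has reached the output register yet.
    """
    held = [None] * n_way  # input currently latched in front of each copy
    outputs = []
    for i, x in enumerate(inputs):
        copy = i % n_way  # the counter FSM selects which copy gets X_i
        if i >= n_way:
            # This copy has had n_way clock periods to finish its previous
            # input, so that result is loaded into the output register.
            outputs.append(component(held[copy]))
        else:
            outputs.append(None)
        held[copy] = x  # latch the new input in front of this copy
    return outputs


if __name__ == "__main__":
    print(interleave(lambda x: x * x, [1, 2, 3, 4, 5]))
    # prints [None, None, 1, 4, 9]
```

With `n_way=2` the output lags the input by two clock periods, matching the stated latency of 2 clocks, while each copy gets two full clock periods to compute its result.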

Self-timed Example

a glimpse of an asynchronous, locally-timed discipline

Figure: components A, B and C with signals X and A(X); callouts “here’s” and “…Got it.”

Elegant, timing-independent design:

• Each component specifies its own time constraints
• Local adaptation to special cases (eg, multiplication by 0)
• Module performance improvements automatically exploited
• Can be made asynchronous (no clock at all!) or synchronous

Control Structure Taxonomy

Synchronous, Globally Timed: Centralized clocked FSM generates all control signals. Easy to design but fixed-sized interval can be wasteful (no data-dependencies in timing).

Asynchronous, Globally Timed: Central control unit tailors current time slice to current tasks. Large systems lead to very complicated timing generators… just say no!

Synchronous, Locally Timed: Start and Finish signals generated by each major subsystem, synchronously with global clock. The best way to build large systems that have independently-timed components.

Asynchronous, Locally Timed: Each subsystem takes asynchronous Start, generates asynchronous Finish (perhaps using local clock). The “next big idea” for the last several decades: a lot of design work to do in general, but extra work is worth it in special cases.

Summary

• Latency (L) = time it takes for given input to arrive at output
• Throughput (T) = rate at which new outputs appear
• For combinational circuits: L = tPD of circuit, T = 1/L
• For K-pipelines (K > 0):
  - always have register on output(s)
  - K registers on every path from input to output
  - Inputs available shortly after clock i, outputs available shortly after clock (i+K)
  - T = 1/(tPD,REG + tPD of slowest pipeline stage + tSETUP)
    - more throughput: split slowest pipeline stage(s)
    - use replication/interleaving if no further splits possible
  - L = K / T
    - pipelined latency ≥ combinational latency
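The K-pipeline formulas translate directly into code. The sketch below is illustrative: the function name is hypothetical, and the stage delays in the usage example (including the split of a 25 ns stage into 13 ns and 12 ns) are made-up values chosen to show the trade-off, not figures from the slides.

```python
from fractions import Fraction


def k_pipeline(stage_tpds, t_pd_reg=0, t_setup=0):
    """Latency and throughput of a K-stage pipeline.

    stage_tpds holds the combinational delay of each of the K stages.
    """
    k = len(stage_tpds)
    clock_period = t_pd_reg + max(stage_tpds) + t_setup
    throughput = Fraction(1, clock_period)  # T = 1 / clock period
    latency = k * clock_period              # L = K / T
    return latency, throughput


if __name__ == "__main__":
    print(k_pipeline([20, 25]))        # (50, Fraction(1, 25))
    # Hypothetical split of the 25 ns stage into 13 ns + 12 ns:
    print(k_pipeline([20, 13, 12]))    # (60, Fraction(1, 20))
    # Same two stages with 1 ns register delay and 1 ns setup (assumed):
    print(k_pipeline([20, 25], 1, 1))  # (54, Fraction(1, 27))
```

Splitting the slowest stage raises throughput from 1/25 to 1/20 but pushes latency from 50 to 60, and any register overhead is paid once per stage.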
