



These slides are part of the MIT 6.004 Spring 2009 course materials and focus on pipelining in digital circuits. They discuss the concept of pipelining, its advantages and disadvantages, and various pipelining methodologies, and include examples and summaries to help students understand the concept.




L08 - Pipelining 1
3/3/
What Seymour Cray taught the laundry industry
"I've got 3 months' worth of laundry to do tonight..."
"Funny, considering that he's only got one outfit..."
Due Thursday: Lab #
modified 2/23/09 10:
L08 - Pipelining 2
6.004 - Spring 2009
Device: Washer
Function: Fill, Agitate, Spin
Washer_PD = 30 mins

Device: Dryer
Function: Heat, Spin
Dryer_PD = 60 mins

INPUT: dirty laundry OUTPUT: 6 more weeks
L08 - Pipelining 3
Total = Washer_PD + Dryer_PD = _________ mins
Everyone knows that the real reason that MIT students put off doing laundry so long is not because they procrastinate, are lazy, or even have better things to do. The fact is, doing one load at a time is not smart.
L08 - Pipelining 4
Total = N*(Washer_PD + Dryer_PD) = ____________ mins
(Of course, this is just an urban legend. No one at Harvard actually does laundry. The butlers all arrive on Wednesday morning, pick up the dirty laundry and return it all pressed and starched in time for afternoon tea.)
[Figures by MIT OpenCourseWare: laundry timelines, Step 1, Step 2, Step 3, Step 4, ...]
L08 - Pipelining 5
Total = N * Max(Washer_PD, Dryer_PD) = ____________ mins
Actually, it's more like N*60 + 30 if we account for the startup transient correctly. When doing pipeline analysis, we're mostly interested in the "steady state" where we assume we have an infinite supply of inputs.
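The three totals discussed so far can be compared with a short calculation. This is only a sketch: the 30 and 60 minute delays come from the slides, while the function names are made up for illustration, and the startup-corrected form assumes the dryer is the slower stage (as it is here).

```python
WASHER_PD = 30  # minutes, from the slides
DRYER_PD = 60   # minutes, from the slides


def one_load_at_a_time(n):
    """Total minutes when each load is washed and dried before the next starts."""
    return n * (WASHER_PD + DRYER_PD)


def pipelined_steady_state(n):
    """Steady-state estimate: the slower device sets the pace."""
    return n * max(WASHER_PD, DRYER_PD)


def pipelined_with_startup(n):
    """N*60 + 30: the first wash happens before the dryer has anything to do.

    Assumes the dryer is the slower stage, as in the slides.
    """
    return WASHER_PD + n * DRYER_PD


if __name__ == "__main__":
    for n in (1, 4, 10):
        print(n, one_load_at_a_time(n), pipelined_steady_state(n),
              pipelined_with_startup(n))
```

For large N the extra 30 minutes becomes negligible, which is why the steady-state figure is the one used in pipeline analysis.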
L08 - Pipelining 6
Latency: 90 mins (one load at a time), 120 mins (pipelined)
Throughput: 1/90 loads per min (one load at a time), 1/... (pipelined)
Assuming that the wash is started as soon as possible and waits (wet) in the washer until dryer is available.
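One way to see where the 120-minute latency comes from is to simulate the schedule load by load. The sketch below assumes exactly the policy stated above (wash starts as soon as the washer is empty, and a finished load waits in the washer until the dryer is free); `laundry_schedule` is a hypothetical helper name.

```python
def laundry_schedule(n, wash=30, dry=60):
    """Return (wash_start, dry_done) times in minutes for each of n loads."""
    washer_free = 0   # time the washer is next empty
    dryer_free = 0    # time the dryer is next empty
    schedule = []
    for _ in range(n):
        wash_start = washer_free
        wash_done = wash_start + wash
        # The load waits (wet) in the washer until the dryer is available.
        dry_start = max(wash_done, dryer_free)
        dry_done = dry_start + dry
        washer_free = dry_start   # washer empties only when the load moves on
        dryer_free = dry_done
        schedule.append((wash_start, dry_done))
    return schedule


if __name__ == "__main__":
    for start, done in laundry_schedule(4):
        print(f"start={start:3d}  done={done:3d}  latency={done - start}")
```

Under these assumptions the first load takes 90 minutes, every later load takes 120 minutes from wash start to dry finish, and a load completes every 60 minutes.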
L08 - Pipelining 7
[Figure: combinational circuit with blocks F, G, and H computing P(X) from input X]
For combinational logic: latency = t_PD, throughput = 1/t_PD.
We can't get the answer faster, but are we making effective use of our hardware at all times?
L08 - Pipelining 8
[Figure: the F, G, H circuit computing P(X) from X with pipeline registers added; delay labels 15, 20, 25]
Now F & G can be working on input X_{i+1} while H is performing its computation on X_i. We've created a 2-stage pipeline: if we have a valid input X during clock cycle j, P(X) is valid during clock j+2.
Suppose F, G, H have propagation delays of 15, 20, 25 ns and we are using ideal zero-delay registers:
2-stage pipeline: latency = 50 ns (worse), throughput = 1/(25 ns) (better)
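The pipelined figures can be reproduced with a few lines of arithmetic. This sketch assumes that F and G operate in parallel on X and that H consumes both of their outputs, which is how the text describes the two stages; the unpipelined numbers it prints follow from that assumption rather than from the slide text. Function names are illustrative only.

```python
from fractions import Fraction

T_F, T_G, T_H = 15, 20, 25  # propagation delays in ns, from the slides


def unpipelined():
    """Latency and throughput with no registers (assumes F and G run in parallel)."""
    t_pd = max(T_F, T_G) + T_H
    return t_pd, Fraction(1, t_pd)


def two_stage():
    """Latency and throughput with ideal registers after F/G and after H."""
    t_clk = max(max(T_F, T_G), T_H)  # the slowest stage sets the clock period
    return 2 * t_clk, Fraction(1, t_clk)


if __name__ == "__main__":
    print("unpipelined (latency ns, results per ns):", unpipelined())
    print("2-stage     (latency ns, results per ns):", two_stage())
```

The clock must be long enough for the slowest stage (H, 25 ns), so the first stage wastes 5 ns per cycle; that slack is why latency grows to 50 ns even as throughput improves.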
L08 - Pipelining 13
[Figure: pipelining example from X to Y with component delays 2, 1, and 1; table rows: 0-pipe, 1-pipe, 2-pipe, 3-pipe]
L08 - Pipelining 14
This bottleneck is the only problem
L08 - Pipelining 15
but... but... How can I pipeline a clothes dryer???
[Figure: circuit from X to Y with numeric labels 1, 3, 1, 2; one component is replaced by A' (2-pipe)]
4-stage pipeline, thruput=
L08 - Pipelining 16
[Figure by MIT OpenCourseWare: Step 1, Step 2, Step 3, Step 4]
L08 - Pipelining 17
Back to our bottleneck...
A: 4 ns, B: 3 ns, C: 8 ns, D: 4 ns, E: 2 ns, F: 5 ns
T = 1/(8 ns), L = 24 ns
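A quick check of these numbers, as a sketch: with ideal registers the clock period cannot be shorter than the slowest single component, and the quoted latency then implies a stage count. The circuit topology is not given in the text, so the three-stage figure below is inferred from L and T, not read from the slide; the helper names are hypothetical.

```python
DELAYS_NS = {"A": 4, "B": 3, "C": 8, "D": 4, "E": 2, "F": 5}  # from the slide


def bottleneck(delays):
    """Return the (name, delay) of the slowest component."""
    name = max(delays, key=delays.get)
    return name, delays[name]


def stages_implied(latency_ns, t_clk_ns):
    """Number of pipeline stages implied by latency = stages * clock period."""
    return latency_ns // t_clk_ns


if __name__ == "__main__":
    name, t_clk = bottleneck(DELAYS_NS)
    print(f"bottleneck: {name} at {t_clk} ns, so T = 1/({t_clk} ns)")
    print("stages implied by L = 24 ns:", stages_implied(24, t_clk))
```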
Recall our earlier example...
multiple copies of C!
L08 - Pipelining 18
We can simulate a pipelined version of a slow component by replicating the critical element and alternating inputs between the various copies.
[Figure: input X_i feeds two latches (D, G, Q), one in front of each copy of C (C_0 and C_1); the selected output is loaded into an output register (D, Q) clocked by clk]
This is a simple 2-state FSM (output Q) that alternates between 0 and 1 on each clock.
L08 - Pipelining 19
When Q is 1 the lower path is combinational (the latch is open), yet the output of the upper path will be enabled onto the input of the output register ready for the NEXT clock edge. Meanwhile, the other latch maintains the input from the last clock.
[Timing diagram: outputs of C_0 and C_1 for even and odd inputs, and the mux output]
"It acts like a 2-stage pipeline"
L08 - Pipelining 20
[Figure: the interleaved circuit (two latches, copies C_0 and C_1, output register) shown over successive clock periods with input X_i]
Latency = 2 clocks
• Clock period 0: X_0 presented at input, propagates thru upper latch, C_0.
• Clock period 1: X_1 presented at input, propagates thru lower latch, C_1. C_0(X_0) propagates to register inputs.
• Clock period 2: X_2 presented at input, propagates thru upper latch, C_0. C_0(X_0) loaded into register, appears at output.
"In by t_i, out by t_{i+...}"
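The clock-by-clock behaviour described above can be mimicked in software. This is a behavioural sketch only: the slow component is modelled as an ordinary function `c` (an assumption for illustration), real propagation delays are abstracted away, and `interleave_two_copies` is a hypothetical name. Index 0 stands for the upper latch and copy, index 1 for the lower.

```python
def interleave_two_copies(inputs, c):
    """Return the register output visible during each clock period."""
    latch = [None, None]   # latch[0] feeds copy C_0, latch[1] feeds copy C_1
    register = None        # output register, empty until the pipeline fills
    visible = []
    for period, x in enumerate(inputs):
        visible.append(register)   # what the output shows during this period
        q = period % 2             # 2-state FSM: 0, 1, 0, 1, ...
        latch[q] = x               # the open latch passes the new input to its copy
        held = latch[1 - q]        # the other latch still holds last period's input
        # The mux steers the copy that has had a full extra period onto the
        # register input; the register loads it at the edge ending this period.
        register = None if held is None else c(held)
    return visible


if __name__ == "__main__":
    outputs = interleave_two_copies([1, 2, 3, 4, 5], lambda v: v * v)
    for period, out in enumerate(outputs):
        print(f"clock period {period}: output = {out}")
```

In this model the result for the input presented in period i becomes visible in period i+2, and a new result appears every period after that, matching the stated latency of 2 clocks.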
L08 - Pipelining 25
a glimpse of an asynchronous, locally-timed discipline
Elegant, timing-independent design:
• Each component specifies its own time constraints
• Local adaptation to special cases (e.g., multiplication by 0)
• Module performance improvements automatically exploited
• Can be made asynchronous (no clock at all!) or synchronous
[Figure: modules A, B, and C passing X and A(X) with a handshake: "here's ..." / "...Got it."]
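The "local adaptation to special cases" point can be illustrated with a toy model in which a module reports its own completion time. Everything numeric here is hypothetical: the 8-unit and 1-unit delays and the function names are invented for the sketch and do not come from the slides.

```python
FULL_DELAY = 8   # hypothetical worst-case multiply time
ZERO_DELAY = 1   # hypothetical time when an operand is 0


def multiply_self_timed(a, b):
    """Return (product, time taken); finishes early when either operand is 0."""
    if a == 0 or b == 0:
        return 0, ZERO_DELAY
    return a * b, FULL_DELAY


def total_time(pairs, locally_timed):
    """Total time for a sequence of multiplies under the two timing disciplines."""
    total = 0
    for a, b in pairs:
        _, delay = multiply_self_timed(a, b)
        # A globally timed design must budget the worst case for every operation.
        total += delay if locally_timed else FULL_DELAY
    return total


if __name__ == "__main__":
    work = [(3, 4), (0, 7), (5, 0), (2, 2)]
    print("globally timed:", total_time(work, locally_timed=False))
    print("locally timed: ", total_time(work, locally_timed=True))
```

With the module signalling its own finish, the fast special cases shorten the total, whereas a fixed time slot per operation cannot benefit from them.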
L08 - Pipelining 26
Globally Timed, Synchronous: Centralized clocked FSM generates all control signals. Easy to design but fixed-sized interval can be wasteful (no data-dependencies in timing).
Globally Timed, Asynchronous: Central control unit tailors current time slice to current tasks. Large systems lead to very complicated timing generators... just say no!
Locally Timed, Synchronous: Start and Finish signals generated by each major subsystem, synchronously with global clock. The best way to build large systems that have independently-timed components.
Locally Timed, Asynchronous: Each subsystem takes asynchronous Start, generates asynchronous Finish (perhaps using local clock). The "next big idea" for the last several decades: a lot of design work to do in general, but extra work is worth it in special cases.
L08 - Pipelining 27
For combinational circuits: L = t_PD of circuit, T = 1/L
Clock period = t_PD,REG + t_PD of slowest pipeline stage + t_SETUP
More throughput: split slowest pipeline stage(s)
Pipelined latency >= combinational latency
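The summary relations can be turned into a small calculator. This is a sketch under stated assumptions: the stage delays reuse the F/G/H example (20 ns for the first stage, taking F and G in parallel, and 25 ns for H), while the register delay and setup time of 2 ns and 1 ns are hypothetical values chosen only to show the effect of non-ideal registers.

```python
def k_pipeline_timing(stage_delays, t_pd_reg, t_setup):
    """Return (clock period, latency) for a pipeline with one register per stage.

    clock period = t_PD,REG + t_PD of slowest stage + t_SETUP
    latency      = number of stages * clock period
    """
    t_clk = t_pd_reg + max(stage_delays) + t_setup
    return t_clk, len(stage_delays) * t_clk


if __name__ == "__main__":
    stages = [20, 25]  # ns; assumes F and G share the first stage, H the second
    print("ideal registers:    ", k_pipeline_timing(stages, 0, 0))
    print("hypothetical 2/1 ns:", k_pipeline_timing(stages, 2, 1))
```

Throughput is one result per clock period, so any register overhead lowers throughput and raises latency relative to the ideal-register figures.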