Prof. Saman Amarasinghe, MIT.
6.189 IAP 2007
Lecture 3
Introduction to Parallel Architectures
Implicit vs. Explicit Parallelism
● Implicit
  – Hardware: Superscalar Processors
● Explicit
  – Compiler: Explicitly Parallel Architectures
Implicit Parallelism: Superscalar Processors
● Issue varying numbers of instructions per clock
  – statically scheduled
      – using compiler techniques
      – in-order execution
  – dynamically scheduled
      – Extracting ILP by examining 100's of instructions
      – Scheduling them in parallel as operands become available
      – Rename registers to eliminate anti-dependences
      – out-of-order execution
      – Speculative execution
Pipelining Execution
[Figure: pipeline diagram. Five successive instructions, starting with Instruction i, each pass through the stages IF, ID, EX, WB. Each instruction starts one cycle after the previous one, over cycles 1 to 8.]
IF: Instruction fetch
ID: Instruction decode
EX: Execution
WB: Write back
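The overlap in the pipelining diagram can be sketched in a few lines. This is a minimal model under stated assumptions: an ideal four-stage pipeline with no stalls and one instruction issued per cycle. The function names `pipeline_schedule` and `total_cycles` are illustrative, not from the lecture.

```python
# Minimal sketch of an ideal 4-stage pipeline (assumes no stalls, one issue per cycle).
STAGES = ["IF", "ID", "EX", "WB"]

def pipeline_schedule(num_instructions):
    """Return {instruction offset: {stage: cycle}} with cycles starting at 1."""
    schedule = {}
    for k in range(num_instructions):
        # Instruction i+k enters IF in cycle k + 1 and advances one stage per cycle.
        schedule[k] = {stage: k + 1 + s for s, stage in enumerate(STAGES)}
    return schedule

def total_cycles(num_instructions):
    """Cycles to finish everything: fill the pipeline once, then one result per cycle."""
    return len(STAGES) + num_instructions - 1

if __name__ == "__main__":
    for k, stages in pipeline_schedule(5).items():
        print(f"i+{k}: {stages}")
    print("total cycles:", total_cycles(5))
```

With five instructions the model finishes in 8 cycles, which matches the cycle axis of the diagram; executing the same instructions one after another would take 20.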
Data Dependence and Hazards
● InstrJ is data dependent (aka true dependence) on InstrI:
    I: add r1,r2,r3
    J: sub r4,r1,r
● If two instructions are data dependent, they cannot execute simultaneously, be completely overlapped, or execute out of order
● If a data dependence causes a hazard in the pipeline, it is called a Read After Write (RAW) hazard
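A true dependence between two instructions can be checked mechanically. The sketch below assumes a simple three-operand form, `op dest,src1,src2`. The `Instr` tuple and the `parse` and `has_raw` helpers are hypothetical names. The second source of J is cut off in the slide, so `r5` here is only a placeholder.

```python
from collections import namedtuple

# Hypothetical three-operand instruction form: op dest,src1,src2.
Instr = namedtuple("Instr", ["op", "dest", "srcs"])

def parse(text):
    """Parse 'add r1,r2,r3' into Instr('add', 'r1', ('r2', 'r3'))."""
    op, operands = text.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return Instr(op, regs[0], tuple(regs[1:]))

def has_raw(earlier, later):
    """True if `later` reads a register that `earlier` writes (true dependence)."""
    return earlier.dest in later.srcs

I = parse("add r1,r2,r3")
J = parse("sub r4,r1,r5")  # r5 is a placeholder for the operand cut off in the slide

print(has_raw(I, J))  # J reads r1, which I writes
```

Because J reads `r1` after I writes it, J cannot start its execute stage until I has produced its result (or the value is forwarded).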
ILP and Data Dependencies, Hazards
● HW/SW must preserve program order: the order instructions would execute in if executed sequentially, as determined by the original source program
  – Dependences are a property of programs
● Importance of the data dependencies
  – 1) indicates the possibility of a hazard
  – 2) determines the order in which results must be calculated
  – 3) sets an upper bound on how much parallelism can possibly be exploited
● Goal: exploit parallelism by preserving program order only where it affects the outcome of the program
Name Dependence #2: Output dependence
● InstrJ writes operand before InstrI writes it.
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r
● Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”
● If an output dependence causes a hazard in the pipeline, it is called a Write After Write (WAW) hazard
● Instructions involved in a name dependence can execute simultaneously if the name used in the instructions is changed so the instructions do not conflict
  – Register renaming resolves name dependence for registers
  – Renaming can be done either by compiler or by HW
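Register renaming can be illustrated with a small sketch. It assumes an unlimited pool of physical registers named `p0`, `p1`, ... and instructions given as `(op, dest, srcs)` tuples. Both are simplifications, and `rename` is a hypothetical helper. The last operand of K is cut off in the slide, so `r7` is only a placeholder.

```python
def rename(program):
    """Give every write a fresh physical register; reads use the latest mapping."""
    mapping = {}    # architectural register -> current physical register
    next_phys = 0
    renamed = []
    for op, dest, srcs in program:
        # Sources are looked up before the destination is remapped.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        mapping[dest] = f"p{next_phys}"
        next_phys += 1
        renamed.append((op, mapping[dest], new_srcs))
    return renamed

program = [
    ("sub", "r1", ("r4", "r3")),   # I
    ("add", "r1", ("r2", "r3")),   # J
    ("mul", "r6", ("r1", "r7")),   # K; r7 is a placeholder for the cut-off operand
]

for instr in rename(program):
    print(instr)
```

After renaming, I and J write different registers, so the WAW hazard on `r1` disappears, while K still reads the register written by J, so the true dependence is preserved.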
Control Dependencies
● Every instruction is control dependent on some set of branches, and, in general, these control dependencies must be preserved to preserve program order
    if p1 {S1;};
    if p2 {S2;}
● S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.
● Control dependence need not be preserved
  – we are willing to execute instructions that should not have been executed, thereby violating the control dependences, if we can do so without affecting the correctness of the program
● Speculative Execution
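The idea of violating a control dependence without affecting correctness can be sketched as follows. This is a toy model, not how hardware implements it: the state is a Python dict, speculative work happens on a copy, and `run_speculatively` and `s1` are hypothetical names.

```python
def run_speculatively(p1, s1, state):
    """Run s1 before the branch on p1 resolves; keep its effects only if p1 holds."""
    speculative = dict(state)   # work on a copy so the committed state is untouched
    s1(speculative)             # S1 executes early, ignoring its control dependence
    return speculative if p1 else state   # squash the speculative copy if p1 is false

def s1(regs):
    regs["x"] = regs["x"] + 10  # hypothetical body of S1

committed = {"x": 1}
print(run_speculatively(True, s1, committed))
print(run_speculatively(False, s1, committed))
```

When `p1` turns out to be false, the speculative copy is discarded and the committed state is unchanged, so the program's outcome is the same as if S1 had never run.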
Speculation is Rampant in Modern Superscalars
● Different predictors
  – Branch Prediction
  – Value Prediction
  – Prefetching (memory access pattern prediction)
● Inefficient
  – Predictions can go wrong
  – Has to flush out wrongly predicted data
  – While not impacting performance, it consumes power
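As one concrete illustration of branch prediction, the sketch below uses a two-bit saturating counter. This is a common textbook scheme chosen here as an assumption; the slides do not say which predictor is meant. The class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move toward 3 on taken, toward 0 on not taken, saturating at both ends.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

def count_mispredictions(outcomes):
    predictor = TwoBitPredictor()
    wrong = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            wrong += 1   # a real pipeline would flush speculated work here
        predictor.update(taken)
    return wrong

# Hypothetical loop branch: taken nine times, then falls through once.
outcomes = [True] * 9 + [False]
print(count_mispredictions(outcomes))
```

Each misprediction corresponds to work that was fetched and executed and then thrown away, which is where the wasted power comes from.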
Today’s CPU Architecture: Heat becoming an unmanageable problem
[Figure: Power Density (W/cm²) on a logarithmic axis from 1 to 10,000, plotted against year from ’70 to ’10. Processors shown: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®. Reference levels: Hot Plate, Nuclear Reactor, Rocket Nozzle, Sun’s Surface.]
Source: Intel Developer Forum, Spring 2004 - Pat Gelsinger (Pentium at 90 W)
Cube relationship between the cycle time and power
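The cube relationship on this slide is usually explained with the standard CMOS dynamic-power model. The sketch below assumes that dynamic power dominates and that the supply voltage must scale roughly linearly with clock frequency. The symbols (activity factor α, switched capacitance C, supply voltage V, frequency f, cycle time T_cycle) are not defined in the slides.

```latex
P_{\text{dyn}} = \alpha\, C\, V^{2} f,
\qquad V \propto f
\;\Longrightarrow\;
P_{\text{dyn}} \propto f^{3} = \frac{1}{T_{\text{cycle}}^{3}}
```

Under these assumptions, halving the cycle time raises dynamic power by roughly a factor of eight.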
Outline
● Implicit Parallelism: Superscalar Processors
● Explicit Parallelism
● Shared Instruction Processors
● Shared Sequencer Processors
● Shared Network Processors
● Shared Memory Processors
● Multicore Processors
Explicit Parallel Processors
● Parallelism is exposed to software
  – Compiler or Programmer
● Many different forms
  – Loosely coupled Multiprocessors to tightly coupled VLIW
Types of Parallelism
[Figure: four panels, each drawn against a time axis: Data-Level Parallelism (DLP), Thread-Level Parallelism (TLP), Instruction-Level Parallelism (ILP), and Pipelining.]
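The difference between data-level and thread-level parallelism can be sketched in software. The example uses Python's standard `concurrent.futures` module; `square`, `data_parallel`, and `thread_parallel` are hypothetical names. Because of CPython's global interpreter lock, this only shows the structure of the two forms and is not meant to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def data_parallel(values):
    """DLP: the same operation applied independently to every element."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(square, values))

def thread_parallel(tasks):
    """TLP: unrelated tasks, each submitted as its own unit of work."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task) for task in tasks]
        return [f.result() for f in futures]

if __name__ == "__main__":
    print(data_parallel([1, 2, 3, 4]))
    print(thread_parallel([lambda: sum(range(10)), lambda: "abc".upper()]))
```

In the data-parallel case every worker runs the same function on different data. In the thread-parallel case each worker runs a different piece of code.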
Translating Parallelism Types
[Figure: diagram relating four forms of parallelism: Data Parallel, Pipelining, Thread Parallel, Instruction Parallel.]