



Lecture notes from CMSC 411 (Computer Systems Architecture), focusing on pipelining in the MIPS R4000 processor and on instruction-level parallelism. They cover the stages of the MIPS R4000 pipeline, forwarding, and pipeline performance, and discuss the pitfalls of extensive pipelining and the importance of instruction-level parallelism.




Alan Sussman
- Questions about HW #1?
- HW #2, on pipelining, due next Thursday, Feb. 26
- Start reading Chapter 2 of H&P
CMSC 411 - 7 (from Patterson)
MIPS64 architecture, with a deeper 8-stage pipeline
- to get higher clock rates
- extra stages come from memory accesses
- technique called superpipelining
IF - first half of instruction fetch
  - PC selection and start of instruction cache access
IS - second half of instruction fetch
  - completes instruction cache access
RF - instruction decode, register fetch, hazard checking, instruction cache hit detection
EX - execution: includes effective address computation, ALU operation, branch target computation and condition evaluation
DF - first half of data fetch
  - first half of data cache access
DS - second half of data fetch
  - completes data cache access
TC - tag check
  - determines whether the data cache access hit
WB - write back for loads and ALU operations
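As a rough illustration of how the eight stages overlap, the sketch below builds a table of which stage each instruction occupies in each clock cycle. It assumes an ideal pipeline (one instruction enters IF per cycle, no stalls); the function name `pipeline_timeline` and the table layout are illustrative choices, not part of the lecture.

```python
# Stage names as listed above for the 8-stage R4000 pipeline.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def pipeline_timeline(num_instructions):
    """Return one row per instruction: the stage it occupies in each cycle.

    Assumes an ideal pipeline: one instruction enters IF per cycle, no stalls.
    (Hypothetical helper, for illustration only.)
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage_index = cycle - i  # instruction i enters IF in cycle i
            if 0 <= stage_index < len(STAGES):
                row.append(STAGES[stage_index])
            else:
                row.append("")  # not yet fetched, or already retired
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(pipeline_timeline(3)):
        print(f"instr {i}: " + " ".join(f"{s:>2}" for s in row))
```

Reading down a column shows which instructions are in flight in the same cycle, which is where forwarding and hazard checks come into play.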
Might need to restart ADDD’s ALU
Deeper pipeline increases the number of levels of forwarding for ALU operations
- 4 possible sources for an ALU bypass: EX/DF, DF/DS, DS/TC, TC/WB
- for example, when a branch is taken before a previous instruction finishes
Extensive pipelining can lead to worse cost-performance
- more complex hardware can cause a longer clock cycle, killing the benefits of more pipelining
- compiler writers need to understand the architecture in order to
  - optimize efficiently and
  - avoid hazards
- better to eliminate useless instructions than to make them run faster
- Compiler techniques to increase ILP
- Loop Unrolling
- Static Branch Prediction
- Dynamic Branch Prediction
- Overcoming Data Hazards with Dynamic Scheduling
- (Start) Tomasulo Algorithm
- Conclusion
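Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls
- Ideal pipeline CPI: measure of the maximum performance attainable by the implementation
- Structural hazards: hardware cannot support this combination of instructions
- Data hazards: instruction depends on the result of a prior instruction still in the pipeline
- Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)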
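The stall categories above combine additively in the usual textbook relation: pipeline CPI = ideal CPI + structural stalls + data hazard stalls + control stalls, with each stall term measured in average stall cycles per instruction. A minimal sketch of that arithmetic follows; the function name `pipeline_cpi` and all the stall figures are made up for illustration and are not measurements from the lecture.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_hazard_stalls, control_stalls):
    """Pipeline CPI as ideal CPI plus average stall cycles per instruction.

    Each stall argument is an average number of stall cycles per instruction.
    (Hypothetical helper, for illustration only.)
    """
    return ideal_cpi + structural_stalls + data_hazard_stalls + control_stalls


# Made-up stall figures, purely to show the arithmetic.
cpi = pipeline_cpi(
    ideal_cpi=1.0,
    structural_stalls=0.05,
    data_hazard_stalls=0.20,
    control_stalls=0.15,
)

# Slowdown relative to the ideal pipeline (ideal CPI of 1.0 assumed above).
slowdown = cpi / 1.0
```

With these assumed numbers the stalls add 0.4 cycles per instruction on top of the ideal CPI, so reducing any one stall term lowers the total directly.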
Instruction-Level Parallelism (ILP): overlap the execution of instructions to improve performance
2 approaches to exploit ILP:
1) Rely on hardware to help discover and exploit the parallelism dynamically (e.g., Pentium 4, AMD Opteron, IBM Power), and
2) Rely on software technology to find parallelism statically at compile time (e.g., Itanium 2/IA-64)
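Basic block: a straight-line code sequence with no branches in except to the entry and no branches out except at the exit
=> 4 to 7 instructions execute between a pair of branches
To get substantial gains, ILP must be exploited across multiple basic blocks
Simplest: loop-level parallelism, which exploits parallelism among iterations of a loop. E.g.,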
for (i=1; i<=1000; i=i+1)
x[i] = x[i] + y[i];
Exploit loop-level parallelism by "unrolling loop" either by
1. dynamic via branch prediction or
2. static via loop unrolling by compiler
(Another way is vectors, to be covered later)
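To make static unrolling concrete, here is a sketch of the x[i] = x[i] + y[i] loop in its original form and unrolled by a factor of 4. It is written in Python only to show the transformation; a compiler would do this on the machine-level loop. The function names and the unroll factor are assumptions chosen for illustration.

```python
def add_rolled(x, y):
    """Original loop: one element per iteration, indices 1..1000 as in the slide."""
    for i in range(1, 1001):
        x[i] = x[i] + y[i]


def add_unrolled_by_4(x, y):
    """Same loop unrolled 4 times.

    1000 is divisible by 4, so no cleanup loop is needed for leftover iterations.
    The four statements in the body touch different elements, so they are
    independent of one another.
    """
    for i in range(1, 1001, 4):
        x[i] = x[i] + y[i]
        x[i + 1] = x[i + 1] + y[i + 1]
        x[i + 2] = x[i + 2] + y[i + 2]
        x[i + 3] = x[i + 3] + y[i + 3]
```

The unrolled version executes a quarter as many loop-control steps and exposes four independent additions per iteration for scheduling.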
Determining dependences is critical
If 2 instructions are
- parallel, they can execute simultaneously in a pipeline of arbitrary depth without causing any stalls (assuming no structural hazards)
- dependent, they must be executed in order, although they may often be partially overlapped
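One simple way to picture the distinction is a check for whether a later instruction reads a register that an earlier one writes. The sketch below models only that case (a true, read-after-write data dependence); the `Instr` record and `depends_on` helper are hypothetical names introduced for illustration, and the register names are arbitrary.

```python
from collections import namedtuple

# Hypothetical instruction record: one destination register, a tuple of sources.
Instr = namedtuple("Instr", ["dest", "sources"])


def depends_on(later, earlier):
    """True if `later` reads the register that `earlier` writes."""
    return earlier.dest is not None and earlier.dest in later.sources


# A load into F0, an add that consumes F0, and an unrelated add.
load = Instr(dest="F0", sources=("R1",))
add = Instr(dest="F4", sources=("F0", "F2"))
other = Instr(dest="F8", sources=("F6", "F10"))

add_needs_load = depends_on(add, load)      # add must follow load in order
other_needs_load = depends_on(other, load)  # no shared register: can overlap freely
```

Here `add` depends on `load` and has to stay behind it, while `other` shares no registers with `load` and could execute alongside it.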