Lecture 7: Pipelining & Instruction Level Parallelism in MIPS R4000 - Prof. Alan L. Sussman, Assignments of Computer Science

University of Maryland Computer Science

Prof. Alan L. Sussman

A lecture note from the CMSC 411 Computer Systems Architecture course, focusing on the MIPS R4000 processor's pipelining and instruction-level parallelism. It covers the stages of the MIPS R4000 pipeline, forwarding techniques, and pipeline performance, and it also discusses the pitfalls of extensive pipelining and the importance of instruction-level parallelism.

Typology: Assignments

Pre 2010

Uploaded on 07/29/2009


CMSC 411

Computer Systems Architecture

Lecture 7

Basic Pipelining (cont.) &

Instruction Level Parallelism

Alan Sussman

als@cs.umd.edu

Administrivia

• Questions about HW #1?

• HW #2, on pipelining, due next Thursday, Feb. 26

• Start reading Chapter 2 of H&P

CMSC 411 - 7 (from Patterson) 2

A case study: MIPS R4000

• MIPS64 architecture, with a deeper 8-stage pipeline

– to get higher clock rates

– extra stages come from memory accesses

– this technique is called superpipelining


MIPS R4000 pipeline stages

• IF – 1st half of instruction fetch

– PC selection and start of instruction cache access

• IS – 2nd half of instruction fetch

– complete instruction cache access

• RF – instruction decode, register fetch, hazard checking, instruction cache hit detection

• EX – execution

– includes effective address computation, ALU operation, branch target computation and condition evaluation

MIPS R4000 pipeline (cont.)

• DF – 1st half of data fetch

– 1st half of data cache access

• DS – 2nd half of data fetch

– complete data cache access

• TC – tag check

– determine whether the data cache access hit

• WB – write back for loads and ALU operations


MIPS R4000 pipeline (cont.)

A 2-cycle load delay


Might need to restart ADDD’s ALU

MIPS R4000 pipeline (cont.)

A 3-cycle branch delay – 1 delay slot + a 2-cycle stall for a taken branch (an untaken branch costs just the delay slot)


Forwarding

• Deeper pipeline increases the number of levels of forwarding for ALU operations

– 4 possible sources for an ALU bypass: EX/DF, DF/DS, DS/TC, TC/WB


Pitfalls

• Unexpected hazards do occur

– for example, when a branch is taken before a previous instruction finishes

• Extensive pipelining can slow a machine down, or lead to worse cost-performance

– more complex hardware can cause a longer clock cycle, killing the benefits of more pipelining


Pitfalls (cont.)

• A poor compiler can make a good machine look bad

– compiler writers need to understand the architecture in order to

» optimize efficiently and

» avoid hazards

– better to eliminate useless instructions than make them run faster


INSTRUCTION-LEVEL PARALLELISM


Outline

• ILP

• Compiler techniques to increase ILP

• Loop Unrolling

• Static Branch Prediction

• Dynamic Branch Prediction

• Overcoming Data Hazards with Dynamic Scheduling

• (Start) Tomasulo Algorithm

• Conclusion


Recall from Pipelining

• Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data Hazard Stalls + Control Stalls

– Ideal pipeline CPI: measure of the maximum performance attainable by the implementation

– Structural hazards: HW cannot support this combination of instructions

– Data hazards: Instruction depends on result of prior instruction still in the pipeline

– Control hazards: Caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)


Instruction Level Parallelism

• Instruction-Level Parallelism (ILP): overlap the execution of instructions to improve performance

• 2 approaches to exploit ILP:

1) Rely on hardware to help discover and exploit the parallelism dynamically (e.g., Pentium 4, AMD Opteron, IBM Power), and

2) Rely on software technology to find parallelism statically at compile-time (e.g., Itanium 2/IA-64)


Instruction-Level Parallelism (ILP)

• Basic Block (BB) ILP is quite small

– BB: a straight-line code sequence with no branches in except to the entry and no branches out except at the exit

– average dynamic branch frequency 15% to 25% => 4 to 7 instructions execute between a pair of branches

– plus instructions in a BB are likely to depend on each other

• Need ILP across multiple basic blocks

• Simplest: loop-level parallelism to exploit parallelism among iterations of a loop. E.g.,

    for (i=1; i<=1000; i=i+1)
        x[i] = x[i] + y[i];


Loop-Level Parallelism

• Exploit loop-level parallelism by “unrolling loop” either

1. dynamically via branch prediction or

2. statically via loop unrolling by the compiler

(Another way is vectors, to be covered later)

• Determining dependences is critical

• If 2 instructions are

– parallel, they can execute simultaneously in a pipeline of arbitrary depth without causing any stalls (assuming no structural hazards)

– dependent, they are not parallel and must be executed in order, although they may often be partially overlapped
