Instruction Pipelining in Computer Systems Architecture: MIPS Implementation and Pitfalls , Study notes of Computer Science

University of Maryland Computer Science

Prof. Alan L. Sussman

Instruction pipelining in computer systems architecture using the mips processor as an example. It covers both an unpipelined and pipelined implementation, the benefits and costs of pipelining, and the hazards that can cause stalls. The document also touches upon compiler approaches to branch delays and exception handling.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

CMSC 411 - A. Sussman (from D. O'Leary) 1

Computer Systems Architecture

CMSC 411

Unit 3 – Instruction Pipelining

Alan Sussman

February 13, 2003

CMSC 411 - Alan Sussman 2

Administrivia

• HW #2 due today – solution posted soon

• Quiz on Tuesday – Units 1 & 2

• HW #1 problem 1.17d

– MFLOPs with coprocessor

– answer shows that MFLOPs is computed as (# fp ops)/(time for fp ops) = (# fp ops)/(total time – time for integer ops)

– that is correct – don’t count integer ops against MFLOPs

– but both are counted in MIPS (both integer and fp ops are instructions!)
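The MFLOPs and MIPS formulas above can be sketched as follows. This is only an illustration: the function names and the workload numbers are hypothetical and are not taken from HW #1.

```python
def mflops(fp_ops, total_time_s, integer_time_s):
    """MFLOPS, charging only the time spent on floating-point operations."""
    fp_time_s = total_time_s - integer_time_s
    return fp_ops / fp_time_s / 1e6


def mips(instruction_count, total_time_s):
    """MIPS counts every instruction, integer and floating point alike."""
    return instruction_count / total_time_s / 1e6


# Hypothetical workload: 20 million fp ops and 100 million integer ops,
# 4 s in total, of which 3 s is spent on the integer ops.
print(mflops(20_000_000, total_time_s=4.0, integer_time_s=3.0))
print(mips(20_000_000 + 100_000_000, total_time_s=4.0))
```

The integer time is subtracted in the MFLOPs denominator but the integer instructions still appear in the MIPS numerator, which is the distinction the answer key makes.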

CMSC 411 - Alan Sussman 3

Last time

• Compiler/architecture interaction

– providing a good target for the compiler can make a huge difference in performance – up to a factor of 10 on an f.p. intensive application

– provide regularity, primitives, make costs of code sequences easy to determine

• MIPS/MIPS64 architectures

– load/store, 64 bits (with 32-bit ops), 3 instruction formats for MIPS64 (all 32 bits), immediate and displacement addressing modes

CMSC 411 - Alan Sussman 4

So far

• What we mean by computer performance

• How to measure it

• How instruction sets are designed

• How the design influences performance

CMSC 411 - Alan Sussman 5

What’s next

• A variety of hardware and compiler techniques to speed the execution of programs

– What is pipelining? (Section A.1)

– How does MIPS divide instructions into stages or cycles? (A.1)

– What kinds of overheads are there in pipelining? (A.1)

– How much speedup do we get? (A.1)

– What are structural hazards, data hazards, and control hazards? (A.2)

– How are these techniques used to reduce stalls:

• data forwarding? (A.2)

• instruction reordering? (A.2)

• compiler approaches to reduce branch delays? (A.2)

CMSC 411 - Alan Sussman 6

What is pipelining?

• Pipelining is an implementation technique whereby multiple instructions are overlapped in execution

• In other words, at any given moment in the execution of a computer program, many different instructions are at various stages of completion!

• Example: Car wash

CMSC 411 - Alan Sussman 7

Throughput

The number of instructions that complete per unit time
Instructions take many clock cycles
Ideally, every clock cycle, we want a new instruction to begin (and end)
This is how we will improve throughput

CMSC 411 - Alan Sussman 8

A MIPS implementation without pipelining

Recall from CMSC 311 that instructions execute in different stages or cycles

Instruction fetch cycle (IF): fetch the instruction from memory and update the program counter (PC) to point to the next instruction.
Note: We’re not using the NPC register that the book introduces.
IR ← Mem[PC]
PC ← PC + 4

CMSC 411 - Alan Sussman 9

MIPS w/o pipelining (cont.)

Instruction decode cycle (ID): Put the operands in pipeline registers A and B. Sign-extend the low-order 16 bits of the IR and store the result in pipeline register Imm. (This sometimes holds the "immediate" constant.)
A ← Regs[IR6..10]
B ← Regs[IR11..15]
Imm ← ((IR16)^16 ## IR16..31)
Here (IR16)^16 is bit 16 of the IR (the sign bit of the immediate) repeated 16 times, and ## is concatenation.
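The field extraction done in ID can be sketched in a few lines. This is a minimal illustration, assuming the slide numbers IR bits from the most significant bit (so IR6..10 is the field at bits 25..21 in the usual least-significant-first numbering); the helper names and the sample instruction word are hypothetical.

```python
def sign_extend_16(value):
    """Sign-extend a 16-bit field to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(ir):
    """Split a 32-bit I-type instruction word into the fields read in ID."""
    rs = (ir >> 21) & 0x1F     # slide's IR6..10, selects register A
    rt = (ir >> 16) & 0x1F     # slide's IR11..15, selects register B
    imm = sign_extend_16(ir)   # slide's IR16..31, sign-extended into Imm
    return rs, rt, imm


# Hypothetical I-type word: opcode 0x23, rs = 2, rt = 4, immediate = -8.
word = (0x23 << 26) | (2 << 21) | (4 << 16) | (-8 & 0xFFFF)
print(decode(word))
```

The register numbers index Regs to produce A and B; only the immediate needs sign extension because it is the only field used as a signed quantity.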

CMSC 411 - Alan Sussman 10

MIPS w/o pipelining (cont.)

Execution cycle (EX): Use the ALU
If memory reference: ALUOutput ← A + Imm
If register-register ALU instruction: ALUOutput ← A op B
If register-immediate ALU instruction: ALUOutput ← A op Imm
If branch instruction: compute the branch address and check the branch condition:
ALUOutput ← PC + (Imm << 2)
Cond ← (A op 0)
(but PC or Imm should be adjusted down by 4 to make this work right).

CMSC 411 - Alan Sussman 11

MIPS w/o pipelining (cont.)

Memory access cycle (MEM): finish loads, stores, and branches:
Load: LMD ← Mem[ALUOutput]
Store: Mem[ALUOutput] ← B
Branch: if Cond then PC ← ALUOutput else PC is ok

CMSC 411 - Alan Sussman 12

MIPS w/o pipelining (cont.)

Write-back cycle (WB): update the registers
Register-register ALU instruction: Regs[IR16..20] ← ALUOutput
Register-immediate ALU instruction: Regs[IR11..15] ← ALUOutput
Load instruction: Regs[IR11..15] ← LMD

CMSC 411 - Alan Sussman 19

Example 2 (cont.)

Time for pipelined MIPS implementation:
We have to synchronize the stages, so we need to run the clock at 10 ns
1st instruction takes 50 ns. The others each finish 1 cycle later than the preceding one.
Time = 50 ns + 99*10 ns = 1040 ns
Speedup = 4000/1040 ≈ 3.85

CMSC 411 - Alan Sussman 20

Even more realistic case

Example 3: The original MIPS implementation doesn't always need to use the MEM cycle
- IF - 10 ns
- ID - 8 ns
- EX - 7 ns
- MEM - 10 ns
- WB - 5 ns
Suppose that only 30% of instructions use memory access. So, on average, for every 100 instructions, we have about 70 that use 4 stages and 30 that use 5.

CMSC 411 - Alan Sussman 21

Example 3 (cont.)

Time for original MIPS implementation:
- 70 instructions × 30 ns per instruction + 30 instructions × 40 ns per instruction = 3300 ns
Time for pipelined MIPS implementation: We have to synchronize the stages, so we need to run the clock at 10 ns, and we need 5 cycles for every instruction.
- 1st instruction takes 50 ns. The others each finish 1 cycle later than the preceding one
- Time = 50 ns + 99*10 ns = 1040 ns
Speedup = 3300/1040 ≈ 3.17
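The arithmetic in Example 3 can be reproduced with a short sketch. The stage times come from the example; the function names are hypothetical, and the model assumes no stalls.

```python
# Stage times in ns, as given in Example 3.
STAGE_NS = {"IF": 10, "ID": 8, "EX": 7, "MEM": 10, "WB": 5}


def unpipelined_ns(n_instr, n_mem):
    """Total time when only n_mem of the instructions use the MEM cycle."""
    all_stages = sum(STAGE_NS.values())        # 40 ns
    no_mem = all_stages - STAGE_NS["MEM"]      # 30 ns
    return n_mem * all_stages + (n_instr - n_mem) * no_mem


def pipelined_ns(n_instr):
    """Total time with every stage stretched to the slowest stage."""
    clock = max(STAGE_NS.values())             # 10 ns
    depth = len(STAGE_NS)                      # 5 stages
    # The first instruction takes `depth` cycles; each later one adds 1 cycle.
    return (depth + (n_instr - 1)) * clock


original = unpipelined_ns(100, 30)
pipelined = pipelined_ns(100)
print(original, pipelined, round(original / pipelined, 2))
```

Calling unpipelined_ns(100, 100) gives the 4000 ns baseline used in Example 2, where every instruction is charged all five stages.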

CMSC 411 - Alan Sussman 22

Overhead of pipelining

We just summarized the two major overhead costs in pipelining:
- making the time for every stage equal the time for the longest stage
- making the time for every instruction equal the time for the longest instruction (not quite true, but true for a wide range of instructions)
Unfortunately, the speedup of pipelining is reduced even further by hazards that cause “bubbles” in the pipeline

CMSC 411 - Alan Sussman 23

Pipeline hazards cause stalls

When some instruction is unable to complete on schedule, we must
- finish the earlier instructions on schedule
- delay the later instructions
This is called stalling the pipeline

CMSC 411 - Alan Sussman 24

Pipeline hazards

What causes delays in instruction completion?
- Structural hazards are hardware delays
  Example: memory does not respond to a request as fast as it is expected to
- Data hazards arise when data can be predicted to be unready at the time it is needed
  Example: an instruction needs a register that a previous instruction is still modifying
- Control hazards arise when we need to do something other than incrementing the PC by 4
  Example: conditional branch, jump

CMSC 411 - Alan Sussman 25

Pipeline hazards (cont.)

Pipeline hazards reduce throughput and speedup even more!
Fig. A. Structural hazard – a load with 1 memory port for data/instructions

          Clock cycle
Inst #    1     2     3     4     5     6     7     8     9     10
Load      IF    ID    EX    MEM   WB
i+1             IF    ID    EX    MEM   WB
i+2                   IF    ID    EX    MEM   WB
i+3                         stall IF    ID    EX    MEM   WB
i+4                                     IF    ID    EX    MEM   WB
i+5                                           IF    ID    EX    MEM
i+6                                                 IF    ID    EX
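The stall in the figure can be reproduced with a small sketch of a single shared memory port. This is an illustrative model only: the function name is hypothetical, and it assumes a 5-stage pipeline with no other stalls, so an instruction's MEM cycle is three cycles after its IF cycle.

```python
def fetch_cycles(uses_data_memory):
    """Return the IF cycle of each instruction when IF and MEM share one port.

    uses_data_memory: one boolean per instruction, True for loads and stores.
    """
    busy = set()      # cycles in which the port is serving a data access
    starts = []
    cycle = 0
    for uses_mem in uses_data_memory:
        cycle += 1                 # in-order fetch: one cycle after the previous
        while cycle in busy:
            cycle += 1             # stall: the fetch must wait for the port
        starts.append(cycle)
        if uses_mem:
            busy.add(cycle + 3)    # IF, ID, EX, then MEM
    return starts


# A load followed by six instructions that do not touch data memory.
print(fetch_cycles([True] + [False] * 6))
```

With these inputs the fourth instruction (i+3) is the one whose fetch collides with the load's MEM cycle, so its IF slips from cycle 4 to cycle 5 and every later fetch slips with it.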

CMSC 411 - Alan Sussman 26

Pipeline hazards (cont.)

Example 4: In Example 3, we had, on average, 70 instructions that use 4 stages and 30 that use 5
Time for original MIPS implementation = 3300 ns
Suppose that 5 of those instructions are branches. So 5 times, we need to wait until the ID cycle of one instruction is complete before starting the IF cycle of the next instruction.
Therefore, in each of those cases the next instruction will start 2 cycles later, not 1. So add 5 cycles to the time.

CMSC 411 - Alan Sussman 27

Example 4 (cont.)

Time for pipelined MIPS implementation:
- 1st instruction takes 50 ns. The others each finish 1 cycle later than the preceding one, but there is a 5 cycle hazard penalty
- Time = 50 ns + 99*10 ns + 5*10 ns = 1090 ns
Speedup = 3300/1090 ≈ 3.03
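A short sketch shows how the speedup in Example 4 shrinks as the number of one-cycle branch stalls grows. The 3300 ns baseline, the 10 ns clock, and the 5-stage depth come from the examples; the function name and the extra branch counts in the loop are hypothetical.

```python
def pipelined_time_ns(n_instr, stall_cycles, depth=5, clock_ns=10):
    """Pipelined run time: fill the pipe once, then 1 cycle per instruction,
    plus one extra cycle for every stall."""
    return (depth + (n_instr - 1) + stall_cycles) * clock_ns


ORIGINAL_NS = 3300  # unpipelined time from Example 3

for branches in (0, 5, 30):
    t = pipelined_time_ns(100, stall_cycles=branches)
    print(branches, t, round(ORIGINAL_NS / t, 2))
```

Each branch costs exactly one stall cycle here, matching the assumption in Example 4 that the next fetch waits for the branch's ID cycle.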

CMSC 411 - Alan Sussman 28

Data hazards

A data hazard occurs when a piece of data is not available when it is needed
Perhaps there was a cache miss: we expected the value to be in cache, but instead we need to find it in memory
Perhaps it is involved in a previous computation that has not yet completed

CMSC 411 - Alan Sussman 29

Example – Figure A.

CMSC 411 - Alan Sussman 30

Types of data hazards

RAW : read after write
- One instruction writes a value. A later instruction reads it. Problem: an old value may be read.
WAW : write after write
- One instruction writes a value. A later instruction writes in the same location. Problem: the final value may be the first, rather than the second.
WAR : write after read
- One instruction reads a value. A later instruction writes in the same location. Problem: the value read may be the changed value rather than the original. This ordinarily cannot happen.
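The three definitions can be turned into a small classifier for a pair of instructions. This is a sketch under simple assumptions: each instruction is described only by one destination register and a tuple of source registers, and the Instr type and the sample instructions are hypothetical.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read


def hazards(first: Instr, second: Instr) -> Set[str]:
    """Name the data hazards that `second` has on the earlier `first`."""
    found = set()
    if first.dest is not None and first.dest in second.srcs:
        found.add("RAW")     # second reads what first writes
    if first.dest is not None and first.dest == second.dest:
        found.add("WAW")     # both write the same location
    if second.dest is not None and second.dest in first.srcs:
        found.add("WAR")     # second writes what first reads
    return found


add = Instr("DADD R1,R2,R3", "R1", ("R2", "R3"))
print(hazards(add, Instr("DSUB R4,R1,R5", "R4", ("R1", "R5"))))
print(hazards(add, Instr("LD R1,0(R6)", "R1", ("R6",))))
print(hazards(add, Instr("DSUB R2,R4,R5", "R2", ("R4", "R5"))))
```

The classifier only reports that a dependence exists; whether it actually causes a stall depends on the pipeline, which is why WAR ordinarily cannot cause trouble in the simple pipeline described here.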

CMSC 411 - Alan Sussman 37

Sometimes forwarding not enough

Example : Data needs to be loaded from memory at least two instructions before use in order to avoid a stall – Figure A.

CMSC 411 - Alan Sussman 38

Forwarding (cont.)

Compilers need to be smart enough to prevent stalls when possible

Example: a = b + c + d; e = d - f;

Need to make sure that the first ADD operation delays until b and c are loaded

LD   R1, b
LD   R2, c
LD   R3, d         ; ADD can’t be done yet
DADD R4, R1, R2
DADD R4, R3, R4    ; ok by forwarding
LD   R5, f         ; need to start this before a = b + c + d completes
SD   a, R4
DSUB R6, R3, R5
SD   e, R6         ; ok by forwarding
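To see why the ordering matters, the sketch below counts load-use stalls for two orderings of the same computation. It is a simplified model, assuming full forwarding so that the only stall is one cycle when an instruction uses a loaded value in the very next slot. The tuple encoding and the naive ordering are hypothetical, and the register operands are assumptions that follow the data flow of a = b + c + d; e = d - f.

```python
def load_use_stalls(program):
    """Count one stall for each instruction that uses the result of the
    load immediately before it. program: (opcode, dest, sources) tuples."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "LD" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


# Straightforward ordering: each value is loaded right before it is used.
naive = [
    ("LD", "R1", ()), ("LD", "R2", ()),
    ("DADD", "R4", ("R1", "R2")),
    ("LD", "R3", ()),
    ("DADD", "R4", ("R3", "R4")),
    ("SD", None, ("R4",)),
    ("LD", "R5", ()),
    ("DSUB", "R6", ("R3", "R5")),
    ("SD", None, ("R6",)),
]

# Ordering from the slide: loads are hoisted away from their uses.
scheduled = [
    ("LD", "R1", ()), ("LD", "R2", ()), ("LD", "R3", ()),
    ("DADD", "R4", ("R1", "R2")),
    ("DADD", "R4", ("R3", "R4")),
    ("LD", "R5", ()),
    ("SD", None, ("R4",)),
    ("DSUB", "R6", ("R3", "R5")),
    ("SD", None, ("R6",)),
]

print(load_use_stalls(naive), load_use_stalls(scheduled))
```

Both orderings execute the same nine instructions; the scheduled one simply places an independent instruction after every load.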

CMSC 411 - Alan Sussman 39

Forwarding (cont.)

Rules for interchanging instructions:
- must be in same block (i.e., no branches between them)
- must check graph of dependencies to make sure they are independent

CMSC 411 - Alan Sussman 40

How the MIPS pipeline introduces stalls

Data hazards are checked during instruction decode (ID) - if a hazard exists, the EX cycle is delayed (i.e., the instruction is not issued), and a "no-op" is issued instead
The ID cycle also determines whether data forwarding is needed

CMSC 411 - Alan Sussman 41

Control hazards

Question: When do we find out that the PC needs to be modified?
Answer: In pipeline stage ID of a branch instruction
So, if a branch is taken (i.e., if the PC is modified), then we have to wait until the next cycle before we can fetch the correct instruction

CMSC 411 - Alan Sussman 42

Control hazards (cont.)

Branch inst.        IF    ID    EX    MEM   WB
Branch successor          IF    IF    ID    EX    MEM   WB
Successor + 1                         IF    ID    EX    MEM
Successor + 2                               IF    ID    EX

Wastes 1 clock cycle

CMSC 411 - Alan Sussman 43

Example

If branches make up 30% of instructions, then instead of executing 1 instruction per cycle, we have 70% of instructions executing in 1 cycle and 30% of instructions executing in 2 cycles
An average of .7 + .6 = 1.3 cycles per instruction
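The average above is a weighted sum, sketched here with a hypothetical helper. It assumes, as the example does, that every branch costs exactly one extra cycle and that everything else completes in one cycle.

```python
def average_cpi(branch_fraction, branch_penalty_cycles=1, base_cpi=1.0):
    """Average cycles per instruction when each branch adds a fixed penalty."""
    return base_cpi + branch_fraction * branch_penalty_cycles


cpi = average_cpi(0.30)
print(round(cpi, 2))                        # cycles per instruction
print(round((cpi / 1.0 - 1.0) * 100))       # percent slower than the ideal CPI of 1
```

Writing the average as base CPI plus stall cycles per instruction gives the same result as .7 × 1 + .3 × 2 and makes it easy to try other branch fractions or penalties.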

Worse by 30%

CMSC 411 - Alan Sussman 44

Compiler approaches to branch delays

Freeze or flush the pipeline when we determine that a branch is taken - refer back to Figure A.11 (a stall is inserted)
Predict not taken: continue to begin execution of instructions as if the branch is not taken, but change them to a "no-op" if the branch is taken

CMSC 411 - Alan Sussman 45

Predict not taken scheme – Fig. A.

Untaken branch  IF    ID    EX    MEM   WB
Inst. i+1             IF    ID    EX    MEM   WB
Inst. i+2                   IF    ID    EX    MEM   WB
Inst. i+3                         IF    ID    EX    MEM   WB
Inst. i+4                               IF    ID    EX    MEM   WB

Taken branch    IF    ID    EX    MEM   WB
Inst. i+1             IF    idle  idle  idle  idle
Branch target               IF    ID    EX    MEM   WB
B.t. + 1                          IF    ID    EX    MEM   WB
B.t. + 2                                IF    ID    EX    MEM   WB

CMSC 411 - Alan Sussman 46

Compiler approaches (cont.)

Predict taken: Good if most of the branches are from loops
Schedule using branch delay slots, reordering the code to test the branch earlier

CMSC 411 - Alan Sussman 47

Branch delay slot – Fig. A.

CMSC 411 - Alan Sussman 48

Scheduling branch delay slot

If taken from before branch
- branch must not depend on rescheduled instruction
- always improves performance
If taken from branch target
- must be OK to execute rescheduled instructions if branch not taken, and may need to duplicate insts.
- performance improved when branch taken
If taken from fall through
- must be OK to execute insts. if branch taken
- improves performance when branch not taken

CMSC 411 - Alan Sussman 55

Categorizing exceptions – Fig. A. 27

Exception type | Synch. vs. asynch. | User request vs. coerced | User maskable vs. not | Within vs. between instructions | Resume vs. terminate
I/O device request | Asynch | Coerced | Not | Between | Resume
Invoke OS | Synch | User req. | Not | Between | Resume
Tracing instructions | Synch | User req. | Maskable | Between | Resume
Breakpoint | Synch | User req. | Maskable | Between | Resume
Integer overflow | Synch | Coerced | Maskable | Within | Resume
Floating pt. overflow/underflow | Synch | Coerced | Maskable | Within | Resume

CMSC 411 - Alan Sussman 56

Categorizing exceptions (cont.)

Exception type | Synch. vs. asynch. | User request vs. coerced | User maskable vs. not | Within vs. between instructions | Resume vs. terminate
Page fault | Synch | Coerced | Not | Within | Resume
Misaligned memory access | Synch | Coerced | Maskable | Within | Resume
Mem. prot. violation | Synch | Coerced | Not | Within | Resume
Undefined instruction | Synch | Coerced | Not | Within | Terminate
Hardware malfunction | Asynch | Coerced | Not | Within | Terminate
Power failure | Asynch | Coerced | Not | Within | Terminate

CMSC 411 - Alan Sussman 57

The most difficult exceptions...

... are those that occur within EX or MEM stages and need to be handled in a restartable way
Why difficult? Handling one includes:
- the next IF gets a "trap instruction"
- until the trap is taken, turn off all "writes" for the faulting instruction and those that follow it.
- what does the trap do?
  - The trap transfers control to the exception handling routine in the operating system, which saves the PC of the faulting instruction and handles the fault
- the task is then resumed, using the saved PC and the MIPS instruction RFE or something like it
Note: May need to save several PCs if delayed branches are involved

CMSC 411 - Alan Sussman 58

Exceptions (cont.)

Ideally, the pipeline can be interrupted so that instructions before the fault complete. Then we want to restart execution just after the faulting instruction - precise exception handling
This is the right way to do it, but sometimes architects/manufacturers take shortcuts

CMSC 411 - Alan Sussman 59

When do MIPS exceptions occur?

IF
- page fault on instruction fetch
- misaligned memory access
- memory protection violation
ID
- undefined or illegal opcode
EX
- arithmetic exception
MEM
- page fault on data fetch/store
- misaligned memory access
- memory protection violation
WB: None!

Computer Systems Architecture

CMSC 411

Unit 3 – Instruction Pipelining

Alan Sussman

February 25, 2003

CMSC 411 - Alan Sussman 61

Administrivia

HW #3 due next Tuesday, March 4
Quiz today – last 25 minutes of class
No office hours today or tomorrow
- see TA or try Thursday

CMSC 411 - Alan Sussman 62

Last time

Forwarding
- to use ALU or load result before WB
- compiler reorders instructions to prevent stalls, use forwarding
Control hazards lead to branch delays
- because branch target isn’t computed until ID
- one (partial) solution is for compiler to schedule branch delay slots
Exceptions
- machine must save pipeline state, handle exception (with OS), and restart where exception occurred
- precise vs. imprecise exception handling
- program generated ones can occur in all pipe stages except WB

CMSC 411 - Alan Sussman 63

Examples of exception handling

LD     IF    ID    EX    MEM   WB
ADD          IF    ID    EX    MEM   WB

Handle the MEM fault first, then restart

LD     IF    ID    EX    MEM   WB
ADD          IF    ID    EX    MEM   WB

IF fault occurs first, even though LD will fault later
But for precise exceptions, must handle LD fault first

CMSC 411 - Alan Sussman 64

How is this done?

Answer: Don't handle exceptions until the WB stage
- each instruction has an associated status vector that keeps track of faults
- any bit set in the status vector turns off register writes and memory writes
- in WB stage, the status vector is checked and any fault is handled
- So, since instructions reach WB in proper order, faults for earlier instructions are handled before faults for later instructions
- Unfortunately, will need to violate this later (for instructions that don’t reach WB in proper order)
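The effect of deferring faults to WB can be sketched by comparing the order in which faults are detected with the order in which they are handled. This is an illustrative model only: the function names and the two-instruction program are hypothetical, and it assumes a 5-stage pipeline with no stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def fault_events(program):
    """program: (name, if_cycle, fault_stage or None) tuples in program order.
    Returns (name, detect_cycle, wb_cycle) for each faulting instruction."""
    events = []
    for name, if_cycle, fault_stage in program:
        if fault_stage is None:
            continue
        detect = if_cycle + STAGES.index(fault_stage)  # cycle the fault is recorded
        wb = if_cycle + STAGES.index("WB")             # cycle the status vector is checked
        events.append((name, detect, wb))
    return events


def detection_order(program):
    return [e[0] for e in sorted(fault_events(program), key=lambda e: e[1])]


def handling_order(program):
    return [e[0] for e in sorted(fault_events(program), key=lambda e: e[2])]


# LD faults in MEM; the following ADD faults in IF.
program = [("LD", 1, "MEM"), ("ADD", 2, "IF")]
print(detection_order(program))
print(handling_order(program))
```

The ADD fault is recorded first but the LD fault is the first one seen at WB, so it is handled first. In a real machine the later instruction would typically be squashed and re-executed after the handler returns, which this sketch does not model.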

CMSC 411 - Alan Sussman 65

Commitment

When an instruction is guaranteed to complete, it is committed
Life is easier if no instruction changes the machine state before it is committed
In MIPS, commitment occurs at the end of the MEM stage - that’s why register update occurs in the stage after that
Some machines muddy the state before commitment, and the exception handler must do its best to restore the state that existed before the instruction started

CMSC 411 - Alan Sussman 66

Complications caused by long

instructions

So far, all MIPS instructions take 5 cycles
But we haven't talked yet about the floating point instructions
Take it on faith that floating point instructions are inherently slower than integer arithmetic instructions
- doubters may consult Appendix H in H&P online

CMSC 411 - Alan Sussman 73

FP stalls from RAW hazards – Fig. A.

Inst.            1     2     3     4     5     6     7     8     9
L.D F4,0(R2)     IF    ID    EX    MEM   WB
MUL.D F0,F4,F          IF    ID    stall M1    M2    M3    M4    M5
ADD.D F2,F0,F                IF    stall ID    stall stall stall stall
S.D F2,0(R2)                       IF    stall stall stall stall stall

Inst.            10    11    12    13    14    15    16    17
L.D
MUL.D            M6    M7    MEM   WB
ADD.D            stall stall A1    A2    A3    A4    MEM
S.D              stall stall ID    EX    stall stall stall MEM

CMSC 411 - Alan Sussman 74

Long instructions (cont.)

It is possible that two instructions enter the WB stage at the same time

ADD.D   IF ID A1 A2 A3 A4 MEM WB
LD      IF ID ALU MEM WB
DADD    IF ID ALU MEM WB

A structural hazard

CMSC 411 - Alan Sussman 75

Long instructions (cont.)

Instructions can finish in the wrong order
This can cause WAW hazards
- see p. A-52 of H&P for an example
This violation of WB ordering defeats the previous strategy for precise exception handling
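Both problems, simultaneous arrival at WB and out-of-order completion, follow from instructions having different execute lengths. The sketch below is illustrative only: the execute lengths are inferred from the stage labels used in these slides (A1 to A4 for ADD.D, M1 to M7 for MUL.D), while the function names and the issue cycles in the two sample programs are hypothetical.

```python
from collections import defaultdict

# Execute cycles per operation, inferred from the stage labels in the slides.
EX_CYCLES = {"L.D": 1, "DADD": 1, "ADD.D": 4, "MUL.D": 7}


def wb_cycle(op, if_cycle):
    """Cycle in which `op` reaches WB if it is fetched in `if_cycle`:
    IF and ID (2 cycles), the execute cycles, MEM (1 cycle), then WB."""
    return if_cycle + 2 + EX_CYCLES[op] + 1


def wb_conflicts(program):
    """Cycles in which more than one instruction wants WB (structural hazard).
    program: (op, dest, if_cycle) tuples in program order."""
    by_cycle = defaultdict(list)
    for op, _dest, if_cycle in program:
        by_cycle[wb_cycle(op, if_cycle)].append(op)
    return {c: ops for c, ops in by_cycle.items() if len(ops) > 1}


def waw_hazards(program):
    """Pairs where a later instruction writes the same register earlier."""
    found = []
    for i, (op_i, dest_i, if_i) in enumerate(program):
        for op_j, dest_j, if_j in program[i + 1:]:
            if dest_i == dest_j and wb_cycle(op_j, if_j) < wb_cycle(op_i, if_i):
                found.append((op_i, op_j, dest_i))
    return found


same_wb = [("MUL.D", "F0", 1), ("ADD.D", "F2", 4), ("L.D", "F2", 7)]
wrong_order = [("ADD.D", "F2", 1), ("L.D", "F2", 2)]
print(wb_conflicts(same_wb))
print(waw_hazards(wrong_order))
```

In the second program the load writes F2 before the earlier ADD.D does, so the final value of F2 would come from the ADD.D rather than from the load, which is the WAW problem described above.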

CMSC 411 - Alan Sussman 76

WAW structural hazard – Fig. A.

MUL.D F0,F4,F    IF ID M1 M2 M3 M4 M5 M6 M7 MEM WB
…                IF ID EX MEM WB
…                IF ID EX MEM WB
ADD.D F2,F4,F    IF ID A1 A2 A3 A4 MEM WB
…                IF ID EX MEM WB
L.D F2,0(R2)     IF ID EX MEM WB

CMSC 411 - Alan Sussman 77

How to detect hazards in ID

Early detection would prevent trouble
Check for structural hazards :
- will the divide unit clear in time?
- will WB be possible when we need it?
Check for RAW data hazards :
- will all source registers be available when needed?
Check for WAW data hazards :
- Is the destination register for any ADD.D, multiply or divide instructions the same register as the destination for this instruction?
If anything dangerous could happen, delay the execute cycle so no conflict occurs

CMSC 411 - Alan Sussman 78

Precise exception handling for

long instructions

Suppose
- ADD.D completes,
- then SUB.D has a floating-point exception,
- then DIV.D detects an exception
Big trouble, because ADD.D has destroyed register F10
Example:
DIV.D F0, F2, F
ADD.D F10, F10, F
SUB.D F12, F12, F

CMSC 411 - Alan Sussman 79

Possible fixes

Give up and just do imprecise exception handling
- tempting, but very annoying to users
Delay WB until all previous instructions complete
- since so many instructions can be active, this is expensive - requires a lot of supporting hardware
Write, to memory, a history file of register and memory changes so can undo instructions if necessary
- or keep a future file of computed results that are waiting for MEM or WB

CMSC 411 - Alan Sussman 80

Possible fixes (cont.)

Let the exception handler finish the instructions in the pipeline and then restart the pipe at the next instruction
Have the floating point units diagnose exceptions in their first or second stages, so we can handle them by methods that work well for handling integer exceptions

Computer Systems Architecture

CMSC 411

Unit 3 – Instruction Pipelining

Alan Sussman

February 27, 2003

CMSC 411 - Alan Sussman 82

Administrivia

Quiz returned Tuesday
- answers posted on web page
Read Chapter 3 – Unit 4 on instruction-level parallelism

HW #3 due Tuesday, March 4
HW #4 posted soon

CMSC 411 - Alan Sussman 83

Last time

Exception handling
- for 5 stage pipeline, handle them in WB stage, to keep proper order
  - after the instruction commits
Long instructions – generally means f.p. instructions
- higher latency, and initiation interval, than other (integer) instructions
- typically means the EX stage is multiple cycles, and sometimes not pipelined (e.g., divider)
- can get 2 (or more) instructions trying to enter WB at same time – structural hazard
- can get instructions finishing in wrong order – may cause WAW hazards, messing up precise exception handling
- detect hazards in ID stage, and delay EX if a conflict occurs
- finally, several ways to handle exceptions – history/future file, software (OS exception handler), diagnose early in pipeline, …

CMSC 411 - Alan Sussman 84

A case study: MIPS R4000 pipeline design

MIPS64 architecture, with deeper 8 stage pipeline
- to get higher clock rates
- extra stages come from memory accesses
- technique called superpipelining

CMSC 411 - Alan Sussman 91

R4000 pipeline performance

4 major causes of pipeline stalls
- load stalls – from using load result 1 or 2 cycles after load
- branch stalls – 2 cycles on every taken branch, or empty branch delay slot
- FP result stalls – RAW hazards for an FP operand
- FP structural stalls – from conflicts for functional units in FP pipeline

CMSC 411 - Alan Sussman 92

SPEC92 benchmarks

Assuming a perfect cache – five integer and five FP programs

CMSC 411 - Alan Sussman 93

Dynamically scheduled pipelines

We’ll cover this, and the scoreboard technique, in Unit 4
- need some general background first

CMSC 411 - Alan Sussman 94

Pitfalls

Unexpected hazards do occur …
- for example, when a branch is taken before a previous instruction finishes
Extensive pipelining can slow a machine down, or lead to worse cost-performance
- more complex hardware can cause a longer clock cycle, killing the benefits of more pipelining

CMSC 411 - Alan Sussman 95

Pitfalls (cont.)

A poor compiler can make a good machine look bad
- compiler writers need to understand the architecture in order to
  - optimize efficiently and
  - avoid hazards
- better to eliminate useless instructions than to make them run faster
