Instruction Pipelining in Computer Systems Architecture: MIPS Implementation and Pitfalls

These notes cover instruction pipelining in computer systems architecture, using the MIPS processor as an example. They present both an unpipelined and a pipelined implementation, the benefits and costs of pipelining, and the hazards that can cause stalls. They also touch on compiler approaches to branch delays and on exception handling.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

CMSC 411 - A. Sussman (from D. O'Leary) 1
Computer Systems Architecture
CMSC 411
Unit 3 – Instruction Pipelining
Alan Sussman
February 13, 2003


Administrivia

  • HW #2 due today – solution posted soon
  • Quiz on Tuesday – Units 1 & 2
  • HW #1 problem 1.17d
    • MFLOPs with coprocessor
    • answer shows that MFLOPs is computed as (# fp ops)/(time for fp ops) = (# fp ops)/(total time – time for integer ops)
    • that is correct – don’t count integer ops against MFLOPs
    • but both are counted in MIPS (both integer and fp ops are instructions!)

Last time

  • Compiler/architecture interaction
    • providing a good target for the compiler can make a huge difference in performance – up to a factor of 10 on an f.p. intensive application
    • provide regularity, primitives, make costs of code sequences easy to determine
  • MIPS/MIPS64 architectures
    • load/store, 64 bits (with 32-bit ops), 3 instruction formats for MIPS64 (all 32 bits), immediate and displacement addressing modes

So far

  • What we mean by computer performance
  • How to measure it
  • How instruction sets are designed
  • How the design influences performance

What’s next

  • A variety of hardware and compiler techniques to speed the execution of programs
    • What is pipelining? (Section A.1)
    • How does MIPS divide instructions into stages or cycles? (A.1)
    • What kinds of overheads are there in pipelining? (A.1)
    • How much speedup do we get? (A.1)
    • What are structural hazards, data hazards, and control hazards? (A.2)
    • How are these techniques used to reduce stalls:
      • data forwarding? (A.2)
      • instruction reordering? (A.2)
      • compiler approaches to reduce branch delays? (A.2)

What is pipelining?

  • Pipelining is an implementation technique whereby multiple instructions are overlapped in execution
  • In other words, at any given moment in the execution of a computer program, many different instructions are at various stages of completion!
  • Example: Car wash

Throughput

  • The number of instructions that complete per unit time
  • Instructions take many clock cycles
  • Ideally, every clock cycle, we want a new instruction to begin (and end)
  • This is how we will improve throughput

A MIPS implementation without pipelining

  • Recall from CMSC 311 that instructions execute in different stages or cycles
  • Instruction fetch cycle (IF): fetch the instruction from memory and update the program counter (PC) to point to the next instruction. Note: We’re not using the NPC register that the book introduces.
    • IR ← Mem[PC]
    • PC ← PC + 4

MIPS w/o pipelining (cont.)

  • Instruction decode cycle (ID): Put the operands in pipeline registers A and B. Sign-extend the low-order 16 bits of the IR and store the result in pipeline register Imm. (This sometimes holds the "immediate" constant.)
    • A ← Regs[IR6..10]
    • B ← Regs[IR11..15]
    • Imm ← ((IR16)^16 ## IR16..31)
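The sign extension done in the ID cycle can be sketched in a few lines of Python. This is only an illustration: the function name `sign_extend_16` is hypothetical, and it assumes the immediate occupies the low-order 16 bits of a 32-bit instruction word, so that the top bit of that field is the sign bit that gets replicated.

```python
def sign_extend_16(ir: int) -> int:
    """Return the low-order 16 bits of an instruction word as a signed integer."""
    imm = ir & 0xFFFF      # keep only the low-order 16 bits
    if imm & 0x8000:       # the sign bit of the 16-bit field is set
        imm -= 0x10000     # so the field encodes a negative value
    return imm


print(sign_extend_16(0x0000FFFF))  # immediate field is all ones
print(sign_extend_16(0x12340004))  # small positive immediate
```

Returning a signed Python integer has the same effect as copying the sign bit into the upper 16 bits of a 32-bit register.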

MIPS w/o pipelining (cont.)

  • Execution cycle (EX): Use the ALU
    • If memory reference: ALUOutput ← A + Imm
    • If register-register ALU instruction: ALUOutput ← A op B
    • If register-immediate ALU instruction: ALUOutput ← A op Imm
    • If branch instruction: compute the branch address and check the branch condition (but PC or Imm should be adjusted down by 4 to make this work right):
      • ALUOutput ← PC + (Imm << 2)
      • Cond ← (A op 0)

MIPS w/o pipelining (cont.)

  • Memory access cycle (MEM): finish loads, stores, and branches
    • Load: LMD ← Mem[ALUOutput]
    • Store: Mem[ALUOutput] ← B
    • Branch: if Cond then PC ← ALUOutput else PC is ok

MIPS w/o pipelining (cont.)

  • Write-back cycle (WB): update the registers
    • Register-register ALU instruction: Regs[IR16..20] ← ALUOutput
    • Register-immediate ALU instruction: Regs[IR11..15] ← ALUOutput
    • Load instruction: Regs[IR11..15] ← LMD
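The five cycles can be strung together into a toy, unpipelined interpreter. The sketch below is an assumption-laden illustration, not the course's implementation: instructions are pre-decoded dictionaries rather than 32-bit words, the names `Machine` and `step` and the instruction kinds (`add`, `addi`, `load`, `store`, `beqz`) are hypothetical, register 0 is not forced to zero, and the branch target is computed from the already-incremented PC.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)
    imem: dict = field(default_factory=dict)  # address -> decoded instruction
    dmem: dict = field(default_factory=dict)  # address -> data word


def step(m: Machine) -> None:
    """Run one instruction through IF, ID, EX, MEM, WB in sequence."""
    # IF: fetch the instruction and advance the PC
    ir = m.imem[m.pc]
    m.pc += 4

    # ID: read the operands and the (already sign-extended) immediate
    a = m.regs[ir["rs"]]
    b = m.regs[ir["rt"]]
    imm = ir.get("imm", 0)

    # EX: one ALU operation, chosen by instruction kind
    kind = ir["kind"]
    cond = False
    if kind in ("load", "store"):
        alu = a + imm
    elif kind == "add":
        alu = a + b
    elif kind == "addi":
        alu = a + imm
    elif kind == "beqz":
        alu = m.pc + (imm << 2)  # PC already points past the branch
        cond = a == 0
    else:
        raise ValueError(f"unknown instruction kind: {kind}")

    # MEM: finish loads, stores, and branches
    lmd = None
    if kind == "load":
        lmd = m.dmem.get(alu, 0)
    elif kind == "store":
        m.dmem[alu] = b
    elif kind == "beqz" and cond:
        m.pc = alu

    # WB: update the register file
    if kind == "add":
        m.regs[ir["rd"]] = alu
    elif kind == "addi":
        m.regs[ir["rt"]] = alu
    elif kind == "load":
        m.regs[ir["rt"]] = lmd
```

Calling `step` repeatedly executes one instruction at a time, which is exactly the behavior that pipelining later overlaps.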

Example 2 (cont.)

  • Time for pipelined MIPS implementation: We have to synchronize the stages, so we need to run the clock at 10 ns
    • 1st instruction takes 50 ns. The others each finish 1 cycle later than the preceding one.
    • Time = 50 ns + 99*10 ns = 1040 ns
  • Speedup = 4000/1040 ≈ 3.85

Even more realistic case

  • Example 3: The original MIPS implementation doesn't always need to use the MEM cycle
    • IF - 10 ns
    • ID - 8 ns
    • EX - 7 ns
    • MEM - 10 ns
    • WB - 5 ns
  • Suppose that only 30% of instructions use memory access. So, on average, for every 100 instructions, we have about 70 that use 4 stages and 30 that use 5.

Example 3 (cont.)

  • Time for original MIPS implementation:
    • 70 instructions × 30 ns per instruction + 30 instructions × 40 ns per instruction = 3300 ns
  • Time for pipelined MIPS implementation: We have to synchronize the stages, so we need to run the clock at 10 ns, and we need 5 cycles for every instruction.
    • 1st instruction takes 50 ns. The others each finish 1 cycle later than the preceding one
    • Time = 50 ns + 99*10 ns = 1040 ns
  • Speedup = 3300/1040 ≈ 3.17
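The arithmetic of Example 3 can be checked with a short Python sketch. The stage times and the 30% memory-access fraction come from the example; the function names are hypothetical, and the pipelined model assumes no stalls.

```python
STAGE_NS = {"IF": 10, "ID": 8, "EX": 7, "MEM": 10, "WB": 5}


def unpipelined_time_ns(n_instructions, mem_fraction):
    """Each instruction runs its stages back to back; only some need MEM."""
    with_mem = sum(STAGE_NS.values())           # all five stages
    without_mem = with_mem - STAGE_NS["MEM"]    # MEM stage skipped
    n_mem = round(n_instructions * mem_fraction)
    return (n_instructions - n_mem) * without_mem + n_mem * with_mem


def pipelined_time_ns(n_instructions):
    """Clock is set by the slowest stage; one instruction finishes per cycle."""
    clock = max(STAGE_NS.values())
    first = len(STAGE_NS) * clock               # time for the 1st instruction
    return first + (n_instructions - 1) * clock


slow = unpipelined_time_ns(100, 0.30)
fast = pipelined_time_ns(100)
print(slow, fast, round(slow / fast, 2))
```

The two overheads are visible in `pipelined_time_ns`: every stage is charged the longest stage time, and every instruction is charged all five stages.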

Overhead of pipelining

  • We just summarized the two major overhead costs in pipelining:
    • making the time for every stage equal the time for the longest stage
    • making the time for every instruction equal the time for the longest instruction (not quite true, but true for a wide range of instructions)
  • Unfortunately, the speedup of pipelining is reduced even further by hazards that cause “bubbles” in the pipeline

Pipeline hazards cause stalls

  • When some instruction is unable to complete on schedule, we must
    • finish the earlier instructions on schedule
    • delay the later instructions
  • This is called stalling the pipeline

Pipeline hazards

  • What causes delays in instruction completion?
    • Structural hazards are hardware delays. Example: memory does not respond to a request as fast as it is expected to
    • Data hazards arise when data can be predicted to be unready at the time it is needed. Example: an instruction needs a register that a previous instruction is still modifying
    • Control hazards arise when we need to do something other than incrementing the PC by 4. Example: conditional branch, jump

Pipeline hazards (cont.)

Pipeline hazards reduce throughput and speedup even more!

Fig. A. Structural hazard – a load with 1 memory port for data/instructions (columns are clock cycles)

Inst #   1      2      3      4      5      6      7      8      9      10
Load     IF     ID     EX     MEM    WB
i+1             IF     ID     EX     MEM    WB
i+2                    IF     ID     EX     MEM    WB
i+3                           stall  IF     ID     EX     MEM    WB
i+4                                         IF     ID     EX     MEM    WB
i+5                                                IF     ID     EX     MEM
i+6                                                       IF     ID     EX

Pipeline hazards (cont.)

  • Example 4: In Example 3 we had, on average, 70 instructions that use 4 stages and 30 that use 5
  • Time for original MIPS implementation = 3300 ns
  • Suppose that 5 of those instructions involve branches. So 5 times, we need to wait until the ID cycle of one instruction is complete before starting the IF cycle of the next instruction.
  • Therefore, the next instruction will start 2 cycles later, not 1. So add 5 cycles to the time.

Example 4 (cont.)

  • Time for pipelined MIPS implementation:
    • 1st instruction takes 50 ns. The others each finish 1 cycle later than the preceding one, but there is a 5 cycle hazard penalty
    • Time = 50 ns + 99*10 ns + 5*10 ns = 1090 ns
  • Speedup = 3300/1090 ≈ 3.03
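The effect of the hazard penalty in Example 4 can be sketched by adding stall cycles to the ideal pipelined cycle count. The function name and parameters below are hypothetical; the numbers are those of the example.

```python
def pipelined_time_ns(n_instructions, n_stages, clock_ns, stall_cycles=0):
    """Ideal pipeline time plus extra cycles lost to stalls."""
    cycles = n_stages + (n_instructions - 1) + stall_cycles
    return cycles * clock_ns


unpipelined_ns = 3300  # from Example 3
with_stalls = pipelined_time_ns(100, n_stages=5, clock_ns=10, stall_cycles=5)
print(with_stalls, round(unpipelined_ns / with_stalls, 2))
```

Each stall cycle costs one full clock period, so even a handful of branches visibly lowers the speedup.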

Data hazards

  • A data hazard occurs when a piece of data is not available when it is needed
    • Perhaps there was a cache miss: we expected the value to be in cache, but instead we need to find it in memory
    • Perhaps it is involved in a previous computation that has not yet completed

Example – Figure A.

Types of data hazards

  • RAW : read after write
    • One instruction writes a value. A later instruction reads it. Problem: an old value may be read.
  • WAW : write after write
    • One instruction writes a value. A later instruction writes in the same location. Problem: the final value may be the first, rather than the second.
  • WAR : write after read
    • One instruction reads a value. A later instruction writes in the same location. Problem: the value read may be the changed value rather than the original. This ordinarily cannot happen in the simple MIPS pipeline, because registers are read early (in ID) and written late (in WB).
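The three categories can be told apart by comparing the destination and source registers of two instructions in program order. The sketch below is a minimal illustration; the function name `classify_hazards` and the `(dest, sources)` tuple encoding are assumptions made for this example, and it only reports potential hazards without modeling pipeline timing.

```python
def classify_hazards(first, second):
    """Each argument is (dest, sources); `first` precedes `second` in program order."""
    dest1, srcs1 = first
    dest2, srcs2 = second
    hazards = []
    if dest1 is not None and dest1 in srcs2:
        hazards.append("RAW")  # second reads what first writes
    if dest1 is not None and dest1 == dest2:
        hazards.append("WAW")  # both write the same location
    if dest2 is not None and dest2 in srcs1:
        hazards.append("WAR")  # second overwrites what first reads
    return hazards


# DADD R1,R2,R3 followed by DSUB R4,R1,R5
print(classify_hazards(("R1", ("R2", "R3")), ("R4", ("R1", "R5"))))
```

Whether a potential hazard actually causes a stall depends on how far apart the two instructions are and on what the pipeline forwards.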

Sometimes forwarding not enough

  • Example: Data needs to be loaded from memory at least two instructions before use in order to avoid a stall – Figure A.

Forwarding (cont.)

  • Compilers need to be smart enough to prevent stalls when possible

Example: a = b + c + d; e = d - f;

  • Need to make sure that the first ADD operation delays until b and c are loaded

LD    R1, b
LD    R2, c
LD    R3, d          ADD can’t be done yet
DADD  R4,R1,R2
DADD  R4,R3,R4       ok by forwarding
LD    R5, f          need to start this before a = b + c + d completes
SD    a, R4
DSUB  R6,R3,R5
SD    e, R6          ok by forwarding
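The benefit of reordering can be sketched by counting load-use stalls in two orderings of the code for a = b + c + d; e = d - f. This is a simplified model under stated assumptions: with forwarding, only an instruction that uses a load's result immediately after the load stalls, and it stalls for one cycle. The function name, the `(opcode, dest, sources)` tuple encoding, and the register assignments are hypothetical choices for this illustration.

```python
def load_use_stalls(program):
    """Count instructions that use a load's result in the very next slot."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        opcode, dest, _ = prev
        if opcode == "LD" and dest in cur[2]:
            stalls += 1
    return stalls


# Straightforward order: each value is used right after it is loaded.
naive = [
    ("LD", "R1", ()), ("LD", "R2", ()), ("DADD", "R4", ("R1", "R2")),
    ("LD", "R3", ()), ("DADD", "R4", ("R3", "R4")), ("SD", None, ("R4",)),
    ("LD", "R5", ()), ("DSUB", "R6", ("R3", "R5")), ("SD", None, ("R6",)),
]

# Reordered: loads are hoisted so no result is needed in the next slot.
scheduled = [
    ("LD", "R1", ()), ("LD", "R2", ()), ("LD", "R3", ()),
    ("DADD", "R4", ("R1", "R2")), ("DADD", "R4", ("R3", "R4")),
    ("LD", "R5", ()), ("SD", None, ("R4",)),
    ("DSUB", "R6", ("R3", "R5")), ("SD", None, ("R6",)),
]

print(load_use_stalls(naive), load_use_stalls(scheduled))
```

Both orderings execute the same nine instructions; only their positions change.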

Forwarding (cont.)

  • Rules for interchanging instructions:
    • must be in same block (i.e., no branches between them)
    • must check graph of dependencies to make sure they are independent

How the MIPS pipeline introduces stalls

  • Data hazards are checked during instruction decode (ID). If a hazard exists, the EX cycle is delayed (i.e., the instruction is not issued) and a "no-op" is issued instead
  • The ID cycle also determines whether data forwarding is needed

Control hazards

  • Question: When do we find out that the PC needs to be modified?
  • Answer: In pipeline stage ID of a branch instruction
  • So, if a branch is taken (i.e., if the PC is modified), then we have to wait until the next cycle before we can fetch the correct instruction

Control hazards (cont.)

Branch inst.        IF    ID    EX    MEM   WB
Branch successor          IF    IF    ID    EX    MEM   WB
Successor + 1                         IF    ID    EX    MEM
Successor + 2                               IF    ID    EX

Wastes 1 clock cycle

Example

  • If branches make up 30% of instructions, then instead of executing 1 instruction per cycle, we have 70% of instructions executing in 1 cycle and 30% of instructions executing in 2 cycles
  • An average of .7 + .6 = 1.3 cycles per instruction
  • Worse by 30%
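The average in this example generalizes to a one-line formula: base CPI plus branch frequency times branch penalty. The sketch below uses hypothetical names and assumes, as the example does, that every branch costs one extra cycle.

```python
def average_cpi(branch_fraction, branch_penalty_cycles=1, base_cpi=1.0):
    """Average cycles per instruction when each branch adds a fixed penalty."""
    return base_cpi + branch_fraction * branch_penalty_cycles


cpi = average_cpi(0.30)
print(round(cpi, 2), f"{(cpi - 1.0) * 100:.0f}% slower than the ideal CPI of 1")
```

Raising `branch_penalty_cycles` shows why deeper pipelines, which resolve branches later, suffer more from the same branch frequency.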

Compiler approaches to branch delays

  • Freeze or flush the pipeline when we determine that a branch is taken - refer back to Figure A.11 (a stall is inserted)
  • Predict not taken: continue to begin execution of instructions as if the branch is not taken, but change them to a "no-op" if the branch is taken

Predict not taken scheme – Fig. A.

Untaken branch  IF    ID    EX    MEM   WB
Inst. i+1             IF    ID    EX    MEM   WB
Inst. i+2                   IF    ID    EX    MEM   WB
Inst. i+3                         IF    ID    EX    MEM   WB
Inst. i+4                               IF    ID    EX    MEM   WB

Taken branch    IF    ID    EX    MEM   WB
Inst. i+1             IF    idle  idle  idle  idle
Branch target               IF    ID    EX    MEM   WB
B.t. + 1                          IF    ID    EX    MEM   WB
B.t. + 2                                IF    ID    EX    MEM   WB

Compiler approaches (cont.)

  • Predict taken: Good if most of the branches are from loops
  • Schedule using branch delay slots, reordering the code to test the branch earlier

Branch delay slot – Fig. A.

Scheduling branch delay slot

  • If taken from before branch
    • branch must not depend on rescheduled instruction
    • always improves performance
  • If taken from branch target
    • must be OK to execute rescheduled instructions if branch not taken, and may need to duplicate insts.
    • performance improved when branch taken
  • If taken from fall through
    • must be OK to execute insts. if branch taken
    • improves performance when branch not taken

Categorizing exceptions – Fig. A.27

Exception type                   Synch. vs. asynch.   User request vs. coerced   User maskable vs. not   Within vs. between instructions   Resume vs. terminate
I/O device request               Asynch               Coerced                    Not                     Between                           Resume
Invoke OS                        Synch                User req.                  Not                     Between                           Resume
Tracing instructions             Synch                User req.                  Maskable                Between                           Resume
Breakpoint                       Synch                User req.                  Maskable                Between                           Resume
Integer overflow                 Synch                Coerced                    Maskable                Within                            Resume
Floating pt. overflow/underflow  Synch                Coerced                    Maskable                Within                            Resume

Categorizing exceptions (cont.)

Exception type                   Synch. vs. asynch.   User request vs. coerced   User maskable vs. not   Within vs. between instructions   Resume vs. terminate
Page fault                       Synch                Coerced                    Not                     Within                            Resume
Misaligned memory access         Synch                Coerced                    Maskable                Within                            Resume
Mem. prot. violation             Synch                Coerced                    Not                     Within                            Resume
Undefined instruction            Synch                Coerced                    Not                     Within                            Terminate
Hardware malfunction             Asynch               Coerced                    Not                     Within                            Terminate
Power failure                    Asynch               Coerced                    Not                     Within                            Terminate

The most difficult exceptions...

  • ... are those that occur within EX or MEM stages and need to be handled in a restartable way
  • Why difficult? Handling one includes:
    • the next IF gets a "trap instruction"
    • until the trap is taken, turn off all "writes" for the faulting instruction and those that follow it.
    • what does the trap do?
      • The trap transfers control to the exception handling routine in the operating system, which saves the PC of the faulting instruction and handles the fault
    • the task is then resumed, using the saved PC and the MIPS instruction RFE or something like it
  • Note: May need to save several PCs if delayed branches are involved

Exceptions (cont.)

  • Ideally, the pipeline can be interrupted so that instructions before the fault complete. Then we want to restart execution just after the faulting instruction - precise exception handling
  • This is the right way to do it, but sometimes architects/manufacturers take shortcuts

When do MIPS exceptions occur?

  • IF
    • page fault on instruction fetch
    • misaligned memory access
    • memory protection violation
  • ID
    • undefined or illegal opcode
  • EX
    • arithmetic exception
  • MEM
    • page fault on data fetch/store
    • misaligned memory access
    • memory protection violation
  • WB: None!

Computer Systems Architecture

CMSC 411

Unit 3 – Instruction Pipelining

Alan Sussman

February 25, 2003

Administrivia

  • HW #3 due next Tuesday, March 4
  • Quiz today – last 25 minutes of class
  • No office hours today or tomorrow
    • see TA or try Thursday

Last time

  • Forwarding
    • to use ALU or load result before WB
    • compiler reorders instructions to prevent stalls, use forwarding
  • Control hazards lead to branch delays
    • because branch target isn’t computed until ID
    • one (partial) solution is for compiler to schedule branch delay slots
  • Exceptions
    • machine must save pipeline state, handle exception (with OS), and restart where exception occurred
    • precise vs. imprecise exception handling
    • program generated ones can occur in all pipe stages except WB

Examples of exception handling

LD    IF    ID    EX    MEM   WB
ADD         IF    ID    EX    MEM   WB

  • Handle the MEM fault first, then restart

LD    IF    ID    EX    MEM   WB
ADD         IF    ID    EX    MEM   WB

  • IF fault occurs first, even though LD will fault later
  • But for precise exceptions, must handle LD fault first

How is this done?

  • Answer: Don't handle exceptions until the WB stage
    • each instruction has an associated status vector that keeps track of faults
    • any bit set in the status vector turns off register writes and memory writes
    • in WB stage, the status vector is checked and any fault is handled
  • So, since instructions reach WB in proper order, faults for earlier instructions are handled before faults for later instructions
  • Unfortunately, will need to violate this later (for instructions that don’t reach WB in proper order)
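The status-vector idea can be sketched as a small cycle-by-cycle simulation. This is an illustrative model, not the hardware: the function name `run`, the log format, and the assumption that instruction i enters IF in cycle i + 1 with no stalls are all choices made for this example, and the suppression of register and memory writes is not modeled.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def run(num_instructions, faults):
    """faults maps an instruction index to the stage in which it faults."""
    status = [set() for _ in range(num_instructions)]  # one status vector each
    log = []
    for cycle in range(1, num_instructions + len(STAGES)):
        for i in range(num_instructions):
            s = cycle - 1 - i                  # stage index of instruction i
            if not 0 <= s < len(STAGES):
                continue
            stage = STAGES[s]
            if faults.get(i) == stage:
                status[i].add(stage)           # record the fault, do not act yet
                log.append(("detected", cycle, i, stage))
            if stage == "WB" and status[i]:
                log.append(("handled", cycle, i))
                return log                     # the handler takes over here
    return log


# Instruction 0 (a load) faults in MEM; instruction 1 faults in IF.
for entry in run(2, {0: "MEM", 1: "IF"}):
    print(entry)
```

The later instruction's fault is detected first, but the earlier instruction reaches WB first, so its fault is the one handled.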

Commitment

  • When an instruction is guaranteed to complete, it is committed
  • Life is easier if no instruction changes the machine state before it is committed
  • In MIPS, commitment occurs at the end of the MEM stage - that’s why register update occurs in the stage after that
  • Some machines muddy the state before commitment, and the exception handler must do its best to restore the state that existed before the instruction started

Complications caused by long instructions

  • So far, all MIPS instructions take 5 cycles
  • But we haven't talked yet about the floating point instructions
  • Take it on faith that floating point instructions are inherently slower than integer arithmetic instructions
    • doubters may consult Appendix H in H&P online

FP stalls from RAW hazards – Fig. A.

Inst.             1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17
L.D F4,0(R2)      IF    ID    EX    MEM   WB
MUL.D F0,F4,F           IF    ID    stall M1    M2    M3    M4    M5    M6    M7    MEM   WB
ADD.D F2,F0,F                 IF    stall ID    stall stall stall stall stall stall A1    A2    A3    A4    MEM
S.D F2,0(R2)                        IF    stall stall stall stall stall stall stall ID    EX    stall stall stall MEM

Long instructions (cont.)

  • It is possible that two instructions enter the WB stage at the same time

ADD.D   IF    ID    A1    A2    A3    A4    MEM   WB
LD            IF    ID    ALU   MEM   WB
DADD                IF    ID    ALU   MEM   WB
DADD                      IF    ID    ALU   MEM   WB

  • A structural hazard

Long instructions (cont.)

  • Instructions can finish in the wrong order
  • This can cause WAW hazards
    • see p. A-52 of H&P for an example
  • This violation of WB ordering defeats the previous strategy for precise exception handling

WAW structural hazard – Fig. A.

MUL.D F0,F4,F     IF    ID    M1    M2    M3    M4    M5    M6    M7    MEM   WB
…                       IF    ID    EX    MEM   WB
…                             IF    ID    EX    MEM   WB
ADD.D F2,F4,F                       IF    ID    A1    A2    A3    A4    MEM   WB
…                                         IF    ID    EX    MEM   WB
…                                               IF    ID    EX    MEM   WB
L.D F2,0(R2)                                          IF    ID    EX    MEM   WB

How to detect hazards in ID

  • Early detection would prevent trouble
  • Check for structural hazards :
    • will the divide unit clear in time?
    • will WB be possible when we need it?
  • Check for RAW data hazards :
    • will all source registers be available when needed?
  • Check for WAW data hazards :
    • Is the destination register for any ADD.D, multiply or divide instructions the same register as the destination for this instruction?
  • If anything dangerous could happen, delay the execute cycle so no conflict occurs
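The three checks made in ID can be sketched as a single function that inspects the instructions still in flight. Everything here is a hypothetical simplification: the function name, the dictionary fields (`dest`, `sources`, `latency`, `wb_cycle`), and the example latencies are assumptions for illustration, forwarding is ignored (so the RAW check is conservative), and the only structural conflict modeled is two instructions wanting WB in the same cycle.

```python
def hazards_at_issue(instr, in_flight, cycle):
    """Return the hazards that would arise if `instr` issued in `cycle`."""
    busy_dests = {f["dest"] for f in in_flight}
    busy_wb_cycles = {f["wb_cycle"] for f in in_flight}
    found = []
    if any(src in busy_dests for src in instr["sources"]):
        found.append("RAW")         # a source register is not yet written
    if instr["dest"] in busy_dests:
        found.append("WAW")         # an earlier write to the same register is pending
    if cycle + instr["latency"] in busy_wb_cycles:
        found.append("structural")  # WB would be needed by two instructions at once
    return found


# A long multiply writing F0 is in flight and will reach WB in cycle 11.
in_flight = [{"dest": "F0", "wb_cycle": 11}]
dependent = {"dest": "F2", "sources": ["F0", "F8"], "latency": 7}
independent = {"dest": "F6", "sources": ["F8", "F10"], "latency": 4}
print(hazards_at_issue(dependent, in_flight, cycle=4))
print(hazards_at_issue(independent, in_flight, cycle=4))
```

A non-empty result corresponds to delaying the execute cycle until the conflict clears.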

Precise exception handling for long instructions

  • Suppose
    • ADD.D completes,
    • then SUB.D has a floating-point exception,
    • then DIV.D detects an exception
  • Big trouble, because ADD.D has destroyed register F10

Example:
DIV.D F0, F2, F
ADD.D F10, F10, F
SUB.D F12, F12, F

Possible fixes

  • Give up and just do imprecise exception handling
    • tempting, but very annoying to users
  • Delay WB until all previous instructions complete
    • since so many instructions can be active, this is expensive - requires a lot of supporting hardware
  • Write, to memory, a history file of register and memory changes so we can undo instructions if necessary
    • or keep a future file of computed results that are waiting for MEM or WB

Possible fixes (cont.)

  • Let the exception handler finish the instructions in the pipeline and then restart the pipe at the next instruction
  • Have the floating point units diagnose exceptions in their first or second stages, so we can handle them by methods that work well for handling integer exceptions

Computer Systems Architecture

CMSC 411

Unit 3 – Instruction Pipelining

Alan Sussman

February 27, 2003

Administrivia

  • Quiz returned Tuesday
    • answers posted on web page
  • Read Chapter 3 – Unit 4 on instruction-level parallelism
  • HW #3 due Tuesday, March 4
  • HW #4 posted soon

Last time

  • Exception handling
    • for 5 stage pipeline, handle them in WB stage, to keep proper order
      • after the instruction commits
  • Long instructions – generally means f.p. instructions
    • higher latency, and initiation interval, than other (integer) instructions
    • typically means the EX stage is multiple cycles, and sometimes not pipelined (e.g., divider)
    • can get 2 (or more) instructions trying to enter WB at same time – structural hazard
    • can get instructions finishing in wrong order – may cause WAW hazards, messing up precise exception handling
    • detect hazards in ID stage, and delay EX if a conflict occurs
    • finally, several ways to handle exceptions – history/future file, software (OS exception handler), diagnose early in pipeline, …

A case study: MIPS R4000 pipeline design

  • MIPS64 architecture, with deeper 8 stage pipeline
    • to get higher clock rates
    • extra stages come from memory accesses
    • technique called superpipelining

R4000 pipeline performance

  • 4 major causes of pipeline stalls
    • load stalls – from using load result 1 or 2 cycles after load
    • branch stalls – 2 cycles on every taken branch, or empty branch delay slot
    • FP result stalls – RAW hazards for an FP operand
    • FP structural stalls – from conflicts for functional units in FP pipeline

SPEC92 benchmarks

Assuming a perfect cache – 5 integer and five FP programs

Dynamically scheduled pipelines

  • We’ll cover this, and the scoreboard technique, in Unit 4
    • need some general background first

Pitfalls

  • Unexpected hazards do occur
    • for example, when a branch is taken before a previous instruction finishes
  • Extensive pipelining can slow a machine down, or lead to worse cost-performance
    • more complex hardware can cause a longer clock cycle, killing the benefits of more pipelining

Pitfalls (cont.)

  • A poor compiler can make a good machine look bad
    • compiler writers need to understand the architecture in order to
      • optimize efficiently and
      • avoid hazards
    • better to eliminate useless instructions than to make them run faster