Pipelining in Computer Architecture: Throughput, Latency, and Hazards, Exams of Computer Architecture and Organization

This document, from Chien's CS 141 class, discusses pipelining in computer architecture, focusing on throughput, latency, and the hazards that can arise. It includes examples of pipelining in real life and in instruction execution, and contrasts latency with throughput. It also touches on how multiple resources (parallelism) improve performance.

Typology: Exams

Pre 2010

Uploaded on 03/28/2010


CS 141 Chien 1 Feb 22, 2000

Pipelining and Instruction Pipelining

• Last Time
– Midterm Exam, grading in progress
• Today
– Pipelining: performance through concurrency
– Applying pipelining to instruction execution
– Hazards and when they are problems
• Reminders/Announcements
– Read P&H Chapter 6.1-6.7, Pipelining

Complete Basic Computer Implementation

• Constituents
– Instruction Fetch
– Decode
– Read Registers
– Execute
– Write Registers
• Single Cycle Control (slow...)
• Multiple Cycle Control (Control FSM)
• Exceptions
• So, how can we make it go faster?

Concepts of Pipelining

• Latency = Time from initiation of an operation until its results are available
• Examples
– Adder: time from inputs valid to output valid
– Memory: time from addresses valid, read strobe, to data out
– Control logic: time from stable inputs to stable outputs
– Others?
• Macroscopic examples? (real life)

Latency Examples (real life)

• Line at McDonald's: 10 mins from entry to “have food”
• Car ride SD -> LAX: 2.5 hours (or faster)
• Homework grading: time from turn-in to handed back (hopefully, < 1.5 weeks)
• Turning tap on until water comes out of hose
• Others?

Contrasting Latency and Throughput

• Multiple Resources (Parallelism)

[Figure: three copy machines, each labeled "5 seconds to copy" and "12 copies / minute"]

All 3 machines: 5 secs to copy, 36 copies / minute
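The copier numbers above can be checked with a minimal sketch. The function name `replicate` is an illustrative assumption, not something from the lecture; it only shows that adding machines scales throughput while the time for any one copy stays the same.

```python
def replicate(latency_s, machines):
    """Return (latency in seconds, copies per minute) for identical machines in parallel."""
    per_machine = 60 / latency_s           # copies per minute from one machine
    return latency_s, per_machine * machines

print(replicate(5, 1))  # (5, 12.0)
print(replicate(5, 3))  # (5, 36.0)
```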

Latency versus Throughput

• Replication only increases throughput; it does not improve latency. Throughput is additive, latency is min(x, y).
• Latency is a critical commodity, and very expensive to improve.
• Pipelining and replication only improve throughput.
• Pipelining: the basic idea
– Think “assembly line”
– Break the total work into small components
– Each component can be “busy” doing useful work

Pipelining

• Washing Laundry: Washer + Dryer

Washer: 30 minutes; Dryer: 50 minutes
Latency for a wash = 30 + 50 = 80 minutes
2 loads ==> 160 minutes = 2 hrs, 40 minutes?
Pipelined: start 1 wash, start second when first goes into the dryer.

Pipelining

• Washer and Dryer

Washer: 30 minutes; Dryer: 50 minutes
Latency = 80 minutes
Overlapped execution allows us to achieve....
2 loads in 30 + 50 + 50 = 130 minutes, much faster!
Throughput = 1 load / 50 minutes = 1.2 loads / hour
Latency and Throughput are not reciprocals.
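The laundry arithmetic can be reproduced with a small sketch. It assumes one washer and one dryer, and that a washed load can wait outside the washer until the dryer is free; the function names are illustrative only.

```python
def sequential_minutes(loads, wash=30, dry=50):
    """Each load is washed and dried before the next one starts."""
    return loads * (wash + dry)

def pipelined_minutes(loads, wash=30, dry=50):
    """The next wash starts as soon as the washer is free."""
    washer_free = 0   # time at which the washer is next available
    dryer_free = 0    # time at which the dryer is next available
    for _ in range(loads):
        wash_done = washer_free + wash
        washer_free = wash_done
        # Drying starts once the load is washed and the dryer is empty
        dryer_free = max(wash_done, dryer_free) + dry
    return dryer_free

print(sequential_minutes(2))   # 160
print(pipelined_minutes(2))    # 130
print(60 / max(30, 50))        # steady-state loads per hour: 1.2
```

The slowest stage (the dryer) sets the steady-state throughput, while the latency of a single load is still 80 minutes.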

Pipelined Execution

Op:  Fetch   decode  execute Mem     Reg
Op:          Fetch   decode  execute Mem     Reg
Op:                  Fetch   decode  execute Mem     Reg
Op:                          Fetch   decode  execute Mem     Reg

Latency or pipeline load time
1 instruction per cycle AFTER pipeline is loaded
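The "load time, then one instruction per cycle" behavior can be put in numbers. This is a sketch for an ideal pipeline with no stalls; `pipeline_cycles` is an illustrative name, and the five stages follow the Fetch, decode, execute, Mem, Reg steps shown above.

```python
def pipeline_cycles(n_instructions, n_stages=5):
    """Cycles to finish n instructions on an ideal pipeline (no hazards, no stalls)."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one finishes a cycle after
    return n_stages + (n_instructions - 1)

for n in (1, 4, 1000):
    cycles = pipeline_cycles(n)
    print(n, cycles, cycles / n)   # cycles per instruction approaches 1 as n grows
```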

Pipeline Hazards

• There must be problems and pitfalls
• Structural Hazard
– Hardware cannot support two instructions simultaneously
» e.g. either wash or dry, but not both
» read or write memory, but not both
• Control Hazard
– Decision made by partially executed instruction affects currently loading instruction
» May execute wrong code if branch taken
» Later: Branch Prediction, Delayed Branching
• Data Hazard
– Current instruction depends on output of incomplete instruction ahead of it in the pipeline
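The data hazard case can be sketched as a search for read-after-write pairs. Everything here is an assumption for illustration: the `(dest, sources)` encoding, the register names `r1`..`r9`, and the `window` of 2 following instructions that may still see a stale value.

```python
def raw_hazards(program, window=2):
    """List (producer_index, consumer_index, register) read-after-write pairs."""
    hazards = []
    for i, (dest, _) in enumerate(program):
        if dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            next_dest, next_srcs = program[j]
            if dest in next_srcs:
                hazards.append((i, j, dest))
            if next_dest == dest:
                break  # a later write replaces the value, so stop looking
    return hazards

# Hypothetical program: each entry is (destination, source registers)
program = [
    ("r2", ["r1"]),
    ("r4", ["r2", "r5"]),
    ("r8", ["r2", "r6"]),
    ("r9", ["r4", "r2"]),
]
print(raw_hazards(program))  # [(0, 1, 'r2'), (0, 2, 'r2'), (1, 3, 'r4')]
```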

Pipelining

• Improve performance by increasing instruction throughput

Ideal speedup is number of stages in the pipeline. Do we achieve this?

[Figure: execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) through the steps Instruction fetch, Reg, ALU, Data access, Reg. Top: non-pipelined, a new instruction starts every 8 ns. Bottom: pipelined, a new instruction starts every 2 ns. Horizontal axis: Time; vertical axis: Program execution order (in instructions).]
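A rough calculation with the figure's numbers speaks to the "Do we achieve this?" question. The sketch assumes 8 ns per non-pipelined instruction and five pipeline stages clocked at 2 ns each, with no stalls; the function names are illustrative.

```python
def nonpipelined_ns(n, instr_ns=8):
    """Total time when each instruction runs to completion before the next starts."""
    return n * instr_ns

def pipelined_ns(n, stages=5, stage_ns=2):
    """Total time on an ideal pipeline clocked at the slowest stage time."""
    return (stages + n - 1) * stage_ns if n else 0

for n in (3, 1000):
    speedup = nonpipelined_ns(n) / pipelined_ns(n)
    print(n, nonpipelined_ns(n), pipelined_ns(n), round(speedup, 2))
# 3 24 14 1.71
# 1000 8000 2008 3.98
```

With these figures the speedup tends toward 8 / 2 = 4 rather than 5, because every stage is given the 2 ns of the slowest one, and the pipeline fill time matters for short programs.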

Pipelining: Basics

• What makes it easy
– all instructions are the same length
– just a few instruction formats
– memory operands appear only in loads and stores
• What makes it hard?
– structural hazards: suppose we had only one memory
– control hazards: need to worry about branch instructions
– data hazards: an instruction depends on a previous instruction
• We’ll build a pipeline and follow instructions through. Start to think about hazards.

Single Cycle Control - Review

What do we need to add to actually split the datapath into stages?
Which reverse flows cause data/control hazards?

[Figure: single-cycle datapath (PC, instruction memory, adders, shift left 2, registers, sign extend, ALU, data memory, muxes) divided into five stages: IF: Instruction fetch; ID: Instruction decode/register file read; EX: Execute/address calculation; MEM: Memory access; WB: Write back]

Pipelined Datapath

Can you find a problem even if there are no dependencies? (Think about the registers?)

[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between the stages]

Pipeline registers keep track of each instruction as it passes through the pipeline

Corrected Datapath

[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]

The source/write registers need to be kept track of for each instruction.
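A toy model shows why the write register has to travel with its instruction. It assumes one instruction issued per cycle in a five-stage pipeline, so the instruction in write back sits three instructions ahead of the one in decode; the function name, the `carry_dest` flag, and the last two destination registers in the example are hypothetical.

```python
def writeback_targets(dests, carry_dest):
    """For each instruction, the register that actually gets written in WB.

    carry_dest=False models reading the write-register number from IF/ID,
    which by then holds a later instruction. carry_dest=True models passing
    the number along through ID/EX, EX/MEM and MEM/WB with the instruction.
    """
    written = []
    for i, dest in enumerate(dests):
        if carry_dest:
            written.append(dest)
        else:
            j = i + 3  # instruction in decode while instruction i is in write back
            written.append(dests[j] if j < len(dests) else None)
    return written

dests = ["$10", "$11", "$5", "$8", "$9"]   # last two are made up for the example
print(writeback_targets(dests, carry_dest=False))  # ['$8', '$9', None, None, None]
print(writeback_targets(dests, carry_dest=True))   # ['$10', '$11', '$5', '$8', '$9']
```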

Graphically Representing Pipelines (Multiple-clock-cycle diagram)

Can help with answering questions like:
– how many cycles does it take to execute this code?
– what is the ALU doing during cycle 4?
– use this representation to help understand data paths

Most recent instruction at bottom and moved right

Time (in clock cycles); program execution order (in instructions)

                  CC 1  CC 2  CC 3  CC 4  CC 5  CC 6
lw $10, 20($1)    IM    Reg   ALU   DM    Reg
sub $11, $2, $          IM    Reg   ALU   DM    Reg
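The questions above can be answered mechanically from the diagram. This is a sketch for a stall-free pipeline using the stage labels IM, Reg, ALU, DM, Reg; the helper names are illustrative, and the sub operands are taken from the clock-cycle slides.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]

def stage_in_cycle(instr_index, cycle):
    """Stage of instruction instr_index (0-based) in clock cycle `cycle` (1-based), or None."""
    k = cycle - 1 - instr_index
    return STAGES[k] if 0 <= k < len(STAGES) else None

def alu_user(program, cycle):
    """Instruction occupying the ALU stage in the given cycle, if any."""
    for i, text in enumerate(program):
        if stage_in_cycle(i, cycle) == "ALU":
            return text
    return None

program = ["lw $10, 20($1)", "sub $11, $2, $3"]
print(len(program) + len(STAGES) - 1)  # cycles to execute this code: 6
print(alu_user(program, 4))            # sub $11, $2, $3
```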

Clock Cycle 2

• Instruction 1 in instruction decode
• Instruction 2 in instruction fetch

[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]

Fetch: sub $11, $2, $3   decode: lw $10, 20($1)

Clock Cycle 3

• 1 in exec, 2 in decode, 3 in fetch

[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]

Fetch: add $5, $6, $7   decode: sub $11, $2, $3   exec: lw $10, 20($1)

Clock Cycle 4

• 1 in memory, 2 in execute, 3 in decode

[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]

decode: add $5, $6, $7   exec: sub $11, $2, $3   Mem: lw $10, 20($1)

Clock Cycle 5

• 1 in write back, 2 in memory, 3 in execute
• Monotonous yet?

[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]

exec: add $5, $6, $7   mem: sub $11, $2, $   Write back: lw $10, 20($1)

Stalling

• We can stall the pipeline by keeping an instruction in the same stage

[Figure: multiple-clock-cycle diagram (CC 1 to CC 10; stages IM, Reg, DM, Reg) for the program below, with a bubble inserted. Axes: Time (in clock cycles); Program execution order (in instructions).]

lw $2, 20($1)
and $4, $2, $
or $8, $2, $
add $9, $4, $
slt $1, $6, $
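The effect of the bubble on the cycle count can be sketched as follows. The model assumes a five-stage pipeline in which forwarding covers every dependence except a load followed immediately by a use of the loaded register; that assumption and the tuple encoding are not from the lecture. The third operands are left out because they are cut off in the text.

```python
def load_use_stalls(program):
    """Count one bubble per load whose result is read by the very next instruction."""
    stalls = 0
    for (op, dest, _), (_, _, next_srcs) in zip(program, program[1:]):
        if op == "lw" and dest in next_srcs:
            stalls += 1
    return stalls

def total_cycles(program, stages=5):
    """Cycles for the whole program, including bubbles."""
    if not program:
        return 0
    return stages + len(program) - 1 + load_use_stalls(program)

# Each entry: (opcode, destination, known source registers)
program = [
    ("lw",  "$2", {"$1"}),
    ("and", "$4", {"$2"}),
    ("or",  "$8", {"$2"}),
    ("add", "$9", {"$4"}),
    ("slt", "$1", {"$6"}),
]
print(load_use_stalls(program))  # 1
print(total_cycles(program))     # 10
```

Under these assumptions the result agrees with the diagram running through CC 10: nine cycles for five instructions plus one bubble.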

Branch Hazards

• When we decide to branch, other instructions are in the pipeline!
• We are predicting “branch not taken”
– need to add hardware for flushing instructions if we are wrong

[Figure: multiple-clock-cycle diagram (CC 1 to CC 9; stages IM, Reg, DM, Reg) for the program below. Axes: Time (in clock cycles); Program execution order (in instructions).]

40 beq $1, $3, 7
44 and $12, $2, $
48 or $13, $6, $
52 add $14, $2, $
72 lw $4, 50($7)
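The addresses listed above can be reproduced with a short sketch of predict-not-taken fetching. The MIPS branch target rule (address of the next instruction plus four times the word offset) is standard; `resolve_delay=3`, the number of instructions fetched before the branch outcome is known, is an assumption chosen to match the three fall-through addresses 44, 48, and 52.

```python
def branch_target(pc, offset_words):
    """MIPS-style target: next instruction address plus offset in words times 4."""
    return pc + 4 + 4 * offset_words

def fetch_trace(pc, offset_words, taken, resolve_delay=3):
    """Instructions flushed and the next useful fetch address under predict-not-taken."""
    fall_through = [pc + 4 * i for i in range(1, resolve_delay + 1)]
    if taken:
        # Wrong prediction: everything fetched after the branch is discarded
        return {"flushed": fall_through, "next": branch_target(pc, offset_words)}
    # Correct prediction: nothing is lost and fetching simply continues
    return {"flushed": [], "next": pc + 4 * (resolve_delay + 1)}

print(branch_target(40, 7))             # 72
print(fetch_trace(40, 7, taken=True))   # {'flushed': [44, 48, 52], 'next': 72}
```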

Summary

• Throughput and Latency
• Basics of Pipelining
– Increases throughput
– Doesn’t reduce latency
– Can increase performance!
• Pitfalls
– Structural Hazards
– Control Hazards
– Data Hazards
• Next time: Pipelined Control