Pipelining in Computer Architecture: Throughput, Latency, and Hazards, Exams of Computer Architecture and Organization

University of California - San Diego Computer Architecture and Organization

This document from a CS 141 class by Chien discusses pipelining in computer architecture, focusing on throughput, latency, and the hazards that can arise. It includes examples of pipelining in real life and in instruction execution, and it contrasts latency with throughput. The document also touches on how multiple resources and parallelism improve performance.

Typology: Exams

Pre 2010

Uploaded on 03/28/2010

koofers-user-251 🇺🇸


CS 141 Chien 1 Feb 22, 2000

Pipelining and Instruction Pipelining

◆Last Time

–Midterm Exam, Grading in progress

◆Today

–Pipelining: performance thru concurrency

–Applying Pipelining to Instruction Execution

–Hazards and when they are problems

◆Reminders/Announcements

–Read P&H Chapter 6.1-6.7, Pipelining

CS 141 Chien 2 Feb 22, 2000

Complete Basic Computer Implementation

◆Constituents

–Instruction Fetch

–Decode

–Read Registers

–Execute

–Write Registers

◆Single Cycle Control (slow...)

◆Multiple Cycle Control (Control FSM)

◆Exceptions

◆So, how can we make it go faster?


CS 141 Chien 3 Feb 22, 2000

Concepts of Pipelining

Latency = Time from initiation of an operation until its results are available

Examples

– Adder: time from inputs valid to output valid

– Memory: time from addresses valid, read strobe, to data out

– Control logic: time from stable inputs to stable outputs

– Others?

Macroscopic examples? (real life)

CS 141 Chien 4 Feb 22, 2000

Latency Examples (real life)

Line at McDonald's: 10 mins from entry to “have food”

Car ride SD -> LAX: 2.5 hours (or faster)

Homework grading: time from turn-in to handed back (hopefully, < 1.5 weeks)

Turning tap on until water comes out of hose

Others?

CS 141 Chien 7 Feb 22, 2000

Contrasting Latency and Throughput

Multiple Resources (Parallelism)

Each machine: 5 seconds to copy, 12 copies / minute

All 3 machines: 5 secs to copy, 36 copies / minute

CS 141 Chien 8 Feb 22, 2000

Latency versus Throughput

Replication only increases throughput, not latency.

Throughput is additive, latency is min(x,y).

Latency is a critical commodity, and very expensive to improve.

Pipelining and replication only improve throughput.

Pipelining: the basic idea

– Think “assembly line”

– Breaking total work into small components

– Each component can be “busy” doing useful work
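The replication claim (throughput is additive, latency is min(x,y)) can be checked against the three-machine copy example. The sketch below is only an illustration: the function name `replicate` and the (latency, throughput) tuple layout are assumptions made here, not part of the lecture.

```python
def replicate(machines):
    """Combine independent machines that work in parallel.

    machines: list of (latency_seconds, copies_per_minute) tuples.
    Returns (latency_seconds, copies_per_minute) for the whole group.
    """
    # One job still runs on a single machine, so the best case is the fastest one.
    latency = min(lat for lat, _ in machines)
    # Independent jobs are spread across machines, so throughputs add up.
    throughput = sum(tp for _, tp in machines)
    return latency, throughput


copier = (5, 12)  # 5 seconds to copy, 12 copies / minute
print(replicate([copier] * 3))  # (5, 36)
```

Adding a fourth identical machine would raise the throughput to 48 copies / minute while the latency stays at 5 seconds.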

CS 141 Chien 9 Feb 22, 2000

Pipelining

Washing Laundry: Washer (30 minutes) + Dryer (50 minutes)

Latency for a wash = 30 + 50 = 80 minutes

2 loads ==> 160 minutes = 2 hrs, 40 minutes?

Pipelined: start 1 wash, start second when first goes into the dryer.

CS 141 Chien 10 Feb 22, 2000

Pipelining

Washer (30 minutes) and Dryer (50 minutes)

Latency = 80 minutes

Overlapped execution allows us to achieve....

2 loads in 30 + 50 + 50 = 130 minutes, much faster!

Throughput = 1 load / 50 minutes = 1.2 loads / hour

Latency and Throughput are not reciprocals.
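The laundry arithmetic can be reproduced with a short sketch. The function names are made up for illustration, and the pipelined formula assumes that the dryer is the slower stage and that a washed load can wait until the dryer is free.

```python
WASH_MIN = 30  # washer time per load, from the slide
DRY_MIN = 50   # dryer time per load, from the slide


def sequential_minutes(loads):
    """Each load finishes washing and drying before the next one starts."""
    return loads * (WASH_MIN + DRY_MIN)


def pipelined_minutes(loads):
    """Wash the next load while the previous one dries.

    Valid only when DRY_MIN >= WASH_MIN: after the first wash,
    the dryer is never idle and sets the pace.
    """
    if loads == 0:
        return 0
    return WASH_MIN + loads * DRY_MIN


print(sequential_minutes(2))    # 160
print(pipelined_minutes(2))     # 130
print(60 / DRY_MIN)             # 1.2 loads / hour in steady state
print(60 / (WASH_MIN + DRY_MIN))  # 0.75, the reciprocal of the 80 minute latency
```

The last two lines show why latency and throughput are not reciprocals: the steady-state rate is set by the slowest stage (50 minutes), not by the 80 minute latency.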

CS 141 Chien 13 Feb 22, 2000

Pipelined Execution

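[Figure: one instruction moving through the stages Fetch, Decode, Execute, Mem Op, Reg; the time to pass through all of them is the latency, or pipeline load time]

1 instruction per cycle AFTER pipeline is loaded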
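The "1 instruction per cycle AFTER pipeline is loaded" statement can be written as a formula. The symbols below are introduced here for illustration (k stages, clock cycle time t_c, n instructions) and the formula assumes an ideal pipeline with no stalls.

```latex
% Total time for n instructions on a k-stage pipeline with cycle time t_c:
% k cycles to load the pipeline with the first instruction,
% then one more instruction completes in each of the remaining n - 1 cycles.
T(n) = (k + n - 1)\, t_c

% Throughput approaches one instruction per cycle as n grows,
% while the latency of a single instruction stays at k cycles.
\lim_{n \to \infty} \frac{n}{T(n)} = \frac{1}{t_c},
\qquad \text{latency} = k\, t_c
```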

CS 141 Chien 14 Feb 22, 2000

Pipeline Hazards

There must be problems and pitfalls

Structural Hazard

– Hardware cannot support two instructions simultaneously

» e.g. either wash or dry, but not both

» read or write memory, but not both

Control Hazard

– Decision made by partially executed instruction affects currently loading instruction

» May execute wrong code if branch taken

» Later: Branch Prediction, Delayed Branching

Data Hazard

– Current instruction depends on output of incomplete instruction ahead of it in the pipeline
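The data hazard definition above can be sketched as a simple check over a list of instructions. Everything in this sketch is an assumption for illustration: the (opcode, destination, sources) tuple layout, the function name `raw_hazards`, the example program, and the window of 2 following instructions (roughly what a five-stage pipeline without forwarding would expose). It also ignores the case where a register is overwritten between producer and consumer.

```python
def raw_hazards(program, window=2):
    """Find read-after-write dependences between nearby instructions.

    program: list of (opcode, dest_register, [source_registers]) tuples.
    window:  how many following instructions can still see a stale value
             (an assumed pipeline property, not taken from the slides).
    Returns a list of (producer_index, consumer_index, register) triples.
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None:  # e.g. a store or branch writes no register
            continue
        last = min(i + 1 + window, len(program))
        for j in range(i + 1, last):
            if dest in program[j][2]:
                hazards.append((i, j, dest))
    return hazards


# Hypothetical program, written only to exercise the check.
program = [
    ("lw",  "$10", ["$1"]),
    ("sub", "$11", ["$10", "$3"]),   # reads $10 right after the load
    ("add", "$5",  ["$6", "$7"]),    # independent
    ("or",  "$8",  ["$10", "$11"]),  # $11 is 2 back, $10 is 3 back
]
print(raw_hazards(program))  # [(0, 1, '$10'), (1, 3, '$11')]
```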

CS 141 Chien 15 Feb 22, 2000

Improve performance by increasing instruction throughput

Ideal speedup is number of stages in the pipeline. Do we achieve this?

Pipelining

[Figure: three loads in program execution order (in instructions) plotted against time. Without pipelining, each instruction (Instruction fetch, Reg, ALU, Data access, Reg) starts 8 ns after the previous one; with pipelining, a new instruction starts every 2 ns.]

lw $1, 100($0)

lw $2, 200($0)

lw $3, 300($0)
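The question "Do we achieve this?" can be explored with the figure's numbers. The sketch below reads the figure as 8 ns per instruction without pipelining and a 2 ns clock for a five-stage pipeline; that reading, and all names in the code, are assumptions for illustration.

```python
STAGES = 5            # Instruction fetch, Reg, ALU, Data access, Reg
NONPIPELINED_NS = 8   # time per instruction without pipelining
STAGE_NS = 2          # pipelined clock cycle, set by the slowest stage


def nonpipelined_ns(n):
    """Total time when each instruction finishes before the next starts."""
    return n * NONPIPELINED_NS


def pipelined_ns(n):
    """Total time with one instruction entering the pipeline per cycle."""
    if n == 0:
        return 0
    return (STAGES + n - 1) * STAGE_NS


def speedup(n):
    return nonpipelined_ns(n) / pipelined_ns(n)


print(nonpipelined_ns(3), pipelined_ns(3))  # 24 14
print(round(speedup(1_000_000), 3))         # 4.0
```

Under these numbers the speedup approaches 8 / 2 = 4 rather than the ideal 5, because the stage times are not all equal and the clock must fit the slowest stage. For short programs the pipeline load time reduces the speedup further.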

CS 141 Chien 16 Feb 22, 2000

Pipelining: Basics

What makes it easy

– all instructions are the same length

– just a few instruction formats

– memory operands appear only in loads and stores

What makes it hard?

– structural hazards: suppose we had only one memory

– control hazards: need to worry about branch instructions

– data hazards: an instruction depends on a previous instruction

We’ll build a pipeline and follow instructions through. Start to think about hazards.

CS 141 Chien 19 Feb 22, 2000

Single Cycle Control - Review

What do we need to add to actually split the datapath into stages?

Which reverse flows cause data/control hazards?

[Figure: single-cycle datapath (PC, instruction memory, registers, sign extend, ALU, data memory) divided into five stages]

IF: Instruction fetch

ID: Instruction decode/register file read

EX: Execute/address calculation

MEM: Memory access

WB: Write back

CS 141 Chien 20 Feb 22, 2000

Pipelined Datapath

Can you find a problem even if there are no dependencies? (Think about the registers?)

[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers between the stages]

Pipeline registers keep track of each instruction as it passes through the pipeline

CS 141 Chien 21 Feb 22, 2000

Corrected Datapath

[Figure: corrected pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]

The source/write registers need to be kept track of for each instruction.

CS 141 Chien 22 Feb 22, 2000

Graphically Representing Pipelines

(Multiple-clock-cycle diagram)

Can help with answering questions like:

– how many cycles does it take to execute this code?

– what is the ALU doing during cycle 4?

– use this representation to help understand data paths

Most recent instruction at bottom and moved right

[Figure: multiple-clock-cycle diagram, time (in clock cycles) CC 1 to CC 6 across, program execution order (in instructions) down; each instruction passes through IM, Reg, ALU, DM, Reg]

lw $10, 20($1)

sub $11, $2, $3
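Questions such as "what is the ALU doing during cycle 4?" can be answered mechanically for a stall-free pipeline. The following sketch is an illustration under that assumption; the function names and the IF/ID/EX/MEM/WB stage list are chosen here, and the three instructions are the ones followed in the clock-cycle slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_of(instr_index, cycle):
    """Stage held by instruction `instr_index` (0-based) in clock cycle
    `cycle` (1-based), or None if it is not in the pipeline then.
    Assumes one instruction is fetched per cycle and nothing stalls."""
    position = cycle - 1 - instr_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


def in_stage(program, stage, cycle):
    """Return the instruction text occupying `stage` during `cycle`, if any."""
    for index, text in enumerate(program):
        if stage_of(index, cycle) == stage:
            return text
    return None


program = ["lw $10, 20($1)", "sub $11, $2, $3", "add $5, $6, $7"]

print(in_stage(program, "EX", 4))      # sub $11, $2, $3
print(in_stage(program, "WB", 5))      # lw $10, 20($1)
print(len(STAGES) + len(program) - 1)  # 7 cycles to finish all three
```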

CS 141 Chien 25 Feb 22, 2000

Clock Cycle 2

Instruction 1 in instruction decode

Instruction 2 in instruction fetch

[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]

Fetch: sub $11, $2, $3; decode: lw $10, 20($1)

CS 141 Chien 26 Feb 22, 2000

Clock Cycle 3

1 in exec, 2 in decode, 3 in fetch

[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]

Fetch: add $5, $6, $7; decode: sub $11, $2, $3; exec: lw $10, 20($1)

CS 141 Chien 27 Feb 22, 2000

Clock Cycle 4

1 in memory, 2 in execute, 3 in decode

[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]

Decode: add $5, $6, $7; exec: sub $11, $2, $3; mem: lw $10, 20($1)

CS 141 Chien 28 Feb 22, 2000

Clock Cycle 5

1 in write back, 2 in memory, 3 in execute

Monotonous yet?

[Figure: pipelined datapath with IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]

Exec: add $5, $6, $7; mem: sub $11, $2, $3; write back: lw $10, 20($1)

CS 141 Chien 31 Feb 22, 2000

Stalling

We can stall the pipeline by keeping an instruction in the same stage

[Figure: multiple-clock-cycle diagram, time (in clock cycles) CC 1 to CC 10 across, program execution order (in instructions) down, with a bubble in the pipeline]

lw $2, 20($1)

and $4, $2, $

or $8, $2, $

add $9, $4, $

slt $1, $6, $
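The cost of a stall can be counted with a small sketch. It assumes a five-stage pipeline in which every other dependence is handled without delay (for example by forwarding), so that only an instruction that reads a register loaded by the immediately preceding lw needs one bubble. The tuple layout and function names are invented for illustration, and each source list holds only the operands that are legible in the slide.

```python
STAGES = 5  # assumed five-stage pipeline


def load_use_stalls(program):
    """Count bubbles: one for each instruction that reads the register
    written by the lw directly before it (assumed stall rule)."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_dest, _ = previous
        _, _, cur_sources = current
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls


def total_cycles(program):
    """Cycles to drain the pipeline, including load-use bubbles."""
    if not program:
        return 0
    return STAGES + len(program) - 1 + load_use_stalls(program)


# (opcode, destination, known sources); truncated operands are left out.
program = [
    ("lw",  "$2", ["$1"]),
    ("and", "$4", ["$2"]),
    ("or",  "$8", ["$2"]),
    ("add", "$9", ["$4"]),
    ("slt", "$1", ["$6"]),
]
print(load_use_stalls(program), total_cycles(program))  # 1 10
```

The result of 10 cycles agrees with the diagram running to CC 10: nine cycles for five instructions on five stages, plus one bubble.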

CS 141 Chien 32 Feb 22, 2000

Branch Hazards

When we decide to branch, other instructions are in the pipeline!

We are predicting “branch not taken”

– need to add hardware for flushing instructions if we are wrong

[Figure: multiple-clock-cycle diagram, time (in clock cycles) CC 1 to CC 9 across, program execution order (in instructions) down]

40 beq $1, $3, 7

44 and $12, $2, $

48 or $13, $6, $

52 add $14, $2, $

72 lw $4, 50($7)
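The addresses in the branch example can be reproduced with a short sketch. It assumes MIPS-style branches, where the offset counts 4-byte words relative to the instruction after the branch, and it assumes the branch outcome is known three cycles after fetch, which is consistent with three instructions (44, 48, 52) appearing before the target at 72. The function names are made up for illustration.

```python
WORD_BYTES = 4  # MIPS instructions are 4 bytes long


def branch_target(pc, offset_words):
    """Target of a taken branch at address `pc`: the offset is counted
    in words from the instruction following the branch."""
    return pc + WORD_BYTES + offset_words * WORD_BYTES


def flushed_addresses(pc, resolve_delay=3):
    """Addresses fetched under "predict not taken" before a taken branch
    redirects the fetch; `resolve_delay` is an assumed pipeline property."""
    return [pc + WORD_BYTES * i for i in range(1, resolve_delay + 1)]


print(branch_target(40, 7))    # 72
print(flushed_addresses(40))   # [44, 48, 52]
```

If the branch is not taken, the prediction was correct and the instructions at 44, 48, and 52 simply continue through the pipeline.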
