Understanding Moore's Law & Performance Improvement with Pipelining, Slides of Assembly Language Programming

These slides introduce the concept of pipelining in computer architecture. They discuss Moore's Law, the benefits of pipelining, and the challenges it presents, including structural, control, and data hazards. They also cover solutions to these hazards, such as stalling, predicting, and delayed branching.

Typology: Slides

2011/2012

Uploaded on 07/26/2012

parinita

Chapter 6.1 - Pipelining

Moore's Law

Moore's Law says that the number of transistors on a chip doubles about every 18 months.
Given the data on the following two slides, is this true?
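
One way to test the claim against data is to compute the doubling period implied by two data points. The sketch below is illustrative only. The helper name `doubling_time_years` is hypothetical, and the two data points (Intel 4004, 1971, about 2,300 transistors; Pentium 4, 2000, about 42 million transistors) are commonly cited figures assumed here, not values read from the slides.

```python
import math

def doubling_time_years(year0, count0, year1, count1):
    """Doubling period implied by exponential growth between two data points."""
    doublings = math.log2(count1 / count0)  # how many times the count doubled
    return (year1 - year0) / doublings

# Assumed, commonly cited data points (NOT taken from the slides).
period = doubling_time_years(1971, 2_300, 2000, 42_000_000)
print(f"Implied doubling period: {period * 12:.0f} months")
```

With these assumed points the implied period comes out closer to two years than to 18 months; other pairs of data points will give somewhat different answers.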

Intel Architecture

[Figure: Intel architecture data, spread over two slides; not recoverable from the extracted text]

Pipelining is Natural!

• Laundry Example
• Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• "Folder" takes 20 minutes

[Figure: the four loads, labeled A, B, C, D]

Sequential Laundry

[Figure: timeline from 6 PM to midnight; task order A, B, C, D on the vertical axis, time on the horizontal axis]

• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?
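
The question on this slide can be answered with a small scheduling sketch. It uses the washer, dryer, and folder times from the laundry example. The function names are hypothetical. The model assumes one washer, one dryer, and one folder, and that a finished load may wait between stages until the next machine is free.

```python
STAGE_MINUTES = [30, 40, 20]  # washer, dryer, "folder" (times from the laundry example)
LOADS = 4                     # Ann, Brian, Cathy, Dave

def sequential_minutes(stages, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stages)

def pipelined_minutes(stages, loads):
    """Each stage starts a load as soon as both the load and the stage are free."""
    stage_free_at = [0] * len(stages)  # when each stage finishes its latest load
    for _ in range(loads):
        ready = 0  # when this load is ready for its next stage
        for s, duration in enumerate(stages):
            start = max(ready, stage_free_at[s])
            ready = start + duration
            stage_free_at[s] = ready
    return stage_free_at[-1]

print(sequential_minutes(STAGE_MINUTES, LOADS) / 60, "hours sequential")
print(pipelined_minutes(STAGE_MINUTES, LOADS) / 60, "hours pipelined")
```

Under these assumptions the pipelined schedule finishes in 3.5 hours instead of 6, and the 40-minute dryer sets the pace once the pipeline is full.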

Pipelining Lessons

• Pipelining doesn't help latency of a single task; it helps throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously using different resources
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to "fill" the pipeline and time to "drain" it reduce speedup
• Stall for dependences

[Figure: pipelined laundry timeline starting at 6 PM; task order A, B, C, D on the vertical axis, time on the horizontal axis]
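
Several of these lessons can be checked numerically. The sketch below assumes a lockstep (clocked) pipeline in which every stage takes one clock period equal to the slowest stage, and in which no stalls occur. The function name `speedup` and the stage times are hypothetical values chosen for illustration.

```python
def speedup(stage_times, tasks):
    """Speedup of a lockstep pipeline over running every task start to finish."""
    clock = max(stage_times)                        # rate limited by the slowest stage
    unpipelined = tasks * sum(stage_times)
    pipelined = (len(stage_times) + tasks - 1) * clock  # includes fill and drain
    return unpipelined / pipelined

balanced = [1, 1, 1, 1, 1]                # five equal stages
unbalanced = [200, 100, 200, 200, 100]    # hypothetical stage times

for tasks in (1, 10, 1000):
    print(tasks, round(speedup(balanced, tasks), 2), round(speedup(unbalanced, tasks), 2))
```

For a single task the speedup is at most 1, so latency is not improved. For many tasks the balanced pipeline approaches a speedup of 5 (the number of stages), while the unbalanced one approaches only 4.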

The Five Stages of an Instruction

• Ifetch: Instruction Fetch
  – Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec: Calculate the memory address
• Mem: Read the data from the Data Memory
• Wr: Write the data back to the register file

      Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
lw    Ifetch    Reg/Dec   Exec      Mem       Wr
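
The single-instruction timing above extends naturally to several overlapping instructions. The sketch below prints such a chart for an ideal pipeline, assuming one instruction enters per cycle and nothing stalls. The function name `pipeline_chart` is hypothetical.

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]

def pipeline_chart(instructions):
    """Return (name, stages-per-cycle) rows for an ideal, stall-free pipeline."""
    cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        row = [""] * cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i is fetched in cycle i + 1
        rows.append((name, row))
    return rows

for name, row in pipeline_chart(["lw", "lw", "lw"]):
    print(f"{name:<4}" + "".join(f"{cell:<9}" for cell in row))
```

Each row is shifted one cycle to the right of the previous one, so three instructions finish in seven cycles rather than fifteen.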

Basic Idea

• Single-cycle datapath; colored lines show data flowing backwards.
• What do we need to add to split the datapath into stages?

[Figure: single-cycle datapath (PC, instruction memory, adders, register file, sign extend, ALU, data memory, multiplexers), divided into five stages]

IF: Instruction fetch
ID: Instruction decode / register file read
EX: Execute / address calculation
MEM: Memory access
WB: Write back

Pipelined (Single-Cycle) Datapath

• Pipeline registers (colored) separate the datapath stages.
• Must be wide enough to store data, control, and conditions as they flow downstream.

[Figure: the same datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between the stages]

Pipeline register widths: IF/ID 64 bits, ID/EX 128 bits, EX/MEM 97 bits, MEM/WB 64 bits
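
The widths on this slide (64, 128, 97, and 64 bits) can be reproduced by adding up the data fields each pipeline register has to carry. The field breakdown below is an assumption based on the classic five-stage MIPS datapath; the slide itself gives only the totals. It counts data fields only, so control signals and the destination register number would add further bits.

```python
WORD = 32  # bits in a MIPS word

# Assumed field breakdown (classic five-stage MIPS datapath); the slide lists totals only.
PIPELINE_REGISTERS = {
    "IF/ID":  {"PC + 4": WORD, "instruction": WORD},
    "ID/EX":  {"PC + 4": WORD, "read data 1": WORD,
               "read data 2": WORD, "sign-extended immediate": WORD},
    "EX/MEM": {"branch target": WORD, "ALU zero flag": 1,
               "ALU result": WORD, "read data 2": WORD},
    "MEM/WB": {"memory read data": WORD, "ALU result": WORD},
}

# Total width of each pipeline register, in bits.
widths = {name: sum(fields.values()) for name, fields in PIPELINE_REGISTERS.items()}

for name, bits in widths.items():
    print(f"{name}: {bits} bits")
```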

Can pipelining get us into trouble?

• Yes: Pipeline Hazards
  – structural hazards: attempt to use the same resource two different ways at the same time
    • e.g., combined washer/dryer would be a structural hazard, or folder busy doing something else (watching TV)
  – control hazards: attempt to make a decision before the condition is evaluated
    • e.g., washing football uniforms and need to get proper detergent level; need to see after dryer before next load in
    • branch instructions
  – data hazards: attempt to use an item before it is ready
    • e.g., one sock of pair in dryer and one in washer; can't fold until get sock from washer through dryer
    • instruction depends on result of prior instruction still in the pipeline
• Can always resolve hazards by waiting
  – pipeline control must detect the hazard
  – take action (or delay action) to resolve hazards
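
Detecting a data hazard amounts to comparing the registers an instruction reads with the registers that recent instructions write. The sketch below is a minimal illustration, not the hardware's actual detection logic. The `Instr` type, the function name `raw_hazards`, the sample program, and the window of two instructions (how long a result is assumed to stay in flight when there is no forwarding) are all assumptions.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str                # assembly text, for display only
    dest: Optional[str]      # register written, if any
    srcs: Tuple[str, ...]    # registers read

def raw_hazards(program, window=2):
    """Find instructions that read a register written by one of the previous
    `window` instructions (assumed to be still in the pipeline)."""
    hazards = []
    for i, consumer in enumerate(program):
        for j in range(max(0, i - window), i):
            producer = program[j]
            if producer.dest is not None and producer.dest in consumer.srcs:
                hazards.append((producer.text, consumer.text, producer.dest))
    return hazards

# Hypothetical program: sub needs $t0 before add has written it back.
program = [
    Instr("add $t0, $t1, $t2", "$t0", ("$t1", "$t2")),
    Instr("sub $t3, $t0, $t4", "$t3", ("$t0", "$t4")),
    Instr("lw  $t5, 0($t6)",   "$t5", ("$t6",)),
]

for producer, consumer, reg in raw_hazards(program):
    print(f"'{consumer}' must wait for {reg} from '{producer}'")
```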

Single Memory Is a Structural Hazard

[Figure: pipeline diagram, instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) on the vertical axis, time in clock cycles on the horizontal axis; each instruction passes through Mem, Reg, ALU, Mem, Reg]

• lw needs memory here (its data access in the Mem stage)
• Instruction fetch needs memory here (a later instruction's fetch, in the same clock cycle)

Detection is easy in this case! (right half highlight means read, left half write)
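
The conflict can be located with simple cycle arithmetic. The sketch below assumes a five-stage pipeline with a single shared memory, one instruction fetched per cycle, and that only the load accesses data memory. The function name `memory_conflicts` is hypothetical.

```python
IF_STAGE, MEM_STAGE = 0, 3  # stage offsets within the five-stage pipeline

def memory_conflicts(uses_data_memory):
    """uses_data_memory[i] is True if instruction i reads or writes data memory.

    Instruction i is fetched in cycle i + 1. Returns a list of
    (cycle, index of instruction in Mem, index of instruction being fetched).
    """
    conflicts = []
    for i, uses in enumerate(uses_data_memory):
        if not uses:
            continue
        cycle = i + 1 + MEM_STAGE          # cycle of this instruction's Mem stage
        fetched = cycle - 1 - IF_STAGE     # instruction fetched in that same cycle
        if fetched < len(uses_data_memory):
            conflicts.append((cycle, i, fetched))
    return conflicts

# Load, Instr 1, Instr 2, Instr 3, Instr 4: only the load touches data memory.
print(memory_conflicts([True, False, False, False, False]))
```

Under these assumptions the load's data access in cycle 4 collides with the fetch of Instr 3 in the same cycle.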

Control Hazard Solution #1: Stall

• Stall: wait until the decision is clear (conditional branching).
• Impact: 2 lost cycles (i.e., 3 clock cycles per branch instruction) => slow.
• Move decision to end of decode.
  – save 1 cycle per branch.

[Figure: pipeline diagram, instruction order (Add, Beq, Load) on the vertical axis, time in clock cycles on the horizontal axis; the Load is delayed after the Beq, and the idle slots are labeled "Lost potential"]
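
The cost of stalling can be expressed as an average CPI. The sketch below assumes a base CPI of 1 for non-branch instructions and a hypothetical instruction mix in which 20% of instructions are branches. The function name `cpi_with_stalls` is illustrative.

```python
def cpi_with_stalls(branch_fraction, lost_cycles_per_branch, base_cpi=1.0):
    """Average CPI when every branch stalls for a fixed number of cycles."""
    branch_cpi = base_cpi + lost_cycles_per_branch
    return branch_fraction * branch_cpi + (1 - branch_fraction) * base_cpi

BRANCH_FRACTION = 0.2  # assumed instruction mix, for illustration only

late = cpi_with_stalls(BRANCH_FRACTION, 2)   # 2 lost cycles per branch
early = cpi_with_stalls(BRANCH_FRACTION, 1)  # decision moved to end of decode
print(f"decide late:  CPI {late:.2f}")
print(f"decide early: CPI {early:.2f}")
```

With this assumed mix, moving the decision to the end of decode lowers the average CPI from 1.4 to 1.2.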

Control Hazard Solution #2: Predict

• Predict: guess one direction, then back up if wrong
• Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right 50% of time)
  – Need to "squash" and restart the following instruction if wrong
  – Produces CPI on branch of (1 × .5 + 2 × .5) = 1.5
  – Total CPI might then be: 1.5 × .2 + 1 × .8 = 1.1 (20% branch)
• More dynamic scheme: history of 1 branch (~90%)

[Figure: pipeline diagram, instruction order (Add, Beq, Load) on the vertical axis, time in clock cycles on the horizontal axis]
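
The CPI arithmetic on this slide generalizes to any prediction accuracy. The sketch below uses the slide's figures (1 lost cycle on a wrong guess, 20% branches); the function names are hypothetical, and non-branch instructions are assumed to have a CPI of 1.

```python
def branch_cpi(accuracy, penalty_cycles=1):
    """Average cycles per branch: 1 if predicted right, 1 + penalty if wrong."""
    return accuracy * 1 + (1 - accuracy) * (1 + penalty_cycles)

def total_cpi(accuracy, branch_fraction=0.2):
    """Average CPI over all instructions; non-branches are assumed to take 1 cycle."""
    return branch_fraction * branch_cpi(accuracy) + (1 - branch_fraction) * 1

for accuracy in (0.5, 0.9):
    print(f"accuracy {accuracy:.0%}: branch CPI {branch_cpi(accuracy):.2f}, "
          f"total CPI {total_cpi(accuracy):.2f}")
```

At 50% accuracy this reproduces the slide's 1.5 and 1.1. At the roughly 90% accuracy of the one-branch history scheme, the total CPI falls to about 1.02.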