Lecture Notes on Pipelining - Computer Organization | C SC 252, Study notes of Computer Architecture and Organization

University of Arizona (UA), Computer Architecture and Organization

Prof. Homer

Material Type: Notes; Professor: Homer; Class: Computer Organization; Subject: COMPUTER SCIENCE; University: University of Arizona; Term: Spring 2009;

Typology: Study notes

Pre 2010

Uploaded on 08/31/2009


CSc 252 — Computer Organization 8 — Pipelining


Pipelining

Read: Chapter 4, Sections 4.5 to 4.8 (4th edition); Chapter 6, Sections 6.1 to 6.5 (3rd edition)

• Laundry example: washing (30 minutes), drying (30 minutes), folding (30 minutes), “stashing” (30 minutes).

• With only one person’s wash, the whole job takes 2 hours.

• If several people need to do laundry, the sequential solution takes 2 hours per person.

• But the washer, dryer, “folder”, and “stasher” are independent units, so they can work on different loads at the same time.


Pipeline basics:

Pipelined laundry takes 3.5 hours for four loads.

Pipelining:
- Does not help the latency of a single task: it still takes 2 hours to do one person’s laundry.
- Does help the throughput of the entire workload: 3.5 hours vs. 8 hours.
- Multiple tasks operate simultaneously, each using different resources.
- Potential speedup = number of pipe stages.
- Rate is limited by the slowest pipeline stage.
- Unbalanced lengths of pipe stages reduce speedup.
- Time to “fill” the pipeline and time to “drain” it reduce speedup.
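The 8-hour and 3.5-hour figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes the loads move through the stages in lockstep, with each time slot as long as the slowest stage.

```python
def sequential_minutes(loads, stage_minutes):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(loads, stage_minutes):
    """Loads advance in lockstep, one stage per time slot.

    The slot length is set by the slowest stage. The first load needs
    len(stage_minutes) slots (pipeline fill); each later load adds one more.
    """
    slot = max(stage_minutes)
    return (len(stage_minutes) + loads - 1) * slot


stages = [30, 30, 30, 30]  # wash, dry, fold, stash (minutes)
seq = sequential_minutes(4, stages)
pipe = pipelined_minutes(4, stages)
print(f"sequential: {seq / 60} h, pipelined: {pipe / 60} h, speedup: {seq / pipe:.2f}x")
```

For four loads the speedup is about 2.3 rather than the potential 4, because the fill and drain time is a large share of such a short run.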


Pipeline basics (continued):

A more realistic picture: Not all cycles take the same amount of time:

Memory access is slower.

ALU computation is slower.

Register access is faster.

Figure 4.26, page 333 (4th edition) (There is a similar Figure 6.2, page 439 in the 3rd edition):

| Instruction class | Instruction fetch | Register read | ALU operation | Data access | Register write | Total time |
|---|---|---|---|---|---|---|
| Load word (lw) | 200 ps | 100 ps | 200 ps | 200 ps | 100 ps | 800 ps |
| Store word (sw) | 200 ps | 100 ps | 200 ps | 200 ps | | 700 ps |
| R-format (add, sub, and, or, slt) | 200 ps | 100 ps | 200 ps | | 100 ps | 600 ps |
| Branch (beq) | 200 ps | 100 ps | 200 ps | | | 500 ps |
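As a rough check of the table, the sketch below adds up the steps each instruction class uses and derives the two clock periods that follow from it: a single-cycle design must fit the slowest instruction, while a pipelined design only has to fit the slowest step. The dictionary and function names are invented for this example.

```python
# Step latencies in picoseconds, copied from the table above.
STEP_PS = {
    "Instruction fetch": 200,
    "Register read": 100,
    "ALU operation": 200,
    "Data access": 200,
    "Register write": 100,
}

# Steps used by each instruction class (blank table cells are left out).
STEPS_USED = {
    "lw": ["Instruction fetch", "Register read", "ALU operation",
           "Data access", "Register write"],
    "sw": ["Instruction fetch", "Register read", "ALU operation",
           "Data access"],
    "R-format": ["Instruction fetch", "Register read", "ALU operation",
                 "Register write"],
    "beq": ["Instruction fetch", "Register read", "ALU operation"],
}


def total_ps(instruction_class):
    """Sum the latencies of the steps this instruction class uses."""
    return sum(STEP_PS[step] for step in STEPS_USED[instruction_class])


# Single-cycle clock: long enough for the slowest instruction (lw).
single_cycle_clock_ps = max(total_ps(c) for c in STEPS_USED)
# Pipelined clock: long enough for the slowest individual step.
pipelined_clock_ps = max(STEP_PS.values())

for c in STEPS_USED:
    print(c, total_ps(c), "ps")
print("single-cycle clock:", single_cycle_clock_ps, "ps")
print("pipelined clock:", pipelined_clock_ps, "ps")
```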


Pipeline basics (continued):

Can improve performance by increasing the instruction throughput:

Without pipelining, 3 load word ops take 3 × 800 ps = 2,400 ps (2.4 ns).

With pipelining, the same 3 load word ops finish in 7 clock cycles × 200 ps = 1,400 ps (1.4 ns).

Clock cycle time depends on the slowest stage: 200 picoseconds in this case.

[Figure: three loads, lw $s1,100($t0), lw $s2,200($t0), and lw $s3,300($t0), each passing through Instruction fetch, Reg., ALU, Data access, Reg. First diagram: non-pipelined, a new instruction starts every 800 ps. Second diagram: pipelined, a new instruction starts every 200 ps.]
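The sketch below generalizes the three-load comparison to n instructions, using the 800 ps and 200 ps figures from the slides. It assumes an ideal five-stage pipeline with no stalls, in which every stage occupies a full 200 ps clock cycle; the function names are made up for this example.

```python
def nonpipelined_ps(n, instruction_ps=800):
    """Each instruction runs to completion before the next one starts."""
    return n * instruction_ps


def pipelined_ps(n, stages=5, clock_ps=200):
    """Ideal pipeline: fill takes `stages` cycles, then one result per cycle."""
    return (stages + n - 1) * clock_ps


for n in (3, 1_000_000):
    slow, fast = nonpipelined_ps(n), pipelined_ps(n)
    print(f"{n} loads: {slow} ps vs {fast} ps, speedup {slow / fast:.2f}x")
```

For long instruction streams the speedup approaches 800/200 = 4 rather than the stage count of 5, because the stages are unbalanced: the 100 ps register steps still occupy a full 200 ps cycle.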


Building a Pipelined datapath (continued):

Add pipeline registers between adjacent pipeline stages.

[Figure: datapath (PC, instruction memory, registers, sign extend, shift left 2, ALU, data memory, adders, and muxes) divided into five stages.]

IF: Instruction fetch

ID: Instruction decode/register file read

EX: Execute or address calculation

MEM: Memory access

WB: Write back


Building a Pipelined datapath (continued):

How big is each pipeline register? How many bits are in each? (We’ll need more bits before we are done…)

[Figure: five-stage pipelined datapath: IF, ID, EX, MEM, WB.]
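One way to approach the register-size question is to list the values that must cross each stage boundary and add up their widths. The field lists below are an assumption based on the standard five-stage MIPS datapath in the textbook, before any control wires or the write register number are added; they are not taken from these notes.

```python
# Assumed contents of each pipeline register (field name -> width in bits)
# for the basic textbook datapath, with no control wires yet.
PIPELINE_REGISTER_FIELDS = {
    "IF/ID": {"PC+4": 32, "instruction": 32},
    "ID/EX": {
        "PC+4": 32,
        "read data 1": 32,
        "read data 2": 32,
        "sign-extended immediate": 32,
    },
    "EX/MEM": {
        "branch target": 32,
        "ALU zero": 1,
        "ALU result": 32,
        "read data 2 (store data)": 32,
    },
    "MEM/WB": {"memory read data": 32, "ALU result": 32},
}


def width_bits(register_name):
    """Total width of one pipeline register under the assumptions above."""
    return sum(PIPELINE_REGISTER_FIELDS[register_name].values())


for name in PIPELINE_REGISTER_FIELDS:
    print(name, width_bits(name), "bits")
```

Later slides add more fields, so these totals are only a starting point.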


Building a Pipelined datapath (continued):

The write register number is stored in the ID/EX register on cycle 2, then in EX/MEM on cycle 3, then in MEM/WB on cycle 4. It is finally used on cycle 5, when the result is written back.

[Figure: pipelined datapath with the write register number carried through ID/EX, EX/MEM, and MEM/WB back to the register file.]

ID/EX, EX/MEM, and MEM/WB are now larger by how many bits?
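The cycle-by-cycle movement described above can be sketched as a tiny simulation. This is a simplified model written for illustration, not the actual hardware: it follows one instruction that enters IF on cycle 1 and tracks only its write register number, which takes 5 bits because MIPS has 32 registers.

```python
WRITE_REG_BITS = 5  # 32 registers need a 5-bit register number


def trace_write_register(write_reg):
    """Follow one instruction's write register number through the pipeline.

    Returns a list of (cycle, location, value) tuples.
    """
    if not 0 <= write_reg < 2 ** WRITE_REG_BITS:
        raise ValueError("register number must fit in 5 bits")
    regs = {"ID/EX": None, "EX/MEM": None, "MEM/WB": None}
    trace = []
    for cycle in range(1, 6):
        # WB stage: use whatever MEM/WB latched on the previous cycle.
        if regs["MEM/WB"] is not None:
            trace.append((cycle, "used by WB", regs["MEM/WB"]))
        # Clock edge: shift from the back so nothing is overwritten early.
        regs["MEM/WB"] = regs["EX/MEM"]
        regs["EX/MEM"] = regs["ID/EX"]
        # The instruction is decoded on cycle 2, which loads ID/EX.
        regs["ID/EX"] = write_reg if cycle == 2 else None
        for name, value in regs.items():
            if value is not None:
                trace.append((cycle, name, value))
    return trace


for step in trace_write_register(10):
    print(step)
```

Under this model each of the three pipeline registers needs room for one extra 5-bit field.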


Building a Pipelined datapath (continued):

• What makes pipelining easy?
- All instructions are the same length.
- Only a few instruction formats (R-type, I-type, etc.).
- Memory operands appear only in loads and stores.

• What makes it hard?
- Structural hazards: suppose we have only one memory.
- Data hazards: an instruction depends on a previous instruction.
- Control hazards: need to worry about branch instructions.

• We’ll build a simple pipeline and look at (some of) these issues.

• (Time permitting) We’ll talk about modern processors and what really makes it hard:
- Exception handling.
- Trying to improve performance with out-of-order execution, etc.
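To make "an instruction depends on a previous instruction" concrete, the sketch below scans a short program for read-after-write dependences and reports how many instructions apart the producer and consumer are. The four-instruction program and the `Instr` type are hypothetical, invented for this example. Whether a given dependence actually causes a stall depends on the pipeline design, which the code does not model.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None
    sources: Tuple[str, ...]  # registers read


def raw_dependences(program):
    """Return (producer, consumer, register, distance) for each source
    register whose most recent writer is an earlier instruction."""
    last_writer = {}
    found = []
    for j, instr in enumerate(program):
        for reg in instr.sources:
            if reg in last_writer:
                i = last_writer[reg]
                found.append((i, j, reg, j - i))
        if instr.dest is not None:
            last_writer[instr.dest] = j
    return found


# Hypothetical program, not taken from the notes.
program = [
    Instr("add $1, $2, $3", "$1", ("$2", "$3")),
    Instr("sub $4, $1, $5", "$4", ("$1", "$5")),
    Instr("or  $6, $1, $4", "$6", ("$1", "$4")),
    Instr("sw  $6, 0($7)", None, ("$6", "$7")),
]

for i, j, reg, distance in raw_dependences(program):
    print(f"'{program[j].text}' reads {reg} written by "
          f"'{program[i].text}' ({distance} instruction(s) earlier)")
```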


Representing Pipelines (continued):

Can help with answering questions such as:

How many clock cycles does it take to execute this code?

What is the ALU doing during clock cycle 4? What else is happening during clock cycle 4?

Can use this representation to help understand datapaths through the CPU.

[Figure: pipeline diagram labeled “time flows down” and “program execution order” for the instructions lw $10, 20($1); sub $11,$2,$; sw $12,28($4). Each instruction passes through IF, ID, EX, MEM, and WB.]
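The two questions above can also be answered mechanically. The sketch below assumes an ideal pipeline with no stalls, where instruction i (counting from 0) enters IF on clock cycle i + 1. It identifies the three instructions by mnemonic only, and the function names are made up for this example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(instr_index, cycle):
    """Stage occupied by instruction `instr_index` (0-based) during
    clock cycle `cycle` (1-based), or None if it is not in the pipeline."""
    k = cycle - 1 - instr_index
    return STAGES[k] if 0 <= k < len(STAGES) else None


def total_cycles(instruction_count):
    """Cycles until the last instruction leaves WB."""
    return len(STAGES) + instruction_count - 1


def snapshot(program, cycle):
    """Map each in-flight instruction to its stage during `cycle`."""
    stages = ((name, stage_at(i, cycle)) for i, name in enumerate(program))
    return {name: stage for name, stage in stages if stage is not None}


program = ["lw", "sub", "sw"]
print("total cycles:", total_cycles(len(program)))
print("cycle 4:", snapshot(program, 4))
```

With these assumptions the code takes 7 clock cycles, and during cycle 4 the ALU (EX stage) is working on the sub while the lw is in MEM and the sw is in ID.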


Pipeline Control:

Control wires are (more or less) the same ones we used before (as per the single-clock cycle implementation).

[Figure: single-cycle datapath with control. Instruction fields: [31-26], [25-21], [20-16], [15-11], [15-0], [5-0], [25-0]. Control wires: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite.]

For the write register number, the lw instruction uses bits 20-16; the arithmetic (add, sub, and, or, slt) instructions use bits 15-11.
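The choice between bits 20-16 and bits 15-11 is what the RegDst mux makes. The sketch below extracts bit fields from a 32-bit instruction word and applies that selection. The helper names are invented for this example, and the add instruction is a hypothetical one chosen for illustration; only the lw comes from these notes.

```python
def field(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def write_register(word, reg_dst):
    """Model of the RegDst mux: 1 selects bits 15-11, 0 selects bits 20-16."""
    return field(word, 15, 11) if reg_dst else field(word, 20, 16)


# lw $10, 20($1): opcode 0x23, rs = 1, rt = 10, immediate = 20
LW = (0x23 << 26) | (1 << 21) | (10 << 16) | 20
# add $3, $1, $2 (hypothetical): opcode 0, rs = 1, rt = 2, rd = 3, funct 0x20
ADD = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | 0x20

print("lw opcode:", field(LW, 31, 26))
print("lw writes register", write_register(LW, reg_dst=0))
print("add writes register", write_register(ADD, reg_dst=1))
```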


Pipeline Control (continued):

The second clock cycle has the Control unit. It gets the opcode from wires 31-26.

Control turns the RegDst control wire on or off.

Store the RegDst control wire in the ID/EX pipeline register for use during the 3rd clock cycle.

[Figure: pipelined datapath with the Control unit reading instruction bits [31-26]; the RegDst wire passes through ID/EX to the mux that chooses between bits [20-16] and [15-11].]

How many bits are now in the ID/EX pipeline register?


Pipeline Control (continued):

Other control wires for the 3rd clock cycle, both from the single clock cycle implementation:
- ALUSrc (1 wire)
- ALUOp (2 wires)

[Figure: pipelined datapath with the RegDst, ALUOp, and ALUSrc control wires and instruction bits [5-0] added.]

How many bits are now in the ID/EX pipeline register?


Pipeline Control (continued):

[Figure: pipelined datapath with the Branch, MemRead, and MemWrite control wires added.]

How many bits are in the EX/MEM pipeline register?


[Figure: pipelined datapath with all control wires: RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, MemtoReg, and RegWrite.]
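Because the control wires are generated in the second stage but used later, each pipeline register has to carry every control wire that a later stage still needs. The sketch below counts those bits. The slides place RegDst, ALUOp, and ALUSrc in the third clock cycle; grouping the remaining wires under MEM and WB is an assumption that follows the usual textbook arrangement.

```python
# Control wires grouped by the stage that consumes them (name -> bits).
# The EX group follows the slides; the MEM and WB groups are assumed.
CONTROL_BITS = {
    "EX": {"RegDst": 1, "ALUOp": 2, "ALUSrc": 1},
    "MEM": {"Branch": 1, "MemRead": 1, "MemWrite": 1},
    "WB": {"MemtoReg": 1, "RegWrite": 1},
}

# Stages that still lie ahead of each pipeline register.
STAGES_AHEAD = {
    "ID/EX": ["EX", "MEM", "WB"],
    "EX/MEM": ["MEM", "WB"],
    "MEM/WB": ["WB"],
}


def control_bits_carried(register_name):
    """Control bits a pipeline register must hold for the stages after it."""
    return sum(
        sum(CONTROL_BITS[stage].values())
        for stage in STAGES_AHEAD[register_name]
    )


for name in STAGES_AHEAD:
    print(name, "carries", control_bits_carried(name), "control bits")
```

These control bits come on top of the data fields and the 5-bit write register number that each pipeline register already holds.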
