Building a Pipelined Datapath in Computer Organization - Prof. Homer, Study notes of Computer Architecture and Organization

The process of building a pipelined datapath in computer organization, including the need to re-arrange items, add pipeline registers, and handle data hazards through register forwarding and stalls.

Typology: Study notes

Pre 2010

Uploaded on 09/17/2009


CSc 252 - Computer Organization 8 - Pipelining

Pipelining

Read: Chapter 4, Sections 4.5 to 4.8 (4th edition); Chapter 6, Sections 6.1 to 6.5 (3rd edition)

Laundry example: washing (30 minutes), drying (30 minutes), folding (30 minutes), “stashing” (30 minutes).

If there is only one person’s wash, it takes 2 hours to complete.

If several people need to do laundry, the sequential solution takes 2 hours each, so four loads take 8 hours.

But the washer, dryer, “folder”, and “stasher” are independent units, so one load can be in the dryer while the next is in the washer.

Pipeline basics:

  • Pipelined laundry takes 3.5 hours for four loads: the first load finishes after 2 hours, and another load finishes every 30 minutes after that.
  • Pipelining:
    - Does not help the latency of a single task: it still takes 2 hours to do one person’s laundry.
    - Does help the throughput of the entire workload: 3.5 hours vs. 8 hours.
    - Multiple tasks operate simultaneously, each using different resources.
    - Potential speedup = number of pipe stages.
    - Rate is limited by the slowest pipeline stage.
    - Unbalanced lengths of pipe stages reduce speedup.
    - Time to “fill” the pipeline and time to “drain” it reduce speedup.
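The laundry arithmetic can be checked with a short sketch. It assumes all stages advance in lockstep at the pace of the slowest stage; the function names are illustrative and do not come from the notes.

```python
def sequential_minutes(loads, stage_minutes):
    """Each load runs start to finish before the next one begins."""
    return loads * sum(stage_minutes)


def pipelined_minutes(loads, stage_minutes):
    """All stages advance in lockstep, paced by the slowest stage."""
    step = max(stage_minutes)
    steps = len(stage_minutes) + loads - 1   # fill the pipe, then one load per step
    return steps * step


laundry = [30, 30, 30, 30]                   # wash, dry, fold, stash

print(sequential_minutes(4, laundry) / 60)   # 8.0 hours
print(pipelined_minutes(4, laundry) / 60)    # 3.5 hours

# With many loads the fill/drain time matters less and the speedup
# approaches the number of stages (4 here).
print(sequential_minutes(1000, laundry) / pipelined_minutes(1000, laundry))
```

Changing one entry of `laundry` to 60 shows the effect of an unbalanced stage: every step then takes 60 minutes, and the speedup drops.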


Pipeline basics (continued):

Consider the load word instruction:

lw $s0, 0($t0)

IFetch: Instruction Fetch; get the instruction from memory.

Reg/Dec: Fetch values from Registers and Decode the instruction.

Exec: Execute; calculate the memory address from which to load the word.

Mem: Read the word from Memory.

Write: Write the word to the Register.

Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
IFetch    Reg/Dec   Exec      Mem       Write

Pipeline basics (continued):

A more realistic picture: not all steps take the same amount of time:

Memory access is slower (200 ps).

ALU computation is slower (200 ps).

Register access is faster (100 ps).

Figure 4.26, page 333 (4th edition). (There is a similar Figure 6.2, page 439, in the 3rd edition.)

Instruction class                  | Instruction fetch | Register read | ALU operation | Data access | Register write | Total time
-----------------------------------|-------------------|---------------|---------------|-------------|----------------|-----------
Load word (lw)                     | 200 ps            | 100 ps        | 200 ps        | 200 ps      | 100 ps         | 800 ps
Store word (sw)                    | 200 ps            | 100 ps        | 200 ps        | 200 ps      |                | 700 ps
R-format (add, sub, and, or, slt)  | 200 ps            | 100 ps        | 200 ps        |             | 100 ps         | 600 ps
Branch (beq)                       | 200 ps            | 100 ps        | 200 ps        |             |                | 500 ps
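To see what these latencies imply for the clock, here is a minimal sketch. It assumes a single-cycle design must fit the slowest instruction into one clock, and a pipelined design must fit the slowest stage into one clock. The variable and function names are illustrative.

```python
# Stage latencies in picoseconds, copied from the table (0 = stage not used).
STAGE_PS = {
    "lw":       [200, 100, 200, 200, 100],
    "sw":       [200, 100, 200, 200, 0],
    "R-format": [200, 100, 200, 0, 100],
    "beq":      [200, 100, 200, 0, 0],
}
NUM_STAGES = 5

# Single-cycle: the clock must be long enough for the slowest instruction.
single_cycle_clock = max(sum(stages) for stages in STAGE_PS.values())

# Pipelined: the clock must be long enough for the slowest stage.
pipelined_clock = max(max(stages) for stages in STAGE_PS.values())


def single_cycle_ps(instructions):
    return instructions * single_cycle_clock


def pipelined_ps(instructions):
    # Fill the pipeline, then finish one instruction per clock (no stalls assumed).
    return (NUM_STAGES + instructions - 1) * pipelined_clock


print(single_cycle_clock, pipelined_clock)     # 800 200
print(single_cycle_ps(3), pipelined_ps(3))     # 2400 1400
```

A single lw takes 5 x 200 = 1000 ps in the pipeline, which is longer than the 800 ps single-cycle time: pipelining improves throughput, not latency.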


Building a Pipelined datapath (continued):

Add pipeline registers in-between each pipeline stage.

[Figure: datapath divided into five stages, IF (instruction fetch), ID (instruction decode/register file read), EX (execute or address calculation), MEM (memory access), and WB (write back), with a pipeline register between each pair of adjacent stages.]

Building a Pipelined datapath (continued):

How big is each pipeline register? How many bits are in each? (We’ll need more bits before we are done…)

[Figure: the same five-stage pipelined datapath (IF, ID, EX, MEM, WB) with its pipeline registers.]
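One way to approach the size question is to list the values that cross each stage boundary and add up their widths. The sketch below assumes the basic textbook datapath before any control wires or register numbers are added. The field lists are an assumption about that figure, since the figure itself is not reproduced here.

```python
WORD = 32  # bits in a MIPS word

# Assumed contents of each pipeline register in the basic datapath.
PIPELINE_REGISTER_FIELDS = {
    "IF/ID":  {"PC + 4": WORD, "instruction": WORD},
    "ID/EX":  {"PC + 4": WORD, "read data 1": WORD, "read data 2": WORD,
               "sign-extended immediate": WORD},
    "EX/MEM": {"branch target": WORD, "ALU Zero": 1, "ALU result": WORD,
               "read data 2": WORD},
    "MEM/WB": {"memory read data": WORD, "ALU result": WORD},
}

# Width of each register = sum of the widths of the fields it must hold.
widths = {name: sum(fields.values())
          for name, fields in PIPELINE_REGISTER_FIELDS.items()}

for name, bits in widths.items():
    print(f"{name}: {bits} bits")
```

Under these assumptions the registers hold 64, 128, 97, and 64 bits. Anything added later (register numbers, control wires) increases these totals.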


Building a Pipelined datapath (continued):

Problem:

[Figure: ID stage (instruction decode/register file read): the register file with its Read register 1, Read register 2, Write register, and Write data inputs, and the 16-to-32-bit sign extend unit.]

We know the number of the register to write on the 2nd clock cycle. But we do not have the data to write until the 5th clock cycle. We need to “remember” the number of the register until the 5th clock cycle…

Building a Pipelined datapath (continued):

The write register value is stored in the ID/EX register on cycle 2, then in EX/MEM on cycle 3, then in MEM/WB on cycle 4. The value is finally used on cycle 5.

[Figure: pipelined datapath in which the write register number is carried through the ID/EX, EX/MEM, and MEM/WB pipeline registers back to the Write register input of the register file.]

ID/EX, EX/MEM, and MEM/WB are now larger by how many bits?
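A small simulation can show why the register number has to travel with its instruction. It is only a sketch: it models each stage as a slot holding the destination register number of the instruction currently there, and the function name is made up for illustration. Since MIPS has 32 registers, a register number is 5 bits, so carrying it adds 5 bits to each of the three pipeline registers.

```python
REGISTER_NUMBER_BITS = 5   # 32 registers -> 5-bit register numbers


def writeback_schedule(dest_regs):
    """Issue one instruction per cycle; return (cycle, register written) pairs.

    dest_regs lists the destination register number of each instruction.
    """
    stages = [None] * 5              # slots for IF, ID, EX, MEM, WB
    pending = list(dest_regs)
    schedule = []
    cycle = 0
    while pending or any(r is not None for r in stages[:4]):
        cycle += 1
        incoming = pending.pop(0) if pending else None
        stages = [incoming] + stages[:4]      # every instruction moves one stage
        if stages[4] is not None:             # instruction now in WB writes its register
            schedule.append((cycle, stages[4]))
    return schedule


# Two back-to-back instructions writing registers 16 and 11.
print(writeback_schedule([16, 11]))   # [(5, 16), (6, 11)]
```

In cycle 5 the ID stage already holds a later instruction, so without the carried copy the register file would receive the wrong register number.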


Representing Pipelines (continued):

Can help with answering questions such as:

How many clock cycles does it take to execute this code?

What is the ALU doing during clock cycle 4? What else is happening during clock cycle 4?

Can use this representation to help understand datapaths through the CPU.

Program execution order runs top to bottom; each column is one clock cycle.

                  CC 1    CC 2    CC 3    CC 4    CC 5    CC 6    CC 7
lw $10, 20($1)    IF      ID      EX      MEM     WB
sub $11,$2,$              IF      ID      EX      MEM     WB
sw $12,28($4)                     IF      ID      EX      MEM     WB
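The questions above can be answered mechanically from this representation. The sketch below assumes an ideal five-stage pipeline with no stalls, one instruction issued per cycle. The helper names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(num_instructions):
    """Cycles to run the code: fill the pipeline, then one instruction per cycle."""
    return num_instructions + len(STAGES) - 1


def stage_in_cycle(index, cycle):
    """Stage occupied in `cycle` (1-based) by the instruction at `index` (0-based)."""
    position = cycle - 1 - index
    return STAGES[position] if 0 <= position < len(STAGES) else None


program = ["lw", "sub", "sw"]

print(total_cycles(len(program)))          # 7
for index, name in enumerate(program):
    print(name, stage_in_cycle(index, 4))  # lw MEM / sub EX / sw ID
```

So during clock cycle 4 the ALU is working on the second instruction (sub), while lw accesses memory and sw reads its registers.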

Pipeline Control:

Control wires are (more or less) the same ones we used before (as per the single-clock cycle implementation).

[Figure: single-cycle datapath with the Control unit and its control wires: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite.]

The lw instruction uses bits 20-16.

The arithmetic (add, sub, and, or, slt) instructions use bits 15-11.


Pipeline Control (continued):

A specific example: the number of the register to which the result is written.

The lw instruction uses bits 20-16.

The arithmetic (add, sub, and, or, slt) instructions use bits 15-11.

We do not know which set of bits to use until the end of the second clock cycle.

Therefore, the control wire cannot be turned on or off until the third clock cycle.

The multiplexor has to be in the third clock cycle portion of the CPU.

[Figure: pipelined datapath with the write-register multiplexor, which chooses between instruction bits [20-16] and [15-11], placed in the third-cycle (EX) portion of the CPU.]
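The choice made by this multiplexor can be sketched in a few lines. The two instruction words below are hand-assembled examples (lw $s0, 0($t0) and add $t0, $t1, $t2), and the helper names are illustrative. RegDst = 1 selects bits 15-11 and RegDst = 0 selects bits 20-16, as in the single-cycle design.

```python
def field(instruction, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (instruction >> lo) & ((1 << (hi - lo + 1)) - 1)


def write_register(instruction, reg_dst):
    """Model the multiplexor that picks the number of the register to write."""
    rt = field(instruction, 20, 16)   # destination for lw
    rd = field(instruction, 15, 11)   # destination for arithmetic instructions
    return rd if reg_dst else rt


LW_S0_T0 = 0x8D100000       # lw  $s0, 0($t0)    -> rt = 16 ($s0)
ADD_T0_T1_T2 = 0x012A4020   # add $t0, $t1, $t2  -> rd = 8  ($t0)

print(field(LW_S0_T0, 31, 26))            # 35, the lw opcode
print(write_register(LW_S0_T0, 0))        # 16
print(write_register(ADD_T0_T1_T2, 1))    # 8
```

Both candidate fields are available as soon as the instruction is fetched, but the value of RegDst depends on decoding the opcode, which is why the selection happens one stage later.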

Pipeline Control (continued):

The second clock cycle has the Control unit. It gets the opcode from wires 31-26.

Control turns the RegDst control wire on or off.

Store the RegDst control wire in the ID/EX pipeline register for use during the 3rd clock cycle.

[Figure: pipelined datapath in which the Control unit reads instruction bits [31-26] and the RegDst control wire is stored in ID/EX, where it drives the multiplexor that chooses between bits [20-16] and [15-11].]

How many bits are now in the ID/EX pipeline register?


Pipeline Control (continued):

[Figure: pipelined datapath with the control wires RegDst, ALUOp, and ALUSrc (plus instruction bits [5-0]) and Branch, MemRead, and MemWrite added.]

How many bits are in the EX/MEM pipeline register?

Pipeline Control (continued):

5th clock cycle control wires: MemtoReg, RegWrite.

[Figure: pipelined datapath with MemtoReg and RegWrite added to the control wires RegDst, ALUOp, ALUSrc, Branch, MemRead, and MemWrite.]


Pipeline Control (continued):

[Figure: pipelined datapath with all control wires: RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, MemtoReg, and RegWrite.]

How many bits?

Pipeline Control (continued):

  • What are the settings needed for each control line on each stage of the pipeline?
    • Only need to look at stages 3, 4, and 5.

Why not 1 and 2?

X = “does not care”

Taken from the single-clock cycle, but re-arranged to fit the stages for pipelining.

See figure 4.49, page 360 (4th edition); figure 6.28, page 469 (3rd edition).

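            | Execution/Address Calculation (3rd stage) | Memory Access (4th stage)   | Write-back (5th stage)
Instruction | RegDst | ALUOp1 | ALUOp0 | ALUSrc         | Branch | MemRead | MemWrite | RegWrite | MemtoReg
------------|--------|--------|--------|----------------|--------|---------|----------|----------|---------
R-format    | 1      | 1      | 0      | 0              | 0      | 0       | 0        | 1        | 0
lw          | 0      | 0      | 0      | 1              | 0      | 1       | 0        | 1        | 1
sw          | X      | 0      | 0      | 1              | 0      | 0       | 1        | 0        | X
beq         | X      | 0      | 1      | 0              | 1      | 0       | 0        | 0        | X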
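The table can be written as data, which also shows how many control bits each pipeline register has to carry. This is a sketch: the dictionary layout and names are illustrative. It assumes each group of control wires is carried only until the stage that uses it.

```python
# Control settings per instruction, grouped by the stage that uses them.
# EX: RegDst ALUOp1 ALUOp0 ALUSrc | MEM: Branch MemRead MemWrite | WB: RegWrite MemtoReg
CONTROL = {
    "R-format": {"EX": "1100", "MEM": "000", "WB": "10"},
    "lw":       {"EX": "0001", "MEM": "010", "WB": "11"},
    "sw":       {"EX": "X001", "MEM": "001", "WB": "0X"},
    "beq":      {"EX": "X010", "MEM": "100", "WB": "0X"},
}

# Groups still needed after each pipeline register.
CARRIED = {
    "ID/EX":  ["EX", "MEM", "WB"],
    "EX/MEM": ["MEM", "WB"],
    "MEM/WB": ["WB"],
}


def control_bits_carried(instruction):
    """Number of control bits each pipeline register holds for this instruction."""
    groups = CONTROL[instruction]
    return {reg: sum(len(groups[g]) for g in needed)
            for reg, needed in CARRIED.items()}


print(control_bits_carried("lw"))   # {'ID/EX': 9, 'EX/MEM': 5, 'MEM/WB': 2}
```

The counts are the same for every instruction because every instruction carries all nine wires, even those marked X.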


Pipeline Control (continued):

[Figure: pipelined datapath with all control wires: RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, MemtoReg, and RegWrite.]

How many bits?

Data Hazards and Forwarding (continued):

The value we need is present in a pipeline register: use it instead of waiting.

Register file forwarding to handle read/write to same register:

                  CC 1    CC 2    CC 3    CC 4    CC 5    CC 6    CC 7    CC 8    CC 9
Reg $2            10      10      10      10      10/-20  -20     -20     -20     -
EX/MEM            X       X       X       -20     X       X       X       X       X
MEM/WB            X       X       X       X       -20     X       X       X       X

sub $2, $1, $     IF      ID      EX      MEM     WB
and $12,$2,$              IF      ID      EX      MEM     WB
or $13,$5,$                       IF      ID      EX      MEM     WB
add $14,$2,$                              IF      ID      EX      MEM     WB
sw $15,100($2)                                    IF      ID      EX      MEM     WB

There are three cases to consider:

- Value needed is in the register
- Value needed is in MEM/WB
- Value needed is in EX/MEM
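Which of the three cases applies depends on how far the reading instruction is behind the instruction that produces the value. The sketch below assumes the producer is an ALU instruction such as sub, and that the register file is written in the first half of a clock cycle and read in the second half. The function name is illustrative.

```python
def value_source(distance):
    """Where a reader `distance` instructions after the producer finds the new value.

    distance 1: the reader is in EX while the result sits in EX/MEM.
    distance 2: the reader is in EX while the result sits in MEM/WB.
    distance 3 or more: the result is already in the register file when the
    reader is in ID (written in the first half of the cycle, read in the second).
    """
    if distance < 1:
        raise ValueError("the reader must come after the producer")
    if distance == 1:
        return "EX/MEM"
    if distance == 2:
        return "MEM/WB"
    return "register file"


# Instructions following sub in the example, by their distance from it.
for name, distance in [("and", 1), ("or", 2), ("add", 3), ("sw", 4)]:
    print(name, "->", value_source(distance))
```

This matches the table: -20 appears in EX/MEM in CC 4, in MEM/WB in CC 5, and in register $2 from the middle of CC 5 onward.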


Data Hazards and Forwarding (continued):

[Figure: pipelined datapath with a Forwarding unit added.]

Inputs to the Forwarding unit:

- IF/ID Register Rs and IF/ID Register Rt: tell the 2 registers needed for the ALU on this clock cycle.
- Two more inputs tell the register computed on the previous clock cycle and the register computed two clock cycles ago.

Data Hazards and Forwarding (continued):

[Figure: pipelined datapath with the Forwarding unit controlling a multiplexor on each of the two ALU inputs.]

Each of the two multiplexors has 3 inputs:

- Value from Registers
- Value from EX/MEM
- Value from MEM/WB

2 control wires carry the result from the Forwarding unit.
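The decision made by the Forwarding unit for one ALU operand can be sketched as below. The conditions follow the usual textbook formulation, and the 2-bit encodings (00, 10, 01) are an assumption borrowed from that convention. The parameter names are illustrative.

```python
def forward_select(src_reg, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Return the 2-bit multiplexor select for one ALU operand.

    src_reg is the register the ALU needs this cycle (Rs or Rt).
    """
    # Result computed on the previous clock cycle: newest value, checked first.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "10"   # value from EX/MEM
    # Result computed two clock cycles ago.
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "01"   # value from MEM/WB
    return "00"       # value from Registers


# 'and' needs $2 while 'sub' (which writes $2) is one instruction ahead.
print(forward_select(2, True, 2, False, 0))   # 10
# Reader needs $2 and the producer is two instructions ahead.
print(forward_select(2, True, 12, True, 2))   # 01
# No earlier instruction in flight writes the needed register.
print(forward_select(5, True, 12, True, 2))   # 00
```

The check against register 0 is there because $0 always reads as zero, so a result nominally written to $0 must never be forwarded.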


Data Hazards — Stalls (continued):

Stall the pipeline by keeping an instruction in the same stage.

lw $2, 20($1)

and $4, $2, $

or $5, $4, $

[Figure: pipelined datapath with the Forwarding unit and a Hazard detection unit.]

Inputs to the Hazard detection unit:

- MemRead, to determine if memory will be read.
- Instruction bits [25-16]: registers needed by the current instruction.
- Rt from ID/EX: register to be read from memory.
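The test performed by the Hazard detection unit can be sketched from these three inputs. This follows the usual textbook load-use condition. The function and parameter names are illustrative, and the second source register in the example is a made-up value because the listing above is truncated.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs a register that the instruction
    in EX is about to load from memory."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2, 20($1) is in EX (it will load $2); the following 'and' reads $2.
# The 'and' instruction's second source is unknown here; 5 is a placeholder.
print(must_stall(True, 2, 2, 5))    # True
# Same registers, but the instruction in EX does not read memory.
print(must_stall(False, 2, 2, 5))   # False
# The instruction in EX loads $2, but the instruction in ID reads $4 and $5.
print(must_stall(True, 2, 4, 5))    # False
```

Forwarding cannot cover the first case because the loaded word is not available until the end of the MEM stage, one cycle too late for the next instruction's EX stage.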


Data Hazards — Stalls (continued):

[Figure: pipelined datapath with the Hazard detection unit and a multiplexor (with a 0 input) on the control wires.]

No stall necessary! The Hazard detection unit chooses:

- Control wire values to be passed to later cycles.
- Change PC for next instruction.
- Write instruction in IF/ID.


Data Hazards — Stalls (continued):

[Figure: pipelined datapath with the Hazard detection unit and a multiplexor (with a 0 input) on the control wires.]

Stall is needed! The Hazard detection unit chooses:

- All control wires turned off for stall.
- Leave previous instruction in IF/ID.
- PC stays the same to re-read instruction.
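The two sets of choices (stall and no stall) can be put side by side in a short sketch. The output names PCWrite and IF/IDWrite follow the common textbook naming and are assumptions here, as are the function name and the dictionary layout.

```python
def stall_controls(stall, decoded_control):
    """Outputs of the hazard detection logic for one clock cycle.

    decoded_control maps control wire names to the values produced by the
    Control unit for the instruction currently in ID.
    """
    if stall:
        # Turn every control wire off so the slot acts as a no-op (a "bubble").
        passed_on = {name: 0 for name in decoded_control}
    else:
        passed_on = dict(decoded_control)
    return {
        "PCWrite": not stall,        # False: PC stays the same, instruction is re-read
        "IF/IDWrite": not stall,     # False: previous instruction stays in IF/ID
        "control": passed_on,        # values passed to later cycles
    }


and_control = {"RegWrite": 1, "MemRead": 0, "MemWrite": 0}

print(stall_controls(True, and_control))
print(stall_controls(False, and_control))
```

After one stalled cycle the load has reached MEM/WB, so the repeated instruction can pick up the loaded value through forwarding and the pipeline continues normally.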