Pipelining in CS433: Implementation and Hazards - Prof. Josep Torrellas, Study notes of Computer Architecture and Organization

An overview of pipelining in computer systems, focusing on its implementation and the performance issues that arise from data and control hazards. The notes cover the concept of pipeline stages, the impact of pipelining on instruction execution time and throughput, and the types of hazards that can occur. They also discuss methods to handle these hazards, such as pipeline interlocks and compiler scheduling.

Typology: Study notes

Pre 2010

Uploaded on 03/16/2009

koofers-user-xdk-1

Copyright Josep Torrellas 1999, 2001, 2002 1
Appendix A
Instructor: Josep Torrellas
CS433

Pipelining

Multiple instructions are overlapped in execution

Each is in a different stage

Each stage is called a “pipe stage” or “pipe segment”

Throughput: # inst completed/cycle

Each step takes a machine cycle

Want to balance the work in each stage

Ideally: Time per instruction = (Time per instruction on the non-pipelined machine) / (# pipe stages)
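The ideal relation above can be checked with a few lines of Python. This is only a sketch under the slide's assumption of perfectly balanced stages and no pipelining overhead; the function name and the 50 ns / 5-stage figures are made up for illustration.

```python
def ideal_time_per_instruction(unpipelined_time_ns: float, num_stages: int) -> float:
    """Ideal pipelined time per instruction.

    Assumes the work is split evenly across the stages and that
    pipelining itself adds no overhead to the clock.
    """
    if num_stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    return unpipelined_time_ns / num_stages


# Hypothetical numbers: a 50 ns unpipelined instruction on a 5-stage pipeline.
time_ns = ideal_time_per_instruction(50.0, 5)
print(f"ideal time per instruction: {time_ns} ns")
```

In practice the stages are never perfectly balanced and the pipeline registers add delay, so the real figure is higher than this bound.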

Implementation of RISC Instructions

1. Instruction Fetch cycle (IF)

IR ← Mem[PC] ; IR holds the instruction

NPC ← PC + 4

2. Instruction decode/register fetch cycle (ID)

A ← Regs[rs]

B ← Regs[rt]

Imm ← sign-extend imm field of IR

; decode the instruction; in the meantime, load Regs A, B, Imm

; ok if some of this is not needed

3. Execution/Effective address cycle (EX)

Memory ref: ALU output ← A + Imm

Reg-Reg (ALU op): ALU output ← A op B

Reg-Immed (ALU op): ALU output ← A op Imm

Branch: ALU output ← NPC + (Imm << 2) ; address of target

cond ← (A op 0) ; op = equal, not equal

/* note: no instructions need to do 2 of these operations */

/* note: Imm has word count for branches; need to shift by 2 to get bytes to add to PC */
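The four EX cases can be written as one small function. This is a minimal sketch, not the course's reference model: the `kind` labels, the operation names, and the function signature are assumptions chosen for illustration, and register values are plain Python integers rather than fixed-width words.

```python
import operator

# Hypothetical operation names; the notes only say "op".
ALU_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
}
BRANCH_TESTS = {"eq": operator.eq, "ne": operator.ne}


def ex_stage(kind, op, a, b, imm, npc):
    """Return (alu_output, cond) for one instruction in the EX cycle.

    cond is None for everything except branches.
    """
    if kind == "mem":        # effective address of a load or store
        return a + imm, None
    if kind == "reg-reg":    # ALU output <- A op B
        return ALU_OPS[op](a, b), None
    if kind == "reg-imm":    # ALU output <- A op Imm
        return ALU_OPS[op](a, imm), None
    if kind == "branch":
        # Imm counts words, so shift left by 2 to get a byte offset.
        target = npc + (imm << 2)
        cond = BRANCH_TESTS[op](a, 0)
        return target, cond
    raise ValueError(f"unknown instruction kind: {kind}")


# A branch-if-equal with A == 0, NPC == 104 and a 3-word offset.
print(ex_stage("branch", "eq", 0, 0, 3, 104))
```

Each instruction takes exactly one of the branches, which mirrors the note that no instruction needs two of these operations.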

5. Write-back cycle (WB)

Reg-Reg ALU instr: Regs[rd] ← ALU output

Reg-Imm ALU instr: Regs[rt] ← ALU output

Load instruction: Regs[rt] ← LMD

Now we will try to pipeline it.

We need: at the end of each cycle, the data is stored in some registers (PC, LMD, Imm, A, B, …). This allows other instructions to execute too.

Branches: 4 cycles

Rest of instructions: 5 cycles
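To see what overlapping looks like, the sketch below builds the usual stage-by-cycle table for an ideal pipeline with no stalls. It assumes five stages named IF, ID, EX, MEM, WB (the MEM name is taken from the EX/MEM and MEM/WB pipeline registers mentioned in these notes) and one new instruction entering per cycle; the function name is made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction; row[c] is the stage that
    instruction occupies in clock cycle c (0-based), or "" if it is
    not in the pipeline during that cycle."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage_index = cycle - i  # instruction i enters IF in cycle i
            in_pipe = 0 <= stage_index < len(STAGES)
            row.append(STAGES[stage_index] if in_pipe else "")
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(3)):
    print(f"inst {i + 1}: " + " ".join(f"{s:>3}" for s in row))
```

Reading the table by column shows that each cycle has every stage working on a different instruction, which is why the values passed between stages must live in registers that are rewritten every cycle.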

Why does it work?

Use separate I and D caches

Register file can be read/written in 0.5 cycles

PC: incremented in IF

if branch taken, in EX, add PC + (Imm << 2)

Cannot keep any state in IR: need to move it to another register every cycle (see picture)

These registers (IF/ID, ID/EX, EX/MEM, MEM/WB) subsume the temp ones

e.g. Destination Reg in a LD

Control of the pipeline: set the control of the 4 MUXES

(Figure A.18)

ALU stage MUXES: set depending on instruction type, which is set by ID/EX.IR

top one: branch or not

bottom one: reg-reg ALU or other

MUX in IF: chooses between PC+4 and EX/MEM.ALUOutput; controlled by EX/MEM.cond

MUX in WB: controlled by whether inst. is a LD or an ALU op
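The four MUX settings can be summarized as a small decision function. This is a sketch under assumptions: the notes only say what controls each MUX, so the inputs chosen for the two ALU MUXES (NPC vs. A on top, B vs. Imm on the bottom) are inferred from the EX-cycle equations, and the `kind` labels and dictionary keys are invented for illustration. It also assumes `ex_mem_cond` is false for any instruction that is not a taken branch.

```python
def mux_selects(id_ex_kind, ex_mem_cond, mem_wb_kind):
    """Return the input chosen by each of the four MUXES.

    id_ex_kind  : type of the instruction currently in EX
    ex_mem_cond : True only for a taken branch now leaving EX
    mem_wb_kind : type of the instruction currently in WB
    """
    return {
        # top ALU MUX: branches add to NPC, everything else uses A
        "alu_top": "NPC" if id_ex_kind == "branch" else "A",
        # bottom ALU MUX: reg-reg ops use B, everything else uses Imm
        "alu_bottom": "B" if id_ex_kind == "reg-reg" else "Imm",
        # IF MUX: next PC is the branch target only if the branch is taken
        "if_pc": "EX/MEM.ALUOutput" if ex_mem_cond else "PC+4",
        # WB MUX: loads write back LMD, ALU ops write back the ALU output
        "wb": "LMD" if mem_wb_kind == "load" else "ALUOutput",
    }


print(mux_selects("branch", True, "load"))
```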

Example

Unpipelined: 10 ns cycle time

4 cycles for ALU (40%) and branch (20%); 5 cycles for mem (40%)

Pipelining: adds 1 ns to clock

Speedup in execution rate?

Unpipelined: avg inst time = clock * avg CPI = 10 * ((40% + 20%) * 4 + 40% * 5) = 44 ns

Pipelined = 11 ns

Speedup = 44/11 = 4
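The arithmetic in this example can be reproduced directly. The sketch below uses only the numbers given on the slide; the function and variable names are illustrative, and the pipelined machine is assumed to complete one instruction per 11 ns clock, as the slide does.

```python
def average_instruction_time(cycle_time_ns, mix):
    """Average time per instruction = clock * average CPI.

    mix is a list of (fraction_of_instructions, cycles) pairs.
    """
    avg_cpi = sum(fraction * cycles for fraction, cycles in mix)
    return cycle_time_ns * avg_cpi


# Instruction mix from the slide: ALU 40%, branch 20%, memory 40%.
mix = [(0.40, 4), (0.20, 4), (0.40, 5)]

unpipelined_ns = average_instruction_time(10, mix)  # 10 ns clock
pipelined_ns = 10 + 1                               # pipelining adds 1 ns, CPI of 1
speedup = unpipelined_ns / pipelined_ns

print(f"unpipelined: {unpipelined_ns:.1f} ns")
print(f"pipelined:   {pipelined_ns} ns")
print(f"speedup:     {speedup:.2f}")
```

Note that the speedup is 4 rather than 5 even though there are five stages, because the unpipelined average is only 4.4 cycles and the pipelined clock is 10% slower.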

Pipeline Hazards

Situations that prevent the next instruction from executing in its designated clock cycle

Structural: resource conflicts, e.g. not enough multipliers

Data: instruction depends on the result of a previous one, e.g.

ADD R1, R2, R

ADD R4, R5, R

Control: results from instructions that change the PC, e.g.

BEQ R1, label

ADD R7, R6, R

As a result, the pipeline may have to stall

Structural Hazards

Some combination of instructions cannot be accommodated because of resource conflicts

Usually because some functional unit is not pipelined: two instructions using it cannot proceed back to back

Some resource has not been replicated enough

e.g. 1 register file port

Combined I, D memory

Result: pipeline stall, as if we had inserted a bubble.
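The combined I/D memory case can be simulated in a few lines. This is a simplified sketch, assuming a five-stage pipeline in which the data access happens three cycles after the fetch, a memory with a single port, and data accesses taking priority over instruction fetches; the function name and the input encoding (one boolean per instruction) are made up for illustration.

```python
MEM_OFFSET = 3  # assumed: MEM is the 4th stage, 3 cycles after IF


def fetch_cycles_single_memory_port(uses_data_memory):
    """Return the clock cycle (1-based) in which each instruction is fetched.

    uses_data_memory[i] is True if instruction i is a load or store.
    A fetch that would collide with an earlier instruction's data
    access is delayed, which is the bubble described in the notes.
    """
    data_access_cycles = set()
    fetch_cycles = []
    cycle = 0
    for is_mem in uses_data_memory:
        cycle += 1
        while cycle in data_access_cycles:
            cycle += 1  # memory port is busy: stall the fetch one cycle
        fetch_cycles.append(cycle)
        if is_mem:
            data_access_cycles.add(cycle + MEM_OFFSET)
    return fetch_cycles


# One load followed by four non-memory instructions.
print(fetch_cycles_single_memory_port([True, False, False, False, False]))
```

In this run the fourth instruction is fetched one cycle late, because its fetch would otherwise share the memory port with the load's data access. Separate I and D caches remove the conflict.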

Data Hazards

Occurs because pipelining changes the order of read/write accesses to operands

1 ADD R1, R2, R

2 SUB R4, R5, R

3 AND R6, R1, R

4 OR R8, R1, R

5 XOR R10, R1, R

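Left to their own devices, instructions 2, 3, and 4 read R1 before instruction 1 has written it, and so produce wrong results.

To fix the problem:

4: Split register access (W, R): write in the first half of the cycle, read in the second half

2, 3: Forwarding (bypassing or short-circuiting).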
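The rule behind these two fixes depends only on how far the consumer is from the producer. The sketch below classifies read-after-write dependences by that distance, assuming a five-stage pipeline that reads registers in ID and writes them in WB. The last operand of each instruction is truncated in these notes, so the register numbers in the example program are placeholders chosen so that instructions 2 through 5 all read R1; the function name and tuple layout are likewise illustrative.

```python
def classify_raw_hazards(instructions):
    """instructions: list of (dest_reg, [source_regs]) in program order.

    Returns (consumer, producer, reg, fix) tuples using 1-based
    instruction numbers, for every dependence that needs a fix.
    """
    hazards = []
    last_writer = {}  # register name -> index of the latest instruction writing it
    for i, (dest, sources) in enumerate(instructions):
        for reg in sources:
            if reg not in last_writer:
                continue
            distance = i - last_writer[reg]
            if distance <= 2:
                fix = "forwarding"
            elif distance == 3:
                # the write (WB) and the read (ID) fall in the same cycle
                fix = "split register access"
            else:
                continue  # the register is written before it is read
            hazards.append((i + 1, last_writer[reg] + 1, reg, fix))
        last_writer[dest] = i
    return hazards


# Placeholder operands: only the use of R1 matters here.
program = [
    ("R1", ["R2", "R3"]),    # 1 ADD
    ("R4", ["R1", "R5"]),    # 2 SUB
    ("R6", ["R1", "R7"]),    # 3 AND
    ("R8", ["R1", "R9"]),    # 4 OR
    ("R10", ["R1", "R11"]),  # 5 XOR
]
for hazard in classify_raw_hazards(program):
    print(hazard)
```

With these placeholder operands the sketch reports forwarding for instructions 2 and 3 and split register access for instruction 4, and nothing for instruction 5, which reads R1 one cycle after it has been written.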

  • See Figure A.
  • See Figure A.1 and A.
  • See Figure A.18 and A.
  • See Figure A.4 and Figure A.
  • See Figure A.