Branch Prediction - Lecture Notes | ECE 4100, Study notes of Computer Architecture and Organization

Material Type: Notes; Professor: Lee; Class: Adv Computer Architecture; Subject: Electrical & Computer Engr; University: Georgia Institute of Technology-Main Campus; Term: Fall 2000;

ECE 4100/6100
Advanced Computer Architecture
Lecture 5 Branch Prediction
Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology

Reading for this Module

  • Branch Prediction
    • Appendix A.2 (pg. A-21 – A-26), Section 2.3
  • Branch Target Buffers and Return Address Predictors
    • Section 2.9
  • Reading assignments
    • Papers on class website

Control Dependencies

  • Control dependencies determine the execution order of instructions

  • Instructions may be control dependent on a branch

    DADD R5, R6, R
    BNE  R4, R2, CONTINUE
    DMUL R4, R2, R5
    DSUB R4, R9, R

[Figure: taxonomy of dependencies: Structural, Data, Name (Anti and Output), and Control; pipeline overview: I-Fetch → Execution Core → Retire]

Predict What?

  • Direction (1 bit)
    • Always taken for unconditional jumps and calls/returns
    • Binary (taken or not taken) for conditional branches
  • Target (32-bit or 64-bit address)
    • Some are easy
      • One target: uni-directional (unconditional direct) jumps
      • Two targets: fall-through (not taken) vs. taken
    • Many targets: function pointers or indirect jumps (e.g., jr r31)

Branch Misprediction

[Figure: pipeline stages PC, Next PC, Fetch, Drive, Alloc, Rename, Queue, Schedule, Dispatch, Reg File, Exec, Flags, Br Resolve]

  • Single issue: the misprediction is detected only when the branch reaches the Br Resolve stage
  • The entailed (wrong-path) instructions behind it are flushed, and the correct path is fetched
  • 8-issue superscalar processor (worst case): every stage can hold up to 8 wrong-path instructions, all of which are flushed
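
A rough way to see why the superscalar case is worse is to count the issue slots a single flush throws away. The sketch below is a back-of-the-envelope model, not taken from the slides: it assumes every stage listed ahead of Br Resolve holds a full group of `issue_width` wrong-path instructions when the misprediction is detected. The names `STAGES` and `wasted_slots` are hypothetical.

```python
# Stage names as listed on the slide; the model itself is an assumption.
STAGES = ["PC", "Next PC", "Fetch", "Drive", "Alloc", "Rename", "Queue",
          "Schedule", "Dispatch", "Reg File", "Exec", "Flags", "Br Resolve"]


def wasted_slots(issue_width: int, stages=STAGES) -> int:
    """Worst-case issue slots flushed by one misprediction.

    Assumes every stage in front of Br Resolve is completely filled with
    younger (wrong-path) instructions when the branch resolves.
    """
    depth = stages.index("Br Resolve")  # number of stages ahead of resolution
    return depth * issue_width


print(wasted_slots(1), wasted_slots(8))
```

Under this assumption a single-issue machine loses 12 slots per misprediction and the 8-issue machine loses 96, which is why wide machines depend so heavily on prediction accuracy.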

Static prediction

  • Static prediction is used to guide code scheduling strategies
  • Simple strategy for all branches in the code
  • Based on opcode or direction of branches
  • Profile based → individual branches tend to be strongly bimodal (set a bit in the opcode)
  • Provide mechanisms for compilers or programmers to supply hints
    • Bit in the instruction encoding
    • I-fetch is steered accordingly
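
Two of the static strategies above can be sketched in a few lines. The direction rule shown is the common backward-taken/forward-not-taken heuristic, used here only as an assumed example of a direction-based rule; the profile rule sets the hint bit when a branch was taken in most of its executions during a profiling run. Both function names are hypothetical.

```python
def predict_by_direction(branch_pc: int, target_pc: int) -> bool:
    """Assumed direction rule: backward branches (usually loops) predicted taken,
    forward branches predicted not taken."""
    return target_pc < branch_pc


def profile_hint_bit(taken_count: int, executed_count: int) -> int:
    """Profile-based rule: hint bit = 1 if the branch was taken in most executions."""
    if executed_count == 0:
        return 0  # never executed in the profile run: leave the bit clear
    return 1 if 2 * taken_count > executed_count else 0
```

Because individual branches tend to be strongly bimodal, one bit per branch chosen this way captures most of a branch's behavior.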

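Profile Guided Static Prediction

[Figure: 5-stage pipeline (IF ID EX MEM WB) for beq $1,$2,L1, whose branch bit in the encoded instruction is set to 1 (predict taken); the instruction fetched right behind it becomes a bubble, and fetch continues at the target once the target address is known]

Profile Guided Static Prediction

[Figure: the same pipeline when the encoded prediction is incorrect: the instructions fetched on the predicted path turn into bubbles until the branch condition is known, and only then is the correct path fetched]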

Dynamic Branch Prediction Strategies

  • Use past behavior to predict the future
  • Local vs. global behaviors
    • Branches show surprisingly good correlation with one another and with their own history
    • They are not totally random events

[Figure: an n-bit shift register (bits n-1 down to 0) records the last branch behavior, i.e., taken or not taken, and feeds the prediction. From Ref: "Modern Processor Design: Fundamentals of Superscalar Processors", J. Shen and M. Lipasti]

  • How do we capture this history? With a shift register
  • How do we predict?
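
A minimal sketch of the shift register that captures branch history, assuming bit 0 holds the most recent outcome and a 1 means taken; the class name `HistoryRegister` is hypothetical.

```python
class HistoryRegister:
    """n-bit shift register of recent branch outcomes (1 = taken)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.bits = 0  # bit 0 holds the most recent outcome

    def update(self, taken: bool) -> None:
        # Shift left, insert the newest outcome at bit 0, drop the oldest bit.
        self.bits = ((self.bits << 1) | int(taken)) & ((1 << self.n) - 1)


h = HistoryRegister(4)
for taken in [True, False, True, True, True]:
    h.update(taken)
print(f"{h.bits:04b}")
```

After five outcomes a 4-bit register keeps only the last four (NT, T, T, T), so the oldest outcome has been shifted out.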

Simplest Dynamic Branch Predictor

  • 1-bit Branch History Table (BHT): each entry holds the last outcome (T or NT) of the branches that map to it

    for (i=0; i<100; i++) { if (a[i] == 0) { … } … }

        addi r10, r0, 100
        addi r1, r1, r
    L1: add  r21, r20, r
        lw   r2, (r21)
        beq  r2, r0, L
        …
        …
        j    L
    L2: …
        …
        …
    L3: addi r1, r1, 1
        bne  r1, r10, L

[Figure: 1-bit BHT entries (T, NT, T, T, NT, NT, …) indexed by branch instruction addresses such as 0x4001010c and 0x40010B0c]

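FSM of the Simplest Predictor

  • A 2-state machine: Predict not taken and Predict taken
  • Changes its mind fast: a single outcome is enough to switch states
    • If the branch is taken → Predict taken
    • If the branch is not taken → Predict not taken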
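
The 2-state FSM can be sketched as a table of 1-bit entries indexed by low-order PC bits. This is a sketch under assumptions: the table size, the indexing (dropping the two byte-offset bits of a 32-bit instruction), the not-taken initial state, and the names `OneBitPredictor` and `accuracy` are all hypothetical.

```python
class OneBitPredictor:
    """1-bit branch history table: each entry remembers the last outcome."""

    def __init__(self, index_bits: int) -> None:
        self.mask = (1 << index_bits) - 1
        self.table = [False] * (1 << index_bits)  # False = predict not taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask  # drop the 2 byte-offset bits (assumed)

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)]

    def update(self, pc: int, taken: bool) -> None:
        self.table[self._index(pc)] = taken  # change mind immediately


def accuracy(pred: OneBitPredictor, pc: int, outcomes) -> float:
    """Predict, then update, for each outcome of one branch; return hit rate."""
    correct = 0
    for taken in outcomes:
        correct += pred.predict(pc) == taken
        pred.update(pc, taken)
    return correct / len(outcomes)


print(accuracy(OneBitPredictor(10), 0x4001010c, [True, True, False, True]))
```

For the outcomes T, T, NT, T the predictor is right only once: the cold start and each of the two direction changes cost a misprediction, which is the price of changing its mind fast.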

Example using 1-bit branch history table

    for (i=0; i<4; i++) { …. }

        addi r10, r0, 4
        addi r1, r1, r
    L1: …
        …
        addi r1, r1, 1
        bne  r1, r10, L

[Figure: predicted vs. actual outcome (T/NT) of the bne on each iteration]

60% accuracy

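2-bit Saturating Up/Down Counter Predictor

  • Four states: SN (Strongly Not Taken), WN (Weakly Not Taken), WT (Weakly Taken), ST (Strongly Taken)
  • Taken moves the counter one step toward ST; Not Taken moves it one step toward SN; the counter saturates at both ends
  • SN and WN: predict not taken; WT and ST: predict taken
  • MSB: direction bit
  • LSB: hysteresis bit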
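
The saturating counter can be sketched as follows, using the encoding implied by the slide (MSB = direction, LSB = hysteresis), i.e., SN = 00, WN = 01, WT = 10, ST = 11. The initial state WN and the helper names are assumptions.

```python
SN, WN, WT, ST = 0, 1, 2, 3  # 2-bit encodings: MSB = direction, LSB = hysteresis


def predict(counter: int) -> bool:
    return counter >> 1 == 1  # MSB set -> predict taken


def update(counter: int, taken: bool) -> int:
    # Saturating up/down counter: count up on taken, down on not taken.
    return min(counter + 1, ST) if taken else max(counter - 1, SN)


def run(outcomes, counter: int = WN):
    """Return (number of correct predictions, final counter state)."""
    hits = 0
    for taken in outcomes:
        hits += predict(counter) == taken
        counter = update(counter, taken)
    return hits, counter


loop = [True, True, True, False] * 2  # a loop branch exiting every 4th time
print(run(loop))
```

Run twice over the pattern T, T, T, NT from WN, the counter mispredicts the very first outcome and each loop exit (5 of 8 correct), whereas a 1-bit entry starting at not taken gets 4 of 8. The single NT at the exit only drops the counter from ST to WT, so the next loop entry is still predicted taken.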

Capturing Global Behavior

  • A shift register captures the local path through the program
  • For each unique path a predictor is maintained
  • Prediction is based on the behavior history of each local path
  • Shift register length determines program region size

[Figure: a tree of conditional branches, each with T and F outcome edges]

Branch Correlation

  • Branch direction
    • Not independent
    • Correlated to the path taken
  • Example: on path 1-1, the outcome of b3 can be known for certain beforehand
  • Track the path using a 2-bit register

Code Snippet

    if (aa==2)    // b
        aa = 0;
    if (bb==2)    // b
        bb = 0;
    if (aa!=bb) { // b
        …….
    }

[Figure: decision tree with b2 under each outcome of the first branch and b3 under each of the four paths; edges labeled 1 (T) and 0 (NT)]

  Path A: 1-1 (aa == 2, bb == 2)
  Path B: 1-0 (aa == 2, bb ≠ 2)
  Path C: 0-1 (aa ≠ 2, bb == 2)
  Path D: 0-0 (aa ≠ 2, bb ≠ 2)
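
The claim about path 1-1 can be checked by brute force. This sketch runs the code snippet for small values of aa and bb and records the path as the slide does (1 when an if-condition is true); the function name `run_snippet` is hypothetical.

```python
def run_snippet(aa: int, bb: int) -> tuple:
    """Run the code snippet; return (path, b3 condition aa != bb)."""
    path = (int(aa == 2), int(bb == 2))  # 1 = condition true, as in the slide's paths
    if aa == 2:
        aa = 0
    if bb == 2:
        bb = 0
    return path, aa != bb


# Collect every b3 outcome observed on each path.
outcomes = {}
for aa in range(5):
    for bb in range(5):
        path, b3 = run_snippet(aa, bb)
        outcomes.setdefault(path, set()).add(b3)

for path in sorted(outcomes, reverse=True):
    print(path, sorted(outcomes[path]))
```

Only path (1, 1) produces a single b3 outcome (aa != bb is always false, since both variables were just set to 0); on the other three paths b3 can go either way, so two bits of path history are enough to predict b3 perfectly in that case.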

Correlated Branch Predictor [PanSoRahmeh’92]

  • (M,N) correlation scheme
    • M: shift register size (# bits)
    • N: N-bit counter

[Figure: left, 2-bit Sat. Counter Scheme: a w-bit hash of the branch PC indexes 2^w 2-bit counters, and the selected counter gives the prediction; right, (2,2) Correlation Scheme: the hashed branch PC selects a row of four 2-bit counters, and a 2-bit shift register of global branch history, updated with each subsequent branch direction, selects one of them to give the prediction]
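
A (2,2) correlation scheme can be sketched as a 2-D table: low-order PC bits (standing in for the slide's hash) pick a row, and the M-bit global history picks one of the 2^M counters in that row. The parameter `w`, the hash, the all-zero initial state and the class name are assumptions.

```python
class CorrelatingPredictor:
    """(M, N) correlating predictor: M bits of global history select one of
    2**M N-bit saturating counters in the row chosen by the branch PC."""

    def __init__(self, m: int = 2, n: int = 2, w: int = 4) -> None:
        self.m, self.w = m, w
        self.max_count = (1 << n) - 1
        self.ghr = 0  # M-bit global history register
        # 2**w rows (PC hash) x 2**m columns (global history), all strongly not taken
        self.counters = [[0] * (1 << m) for _ in range(1 << w)]

    def _row(self, pc: int) -> int:
        return (pc >> 2) & ((1 << self.w) - 1)  # hypothetical hash: low PC bits

    def predict(self, pc: int) -> bool:
        c = self.counters[self._row(pc)][self.ghr]
        return c > self.max_count // 2  # MSB of the N-bit counter set -> taken

    def update(self, pc: int, taken: bool) -> None:
        row = self.counters[self._row(pc)]
        c = row[self.ghr]
        row[self.ghr] = min(c + 1, self.max_count) if taken else max(c - 1, 0)
        # Shift the resolved direction into the global history.
        self.ghr = ((self.ghr << 1) | int(taken)) & ((1 << self.m) - 1)


p = CorrelatingPredictor()
hits = 0
for i in range(20):
    taken = i % 2 == 0  # one branch alternating T, NT, T, NT, ...
    hits += p.predict(0x400) == taken
    p.update(0x400, taken)
print(hits)
```

On a single branch that alternates T, NT, T, NT, … the sketch mispredicts only 3 of 20 outcomes, because the global history (last outcome T or NT) separates the two cases into different counters; a lone 2-bit counter cannot learn this pattern.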

Two-Level Branch Predictor [YehPatt91,92,93]

  • Generalized correlated branch predictor
  • 1st level keeps branch history in Branch History Register (BHR)
  • 2nd level segregates pattern history in Pattern History Table (PHT)

[Figure: an N-bit Branch History Register (BHR) holding past outcomes (Rc-k, …), shifted left when updated, supplies the branch history pattern that indexes a Pattern History Table (PHT) of 2^N entries; the selected entry gives the prediction, and FSM update logic computes the entry's new state from its current state and Rc, the actual branch outcome]

Key Idea

  • Separate all of the histories of a branch → sub-histories
  • For each sub-history employ a separate predictor
  • Each history maps to an FSM
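
The two-level structure can be sketched for its simplest variant, GAg (one global BHR, one global PHT of 2-bit counters). The weakly-not-taken initial state and the class name are assumptions; the update order follows the figure: the selected PHT entry is updated first, then the outcome is shifted into the BHR.

```python
class GAg:
    """Two-level predictor: an N-bit global BHR indexes a global PHT
    of 2**N two-bit saturating counters."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.bhr = 0
        self.pht = [1] * (1 << n)  # assumed initial state: weakly not taken

    def predict(self) -> bool:
        return self.pht[self.bhr] >= 2  # MSB set -> taken

    def update(self, taken: bool) -> None:
        c = self.pht[self.bhr]
        self.pht[self.bhr] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.bhr = ((self.bhr << 1) | int(taken)) & ((1 << self.n) - 1)


g = GAg(4)
pattern = [True, True, True, False]  # loop branch: exits every 4th time
for _ in range(100):
    for taken in pattern:
        g.update(taken)
print([g.predict() == t or g.update(t) for t in []])  # placeholder-free check below
correct = []
for taken in pattern:
    correct.append(g.predict() == taken)
    g.update(taken)
print(correct)
```

With N = 4, the four rotations of the loop pattern T, T, T, NT give four distinct histories, so after warm-up each history's counter learns its own direction and every outcome, including the loop exit, is predicted correctly.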

Global History Schemes

  • GAg: Global BHR, Global PHT
  • GAs: Global BHR, Per-set PHTs (SPHTs) selected by SetP(B)
  • GAp: Global BHR, Per-addr PHTs (PPHTs) selected by Addr(B)
  • [PanSoRahmeh’92]* similar to GAp

Set can be determined by branch opcode, compiler classification, or branch PC address.
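
The three global schemes differ only in how a PHT is chosen; the global BHR always selects the entry inside it. A sketch, assuming the set is taken from low-order PC bits (one of the three options listed above) and that PC bits [1:0] are dropped; `select_pht` is a hypothetical name.

```python
def select_pht(scheme: str, bhr: int, pc: int, hist_bits: int, set_bits: int = 2):
    """Return (PHT number, entry index) for a global-history scheme."""
    entry = bhr & ((1 << hist_bits) - 1)           # global BHR picks the entry
    if scheme == "GAg":
        table = 0                                  # a single global PHT
    elif scheme == "GAs":
        table = (pc >> 2) & ((1 << set_bits) - 1)  # SetP(B): assumed low PC bits
    elif scheme == "GAp":
        table = pc >> 2                            # Addr(B): one PHT per branch address
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return table, entry


for scheme in ("GAg", "GAs", "GAp"):
    print(scheme, select_pht(scheme, 0b0110, 0x4001000C, hist_bits=4))
```

GAp gives every branch address its own PHT and therefore needs by far the most storage; GAs trades some of that isolation for a fixed number of tables.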

GAs Two-Level Branch Prediction

[Figure: BHR = 01100110 and PC = 0x4001000C together select a PHT entry whose MSB = 1 → Predict Taken. The 2 LSBs of the PC are insignificant for 32-bit instructions.]

Predictor Update (Actually, Not Taken)

[Figure: the prediction was wrong, so the selected PHT entry is decremented (10 → 01) and the BHR shifts in the not-taken outcome (01100110 → 11001100)]

  • Update the predictor after the branch is resolved
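
The worked update above can be replayed numerically. The sketch reads the slide's "1001 decremented" as the 2-bit entry going from 10 to 01 and, for illustration only, assumes GAs forms its index by concatenating 4 PC bits (after dropping the 2 insignificant LSBs) with the 8 history bits; `PC_BITS`, `HIST_BITS` and the helper names are hypothetical.

```python
PC_BITS, HIST_BITS = 4, 8  # assumed index split for this illustration


def gas_index(pc: int, bhr: int) -> int:
    pc_part = (pc >> 2) & ((1 << PC_BITS) - 1)  # drop the 2 insignificant LSBs
    return (pc_part << HIST_BITS) | (bhr & ((1 << HIST_BITS) - 1))


def update_counter(counter: int, taken: bool) -> int:
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


def shift_bhr(bhr: int, taken: bool) -> int:
    return ((bhr << 1) | int(taken)) & ((1 << HIST_BITS) - 1)


pc, bhr = 0x4001000C, 0b01100110
index = gas_index(pc, bhr)
counter = 0b10                       # selected PHT entry, MSB = 1
predicted_taken = counter >> 1 == 1  # -> predict taken
actual_taken = False                 # the branch was actually not taken

# Update after the branch is resolved.
counter = update_counter(counter, actual_taken)  # 10 -> 01
bhr = shift_bhr(bhr, actual_taken)               # 01100110 -> 11001100
print(hex(index), f"{counter:02b}", f"{bhr:08b}")
```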

Per-Set History Schemes

  • SAg: Per-set BHT (SBHT) selected by SetH(B), Global PHT
  • SAs: Per-set BHT (SBHT) selected by SetH(B), Per-set PHTs (SPHTs) selected by SetP(B)
  • SAp: Per-set BHT (SBHT) selected by SetH(B), Per-addr PHTs (PPHTs) selected by Addr(B)
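
As one example of the per-set schemes, SAg can be sketched with one history register per set and a single global PHT. Choosing the set from low-order PC bits, the initial states and the class name are assumptions.

```python
class SAg:
    """Per-set branch history table (SBHT) + one global PHT of 2-bit counters."""

    def __init__(self, hist_bits: int, set_bits: int) -> None:
        self.hist_mask = (1 << hist_bits) - 1
        self.set_mask = (1 << set_bits) - 1
        self.sbht = [0] * (1 << set_bits)  # one history register per set
        self.pht = [1] * (1 << hist_bits)  # global PHT, assumed weakly not taken

    def _set(self, pc: int) -> int:
        return (pc >> 2) & self.set_mask   # SetH(B): assumed low-order PC bits

    def predict(self, pc: int) -> bool:
        return self.pht[self.sbht[self._set(pc)]] >= 2

    def update(self, pc: int, taken: bool) -> None:
        s = self._set(pc)
        h = self.sbht[s]
        self.pht[h] = min(self.pht[h] + 1, 3) if taken else max(self.pht[h] - 1, 0)
        self.sbht[s] = ((h << 1) | int(taken)) & self.hist_mask


p = SAg(hist_bits=2, set_bits=1)
A, B = 0x0, 0x4  # two hypothetical branches that fall into different sets
for _ in range(10):
    p.update(A, True)   # A is always taken
    p.update(B, False)  # B is never taken
print(p.sbht, p.predict(A), p.predict(B))
```

The two interleaved branches build different histories (11 and 00) in their own sets and therefore train different entries of the shared PHT; SAs and SAp would differ only in selecting among several PHTs.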

PHT Indexing

[Figure: Gselect builds the PHT index from global history bits and branch address bits; with few history bits in the index, the history is insufficient]

  • Tradeoff between more history bits and address bits
  • Too many bits needed in Gselect ⇒ sparse table entries

Gshare Branch Predictor [McFarling93]

  • Tradeoff between more history bits and address bits
  • Too many bits needed in Gselect ⇒ sparse table entries
  • Gshare ⇒ does not lose global history bits: it XORs them with the branch address instead of concatenating
  • Ex: AMD Athlon, MIPS R12000, Sun MAJC, Broadcom SiByte’s SB-

[Figure: PHT indexing from global history and branch address, Gselect vs. Gshare]

  • Gselect 4/4: Index PHT by concatenating the low-order 4 bits of the branch address and of the global history
  • Gshare 8/8: Index PHT by {Branch address ⊕ Global history}

Gshare Branch Predictor

[Figure: the PC address is XORed with the Global BHR to index the PHT; the selected entry has MSB = 0 → Predict Not Taken]
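
The two indexing functions can be compared directly. This sketch follows the 4/4 and 8/8 configurations above and assumes the 2 low PC bits are dropped before use, as in the GAs example; the function names are hypothetical.

```python
def gselect_index(pc: int, ghr: int, pc_bits: int = 4, hist_bits: int = 4) -> int:
    """Gselect 4/4: concatenate low-order address bits with low-order history bits."""
    addr = (pc >> 2) & ((1 << pc_bits) - 1)
    hist = ghr & ((1 << hist_bits) - 1)
    return (addr << hist_bits) | hist


def gshare_index(pc: int, ghr: int, bits: int = 8) -> int:
    """Gshare 8/8: XOR address bits with global history bits."""
    return ((pc >> 2) ^ ghr) & ((1 << bits) - 1)


pc = 0x4001000C
for ghr in (0b00000110, 0b01100110):  # histories differing only in the upper 4 bits
    print(f"{ghr:08b}: gselect={gselect_index(pc, ghr)}, gshare={gshare_index(pc, ghr)}")
```

Both tables have 256 entries, but gselect maps these two histories to the same entry (54) because it keeps only 4 history bits, while gshare keeps them apart (5 vs. 101) because all 8 history bits take part in the index.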