Dynamic Hardware Branch Prediction: Techniques and Algorithms - Prof. Josep Torrellas, Study notes of Computer Architecture and Organization

Dynamic hardware branch prediction techniques are used to improve processor performance by predicting the outcome of branches. Topics include branch prediction buffers, two-bit prediction schemes, saturating counters, correlating predictors, tournament predictors, and branch target buffers. The document also discusses the advantages and disadvantages of each approach.

Typology: Study notes

Pre 2010

Uploaded on 03/16/2009

Copyright J. Torrellas 1999,2001,2002,2007 1
Chapter 2 (CONT)
Instructor: Josep Torrellas
CS433

Dynamic Hardware Branch Prediction

• Control hazards are a source of performance loss, especially for processors that want to issue more than 1 instruction per cycle
• This approach: use hardware to dynamically predict the outcome of a branch (the prediction may change over time)

Branch prediction buffer (branch history table)
– Small memory indexed by the lower bits of the address of the branch instruction
  • Contains 1 bit that says whether the branch was recently taken
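The 1-bit buffer described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `OneBitPredictor`, the helper `count_mispredictions`, the 4-bit index, and the initial "not taken" state are all hypothetical choices, not taken from the slides.

```python
class OneBitPredictor:
    """1-bit branch history table indexed by the low bits of the branch address."""

    def __init__(self, index_bits=4):
        # Assumption: index with the lowest `index_bits` bits of the address.
        self.mask = (1 << index_bits) - 1
        # Assumption: every entry starts as "not taken" (False).
        self.table = [False] * (1 << index_bits)

    def predict(self, branch_addr):
        # No tag check: two branches with the same low bits share an entry.
        return self.table[branch_addr & self.mask]

    def update(self, branch_addr, taken):
        # The single bit simply remembers the most recent outcome.
        self.table[branch_addr & self.mask] = taken


def count_mispredictions(predictor, branch_addr, outcomes):
    """Replay a sequence of outcomes for one branch and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_addr) != taken:
            misses += 1
        predictor.update(branch_addr, taken)
    return misses
```

Replaying a loop branch that is taken three times and then falls through, twice in a row, gives two mispredictions per pass with this sketch: one at loop exit and one when the loop is re-entered.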

N-bit saturating counter
• An n-bit counter can take values from 0 to $2^{n} - 1$
• If count $\geq 2^{n-1}$, predict the branch taken; else predict untaken
• If taken, increment; if untaken, decrement

Accuracy: with 4096 entries in the table and 2 bits per entry (a large table), the misprediction rate is ≈ 1-18% in SPEC
Usually better on FP programs (more loops)
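A minimal Python sketch of the n-bit saturating counter rule follows. The names `SaturatingCounter` and `count_mispredictions` and the choice of initial value are assumptions made for illustration; only the range, threshold, and increment/decrement rule come from the slide.

```python
class SaturatingCounter:
    """n-bit saturating counter for one branch."""

    def __init__(self, n_bits=2, initial=0):
        self.max_value = (1 << n_bits) - 1   # largest value: 2^n - 1
        self.threshold = 1 << (n_bits - 1)   # predict taken when count >= 2^(n-1)
        self.count = initial                 # assumption: caller picks the start state

    def predict(self):
        return self.count >= self.threshold

    def update(self, taken):
        # Increment on taken, decrement on untaken, saturating at both ends.
        if taken:
            self.count = min(self.count + 1, self.max_value)
        else:
            self.count = max(self.count - 1, 0)


def count_mispredictions(counter, outcomes):
    """Replay outcomes for a single branch and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses
```

With `n_bits=2` and the counter starting at 3, a loop branch that is taken three times and then not taken is mispredicted only at the loop exit, because a single untaken outcome does not flip the prediction.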

Look at the recent behavior of other branches too
"Correlating predictors" or "two-level predictors"

    if (d == 0)
        d = 1;
    if (d == 1)

e.g. a scheme where each branch has 2 separate prediction bits:
– one prediction used if the last branch was not taken
– one prediction used if the last branch was taken

Example: works very well for b
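The two-bits-per-branch scheme can be sketched as below. Several things here are assumptions rather than content from the slides: the names `CorrelatingPredictor` and `simulate`, the labels `b1` and `b2` for the two `if` statements, the convention that each branch is taken when its condition is false (so the branch skips the body), and the all-"not taken" starting state.

```python
class CorrelatingPredictor:
    """Two 1-bit predictions per branch, selected by the last branch outcome."""

    def __init__(self):
        # branch label -> [prediction if last branch not taken,
        #                  prediction if last branch taken]
        self.bits = {}
        self.last_taken = False  # assumed initial global history

    def predict(self, branch):
        entry = self.bits.setdefault(branch, [False, False])
        return entry[int(self.last_taken)]

    def update(self, branch, taken):
        entry = self.bits.setdefault(branch, [False, False])
        entry[int(self.last_taken)] = taken
        self.last_taken = taken  # this outcome becomes the new history


def simulate(predictor, d_values):
    """Run the two-if fragment for each d and count mispredictions per branch."""
    misses = {"b1": 0, "b2": 0}
    for d in d_values:
        b1_taken = d != 0          # b1 skips "d = 1" when d != 0
        if predictor.predict("b1") != b1_taken:
            misses["b1"] += 1
        predictor.update("b1", b1_taken)
        if not b1_taken:
            d = 1
        b2_taken = d != 1          # b2 skips its body when d != 1
        if predictor.predict("b2") != b2_taken:
            misses["b2"] += 1
        predictor.update("b2", b2_taken)
    return misses
```

Under these assumptions, when `d` alternates between 2 and 0 the second branch always follows the outcome of the first, so after one warm-up miss per branch the sketch predicts every later instance correctly.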

• A two-bit predictor with no global history is a (0,2) predictor
• Number of bits in an (m,n) predictor? $2^{m} \times n \times \#\text{entries}$, e.g. (2,2). See figure 3.14 (old edition)
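The bit count for an (m,n) predictor is simple arithmetic, shown here as a small Python helper. The function name `predictor_bits` is hypothetical, and the 1024-entry figure in the usage note is an illustrative value, not one from the slides.

```python
def predictor_bits(m, n, entries):
    """Total storage in bits for an (m, n) predictor.

    m: bits of global branch history (selects one of 2^m counters per entry)
    n: bits per counter
    entries: number of entries selected by the branch address
    """
    return (2 ** m) * n * entries
```

For instance, `predictor_bits(0, 2, 4096)` and `predictor_bits(2, 2, 1024)` both come to 8192 bits, so a (2,2) predictor with a quarter of the entries costs the same storage as a plain 2-bit table.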

Tournament Predictors

• Use multiple predictors, usually:
  – One based on global info
  – One based on local info
  – Combining them with a selector
• They do very well
• They are a popular form of multilevel branch predictors (use several levels of branch prediction tables together with an algorithm for choosing among them)
• Existing ones: use a 2-bit saturating counter per branch to choose between two different predictors (the four states of the counter dictate whether to use predictor 1 or 2)
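One way the 2-bit selector could work is sketched below. The slides only say that the four counter states choose between predictor 1 and predictor 2; the encoding (values 0-1 pick predictor 1, values 2-3 pick predictor 2), the update rule (move only when exactly one predictor was right), and the name `TournamentSelector` are assumptions for illustration.

```python
class TournamentSelector:
    """Per-branch 2-bit saturating counter that picks one of two predictions."""

    def __init__(self):
        self.choice = {}  # branch address -> counter value in 0..3

    def choose(self, addr, pred1, pred2):
        # Assumed encoding: 0 or 1 selects predictor 1, 2 or 3 selects predictor 2.
        return pred2 if self.choice.get(addr, 0) >= 2 else pred1

    def update(self, addr, pred1, pred2, taken):
        counter = self.choice.get(addr, 0)
        if pred1 == taken and pred2 != taken:
            counter = max(counter - 1, 0)   # only predictor 1 was right
        elif pred2 == taken and pred1 != taken:
            counter = min(counter + 1, 3)   # only predictor 2 was right
        # If both were right or both wrong, the counter is left unchanged.
        self.choice[addr] = counter
```

Because the counter saturates, the selector needs two consecutive cases where the other predictor alone is correct before it switches away from a strongly preferred predictor.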

Branch Target Buffers (BTB)

• In addition to predicting the branch, we need to guess the target address
• If the target address can be determined by the end of IF → zero branch penalty
• BTB: small cache that stores the predicted address of the next instruction after a branch

Note:
• If we use a branch prediction table accessed during ID → branches have a 1-cycle penalty
• If we use a BTB accessed during IF → branches have a 0-cycle penalty

BTB : How It Works

• During IF, we use the instruction address to access the BTB
• If hit, we extract the address of the next instruction to fetch
• Note: unlike the branch prediction buffer, the entry must be for this instruction, else we would fetch something wrong
• Note: only need to store predicted-taken branches
• Easiest scheme: store in the BTB only PC-relative conditional branches → target address is a constant
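The lookup described above can be sketched with a dictionary keyed by the full branch address, which models the requirement that an entry must match this exact instruction. The class name `BranchTargetBuffer`, the method names, the 4-byte instruction size, and the policy of dropping an entry when the branch turns out not taken are all assumptions for illustration.

```python
class BranchTargetBuffer:
    """Toy BTB: exact-match lookup from branch PC to predicted target."""

    def __init__(self):
        # Keyed by the full PC, so a hit always belongs to this instruction.
        self.entries = {}

    def next_fetch_address(self, pc, instr_size=4):
        # Hit: fetch from the stored target (branch predicted taken).
        # Miss: fall through to the next sequential instruction.
        return self.entries.get(pc, pc + instr_size)

    def resolve(self, pc, taken, target):
        # Assumed policy: keep only predicted-taken branches in the buffer.
        if taken:
            self.entries[pc] = target
        else:
            self.entries.pop(pc, None)
```

A branch at `0x100` that is resolved as taken to `0x200` makes the next lookup of `0x100` return `0x200`, while any other address still falls through to `pc + 4`.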

• No branch delay if the entry is found in the BTB and it is correct
• There is some cost in updating the BTB in case of misprediction or wrong target
• We do not try to update the BTB while fetching instructions → could not access it → best to stall 1 or 2 cycles
• Can be combined with a branch prediction table to decide when to put entries in the BTB

See figure 2.

• If branch not correctly predicted (not taken, or taken to the wrong target): 1 cycle to update the BTB + 1 cycle to restart fetching
• If branch not found & taken: 2 cycles to update the BTB
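The two penalty cases above can be combined into an average penalty per branch. The sketch below does that; the function name, parameter names, and the rates used in the usage note are illustrative assumptions, while the 2-cycle costs follow the two cases listed above.

```python
def expected_branch_penalty(btb_hit_rate, prediction_accuracy,
                            taken_fraction_on_miss,
                            mispredict_cycles=2, miss_taken_cycles=2):
    """Average penalty cycles per branch for a simple BTB model.

    btb_hit_rate: fraction of branches found in the BTB
    prediction_accuracy: fraction of BTB hits that are predicted correctly
    taken_fraction_on_miss: fraction of BTB misses that turn out taken
    """
    # Case 1: found in the BTB but mispredicted (1 cycle update + 1 cycle restart).
    mispredicted = btb_hit_rate * (1.0 - prediction_accuracy)
    # Case 2: not found in the BTB and taken (2 cycles to update the BTB).
    missed_and_taken = (1.0 - btb_hit_rate) * taken_fraction_on_miss
    return (mispredicted * mispredict_cycles
            + missed_and_taken * miss_taken_cycles)
```

With hypothetical rates of a 90% hit rate, 90% accuracy on hits, and 60% of misses taken, the model gives about 0.30 penalty cycles per branch (0.18 from mispredictions plus 0.12 from taken misses).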

Fancier BTBs

• Store the target instruction instead of the target address →
  – BTB access can take longer now (larger BTB)
  – Can do "branch folding" (achieves zero-cycle unconditional branches and sometimes zero-cycle conditional branches)

[Figure: processor, cache, and BTB, with labels "target" and "Uncond. Br."; caption: zero-cost unconditional branch]

Integrated Instruction Fetch Units

• Have a fancy module that implements IF in multiple cycles (since it has to provide multiple instructions)
• The IFU performs the following functions:
  – Branch prediction
  – Instruction prefetch
  – Buffering of instructions (may come from multiple cache lines, etc.)
• The IFU provides instructions to the issue stage