










Notes on dynamic hardware branch prediction, a set of techniques used to improve processor performance by predicting the outcome of branches. Topics include branch prediction buffers, two-bit prediction schemes, saturating counters, correlating predictors, tournament predictors, and branch target buffers. The document also discusses the advantages and disadvantages of each approach.











Copyright J. Torrellas 1999,2001,2002,
Dynamic Hardware Branch Prediction
- Control hazards are a source of performance loss, especially for processors that want to issue more than 1 instruction per cycle.
- This approach: use hardware to dynamically predict the outcome of a branch (the prediction may change with time).

1. Branch prediction buffer (branch history table)
- Small memory indexed by the lower bits of the address of the branch instruction.
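- An n-bit counter can take values from 0 to $2^n - 1$.
- If count ≥ $2^{n-1}$, predict the branch taken; else predict untaken.
- If the branch is taken, increment the counter; if untaken, decrement it (the counter saturates at both ends of its range).
- Accuracy: with 4096 entries in the table and 2 bits per entry (large), the misprediction rate is ≈ 1-18% in SPEC.
- Usually better on FP programs (more loops).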
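The counter scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `BranchHistoryTable`, the helper `mispredictions`, the 4-byte instruction size used for indexing, and the all-zero initial counters are not from the notes.

```python
class BranchHistoryTable:
    """Table of n-bit saturating counters indexed by low PC bits."""

    def __init__(self, entries=4096, bits=2):
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.mask = entries - 1
        self.max_count = (1 << bits) - 1   # counter range is 0 .. 2^n - 1
        self.threshold = 1 << (bits - 1)   # predict taken if count >= 2^(n-1)
        self.counters = [0] * entries      # assumption: start at "not taken"

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the two lowest PC bits are dropped.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)


def mispredictions(bht, pc, outcomes):
    """Count wrong predictions for one branch over a list of outcomes."""
    wrong = 0
    for taken in outcomes:
        if bht.predict(pc) != taken:
            wrong += 1
        bht.update(pc, taken)
    return wrong


# A loop branch: taken 9 times, then falls through; the loop runs 3 times.
loop = ([True] * 9 + [False]) * 3
print(mispredictions(BranchHistoryTable(bits=2), 0x400, loop))
print(mispredictions(BranchHistoryTable(bits=1), 0x400, loop))
```

Because the table has no tags, two branches whose addresses share the same low bits use the same counter; the prediction is only a hint, so this is harmless for correctness.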
4. Look at the recent behavior of other branches too
- "Correlating predictors" or "two-level predictors"
- Example:
      if (d == 0)
          d = 1;
      if (d == 1)
- E.g., a scheme where each branch has 2 separate prediction bits:
  - one prediction used if the last branch was not taken
  - one prediction used if the last branch was taken
- Works very well for b
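A small Python sketch of the two-bits-per-branch scheme applied to the `d` example. The names `run`, `b1`, and `b2` are made up for illustration, and the sketch assumes each `if` compiles to a branch that is taken when its condition is false (the body is skipped), with all predictions starting at "not taken".

```python
def run(d_values, use_history):
    """Return the number of mispredictions over both branches.

    Each branch keeps two 1-bit predictions, selected by whether the
    previously executed branch was taken. With use_history=False only
    slot 0 is used, which reduces this to a plain 1-bit predictor.
    """
    # pred[branch][slot] -> predicted outcome (True = taken)
    pred = {"b1": [False, False], "b2": [False, False]}
    last_taken = False          # assumption: initial global history
    wrong = 0
    for d in d_values:
        for name in ("b1", "b2"):
            if name == "b1":
                taken = d != 0  # branch skips `d = 1` when d != 0
                if not taken:
                    d = 1
            else:
                taken = d != 1  # second test sees the updated d
            slot = int(last_taken) if use_history else 0
            if pred[name][slot] != taken:
                wrong += 1
            pred[name][slot] = taken   # 1-bit predictor: remember last outcome
            last_taken = taken
    return wrong


d_sequence = [2, 0] * 3
print(run(d_sequence, use_history=False))
print(run(d_sequence, use_history=True))
```

With `d` alternating between 2 and 0, the plain 1-bit predictor is wrong on every branch, while the history-selected version is wrong only during the first pass.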
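In (m,n) notation, a predictor uses the behavior of the last m branches to choose among $2^m$ separate n-bit predictors for each branch. A two-bit predictor with no global history is therefore a (0,2) predictor.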
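To make the (m,n) notation concrete, here is a hedged Python sketch of a generic predictor with an m-bit global history register and n-bit saturating counters. The names `MNPredictor`, `predictor_bits`, and `count_wrong` are hypothetical, and indexing by `pc % entries` is a simplification.

```python
def predictor_bits(m, n, entries):
    """Storage in bits: 2^m counters of n bits for each table entry."""
    return (2 ** m) * n * entries


class MNPredictor:
    def __init__(self, m, n, entries):
        self.m = m
        self.entries = entries
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)
        self.history = 0   # last m branch outcomes, newest in bit 0
        self.table = [[0] * (1 << m) for _ in range(entries)]

    def predict(self, pc):
        return self.table[pc % self.entries][self.history] >= self.threshold

    def update(self, pc, taken):
        row = self.table[pc % self.entries]
        count = row[self.history]
        if taken:
            row[self.history] = min(count + 1, self.max_count)
        else:
            row[self.history] = max(count - 1, 0)
        # Shift the outcome into the global history, keeping m bits.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def count_wrong(predictor, pc, outcomes):
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


# Same storage budget, organized two ways.
print(predictor_bits(0, 2, 4096), predictor_bits(2, 2, 1024))
```

With m = 0 the history register is always 0 and each entry holds a single counter, so the class degenerates to the plain two-bit scheme.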
- Tournament predictors use multiple predictors, usually:
  - one based on global information
  - one based on local information
  - combined with a selector
- They do very well.
- They are a popular form of multilevel branch predictor (several levels of branch prediction tables used together with an algorithm for choosing among them).
- Existing ones use a 2-bit saturating counter per branch to choose between two different predictors (the four states of the counter dictate whether to use predictor 1 or predictor 2).
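The selector idea can be sketched as follows. This is an illustrative Python model, not a description of any real processor: the class names, table sizes, the choice of a PC-indexed 2-bit table as predictor 1 and a history-indexed 2-bit table as predictor 2, and the rule of moving the selector only when the two disagree are all assumptions.

```python
class TwoBit:
    """2-bit saturating counter; values 2 and 3 count as 'high'."""

    def __init__(self):
        self.value = 0

    def high(self):
        return self.value >= 2

    def bump(self, up):
        self.value = min(self.value + 1, 3) if up else max(self.value - 1, 0)


class Tournament:
    def __init__(self, entries=1024, history_bits=4):
        self.entries = entries
        self.hmask = (1 << history_bits) - 1
        self.history = 0
        self.local = [TwoBit() for _ in range(entries)]            # predictor 1
        self.glob = [TwoBit() for _ in range(1 << history_bits)]   # predictor 2
        self.selector = [TwoBit() for _ in range(entries)]         # high -> use predictor 2

    def predict(self, pc):
        i = pc % self.entries
        p1 = self.local[i].high()
        p2 = self.glob[self.history].high()
        return p2 if self.selector[i].high() else p1

    def update(self, pc, taken):
        i = pc % self.entries
        p1 = self.local[i].high()
        p2 = self.glob[self.history].high()
        if p1 != p2:
            # Move toward whichever predictor was right.
            self.selector[i].bump(p2 == taken)
        self.local[i].bump(taken)
        self.glob[self.history].bump(taken)
        self.history = ((self.history << 1) | int(taken)) & self.hmask


def count_wrong(predictor, pc, outcomes):
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


# An alternating branch defeats the local counter but not the global one.
print(count_wrong(Tournament(), 0x40, [True, False] * 50))
```

On the alternating pattern the selector drifts toward the history-based predictor after a short warm-up, which is the behavior the four-state counter is meant to capture.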
- In addition to predicting the branch outcome, we need to guess the target address.
- If the target address can be determined by the end of IF → zero branch penalty.
- BTB (branch target buffer): small cache that stores the predicted address of the next instruction after a branch.

Note:
- If we use a branch prediction table → accessed during ID → branches have a 1-cycle penalty.
- If we use a BTB → accessed during IF → branches have a 0-cycle penalty.
- During IF, we use the address of the instruction being fetched to access the BTB.
- If hit, we extract the address of the next instruction to fetch.
- Note: unlike the branch prediction buffer, the entry must be for this instruction; otherwise we would fetch the wrong instruction.
- Note: only predicted-taken branches need to be stored.
- Easiest scheme: store in the BTB only PC-relative conditional branches → target address is a constant.
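The lookup described above might be modeled like this. It is a minimal sketch under assumptions: a direct-mapped buffer, the full PC used as the tag, 4-byte instructions, and the hypothetical names `BranchTargetBuffer` and `next_fetch_address`.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB that stores only taken branches."""

    def __init__(self, entries=256):
        self.entries = entries
        self.tags = [None] * entries
        self.targets = [0] * entries

    def _index(self, pc):
        return (pc >> 2) % self.entries   # assumption: 4-byte instructions

    def lookup(self, pc):
        """Return the predicted target, or None on a miss."""
        i = self._index(pc)
        # The tag check ensures the entry belongs to this instruction.
        if self.tags[i] == pc:
            return self.targets[i]
        return None

    def update(self, pc, taken, target):
        i = self._index(pc)
        if taken:
            self.tags[i] = pc
            self.targets[i] = target
        elif self.tags[i] == pc:
            # Branch fell through: drop it, since only
            # predicted-taken branches are kept.
            self.tags[i] = None


def next_fetch_address(btb, pc):
    """Address to fetch after pc: the BTB target on a hit, else pc + 4."""
    target = btb.lookup(pc)
    return target if target is not None else pc + 4


btb = BranchTargetBuffer(entries=4)
btb.update(0x100, taken=True, target=0x200)
print(hex(next_fetch_address(btb, 0x100)))
print(hex(next_fetch_address(btb, 0x110)))
```

The second lookup maps to the same slot as the first but fails the tag check, so fetch simply continues sequentially.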
- No branch delay if the entry is found in the BTB and it is correct.
- There is some cost in updating the BTB in case of a misprediction or a wrong target.
- We do not try to update the BTB while fetching instructions → could not access it → best to stall 1 or 2 cycles.
- Can be combined with a branch prediction table to decide when to put entries in the BTB.
See figure 2.
- If the branch is not correctly predicted (not taken, or taken to the wrong target): 1 cycle to update the BTB + 1 cycle to restart fetching.
- If the branch is not found in the BTB and is taken: 2 cycles to update the BTB.
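As a rough worked example, the two penalty cases can be combined into an expected penalty per branch. The function name and the rates passed in below are made-up illustration values, not measurements from the notes; the sketch also assumes that a BTB hit with a correct prediction and a BTB miss on a not-taken branch both cost 0 cycles.

```python
def expected_branch_penalty(btb_hit_rate, prediction_accuracy, taken_fraction,
                            mispredict_cycles=2, miss_taken_cycles=2):
    """Average penalty in cycles per branch.

    mispredict_cycles: found in BTB but wrong (1 update + 1 restart).
    miss_taken_cycles: not found in BTB and taken (2 cycles of update).
    """
    hit_and_wrong = btb_hit_rate * (1.0 - prediction_accuracy)
    miss_and_taken = (1.0 - btb_hit_rate) * taken_fraction
    return hit_and_wrong * mispredict_cycles + miss_and_taken * miss_taken_cycles


# Hypothetical rates: 90% BTB hit rate, 90% accuracy on hits, 60% of branches taken.
print(expected_branch_penalty(0.90, 0.90, 0.60))
```

With these hypothetical rates the two terms contribute 0.18 and 0.12 cycles, about 0.30 cycles per branch in total.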
- Store the target instruction instead of the target address →
  - BTB access can take longer now (larger BTB)
  - can do "branch folding" (achieves zero-cycle unconditional branches and sometimes zero-cycle conditional branches)

[Figure: processor, cache, and BTB; target of an unconditional branch; zero-cost unconditional branch]
cache lines, etc)