









These notes cover various branch prediction techniques used in computer architecture, including static vs. dynamic prediction, bimodal prediction, 1-bit and 2-bit prediction, correlating predictors, local and global predictors, and tournament predictors. The document also discusses the importance of branch prediction in reducing control hazards and improving processor performance.
Typology: Study notes










Topics: bimodal, global, local branch prediction (Sections 2.3-2.6)
Static ILP:
+ The compiler finds parallelism -> no scoreboarding -> higher clock speeds and lower power
+ Compiler knows what is next -> better global schedule
- Compiler cannot react to dynamic events (cache misses)
- Cannot re-order instructions unless you provide hardware and extra instructions to detect violations (eats into the low complexity/power argument)
- Static branch prediction is poor -> even statically scheduled processors use hardware branch predictors
- Building an optimizing compiler is easier said than done
- A comparison of the Alpha, Pentium 4, and Itanium (statically scheduled IA-64 architecture) shows that the Itanium is not much better in terms of performance, clock speed or power
[Figure: 5-stage pipeline handling a branch. Labels: IF (br), PC, Reg Read, Compare, Br-target, PC + 4. A second version of the figure shows a Branch Predictor in place of PC + 4.]
In the 5-stage pipeline, a branch completes in two cycles
-> If the branch went the wrong way, one incorrect instr is fetched
-> One stall cycle per incorrect branch
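The one-cycle penalty can be turned into a rough CPI estimate. The sketch below assumes a base CPI of 1 and uses made-up values for the branch frequency and for the fraction of branches that go the wrong way; neither number comes from these notes, and `effective_cpi` is a name chosen only for this illustration.

```python
def effective_cpi(base_cpi, branch_fraction, wrong_fraction, penalty_cycles=1):
    """CPI after adding the stall cycles caused by wrongly fetched instructions.

    branch_fraction: fraction of instructions that are branches (assumed value)
    wrong_fraction:  fraction of branches that go the wrong way (assumed value)
    penalty_cycles:  stall cycles per incorrect branch (1 in the 5-stage pipeline)
    """
    return base_cpi + branch_fraction * wrong_fraction * penalty_cycles


# Hypothetical workload: 20% branches, 40% of them go the wrong way.
print(round(effective_cpi(1.0, 0.20, 0.40), 2))
```

Under these assumed numbers the stalls add 0.08 cycles per instruction; a better predictor lowers `wrong_fraction` and shrinks that term.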
For each branch, keep track of what happened last time and use that outcome as the prediction
What are prediction accuracies for branches 1 and 2 below:
    while (1) {
        for (i=0;i<10;i++) {
            branch- …
        }
        for (j=0;j<20;j++) {
            branch- …
        }
    }
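One way to check an answer is to simulate the last-outcome predictor. The sketch below assumes that branch 1 and branch 2 behave like loop-closing branches: each executes once per iteration and is taken on every iteration except the last. The notes do not spell this out, so treat it as one possible reading; the function names are hypothetical.

```python
def loop_outcomes(trip_count):
    """One pass of a loop-closing branch: taken trip_count-1 times, then not taken."""
    return [True] * (trip_count - 1) + [False]


def one_bit_accuracy(pattern, repeats=1000):
    """Steady-state accuracy of a 1-bit (last-outcome) predictor on a repeating pattern."""
    last = False              # arbitrary initial prediction
    correct = total = 0
    for r in range(repeats):
        for outcome in pattern:
            if r > 0:         # ignore the first pass so warm-up is not counted
                correct += (last == outcome)
                total += 1
            last = outcome    # remember what happened this time
    return correct / total


print(one_bit_accuracy(loop_outcomes(10)))
print(one_bit_accuracy(loop_outcomes(20)))
```

Under this assumption the predictor misses twice per pass (on the loop exit and again on re-entry), which gives 80% for the 10-iteration loop and 90% for the 20-iteration loop.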
For each branch, maintain a 2-bit saturating counter:
    if the branch is taken: counter = min(3,counter+1)
    if the branch is not taken: counter = max(0,counter-1)
If (counter >= 2), predict taken, else predict not taken
- Advantage: a few atypical branches will not influence the prediction (a better measure of "the common case")
- Especially useful when multiple branches share the same counter (some bits of the branch PC are used to index into the branch predictor)
- Can be easily extended to N bits (in most processors, N=2)
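The counter rules above can be sketched as a small table of 2-bit counters indexed by low bits of the branch PC. The table size, the initial counter value, and the 4-byte instruction alignment are assumptions made for this illustration, and `BimodalPredictor` is a hypothetical name.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by some bits of the branch PC."""

    def __init__(self, index_bits=10):            # table size is an assumption
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)   # start at 1 (assumed initial state)

    def _index(self, pc):
        # Drop the byte offset, assuming 4-byte instructions, then keep the low bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2    # True means "predict taken"

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

For a loop-closing branch that is taken 9 times and then not taken once, this predictor misses only the exit in steady state (90%), because a single not-taken outcome moves the counter from 3 to 2 and the prediction stays "taken". Branches whose PCs share the same low bits share a counter, which is the aliasing case mentioned above.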
Global predictor: a single register keeps track of recent history for all branches
[Figure: fields labeled "Branch PC", "8 bits" and "6 bits" form a 14-bit index into a table of 16K entries of 2-bit saturating counters]
Also referred to as a two-level predictor
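A minimal sketch of such a global (two-level) predictor follows. The figure labels do not survive well enough to say which field is 8 bits and which is 6, so the split used here (8 PC bits, 6 history bits) and the concatenation order are assumptions; only the 14-bit index and the 16K 2-bit counters come from the notes. `GlobalPredictor` is a hypothetical name.

```python
class GlobalPredictor:
    """One shared history register plus PC bits index a table of 2-bit counters."""

    def __init__(self, pc_bits=8, history_bits=6):    # 8/6 split is an assumption
        self.pc_bits = pc_bits
        self.history_bits = history_bits
        self.history = 0                              # shared by all branches
        self.counters = [1] * (1 << (pc_bits + history_bits))

    def _index(self, pc):
        pc_part = (pc >> 2) & ((1 << self.pc_bits) - 1)   # assumes 4-byte instructions
        return (pc_part << self.history_bits) | self.history

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the newest outcome into the global history register.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
```

With the defaults the table has 2^14 = 16K counters. A strictly alternating branch, which a plain per-PC counter handles poorly, becomes predictable here because the history bits separate the "just taken" and "just not taken" cases into different counters.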
Local predictor:
- Use 6 bits of branch PC to index into local history table: a table of 64 entries of 14-bit histories, each for a single branch (example history: 10110111011001)
- 14-bit history indexes into next level: a table of 16K entries of 2-bit saturating counters
Also a two-level predictor that only uses local histories at the first level
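The two levels of the local predictor can be sketched as follows. The table sizes match the notes (64 histories of 14 bits, 16K counters); the initial values and the 4-byte instruction alignment are assumptions, and `LocalPredictor` is a hypothetical name.

```python
class LocalPredictor:
    """Level 1: per-branch history table. Level 2: counters indexed by that history."""

    def __init__(self, pc_bits=6, history_bits=14):
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << pc_bits)        # 64 local histories
        self.counters = [1] * (1 << history_bits)    # 16K 2-bit counters

    def _slot(self, pc):
        return (pc >> 2) & self.pc_mask              # assumes 4-byte instructions

    def predict(self, pc):
        history = self.histories[self._slot(pc)]
        return self.counters[history] >= 2

    def update(self, pc, taken):
        s = self._slot(pc)
        h = self.histories[s]
        c = self.counters[h]
        self.counters[h] = min(3, c + 1) if taken else max(0, c - 1)
        # Record the outcome in this branch's own history only.
        self.histories[s] = ((h << 1) | int(taken)) & self.hist_mask
```

Because a 14-bit history is longer than a 10-iteration loop, the history just before the loop exit is distinct from every other point in the loop, so after warm-up a branch that is taken 9 times and then not taken once can be predicted correctly every time, including the exit.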
A local predictor might work well for some branches or programs, while a global predictor might work well for others
Provide one of each and maintain another predictor to identify which predictor is best for each branch
[Figure: Tournament Predictor. The Branch PC indexes a table of 2-bit saturating counters, which drives a MUX that selects between the Local Predictor and the Global Predictor.]
Alpha 21264:
- Local predictor: 1K entries in level-1, 1K entries in level-2
- Global predictor: 4K entries, 12-bit global history
- Tournament (chooser) table: 4K entries
Total capacity: ?
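A sketch of the selection mechanism and of the capacity arithmetic is given below. Several points are assumptions rather than statements from the notes: the chooser moves toward whichever component was right only when the two disagree, counter values 2 and 3 mean "trust global", the level-1 local histories are 10 bits wide (the width needed to index 1K level-2 entries), and the level-2 counter width is left as a parameter. `Always` is a stand-in predictor used only to exercise the chooser.

```python
class Always:
    """Stand-in component predictor that always gives the same answer."""
    def __init__(self, value):
        self.value = value
    def predict(self, pc):
        return self.value
    def update(self, pc, taken):
        pass


class Tournament:
    """2-bit chooser counters pick between a local and a global predictor."""

    def __init__(self, local, global_, chooser_bits=12):   # 4K chooser entries
        self.local, self.global_ = local, global_
        self.mask = (1 << chooser_bits) - 1
        self.chooser = [1] * (1 << chooser_bits)   # 0-1: use local, 2-3: use global

    def predict(self, pc):
        use_global = self.chooser[(pc >> 2) & self.mask] >= 2
        return self.global_.predict(pc) if use_global else self.local.predict(pc)

    def update(self, pc, taken):
        i = (pc >> 2) & self.mask
        local_ok = self.local.predict(pc) == taken
        global_ok = self.global_.predict(pc) == taken
        if global_ok and not local_ok:             # only global was right
            self.chooser[i] = min(3, self.chooser[i] + 1)
        elif local_ok and not global_ok:           # only local was right
            self.chooser[i] = max(0, self.chooser[i] - 1)
        self.local.update(pc, taken)
        self.global_.update(pc, taken)


def capacity_bits(local_counter_bits=2):
    """Storage for the sizes listed above, under the stated width assumptions."""
    local_level1 = 1024 * 10                   # assumed 10-bit local histories
    local_level2 = 1024 * local_counter_bits   # counter width is a parameter
    global_table = 4096 * 2                    # 2-bit counters
    chooser = 4096 * 2                         # 2-bit counters
    return local_level1 + local_level2 + global_table + chooser


print(capacity_bits() // 1024, "Kbits")
```

With 2-bit counters everywhere this arithmetic gives 28 Kbits. Published descriptions of the 21264 are usually quoted with 3-bit counters in the local level-2 table, which would give 29 Kbits; that detail is not stated in these notes.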
In addition to predicting the branch direction, we must also predict the branch target address
Branch PC indexes into a predictor table; indirect branches might be problematic
- Most common indirect branch: return from a procedure, which can be easily handled with a stack of return addresses
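The stack of return addresses can be sketched as below. The depth of 16, the 4-byte instruction size, and the policy of dropping the oldest entry on overflow are assumptions for this illustration, and `ReturnAddressStack` is a hypothetical name.

```python
class ReturnAddressStack:
    """Predicts return targets by pairing each return with the most recent call."""

    def __init__(self, depth=16):                  # depth is an assumption
        self.depth = depth
        self.stack = []

    def on_call(self, call_pc, instr_bytes=4):
        if len(self.stack) == self.depth:
            self.stack.pop(0)                      # overflow: drop the oldest entry
        self.stack.append(call_pc + instr_bytes)   # address after the call

    def predict_return(self):
        # Empty stack: no prediction available.
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack()
ras.on_call(0x100)          # outer call
ras.on_call(0x200)          # nested call
print(hex(ras.predict_return()), hex(ras.predict_return()))
```

Nested calls return in last-in, first-out order, so the inner return is predicted to go to 0x204 and the outer one to 0x104. Predictions are lost only when the call depth exceeds the stack depth.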