












Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
The basics of branch prediction, including the need to predict conditional branch outcomes and quickly determine branch target addresses. It explores dynamic branch prediction schemes, such as a branch-prediction buffer or a 1-bit prediction buffer, and their performance limitations. The document also introduces a 2-bit saturating counter and its state machine, as well as the importance of correlating branch predictors and using global and local branch history. Two-level adaptive branch prediction and other branch prediction strategies are also covered.
Typology: Study notes
1 / 20
This page cannot be seen from the preview
Don't miss anything!













© Copyright by Alaa Alameldeen and Haitham Akkary 2008
Portland State University ECE 587/
n Example: Comparing perfect branch prediction to 90%, 95%, 99% prediction accuracy, and to no branch prediction u Processor has a 20-stage 6-wide pipeline, incorrectly predicted branch leads to pipeline flush u Program can have an average of 4 instructions retire per cycle, has 100,000 conditional branches out of 1 million instructions u Perfect BP: IPC = 1,000,000/250,000 = 4. u 90% BP accuracy: 1/10 branches incorrectly predicted Ø IPC = 1,000,000/(250,000 + 0.1x100,000x20) = 2.22 (80% slower) u 95% BP accuracy: 1/20 branches incorrectly predicted Ø IPC = 1,000,000/(250,000 + 0.05x100,000x20) = 2.85 (40% slower) u 99% BP accuracy: 1/100 branches incorrectly predicted Ø IPC = 1,000,000/(250,000 + 0.01x100,000x20) = 3.70 (8% slower) u No BP: Fetch stalled until branch is resolved (5 pipeline stages) Ø IPC = 1,000,000/(250,000 + 100,000x5) = 1.33 (300% slower – 3x)
n Simplest dynamic branch prediction scheme uses a branch-prediction buffer or branch history table u Small memory indexed by the lower portion of the branch address u Stores previous branch outcomes to predict next outcome u Memory is not tagged u Prediction may have been put in the entry by a different branch
1K entries prediction buffer
n 1-bit prediction buffer stores the last executed branch outcome, and uses it to predict the next outcome u If bit = 1, branch is predicted taken u If bit = 0, branch is predicted not-taken
n A simple 1-bit scheme has a performance shortcoming u A series of branch outcomes and corresponding predictions : outcomes 1111011110111101 predictions 111101111011110 mispredictions 111 10 111 10 111 10
Predict taken “11”
Predict taken “10”
Predict not taken “01”
Predict not taken “00”
n Assuming initial state to be “11”, branch outcomes and corresponding predictions now look as follows:
outcomes 1111011110111101 states 333323333233332 predictions 111111111111111 mispredictions 111 1 1111 1 1111 1 1
n Two main structures u Branch History Register (BHR) or Branch History Table (BHT) u Pattern History Table (PHT) u Basic structure of the branch predictor: Yeh&Patt Figure 1 u Updating predictions using automatons: Yeh&Patt Figure 2
n Three different flavors u Global History Register and Global Pattern History Table (GAg) u Per-address Branch History Table and Global Pattern History Table (PAg) u Per-address Branch History Table and Per-address Pattern History Tables (PAp)
- DSUBUI R3,R1,# code example n Cost-effectiveness of three flavors u GAg has too much branch interference, needs long history u PAp needs lots of space for Per-address PHT u PAg is the most cost-effective
n Context switch u GAg almost unaffected u PAg, PAp degraded u Pros and cons for saving branch history on a context switch?
n McFarling’s Paper: u Bimodal Predictor: Figure 1 u PAg and GAg: Figure 4 and 6 u Global Predictor with Index Selection: Figure 8 u Global History with Index Sharing (GShare): Figure 10
n Using perceptrons instead of 2-bit saturating counters Ø Jimenez&Lin’s paper (skim, not in exam) Ø Provides higher prediction accuracy
n A cache that stores branch targets
n Accessed by the address of the instruction currently fetched
n Allows branch target to be read in the IF stage u When a branch is predicted taken, the fetch of the instruction at the branch target address can proceed immediately in the next cycle u Stall cycles that would have been needed to wait for the decoding of the branch and the computation of the target are saved
n A small address buffer organized as a stack
n When a Call is encountered the Return address (which is Call address + 4) is pushed onto the RAS
n When a Return instruction is encountered, the address from the top of the RAS is popped and used as the target
n Eric Rotenberg et al., “Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching,” MICRO 1996 (Review)
n Daniel Jimenez and Calvin Lin, “Dynamic Branch Prediction with Perceptrons,” HPCA 2001 (Skim)