Branch Prediction Techniques in Computer Architecture: ECE 511 Lecture 6, Study notes of Computer Architecture and Organization

These notes give an overview of branch prediction techniques used in computer architecture. The lecture covers direct and indirect branches, the role of the branch target buffer (BTB) and predictor tables, the 2-bit Smith predictor, the agreement predictor, the return address stack, and branch correlation. It also discusses the importance of high-bandwidth instruction fetch and the impact of branch mispredictions on performance.

Typology: Study notes

Pre 2010

Uploaded on 03/11/2009

koofers-user-jus


Lecture 6 ECE 511

September 15 2004

Recap

  • High bandwidth instruction fetch is very important
  • Some branches are very predictable
  • Address generator:

Direct Branches

  • jmp address: unconditional branch to an absolute address
  • bnz $17, +56: conditional branch to a PC-relative address
  • call address: unconditional branch to an absolute address

All information needed to resolve the branch is contained in the instruction.

Indirect Branches

  • ret: return from a function (all function returns are indirect branches)
  • jr $23: jump register, an unconditional branch to an absolute address in register $23
  • jalr $24: jump and link register, an unconditional branch to an absolute address in register $24

  • Information needed to resolve the branch is contained in the instruction and in a data register.

[Figure: address generator with PC, BTB (cache), +4 adder, MUX, and 2-bit Predictor Table A]

  • BTBs are excellent for direct branches, but less effective for indirect branches, whose targets can change from one execution to the next.
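The BTB behaviour described above can be sketched as a small direct-mapped cache. This is only an illustration: the class name `BTB`, the helper `next_fetch_pc`, the table size, the indexing scheme, and the 4-byte instruction width are all assumptions, and the taken/not-taken direction predictor is left out.

```python
class BTB:
    """Hypothetical direct-mapped branch target buffer (illustrative sizes)."""

    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        self.entries = [None] * num_entries  # each entry: (tag, target)

    def _index(self, pc):
        # Assumes 4-byte instructions, so the low two PC bits are dropped.
        return (pc >> 2) % self.num_entries

    def lookup(self, pc):
        """Return the cached target on a hit, or None on a miss."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None

    def update(self, pc, target):
        """Record the target this branch actually went to."""
        self.entries[self._index(pc)] = (pc, target)


def next_fetch_pc(btb, pc):
    """MUX: fetch from the BTB target on a hit, otherwise from PC + 4."""
    target = btb.lookup(pc)
    return target if target is not None else pc + 4


btb = BTB()

# Direct branch at 0x100: the target never changes, so the cached value is right.
btb.update(0x100, 0x200)
direct_prediction = next_fetch_pc(btb, 0x100)

# Indirect branch at 0x300: it went to 0x500 last time, but the register
# may hold a different address next time, so the cached target can be stale.
btb.update(0x300, 0x500)
indirect_prediction = next_fetch_pc(btb, 0x300)
```

If the indirect branch at `0x300` next jumps to, say, `0x600`, the BTB still supplies `0x500`, which is why a BTB alone works less well for indirect branches.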

2-bit Smith predictor

Right Bit

  • 0 if the most recent branch was not taken
  • 1 if the most recent branch was taken

Left Bit

  • 0 if the most recent adjacent pair of branches were both not taken
  • 1 if the most recent adjacent pair of branches were both taken

Mispredicted branches are in BOLD.

  • The outcome pattern that produces 100% (or thereabouts) mispredictions is much less likely to occur than the one for the simple 2-bit saturating predictor:
      Pattern for this predictor: NNTTNNTTNNTTNNTT...
      Pattern for the 2-bit saturating predictor: NTNTNTNTNTNTNTNTNT...
  • Unlike a 1-bit predictor, this 2-bit predictor will not mispredict the second iteration of a loop.
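The worst-case patterns above can be checked with a small simulation. This is a sketch under stated assumptions: the prediction is taken to be the left bit, the left bit changes only when two consecutive outcomes match, and the function names and initial states are chosen for illustration.

```python
def to_outcomes(pattern, repeats):
    """Turn a string such as "NNTT" into a list of 0 (not taken) / 1 (taken)."""
    return [1 if c == "T" else 0 for c in pattern] * repeats


def smith_mispredicts(outcomes, left=1, right=1):
    """Left/right-bit predictor from the notes; assumes the left bit is the prediction."""
    misses = 0
    for taken in outcomes:
        if left != taken:
            misses += 1
        if taken == right:      # two identical outcomes in a row
            left = taken        # the adjacent pair sets the left bit
        right = taken           # the right bit is the most recent outcome
    return misses


def saturating_mispredicts(outcomes, counter=2):
    """Simple 2-bit saturating counter; predicts taken when counter >= 2."""
    misses = 0
    for taken in outcomes:
        predicted = 1 if counter >= 2 else 0
        if predicted != taken:
            misses += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


def one_bit_mispredicts(outcomes, last=1):
    """1-bit predictor: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if last != taken:
            misses += 1
        last = taken
    return misses


# Worst case for the left/right-bit predictor: every branch mispredicted.
smith_worst = smith_mispredicts(to_outcomes("NNTT", 4))          # 16 of 16
# Worst case for the saturating counter (starting weakly taken).
saturating_worst = saturating_mispredicts(to_outcomes("NT", 8))  # 16 of 16
# A loop branch taken three times and then not taken, run four times.
loop = to_outcomes("TTTN", 4)
loop_smith = smith_mispredicts(loop)       # 4: only the loop exits
loop_one_bit = one_bit_mispredicts(loop)   # 7: exits plus re-entries
```

The results depend on the assumed starting state, which matches the "100% or thereabouts" wording: from a different initial state the first few predictions can differ.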

Agreement Predictor

  • The BTB returns a hit if the branch was taken recently; otherwise it returns a miss.
  • The predictor returns an agree or a disagree signal.
  • For example: consider a predictor table with only 4 entries.

[Figure: diagram with states 11, 10, 01, and 00, transitions labeled t and nt, and labels TT and NN]
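One way to combine the two signals is sketched below. The notes do not spell out the combining rule or how the table is updated, so this is an assumption: "agree" means the final prediction follows the BTB (taken on a hit, not taken on a miss), and each of the 4 entries is a 2-bit counter that moves toward "agree" whenever the BTB's hit/miss matched the real outcome. The class name and indexing are hypothetical.

```python
class AgreementPredictor:
    """Hypothetical 4-entry agree/disagree table of 2-bit counters."""

    def __init__(self, entries=4):
        # Start every entry at "strongly agree" (an arbitrary choice).
        self.counters = [3] * entries

    def _index(self, pc):
        # Assumes 4-byte instructions; low PC bits select the entry.
        return (pc >> 2) % len(self.counters)

    def predict(self, pc, btb_hit):
        """Return True for predicted taken, False for predicted not taken."""
        agree = self.counters[self._index(pc)] >= 2
        return btb_hit if agree else not btb_hit

    def update(self, pc, btb_hit, taken):
        """Strengthen 'agree' if the BTB signal matched the outcome, else weaken it."""
        i = self._index(pc)
        if taken == btb_hit:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


predictor = AgreementPredictor()
first = predictor.predict(0x40, btb_hit=True)    # agree + hit: predict taken

# The branch keeps hitting in the BTB but is not taken, so the entry
# drifts toward "disagree" and the final prediction flips.
predictor.update(0x40, btb_hit=True, taken=False)
predictor.update(0x40, btb_hit=True, taken=False)
later = predictor.predict(0x40, btb_hit=True)    # disagree + hit: predict not taken
```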

… must really be a stack. Return addresses are pushed by function calls and popped by function returns.

NOTE: After a context switch, the BTB, predictor, and RA stack all contain totally bogus values, so branch prediction is very inaccurate immediately after a context switch.

  • We need feedback from instruction decode to determine which branches are function calls/returns, so that we can control pushes and pops to the RA stack. This makes things slow.
  • Alternatively, we can store a couple of flags in the BTB to indicate which branch addresses are calls, returns, or other branches. The BTB can then generate the control signals for the stack without waiting for feedback from the Instruction Decode pipeline stage.
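The flag-in-BTB scheme can be sketched as follows. The names (`ReturnAddressStack`, `predict_next_pc`, the `CALL`/`RETURN`/`OTHER` flags), the stack depth, the overflow policy, and the use of a plain dictionary as the BTB are assumptions for illustration; the return address is taken to be PC + 4, which assumes 4-byte instructions and no delay slot.

```python
CALL, RETURN, OTHER = "call", "return", "other"


class ReturnAddressStack:
    """Hypothetical fixed-depth return address (RA) stack."""

    def __init__(self, depth=8):
        self.depth = depth
        self.stack = []

    def push(self, return_addr):
        if len(self.stack) == self.depth:
            self.stack.pop(0)   # on overflow, drop the oldest entry (assumption)
        self.stack.append(return_addr)

    def pop(self):
        return self.stack.pop() if self.stack else None


def predict_next_pc(pc, btb, ras):
    """btb maps a branch PC to (flag, target); the flag drives the RA stack."""
    entry = btb.get(pc)
    if entry is None:
        return pc + 4           # BTB miss: fall through
    flag, target = entry
    if flag == CALL:
        ras.push(pc + 4)        # remember where the call should return to
        return target
    if flag == RETURN:
        predicted = ras.pop()   # the BTB target is ignored for returns
        return predicted if predicted is not None else pc + 4
    return target


# One function at 0x800 (returning at 0x820) called from two different sites.
btb = {
    0x100: (CALL, 0x800),
    0x200: (CALL, 0x800),
    0x820: (RETURN, None),
}
ras = ReturnAddressStack()
trace = [predict_next_pc(pc, btb, ras) for pc in (0x100, 0x820, 0x200, 0x820)]
```

The same return instruction at `0x820` is predicted to go to `0x104` the first time and `0x204` the second, which a single cached BTB target could not do.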

Branch Correlation

  • For example:
      printf("%d", my_int)
      printf("%e", my_float)
    ➔ the code to convert/print an int is very different from the code to convert/print a float
  • Certain calls to printf will invoke a certain set of subroutines: correlated branches. The branch associated with printing an int [printf("%d", ...)] is correlated with an internal branch [if (format == "%d")].

[Figure: address generator with PC, BTB (cache), +4 adder, MUX, Agreement Predictor, Control Logic, and Return Address Stack]
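The effect of correlation can be illustrated with a toy model. This section of the notes does not describe a mechanism, so everything here is an assumption: branch B (the internal format check) always goes the same way as an earlier branch A (the call site), and the predictor for B is either a single 2-bit counter or a pair of counters selected by A's outcome. The function name and outcome sequence are hypothetical.

```python
def mispredicts_for_b(outcomes_a, use_history):
    """Count mispredictions of branch B, whose outcome always equals branch A's.

    With use_history=True, A's outcome selects which 2-bit counter predicts B.
    With use_history=False, a single 2-bit counter predicts B on its own.
    """
    counters = [1, 1]           # two 2-bit saturating counters, weakly not taken
    misses = 0
    for a in outcomes_a:
        b = a                   # B is perfectly correlated with A
        h = a if use_history else 0
        predicted = 1 if counters[h] >= 2 else 0
        if predicted != b:
            misses += 1
        counters[h] = min(3, counters[h] + 1) if b else max(0, counters[h] - 1)
    return misses


# An irregular but deterministic sequence of outcomes for branch A.
sequence = [1, 0, 0, 1, 1, 0, 1, 0] * 4

with_history = mispredicts_for_b(sequence, use_history=True)
without_history = mispredicts_for_b(sequence, use_history=False)
```

In this toy setup the history-selected counters mispredict only while warming up, whereas the single counter keeps mispredicting because B looks irregular when viewed in isolation.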