






This assignment covers two designs for alleviating the effect of branches in computer architecture: one using compile-time scheduling with delay slots and no branch prediction, and the other employing branch prediction without delay slots. It gives equations for the increase in CPI of each design based on the probability of misprediction. It also discusses the use of a branch prediction buffer and the impact of global branch prediction on performance.
Typology: Assignments







What prediction accuracy is required in the second design to achieve the same performance as the first design?
From a., we know that 10% of the branch instructions result in two pipeline bubbles while 60% result in a one-cycle bubble. We can compute the increase in CPI for each design (the probability that an instruction is a branch is the same in both cases, so it cancels when the two are equated):
The increase in CPI due to a. is prob_instr_is_a_branch * (0.6 * 1 + 0.1 * 2) = prob_instr_is_a_branch * 0.8
The increase in CPI due to b. is prob_instr_is_a_branch * (p * 3) = prob_instr_is_a_branch * 3p, where p is the probability of misprediction.
Equating the two expressions gives the critical value of p. The required prediction accuracy is (1 - p).
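The break-even point can be checked numerically. The following is a minimal sketch using only the figures stated above (60% of branches cost one bubble, 10% cost two, and a misprediction costs three cycles); the variable names are illustrative and not part of the original solution.

```python
# Per-branch penalty of design a.: 60% of branches cost 1 bubble, 10% cost 2.
penalty_a = 0.6 * 1 + 0.1 * 2

# Design b.: every mispredicted branch costs 3 cycles, so the penalty is 3 * p.
MISPREDICT_CYCLES = 3

# Equate the two penalties (the branch frequency cancels) and solve for p.
p_critical = penalty_a / MISPREDICT_CYCLES
required_accuracy = 1 - p_critical

print(f"critical misprediction rate p = {p_critical:.3f}")
print(f"required prediction accuracy  = {required_accuracy:.3f}")
```

Under these figures the critical misprediction rate is roughly 0.267, so the second design needs a prediction accuracy of about 73.3% to match the first.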
Address   Instruction
0x0038    L3: ..
0x003C    ..
0x0040    DSUBUI R3 R1 #
0x0044    BNEZ R3 L
0x0048    DADD R1 R0 R
0x004C    L1: DSUBUI R3 R2 #
0x0050    BNEZ R3 L
0x0054    DADD R2 R0 R
0x0058    L2: DSUBUI R3 R1 R
0x005C    BEQZ R3 L
a. Considering only the preceding code, how many entries should the branch prediction buffer have to avoid the possibility of aliasing of branch addresses?
b. If all prediction buffer entries were initialized to 0, what can be the value of the counters in the prediction buffer corresponding to these two branch instructions?
The minimum number of least significant bits to ensure no aliasing for these addresses is 4, hence the branch prediction buffer would need 2^4 = 16 entries.
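The 4-bit answer can be verified by brute force. This sketch assumes, as the answer above does, that the buffer is indexed by the low-order bits of the byte address (including the two bits that are always zero for word-aligned instructions). The helper name min_index_bits is hypothetical.

```python
def min_index_bits(addresses, max_bits=16):
    """Return the smallest number of low-order address bits that maps
    every address in `addresses` to a distinct buffer index."""
    for bits in range(max_bits + 1):
        mask = (1 << bits) - 1
        indices = {addr & mask for addr in addresses}
        if len(indices) == len(addresses):
            return bits
    raise ValueError("addresses cannot be separated within max_bits")


# Addresses of the three branches in the listing: BNEZ, BNEZ, BEQZ.
branch_addresses = [0x0044, 0x0050, 0x005C]

bits = min_index_bits(branch_addresses)
print(bits, 2 ** bits)  # prints "4 16"
```

With three low-order bits, 0x0044 and 0x005C both map to index 4, which is why three bits are not enough.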
c. Now consider the case where we use a global branch predictor with a 3-bit global history. Execution of the 13th iteration is about to start. Provide an example of i) a feasible 3-bit global branch history value, and ii) an infeasible global branch history value. Ensure you clearly identify the entries in the branch history with the branch instructions in the code sequence.
The first two branches test for equality between two numbers, N1 & N2, with the number
c. Now consider the occurrence of load delay slots, where loads occur with a probability of 24%, and 40% of these fetch data used by the immediately following instruction. If we perform no instruction scheduling to fill delay slots, what is the increase in CPI compared to the original pipeline (i.e., without pipelining the memory system)?
The load stalls are now three cycles rather than one.
With no instruction scheduling we have 0.24 * 0.4 * 3 = 0.288.
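The stall contribution is the product of three factors: the fraction of instructions that are loads, the fraction of those loads whose result is used by the next instruction, and the stall length. A small sketch of that arithmetic follows; the function name load_stall_cpi is hypothetical.

```python
def load_stall_cpi(load_fraction, dependent_fraction, stall_cycles):
    """CPI added by load-use stalls when no delay slots are filled."""
    return load_fraction * dependent_fraction * stall_cycles


# Figures from the question: 24% loads, 40% of them feed the next
# instruction, and each such load now stalls for three cycles.
three_cycle = load_stall_cpi(0.24, 0.40, 3)

# The same loads with a one-cycle stall, for comparison.
one_cycle = load_stall_cpi(0.24, 0.40, 1)

print(f"{three_cycle:.3f} {one_cycle:.3f}")  # prints "0.288 0.096"
```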
a. Show a valid state of a 4-entry ROB when instruction 7 is issued. Identify the head and tail of the ROB.
Destination   Value      Status
F6            NO VALUE   PENDING
0(R1)         NO VALUE   PENDING
F8            NO VALUE   PENDING
F4            41         COMPLETED
b. Register re-mapping is employed where architecture registers are remapped to physical registers (PR). F6 in instruction 3 is remapped on issue to PR 9. When the DIV instruction reaches the head of the ROB can PR 9 be freed? Justify your answer.
Yes. When the DIV.D reaches the head of the ROB, all instructions prior to it have committed and all instructions that used the mapped register for F6 have completed. Therefore, PR 9 can be freed.
With an ROB: Exceptions for an executing instruction are flagged in its ROB entry, but not raised. The processor raises an exception associated with an instruction when that instruction reaches the head of the ROB. Since instructions in the ROB are allotted entries in program order, they are committed in program order. Instructions fetched speculatively on a mispredicted branch are never committed. Therefore, all exceptions are precise.
With an ROB, register renaming does not affect handling of precise exceptions. This is because register renaming does not affect how the instructions commit (always in program order) and exceptions can be raised only at commit time.
In the case of a history buffer, instructions are allocated history buffer entries that contain the old value (history) of the register being written. If an exception occurs, the corresponding history buffer entry is marked. When the excepting instruction reaches the head of the history buffer, the history buffer is scanned from head to tail and all old values are restored using those in the history buffer. This is needed because instructions write directly to the register file.
Multiple consecutive instructions starting at the current head of the ROB must have completed execution and writeback of results (to their respective entries in the ROB). All these instructions can be committed in a single cycle. However, structural limitations (such as the number of write ports to the register file and the bandwidth to memory for committing stores) put hard limits on how many of the instructions at the head of the ROB may commit simultaneously.
Note that if multiple consecutive instructions write to the same destination register or memory location, the commit hardware can still commit them in the same cycle by ensuring that the value written to the destination register or memory location comes from the result of the last committed instruction that wrote that register or location. Finally, all instructions are logically committed in program order.
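The commit rules described above can be illustrated with a small software model. This is only a sketch under simplifying assumptions: it handles register destinations only (no stores), models the structural limit as a single commit_width parameter, and uses hypothetical names (RobEntry, commit_cycle) that do not come from the original solution.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    dest: str                    # architectural destination register
    value: Optional[int] = None  # result, valid once completed
    completed: bool = False      # execution and writeback to the ROB done
    exception: bool = False      # flagged during execution, not yet raised


def commit_cycle(rob, regfile, commit_width):
    """Commit up to commit_width completed entries from the ROB head,
    in program order. Returns (committed destinations, faulting dest)."""
    committed = []
    while rob and len(committed) < commit_width:
        head = rob[0]
        if not head.completed:
            break                 # younger entries must wait for the head
        if head.exception:
            faulting = head.dest
            rob.clear()           # squash the faulting and younger entries
            return committed, faulting
        regfile[head.dest] = head.value  # a later write to the same register wins
        committed.append(head.dest)
        rob.popleft()
    return committed, None


rob = deque([
    RobEntry("F2", 1, completed=True),
    RobEntry("F2", 5, completed=True),   # same destination, younger
    RobEntry("F4", None, completed=False),
    RobEntry("F6", 9, completed=True),   # done, but behind a pending entry
])
regfile = {}
print(commit_cycle(rob, regfile, commit_width=4), regfile)
# prints "(['F2', 'F2'], None) {'F2': 5}"
```

In this model, the exception is acted on only when the flagged entry is at the head, so every older instruction has already updated the register file and no younger one has, which is the precise-exception property described above.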