












































































How ILP is achieved by instruction pipelines













































































- Assume time for stages is
  - 100ps for register read or write
  - 200ps for other stages
- Compare pipelined datapath with single-cycle datapath

| Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time |
|----------|-------------|---------------|--------|---------------|----------------|------------|
| lw       | 200ps       | 100ps         | 200ps  | 200ps         | 100ps          | 800ps      |
| sw       | 200ps       | 100ps         | 200ps  | 200ps         |                | 700ps      |
| R-format | 200ps       | 100ps         | 200ps  |               | 100ps          | 600ps      |
| beq      | 200ps       | 100ps         | 200ps  |               |                | 500ps      |
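The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch: the stage names and the `USES` mapping are illustrative labels for the table's columns, not part of the original slides. It assumes the single-cycle clock must fit the slowest instruction and the pipelined clock must fit the slowest stage.

```python
# Stage latencies in picoseconds, taken from the table above.
STAGE_PS = {"IF": 200, "REG_READ": 100, "ALU": 200, "MEM": 200, "REG_WRITE": 100}

# Stages used by each instruction class (the non-empty columns of the table).
USES = {
    "lw": ["IF", "REG_READ", "ALU", "MEM", "REG_WRITE"],
    "sw": ["IF", "REG_READ", "ALU", "MEM"],
    "R-format": ["IF", "REG_READ", "ALU", "REG_WRITE"],
    "beq": ["IF", "REG_READ", "ALU"],
}

def total_time(instr):
    """Sum the latencies of the stages this instruction class uses."""
    return sum(STAGE_PS[stage] for stage in USES[instr])

totals = {instr: total_time(instr) for instr in USES}

# Single-cycle datapath: one clock period must cover the slowest instruction.
single_cycle_clock_ps = max(totals.values())

# Pipelined datapath: one clock period must cover the slowest stage.
pipelined_clock_ps = max(STAGE_PS.values())

print(totals, single_cycle_clock_ps, pipelined_clock_ps)
```

Under these assumptions the single-cycle clock is 800ps and the pipelined clock is 200ps, so the steady-state gain is a factor of 4 rather than 5, because the stages are not balanced.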
- If all stages are balanced, i.e., all take the same time:
  - $\text{Time between instructions}_{\text{pipelined}} = \dfrac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of stages}}$
- Latency (time for each instruction) does not decrease
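To see why throughput improves while latency does not, total execution time can be sketched for a run of n instructions. The function names are illustrative, and the defaults assume a 5-stage pipeline with a 200ps clock against an 800ps single-cycle clock.

```python
def single_cycle_time_ps(n, clock_ps=800):
    """Each instruction occupies one full (long) clock cycle."""
    return n * clock_ps

def pipelined_time_ps(n, stages=5, clock_ps=200):
    """The first instruction needs `stages` cycles to drain through the
    pipeline; every later instruction completes one cycle after the last."""
    return (stages + n - 1) * clock_ps

def speedup(n):
    return single_cycle_time_ps(n) / pipelined_time_ps(n)

# Latency of one instruction: 1000ps pipelined vs 800ps single-cycle.
print(pipelined_time_ps(1), single_cycle_time_ps(1))

# Three instructions: 1400ps pipelined vs 2400ps single-cycle.
print(pipelined_time_ps(3), single_cycle_time_ps(3))

# For long runs the speedup approaches 800 / 200 = 4.
print(speedup(1_000_000))
```

With these numbers a single instruction actually takes longer in the pipeline (five 200ps cycles), and the gain comes entirely from overlapping instructions.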
- All instructions are 32 bits
  - Easier to fetch and decode in one cycle
  - cf. x86: 1- to 17-byte instructions
- Few and regular instruction formats
  - Can decode and read registers in one step
- Load/store addressing
  - Can calculate address in 3rd stage, access memory in 4th stage
- Alignment of memory operands
  - Memory access takes only one cycle
- Load/store requires data access
  - Instruction fetch would have to stall for that cycle
  - Would cause a pipeline “bubble”
- Or separate instruction/data caches
- add $s0, $t0, $t
  sub $t2, $s0, $t
- If value not computed when needed
  - Can’t forward backward in time!
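The number of stall cycles a dependent instruction needs can be sketched from stage numbers alone. This assumes the classic 5-stage numbering (IF=1, ID=2, EX=3, MEM=4, WB=5). It also assumes a register written in WB can be read in ID during the same cycle. The function names are illustrative.

```python
ID, EX, MEM, WB = 2, 3, 4, 5

def stalls_with_forwarding(producer_is_load, distance=1):
    """Stalls when results are forwarded straight to the consumer's EX stage.

    An ALU result exists at the end of EX; a loaded value only at the end
    of MEM. `distance` is how many instructions after the producer the
    consumer is issued (1 = the very next instruction).
    """
    ready_stage = MEM if producer_is_load else EX
    return max(0, ready_stage + 1 - (EX + distance))

def stalls_without_forwarding(distance=1):
    """Stalls when the consumer must read the register file in ID,
    no earlier than the cycle in which the producer writes it in WB."""
    return max(0, WB - (ID + distance))

print(stalls_without_forwarding())                     # ALU result, no forwarding
print(stalls_with_forwarding(producer_is_load=False))  # ALU result, forwarded
print(stalls_with_forwarding(producer_is_load=True))   # load-use case
```

Forwarding removes the stalls after an ALU instruction, but a load followed immediately by a use still costs one cycle: the value does not exist until the end of MEM, which is after the consumer's EX stage would have started.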
Code Scheduling to Avoid Stalls
Original order (13 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    (stall)
    add $t3, $t1, $t
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    (stall)
    add $t5, $t1, $t
    sw  $t5, 16($t0)

Reordered (11 cycles):
    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t
    sw  $t3, 12($t0)
    add $t5, $t1, $t
    sw  $t5, 16($t0)
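The cycle counts can be reproduced with a small load-use stall counter. This is a sketch under stated assumptions: a 5-stage pipeline with forwarding, where the only stall is one cycle when an instruction uses the result of the load immediately before it. The last operand of each add is truncated in the listing, so the code assumes `$t2` for the first add and `$t4` for the second; those two operands are guesses made only for illustration.

```python
# Each instruction is (opcode, destination register or None, source registers).
# ASSUMPTION: the adds read $t2 and $t4 (their last operand is cut off in the text).
ORIGINAL = [
    ("lw",  "$t1", ["$t0"]),
    ("lw",  "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw",  None,  ["$t3", "$t0"]),
    ("lw",  "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw",  None,  ["$t5", "$t0"]),
]

# Same instructions with the third load hoisted above the first add.
REORDERED = [ORIGINAL[i] for i in (0, 1, 4, 2, 3, 5, 6)]

def load_use_stalls(program):
    """Count instructions that read the destination of the load just before them."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "lw" and prev_dest in cur_srcs:
            stalls += 1
    return stalls

def total_cycles(program, stages=5):
    """Cycles to fill the pipeline, plus one per instruction, plus stalls."""
    return (stages - 1) + len(program) + load_use_stalls(program)

print(total_cycles(ORIGINAL), total_cycles(REORDERED))
```

Hoisting the third load separates each load from its first use by at least one instruction, so both stalls disappear.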
- Stall penalty becomes unacceptable
- Only stall if prediction is wrong
- Can predict branches not taken
- Fetch instruction after branch, with no delay
More-Realistic Branch Prediction
- Static branch prediction
  - Based on typical branch behavior
  - Example: loop and if-statement branches
    - Predict backward branches taken
    - Predict forward branches not taken
- Dynamic branch prediction
  - Hardware measures actual branch behavior
    - e.g., record recent history of each branch
  - Assume future behavior will continue the trend
    - When wrong, stall while re-fetching, and update history
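Both schemes can be sketched in a few lines. The static rule below compares the branch address with its target. The dynamic predictor is the simplest possible history scheme: one bit per branch that remembers the last outcome. Real hardware typically uses richer schemes. The class name, function names, and addresses here are illustrative assumptions.

```python
def static_predict(branch_pc, target_pc):
    """Backward branches (typical loops) are predicted taken,
    forward branches (typical if-statements) not taken."""
    return target_pc < branch_pc

class OneBitPredictor:
    """Predicts that each branch repeats its most recent outcome."""

    def __init__(self):
        self.last_outcome = {}  # branch address -> True if last taken

    def predict(self, pc):
        # A branch with no recorded history is assumed not taken.
        return self.last_outcome.get(pc, False)

    def update(self, pc, taken):
        self.last_outcome[pc] = taken

def count_mispredictions(pc, outcomes, predictor):
    """Replay a sequence of actual outcomes and count wrong predictions."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong

# A loop branch that is taken nine times and then falls through.
loop_outcomes = [True] * 9 + [False]
print(count_mispredictions(0x40, loop_outcomes, OneBitPredictor()))
```

For this loop the one-bit scheme mispredicts twice: once on the first iteration, before any history exists, and once on the final fall-through.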
- Executes multiple instructions in parallel
- Each instruction has the same latency
- Hazards: structure, data, control
The BIG Picture
- Pipeline registers between stages
  - To hold information produced in previous cycle
- “Single-clock-cycle” pipeline diagram
  - Shows pipeline usage in a single cycle
  - Highlight resources used
- cf. “multi-clock-cycle” diagram
  - Graph of operation over time
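The two diagram styles are two views of the same schedule, which a short sketch can make concrete. It assumes an ideal 5-stage pipeline with no stalls. The instruction names are placeholders, not the sequence used in the original figures.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def multi_cycle_diagram(instrs):
    """One row per instruction, one column per clock cycle.
    Instruction i enters IF in cycle i + 1 and advances one stage per cycle."""
    n_cycles = len(instrs) + len(STAGES) - 1
    rows = []
    for i, _ in enumerate(instrs):
        cells = ["."] * n_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append(cells)
    return rows

def single_cycle_snapshot(instrs, cycle):
    """Which instruction occupies each stage during one cycle (1-based)."""
    snapshot = {}
    for i, name in enumerate(instrs):
        s = cycle - 1 - i
        if 0 <= s < len(STAGES):
            snapshot[STAGES[s]] = name
    return snapshot

program = ["lw", "sub", "add"]  # placeholder instruction names

# Operation over time: each row is shifted one cycle to the right.
for name, row in zip(program, multi_cycle_diagram(program)):
    print(f"{name:4}", " ".join(f"{cell:3}" for cell in row))

# Pipeline usage in cycle 3: a vertical slice through the rows above.
print(single_cycle_snapshot(program, 3))
```

The snapshot for a given cycle is a vertical slice through the multi-cycle grid, which is why the single-clock-cycle view can show which hardware resources are busy at that moment.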