



Material Type: Notes; Professor: Johnson; Class: COMPUTER ARCHITECTURE; Subject: Electrical and Computer Engineering ; University: Oklahoma State University - Stillwater; Term: Spring 2008;
Typology: Study notes




Operand Forwarding. Note that the pipeline reads an instruction's source registers from the register file before the result of the previous instruction has been written into it. Thus,
    ADD R1,R2,R
    SUB R4,R3,R
the above instruction sequence has a RAW hazard because R1 has not yet received the result of the preceding ADD when it is read as a source for the SUB instruction. The following methods can be used to ensure that the hazard is avoided and only valid register data are used.
Let us first consider operand forwarding with just the pipelined register and ALU. The new register contents will be available in the D-register just when the source operands for the next instruction need to be read. If the next instruction needs the modified register contents, then the data can be provided from the D-register rather than from the A-register or B-register. Note that there is a connection from the D-register to both ALU inputs so that the D-register can provide either or both of the operands to the next instruction. The timing of the two-instruction example is shown below. Control logic needs to be provided for the multiplexers so that the D-register is selected as an ALU input when appropriate.
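The multiplexer control described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit from the notes: the function name, its parameters, and the register numbers in the usage example are assumptions made for the sketch.

```python
def select_alu_inputs(a_reg, b_reg, d_reg, src_a, src_b, prev_dest):
    """Return the (left, right) ALU operands for the current instruction.

    a_reg, b_reg: values read from the register file (possibly stale).
    d_reg:        result of the previous instruction, not yet written back.
    src_a, src_b: source register numbers of the current instruction.
    prev_dest:    destination register number of the previous instruction,
                  or None if it writes no register (hypothetical encoding).
    """
    # Each multiplexer picks the D-register only when its source register
    # matches the destination of the previous instruction.
    left = d_reg if prev_dest is not None and src_a == prev_dest else a_reg
    right = d_reg if prev_dest is not None and src_b == prev_dest else b_reg
    return left, right


# Previous instruction wrote register 1; the current one reads registers 3 and 1.
operands = select_alu_inputs(a_reg=7, b_reg=0, d_reg=42,
                             src_a=3, src_b=1, prev_dest=1)
```

In this example the left operand comes from the A-register and the right operand is forwarded from the D-register, so the stale value read from the register file is never used.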
Operand forwarding is more complex for a longer pipeline such as the 5-stage MIPS. In this case, RAW hazards can occur between any 5 consecutive instructions (see fig. 6.28, p. 405).
For ALU instructions, operand forwarding can provide the necessary operands by forwarding them from one of the pipeline registers (see fig. 6.29, p. 408). Note that the last two instructions really do not cause a hazard since the correct operands are already in the register file when it is read by these instructions.
A close inspection of fig. 6.29 shows that ALU operands may be forwarded from either the EX/MEM pipeline register (the second instruction) or the MEM/WB register (the third instruction). Multiplexers are added to the ALU inputs to select between the normal input (ID/EX pipeline register) and the two forwarding locations as shown in fig. 6.30, p.
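As a rough sketch of the selection made by the ALU input multiplexers, the function below chooses among the three sources named above for one ALU operand. The function and parameter names are hypothetical. Two details are assumptions taken from the usual MIPS convention rather than from these notes: the more recent result (EX/MEM) takes priority over MEM/WB, and register 0 is never forwarded.

```python
def forward_select(src, ex_mem_rd, ex_mem_reg_write, mem_wb_rd, mem_wb_reg_write):
    """Pick the source of one ALU operand.

    src:              source register number of the instruction in EX.
    ex_mem_rd:        destination register held in the EX/MEM register.
    ex_mem_reg_write: True if that instruction writes the register file.
    mem_wb_rd, mem_wb_reg_write: the same for the MEM/WB register.
    """
    # The instruction one ahead holds the most recent value (assumed priority).
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"
    # Otherwise the instruction two ahead may hold it.
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"
    # No hazard: use the value read from the register file in ID.
    return "ID/EX"


# Both older instructions wrote register 2; the newer result is chosen.
choice = forward_select(2, ex_mem_rd=2, ex_mem_reg_write=True,
                        mem_wb_rd=2, mem_wb_reg_write=True)
```

The same function would be evaluated once per ALU input, since each input has its own multiplexer.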
So far we have taken care of only RAW data hazards. What about WAR and WAW hazards? In the MIPS pipeline, operands are read in the ID stage near the beginning of the pipeline and results are written in the WB stage at the end of the pipeline. Since writing is done after the operands of all previous instructions have been read, this eliminates the WAR hazard. Also, since instruction results are written back in the same order that instructions are fetched, this eliminates WAW hazards.
also produces a RAW hazard that cannot be eliminated with operand forwarding. Note that a store followed by another instruction does not cause a hazard because a store instruction does not modify the register file.
A stall (the “bubble” in fig. 6.35, p. 415) avoids the RAW hazard by delaying the instructions that follow the load instruction in the pipeline. The delayed instructions can then use operand forwarding to avoid the hazard. Operand forwarding avoids the need to stall for an additional clock cycle, but it cannot prevent the single-cycle stall.
Controlling Stalls. Since the instruction after a load with a RAW hazard (the “and” instruction in fig. 6.34) must stall before going into the EX pipeline stage, the pipeline controller must detect the stall while the instruction after the load is in ID and the load is in EX. The stall can be detected by comparing the source operands for the instruction following the load (in IF/ID) with the destination operand for the load (in ID/EX). This is done by the hazard detection unit in fig. 6.36, p. 416.
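The comparison performed by the hazard detection unit can be written as a one-line condition. The sketch below is a minimal model under stated assumptions: the signal names (`id_ex_mem_read`, `id_ex_rt`, `if_id_rs`, `if_id_rt`) are illustrative, and a load is assumed to be identified by a memory-read control bit held in ID/EX.

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the instruction in ID must stall for one cycle.

    id_ex_mem_read:     True if the instruction in EX is a load.
    id_ex_rt:           destination register of that load.
    if_id_rs, if_id_rt: source registers of the instruction in ID.
    """
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


# lw $2, 20($1) is in EX while and $4, $2, $5 is in ID.
stall = load_use_stall(id_ex_mem_read=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5)
```

Here `stall` is true because the load destination ($2) is a source of the following instruction. If the instruction in EX were not a load, operand forwarding would suffice and no stall would be signaled.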
A careful understanding of the timing in fig. 6.35, p. 415 is needed to understand how the hazard detection unit works. Although this type of diagram makes it clear where each instruction is during each clock cycle, it is not clear what each stage in the pipeline is doing during each clock cycle. Rather than organizing the pipeline timing diagram by instruction, organize the diagram by pipeline stage to show what each pipeline stage must do and when it must do it. It is much easier to see the stalls (bubbles) in each pipeline stage in these diagrams.
The stall is detected during clock cycle 3.
The EX, MEM, and WB stalls can be implemented by inserting a NOP (any ALU op with R0 as the destination) into the ID/EX pipeline register at the beginning of clock cycle 4 and letting the NOP flow down the rest of the pipeline in a normal manner.
The IF and ID stalls can be implemented by repeating, in cycle 4, the instructions that were in IF and ID during cycle 3; that is, by leaving the PC and the IF/ID register unchanged at the beginning of cycle 4.
                                Pipeline Stage
Time  IF              ID              EX              MEM             WB
CC 1  lw $2, 20($1)
CC 2  and $4, $2, $5  lw $2, 20($1)
CC 3  or $8, $2, $6   and $4, $2, $5  lw $2, 20($1)
CC 4  REPEAT          REPEAT          NOP             lw $2, 20($1)
CC 5  add $9, $4, $2  or $8, $2, $6   and $4, $2, $5  NOP             lw $2, 20($1)
CC 6  slt $1, $6, $7  add $9, $4, $2  or $8, $2, $6   and $4, $2, $5  NOP
CC 7                  slt $1, $6, $7  add $9, $4, $2  or $8, $2, $6   and $4, $2, $5
The snapshots clearly show that each stage is stalled for just a single clock cycle, but that different stages can be stalled at different times.
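The stage-by-stage snapshots can also be reproduced with a small simulation, which may help in checking the timing by hand. This is a simplified sketch, not the hardware from the notes: instructions are plain strings, the helper names are invented for the example, and the dependence check assumes the first register of an instruction is its destination (true for lw and ALU instructions, not for stores or branches).

```python
import re

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def regs(text):
    """Return the register names ($n) in an instruction string, in order."""
    return re.findall(r"\$\d+", text)


def needs_stall(in_id, in_ex):
    """True when in_ex is a load whose destination is a source of in_id."""
    if in_id is None or in_ex is None or not in_ex.startswith("lw"):
        return False
    return regs(in_ex)[0] in regs(in_id)[1:]


def snapshots(program, cycles):
    """Return one row [IF, ID, EX, MEM, WB] per clock cycle."""
    stage = {s: None for s in STAGES}
    pc = 0
    stall = False  # decision made during the previous cycle
    rows = []
    for _ in range(cycles):
        stage["WB"] = stage["MEM"]
        stage["MEM"] = stage["EX"]
        if stall:
            # Insert a NOP into EX; PC and IF/ID stay unchanged (REPEAT).
            stage["EX"] = "NOP"
        else:
            stage["EX"] = stage["ID"]
            stage["ID"] = stage["IF"]
            stage["IF"] = program[pc] if pc < len(program) else None
            pc += 1
        rows.append([stage[s] for s in STAGES])
        stall = needs_stall(stage["ID"], stage["EX"])
    return rows


program = ["lw $2, 20($1)", "and $4, $2, $5", "or $8, $2, $6",
           "add $9, $4, $2", "slt $1, $6, $7"]
rows = snapshots(program, 7)
for cycle, row in enumerate(rows, start=1):
    print("CC", cycle, row)
```

For this program the stall is detected in cycle 3, so the row for cycle 4 holds the same instructions in IF and ID as cycle 3 and a NOP in EX. Where the table writes REPEAT, the sketch shows the repeated instructions themselves.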
The following hardware diagram shows an implementation of this scheme.
Compiler Scheduling. The hardware can eliminate all of the data hazards except for a load followed by an instruction that uses the load destination operand. The optimizing compiler can reduce the performance impact of the stall by trying to find a useful instruction to insert between the load instruction and the ALU instruction. This is not always easy to do. Consider the following.
Code for a = b + c;
    lw  $1, b
    lw  $2, c
    add $3, $1, $2    (must be stalled!)
    sw  $3, a
Code for a = b + c; d = e - f
    lw  $1, b
    lw  $2, c
    lw  $4, e         (instruction order changed to separate lw and add)
    add $3, $1, $2    (no stall)
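The effect of reordering can be checked with a short stall counter. The sketch below assumes full operand forwarding, so the only stall counted is a load immediately followed by an instruction that reads the load destination. The function name is hypothetical, instructions are plain strings, and the first register of each instruction is assumed to be its destination (so a store that reads a just-loaded register is not detected by this simplified check).

```python
import re


def count_load_use_stalls(program):
    """Count single-cycle stalls caused by load-use pairs in a sequence."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.startswith("lw"):
            dest = re.findall(r"\$\d+", prev)[0]
            sources = re.findall(r"\$\d+", cur)[1:]
            if dest in sources:
                stalls += 1
    return stalls


# The two sequences shown above, limited to the instructions listed there.
original = ["lw $1, b", "lw $2, c", "add $3, $1, $2", "sw $3, a"]
reordered = ["lw $1, b", "lw $2, c", "lw $4, e", "add $3, $1, $2"]

original_stalls = count_load_use_stalls(original)
reordered_stalls = count_load_use_stalls(reordered)
```

With these inputs the first sequence incurs one stall (the add directly follows the load of $2), and the reordered sequence incurs none, because the load of $4 now separates the load of $2 from the add.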
[Figure: Hazard Detection Unit. Compares the IF/ID sources with the ID/EX load destination and generates the stall signal.]