
















This presentation provides an overview of instruction scheduling, focusing on dynamic scheduling methods such as scoreboarding and Tomasulo's algorithm. Dynamic scheduling allows instructions to execute out of order, improving performance by handling data dependencies and tolerating unpredictable delays. The presentation also covers the advantages and disadvantages of dynamic scheduling.
Typology: Slides

















Instruction Scheduling
Dynamic scheduling is a method in which the hardware determines which instructions to execute. In essence, the processor executes instructions out of order. Dynamic scheduling is similar to a data flow machine, in which instructions execute not in the order in which they appear, but as soon as their source operands are available.
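The data flow idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `dataflow_order`, the three-instruction program, and its registers are hypothetical, functional units are assumed unlimited, and only true data dependences are tracked (no WAR or WAW handling).

```python
def dataflow_order(instructions, ready_regs):
    """Return the order in which instructions could start executing.

    instructions: list of (name, dest, sources) in program order.
    ready_regs:   registers whose values are available at the start.
    """
    ready = set(ready_regs)
    pending = list(instructions)
    order = []
    while pending:
        # An instruction may start once all of its source operands are available.
        runnable = [ins for ins in pending if all(s in ready for s in ins[2])]
        if not runnable:
            raise ValueError("remaining instructions wait on operands that never arrive")
        for ins in runnable:
            order.append(ins[0])
            pending.remove(ins)
        # Results become visible only after the whole step, like a clock edge.
        ready.update(ins[1] for ins in runnable)
    return order


# Hypothetical program: the multiply needs the loaded value, the add does not.
program = [
    ("LD",    "F6", ["R2"]),
    ("MULTD", "F0", ["F2", "F6"]),
    ("ADDD",  "F8", ["F2", "F4"]),
]
print(dataflow_order(program, {"R2", "F2", "F4"}))  # ['LD', 'ADDD', 'MULTD']
```

The add starts ahead of the multiply even though it appears later in the program, because its operands are ready first.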
╺ To overcome data dependency through interlocking and forwarding
╺ Scheduling: ordering the execution of instructions in a program so as to improve performance
╺ To overcome the inability to detect the dependencies at compile time
Approaches for Dynamic Instruction Scheduling
╺ Scoreboarding approach
╺ Tomasulo's approach
Scoreboarding Approach
╺ Scoreboarding, Thornton's technique, allows instructions to execute out of order when there are sufficient resources and no data dependencies.
╺ The goal of a scoreboard is to maintain an execution rate of one instruction per clock cycle by executing each instruction as early as possible. Thus, when the next instruction to execute is stalled, other instructions can be issued and executed if they do not depend on any active or stalled instruction.
╺ The scoreboard takes full responsibility for instruction issue and execution, including all hazard detection.
╺ Every instruction goes through the scoreboard, where a record of the data dependencies is constructed; this step corresponds to instruction issue and replaces part of the ID step in the DLX pipeline.
Scoreboard Example
╺ We'll assume a machine with 2 integer units, 2 FP add units, 1 FP multiply unit, and 1 FP divide unit. A scoreboard consists of three parts: the instruction status, the functional unit status, and the register status. These parts are shown and explained below.
[Table: Instruction status]
[Table: Functional unit status]
Scoreboarding Approach
╺ In the tables above, we have filled out the initial status of the tables, assuming all the instructions had been fetched into the instruction status table. We issue instructions sequentially until we encounter a WAW hazard or we run out of functional units. In this case, we ran out of functional units.
╺ Here the scoreboard machine took 33 cycles to complete the above instructions. The only limiting factor is the lack of a second floating-point multiply unit.
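The issue rule above (issue sequentially until a WAW hazard or until functional units run out) can be sketched as follows. The unit counts come from the example machine; everything else is an assumption made for illustration: the function name `issue_in_order`, the four-instruction program, the choice to run loads on an integer unit, and the simplification that no instruction completes while issue is in progress.

```python
# Functional-unit counts of the example machine.
UNIT_COUNTS = {"int": 2, "fp_add": 2, "fp_mul": 1, "fp_div": 1}


def issue_in_order(instructions, unit_counts=UNIT_COUNTS):
    """Issue instructions in program order until the first one that must stall.

    instructions: list of (name, unit_kind, dest_register).
    Returns (issued_names, stall_reason); stall_reason is None if all issued.
    Simplification: nothing completes during issue, so units and
    destination registers stay reserved.
    """
    free = dict(unit_counts)
    pending_dests = set()
    issued = []
    for name, kind, dest in instructions:
        if free[kind] == 0:
            return issued, f"{name}: no free {kind} unit (structural hazard)"
        if dest in pending_dests:
            return issued, f"{name}: {dest} has a pending write (WAW hazard)"
        free[kind] -= 1
        pending_dests.add(dest)
        issued.append(name)
    return issued, None


# Hypothetical program (not the one from the slides): two multiplies, one unit.
program = [
    ("LD F6",     "int",    "F6"),
    ("MULTD F0",  "fp_mul", "F0"),
    ("ADDD F8",   "fp_add", "F8"),
    ("MULTD F10", "fp_mul", "F10"),
]
issued, reason = issue_in_order(program)
print(issued)  # ['LD F6', 'MULTD F0', 'ADDD F8']
print(reason)  # MULTD F10: no free fp_mul unit (structural hazard)
```

A real scoreboard re-checks these conditions every cycle and releases units as instructions finish; this sketch only captures the stopping condition for issue.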
╺ Tomasulo's algorithm is another method of implementing dynamic scheduling. This scheme was invented by Robert Tomasulo and was first used in the IBM System/360 Model 91.
╺ Tomasulo's algorithm differs from scoreboarding in that it uses register renaming to eliminate output and anti-dependences, i.e. WAW and WAR hazards. Output and anti-dependences are just name dependences; there is no actual data dependence.
╺ For example, the code below
    MULTD F4,F2,F
    ADDD F2,F0,F
╺ contains an anti-dependence, since the first instruction reads from F2 and the second instruction writes to F2 (a WAR hazard). However, there is no data dependence, as is shown by the code below:
    MULTD F4,F2,F
    ADDD F8,F0,F
╺ The anti-dependence is removed without changing the semantics of the code simply by changing the F2 to an F8. However, we don't have to use F8; we can use any available register.
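A small Python check makes the distinction between true and name dependences concrete. It is a sketch under assumptions: the helper `classify_dependences` is hypothetical, and because the last source operand of each instruction is truncated in the listings above, F6 and F10 are used here purely as placeholders.

```python
def classify_dependences(first, second):
    """Classify dependences between two instructions in program order.

    Each instruction is a (dest, sources) pair; `first` precedes `second`.
    """
    d1, s1 = first
    d2, s2 = second
    found = []
    if d1 in s2:
        found.append("RAW")  # true data dependence: second reads first's result
    if d2 in s1:
        found.append("WAR")  # anti-dependence: second overwrites a source of first
    if d1 == d2:
        found.append("WAW")  # output dependence: both write the same register
    return found


# F6 and F10 are placeholder operands, not taken from the slides.
multd = ("F4", ["F2", "F6"])
addd_original = ("F2", ["F0", "F10"])
addd_renamed = ("F8", ["F0", "F10"])

print(classify_dependences(multd, addd_original))  # ['WAR']
print(classify_dependences(multd, addd_renamed))   # []
```

Changing only the destination name makes the WAR entry disappear, which is why it is called a name dependence.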
Tomasulo's algorithm
Suppose there were some extra registers. We can use those extra registers for register renaming, which allows the hardware to detect a name dependence and eliminate it by storing the result of an instruction somewhere else. Thus, with register renaming, in
    MULTD F4,F2,F
    ADDD F2,F0,F
the add can execute and finish before the multiply starts, even though looking at the code suggests that this would produce an erroneous answer.
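Renaming with extra registers can be sketched with an explicit rename table. Note the assumptions: Tomasulo's algorithm as used in hardware renames through reservation stations, whereas this sketch swaps in a simple table of physical registers (P0, P1, ...) to show the same effect; the function `rename` is hypothetical, and F6 and F10 stand in for the truncated operands.

```python
from itertools import count


def rename(instructions, arch_regs):
    """Rewrite instructions over physical registers so only RAW dependences remain.

    instructions: list of (op, dest, sources) using architectural names.
    arch_regs:    architectural registers holding values before the code runs.
    """
    fresh = (f"P{i}" for i in count())
    mapping = {reg: next(fresh) for reg in arch_regs}  # initial register state
    renamed = []
    for op, dest, sources in instructions:
        new_sources = [mapping[s] for s in sources]  # read current names first
        mapping[dest] = next(fresh)                  # every write gets a new name
        renamed.append((op, mapping[dest], new_sources))
    return renamed


# F6 and F10 are placeholder operands, not taken from the slides.
code = [
    ("MULTD", "F4", ["F2", "F6"]),
    ("ADDD",  "F2", ["F0", "F10"]),
]
for ins in rename(code, ["F0", "F2", "F4", "F6", "F10"]):
    print(ins)
# ('MULTD', 'P5', ['P1', 'P3'])
# ('ADDD', 'P6', ['P0', 'P4'])
```

After renaming, the add writes P6 while the multiply still reads the old F2 value in P1, so the two instructions can complete in either order.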