


Refinement in Tomasulo's Algorithm



In this paper Tomasulo’s algorithm for out-of-order execution is shown to be a refinement of the sequential instruction execution algorithm. Correctness of Tomasulo’s algorithm is established by proving that the register files of Tomasulo’s algorithm and the sequential algorithm agree once all instructions have been completed.
Modern out-of-order super-scalar microprocessors use dynamic scheduling to increase the number of instructions executed per cycle. These processors maintain a fixed-size window into the instruction stream, analyzing the instructions in the window to determine which can be executed out of order to improve performance. Branch prediction and register renaming are employed in order to keep the window full, while result-buffering techniques maintain the in-order-execution model required by the architecture.

In this paper we propose a proof method for proving correctness of such processor designs based on refinement, and illustrate it by showing correctness of the Tomasulo algorithm [3] for out-of-order execution. Our approach is based closely on [2] and shares the advantages of this approach, in particular, the ability to cope with generic designs, establishing correctness for an arbitrary number of functional units performing arbitrary arithmetic operations. Arithmetic operations are modeled as uninterpreted functions, allowing us to verify instruction scheduling without performing arithmetic verification. Our approach greatly simplifies the proof by obviating the need for the intermediate abstraction used by Damm and Pnueli. Instead, we introduce a predicted value field to allow us to directly compare system TOMASULO to system SEQ. We believe that the use of predicted values, as presented in this paper, provides an effective means of verifying refinement between two systems which run at different rates. Our proof is intuitively simple and has been verified using the PVS [5] theorem prover.

Recent papers [4, 6] propose new techniques for verifying out-of-order processing units. In [4] Tomasulo’s algorithm is verified using the SMV verifier, and an impressive level of automation is achieved. The proof, however, is dependent on the configuration and arithmetic operators, and any modification requires that the new system be verified afresh. A more complex model of out-of-order execution is verified in [6] using the Stanford Validity Checker. The implementation machine is shown to refine an intermediate abstraction. Incremental flushing is then used to show that the intermediate abstraction is functionally equivalent to the specification machine. In comparison, our refinement is to the specification level, obviating the need for a second stage.

This paper is structured as follows: The next section presents our definition of refinement. Ultimately, the implementation has to be compatible with the sequential reference model developed in Section 3. Section 4 introduces the Tomasulo algorithm. In Section 5 we sketch the proof of correctness.

This research was supported in part by a gift from Intel, a grant from the Minerva foundation, and an Infrastructure grant from the Israeli Ministry of Science.
An abstract system $S_A$ is designed to serve as a specification capturing all the acceptable correct computations of the concrete system $S_C$. We take SEQ and TOMASULO as the abstract and concrete systems respectively. Observation functions $O_A$ and $O_C$ of $S_A$ and $S_C$ are defined. These functions indicate the parts of the systems which are compared by the refinement relation.

[Figure 1. Execution of one instruction in system SEQ. The program entry at top has the form (op, target, src1, src2); for the entry (op1, t, s1, s2), registers s1 and s2 hold v1 and v2, and op1(v1, v2) is written to register t.]

The transition relations $\rho_A$ and $\rho_C$ define how each system moves from one state to another. For example, in one step the sequential algorithm may execute an instruction, updating the program counter and register file. Typically, $\rho_A$ is non-deterministic (e.g. SEQ may either execute an instruction or idle). We define a transition relation $\hat{\rho}_A$ which can use any details of the current state in systems $S_A$ and $S_C$, or of the next state in system $S_C$, to choose one of the possibilities offered by $\rho_A$. (E.g. SEQ executes an instruction if and only if TOMASULO dispatched an instruction.)

We say that system $S_C$ refines system $S_A$ according to the observation pair $(O_C, O_A)$ if
R1. It is always possible to take a $\hat{\rho}_A$-step.
R2. Every $\hat{\rho}_A$-step is a legal $\rho_A$-step.
R3. When $S_C$ advances using $\rho_C$ and $S_A$ using $\hat{\rho}_A$, $O_C = O_A$.

That is, $S_C$ refines $S_A$ if, using $\hat{\rho}_A$, a legal (R2) computation of $S_A$ can be generated (R1) such that the observation functions $O_C$ and $O_A$ always agree (R3). A more rigorous definition of refinement can be found in [1].
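One possible way to write conditions R1 to R3 as formulas is sketched below. The notation is an assumption made for illustration: $\rho_A$ and $\rho_C$ stand for the transition relations of the abstract and concrete systems, $\hat{\rho}_A$ for the restricted abstract relation that may inspect the concrete step, $s_A, s_C$ for current states and $s_A', s_C'$ for next states. Reading R3 as an inductive step over one joint transition is likewise an interpretation; the rigorous definition is the one given in [1].

```latex
\begin{align*}
\text{R1:}\quad & \rho_C(s_C, s_C') \;\Rightarrow\; \exists s_A'.\ \hat{\rho}_A(s_A, s_C, s_C', s_A') \\
\text{R2:}\quad & \hat{\rho}_A(s_A, s_C, s_C', s_A') \;\Rightarrow\; \rho_A(s_A, s_A') \\
\text{R3:}\quad & O_C(s_C) = O_A(s_A) \;\wedge\; \rho_C(s_C, s_C') \;\wedge\; \hat{\rho}_A(s_A, s_C, s_C', s_A')
                 \;\Rightarrow\; O_C(s_C') = O_A(s_A')
\end{align*}
```

Together with agreement of the observations in the initial states, R3 in this form yields by induction that $O_C$ and $O_A$ agree along the whole joint computation.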
3. The Reference Model: System SEQ
In this section we present system SEQ which is to serve as a reference model. System SEQ executes in a strictly sequential manner an input program consisting of non-branching register-to-register instructions. It accepts two parameters, N, the number of instructions, and R, the maximum register index. System SEQ is presented in Fig. 1.

Instructions are stored in an array prog of length N. Each instruction has an operation, a target and two source operands. A program counter, top, points to the next instruction in prog. A register file reg records the current values of each register 0..R.

At each step system SEQ either delays, in which case no change is made in the system, or executes the instruction pointed to by top. The value computed by the instruction is stored in reg[prog[top].target] and top is incremented by one.
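The behaviour of system SEQ described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's PVS model: the names `Instr`, `seq_step` and `run_seq` are hypothetical, operations are passed in as opaque two-argument functions (mirroring the uninterpreted-function view), and the non-deterministic choice between delaying and executing is simulated with a seeded random generator.

```python
import operator
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Instr:
    op: Callable[[int, int], int]  # treated as an opaque binary function
    target: int                    # index of the register to write
    src1: int                      # index of the first source register
    src2: int                      # index of the second source register


def seq_step(prog: List[Instr], reg: List[int], top: int, execute: bool) -> int:
    """One step of SEQ: either delay (no change) or execute prog[top]."""
    if not execute or top >= len(prog):
        return top                 # delay: nothing changes
    ins = prog[top]
    reg[ins.target] = ins.op(reg[ins.src1], reg[ins.src2])
    return top + 1                 # top is incremented by one


def run_seq(prog: List[Instr], init_reg: List[int], seed: int = 0) -> List[int]:
    """Run SEQ until all instructions have completed; return the register file."""
    rng = random.Random(seed)      # stands in for the non-deterministic choice
    reg, top = list(init_reg), 0
    while top < len(prog):
        top = seq_step(prog, reg, top, execute=rng.random() < 0.5)
    return reg


# Example: r2 := r0 + r1, then r0 := r2 * r2, starting from r0=3, r1=4, r2=0.
prog = [Instr(operator.add, 2, 0, 1), Instr(operator.mul, 0, 2, 2)]
final = run_seq(prog, [3, 4, 0])
```

The delay steps change how long the run takes but not the final register file, which is what makes SEQ usable as a reference for a system running at a different rate.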
4. The Tomasulo Algorithm
In this section we give a brief overview of the Tomasulo algorithm [7] for data-driven instruction execution. Our definitions are based on the descriptions in [2, 3].

The program in system TOMASULO is identical to that in system SEQ, and like SEQ, TOMASULO has a program counter called top. In addition, system TOMASULO contains a register file RF, a finite number U of functional units and an array rs of reservation stations in which each functional unit has Z slots. Each functional unit owns one result register in which results are stored until they can be put on the bus. The parameters of system TOMASULO are thus N, R, U and Z. The data structures are presented in Fig. 2.

Instructions are issued from the instruction stream to a reservation station slot. The reservation station entry records the status of the instruction, the operation of the instruction, and information about its operands. The operand information is taken from the register file which records, for every register, a value for the register, a busy-flag indicating whether an instruction targeting the register is pending, and a tag pointing to the reservation station entry which will produce the result. Either the value or the tag (prod) is valid, depending on the busy flag. All this information about the operand is copied to the reservation station. The reservation station is marked as occupied and a busy-bit is set, indicating that the instruction has not been completed.

The reservation stations continuously snoop the bus, waiting for the values of any pending results to be written back. Once both operands of an instruction are available the instruction may be executed. On each cycle each functional unit can execute at most one such instruction from its block of reservation stations. The result of the instruction which has been executed is passed to the result register of the functional unit.
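The issue step described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's model: the names `Operand`, `Slot` and `issue` are hypothetical, the U functional units with Z slots each are flattened into a single list of slots, and the sketch assumes that issuing also marks the target register as busy with the new slot's tag, which is how the register-file description reads.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Operand:
    busy: bool = False          # True: value still pending, `prod` is the valid field
    prod: Optional[int] = None  # tag (slot index) of the slot that will produce it
    val: int = 0                # the valid field when busy is False


@dataclass
class Slot:
    occupied: bool = False      # slot currently holds an instruction
    busy: bool = False          # instruction has not been completed
    op: Optional[str] = None
    src1: Optional[Operand] = None
    src2: Optional[Operand] = None


def issue(instr: Tuple[str, int, int, int],
          rf: List[Operand], rs: List[Slot]) -> Optional[int]:
    """Issue (op, target, src1, src2) to the first free slot.

    Returns the slot's tag, or None if every slot is occupied.
    """
    op, target, s1, s2 = instr
    for tag, slot in enumerate(rs):
        if not slot.occupied:
            # Copy all operand information from the register file.
            rs[tag] = Slot(True, True, op, copy.copy(rf[s1]), copy.copy(rf[s2]))
            # Assumption: the target register now waits for this slot.
            rf[target] = Operand(busy=True, prod=tag)
            return tag
    return None


rf = [Operand(val=3), Operand(val=4), Operand()]
rs = [Slot(), Slot()]
t0 = issue(("add", 2, 0, 1), rf, rs)  # r2 := r0 + r1
t1 = issue(("mul", 0, 2, 2), rf, rs)  # r0 := r2 * r2, both operands wait on t0
t2 = issue(("sub", 1, 0, 0), rf, rs)  # no free slot left
```

After the second call, both operands of the `mul` slot carry the tag of the `add` slot instead of a value, which is the dependency the slot later resolves by snooping the bus.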
The active bit of the result register is set exactly when the result register contains a value and is waiting for control of the bus. When a result is put on the bus it is read by both the register file and the reservation stations. If the tag on the bus matches the producer field of a busy register, the register file saves the value and clears the busy-flag. When a reservation station notices its own address as the tag on the bus, it marks itself as unoccupied, and becomes available to receive a new instruction.

Every value field in the system is paired with an auxiliary predicted value (pv) field. Auxiliary variables are only updated and copied from one record to another but they never affect the flow of control. Hence, it is easy to prove that deleting these will not affect valuations of other sys-
new state. This provides us with an inductive proof that the observation functions of the two systems are equal.
We have thus shown that TOMASULO refines SEQ and that the predicted values in the register file of TOMASULO equal the values computed by SEQ. The proof of correctness can now be completed by proving that the predicted values in registers match the values finally written to them. This proof proceeds in two stages: We first show that all predicted values referring to the same instruction agree, and then that values are predicted correctly.
Uniformity of predicted values. The quadruple ⟨busy, prod, val, pv⟩ appears repeatedly in the system and in Fig. 2 each occurrence is enclosed in a rectangle. We call it a record of type QUAD. A QUAD-record is called active if it is a busy register, part of an active result, the main QUAD-field of an occupied reservation station, or a busy operand field of an occupied slot. When two active QUAD-records have the same producer field they are intuitively pointing to the same reservation station entry, either waiting for its result to become available (e.g. busy registers or operand fields in the reservation stations), or recording the source of the value already obtained (active result register). It is clear that all such QUAD-records should record the same predicted values, and this can be proved to be the case.

Correctness of prediction. Consider a reservation station slot in which an instruction is being executed. The predicted value stored in the slot’s QUAD-record was obtained by applying the instruction operation to the predicted values of its operands. The values of the operands have been calculated previously, and we assume, for induction, that their values agree with their predicted values. The value for the instruction is calculated by applying the instruction operation to the values of the two operands and will thus agree with the predicted value. The calculated value is propagated to the result register, and from there to any operand fields and registers which are waiting for it. By the uniformity of predicted values, the predicted value in each of these active QUAD-records equals that in the reservation station which calculated the instruction. Thus, wherever the value is written it will equal the predicted value. This proves that every value written to a QUAD-record is correctly predicted by the record’s predicted value field. (A QUAD-record may be overwritten without having stored a value. We claim only that values which are written to QUAD-records are correctly predicted.)

We compare systems SEQ and TOMASULO after all instructions have completed. It can be shown that at this point all registers do contain a final value. The principle of correct prediction asserts that the value and predicted value fields in all registers agree. By the refinement relation the predicted values in system TOMASULO agree with the values recorded in system SEQ, and thus system TOMASULO is shown to be correct.
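The argument above can be exercised on a small executable model. The sketch below is only an illustration under simplifying assumptions, not the verified PVS model: all names (`Quad`, `Slot`, `run_tomasulo`, `run_seq`) are hypothetical, the reservation stations are a flat list rather than U units of Z slots, at most one result wins the bus per cycle with an arbitrary arbitration rule, and a fixed cycle order (write-back, execute, issue) replaces the non-deterministic scheduling. Predicted values are computed at issue time from the predicted values of the operands, and an assertion checks each written value against its prediction.

```python
import copy
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Op = Callable[[int, int], int]
Instr = Tuple[Op, int, int, int]   # (op, target, src1, src2)


@dataclass
class Quad:
    busy: bool = False             # True while the value is still pending
    prod: Optional[int] = None     # tag of the slot that will produce the value
    val: int = 0                   # actual value, meaningful once busy is False
    pv: int = 0                    # auxiliary predicted value


@dataclass
class Slot:
    occupied: bool = False
    executed: bool = False         # result computed, waiting for the bus
    op: Optional[Op] = None
    a: Optional[Quad] = None
    b: Optional[Quad] = None
    pv: int = 0                    # predicted result of this instruction


def run_seq(prog: List[Instr], regs: List[int]) -> List[int]:
    reg = list(regs)
    for op, t, s1, s2 in prog:
        reg[t] = op(reg[s1], reg[s2])
    return reg


def run_tomasulo(prog: List[Instr], regs: List[int], n_slots: int = 2) -> List[Quad]:
    rf = [Quad(val=v, pv=v) for v in regs]
    rs = [Slot() for _ in range(n_slots)]
    results: Dict[int, int] = {}   # tag -> value held in a result register
    top = 0
    while top < len(prog) or any(s.occupied for s in rs):
        # Write-back: one result is put on the bus (arbitrary arbitration).
        if results:
            tag = max(results)
            value = results.pop(tag)
            waiting = rf + [q for s in rs if s.occupied for q in (s.a, s.b)]
            for q in waiting:
                if q.busy and q.prod == tag:
                    assert q.pv == value      # correctness of prediction
                    q.val, q.busy = value, False
            rs[tag] = Slot()                  # slot sees its own tag and frees itself
        # Execute: every slot whose operands are both available.
        for tag, s in enumerate(rs):
            if s.occupied and not s.executed and not s.a.busy and not s.b.busy:
                results[tag] = s.op(s.a.val, s.b.val)
                s.executed = True
        # Issue: next instruction, if a slot is free.
        free = [t for t, s in enumerate(rs) if not s.occupied]
        if top < len(prog) and free:
            op, target, s1, s2 = prog[top]
            a, b = copy.copy(rf[s1]), copy.copy(rf[s2])
            pv = op(a.pv, b.pv)               # prediction from operand predictions
            rs[free[0]] = Slot(True, False, op, a, b, pv)
            rf[target] = Quad(busy=True, prod=free[0], pv=pv)
            top += 1
    return rf


prog = [(operator.add, 2, 0, 1), (operator.mul, 0, 2, 2),
        (operator.sub, 2, 0, 1), (operator.add, 1, 2, 2)]
rf = run_tomasulo(prog, [3, 4, 0])
assert [q.val for q in rf] == run_seq(prog, [3, 4, 0]) == [49, 90, 45]
```

In this run register 2 is targeted twice; a program in which the second writer is issued before the first result reaches the bus shows the overwritten-record case, where the register never stores the first value yet still ends with the value SEQ computes.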
6. Conclusion
We have presented a refinement-based proof method for the verification of modern processor architectures, and have demonstrated its applicability by showing the correctness of a data-path involving multiple functional units, register renaming, dynamic scheduling, and out-of-order execution. Our use of predicted values has significantly simplified the refinement and obviated the need for an intermediate level.

The strength of our proof is its generality: we prove correctness for arbitrary configurations of unlimited size. The proof is also independent of the operations appearing in the instructions, and checks only the correctness of the out-of-order scheduling, not the calculation of mathematical expressions.

Still, this work needs to be extended in a number of ways:
- We would like to increase the degree of automation in the proofs. We are currently investigating the possibility of making better use of PVS strategies. Compilation of the employed notation for STS could ideally replace the current hand-translation.
- Our model is limited to non-branching programs in which no loads, stores or exceptions occur. Ongoing work considers extending the framework of this paper to incorporate some of these features.
References
[1] T. Arons and A. Pnueli. Verifying Tomasulo’s algorithm by refinement. Technical report, Dept. of Comp. Sci., Weizmann Institute, Oct 1998.
[2] W. Damm and A. Pnueli. Verifying out-of-order executions. CHARME’97: 23–47, Chapman & Hall, 1997.
[3] J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers Inc.,
[4] K.L. McMillan. Verification of an implementation of Tomasulo’s algorithm by compositional model checking. CAV’98: 110–121, 1998.
[5] S. Owre, J.M. Rushby, N. Shankar, and M.K. Srivas. A tutorial on using PVS for hardware verification. In R. Kumar and T. Kropf, editors, Proceedings of the Second Conference on Theorem Provers in Circuit Design, pages 167–188. FZI Publication, Universität Karlsruhe, 1994. Preliminary Version.
[6] J.U. Skakkebaek, R.B. Jones and D.L. Dill. Formal verification of out-of-order execution using incremental flushing. CAV’98: 98–110, 1998.
[7] R.M. Tomasulo. An efficient algorithm for exploiting multiple arithmetic units. IBM J. of Research and Development, 11(1):25–33, 1967.