

This paper describes an implementation of store sets for memory dependence prediction in superscalar processors. The authors, Daniel Chen and George H. Huang, from the University of Illinois, Urbana-Champaign, introduce store sets as a solution to the memory dependence problem in out-of-order processors. A store set is the set of stores that a load instruction has previously depended on. By using store sets, a processor can predict when a load may be issued with reduced risk of memory violations, increasing the throughput of memory instructions. The paper also includes an evaluation of the store set system and its impact on performance.


Abstract
Superscalar processors introduce a number of problems when issuing instructions out of order. Among those problems are register and memory dependencies. A memory dependence occurs when a load reads from the same address as a previous store. The value read by the load depends on the store, which means the load must be executed after the store. One way to increase the throughput of memory instructions is memory dependence prediction using “store sets.” A store set is the set of stores that a load instruction has previously depended on. A processor can discover and use store sets to predict when a load may be issued with reduced risk of memory violations. We implemented a store set system based on the paper by Chrysos and Emer to investigate the advantage of using store sets, and confirmed that memory dependence prediction using store sets significantly reduces the number of memory violations.
1. Introduction
While register renaming is the widely accepted solution for overcoming register dependencies in out-of-order processors, there is no such standard for overcoming memory dependencies. There are two basic approaches to issuing instructions with memory dependencies: no speculation and naïve speculation. A no-speculation approach waits until all previous store instructions have issued before a load instruction is allowed to issue. Naïve speculation issues memory instructions as they arrive, but when a store executes, it must check whether any loads that depend on it have already executed. If so, a memory violation has occurred, and execution must restart from the misspeculated load. No speculation eliminates memory violations but results in poor performance. Naïve speculation provides better performance but wastes time recovering from memory violations.
The goal of store sets is to use previous memory violation history to predict memory dependencies and to issue loads aggressively, delaying loads that have dependencies only long enough to avoid memory violations.
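The check that naïve speculation performs when a store executes can be illustrated with a small sketch. This is not the simulator's code: the tuple layout, the function name `find_violations`, and the two sample runs are assumptions made purely for illustration.

```python
from typing import List, Tuple

# Hypothetical instruction record: (program_order_index, kind, address),
# where kind is "load" or "store".
Instr = Tuple[int, str, str]


def find_violations(execution_order: List[Instr]) -> List[Tuple[int, int]]:
    """Return (store_index, load_index) pairs where a load that follows a
    store in program order, and reads the same address, executed first."""
    violations: List[Tuple[int, int]] = []
    executed_loads: List[Instr] = []
    for idx, kind, addr in execution_order:
        if kind == "load":
            executed_loads.append((idx, kind, addr))
        else:
            # When a store executes, look for already-executed loads that
            # come later in program order and touch the same address.
            for load_idx, _, load_addr in executed_loads:
                if load_addr == addr and load_idx > idx:
                    violations.append((idx, load_idx))
    return violations


# The store (index 0) precedes the load (index 1) in program order.
program_order_run = [(0, "store", "A"), (1, "load", "A")]
speculative_run = [(1, "load", "A"), (0, "store", "A")]

print(find_violations(program_order_run))  # []
print(find_violations(speculative_run))    # [(0, 1)]
```

In the second run the load was issued ahead of the store it depends on, which is the case that forces execution to restart from the misspeculated load.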
2. Store Sets
A store set is the set of all the stores that a particular load has ever depended on. This set of stores is not known by the processor beforehand, but must be discovered. Every time a memory violation occurs, the store that caused the violation is added to the dependent load’s store set. The next time the load is fetched, the scheduler makes sure all of the stores in the load’s store set have been issued before it allows the load to issue. In the following short program, each load’s store set is shown:
Example 2.1:
PC    Instruction
0     Store A
4     Store B
8     Store C
12    Store A
16    Load A, store set[0,12]
20    Load B, store set[4]
24    Load C, store set[8]
28    Load D, store set[empty]
It is important to note that the example shows complete store sets. In an actual implementation, a particular store will only be added into a store set when it causes a memory violation.
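The complete store sets of Example 2.1 can be reproduced with a few lines of code. The sketch below is only an illustration of the definition: it matches loads to earlier stores by address, which a real processor cannot do ahead of time. The function name and the tuple encoding of the program are assumptions.

```python
from typing import Dict, List, Tuple


def complete_store_sets(program: List[Tuple[int, str, str]]) -> Dict[int, List[int]]:
    """Map each load PC to the PCs of all earlier stores to the same address."""
    stores_by_addr: Dict[str, List[int]] = {}
    store_sets: Dict[int, List[int]] = {}
    for pc, op, addr in program:
        if op == "store":
            stores_by_addr.setdefault(addr, []).append(pc)
        else:
            # Copy the list so later stores do not alter this load's set.
            store_sets[pc] = list(stores_by_addr.get(addr, []))
    return store_sets


# Example 2.1 encoded as (PC, operation, address).
EXAMPLE_2_1 = [
    (0, "store", "A"), (4, "store", "B"), (8, "store", "C"), (12, "store", "A"),
    (16, "load", "A"), (20, "load", "B"), (24, "load", "C"), (28, "load", "D"),
]

print(complete_store_sets(EXAMPLE_2_1))
# {16: [0, 12], 20: [4], 24: [8], 28: []}
```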
3. Implementation
The implementation of store sets we chose is modeled after the paper by Chrysos and Emer [1]. Conceptually, a store set structure would be infinitely large and hold a store set for each load executed; however, building such a structure would not be practical. The idea is simplified by using a Store Set ID Table (SSIT) and a Last Fetched Store Table (LFST). The SSIT is addressed by a hash of the current PC for load or store instructions. The SSIT is filled with Store Set IDs (SSIDs). An SSID identifies a particular store set and indexes to a store inum in the LFST. This store inum identifies the last store that was fetched. By keeping track of the store inums, we can greatly simplify the dependence chain.
Figure 3.1: Store Set Implementation
Loads and stores index into the SSIT to obtain their SSID. The SSID indexes into the LFST to obtain the inum of the last store on which to depend.
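The two-table lookup can be sketched as follows. The table sizes, the hash, and the class and method names are all hypothetical. The behavior of a fetched store (recording its own inum in the LFST and depending on the previous entry) reflects our reading of Chrysos and Emer's design rather than anything stated in this text, so it should be treated as an assumption.

```python
from typing import List, Optional

SSIT_SIZE = 1024  # hypothetical number of SSIT entries
LFST_SIZE = 128   # hypothetical number of LFST entries (one per SSID)


class StoreSetTables:
    def __init__(self) -> None:
        # SSIT: hashed PC -> SSID (None means no store set assigned).
        self.ssit: List[Optional[int]] = [None] * SSIT_SIZE
        # LFST: SSID -> inum of the last fetched store in that store set.
        self.lfst: List[Optional[int]] = [None] * LFST_SIZE

    @staticmethod
    def ssit_index(pc: int) -> int:
        # Hypothetical hash: drop the two low bits, then wrap to table size.
        return (pc >> 2) % SSIT_SIZE

    def fetch_load(self, pc: int) -> Optional[int]:
        """Return the inum of the store this load should wait for, if any."""
        ssid = self.ssit[self.ssit_index(pc)]
        if ssid is None:
            return None
        return self.lfst[ssid]

    def fetch_store(self, pc: int, inum: int) -> Optional[int]:
        """Record this store as the last fetched store of its set and return
        the previous entry (assumed behavior, see the lead-in)."""
        ssid = self.ssit[self.ssit_index(pc)]
        if ssid is None:
            return None
        previous = self.lfst[ssid]
        self.lfst[ssid] = inum
        return previous
```

For example, if the store at PC 0 and the load at PC 16 share SSID 3, then after the store is fetched with inum 100, `fetch_load(16)` returns 100, while a load with no SSID returns `None` and may issue freely.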
Assignment of SSIDs is governed by four rules defined by Chrysos and Emer.
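This text does not list the four rules, so the sketch below follows them as we recall them from Chrysos and Emer's paper and should be checked against that paper. The function name, the use of a simple counter to allocate new SSIDs, and the choice of the smaller SSID as the winner when two sets merge are assumptions of this sketch.

```python
from typing import List, Optional


def assign_on_violation(ssit: List[Optional[int]], load_idx: int,
                        store_idx: int, next_ssid: int) -> int:
    """Update SSIT entries after a memory violation between a load and a
    store. Returns the next unused SSID (hypothetical counter allocation)."""
    load_ssid = ssit[load_idx]
    store_ssid = ssit[store_idx]
    if load_ssid is None and store_ssid is None:
        # Rule 1: neither has a store set, so allocate one for both.
        ssit[load_idx] = ssit[store_idx] = next_ssid
        return next_ssid + 1
    if store_ssid is None:
        # Rule 2: only the load has a store set; the store joins it.
        ssit[store_idx] = load_ssid
    elif load_ssid is None:
        # Rule 3: only the store has a store set; the load joins it.
        ssit[load_idx] = store_ssid
    else:
        # Rule 4: both have store sets; the smaller SSID wins and the
        # instruction in the losing set is moved to the winner.
        winner = min(load_ssid, store_ssid)
        ssit[load_idx] = ssit[store_idx] = winner
    return next_ssid


ssit: List[Optional[int]] = [None] * 4
next_id = assign_on_violation(ssit, load_idx=0, store_idx=1, next_ssid=0)
next_id = assign_on_violation(ssit, load_idx=0, store_idx=2, next_ssid=next_id)
print(ssit)  # [0, 0, 0, None]
```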
3.1 Simulator
The simulator we had available was the ECE511 simulator. It models an out-of-order issue, 1-wide pipeline with a ghist-type branch predictor and load-store queues. It also has a variable issue buffer and reorder buffer. It takes the naïve approach to memory speculation. We constructed the SSIT and LFST inside the scheduler stage. In-flight instructions are uniquely identified by the time stamp that is already provided in the instruction data structure. We added two fields to the instruction data structure: one records whether the current instruction depends on another instruction, and the other holds that instruction’s time stamp.
Dependencies are enforced by checking whether the depended-on store is in the store queue and has been issued, or whether the time stamp of the retired instruction is greater than that of the depended-on instruction. If either condition holds, we can safely issue the current memory instruction.
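The issue check described above might look like the following sketch. It is not the ECE511 simulator's code: the `MemInstr` fields, the dictionary standing in for the store queue, and the function name are hypothetical stand-ins for the simulator's own structures.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MemInstr:
    timestamp: int                              # unique id of the in-flight instruction
    has_dependence: bool = False                # added field 1 (hypothetical name)
    depends_on_timestamp: Optional[int] = None  # added field 2 (hypothetical name)


def can_issue(instr: MemInstr, store_queue_issued: Dict[int, bool],
              last_retired_timestamp: int) -> bool:
    """store_queue_issued maps the time stamp of each store in the store
    queue to whether that store has been issued."""
    if not instr.has_dependence or instr.depends_on_timestamp is None:
        return True
    dep = instr.depends_on_timestamp
    # Condition 1: the depended-on store is in the store queue and has issued.
    if store_queue_issued.get(dep, False):
        return True
    # Condition 2: retirement has passed the depended-on store
    # (strictly greater, following the wording of the text).
    return last_retired_timestamp > dep


load = MemInstr(timestamp=10, has_dependence=True, depends_on_timestamp=7)
print(can_issue(load, {7: False}, last_retired_timestamp=5))  # False
print(can_issue(load, {7: True}, last_retired_timestamp=5))   # True
print(can_issue(load, {}, last_retired_timestamp=8))          # True
```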
4. Data Analysis
We evaluated the performance advantage of store sets by running different benchmarks with the original simulator, which implements naïve speculation, and then with our modified simulator, which implements memory dependence prediction with store sets. The data showed a reduction in the number of memory violations across all benchmarks tested. The largest gain was in lzw, with a 21% IPC increase. Parser, which had very few memory violations to begin with, showed almost no increase in IPC. We also studied the optimal time to clear the SSIT to reduce the aliasing effect. Specifically, we wanted to see whether there is a correlation between the aliasing effect and the fullness of the SSIT. We tested the performance by resetting the SSIT when it reached 70%, 75%, 80%, 85%, 90%, and 95% full, respectively. The results indicated that performance actually decreases for the bzip benchmark when we reset the SSIT. In Graph 3 we see that the memory violation increases
              Bzip2    gcc      lzw      mcf      parser
Naïve
  mem viol    4670     503      11691    2059     12
  percent     3.01%    2.01%    9.87%    1.16%    0.04%
Store Sets
  mem viol    128      72       7        23       4
  percent     0.08%    0.04%    0.01%    0.01%    0.01%
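The relative reduction in memory violations per benchmark follows directly from the counts reported above. The short calculation below uses only those counts; the dictionary and function names are ours.

```python
from typing import Dict

# Memory violation counts taken from the results table above.
NAIVE = {"bzip2": 4670, "gcc": 503, "lzw": 11691, "mcf": 2059, "parser": 12}
STORE_SETS = {"bzip2": 128, "gcc": 72, "lzw": 7, "mcf": 23, "parser": 4}


def violation_reduction(naive: Dict[str, int],
                        store_sets: Dict[str, int]) -> Dict[str, float]:
    """Fraction of naive-speculation violations eliminated by store sets."""
    return {name: 1.0 - store_sets[name] / naive[name] for name in naive}


for name, reduction in violation_reduction(NAIVE, STORE_SETS).items():
    print(f"{name:8s} {reduction:6.1%}")
```

Parser shows the smallest relative reduction (12 violations down to 4), which is consistent with the observation that it had very few violations to begin with.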