Memory Dependence Prediction Using Store Sets


George Z. Chrysos and Joel S. Emer

Digital Equipment Corporation

Hudson, MA 01749

{chrysos,emer}@vssad.hlo.dec.com

Abstract

For maximum performance, an out-of-order processor must issue load instructions as early as possible, while avoiding memory-order violations with prior store instructions that write to the same memory location. One approach is to use memory dependence prediction to identify the stores upon which a load depends, and to communicate that information to the instruction scheduler. We designate the set of stores upon which each load has depended as the load’s “store set”. The processor can discover and use a load’s store set to accurately predict the earliest time the load can safely execute. We show that store sets accurately predict memory dependencies in the context of large-instruction-window superscalar machines, and allow for near-optimal performance compared to an instruction scheduler with perfect knowledge of memory dependencies. In addition, we explore the implementation aspects of store sets, and describe a low-cost implementation that achieves nearly optimal performance.

1. Introduction

Modern superscalar processors such as the Alpha 21264 [1], MIPS R10000 [2], HP PA-8000 [3] and Intel Pentium Pro [4] allow instructions to execute out of program order to find more instruction level parallelism (ILP). These processors must monitor data dependencies to maintain correct program behavior. There are two types of data dependencies. Register dependencies occur when one instruction writes a register and a subsequent instruction reads the same register. Memory dependencies occur when a store instruction writes to a memory location and a subsequent load instruction reads that same location.

Register dependencies are determined in the instruction decode stage by examining instructions’ register operand fields. Memory dependencies cannot be determined as early, because they require the computation of the memory address, which occurs after register operands are ready and the instruction is issued.
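To make the decode-time versus address-time distinction concrete, here is a minimal Python sketch over a toy instruction trace. The `Inst` record, the function names, the register names and the addresses are all hypothetical. Register dependences are found purely by matching operand fields, whereas memory dependences can only be found once each load and store address is known.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Inst:
    """Toy instruction record (hypothetical, for illustration only)."""
    op: str                         # "alu", "load", or "store"
    dest: Optional[str] = None      # destination register, if any
    srcs: Tuple[str, ...] = ()      # source register names
    addr: Optional[int] = None      # effective address, known only after address computation


def register_deps(program):
    """Register dependences: found at decode by comparing operand fields alone."""
    last_writer = {}                # register name -> index of its most recent writer
    deps = []
    for i, inst in enumerate(program):
        for reg in inst.srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i))
        if inst.dest is not None:
            last_writer[inst.dest] = i
    return deps


def memory_deps(program):
    """Memory dependences: require every load/store address to be computed first."""
    last_store = {}                 # address -> index of the most recent store to it
    deps = []
    for i, inst in enumerate(program):
        if inst.op in ("load", "store") and inst.addr is None:
            raise ValueError(f"address of instruction {i} is not known yet")
        if inst.op == "load" and inst.addr in last_store:
            deps.append((last_store[inst.addr], i))
        elif inst.op == "store":
            last_store[inst.addr] = i
    return deps


# The store writes through base register r3 and the load reads through r5:
# the operand fields never match, yet both addresses turn out to be 0x100.
prog = [
    Inst("alu", dest="r1", srcs=("r2",)),
    Inst("store", srcs=("r1", "r3"), addr=0x100),
    Inst("load", dest="r4", srcs=("r5",), addr=0x100),
    Inst("alu", dest="r6", srcs=("r4",)),
]
print(register_deps(prog))  # [(0, 1), (2, 3)]
print(memory_deps(prog))    # [(1, 2)]
```

The store-to-load dependence (1, 2) is invisible to `register_deps`, which is exactly the information a scheduler lacks at decode time.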

This lack of information about memory dependencies at instruction decode time is a problem for an out-of-order instruction scheduler. If the scheduler executes a load before a prior store that writes to the same memory location, the load will read the wrong value. In this event the load and all subsequent dependent instructions must be re-executed, resulting in a performance penalty. To avoid these memory-order violations, the scheduler could be conservative and prevent loads from executing until all prior stores have executed. This approach decreases performance because, in most cases, loads will be made falsely dependent on unrelated stores, unnecessarily delaying their execution. This dilemma has created the need for memory dependence prediction.

1063-6897/98 $10.00 © 1998 IEEE
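The re-execution penalty can be sketched as a forward dataflow walk: starting at the violating load, any later instruction that reads a register produced by a re-executed instruction must itself be re-executed. This is a minimal Python sketch assuming that dependences flow only through registers; the tuple encoding and the function name `reexecute_set` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Each instruction is (destination register or None, source registers).
Instr = Tuple[Optional[str], Sequence[str]]


def reexecute_set(program: List[Instr], violating_load: int) -> List[int]:
    """Indices of the violating load and every later instruction that
    transitively consumes its (wrong) result through registers."""
    tainted = set()                 # registers currently holding a wrong value
    redo = []
    for i in range(violating_load, len(program)):
        dest, srcs = program[i]
        if i == violating_load or tainted.intersection(srcs):
            redo.append(i)
            if dest is not None:
                tainted.add(dest)
        elif dest is not None:
            tainted.discard(dest)   # overwritten with a correct value
    return redo


prog = [
    ("r1", ("r2",)),        # 0: load r1 <- [r2], executed too early
    ("r3", ("r1",)),        # 1: consumes r1
    ("r4", ("r5",)),        # 2: independent
    ("r6", ("r3", "r4")),   # 3: consumes r3, so indirectly r1
    ("r1", ("r7",)),        # 4: overwrites r1 with an unrelated value
    ("r8", ("r1",)),        # 5: reads the new r1, unaffected
]
print(reexecute_set(prog, 0))  # [0, 1, 3]
```

The walk follows the selective re-execution described in the text; a simpler recovery scheme could instead discard and re-fetch every instruction from the load onward, trading a larger penalty for less bookkeeping.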

The goals of memory dependence prediction are 1) to predict which load instructions, if allowed to execute, would cause a memory-order violation, and 2) to delay the execution of these loads only as long as is necessary to avoid such a violation [5]. When a memory dependence predictor makes a mistake, it fails to satisfy one of these two goals: missing a true dependence causes a memory-order violation, while making a load wait on an unrelated store creates a false dependency. To quantify the success of a memory dependence predictor, we count the number of memory-order violations and the number of false dependencies created by the predictions. This paper will use these two metrics, as well as overall program performance, to evaluate our memory dependence predictor.
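The two metrics reduce to set differences if each dependence is represented as a (store, load) pair. The Python sketch below is illustrative only: the instruction numbers are invented, the function name `score_predictions` is hypothetical, and it counts per pair, whereas a simulator might instead count once per violating load.

```python
from typing import Set, Tuple

# A dependence is a (store_id, load_id) pair of dynamic instruction numbers.
Dep = Tuple[int, int]


def score_predictions(predicted: Set[Dep], actual: Set[Dep]) -> Tuple[int, int]:
    """Return (memory-order violations, false dependencies)."""
    violations = actual - predicted   # true dependences the predictor missed
    false_deps = predicted - actual   # waits imposed on unrelated stores
    return len(violations), len(false_deps)


# Hypothetical window: load 4 sees stores 1 and 2 still in flight and truly
# depends on store 1; load 9 sees stores 6, 7 and 8 and truly depends on 6.
actual = {(1, 4), (6, 9)}
policies = {
    "perfect": set(actual),
    "naive speculation": set(),
    "no speculation": {(1, 4), (2, 4), (6, 9), (7, 9), (8, 9)},
}
for name, predicted in policies.items():
    print(name, score_predictions(predicted, actual))
# perfect (0, 0)
# naive speculation (2, 0)
# no speculation (0, 3)
```

Never waiting produces only violations, and waiting on every prior store produces only false dependencies; a useful predictor has to drive both counts toward zero at once.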

Our memory dependence predictor is based upon the concept of store sets. A store set for a specific load is the set of all stores upon which the load has ever depended. A load’s store set can be approximated in hardware by first allowing speculation of all loads around older stores. If a load executes before a store upon which it depends, the processor detects a memory-order violation when the store is executed and adds the store to that load’s store set. Essentially, the processor discovers and remembers a load’s store set during program execution. The store set is then used to predict which stores a load must wait for before executing.
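The learning loop described above can be captured in a few lines. This is a minimal Python sketch of idealized store sets, assuming loads and stores are identified by their instruction PCs and that the table has unlimited capacity; the class, method names and PCs are hypothetical, and this is not the low-cost hardware implementation evaluated in Section 6.

```python
from collections import defaultdict
from typing import Dict, List, Set


class StoreSetPredictor:
    """Idealized, unbounded store sets keyed by instruction PC (illustrative)."""

    def __init__(self) -> None:
        # load PC -> PCs of every store the load has ever depended on
        self.store_sets: Dict[int, Set[int]] = defaultdict(set)

    def stores_to_wait_for(self, load_pc: int, in_flight_store_pcs: List[int]) -> List[int]:
        """Older, not-yet-executed stores this load should wait for.
        An unseen load has an empty store set, so it speculates freely."""
        store_set = self.store_sets.get(load_pc, set())  # .get does not create an entry
        return [pc for pc in in_flight_store_pcs if pc in store_set]

    def record_violation(self, load_pc: int, store_pc: int) -> None:
        """Called when an executing store finds that a younger load to the
        same address has already read memory: remember the pairing."""
        self.store_sets[load_pc].add(store_pc)


pred = StoreSetPredictor()
in_flight = [0x400, 0x408, 0x410]               # hypothetical in-flight store PCs
print([hex(pc) for pc in pred.stores_to_wait_for(0x500, in_flight)])  # []
pred.record_violation(0x500, 0x408)             # each violation trains the predictor
pred.record_violation(0x500, 0x410)
print([hex(pc) for pc in pred.stores_to_wait_for(0x500, in_flight)])  # ['0x408', '0x410']
print([hex(pc) for pc in pred.stores_to_wait_for(0x520, in_flight)])  # []
```

In this sketch a store set only grows, so once a load has conflicted with a store it keeps waiting for that store even when later instances write a different address, which can create false dependencies.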

In this paper, we focus on the performance impact of memory dependence prediction based on store sets in an eight-wide superscalar out-of-order processor. We describe our CPU model and simulation environment in Section 2. In Section 3, we illustrate the benefit of memory dependence prediction by considering two alternatives and comparing their performance to that of a perfect memory dependence predictor. Section 4 discusses related work. We explain store-sets-based memory dependence prediction in Section 5, and propose and evaluate an implementation in Section 6. In Section 7, we analyze the performance of our memory dependence predictor on a benchmark where it performs less than optimally.

2. Simulation Environment

Accurate memory dependence prediction becomes more important for wider-issue processors with larger instruction windows. Therefore, we focus on the need for memory dependence prediction for next-generation out-of-order, speculative-execution processors. Our CPU model represents a processor with roughly double the caches and issue width of the Alpha 21264, and executes the Alpha

