Memory Dependence Prediction and Load Value Prediction in Computer Architecture, Lecture notes of Computer Science

National Economics University (NEU)Computer Science

Decision is based on analysis or profile information – 90% of backward-going branches are taken – 50% of forward-going branches are not taken

Typology: Lecture notes

2018/2019

Uploaded on 10/29/2019

bach-hoang 🇻🇳

4 documents


CS252

Graduate Computer Architecture

Lecture 14

Prediction (Con’t)

(Dependencies, Load Values, Data Values)

John Kubiatowicz

Electrical Engineering and Computer Sciences

University of California, Berkeley

http://www.eecs.berkeley.edu/~kubitron/cs252

http://www-inst.eecs.berkeley.edu/~cs252

3/12/2007 cs252-S07, Lecture 14 2

Review: Yeh and Patt classification

[Figure: GAg (GBHR indexing GPHT), PAg (PABHR indexing GPHT), PAp (PABHR indexing PAPHT)]

• GAg: Global History Register, Global History Table

• PAg: Per-Address History Register, Global History Table

• PAp: Per-Address History Register, Per-Address History Table


Review: Other Global Variants

• GAs: Global History Register, Per-Address (Set Associative) History Table

• Gshare: Global History Register, Global History Table with simple attempt at anti-aliasing

[Figure: GAs (GBHR indexing PAPHT) and GShare (GBHR ⊕ Address indexing GPHT)]
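The Gshare idea above (global history XORed with the branch address to index one shared table) can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the class name `Gshare`, the table size, the 2-bit counters, and their initial value are all assumptions.

```python
class Gshare:
    """Sketch of a gshare predictor (names and sizes are illustrative)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.ghr = 0                          # global branch history register (GBHR)
        self.pht = [1] * (1 << index_bits)    # 2-bit counters, start weakly not-taken

    def _index(self, pc):
        # XOR of address and history spreads branches over the table,
        # which is the "simple attempt at anti-aliasing".
        return (pc ^ self.ghr) & self.mask

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the outcome into the global history.
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

A branch that strictly alternates taken/not-taken is learned after a short warm-up, because the two history patterns map to different counters.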


Review: Tournament Predictors

• Motivation for correlating branch predictors is 2-bit predictor failed on important branches; by adding global information, performance improved

• Tournament predictors: use 2 predictors, 1 based on global information and 1 based on local information, and combine with a selector

• Use the predictor that tends to guess correctly

[Figure: tournament predictor; labels: addr, history, Predictor A, Predictor B]
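A tournament predictor can be sketched roughly as below. The component predictors here are assumptions chosen for brevity: Predictor A is a table of 2-bit counters indexed by address, Predictor B is a table indexed by global history, and the selector is another table of 2-bit counters trained only when A and B disagree. The names `TwoBitTable` and `Tournament` are hypothetical.

```python
class TwoBitTable:
    """Table of 2-bit saturating counters indexed by an integer key."""

    def __init__(self, bits):
        self.mask = (1 << bits) - 1
        self.ctr = [1] * (1 << bits)

    def predict(self, key):
        return self.ctr[key & self.mask] >= 2

    def update(self, key, up):
        i = key & self.mask
        self.ctr[i] = min(3, self.ctr[i] + 1) if up else max(0, self.ctr[i] - 1)


class Tournament:
    def __init__(self, bits=6):
        self.mask = (1 << bits) - 1
        self.local = TwoBitTable(bits)    # Predictor A: indexed by branch address
        self.glob = TwoBitTable(bits)     # Predictor B: indexed by global history
        self.choice = TwoBitTable(bits)   # selector: counter >= 2 means "trust B"
        self.ghr = 0

    def predict(self, pc):
        a = self.local.predict(pc)
        b = self.glob.predict(self.ghr)
        return b if self.choice.predict(pc) else a

    def update(self, pc, taken):
        a = self.local.predict(pc)
        b = self.glob.predict(self.ghr)
        if a != b:
            # Move the selector toward whichever predictor was right.
            self.choice.update(pc, b == taken)
        self.local.update(pc, taken)
        self.glob.update(self.ghr, taken)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask
```

On a strictly alternating branch the address-indexed counter keeps flipping and mispredicts, while the history-indexed table learns the pattern, so the selector settles on Predictor B.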


Review: Memory Dependence Prediction

• Important to speculate? Two Extremes:
  – Naïve Speculation: always let load go forward
  – No Speculation: always wait for dependencies to be resolved
• Compare Naïve Speculation to No Speculation
  – False Dependency: wait when don’t have to
  – Order Violation: result of speculating incorrectly
• Goal of prediction:
  – Avoid false dependencies and order violations

From “Memory Dependence Prediction using Store Sets”, Chrysos and Emer.


Premise: Past indicates Future

• Basic Premise is that past dependencies indicate future dependencies
  – Not always true! Hopefully true most of the time
• Store Set: Set of store insts that affect given load
  – Example:

    Addr  Inst
    0     Store C
    4     Store A
    8     Store B
    12    Store C
    28    Load B   ⇒ Store set { PC 8 }
    32    Load D   ⇒ Store set { (null) }
    36    Load C   ⇒ Store set { PC 0, PC 12 }
    40    Load B   ⇒ Store set { PC 8 }

• Idea: Store set for load starts empty. If the load ever goes forward and this causes a violation, add offending store to load’s store set
• Approach: For each indeterminate load:
  – If Store from Store set is in pipeline, stall
    Else let go forward
• Does this work?
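The store-set idea and approach above can be sketched with an unbounded table, using the PCs from the example. This is only an illustration under assumptions: the class and method names (`StoreSets`, `should_stall`, `record_violation`) are hypothetical, and the pipeline is reduced to a list of in-flight store PCs.

```python
class StoreSets:
    """Idealized store sets: each load PC maps to a set of store PCs."""

    def __init__(self):
        self.sets = {}   # load PC -> set of store PCs that caused violations

    def should_stall(self, load_pc, inflight_store_pcs):
        # Stall only if a store from this load's store set is in the pipeline;
        # a load with an empty store set always goes forward.
        store_set = self.sets.get(load_pc, set())
        return any(pc in store_set for pc in inflight_store_pcs)

    def record_violation(self, load_pc, store_pc):
        # The load went forward too early: remember the offending store.
        self.sets.setdefault(load_pc, set()).add(store_pc)


ss = StoreSets()
# Violations that would be observed over time in the example program.
ss.record_violation(28, 8)     # Load B depends on Store B
ss.record_violation(36, 0)     # Load C depends on both Store C instructions
ss.record_violation(36, 12)
ss.record_violation(40, 8)     # second Load B
```

After these violations the table holds the store sets listed in the example, and the load at PC 32 (Load D) never stalls because its set stays empty.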


How well does “infinite” tracking work?

• “Infinite” here means to place no limits on:
  – Number of store sets
  – Number of stores in given set
• Seems to do pretty well
  – Note: “Not Predicted” means load had empty store set
  – Only Applu and Xlisp seem to have false dependencies


How to track Store Sets in reality?

• SSIT (Store Set Identifier Table): Assigns Loads and Stores to Store Set ID (SSID)
  – Notice that this requires each store to be in only one store set!
• LFST (Last Fetched Store Table): Maps SSIDs to most recent fetched store
  – When Load is fetched, allows it to find most recent store in its store set that is executing (if any) ⇒ allows stalling until store finished
  – When Store is fetched, allows it to wait for previous store in store set
    » Pretty much same type of ordering as enforced by ROB anyway
    » Transitivity ⇒ loads end up waiting for all active stores in store set
• What if store needs to be in two store sets?
  – Allow store sets to be merged together deterministically
    » Two loads, multiple stores get same SSID
• Want periodic clearing of SSIT to avoid:
  – Problems with aliasing across program
  – Out of control merging
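The SSIT/LFST bookkeeping described above might look like the following sketch. Several details are assumptions made for illustration: tables are Python dicts rather than fixed-size arrays, stores are identified by an opaque `tag`, and merging resolves to the smaller SSID as one possible deterministic rule. The class name `StoreSetPredictor` and its methods are hypothetical.

```python
class StoreSetPredictor:
    def __init__(self, ssit_bits=10):
        self.mask = (1 << ssit_bits) - 1
        self.ssit = {}        # SSIT: PC index -> SSID (absent means no store set)
        self.lfst = {}        # LFST: SSID -> tag of most recently fetched active store
        self.next_ssid = 0

    def _idx(self, pc):
        return (pc >> 2) & self.mask   # aliasing happens when indices collide

    def fetch_load(self, pc):
        """Return the tag of the store this load must wait for, or None."""
        ssid = self.ssit.get(self._idx(pc))
        return None if ssid is None else self.lfst.get(ssid)

    def fetch_store(self, pc, tag):
        """Return the previous store in the set to wait for, then become the last one."""
        ssid = self.ssit.get(self._idx(pc))
        if ssid is None:
            return None
        prev = self.lfst.get(ssid)
        self.lfst[ssid] = tag
        return prev

    def store_done(self, pc, tag):
        ssid = self.ssit.get(self._idx(pc))
        if ssid is not None and self.lfst.get(ssid) == tag:
            del self.lfst[ssid]

    def violation(self, load_pc, store_pc):
        li, si = self._idx(load_pc), self._idx(store_pc)
        l, s = self.ssit.get(li), self.ssit.get(si)
        if l is None and s is None:
            l = s = self.next_ssid
            self.next_ssid += 1
        elif l is None:
            l = s
        elif s is None:
            s = l
        else:
            l = s = min(l, s)   # assumed merge rule: both adopt the smaller SSID
        self.ssit[li], self.ssit[si] = l, s

    def clear(self):
        # Periodic clearing limits aliasing and runaway merging.
        self.ssit.clear()
        self.lfst.clear()
```

Because each store waits on the previous store of its set and the load waits on the last one, the load transitively waits for every active store in the set.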


Accuracy of LCT

• Question of accuracy is about how well we avoid:
  – Predicting unpredictable load
  – Not predicting predictable loads
• How well does this work?
  – Difference between “Simple” and “Limit”: history depth
    » Simple: depth 1
    » Limit: depth 16
  – Limit tends to classify more things as predictable (since this works more often)
• Basic Principle:
  – Often works better to have one structure decide on the basic “predictability” of structure
  – Independent of prediction structure
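The split between a structure that decides predictability and a structure that holds the prediction can be sketched as below, for history depth 1 (the “Simple” case). Assumptions: LCT is taken to be a load classification table of 2-bit saturating counters, LVPT a table of last-seen values, and the counter encoding (0 and 1 mean do not predict, 2 means predict, 3 means constant) is one plausible choice, not necessarily the one evaluated in the lecture. The class name is hypothetical.

```python
class LoadValuePredictor:
    def __init__(self, bits=8):
        self.mask = (1 << bits) - 1
        self.lvpt = [None] * (1 << bits)   # last value per entry (depth 1)
        self.lct = [0] * (1 << bits)       # 2-bit classification counters

    def _idx(self, pc):
        return (pc >> 2) & self.mask

    def classify(self, pc):
        c = self.lct[self._idx(pc)]
        if c == 3:
            return "constant"
        return "predictable" if c == 2 else "unpredictable"

    def predict(self, pc):
        # The LCT gates the LVPT: no prediction for unpredictable loads.
        i = self._idx(pc)
        return self.lvpt[i] if self.lct[i] >= 2 else None

    def update(self, pc, actual):
        i = self._idx(pc)
        if self.lvpt[i] == actual:
            self.lct[i] = min(3, self.lct[i] + 1)
        else:
            self.lct[i] = max(0, self.lct[i] - 1)
        self.lvpt[i] = actual
```

A load has to repeat its value several times before it is predicted at all, which is how the classifier avoids predicting unpredictable loads.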


Constant Value Unit

• Idea: Identify a load instruction as “constant”
  – Can ignore cache lookup (no verification)
  – Must enforce by monitoring result of stores to remove “constant” status
• How well does this work?
  – Seems to identify 6-18% of loads as constant
  – Must be unchanging enough to cause LCT to classify as constant
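The enforcement rule above (stores remove “constant” status) can be illustrated with a small sketch. It assumes exact address matching and ignores access sizes and capacity limits; the class `ConstantUnit` and its methods are hypothetical names, not the hardware interface.

```python
class ConstantUnit:
    def __init__(self):
        self.entries = {}   # data address -> set of load PCs treated as constant

    def mark_constant(self, load_pc, addr):
        # Called once the classifier decides this load is constant.
        self.entries.setdefault(addr, set()).add(load_pc)

    def is_constant(self, load_pc, addr):
        # A hit means the predicted value can be used without a cache lookup.
        return load_pc in self.entries.get(addr, ())

    def on_store(self, addr):
        # Any store to the address revokes constant status for all its loads.
        self.entries.pop(addr, None)
```

For example, after `mark_constant(0x400, 0x1000)` the load hits until a store calls `on_store(0x1000)`, after which it must go back through the normal verified path.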


Load Value Architecture

• LCT/LVPT in fetch stage
• CVU in execute stage
  – Used to bypass cache entirely
  – (Know that result is good)
• Results: Some speedups
  – 21264 seems to do better than Power PC
  – Authors think this is because of small first-level cache and in-order execution makes CVU more useful


Data Value Prediction

• Why do it?
  – Can “Break the DataFlow Boundary”
  – Before: Critical path = 4 operations (probably worse)
  – After: Critical path = 1 operation (plus verification)

[Figure: dataflow graphs over inputs A, B, X, Y (operations include / and +), shown without and with a “Guess” value]


Data Value Predictability

• “The Predictability of Data Values”
  – Yiannakis Sazeides and James Smith, Micro 30, 1997
• Three different types of Patterns:
  – Constant (C)
  – Stride (S)
  – Non-Stride (NS)
• Combinations:
  – Repeated Stride (RS)
  – Repeated Non-Stride (RNS)


Computational Predictors

• Last Value Predictors
  – Predict that instruction will produce same value as last time
  – Requires some form of hysteresis. Two subtle alternatives:
    » Saturating counter incremented/decremented on success/failure; replace when the count is below threshold
    » Keep old value until new value seen frequently enough
  – Second version predicts a constant when appears temporarily constant
• Stride Predictors
  – Predict next value by adding the most recent value to the difference of the two most recent values:
    » If $v_{n-1}$ and $v_{n-2}$ are the two most recent values, then predict next value will be: $v_{n-1} + (v_{n-1} - v_{n-2})$
    » The value $(v_{n-1} - v_{n-2})$ is called the “stride”
  – Important variations in hysteresis:
    » Change stride only if saturating counter falls below threshold
    » Or “two-delta” method. Two strides maintained.
      • First (S1) always updated by difference between two most recent values
      • Other (S2) used for computing predictions
      • When S1 seen twice in a row, then S1 ⇒ S2
• More complex predictors:
  – Multiple strides for nested loops
  – Complex computations for complex loops (polynomials, etc!)
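The two hysteresis schemes above can be sketched for a single instruction as follows. These are illustrative sketches under assumptions: one predictor object per instruction, a counter range of 0 to 3 with threshold 1 for the last-value predictor, and strides initialized to 0. The class names are hypothetical.

```python
class LastValuePredictor:
    """Last value with a saturating counter: replace only when confidence is low."""

    def __init__(self, threshold=1, max_count=3):
        self.threshold = threshold
        self.max_count = max_count
        self.value = None
        self.count = 0

    def predict(self):
        return self.value

    def update(self, actual):
        if actual == self.value:
            self.count = min(self.max_count, self.count + 1)
        else:
            self.count = max(0, self.count - 1)
            if self.count < self.threshold:
                self.value = actual   # confidence exhausted: adopt the new value


class TwoDeltaStride:
    """Two-delta stride: S1 tracks the latest stride, S2 is used to predict."""

    def __init__(self):
        self.last = None
        self.s1 = 0
        self.s2 = 0

    def predict(self):
        return None if self.last is None else self.last + self.s2

    def update(self, value):
        if self.last is not None:
            stride = value - self.last
            if stride == self.s1:
                self.s2 = stride      # S1 seen twice in a row: S1 => S2
            self.s1 = stride
        self.last = value
```

With the two-delta scheme, a loop counter that wraps around (1, 2, 3, 4, 1, 2, ...) costs one misprediction per wrap, because the one-off stride at the wrap updates S1 but never reaches S2.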


Context Based Predictors

• Context Based Predictor
  – Relies on Tables to do trick
  – Classified according to the order: an “n-th” order model takes last n values and uses this to produce prediction
    » So – 0th order predictor will be entirely frequency based
• Consider sequence: a a a b c a a a b c a a a
  – Next value is?
• “Blending”: Use prediction of highest order available
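A table-based context predictor with blending can be sketched as below and run on the sequence from the slide. The design is an assumption made for illustration: one frequency table per order, unbounded in size, with the most frequent follower of the longest previously seen context used as the prediction. The class name `ContextPredictor` is hypothetical.

```python
from collections import Counter, defaultdict


class ContextPredictor:
    def __init__(self, max_order=3):
        self.max_order = max_order
        # tables[n] maps a context of the last n values to follower counts.
        self.tables = [defaultdict(Counter) for _ in range(max_order + 1)]
        self.history = []

    def _context(self, n):
        return tuple(self.history[len(self.history) - n:])

    def predict(self):
        # Blending: try the highest order first, fall back to lower orders.
        for n in range(min(self.max_order, len(self.history)), -1, -1):
            counts = self.tables[n].get(self._context(n))
            if counts:
                return counts.most_common(1)[0][0]
        return None

    def update(self, value):
        for n in range(min(self.max_order, len(self.history)) + 1):
            self.tables[n][self._context(n)][value] += 1
        self.history.append(value)


def next_after(sequence, order):
    p = ContextPredictor(order)
    for symbol in sequence:
        p.update(symbol)
    return p.predict()
```

For the sequence `"aaabcaaabcaaa"`, a 0th order model is purely frequency based and picks the most common value, while a 3rd order model has seen the context a a a followed by b and predicts accordingly.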


Which is better?

• Stride-based:
  – Learns faster
  – Less state
  – Much cheaper in terms of hardware!
  – Runs into errors for any pattern that is not an infinite stride
• Context-based:
  – Much longer to train
  – Performs perfectly once trained
  – Much more expensive hardware
