Dependence Distances in Computer Microarchitecture: Hardware and Software, Study notes of Computer Architecture and Organization

These notes introduce the concept of dependence distances in computer microarchitecture, including control dependences, memory dependences, and source operand usage. They also cover methods for calculating dependence distances and the impact of processor capabilities on these distances.

Typology: Study notes

Pre 2010

Uploaded on 02/24/2010

koofers-user-ueo

IMPACT
UIUC ECE 411 and NTU CA 718-Q: Computer Microarchitecture: Hardware and Software
© 1999, Wen-mei W. Hwu, All Rights Reserved.
Machine Description
Wen-mei Hwu
ECE, University of Illinois, Urbana-Champaign

Outline
• Dependence distances
• Processor capabilities
• Operand effects
• Whole greater than sum of parts

Dependence Distances
• Prevent unexpected interlocks
  – renaming
  – incomplete bypass
• Prevent lost opportunities
  – Special hot paths in implementation

r1 <- mem[r2+0]
r3 <- r1 + r4
r1 <- r1 + r3

[Figure: dependence arcs among these three operations, labeled with distances 2, 0, and 0.]

Dependence Distances (cont.)
• Control dependences
  – Reflect other constraints on reordering
  – Dependences to prevent branch reordering

mem[r3+4] <- r4
bne r2, 0, cb11
r6 <- mem[r5+0]
bne r1, 0, cb10

[Figure: three dependence arcs among these four operations, each labeled with distance 0.]

Dependence Distances (cont.)
• Control dependences (cont.)
  – Dependences to prevent operations from moving above branches
  – Dependences to prevent dependent or ambiguous memory ops from reordering (store-load, store-store dependences)
  – Can mislead scheduling heuristics that use dependence height (next slide)

Dependence Distances (cont.)
• Making dependence distances more accurate
  – If the processor can issue only one branch per cycle, give dependences between branches a distance of one cycle

mem[r3+4] <- r4
bne r2, 0, cb11
r6 <- mem[r5+0]
bne r1, 0, cb10

[Figure: the same four operations, with the dependence arcs now labeled with distances 2, 1, and 1.]

Dependence Distances (cont.)

• Making dependence distances more accurate (cont.)
  – If nothing can issue after a branch, give dependences to operations held below the branch a distance of 1
  – If a store ties up memory for 2 cycles, give store-load and store-store control dependences a distance of 2
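
The three adjustments above can be written as a small lookup. This is a minimal sketch, not the course's implementation: the `MachineParams` fields and the dependence-kind strings are hypothetical names chosen for illustration, and the default distance of 0 is assumed from the unadjusted example earlier in the notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineParams:
    """Hypothetical machine knobs; the field names are illustrative only."""
    branches_per_cycle: int = 1           # branches that can issue in one cycle
    can_issue_after_branch: bool = False  # may ops issue after a branch in its cycle?
    store_busy_cycles: int = 2            # cycles a store ties up memory


def ordering_distance(kind: str, m: MachineParams) -> int:
    """Distance in cycles for a dependence that only constrains reordering."""
    if kind == "branch->branch" and m.branches_per_cycle == 1:
        return 1
    if kind == "branch->op_below" and not m.can_issue_after_branch:
        return 1
    if kind in ("store->load", "store->store") and m.store_busy_cycles == 2:
        return 2
    return 0  # assumed default: the dependence only fixes the order


m = MachineParams()
print(ordering_distance("branch->branch", m))  # 1
print(ordering_distance("store->load", m))     # 2
```

Only the cases stated on the slide are encoded; other store occupancies are left at the default rather than extrapolated.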

Dependence Distances (cont.)

• Could use a laundry list approach
  – List the operation latency (register flow distance) of each opcode
  – List the distance for other types of dependences
    • Globally or for each opcode


Dependence Distances (cont.)

• Weaknesses of the laundry list approach
  – Distance may also depend on the operands of the operation
    • Addressing mode of a memory operation


Dependence Distances (cont.)

  – Distance may also depend on which operation is using the result
    • Special hot paths
      • Can chain two dependent Ialu ops
    • Early operand usage
      • Address generation operands must be ready one cycle before use
    • Incomplete bypass across clusters


Operand Usage Approach

  • Define time 0 as just before an operation begins execution

OL_shft (dest(2) src(0 0) br_dest(0) br_src(0))

OL_cbr ( src(0 0 0) br_dest(1) br_src(0))

OL_st ( src(-1 -1 1) br_dest(1) br_src(0) mem_dest(2) mem_src(0))


Operand Usage Approach

• Enumerate the time(s) that the dest operand(s) are written
  – Physical destinations (to set register dependence distances)
    • A two-cycle shift operation writes its destination at time 2
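
One way to turn operand usage times into a register flow distance is to subtract the consumer's read time from the producer's write time. The sketch below assumes that rule, plus a floor at zero; neither is stated on the slides. The times are copied from the register fields of the `OL_shft` and `OL_st` entries above, while the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperandLatency:
    dest: tuple = ()  # time(s) at which destination register(s) are written
    src: tuple = ()   # time at which each source register is read


# Register fields only; branch and memory fields are omitted in this sketch.
OL_SHFT = OperandLatency(dest=(2,), src=(0, 0))
OL_ST = OperandLatency(src=(-1, -1, 1))


def flow_distance(producer, dest_idx, consumer, src_idx):
    """Assumed rule: producer write time minus consumer read time, floored at 0."""
    return max(0, producer.dest[dest_idx] - consumer.src[src_idx])


print(flow_distance(OL_SHFT, 0, OL_SHFT, 0))  # shift feeding a shift: 2
print(flow_distance(OL_SHFT, 0, OL_ST, 0))    # shift feeding a store source read at -1: 3
print(flow_distance(OL_SHFT, 0, OL_ST, 2))    # shift feeding the store source read at 1: 1
```

Under this rule the same producer yields different distances depending on which consumer operand uses the result, which is the effect a per-opcode laundry list cannot express.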


Processor’s Capabilities

  • Could use laundry list of machine’s characteristics

–Target a particular style of processor implementation

–Machine model commonly hard coded into the scheduler/simulator


Processor’s Capabilities

• Can build to exact processor specification
  • The approach used by industry to model new implementations
  • Allows fine-tuning final design parameters
  • Allows early generation of performance estimates
• Can use generic machine model with a few high-level parameters
  • The approach taken by gcc’s instruction scheduler


Processor’s Capabilities

– Difficult to precisely reflect large implementation changes
  • Each knob usually allows only variation around a specific design point
  • Knobs usually do not change major machine characteristics
  • Can hamper high-level evaluations of new implementations


Processor’s Capabilities

• The resource modeling approach
  – Enumerate the relevant processor resources:
    • The instruction decode units
      • Three general-purpose decoders
      • One complex and two simple decoders
      • One int, one branch, and one float decoder


Processor’s Capabilities

    • Register file read and write ports
      • Need to model only if port configuration constrains execution
    • The function units or execution pipes
    • The result buses and/or any other limited resource that affects execution
    • Use a detail level appropriate for the processor’s constraints
      • Reservation Tables vs. FSA

Reservation Tables

• Describe how operations use resources as they flow through the processor

[Figure: reservation table. Columns: Decoder 1 2 3, Read Port 1 2 3 4, Ialu 1 2, Wr Pt. Rows: cycles (0 and 1 are legible in the source). Four cells are marked X.]

  • Table usually bit encoded for efficiency
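
A bit encoding can be sketched by giving each resource one bit and storing one mask per cycle, so that a conflict check is a bitwise AND. The resource names, the single write port, and the cycle at which each resource is used are assumptions made for illustration; the source table does not make them all legible.

```python
# Hypothetical resource list; each resource gets one bit.
RESOURCES = ["dec1", "dec2", "dec3", "rp1", "rp2", "rp3", "rp4",
             "ialu1", "ialu2", "wp1"]
BIT = {name: 1 << i for i, name in enumerate(RESOURCES)}


def encode(usage):
    """Turn {relative_cycle: [resource names]} into {relative_cycle: bitmask}."""
    table = {}
    for cycle, names in usage.items():
        mask = 0
        for name in names:
            mask |= BIT[name]
        table[cycle] = mask
    return table


def conflicts(table, busy, issue_cycle):
    """True if any resource the op needs is already taken in the busy map."""
    return any(busy.get(issue_cycle + c, 0) & mask for c, mask in table.items())


def reserve(table, busy, issue_cycle):
    """Mark the op's resources as used, cycle by cycle."""
    for c, mask in table.items():
        busy[issue_cycle + c] = busy.get(issue_cycle + c, 0) | mask


# Assumed timing for an Ialu op with one register source.
IALU_OP = encode({-1: ["dec1"], 0: ["rp1", "ialu1"], 1: ["wp1"]})

busy = {}
reserve(IALU_OP, busy, 0)
print(conflicts(IALU_OP, busy, 0))  # True: same resources, same cycles
print(conflicts(IALU_OP, busy, 1))  # False: usage is shifted by one cycle
```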


Reservation Tables

– Table of which resources are used and at what times
– Fix usage times relative to some point in operation execution
  • Arbitrarily define time 0 as just before an operation begins execution
– For example, one table for a RISC Ialu op with one register source:

Reservation Tables

• Usage by the instruction scheduler
  – Use processor resource information to create a resource usage table
    • Keep track of what processor resources are available each cycle
    • Record the resources used by scheduled operations
    • Initially all resources available


Reservation Tables

– Operation can be scheduled if...
  • Dependences have been satisfied
  • Execution resources required by the operation are available
    • Check resource usage table for resource availability
    • If resources available, mark resources as used and schedule op
    • If resources unavailable, operation cannot be scheduled in that cycle
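
The two scheduling conditions can be combined in a simple loop: start at the earliest cycle the dependences allow, then advance until the resource usage table shows every needed resource free. This is a minimal sketch that assumes operations arrive in priority order with predecessors first; the operation names, resource names, and distances in the example are hypothetical.

```python
from collections import defaultdict


def schedule(ops, deps, tables):
    """
    ops:    operation names, predecessors before successors (assumed).
    deps:   {op: [(predecessor, distance), ...]}
    tables: {op: {relative_cycle: set of resource names}}
    Returns {op: issue cycle}.
    """
    used = defaultdict(set)  # resource usage table: cycle -> resources in use
    issue = {}
    for op in ops:
        # Earliest cycle at which all dependences are satisfied.
        cycle = max((issue[p] + d for p, d in deps.get(op, [])), default=0)
        # Advance until every required resource is available.
        while any(used[cycle + c] & res for c, res in tables[op].items()):
            cycle += 1
        # Mark the resources as used and schedule the op.
        for c, res in tables[op].items():
            used[cycle + c] |= res
        issue[op] = cycle
    return issue


tables = {"ld": {0: {"mem"}}, "add1": {0: {"ialu"}}, "add2": {0: {"ialu"}}}
deps = {"add1": [("ld", 2)], "add2": [("add1", 0)]}
print(schedule(["ld", "add1", "add2"], deps, tables))
# {'ld': 0, 'add1': 2, 'add2': 3}
```

In the example, `add2` has its dependence satisfied in cycle 2 but is pushed to cycle 3 because the single `ialu` is already taken.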


Reservation Tables

• Typically there are multiple ways an operation can use resources
  – For example, that Ialu op with one register src might be:
    • Allowed to use decoder 1 or 2 but not 3
    • Allowed to use any read port, Ialu, and write port


Reservation Tables

– Enumeration solution: enumerate all of the reservation table variations
  • Works well if there are just a few variations
    • One reason that a minimal set of resources should be used in the model
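
Enumeration can be sketched as a Cartesian product over the alternatives allowed at each step. The example assumes four read ports and two Ialus (as the reservation table header suggests), two write ports, and a particular usage cycle for each resource; the write-port count, the cycles, and all names are hypothetical.

```python
from itertools import product


def enumerate_tables(spec):
    """
    spec: list of (relative_cycle, [alternative resource names]).
    Returns every concrete reservation table as a tuple of (cycle, resource) pairs.
    """
    cycles = [cycle for cycle, _ in spec]
    alternatives = [alts for _, alts in spec]
    return [tuple(zip(cycles, combo)) for combo in product(*alternatives)]


# Ialu op with one register source: decoder 1 or 2, any read port,
# any Ialu, any write port (counts and cycles are assumptions).
IALU_SPEC = [
    (-1, ["dec1", "dec2"]),
    (0, ["rp1", "rp2", "rp3", "rp4"]),
    (0, ["ialu1", "ialu2"]),
    (1, ["wp1", "wp2"]),
]

variants = enumerate_tables(IALU_SPEC)
print(len(variants))  # 32
print(variants[0])    # ((-1, 'dec1'), (0, 'rp1'), (0, 'ialu1'), (1, 'wp1'))
```

Even this small operation expands to 2 × 4 × 2 × 2 = 32 tables, which illustrates why the number of modeled resources matters.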


Reservation Tables

– Counter-based solution: use counters for interchangeable resources
  • Efficiently handles the “any of a resource” cases
    • Use any read port, any of a set of identical FUs, etc.
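
A counter-based table can be sketched by tracking, for each cycle, how many units of each resource class are in use and comparing against the class capacity. The class name, method names, and capacities below are hypothetical.

```python
from collections import Counter, defaultdict


class CounterTable:
    """Per-cycle usage counts for classes of interchangeable resources."""

    def __init__(self, capacity):
        self.capacity = capacity            # e.g. {"read_port": 4, "ialu": 2}
        self.in_use = defaultdict(Counter)  # cycle -> Counter of classes in use

    def can_reserve(self, needs, issue_cycle):
        """needs: list of (relative_cycle, resource_class)."""
        want = defaultdict(Counter)
        for c, cls in needs:
            want[issue_cycle + c][cls] += 1
        return all(self.in_use[cycle][cls] + n <= self.capacity[cls]
                   for cycle, counts in want.items()
                   for cls, n in counts.items())

    def reserve(self, needs, issue_cycle):
        for c, cls in needs:
            self.in_use[issue_cycle + c][cls] += 1


table = CounterTable({"read_port": 4, "ialu": 2})
ialu_op = [(0, "read_port"), (0, "ialu")]
table.reserve(ialu_op, 0)
table.reserve(ialu_op, 0)
print(table.can_reserve(ialu_op, 0))  # False: both Ialus are taken in cycle 0
print(table.can_reserve(ialu_op, 1))  # True
```

Counters fit only resources that are fully interchangeable; a restriction such as "decoder 1 or 2 but not 3" still needs separate handling.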


Operand Effects

–Sign bit in branch target can affect branch behavior

–Literal width can affect grouping logic constraints

–Number of literals can affect grouping logic constraints


Operand Effects

• Could expand opcode repertoire to capture relevant operand information
  – Adds complexity to rest of compiler
  – Relevant operand info can change with each implementation


Operand Effects (cont.)

• Another approach: describe operation formats separately
  – Specify what each operand may be for each format
    • For example, three sample formats for an Ialu operation (using one read port, two read ports, or writing to a control register)