Machine Information and Scheduling in Compiler Design - Prof. Scott Mahlke
EECS 583 - Class 13: Instruction Scheduling
University of Michigan February 21, 2005

Reading Material
- Today's class
  - "Machine Description Driven Compilers for EPIC Processors", B. Rau, V. Kathail, and S. Aditya, HP Technical Report, HPL-98-40, 1998.
- Material for the next lecture
  - "Three Architectural Models for Compiler-Controlled Speculative Execution", P. Chang et al., IEEE Transactions on Computers, Vol. 44, No. 4, April 1995, pp. 481-494.

Machine Information
- Each step of code generation requires knowledge of the machine
  - Hard code it? Used to be common practice
  - If retargetability is a goal, then it cannot be hard coded
- What does the code generator need to know about the target processor?
  - Structural information?
    - No
  - For each opcode
    - What registers can be accessed as each of its operands
    - Other operand encoding limitations
  - Operation latencies
    - Read inputs, write outputs
  - Resources utilized
    - Which ones, when

Machine Description (mdes)
- Elcor mdes supports a very general class of EPIC processors
  - Probably more general than you need
  - Weakness: does not support ISA changes like GCC
- Terminology
  - Generic opcode
    - Virtual opcode, machine supports k versions of it
    - ADD_W
  - Architecture opcode or unit specific opcode or sched opcode
    - Specific assembly operation of the processor
    - ADD_W.0 = add on function unit 0
- Each unit specific opcode has 3 properties
  - IO format
  - Latency
  - Resource usage
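
The generic versus unit specific opcode terminology can be pictured as a lookup table. The following is only a minimal sketch, not Elcor's actual mdes format: the class name `UnitSpecificOpcode`, the `MDES` table, the register file name `GPR`, the resource names `ALU0`/`ALU1`, and the single-number latency are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitSpecificOpcode:
    """One unit specific opcode with the 3 properties named on the slide."""
    name: str          # e.g. "ADD_W.0"
    io_format: tuple   # hypothetical: register file allowed for each operand
    latency: int       # simplified to one number; real latencies are per operand
    resources: tuple   # hypothetical (resource name, relative cycle) pairs


# Hypothetical machine description: generic opcode -> its k unit specific versions.
MDES = {
    "ADD_W": [
        UnitSpecificOpcode("ADD_W.0", ("GPR", "GPR", "GPR"), 1, (("ALU0", 0),)),
        UnitSpecificOpcode("ADD_W.1", ("GPR", "GPR", "GPR"), 1, (("ALU1", 0),)),
    ],
}


def alternatives(generic):
    """Return the names of the unit specific opcodes for a generic opcode."""
    return [op.name for op in MDES.get(generic, [])]
```

Under these assumptions, `alternatives("ADD_W")` lists the two versions of the add, one per function unit, and the scheduler would pick among them.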

Latency Information
- Multiply takes 3 cycles
  - No, not that simple!!!
- Differential input/output latencies
  - Earliest read latency for each source operand
  - Latest read latency for each source operand
  - Earliest write latency for each destination operand
  - Latest write latency for each destination operand
- Why all this?
  - Unexpected events may make operands arrive late or be produced early
- Compound op: part may finish early or start late
- Instruction re-execution by
  - Exception handlers
  - Interrupt handlers
- Ex: mpyadd(d1, d2, s1, s2, s3)
  - d1 = s1 * s2, d2 = d1 + s3
  - [Figure: mpyadd datapath annotated with earliest/latest (E/L) latencies per operand. E/L as printed: s1: 0/, s2: 0/, s3: 2/, d1: 2/3, d2: 2/]

Memory Serialization Latency
- Ensuring the proper ordering of dependent memory operations
- Not the memory latency
  - But, point in the memory pipeline where 2 ops are guaranteed to be processed in sequential order
- Page fault: memory op is re-executed, so need
  - Earliest mem serialization latency
  - Latest mem serialization latency
- Remember
  - Compiler will use this, so any 2 memory ops that cannot be proven independent must be separated by mem serialization latency.

Resources
- A machine resource is any aspect of the target processor for which over-subscription is possible if not explicitly managed by the compiler
  - Scheduler must pick conflict free combinations
- 3 kinds of machine resources
  - Hardware resources are hardware entities that would be occupied or used during the execution of an opcode
    - Integer ALUs, pipeline stages, register ports, busses, etc.
  - Abstract resources are conceptual entities that are used to model operation conflicts or sharing constraints that do not directly correspond to any hardware resource
    - Sharing an instruction field
  - Counted resources are identical resources such that k are required to do something
    - Any 2 input busses

Reservation Tables
- For each opcode, the resources used at each cycle relative to its initiation time are specified in the form of a table
- Res1, Res2 are abstract resources to model issue constraints
[Figure: three reservation tables, resources (Res1, Res2, ALU, Resultbus, MPY) against relative time]
- Integer add
- Load, uses ALU for addr calculation, can't issue load with add or multiply
- Non-pipelined multiply
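
A scheduler can use reservation tables to reject conflicting issue slots. The sketch below assumes table contents that are only illustrative (the exact entries of the figure are not reproduced here): the add takes Res1, the multiply takes Res2, the load takes both so it cannot issue with either, and the multiply holds MPY for two cycles to model a non-pipelined unit. The names `RESERVATION`, `conflicts`, `reserve`, and `try_issue` are hypothetical.

```python
# Hypothetical reservation tables: opcode -> set of (resource, relative cycle).
RESERVATION = {
    "add":  {("Res1", 0), ("ALU", 0), ("Resultbus", 1)},
    "load": {("Res1", 0), ("Res2", 0), ("ALU", 0), ("Resultbus", 1)},
    "mpy":  {("Res2", 0), ("MPY", 0), ("MPY", 1), ("Resultbus", 2)},
}


def conflicts(busy, opcode, cycle):
    """True if issuing opcode at cycle would over-subscribe some resource."""
    return any((res, cycle + rel) in busy for res, rel in RESERVATION[opcode])


def reserve(busy, opcode, cycle):
    """Mark every (resource, absolute cycle) slot the opcode uses as busy."""
    for res, rel in RESERVATION[opcode]:
        busy.add((res, cycle + rel))


def try_issue(busy, opcode, cycle):
    """Issue opcode at cycle if conflict free; return whether it was issued."""
    if conflicts(busy, opcode, cycle):
        return False
    reserve(busy, opcode, cycle)
    return True
```

With these assumed tables, an add and a multiply can issue in the same cycle, a load cannot join either of them, and a second multiply has to wait until the MPY unit is free again.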

Data Dependences
- Data dependences
  - If 2 operations access the same register (and at least one of them writes it), they are dependent
  - However, only keep dependences to most recent producer/consumer as other edges are redundant
  - Types of data dependences
    - Flow: r1 = r2 + r3, then r4 = r1 * 6
    - Anti: r1 = r2 + r3, then r2 = r5 * 6
    - Output: r1 = r2 + r3, then r1 = r4 * 6
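
The three register dependence types can be told apart by comparing destinations and sources. This is a minimal sketch under the assumption that each operation is represented as a hypothetical `(dest, sources)` pair; the function name `classify` is illustrative.

```python
def classify(first, second):
    """Return the register dependence types from first to second.

    Each operation is a (dest, sources) pair, with first earlier in
    program order than second.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = set()
    if dest1 in srcs2:      # second reads what first wrote
        kinds.add("flow")
    if dest2 in srcs1:      # second overwrites what first read
        kinds.add("anti")
    if dest1 == dest2:      # both write the same register
        kinds.add("output")
    return kinds
```

For example, `classify(("r1", {"r2", "r3"}), ("r4", {"r1"}))` yields a flow dependence, matching the first pair of operations above. Two operations that only read a common register produce no dependence.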

More Dependences
- Memory dependences
  - Similar to register dependences, but through memory
  - Memory dependences may be certain or maybe
- Control dependences
  - We discussed this earlier
  - Branch determines whether an operation is executed or not
  - Operation must execute after/before a branch
  - Note, control flow (C0) is not a dependence
- Examples
  - Mem-flow: store (r1, r2), then r3 = load(r1)
  - Mem-anti: r2 = load(r1), then store (r1, r3)
  - Mem-output: store (r1, r2), then store (r1, r3)
  - Control (C1): if (r1 != 0), then r2 = load(r1)
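
Memory dependences follow the same pattern as register dependences, with the added question of whether the two addresses are known to match. The sketch below is an assumption-laden illustration: `same_address` stands in for the result of some alias analysis (True, False, or None for unknown), and the name `mem_dependence` is hypothetical.

```python
def mem_dependence(first, second, same_address):
    """Classify the memory dependence from first to second.

    first, second: "load" or "store", in program order.
    same_address: True (proven same), False (proven different),
                  or None (cannot be proven either way).
    Returns (kind, certainty) or None when there is no dependence.
    """
    if same_address is False:
        return None                      # proven independent
    if first == "load" and second == "load":
        return None                      # two reads never conflict
    kind = {
        ("store", "load"): "mem-flow",
        ("load", "store"): "mem-anti",
        ("store", "store"): "mem-output",
    }[(first, second)]
    return (kind, "certain" if same_address else "maybe")
```

A "maybe" dependence still has to be honored by the scheduler, since the two operations could not be proven independent.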

Dependence Edge Latencies
- Edge latency = minimum number of cycles necessary between initiation of the predecessor and successor in order to satisfy the dependence
- Register flow dependence, a → b
  - Latest_write(a) - Earliest_read(b)
- Register anti dependence, a → b
  - Latest_read(a) - Earliest_write(b) + 1
- Register output dependence, a → b
  - Latest_write(a) - Earliest_write(b) + 1
- Negative latency
  - Possible, means successor can start before predecessor
  - We will only deal with latency >= 0, so MAX any latency with 0
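
The three register edge latency formulas above can be written out directly. This sketch simplifies the model by keeping one set of four latencies per operation rather than per operand; the names `OpTiming` and `reg_edge_latency` and the `clamp` flag are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class OpTiming:
    """Simplified timing: one set of latencies per operation (an assumption)."""
    earliest_read: int
    latest_read: int
    earliest_write: int
    latest_write: int


def reg_edge_latency(kind, a, b, clamp=True):
    """Latency of the register dependence edge a -> b of the given kind."""
    if kind == "flow":
        latency = a.latest_write - b.earliest_read
    elif kind == "anti":
        latency = a.latest_read - b.earliest_write + 1
    elif kind == "output":
        latency = a.latest_write - b.earliest_write + 1
    else:
        raise ValueError(f"unknown dependence kind: {kind}")
    # Negative latencies are possible; the slides clamp them to 0.
    return max(0, latency) if clamp else latency
```

With a hypothetical multiply `OpTiming(0, 0, 3, 3)` feeding an add `OpTiming(0, 0, 1, 1)`, the flow edge latency is 3 - 0 = 3, while an anti edge from the add to the multiply works out to 0 - 3 + 1 = -2 and is clamped to 0.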

Dependence Edge Latencies (2)
- Memory dependences, a → b (all types: flow, anti, output)
  - latency = latest_serialization_latency(a) - earliest_serialization_latency(b) + 1
  - Prioritized memory operations
    - Hardware orders memory ops by order in MultiOp
    - Latency can be 0 with this support
- Control dependences
  - branch → b
    - Op b cannot issue until prior branch completed
    - latency = branch_latency
  - a → branch
    - Op a must be issued before the branch completes
    - latency = 1 - branch_latency (can be negative)
    - conservative, latency = MAX(0, 1 - branch_latency)
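
The memory and control edge latencies can be sketched the same way. The function names and the `prioritized` and `conservative` flags below are illustrative assumptions, and returning 0 for prioritized memory operations is one reading of "latency can be 0 with this support".

```python
def mem_edge_latency(latest_ser_a, earliest_ser_b, prioritized=False):
    """Latency of a memory dependence edge a -> b (flow, anti, or output)."""
    if prioritized:
        # Hardware orders memory ops by position in the MultiOp,
        # so dependent ops may issue in the same cycle.
        return 0
    return max(0, latest_ser_a - earliest_ser_b + 1)


def branch_to_op_latency(branch_latency):
    """Edge branch -> b: b cannot issue until the prior branch completes."""
    return branch_latency


def op_to_branch_latency(branch_latency, conservative=True):
    """Edge a -> branch: a must issue before the branch completes."""
    latency = 1 - branch_latency          # can be negative
    return max(0, latency) if conservative else latency
```

For a hypothetical 3-cycle branch, an operation after the branch waits 3 cycles, while an operation before it could in principle issue up to 2 cycles after the branch (latency -2), or no later than the same cycle under the conservative rule.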

Dependence Graph Properties - Estart
- Estart = earliest start time (as soon as possible, ASAP)
  - Schedule length with infinite resources (dependence height)
  - Estart = 0 if node has no predecessors
  - Estart = MAX(Estart(pred) + latency) for each predecessor node
  - Example
[Figure: example dependence graph annotated with edge latencies]
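
The Estart recurrence can be evaluated in one forward pass over the dependence graph. The sketch below assumes the nodes are supplied in topological order and the edges as hypothetical `(pred, succ, latency)` triples; the small graph in the comments is made up for illustration and is not the one from the slide.

```python
def estart(nodes, edges):
    """Compute the earliest start time of every node.

    nodes: node ids in topological order (an assumption of this sketch).
    edges: (pred, succ, latency) triples.
    """
    preds = {n: [] for n in nodes}
    for pred, succ, latency in edges:
        preds[succ].append((pred, latency))

    est = {}
    for n in nodes:
        # 0 with no predecessors, else MAX(Estart(pred) + latency).
        est[n] = max((est[p] + lat for p, lat in preds[n]), default=0)
    return est


# Illustrative graph: 1 -> 3 (2), 2 -> 3 (1), 3 -> 4 (3), 1 -> 4 (1)
EXAMPLE_NODES = [1, 2, 3, 4]
EXAMPLE_EDGES = [(1, 3, 2), (2, 3, 1), (3, 4, 3), (1, 4, 1)]
```

On this illustrative graph, node 3 gets Estart MAX(0 + 2, 0 + 1) = 2 and node 4 gets MAX(2 + 3, 0 + 1) = 5.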

Lstart
- Lstart = latest start time (as late as possible, ALAP)
  - Latest time a node can be scheduled s.t. sched length not increased beyond infinite resource schedule length
  - Lstart = Estart if node has no successors
  - Lstart = MIN(Lstart(succ) - latency) for each successor node
  - Example
[Figure: example dependence graph annotated with edge latencies]
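
Lstart is the mirror image of Estart: one backward pass over the graph. This sketch takes the Estart values as an input dictionary and follows the rule as stated above, so a node without successors gets its own Estart. The node order, the `(pred, succ, latency)` edge triples, and the example graph are illustrative assumptions.

```python
def lstart(nodes, edges, est):
    """Compute the latest start time of every node.

    nodes: node ids in topological order (an assumption of this sketch).
    edges: (pred, succ, latency) triples.
    est:   dict of Estart values for the same graph.
    """
    succs = {n: [] for n in nodes}
    for pred, succ, latency in edges:
        succs[pred].append((succ, latency))

    lst = {}
    for n in reversed(nodes):
        if succs[n]:
            # MIN(Lstart(succ) - latency) over all successors.
            lst[n] = min(lst[s] - lat for s, lat in succs[n])
        else:
            lst[n] = est[n]   # no successors: Lstart = Estart
    return lst


# Illustrative graph: 1 -> 3 (2), 2 -> 3 (1), 3 -> 4 (3), 1 -> 4 (1)
EXAMPLE_NODES = [1, 2, 3, 4]
EXAMPLE_EDGES = [(1, 3, 2), (2, 3, 1), (3, 4, 3), (1, 4, 1)]
EXAMPLE_ESTART = {1: 0, 2: 0, 3: 2, 4: 5}
```

On this illustrative graph, node 2 has Estart 0 but Lstart 2 - 1 = 1, so it can slip one cycle without lengthening the schedule, while nodes 1, 3, and 4 have Lstart equal to Estart.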