Download Machine Information, Scheduling a Basic Block - Lecture Notes | EECS 583 and more Study notes Electrical and Electronics Engineering in PDF only on Docsity!
EECS 583 – Lecture 15 Machine Information,Scheduling a Basic Block
University of Michigan March 5, 2003
Machine Information^ Y
Each step of code generation requires knowledge of themachine
»^ Hard code it? – used to be common practice »^ Retargetability, then cannot
Y^
What does the code generator need to know about thetarget processor?^ »^
Structural information?^ y^
No
»^ For each opcode
y^ What registers can be accessed as each of its operands y^ Other operand encoding limitations
»^ Operation latencies
y^ Read inputs, write outputs
»^ Resources utilized
y^ Which ones, when
IO Format^ Y
Registers, register files
»^ Number, width, static or rotating »^ Read-only (hardwired 0) or read-write
Y^
Operation^ »^
Number of source/dests » Predicated or not » For each source/dest/pred^ y^
What register file(s) can be read/written y Literals, if so, how big^ Multicluster machine example:
ADD_W.
gpr1, gpr1 : gpr
ADD_W_L.
gpr1, lit6 : gpr
ADD_W.
gpr2, gpr2 : gpr
Latency Information^ Y^
Multiply takes 3 cycles^ »^
No, not that simple!!! Y^
Differential input/outputlatencies^ »^
Earliest read latency for eachsource operand » Latest read latency for eachsource operand » Earliest write latency for eachdestination operand » Latest write latency for eachdestination operand Y^
Why all this?^ »^
Unexpected events may makeoperands arrive late or beproduced early
Y^
Compound op: part may finishearly or start late
Y^
Instruction re-execution by^ »^
Exception handlers » Interupt handlers Y^
Ex: mpyadd(d1, d2, s1, s2, s3)^ »^
d1 = s1 * s2, d2 = d1 + s
s^
s2 d
s
E/L
s1: 0/
s2: 0/
s3: 2/
d1: 2/3 d2: 2/
d
Branch Latency^ Y
Time relative to the initiation time of a branch at whichthe target of the branch is initiated Y What about branch prediction?
»^ Can reduce branch latency »^ But, may not make it 1
Y^
We will assume branch latency is 1 for this class (ie nodelay slots!)
0: branch 1: xxx 2: yyy 3: target
Example:
branch latency = k (3) delay slots = k – 1 (2) Note xxx and yyy are multiOps
Resources^ Y
A machine resource
is any aspect of the target processor
for which over-subscription is possible if not explicitlymanaged by the compiler^ »^
Scheduler must pick conflict free combinations
Y^
3 kinds of machine resources^ »^
Hardware resources
are hardware entities that would be occupied
or used during the execution of an opcode^ y^
Integer ALUS, pipeline stages, register ports, busses, etc.
»^ Abstract resources
are conceptual entities that are used to model
operation conflicts or sharing constraints that do not directlycorrespond to any hardware resource^ y^
Sharing an instruction field
»^ Counted resources
are identical resources such that k are required
to do something^ y^
Any 2 input busses
Now, Lets Get Back to Scheduling…^ Y
Scheduling constraints
»^ What limits the operations that can be concurrently executed orreordered? »^ Processor resources – modeled by mdes »^ Dependences between operations
y^ Data, memory, control Y^
Processor resources^ »^
Manage using resource usage map (RU_map) » When each resource will be used by already scheduled ops » Considering an operation at time t^ y^
See if each resource in reservation table is free
»^ Schedule an operation at time t
y^ Update RU_map by marking resources used by op busy
Data Dependences^ Y
Data dependences
»^ If 2 operations access the same register, they are dependent »^ However, only keep dependences to most recentproducer/consumer as other edges are redundant »^ Types of data dependences
Output
Anti
Flow
r1 = r2 + r3r1 = r4 * 6
r1 = r2 + r3r2 = r5 * 6
r1 = r2 + r3r4 = r1 * 6
Dependence Graph^ Y
Represent dependences between operations in a block viaa DAG
»^ Nodes = operations »^ Edges = dependences
Y^
Single-pass traversal required to insert dependences
Y^
Example
1: r1 = load(r2) 2: r2 = r1 + r4 3: store (r4, r2) 4: p1 = cmpp (r2 < 0) 5: branch if p1 to BB3 6: store (r1, r2)
BB3:
Dependence Edge Latencies^ Y
Edge latency
= minimum number of cycles necessary
between initiation of the predecessor and successor inorder to satisfy the dependence
Y^
Register flow dependence, a
Æ
b
»^ Latest_write(a) – Earliest_read(b)
Y^
Register anti dependence, a
Æ
b
»^ Latest_read(a) – Earliest_write(b) + 1
Y^
Register output dependence, a
Æ
b
»^ Latest_write(a) – Earliest_write(b) + 1
Y^
Negative latency^ »^
Possible, means successor can start before predecessor » We will only deal with latency >= 0, so MAX any latency with 0
Class Problem
- Draw dependence graph 2. Label edges with type and latencies
machine modelmin/max read/writelatenciesadd:
src 0/1dst 1/ mpy:
src 0/2dst 2/ load:
src 0/0dst 2/2sync 1/ store: src 0/
dst -sync 1/
r1 = load(r2) r2 = r2 + 1 store (r8, r2) r3 = load(r2) r4 = r1 * r3 r5 = r5 + r4 r2 = r6 + 4 store (r2, r5)
Dependence Graph Properties - Estart^ Y
Estart = earliest start time, (as soon as possible - ASAP)
»^ Schedule length with infinite resources (dependence height) »^ Estart = 0 if node has no predecessors »^ Estart = MAX(Estart(pred) + latency) for each predecessor node »^ Example
1
2 2 3 3
2
2 4
3
Slack^ Y
Slack = measure of the scheduling freedom
»^ Slack = Lstart – Estart for each node »^ Larger slack means more mobility »^ Example
1
2 2 33
2
2 4
3
Critical Path^ Y
Critical operations = Operations with slack = 0
»^ No mobility, cannot be delayed without extending the schedulelength of the block »^ Critical path = sequence of critical operations from node with nopredecessors to exit node, can be multiple crit paths
1
2 2 3 3
2
2 4
3