





















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Material Type: Notes; Professor: Mahlke; Class: Advanced Compilers; Subject: Electrical Engineering And Computer Science; University: University of Michigan - Ann Arbor; Term: Winter 2002;
Typology: Study notes
1 / 29
This page cannot be seen from the preview
Don't miss anything!






















Code generation
Y
Y
Classical optimizations
ILP optimizations
Formed regions, applied if-conversion
Y
2 big steps
y
Determine when every operation executions
y
Create MultiOps
y
Map virtual
Æ
physical registers
y
Spill to memory if necessary
What do we need to schedule?
Y
Number of resources
Which resources are used by each instruction
Latencies
Operand encoding limitations
Y
2 issue slots, 1 memory port, 1 adder/multiplier
load = 2 cycles, add = 1 cycle, mpy = 3 cycles
y
All units fully pipelined
Each operand can be register or 6 bit signed literal
How do we schedule?
Y
Correct execution
Avoid pipeline stalls
Need a precedence graph – flow, anti, output deps
y
What about memory deps? control deps? Delay slots?
Y
How do you know it is the best one?
y
What about a good guess?
y
Does it matter, just pick one at random?
Are decisions final?, or is this an iterative process?
Y
Need a reservation table
y
Matrix (resources x time)
Compiler code generation – 2
try
Y
Map optimized “machine-independent” assembly tofinal assembly code
Y
Virtual
physical binding
»
Cannot do this all at once,too many decisions!!
»
Do slowly
»
Each step refines thebinding by restrictingprevious choices
Y
Schedule both before andafter register allocation
»
Initial scheduling is free ofreal processor registerconstraints
»
2
nd
phase required due to
spill code
code selection, literal handling prepass operation binding
scheduling
register allocation and spill code insertion
postpass scheduling
code emission
Why not schedule after allocation?
physical regs
virtual regs
The 6 step program (cont)
Y
Assign physical registers
Bind access each equilvalent register to a specific physicalregister
Introduce additional code to spill registers to memory
Y
A second pass of scheduling to handle spill code
Resource assignments from first pass are ignored
But, registers are physical, so less code motion freedom
Y
Convert “fully qualified” operations into real assembly
A translator basically
Assembler converts this assembly to machine code
Y
Machine information
Y
Hard code it? – used to be common practice
Retargetability, then cannot
Y
Structural information?
y
No
For each opcode
y
What registers can be accessed as each of its operands
y
Other operand encoding limitations
Operation latencies
y
Read inputs, write outputs
Resources utilized
y
Which ones, when
IO format
Y
Number, width, static or rotating
Read-only (hardwired 0) or read-write
Y
Number of source/dests
Predicated or not
For each source/dest/pred
y
What register file(s) can be read/written
y
Literals, if so, how big Multicluster machine example:
gpr1, gpr1 : gpr
gpr1, lit6 : gpr
gpr2, gpr2 : gpr
Latency information
Y
No, not that simple!!!
Y
Earliest read latency for each source operand
Latest read latency for each source operand
Earliest write latency for each destination operand
Latest write latency for each destination operand
Y
s
s d
s
d
Memory serialization latency
Y
Y
But, point in the memory pipeline where 2 ops are guaranteedto be processed in sequential order
Y
Earliest mem serialization latency
Latest mem serialization latency
Y
Compiler will use this, so any 2 memory ops that cannot beproven independent, must be separated by mem serializationlatency.
Branch latency
Y
Y
Can reduce branch latency
But, may not make it 1
Y
0: branch 1: xxx 2: yyy 3: target
Example:
branch latency = k (3) delay slots = k – 1 (2) Note xxx and yyy are multiOps
Reservation tables
s
For each opcode, the resources used at each cycle relative to its initiation time are specified in the form of a table Res1, Res2 are abstract resources to model issue constraints
Res
Res
ALU
MPY
Resultbu
relative
time
Integer add
s
s
Res
Res
ALU
MPY
Resultbu
relative
time
Res
Res
ALU
MPY
Resultbu
relative
time
Load, uses ALU for addr calculation, can’t issue load with add or multiply
Non-pipelined multiply
Hmdes2 – Example integer add entries SECTION Operation{
// **** Integer operations ****$for (idx in $0..(integer_units-1)){
// Table 2: Integer computation operations$for (class in intarith1_int intarith2_int intarith2_intshift intarith2_intdiv intarith2_intmpy){
$for (op in ${OP_${class}}){
$for(w in ${int_alu_widths}){
"${op}${w}.${idx}"(alt(SA${class}_i${idx})); } } }
}
What this really says:ADD_W.0 gets alt(SA_intarith2_int_i0)
Add on Integer unit 0, SA = scheduling alternative
ADD_W.1 gets alt(SA_intarith2_int_i1)
Add on Integer unit 1
Trace back of relevant entries for integer add
see trimaran/elcor/mdes/hpl_pd_elcor_std.hmdes