Slides on Code Generation I - Fall 2002 | EECS 583, Study notes of Electrical and Electronics Engineering

Material Type: Notes; Professor: Mahlke; Class: Advanced Compilers; Subject: Electrical Engineering And Computer Science; University: University of Michigan - Ann Arbor; Term: Winter 2002;

EECS 583 – Lecture 12
Code Generation I
University of Michigan
February 18, 2002
Code generation

• Map optimized “machine-independent” assembly to final assembly code
• Input code
  - Classical optimizations
  - ILP optimizations
  - Formed regions, applied if-conversion
• Virtual → physical binding
  - 2 big steps
  1. Scheduling
     - Determine when every operation executes
     - Create MultiOps
  2. Register allocation
     - Map virtual → physical registers
     - Spill to memory if necessary

What do we need to schedule?

• Information about the processor
  - Number of resources
  - Which resources are used by each instruction
  - Latencies
  - Operand encoding limitations
• Let's assume
  - 2 issue slots, 1 memory port, 1 adder/multiplier
  - load = 2 cycles, add = 1 cycle, mpy = 3 cycles
    - All units fully pipelined
  - Each operand can be a register or a 6-bit signed literal

How do we schedule?

• When is it legal to schedule an instruction?
  - Correct execution
  - Avoid pipeline stalls
  - Need a precedence graph – flow, anti, output deps
    - What about memory deps? Control deps? Delay slots?
• Given multiple operations that can be scheduled, how do you pick the best one?
  - How do you know it is the best one?
    - What about a good guess?
    - Does it matter, or can you just pick one at random?
  - Are decisions final, or is this an iterative process?
• How do we keep track of resources that are busy/free?
  - Need a reservation table
    - Matrix (resources x time)
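
The questions above can be made concrete with a small cycle-by-cycle list scheduler. The Python sketch below is only an illustration, not the algorithm used in the lecture. It assumes the machine from the previous slide (2 issue slots, 1 memory port, 1 adder/multiplier, fully pipelined, load = 2, add = 1, mpy = 3), looks only at flow dependences, picks ready operations in source order, and treats every decision as final. The names `list_schedule`, `LATENCY`, `UNIT` and `CAPACITY` are hypothetical.

```python
# Assumed machine: 2 issue slots, 1 memory port, 1 adder/multiplier.
LATENCY = {"load": 2, "add": 1, "mpy": 3}
UNIT = {"load": "mem", "add": "alu", "mpy": "alu"}
CAPACITY = {"issue": 2, "mem": 1, "alu": 1}


def list_schedule(ops, flow_deps):
    """ops: list of opcode names; flow_deps: (producer, consumer) index pairs.

    Returns (issue_cycle, table), where table[c][r] counts how many units of
    resource r are reserved in cycle c (a simple resources x time matrix).
    """
    preds = {i: [] for i in range(len(ops))}
    for producer, consumer in flow_deps:
        preds[consumer].append(producer)

    issue_cycle = {}
    table = []
    cycle = 0
    while len(issue_cycle) < len(ops):
        row = dict.fromkeys(CAPACITY, 0)
        for i, opcode in enumerate(ops):  # priority = source order (assumption)
            if i in issue_cycle:
                continue
            # Legal only if every producer has issued and its result is available.
            ready = all(
                p in issue_cycle and issue_cycle[p] + LATENCY[ops[p]] <= cycle
                for p in preds[i]
            )
            unit = UNIT[opcode]
            if ready and row["issue"] < CAPACITY["issue"] and row[unit] < CAPACITY[unit]:
                row["issue"] += 1
                row[unit] += 1  # fully pipelined: unit is busy for one cycle only
                issue_cycle[i] = cycle
        table.append(row)
        cycle += 1
    return issue_cycle, table


# Two loads feed an add, whose result feeds a multiply.
ops = ["load", "load", "add", "mpy"]
flow_deps = [(0, 2), (1, 2), (2, 3)]
schedule, table = list_schedule(ops, flow_deps)
print(schedule)
```

The second load slips to cycle 1 because there is only one memory port, which in turn delays the add to cycle 3. A different priority function could change the result on larger examples, which is exactly the "how do you pick the best one?" question.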

Compiler code generation – 2nd try

• Map optimized “machine-independent” assembly to final assembly code
• Virtual → physical binding
  - Cannot do this all at once, too many decisions!!
  - Do slowly
  - Each step refines the binding by restricting previous choices
• Schedule both before and after register allocation
  - Initial scheduling is free of real processor register constraints
  - 2nd phase required due to spill code

Code generation steps:
  1. Code selection, literal handling
  2. Prepass operation binding
  3. Scheduling
  4. Register allocation and spill code insertion
  5. Postpass scheduling
  6. Code emission

Why not schedule after allocation?

virtual regs

    r1 = load(r10)
    r2 = load(r11)
    r3 = r1 + 4
    r4 = r1 - r12
    r5 = r2 + r4
    r6 = r5 + r3
    r7 = load(r13)
    r8 = r7 * 23
    store (r8, r6)

physical regs

    R1 = load(R1)
    R2 = load(R2)
    R5 = R1 + 4
    R1 = R1 - R3
    R2 = R2 + R1
    R2 = R2 + R5
    R5 = load(R4)
    R5 = R5 * 23
    store (R5, R2)
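
One way to see the cost of scheduling after allocation is to count the dependences in the two versions of the code. The Python sketch below is an illustration only: it encodes each operation as a hypothetical `(dest, sources)` pair (literals such as 4 and 23 are omitted, and the store has no destination) and collects flow, anti and output dependences as pairs of 0-based operation indices.

```python
def dependences(ops):
    """ops: list of (dest, sources); dest is None for the store."""
    flow, anti, output = set(), set(), set()
    last_def = {}   # register -> index of its most recent writer
    readers = {}    # register -> readers since its most recent write
    for j, (dest, sources) in enumerate(ops):
        for reg in sources:
            if reg in last_def:
                flow.add((last_def[reg], j))
            readers.setdefault(reg, []).append(j)
        if dest is not None:
            for i in readers.get(dest, []):
                if i != j:              # an op does not depend on itself
                    anti.add((i, j))
            if dest in last_def:
                output.add((last_def[dest], j))
            last_def[dest] = j
            readers[dest] = []
    return flow, anti, output


VIRTUAL = [
    ("r1", ["r10"]), ("r2", ["r11"]), ("r3", ["r1"]), ("r4", ["r1", "r12"]),
    ("r5", ["r2", "r4"]), ("r6", ["r5", "r3"]), ("r7", ["r13"]),
    ("r8", ["r7"]), (None, ["r8", "r6"]),
]
PHYSICAL = [
    ("R1", ["R1"]), ("R2", ["R2"]), ("R5", ["R1"]), ("R1", ["R1", "R3"]),
    ("R2", ["R2", "R1"]), ("R2", ["R2", "R5"]), ("R5", ["R4"]),
    ("R5", ["R5"]), (None, ["R5", "R2"]),
]

v_flow, v_anti, v_out = dependences(VIRTUAL)
p_flow, p_anti, p_out = dependences(PHYSICAL)
print(len(v_flow), len(v_anti), len(v_out))
print(len(p_flow), len(p_anti), len(p_out))
```

Both versions have the same flow dependences, but register reuse adds anti and output dependences to the physical-register code. In particular, the anti dependence on R5 orders the last load after the add that reads R5, although in the virtual-register code that load is independent of the add chain.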

The 6 step program (cont)

• 4. Register allocation
  - Assign physical registers
  - Bind each access-equivalent register to a specific physical register
  - Introduce additional code to spill registers to memory
• 5. Postpass scheduling
  - A second pass of scheduling to handle spill code
  - Resource assignments from first pass are ignored
  - But, registers are physical, so less code motion freedom
• 6. Code emission
  - Convert “fully qualified” operations into real assembly
  - A translator basically
  - Assembler converts this assembly to machine code
• Focus for now on 3, 4, 5; assume 1, 2, 6 are not needed

Machine information

• Each step of code generation requires knowledge of the machine
  - Hard code it? – used to be common practice
  - If retargetability is a goal, it cannot be hard coded
• What does the code generator need to know about the target processor?
  - Structural information?
    - No
  - For each opcode
    - What registers can be accessed as each of its operands
    - Other operand encoding limitations
  - Operation latencies
    - Read inputs, write outputs
  - Resources utilized
    - Which ones, when

IO format

• Registers, register files
  - Number, width, static or rotating
  - Read-only (hardwired 0) or read-write
• Operation
  - Number of source/dests
  - Predicated or not
  - For each source/dest/pred
    - What register file(s) can be read/written
    - Literals, if so, how big

Multicluster machine example:

    ADD_W.    gpr1, gpr1 : gpr
    ADD_W_L.  gpr1, lit6 : gpr
    ADD_W.    gpr2, gpr2 : gpr
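
The literal-size question can be illustrated with a small operand check. The Python sketch below assumes that `lit6` denotes a 6-bit signed (two's complement) literal, as in the machine assumed earlier in the lecture, and that a string operand stands for a register. The helper names are hypothetical, and returning `None` for an oversized literal simply signals that it would first have to be moved into a register.

```python
def fits_signed_literal(value, bits=6):
    """True if value is representable as a `bits`-wide two's complement field."""
    lo = -(1 << (bits - 1))        # -32 for 6 bits
    hi = (1 << (bits - 1)) - 1     # +31 for 6 bits
    return lo <= value <= hi


def select_add_form(second_operand):
    """Pick the register or literal form of the add for one source operand."""
    if isinstance(second_operand, str):      # a register name such as "r12"
        return "ADD_W"
    if fits_signed_literal(second_operand):  # small enough for the lit6 field
        return "ADD_W_L"
    return None                              # literal must be materialized first


print(select_add_form("r12"), select_add_form(4), select_add_form(100))
```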

Latency information

• Multiply takes 3 cycles
  - No, not that simple!!!
• Differential input/output latencies
  - Earliest read latency for each source operand
  - Latest read latency for each source operand
  - Earliest write latency for each destination operand
  - Latest write latency for each destination operand
• mpyadd(d1, d2, s1, s2, s3) → d1 = s1 * s2, d2 = d1 + s3

[Figure: timing diagram for mpyadd showing its source (s) and destination (d) operands]
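
The effect of differential latencies on the precedence graph can be sketched in Python. Everything in this block is an assumption for illustration: the delay formulas are one common conservative convention (a consumer may read in the cycle in which the producer's latest write lands, and a later write must land strictly after an earlier read or write), and the latency numbers for `mpyadd` and `add` are invented, not taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class OperandLatency:
    earliest: int  # earliest cycle, relative to issue, the operand may be accessed
    latest: int    # latest cycle, relative to issue, the operand may be accessed


def flow_delay(producer_write, consumer_read):
    # Consumer's earliest read must not precede the producer's latest write.
    return producer_write.latest - consumer_read.earliest


def anti_delay(reader_read, writer_write):
    # Writer's earliest write must come strictly after the reader's latest read.
    return reader_read.latest - writer_write.earliest + 1


def output_delay(first_write, second_write):
    # Second write must land strictly after the first write.
    return first_write.latest - second_write.earliest + 1


# Hypothetical latencies: mpyadd reads s1, s2 at cycle 0, writes d1 at 3,
# reads s3 at 3 and writes d2 at 4; a plain add reads at 0 and writes at 1.
MPYADD_D1 = OperandLatency(3, 3)
MPYADD_D2 = OperandLatency(4, 4)
MPYADD_S3 = OperandLatency(3, 3)
ADD_SRC = OperandLatency(0, 0)
ADD_DEST = OperandLatency(1, 1)

print(flow_delay(MPYADD_D1, ADD_SRC))   # add that consumes d1
print(flow_delay(MPYADD_D2, ADD_SRC))   # add that consumes d2
print(flow_delay(ADD_DEST, MPYADD_S3))  # mpyadd whose s3 comes from an add
```

Under these assumed numbers the last delay is negative: because s3 is read late, the add producing it could issue up to two cycles after the mpyadd. A single "multiply takes 3 cycles" number cannot express that.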

Memory serialization latency

• Ensuring the proper ordering of dependent memory operations
• Not the memory latency
  - But, point in the memory pipeline where 2 ops are guaranteed to be processed in sequential order
• Page fault – memory op is re-executed, so need
  - Earliest mem serialization latency
  - Latest mem serialization latency
• Remember
  - Compiler will use this, so any 2 memory ops that cannot be proven independent must be separated by mem serialization latency.

Branch latency

• Time relative to the initiation time of a branch at which the target of the branch is initiated
• What about branch prediction?
  - Can reduce branch latency
  - But, may not make it 1
• We will assume branch latency is 1 for this class (i.e., no delay slots!)

Example:

    0: branch
    1: xxx
    2: yyy
    3: target

    branch latency = k (3)
    delay slots = k – 1 (2)
    Note: xxx and yyy are MultiOps

Reservation tables

For each opcode, the resources used at each cycle relative to its initiation time are specified in the form of a table.

Res1, Res2 are abstract resources to model issue constraints.

[Figure: three reservation tables, each with columns Res1, Res2, ALU, MPY, Result bus and one row per cycle of relative time, for:]
  - Integer add
  - Load, uses ALU for addr calculation, can’t issue load with add or multiply
  - Non-pipelined multiply
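
A reservation table can be modeled as a set of (resource, relative cycle) pairs, and two operations conflict when their tables, shifted by their issue cycles, overlap. In the Python sketch below the table contents are illustrative assumptions, not the entries from the slide: the add is assumed to use Res1, the multiply Res2, and the load both, and the function name `conflicts` is hypothetical.

```python
# Each table is a set of (resource, cycle relative to issue) pairs.
INT_ADD = {("Res1", 0), ("ALU", 0), ("ResultBus", 1)}
MPY_NONPIPELINED = {("Res2", 0), ("MPY", 0), ("MPY", 1), ("ResultBus", 2)}
LOAD = {("Res1", 0), ("Res2", 0), ("ALU", 0), ("ResultBus", 1)}


def conflicts(table_a, issue_a, table_b, issue_b):
    """True if two ops issued at the given cycles need the same resource at once."""
    used_a = {(res, issue_a + t) for res, t in table_a}
    used_b = {(res, issue_b + t) for res, t in table_b}
    return bool(used_a & used_b)


# Back-to-back multiplies collide on the non-pipelined MPY unit.
print(conflicts(MPY_NONPIPELINED, 0, MPY_NONPIPELINED, 1))
# An add issued with the multiply is fine, but one cycle later both results
# would need the result bus in cycle 2.
print(conflicts(INT_ADD, 0, MPY_NONPIPELINED, 0))
print(conflicts(MPY_NONPIPELINED, 0, INT_ADD, 1))
# The load occupies both issue resources, so nothing can issue alongside it.
print(conflicts(LOAD, 0, INT_ADD, 0))
```

Modeling issue slots as ordinary resources, as Res1 and Res2 do here, lets a single overlap test cover both issue restrictions and functional-unit conflicts.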

Hmdes2 – Example integer add entries

    SECTION Operation
    {
      // **** Integer operations ****
      $for (idx in $0..(integer_units-1)){
        // Table 2: Integer computation operations
        $for (class in intarith1_int intarith2_int intarith2_intshift intarith2_intdiv intarith2_intmpy){
          $for (op in ${OP_${class}}){
            $for(w in ${int_alu_widths}){
              "${op}${w}.${idx}"(alt(SA${class}_i${idx}));
            }
          }
        }
      }
    }

What this really says:
  - ADD_W.0 gets alt(SA_intarith2_int_i0)
    - Add on Integer unit 0, SA = scheduling alternative
  - ADD_W.1 gets alt(SA_intarith2_int_i1)
    - Add on Integer unit 1

Trace back of relevant entries for integer add

see trimaran/elcor/mdes/hpl_pd_elcor_std.hmdes
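
The nested `$for` loops are a macro expansion, which can be mimicked in Python to see which entries they generate. This sketch is not the Hmdes2 preprocessor. The variable values (two integer units, the opcode list for `intarith2_int`, a single width `W`) are hypothetical, and the separator characters are chosen so that the output matches the expanded names quoted above.

```python
integer_units = 2                          # assumed number of integer units
OP = {"intarith2_int": ["ADD", "SUB"]}     # hypothetical opcode list per class
int_alu_widths = ["W"]                     # assumed operand widths


def expand(classes):
    """Mimic the nested $for loops: one entry per (unit, class, op, width)."""
    entries = {}
    for idx in range(integer_units):
        for cls in classes:
            for op in OP[cls]:
                for w in int_alu_widths:
                    # Operation name -> scheduling alternative (SA) for that unit.
                    entries[f"{op}_{w}.{idx}"] = f"alt(SA_{cls}_i{idx})"
    return entries


entries = expand(["intarith2_int"])
for name, alternative in entries.items():
    print(name, "gets", alternative)
```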