Slides on Code Generation I - Fall 2002 | EECS 583, Study notes of Electrical and Electronics Engineering

University of Michigan (UM) - Ann Arbor Electrical and Electronics Engineering

Prof. Scott Mahlke

Material Type: Notes; Professor: Mahlke; Class: Advanced Compilers; Subject: Electrical Engineering And Computer Science; University: University of Michigan - Ann Arbor; Term: Winter 2002;

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-980 🇺🇸

10 documents

EECS 583 – Lecture 12

Code Generation I

University of Michigan

February 18, 2002

Code generation

Map optimized “machine-independent” assembly to final assembly code

Input code

Classical optimizations

ILP optimizations

Formed regions, applied if-conversion

Virtual → physical binding

2 big steps

Scheduling

Determine when every operation executes

Create MultiOps

Register allocation

Map virtual → physical registers

Spill to memory if necessary

What do we need to schedule?

Information about the processor

Number of resources

Which resources are used by each instruction

Latencies

Operand encoding limitations

Let's assume

2 issue slots, 1 memory port, 1 adder/multiplier

load = 2 cycles, add = 1 cycle, mpy = 3 cycles

All units fully pipelined

Each operand can be a register or a 6-bit signed literal
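The operand encoding limit above can be checked mechanically. The following is a minimal sketch, assuming the 6-bit signed literal uses two's-complement encoding (the slides do not state the encoding); the function name `fits_signed_literal` is hypothetical.

```python
def fits_signed_literal(value: int, bits: int = 6) -> bool:
    """True if value fits a two's-complement literal field of the given width (assumed encoding)."""
    lo = -(1 << (bits - 1))       # most negative encodable value
    hi = (1 << (bits - 1)) - 1    # most positive encodable value
    return lo <= value <= hi


if __name__ == "__main__":
    for v in (4, 23, 31, 32, -32, -33):
        print(v, fits_signed_literal(v))
```

A literal that does not fit would have to be materialized in a register first, which is one reason literal handling appears as an early code generation step.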

How do we schedule?

When is it legal to schedule an instruction?

Correct execution

Avoid pipeline stalls

Need a precedence graph – flow, anti, output deps

What about memory deps? control deps? Delay slots?

Given multiple operations that can be scheduled, how do you pick the best one?

How do you know it is the best one?

What about a good guess?

Does it matter, just pick one at random?

Are decisions final, or is this an iterative process?

How do we keep track of resources that are busy/free?

Need a reservation table

Matrix (resources x time)
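One way to tie the questions above together is a cycle-by-cycle list scheduler. The following is a minimal sketch, assuming the machine from the "Let's assume" slide (2 issue slots, 1 memory port, 1 adder/multiplier, fully pipelined) and only flow dependences. The priority rule (program order), the decision that choices are final, and all names such as `list_schedule` are assumptions made for illustration, not the course's scheduler.

```python
from collections import defaultdict

LATENCY = {"load": 2, "add": 1, "mpy": 3}
UNIT = {"load": "mem", "add": "alu", "mpy": "alu"}  # one shared adder/multiplier
CAPACITY = {"issue": 2, "mem": 1, "alu": 1}


def list_schedule(ops, deps):
    """ops: {name: opcode} in priority order; deps: {name: [producer names]}.

    Returns {name: issue cycle}. Assumes the dependence graph is acyclic.
    """
    # Reservation table: table[cycle][resource] = number of units in use.
    table = defaultdict(lambda: defaultdict(int))
    issue = {}
    remaining = list(ops)
    cycle = 0
    while remaining:
        for name in list(remaining):
            preds = deps.get(name, [])
            # Legal only if every producer has issued and its result is ready.
            if any(p not in issue or issue[p] + LATENCY[ops[p]] > cycle for p in preds):
                continue
            unit = UNIT[ops[name]]
            # Fully pipelined: each resource is held only in the issue cycle.
            if table[cycle]["issue"] < CAPACITY["issue"] and table[cycle][unit] < CAPACITY[unit]:
                table[cycle]["issue"] += 1
                table[cycle][unit] += 1
                issue[name] = cycle
                remaining.remove(name)
        cycle += 1
    return issue


if __name__ == "__main__":
    ops = {"a": "load", "b": "load", "c": "add", "d": "mpy"}
    deps = {"c": ["a", "b"], "d": ["c"]}
    print(list_schedule(ops, deps))
```

In the usage example the two loads cannot issue together because there is only one memory port, so the second load slips by a cycle and delays everything that depends on it.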

Compiler code generation – 2nd try

Map optimized “machine-independent” assembly to final assembly code

Virtual → physical binding

Cannot do this all at once, too many decisions!!

Do slowly

Each step refines the binding by restricting previous choices

Schedule both before and after register allocation

Initial scheduling is free of real processor register constraints

2nd phase required due to spill code

[Figure: phase order] code selection, literal handling → prepass operation binding → scheduling → register allocation → postpass scheduling → code emission

Why not schedule after allocation?

virtual regs

  r1 = load(r10)
  r2 = load(r11)
  r3 = r1 + 4
  r4 = r1 - r12
  r5 = r2 + r4
  r6 = r5 + r3
  r7 = load(r13)
  r8 = r7 * 23
  store (r8, r6)

physical regs

  R1 = load(R1)
  R2 = load(R2)
  R5 = R1 + 4
  R1 = R1 - R3
  R2 = R2 + R1
  R2 = R2 + R5
  R5 = load(R4)
  R5 = R5 * 23
  store (R5, R2)
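The cost of register reuse can be made concrete by counting dependence edges in both versions of the code. The following is a minimal sketch, assuming register dependences only (memory and control dependences are ignored) and encoding each operation as a hypothetical `(dest, sources)` pair with literals omitted.

```python
def dependences(code):
    """code: list of (dest or None, [source registers]). Returns a set of (kind, i, j) edges."""
    edges = set()
    last_write = {}  # register -> index of its most recent writer
    readers = {}     # register -> indices that read it since that write
    for j, (dest, srcs) in enumerate(code):
        for s in srcs:
            if s in last_write:
                edges.add(("flow", last_write[s], j))
            readers.setdefault(s, []).append(j)
        if dest is not None:
            for i in readers.get(dest, []):
                if i != j:  # an op reading and writing the same register is not an edge
                    edges.add(("anti", i, j))
            if dest in last_write:
                edges.add(("output", last_write[dest], j))
            last_write[dest] = j
            readers[dest] = []
    return edges


def count_by_kind(code):
    counts = {"flow": 0, "anti": 0, "output": 0}
    for kind, _, _ in dependences(code):
        counts[kind] += 1
    return counts


VIRTUAL = [
    ("r1", ["r10"]), ("r2", ["r11"]), ("r3", ["r1"]), ("r4", ["r1", "r12"]),
    ("r5", ["r2", "r4"]), ("r6", ["r5", "r3"]), ("r7", ["r13"]), ("r8", ["r7"]),
    (None, ["r8", "r6"]),
]
PHYSICAL = [
    ("R1", ["R1"]), ("R2", ["R2"]), ("R5", ["R1"]), ("R1", ["R1", "R3"]),
    ("R2", ["R2", "R1"]), ("R2", ["R2", "R5"]), ("R5", ["R4"]), ("R5", ["R5"]),
    (None, ["R5", "R2"]),
]

if __name__ == "__main__":
    print("virtual ", count_by_kind(VIRTUAL))
    print("physical", count_by_kind(PHYSICAL))
```

In this sketch the virtual code has only flow edges, while the physical code adds anti and output edges. For example, R5 = load(R4) can no longer move above the earlier read of R5, although the corresponding virtual load r7 = load(r13) is free to move to the top.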

The 6 step program (cont)

4. Register allocation

Assign physical registers

Bind each access-equivalent register to a specific physical register

Introduce additional code to spill registers to memory

5. Postpass scheduling

A second pass of scheduling to handle spill code

Resource assignments from first pass are ignored

But, registers are physical, so less code motion freedom

6. Code emission

Convert “fully qualified” operations into real assembly

A translator basically

Assembler converts this assembly to machine code

Focus for now on 3, 4, 5, assume 1, 2, 6 are not needed

Machine information

Each step of code generation requires knowledge of the machine

Hard code it? – used to be common practice

If retargetability is a goal, then cannot hard code it

What does the code generator need to know about the target processor?

Structural information?

For each opcode

What registers can be accessed as each of its operands

Other operand encoding limitations

Operation latencies

Read inputs, write outputs

Resources utilized

Which ones, when

IO format

Registers, register files

Number, width, static or rotating

Read-only (hardwired 0) or read-write

Operation

Number of source/dests

Predicated or not

For each source/dest/pred

What register file(s) can be read/written

Literals? If so, how big

Multicluster machine example:

  ADD_W.    gpr1, gpr1 : gpr
  ADD_W_L.  gpr1, lit6 : gpr
  ADD_W.    gpr2, gpr2 : gpr

Latency information

Multiply takes 3 cycles

No, not that simple!!!

Differential input/output latencies

Earliest read latency for each source operand

Latest read latency for each source operand

Earliest write latency for each destination operand

Latest write latency for each destination operand

mpyadd(d1, d2, s1, s2, s3) → d1 = s1 * s2, d2 = d1 + s3
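The four latencies per operand are what a scheduler would use to turn a dependence into a minimum issue distance. The following is a minimal sketch, assuming these conventions (not stated in the slides): a read may happen in the same cycle as the write it depends on, and a write must land strictly after any read it must not disturb. All latency values in the usage example, including those for mpyadd, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lat:
    """Earliest and latest cycle, relative to issue, at which an operand is accessed."""
    earliest: int
    latest: int


def flow_delay(producer_write: Lat, consumer_read: Lat) -> int:
    # The consumer's earliest read must not precede the producer's latest write.
    return producer_write.latest - consumer_read.earliest


def anti_delay(reader_read: Lat, writer_write: Lat) -> int:
    # The writer's earliest write must come strictly after the reader's latest read.
    return reader_read.latest - writer_write.earliest + 1


def output_delay(first_write: Lat, second_write: Lat) -> int:
    # The second op's earliest write must come strictly after the first op's latest write.
    return first_write.latest - second_write.earliest + 1


if __name__ == "__main__":
    mpy_dest = Lat(3, 3)        # hypothetical: multiply writes its result at cycle 3
    add_src = Lat(0, 0)         # hypothetical: add reads its sources at issue
    add_dest = Lat(1, 1)        # hypothetical: add writes its result at cycle 1
    mpyadd_s3 = Lat(2, 2)       # hypothetical: mpyadd reads s3 two cycles after issue
    print(flow_delay(mpy_dest, add_src))     # distance from a multiply to a dependent add
    print(flow_delay(add_dest, mpyadd_s3))   # distance from an add to mpyadd's s3 input
```

Under these assumed numbers the second distance is negative, which is the point of "not that simple": a late-read source such as s3 lets its producer issue after the consumer.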

Memory serialization latency

Ensuring the proper ordering of dependent memory operations

Not the memory latency

But, point in the memory pipeline where 2 ops are guaranteed to be processed in sequential order

Page fault – memory op is re-executed, so need

Earliest mem serialization latency

Latest mem serialization latency

Remember

Compiler will use this, so any 2 memory ops that cannot be proven independent must be separated by mem serialization latency.

Branch latency

Time relative to the initiation time of a branch at which the target of the branch is initiated

What about branch prediction?

Can reduce branch latency

But, may not make it 1

We will assume branch latency is 1 for this class (i.e., no delay slots!)

Example:

  0: branch
  1: xxx
  2: yyy
  3: target

branch latency = k (3)
delay slots = k - 1 (2)
Note: xxx and yyy are MultiOps

Reservation tables

For each opcode, the resources used at each cycle relative to its initiation time are specified in the form of a table

Res1, Res2 are abstract resources to model issue constraints

[Figure: three reservation tables, each with columns Res, ALU, MPY, Result bus and rows indexed by relative time]

Integer add

Load, uses ALU for addr calculation, can’t issue load with add or multiply

Non-pipelined multiply
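A reservation table makes resource conflicts a simple set intersection. The following is a minimal sketch in which each table is a set of (resource, relative cycle) pairs. The particular usage patterns for `ADD` and `NONPIPELINED_MPY` are hypothetical, since the X entries of the original tables are not available here, and the resource names are illustrative.

```python
def conflicts(table_a, start_a, table_b, start_b):
    """True if two ops need the same resource in the same absolute cycle.

    Each table is a set of (resource, relative_cycle) pairs; start_* is the issue cycle.
    """
    used_a = {(res, start_a + t) for res, t in table_a}
    used_b = {(res, start_b + t) for res, t in table_b}
    return bool(used_a & used_b)


# Hypothetical usage patterns, for illustration only.
ADD = {("Res1", 0), ("ALU", 0), ("ResultBus", 1)}
NONPIPELINED_MPY = {
    ("Res1", 0),
    ("MPY", 0), ("MPY", 1), ("MPY", 2),  # multiplier stays busy for 3 cycles
    ("ResultBus", 3),
}

if __name__ == "__main__":
    print(conflicts(NONPIPELINED_MPY, 0, NONPIPELINED_MPY, 1))  # multiplier still busy
    print(conflicts(NONPIPELINED_MPY, 0, NONPIPELINED_MPY, 3))  # multiplier free again
    print(conflicts(NONPIPELINED_MPY, 0, ADD, 2))               # both want the result bus
    print(conflicts(NONPIPELINED_MPY, 0, ADD, 1))               # no shared resource/cycle
```

Modeling the result bus as a resource catches conflicts between operations that issue in different cycles on different units but finish in the same cycle.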

Hmdes2 – Example integer add entries

SECTION Operation{
  // **** Integer operations ****
  $for (idx in $0..(integer_units-1)){
    // Table 2: Integer computation operations
    $for (class in intarith1_int intarith2_int intarith2_intshift intarith2_intdiv intarith2_intmpy){
      $for (op in ${OP_${class}}){
        $for(w in ${int_alu_widths}){
          "${op}${w}.${idx}"(alt(SA_${class}_i${idx}));
        }
      }
    }
  }
}

What this really says:

ADD_W.0 gets alt(SA_intarith2_int_i0)

Add on Integer unit 0, SA = scheduling alternative

ADD_W.1 gets alt(SA_intarith2_int_i1)

Add on Integer unit 1

Trace back of relevant entries for integer add

see trimaran/elcor/mdes/hpl_pd_elcor_std.hmdes
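The nested $for loops are plain macro expansion, which can be mimicked in a few lines. The following is a minimal sketch, not the Hmdes2 preprocessor: the function `expand`, the opcode list for `intarith2_int`, and the width string `_W` are assumptions chosen so that the result matches the ADD_W.0 and ADD_W.1 entries named above.

```python
def expand(integer_units, op_classes, widths):
    """Mimic the nested $for loops: returns {operation name: scheduling alternative}."""
    entries = {}
    for idx in range(integer_units):            # $for (idx in $0..(integer_units-1))
        for cls, ops in op_classes.items():     # $for (class in ...)
            for op in ops:                      # $for (op in ${OP_${class}})
                for w in widths:                # $for (w in ${int_alu_widths})
                    entries[f"{op}{w}.{idx}"] = f"alt(SA_{cls}_i{idx})"
    return entries


if __name__ == "__main__":
    # Hypothetical inputs: one class with two opcodes, one width, two integer units.
    entries = expand(2, {"intarith2_int": ["ADD", "SUB"]}, ["_W"])
    for name, alt in entries.items():
        print(name, "gets", alt)
```

Each generated name pairs one opcode with one integer unit, so the scheduler sees a separate scheduling alternative per unit.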
