Homework 2 - Advanced Compilers - Winter 2004 | EECS 583

Material Type: Assignment; Professor: Mahlke; Class: Advanced Compilers; Subject: Electrical Engineering And Computer Science; University: University of Michigan - Ann Arbor; Term: Winter 2004;

EECS 583 – Homework 2
Winter 2004
Assigned: Monday, February 16, 2004
Due: Sunday, March 7, 2004
This homework focuses on extending the Elcor optimization system to do 2 optimizations:
1. Local common subexpression elimination (CSE)
2. Register renaming
#1: Local CSE
Your first mission is to implement the optimization local common subexpression
elimination. The top-level function should take an entire procedure and walk over each
basic block or hyperblock invoking local CSE on the block. The local CSE algorithm
should identify and eliminate all redundant expressions within a block. Elimination is
accomplished by creating 2 move operations. So, it should perform the following
transformation:
    r1 = r2 + r3 if T   =>   r1 = r2 + r3 if T
                             r100 = r1 if T
    r4 = r2 + r3 if T   =>   r4 = r100 if T
Subsequent copy propagation and dead code elimination will remove the copies (you
don’t need to implement these). Note that you are implementing a local algorithm only,
so any inter-block redundant expressions are not eliminated, only intra-block
redundancies. Your implementation should only handle arithmetic instructions (see
opcode_properties.h for useful functions to identify the appropriate opcodes). Branches,
pbrs, loads, stores, cmpps, and pseudo operations should be excluded from optimization.
You should use reaching definitions (DU/UD chains) to accomplish the optimization. A
sample piece of code will be provided to invoke reaching definition analysis and access
the information. Your optimization should work on either a basic block or hyperblock.
Thus, your optimization should be predicate cognizant. This is easier than it
sounds, as most Elcor analyses (e.g., liveness, reaching defs) are already predicate
cognizant, but it does require a bit of extra work.
Elcor already has a version of global CSE; see Opti/el_opti_common_subexpr_elim.cpp.
You are welcome to look at this and steal any code from it that you may find useful.
However, this implementation is rather difficult to understand and it uses available
expression analysis. Your implementation (which will use reaching defs) will be quite
different, so in reality the amount of useful code you can get from here will be limited
and it may be easier to ignore this code to avoid unnecessary confusion.
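
As an illustration only, the local CSE transformation above can be modeled in a few lines of standalone Python. This is a sketch, not Elcor code: the Op record, the ARITH set, the "mov" opcode name, and the implies() helper are all hypothetical stand-ins. It also swaps in simple in-block kill tracking (an expression stops being available once one of its sources is redefined) in place of real DU/UD chains, treats predicates as fixed symbolic names, does not treat add as commutative, and assumes registers r100 and up are unused.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical stand-in for the opcode checks in opcode_properties.h.
ARITH = {"add", "sub", "mul"}

@dataclass(frozen=True)
class Op:
    dest: str
    opcode: str
    srcs: tuple
    pred: str = "T"

def implies(q, p):
    """Conservative test: q true guarantees p true (only T and identical names)."""
    return p == "T" or p == q

def local_cse(block, fresh=None):
    """Rewrite one block, replacing redundant arithmetic ops with moves."""
    fresh = fresh or (f"r{n}" for n in count(100))  # assumes r100+ are free
    slots = []   # one list per original op: [op] or [op, inserted copy]
    avail = {}   # (opcode, srcs) -> [slot index of first computation, temp or None]
    for op in block:
        key = (op.opcode, op.srcs)
        hit = avail.get(key) if op.opcode in ARITH else None
        if hit is not None and implies(op.pred, slots[hit[0]][0].pred):
            first = slots[hit[0]][0]
            if hit[1] is None:
                # First redundancy found: save the value right after it is computed.
                hit[1] = next(fresh)
                slots[hit[0]].append(Op(hit[1], "mov", (first.dest,), first.pred))
            op = Op(op.dest, "mov", (hit[1],), op.pred)
        slots.append([op])
        # Any def of a source register (predicated or not) kills the expression.
        avail = {k: v for k, v in avail.items() if op.dest not in k[1]}
        if op.opcode in ARITH and op.dest not in op.srcs:
            avail[key] = [len(slots) - 1, None]
    return [o for slot in slots for o in slot]
```

Calling local_cse on the two-op block from the example (both ops computing r2 + r3 under T) yields the original first op, a copy r100 = r1, and r4 = r100. If the first op is guarded by p1 and the second by T, the sketch leaves both ops alone, since the first value is not guaranteed to have been computed.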

#2: Register renaming

Your second mission is to implement register renaming to remove all anti and output dependences from basic blocks/hyperblocks to increase ILP. Register renaming consists of 2 related parts: global safe renaming of disjoint webs, and rename with copy within a block.

Safe renaming identifies disjoint variable lifetimes and renames each to be unique. Thus, no additional operations are inserted. Safe renaming is accomplished by identifying webs of defs/uses for a particular register. Then, whenever 2 or more webs share a name, each web is given a unique name. The primary analysis tool you should use for this optimization is again reaching definitions. Given a program reference (operation x operand) as a seed (assume a def reference is the seed for this explanation), a web can be constructed by iteratively getting all the uses of that definition, then all the defs of those uses, then all the uses of those defs, etc., until there are no changes. Any program reference that is not already in a web may serve as a seed for a new web. At the end of this process, every program reference should be in exactly 1 web. Each web can then be given a unique name if it does not have one already, with all the references in the web having their operand updated to the new name.

Rename with copy breaks up webs by performing additional renaming. However, it is not possible to simply rename; a copy instruction must be inserted to maintain correctness. For example, consider the following code:

    r1 = r2 + r3 if p1   =>   r1 = r2 + r3 if p1     (unchanged)
    r2 = r4 + r5 if p2   =>   r100 = r4 + r5 if p2   (breaks anti dep)
                              r2 = r100 if p2        (new copy)

Things are not quite this simple though. You also want to substitute the renamed register (r100 above) into all subsequent uses of r2 for which r100 is guaranteed to be the only reaching definition. This makes these subsequent uses independent of the copy.
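
For the safe renaming half, the web construction described above (start from a seed, then alternate between uses of defs and defs of uses until nothing changes) might be sketched as follows. This is a standalone Python model under stated assumptions, not Elcor code: the du mapping, the opaque reference names, and the reg_of table are hypothetical, every use is assumed to have at least one def in du, and registers r100 and up are assumed to be free.

```python
from collections import defaultdict
from itertools import count

def build_webs(du):
    """du maps each def reference to the set of use references it reaches."""
    ud = defaultdict(set)           # invert DU chains to get UD chains
    for d, uses in du.items():
        for u in uses:
            ud[u].add(d)
    webs, seen = [], set()
    for seed in du:                 # any def not yet in a web seeds a new one
        if seed in seen:
            continue
        defs, uses = set(), set()
        work = [seed]
        while work:                 # iterate until no new defs or uses appear
            d = work.pop()
            if d in defs:
                continue
            defs.add(d)
            for u in du[d]:
                if u not in uses:
                    uses.add(u)
                    work.extend(ud[u])   # other defs reaching the same use
        seen |= defs
        webs.append((defs, uses))
    return webs

def rename_webs(webs, reg_of):
    """Give each web its own register; reg_of maps a reference to its register."""
    fresh = (f"r{n}" for n in count(100))   # assumes r100+ are free
    taken, new_name = set(), {}
    for defs, uses in webs:
        old = reg_of[next(iter(defs))]
        name = old if old not in taken else next(fresh)
        taken.add(name)
        for ref in defs | uses:
            new_name[ref] = name
    return new_name
```

With du = {"d1": {"u2"}, "d3": {"u5"}, "d4": {"u5"}} and every reference naming r1, build_webs returns two webs: d1 with u2, and d3 and d4 joined through their shared use u5. rename_webs then keeps r1 for the first web and assigns r100 to the second.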
So, consider a larger view of the above example (assume T > p1 > p2 > p3, where > is the superset relation):

    r1 = r2 + r3 if p1   =>   r1 = r2 + r3 if p1     (unchanged)
    r2 = r4 + r5 if p2   =>   r100 = r4 + r5 if p2   (breaks anti dep)
                              r2 = r100 if p2        (new copy)
    = r2 if T            =>   = r2                   (cannot change)
    = r2 if p2           =>   = r100                 (ok to change)
    = r2 if p3           =>   = r100                 (ok to change)

Your rename with copy optimization is restricted in scope to a block (BB or HB). Thus, you do not need to break any cross-block anti and output dependences with this optimization. Further, you do not need to substitute the renamed register outside of the block where it is defined. Rename with copy should just eliminate all remaining anti and output dependences in basic blocks and hyperblocks.

Note that even after rename with copy in the above example, there is still an anti dependence (op1 to op3). So, to avoid an