



These notes cover code optimization techniques used by compilers to improve program performance. Topics include loop unrolling, detection and elimination of redundant expressions, and value numbering. The notes also explain the directed acyclic graph (DAG) and its use in identifying redundant expressions.




CS5363 Appendix 3

Some operations are more expensive than others; multiplication, for example, costs more than addition. The compiler may replace some of the integer multiplications in a computation with additions, a transformation called operator strength reduction.
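As a minimal sketch of operator strength reduction, the two functions below compute the same sequence of values. The function names and the address-style computation (base + i * stride) are illustrative assumptions, not taken from the notes; the point is only that a multiplication inside a loop can be replaced by a running addition.

```python
def addresses_with_multiply(base, stride, n):
    """Compute base + i * stride using one multiplication per iteration."""
    return [base + i * stride for i in range(n)]


def addresses_strength_reduced(base, stride, n):
    """Same values, but the multiplication is replaced by a running addition."""
    result = []
    addr = base  # value of base + 0 * stride
    for _ in range(n):
        result.append(addr)
        addr += stride  # advance to base + (i + 1) * stride
    return result
```

A compiler performs this rewrite on its intermediate code rather than on source text, but the effect on the loop body is the same: one addition per iteration instead of one multiplication.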
The goal of code optimization is to discover, at compile time, information about the run-time behavior of the program and to use that information to improve the code generated by the compiler.
Loop unrolling

      do 60 j = 1, n
         do 50 i = 1, n
            y(i) = y(i) + x(j) * m(i,j)
   50    continue
   60 continue

The outer loop has been unrolled. Four copies of the loop body have been created with different values for j, ranging from j through j+3. The increment of the outer loop changes from 1 to 4.

      do 60 j = 1, n2, 4
         do 50 i = 1, n
            y(i) = y(i) + x(j) * m(i,j)
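The unrolling described above can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, arrays are 0-indexed lists rather than Fortran arrays, and the cleanup loop for trip counts that are not a multiple of 4 is one common way to handle leftovers, not necessarily the one used in the original listing.

```python
def matvec_update(y, x, m):
    """Reference version: y[i] += x[j] * m[i][j] for every i and j."""
    n = len(x)
    for j in range(n):
        for i in range(len(y)):
            y[i] += x[j] * m[i][j]
    return y


def matvec_update_unrolled(y, x, m):
    """Outer loop unrolled by 4, with a cleanup loop for leftover columns."""
    n = len(x)
    main_end = n - n % 4  # largest multiple of 4 not exceeding n
    for j in range(0, main_end, 4):  # outer increment is now 4
        for i in range(len(y)):
            y[i] += x[j] * m[i][j]
            y[i] += x[j + 1] * m[i][j + 1]
            y[i] += x[j + 2] * m[i][j + 2]
            y[i] += x[j + 3] * m[i][j + 3]
    for j in range(main_end, n):  # cleanup: at most 3 iterations
        for i in range(len(y)):
            y[i] += x[j] * m[i][j]
    return y
```

The unrolled version executes the inner loop's control overhead a quarter as often and keeps y[i] live across four updates, which is where the benefit comes from.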
Two issues, safety and profitability, lie at the heart of every optimization. An optimization is safe if it does not change the meaning of the program, and profitable if it actually improves the code.
Redundant Expressions

An expression x+y is redundant inside a block if it has already been computed in the block and no intervening operation redefines x or y.
M = 2*y*z
N = 3*y*z
O = 2*y - z

Optimized as follows:

t0 = 2*y
M = t0*z
N = 3*y*z
O = t0 - z
Building a Directed Acyclic Graph (DAG) to detect redundant expressions.
A DAG represents each distinct expression once. In a DAG, a node can have multiple parents. Any operator node with multiple parents is an expression that is used more than once, so its later uses are redundant.
If the parser uses hashing to detect identical subtrees, it will build DAGs that contain one subtree for each distinct expression. In the example above, all instances of y have the same value, as do all instances of z. However, the textual mechanism used to match y against y has no way to determine whether an intervening assignment changes y's value. If one did, the earlier 2*y would not be equal to the later 2*y.

An easy way to solve this is to associate a counter with each variable and increase the counter on each assignment. Two expressions then have the same representation if and only if they are textually identical and none of the variables used in the expression is redefined between the two occurrences of the expression.
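A minimal sketch of hash-based DAG construction with per-variable counters is shown below. The class name DagBuilder, the tuple encoding of expressions, and the reused list are all assumptions made for illustration. Each variable leaf is keyed by its name and its current assignment count, so an occurrence after a redefinition gets a different node.

```python
class DagBuilder:
    """Builds a DAG by hashing structural keys; variables carry a version counter."""

    def __init__(self):
        self.nodes = {}    # structural key -> node id
        self.version = {}  # variable name -> number of assignments seen so far
        self.reused = []   # operator keys that were found a second time

    def _intern(self, key):
        if key in self.nodes:
            if key[0] == "op":
                self.reused.append(key)  # a redundant expression
            return self.nodes[key]
        self.nodes[key] = len(self.nodes)
        return self.nodes[key]

    def expr(self, e):
        """e is an int constant, a variable name, or a tuple (op, left, right)."""
        if isinstance(e, int):
            return self._intern(("const", e))
        if isinstance(e, str):
            return self._intern(("var", e, self.version.get(e, 0)))
        op, left, right = e
        return self._intern(("op", op, self.expr(left), self.expr(right)))

    def assign(self, target, e):
        """Build the node for e, then bump the counter of the assigned variable."""
        node = self.expr(e)
        self.version[target] = self.version.get(target, 0) + 1
        return node


builder = DagBuilder()
builder.assign("M", ("*", ("*", 2, "y"), "z"))
builder.assign("N", ("*", ("*", 3, "y"), "z"))
builder.assign("O", ("-", ("*", 2, "y"), "z"))
# builder.reused now holds one entry: the shared node for 2*y.
```

Because the key for a variable includes its counter, assigning to y between two occurrences of 2*y produces two different operator keys, and the second occurrence is not reported as redundant.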
Value Numbering

The compiler assigns a distinct number to each value computed at run time, with the property that two expressions, E1 and E2, have the same value number if and only if E1 and E2 are provably equal for all possible operands of the expressions.
Value Numbering a Single Block

Value numbering a single block processes each expression e in the block of the form result = operand1 op operand2.
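The following is a minimal sketch of local value numbering, not a reconstruction of the algorithm in the original notes. It assumes each operation is a tuple (result, op, operand1, operand2) over variable names, and it ignores commutativity, constants, and algebraic identities. The tables var_vn, expr_vn, and holder are hypothetical names.

```python
def local_value_numbering(block):
    """Rewrite redundant operations in one basic block as copies.

    block: list of (result, op, operand1, operand2) tuples of variable names.
    """
    var_vn = {}   # variable name -> value number it currently holds
    expr_vn = {}  # (op, vn1, vn2) -> value number of that expression
    holder = {}   # value number -> variable that was assigned this value
    next_vn = 0
    out = []

    def number_of(name):
        nonlocal next_vn
        if name not in var_vn:  # first use: the variable's incoming value
            var_vn[name] = next_vn
            next_vn += 1
        return var_vn[name]

    for result, op, a, b in block:
        key = (op, number_of(a), number_of(b))
        vn = expr_vn.get(key)
        src = holder.get(vn) if vn is not None else None
        if src is not None and var_vn.get(src) == vn:
            # The value was computed earlier and src still holds it.
            out.append((result, "copy", src, None))
        else:
            if vn is None:  # first time this expression is seen
                vn = next_vn
                next_vn += 1
                expr_vn[key] = vn
            holder[vn] = result
            out.append((result, op, a, b))
        var_vn[result] = vn
    return out


block = [("a", "+", "b", "c"), ("b", "-", "a", "d"),
         ("c", "+", "b", "c"), ("d", "-", "a", "d")]
rewritten = local_value_numbering(block)
# The last operation becomes ("d", "copy", "b", None).
```

Note that the third operation, c = b + c, is textually identical to the first but is not redundant, because b has been redefined in between; the value numbers of its operands differ, so the lookup misses.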
[Figure: DAG for the example statement list; node labels include StmtList, 2, 3, y, z, and -.]
Scopes of optimization include local, superlocal, regional, global, and whole-program. Local methods are limited to a single basic block. Superlocal methods operate over extended basic blocks: an extended basic block (EBB) is a set of blocks B1, B2, …, Bn where B1 may have multiple predecessors and every other block has a unique predecessor in the EBB. Global methods, also called intraprocedural methods, examine an entire procedure. Whole-program methods, also called interprocedural methods, consider the entire program as their scope.
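To make the EBB definition concrete, here is a small sketch that partitions a control-flow graph into maximal EBBs. The representation (a dict from block name to successor list, with every block present as a key) and the function name are assumptions for illustration.

```python
def extended_basic_blocks(succ, entry):
    """Partition a CFG into maximal extended basic blocks.

    succ: dict mapping each block name to a list of successor block names.
    Returns a dict mapping each EBB's first block to the blocks in that EBB.
    """
    preds = {b: [] for b in succ}
    for b, targets in succ.items():
        for t in targets:
            preds[t].append(b)

    # A block starts an EBB if it is the entry or lacks a unique predecessor.
    roots = [b for b in succ if b == entry or len(preds[b]) != 1]

    ebbs = {}
    for root in roots:
        members, stack = [], [root]
        while stack:
            b = stack.pop()
            members.append(b)
            # Follow edges only into blocks whose unique predecessor is b.
            stack.extend(t for t in succ[b] if t not in roots and preds[t] == [b])
        ebbs[root] = members
    return ebbs


cfg = {"A": ["B", "C"], "B": ["G"], "C": ["D", "E"],
       "D": ["F"], "E": ["F"], "F": ["G"], "G": []}
ebbs = extended_basic_blocks(cfg, "A")
# Three EBBs: one rooted at A containing A, B, C, D, E; then F alone; then G alone.
```

In this example F and G each have two predecessors, so each starts its own EBB, while B, C, D, and E all hang off A through unique-predecessor edges.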
In Superlocal Value Numbering, the compiler extends its scope from a single basic block to an EBB. This approach can find redundancies and constant-valued expressions that the local algorithm misses.
Dominator-Based Value Numbering

Superlocal value numbering misses some opportunities because it must discard the entire value table when it reaches a block that has multiple predecessors in the CFG (according to the definition of EBBs). Static single-assignment (SSA) form has two important properties: each name is defined by exactly one operation, and each use of a value refers to exactly one definition. If the code is in SSA form, the compiler can use the value table created for the most recent common ancestor along all paths that reach a block.
In a CFG, if node x appears on every path from the graph's entry to y, then we say that x dominates y. By definition, x dominates x. If x dominates y and x is not equal to y, then x strictly dominates y. The set of dominators for y is denoted Dom(y). The immediate dominator of y is the strict dominator of y that is closest to y, denoted IDom(y). The value table of IDom(y) is used to initialize y's value table.
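The definitions of Dom and IDom can be illustrated with a simple iterative computation. This sketch assumes every block is reachable from the entry and uses the straightforward set-based fixed-point formulation; the function names and CFG encoding are illustrative, and production compilers typically use faster algorithms.

```python
def dominators(succ, entry):
    """Iteratively compute Dom(n) for every block of a CFG.

    succ: dict mapping each block name to a list of successor block names.
    """
    blocks = list(succ)
    preds = {b: [] for b in blocks}
    for b in blocks:
        for t in succ[b]:
            preds[t].append(b)

    dom = {b: set(blocks) for b in blocks}  # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            common = set.intersection(*(dom[p] for p in preds[b])) if preds[b] else set()
            new = {b} | common
            if new != dom[b]:
                dom[b], changed = new, True
    return dom


def immediate_dominator(dom, n):
    """Return the strict dominator of n closest to n, or None for the entry."""
    strict = dom[n] - {n}
    # Strict dominators of n form a chain; the closest one is dominated by
    # all the others, so it has the largest dominator set.
    return max(strict, key=lambda d: len(dom[d])) if strict else None


cfg = {"A": ["B", "C"], "B": ["G"], "C": ["D", "E"],
       "D": ["F"], "E": ["F"], "F": ["G"], "G": []}
dom = dominators(cfg, "A")
# Dom(F) is {A, C, F} and IDom(F) is C; Dom(G) is {A, G} and IDom(G) is A.
```

For dominator-based value numbering, this means F can start from the value table built for C, and G from the table built for A, instead of starting from an empty table.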
Global Redundancy Elimination
Classic data-flow analysis is used to compute Avail(n), the set of expressions that are available on entry to each block n.
DEExpr(n): the set of downward exposed expressions in block n; i.e., those expressions defined in n that survive to the end of n.
ExprKill(n): all expressions killed by a definition in block n.
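Avail(n) is computed from the predecessors of n. An expression is available on entry to n if, for every predecessor m, it is either downward exposed in m or available on entry to m and not killed in m:

Avail(n) = intersection over all m in preds(n) of ( DEExpr(m) ∪ ( Avail(m) - ExprKill(m) ) )

Here preds(n) is the set of predecessors of n, "-" is set difference, and Avail of the entry block is empty.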
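A minimal sketch of the iterative Avail computation follows. It assumes the standard formulation in which Avail(n) is the intersection, over predecessors m, of DEExpr(m) together with the expressions in Avail(m) that are not in ExprKill(m). Expressions are plain strings, and the function name and CFG encoding are illustrative assumptions.

```python
def available_expressions(succ, entry, de_expr, expr_kill):
    """Compute Avail(n) for each block by iterating to a fixed point.

    succ: dict mapping block name -> list of successor names.
    de_expr, expr_kill: dicts mapping block name -> set of expressions.
    """
    blocks = list(succ)
    preds = {b: [] for b in blocks}
    for b in blocks:
        for t in succ[b]:
            preds[t].append(b)

    universe = set().union(*de_expr.values())
    avail = {b: set(universe) for b in blocks}  # optimistic initial guess
    avail[entry] = set()                        # nothing is available on entry
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            new = set(universe) if preds[b] else set()
            for p in preds[b]:
                new &= de_expr[p] | (avail[p] - expr_kill[p])
            if new != avail[b]:
                avail[b], changed = new, True
    return avail


cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
de_expr = {"A": {"a+b"}, "B": {"c+d"}, "C": {"c+d"}, "D": set()}
expr_kill = {"A": set(), "B": {"a+b"}, "C": set(), "D": set()}
avail = available_expressions(cfg, "A", de_expr, expr_kill)
# Avail(D) is {"c+d"}: a+b is killed along the path through B.
```

The intersection is what makes the analysis safe: a+b reaches D along the path through C, but not along the path through B, so it cannot be treated as available at D.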
If local value numbering finds an evaluation of e in block n, where e is in Avail(n), it rewrites the evaluation as a copy operation from a newly generated name, temp_i, where i is the index of e in the name space. For each block n, if e is in DEExpr(n) and e is referenced in this way, the compiler must insert a copy after the last definition of e in n that moves the value of e into temp_i.
Cloning to Increase Context

Merge points in the CFG cause a loss of information during optimization. The compiler can clone basic blocks to eliminate merge points. Cloning produces longer blocks, eliminates branches, and creates more optimization opportunities.
Inline Substitution

Procedure calls present a significant barrier to optimization. The compiler can replace a call site with the body of the callee, with appropriate renaming and copying to simulate the effects of parameter binding at the original call site. Doing so exposes more opportunities for optimization and eliminates the operations involved in the calling sequence.