Code Optimization Techniques: Loop Unrolling, Redundant Expressions, and Value Numbering

An overview of code optimization techniques that compilers use to improve program performance. Topics include loop unrolling, detection and elimination of redundant expressions, and value numbering. The document also explains the directed acyclic graph (DAG) and its use in identifying redundant expressions.

Uploaded on 08/18/2009

koofers-user-0xu-1

CS5363 Appendix 3
Some operations are more expensive than others; multiplication, for example, costs more
than addition. The compiler may replace some of the integer multiplications in a
computation with additions, a transformation called operator strength reduction.
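
As a minimal sketch of operator strength reduction (written in Python for illustration; the function names and the address-computation scenario are assumptions, not part of the original notes), the two functions below produce the same sequence of values. The first multiplies on every iteration; the second keeps a running sum and only adds.

```python
def addresses_with_multiply(base, stride, n):
    """Compute base + i*stride for i = 0..n-1, one multiply per iteration."""
    return [base + i * stride for i in range(n)]


def addresses_strength_reduced(base, stride, n):
    """Same values, with the multiply replaced by a running addition."""
    result = []
    addr = base                 # equals base + 0*stride
    for _ in range(n):
        result.append(addr)
        addr += stride          # next value is the previous one plus stride
    return result


# Both versions agree on every element.
assert addresses_with_multiply(100, 8, 5) == addresses_strength_reduced(100, 8, 5)
```

A compiler would perform this rewrite on its intermediate representation rather than on source code; the sketch only shows that the two computations are equivalent.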
The goal of code optimization is to discover, at compile time, information about the run-
time behavior of the program and to use that information to improve the code generated
by the compiler.
Loop Unrolling
      do 60 j = 1, n2
         do 50 i = 1, n1
            y(i) = y(i) + x(j) * m(i,j)
 50      continue
 60   continue
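In the version below, the outer loop has been unrolled by a factor of 4. The inner loop
body now contains four copies of the original computation, one for each value of j from
j through j+3, and the increment of the outer loop changes from 1 to 4. This form assumes
that n2 is a multiple of 4; otherwise the leftover iterations need a separate cleanup loop.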
      do 60 j = 1, n2, 4
         do 50 i = 1, n1
            y(i) = y(i) + x(j)   * m(i,j)
                        + x(j+1) * m(i,j+1)
                        + x(j+2) * m(i,j+2)
                        + x(j+3) * m(i,j+3)
 50      continue
 60   continue
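
The same transformation can be sketched in Python to check that the unrolled loop computes the same result as the original. This is an illustrative sketch, not the notes' own code: the function names are hypothetical, indices are 0-based, and a cleanup loop is added as an assumption so that the number of columns does not have to be a multiple of 4.

```python
def matvec_accumulate(y, x, m):
    """Original loop nest: y[i] += x[j] * m[i][j] for every j, then every i."""
    y = list(y)
    for j in range(len(x)):
        for i in range(len(y)):
            y[i] += x[j] * m[i][j]
    return y


def matvec_accumulate_unrolled(y, x, m):
    """Outer loop unrolled by 4, followed by a cleanup loop for leftovers."""
    y = list(y)
    n1, n2 = len(y), len(x)
    main_end = n2 - n2 % 4              # largest multiple of 4 not above n2
    for j in range(0, main_end, 4):     # outer step changes from 1 to 4
        for i in range(n1):
            y[i] = (y[i] + x[j] * m[i][j]
                    + x[j + 1] * m[i][j + 1]
                    + x[j + 2] * m[i][j + 2]
                    + x[j + 3] * m[i][j + 3])
    for j in range(main_end, n2):       # cleanup: at most 3 iterations
        for i in range(n1):
            y[i] += x[j] * m[i][j]
    return y


y0 = [1, 2]
x0 = [1, 2, 3, 4, 5, 6]
m0 = [[1, 0, 2, 0, 3, 1],
      [0, 1, 0, 2, 0, 3]]
assert matvec_accumulate(y0, x0, m0) == matvec_accumulate_unrolled(y0, x0, m0)
```

The unrolled version loads and stores y[i] once per four columns instead of once per column, which is where the benefit comes from in compiled code.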
Two issues, safety and profitability, lie at the heart of every optimization. An
optimization is safe only if it does not change the meaning of the program; it is
profitable only if it actually improves the code.
Redundant Expressions
An expression x+y is redundant inside a block if it has already been computed in the
block and no intervening operation redefines x or y. In the block below, 2*y is
computed twice:
M = 2*y*z
N = 3*y*z
O = 2*y-z
Optimized as follows:
t0 = 2*y
M = t0*z
N = 3*y*z
O = t0 - z
Building a Directed Acyclic Graph (DAG) to detect redundant expressions.

A DAG represents each distinct expression once. In a DAG, a node can have multiple parents. Any node with multiple parents must be a redundant expression.

If the parser uses hashing to detect identical subtrees, it builds a DAG that contains one subtree for each distinct expression. In the example, all instances of y share one node, as do all instances of z. However, this purely textual matching has no way to determine that an intervening assignment changes y's value; in that case the earlier 2*y is not equal to the later 2*y. An easy way to solve this is to associate a counter with each variable, increment the counter on each assignment to that variable, and include the counter when hashing references to the variable. Two expressions then have the same representation if and only if they are textually identical and none of the variables used in the expression is redefined between the two occurrences of the expression.
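
The hashing scheme with per-variable counters can be sketched as follows. This is a minimal illustration under stated assumptions: the class `DagBuilder` and its methods `leaf`, `op`, and `assign` are hypothetical names, nodes are represented only by integer ids, and a Python dict stands in for the parser's hash table.

```python
class DagBuilder:
    """Hash-based DAG construction with a version counter per variable."""

    def __init__(self):
        self.nodes = {}      # hash key -> node id (one node per distinct key)
        self.version = {}    # variable name -> number of assignments so far

    def _intern(self, key):
        # Reuse the node for an existing key, otherwise create a new one.
        if key not in self.nodes:
            self.nodes[key] = len(self.nodes)
        return self.nodes[key]

    def leaf(self, name):
        if isinstance(name, int):
            return self._intern(("const", name))
        # The counter is part of the key, so y before and after an
        # assignment maps to two different leaves.
        return self._intern(("var", name, self.version.get(name, 0)))

    def op(self, operator, left, right):
        return self._intern((operator, left, right))

    def assign(self, name):
        # Record that `name` has been redefined.
        self.version[name] = self.version.get(name, 0) + 1


b = DagBuilder()
first = b.op("*", b.leaf(2), b.leaf("y"))
second = b.op("*", b.leaf(2), b.leaf("y"))
assert first == second       # same node: the second 2*y is redundant

b.assign("y")                # an intervening assignment to y
third = b.op("*", b.leaf(2), b.leaf("y"))
assert third != first        # the later 2*y is a different expression
```

Because the counter is folded into the leaf's key, the operator nodes need no special handling: they differ automatically whenever one of their operands does.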

Value Numbering

The compiler assigns a distinct number to each value computed at run time, with the property that two expressions E1 and E2 have the same value number if and only if E1 and E2 are provably equal for all possible operands of the expressions.

Value Numbering a Single Block

For each expression e in the block of the form result = operand1 op operand2:

  1. Get the value numbers for operand1 and operand2.
  2. Construct a hash key from op and the value numbers for operand1 and operand2.
  3. If the hash key is already present in the table, replace expression e with a copy operation and record the existing value number for result. Otherwise, insert the hash key into the table, assign the hash key a new value number, and record that value number for result.
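
The three steps can be sketched in Python as shown below. This is a sketch under assumptions, not a definitive implementation: the tuple format `(result, op, operand1, operand2)`, the function name, and the `name_of_vn` table (which remembers a variable holding each value so the copy has a source) are all illustrative choices. Commutativity and constant folding are not handled.

```python
from itertools import count


def local_value_numbering(block):
    """Rewrite redundant expressions in one basic block as copies.

    block: list of (result, op, operand1, operand2) tuples.
    Returns a new list; a redundant entry becomes (result, "copy", source, None).
    """
    fresh = count()
    vn_of_name = {}   # variable or constant -> value number
    vn_of_expr = {}   # hash key (op, vn1, vn2) -> value number
    name_of_vn = {}   # value number -> variable that received that value
    out = []

    def operand_vn(operand):
        # Step 1: look up the operand's value number, creating one if needed.
        if operand not in vn_of_name:
            vn_of_name[operand] = next(fresh)
        return vn_of_name[operand]

    for result, op, a, b in block:
        key = (op, operand_vn(a), operand_vn(b))        # step 2
        vn = vn_of_expr.get(key)
        source = name_of_vn.get(vn)
        if vn is not None and vn_of_name.get(source) == vn:
            # Step 3, key present and its holder still has the value.
            out.append((result, "copy", source, None))
        else:
            if vn is None:                               # step 3, new key
                vn = vn_of_expr[key] = next(fresh)
            name_of_vn[vn] = result
            out.append((result, op, a, b))
        vn_of_name[result] = vn
    return out


# The example block in three-address form (temporaries t0..t2 are assumed).
block = [
    ("t0", "*", 2, "y"), ("M", "*", "t0", "z"),
    ("t1", "*", 3, "y"), ("N", "*", "t1", "z"),
    ("t2", "*", 2, "y"), ("O", "-", "t2", "z"),
]
rewritten = local_value_numbering(block)
assert rewritten[4] == ("t2", "copy", "t0", None)
```

The check that the recorded holder still has the value guards against the case where that variable was overwritten after the expression was first computed; in that case the sketch simply recomputes the expression.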

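[Figure: tree for the example statements M = 2*y*z, N = 3*y*z, O = 2*y-z. A chain of StmtList nodes holds the three assignments to M, N, and O, with operator nodes (*, *, -) above the leaves 2, 3, y, and z.]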

Scopes of optimization include local, superlocal, regional, global, and whole-program. Local methods are limited to a single basic block. Superlocal methods operate on an extended basic block (EBB): a set of blocks B1, B2, …, Bn where B1 may have multiple predecessors and every other block has a unique predecessor in the EBB. Global methods, also called intraprocedural methods, examine an entire procedure. Whole-program methods, also called interprocedural methods, consider the entire program as their scope.

In Superlocal Value Numbering, the compiler extends its scope from a single basic block to an EBB. This approach can find redundancies and constant-valued expressions that the local algorithm misses.
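
A minimal sketch of the superlocal idea follows, assuming SSA-like names (each variable is defined at most once), a CFG given as dictionaries, and the same tuple format as a local value-numbering pass would use. The function name and the use of `collections.ChainMap` for scoped tables are illustrative assumptions. A successor with exactly one predecessor inherits its predecessor's table; any other block starts a new EBB with an empty table.

```python
from collections import ChainMap
from itertools import count


def superlocal_value_numbering(blocks, succs, entry):
    """Find redundant computations within each extended basic block.

    blocks: block name -> list of (result, op, a, b) tuples (SSA-like names).
    succs:  block name -> list of successor block names.
    Returns {result: earlier name holding the same value}.
    """
    preds = {b: 0 for b in blocks}
    for b in blocks:
        for s in succs.get(b, []):
            preds[s] += 1
    preds[entry] += 1            # virtual edge into the entry block
    fresh = count()
    redundant = {}
    started = {entry}            # blocks already processed as EBB heads

    def process(block, table):
        for result, op, a, b in blocks[block]:
            for operand in (a, b):
                if operand not in table:
                    table[operand] = next(fresh)
            key = (op, table[a], table[b])
            if key in table:
                vn, source = table[key]
                redundant[result] = source
            else:
                vn = next(fresh)
                table[key] = (vn, result)
            table[result] = vn
        for s in succs.get(block, []):
            if preds[s] == 1:
                process(s, table.new_child())   # same EBB: inherit the table
            elif s not in started:
                started.add(s)
                process(s, ChainMap())          # new EBB: empty table

    process(entry, ChainMap())
    return redundant


# Diamond CFG: A -> B, A -> C, B -> D, C -> D.
blocks = {
    "A": [("t1", "+", "a", "b")],
    "B": [("t2", "+", "a", "b")],
    "C": [("t3", "+", "a", "b")],
    "D": [("t4", "+", "a", "b")],
}
succs = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
found = superlocal_value_numbering(blocks, succs, "A")
assert found == {"t2": "t1", "t3": "t1"}
```

In this example the redundancy in block D is missed, because D has two predecessors and therefore begins a new EBB with an empty table.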

Dominator-Based Value Numbering

Superlocal value numbering misses some opportunities because it must discard the entire value table when it reaches a block that has multiple predecessors in the CFG (according to the definition of EBBs). Static single-assignment (SSA) form has two important properties: each name is defined by exactly one operation, and each use of a value refers to exactly one definition. If the code is in SSA form, the compiler can use the value table created for the most recent common ancestor along all paths that reach a block.

In a CFG, if node x appears on every path from the graph's entry to y, then we say that x dominates y. By definition, x dominates x. If x dominates y and x is not equal to y, then x strictly dominates y. The set of dominators for y is denoted Dom(y). The immediate dominator of y is the strict dominator of y that is closest to y, denoted IDom(y). The value table of IDom(y) is used to initialize y's value table.
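
Dom and IDom can be computed with a simple iterative algorithm. The sketch below is one straightforward way to do it, not necessarily the method the notes have in mind; the function names and the dictionary representation of the CFG are assumptions, and every node is assumed to be reachable from the entry and to appear as a key in `succs`.

```python
def dominators(succs, entry):
    """Return Dom(n) for every node, by iterating to a fixed point."""
    nodes = set(succs)
    preds = {n: set() for n in nodes}
    for n, targets in succs.items():
        for t in targets:
            preds[t].add(n)
    dom = {n: set(nodes) for n in nodes}    # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            # n is dominated by itself plus whatever dominates all predecessors.
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def immediate_dominators(dom, entry):
    """Return IDom(n) for every node except the entry."""
    idom = {}
    for n, ds in dom.items():
        if n == entry:
            continue
        strict = ds - {n}
        # The strict dominators of n form a chain; the closest one is
        # dominated by all the others, so it has the largest Dom set.
        idom[n] = max(strict, key=lambda d: len(dom[d]))
    return idom


cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
dom = dominators(cfg, "A")
idom = immediate_dominators(dom, "A")
assert dom["D"] == {"A", "D"}
assert idom == {"B": "A", "C": "A", "D": "A", "E": "D"}
```

Here block D, which superlocal value numbering would start with an empty table, can instead be initialized from the table of its immediate dominator A.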

Global Redundancy Elimination

Classic data-flow analysis is used to compute the set of expressions that are available on entry to each block.

  • An expression e is defined at point p in the CFG if its value is computed at p. We call p a definition site for e.
  • An expression e is killed at point p in the CFG if one or more of its operands is defined at p. We call p a kill site of e.
  • An expression e is available at point p in the CFG if every path leading to p contains a prior definition of e, and e is not killed between that definition and p.

Computing Available Expressions

DEExpr(n): the set of downward-exposed expressions in block n, i.e., those expressions defined in n that survive to the end of n.

ExprKill(n): the set of all expressions killed by a definition in block n.

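Avail(n), the set of expressions available on entry to block n, is computed from the predecessors of n. Each predecessor m contributes the expressions that are available at its end: those defined in m that survive to the end of m, DEExpr(m), together with those that are available on entry to m and are not killed in m. Avail(n) is the intersection of these contributions over all predecessors m of n:

Avail(n) = ∩ over m in preds(n) of ( DEExpr(m) ∪ (Avail(m) − ExprKill(m)) )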
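
The computation of Avail can be sketched as an iterative fixed-point solver. This is a minimal sketch under assumptions: the function name and argument layout are illustrative, expressions are represented as plain strings, and the DEExpr and ExprKill sets are supplied by the caller rather than derived from code. Every set except the entry's starts as the set of all expressions and shrinks until nothing changes.

```python
def available_expressions(blocks, preds, de_expr, expr_kill, entry):
    """Solve Avail(n) = intersection over predecessors m of
    DEExpr(m) | (Avail(m) - ExprKill(m)), with Avail(entry) empty."""
    all_exprs = set().union(*de_expr.values())
    avail = {n: set(all_exprs) for n in blocks}   # optimistic initial guess
    avail[entry] = set()                          # nothing available at entry
    changed = True
    while changed:
        changed = False
        for n in blocks:
            if n == entry:
                continue
            # What each predecessor makes available at its end.
            outs = [de_expr[m] | (avail[m] - expr_kill[m]) for m in preds[n]]
            new = set.intersection(*outs) if outs else set()
            if new != avail[n]:
                avail[n] = new
                changed = True
    return avail


# Diamond CFG: A -> B, A -> C, B -> D, C -> D. Block C redefines a,
# which kills a+b; both B and C compute c+d.
blocks = ["A", "B", "C", "D"]
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
de_expr = {"A": {"a+b"}, "B": {"c+d"}, "C": {"c+d"}, "D": set()}
expr_kill = {"A": set(), "B": set(), "C": {"a+b"}, "D": set()}

avail = available_expressions(blocks, preds, de_expr, expr_kill, "A")
assert avail["B"] == {"a+b"}
assert avail["D"] == {"c+d"}
```

In the example, a+b reaches D along the path through B but is killed along the path through C, so it is not available at D; c+d is computed on both paths and is.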

If local value numbering finds an evaluation of e in block n, where e is in Avail(n), it rewrites the evaluation as a copy operation from a newly generated name, temp_i, where i is the index of e in the name space. For each block n, if e is in DEExpr(n) and e is referenced in this way, the compiler must insert a copy after the last definition of e in n that moves the value of e into temp_i.

Cloning to Increase Context

Merge points in the CFG cause a loss of information during optimization. The compiler can clone basic blocks to eliminate merge points. Cloning produces longer blocks, eliminates branches, and creates more opportunities for optimization.

Inline Substitution

Procedure calls present a significant barrier to optimization. The compiler can replace a call site with the body of the callee, with appropriate renaming and copying to simulate the effects of parameter binding at the original call site. Inlining exposes more opportunities for optimization and eliminates the operations involved in the calling sequence.