Predicated Execution: Effective Compiler Support using Hyperblock - Prof. Scott Mahlke, Study notes of Electrical and Electronics Engineering

These notes cover the concept of predicated execution and its effective compiler support through the use of hyperblocks. The authors, S. Mahlke et al., present the concept of predicated code and the assumptions required to make it work. They also discuss OR-type predicates and their importance in generating predicated code. The document also covers the handling of arbitrarily complex graphs and the steps involved in backedge coalescing, control dependence analysis, and control flow substitution.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

EECS 583 – Class 4
If-conversion
University of Michigan
January 19, 2005

Reading Material

• Today’s class
  » “The Program Dependence Graph and Its Use in Optimization”, J. Ferrante, K. Ottenstein, and J. Warren, ACM TOPLAS, 1987.
  » “On Predicated Execution”, Park and Schlansker, HPL Technical Report, 1991.
• Material for the next lecture
  » “Effective Compiler Support for Predicated Execution using the Hyperblock”, S. Mahlke et al., MICRO-25, 1992.
  » “Control CPR: A Branch Height Reduction Optimization for EPIC Processors”, M. Schlansker et al., PLDI-99, 1999.


Recap: CMPP Action Specifiers

Guarding predicate   Compare result   UN   UC   ON   OC   AN   AC
        0                  0           0    0    -    -    -    -
        0                  1           0    0    -    -    -    -
        1                  0           0    1    -    1    0    -
        1                  1           1    0    1    -    -    0

A "-" entry means the destination predicate is left unchanged.

• UN/UC = Unconditional normal/complement
  » This is what we used in the earlier examples
  » guard = 0, both outputs are 0
  » guard = 1, UN = Compare result, UC = opposite
• ON/OC = OR-type normal/complement
• AN/AC = AND-type normal/complement
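The action specifiers above can be modelled in a few lines of Python. This is only an illustrative sketch: the function name `cmpp` and the convention of passing the destination's previous value as `old` are assumptions made here, not something the slides define. It treats OR-type destinations as only ever being set to 1 and AND-type destinations as only ever being cleared to 0, which is how the recap table reads.

```python
def cmpp(action, guard, result, old):
    """Model one CMPP destination; returns the new predicate value (0 or 1).

    action is one of 'UN', 'UC', 'ON', 'OC', 'AN', 'AC' (hypothetical encoding).
    old is the value the destination predicate held before the compare.
    """
    kind, sense = action
    # Complement forms look at the inverted compare result.
    cond = result if sense == "N" else 1 - result
    if kind == "U":          # unconditional: always writes the destination
        return cond if guard else 0
    if kind == "O":          # OR-type: may only set the predicate to 1
        return 1 if (guard and cond) else old
    if kind == "A":          # AND-type: may only clear the predicate to 0
        return 0 if (guard and not cond) else old
    raise ValueError(f"unknown action specifier: {action}")


# Print one row per (guard, compare result); "-" marks "old value kept".
for guard in (0, 1):
    for result in (0, 1):
        row = []
        for action in ("UN", "UC", "ON", "OC", "AN", "AC"):
            new_if_0 = cmpp(action, guard, result, 0)
            new_if_1 = cmpp(action, guard, result, 1)
            row.append(str(new_if_0) if new_if_0 == new_if_1 else "-")
        print(guard, result, *row)
```

Passing `old` explicitly keeps the "no write" cases visible: whenever the result depends on `old`, the instruction did not write that destination.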


Recap: OR-type, AND-type Predicates

OR-type (wired-OR into p1)

p1 = 0
p1 = cmpp_ON (r1 < r2) if T
p1 = cmpp_OC (r3 < r4) if T
p1 = cmpp_ON (r5 < r6) if T

Result: p1 = (r1 < r2) | (!(r3 < r4)) | (r5 < r6)

AND-type (wired-AND into p1)

p1 = 1
p1 = cmpp_AN (r1 < r2) if T
p1 = cmpp_AC (r3 < r4) if T
p1 = cmpp_AN (r5 < r6) if T

Result: p1 = (r1 < r2) & (!(r3 < r4)) & (r5 < r6)

Generating predicated code for some source code requires OR-type predicates

Talk about these later – used for control height reduction
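The wired-OR and wired-AND sequences above can be checked exhaustively with a small Python sketch. The helper names `wired_or` and `wired_and` and the `(compare_result, is_complement)` encoding are assumptions introduced for illustration; every compare is taken to be guarded by T, as in the recap.

```python
from itertools import product


def wired_or(terms):
    """terms: list of (compare_result, is_complement) pairs, all guarded by T."""
    p = 0                                  # p1 = 0
    for result, is_complement in terms:
        cond = (not result) if is_complement else result
        if cond:                           # ON/OC can only write a 1
            p = 1
    return p


def wired_and(terms):
    """Same encoding as wired_or, but for AN/AC compares."""
    p = 1                                  # p1 = 1
    for result, is_complement in terms:
        cond = (not result) if is_complement else result
        if not cond:                       # AN/AC can only write a 0
            p = 0
    return p


# a, b, c stand for the three compare results, in program order.
for a, b, c in product((False, True), repeat=3):
    seq = [(a, False), (b, True), (c, False)]   # normal, complement, normal
    assert wired_or(seq) == int(a or (not b) or c)
    assert wired_and(seq) == int(a and (not b) and c)
```

Because each compare can only push the predicate in one direction, the three compares do not depend on each other's outcome, which is what lets hardware issue them in any order or in the same cycle.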


Class Problem

w = w + 1
if (a == 0 || b <= 1) {
    x = x + 1
} else if (c != -1) {
    y = y + 1
}
z = z + 1

a. Draw the CFG
b. Predicate the code removing all branches
c. Where could you use AND-type predicates to potentially speed things up?
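As a way to sanity-check an answer to part (b), the sketch below compares the branching code with one possible predicated version in Python. This is not the official solution from the class: the predicate names `p1` to `p4` and the particular mix of UN/UC/ON compares are assumptions chosen for illustration, and each guarded operation is emulated with a conditional expression.

```python
from itertools import product


def branching(w, x, y, z, a, b, c):
    """The class problem as written, with branches."""
    w = w + 1
    if a == 0 or b <= 1:
        x = x + 1
    elif c != -1:
        y = y + 1
    z = z + 1
    return w, x, y, z


def predicated(w, x, y, z, a, b, c):
    """One possible branch-free version; comments show the emulated operation."""
    w = w + 1
    p1 = 0                               # OR-type destination starts at 0
    t = a == 0
    p1 = 1 if t else p1                  # p1 = cmpp_ON (a == 0) if T
    p3 = not t                           # p3 = cmpp_UC (a == 0) if T
    t = b <= 1
    p1 = 1 if (p3 and t) else p1         # p1 = cmpp_ON (b <= 1) if p3
    p4 = (not t) if p3 else False        # p4 = cmpp_UC (b <= 1) if p3
    p2 = (c != -1) if p4 else False      # p2 = cmpp_UN (c != -1) if p4
    x = x + 1 if p1 else x               # x = x + 1 if p1
    y = y + 1 if p2 else y               # y = y + 1 if p2
    z = z + 1                            # z = z + 1 if T
    return w, x, y, z


# Cover every combination of the three conditions being true or false.
for a, b, c in product((0, 5), (0, 1, 2), (-1, 7)):
    assert branching(0, 0, 0, 0, a, b, c) == predicated(0, 0, 0, 0, a, b, c)
```

Note that `p1` is written by two OR-type compares, which is the situation the recap refers to when it says some source code requires OR-type predicates.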


If-conversion

• Algorithm for generating predicated code
  » Automate what we’ve been doing by hand
  » Handle arbitrarily complex graphs
    - But, acyclic subgraph only!!
    - Need a branch to get you back to the top of a loop
  » Efficient
• Roots are from Vector computer days
  » Vectorize a loop with an if-statement in the body
• 4 steps
  » 1. Loop backedge coalescing
  » 2. Control dependence analysis
  » 3. Control flow substitution
  » 4. CMPP compaction
• My version of Park & Schlansker


Step 1: Backedge Coalescing

• Recall – Loop backedge is a branch from inside the loop back to the loop header
• This step is only applicable for a loop body
  » If not a loop body → skip this step
• Process
  » Create a new basic block
    - New BB contains an unconditional branch to the loop header
  » Adjust all other backedges to go to new BB rather than header
• Why do this?
  » Heuristic step – Not essential for correctness
    - If-conversion cannot remove backedges (only forward edges)
    - But this allows the control logic to figure out which backedge you take to be eliminated
  » Generally this is a good thing to do
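The process above can be sketched on a successor-list CFG. The representation (a dict from block name to an ordered list of successors), the function name `coalesce_backedges`, and the small example loop are all assumptions made for illustration; the sketch also assumes the dict contains only the loop body, so any edge whose target is the header is a backedge.

```python
def coalesce_backedges(body_succs, header, new_block):
    """Return a copy of the loop-body CFG with all backedges routed via new_block.

    body_succs: dict mapping each loop-body block to its list of successors.
    header:     the loop header block.
    new_block:  name for the block that will hold the single branch to header.
    """
    result = {}
    redirected = 0
    for block, targets in body_succs.items():
        new_targets = []
        for target in targets:
            if target == header:          # a backedge: retarget it
                new_targets.append(new_block)
                redirected += 1
            else:
                new_targets.append(target)
        result[block] = new_targets
    if redirected:
        # The new block contains only an unconditional branch to the header.
        result[new_block] = [header]
    return result


# Hypothetical loop with three backedges into header "H".
loop = {"H": ["A", "B"], "A": ["H", "C"], "B": ["H"], "C": ["H", "exit"]}
coalesced = coalesce_backedges(loop, "H", "N")
print(coalesced)
```

After the transformation every former backedge is a forward edge into the new block, so if-conversion can fold those branches into predicates and leave a single loop-back branch.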


Running Example – Backedge Coalescing

[Figure: CFG of the running example before and after backedge coalescing. Labels recoverable from the figure: BB1 branches on b < 0 / b >= 0; BB2 branches on c > 0 / c <= 0; other branch conditions are b > 13 / b <= 13, c > 25 / c <= 25 and e < 34 / e >= 34; block bodies include a++, b++, c++, d++ and e++.]


Control Dependences

• Recall
  » Post dominator – BBX is post dominated by BBY if every path from BBX to EXIT contains BBY
  » Immediate post dominator – First breadth first successor of a block that is a post dominator
• Control dependence – BBY is control dependent on BBX iff
  » 1. There exists a directed path P from BBX to BBY with any BBZ in P (excluding BBX and BBY) post dominated by BBY
  » 2. BBX is not post dominated by BBY
• In English,
  » A BB is control dependent on the closest BB(s) that determine(s) its execution
  » It’s actually not a BB, it’s a control flow edge coming out of a BB


Control Dependence Example

[Figure: example CFG with branch edges labelled T and F]

Control dependences
BB1:
BB2:
BB3:
BB4:
BB5:
BB6:
BB7:

Notation
• positive BB number = fallthru direction
• negative BB number = taken direction


Algorithm for Control Dependence Analysis

for each basic block x in region
    for each outgoing control flow edge e of x
        y = destination basic block of e
        if (y not in pdom(x)) then
            lub = ipdom(x)
            if (e corresponds to a taken branch) then
                x_id = -x.id
            else
                x_id = x.id
            endif
            t = y
            while (t != lub) do
                cd(t) += x_id;
                t = ipdom(t)
            endwhile
        endif
    endfor
endfor

Notes
• Compute cd(x) which contains those BBs which x is control dependent on
• Iterate on per edge basis, adding edge to each cd set it is a member of
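The pseudocode above translates almost line for line into Python. In the sketch below, the edge list `SUCCS` and the `IPDOM` table are assumptions: they were reconstructed here from the running example's post-dominator and control-dependence listings, because the CFG figure itself is not available. `PDOM` is copied from the running example's post-dominator table. Under those assumptions the function reproduces the control dependences listed for the running example.

```python
# Assumed edges of the running example: "taken" and "fall" (fallthru) successors.
SUCCS = {
    1: {"taken": 2, "fall": 3},
    2: {"taken": 4, "fall": 6},
    3: {"taken": 8, "fall": 9},
    4: {"taken": 5, "fall": 6},
    5: {"fall": 7},
    6: {"fall": 7},
    7: {"fall": 8},
    8: {"fall": 9},
    9: {"fall": "ex"},
}
PDOM = {
    1: {1, 9, "ex"}, 2: {2, 7, 8, 9, "ex"}, 3: {3, 9, "ex"},
    4: {4, 7, 8, 9, "ex"}, 5: {5, 7, 8, 9, "ex"}, 6: {6, 7, 8, 9, "ex"},
    7: {7, 8, 9, "ex"}, 8: {8, 9, "ex"}, 9: {9, "ex"},
}
# Assumed: closest strict post dominator of each block, read off PDOM.
IPDOM = {1: 9, 2: 7, 3: 9, 4: 7, 5: 7, 6: 7, 7: 8, 8: 9, 9: "ex"}


def control_dependences(succs, pdom, ipdom):
    """cd[b] lists signed block ids: negative = taken edge, positive = fallthru."""
    cd = {block: [] for block in succs}
    for x in succs:
        for kind, y in succs[x].items():
            if y in pdom[x]:
                continue                  # edge does not decide whether y runs
            lub = ipdom[x]
            x_id = -x if kind == "taken" else x
            t = y
            while t != lub:               # walk up the post-dominator tree
                cd[t].append(x_id)
                t = ipdom[t]
    return cd


for block, deps in control_dependences(SUCCS, PDOM, IPDOM).items():
    print(f"BB{block}:", deps if deps else "none")
```

The walk from `y` up to `ipdom(x)` is what makes one edge control several blocks: in this example the taken edge of BB1 ends up in the sets of BB2, BB7 and BB8.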


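Running Example – Post Dominators

BB      pdom              ipdom
BB1:    1, 9, ex          9
BB2:    2, 7, 8, 9, ex    7
BB3:    3, 9, ex          9
BB4:    4, 7, 8, 9, ex    7
BB5:    5, 7, 8, 9, ex    7
BB6:    6, 7, 8, 9, ex    7
BB7:    7, 8, 9, ex       8
BB8:    8, 9, ex          9
BB9:    9, ex             ex

[Figure: CFG of the running example from Entry through BB1 to BB9 to Exit. Labels recoverable from the figure: BB1 branches on b < 0 / b >= 0; BB2 branches on c > 0 / c <= 0; other branch conditions are b > 13 / b <= 13, c > 25 / c <= 25 and e < 34; block bodies include a++, b++, c++, d++ and e++.]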
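The post-dominator sets of the running example can be recomputed with a standard iterative data-flow pass. The sketch below is an illustration under assumptions: the successor lists in `CFG` were reconstructed here from the pdom and control-dependence listings (the figure is not available), and the function names are hypothetical. The immediate post dominator is taken to be the strict post dominator that all the other strict post dominators also post dominate.

```python
def post_dominators(succs, exit_node):
    """Iteratively solve pdom(n) = {n} | intersection of pdom(s) over successors s."""
    nodes = set(succs) | {exit_node}
    pdom = {n: set(nodes) for n in nodes}   # start from "everything"
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in succs:
            new = {n} | set.intersection(*(pdom[s] for s in succs[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def immediate_post_dominators(pdom, exit_node):
    """Pick, for each block, its closest strict post dominator."""
    ipdom = {}
    for n, doms in pdom.items():
        if n == exit_node:
            continue
        strict = doms - {n}
        # The closest one is post dominated by every other strict post dominator.
        ipdom[n] = next(c for c in strict if strict <= pdom[c])
    return ipdom


# Assumed edges of the running example (taken successor listed first).
CFG = {1: [2, 3], 2: [4, 6], 3: [8, 9], 4: [5, 6],
       5: [7], 6: [7], 7: [8], 8: [9], 9: ["ex"]}

pdom = post_dominators(CFG, "ex")
ipdom = immediate_post_dominators(pdom, "ex")
for block in sorted(CFG):
    members = sorted(pdom[block], key=str)
    print(f"BB{block}: pdom = {members}, ipdom = {ipdom[block]}")
```

With these edges the computed sets agree with the pdom column of the running example, for instance pdom(BB2) = {2, 7, 8, 9, ex}.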


Running Example – CDs Via Algorithm (2)

3 → 8 edge (aka -3)

x = 3
e = taken edge 3 → 8
y = 8
y not in pdom(x)
lub = 9
x_id = -3
t = 8
8 != 9
cd(8) += -3
t = 9
9 == 9

Class Problem: 1 → 3 edge (aka 1)

[Figure: CFG of the running example, Entry through BB1 to BB9 to Exit, with the same branch conditions and statements as on the post dominators slide]


Running Example – CDs Via Algorithm (3)

Control deps (left is taken)
BB1: none
BB2: -1
BB3: 1
BB4: -2
BB5: -4
BB6: 2, 4
BB7: -1
BB8: -1, -3
BB9: none

[Figure: CFG of the running example, Entry through BB1 to BB9 to Exit, with the same branch conditions and statements as on the post dominators slide]