Optimization Techniques for Compiler Control Flow - Prof. Scott Mahlke, Study notes of Electrical and Electronics Engineering

These notes cover optimization techniques for compiler control flow, including regions, traces, superblocks, and predicated execution. Topics include region types, region formation, tail duplication, and the implementation of predicated execution using hardware mechanisms.


Uploaded on 09/02/2009

EECS 583 – Lecture 4
Regions, Predicated execution
University of Michigan
January 15, 2003

Reading Material

• All these on the course webpage
• Useful reference
  » "HPL-PD Architecture Specification Version 1.0", Kathail, Schlansker, Rau, HPL Technical Report, 1993.
• Material for the next lecture
  » "On Predicated Execution", Park and Schlansker, HPL Technical Report, 1991.
  » "Effective Compiler Support for Predicated Execution using the Hyperblock", Mahlke, Lin, Chen, Hank, Bringmann, MICRO-25, 1992.


Regions

• Region: A collection of operations that are treated as a single unit by the compiler
  » Examples
    - Basic block
    - Procedure
    - Body of a loop
  » Properties
    - Connected subgraph of operations
    - Control flow is the key parameter that defines regions
    - Hierarchically organized
• Problem
  » Basic blocks are too small (3-5 operations)
    - Hard to extract sufficient parallelism
  » Procedure control flow too complex for many compiler transformations
    - Plus only parts of a procedure are important (90/10 rule)


Regions (2)

• Want
  » Intermediate sized regions with simple control flow
  » Bigger basic blocks would be ideal!
  » Separate important code from less important
  » Optimize frequently executed code at the expense of the rest
• Solution
  » Define new region types that consist of multiple BBs
  » Profile information used in the identification
  » Sequential control flow (sorta)
  » Pretend the regions are basic blocks


Linearizing a Trace

[Figure: the blocks of a trace (BB1, BB2, BB4, ...) laid out linearly, annotated with profile counts for the entry, the side exits, the side entrances, and the exit.]


Intelligent Trace Layout for Icache Performance

• Intraprocedural code placement
• Procedure positioning
• Procedure splitting

[Figure: procedure view versus trace view of the same code, showing the first trace (BB1, BB2, BB4, ...), trace 2, trace 3, and the rest.]


Trace Selection Algorithm

    i = 0;
    mark all BBs unvisited
    while (there are unvisited nodes) do
        seed = unvisited BB with largest execution freq
        trace[i] += seed
        mark seed visited
        current = seed
        /* Grow trace forward */
        while (1) do
            next = best_successor_of(current)
            if (next == 0) then
                break
            trace[i] += next
            mark next visited
            current = next
        endwhile
        /* Grow trace backward analogously */
        i++
    endwhile


Best Successor/Predecessor

• Node weight vs edge weight
  » edge more accurate
• THRESHOLD
  » controls off-trace probability
  » 60-70% found best
• Notes on this algorithm
  » BB only allowed in 1 trace
  » Cumulative probability ignored
  » Min weight for seed to be chosen (i.e., executed 100 times)

    best_successor_of(BB)
        e = control flow edge with highest probability leaving BB
        if (e is a backedge) then
            return 0
        endif
        if (probability(e) <= THRESHOLD) then
            return 0
        endif
        d = destination of e
        if (d is visited) then
            return 0
        endif
        return d
    endprocedure
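The two procedures above can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the dictionary-based CFG encoding, the function signatures, the example graph, and the 0.60 threshold value are all assumptions made for the example. Edge probability is taken as edge count divided by the total count leaving the block. Only forward growth is implemented; backward growth and the minimum seed weight are left out.

```python
THRESHOLD = 0.60  # assumed cutoff, chosen from the 60-70% range on the slide


def best_successor_of(bb, succs, visited, backedges):
    """Return the most likely unvisited successor of bb, or None."""
    edges = succs.get(bb, {})
    total = sum(edges.values())
    if total == 0:
        return None  # no outgoing flow, so the trace ends here
    dest, count = max(edges.items(), key=lambda kv: kv[1])
    if (bb, dest) in backedges:
        return None  # never grow a trace across a backedge
    if count / total <= THRESHOLD:
        return None  # most likely edge is not likely enough
    if dest in visited:
        return None  # a BB is only allowed in one trace
    return dest


def select_traces(weights, succs, backedges=frozenset()):
    """Greedy trace selection that grows each trace forward only."""
    visited = set()
    traces = []
    # Taking seeds from hottest to coldest and skipping visited blocks
    # matches "seed = unvisited BB with largest execution freq".
    for seed in sorted(weights, key=weights.get, reverse=True):
        if seed in visited:
            continue
        trace = [seed]
        visited.add(seed)
        current = seed
        while True:
            nxt = best_successor_of(current, succs, visited, backedges)
            if nxt is None:
                break
            trace.append(nxt)
            visited.add(nxt)
            current = nxt
        traces.append(trace)
    return traces


# Hypothetical profile: block execution counts and edge counts.
weights = {"BB1": 100, "BB2": 90, "BB3": 10, "BB4": 80, "BB5": 20}
succs = {
    "BB1": {"BB2": 90, "BB3": 10},
    "BB2": {"BB4": 70, "BB5": 20},
    "BB3": {"BB4": 10},
    "BB4": {},
    "BB5": {},
}
print(select_traces(weights, succs))
# [['BB1', 'BB2', 'BB4'], ['BB5'], ['BB3']]
```

In this example BB3 ends up in a trace of its own because its only successor, BB4, already belongs to the first trace.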


Class Problem 2

Find the traces. Assume a threshold probability of 60%.

[Figure: control flow graph annotated with profile weights; the graph structure is not recoverable from the extracted text.]


Traces are Nice, But …

• Treat trace as a big BB
  » Transform trace ignoring side entrance/exits
  » Insert fixup code
    - aka bookkeeping
  » Side entrance fixup is more painful
  » Sometimes not possible so transform not allowed
• Solution
  » Eliminate side entrances
  » The superblock is born

[Figure: example control flow graph containing a trace with side entrances.]


Tail Duplication

• To eliminate all side entrances replicate the "tail" portion of the trace
  » Identify first side entrance
  » Replicate all BB from the target to the bottom
  » Redirect all side entrances to the duplicated BBs
  » Copy each BB only once
  » Max code expansion = 2x-1 where x is the number of BB in the trace
  » Adjust profile information

[Figure: example control flow graph used to illustrate tail duplication.]
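The tail duplication steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the CFG is a dictionary mapping each block name to a list of successor names, copies are named by appending a prime, and the example graph is hypothetical rather than the one on the slides. The final step, adjusting profile information, is not modeled.

```python
def tail_duplicate(trace, succs):
    """Return a new CFG in which `trace` has no side entrances."""
    cfg = {bb: list(dsts) for bb, dsts in succs.items()}
    pos = {bb: k for k, bb in enumerate(trace)}

    def is_side_entrance(src, dst):
        # An edge into a non-head trace block is a side entrance unless
        # it comes from the block just before it in the trace.
        k = pos.get(dst, 0)
        return k >= 1 and src != trace[k - 1]

    # Step 1: identify the first trace block that has a side entrance.
    entered = [pos[d] for s, dsts in cfg.items() for d in dsts
               if is_side_entrance(s, d)]
    if not entered:
        return cfg  # the trace is already a superblock
    first = min(entered)

    # Step 2: replicate every block from that point to the bottom, once each.
    copy = {bb: bb + "'" for bb in trace[first:]}
    for bb in trace[first:]:
        cfg[copy[bb]] = list(succs.get(bb, []))

    # Step 3: redirect all side entrances to the duplicates. The copies'
    # own on-trace edges are caught here too, which chains the copies.
    for src in cfg:
        cfg[src] = [copy[d] if is_side_entrance(src, d) else d
                    for d in cfg[src]]
    return cfg


# Hypothetical CFG: BB3 and BB5 are off-trace and re-enter the trace.
trace = ["BB1", "BB2", "BB4", "BB6"]
succs = {
    "BB1": ["BB2", "BB3"],
    "BB2": ["BB4"],
    "BB3": ["BB4"],         # side entrance into BB4
    "BB4": ["BB6", "BB5"],
    "BB5": ["BB6"],         # side entrance into BB6
    "BB6": [],
}
for bb, dsts in tail_duplicate(trace, succs).items():
    print(bb, "->", dsts)
```

The trace head is never copied, so a trace of x blocks gains at most x-1 duplicates, which is consistent with the 2x-1 bound above. In this example x = 4 and two blocks (BB4 and BB6) are duplicated.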


Superblock Formation

[Figure: the example trace before and after tail duplication; the duplicated blocks BB4', BB5', and BB6' appear alongside the original blocks, annotated with profile weights.]


Class Problem 3

Create the superblocks, trace threshold is 60%

[Figure: control flow graph annotated with profile weights; the graph structure is not recoverable from the extracted text.]


Predicated Execution

• Hardware mechanism that allows operations to be conditionally executed
• Add an additional boolean source operand (predicate)
  » ADD r1, r2, r3 if p1
    - if (p1 is True), r1 = r2 + r3
    - else if (p1 is False), do nothing (Add treated like a NOP)
    - p1 referred to as the guarding predicate
    - Predicated on True means always executed
    - Omitted predicate also means always executed
• Provides compiler with an alternative to using branches to selectively execute operations
  » If statements in the source
  » Realize with branches in the assembly code
  » Could also realize with conditional instructions
  » Or use a combination of both
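The semantics of a guarding predicate can be illustrated with a small Python interpreter. This is only a sketch: the tuple encoding of operations, the SUB opcode, and the way predicate values are supplied directly in a dictionary are assumptions made for the example, not part of any real instruction set. The program shows an if/else (r1 = r2 + r3 when the condition holds, r1 = r2 - r3 otherwise) realized without branches.

```python
def execute(program, regs, preds):
    """Run a straight-line list of predicated operations."""
    for opcode, dest, src1, src2, guard in program:
        # An omitted guard (None) means the operation always executes.
        if guard is not None and not preds[guard]:
            continue  # guard is False: the operation behaves like a NOP
        if opcode == "ADD":
            regs[dest] = regs[src1] + regs[src2]
        elif opcode == "SUB":
            regs[dest] = regs[src1] - regs[src2]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return regs


# if (cond) r1 = r2 + r3; else r1 = r2 - r3; then r4 = r1 + r1
program = [
    ("ADD", "r1", "r2", "r3", "p1"),   # ADD r1, r2, r3 if p1
    ("SUB", "r1", "r2", "r3", "p2"),   # SUB r1, r2, r3 if p2
    ("ADD", "r4", "r1", "r1", None),   # no guard: always executed
]

# p1 holds the branch condition and p2 its complement (set by hand here).
taken = execute(program, {"r1": 0, "r2": 5, "r3": 3, "r4": 0},
                {"p1": True, "p2": False})
not_taken = execute(program, {"r1": 0, "r2": 5, "r3": 3, "r4": 0},
                    {"p1": False, "p2": True})
print(taken["r1"], taken["r4"])          # 8 16
print(not_taken["r1"], not_taken["r4"])  # 2 4
```

Both arms of the if/else are fetched and issued on every run; the predicates decide which one actually updates r1.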