




























































































A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science at the University of Illinois at Urbana-Champaign in 2005. The thesis presents a new class of techniques named 'Macroscopic Data Structure Analyses and Optimizations' for analyzing and optimizing pointer-intensive programs. The approach identifies, analyzes, and transforms entire memory structures as a unit, giving control over the layout of the data structure in memory to the compiler. The thesis describes several performance-improving optimizations for pointer-intensive programs based on the foundation techniques.
Typology: Thesis





























































































Once the program is pool allocated, several pool-specific optimizations can be performed to reduce inter-object padding and pool overhead. Second, we describe an aggressive technique, Automatic Pointer Compression, which reduces the size of pointers on 64-bit targets to 32 bits or less, increasing effective cache capacity and memory bandwidth for pointer-intensive programs.

This thesis describes the approach, analysis, and transformation of programs with macroscopic techniques, and evaluates the net performance impact of the transformations. Finally, it describes [...]
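The idea behind pointer compression can be sketched in C. The following is a minimal illustration, not the thesis's transformation or its runtime library: the names `Pool`, `CNode`, `pool_init`, `pool_alloc_node`, `node_at`, `list_push`, and `list_sum` are hypothetical, and the sketch assumes a single fixed-size pool with a simple bump allocator. Links between nodes are stored as 32-bit offsets from the pool base rather than as 64-bit pointers, so each node occupies 8 bytes here, where a pointer-based node with the same payload typically occupies 16 bytes on an LP64 target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pool: one contiguous buffer. Objects inside it are named
   by 32-bit byte offsets from `base` instead of by 64-bit pointers. */
typedef struct {
    unsigned char *base;
    uint32_t used;
    uint32_t capacity;
} Pool;

/* Compressed list node: `next` is a pool offset, and offset 0 is
   reserved to play the role of NULL. */
typedef struct {
    int32_t value;
    uint32_t next;
} CNode;

/* Returns 1 on success, 0 if the capacity is too small or malloc fails. */
int pool_init(Pool *p, uint32_t capacity) {
    p->base = NULL;
    p->used = 0;
    p->capacity = 0;
    if (capacity < sizeof(CNode)) return 0;
    p->base = malloc(capacity);
    if (p->base == NULL) return 0;
    p->used = sizeof(CNode);      /* skip offset 0 so it can mean "null" */
    p->capacity = capacity;
    return 1;
}

void pool_destroy(Pool *p) {
    free(p->base);
    p->base = NULL;
    p->used = p->capacity = 0;
}

/* Bump allocation: returns the offset of a fresh node, or 0 when full. */
uint32_t pool_alloc_node(Pool *p) {
    if (p->capacity - p->used < sizeof(CNode)) return 0;
    uint32_t off = p->used;
    p->used += (uint32_t)sizeof(CNode);
    return off;
}

/* "Decompress" an offset into a real pointer by adding the pool base. */
CNode *node_at(const Pool *p, uint32_t off) {
    return off ? (CNode *)(p->base + off) : NULL;
}

/* Pushes a value on the front of the list; returns the new head offset.
   If the pool is full, the list is left unchanged. */
uint32_t list_push(Pool *p, uint32_t head, int32_t value) {
    uint32_t off = pool_alloc_node(p);
    if (off == 0) return head;
    CNode *n = node_at(p, off);
    n->value = value;
    n->next = head;
    return off;
}

int64_t list_sum(const Pool *p, uint32_t head) {
    int64_t sum = 0;
    for (uint32_t cur = head; cur != 0; cur = node_at(p, cur)->next)
        sum += node_at(p, cur)->value;
    return sum;
}
```

A caller would initialize a pool with `pool_init`, build a list by repeatedly calling `list_push` starting from a head offset of 0, and release everything at once with `pool_destroy`. The price of the smaller node is the extra add in `node_at` on every dereference, and the pool cannot exceed what a 32-bit offset can address.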
To Tanya, for her unwavering love and support.
Among other things, Chris thinks that author biographies should not be compulsory for Ph.D. dissertations at the University of Illinois. For the curious, Chris’s work history, publications and many other things are available on his web page (http://nondot.org/sabre/).
She helped me make it through the occasionally grueling all nighters and other challenging parts of these last five years, selflessly supporting me even when under pressures from her own research work and job. In addition to support of my research, she continues to enrich my life as a whole.

I have deeply enjoyed my interactions and friendships with the members of the LLVM research group as well as the open-source community we have built around LLVM. Both have provided important insights, hard problems, and a desire to make LLVM as stable, robust, and extensible as possible. Special thanks go to Misha Brukman for reading (and rereading) many of my papers with his particularly critical eye for incorrect-hyphenation and mispellings.

I would like to thank the UIUC Classical Fencing Club as a whole, and John Mainzer and Luda Yafremava in particular, for absorbing many of the frustrations and craziness accumulated over the course of this work. They provided an important outlet and taught me physical awareness, flexibility, and dexterity that I did not think was possible. They are also responsible for keeping random workplace violence to a tolerable level, for which my colleagues are undoubtedly thankful!

Finally, I would like to thank Steven Vegdahl, who encouraged me to pursue graduate studies and whose infectious love of compilers started me on this path in the first place.
3.2.2 Local Analysis Phase
3.2.3 Bottom-Up Analysis Phase
3.2.4 Top-Down Analysis Phase
3.2.5 Complexity Analysis
3.2.6 Bounding Graph Size
3.3 Engineering an Efficient Pointer Analysis
3.3.1 The Globals Graph
3.3.2 Efficient Graph Inlining
3.3.3 Partitioning E_V for Efficient Global Variable Iteration
3.3.4 Shrinking E_V with Global Value Equivalence Classes
3.3.5 Avoiding N^2 Inlining for Function Pointers
3.3.6 Merge Call Nodes for External Functions
3.3.7 Direct Call Nodes
3.4 Experimental Results
3.4.1 Benchmark Suite and Simple Measurements
3.4.2 Analysis Time & Memory Consumption
3.4.3 Inferred Type Information
3.5 Related Work
3.5.1 Shape Analyses
3.5.2 Cloning-based Context-Sensitive Analyses
3.5.3 Non-cloning Context Sensitive Analyses
3.6 Data Structure Analysis: Summary of Contributions

Chapter 4 Using Data Structure Analysis for Alias and IP Mod/Ref Analysis
4.1 Alias Analysis and Mod/Ref Information
4.1.1 Alias Analysis Assumptions and Applications
4.1.2 Mod/Ref Analysis Assumptions and Applications
4.2 Implementing Alias and Mod/Ref Analysis with DSA: ds-aa
4.2.1 Computing Alias Analysis Responses
4.2.2 Computing Mod/Ref Responses
4.3 Alias Analysis Implementations for Comparison
4.3.1 local Alias Analysis
4.3.2 steens-fi Alias Analysis
4.3.3 steens-fs Alias Analysis
4.3.4 anders Alias Analysis
4.4 Analysis Precision with a Synthetic Client
4.4.1 Alias Precision
4.4.2 Mod/Ref Precision
4.5 Analysis Precision with Scalar Loop Optimizations
4.5.1 Number of Transformations Performed
4.5.2 Alias and Mod/Ref Queries
4.6 Observations and Conclusions

Chapter 5 Automatic Pool Allocation
5.1 The Transformation Overview and Example
5.1.1 Pool Allocator Runtime Library
5.1.2 Overview Using an Example
5.2 The Core Pool Allocation Transformation
5.2.1 Analysis: Finding Pool Descriptors for each H Node
5.2.2 The Simple Transformation (No Indirect Calls)
5.2.3 Passing Descriptors for Indirect Function Calls
5.3 Algorithmic Complexity
5.4 Simple Pool Allocation Refinements
5.4.1 Argument Passing for Global Pools
5.4.2 poolcreate/pooldestroy Placement
5.5 Experimental Results
5.5.1 Methodology and Benchmarks
5.5.2 Pool Allocation Statistics
5.5.3 Pool Allocation Compile Time
5.6 Related Work
5.7 Research Contributions of Automatic Pool Allocation

Chapter 6 Optimizing Pool Allocated Code
6.1 Pool Optimizations
6.1.1 Avoiding Pool Allocation for Singleton Objects: SelectivePA
6.1.2 poolfree Elimination: PoolFreeElim
6.1.3 Avoid Object Header Overhead: Bump-Pointer
6.1.4 Avoiding Alignment Padding: AlignOpt
6.1.5 Tail Padding Optimization
6.2 Collocation of DS Nodes into Shared Pools
6.2.1 Algorithm Extensions to Support Collocation
6.2.2 Node Collocation Heuristics
6.2.3 Experiences with Node Collocation
6.3 Pool Allocation and Optimization Performance Results
6.3.1 Implementation and Evaluation Framework
6.3.2 Number of Pool Optimization Opportunities
6.3.3 Performance Baseline, Allocator Influence, and Overhead
6.3.4 Aggregate Performance Effect of Pool Allocation & Optimizations
6.3.5 Performance Contribution of Individual Pool Optimizations
6.3.6 Cache and TLB Impact of Pool Allocation
6.3.7 Access Pattern and Locality Changes
6.4 Research Contributions of Pool Allocation Optimizations

Chapter 7 Transparent Pointer Compression
7.1 Static Pointer Compression
7.1.1 Pointer Compression Runtime Library
7.1.2 Intraprocedural Pointer Compression
7.1.3 Interprocedural Pointer Compression
7.1.4 Minimizing Pool Size Violations with Static Compression
7.2 Dynamic Pointer Compression
7.2.1 Intraprocedural Dynamic Compression
7.2.2 Dynamic Compression Runtime Library
7.2.3 Interprocedural Dynamic Compression
7.3 Optimizing Pointer Compressed Code
2.2 LLVM code for complex memory addressing
2.3 C++ exception handling example
2.4 LLVM code for the C++ example. The handler code specified by invoke executes the destructor.
2.5 LLVM uses a runtime library for C++ exceptions support but exposes control-flow.
2.6 LLVM system architecture diagram
2.7 Executable sizes for LLVM, X86, Sparc (in KB)
2.8 Interprocedural optimization timings (in seconds)
3.1 C code for running example
3.2 Graph Notation
3.3 Local DSGraphs for do_all and addG
3.4 Primitive operations used in the algorithm
3.5 The LocalAnalysis function
3.6 makeNode and updateType operations
3.7 Construction of the BU DS graph for addGToList
3.8 Bottom-Up Closure Algorithm
3.9 Handling recursion due to an indirect call in the Bottom-Up phase
3.10 Finished BU graph for main
3.11 C Source, DSGraph, and LLVM code for Global Value Equivalence Class Example
3.12 Benchmark Suite and Basic DSA Measurements
3.13 Scaling of Analysis Time with Program Size (Number of Memory Operations)
3.14 Scaling of Analysis Space with Program Size (Number of Memory Operations)
3.15 DSA Analysis Time and Space Consumption Data
3.16 Number of Load & Store instructions which access non-collapsed, complete, DS Nodes
4.1 Results of Example Pointer Analysis Clients
4.2 Example clients of mod/ref results
4.3 Percent of AA-EVAL Alias Queries Returned “May Alias”
4.4 AA-EVAL Mod/Ref Query Responses of “May Mod or Ref”
4.5 AA-EVAL Mod/Ref Query Responses of “No Mod or Ref”
4.6 AA-EVAL Mod/Ref Query Responses of “May Only Ref”
4.7 AA-EVAL Mod/Ref Query Responses of “May Mod Only”
4.8 AA-EVAL Mod/Ref Query Responses for ds-aa
4.9 Scalar Loop Optimization Transformations
4.10 Number of Memory Locations Promoted To Registers
4.11 Number of Loads Hoisted or Sunk
4.12 Number of Instructions Hoisted or Sunk
4.13 Percent of LICM Alias Queries Returned “May Alias”
4.14 Percent of LICM Mod/Ref Query Responses Returned “Mod and Ref”
4.15 DSA LICM Mod/Ref Query Responses Breakdown
5.1 Interface to the Pool Allocator Runtime Library
5.2 Example illustrating the Pool Allocation Transformation
5.3 BU DSGraphs for functions in Figure 5.2 (a)
5.4 Pseudo code for basic algorithm
5.5 Pool Allocation Example with Function Pointers
5.6 Pseudo code for complete pool allocator
5.7 After moving pooldestroy(&PD1) earlier
6.1 Figure 5.2 after eliminating poolfree calls
6.2 After eliminating poolfree calls and dead loops
6.3 Standard Pool of 16-byte Objects with Default 8-Byte Alignment
6.4 Bump-Pointer Pool of 16-byte Objects with Default 8-Byte Alignment
6.5 Normal Pool of 16-byte Objects with Reduced 4-Byte Alignment
6.6 Example Structure with Tail Padding
6.7 Linked List of Doubles without Node Collocation
6.8 Linked List of Doubles with Perfect Node Collocation
6.9 Statistics for Pool Optimizations
6.10 Aggregate execution time ratios (Left 1.0 = NoPA, Right 1.0 = BasePA)
6.11 Pool Optimization Contributions (1.0 = No Pool Allocation)
6.12 Pool Optimization Contributions (1.0 = PA with all PoolOpts)
6.13 L1/L2/TLB Cache Miss Ratios
6.14 chomp Access Pattern with Standard malloc/free
6.15 chomp Access Pattern with Pool Allocation
6.16 ft Access Pattern with Standard malloc/free
6.17 ft Access Pattern with Pool Allocation
7.1 Linked List of 4-byte characters
7.2 Pool Allocated Linked List
7.3 Pointer Compressed Linked List
7.4 Simple linked list example
7.5 Example after static compression
7.6 Pool Compression Runtime Library
7.7 Pseudo code for pointer compression
7.8 Example with TH and non-TH nodes
7.9 Rewrite rules for pointer compression
7.10 Interprocedural rewrite rules.
7.11 Example after dynamic compression
7.12 Dynamic pointer compression rules
7.13 Dynamic expansion example
7.14 Rewrite rules for non-compressed pools
7.15 MakeList_pc32 after optimization
7.16 Pointer Compression Benchmark Results
7.17 llubenchmark: time to process one node vs problem size
all instances of a particular type or none of them (for example, a field that is unused in one instance of a data structure, but not another, cannot be removed from either).

Most importantly, neither of these approaches is able to attack the root cause of the problem: the compiler cannot analyze or control the layout of objects on the heap. In particular, the reason that recursive data structures exhibit poor locality is that their nodes are often distributed throughout the heap with little correlation between the layout of the nodes and the access/traversal pattern of the program. Because the access patterns of these recursive data structures are not directly connected to the layout of the objects on the heap, standard techniques for improving the cache performance of dense arrays cannot be applied to nodes in a recursive data structure.

Aggressively optimizing programs that heavily use recursive data structures is inherently difficult for several reasons. First, interprocedural analysis is required for any real-world program: recursive data structures are often created, traversed, and destroyed with recursive functions, are often passed throughout the program, and are often used to build larger aggregate structures (e.g. a list of lists). Second, extremely aggressive forms of interprocedural analysis are required: modern software design techniques encourage the use of modular and reusable data structure libraries, and these libraries may be used in different ways in different portions of the program. Ideally, we would like to be able to optimize individual instances of a particular data structure, even if all of the instances of that type are processed and created with common functions (traditional scalable points-to analyses are insufficient for these programs). Third, compilers for statically compiled languages generally do not have control over the memory management runtime, greatly limiting the information and control they have over the runtime layout of a data structure. Finally, compilers designed to optimize unsafe languages (like C or C++) must correctly handle programs that cast pointers or rely on the precise layout of data in memory (e.g., programs that copy structures to disk or across a network).

Throughout this work, we use the term “data structure” to mean an instance of a heap allocated recursive data structure potentially formed with multiple node types (e.g. a graph with ‘edge’ and ‘node’ objects). This work is not concerned with classification of a data structure instance as some high-level conceptual type (e.g. a binary tree or a linked list); instead, we focus on the properties that are independent of the high level conceptual type (e.g. node layout properties).

Prior to our work, shape analysis was the only extant approach for performing macroscopic analyses of programs that use data structures. Shape analysis is able to provide strong classification of data structures in the program as various high-level types, such as a singly- or doubly-linked list, a binary tree, etc. Unfortunately, shape analyses cannot handle non-type-safe programs, [...]
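The locality problem described above, where node layout on the heap has little to do with traversal order, can be made concrete with a small C sketch. This is an illustration under stated assumptions rather than the thesis's allocator: `NodePool`, `nodepool_init`, `nodepool_alloc`, `list_append`, and `count_adjacent_links` are hypothetical names, the pool is a fixed-size array with bump allocation, and a single pool shared by two lists stands in for a general-purpose heap that interleaves their nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Node {
    struct Node *next;
    double value;
} Node;

/* Hypothetical pool: a fixed-size array of nodes handed out in order. */
typedef struct {
    Node *slab;
    size_t used;
    size_t capacity;
} NodePool;

/* Returns 1 on success, 0 if the backing array cannot be allocated. */
int nodepool_init(NodePool *p, size_t capacity) {
    p->slab = malloc(capacity * sizeof(Node));
    p->used = 0;
    p->capacity = p->slab ? capacity : 0;
    return p->slab != NULL;
}

void nodepool_destroy(NodePool *p) {
    free(p->slab);
    p->slab = NULL;
    p->used = p->capacity = 0;
}

/* Bump allocation: the next free slot, or NULL when the pool is full. */
Node *nodepool_alloc(NodePool *p) {
    return p->used < p->capacity ? &p->slab[p->used++] : NULL;
}

/* Appends `value` after `tail` (or starts the list if tail is NULL).
   Returns the new tail, or NULL if the pool is exhausted. */
Node *list_append(NodePool *p, Node **head, Node *tail, double value) {
    Node *n = nodepool_alloc(p);
    if (n == NULL) return NULL;
    n->next = NULL;
    n->value = value;
    if (tail) tail->next = n;
    else *head = n;
    return n;
}

/* Counts links whose target is the physically next node in memory,
   a crude proxy for how well layout matches traversal order. */
size_t count_adjacent_links(const Node *head) {
    size_t adjacent = 0;
    for (const Node *n = head; n && n->next; n = n->next)
        if (n->next == n + 1) adjacent++;
    return adjacent;
}
```

If two lists are grown alternately from one shared pool, every link skips over a node of the other list and `count_adjacent_links` reports zero. If each list draws from its own pool, all of its links are adjacent, even though the program's allocation order is unchanged.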
1.1 Foundations of the Macroscopic Approach

Our implementation of macroscopic algorithms is built on a foundation consisting of two techniques: Data Structure Analysis and Automatic Pool Allocation. Based on this analysis and [...]

[...] It is used to identify the connectivity of memory objects in a program, identify instances of data structures, and capture important properties of these structures (such as whether accesses to the objects are type-safe).

DSA is an aggressive interprocedural analysis which uses full acyclic call paths to name heap and stack objects (it is “fully” context sensitive), allowing it to identify disjoint instances of data structures, even if they are created and processed by common helper functions. DSA can support a superset of the clients supported by most flow-insensitive interprocedural alias, mod-ref, and call graph analyses, in addition to supporting the macroscopic analyses described throughout this thesis. DSA supports the full generality of C/C++ programs and provides conservatively correct analysis of incomplete programs and libraries.

The primary research contributions of DSA are:

(i) New techniques used to achieve its speed, scalability, and low memory footprint when analyzing large programs, despite its aggressive analysis. We show that DSA uses little memory and is fast and scalable in our experiments on programs spanning 4-5 orders of magnitude of code size (past 200,000 lines of code), never taking more than 3.2s on these codes. We describe why we believe that it will continue to scale well to larger programs in Sections 3.2 and 3.4.2. DSA is the first fully context sensitive algorithm we are aware of that analyzes programs in a fraction of the time taken to compile the program with a standard optimizing compiler (GCC). Further, the fraction of compile time used by DSA is quite small: always less than 6% in our experiments.

(ii) Use of a novel extension to Tarjan’s Strongly Connected Component (SCC) finding algorithm [...]
of the International Symposium on Code Generation and Optimization (CGO) , San Francisco,
Combining generational and conservative garbage collection:[45] Alan Demers, Mark Weiser, Barry Hayes, Hans Boehm, Daniel Bobrow, and Scott Shenker.Implementation (PLDI), Snowbird, UT, June 2001.In (^) Proceedings of the ACM SIGPLAN Conference on Programming Language Design and[44] Robert DeLine and Manuel F¨ahndrich. Enforcing high-level protocols in low-level software.CA, Mar 2003. (^) framework and implementa-
[46] Alain Deutsch. Interprocedural may-alias analysis for pointers: Beyond k-limiting. Inming Languages, pages 261–269, 1990.tions. In (^) Proceedings of the ACM SIGACT-SIGPLAN Symposium on Principles of Program- (^) Pro-
Language, Compiler, and Tool Support for Embedded Systems (LCTES)runtime checks or garbage collection. In (^) Proceedings of the ACM SIGPLAN Conference on[48] Dinakar Dhurjati, Sumant Kowshik, Vikram Adve, and Chris Lattner. Memory safety withoutLanguages, pages 297–302, Jan 1984.In (^) Proceedings of the ACM SIGACT-SIGPLAN Symposium on Principles of Programming[47] L. Peter Deutsch and Allan M. Schiffman. Efficient implementation of the smalltalk-80 system.mentation (PLDI), pages 230–241, June 1994.ceedings of the ACM SIGPLAN Conference on Programming Language Design and Imple- , San Diego, Jun 2003.
garbage collection for embedded applications.[49] Dinakar Dhurjati, Sumant Kowshik, Vikram Adve, and Chris Lattner. Memory safety without (^) Transactions on Embedded Computing Systems ,
[51] Maryam Emami, Rakesh Ghiya, and Laurie J. Hendren.(ISCA), pages 26–37, 1997.tural compatibility. In (^) Proceedings of the International Conference on Computer Architecture[50] Kemal Ebcioglu and Erik R. Altman. (^) DAISY: Dynamic compilation for 100% architec-4(1):73–111, February 2005. (^) Context-sensitive interprocedural
points-to analysis in the presence of function pointers. In (^201) Proceedings of the ACM SIGPLAN
[35] Keith Cooper and Ken Kennedy.^ Interprocedural side-effect analysis in linear time.
In
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Im-plementation (PLDI), Atlanta, GA, June 1988.[36] Keith D. Cooper.^ Analyzing aliases of reference formal parameters.
In^ Proceedings of the ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages
, pages 281– 290, New York, NY, USA, 1985.[37] Keith D. Cooper and John Lu.^ Register promotion in C programs.
In^ Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
pages 308–319, 1997.[38] Francisco Corbera, Rafael Asenjo, and Emilio L. Zapata.
New shape analysis techniques for automatic parallelization of c codes.^ In
Proceedings of the International Conference on Supercomputing (ICS), pages 220–227, 1999.[39] Robert Courts. Improving locality of reference in a garbage-collecting memory managementsystem.^ Proceedings of the Communications of the ACM
[40] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck.Efficiently computing static single assignment form and the control dependence graph.
Transactions on Programming Languages and Systems (TOPLAS)
, pages 13(4):451–490, Oc- tober 1991.[41] Manuvir Das. Unification-based pointer analysis with directional assignments. In
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation(PLDI), pages 35–46, 2000.[42] Manuvir Das, Ben Liblit, Manuel F¨ahndrich, and Jakob Rehof.
Estimating the impact of scalable pointer analysis on optimization. In
Proceedings of the International Symposium on Static Analysis (SAS), pages 260–278. Springer-Verlag, 2001.[43] James C. Dehnert, Brian K. Grant, John P. Banning, Richard Johnson, Thomas Kistler,Alexander Klaiber, and Jim Mattson.^ The Transmeta Code Morphing Software:
that allows incremental discovery of SCCs in the call graph, even when edges are dynamicallydiscovered and added to the call graph.(iii) Uses of a simple mechanism (fine-grain incompleteness tracking) to solve several hard prob-lems in pointer analysis, including the use of speculative type information, dynamic discoveryof the call graph without iteration, and conservatively correct handling of incomplete pro-grams.DSA is used by all macroscopic techniques and is described in detail in Chapter 3. Using 200
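To make contribution (ii) concrete, the following is a minimal sketch of the classic, non-incremental Tarjan SCC algorithm applied to a small call graph. It is not DSA's extension, which additionally handles call edges discovered during the analysis. The `CallGraph` type, the `cg_*` function names, and the adjacency-matrix representation are assumptions made for illustration only.

```c
#include <assert.h>

#define MAX_FUNCS 16

/* Hypothetical call graph: calls[f][g] != 0 means function f may call g. */
typedef struct {
    int n;                            /* number of functions */
    int calls[MAX_FUNCS][MAX_FUNCS];  /* adjacency matrix of call edges */
    int index[MAX_FUNCS];             /* DFS visit number, -1 if unvisited */
    int lowlink[MAX_FUNCS];           /* smallest index reachable via the stack */
    int on_stack[MAX_FUNCS];
    int stack[MAX_FUNCS];
    int sp;
    int next_index;
    int scc_id[MAX_FUNCS];            /* result: SCC number of each function */
    int scc_count;
} CallGraph;

void cg_init(CallGraph *g, int n)
{
    assert(n > 0 && n <= MAX_FUNCS);
    g->n = n;
    g->sp = 0;
    g->next_index = 0;
    g->scc_count = 0;
    for (int i = 0; i < n; i++) {
        g->index[i] = -1;
        g->lowlink[i] = 0;
        g->on_stack[i] = 0;
        g->scc_id[i] = -1;
        for (int j = 0; j < n; j++)
            g->calls[i][j] = 0;
    }
}

void cg_add_call(CallGraph *g, int caller, int callee)
{
    g->calls[caller][callee] = 1;
}

/* Depth-first visit; pops one complete SCC whenever v turns out to be a root. */
void cg_visit(CallGraph *g, int v)
{
    g->index[v] = g->lowlink[v] = g->next_index++;
    g->stack[g->sp++] = v;
    g->on_stack[v] = 1;

    for (int w = 0; w < g->n; w++) {
        if (!g->calls[v][w])
            continue;
        if (g->index[w] < 0) {
            cg_visit(g, w);
            if (g->lowlink[w] < g->lowlink[v])
                g->lowlink[v] = g->lowlink[w];
        } else if (g->on_stack[w] && g->index[w] < g->lowlink[v]) {
            g->lowlink[v] = g->index[w];
        }
    }

    if (g->lowlink[v] == g->index[v]) {
        int w;
        do {
            w = g->stack[--g->sp];
            g->on_stack[w] = 0;
            g->scc_id[w] = g->scc_count;
        } while (w != v);
        g->scc_count++;
    }
}

/* Returns the number of SCCs; scc_id[] is filled in callee-first order. */
int cg_find_sccs(CallGraph *g)
{
    for (int v = 0; v < g->n; v++)
        if (g->index[v] < 0)
            cg_visit(g, v);
    return g->scc_count;
}
```

Because the algorithm completes an SCC only after everything it calls has been completed, SCCs are numbered callees first, which is the order a bottom-up interprocedural pass would want. Two mutually recursive functions end up with the same `scc_id`, so they can be analyzed together as one unit.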
Because
partial control of the dynamic layout of a data structure to the compiler. While prior compiler transformations have provided limited control over layout of heap objects (garbage collector or allocation library heuristics [68, 64, 12, 39, 30, 119, 29], for example), none have been able to control the layout of a data structure at the granularity of individual instances of the data structure, and none have been able to support subsequent aggressive compiler transformations that optimize data structures on a per-instance basis (e.g., the simple ones in Chapter 6 or the aggressive one in Chapter 7). In particular, because all of the nodes of transformed data structures are managed by a compiler-controlled runtime library, compiler transformations can emit code that identifies and manipulates all of the
[17] Chandrasekhar Boyapati, Alexandru Salcianu, William Beebee, and Martin Rinard. Ownership types for safe region-based memory management in real-time Java. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2003.
[18] Michael Burke and Linda Torczon. Interprocedural optimization: eliminating unnecessary recompilation. ACM Transactions on Programming Languages and Systems (TOPLAS), 15(3):367–399, 1993.
[19] Michael G. Burke et al. The Jalapeño Dynamic Optimizing Compiler for Java. In Java Grande, pages 129–141, 1999.
[20] Brad Calder, Chandra Krintz, Simmi John, and Todd Austin. Cache-conscious data placement. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 139–149, San Jose, USA, 1998.
[21] David Callahan, Ken Kennedy, and Allan Porterfield. Software prefetching. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 40–52, Santa Clara, USA, April 1991.
[22] David Chase. Implementation of exception handling. The Journal of C Language Translation, 5(4):229–240, June 1994.
[23] J. Bradley Chen, Anita Borg, and Norman P. Jouppi. A simulation based study of TLB performance. In Proceedings of the International Conference on Computer Architecture (ISCA), pages 114–123, 1992.
[24] Juan Chen, Dinghao Wu, Andrew W. Appel, and Hai Fang. A provably sound TAL for back-end optimization. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), San Diego, CA, Jun 2003.
[25] Ben-Chung Cheng and Wen mei Hwu. Modular interprocedural pointer analysis using access paths: Design, implementation, and evaluation. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 57–69, Vancouver, British Columbia, Canada, June 2000.
resources used is quite reasonable for an aggressive optimizing transformation targeting memory system performance.

The Automatic Pool Allocation algorithm is described in detail in Chapter 5. Portions of this work were published in [89].

1.2 Applications of Macroscopic Techniques

Building on the foundation of DSA and Automatic Pool Allocation, a wide range of new macroscopic techniques are possible. This thesis explores several macroscopic techniques which target improved performance, described briefly below.

1.2.1 Simple Pool Allocation Optimizations

The first and most straightforward application is a collection of simple improvements to the pool allocated code. Because the pool allocator has complete control over the pool runtime library, we can expose a richer interface to the compiler than what is provided by the standard C library malloc and free family of functions. In particular, if the compiler can prove that a pool of memory only contains nodes that require 4-byte alignment, it can lower the alignment requirement for the pool (which defaults to 8-byte alignment), potentially reducing inter-object padding. Likewise, if the compiler can prove that memory is never deallocated from a pool, it can inform the runtime that it does not need to keep track of any metadata for objects in the pool (reducing allocation time and eliminating a per-object header word).

The key contribution of pool allocation here is that it partitions distinct data structures in the heap, so that these decisions can be made on a per-data-structure basis. For example, this allows some data structures in the program to be fully aligned where necessary, and others to use less alignment when possible. Without pool allocation, even with a mutable runtime library, these sorts of decisions would have to be made on a global (per-program) basis, which would rarely allow any improvement. Chapter 6 describes and evaluates these techniques in more detail, showing that simple optimizations like this can provide up to a 40% performance improvement over that already provided by pool allocation alone.
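The two per-pool decisions described above can be illustrated with a small bump-pointer pool. This is only a sketch under stated assumptions: the `Pool` type, `pool_init`, `pool_alloc`, and the 4-byte size header are hypothetical and do not reproduce the thesis's actual runtime interface. The compiler is assumed to choose `align` and `track_sizes` per pool from what it could prove about that data structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pool descriptor; one instance per data structure instance. */
typedef struct {
    unsigned char *base;  /* backing storage supplied by the caller */
    size_t capacity;      /* size of the backing storage in bytes */
    size_t used;          /* bytes consumed so far, padding included */
    size_t align;         /* 8 by default, 4 if proven sufficient */
    int track_sizes;      /* 0 if nothing in this pool is ever freed */
} Pool;

/* Round n up to the next multiple of align (align must be a power of two). */
size_t pool_round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

void pool_init(Pool *p, void *storage, size_t capacity,
               size_t align, int track_sizes)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(((uintptr_t)storage & (align - 1)) == 0);
    p->base = storage;
    p->capacity = capacity;
    p->used = 0;
    p->align = align;
    p->track_sizes = track_sizes;
}

/* Returns NULL when the pool's backing storage is exhausted. */
void *pool_alloc(Pool *p, size_t size)
{
    /* The header exists only when the pool must remember object sizes. */
    size_t header = p->track_sizes
                        ? pool_round_up(sizeof(uint32_t), p->align)
                        : 0;
    size_t total = pool_round_up(header + size, p->align);

    if (total > p->capacity - p->used)
        return NULL;

    unsigned char *block = p->base + p->used;
    if (p->track_sizes) {
        uint32_t stored = (uint32_t)size;
        memcpy(block, &stored, sizeof stored);
    }
    p->used += total;
    return block + header;
}
```

In this sketch, three 12-byte nodes occupy 72 bytes in a default pool (8-byte alignment, with header) and 36 bytes in a pool created with 4-byte alignment and no header. Since the settings live in the pool descriptor, one data structure can use the cheaper layout while another in the same program keeps the default.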
nodes), without encountering a^32
Finally, our group is investi-
[8] Todd Austin, et al. The Pointer-intensive Benchmark Suite. www.cs.wisc.edu/~austin/ptr-dist.html, Sept 1995.
[9] Andrew Ayers, Stuart de Jong, John Peyton, and Richard Schooler. Scalable cross-module optimization. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Montreal, June 1998.
[10] Vasanth Bala, Evelyn Duesterwald, and Sanjeev Banerjia. Dynamo: A transparent dynamic optimization system. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 1–12, June 2000.
[11] John P. Banning. An efficient way to find the side effects of procedure calls and the aliases of variables. In Proceedings of the ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, pages 29–41, New York, NY, USA, 1979.
[12] David A. Barrett and Ben G. Zorn. Using lifetime predictors to improve memory allocation performance. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 187–196, Albuquerque, New Mexico, June 1993.
[13] Emery D. Berger, Benjamin G. Zorn, and Kathryn S. McKinley. Reconsidering custom memory allocation. In Proceedings of the ACM SIGPLAN conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Seattle, Washington, November 2002.
[14] Bruno Blanchet. Escape Analysis for Java(TM): Theory and Practice. ACM Transactions on Programming Languages and Systems (TOPLAS), 25(6):713–775, Nov 2003.
[15] Robert D. Blumofe, Christopher F. Joerg, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: An efficient multithreaded runtime system. In Proceedings of the 5th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP), pages 207–216, Santa Barbara, CA, July 1995.
[16] Greg Bollella and James Gosling. The real-time specification for Java. IEEE Computer,