Essentials of Compilation
An Incremental Approach
Jeremy G. Siek, Ryan R. Newton
Indiana University
with contributions from:
Carl Factora
Andre Kuhlenschmidt
Michael M. Vitousek
Cameron Swords
August 13, 2018

Contents


  • 1 Preliminaries
    • 1.1 Abstract Syntax Trees
    • 1.2 Grammars
    • 1.3 S-Expressions
    • 1.4 Pattern Matching
    • 1.5 Recursion
    • 1.6 Interpreters
    • 1.7 Example Compiler: a Partial Evaluator
  • 2 Compiling Integers and Variables
    • 2.1 The R 1 Language
    • 2.2 The x86 Assembly Language
    • 2.3 Planning the trip to x86 via the C 0 language
    • 2.4 Uniquify Variables
    • 2.5 Flatten Expressions
    • 2.6 Select Instructions
    • 2.7 Assign Homes
    • 2.8 Patch Instructions
    • 2.9 Print x86
  • 3 Register Allocation
    • 3.1 Liveness Analysis
    • 3.2 Building the Interference Graph
    • 3.3 Graph Coloring via Sudoku
    • 3.4 Print x86 and Conventions for Registers
    • 3.5 Challenge: Move Biasing∗
  • 4 Booleans, Control Flow, and Type Checking
    • 4.1 The R 2 Language
    • 4.2 Type Checking R 2 Programs
    • 4.3 The C 1 Language
    • 4.4 Flatten Expressions
    • 4.5 XOR, Comparisons, and Control Flow in x86
    • 4.6 Select Instructions
    • 4.7 Register Allocation
      • 4.7.1 Liveness Analysis
      • 4.7.2 Build Interference
      • 4.7.3 Assign Homes
    • 4.8 Lower Conditionals (New Pass)
    • 4.9 Patch Instructions
    • 4.10 An Example Translation
    • 4.11 Challenge: Optimizing Conditions∗
  • 5 Tuples and Garbage Collection
    • 5.1 The R 3 Language
    • 5.2 Garbage Collection
      • 5.2.1 Graph Copying via Cheney’s Algorithm
      • 5.2.2 Data Representation
      • 5.2.3 Implementation of the Garbage Collector
    • 5.3 Compiler Passes
      • 5.3.1 Expose Allocation (New)
      • 5.3.2 Flatten and the C 2 intermediate language
      • 5.3.3 Select Instructions
      • 5.3.4 Register Allocation
      • 5.3.5 Print x86
  • 6 Functions
    • 6.1 The R 4 Language
    • 6.2 Functions in x86
    • 6.3 The compilation of functions
    • 6.4 An Example Translation
  • 7 Lexically Scoped Functions
    • 7.1 The R 5 Language
    • 7.2 Interpreting R
    • 7.3 Type Checking R
    • 7.4 Closure Conversion
    • 7.5 An Example Translation
  • 8 Dynamic Typing
    • 8.1 The R 6 Language: Typed Racket + Any
    • 8.2 The R 7 Language: Untyped Racket
    • 8.3 Compiling R
    • 8.4 Compiling R 7 to R
  • 9 Gradual Typing
  • 10 Parametric Polymorphism
  • 11 High-level Optimization
  • 12 Appendix
    • 12.1 Interpreters
    • 12.2 Utility Functions
      • 12.2.1 Graphs
      • 12.2.2 Testing
    • 12.3 x86 Instruction Set Quick-Reference

List of Figures

  • 1.1 The syntax of R 0, a language of integer arithmetic.
  • 1.2 Interpreter for the R 0 language.
  • 1.3 A partial evaluator for R 0 expressions.
  • 2.1 The syntax of R 1 , a language of integers and variables.
  • 2.2 Interpreter for the R 1 language.
  • 2.3 A subset of the x86 assembly language (AT&T syntax).
  • 2.4 An x86 program equivalent to (+ 10 32).
  • 2.5 An x86 program equivalent to (+ 52 (- 10)).
  • 2.6 Memory layout of a frame.
  • 2.7 Abstract syntax for x86 assembly.
  • 2.8 The C 0 intermediate language.
  • 2.9 Skeleton for the uniquify pass.
  • 2.10 Overview of the passes for compiling R
  • 3.1 An example program for register allocation.
  • 3.2 An example program annotated with live-after sets.
  • 3.3 The interference graph of the example program.
  • 3.4 A Sudoku game board and the corresponding colored graph.
  • 3.5 The saturation-based greedy graph coloring algorithm.
  • 3.6 Diagram of the passes for R 1 with register allocation.
  • 4.1 The syntax of R 2 , extending R 1 with Booleans and conditionals.
  • 4.2 Interpreter for the R 2 language.
  • 4.3 Skeleton of a type checker for the R 2 language.
  • 4.4 The C 1 language, extending C 0 with Booleans and conditionals.
  • 4.5 The x86 1 language (extends x86 0 of Figure 2.7).
  • 4.6 Example compilation of an if expression to x86.
  • 4.7 Diagram of the passes for R 2 , a language with conditionals.
  • 4.8 Example program with optimized conditionals.
  • 5.1 Example program that creates tuples and reads from them.
  • 5.2 The syntax of R 3 , extending R 2 with tuples.
  • 5.3 Interpreter for the R 3 language.
  • 5.4 Type checker for the R 3 language.
  • 5.5 A copying collector in action.
  • 5.6 Depiction of the Cheney algorithm copying the live tuples.
  • 5.7 Maintaining a root stack to facilitate garbage collection.
  • 5.8 Representation for tuples in the heap.
  • 5.9 The compiler’s interface to the garbage collector.
  • 5.10 Output of the expose-allocation pass, minus all of the has-type forms.
  • 5.11 The C 2 language, extending C 1 with support for tuples.
  • 5.12 Output of flatten for the running example.
  • 5.13 The x86 2 language (extends x86 1 of Figure 4.5).
  • 5.14 Output of the select-instructions pass.
  • 5.15 Output of the print-x86 pass.
  • 5.16 Diagram of the passes for R 3 , a language with tuples.
  • 6.1 Syntax of R 4 , extending R 3 with functions.
  • 6.2 Example of using functions in R
  • 6.3 Interpreter for the R 4 language.
  • 6.4 Memory layout of caller and callee frames.
  • 6.5 The F 1 language, an extension of R 3 (Figure 5.2).
  • 6.6 The C 3 language, extending C 2 with functions.
  • 6.7 The x86 3 language (extends x86 2 of Figure 5.13).
  • 6.8 Example compilation of a simple function to x86.
  • 7.1 Example of a lexically scoped function.
  • 7.2 Syntax of R 5 , extending R 4 with lambda.
  • 7.3 Example closure representation for the lambda’s in Figure 7.1.
  • 7.4 Interpreter for R
  • 7.5 Type checking the lambda’s in R
  • 7.6 Example of closure conversion.
  • 8.1 Syntax of R 6 , extending R 5 with Any.
  • 8.2 Type checker for the R 6 language.
  • 8.3 Interpreter for R
  • 8.4 Syntax of R 7 , an untyped language (a subset of Racket).
  • 8.5 Interpreter for the R 7 language.
  • 8.6 Compiling R 7 to R

1 Preliminaries

In this chapter, we review the basic tools that are needed for implementing a compiler. We use abstract syntax trees (ASTs), which refer to data structures in the compiler's memory, rather than programs as they are stored on disk, in concrete syntax. ASTs can be represented in many different ways, depending on the programming language used to write the compiler. Because this book uses Racket (http://racket-lang.org), a descendant of Scheme, we use S-expressions to represent programs (Section 1.1) and pattern matching to inspect individual nodes in an AST (Section 1.4). We use recursion to construct and deconstruct entire ASTs (Section 1.5).

1.1 Abstract Syntax Trees

The primary data structure that is commonly used for representing programs is the abstract syntax tree (AST). When considering some part of a program, a compiler needs to ask what kind of part it is and what sub-parts it has. For example, the following program, represented by an S-expression, corresponds to the AST shown after it.

(+ (read) (- 8))        (1.1)

    +
   / \
read   -
       |
       8

1.2 Grammars

The third rule says that, given an exp node, you can build another exp node by negating it.

exp ::= (- exp)        (1.4)

Symbols such as - in typewriter font are terminal symbols and must literally appear in the program for the rule to be applicable. We can apply the rules to build ASTs in the R 0 language. For example, by rule (1.2), 8 is an exp, then by rule (1.4), the following AST is an exp.

(- 8)        (1.5)

The following grammar rule defines addition expressions:

exp ::= (+ exp exp)        (1.6)

Now we can see that the AST (1.1) is an exp in R 0. We know that (read) is an exp by rule (1.3) and we have shown that (- 8) is an exp, so we can apply rule (1.6) to show that (+ (read) (- 8)) is an exp in the R 0 language.

If you have an AST for which the above rules do not apply, then the AST is not in R 0. For example, the AST (- (read) (+ 8)) is not in R 0 because there are no rules for + with only one argument, nor for - with two arguments. Whenever we define a language with a grammar, we implicitly mean for the language to be the smallest set of programs that are justified by the rules. That is, the language only includes those programs that the rules allow.

The last grammar rule for R 0 states that there is a program node to mark the top of the whole program:

R 0 ::= (program exp)

The read-program function provided in utilities.rkt reads programs in from a file (the sequence of characters in the concrete syntax of Racket) and parses them into the abstract syntax tree. The concrete syntax does not include a program form; that is added by the read-program function as it creates the AST. See the description of read-program in Appendix 12. for more details.

It is common to have many rules with the same left-hand side, such as exp in the grammar for R 0, so there is a vertical bar notation for gathering several rules, as shown in Figure 1.1. Each clause separated by a vertical bar is called an alternative.


exp ::= int | (read) | (- exp) | (+ exp exp)
R 0 ::= (program exp)

Figure 1.1: The syntax of R 0, a language of integer arithmetic.

1.3 S-Expressions

Racket, as a descendant of Lisp, has convenient support for creating and manipulating abstract syntax trees with its symbolic expression feature, or S-expression for short. We can create an S-expression simply by writing a backquote followed by the textual representation of the AST. (Technically speaking, this is called a quasiquote in Racket.) For example, an S-expression to represent the AST (1.1) is created by the following Racket expression:

`(+ (read) (- 8))

To build larger S-expressions one often needs to splice together several smaller S-expressions. Racket provides the comma operator to splice an S-expression into a larger one. For example, instead of creating the S-expression for AST (1.1) all at once, we could have first created an S-expression for AST (1.5) and then spliced that into the addition S-expression.

(define ast1.4 `(- 8))
(define ast1.1 `(+ (read) ,ast1.4))

In general, the Racket expression that follows the comma (splice) can be any expression that computes an S-expression.
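As a small illustration of splicing, the sketch below builds AST nodes with helper functions. The names make-neg and make-add are hypothetical and are not part of the book's utilities; the last definition shows that the expression after the comma is evaluated before its value is spliced in.

```racket
#lang racket

;; Hypothetical helpers (not from utilities.rkt) that build R0 AST nodes
;; by splicing child S-expressions into a quasiquoted template.
(define (make-neg e)
  `(- ,e))

(define (make-add e1 e2)
  `(+ ,e1 ,e2))

;; Build the AST (+ (read) (- 8)) from smaller pieces.
(define ast (make-add `(read) (make-neg 8)))

;; The expression after the comma is evaluated first, so (+ 5 3)
;; contributes the integer 8 rather than the list (+ 5 3).
(define computed `(- ,(+ 5 3)))
```

Wrapping the quasiquote in a function is only one possible design; it keeps the shape of each node in a single place, which can help when the AST representation changes.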

1.4 Pattern Matching

As mentioned above, one of the operations that a compiler needs to perform on an AST is to access the children of a node. Racket provides the match form to access the parts of an S-expression. Consider the following example and the output on the right.

(match ast1.1
  [`(,op ,child1 ,child2)
   (print op) (newline)
   (print child1) (newline)
   (print child2)])

'+
'(read)
'(- 8)
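To show how match can answer both questions a compiler asks about a node (what kind it is, and what its children are), here is a minimal sketch. The function names node-kind and children are assumptions made for this illustration; they do not appear in the book.

```racket
#lang racket

;; node-kind is a hypothetical helper: it reports which kind of R0
;; expression an S-expression is, or #f when no pattern applies.
(define (node-kind ast)
  (match ast
    [(? fixnum?) 'int]
    [`(read) 'read]
    [`(- ,e) 'negation]
    [`(+ ,e1 ,e2) 'addition]
    [_ #f]))

;; children returns the sub-expressions of a node as a list;
;; leaves (and anything unrecognized) have no children.
(define (children ast)
  (match ast
    [`(- ,e) (list e)]
    [`(+ ,e1 ,e2) (list e1 e2)]
    [_ '()]))

(node-kind `(+ (read) (- 8)))   ; → 'addition
(children `(+ (read) (- 8)))    ; → '((read) (- 8))
```

Note that a pattern such as `(- ,e) only matches a list of exactly two elements, so (- 1 2) falls through to the final clause.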


(define (R0? sexp)
  (define (exp? ex)
    (match ex
      [(? fixnum?) #t]
      [`(read) #t]
      [`(- ,e) (exp? e)]
      [`(+ ,e1 ,e2)
       (and (exp? e1) (exp? e2))]
      [else #f]))
  (match sexp
    [`(program ,e) (exp? e)]
    [else #f]))

(R0? `(program (+ (read) (- 8))))
(R0? `(program (- (read) (+ 8))))

#t
#f

Indeed, the structural recursion follows the grammar itself. We can generally expect to write a recursive function to handle each non-terminal in the grammar.¹

You may be tempted to write the program like this:

(define (R0? sexp)
  (match sexp
    [(? fixnum?) #t]
    [`(read) #t]
    [`(- ,e) (R0? e)]
    [`(+ ,e1 ,e2) (and (R0? e1) (R0? e2))]
    [`(program ,e) (R0? e)]
    [else #f]))

Sometimes such a trick will save a few lines of code, especially when it comes to the program wrapper. Yet this style is generally not recommended, because it can get you into trouble. For instance, the above function is subtly wrong: (R0? `(program (program 3))) will return true, when it should return false.
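The difference between the two styles can be checked directly. The sketch below places both versions side by side under the hypothetical names R0-flat? and R0-strict? (renamed here only so that they can coexist in one file).

```racket
#lang racket

;; Flat version: a single match handles both program and exp,
;; so a program node is accepted wherever an exp is expected.
(define (R0-flat? sexp)
  (match sexp
    [(? fixnum?) #t]
    [`(read) #t]
    [`(- ,e) (R0-flat? e)]
    [`(+ ,e1 ,e2) (and (R0-flat? e1) (R0-flat? e2))]
    [`(program ,e) (R0-flat? e)]
    [_ #f]))

;; Version with one function per non-terminal of the grammar.
(define (R0-strict? sexp)
  (define (exp? ex)
    (match ex
      [(? fixnum?) #t]
      [`(read) #t]
      [`(- ,e) (exp? e)]
      [`(+ ,e1 ,e2) (and (exp? e1) (exp? e2))]
      [_ #f]))
  (match sexp
    [`(program ,e) (exp? e)]
    [_ #f]))

(R0-flat? `(program (program 3)))    ; → #t, although it is not in R0
(R0-strict? `(program (program 3)))  ; → #f
```

The flat version also accepts a bare expression such as 3 as a whole program, which the grammar does not allow either.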

¹ If you took the How to Design Programs course http://www.ccs.neu.edu/home/matthias/HtDP2e/, this principle of structuring code according to the data definition is probably quite familiar.

1.6 Interpreters

The meaning, or semantics, of a program is typically defined in the specification of the language. For example, the Scheme language is defined in the report by Sperber et al. [2009]. The Racket language is defined in its reference manual [Flatt and PLT, 2014]. In this book we use an interpreter to define the meaning of each language that we consider, following Reynolds's advice in this regard [Reynolds, 1972].

Here we will warm up by writing an interpreter for the R 0 language, which will also serve as a second example of structural recursion. The interp-R0 function is defined in Figure 1.2. The body of the function is a match on the input program p and then a call to the exp helper function, which in turn has one match clause per grammar rule for R 0 expressions. The exp function is naturally recursive: clauses for internal AST nodes make recursive calls on each child node.

(require racket/fixnum) ; provides fx+ and fx-

(define (interp-R0 p)
  (define (exp ex)
    (match ex
      [(? fixnum?) ex]
      [`(read)
       (let ([r (read)])
         (cond [(fixnum? r) r]
               [else (error 'interp-R0 "input not an integer: ~a" r)]))]
      [`(- ,e) (fx- 0 (exp e))]
      [`(+ ,e1 ,e2) (fx+ (exp e1) (exp e2))]))
  (match p
    [`(program ,e) (exp e)]))

Figure 1.2: Interpreter for the R 0 language.

Note that the recursive cases for negation and addition are a place where we could have made use of the app feature of Racket's match to apply a function and bind the result. The two recursive cases of interp-R0 would become:

[`(- ,(app exp v)) (fx- 0 v)]
[`(+ ,(app exp v1) ,(app exp v2)) (fx+ v1 v2)]

Here we use (app exp v) to recursively apply exp to the child node and bind the result value to variable v. The difference between this version and the code in Figure 1.2 is mainly stylistic, although if side effects are involved the order of evaluation can become important. Further, when we write functions with multiple return values, the app form can be convenient for binding the resulting values.

Let us consider the result of interpreting some example R 0 programs. The following program simply adds two integers.
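As a rough illustration, the interpreter can be exercised as follows. This sketch uses the app-pattern variant under the hypothetical name interp-R0-app, and it supplies input through with-input-from-string (from racket/port) so that (read) does not wait on the keyboard; that way of feeding input is an assumption of this example, not something the book prescribes.

```racket
#lang racket
(require racket/fixnum)

;; interp-R0-app is a hypothetical name for the variant of Figure 1.2
;; that uses app patterns for the two recursive cases.
(define (interp-R0-app p)
  (define (exp ex)
    (match ex
      [(? fixnum?) ex]
      [`(read)
       (let ([r (read)])
         (cond [(fixnum? r) r]
               [else (error 'interp-R0-app "input not an integer: ~a" r)]))]
      [`(- ,(app exp v)) (fx- 0 v)]
      [`(+ ,(app exp v1) ,(app exp v2)) (fx+ v1 v2)]))
  (match p
    [`(program ,e) (exp e)]))

;; A program without (read) needs no input.
(interp-R0-app `(program (+ 10 32)))   ; → 42

;; Supply the input 50 from a string instead of the keyboard.
(with-input-from-string "50"
  (lambda () (interp-R0-app `(program (+ (read) (- 8))))))   ; → 42
```

The second example contains a single (read), so the result does not depend on the order in which match evaluates the two app patterns.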


(define (pe-neg r)
  (cond [(fixnum? r) (fx- 0 r)]
        [else `(- ,r)]))

(define (pe-add r1 r2)
  (cond [(and (fixnum? r1) (fixnum? r2)) (fx+ r1 r2)]
        [else `(+ ,r1 ,r2)]))

(define (pe-arith e)
  (match e
    [(? fixnum?) e]
    [`(read) `(read)]
    [`(- ,(app pe-arith r1)) (pe-neg r1)]
    [`(+ ,(app pe-arith r1) ,(app pe-arith r2)) (pe-add r1 r2)]))

Figure 1.3: A partial evaluator for R 0 expressions.

1.7 Example Compiler: a Partial Evaluator

In this section we consider a compiler that translates R 0 programs into R 0 programs that are more efficient; that is, this compiler is an optimizer. Our optimizer will accomplish this by trying to eagerly compute the parts of the program that do not depend on any inputs. For example, given the following program

(+ (read) (- (+ 5 3)))

our compiler will translate it into the program

(+ (read) -8)

Figure 1.3 gives the code for a simple partial evaluator for the R 0 language. The output of the partial evaluator is an R 0 program, which we build up using a combination of quasiquotes and commas. (Though no quasiquote is necessary for integers.) In Figure 1.3, the normal structural recursion is captured in the main pe-arith function whereas the code for partially evaluating negation and addition is factored into two separate helper functions: pe-neg and pe-add. The input to these helper functions is the output of partially evaluating the children nodes.

Our code for pe-neg and pe-add implements the simple idea of checking whether the inputs are integers and, if they are, going ahead and performing the arithmetic. Otherwise, we use quasiquote to create an AST node for the appropriate operation (either negation or addition) and use comma to splice in the child nodes.

To gain some confidence that the partial evaluator is correct, we can test whether it produces programs that get the same result as the input program. That is, we can test whether it satisfies Diagram (1.7). The following code runs the partial evaluator on several examples and tests the output program. The assert function is defined in Appendix 12.2.

(define (test-pe p)
  (assert "testing pe-arith"
          (equal? (interp-R0 `(program ,p))
                  (interp-R0 `(program ,(pe-arith p))))))

(test-pe `(+ (read) (- (+ 5 3))))
(test-pe `(+ 1 (+ (read) 1)))
(test-pe `(- (+ (read) (- 5))))
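The same check can be tried without the book's utilities. The following is a minimal sketch under stated assumptions: it repeats the definitions of Figure 1.3 so that it is self-contained, replaces assert with a plain Boolean result, and uses the hypothetical helpers interp-exp, run-with-input, and same-result? to feed both runs the same input from a string.

```racket
#lang racket
(require racket/fixnum)

;; Figure 1.3, repeated so the sketch is self-contained.
(define (pe-neg r)
  (cond [(fixnum? r) (fx- 0 r)]
        [else `(- ,r)]))

(define (pe-add r1 r2)
  (cond [(and (fixnum? r1) (fixnum? r2)) (fx+ r1 r2)]
        [else `(+ ,r1 ,r2)]))

(define (pe-arith e)
  (match e
    [(? fixnum?) e]
    [`(read) `(read)]
    [`(- ,(app pe-arith r1)) (pe-neg r1)]
    [`(+ ,(app pe-arith r1) ,(app pe-arith r2)) (pe-add r1 r2)]))

;; Expression interpreter in the style of Figure 1.2 (hypothetical name).
(define (interp-exp ex)
  (match ex
    [(? fixnum?) ex]
    [`(read) (read)]
    [`(- ,e) (fx- 0 (interp-exp e))]
    [`(+ ,e1 ,e2) (fx+ (interp-exp e1) (interp-exp e2))]))

;; run-with-input (hypothetical) evaluates e with (read) drawing from
;; the given string, so the two runs below see the same input.
(define (run-with-input e input)
  (with-input-from-string input (lambda () (interp-exp e))))

;; same-result? checks one instance of the property being tested.
(define (same-result? e input)
  (equal? (run-with-input e input)
          (run-with-input (pe-arith e) input)))

(pe-arith `(+ (read) (- (+ 5 3))))            ; → '(+ (read) -8)
(same-result? `(+ (read) (- (+ 5 3))) "50")   ; → #t
(same-result? `(+ 1 (+ (read) 1)) "7")        ; → #t
(same-result? `(- (+ (read) (- 5))) "12")     ; → #t
```

Agreement on a handful of inputs is evidence rather than proof; it only shows that the partial evaluator preserved the result for those particular programs and inputs.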

Exercise 1. We challenge the reader to improve on the simple partial evaluator in Figure 1.3 by replacing the pe-neg and pe-add helper functions with functions that know more about arithmetic. For example, your partial evaluator should translate

(+ 1 (+ (read) 1))

into

(+ 2 (read))

To accomplish this, we recommend that your partial evaluator produce output that takes the form of the residual non-terminal in the following grammar.

exp ::= (read) | (- (read)) | (+ exp exp)
residual ::= int | (+ int exp) | exp
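One way to check your output against this grammar is a recognizer written in the same style as R0?, with one function per non-terminal. The sketch below is only a checking aid, not a solution to the exercise, and the names residual-exp? and residual? are hypothetical.

```racket
#lang racket

;; residual-exp? follows the exp non-terminal of the grammar above.
(define (residual-exp? e)
  (match e
    [`(read) #t]
    [`(- (read)) #t]
    [`(+ ,e1 ,e2) (and (residual-exp? e1) (residual-exp? e2))]
    [_ #f]))

;; residual? follows the residual non-terminal: an integer, an integer
;; added to an exp, or an exp on its own.
(define (residual? r)
  (match r
    [(? fixnum?) #t]
    [`(+ ,(? fixnum?) ,e) (residual-exp? e)]
    [_ (residual-exp? r)]))

(residual? `(+ 2 (read)))        ; → #t
(residual? `(+ 1 (+ (read) 1)))  ; → #f
```

The second call shows that the unimproved output of Figure 1.3 for the example above is not yet in residual form, because an integer remains inside the nested addition.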