

























































































































































































In this chapter, we review the basic tools that are needed for implementing a compiler. We use abstract syntax trees (ASTs), which refer to data structures in the compiler's memory, rather than programs as they are stored on disk, in concrete syntax. ASTs can be represented in many different ways, depending on the programming language used to write the compiler. Because this book uses Racket (http://racket-lang.org), a descendant of Scheme, we use S-expressions to represent programs (Section 1.1) and pattern matching to inspect individual nodes in an AST (Section 1.4). We use recursion to construct and deconstruct entire ASTs (Section 1.5).
1.1 Abstract Syntax Trees
The primary data structure that is commonly used for representing programs is the abstract syntax tree (AST). When considering some part of a program, a compiler needs to ask what kind of part it is and what sub-parts it has. For example, the program on the left, represented by an S-expression, corresponds to the AST on the right.
(+ (read) (- 8))    [AST diagram: root node + with children read and -; the - node has the single child 8]    (1.1)
The third rule says that, given an exp node, you can build another exp node by negating it.
exp ::= (- exp)    (1.4)
Symbols such as - in typewriter font are terminal symbols and must literally appear in the program for the rule to be applicable. We can apply the rules to build ASTs in the R0 language. For example, by rule (1.2), 8 is an exp; then, by rule (1.4), the following AST is an exp.
(- 8)    (1.5)
The following grammar rule defines addition expressions:
exp ::= (+ exp exp)    (1.6)
Now we can see that the AST (1.1) is an exp in R0. We know that (read) is an exp by rule (1.3), and we have shown that (- 8) is an exp, so we can apply rule (1.6) to show that (+ (read) (- 8)) is an exp in the R0 language.
If you have an AST for which the above rules do not apply, then the AST is not in R0. For example, the AST (- (read) (+ 8)) is not in R0 because there are no rules for + with only one argument, nor for - with two arguments. Whenever we define a language with a grammar, we implicitly mean for the language to be the smallest set of programs that are justified by the rules. That is, the language only includes those programs that the rules allow.
The last grammar rule for R0 states that there is a program node to mark the top of the whole program:
R0 ::= (program exp)
The read-program function provided in utilities.rkt reads programs in from a file (the sequence of characters in the concrete syntax of Racket) and parses them into the abstract syntax tree. The concrete syntax does not include a program form; that is added by the read-program function as it creates the AST. See the description of read-program in Appendix 12. for more details.
It is common to have many rules with the same left-hand side, such as exp in the grammar for R0, so there is a vertical bar notation for gathering several rules, as shown in Figure 1.1. Each clause separated by a vertical bar is called an alternative.
exp ::= int | (read) | (- exp) | (+ exp exp)
R0 ::= (program exp)
Figure 1.1: The syntax of R0, a language of integer arithmetic.
1.3 S-Expressions
Racket, as a descendant of Lisp, has convenient support for creating and manipulating abstract syntax trees with its symbolic expression feature, or S-expression for short. We can create an S-expression simply by writing a backquote followed by the textual representation of the AST. (Technically speaking, this is called a quasiquote in Racket.) For example, an S-expression to represent the AST (1.1) is created by the following Racket expression:
`(+ (read) (- 8))
To build larger S-expressions one often needs to splice together several smaller S-expressions. Racket provides the comma operator to splice an S-expression into a larger one. For example, instead of creating the S-expression for AST (1.1) all at once, we could have first created an S-expression for AST (1.5) and then spliced that into the addition S-expression.
(define ast1.4 `(- 8))
(define ast1.1 `(+ (read) ,ast1.4))
In general, the Racket expression that follows the comma (splice) can be any expression that computes an S-expression.
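As a small illustration of the comma operator, the sketch below builds the same S-expression in two ways and also shows what happens when the expression after the comma computes a number. The names neg-eight, sum, unspliced, and folded are ours, chosen only for this example.

```racket
#lang racket

;; Hypothetical names, used only for this illustration.
(define neg-eight `(- 8))                   ; the S-expression (- 8), not the number -8
(define sum `(+ (read) ,neg-eight))         ; comma inserts the value of neg-eight
(define unspliced `(+ (read) neg-eight))    ; without the comma, the symbol stays as-is
(define folded `(+ (read) ,(- 8)))          ; the expression (- 8) is evaluated first

;; sum is the same list one would build by hand with list and quote.
(equal? sum (list '+ (list 'read) (list '- 8)))
(equal? (third unspliced) 'neg-eight)
(equal? folded '(+ (read) -8))
```

The last line shows that whatever follows the comma is ordinary Racket code: its value, here the integer -8, becomes part of the resulting S-expression.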
1.4 Pattern Matching
As mentioned above, one of the operations that a compiler needs to perform on an AST is to access the children of a node. Racket provides the match form to access the parts of an S-expression. Consider the following example and the output on the right.
(match ast1.
  [`(,op ,child1 ,child2)
   (print op) (newline)
   (print child1) (newline)
   (print child2)])

'+
'(read)
'(- 8)
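The same idea extends to nodes with different numbers of children by listing one clause per shape. The following is a minimal sketch; the function name describe-node and the tags it returns are our own inventions for illustration.

```racket
#lang racket

;; describe-node is a hypothetical helper: it reports the kind of an
;; AST node together with its operator and children.
(define (describe-node ast)
  (match ast
    [(? fixnum?) (list 'int ast)]                        ; leaf: integer literal
    [`(,op) (list 'nullary op)]                          ; e.g. (read)
    [`(,op ,child) (list 'unary op child)]               ; e.g. (- 8)
    [`(,op ,child1 ,child2) (list 'binary op child1 child2)]
    [_ (list 'unknown ast)]))                            ; anything else

(describe-node '(+ (read) (- 8)))
(describe-node '(- 8))
(describe-node '(read))
```

Each clause is tried in order, and the pattern variables introduced with a comma are bound to the corresponding parts of the S-expression in the clause body.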
(define (R0? sexp)
  (define (exp? ex)
    (match ex
      [(? fixnum?) #t]
      [`(read) #t]
      [`(- ,e) (exp? e)]
      [`(+ ,e1 ,e2) (and (exp? e1) (exp? e2))]
      [else #f]))  ; reject anything that no grammar rule covers
  (match sexp
    [`(program ,e) (exp? e)]
    [else #f]))

(R0? `(program (+ (read) (- 8))))
(R0? `(program (- (read) (+ 8))))

#t
#f
Indeed, the structural recursion follows the grammar itself. We can generally expect to write a recursive function to handle each non-terminal in the grammar.^1 You may be tempted to write the program like this:
(define (R0? sexp)
  (match sexp
    [(? fixnum?) #t]
    [`(read) #t]
    [`(- ,e) (R0? e)]
    [`(+ ,e1 ,e2) (and (R0? e1) (R0? e2))]
    [`(program ,e) (R0? e)]
    [else #f]))
Sometimes such a trick will save a few lines of code, especially when it comes to the program wrapper. Yet this style is generally not recommended, because it can get you into trouble. For instance, the above function is subtly wrong: (R0? `(program (program 3))) will return true, when it should return false.
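The difference between the two styles can be checked directly. In the sketch below, both recognizers are renamed (R0-flat? and R0-nested? are our names) so that they can live side by side; the nested one includes a catch-all clause in its helper so that it answers #f for shapes that no rule covers.

```racket
#lang racket

;; Single-function style: program is treated like any other expression form.
(define (R0-flat? sexp)
  (match sexp
    [(? fixnum?) #t]
    [`(read) #t]
    [`(- ,e) (R0-flat? e)]
    [`(+ ,e1 ,e2) (and (R0-flat? e1) (R0-flat? e2))]
    [`(program ,e) (R0-flat? e)]
    [_ #f]))

;; One function per non-terminal: exp? for exp, the outer match for R0.
(define (R0-nested? sexp)
  (define (exp? ex)
    (match ex
      [(? fixnum?) #t]
      [`(read) #t]
      [`(- ,e) (exp? e)]
      [`(+ ,e1 ,e2) (and (exp? e1) (exp? e2))]
      [_ #f]))
  (match sexp
    [`(program ,e) (exp? e)]
    [_ #f]))

(R0-flat? '(program (program 3)))    ; accepted, although it is not in R0
(R0-nested? '(program (program 3)))  ; rejected
(R0-flat? '(+ 1 2))                  ; accepted even without a program wrapper
(R0-nested? '(+ 1 2))                ; rejected: the top must be a program node
```

The flat version conflates the two non-terminals, so it also accepts a bare expression with no program node at all.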
^1 If you took the How to Design Programs course (http://www.ccs.neu.edu/home/matthias/HtDP2e/), this principle of structuring code according to the data definition is probably quite familiar.

1.6 Interpreters
The meaning, or semantics, of a program is typically defined in the specification of the language. For example, the Scheme language is defined in
the report by Sperber et al. [2009]. The Racket language is defined in its reference manual [Flatt and PLT, 2014]. In this book we use an interpreter to define the meaning of each language that we consider, following Reynolds's advice in this regard [Reynolds, 1972]. Here we will warm up by writing an interpreter for the R0 language, which will also serve as a second example of structural recursion.

(require racket/fixnum)  ; provides fx+ and fx-

(define (interp-R0 p)
  (define (exp ex)
    (match ex
      [(? fixnum?) ex]
      [`(read)
       (let ([r (read)])
         (cond [(fixnum? r) r]
               [else (error 'interp-R0 "input not an integer" r)]))]
      [`(- ,e) (fx- 0 (exp e))]
      [`(+ ,e1 ,e2) (fx+ (exp e1) (exp e2))]))
  (match p
    [`(program ,e) (exp e)]))

Figure 1.2: Interpreter for the R0 language.

The interp-R0 function is defined in Figure 1.2. The body of the function is a match on the input program p and then a call to the exp helper function, which in turn has one match clause per grammar rule for R0 expressions. The exp function is naturally recursive: clauses for internal AST nodes make recursive calls on each child node. Note that the recursive cases for negation and addition are a place where we could have made use of the app feature of Racket's match to apply a function and bind the result. The two recursive cases of interp-R0 would become:
      [`(- ,(app exp v)) (fx- 0 v)]
      [`(+ ,(app exp v1) ,(app exp v2)) (fx+ v1 v2)]))
Here we use (app exp v) to recursively apply exp to the child node and bind the result value to variable v. The difference between this version and the code in Figure 1.2 is mainly stylistic, although if side effects are involved the order of evaluation can become important. Further, when we write functions with multiple return values, the app form can be convenient for binding the resulting values.
Let us consider the result of interpreting some example R0 programs. The following program simply adds two integers.
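The app-based clauses can be tried out in a self-contained sketch. The function name interp-R0-app, the use of with-input-from-string to supply input for (read), and the sample programs are all our own assumptions for illustration; they are not taken from the text.

```racket
#lang racket
(require racket/fixnum)  ; provides fx+ and fx-

;; interp-R0-app is a hypothetical name for the interpreter written
;; with app patterns in the two recursive clauses.
(define (interp-R0-app p)
  (define (exp ex)
    (match ex
      [(? fixnum?) ex]
      [`(read)
       (let ([r (read)])
         (cond [(fixnum? r) r]
               [else (error 'interp-R0-app "input not an integer" r)]))]
      ;; (app exp v) applies exp to the child and binds the result to v.
      [`(- ,(app exp v)) (fx- 0 v)]
      [`(+ ,(app exp v1) ,(app exp v2)) (fx+ v1 v2)]))
  (match p
    [`(program ,e) (exp e)]))

;; A program of our own choosing that needs no input.
(define no-input-result (interp-R0-app '(program (+ 10 32))))

;; A program that calls (read); the input "50" is supplied from a string.
(define with-input-result
  (with-input-from-string "50"
    (lambda () (interp-R0-app '(program (+ (read) (- 8)))))))

no-input-result
with-input-result
```

Supplying the input from a string keeps the example repeatable, since nothing has to be typed at the console.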
(define (pe-neg r)
  (cond [(fixnum? r) (fx- 0 r)]
        [else `(- ,r)]))

(define (pe-add r1 r2)
  (cond [(and (fixnum? r1) (fixnum? r2)) (fx+ r1 r2)]
        [else `(+ ,r1 ,r2)]))

(define (pe-arith e)
  (match e
    [(? fixnum?) e]
    [`(read) `(read)]
    [`(- ,(app pe-arith r1)) (pe-neg r1)]
    [`(+ ,(app pe-arith r1) ,(app pe-arith r2)) (pe-add r1 r2)]))

Figure 1.3: A partial evaluator for R0 expressions.
1.7 Example Compiler: a Partial Evaluator
In this section we consider a compiler that translates R0 programs into R0 programs that are more efficient; that is, this compiler is an optimizer. Our optimizer will accomplish this by trying to eagerly compute the parts of the program that do not depend on any inputs. For example, given the following program
(+ (read) (- (+ 5 3)))
our compiler will translate it into the program
(+ (read) -8)
Figure 1.3 gives the code for a simple partial evaluator for the R0 language. The output of the partial evaluator is an R0 program, which we build up using a combination of quasiquotes and commas. (Though no quasiquote is necessary for integers.) In Figure 1.3, the normal structural recursion is captured in the main pe-arith function, whereas the code for partially evaluating negation and addition is factored into two separate helper functions: pe-neg and pe-add. The input to these helper functions is the output of partially evaluating the child nodes.
Our code for pe-neg and pe-add implements the simple idea of checking whether the inputs are integers and, if they are, performing the arithmetic. Otherwise, we use quasiquote to create an AST node for the appropriate operation (either negation or addition) and use comma to splice in the child nodes.
To gain some confidence that the partial evaluator is correct, we can test whether it produces programs that get the same result as the input program. That is, we can test whether it satisfies Diagram (1.7). The following code runs the partial evaluator on several examples and tests the output program. The assert function is defined in Appendix 12.2.
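;; interp-R0 expects a (program ...) node, whereas pe-arith works on
;; bare expressions, so each expression is wrapped before interpreting.
(define (test-pe p)
  (assert "testing pe-arith"
          (equal? (interp-R0 `(program ,p))
                  (interp-R0 `(program ,(pe-arith p))))))

(test-pe `(+ (read) (- (+ 5 3))))
(test-pe `(+ 1 (+ (read) 1)))
(test-pe `(- (+ (read) (- 5))))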
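One practical point when running such tests is that each test program calls (read), so the original and the partially evaluated program must be given the same input for the comparison to be meaningful. The sketch below is one way to arrange that without the assert utility. The helpers run-with-input and same-result? are hypothetical names of ours, and supplying the input through with-input-from-string is an assumption of this example rather than the book's setup.

```racket
#lang racket
(require racket/fixnum)  ; provides fx+ and fx-

;; Interpreter and partial evaluator, following Figures 1.2 and 1.3.
(define (interp-R0 p)
  (define (exp ex)
    (match ex
      [(? fixnum?) ex]
      [`(read)
       (let ([r (read)])
         (cond [(fixnum? r) r]
               [else (error 'interp-R0 "input not an integer" r)]))]
      [`(- ,e) (fx- 0 (exp e))]
      [`(+ ,e1 ,e2) (fx+ (exp e1) (exp e2))]))
  (match p
    [`(program ,e) (exp e)]))

(define (pe-neg r)
  (cond [(fixnum? r) (fx- 0 r)]
        [else `(- ,r)]))

(define (pe-add r1 r2)
  (cond [(and (fixnum? r1) (fixnum? r2)) (fx+ r1 r2)]
        [else `(+ ,r1 ,r2)]))

(define (pe-arith e)
  (match e
    [(? fixnum?) e]
    [`(read) `(read)]
    [`(- ,(app pe-arith r1)) (pe-neg r1)]
    [`(+ ,(app pe-arith r1) ,(app pe-arith r2)) (pe-add r1 r2)]))

;; Hypothetical helper: interpret expression e with the given input text.
(define (run-with-input e input)
  (with-input-from-string input
    (lambda () (interp-R0 `(program ,e)))))

;; Hypothetical helper: do e and its partially evaluated form agree?
(define (same-result? e input)
  (equal? (run-with-input e input)
          (run-with-input (pe-arith e) input)))

(pe-arith '(+ (read) (- (+ 5 3))))            ; the constant part is folded
(same-result? '(+ (read) (- (+ 5 3))) "10")
(same-result? '(+ 1 (+ (read) 1)) "10")
(same-result? '(- (+ (read) (- 5))) "10")
```

Passing the input as a string means both runs read exactly the same value, so a mismatch can only come from the partial evaluator.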
Exercise 1. We challenge the reader to improve on the simple partial evaluator in Figure 1.3 by replacing the pe-neg and pe-add helper functions with functions that know more about arithmetic. For example, your partial evaluator should translate
(+ 1 (+ (read) 1))
into
(+ 2 (read))
To accomplish this, we recommend that your partial evaluator produce output that takes the form of the residual non-terminal in the following grammar.
exp ::= (read) | (- (read)) | (+ exp exp)
residual ::= int | (+ int exp) | exp
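When working on the exercise, a recognizer for the residual grammar can serve as a test oracle for the output of an improved partial evaluator. The following is a sketch of such a checker, not a solution to the exercise; the names res-exp? and residual? are ours.

```racket
#lang racket

;; res-exp? recognizes the exp non-terminal of the residual grammar:
;;   exp ::= (read) | (- (read)) | (+ exp exp)
(define (res-exp? e)
  (match e
    [`(read) #t]
    [`(- (read)) #t]
    [`(+ ,e1 ,e2) (and (res-exp? e1) (res-exp? e2))]
    [_ #f]))

;; residual? recognizes the residual non-terminal:
;;   residual ::= int | (+ int exp) | exp
(define (residual? r)
  (match r
    [(? fixnum?) #t]
    [`(+ ,(? fixnum?) ,e) (res-exp? e)]
    [_ (res-exp? r)]))

(residual? '(+ 2 (read)))            ; the desired output from the exercise
(residual? '(+ 1 (+ (read) 1)))      ; the input: an integer is buried inside
(residual? '(+ (read) (- (read))))   ; no integers at all
```

As in the earlier recognizers, there is one function per non-terminal, so an integer is allowed only at the top level or as the first argument of the outermost addition.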