CS 4120
Introduction to Compilers
Andrew Myers
Cornell University
Lecture 8: AST construction and
semantic analysis
12 Sep 11
Compiler ‘main program’
class Compiler {
    void compile() throws CompileError {
        Lexer l = new Lexer(input);
        Parser p = new Parser(l);
        AST tree = p.parse();   // calls l.getToken() to read tokens
        if (typeCheck(tree)) {
            // generate and emit code only for programs that type-check
            IR = genIntermediateCode(tree);
            IR.emitCode();
        }
    }
}
Thread of Control
Compiler.main → Parser.parse → Lexer.getToken → InputStream.read
Results flow back up the call chain: bytes/chars → tokens → AST
- easier to make re-entrant
Semantic Analysis
Source code
→ lexical analysis → tokens (or lexical errors)
→ parsing → abstract syntax tree (or syntax errors)
→ semantic analysis → valid programs: decorated AST (or semantic errors)
Do we need an AST?
- Old-style compilers: semantic actions generate code during parsing!
    expr ::= expr PLUS expr {: emitCode(add); :}
- Especially for stack machine (figure: stack, parser, code)
Problems:
- hard to maintain
- limits language features (e.g., recursion)
- bad code!
AST
• An Abstract Syntax Tree (AST) is a tree
representation of the program. Used for
– semantic analysis (type checking)
– some optimization (e.g., constant folding)
– intermediate code generation (sometimes
intermediate code = AST with somewhat
different set of nodes)
• Compiler phases = recursive tree traversals
• Object-oriented languages convenient for
defining AST nodes
Outline
• Abstract syntax trees
• Type checking
• Symbol tables
• Using symbol tables for analysis
Using class hierarchy
• Can use subclassing to solve problem
- write abstract class for each “interesting” non-terminal in grammar
- write non-abstract subclass for (almost) every production
E → E + E | E * E | - E | ( E )
abstract class Expr { … } // E
class Add extends Expr { Expr left, right; … }
class Mult extends Expr { Expr left, right; … }
// or: class BinExpr extends Expr { Oper o; Expr l, r; }
class Negate extends Expr { Expr e; … }
Creating the AST
non terminal Expr expr; …
expr ::= expr:e1 PLUS expr:e2
         {: RESULT = new BinaryExpr(plus, e1, e2); :}
       | expr:e1 TIMES expr:e2
         {: RESULT = new BinaryExpr(times, e1, e2); :}
       | MINUS expr:e
         {: RESULT = new UnaryExpr(negate, e); :}
       | LPAREN expr:e RPAREN
         {: RESULT = e; :}
       ;
Classes: BinaryExpr and UnaryExpr extend Expr; plus, times, negate: Oper
“RESULT has type Expr in all semantic actions for expr”
Another Example
expr ::= num | ( expr ) | expr + expr | id
stmt ::= expr ; | if ( expr ) stmt | if ( expr ) stmt else stmt | id = expr ; | ;
abstract class Expr { … }
class Num extends Expr { Num(int value) … }
class Add extends Expr { Add(Expr e1, Expr e2) … }
class Id extends Expr { Id(String name) … }
abstract class Stmt { … }
class If extends Stmt { If(Expr cond, Stmt s1, Stmt s2) … }
class EmptyStmt extends Stmt { EmptyStmt() … }
class Assign extends Stmt { Assign(String id, Expr e) … }
And…top-down
- parse_X method for each non-terminal X
- Return type is abstract class for X
Stmt parseStmt() {
    switch (next_token) {
        case IF:
            consume(IF);
            consume(LPAREN);
            Expr e = parseExpr();
            consume(RPAREN);
            Stmt s2, s1 = parseStmt();
            if (next_token == ELSE) {
                consume(ELSE);
                s2 = parseStmt();
            } else s2 = new EmptyStmt();   // missing else branch becomes an empty statement
            return new If(e, s1, s2);
        case ID: …
    }
}
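The parse_X pattern can be tried out on a small scale. The following is a minimal sketch, not the course's parser: it assumes the input is already split into string tokens, handles only if statements, empty statements, and expression statements, and rewrites expr + expr as a loop to avoid left recursion. MiniParser and ExprStmt are hypothetical names introduced here for illustration.

```java
import java.util.*;

abstract class Expr { }
class Num extends Expr {
    final int value;
    Num(int value) { this.value = value; }
    public String toString() { return Integer.toString(value); }
}
class Id extends Expr {
    final String name;
    Id(String name) { this.name = name; }
    public String toString() { return name; }
}
class Add extends Expr {
    final Expr e1, e2;
    Add(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    public String toString() { return "(" + e1 + " + " + e2 + ")"; }
}

abstract class Stmt { }
class EmptyStmt extends Stmt {
    public String toString() { return ";"; }
}
class ExprStmt extends Stmt {   // hypothetical node for "expr ;"
    final Expr e;
    ExprStmt(Expr e) { this.e = e; }
    public String toString() { return e + ";"; }
}
class If extends Stmt {
    final Expr cond;
    final Stmt s1, s2;
    If(Expr cond, Stmt s1, Stmt s2) { this.cond = cond; this.s1 = s1; this.s2 = s2; }
    public String toString() { return "if (" + cond + ") " + s1 + " else " + s2; }
}

// Hypothetical recursive-descent parser over pre-split tokens.
class MiniParser {
    private final List<String> toks;
    private int pos = 0;

    MiniParser(String... tokens) { toks = Arrays.asList(tokens); }

    private String peek() { return pos < toks.size() ? toks.get(pos) : "<eof>"; }

    private void consume(String expected) {
        if (!peek().equals(expected))
            throw new IllegalStateException("expected " + expected + " but saw " + peek());
        pos++;
    }

    // One method per non-terminal; the return type is the abstract class.
    Stmt parseStmt() {
        if (peek().equals("if")) {
            consume("if");
            consume("(");
            Expr cond = parseExpr();
            consume(")");
            Stmt s1 = parseStmt();
            Stmt s2;
            if (peek().equals("else")) { consume("else"); s2 = parseStmt(); }
            else s2 = new EmptyStmt();
            return new If(cond, s1, s2);
        }
        if (peek().equals(";")) { consume(";"); return new EmptyStmt(); }
        Expr e = parseExpr();
        consume(";");
        return new ExprStmt(e);
    }

    // expr ::= atom ('+' atom)*, building a left-leaning Add tree.
    Expr parseExpr() {
        Expr left = parseAtom();
        while (peek().equals("+")) { consume("+"); left = new Add(left, parseAtom()); }
        return left;
    }

    private Expr parseAtom() {
        String t = peek();
        if (t.equals("(")) { consume("("); Expr e = parseExpr(); consume(")"); return e; }
        pos++;
        if (Character.isDigit(t.charAt(0))) return new Num(Integer.parseInt(t));
        if (Character.isLetter(t.charAt(0))) return new Id(t);
        throw new IllegalStateException("unexpected token " + t);
    }
}
```

For example, new MiniParser("if", "(", "x", ")", "y", "+", "1", ";").parseStmt() builds an If node whose else branch is an EmptyStmt. Assignment statements are left out to keep the sketch short.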
AST
• An Abstract Syntax Tree (AST) is a tree representation of
the program. Used for
- semantic analysis (type checking)
- some optimization (e.g., constant folding)
- intermediate code generation (sometimes
intermediate code = AST with somewhat different
set of nodes)
• Compiler phases = recursive tree traversals
- building new tree or modifying tree in place for
next compiler phase
• Object-oriented and functional languages both
convenient for defining AST nodes
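To make "compiler phase = recursive tree traversal" concrete, here is a minimal sketch of constant folding that builds a new tree rather than modifying the old one in place. The fold method and the exact node classes are assumptions made for this illustration; the slides do not prescribe them.

```java
abstract class Expr {
    // Returns an equivalent tree with constant subexpressions evaluated.
    abstract Expr fold();
}

class Num extends Expr {
    final int value;
    Num(int value) { this.value = value; }
    Expr fold() { return this; }            // already a constant
    public String toString() { return Integer.toString(value); }
}

class Id extends Expr {
    final String name;
    Id(String name) { this.name = name; }
    Expr fold() { return this; }            // value unknown at compile time
    public String toString() { return name; }
}

class Add extends Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }

    Expr fold() {
        // Fold the children first, then this node.
        Expr l = left.fold();
        Expr r = right.fold();
        if (l instanceof Num && r instanceof Num)
            return new Num(((Num) l).value + ((Num) r).value);
        return new Add(l, r);               // new node; the original tree is untouched
    }
    public String toString() { return "(" + left + " + " + right + ")"; }
}
```

Folding (1 + 2) + x this way yields the tree for 3 + x, while the input tree stays available to other phases.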
Goals of Semantic Analysis
• Find all possible remaining errors that
would make program invalid
– undefined variables, types
– type errors that can be caught statically
– uninitialized variables, unreachable code
• Figure out useful information for later
compiler phases
– types of all expressions
– data layout: memory sizes
Recursive semantic checking
• Program is tree, so...
- recursively traverse tree, checking each
component
- traversal routine returns information about node
checked
class Add extends Expr {
    Expr e1, e2;
    Type typeCheck() throws SemanticError {
        Type t1 = e1.typeCheck(), t2 = e2.typeCheck();
        if (t1 == Int && t2 == Int) return Int;
        else throw new TypeCheckError("type error +");
    }
}
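A self-contained version of this check can be run on a tiny tree. This is a sketch under assumptions: Type is modeled as an enum, BoolLit is a hypothetical node added only to provoke an error, and SemanticError is made unchecked to keep the example short.

```java
enum Type { INT, BOOL }

// Assumption: unchecked here for brevity; the slides declare it with "throws".
class SemanticError extends RuntimeException {
    SemanticError(String msg) { super(msg); }
}

abstract class Expr {
    // Each node checks itself and returns its type to the parent.
    abstract Type typeCheck();
}

class Num extends Expr {
    final int value;
    Num(int value) { this.value = value; }
    Type typeCheck() { return Type.INT; }
}

class BoolLit extends Expr {    // hypothetical node, not in the slides
    final boolean value;
    BoolLit(boolean value) { this.value = value; }
    Type typeCheck() { return Type.BOOL; }
}

class Add extends Expr {
    final Expr e1, e2;
    Add(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }

    Type typeCheck() {
        // Recursively check both operands before checking this node.
        Type t1 = e1.typeCheck(), t2 = e2.typeCheck();
        if (t1 == Type.INT && t2 == Type.INT) return Type.INT;
        throw new SemanticError("type error +");
    }
}
```

Checking 1 + 2 returns INT; checking 1 + true throws SemanticError from the Add node.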
Type-checking identifiers
class Id extends Expr {
    String name;
    Type typeCheck() {
        return ?
    }
}
Need an environment that keeps track of types
of all identifiers in scope: symbol table
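One way to answer the question is to pass the environment into typeCheck. The sketch below assumes a plain java.util.Map from names to types stands in for the symbol table, and that an unknown identifier is reported as an unchecked SemanticError; both are choices made for this illustration only.

```java
import java.util.*;

enum Type { INT, BOOL }

class SemanticError extends RuntimeException {
    SemanticError(String msg) { super(msg); }
}

abstract class Expr {
    // env maps each identifier in scope to its declared type.
    abstract Type typeCheck(Map<String, Type> env);
}

class Id extends Expr {
    final String name;
    Id(String name) { this.name = name; }

    Type typeCheck(Map<String, Type> env) {
        Type t = env.get(name);
        if (t == null) throw new SemanticError("undefined variable " + name);
        return t;   // the type recorded when the identifier was declared
    }
}
```

With an environment containing x : INT, new Id("x").typeCheck(env) returns INT, and new Id("y").typeCheck(env) reports an undefined variable.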
Adding entries
• Java, Iota9: statement may declare new variables.
{ a = b; int x = 2; a = a + x }
• Suppose { stmt1; stmt2; stmt3; ... } represented by
AST nodes:
abstract class Stmt { … }
class Block { Vector<Stmt> stmts; … }
• And declarations are a kind of statement:
class Decl extends Stmt {
    String id; TypeExpr typeExpr; ...
}
A stab at adding entries
class Block {
    Vector<Stmt> stmts;
    Type typeCheck(SymTab s) {
        Type t = null;
        for (Stmt st : stmts) {
            t = st.typeCheck(s);
            if (st instanceof Decl) {
                Decl d = (Decl) st;
                s.add(d.id, d.typeExpr.interpret());   // adds to the caller's table
            }
        }
        return t;
    }
}
Does it work?
Restoring Symbol Table
{ int x = 5;
  { int y = 1; }   // scope of y
  x = y;           // should be illegal!
}
Handling declarations
class Block {
    Vector<Stmt> stmts;
    Type typeCheck(SymTab s) {
        Type t = null;
        SymTab s1 = s.clone();   // private copy for this block
        for (int i = 0; i < stmts.size(); i++) {
            Stmt st = stmts.get(i);
            t = st.typeCheck(s1);
            if (st instanceof Decl) {   // only declarations add entries
                Decl d = (Decl) st;
                s1.add(d.id, d.typeExpr.interpret());
            }
        }
        return t;
    }
}
Declarations added in block (to s1) don’t affect code after the block
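The effect of copying the table can be checked on the nested-block example { int x = 5; { int y = 1; } x = y; }. This is a simplified sketch: a HashMap copy plays the role of s.clone(), types are plain strings, initializers are dropped, and the check method and the Assign node (a variable-to-variable assignment) are hypothetical stand-ins for the slides' typeCheck.

```java
import java.util.*;

class SemanticError extends RuntimeException {
    SemanticError(String msg) { super(msg); }
}

abstract class Stmt {
    abstract void check(Map<String, String> s);
}

class Decl extends Stmt {
    final String id, type;
    Decl(String id, String type) { this.id = id; this.type = type; }
    void check(Map<String, String> s) { }   // the enclosing Block records the entry
}

class Assign extends Stmt {                  // hypothetical: target = source
    final String target, source;
    Assign(String target, String source) { this.target = target; this.source = source; }
    void check(Map<String, String> s) {
        for (String name : new String[] { target, source })
            if (!s.containsKey(name)) throw new SemanticError("undefined variable " + name);
    }
}

class Block extends Stmt {
    final List<Stmt> stmts;
    Block(Stmt... stmts) { this.stmts = Arrays.asList(stmts); }

    void check(Map<String, String> s) {
        Map<String, String> s1 = new HashMap<>(s);   // copy, so s itself is never modified
        for (Stmt st : stmts) {
            st.check(s1);
            if (st instanceof Decl) {
                Decl d = (Decl) st;
                s1.put(d.id, d.type);                // visible only inside this block
            }
        }
    }
}
```

Checking new Block(new Decl("x", "int"), new Block(new Decl("y", "int")), new Assign("x", "y")) against an empty table throws SemanticError for y, because y was added only to the inner block's copy.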
Storing Symbol Tables
- Many symbol tables constructed during checking
- May keep track of more than just variables: type definitions, break & continue labels, …
- Top-level symbol table contains global variables, type & module declarations
- Nested scopes result in extended symbol tables containing additional definitions for those scopes.
- Can reconstruct symbol tables, but useful to save in
corresponding AST nodes to avoid recomputation
How to implement Symbol Table?
• Imperative? Three operations:
Object lookup(String name);
void add (String name, Object type);
SymTab clone(); // expensive?
• Functional? Two operations:
Object lookup(String name);
SymTab add (String, Object); // expensive?
Imperative: Linked list of tables
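class SymTab {
    SymTab parent;                                     // enclosing scope, or null at top level
    HashMap<String, Object> table = new HashMap<>();
    Object lookup(String id) {
        Object t = table.get(id);
        if (t != null) return t;
        if (parent == null) return null;               // not found in any enclosing scope
        return parent.lookup(id);                      // can cache..
    }
    void add(String id, Object t) { table.put(id, t); }
    SymTab(SymTab p) { parent = p; }                   // = clone
}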
Functional: Binary trees
• Discussed in Appel Ch. 5
• Implements the two-operation interface
Object lookup(String name);
SymTab add (String, Object);
– non-destructive add so no cloning is needed
– O(lg n) performance: clones only the path
from added node to the root.
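The two-operation interface can be sketched with a persistent binary search tree. This is an illustrative version, not the code from Appel: the tree is unbalanced, so the O(lg n) bound holds only while the tree stays reasonably balanced, and the Node class and EMPTY constant are names assumed here.

```java
final class SymTab {
    // Immutable tree node: fields never change after construction.
    private static final class Node {
        final String key;
        final Object value;
        final Node left, right;
        Node(String key, Object value, Node left, Node right) {
            this.key = key; this.value = value; this.left = left; this.right = right;
        }
    }

    static final SymTab EMPTY = new SymTab(null);

    private final Node root;
    private SymTab(Node root) { this.root = root; }

    Object lookup(String name) {
        Node n = root;
        while (n != null) {
            int c = name.compareTo(n.key);
            if (c == 0) return n.value;
            n = (c < 0) ? n.left : n.right;
        }
        return null;   // not bound
    }

    // Non-destructive add: returns a new table and leaves this one unchanged.
    SymTab add(String name, Object type) {
        return new SymTab(insert(root, name, type));
    }

    // Copies only the nodes on the path from the root to the insertion point;
    // every other subtree is shared between the old and new tables.
    private static Node insert(Node n, String k, Object v) {
        if (n == null) return new Node(k, v, null, null);
        int c = k.compareTo(n.key);
        if (c == 0) return new Node(k, v, n.left, n.right);   // shadow the old binding
        if (c < 0) return new Node(n.key, n.value, insert(n.left, k, v), n.right);
        return new Node(n.key, n.value, n.left, insert(n.right, k, v));
    }
}
```

Entering a block then amounts to calling add on the current table and passing the result down, while the outer table stays valid for the code after the block. A balanced variant would be needed to guarantee the logarithmic bound.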