Compiler 'main program', Thread of Control, Semantic Analysis: Schemes and Mind Maps of Compiler Design

Cornell University. Lecture 8: AST construction and semantic analysis. 12 Sep 11. CS 4120 Introduction to Compilers. 2. Compiler 'main program'.

CS 4120
Introduction to Compilers
Andrew Myers
Cornell University
Lecture 8: AST construction and semantic analysis
12 Sep 11

Compiler ‘main program’

    class Compiler {
        void compile() throws CompileError {
            Lexer l = new Lexer(input);
            Parser p = new Parser(l);
            AST tree = p.parse();   // calls l.getToken() to read tokens
            if (typeCheck(tree)) {
                IR ir = genIntermediateCode(tree);
                ir.emitCode();
            }
        }
    }

Thread of Control

Call chain: Compiler.main → Parser.parse → Lexer.getToken → InputStream.read
Data passed back up: bytes/chars → tokens → AST
Diagram note: easier to make re-entrant

Semantic Analysis

Source code → lexical analysis → tokens → parsing → abstract syntax tree → semantic analysis → valid programs: decorated AST

Errors reported by each phase: lexical errors (lexical analysis), syntax errors (parsing), semantic errors (semantic analysis)

Do we need an AST?

• Old-style compilers: semantic actions generate code during parsing!
• Especially for stack machine
  [Diagram labels: input, stack, parser, code]

    expr ::= expr PLUS expr {: emitCode(add); :}

Problems:

• hard to maintain
• limits language features (e.g., recursion)
• bad code!

AST

• An Abstract Syntax Tree (AST) is a tree representation of the program. Used for
  – semantic analysis (type checking)
  – some optimization (e.g. constant folding)
  – intermediate code generation (sometimes intermediate code = AST with somewhat different set of nodes)
• Compiler phases = recursive tree traversals
• Object-oriented languages convenient for defining AST nodes

Outline

• Abstract syntax trees

• Type checking

• Symbol tables

• Using symbol tables for analysis


Using class hierarchy

• Can use subclassing to solve problem
  – write abstract class for each “interesting” non-terminal in grammar
  – write non-abstract subclass for (almost) every production

    E → E + E | E * E | - E | ( E )

    abstract class Expr { … }   // E
    class Add extends Expr { Expr left, right; … }
    class Mult extends Expr { Expr left, right; … }
    // or: class BinExpr extends Expr { Oper o; Expr l, r; }
    class Negate extends Expr { Expr e; … }
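
The hierarchy above can be made concrete with a small runnable sketch. It assumes the BinExpr variant; the Num leaf, the Oper values PLUS and TIMES, and the show() method are illustrative additions that do not appear on the slide.

```java
// Hypothetical sketch: one abstract class per non-terminal,
// one concrete subclass per (interesting) production.
enum Oper { PLUS, TIMES }

abstract class Expr {
    abstract String show();   // illustrative helper, not from the slides
}

class Num extends Expr {      // assumed leaf node so that trees can bottom out
    final int value;
    Num(int value) { this.value = value; }
    String show() { return Integer.toString(value); }
}

class BinExpr extends Expr {  // covers E + E and E * E
    final Oper o;
    final Expr l, r;
    BinExpr(Oper o, Expr l, Expr r) { this.o = o; this.l = l; this.r = r; }
    String show() {
        String op = (o == Oper.PLUS) ? " + " : " * ";
        return "(" + l.show() + op + r.show() + ")";
    }
}

class Negate extends Expr {   // covers - E
    final Expr e;
    Negate(Expr e) { this.e = e; }
    String show() { return "(-" + e.show() + ")"; }
}
```

The production ( E ) needs no class of its own: parentheses only affect the shape of the tree, so they disappear in the abstract syntax.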

Creating the AST

    non terminal Expr expr; …

    expr ::= expr:e1 PLUS expr:e2
             {: RESULT = new BinaryExpr(plus, e1, e2); :}
           | expr:e1 TIMES expr:e2
             {: RESULT = new BinaryExpr(times, e1, e2); :}
           | MINUS expr:e
             {: RESULT = new UnaryExpr(negate, e); :}
           | LPAREN expr:e RPAREN
             {: RESULT = e; :}

• “RESULT has type Expr in all semantic actions for expr”
• plus, times, negate: Oper
• BinaryExpr and UnaryExpr are subclasses of Expr

Another Example

    expr ::= num | ( expr ) | expr + expr | id
    stmt ::= expr ; | if ( expr ) stmt | if ( expr ) stmt else stmt | id = expr ; | ;

    abstract class Expr { … }
    class Num extends Expr { Num(int value) … }
    class Add extends Expr { Add(Expr e1, Expr e2) … }
    class Id extends Expr { Id(String name) … }
    abstract class Stmt { … }
    class If extends Stmt { If(Expr cond, Stmt s1, Stmt s2) … }
    class EmptyStmt extends Stmt { EmptyStmt() … }
    class Assign extends Stmt { Assign(String id, Expr e) … }

And…top-down

• parse_X method for each non-terminal X
• Return type is abstract class for X

    Stmt parseStmt() {
        switch (next_token) {
        case IF: {
            consume(IF);
            consume(LPAREN);
            Expr e = parseExpr();
            consume(RPAREN);
            Stmt s2, s1 = parseStmt();
            if (next_token == ELSE) {
                consume(ELSE);
                s2 = parseStmt();
            } else {
                s2 = new EmptyStmt();
            }
            return new If(e, s1, s2);
        }
        case ID: …
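
The same pattern can be shown end to end on a smaller grammar. This is a sketch under stated assumptions, not the course's parser: the token list stands in for a Lexer, the class name ExprParser and the show() helper are hypothetical, and the left-recursive rule expr + expr is rewritten as a loop, since a top-down parser cannot call itself on the leftmost symbol without looping forever.

```java
import java.util.List;

// Hypothetical AST nodes for the sketch; show() exists only for printing.
abstract class Expr { abstract String show(); }

class Num extends Expr {
    final int value;
    Num(int value) { this.value = value; }
    String show() { return Integer.toString(value); }
}

class Add extends Expr {
    final Expr e1, e2;
    Add(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    String show() { return "(" + e1.show() + " + " + e2.show() + ")"; }
}

class ExprParser {
    private final List<String> tokens;   // pre-split tokens stand in for a Lexer
    private int pos = 0;

    ExprParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<eof>";
    }

    private void consume(String expected) {
        if (!peek().equals(expected))
            throw new IllegalStateException("expected " + expected + ", saw " + peek());
        pos++;
    }

    // expr ::= atom ( "+" atom )*    (left recursion rewritten as a loop)
    Expr parseExpr() {
        Expr left = parseAtom();
        while (peek().equals("+")) {
            consume("+");
            left = new Add(left, parseAtom());   // builds a left-leaning tree
        }
        return left;
    }

    // atom ::= num | "(" expr ")"
    Expr parseAtom() {
        if (peek().equals("(")) {
            consume("(");
            Expr e = parseExpr();
            consume(")");
            return e;                            // parentheses leave no AST node
        }
        String t = peek();
        consume(t);
        return new Num(Integer.parseInt(t));     // non-numeric tokens raise NumberFormatException
    }
}
```

Each parse method returns the abstract class for its non-terminal, so callers never need to know which production was chosen.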

AST

• An Abstract Syntax Tree (AST) is a tree representation of the program. Used for
  – semantic analysis (type checking)
  – some optimization (e.g. constant folding)
  – intermediate code generation (sometimes intermediate code = AST with somewhat different set of nodes)
• Compiler phases = recursive tree traversals
  – building new tree or modifying tree in place for next compiler phase
• Object-oriented and functional languages both convenient for defining AST nodes
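
Constant folding is a compact example of a phase written as a recursive traversal that builds a new tree. The sketch below is illustrative only: the fold() method name and the three node classes are assumptions chosen to match the expression grammar used elsewhere in these slides.

```java
// Hypothetical sketch of a compiler phase as a recursive tree traversal.
abstract class Expr {
    abstract Expr fold();     // returns a new, possibly simpler, tree
}

class Num extends Expr {
    final int value;
    Num(int value) { this.value = value; }
    Expr fold() { return this; }          // constants are already folded
}

class Id extends Expr {
    final String name;
    Id(String name) { this.name = name; }
    Expr fold() { return this; }          // value unknown at compile time
}

class Add extends Expr {
    final Expr e1, e2;
    Add(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    Expr fold() {
        Expr a = e1.fold(), b = e2.fold();   // fold the children first
        if (a instanceof Num && b instanceof Num)
            return new Num(((Num) a).value + ((Num) b).value);
        return new Add(a, b);                // rebuild with folded children
    }
}
```

Because fold() returns a fresh tree instead of mutating nodes, the input AST stays available to other phases; modifying in place would save allocation at the cost of that guarantee.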

Goals of Semantic Analysis

• Find all possible remaining errors that would make program invalid
  – undefined variables, types
  – type errors that can be caught statically
  – uninitialized variables, unreachable code
• Figure out useful information for later compiler phases
  – types of all expressions
  – data layout: memory sizes

Recursive semantic checking

• Program is tree, so...
  – recursively traverse tree, checking each component
  – traversal routine returns information about node checked

    class Add extends Expr {
        Expr e1, e2;
        Type typeCheck() throws SemanticError {
            Type t1 = e1.typeCheck(), t2 = e2.typeCheck();
            if (t1 == Int && t2 == Int) return Int;
            else throw new TypeCheckError("type error +");
        }
    }

Type-checking identifiers

    class Id extends Expr {
        String name;
        Type typeCheck() {
            return ?
        }
    }

Need an environment that keeps track of types of all identifiers in scope: symbol table
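
One way to fill in the missing return is to pass the environment into typeCheck. The sketch below is an assumption-laden illustration, not the course's design: a plain Map stands in for the symbol table, Type is reduced to a two-value enum, and the Num node is added so the example has a leaf.

```java
import java.util.Map;

enum Type { INT, BOOL }   // hypothetical stand-in for the slides' Type

class SemanticError extends Exception {
    SemanticError(String msg) { super(msg); }
}

abstract class Expr {
    // env maps each identifier in scope to its type (a simple symbol table)
    abstract Type typeCheck(Map<String, Type> env) throws SemanticError;
}

class Num extends Expr {
    final int value;
    Num(int value) { this.value = value; }
    Type typeCheck(Map<String, Type> env) { return Type.INT; }
}

class Id extends Expr {
    final String name;
    Id(String name) { this.name = name; }
    Type typeCheck(Map<String, Type> env) throws SemanticError {
        Type t = env.get(name);           // look the identifier up in scope
        if (t == null) throw new SemanticError("undefined variable " + name);
        return t;
    }
}

class Add extends Expr {
    final Expr e1, e2;
    Add(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
    Type typeCheck(Map<String, Type> env) throws SemanticError {
        Type t1 = e1.typeCheck(env), t2 = e2.typeCheck(env);
        if (t1 == Type.INT && t2 == Type.INT) return Type.INT;
        throw new SemanticError("type error +");
    }
}
```

With x bound to INT, checking x + 1 yields INT, while an unbound identifier or a BOOL operand raises SemanticError.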

Adding entries

• Java, Iota9: statement may declare new variables.

    { a = b; int x = 2; a = a + x }

• Suppose { stmt1; stmt2; stmt3 ... } represented by AST nodes:

    abstract class Stmt { … }
    class Block { Vector<Stmt> stmts; … }

• And declarations are a kind of statement:

    class Decl extends Stmt {
        String id; TypeExpr typeExpr; ...
    }

A stab at adding entries

    class Block {
        Vector<Stmt> stmts;
        Type typeCheck(SymTab s) {
            Type t = null;
            for (Stmt st : stmts) {
                t = st.typeCheck(s);
                if (st instanceof Decl) {
                    Decl d = (Decl) st;
                    s.add(d.id, d.typeExpr.interpret());
                }
            }
            return t;
        }
    }

Does it work?

Restoring Symbol Table

    { int x = 5;
      { int y = 1; }   // scope of y
      x = y;           // should be illegal!
    }

Handling declarations

    class Block {
        Vector<Stmt> stmts;
        Type typeCheck(SymTab s) {
            Type t = null;
            SymTab s1 = s.clone();
            for (int i = 0; i < stmts.size(); i++) {
                Stmt st = stmts.get(i);
                t = st.typeCheck(s1);
                if (st instanceof Decl) {
                    Decl d = (Decl) st;
                    s1.add(d.id, d.typeExpr.interpret());
                }
            }
            return t;
        }
    }

Declarations added in block (to s1) don’t affect code after the block
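
The effect of cloning can be checked on the x/y example with a stripped-down sketch. Everything here is a simplification for illustration: a Map copy plays the role of s.clone(), types are plain strings, the Use node and ScopeError are hypothetical, and each Decl registers itself where the slide has Block do it via instanceof.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ScopeError extends RuntimeException {
    ScopeError(String msg) { super(msg); }
}

abstract class Stmt {
    abstract void check(Map<String, String> s);   // s: identifier -> type name
}

class Decl extends Stmt {
    final String id, type;
    Decl(String id, String type) { this.id = id; this.type = type; }
    void check(Map<String, String> s) { s.put(id, type); }   // adds to current scope
}

class Use extends Stmt {       // hypothetical node standing in for any reference to id
    final String id;
    Use(String id) { this.id = id; }
    void check(Map<String, String> s) {
        if (!s.containsKey(id)) throw new ScopeError("undefined variable " + id);
    }
}

class Block extends Stmt {
    final List<Stmt> stmts;
    Block(List<Stmt> stmts) { this.stmts = stmts; }
    void check(Map<String, String> s) {
        Map<String, String> s1 = new HashMap<>(s);   // plays the role of s.clone()
        for (Stmt st : stmts) st.check(s1);          // declarations land in s1 only
    }
}
```

Checking { int x; { int y; } use y } raises ScopeError, because y was added to the inner block's copy and the outer copy never saw it; { int x; { use x } } passes because the inner copy starts with everything visible outside.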

Storing Symbol Tables

• Many symbol tables constructed during checking
  – May keep track of more than just variables: type definitions, break & continue labels, …
  – Top-level symbol table contains global variables, type & module declarations
  – Nested scopes result in extended symbol tables containing additional definitions for those scopes.
• Can reconstruct symbol tables, but useful to save in corresponding AST nodes to avoid recomputation

How to implement Symbol Table?

• Imperative? Three operations:

    Object lookup(String name);
    void add(String name, Object type);
    SymTab clone();   // expensive?

• Functional? Two operations:

    Object lookup(String name);
    SymTab add(String, Object);   // expensive?

Imperative: Linked list of tables

    class SymTab {
        SymTab parent;
        HashMap<String, Object> table = new HashMap<>();
        Object lookup(String id) {
            if (table.get(id) != null) return table.get(id);
            else if (parent != null) return parent.lookup(id);   // can cache..
            else return null;
        }
        void add(String id, Object t) { table.put(id, t); }
        SymTab(SymTab p) { parent = p; }   // =clone
    }
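
A self-contained version of the linked-table design, written as a sketch: the iterative lookup and the private fields are stylistic choices of this example, and a null parent is assumed to mark the top-level scope.

```java
import java.util.HashMap;
import java.util.Map;

class SymTab {
    private final SymTab parent;                       // enclosing scope, null at top level
    private final Map<String, Object> table = new HashMap<>();

    SymTab(SymTab parent) { this.parent = parent; }    // O(1) replacement for clone()

    // Search this scope first, then each enclosing scope in turn.
    Object lookup(String id) {
        for (SymTab s = this; s != null; s = s.parent) {
            Object t = s.table.get(id);
            if (t != null) return t;
        }
        return null;                                   // not declared anywhere
    }

    // Adds to the innermost scope only, so outer scopes are never modified.
    void add(String id, Object t) { table.put(id, t); }
}
```

Creating a child with new SymTab(outer) costs constant time, unlike copying a whole table; the price is that lookup may walk one table per nesting level. An inner add of an existing name shadows the outer entry without overwriting it.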

Functional: Binary trees

• Discussed in Appel Ch. 5
• Implements the two-operation interface

    Object lookup(String name);
    SymTab add(String, Object);

  – non-destructive add so no cloning is needed
  – O(lg n) performance: clones only the path from added node to the root.
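
Path copying can be sketched with a persistent binary search tree. This is an illustration under stated assumptions rather than Appel's implementation: the class name FunTab and the static-method style are hypothetical, null represents the empty table, and the tree is not rebalanced, so the O(lg n) bound holds only while it happens to stay balanced.

```java
// Hypothetical persistent symbol table: add() returns a new table and
// leaves the old one untouched, sharing every subtree off the search path.
final class FunTab {
    final String key;
    final Object val;
    final FunTab left, right;

    private FunTab(String key, Object val, FunTab left, FunTab right) {
        this.key = key; this.val = val; this.left = left; this.right = right;
    }

    // null stands for the empty table.
    static Object lookup(FunTab t, String name) {
        while (t != null) {
            int c = name.compareTo(t.key);
            if (c == 0) return t.val;
            t = (c < 0) ? t.left : t.right;
        }
        return null;
    }

    // Copies only the nodes on the path from the root to the insertion point.
    static FunTab add(FunTab t, String name, Object val) {
        if (t == null) return new FunTab(name, val, null, null);
        int c = name.compareTo(t.key);
        if (c == 0) return new FunTab(name, val, t.left, t.right);   // rebinding
        if (c < 0) return new FunTab(t.key, t.val, add(t.left, name, val), t.right);
        return new FunTab(t.key, t.val, t.left, add(t.right, name, val));
    }
}
```

Leaving a scope needs no cleanup: the checker simply keeps using the table it had before entering the block, which still holds exactly the old bindings.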