Implementing Compilers: Symbol Tables and Abstract Syntax Trees, Papers of Computer Science

An overview of implementing symbol tables and abstract syntax trees (ASTs) in compiler design. It covers topics such as array address calculation, parse trees, AST implementation in Java, evaluating synthesized attributes, and implementing symbol tables in C++ and YACC. The document also includes exercises and examples.


Uploaded on 08/18/2009

koofers-user-p4m

cs5363

Intermediate Representation

Abstract syntax trees, control-flow graph, three-address code


Intermediate code

- Intermediate language between source and target
- Multiple machines can be targeted
  - Attach a different backend for each machine
  - Intel, PowerPC, and UltraSparc can all share the same parser for C/C++
- Multiple source languages can be supported
  - Attach a different frontend (parser) for each language
  - E.g., C and C++ can share the same backend
- Allows independent code optimizations
- Multiple levels of intermediate representation
  - Low-level intermediate language: close to the target machine
  - AST, post-fix, three-address code, stack-based code, …

Figure: parser → static checker → intermediate code generator → code generator

Abstraction level in IR

- Source-level IR
  - High-level constructs are readily available for optimization
  - Array access, loops, classes, virtual functions
- Machine-level IR
  - Expose implicit low-level instructions for optimization
  - Array address calculation, goto branches

Source-level tree for A[i, j]: a Subscript node with children A, i, j

ILOC code:

    loadI 1       => r1
    sub   rj, r1  => r2
    loadI 10      => r3
    mult  r2, r3  => r4
    sub   ri, r1  => r5
    add   r4, r5  => r6
    loadI @A      => r7
    add   r7, r6  => r8
    load  r8      => rAij
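The ILOC sequence computes the address @A + (j - 1) * 10 + (i - 1). That formula fits 1-based indices and a column-major layout in which the i dimension has 10 elements. It has no element-size multiply, so each element is treated as one addressable unit. The sketch below reproduces that arithmetic in Java, with each ILOC step noted in a comment. The class name and the parameters `firstDimLen` and `elemSize` are hypothetical. The layout assumptions are inferred from the code and are not stated on the slide.

```java
/**
 * Address of A[i, j] computed the same way as the ILOC sequence.
 * Assumptions (inferred, not stated on the slide): 1-based indices,
 * column-major layout, firstDimLen = extent of the i dimension (10 in the ILOC),
 * elemSize = size of one element (the ILOC has no scaling step, i.e. 1).
 */
final class ArrayAddress {
    private ArrayAddress() {}

    static long address(long base, int i, int j, int firstDimLen, int elemSize) {
        long r2 = j - 1;                 // loadI 1 => r1;  sub rj, r1 => r2
        long r4 = r2 * firstDimLen;      // loadI 10 => r3; mult r2, r3 => r4
        long r5 = i - 1;                 // sub ri, r1 => r5
        long r6 = (r4 + r5) * elemSize;  // add r4, r5 => r6 (scaled; the ILOC assumes 1)
        return base + r6;                // loadI @A => r7; add r7, r6 => r8
    }
}
```

For example, `ArrayAddress.address(1000, 3, 2, 10, 1)` yields 1012. With 4-byte elements the offset is scaled, giving 1048. A machine-level IR makes exactly this hidden multiply-and-add visible to the optimizer.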

Parse trees and abstract syntax trees

- Graphically represent grammatical structure of input program
- Parse tree: tree representation of grammar derivation
- AST: condensed form of parse tree
  - Operators and keywords do not appear as leaves
  - Chains of single productions are collapsed

Figure: parse trees vs. abstract syntax trees. Parse trees: S → IF B THEN S1 ELSE S2; E → E + T, with T → 5. AST for the conditional: an If-then-else node with children B, S1, S2.
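Both condensing rules can be shown on the expression grammar: an operator leaf is promoted to an interior node, and a chain of single productions is collapsed. The sketch below converts a parse tree into an AST under the following assumptions:

- A node with one child is a chain to collapse.
- A three-child node starting with "(" is a parenthesized T ::= (E).
- Any other three-child node is a binary production such as E ::= E + T.

`ParseNode`, `AstNode` and `ParseTreeToAst` are hypothetical names. The sketch handles only this expression grammar, not the if-then-else case.

```java
import java.util.ArrayList;
import java.util.List;

/** Parse-tree node: nonterminals have children; terminals are leaves. */
final class ParseNode {
    final String label;
    final List<ParseNode> kids;
    ParseNode(String label, ParseNode... kids) {
        this.label = label;
        this.kids = List.of(kids);
    }
    boolean isLeaf() { return kids.isEmpty(); }
}

/** AST node: an operator with operands, or a leaf (identifier or number). */
final class AstNode {
    final String label;
    final List<AstNode> kids;
    AstNode(String label, List<AstNode> kids) { this.label = label; this.kids = kids; }
    @Override public String toString() {
        if (kids.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (AstNode k : kids) sb.append(' ').append(k);
        return sb.append(')').toString();
    }
}

final class ParseTreeToAst {
    private ParseTreeToAst() {}

    static AstNode toAst(ParseNode n) {
        if (n.isLeaf()) {
            return new AstNode(n.label, List.of());          // id or num leaf
        }
        if (n.kids.size() == 1) {
            return toAst(n.kids.get(0));                      // collapse E -> T -> id
        }
        if (n.kids.size() == 3 && n.kids.get(0).label.equals("(")) {
            return toAst(n.kids.get(1));                      // drop parentheses: T -> ( E )
        }
        if (n.kids.size() == 3) {
            // Binary production such as E -> E + T: the operator becomes the node.
            List<AstNode> opds = new ArrayList<>();
            opds.add(toAst(n.kids.get(0)));
            opds.add(toAst(n.kids.get(2)));
            return new AstNode(n.kids.get(1).label, opds);
        }
        throw new IllegalArgumentException("unexpected production at " + n.label);
    }
}
```

The parse tree E(E(T(x)), +, T(5)) becomes the AST `(+ x 5)`. The E and T chain nodes disappear, and `+` moves from a leaf to the root.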

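Implementing AST in Java

- Define AST node classes (symbol_table_entry is assumed to be defined by the symbol table module):

    abstract class ASTexpression {
        public abstract String toString();
    }
    class ASTidentifier extends ASTexpression {
        private final symbol_table_entry id_entry;   // symbol table entry for the name
        ASTidentifier(symbol_table_entry e) { id_entry = e; }
        public String toString() { return String.valueOf(id_entry); }
    }
    class ASTvalue extends ASTexpression {
        private final int num_value;
        ASTvalue(int n) { num_value = n; }
        public String toString() { return Integer.toString(num_value); }
    }
    class ASTplus extends ASTexpression {
        private final ASTexpression[] opds;           // left and right operands
        ASTplus(ASTexpression a, ASTexpression b) { opds = new ASTexpression[] { a, b }; }
        public String toString() { return "(" + opds[0] + " + " + opds[1] + ")"; }
    }
    class ASTminus extends ASTexpression {
        private final ASTexpression[] opds;
        ASTminus(ASTexpression a, ASTexpression b) { opds = new ASTexpression[] { a, b }; }
        public String toString() { return "(" + opds[0] + " - " + opds[1] + ")"; }
    }

- Define AST node construction routines (in Java these must be members of some class, e.g., static methods):

    static ASTexpression mkleaf_id(symbol_table_entry e) { return new ASTidentifier(e); }
    static ASTexpression mkleaf_num(int n) { return new ASTvalue(n); }
    static ASTexpression mknode_plus(ASTexpression opd1, ASTexpression opd2) { return new ASTplus(opd1, opd2); }
    static ASTexpression mknode_minus(ASTexpression opd1, ASTexpression opd2) { return new ASTminus(opd1, opd2); }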

Grammar:

    E ::= E + T | E - T | T
    T ::= (E) | id | num

Constructing AST

- Use syntax-directed definitions
  - Associate each non-terminal with an AST
  - E.nptr, T.nptr: a pointer to a node in the AST
- Evaluate synthesized attributes bottom-up
  - From the ASTs of the children, how do we compute the AST of the parent?

    E ::= E1 + T   { E.nptr = mknode_plus(E1.nptr, T.nptr); }
    E ::= E1 - T   { E.nptr = mknode_minus(E1.nptr, T.nptr); }
    E ::= T        { E.nptr = T.nptr; }
    T ::= (E)      { T.nptr = E.nptr; }
    T ::= id       { T.nptr = mkleaf_id(id.entry); }
    T ::= num      { T.nptr = mkleaf_num(num.val); }

Exercise: what is the AST for 5 + (15-b)?

How to add support for assignment and conditionals?

What if top-down parsing is used (need to eliminate left recursion)?
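For the top-down question, one common approach is to rewrite E ::= E + T | E - T | T as E ::= T { (+|-) T }. A loop then folds each new operand onto the tree built so far. The loop reproduces the left-associative AST that the bottom-up definitions above would build. The sketch below is a hand-written recursive-descent parser in Java that applies this technique. It is not the slides' YACC/LR approach. `Expr`, `Id`, `Num`, `BinOp` and `ExprParser` are hypothetical minimal stand-ins for the AST classes. The tokenizer recognizes only identifiers, integers and single-character symbols.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal AST (hypothetical stand-ins for the ASTexpression classes). */
abstract class Expr {}

final class Id extends Expr {
    final String name;
    Id(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

final class Num extends Expr {
    final int value;
    Num(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
}

final class BinOp extends Expr {
    final char op;
    final Expr left, right;
    BinOp(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    @Override public String toString() { return "(" + op + " " + left + " " + right + ")"; }
}

/** Recursive descent for  E ::= T { (+|-) T }   T ::= ( E ) | id | num */
final class ExprParser {
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    ExprParser(String src) {
        int i = 0, n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isDigit(c)) {
                while (i < n && Character.isDigit(src.charAt(i))) i++;          // num
            } else if (Character.isLetter(c)) {
                while (i < n && Character.isLetterOrDigit(src.charAt(i))) i++;  // id
            } else {
                i++;                                                            // + - ( )
            }
            toks.add(src.substring(start, i));
        }
    }

    /** Parses the whole input as one expression. */
    Expr parse() {
        Expr e = parseE();
        if (pos != toks.size()) throw new IllegalArgumentException("trailing input: " + peek());
        return e;
    }

    // The loop replaces the left-recursive E ::= E + T | E - T.
    private Expr parseE() {
        Expr e = parseT();                           // E.nptr = T.nptr
        while (peek().equals("+") || peek().equals("-")) {
            char op = next().charAt(0);
            e = new BinOp(op, e, parseT());          // E.nptr = mknode(op, E1.nptr, T.nptr)
        }
        return e;
    }

    private Expr parseT() {
        String t = next();
        if (t.equals("(")) {
            Expr e = parseE();                       // T.nptr = E.nptr
            if (!next().equals(")")) throw new IllegalArgumentException("expected )");
            return e;
        }
        if (Character.isDigit(t.charAt(0))) return new Num(Integer.parseInt(t));   // mkleaf_num
        if (Character.isLetter(t.charAt(0))) return new Id(t);                      // mkleaf_id
        throw new IllegalArgumentException("unexpected token " + t);
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }

    private String next() {
        if (pos >= toks.size()) throw new IllegalArgumentException("unexpected end of input");
        return toks.get(pos++);
    }
}
```

For instance, `new ExprParser("a - b + 3").parse()` prints as `(+ (- a b) 3)`. Subtraction stays left-associative even though the grammar is no longer left-recursive.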

Symbol tables

- Record information about names defined in programs
  - Types of variables and functions
  - Additional properties (e.g., static, global, scope)
- Contain information about the context of a program fragment
  - Can use different symbol tables for different information
- Name conflicts
  - The same name may represent different things in different places
  - Use separate symbol tables for names in different scopes
  - Multiple layers of symbol definitions for nested scopes
- Implementation of symbol tables
  - Map strings (names) to additional information (types, values, etc.)
  - Efficient implementation: using hash tables

Implementing symbol tables

- Interface
  - Lookup(name): returns the record for name if one exists in the table; otherwise, indicates that name is not found
  - Insert(name, record): stores the information in record in the table for name
- Symbol tables in nested scopes
  - InitializeScope(): increments the current scope level and creates a new symbol table
  - FinalizeScope(): changes the current-level symbol table pointer so that it points to the symbol table of the surrounding scope
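The slides do not fix a data structure for nested scopes. One common choice, assumed here, is a stack of hash tables with one table per scope. Lookup searches from the innermost scope outward. The Java sketch below uses the interface operations above under Java-style names. The generic record type `R` stands in for whatever the compiler stores per name, such as a type or an offset.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Scoped symbol table: a stack of hash tables, innermost scope on top. */
final class ScopedSymbolTable<R> {
    private final Deque<Map<String, R>> scopes = new ArrayDeque<>();

    /** InitializeScope(): enter a new, empty innermost scope. */
    void initializeScope() { scopes.push(new HashMap<>()); }

    /** FinalizeScope(): discard the innermost scope and return to the surrounding one. */
    void finalizeScope() {
        if (scopes.isEmpty()) throw new IllegalStateException("no open scope");
        scopes.pop();
    }

    /** Current scope level; 0 means no scope is open. */
    int level() { return scopes.size(); }

    /** Insert(name, record): bind name in the innermost scope (overwrites a same-scope binding). */
    void insert(String name, R record) {
        if (scopes.isEmpty()) throw new IllegalStateException("no open scope");
        scopes.peek().put(name, record);
    }

    /** Lookup(name): search innermost to outermost; empty if the name is not found. */
    Optional<R> lookup(String name) {
        for (Map<String, R> scope : scopes) {    // ArrayDeque iterates from the top (innermost)
            R r = scope.get(name);
            if (r != null) return Optional.of(r);
        }
        return Optional.empty();
    }
}
```

An inner declaration of `x` shadows an outer one until `finalizeScope()` is called. After that, `lookup("x")` finds the outer binding again. A production compiler would probably report a redeclaration in the same scope rather than overwrite it silently as this sketch does.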

Building symbol tables in YACC

    prog   : {InitializeScope();} decls stmts
    decls  : decls decl
           |
           ;
    decl   : type {idSeq.type = type.AST;} idSeq ';'
    type   : INT   { type.AST = make_atomicType(TYPE_INT); }
           | FLOAT { type.AST = make_atomicType(TYPE_FLOAT); }
    idSeq  : {idSeq1.type = idSeq.type;} idSeq1 ',' {idDecl.type = idSeq.type;} idDecl
           | {idDecl.type = idSeq.type;} idDecl
    idDecl : ID {add_entry_type(ID.entry, idDecl.type);}

- decls: declare semantic information of names
- Call InitializeScope to start a new nested scope
- Call FinalizeScope to end a local scope (revert to the surrounding scope)
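The actions above pass the declared type down to each identifier through the inherited attributes idSeq.type and idDecl.type. In a hand-written parser, an inherited attribute usually becomes a method parameter. The Java sketch below handles declarations such as `int a, b;` and makes this correspondence explicit. `DeclParser` is a hypothetical name. The token list is assumed to be pre-split. A plain map stands in for the real symbol table and `add_entry_type`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Parses  decls : decls decl | ;   decl : type idSeq ';'  passing the type downward. */
final class DeclParser {
    private final List<String> toks;
    private int pos = 0;
    /** Stand-in symbol table: name -> type name (assumption of this sketch). */
    final Map<String, String> table = new LinkedHashMap<>();

    DeclParser(List<String> toks) { this.toks = toks; }

    /** decls : decls decl | ;  (loop until the input is exhausted) */
    void decls() {
        while (pos < toks.size()) decl();
    }

    /** decl : type {idSeq.type = type.AST;} idSeq ';' */
    private void decl() {
        String type = type();      // synthesized attribute type.AST
        idSeq(type);               // passed down as the inherited idSeq.type
        expect(";");
    }

    /** type : INT | FLOAT */
    private String type() {
        String t = next();
        if (!t.equals("int") && !t.equals("float"))
            throw new IllegalArgumentException("expected a type, got " + t);
        return t;
    }

    /** idSeq : idSeq ',' idDecl | idDecl  (left recursion replaced by a loop) */
    private void idSeq(String inheritedType) {
        idDecl(inheritedType);
        while (pos < toks.size() && toks.get(pos).equals(",")) {
            pos++;
            idDecl(inheritedType); // {idDecl.type = idSeq.type;}
        }
    }

    /** idDecl : ID {add_entry_type(ID.entry, idDecl.type);} */
    private void idDecl(String inheritedType) {
        table.put(next(), inheritedType);
    }

    private void expect(String s) {
        if (!next().equals(s)) throw new IllegalArgumentException("expected " + s);
    }

    private String next() {
        if (pos >= toks.size()) throw new IllegalArgumentException("unexpected end of input");
        return toks.get(pos++);
    }
}
```

Parsing the tokens `int a , b ; float c ;` records `a` and `b` as `int` and `c` as `float`.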

Exercise: how to add support for typename declarations?

How to add support for struct (record) type?

Separate tables for names in a scope

    #include <string>

    class TypeAST { …… };

    class SymbolTable { …… };

    class ASTScope {
        SymbolTable varTable,    // contains information for variable names
                    typeTable;   // contains information for type names
    public:
        TypeAST* lookup_typename(const std::string& name) const
            { return typeTable.lookup_type(name); }
        TypeAST* lookup_var_type(const std::string& name) const
            { return varTable.lookup_type(name); }
        void add_typename(const std::string& name, TypeAST* ast)
            { typeTable.add_type(name, ast); }
        void add_var_type(const std::string& name, TypeAST* ast)
            { varTable.add_type(name, ast); }
        std::string toString() const
            { return ":typenames:\n" + typeTable.toString() + "\n"
                     + ":variables:\n" + varTable.toString(); }
    };
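Here is a Java analog of `ASTScope`. Its two tables are independent, so a lookup in one never sees entries from the other. As a simplifying assumption, types are plain strings rather than `TypeAST` pointers. A missing name returns `null`. `Scope` and its camel-case methods are hypothetical renamings of the C++ members.

```java
import java.util.HashMap;
import java.util.Map;

/** One scope with separate tables for variable names and type names. */
final class Scope {
    private final Map<String, String> varTable = new HashMap<>();   // variable names -> types
    private final Map<String, String> typeTable = new HashMap<>();  // type names -> types

    String lookupTypename(String name) { return typeTable.get(name); }  // null if absent
    String lookupVarType(String name)  { return varTable.get(name); }   // null if absent
    void addTypename(String name, String type) { typeTable.put(name, type); }
    void addVarType(String name, String type)  { varTable.put(name, type); }

    @Override public String toString() {
        return ":typenames:\n" + typeTable + "\n:variables:\n" + varTable;
    }
}
```

Whether a source language actually allows a type name and a variable to share a spelling is a language rule. That rule would have to be checked separately from this data structure.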

Building symbol tables with typenames

    prog  : {InitializeScope();} decls stmts { set_stmts($3); }
    decls : decl decls
          |
          ;
    decl  : type idDecl SEMICOLON
          | TYPEDEF type typenameDecl SEMICOLON {$$ = $3;}
    type  : STRUCT LBRACE {InitializeScope();} decls RBRACE
                { $$ = make_recordType($4); FinalizeScope(); }
          | TYPENAME ID { $$ = make_typename($2); }
          | INT   { $$ = make_atomicType(TYPE_INT); }
          | FLOAT { $$ = make_atomicType(TYPE_FLOAT); }
    idDecl: idDecl COMMA {$$ = $0;} idDecl
    typenameDecl: typenameDecl COMMA {$$ = $0;} typenameDecl
          | ID {$$ = $1;} postType { add_typename($1, $3); }

Linear IR

- Low-level IL before final code generation
  - A linear sequence of low-level instructions
  - Resembles assembly code for an abstract machine
  - Explicit conditional branches and goto jumps
  - Reflects instruction sets of the target machine
- Stack-machine code and three-address code
  - Implemented as a collection (table or list) of tuples

Linear IR for x - 2 * y

Stack-machine code:

    Push 2
    Push y
    Multiply
    Push x
    Subtract

Two-address code:

    MOV 2 => t1
    MOV y => t2
    MULT t2 => t1
    MOV x => t3
    SUB t1 => t3

Three-address code:

    t1 := 2
    t2 := y
    t3 := t1*t2
    t4 := x
    t5 := t4-t3
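Both linear forms can be produced by a post-order walk of the expression tree. The Java sketch below emits stack-machine code and three-address code, giving every leaf and every operator result a fresh temporary as the figure does. It evaluates operands left to right, so for x - 2 * y its instruction order differs from the figure, which computes 2 * y first. The value computed is the same. In this order, `Subtract` is assumed to compute (second-from-top) minus (top). `Node` and `LinearIR` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Expression tree node: a leaf (variable or constant) or a binary operator. */
final class Node {
    final String label;      // operator symbol, variable name, or constant
    final Node left, right;  // both null for leaves
    Node(String label) { this(label, null, null); }
    Node(String label, Node left, Node right) {
        this.label = label; this.left = left; this.right = right;
    }
    boolean isLeaf() { return left == null && right == null; }
}

final class LinearIR {
    private LinearIR() {}

    /** Stack-machine code: post-order walk; each operator pops two values and pushes one. */
    static List<String> stackCode(Node root) {
        List<String> out = new ArrayList<>();
        emitStack(root, out);
        return out;
    }

    private static void emitStack(Node n, List<String> out) {
        if (n.isLeaf()) { out.add("Push " + n.label); return; }
        emitStack(n.left, out);
        emitStack(n.right, out);
        out.add(opName(n.label));
    }

    private static String opName(String op) {
        switch (op) {
            case "+": return "Add";
            case "-": return "Subtract";
            case "*": return "Multiply";
            default: throw new IllegalArgumentException("unknown operator " + op);
        }
    }

    /** Three-address code: every leaf and every operator result gets a fresh temporary. */
    static List<String> threeAddressCode(Node root) {
        List<String> out = new ArrayList<>();
        emitTac(root, out, new int[] {0});   // counter[0] = last temporary number used
        return out;
    }

    /** Emits code for n and returns the temporary holding its value. */
    private static String emitTac(Node n, List<String> out, int[] counter) {
        if (n.isLeaf()) {
            String t = "t" + (++counter[0]);
            out.add(t + " := " + n.label);
            return t;
        }
        String l = emitTac(n.left, out, counter);
        String r = emitTac(n.right, out, counter);
        String t = "t" + (++counter[0]);
        out.add(t + " := " + l + n.label + r);
        return t;
    }
}
```

For the tree of x - 2 * y, `stackCode` gives Push x, Push 2, Push y, Multiply, Subtract. `threeAddressCode` gives t1 := x, t2 := 2, t3 := y, t4 := t2*t3, t5 := t1-t4.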

Three-address code

- Every instruction manipulates at most two operands and one result. Typical forms include:
  - Arithmetic operations: x := y op z | x := op y
  - Data movement: x := y[z] | x[z] := y | x := y
  - Control flow: if y op z goto x | goto x
  - Function call: param x | return y | call foo
- Each instruction maps to at most a few machine instructions
- Additional constraints depend on target machine instructions
  - E.g., for x := y op z and x := op y, all operands must be in registers
  - All operands must be temporaries?
- Reasonably compact, while allowing reuse of names and values

    t1 := 2
    t2 := y
    t3 := t1*t2
    t4 := x
    t5 := t4-t3

Three-address code for x - 2 * y


Storing three-address code

Three-address code:

    t1 := -c
    t2 := b * t1
    t3 := -c
    t4 := b * t3
    t5 := t2 + t4
    a := t5

- Store all instructions in a quadruple table
  - Every instruction has four fields: op, arg1, arg2, result
  - The label of an instruction is its index in the table

Quadruple entries:

    #     op       arg1   arg2   result
    (0)   Uminus   c             t1
    (1)   Mult     b      t1     t2
    (2)   Uminus   c             t3
    (3)   Mult     b      t3     t4
    (4)   Plus     t2     t4     t5
    (5)   Assign   t5            a

Alternative: store all the instructions in a singly/doubly linked list

What is the tradeoff?
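A quadruple table can be sketched in Java as a growable array of four-field records in which an instruction's label is its index. The sketch assumes `null` marks an unused `arg2`, as in the unary and copy rows of the table. `Quad` and `QuadTable` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** One three-address instruction stored as a quadruple (op, arg1, arg2, result). */
final class Quad {
    final String op, arg1, arg2, result;   // arg2 is null for unary ops and copies
    Quad(String op, String arg1, String arg2, String result) {
        this.op = op; this.arg1 = arg1; this.arg2 = arg2; this.result = result;
    }
    @Override public String toString() {
        return op + " " + arg1 + (arg2 == null ? "" : " " + arg2) + " " + result;
    }
}

/** Quadruple table: the label of an instruction is its index in the list. */
final class QuadTable {
    private final List<Quad> quads = new ArrayList<>();

    /** Appends an instruction and returns its label (index). */
    int emit(String op, String arg1, String arg2, String result) {
        quads.add(new Quad(op, arg1, arg2, result));
        return quads.size() - 1;
    }

    Quad get(int label) { return quads.get(label); }
    int size() { return quads.size(); }
}
```

Emitting the six rows above in order returns labels 0 through 5. With this array-based layout, appending and jumping to a label are cheap. Inserting or deleting an instruction in the middle, however, shifts every later index, and therefore every label that refers to one.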