Intermediate Representation
Abstract syntax trees, control-flow graph, three-address code
Intermediate code
Intermediate language between source and target
Multiple machines can be targeted
Attaching a different backend for each machine
Intel, PowerPC, UltraSparc can all share the same parser for C/C++
Multiple source languages can be supported
Attaching a different frontend (parser) for each language
E.g., C and C++ can share the same backend
Allow independent code optimizations
Multiple levels of intermediate representation
Low-level intermediate language: close to target machine
AST, post-fix, three-address code, stack-based code, …
[Figure: parser -> static checker -> intermediate code generator -> code generator]
Abstraction level in IR
Source-level IR
High-level constructs are readily available for optimization
Array access, loops, classes, virtual functions
Machine-level IR
Expose implicit low-level instructions for optimization
Array address calculation, goto branches
Source-level tree: Subscript with children A, i, j
ILOC code:
loadI 1 => r1
sub rj, r1 => r2
loadI 10 => r3
mult r2, r3 => r4
sub ri, r1 => r5
add r4, r5 => r6
loadI @A => r7
add r7, r6 => r8
load r8 => rAij
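The address arithmetic that the ILOC code spells out can be sketched in Java, one statement per ILOC instruction. This is only an illustration: the class and method names are hypothetical, and it assumes what the code above appears to assume, namely index lower bounds of 1, a scale factor of 10 applied to j - 1, and elements that each occupy one address unit.

```java
// Hypothetical helper mirroring the ILOC sequence for the subscript A[i, j].
// Assumptions: both lower bounds are 1, (j - 1) is scaled by 10,
// and every element occupies exactly one address unit.
class ArrayAddress {
    static int addressOf(int baseOfA, int i, int j) {
        int r1 = 1;          // loadI 1     => r1
        int r2 = j - r1;     // sub rj, r1  => r2
        int r3 = 10;         // loadI 10    => r3
        int r4 = r2 * r3;    // mult r2, r3 => r4
        int r5 = i - r1;     // sub ri, r1  => r5
        int r6 = r4 + r5;    // add r4, r5  => r6
        int r7 = baseOfA;    // loadI @A    => r7
        return r7 + r6;      // add r7, r6  => r8, the address that is loaded
    }

    public static void main(String[] args) {
        System.out.println(addressOf(1000, 1, 1)); // prints 1000 (first element)
        System.out.println(addressOf(1000, 3, 2)); // prints 1012
    }
}
```

None of these steps is visible in the source-level Subscript node, which is why a machine-level IR exposes more opportunities for low-level optimization.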
Parse trees and abstract syntax trees
Graphically represent grammatical structure of input program
Parse tree: tree representation of grammar derivation
AST: condensed form of parse tree
Operators and keywords do not appear as leaves
Chains of single productions are collapsed
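[Figure: parse trees vs. abstract syntax trees. The parse tree for S -> IF B THEN S1 ELSE S2 is shown next to the AST node If-then-else with children B, S1, S2; a second parse tree shows E -> E + T with T -> 5.]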
Implementing AST in Java
Grammar:
E ::= E + T | E – T | T
T ::= (E) | id | num
Define AST node classes:
abstract class ASTexpression {
    public abstract String toString();
}
class ASTidentifier extends ASTexpression {
    private symbol_table_entry id_entry;
    …
}
class ASTvalue extends ASTexpression {
    private int num_value;
    …
}
class ASTplus extends ASTexpression {
    private ASTexpression[] opds = new ASTexpression[2];  // left and right operands
    …
}
class ASTminus extends ASTexpression {
    private ASTexpression[] opds = new ASTexpression[2];  // left and right operands
    …
}
Define AST node construction routines:
ASTexpression mkleaf_id(symbol_table_entry e) { return new ASTidentifier(e); }
ASTexpression mkleaf_num(int n) { return new ASTvalue(n); }
ASTexpression mknode_plus(ASTexpression opd1, ASTexpression opd2) { return new ASTplus(opd1, opd2); }
ASTexpression mknode_minus(ASTexpression opd1, ASTexpression opd2) { return new ASTminus(opd1, opd2); }
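A complete, compilable version of these classes might look like the sketch below. It is not the course's reference code: identifiers are stored as plain strings instead of symbol_table_entry records, the constructors and the parenthesized toString() format are assumptions, and the AstBuilder wrapper class is hypothetical.

```java
// Minimal runnable sketch of the AST node classes and construction routines.
// Assumptions: identifiers are plain strings (no symbol_table_entry type),
// and toString() prints a fully parenthesized expression.
abstract class ASTexpression {
    @Override
    public abstract String toString();
}

class ASTidentifier extends ASTexpression {
    private final String name;
    ASTidentifier(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

class ASTvalue extends ASTexpression {
    private final int numValue;
    ASTvalue(int numValue) { this.numValue = numValue; }
    @Override public String toString() { return Integer.toString(numValue); }
}

class ASTplus extends ASTexpression {
    private final ASTexpression[] opds;
    ASTplus(ASTexpression l, ASTexpression r) { opds = new ASTexpression[] { l, r }; }
    @Override public String toString() { return "(" + opds[0] + " + " + opds[1] + ")"; }
}

class ASTminus extends ASTexpression {
    private final ASTexpression[] opds;
    ASTminus(ASTexpression l, ASTexpression r) { opds = new ASTexpression[] { l, r }; }
    @Override public String toString() { return "(" + opds[0] + " - " + opds[1] + ")"; }
}

// Hypothetical holder for the construction routines named on the slide.
class AstBuilder {
    static ASTexpression mkleaf_id(String name) { return new ASTidentifier(name); }
    static ASTexpression mkleaf_num(int n) { return new ASTvalue(n); }
    static ASTexpression mknode_plus(ASTexpression a, ASTexpression b) { return new ASTplus(a, b); }
    static ASTexpression mknode_minus(ASTexpression a, ASTexpression b) { return new ASTminus(a, b); }

    public static void main(String[] args) {
        // a - 4 + c groups as (a - 4) + c under the left-recursive grammar.
        ASTexpression e = mknode_plus(mknode_minus(mkleaf_id("a"), mkleaf_num(4)), mkleaf_id("c"));
        System.out.println(e); // prints ((a - 4) + c)
    }
}
```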
Constructing AST
Use syntax-directed definitions
Associate each non-terminal with an AST
a pointer to a node in AST: E.nptr T.nptr
Evaluate synthesized attribute bottom-up
From AST of children, how to compute AST of the parent?
E ::= E1 + T { E.nptr=mknode_plus(E1.nptr,T.nptr); }
E ::= E1 – T { E.nptr=mknode_minus(E1.nptr,T.nptr); }
E ::= T { E.nptr=T.nptr; }
T ::= (E) {T.nptr=E.nptr; }
T ::= id { T.nptr=mkleaf_id(id.entry); }
T ::= num { T.nptr=mkleaf_num(num.val); }
Exercise: what is the AST for 5 + (15-b)?
How to add support for assignment and conditional?
What if top-down parsing is used
(need to eliminate left-recursion)?
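One possible answer to the top-down question is sketched below. The left-recursive rule for E is rewritten as E ::= T { (+ | -) T }, and the tree built so far is carried along in a loop, so the result is still left-associative. Everything here is an illustrative assumption: the compact Node class stands in for the ASTexpression hierarchy, the scanner is hand-written, and error handling is minimal.

```java
// Sketch of a recursive-descent (top-down) parser for
//   E ::= T { ("+" | "-") T }     (left recursion replaced by iteration)
//   T ::= "(" E ")" | id | num
// Node is a hypothetical stand-in for the ASTexpression classes.
class Node {
    final String label;      // operator, identifier, or number text
    final Node left, right;  // both null for leaves
    Node(String label, Node left, Node right) {
        this.label = label; this.left = left; this.right = right;
    }
    @Override public String toString() {
        return left == null ? label : "(" + left + " " + label + " " + right + ")";
    }
}

class ExprParser {
    private final String src;
    private int pos = 0;

    ExprParser(String src) { this.src = src; }

    static Node parse(String src) {
        ExprParser p = new ExprParser(src);
        Node e = p.parseE();
        if (p.peek() != '\0') throw new IllegalArgumentException("unexpected input at " + p.pos);
        return e;
    }

    // Skips blanks and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // The running tree "left" plays the role of an inherited attribute.
    private Node parseE() {
        Node left = parseT();
        while (peek() == '+' || peek() == '-') {
            String op = String.valueOf(src.charAt(pos++));
            Node right = parseT();
            left = new Node(op, left, right);  // same shape as mknode_plus / mknode_minus
        }
        return left;
    }

    private Node parseT() {
        if (peek() == '(') {
            pos++;
            Node e = parseE();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
            return e;                          // T.nptr = E.nptr
        }
        int start = pos;
        while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("expected operand at " + pos);
        return new Node(src.substring(start, pos), null, null);  // leaf for id or num
    }

    public static void main(String[] args) {
        System.out.println(parse("5 + (15-b)"));  // prints (5 + (15 - b))
        System.out.println(parse("a - b - c"));   // prints ((a - b) - c)
    }
}
```

The parentheses in the input leave no node of their own in the tree; they only decide which subtree becomes the right operand of +.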
Symbol tables
Record information about names defined in programs
Types of variables and functions
Additional properties (e.g., static, global, scope)
Contain information about context of program fragment
Can use different symbol tables for different information
Name conflicts
The same name may represent different things in
different places
Use separate symbol tables for names in different scopes
Multiple layers of symbol definitions for nested scopes
Implementation of symbol tables
Map strings (names) to additional information (types,
values, etc.)
Efficient implementation: using hash tables
Implementing symbol tables
Interface
Lookup(name)
Returns the record for name if one exists in the table;
otherwise, indicates that name is not found
Insert(name, record)
Stores the information in record in the table for name.
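The Lookup/Insert interface maps directly onto a hash table. The sketch below assumes a deliberately small record type (a type name plus a static flag) and signals "not found" by returning null; both choices, and the class names, are illustrative rather than taken from the course code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record: just enough fields to illustrate the interface.
class SymbolRecord {
    final String type;
    final boolean isStatic;
    SymbolRecord(String type, boolean isStatic) {
        this.type = type;
        this.isStatic = isStatic;
    }
}

class SymbolTable {
    private final Map<String, SymbolRecord> entries = new HashMap<>();

    // Lookup(name): returns the record for name, or null if name is not found.
    SymbolRecord lookup(String name) {
        return entries.get(name);
    }

    // Insert(name, record): stores record for name, replacing any earlier entry.
    void insert(String name, SymbolRecord record) {
        entries.put(name, record);
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        table.insert("count", new SymbolRecord("int", false));
        System.out.println(table.lookup("count").type);     // prints int
        System.out.println(table.lookup("missing") == null); // prints true
    }
}
```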
Symbol tables in nested scopes
InitializeScope()
Increments the current scope level and creates a new symbol table
FinalizeScope()
Changes the current-level symbol table pointer so that it
points to the symbol table of surrounding scope
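A stack of hash tables is one simple way to realize these two operations. In the sketch below, records are simplified to type strings, lookup searches from the innermost scope outward, and FinalizeScope discards the inner table; a compiler that still needs the table later would keep a reference to it instead. The class and method names are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of nested scopes as a stack of tables (innermost scope at the head).
// Assumption: a symbol's record is reduced to its type name.
class ScopedSymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    // InitializeScope(): the scope level grows by one and gets a fresh table.
    void initializeScope() { scopes.push(new HashMap<>()); }

    // FinalizeScope(): the surrounding scope becomes the current one again.
    void finalizeScope() { scopes.pop(); }

    int currentLevel() { return scopes.size(); }

    // Inserts into the current (innermost) scope; a scope must be open.
    void insert(String name, String type) { scopes.peek().put(name, type); }

    // Searches the innermost scope first, then each surrounding scope.
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) return type;
        }
        return null;  // name is not declared in any open scope
    }

    public static void main(String[] args) {
        ScopedSymbolTable st = new ScopedSymbolTable();
        st.initializeScope();
        st.insert("x", "int");
        st.initializeScope();
        st.insert("x", "float");            // shadows the outer x
        System.out.println(st.lookup("x")); // prints float
        st.finalizeScope();
        System.out.println(st.lookup("x")); // prints int
    }
}
```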
Building symbol tables in YACC
prog : {InitializeScope();} decls stmts
decls : decls decl
      |
      ;
decl : type {idSeq.type = type.AST; } idSeq ';'
type : INT { type.AST = make_atomicType(TYPE_INT); }
     | FLOAT { type.AST = make_atomicType(TYPE_FLOAT); }
idSeq : {idSeq1.type=idSeq.type;} idSeq1 ',' {idDecl.type=idSeq.type;} idDecl
      | {idDecl.type=idSeq.type; } idDecl
idDecl : ID {add_entry_type(ID.entry, idDecl.type);}
decls: declare semantic information of names
Call InitializeScope to start a new nested scope
Call FinalizeScope to end a local scope (revert to the surrounding scope)
Exercise: how to add support for typename declarations?
How to add support for struct (record) type?
Separate tables for names in a scope
class TypeAST { ……};
class SymbolTable { ……};
class ASTScope {
  SymbolTable varTable;   // contains information for variable names
  SymbolTable typeTable;  // contains information for type names
 public:
  TypeAST* lookup_typename(const std::string& name) const
    { return typeTable.lookup_type(name); }
  TypeAST* lookup_var_type(const std::string& name) const
    { return varTable.lookup_type(name); }
  void add_typename(const std::string& name, TypeAST* ast)
    { typeTable.add_type(name, ast); }
  void add_var_type(const std::string& name, TypeAST* ast)
    { varTable.add_type(name, ast); }
  std::string toString() const
    { return ":typenames:\n" + typeTable.toString() + "\n"
           + ":variables:\n" + varTable.toString(); }
};
Building symbol tables with typenames
prog : {InitializeScope();} decls stmts { set_stmts($3); }
decls : decl decls
      |
      ;
decl : type idDecl SEMICOLON
     | TYPEDEF type typenameDecl SEMICOLON {$$ = $3; }
type : STRUCT LBRACE {InitializeScope();} decls RBRACE { $$ = make_recordType($4); FinalizeScope(); }
     | TYPENAME ID { $$ = make_typename($2); }
     | INT { $$ = make_atomicType(TYPE_INT); }
     | FLOAT { $$ = make_atomicType(TYPE_FLOAT); }
idDecl : idDecl COMMA {$$=$0; } idDecl
typenameDecl : typenameDecl COMMA {$$ = $0; } typenameDecl
             | ID {$$ = $1; } postType { add_typename($1,$3); }
Linear IR
Low level IL before final code generation
A linear sequence of low-level instructions
Resemble assembly code for an abstract machine
Explicit conditional branches and goto jumps
Reflect instruction sets of the target machine
Stack-machine code and three-address code
Implemented as a collection (table or list) of tuples
Linear IR for x – 2 * y
Stack-machine code:
Push 2
Push y
Multiply
Push x
Subtract
Two-address code:
MOV 2 => t1
MOV y => t2
MULT t2 => t1
MOV x => t3
SUB t1 => t3
Three-address code:
t1 := 2
t2 := y
t3 := t1*t2
t4 := x
t5 := t4-t3
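To see how the stack-machine version evaluates x - 2 * y, the sketch below interprets the five instructions. It rests on one assumption worth stating loudly: Subtract computes (top of stack) minus (the entry beneath it), which is the convention under which pushing x last yields x - 2 * y. The class, the instruction spelling, and the variable bindings are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Tiny interpreter for the stack-machine code shown above.
// Assumption: Subtract computes (top) - (next), so "Push x" must come last.
class StackMachine {
    static int run(String[] code, Map<String, Integer> vars) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String instr : code) {
            String[] parts = instr.trim().split("\\s+");
            switch (parts[0].toLowerCase()) {
                case "push": {
                    String arg = parts[1];
                    // A known variable name pushes its value; anything else is a constant.
                    stack.push(vars.containsKey(arg) ? vars.get(arg) : Integer.parseInt(arg));
                    break;
                }
                case "multiply": {
                    int top = stack.pop(), next = stack.pop();
                    stack.push(top * next);
                    break;
                }
                case "subtract": {
                    int top = stack.pop(), next = stack.pop();
                    stack.push(top - next);
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown instruction: " + instr);
            }
        }
        return stack.pop();
    }

    // Evaluates the slide's sequence with the sample bindings x = 10, y = 3.
    static int demo() {
        String[] code = { "Push 2", "Push y", "Multiply", "Push x", "Subtract" };
        return run(code, Map.of("x", 10, "y", 3));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 4, i.e. 10 - 2 * 3
    }
}
```

Operands never appear by name in Multiply or Subtract; they are implicit in the stack, which is what keeps stack-machine code compact.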
Three address code
Every instruction manipulates at most two operands and one
result. Typical forms include
Arithmetic operations: x := y op z | x := op y
Data movement: x := y [ z ] | x[z] := y | x := y
Control flow: if y op z goto x | goto x
Function call: param x | return y | call foo
Each instruction maps to at most a few machine instructions
Additional constraints depend on target machine instructions
E.g., for x := y op z and x := op y:
must all operands be in registers?
must all operands be temporaries?
Reasonably compact, while allowing reuse of names and values
Three-address code for x – 2 * y
t1 := 2
t2 := y
t3 := t1*t2
t4 := x
t5 := t4-t3
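Code of this shape can be produced by a post-order walk over the expression tree, allocating a fresh temporary for every node. The sketch below is one such generator, under stated assumptions: the Expr class is a hypothetical stand-in for an AST, each leaf is first copied into its own temporary (as in the example above), and the left operand is always visited first, so the instruction order and temporary numbering differ from the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expression tree: leaves carry a name or constant, inner nodes an operator.
class Expr {
    final String text;       // operator symbol, variable name, or constant
    final Expr left, right;  // both null for a leaf
    Expr(String text, Expr left, Expr right) {
        this.text = text; this.left = left; this.right = right;
    }
    static Expr leaf(String text) { return new Expr(text, null, null); }
}

class TacGenerator {
    final List<String> code = new ArrayList<>();
    private int tempCount = 0;

    // Emits code for e and returns the temporary that holds its value.
    String gen(Expr e) {
        String result;
        if (e.left == null) {
            result = "t" + (++tempCount);
            code.add(result + " := " + e.text);              // x := y
        } else {
            String l = gen(e.left);                            // left operand first
            String r = gen(e.right);
            result = "t" + (++tempCount);
            code.add(result + " := " + l + " " + e.text + " " + r);  // x := y op z
        }
        return result;
    }

    // Generates code for x - 2 * y.
    static List<String> demo() {
        Expr tree = new Expr("-", Expr.leaf("x"),
                             new Expr("*", Expr.leaf("2"), Expr.leaf("y")));
        TacGenerator g = new TacGenerator();
        g.gen(tree);
        return g.code;
    }

    public static void main(String[] args) {
        for (String instr : demo()) System.out.println(instr);
        // prints:
        // t1 := x
        // t2 := 2
        // t3 := y
        // t4 := t2 * t3
        // t5 := t1 - t4
    }
}
```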
Storing three-address code
Store all instructions in a quadruple table
Every instruction has four fields: op, arg1, arg2, result
The label of an instruction is its index in the table
Three-address code:
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5
Quadruple entries:
      op      arg1  arg2  result
(0)   Uminus  c           t1
(1)   Mult    b     t1    t2
(2)   Uminus  c           t3
(3)   Mult    b     t3    t4
(4)   Plus    t2    t4    t5
(5)   Assign  t5          a
Alternative: store all the instructions in a singly/doubly linked list
What is the tradeoff?
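A quadruple table can be sketched as an array-backed list in which the index returned by emit serves as the instruction's label. The Quad and QuadTable classes below are hypothetical, and a null arg2 is an assumed convention for unary operators and copies.

```java
import java.util.ArrayList;
import java.util.List;

// One table row: op, arg1, arg2, result (arg2 is null when unused).
class Quad {
    final String op, arg1, arg2, result;
    Quad(String op, String arg1, String arg2, String result) {
        this.op = op; this.arg1 = arg1; this.arg2 = arg2; this.result = result;
    }
    @Override public String toString() {
        return op + " " + arg1 + (arg2 == null ? "" : " " + arg2) + " " + result;
    }
}

class QuadTable {
    private final List<Quad> quads = new ArrayList<>();

    // Appends an instruction and returns its index, which doubles as its label.
    int emit(String op, String arg1, String arg2, String result) {
        quads.add(new Quad(op, arg1, arg2, result));
        return quads.size() - 1;
    }

    Quad get(int label) { return quads.get(label); }

    int size() { return quads.size(); }

    // Builds the table for a := b * -c + b * -c.
    static QuadTable demo() {
        QuadTable t = new QuadTable();
        t.emit("Uminus", "c", null, "t1");
        t.emit("Mult", "b", "t1", "t2");
        t.emit("Uminus", "c", null, "t3");
        t.emit("Mult", "b", "t3", "t4");
        t.emit("Plus", "t2", "t4", "t5");
        t.emit("Assign", "t5", null, "a");
        return t;
    }

    public static void main(String[] args) {
        QuadTable t = demo();
        for (int label = 0; label < t.size(); label++) {
            System.out.println("(" + label + ") " + t.get(label));
        }
    }
}
```

One way to frame the tradeoff: the table gives constant-time access by label, but inserting or deleting an instruction in the middle shifts every later entry and changes its label. A linked list makes such edits cheap once the position is known, at the cost of losing direct indexing.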