Syntactic Analysis
Syntactic analysis, or parsing, is the second phase of compilation: the token file is converted to an abstract syntax tree.
Compiler Passes
Analysis of input program (front-end):
character stream
→ Lexical Analysis → token stream
→ Syntactic Analysis → abstract syntax tree
→ Semantic Analysis → annotated AST
Synthesis of output program (back-end):
annotated AST
→ Intermediate Code Generation → intermediate form
→ Optimization → intermediate form
→ Code Generation → target language
Context-free Grammars
Compromise between
– REs, which can’t nest or specify recursive structure
– General grammars, which are too powerful, undecidable
Context-free grammars are a sweet spot
– Powerful enough to describe nesting, recursion
– Easy to parse; but also allow restrictions for speed
Not perfect
– Cannot capture semantics, as in, “variable must be declared,” requiring a later semantic pass
EBNF, Extended Backus-Naur Form, is a popular notation
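The point about nesting can be made concrete with a small sketch. The grammar S ::= ( S ) S | empty is an illustrative choice, not taken from the slides; it describes arbitrarily deep balanced parentheses, which no regular expression can do. The function name `is_balanced` is hypothetical.

```python
def is_balanced(text):
    """Recognize the context-free language S ::= ( S ) S | empty."""

    def parse_s(pos):
        # Returns the position just past the S that starts at `pos`.
        if pos < len(text) and text[pos] == "(":
            pos = parse_s(pos + 1)              # inner S (recursion = nesting)
            if pos >= len(text) or text[pos] != ")":
                raise ValueError("missing ')'")
            return parse_s(pos + 1)             # trailing S
        return pos                              # empty alternative

    try:
        # The whole input must be consumed by a single S.
        return parse_s(0) == len(text)
    except ValueError:
        return False
```

The recursive call for the inner S is what a regular expression lacks: it lets the recognizer remember how many parentheses are still open.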
CFG Terminology
Terminals -- alphabet of language defined by CFG
Nonterminals -- symbols defined in terms of terminals and nonterminals
Productions -- rules for how a nonterminal (lhs) is defined in terms of a (possibly empty) sequence of terminals and nonterminals
– Multiple productions allowed for a nonterminal, alternatives
Start symbol -- root nonterminal of the language being defined
Program ::= Stmt
Stmt ::= if ( Expr ) then Stmt else Stmt
Stmt ::= while ( Expr ) do Stmt
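As a rough sketch of how this terminology maps onto data, the productions above can be stored as a dictionary from each nonterminal to its list of alternatives. The `Expr ::= id` production is an assumption added only so that `Expr` counts as a nonterminal; the names `GRAMMAR`, `START`, `nonterminals`, and `terminals` are hypothetical.

```python
# Each nonterminal (lhs) maps to a list of alternatives; each alternative
# is a (possibly empty) sequence of terminals and nonterminals.
GRAMMAR = {
    "Program": [["Stmt"]],
    "Stmt": [
        ["if", "(", "Expr", ")", "then", "Stmt", "else", "Stmt"],
        ["while", "(", "Expr", ")", "do", "Stmt"],
    ],
    "Expr": [["id"]],  # ASSUMPTION: placeholder production, not from the slides
}
START = "Program"  # the start symbol


def nonterminals(grammar):
    """Nonterminals are exactly the symbols that have productions."""
    return set(grammar)


def terminals(grammar):
    """Terminals appear on some right-hand side but are never defined."""
    used = {sym for alts in grammar.values() for alt in alts for sym in alt}
    return used - set(grammar)
```

Here `terminals(GRAMMAR)` yields the keywords and punctuation (`if`, `then`, `else`, `while`, `do`, `(`, `)`, `id`), while `Stmt` has two productions, that is, two alternatives.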
Initial miniJava [continued]
Stmt ::= Type ID ;
| { {Stmt} }
| if ( Expr ) Stmt else Stmt
| while ( Expr ) Stmt
| System.out.println ( Expr ) ;
| ID = Expr ;
Expr ::= Expr Op Expr
| ! Expr
| Expr . ID ( [ Expr { , Expr } ] )
| ID | this
| Integer | true | false
| ( Expr )
Op
RE Specification of initial MiniJava Lex
Program ::= (Token | Whitespace)*
Token ::= ID | Integer | ReservedWord | Operator | Delimiter
ID ::= Letter (Letter | Digit)*
Letter ::= a | ... | z | A | ... | Z
Digit ::= 0 | ... | 9
Integer ::= Digit+
ReservedWord ::= class | public | static | extends | void | int | boolean | if | else | while | return | true | false | this | new | String | main | System.out.println
Operator ::=
Delimiter ::=
[
]
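A minimal sketch of a scanner for this specification, using Python's `re` module. The ID, Integer, and ReservedWord rules follow the specification above; the operator and delimiter sets are assumptions chosen for illustration, since the specification's own lists are incomplete here. The names `RESERVED_WORDS`, `TOKEN_RE`, and `tokenize` are hypothetical.

```python
import re

RESERVED_WORDS = {
    "class", "public", "static", "extends", "void", "int", "boolean",
    "if", "else", "while", "return", "true", "false", "this", "new",
    "String", "main", "System.out.println",
}

TOKEN_RE = re.compile(
    r"""
    (?P<Whitespace>\s+)
  | (?P<Println>System\.out\.println\b)   # reserved word that contains dots
  | (?P<Word>[A-Za-z][A-Za-z0-9]*)        # ID ::= Letter (Letter | Digit)*
  | (?P<Integer>[0-9]+)                   # Integer ::= Digit+
  | (?P<Operator>&&|[-+*/<=!])            # ASSUMED operator set
  | (?P<Delimiter>[()\[\]{};,.])          # ASSUMED delimiter set
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "Whitespace":
            continue
        # A word is an ID unless it is listed as a reserved word.
        if kind == "Println" or (kind == "Word" and lexeme in RESERVED_WORDS):
            kind = "ReservedWord"
        elif kind == "Word":
            kind = "ID"
        tokens.append((kind, lexeme))
    return tokens
```

Matching a whole word first and then checking it against the reserved-word set gives the usual longest-match behavior: `whilex` is an ID, not the reserved word `while` followed by `x`.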
Example Grammar
E ::= E op E | - E | ( E ) | id
op ::=
a
b
c
Ambiguity
Some grammars are ambiguous
– Multiple distinct parse trees for the same terminal string
Structure of the parse tree captures much of the meaning of the program
– ambiguity implies multiple possible meanings for the same program
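To see the ambiguity numerically, the sketch below counts the distinct parse trees of a token string under a simplified expression grammar, E ::= E op E | id. The simplification to these two alternatives and the function name `count_parse_trees` are assumptions made for illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E ::= E op E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of distinct ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every `op` strictly inside the span as the root of the tree.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] == "op":
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))
```

For `id op id op id` the count is 2: one tree groups the first two operands, the other groups the last two. Any count above 1 for some string means the grammar is ambiguous.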
Resolving Ambiguity
Option 1: add a meta-rule
– For example, “else associates with closest previous if”
  • works, keeps original grammar intact
  • ad hoc and informal
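A recursive-descent parser gets this meta-rule almost for free: after parsing the then-branch, it takes an `else` if one is present. The sketch below assumes a simplified token form in which conditions and simple statements are single tokens and the else-branch is optional; `parse_stmt` and the tuple tree shape are hypothetical.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement at tokens[pos]; return (tree, next_pos).

    Assumed syntax: Stmt ::= if ( cond ) Stmt [ else Stmt ] | simple-token
    """
    def expect(tok, at):
        if at >= len(tokens) or tokens[at] != tok:
            raise SyntaxError(f"expected {tok!r} at position {at}")
        return at + 1

    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        if tokens[pos] in ("else", "(", ")"):
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        return tokens[pos], pos + 1          # simple statement

    pos = expect("(", pos + 1)
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    cond = tokens[pos]
    pos = expect(")", pos + 1)
    then_branch, pos = parse_stmt(tokens, pos)
    else_branch = None
    # Meta-rule: the innermost if still being parsed claims the else.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
    return ("if", cond, then_branch, else_branch), pos
```

For `if ( c1 ) if ( c2 ) s1 else s2`, the inner call sees the `else` first, so the result is `("if", "c1", ("if", "c2", "s1", "s2"), None)`: the `else` belongs to the closest previous `if`.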
Resolving Ambiguity [continued]
Option 2: rewrite the grammar to resolve ambiguity explicitly
Stmt ::= MatchedStmt | UnmatchedStmt
MatchedStmt ::= ... | if ( Expr ) MatchedStmt else MatchedStmt
UnmatchedStmt ::= if ( Expr ) Stmt | if ( Expr ) MatchedStmt else UnmatchedStmt
– formal, no additional rules beyond syntax
– sometimes obscures original grammar
Resolving Ambiguity [continued]
Option 3: redesign the language to remove the ambiguity
Stmt ::= ... | if Expr then Stmt end | if Expr then Stmt else Stmt end
– formal, clear, elegant
– allows sequence of Stmts in then and else branches, no { , } needed
– extra end required for every if
Another Famous Example
E ::= E Op E | - E | ( E ) | id
Op ::=
a
b
c
a
b
c
Removing Ambiguity (Option 2)
Option 2: modify the grammar to explicitly resolve the ambiguity
Strategy:
• create a nonterminal for each precedence level
• expr is lowest precedence nonterminal, each nonterminal can be rewritten with higher precedence operator, highest precedence operator includes atomic exprs
• at each precedence level, use:
  – left recursion for left-associative operators
  – right recursion for right-associative operators
  – no recursion for non-associative operators
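The strategy above can be sketched as a recursive-descent parser with one function per precedence level. The three levels and their operators (`+ -`, `* /`, `**`) are assumptions chosen for illustration, not the operator table from the slides. Left recursion is written as a loop, because a recursive-descent function cannot call itself before consuming input; right recursion is an ordinary recursive call.

```python
def parse_expr(tokens):
    """Parse a token list into nested (op, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def additive():        # lowest precedence, left-associative
        left = multiplicative()
        while peek() in ("+", "-"):
            op = advance()
            left = (op, left, multiplicative())   # tree grows to the left
        return left

    def multiplicative():  # next level, left-associative
        left = power()
        while peek() in ("*", "/"):
            op = advance()
            left = (op, left, power())
        return left

    def power():           # right-associative: recurse on the right operand
        base = atom()
        if peek() == "**":
            advance()
            return ("**", base, power())
        return base

    def atom():            # highest level: atomic exprs and parentheses
        tok = advance()
        if tok == "(":
            inner = additive()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok in {"+", "-", "*", "/", "**", ")"}:
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    tree = additive()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

With this layering, `a - b - c` groups as `(a - b) - c`, `a + b * c` as `a + (b * c)`, and `a ** b ** c` as `a ** (b ** c)`, so each string has exactly one parse.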
Redone Example
E
::= E
E0 ::= E
E1 | E
left associative
E1 ::= E
E2 | E
left associative
E2 ::= E3 (
) E
non associative
E3 ::= E3 (
) E4 | E
left associative
E4 ::= E4 (
) E5 | E
left associative
E5 ::= E
E5 | E
right associative
E6 ::=
E6 | E
right associative
E7 ::= E
| E
left associative
E8 ::= id |
E