Syntactic Analysis, Summaries of Compiler Construction

Syntactic analysis, or parsing, is the second phase of compilation: the token file is converted to an abstract syntax tree.

Typology: Summaries

2021/2022

Uploaded on 09/27/2022

arwen

Compiler Passes

Analysis of input program (front-end):

character stream -> Lexical Analysis -> token stream -> Syntactic Analysis -> abstract syntax tree -> Semantic Analysis -> annotated AST

Synthesis of output program (back-end):

Intermediate Code Generation -> intermediate form -> Optimization -> intermediate form -> Code Generation -> target language

Context-free Grammars

Compromise between

  • REs (regular expressions), which can't nest or specify recursive structure
  • General grammars, which are too powerful: parsing them is undecidable

Context-free grammars are a sweet spot

  • Powerful enough to describe nesting and recursion
  • Easy to parse, but also allow restrictions for speed

Not perfect

  • Cannot capture semantics, as in "variable must be declared," requiring a later semantic pass
  • Can be ambiguous

EBNF, Extended Backus Naur Form, is a popular notation
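To make the nesting point concrete, here is a minimal Python sketch of a recognizer for the context-free grammar S ::= "(" S ")" S | empty (balanced parentheses), a language that no regular expression can describe. The grammar and the name `is_balanced` are illustrative assumptions, not taken from the slides.

```python
def is_balanced(text: str) -> bool:
    """Recognize the language of S ::= "(" S ")" S | empty."""

    def parse_s(pos: int) -> int:
        # Each loop iteration applies S ::= "(" S ")" S once; leaving the
        # loop without consuming input is the empty alternative.
        while pos < len(text) and text[pos] == "(":
            pos = parse_s(pos + 1)              # the nested S
            if pos >= len(text) or text[pos] != ")":
                raise SyntaxError(f"expected ')' at position {pos}")
            pos += 1                            # consume ")", then loop for the trailing S
        return pos

    try:
        # The whole input must be consumed for the string to be accepted.
        return parse_s(0) == len(text)
    except SyntaxError:
        return False
```

The recursion in `parse_s` mirrors the recursion in the production, which is exactly what a regular expression has no way to express.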

CFG Terminology

Terminals -- alphabet of language defined by CFG

Nonterminals -- symbols defined in terms of terminals and nonterminals

Productions -- rules for how a nonterminal (lhs) is defined in terms of a (possibly empty) sequence of terminals and nonterminals

  • Recursion is allowed!

Multiple productions are allowed for a nonterminal; they are its alternatives

Start symbol -- root of the defining language

Program ::= Stmt

Stmt ::= if ( Expr ) then Stmt else Stmt

Stmt ::= while ( Expr ) do Stmt
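The terminology can be illustrated by writing this small grammar down as data. The following is a sketch only: the names `PRODUCTIONS`, `START_SYMBOL`, and `alternatives` are hypothetical, and `Expr` is listed as a nonterminal by hand because the fragment gives no production for it.

```python
# Each production pairs a nonterminal (lhs) with a sequence of symbols (rhs).
PRODUCTIONS = [
    ("Program", ["Stmt"]),
    ("Stmt", ["if", "(", "Expr", ")", "then", "Stmt", "else", "Stmt"]),
    ("Stmt", ["while", "(", "Expr", ")", "do", "Stmt"]),
]
START_SYMBOL = "Program"

# Nonterminals are the symbols that appear on a left-hand side.
# Expr is added by hand: it is used above but not defined in this fragment.
nonterminals = {lhs for lhs, _ in PRODUCTIONS} | {"Expr"}

# Every remaining right-hand-side symbol is a terminal.
terminals = {sym for _, rhs in PRODUCTIONS for sym in rhs} - nonterminals


def alternatives(nonterminal: str) -> list[list[str]]:
    """Return every right-hand side defined for the given nonterminal."""
    return [rhs for lhs, rhs in PRODUCTIONS if lhs == nonterminal]
```

Here `alternatives("Stmt")` returns two right-hand sides, and both mention `Stmt` again, which is the recursion the slide refers to.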

Initial miniJava [continued]

Stmt ::= Type ID ;
       | { {Stmt} }
       | if ( Expr ) Stmt else Stmt
       | while ( Expr ) Stmt
       | System.out.println ( Expr ) ;
       | ID = Expr ;

Expr ::= Expr Op Expr
       | ! Expr
       | Expr . ID ( [ Expr { , Expr } ] )
       | ID | this
       | Integer | true | false
       | ( Expr )

Op

RE Specification of initial MiniJava Lex

Program ::= (Token | Whitespace)*

Token ::= ID | Integer | ReservedWord | Operator | Delimiter

ID ::= Letter (Letter | Digit)*

Letter ::= a | ... | z | A | ... | Z

Digit ::= 0 | ... | 9

Integer ::= Digit+

ReservedWord ::= class | public | static | extends | void | int | boolean | if | else | while | return | true | false | this | new | String | main | System.out.println

Operator ::=

Delimiter ::=

[

]
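A specification of this shape maps fairly directly onto a regular-expression tokenizer. The sketch below uses Python's `re` module. The operator and delimiter sets in it are assumptions chosen for illustration, and the names `TOKEN_RE`, `RESERVED`, and `tokenize` are hypothetical.

```python
import re

# Reserved words that have the same shape as an ID.
RESERVED = {
    "class", "public", "static", "extends", "void", "int", "boolean",
    "if", "else", "while", "return", "true", "false", "this", "new",
    "String", "main",
}

# One named group per token class; alternatives are tried left to right.
# The Operator and Delimiter sets are assumed, not taken from the slides.
TOKEN_RE = re.compile(r"""
    (?P<Whitespace>\s+)
  | (?P<Println>System\.out\.println\b)
  | (?P<ID>[A-Za-z][A-Za-z0-9]*)
  | (?P<Integer>[0-9]+)
  | (?P<Operator>&&|<|\+|-|\*)
  | (?P<Delimiter>[;.,=(){}\[\]])
""", re.VERBOSE)


def tokenize(source: str) -> list[tuple[str, str]]:
    """Split source into (kind, text) pairs, discarding whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "Whitespace":
            continue
        # System.out.println contains dots, so it gets its own pattern;
        # other reserved words are IDs that appear in the RESERVED set.
        if kind == "Println" or text in RESERVED:
            kind = "ReservedWord"
        tokens.append((kind, text))
    return tokens
```

Reserved words are recognized by first matching the general ID pattern and then checking a table, a common way to keep the regular expression small.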

Example Grammar

E ::= E op E | E | E | id

op ::=

a b c

Ambiguity

Some grammars are ambiguous

  – Multiple distinct parse trees for the same terminal string

Structure of the parse tree captures much of the meaning of the program

  – ambiguity implies multiple possible meanings for the same program
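Ambiguity can be checked mechanically on small inputs by counting parse trees. The sketch below does this for a simplified grammar, E ::= E op E | id, in which `op` is treated as a single terminal. Both that simplification and the name `count_parse_trees` are assumptions made for illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens: tuple[str, ...]) -> int:
    """Count distinct parse trees for tokens under E ::= E op E | id."""

    @lru_cache(maxsize=None)
    def count(lo: int, hi: int) -> int:
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0    # E ::= id
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] == "op":
                # E ::= E op E, with this op at the root of the tree.
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))
```

For the five-token string `id op id op id` the function returns 2: one tree groups the first two operands together, the other groups the last two. A count above 1 for any string shows that the grammar is ambiguous.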

Resolving Ambiguity

Option 1: add a meta-rule

  – For example, "else associates with closest previous if"

  • works, keeps original grammar intact
  • ad hoc and informal
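A hand-written recursive-descent parser applies this meta-rule almost for free, because the innermost `if` being parsed sees a following `else` first. The sketch below is a simplified illustration: it assumes every condition and every non-`if` statement is a single token, and `parse_stmt` is a hypothetical name.

```python
def parse_stmt(tokens: list[str], pos: int = 0):
    """Parse one statement starting at pos; return (tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "else":
        raise SyntaxError(f"'else' without matching 'if' at position {pos}")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1            # any other token is a simple statement
    if pos + 1 >= len(tokens):
        raise SyntaxError("missing condition after 'if'")
    cond = tokens[pos + 1]                     # a condition is one token in this sketch
    then_branch, pos = parse_stmt(tokens, pos + 2)
    # Meta-rule: the innermost unfinished `if` gets first claim on an `else`.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return ("if", cond, then_branch), pos
```

Parsing `if c1 if c2 s1 else s2` yields `("if", "c1", ("if", "c2", "s1", "s2"))`, so the `else` is attached to the inner `if`. The grammar itself is unchanged; the disambiguation lives only in the parser's control flow, which is why the slide calls this approach informal.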

Resolving Ambiguity [continued]

Option 2: rewrite the grammar to resolve ambiguity explicitly

Stmt ::= MatchedStmt | UnmatchedStmt

MatchedStmt ::= ... | if ( Expr ) MatchedStmt else MatchedStmt

UnmatchedStmt ::= if ( Expr ) Stmt | if ( Expr ) MatchedStmt else UnmatchedStmt

  – formal, no additional rules beyond syntax
  – sometimes obscures original grammar

Resolving Ambiguity [continued]

Option 3: redesign the language to remove the ambiguity

Stmt ::= ... | if Expr then Stmt end | if Expr then Stmt else Stmt end

  – formal, clear, elegant
  – allows sequence of Stmts in then and else branches, no { , } needed
  – extra end required for every if

Another Famous Example

E ::= E Op E | E | E | id

Op ::=

a b c

a b c

Removing Ambiguity (Option 2)

Option 2: modify the grammar to explicitly resolve the ambiguity

Strategy:

  • create a nonterminal for each precedence level
  • expr is the lowest-precedence nonterminal; each nonterminal can be rewritten in terms of the next-higher-precedence nonterminal; the highest-precedence nonterminal includes atomic exprs
  • at each precedence level, use:
    – left recursion for left-associative operators
    – right recursion for right-associative operators
    – no recursion for non-associative operators
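The strategy can be sketched as a small recursive-descent parser with one function per precedence level. The operator set below is an assumption chosen for illustration: `+` and `*` are left-associative and `**` is right-associative. The grammar is E0 ::= E0 + E1 | E1, E1 ::= E1 * E2 | E2, E2 ::= E3 ** E2 | E3, E3 ::= id | ( E0 ). The name `parse_expr` is hypothetical, and left recursion is implemented as a loop because a recursive-descent parser cannot call itself on the left without consuming input.

```python
def parse_expr(tokens: list[str]):
    """Parse tokens with the assumed stratified grammar; return a nested tuple."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def e0():                      # E0 ::= E0 "+" E1 | E1   (left associative)
        left = e1()
        while peek() == "+":
            advance()
            left = ("+", left, e1())
        return left

    def e1():                      # E1 ::= E1 "*" E2 | E2   (left associative)
        left = e2()
        while peek() == "*":
            advance()
            left = ("*", left, e2())
        return left

    def e2():                      # E2 ::= E3 "**" E2 | E3  (right associative)
        base = e3()
        if peek() == "**":
            advance()
            return ("**", base, e2())
        return base

    def e3():                      # E3 ::= id | "(" E0 ")"  (atomic exprs)
        tok = advance()
        if tok == "(":
            inner = e0()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok in {"+", "*", "**", ")"}:
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    tree = e0()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

With these levels, `a + b * c` parses as `("+", "a", ("*", "b", "c"))` and `a + b + c` as `("+", ("+", "a", "b"), "c")`, so each string has exactly one tree.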

Redone Example

E  ::= E
E0 ::= E E1 | E          left associative
E1 ::= E E2 | E          left associative
E2 ::= E3 ( ) E          non associative
E3 ::= E3 ( ) E4 | E     left associative
E4 ::= E4 ( ) E5 | E     left associative
E5 ::= E E5 | E          right associative
E6 ::= E6 | E            right associative
E7 ::= E | E             left associative
E8 ::= id | E