Syntax Analysis – Part I
EECS 483 – Lecture 4
University of Michigan
Monday, September 20, 2004
Announcements/Reading
• Class newsgroup
» umich.class.eecs.483
» Yuan will check this regularly, so can post questions here
• Reading - Ch 4
» 4.1 – 4.3 for today
Parsing Analogy
• Syntax analysis for natural languages
» Recognize whether a sentence is grammatically correct
» Identify the function of each word
• Example: “I gave him the book”
sentence
  subject: I
  verb: gave
  indirect object: him
  object: noun phrase
    article: the
    noun: book
Syntax Analysis Overview
• Goal – determine if the input token stream satisfies the syntax of the program
• What do we need to do this?
» An expressive way to describe the syntax
» A mechanism that determines if the input token stream satisfies the syntax description
• For lexical analysis
» Regular expressions describe tokens
» Finite automata = mechanisms to generate tokens from input stream
Context-Free Grammars
• Consist of 4 components:
» Terminal symbols = token or ε
» Non-terminal symbols = syntactic variables
» Start symbol S = special non-terminal
» Productions of the form LHS → RHS
- LHS = single non-terminal
- RHS = string of terminals and non-terminals
- Specify how non-terminals may be expanded
• Language generated by a grammar is the set of strings of terminals derived from the start symbol by repeatedly applying the productions
» L(G) = language generated by grammar G
• Example grammar:
S → a S a
S → T
T → b T b
T → ε
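To make "the set of strings of terminals derived from the start symbol" concrete, here is a minimal Python sketch that enumerates short strings of L(G) for the example grammar. It assumes the last production is T → ε (the right-hand side is blank in the extracted slide). The dictionary encoding and the names `GRAMMAR` and `generate` are illustrative choices, not part of the lecture. The pruning relies on every sentential form eventually gaining terminals, which holds for this grammar but not for every CFG.

```python
from collections import deque

# Example grammar: S -> a S a | T,  T -> b T b | epsilon (epsilon is assumed).
# Dictionary keys are non-terminals; an empty tuple stands for epsilon.
GRAMMAR = {
    "S": [("a", "S", "a"), ("T",)],
    "T": [("b", "T", "b"), ()],
}

def generate(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal in the sentential form.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            results.add("".join(form))      # only terminals left: a sentence
            continue
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            # Prune forms that already carry more than max_len terminals.
            terminals = sum(1 for sym in new_form if sym not in grammar)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=lambda s: (len(s), s))

print(generate(GRAMMAR, "S", 4))
# ['', 'aa', 'bb', 'aaaa', 'abba', 'bbbb']
```

Every string has the shape of some a's, an even number of b's, then the same number of a's, which is what repeatedly applying the four productions yields.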
CFG - Example
• Grammar for balanced-parentheses language
» S → ( S ) S
» S → ε
- 1 non-terminal: S
- 2 terminals: “(”, “)”
- Start symbol: S
- 2 productions
• If grammar accepts a string, there is a derivation of that string using the productions
» “(())”
» S = (S) ε = ((S) S) ε = ((ε) ε) ε = (())
? Why is the final S required?
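The two productions can be turned almost directly into a recognizer. The sketch below is one possible hand-written recursive-descent check, not a method taken from the lecture. The function names `parse_s` and `is_balanced` are made up for illustration, and the input is assumed to be a plain string of "(" and ")" characters rather than a token stream.

```python
def parse_s(text, pos=0):
    """Match S -> ( S ) S | epsilon starting at pos.

    Returns the position just after the matched S, or None if a "("
    was opened but its ")" is missing.
    """
    if pos < len(text) and text[pos] == "(":
        inner = parse_s(text, pos + 1)            # the S inside the parentheses
        if inner is None or inner >= len(text) or text[inner] != ")":
            return None
        return parse_s(text, inner + 1)           # the final S of ( S ) S
    return pos                                    # S -> epsilon consumes nothing

def is_balanced(text):
    """True if the whole string can be derived from S."""
    return parse_s(text) == len(text)

print(is_balanced("(())"))   # True
print(is_balanced("()()"))   # True
print(is_balanced("(()"))    # False
```

The second call hints at what the final S contributes: it is the only place where a second group such as the trailing "()" in "()()" can be derived.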
A Parser
• Inputs
» Context free grammar, G
» Token stream, s (from lexer)
• Outputs
» Yes, if s in L(G)
» No, otherwise
» Error messages
• Syntax analyzers (parsers) = CFG acceptors which also output the corresponding derivation when the token stream is accepted
• Various kinds: LL(k), LR(k), SLR, LALR
RE is a Subset of CFG
• Can inductively build a grammar for each RE

RE          Grammar
ε           S → ε
a           S → a
R1 R2       S → S1 S2
R1 | R2     S → S1 | S2
R1*         S → S1 S | ε

Where
G1 = grammar for R1, with start symbol S1
G2 = grammar for R2, with start symbol S2
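The inductive construction can be sketched as a small recursive function. The nested-tuple encoding of regular expressions and the name `re_to_cfg` are assumptions made for this example only. Fresh non-terminals are named S0, S1, ..., so the sketch also assumes no terminal is spelled that way.

```python
import itertools

def re_to_cfg(regex):
    """Build productions for a regex given as nested tuples.

    Regex forms (an encoding assumed for this sketch):
      ("eps",)  ("sym", "a")  ("cat", r1, r2)  ("alt", r1, r2)  ("star", r1)
    Returns (start_symbol, productions). Each production is (lhs, rhs_tuple),
    and an empty rhs_tuple stands for epsilon.
    """
    counter = itertools.count()
    productions = []

    def build(node):
        start = f"S{next(counter)}"        # fresh start symbol for this sub-grammar
        kind = node[0]
        if kind == "eps":
            productions.append((start, ()))
        elif kind == "sym":
            productions.append((start, (node[1],)))
        elif kind == "cat":                # R1 R2:   S -> S1 S2
            s1, s2 = build(node[1]), build(node[2])
            productions.append((start, (s1, s2)))
        elif kind == "alt":                # R1 | R2: S -> S1 | S2
            s1, s2 = build(node[1]), build(node[2])
            productions.append((start, (s1,)))
            productions.append((start, (s2,)))
        elif kind == "star":               # R1*:     S -> S1 S | epsilon
            s1 = build(node[1])
            productions.append((start, (s1, start)))
            productions.append((start, ()))
        else:
            raise ValueError(f"unknown regex node: {kind!r}")
        return start

    return build(regex), productions

start, rules = re_to_cfg(("star", ("sym", "a")))   # the regex a*
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs) or "ε")
```

For a* this prints S1 -> a, S0 -> S1 S0 and S0 -> ε, with S0 as the start symbol, mirroring the R1* row of the table.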
Constructing a Derivation
• Start from S (the start symbol)
• Use productions to derive a sequence of tokens
• For arbitrary strings α, β, γ and for a production A → β
» A single step of the derivation is
» αAγ ⇒ αβγ (substitute β for A)
• Example
» S → E + S
» (S + E) + E ⇒ (E + S + E) + E
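A single derivation step is just a substitution inside a list of symbols. The helper below is a hypothetical illustration (the name `derive_step` and the list-of-strings representation are assumptions). It rewrites one occurrence of the production's left-hand side and leaves the rest of the sentential form unchanged.

```python
def derive_step(form, index, production):
    """Apply production (lhs, rhs) to the symbol at form[index].

    form is a list of grammar symbols. The symbols before index play the
    role of alpha, the symbols after it the role of gamma, and rhs is the
    string substituted for the non-terminal.
    """
    lhs, rhs = production
    if form[index] != lhs:
        raise ValueError(f"expected {lhs!r} at position {index}, found {form[index]!r}")
    return form[:index] + list(rhs) + form[index + 1:]

form = "( S + E ) + E".split()
step = derive_step(form, 1, ("S", ["E", "+", "S"]))
print(" ".join(step))   # ( E + S + E ) + E
```

This reproduces the slide's example step, with "(" as the prefix and "+ E ) + E" as the suffix.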
Class Problem
» S → E + S | E
» E → number | (S)
• Derive: (1 + 2 + (3 + 4)) + 5
Parse Tree vs Abstract Syntax Tree
[Figure: a parse tree with S and E interior nodes and “+”, “(”, “)” leaves, contrasted with an abstract syntax tree]
Parse tree also called “concrete syntax”
AST discards (abstracts) unneeded information – more compact format
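The difference between the two trees can be sketched in code. The example below uses the sum grammar from the class problem (S → E + S | E, E → number | (S)). It builds a parse tree as nested tuples and then collapses it into an AST. The tuple shapes, the function names, and the regex-based tokenizer are assumptions for illustration only. The tokenizer silently skips characters it does not recognize.

```python
import re

def tokenize(text):
    """Split an expression into number, "+", "(" and ")" tokens."""
    return re.findall(r"\d+|[()+]", text)

def parse_S(tokens, pos):
    """S -> E + S | E. Returns (parse_tree, next_pos)."""
    e_tree, pos = parse_E(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        s_tree, pos = parse_S(tokens, pos + 1)
        return ("S", e_tree, "+", s_tree), pos
    return ("S", e_tree), pos

def parse_E(tokens, pos):
    """E -> number | ( S ). Returns (parse_tree, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok.isdigit():
        return ("E", tok), pos + 1
    if tok == "(":
        s_tree, pos = parse_S(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return ("E", "(", s_tree, ")"), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    """Parse a whole expression and return its parse tree."""
    tokens = tokenize(text)
    tree, pos = parse_S(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree

def to_ast(tree):
    """Drop S/E wrappers and parentheses; keep only numbers and '+' nodes."""
    if tree[0] == "E":
        if len(tree) == 2:                          # E -> number
            return int(tree[1])
        return to_ast(tree[2])                      # E -> ( S )
    if len(tree) == 2:                              # S -> E
        return to_ast(tree[1])
    return ("+", to_ast(tree[1]), to_ast(tree[3]))  # S -> E + S

print(parse("(1 + 2) + 3"))
print(to_ast(parse("(1 + 2) + 3")))   # ('+', ('+', 1, 2), 3)
```

The parse tree records every S, E, and parenthesis. The AST keeps only the operator and its operands, which is the sense in which it is the more compact format.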
Derivation Order
• Can choose to apply productions in any order, select non-terminal and substitute RHS of production
• Two standard orders: leftmost and rightmost
• Leftmost derivation
» In the string, find the leftmost non-terminal and apply a production to it
» E + S ⇒ 1 + S
• Rightmost derivation
» Same, but find rightmost non-terminal
» E + S ⇒ E + E + S
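The two orders can be compared by replaying a fixed parse tree and choosing which pending non-terminal to expand next. This is a minimal sketch. The nested-tuple tree format and the names `render` and `derivation` are assumptions, and the tree for 1 + 2 under the sum grammar (S → E + S | E, E → number | (S)) is written out by hand.

```python
def render(form):
    """Show a sentential form; unexpanded tree nodes print as their label."""
    return " ".join(item[0] if isinstance(item, tuple) else item for item in form)

def derivation(tree, leftmost=True):
    """List the sentential forms of a leftmost or rightmost derivation.

    tree is a nested tuple (label, child, ...); string children are terminals.
    """
    form = [tree]
    steps = [render(form)]
    while True:
        pending = [i for i, item in enumerate(form) if isinstance(item, tuple)]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1:]          # replace the node by its children
        steps.append(render(form))

# Hand-built parse tree for "1 + 2".
tree = ("S", ("E", "1"), "+", ("S", ("E", "2")))
print(" => ".join(derivation(tree, leftmost=True)))
# S => E + S => 1 + S => 1 + E => 1 + 2
print(" => ".join(derivation(tree, leftmost=False)))
# S => E + S => E + E => E + 2 => 1 + 2
```

Both runs start from the same tree and end at the same sentence. Only the order in which the non-terminals are expanded differs.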
Class Problem
» S → E + S | E
» E → number | (S) | -S
• Do the rightmost derivation of: 1 + (2 + -(3 + 4)) + 5
Ambiguous Grammars
• In the sum expression grammar, leftmost and rightmost derivations produced identical parse trees
• + operator associates to the right in parse tree regardless of derivation order