Introduction to Parsing
Docsity.com
Administrivia
• Programming Assignment 2 is out this week
– Due October 1st
– Work in teams begins
• Required Readings
– Lex Manual
– Red Dragon Book Chapter 4
Outline
• Regular languages revisited
• Parser overview
• Context-free grammars (CFGs)
• Derivations
Languages and Automata
• Formal languages are very important in CS
– Especially in programming languages
• Regular languages
– The weakest formal languages widely used
– Many applications
• We will also study context-free languages
Limitations of Regular Languages
• Intuition: A finite automaton that runs long enough must repeat states
• A finite automaton can’t remember the number of times it has visited a particular state
• A finite automaton has finite memory
– Only enough to store which state it is in
– Cannot count, except up to a finite limit
• E.g., the language of balanced parentheses is not regular: $\{\, (^i\, )^i \mid i \geq 0 \,\}$
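The counting argument can be made concrete with a small sketch. `is_balanced` recognizes strings of the form (^i )^i by keeping an unbounded counter, which is exactly the memory a finite automaton lacks. `bounded_dfa_accepts` mimics a finite automaton whose state can only record counts up to a fixed `k`, so it must reject once the nesting exceeds `k`. Both function names and the parameter `k` are illustrative assumptions, not part of the lecture.

```python
def is_balanced(s: str) -> bool:
    """Recognize { (^i )^i | i >= 0 }: i opening parens followed by i closing ones."""
    i = 0
    # Count the leading '(' characters; this count is unbounded.
    while i < len(s) and s[i] == "(":
        i += 1
    # Everything after them must be exactly that many ')' characters.
    return s[i:] == ")" * i


def bounded_dfa_accepts(s: str, k: int) -> bool:
    """Simulate a finite automaton that can only track counts up to k.

    All state (opens, closes, seen_close) stays within a finite range,
    so this has finitely many states, like a real DFA.
    """
    opens = closes = 0
    seen_close = False
    for ch in s:
        if ch == "(":
            if seen_close or opens == k:
                return False  # no state left to record more '(' : reject
            opens += 1
        elif ch == ")":
            seen_close = True
            closes += 1
            if closes > opens:
                return False
        else:
            return False
    return closes == opens


deep = "(" * 4 + ")" * 4
print(is_balanced(deep), bounded_dfa_accepts(deep, k=3))
```

Whatever `k` is chosen, the string with `k + 1` opening parentheses defeats the bounded version, which is the intuition behind the slide's claim.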
The Functionality of the Parser
• Input: sequence of tokens from lexer
• Output: parse tree of the program
Example
• Cool
if x = y then 1 else 2 fi
• Parser input
IF ID = ID THEN INT ELSE INT FI
• Parser output
IF-THEN-ELSE
├── =
│   ├── ID
│   └── ID
├── INT
└── INT
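A parse tree like the one above is usually stored as a recursive node structure. The sketch below assumes a minimal hypothetical `Node` class (a label plus a list of children) and a `show` helper that prints the tree with indentation; real parsers typically attach more information, such as the lexeme and source position, to each node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical parse-tree node: a label and an ordered list of children."""
    label: str
    children: list[Node] = field(default_factory=list)


def show(node: Node, depth: int = 0) -> list[str]:
    """Render the tree one node per line, indenting children by two spaces."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(show(child, depth + 1))
    return lines


# Parse tree for: if x = y then 1 else 2 fi
tree = Node("IF-THEN-ELSE", [
    Node("=", [Node("ID"), Node("ID")]),
    Node("INT"),
    Node("INT"),
])
print("\n".join(show(tree)))
```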
Comparison with Lexical Analysis
Phase    Input                    Output
Lexer    Sequence of characters   Sequence of tokens
Parser   Sequence of tokens       Parse tree
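To make the lexer half of this division concrete, here is a minimal regular-expression lexer that turns the Cool snippet `if x = y then 1 else 2 fi` into the token sequence the parser receives. It is a sketch under stated assumptions: the token names, the `KEYWORDS` table, and the `lex` function are hypothetical, only cover this one example, and match keywords in lowercase only.

```python
import re

# Hypothetical keyword table covering only the example; other words become ID.
KEYWORDS = {"if": "IF", "then": "THEN", "else": "ELSE", "fi": "FI"}

# One named group per token class; whitespace is matched but discarded.
TOKEN_RE = re.compile(r"(?P<INT>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<EQ>=)|(?P<WS>\s+)")


def lex(source: str) -> list[str]:
    """Map a sequence of characters to a sequence of token names."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind = m.lastgroup  # name of the alternative that matched
        if kind == "INT":
            tokens.append("INT")
        elif kind == "NAME":
            tokens.append(KEYWORDS.get(m.group(), "ID"))
        elif kind == "EQ":
            tokens.append("=")
        # kind == "WS": separates tokens but produces none
        pos = m.end()
    return tokens


print(" ".join(lex("if x = y then 1 else 2 fi")))
```

The output is a flat list; building the tree from it is the parser's job.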
The Role of the Parser
• Not all sequences of tokens are programs . . .
• . . . Parser must distinguish between valid and invalid sequences of tokens
• We need
– A language for describing valid sequences of tokens
– A method for distinguishing valid from invalid sequences of tokens
Context-Free Grammars
• Programming language constructs have recursive structure
• An EXPR is if EXPR then EXPR else EXPR fi, or
while EXPR loop EXPR pool, or
…
• Context-free grammars are a natural notation for this recursive structure
CFGs (Cont.)
• A CFG consists of
– A set of terminals T
– A set of nonterminals N
– A start symbol S (a nonterminal)
– A set of productions; assuming X ∈ N, each has the form
X => ε, or
X => Y1 Y2 ... Yn where each Yi ∈ N ∪ T
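The four components translate directly into a data structure. The sketch below uses a hypothetical `CFG` class in which a production X => Y1 ... Yn is the pair `(X, (Y1, ..., Yn))` and X => ε is `(X, ())`; its `check` method enforces the definition (productions rewrite a nonterminal, right-hand sides use only symbols from N ∪ T). Requiring T and N to be disjoint is the standard convention and an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """Hypothetical representation of a context-free grammar (T, N, S, productions)."""
    terminals: frozenset
    nonterminals: frozenset
    start: str
    productions: tuple  # of (lhs, rhs_tuple); rhs () stands for ε

    def check(self) -> None:
        """Raise ValueError if the grammar does not fit the definition."""
        if self.terminals & self.nonterminals:
            raise ValueError("T and N must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a nonterminal")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not in N")
            for y in rhs:
                if y not in symbols:
                    raise ValueError(f"{y!r} is not in N ∪ T")


# Example: S => ( S ) | ε
parens = CFG(
    terminals=frozenset({"(", ")"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("(", "S", ")")), ("S", ())),
)
parens.check()  # passes silently
```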
Notational Conventions
• In these lecture notes
– Nonterminals are written uppercase
– Terminals are written lowercase
– The start symbol is the left-hand side of the first production
Examples of CFGs
A fragment of Cool:
EXPR => if EXPR then EXPR else EXPR fi
      | while EXPR loop EXPR pool
      | id
Examples of CFGs (cont.)
Simple arithmetic expressions:
E => E * E
   | E + E
   | ( E )
   | id
The Language of a CFG
Read productions as replacement rules:
X => Y1 ... Yn   means X can be replaced by Y1 ... Yn
X => ε           means X can be erased (replaced with the empty string)
Key Idea
1. Begin with a string consisting of the start symbol “S”
2. Replace any nonterminal X in the string by the right-hand side of some production
X => Y1 … Yn
3. Repeat (2) until there are no nonterminals in the string
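The three steps can be run mechanically. This sketch stores a grammar as a dict from nonterminal to a list of right-hand sides and, at each step, picks a nonterminal and a production at random. The grammar S => ( S ) | ε and the names `GRAMMAR` and `derive` are illustrative assumptions; the `max_steps` cap is a safeguard, since for some grammars random choices need not terminate.

```python
import random

# Hypothetical example grammar: S => ( S ) | ε   (ε is the empty tuple)
GRAMMAR = {"S": [("(", "S", ")"), ()]}


def derive(grammar, start, rng, max_steps=1000):
    """Derive one terminal string by repeatedly rewriting nonterminals."""
    form = [start]                          # step 1: the string "S"
    for _ in range(max_steps):
        positions = [i for i, sym in enumerate(form) if sym in grammar]
        if not positions:                   # step 3: no nonterminals left
            return "".join(form)
        i = rng.choice(positions)           # step 2: pick any nonterminal X ...
        rhs = rng.choice(grammar[form[i]])  # ... and any production X => Y1 … Yn
        form[i:i + 1] = rhs                 # replace X by Y1 … Yn
    raise RuntimeError("derivation did not finish within max_steps")


rng = random.Random(42)
print([derive(GRAMMAR, "S", rng) for _ in range(5)])
```

Every string printed has the shape (^i )^i, because that is the only shape this grammar can produce.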
The Language of a CFG (Cont.)
More formally, write
X1 … Xi … Xn => X1 … Xi-1 Y1 … Ym Xi+1 … Xn
if there is a production
Xi => Y1 … Ym
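A single step of this relation is just a splice. The hypothetical `step` function below replaces the symbol at position `i` of a sentential form (a tuple of symbols) by the right-hand side of a production whose left-hand side matches it. Python indexes from 0, whereas the slide numbers symbols from 1.

```python
def step(form, i, production):
    """One derivation step: X1 … Xi … Xn => X1 … Xi-1 Y1 … Ym Xi+1 … Xn.

    form is a tuple of symbols; i is 0-based; production is (Xi, (Y1, …, Ym)).
    """
    lhs, rhs = production
    if form[i] != lhs:
        raise ValueError(f"symbol {form[i]!r} at position {i} is not {lhs!r}")
    return form[:i] + tuple(rhs) + form[i + 1:]


# Using E => id on the last symbol of E + E gives E + id.
print(step(("E", "+", "E"), 2, ("E", ("id",))))  # ('E', '+', 'id')
```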
The Language of a CFG (Cont.)
Write
X1 … Xn =>* Y1 … Ym
if
X1 … Xn => … => … => Y1 … Ym
in 0 or more steps
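Whether X1 … Xn =>* Y1 … Ym holds can be decided by searching over sentential forms. This sketch uses breadth-first search and assumes the grammar has no ε-productions: then a form never gets shorter, so any form longer than the target can be discarded and the search is finite. The grammar dict `ARITH` and the function `derives` are illustrative assumptions; grammars with ε-productions would need a different bound.

```python
from collections import deque

# Arithmetic grammar: E => E + E | E * E | ( E ) | id
ARITH = {"E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)]}


def derives(grammar, source, target):
    """Return True if source =>* target in 0 or more steps (no ε-productions assumed)."""
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([source])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for i, sym in enumerate(form):
            for rhs in grammar.get(sym, []):  # terminals have no productions
                nxt = form[:i] + tuple(rhs) + form[i + 1:]
                # Forms never shrink, so longer-than-target forms are dead ends.
                if len(nxt) <= len(target) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


print(derives(ARITH, ["E"], ["id", "+", "id"]))  # True
print(derives(ARITH, ["E"], ["id", "id"]))       # False
```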
The Language of a CFG
Let G be a context-free grammar with start symbol S. Then the language of G is:
{ a1 … an | S =>* a1 … an and every ai is a terminal }
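Because L(G) is usually infinite, it can only be listed up to some length. The sketch below enumerates every all-terminal string of at most `n` symbols that the start symbol derives, again assuming no ε-productions so that long forms can be pruned. `ARITH` and `language_up_to` are illustrative names, not from the lecture.

```python
from collections import deque

# Arithmetic grammar: E => E + E | E * E | ( E ) | id
ARITH = {"E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)]}


def language_up_to(grammar, start, n):
    """All strings a1 … ak (k <= n) of terminals with start =>* a1 … ak."""
    found = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if all(sym not in grammar for sym in form):  # only terminals remain
            found.add(" ".join(form))
            continue
        for i, sym in enumerate(form):
            for rhs in grammar.get(sym, []):
                nxt = form[:i] + tuple(rhs) + form[i + 1:]
                if len(nxt) <= n and nxt not in seen:  # no ε: forms never shrink
                    seen.add(nxt)
                    queue.append(nxt)
    return found


print(sorted(language_up_to(ARITH, "E", 3)))
```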
Terminals
• Terminals are so called because there are no rules for replacing them
• Once generated, terminals are permanent
• Terminals ought to be tokens of the language
Examples
L(G) is the language of CFG G
Strings of balanced parentheses $\{\, (^i\, )^i \mid i \geq 0 \,\}$
Two grammars:
S => ( S )
S => ε
OR
S => ( S )
   | ε
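The two grammars differ only in notation: "|" is shorthand for several productions with the same left-hand side. The hypothetical helper below expands a rule written with "|" into separate productions, assuming symbols are separated by spaces, "ε" stands for the empty string, and "|" itself is never a terminal.

```python
def expand_alternatives(rule: str) -> list[tuple[str, tuple[str, ...]]]:
    """Split 'X => A | B' into the productions X => A and X => B."""
    lhs, rhs_text = rule.split("=>", 1)
    productions = []
    for alt in rhs_text.split("|"):
        # Drop ε: an empty right-hand side means X => ε.
        symbols = tuple(s for s in alt.split() if s != "ε")
        productions.append((lhs.strip(), symbols))
    return productions


print(expand_alternatives("S => ( S ) | ε"))
# [('S', ('(', 'S', ')')), ('S', ())]
```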
Cool Example
A fragment of COOL:
EXPR => if EXPR then EXPR else EXPR fi
      | while EXPR loop EXPR pool
      | id
Cool Example (Cont.)
Some elements of the language
id
if id then id else id fi
while id loop id pool
if while id loop id pool then id else id fi
if if id then id else id fi then id else id fi
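Each production of this fragment starts with a distinct keyword (`if`, `while`) or is just `id`, so one token of lookahead tells a recognizer which production to try. The sketch below is a recursive-descent recognizer for the fragment; the functions `parse_expr` and `in_language` are hypothetical, and tokens are assumed to be space-separated words. It answers only "yes" or "no", without building a tree.

```python
def parse_expr(tokens: list[str], pos: int) -> int:
    """Match one EXPR starting at tokens[pos]; return the position just after it.

    EXPR => if EXPR then EXPR else EXPR fi | while EXPR loop EXPR pool | id
    Raises SyntaxError if the tokens do not match.
    """
    def expect(word: str, p: int) -> int:
        if p < len(tokens) and tokens[p] == word:
            return p + 1
        found = tokens[p] if p < len(tokens) else "end of input"
        raise SyntaxError(f"expected {word!r} at position {p}, found {found!r}")

    if pos < len(tokens) and tokens[pos] == "if":
        pos = parse_expr(tokens, pos + 1)
        pos = expect("then", pos)
        pos = parse_expr(tokens, pos)
        pos = expect("else", pos)
        pos = parse_expr(tokens, pos)
        return expect("fi", pos)
    if pos < len(tokens) and tokens[pos] == "while":
        pos = parse_expr(tokens, pos + 1)
        pos = expect("loop", pos)
        pos = parse_expr(tokens, pos)
        return expect("pool", pos)
    return expect("id", pos)


def in_language(text: str) -> bool:
    """True if the whole input is exactly one EXPR."""
    tokens = text.split()
    try:
        return parse_expr(tokens, 0) == len(tokens)
    except SyntaxError:
        return False


print(in_language("while id loop id pool"))  # True
print(in_language("if id then id fi"))       # False: missing else branch
```

Note that every `if` must be closed by `fi`; a conditional without it is rejected.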
Arithmetic Example
Simple arithmetic expressions:
E => E + E | E * E | ( E ) | id
Some elements of the language:
id          id + id
(id)        id * id
(id) * id   id * (id)
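The grammar E => E + E | E * E | ( E ) | id is left-recursive, so a naive recursive-descent recognizer for it would call itself forever on E => E + E. This sketch swaps in a standard rewrite that generates the same strings (E => T { + T }, T => F { * F }, F => ( E ) | id, with { … } meaning "zero or more times"). The rewrite and the function name `accepts` are assumptions for illustration, not the lecture's method.

```python
def accepts(tokens: list[str]) -> bool:
    """Recognize E => E + E | E * E | ( E ) | id via an equivalent loop-based grammar."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor() -> bool:  # F => ( E ) | id
        nonlocal pos
        if peek() == "id":
            pos += 1
            return True
        if peek() == "(":
            pos += 1
            if expr() and peek() == ")":
                pos += 1
                return True
        return False

    def term() -> bool:  # T => F { * F }
        nonlocal pos
        if not factor():
            return False
        while peek() == "*":
            pos += 1
            if not factor():
                return False
        return True

    def expr() -> bool:  # E => T { + T }
        nonlocal pos
        if not term():
            return False
        while peek() == "+":
            pos += 1
            if not term():
                return False
        return True

    return expr() and pos == len(tokens)


print(accepts("( id ) * id".split()))  # True
print(accepts("id +".split()))         # False
```

Splitting E into levels also happens to encode the usual precedence of * over +, which the original grammar leaves ambiguous.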
Notes
The idea of a CFG is a big step. But:
• Membership in a language is “yes” or “no” – we also need a parse tree of the input
• Must handle errors gracefully
• Need an implementation of CFGs (e.g., bison)