







An introduction to Context Free Grammar (CFG), a fundamental concept in compiler design. CFGs are used for syntactic analysis, which involves analyzing the syntactic structure of a program to check for errors. The slides cover the role of a parser, the approach to constructing a parser using a CFG, and the derivation of strings from a CFG. They also cover the notation and properties of CFGs, as well as the differences between CFGs and regular grammars.








Introduction
- Second phase of the compiler.
- Main task:
  - Analyze the syntactic structure of the program and its components
  - to check these for errors.
- Role of parser:
  [Figure: block diagram. Boxes: Lexical Analyzer, Parser, Rest of front end, Symbol Table. Edge labels: source, token, req, parse tree, IR.]
- Approach to constructing a parser: similar to the lexical analyzer
  - Represent the source language by a meta-language, Context Free Grammar.
  - Use algorithms to construct a recognizer that recognizes strings generated by the grammar. This step can be automated for certain classes of grammars. One such tool: YACC.
  - Parse strings of the language using the recognizer.
Context Free Grammar (CFG)
- Syntax analysis is based on the theory of automata and formal languages, specifically the equivalence of two mechanisms: context free grammars and pushdown automata.
- Context free grammars are used to describe the syntactic structure of programs of a programming language. They describe what elementary constructs there are and how composite constructs can be built from other constructs.
    Stmt → if (Expr) Stmt else Stmt
  Note the recursive nature of the definition.
- Formally, a CFG has four components:
  a) a set of tokens Vt, called terminal symbols (the token set produced by the scanner); examples: if, then, identifier, etc.
  b) a set of intermediate symbols Vn, called non-terminals, syntactic categories, or syntactic variables,
  c) a start symbol, S ∈ Vn, and
  d) a set of productions P of the form A → X1 · · · Xn, where A ∈ Vn, Xi ∈ (Vn ∪ Vt), 1 ≤ i ≤ n, n ≥ 0.
- Sentences are generated by starting with S and applying productions until nothing but terminals is left.
- The set of strings derivable from a CFG G comprises the context free language, denoted L(G).

Example grammar:
    E → E A E | ( E ) | − E | id
    A → + | − | ∗ | / | ↑
Example derivation:
    E ⇒ −E ⇒ −(E) ⇒ −(id)
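The four components and the sample derivation can be written down directly as data. The following is a minimal sketch, not part of the original slides: the names `Grammar`, `derive_step`, and `is_sentence` are hypothetical, and ASCII `-` and `^` stand in for the − and ↑ terminals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # Vt
    nonterminals: frozenset   # Vn
    start: str                # S, a member of Vn
    productions: tuple        # pairs (A, (X1, ..., Xn))


# The expression grammar from the slide (ASCII stand-ins for the operators).
EXPR = Grammar(
    terminals=frozenset({"id", "(", ")", "+", "-", "*", "/", "^"}),
    nonterminals=frozenset({"E", "A"}),
    start="E",
    productions=(
        ("E", ("E", "A", "E")),
        ("E", ("(", "E", ")")),
        ("E", ("-", "E")),
        ("E", ("id",)),
        ("A", ("+",)), ("A", ("-",)), ("A", ("*",)), ("A", ("/",)), ("A", ("^",)),
    ),
)


def derive_step(grammar, form, position, production):
    """One derivation step: rewrite the nonterminal at `position`."""
    lhs, rhs = production
    if production not in grammar.productions:
        raise ValueError("not a production of this grammar")
    if form[position] != lhs:
        raise ValueError(f"expected {lhs} at position {position}")
    return form[:position] + rhs + form[position + 1:]


def is_sentence(grammar, form):
    """A sentence is a derived string made of terminals only."""
    return all(symbol in grammar.terminals for symbol in form)


# Replay E => -E => -(E) => -(id) as (position, production) pairs.
steps = [
    (0, ("E", ("-", "E"))),
    (1, ("E", ("(", "E", ")"))),
    (2, ("E", ("id",))),
]
form = (EXPR.start,)
trace = [form]
for position, production in steps:
    form = derive_step(EXPR, form, position, production)
    trace.append(form)
```

After the loop, `trace` holds the four forms of the derivation, and only the last one passes `is_sentence`.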
CFG - cont’d.
- Notation: Write αAβ ⇒ αγβ if A → γ.
- Notation: Write α ⇒∗ β to denote that β can be derived from α in zero or more steps.
    L(G) = {α | S ⇒∗ α, α ∈ Vt∗}
- Sentential form: α is a sentential form if S ⇒∗ α; α may contain non-terminals. Example: E + E
- Leftmost derivation: a derivation step α ⇒ β is leftmost if the leftmost non-terminal in α is replaced. Example:
    E ⇒ EAE ⇒ idAE ⇒ id + E ⇒ id + id
  The production sequence discovered by a large class of parsers (the top-down parsers) is a leftmost derivation; hence, these parsers are said to produce a leftmost parse.
- Rightmost derivation: a derivation step α ⇒ β is rightmost if the rightmost non-terminal in α is replaced. Example:
    E ⇒ EAE ⇒ EAid ⇒ E + id ⇒ id + id
  Also called canonical derivation. It corresponds well to an important class of parsers (the bottom-up parsers). In particular, as a bottom-up parser discovers the productions used to derive a token sequence, it discovers a rightmost derivation, but in reverse order: the last production applied is discovered first, while the first production is the last to be discovered.
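The two derivation orders differ only in which non-terminal is rewritten at each step. As a rough illustration (the function `derive` and the constant names are hypothetical, not from the slides), the same production sequence can be replayed leftmost or rightmost:

```python
NONTERMINALS = {"E", "A"}


def derive(start, productions, leftmost=True):
    """Apply each production to the leftmost (or rightmost) nonterminal.

    Returns the list of sentential forms, one per step, as strings.
    """
    form = [start]
    trace = [" ".join(form)]
    for lhs, rhs in productions:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != lhs:
            raise ValueError(f"production for {lhs} does not apply to {form[pos]}")
        form[pos:pos + 1] = rhs          # replace one symbol by the right-hand side
        trace.append(" ".join(form))
    return trace


E_EAE = ("E", ["E", "A", "E"])
E_ID = ("E", ["id"])
A_PLUS = ("A", ["+"])

sequence = [E_EAE, E_ID, A_PLUS, E_ID]
left = derive("E", sequence, leftmost=True)
right = derive("E", sequence, leftmost=False)

# A bottom-up parser would discover the rightmost derivation back to front.
bottom_up_order = list(reversed(right))
```

Here `left` reproduces the leftmost example from the slide and `right` the rightmost one; `bottom_up_order` starts at `id + id` and ends at `E`.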
[Figure: parse tree for a nested if statement. Node labels: Stmt, IfStmt, if ( exp ) Stmt ElseStmt, else Stmt, other.]
[Figure: parse tree for a statement sequence. Node labels: StmtSeq, Stmt ; StmtSeq, s, seq.]
Properties of Context Free Grammars
- Context free grammars that are limited to productions of the form A → a B and C → ε form the class of regular grammars. Languages defined by regular grammars are a proper subset of the context-free languages.
- Why keep lexical analysis separate instead of doing it during parsing?
  - Lexical rules are in general simple.
  - REs are more concise and easier to understand.
  - REs are a domain-specific language, so that an efficient lexical analyzer can be constructed.
  - The front end is separated into two manageable parts. Useful for multi-lingual programming.
- Non-reduced CFGs: a CFG containing nonterminals that are unreachable or derive no terminal string. Example:
    S → A | B
    A → a
    B → B b
    C → c
  Nonterminal C cannot be reached from S. B does not derive any terminal string. Useless nonterminals can be safely removed from a CFG without affecting the language. Reduced grammar:
    S → A
    A → a
  Algorithms exist that check for useless nonterminals.
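One common way to check for useless nonterminals is a two-pass fixed-point computation: first keep only nonterminals that derive some terminal string, then keep only those reachable from S. The slides do not say which algorithm they have in mind, so the sketch below is an assumption; `reduce_grammar` is a hypothetical name, and every nonterminal is assumed to appear as a key of the input dictionary.

```python
def reduce_grammar(productions, start):
    """Remove useless nonterminals.

    productions: dict mapping each nonterminal to a list of right-hand
    sides (tuples of symbols). Symbols that are not keys are terminals.
    """
    nonterminals = set(productions)

    def usable(rhs, good):
        # Every symbol is a terminal or an already accepted nonterminal.
        return all(sym not in nonterminals or sym in good for sym in rhs)

    # Pass 1: nonterminals that derive at least one terminal string.
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in productions.items():
            if lhs not in productive and any(usable(rhs, productive) for rhs in alternatives):
                productive.add(lhs)
                changed = True

    kept = {
        lhs: [rhs for rhs in alternatives if usable(rhs, productive)]
        for lhs, alternatives in productions.items()
        if lhs in productive
    }

    # Pass 2: nonterminals reachable from the start symbol.
    reachable, worklist = set(), [start]
    while worklist:
        nonterminal = worklist.pop()
        if nonterminal in reachable or nonterminal not in kept:
            continue
        reachable.add(nonterminal)
        for rhs in kept[nonterminal]:
            worklist.extend(sym for sym in rhs if sym in kept)

    return {lhs: alts for lhs, alts in kept.items() if lhs in reachable}


# The non-reduced example grammar from the slide.
G = {
    "S": [("A",), ("B",)],
    "A": [("a",)],
    "B": [("B", "b")],
    "C": [("c",)],
}
reduced = reduce_grammar(G, "S")
```

The order of the passes matters: removing unproductive nonterminals first can make further nonterminals unreachable, so reachability is computed on the already pruned grammar.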
Properties of Context Free Grammars - cont’d.
- Left recursion: G is left recursive if, for some nonterminal A, there is a derivation A ⇒+ Aα. Top-down parsing methods cannot handle left-recursive grammars, so left recursion must be eliminated.
- Left factoring: factor out the common left prefixes of productions. Replace A → αβ1 | αβ2 by the rules:
    A → αA′
    A′ → β1 | β2
- Context free grammars are not powerful enough to represent all constructs of programming languages. They cannot describe the following languages:
  - L1 = {wcw | w ∈ (a|b)∗}: conceptually represents the problem of verifying that an identifier is declared before it is used. Such checks are done during the semantic analysis phase.
  - L2 = {a^n b^m c^n d^m | n ≥ 1 ∧ m ≥ 1}: abstracts the problem of checking that the number of formal parameters agrees with the number of actual parameters.
  - L3 = {a^n b^n c^n | n ≥ 0}.
  CFGs can keep count of two items but not three.
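Returning to the two grammar transformations on this slide, both can be sketched in a few lines. The slide gives the left-factoring rule but not a left-recursion rule; the code below assumes the standard textbook rewrite for immediate left recursion (A → Aα | β becomes A → βA′, A′ → αA′ | ε) and does not handle indirect left recursion. The function names and the nonterminal T in the example are hypothetical, and the left factoring shown is a single round that only fires when all alternatives share a prefix.

```python
EPSILON = ()  # the empty right-hand side


def eliminate_immediate_left_recursion(nt, alternatives):
    """A -> A a1 | ... | b1 | ...  becomes  A -> b1 A' | ...,  A' -> a1 A' | ... | epsilon.

    Assumes every recursive alternative has a non-empty tail (no A -> A).
    """
    recursive = [rhs[1:] for rhs in alternatives if rhs[:1] == (nt,)]
    others = [rhs for rhs in alternatives if rhs[:1] != (nt,)]
    if not recursive:
        return {nt: list(alternatives)}
    new_nt = nt + "'"
    return {
        nt: [rhs + (new_nt,) for rhs in others],
        new_nt: [rhs + (new_nt,) for rhs in recursive] + [EPSILON],
    }


def common_prefix(sequences):
    """Longest prefix shared by all the given symbol tuples."""
    prefix = []
    for symbols in zip(*sequences):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor(nt, alternatives):
    """A -> alpha b1 | alpha b2  becomes  A -> alpha A',  A' -> b1 | b2."""
    alpha = common_prefix(alternatives)
    if len(alternatives) < 2 or not alpha:
        return {nt: list(alternatives)}
    new_nt = nt + "'"
    return {
        nt: [alpha + (new_nt,)],
        new_nt: [rhs[len(alpha):] for rhs in alternatives],
    }


# E -> E + T | T   (T is a hypothetical nonterminal for illustration)
no_left_recursion = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])

# S -> if E then S else S | if E then S
factored = left_factor(
    "S",
    [("if", "E", "then", "S", "else", "S"), ("if", "E", "then", "S")],
)
```

In the second example the shorter alternative is a prefix of the longer one, so the new nonterminal S' gets an empty alternative, which corresponds to the optional else part.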
Extended BNF (EBNF)
- Extend BNF by adding more meta-notation ⟹ shorter productions
- Nonterminals begin with uppercase letters (discard <>)
- Terminals that are grammar symbols (’[’ for instance) are enclosed in ‘’.
- Repetitions (zero or more) are enclosed in {}
- Options are enclosed in []
- Use () to group items together:
    Exp ::= Item {+ Item} | Item {- Item} ⟹ Exp ::= Item {(+|-) Item}
Conversion from EBNF to BNF and Vice Versa
- BNF to EBNF:
  i) Look for recursion in the grammar:
       A ::= a A | B ⟹ A ::= { a } B
  ii) Look for a common string that can be factored out with grouping and options:
       A ::= a B | a ⟹ A ::= a [B]
- EBNF to BNF:
  i) Options []:
       A ::= a [B] C ⟹ A’ ::= a N C
                        N ::= B | ε
  ii) Repetition {}:
       A ::= a { B1 B2 ... Bn } C ⟹ A’ ::= a N C
                                     N ::= B1 B2 ... Bn N | ε
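The EBNF to BNF direction is mechanical enough to sketch in code. The representation below is an assumption made for illustration: a right-hand side is a list whose items are either symbol strings, `("opt", items)` for `[ ... ]`, or `("rep", items)` for `{ ... }`. The function name `ebnf_to_bnf` and the generated names `N1`, `N2`, ... are hypothetical, the sketch assumes those names do not clash with existing nonterminals, and it keeps the original left-hand side name rather than a primed one. An empty list stands for ε.

```python
from itertools import count


def ebnf_to_bnf(rules):
    """Rewrite options and repetitions into plain BNF rules.

    rules: dict mapping a nonterminal to a list of alternatives.
    Returns a dict in the same shape with only symbol strings on the
    right-hand sides; [] is the empty alternative.
    """
    bnf = {}
    fresh = count(1)

    def lower(items):
        out = []
        for item in items:
            if isinstance(item, str):
                out.append(item)
                continue
            kind, body = item
            name = f"N{next(fresh)}"
            inner = lower(body)                 # handle nested constructs
            if kind == "opt":                   # N ::= body | epsilon
                bnf[name] = [inner, []]
            elif kind == "rep":                 # N ::= body N | epsilon
                bnf[name] = [inner + [name], []]
            else:
                raise ValueError(f"unknown EBNF construct: {kind}")
            out.append(name)
        return out

    for lhs, alternatives in rules.items():
        bnf[lhs] = [lower(alternative) for alternative in alternatives]
    return bnf


# A ::= a [B] C
with_option = ebnf_to_bnf({"A": [["a", ("opt", ["B"]), "C"]]})

# A ::= a { B1 B2 } C
with_repetition = ebnf_to_bnf({"A": [["a", ("rep", ["B1", "B2"]), "C"]]})
```

Both results follow the two conversion rules above: the bracketed part is replaced by a new nonterminal, and that nonterminal gets either `B | ε` or `B1 B2 N | ε` as its alternatives.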