Context Free Grammar: Syntactic Analysis in Compiler Design, Slides of Theory of Automata

An introduction to Context Free Grammar (CFG), a fundamental concept in compiler design. CFG is used for syntactic analysis, which involves analyzing the syntactic structure of a program to check for errors. The slides cover the role of a parser, the approach to constructing a parser using CFG, and the derivation of strings from a CFG. They also cover the notation and properties of CFGs, as well as the differences between CFGs and regular grammars.

Typology: Slides

2021/2022

Uploaded on 07/05/2022

carol_78
Syntactic Analysis
Introduction
ISecond phase of the compiler.
IMain task:
IAnalyze syntactic structure of program and its components
Ito check these for errors.
IRole of parser:
Lexical
Analyzer
Rest of
Front end
Parser
Symbol
Table
source
tree
parse
req
token
IR
IApproach to constructing parser: similar to lexical analyzer
IRepresent source language by a meta-language, Context Free Grammar
IUse algorithms to construct a recognizer that recognizes strings generated by the
grammar.
This step can be automated for certain classes of grammars. One such tool:
YACC.
IParse strings of language using the recognizer.
1/1


Context Free Grammar (CFG)

- Syntax analysis is based on the theory of automata and formal languages, specifically the equivalence of two mechanisms: context free grammars and pushdown automata.
- Context free grammars are used to describe the syntactic structure of programs in a programming language. They describe what elementary constructs there are and how composite constructs can be built from other constructs.
      Stmt → if (Expr) Stmt else Stmt
  Note the recursive nature of the definition.
- Formally, a CFG has four components:
  a) a set of tokens Vt, called terminal symbols (the token set produced by the scanner); examples: if, then, identifier, etc.
  b) a set of intermediate symbols Vn, called non-terminals, syntactic categories, or syntactic variables,
  c) a start symbol, S ∈ Vn, and
  d) a set of productions P of the form A → X1 ··· Xm, where A ∈ Vn, Xi ∈ (Vn ∪ Vt), 1 ≤ i ≤ m, m ≥ 0.
- Sentences are generated by starting with S and applying productions until nothing but terminals is left.
- The set of strings derivable from a CFG G comprises the context free language, denoted L(G).
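The four-component definition maps naturally onto a small data structure. The following is a minimal sketch, not part of the original slides: the class name `Grammar`, the method `is_well_formed`, and the extra productions for `Stmt → other` and `Expr → id` are assumptions added only so that the example is complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # Vt, the token set
    nonterminals: frozenset   # Vn, the syntactic categories
    start: str                # S, must belong to Vn
    productions: tuple        # pairs (A, (X1, ..., Xm)) with m >= 0

    def is_well_formed(self) -> bool:
        """Check the side conditions of the four-component definition."""
        symbols = self.terminals | self.nonterminals
        return (
            self.terminals.isdisjoint(self.nonterminals)
            and self.start in self.nonterminals
            and all(
                lhs in self.nonterminals and all(x in symbols for x in rhs)
                for lhs, rhs in self.productions
            )
        )


# Hypothetical grammar built around the if-statement production.
# The last two productions are assumed for illustration.
STMT_GRAMMAR = Grammar(
    terminals=frozenset({"if", "else", "(", ")", "other", "id"}),
    nonterminals=frozenset({"Stmt", "Expr"}),
    start="Stmt",
    productions=(
        ("Stmt", ("if", "(", "Expr", ")", "Stmt", "else", "Stmt")),
        ("Stmt", ("other",)),
        ("Expr", ("id",)),
    ),
)
```

An empty tuple as a right-hand side represents the case m = 0.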

Context Free Grammar (CFG) - cont’d.

- Notations:

  1. Nonterminals: uppercase letters such as A, B, C.
  2. Terminals: lower-case letters such as a, b, c, operators +, −, etc., punctuation, digits, and boldface strings such as id.
  3. Nonterminals or terminals: uppercase letters late in the alphabet, such as X, Y, Z.
  4. Strings of terminals: lower-case letters late in the alphabet, such as x, y, z.
  5. Strings of grammar symbols: lower-case Greek letters α, β, etc.
  6. Write A → α1, A → α2, etc. as A → α1 | α2 | ···

- Example:

      E → E A E | ( E ) | − E | id
      A → + | − | ∗ | / | ↑

- Derivation of strings: a production can be thought of as a rewrite rule in which the nonterminal on the left is replaced by the string on the right side.

  Notation: write such a replacement as E ⇒ (E).

  Example:

      E ⇒ −E ⇒ −(E) ⇒ −(id)

CFG - cont’d.

- Notation: write αAβ ⇒ αγβ if A → γ.
- Notation: write α ⇒∗ β to denote that β can be derived from α in zero or more steps.
      L(G) = {w | S ⇒∗ w and w consists of terminals only}
- Sentential form: α is a sentential form if S ⇒∗ α; α may contain non-terminals. Example: E + E
- Leftmost derivation: a derivation step α ⇒ β is leftmost if the leftmost nonterminal in α is replaced. Example:
      E ⇒ EAE ⇒ idAE ⇒ id + E ⇒ id + id
  The production sequence discovered by a large class of parsers (the top-down parsers) is a leftmost derivation; hence, these parsers are said to produce a leftmost parse.
- Rightmost derivation: a derivation step α ⇒ β is rightmost if the rightmost nonterminal in α is replaced. Example:
      E ⇒ EAE ⇒ EAid ⇒ E + id ⇒ id + id
  Also called canonical derivation. Corresponds well to an important class of parsers (the bottom-up parsers). In particular, as a bottom-up parser discovers the productions used to derive a token sequence, it discovers a rightmost derivation, but in reverse order: the last production applied is discovered first, while the first production is the last to be discovered.
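The two derivation orders can be replayed mechanically. The sketch below is an illustration rather than part of the slides: the helper names `derive` and `derivation` are hypothetical, and sentential forms are assumed to be lists of symbols over the example grammar with nonterminals E and A.

```python
NONTERMINALS = {"E", "A"}


def derive(form, production, leftmost=True):
    """Rewrite the leftmost (or rightmost) nonterminal of `form` using `production`."""
    lhs, rhs = production
    positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
    if not positions:
        raise ValueError("no nonterminal left to rewrite")
    i = positions[0] if leftmost else positions[-1]
    if form[i] != lhs:
        raise ValueError(f"production for {lhs!r} does not apply to {form[i]!r}")
    return form[:i] + list(rhs) + form[i + 1:]


def derivation(productions, leftmost=True):
    """Return every sentential form reached from the start symbol E."""
    forms = [["E"]]
    for production in productions:
        forms.append(derive(forms[-1], production, leftmost))
    return [" ".join(form) for form in forms]


E_EAE = ("E", ["E", "A", "E"])
E_ID = ("E", ["id"])
A_PLUS = ("A", ["+"])

# The same production sequence, applied leftmost-first and rightmost-first.
left = derivation([E_EAE, E_ID, A_PLUS, E_ID], leftmost=True)
right = derivation([E_EAE, E_ID, A_PLUS, E_ID], leftmost=False)
```

Both runs end in `id + id`; they differ only in the intermediate sentential forms, which match the two example derivations for this grammar.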

Parse Tree - Examples

- Parse tree for string: if (0) other else other

      Stmt
        IfStmt
          if
          (
          exp
            0
          )
          Stmt
            other
          ElseStmt
            else
            Stmt
              other

  Simplified tree:

      if
        0
        other
        other

- Parse tree for string: s;s;s

      StmtSeq
        Stmt
          s
        ;
        StmtSeq
          Stmt
            s
          ;
          StmtSeq
            s

  Simplified tree:

      seq
        s
        s
        s

Properties of Context Free Grammars

- Context free grammars that are limited to productions of the form A → a B and C → ε form the class of regular grammars. Languages defined by regular grammars are a proper subset of the context-free languages.
- Why not perform lexical analysis during parsing?
  - Lexical rules are in general simple.
  - Regular expressions (REs) are more concise and easier to understand.
  - REs are a domain specific language, so an efficient lexical analyzer can be constructed from them.
  - Separating lexical analysis from parsing gives two manageable parts. Useful for multi-lingual programming.
- Non-reduced CFGs: a CFG containing nonterminals that are unreachable or derive no terminal string. Example:
      S → A | B
      A → a
      B → B b
      C → c
  Nonterminal C cannot be reached from S. B does not derive any terminal string. Useless nonterminals can be safely removed from a CFG without affecting the language. Reduced grammar:
      S → A
      A → a
  Algorithms exist that check for useless nonterminals.
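One common way to reduce a grammar is a two-pass fixed-point computation: first keep the nonterminals that derive some terminal string, then keep those reachable from the start symbol. The sketch below assumes a grammar stored as a dictionary from nonterminal to a list of right-hand sides; the function name `reduce_grammar` is hypothetical, and the slides do not say which algorithm they have in mind.

```python
def reduce_grammar(productions, start):
    """Drop nonterminals that derive no terminal string or are unreachable.

    productions: dict mapping each nonterminal to a list of right-hand
    sides, each a tuple of symbols. Symbols that are not keys are terminals.
    """
    nonterminals = set(productions)

    def usable(rhs, productive):
        return all(x in productive or x not in nonterminals for x in rhs)

    # Pass 1: find nonterminals that derive at least one terminal string.
    productive = set()
    changed = True
    while changed:
        changed = False
        for a, alternatives in productions.items():
            if a not in productive and any(usable(r, productive) for r in alternatives):
                productive.add(a)
                changed = True

    kept = {
        a: [r for r in alternatives if usable(r, productive)]
        for a, alternatives in productions.items()
        if a in productive
    }

    # Pass 2: keep only what is reachable from the start symbol.
    reachable, stack = set(), [start]
    while stack:
        a = stack.pop()
        if a in reachable or a not in kept:
            continue
        reachable.add(a)
        for rhs in kept[a]:
            stack.extend(x for x in rhs if x in nonterminals)

    return {a: alternatives for a, alternatives in kept.items() if a in reachable}


NON_REDUCED = {
    "S": [("A",), ("B",)],
    "A": [("a",)],
    "B": [("B", "b")],
    "C": [("c",)],
}
REDUCED = reduce_grammar(NON_REDUCED, "S")
```

The order of the passes matters: removing non-productive nonterminals first can make further nonterminals unreachable, so reachability is checked last.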

Properties of Context Free Grammars - cont’d.

- Left recursion: G is left recursive if for a nonterminal A there is a derivation A ⇒+ Aα. Top-down parsing methods cannot handle left-recursive grammars, so left recursion must be eliminated.
- Left factoring: factor out the common left prefixes of productions. Replace A → αβ1 | αβ2 by the rules:
      A → αA′
      A′ → β1 | β2
- Context free grammars are not powerful enough to represent all constructs of programming languages. They cannot describe the following languages:
  - L1 = {wcw | w ∈ (a|b)∗}: conceptually represents the problem of verifying that an identifier is declared before it is used. Such checks are done during the semantic analysis phase.
  - L2 = {a^n b^m c^n d^m | n ≥ 1 ∧ m ≥ 1}: abstracts the problem of checking that the number of formal parameters agrees with the number of actual parameters.
  - L3 = {a^n b^n c^n | n ≥ 0}.
  CFGs can keep count of two items but not three.
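Both transformations are short enough to sketch in code. The slides give the left-factoring rule but not a method for removing left recursion, so the first function uses the standard textbook rewrite for immediate left recursion (A → Aα | β becomes A → βA′, A′ → αA′ | ε); treating that as the intended method is an assumption. The function names are hypothetical, and the left-factoring version is simplified to the case where all alternatives share one prefix.

```python
EPSILON = ()  # the empty right-hand side


def eliminate_immediate_left_recursion(a, alternatives):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon."""
    new = a + "'"
    recursive = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == a]
    others = [rhs for rhs in alternatives if not rhs or rhs[0] != a]
    if not recursive:
        return {a: list(alternatives)}  # nothing to do
    return {
        a: [beta + (new,) for beta in others],
        new: [alpha + (new,) for alpha in recursive] + [EPSILON],
    }


def left_factor(a, alternatives):
    """Replace A -> alpha beta1 | alpha beta2 by A -> alpha A', A' -> beta1 | beta2.

    Simplification: alpha is the longest prefix common to ALL alternatives.
    """
    prefix = []
    for column in zip(*alternatives):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    if not prefix or len(alternatives) < 2:
        return {a: list(alternatives)}
    new = a + "'"
    return {
        a: [tuple(prefix) + (new,)],
        new: [rhs[len(prefix):] for rhs in alternatives],
    }


# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
no_left_rec = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])

# A -> a B | a     becomes   A -> a A',  A' -> B | epsilon
factored = left_factor("A", [("a", "B"), ("a",)])
```

Indirect left recursion (A ⇒+ Aα through other nonterminals) needs a more general procedure that this sketch does not cover.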

Properties of Context Free Grammars - cont’d.

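- Context free grammars can capture some of the language semantics as well.

- Example grammar (the nonterminal names are missing from the source; <exp>, <term>, and <digit> are assumed):

      <exp> ::= <exp> + <term> | <term>
      <term> ::= <term> * <digit> | ‘(’ <exp> ‘)’ | <digit>
      <digit> ::= 0 | 1 | ··· | 9

- Precedence of * over +: obtained by deriving * lower in the parse tree.

- Left recursion:

      <exp> ::= <exp> + <term>

  gives left associativity of +.

- Right recursion:

      <exp> ::= <term> + <exp>

  gives right associativity of +.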
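The effect of the recursion direction on associativity can be seen by building the trees that each form of the rule produces. This is a minimal sketch under assumptions: operands are plain integers, trees are nested tuples, the function names are hypothetical, and the `-` operator is added purely for illustration because, unlike `+`, it makes the two groupings evaluate differently.

```python
def parse_left(tokens):
    """Tree shape produced by a left-recursive rule: operators group to the left."""
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        tree = (tree, tokens[i], tokens[i + 1])
    return tree


def parse_right(tokens):
    """Tree shape produced by a right-recursive rule: operators group to the right."""
    if len(tokens) == 1:
        return tokens[0]
    return (tokens[0], tokens[1], parse_right(tokens[2:]))


def evaluate(tree):
    """Evaluate a nested-tuple tree bottom-up."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == "+" else l - r


plus = [1, "+", 2, "+", 3]
minus = [8, "-", 3, "-", 2]

left_tree = parse_left(plus)     # ((1, '+', 2), '+', 3)
right_tree = parse_right(plus)   # (1, '+', (2, '+', 3))
```

For `minus`, the left-grouped tree evaluates to (8 − 3) − 2 = 3 and the right-grouped tree to 8 − (3 − 2) = 7, which is why the choice of recursion in the grammar carries semantic weight.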

Extended BNF (EBNF)

- Extend BNF by adding more meta-notation ⟹ shorter productions.
- Nonterminals begin with uppercase letters (discard <>).
- Terminals that are grammar symbols (‘[’ for instance) are enclosed in ‘’.
- Repetitions (zero or more) are enclosed in {}.
- Options are enclosed in [].
- Use () to group items together:
      Exp ::= Item {+ Item} | Item {- Item} ⟹ Exp ::= Item {(+|-) Item}

Conversion from EBNF to BNF and Vice Versa

- BNF to EBNF:
  i) Look for recursion in the grammar:
      A ::= a A | B ⟹ A ::= { a } B
  ii) Look for a common string that can be factored out with grouping and options:
      A ::= a B | a ⟹ A ::= a [B]
- EBNF to BNF:
  i) Options []:
      A ::= a [B] C ⟹ A’ ::= a N C
                       N ::= B | ε
  ii) Repetition {}:
      A ::= a { B1 B2 ... Bn } C ⟹ A’ ::= a N C
                                    N ::= B1 B2 ... Bn N | ε
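The EBNF-to-BNF direction can be sketched as a small recursive rewrite. The representation is an assumption made for this example: a right-hand side is a list whose items are either symbol strings or pairs `("opt", body)` for [...] and `("rep", body)` for {...}. The function name `ebnf_to_bnf` and the fresh names N1, N2, ... are hypothetical, and the rewritten rule keeps the name A rather than A’.

```python
def ebnf_to_bnf(lhs, rhs):
    """Return BNF productions as a dict: nonterminal -> list of right-hand sides.

    An empty list as a right-hand side stands for epsilon.
    """
    rules = {lhs: []}
    counter = 0

    def convert(items):
        nonlocal counter
        out = []
        for item in items:
            if isinstance(item, str):
                out.append(item)
                continue
            kind, body = item
            counter += 1
            fresh = f"N{counter}"
            inner = convert(body)
            if kind == "opt":      # [body]  becomes  N ::= body | epsilon
                rules[fresh] = [inner, []]
            elif kind == "rep":    # {body}  becomes  N ::= body N | epsilon
                rules[fresh] = [inner + [fresh], []]
            else:
                raise ValueError(f"unknown EBNF construct: {kind!r}")
            out.append(fresh)
        return out

    rules[lhs].append(convert(rhs))
    return rules


# A ::= a [B] C
option = ebnf_to_bnf("A", ["a", ("opt", ["B"]), "C"])

# A ::= a { B1 B2 } C
repetition = ebnf_to_bnf("A", ["a", ("rep", ["B1", "B2"]), "C"])
```

Note that the repetition rule is right recursive, so the resulting BNF stays usable by top-down parsers.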