CS 373: Theory of Computation

Manoj Prabhakaran Mahesh Viswanathan

Fall 2008

1 Normal Forms for CFG

Normal Forms for Grammars

It is typically easier to work with a context-free language if it is given by a CFG in a normal form.

Normal Forms. A grammar is in a normal form if its production rules have a special structure:

  • Chomsky Normal Form: Productions are of the form A → BC or A → a
  • Greibach Normal Form: Productions are of the form A → aα, where α ∈ V∗

If ε is in the language, we allow the rule S → ε. We will require that S does not appear on the right-hand side of any rule.

  • Today: How to convert any context-free grammar to an equivalent grammar in the Chomsky Normal Form
  • We will start with a series of simplifications...

2 Three Simplifications

2.1 Eliminating ε-productions

Eliminating ε-productions

  • We would often like to ensure that no intermediate string in a derivation is longer than the final string derived.
  • But a long intermediate string can lead to a short final string if there are ε-productions (rules of the form A → ε).
  • Can we rewrite the grammar so that it has no ε-productions?

Eliminating ε-productions. Given a grammar G, produce an equivalent grammar G′ (i.e., L(G) = L(G′)) such that G′ has no rules of the form A → ε, except possibly S → ε, and S does not appear on the right-hand side of any rule.

Note: If S can appear on the RHS of a rule, say S → SS, then when there is the rule S → ε, we can again have long intermediate strings yielding short final strings.

Eliminating ε-productions

Definition (Nullable Variables). A variable A (of grammar G) is nullable if A ⇒∗ ε.

How do you determine if a variable is nullable?
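One common way to answer this is a fixed-point computation: mark A as nullable whenever some rule for A has a right-hand side made up entirely of nullable symbols, and repeat until nothing changes. The following is a minimal sketch under assumptions that are not part of the notes: the function name `nullable_variables`, the dictionary representation of a grammar, and the small example grammar are all chosen here for illustration.

```python
def nullable_variables(productions):
    """Return the set of variables A with A =>* epsilon.

    `productions` maps each variable to a list of right-hand sides,
    each a tuple of symbols; the empty tuple () stands for epsilon.
    (This representation is an assumption made for the sketch.)
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head in nullable:
                continue
            # head is nullable if some body consists only of nullable symbols;
            # all() over the empty body is True, which covers A -> epsilon.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# Hypothetical example grammar: S -> AB, A -> AaA | epsilon, B -> BbB | epsilon
example = {
    "S": [("A", "B")],
    "A": [("A", "a", "A"), ()],
    "B": [("B", "b", "B"), ()],
}
print(sorted(nullable_variables(example)))  # ['A', 'B', 'S']
```

Terminals never enter the set, so any right-hand side containing a terminal can never make its head nullable.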

  • S → AB | A | B
  • A → AaA | aA | Aa | a
  • B → BbB | bB | Bb | b
  • S′ → S | ε
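The four rules listed above are what one would expect from ε-elimination on the input S → AB, A → AaA | ε, B → BbB | ε. That input grammar is not shown in these notes, so treat it as an assumption. The sketch below uses the usual construction: for every rule, emit each variant obtained by keeping or dropping each nullable symbol, discard empty right-hand sides, and add a fresh start symbol. The function name, the grammar representation, and the fresh name `S'` are illustrative choices, not taken from the notes.

```python
from itertools import product


def eliminate_epsilon(productions, start):
    """Return (new_productions, new_start) with no epsilon-rules except
    possibly new_start -> epsilon. Bodies are tuples; () is epsilon."""
    # Nullable variables, by fixed-point iteration.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True

    new = {}
    for head, bodies in productions.items():
        out = set()
        for body in bodies:
            # Each nullable symbol may be kept or dropped; others are kept.
            choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for pick in product(*choices):
                rhs = tuple(sym for part in pick for sym in part)
                if rhs:  # never emit head -> epsilon
                    out.add(rhs)
        new[head] = out

    # Fresh start symbol (assumes start + "'" is not already a variable).
    new_start = start + "'"
    new[new_start] = {(start,)}
    if start in nullable:
        new[new_start].add(())
    return new, new_start


grammar = {
    "S": [("A", "B")],
    "A": [("A", "a", "A"), ()],
    "B": [("B", "b", "B"), ()],
}
result, new_start = eliminate_epsilon(grammar, "S")
for head in ("S", "A", "B", new_start):
    print(head, "->", sorted(result[head]))
```

This sketch can produce trivial rules such as A → A (for instance from A → AA); they are harmless and disappear when unit productions are eliminated.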

2.2 Eliminating Unit Productions

Eliminating Unit Productions

  • We would often like to ensure that the number of steps in a derivation is not much more than the length of the string derived.
  • But there can be a long chain of derivation steps that make little or no “progress” if the grammar has unit productions (rules of the form A → B, where B is a non-terminal).
      - Note: A → a is not a unit production.
  • Can we rewrite the grammar so that it has no unit productions?

Eliminating unit productions. Given a grammar G, produce an equivalent grammar G′ (i.e., L(G) = L(G′)) such that G′ has no rules of the form A → B where B ∈ V′.

Eliminating Unit Productions

Unit Productions. Unit productions can play an important role in designing grammars:

  • While eliminating ε-productions we added a rule S′ → S. This is a unit production.
  • We have used unit productions in building an unambiguous grammar:

I → a | b | Ia | Ib
N → 0 | 1 | N0 | N1 | −N | +N
F → I | N | (E)
T → F | T ∗ F
E → T | E + T

But as we shall see now, they can be (safely) eliminated.

Eliminating Unit Productions

Basic Idea. Introduce new “look-ahead” productions to replace unit productions: look ahead to see where the unit production (or a chain of unit productions) leads, and add a rule that goes there directly.

Example 2. E → T → F → I → a | b | Ia | Ib. So introduce new rules E → a | b | Ia | Ib.

But what if the grammar has cycles of unit productions? For example, A → B|a, B → C|b and C → A|c. You cannot use the “look-ahead” approach, because then you will get into an infinite loop.

Eliminating Unit Productions

Basic Idea: Fixed

Algorithm

  1. Determine pairs 〈A, B〉 such that A ⇒∗ᵤ B, i.e., A derives B using only unit rules. Such pairs are called unit pairs.
      - Easy to determine unit pairs: make a directed graph with vertices = V and edges = unit productions. 〈A, B〉 is a unit pair if there is a directed path from A to B in the graph.
  2. If 〈A, B〉 is a unit pair, then add production rules A → β₁ | β₂ | ··· | βₖ, where B → β₁ | β₂ | ··· | βₖ are all the non-unit production rules of B.
  3. Remove all unit production rules.

Let G′ be the grammar obtained from G using this algorithm. Then L(G′) = L(G).
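The three steps of the algorithm can be sketched as follows, using the cyclic example A → B | a, B → C | b, C → A | c from earlier. This is only an illustration under stated assumptions: the function name, the dictionary-of-tuples grammar representation, and the requirement that every variable appears as a key are choices made here, not part of the notes. The graph search also counts the trivial path from A to itself, so each variable keeps its own non-unit rules.

```python
def eliminate_unit_productions(productions):
    """Return an equivalent grammar without unit productions.

    `productions` maps every variable to a collection of bodies (tuples
    of symbols). Any symbol that is not a key is treated as a terminal.
    """
    variables = set(productions)

    def is_unit(body):
        return len(body) == 1 and body[0] in variables

    # Step 1: unit pairs, via depth-first search along unit-production edges.
    # Marking visited vertices is what prevents looping on cycles.
    unit_pairs = {}
    for a in variables:
        reached, stack = {a}, [a]
        while stack:
            x = stack.pop()
            for body in productions[x]:
                if is_unit(body) and body[0] not in reached:
                    reached.add(body[0])
                    stack.append(body[0])
        unit_pairs[a] = reached

    # Steps 2 and 3: A gets every non-unit body of every B with <A, B> a
    # unit pair; unit bodies are simply never copied.
    return {
        a: {
            body
            for b in unit_pairs[a]
            for body in productions[b]
            if not is_unit(body)
        }
        for a in variables
    }


cyclic = {
    "A": [("B",), ("a",)],
    "B": [("C",), ("b",)],
    "C": [("A",), ("c",)],
}
print(sorted(eliminate_unit_productions(cyclic)["A"]))  # [('a',), ('b',), ('c',)]
```

Because A, B, and C lie on one cycle of unit productions, all three end up with the same rules a | b | c.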

Eliminating Unit Productions L(G) = L(G′): Proof

  • L(G′) ⊆ L(G): For every rule A → w in G′, we have A ⇒∗ w in G (by a sequence of zero or more unit productions followed by a non-unit production of G).
  • L(G) ⊆ L(G′): For w ∈ L(G), consider a leftmost derivation S ⇒∗ w in G.
  • All these derivation steps are possible in G′ also, except the ones using the unit productions of G.
  • Suppose S ⇒∗ xAα ⇒₁ xBα ⇒₂ ···, where ⇒₁ corresponds to a unit rule. Then (in a leftmost derivation) ⇒₂ must correspond to using a rule for B.
  • So a leftmost derivation of w in G can be broken up into “big-steps,” each consisting of zero or more unit productions on the leftmost variable, followed by a non-unit production.
  • For each such “big-step” there is a single production rule in G′ that yields the same result.

2.3 Eliminating Useless Symbols

Neither step of the algorithm removes a useful symbol (why?). It only remains to show how to carry out the two steps of this algorithm.

Eliminating Useless Symbols

Generating and Reachable Symbols

The set of generating symbols

  • If A → x, where x ∈ Σ∗, is a production then A is generating
  • If A → γ is a production and all variables in γ are generating, then A is generating.

The set of reachable symbols

  • S is reachable
  • If A is reachable and A → αBβ is a production, then B is reachable

Fixed point algorithm: Propagate the label (generating or reachable) until no change.
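The two fixed-point computations can be sketched as below. The sketch assumes the usual order for removing useless symbols (first drop everything that is not generating, then drop everything that is not reachable), which these notes do not spell out. Function names and the grammar representation are illustrative, and terminals are counted as generating symbols.

```python
def generating_symbols(productions, terminals):
    """Propagate the 'generating' label until no change."""
    generating = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in generating and any(
                all(s in generating for s in body) for body in bodies
            ):
                generating.add(head)
                changed = True
    return generating


def reachable_symbols(productions, start):
    """Propagate the 'reachable' label from the start symbol."""
    reachable, stack = {start}, [start]
    while stack:
        head = stack.pop()
        for body in productions.get(head, ()):
            for s in body:
                if s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return reachable


def eliminate_useless(productions, terminals, start):
    """Assumed order: remove non-generating symbols, then unreachable ones."""
    gen = generating_symbols(productions, terminals)
    kept = {
        head: {body for body in bodies if all(s in gen for s in body)}
        for head, bodies in productions.items()
        if head in gen
    }
    reach = reachable_symbols(kept, start)
    return {head: bodies for head, bodies in kept.items() if head in reach}


# Hypothetical example: S -> AB | a, A -> b, B -> B (B generates nothing)
g = {"S": {("A", "B"), ("a",)}, "A": {("b",)}, "B": {("B",)}}
print(eliminate_useless(g, {"a", "b"}, "S"))  # {'S': {('a',)}}
```

In this example A is reachable in the original grammar, but only through the rule S → AB, which disappears once B is found to be non-generating. Running the reachability step first would leave A behind.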

2.4 Putting Together the Three Simplifications

The Three Simplifications, Together

Given a grammar G such that L(G) ≠ ∅, we can find a grammar G′ such that L(G′) = L(G) and G′ has no ε-productions (except possibly S → ε), unit productions, or useless symbols, and S does not appear in the RHS of any rule.

Proof. Apply the following 3 steps in order:

  1. Eliminate ε-productions
  2. Eliminate unit productions
  3. Eliminate useless symbols.

Note: Applying the steps in a different order may result in a grammar not having all the desired properties.

3 Chomsky Normal Form

Chomsky Normal Form

Proposition 3. For any non-empty context-free language L, there is a grammar G such that L(G) = L and each rule in G is of the form

  1. A → a where a ∈ Σ, or
  2. A → BC where neither B nor C is the start symbol, or
  3. S → ε where S is the start symbol (iff ε ∈ L)

Furthermore, G has no useless symbols.
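As a quick sanity check, the three rule shapes in the proposition can be tested mechanically. The sketch below checks only the shape of each rule and does not check for useless symbols. The function name and the grammar representation (a dictionary from variables to tuples of symbols, with () for ε) are assumptions made for illustration.

```python
def is_cnf(productions, terminals, start):
    """Check that every rule is A -> a, A -> BC (B, C not the start
    symbol), or start -> epsilon. Does not check for useless symbols."""
    variables = set(productions)
    for head, bodies in productions.items():
        for body in bodies:
            if len(body) == 0:
                ok = head == start
            elif len(body) == 1:
                ok = body[0] in terminals
            elif len(body) == 2:
                ok = all(s in variables and s != start for s in body)
            else:
                ok = False
            if not ok:
                return False
    return True


# Hypothetical grammar for { a^n b^n : n >= 0 } written in this form.
cnf = {
    "S": {(), ("A", "B"), ("A", "T")},
    "T": {("U", "B")},
    "U": {("A", "B"), ("A", "T")},
    "A": {("a",)},
    "B": {("b",)},
}
print(is_cnf(cnf, {"a", "b"}, "S"))                          # True
print(is_cnf({"S": {("a", "S", "b"), ()}}, {"a", "b"}, "S"))  # False
```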

Chomsky Normal Form

Outline of Normalization. Given G = (V, Σ, S, P), convert to CNF.

  • Let G′ = (V′, Σ, S, P′) be the grammar obtained after eliminating ε-productions, unit productions, and useless symbols from G.
  • If A → x is a rule of G′, where |x| = 0, then A must be S (because G′ has no other ε-productions). If A → x is a rule of G′, where |x| = 1, then x ∈ Σ (because G′ has no unit productions). In either case A → x is in a valid form.
  • All remaining productions are of the form A → X₁X₂···Xₙ where Xᵢ ∈ V′ ∪ Σ, n ≥ 2 (and S does not occur in the RHS). We will put these rules in the right form by applying the following two transformations:
      1. Make the RHS consist only of variables
      2. Make the RHS be of length 2.

Chomsky Normal Form

Make the RHS consist only of variables. Let A → X₁X₂···Xₙ, with each Xᵢ being either a variable or a terminal. We want rules where all the Xᵢ are variables.

Example 4. Consider A → BbCdefG. How do you remove the terminals? For each a, b, c, ... ∈ Σ add variables Xa, Xb, Xc, ... with productions Xa → a, Xb → b, .... Then replace the production A → BbCdefG by A → BXbCXdXeXfG.

For every a ∈ Σ

  1. Add a new variable Xa
  2. In every rule whose RHS has length at least 2, if a occurs in the RHS, replace it by Xa
  3. Add a new rule Xa → a
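These steps can be sketched as follows, using the rule A → BbCdefG from Example 4. Two choices here are assumptions rather than part of the notes: the new variables are named `X_a`, `X_b`, and so on (assumed not to clash with existing variables), and a new variable is introduced only for terminals that actually occur in a right-hand side of length at least 2, so that no unreachable variable is created and rules of the form A → a are left alone.

```python
def lift_terminals(productions, terminals):
    """Replace terminals inside long right-hand sides by fresh variables."""
    new_productions = {}
    lifted = set()  # terminals that were replaced somewhere
    for head, bodies in productions.items():
        new_bodies = set()
        for body in bodies:
            if len(body) >= 2:
                lifted.update(s for s in body if s in terminals)
                body = tuple("X_" + s if s in terminals else s for s in body)
            new_bodies.add(body)  # bodies of length 0 or 1 are unchanged
        new_productions[head] = new_bodies
    for a in lifted:
        new_productions["X_" + a] = {(a,)}  # the new rule X_a -> a
    return new_productions


# The rule from Example 4, plus a rule A -> a that is already in valid form.
grammar = {"A": {("B", "b", "C", "d", "e", "f", "G"), ("a",)}}
lifted_grammar = lift_terminals(grammar, set("abdef"))
print(sorted(lifted_grammar))  # ['A', 'X_b', 'X_d', 'X_e', 'X_f']
```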

Chomsky Normal Form

Make the RHS be of length 2

  • Now all productions are of the form A → a or A → B₁B₂···Bₙ, where n ≥ 2 and each Bᵢ is a variable.
  • How do you eliminate rules of the form A → B₁B₂···Bₙ where n > 2?
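The notes break off at this question. One standard answer, offered here as a sketch and not necessarily the construction the lecture goes on to give, is to chain fresh variables: replace A → B₁B₂···Bₙ by A → B₁Y₁, Y₁ → B₂Y₂, and so on, ending with a rule whose RHS is Bₙ₋₁Bₙ. The function name and the fresh names `Y1`, `Y2`, ... are hypothetical and assumed not to clash with existing variables.

```python
def binarize(productions):
    """Split every right-hand side longer than 2 into a chain of rules
    with right-hand sides of length exactly 2."""
    new = {head: set() for head in productions}
    counter = 0
    for head, bodies in productions.items():
        for body in bodies:
            current = head
            while len(body) > 2:
                counter += 1
                fresh = f"Y{counter}"  # hypothetical fresh variable name
                # current -> (first symbol) (fresh variable for the rest)
                new.setdefault(current, set()).add((body[0], fresh))
                current, body = fresh, body[1:]
            new.setdefault(current, set()).add(body)
    return new


# A -> B C D E becomes A -> B Y1, Y1 -> C Y2, Y2 -> D E
print(binarize({"A": {("B", "C", "D", "E")}}))
```

Each rule of length n > 2 contributes n − 2 fresh variables, so the grammar grows only linearly in the total length of its rules.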