# Regular Expressions Three - Automata and Complexity Theory - Lecture Slides, Slides for Theory of Automata. Bidhan Chandra Krishi Viswa Vidyalaya


Chapter Seven: Regular Expressions

Docsity.com

The first time a young student sees the mathematical constant π, it looks like just one more school artifact: one more arbitrary symbol whose definition to memorize for the next test. Later, if he or she persists, this perception changes. In many branches of mathematics and with many practical applications, π keeps on turning up. "There it is again!" says the student, thus joining the ranks of mathematicians for whom mathematics seems less like an artifact invented and more like a natural phenomenon discovered.

So it is with regular languages. We have seen that DFAs and NFAs have equal definitional power. It turns out that regular expressions also have exactly that same definitional power: they can be used to define all the regular languages, and only the regular languages. There it is again!


Outline

• 7.1 Regular Expressions, Formally Defined
• 7.2 Regular Expression Examples
• 7.3 For Every Regular Expression, a Regular Language
• 7.4 Regular Expressions and Structural Induction
• 7.5 For Every Regular Language, a Regular Expression


Concatenation of Languages

• The concatenation of two languages L1 and L2 is L1L2 = {xy | x ∈ L1 and y ∈ L2}

• The set of all strings that can be constructed by concatenating a string from the first language with a string from the second

• For example, if L1 = {a, b} and L2 = {c, d} then L1L2 = {ac, ad, bc, bd}
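For finite languages, the definition can be computed directly. The following is a minimal Python sketch that represents a language as a set of strings. This representation and the helper name `concat` are illustrative choices made for this example.

```python
def concat(l1: set[str], l2: set[str]) -> set[str]:
    """Concatenation L1L2 = {xy | x in L1 and y in L2} of two finite languages."""
    return {x + y for x in l1 for y in l2}

print(sorted(concat({"a", "b"}, {"c", "d"})))  # ['ac', 'ad', 'bc', 'bd']
```

If either language is empty, the result is empty, because no pair (x, y) can be formed.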


Kleene Closure of a Language

• The Kleene closure of a language L is L* = {x₁x₂ ... xₙ | n ≥ 0, with all xᵢ ∈ L}
• The set of strings that can be formed by concatenating any number of strings, each of which is an element of L
• Not the same as {xⁿ | n ≥ 0 and x ∈ L}
• In L*, each xᵢ may be a different element of L
• For example, {ab, cd}* = {ε, ab, cd, abab, abcd, cdab, cdcd, ababab, ...}
• For all L, ε ∈ L*
• For all L containing at least one string other than ε, L* is infinite
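L* is usually infinite, so a program can only list part of it. The sketch below is a minimal Python example, assuming a language is a finite set of strings. The hypothetical helper `kleene_star_up_to` returns the members of L* whose length is at most a chosen bound.

```python
def kleene_star_up_to(lang: set[str], max_len: int) -> set[str]:
    """All strings of L* with length <= max_len (L* itself may be infinite)."""
    result = {""}            # n = 0: the empty concatenation, so ε ∈ L*
    frontier = {""}
    while frontier:
        # Extend each newly found string by one more element of L, keeping
        # only strings within the bound that have not been seen before.
        frontier = {x + y for x in frontier for y in lang
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

print(sorted(kleene_star_up_to({"ab", "cd"}, 4), key=lambda s: (len(s), s)))
# ['', 'ab', 'cd', 'abab', 'abcd', 'cdab', 'cdcd']
```

The loop terminates because only finitely many strings fit within the bound, and each round adds only strings not already in the result.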


Regular Expressions

• A regular expression is a string r that denotes a language L(r) over some alphabet Σ

• Regular expressions make special use of the symbols ε, ∅, +, *, and parentheses

• We will assume that these special symbols are not included in Σ

• There are six kinds of regular expressions…


The Six Regular Expressions

• The six kinds of regular expressions, and the languages they denote, are:
  – Three kinds of atomic regular expressions:
    • Any symbol a ∈ Σ, with L(a) = {a}
    • The special symbol ε, with L(ε) = {ε}
    • The special symbol ∅, with L(∅) = {}
  – Three kinds of compound regular expressions built from smaller regular expressions, here called r, r1, and r2:
    • (r1 + r2), with L(r1 + r2) = L(r1) ∪ L(r2)
    • (r1r2), with L(r1r2) = L(r1)L(r2)
    • (r)*, with L((r)*) = (L(r))*
• The parentheses may be omitted, in which case * has highest precedence and + has lowest
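The six cases translate directly into a recursive definition of L(r). The following is a minimal Python sketch, assuming a hypothetical abstract syntax tree with one class per kind of expression: `Sym`, `Eps`, `Empty`, `Union`, `Concat`, and `Star`. Because L(r) can be infinite, the hypothetical function `lang(r, n)` returns only the strings of L(r) whose length is at most n.

```python
from dataclasses import dataclass

# Hypothetical AST: one class per kind of regular expression.
@dataclass(frozen=True)
class Sym:      # a ∈ Σ
    a: str

@dataclass(frozen=True)
class Eps:      # ε
    pass

@dataclass(frozen=True)
class Empty:    # ∅
    pass

@dataclass(frozen=True)
class Union:    # (r1 + r2)
    r1: object
    r2: object

@dataclass(frozen=True)
class Concat:   # (r1r2)
    r1: object
    r2: object

@dataclass(frozen=True)
class Star:     # (r)*
    r: object

def lang(r, n: int) -> set[str]:
    """The strings of L(r) whose length is at most n."""
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= n else set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Union):                  # L(r1) ∪ L(r2)
        return lang(r.r1, n) | lang(r.r2, n)
    if isinstance(r, Concat):                 # L(r1)L(r2)
        return {x + y for x in lang(r.r1, n) for y in lang(r.r2, n)
                if len(x + y) <= n}
    if isinstance(r, Star):                   # (L(r))*, cut off at length n
        base = lang(r.r, n)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")

print(sorted(lang(Concat(Sym("b"), Star(Sym("a"))), 3)))  # ['b', 'ba', 'baa']
```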


Other Uses of the Name

• These are classical regular expressions
• Many modern programs use text patterns also called regular expressions:
  – Tools like awk, sed and grep
  – Languages like Perl, Python, Ruby, and PHP
  – Language libraries like those for Java and the .NET languages
• All slightly different from ours and each other
• More about them in a later chapter
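To illustrate "slightly different", the sketch below uses Python's standard `re` module. In that module, classical union is written `|`, while `+` means "one or more". A membership test for L(r) corresponds to `re.fullmatch`, not `re.search`. The helper name `in_classical_ab_star` is hypothetical.

```python
import re

def in_classical_ab_star(s: str) -> bool:
    # Classical (a+b)* is written (a|b)* in Python's re; fullmatch requires the
    # whole string to match, which corresponds to membership in L(r).
    return re.fullmatch(r"(a|b)*", s) is not None

print(in_classical_ab_star("abba"))   # True
print(in_classical_ab_star("abc"))    # False
# In re, + means "one or more", not union:
print(re.fullmatch(r"a+b", "aaab") is not None)   # True
# search looks for a match anywhere (even an empty one), so it is not a membership test:
print(re.search(r"(a|b)*", "xyz") is not None)    # True
```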


Outline

• 7.1 Regular Expressions, Formally Defined
• 7.2 Regular Expression Examples
• 7.3 For Every Regular Expression, a Regular Language
• 7.4 Regular Expressions and Structural Induction
• 7.5 For Every Regular Language, a Regular Expression


ab

• Denotes the language {ab}
• Our formal definition permits this because:
  – a is an atomic regular expression denoting {a}
  – b is an atomic regular expression denoting {b}
  – Their concatenation (ab) is a compound
  – Unnecessary parentheses can be omitted
• Thus any string x in Σ* can be used by itself as a regular expression, denoting {x}


ab+c

• Denotes the language {ab, c}
• We omitted parentheses from the fully parenthesized form ((ab)+c)
• The inner pair is unnecessary because + has lower precedence than concatenation
• Thus any finite language can be defined using a regular expression
• Just list the strings, separated by +
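The "just list the strings" recipe can be sketched in a few lines of Python. The function name `finite_language_regex` is hypothetical, and the expression it returns is a plain string in the classical notation used here. Two edge cases need atomic expressions: the empty language is written ∅, and the empty string is written ε. Because + has the lowest precedence, no parentheses are needed.

```python
def finite_language_regex(lang: set[str]) -> str:
    """A classical regular expression (as a string) denoting a finite language."""
    if not lang:
        return "∅"                      # the empty language needs ∅
    # Each string stands for itself; the empty string is written ε.
    parts = [x if x else "ε" for x in sorted(lang)]
    return "+".join(parts)

print(finite_language_regex({"ab", "c"}))   # ab+c
```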


ba*

• Denotes the language {baⁿ}: the set of strings consisting of b followed by zero or more a's
• Not the same as (ba)*, which denotes {(ba)ⁿ}
• * has higher precedence than concatenation
• The Kleene star is the only way to define an infinite language using regular expressions
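Python's `re` module gives * the same precedence over concatenation, so the difference can be checked directly. This sketch uses `re.fullmatch` as a membership test. The helper `matches` is a hypothetical convenience wrapper.

```python
import re

def matches(pattern: str, s: str) -> bool:
    """True if the whole string s is in the language of the pattern."""
    return re.fullmatch(pattern, s) is not None

# ba* and (ba)* are written the same way in Python's re as in our notation.
for s in ["b", "baaa", "baba", ""]:
    print(repr(s), matches(r"ba*", s), matches(r"(ba)*", s))
# 'b' True False
# 'baaa' True False
# 'baba' False True
# '' False True
```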


(a+b)*

• Denotes {a,b}*: the whole language of strings over the alphabet {a,b}
• The parentheses are necessary here, because * has higher precedence than +
  – a+b* denotes {a} ∪ {b}*
• Reminder: not "zero or more copies…"
  – That would be a*+b*, which denotes {a}* ∪ {b}*
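The three readings can be compared on a few strings with Python's `re` module. Classical + is translated to `|`, which also has the lowest precedence in `re`. The `patterns` table and the `matches` helper are illustrative names.

```python
import re

def matches(pattern: str, s: str) -> bool:
    return re.fullmatch(pattern, s) is not None

# Classical (a+b)*, a+b*, and a*+b*, with classical + written as | in Python's re.
patterns = {"(a+b)*": r"(a|b)*", "a+b*": r"a|b*", "a*+b*": r"a*|b*"}
for s in ["ab", "bbb", "a", "aa"]:
    print(repr(s), {name: matches(p, s) for name, p in patterns.items()})
```

For example, "ab" is in (a+b)* but in neither of the other two languages. The string "aa" is in (a+b)* and a*+b* but not in a+b*.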


ab+ε

• Denotes the language {ab, ε}
• Occasionally, we need to use the atomic regular expression ε to include ε in the language
• But it's not needed in (a+b)*+ε, because ε is already part of every Kleene star

∅

• Denotes {}
• There is no other way to denote the empty set with regular expressions
• That's all you should ever use ∅ for
• It is not useful in compounds:
  – L(r∅) = L(∅r) = {}
  – L(r+∅) = L(∅+r) = L(r)
  – L(∅*) = {ε}
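The first two identities can be checked on a concrete example. In this minimal Python sketch, languages are represented as finite sets. The sample L(r) = {ab, c} and the helper `concat` are illustrative choices. L(∅*) = {ε} holds because the only concatenation of zero or more strings from ∅ is the empty one (n = 0).

```python
def concat(l1: set[str], l2: set[str]) -> set[str]:
    """Concatenation of two finite languages."""
    return {x + y for x in l1 for y in l2}

empty: set[str] = set()          # L(∅)
r = {"ab", "c"}                  # L(r) for the sample r = ab+c
assert concat(r, empty) == concat(empty, r) == set()   # L(r∅) = L(∅r) = {}
assert r | empty == empty | r == r                     # L(r+∅) = L(∅+r) = L(r)
print("identities hold for this sample r")
```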


More Examples

• (a+b)(c+d)
  – Denotes {ac, ad, bc, bd}
• (abc)*
  – Denotes {(abc)ⁿ} = {ε, abc, abcabc, …}
• a*b*
  – Denotes {aⁿbᵐ} = {xy | x ∈ {a}* and y ∈ {b}*}
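These claims can be spot-checked by brute force over all short strings. A check like this is not a proof. This Python sketch translates classical + to `|` for the `re` module. The length bounds are arbitrary, and `strings_over` is a hypothetical helper. For strings over {a,b}, having the form aⁿbᵐ is the same as never having an a after a b, that is, not containing "ba".

```python
import re
from itertools import product

def strings_over(alphabet: str, max_len: int):
    """Every string over the alphabet with length <= max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

# (a+b)(c+d) denotes exactly {ac, ad, bc, bd}.
found = {s for s in strings_over("abcd", 2) if re.fullmatch(r"(a|b)(c|d)", s)}
assert found == {"ac", "ad", "bc", "bd"}

# a*b* matches exactly the strings of the form aⁿbᵐ: no a after a b.
for s in strings_over("ab", 5):
    assert (re.fullmatch(r"a*b*", s) is not None) == ("ba" not in s), s
print("checks passed")
```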


More Examples

• (a+b)*aa(a+b)*
  – Denotes {x ∈ {a,b}* | x contains at least 2 consecutive a's}
• (a+b)*a(a+b)*a(a+b)*
  – Denotes {x ∈ {a,b}* | x contains at least 2 a's}
• (a*b*)*
  – Denotes {a,b}*, same as the simpler (a+b)*
  – Because L(a*b*) contains both a and b, and that's enough: we already have L((a+b)*) = {a,b}*
  – In general, whenever Σ ⊆ L(r), then L((r)*) = Σ*
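A brute-force comparison against the plain-English descriptions, over every string of length at most 6, gives some evidence for these three claims. This bound is an arbitrary choice, and the check is not a proof. The Python sketch writes classical + as `|` for the `re` module. The helpers `strings_over` and `matches` are hypothetical.

```python
import re
from itertools import product

def strings_over(alphabet: str, max_len: int):
    """Every string over the alphabet with length <= max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

def matches(pattern: str, s: str) -> bool:
    return re.fullmatch(pattern, s) is not None

for s in strings_over("ab", 6):
    assert matches(r"(a|b)*aa(a|b)*", s) == ("aa" in s)                  # 2 consecutive a's
    assert matches(r"(a|b)*a(a|b)*a(a|b)*", s) == (s.count("a") >= 2)    # at least 2 a's
    assert matches(r"(a*b*)*", s)                                         # all of {a,b}*
print("all checks passed up to length 6")
```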


Outline

• 7.1 Regular Expressions, Formally Defined
• 7.2 Regular Expression Examples
• 7.3 For Every Regular Expression, a Regular Language
• 7.4 Regular Expressions and Structural Induction
• 7.5 For Every Regular Language, a Regular Expression


Regular Expression to NFA

• Goal: to show that every regular expression defines a regular language

• Approach: give a way to convert any regular expression to an NFA for the same language

• Advantage: large NFAs can be composed from smaller ones using ε-transitions


Standard Form

• To make them easier to compose, our NFAs will all have the same standard form:
  – Exactly one accepting state, not the start state
• That is, for any regular expression r, we will show how to construct an NFA N with L(N) = L(r), pictured like this:

[Figure: standard-form NFA for r]


Composing Example

• That form makes composition easy
• For example, given NFAs for L(r1) and L(r2), we can easily construct one for L(r1+r2):

[Figure: the standard-form NFAs for r1 and r2 combined into one NFA for r1+r2]

• This new NFA still has our special form


Lemma 7.3

If r is any regular expression, there is some NFA N that has a single accepting state, not the same as the start state, with L(N) = L(r).

• Proof sketch:
  – There are six kinds of regular expressions
  – We will show how to build a suitable NFA for each kind


Proof Sketch: Atomic Expressions

• There are three kinds of atomic regular expressions:
  – Any symbol a ∈ Σ, with L(a) = {a}
  – The special symbol ε, with L(ε) = {ε}
  – The special symbol ∅, with L(∅) = {}

[Figures: standard-form NFAs for a ∈ Σ, for ε, and for ∅]


Proof: Compound Expressions

• There are three kinds of compound regular expressions:
  – (r1 + r2), with L(r1 + r2) = L(r1) ∪ L(r2)
  – (r1r2), with L(r1r2) = L(r1)L(r2)
  – (r1)*, with L((r1)*) = (L(r1))*

[Figures: standard-form NFAs for (r1 + r2), (r1r2), and (r1)*, each built from the NFAs for r1 and r2]
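The six cases of the proof can be sketched as a recursive Python function. This sketch rests on three assumptions. First, it uses a hypothetical tuple encoding of regular expressions. Second, transition lists with "" as the ε label are an illustrative choice. Third, it uses Thompson-style ε-transition gadgets: new start and accepting states for union and star, and an ε-edge from r1's accepting state to r2's start for concatenation. These gadgets may differ in detail from the constructions pictured for Lemma 7.3. Every NFA returned has exactly one accepting state, distinct from the start state. A small simulator is included so the result can be tested.

```python
from itertools import count

# Hypothetical tuple encoding of the six kinds of regular expressions:
# ("sym", a), ("eps",), ("empty",), ("union", r1, r2), ("concat", r1, r2), ("star", r)
EPS = ""  # label used for ε-transitions (input symbols are never empty)

def to_nfa(r, fresh=None):
    """Return (start, accept, transitions) for an NFA in standard form:
    one accepting state, distinct from the start state.
    transitions is a list of (from_state, label, to_state)."""
    fresh = fresh if fresh is not None else count()
    kind = r[0]
    if kind in ("sym", "eps", "empty"):
        s, f = next(fresh), next(fresh)
        if kind == "sym":
            return s, f, [(s, r[1], f)]
        if kind == "eps":
            return s, f, [(s, EPS, f)]
        return s, f, []                       # ∅: the accepting state is unreachable
    if kind == "star":
        s1, f1, t1 = to_nfa(r[1], fresh)
        s, f = next(fresh), next(fresh)
        # Skip r1 entirely (gives ε), or run r1 and optionally loop back for more.
        return s, f, t1 + [(s, EPS, s1), (s, EPS, f), (f1, EPS, s1), (f1, EPS, f)]
    if kind in ("concat", "union"):
        s1, f1, t1 = to_nfa(r[1], fresh)
        s2, f2, t2 = to_nfa(r[2], fresh)
        if kind == "concat":                  # r1's accept feeds r2's start
            return s1, f2, t1 + t2 + [(f1, EPS, s2)]
        s, f = next(fresh), next(fresh)       # new start and accept around both
        return s, f, t1 + t2 + [(s, EPS, s1), (s, EPS, s2), (f1, EPS, f), (f2, EPS, f)]
    raise ValueError(f"unknown kind: {kind!r}")

def eps_closure(states, transitions):
    """All states reachable from the given states using only ε-transitions."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, label, nxt in transitions:
            if p == q and label == EPS and nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

def accepts(nfa, x):
    """Simulate the NFA on string x by tracking the set of current states."""
    start, accept, transitions = nfa
    current = eps_closure({start}, transitions)
    for c in x:
        moved = {nxt for p, label, nxt in transitions if p in current and label == c}
        current = eps_closure(moved, transitions)
    return accept in current

# (a+b)*aa(a+b)*: strings over {a,b} containing two consecutive a's
ab = ("union", ("sym", "a"), ("sym", "b"))
r = ("concat", ("concat", ("star", ab), ("concat", ("sym", "a"), ("sym", "a"))), ("star", ab))
nfa = to_nfa(r)
print(accepts(nfa, "baab"), accepts(nfa, "abab"))   # True False
```

Sharing the same `fresh` counter across the recursive calls guarantees that the sub-NFAs never reuse a state number. As a result, gluing them together with ε-transitions cannot accidentally merge states.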
