



These notes introduce regular expressions as an alternative specification method for regular languages. Regular expressions are often easier to construct and understand than automata, and they are commonly used in computer applications to describe patterns in text. The notes cover the formal definition of regular expressions, their benefits, and examples of regular expressions over the ASCII alphabet.
Typology: Study notes




Thus far, we have discussed one way to specify a (regular) language: giving a machine that reads a word and tells whether it is in the language or not. Though this is a valid and unambiguous specification, it is sometimes not a very helpful one. Specifying languages by automata has two major shortcomings: first, given a language, it is often difficult to construct an automaton that accepts it; second, given an automaton, it is often difficult to understand which language it accepts.
Regular expressions are an alternative specification method for regular languages: they are often easier to construct, and it is easier to see which language they describe just by looking at the expression. Both benefits stem from the fact that regular expressions describe the structure of the words contained in a language, rather than giving a machine that must be “run” in order to decide whether a word belongs to it.
Regular expressions are very common in computer applications because they are a powerful way to describe patterns in text. Text editors (in their search-and-replace functions), programming languages such as Perl, and UNIX utilities such as grep, awk and lex all use regular expressions to describe patterns. Programming language compilers typically use regular expressions to define the lowest-level constructs of program source code (“tokens”), and the stage of the compiler responsible for recognizing tokens (the “scanner” or “lexer”) can be constructed automatically from those regular expressions with a tool such as lex.
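To make the token idea concrete, here is a minimal sketch of a scanner driven by regular expressions, written with Python's standard `re` module. The token names (`NUMBER`, `IDENT`, `OP`, `SKIP`), their patterns, and the function `tokenize` are illustrative assumptions, not part of these notes or of lex.

```python
import re

# Hypothetical token classes, each described by one regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),  # whitespace is matched but not reported
]

# Combine all token patterns into one alternation with named groups.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Split text into (token_class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, `tokenize("x = 42")` yields an `IDENT`, an `OP` and a `NUMBER` token. A generated scanner works along similar lines, but typically compiles the expressions into a single automaton instead of trying patterns at run time.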
2 Regular Expressions
Regular expressions over an alphabet Σ define languages over Σ by describing the structure of words in a language. They are based on the three regular operations: Union, concatenation and Kleene Star. They are very similar to arithmetic expressions like 3 + (4 · 5): They consist of constants and operators, and they construct complex expressions from simpler building blocks. As opposed to arithmetic expressions, their values are not numbers, but languages. Examples:
Since every regular expression defines one language, we will write L(R) to denote the language defined by regular expression R.
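Because the value of a regular expression is a language, the three regular operations can be tried out directly on sets of words. The following is a minimal sketch in Python; the function names `union`, `concat` and `star_up_to` are assumptions made for illustration, and the Kleene star is cut off at a maximum word length because the full result is usually infinite.

```python
def union(a, b):
    """Union of two languages given as sets of words."""
    return a | b


def concat(a, b):
    """Concatenation: every word of a followed by every word of b."""
    return {u + v for u in a for v in b}


def star_up_to(a, max_len):
    """All words of the Kleene star of a that have length <= max_len."""
    result = {""}      # the empty word is always in the star
    frontier = {""}    # words found in the previous round
    while frontier:
        longer = {w for w in concat(frontier, a) if len(w) <= max_len}
        frontier = longer - result
        result |= frontier
    return result
```

For instance, `concat({"a", "b"}, {"c"})` is `{"ac", "bc"}`, and `star_up_to({"ab"}, 4)` is `{"", "ab", "abab"}`.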
3 Formal Definition of Regular Expressions
Regular expressions over an alphabet Σ are defined in a recursive fashion, very similarly to arithmetic expressions. We start by defining the simplest regular expressions, and then define operations to create more complex ones from simpler building blocks:
Kleene Star has precedence over union.
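A recursive definition of this kind maps naturally onto a recursive data type. The sketch below assumes the usual six cases (the constants ∅ and ε, a single character, union, concatenation and Kleene star); the class names and the function `words` are hypothetical and chosen only for this illustration. `words(r, n)` follows the structure of the expression and returns every word of L(r) of length at most n.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:            # the expression for the empty language
    pass


@dataclass(frozen=True)
class Epsilon:          # the expression for the empty word
    pass


@dataclass(frozen=True)
class Char:             # a single character of the alphabet
    c: str


@dataclass(frozen=True)
class Union:
    left: object
    right: object


@dataclass(frozen=True)
class Concat:
    left: object
    right: object


@dataclass(frozen=True)
class Star:
    inner: object


def words(r, n):
    """Return all words of L(r) whose length is at most n."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Epsilon):
        return {""}
    if isinstance(r, Char):
        return {r.c} if n >= 1 else set()
    if isinstance(r, Union):
        return words(r.left, n) | words(r.right, n)
    if isinstance(r, Concat):
        return {u + v for u in words(r.left, n) for v in words(r.right, n)
                if len(u + v) <= n}
    if isinstance(r, Star):
        base = words(r.inner, n)
        result, frontier = {""}, {""}
        while frontier:  # keep appending words until nothing new appears
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")
```

As a usage example, `words(Concat(Char("a"), Star(Union(Char("b"), Char("c")))), 2)` evaluates to `{"a", "ab", "ac"}`.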
We also define the following shorthand notation:
({E, e}{+, -, ε}DD∗)∪
describes the language of all floating-point constants with an optional sign and exponential part, as recognized by the C programming language.
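For comparison, a pattern in the same spirit can be written in the syntax of Python's `re` module. This is a simplified approximation under stated assumptions, not a transcription of the expression above: it accepts an optional sign, digits with a decimal point and an optional exponent, or digits with a mandatory exponent. The names `FLOAT_RE` and `is_float_constant` are hypothetical.

```python
import re

# Assumed, simplified pattern for floating-point constants:
#   optional sign, then either
#     digits with a decimal point and an optional exponent, or
#     digits without a decimal point and a mandatory exponent.
FLOAT_RE = re.compile(
    r"[+-]?(?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?"
    r"|[+-]?\d+[eE][+-]?\d+"
)


def is_float_constant(word):
    """True if the whole word matches the assumed pattern."""
    return FLOAT_RE.fullmatch(word) is not None
```

Under these assumptions, `is_float_constant("-1.5e10")` is true, while `is_float_constant("42")` is false because the word contains neither a decimal point nor an exponent.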
4 Equivalence of Regular Expressions and Finite State Machines
Earlier we claimed that the class of languages that can be described by regular expressions is exactly the class of regular languages. We are now going to prove this statement in two steps. First, we will show that the language L(R) generated by any regular expression R is accepted by some NFA M. Second, we will show that the language L(M) accepted by any automaton M is generated by some regular expression R.
5 Construction of Automata from Regular Expressions
We will prove the existence of an automaton that accepts the language generated by a regular expression by structural induction. This means that we will follow the recursive definition of regular expressions: we first construct automata accepting the languages generated by the simplest regular expressions, and then show how to combine those automata to accept the languages generated by more complex ones. For all the following constructions, we will assume that all regular expressions are over some alphabet Σ.
Figure 1: An NFA M accepting the empty language, L(M) = ∅.
Figure 2: An NFA M accepting the language consisting only of the empty word, L(M) = {ε}.
If R = ∅, then L(R) = ∅. The empty language is accepted by the NFA
M∅ := ({q₀}, Σ, δ, q₀, ∅),
where δ(q₀, a) = ∅ for all a ∈ Σ. A transition diagram for this automaton is shown in Figure 1.
If R = ε, then L(R) = {ε}. The language consisting only of the empty word is accepted by the NFA
Mε := ({q₀}, Σ, δ, q₀, {q₀}),
where δ(q₀, a) = ∅ for all a ∈ Σ. A transition diagram for this automaton is shown in Figure 2.
If R = a for some character a ∈ Σ, then L(R) = {a}. The language consisting only of the word a is accepted by the NFA
Ma := ({q₀, q₁}, Σ, δ, q₀, {q₁}),
where, for all q ∈ {q₀, q₁} and x ∈ Σ,
δ(q, x) = {q₁} if q = q₀ and x = a, and δ(q, x) = ∅ otherwise.
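The three base cases can be checked mechanically with a small simulation. The sketch below is one possible encoding, not the notation of these notes: an NFA is a named tuple, the transition function δ is a dictionary in which a missing entry stands for ∅, and the names `NFA`, `nfa_empty`, `nfa_epsilon`, `nfa_char` and `accepts` are assumptions made for illustration. ε-transitions are not modelled because none of the three automata uses them.

```python
from collections import namedtuple

# (states, alphabet, delta, start, accepting); delta maps (state, symbol)
# to a set of successor states, and a missing key means the empty set.
NFA = namedtuple("NFA", "states alphabet delta start accepting")


def nfa_empty(alphabet):
    """Automaton for R = ∅: one state, no accepting states."""
    return NFA({"q0"}, set(alphabet), {}, "q0", set())


def nfa_epsilon(alphabet):
    """Automaton for R = ε: the start state is accepting, no transitions."""
    return NFA({"q0"}, set(alphabet), {}, "q0", {"q0"})


def nfa_char(a, alphabet):
    """Automaton for R = a: a single transition from q0 to q1 on a."""
    if a not in alphabet:
        raise ValueError(f"{a!r} is not in the alphabet")
    return NFA({"q0", "q1"}, set(alphabet), {("q0", a): {"q1"}}, "q0", {"q1"})


def accepts(nfa, word):
    """Track the set of reachable states while reading the word."""
    current = {nfa.start}
    for x in word:
        current = set().union(*(nfa.delta.get((q, x), set()) for q in current))
    return bool(current & nfa.accepting)
```

With the alphabet {a, b}, `accepts(nfa_char("a", "ab"), "a")` is true, while the same automaton rejects the empty word, `"b"` and `"aa"`.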