Regular Expressions: An Alternative Specification Method for Regular Languages, Study notes of Computer Science

University of California - Davis Computer Science


ECS 120 Lesson 7 – Regular Expressions, Pt. 1

Oliver Kreylos

Friday, April 13th, 2001

1 Outline

Thus far, we have been discussing one way to specify a (regular) language: giving a machine that reads a word and tells whether it is in the language or not. Though this is a valid and unambiguous specification, it is sometimes not a very helpful one. Specifying languages by automata has two major shortcomings: first, given a language, it is often difficult to construct an automaton that accepts it; second, given an automaton, it is often difficult to understand which language it accepts. Regular expressions are an alternative specification method for regular languages: they are easier to construct, and it is easier to see which language they describe just by looking at the expression. Both benefits stem from the fact that regular expressions describe the structure of the words contained in a language, rather than giving a machine that must be "run" in order to decide a word.

Regular expressions are very common in computer applications because they are a powerful way to describe patterns in text. Text editors (in their search and replace functions), programming languages such as Perl, and UNIX utilities such as grep, awk and lex all use regular expressions to describe patterns. Programming language compilers typically use regular expressions to define the lowest-level constructs of program source code ("tokens"), and the stage of the compiler responsible for recognizing tokens (the "lexical analyzer" or "scanner") is often constructed automatically from those regular expressions, for example using the lex utility.
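To make the compiler connection concrete, here is a minimal sketch of a hand-written tokenizer in Python (not lex itself) in which every token class is given by a regular expression. The token names and patterns (NUMBER, IDENT, OP, SKIP) are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical token classes for a tiny C-like language; each is a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/=]"),
    ("SKIP", r"[ \t]+"),  # whitespace: recognized, then discarded
]

# Combine all token patterns into one alternation of named groups.
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text):
    """Split text into (token class, lexeme) pairs, scanning left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)  # anchored match starting at pos
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()  # every pattern matches at least one character, so pos advances
    return tokens


print(tokenize("x1 = 42 + y"))
```

One design difference worth noting: Python's alternation takes the first alternative that matches, so this sketch depends on the order of TOKEN_SPEC, whereas lex prefers the longest match and uses rule order only to break ties.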


2 Regular Expressions

Regular expressions over an alphabet Σ define languages over Σ by describing the structure of words in a language. They are based on the three regular operations: Union, concatenation and Kleene Star. They are very similar to arithmetic expressions like 3 + (4 · 5): They consist of constants and operators, and they construct complex expressions from simpler building blocks. As opposed to arithmetic expressions, their values are not numbers, but languages. Examples:

• hello specifies the language consisting of the single word hello.
• hello ∪ world specifies the language consisting of the two words hello and world.
• (aa)∗ specifies the language of all words consisting of an even number of a's.
• a∗ ◦ bb, often written just as a∗bb, specifies the set of all words consisting of any number of a's followed by two b's.

Since every regular expression defines one language, we will write L(R) to denote the language defined by regular expression R.
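The examples above can be tried out in a practical regex engine. The sketch below uses Python's re module and assumes the usual correspondence between notations: | for ∪, juxtaposition for concatenation, and * for Kleene star; (?:...) is merely a non-capturing group. It uses re.fullmatch so that a word counts as a member of L(R) only if the entire word matches; re.search would also accept words that merely contain a match.

```python
import re

# Python re notation (the correspondence is an assumption of this sketch):
#   R1 ∪ R2 -> R1|R2     concatenation -> juxtaposition     R∗ -> R*
EXAMPLES = {
    "hello": r"hello",
    "hello ∪ world": r"hello|world",
    "(aa)∗": r"(?:aa)*",  # (?:...) groups without capturing
    "a∗bb": r"a*bb",
}


def in_language(pattern, word):
    """word ∈ L(R) iff the whole word matches; fullmatch rules out substring hits."""
    return re.fullmatch(pattern, word) is not None


candidates = ["", "aa", "aaa", "bb", "abb", "hello", "world", "helloworld"]
for name, pattern in EXAMPLES.items():
    print(name, [w for w in candidates if in_language(pattern, w)])
```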

3 Formal Definition of Regular Expressions

Regular expressions over an alphabet Σ are defined in a recursive fashion, very similarly to arithmetic expressions. We start by defining the simplest regular expressions, and then define operations to create more complex ones from simpler building blocks:

• ∅ is a regular expression defining the empty language, L(∅) = ∅ ⊂ Σ∗.
• ε is a regular expression defining the language consisting only of the empty word, L(ε) = {ε} ⊂ Σ∗.
• If a ∈ Σ is a character, then a is a regular expression over Σ defining the language consisting of the single one-character word a, L(a) = {a} ⊂ Σ∗.

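• If R₁ and R₂ are regular expressions, then (R₁ ∪ R₂), (R₁ ◦ R₂), and (R₁∗) are regular expressions, defining the languages L(R₁) ∪ L(R₂), L(R₁) ◦ L(R₂), and L(R₁)∗, respectively.

Parentheses may be omitted according to precedence rules, for example:

• R₁ ∪ R₂∗ := R₁ ∪ (R₂∗). Kleene star has precedence over union.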
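The recursive definition maps directly onto a recursive data type. The following Python sketch (all class and function names are illustrative, not from these notes) represents regular expressions as syntax trees and computes, by structural recursion, the set of words of L(R) whose length is at most n. The length bound n is an assumption added so that L(R∗), which is usually infinite, can be listed. Grouping is explicit in a tree, so a rule like R₁ ∪ R₂∗ := R₁ ∪ (R₂∗) matters only when expressions are written as strings.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:  # ∅
    pass


@dataclass(frozen=True)
class Eps:  # ε
    pass


@dataclass(frozen=True)
class Sym:  # a single character a ∈ Σ
    c: str


@dataclass(frozen=True)
class Alt:  # R1 ∪ R2
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Cat:  # R1 ◦ R2
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:  # R∗
    inner: "Regex"


Regex = Union[Empty, Eps, Sym, Alt, Cat, Star]


def lang(r, n):
    """Words of L(r) with length at most n, by structural recursion on r."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.c} if n >= 1 else set()
    if isinstance(r, Alt):
        return lang(r.left, n) | lang(r.right, n)
    if isinstance(r, Cat):
        return {u + v for u in lang(r.left, n) for v in lang(r.right, n) if len(u + v) <= n}
    if isinstance(r, Star):
        pieces = lang(r.inner, n) - {""}  # nonempty pieces, so each round yields longer words
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in pieces if len(u + v) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")


a, b = Sym("a"), Sym("b")
print(sorted(lang(Star(Cat(a, a)), 5)))          # (aa)∗ -> ['', 'aa', 'aaaa']
print(sorted(lang(Cat(Star(a), Cat(b, b)), 4)))  # a∗ ◦ bb -> ['aabb', 'abb', 'bb']
```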

We also define the following shorthand notation:

• If A = {a₁, a₂, …, aₙ} ⊂ Σ ∪ {ε} is a set of characters from Σ or the symbol ε, then A is shorthand for a₁ ∪ a₂ ∪ ··· ∪ aₙ, the regular expression denoting the language L(A) = {a₁, a₂, …, aₙ} ⊂ Σ∗.

Here are some more relevant examples of regular expressions over the ASCII alphabet. In the following, let L := {A, …, Z, a, …, z} be the set of letters, and D := {0, …, 9} the set of decimal digits.
• DD∗ describes all words consisting of a digit followed by any number of further digits. This is the set of all non-negative integers in decimal notation (leading zeros are permitted).
• {+, -, ε}DD∗ describes the language of all integer constants with an optional sign.
• The expression

$$\{+, -, \varepsilon\}\,\bigl(DD^* \cup DD^*.D^* \cup D^*.DD^*\bigr)\,\bigl((\{E, e\}\{+, -, \varepsilon\}DD^*) \cup \varepsilon\bigr)$$

describes the language of floating-point constants with an optional sign and an optional exponent part, modeled on those of the C programming language. (Unlike C, it also matches plain integers such as 42, and it ignores type suffixes such as f or L.)
• (L ∪ _)(L ∪ D ∪ _)∗ describes the language of all valid identifiers in the C programming language (not taking reserved words into account).
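As a rough check, the ASCII examples can be translated into Python's re syntax. This sketch assumes the character classes [0-9] for D and [A-Za-z] for L, and writes the optional part {+, -, ε} as [+-]?. The period in the floating-point expression is a literal character, so it has to be escaped as \. in Python.

```python
import re

# Assumed translations: D -> [0-9], L -> [A-Za-z], {+, -, ε} -> [+-]?
INTEGER = r"[0-9][0-9]*"                # DD∗
SIGNED_INTEGER = r"[+-]?[0-9][0-9]*"    # {+, -, ε}DD∗
FLOAT = (
    r"[+-]?"                                                    # {+, -, ε}
    r"(?:[0-9][0-9]*|[0-9][0-9]*\.[0-9]*|[0-9]*\.[0-9][0-9]*)"  # DD∗ ∪ DD∗.D∗ ∪ D∗.DD∗
    r"(?:[Ee][+-]?[0-9][0-9]*)?"                                # ({E, e}{+, -, ε}DD∗) ∪ ε
)
IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_]*"  # (L ∪ _)(L ∪ D ∪ _)∗


def matches(pattern, word):
    """True iff the entire word belongs to the language of the pattern."""
    return re.fullmatch(pattern, word) is not None


for w in ["0042", "-17", "3.", ".5e-3", "-1.25E+10", "1e", "_tmp1", "1abc"]:
    print(w, matches(INTEGER, w), matches(SIGNED_INTEGER, w), matches(FLOAT, w), matches(IDENTIFIER, w))
```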

4 Equivalence of Regular Expressions and Finite State Machines

Earlier we claimed that the class of languages that can be described by regular expressions is exactly the class of regular languages. We are now going to prove this statement. First, we will show that the language L(R) generated by any regular expression R is accepted by some NFA M. Second, we will show that the language L(M) accepted by any finite automaton M is generated by some regular expression R.

5 Construction of Automata from Regular Expressions

We will prove the existence of an automaton that accepts the language generated by a regular expression by structural induction. This means that we will follow the recursive definition of regular expressions: we first construct automata accepting the languages generated by the simplest regular expressions, and then show how to combine those automata to accept the languages generated by more complex ones. For all the following constructions, we will assume that all regular expressions are over some alphabet Σ.

Figure 1: An NFA M accepting the empty language, L(M) = ∅. (Diagram: a single, non-accepting state q₀.)

Figure 2: An NFA M accepting the language consisting only of the empty word, L(M) = {ε}. (Diagram: a single state q₀, which is accepting.)

5.1 Case 1: R = ∅

If R = ∅, then L(R) = ∅. The empty language is accepted by the NFA

$$M_\emptyset := \bigl(\{q_0\}, \Sigma, \delta, q_0, \emptyset\bigr),$$

where δ(q₀, a) = ∅ for all a ∈ Σ. A transition diagram for this automaton is shown in Figure 1.
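The base-case automata are small enough to simulate directly. The following Python sketch uses a hypothetical encoding: δ is a dictionary from (state, character) pairs to sets of states, and a missing key means ∅. Because M∅ has no ε-transitions, the simulation omits ε-closure; the example alphabet Σ = {a, b} is also an assumption. Checking every word up to length 3 confirms that M∅ accepts none of them.

```python
from itertools import product

SIGMA = ("a", "b")  # example alphabet (an assumption for this sketch)


def accepts(delta, start, finals, word):
    """Simulate an NFA without ε-transitions; a missing key in delta means ∅."""
    current = {start}
    for ch in word:
        current = {r for q in current for r in delta.get((q, ch), set())}
    return bool(current & finals)


def words_up_to(n):
    """All words over SIGMA of length 0..n, shortest first."""
    for k in range(n + 1):
        for letters in product(SIGMA, repeat=k):
            yield "".join(letters)


# M_∅ = ({q0}, Σ, δ, q0, ∅) with δ(q0, a) = ∅ for all a ∈ Σ
DELTA_EMPTY = {("q0", a): set() for a in SIGMA}
FINALS_EMPTY = set()

accepted = [w for w in words_up_to(3) if accepts(DELTA_EMPTY, "q0", FINALS_EMPTY, w)]
print(accepted)  # []
```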

5.2 Case 2: R = ε

If R = ε, then L(R) = {ε}. The language consisting only of the empty word is accepted by the NFA

$$M_\varepsilon := \bigl(\{q_0\}, \Sigma, \delta, q_0, \{q_0\}\bigr),$$

where δ(q₀, a) = ∅ for all a ∈ Σ. A transition diagram for this automaton is shown in Figure 2.
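M_ε can be checked the same way. This Python sketch assumes a hypothetical encoding of δ as a dictionary from (state, character) pairs to sets of states (a missing key means ∅), an example alphabet Σ = {a, b}, and no ε-transitions. Since q₀ is accepting but has no outgoing transitions, only the empty word should be accepted among all words up to length 3.

```python
from itertools import product

SIGMA = ("a", "b")  # example alphabet (an assumption for this sketch)


def accepts(delta, start, finals, word):
    """Simulate an NFA without ε-transitions; a missing key in delta means ∅."""
    current = {start}
    for ch in word:
        current = {r for q in current for r in delta.get((q, ch), set())}
    return bool(current & finals)


# M_ε = ({q0}, Σ, δ, q0, {q0}) with δ(q0, a) = ∅ for all a ∈ Σ
DELTA_EPS = {("q0", a): set() for a in SIGMA}
FINALS_EPS = {"q0"}

words = ["".join(p) for k in range(4) for p in product(SIGMA, repeat=k)]
accepted = [w for w in words if accepts(DELTA_EPS, "q0", FINALS_EPS, w)]
print(accepted)  # ['']
```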

5.3 Case 3: R = a

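If R = a for some character a ∈ Σ, then L(R) = {a}. The language consisting only of the word a is accepted by the NFA

$$M_a := \bigl(\{q_0, q_1\}, \Sigma, \delta, q_0, \{q_1\}\bigr),$$

where

$$\forall q \in \{q_0, q_1\},\ x \in \Sigma:\quad \delta(q, x) = \begin{cases} \{q_1\} & \text{if } q = q_0 \text{ and } x = a,\\ \emptyset & \text{otherwise.} \end{cases}$$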
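The Case 3 construction can be written as a function that fills in δ exactly according to the case distinction above. This Python sketch assumes a hypothetical encoding of δ as a dictionary from (state, character) pairs to sets of states (a missing key means ∅), an example alphabet Σ = {a, b}, and no ε-transitions; single_char_nfa is an illustrative name.

```python
from itertools import product


def accepts(delta, start, finals, word):
    """Simulate an NFA without ε-transitions; a missing key in delta means ∅."""
    current = {start}
    for ch in word:
        current = {r for q in current for r in delta.get((q, ch), set())}
    return bool(current & finals)


def single_char_nfa(a, sigma):
    """Build M_a = ({q0, q1}, Σ, δ, q0, {q1}) for the character a ∈ Σ."""
    if a not in sigma:
        raise ValueError(f"{a!r} is not in the alphabet")
    delta = {}
    for q in ("q0", "q1"):
        for x in sigma:
            # δ(q, x) = {q1} if q = q0 and x = a, and ∅ otherwise
            delta[(q, x)] = {"q1"} if (q == "q0" and x == a) else set()
    return delta, "q0", {"q1"}


SIGMA = ("a", "b")  # example alphabet (an assumption for this sketch)
delta, start, finals = single_char_nfa("a", SIGMA)
words = ["".join(p) for k in range(4) for p in product(SIGMA, repeat=k)]
print([w for w in words if accepts(delta, start, finals, w)])  # ['a']
```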
