

Topics covered in this lecture: languages, alphabets, sets of strings of characters, finite sequences of characters, regular expressions, finite automata, sets of transitions, and sets of accepting states. These notes are part of a series of lecture notes on compiler construction.
Typology: Study notes


How to Describe Tokens?
Regular languages are the most popular formalism for specifying tokens.
Languages
Let Σ be a set of characters. Σ is called the alphabet. A language over Σ is a set of strings of characters drawn from Σ. Here are some examples of languages:
Languages are sets of strings (finite sequences of characters). We need some notation for specifying which sets we want. For lexical analysis we care about regular languages. Regular languages can be described using regular expressions. Each regular expression is a notation for a regular language (a set of words). If A is a regular expression, we write L(A) to refer to the language denoted by A.
Regular Expression
A regular expression (RE) is defined inductively:
a       an ordinary character from Σ
ε       the empty string
R|S     either R or S
RS      R followed by S (concatenation)
R*      concatenation of R zero or more times (R* = ε|R|RR|RRR...)
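The inductive definition maps directly onto a small recursive data type. The following is a minimal sketch in Python and is not part of the original lecture: the class names (Char, Eps, Alt, Cat, Star) and the function language are hypothetical, and the length bound max_len is an assumption added so that the infinite language of R* can be listed finitely.

```python
from dataclasses import dataclass


# Hypothetical node types, one per case of the inductive definition.
@dataclass(frozen=True)
class Char:          # a: an ordinary character from the alphabet
    c: str


@dataclass(frozen=True)
class Eps:           # ε: the empty string
    pass


@dataclass(frozen=True)
class Alt:           # R|S
    r: object
    s: object


@dataclass(frozen=True)
class Cat:           # RS
    r: object
    s: object


@dataclass(frozen=True)
class Star:          # R*
    r: object


def language(expr, max_len):
    """Return the strings of L(expr) whose length is at most max_len (max_len >= 0)."""
    if isinstance(expr, Char):
        return {expr.c} if max_len >= 1 else set()
    if isinstance(expr, Eps):
        return {""}
    if isinstance(expr, Alt):
        return language(expr.r, max_len) | language(expr.s, max_len)
    if isinstance(expr, Cat):
        left = language(expr.r, max_len)
        right = language(expr.s, max_len)
        return {x + y for x in left for y in right if len(x) + len(y) <= max_len}
    if isinstance(expr, Star):
        # Repeatedly append non-empty strings of L(R) until the bound is hit.
        base = language(expr.r, max_len) - {""}
        result = {""}
        frontier = {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x) + len(y) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression node: {expr!r}")
```

For example, language(Star(Cat(Char("a"), Char("b"))), 4) yields the strings "", "ab" and "abab", which are the members of L((ab)*) of length at most 4.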
Regular expression extensions provide convenient notation for complex REs:
R?      ε|R (zero or one R)
R+      RR* (one or more R)
(R)     R (grouping)
[abc]   a|b|c (any of listed)
[a-z]   a|b|...|z (range)
[^ab]   c|d|... (anything but 'a' or 'b')
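Each extension is shorthand for a core expression, and that can be checked mechanically. The sketch below uses Python's standard re module, which happens to support the same extension syntax. It is an illustration only: the helper same_language and the PAIRS list are hypothetical names, the alphabet is assumed to be {a, b, c, d}, and the comparison is exhaustive only up to a small string length, so it is a sanity check and not a proof.

```python
import re
from itertools import product

# (extension, equivalent core form), written in Python regex syntax.
# "(?:|a)" is Python's way of writing ε|a.
PAIRS = [
    ("a?", "(?:|a)"),
    ("a+", "aa*"),
    ("(a)", "a"),
    ("[abc]", "a|b|c"),
    ("[a-d]", "a|b|c|d"),
    ("[^ab]", "c|d"),      # valid only over the assumed alphabet {a, b, c, d}
]


def same_language(ext, core, alphabet="abcd", max_len=3):
    """Compare two patterns on every string over alphabet up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(re.fullmatch(ext, w)) != bool(re.fullmatch(core, w)):
                return False
    return True


all_equivalent = all(same_language(ext, core) for ext, core in PAIRS)
```

re.fullmatch is used instead of re.match so that a pattern must account for the whole string, which matches the meaning of "w belongs to L(R)".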
Here are some Regular Expressions and the strings of the language denoted by the RE.
RE        Strings in L(R)
a         "a"
ab        "ab"
a|b       "a" "b"
(ab)*     "" "ab" "abab" ...
(a|ε)b    "ab" "b"
Here are examples of common tokens found in programming languages.
digit        '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
integer      digit digit*
identifier   [a-zA-Z_][a-zA-Z0-9_]*
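These token definitions can be tried out directly with Python's standard re module. This is a minimal sketch and not the lecture's own code: the constant names and the function classify are hypothetical, and checking integer before identifier is an arbitrary choice made for the example (the two languages do not overlap, so the order does not affect the result here).

```python
import re

# Token patterns transcribed from the definitions above.
DIGIT = r"[0-9]"
INTEGER = DIGIT + DIGIT + "*"                 # digit digit*
IDENTIFIER = r"[a-zA-Z_][a-zA-Z0-9_]*"


def classify(lexeme):
    """Return the token class of lexeme, or None if it matches neither pattern."""
    if re.fullmatch(INTEGER, lexeme):
        return "integer"
    if re.fullmatch(IDENTIFIER, lexeme):
        return "identifier"
    return None
```

With these definitions, classify("42") gives "integer", classify("x_1") gives "identifier", and classify("1x") gives None because an identifier may not start with a digit.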
Finite Automaton
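We need a mechanism to determine whether an input string w belongs to L(R), the language denoted by regular expression R. Such a mechanism is called an acceptor.
The acceptor is based on Finite Automata (FA). A finite automaton consists of an input alphabet Σ, a finite set of states, a start state, a set of transitions between states labeled with input characters, and a set of accepting states.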
A finite automaton accepts a string if we can follow transitions labeled with the characters of the string from the start state to some accepting state. Here are some examples of FA.
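The acceptance rule can be sketched as a short simulation loop. The code below is an assumption-laden illustration and not the lecture's own example: it handles only the deterministic case (at most one transition per state and character), the function name accepts and the dictionary encoding of transitions are hypothetical, and the sample automaton for (ab)* was constructed for this sketch.

```python
def accepts(transitions, start, accepting, w):
    """Follow one transition per character of w; accept if we end in an accepting state."""
    state = start
    for ch in w:
        key = (state, ch)
        if key not in transitions:
            return False          # no transition labeled ch: the string is rejected
        state = transitions[key]
    return state in accepting


# Sample automaton for (ab)*: state 0 is both the start state and the only
# accepting state; reading 'a' moves to state 1, and reading 'b' moves back.
AB_STAR_TRANSITIONS = {(0, "a"): 1, (1, "b"): 0}
AB_STAR_START = 0
AB_STAR_ACCEPTING = {0}
```

For this automaton, accepts(AB_STAR_TRANSITIONS, AB_STAR_START, AB_STAR_ACCEPTING, "abab") returns True, while the input "aba" is rejected because the run ends in state 1, which is not accepting.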