

Topics covered in this lecture: DFA minimization, lexical analyzers, lexical analyzer generators, using Flex, large number of states, Hopcroft's algorithm, groups of equivalent states, optimized acceptor.
Typology: Study notes


The generated DFA may have a large number of states. Hopcroft's algorithm can be used to minimize the number of DFA states. The idea behind the algorithm is to find groups of equivalent states: on any given input symbol, all transitions from the states in one group G1 go to states in the same group G2. The minimized DFA is then constructed with one state for each group of states of the initial DFA. Here is the minimized version of the DFA created earlier; states A and C have been merged.
[Figure: minimized DFA, with transitions labeled a and b]
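The grouping idea can be sketched in C as follows. This is a minimal sketch, not the lecture's own code: it uses simple repeated partition refinement rather than Hopcroft's faster worklist formulation, and it assumes the earlier DFA is the common textbook five-state DFA for (a|b)*abb with states A to E, which is not shown in this excerpt. The names delta, accepting, and minimize are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define NSTATES 5   /* states A..E of the assumed example DFA */
#define NSYMS   2   /* input symbols: 0 = 'a', 1 = 'b' */

/* Assumed example: transition table of a DFA for (a|b)*abb. */
static const int delta[NSTATES][NSYMS] = {
    /* A */ {1, 2},
    /* B */ {1, 3},
    /* C */ {1, 2},
    /* D */ {1, 4},
    /* E */ {1, 2},
};
static const int accepting[NSTATES] = {0, 0, 0, 0, 1};   /* only E accepts */

/* True if s and t go to the same group on every input symbol. */
static int same_targets(const int group[], int s, int t)
{
    for (int c = 0; c < NSYMS; c++)
        if (group[delta[s][c]] != group[delta[t][c]])
            return 0;
    return 1;
}

/* Stores the group number of each state in group[]; returns the group count. */
int minimize(int group[NSTATES])
{
    int ngroups = 0;   /* 0 forces at least one refinement pass */

    assert(group != NULL);

    /* Initial partition: non-accepting states versus accepting states. */
    for (int s = 0; s < NSTATES; s++)
        group[s] = accepting[s];

    for (;;) {
        int next[NSTATES];
        int nnext = 0;

        /* A state stays with an earlier state only if both were already in
           the same group and agree on the target group for every symbol. */
        for (int s = 0; s < NSTATES; s++) {
            next[s] = -1;
            for (int t = 0; t < s && next[s] < 0; t++)
                if (group[t] == group[s] && same_targets(group, s, t))
                    next[s] = next[t];
            if (next[s] < 0)
                next[s] = nnext++;
        }

        for (int s = 0; s < NSTATES; s++)
            group[s] = next[s];

        /* Each pass can only split groups, so an unchanged count means
           the partition is stable. */
        if (nnext == ngroups)
            return ngroups;
        ngroups = nnext;
    }
}
```

For the assumed table, minimize returns four groups and gives A and C the same group number, which matches the merge described above.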
We can construct an optimized acceptor with the following structure:
Lexical analyzers (scanners) use the same mechanism, but they have multiple RE descriptions, one for each token, and take a character stream as input. The lexical analyzer returns a sequence of matching tokens at the output (or an error), and it always returns the longest matching token.
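One way to obtain the longest match is to keep running the DFA until it gets stuck and to remember the last accepting state seen along the way. The C sketch below illustrates this under stated assumptions: the token codes, the hand-built DFA for "=", "==", "<" and "<=", and the function names are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch. */
enum { T_ERROR = 0, T_ASSIGN, T_EQ, T_LT, T_LE };

/* Hand-built DFA: 0 = start, 1 = "=", 2 = "==", 3 = "<", 4 = "<=".
   Returns -1 when there is no transition. */
static int step(int state, char c)
{
    switch (state) {
    case 0:  return c == '=' ? 1 : c == '<' ? 3 : -1;
    case 1:  return c == '=' ? 2 : -1;
    case 3:  return c == '=' ? 4 : -1;
    default: return -1;
    }
}

/* Token accepted in each DFA state (T_ERROR marks a non-accepting state). */
static const int accept_tok[] = { T_ERROR, T_ASSIGN, T_EQ, T_LT, T_LE };

/* Returns the length of the longest token at the start of s and stores its
   code in *tok. Returns 0 and stores T_ERROR if no token matches. */
size_t longest_match(const char *s, int *tok)
{
    int state = 0;
    size_t last_len = 0;

    assert(s != NULL && tok != NULL);
    *tok = T_ERROR;

    for (size_t i = 0; s[i] != '\0'; i++) {
        state = step(state, s[i]);
        if (state < 0)
            break;                       /* DFA is stuck: stop scanning */
        if (accept_tok[state] != T_ERROR) {
            *tok = accept_tok[state];    /* remember the last accepting state */
            last_len = i + 1;
        }
    }
    return last_len;
}
```

On the input "==x" the scanner passes through the accepting state for "=" but keeps going, so it reports "==" with length 2 rather than stopping at the first match.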
Lexical Analyzer Generators
The process of constructing a lexical analyzer can be automated. We only need to specify regular expressions for the tokens and rules for assigning priorities when several matches are possible; e.g., given “==” and “=”, “==” is preferred because it is longer.
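The priority rules can be illustrated with a small C sketch. It assumes a common convention, also followed by Flex: the longest match wins, and when two rules match the same length, the rule listed first wins. The rule set, the enum names, and pick_rule are hypothetical and serve only as an example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical rules; the enum order is the priority order. */
enum { R_NONE = -1, R_IF, R_ASSIGN, R_EQ, R_ID, R_COUNT };

/* Length of lit if s starts with it, otherwise 0. */
static size_t match_literal(const char *s, const char *lit)
{
    size_t n = strlen(lit);
    return strncmp(s, lit, n) == 0 ? n : 0;
}

/* Length of the identifier at the start of s, or 0 if there is none. */
static size_t match_id(const char *s)
{
    size_t n = 0;
    if (isalpha((unsigned char)s[0]) || s[0] == '_')
        for (n = 1; isalnum((unsigned char)s[n]) || s[n] == '_'; n++)
            ;
    return n;
}

/* Returns the winning rule for the start of s and stores the match length
   in *len. Returns R_NONE with length 0 if no rule matches. */
int pick_rule(const char *s, size_t *len)
{
    size_t lens[R_COUNT];
    int best = R_NONE;

    assert(s != NULL && len != NULL);

    lens[R_IF]     = match_literal(s, "if");
    lens[R_ASSIGN] = match_literal(s, "=");
    lens[R_EQ]     = match_literal(s, "==");
    lens[R_ID]     = match_id(s);

    *len = 0;
    for (int r = 0; r < R_COUNT; r++) {
        if (lens[r] > *len) {   /* strictly longer only: ties keep the earlier rule */
            *len = lens[r];
            best = r;
        }
    }
    return best;
}
```

With this ordering, "if(" is reported as the keyword because the keyword rule and the identifier rule tie at length 2 and the keyword is listed first, while "iffy" is reported as an identifier because that match is longer.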
Two popular lexical analyzer generators are
Using Flex
We will use Flex for the projects in this course. To use Flex, one has to provide a specification file as input. Flex reads this file and produces an output file that contains the lexical analyzer source in C or C++.
The input specification file consists of three sections:
    C or C++ and flex definitions
    %%
    token definitions and actions
    %%
    user code
The symbol “%%” separates the sections. A detailed guide to Flex is included in the supplementary reading material for this course. We will go through a simple example.
The following is the Flex specification file for recognizing tokens found in a C++ function. The file is named “lex.l”; it is customary to use the “.l” extension for Flex input files.
Specification File lex.l

%{
#include "tokdefs.h"
%}
D       [0-9]
L       [a-zA-Z_]
id      {L}({L}|{D})*
%%
"void"  {return(TOK_VOID);}
"int"   {return(TOK_INT);}
"if"    {return(TOK_IF);}
"else"  {return(TOK_ELSE);}
"while" {return(TOK_WHILE);}
"<="    {return(TOK_LE);}
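The specification includes a header, tokdefs.h, whose contents are not shown in this excerpt. The C sketch below shows what such a header might contain for the six tokens listed so far. The numeric values are assumptions (starting at 256, above the range of single characters, is a common convention), and tok_name is a hypothetical debugging helper, not part of Flex.

```c
#include <assert.h>

/* Hypothetical contents of tokdefs.h: one code per token returned above.
   The values are assumed, not taken from the course material. */
enum token {
    TOK_VOID = 256,
    TOK_INT,
    TOK_IF,
    TOK_ELSE,
    TOK_WHILE,
    TOK_LE
};

/* Hypothetical helper: printable name of a token code, for debugging. */
const char *tok_name(int tok)
{
    static const char *const names[] = {
        "TOK_VOID", "TOK_INT", "TOK_IF", "TOK_ELSE", "TOK_WHILE", "TOK_LE"
    };

    assert(tok >= TOK_VOID && tok <= TOK_LE);   /* only the codes defined above */
    return names[tok - TOK_VOID];
}
```

Keeping the codes in a shared header lets the scanner and any code that consumes its tokens agree on the same values.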