java programming as a case study, Study notes of Java Programming

Typology: Study notes

2017/2018

Uploaded on 02/12/2018

dheeraj-kharwar
Implementation of Lexical Analysis

Compiler Design 1 (2011) 2

Outline

  • Specifying lexical structure using regular expressions
  • Finite automata
    • Deterministic Finite Automata (DFAs)
    • Non-deterministic Finite Automata (NFAs)
  • Implementation of regular expressions: RegExp ⇒ NFA ⇒ DFA ⇒ Tables

Notation

  • For convenience, we use a variation (allow user-defined abbreviations) in regular expression notation
  • Union: A + B ≡ A | B
  • Option: A + ε ≡ A?
  • Range: ‘a’ + ‘b’ + … + ‘z’ ≡ [a-z]
  • Excluded range: complement of [a-z] ≡ [^a-z]
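
As a quick sanity check, the abbreviations above can be tried out with Python's `re` module, whose syntax happens to use the same right-hand-side forms. The table name `EXAMPLES`, the helper `check_examples`, and the choice of `a` and `b` as stand-ins for A and B are assumptions made for this sketch only.

```python
import re

# (slide notation, Python pattern, sample string, should it match?)
# The concrete sub-patterns "a" and "b" are placeholders for A and B.
EXAMPLES = [
    ("a + b",      r"a|b",    "b", True),   # union
    ("a + ε = a?", r"a?",     "",  True),   # option: the empty string is allowed
    ("[a-z]",      r"[a-z]",  "q", True),   # range
    ("[^a-z]",     r"[^a-z]", "q", False),  # excluded range rejects a lower-case letter
    ("[^a-z]",     r"[^a-z]", "Q", True),   # ... and accepts anything else
]

def check_examples():
    """Return True if every sample behaves as the table claims."""
    return all(bool(re.fullmatch(pattern, text)) == expected
               for _, pattern, text, expected in EXAMPLES)
```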

Regular Expressions in Lexical Specification

  • Last lecture: a specification for the predicate s ∈ L(R)
  • But a yes/no answer is not enough!
  • Instead: partition the input into tokens
  • We will adapt regular expressions to this goal

Regular Expressions ⇒ Lexical Spec. (1)

  1. Select a set of tokens
    • Integer, Keyword, Identifier, OpenPar, ...
  2. Write a regular expression (pattern) for the lexemes of each token
    • Integer = digit⁺
    • Keyword = ‘if’ + ‘else’ + …
    • Identifier = letter (letter + digit)*
    • OpenPar = ‘(’
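
As a rough illustration, the four patterns above could be written with Python's `re` module as follows. The table name `TOKEN_PATTERNS`, the helper `matching_tokens`, and the ASCII-only reading of "letter" and "digit" are assumptions made for this sketch; the slides do not fix them, and the keyword list is left at the two keywords actually shown.

```python
import re

# Hypothetical token table: names mirror the slide, regex syntax is Python's,
# and "letter"/"digit" are assumed to be ASCII.
TOKEN_PATTERNS = {
    "Integer": r"[0-9]+",                   # one or more digits
    "Keyword": r"if|else",                  # 'if' + 'else' + ... (only two shown)
    "Identifier": r"[A-Za-z][A-Za-z0-9]*",  # letter (letter + digit)*
    "OpenPar": r"\(",                       # '('
}

def matching_tokens(lexeme):
    """Return the names of all tokens whose pattern matches the whole lexeme."""
    return [name for name, pattern in TOKEN_PATTERNS.items()
            if re.fullmatch(pattern, lexeme)]
```

Note that `matching_tokens("if")` reports both Keyword and Identifier, which is why a rule for choosing among several matching patterns is needed.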

Regular Expressions ⇒ Lexical Spec. (2)

  3. Construct R, matching all lexemes for all tokens

R = Keyword + Identifier + Integer + … = R₁ + R₂ + R₃ + …

Facts: If s ∈ L(R) then s is a lexeme

  • Furthermore s ∈ L(Rᵢ) for some “i”
  • This “i” determines the token that is reported

Regular Expressions ⇒ Lexical Spec. (3)

  4. Let input be x₁…xₙ
    • (x₁ … xₙ are characters)
    • For 1 ≤ i ≤ n check x₁…xᵢ ∈ L(R)?
  5. It must be that x₁…xᵢ ∈ L(Rⱼ) for some j (if there is a choice, pick the smallest such j)
  6. Remove x₁…xᵢ from input and go to step 4
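
The prefix-checking loop above can be sketched directly in Python. This is a deliberately naive version under stated assumptions: the rule list `RULES` and the function `tokenize` are hypothetical names, list order stands for the index j, and the sketch tries the longest prefix first, which this excerpt does not prescribe. It also raises an error when no prefix matches, a case the steps do not cover.

```python
import re

# Hypothetical rule list R1, R2, R3, ...; list order is the priority order.
RULES = [
    ("Keyword", r"if|else"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Integer", r"[0-9]+"),
    ("OpenPar", r"\("),
]

def tokenize(text):
    """Split text into (token, lexeme) pairs by repeated prefix matching."""
    tokens = []
    while text:
        found = None
        # Test the prefixes x1...xi, longest first (an assumption of this sketch).
        for i in range(len(text), 0, -1):
            prefix = text[:i]
            # The first rule that matches is the one with the smallest j.
            for name, pattern in RULES:
                if re.fullmatch(pattern, prefix):
                    found = (name, prefix)
                    break
            if found:
                break
        if found is None:
            raise ValueError(f"no rule matches at {text[:10]!r}")
        tokens.append(found)
        # Remove the lexeme from the input and start over on the rest.
        text = text[len(found[1]):]
    return tokens
```

Testing every prefix against every rule is quadratic in the input length, which is one reason a real lexer runs an automaton over the input instead.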

How to Handle Spaces and Comments?

  1. We could create a token Whitespace
    • Whitespace = (‘ ’ + ‘\n’ + ‘\t’)⁺
    • We could also add comments in there
    • An input “ \t\n 5555 ” is transformed into Whitespace Integer Whitespace
  2. Lexer skips spaces (preferred)
    • Modify step 5 from before as follows: it must be that xₖ…xᵢ ∈ L(Rⱼ) for some j such that x₁…xₖ₋₁ ∈ L(Whitespace)
    • Parser is not bothered with spaces
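
A minimal sketch of the second option, where whitespace is consumed but never reported. The names `WHITESPACE`, `RULES` and `tokenize_skipping_spaces` are hypothetical, only two token rules are included, and the first matching rule wins (enough here because the two rules cannot match at the same position).

```python
import re

WHITESPACE = re.compile(r"[ \n\t]+")  # one or more of ' ', '\n', '\t'
# Hypothetical rules in priority order; they never match at the same position.
RULES = [
    ("Integer", re.compile(r"[0-9]+")),
    ("Identifier", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
]

def tokenize_skipping_spaces(text):
    """Return (token, lexeme) pairs; whitespace is skipped, not reported."""
    tokens = []
    pos = 0
    while pos < len(text):
        # Skip the whitespace prefix without emitting a token.
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos}")
    return tokens
```

With this version the input " \t\n 5555 " yields a single Integer token, so the parser never sees the surrounding whitespace.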

Regular Languages & Finite Automata

Basic formal language theory result: Regular expressions and finite automata both define the class of regular languages.

Thus, we are going to use:

  • Regular expressions for specification
  • Finite automata for implementation (automatic generation of lexical analyzers)

Finite Automata

A finite automaton is a recognizer for the strings of a regular language

A finite automaton consists of

  • A finite input alphabet Σ
  • A set of states S
  • A start state n
  • A set of accepting states F ⊆ S
  • A set of transitions state →ⁱⁿᵖᵘᵗ state

Finite Automata

  • Transition s₁ →ᵃ s₂
  • Is read: in state s₁ on input “a” go to state s₂
  • If end of input (or no transition possible)
    • If in accepting state ⇒ accept
    • Otherwise ⇒ reject
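
The components and the acceptance rule can be sketched as a small simulator. This assumes a deterministic automaton stored as a dictionary from (state, input) pairs to states; the function name `run_automaton` and the state names in the usage note are hypothetical. One choice to flag: when no transition is possible while input remains, this sketch rejects, which is the usual convention for a whole-string recognizer.

```python
def run_automaton(transitions, start, accepting, text):
    """Simulate a deterministic finite automaton on text.

    transitions maps (state, symbol) to the next state.
    """
    state = start
    for symbol in text:
        key = (state, symbol)
        if key not in transitions:
            # Stuck with input left over: reject (convention assumed here).
            return False
        state = transitions[key]
    # End of input: accept exactly when the current state is accepting.
    return state in accepting
```

For example, with the single transition `{("s1", "a"): "s2"}`, start state `"s1"` and accepting set `{"s2"}`, the call `run_automaton(..., "a")` accepts, while the inputs `""`, `"aa"` and `"b"` are rejected.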

Finite Automata State Graphs

  • A state
  • The start state
  • An accepting state
  • A transition (labelled a in the figure)

A Simple Example

  • A finite automaton that accepts only “1”

[Figure: state graph; transition label: 1]

Another Simple Example

  • A finite automaton accepting any number of 1’s followed by a single 0
  • Alphabet: {0,1}

[Figure: state graph; transition labels: 0 and 1]
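
Since the state graph is not reproduced here, the following is one plausible encoding of an automaton for "any number of 1's followed by a single 0". The state names `start` and `done` and the function `accepts` are assumptions of this sketch, not taken from the figure.

```python
# Hypothetical states: "start" loops on 1, and a 0 leads to the accepting "done".
TRANSITIONS = {
    ("start", "1"): "start",
    ("start", "0"): "done",
}

def accepts(text):
    """Return True if text is zero or more 1's followed by exactly one 0."""
    state = "start"
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            # No transition possible with input remaining: reject.
            return False
    return state == "done"
```

The accepting state has no outgoing transitions, so any input after the first 0 (as in "1100" or "01") is rejected.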

And Another Example

  • Alphabet {0,1}
  • What language does this recognize?

[Figure: state graph; transition labels: 0 and 1]

And Another Example

  • Alphabet still { 0, 1 }
  • The operation of the automaton is not completely defined by the input
    • On input “11” the automaton could be in either state

[Figure: state graph; both transitions are labelled 1]
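
One way to make "could be in either state" concrete is to track the set of all states the automaton might be in. The figure is not reproduced here, so the two-state NFA below (`q0` with two transitions on 1, `q1` accepting) is a hypothetical stand-in chosen to show the same effect; the names `NFA`, `ACCEPTING`, `reachable_states` and `nfa_accepts` are likewise assumptions.

```python
# Hypothetical NFA: from q0, input "1" may stay in q0 or move to q1.
NFA = {("q0", "1"): {"q0", "q1"}}
ACCEPTING = {"q1"}

def reachable_states(text, start="q0"):
    """Return every state the NFA could be in after reading text."""
    current = {start}
    for symbol in text:
        following = set()
        for state in current:
            following |= NFA.get((state, symbol), set())
        current = following
    return current

def nfa_accepts(text):
    """Accept if at least one possible run ends in an accepting state."""
    return bool(reachable_states(text) & ACCEPTING)
```

Here `reachable_states("11")` contains both `q0` and `q1`: the input alone does not determine a single current state.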

NFA vs. DFA (1)

  • NFAs and DFAs recognize the same set of languages (regular languages)
  • DFAs are easier to implement
    • There are no choices to consider

NFA vs. DFA (2)

  • For a given language the NFA can be simpler than the DFA

[Figure: an NFA and an equivalent DFA; transition labels: 0 and 1]

  • The DFA can be exponentially larger than the NFA

Regular Expressions to Finite Automata

  • High-level sketch

Lexical Specification ⇒ Regular expressions ⇒ NFA ⇒ DFA ⇒ Table-driven Implementation of DFA

Regular Expressions to NFA (1)

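  • For each kind of reg. expr, define an NFA
    • Notation: NFA for regular expression M

[Figure: box labelled M standing for the NFA of M]

  • For ε: [Figure: a transition labelled ε]
  • For input a: [Figure: a transition labelled a]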
Regular Expressions to NFA (2)

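  • For AB

[Figure: NFA for A joined to NFA for B by an ε-transition]

  • For A + B

[Figure: NFAs for A and B side by side, connected by four ε-transitions]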

Regular Expressions to NFA (3)

  • For A*

[Figure: NFA for A with three ε-transitions]
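
The five cases above can be sketched as functions that build NFA fragments. Because the figures are not reproduced here, the exact wiring of the ε-moves follows a common textbook variant of this construction and may differ in detail from the slides. All names (`symbol`, `concat`, `union`, `star`, `closure`, `accepts`) and the fragment representation are assumptions of this sketch.

```python
import itertools

_ids = itertools.count()

def _new_state():
    return next(_ids)

# A fragment is (start, accept, edges); an edge is (src, label, dst),
# and label None stands for an ε-move.
def epsilon():
    s, f = _new_state(), _new_state()
    return s, f, [(s, None, f)]

def symbol(a):
    s, f = _new_state(), _new_state()
    return s, f, [(s, a, f)]

def concat(A, B):
    # AB: link the accept state of A to the start state of B with ε.
    return A[0], B[1], A[2] + B[2] + [(A[1], None, B[0])]

def union(A, B):
    # A + B: new start and accept states, four ε-moves.
    s, f = _new_state(), _new_state()
    extra = [(s, None, A[0]), (s, None, B[0]), (A[1], None, f), (B[1], None, f)]
    return s, f, A[2] + B[2] + extra

def star(A):
    # A*: enter A, loop back to the new start, or skip straight to accept.
    s, f = _new_state(), _new_state()
    extra = [(s, None, A[0]), (A[1], None, s), (s, None, f)]
    return s, f, A[2] + extra

def closure(states, edges):
    """All states reachable from `states` using ε-moves only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, text):
    """Simulate the fragment as an NFA by tracking the set of current states."""
    start, accept, edges = nfa
    current = closure({start}, edges)
    for ch in text:
        moved = {dst for src, label, dst in edges if src in current and label == ch}
        current = closure(moved, edges)
    return accept in current
```

As a usage example, `concat(star(union(symbol("1"), symbol("0"))), symbol("1"))` builds an NFA for (1+0)*1, which accepts "1" and "0101" but not "" or "10".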

Example of Regular Expression → NFA conversion

  • Consider the regular expression (1+0)*1
  • The NFA is

[Figure: NFA with states A to J; labelled transitions C →¹ E, D →⁰ F and I →¹ J; the remaining transitions are ε-moves]

NFA to DFA. The Trick

  • Simulate the NFA
  • Each state of DFA = a non-empty subset of states of the NFA
  • Start state = the set of NFA states reachable through ε-moves from NFA start state
  • Add a transition S →ᵃ S′ to DFA iff
    • S′ is the set of NFA states reachable from any state in S after seeing the input a, considering ε-moves as well
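
The trick can be sketched as a worklist algorithm over sets of NFA states. The function names, the dictionary-based NFA representation (`delta` for labelled moves, `eps` for ε-moves) and the small example in the usage note are assumptions of this sketch; marking accepting DFA states is left out because this excerpt does not cover it.

```python
def epsilon_closure(states, eps):
    """All NFA states reachable from `states` through ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in eps.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def subset_construction(start, delta, eps, alphabet):
    """Build the DFA start state and transition table from an NFA.

    delta maps (state, symbol) to a set of states; eps maps a state to the
    set of states reachable by one ε-move.
    """
    dfa_start = epsilon_closure({start}, eps)
    dfa_delta = {}
    todo, seen = [dfa_start], {dfa_start}
    while todo:
        S = todo.pop()
        for a in alphabet:
            moved = set()
            for q in S:
                moved |= delta.get((q, a), set())
            T = epsilon_closure(moved, eps)
            if not T:
                continue  # only non-empty subsets become DFA states
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return dfa_start, dfa_delta
```

For a hypothetical NFA with an ε-move from state 0 to state 1, a loop on "a" at state 1 and a "b" move from 1 to 2, the DFA start state is {0, 1}, and reading "a" from it leads to {1}.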

Implementation (Cont.)

  • NFA → DFA conversion is at the heart of tools such as lex, ML-Lex or flex
  • But, DFAs can be huge
  • In practice, lex/ML-Lex/flex-like tools trade off speed for space in the choice of NFA and DFA representations

Theory vs. Practice

Two differences:

  • DFAs recognize lexemes. A lexer must return a type of acceptance (token type) rather than simply an accept/reject indication.
  • DFAs consume the complete string and accept or reject it. A lexer must find the end of the lexeme in the input stream and then find the next one, etc.
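
Both differences can be illustrated with a small scanner built on a hand-written DFA. This is a sketch under several assumptions: the DFA, its state names and the `TOKEN_OF` mapping are hypothetical, characters are grouped with Python's `isdigit` and `isalpha`, and the end of a lexeme is taken to be the last position at which the DFA was in an accepting state (the longest-match convention, which this excerpt does not spell out).

```python
def char_class(ch):
    """Map a character to the input symbol used by the DFA."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    return ch

# Hypothetical hand-built DFA over character classes.
TRANSITIONS = {
    ("start", "digit"): "int",
    ("int", "digit"): "int",
    ("start", "letter"): "id",
    ("id", "letter"): "id",
    ("id", "digit"): "id",
    ("start", "("): "paren",
}
# Difference 1: each accepting state carries a token type.
TOKEN_OF = {"int": "Integer", "id": "Identifier", "paren": "OpenPar"}

def next_token(text, pos):
    """Run the DFA from pos and return (token, lexeme, end position)."""
    state, last = "start", None
    i = pos
    while i < len(text):
        state = TRANSITIONS.get((state, char_class(text[i])))
        if state is None:
            break  # stuck: fall back to the last accepting position
        i += 1
        if state in TOKEN_OF:
            last = (TOKEN_OF[state], i)
    if last is None:
        raise ValueError(f"no token at position {pos}")
    token, end = last
    return token, text[pos:end], end

def scan(text):
    """Difference 2: find the end of each lexeme, then restart for the next."""
    pos, out = 0, []
    while pos < len(text):
        token, lexeme, pos = next_token(text, pos)
        out.append((token, lexeme))
    return out
```

For instance, `scan("foo(42")` splits the input into an Identifier, an OpenPar and an Integer, restarting the DFA after each lexeme.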