LR Parsing
CS 5300 - SJAllan 2
Lexical Analyzer and Parser
[Figure: the source program feeds the Lexical Analyzer; the Parser sends "get next token" requests to the Lexical Analyzer, which returns a token each time.]
Shift-Reduce or LR(k) Parsing
Bottom-up parsing technique
LR(k) parsing
– L – scan the input from left-to-right
– R – construct a rightmost derivation in reverse
– k – number of look-ahead symbols needed
Called a shift-reduce parsing method
– Shift and reduce are the major actions done by
the parser
LR(k) Parsers
Advantages
– Can be constructed for virtually all programming
language constructs for which a CFG can be written
– Is more general than other deterministic parsing methods
Handles every grammar that an LL(k) parser can handle, and more
– Can detect syntactic errors as soon as possible on a
left-to-right scan of the input
Disadvantages
– Too much work to implement by hand; you must use a
parser generator
Why LR(k) Grammars are Useful
The parser knows when to cease scanning given a
sentential form αβγ, i.e., it can detect the boundary
between β and γ
The parser is able to identify the handle β
The parser is able to uniquely select a production
A → β that corresponds to the handle and to this
sentential form
– Grammars can be LR(k) and yet have productions A →
β, B → β with the same right-hand side
The parser knows when to stop
LR(k) (Shift-Reduce) Parsing
Parsing performed by a finite state machine
Parsing algorithm is language-independent
FSM driven by table(s) generated
automatically from grammar
[Figure: Language → Generator → Tables]
Model of LR Parser
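[Figure: the LR parsing program reads the input a1 … ai … an $, keeps a stack s0 … Xm-1 sm-1 Xm sm with sm on top, consults the action and goto parts of the parse table, and produces the output.]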
LR Parsing Algorithm
set ip to point to the first symbol in ω$
initialize stack to s0
repeat forever
    let s be the topmost state on the stack
    let a be the symbol pointed to by ip
    if action[s,a] = shift s’
        push a then s’ onto the stack
        advance ip to the next input symbol
    else if action[s,a] = reduce A → β
        pop 2*|β| symbols off the stack
        let s’ be the state now on top of the stack
        push A then goto[s’,A] onto the stack
        output production A → β
    else if action[s,a] = accept
        return success
    else error()
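The loop above can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes single-character tokens, and the table entries are the SLR(1) entries for the example grammar E→E+T | T, T→TF | F, F→F* | (E) | a | b | e, derived from that grammar's item sets and Follow sets. The names `lr_parse`, `reduce_on`, and `START` are made up for illustration.

```python
# Productions, numbered as in the example grammar: number -> (head, body).
PRODUCTIONS = {
    1: ("E", "E+T"), 2: ("E", "T"), 3: ("T", "TF"), 4: ("T", "F"),
    5: ("F", "F*"), 6: ("F", "(E)"), 7: ("F", "a"), 8: ("F", "b"), 9: ("F", "e"),
}
TERMINALS = "abe()+*$"

# Shifts shared by every state that can start an F (states 0, 2, 4, 8, 12).
START = {"a": "s5", "b": "s6", "e": "s7", "(": "s4"}


def reduce_on(symbols, n):
    """Helper (hypothetical): 'reduce by production n' on each listed terminal."""
    return {t: f"r{n}" for t in symbols}


# ASSUMPTION: entries derived from the grammar's item sets and Follow sets.
ACTION = {
    0: dict(START),
    1: {"+": "s8", "$": "acc"},
    2: {**START, **reduce_on(")+$", 2)},
    3: {**reduce_on("abe()+$", 4), "*": "s10"},
    4: dict(START),
    5: reduce_on(TERMINALS, 7),
    6: reduce_on(TERMINALS, 8),
    7: reduce_on(TERMINALS, 9),
    8: dict(START),
    9: {**reduce_on("abe()+$", 3), "*": "s10"},
    10: reduce_on(TERMINALS, 5),
    11: {")": "s13", "+": "s8"},
    12: {**START, **reduce_on(")+$", 1)},
    13: reduce_on(TERMINALS, 6),
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3}, 2: {"F": 9}, 4: {"E": 11, "T": 2, "F": 3},
    8: {"T": 12, "F": 3}, 12: {"F": 9},
}


def lr_parse(text):
    """Return the production numbers in the order the parser reduces by them."""
    stack = [0]                       # alternates states and grammar symbols
    tokens = list(text) + ["$"]
    ip = 0
    output = []
    while True:
        s, a = stack[-1], tokens[ip]
        act = ACTION[s].get(a)        # a missing entry means "error"
        if act is None:
            raise ValueError(f"unexpected {a!r} in state {s}")
        if act == "acc":
            return output
        if act[0] == "s":             # shift: push the symbol, then the new state
            stack += [a, int(act[1:])]
            ip += 1
        else:                         # reduce by A -> beta
            n = int(act[1:])
            head, body = PRODUCTIONS[n]
            del stack[len(stack) - 2 * len(body):]   # pop 2*|beta| entries
            stack += [head, GOTO[stack[-1]][head]]
            output.append(n)
```

Calling `lr_parse("(ab+a)*b")` yields the reductions in the order 7, 4, 8, 3, 2, 7, 4, 1, 6, 5, 4, 8, 3, 2. Swapping in a different `ACTION`/`GOTO` pair changes the language without touching the loop.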
Another Example
Step  Stack               Input            Action
(1)   0                   id * id + id $   shift
(2)   0 id 5              * id + id $      reduce by F → id
(3)   0 F 3               * id + id $      reduce by T → F
(4)   0 T 2               * id + id $      shift
(5)   0 T 2 * 7           id + id $        shift
(6)   0 T 2 * 7 id 5      + id $           reduce by F → id
(7)   0 T 2 * 7 F 10      + id $           reduce by T → T * F
(8)   0 T 2               + id $           reduce by E → T
(9)   0 E 1               + id $           shift
(10)  0 E 1 + 6           id $             shift
(11)  0 E 1 + 6 id 5      $                reduce by F → id
(12)  0 E 1 + 6 F 3       $                reduce by T → F
(13)  0 E 1 + 6 T 9       $                reduce by E → E + T
(14)  0 E 1               $                accept
CS 5300 - SJAllan 14
Yet Another Example (G)
Grammar
0. E’→E$     5. F→F*
1. E→E+T     6. F→(E)
2. E→T       7. F→a
3. T→TF      8. F→b
4. T→F       9. F→e
Parse Table
       action                                  | goto
State  a    b    e    (    )    +    *    $    | E    T    F
0      S5   S6   S7   S4                       | 1    2    3
1                               S8        acc  |
2      S5   S6   S7   S4   R2   R2        R2   |           9
3      R4   R4   R4   R4   R4   R4   S10  R4   |
4      S5   S6   S7   S4                       | 11   2    3
5      R7   R7   R7   R7   R7   R7   R7   R7   |
6      R8   R8   R8   R8   R8   R8   R8   R8   |
7      R9   R9   R9   R9   R9   R9   R9   R9   |
8      S5   S6   S7   S4                       |      12   3
9      R3   R3   R3   R3   R3   R3   S10  R3   |
10     R5   R5   R5   R5   R5   R5   R5   R5   |
11                         S13  S8             |
12     S5   S6   S7   S4   R1   R1        R1   |           9
13     R6   R6   R6   R6   R6   R6   R6   R6   |
Yet Another Example
Stack          Input        Action
0              (ab+a)*b$    S4
0 4            ab+a)*b$     S5
0 4 5          b+a)*b$      R7
0 4 3          b+a)*b$      R4
0 4 2          b+a)*b$      S6
0 4 2 6        +a)*b$       R8
0 4 2 9        +a)*b$       R3
0 4 2          +a)*b$       R2
0 4 11         +a)*b$       S8
0 4 11 8       a)*b$        S5
0 4 11 8 5     )*b$         R7
0 4 11 8 3     )*b$         R4
0 4 11 8 12    )*b$         R1
0 4 11         )*b$         S13
0 4 11 13      *b$          R6
0 3            *b$          S10
0 3 10         b$           R5
0 3            b$           R4
0 2            b$           S6
0 2 6          $            R8
0 2 9          $            R3
0 2            $            R2
0 1            $            acc
LR(k) Parsing
The only difference between parsing one LR
language and another is the information in
the action and goto fields of the parse table
The parsing algorithm is always the same
Item Set Construction
Start operation:
– If S is the start symbol, and S→α is some production,
the item [S→⋅α] is associated with the start state
Completion operation (closure):
– If [A→α⋅Bγ], where B∈Vn , is an item in some state I,
then every item of the form [B→⋅β] must be included in
state I. This rule is repeated until no more new items
can be added to state I
Read operation:
– Let [A→α⋅Xγ], where X∈(Vn∪Vt ), be an item associated
with some state I. Then [A→αX⋅γ] is associated with
state J (possibly the same as I), and a transition from I
to J on symbol X exists
Construction of Finite State System
1. Give the start state a number and use the start operation
to put one item into it
- Use the completion operation to get more items into this state
Eventually the completion operation has to end
2. Use the read operation to start one or more new states,
based on the present state
- It is possible for this new state to be equivalent to some previous state
If so, these states are merged
3. Complete the new state started previously by applying the
completion operation
4. Repeat steps 2 and 3 until no new states are added
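The start, completion, and read operations, together with the four construction steps, can be sketched in Python. This is a minimal sketch under stated assumptions: grammar symbols are single characters, production 0 is the augmented start production, and an item is a tuple (head, body, dot position). The names `closure`, `read`, and `build_states` are illustrative. The grammar used is S’→S, S→aSbS, S→a.

```python
# Production 0 is the augmented start production S' -> S.
GRAMMAR = [("S'", "S"), ("S", "aSbS"), ("S", "a")]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    """Completion operation: add [B -> .beta] for every nonterminal B after a dot."""
    state = set(items)
    pending = list(state)
    while pending:
        _, body, dot = pending.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for head, rhs in GRAMMAR:
                item = (head, rhs, 0)
                if head == body[dot] and item not in state:
                    state.add(item)
                    pending.append(item)
    return frozenset(state)


def read(state, symbol):
    """Read operation: move the dot over `symbol`, then complete the new state."""
    moved = {(h, b, d + 1) for h, b, d in state if d < len(b) and b[d] == symbol}
    return closure(moved)


def build_states():
    """Steps 1-4: number the states and record the transitions between them."""
    start_head, start_body = GRAMMAR[0]
    states = [closure({(start_head, start_body, 0)})]      # step 1
    transitions = {}
    i = 0
    while i < len(states):                                  # step 4: until no new states
        state = states[i]
        symbols = sorted({b[d] for _, b, d in state if d < len(b)})
        for x in symbols:
            target = read(state, x)                         # steps 2 and 3
            if target not in states:                        # equal states are merged
                states.append(target)
            transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions
```

For this grammar `build_states()` produces six states. Because frozensets compare by content, a state reached a second time is recognized as the existing one, which is the merging described in step 2.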
Example of LR(0) Items
Grammar
0. S’→S
1. S→aSbS
2. S→a
LR(0) Items
0. [S’→⋅S] [S→⋅aSbS] [S→⋅a]
1. [S’→S⋅]
2. [S→a⋅SbS] [S→a⋅] [S→⋅aSbS] [S→⋅a]
3. [S→aS⋅bS]
4. [S→aSb⋅S] [S→⋅aSbS] [S→⋅a]
5. [S→aSbS⋅]
Transition Table
State  a    b    S
0      2         1
1
2      2         3
3           4
4      2         5
5
Inadequate or Inconsistent States
Definition:
– Any state containing both a completed item
[A→α⋅] and any other item is said to be
inadequate or inconsistent
– Such a state represents a conflict in a parsing
decision
If the item sets contain no inadequate
states, the grammar is said to be LR(0)
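The definition translates directly into a check over item sets. The sketch below is illustrative only: it assumes items are written as (head, body, dot position) tuples with single-character symbols, and it hard-codes the six item sets of the grammar S’→S, S→aSbS, S→a. The function names are made up for this example.

```python
# LR(0) item sets of S' -> S, S -> aSbS, S -> a, indexed by state number.
ITEM_SETS = [
    {("S'", "S", 0), ("S", "aSbS", 0), ("S", "a", 0)},
    {("S'", "S", 1)},
    {("S", "aSbS", 1), ("S", "a", 1), ("S", "aSbS", 0), ("S", "a", 0)},
    {("S", "aSbS", 2)},
    {("S", "aSbS", 3), ("S", "aSbS", 0), ("S", "a", 0)},
    {("S", "aSbS", 4)},
]


def inadequate_states(item_sets):
    """Return the numbers of states holding a completed item plus any other item."""
    bad = []
    for number, items in enumerate(item_sets):
        completed = [item for item in items if item[2] == len(item[1])]
        if completed and len(items) > 1:
            bad.append(number)
    return bad


def is_lr0(item_sets):
    """A grammar is LR(0) when none of its states is inadequate."""
    return not inadequate_states(item_sets)
```

Here only state 2 is inadequate: it holds the completed item [S→a⋅] next to items that still want to shift, so this grammar is not LR(0).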
FOLLOW
FOLLOW(A), for A∈Vn, is the set of
terminals a that can appear immediately to
the right of A in some sentential form
More formally, a is in FOLLOW(A) if and
only if there exists a derivation of the form
S ⇒* αAaβ
$ is in FOLLOW(A) if and only if there
exists a derivation of the form S ⇒* αA
Computing FOLLOW
Place $ in FOLLOW(S)
If there is a production A → αBβ, then
everything in FIRST(β) (except for ε) is in
FOLLOW(B)
If there is a production A → αB, or a
production A → αBβ where FIRST(β)
contains ε, then everything in FOLLOW(A) is
also in FOLLOW(B)
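The three rules are applied repeatedly until no set grows any further. A minimal Python sketch of that fixed-point computation follows, using the grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id as input. The dictionary layout and the function names are assumptions made for this illustration.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of(seq, first):
    """FIRST of a string of symbols; contains EPS iff every symbol can vanish."""
    out = set()
    for x in seq:
        fx = first[x] if x in first else {x}     # FIRST of a terminal is itself
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)
    return out


def compute_first(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of(body, first) - first[head]
                if new:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {a: set() for a in grammar}
    follow[start].add("$")                        # rule 1
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in grammar:
                        continue                  # FOLLOW is for nonterminals only
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}            # rule 2
                    if EPS in rest:               # rule 3 (also when nothing follows B)
                        new |= follow[head]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow
```

With `first = compute_first(GRAMMAR)` and `follow = compute_follow(GRAMMAR, "E", first)`, FOLLOW(F) comes out as {+, *, ), $}: the ) and $ are inherited from FOLLOW(T) because T’ can derive ε.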
FIRST and FOLLOW Example
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
FIRST(E) = FIRST(T) = FIRST(F) = {(, id }
FIRST(E’) = {+, ε}
FIRST(T’) = {*, ε}
FOLLOW(E) = FOLLOW(E’) = {), $}
FOLLOW(T) = FOLLOW(T’) = {+, ), $}
FOLLOW(F) = {+, *, ), $}
SLR(1) Parse Table Construction
1. Construct the LR(0) item sets {I0, …, In}
2. The parsing actions for state i are determined from Ii:
a. If [A→α⋅aβ] ∈ Ii, goto(Ii,a) = Ij, and a ∈ Vt, then set action[i,a] = shift j
b. If [A→α⋅] ∈ Ii, where A ≠ S’, then set action[i,a] = reduce A→α for all a ∈ Follow(A)
c. If [S’→S⋅] ∈ Ii, then set action[i,$] = accept
3. The goto transitions for state i are constructed for all
nonterminals A using the rule: If goto(Ii,A) = Ij, then
goto[i,A] = j
4. All entries not defined by rules 2 and 3 are made error
5. The initial state of the parser is the one constructed from
the set of items containing [S’→⋅S]
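Rules 2 through 4 can be sketched as a function that fills the two tables from precomputed item sets. This is a sketch under stated assumptions, not a definitive implementation: the item sets, transitions, and Follow set are written out by hand for the small grammar S’→S, S→aSbS, S→a, symbols are single characters, and all names are illustrative.

```python
PRODUCTIONS = [("S'", "S"), ("S", "aSbS"), ("S", "a")]   # index = production number
NONTERMINALS = {"S'", "S"}
# Items are (head, body, dot position); list index = state number.
ITEM_SETS = [
    {("S'", "S", 0), ("S", "aSbS", 0), ("S", "a", 0)},
    {("S'", "S", 1)},
    {("S", "aSbS", 1), ("S", "a", 1), ("S", "aSbS", 0), ("S", "a", 0)},
    {("S", "aSbS", 2)},
    {("S", "aSbS", 3), ("S", "aSbS", 0), ("S", "a", 0)},
    {("S", "aSbS", 4)},
]
GOTO_FN = {(0, "a"): 2, (0, "S"): 1, (2, "a"): 2, (2, "S"): 3,
           (3, "b"): 4, (4, "a"): 2, (4, "S"): 5}
FOLLOW = {"S": {"b", "$"}}


def build_slr_table(productions, item_sets, goto_fn, follow, nonterminals):
    action = [dict() for _ in item_sets]
    goto = [dict() for _ in item_sets]
    conflicts = []

    def set_action(i, a, entry):
        # Two different entries for the same cell mean the grammar is not SLR(1).
        if a in action[i] and action[i][a] != entry:
            conflicts.append((i, a, action[i][a], entry))
        else:
            action[i][a] = entry

    for i, items in enumerate(item_sets):
        for head, body, dot in items:
            if dot < len(body):
                x = body[dot]
                if x not in nonterminals:                     # rule 2a
                    set_action(i, x, ("shift", goto_fn[(i, x)]))
            elif (head, body) == productions[0]:              # rule 2c
                set_action(i, "$", ("accept",))
            else:                                             # rule 2b
                n = productions.index((head, body))
                for a in follow[head]:
                    set_action(i, a, ("reduce", n))
        for (state, x), j in goto_fn.items():                 # rule 3
            if state == i and x in nonterminals:
                goto[i][x] = j
    return action, goto, conflicts                            # rule 4: absent = error
```

Error entries are simply left out of the dictionaries. For this grammar the conflict list stays empty even though state 2 is inadequate, because Follow(S) = {b, $} does not contain a.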
Item Sets Using G
I0: [E’→⋅E] [E→⋅E+T] [E→⋅T] [T→⋅TF] [T→⋅F] [F→⋅F∗] [F→⋅(E)] [F→⋅a] [F→⋅b] [F→⋅e]
I1: [E’→E⋅] [E→E⋅+T]
I2: [E→T⋅] [T→T⋅F] [F→⋅F∗] [F→⋅(E)] [F→⋅a] [F→⋅b] [F→⋅e]
I3: [T→F⋅] [F→F⋅∗]
I4: [F→(⋅E)] [E→⋅E+T] [E→⋅T] [T→⋅TF] [T→⋅F] [F→⋅F∗] [F→⋅(E)] [F→⋅a] [F→⋅b] [F→⋅e]
I5: [F→a⋅]
I6: [F→b⋅]
I7: [F→e⋅]
I8: [E→E+⋅T] [T→⋅TF] [T→⋅F] [F→⋅F∗] [F→⋅(E)] [F→⋅a] [F→⋅b] [F→⋅e]
I9: [T→TF⋅] [F→F⋅∗]
I10: [F→F∗⋅]
I11: [F→(E⋅)] [E→E⋅+T]
I12: [E→E+T⋅] [T→T⋅F] [F→⋅F∗] [F→⋅(E)] [F→⋅a] [F→⋅b] [F→⋅e]
I13: [F→(E)⋅]
Follow(E) = {+, ), $}
Follow(T) = {+, ), $, (, a, b, e}
Follow(F) = {+, ), $, ∗, (, a, b, e}
The SLR(1) table is shown on slide 14
Grammar G3
0. S’→S
1. S→Aa
2. S→dAb
3. S→dca
4. S→cb
5. A→c
SLR(1) Parse Table
       action                         | goto
State  a       b       c    d    $    | S    A
0                      S4   S3        | 1    2
1                                acc  |
2      S5                             |
3                      S7             |      6
4      R5      S8/R5                  |
5                                R1   |
6              S9                     |
7      R5/S10  R5                     |
8                                R4   |
9                                R2   |
10                               R3   |
Looking at G3
Consider the string cb
We run into problems in state 4
– Notice that we can never have a b following an
A (under the assumption we reduce)
– Ab is not a viable prefix
A viable prefix is so called because it is
always possible to add terminal symbols to
the end of a viable prefix to obtain a right
sentential form
Construction of LR(1) Parse Table
Using LR(0) items, inadequate states are resolved as
follows:
– On the item [A→α⋅], the reduce operation is applied on every symbol in
Follow(A)
– As we have seen, this doesn’t always work
We want to carry more information in the item
– We will add look-ahead symbols
– [A→α⋅β,a] where
A → αβ is a production and a is a terminal or $
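To see how the look-ahead travels with an item, here is a minimal Python sketch of the completion operation on items of the form [A→α⋅β,a]. It uses the standard LR(1) closure rule, which these notes do not spell out: for an item [A→α⋅Bβ,a], add [B→⋅γ,b] for every production B→γ and every b in FIRST(βa). The sketch assumes a grammar without ε-productions and single-character symbols; the grammar is S’→S, S→Aa | dAb | dca | cb, A→c, and the function names are illustrative.

```python
GRAMMAR = [("S'", "S"), ("S", "Aa"), ("S", "dAb"), ("S", "dca"), ("S", "cb"), ("A", "c")]
NONTERMINALS = {head for head, _ in GRAMMAR}


def first_sets():
    """FIRST of each nonterminal. ASSUMPTION: no ε-productions, so only the
    first symbol of each body matters."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            x = body[0]
            new = (first[x] if x in NONTERMINALS else {x}) - first[head]
            if new:
                first[head] |= new
                changed = True
    return first


FIRST = first_sets()


def lr1_closure(items):
    """Complete a set of LR(1) items (head, body, dot position, look-ahead)."""
    state = set(items)
    pending = list(state)
    while pending:
        _, body, dot, look = pending.pop()
        if dot == len(body) or body[dot] not in NONTERMINALS:
            continue
        b, rest = body[dot], body[dot + 1:]
        if not rest:
            lookaheads = {look}              # FIRST(βa) with β empty is {a}
        elif rest[0] in NONTERMINALS:
            lookaheads = FIRST[rest[0]]
        else:
            lookaheads = {rest[0]}
        for head, rhs in GRAMMAR:
            if head != b:
                continue
            for la in lookaheads:
                item = (head, rhs, 0, la)
                if item not in state:
                    state.add(item)
                    pending.append(item)
    return frozenset(state)
```

Starting from [S’→⋅S,$], the closure adds [A→⋅c,a], while starting from [S→d⋅Ab,$] it adds [A→⋅c,b]. The two copies of A→c carry different look-aheads, which is the extra information the LR(0) items lack.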
LR(1) Parsing Tables – Example 1
Grammar
0. S’→S
1. S→Aa
2. S→dAb
3. S→dca
4. S→cb
5. A→c
LR(1) Items
0. [S’→⋅S,$] [S→⋅Aa,$] [S→⋅dAb,$] [S→⋅dca,$] [S→⋅cb,$] [A→⋅c,a]
1. [S’→S⋅,$]
2. [S→A⋅a,$]
3. [S→d⋅Ab,$] [S→d⋅ca,$] [A→⋅c,b]
4. [S→c⋅b,$] [A→c⋅,a]
5. [S→Aa⋅,$]
6. [S→dA⋅b,$]
7. [S→dc⋅a,$] [A→c⋅,b]
8. [S→cb⋅,$]
9. [S→dAb⋅,$]
10. [S→dca⋅,$]
LR(1) Parsing Tables – Example 1
       action                   | goto
State  a    b    c    d    $    | S    A
0                S4   S3        | 1    2
1                          acc  |
2      S5                       |
3                S7             |      6
4      R5   S8                  |
5                          R1   |
6           S9                  |
7      S10  R5                  |
8                          R4   |
9                          R2   |
10                         R3   |
LR(1) Parsing Tables – Example 2
Grammar
0. S’→S
1. S→CC
2. C→eC
3. C→d
LR(1) Items
0. [S’→⋅S,$] [S→⋅CC,$] [C→⋅eC,e/d] [C→⋅d,e/d]
1. [S’→S⋅,$]
2. [S→C⋅C,$] [C→⋅eC,$] [C→⋅d,$]
3. [C→e⋅C,e/d] [C→⋅eC,e/d] [C→⋅d,e/d]
4. [C→d⋅,e/d]
5. [S→CC⋅,$]
6. [C→e⋅C,$] [C→⋅eC,$] [C→⋅d,$]
7. [C→d⋅,$]
8. [C→eC⋅,e/d]
9. [C→eC⋅,$]
LR(1) Parsing Tables – Example 2
       action         | goto
State  e    d    $    | S    C
0      S3   S4        | 1    2
1                acc  |
2      S6   S7        |      5
3      S3   S4        |      8
4      R3   R3        |
5                R1   |
6      S6   S7        |      9
7                R3   |
8      R2   R2        |
9                R2   |