Building a Recursive Descent Parser: Understanding BNF and Java Code, Slides of Programming Languages

An in-depth explanation of how to build a recursive descent parser using BNF (Backus-Naur Form) and Java code. It covers the basics of BNF, extended BNF, recognizing simple alternatives, helper methods, and sequences. The document also discusses the importance of the DRY (Don't Repeat Yourself) principle and provides Java code examples.

Typology: Slides

2012/2013

Uploaded on 09/29/2013

dhanvant

Recognizers

Parsers and recognizers

• Given a grammar (say, in BNF) and a string,
  - A recognizer will tell whether the string belongs to the language defined by the grammar
  - A parser will try to build a tree corresponding to the string, according to the rules of the grammar

Input string    Recognizer result    Parser result
2 + 3 * 4       true                 (parse tree)
2 + 3 *         false                Error
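The recognizer column of the table can be reproduced with a very small sketch. The grammar used here, `<expression> ::= <number> { ( + | * ) <number> }`, is an assumption made for illustration and is not taken from the slides, as are the class name `TinyRecognizer` and the rule that tokens are separated by whitespace.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recognizer for an assumed grammar (not from the slides):
//   <expression> ::= <number> { ( + | * ) <number> }
class TinyRecognizer {
    private final List<String> tokens;
    private int position = 0;

    TinyRecognizer(String input) {
        // Assumption: tokens are separated by whitespace
        this.tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    // True only if the whole input is exactly one <expression>
    static boolean recognize(String input) {
        TinyRecognizer r = new TinyRecognizer(input);
        return r.expression() && r.position == r.tokens.size();
    }

    private boolean expression() {
        if (!number()) return false;
        while (operator()) {
            if (!number()) return false; // an operator must be followed by a number
        }
        return true;
    }

    private boolean number() {
        if (position < tokens.size() && tokens.get(position).matches("\\d+")) {
            position++;
            return true;
        }
        return false;
    }

    private boolean operator() {
        if (position < tokens.size()
                && (tokens.get(position).equals("+") || tokens.get(position).equals("*"))) {
            position++;
            return true;
        }
        return false;
    }
}
```

With this sketch, `TinyRecognizer.recognize("2 + 3 * 4")` is true and `TinyRecognizer.recognize("2 + 3 *")` is false. A parser would follow the same control flow but build tree nodes instead of returning booleans.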

Review of BNF

• “Plain” BNF
  - < > indicate a nonterminal that needs to be further expanded, for example, <add_operator>
  - Symbols not enclosed in < > are terminals; they represent themselves, for example, if, while, (
  - The symbol ::= means “is defined as”
  - The symbol | means “or”; it separates alternatives, for example, <add_operator> ::= + | -

• Extended BNF
  - [ ] enclose an optional part of the rule
    Example: <if_statement> ::= if ( <condition> ) <statement> [ else <statement> ]
  - { } mean the enclosed part can be repeated zero or more times
    Example: <parameter_list> ::= ( ) | ( <parameter> { , <parameter> } )
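In a hand-written recognizer, the two extended BNF constructs usually map onto ordinary control flow: an optional part becomes an `if`, and a repeated part becomes a `while`. The sketch below shows this for an assumed rule, `<call> ::= <name> ( [ <name> { , <name> } ] )`, chosen only because it uses both constructs. The class `EbnfSketch` and its helpers are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Sketch for an assumed rule (for illustration only):
//   <call> ::= <name> ( [ <name> { , <name> } ] )
class EbnfSketch {
    private final List<String> tokens;
    private int position = 0;

    EbnfSketch(String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    // True only if the tokens form exactly one <call>
    static boolean isCall(String... tokens) {
        EbnfSketch s = new EbnfSketch(tokens);
        return s.call() && s.position == s.tokens.size();
    }

    private boolean call() {
        if (!name()) return false;
        if (!symbol("(")) return false;
        if (name()) {                 // [ ... ] : the optional part becomes an if
            while (symbol(",")) {     // { ... } : the repeated part becomes a while
                if (!name()) return false;
            }
        }
        return symbol(")");
    }

    // Consumes the next token only if it equals the expected text
    private boolean symbol(String expected) {
        if (position < tokens.size() && tokens.get(position).equals(expected)) {
            position++;
            return true;
        }
        return false;
    }

    // Consumes the next token only if it looks like an identifier
    private boolean name() {
        if (position < tokens.size() && tokens.get(position).matches("[A-Za-z_]\\w*")) {
            position++;
            return true;
        }
        return false;
    }
}
```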

Recognizing simple alternatives, I

• Consider the following BNF rule:
  - <add_operator> ::= + | -
  - That is, an add operator is a plus sign or a minus sign
• To recognize an add operator, we need to get the next token, and test whether it is one of these characters
  - If it is a plus or a minus, we simply return true
  - But what if it isn’t?
    We not only need to return false, but we also need to put the token back, because it doesn’t belong to us and some other grammar rule probably wants it
• Our tokenizer needs to be able to take back tokens
  - Usually, it’s enough to be able to put just one token back
  - More complex grammars may require the ability to put back several tokens
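One way to get a tokenizer that can take tokens back is to keep every token in a list and move an index. The following is a minimal sketch, not the tokenizer used in the course: `Type`, `Token`, and `SimpleTokenizer` are hypothetical, and the input is assumed to be whitespace-separated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token types; the slides mention symbols, names, numbers, and eol
enum Type { NAME, NUMBER, SYMBOL, EOL }

// Hypothetical immutable token: a type plus the text it came from
final class Token {
    private final Type type;
    private final String value;

    Token(Type type, String value) {
        this.type = type;
        this.value = value;
    }

    Type type() { return type; }
    String value() { return value; }
}

// Sketch of a tokenizer that can take tokens back
class SimpleTokenizer {
    private final List<Token> tokens = new ArrayList<>();
    private int position = 0;

    // Assumption: input tokens are separated by whitespace
    SimpleTokenizer(String input) {
        for (String s : input.trim().split("\\s+")) {
            if (s.isEmpty()) continue;
            if (s.matches("\\d+")) tokens.add(new Token(Type.NUMBER, s));
            else if (s.matches("[A-Za-z_]\\w*")) tokens.add(new Token(Type.NAME, s));
            else tokens.add(new Token(Type.SYMBOL, s));
        }
    }

    // Returns the next token, or an EOL token once the input is used up
    Token next() {
        Token t = position < tokens.size() ? tokens.get(position) : new Token(Type.EOL, "");
        position++; // counted even at the end, so pushBack(1) undoes exactly one call
        return t;
    }

    // Un-reads the last n tokens so that next() returns them again
    void pushBack(int n) {
        if (n < 0 || n > position) {
            throw new IllegalArgumentException("cannot push back " + n + " tokens");
        }
        position -= n;
    }
}
```

Because the whole token list is kept, this version can push back any number of tokens. A tokenizer that reads from a stream would need a buffer instead.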

Java code

    public boolean addOperator() {
        Token t = tokenizer.next();
        if (t.type() == Type.SYMBOL && t.value().equals("+")) {
            return true;
        }
        if (t.type() == Type.SYMBOL && t.value().equals("-")) {
            return true;
        }
        tokenizer.pushBack(1);  // not ours: give the token back
        return false;
    }

• While this code isn’t particularly long or hard to read, we are going to have a lot of very similar methods

Helper methods

• Remember the DRY principle: Don’t Repeat Yourself
• If we turn each BNF production directly into Java, we will be writing a lot of very similar code
• We should write some auxiliary or “helper” methods to hide some of the details for us
• First helper method:
  - private boolean symbol(String expectedSymbol)
  - Gets the next token and tests whether it matches the expectedSymbol
  - If it matches, returns true
  - If it doesn’t match, puts the token back and returns false
• We’ll look more closely at this method in a moment

First implementation of symbol

• Here’s what symbol does:
  - Gets a token
  - Makes sure that the token is a symbol
  - Compares the symbol to the desired symbol (by value)
  - If all the above is satisfied, returns true
  - Else (if not satisfied) puts the token back, and returns false

    private boolean symbol(String value) {
        Token t = tokenizer.next();
        if (t.type() == Type.SYMBOL && value.equals(t.value())) {
            return true;
        } else {
            tokenizer.pushBack(1);
            return false;
        }
    }

Implementing symbol

• We can implement methods name, number, and maybe eol the same way
• All this code will look pretty much alike
  - The main difference is in checking for the type
  - The DRY principle suggests we should use a helper method for symbol

    private boolean symbol(String expectedValue) {
        return nextTokenMatches(Type.SYMBOL, expectedValue);
    }

nextTokenMatches

• The previous method is fine for symbols, but what if we only care about the type?
  - For example, we want to get a number, any number
  - We need to compare only type, not value

    private boolean nextTokenMatches(Type type, String value) {
        Token t = tokenizer.next();
        if (type == t.type() && value.equals(t.value())) {
            return true;
        }
        tokenizer.pushBack(1);
        return false;
    }

  - For the type-only version, omit the value parameter and omit the value.equals(...) test
• It’s easier to overload nextTokenMatches than to combine the two versions, and both versions are fairly short, so we are probably better off with the code duplication
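The two overloads can be sketched side by side as follows. The `Type`, `Token`, and `ListTokenizer` classes are hypothetical stand-ins so that the example runs on its own; only the method names `nextTokenMatches`, `symbol`, and `addOperator` come from the slides, and `number` is one of the methods the slides suggest writing.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical token model; the slides only show Type.SYMBOL, type(), and value()
enum Type { NAME, NUMBER, SYMBOL, EOL }

final class Token {
    private final Type type;
    private final String value;
    Token(Type type, String value) { this.type = type; this.value = value; }
    Type type() { return type; }
    String value() { return value; }
}

// Minimal stand-in for the slides' tokenizer (an assumption)
class ListTokenizer {
    private final List<Token> tokens;
    private int position = 0;
    ListTokenizer(Token... tokens) { this.tokens = Arrays.asList(tokens); }

    Token next() {
        Token t = position < tokens.size() ? tokens.get(position) : new Token(Type.EOL, "");
        position++;
        return t;
    }

    void pushBack(int n) { position -= n; }
}

class Recognizer {
    private final ListTokenizer tokenizer;
    Recognizer(ListTokenizer tokenizer) { this.tokenizer = tokenizer; }

    // Both the type and the value must match
    private boolean nextTokenMatches(Type type, String value) {
        Token t = tokenizer.next();
        if (type == t.type() && value.equals(t.value())) return true;
        tokenizer.pushBack(1);
        return false;
    }

    // Overload: only the type must match
    private boolean nextTokenMatches(Type type) {
        Token t = tokenizer.next();
        if (type == t.type()) return true;
        tokenizer.pushBack(1);
        return false;
    }

    public boolean symbol(String expectedValue) {
        return nextTokenMatches(Type.SYMBOL, expectedValue);
    }

    public boolean number() {
        return nextTokenMatches(Type.NUMBER);
    }

    public boolean addOperator() {
        return symbol("+") || symbol("-");
    }
}
```

Each failed match leaves the token in place, so the alternatives in `addOperator` can simply be tried one after another.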

addOperator reprise

    public boolean addOperator() {
        return symbol("+") || symbol("-");
    }

    private boolean symbol(String expectedValue) {
        return nextTokenMatches(Type.SYMBOL, expectedValue);
    }

    private boolean nextTokenMatches(Type type, String value) {
        Token t = tokenizer.next();
        if (type == t.type() && value.equals(t.value())) {
            return true;
        }
        tokenizer.pushBack(1);
        return false;
    }

Sequences, II

• The grammar rule is <empty_list> ::= “[” “]”
• And the token string contains [ 5 ]
• The “[” matches but the 5 does not match “]”, so the rule fails after two tokens have already been read
• Solution #1: Write a pushBack method that can push back more than one token at a time
  - This will allow you to put back both the “[” and the “5”
  - You have to be very careful of the order in which you return tokens
  - This is a good use for a Stack
• Solution #2: Call it an error
  - You might be able to get away with this, depending on the grammar
  - For example, for any reasonable grammar, (2 + 3 +) is clearly an error
• Solution #3: Change the grammar
  - Tricky, and may not be possible
• Solution #4: Combine rules
  - See the next slide
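Solution #1 can be sketched with a stack of returned tokens. To keep the example short, tokens are plain strings here, which is a simplification and not how the slides model them. `StackPushbackTokenizer` and `EmptyListRecognizer` are hypothetical names, and `ArrayDeque` is used in the role of the Stack.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;

// Sketch: tokens are plain strings to keep the example short (an assumption)
class StackPushbackTokenizer {
    private final Iterator<String> source;
    private final Deque<String> pushedBack = new ArrayDeque<>(); // used as a stack

    StackPushbackTokenizer(String... tokens) {
        this.source = Arrays.asList(tokens).iterator();
    }

    // Pushed-back tokens come first (most recently pushed on top); null means end of input
    String next() {
        if (!pushedBack.isEmpty()) return pushedBack.pop();
        return source.hasNext() ? source.next() : null;
    }

    void pushBack(String token) {
        pushedBack.push(token);
    }
}

class EmptyListRecognizer {
    private final StackPushbackTokenizer tokenizer;

    EmptyListRecognizer(StackPushbackTokenizer tokenizer) {
        this.tokenizer = tokenizer;
    }

    // <empty_list> ::= "[" "]"
    boolean emptyList() {
        String first = tokenizer.next();
        if (!"[".equals(first)) {
            if (first != null) tokenizer.pushBack(first);
            return false;
        }
        String second = tokenizer.next();
        if ("]".equals(second)) return true;
        // Return tokens in reverse order: the last one read goes back first
        if (second != null) tokenizer.pushBack(second);
        tokenizer.pushBack(first);
        return false;
    }
}
```

Pushing back `second` before `first` is what makes the next call to `next()` return the “[” again. Reversing the two calls would hand the tokens back out of order.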

Implementing a fancier pushBack()

• java.io.StreamTokenizer does almost everything you need in a tokenizer
• Its pushBack() method only “puts back” a single token
• If you need more than that, you have to extend StreamTokenizer
• To push back more than one token, you need to either:
  - Make your tokenizer keep track of the last several tokens (and have a pushBack(int n) method), or
  - Expect the calling program to tell you what tokens to push back (with a pushBack(Token t) method)
• Plus, you will have to override nextToken()
  - Inside your nextToken() method, you can call super.nextToken() to get the next never-before-seen token
  - Your nextToken() method will also have to do something about nval and sval, such as provide methods to get these values
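The first option, keeping track of tokens already seen, might look like the sketch below. `MultiPushbackTokenizer` and its inner `Seen` class are hypothetical. For simplicity it remembers every token rather than only the last several, and it deals with `nval` and `sval` by restoring the public fields on replay instead of adding accessor methods.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.util.ArrayList;
import java.util.List;

// Sketch of a StreamTokenizer that can push back more than one token
class MultiPushbackTokenizer extends StreamTokenizer {

    // Snapshot of the public fields after one call to super.nextToken()
    private static final class Seen {
        final int ttype;
        final double nval;
        final String sval;
        Seen(int ttype, double nval, String sval) {
            this.ttype = ttype;
            this.nval = nval;
            this.sval = sval;
        }
    }

    private final List<Seen> history = new ArrayList<>();
    private int position = 0; // index in history of the next token to deliver

    MultiPushbackTokenizer(Reader reader) {
        super(reader);
    }

    @Override
    public int nextToken() throws IOException {
        if (position == history.size()) {
            // Nothing left to replay: read a never-before-seen token
            int type = super.nextToken();
            history.add(new Seen(type, nval, sval));
        }
        Seen s = history.get(position++);
        ttype = s.ttype; // restore the fields so callers see the replayed token
        nval = s.nval;
        sval = s.sval;
        return ttype;
    }

    // Un-reads the last n tokens
    public void pushBack(int n) {
        if (n < 0 || n > position) {
            throw new IllegalArgumentException("cannot push back " + n + " tokens");
        }
        position -= n;
    }

    // Keep the single-token version consistent with the history
    @Override
    public void pushBack() {
        if (position > 0) pushBack(1);
    }
}
```

A usage sketch: after reading “[” and 5 from the input `[ 5 ]`, calling `pushBack(2)` makes the next two calls to `nextToken()` return “[” and `TT_NUMBER` again.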

Sequences, IV

• Another possibility is to revise the grammar (but make sure the new grammar is equivalent to the old one!)
• Old grammar:
    <list> ::= “[” “]” | “[” <number> “]”
• New grammar:
    <list> ::= “[” <rest_of_list>
    <rest_of_list> ::= “]” | <number> “]”
• New pseudocode:

    public boolean list() {
        if first token is “[” {
            if restOfList() return true
        }
        else put back first token
        return false
    }

    private boolean restOfList() {
        if first token is “]”, return true
        if first token is a number and second token is a “]”, return true
        else return false
    }
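A possible Java rendering of this pseudocode is sketched below. Tokens are plain strings and the helpers `symbol` and `number` are simplified stand-ins, both assumptions made so the example is self-contained. As in the pseudocode, tokens read after a matching “[” are not restored when the rest of the list fails, so a caller would normally treat that case as an error.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the revised grammar:
//   <list> ::= "[" <rest_of_list>
//   <rest_of_list> ::= "]" | <number> "]"
class ListRecognizer {
    private final List<String> tokens; // assumption: tokens are plain strings
    private int position = 0;

    ListRecognizer(String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    public boolean list() {
        if (symbol("[")) {
            if (restOfList()) return true;
        }
        // symbol() already left a non-matching first token in place
        return false;
    }

    private boolean restOfList() {
        if (symbol("]")) return true;
        if (number()) return symbol("]");
        return false;
    }

    // Consumes the next token only if it equals the expected text
    private boolean symbol(String expected) {
        if (position < tokens.size() && tokens.get(position).equals(expected)) {
            position++;
            return true;
        }
        return false;
    }

    // Consumes the next token only if it is an unsigned integer
    private boolean number() {
        if (position < tokens.size() && tokens.get(position).matches("\\d+")) {
            position++;
            return true;
        }
        return false;
    }
}
```

Because both alternatives now start with the same “[”, no rule ever has to give back more than one token.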

Simple sequences in Java

• Suppose you have this rule:
    <factor> ::= ( <expression> )
• A good way to do this is often to test whether the grammar rule is not met

    public boolean factor() {
        if (symbol("(")) {
            if (!expression()) error("Error in parenthesized expression");
            if (!symbol(")")) error("Unclosed parenthetical expression");
            return true;
        }
        return false;
    }

  - To do this, you need to be careful that the “(” is not the start of some other production that can be used where a factor can be used
  - In other words, be sure that if you get a “(” it must start a factor
• Also, error(String) must throw an Exception. Why?
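The question can be explored with a small runnable sketch. If `error` returned normally, `factor()` would carry on to `return true` even though the input was malformed, so here `error` throws. The choice of `IllegalStateException` is an assumption, as are the string tokens and the stand-in `expression()` that accepts only a single number. A real parser would more likely define its own exception type.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of <factor> ::= ( <expression> ) with an error method that throws
class FactorRecognizer {
    private final List<String> tokens; // assumption: tokens are plain strings
    private int position = 0;

    FactorRecognizer(String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    public boolean factor() {
        if (symbol("(")) {
            if (!expression()) error("Error in parenthesized expression");
            if (!symbol(")")) error("Unclosed parenthetical expression");
            return true; // reached only if neither error(...) call was made
        }
        return false;
    }

    // Stand-in (assumption): a real <expression> rule would be much richer
    private boolean expression() {
        return number();
    }

    // Never returns normally, so factor() cannot fall through to "return true"
    private void error(String message) {
        throw new IllegalStateException(message);
    }

    // Consumes the next token only if it equals the expected text
    private boolean symbol(String expected) {
        if (position < tokens.size() && tokens.get(position).equals(expected)) {
            position++;
            return true;
        }
        return false;
    }

    // Consumes the next token only if it is an unsigned integer
    private boolean number() {
        if (position < tokens.size() && tokens.get(position).matches("\\d+")) {
            position++;
            return true;
        }
        return false;
    }
}
```

With this sketch, the tokens `( 2 )` are accepted, a lone `2` makes `factor()` return false without consuming anything, and `( 2` raises the “Unclosed parenthetical expression” error.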