




















An in-depth explanation of how to build a recursive descent parser using BNF (Backus-Naur Form) and Java code. It covers the basics of BNF, Extended BNF, recognizing simple alternatives, helper methods, and sequences. The document also discusses the importance of the DRY (Don't Repeat Yourself) principle and provides Java code examples.
Typology: Slides





















A recognizer will tell whether the string belongs to the language defined by the grammar
A parser will try to build a tree corresponding to the string, according to the rules of the grammar
“Plain” BNF
< > indicate a nonterminal that needs to be further expanded, for example,
Extended BNF
[ ] enclose an optional part of the rule Example:
<add_operator> ::= + | -
That is, an add operator is a plus sign or a minus sign
If it is a plus or a minus, we simply return true
But what if it isn’t? We not only need to return false, but we also need to put the token back, because it doesn’t belong to us and some other grammar rule probably wants it
Usually, it’s enough to be able to put just one token back
More complex grammars may require the ability to put back several tokens
    public boolean addOperator() {
        Token t = myTokenizer.next();
        if (t.type == Type.SYMBOL && t.value.equals("+")) {
            return true;
        }
        if (t.type == Type.SYMBOL && t.value.equals("-")) {
            return true;
        }
        myTokenizer.pushBack(1);
        return false;
    }
While this code isn’t particularly long or hard to read, we are going to have a lot of very similar methods
The helper method symbol:
    Gets the next token and tests whether it matches the expectedSymbol
    If it matches, returns true
    If it doesn’t match, puts the symbol back and returns false
In more detail, it:
    Gets a token
    Makes sure that the token is a symbol
    Compares the symbol to the desired symbol (by value)
    If all the above is satisfied, returns true
    Else (if not satisfied) puts the token back, and returns false
    private boolean symbol(String value) {
        Token t = tokenizer.next();
        if (t.type == Type.SYMBOL && value.equals(t.value())) {
            return true;
        }
        else {
            tokenizer.pushBack(1);
            return false;
        }
    }
The main difference is in checking for the type
The DRY principle suggests we should use a helper method for symbol

    private boolean symbol(String expectedValue) {
        return nextTokenMatches(Type.SYMBOL, expectedValue);
    }
For example, we want to get a number, any number
We need to compare only type, not value
The comments mark what to omit from the method to get a type-only test:

    private boolean nextTokenMatches(Type type, String value) {   // omit the value parameter
        Token t = tokenizer.next();
        if (type == t.type() && value.equals(t.getValue()))       // omit the value test
            return true;
        else
            tokenizer.pushBack(1);
        return false;
    }
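The type-only variant described above can be sketched as an overload that sits next to the two-argument version. This is a minimal sketch, not the slides' own code: the Token and Tokenizer classes here are hypothetical stand-ins (the slides do not show them), and the tokenizer simply splits on whitespace and treats any run of digits as a NUMBER.

```java
import java.util.ArrayList;
import java.util.List;

class TypeOnlyMatchDemo {
    enum Type { SYMBOL, NUMBER, EOF }

    // Hypothetical token class; the real one is not shown in the slides.
    static final class Token {
        private final Type type;
        private final String value;
        Token(Type type, String value) { this.type = type; this.value = value; }
        Type type() { return type; }
        String value() { return value; }
    }

    // Hypothetical tokenizer: whitespace-separated words, digits are numbers.
    static final class Tokenizer {
        private final List<Token> tokens = new ArrayList<>();
        private int position = 0;

        Tokenizer(String input) {
            for (String word : input.trim().split("\\s+")) {
                if (word.isEmpty()) continue;
                Type type = word.matches("\\d+") ? Type.NUMBER : Type.SYMBOL;
                tokens.add(new Token(type, word));
            }
        }

        // Always advances, even at end of input, so pushBack(1) undoes it.
        Token next() {
            Token t = position < tokens.size() ? tokens.get(position) : new Token(Type.EOF, "");
            position++;
            return t;
        }

        void pushBack(int n) { position -= n; }
    }

    private final Tokenizer tokenizer;

    TypeOnlyMatchDemo(String input) { tokenizer = new Tokenizer(input); }

    // Type-and-value test, following the slide.
    private boolean nextTokenMatches(Type type, String value) {
        Token t = tokenizer.next();
        if (type == t.type() && value.equals(t.value())) return true;
        tokenizer.pushBack(1);
        return false;
    }

    // Type-only overload: the value parameter and the value test are omitted.
    private boolean nextTokenMatches(Type type) {
        Token t = tokenizer.next();
        if (type == t.type()) return true;
        tokenizer.pushBack(1);
        return false;
    }

    boolean symbol(String expectedValue) { return nextTokenMatches(Type.SYMBOL, expectedValue); }

    boolean number() { return nextTokenMatches(Type.NUMBER); }

    boolean addOperator() { return symbol("+") || symbol("-"); }
}
```

With the input "42 + x", number() accepts the 42, a second number() call fails and leaves the + in place, and addOperator() then consumes it.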
    public boolean addOperator() {
        return symbol("+") || symbol("-");
    }

    private boolean symbol(String expectedValue) {
        return nextTokenMatches(Type.SYMBOL, expectedValue);
    }

    private boolean nextTokenMatches(Type type, String value) {
        Token t = tokenizer.next();
        if (type == t.type() && value.equals(t.value())) {
            return true;
        }
        else {
            tokenizer.pushBack(1);
        }
        return false;
    }
Solution #1: Write a pushBack method that pushes back more than one token at a time
    This will allow you to put back both the “[” and the “5”
    You have to be very careful of the order in which you return tokens
    This is a good use for a Stack
Solution #2: Call it an error
    You might be able to get away with this, depending on the grammar
    For example, for any reasonable grammar, (2 + 3 +) is clearly an error
Solution #3: Change the grammar
    Tricky, and may not be possible
Solution #4: Combine rules
    See the next slide
Make your tokenizer keep track of the last several tokens (and have a pushBack(int n) method), or
Expect the calling program to tell you what tokens to push back (with a pushBack(Token t) method)
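The second option, where the caller hands tokens back, might look like the sketch below. It is an illustration under simplifying assumptions: tokens are plain strings split on whitespace rather than Token objects, and the class name StackPushBackTokenizer is made up for this example. The stack makes the ordering rule concrete: the token read last has to be pushed back first.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class StackPushBackTokenizer {
    private final Iterator<String> source;
    // Tokens handed back by the caller; the most recently pushed is on top.
    private final Deque<String> pushedBack = new ArrayDeque<>();

    StackPushBackTokenizer(String input) {
        String trimmed = input.trim();
        List<String> words = trimmed.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(trimmed.split("\\s+"));
        source = words.iterator();
    }

    /** Returns the next token, or null at end of input. */
    String next() {
        if (!pushedBack.isEmpty()) {
            return pushedBack.pop();   // replay pushed-back tokens first
        }
        return source.hasNext() ? source.next() : null;
    }

    /** The caller decides which token to return to the stream. */
    void pushBack(String token) {
        pushedBack.push(token);
    }
}
```

After reading “[” and then “5” from the input "[ 5 ]", calling pushBack("5") followed by pushBack("[") restores the original order, so the next two calls to next() return “[” and “5” again.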
Inside your nextToken() method, you can call super.nextToken() to get the next never-before-seen token
Your nextToken() method will also have to do something about nval and sval, such as provide methods to get these values
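Assuming the class being extended is java.io.StreamTokenizer (which has the nval and sval fields mentioned above), one way to handle them is to snapshot ttype, nval, and sval for every token and restore them on replay. The sketch below is one possible design, not the slides' implementation; the class name, the Saved helper, the pushBack(int n) overload, and the replayFirstTwoTokens demo method are all invented for illustration, and the history is left unbounded for brevity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class HistoryStreamTokenizer extends StreamTokenizer {
    // Snapshot of the three public fields that describe one token.
    private static final class Saved {
        final int ttype;
        final double nval;
        final String sval;
        Saved(int ttype, double nval, String sval) {
            this.ttype = ttype; this.nval = nval; this.sval = sval;
        }
    }

    private final List<Saved> history = new ArrayList<>();
    private int position = 0;   // index of the next token to hand out

    HistoryStreamTokenizer(Reader reader) { super(reader); }

    @Override
    public int nextToken() throws IOException {
        if (position == history.size()) {
            super.nextToken();                          // a never-before-seen token
            history.add(new Saved(ttype, nval, sval));
        }
        Saved s = history.get(position++);
        ttype = s.ttype;                                // restore fields on replay
        nval = s.nval;
        sval = s.sval;
        return ttype;
    }

    /** Puts back the last n tokens (hypothetical overload, not part of the JDK). */
    public void pushBack(int n) {
        if (n < 0 || n > position) {
            throw new IllegalArgumentException("cannot push back " + n + " tokens");
        }
        position -= n;
    }

    // Demo: read two tokens, push both back, then list every token in order.
    static String replayFirstTwoTokens(String input) {
        HistoryStreamTokenizer t = new HistoryStreamTokenizer(new StringReader(input));
        StringBuilder out = new StringBuilder();
        try {
            t.nextToken();
            t.nextToken();
            t.pushBack(2);
            while (t.nextToken() != TT_EOF) {
                if (t.ttype == TT_NUMBER) out.append(t.nval);
                else if (t.ttype == TT_WORD) out.append(t.sval);
                else out.append((char) t.ttype);
                out.append(' ');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString().trim();
    }
}
```

Note that StreamTokenizer already provides a no-argument pushBack() for a single token; this sketch leaves it untouched and should not be mixed with it.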
<list> ::= “[” “]” | “[” <number> “]”
<list> ::= “[” <rest_of_list>
<rest_of_list> ::= “]” | <number> “]”
    public boolean list() {
        if first token is “[” {
            if restOfList() return true
        }
        else put back first token
        return false
    }

    private boolean restOfList() {
        if first token is “]”, return true
        if first token is a number and second token is a “]”, return true
        else return false
    }
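The pseudocode above can be turned into runnable Java along the following lines. This is a sketch under assumptions: tokens are whitespace-separated strings, a number is any run of digits, and the helpers next, pushBack, symbol, and number are simplified stand-ins for the tokenizer-based helpers. The pseudocode does not say what happens to tokens consumed before a failure; this version chooses to put all of them back, so a failed list() leaves the input untouched.

```java
class ListRecognizerDemo {
    private final String[] tokens;
    private int position = 0;

    ListRecognizerDemo(String input) {
        tokens = input.trim().split("\\s+");
    }

    // Returns "" at end of input; always advances so pushBack(1) undoes it.
    private String next() {
        String t = position < tokens.length ? tokens[position] : "";
        position++;
        return t;
    }

    private void pushBack(int n) { position -= n; }

    private boolean symbol(String expected) {
        if (expected.equals(next())) return true;
        pushBack(1);
        return false;
    }

    private boolean number() {
        if (next().matches("\\d+")) return true;
        pushBack(1);
        return false;
    }

    // A list starts with "[" and continues with restOfList().
    public boolean list() {
        if (symbol("[")) {
            if (restOfList()) return true;
            pushBack(1);   // restOfList() consumed nothing, so this returns the "["
        }
        return false;
    }

    // Either "]" alone, or a number followed by "]".
    private boolean restOfList() {
        if (symbol("]")) return true;
        if (number()) {
            if (symbol("]")) return true;
            pushBack(1);   // put the number back
        }
        return false;
    }
}
```

For example, new ListRecognizerDemo("[ 5 ]").list() and new ListRecognizerDemo("[ ]").list() both succeed, while "[ 5" and "[ 5 6 ]" are rejected.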
    public boolean factor() {
        if (symbol("(")) {
            if (!expression()) error("Error in parenthesized expression");
            if (!symbol(")")) error("Unclosed parenthetical expression");
            return true;
        }
        return false;
    }

To do this, you need to be careful that the “(” is not the start of some other production that can be used where a factor can be used
In other words, be sure that if you get a “(” it must start a factor
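To see the “call it an error” approach run end to end, the factor() method can be embedded in a small recognizer. Everything around factor() here is assumed for illustration: the grammar (an expression is a factor followed by any number of add-operator/factor pairs, and a factor is a number or a parenthesized expression), the whitespace-splitting token helpers, and an error() method that throws IllegalStateException. The slides do not specify any of these.

```java
class FactorDemo {
    private final String[] tokens;
    private int position = 0;

    FactorDemo(String input) {
        tokens = input.trim().split("\\s+");
    }

    private String next() {
        String t = position < tokens.length ? tokens[position] : "";
        position++;
        return t;
    }

    private boolean symbol(String expected) {
        if (expected.equals(next())) return true;
        position--;   // put the token back
        return false;
    }

    private boolean number() {
        if (next().matches("\\d+")) return true;
        position--;   // put the token back
        return false;
    }

    // Assumed behavior: an error aborts recognition with an exception.
    private void error(String message) {
        throw new IllegalStateException(message);
    }

    private boolean addOperator() {
        return symbol("+") || symbol("-");
    }

    // Assumed rule: <expression> ::= <factor> { <add_operator> <factor> }
    public boolean expression() {
        if (!factor()) return false;
        while (addOperator()) {
            if (!factor()) error("Expected a factor after an add operator");
        }
        return true;
    }

    // Assumed rule: <factor> ::= <number> | "(" <expression> ")"
    public boolean factor() {
        if (number()) return true;
        if (symbol("(")) {
            // Once "(" is consumed we are committed: failure is an error, not false.
            if (!expression()) error("Error in parenthesized expression");
            if (!symbol(")")) error("Unclosed parenthetical expression");
            return true;
        }
        return false;
    }
}
```

With this grammar, "( 2 + 3 )" is recognized, "( 2 + 3" raises the unclosed-parenthesis error, and "( 2 + 3 + )" raises an error at the dangling operator instead of requiring any tokens to be pushed back.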