CS 1530 Software Engineering Fall 2004: Scanner and Parser for Formula Parser, Study notes of Software Engineering

Study notes on the role of a scanner and a parser in designing a formula parser, from the CS 1530 Software Engineering course (Fall 2004). The scanner converts the character stream into logical units (tokens); the parser builds a parse tree from these tokens that represents the grammatical structure, including precedence and associativity rules. The notes also cover using JLex and Java CUP to generate scanners and parsers.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-ioa-1


CS 1530 Software Engineering Fall 2004

Software Engineering

CS / COE 1530

JLex & Java CUP

Designing the Formula Parser

■ Scanner
  ■ breaks the input into lexical units (tokens)
  ■ A11 + A12 -> CELLID PLUS CELLID
  ■ uses regular expressions that specify the tokens
■ Parser
  ■ creates a parse tree from the tokens
  ■ represents the grammatical structure
  ■ represents precedence and associativity rules
  ■ uses a (context-free) grammar to represent the structure
■ Both can be written by hand
  ■ but it is more flexible and productive to use scanner and parser generators
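The scanner half of this split can be sketched by hand. The Java below is a minimal illustration, not the course's code: it assumes a cell ID is one or more letters followed by one or more digits, and the token names other than CELLID and PLUS (INTLITERAL, MINUS, TIMES, DIVIDE, LPAREN, RPAREN) are hypothetical. A JLex-generated scanner would take the place of this hand-written loop.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal hand-written scanner sketch (not JLex output).
// Assumes a cell ID is one or more letters followed by one or more digits.
// Token names other than CELLID and PLUS are hypothetical.
class FormulaScanner {
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;  // white space separates tokens but is not itself a token
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length() && Character.isLetter(input.charAt(i))) i++;  // column part
                int digits = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;   // row part
                if (i == digits) {
                    throw new IllegalArgumentException("cell ID without row number at " + start);
                }
                tokens.add("CELLID");
            } else if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add("INTLITERAL");
            } else {
                switch (c) {
                    case '+': tokens.add("PLUS"); break;
                    case '-': tokens.add("MINUS"); break;
                    case '*': tokens.add("TIMES"); break;
                    case '/': tokens.add("DIVIDE"); break;
                    case '(': tokens.add("LPAREN"); break;
                    case ')': tokens.add("RPAREN"); break;
                    default:
                        throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
                }
                i++;  // every operator and parenthesis is one character long
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", scan("A11 + A12")));  // CELLID PLUS CELLID
    }
}
```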


Scanner Role

[Figure: source (formula) → scanner (aka lexical analyzer) → token → parser; the parser asks the scanner to "get next token"; both use the symbol table.]

Scanning = converting a character stream into logical units (aka tokens).


Parser Role

[Figure: source → lexical analyzer → token → parser → parse tree → rest of spreadsheet; the parser asks the lexical analyzer to "get next token"; both use the symbol table.]

Parsing = determining whether a string of tokens can be generated by a grammar.
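To illustrate what "can be generated by a grammar" means, here is a small recursive-descent recognizer over token names. The grammar in the comment is a hypothetical fragment chosen for this sketch, and recursive descent is a top-down technique, unlike the bottom-up parsers Java CUP generates; the recognizer only answers yes or no and builds no parse tree.

```java
import java.util.Arrays;
import java.util.List;

// Recursive-descent recognizer for a hypothetical grammar fragment:
//   exp  ::= term { PLUS term }
//   term ::= CELLID | INTLITERAL | LPAREN exp RPAREN
// It only decides membership: can the grammar generate this token string?
class Recognizer {
    private final List<String> tokens;
    private int pos = 0;

    private Recognizer(List<String> tokens) { this.tokens = tokens; }

    static boolean accepts(List<String> tokens) {
        Recognizer r = new Recognizer(tokens);
        return r.exp() && r.pos == tokens.size();  // every token must be used
    }

    // Consumes the next token if it has the expected kind.
    private boolean match(String kind) {
        if (pos < tokens.size() && tokens.get(pos).equals(kind)) {
            pos++;
            return true;
        }
        return false;
    }

    private boolean exp() {
        if (!term()) return false;
        while (match("PLUS")) {
            if (!term()) return false;
        }
        return true;
    }

    private boolean term() {
        if (match("CELLID") || match("INTLITERAL")) return true;
        return match("LPAREN") && exp() && match("RPAREN");
    }

    public static void main(String[] args) {
        System.out.println(accepts(Arrays.asList("CELLID", "PLUS", "CELLID")));  // true
        System.out.println(accepts(Arrays.asList("CELLID", "PLUS")));            // false
    }
}
```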


JLex: a scanner generator

[Figure: a JLex specification (xxx.jlex) is processed by JLex.Main (java), producing the generated scanner xxx.jlex.java; javac compiles it into Yylex.class; P.main (java) uses Yylex.class to scan an input program (test.sim) and produces the output of P.main.]


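Creation & Invocation

    import java.io.FileReader;
    import java.io.IOException;
    import java_cup.runtime.Symbol;

    public class P {
        public static void main(String[] args) throws IOException {
            FileReader inFile = new FileReader(args[0]);
            Yylex scanner = new Yylex(inFile);      // scanner class generated by JLex
            Symbol token = scanner.next_token();    // read the first token
            while (token.sym != sym.EOF) {          // sym.EOF comes from the generated sym class
                switch (token.sym) {
                    case sym.INTLITERAL:
                        System.out.println("INTLITERAL ("
                            + ((IntLitTokenVal)token.value).intVal
                            + ")");
                        break;
                    // …
                }
                token = scanner.next_token();       // advance to the next token
            }
        }
    }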


Regular expressions

■ Closely follow standard conventions.
■ Most characters match themselves:
  ■ abc
  ■ ==
  ■ while
■ Characters in quotes, including special characters, except \", match themselves:
  ■ "a|b" matches a|b, not a or b
  ■ "a\"\"\tb" matches a""\tb (a literal backslash followed by t), not a"" followed by a tab and b

Regular-expression operators

■ The traditional ones, plus the ? operator:
  | means "or"
  * means zero or more instances of the preceding expression
  + means one or more instances of the preceding expression
  ? means zero or one instance of the preceding expression
  () are used for grouping
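The operators above behave the same way in Java's own java.util.regex package, which makes it a convenient place to experiment. This is an analogy, not JLex: Java has no "..." quoting, so the sketch uses Pattern.quote to stand in for JLex's quoted strings. The helper name full is hypothetical.

```java
import java.util.regex.Pattern;

// Basic operators tried out with java.util.regex (an analogy, not JLex).
class RegexOperators {
    // True only if the WHOLE input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("a|b", "b"));                   // true:  | means "or"
        System.out.println(full("ab*", "a"));                   // true:  * allows zero b's
        System.out.println(full("ab+", "a"));                   // false: + needs at least one b
        System.out.println(full("colou?r", "color"));           // true:  ? makes the u optional
        System.out.println(full("(ab)+", "ababab"));            // true:  () groups ab
        System.out.println(full(Pattern.quote("a|b"), "a|b"));  // true:  quoted | is literal
        System.out.println(full(Pattern.quote("a|b"), "a"));    // false
    }
}
```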


Backslash is a special escape character:
  \n  newline
  \t  tab
  \"  double quote
To match a backslash character, put it in quotes.


More operators

■ ^ matches the beginning of a line
  ^main matches the string "main" only when it appears at the beginning of a line.
■ $ matches the end of a line
  main$ matches the string "main" only when it appears at the end of a line.
■ . (dot) matches any character except newline; it is usually used in the last rule of the specification to match all "bad" characters
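A similar java.util.regex experiment for ^, $ and dot. Java treats ^ and $ as line anchors only when the MULTILINE flag is set, so the sketch sets it to mimic the line-oriented behavior described above; Java's dot already excludes line terminators by default. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// ^, $ and . tried out with java.util.regex (an analogy, not JLex).
class AnchorsAndDot {
    // Counts the matches of regex in text; MULTILINE makes ^ and $ work per line.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        String text = "main here\nnot main\nmain";
        System.out.println(count("^main", text));  // 2: lines 1 and 3 start with main
        System.out.println(count("main$", text));  // 2: lines 2 and 3 end with main
        System.out.println(count(".", "a\nb"));    // 2: the dot skips the newline
    }
}
```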


Character classes

■ [abc] matches one character (either a or b or c)
■ [a-z] matches any character between a and z, inclusive
■ [^abc] matches any character except a, b, or c
  ■ ^ has special meaning only at the 1st position in [...]
■ [\t\\] matches tab or backslash
■ [a bc] is equivalent to a|" "|b|c
  ■ white space in character classes and strings matches itself
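The character-class rules can be checked the same way; java.util.regex uses the same bracket notation for these cases (again an analogy, not JLex itself). Java string literals double every backslash, so the class [\t\\] is written "[\\t\\\\]" in the source. The helper name matchesOne is hypothetical.

```java
import java.util.regex.Pattern;

// Character classes tried out with java.util.regex (an analogy, not JLex).
class CharClasses {
    // True if the class matches the whole (one-character) input.
    static boolean matchesOne(String charClass, String ch) {
        return Pattern.matches(charClass, ch);
    }

    public static void main(String[] args) {
        System.out.println(matchesOne("[abc]", "b"));       // true
        System.out.println(matchesOne("[a-z]", "z"));       // true: ranges are inclusive
        System.out.println(matchesOne("[^abc]", "a"));      // false: a leading ^ negates
        System.out.println(matchesOne("[a^bc]", "^"));      // true: ^ elsewhere is literal
        System.out.println(matchesOne("[\\t\\\\]", "\\"));  // true: tab or backslash
        System.out.println(matchesOne("[a bc]", " "));      // true: the space matches itself
    }
}
```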

JLex directives

■ specified in the second part of xxx.jlex
■ can also specify (see the manual for details):
  ■ the value to be returned on end-of-file,
  ■ that line counting should be turned on, and
  ■ that the scanner will be used with the parser generator Java CUP
■ directives include macro definitions (very useful):
  ■ name is any valid Java identifier
  ■ DIGIT= [0-9]
  ■ LETTER= [a-zA-Z]
  ■ WHITESPACE= [ \t\n]
■ To use a macro, put its name inside curly braces:
  ■ {LETTER}({LETTER}|{DIGIT})*
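To see what a macro-based rule stands for, the hypothetical helper below (not part of JLex) expands each {NAME} into its definition and hands the result to java.util.regex. Wrapping each expansion in parentheses is a choice made for this sketch so that an operator written after {NAME} applies to the whole definition; consult the JLex manual for how JLex itself expands macros.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of JLex) that expands {NAME} macro uses
// into java.util.regex syntax.
class MacroExpander {
    private static final Pattern USE = Pattern.compile("\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    // The three macros from the slide, written as Java string literals.
    static Map<String, String> slideMacros() {
        Map<String, String> m = new HashMap<>();
        m.put("DIGIT", "[0-9]");
        m.put("LETTER", "[a-zA-Z]");
        m.put("WHITESPACE", "[ \\t\\n]");
        return m;
    }

    static String expand(String regex, Map<String, String> macros) {
        Matcher m = USE.matcher(regex);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String def = macros.get(m.group(1));
            if (def == null) {
                throw new IllegalArgumentException("undefined macro " + m.group(1));
            }
            // Parenthesize so that an operator after {NAME} applies to the whole definition.
            m.appendReplacement(out, Matcher.quoteReplacement("(" + def + ")"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String id = expand("{LETTER}({LETTER}|{DIGIT})*", slideMacros());
        System.out.println(id);                 // ([a-zA-Z])(([a-zA-Z])|([0-9]))*
        System.out.println("A11".matches(id));  // true
        System.out.println("11A".matches(id));  // false
    }
}
```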


Another example

    {DIGIT}+ {
        // yytext() is the matched text; Integer.parseInt avoids the deprecated new Integer(String)
        int val = Integer.parseInt(yytext());
        Symbol S = new Symbol(sym.INTLITERAL,
                              new IntLitTokenVal(yyline+1, CharNum.num, val));
        CharNum.num += yytext().length();  // advance the column past the literal
        return S;
    }
    {WHITESPACE}+ { CharNum.num += yytext().length(); }
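The effect of these two actions on CharNum.num can be simulated outside JLex. The sketch below applies the same two rules with java.util.regex at each position of a single line; the variable charNum stands in for CharNum.num, its starting value of 1 is an assumption, and because it scans one line at a time the newline from the WHITESPACE macro is left out. It shows why the whitespace rule must also add the length of the matched text: otherwise every later column would be wrong.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simulation of the two rules above (not JLex output). At each position it tries
//   {DIGIT}+      -> report an INTLITERAL with its value, line, and starting column
//   {WHITESPACE}+ -> report nothing, only advance the column counter
// charNum plays the role of CharNum.num; starting it at 1 is an assumption.
class RuleSimulation {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");
    private static final Pattern WHITESPACE = Pattern.compile("[ \\t]+");

    static List<String> scanLine(String line, int lineNo) {
        List<String> tokens = new ArrayList<>();
        int charNum = 1;
        int pos = 0;  // index of the next unread character
        while (pos < line.length()) {
            Matcher m = DIGITS.matcher(line).region(pos, line.length());
            if (m.lookingAt()) {                    // rule {DIGIT}+
                int val = Integer.parseInt(m.group());
                tokens.add("INTLITERAL(" + val + ") at " + lineNo + ":" + charNum);
                charNum += m.group().length();
                pos = m.end();
                continue;
            }
            m = WHITESPACE.matcher(line).region(pos, line.length());
            if (m.lookingAt()) {                    // rule {WHITESPACE}+
                charNum += m.group().length();
                pos = m.end();
                continue;
            }
            throw new IllegalArgumentException("no rule matches at column " + charNum);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scanLine("12  345", 1));  // [INTLITERAL(12) at 1:1, INTLITERAL(345) at 1:5]
    }
}
```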

Java CUP

Parser generator: generates a bottom-up parser


What is Java CUP

■ a parser generator
■ CUP = Construction of Useful Parsers
■ Java successor to yacc
  ■ a standard UNIX parser generator
  ■ yacc = Yet Another Compiler Compiler

[Figure: the parser specification xxx.cup is processed by java_cup.Main, invoked as java java_cup.Main < xxx.cup, which generates parser.java and sym.java.]


Input to Java CUP

■ The specification includes:
  ■ optional package and import declarations
  ■ optional user code
  ■ terminal and nonterminal declarations
  ■ optional precedence and associativity declarations
  ■ grammar rules with associated actions

Example of calling the parser

    FileReader inFile = new FileReader(args[0]);
    parser P = new parser(new Yylex(inFile));
    Symbol root = null;
    // the parser will return a Symbol whose value
    // field's type is the type associated with the
    // root nonterminal (i.e., with the nonterminal
    // "program")
    root = P.parse(); // do the parse (parse() is declared to throw Exception)

Declarations of terminals and nonterminals

■ all terminals and nonterminals used by your grammar must be declared:
    terminal type name1, name2, ... ;
    non terminal type name1, name2, ... ;
    non terminal name1, name2, ... ;
■ all terminals must declare their type
  ■ type = the type of the Symbol.value field returned by the scanner
■ nonterminals with a value must also declare their type


Precedence declarations again

    terminal UMINUS, LPAREN, RPAREN, … ;
    precedence left PLUS, MINUS;
    precedence left TIMES, DIVIDE, MOD;
    precedence left UMINUS;

    exp ::= exp PLUS exp
          | exp DIVIDE exp
          | MINUS exp %prec UMINUS
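The grouping these declarations request can be illustrated with a hand-written precedence-climbing parser. This is a different technique from what Java CUP does (CUP generates a bottom-up parser and uses the declarations to decide between conflicting parse actions), but it produces the same grouping for these operators: TIMES, DIVIDE and MOD bind tighter than PLUS and MINUS, all are left associative, and unary minus binds tightest. Writing the operators as +, -, *, / and % and restricting operands to integer literals are assumptions of the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hand-written precedence-climbing parser (not CUP's algorithm) that groups
// operators the way the declarations above ask:
//   PLUS, MINUS          : level 1, left associative
//   TIMES, DIVIDE, MOD   : level 2, left associative
//   unary minus (UMINUS) : binds tighter than any binary operator
// Operands are integer literals; the result is a fully parenthesized string.
class PrecedenceDemo {
    private final List<String> toks;
    private int pos = 0;

    private PrecedenceDemo(List<String> toks) { this.toks = toks; }

    static String parse(String... toks) {
        PrecedenceDemo p = new PrecedenceDemo(Arrays.asList(toks));
        String tree = p.exp(1);
        if (p.pos != toks.length) throw new IllegalArgumentException("unexpected " + p.peek());
        return tree;
    }

    // Precedence level of a binary operator; 0 means "not a binary operator".
    private static int level(String op) {
        switch (op) {
            case "+": case "-": return 1;
            case "*": case "/": case "%": return 2;
            default: return 0;
        }
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }

    // Parses an expression whose binary operators all have level >= minLevel.
    private String exp(int minLevel) {
        String left = primary();
        while (level(peek()) >= minLevel) {
            String op = toks.get(pos++);
            // Parsing the right operand at level(op) + 1 gives left associativity:
            // in 1 - 2 - 3 the right operand of the first "-" stops before the second "-".
            String right = exp(level(op) + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    private String primary() {
        if (pos >= toks.size()) throw new IllegalArgumentException("missing operand");
        String t = toks.get(pos++);
        if (t.equals("-")) return "(-" + primary() + ")";  // unary minus binds tightest
        return t;
    }

    public static void main(String[] args) {
        System.out.println(parse("1", "+", "2", "*", "3"));  // (1 + (2 * 3))
        System.out.println(parse("1", "-", "2", "-", "3"));  // ((1 - 2) - 3)
        System.out.println(parse("-", "2", "*", "3"));       // ((-2) * 3)
    }
}
```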