Understanding Syntax in Programming Languages: Context-Free Grammars and Parse Trees, Study notes of Programming Languages

An introduction to the fundamentals of programming languages, focusing on syntax and semantics. It covers the concepts of syntax, semantics, and programming language implementation. The document also explains how to describe language syntax using lexical grammars, context-free grammars, and Backus-Naur Form (BNF). It discusses why syntax must be described formally and shows how BNF is used to express context-free grammars. Examples and exercises are included to help students understand the concepts.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

koofers-user-afi

cs3723
Fundamentals (1)
Syntax of Programming Languages


Syntax and Semantics

- Syntax
  - The symbols and rules to write legal programs
- Semantics
  - The meaning of legal programs
- Programming language implementation
  - Syntax -> semantics
  - Translate program syntax into machine actions

- Example: date specification
  - Syntax: date ::= dd/dd/dddd, d ::= 0|1|2|3|4|5|6|7|8|9
  - Semantics: 01/02/2005 => Jan 02, 2005 (or Feb 01, 2005)?
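To make the syntax/semantics split concrete, here is a minimal Python sketch, assuming the dd/dd/dddd grammar above. The names `is_legal`, `meaning`, and the `month_first` flag are hypothetical, chosen only for illustration: one function checks syntax alone, the other assigns one of the two possible meanings.

```python
import re
from datetime import date

# date ::= dd/dd/dddd with d ::= 0|1|...|9, written as an equivalent regular expression.
DATE_SYNTAX = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")

def is_legal(text: str) -> bool:
    """Syntax only: does the text have the shape dd/dd/dddd?"""
    return DATE_SYNTAX.fullmatch(text) is not None

def meaning(text: str, month_first: bool) -> date:
    """Semantics: map a legal date string to a calendar date.

    The grammar does not say which field is the month, so the caller
    picks one of the two interpretations with `month_first`.
    """
    if not is_legal(text):
        raise SyntaxError(f"not of the form dd/dd/dddd: {text!r}")
    first, second, year = (int(part) for part in text.split("/"))
    month, day = (first, second) if month_first else (second, first)
    return date(year, month, day)  # raises ValueError if, e.g., month > 12

print(meaning("01/02/2005", month_first=True))   # 2005-01-02 (Jan 02, 2005)
print(meaning("01/02/2005", month_first=False))  # 2005-02-01 (Feb 01, 2005)
```

Note that a string such as 99/99/9999 is syntactically legal yet meaningless under either interpretation; `meaning` rejects it through the `ValueError` raised by `date`.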

Why Describe Syntax?

- A translator/compiler needs to understand programs via syntax analysis
  - Needs to implement syntax analysis in C/C++/Java etc.
- Why does the syntax need to be formally defined?
  - Supports communication between programmers and translators/compilers
  - Supports automated generation and validation of syntax analyzers
  - Every automation requires an interface language
  - Regular expressions and BNFs are themselves languages for describing language syntax
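As a small illustration of a regular expression acting as a syntax description, the following Python sketch uses the standard `re` module to split an arithmetic expression into number and operator tokens. The token kinds `NUM` and `OP` and the function `tokenize` are illustrative assumptions, not part of the notes.

```python
import re

# One named alternative per token kind; each is itself a regular expression.
TOKEN = re.compile(r"(?P<NUM>[0-9]+)|(?P<OP>[-+*/()])")

def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (kind, lexeme) pairs, rejecting characters no rule matches."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # whitespace separates tokens but is not one
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

print(tokenize("5 + 15 * 20"))
# [('NUM', '5'), ('OP', '+'), ('NUM', '15'), ('OP', '*'), ('NUM', '20')]
```

A regular expression like this only describes the tokens; how tokens may be combined is the job of a BNF grammar.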

BNF: Expressing Context-free Grammars

- Each BNF includes
  - A set of terminals: the words/tokens of the language
  - A set of non-terminals: variables that can be replaced with different sequences of terminals
  - A set of productions: rules identifying the structure of each non-terminal
    - Each production has the format A ::= B, where A is a single non-terminal and B is a sequence of terminals and non-terminals
  - A start non-terminal: the top-level syntax of the language
- Example: BNF for expressions
    e ::= n | e+e | e-e | e*e | e/e
    n ::= d | nd
    d ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  - Non-terminals: e, n, d; start non-terminal: e
  - Terminals: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, -, *, /
  - What language does the grammar describe?
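A grammar can also be written down as plain data. This Python sketch (the dictionary layout and the `generate` function are assumptions for illustration) stores the expression grammar and produces random sentences by random derivations. It also records one answer to the question above: because e ::= n | e op e has no parentheses, the grammar derives exactly one or more digit runs joined by binary operators, which `EXPR_LANGUAGE` states as a regular expression.

```python
import random
import re

# The expression grammar as data: non-terminal -> list of alternatives,
# each alternative being a sequence of terminals and non-terminals.
GRAMMAR = {
    "e": [["n"], ["e", "+", "e"], ["e", "-", "e"], ["e", "*", "e"], ["e", "/", "e"]],
    "n": [["d"], ["n", "d"]],
    "d": [[str(k)] for k in range(10)],
}
START = "e"

# The strings derivable from e: digit runs joined by binary operators.
EXPR_LANGUAGE = re.compile(r"[0-9]+(?:[-+*/][0-9]+)*")

def generate(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 4) -> str:
    """Expand `symbol` by randomly chosen productions (one random derivation)."""
    if symbol not in GRAMMAR:  # terminal: emit as-is
        return symbol
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Beyond the depth limit, only use alternatives that do not mention
        # `symbol` again, so the derivation is guaranteed to stop.
        alternatives = [alt for alt in alternatives if symbol not in alt]
    chosen = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in chosen)

rng = random.Random(0)
for _ in range(3):
    sentence = generate(START, rng)
    print(sentence, bool(EXPR_LANGUAGE.fullmatch(sentence)))
```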

Derivations and Parse Trees (Semantics of CFG)

- Derivation: top-down replacement of non-terminals
  - Each replacement follows a production rule
  - One or more derivations for each valid program
- Derivations for 5 + 15 * 20
  - e => e*e => e+e*e => n+e*e => d+e*e => 5+e*e => 5+n*e => 5+nd*e => 5+dd*e => 5+1d*e => 5+15*e => … => 5+15*20
  - e => e+e => … => 5+e => 5+e*e => … => 5+15*e => … => 5+15*20
- Parse trees:

[Figure: the two parse trees for 5 + 15 * 20 given by these derivations; the first has * at the root, grouping (5+15)*20, the second has + at the root, grouping 5+(15*20)]
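A derivation can be replayed mechanically. In this Python sketch (the `derive` function and the rule list are hypothetical), each step names a production and rewrites the leftmost non-terminal; the list reproduces the first derivation above and fills the elided tail with one possible sequence of leftmost steps.

```python
# Productions of the expression grammar; every symbol is one character,
# so a right-hand side can be written as a plain string.
PRODUCTIONS = {
    "e": {"n", "e+e", "e-e", "e*e", "e/e"},
    "n": {"d", "nd"},
    "d": set("0123456789"),
}

def derive(steps: list[tuple[str, str]], start: str = "e") -> list[str]:
    """Apply (lhs, rhs) rules to the leftmost non-terminal; return every sentential form."""
    form = start
    forms = [form]
    for lhs, rhs in steps:
        if rhs not in PRODUCTIONS.get(lhs, set()):
            raise ValueError(f"{lhs} ::= {rhs} is not a production")
        # Locate the leftmost non-terminal in the current sentential form.
        i = next((k for k, ch in enumerate(form) if ch in PRODUCTIONS), None)
        if i is None or form[i] != lhs:
            raise ValueError(f"leftmost non-terminal of {form!r} is not {lhs!r}")
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms

# First derivation of 5 + 15 * 20, with the elided tail completed leftmost-first.
FIRST_DERIVATION = [
    ("e", "e*e"), ("e", "e+e"), ("e", "n"), ("n", "d"), ("d", "5"),
    ("e", "n"), ("n", "nd"), ("n", "d"), ("d", "1"), ("d", "5"),
    ("e", "n"), ("n", "nd"), ("n", "d"), ("d", "2"), ("d", "0"),
]
print(" => ".join(derive(FIRST_DERIVATION)))
```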

Parse Trees  A parse tree of each program satisfies  Each leaf node represent a terminal  Each non-leaf node represent a non-terminal  The children of each non-leaf node A, from left to right, form the right-side of a production rule for A (with A at left-side)  The root of the parse tree is the starting non-terminal  A parse tree represents a syntactically correct program  Regenerates a program reading terminals at its leaves from left to right  Parsing (checking syntactical correctness)  Constructing a parse tree for a program  Top-down and bottom-up parsers

Abstract vs. Concrete Syntax

- Concrete syntax: the syntax that programmers write
  - Example: different notations of expressions
    - Prefix: + 5 * 15 20
    - Infix: 5 + 15 * 20
    - Postfix: 5 15 20 * +
- Abstract syntax: the program structure recognized by compilers/interpreters
  - Identifies only the meaningful components
  - What is the operation and which are the operands?

[Figure: Parse Tree for 5 + 15 * 20]

[Figure: Abstract Syntax Tree for 5 + 15 * 20]
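All three concrete notations can be printed from one abstract syntax tree. In this Python sketch, `BinOp` and the three printer functions are hypothetical names; the infix printer adds full parentheses because, without precedence rules, it cannot know which parentheses are safe to drop.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class BinOp:
    op: str          # the operation
    left: "Expr"     # first operand
    right: "Expr"    # second operand

Expr = Union[int, BinOp]

def prefix(t: Expr) -> str:
    if isinstance(t, int):
        return str(t)
    return f"{t.op} {prefix(t.left)} {prefix(t.right)}"

def infix(t: Expr) -> str:
    # Fully parenthesized so the tree's grouping is never lost.
    if isinstance(t, int):
        return str(t)
    return f"({infix(t.left)} {t.op} {infix(t.right)})"

def postfix(t: Expr) -> str:
    if isinstance(t, int):
        return str(t)
    return f"{postfix(t.left)} {postfix(t.right)} {t.op}"

tree = BinOp("+", 5, BinOp("*", 15, 20))  # AST for 5 + 15 * 20
print(prefix(tree))   # + 5 * 15 20
print(infix(tree))    # (5 + (15 * 20))
print(postfix(tree))  # 5 15 20 * +
```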

Abstract syntax trees

- Condensed form of the parse tree for representing language constructs
- Operators and keywords do not appear as leaves
  - They define the meaning of the interior (parent) node
- Chains of single productions may be collapsed

[Figure: parse trees and abstract syntax trees for an IF B THEN S1 ELSE statement and for an addition expression involving 5 and 3]
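The condensing rules can be sketched as a recursive function. This Python sketch assumes a hypothetical grammar E ::= E + T | T, T ::= n | ( E ) and a hypothetical `Node` class; `to_ast` collapses single-production chains, turns each operator leaf into the label of its parent, and drops grouping parentheses.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str
    children: list["Node"] = field(default_factory=list)

OPERATORS = {"+", "-", "*", "/"}

def to_ast(node: Node):
    """Condense a parse tree into an AST written as nested tuples."""
    kids = node.children
    if not kids:                                   # leaf: a number (or other token)
        return node.symbol
    if len(kids) == 1:                             # chain of single productions: collapse
        return to_ast(kids[0])
    if len(kids) == 3 and kids[1].symbol in OPERATORS and not kids[1].children:
        # X ::= X op Y: the operator leaf becomes the label of the interior node
        return (kids[1].symbol, to_ast(kids[0]), to_ast(kids[2]))
    if len(kids) == 3 and kids[0].symbol == "(" and kids[2].symbol == ")":
        # T ::= ( E ): parentheses only group, so they vanish from the AST
        return to_ast(kids[1])
    rhs = " ".join(k.symbol for k in kids)
    raise ValueError(f"no AST rule for {node.symbol} ::= {rhs}")

# Parse tree for 5 + (3) under the hypothetical grammar above.
tree = Node("E", [
    Node("E", [Node("T", [Node("5")])]),
    Node("+"),
    Node("T", [Node("("), Node("E", [Node("T", [Node("3")])]), Node(")")]),
])
print(to_ast(tree))  # ('+', '5', '3')
```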

Ambiguous Grammars

- A grammar is syntactically ambiguous if
  - some program has multiple parse trees
  - Multiple choices of production rules during derivation
- Consequence of multiple parse trees
  - Parse trees are used to interpret programs
  - Multiple ways to interpret a program

[Figure: the two parse trees for 5 + 15 * 20 under the grammar e ::= n | e+e | e-e | e*e | e/e]
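To see the consequence concretely, this brute-force Python sketch (function names are hypothetical; tokens are assumed to be pre-split into numbers and operator strings) lists every tree the ambiguous grammar allows, written in condensed AST-like form, and evaluates each one. It uses `fractions.Fraction` so that `/` stays exact.

```python
import operator
from fractions import Fraction
from functools import lru_cache

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def all_trees(tokens: tuple) -> list:
    """Every tree that e ::= n | e+e | e-e | e*e | e/e allows for `tokens`.

    `tokens` alternates numbers and operator strings, e.g. (5, "+", 15, "*", 20).
    Trees are written in condensed form: a number, or (op, left, right).
    """
    if len(tokens) % 2 == 0:
        raise ValueError("expected: number (op number)*")

    @lru_cache(maxsize=None)
    def trees(i: int, j: int) -> tuple:
        # All trees for tokens[i:j]; positions i and j - 1 always hold numbers.
        if j - i == 1:
            return (tokens[i],)                  # e ::= n
        result = []
        for k in range(i + 1, j - 1, 2):         # operator at the root: e ::= e op e
            for left in trees(i, k):
                for right in trees(k + 1, j):
                    result.append((tokens[k], left, right))
        return tuple(result)

    return list(trees(0, len(tokens)))

def evaluate(tree) -> Fraction:
    """Interpret one tree; Fraction keeps division exact."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return Fraction(tree)

for t in all_trees((5, "+", 15, "*", 20)):
    print(t, "=", evaluate(t))
```

For 5 + 15 * 20 the two trees evaluate to 305 and 400, so the same program has two different meanings under this grammar.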

Rewriting Ambiguous Grammars

- Solution 1: introduce precedence and associativity rules to dictate the choices of applying production rules
  - Original grammar: e ::= n | e+e | e-e | e*e | e/e
  - Precedence and associativity: * and / have higher precedence than + and -; all operators are left associative
  - Derivation for n+n*n: e => e+e => n+e => n+e*e => n+n*e => n+n*n
- Solution 2: rewrite production rules by introducing additional non-terminals
  - Alternative grammar:
      E ::= E + T | E - T | T
      T ::= T * F | T / F | F
      F ::= n
  - Derivation for n + n * n: E => E+T => T+T => F+T => n+T => n+T*F => n+F*F => n+n*F => n+n*n
- How would you modify the grammar if
  - + and - have higher precedence than * and /?
  - all operators are right associative?
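The Solution 2 grammar can drive a parser directly. This Python sketch uses recursive descent, a technique the notes mention only as "top-down" parsing; since a recursive-descent function cannot call itself first on a left-recursive rule, each rule such as E ::= E + T is run as a loop that folds results to the left, which preserves left associativity. The function name `parse` and the nested-tuple AST are assumptions for illustration.

```python
import re

def parse(text: str):
    """Parse with the Solution 2 grammar and return a nested-tuple AST.

    E ::= E + T | E - T | T     run as  T { (+ | -) T }
    T ::= T * F | T / F | F     run as  F { (* | /) F }
    F ::= n
    """
    if not re.fullmatch(r"[0-9+\-*/\s]*", text):
        raise SyntaxError(f"illegal character in {text!r}")
    tokens = re.findall(r"[0-9]+|[-+*/]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_f():
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, found {tok!r}")
        return int(advance())

    def parse_t():
        tree = parse_f()
        while peek() in ("*", "/"):
            op = advance()
            tree = (op, tree, parse_f())   # fold left: ((a * b) * c)
        return tree

    def parse_e():
        tree = parse_t()
        while peek() in ("+", "-"):
            op = advance()
            tree = (op, tree, parse_t())   # fold left: ((a - b) - c)
        return tree

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("5 + 15 * 20"))  # ('+', 5, ('*', 15, 20))
print(parse("10 - 4 - 3"))   # ('-', ('-', 10, 4), 3)
```

Only one tree is ever produced: * binds tighter than + because T sits below E in the grammar, and the loops make every operator left associative.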

Additional exercises

- Give a context-free grammar for a small graph description language
  - Terminals: digits ('0', '1', ..., '9'), '(', ')', ';' and '->'
  - Each node of the graph is represented by an integer number
  - Each edge is represented by a pair of nodes connected with '->'
    - e.g., 3->4 is an edge from node '3' to node '4'
  - Each graph description is a sequence of edges
    - e.g., ( 1->2; 2->5; 5->1)
- Write a parse tree and an abstract syntax tree for ( 1->2; 2->5; 5->1)
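One possible answer, sketched in Python under stated assumptions: the exercise does not say whether ';' separates or terminates edges, or whether an empty graph is allowed, so this grammar assumes ';' separates edges and a graph has at least one edge. The names `tokenize` and `parse_graph` are hypothetical, and the abstract syntax is taken to be a list of (from, to) pairs.

```python
import re

# One possible grammar (an assumption; the exercise leaves details open):
#   graph ::= ( edges )
#   edges ::= edge | edges ; edge
#   edge  ::= num -> num
#   num   ::= d | num d
#   d     ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
TOKEN = re.compile(r"\s*([0-9]+|->|[();])")

def tokenize(text: str) -> list[str]:
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_graph(text: str) -> list[tuple[int, int]]:
    """Recursive-descent parser; returns the abstract syntax as (from, to) pairs."""
    tokens = tokenize(text)
    pos = 0

    def expect(test, what: str) -> str:
        nonlocal pos
        if pos >= len(tokens) or not test(tokens[pos]):
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {what}, found {found!r}")
        pos += 1
        return tokens[pos - 1]

    def edge() -> tuple[int, int]:
        # edge ::= num -> num
        source = int(expect(str.isdigit, "a node number"))
        expect(lambda t: t == "->", "'->'")
        target = int(expect(str.isdigit, "a node number"))
        return (source, target)

    expect(lambda t: t == "(", "'('")
    edges = [edge()]
    while pos < len(tokens) and tokens[pos] == ";":  # edges ::= edges ; edge
        pos += 1
        edges.append(edge())
    expect(lambda t: t == ")", "')'")
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} after ')'")
    return edges

print(parse_graph("( 1->2; 2->5; 5->1)"))  # [(1, 2), (2, 5), (5, 1)]
```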

Additional Exercises (practice on your own)

- Give a CFG to describe the set of symmetric strings over {a,b}
- Give a CFG to describe the set of strings over {a,b} that have the same number of a's and b's
- Give a CFG for the syntax of regular expressions over {0,1}
  - For example, "0|1", "0", and "(01|10)" are in the language
  - "0|" and "*0" are not in the language
- Can you give a CFG to describe the set of strings that have the format xx, where x is an arbitrary string over {a,b}?