



















ALIGURAJUPALLI, MACHERLA, GUNTUR (Dt.), (A.P) – 5
Aravind. Y.N.D
Date : 01/05/
Page: 01 of 06
Sub Code : CP Sub Name: PRINCIPLES OF PROGRAMMING LANGUAGES Unit : I Branch : (CSE) Year : III Semester : I
UNIT - 1
Evolution of programming languages – describing syntax – context-free grammars – attribute grammars – describing semantics – lexical analysis – parsing – recursive-descent – bottom-up parsing
Objectives : To understand and describe syntax and semantics of programming languages.
S.No. | Topic to be covered | Duration | Reference | Teaching Method
1 | Reasons for Studying Concepts of Programming Languages | 50 minutes | 1 | Black Board
2 | Evolution of the Major Programming Languages | 50 minutes | 1 | Black Board
3 | The General Problem of Describing Syntax | 50 minutes | 1 | Black Board
4 | Formal Methods of Describing Syntax: Context-Free Grammars | 50 minutes | 1 | Black Board
5 | Attribute Grammars | 50 minutes | 1 | Black Board
6 | Describing the Meanings of Programs: Dynamic Semantics | 50 minutes | 1 | Black Board
7 | Lexical Analysis | 50 minutes | 1 | Black Board
8 | The Parsing Problem | 50 minutes | 1 | Black Board
9 | Recursive-Descent Parsing | 50 minutes | 1 | Black Board
10 | Bottom-Up Parsing | 50 minutes | 1 | Black Board
LANGUAGE CATEGORIES
Imperative – based on the von Neumann architecture
Functional – based on mathematical functions
Logic – rule based
Object oriented – object based
1.2 Evolution of programming languages:
Zuse’s Plankalkül: 1945
The world's first complete high-level language was designed in the 1940s (probably between 1941 and 1945, although the concept was first published in 1948) by the German computer pioneer Konrad Zuse, the creator of the Z3 relay computer. Plankalkül is a typed, high-level, imperative programming language.
Pseudocodes: 1949
Today, "pseudocode" usually means an informal description of program logic written in a natural language such as English. The pseudocodes of 1949, however, were early interpreted codes (for example, Short Code) that let programmers write at a higher level than machine code. What was wrong with using machine code?
a. Poor readability
b. Poor modifiability
c. Expression coding was tedious (boring)
d. Machine deficiencies: no indexing
The IBM 704 and FORTRAN I - 1957 (FORTRAN 0 - 1954 - not implemented)
FORTRAN I was designed for the new IBM 704, which had index registers and floating-point hardware.
Environment of development:
o Computers were small and unreliable
o Applications were scientific
o No programming methodology or tools
o Machine efficiency was most important
Impact of environment on design:
o No need for dynamic storage
o Need for good array handling and counting loops
o No string handling, decimal arithmetic, or powerful input/output (commercial features)
FORTRAN
Fortran (formerly FORTRAN, derived from "Formula Translation") is a general-purpose, imperative programming language that is especially suited to numeric computation and scientific computing. It was originally developed by IBM in the 1950s for scientific and engineering applications.
FORTRAN II - 1958
IBM's FORTRAN II appeared in 1958. Its main enhancement was support for procedural programming through user-written subroutines and functions that return values, with parameters passed by reference.
FORTRAN III - 1958
IBM also developed a FORTRAN III in 1958 that allowed inline assembly code, among other features; however, this version was never released as a product.
FORTRAN IV
Starting in 1961, as a result of customer demands, IBM began developing FORTRAN IV, which removed the machine-dependent features of FORTRAN II (such as READ INPUT TAPE) and added new features such as a LOGICAL data type, logical Boolean expressions, and the logical IF statement as an alternative to the arithmetic IF statement.
FORTRAN 77 - 1978
The new standard, FORTRAN 77, added a number of significant features to address many of the shortcomings of FORTRAN 66:
o Character string handling
o Logical loop control statement
o IF-THEN-ELSE statement
FORTRAN 90 - 1990
The much-delayed successor to FORTRAN 77, informally known as Fortran 90, added:
o Modules
o Dynamic arrays
o Pointers
o Recursion
o CASE statement
o Parameter type checking
FORTRAN Evaluation
FORTRAN dramatically and permanently changed the way computers are used.
Functional Programming: LISP
LISP, an acronym for LISt Processing, is a programming language designed for easy manipulation of lists and symbolic data. Developed in 1959 by John McCarthy, it is commonly used for artificial intelligence (AI) programming and is one of the oldest programming languages still in relatively wide use.
ML, Miranda, and Haskell are related languages.
The First Step Toward Sophistication: ALGOL 60
ALGOL 60 (short for ALGOrithmic Language 1960) is a member of the ALGOL family of computer programming languages. It followed ALGOL 58, which had introduced code blocks and the begin and end pairs for delimiting them. ALGOL 60 was the first language to implement nested function definitions with lexical scope. It gave rise to many other programming languages, including CPL, Simula, BCPL, B, Pascal, and C.
Computerizing Business Records: COBOL 1960
Although COBOL has been used more than any other programming language, it has had little effect on the design of other languages, with the exception of PL/I.
Historical background: like ALGOL, COBOL was designed by a committee. Three other languages for business applications that existed before COBOL were FLOW-MATIC, AIMACO, and COMTRAN.
COBOL Design Process
Programming Based on Logic: Prolog
Prolog was developed at the University of Aix-Marseille by Colmerauer and Roussel, with some help from Kowalski at the University of Edinburgh.
o Based on formal logic
o Non-procedural
o Can be summarized as an intelligent database system that uses an inferencing process to infer the truth of given queries
History’s Largest Design Effort: Ada
Ada was originally developed for the U.S. Department of Defense. In 1974, the Army, Navy, and Air Force each proposed developing a high-level language for embedded systems, as an attempt to standardize their own embedded systems. The committee assigned to this task was responsible for identifying requirements for a new Department of Defense high-level language, evaluating existing languages to determine whether there was a viable candidate, and recommending adoption or implementation of a minimal set of programming languages. The resulting language went through multiple design phases and was named Ada.
Object-Oriented Programming: Smalltalk
Combining Imperative and Object-Oriented Features: C++
C++ was developed at Bell Labs by Bjarne Stroustrup, starting in 1979.
o Evolved from C and SIMULA 67: facilities for object-oriented programming, taken partially from SIMULA 67, were added to C
o Also has exception handling
o A large and complex language, in part because it supports both procedural and OO programming
o Grew rapidly in popularity, along with OOP
o ANSI standard approved in November 1997
Eiffel, a related language that supports OOP (designed by Bertrand Meyer, 1992):
o Not directly derived from any other language
o Smaller and simpler than C++, but still has most of its power
An Imperative-Based Object-Oriented Language: Java
Java was developed at Sun in the early 1990s.
o Based on C++, but significantly simplified
o Supports only OOP
o Has references, but not pointers
o Includes support for applets and a form of concurrency
Scripting Languages: JavaScript, PHP, Python, and Ruby
The Flagship .NET Language: C#
Markup/Programming Hybrid Languages
o XSLT (eXtensible Stylesheet Language Transformations) is used for transforming markup (XML) documents
o JSP (JavaServer Pages) is used for server-side programming
1.3 DESCRIBING SYNTAX AND SEMANTICS
INTRODUCTION
The study of programming languages can be divided into the examination of syntax and semantics:
o Syntax is the form of expressions, statements, and program units.
o Semantics is the meaning of those expressions, statements, and program units.
In a well-designed programming language, semantics should follow directly from syntax. Describing syntax is easier than describing semantics.
Lexemes – A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token. Lexemes are the lowest level of syntactic unit. The lexemes of a programming language include its identifiers, literals, operators, and special words. A token of a language is a category of its lexemes.
Tokens – A token is a pair consisting of a token name and an optional attribute value. The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier. The token names are the input symbols that the parser processes.
A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, begin). A token is a category of lexemes (e.g., identifier).
Consider the following Java statement:
index = 2 * count + 17;
Its lexemes and tokens are:
Lexemes   Tokens
index     Identifier
=         Equal sign
2         Int literal
*         Mult op
count     Identifier
+         Plus op
17        Int literal
;         Semicolon
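As a sketch of how a lexical analyzer could produce these lexeme/token pairs, the Python function below matches the statement against one regular expression per token category. The snake_case token names (identifier, equal_sign, and so on) are illustrative assumptions, and only the categories this one statement needs are covered.

```python
import re

# Hypothetical token categories; the first alternative that matches wins.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(source: str) -> list[tuple[str, str]]:
    """Return (lexeme, token) pairs; raise on a character no pattern matches."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in lex("index = 2 * count + 17;"):
    print(f"{lexeme:8} {token}")
```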
LANGUAGE RECOGNIZERS
Languages can be defined in two ways: by recognition and by generation. A recognition device reads input strings over the language's alphabet and decides whether each input string belongs to the language. Example: the syntax analysis part of a compiler.
LANGUAGE GENERATORS
A language generator is a device that can be used to generate the sentences of a language. One can determine whether the syntax of a particular sentence is correct by comparing it to the structure of the generator.
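To contrast the two devices, here is a minimal sketch for a toy language chosen only for illustration: the strings a^n b^n with n ≥ 1 (ab, aabb, aaabbb, ...). The generator produces sentences; the recognizer only answers yes or no for a given string.

```python
from itertools import islice
from typing import Iterator


def generate() -> Iterator[str]:
    """Language generator: yields the sentences ab, aabb, aaabbb, ... in order."""
    n = 1
    while True:
        yield "a" * n + "b" * n
        n += 1


def recognize(sentence: str) -> bool:
    """Language recognizer: accepts exactly the strings a^n b^n with n >= 1."""
    n = len(sentence) // 2
    return n >= 1 and sentence == "a" * n + "b" * n


print(list(islice(generate(), 3)))                        # ['ab', 'aabb', 'aaabbb']
print(recognize("aabb"), recognize("aab"), recognize("abab"))  # True False False
```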
1.4 FORMAL METHODS OF DESCRIBING SYNTAX
In the mid-to-late 1950s, John Backus and Noam Chomsky, working independently, invented the notation that is now most widely used for describing programming language syntax.
Chomsky described four classes of grammars that define four classes of languages. Two of these grammar classes, context-free and regular, turned out to be useful for describing the syntax of programming languages: the tokens of programming languages can be described by regular grammars, and the syntax of whole programming languages can, with minor exceptions, be described by context-free grammars.
Context-free grammars were developed by Noam Chomsky in the mid-1950s as language generators meant to describe the syntax of natural languages; they define a class of languages called context-free languages. A grammar rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and nonterminal symbols.
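The sketch below shows a context-free grammar used as a generator. It encodes a small assignment grammar (assumed for illustration) as LHS-to-RHS rules and performs a leftmost derivation by repeatedly rewriting the leftmost nonterminal, with the caller choosing which alternative to apply at each step.

```python
# Each nonterminal maps to its alternative right-hand sides; every RHS is a
# list of terminal and nonterminal symbols (nonterminals are written <...>).
GRAMMAR = {
    "<assign>": [["<id>", "=", "<expr>"]],
    "<id>": [["A"], ["B"], ["C"]],
    "<expr>": [
        ["<id>", "+", "<expr>"],
        ["<id>", "*", "<expr>"],
        ["(", "<expr>", ")"],
        ["<id>"],
    ],
}


def leftmost_derivation(start: str, choices: list[int]) -> list[str]:
    """Rewrite the leftmost nonterminal once per entry in `choices`
    (each entry indexes that nonterminal's alternatives)."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # locate the leftmost nonterminal in the current sentential form
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


# Derive the sentence  A = B * C
print("\n=> ".join(leftmost_derivation("<assign>", [0, 0, 1, 1, 3, 2])))
```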
The basic idea of a recursive-descent parser is that there is a subprogram for each nonterminal in the grammar.
An attribute grammar is a device used to describe more of the structure of a programming language than is possible with a context-free grammar. An attribute grammar is an extension to a context-free grammar. Attribute grammars can perform several useful functions in specifying the syntax and semantics of a programming language. An attribute grammar can be used to specify the context-sensitive aspects of the syntax of a language, such as checking that an item has been declared and that the use of the item is consistent with its declaration.
The static semantics of a language is only indirectly related to the meaning of programs during execution; rather, it has to do with the legal form of programs (syntax rather than semantics).
An attribute grammar adds the following to a context-free grammar:
o Attributes, which are associated with grammar symbols, are similar to variables in the sense that they can have values assigned to them.
o Attribute computation functions are associated with grammar rules to specify how attribute values are computed.
o Predicate functions, which state some of the syntax and static semantic rules of the language, are associated with grammar rules.
Intrinsic attributes are synthesized attributes of leaf nodes whose values are determined outside the parse tree.
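A minimal sketch of these three features, assuming a hypothetical grammar <assign> → <var> = <expr>, <expr> → <var> + <var> | <var>, <var> → A | B | C, with two types, int and real. The actual_type of each <var> leaf is an intrinsic attribute looked up in an assumed symbol table; the expression's type is computed from its children; the predicate requires the expression's type to match the target's type.

```python
# Hypothetical declarations: the source of the intrinsic attribute actual_type
# for each <var> leaf (its value comes from outside the parse tree).
SYMBOL_TABLE = {"A": "real", "B": "int", "C": "int"}


def var_actual_type(name: str) -> str:
    """Intrinsic attribute of a <var> leaf, obtained by table look-up."""
    if name not in SYMBOL_TABLE:
        # static-semantic rule: a variable must be declared before use
        raise NameError(f"{name} is not declared")
    return SYMBOL_TABLE[name]


def check_assign(target: str, operands: list[str]) -> bool:
    """Evaluate attributes for <assign> -> <var> = <expr>, where <expr> is
    either <var> or <var> + <var>, and return the predicate's result."""
    # attribute computation: <expr>.expected_type <- <var>.actual_type
    expected_type = var_actual_type(target)
    # attribute computation: int only if every operand is int, otherwise real
    operand_types = [var_actual_type(v) for v in operands]
    actual_type = "int" if all(t == "int" for t in operand_types) else "real"
    # predicate function: <expr>.actual_type == <expr>.expected_type
    return actual_type == expected_type


print(check_assign("A", ["A", "B"]))  # A = A + B  (real = real + int): True
print(check_assign("B", ["A", "C"]))  # B = A + C  (int = real + int): False
```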
Dynamic semantics is the meaning of the expressions, statements, and program units of a language. Programmers need to know precisely what the statements of a language do, and compiler writers often determine the semantics of a language for which they are writing compilers from English descriptions.
Operational semantics: the idea is to describe the meaning of a program by executing its statements on a machine, either real or simulated. Operational semantics provides an effective means of describing semantics for language users and language implementers, as long as the descriptions are kept simple and informal. Operational semantics depends on algorithms, not mathematics.
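To make the idea concrete, the sketch below gives an operational meaning to the C fragment `sum = 0; for (i = 1; i <= n; i++) sum = sum + i;`. The fragment is rewritten as simpler statements (assignments, a conditional jump, and a goto) and run on a simulated machine. The four-instruction machine is a hypothetical one invented for this illustration.

```python
Env = dict[str, int]

# Hypothetical low-level instruction set for the simulated machine:
#   ("set", var, expr)            var := expr(env)
#   ("if_false_goto", cond, lbl)  jump to lbl when cond(env) is false
#   ("goto", lbl)                 unconditional jump
#   ("label", lbl)                jump target; does nothing when executed
PROGRAM = [
    ("set", "sum", lambda env: 0),
    ("set", "i", lambda env: 1),                                 # expr1: i = 1
    ("label", "loop"),
    ("if_false_goto", lambda env: env["i"] <= env["n"], "out"),  # expr2: i <= n
    ("set", "sum", lambda env: env["sum"] + env["i"]),           # loop body
    ("set", "i", lambda env: env["i"] + 1),                      # expr3: i++
    ("goto", "loop"),
    ("label", "out"),
]


def run(program: list[tuple], env: Env) -> Env:
    """Execute the instructions one at a time, like a simple real machine."""
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}
    pc = 0
    while pc < len(program):
        ins = program[pc]
        if ins[0] == "set":
            env[ins[1]] = ins[2](env)
        elif ins[0] == "if_false_goto" and not ins[1](env):
            pc = labels[ins[2]]
            continue
        elif ins[0] == "goto":
            pc = labels[ins[1]]
            continue
        pc += 1  # labels and untaken jumps fall through to the next instruction
    return env


print(run(PROGRAM, {"n": 4})["sum"])  # 1 + 2 + 3 + 4 = 10
```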
Axiomatic semantics was defined in conjunction with the development of a method to prove the correctness of programs. Axiomatic semantics is based on mathematical logic. The logical expressions are called predicates, or assertions. An assertion immediately preceding a statement describes the constraints on the program variables at that point in the program, and an assertion immediately following a statement describes the new constraints on those variables after execution of the statement. These assertions are called the precondition and the postcondition, respectively. Developing an axiomatic description or proof of a given program requires that every statement in the program have both a precondition and a postcondition.
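A worked example of computing a precondition: the derivation below uses the standard assignment axiom of axiomatic semantics, which these notes do not spell out. The precondition of x = E is the postcondition Q with E substituted for x, written Q_{x→E}, and it is the weakest precondition for that statement. Division is treated here as exact (real) division.

```latex
% Assignment axiom: substitute E for every occurrence of x in Q
\{\, Q_{x \to E} \,\}\;\; x = E \;\;\{\, Q \,\}

% Example: statement a = b / 2 - 1 with postcondition a < 10
Q_{a \to b/2 - 1} \;=\; (b/2 - 1 < 10) \;\Longleftrightarrow\; b/2 < 11 \;\Longleftrightarrow\; b < 22

% Resulting triple
\{\, b < 22 \,\}\;\; a = b/2 - 1 \;\;\{\, a < 10 \,\}
```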
Denotational semantics is the most rigorous widely known method for describing the meaning of programs. It is based on recursive function theory. The fundamental concept of denotational semantics is to define, for each language entity, both a mathematical object and a function that maps instances of that entity onto instances of the mathematical object.
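A small illustration: take binary numerals generated by the rules <bin_num> → '0' | '1' | <bin_num> '0' | <bin_num> '1'. Their mathematical objects are natural numbers, and a meaning function M_bin maps each syntactic form onto one of them. The sketch below writes M_bin in Python as a function named m_bin, with one case per grammar rule. The grammar and the name are illustrative assumptions.

```python
def m_bin(numeral: str) -> int:
    """Denotation of a binary numeral: one case per grammar rule."""
    if numeral in ("0", "1"):
        # M_bin('0') = 0,  M_bin('1') = 1
        return int(numeral)
    if len(numeral) < 2 or numeral[-1] not in "01":
        raise ValueError(f"not a binary numeral: {numeral!r}")
    prefix, last = numeral[:-1], numeral[-1]
    # M_bin(<bin_num> '0') = 2 * M_bin(<bin_num>)
    # M_bin(<bin_num> '1') = 2 * M_bin(<bin_num>) + 1
    return 2 * m_bin(prefix) + (1 if last == "1" else 0)


print(m_bin("110"))   # 6
print(m_bin("1011"))  # 11
```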
Lexical analysis is the first phase of a compiler. It takes the modified source code from language preprocessors, written in the form of sentences, and breaks it into a series of tokens, removing any whitespace or comments in the source code. The lexical analyzer is also called a scanner.
Main task: read the input characters and produce as output a sequence of tokens.
Process:
o Input: the program as a single string of characters.
o Collects characters into logical groupings and assigns internal codes to the groupings according to their structure.
Groupings: lexemes
Internal codes: tokens
program gcd (input, output);
var i, j : integer;
begin
  read (i, j);
  while i <> j do
    if i > j then i := i - j
    else j := j - i;
  writeln (i)
end.
The program is divided into tokens as follows:
program gcd ( input , output ) ; var i , j : integer ; begin read ( i , j ) ; while i <> j do if i > j then i := i - j else j := j - i ; writeln ( i ) end .
A lexical analyzer is a pattern matcher for character strings
A lexical analyzer is a “front-end” for the parser
Identifies substrings of the source program that belong together - lexemes
Lexemes match a character pattern, which is associated with a lexical category called a token. For example, sum is a lexeme; its token may be IDENT.
The lexical analyzer is usually a function that is called by the parser when it needs the next token. There are three approaches to building a lexical analyzer:
1. Write a formal description of the tokens and use a software tool that constructs a table-driven lexical analyzer from that description.
2. Design a state diagram that describes the tokens and write a program that implements the state diagram.
3. Design a state diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram.
State Diagram Design
A naïve state diagram would have a transition from every state on every character in the source language; such a diagram would be very large.
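In many cases, transitions can be combined to simplify the state diagram. For example, when recognizing an identifier, all uppercase and lowercase letters are equivalent, so they can be represented by a single character class (LETTER); similarly, all digits can be grouped into one class (DIGIT).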
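The sketch below follows the second approach listed earlier: a program that implements a state diagram by hand. Each branch of the loop plays the role of one diagram state, and transitions are taken on the character classes LETTER and DIGIT rather than on individual characters. The token codes and the small Pascal subset are assumptions for illustration. Because tokens is a generator, a parser can call next() on it whenever it needs the next token. Two-character operators such as <> and := are checked before one-character ones, so they come out as single lexemes.

```python
from string import ascii_letters, digits
from typing import Iterator


def char_class(ch: str) -> str:
    """Collapse the many letter/digit transitions into one class each."""
    if ch in ascii_letters:
        return "LETTER"
    if ch in digits:
        return "DIGIT"
    return "OTHER"


# Hypothetical token codes for a small Pascal subset.
KEYWORDS = {"program", "var", "begin", "end", "while", "do", "if", "then", "else"}
TWO_CHAR_OPS = {":=": "ASSIGN", "<>": "NOT_EQ", "<=": "LE", ">=": "GE"}
ONE_CHAR_OPS = {"+": "PLUS", "-": "MINUS", "<": "LT", ">": "GT", ";": "SEMI",
                "(": "LPAREN", ")": "RPAREN", ",": "COMMA", ":": "COLON", ".": "DOT"}


def tokens(source: str) -> Iterator[tuple[str, str]]:
    """Yield (token, lexeme) pairs, one walk through the diagram per lexeme."""
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch.isspace():                          # start state: skip blanks
            pos += 1
        elif char_class(ch) == "LETTER":          # identifier/keyword state
            start = pos
            while pos < len(source) and char_class(source[pos]) in ("LETTER", "DIGIT"):
                pos += 1
            lexeme = source[start:pos]
            yield ("KEYWORD" if lexeme in KEYWORDS else "IDENT", lexeme)
        elif char_class(ch) == "DIGIT":           # integer-literal state
            start = pos
            while pos < len(source) and char_class(source[pos]) == "DIGIT":
                pos += 1
            yield ("INT_LIT", source[start:pos])
        elif source[pos:pos + 2] in TWO_CHAR_OPS:  # prefer the longer lexeme
            yield (TWO_CHAR_OPS[source[pos:pos + 2]], source[pos:pos + 2])
            pos += 2
        elif ch in ONE_CHAR_OPS:
            yield (ONE_CHAR_OPS[ch], ch)
            pos += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at {pos}")


print(list(tokens("while i <> j do i := i - j")))
```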
1.9 Recursive-Descent Parsing
In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures, where each such procedure usually implements one of the productions of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.
Recursive-descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is executed to process the input.
A procedure is associated with each nonterminal of a grammar.
Top-down parsing can be viewed as an attempt to find the leftmost derivation for an input string.
Equivalently, it attempts to construct a parse tree for the input starting from the root and creating the nodes of the parse tree in preorder.
Recursive-descent parsing may involve backtracking.
Recursive descent is a top-down parsing technique that constructs the parse tree from the top while the input is read from left to right. It uses procedures for every terminal and nonterminal entity. This parsing technique recursively parses the input to make a parse tree, which may or may not require backtracking; however, if the associated grammar is not left factored, backtracking cannot be avoided. A form of recursive-descent parsing that does not require any backtracking is known as predictive parsing.
This parsing technique is regarded as recursive because it uses a context-free grammar, which is recursive in nature.
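The sketch below applies the "one subprogram per nonterminal" idea to a small expression grammar, written in EBNF and assumed for illustration. The grammar has no left recursion, and every choice can be made by peeking at the next token. The parser is therefore predictive and never backtracks. Each method returns a fully parenthesized string so the shape of the parse tree is visible.

```python
import re
from typing import Optional


class Parser:
    """Recursive-descent parser with one method per nonterminal:
        <expr>   -> <term> {(+ | -) <term>}
        <term>   -> <factor> {(* | /) <factor>}
        <factor> -> id | int_constant | ( <expr> )
    """

    def __init__(self, text: str):
        # \S catches any other character as its own (invalid) token
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]|\S", text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self) -> str:
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self) -> str:      # <expr>
        left = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            left = f"({left} {op} {self.term()})"
        return left

    def term(self) -> str:      # <term>
        left = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            left = f"({left} {op} {self.factor()})"
        return left

    def factor(self) -> str:    # <factor>
        tok = self.advance()
        if tok == "(":
            inner = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if re.fullmatch(r"\d+|[A-Za-z_]\w*", tok):
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser("2 + 3 * 4").parse())  # (2 + (3 * 4))
```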
Back-tracking
Top-down parsers start from the root node (start symbol) and match the input string against the production rules, replacing them when they match. To understand this, take the following example of a CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For the input string "read", a top-down parser behaves like this:
It starts with S and matches the first symbol of S's production against the leftmost letter of the input, i.e. ‘r’. The first production of S (S → rXd) matches it, so the top-down parser advances to the next input letter (i.e. ‘e’). The parser then tries to expand the nonterminal ‘X’, checking its productions from the left (X → oa). This production does not match the next input symbol, so the top-down parser backtracks to obtain the next production rule of X (X → ea).
Now the parser matches all the remaining input letters in order (‘e’, ‘a’, and then ‘d’), so the string is accepted.
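The backtracking behavior described above can be sketched with Python generators. The grammar is exactly the one in the example, and alternatives are tried in the order listed. Each generator yields every input position where a derivation of its symbol could end, so a later failure resumes the search with the next alternative. The trace strings are an illustrative addition. This naive scheme would loop forever on a left-recursive grammar.

```python
from typing import Iterator

# Grammar from the example above: alternatives are tried in the order listed.
GRAMMAR = {
    "S": [["r", "X", "d"], ["r", "Z", "d"]],
    "X": [["o", "a"], ["e", "a"]],
    "Z": [["a", "i"]],
}


def parse(symbol: str, text: str, pos: int, trace: list[str]) -> Iterator[int]:
    """Yield each position where a derivation of `symbol` starting at `pos` can end."""
    if symbol not in GRAMMAR:                 # terminal: must match the input letter
        if pos < len(text) and text[pos] == symbol:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:               # nonterminal: try each production
        trace.append(f"try {symbol} -> {''.join(rhs)} at position {pos}")
        yield from parse_sequence(rhs, text, pos, trace)


def parse_sequence(symbols: list[str], text: str, pos: int, trace: list[str]) -> Iterator[int]:
    """Match a sequence of symbols left to right, backtracking into earlier ones."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], text, pos, trace):
        yield from parse_sequence(symbols[1:], text, mid, trace)


def accepts(text: str) -> tuple[bool, list[str]]:
    """Accept if some derivation of S consumes the whole input."""
    trace: list[str] = []
    ok = any(end == len(text) for end in parse("S", text, 0, trace))
    return ok, trace


ok, trace = accepts("read")
print(ok)
print("\n".join(trace))
```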
1.10 Bottom-Up Parsing
Bottom-up parsing starts from the leaf nodes of a tree and works upward until it reaches the root node. Here, we start from a sentence and apply production rules in reverse in order to reach the start symbol. The image given below depicts the bottom-up parsers available.
Shift-Reduce Parsing
Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are known as shift-step and reduce-step.
Shift step : The shift step refers to the advancement of the input pointer to the next input symbol, which is called the shifted symbol. This symbol is pushed onto the stack. The shifted symbol is treated as a single node of the parse tree.
Reduce step: When the parser finds the complete right-hand side (RHS) of a grammar rule on the stack and replaces it with the rule's left-hand side (LHS), it is known as the reduce step. This occurs when the top of the stack contains a handle. To reduce, a POP function is performed on the stack, which pops off the handle and replaces it with the LHS nonterminal symbol.
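The two steps can be traced with a minimal sketch for the assumed grammar E → E + T | T, T → id. To decide between shifting and reducing, the sketch uses a simplifying rule that happens to be correct for this tiny grammar: reduce whenever the top of the stack matches a right-hand side, trying the longest one first, and otherwise shift. Practical shift-reduce parsers make this decision with parsing tables such as the LR tables.

```python
# Productions as (LHS, RHS), listed longest-RHS first so that "E + T" is
# reduced as a whole instead of reducing the lone "T" on top of it.
PRODUCTIONS = [
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["id"]),
]


def shift_reduce(tokens: list[str]) -> list[str]:
    """Return the actions taken; raise SyntaxError if the input is rejected."""
    stack: list[str] = []
    remaining = list(tokens)
    actions: list[str] = []
    while True:
        for lhs, rhs in PRODUCTIONS:
            if stack[-len(rhs):] == rhs:      # a handle is on top of the stack
                del stack[-len(rhs):]         # pop the handle ...
                stack.append(lhs)             # ... and push the LHS nonterminal
                actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if remaining:                     # no handle: shift the next symbol
                stack.append(remaining.pop(0))
                actions.append(f"shift {stack[-1]}")
            elif stack == ["E"]:
                actions.append("accept")
                return actions
            else:
                raise SyntaxError(f"cannot reduce stack {stack}")


for action in shift_reduce(["id", "+", "id"]):
    print(action)
```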
LR Parser
The LR parser is a non-recursive, shift-reduce, bottom-up parser. It handles a wide class of context-free grammars, which makes it one of the most efficient syntax analysis techniques. LR parsers are also known as LR(k) parsers, where L stands for left-to-right scanning of the input stream, R stands for the construction of a rightmost derivation in reverse, and k denotes the number of lookahead symbols used to make decisions.
There are three widely used algorithms for constructing an LR parser:
SLR(1) – Simple LR parser:
o Works on the smallest class of grammars
o Few states, hence a very small table
o Simple and fast construction
LR(1) – LR parser:
o Works on the complete set of LR(1) grammars
o Generates a large table and a large number of states
o Slow construction
LALR(1) – Look-Ahead LR parser:
o Works on an intermediate-size class of grammars
o The number of states is the same as in SLR(1)
LR Parsing Algorithm
Here we describe a skeleton algorithm of an LR parser:
token = next_token()
repeat forever
    s = top of stack
    if action[s, token] = "shift si" then
        PUSH token
        PUSH si
        token = next_token()
    else if action[s, token] = "reduce A ::= β" then
        POP 2 * |β| symbols
        s = top of stack
        PUSH A
        PUSH goto[s, A]
    else if action[s, token] = "accept" then
        return
    else
        error()
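The skeleton assumes that the action and goto tables already exist. To see it run, the sketch below hand-builds an SLR(1) table for a tiny hypothetical grammar, (1) E → E + n and (2) E → n, with "$" marking the end of the input. How the table is constructed is not shown. The stack interleaves states and grammar symbols, which is why a reduction pops 2 * |β| entries. Reading the returned reductions from last to first gives the rightmost derivation.

```python
# Hand-built SLR(1) tables for the hypothetical grammar
#   (1) E -> E + n      (2) E -> n
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}
RULES = {1: ("E", ["E", "+", "n"]), 2: ("E", ["n"])}


def lr_parse(tokens: list[str]) -> list[str]:
    """Run the skeleton LR algorithm; return the reductions in the order performed."""
    stream = iter(tokens + ["$"])
    token = next(stream)
    stack: list[object] = [0]          # states interleaved with grammar symbols
    reductions: list[str] = []
    while True:
        s = stack[-1]                  # s = top of stack
        entry = ACTION.get((s, token))
        if entry is None:
            raise SyntaxError(f"unexpected {token!r} in state {s}")
        kind, arg = entry
        if kind == "shift":
            stack += [token, arg]      # PUSH token, PUSH si
            token = next(stream)
        elif kind == "reduce":
            lhs, rhs = RULES[arg]
            del stack[-2 * len(rhs):]  # POP 2 * |beta| entries
            s = stack[-1]
            stack += [lhs, GOTO[(s, lhs)]]  # PUSH A, PUSH goto[s, A]
            reductions.append(f"{lhs} -> {' '.join(rhs)}")
        else:                          # accept
            return reductions


print(lr_parse(["n", "+", "n"]))  # ['E -> n', 'E -> E + n']
```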