Principles of Programming Languages: Lecture Notes for III B.Tech CSE I Semester, Lecture notes of Introduction to Computers

Evolution of programming languages – describing syntax – context-free grammars – attribute grammars – describing semantics – lexical analysis – parsing – recursive-descent – bottom-up parsing

Typology: Lecture notes

2016/2017

Uploaded on 10/12/2017

ynd-aravind

LECTURE NOTES
ON
PRINCIPLES OF PROGRAMMING LANGUAGES
III B.TECH CSE I SEMESTER
(JNTUK-R13)
Mr. Y.N.D.ARAVIND, M.Tech.,
Associate Professor
DEPARTMENT OF COMPUTER SCIENCE &
ENGINEERING
NEWTON’S GROUP OF INSTITUTIONS
ALIGURAJUPALLI, MACHERLA, GUNTUR (Dt.), (A.P) – 5

LESSON PLAN

Aravind. Y.N.D

Date : 01/05/

Page: 01 of 06

Sub Code : CP
Sub Name : PRINCIPLES OF PROGRAMMING LANGUAGES
Unit : I
Branch : CSE
Year : III
Semester : I

UNIT - 1

SYNTAX AND SEMANTICS

Evolution of programming languages – describing syntax – context-free grammars – attribute grammars – describing semantics – lexical analysis – parsing – recursive-descent – bottom-up parsing

Objectives: To understand and describe the syntax and semantics of programming languages.

S.No.  Topic to be covered                                           Duration    Reference  Teaching Method
1      Reasons for Studying Concepts of Programming Languages        50 minutes  1          Black Board
2      Evolution of the Major Programming Languages                  50 minutes  1          Black Board
3      The General Problem of Describing Syntax                      50 minutes  1          Black Board
4      Formal Methods of Describing Syntax: Context-Free Grammars    50 minutes  1          Black Board
5      Attribute Grammars                                            50 minutes  1          Black Board
6      Describing the Meanings of Programs: Dynamic Semantics        50 minutes  1          Black Board
7      Lexical Analysis                                              50 minutes  1          Black Board
8      The Parsing Problem                                           50 minutes  1          Black Board
9      Recursive-Descent Parsing                                     50 minutes  1          Black Board
10     Bottom-Up Parsing                                             50 minutes  1          Black Board

  • The number of data types and structures
  • The syntax or form of the elements of a language. Three aspects of syntax affect readability: identifier forms, special words, and form and meaning

Writability is a measure of how easily a language can be used to create programs for a chosen problem domain. It is directly related to:
  • Simplicity and orthogonality (defined under readability)
  • Support for abstraction, which is the ability to define and use complicated structures or operations in a way that allows many details to be ignored
  • Expressivity, which means the language has convenient ways of specifying computations

Reliability means that a program performs to its specifications under all conditions. It is directly related to:
  • Type checking, which is testing for type errors in a program, either by the compiler or at run time
  • Exception handling, which is the ability of a program to intercept run-time errors, take corrective action, and then continue
  • Aliasing, which is having two or more distinct referencing methods or names for the same memory location
  • Readability and writability

LANGUAGE CATEGORIES
  • Imperative – based on the von Neumann architecture
  • Functional – based on mathematical functions
  • Logic – rule based
  • Object oriented – object based

1.2 Evolution of programming languages:

Zuse’s Plankalkül: 1945

The world's first complete high-level language was designed in the 1940s (probably between 1941 and 1945, although the concept was first published in 1948) by the German computer pioneer Konrad Zuse, the creator of the first relay computer. Plankalkül is a typed, high-level imperative programming language.

Pseudocodes: 1949

Code written in a natural language such as English is called "pseudocode". Such code helps us understand the logic of a program.

What was wrong with using machine code?
a. Poor readability
b. Poor modifiability
c. Expression coding was tedious
d. Machine deficiencies: no indexing

The IBM 704 and FORTRAN I - 1957 (FORTRAN 0 - 1954 - not implemented)

Designed for the new IBM 704, which had index registers and floating-point hardware.

Environment of development:
  • Computers were small and unreliable
  • Applications were scientific
  • No programming methodology or tools
  • Machine efficiency was most important

Impact of environment on design:
  • No need for dynamic storage
  • Need for good array handling and counting loops
  • No string handling, decimal arithmetic, or powerful input/output (commercial stuff)

FORTRAN

Fortran (formerly FORTRAN, derived from "Formula Translation") is a general-purpose, imperative programming language that is especially suited to numeric computation and scientific computing. It was originally developed by IBM in the 1950s for scientific and engineering applications.

FORTRAN II - 1958

IBM's FORTRAN II appeared in 1958. The main enhancement was support for procedural programming: user-written subroutines and functions that return values, with parameters passed by reference.

FORTRAN III - 1958

IBM also developed a FORTRAN III in 1958 that allowed inline assembly code, among other features; however, this version was never released as a product.

FORTRAN IV - 1960-

Starting in 1961, as a result of customer demands, IBM began development of a FORTRAN IV that removed the machine-dependent features of FORTRAN II (such as READ INPUT TAPE), while adding new features such as a LOGICAL data type, logical Boolean expressions, and the logical IF statement as an alternative to the arithmetic IF statement.

FORTRAN 77 - 1978

The new standard, called FORTRAN 77, added a number of significant features to address many of the shortcomings of FORTRAN 66:
  • Character string handling
  • Logical loop control statement
  • IF-THEN-ELSE statement

FORTRAN 90 - 1990

The much delayed successor to FORTRAN 77, informally known as Fortran 90, added:
  • Modules
  • Dynamic arrays
  • Pointers
  • Recursion
  • CASE statement
  • Parameter type checking

FORTRAN Evaluation

FORTRAN dramatically and permanently changed the way computers are used.

Functional Programming: LISP

LISP, whose name is derived from "list processing", is a programming language designed for easy manipulation of symbolic data held in lists. Developed by John McCarthy beginning in 1958, it is commonly used for artificial intelligence (AI) programming. It is one of the oldest programming languages still in relatively wide use.

  • ML, Miranda, and Haskell are related languages

The First Step Toward Sophistication: ALGOL 60

ALGOL 60 (short for ALGOrithmic Language 1960) is a member of the ALGOL family of computer programming languages. It followed on from ALGOL 58, which had introduced code blocks and the begin and end pairs for delimiting them. ALGOL 60 was the first language implementing nested function definitions with lexical scope. It gave rise to many other programming languages, including CPL, Simula, BCPL, B, Pascal and C.

Computerizing Business Records: COBOL 1960

Although COBOL has been used more than any other programming language, it has had little effect on the design of other languages, with the exception of PL/I.

Historical Background

Like ALGOL, COBOL was designed by a committee. Three other languages for business applications existed before COBOL: FLOW-MATIC, AIMACO, and COMTRAN.

COBOL Design Process

  • The biggest concern regarding this new application language was that it be easy to use, even at the expense of being less powerful
  • The language specifications for COBOL were published in 1960

Evaluation

  • COBOL originated a number of concepts, such as constructs for macros, hierarchical data structures, and connotative names
  • It was the first language whose use was mandated by the Department of Defense
  • The poor performance of the early compilers made COBOL expensive to use. With the advent of better compiler designs and the mandate of the Defense Department, COBOL became very popular

Programming Based on Logic: Prolog

Developed at the University of Aix-Marseille by Colmerauer and Roussel, with some help from Kowalski at the University of Edinburgh.
  • Based on formal logic
  • Non-procedural
  • Can be summarized as an intelligent database system that uses an inferencing process to infer the truth of given queries

History’s Largest Design Effort: Ada

Ada was originally developed for the Department of Defense. In 1974 the Army, Navy, and Air Force each proposed the development of a high-level language for embedded systems, as an attempt to standardize their own embedded systems. The committee assigned to this task was responsible for identifying requirements for a new Department of Defense high-level language, evaluating the existing languages to determine whether there was a viable candidate, and recommending adoption or implementation of a minimal set of programming languages. The resulting language went through multiple phases and was named Ada.

There are four major features of the Ada language:
  • Packages provide the means for encapsulation of data objects
  • It includes extensive facilities for exception handling
  • It allows program units to be generic
  • It provides for concurrent execution of special program units, named tasks

Ada 95 (began in 1988):
  • Support for OOP through type derivation
  • Better control mechanisms for shared data (new concurrency features)
  • More flexible libraries

Object-Oriented Programming: Smalltalk

  • Developed at Xerox PARC, initially by Alan Kay, later by Adele Goldberg
  • First full implementation of an object-oriented language (data abstraction, inheritance, and dynamic type binding)
  • Pioneered the graphical user interface everyone now uses

Combining Imperative and Object-Oriented Features: C++

Developed at Bell Labs by Bjarne Stroustrup, starting in 1979. C++ evolved from C and SIMULA 67: facilities for object-oriented programming, taken partially from SIMULA 67, were added to C. It also has exception handling. It is a large and complex language, in part because it supports both procedural and OO programming. It grew rapidly in popularity, along with OOP. The ANSI standard was approved in November 1997.

Eiffel is a related language that supports OOP (designed by Bertrand Meyer, 1992). It is not directly derived from any other language, and it is smaller and simpler than C++ but still has most of its power.

An Imperative-Based Object-Oriented Language: Java

Developed at Sun in the early 1990s. Java is based on C++ but significantly simplified. It supports only OOP, has references but not pointers, and includes support for applets and a form of concurrency.

Scripting Languages: JavaScript, PHP, Python and Ruby

  • These are used in web applications
  • JavaScript is an HTML-resident, client-side scripting language
  • PHP is an HTML-resident, server-side scripting language
  • Python and Ruby are used for Common Gateway Interface programming

The Flagship .NET Language: C#

  • C#, along with the new development platform .NET, was developed by Microsoft in the year 2000
  • It is based on C++ and Java
  • The purpose of C# is to provide a language for component-based software development
  • Components written in different languages, such as Visual Basic .NET, Managed C++, J# .NET, and JScript .NET, can be easily combined to form systems

Markup/Programming Hybrid Languages

  • XSLT – eXtensible Stylesheet Language Transformations, used for transforming markup documents
  • JSP – Java Server Pages, used for server-side programming

1.3 DESCRIBING SYNTAX AND SEMANTICS

INTRODUCTION

The study of programming languages can be divided into the examination of syntax and semantics.
  • Syntax is the form of expressions, statements, and program units
  • Semantics is the meaning of those expressions, statements, and program units

In a well-designed programming language, semantics should follow directly from syntax. Describing syntax is easier than describing semantics.

THE GENERAL PROBLEM OF DESCRIBING SYNTAX

Lexemes - A lexeme is a sequence of characters in the source program that matches the pattern for a token and is identified by the lexical analyzer as an instance of that token. It is the lowest-level syntactic unit. The lexemes of a programming language include its identifiers, literals, operators and special words. A token of a language is a category of its lexemes.

Tokens - A token is a pair consisting of a token name and an optional attribute value. The token name is an abstract symbol representing a kind of lexical unit, e.g., a particular keyword, or a sequence of input characters denoting an identifier. The token names are the input symbols that the parser processes.

In short: a lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, begin), and a token is a category of lexemes (e.g., identifier).

Consider the following Java statement:

Index = 2 * count + 17;

Lexemes   Tokens
Index     Identifier
=         Equal sign
2         Int literal
*         Mult op
count     Identifier
+         Plus op
17        Int literal
;         Semicolon
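The split of that statement into lexemes and tokens can be reproduced with a small pattern-based scanner. The following is only a sketch in Python: the token names (`identifier`, `equal_sign`, and so on) and the `tokenize` function are illustrative choices, not part of any particular compiler.

```python
import re

# Hypothetical token names, loosely following the lexeme/token table.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),          # whitespace is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (lexeme, token) pairs for the given source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if m.lastgroup != "skip":
            pairs.append((m.group(), m.lastgroup))
        pos = m.end()
    return pairs

pairs = tokenize("Index = 2 * count + 17;")
for lexeme, token in pairs:
    print(f"{lexeme:<6} {token}")
```

Each pattern describes one token category; the characters actually matched are the lexeme.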

LANGUAGE RECOGNIZERS

  • Languages can be defined in two ways: by recognition and by generation
  • A recognition device reads input strings of the language and decides whether each input string belongs to the language. Example: the syntax analysis part of a compiler

LANGUAGE GENERATOR

  • A language generator is a device that can be used to generate the sentences of a language. One can determine whether the syntax of a particular sentence is correct by comparing it to the structure of the generator

1.4 FORMAL METHODS OF DESCRIBING SYNTAX

John Backus and Noam Chomsky, working independently, developed what is essentially the same notation, which has become the most widely used method for describing programming language syntax.

CONTEXT-FREE GRAMMARS

  • Chomsky described four classes of grammars that define four classes of languages. Two of these classes, context-free and regular, turned out to be useful for describing the syntax of programming languages
  • The tokens of programming languages can be described by regular grammars
  • Context-free grammars were developed by Noam Chomsky in the mid-1950s as language generators, meant to describe the syntax of natural languages
  • They define a class of languages called context-free languages
  • A rule has a left-hand side (LHS), which is a single nonterminal, and a right-hand side (RHS), which consists of terminal and nonterminal symbols
  • The basic idea of a recursive-descent parser is that there is a subprogram for each nonterminal in the grammar
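To make the idea of a grammar as a language generator concrete, here is a minimal Python sketch of a leftmost derivation. The grammar for simple assignment statements, the `GRAMMAR` dictionary, and the `leftmost_derivation` function are assumptions chosen for illustration; they do not come from these notes.

```python
# Hypothetical grammar: each nonterminal maps to its list of right-hand sides.
GRAMMAR = {
    "<assign>": [["<id>", "=", "<expr>"]],
    "<id>":     [["A"], ["B"], ["C"]],
    "<expr>":   [["<id>", "+", "<expr>"],
                 ["<id>", "*", "<expr>"],
                 ["(", "<expr>", ")"],
                 ["<id>"]],
}

def leftmost_derivation(start, choices):
    """Rewrite the leftmost nonterminal once per entry in `choices`.

    Each entry is the index of the rule to apply. Returns every
    sentential form produced, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Locate the leftmost nonterminal in the current sentential form.
        idx = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[idx:idx + 1] = GRAMMAR[form[idx]][choice]
        steps.append(" ".join(form))
    return steps

steps = leftmost_derivation("<assign>", [0, 0, 0, 1, 3, 2])
for step in steps:
    print("=>", step)
```

Every line printed is a sentential form; the last one contains only terminals and is therefore a sentence of the language.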

1.5 ATTRIBUTE GRAMMARS

  • An attribute grammar is a device used to describe more of the structure of a programming language than is possible with a context-free grammar
  • An attribute grammar is an extension to a context-free grammar
  • Attribute grammars can perform several useful functions in specifying the syntax and semantics of a programming language. An attribute grammar can be used to specify the context-sensitive aspects of the syntax of a language, such as checking that an item has been declared and that the use of the item is consistent with its declaration

STATIC SEMANTICS

  • The static semantics of a language is only indirectly related to the meaning of programs during execution; rather, it has to do with the legal form of programs (syntax rather than semantics)

BASIC CONCEPTS

  • Attributes, which are associated with grammar symbols, are similar to variables in the sense that they can have values assigned to them
  • Attribute computation functions are associated with grammar rules to specify how attribute values are computed
  • Predicate functions, which state some of the syntax and static semantic rules of the language, are associated with grammar rules
  • Intrinsic attributes are synthesized attributes of leaf nodes whose values are determined outside the parse tree
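The three kinds of pieces listed above (attributes, attribute computation functions, and predicates) can be sketched for a tiny assignment rule of the form `<assign> -> <var> = <var> + <var>`. This is an illustrative Python sketch only: the symbol table contents, the attribute names `actual_type` and `expected_type`, and the typing rule (int only if both operands are int) are assumptions.

```python
# Hypothetical declarations; in a compiler these come from the symbol table.
SYMBOL_TABLE = {"A": "real", "B": "int", "C": "int"}

def var_type(name):
    # Intrinsic attribute: its value is determined outside the parse tree.
    return SYMBOL_TABLE[name]

def expr_actual_type(operands):
    # Synthesized attribute: computed from the attributes of the children.
    types = [var_type(v) for v in operands]
    return "int" if all(t == "int" for t in types) else "real"

def check_assign(target, operands):
    # expected_type is passed down from the left-hand side variable;
    # the predicate requires it to equal the expression's actual_type.
    expected_type = var_type(target)
    actual_type = expr_actual_type(operands)
    return expected_type == actual_type

print(check_assign("A", ["A", "B"]))   # real = real + int
print(check_assign("B", ["A", "C"]))   # int  = real + int
```

A parse tree for which any predicate is false violates the static semantic rules, even though it is syntactically well formed.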

1.6 DESCRIBING THE MEANING OF PROGRAMS: DYNAMIC SEMANTICS

  • Dynamic semantics is the meaning of the expressions, statements and program units
  • Programmers need to know precisely what the statements of a language do
  • Compiler writers determine the semantics of the language for which they are writing a compiler from English descriptions

OPERATIONAL SEMANTICS

  • The idea of operational semantics is to describe the meaning of a program by executing its statements on a machine, either real or simulated
  • Operational semantics provides an effective means of describing semantics for language users and language implementers, as long as the descriptions are kept simple and informal
  • Operational semantics depends on algorithms, not mathematics
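As a rough illustration of the operational approach, the meaning of a high-level loop can be given by translating it into simple low-level instructions and running them on a simulated machine. The instruction set (`label`, `goto`, `goto_if_false`, `assign`) and the `run` function below are hypothetical, invented for this sketch.

```python
def run(program, state):
    """Execute a list of low-level instructions on a simulated machine."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    pc = 0                                   # program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "assign":                   # var := fn(state)
            var, fn = args
            state[var] = fn(state)
        elif op == "goto_if_false":          # conditional jump
            cond, target = args
            if not cond(state):
                pc = labels[target]
                continue
        elif op == "goto":                   # unconditional jump
            pc = labels[args[0]]
            continue
        pc += 1                              # "label" itself does nothing
    return state

# Low-level meaning of:
#   while i <> j do if i > j then i := i - j else j := j - i
GCD_LOOP = [
    ("label", "loop"),
    ("goto_if_false", lambda s: s["i"] != s["j"], "out"),
    ("goto_if_false", lambda s: s["i"] > s["j"], "else"),
    ("assign", "i", lambda s: s["i"] - s["j"]),
    ("goto", "loop"),
    ("label", "else"),
    ("assign", "j", lambda s: s["j"] - s["i"]),
    ("goto", "loop"),
    ("label", "out"),
]

final = run(GCD_LOOP, {"i": 12, "j": 18})
print(final)
```

The change in machine state caused by running the translated instructions is taken as the meaning of the original statement.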

AXIOMATIC SEMANTICS

  • Axiomatic semantics was defined in conjunction with the development of a method to prove the correctness of programs
  • Axiomatic semantics is based on mathematical logic. The logical expressions are called predicates, or assertions
  • An assertion immediately preceding a statement describes the constraints on the program variables at that point; an assertion immediately following a statement describes the new constraints on those variables after execution of the statement. These assertions are called the precondition and the postcondition, respectively
  • Developing an axiomatic description or proof of a given program requires that every statement in the program have both a precondition and a postcondition
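A small worked case: for the assignment `x = 2 * y - 3` with postcondition `x > 25`, substituting the right-hand side for `x` gives `2 * y - 3 > 25`, so the weakest precondition is `y > 14`. The Python sketch below checks that claim on a finite sample of states; it is a test rather than a proof, and the `holds` helper is a hypothetical name introduced only for this example.

```python
def holds(pre, stmt, post, states):
    """Check {pre} stmt {post} on a finite sample of states (not a proof)."""
    return all(post(stmt(dict(s))) for s in states if pre(s))

def stmt(s):
    # The statement under study: x = 2 * y - 3
    s["x"] = 2 * s["y"] - 3
    return s

states = [{"x": 0, "y": y} for y in range(-50, 51)]
post = lambda s: s["x"] > 25

ok = holds(lambda s: s["y"] > 14, stmt, post, states)        # weakest precondition
too_weak = holds(lambda s: s["y"] > 13, stmt, post, states)  # admits y = 14
print(ok, too_weak)
```

With `y = 14` the statement yields `x = 25`, which violates the postcondition, so `y > 13` is too weak a precondition.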

DENOTATIONAL SEMANTICS

  • Denotational semantics is the most rigorous widely known method for describing the meaning of programs. It is based on recursive function theory
  • The fundamental concept of denotational semantics is to define, for each language entity, both a mathematical object and a function that maps instances of that entity onto instances of the mathematical object
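A standard small example of such a mapping takes binary numerals (syntactic objects) to non-negative integers (mathematical objects). The Python sketch below follows the shape of the grammar `<bin_num> -> '0' | '1' | <bin_num> '0' | <bin_num> '1'`; the function name `m_bin` is an illustrative choice.

```python
def m_bin(numeral):
    """Map a binary numeral (a string) to the integer it denotes."""
    if not numeral or set(numeral) - {"0", "1"}:
        raise ValueError(f"not a binary numeral: {numeral!r}")
    if numeral == "0":
        return 0
    if numeral == "1":
        return 1
    # <bin_num> -> <bin_num> '0' | <bin_num> '1'
    return 2 * m_bin(numeral[:-1]) + m_bin(numeral[-1])

print(m_bin("110"))
```

The function is defined by cases on the grammar rules, so the meaning of a numeral is built from the meanings of its parts.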

1.7 LEXICAL ANALYZER

Lexical analysis is the first phase of a compiler. It takes the modified source code from language preprocessors, written in the form of sentences, and breaks it into a series of tokens, removing any whitespace or comments in the source code.

  • The lexical analyzer is also called a scanner
  • Main task: read the input characters and produce as output a sequence of tokens
  • Process:
      o Input: the program as a single string of characters
      o Collects characters into logical groupings and assigns internal codes to the groupings according to their structure
  • Groupings: lexemes
  • Internal codes: tokens

program gcd (input, output);
var i, j : integer;
begin
  read (i, j);
  while i <> j do
    if i > j then i := i - j
    else j := j - i;
  writeln (i)
end.

The program is divided into tokens as follows:

program gcd ( input , output ) ; var i , j : integer ; begin read ( i , j ) ; while i <> j do if i > j then i := i - j else j := j - i ; writeln ( i ) end .

  • A lexical analyzer is a pattern matcher for character strings
  • A lexical analyzer is a "front-end" for the parser
  • It identifies substrings of the source program that belong together: lexemes
  • Lexemes match a character pattern, which is associated with a lexical category called a token
  • sum is a lexeme; its token may be IDENT
  • The lexical analyzer is usually a function that is called by the parser when it needs the next token
  • Three approaches to building a lexical analyzer:
      o Write a formal description of the tokens and use a software tool that constructs table-driven lexical analyzers given such a description
      o Design a state diagram that describes the tokens and write a program that implements the state diagram
      o Design a state diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram

State Diagram Design

  • A naïve state diagram would have a transition from every state on every character in the source language; such a diagram would be very large
  • In many cases, transitions can be combined to simplify the state diagram
  • When recognizing an identifier, all uppercase and lowercase letters are equivalent, so a single character class for letters can replace a separate transition for each letter
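The second approach (a program that implements the state diagram directly) might look like the following Python sketch. The character classes `LETTER`, `DIGIT` and `OTHER`, the token names `IDENT`, `INT_LIT` and `OTHER`, and the function names are assumptions made for this example.

```python
def char_class(ch):
    """Collapse individual characters into the classes used by the diagram."""
    if ch.isalpha():
        return "LETTER"
    if ch.isdigit():
        return "DIGIT"
    return "OTHER"

def lex(source):
    """Return (token, lexeme) pairs by following a small state diagram."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                     # start state: skip whitespace
            i += 1
            continue
        start = i
        cls = char_class(ch)
        if cls == "LETTER":                  # identifier state
            while i < len(source) and char_class(source[i]) in ("LETTER", "DIGIT"):
                i += 1
            tokens.append(("IDENT", source[start:i]))
        elif cls == "DIGIT":                 # integer literal state
            while i < len(source) and char_class(source[i]) == "DIGIT":
                i += 1
            tokens.append(("INT_LIT", source[start:i]))
        else:                                # single-character lexeme
            i += 1
            tokens.append(("OTHER", ch))
    return tokens

print(lex("sum + 47"))
```

Because transitions are taken on character classes rather than on individual characters, the diagram needs only a handful of states and edges.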

1.9 Recursive-Descent Parsing

In computer science, a recursive descent parser is a kind of top-down parser built from a set of mutually recursive procedures, where each such procedure usually implements one of the productions of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.

Recursive descent parsing is a top-down method of syntax analysis in which a set of recursive procedures is executed to process the input.

A procedure is associated with each nonterminal of the grammar.

Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string.

Equivalently, it attempts to construct a parse tree for the input, starting from the root and creating the nodes of the parse tree in preorder.

In its general form, recursive descent parsing may involve backtracking.

Recursive descent is a top-down parsing technique that constructs the parse tree from the top while the input is read from left to right. It uses a procedure for every nonterminal and matches terminals directly against the input. This technique recursively parses the input to build a parse tree, which may or may not require backtracking. If the associated grammar is not left-factored, backtracking may be unavoidable. A form of recursive-descent parsing that does not require any backtracking is known as predictive parsing.

This parsing technique is called recursive because it uses a context-free grammar, which is recursive in nature.

Back-tracking

Top-down parsers start from the root node (start symbol) and match the input string against the production rules, replacing nonterminals when a rule matches. To understand this, take the following example of a CFG:

S → rXd | rZd

X → oa | ea

Z → ai

For the input string "read", a top-down parser will behave like this:

It starts with S from the production rules and matches its yield against the leftmost letter of the input, i.e. ‘r’. The very first production of S (S → rXd) matches it, so the parser advances to the next input letter (i.e. ‘e’). The parser tries to expand the nonterminal ‘X’ and checks its first production from the left (X → oa). It does not match the next input symbol, so the parser backtracks and tries the next production rule of X (X → ea).

Now the parser matches all the input letters in order, and the string is accepted.
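The walk-through above can be reproduced with a short backtracking parser. This Python sketch encodes the example grammar as a dictionary and uses generators so that a failed alternative simply falls through to the next one; the names `match`, `match_seq` and `accepts`, and the trace format, are illustrative assumptions.

```python
GRAMMAR = {
    "S": [["r", "X", "d"], ["r", "Z", "d"]],
    "X": [["o", "a"], ["e", "a"]],
    "Z": [["a", "i"]],
}

def match(symbol, text, pos, trace):
    """Yield every input position reachable after matching `symbol` at `pos`."""
    if symbol not in GRAMMAR:                # terminal: compare with the input
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for rhs in GRAMMAR[symbol]:              # nonterminal: try each alternative
        trace.append(f"{symbol} -> {''.join(rhs)} at {pos}")
        yield from match_seq(rhs, text, pos, trace)

def match_seq(symbols, text, pos, trace):
    """Match a sequence of symbols one after another, backtracking on failure."""
    if not symbols:
        yield pos
        return
    for mid in match(symbols[0], text, pos, trace):
        yield from match_seq(symbols[1:], text, mid, trace)

def accepts(text):
    trace = []
    ok = any(end == len(text) for end in match("S", text, 0, trace))
    return ok, trace

ok, trace = accepts("read")
print(ok)
for line in trace:
    print(line)
```

The trace lists the productions in the order they are tried: X → oa is attempted and abandoned before X → ea succeeds.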

1.10 Bottom-Up Parsing

Bottom-up parsing starts from the leaf nodes of a tree and works upward until it reaches the root node. Here, we start from a sentence and then apply production rules in reverse in order to reach the start symbol.

Shift-Reduce Parsing

Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are known as shift-step and reduce-step.

Shift step : The shift step refers to the advancement of the input pointer to the next input symbol, which is called the shifted symbol. This symbol is pushed onto the stack. The shifted symbol is treated as a single node of the parse tree.

Reduce step : When the parser finds a complete right-hand side (RHS) of a grammar rule on the stack and replaces it with the rule's left-hand side (LHS), it is known as a reduce step. This occurs when the top of the stack contains a handle. To reduce, a POP operation is performed on the stack, which pops off the handle and replaces it with the LHS nonterminal symbol.
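The two steps can be seen in a deliberately naive Python sketch. It assumes a tiny hypothetical grammar (E → E + T | T, T → id) and finds handles by simply checking whether the top of the stack equals some right-hand side, longest first. That shortcut happens to work for this grammar but is not a general handle-finding method; real parsers use LR tables for that decision.

```python
# Hypothetical grammar, listed with the longest right-hand side first.
RULES = [
    ("E", ["E", "+", "T"]),
    ("E", ["T"]),
    ("T", ["id"]),
]

def shift_reduce(tokens):
    """Return (accepted, actions) for a list of terminal symbols."""
    stack, actions = [], []
    tokens = list(tokens)
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:     # handle on top of the stack
                del stack[-len(rhs):]        # pop the handle
                stack.append(lhs)            # push the LHS nonterminal
                actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:                                # no reduction possible
            if not tokens:
                break
            stack.append(tokens.pop(0))      # shift the next input symbol
            actions.append(f"shift {stack[-1]}")
    return stack == ["E"], actions

accepted, actions = shift_reduce(["id", "+", "id"])
print(accepted)
for action in actions:
    print(action)
```

The input is accepted when it has been consumed and only the start symbol remains on the stack.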

LR Parser

The LR parser is a non-recursive, shift-reduce, bottom-up parser. It handles a wide class of context-free grammars, which makes it an efficient and widely used syntax analysis technique. LR parsers are also known as LR(k) parsers, where L stands for left-to-right scanning of the input stream, R stands for the construction of a rightmost derivation in reverse, and k denotes the number of lookahead symbols used to make decisions.

There are three widely used algorithms available for constructing an LR parser:

  • SLR(1) – Simple LR parser:
      o Works on the smallest class of grammars
      o Few states, hence a very small table
      o Simple and fast construction
  • LR(1) – LR parser:
      o Works on the complete set of LR(1) grammars
      o Generates a large table and a large number of states
      o Slow construction
  • LALR(1) – Look-Ahead LR parser:
      o Works on an intermediate-size class of grammars
      o The number of states is the same as in SLR(1)

LR Parsing Algorithm

Here we describe a skeleton algorithm of an LR parser:

token = next_token()
repeat forever
    s = top of stack

    if action[s, token] = "shift si" then
        PUSH token
        PUSH si
        token = next_token()

    else if action[s, token] = "reduce A ::= β" then
        POP 2 * |β| symbols
        s = top of stack
        PUSH A
        PUSH goto[s, A]

    else if action[s, token] = "accept" then
        return

    else
        error()
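The skeleton can be made runnable once concrete tables are supplied. The Python sketch below assumes a tiny hypothetical grammar (E → E + n | n) and hand-built SLR(1) `ACTION` and `GOTO` tables for it; the table encoding and the function name `lr_parse` are choices made for this example. As in the skeleton, the stack alternates symbols and states, so a reduction pops 2 * |β| entries.

```python
# Hand-built SLR(1) tables for the hypothetical grammar:
#   (1) E -> E + n      (2) E -> n
# A reduce entry stores the LHS and the length of the RHS.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept",),
    (2, "+"): ("reduce", "E", 1),
    (2, "$"): ("reduce", "E", 1),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", "E", 3),
    (4, "$"): ("reduce", "E", 3),
}
GOTO = {(0, "E"): 1}

def lr_parse(tokens):
    """Return True if the token list is a sentence of the grammar."""
    stream = iter(list(tokens) + ["$"])      # "$" marks the end of input
    token = next(stream)
    stack = [0]                              # start state
    while True:
        s = stack[-1]
        act = ACTION.get((s, token))
        if act is None:                      # empty table entry: syntax error
            return False
        if act[0] == "shift":
            stack += [token, act[1]]         # PUSH token, PUSH state
            token = next(stream)
        elif act[0] == "reduce":
            _, lhs, rhs_len = act
            del stack[-2 * rhs_len:]         # POP 2 * |beta| entries
            s = stack[-1]
            stack += [lhs, GOTO[(s, lhs)]]   # PUSH A, PUSH goto[s, A]
        else:                                # accept
            return True

print(lr_parse(["n", "+", "n"]))
print(lr_parse(["n", "+"]))
```

Only the tables depend on the grammar; the driver loop is the same for SLR(1), LALR(1) and LR(1) parsers.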