Compiler Design: Bottom-Up Parsing & Intermediate Code, Summaries of Compiler Design

Typology: Summaries

2022/2023

Uploaded on 05/11/2023

ekaraj

COMPILER DESIGN

LECTURE NOTES

ON

COMPILER DESIGN

Prepared by

Dr. Subasish Mohapatra

Department of Computer Science and Application

College of Engineering and Technology, Bhubaneswar

Biju Patnaik University of Technology, Odisha

CONTENTS

Lecture- 1 Introduction to compiler & its phases

Lecture- 2 Overview of language processing system

Lecture- 3 Phases of a Compiler

Lecture- 4 Languages

Lecture- 5 Converting RE to NFA (Thomson Construction)

Lecture- 6 Lexical Analysis

Lecture- 7 Lexical Analyzer Generator

Lecture- 8 Basics of Syntax Analysis

Lecture- 9 Context-Free Grammar

Lecture- 10 Left Recursion

Lecture- 11 YACC

Lecture- 12 Top-down Parsing

Lecture- 13 Recursive Predictive Parsing

Lecture- 14 Non-recursive Predictive Parsing-LL(1)

Lecture- 15 LL(1) Grammar

Lecture- 16 Basics of Bottom-up parsing

Lecture- 17 Conflicts during shift-reduce parsing

Lecture- 18 Operator precedence parsing

Lecture- 19 LR Parsing

Lecture- 20 Construction of SLR parsing table

Lecture- 21 Construction of canonical LR(0) collection

Lecture- 22 Shift-Reduce & Reduce-Reduce conflicts

Lecture- 23 Construction of canonical LR(1) collection

Lecture- 24 Construction of LALR parsing table

Lecture- 25 Using ambiguous grammars

Lecture- 26 SYNTAX-DIRECTED TRANSLATION

Lecture- 27 Translation of Assignment Statements

Lecture- 28 Generating 3-address code for Numerical Representation of Boolean expressions

Lecture- 29 Statements that Alter Flow of Control

Lecture- 30 Postfix Translations

Lecture- 31 Array references in arithmetic expressions

Lecture- 32 SYMBOL TABLES

Lecture- 33 Intermediate Code Generation

Lecture- 34 Directed Acyclic Graph

Lecture- 35 Flow of control statements with Jump method

Lecture- 36 Backpatching

Lecture- 37 RUN TIME ADMINISTRATION

Lecture- 38 Storage Organization

Lecture- 39 ERROR DETECTION AND RECOVERY

Lecture- 40 Error Recovery in Predictive Parsing

Lecture- 41 CODE OPTIMIZATION

Lecture- 42 Local Optimizations

Module- 1

Lecture

INTRODUCTION TO COMPILERS AND ITS PHASES

A compiler is a program that takes a program written in a source language and translates it into an equivalent program in a target language. The source language is typically a high-level language and the target language is typically machine language.

Source program -> COMPILER -> Target program

Necessity of compiler

  • Techniques used in a lexical analyzer can be used in text editors, information retrieval systems, and pattern recognition programs.
  • Techniques used in a parser can be used in a query processing system such as SQL.
  • Much software with a complex front end may need techniques used in compiler design.
  • A symbolic equation solver takes an equation as input, so that program must parse the given input equation.
  • Most of the techniques used in compiler design can be used in Natural Language Processing (NLP) systems.

Properties of Compiler

a) Correctness
   i) Correct output in execution.
   ii) It should report errors.
   iii) Correctly report if the programmer is not following the language syntax.
b) Efficiency
c) Compile time and execution.
d) Debugging / Usability.

Compiler vs. Interpreter

Compiler:
  1. It translates the whole program at a time.
  2. A compiler is faster.
  3. Debugging is not easy.
  4. Compilers are not portable.

Interpreter:
  1. It translates statement by statement.
  2. An interpreter is slower.
  3. Debugging is easy.
  4. Interpreters are portable.

Types of compiler

1) Native code compiler
A compiler may produce binary output to run on the same computer and operating system on which it runs. This type of compiler is called a native code compiler.

2) Cross compiler
A cross compiler is a compiler that runs on one machine and produces object code for another machine.

3) Bootstrap compiler
If a compiler has been implemented in its own language, it is called a bootstrap (self-hosting) compiler.

Lecture

OVERVIEW OF LANGUAGE PROCESSING SYSTEM

A source program may be divided into modules stored in separate files. The preprocessor collects these separate files into the source program.

A preprocessor produces input to compilers. It may perform the following functions.

1. Macro processing: A preprocessor may allow a user to define macros that are shorthands for longer constructs.

2. File inclusion: A preprocessor may include header files into the program text.

3. Rational preprocessor: These preprocessors augment older languages with more modern flow-of-control and data-structuring facilities.

4. Language extensions: These preprocessors attempt to add capabilities to the language by what amounts to built-in macros.

ASSEMBLER

Programmers found it difficult to write or read programs in machine language. They began to use a mnemonic (symbol) for each machine instruction, which they would subsequently translate into machine language. Such a mnemonic machine language is now called an assembly language. Programs known as assemblers were written to automate the translation of assembly language into machine language. The input to an assembler is called the source program; the output is a machine language translation (the object program).

INTERPRETER

An interpreter is a program that appears to execute a source program as if it were machine language.

Languages such as BASIC, SNOBOL, and LISP can be translated using interpreters. Java also uses an interpreter. The process of interpretation can be carried out in the following phases.

1. Lexical analysis

2. Syntax analysis

3. Semantic analysis

4. Direct execution

Advantages

  • Modifications to the user program can be easily made and implemented as execution proceeds.
  • The type of object that a variable denotes may change dynamically.
  • Debugging a program and finding errors is simpler for a program that is interpreted.
  • The interpreter for the language makes it machine independent.

Disadvantages

  • The execution of the program is slower.
  • Memory consumption is higher.

Loader and Linker

Once the assembler produces an object program, that program must be placed into memory and executed. The assembler could place the object program directly in memory and transfer control to it, thereby causing the machine language program to be executed. This would waste core (main memory) by leaving the assembler in memory while the user's program was being executed. Also, the programmer would have to retranslate the program with each execution, thus wasting translation time. To overcome these problems of wasted translation time and memory, system programmers developed another component called the loader.

"A loader is a program that places programs into memory and prepares them for execution." It would be more efficient if subroutines could be translated into an object form that the loader could "relocate" directly behind the user's program. The task of adjusting programs so they may be placed in arbitrary core locations is called relocation. Relocation loaders perform four functions.

Lecture

Phases of a Compiler

Each phase transforms the source program from one representation into another. The phases communicate with the error handler and the symbol table.

Lexical Analyzer

  • The lexical analyzer reads the source program character by character and returns the tokens of the source program.
  • A token describes a pattern of characters having the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters and so on).

Example: In the line of code newval := oldval + 12, the tokens are:

    newval    identifier
    :=        assignment operator
    oldval    identifier
    +         add operator
    12        a number

  • The lexical analyzer puts information about identifiers into the symbol table.
  • Regular expressions are used to describe tokens (lexical constructs).
  • A (deterministic) finite state automaton can be used in the implementation of a lexical analyzer.

Syntax Analyzer

  • A syntax analyzer creates the syntactic structure (generally a parse tree) of the given program.
  • A syntax analyzer is also called a parser.
  • A parse tree describes a syntactic structure.

Example: For the line of code newval := oldval + 12, the parse tree will be:

                 assignment
               /     |      \
      identifier    :=    expression
          |              /    |     \
        newval  expression    +    expression
                    |                  |
                identifier           number
                    |                  |
                  oldval               12

  • The syntax of a language is specified by a context-free grammar (CFG).
  • The rules in a CFG are mostly recursive.
  • A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not.
  • If it does, the syntax analyzer creates a parse tree for the given program.

Example: The CFG used for the above parse tree is:

    assignment -> identifier := expression
    expression -> identifier
    expression -> number
    expression -> expression + expression

  • Depending on how the parse tree is created, there are different parsing techniques.
  • These parsing techniques are categorized into two groups: top-down parsing and bottom-up parsing.
  • Top-Down Parsing:
      • Construction of the parse tree starts at the root and proceeds towards the leaves.
      • Efficient top-down parsers can be easily constructed by hand.
      • Examples: Recursive Predictive Parsing, Non-Recursive Predictive Parsing (LL Parsing).
  • Bottom-Up Parsing:
      • Construction of the parse tree starts at the leaves and proceeds towards the root.
      • Normally, efficient bottom-up parsers are created with the help of software tools.
      • Bottom-up parsing is also known as shift-reduce parsing.
      • Operator-Precedence Parsing: simple, restrictive, easy to implement.
      • LR Parsing: a much more general form of shift-reduce parsing (LR, SLR, LALR).

Semantic Analyzer

  • A semantic analyzer checks the source program for semantic errors and collects the type information for code generation.
  • Type checking is an important part of the semantic analyzer.
  • Normally, semantic information cannot be represented by the context-free languages used in syntax analyzers.
  • Context-free grammars used in syntax analysis are integrated with attributes (semantic rules). The result is syntax-directed translation and attribute grammars.

Example: In the line of code newval := oldval + 12, the type of the identifier newval must match the type of the expression (oldval + 12).

Intermediate Code Generation

  • A compiler may produce explicit intermediate code representing the source program.
  • This intermediate code is generally machine-architecture independent, but its level is close to the level of machine code.
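As an illustration of what such intermediate code can look like, the following is a minimal Python sketch that emits three-address instructions for the running example newval := oldval + 12. The tuple-based tree shape, the function name gen_three_address, and the temporary names t1, t2, ... are assumptions made for this sketch; they are not taken from the notes.

```python
from itertools import count


def gen_three_address(assignment):
    """Return three-address instructions for a tree ('assign', target, expr).

    Expression nodes are either leaves (an identifier string or an int
    constant) or tuples of the form (operator, left, right). This tree
    layout is a hypothetical choice for the sketch.
    """
    code = []
    temps = count(1)  # generates t1, t2, ... (hypothetical temporary names)

    def emit_expr(node):
        # A leaf is already an operand and needs no instruction.
        if isinstance(node, (str, int)):
            return str(node)
        op, left, right = node
        left_operand = emit_expr(left)
        right_operand = emit_expr(right)
        temp = f"t{next(temps)}"
        # One operator and at most three addresses per instruction.
        code.append(f"{temp} := {left_operand} {op} {right_operand}")
        return temp

    _, target, expr = assignment
    code.append(f"{target} := {emit_expr(expr)}")
    return code


tree = ("assign", "newval", ("+", "oldval", 12))
for instruction in gen_three_address(tree):
    print(instruction)
```

Each emitted line has one operator and at most three addresses, which keeps the code independent of any particular machine while staying close to the shape of machine instructions.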

Phases of a compiler are the sub-tasks that must be performed to complete the compilation process. Passes refer to the number of times the compiler has to traverse through the entire program.

Symbol Table Management:

A symbol table is a data structure that contains a record for each identifier, with fields for the attributes of the identifier.

Identifiers are entered into the symbol table when the lexical analysis phase detects them; attributes such as the type are recorded as they become known.

position = initial + rate * 60;

    Address   Symbol     Location   Attributes
    1         position   1000       id, float
    2         initial    2000       id, float
    3         rate       3000       id, float
    4         60         4000       constant, int
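The way a lexical analyzer can populate such a table may be sketched in Python as follows. The token names, the regular expressions, and the record fields (index, kind) are assumptions chosen for this sketch. Storage locations and types are left out because a lexer alone cannot determine them.

```python
import re

# Token classes described by regular expressions (hypothetical names).
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r"[=+*]"),
    ("delimiter", r";"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Scan the source left to right and return (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":  # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def build_symbol_table(tokens):
    """Create one record per distinct identifier or constant, in order of appearance."""
    table = {}
    for kind, lexeme in tokens:
        if kind in ("identifier", "number") and lexeme not in table:
            table[lexeme] = {
                "index": len(table) + 1,
                "kind": "id" if kind == "identifier" else "constant",
            }
    return table


tokens = tokenize("position = initial + rate * 60;")
symbols = build_symbol_table(tokens)
for lexeme, record in symbols.items():
    print(record["index"], lexeme, record["kind"])
```

The records come out in the same order as the rows of the table above: position, initial, rate, and then the constant 60.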

Error Detection and Reporting:

Each phase can encounter errors. After detecting an error, a phase must deal with it so that compilation can proceed.

The following are some errors encountered in each phase:

i) Lexical Analyzer - Misspelled token.

ii) Syntax Analyzer - Missing parenthesis, too few operands.

iii) Semantic Analyzer - Type mismatch.

iv) Intermediate Code Generation - Incompatible operands for an operator.

v) Code Optimization - Unreachable statement.

vi) Code Generation - Memory restriction when storing a variable.

Lecture

Languages

Terminology

  • Alphabet: a finite set of symbols (for example, the ASCII characters)
  • String: a finite sequence of symbols over an alphabet
  • Sentence and word are also used to mean string
  • ε is the empty string
  • |s| is the length of string s
  • Language: a set of strings over some fixed alphabet
  • ∅, the empty set, is a language
  • {ε}, the set containing only the empty string, is a language
  • The set of all possible identifiers is a language
  • Operators on Strings:
      • Concatenation: xy represents the concatenation of strings x and y; sε = s and εs = s
      • s^n = s s s ... s (n times); s^0 = ε

Operations on Languages

  • Concatenation: L1L2 = { s1s2 | s1 ∈ L1 and s2 ∈ L2 }
  • Union: L1 ∪ L2 = { s | s ∈ L1 or s ∈ L2 }
  • Exponentiation: L^0 = {ε}, L^1 = L, L^2 = LL
  • Kleene Closure: L* = L^0 ∪ L^1 ∪ L^2 ∪ ... (the union of L^i for all i ≥ 0)
  • Positive Closure: L+ = L^1 ∪ L^2 ∪ L^3 ∪ ... (the union of L^i for all i ≥ 1)

Examples:

  • L1 = {a,b,c,d}, L2 = {1,2}
  • L1L2 = {a1,a2,b1,b2,c1,c2,d1,d2}
  • L1 ∪ L2 = {a,b,c,d,1,2}
  • L1^3 = all strings of length three (using a,b,c,d)
  • L1* = all strings using the letters a,b,c,d, including the empty string
  • L1+ = the same set without the empty string
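The operations on languages can be tried out on the finite examples above with a short Python sketch. The function names are assumptions for illustration, the empty Python string "" stands for ε, and because L* and L+ are infinite the closures are only computed up to a chosen maximum power.

```python
from itertools import product


def concat(l1, l2):
    """L1L2 = { s1s2 | s1 in L1 and s2 in L2 }."""
    return {s1 + s2 for s1, s2 in product(l1, l2)}


def power(lang, n):
    """L^n, built by repeated concatenation starting from L^0 = {ε}."""
    result = {""}  # "" plays the role of ε
    for _ in range(n):
        result = concat(result, lang)
    return result


def closure_up_to(lang, max_power, positive=False):
    """Finite approximation of L* (or of L+ when positive=True)."""
    start = 1 if positive else 0
    strings = set()
    for i in range(start, max_power + 1):
        strings |= power(lang, i)
    return strings


L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}

print(sorted(concat(L1, L2)))               # the eight strings a1 ... d2
print(sorted(L1 | L2))                      # union
print(len(power(L1, 3)))                    # number of strings of length three
print("" in closure_up_to(L1, 2))           # L* contains ε
print("" in closure_up_to(L1, 2, True))     # L+ does not, since ε is not in L1
```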

  • Both deterministic and non-deterministic finite automata recognize regular sets.
  • Which one should be used?
      • deterministic: faster recognizer, but it may take more space
      • non-deterministic: slower, but it may take less space
      • Deterministic automata are widely used in lexical analyzers.
  • First, we define regular expressions for tokens; then we convert them into a DFA to get a lexical analyzer for our tokens.

Non-Deterministic Finite Automaton (NFA)

  • A non-deterministic finite automaton (NFA) is a mathematical model that consists of:
      • S - a set of states
      • Σ - a set of input symbols (alphabet)
      • move - a transition function that maps state-symbol pairs to sets of states
      • s0 - a start (initial) state
      • F - a set of accepting states (final states)
  • ε-transitions are allowed in NFAs. In other words, we can move from one state to another without consuming any symbol.
  • An NFA accepts a string x if and only if there is a path from the start state to one of the accepting states such that the edge labels along this path spell out x.

Example (transition graph):

    0 is the start state s0
    {2} is the set of final states F
    Σ = {a,b}
    S = {0,1,2}

Transition function:

    State   a       b
    0       {0,1}   {0}
    1       {}      {2}
    2       {}      {}

The language recognized by this NFA is (a|b)*ab.
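The acceptance condition for this example NFA can be checked with a small Python sketch that tracks the set of all states the NFA could be in after each input symbol. The dictionary encoding and the name nfa_accepts are assumptions for illustration. Since this particular NFA has no ε-transitions, the sketch does not compute ε-closures.

```python
# Transition function of the example NFA for (a|b)*ab.
# Missing entries mean the empty set of next states.
NFA_MOVE = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
NFA_START = 0
NFA_ACCEPTING = {2}


def nfa_accepts(text):
    """Return True if some path labelled by text ends in an accepting state."""
    current = {NFA_START}
    for symbol in text:
        following = set()
        for state in current:
            following |= NFA_MOVE.get((state, symbol), set())
        current = following
    return bool(current & NFA_ACCEPTING)


for word in ["ab", "aab", "bab", "abb", "ba", ""]:
    print(repr(word), nfa_accepts(word))
```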

Deterministic Finite Automaton (DFA)

  • A deterministic finite automaton (DFA) is a special form of an NFA.
  • No state has an ε-transition.
  • For each symbol a and state s, there is at most one edge labeled a leaving s, i.e. the transition function maps a state-symbol pair to a state (not to a set of states).

Example: The DFA to recognize the language (a|b)*ab is as follows.

    0 is the start state s0
    {2} is the set of final states F
    Σ = {a,b}
    S = {0,1,2}

Transition function:

    State   a   b
    0       1   0
    1       1   2
    2       1   0

Note that the entries in this function are single values and not sets of values (unlike the NFA).
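Because every entry of a DFA's transition function is a single state, running the DFA is a simple table lookup per input symbol. A minimal Python sketch for the example DFA, with the nested-dictionary encoding and the name dfa_accepts chosen for illustration:

```python
# Transition function of the example DFA for (a|b)*ab: state -> symbol -> state.
DFA_MOVE = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
DFA_START = 0
DFA_ACCEPTING = {2}


def dfa_accepts(text):
    """Follow exactly one transition per symbol and test the final state."""
    state = DFA_START
    for symbol in text:
        if symbol not in DFA_MOVE[state]:
            return False  # symbol outside the alphabet {a, b}
        state = DFA_MOVE[state][symbol]
    return state in DFA_ACCEPTING


for word in ["ab", "aab", "bab", "abb", "ba", ""]:
    print(repr(word), dfa_accepts(word))
```

Only one current state has to be remembered here, instead of a set of states. This is why a DFA is the faster recognizer.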

Example: For the RE (a|b)*a, the NFA construction is shown in the figure.

[Figure: NFA constructed for the RE (a|b)*a]

Converting NFA to DFA (Subset Construction)

We merge together NFA states by looking at them from the point of view of the input characters:

  • From the point of view of the input, any two states that are connected by an ε-transition may as well be the same, since we can move from one to the other without consuming any character. Thus states which are connected by an ε-transition will be represented by the same state in the DFA.
  • If it is possible to have multiple transitions based on the same symbol, then we can regard a transition on a symbol as moving from a state to a set of states (i.e. the union of all those states reachable by a transition on the current symbol). Thus these states will be combined into a single DFA state.

To perform this operation, let us define two functions:

  • The ε-closure function takes a state and returns the set of states reachable from it based on zero or more ε-transitions. Note that this will always include the state itself. We should be able to get from a state to any state in its ε-closure without consuming any input.
  • The function move takes a state and a character, and returns the set of states reachable by one transition on this character.
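The two functions can be sketched in Python as follows. Both take a set of states, so a single state is passed as a one-element set. The small ε-NFA with states A, B, C, D is hypothetical and exists only to exercise the functions; it does not come from the notes.

```python
def epsilon_closure(states, eps):
    """States reachable from `states` using zero or more ε-transitions."""
    closure = set(states)      # zero ε-moves: every state reaches itself
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in eps.get(state, set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def move(states, symbol, delta):
    """States reachable from `states` by exactly one transition on `symbol`."""
    reachable = set()
    for state in states:
        reachable |= delta.get((state, symbol), set())
    return reachable


# Hypothetical ε-NFA: A -ε-> B -ε-> C, A -a-> A, C -a-> D.
EPS = {"A": {"B"}, "B": {"C"}}
DELTA = {("A", "a"): {"A"}, ("C", "a"): {"D"}}

start_closure = epsilon_closure({"A"}, EPS)
print(sorted(start_closure))
print(sorted(move(start_closure, "a", DELTA)))
```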

We can generalize both these functions to apply to sets of states by taking the union of the applications to individual states. For example, if A, B and C are states, move({A,B,C}, a) = move(A, a) ∪ move(B, a) ∪ move(C, a).

The Subset Construction Algorithm is as follows:

    put ε-closure({s0}) as an unmarked state into the set of DFA states (DS)
    while (there is an unmarked state S1 in DS) do
    begin
        mark S1
        for each input symbol a do
        begin
            S2 <- ε-closure(move(S1, a))
            if (S2 is not in DS) then
                add S2 into DS as an unmarked state
            transfunc[S1, a] <- S2
        end
    end
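A minimal Python sketch of the subset construction, applied to the NFA for (a|b)*ab given earlier (states 0, 1, 2, no ε-transitions). The data layout and function names are assumptions for illustration. DFA states are represented as frozensets of NFA states so they can be used as dictionary keys.

```python
def epsilon_closure(states, eps):
    """States reachable from `states` using zero or more ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in eps.get(state, set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def move(states, symbol, delta):
    """Union of the one-step transitions on `symbol` from each state."""
    reachable = set()
    for state in states:
        reachable |= delta.get((state, symbol), set())
    return reachable


def subset_construction(start, alphabet, delta, eps):
    """Return (DFA states in discovery order, DFA transition function)."""
    start_set = frozenset(epsilon_closure({start}, eps))
    dfa_states = [start_set]   # DS
    unmarked = [start_set]     # states of DS not yet processed
    transfunc = {}
    while unmarked:
        s1 = unmarked.pop()    # removing s1 from `unmarked` marks it
        for a in alphabet:
            s2 = frozenset(epsilon_closure(move(s1, a, delta), eps))
            if s2 not in dfa_states:
                dfa_states.append(s2)
                unmarked.append(s2)
            transfunc[(s1, a)] = s2
    return dfa_states, transfunc


# NFA for (a|b)*ab from the earlier example; it has no ε-transitions.
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
EPS = {}

states, trans = subset_construction(0, "ab", DELTA, EPS)
for (source, symbol), target in trans.items():
    print(sorted(source), symbol, "->", sorted(target))
```

The construction yields the three DFA states {0}, {0,1} and {0,2}. Renaming them 0, 1 and 2 gives the DFA transition table shown earlier for (a|b)*ab, with {0,2} as the accepting state because it contains the NFA's accepting state 2.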