Generating Compilers with Coco/R: A Comprehensive Guide, Slides of Compiler Construction

An in-depth exploration of using Coco/R to generate compilers. Topics covered include error handling, LL(1) conflicts, and case studies. It discusses syntax error handling, syntax error recovery, semantic error handling, and the Errors class, and explains terminal start symbols and successors, the LL(1) condition, and methods to remove LL(1) conflicts and hidden LL(1) conflicts.

Typology: Slides

2011/2012

Uploaded on 07/11/2012

dhansukh

Semantic Actions

Arbitrary C# code between (. and .)

IdentList            (. int n; .)
= ident              (. n = 1; .)
  { ',' ident        (. n++; .)
  }                  (. Console.WriteLine(n); .)
.

(. int n; .) is a local semantic declaration; the other (. ... .) parts are semantic actions.

Semantic actions are copied to the generated parser without being checked by Coco/R

Global semantic declarations

using System.IO;
COMPILER Sample
Stream s;
void OpenStream(string path) {
    s = File.OpenRead(path);
    ...
}
...
PRODUCTIONS
Sample = ...         (. OpenStream("in.txt"); .)
...
END Sample.

using System.IO; is an import of namespaces. The declarations between COMPILER Sample and PRODUCTIONS are global semantic declarations (they become fields and methods of the parser). Semantic actions can access global declarations as well as imported classes.
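To make the role of the semantic actions concrete, here is a hand-written sketch of the kind of method a recursive descent parser could contain for the IdentList production. It is not the code Coco/R emits: the class name IdentListSketch, the string-based tokens, the helper ExpectIdent and the property LastCount are assumptions made for illustration only.

```csharp
using System;
using System.Collections.Generic;

// Sketch only: tokens are plain strings, and every token other than ","
// is treated as an ident (a simplification, not Coco/R's scanner).
public class IdentListSketch {
    private readonly List<string> tokens;
    private int pos = 0;

    // Hypothetical property so the result can be inspected from outside.
    public int LastCount { get; private set; }

    public IdentListSketch(IEnumerable<string> input) {
        tokens = new List<string>(input);
    }

    // Lookahead token, or "<eof>" when the input is exhausted.
    private string La => pos < tokens.Count ? tokens[pos] : "<eof>";

    private void ExpectIdent() {
        if (La == "," || La == "<eof>")
            throw new InvalidOperationException("ident expected");
        pos++;
    }

    // IdentList = ident { ',' ident } .
    public void IdentList() {
        int n;                    // (. int n; .)  local semantic declaration
        ExpectIdent();
        n = 1;                    // (. n = 1; .)
        while (La == ",") {       // { ',' ident ... }
            pos++;                // consume ','
            ExpectIdent();
            n++;                  // (. n++; .)
        }
        Console.WriteLine(n);     // (. Console.WriteLine(n); .)
        LastCount = n;
    }
}
```

Calling new IdentListSketch(new[] { "a", ",", "b", ",", "c" }).IdentList() walks the three identifiers; the point is that each semantic action ends up at exactly the place in the method where the corresponding grammar symbol is parsed.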

Attributes

For nonterminal symbols

input attributes: pass values from the "caller" to a production

... = ... IdentList<type> ...      (actual attributes)
IdentList<Type t> = ...            (formal attributes)

output attributes: pass results of a production to the "caller"

... = ... Expr<out n> ...          (actual attributes)
Expr<out int val> = ...            (formal attributes)

... = ... List<ref b> ...
List<ref StringBuilder buf> = ...

For terminal symbols

no explicit attributes; values are returned by the scanner

adapter nonterminals necessary

Number<out int n> = number (. n = Convert.ToInt32(t.val); .).

Ident<out string name> = ident (. name = t.val; .).

Parser has two global token variables

Token t;    // most recently recognized token
Token la;   // lookahead token (not yet recognized)
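The adapter nonterminals can be pictured as parsing methods whose attributes are ordinary C# parameters. The sketch below assumes the productions carry output attributes written in Coco/R's angle-bracket notation (Number<out int n>, Ident<out string name>); the classes Token and AttributeSketch are simplified stand-ins, not the generated code.

```csharp
using System;

// Simplified stand-in for the scanner's token type (assumption).
public class Token {
    public string val = "";
}

public class AttributeSketch {
    public Token t = new Token();   // most recently recognized token

    // Number<out int n> = number (. n = Convert.ToInt32(t.val); .) .
    // An output attribute becomes an "out" parameter.
    public void Number(out int n) {
        // A real parser would check and consume the 'number' token first.
        n = Convert.ToInt32(t.val);
    }

    // Ident<out string name> = ident (. name = t.val; .) .
    public void Ident(out string name) {
        // A real parser would check and consume the 'ident' token first.
        name = t.val;
    }
}
```

With t.val set to "42", a call such as Number(out int n) hands the converted value back to the caller, which is what the terminal symbol itself cannot do because terminals have no explicit attributes.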

Frame Files

Coco/R reads Sample.atg (scanner spec and parser spec) together with Scanner.frame and Parser.frame and generates Scanner.cs and Parser.cs.

Scanner.frame snippet

public class Scanner {
    const char EOL = '\n';
    const int eofSym = 0;
-->declarations
    ...
    public Scanner (Stream s) {
        buffer = new Buffer(s, true);
        Init();
    }
    void Init () {
        pos = -1; line = 1; …
-->initialization
        ...
    }

  • Coco/R inserts generated parts at positions marked by "-->..."

  • Users can edit the frame files for adapting the generated scanner and parser to their needs

  • Frame files are expected to be in the same directory as the compiler specification (e.g. Sample.atg)

Interface of the Generated Parser

public class Parser {
    public Scanner scanner;   // the scanner of this parser
    public Errors errors;     // the error message stream
    public Token t;           // most recently recognized token
    public Token la;          // lookahead token
    public Parser (Scanner scanner);
    public void Parse ();
    public void SemErr (string msg);
}

Parser invocation in the main program

public class MyCompiler {
    public static void Main (string[] arg) {
        Scanner scanner = new Scanner(arg[0]);
        Parser parser = new Parser(scanner);
        parser.Parse();
        Console.WriteLine(parser.errors.count + " errors detected");
    }
}

Syntax Error Handling

Syntax error messages are generated automatically

For invalid terminal symbols

production S = a b c.

input a x c

error message -- line ... col ...: b expected

For invalid alternative lists

production S = a (b | c | d) e.

input a x e

error message -- line ... col ...: invalid S

Error message can be improved by rewriting the production

productions S = a T e.

T = b | c | d.

input a x e

error message -- line ... col ...: invalid T
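The "b expected" message for the production S = a b c. can be reproduced with a small hand-written sketch. The class ExpectSketch, its string tokens and the Messages list are hypothetical; the error-distance counter that suppresses follow-up messages is an assumption of this sketch and only loosely mirrors what a generated parser does.

```csharp
using System;
using System.Collections.Generic;

public class ExpectSketch {
    private const int MinErrDist = 2;   // tokens to accept before reporting again
    private int errDist = MinErrDist;
    private readonly string[] input;
    private int pos = 0;

    // Collected error texts (hypothetical, instead of an Errors object).
    public readonly List<string> Messages = new List<string>();

    public ExpectSketch(params string[] input) { this.input = input; }

    private string La => pos < input.Length ? input[pos] : "EOF";

    private void Get() { pos++; errDist++; }

    // Report a syntax error unless another one was reported very recently.
    private void SynErr(string msg) {
        if (errDist >= MinErrDist) Messages.Add(msg);
        errDist = 0;
    }

    // Consume the expected terminal or report "<terminal> expected".
    private void Expect(string terminal) {
        if (La == terminal) Get();
        else SynErr(terminal + " expected");
    }

    // S = a b c.
    public void S() {
        Expect("a");
        Expect("b");
        Expect("c");
    }
}
```

For the input a x c the sketch records the single message "b expected"; the mismatch at c is suppressed because no token was accepted since the first error. No recovery is attempted here, which is the topic of the next slide.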

Syntax Error Recovery

The user must specify synchronization points where the parser should recover

Statement =
  SYNC
  ( Designator "=" Expr SYNC ';'
  | "if" '(' Expression ')' Statement ["else" Statement]
  | "while" '(' Expression ')' Statement
  | '{' {Statement} '}'
  | ...
  ).

The two SYNC symbols are the synchronization points.

What are good synchronization points?

Locations in the grammar where particularly "safe" tokens are expected

  • start of a statement: if, while, do, ...
  • start of a declaration: public, static, void, ...
  • in front of a semicolon

What happens if an error is detected?

  • parser reports the error
  • parser continues to the next synchronization point
  • parser skips input symbols until it finds one that is expected at the synchronization point

while (la.kind is not accepted here) { la = scanner.Scan(); }
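The skipping step at a synchronization point can be sketched as follows. SyncSketch, its string tokens and the method SkipTo are hypothetical names; the set passed in stands for the tokens that are expected at the synchronization point, and end of input is always accepted so that the loop terminates.

```csharp
using System;
using System.Collections.Generic;

public class SyncSketch {
    private readonly string[] input;
    private int pos = 0;

    public SyncSketch(params string[] input) { this.input = input; }

    // Lookahead token, or "EOF" when the input is exhausted.
    public string La => pos < input.Length ? input[pos] : "EOF";

    // Skip tokens until the lookahead is accepted at the synchronization
    // point (or the input ends). Returns the number of skipped tokens.
    public int SkipTo(ISet<string> expectedAtSync) {
        int skipped = 0;
        while (La != "EOF" && !expectedAtSync.Contains(La)) {
            pos++;
            skipped++;
        }
        return skipped;
    }
}
```

With the input x + y ; if and the set { ";" }, three tokens are skipped and parsing can resume at the semicolon. Choosing tokens such as ";" or statement keywords keeps the skipped region short, which is why they make good synchronization points.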

Errors Class

Coco/R generates a class for error message reporting

public class Errors {
    public int count = 0;                                      // number of errors detected
    public TextWriter errorStream = Console.Out;               // error message stream
    public string errMsgFormat = "-- line {0} col {1}: {2}";   // 0=line, 1=column, 2=text

    // called by the programmer (via Parser.SemErr) to report semantic errors
    public void SemErr (int line, int col, string msg) {
        errorStream.WriteLine(errMsgFormat, line, col, msg);
        count++;
    }

    // called automatically by the parser to report syntax errors
    public void SynErr (int line, int col, int n) {
        string msg;
        switch (n) {
            case 0: msg = "..."; break;   // syntax error messages generated by Coco/R
            case 1: msg = "..."; break;
            ...
        }
        errorStream.WriteLine(errMsgFormat, line, col, msg);
        count++;
    }
}
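The effect of errMsgFormat can be tried out in isolation. The helper below is a hypothetical sketch (ErrorFormatSketch is not part of Coco/R); it only uses the standard TextWriter formatting that the Errors class relies on, writing into a StringWriter instead of Console.Out.

```csharp
using System;
using System.IO;

public static class ErrorFormatSketch {
    // Formats one message the same way Errors.SemErr and Errors.SynErr do:
    // placeholder {0} is the line, {1} the column, {2} the message text.
    public static string Format(string errMsgFormat, int line, int col, string msg) {
        var writer = new StringWriter();
        writer.Write(errMsgFormat, line, col, msg);
        return writer.ToString();
    }
}
```

For example, Format("-- line {0} col {1}: {2}", 3, 7, "b expected") yields "-- line 3 col 7: b expected". Because errorStream is a TextWriter, assigning a StringWriter or a StreamWriter to it is one way to collect messages somewhere other than the console.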

Generating Compilers with Coco/R

1. Compilers

2. Grammars

3. Coco/R Overview

4. Scanner Specification

5. Parser Specification

6. Error Handling

7. LL(1) Conflicts

8. Case Study

Terminal Successors of Nonterminals

Those terminal symbols that can follow a nonterminal in the grammar

Expr = ["+" | "-"] Term {("+" | "-") Term}.
Term = Factor {("*" | "/") Factor}.
Factor = ident | number | "(" Expr ")".

Where does Expr occur on the right-hand side of a production? What terminal symbols can follow there?

Follow(Expr) = ")", eof

Follow(Term) = "+", "-", Follow(Expr)
             = "+", "-", ")", eof

Follow(Factor) = "*", "/", Follow(Term)
               = "*", "/", "+", "-", ")", eof
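The Follow sets above can also be computed mechanically. The sketch below uses the usual fixed-point computation of First and Follow on a plain BNF version of the grammar; the helper nonterminals Sign, ExprRest and TermRest and the class FollowSketch are introduced only for this illustration and do not appear in the slides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FollowSketch {
    // BNF form of the Expr grammar; an empty array is the empty alternative.
    public static readonly Dictionary<string, string[][]> Grammar =
        new Dictionary<string, string[][]> {
            ["Expr"]     = new[] { new[] { "Sign", "Term", "ExprRest" } },
            ["Sign"]     = new[] { new[] { "+" }, new[] { "-" }, new string[0] },
            ["ExprRest"] = new[] { new[] { "+", "Term", "ExprRest" },
                                   new[] { "-", "Term", "ExprRest" }, new string[0] },
            ["Term"]     = new[] { new[] { "Factor", "TermRest" } },
            ["TermRest"] = new[] { new[] { "*", "Factor", "TermRest" },
                                   new[] { "/", "Factor", "TermRest" }, new string[0] },
            ["Factor"]   = new[] { new[] { "ident" }, new[] { "number" },
                                   new[] { "(", "Expr", ")" } },
        };

    const string Eps = "";   // marks "can derive the empty string" in First sets

    static bool IsNonterminal(string s) => Grammar.ContainsKey(s);

    // First set of a symbol sequence, given the First sets computed so far.
    static HashSet<string> FirstOfSeq(IEnumerable<string> seq,
                                      Dictionary<string, HashSet<string>> first) {
        var result = new HashSet<string>();
        foreach (var sym in seq) {
            if (!IsNonterminal(sym)) { result.Add(sym); return result; }
            result.UnionWith(first[sym].Where(x => x != Eps));
            if (!first[sym].Contains(Eps)) return result;
        }
        result.Add(Eps);     // every symbol of the sequence can vanish
        return result;
    }

    public static Dictionary<string, HashSet<string>> ComputeFollow(string start) {
        var first = Grammar.Keys.ToDictionary(k => k, k => new HashSet<string>());
        var follow = Grammar.Keys.ToDictionary(k => k, k => new HashSet<string>());
        follow[start].Add("eof");

        bool changed = true;
        while (changed) {    // fixed point for First
            changed = false;
            foreach (var p in Grammar)
                foreach (var alt in p.Value) {
                    int before = first[p.Key].Count;
                    first[p.Key].UnionWith(FirstOfSeq(alt, first));
                    if (first[p.Key].Count != before) changed = true;
                }
        }

        changed = true;
        while (changed) {    // fixed point for Follow
            changed = false;
            foreach (var p in Grammar)
                foreach (var alt in p.Value)
                    for (int i = 0; i < alt.Length; i++) {
                        if (!IsNonterminal(alt[i])) continue;
                        var rest = FirstOfSeq(alt.Skip(i + 1), first);
                        int before = follow[alt[i]].Count;
                        follow[alt[i]].UnionWith(rest.Where(x => x != Eps));
                        if (rest.Contains(Eps))
                            follow[alt[i]].UnionWith(follow[p.Key].ToList());
                        if (follow[alt[i]].Count != before) changed = true;
                    }
        }
        return follow;
    }
}
```

Calling FollowSketch.ComputeFollow("Expr") and looking up "Expr", "Term" and "Factor" gives the same three sets as derived by hand above. The entries for the helper nonterminals are a by-product of the BNF rewriting and can be ignored.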

LL(1) Condition

For recursive descent parsing a grammar must be LL(1)

(parseable from Left to right with Leftmost (left-canonical) derivations and 1 lookahead symbol)

Definition

1. A grammar is LL(1) if all its productions are LL(1).

2. A production is LL(1) if all its alternatives start with different terminal symbols

S = a b | c.
LL(1): First(a b) = {a}, First(c) = {c}

S = a b | T.
T = [a] c.
not LL(1): First(a b) = {a}, First(T) = {a, c}

In other words

The parser must always be able to select one of the alternatives by looking at the lookahead token.

S = (a b | T).

if the parser sees an "a" here it cannot decide which alternative to select
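The condition boils down to checking that the First sets of the alternatives are pairwise disjoint. A minimal sketch of that check, assuming the First sets are already known and given as arrays of terminal names (the class Ll1CheckSketch and the method Conflicts are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Ll1CheckSketch {
    // Returns the terminals with which more than one alternative can start.
    // An empty result means the alternatives satisfy the LL(1) condition.
    public static SortedSet<string> Conflicts(params string[][] firstSets) {
        var seen = new HashSet<string>();
        var conflicts = new SortedSet<string>(StringComparer.Ordinal);
        foreach (var first in firstSets)
            foreach (var terminal in first.Distinct())
                if (!seen.Add(terminal)) conflicts.Add(terminal);
        return conflicts;
    }
}
```

For S = a b | c. the call Conflicts(new[] { "a" }, new[] { "c" }) returns an empty set, while for S = a b | T. with First(T) = {a, c} the call Conflicts(new[] { "a" }, new[] { "a", "c" }) returns { a }, the token on which the parser cannot decide.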

How to Remove Left Recursion

Left recursion is always an LL(1) conflict and must be eliminated

For example

IdentList = ident | IdentList "," ident.

(both alternatives start with ident)

generates the following phrases

IdentList
ident | IdentList "," ident
ident "," ident | IdentList "," ident "," ident
ident "," ident "," ident | IdentList "," ident "," ident "," ident

can always be replaced by iteration

IdentList = ident {"," ident}.

Hidden LL(1) Conflicts

EBNF options and iterations are hidden alternatives

S = [a] b.  ⇔  S = a b | b.

S = {a} b.  ⇔  S = b | a b | a a b | ....

a and b are arbitrary EBNF expressions

Rules

S = [a] b.    First(a) ∩ First(b) must be {}

S = {a} b.    First(a) ∩ First(b) must be {}

S = a [b].    First(b) ∩ Follow(S) must be {}

S = a {b}.    First(b) ∩ Follow(S) must be {}

Dangling Else

If statement in C# or Java

Statement = "if" "(" Expr ")" Statement ["else" Statement] | ....

This is an LL(1) conflict!

First("else" Statement) ∩ Follow(Statement) = {"else"}

It is even an ambiguity which cannot be removed

if (expr1) if (expr2) stat1; else stat2;

We can build 2 different syntax trees: one in which the "else" belongs to the inner "if" and one in which it belongs to the outer "if".

Can We Ignore LL(1) Conflicts?

An LL(1) conflict is only a warning

The parser selects the first matching alternative

S = a b c | a d.

if the lookahead token is a, the parser selects the first alternative (a b c)

Example: Dangling Else

Statement = "if" "(" Expr ")" Statement [ "else" Statement ] | ....

If the lookahead token is "else" after the inner Statement, the parser starts parsing the option; i.e. the "else" belongs to the innermost "if".

if (expr1) if (expr2) stat1; else stat2;

Luckily this is what we want here.
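The greedy choice can be observed in a small hand-written recursive descent sketch. To keep it short, conditions and simple statements are single string tokens and parentheses and semicolons are left out; the class DanglingElseSketch and the bracketed output format are assumptions for illustration, not Coco/R output.

```csharp
using System;

public class DanglingElseSketch {
    private readonly string[] tokens;
    private int pos = 0;

    public DanglingElseSketch(params string[] tokens) { this.tokens = tokens; }

    // Lookahead token, or "EOF" when the input is exhausted.
    private string La => pos < tokens.Length ? tokens[pos] : "EOF";

    // Consumes and returns the current token (no bounds check in this sketch).
    private string Next() => tokens[pos++];

    // Statement = "if" cond Statement ["else" Statement] | stat.
    // Returns a bracketed string that shows the resulting structure.
    public string Statement() {
        if (La != "if") return Next();       // simple statement
        Next();                              // consume "if"
        string cond = Next();
        string thenPart = Statement();
        if (La == "else") {                  // the option is entered whenever possible
            Next();                          // consume "else"
            string elsePart = Statement();
            return "if(" + cond + " then " + thenPart + " else " + elsePart + ")";
        }
        return "if(" + cond + " then " + thenPart + ")";
    }
}
```

For the tokens if expr1 if expr2 stat1 else stat2 the method returns "if(expr1 then if(expr2 then stat1 else stat2))": the inner call sees "else" first and takes it, so the "else" ends up attached to the innermost "if".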