PCAT Programming Language Reference Manual - Handout | CS 322, Study notes of Computer Science

Material Type: Notes; Professor: Li; Class: LANG COMPILER DESIGN; Subject: Computer Science; University: Portland State University; Term: Fall 2004;

The PCAT Programming Language

Reference Manual

Andrew Tolmach and Jingke Li

Dept. of Computer Science

Portland State University

(revised October 8, 2004)

1 Introduction

The PCAT language (Pascal Clone with an ATtitude) is a small imperative programming language with nested functions, record values with implicit pointers, arrays, integer and real variables, and a few simple structured control constructs.

This manual gives an informal specification for the language. Fragments of EBNF syntax are introduced at relevant points in the text; the complete grammar is given in Section 12.

2 Lexical Issues

PCAT’s character set is the standard 7-bit ASCII set. PCAT is case sensitive; upper- and lower-case letters are not considered equivalent.

Whitespace (blank, tab, or newline characters) serves to separate tokens; otherwise it is ignored. Whitespace is needed between two adjacent keywords or identifiers, or between a keyword or identifier and a number. However, no whitespace is required between a number and a keyword, since this causes no ambiguity. Delimiters and operators don’t need whitespace to separate them from their neighbors on either side. Whitespace may not appear in any token except a string (see below).

Comments are enclosed in the pair (* and *); they cannot be nested. Any character is legal in a comment. Of course, the first occurrence of the sequence of characters *) will terminate the comment. Comments may appear anywhere a token may appear; they are self-delimiting, i.e., they do not need to be separated from their surroundings by whitespace.
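To make the comment rule concrete, here is a minimal Python sketch of a pre-pass that blanks out comments. The function name `strip_comments` is hypothetical, and replacing each comment with a single space is an assumption (it keeps a comment acting as a token separator). The sketch also copies string literals unchanged, so a `(*` inside quotes is not mistaken for a comment opener.

```python
def strip_comments(src: str) -> str:
    """Replace every (* ... *) comment in PCAT source with one space.

    Comments do not nest: the first "*)" after an opening "(*" ends the
    comment. String literals are copied unchanged, since "(*" inside a
    string does not start a comment.
    """
    out: list[str] = []
    i = 0
    while i < len(src):
        if src.startswith("(*", i):
            end = src.find("*)", i + 2)  # the first closing sequence wins
            if end == -1:
                raise SyntaxError("unterminated comment")
            out.append(" ")  # assumption: a comment separates tokens
            i = end + 2
        elif src[i] == '"':
            end = src.find('"', i + 1)  # strings cannot contain a double quote
            if end == -1:
                raise SyntaxError("unterminated string")
            out.append(src[i:end + 1])
            i = end + 1
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

Because comments do not nest, `strip_comments("(* (* x *) *)")` returns `"  *)"`: the inner `*)` closes the comment and leaves a stray `*)` behind.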

2.1 Tokens

Tokens consist of keywords, literal constants, identifiers, operators, and delimiters. The following are reserved keywords. They must be written in upper case.

AND ARRAY BEGIN BY DIV DO ELSE ELSIF END EXIT FOR IF IS LOOP MOD NOT OF OR PROCEDURE PROGRAM READ RECORD RETURN THEN TO TYPE VAR WHILE WRITE

Literal constants are either integer, real, or string. Integers contain only digits; they must be in the range 0 to 2^31 − 1 (that is, 2147483647). Reals consist of one or more digits, followed by a decimal point, followed by zero or more digits. There is no specific range constraint on reals, but the literal as a whole is limited to 255 characters in length. Note that no numeric literal can be negative, since there is no provision for a minus sign. Strings begin and end with a double quote (") and contain any sequence of printable ASCII characters (i.e., having decimal character codes in the range 32–126) except double quote. Note in particular that strings may not contain tabs or newlines. String literals are limited to 255 characters in length, not including the delimiting double quotes.

Identifiers are strings of letters and digits starting with a letter, excluding the reserved keywords. Identifiers are limited to 255 characters in length.

The following are the operators:

:= + - * / < <= > >= = <>

and the delimiters:

: ; , . ( ) [ ] { }
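The token rules above can be sketched as a small regular-expression scanner in Python. This is an illustration under stated assumptions, not a reference lexer: the names `scan` and `TOKEN_RE` are hypothetical, "letters" are taken to be the ASCII letters A-Z and a-z, and the input is assumed to be free of comments. Alternatives are tried in order, so reals come before integers and the two-character operators `:=`, `<=`, `>=`, `<>` come before their one-character prefixes.

```python
import re

KEYWORDS = frozenset(
    "AND ARRAY BEGIN BY DIV DO ELSE ELSIF END EXIT FOR IF IS LOOP MOD NOT "
    "OF OR PROCEDURE PROGRAM READ RECORD RETURN THEN TO TYPE VAR WHILE "
    "WRITE".split()
)
MAX_INT = 2**31 - 1
MAX_LEN = 255

# Alternatives are tried left to right: reals before integers, and
# two-character operators before their one-character prefixes.
TOKEN_RE = re.compile(
    r'(?P<ws>[ \t\n]+)'
    r'|(?P<real>[0-9]+\.[0-9]*)'
    r'|(?P<int>[0-9]+)'
    r'|(?P<string>"[ !#-~]*")'  # printable ASCII 32-126 except the double quote
    r'|(?P<word>[A-Za-z][A-Za-z0-9]*)'
    r'|(?P<op>:=|<=|>=|<>|[-+*/<>=])'
    r'|(?P<delim>[:;,.()\[\]{}])'
)


def scan(src: str) -> list[tuple[str, str]]:
    """Split comment-free PCAT source into (kind, text) pairs."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"illegal character {src[pos]!r} at offset {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue
        if kind == "int" and int(text) > MAX_INT:
            raise SyntaxError(f"integer literal out of range: {text}")
        # A string's length does not count its two delimiting quotes.
        length = len(text) - 2 if kind == "string" else len(text)
        if kind in ("real", "string", "word") and length > MAX_LEN:
            raise SyntaxError(f"{kind} too long: {length} characters")
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "id"
        tokens.append((kind, text))
    return tokens
```

Under these rules `scan("1TO 2.")` yields the integer `1`, the keyword `TO`, and the real `2.`, matching the remark that no whitespace is needed between a number and a keyword.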

3 Programs

A program is the unit of compilation for PCAT. Programs have the following syntax:

program -> PROGRAM IS body ’;’
body    -> {declaration} BEGIN {statement} END

A program is executed by first elaborating its declaration sequence, then executing its statement sequence, and then terminating. Each file read by the compiler must consist of exactly one program. There is no facility for linking multiple programs or for separate compilation of parts of a program.

4 Declarations

All identifiers occurring in a program must be introduced by a declaration, except for a small set of pre-defined identifiers: REAL, INTEGER, BOOLEAN, TRUE, FALSE (see Section 5.1), and NIL (see Section 5.3). Declarations serve to specify whether the identifier represents a type, a variable, or a procedure (all of which live in a single name space) or a record component name (which live in separate name spaces; see Section 5.3).

declaration -> VAR var-decls
            -> TYPE type-decls
            -> PROCEDURE procedure-decls

Declarations may be global to the program or local to a particular procedure. The scope of a declaration extends roughly from the point of declaration to the end of the enclosing procedure (for local declarations) or the end of the program (for global declarations). The detailed scope rules differ for each kind of declaration (see Sections 5, 7, and 8). A local declaration of an identifier hides any outer declarations and makes them inaccessible in the inner scope. No identifier may be declared twice in the same procedure or at global level. An identifier declared as a type name may not be redeclared anywhere within its scope. Declaration elaboration is only meaningful for VAR declarations (see Section 7).
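The scope rules above map naturally onto a stack of symbol tables. The following Python sketch is one possible checker-side structure, not part of the language definition; the class `Scopes`, its methods, the exception `ScopeError`, and the kind strings (`"var"`, `"type"`, `"proc"`) are all hypothetical. It enforces three of the rules: no duplicate declaration in the same scope, inner declarations hide outer ones, and a type name may not be redeclared anywhere within its scope.

```python
class ScopeError(Exception):
    """Raised when a declaration or use breaks the scope rules."""


class Scopes:
    """A stack of scopes; index 0 is the global scope."""

    def __init__(self) -> None:
        self.stack: list[dict[str, str]] = [{}]

    def enter(self) -> None:
        """Open the scope of a procedure body."""
        self.stack.append({})

    def leave(self) -> None:
        """Close the innermost scope, discarding its declarations."""
        self.stack.pop()

    def declare(self, name: str, kind: str) -> None:
        if name in self.stack[-1]:
            raise ScopeError(f"{name} is declared twice in the same scope")
        # A type name stays reserved in every scope nested inside its own.
        if any(scope.get(name) == "type" for scope in self.stack):
            raise ScopeError(f"type name {name} may not be redeclared")
        self.stack[-1][name] = kind

    def lookup(self, name: str) -> str:
        """Return the kind of the innermost visible declaration of name."""
        for scope in reversed(self.stack):
            if name in scope:
                return scope[name]
        raise ScopeError(f"{name} is not declared")
```

Declaring `x` globally and again inside a procedure is accepted, and `lookup("x")` finds the inner declaration until `leave()` is called; declaring `T` as a type and then reusing `T` inside a nested procedure raises `ScopeError`.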

5 Types

PCAT is a strongly-typed language; every expression has a unique type, and types must match at assignments, calls, etc. (except that an integer can be used where a real is expected; see Section 5.1.) Types are referred to by type names. The built-in basic types (see Section 5.1) have predefined names; new types are created by type declarations in which the type constructors ARRAY or RECORD are applied to existing types.

declaration -> TYPE type-decls
type-decls  -> type-decl {AND type-decl}
type-decl   -> typename IS type ’;’
typename    -> ID

7 Variables

Variables are declared thus:

declaration -> VAR var-decls
var-decls   -> var-decl { var-decl }
var-decl    -> ID { ’,’ ID } [ ’:’ typename ] ’:=’ expression ’;’

Every variable must have an initial value, given by expression. The type name can be omitted whenever the type can be deduced from the initial value (which is always possible except when the initial value is NIL). A VAR declaration is elaborated by elaborating the var-decl clauses in order; a var-decl is elaborated by evaluating the initializing expression and storing the resulting value into the specified variables. The scope of each declared variable begins just after the containing var-decl clause, so it includes any subsequent var-decl clauses in the same VAR declaration; it does not include the variable’s own initializing expression, so declarations are never recursive.
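The elaboration order and scope rule for VAR declarations can be sketched in Python with `collections.ChainMap`, where a nested scope is a fresh mapping pushed in front of the outer one. Representing each initializing expression as a function of the environment, and using `None` to stand for NIL, are assumptions of this sketch; `elaborate_vars` is a hypothetical name. Because each initializer runs before its own names are bound, `VAR x := x + 1;` in a nested scope reads the outer `x`.

```python
from collections import ChainMap
from typing import Callable, Optional

# One var-decl clause: the declared names, the optional type name, and the
# initializing expression modeled as a function of the current environment.
VarDecl = tuple[list[str], Optional[str], Callable[[ChainMap], object]]


def elaborate_vars(decls: list[VarDecl], env: ChainMap) -> None:
    """Elaborate the var-decl clauses of one VAR declaration, in order."""
    for names, typename, init in decls:
        # The initializer runs before `names` are bound, so it cannot refer
        # to the variables it is initializing; it sees any outer bindings.
        value = init(env)
        if typename is None and value is None:  # None models NIL here
            raise TypeError("a type name is required when the initial value is NIL")
        for name in names:
            env[name] = value  # ChainMap writes go to the innermost mapping
```

With an outer `x` of 10, elaborating `x := x + 1` and then `y, z := x * 2` in a child scope binds the local `x` to 11 and both `y` and `z` to 22, leaving the outer `x` at 10.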

8 Procedures

Procedures are declared thus:

declaration     -> PROCEDURE procedure-decls
procedure-decls -> procedure-decl {AND procedure-decl}
procedure-decl  -> ID formal-params [ ’:’ typename ] IS body ’;’
formal-params   -> ’(’ fp-section {’;’ fp-section } ’)’
                -> ’(’ ’)’
fp-section      -> ID {’,’ ID} ’:’ typename
body            -> {declaration} BEGIN {statement} END

Procedures encompass both proper procedures, which are activated by the execution of a procedure call statement and do not return a value, and function procedures, which are activated by the evaluation of a procedure call expression and return a value which becomes the value of the call expression. Proper procedure declarations are distinguished by the lack of a return type (see also Section 11.10).

A procedure may have zero or more formal parameters, whose names and types are specified in the procedure declaration, and whose actual values are specified when the procedure is activated. The scope of formal parameters is the body of the procedure (including its local declarations). Parameters are always passed by value.

A procedure body is executed by first elaborating its declaration sequence, then executing its statement sequence, and finally returning to the calling procedure. There is an implicit RETURN statement at the bottom of every procedure body.

Each set of procedures declared following a single PROCEDURE keyword (and separated by AND keywords) is treated as (potentially) mutually recursive; that is, the scope of each procedure name begins at the point of declaration of the first procedure in the set, and includes the bodies of all the procedures in the set as well as the body of the enclosing procedure (or, for top-level procedures, the whole program).
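The scope rule for a PROCEDURE ... AND ... group can be sketched as a two-phase check: first bring every name in the group into scope, then check the bodies. In this Python sketch a procedure body is reduced to the set of procedure names it calls, a deliberate simplification; `check_group` and `ScopeError` are hypothetical names.

```python
class ScopeError(Exception):
    """Raised when a body calls a procedure that is not in scope."""


def check_group(group: dict[str, set[str]], visible: frozenset[str]) -> frozenset[str]:
    """Check one PROCEDURE ... AND ... group.

    `group` maps each procedure name to the names its body calls;
    `visible` holds the names already in scope. Returns the names in
    scope after the group, for checking later declarations and the
    enclosing body.
    """
    # Phase 1: every name in the group is in scope from the first declaration.
    scope = visible | frozenset(group)
    # Phase 2: each body may call any procedure in the group, or earlier ones.
    for name, calls in group.items():
        unknown = calls - scope
        if unknown:
            raise ScopeError(f"{name} calls undeclared {sorted(unknown)}")
    return scope
```

Declaring `even` and `odd` in one group lets each call the other; declaring them after two separate PROCEDURE keywords would make the first one's call to the second fail, since the second name is not yet in scope.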

9 L-values

An l-value is a location whose value can be either read or assigned. Variables, procedure parameters, record components, and array elements are all l-values.

lvalue -> ID
       -> lvalue ’[’ expression ’]’
       -> lvalue ’.’ ID

The square brackets notation ([]) denotes array element dereferencing; the expression within the brackets must evaluate to an integer within the bounds of the array. The dot notation (.) denotes record component dereferencing; the identifier after the dot must be a component name within the record.
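A run-time view of l-values: each l-value form yields a location that can be read or assigned. In this Python sketch, arrays are modeled as Python lists and records as dicts (so both behave like implicit pointers), NIL is `None`, and array indices are assumed to start at 0, which is an assumption of the sketch rather than something stated in this section. The class `Location` and the helpers `var_loc`, `elem_loc`, and `field_loc` are hypothetical.

```python
class Location:
    """A readable and assignable storage location (an l-value)."""

    def __init__(self, container, key) -> None:
        self.container = container
        self.key = key

    def load(self):
        return self.container[self.key]

    def store(self, value) -> None:
        self.container[self.key] = value


def var_loc(env: dict, name: str) -> Location:
    """lvalue -> ID"""
    return Location(env, name)


def elem_loc(base: Location, index) -> Location:
    """lvalue -> lvalue '[' expression ']'"""
    array = base.load()
    if isinstance(index, bool) or not isinstance(index, int):
        raise TypeError("array index must be an integer")
    if not 0 <= index < len(array):  # assumption: zero-based bounds
        raise IndexError(f"index {index} outside 0..{len(array) - 1}")
    return Location(array, index)


def field_loc(base: Location, name: str) -> Location:
    """lvalue -> lvalue '.' ID"""
    record = base.load()
    if record is None:
        raise RuntimeError("selecting a component of NIL")
    if name not in record:
        raise KeyError(f"no component named {name}")
    return Location(record, name)
```

For `r.a[1] := 9`, the locations compose left to right: `elem_loc(field_loc(var_loc(env, "r"), "a"), 1).store(9)`.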

10 Expressions

10.1 Simple expressions

expression -> number
           -> lvalue
           -> ’(’ expression ’)’
number     -> INTEGER | REAL

A number expression evaluates to the literal value specified. Note that reals are distinguished from integers by lexical criteria (see Section 2). An l-value expression evaluates to the current contents of the specified location. Parentheses can be used to alter precedence in the usual way.

10.2 Arithmetic operators

expression -> unary-op expression
           -> expression binary-op expression
unary-op   -> ’-’
binary-op  -> ’+’ | ’-’ | ’*’ | ’/’ | DIV | MOD

Operators +, -, * require integer or real arguments. If both arguments are integers, an integer operation is performed and the integer result is returned; otherwise, any integer arguments are coerced to reals, a real operation is performed, and the real result is returned. Operator / requires integer or real arguments, coerces any integer arguments to reals, performs a real division, and always returns a real result. Operators DIV (integer quotient) and MOD (integer remainder) take integer arguments and return an integer result. All the binary operators evaluate their left argument first.
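The typing and coercion rules for the arithmetic operators can be summarized in a small Python evaluator. Integers and reals are modeled as Python `int` and `float`, and `arith` is a hypothetical name. This section does not say how DIV and MOD treat negative operands, so truncation toward zero is an assumption here, as is leaving overflow beyond 2^31 − 1 unchecked.

```python
import operator

_NUMERIC_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def _is_num(x) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def arith(op: str, a, b):
    """Apply a PCAT binary arithmetic operator to already-evaluated operands."""
    if not (_is_num(a) and _is_num(b)):
        raise TypeError(f"{op} requires integer or real operands")
    if op in ("DIV", "MOD"):
        if not (isinstance(a, int) and isinstance(b, int)):
            raise TypeError(f"{op} requires integer operands")
        # Assumption: the quotient truncates toward zero.
        q = abs(a) // abs(b)
        if (a < 0) != (b < 0):
            q = -q
        return q if op == "DIV" else a - b * q
    if op == "/":
        return float(a) / float(b)  # always a real division
    if isinstance(a, int) and isinstance(b, int):
        return _NUMERIC_OPS[op](a, b)  # integer operation, integer result
    return _NUMERIC_OPS[op](float(a), float(b))  # coerce to real
```

So `7 / 2` yields the real 3.5, while `7 DIV 2` yields the integer 3.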

10.3 Logical operators

expression -> unary-op expression
           -> expression binary-op expression
unary-op   -> NOT
binary-op  -> OR | AND

These operators require boolean operands and return a boolean result. OR and AND are “short-circuit” operators; they do not evaluate the right-hand operand if the result is determined by the left-hand one.
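Short-circuit evaluation means the right operand must be passed unevaluated. This Python sketch models each operand as a zero-argument function (a thunk); `pcat_and`, `pcat_or`, and `pcat_not` are hypothetical names, and the run-time boolean checks stand in for what a type checker would normally verify statically.

```python
from typing import Callable

Thunk = Callable[[], bool]


def _as_bool(value) -> bool:
    if not isinstance(value, bool):
        raise TypeError("logical operators require boolean operands")
    return value


def pcat_and(left: Thunk, right: Thunk) -> bool:
    # If the left operand is FALSE, the result is FALSE; skip the right one.
    return _as_bool(left()) and _as_bool(right())


def pcat_or(left: Thunk, right: Thunk) -> bool:
    # If the left operand is TRUE, the result is TRUE; skip the right one.
    return _as_bool(left()) or _as_bool(right())


def pcat_not(operand: bool) -> bool:
    return not _as_bool(operand)
```

A guard such as `(i < n) AND (a[i] > 0)` can therefore rely on the left test to keep the index in bounds.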

10.4 Relational operators

expression -> expression binary-op expression
binary-op  -> ’>’ | ’<’ | ’=’ | ’>=’ | ’<=’ | ’<>’

These operators all return a boolean result. All of them work on numeric arguments: if both arguments are integers, an integer comparison is made; otherwise, any integer argument is coerced to real and a real comparison is made. Operators = and <> also work on pairs of boolean arguments, or on pairs of record or array arguments of the same type; for the latter, they test “pointer” equality (that is, whether two records or arrays are the same instance, not whether they have the same contents). All of these operators evaluate their left argument first.
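The typing rules for the relational operators, including the pointer-equality rule for records and arrays, can be sketched in Python as follows. Records and arrays are modeled as Python dicts and lists and NIL as `None`, so `is` gives identity; `compare` is a hypothetical name, and the same-type requirement for record and array operands is left to a static checker.

```python
import operator

_CMP = {
    ">": operator.gt, "<": operator.lt, "=": operator.eq,
    ">=": operator.ge, "<=": operator.le, "<>": operator.ne,
}


def _is_num(x) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def _is_ref(x) -> bool:
    # Records (dict), arrays (list), and NIL (None) are compared by identity.
    return x is None or isinstance(x, (dict, list))


def compare(op: str, a, b) -> bool:
    if _is_num(a) and _is_num(b):
        if not (isinstance(a, int) and isinstance(b, int)):
            a, b = float(a), float(b)  # mixed operands: real comparison
        return _CMP[op](a, b)
    if op not in ("=", "<>"):
        raise TypeError(f"{op} requires numeric operands")
    if isinstance(a, bool) and isinstance(b, bool):
        same = a == b
    elif _is_ref(a) and _is_ref(b):
        same = a is b  # same instance, not same contents
    else:
        raise TypeError(f"cannot compare {type(a).__name__} with {type(b).__name__}")
    return same if op == "=" else not same
```

Two records built with identical contents compare unequal under `=`; a record is equal only to itself (or to another variable holding the same pointer).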

10.5 Procedure call

expression    -> ID actual-params
actual-params -> ’(’ expression {’,’ expression} ’)’
              -> ’(’ ’)’

This expression is evaluated by evaluating the argument expressions left-to-right to obtain actual parameter values, and then executing the function procedure specified by ID with its formal parameters bound to the actual parameter values. The procedure returns by executing an explicit RETURN statement (with an expression for the value to be returned). The returned value becomes the value of the procedure call expression.

11.3 Read

statement -> READ ’(’ lvalue {’,’ lvalue} ’)’ ’;’

This statement is executed by evaluating the l-values to locations in left-to-right order, and then reading numeric literals from standard input, evaluating them, and assigning the resulting values into the locations. The l-values must have type integer or real, and their types guide the evaluation of the corresponding literals. Input literals are delimited by whitespace, and the last one must be followed by a carriage return.
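A possible shape for the READ run-time routine, as a Python sketch. It takes the whitespace-separated words of standard input as an iterator, so successive READ statements keep consuming the same input. Two details are assumptions, since this section leaves them open: input literals follow the lexical syntax of Section 2 (so they carry no minus sign), and an integer literal is accepted for a real location. `read_values` is a hypothetical name.

```python
import re
from typing import Iterator

INT_LIT = re.compile(r"[0-9]+")
REAL_LIT = re.compile(r"[0-9]+(\.[0-9]*)?")  # assumption: integers allowed for REAL
MAX_INT = 2**31 - 1


def read_values(kinds: list[str], words: Iterator[str]) -> list:
    """Read one value per l-value; kinds are "INTEGER" or "REAL", left to right."""
    values: list = []
    for kind in kinds:
        word = next(words, None)
        if word is None:
            raise EOFError("READ ran out of input")
        if kind == "INTEGER" and INT_LIT.fullmatch(word) and int(word) <= MAX_INT:
            values.append(int(word))
        elif kind == "REAL" and REAL_LIT.fullmatch(word):
            values.append(float(word))
        else:
            raise ValueError(f"{word!r} is not a valid {kind} literal")
    return values
```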

11.4 Write

statement    -> WRITE write-params ’;’
write-params -> ’(’ write-expr {’,’ write-expr } ’)’
             -> ’(’ ’)’
write-expr   -> STRING
             -> expression

Executing this statement evaluates the specified expressions (which must be simple integers, reals, booleans, or string literals) in left-to-right order, and then writes the resulting values to standard output (with no separation between values), followed by a newline.
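The output rule (values concatenated with no separator, then one newline) is easy to state as a Python sketch. How booleans and reals are printed is not specified here, so printing `TRUE`/`FALSE` and using Python's default `str` for reals are assumptions; string literals are passed in without their surrounding quotes. `format_write` is a hypothetical name.

```python
def format_write(items: list) -> str:
    """Render the already-evaluated WRITE arguments, in order, as one line."""
    parts: list[str] = []
    for item in items:
        if isinstance(item, bool):  # test bool before int: bool subclasses int
            parts.append("TRUE" if item else "FALSE")  # assumed spelling
        elif isinstance(item, (int, float, str)):
            parts.append(str(item))  # str also covers string-literal contents
        else:
            raise TypeError("WRITE accepts integers, reals, booleans, and strings")
    return "".join(parts) + "\n"  # no separators; one trailing newline
```

Because nothing separates values, `WRITE(1, 2);` prints `12`; a program that wants spacing must write it explicitly, as in `WRITE(1, " ", 2);`.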

11.5 If-then-else

statement -> IF expression THEN {statement} {ELSIF expression THEN {statement}} [ ELSE {statement} ] END ’;’

This statement specifies the conditional execution of guarded statements. The expression preceding a statement sequence, which must evaluate to a boolean, is called its guard. The guards are evaluated in left-to-right order, until one evaluates to TRUE, after which its associated statement sequence is executed. If no guard is satisfied, the statement sequence following the ELSE (if any) is executed.
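The guard semantics can be sketched in Python with guards and bodies as zero-argument functions, which makes the evaluation order visible: guards are tried in order and evaluation stops at the first one that is TRUE. `exec_if` is a hypothetical name.

```python
from typing import Callable, Optional, Sequence

Guard = Callable[[], bool]
Body = Callable[[], None]


def exec_if(arms: Sequence[tuple[Guard, Body]], else_body: Optional[Body] = None) -> None:
    """arms holds the IF arm followed by any ELSIF arms, in source order."""
    for guard, body in arms:
        value = guard()
        if not isinstance(value, bool):
            raise TypeError("a guard must evaluate to a boolean")
        if value:
            body()
            return  # later guards are never evaluated
    if else_body is not None:
        else_body()
```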

11.6 While

statement -> WHILE expression DO {statement} END ’;’

The statement sequence is repeatedly executed as long as the expression evaluates to TRUE, or until the execution of an EXIT statement within the sequence (but not inside any nested WHILE, LOOP, or FOR).

11.7 Loop

statement -> LOOP {statement} END ’;’

The statement sequence is repeatedly executed. The only way to terminate the iteration is by executing an EXIT statement within the sequence but not inside any nested WHILE, LOOP, or FOR.

11.8 For

statement -> FOR ID ’:=’ expression TO expression [ BY expression ] DO {statement} END ’;’

Executing the statement FOR id := exp1 TO exp2 BY exp3 DO stmts is equivalent to the following steps: (i) evaluate expressions exp1, exp2, and exp3 in that order to values v1, v2, v3 (which must be integers); (ii) set id := v1; (iii) if the value of id is less than or equal to v2, execute stmts; otherwise terminate the loop; (iv) set id := id + v3 and repeat step (iii). If the BY clause is omitted, v3 is taken to be 1.

ID is an ordinary integer variable; it must be declared in the scope containing the FOR statement, and it can be inspected or set above, within, or below the loop body. If an EXIT statement is executed within the body of the loop (but not within the body of any nested WHILE, LOOP or FOR statement), the loop is prematurely terminated, and control passes to the statement following the FOR.
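The FOR rule translates into the following Python sketch. The loop variable is first set to the value of exp1 and lives in an ordinary environment mapping, so the body can read or reassign it; the bounds are evaluated exactly once. EXIT is modeled as a Python exception, `ExitLoop`, caught by the loop; that exception, `exec_for`, and the use of thunks for expressions are assumptions of the sketch.

```python
from typing import Callable, MutableMapping, Optional


class ExitLoop(Exception):
    """Raised by EXIT; caught by the nearest enclosing loop."""


def exec_for(env: MutableMapping[str, int], name: str,
             lo: Callable[[], int], hi: Callable[[], int],
             step: Optional[Callable[[], int]], body: Callable[[], None]) -> None:
    # Evaluate the three expressions once, in order; BY defaults to 1.
    v1, v2 = lo(), hi()
    v3 = step() if step is not None else 1
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in (v1, v2, v3)):
        raise TypeError("FOR bounds and step must be integers")
    env[name] = v1  # initialize the loop variable
    try:
        while env[name] <= v2:          # test against the saved bound
            body()
            env[name] = env[name] + v3  # advance, then repeat the test
    except ExitLoop:
        pass  # EXIT leaves the loop; control continues after the FOR
```

Because `v2` is saved before the first test, assignments inside the body to variables used in exp2 do not change the number of iterations, whereas assignments to the loop variable itself do.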

11.9 Exit

statement -> EXIT ’;’

Executing EXIT causes control to pass immediately to the next statement following the nearest enclosing WHILE, LOOP or FOR statement. If there is no such enclosing statement, the EXIT is illegal.
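One common way to implement EXIT in an interpreter is a non-local jump to the end of the nearest enclosing loop. The Python sketch below uses an exception for that jump; the names `ExitLoop`, `exec_while`, and `exec_loop` are hypothetical. It assumes that a static check has already rejected any EXIT that is not lexically inside a WHILE, LOOP, or FOR, so the exception never escapes a loop (for example, through a procedure call).

```python
from typing import Callable


class ExitLoop(Exception):
    """Raised when EXIT executes; caught by the nearest enclosing loop."""


def exec_while(cond: Callable[[], bool], body: Callable[[], None]) -> None:
    try:
        while True:
            value = cond()
            if not isinstance(value, bool):
                raise TypeError("WHILE condition must be boolean")
            if not value:
                break
            body()
    except ExitLoop:
        pass  # control continues after the WHILE


def exec_loop(body: Callable[[], None]) -> None:
    try:
        while True:  # repeat until an EXIT is raised from the body
            body()
    except ExitLoop:
        pass  # control continues after the LOOP
```

When loops are nested, an EXIT inside the inner LOOP unwinds only to that loop's handler, so the outer loop keeps running.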

11.10 Return

statement -> RETURN [ expression ] ’;’

Executing RETURN terminates execution of the current procedure and returns control to the calling context. There can be multiple RETURNs within one procedure body, and there is an implicit RETURN at the bottom of every procedure. A RETURN from a function procedure must specify a return value expression of the return type; a RETURN from a proper procedure must not. The main program body must not include a RETURN.
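RETURN can be modeled as a non-local exit from the procedure body: the Python sketch below raises an exception that the call machinery catches, and treats falling off the end of the body as the implicit RETURN. `Return`, `NO_VALUE`, and `call_procedure` are hypothetical names; a sentinel is used instead of `None` so that a function returning NIL (modeled as `None`) stays distinguishable from a bare RETURN. Treating a function that reaches its implicit RETURN as a run-time error is an assumption, consistent with the rule that a function's RETURN must carry a value.

```python
from typing import Callable

NO_VALUE = object()  # marks a RETURN without an expression


class Return(Exception):
    """Raised by a RETURN statement; carries the returned value, if any."""

    def __init__(self, value: object = NO_VALUE) -> None:
        super().__init__()
        self.value = value


def call_procedure(body: Callable[[], None], is_function: bool) -> object:
    """Run a procedure body and return its result (None for proper procedures)."""
    try:
        body()
        result = NO_VALUE  # implicit RETURN at the bottom of the body
    except Return as r:
        result = r.value
    if is_function and result is NO_VALUE:
        raise RuntimeError("function procedure returned without a value")
    if not is_function and result is not NO_VALUE:
        raise RuntimeError("proper procedure returned a value")
    return None if result is NO_VALUE else result
```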