









































































Department of Electrical Engineering and Computer Sciences Computer Science Division
CS 164 P. N. Hilfinger Spring 2008
The Pyth Programming Language (v 3.2)
Our project this year is a compiler for a dialect of the popular language Python. Python itself is usually categorized as a scripting language, meaning either that it is intended to implement extensions to your computer’s command language, or that it is intended to implement “glue” programs that accomplish most of their work by invoking other self-contained programs. You will often hear Python described as an “interpreted language,” in contrast to a “compiled language.” To the limited extent this statement is meaningful, it is false. First, as you’ll see in this course, the adjectives “interpreted” and “compiled” do not properly modify programming languages, but rather implementations of programming languages. Any language that can be implemented by an interpreter can be compiled, and vice-versa. There are C compilers, C interpreters, Lisp compilers, Lisp interpreters, Java compilers, and Java interpreters. There indeed are Python interpreters, but this semester, we will implement a compiler for the Pyth dialect of Python.
Although this document is self-contained, I think that any well-trained professional programmer would do well to be familiar with Python itself, a useful language whose design is refreshingly clean, in marked contrast to other scripting languages, especially Perl. A reference document is available on-line (there is a link from the class homepage) or in book form (David M. Beazley, Python Essential Reference, Third Edition, New Riders Publishing, 2006, ISBN: 0672328623).
Typically, we factor the description of a programming language’s syntax into two parts: a lexical description, which defines a set of possible tokens or atomic symbols formed from the text of a program (called its source code or source), and a grammar that describes the possible properly formed constructs (or sentences) that may be created from sequences of such tokens. It is not essential to make this separation, but the practice has proven useful over the years, and there are tools and techniques for converting each of these descriptions into translation programs.
The source code of a Pyth program consists of the tokens described below possibly interspersed with whitespace (blanks, tabs, formfeeds), blank lines, and comments. Notionally, the conversion of source into tokens proceeds from the beginning to the end of the source in sequence, at each point forming the longest token or separator allowed by the rules. For example, the empty program
# print 42
is interpreted as a blank line consisting of a single comment and the end of line, even though ‘print’ and ‘42’ can otherwise be interpreted as valid tokens. Likewise,
if42 = 31
is treated as three tokens ‘if42’, ‘=’, and ‘31’ even though ‘if’, ‘42’, ‘3’ and ‘1’ could all be valid tokens. This interpretation is sometimes called the maximal munch rule. An implication of this is that whitespace between two tokens is necessary only when the two tokens could be interpreted differently if concatenated.
A comment consists of the hash character (‘#’, sometimes called an octothorpe) followed by all characters up to, but not including, the end of the line. A blank line consists of any amount of whitespace^1, possibly followed by a comment, and the end of the line.
The end of a line is itself a token (or part of one); that is, in contrast to most programming languages, it is not generally treated as whitespace, but as a terminating character, like the semicolon in C or Java. There are three exceptions:
For example, the two statements
print 12, 13 print 12, 13
are equivalent, as are
a = 3 * (b + c) - x[ i ] a = 3 * (b
An identifier is a token that consists of letters, digits, and underscores (_), and does not start with a digit. Identifiers are mostly used as names of things. However, certain of them have special roles in the syntax, and may not be used as names. These reserved words are as follows:
and         elif        global      or
assert*     else        if          pass
break       except*     import      print
class       exec*       in          raise*
continue    finally*    is          return
def         for         lambda*     try*
del*        from*       not         while
Words marked with an asterisk are not used in Pyth, but are reserved because of their use in Python.
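The maximal munch rule and the treatment of reserved words can be illustrated with a very small tokenizer. This is only a sketch in Python, not the course's lexer: the token grammar (identifiers, decimal integers, and a handful of one-character operators) and the names RESERVED, TOKEN, and tokenize are assumptions made for the example.

```python
import re

# Reserved words from the list above (asterisks dropped).
RESERVED = set("""
and assert break class continue def del elif else except exec finally for
from global if import in is lambda not or pass print raise return try while
""".split())

# Hypothetical, deliberately tiny token grammar: identifiers, decimal
# integers, and a few one-character operators.
TOKEN = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_][A-Za-z_0-9]*)|(?P<int>[0-9]+)|(?P<op>[=+\-*/()]))"
)

def tokenize(line):
    """Split one line into (kind, text) pairs, always taking the longest match."""
    # A comment runs to the end of the line (this toy grammar has no strings).
    line = line.split("#", 1)[0].rstrip()
    tokens, pos = [], 0
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if m is None:
            raise ValueError("bad character at column %d" % pos)
        kind, text = m.lastgroup, m.group(m.lastgroup)
        if kind == "id" and text in RESERVED:
            kind = "reserved"   # reserved words may not be used as names
        tokens.append((kind, text))
        pos = m.end()
    return tokens

print(tokenize("if42 = 31"))
print(tokenize("if 42"))
```

Because the identifier pattern is greedy, `if42` is consumed as one identifier and never split into `if` and `42`; a line holding only a comment yields no tokens at all.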
A literal is any construct that stands for a value determined entirely by the text of that literal. For example, the numeral 42 stands for an integral value that can be determined from its two digits in the usual way.
1.4.1 Integer literals
Pyth’s integer literals are essentially the same as those of Java:
1.4.2 Floating-point literals
A floating-point literal is a decimal fraction or a numeral in modified scientific notation. In ordinary mathematical or engineering contexts, such literals would denote rational numbers; in Pyth, they represent rational approximations to these numbers. They are approximate, in general, because computer floating-point arithmetic generally uses binary fractions internally, and not all base-10 fractions can be exactly represented with finite base-2 fractions. For example, 0.1 in base 2 is the repeating numeral 0.00011001100110011... , and if floating-point numbers are limited to 52 bits of significance (a common number), then what you write as 0.1 is actually
which is close enough for most purposes. Floating-point literals in Pyth come in any of the following forms (the following all denote the same number):
1.23e2       # Means 1.23 × 10^2
1.23E2
0.123e3
.123e3
1230.0e-1
1230e-1
That is, either a sequence of digits containing one decimal point, or a sequence of digits containing at most one decimal point followed by an ‘e’ or ‘E’, an optional sign, and an integer numeral, which is always treated as decimal.
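The approximation described above can be observed directly in Python, whose floats are binary floating-point numbers (IEEE 754 doubles on essentially every platform, which is an assumption about the host rather than something this document specifies). The sketch prints the exact value stored for 0.1 and checks that six spellings in the style of the list above denote one number; the specific exponents are chosen for the example.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, digit for digit.
exact_tenth = Decimal(0.1)
print(exact_tenth)

# Six spellings of the same value, written as Python float literals.
forms = [1.23e2, 1.23E2, 0.123e3, .123e3, 1230.0e-1, 1230e-1]
print(all(f == 123.0 for f in forms))
```

The first line of output shows a long decimal expansion slightly above 0.1, which is the rational number the machine actually stores.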
1.4.3 String literals
Strings in Pyth are sequences of bytes. Their literals are written as ASCII text surrounded by any of several different kinds of quotation marks:
Literal                                      Meaning
"A 'string' in double quotes"                A 'string' in double quotes
'A "string" in single quotes'                A "string" in single quotes
"A \"string\" in double quotes"              A "string" in double quotes
'A \'string\' in single quotes'              A 'string' in single quotes
'''A 'string' in triple single quotes'''     A 'string' in triple single quotes
"""A "string" in triple double quotes"""     A "string" in triple double quotes
"A string\nthat contains a new line"         A string
                                             that contains a new line
'''A multi-line                              A multi-line
string in triple single quotes'''            string in triple single quotes
"""A multi-line                              A multi-line
string in triple double quotes"""            string in triple double quotes
is a kind of statement that controls when and whether other statements are executed. A declaration is a kind of statement that defines new names. A statement, per se, consists either of a statement list, a single compound statement, a single import statement (§2.5), or a type declaration (§4.2). A statement list is a sequence of one or more simple statements separated by semicolons, optionally followed by a semicolon, and then terminated by a newline. The simple statements are pass (§2.1.1), print statements (§2.1.3), expression statements (§2.1.2), assignments (§2.1.4 and §2.1.5), break and continue statements (§2.3.2), and return statements (§2.4). A program in Pyth consists of a sequence of zero or more statements.
2.1.1 Pass
A pass statement does nothing. This probably does not sound useful, but one does occasionally need to write, for example, a statement that doesn’t need to do anything in places where Pyth requires there to be a statement:
def close (): pass
or you just might think that it looks better to be explicit about cases in which nothing needs to happen:
if x < 0:
    pass
elif x < 10:
    y = f (x)
elif ...
2.1.2 Expression statements
Any non-empty expression list (see §5.4) may be used as a statement (as may an empty pair of parentheses, although as a statement, it is entirely useless). When used as a statement, its value is ignored, and the expressions are evaluated for their side-effects alone. So typically, you’ll see function calls like this:
clear(theBoard)
2.1.3 Print
A print statement is a convenient way to output text. The simple form:
print expression_1, ..., expression_k
prints external representations of its arguments, separated by spaces, on the standard output, followed by an end of line. If k = 0, therefore, as in:
print
the program simply starts a new line. The external representations used are those produced by the str method, which is defined on all types (§4.1). Following the last expression with a comma:
print expression_1, ..., expression_k,
(k ≥ 1) does the same thing, but suppresses the end of line. For all but the first output on a line, furthermore, both forms of print first output a space. As a result,
print 1,
print 2,
print 3
prints
1 2 3
just like
print 1, 2, 3
You can direct output to a file other than the standard output using the >> operator:
f = open ("results.txt", "w")    # f points to a "file object" opened for writing
print >> f, "The answer is", x
Again, in the absence of arguments, this just ends the current line in f, and with a trailing comma, it suppresses the end-of-line. The standard output is represented by a file called sys.stdout, so that the following are equivalent:
print 1, 2, 3
print >>sys.stdout, 1, 2, 3
The file sys.stderr represents the standard error output (conventionally used for error messages from the program).
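The spacing rules for print (a space before every output except the first on a line, and no end of line after a trailing comma) amount to a little per-file state. The following Python sketch models that state; the class PythPrinter, its print_ method, and the trailing_comma flag are hypothetical names invented for illustration and say nothing about how a Pyth runtime is actually organized.

```python
import io

class PythPrinter:
    """Hypothetical model of Pyth's print spacing rules for one output file."""

    def __init__(self, out):
        self.out = out
        self.at_line_start = True   # nothing written yet on the current line

    def print_(self, *values, trailing_comma=False):
        for value in values:
            if not self.at_line_start:
                self.out.write(" ")      # space before all but the first output
            self.out.write(str(value))   # external representation via str
            self.at_line_start = False
        if not trailing_comma:
            self.out.write("\n")         # end the current line
            self.at_line_start = True

buf = io.StringIO()
p = PythPrinter(buf)
p.print_(1, trailing_comma=True)   # print 1,
p.print_(2, trailing_comma=True)   # print 2,
p.print_(3)                        # print 3
p.print_(1, 2, 3)                  # print 1, 2, 3
print(repr(buf.getvalue()))
```

Both sequences of calls leave the same text in the buffer, matching the equivalence stated above. Directing output to another file corresponds to constructing the printer around a different file object.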
2.1.4 Assignment
In its simplest form, an assignment looks like this:
variable = expression
The value of the expression is stored in the variable. The right side of an assignment may be a list of expressions separated by commas (and possibly followed by one), which is just an abbreviation for a tuple (see §5.4):
x = 1, 2, 3    # short for x = (1, 2, 3)
y = 1,         # short for y = (1,)
assigns the integer 1 to x, while
x = (1,)
assigns the one-element sequence containing the single element 1 to x. (In general, one may always have a trailing comma after a list of things forming some kind of sequence display, but it only makes a difference in the one-element case.) The left-side sequences may be nested, as in
(x, (y, z[3])) = q
which requires that q contain two elements, the second of which is a sequence containing two items.
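Python accepts the same nested left-hand side, so the shape requirement can be tried out there. This is a sketch of Python's behavior only; the sample values are invented, and how Pyth reports a q of the wrong shape is not stated in this section.

```python
z = [0, 0, 0, 0]          # z must already exist and have an index 3
q = (10, (20, 30))        # two elements; the second holds two items

(x, (y, z[3])) = q
print(x, y, z)

# A right-hand side of the wrong shape is rejected when the assignment runs.
try:
    (a, (b, c)) = (1, 2, 3)
    shape_error = None
except ValueError as err:
    shape_error = str(err)
print("rejected:", shape_error)
```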
2.1.5 Augmented assignments
As in Java, the augmented assignment operators +=, -=, etc., have the effect of applying an operator to a variable’s value and then assigning the result back into the same variable. For example,
x += f(x)
is equivalent to
x = x + f(x)
Unlike Java, however, Pyth restricts the use of the augmented assignments to simple variables. We don’t allow
A[0] += 1
We also do not allow
x += y = 17
That is, the right-hand side of an augmented assignment must be a simple expression, not an assignment statement. Because of these restrictions, we can translate all augmented assignments into equivalent simple assignments. Thus
x += f(x)
becomes
x = x + f(x)
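The translation of augmented assignments into simple ones can be sketched as a tree rewrite. The example below uses Python's own ast module on Python source as a stand-in for a Pyth compiler's tree (so it assumes Python 3.9 or later for ast.unparse); the class DesugarAugAssign and the function desugar are hypothetical names, and the error raised for a non-variable target merely imitates the restriction described above.

```python
import ast

class DesugarAugAssign(ast.NodeTransformer):
    """Rewrite 'v op= e' as 'v = v op e' for simple variables v."""

    def visit_AugAssign(self, node):
        if not isinstance(node.target, ast.Name):
            # Mirrors the Pyth restriction: only simple variables are allowed.
            raise SyntaxError("augmented assignment needs a simple variable")
        name = node.target.id
        new_value = ast.BinOp(
            left=ast.Name(id=name, ctx=ast.Load()),
            op=node.op,
            right=node.value,
        )
        new_node = ast.Assign(
            targets=[ast.Name(id=name, ctx=ast.Store())],
            value=new_value,
        )
        return ast.copy_location(new_node, node)

def desugar(source):
    """Return source text with every augmented assignment expanded."""
    tree = DesugarAugAssign().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

print(desugar("x += f(x)"))
print(desugar("x *= a + b"))
```

Working on the tree rather than on text keeps operator precedence intact: the second call yields `x = x * (a + b)`, with the parentheses supplied by the unparser.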
Compound statements by definition contain smaller constituent statements. Syntactically, each of these constituents is a statement, but one often needs the effect of a sequence of statements, rather than just one. Typical programming languages provide some kind of grouping (or block) construct for this purpose. In Pyth, this construct is called a suite, which comes in two flavors. The first form of suite is the statement list. For example:
if h > 0: dx = x/h; dy = y/h
else: print "singularity"
The second form consists of an end-of-line followed by an INDENT (see §1.2), one or more statements, each terminated by an end-of-line, and a matching DEDENT. For example:
if h > 0:
    dx = x/h
    dy = y/h
else:
    print "singularity"
In a statement like this, with two suites, there is no need to use the same form for both; for example:
if h > 0:
    dx = x/h
    dy = y/h
else: print "singularity"
2.3.1 If statements
This statement executes exactly one of several possible statements depending on the values of some condition tests. The general form is
if expression_1: suite_1
elif expression_2: suite_2
...
elif expression_k: suite_k
else: suite_{k+1}
where k ≥ 1 (so that there need not be any elif clauses) and the else clause is optional. A missing else clause is equivalent to
else: pass
Each expression_i is evaluated in order. The suite corresponding to the first expression that yields a true value (see §5.13) is executed, after which the whole if statement is finished. If no expression yields a true value, the else clause is executed.
skips the print and assignment statements if A[i] is not 0. The break and continue statements are also valid inside for loops, described next.
2.3.3 For loops
The for loop is simply a shorthand for a particular kind of while loop^5. You write
for variable in expression list: suite_1
else: suite_2
This is functionally equivalent to the following:
LST = (expression list)
I = 0
while I < len (LST):
    variable = getindex (LST, I)
    I += 1
    suite_1        # (but indented appropriately)
else:
    suite_2
where LST and I are replaced by some new variables that are not used anywhere else in the program. As usual, the else clause may be omitted, in which case suite_2 defaults to pass. For example:
A = [ 1, 2, 3 ]    # An array
for i in A:
    print i
The getindex method is defined for built-in sequence types and programmers can define this method for new classes as well (just as Java programmers can define classes that implement java.util.Iterable and work with Java’s for loop). Pyth has a built-in type xrange that allows you to write loops over numbers:
for i in xrange (0,3): print i,
prints ‘0 1 2’. In effect, the value of xrange(L, U) is the sequence of integers, j, such that L ≤ j < U.
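The while-loop expansion of the for statement can be written out as an ordinary function to see that it visits the same elements. In this Python sketch, getindex is a stand-in assumed to behave like indexing, for_loop is a hypothetical helper that takes the two suites as callables, and Python's range plays the part of xrange.

```python
def getindex(seq, i):
    """Stand-in for Pyth's getindex method (assumed to behave like seq[i])."""
    return seq[i]

def for_loop(expression_list, suite_1, suite_2=lambda: None):
    """Run suite_1(variable) per element, following the expansion line by line."""
    lst = expression_list            # LST = (expression list)
    i = 0                            # I = 0
    while i < len(lst):
        variable = getindex(lst, i)
        i += 1
        suite_1(variable)
    else:
        suite_2()                    # the (optional) else suite

seen = []
finished = []
for_loop([1, 2, 3], seen.append, lambda: finished.append(True))
for_loop(range(0, 3), seen.append)   # range stands in for xrange(0, 3)
print(seen, finished)
```

Any object that supports len and getindex can be looped over this way, which is why user-defined classes can take part in for loops by defining that method.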
The statement
return optional expression list
must be a part of a function body. The expression list defaults to None if not specified. It is evaluated and this value is returned as the value of the current call to the innermost enclosing function.
(^5) Pyth’s for statement differs significantly from Python’s, which makes use of generators and exceptions.
The statement
import identifier
causes the contents of a file called identifier.py to be substituted for the statement, if that file has not previously been imported. The statement has no effect if identifier.py has already been imported. It may only occur at the outer level of the program and not within a suite, including those of class declarations or function declarations. The compiler searches for this file in the directory containing the program’s source, plus additional directories in a standard list. At the end of identifier.py, all open INDENT brackets are implicitly closed with DEDENTs, as if the file were followed by an unindented pass statement.
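The substitute-once behavior of import can be sketched as a textual expansion pass. This Python example makes several simplifying assumptions that go beyond the description above: files live in an in-memory dictionary instead of a directory search path, an imported file may itself contain outer-level imports, and the implicit closing of INDENTs is ignored. The names expand_imports, files, and program are hypothetical.

```python
def expand_imports(lines, files, imported=None):
    """Return lines with each outer-level 'import X' replaced by X.py's lines.

    'files' is a hypothetical mapping from file name to a list of source
    lines; a real compiler would search directories instead.
    """
    if imported is None:
        imported = set()
    result = []
    for line in lines:
        words = line.split()
        is_import = (len(words) == 2 and words[0] == "import"
                     and not line[0].isspace())   # outer level only
        if is_import:
            name = words[1]
            if name not in imported:              # repeat imports do nothing
                imported.add(name)
                result.extend(
                    expand_imports(files[name + ".py"], files, imported))
        else:
            result.append(line)
    return result

files = {
    "util.py": ["import base", "def twice = 2"],
    "base.py": ["def one = 1"],
}
program = ["import util", "import base", "print one"]
expanded = expand_imports(program, files)
print(expanded)
```

The second `import base` in the sample program contributes nothing, because base.py was already substituted while util.py was being expanded.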
In the study of programming languages, a declaration is something that introduces a meaning or binding for an identifier. See §6 for information about declaring classes. This section discusses the other kinds of declaration.
The scope of a declaration is the segment of program text or execution in which that declaration’s binding is the one that defines the identifier. Pyth uses static scoping, which means that the scope of a declaration is fixed, and does not depend on what statements in the program get executed. A declarative region is a section of text that confines the scope of the declarations within it. For example, in the Java declaration
class A {                     //  0
    private int x;            //  1
    void f (int x) {          //  2
        if (x > 0) {          //  3
            int y = ...;      //  4
            f (y);            //  5
        } else {              //  6
            String y = ...;   //  7
            g (y);            //  8
        }
    }                         //  9
    void g () { ... }         // 10
}                             // 11
there are declarative regions for the body of A (lines 1–10), the body of f (lines 2–8), the block that forms the ‘then’ part of the if statement (lines 4–5), and the block that forms the
assignment to it does not create a new variable), and SIZE is “local” to the program as a whole:
def f (y):
    y = g(y)
    x = y+
    print y
SIZE = 13
The declaration, in other words, is implicit. The scope of a local declaration of x includes the entire body of the declarative region containing it. Thus, when a variable is not assigned to in a particular declarative region, its definition is “inherited” (although not in the object-oriented sense) from the enclosing context. For example,
def f ():
    x = 12
    y = 3
    def g ():
        y = 7
        print x, y,
    g()
    print x, y
f()
prints 12 7 12 3. Sometimes, however, you mean for an assignment within a function body to refer to an outer variable. The global declaration allows you to do so, to a limited extent:
x = 12
def tryToSetIt (y):
    x = y
def reallySetIt (y):
    global x
    x = y
tryToSetIt (42)
print x    # Prints 12
reallySetIt (42)
print x    # Prints 42
The identifiers (there may be a comma-separated list) in a global declaration must be defined at the outer level of the program (i.e., outside any def’s or class declarations). The scope of the global declaration is the entire function that contains it, not including any nested function definitions. A global declaration at the outer level is rather useless, but it does still require that its variables be defined somewhere at the outer level.
It follows that a function can only set its own local variables or those at the outer (global) level. If it is nested inside another function, it cannot set that outer function’s variables. The reason is simple: either the variable it assigns to is declared global, in which case it refers to an outer-level variable, or else is declared local by the assignment itself. The scope of a local-variable declaration is the entire declarative region in which it is assigned to at least once. Hence, it is possible to reference the variable before its assignment. Prior to assignment, its value is None^6.
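These scoping rules closely follow Python's, so they can be tried in Python 3 with print replaced by return values. The sketch below is Python, not Pyth, and the function names are invented for the example; the last function shows the one point where the two languages are described as differing, since Python raises an error on a read before assignment where Pyth yields None.

```python
def f():
    x = 12
    y = 3
    def g():
        y = 7             # assignment makes this y local to g
        return (x, y)     # x is inherited from the enclosing f
    return g() + (x, y)   # g's assignment did not touch f's y

x = 12

def try_to_set_it(y):
    x = y                 # creates a local x; the outer x is untouched

def really_set_it(y):
    global x              # x now names the outer-level variable
    x = y

try_to_set_it(42)
after_try = x
really_set_it(42)
after_really = x

def read_before_assignment():
    # v is local throughout the body because it is assigned further down.
    try:
        return v
    except UnboundLocalError:
        return "unbound in Python (Pyth would give None)"
    v = 1

print(f(), after_try, after_really, read_before_assignment())
```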
The declaration
def name = expression
is a constant declaration that defines name to denote the value of the expression. The expressions in constant declarations are evaluated in the usual execution order, as if each declaration were an ordinary statement. The scope of the name, however, will typically start before that, in accordance with the usual scoping rules. Prior to executing the declaration, the value of name is None. One cannot assign to anything defined with def.
Functions may be declared with function declarations, which are specialized def statements, as in
def f (x): return x+
The identifiers listed in parentheses are the formal parameters of the function being defined. Their scope is the body of the function. Functions declared this way immediately inside a class body are instance methods, which means that they get called in a special way (see §5.10). Preceding the declaration with the keyword class, as in
class def g (): return 42
defines a class method (or static method), which is basically just an ordinary function. (See §6 for the significance of class def, which is used inside classes only.) It is also possible to define variables or ordinary constants whose values are functions, to pass functions as parameters, return them as values or store them in other data structures. For example:
def f (x,y): ...
def synonymForF = f
functionVar = f
typedFunctionVar : (All, All) -> All
typedFunctionVar = f
listOfFunctions = [synonymForF]
(^6) This treatment is different from Python’s mostly to avoid introducing still another kind of value: the “undefined value.”
Table 1: Predefined types in Pyth
Name                    Description
Any                     The supertype of all types. Every value is an Any.
Void                    The type of the null object, None.
Object                  A supertype of all user-defined classes.
Int                     Signed, 32-bit integers in the range −2^31 to 2^31 − 1.
Float                   Double-precision floating-point numbers.
Bool                    Logical values (True or False).
String                  Character strings.
Tuple                   Immutable tuples (e.g., (1, 2, 3)).
Xrange                  Results of the xrange function (ranges of Ints).
List                    Mutable (modifiable) sequences (e.g., [1,2,3]).
Dict                    Mapping (e.g., {1: 'A', 2: 'B'}).
File                    External files.
(T_1, ..., T_k)->T_0    The type of all functions that take k ≥ 0 arguments of types T_1, ..., T_k and return a value of type T_0. Notationally, ->T_0 defaults to ->Void if omitted.
Again, this is because, as in Java, the variable T contains a reference to these tuple objects. The second assignment to T does not change the first tuple’s state; it merely changes which tuple T is pointing to. By their nature, immutable objects have an interesting property: one can pretend that their variables do not contain references to objects, and instead contain the objects themselves. After all, as the examples above show, after T is assigned to R, there is nothing you can do to T that changes the contents of the tuple pointed to by R, just as in ordinary Java, once you assign one integer variable to another, any operations on the first variable have no effect on the second. In typical Pyth implementations, we take advantage of this fact, so that Int- and Float-valued variables, for example, do not contain pointers, and doing arithmetic does not require allocating new object storage.
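The contrast between rebinding a variable and changing an object's state can be shown with Python tuples and lists, which have the same reference semantics as described here. The values and the names L1 and L2 are invented for the example and are not taken from the examples this passage refers to.

```python
R = (1, 2, 3)
T = R                 # T and R refer to the same tuple
same_before = T is R
T = T + (4,)          # builds a new tuple and rebinds T; R is unaffected

L1 = [1, 2, 3]
L2 = L1               # both names refer to one mutable list
L2.append(4)          # changes the state of that shared list

print(R, T, L1)
```

Only the list shows the aliasing: after the append, the change is visible through L1 as well, while no operation on T could have altered the tuple still held by R.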
The special value None belongs to all types. Its type is Void, the subtype of all types. In effect, this type inherits all instance methods, including ones in new classes you define, provides implementations for a few, and “implements” all the rest to cause errors. All values belong to type Any (it is a supertype of all types), but one cannot create a value whose dynamic type is Any. The methods defined for Any include all the predefined methods, but with certain exceptions, shown in Table 2, the default implementation of each of these methods is to cause an error^8.
A type declaration has the form
name : type
It applies to the declaration of name (variable, constant, function) that is immediately within the current declarative region. The given type becomes the static type of name wherever the declaration applies. If name is a variable or field, only values that have a subtype of the given type may be assigned to name. The types of the formal parameters of a function may be declared outside the function like this:
augment : (Int, Int) -> Int
def augment (a, b): ...
which causes a and b to have static type Int. In the absence of an explicit type declaration, functions declared with function declarations (see §3.4):
(^8) This is in marked contrast to Java, where, for example, the predefined String methods are defined only on Strings, and will give compile-time errors when attempted on things whose static type is Object. As a result of Pyth’s rule, many errors that would be caught at compile-time in Java are only caught during execution in Pyth. The reason for this is to increase the resemblance between Pyth and its parent language Python, which is entirely dynamically typed. New method names defined by the user in Pyth, by contrast, are not defined in class Any, and so (unlike Python, but like Java), you need type declarations to inform the compiler that these new methods exist in a particular value.