Anatomy of Programming Languages
William R. Cook
January 20, 2013


Chapter 1

Preliminaries

1.1 Preface

1.1.1 What?

This document is a series of notes about programming languages, originally written for students of the undergraduate programming languages course at UT.

1.1.2 Why?

I’m writing these notes because I want to teach the theory of programming languages with a practical focus, but I don’t want to use Scheme (or ML) as the host language. Thus many excellent books do not fit my needs, including Programming Languages: Application and Interpretation, Essentials of Programming Languages, and Concepts in Programming Languages.

This book uses Haskell, a pure functional language. Phil Wadler gives some good reasons to prefer Haskell over Scheme in his review of Structure and Interpretation of Computer Programs. I agree with most but not all of his points. For example, I do not care much for the fact that Haskell is lazy. Only small portions of this book rely upon this feature. I believe Haskell is particularly well suited to writing interpreters. But one must be careful to read Haskell code as one would read poetry, not the way one would read a romance novel. Ponder each line and extract its deep meaning. Don’t skim unless you are pretty sure what you are doing.

The title of this book is derived from one of my favorite books, The Anatomy of Lisp.

obstacle rather than an enabler. The normal reaction in such situations is to work around the problem and move on.

The study of language, including the study of programming languages, requires a different focus. We must examine the language itself, as an artifact. What are its rules? What is the vocabulary? How do different parts of the language work together to convey meaning? A user of a language has an implicit understanding of answers to these questions. But to really study language we must create an explicit description of the answers to these questions.

The concepts of structure and meaning have technical names.

syntax The structure of a language is called its syntax.

semantics The rules that define the meaning of a language are called semantics.

Syntax is a particular way to structure information, while semantics can be viewed as a mapping from syntax to its meaning, or interpretation. The meaning of a program is usually some form of behavior, because programs do things. Fortunately, as programmers we are adept at describing the structure of information, and at creating mappings between different kinds of information and behaviors. This is what data structures and functions/procedures are for.

Thus the primary technique in these notes is to use programming to study programming languages. In other words, we will write programs to represent and manipulate programs. One general term for this activity is metaprogramming.

metaprogram A metaprogram is any program whose input or output is a program.

Familiar examples of metaprograms include compilers, interpreters, and virtual machines. In this course we will read, write, and discuss many metaprograms.

1.3 Introduction to Haskell Programming

The goal of this tutorial is to get the students familiar with Haskell Programming. Students are encouraged to bring their own laptops to go through the installation process of Haskell and corresponding editors, especially if they haven’t tried to install Haskell before or if they had problems with the installation. In any case the lab machines will have Haskell installed and students can also use these machines for the tutorial.

1.3.1 Installing Haskell and related tools

If you have your laptop and have not installed Haskell yet, you can try to install it now. The Haskell platform is the easiest way to install Haskell in Windows or Mac OS.

In Ubuntu Linux you can use:

    sudo apt-get install haskell-platform

1.3.2 Installing Emacs

We recommend using emacs as the editor for Haskell, since it is quite simple to use and it has a nice Haskell mode. In Ubuntu you can do the following to install emacs and the corresponding Haskell mode:

    sudo apt-get install emacs
    sudo apt-get install haskell-mode

In Mac OS you can try to use Aquamacs. Look here for a version of emacs for Windows.

However students are welcome to use whatever editor they prefer. If students are more comfortable using Vim, for example, they are welcome to. The choice of the editor is not important.

1.3.3 Basic steps in Haskell

In this tutorial we are going to implement our first Haskell code. To begin with, Haskell has normal data as in other programming languages. In the examples that follow, lines that begin with Prelude> are input to the Haskell interpreter, ghci, and the next line is the output.

    Prelude> 3 + 8 * 8
    67
    Prelude> True && False
    False
    Prelude> "this is a " ++ "test"
    "this is a test"

As illustrated above, Haskell has standard functions for manipulating numbers, booleans, and strings. Haskell also supports tuples and lists, as illustrated below:

    Prelude> (3 * 8, "test" ++ "1", not True)
    (24,"test1",False)
    Prelude> ()
    ()
    Prelude> [1, 1 + 1, 1 + 1 + 1]
    [1,2,3]
    Prelude> 1 : [2, 3]
    [1,2,3]
    Prelude> 1 : 2 : 3 : []
    [1,2,3]

    nested_if2 x = if ((if x < 0 then -x else x) <= 10)
                     then x
                     else error "Only numbers between [-10, 10] allowed"

Once you have thought about it, you can try these definitions in your Haskell file and see if they are accepted or not.

Question 2: Can you have nested if statements in Java, C or C++? For example, would this be valid in Java?

    int m(int x) {
      if ((if (x < 0) -x else x) > 10)
        return x;
      else
        return 0;
    }

1.3.3.2 Data Types

Data types in Haskell are defined by variants and components. In other words, a data type has a set of variants each with their own constructor, or tag, and each variant has a set of components or fields. For example, here is a data type for simple geometry:

    data Geometry = Point Int Int                -- x and y
                  | Circle Int Int Int           -- x, y, and radius
                  | Rectangle Int Int Int Int    -- top, left, right, bottom

A data type definition always begins with data and is followed by the name of the data type, in this case Geometry. There then follows a list of variants with unique tag names, in this case Point , Circle , and Rectangle , which are separated by a vertical bar |. Following each constructor tag is a list of data types specifying the types of components of that variant. A Point has two components, both integers. A Circle has three components, and a rectangle has 4 integer components. One issue with this notation is that it is not clear what the components mean. The meaning of each component is specified in a comment.

The tags are called constructors because they are defined to construct values of the data type. For example, here are three expressions that construct three geometric objects:

    Point 3 10
    Circle 10 10 10
    Rectangle 0 0 100 10
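To experiment with such values in ghci, the type needs a way to be printed. The following is a minimal sketch, assuming a `deriving (Show, Eq)` clause that is not part of the definition above; the names `origin`, `unitCircle`, and `pointsInColumn3` are made up for illustration.

```haskell
-- Geometry as defined above, plus an assumed deriving clause so that
-- ghci can print and compare values.
data Geometry = Point Int Int                -- x and y
              | Circle Int Int Int           -- x, y, and radius
              | Rectangle Int Int Int Int    -- top, left, right, bottom
  deriving (Show, Eq)

-- Constructors are used like ordinary functions to build values.
origin :: Geometry
origin = Point 0 0

unitCircle :: Geometry
unitCircle = Circle 0 0 1

-- A constructor can also be partially applied:
-- (Point 3) is a function that still expects the y coordinate.
pointsInColumn3 :: [Geometry]
pointsInColumn3 = map (Point 3) [1, 2, 3]
```

Typing `origin` at the ghci prompt after loading this file prints the value using the constructor syntax shown above.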

Data types can also be recursive, allowing the definition of complex data types.

    data Geometry = Point Float Float                  -- x and y
                  | Circle Float Float Float           -- x, y, and radius
                  | Rectangle Float Float Float Float  -- top, left, right, bottom
                  | Composite [Geometry]               -- list of geometry objects

Here is a composite geometric value:

    Composite [Point 3 10, Circle 10 10 10, Rectangle 0 0 100 10]

Two special cases of data types are enumerations, which only have variants and no components, and structures, which only have a single variant, with multiple components. An example enumeration is the data type of days of the week:

    data Days = Monday | Tuesday | Wednesday | Thursday
              | Friday | Saturday | Sunday

In this case the tags are constants. One well known enumeration is the data type Boolean:

    data Boolean = True | False

An example of a structure is a person data type:

data Person = Person String Int Int -- name, age, shoe size

Note that the data type and the constructor tag have the same name. This is common for structures, and doesn’t cause any confusion because data types and constructor functions are always distinguished syntactically in Haskell. Here is an example person:

    Person "William" 42 10

A Haskell program almost always includes more than one data type definition.

1.3.3.3 Parametric Polymorphism and Type-Inference

We have seen that Haskell supports definitions with a type-signature or without. When a definition does not have a signature, Haskell infers one and it is still able to check whether some type-errors exist or not. For example, for the definition

    newline s = s ++ "\n"

Haskell is able to infer the type String -> String. In Haskell, strings are represented as lists of characters, whose type is written [Char]. The operator ++ is a built-in function in Haskell that concatenates two lists. However, for certain definitions it appears as if there is not enough information to infer a type. For example, consider the definition:

    identity x = x
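For a definition like this, Haskell infers a type that contains a type variable, so the function works at every type. The sketch below writes out the signatures only to show what would be inferred; `exampleInt` and `exampleString` are hypothetical names added for illustration.

```haskell
-- The String literal "\n" forces the argument and result to be strings.
newline :: String -> String
newline s = s ++ "\n"

-- Nothing in the body constrains x, so the type uses a type variable `a`
-- that can stand for any type.
identity :: a -> a
identity x = x

-- The same identity function used at two different types.
exampleInt :: Int
exampleInt = identity 3

exampleString :: String
exampleString = identity (newline "hello")
```

In ghci, the command `:type identity` can be used to ask for the inferred type when the signature is left out.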

More Pattern Matching: Pattern matching can be used with different types. For example, here are two definitions with pattern matching on tuples and integers:

    first :: (a, b) -> a
    first (x, y) = x

    isZero :: Int -> Bool
    isZero 0 = True
    isZero n = False

1.3.3.5 Pattern Matching Data Types

Functions are defined over data types by pattern matching. For example, to compute the area of a geometric figure, one would define:

    area :: Geometry -> Float
    area (Point x y)         = 0
    area (Circle x y r)      = pi * r ^ 2
    area (Rectangle t l r b) = (b - t) * (r - l)
    area (Composite cs)      = sum [area c | c <- cs]
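To try the area function, it helps to have the data type and the function together in one file. The following is a self-contained sketch that repeats the Float version of Geometry; the value `example` is a hypothetical name added for illustration, and unused components are written as `_`.

```haskell
data Geometry = Point Float Float                  -- x and y
              | Circle Float Float Float           -- x, y, and radius
              | Rectangle Float Float Float Float  -- top, left, right, bottom
              | Composite [Geometry]               -- list of geometry objects

-- One equation per constructor; the pattern names the components.
area :: Geometry -> Float
area (Point _ _)         = 0
area (Circle _ _ r)      = pi * r ^ 2
area (Rectangle t l r b) = (b - t) * (r - l)
area (Composite cs)      = sum [area c | c <- cs]

-- A point has no area, and the rectangle is 10 high and 100 wide,
-- so the composite has area 0 + 1000.
example :: Geometry
example = Composite [Point 3 10, Rectangle 0 0 100 10]
```

Evaluating `area example` in ghci gives 1000.0. Each equation handles exactly one variant, and the Composite case calls area recursively on each element of the list.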

1.3.3.6 Recursion

In functional languages mutable state is generally avoided and in the case of Haskell (which is purely functional) it is actually forbidden. So how can we write many of the programs we are used to? In particular how can we write programs that in a language like C would normally be written with some mutable state and some type of loop? For example:

    int sum_array(int a[], int num_elements) {
      int i, sum = 0;
      for (i = 0; i < num_elements; i++) {
        sum = sum + a[i];
      }
      return sum;
    }

The answer is to use recursive functions. For example, here is how to write a function that sums a list of integers:

    sumList :: [Int] -> Int
    sumList []       = 0
    sumList (x : xs) = x + sumList xs
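It can help to unfold the recursion by hand for a small input. The sketch below repeats sumList, traces one call in a comment, and adds a hypothetical variant, `sumListAcc`, whose extra argument plays the role of the mutable variable `sum` in the C loop; that variant is an illustration, not part of the original notes.

```haskell
sumList :: [Int] -> Int
sumList []       = 0
sumList (x : xs) = x + sumList xs

-- Unfolding one call step by step:
--   sumList [1, 2, 3]
--     = 1 + sumList [2, 3]
--     = 1 + (2 + sumList [3])
--     = 1 + (2 + (3 + sumList []))
--     = 1 + (2 + (3 + 0))
--     = 6

-- Assumed variant: the running total is passed along as an argument
-- instead of being stored in a variable that is updated in place.
sumListAcc :: [Int] -> Int
sumListAcc list = go 0 list
  where
    go total []       = total
    go total (x : xs) = go (total + x) xs
```

Both functions return the same result for every list; the second one simply makes the correspondence with the loop more visible.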

Question 5: The factorial function can be defined as follows:

    n! = 1                if n = 0
    n! = n * (n - 1)!     if n > 0

Translate this definition into Haskell using recursion and pattern matching.

Question 6: The Fibonacci sequence is: 0 , 1 , 1 , 2 , 3 , 5 , 8 , 13 , 21 , 34 , 55 , 89 , 144 , ...

Write a function:

    fib :: Int -> Int

that given a number returns the corresponding number in the sequence. (If you don’t know Fibonacci numbers you may enjoy finding the recurrence pattern; alternatively you can look it up on Wikipedia.)

Question 7: Write a function:

    mapList :: (a -> b) -> [a] -> [b]

that applies the function of type a -> b to every element of a list. For example:

    Prelude> mapList absolute [4, -5, 9, -7]
    [4,5,9,7]

Question 7: Write a function that given a list of characters returns a list with the corresponding ASCII number of each character. Note that in Haskell, the function ord:

    ord :: Char -> Int

gives you the ASCII number of a character. To use it, add the following just after the module declaration:

    import Data.Char

to import the character handling library.

Question 8: Write a function filterList that given a predicate and a list returns another list with only the elements that satisfy the predicate.

    filterList :: (a -> Bool) -> [a] -> [a]

For example, the following keeps only the even numbers in a list (even is a built-in Haskell function):

    Prelude> filterList even [1, 2, 3, 4, 5]
    [2,4]

Chapter 2

Expressions, Syntax, and Evaluation

This chapter introduces three fundamental concepts in programming languages: expressions, syntax, and evaluation. These concepts are illustrated by a simple language of arithmetic expressions.

expression An expression is a combination of variables, values and operations over these values. For example, the arithmetic expression 2 + 3 uses two numeric values 2 and 3 and an operation + that operates on numeric values.

syntax The syntax of an expression prescribes how the various components of the expressions can be combined. In general it is not the case that the components of expressions can be combined arbitrarily: they must obey certain rules. For example 2 3 or ++ are not valid arithmetic expressions.

evaluation Each expression has a meaning (or value), which is defined by the evaluation of that expression. Evaluation is a process where expressions composed of various components get simplified until eventually we get a value. For example evaluating 2 + 3 results in 5.

2.1 Simple Language of Arithmetic

Let’s have a closer look at the language of arithmetic, which is familiar to every grade-school child.

Figure 2.1: Graphical illustration of abstract structure

These are examples of arithmetic expressions. The rules for understanding such expressions are surprisingly complex. For example, in the third expression the first and third minus signs (−) mean subtraction, while the second and fourth mean that the following number is negative. The last two examples mean the same thing, because of the rule that multiplication must be performed before addition. The third expression is potentially confusing, even given knowledge of the rules for operations. It means (3 − (−2)) − (−7) not 3 − ((−2) − (−7)) because subtraction operations are performed left to right. Part of the problem here is that there is a big difference between our conceptual view of what is going on in arithmetic and our conventions for expressing arithmetic expressions in written form. In other words, there isn’t any confusion about what negative numbers are or what subtraction or exponentiation do, but there is room for confusion about how to write them down.

The conceptual structure of a given expression can be defined much more clearly using pictures. For example, the following pictures make a clear description of the underlying arithmetic operations specified in the expressions given above:

These pictures are similar to sentence diagramming that is taught in grade school to explain the structure of English.

The last picture represents the last two expressions in the previous example. This is because the pictures do not need parentheses, since the grouping structure is explicit.

2.2 Syntax

Syntax comes in two forms: abstract and concrete.

abstract syntax The conceptual structure (illustrated by the pictures) is called the abstract syntax of the language.

concrete syntax The particular details and rules for writing expressions as strings of characters are called the concrete syntax.
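One way to make abstract syntax concrete in Haskell is a recursive data type with one constructor per kind of tree node. The sketch below is a minimal illustration: the constructor names follow those used in this chapter, while the deriving clause, the `evaluate` function, and the value `example` are assumptions added here to show how such a tree can be given a meaning.

```haskell
-- Abstract syntax of arithmetic: each constructor is one kind of tree node.
data Exp = Number Int
         | Add Exp Exp
         | Subtract Exp Exp
         | Multiply Exp Exp
         | Divide Exp Exp
  deriving (Show, Eq)

-- Assumed evaluator: the meaning of a tree is computed from the
-- meanings of its subtrees. Division is integer division here.
evaluate :: Exp -> Int
evaluate (Number n)     = n
evaluate (Add a b)      = evaluate a + evaluate b
evaluate (Subtract a b) = evaluate a - evaluate b
evaluate (Multiply a b) = evaluate a * evaluate b
evaluate (Divide a b)   = evaluate a `div` evaluate b

-- The tree for 3 + 81 * 2: the multiplication is nested inside the
-- addition, so no parentheses or precedence rules are needed.
example :: Exp
example = Add (Number 3) (Multiply (Number 81) (Number 2))
```

Here `evaluate example` computes 3 + 162, which is 165. The grouping is carried entirely by the shape of the tree.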

2.2.2 Concrete Syntax and Grammars

The concrete syntax of a language describes how the abstract concepts in the language are represented as text. For example, let’s consider how to convert the string “3 + 81 * 2” into the abstract syntax Add (Number 3) (Multiply (Number 81) (Number 2)). The first step is to break a text up into tokens.

2.2.2.1 Tokens

Tokens are the basic units of a language. In English, for example, words are tokens. But English also uses many symbol tokens, including “.”, “!”, “?”, “(” and “)”. In the example “3 + 81 * 2” the tokens are 3, “+”, 81, “*”, and 2. It is also important to classify tokens by their kind. The tokens 3, 81 and 2 are sequences of digits. The tokens “+” and “*” are symbol tokens.

token A token is the basic syntactic unit of a language. Tokens can be individual characters, or groups of characters. Tokens are often classified into kinds, for example integers, strings, identifiers.

identifier An identifier is a string of characters that represents a name. Identifiers usually begin with an alphabetic character, then continue with zero or more alphabetic characters, numeric digits, or special symbols. Special symbols that may be used include underscore “_” and “$”, but others may be included.

Tokens are typically as simple as possible, and they must be recognizable without considering any context. This means that the integer “ − 23 ” might not be a good token, because it contains the symbol “ − ”, which is also used in other contexts. More complex languages may have other kinds of tokens (other common kinds of token are keyword and identifier tokens, which are discussed later in the book). Token kinds are similar to the kinds of words in English, where some words are verbs and other words are nouns.

The following data structure is useful for representing basic tokens.

data Token = Digits Int | Symbol String

A Token is either an integer token or a symbol token with a string. For example, the tokens from the string “3 + 81 * 2” are:

    Digits 3
    Symbol "+"
    Digits 81
    Symbol "*"
    Digits 2

The Lexer.hs file contains the code for a simple lexer that creates tokens in this form. It defines a function lexer that transforms a string (i.e. a list of characters) into a list of tokens. The lexer function takes as input a list of symbols and a list of keywords.
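The idea of lexing can be sketched in a few lines. The function below is not the code from Lexer.hs: it is a simplified, hypothetical `tokenize` that takes only the input string, treats every non-digit, non-space character as a one-character symbol, and does not handle keywords.

```haskell
import Data.Char (isDigit, isSpace)

data Token = Digits Int
           | Symbol String
  deriving (Show, Eq)

-- Simplified tokenizer (an assumption for illustration, not Lexer.hs):
-- skip whitespace, group consecutive digits into one Digits token,
-- and turn any other character into a one-character Symbol token.
tokenize :: String -> [Token]
tokenize [] = []
tokenize (c : cs)
  | isSpace c = tokenize cs
  | isDigit c = let (digits, rest) = span isDigit (c : cs)
                in Digits (read digits) : tokenize rest
  | otherwise = Symbol [c] : tokenize cs
```

For the input "3 + 81 * 2" this produces the five tokens listed above. Note that each token is recognized without looking at its context, which is why a leading minus sign becomes its own Symbol token rather than part of a number.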

2.2.2.2 Grammars

Grammars are familiar from studying natural languages, but they are especially important when studying computer languages.

grammar A grammar is a set of rules that specify how tokens can be placed together to form valid expressions of a language.

To create a grammar, it is essential to identify and name the different parts of the language.

syntactic category The parts of a language are called syntactic categories. For example, in English there are many different parts, including verb, noun, gerund, prepositional phrase, declarative sentence, etc. In software languages, example syntactic categories include expressions, terms, functions, types, or classes.

It is certainly possible to be a fluent English speaker without any explicit awareness of the rules of English or the names of the syntactic categories. How many people can identify a gerund? But understanding syntactic categories is useful for studying a language. Creating a complete syntax of English is quite difficult, and irrelevant to the purpose of this book. But defining a grammar for a (very) small fragment of English is useful to illustrate how grammars work. Here is a simple grammar:

    Sentence : Noun Verb
             | Sentence PrepositionalPhase
    PrepositionalPhase : Preposition Noun
    Noun : 'Dick' | 'Jane' | 'Spot'
    Verb : 'runs' | 'talks'
    Preposition : 'to' | 'with'

The names Sentence, PrepositionalPhase, Noun, Verb, and Preposition are the syntactic categories of this grammar. Each line of the grammar is a rule that specifies a syntactic category, followed by a colon (:) and then a sequence of alternative forms for that syntactic category. The words in quotes, including Dick, Jane, and runs, are the tokens of the language.

Here is a translation of the grammar into English:

  • a sentence is either:

    Exp : digits          { Number $1 }
        | '-' digits      { Number (- $2) }
        | Exp '+' Exp     { Add $1 $3 }
        | Exp '-' Exp     { Subtract $1 $3 }
        | Exp '*' Exp     { Multiply $1 $3 }
        | Exp '/' Exp     { Divide $1 $3 }
        | '(' Exp ')'     { $2 }

This grammar is similar to the one given above for English, but each rule includes an action enclosed in curly braces { ... }. The action says what should happen when that rule is recognized. In this case, the action is some Haskell code with calls to constructors to create the abstract syntax that corresponds to the concrete syntax of the rule. The special syntax $n in an action means that the value of the nth item in the grammar rule should be used in the action. For example, in the last rule the $2 refers to the second item in the parenthesis rule, which is Exp.

Written out explicitly, this grammar means:

  • An expression Exp is either
      - a digits token
          ∗ which creates a Number with the integer value of the digits
      - a minus sign followed by a digits token
          ∗ which creates a Number with the negative of the integer value of the digits
      - an expression followed by a ‘+’ followed by an expression
          ∗ which creates an Add node containing the values of the expressions
      - an expression followed by a ‘-’ followed by an expression
          ∗ which creates a Subtract node containing the values of the expressions
      - an expression followed by a ‘*’ followed by an expression
          ∗ which creates a Multiply node containing the values of the expressions
      - an expression followed by a ‘/’ followed by an expression
          ∗ which creates a Divide node containing the values of the expressions
      - an open parenthesis ‘(’ followed by an expression followed by a close parenthesis ‘)’
          ∗ which returns the expression and throws away the parentheses

Given this lengthy and verbose explanation, I hope you can see the value of using a more concise notation! Just like other kinds of software, there are many design decisions that must be made in creating a grammar. Some grammars work better than others, depending on the situation.

2.2.2.4 Ambiguity, Precedence and Associativity

One problem with the straightforward grammar is that it allows for ambiguity.

ambiguity A sentence is ambiguous if there is more than one way that it can be derived by a grammar.

For example, the expression 1 − 2 − 3 is ambiguous because it can be parsed in two ways to create two different abstract syntax trees [TODO: define “parse”]:

    Subtract (Number 1) (Subtract (Number 2) (Number 3))
    Subtract (Subtract (Number 1) (Number 2)) (Number 3)

TODO: show the parse trees? define “parse tree”

The same abstract syntax can be generated by parsing 1 − (2 − 3) and (1 − 2) − 3. We know from our training that the second one is the “correct” version, because subtraction operations are performed left to right. The technical term for this is that subtraction is left associative. (Note that this use of the term associative is not the same as the mathematical concept of associativity.) But the grammar as written doesn’t contain any information about associativity, so it is ambiguous.

Similarly, the expression 1 − 2 ∗ 3 can be parsed in two ways:

    Subtract (Number 1) (Multiply (Number 2) (Number 3))
    Multiply (Subtract (Number 1) (Number 2)) (Number 3)

The same abstract syntax can be generated by parsing 1 − (2 ∗ 3) and (1 − 2) ∗ 3. Again we know that the first version is the correct one, because multiplication should be performed before subtraction. Technically, we say that multiplication has higher precedence than subtraction.
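Ambiguity matters because the different trees have different meanings. The following sketch builds the two groupings of 1 − 2 − 3 and the two groupings of 1 − 2 ∗ 3 by hand and evaluates them; the cut-down Exp type, the `evaluate` function, and the four value names are assumptions introduced only for this illustration.

```haskell
-- Only the constructors needed for the examples.
data Exp = Number Int
         | Subtract Exp Exp
         | Multiply Exp Exp

-- Assumed evaluator for this fragment.
evaluate :: Exp -> Int
evaluate (Number n)     = n
evaluate (Subtract a b) = evaluate a - evaluate b
evaluate (Multiply a b) = evaluate a * evaluate b

-- Two trees for 1 - 2 - 3.
rightGrouped, leftGrouped :: Exp
rightGrouped = Subtract (Number 1) (Subtract (Number 2) (Number 3))  -- 1 - (2 - 3) = 2
leftGrouped  = Subtract (Subtract (Number 1) (Number 2)) (Number 3)  -- (1 - 2) - 3 = -4

-- Two trees for 1 - 2 * 3.
multiplyFirst, subtractFirst :: Exp
multiplyFirst = Subtract (Number 1) (Multiply (Number 2) (Number 3))  -- 1 - (2 * 3) = -5
subtractFirst = Multiply (Subtract (Number 1) (Number 2)) (Number 3)  -- (1 - 2) * 3 = -3
```

The conventional readings are `leftGrouped` and `multiplyFirst`, so a parser that picks the other tree would silently compute a different result for the same text.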

precedence Precedence is an order on grammar rules that defines which rule should apply first in cases of ambiguity. Precedence rules are applied before associativity rules.

associativity Associativity specifies whether binary operators are grouped from the left or the right in order to resolve ambiguity.

The grammar can be adjusted to express the precedence and associativity of the operators. Here is an example:

    Exp : Term                    { $1 }

    Term : Term '+' Factor        { Add $1 $3 }
         | Term '-' Factor        { Subtract $1 $3 }
         | Factor                 { $1 }

    Factor : Factor '*' Primary   { Multiply $1 $3 }
           | Factor '/' Primary   { Divide $1 $3 }