Programming Language Semantics

A Rewriting Approach

Grigore Roșu

University of Illinois at Urbana-Champaign

2.2 Basic Computability Elements

In this section we recall the basic concepts of computability theory needed for other results in the book, and introduce our notation for them. This section is by no means intended as a substitute for the much more thorough presentations found in dedicated computability textbooks (some mentioned in Section 2.2.4).

2.2.1 Turing Machines

Turing machines are abstract computational models used to formally capture the informal notion of a computing system. The Church-Turing thesis postulates that any computing system, any algorithm, or any program in any programming language running on any computer, can be equivalently simulated by a Turing machine. Having a formal definition of computability allows us to rigorously investigate and understand what can and what cannot be done using computing devices, regardless of what languages are used to program them. Intuitively, by a computing device we understand a piece of machinery that carries out tasks by successively applying sequences of instructions and using, in principle, unlimited memory; such sequences of instructions are today called programs, or procedures, or algorithms. Turing machines are used for their theoretical value; they are not meant to be physically built.

A Turing machine is a finite state device with infinite memory. The memory is very primitively organized, as one or more infinite tapes of cells that are sequentially accessible through heads that can move only to the adjacent cell on the left or on the right. Each cell can hold a bounded piece of data, typically a Boolean, or bit, value. The tape is also used as the input/output of the machine. The computational steps carried out by a Turing machine are also very primitive: in a state, depending on the value in the current cell, a Turing machine can only rewrite the current cell on the tape and/or move the head to the left or to the right. Therefore, a Turing machine does not have the direct capability to perform random memory access, but it can be shown that it can simulate it.

There are many equivalent definitions of Turing machines in the literature. We prefer one with a tape that is infinite at both ends and describe it next (interestingly, an almost identical machine was proposed by Emil Post independently from Alan Turing, also in 1936; see Section 2.2.4).

Consider a mechanical device which has associated with it a tape of infinite length in both directions, partitioned into spaces of equal size, called cells, which are able to hold either a 0 or a 1 and are rewritable. The device examines exactly one cell at any time, and can, potentially nondeterministically, perform any of the following four operations (or commands):

  1. Write a 1 in the current cell;
  2. Write a 0 in the current cell;
  3. Shift one cell to the right;
  4. Shift one cell to the left.

The device performs one operation per unit time, called a step. We next give a formal definition.

Definition 7. A (deterministic) Turing machine M is a 6-tuple (Q, B, q_s, q_h, C, M), where:

  • Q is a finite set of internal states;
  • q_s ∈ Q is the starting state of M;
  • q_h ∈ Q is the halting state of M;
  • B is the set of symbols of M; we assume without loss of generality that B = {0, 1};
  • C = B ∪ {→, ←} is the set of commands of M;
  • M : (Q − {q_h}) × B → Q × C is a total function, the transition function of M.

We assume that the tape contains only 0’s before the machine starts performing.

States and transition function (the graphical representation of the machine is not reproduced here):

  Q = {q_s, q_h, q_1, q_2}

  M(q_s, 0) = (q_1, →)
  M(q_s, 1) = anything
  M(q_1, 0) = (q_2, 1)
  M(q_1, 1) = (q_1, →)
  M(q_2, 0) = (q_h, 0)
  M(q_2, 1) = (q_2, ←)

Sample computation (the current cell, as determined by the transition function above, is shown in square brackets):

  (q_s, 0^ω [0]111 0^ω)
    →_M (q_1, 0^ω [1]11 0^ω)
    →_M (q_1, 0^ω 1[1]1 0^ω)
    →_M (q_1, 0^ω 11[1] 0^ω)
    →_M (q_1, 0^ω 111[0] 0^ω)
    →_M (q_2, 0^ω 111[1] 0^ω)
    →_M (q_2, 0^ω 11[1]1 0^ω)
    →_M (q_2, 0^ω 1[1]11 0^ω)
    →_M (q_2, 0^ω [1]111 0^ω)
    →_M (q_2, 0^ω [0]1111 0^ω)
    →_M (q_h, 0^ω [0]1111 0^ω)

Figure 2.1: Turing machine M computing the successor function, and sample computation
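The machine of Figure 2.1 can be run mechanically. The following is a minimal Python sketch of Definition 7, not part of the original text: the names `DELTA`, `run` and `visited` are hypothetical, the strings "R" and "L" stand for the commands → and ←, and the value chosen for M(q_s, 1), which the figure leaves unconstrained, is an arbitrary assumption.

```python
from collections import defaultdict

# Transition function of the successor machine in Figure 2.1.
# A command is either a bit to write (0 or 1) or a head move ("R" for →, "L" for ←).
DELTA = {
    ("qs", 0): ("q1", "R"),
    ("qs", 1): ("qh", 1),   # assumption: the figure allows anything here
    ("q1", 0): ("q2", 1),
    ("q1", 1): ("q1", "R"),
    ("q2", 0): ("qh", 0),
    ("q2", 1): ("q2", "L"),
}

def run(delta, bits, start="qs", halt="qh", max_steps=10_000):
    """Run the machine with the head on the first cell of `bits`.

    Returns (tape, head, steps). The step budget is only a safety net:
    termination is undecidable in general.
    """
    tape = defaultdict(int, enumerate(bits))  # every other cell reads as 0
    state, head, steps = start, 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        state, command = delta[(state, tape[head])]
        if command == "R":
            head += 1
        elif command == "L":
            head -= 1
        else:
            tape[head] = command  # write a bit in the current cell
        steps += 1
    return tape, head, steps

def visited(tape):
    """Contents of the cells between the leftmost and rightmost touched cell."""
    return [tape[i] for i in range(min(tape), max(tape) + 1)]

tape, head, steps = run(DELTA, [0, 1, 1, 1])
print(visited(tape), head, steps)
```

Each iteration of the loop performs exactly one command, so `steps` counts the arrows of the sample computation.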

to functions that take any natural numbers as input and produce any natural numbers as output. For example, Figure 2.1 shows a Turing machine computing the successor function. Cells containing 0 can then be used as number separators, when more natural numbers are needed. For example, a Turing machine computing a binary operation on natural numbers would run on configurations (q_s, 0^ω 0 1^{m+1} 0 1^{n+1} 0^ω) and would halt on configurations (q_h, 0^ω 0 1^{k+1} 0^ω), where m, n, k are natural numbers. One can similarly have tape encodings of rational numbers; for example, one can encode the number m/n as m followed by n with two 0 cells in-between (and keep the one-0-cell convention for argument separation). Real numbers are not obviously representable, though. A Turing machine is said to compute a real number r iff it can finitely approximate r (for example using a rational number) with any desired precision; one way to formalize this is as follows: Turing machine M_r computes r iff when run on input natural number p, it halts with result rational number m/n such that |r − m/n| < 1/10^p. If a real number can be computed by a Turing machine then it is called Turing computable. Many real numbers, e.g., π, e, √2, etc., are Turing computable.
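To make the definition of a computable real concrete, here is a small Python sketch for r = √2. It is an illustration only and uses an ordinary program rather than a Turing machine; the function name `sqrt2_approx` is hypothetical. Given p, it returns a rational m/n with |√2 − m/n| < 1/10^p using exact integer arithmetic.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_approx(p):
    """Return a rational m/n with |sqrt(2) - m/n| < 1/10**p."""
    n = 10 ** (p + 1)
    # m = floor(n * sqrt(2)), so m/n <= sqrt(2) < (m + 1)/n and the error is below 1/n.
    m = isqrt(2 * n * n)
    return Fraction(m, n)

for p in range(4):
    print(p, sqrt2_approx(p))
```

The error bound follows from 1/n = 1/10^{p+1} < 1/10^p, and no floating-point arithmetic is involved, so the guarantee holds for every p.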

2.2.2 Universal Machines, the Halting Problem, and Decision Problems

Since Turing machines have finite descriptions, they can themselves be encoded as natural numbers. Therefore, we can refer to “the k-th Turing machine”, where k is a natural number, the same way we can refer to the i-th input to a Turing machine. A universal Turing machine is a Turing machine that can simulate an arbitrary Turing machine on arbitrary input. The universal machine essentially achieves this by reading both the description of the machine to be simulated and its input from its own tape. There are various constructions of universal Turing machines in the literature, which we do not repeat here. We only note that we can construct such a universal machine U which terminates precisely on all inputs of the form 1^k 0 1^i where Turing machine k terminates on input i. This immediately implies that the language {1^k 0 1^i | Turing machine k terminates on input i} is recursively enumerable. However, the undecidability of the famous halting problem (does a given Turing machine terminate on a given input?) implies that this language is not recursive; more specifically, it is not co-recursively enumerable.

Since the elements of many mathematical domains can be encoded as words in B*, the terminology in Definition 8 is also used for decision problems over such domains. For example, the decision problem of whether a graph given as input has a cycle or not is recursive; in other words, the set of cyclic graphs (under some appropriate encoding in B*) is recursive/decidable. Since there is a bijective correspondence between elements in B* and natural numbers, and also between tuples of natural numbers and natural numbers, decision problems are often regarded as one- or multi-argument relations or predicates over natural numbers. For example, a subset R ⊆ B* can be regarded as a predicate, say of three natural number arguments, where R(i, j, k) for i, j, k ∈ Nat indicates that the encoding of (i, j, k) belongs to R.

While the halting problem is typically an excellent vehicle to formally state and prove that certain problems are undecidable, so they cannot be solved by computers no matter how powerful they are or what programming languages are used, its reflective nature sometimes makes the halting problem hard to use in practice. The Post correspondence problem (PCP) is another canonical undecidable problem, which is sometimes easier to use to show that other problems are undecidable. The PCP can be stated as follows: given a set of (domino-style) tiles each containing a top and a bottom string of 0/1 bits, is it possible to find a sequence of such tiles, possibly with repetitions, so that the concatenated top strings equal the concatenated bottom strings? For example, if the given tiles are the following:

then the answer to the PCP is positive, because the sequence of tiles 3 2 3 1 yields the same concatenated strings at the top and at the bottom.
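Checking a proposed tile sequence is easy even though finding one is undecidable in general. The Python sketch below illustrates this. The tile set `TILES` is a hypothetical instance chosen so that the sequence 3 2 3 1 is a solution; it is not claimed to be the tile set of the original figure. The function names are also illustrative.

```python
from itertools import product

# Hypothetical tiles, numbered from 1, each given as (top string, bottom string).
TILES = {1: ("1", "011"), 2: ("10", "11"), 3: ("001", "00")}

def is_solution(tiles, sequence):
    """Check whether the concatenated tops equal the concatenated bottoms."""
    top = "".join(tiles[i][0] for i in sequence)
    bottom = "".join(tiles[i][1] for i in sequence)
    return len(sequence) > 0 and top == bottom

def bounded_search(tiles, max_len):
    """Return a shortest solution of length at most max_len, or None.

    A None result says nothing about longer sequences: without a bound this
    search is only a semi-decision procedure.
    """
    for n in range(1, max_len + 1):
        for sequence in product(sorted(tiles), repeat=n):
            if is_solution(tiles, sequence):
                return list(sequence)
    return None

print(is_solution(TILES, [3, 2, 3, 1]))
print(bounded_search(TILES, 4))
```

The unbounded version of `bounded_search` halts exactly on the positive instances, which shows that the PCP is recursively enumerable.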

2.2.3 The Arithmetic Hierarchy

The arithmetical hierarchy defines classes of problems of increasing difficulty, called degrees, as follows:

  Σ^0_0 = Π^0_0 = {R | R recursive}

  Σ^0_{n+1} = {P | ∃Q ∈ Π^0_n, ∀i (P(i) ↔ ∃j Q(i, j))}

  Π^0_{n+1} = {P | ∃Q ∈ Σ^0_n, ∀i (P(i) ↔ ∀j Q(i, j))}

For example, the Σ^0_1 degree consists of the predicates P over natural numbers for which there is some recursive predicate R such that for any i ∈ Nat, P(i) holds iff R(i, j) holds for some j ∈ Nat. It can be shown that Σ^0_1 contains precisely the recursively enumerable predicates. Similarly, Π^0_1 consists of the predicates P for which there is some recursive predicate R such that for any i ∈ Nat, P(i) holds iff R(i, j) holds for all j ∈ Nat, which is precisely the set of co-recursively enumerable predicates. An important degree is also Π^0_2, which consists of the predicates P over natural numbers for which there is some recursive predicate R such that for any i ∈ Nat, P(i) holds iff for any j ∈ Nat there is some k ∈ Nat such that R(i, j, k) holds. A prototypical Π^0_2 problem is the following: given a Turing machine M, does it terminate on all inputs? The difficulty of the problem lies in the fact that there are infinitely many (but enumerable) inputs, so one can never be “done” with testing them; moreover, even for a given input, one does not know when to stop running the machine and reject the input. However, if one is given, for each input accepted by the machine, a run of the machine, then one can simply check the run and declare the input indeed accepted. Therefore, M terminates on all inputs iff for any input there exists some accepting run of M, which makes it a Π^0_2 problem, because checking whether a given run of a given Turing machine on a given input accepts the input is decidable. Moreover, it can be shown that if we pick M to be the universal Turing machine U discussed above, then the following problem, which we refer to as Totality from here on, is in fact Π^0_2-complete:

sorts:
  Cell, Tape, Configuration
operations:
  0, 1 : → Cell
  zeros : → Tape
  _:_ : Cell × Tape → Tape
  q : Tape × Tape → Configuration    (one such operation for each q ∈ Q)
generic equation:
  zeros = 0 : zeros
specific equations:
  q(L, b:R) = q′(L, b′:R)       (one equation for each q, q′ ∈ Q, b, b′ ∈ Cell with M(q, b) = (q′, b′))
  q(L, b:R) = q′(b:L, R)        (one equation for each q, q′ ∈ Q, b ∈ Cell with M(q, b) = (q′, →))
  q(B:L, b:R) = q′(L, B:b:R)    (one equation for each q, q′ ∈ Q, b ∈ Cell with M(q, b) = (q′, ←))

Figure 2.2: Lazy equational logic representation E^lazy_M of Turing machine M

2.4.3 Computation as Equational Deduction

Here we discuss simple equational logic encodings of Turing machines (see Section 2.2.1 for general Turing machine notions). The idea is to associate an equational theory to any Turing machine, so that an input is accepted by the Turing machine if and only if an equation corresponding to that input can be proved from the equational theory of the Turing machine, using conventional equational deduction. Moreover, as seen in Section 2.5.3, the resulting equational theories can be executed as rewrite theories by rewrite engines, thus yielding actual Turing machine interpreters. We present two encodings, both based on intuitions from lazy data-structures, specifically stream data-structures. The first is simpler but requires lazy rewriting support from the rewrite engine in order to be executed, while the second can be executed by any rewrite engine.

Lazy Equational Representation

Our first representation of Turing machines in equational logic is based on the idea that the infinite tape can be finitely represented by means of self-expanding stream data-structures. In spite of being infinite sequences of cells, like the Turing machine tapes, many interesting streams can be finitely specified using equations. For example, the stream of zeros, zeros = 0 : 0 : 0 : ···, can be defined as zeros = 0 : zeros. Since at any given moment the portions of a Turing machine tape to the left and to the right of the head have a suffix consisting of an infinite sequence of 0 cells, it is natural to represent them as streams of the form b_1 : b_2 : ··· : b_n : zeros. When the head is on cell b_n and the command is to move the head to the right, the self-expanding equational definition of zeros can produce one more 0, so that the head can move onto it. To expand zeros on a by-need basis, and thus to avoid undesired non-termination due to the uncontrolled application of the self-expanding equation of zeros, this approach requires an equational/rewrite engine with support for lazy evaluation/rewriting in order to be executed.

Figure 2.2 shows how a Turing machine M = (Q, B, q_s, q_h, C, M) can be associated with a computationally equivalent equational logic theory E^lazy_M. Except for the self-expanding equation of the zeros stream and our stream representation of the two-infinite-end tape, the equations of E^lazy_M are identical to the transition relation on Turing machine configurations discussed right after Definition 7. The self-expanding equation of zeros guarantees that enough 0’s can be provided when the head reaches one or the other end of the sequence of cells visited so far. The result below shows that E^lazy_M is proof-theoretically equivalent to M:

Theorem 7. The following are equivalent:

(1) The Turing machine M terminates on input b_1 b_2 ... b_n;

(2) E^lazy_M |= q_s(zeros, b_1 : b_2 : ··· : b_n : zeros) = q_h(l, r) for some terms l, r of sort Tape.

Proof.... 

The equations in E^lazy_M can be applied in any direction, so an equational proof of E^lazy_M |= q_s(zeros, b_1 : b_2 : ··· : b_n : zeros) = q_h(l, r) need not correspond step-for-step to the computation of M on input b_1 b_2 ... b_n. We will see in Section 2.5.3 that by orienting the specific equations in Figure 2.2 into rewrite rules, we obtain a rewrite logic theory which faithfully captures, step-for-step, the computational granularity of M. Note that in Figure 2.2 we preferred to define a configuration construct q : Tape × Tape → Configuration for each q ∈ Q. A natural alternative would have been to define an additional sort State for the Turing machine states, a constant q : → State for each q ∈ Q, and one generic configuration construct _(_,_) : State × Tape × Tape → Configuration, as we do in the subsequent representation of Turing machines as rewriting logic theories (see Figure 2.3). The reason we did not do that here is twofold: first, in functional languages like Haskell it is very natural to associate a function to each such configuration construct q : Tape × Tape → Configuration, while it would take some additional effort to implement the second approach; second, the approach in this section is more compact than the one below.
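The idea of one function per state can be sketched in any language with first-class functions. The Python sketch below is an illustration under stated assumptions, not the book's implementation: the names `uncons`, `make_state`, `run` and `DELTA` are hypothetical, tapes are tuples whose first element is the cell nearest the head, and the empty tuple plays the role of zeros, unfolded on demand into 0 : zeros. The table `DELTA` is the successor machine of Figure 2.1, with an arbitrary choice for the unconstrained entry M(q_s, 1).

```python
def uncons(tape):
    """View a tape as b : rest; the empty tape stands for zeros = 0 : zeros."""
    return (tape[0], tape[1:]) if tape else (0, ())

def make_state(name, delta, states):
    """Build the function q : Tape x Tape -> Configuration for state `name`."""
    def q(left, right):
        b, rest = uncons(right)
        target, command = delta[(name, b)]
        if command == "R":                      # q(L, b:R) = q'(b:L, R)
            return states[target], (b,) + left, rest
        if command == "L":                      # q(B:L, b:R) = q'(L, B:b:R)
            big_b, left_rest = uncons(left)
            return states[target], left_rest, (big_b, b) + rest
        return states[target], left, (command,) + rest   # q(L, b:R) = q'(L, b':R)
    return q

def run(delta, bits, start="qs", halt="qh"):
    """Apply state functions until the halting state is reached.

    No step bound is imposed, so this loops forever on diverging machines.
    """
    names = {q for q, _ in delta} | {q for q, _ in delta.values()}
    states = {}
    for name in names:
        states[name] = make_state(name, delta, states)
    q, left, right = states[start], (), tuple(bits)
    while q is not states[halt]:
        q, left, right = q(left, right)
    return left, right

# Successor machine of Figure 2.1; ("qs", 1) is an arbitrary assumption.
DELTA = {
    ("qs", 0): ("q1", "R"), ("qs", 1): ("qh", 1),
    ("q1", 0): ("q2", 1),   ("q1", 1): ("q1", "R"),
    ("q2", 0): ("qh", 0),   ("q2", 1): ("q2", "L"),
}

print(run(DELTA, [0, 1, 1, 1]))
```

Because Python evaluates eagerly, the by-need expansion of zeros is made explicit in `uncons` instead of relying on a lazy evaluation strategy.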

Unrestricted Equational Representation

The equational representation of Turing machines above is almost as simple as it can be and, additionally, can be easily executed on programming languages or rewrite engines with support for lazy evaluation/rewriting, such as Haskell or Maude (see, e.g., Section 2.5.6). However, because it requires lazy evaluation/rewriting and because the equivalence classes of configurations have infinitely many terms, its use is limited to systems that support strategies. Here we show that a simple idea can turn the representation in the previous section into an elementary one which can be executed on any equational/rewrite engine: replace the self-expanding and non-terminating (when regarded as a rewrite rule) equation “zeros = 0 : zeros” with configuration equations of the form “q(zeros, R) = q(0 : zeros, R)” and “q(L, zeros) = q(L, 0 : zeros)”; these equations play the same role of expanding zeros by need, but avoid non-termination when applied as rewrite rules.

Figure 2.3 shows our unrestricted representation of Turing machines as equational logic theories. There are some minor differences between the representation in Figure 2.3 and the one in Figure 2.2. For example, note that in order to add the two equations above for the expansion of zeros in a generic manner for any state, we separate the states from the configuration construct. In other words, instead of having an operation q : Tape × Tape → Configuration for each q ∈ Q as in Figure 2.2, we now have one additional sort State, a generic configuration construct _(_,_) : State × Tape × Tape → Configuration, and a constant q : → State for each q ∈ Q. This change still allows us to write configurations as terms q(l, r), so we do not need to change the equations corresponding to the Turing machine transitions. With this modification in the signature, we can now remove the troubling equation zeros = 0 : zeros from the representation in Figure 2.2 and replace it with the two safe equations in Figure 2.3.

Let E_M be the equational logic theory in Figure 2.3.

Theorem 8. The following are equivalent:

sort:
  Stream
operations:
  _:_ : Int × Stream → Stream
  head : Stream → Int
  tail : Stream → Stream
  zeros : → Stream
  zip : Stream × Stream → Stream
  add : Stream → Stream
  fibonacci : → Stream
equations:
  head(X : S) = X
  tail(X : S) = S
  zeros = 0 : zeros
  zip(X : S1, S2) = X : zip(S2, S1)
  add(X1 : X2 : S) = (X1 +Int X2) : add(S)
  fibonacci = 0 : 1 : add(zip(fibonacci, tail(fibonacci)))

Figure 2.8: Streams of integers defined as an algebraic datatype. The variables S, S1, S2 have sort Stream and the variables X, X1, X2 have sort Int.

Streams

Figure 2.8 shows an example of a data-structure whose elements are infinite sequences, called streams, together with several particular streams and operations on them. Here we prefer to be more specific than in the previous examples and work with streams of integers. We assume the integers and operations on them already defined; specifically, we assume Int to be their sort and operations on them indexed with Int to distinguish them from other homonymous operations, e.g., +Int, etc. The operation _:_ adds a given integer to the beginning of a given stream, and the dual operations head and tail extract the head (integer) and the tail (stream) from a stream. The stream zeros contains only 0 elements. The stream operation zip merges two streams by interleaving their elements, and add generates a new stream by adding each pair of consecutive elements of a given stream. The stream fibonacci consists of the Fibonacci sequence (see Exercise 25). It is interesting to note that the equational specification of streams in Figure 2.8 is one whose initial algebra semantics is likely not the model that we want. Indeed, the initial algebra here would consist of infinite classes of finite terms, where any two terms in any class are provably equal, for example {zeros, 0 : zeros, 0 : 0 : zeros, ...}. While this is a valid and interesting model of streams, it is likely not what one has in mind when one thinks of streams as infinite sequences. Nevertheless, the intended stream model is among the models/algebras of this equational specification, so any equational deduction or reduction that we perform, with or without strategies, is sound (see Exercise 26).
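The equations of Figure 2.8 can be tried out with Python generators, which produce stream elements only on demand. This is a rough sketch in the intended model of infinite sequences, not a rewrite-based execution; `zip_streams` and `take` are hypothetical names (the former avoids shadowing Python's built-in `zip`), and all streams are assumed infinite.

```python
from itertools import islice

def zeros():
    """zeros = 0 : zeros"""
    while True:
        yield 0

def tail(s):
    """tail(X : S) = S"""
    next(s)
    yield from s

def zip_streams(s1, s2):
    """zip(X : S1, S2) = X : zip(S2, S1), i.e. interleave the two streams."""
    while True:
        yield next(s1)
        s1, s2 = s2, s1

def add(s):
    """add(X1 : X2 : S) = (X1 + X2) : add(S)"""
    while True:
        x1 = next(s)
        x2 = next(s)
        yield x1 + x2

def fibonacci():
    """fibonacci = 0 : 1 : add(zip(fibonacci, tail(fibonacci)))"""
    yield 0
    yield 1
    yield from add(zip_streams(fibonacci(), tail(fibonacci())))

def take(n, s):
    """First n elements of stream s, as a list."""
    return list(islice(s, n))

print(take(10, fibonacci()))
```

Every recursive call to `fibonacci()` creates a fresh generator, so nothing is shared and the cost grows exponentially with the number of elements requested. A lazy language or a memoizing stream type would avoid this; the sketch is only meant to show that the equations determine the sequence.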

2.4.7 Notes

Equational encodings of general computation into equational deduction are well-known; for example, [7, 1] show such encodings, where the resulting equational specifications, if regarded as term rewrite systems (TRSs), are confluent and terminate whenever the original computation terminates. Our goal in this section is to discuss equational encodings of (Turing machine) computation. These encodings will be used later in the paper to show the Π^0_2-hardness of the equational satisfaction problem in the initial algebra. While we could have used existing encodings of Turing machines as TRSs, we found them more complex and intricate than needed for our purpose in this paper. Consequently (and also for the sake of self-containment), we recall the more recent (simple) encoding and corresponding proofs from [65]. Since the subsequent encoding is general purpose rather than specific to our Π^0_2-hardness result, the content of this section may have a more pedagogical than technical nature. For example, the references to TRSs are technically only needed to prove the equational encoding correct, so they could have been removed from the main text and added only in the proofs, but we find them pedagogically interesting and potentially useful for other purposes. The equational encodings that follow can be faithfully used as TRS Turing-complete computational engines, because each rewrite step corresponds to precisely one computation step in the Turing machine; in other words, there are no artificial rewrite steps.

2.4.8 Exercises

Exercise 24. Eliminate the two equations in Figure 2.3 as discussed right after Theorem 8, and prove a result similar to Theorem 8 for the new representation.

Exercise 25. Show that the fibonacci stream defined in Figure 2.8 indeed defines the sequence of Fibonacci numbers. This exercise has two parts: first formally state what to prove, and second prove it.

Exercise 26. Consider the equational specification of streams in Figure 2.8. Define the intended model/algebra of streams over integer numbers with constant streams and functions on streams corresponding to the various operations in this specification. Then show that this model indeed satisfies all the equations in Figure 2.8. Describe also its default initial model and compare it with the intended model. Are they isomorphic?

sorts:
  Cell, Tape, State, Configuration
operations:
  0, 1 : → Cell
  zeros : → Tape
  _:_ : Cell × Tape → Tape
  _(_,_) : State × Tape × Tape → Configuration
  q : → State    (one such constant for each q ∈ Q)
equations:
  S(zeros, R) = S(0 : zeros, R)
  S(L, zeros) = S(L, 0 : zeros)
rules:
  q(L, b:R) → q′(L, b′:R)       (one rule for each q, q′ ∈ Q, b, b′ ∈ Cell with (q′, b′) ∈ M(q, b))
  q(L, b:R) → q′(b:L, R)        (one rule for each q, q′ ∈ Q, b ∈ Cell with (q′, →) ∈ M(q, b))
  q(B:L, b:R) → q′(L, B:b:R)    (one rule for each q, q′ ∈ Q, b ∈ Cell with (q′, ←) ∈ M(q, b))

Figure 2.11: Unrestricted rewriting logic representation R_M of Turing machine M

for any finite sequences of bits u, v, u′, v′ ∈ {0, 1}*, any bits b, b′ ∈ {0, 1}, and any states q, q′ ∈ Q, where if u = b_1 b_2 ... b_{n−1} b_n, then \overleftarrow{u} = b_n : b_{n−1} : ··· : b_2 : b_1 : zeros and \overrightarrow{u} = b_1 : b_2 : ··· : b_{n−1} : b_n : zeros. Finally, the following are equivalent:

(1) The Turing machine M terminates on input b_1 b_2 ... b_n;

(2) R^lazy_M |= q_s(zeros, b_1 : b_2 : ··· : b_n : zeros) → q_h(l, r) for some terms l, r of sort Tape; note though that R^lazy_M does not terminate on term q_s(zeros, b_1 : b_2 : ··· : b_n : zeros) as an unrestricted rewrite system, since the equation zeros = 0 : zeros (regarded as a rewrite rule) can apply forever, thus yielding infinite equational classes of configurations with no canonical forms, but R^lazy_M terminates on q_s(zeros, b_1 : b_2 : ··· : b_n : zeros) if the stream construct operation _:_ : Cell × Tape → Tape has a lazy rewriting strategy on its second argument.

Proof.... 

Therefore, unlike the equational logic theory E^lazy_M in Theorem 7, the rewrite logic theory R^lazy_M faithfully captures, step-for-step, the computational granularity of M. Recall that equational deduction steps do not count as computational, or rewrite, steps in rewriting logic, which allows the self-expanding equation of zeros to be applied silently in the background. Since there are no artificial rewrite steps, we can conclude that R_M actually is precisely M and not an encoding of it. Theorem 10 thus showed not only that rewriting logic is Turing complete, but also that it faithfully captures the computational granularity of the represented Turing machines.

Unrestricted Rewrite Logic Representations

Figure 2.11 shows our unrestricted representation of Turing machines as rewriting logic theories, following the same idea as the equational representation in Section 2.4.3 (Figure 2.3). Let RM be the rewriting logic theory in Figure 2.11. Then the following result holds:

Theorem 11. The rewriting logic theory R_M is confluent. Moreover, the Turing machine M and the rewrite theory R_M are step-for-step equivalent, that is,

  (q, 0^ω u b v 0^ω) →_M (q′, 0^ω u′ b′ v′ 0^ω)   if and only if   R_M |= q(\overleftarrow{u}, b : \overrightarrow{v}) →^1 q′(\overleftarrow{u′}, b′ : \overrightarrow{v′})

for any finite sequences of bits u, v, u′, v′ ∈ {0, 1}*, any bits b, b′ ∈ {0, 1}, and any states q, q′ ∈ Q, where if u = b_1 b_2 ... b_{n−1} b_n, then \overleftarrow{u} = b_n : b_{n−1} : ··· : b_2 : b_1 : zeros and \overrightarrow{u} = b_1 : b_2 : ··· : b_{n−1} : b_n : zeros. Finally, the following are equivalent:

(1) The Turing machine M terminates on input b_1 b_2 ... b_n;

(2) R_M terminates on term q_s(zeros, b_1 : b_2 : ··· : b_n : zeros) as an unrestricted rewrite system and R_M |= q_s(zeros, b_1 : b_2 : ··· : b_n : zeros) → q_h(l, r) for some terms l, r of sort Tape.

Proof.... 

As with the lazy representation of Turing machines in rewriting logic discussed above, the rewrite theory R_M is the Turing machine M, in that there is a step-for-step equivalence between computational steps in M and rewrite steps in R_M. Recall, again, that equations do not count as rewrite steps, their role being to structurally rearrange the term so that rewrite rules can apply; indeed, that is precisely the intended role of the two equations in Figure 2.11 (they reveal new blank cells on the tape whenever needed). Similarly to the equational case in Section 2.4.3, the two generic equations can be completely eliminated. However, this time we have to add more Turing-machine-specific rules instead. For example, if (q′, ←) ∈ M(q, b) then, in addition to the last rule in Figure 2.11, we also include the rule:

q(zeros, b :R) → q′(zeros, 0 : b :R)

This way, one can expand zeros and apply the transition in one rewrite step, instead of one equational step and one rewrite step. Doing that systematically for all the transitions allows us to eliminate the need for equations entirely; the price to pay is, of course, that the number of rules increases.
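The growth in the number of rules can be made concrete by generating them. The Python sketch below prints one possible systematic completion. Only the left-move rule for a zeros left tape is given in the text; the remaining zeros-specific rules are our assumption, obtained by composing each equation of Figure 2.11 with the rule it enables. The function name `rules_without_equations`, the ASCII arrow `->`, and the encoding of commands as "L", "R" or a bit are all hypothetical.

```python
def rules_without_equations(delta):
    """Rules (as strings) for a transition relation, with no equations needed.

    `delta` maps (state, bit) to a set of (state, command) pairs, where a
    command is "L", "R", or the bit to write.
    """
    rules = []
    for (q, b), targets in sorted(delta.items()):
        for q2, cmd in sorted(targets, key=str):
            if cmd == "L":
                rules.append(f"{q}(B:L, {b}:R) -> {q2}(L, B:{b}:R)")
                rules.append(f"{q}(zeros, {b}:R) -> {q2}(zeros, 0:{b}:R)")
                if b == 0:  # the current cell may be an unexpanded blank
                    rules.append(f"{q}(B:L, zeros) -> {q2}(L, B:0:zeros)")
                    rules.append(f"{q}(zeros, zeros) -> {q2}(zeros, 0:0:zeros)")
            elif cmd == "R":
                rules.append(f"{q}(L, {b}:R) -> {q2}({b}:L, R)")
                if b == 0:
                    rules.append(f"{q}(L, zeros) -> {q2}(0:L, zeros)")
            else:
                rules.append(f"{q}(L, {b}:R) -> {q2}(L, {cmd}:R)")
                if b == 0:
                    rules.append(f"{q}(L, zeros) -> {q2}(L, {cmd}:zeros)")
    return rules

# Successor machine of Figure 2.1 as a relation; ("qs", 1) is an arbitrary assumption.
SUCC = {
    ("qs", 0): {("q1", "R")}, ("qs", 1): {("qh", 1)},
    ("q1", 0): {("q2", 1)},   ("q1", 1): {("q1", "R")},
    ("q2", 0): {("qh", 0)},   ("q2", 1): {("q2", "L")},
}

for rule in rules_without_equations(SUCC):
    print(rule)
```

For this machine the six transitions yield ten rules under this completion, compared with six rules plus two generic equations in the style of Figure 2.11.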

  mod <name> is
    <body>
  endm

where <name> can be any identifier. The <body> of a module can include importation of other modules, sort and operation declarations, and a set of sentences. The sorts together with the operations form the signature of that module, and can be thought of as the interface of that module to other modules. To lay the ground for introducing more Maude features, let us define Peano-style natural numbers with addition and multiplication. We define the addition first, in one separate module:

  mod PEANO-NAT is
    sort Nat .
    op zero : -> Nat .
    op succ : Nat -> Nat .
    op plus : Nat Nat -> Nat .
    vars N M : Nat .
    eq plus(zero, M) = M .
    eq plus(succ(N), M) = succ(plus(N, M)) .
  endm

Declarations and sentences are always terminated by periods, which should have white spaces before and after. Forgetting a terminal period or the white space before the period are two of the most common errors that Maude beginners make. The signature of PEANO-NAT consists of one sort, Nat, and three operations, namely zero, succ, and plus. Sorts are declared with the keywords sort or sorts, and operations with op or ops. The three operations have zero, one and two arguments, respectively, whose sorts are listed between the symbols : and ->. Operations of zero arguments are also called constants, those of one argument are called unary and those of two binary. The result sort appears right after the symbol ->. We use ops when two or more operations with the same argument and result sorts are declared together, to save space, and then we use white spaces to separate them:

  ops plus mult : Nat Nat -> Nat .

There are few special characters in Maude, and users are allowed to define almost any token or combination of tokens as operation names. If you use op in the above instead of ops, for example, then only one operation, called “plus mult”, is declared. The two equations in PEANO-NAT are properties, or constraints, that terms built with these operations must satisfy. Another way to look at equations is through the lens of possible implementations of the specifications they define; in our case, any correct implementation of Peano natural numbers should satisfy the two equations. Equations are universally quantified over the variables they contain, and can be applied from left to right or from right to left in reasoning, which means that equational proofs may require exponential search, thus making them theoretically intractable. Maude provides limited support for equational reasoning.

reduce: Rewriting with Equations

When executing specifications, Maude regards all equations as rewrite rules, which means that they are applied only from left to right. Moreover, they are applied iteratively for as long as their left-hand-side terms match any subterm of the term to reduce. This way, any well-formed term can either be rewritten forever, or be reduced to a normal form, which cannot be reduced anymore by applying equations as rewrite rules. Maude’s command to reduce a term to its normal form using equations as rewrite rules is reduce, or simply red. Reduction takes place in the last defined module, which is PEANO-NAT in our case:

  Maude> reduce plus(plus(succ(zero),succ(succ(zero))), succ(succ(succ(zero)))) .
  rewrites: 6 in 0ms cpu (0ms real) (~ rewrites/second)
  result Nat: succ(succ(succ(succ(succ(succ(zero))))))

Make sure commands are terminated with a period. Maude implements state-of-the-art term rewriting algorithms, based on advanced indexing and pattern matching techniques. This way millions of rewrites per second can be performed, making Maude usable as a programming language in terms of performance. Sometimes the results of reductions are repetitive and may be too large to read. To ameliorate this problem, Maude provides an operator attribute called iter, which allows repetitive terms to be input and printed more compactly. For example, if we replace the declaration of operation succ with

  op succ : Nat -> Nat [iter] .

then Maude uses, e.g., succ^3(zero) as a shorthand for succ(succ(succ(zero))). For example,

  Maude> reduce plus(plus(succ(zero),succ^2(zero)), succ^3(zero)) .
  result Nat: succ^6(zero)

Importation

Modules can be imported in several different ways. The difference between importation modes is subtle and semantical rather than operational, and it is not relevant in this book. Therefore, we only use the most general of them, including. For example, the following module extends PEANO-NAT with multiplication:

mod PEANO-NAT* is
  including PEANO-NAT .
  op mult : Nat Nat -> Nat .
  vars M N : Nat .
  eq mult(zero, M) = zero .
  eq mult(succ(N), M) = plus(mult(N, M), M) .
endm

It is safe to think of including as “copy and paste” of the contents of the imported module into the importing module, with one exception: variable declarations are not imported, so they need to be redeclared. We can now “execute programs” using features in both modules:

red mult(plus(succ(zero),succ(succ(zero))), succ(succ(succ(zero)))) .

The following is Maude’s output:

rewrites: 18 in 0ms cpu (0ms real) (~ rewrites/second)
result Nat: succ^9(zero)

Even though this language is very simple and its syntax is ugly, it nevertheless shows a formal and executable definition of a language using equational logic and rewriting. Other languages or formal analyzers discussed in this book will be defined in a relatively similar manner, though, as expected, they will be more involved.
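As a further illustration of including, and of the need to redeclare variables, here is a minimal sketch of another extension of PEANO-NAT. The module name PEANO-NAT-DOUBLE and the operation double are hypothetical and introduced only for this example; PEANO-NAT is restated as it is assumed to be defined earlier in the chapter.

```maude
--- PEANO-NAT, restated here as an assumption so that this file is self-contained.
mod PEANO-NAT is
  sort Nat .
  op zero : -> Nat .
  op succ : Nat -> Nat .
  op plus : Nat Nat -> Nat .
  vars M N : Nat .
  eq plus(zero, M) = M .
  eq plus(succ(N), M) = succ(plus(N, M)) .
endm

--- Hypothetical extension: doubling a Peano natural number.
mod PEANO-NAT-DOUBLE is
  including PEANO-NAT .
  op double : Nat -> Nat .
  var N : Nat .    --- N must be redeclared: variables are not imported
  eq double(zero) = zero .
  eq double(succ(N)) = succ(succ(double(N))) .
endm

reduce double(plus(succ(zero), succ(zero))) .
```

The term mixes plus from the imported module with double from the importing one, and reduces to succ(succ(succ(succ(zero)))).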

The Mixfix Notation and Parsing

The plus and mult operations defined above are meant to be written using the prefix notation in terms. Maude also supports the mixfix notation for operations (see Section 2.1.3), by allowing the user to write underscores in operation names as placeholders for their corresponding arguments.
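As a minimal sketch of what such declarations can look like, the Peano operations can be restated with infix names. The module name PEANO-NAT-MIXFIX is hypothetical, and every compound term is parenthesized explicitly so that the example does not rely on any precedence defaults.

```maude
--- Hypothetical mixfix variant of the Peano specification.
mod PEANO-NAT-MIXFIX is
  sort Nat .
  op zero : -> Nat .
  op succ : Nat -> Nat .
  op _+_ : Nat Nat -> Nat .    --- each underscore marks an argument position
  op _*_ : Nat Nat -> Nat .
  vars M N : Nat .
  eq zero + M = M .
  eq succ(N) + M = succ(N + M) .
  eq zero * M = zero .
  eq succ(N) * M = (N * M) + M .
endm

reduce (succ(zero) + succ(succ(zero))) * succ(succ(succ(zero))) .
```

This is the same computation as the mult example above, (1 + 2) * 3, so the normal form is the numeral with nine applications of succ to zero.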

Associativity, Commutativity and Identity Attributes

Some of the binary operations used in this book will be associative (A), commutative (C) or have an identity (I), or combinations of these. E.g., + is associative, commutative and has 0 as identity. All these can be added as attributes to operations when declared:

op _+_ : Int Int -> Int [assoc comm id: 0 prec 33] .
op _*_ : Int Int -> Int [assoc comm id: 1 prec 31] .

Note that each of the A, C, and I attributes is logically equivalent to an appropriate equation, such as

eq A + (B + C) = (A + B) + C .
eq A + B = B + A .    ---> attention: rewriting does not terminate!
eq A + 0 = A .

When applied as rewrite rules, each of the three equations above has limitations. The associativity equation forces all the parentheses to be grouped to the left, which may prevent some other rules from applying. The commutativity equation may lead to non-termination when applied as a rewrite rule. The identity equation would only be able to simplify expressions, but not to add a 0 to an expression, which may be useful in some situations (we will see such situations shortly, in the context of lists). Maude’s builtin support for ACI attributes addresses all the problems above. Additionally, the assoc attribute of a mixfix operation is also taken into account by Maude’s parser, which thereby eliminates the need for some useless parentheses:

Maude> parse X + Y + Z .
Nat: X + Y + Z

An immediate consequence of the builtin support for the comm attribute, which allows rewriting with commutative operations to terminate, is that normal forms will be reported now modulo commutativity:

Maude> red X + Y + X .
rewrites: 0 in 0ms cpu (0ms real) (~ rewrites/second)
result Nat: X + X + Y

As seen, Maude chose to display a term that is equivalent (modulo AC) to the original one, determined by how the current implementation of Maude stores this term internally. There were 0 rewrites applied in the reduction above, because the internal rearrangements of terms according to the ACI attribute annotations do not count as rule applications.
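The effect of the attributes can also be checked with the builtin equality predicate _==_ of the BOOL module. The following is a minimal sketch over a hypothetical module ACI-DEMO with constants a, b, c and unit, all introduced only for this illustration.

```maude
--- Hypothetical module: a binary operation that is associative,
--- commutative, and has the constant unit as its identity.
mod ACI-DEMO is
  sort Elt .
  ops a b c unit : -> Elt .
  op _+_ : Elt Elt -> Elt [assoc comm id: unit] .
endm

reduce (a + b) + c == c + (b + a) .    --- equal modulo A and C
reduce a + unit + b == b + a .         --- unit disappears, then C applies
reduce unit + unit == unit .           --- identity only
```

All three commands reduce to true, although the module contains no equations: terms are compared modulo the declared attributes, not syntactically.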

Matching Modulo Associativity, Commutativity, and Identity

Here we discuss Maude’s support for ACI matching, which is arguably one of the most distinctive and complex features of Maude, and which is the main reason for, and the most important use of, the ACI attributes. We discuss ACI matching by means of a series of examples, starting with lists, which occur in many programming languages. The following module defines lists of integers with a membership operation _in_, based on AI (associative and identity) matching:

mod INT-LIST is
  including INT .
  sort IntList .
  subsort Int < IntList .
  op nil : -> IntList .
  op __ : IntList IntList -> IntList [assoc id: nil] .
  op _in_ : Int IntList -> Bool .
  var I : Int .
  vars L L' : IntList .
  eq I in L I L' = true .
  eq I in L = false [owise] .
endm

We start by including the builtin INT module, which declares a sort Int and provides arbitrarily large integers as constants of sort Int, together with the usual operations on these. The builtin module BOOL, which similarly declares a sort Bool and common Boolean operations on it, is automatically included in all modules, so it need not be included explicitly. To see an existing module, builtin or not, use the command

Maude> show module <ModuleName> .

For example, “show module INT .” will display the INT module. In the INT-LIST module above, note the subsort declaration “Int < IntList”, which says that integers are also lists of integers. This, together with the constant nil and the concatenation operation __, can generate any finite list of integers:

Maude> parse 1 2 3 4 5 .
IntList: 1 2 3 4 5
Maude> red 1 nil 2 nil 3 nil 4 nil 5 6 7 nil .
rewrites: 0 in 0ms cpu (0ms real) (~ rewrites/second)
result IntList: 1 2 3 4 5 6 7

Note how the reduce command above eliminated all the unnecessary nil constants from the list, in zero rewrite steps, for the same reason as above: the internal rearrangements according to the ACI attributes do not count as rewrite steps. The two equations defining the membership operation make use of AI matching. The first equation says that if we can match the integer I anywhere inside the list, then we are done. Since the list constructor was declared associative and with identity nil, Maude is mathematically allowed to bind the variables L and L’ of sort IntList to any lists of integers, including the empty one. Maude indeed does this through its efficient AI matching algorithm. Equations with attribute owise are applied only when other equations fail to apply. Therefore, we defined the semantics of the membership operation only by means of AI matching, without having to implement any explicit traversal of the list. Here are some examples testing the semantics above:

Maude> red 3 in 2 3 4 .
result Bool: true
Maude> red 3 in 3 4 5 .
result Bool: true
Maude> red 3 in 1 2 4 .
result Bool: false
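The same style of definition extends to patterns with more than two list variables. As a minimal sketch, the hypothetical operation hasDup below (not part of the text) tests whether some integer occurs twice in a list, again without any explicit traversal. INT-LIST is restated from above so that the file loads on its own, with its list variables renamed to L1 and L2.

```maude
--- INT-LIST, restated from the text with list variables renamed to L1 and L2.
mod INT-LIST is
  including INT .
  sort IntList .
  subsort Int < IntList .
  op nil : -> IntList .
  op __ : IntList IntList -> IntList [assoc id: nil] .
  op _in_ : Int IntList -> Bool .
  var I : Int .
  vars L1 L2 : IntList .
  eq I in L1 I L2 = true .
  eq I in L1 = false [owise] .
endm

--- Hypothetical extension: duplicate detection by AI matching.
mod INT-LIST-DUP is
  including INT-LIST .
  op hasDup : IntList -> Bool .
  var I : Int .
  vars L1 L2 L3 : IntList .
  --- Matches whenever the same integer I occurs at two positions;
  --- L1, L2 and L3 may each be bound to the empty list nil.
  eq hasDup(L1 I L2 I L3) = true .
  eq hasDup(L1) = false [owise] .
endm

reduce hasDup(1 2 3 2) .
reduce hasDup(1 2 3) .
reduce hasDup(nil) .
```

The first command reduces to true (with I bound to 2), while the other two reduce to false through the owise equation, since no binding of the variables makes the first pattern match.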

To define sets of integers (see, e.g., Exercise 30), besides likely renaming the sort IntList into IntSet, we would also need to declare the concatenation operation commutative; moreover, thanks to Maude’s commutative matching, we can also replace the first equation by “eq I in I L = true .” We next discuss a Maude definition of (partial finite-domain) maps (see Section 2.4.6 and Figure 2.7 for the mathematical definition). We assume that the Source and Target sorts are defined in separate modules SOURCE and TARGET, respectively; one may need to change these in concrete applications. The associativity, commutativity and identity equations in Figure 2.7 are replaced by corresponding Maude operational attributes. Note that the second equation defining the update operation takes advantage of Maude’s owise attribute (explained above), so it departs from the more mathematical definition in Figure 2.7:

mod MAP is
  including SOURCE + TARGET .
  sort Map .
  op _|->_ : Source Target -> Map [prec 0] .
  op empty : -> Map .
  op _,_ : Map Map -> Map [assoc comm id: empty] .
  op _(_) : Map Source -> Target [prec 0] .          --- lookup
  op _[_/_] : Map Target Source -> Map [prec 0] .    --- update