The Computability Concepts - Cryptography | MATH 0209A, Papers of Cryptography and System Security

Material Type: Paper; Class: CRYPTOGRAPHY; Subject: Mathematics; University: University of California - Los Angeles; Term: Unknown 1989;

Uploaded on 09/17/2009

Chapter 1

The Computability Concept

1 The informal concept

Decidable sets

Computability theory, also known as recursion theory, is the area of mathematics dealing with the concept of an effective procedure—a procedure that can be carried out by following specific rules. For example, we might ask whether there is some effective procedure—some algorithm—that, given a sentence about the positive integers, will decide whether that sentence is true or false. In other words, is the set of true sentences about the positive integers decidable? (We will see later that the answer is negative.) Or for a simpler example, the set of prime numbers is certainly a decidable set. That is, there are quite mechanical procedures, which are taught in the schools, for deciding of any given positive integer whether or not it is a prime number. (For a very large number, the procedure taught in the schools might take a long time.) If we want, we can write a computer program to execute the procedure. Simpler still, the set of even positive integers is decidable. We can write a computer program that, given a positive integer, will very quickly decide whether or not it is even. Our goal is to study what decision problems can be solved (in principle) by a computer program, and what decision problems (if any) cannot.

More generally, consider a set S of natural numbers. (The natural numbers are 0, 1, 2, .... In particular, 0 is natural.) We say that S is a decidable set if there exists an effective procedure that, given any natural number, will eventually end by supplying us with the answer: “Yes” if the given number is a member of S and “No” if it is not a member of S.

(Initially, we are going to examine computability in the context of the natural numbers. Later, we will see that computability concepts can be readily transferred to the context of strings of letters from a finite alphabet. In that context, we can consider a set S of strings, such as the set of equations, like x(y + z) = xy + xz, that hold in the algebra of real numbers. But to start with, we will consider sets of natural numbers.)

And by an effective procedure here is meant a procedure for which we can give exact instructions—a program—for carrying out the procedure. Following these instructions should not demand brilliant insights on the part of the agent (human or machine) following them. It must be possible, at least in principle, to make the instructions so explicit that they can be executed by a diligent clerk (who is very good at following directions but is not too clever) or even a machine (which does not think at all). That is, it must be possible for our instructions to be mechanically implemented. (One might imagine a mathematician so brilliant that he or she can look at any sentence of arithmetic and say whether it is true or false. But you cannot ask the clerk to do this. And there is no computer program to do this. It is not merely that we have not succeeded in writing such a program. We can actually prove that such a program cannot possibly exist!)

Although these instructions must of course be finite in length, we impose no upper bound on their possible length. We do not rule out the possibility that the instructions might even be absurdly long. (If the number of lines in the instructions exceeds the number of electrons in the universe, we merely shrug and say, “That’s a pretty long program.”) We insist only that the instructions—the program—be finitely long, so that we can communicate them to the person or machine doing the calculations. (There is no way to give someone all of an infinite object.)

Similarly, in order to obtain the most comprehensive concepts, we impose no bounds on the time that the procedure might consume before it supplies us with the answer. Nor do we impose a bound on the amount of storage space (scratch paper) that the procedure might need to use. (The procedure might, for example, need to utilize very large numbers requiring a substantial amount of space simply to write down.) We merely insist that the procedure give us the answer eventually, in some finite length of time. What is definitely ruled out is doing infinitely many steps and then giving the answer.

In Chapter 7, we will consider more restrictive concepts, where the amount of time is limited in some way, so as to exclude the possibility of ridiculously long execution times. But initially we want to avoid such restrictions, to obtain the limiting case where practical limitations on execution time or memory space are removed. It is well known that in the real world, the speed and capability of computers have been steadily growing. We want to ignore actual speed and actual capability, and instead to ask what the purely theoretical limits are.

The foregoing description of effective procedures is admittedly vague and imprecise.
In the following section, we will look at how this vague description can be made precise—how the concept can be made into a mathematical concept. Nonetheless, the informal idea of what can be done by effective procedure, that is, what is calculable, can be very useful. Rigor and precision can wait until the next chapter. First we need a sense of where we are going.

For example, any finite set of natural numbers must be decidable. The program for the decision procedure can simply include a list of all the numbers in the set. Then given a number, the program can check it against the list. Thus the concept of decidability is interesting only for infinite sets.

Our description of effective procedures, vague as it is, already shows how limiting the concept of decidability is. One can, for example, utilize the concepts of countable and uncountable sets (see the appendix for a summary of these concepts). It is not hard to see that there are only countably many possible instructions of finite length that one can write out (using a standard keyboard, say). But there are uncountably many sets of natural numbers (by Cantor’s diagonal argument). Each decidable set has at least one decision procedure, and different sets need different procedures, so there are only countably many decidable sets. It follows that almost all sets, in a sense, are undecidable. The fact that not every set is decidable is relevant to theoretical computer science. The fact that there is a limit to what can be carried out by effective procedures means there is a limit to what can—even in principle—be done by computer programs. And this raises the questions: What can be done? What cannot?
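The decision procedures mentioned above can be made concrete. Below is a minimal Python sketch, assuming “Yes” and “No” are coded as 1 and 0 (the text later suggests exactly this choice); all function names are hypothetical. It shows schoolbook procedures for the even numbers and the primes, a decision procedure for a finite set built around an explicit list, and a shortest-first listing of every finite string over an alphabet, which is the idea behind the claim that there are only countably many finite sets of instructions.

```python
from itertools import count, islice, product
from typing import Callable, Iterable, Iterator

YES, NO = 1, 0  # assumed coding of the answers "Yes" and "No" as natural numbers


def decide_even(n: int) -> int:
    """Decision procedure for the even numbers: always halts with YES or NO."""
    return YES if n % 2 == 0 else NO


def decide_prime(n: int) -> int:
    """Schoolbook trial division: slow for huge n, but it always halts."""
    if n < 2:
        return NO
    d = 2
    while d * d <= n:
        if n % d == 0:
            return NO
        d += 1
    return YES


def decider_for_finite_set(members: Iterable[int]) -> Callable[[int], int]:
    """Any finite set is decidable: the program simply carries a list of its members."""
    table = frozenset(members)

    def decide(n: int) -> int:
        return YES if n in table else NO

    return decide


def all_strings(alphabet: str) -> Iterator[str]:
    """List every finite string over `alphabet`, shortest first.

    Every finite text (in particular, every program) eventually appears,
    so the finite programs can be matched with 0, 1, 2, ...
    """
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


if __name__ == "__main__":
    print([n for n in range(20) if decide_prime(n) == YES])  # [2, 3, 5, 7, 11, 13, 17, 19]
    print(list(islice(all_strings("01"), 7)))  # ['', '0', '1', '00', '01', '10', '11']
```

Each procedure halts on every input, which is what separates a decision procedure from the partial procedures considered next.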

For a k-place partial function f, we say that f is an effectively calculable partial function if there exists an effective procedure with the following property:

  • Given a k-tuple $\vec{x}$ in the domain of f, the procedure eventually halts and returns the correct value for $f(\vec{x})$.
  • Given a k-tuple $\vec{x}$ not in the domain of f, the procedure does not halt and return a value.

(There is one issue here: How can a number be given? To communicate a number x to the procedure, we send it the numeral for x. Numerals are bits of language, which can be communicated. Numbers are not. Communication requires language. Nonetheless, we will continue to speak of being “given numbers m and n” and so forth. But at a few points, we will need to be more accurate and to take account of the fact that what the procedure is given are numerals. There was a time in the 1960’s when, as part of the “new math,” schoolteachers were encouraged to distinguish carefully between numbers and numerals. This was a good idea that turned out not to work.)

For example, the partial function for subtraction

$$f(m, n) = \begin{cases} m - n & \text{if } m \ge n \\ \uparrow & \text{otherwise} \end{cases}$$

is effectively calculable, and procedures for calculating it, using base-10 numerals, are taught in the elementary schools. The empty function is effectively calculable. The effective procedure for it, given a k-tuple, does not need to do anything in particular. But it must never halt and return a value.

The concept of decidability can then be described in terms of functions: For a subset S of $\mathbb{N}^k$, we can say that S is decidable iff its characteristic function

$$C_S(\vec{x}) = \begin{cases} \text{Yes} & \text{if } \vec{x} \in S \\ \text{No} & \text{if } \vec{x} \notin S \end{cases}$$

(which is always total) is effectively calculable. Here “Yes” and “No” are some fixed members of $\mathbb{N}$, such as 1 and 0. (That word “iff” in the preceding paragraph means “if and only if.” This is a bit of mathematical jargon that has proved to be so useful that it has become a standard part of mathspeak.) Here if k = 1 then S is a set of numbers. If k = 2 then we have the concept of a decidable binary relation on numbers, and so forth.

Take for example the divisibility relation, that is, the set of pairs 〈m, n〉 such that m divides n evenly. (For definiteness, assume that 0 divides only itself.) The divisibility relation is decidable, because given m and n, we can carry out the division algorithm we all learned in the fourth grade, and see whether the remainder is 0 or not.
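Python functions are total by default, so the following minimal sketch represents an undefined value the way the definition above requires: by never halting. It covers the partial subtraction function, the empty function, and the characteristic function of the divisibility relation under the convention that 0 divides only itself. The function names and the coding YES = 1, NO = 0 are illustrative assumptions; calling the first two functions on inputs outside their domain will hang, which is the intended behavior.

```python
from typing import NoReturn

YES, NO = 1, 0  # assumed coding of "Yes" and "No"


def partial_subtract(m: int, n: int) -> int:
    """f(m, n) = m - n if m >= n; undefined (never halts) otherwise."""
    if m >= n:
        return m - n
    while True:  # outside the domain: run forever instead of returning a value
        pass


def empty_function(*args: int) -> NoReturn:
    """The empty function: for every k-tuple, never halt and return a value."""
    while True:
        pass


def char_divides(m: int, n: int) -> int:
    """Characteristic function of divisibility; total, always YES or NO.

    Convention from the text: 0 divides only 0.
    """
    if m == 0:
        return YES if n == 0 else NO
    # Division algorithm: m divides n exactly when the remainder is 0.
    return YES if n % m == 0 else NO


if __name__ == "__main__":
    print(partial_subtract(10, 3))  # 7
    print([n for n in range(13) if char_divides(4, n) == YES])  # [0, 4, 8, 12]
```

The contrast is visible in the code: `char_divides` returns on every input, as a characteristic function must, while the partial functions simply have no return path outside their domains.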

Example: Any total constant function on $\mathbb{N}$ is effectively computable. Suppose, for example, f(x) = 36 for all x in $\mathbb{N}$. There is an obvious procedure for calculating f; it ignores its input and writes “36” as the output. This may seem a triviality, but compare it with the next example.

Example: Define the function F as follows.

$$F(x) = \begin{cases} 1 & \text{if Goldbach's conjecture is true} \\ 0 & \text{if Goldbach's conjecture is false.} \end{cases}$$

Goldbach’s conjecture states that every even integer greater than 2 is the sum of two primes; for example, 22 = 5 + 17. This conjecture is still an open problem in mathematics. Is this function F effectively computable? (Choose your answer before reading the next paragraph.)

Observe that F is a total constant function. (Classical logic enters here: Either there is an even number that serves as a counterexample or there isn’t.) So as noted in the preceding example, F is effectively computable. What, then, is a procedure for computing F? I don’t know, but I can give you two procedures (one always outputs 1, the other always outputs 0), and be confident that one of them computes F.

The point of this example is that effective computability is a property of the function itself, not a property of some linguistic description we might give the function. (One says that the effective computability property is extensional.) There are many English phrases that would serve to define F. For a function to be effectively computable, there must exist (in the mathematical sense) an effective procedure for computing it. That is not the same as saying that you hold such a procedure in your hand. If, in the year 2083, some creature in the universe proves (or refutes) Goldbach’s conjecture, that does not mean that F will suddenly change from non-computable to computable. It was computable all along. There will be, however, situations later in which we will want more than the mere existence of an effective procedure P; we will want some way of actually finding P, given some suitable clues. That is for later.
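The two procedures are easy to write down; the hard part is knowing which one computes F. In the minimal Python sketch below (all names hypothetical), `candidate_one` and `candidate_zero` are the two candidates, and `goldbach_holds_up_to` checks the conjecture for even numbers up to a bound. A successful check is evidence only. Confirming the conjecture up to any finite bound does not settle which candidate is correct, although a failed check would refute the conjecture.

```python
def candidate_one(x: int) -> int:
    """Computes F if Goldbach's conjecture is true: ignore x, output 1."""
    return 1


def candidate_zero(x: int) -> int:
    """Computes F if Goldbach's conjecture is false: ignore x, output 0."""
    return 0


def is_prime(n: int) -> bool:
    """Trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def goldbach_holds_up_to(limit: int) -> bool:
    """True if every even number from 4 to `limit` is a sum of two primes.

    A True answer is evidence, not a proof; a False answer would refute the
    conjecture and show that candidate_zero computes F.
    """
    for even in range(4, limit + 1, 2):
        if not any(is_prime(p) and is_prime(even - p) for p in range(2, even // 2 + 1)):
            return False
    return True


if __name__ == "__main__":
    print(goldbach_holds_up_to(1000))  # True
    print(is_prime(5) and is_prime(17))  # True: 22 = 5 + 17
```

Both candidates are trivially effective; what is missing is knowledge, not a procedure, which is the extensionality point made above.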

It is very natural to extend these concepts to the situation where we have half of decidability: Say that S is semi-decidable if its “semi-characteristic function”

$$c_S(\vec{x}) = \begin{cases} \text{Yes} & \text{if } \vec{x} \in S \\ \uparrow & \text{if } \vec{x} \notin S \end{cases}$$

is an effectively calculable partial function. Thus a set S of numbers is semi-decidable if there is an effective procedure for recognizing members of S. We can think of S as the set that the procedure accepts. And the effective procedure, while it may not be a decision procedure, is at least an acceptance procedure.

Any decidable set is also semi-decidable. If we have an effective procedure that calculates the characteristic function $C_S$, then we can convert it to an effective procedure that calculates the semi-characteristic function $c_S$. We simply replace each “output No” command by some endless loop. Or more informally, we simply unscrew the No bulb.

What about the converse? Are there semi-decidable sets that are not decidable? We will see that there are indeed. The trouble with the semi-characteristic

was in exactly the opposite direction! The following digression expands on this point.)

Digression. The concept of a general-purpose, stored-program computer is now very common, but the concept developed slowly over a period of time. The ENIAC machine, the most important computer of the 1940’s, was programmed by setting switches and inserting cables into plugboards! This is a far cry from treating a program like data. It was von Neumann who in a 1945 technical report laid out the crucial ideas for a general-purpose stored-program computer, that is, for a universal computer. Turing’s 1936 paper, on what are now called Turing machines, had proved the existence of a “universal Turing machine” to compute the Φ function described below. When Turing went to Princeton in 1936–37, von Neumann was there and must have been aware of his work. Apparently von Neumann’s thinking in 1945 was influenced by Turing’s work of nearly a decade earlier.

Suppose we adopt a fixed method of encoding any set of instructions by a single natural number. (First we convert the instructions to a string of 0’s and 1’s—one always does this with computer programs—and then we regard that string as naming a natural number under a suitable base-2 notation.) Then the “universal function”

Φ(w, x) = the result of applying the instructions coded by w to the input x

is an effectively calculable partial function (where it is understood that Φ(w, x) is undefined whenever applying the instructions coded by w to the input x fails to halt and return an output). Here are the instructions for Φ: “Given w and x, decode w to see what it says to do with x, and then do it.”

Of course, the function Φ is not total. For one thing, when we try to decode w, we might get complete nonsense, so that the instruction “then do it” leads nowhere. And even if decoding w yields explicit and comprehensible instructions, applying those instructions to a particular x might never yield an output.

The two-place partial function Φ is “universal” in the sense that any one-place effectively calculable partial function f is given by the equation

f (x) = Φ(e, x) for all x

where e codes the instructions for f. It will be helpful to introduce a special notation here: Let [[e]] be the one-place partial function defined by the equation

[[e]](x) = Φ(e, x).

That is, [[e]] is the partial function whose instructions are coded by e, with the understanding that, because some values of e might not code anything sensible, the function [[e]] might be the empty function. In any case, [[e]] is the partial function we get from Φ when we hold its first variable fixed at e. Thus

[[0]], [[1]], [[2]],...

is a complete list of all the 1-place effectively calculable partial functions. The values of [[e]] are given by the (e + 1)st row in the following table:

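$$\begin{array}{l|lllll}
[\![0]\!] & \Phi(0,0) & \Phi(0,1) & \Phi(0,2) & \Phi(0,3) & \cdots \\
[\![1]\!] & \Phi(1,0) & \Phi(1,1) & \Phi(1,2) & \Phi(1,3) & \cdots \\
[\![2]\!] & \Phi(2,0) & \Phi(2,1) & \Phi(2,2) & \Phi(2,3) & \cdots \\
[\![3]\!] & \Phi(3,0) & \Phi(3,1) & \Phi(3,2) & \Phi(3,3) & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots &
\end{array}$$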
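A toy version of Φ can be written in a few lines of Python, under assumptions specific to this sketch: a “program” is Python source text defining a one-place function named `f`; it is coded as a natural number by reading its UTF-8 bytes in base 256 (equivalently, a string of 0’s and 1’s read in base 2); and a code that does not decode to such a program raises an exception, which here stands in for “no output.” A program that runs forever makes `phi` run forever too, so `phi` is partial, as Φ is. The names `encode`, `decode`, `phi` and `bracket` are hypothetical, and `exec` runs arbitrary code, so this is for illustration only.

```python
from typing import Callable


def encode(source: str) -> int:
    """Code program text as a natural number (UTF-8 bytes read in base 256).

    Assumes the text does not begin with a NUL character, whose leading
    zero byte would be lost.
    """
    return int.from_bytes(source.encode("utf-8"), "big")


def decode(w: int) -> str:
    """Recover the text coded by w; raises UnicodeDecodeError for non-text codes."""
    length = (w.bit_length() + 7) // 8
    return w.to_bytes(length, "big").decode("utf-8")


def phi(w: int, x: int) -> int:
    """Phi(w, x): decode w to see what it says to do with x, and then do it."""
    namespace: dict = {}
    exec(decode(w), namespace)  # may raise (nonsense code) or never finish
    return namespace["f"](x)  # KeyError if the program defines no f


def bracket(e: int) -> Callable[[int], int]:
    """[[e]]: the one-place partial function obtained by holding Phi's first argument at e."""
    return lambda x: phi(e, x)


if __name__ == "__main__":
    e = encode("def f(x):\n    return x * x\n")
    print(phi(e, 7))  # 49
    print(bracket(e)(12))  # 144
```

Here `bracket(e)` plays the role of the row labeled [[e]] in the table: it is Φ with its first argument held fixed.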

Using the universal partial function Φ, we can construct an undecidable binary relation, the halting relation H:

$$\langle w, x\rangle \in H \iff \Phi(w, x)\downarrow \iff \text{applying the instructions coded by } w \text{ to input } x \text{ halts}$$

On the positive side, H is semi-decidable. To calculate the semi-characteristic function $c_H(w, x)$, given w and x, we first calculate Φ(w, x). If and when this halts and returns a value, we give output “Yes” and stop. On the negative side, H is not decidable. To see this, first consider the following partial function:

$$f(x) = \begin{cases} \text{Yes} & \text{if } \Phi(x, x)\uparrow \\ \uparrow & \text{if } \Phi(x, x)\downarrow \end{cases}$$

(Notice that we are employing the classical diagonal construction. Looking at the earlier table of the values of Φ arranged in a two-dimensional array, one sees that f has been made by going along the diagonal of that table, taking the entry Φ(x, x) found there, and making sure that f(x) differs from it.)

There are two things to be said about f. First, f cannot possibly be effectively calculable. Consider any set of instructions that might compute f. Those instructions have some code number k and hence compute the partial function [[k]]. Could that be the same as f? No, f and [[k]] differ at the input k. That is, f has been constructed in such a way that f(k) differs from [[k]](k); they differ because one is defined and the other is not. So these instructions cannot correctly compute f; they produce the wrong result at the input k. And because k was arbitrary, we are forced to conclude that no set of instructions can correctly compute f. (This is our first example of a partial function that is not effectively calculable. There are a great many more, as will be seen.)

Secondly, we can argue that if we had a decision procedure for H, then we could calculate f. To compute f(x), we first use that decision procedure for H to decide if 〈x, x〉 ∈ H or not. If not, then f(x) = Yes. But if 〈x, x〉 ∈ H, then the procedure for finding f(x) should throw itself into an infinite loop, because f(x) is undefined.

Putting these two observations about f together, we conclude that there can be no decision procedure for H. The fact that H is undecidable is usually expressed by saying that “the halting problem is unsolvable”; i.e., we cannot effectively determine, given w and x, whether applying the instructions coded by w to the input x will eventually terminate or will go on forever:

The set K is an example of a semi-decidable set that is not decidable. Its complement $\overline{K}$ is not semi-decidable; we have seen that its semi-characteristic function f is not effectively calculable. The connection between effectively calculable partial functions and semi-decidable sets can be further described as follows:

Theorem: (i) A relation is semi-decidable if and only if it is the domain of some effectively calculable partial function. (ii) A partial function f is an effectively calculable partial function if and only if its graph G (i.e., the set of pairs $\langle \vec{x}, y\rangle$ such that $f(\vec{x}) = y$) is a semi-decidable relation.

For statement (i), one direction is true by definition: Any relation is the domain of its semi-characteristic function, and for a semi-decidable relation, that function is an effectively calculable partial function. Conversely, for an effectively calculable partial function f we have the natural semi-decision procedure for its domain: Given $\vec{x}$, we try to compute $f(\vec{x})$. If and when we succeed in finding $f(\vec{x})$, we ignore the value and simply say Yes and halt.

To prove (ii) in one direction, suppose that f is an effectively calculable partial function. Here is a semi-decision procedure for its graph G: Given $\langle \vec{x}, y\rangle$, we proceed to compute $f(\vec{x})$. If and when we obtain the result, we check to see whether it is y or not. If the result is indeed y, then we say Yes and halt. Of course this procedure fails to give an answer if $f(\vec{x})\uparrow$, which is exactly as it should be, because $\langle \vec{x}, y\rangle$ is not in the graph.

To prove the other direction of (ii), suppose that we have a semi-decision procedure for the graph G. We seek to compute, given $\vec{x}$, the value $f(\vec{x})$, if this is defined. Our plan is to check $\langle \vec{x}, 0\rangle, \langle \vec{x}, 1\rangle, \ldots$ for membership in G. We cannot simply run the test on $\langle \vec{x}, 0\rangle$ to completion before moving on, because for a pair not in G the semi-decision procedure may never halt. So to budget our time sensibly, we use a procedure called “dovetailing.” Here is what we do: (1) Spend one minute testing whether $\langle \vec{x}, 0\rangle \in G$. (2) Spend two minutes testing whether $\langle \vec{x}, 0\rangle \in G$ and two minutes testing whether $\langle \vec{x}, 1\rangle \in G$. (3) Similarly, spend three minutes on each of $\langle \vec{x}, 0\rangle, \langle \vec{x}, 1\rangle, \langle \vec{x}, 2\rangle$. And so forth. If and when we discover that, in fact, $\langle \vec{x}, k\rangle \in G$, then we return the value k and halt. Observe that whenever $f(\vec{x})\downarrow$, then sooner or later the foregoing procedure will correctly determine $f(\vec{x})$ and halt. Of course, if $f(\vec{x})\uparrow$, then the procedure runs forever. ⊣
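The dovetailing argument can be sketched in Python if we make “one minute of work” concrete. In this minimal sketch (all names hypothetical), a procedure is a generator that yields once per step and, if it halts, returns its output; `run_for` grants a procedure a fixed number of steps. `dovetail` follows the schedule in the proof: at stage n it restarts the tests of 〈x, 0〉, ..., 〈x, n − 1〉 and gives each n steps. The example graph, {〈x, y〉 : y · y = x}, is the graph of an integer square root that is undefined on non-squares; its hypothetical semi-decision procedure loops forever on pairs outside the graph, which is exactly why the tests must be interleaved rather than run one after another.

```python
from itertools import count
from typing import Callable, Generator, Optional

YES = 1  # assumed coding of the answer "Yes"

# A procedure yields once per step of work and, if it halts, returns its output.
Procedure = Generator[None, None, int]


def run_for(proc: Procedure, budget: int) -> Optional[int]:
    """Give `proc` at most `budget` steps; return its output, or None if still running."""
    for _ in range(budget + 1):
        try:
            next(proc)
        except StopIteration as halted:
            return halted.value
    return None


def dovetail(semi_decide_graph: Callable[[int, int], Procedure], x: int) -> int:
    """Compute f(x) from a semi-decision procedure for the graph of f.

    Stage n spends n steps on each of <x,0>, ..., <x,n-1>, starting each test afresh.
    Returns k as soon as some test accepts <x,k>; runs forever if f(x) is undefined.
    """
    for n in count(1):
        for k in range(n):
            if run_for(semi_decide_graph(x, k), n) == YES:
                return k


def sqrt_graph(x: int, y: int) -> Procedure:
    """Hypothetical semi-decision procedure for {<x, y> : y * y == x}.

    It works for y steps, then accepts if y * y == x and otherwise never halts.
    """
    for _ in range(y):
        yield
    if y * y == x:
        return YES
    while True:
        yield


if __name__ == "__main__":
    print(dovetail(sqrt_graph, 49))  # 7
    print(dovetail(sqrt_graph, 0))  # 0
```

Restarting each test at every stage wastes work, but it matches the proof’s schedule exactly; a version that kept each generator alive between stages would find the same answer.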

Church’s thesis

While the concept of effective calculability has here been described in somewhat vague terms, the following section will describe a precise (mathematical) concept of a “computable partial function.” In fact, it will describe several equivalent ways of formulating the concept in precise terms. And it will be argued that the mathematical concept of a computable partial function is the correct formalization of the informal concept of an effectively calculable partial function. This claim is known as Church’s thesis or the Church–Turing thesis.

Church’s thesis, which relates an informal idea to a formal idea, is not itself a mathematical statement, capable of being given a proof. But one can look for evidence for or against Church’s thesis; it all turns out to be evidence in favor.

One piece of evidence is the absence of counterexamples. That is, any function examined thus far that mathematicians have felt was effectively calculable has been found to be computable. Stronger evidence stems from the various attempts that different people made independently, trying to formalize the idea of effective calculability. Alonzo Church used λ-calculus; Alan Turing used an idealized computing agent (later called a Turing machine); Emil Post developed a similar approach. Remarkably, all these attempts turned out to be equivalent, in that they all defined exactly the same class of functions, namely the computable partial functions!

The study of effective calculability originated in the 1930’s with work in mathematical logic. As noted previously, the subject is related to the concept of an acceptable proof. More recently, the study of effective calculability has formed an essential part of theoretical computer science. A prudent computer scientist would surely want to know that, apart from the difficulties the real world presents, there is a purely theoretical limit to calculability.

Exercises

  1. Assume that S is a set of natural numbers containing all but finitely many natural numbers. (That is, S is a cofinite subset of N.) Explain why S must be decidable.
  2. Assume that A and B are decidable sets of natural numbers. Explain why their intersection A ∩ B is also decidable. (Describe an effective procedure for determining whether or not a given number is in A ∩ B.)
  3. Assume that A and B are decidable sets of natural numbers. Explain why their union A ∪ B is also decidable.
  4. Assume that A and B are semi-decidable sets of natural numbers. Explain why their intersection A ∩ B is also semi-decidable.
  5. Assume that A and B are semi-decidable sets of natural numbers. Explain why their union A ∪ B is also semi-decidable.
  6. (a) Assume that R is a decidable binary relation on the natural numbers. That is, it is a decidable 2-ary relation. Explain why its domain, {x | 〈x, y〉 ∈ R for some y} is a semi-decidable set. (b) Now suppose that instead of assuming that R is decidable, we assume only that it is semi-decidable. Is it still true that its domain must be semi-decidable?
  7. (a) Assume that f is a one-place total calculable function. Explain why its graph is a decidable binary relation.

2 Formalizations—an overview

In the preceding section, the concept of effective calculability was described only very informally. Now we want to make those ideas precise (i.e., make them part of mathematics). In fact, several approaches to doing this will be described: idealized computing devices, generative definitions (i.e., the least class containing certain initial functions and closed under certain constructions), programming languages, and definability in formal languages. It is a significant fact that these very different approaches all yield exactly equivalent concepts.

This section gives a general overview of a number of different (but equivalent) ways of formalizing the concept of effective calculability. Later chapters will develop a few of these ways in full detail.

Digression: The 1967 book by Rogers cited in the References demonstrates that the subject of computability can be developed without adopting any of these formalizations. And that book was preceded by a 1956 mimeographed preliminary version, which is where I first saw this subject. A few treasured copies of the mimeographed edition still exist.

Turing machines

In early 1935, Alan Turing was a 22-year-old graduate student at King’s College in Cambridge. Under the guidance of Max Newman, he was working on the problem of formalizing the concept of effective calculability. In 1936, he learned of the work of Alonzo Church, at Princeton. Church had also been working on this problem, and in his 1936 paper An unsolvable problem of elementary number theory he presented a definite conclusion: that the class of effectively calculable functions should be identified with the class of functions definable in the lambda calculus, a formal language for specifying the construction of functions. Church moreover showed that exactly the same class of functions could be characterized in terms of formal derivability from equations. Turing then promptly completed writing his paper, in which he presented a very different approach to characterizing the effectively calculable functions, but one that—as he proved—yielded once again the same class of functions as Church had proposed. With Newman’s encouragement, Turing went to Princeton for two years, where he wrote a Ph.D. dissertation under Alonzo Church. Turing’s paper remains a very readable introduction to his ideas.

How might a diligent clerk carry out a calculation, following instructions? He (or she) might organize the work in a notebook. At any given moment his attention is focused on a particular page. Following his instructions, he might alter that page, and then he might turn to another page. And the notebook is large enough (or the supply of fresh paper is ample enough) that he never comes to the last page. The alphabet of symbols available to the clerk must be finite; if there were infinitely many symbols, then there would be two that were arbitrarily similar and so might be confused. We can then without loss of generality regard what can be written on one page of the notebook as a single symbol. And we can envision the notebook pages as being placed side by side, forming a paper tape, consisting of squares, each square being either blank or printed with a symbol. (For uniformity, we can think of a blank square as containing the “blank” symbol B.) At each stage of his work, the clerk—or the mechanical machine—can alter the square under examination, can turn attention to the next square or the previous one, and can look to the instructions to see what part of them to follow next. Turing described the latter part as a “change of state of mind.”

Turing wrote, “We may now construct a machine to do the work.” Such a machine is of course now called a Turing machine, a phrase first used by Church in his review of Turing’s paper in The Journal of Symbolic Logic. The machine has a potentially infinite tape, marked into squares. Initially the given input numeral or word is written on the tape, but it is otherwise blank. The machine is capable of being in any one of finitely many “states” (the phrase “of mind” being inappropriate for a machine). At each step of calculation, depending on its state at the time, the machine can change the symbol in the square under examination at that time, and can turn its attention to the square to the left or to the right, and can then change its state to another state. (The tape stretches endlessly in both directions.)

The program for this Turing machine can be given by a table. Where the possible states of the machine are $q_1, \ldots, q_r$, each line of the table is a quintuple $\langle q_i, S_j, S_k, D, q_m\rangle$ which is to be interpreted as directing that whenever the machine is in state $q_i$ and the square under examination contains the symbol $S_j$, then that symbol should be altered to $S_k$ and the machine should shift its attention to the square on the left (if D = L) or on the right (if D = R), and should change its state to $q_m$. Possibly $S_j$ is the “blank” symbol B, meaning the square under examination is blank; possibly $S_k$ is B, meaning that whatever is in the square is to be erased.
For the program to be unambiguous, it should have no two different quintuples with the same first two components. (By relaxing this requirement regarding absence of ambiguity, we obtain the concept of a non-deterministic Turing machine, which will be useful later, in the discussion of feasible computability.) One of the states, say $q_1$, is designated as the initial state—the state in which the machine begins its calculation. If we start the machine running in this state, and examining the first square of its input, it might (or might not), after some number of steps, reach a state and a symbol for which its table lacks a quintuple having that state and symbol for its first two components. At that point the machine halts, and we can look at the tape (starting with the square then under examination) to see what the output numeral or word is.

Now suppose that Σ is a finite alphabet (the blank B does not count as a member of Σ). Let Σ* be the set of all words over this alphabet (that is, Σ* is the set of all strings, including the empty string, consisting of members of Σ). Suppose that f is a k-place partial function from Σ* into Σ*. We will say that f is Turing computable if there exists a Turing machine M that, when started in its initial state scanning the first symbol of a k-tuple $\vec{w}$ of words (written on the tape, with a blank square between words, and with the rest of the tape blank), behaves as follows:

computability is the correct formalization of the informal concept of effective calculability. Certainly the definition reflects the ideas of following predetermined instructions, without limitation on the amount of time that might be required. (The name “Church–Turing thesis” obscures the fact that Church and Turing followed very different paths in reaching equivalent conclusions.)

Church’s thesis has by now achieved universal acceptance. Kurt Gödel, writing in 1964 about the concept of a “formal system” in logic, involving the idea that the set of correct deductions must be a decidable set, said that “due to A. M. Turing’s work, a precise and unquestionably adequate definition of the general concept of formal system can now be given.” And others agree.

The robustness of the concept of Turing computability is evidenced by the fact that it is insensitive to certain modifications to the definition of a Turing machine. For example, we can impose limitations on the size of the alphabet, or we can insist that the machine never move to the left of its initial starting point. None of this will affect the class of Turing computable partial functions.

Turing developed these ideas prior to the introduction of modern digital computers. After World War II, Turing played an active rôle in the development of early computers, and in the emerging field of artificial intelligence. (During the war, he worked on breaking the German Enigma cipher, militarily important work that remained classified until after Turing’s death.) One can speculate as to whether Turing might have formulated his ideas somewhat differently, if his work had come after the introduction of digital computers.

Digression: There is an interesting example here that goes by the name¹ of “the busy beaver problem.” Suppose we want a Turing machine, starting on a blank tape, to write as many 1’s as it can, and then stop. With a limited number of states, how many 1’s can we get? To make matters more precise, take Turing machines with the alphabet $\{1\}$ (so the only symbols are $B$ and 1). We will allow such machines to have $n$ states, plus a halting state (which can occur as the last member of a quintuple, but not as the first member). For each $n$, there are only finitely many essentially different such Turing machines. Some of them, started on a blank tape, might not halt. For example, the 1-state machine

$$\langle q_1, B, 1, R, q_1\rangle$$

keeps writing forever without halting. But among those that do halt, we seek the ones that write a lot of 1’s. Define $\sigma(n)$ to be the largest number of 1’s that can be written by an $n$-state Turing machine as described above before it halts. For example, $\sigma(1) = 1$, because the 1-state machine

$$\langle q_1, B, 1, R, q_H\rangle$$

(the halting state $q_H$ does not count as one of the $n$ states) writes one 1, and none of the other 1-state machines does any better. (There are not so very many 1-state machines, and one can examine all of them in a reasonable length of time.) Let’s agree that $\sigma(0) = 0$. Then $\sigma$ is a total function. It is also nondecreasing, since having an extra state to work with is never a handicap. Despite the fact that $\sigma(n)$ is merely the largest member of a certain finite set, there is no algorithm that lets us, in general, evaluate it.

¹ This name has given translators much difficulty.
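The idea of examining all machines can be sketched in Python, with one loud caveat: a program cannot wait forever for a machine that never halts, so this sketch assumes a step budget `max_steps` and skips any machine still running when the budget runs out. That makes the result only a lower bound on $\sigma(n)$ in general, which is exactly why this is not an algorithm for $\sigma$. The names `ones_if_halts`, `all_machines`, and `sigma_lower_bound` are hypothetical. For $n = 1$ the budget does no harm: a 1-state machine that stays in $q_1$ after its first step moves onto a fresh blank square and repeats that step forever.

```python
from itertools import product

BLANK, ONE, HALT = "B", "1", "qH"


def ones_if_halts(table, max_steps):
    """Run `table` on a blank tape in state q1.

    Returns the number of 1's on the tape if the machine halts within
    max_steps steps, and None otherwise (it might halt later, or never).
    """
    tape, head, state = {}, 0, "q1"
    for _ in range(max_steps + 1):
        key = (state, tape.get(head, BLANK))
        if key not in table:             # no applicable quintuple: halt
            return sum(1 for s in tape.values() if s == ONE)
        symbol, direction, state = table[key]
        tape[head] = symbol
        head += 1 if direction == "R" else -1
    return None


def all_machines(n):
    """Yield every n-state table over the alphabet {1}, with halting state qH.

    Each (state, symbol) entry is either missing or (write, move, next state).
    """
    states = [f"q{i}" for i in range(1, n + 1)]
    keys = [(q, s) for q in states for s in (BLANK, ONE)]
    actions = [None] + list(product((BLANK, ONE), "LR", states + [HALT]))
    for choice in product(actions, repeat=len(keys)):
        yield {k: a for k, a in zip(keys, choice) if a is not None}


def sigma_lower_bound(n, max_steps):
    """Largest number of 1's written by an n-state machine that halts
    within max_steps steps; in general only a lower bound on sigma(n)."""
    best = 0
    for table in all_machines(n):
        count = ones_if_halts(table, max_steps)
        if count is not None:
            best = max(best, count)
    return best


print(sigma_lower_bound(1, 10))  # 1
print(sigma_lower_bound(2, 20))  # 4
```

For $n = 2$ the search covers $13^4 = 28561$ tables, and the value it reports agrees with $\sigma(2) = 4$; but nothing in the program itself certifies that a skipped machine would never have halted.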

Example: Here is a two-state candidate:

$$\langle q_1, B, 1, R, q_2\rangle \quad \langle q_1, 1, 1, L, q_2\rangle \quad \langle q_2, B, 1, L, q_1\rangle \quad \langle q_2, 1, 1, R, q_H\rangle$$

Started on a blank tape, this machine writes four consecutive 1’s and then halts (after six steps), scanning the third 1. You are invited to verify this by running the machine. We conclude that $\sigma(2) \ge 4$.
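One way to carry out the suggested verification is to simulate the machine. The Python sketch below (the helper names `trace` and `show` are hypothetical) runs the two-state candidate from a blank tape, prints each configuration with the scanned square in brackets, and then reports the number of 1’s, the number of steps, and which 1 is being scanned.

```python
BLANK = "B"

# The two-state candidate: (state, scanned) -> (write, move, next state).
CANDIDATE = {
    ("q1", BLANK): ("1", "R", "q2"),
    ("q1", "1"): ("1", "L", "q2"),
    ("q2", BLANK): ("1", "L", "q1"),
    ("q2", "1"): ("1", "R", "qH"),
}


def show(tape, head):
    """Render the visited part of the tape, bracketing the scanned square."""
    cells = range(min(tape.keys() | {head}), max(tape.keys() | {head}) + 1)
    return "".join(f"[{tape.get(i, BLANK)}]" if i == head else tape.get(i, BLANK)
                   for i in cells)


def trace(table, max_steps=100):
    """Run on a blank tape from state q1, printing each step.

    Stops when no quintuple applies (a halt) or after max_steps steps.
    Returns (tape, head, steps).
    """
    tape, head, state, steps = {}, 0, "q1", 0
    while (state, tape.get(head, BLANK)) in table and steps < max_steps:
        symbol, move, state = table[(state, tape.get(head, BLANK))]
        tape[head] = symbol
        head += 1 if move == "R" else -1
        steps += 1
        print(f"step {steps}: {state:>2}  {show(tape, head)}")
    return tape, head, steps


tape, head, steps = trace(CANDIDATE)
ones = sorted(i for i, s in tape.items() if s == "1")
print(len(ones), steps, ones.index(head) + 1)  # 4 6 3
```

The final line confirms four 1’s, six steps, and the head on the third 1; the trace also shows that the four 1’s occupy consecutive squares.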

Rado’s Theorem (1962): The function $\sigma$ is not Turing computable. Moreover, for any Turing computable total function $f$, we have $f(x) < \sigma(x)$ for all sufficiently large $x$. That is, $\sigma$ eventually dominates any Turing computable total function.

Proof outline: Assume we are given some Turing computable total function $f$. We must show that $\sigma$ eventually dominates it. Define (for reasons that may initially appear mysterious) the function $g$:

$$g(x) = \max\bigl(f(2x), f(2x+1)\bigr) + 1$$

Then $g$ is total, and one can show that it is Turing computable. So there is some Turing machine $M$ with, say, $k$ states that computes it, using the alphabet $\{1\}$ and base-1 notation. For each $x$, let $N_x$ be the $(x+k)$-state Turing machine that first writes $x$ 1’s on the tape and then imitates $M$. (The $x$ states let us write $x$ 1’s on the tape in a straightforward way, and then there are the $k$ states in $M$.) Then $N_x$, when started on a blank tape, writes $g(x)$ 1’s on the tape and halts. So $g(x) \le \sigma(x+k)$, by the definition of $\sigma$. Thus we have

$$f(2x),\; f(2x+1) < g(x) \le \sigma(x+k)$$

and if $x \ge k$, then $x + k \le 2x$, so because $\sigma$ is nondecreasing, $\sigma(x+k) \le \sigma(2x) \le \sigma(2x+1)$.

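Putting these two lines together, we see that $f(y) < \sigma(y)$ for every $y \ge 2k$ (write $y$ as $2x$ or $2x+1$, where $x \ge k$). In particular, $\sigma$ itself is not Turing computable, since $\sigma(y) < \sigma(y)$ is impossible. ⊣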

So $\sigma$ eventually grows faster than any Turing computable total function. How fast does it grow? Among the smaller numbers, $\sigma(2) = 4$. (The preceding example shows that $\sigma(2) \ge 4$. The other inequality is not entirely trivial, because there are thousands of 2-state machines.) It has also been shown that $\sigma(3) = 6$ and $\sigma(4) = 13$. From here on, only lower bounds are known. In

A $k$-place function $h$ is said to be obtained by composition from the $n$-place function $f$ and the $k$-place functions $g_1, \ldots, g_n$ if the equation

$$h(\vec{x}) = f(g_1(\vec{x}), \ldots, g_n(\vec{x}))$$

holds for all $\vec{x}$. In the case of partial functions, it is to be understood here that $h(\vec{x})$ is undefined unless $g_1(\vec{x}), \ldots, g_n(\vec{x})$ are all defined and $\langle g_1(\vec{x}), \ldots, g_n(\vec{x})\rangle$ belongs to the domain of $f$.

A $(k+1)$-place function $h$ is said to be obtained by primitive recursion from the $k$-place function $f$ and the $(k+2)$-place function $g$ (where $k > 0$) if the pair of equations

$$h(\vec{x}, 0) = f(\vec{x})$$
$$h(\vec{x}, y+1) = g(h(\vec{x}, y), \vec{x}, y)$$

hold for all $\vec{x}$ and $y$. Again, in the case of partial functions, it is to be understood that $h(\vec{x}, y+1)$ is undefined unless $h(\vec{x}, y)$ is defined and $\langle h(\vec{x}, y), \vec{x}, y\rangle$ is in the domain of $g$. Observe that in this situation, knowing the two functions $f$ and $g$ completely determines the function $h$. More formally, if $h_1$ and $h_2$ are both obtained by primitive recursion from $f$ and $g$, then for each $\vec{x}$ we can show by induction on $y$ that $h_1(\vec{x}, y) = h_2(\vec{x}, y)$.

For the $k = 0$ case, the one-place function $h$ is obtained by primitive recursion from the two-place function $g$ by using the number $m$ if the pair of equations

$$h(0) = m$$
$$h(y+1) = g(h(y), y)$$

hold for all $y$.

Postponing the matter of search, we define a function to be primitive recursive if it can be built up from zero, successor, and projection functions by use of composition and primitive recursion. In other words, the class of primitive recursive functions is the smallest class that includes our initial functions and is closed under composition and primitive recursion. (Here saying that a class $\mathcal{C}$ is “closed” under composition and primitive recursion means that whenever a function $f$ is obtained by composition from functions in $\mathcal{C}$ or is obtained by primitive recursion from functions in $\mathcal{C}$, then $f$ itself also belongs to $\mathcal{C}$.) Clearly all the primitive recursive functions are total. This is because the initial functions are all total, the composition of total functions is total, and a function obtained by primitive recursion from total functions will be total. We say that a $k$-ary relation $R$ on $\mathbb{N}$ is primitive recursive if its characteristic function is primitive recursive. One can then show that a great many of the common functions on $\mathbb{N}$ are primitive recursive: addition, multiplication, ..., the function whose value at $m$ is the $(m+1)$st prime, .... Chapter 2 will carry out the project of showing that many functions are primitive recursive.

On the one hand, it seems clear that every primitive recursive function should be regarded as being effectively calculable. (The initial functions are pretty easy. Composition presents no big hurdles. Whenever $h$ is obtained by primitive recursion from effectively calculable $f$ and $g$, then we see how we could effectively find $h(\vec{x}, 99)$, by first finding $h(\vec{x}, 0)$ and then working our way up.) On the other hand, the class of primitive recursive functions cannot possibly comprehend all total calculable functions, because we can “diagonalize out” of the class. That is, by suitably indexing the “family tree” of the primitive recursive functions, we can make a list $f_0, f_1, f_2, \ldots$ of all the one-place primitive recursive functions. Then consider the diagonal function $d(x) = f_x(x) + 1$. Then $d$ cannot be primitive recursive; it differs from each $f_x$ at $x$. Nonetheless, if we made our list very tidily, the function $d$ will be effectively calculable. The conclusion is that the class of primitive recursive functions is an extensive but proper subset of the total calculable functions.

Next, we say that a $k$-place function $h$ is obtained from the $(k+1)$-place function $g$ by search, and we write

$$h(\vec{x}) = \mu y\,[g(\vec{x}, y) = 0]$$

if for each $\vec{x}$, the value $h(\vec{x})$ either is the number $y$ such that $g(\vec{x}, y) = 0$ and $g(\vec{x}, s)$ is defined and is nonzero for every $s < y$, if such a number $y$ exists, or else is undefined, if no such number $y$ exists. The idea behind this “$\mu$-operator” is the idea of searching for the least number $y$ that is the solution to an equation, by testing successively $y = 0, 1, \ldots$.

We obtain the general recursive functions by adding search to our closure methods. That is, a partial function is general recursive if it can be built up from the initial zero, successor, and projection functions, by use of composition, primitive recursion, and search (i.e., the $\mu$-operator). The class of general recursive partial functions on $\mathbb{N}$ is (as Turing proved) exactly the same as the class of Turing computable partial functions. This is a rather striking result, in light of the very different ways in which the two definitions were formulated. Turing machines would seem, at first glance, to have little to do with primitive recursion and search. And yet we get exactly the same partial functions from the two approaches. And Church’s thesis therefore has the equivalent formulation that the concept of a general recursive function is the correct formalization of the informal concept of effective calculability.

What if we try to “diagonalize out” of the class of general recursive functions, as we did for the primitive recursive functions? As will be argued later, we can again make a tidy list $\varphi_0, \varphi_1, \varphi_2, \ldots$ of all the one-place general recursive partial functions. And we can define the diagonal function $d(x) = \varphi_x(x) + 1$. But in this equation, $d(x)$ is undefined unless $\varphi_x(x)$ is defined. The diagonal function $d$ is indeed among the general recursive partial functions, and hence is $\varphi_k$ for some $k$, but $d(k)$ must be undefined (if it were defined, we would have $d(k) = \varphi_k(k) + 1 = d(k) + 1$). No contradiction results.

The class of primitive recursive functions was defined by Gödel, in his 1931 paper on the incompleteness theorems in logic.
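The closure operations can be made concrete with a small Python sketch of combinators for the initial functions, composition, primitive recursion (including the $k = 0$ case), and search. This is an illustration under stated assumptions, not the text’s formal development: Python integers stand in for natural numbers (inputs are assumed nonnegative), the zero function is taken to be one-place, and the names `prim_rec`, `prim_rec0`, `mu`, `monus`, and `ceil_sqrt` are hypothetical. Primitive recursion is computed by working upward from $h(\vec{x}, 0)$, as described earlier, and `mu` loops forever when no solution exists, which is how this sketch models “undefined.”

```python
from typing import Callable

Fn = Callable[..., int]


# Initial functions (a one-place zero function is an assumption here).
def zero(x: int) -> int:
    return 0


def succ(x: int) -> int:
    return x + 1


def proj(i: int, k: int) -> Fn:
    """The k-place projection returning its i-th argument (1-indexed)."""
    def p(*args: int) -> int:
        assert len(args) == k
        return args[i - 1]
    return p


def compose(f: Fn, *gs: Fn) -> Fn:
    """h(x) = f(g_1(x), ..., g_n(x))."""
    return lambda *x: f(*(g(*x) for g in gs))


def prim_rec(f: Fn, g: Fn) -> Fn:
    """h(x, 0) = f(x);  h(x, y+1) = g(h(x, y), x, y)   (k > 0)."""
    def h(*args: int) -> int:
        *x, y = args
        value = f(*x)
        for t in range(y):           # work upward from h(x, 0)
            value = g(value, *x, t)
        return value
    return h


def prim_rec0(m: int, g: Fn) -> Fn:
    """The k = 0 case: h(0) = m;  h(y+1) = g(h(y), y)."""
    def h(y: int) -> int:
        value = m
        for t in range(y):
            value = g(value, t)
        return value
    return h


def mu(g: Fn) -> Fn:
    """h(x) = least y with g(x, y) = 0; loops forever if there is none."""
    def h(*x: int) -> int:
        y = 0
        while g(*x, y) != 0:
            y += 1
        return y
    return h


# Examples built only from the initial functions and the three operations.
add = prim_rec(proj(1, 1), compose(succ, proj(1, 3)))          # x + y
mult = prim_rec(zero, compose(add, proj(1, 3), proj(2, 3)))    # x * y
pred = prim_rec0(0, proj(2, 2))                                # max(y - 1, 0)
monus = prim_rec(proj(1, 1), compose(pred, proj(1, 3)))        # max(x - y, 0)
# Least y with y*y >= x, found by search on g(x, y) = monus(x, y*y).
ceil_sqrt = mu(compose(monus, proj(1, 2), compose(mult, proj(2, 2), proj(2, 2))))

print(add(3, 4), mult(3, 4), monus(3, 5), ceil_sqrt(10))  # 7 12 0 4
```

Every function built without `mu` is total, matching the observation that all primitive recursive functions are total; by contrast, `mu(lambda x, y: 1)` would define a function that is undefined everywhere.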
Of course, the idea of defining functions on $\mathbb{N}$ by recursion is much older, and reflects the idea that the natural numbers are built up from the number 0 by repeated application of the successor function. (Dedekind wrote about this topic.) The theory of