







Cryptography and Machine Learning. Ronald L. Rivest*. Laboratory for Computer Science. Massachusetts Institute of Technology. Cambridge, MA 02139. Abstract.








Abstract
This paper gives a survey of the relationship between the fields of cryptography and machine learning, with an emphasis on how each field has contributed ideas and techniques to the other. Some suggested directions for future cross-fertilization are also proposed.
The field of computer science blossomed in the 1940's and 50's, following some theoretical developments of the 1930's. From the beginning, both cryptography and machine learning were intimately associated with this new technology. Cryptography played a major role in the course of World War II, and some of the first working computers were dedicated to cryptanalytic tasks. And the possibility that computers could "learn" to perform tasks, such as playing checkers, that are challenging to humans was actively explored in the 50's by Turing [46], Samuel [39], and others.

In this note we examine the relationship between the fields of cryptography and machine learning, emphasizing the cross-fertilization of ideas, both realized and potential.

The reader unfamiliar with either of these fields may wish to consult some of the excellent surveys and texts available for background reading. In the area of cryptography, there is the classic historical study of Kahn [20], the survey papers of Diffie and Hellman [11], Rivest [37], and Simmons [44], as well as the texts by Brassard [8], Denning [10], and Davies and Price [9], among others. The CRYPTO and EUROCRYPT conference proceedings (published by Springer) are also extremely valuable sources. In the area of machine learning, there are standard collections of papers [29, 30, 23] for "AI" style machine learning, the seminal paper of Valiant [47] for the "computational learning theory" approach, the COLT conference proceedings (published by Morgan Kaufmann) for additional material of a theoretical nature, and the NIPS conference proceedings (also
*Supported by NSF grant CCR-8914428, ARO grant N00014-89-J-1988, and the Siemens Corporation. Email address: rivest@theory.lcs.mit.edu
published by Morgan Kaufmann) for many interesting papers. The ACM STOC and the IEEE FOCS conference proceedings also contain many key theoretical papers from both areas.

The Ph.D. thesis of Kearns [21] is one of the first major works to explore the relationship between cryptography and machine learning, and is also an excellent introduction to many of the key concepts and results.
2 Initial Comparison
Machine learning and cryptanalysis can be viewed as "sister fields," since they share many of the same notions and concerns. In a typical cryptanalytic situation, the cryptanalyst wishes to "break" some cryptosystem. Typically this means he wishes to find the secret key used by the users of the cryptosystem, where the general system is already known. The decryption function thus comes from a known family of such functions (indexed by the key), and the goal of the cryptanalyst is to exactly identify which such function is being used. He may typically have available a large quantity of matching ciphertext and plaintext to use in his analysis. This problem can also be described as the problem of "learning an unknown function" (that is, the decryption function) from examples of its input/output behavior and prior knowledge about the class of possible functions.

Valiant [47] notes that good cryptography can therefore provide examples of classes of functions that are hard to learn. Specifically, he references the work of Goldreich, Goldwasser, and Micali [14], who demonstrate (under the assumption that one-way functions exist) how to construct a family of "pseudo-random" functions $F_k : \{0,1\}^k \to \{0,1\}^k$ for each $k > 0$ such that (i) each function $f_i \in F_k$ is described by a $k$-bit index $i$, (ii) there is a polynomial-time algorithm that, on input $i$ and $x$, computes $f_i(x)$ (so that each function in $F_k$ is computable by a polynomial-size boolean circuit), and (iii) no probabilistic polynomial-time algorithm can distinguish functions drawn at random from $F_k$ from functions drawn at random from the set of all functions from $\{0,1\}^k$ to $\{0,1\}^k$, even if the algorithm can dynamically ask for and receive polynomially many evaluations of the unknown function at arguments of its choice. (It is interesting to note that Section 4 of Goldreich et al. 
[14] makes an explicit analogy with the problem of "learning physics" from experiments, and notes that their results imply that some such learning problems can be very hard.)

We now turn to a brief comparison of terminology and concepts, drawing some natural correspondences, some of which have already been illustrated in the above example.

Secret Keys and Target Functions

The notion of "secret key" in cryptography corresponds to the notion of "target function" in machine learning theory, and more generally the notion of "key space" in cryptography corresponds to the notion of the "class of possible target functions." For cryptographic (encryption) purposes, these functions must also be efficiently invertible, while no such requirement is assumed in a typical machine learning context. There is another aspect of this correspondence in which the fields differ: while in cryptography it is common to assume that the size of the unknown key is known to the cryptanalyst (this usually falls under the general assumption that "the general system is known"), there is much interesting research in machine learning theory that assumes that the complexity (size) of
Exact versus Approximate Inference

In the practical cryptographic domain, an attacker typically aims for a "total break," in which he determines the unknown secret key. That is, he exactly identifies the unknown cryptographic function. Approximate identification of the unknown function is typically not a goal, because the set of possible cryptographic functions used normally does not admit good approximations. On the other hand, the theoretical development of cryptography has focussed on definitions of security that exclude even approximate inference by the cryptanalyst. (See, for example, Goldwasser and Micali's definitions in their paper on probabilistic encryption [15].) Such theoretical definitions and corresponding results are thus applicable to derive results on the difficulty of (even approximately) learning, as we shall see.

The machine learning literature deals with both exact inference and approximate inference. Because exact inference is often too difficult to perform efficiently, much of the more recent research in this area deals with approximate inference. (See, for example, the key paper on learnability and the Vapnik-Chervonenkis dimension by Blumer et al. [7].) Approximate learning is normally the goal when the input data consists of randomly chosen examples. On the other hand, when the learner may actively query or experiment with the unknown target function, exact identification is normally expected.

Computational Complexity

The computational complexity (sometimes called "work factor" in the cryptographic literature) of a cryptanalytic or learning task is of major interest in both fields. In cryptography, the major goal is to "prove" security under the broadest possible definition of security, while making the weakest possible complexity-theoretic assumptions. Assuming the existence of one-way functions has been a common such weakest possible assumption. 
Given such an assumption, in the typical paradigm it is shown that there is no polynomial-time algorithm that can "break" the security of the proposed system. (Proving, say, exponential-time lower bounds could presumably be done, at the expense of making stronger initial assumptions about the difficulty of inverting a one-way function.)

In machine learning, polynomial-time learning algorithms are the goal, and there exist many clever and efficient learning algorithms for specific problems. Sometimes, as we shall see, polynomial-time algorithms can be proved not to exist, under suitable cryptographic assumptions. Sometimes, as noted above, a learning algorithm does not know in advance the size of the unknown target hypothesis, and to be fair, we allow it to run in time polynomial in this size as well. Often the critical problem to be solved is that of finding a hypothesis (from the known class of possible hypotheses) that is consistent with the given set of examples; this is often true even if the learning algorithm is trying merely to approximate the unknown target function.

For both cryptanalysis and machine learning, there has been some interest in minimizing space complexity as well as time complexity. In the cryptanalytic domain, for example, Hellman [18] and Schroeppel and Shamir [42] have investigated space/time trade-offs for breaking certain cryptosystems. In the machine learning literature, Schapire has shown the surprising result [41, Theorem 6.1] that if there exists an efficient learning algorithm for a class of functions, then there is a learning algorithm whose space complexity grows only logarithmically in the size of the data sample needed (as $\epsilon$, the approximation parameter, goes to 0).
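To make the consistency problem mentioned above concrete, the following is a minimal Python sketch for one very simple hypothesis class, monotone conjunctions of boolean variables. The choice of class, the function name consistent_conjunction, and the sample data are illustrative assumptions and do not come from the paper; the procedure is the standard elimination approach, which runs in time polynomial in the number of variables and examples.

```python
from typing import List, Optional, Set, Tuple

# A labeled example: (bit vector, label), with label 1 for positive examples.
Example = Tuple[Tuple[int, ...], int]


def consistent_conjunction(examples: List[Example], n: int) -> Optional[Set[int]]:
    """Return the indices of a monotone conjunction over n variables that
    agrees with every example, or None if no such conjunction exists.

    Hypothetical helper for illustration only.
    """
    # Start with the most specific hypothesis: the AND of all n variables.
    kept = set(range(n))
    # A variable that is 0 in some positive example cannot be in the conjunction.
    for x, label in examples:
        if label == 1:
            kept = {i for i in kept if x[i] == 1}
    # The result is the most specific conjunction consistent with the positives;
    # if it accepts any negative example, no monotone conjunction is consistent.
    for x, label in examples:
        if label == 0 and all(x[i] == 1 for i in kept):
            return None
    return kept


# Sample data labeled by the (assumed) target x0 AND x2 over four variables.
sample = [
    ((1, 1, 1, 0), 1),
    ((1, 0, 1, 1), 1),
    ((0, 1, 1, 1), 0),
    ((1, 1, 0, 1), 0),
]
hypothesis = consistent_conjunction(sample, 4)

# Contradictory data: the same input labeled both ways.
contradictory = [((1, 1), 1), ((1, 1), 0)]
no_hypothesis = consistent_conjunction(contradictory, 2)
```

For richer classes the same consistency question can be computationally hard, which is where the cryptographic assumptions discussed in this paper enter.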
Unicity Distance and Sample Complexity

In his classic paper on cryptography [43], Shannon defines the "unicity distance" of a
of bits needed to describe a key, on the average), and where D is the redundancy of the language (about 2.3 bits/letter in English). The unicity distance measures the amount of ciphertext that must be intercepted in order to make the solution unique; once that amount of ciphertext has been intercepted, one expects there to be a unique key that will decipher the ciphertext into acceptable English. The unicity distance is an "information-theoretic" measure of the amount of data that a cryptanalyst needs to succeed in exactly identifying the unknown secret key.

Similar information-theoretic notions play a role in machine learning theory, although there are differences arising from the fact that in the standard PAC-learning model there may be infinitely many possible target hypotheses, but on the other hand only an approximately correct answer is required. The Vapnik-Chervonenkis dimension [7] is a key concept in coping with this issue.

Other differences
more carefully in the learning scenario than the cryptanalytic scenario, probably because a little noise in the cryptanalytic situation can render analysis (and legitimate decryption) effectively hopeless. However, there are cryptographic systems [48, 28, 45] that can make effective use of noise to improve security, and other (analog) schemes [49, 50] that attempt to work well in spite of possible noise.

Often the inference problems studied in machine learning theory are somewhat more general than those that occur naturally in cryptography. For example, work has been done on target concepts that "drift" over time [19, 24]; such variability is rare in cryptography (users may change their secret keys from time to time, but this is a dramatic change, not gradual drift). In another direction, some work [38] has been done on learning "concept hierarchies"; such a framework is rare in cryptography (although when breaking a substitution cipher one may first learn what the vowels are, and then learn the individual substitutions for each vowel).
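As a small worked instance of the unicity distance discussed earlier in this section, the Python sketch below evaluates Shannon's ratio of key entropy to language redundancy for a simple substitution cipher. The choice of cipher, the assumption that all 26! keys are equally likely (so that the key entropy is the base-2 logarithm of the number of keys), and the function name unicity_distance are illustrative assumptions; the redundancy figure of 2.3 bits/letter is the one quoted in the text.

```python
import math


def unicity_distance(num_keys: int, redundancy_bits_per_letter: float) -> float:
    """Key entropy divided by language redundancy, in letters of ciphertext.

    Assumes keys are chosen uniformly at random, so the key entropy is
    log2(num_keys) bits. Hypothetical helper for illustration only.
    """
    key_entropy_bits = math.log2(num_keys)
    return key_entropy_bits / redundancy_bits_per_letter


# A simple substitution cipher on a 26-letter alphabet has 26! possible keys.
substitution_keys = math.factorial(26)
key_entropy = math.log2(substitution_keys)  # roughly 88.4 bits

# With the redundancy of 2.3 bits/letter quoted in the text, this comes to
# roughly 38 letters of ciphertext.
letters_needed = unicity_distance(substitution_keys, 2.3)
```

Under these assumptions, a few dozen letters of intercepted ciphertext are expected to determine the substitution key uniquely, although the measure says nothing about the computational effort needed to find it.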
3 Cryptography's impact on Learning Theory
As noted earlier, Valiant [47] argued that the work of Goldreich, Goldwasser, and Micali [14] on random functions implies that even approximately learning the class of functions representable by polynomial-size boolean circuits is infeasible, assuming that one-way functions exist, even if the learner is allowed to query the unknown function. So researchers in machine learning have focussed on the question of identifying which simpler classes of functions are learnable (approximately, from random examples, or exactly, with queries). For example, a major open question in the field is whether the class of boolean functions representable as boolean formulas in disjunctive normal form (DNF) is efficiently learnable from random examples.

The primary impact of cryptography on machine learning theory is a natural (but negative) one: showing that certain learning problems are computationally intractable. Of
1/2 the classification of new examples, then the learning algorithm could be used to "break" one of the cryptographic problems assumed to be hard.

The results of Kearns and Valiant were also based on the work of Pitt and Warmuth [34], who develop the notion of a "prediction-preserving reducibility." The definition implies that if class A is so reducible to class B and class B is efficiently predictable, then so is class A. Using this notion of prediction-preserving reducibility, they show a number of classes of functions to be "prediction-complete" for various complexity classes. In particular, the problem of predicting the class of alternating DFAs is shown to be prediction-complete for P, ordinary DFAs are as hard to predict as any function computable in log-space, and boolean formulas are prediction-complete for $NC^1$. These results, and the notion of prediction-preserving reducibility, were central to the work of Kearns and Valiant.

The previous results assumed a learning scenario in which the learner was working from random examples of the input/output behavior of the target function. One can ask if cryptographic techniques can be employed to prove that certain classes of functions are unlearnable even if the learner may make use of queries. Angluin and Kharitonov [5] have done so, showing (modulo the usual cryptographic assumptions regarding RSA, quadratic residues, or factoring Blum integers) that there is no polynomial-time
functions:
These results are based on the public-key cryptosystem of Naor and Yung [31], which is provably secure against a chosen-ciphertext attack. (Basically, the queries asked by the learner get translated into chosen-ciphertext requests against the Naor-Yung scheme.)
4 Learning Theory's impact on Cryptography
Since most of the negative results in learning theory already depend on cryptographic assumptions, there has been no impact of negative results in learning theory on the development of cryptographic schemes. Perhaps some of the results and concepts of Pitt and Warmuth [34] could be applied in this direction, but this has not been done.
On the other hand, the positive results in learning theory are normally independent of cryptographic assumptions, and could in principle be applied to the cryptanalysis of relatively simple cryptosystems. Much of this discussion will be speculative in nature, since there is little in the literature exploring these possibilities. We sketch some possible approaches, but leave their closer examination and validation (either theoretical or empirical) as open problems.
[Figure 1 (image not reproduced): two panels, labeled "Encryption" and "Decryption".]
Figure 1: In cipher-feedback mode, each plaintext message bit m is encrypted by exclusive-oring it with the result of applying the function f to the last n bits of ciphertext, where n is the size of the shift register. The ciphertext bit c is transmitted over the channel; the corresponding decryption process is illustrated on the right.
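The scheme described in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a cipher from the paper: the feedback function example_f, the register size of 4, the initial register contents iv, and all function names are hypothetical. The last helper shows why matching plaintext and ciphertext bits expose input/output pairs of f: since c = m XOR f(register), we have f(register) = m XOR c.

```python
from typing import Callable, List, Tuple

Register = Tuple[int, ...]
Feedback = Callable[[Register], int]


def example_f(register: Register) -> int:
    """Hypothetical nonlinear feedback function on a 4-bit register."""
    a, b, c, d = register
    return (a & b) ^ c ^ d


def cfb_encrypt(f: Feedback, message: List[int], iv: Register) -> List[int]:
    """Encrypt bit by bit: c = m XOR f(last n ciphertext bits)."""
    register = tuple(iv)  # assumed initial register contents
    ciphertext = []
    for m in message:
        c = m ^ f(register)
        ciphertext.append(c)
        register = register[1:] + (c,)  # shift the new ciphertext bit in
    return ciphertext


def cfb_decrypt(f: Feedback, ciphertext: List[int], iv: Register) -> List[int]:
    """Decrypt bit by bit: m = c XOR f(last n ciphertext bits)."""
    register = tuple(iv)
    message = []
    for c in ciphertext:
        message.append(c ^ f(register))
        register = register[1:] + (c,)
    return message


def known_plaintext_pairs(message: List[int], ciphertext: List[int],
                          n: int) -> List[Tuple[Register, int]]:
    """Recover (input, output) pairs for f from matching plaintext/ciphertext.

    From position n onward the register holds exactly the previous n
    ciphertext bits, so no knowledge of f or of the initial register is needed.
    """
    return [
        (tuple(ciphertext[t - n:t]), message[t] ^ ciphertext[t])
        for t in range(n, len(ciphertext))
    ]


plaintext = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
iv = (0, 1, 1, 0)
ciphertext = cfb_encrypt(example_f, plaintext, iv)
recovered = cfb_decrypt(example_f, ciphertext, iv)
pairs = known_plaintext_pairs(plaintext, ciphertext, 4)
```

The receiver recovers the message because both sides feed the same ciphertext bits into the same function f, and the pairs returned by known_plaintext_pairs are exactly the kind of labeled examples a learning algorithm would be given.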
Perhaps the most straightforward application of learning results would be for the cryptanalysis of nonlinear feedback shift-registers operating in cipher-feedback mode. See Figure 1. The feedback function f is known only to the sender and the receiver; it
If the cryptanalyst has a collection of matching plaintext/ciphertext bits, then he has a number of corresponding input/output pairs for the unknown function f. A learning algorithm that can infer f from such a collection of data could then be used as a cryptanalytic tool. Moreover, a chosen-ciphertext attack gives the cryptanalyst the ability to query the unknown function f at arbitrary points, so a learning algorithm that can infer f using queries could be an effective cryptanalytic tool.

We note that a definition of learnability that permits "approximate" learning is a good fit for this problem: if the cryptanalyst can learn an approximation to f that agrees with f 99% of the time, then he will be able to decrypt 99% of the plaintext.

Suppose first that we consider a known plaintext attack. Good cryptographic design principles require that f be about as likely to output 0 as it is to output 1. This would typically imply that the shift register contents can be reasonably viewed as a randomly
that it is this assumption that makes our remarks here speculative in nature; detailed analysis or experimentation is required to verify that this assumption is indeed reasonable in each proposed direction.) There are a number of learning-theory results that assume that examples are drawn from $\{0,1\}^n$ according to the uniform distribution. While this assumption seems rather unrealistic and restrictive in most learning applications, it is a perfect match for such a cryptographic scenario. What cryptographic lessons can be
References