



assignment 2 of csce machine learning
Typology: Assignments




Plenty of theories now describe what can be computed. In 1984, however, there was no comparable theory of what computers could learn (and learning theory is still debated today). That is why Valiant's paper A Theory of the Learnable stood out. Its central question is simple to state: what is learnable by a computer, and what is not? In Valiant's words: “The problem is to discover good models that are interesting to study for their own sake and that promise to be relevant both to explaining human experience and to building devices that can learn. The models should also shed light on the limits of what can be learned, just as computability does on what can be computed.”
Suppose for a moment that deep learning, neural networks and similar techniques did not exist. Setting aside any particular learning algorithm exposes something more fundamental: a way to explain what can possibly be learned and where the limits of learnability lie. That calls for an abstract definition of learning, and Valiant offers one: “… we shall say that a program for performing a task has been acquired by learning if it has been acquired by any means other than explicit programming.”
In this paper, though, Valiant restricts the problem to learning to recognize whether or not a given piece of data is an example of a particular concept. For example, is this animal (Fig. 1) a duck or not? In other words, the problem is one of binary classification. The learning method has two parts: “choosing an appropriate information gathering mechanism, the learning protocol”, and “exploring the class of concepts that can be learned using it in a reasonable (polynomial) number of steps”.
The learner has access to a supply of data through two routines. Each call to the EXAMPLES routine produces a positive example at random, drawn according to the underlying distribution of examples in the problem domain. The ORACLE routine accepts data that we construct ourselves and tells us whether or not that data positively exemplifies the concept. In effect, a human being can play the role of ORACLE! Under these conditions, Valiant showed that “it is possible to design learning machines that have all three of the following properties”: the machines can provably learn whole classes of concepts, and these classes can be characterized; the classes are appropriate and nontrivial for general-purpose knowledge; and the deduction process requires only a feasible (polynomial) number of steps.
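The two routines of the learning protocol can be illustrated with a small sketch. This is only an illustration, not code from the paper: the helper name make_protocol, the representation of concepts as Python functions over 0/1 tuples, and the dictionary used for the distribution are all assumptions made here.

```python
import random
from typing import Callable, Dict, Tuple

Vector = Tuple[int, ...]


def make_protocol(concept: Callable[[Vector], bool],
                  distribution: Dict[Vector, float],
                  seed: int = 0):
    """Build hypothetical EXAMPLES and ORACLE routines for a hidden concept.

    `distribution` maps each vector to a weight (an assumption of this sketch);
    only the weights of positive examples matter for EXAMPLES.
    """
    rng = random.Random(seed)
    positives = [v for v in distribution if concept(v)]
    weights = [distribution[v] for v in positives]

    def examples() -> Vector:
        # Draw one positive example at random, following the distribution
        # restricted to vectors that exemplify the concept.
        return rng.choices(positives, weights=weights, k=1)[0]

    def oracle(v: Vector) -> bool:
        # Answer whether a vector chosen by the learner exemplifies the concept.
        return bool(concept(v))

    return examples, oracle


# Usage: the concept "x0 AND x1" over 3-bit vectors with a uniform distribution.
vectors = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
uniform = {v: 1 / len(vectors) for v in vectors}
examples, oracle = make_protocol(lambda v: v[0] == 1 and v[1] == 1, uniform)
```

The learner never sees the concept itself; it only calls examples() and oracle(v), which is the sense in which a person could stand in for ORACLE.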
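Learnability now has a precise definition: “A class X of programs is learnable with respect to a given learning protocol if and only if there exists an algorithm A (the deduction procedure) invoking the protocol with the following properties:” In outline, the properties are that A runs in time polynomial in an adjustable parameter h and in the size of the program to be learned, and that with probability at least $1 - h^{-1}$ it deduces a program that never outputs 1 when it should not and that fails to output 1 only on positive examples whose total probability is at most $h^{-1}$.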
Which classes of tasks are learnable in polynomial time under this learning protocol? At least three: k-CNF expressions, monotone DNF expressions and μ-expressions, defined as follows.
Conjunctive normal form (CNF) expressions: “A k-CNF expression is a CNF expression where each clause is the sum of at most k literals.” (Here “sum” means a disjunction, that is, an OR of literals.)
Disjunctive normal form (DNF) expressions: “A monotone DNF expression is one in which no variable is negated in it.”
μ-expressions: “A μ-expression is an expression in which each variable appears at most once. We can assume monotone μ-expressions (no negatives) since we can always relabel negated variables with new names that denote their negation.”
The paper thus gives results for CNF expressions, DNF expressions, and arbitrary expressions in which each variable occurs only once. The probabilistic analysis rests on a lemma about a function L(h, S), where h is any real number greater than one and S is any positive integer.
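To make the k-CNF case concrete, here is a minimal sketch of an elimination-style learner that uses only positive examples: start from the conjunction of every clause with at most k literals, then delete each clause that some positive example falsifies. The function names, the encoding of a literal as an (index, sign) pair, and the num_calls parameter are assumptions of this sketch; the number of calls needed for the paper's guarantee is not reproduced here.

```python
from itertools import combinations, product


def all_clauses(n, k):
    """Every clause of at most k literals over n variables.

    A literal is (index, sign): sign True stands for x_i, False for NOT x_i.
    A clause is a frozenset of literals on distinct variables.
    """
    clauses = set()
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for signs in product((True, False), repeat=size):
                clauses.add(frozenset(zip(idxs, signs)))
    return clauses


def clause_true(clause, v):
    # A clause (an OR of literals) holds if at least one literal matches v.
    return any(bool(v[i]) == sign for i, sign in clause)


def learn_k_cnf(examples, n, k, num_calls):
    """Keep only the clauses that every observed positive example satisfies."""
    g = all_clauses(n, k)
    for _ in range(num_calls):
        v = examples()  # one positive example from the protocol
        g = {c for c in g if clause_true(c, v)}
    return g


def evaluate(g, v):
    # The hypothesis is the AND of all surviving clauses.
    return all(clause_true(c, v) for c in g)
```

If the hidden concept really is a k-CNF expression, none of its clauses is ever deleted, so the hypothesis can only err by rejecting a true example, never by accepting a non-example.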
“Let L(h, S) be the smallest integer such that in L(h, S) independent Bernoulli trials each with probability at least $h^{-1}$ of success, the probability of having fewer than S successes is less than $h^{-1}$ … The following simple upper bound holds for the whole range of values of S and h and shows that L(h, S) is essentially linear both in h and in S.”
$L(h, S) \le 2h(S + \ln h)$
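The bound can be checked numerically for small values. The sketch below computes L(h, S) by brute force, taking the success probability to be exactly 1/h (the hardest case allowed by “at least”), and compares it with 2h(S + ln h). The function names are made up for this illustration.

```python
import math


def prob_fewer_than(S, n, p):
    """P[Binomial(n, p) < S], summed directly from the binomial formula."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(S))


def smallest_L(h, S):
    """Smallest n such that n trials with success probability 1/h give
    fewer than S successes with probability below 1/h."""
    p = 1.0 / h
    n = S  # with fewer than S trials, S successes are impossible
    while prob_fewer_than(S, n, p) >= p:
        n += 1
    return n


def valiant_bound(h, S):
    # The upper bound quoted from the paper: 2h(S + ln h).
    return 2 * h * (S + math.log(h))


if __name__ == "__main__":
    for h, S in [(2, 1), (4, 3), (10, 5)]:
        print(h, S, smallest_L(h, S), valiant_bound(h, S))
```

The brute-force search stops at the first n that works, which is valid because the probability of fewer than S successes can only shrink as more trials are added.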
Why does this paper matter? It provides a general framework for asking what is learnable within the bounds of algorithmic complexity. It introduces the idea of problems that are probably approximately learnable, that is, learnable in polynomial time with acceptable correctness; this framework is now usually called probably approximately correct (PAC) learning. And it proves that at least three classes of programs are learnable in this sense. According to Valiant himself, “the main contribution of this paper is that it shows that it is possible to design learning machines that have all three … properties”. That said, PAC learning “applies in a natural way to only a rather narrow class of learning problems”: learning kinds from labelled data, on the condition that each kind has an exact definition. Treating PAC learning as a universal framework for learning would therefore risk overlooking other forms of learning.