Learning and Generalization in Machine Learning

This chapter discusses the core question of what learning really is and how it relates to generalization in machine learning. It uses examples of two individuals with different learning disorders to explain the importance of finding the right balance between over-generalization and over-fitting. The chapter also explains how machine learning is not just about remembering and regurgitating observed information, but rather about transferring properties from observed data onto new, unobserved data. It concludes by discussing the fundamental tradeoff between model complexity and the amount of data to be transmitted.

Learning

This chapter is without question the most important one of the book. It concerns the core, almost philosophical question of what learning really is (and what it is not). If you want to remember one thing from this book, you will find it here in this chapter.

Ok, let's start with an example. Alice has a rather strange ailment. She is not able to recognize objects by their visual appearance. At her home she is doing just fine: her mother explained to Alice, for every object in her house, what it is and how you use it. When she is home, she recognizes these objects (if they have not been moved too much), but when she enters a new environment she is lost. For example, if she enters a new meeting room she needs a long time to infer what the chairs and the table in the room are. She has been diagnosed with a severe case of "overfitting".

What is the matter with Alice? Nothing is wrong with her memory, because she remembers the objects once she has seen them. In fact, she has a fantastic memory. She remembers every detail of the objects she has seen. And every time she sees a new object she reasons that the object in front of her is surely not a chair, because it doesn't have all the features she has seen in earlier chairs. The problem is that Alice cannot generalize the information she has observed from one instance of a visual object category to other, yet unobserved members of the same category.

The fact that Alice's disease is so rare is understandable: there must have been a strong selection pressure against it. Imagine our ancestors walking through the savanna one million years ago. A lion appears on the scene. Ancestral Alice has seen lions before, but not this particular one, and it does not induce a fear response. Of course, she has no time to logically infer the possibility that this animal may be dangerous. Alice's contemporaries noticed that the animal was yellow-brown, had a mane, etc., and immediately understood that this was a lion. They understood that all lions have these particular characteristics in common, but may differ in some other ones (like the presence of a scar someplace).

Bob has another disease, which is called over-generalization. Once he has seen an object he believes almost everything is some, perhaps twisted, instance of the same object class. (In fact, I seem to suffer from this every now and then, when I think all of machine learning can be explained by one new exciting principle.) If ancestral Bob walks the savanna and has just encountered an instance of a lion and fled into a tree with his buddies, the next time he sees a squirrel he believes it is a small instance of a dangerous lion and flees into the trees again. Over-generalization seems to be rather common among small children.

One of the main conclusions from this discussion is that we should neither over-generalize nor over-fit. We need to be on the edge of being just right. But just right about what? It doesn't seem there is one correct, God-given definition of the category "chairs". We seem to all agree, but one can surely find examples that would be difficult to classify. When do we generalize exactly right? The magic word is PREDICTION. From an evolutionary standpoint, all we have to do is make correct predictions about aspects of life that help us survive.
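To make the balance between over-fitting and over-generalizing concrete, here is a minimal sketch (my illustration, not from the original text) that fits polynomials of increasing degree to a handful of noisy points. The sine-shaped data, the chosen degrees, and the use of numpy are all assumptions for the sake of the example; the point is only that the right amount of generalization is judged by prediction on new, unseen data rather than by fit to the observed data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy observations of an underlying smooth rule: y = sin(x) + noise.
def sample(n):
    x = rng.uniform(0, 3, size=n)
    y = np.sin(x) + 0.1 * rng.normal(size=n)
    return x, y

x_train, y_train = sample(10)   # the few objects Alice/Bob got to see
x_test,  y_test  = sample(200)  # the new, yet unobserved world

for degree in [0, 1, 3, 9]:
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err  = np.mean((np.polyval(coeffs, x_test)  - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")

# Typical outcome: degree 0 predicts poorly everywhere (over-generalization,
# like Bob), degree 9 fits the 10 observed points almost perfectly but
# predicts new points badly (over-fitting, like Alice); an intermediate
# degree predicts unseen data best.
```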
Nobody really cares about the definition of lion, but we do care about our responses to the various animals (run away for lion, chase for deer). And there are a lot of things that can be predicted in the world. This food kills me but that food is good for me. Drumming my fists on my hairy chest in front of a female generates opportunities for sex; sticking my hand into that yellow-orange flickering "flame" hurts my hand, and so on. The world is wonderfully predictable and we are very good at predicting it.

So why do we care about object categories in the first place? Well, apparently they help us organize the world and make accurate predictions. The category "lions" is an abstraction, and abstractions help us to generalize. In a certain sense, learning is all about finding useful abstractions or concepts that describe the world. Take the concept "fluid": it describes all watery substances and summarizes some of their physical properties. Or the concept "weight": an abstraction that describes a certain property of objects.

Here is one very important corollary for you: "machine learning is not in the business of remembering and regurgitating observed information, it is in the business of transferring (generalizing) properties from observed data onto new, yet unobserved data". This is the mantra of machine learning that you should repeat to yourself every night before you go to bed (at least until the final exam).

The information we receive from the world has two components to it: there is the part of the information which does not carry over to the future, the unpredictable information. We call this "noise". And then there is the information that is predictable, the learnable part of the information stream. The task of any learning algorithm is to separate the predictable part from the unpredictable part.

Now imagine Bob wants to send an image to Alice. He has to pay 1 dollar cent for every bit that he sends. If the image were completely white, it would be really stupid of Bob to send the message: pixel 1: white, pixel 2: white, pixel 3: white, ... He could just have sent the message "all pixels are white!". The blank image is completely predictable but carries very little information. Now imagine an image that consists of white noise (your television screen if the cable is not connected). To send the exact image, Bob will have to send pixel 1: white, pixel 2: black, pixel 3: black, ... Bob cannot do better, because there is no predictable information in that image, i.e. there is no structure to be modeled. You can imagine playing a game, revealing one pixel at a time to someone and paying him $1 for every next pixel he predicts correctly. For the white image you can do perfectly; for the noisy picture you would be guessing at random. Real pictures are in between: some pixels are very hard to predict, while others are easier.

To compress the image, Bob can extract rules such as: always predict the same color as the majority of the pixels next to you, except when there is an edge. These rules constitute the model for the regularities of the image. Instead of sending the entire image pixel by pixel, Bob will now first send his rules and ask Alice to apply them. Every time a rule fails, Bob also sends a correction: pixel 103: white, pixel 245: black. A few rules and two corrections are obviously cheaper than 256 pixel values and no rules. There is one fundamental tradeoff hidden in this game.
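The sketch below (again my illustration, not from the original text) plays Bob's game on two small binary images. The single rule "predict the same color as the previous pixel" stands in for the text's neighbour-majority rule, and the 32-bit price tag for sending the rule is an arbitrary stand-in for the cost of describing a model; both are assumptions made only to put numbers on the idea of model plus corrections versus raw pixels.

```python
import numpy as np

rng = np.random.default_rng(1)

def transmission_cost(pixels, model_cost_bits=32):
    """Bits Bob pays when Alice applies the rule 'each pixel equals the
    previous one' and Bob sends a correction (position + value) for
    every pixel where the rule fails."""
    predictions = np.concatenate(([0], pixels[:-1]))  # rule: copy previous pixel
    failures = int(np.sum(predictions != pixels))
    bits_per_correction = int(np.ceil(np.log2(len(pixels)))) + 1  # position + value
    with_model = model_cost_bits + failures * bits_per_correction
    raw = len(pixels)  # one bit per pixel, no model at all
    return raw, with_model, failures

# A structured image: long runs of white and black (highly predictable).
structured = np.repeat([0, 1, 0, 1], 64)
# A noise image: every pixel is an independent coin flip (unpredictable).
noise = rng.integers(0, 2, size=256)

for name, img in [("structured", structured), ("noise", noise)]:
    raw, modeled, failures = transmission_cost(img)
    print(f"{name:10s}: raw = {raw} bits, rule + {failures} corrections = {modeled} bits")

# For the structured image only a handful of corrections are needed, so the
# rule pays for itself; for the noise image almost every pixel needs a
# correction and sending the raw pixels is cheaper.
```

Note also that if Bob sent many images with the same regularities, the one-time cost of the rule would be amortized over all of them, which is exactly the tradeoff the chapter turns to next.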
Since Bob is sending only a single image, it does not pay to send an incredibly complicated model that would require more bits to explain than simply sending all the pixel values. If he were sending 1 billion images, it would pay off to first send the complicated model, because he would be saving a fraction of the bits for every image. On the other hand, if Bob wants to send only 2 pixels, there really is no need to send a model whatsoever. Therefore: the size of Bob's model depends on the amount of data he wants to transmit. Ironically, the boundary between what is model and what is noise depends on how much data we are dealing with! If we use a