21-2: Introduction
• We've talked about learning previously in the context of specific algorithms.
• Purpose: discuss learning more generally.
• Give a flavor of other approaches to learning.
• Talk more carefully about how to evaluate the performance of a learning algorithm.
  ◦ This will come in handy for project 3.

21-3: Defining Learning
• So far, we've defined a learning agent as one that can improve its performance over time.
• We've seen two learning algorithms:
  ◦ Decision trees
  ◦ Bayesian learning
• Let's define the problem a bit more precisely.

21-4: Defining Learning
• A program is said to learn from experience E with respect to a set of tasks T and a performance measure P if its performance on T, as measured by P, improves with experience E.
• This means that, for a well-formulated learning problem, we need:
  ◦ A set of tasks the agent must perform
  ◦ A way to measure its performance
  ◦ A way to quantify the experience the agent receives

21-5: Examples
• Speech recognition
  ◦ Task: successfully recognize spoken words
  ◦ Performance measure: fraction of words correctly recognized
  ◦ Experience: a database of labeled, spoken words
• Learning to drive a car
  ◦ Task: drive on a public road using vision sensors
  ◦ Performance measure: average distance driven without error
  ◦ Experience: a sequence of images and reactions from a human driver
• Learning to play backgammon
  ◦ Task: play backgammon
  ◦ Performance measure: number of games won against humans of the appropriate caliber
  ◦ Experience: playing games against itself

21-6: Discussion
• Notice that not all performance measures are the same.
  ◦ In some cases, we want to minimize all errors. In other cases, some sorts of errors can be more easily tolerated than others.
• Also, not all experience is the same.
  ◦ Are examples labeled?
  ◦ Does a learning agent immediately receive a reward after selecting an action?
  ◦ How is experiential data represented? Symbolic? Continuous?
• Also: what is the final product?
  ◦ Do we simply need an agent that performs correctly?
  ◦ Or is it important that we understand why the agent performs correctly?

21-7: Other types of learning
• In this class, we'll focus on inductive supervised learning.
  ◦ Well understood, mature, many applications.
• There are other types of learning:
  ◦ Deductive learning
  ◦ Unsupervised learning
  ◦ Reinforcement learning

21-8: Deductive Learning
• Recall that induction develops a general hypothesis from specific data.
• Deductive learning develops rules about specific situations from general principles.
  ◦ "Knowledge-based" learning might be a better name; some induction may take place.
• For example, a deductive learning agent might cache the solution to a previous search problem so that it doesn't need to re-solve the problem.
• It might even try to generalize some of the specifics of the solution to apply to other instances.
  ◦ Soar uses this style of learning.
  ◦ Case-based reasoning is another example of this style of learning.

21-9: Unsupervised Learning
• In unsupervised learning, there is no teacher who has presented the learner with labeled examples.
• Instead, all the learner has is data.
• Problem: find a hypothesis (or pattern) that explains the data.

21-10: Clustering
• One example of unsupervised learning is clustering.
• Given a collection of data, group the data into k clusters, such that similar items are in the same cluster.
• Challenge: we don't know the class definitions in advance.

21-11: Agglomerative Clustering of Text
• One place where this is often applied is in document processing.
• Given a collection of documents, organize them into clusters based on topic.
• No preset list of potential categories, or labeled documents.
• Algorithm (a sketch follows below):
  ◦ D = {d_1, d_2, ..., d_n}
  ◦ While |D| > k:
    • Find the documents d_i and d_j that are closest according to some similarity measure.
    • Remove them from D.
    • Construct a new document that is the "union" of d_i and d_j, and add it to D.
• Result: a set of categories emerges from a collection of documents.

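To make the loop above concrete, here is a minimal Python sketch of the merging loop. The bag-of-words representation, the cosine similarity measure, and the toy documents are illustrative assumptions; the slides leave the similarity measure unspecified.

```python
# A minimal sketch of the agglomerative clustering loop described above.
# Bag-of-words vectors and cosine similarity are illustrative choices.
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two word-count dictionaries."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def agglomerative_cluster(documents, k):
    """Merge the two closest documents until only k clusters remain."""
    D = [Counter(doc.lower().split()) for doc in documents]
    while len(D) > k:
        # Find the closest pair (i, j) under the similarity measure.
        i, j = max(((i, j) for i in range(len(D)) for j in range(i + 1, len(D))),
                   key=lambda p: cosine(D[p[0]], D[p[1]]))
        merged = D[i] + D[j]          # the "union" of the two documents
        D = [d for idx, d in enumerate(D) if idx not in (i, j)]
        D.append(merged)
    return D

docs = ["the cat sat on the mat", "a cat and a dog",
        "stocks fell sharply today", "markets and stocks rallied"]
print(len(agglomerative_cluster(docs, 2)))   # 2 clusters remain
```
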
21-12: Reinforcement Learning
• In some cases, an agent must learn through interaction with the environment.
• The agent selects and executes an action and receives a reward as a result.
• Learning problem: what is the best action to take in a given state?
• Issue: since learning is integrated with execution, we can't just explore every possibility.
• Approach (in a nutshell): try different actions to see how they do.
• The more confidence we have in our estimate of action values, the more likely we are to take the best-looking action.

21-13: Q-learning
• We want to learn a policy.
  ◦ This is a function that maps states to actions.
• What we get from the environment are state/reward pairs.
• We'll use this to learn a Q(s, a) function that estimates the reward for taking action a in state s.
• This is a form of model-free learning.
  ◦ We do no reasoning about how the world works; we just map states to rewards.
  ◦ This means we can apply the same algorithm to a wide variety of environments.

21-14: Q-learning
• We keep a table that maps state-action pairs to Q values.
• Every time we are in state s and take an action a, we receive a reward r and wind up in state s′.
• We update our table as follows (see the sketch below):
  ◦ Q(s, a) += α(r + γ max_a′ Q(s′, a′) − Q(s, a))
• In other words, we add in the reward for taking an action in this state, plus the value of acting optimally from that point.
• α is the learning rate; γ is the discount factor.

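A minimal Python sketch of this tabular update. The dictionary-based Q table, the default value of 0 for unseen (state, action) pairs, and the sample transition are illustrative assumptions; the slides specify only the update rule itself.

```python
# A sketch of the tabular update Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)).
# Unseen (state, action) pairs default to 0; the sample transition is illustrative.
from collections import defaultdict

Q = defaultdict(float)      # maps (state, action) -> estimated value
ALPHA = 0.1                 # learning rate
GAMMA = 0.9                 # discount factor

def q_update(s, a, r, s_next, actions):
    """Apply one Q-learning update for the observed transition (s, a, r, s')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

# Example: in state 'A', action 'right' earns reward 1.0 and leads to state 'B'.
q_update('A', 'right', 1.0, 'B', actions=['left', 'right'])
print(Q[('A', 'right')])    # 0.1 after one update
```
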
21-15: Q-learning
• Q-learning has a distinct difference from the other learning algorithms we've seen:
• The agent can select actions and observe the rewards they get.
• This is called active learning.
• Issue: the agent would also like to maximize performance.
  ◦ This means trying the action that currently looks best.
  ◦ But if the agent never tries "bad-looking" actions, it can't recover from mistakes.
• Intuition: early on, Q is not very accurate, so we'll try non-optimal actions. Later on, as Q becomes better, we'll select optimal actions.

21-16: Boltzmann exploration
• One way to do this is using Boltzmann exploration (a sketch follows below).
• We take an action with probability:
  ◦ P(a|s) = k^Q(s,a) / Σ_j k^Q(s,a_j)
• Where k is a temperature parameter.
• This is the same formula we used in simulated annealing.
• We'll return to Q-learning after we discuss MDPs.
  ◦ They're closely related.

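A short sketch of this selection rule, using the k^Q(s,a) form shown above; the Q values, the action set, and k = 2.0 are illustrative assumptions.

```python
# A sketch of Boltzmann exploration: choose an action with probability
# proportional to k ** Q(s, a). The Q values and k = 2.0 are illustrative.
import random

def boltzmann_choice(q_values, k=2.0):
    """Pick an action with probability proportional to k ** Q(s, a)."""
    actions = list(q_values)
    weights = [k ** q_values[a] for a in actions]
    return random.choices(actions, weights=weights)[0]

q_for_state = {'left': 0.2, 'right': 1.5, 'stay': 0.0}
print(boltzmann_choice(q_for_state))   # 'right' is chosen most often
```

With k near 1 the choice is nearly uniform (more exploration); larger values of k increasingly favor the best-looking action.
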
21-17: Return to supervised learning
• Let's step back and think about supervised learning.
• Given some labeled data, find a hypothesis that best explains this data.
• This can be done using symbolic or numeric data.

21-18: A symbolic example
• Consider (once again) our playTennis example.
• Suppose we have the following experience:
  ◦ Sunny, overcast, high, weak: yes
  ◦ Sunny, overcast, low, strong: yes
  ◦ Rainy, overcast, normal, weak: no
• We need to select a hypothesis that explains all of these examples.
  ◦ H1: sunny: yes
  ◦ H2: sunny and overcast: yes
  ◦ H3: ¬(rainy, overcast, normal, weak): yes
• Which do we pick?

21-19: Representing a hypothesis
• Before we can answer, we need to decide how our hypothesis will be represented.
  ◦ All possible propositional logic expressions?
  ◦ Only conjunctions?
  ◦ Negation?
• Simpler hypotheses can be learned more quickly.
• They may not fit the data as well.

21-20: Find-S
• Suppose we agree that hypotheses consist of a single attribute value or "don't care" for each attribute.
  ◦ "Sunny", and "sunny and overcast", are possible.
  ◦ "Sunny or rainy" is not.
• This is called a representational bias.
• Stronger representational biases let us learn more quickly.
• Find the most specific hypothesis that explains our data (a sketch follows below).

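A small sketch of the Find-S idea under this representation (one value or "don't care" per attribute). The attribute tuples and the generalization step follow the standard Find-S algorithm and are illustrative, not code from the slides.

```python
# A sketch of Find-S: keep the most specific hypothesis ('?' = don't care)
# consistent with the positive examples seen so far. Examples are illustrative.
def find_s(positive_examples):
    """Return the most specific hypothesis consistent with the positive examples."""
    h = list(positive_examples[0])          # start with the first positive example
    for example in positive_examples[1:]:
        for i, value in enumerate(example):
            if h[i] != value:
                h[i] = '?'                  # generalize: this attribute no longer matters
    return h

positives = [('sunny', 'hot', 'high', 'weak'),
             ('sunny', 'warm', 'high', 'strong')]
print(find_s(positives))   # ['sunny', '?', 'high', '?']
```
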
21-21: Hypothesis spaces
• We can arrange all potential hypotheses from specific to general in a lattice.
• Our learning problem is now to search this space of hypotheses to find the best hypothesis that is consistent with our data.
• The way in which those hypotheses are considered is called the learning bias.
• Every algorithm has a representational bias and a learning bias.
  ◦ Understanding them can help you know how your learning algorithm will generalize.

21-22: A numeric example
• Suppose we have the following data points:
  ◦ (160, 126), (180, 103), (200, 82), (220, 75), (240, 82), (260, 40), (280, 20)
• We would like to use this data to construct a function that allows us to predict f(x) for other x.
• There are infinitely many functions that fit this data; how do we choose one?
• Representational bias: restrict ourselves to straight lines.
• Inductive bias: choose the line that minimizes the sum of squared errors.
• This is linear regression; most statistics packages can compute it (a sketch follows below).

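A small sketch of the least-squares computation on these points, using the standard closed-form formulas for slope and intercept; the code is illustrative rather than taken from the slides.

```python
# A sketch of ordinary least-squares linear regression on the data points above.
points = [(160, 126), (180, 103), (200, 82), (220, 75), (240, 82), (260, 40), (280, 20)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

# Standard closed-form solution minimizing the sum of squared errors.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
         / sum((x - mean_x) ** 2 for x, _ in points))
intercept = mean_y - slope * mean_x

print(f"f(x) = {slope:.3f} * x + {intercept:.3f}")
print("prediction at x = 300:", slope * 300 + intercept)
```
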
21-23: Nonlinear Regression
• Linear regression is nice because it's easy to compute.
• Problem: many functions we might want to approximate are not linear.
• Nonlinear regression is a much more complicated problem.
  ◦ How do we choose a representational bias? Polynomial? Trigonometric? (A polynomial sketch follows below.)
• Neural networks are actually nonlinear function approximators.
  ◦ We'll return to them the last week of class and see how they automatically induce a nonlinear function.

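As one concrete choice of nonlinear representational bias, the sketch below fits a degree-2 polynomial to the same data points with numpy's polyfit, still minimizing the sum of squared errors. The degree is an illustrative assumption.

```python
# A sketch of nonlinear regression with a polynomial representational bias.
# The degree-2 choice is illustrative; polyfit still minimizes squared error.
import numpy as np

points = [(160, 126), (180, 103), (200, 82), (220, 75), (240, 82), (260, 40), (280, 20)]
x = np.array([p[0] for p in points], dtype=float)
y = np.array([p[1] for p in points], dtype=float)

coeffs = np.polyfit(x, y, deg=2)          # least-squares fit, highest-degree term first
print("coefficients:", coeffs)
print("prediction at x = 300:", np.polyval(coeffs, 300.0))
```
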
21-24: Approximation vs. Classification
• Regression is an example of function approximation.
  ◦ Find a function that approximates given data and performs well on unseen data.
• A particular kind of function to approximate is a classification function.
  ◦ Maps from inputs into one or more classes.
  ◦ Task: find a hypothesis that best splits the data into classes.
• This is the task that decision trees and Bayesian learners solve.

21-25: Measuring Performance
• How do we evaluate the performance of a classifying learning algorithm?
• Two traditional measures are precision and recall (a sketch follows below).
• Precision is the fraction of examples classified as belonging to class x that really are of that class.
  ◦ How well does our hypothesis avoid false positives?
• Recall is the fraction of true members of class x that are actually captured by our hypothesis.
  ◦ How well does our hypothesis avoid false negatives?

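A small sketch of computing these two measures from predicted and true labels; the label lists are illustrative.

```python
# A sketch of computing precision and recall for a binary classifier.
def precision_recall(predicted, actual, positive="yes"):
    """Precision and recall for the given positive class."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == positive and a == positive)
    fp = sum(1 for p, a in zip(predicted, actual) if p == positive and a != positive)
    fn = sum(1 for p, a in zip(predicted, actual) if p != positive and a == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

predicted = ["yes", "yes", "no", "yes", "no"]
actual    = ["yes", "no",  "no", "yes", "yes"]
print(precision_recall(predicted, actual))   # 2 TP, 1 FP, 1 FN -> precision 2/3, recall 2/3
```
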
21-26: Precision vs. recall
• Often, there is a tradeoff between precision and recall.
  ◦ In our playTennis example, what if we say we always play tennis?
  ◦ This will have high recall, but low precision.
  ◦ What if we say we'll never play tennis?
  ◦ High precision, low recall.
• Try to make the compromise that best suits your application.
• What is a case where a false positive would be worse than a false negative?
• What is a case where a false negative would be worse than a false positive?

21-27: Evaluating a supervised learning algorithm
• Typically, in evaluating the performance of a learning algorithm, we'll be interested in the following sorts of questions:
  ◦ Does performance improve as the number of training examples increases?
  ◦ How do precision and recall trade off as the number of training examples changes?
  ◦ How does performance change as the problem gets easier/harder?
• So what does 'performance' mean?

21-28: Evaluating a supervised learning algorithm
• Recall that supervised algorithms start with a set of labeled data.
• Divide this data into two subsets:
  ◦ Training set: used to train the classifier.
  ◦ Test set: used to evaluate the classifier's performance.
  ◦ These sets are disjoint.
• Procedure (a sketch follows below):
  ◦ Train the classifier on the training set.
  ◦ Run each element of the test set through the classifier. Count the number of incorrectly classified examples.
• If the classification is binary, you can also measure precision and recall.

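A minimal sketch of this train/test procedure. The majority-class "classifier" stands in for whatever learner is being evaluated and is purely an illustrative assumption, as is the tiny data set.

```python
# A sketch of the train/test evaluation procedure: train on one subset,
# count errors on a disjoint test subset. The majority-class learner is
# a placeholder for a real classifier.
from collections import Counter

def train_majority(training_set):
    """'Train' by remembering the most common label in the training set."""
    label = Counter(lbl for _, lbl in training_set).most_common(1)[0][0]
    return lambda example: label

def error_rate(classifier, test_set):
    """Fraction of test examples the classifier gets wrong."""
    errors = sum(1 for example, lbl in test_set if classifier(example) != lbl)
    return errors / len(test_set)

data = [(("sunny", "weak"), "yes"), (("sunny", "strong"), "yes"),
        (("rainy", "weak"), "no"),  (("rainy", "strong"), "no")]
train, test = data[:3], data[3:]             # disjoint training and test sets
clf = train_majority(train)
print("test error:", error_rate(clf, test))  # 1.0: the baseline misses the lone 'no'
```
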
21-29: Evaluating a supervised learning algorithm
• How do we know we got a representative training and test set?
• Try it multiple times.
• N-fold cross-validation (a sketch follows below):
  ◦ Do this N times:
    • Select 1/N of the documents at random as the test set.
    • The remainder is the training set.
    • Test as usual.
  ◦ Average the results.

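A sketch of the loop described above: N times, hold out a random 1/N of the data as the test set, train on the remainder, and average the results. The stand-in learner and error measure are the same illustrative placeholders as in the previous sketch.

```python
# A sketch of the N-fold cross-validation procedure described above.
import random
from collections import Counter

def cross_validate(data, n, train_fn, error_fn):
    """N times: hold out a random 1/N of the data, train on the rest, average the error."""
    fold_size = len(data) // n
    errors = []
    for _ in range(n):
        shuffled = random.sample(data, len(data))       # random split each round
        test, train = shuffled[:fold_size], shuffled[fold_size:]
        errors.append(error_fn(train_fn(train), test))
    return sum(errors) / n

# Stand-in learner and error measure (placeholders for a real classifier).
def train_majority(train):
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return lambda example: label

def error_rate(classifier, test):
    return sum(1 for x, lbl in test if classifier(x) != lbl) / len(test)

data = [((i,), "yes" if i % 3 else "no") for i in range(30)]
print("mean error:", cross_validate(data, 5, train_majority, error_rate))
```
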
21-30: Ensemble learning
• Often, classifiers reach a point where improved performance on the training set leads to reduced performance on the test set.
  ◦ This is called overfitting.
• Representational bias can also lead to upper limits in performance.
• One way to deal with this is through ensemble learning.
  ◦ Intuition: independently train several classifiers on the same data (different training subsets) and let them vote.
  ◦ This is basically what the Bayes optimal classifier does.

21-31: Boosting
• Boosting is a widely used method for ensemble learning.
• Pick your favorite classifier.
• Idea (a sketch follows below):
  ◦ For i = 1 to M:
    • Train the ith classifier on the training set.
    • For each misclassified example, increase its "weight".
    • For each correctly classified example, decrease its "weight".
• To classify:
  ◦ Present each test example to each classifier.
  ◦ Each classifier gets a vote, weighted by its precision.
• Very straightforward; can produce substantial performance improvement.
  ◦ Combining stupid classifiers can be more effective than building one smart classifier.

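A compact sketch of the weight-and-vote loop described above, using one-feature threshold "stumps" as the weak classifiers. The factor-of-2 reweighting, the stump learner, and the use of training accuracy as the vote weight are illustrative assumptions; methods such as AdaBoost derive these quantities from the weighted error instead.

```python
# A sketch of boosting: repeatedly train a weak classifier on weighted data,
# up-weight the examples it misses, and let the classifiers vote.
# Labels are +1/-1; the stump learner and factor-of-2 reweighting are illustrative.

def train_stump(xs, ys, weights):
    """Pick the threshold and sign that minimize weighted error on 1-D data."""
    best = None
    for t in xs:
        for sign in (1, -1):
            pred = [sign if x >= t else -sign for x in xs]
            err = sum(w for p, y, w in zip(pred, ys, weights) if p != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: sign if x >= t else -sign

def boost(xs, ys, rounds=3):
    weights = [1.0] * len(xs)
    ensemble = []
    for _ in range(rounds):
        clf = train_stump(xs, ys, weights)
        correct = [clf(x) == y for x, y in zip(xs, ys)]
        accuracy = sum(correct) / len(xs)
        ensemble.append((clf, accuracy))    # vote weight: training accuracy here
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [w * 2 if not c else w / 2 for w, c in zip(weights, correct)]
    def classify(x):
        vote = sum(a * clf(x) for clf, a in ensemble)
        return 1 if vote >= 0 else -1
    return classify

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [-1, -1, -1, 1, 1, 1, -1, 1]           # mostly separable, one noisy point
clf = boost(xs, ys)
print([clf(x) for x in xs])
```
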
21-32: Summary
• Learning is the process of improving performance on a set of tasks through experience.
  ◦ This can take many different forms.
• Supervised learning is a (particularly interesting) subset of learning.
• In evaluating learning, we will be interested in precision, recall, and performance as the training set size changes.
• We can also combine poor-performing classifiers to get better results.