Lecture 25: Statistical Learning - MLPs, Bayesian Learning, and SVMs

A set of lecture notes from a university course on statistical learning, covering multi-layer perceptrons, Bayesian learning, and support vector machines. The notes include explanations of concepts, formulas, and examples. Students are encouraged to read Chapter 20 of their textbook on artificial neurons and gradient descent for perceptron learning, as well as to learn about Naive Bayes and the restaurant problem.



Statistical Learning
Introduction to Artificial Intelligence
CS440/ECE448 Lecture 25

Announcements: 1-unit projects for grad students, get in touch with me! New homework out tonight.

Last lecture
• Multi-layer perceptrons
• Backpropagation

This lecture
• Bayesian learning
• MAP learning
• ML learning
• Support vector machines

Reading
• Chapter 20

Multi-layer perceptrons
Layers are usually fully connected; the number of hidden units is typically chosen by hand.
[Figure: feed-forward network with input units a_k, hidden units a_j (weights W_k,j), and output units a_i (weights W_j,i)]

Expressiveness of multi-layer perceptrons
• All continuous functions can be represented with 2 layers, all functions with 3 layers.
• Combine two opposite-facing threshold functions to make a ridge.
• Combine two perpendicular ridges to make a bump.
• Add bumps of various sizes and locations to fit any surface.
[Figure: surface plot of h_W(x1, x2) for two combined threshold functions forming a ridge]

Backpropagation
• Output layer: same update rule as for the single-layer perceptron,
  W_j,i ← W_j,i + α · a_j · Δ_i, where Δ_i = Err_i · g′(in_i).
• Hidden layer: back-propagate the error from the output layer,
  Δ_j = g′(in_j) Σ_i W_j,i Δ_i.
• Update rule for weights in the hidden layer:
  W_k,j ← W_k,j + α · a_k · Δ_j.
(Most neuroscientists deny that back-propagation occurs in the brain.)

Full Bayesian Learning
View learning as Bayesian updating of a probability distribution over the hypothesis space:
• H is the hypothesis variable, with values h1, h2, ... and prior P(H).
• The j-th observation d_j gives the outcome of the random variable D_j; the training data is d = d1, ..., dN.
• Predictions average over all hypotheses: P(X | d) = Σ_i P(X | h_i) P(h_i | d).

Bayesian learning: Example ctd.
• Candy bags: hypotheses h1, ..., h5 say a bag contains 100%, 75%, 50%, 25%, or 0% cherry candies (the rest lime); e.g., P(lime | h3) = 0.5.
• Prior distribution P(H): <0.1, 0.2, 0.4, 0.2, 0.1>.
• Assumption: the data are i.i.d. (independently and identically distributed), so P(d | h_i) = Π_j P(d_j | h_i).

Posterior Probability of Hypotheses
P(h_i | d) = α P(d | h_i) P(h_i) = α P(h_i) Π_j P(d_j | h_i)
[Figure: posterior probability of each hypothesis, and the prediction P(next candy is lime | d), plotted against the number of samples in d]

MAP Approximation
• Summing over the hypothesis space is often intractable (e.g., there are 18,446,744,073,709,551,616 Boolean functions of 6 attributes).
• Maximum a posteriori (MAP) learning: choose h_MAP maximizing P(h_i | d), i.e., maximize P(d | h_i) P(h_i), or equivalently log P(d | h_i) + log P(h_i).
• The log terms can be viewed as (the negative of) the number of bits needed to encode the data given the hypothesis plus the bits needed to encode the hypothesis. This is the basic idea of minimum description length (MDL) learning.
• For deterministic hypotheses, P(d | h_i) is 1 if h_i is consistent with the data and 0 otherwise, so MAP learning picks the simplest consistent hypothesis (cf. science).
• The MAP approximation assumes P(X | d) ≈ P(X | h_MAP).
• Maximum-likelihood (ML) learning drops the prior as well and chooses h_ML maximizing P(d | h_i); it coincides with MAP learning under a uniform prior.

Multiple Parameters
• ML parameter learning, continuing the candy example: the red/green wrapper depends probabilistically on the flavor.
• Likelihood for, e.g., a cherry candy in a green wrapper: P(F = cherry, W = green | h_θ,θ1,θ2) = θ (1 − θ1).
• For N candies with c cherry and ℓ = N − c lime, of which r_c cherries are red-wrapped and g_c green-wrapped (and similarly r_ℓ, g_ℓ for limes):
  P(d | h_θ,θ1,θ2) = θ^c (1 − θ)^ℓ · θ1^r_c (1 − θ1)^g_c · θ2^r_ℓ (1 − θ2)^g_ℓ
[Figure: two-node Bayes net Flavor → Wrapper with parameters P(F = cherry) = θ and P(W = red | F)]

Multiple Parameters ctd.
• The derivative of the log-likelihood L with respect to each parameter contains only that parameter.
• With complete data, the parameters can therefore be learned separately.

Naive Bayes Model
[Figure: Bayes net with class node C and attribute nodes X1, ..., Xn]
Variables:
• one class C
• n attributes Xi
Assume the Xi's are conditionally independent given C.
Parameters:
• θ = P(C = true)
• θi1 = P(Xi = true | C = true)
• θi2 = P(Xi = true | C = false)
Learning: independent ML estimation of the parameters.
Classification: P(C | x1, ..., xn) = α P(C) Π_i P(xi | C).
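As a concrete illustration of the learning and classification steps on this slide, here is a minimal Python sketch. The tiny boolean data set, the function names, and the omission of smoothing are illustrative assumptions, not part of the lecture.

```python
# Naive Bayes with boolean class C and boolean attributes X1..Xn:
# ML estimation of theta, theta_i1, theta_i2, then classification via
# P(C | x1..xn) = alpha * P(C) * prod_i P(xi | C).  (Toy data; no smoothing.)

def learn_naive_bayes(data):
    """data: list of (attributes, c) pairs; attributes is a tuple of bools, c a bool."""
    n = len(data[0][0])
    pos = [x for x, c in data if c]
    neg = [x for x, c in data if not c]
    theta = len(pos) / len(data)                                       # P(C = true)
    theta_1 = [sum(x[i] for x in pos) / len(pos) for i in range(n)]    # P(Xi = true | C = true)
    theta_2 = [sum(x[i] for x in neg) / len(neg) for i in range(n)]    # P(Xi = true | C = false)
    return theta, theta_1, theta_2

def classify(x, theta, theta_1, theta_2):
    """Return P(C = true | x1, ..., xn)."""
    p_true, p_false = theta, 1.0 - theta
    for xi, t1, t2 in zip(x, theta_1, theta_2):
        p_true *= t1 if xi else 1.0 - t1
        p_false *= t2 if xi else 1.0 - t2
    return p_true / (p_true + p_false)   # alpha is just this normalization

# Made-up training set: (X1, X2) -> C
data = [((True, True), True), ((True, False), True), ((False, True), True),
        ((False, False), False), ((True, False), False), ((False, False), False)]
theta, theta_1, theta_2 = learn_naive_bayes(data)
print(classify((True, False), theta, theta_1, theta_2))
```

On this toy data the query (True, False) gets P(C = true | x) = 0.4; in practice Laplace smoothing would be added so that unseen attribute values do not produce zero probabilities.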
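The Bayesian-updating and MAP slides earlier in the lecture can also be worked numerically. The sketch below assumes the standard five-bag candy setup (100%, 75%, 50%, 25%, 0% cherry) with the prior <0.1, 0.2, 0.4, 0.2, 0.1> given above, and an observation sequence of ten lime candies chosen purely for illustration.

```python
# Bayesian updating for the candy example: posterior over hypotheses,
# full Bayesian prediction, and the MAP approximation.

# P(lime | h_i) for hypotheses h1..h5 (100%, 75%, 50%, 25%, 0% cherry)
p_lime_given_h = [0.0, 0.25, 0.5, 0.75, 1.0]
prior = [0.1, 0.2, 0.4, 0.2, 0.1]

posterior = list(prior)
observations = ["lime"] * 10          # assume the first 10 candies are all lime

for n, candy in enumerate(observations, start=1):
    # P(h_i | d) is proportional to P(d_j | h_i) * P(h_i | d_1..j-1); normalize (the alpha above)
    likelihood = [p if candy == "lime" else 1.0 - p for p in p_lime_given_h]
    posterior = [l * p for l, p in zip(likelihood, posterior)]
    total = sum(posterior)
    posterior = [p / total for p in posterior]

    # Full Bayesian prediction: P(next is lime | d) = sum_i P(lime | h_i) P(h_i | d)
    bayes_pred = sum(p * q for p, q in zip(p_lime_given_h, posterior))

    # MAP approximation: P(next is lime | d) is approximated by P(lime | h_MAP)
    h_map = max(range(5), key=lambda i: posterior[i])
    map_pred = p_lime_given_h[h_map]

    print(f"n={n:2d}  posterior={['%.3f' % p for p in posterior]}  "
          f"Bayes={bayes_pred:.3f}  MAP={map_pred:.3f}")
```

After one lime the posterior is <0, 0.1, 0.4, 0.3, 0.2>, the Bayesian prediction is 0.65, and MAP already commits to 0.5; after a handful more limes both converge toward 1, which is the behavior the hypothesis and prediction plots above show.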
Linearly Separable Classes
[Figure: two linearly separable classes of points; the support hyperplanes pass through the support vectors]

Support Vector Machines
What is the maximum-margin separating plane? (Boser, Guyon & Vapnik, 1992; Vapnik, 1995)
[Figure: positive and negative examples separated by the planes w·x + b = −1, w·x + b = 0, and w·x + b = 1]

Support vector machines ctd.
• Examples are of the form (xi, yi), where yi = ±1.
• They all satisfy yi (w·xi + b) ≥ 1.
• The distance between the two support hyperplanes (the margin) is 2 / |w|.
• Thus finding the maximum-margin plane amounts to
  – minimizing ½ |w|²
  – subject to yi (w·xi + b) ≥ 1 for i = 1, ..., n.
• A quadratic programming problem! (See the sketch below.)
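As a rough illustration of the quadratic program above, the sketch below feeds the primal problem to SciPy's general-purpose SLSQP solver on a made-up, linearly separable 2-D data set; the data, the starting point, and the choice of solver are all illustrative assumptions, and a dedicated QP or SVM solver would normally be used instead.

```python
# Maximum-margin separating plane: minimize 0.5 * |w|^2
# subject to y_i (w . x_i + b) >= 1, solved with a generic NLP solver on toy data.
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable examples (x_i, y_i) with y_i = +/-1
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.0],      # positives
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])     # negatives
y = np.array([1, 1, 1, -1, -1, -1])

def objective(v):
    w = v[:2]                                 # v = (w1, w2, b)
    return 0.5 * np.dot(w, w)                 # 0.5 * |w|^2

# One inequality constraint per example: y_i (w . x_i + b) - 1 >= 0
constraints = [{"type": "ineq",
                "fun": lambda v, xi=xi, yi=yi: yi * (np.dot(v[:2], xi) + v[2]) - 1.0}
               for xi, yi in zip(X, y)]

x0 = np.array([1.0, 1.0, -2.0])               # a feasible (not optimal) starting point
result = minimize(objective, x0=x0, constraints=constraints)
w, b = result.x[:2], result.x[2]

print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))
# Constraint values y_i (w . x_i + b): values close to 1 mark the support vectors
print("activations:", y * (X @ w + b))
```

The constraints hold with equality (activation ≈ 1) exactly at the support vectors, and the reported margin is 2 / |w| as in the derivation above.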