Machine Learning: Classification Models and Rosenblatt's Perceptron Algorithm
Greg Grudic

Introduction to Classification

Today's Lecture Goals
• Introduction to classification
• Generative models
  – Fisher (Linear Discriminant Analysis)
  – Gaussian Mixture Models
• Discriminative models
  – Rosenblatt's Perceptron Learning Algorithm
• Nonlinear extensions

Last Week: Learning Regression Models
• Collect training data
• Build model: stock value = F(feature space)
• Make a prediction
[Figure: training points plotted as stock value over the feature (input) space]

This Class: Learning Classification Models
• Collect training data
• Build model: happy = F(feature space)
• Make a prediction
[Figure: a high-dimensional feature (input) space]

Binary Classification
• A binary classifier is a mapping from a set of d inputs to a single output which can take on one of TWO values
• In the most general setting:
  inputs: x ∈ ℝ^d
  output: y ∈ {−1, +1}
• Specifying the output classes as −1 and +1 is arbitrary!
  – Often done as a mathematical convenience

A Binary Classifier
• Given learning data (x_1, y_1), …, (x_N, y_N), a model M(x) is constructed:
  x → Classification Model → ŷ ∈ {−1, +1}

The Learning Data
• Learning algorithms don't care where the data comes from!
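The abstract classifier above, a map from x ∈ ℝ^d to ŷ ∈ {−1, +1}, can be sketched as a linear decision rule. This is a minimal illustration, not code from the lecture; the function name `classify` and the example parameter values are made up:

```python
import numpy as np

def classify(x, beta0, beta):
    """Linear binary classifier: y_hat = sgn(beta0 + beta . x).
    Returns +1 or -1 (sgn(0) is mapped to -1 by convention)."""
    return 1 if beta0 + np.dot(beta, x) > 0 else -1

# Toy usage with made-up parameters:
beta0, beta = -1.0, np.array([2.0, 0.5])
print(classify(np.array([1.0, 1.0]), beta0, beta))   # beta0 + beta.x = 1.5 > 0, prints 1
print(classify(np.array([0.0, 0.0]), beta0, beta))   # beta0 + beta.x = -1 <= 0, prints -1
```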
• Here is a toy example from robotics…
  – Inputs from two sonar sensors: x_1 = sensor 1, x_2 = sensor 2, with x_1, x_2 ∈ ℝ
  – Classification output:
    • Robot in Greg's office: y = +1
    • Robot NOT in Greg's office: y = −1

Classification Learning Data
  Example    x_1        x_2        y
  1          0.95013    0.58279    +1
  2          0.23114    0.4235     −1
  3          0.8913     0.43291    +1
  4          0.018504   0.76037    −1
  …          …          …          …

Rosenblatt's Minimization Function
• This is classic Machine Learning!
• First define a cost function in model parameter space:
  D(β̂_0, β̂_1, …, β̂_d) = − Σ_{i∈M} y_i (β̂_0 + Σ_{k=1}^d β̂_k x_ik)
  where M is the set of misclassified examples
• Then find an algorithm that modifies (β̂_0, β̂_1, …, β̂_d) such that this cost function is minimized
• One such algorithm is Gradient Descent

Gradient Descent
[Figure: error surface E[w] plotted over weights w0 and w1]

The Gradient Descent Algorithm
  β̂_j ← β̂_j − ρ ∂D(β̂_0, β̂_1, …, β̂_d)/∂β̂_j
where the learning rate is defined by ρ > 0.

The Gradient Descent Algorithm for the Perceptron
• The partial derivatives of the cost are:
  ∂D(β̂_0, β̂_1, …, β̂_d)/∂β̂_0 = − Σ_{i∈M} y_i
  ∂D(β̂_0, β̂_1, …, β̂_d)/∂β̂_j = − Σ_{i∈M} y_i x_ij,  j = 1, …, d
• giving the update, for each misclassified example i:
  (β̂_0, β̂_1, …, β̂_d) ← (β̂_0, β̂_1, …, β̂_d) + ρ (y_i, y_i x_i1, …, y_i x_id)

The Good Theoretical Properties of the Perceptron Algorithm
• If a solution exists, the algorithm will always converge in a finite number of steps!
• Question: Does a solution always exist?

Linearly Separable Data
• Which of these datasets are separable by a linear boundary?
[Figure: dataset a) with + and − points on opposite sides; dataset b) with + points surrounded by − points]

Linearly Separable Data
• Which of these datasets are separable by a linear boundary?
[Figure: dataset a) is linearly separable; dataset b) is NOT linearly separable!]

Bad Theoretical Properties of the Perceptron Algorithm
• If the data is not linearly separable, the algorithm cycles forever!
  – Cannot converge!
  – This property stopped research in this area between 1968 and 1984…
    • Perceptrons, Minsky and Papert, 1969
• When the data is linearly separable, there are infinitely many solutions
• When the data is linearly separable, the number of steps to converge can be very large (it depends on the size of the gap between the classes)

What about Nonlinear Data?
• Data that is not linearly separable is called nonlinear data
• Nonlinear data can often be mapped into a space where it becomes linearly separable

Nonlinear Models
• The linear model:
  ŷ = M(x) = sgn(β̂_0 + Σ_{i=1}^d β̂_i x_i)
• The nonlinear (basis function) model:
  ŷ = M(x) = sgn(β̂_0 + Σ_{i=1}^k β̂_i φ_i(x))
• Examples of nonlinear basis functions:
  φ_1(x) = x_1²,  φ_2(x) = x_2²,  φ_3(x) = x_1 x_2,  φ_4(x) = sin(5 x_5)

Linear Separating Hyperplanes in Nonlinear Basis Function Space
• Decision boundary: β̂_0 + Σ_{i=1}^k β̂_i φ_i = 0
• y = +1 where β̂_0 + Σ_{i=1}^k β̂_i φ_i > 0
• y = −1 where β̂_0 + Σ_{i=1}^k β̂_i φ_i ≤ 0

An Example
[Figure: data in (x_1, x_2) that is not linearly separable becomes linearly separable after the mapping Φ with φ_1 = x_1², φ_2 = x_2²]
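The perceptron update from the earlier slides (add ρ·y_i to the bias and ρ·y_i·x_ij to each weight for a misclassified example, stop when nothing is misclassified) can be sketched in Python. The four training examples are the sonar table from the slides; the function name, the epoch cap, and ρ = 1 are my own scaffolding:

```python
import numpy as np

def train_perceptron(X, y, rho=1.0, max_epochs=100):
    """Rosenblatt's perceptron: update on misclassified examples
    beta0 <- beta0 + rho * y_i, beta <- beta + rho * y_i * x_i."""
    n, d = X.shape
    beta0, beta = 0.0, np.zeros(d)
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (beta0 + beta @ xi) <= 0:   # misclassified (or on boundary)
                beta0 += rho * yi
                beta += rho * yi * xi
                mistakes += 1
        if mistakes == 0:                        # converged: all points correct
            break
    return beta0, beta

# The four sonar examples from the slides' table:
X = np.array([[0.95013,  0.58279],
              [0.23114,  0.4235],
              [0.8913,   0.43291],
              [0.018504, 0.76037]])
y = np.array([1, -1, 1, -1])
beta0, beta = train_perceptron(X, y)
print(np.sign(beta0 + X @ beta))  # matches y: this toy set is linearly separable
```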
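The basis-function idea in the final slides can be sketched the same way. The map φ(x) = (x_1², x_2²) is the one from the example; the toy points (an inner cluster versus an outer ring) and the hand-picked threshold 0.25 are my own assumptions for illustration:

```python
import numpy as np

def phi(X):
    """Basis-function map from the slides' example: (x1, x2) -> (x1^2, x2^2)."""
    return X ** 2

# Inner cluster (y = +1) and outer ring (y = -1): not linearly separable
# in (x1, x2), but phi1 + phi2 = x1^2 + x2^2 separates them.
inner = np.array([[0.1, 0.1], [-0.2, 0.1], [0.0, -0.15], [0.1, -0.1]])
outer = np.array([[0.9, 0.1], [-0.8, 0.4], [0.1, 0.95], [-0.5, -0.8]])
X = np.vstack([inner, outer])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

# A linear rule in phi-space: sgn(0.25 - phi1 - phi2), i.e. a circle of
# radius 0.5 in the original space (threshold chosen by hand for this toy set).
Z = phi(X)
preds = np.sign(0.25 - Z[:, 0] - Z[:, 1])
print(preds)  # all eight toy points classified correctly
```

A perceptron trained on the φ-features would find such a boundary automatically; here the weights are fixed by hand to keep the example short.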