Lecture 4: Supervised learning introduction

CSI 5v93: Introduction to machine learning
Baylor University Computer Science Department
Dr. Greg Hamerly
http://cs.baylor.edu/~hamerly/

Announcements

• Homework 1 due today

Questions?

Chapter 2: Overview of supervised learning

• 2.1 – Introduction
• 2.2 – Variable types and terminology
• 2.3 – Two simple approaches to prediction: Least squares and nearest neighbors
• 2.4 – Statistical decision theory
• 2.5 – Local methods in high dimensions
• 2.6 – Statistical models, supervised learning, and function approximation
• 2.7 – Structured regression models
• 2.8 – Classes of restricted estimators
• 2.9 – Model selection and the bias-variance tradeoff

Structured regression models (2.7)

Consider the RSS criterion for an arbitrary function f:

    RSS(f) = \sum_{i=1}^{n} (y_i - f(x_i))^2

If we place no restrictions on f, minimizing this criterion has infinitely many solutions: any function that passes through the training points {x_i, y_i} (or through the average of the y_i values at repeated x_i) achieves zero RSS. To obtain useful solutions, we must restrict f.

Constraints

Constraining a function can be described using complexity restrictions. These complexity restrictions can be thought of as "regular behavior in a neighborhood" (e.g. constant, linear, or low-order polynomial). The strength of the constraint grows with the size of the neighborhood: the larger the neighborhood over which regular behavior is imposed, the stronger the constraint.

Classes of restricted estimators (2.8)

There are several ways of imposing constraints in regression-type problems; we touch on them briefly.

Roughness penalty (aka Bayesian method, aka regularization):

    PRSS(f; λ) = RSS(f) + λ J(f)

Here we must pick λ and the penalty functional J; together they correspond to a Bayesian prior on the variability of f. For λ = 0, no penalty is imposed. As λ → ∞, only the functions that J treats as perfectly smooth are allowed (e.g. linear functions when J penalizes the second derivative). A short ridge-regression sketch of this idea appears at the end of these notes.

Kernel methods and local regression

A kernel is a function that explicitly defines the neighborhood around a point x_0:

    K_λ(x_0, x)

Examples: Gaussian kernel, constant-width kernel (on board).

The kernel's primary parameter is its width λ, which sets the size of the neighborhood. Combining a kernel with a simple local model (e.g. a locally weighted average or a local linear fit) yields a flexible estimator whose complexity is controlled by the kernel width (see the kernel-averaging sketch at the end of these notes).

ZIP code demonstration with k-nearest neighbors

(A small k-NN sketch on a stand-in digits dataset appears at the end of these notes.)

2-minute journal

Please write a response to the following on a piece of paper and hand it in immediately. Please make it anonymous (no names). Write about:

• major points you learned today
• areas not understood or requiring clarification
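
Code sketch: roughness penalty

A minimal sketch of the roughness-penalty idea PRSS(f; λ) = RSS(f) + λ J(f). For concreteness it assumes a ridge penalty on polynomial coefficients as J and synthetic 1-D data; the slides do not commit to a particular penalty, basis, or dataset.

# Roughness-penalty sketch: ridge regression on a polynomial basis.
# The penalty J(b) = ||b||^2 is an assumption made for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression data (assumed; not from the lecture).
n = 30
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

# Degree-9 polynomial basis: flexible enough to chase the noise.
degree = 9
X = np.vander(x, degree + 1, increasing=True)

def ridge_fit(X, y, lam):
    """Minimize ||y - X b||^2 + lam * ||b||^2 in closed form."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

for lam in [1e-8, 1e-2, 1e2]:
    b = ridge_fit(X, y, lam)
    rss = np.sum((y - X @ b) ** 2)
    print(f"lambda={lam:8g}  RSS={rss:8.3f}  ||b||={np.linalg.norm(b):8.3f}")

# A tiny lambda approximates the unpenalized fit (small RSS, large, wiggly
# coefficients); a large lambda shrinks the fit toward the simplest model
# the penalty allows, trading training error for regular behavior.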
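
Code sketch: kernel-weighted local averaging

A sketch of a Gaussian-kernel local average (a Nadaraya-Watson style estimator). Only the kernel K_λ(x_0, x) and its width parameter come from the slides; the data, evaluation grid, and width values are illustrative assumptions.

# Kernel-weighted local averaging with a Gaussian kernel
# K_width(x0, x) = exp(-(x - x0)^2 / (2 * width^2)).
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 50))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 50)

def gaussian_kernel(x0, x, width):
    return np.exp(-0.5 * ((x - x0) / width) ** 2)

def local_average(x0, x, y, width):
    """Kernel-weighted average of y in the neighborhood of x0."""
    w = gaussian_kernel(x0, x, width)
    return np.sum(w * y) / np.sum(w)

grid = np.linspace(0, 1, 5)
for width in [0.02, 0.1, 0.5]:
    fhat = [local_average(x0, x, y, width) for x0 in grid]
    print(f"width={width:4}: " + " ".join(f"{v:6.2f}" for v in fhat))

# A small width gives a wiggly, low-bias fit; a large width averages over a
# big neighborhood and flattens toward a constant -- the width plays the
# role of the complexity parameter.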
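
Code sketch: k-nearest neighbors on digit images

The ZIP code data used in the lecture demo is not reproduced here; this sketch assumes scikit-learn's bundled 8x8 digits dataset as a stand-in and simply compares a few values of k.

# k-NN digit classification on a stand-in dataset (sklearn's 8x8 digits).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

for k in [1, 3, 15]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    acc = knn.score(X_test, y_test)
    print(f"k={k:2d}  test accuracy={acc:.3f}")

# Small k gives a very local (low-bias, high-variance) classifier; larger k
# averages over a bigger neighborhood, trading variance for bias.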