Midterm Exam Review for Introduction to Machine Learning Course

A review for the midterm exam of the Introduction to Machine Learning course offered by the Machine Learning Department at Carnegie Mellon University. The review covers exam logistics, the format of the questions, and advice on how to prepare. Sample questions are also provided to test the student's knowledge of topics such as probability, maximum likelihood estimation, K-NN, linear regression, overfitting, and SVMs. It is useful for students who are preparing for the midterm exam and need study notes or summaries.


Midterm Exam Review
10-601 Introduction to Machine Learning
Matt Gormley
Lecture 14, March 6, 2017
Machine Learning Department, School of Computer Science, Carnegie Mellon University

Reminders
• Midterm Exam (Evening Exam)
  – Tue, Mar. 07 at 7:00pm – 9:30pm
  – See Piazza for details about the location

Midterm Exam: Logistics
• Evening exam: Tue, Mar. 07 at 7:00pm – 9:30pm
• 8–9 sections
• Format of questions:
  – Multiple choice
  – True / False (with justification)
  – Derivations
  – Short answers
  – Interpreting figures
• No electronic devices
• You are allowed to bring one 8½ x 11 sheet of notes (front and back)

Midterm Exam: How to Prepare
• Attend the midterm review session: Thu, March 2 at 6:30pm (PH 100)
• Attend the midterm review lecture: Mon, March 6 (in class)
• Review the prior year's exam and solutions (we'll post them)
• Review this year's homework problems

Midterm Exam: Advice (for during the exam)
• Solve the easy problems first (e.g., multiple choice before derivations)
  – If a problem seems extremely complicated, you're likely missing something
• Don't leave any answer blank!
• If you make an assumption, write it down
• If you look at a question and don't know the answer:
  – we probably haven't told you the answer
  – but we've told you enough to work it out
  – imagine arguing for some answer and see if you like it

Sample Questions

1.3 MAP vs. MLE [from the 10-601 midterm, 2/29/2016]
Answer each question with T or F and provide a one-sentence explanation of your answer:
(a) [2 pts.] T or F: In the limit, as n (the number of samples) increases, the MAP and MLE estimates become the same.
(b) [2 pts.] T or F: Naive Bayes can only be used with MAP estimates, and not MLE estimates.

1.4 Probability [from the 10-601 midterm, 2/29/2016]
Assume we have a sample space $\Omega$. Answer each question with T or F. No justification is required.
(a) [1 pt.] T or F: If events A, B, and C are disjoint then they are independent.
(b) [1 pt.] T or F: $P(A \mid B) \propto \frac{P(A)\,P(B \mid A)}{P(A \mid B)}$. (The sign '$\propto$' means 'is proportional to'.)
(c) [1 pt.] T or F: $P(A \cup B) \le P(A)$.
(d) [1 pt.] T or F: $P(A \cap B) \ge P(A)$.
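Question 1.3(a) is easy to sanity-check numerically. The sketch below is an illustration added to these notes (not part of the original slides): it computes the Bernoulli MLE and the MAP estimate under an assumed Beta(2, 2) prior for growing sample sizes; the true parameter 0.3 and the prior are arbitrary choices for the demo, and the gap between the two estimates shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.3          # arbitrary "true" Bernoulli parameter for this illustration
alpha, beta = 2.0, 2.0    # assumed Beta(2, 2) prior for the MAP estimate

for n in [10, 100, 1000, 100000]:
    x = rng.binomial(1, theta_true, size=n)
    mle = x.mean()                                             # MLE: sample mean
    map_est = (x.sum() + alpha - 1) / (n + alpha + beta - 2)   # MAP: posterior (Beta) mode
    print(f"n={n:6d}  MLE={mle:.4f}  MAP={map_est:.4f}  |diff|={abs(mle - map_est):.4f}")
```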
4 K-NN [12 pts] [from the 10-701 midterm, 11/02/2016]
In this problem, you will be tested on your knowledge of K-Nearest Neighbors (K-NN), where k indicates the number of nearest neighbors.
1. [3 pts] For K-NN in general, are there any cons of using very large k values? Select one. Briefly justify your answer.
   (a) Yes   (b) No
2. [3 pts] For K-NN in general, are there any cons of using very small k values? Select one. Briefly justify your answer.
   (a) Yes   (b) No
Now we will apply K-Nearest Neighbors using Euclidean distance to a binary classification task. We assign the class of the test point to be the class of the majority of the k nearest neighbors. A point can be its own neighbor.
[Figure 5: the labeled training set for questions 3–5; the figure is not reproduced in this preview.]
3. [2 pts] What value of k minimizes leave-one-out cross-validation error for the dataset shown in Figure 5? What is the resulting error?
4. [2 pts] Sketch the 1-nearest neighbor boundary over Figure 5.
5. [2 pts] What value of k minimizes the training set error for the dataset shown in Figure 5? What is the resulting training error?
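Question 3 asks for the k minimizing leave-one-out cross-validation (LOOCV) error. Since Figure 5 is not reproduced here, the sketch below (added for these notes) shows how that error can be computed by brute force on a small made-up 2-D dataset: each point is held out in turn and predicted from the remaining points.

```python
import numpy as np

# Hypothetical 2-D points and binary labels standing in for Figure 5.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [5., 5.], [6., 5.], [5., 6.], [3., 3.]])
y = np.array([0, 0, 0, 1, 1, 1, 1])

def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest neighbors by Euclidean distance."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    return np.bincount(y_train[nearest]).argmax()

def loocv_error(X, y, k):
    """Leave-one-out CV error: hold out each point and predict it from the rest."""
    errors = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        errors += knn_predict(X[mask], y[mask], X[i], k) != y[i]
    return errors / len(X)

for k in [1, 3, 5]:
    print(f"k={k}  LOOCV error={loocv_error(X, y, k):.3f}")
```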
1.2 Maximum Likelihood Estimation (MLE) [from the 10-601 midterm, 2/29/2016]
Assume we have a random sample that is Bernoulli distributed, $X_1, \ldots, X_n \sim \text{Bernoulli}(\theta)$. We are going to derive the MLE for $\theta$. Recall that a Bernoulli random variable $X$ takes values in $\{0, 1\}$ and has probability mass function given by $P(X; \theta) = \theta^X (1 - \theta)^{1 - X}$.
(a) [2 pts.] Derive the likelihood, $L(\theta; X_1, \ldots, X_n)$.
(b) [2 pts.] Derive the following formula for the log-likelihood:
$\ell(\theta; X_1, \ldots, X_n) = \left( \sum_{i=1}^{n} X_i \right) \log(\theta) + \left( n - \sum_{i=1}^{n} X_i \right) \log(1 - \theta)$.
(c) Extra Credit: [2 pts.] Derive the following formula for the MLE: $\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} X_i$.
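Parts (b) and (c) can be checked numerically. The following sketch (added for these notes, with a simulated sample) evaluates the log-likelihood from part (b) on a grid of θ values and confirms that the maximizer matches the closed-form MLE from part (c), the sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.7, size=200)   # simulated Bernoulli(0.7) sample
n, s = len(x), x.sum()

def log_likelihood(theta):
    # l(theta) = (sum x_i) * log(theta) + (n - sum x_i) * log(1 - theta)
    return s * np.log(theta) + (n - s) * np.log(1 - theta)

grid = np.linspace(0.01, 0.99, 981)
theta_grid = grid[np.argmax(log_likelihood(grid))]   # grid maximizer of the log-likelihood
theta_mle = s / n                                     # closed-form MLE: sample mean
print(f"grid argmax = {theta_grid:.3f}, closed-form MLE = {theta_mle:.3f}")
```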
3 Linear and Logistic Regression [20 pts. + 2 Extra Credit] [from the 10-601 midterm, 2/29/2016]
3.1 Linear regression
Given that we have an input x and we want to estimate an output y, in linear regression we assume the relationship between them is of the form $y = wx + b + \epsilon$, where $w$ and $b$ are real-valued parameters we estimate and $\epsilon$ represents the noise in the data. When the noise is Gaussian, maximizing the likelihood of a dataset $S = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ to estimate the parameters $w$ and $b$ is equivalent to minimizing the squared error:
$\arg\min_{w} \sum_{i=1}^{n} \left( y_i - (w x_i + b) \right)^2$.
Consider the dataset $S$ plotted in Fig. 1 along with its associated regression line. For each of the altered data sets $S_{new}$ plotted in Fig. 3, indicate which regression line (relative to the original one) in Fig. 2 corresponds to the regression line for the new data set. Write your answers in the table below.
[Figure 1: an observed data set and its associated regression line. Figure 2: new regression lines for altered data sets $S_{new}$. Figures not reproduced in this preview.]
The altered data sets $S_{new}$ in Fig. 3 (figures not reproduced) are:
(a) Adding one outlier to the original data set.
(b) Adding two outliers to the original data set.
(c) Adding three outliers to the original data set. Two on one side and one on the other side.
(d) Duplicating the original data set.
(e) Duplicating the original data set and adding four points that lie on the trajectory of the original regression line.
Answer table:
Dataset:          (a)   (b)   (c)   (d)   (e)
Regression line:  ___   ___   ___   ___   ___
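To build intuition for cases (a) and (d) of question 3.1, the sketch below (added for these notes, with made-up data) fits a least-squares line using np.polyfit, then refits after adding a single outlier and after duplicating the data set; the outlier moves the line, while duplication leaves it unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 20)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)   # noisy line: slope 2, intercept 1

w, b = np.polyfit(x, y, deg=1)                            # original least-squares fit
print(f"original:    w={w:.3f}, b={b:.3f}")

# (a) add one outlier far off the line
x_out, y_out = np.append(x, 10.0), np.append(y, 80.0)
print("one outlier: w={:.3f}, b={:.3f}".format(*np.polyfit(x_out, y_out, deg=1)))

# (d) duplicate the data set: the least-squares fit is unchanged
x_dup, y_dup = np.tile(x, 2), np.tile(y, 2)
print("duplicated:  w={:.3f}, b={:.3f}".format(*np.polyfit(x_dup, y_dup, deg=1)))
```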
2 To err is machine-like [20 pts] [from the 10-601B midterm, 10/10/2016]
2.1 Train and test errors
In this problem, we will see how you can debug a classifier by looking at its train and test errors. Consider a classifier trained till convergence on some training data $D_{\text{train}}$, and tested on a separate test set $D_{\text{test}}$. You look at the test error, and find that it is very high. You then compute the training error and find that it is close to 0.
1. [4 pts] Which of the following is expected to help? Select all that apply.
   (a) Increase the training data size.
   (b) Decrease the training data size.
   (c) Increase model complexity (for example, if your classifier is an SVM, use a more complex kernel; or if it is a decision tree, increase the depth).
   (d) Decrease model complexity.
   (e) Train on a combination of $D_{\text{train}}$ and $D_{\text{test}}$ and test on $D_{\text{test}}$.
   (f) Conclude that Machine Learning does not work.
2. [5 pts] Explain your choices.
3. [2 pts] What is this scenario called?
4. [1 pt] Say you plot the train and test errors as a function of the model complexity. Which of the following two plots is your plot expected to look like?
[Plots (a) and (b): two candidate train/test error curves; not reproduced in this preview.]
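Question 4 asks which train/test error curve to expect when training error is near zero but test error is high. The sketch below (added for these notes, on synthetic data) fits polynomials of increasing degree and prints train and test mean squared error; training error keeps dropping as model complexity grows while test error eventually rises, the classic overfitting picture.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=30)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.size)          # noisy training targets
x_test = rng.uniform(-1, 1, size=200)
y_test = np.sin(3 * x_test) + rng.normal(scale=0.2, size=x_test.size)

for degree in [1, 3, 5, 9, 12]:
    coeffs = np.polyfit(x, y, deg=degree)                        # fit a polynomial of this degree
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```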
2.2 True and sample errors
Consider a classification problem with distribution $D$ and target function $c^* : \mathbb{R}^d \mapsto \pm 1$. For any sample $S$ drawn from $D$, answer whether the following statements are true or false, along with a brief explanation.
1. [4 pts] For a given hypothesis space $H$, it is possible to define a sufficient size of $S$ such that the true error is bounded by the sample error by a margin $\epsilon$, for all hypotheses $h \in H$, with a given probability.
2. [4 pts] The true error of any hypothesis $h$ is an upper bound on its training error on the sample $S$.

4 SVM, Perceptron and Kernels [20 pts. + 4 Extra Credit] [from the 10-601 midterm, 2/29/2016]
4.1 True or False
Answer each of the following questions with T or F and provide a one line justification.
(a) [2 pts.] Consider two datasets $D^{(1)}$ and $D^{(2)}$ where $D^{(1)} = \{(x^{(1)}_1, y^{(1)}_1), \ldots, (x^{(1)}_n, y^{(1)}_n)\}$ and $D^{(2)} = \{(x^{(2)}_1, y^{(2)}_1), \ldots, (x^{(2)}_m, y^{(2)}_m)\}$ such that $x^{(1)}_i \in \mathbb{R}^{d_1}$, $x^{(2)}_i \in \mathbb{R}^{d_2}$. Suppose $d_1 > d_2$ and $n > m$. Then the maximum number of mistakes a perceptron algorithm will make is higher on dataset $D^{(1)}$ than on dataset $D^{(2)}$.
(b) [2 pts.] Suppose $\phi(x)$ is an arbitrary feature mapping from input $x \in X$ to $\phi(x) \in \mathbb{R}^N$ and let $K(x, z) = \phi(x) \cdot \phi(z)$. Then $K(x, z)$ will always be a valid kernel function.
(c) [2 pts.] Given the same training data, in which the points are linearly separable, the margin of the decision boundary produced by SVM will always be greater than or equal to the margin of the decision boundary produced by Perceptron.
4.2 Multiple Choice
(a) [3 pts.] If the data is linearly separable, SVM minimizes $\|w\|^2$ subject to the constraints $\forall i,\ y_i\, w \cdot x_i \ge 1$. In the linearly separable case, which of the following may happen to the decision boundary if one of the training samples is removed? Circle all that apply.
   • Shifts toward the point removed
   • Shifts away from the point removed
   • Does not change
(b) [3 pts.] Recall that when the data are not linearly separable, SVM minimizes $\|w\|^2 + C \sum_i \xi_i$ subject to the constraint that $\forall i,\ y_i\, w \cdot x_i \ge 1 - \xi_i$ and $\xi_i \ge 0$. Which of the following may happen to the size of the margin if the tradeoff parameter $C$ is increased? Circle all that apply.
   • Increases
   • Decreases
   • Remains the same
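For question 4.2(b), the geometric margin width of a linear SVM is $2 / \|w\|$, so the effect of C can be probed empirically. The sketch below (added for these notes; it assumes scikit-learn is available and uses made-up, slightly overlapping data) fits soft-margin linear SVMs for several values of C and prints the resulting margin width.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
# Two slightly overlapping Gaussian blobs (made-up data).
X = np.vstack([rng.normal(loc=-1.0, scale=1.0, size=(50, 2)),
               rng.normal(loc=+1.0, scale=1.0, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_.ravel()
    margin = 2.0 / np.linalg.norm(w)       # geometric margin width is 2 / ||w||
    print(f"C={C:7.2f}  margin width={margin:.3f}")
```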
Soft-margin SVM [from the 10-601B midterm, 10/10/2016]
3. [Extra Credit: 3 pts.] One formulation of the soft-margin SVM optimization problem is:
$\min_{w} \ \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i (w^\top x_i) \ge 1 - \xi_i \ \ \forall i = 1, \ldots, N; \quad \xi_i \ge 0 \ \ \forall i = 1, \ldots, N; \quad C \ge 0,$
where $(x_i, y_i)$ are training samples and $w$ defines a linear decision boundary.
Derive a formula for $\xi_i$ when the objective function achieves its minimum (no steps necessary). Note it is a function of $y_i w^\top x_i$. Sketch a plot of $\xi_i$ with $y_i w^\top x_i$ on the x-axis and the value of $\xi_i$ on the y-axis. What is the name of this function?
[Figure 2: space for the sketch; not reproduced in this preview.]

Classification and Regression: The Big Picture
(Whiteboard)
– Decision Rules / Models (probabilistic generative, probabilistic discriminative, perceptron, SVM, regression)
– Objective Functions (likelihood, conditional likelihood, hinge loss, mean squared error)
– Regularization (L1, L2, priors for MAP)
– Update Rules (SGD, perceptron)
– Nonlinear Features (preprocessing, kernel trick)
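The big-picture list above pairs objective functions with update rules. As a closing illustration (added for these notes, not part of the slides), here is a bare-bones subgradient-descent sketch for a linear classifier trained on an L2-regularized hinge-loss objective; the data, step size, and regularization strength are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(5)
# Made-up, roughly separable data with a constant bias feature appended.
X = np.vstack([rng.normal(-1.5, 1.0, size=(50, 2)), rng.normal(1.5, 1.0, size=(50, 2))])
X = np.hstack([X, np.ones((100, 1))])            # last column acts as the bias term
y = np.array([-1] * 50 + [+1] * 50)

w = np.zeros(3)
lam, lr, epochs = 0.01, 0.1, 20                  # arbitrary regularization, step size, passes

for _ in range(epochs):
    for i in rng.permutation(len(X)):
        margin = y[i] * (w @ X[i])
        # Subgradient of  lam/2 * ||w||^2 + max(0, 1 - y * w.x)
        grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
        w -= lr * grad

errors = np.sum(np.sign(X @ w) != y)
print(f"w = {w.round(3)}, training errors = {errors}/{len(y)}")
```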