Gaussian Discriminant Analysis vs. Logistic Regression: Comparing Classifiers | Study notes Machine Learning

MachineLearning-Lecture05

Instructor (Andrew Ng): Okay, good morning. Just one quick announcement and

reminder, the project guidelines handout was posted on the course website last week. So

if you haven’t yet downloaded it and looked at it, please do so. It just contains the

guidelines for the project proposal and the project milestone, and the final project

presentation.

So what I want to do today is talk about a different type of learning algorithm, and, in

particular, start to talk about generative learning algorithms and the specific algorithm

called Gaussian Discriminant Analysis. Take a slight digression, talk about Gaussians,

and I’ll briefly discuss generative versus discriminative learning algorithms, and then

hopefully wrap up today’s lecture with a discussion of Naive Bayes and Laplace

smoothing.

So just to motivate our discussion on generative learning algorithms, right, so by way of

contrast, the sort of classification algorithms we’ve been talking about I think of as

algorithms that do this. So you’re given a training set, and you run an algorithm like

logistic regression on that training set.

The way I think of logistic regression is that it looks at the data and tries

to find a straight line to divide the crosses and O’s, right? So it’s, sort of, trying to

find a straight line. Let me – just make the data a bit noisier. Trying to find a straight line

that separates out the positive and the negative classes as well as possible, right?

And, in fact, let me show this on the laptop. Maybe just use the screens or the small monitors

for this. In fact, you can see there’s the data set with logistic regression, and so I’ve

initialized the parameters randomly, and so the hypothesis that logistic regression is

outputting at iteration zero is that straight line shown in the bottom

right.

And so after one iteration of gradient descent, the straight line changes a bit. After two

iterations, three, four, until logistic regression converges and has found the straight line

that, more or less, separates the positive and negative class, okay? So you can think of

this as logistic regression, sort of, searching for a line that separates the positive and the

negative classes.
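
To make that search concrete, here is a minimal sketch of batch gradient descent for logistic regression on a tiny two-class data set. The data points, the learning rate, the iteration count, and the helper names (`fit_logistic`, `average_loss`, `predict`) are all assumptions made up for illustration; they are not the data or code from the demo shown in lecture.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def average_loss(theta, points, labels):
    # Mean logistic loss; m is the signed margin of each example.
    total = 0.0
    for (x1, x2), y in zip(points, labels):
        z = theta[0] + theta[1] * x1 + theta[2] * x2
        m = z if y == 1 else -z
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return total / len(points)


def fit_logistic(points, labels, lr=0.1, iterations=500, seed=0):
    # Returns the parameters after every iteration, so the moving
    # line theta0 + theta1*x1 + theta2*x2 = 0 can be inspected.
    rng = random.Random(seed)
    theta = [rng.uniform(-1.0, 1.0) for _ in range(3)]  # random init
    history = [tuple(theta)]
    n = len(points)
    for _ in range(iterations):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(points, labels):
            features = (1.0, x1, x2)
            z = sum(t * f for t, f in zip(theta, features))
            error = sigmoid(z) - y  # prediction minus label
            for j in range(3):
                grad[j] += error * features[j] / n
        theta = [t - lr * g for t, g in zip(theta, grad)]
        history.append(tuple(theta))
    return history


def predict(theta, point):
    # Which side of the line the point falls on.
    x1, x2 = point
    return 1 if theta[0] + theta[1] * x1 + theta[2] * x2 >= 0 else 0


# Hypothetical toy data: label 0 plays the role of the O's, 1 the crosses.
points = [(-1.0, -1.0), (-2.0, -1.5), (-1.5, -2.5), (-2.5, -2.0),
          (1.0, 1.5), (2.0, 1.0), (1.5, 2.5), (2.5, 2.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

history = fit_logistic(points, labels)
final_theta = history[-1]
```

Printing `history[0]`, `history[1]`, `history[2]`, and so on shows the line shifting a little with each iteration, and `predict(final_theta, p)` reports which side of the final line a point `p` lands on.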

What I want to do today is talk about an algorithm that does something slightly different,

and to motivate us, let’s use our old example of trying to classify between malignant

cancer and benign cancer, right? So a patient comes in and they have a cancer, and you want

to know if it’s a malignant, or harmful, cancer, or if it’s a benign, meaning harmless,

cancer.

So rather than trying to find the straight line to separate the two classes, here’s something

else we could do. We can go through our training set and look at all the cases of malignant

cancers, go through, you know, look through our training set for all the positive examples of
