Download Understanding Intelligence: Linear Models & Artificial Neural Networks and more Slides Advanced Algorithms in PDF only on Docsity!
- We have been discussing some simple ideas fromstatistical learning theory. PR NPTEL course – p.1/
- We have been discussing some simple ideas fromstatistical learning theory. - The risk minimization framework that we discussedgives us a better perspective on understanding theunifying theme in different learning algorithms. PR NPTEL course – p.2/
- We have been discussing some simple ideas fromstatistical learning theory. - The risk minimization framework that we discussedgives us a better perspective on understanding theunifying theme in different learning algorithms. - We will now go back to studying pattern classificationalgorithms. - We will first briefly review algorithms for learninglinear classifiers and then start looking at methods tolearn nonlinear classifiers. PR NPTEL course – p.4/
Linear Models
- In the two class case, the linear classifier is given by h
X
sign
W
T
X
w 0
PR NPTEL course – p.5/
- We discussed many algorithms for learning
W
. PR NPTEL course – p.7/
- We discussed many algorithms for learning
W
.
- The Perceptron algorithm is a simple error-correctingmethod that is guarenteed to find a separatinghyperplane if one exists. PR NPTEL course – p.8/
- We discussed many algorithms for learning
W
.
- The Perceptron algorithm is a simple error-correctingmethod that is guarenteed to find a separatinghyperplane if one exists. - The perceptron convergence theorem shows thatgiven any training set of linearly separable patterns,the algorithm will find a separating hyperplane. - Our discussion on statistical learning theory gives usan idea of how many iid examples we should have before we can be confident that the hyperplane thatseparates the examples will also do well on test data. PR NPTEL course – p.10/
- We have also seen the least-squares method wherewe find
W
to minimize
J
W
(^1) n
i
W
T
X
i
y i
2 where, for simplicity of notation, we have assumedaugumented feature vectors. PR NPTEL course – p.11/
- We have seen how to obtain the least-squaressolution:
W
∗
A
T
A
− 1
A
T
Y
where rows of matrix
A
are feature vectors and components of
Y
are y i . PR NPTEL course – p.13/
- We have seen how to obtain the least-squaressolution:
W
∗
A
T
A
− 1
A
T
Y
where rows of matrix
A
are feature vectors and components of
Y
are y i .
- The least-squares method can also be used to learnlinear regression models. PR NPTEL course – p.14/
- We have seen that we can also minimize the empiricalrisk
J
W
using gradient descent. PR NPTEL course – p.16/
- We have seen that we can also minimize the empiricalrisk
J
W
using gradient descent.
- We can also run this gradient descent in anincremental fashion by considering one example at atime. PR NPTEL course – p.17/
- We have also seen that we can use the least squaresidea to learn a model g
W
T
X
by redefining
J
as J
W
(^1) n
i
g
W
T
X
i
y i
2 PR NPTEL course – p.19/
- We have also seen that we can use the least squaresidea to learn a model g
W
T
X
by redefining
J
as J
W
(^1) n
i
g
W
T
X
i
y i
2
- An important example is the logistic regression wherewe take g as the sigmoid function. PR NPTEL course – p.20/