





Maximum likelihood estimation (MLE) and its applications to logistic regression and support vector machines. MLE is a method for estimating parameters from data, assuming the data are independent and identically distributed. These notes cover the likelihood function, the MLE estimator, and optimization techniques for logistic regression and support vector machines. They also compare these methods and discuss their differences.
Typology: Study notes






Goal: estimate the parameters given data, assuming the data are i.i.d. (independent and identically distributed). For example, given the results of n coin tosses, we would like to estimate the probability of heads, p.
Likelihood function:
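$$L(\theta) = \log P(D \mid \theta) = \log \prod_{i=1}^{n} P(x^i, y^i \mid \theta) = \sum_{i=1}^{n} \log P(x^i, y^i \mid \theta)$$
MLE estimator:
$$\theta_{MLE} = \arg\max_{\theta} L(\theta)$$
For the coin-toss example, with outcome $x \in \{0, 1\}$:
$$P(x) = \theta^{x} (1 - \theta)^{1 - x}$$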
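To make the coin-toss case concrete, here is a minimal sketch in Python. It assumes heads are coded as 1 and tails as 0, and the toss data are invented for illustration. Setting the derivative of the log-likelihood to zero gives the closed form (the fraction of heads). The sketch checks that value against a coarse grid search over the log-likelihood. The function names are hypothetical.

```python
import math

def bernoulli_log_likelihood(theta, tosses):
    """L(theta) = sum_i log P(x_i | theta) for tosses coded as 0/1."""
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in tosses)

def bernoulli_mle(tosses):
    """Closed-form maximizer of L(theta): the fraction of 1s (heads)."""
    return sum(tosses) / len(tosses)

# Hypothetical data: 1 = heads, 0 = tails (7 heads in 10 tosses).
tosses = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
theta_hat = bernoulli_mle(tosses)

# Sanity check: a grid search over (0, 1) should pick the same value.
grid = [k / 100 for k in range(1, 100)]
theta_grid = max(grid, key=lambda t: bernoulli_log_likelihood(t, tosses))
print(theta_hat, theta_grid)
```

The grid search is only there to illustrate the argmax definition. In practice the closed form is all that is needed for this model.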
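Batch gradient ascent for logistic regression:
Given: training examples $(\mathbf{x}^i, y^i)$, $i = 1, \dots, N$
Let $\mathbf{w} \leftarrow (0, 0, 0, \dots, 0)$
Repeat until convergence
    $\mathbf{d} \leftarrow (0, 0, 0, \dots, 0)$
    For $i = 1$ to $N$ do
        $\hat{y}^i \leftarrow 1 / (1 + e^{-\mathbf{w} \cdot \mathbf{x}^i})$    (the perceptron instead uses $\hat{y}^i \leftarrow \mathrm{sign}(\mathbf{w} \cdot \mathbf{x}^i)$)
        $error = y^i - \hat{y}^i$
        $\mathbf{d} \leftarrow \mathbf{d} + error \cdot \mathbf{x}^i$
    $\mathbf{w} \leftarrow \mathbf{w} + \eta \mathbf{d}$    ($\eta$: learning rate)
Note: y takes values 0/1 here, not 1/−1.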
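The following is a minimal sketch of batch gradient ascent for logistic regression with 0/1 labels, written in plain Python. Several details are assumptions rather than anything stated in the notes: the learning rate `eta`, the stopping rule (gradient close to zero), the iteration cap, and the tiny data set. The bias is handled by adding a constant feature 1.0 to every example.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_regression(xs, ys, eta=0.05, max_iters=10000, tol=1e-6):
    """Batch gradient ascent on the log-likelihood; ys must be 0/1 labels."""
    w = [0.0] * len(xs[0])
    for _ in range(max_iters):
        d = [0.0] * len(w)  # gradient accumulated over the whole batch
        for x, y in zip(xs, ys):
            y_hat = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            error = y - y_hat
            d = [dj + error * xj for dj, xj in zip(d, x)]
        w = [wj + eta * dj for wj, dj in zip(w, d)]
        if max(abs(dj) for dj in d) < tol:  # assumed convergence test
            break
    return w

# Hypothetical, non-separable 1-D data; the leading 1.0 is the bias feature.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
ys = [0, 0, 1, 0, 1, 1]
w = train_logistic_regression(xs, ys)
predictions = [1 if sigmoid(w[0] * x[0] + w[1] * x[1]) >= 0.5 else 0 for x in xs]
print(w, predictions)
```

The data are deliberately not linearly separable, so the maximizer is finite and the loop can actually converge. On separable data the weights would keep growing, and a cap such as `max_iters` or some form of regularization would be needed.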
Logistic Regression Vs. Perceptron
The two algorithms are closely related.
Logistic regression also learns a linear decision boundary. How so?
With Naïve Bayes, P(y|x) takes the same functional form as logistic regression (for common feature models, such as Gaussian features with class-independent variance).
Bottom line: if the naïve assumption holds, NB is a good choice; otherwise, logistic regression works better.
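On the question of why logistic regression yields a linear boundary, here is a short worked derivation. It assumes the usual logistic model $P(y = 1 \mid \mathbf{x}) = 1 / (1 + e^{-\mathbf{w} \cdot \mathbf{x}})$ and the rule "predict 1 when this probability is at least 1/2". The threshold is an assumption, since the notes do not state it.

```latex
\begin{aligned}
P(y = 1 \mid \mathbf{x}) \ge \tfrac{1}{2}
  &\iff \frac{1}{1 + e^{-\mathbf{w} \cdot \mathbf{x}}} \ge \frac{1}{2} \\
  &\iff e^{-\mathbf{w} \cdot \mathbf{x}} \le 1 \\
  &\iff \mathbf{w} \cdot \mathbf{x} \ge 0
\end{aligned}
```

Under these assumptions the predicted label changes exactly on the set $\mathbf{w} \cdot \mathbf{x} = 0$, which is a hyperplane, so the boundary is linear in $\mathbf{x}$ even though the probability itself is not.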
Intuition of Margin
[Figure: positive (+) and negative (−) training examples, with three points labeled A, B, and C]
Given a training set, we would like to make all of our predictions correct and confident! This leads to the concept of margin.
Decision boundary: $\mathbf{w} \cdot \mathbf{x} + b = 0$
Functional Margin
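[Figure: positive (+) and negative (−) training examples with labeled points A, B, and C, the decision boundary $\mathbf{w} \cdot \mathbf{x} + b = 0$, and the weight vector $\mathbf{w}$]
Functional margin of example $i$:
$$\hat{\gamma}^i = y^i (\mathbf{w} \cdot \mathbf{x}^i + b), \quad i = 1, \dots, N$$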
The points closest to the boundary are called support vectors. Only these points really matter; the other examples can be ignored.
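As an illustration of the functional margin, the sketch below computes y_i (w · x_i + b) for each point of a small data set and picks out the points with the smallest value, that is, the ones closest to the boundary. The data, the boundary x1 + x2 − 3 = 0, and the function name are all made up for the example, and labels are assumed to be +1/−1.

```python
def functional_margins(w, b, xs, ys):
    """Return y_i * (w . x_i + b) for each example; labels must be +1 / -1."""
    return [y * (sum(wj * xj for wj, xj in zip(w, x)) + b) for x, y in zip(xs, ys)]

# Hypothetical 2-D data and a hand-picked boundary x1 + x2 - 3 = 0.
xs = [(3.0, 3.0), (2.0, 2.0), (4.0, 1.0), (0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
ys = [+1, +1, +1, -1, -1, -1]
w, b = (1.0, 1.0), -3.0

margins = functional_margins(w, b, xs, ys)

# A positive margin means the point is on the correct side of the boundary.
all_correct = all(m > 0 for m in margins)

# The points with the smallest margin are the ones closest to the boundary.
smallest = min(margins)
closest = [x for x, m in zip(xs, margins) if m == smallest]
print(margins, closest)
```

A larger margin means a more confident prediction, and a negative one means a mistake. Note that scaling `w` and `b` by the same positive constant scales every functional margin without moving the boundary, so the values are only comparable for a fixed scale of `(w, b)`.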