Data Mining: Homework 3 - Linear Separability, Logistic Regression, Perceptron, and SVM, Assignments of Health sciences

The University of Texas at Austin Health sciences

This is the third homework assignment for the data mining course (CS 395T) taught by Inderjit Dhillon in Spring 2008. It covers linear separability, logistic regression, the Perceptron algorithm, and support vector machines (SVM). Students are required to prove theorems, analyze error functions, and derive algorithms for solving SVM problems.

Typology: Assignments

Pre 2010

Uploaded on 08/27/2009


CS 395T Data Mining: A Mathematical Perspective Spring 2008

Homework 3

Lecturer: Inderjit Dhillon Date Due: April 7, 2008

Keywords: Classification, Perceptron, Support Vector Machines

1. Given two sets of data points $X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_n\}$, prove that the two sets ($X$ and $Y$) are linearly separable if and only if their convex hulls do not intersect.

2. Given training instances $(x_n, y_n)$ with $y_n \in \{0, 1\}$, consider the following error function for logistic regression:

$$E(w) = -\sum_{n=1}^{N} \bigl( y_n \log z_n + (1 - y_n) \log(1 - z_n) \bigr),$$

where $z_n = \sigma(w^T x_n)$, $w$ specifies a hyperplane, and $\sigma$ is the logistic sigmoid function defined by

$$\sigma(a) = \frac{1}{1 + \exp(-a)}.$$

Prove that the error function $E(w)$ is a convex function and provide a condition on the input data so that $E(w)$ has a unique minimum.
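As a numerical sanity check (not a proof), the error function can be evaluated directly and tested for midpoint convexity on a small example. The sketch below uses only the Python standard library; the toy data `xs`, `ys` and the weight vectors `u`, `v` are hypothetical values chosen for illustration and do not come from the assignment.

```python
import math

def sigmoid(a):
    """Logistic sigmoid, written to avoid overflow for large |a|."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def error(w, xs, ys):
    """E(w) = -sum_n [y_n log z_n + (1 - y_n) log(1 - z_n)], z_n = sigmoid(w . x_n)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        total -= y * math.log(z) + (1 - y) * math.log(1 - z)
    return total

# Hypothetical toy data: four 2-D points with labels in {0, 1}.
xs = [(1.0, 2.0), (2.0, -1.0), (-1.0, 0.5), (0.5, 1.5)]
ys = [1, 0, 0, 1]

# Two arbitrary (hypothetical) weight vectors and their midpoint.
u = (0.3, -0.7)
v = (-1.2, 0.4)
mid = tuple(0.5 * (a + b) for a, b in zip(u, v))

# Midpoint convexity: E((u + v) / 2) <= (E(u) + E(v)) / 2.
lhs = error(mid, xs, ys)
rhs = 0.5 * (error(u, xs, ys) + error(v, xs, ys))
print(lhs, rhs, lhs <= rhs)
```

A check like this can only falsify convexity on the sampled pair, so it complements the requested proof rather than replacing it.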

3. In this exercise, we will prove correctness and convergence of the Perceptron algorithm for linearly separable data.

Let $w_t$ represent the hyperplane at step $t$ and $(x_t, y_t)$ represent an input instance with $y_t \in \{1, -1\}$. Note that the input data is padded with one, i.e., $x_t = \begin{bmatrix} x_t \\ 1 \end{bmatrix}$. Recall the update:

$$w_{t+1} = w_t + y_t x_t, \quad \text{if } y_t (w_t^T x_t) < 0, \text{ i.e., a mistake.}$$

Assume that all the input data points have bounded Euclidean norm, i.e., $\|x_t\| \le R$, and are linearly separable with finite margin $\gamma$, i.e., there exists a hyperplane specified by $w^*$ such that:

$$y_t (w^{*T} x_t) \ge \gamma, \quad \forall t.$$

(a) Prove that the following holds after $t$ updates: $w^{*T} w_t \ge t\gamma$.

(b) Prove that: $\|w_t\|_2^2 \le t R^2$.

(c) Using parts (a) and (b), prove that the Perceptron algorithm converges to a separating hyperplane after at most $\dfrac{R^2 \|w^*\|_2^2}{\gamma^2}$ steps.
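The update rule and the mistake bound can be illustrated with a minimal sketch in plain Python. Several choices here are assumptions rather than part of the assignment: the toy `points` and `labels`, the reference separator `w_star`, the zero initial weight vector, and treating a score of exactly zero as a mistake (so that the zero vector can leave the origin).

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def perceptron(points, labels, max_epochs=1000):
    """Run the mistake-driven update w <- w + y x on inputs padded with 1.

    Returns the final weight vector and the number of updates performed.
    """
    padded = [tuple(x) + (1.0,) for x in points]
    w = [0.0] * len(padded[0])  # assumption: start from the zero vector
    updates = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(padded, labels):
            # Assumption: a zero score also counts as a mistake.
            if y * dot(w, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updates += 1
                mistakes += 1
        if mistakes == 0:
            break  # a full pass without mistakes: w separates the data
    return w, updates

# Hypothetical linearly separable toy data with labels in {1, -1}.
points = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, -0.5)]
labels = [1, 1, -1, -1]
w, updates = perceptron(points, labels)

# Hypothetical reference separator in padded coordinates, used only to
# evaluate the bound R^2 ||w*||^2 / gamma^2 from part (c).
w_star = (1.0, 1.0, 0.0)
padded = [tuple(x) + (1.0,) for x in points]
gamma = min(y * dot(w_star, x) for x, y in zip(padded, labels))
R2 = max(dot(x, x) for x in padded)
bound = R2 * dot(w_star, w_star) / gamma ** 2
print(w, updates, bound)
```

Any separating `w_star` gives a valid bound; a separator with a larger margin relative to its norm gives a tighter one.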

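4. In this exercise, we will derive an algorithm for solving the SVM problem. Recall the dual formulation for the linearly-separable SVM:

$$\max_{\alpha} W(\alpha), \quad \text{where } W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} y_i y_j \alpha_i \alpha_j K_{ij}$$

subject to

$$\sum_{i=1}^{N} y_i \alpha_i = 0, \tag{1}$$

$$\alpha_i \ge 0, \quad i = 1, \ldots, N. \tag{2}$$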


In the above problem, $K_{ij}$ could be $x_i^T x_j$ or $K_{ij} = \kappa(x_i, x_j) = h(x_i)^T h(x_j)$. Note that the matrix $K$ is positive semi-definite. The dual variables $\alpha_1, \ldots, \alpha_N$ are said to be feasible if (1) and (2) are satisfied.

We will consider the following strategy for optimizing this problem: at each iteration, we start with a feasible $\alpha$ and then update exactly 2 $\alpha$'s at a time. The update must maintain feasibility. Assume without loss of generality that the variables to be updated are $\alpha_1$ and $\alpha_2$. In the following, you will derive an update to $\alpha_1$ and $\alpha_2$ that maximizes the dual problem given above when only $\alpha_1$ and $\alpha_2$ are allowed to change.

a) $\alpha_1$ and $\alpha_2$ are to be updated to $\bar{\alpha}_1$ and $\bar{\alpha}_2$. Using the constraints on $\alpha$ from the dual problem, show that if $y_1 = y_2$, then $\bar{\alpha}_2 \le \alpha_1 + \alpha_2$, and if $y_1 \ne y_2$, then $\bar{\alpha}_2 \ge \alpha_2 - \alpha_1$.

b) Given that $y_1 \alpha_1 + y_2 \alpha_2 = \text{constant} = y_1 \bar{\alpha}_1 + y_2 \bar{\alpha}_2$, express this equivalently as $\alpha_1 + s\alpha_2 = \gamma$, where $s = y_1 y_2$. Furthermore, let

$$v_i = \sum_{j=3}^{N} y_j \alpha_j K_{ij}, \quad i = 1, 2.$$

Write the dual objective as a function of $\alpha_1$ and $\alpha_2$ (fixing the other $\alpha$ variables as constants), then use the equation $\alpha_1 + s\alpha_2 = \gamma$ to express the dual as a function of only $\alpha_2$, yielding

$$W(\alpha_2) = \gamma - s\alpha_2 + \alpha_2 - \frac{1}{2} K_{11} (\gamma - s\alpha_2)^2 - \frac{1}{2} K_{22} \alpha_2^2 - s K_{12} (\gamma - s\alpha_2)\alpha_2 - y_1 (\gamma - s\alpha_2) v_1 - y_2 \alpha_2 v_2 + \text{constant}.$$

c) Differentiate $W(\alpha_2)$ with respect to $\alpha_2$ to calculate the maximizing $\bar{\alpha}_2$. Let $d_{12} = K_{11} - 2K_{12} + K_{22}$ for notational convenience. Justify why this solution is a maximum (not a minimum).

d) Let $E_i = f(x_i) - y_i = \Bigl( \sum_{j=1}^{N} \alpha_j y_j K_{ij} + w_0 \Bigr) - y_i$, i.e., the difference between the predicted value and the true class label. Simplify your result in part c) to obtain the following:

$$\bar{\alpha}_2 = \alpha_2 + \frac{y_2 (E_1 - E_2)}{d_{12}},$$

and then, using part a), obtain the final solution for $\bar{\alpha}_2$ as:

$$\bar{\alpha}_2 := \begin{cases} \max(0, \min(\bar{\alpha}_2, \alpha_1 + \alpha_2)) & \text{if } y_1 = y_2, \\ \max(\bar{\alpha}_2, \alpha_2 - \alpha_1, 0) & \text{if } y_1 \ne y_2. \end{cases}$$

Furthermore, show that $\bar{\alpha}_1 = \alpha_1 + y_1 y_2 (\alpha_2 - \bar{\alpha}_2)$. This update results in a non-decreasing dual, and repeating over pairs of $\alpha$ eventually leads to global convergence of the SVM problem.
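The pairwise update stated in part d) can be tried out numerically. The following is a minimal sketch in plain Python, not a full SVM solver: the toy data `X`, `y`, the linear kernel, the fixed sweep order over pairs, and the decision to skip pairs with $d_{12} \le 0$ are all assumptions made for illustration. The bias $w_0$ is omitted because it cancels in $E_1 - E_2$.

```python
def dual(alpha, y, K):
    """W(alpha) = sum_i alpha_i - 1/2 sum_ij y_i y_j alpha_i alpha_j K_ij."""
    n = len(alpha)
    quad = sum(y[i] * y[j] * alpha[i] * alpha[j] * K[i][j]
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

def pair_update(alpha, y, K, i, j):
    """Update alpha[i] (role of alpha_1) and alpha[j] (role of alpha_2)."""
    n = len(alpha)
    d = K[i][i] - 2.0 * K[i][j] + K[j][j]
    if d <= 0.0:
        return list(alpha)  # assumption: skip degenerate pairs

    def err(k):
        # E_k without the bias w0, which cancels in err(i) - err(j).
        return sum(alpha[m] * y[m] * K[k][m] for m in range(n)) - y[k]

    a2 = alpha[j] + y[j] * (err(i) - err(j)) / d
    # Clip so that both updated variables stay non-negative.
    if y[i] == y[j]:
        a2 = max(0.0, min(a2, alpha[i] + alpha[j]))
    else:
        a2 = max(a2, alpha[j] - alpha[i], 0.0)
    new = list(alpha)
    new[j] = a2
    # The max(0.0, ...) only guards against floating-point rounding.
    new[i] = max(0.0, alpha[i] + y[i] * y[j] * (alpha[j] - a2))
    return new

# Hypothetical toy data with a linear kernel K_ij = x_i . x_j.
X = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, -0.5)]
y = [1, 1, -1, -1]
K = [[sum(a * b for a, b in zip(p, q)) for q in X] for p in X]

alpha = [0.0] * len(X)          # alpha = 0 is feasible
history = [dual(alpha, y, K)]   # dual value after each pair update
for _ in range(20):
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            alpha = pair_update(alpha, y, K, i, j)
            history.append(dual(alpha, y, K))
print(alpha, history[-1])
```

Recording the dual value after every update makes it easy to confirm on this example that the sequence never decreases and that constraints (1) and (2) remain satisfied. Practical solvers choose the pair to update with a heuristic rather than a fixed sweep.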
