CS 680 Homework 1: Ridge Regression and Kernels - Prof. Asa Ben-Hur, Assignments of Computer Science

Instructions for Homework 1 of CS 680, a computer science course. The homework covers ridge regression and kernels, including the optimization algorithm for ridge regression, kernel functions for discrete data, and the properties of kernels. Students are required to write the ridge regression algorithm in primal and dual formulations, prove that certain functions are kernels, and investigate whether other functions are valid kernels. The homework also includes questions on positivity and vanishing diagonals of kernels, and on how classifier accuracy depends on kernel and classifier parameters.


Uploaded on 03/18/2009




CS 680

Homework 1 (due September 18th)

  1. Ridge regression [10 pts]. The ridge regression optimization can be solved by gradient descent. Write this algorithm in the primal and dual formulations.
  2. A kernel for discrete data [10 pts]. Let $\Omega$ be a set, and let $A$, $B$ be subsets of $\Omega$. Show that the following functions are kernels: $K_1(A, B) = |A \cap B|$ and $K_2(A, B) = 2^{|A \cap B|}$, where $|\cdot|$ is the cardinality of a set.
  3. Is this a kernel? [35 pts] Let $K_1$ and $K_2$ be kernels over $X \times X$. Let $f : X \to \mathbb{R}$, let $a$ be a positive number, let $\phi$ be a function $\phi : X \to \mathbb{R}^m$, and let $K_3$ be a kernel over $\mathbb{R}^m \times \mathbb{R}^m$. For each of the functions $K$ defined below, state whether it is a kernel. If you think it is a kernel, prove it by explicitly showing the feature map for which $K$ is a dot product; otherwise prove it is not a kernel.

     (a) $K(x, x') = K_1(x, x') + K_2(x, x')$
     (b) $K(x, x') = K_1(x, x') - K_2(x, x')$
     (c) $K(x, x') = a K_1(x, x')$
     (d) $K(x, x') = -a K_1(x, x')$
     (e) $K(x, x') = K_1(x, x') K_2(x, x')$
     (f) $K(x, x') = K_3(\phi(x), \phi(x'))$
     (g) $K(x, x') = f(x) f(x')$
     (h) $K(x, x') = \dfrac{K_1(x, x')}{\sqrt{K_1(x, x) K_1(x', x')}}$
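A numerical sanity check can complement the proofs asked for in problems 2 and 3: a kernel's Gram matrix on any finite sample must be symmetric positive semidefinite. The sketch below is only illustrative. The helper names `gram` and `min_eig`, the four-element universe, and the symmetric-difference function `not_a_kernel` are assumptions introduced here, not part of the assignment. A non-negative smallest eigenvalue on one sample does not prove that a function is a kernel, while a clearly negative one does rule it out.

```python
import itertools
import numpy as np

def gram(kernel, items):
    """Gram matrix G[i, j] = kernel(items[i], items[j])."""
    return np.array([[kernel(a, b) for b in items] for a in items], dtype=float)

def min_eig(kernel, items):
    """Smallest eigenvalue of the symmetric Gram matrix."""
    return np.linalg.eigvalsh(gram(kernel, items)).min()

# All 16 subsets of a small universe Omega = {0, 1, 2, 3} (illustrative choice).
omega = range(4)
subsets = [frozenset(c) for r in range(len(omega) + 1)
           for c in itertools.combinations(omega, r)]

k1 = lambda A, B: len(A & B)             # |A ∩ B|
k2 = lambda A, B: 2.0 ** len(A & B)      # 2^{|A ∩ B|}
not_a_kernel = lambda A, B: len(A ^ B)   # size of the symmetric difference

print("K1 min eigenvalue:", min_eig(k1, subsets))
print("K2 min eigenvalue:", min_eig(k2, subsets))
print("symmetric-difference min eigenvalue:", min_eig(not_a_kernel, subsets))
```

The same helper can be pointed at any candidate from problem 3 once concrete choices of $K_1$, $K_2$, $f$, and $a$ are fixed.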

  4. Positivity on the diagonal [5 pts]. Prove that every kernel satisfies $K(x, x) \geq 0 \;\; \forall x \in X$.
  5. Kernels with a vanishing diagonal [5 pts]. Prove that a kernel satisfying $K(x, x) = 0 \;\; \forall x \in X$ is identically zero. You can use the Cauchy-Schwarz inequality, which states that for all $x, x'$ in some vector space $X$

     $$|\langle x, x' \rangle| \leq \|x\| \cdot \|x'\|,$$

     with equality occurring if and only if $x$ and $x'$ are collinear (i.e. there exists $\lambda$ such that $x' = \lambda x$).
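For problem 1, a derivation can be checked against a small numerical experiment. The following is a minimal sketch under stated assumptions, not the intended solution: it assumes the primal objective $\frac{1}{2}\|Xw - y\|^2 + \frac{\lambda}{2}\|w\|^2$, and for the dual it runs gradient descent on the quadratic $\frac{1}{2}\alpha^\top (K + \lambda I)\alpha - y^\top \alpha$ with $K = XX^\top$ and $w = X^\top \alpha$. The function names, step sizes, and random data are hypothetical, and the course may use a different scaling of the objective.

```python
import numpy as np

def ridge_gd_primal(X, y, lam, n_iter=2000):
    """Gradient descent on 0.5*||Xw - y||^2 + 0.5*lam*||w||^2 (assumed objective)."""
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L is the largest eigenvalue of the Hessian X^T X + lam*I.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + lam)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) + lam * w
        w -= step * grad
    return w

def ridge_gd_dual(K, y, lam, n_iter=2000):
    """Gradient descent on 0.5*a^T (K + lam*I) a - y^T a (assumed dual form)."""
    alpha = np.zeros(K.shape[0])
    # For a PSD Gram matrix the spectral norm equals its largest eigenvalue.
    step = 1.0 / (np.linalg.norm(K, 2) + lam)
    for _ in range(n_iter):
        grad = K @ alpha + lam * alpha - y
        alpha -= step * grad
    return alpha

# Small random problem (illustrative data only).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = rng.normal(size=30)
lam = 1.0

w_primal = ridge_gd_primal(X, y, lam)
alpha = ridge_gd_dual(X @ X.T, y, lam)   # linear kernel K = X X^T
w_from_dual = X.T @ alpha                # map dual variables back to weights
w_closed = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print(np.allclose(w_primal, w_closed), np.allclose(w_from_dual, w_closed))
```

The dual routine touches the data only through the Gram matrix `K`, so a nonlinear kernel can be swapped in without changing the update.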


  6. Positive definiteness does not imply positivity [5 pts]. Give an example of a kernel which is positive definite, but not positive, i.e. it does not satisfy $K(x, x') \geq 0 \;\; \forall x, x' \in X$.
  7. Classifier performance as a function of kernel and classifier parameters [ pts]. Download the dataset provided on the homework 1 section of the homework page of the course website. In this assignment we will explore the dependence of classifier accuracy on the kernel, kernel parameters, and classifier parameters, using the ridge regression classifier implemented in PyML. You can instantiate a ridge regression classifier as:

    from PyML import classifiers
    rr = classifiers.RidgeRegression(ridge = someValue)

Accuracy can be assessed using cross-validation:

results = rr.cv(data)

where data is a dataset object (see the tutorial on how to read data into PyML). By default a dataset is instantiated with a linear kernel attached to it. To use a different kernel you need to attach a new kernel to the dataset:

    from PyML import ker
    data.attachKernel(ker.Gaussian(gamma = 2.0))

or

    from PyML import ker
    data.attachKernel(ker.Polynomial(degree = 3))

In this question we will consider both the Gaussian and polynomial kernels:

$$K_{\mathrm{gaus}}(x, x') = \exp(-\gamma \|x - x'\|^2)$$

$$K_{\mathrm{poly}}(x, x') = (1 + \langle x, x' \rangle)^p$$

Plot the accuracy of the classifier, measured using the success rate and the area under the ROC curve, as a function of both the ridge parameter of the classifier and the free parameter of the kernel function. Show a couple of representative cross sections of this plot for a given value of the ridge parameter, and for a given value of the kernel parameter. Comment on the results.
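The shape of such a parameter sweep can be sketched without PyML. The code below is a NumPy stand-in written under several assumptions: the helper names (`gaussian_gram`, `polynomial_gram`, `cv_success_rate`), the closed-form dual solve $\alpha = (K + \text{ridge} \cdot I)^{-1} y$, the 5-fold split, the parameter grids, and the synthetic two-class data are all hypothetical and are not taken from the course dataset or from PyML's implementation. It reports only the success rate; the area under the ROC curve would need to be computed separately from the real-valued scores.

```python
import numpy as np

def gaussian_gram(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negative round-off

def polynomial_gram(A, B, degree):
    """K[i, j] = (1 + <A[i], B[j]>)^degree."""
    return (1.0 + A @ B.T) ** degree

def cv_success_rate(X, y, gram_fn, ridge, n_folds=5, seed=0):
    """Mean k-fold accuracy of a kernel ridge classifier, labels in {-1, +1}."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, n_folds)
    correct = 0
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        K = gram_fn(X[train], X[train])
        alpha = np.linalg.solve(K + ridge * np.eye(len(train)), y[train])
        scores = gram_fn(X[test], X[train]) @ alpha
        correct += np.sum(np.sign(scores) == y[test])
    return correct / len(y)

# Synthetic two-class data (illustrative only; not the course dataset).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, size=(40, 2)),
               rng.normal(1.0, 1.0, size=(40, 2))])
y = np.concatenate([-np.ones(40), np.ones(40)])

ridges = [0.01, 0.1, 1.0, 10.0]
gammas = [0.01, 0.1, 1.0, 10.0]
acc = np.array([[cv_success_rate(X, y,
                                 lambda A, B, g=g: gaussian_gram(A, B, g), r)
                 for g in gammas] for r in ridges])

print(acc.round(2))   # rows: ridge values, columns: gamma values
print(acc[2, :])      # cross section at ridge = 1.0
print(acc[:, 1])      # cross section at gamma = 0.1
```

Passing `lambda A, B: polynomial_gram(A, B, degree)` as `gram_fn` gives the corresponding sweep over the polynomial degree.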