CS591: Intro to Machine Learning, F’03

Homework 2

Due: Sep 30, 2003

In this assignment you will examine the problem of polynomial regression, that is, of fitting a

polynomial curve to a set of continuous-valued measurements.

Let $X = \{x_1, \ldots, x_N\}$ be a set of 1-d inputs (i.e., $x_i \in \mathbb{R}$) and $Y = \{y_1, \ldots, y_N\}$ be the corresponding output measurements (also in $\mathbb{R}$). Our job is to find a function $f : \mathbb{R} \to \mathbb{R}$ that approximates the relationship between $X$ and $Y$ as well as possible without overfitting. For this assignment, we’ll choose a polynomial of order $k$ as our function. That is,

$$f_w(x) = w_0 + w_1 x + w_2 x^2 + \cdots + w_k x^k = \sum_{i=0}^{k} w_i x^i \qquad (1)$$

where $w$ is the vector of the polynomial coefficients.
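As an illustration of Equation 1, here is a minimal Python sketch that evaluates the polynomial for a given coefficient vector. The function name `poly_eval` and the use of NumPy are assumptions made for this example; the assignment does not prescribe either.

```python
import numpy as np

def poly_eval(w, x):
    """Evaluate f_w(x) = sum_i w[i] * x**i for a scalar or 1-d array x.

    `poly_eval` is a hypothetical helper name, not part of the assignment.
    """
    w = np.asarray(w, dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Each row holds the powers x**0, x**1, ..., x**k of one input.
    powers = np.vander(x, N=len(w), increasing=True)
    return powers @ w

# w = [1, 2, 3] encodes the polynomial 1 + 2x + 3x^2.
values = poly_eval([1.0, 2.0, 3.0], [0.0, 1.0, 2.0])
```

Storing the coefficients in increasing order of power keeps the index of `w` aligned with the exponent in the sum.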

For “as well as possible”, we’ll use a squared-error loss function:

$$L(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \left(f_w(x_i) - y_i\right)^2 \qquad (2)$$
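The loss can be computed directly from its definition. The sketch below assumes the loss is the mean of the squared residuals (a 1/N normalization); the name `squared_error_loss` and the use of NumPy are choices made for this example only.

```python
import numpy as np

def squared_error_loss(w, x, y):
    """Mean squared error of the polynomial with coefficients w on (x, y).

    Assumes a 1/N normalization; `squared_error_loss` is a hypothetical name.
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Predictions f_w(x_i): powers of each x_i times the coefficients.
    preds = np.vander(x, N=len(w), increasing=True) @ w
    return float(np.mean((preds - y) ** 2))

# The line f(x) = x evaluated against y = x^2 at x = 0, 1, 2:
# residuals are 0, 0, -2, so the loss is 4/3.
loss = squared_error_loss([0.0, 1.0], [0.0, 1.0, 2.0], [0.0, 1.0, 4.0])
```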

1. Using the vector $P_i = [1 \;\; x_i \;\; x_i^2 \;\; \ldots \;\; x_i^k]^T$ (for “powers of x”), write Equation 2 in linear algebraic (matrix) notation. Expand the square on the right hand side of the equation and comment on the shape (scalar, vector, matrix, and dimensions of each) of each term in the resulting equation.

2. We wish to find the $w$ that minimizes (2). Using your result from the previous question, minimize $L$ with respect to $w$ by differentiating, setting equal to zero, and solving. (Hint: it is simplest if you continue to maintain this in linear algebraic notation and perform the minimization that way. You may leave your result expressed in linear algebra, including operations like inverse or trace.) Note that although $P$ is nonlinear in $x$, $f_w$ itself is linear in $w$ (so $L$ is quadratic in $w$): the terms of $P$ are simply constants with respect to this minimization.
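Once you have derived your own closed-form answer, a numerical cross-check can be useful. The sketch below does not carry out the derivation asked for above; it hands the least-squares problem to NumPy's `np.linalg.lstsq`. The helper name `fit_polynomial` and the demo data are assumptions made for this example.

```python
import numpy as np

def fit_polynomial(x, y, k):
    """Return the order-k coefficients minimizing the squared error on (x, y).

    `fit_polynomial` is a hypothetical helper; it uses NumPy's least-squares
    solver instead of a hand-derived formula.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Row i of P is [1, x_i, x_i^2, ..., x_i^k].
    P = np.vander(x, N=k + 1, increasing=True)
    w, *_ = np.linalg.lstsq(P, y, rcond=None)
    return w

# Noise-free demo data from a known quadratic, 1 + 2x - 0.5x^2.
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = 1.0 + 2.0 * x_demo - 0.5 * x_demo**2
w_demo = fit_polynomial(x_demo, y_demo, k=2)
```

On noise-free data generated from an order-k polynomial, the recovered coefficients should match the generating ones up to floating-point error, which makes this a convenient check on a hand-derived formula.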

With the theory out of the way now, we can look at some actual code and data (well, synthetic

data anyway). The following questions can be done in the programming language of your choice,

though I recommend a language such as Matlab or Mathematica that supports linear algebra with

primitive operations. You should turn in a copy of your code with your homework. All plots should

be legible and well formatted; all axes should be labeled and each plot should have some title,

caption, or legend describing its content. Be sure to distinguish discrete points from continuous

functions.

3. Generate a synthetic data set $(X, Y)$ where $X = \{x_1, \ldots, x_{10}\}$ are 10 points uniformly spaced between 0 and $2\pi$ (inclusive) and $y_i = \sin(x_i) + N(0, 0.2)$. (I.e., $y$ is the concept “sin(x)”, plus some small Gaussian noise.) Plot $Y$ vs. $X$ and the curve of the “true” concept.
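One possible way to generate such a data set in Python is sketched below. It treats the 0.2 in N(0, 0.2) as a standard deviation, which is an assumption: if it is meant as a variance, pass `noise_std=np.sqrt(0.2)` instead. The function name `make_sine_data` and the fixed seed are also choices made for this example.

```python
import numpy as np

def make_sine_data(n=10, noise_std=0.2, seed=0):
    """Return n evenly spaced x in [0, 2*pi] and noisy y = sin(x) + noise.

    Assumption: 0.2 is read as the noise standard deviation.
    `make_sine_data` is a hypothetical helper name.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 2.0 * np.pi, n)  # both endpoints included
    y = np.sin(x) + rng.normal(0.0, noise_std, size=n)
    return x, y

x, y = make_sine_data()

# A dense grid for drawing the "true" concept as a continuous curve.
x_dense = np.linspace(0.0, 2.0 * np.pi, 200)
y_true = np.sin(x_dense)
```

For the plot, the pairs `(x, y)` would be drawn as discrete markers and `(x_dense, y_true)` as a line, so that points and the continuous function are easy to tell apart.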
