
CS591: Intro to Machine Learning, F’03
Homework 2
Due: Sep 30, 2003
In this assignment you will examine the problem of polynomial regression, that is, of fitting a
polynomial curve to a set of continuous-valued measurements.
Let $X = \{x_1, \dots, x_N\}$ be a set of 1-d inputs (i.e., $x_i \in \mathbb{R}$) and $Y = \{y_1, \dots, y_N\}$ be the
corresponding output measurements (also in $\mathbb{R}$). Our job is to find a function $f: \mathbb{R} \to \mathbb{R}$ that
approximates the relationship between $X$ and $Y$ as well as possible without overfitting. For this
assignment, we’ll choose a polynomial of order $k$ as our function. That is,
$$f_w(x) = w_0 + w_1 x + w_2 x^2 + \cdots + w_k x^k = \sum_{i=0}^{k} w_i x^i \tag{1}$$
where $w$ is the vector of the polynomial coefficients.
For “as well as possible”, we’ll use a squared-error loss function:
$$L(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \left( f_w(x_i) - y_i \right)^2 \tag{2}$$
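As a concrete reading of Equations 1 and 2, the following is a minimal Python sketch, not part of the assignment itself. The function names `poly_eval` and `squared_error_loss` and the toy data are illustrative assumptions; the polynomial is evaluated with Horner's rule, which is one convenient way to compute the sum in Equation 1.

```python
def poly_eval(w, x):
    """Evaluate f_w(x) = w[0] + w[1]*x + ... + w[k]*x**k (Equation 1).

    Uses Horner's rule: start from the highest-order coefficient and
    repeatedly multiply by x and add the next coefficient.
    """
    result = 0.0
    for coeff in reversed(w):
        result = result * x + coeff
    return result


def squared_error_loss(w, xs, ys):
    """Mean squared error of f_w over the data set (Equation 2)."""
    n = len(xs)
    return sum((poly_eval(w, x) - y) ** 2 for x, y in zip(xs, ys)) / n


# Toy data (an assumption for illustration): points lying exactly on 1 + 2x.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]

loss_exact = squared_error_loss([1.0, 2.0], xs, ys)  # coefficients match the data
loss_off = squared_error_loss([0.0, 2.0], xs, ys)    # every residual is -1

print(loss_exact, loss_off)
```

With the matching coefficients the loss is zero; dropping the constant term leaves a residual of -1 at every point, so the mean squared error is 1.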
1. Using the vector $P_i = [1 \;\; x_i \;\; x_i^2 \;\; \dots \;\; x_i^k]^T$ (for “powers of $x$”), write Equation 2 in linear
algebraic (matrix) notation. Expand the square on the right-hand side of the equation and
comment on the shape (scalar, vector, or matrix, and the dimensions of each) of each term in the
resulting equation.
2. We wish to find the $w$ that minimizes (2). Using your result from the previous question,
minimize $L$ with respect to $w$ by differentiating, setting equal to zero, and solving. (Hint:
it is simplest if you continue to maintain this in linear algebraic notation and perform the
minimization that way. You may leave your result expressed in linear algebra, including
operations like inverse or trace.) Note that although $P$ is nonlinear in $x$, $f_w$ itself is linear in
$w$ (so $L$ is quadratic in $w$): the terms of $P$ are simply constants with respect to this minimization.
With the theory out of the way now, we can look at some actual code and data (well, synthetic
data anyway). The following questions can be done in the programming language of your choice,
though I recommend a language such as Matlab or Mathematica that supports linear algebra with
primitive operations. You should turn in a copy of your code with your homework. All plots should
be legible and well formatted; all axes should be labeled and each plot should have some title,
caption, or legend describing its content. Be sure to distinguish discrete points from continuous
functions.
3. Generate a synthetic data set $(X, Y)$ where $X = \{x_1, \dots, x_{10}\}$ are 10 points uniformly
spaced between 0 and $2\pi$ (inclusive) and $y_i = \sin(x_i) + N(0, 0.2)$. (I.e., $y$ is the concept
“$\sin(x)$”, plus some small Gaussian noise.) Plot $Y$ vs. $X$ and the curve of the “true” concept.
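One possible way to set up the data for this question is sketched below in plain Python. It rests on assumptions the handout does not settle: the 0.2 in $N(0, 0.2)$ is read as a standard deviation (if it is meant as a variance, pass `math.sqrt(0.2)` instead), and the name `make_sine_data` and the fixed seed are illustrative choices. Plotting is left out.

```python
import math
import random


def make_sine_data(n=10, noise_std=0.2, seed=0):
    """Return (xs, ys): n points evenly spaced on [0, 2*pi], with noisy sine outputs.

    Assumption: noise_std is the standard deviation of the Gaussian noise.
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    # Dividing by (n - 1) makes both endpoints, 0 and 2*pi, part of the grid.
    xs = [2.0 * math.pi * i / (n - 1) for i in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


xs, ys = make_sine_data()
for x, y in zip(xs, ys):
    print(f"{x:6.3f}  {y:7.3f}")
```

Dividing by `n - 1` rather than `n` is what makes the spacing inclusive of both endpoints, as the question requires. The "true" concept curve can be drawn from the same function with a much larger `n` and `noise_std=0.0`.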