Introduction to Machine Learning - Homework 2 Practice | CS 591, Assignments of Programming Languages

Material Type: Assignment; Class: ST: Prog Analy &Mechanization; Subject: Computer Science; University: University of New Mexico; Term: Fall 2003;

Uploaded on 07/23/2009


CS591: Intro to Machine Learning, F’03

Homework 2

Due: Sep 30, 2003

In this assignment you will examine the problem of polynomial regression, that is, of fitting a polynomial curve to a set of continuous-valued measurements. Let $X = \{x_1, \dots, x_N\}$ be a set of 1-d inputs (i.e., $x_i \in \mathbb{R}$) and $Y = \{y_1, \dots, y_N\}$ be the corresponding output measurements (also in $\mathbb{R}$). Our job is to find a function $f : \mathbb{R} \to \mathbb{R}$ that approximates the relationship between $X$ and $Y$ as well as possible without overfitting. For this assignment, we’ll choose a polynomial of order $k$ as our function. That is,

$$f_w(x) = w_0 + w_1 x + w_2 x^2 + \cdots + w_k x^k = \sum_{i=0}^{k} w_i x^i \qquad (1)$$

where w is the vector of the polynomial coefficients. For “as well as possible”, we’ll use a squared-error loss function:

$$L(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \left(f_w(x_i) - y_i\right)^2 \qquad (2)$$
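As a concrete reading of Equations 1 and 2, the following minimal sketch evaluates a polynomial from its coefficient vector and computes the mean squared error on a small data set. The function names and the three example points are hypothetical, chosen only for illustration.

```python
def poly_eval(w, x):
    """Evaluate f_w(x) = w[0] + w[1]*x + ... + w[k]*x**k using Horner's rule."""
    result = 0.0
    for coeff in reversed(w):
        result = result * x + coeff
    return result


def squared_error_loss(w, xs, ys):
    """Mean squared error of the polynomial with coefficients w (Equation 2)."""
    n = len(xs)
    return sum((poly_eval(w, x) - y) ** 2 for x, y in zip(xs, ys)) / n


# Hypothetical example: f_w(x) = 1 + 2x evaluated on three points.
w = [1.0, 2.0]
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 6.0]  # the last label is off the line by 1
print(squared_error_loss(w, xs, ys))  # (0 + 0 + 1) / 3
```

The coefficient list is ordered from $w_0$ upward, matching the indexing in Equation 1.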

  1. Using the vector $P_i = [1 \; x_i \; x_i^2 \; \dots \; x_i^k]^T$ (for “powers of x”), write Equation 2 in linear algebraic (matrix) notation. Expand the square on the right-hand side of the equation and comment on the shape (scalar, vector, or matrix, with dimensions) of each term in the resulting equation.
  2. We wish to find the $w$ that minimizes (2). Using your result from the previous question, minimize $L$ with respect to $w$ by differentiating, setting equal to zero, and solving. (Hint: it is simplest if you continue to maintain this in linear algebraic notation and perform the minimization that way. You may leave your result expressed in linear algebra, including operations like inverse or trace.) Note that although $P$ is nonlinear in $x$, $f_w$ itself is linear in $w$ (so $L$ is quadratic in $w$); the terms of $P$ are simply constants with respect to this minimization.
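One possible way to set up these two questions, offered as a sketch rather than the intended solution, is to stack the row vectors $P_i^T$ into an $N \times (k+1)$ matrix $P$ and to treat the outputs as an $N$-vector $\mathbf{y}$. Both names are assumptions introduced here; the assignment itself only defines $P_i$.

```latex
\begin{align*}
L(X, Y) &= \frac{1}{N} \sum_{i=1}^{N} \left(P_i^T w - y_i\right)^2
         = \frac{1}{N} (P w - \mathbf{y})^T (P w - \mathbf{y}) \\
        &= \frac{1}{N} \left( w^T P^T P\, w - 2\, w^T P^T \mathbf{y}
           + \mathbf{y}^T \mathbf{y} \right), \\
\nabla_w L &= \frac{2}{N} \left( P^T P\, w - P^T \mathbf{y} \right) = 0
  \quad\Longrightarrow\quad
  w = (P^T P)^{-1} P^T \mathbf{y}.
\end{align*}
```

Each of the three expanded terms is a scalar; $P^T P$ is $(k+1) \times (k+1)$ and $P^T \mathbf{y}$ is a $(k+1)$-vector. The last step assumes $P^T P$ is invertible, which requires at least $k+1$ distinct inputs.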

With the theory out of the way now, we can look at some actual code and data (well, synthetic data anyway). The following questions can be done in the programming language of your choice, though I recommend a language such as Matlab or Mathematica that supports linear algebra with primitive operations. You should turn in a copy of your code with your homework. All plots should be legible and well formatted; all axes should be labeled and each plot should have some title, caption, or legend describing its content. Be sure to distinguish discrete points from continuous functions.

  3. Generate a synthetic data set $(X, Y)$ where $X = \{x_1, \dots, x_{10}\}$ are 10 points uniformly spaced between 0 and $2\pi$ (inclusive) and $y_i = \sin(x_i) + \mathcal{N}(0, 0.2)$. (I.e., $y$ is the concept “$\sin(x)$”, plus some small Gaussian noise.) Plot $Y$ vs. $X$ and the curve of the “true” concept.
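The data recipe might be sketched as follows using only the Python standard library. The assignment does not say whether the 0.2 in $\mathcal{N}(0, 0.2)$ is a variance or a standard deviation; this sketch assumes a variance. The function name `make_dataset` and the fixed seed are likewise illustrative choices, and plotting is omitted.

```python
import math
import random


def make_dataset(n, noise_std, rng):
    """Return n inputs evenly spaced on [0, 2*pi] (endpoints included)
    together with noisy sin(x) labels."""
    xs = [2.0 * math.pi * i / (n - 1) for i in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so that runs are repeatable
# Assumption: 0.2 is a variance, so the standard deviation is sqrt(0.2).
xs, ys = make_dataset(10, math.sqrt(0.2), rng)
for x, y in zip(xs, ys):
    print(f"{x:.4f}  {y:+.4f}  (true: {math.sin(x):+.4f})")
```

If 0.2 is instead meant as the standard deviation, pass `0.2` directly as `noise_std`.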

  4. Write a program that implements the least-squares solution of $w$ and use it to find the best-fit order-$k$ polynomial for $k \in \{0, \dots, 9\}$. For each value of $k$, plot $X$, $Y$, the true concept curve, and the curve generated by your $w$ polynomial. Also, print the values of $w$ and the $L_2$ norm of $w$, $\|w\| = (w^T w)^{1/2}$. Finally, plot $\|w\|$ vs. $k$. What do you observe about the quality of the fitted curve, the order of the polynomial ($k$) and the norm of $w$?
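A minimal standard-library sketch of one way to compute such a fit is shown below. It solves the normal equations $(P^T P)\, w = P^T \mathbf{y}$ by Gaussian elimination with partial pivoting; all function names are hypothetical, and the test data are a noise-free quadratic chosen so that the recovered coefficients can be checked by eye. This is not necessarily the approach the assignment intends.

```python
def design_matrix(xs, k):
    """Rows are P_i^T = [1, x_i, x_i^2, ..., x_i^k]."""
    return [[x ** j for j in range(k + 1)] for x in xs]


def solve_linear(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_polynomial(xs, ys, k):
    """Least-squares coefficients from the normal equations."""
    p = design_matrix(xs, k)
    ptp = [[sum(row[i] * row[j] for row in p) for j in range(k + 1)]
           for i in range(k + 1)]
    pty = [sum(row[i] * y for row, y in zip(p, ys)) for i in range(k + 1)]
    return solve_linear(ptp, pty)


def l2_norm(w):
    """Euclidean norm (w^T w)^(1/2)."""
    return sum(c * c for c in w) ** 0.5


# Hypothetical noise-free check: data drawn from 1 - 2x + 0.5x^2, fitted with k = 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 - 2.0 * x + 0.5 * x * x for x in xs]
w = fit_polynomial(xs, ys, 2)
print([round(c, 6) for c in w], round(l2_norm(w), 6))
```

The normal equations become badly conditioned as $k$ grows (for example $k = 9$ on $[0, 2\pi]$), so a library least-squares routine based on QR or SVD is usually the safer choice for the higher orders.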

Let us now examine the generalization error of the functions that you locate in this way. Generalization error is the average error incurred on data that you have not seen before. In a theoretical sense, it is the integral of $L(\cdot)$ between the true curve and the functional fit that you generated to it (i.e., between the two curves you plotted in Question 4). In practice, however, you never have direct access to the true curve; all you can get is a sample of the curve, as you did in Question 3.

  5. Generate 20 new synthetic data sets, each of $N = 100$ points, according to the recipe given in Question 3. For each data set, compute the loss between the data labels and the functional approximation using the coefficients that you generated in Question 4. I.e., do not generate new $w$’s from the new data sets you generate in this question; use only the $w$’s that you generated in Question 4 (the training data) to estimate the outputs of the new data sets that you generate in this question (the test data). Plot the mean loss against $k$. On the same plot, display the loss on the training data, i.e., the loss of the function estimator on the data used to set its weights. Comment on the relative shapes of these curves and their relation to the norm of $w$, as you examined in Question 4.
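The test-loss estimate described here could be sketched as below: the weights stay fixed while fresh data sets are drawn, and the per-set losses are averaged. The weight vector `w_placeholder` is a made-up stand-in for a fit from Question 4, not an actual result, and the noise level again assumes that 0.2 is a variance.

```python
import math
import random


def poly_eval(w, x):
    """Evaluate the polynomial with coefficients w at x (Horner's rule)."""
    result = 0.0
    for coeff in reversed(w):
        result = result * x + coeff
    return result


def loss(w, xs, ys):
    """Mean squared error (Equation 2)."""
    return sum((poly_eval(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_dataset(n, noise_std, rng):
    """n evenly spaced inputs on [0, 2*pi] with noisy sin(x) labels."""
    xs = [2.0 * math.pi * i / (n - 1) for i in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_std) for x in xs]
    return xs, ys


def mean_test_loss(w, n_sets, n_points, noise_std, rng):
    """Average Equation 2 over n_sets fresh data sets, keeping w fixed."""
    losses = []
    for _ in range(n_sets):
        xs, ys = make_dataset(n_points, noise_std, rng)
        losses.append(loss(w, xs, ys))
    return sum(losses) / n_sets


rng = random.Random(1)
noise_std = math.sqrt(0.2)  # assumption: 0.2 is a variance
# Placeholder weights standing in for one fit from Question 4 (not a real result).
w_placeholder = [0.0, 1.0, 0.0, -1.0 / 6.0]
print(mean_test_loss(w_placeholder, 20, 100, noise_std, rng))
```

Repeating the call once per fitted $w$ gives one mean test loss for each value of $k$.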

Let us modify the loss function (2) with a regularization term that penalizes extreme values of ‖w‖:

$$L_r(X, Y) = \frac{1}{N} \sum_{i=1}^{N} \left(f_w(x_i) - y_i\right)^2 + \frac{\lambda}{2} \|w\|^2 \qquad (3)$$

where λ is a parameter that controls the relative strengths of the error and regularization terms.

  6. Solve for the $w$ that minimizes the regularized loss of Eq. 3.
  7. Repeat Questions 4 and 5, this time holding $k$ constant at 9 and varying $\lambda \in \{0, 10^{-50}, 10^{-40}, 10^{-30}, 10^{-20}, 10^{-10}, 10^{-1}, 1\}$. What do you observe about the effects of $\lambda$ on the learned model, $\|w\|$, and the generalization error rate? Comment on the effects of the regularization term. Can you suggest another reasonable form of regularization for this system?
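For the regularized loss of Eq. 3, one way the minimization might go is sketched below. It assumes $P$ denotes the $N \times (k+1)$ matrix whose rows are the $P_i^T$, $\mathbf{y}$ the $N$-vector of outputs, and $I$ the $(k+1) \times (k+1)$ identity; these symbols are introduced here and do not appear in the assignment.

```latex
\begin{align*}
L_r(X, Y) &= \frac{1}{N} (P w - \mathbf{y})^T (P w - \mathbf{y})
             + \frac{\lambda}{2}\, w^T w, \\
\nabla_w L_r &= \frac{2}{N} \left( P^T P\, w - P^T \mathbf{y} \right) + \lambda\, w = 0
  \quad\Longrightarrow\quad
  w = \left( P^T P + \frac{N \lambda}{2}\, I \right)^{-1} P^T \mathbf{y}.
\end{align*}
```

Setting $\lambda = 0$ recovers the unregularized least-squares solution, and any $\lambda > 0$ makes the matrix being inverted positive definite.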