Classes Part 1-Artificial Intelligence-Quiz, Exercises of Artificial Intelligence

Madam Amrita Ahuja took this quiz in the Artificial Intelligence class at Central University of Jammu and Kashmir. This quiz involves: Learning, Hypothesis, Classes, Real-Valued, Inputs, Separators, Trees, Linear, Kernel, Gaussian, SVM

Typology: Exercises

2011/2012

Uploaded on 07/31/2012

shaina_44kin



[Figure: six candidate separators, labeled A to F]

5 Learning hypothesis classes (16 points)

Consider a classification problem with two real-valued inputs. For each of the following algorithms, specify all of the separators below that it could have generated and explain why. If it could not have generated any of the separators, explain why not.

  1. 1-nearest neighbor
  2. decision trees on real-valued inputs
  3. standard perceptron algorithm
  4. SVM with linear kernel
  5. SVM with Gaussian kernel (σ = 0.25)
  6. SVM with Gaussian kernel (σ = 1)
  7. neural network with no hidden units and one sigmoidal output unit, run until convergence of training error
  8. neural network with 4 hidden units and one sigmoidal output unit, run until convergence of training error
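
The two Gaussian-kernel settings in the list differ only in the width σ. As a rough illustration of what that width controls, the sketch below evaluates the kernel at a few distances for σ = 0.25 and σ = 1. It assumes the common parametrization K(u, v) = exp(−‖u − v‖² / (2σ²)); the quiz does not say which parametrization it uses, and the function name `gaussian_kernel` is hypothetical.

```python
import math

def gaussian_kernel(u, v, sigma):
    """Gaussian (RBF) kernel, assuming K(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

origin = (0.0, 0.0)
for sigma in (0.25, 1.0):
    # Kernel value between the origin and points at distance 0.5, 1, and 2.
    values = [gaussian_kernel(origin, (d, 0.0), sigma) for d in (0.5, 1.0, 2.0)]
    print(sigma, [round(v, 4) for v in values])
```

Under this assumption, a narrower kernel makes each training point influence only its immediate neighborhood, which is one reason the two settings can produce differently shaped separators.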

7 SVMs (12 points)

Assume that we are using an SVM with a polynomial kernel of degree 2. You are given the following support vectors:

x 1 x 2 y 1 2 + 1 2 1

The α values for each of these support vectors are equal to 0.05.

  1. What is the value of b? Explain your approach to getting the answer.
  2. What value does this SVM compute for the input point (1, 3)?
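
For readers who want to check their approach, here is a minimal sketch of how a kernel SVM turns support vectors, α values, and an offset b into an output. It assumes the degree-2 polynomial kernel has the form (u · v + 1)², and it uses hypothetical support vectors and a hypothetical query point rather than the ones in the question; all function names are illustrative.

```python
def poly2_kernel(u, v):
    """Degree-2 polynomial kernel, assuming the form (u . v + 1)^2."""
    return (sum(a * b for a, b in zip(u, v)) + 1.0) ** 2

def svm_raw_output(x, support, alphas, b):
    """Kernel expansion: sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * poly2_kernel(sv, x) for (sv, y), a in zip(support, alphas)) + b

def offset_from_support_vector(k, support, alphas):
    """Solve y_k * f(x_k) = 1 for b using support vector k (y_k is +1 or -1)."""
    x_k, y_k = support[k]
    return y_k - svm_raw_output(x_k, support, alphas, b=0.0)

# Hypothetical support vectors (point, label); not taken from the quiz table.
support = [((-1.0, 2.0), +1), ((1.0, 2.0), -1)]
alphas = [0.05, 0.05]

b = offset_from_support_vector(0, support, alphas)
print(b, svm_raw_output((-2.0, 0.0), support, alphas, b))
```

The offset is recovered from the condition that a support vector lies exactly on the margin; computing it from each support vector in turn is a quick consistency check on the given α values.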

8 Neural networks (18 points)

A physician wants to use a neural network to predict whether patients have a disease, based on the results of a battery of tests. He has assigned a cost of $c_{01}$ to false positives (generating an output of 1 when it ought to have been 0), and a cost of $c_{10}$ to generating an output of 0 when it ought to have been 1. The cost of a correct answer is 0. The neural network is just a single sigmoid unit, which computes the following function:

$$g(\bar{x}) = s(\bar{w} \cdot \bar{x})$$

with s(z) being the usual sigmoid function.

  1. Give an error function for the whole training set, $E(\bar{w})$, that implements this error metric. For example, for a training set of 20 cases, if the network predicts 1 for 5 cases that should have been 0, predicts 0 for 3 cases that should have been 1, and predicts the other 12 correctly, the value of the error function should be $5c_{01} + 3c_{10}$.
  2. Would this be an appropriate error criterion to use for a neural network? Why or why not?
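
The 20-case example in the first question can be tallied directly. The sketch below only counts costs over predictions that are already 0 or 1 (how the sigmoid output is turned into a 0/1 prediction is left open), and the cost values and function name are hypothetical, chosen only for illustration.

```python
def weighted_error_count(predictions, targets, c01, c10):
    """Total cost: c01 per false positive plus c10 per false negative."""
    total = 0.0
    for pred, target in zip(predictions, targets):
        if pred == 1 and target == 0:
            total += c01      # predicted 1, should have been 0
        elif pred == 0 and target == 1:
            total += c10      # predicted 0, should have been 1
    return total

# The 20-case example: 5 false positives, 3 false negatives, 12 correct.
predictions = [1] * 5 + [0] * 3 + [1] * 6 + [0] * 6
targets     = [0] * 5 + [1] * 3 + [1] * 6 + [0] * 6
c01, c10 = 1.0, 4.0   # hypothetical costs
print(weighted_error_count(predictions, targets, c01, c10))
```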

4 Machine Learning — Continuous Features (20 points)

In all the parts of this problem we will be dealing with one-dimensional data, that is, a set of points ($x^i$) with only one feature (called simply $x$). The points are in two classes given by the value of $y^i$. We will show you the points on the x axis, labeled by their class values; we also give you a table of values.

4.1 Nearest Neighbors

i    x^i    y^i
1    1      0
2    2      1
3    3      1
4    4      0
5    6      1
6    7      1
7    10     0
8    11     1

  1. In the figure below, draw the output of a 1-Nearest-Neighbor classifier over the range indicated in the figure.
  2. In the figure below, draw the output of a 5-Nearest-Neighbor classifier over the range indicated in the figure.
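
A drawing for either question can be checked point by point with a small k-nearest-neighbor routine over the table of values. This is a minimal sketch: the function name is hypothetical, and breaking distance ties by training order is an arbitrary assumption that the problem does not specify.

```python
from collections import Counter

# Training data from the table in section 4.1: (x, y) pairs.
POINTS = [(1, 0), (2, 1), (3, 1), (4, 0), (6, 1), (7, 1), (10, 0), (11, 1)]

def knn_predict(x, k, points=POINTS):
    """Majority vote among the k training points closest to x.

    Distance ties are broken by training order (sorted() is stable),
    which is an arbitrary choice made for this sketch.
    """
    nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Sample the classifier on a grid to compare against a hand-drawn output.
for k in (1, 5):
    print(k, [knn_predict(x / 2.0, k) for x in range(0, 25)])
```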

4.3 Neural Nets

Assume that each of the units of a neural net uses one of the following output functions of the total activation (instead of the usual sigmoid s(z)):

  • Linear: This just outputs the total activation:

$$l(z) = z$$

  • NonLinear: This looks like a linearized form of the usual sigmoid function:

$$f(z) = \begin{cases} 0 & \text{if } z < -1 \\ 1 & \text{if } z > 1 \\ 0.5(z + 1) & \text{otherwise} \end{cases}$$
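
The two output functions are simple enough to write down directly, which can help when experimenting with small networks by hand. This is a sketch with hypothetical function names; it only restates the definitions above.

```python
def linear(z):
    """Linear unit: outputs the total activation unchanged."""
    return z

def nonlinear(z):
    """NonLinear unit: 0 below -1, 1 above 1, and 0.5 * (z + 1) in between."""
    if z < -1:
        return 0.0
    if z > 1:
        return 1.0
    return 0.5 * (z + 1)

# Evaluate both units at a few total activations.
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z, linear(z), nonlinear(z))
```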

Consider the following output from a neural net made up of units of the types described above.

  1. Can this output be produced using only linear units? Explain.
  2. Construct the simplest neural net out of these two types of units that would have the output shown above. When possible, use weights that have magnitude of 1. Label each unit as either Linear or NonLinear.


5 Machine Learning (20 points)

Grady Ent decides to train a single sigmoid unit using the following error function:

$$E(w) = \frac{1}{2} \sum_i \left( y(x^i, w) - y^{i*} \right)^2 + \frac{\beta}{2} \sum_j w_j^2$$

where $y(x^i, w) = s(x^i \cdot w)$, with $s(z) = \frac{1}{1 + e^{-z}}$ being our usual sigmoid function.
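
When working the questions on this error function, it can help to evaluate it numerically and to compare a hand-derived gradient against a finite-difference estimate. The sketch below assumes the error has the form ½ Σ_i (s(x^i · w) − y^{i*})² + (β/2) Σ_j w_j²; the data, the weights, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def error(w, xs, targets, beta):
    """Assumed form: 1/2 * sum_i (s(x^i . w) - y^i*)^2 + (beta/2) * sum_j w_j^2."""
    fit = sum((sigmoid(sum(wj * xj for wj, xj in zip(w, x))) - t) ** 2
              for x, t in zip(xs, targets))
    penalty = sum(wj ** 2 for wj in w)
    return 0.5 * fit + 0.5 * beta * penalty

def numerical_gradient(w, xs, targets, beta, eps=1e-6):
    """Central-difference estimate of dE/dw_j, for checking a hand-derived formula."""
    grad = []
    for j in range(len(w)):
        up = list(w)
        up[j] += eps
        down = list(w)
        down[j] -= eps
        grad.append((error(up, xs, targets, beta) - error(down, xs, targets, beta)) / (2 * eps))
    return grad

# Hypothetical data: two features per example, the second fixed at 1 to act as an offset.
xs = [(0.5, 1.0), (-1.0, 1.0), (2.0, 1.0)]
targets = [1.0, 0.0, 1.0]
w = [0.3, -0.2]
print(error(w, xs, targets, beta=0.1), numerical_gradient(w, xs, targets, beta=0.1))
```

If an analytic expression for the gradient agrees with `numerical_gradient` to several decimal places on a few random weight vectors, it is very likely correct.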

  1. Write an expression for $\partial E / \partial w_j$. Your answer should not involve derivatives.
  2. What update should be made to weight $w_j$ given a single training example $\langle x, y^* \rangle$? Your answer should not involve derivatives.
  3. Here are two graphs of the output of the sigmoid unit as a function of a single feature x. The unit has a weight for x and an offset. The two graphs are made using different values of the magnitude of the weight vector ($\|w\|^2 = \sum_j w_j^2$).

Which of the graphs is produced by the larger $\|w\|^2$? Explain.

  4. Why might penalizing large $\|w\|^2$, as we could do above by choosing a positive β, be desirable?
  5. How might Grady select a good value for β for a particular classification problem?
  1. The score is the percentage correct of the tree, computed on the training set, minus a constant C times the number of nodes in the tree. C is chosen in advance by running this algorithm (grow a large tree then prune in order to maximize percent correct minus C times number of nodes) for many different values of C, and choosing the value of C that minimizes training-set error.
  2. The score is the percentage correct of the tree, computed on the training set, minus a constant C times the number of nodes in the tree. C is chosen in advance by running cross-validation trials of this algorithm (grow a large tree then prune in order to maximize percent correct minus C times number of nodes) for many different values of C, and choosing the value of C that minimizes cross-validation error.

Problem 4: Learning (25 points)

Part A: (5 Points)

Since the cost of using a nearest neighbor classifier grows with the size of the training set, sometimes one tries to eliminate redundant points from the training set. These are points whose removal does not affect the behavior of the classifier for any possible new point.

  1. In the figure below, sketch the decision boundary for a 1-nearest-neighbor rule and circle the redundant points.
  2. What general condition(s) are required for a point to be declared redundant for a 1-nearest-neighbor rule? Assume we have only two classes (+, -). Restating the definition of redundant ("removing it does not change anything") is not an acceptable answer. Hint: think about the neighborhood of redundant points.

Part C: (10 Points)

[Figure: network diagram with inputs X and Y and units 1 to 5]

In this network, all the units are sigmoid except unit 5 which is linear (its output is simply the weighted sum of its inputs). All the bias weights are zero. The dashed connections have weights of -1, all the other connections (solid lines) have weights of 1.

  1. Given X=0 and Y=0, what are the output values of each of the units? Unit 1 = ___, Unit 2 = ___, Unit 3 = ___, Unit 4 = ___, Unit 5 = ___
  2. What are the δ values for each unit (as computed by backpropagation defined for squared error)? Assume that the desired output for the network is 4. Unit 1 = ___, Unit 2 = ___, Unit 3 = ___, Unit 4 = ___, Unit 5 = ___
  3. What would be the new value of the weight connecting units 2 and 3 assuming that the learning rate for backpropagation is set to 1?

Part D: (10 Points)

  1. Consider the simple one-dimensional classification problem shown below. Imagine attacking this problem with an SVM using a radial-basis function kernel. Assume that we want the classifier to return a positive output for the + points and a negative output for the – points.

Draw a plausible classifier output curve for a trained SVM, indicating the classifier output for every feature value in the range shown. Do this twice, once assuming that the standard deviation (σ) is very small relative to the distance between adjacent training points and again assuming that the standard deviation (σ) is about double the distance between adjacent training points.

Small standard deviation (σ): [blank plot: SVM output versus feature value]

Large standard deviation (σ): [blank plot: SVM output versus feature value]
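
To build intuition for the two cases, one can evaluate a radial-basis kernel expansion on a grid of feature values. The sketch below is not a trained SVM: it assumes equal α values, a zero offset, the parametrization exp(−(x − x_i)² / (2σ²)), and a hypothetical training set with unit spacing rather than the points in the figure.

```python
import math

def rbf_expansion(x, points, labels, sigma, alpha=1.0, b=0.0):
    """sum_i alpha * y_i * exp(-(x - x_i)^2 / (2 sigma^2)) + b, with equal alphas assumed."""
    return b + sum(alpha * y * math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2))
                   for xi, y in zip(points, labels))

# Hypothetical 1-D training set with unit spacing; not the points from the quiz figure.
points = [1.0, 2.0, 3.0, 4.0]
labels = [+1, +1, -1, -1]

for sigma in (0.1, 2.0):   # much smaller than the spacing vs. double the spacing
    curve = [round(rbf_expansion(x / 2.0, points, labels, sigma), 3) for x in range(0, 11)]
    print(sigma, curve)
```

Printing or plotting the two curves side by side shows how the width changes the shape of the output between training points.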