

Problem Set 4 for CS 4803B/8803B: Pattern Recognition. Students compare two non-parametric classification techniques, Parzen window and k-nearest neighbor, on the same data set. They estimate each technique's free parameter using leave-one-out validation and plot the evidence curves to determine the optimal values. After selecting the best h for the Parzen window and the best k for k-nearest neighbor, students design minimum error rate classifiers and evaluate their performance on the test data.


Problem Set 4
For the first two problems of this problem set you will compare two different non-parametric classification techniques on the same data set. The training and testing data can be obtained from the class Web page (files PS4 1train.mat and PS4 1test.mat). These are two classes with equal prior probabilities and scalar-valued features. Each classification technique has a free parameter, which you will estimate via the technique of leave-one-out validation (to be explained).
(a) Break the training data into two parts A and B, where B contains only a single sample. This is what makes the method “leave-one-out” validation.
(b) Compute the Parzen window density estimate p_n(x) from A only.
(c) Compute log p_n(B), the log-probability of the sample you held out.
(d) Iterate back to part (a) and break the training data up in a different way. Do this for every possible choice of B. The average of all of the log-probabilities you get in part (c) is defined to be v(h).
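Steps (a) through (d) can be sketched as follows. This is a minimal illustration, not the assignment's solution: it is written in Python rather than Matlab, it assumes a Gaussian window of width h (the window shape is not stated in this excerpt), and the names `parzen_density`, `loo_evidence`, and the `toy` samples are hypothetical.

```python
import math

def parzen_density(x, samples, h):
    """Parzen estimate p_n(x) from `samples`, assuming a Gaussian window of width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

def loo_evidence(samples, h):
    """v(h): average log-probability of each held-out sample B under the estimate from A."""
    total = 0.0
    for i, b in enumerate(samples):
        a = samples[:i] + samples[i + 1:]          # A = all samples except B
        total += math.log(parzen_density(b, a, h))  # step (c)
    return total / len(samples)                     # step (d): average over all B

# Hypothetical toy samples standing in for one class of the training data.
toy = [0.1, 0.4, 0.5, 0.9, 1.3, 1.4, 2.0]
curve = {h: loo_evidence(toy, h) for h in (0.05, 0.2, 0.5, 1.0, 3.0)}
best_h = max(curve, key=curve.get)
```

On real data, a very small h can make a held-out density underflow to zero, in which case the logarithm fails; a floor on the density or a log-sum-exp formulation would be one way to guard against that.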
For each class, plot the evidence curve v(h). Pick three values of h from different parts of the curve and plot the estimated density using each. What would your intuition say is the best h? How well does the evidence curve match your intuition? Now design a minimum error rate classifier using the h that maximizes v(h) for each class. (Remember, minimum error rate means choosing the class whose posterior is greatest. With equal priors, simply choose the class with the greater likelihood - right?) What is its performance on the test data?
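The decision rule described above, with equal priors, reduces to comparing the two class-conditional density estimates. A minimal Python sketch, again assuming a Gaussian window; the function names, the toy training sets, and the chosen widths are all hypothetical:

```python
import math

def parzen_density(x, samples, h):
    """Parzen estimate from `samples`, assuming a Gaussian window of width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

def classify(x, train1, h1, train2, h2):
    """Return 1 or 2: with equal priors, the class with the larger likelihood wins."""
    p1 = parzen_density(x, train1, h1)
    p2 = parzen_density(x, train2, h2)
    return 1 if p1 >= p2 else 2

def accuracy(test_points, test_labels, train1, h1, train2, h2):
    """Fraction of test points whose predicted class matches the label."""
    hits = sum(classify(x, train1, h1, train2, h2) == y
               for x, y in zip(test_points, test_labels))
    return hits / len(test_points)

# Hypothetical toy data; h1 and h2 would come from maximizing v(h) per class.
train1 = [0.0, 0.2, 0.5, 0.7]
train2 = [2.0, 2.3, 2.6, 3.0]
acc = accuracy([0.3, 2.5, 1.0], [1, 2, 1], train1, 0.5, train2, 0.5)
```

Each class keeps its own window width, since the problem asks for the h that maximizes v(h) separately per class.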
(a) Break the training data into two parts A and B, where B contains only a single sample.
(b) Determine if B is classified correctly by its k-nearest neighbors from A.
(c) Iterate back to part (a) and break the training data up in a different way. Do this for every possible choice of B. The number of correct classifications in part (b) is defined to be v(k).
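The leave-one-out count v(k) can be sketched in a few lines. This is an illustrative Python version under stated assumptions: scalar features, two classes, odd k (so votes cannot tie), and hypothetical names and toy data.

```python
def knn_predict(x, points, labels, k):
    """Majority vote among the k training points nearest to x (odd k, two classes)."""
    order = sorted(range(len(points)), key=lambda i: abs(points[i] - x))
    votes = [labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def loo_correct(points, labels, k):
    """v(k): number of held-out samples B classified correctly by neighbors from A."""
    correct = 0
    for i in range(len(points)):
        a_points = points[:i] + points[i + 1:]   # A = everything except B
        a_labels = labels[:i] + labels[i + 1:]
        if knn_predict(points[i], a_points, a_labels, k) == labels[i]:
            correct += 1
    return correct

# Hypothetical toy data with two overlapping samples near the class boundary.
points = [0.0, 0.2, 0.5, 0.7, 1.1, 1.6, 2.0, 2.3, 2.6, 3.0]
labels = [1, 1, 1, 1, 2, 1, 2, 2, 2, 2]
evidence = {k: loo_correct(points, labels, k) for k in range(1, 8, 2)}
```

For the assignment's range, the dictionary comprehension would run over `range(1, 20, 2)`; the toy set here is too small for that to be meaningful.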
Plot the evidence curve for odd values of k from 1 to 19. Pick three values of k from different parts of the curve and plot the resulting discriminant “function” over the scalar feature space (this function should be 1 when choosing class 1 and -1 otherwise). What would your intuition say is the best k? Design a minimum error rate classifier using the k that maximizes v(k). What is its performance on the test data?
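One way to tabulate the discriminant over the feature axis before plotting is shown below. It is a sketch only: it assumes the convention +1 for class 1 and -1 for class 2, and the function name, toy data, and grid are hypothetical.

```python
def knn_discriminant(x, points, labels, k):
    """+1 if the k nearest training points vote for class 1, otherwise -1."""
    nearest = sorted(zip(points, labels), key=lambda pl: abs(pl[0] - x))[:k]
    votes_for_1 = sum(1 for _, label in nearest if label == 1)
    return 1 if 2 * votes_for_1 > k else -1

# Hypothetical, well-separated toy data.
points = [0.0, 0.2, 0.5, 0.7, 2.0, 2.3, 2.6, 3.0]
labels = [1, 1, 1, 1, 2, 2, 2, 2]

grid = [i * 0.1 for i in range(-5, 36)]            # feature axis from -0.5 to 3.5
g = [knn_discriminant(x, points, labels, 3) for x in grid]

# Grid positions where the decision flips, i.e. approximate decision boundaries.
boundaries = [grid[i] for i in range(1, len(g)) if g[i] != g[i - 1]]
```

Plotting `g` against `grid` as a step function shows how the number of decision regions changes with k: a small k tends to give more flips on noisy data than a large one.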
(a) Following the method given in the handout from Duda & Hart, use the sample data to construct S_W and S_B. To make a projection matrix W, you'll need to find generalized eigenvectors and eigenvalues. Use Matlab's function [aa,bb,q,z,v] = qz(a,b). Some hints about qz are in order: the i-th eigenvector appears as column i in v, and the corresponding eigenvalue is the ratio aa(i,i)/bb(i,i). Due to tiny numerical errors, Matlab may return some complex numbers. If imag(z) << real(z), just use the real part. Now that you have a projection matrix W, project the 30D data down to a (c - 1) = 2D space, and make a scatter plot of that 2D data, with a different symbol for each class.
(b) Assume the 2D data is Gaussian and use it to compute individual covariance matrices K_i for each class. Now use them to make a quadratic classifier for the classes; we want to see the algebraic form of each classifier with numerical values for K_i and m_i. Show how well the classifier handles the input data sets. How many samples does it misclassify?
(c) Finally, assume all covariances are the same, K = K_i, and make a linear classifier (boy, this goes way back in the class). Again, show the algebraic form and how well it works. How much worse is the linear classifier?
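For reference, the quantities in parts (a) to (c) have the following standard forms. The handout itself is not reproduced here, so this assumes it uses the usual multiple-discriminant definitions. In the first two lines, m_i and m are the class means and overall mean of the original 30D data, D_i is the set of n_i samples in class i, and the columns of W are the eigenvectors w with the largest eigenvalues. In the last two lines, m_i and K_i are the mean and covariance of the projected 2D data y = W^T x, and P(\omega_i) is the class prior.

```latex
\begin{align*}
S_W &= \sum_{i=1}^{c} \sum_{x \in D_i} (x - m_i)(x - m_i)^T,
\qquad
S_B = \sum_{i=1}^{c} n_i \,(m_i - m)(m_i - m)^T \\
S_B \, w &= \lambda \, S_W \, w \\
g_i(y) &= -\tfrac{1}{2}\,(y - m_i)^T K_i^{-1} (y - m_i)
          - \tfrac{1}{2} \ln \lvert K_i \rvert + \ln P(\omega_i)
          && \text{(quadratic, part (b))} \\
g_i(y) &= m_i^T K^{-1} y - \tfrac{1}{2}\, m_i^T K^{-1} m_i + \ln P(\omega_i)
          && \text{(linear, part (c))}
\end{align*}
```

In both classifiers a sample y is assigned to the class with the largest g_i(y). With the generalized eigenproblem written this way, the call would presumably be qz(S_B, S_W), though the argument order should be checked against the handout.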
Scenario 1 A cattle rancher wants to have a non-invasive system of recognizing cows by the patterns on their backs. He has a few hundred cows, and they will walk under an archway where a photo is taken of their back from a camera mounted above. You are given only