
CMPS 242 Second Homework, Winter 2009
3 Problems, due at the start of class Tuesday, Feb. 3

This homework is to be done in groups of 2-3. Each group member should completely understand the group's solutions and must acknowledge all sources of inspiration, techniques, and/or helpful ideas (web, people, books, etc.) other than the instructor and the class text.

1. Construct a small data set with boolean features where the greedy (univariate) decision tree procedure using the impurity heuristic (Equation 9.8 in Alpaydin) fails to find a smallest decision tree. Show the optimal tree as well as the tree resulting from the greedy top-down construction procedure. (Hint: my solution has three features, and the optimal tree uses two of them. You may repeat examples in the training set.)

2. Perceptron algorithm. Implement the Perceptron algorithm presented in class in 2 dimensions and perform the following experiments, where the concept C is defined by C(x) = +1 if x1 + x2 > 0 and C(x) = −1 otherwise.

Experiment 1: Generate a 10-example training set by picking points uniformly at random from the unit circle and generating labels (y-values) according to C. Calculate the gap of the best homogeneous separating line (this gap is the distance between the separating line and the closest example; the best separating line is not likely to be C's decision boundary). Run the Perceptron algorithm and note how many "mistakes" it makes before finding a consistent hypothesis, and how many iterations through the data are required before it finds a consistent hypothesis. Perform Experiment 1 ten times and sort the runs by the gap. Do you see a relationship between the gaps and the number of iterations or the number of mistakes made?

Experiment 2: Generate a 100-example training set by picking points uniformly at random from the unit circle, with noisy labels. For each example x = (x1, x2) in the training set, generate a random number r in [0, 1]. If x1 + x2 + 2r − 1 > 0, set the label of x to +1; if x1 + x2 + 2r − 1 ≤ 0, set the label of x to −1. Also generate two 100-example test sets, one the same way and one without noise. This produces a noisy version of C where the noise tends to be concentrated around the decision boundary. Run the following version of the perceptron algorithm for 500 iterations, where each iteration uses a random point from the training set (rather than cycling through the training set), and save the weight vector w_i after each iteration i. Compare how well the following prediction rules perform on the test sets.
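
As an illustration of what an implementation of Experiment 1 could look like, here is a minimal Python sketch. It is not the course's reference code: the helper names, the choice to read "unit circle" as the unit disk, and the brute-force sweep over line angles used to estimate the gap of the best homogeneous separating line are all assumptions made for illustration.

```python
import numpy as np

def sample_unit_circle(n, rng):
    """Sample n points uniformly from the unit disk (assumed reading of 'unit circle')."""
    pts = []
    while len(pts) < n:                           # rejection sampling from [-1,1]^2
        p = rng.uniform(-1.0, 1.0, size=2)
        if p @ p <= 1.0:
            pts.append(p)
    return np.array(pts)

def concept_labels(X):
    """Target concept C(x) = +1 if x1 + x2 > 0, else -1."""
    return np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

def best_homogeneous_gap(X, y, n_angles=10000):
    """Approximate the largest margin of any homogeneous separating line
    by sweeping unit normal vectors w = (cos t, sin t)."""
    best = 0.0
    for t in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
        w = np.array([np.cos(t), np.sin(t)])      # unit normal, so |w.x| is a distance
        margin = (y * (X @ w)).min()
        if margin > best:                         # w separates the data; record its gap
            best = margin
    return best

def perceptron(X, y, max_passes=1000):
    """Homogeneous perceptron: cycle through the data until no mistakes are made."""
    w = np.zeros(2)
    mistakes = 0
    for passes in range(1, max_passes + 1):
        clean = True
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:              # mistake: additive update
                w += y_i * x_i
                mistakes += 1
                clean = False
        if clean:                                 # consistent hypothesis found
            return w, mistakes, passes
    return w, mistakes, max_passes

rng = np.random.default_rng(0)
runs = []
for _ in range(10):
    X = sample_unit_circle(10, rng)
    y = concept_labels(X)
    gap = best_homogeneous_gap(X, y)
    _, mistakes, passes = perceptron(X, y)
    runs.append((gap, mistakes, passes))
for gap, mistakes, passes in sorted(runs):        # sorted by gap, as the problem asks
    print(f"gap={gap:.3f}  mistakes={mistakes}  passes={passes}")
```

Sorting the ten runs by gap makes the relationship the problem asks about easy to inspect; the perceptron mistake bound of (R/γ)² for data of radius R and margin γ suggests that runs with larger gaps should need fewer mistakes and fewer passes.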
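The Experiment 2 loop differs only in the noisy data generation and in drawing one random training point per iteration. The sketch below is again an assumption rather than the handout's solution; it repeats the small helpers from the previous sketch so it runs on its own. The specific prediction rules to be compared fall in the portion of the handout cut off in this preview, so the code only records the weight-vector history w_1, ..., w_500 and shows how one such rule could be scored on the two test sets.

```python
import numpy as np

def sample_unit_circle(n, rng):
    """Sample n points uniformly from the unit disk (same helper as above)."""
    pts = []
    while len(pts) < n:
        p = rng.uniform(-1.0, 1.0, size=2)
        if p @ p <= 1.0:
            pts.append(p)
    return np.array(pts)

def concept_labels(X):
    """Noise-free target concept C(x) = +1 if x1 + x2 > 0, else -1."""
    return np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

def noisy_labels(X, rng):
    """Label is +1 if x1 + x2 + 2r - 1 > 0 with r uniform in [0,1], else -1,
    so label noise is concentrated near the boundary x1 + x2 = 0."""
    r = rng.uniform(0.0, 1.0, size=len(X))
    return np.where(X[:, 0] + X[:, 1] + 2 * r - 1 > 0, 1, -1)

def test_error(w, X, y):
    """Fraction of examples misclassified by the linear rule sign(w . x)."""
    return np.mean(np.where(X @ w > 0, 1, -1) != y)

rng = np.random.default_rng(1)
X_train = sample_unit_circle(100, rng)
y_train = noisy_labels(X_train, rng)
X_test_noisy = sample_unit_circle(100, rng)       # noisy test set, generated the same way
y_test_noisy = noisy_labels(X_test_noisy, rng)
X_test_clean = sample_unit_circle(100, rng)       # noise-free test set labeled by C
y_test_clean = concept_labels(X_test_clean)

# 500 iterations, each using one randomly chosen training point; save w_i after each.
w = np.zeros(2)
history = []
for i in range(500):
    j = rng.integers(len(X_train))
    x_j, y_j = X_train[j], y_train[j]
    if y_j * (w @ x_j) <= 0:                      # standard perceptron update on a mistake
        w = w + y_j * x_j
    history.append(w.copy())

# The saved history can then be scored with whichever prediction rules the handout
# goes on to list (that part is cut off here); as one illustration, score the final
# weight vector on both test sets.
print("final w, noisy test error:", test_error(history[-1], X_test_noisy, y_test_noisy))
print("final w, clean test error:", test_error(history[-1], X_test_clean, y_test_clean))
```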