



Material Type: Assignment; Class: MACHINE LEARNING AND DATA MINING; Subject: Computer Science; University: Oregon State University; Term: Unknown 1989;
Typology: Assignments




In part I, you will use WEKA to analyze the two artificial data sets we generated and one real data set. You will apply the learning algorithms we learned to each data set and compare their performance.
br data files:
  br-test.arff      br test data file
  br-train.arff     br training data file

hw2-1 data files:
  hw2-1-10.arff     10 training examples
  hw2-1-20.arff     20 training examples
  hw2-1-50.arff     50 training examples
  hw2-1-100.arff    100 training examples
  hw2-1-200.arff    200 training examples
  hw2-1-test.arff   test data file

hw2-2 data files:
  hw2-2-25.arff     25 training examples
  hw2-2-50.arff     50 training examples
  hw2-2-100.arff    100 training examples
  hw2-2-200.arff    200 training examples
  hw2-2-600.arff    600 training examples
  hw2-2-test.arff   test data file
In case you are curious, here is how we generated the two synthetic data sets. The data set hw2-1 is generated from two Gaussian distributions, one centered at (1,0) and the other at (0,1). Both have the same covariance matrix:
  [ 2 0 ]
  [ 0 1 ]
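As a rough illustration of the hw2-1 description, the sketch below draws labeled points from the two Gaussians using only the Python standard library. It is not the script used to produce the actual files. The function name `sample_hw2_1`, the equal class priors, and the choice of which center is "positive" are assumptions made for the example.

```python
import random

def sample_hw2_1(n, seed=0):
    """Draw n labeled points from two axis-aligned Gaussians.

    Assumptions (not stated in the assignment): the two classes are
    equally likely, and "positive" is the class centered at (1, 0).
    """
    rng = random.Random(seed)
    # Diagonal covariance [[2, 0], [0, 1]] means the coordinates are
    # independent, with variance 2 along x1 and variance 1 along x2.
    sd_x1, sd_x2 = 2 ** 0.5, 1.0
    centers = {"positive": (1.0, 0.0), "negative": (0.0, 1.0)}
    data = []
    for _ in range(n):
        label = rng.choice(["positive", "negative"])
        m1, m2 = centers[label]
        data.append((rng.gauss(m1, sd_x1), rng.gauss(m2, sd_x2), label))
    return data
```

Because the two classes overlap heavily, no classifier can reach zero test error on data of this kind.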
hw2-2 is generated as follows. The x coordinate is generated from an exponential distribution with parameter 1.0. The y coordinate is generated from a uniform random distribution in the interval [0,1]. The class is assigned as follows. If (x > 0.5), the example belongs to the positive class, otherwise to the negative class. However, the class label is flipped with probability 0.1 (so-called "10% label noise").
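The hw2-2 recipe can be sketched in a few lines as well. This is only an illustration of the stated procedure, not the original generator. The function name `sample_hw2_2` and the `noise` and `seed` parameters are hypothetical.

```python
import random

def sample_hw2_2(n, noise=0.1, seed=0):
    """Draw n points (x, y, label) following the hw2-2 description."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.expovariate(1.0)      # exponential with parameter 1.0
        y = rng.uniform(0.0, 1.0)     # uniform on [0, 1], unrelated to the class
        positive = x > 0.5            # noise-free class rule
        if rng.random() < noise:      # flip the label with probability `noise`
            positive = not positive
        data.append((x, y, "positive" if positive else "negative"))
    return data
```

Note that y carries no information about the class, and that with 10% label noise even the true rule x > 0.5 is expected to err on about 10% of test examples.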
br is a handwritten-letter data set that contains the letters b and r. Each example is described by 16 attributes corresponding to the 16 pixels of a 4 by 4 image.
You will run the learning algorithms on each training data file and evaluate the results on the corresponding test data files.
         N     Method1   Method2   Method3
hw2-1:   10    xxx       yyy       zzz
         20    xxx       yyy       zzz
         50    xxx       yyy       zzz
         100   xxx       yyy       zzz
         200   xxx       yyy       zzz
hw2-2:   25    xxx       yyy       zzz
         50    xxx       yyy       zzz
         100   xxx       yyy       zzz
         200   xxx       yyy       zzz
         600   xxx       yyy       zzz
br:      614   xxx       yyy       zzz
Where xxx, yyy, zzz give the error rates of each method on the test data. (Use “Supplied test set” for “Test Option” in the classify tab)
x1 <= 1.0: positive (75.0/17.0)
x1 > 1.0
|   x2 <= 5.0: negative (42.0/12.0)
|   x2 > 5.0: positive (33.0/10.0)
The first line indicates a split on feature x1 with threshold 1.0. The first branch leads to a leaf labeled "positive". The numbers in parentheses indicate that this leaf was reached by 75 training examples, 17 of which do not belong to the leaf's class.
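One way to read the example tree is as nested conditionals. The sketch below assumes the usual WEKA J48 convention that a leaf's pair (n/e) gives the number of training examples reaching the leaf and the number of those the leaf misclassifies. The names `predict`, `LEAVES` and `training_error` are illustrative only.

```python
def predict(x1, x2):
    """Classify one example with the three-leaf example tree."""
    if x1 <= 1.0:
        return "positive"
    if x2 <= 5.0:
        return "negative"
    return "positive"

# (examples reaching the leaf, examples misclassified at the leaf),
# copied from the numbers in parentheses in the example output.
LEAVES = [(75.0, 17.0), (42.0, 12.0), (33.0, 10.0)]

def training_error(leaves):
    """Fraction of training examples misclassified, summed over leaves."""
    total = sum(n for n, _ in leaves)
    wrong = sum(e for _, e in leaves)
    return wrong / total
```

Under that reading, the three leaves cover 150 training examples and misclassify 39 of them, so the training error of this tree is 0.26. This is a training error, not the test error asked for in the table.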
Once we have chosen an algorithm, it will be listed next to the "Choose" button with its default parameter settings. To change these settings, click on the algorithm name; you will be given an interface for modifying the parameters. Click the "More" button to get more information about the parameters. After setting the parameters, click OK. Now we are ready to run the algorithm. Make sure you have selected the right test option, then click the "Start" button; the Classifier Output window will show the output from the classifier. This output consists of several sections:
Probability
1. (10pts) We have two identical bags. Bag A contains 4 red marbles and 6 black marbles, and bag B contains 5 red marbles and 5 black marbles. We choose a bag at random and draw a marble from it, and the marble turns out to be black. What is the probability that the chosen bag is bag A?
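Questions of this kind are applications of Bayes' rule: the posterior of each bag is proportional to its prior times the probability of the observed color under that bag. The sketch below is one way to set up the calculation with exact fractions. The helper `posterior` is a hypothetical name, and the priors of 1/2 assume that "at random" means both bags are equally likely.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' rule over a finite set of hypotheses.

    priors[h]      = P(h)
    likelihoods[h] = P(observation | h)
    Returns P(h | observation) for every hypothesis h.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())          # P(observation)
    return {h: joint[h] / evidence for h in joint}

# Assumption: each bag is chosen with probability 1/2.
priors = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
# Probability of drawing a black marble from each bag.
p_black = {"A": Fraction(6, 10), "B": Fraction(5, 10)}

result = posterior(priors, p_black)
```

Using `Fraction` keeps the result exact, which makes it easy to check against a hand derivation.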
Decision tree
The task is to build a decision tree for classifying Y.
(a) Compute the information gain of each of the attributes X, V and W.
(b) Use information gain to select tests and produce the full decision tree generated by the top-down greedy algorithm described in class. (Stopping criterion: stop if all the instances belong to the same class.)
(c) Consider the following two strategies for avoiding overfitting.
    i. The first strategy stops growing the tree when the information gain of the best test is less than a given threshold ε.
    ii. The second strategy grows the full tree first and then prunes it bottom-up: start from the lowest level of the tree and prune a sub-tree if the information gain of its test is less than a given threshold ε. (Note that you should stop checking level t if none of the sub-trees at level t+1 satisfies the pruning criterion.)
    Let ε be 0.001 in both cases. Write down the resulting tree for each strategy and compare their training errors.
(d) Discuss the advantages and disadvantages of each of these two strategies.
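For part (a), information gain is the entropy of the class labels minus the weighted average entropy of the labels after splitting on an attribute. The sketch below shows that computation. The rows in `ROWS` are a made-up toy example, not the data table from this assignment, and the function names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, attribute, target="Y"):
    """Entropy of the target minus its expected entropy after the split."""
    labels = [r[target] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data: Y is determined by X and unrelated to V.
ROWS = [
    {"X": 0, "V": 0, "Y": 0},
    {"X": 0, "V": 1, "Y": 0},
    {"X": 1, "V": 0, "Y": 1},
    {"X": 1, "V": 1, "Y": 1},
]
```

On this toy data, splitting on X removes all uncertainty about Y (gain of 1 bit), while splitting on V removes none (gain of 0). The greedy algorithm in part (b) would therefore test X first.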