Assignment I Questions - Machine Learning and Data Mining | CS 434, Assignments of Computer Science

Material Type: Assignment; Class: MACHINE LEARNING AND DATA MINING; Subject: Computer Science; University: Oregon State University; Term: Unknown 1989;

CS 434 Assignment 1
Due: Monday Oct 13th in class
Part I. (14pts)
Throughout the course, Weka will be a very useful tool for you to explore different
machine learning and data mining algorithms. The purpose of this part of the assignment
is to familiarize you with this software package. For this assignment, please download
and install the Weka software package (version 3.4) from
http://www.cs.waikato.ac.nz/ml/weka/
If you don’t have access to a computer for this purpose, please inform the instructor so
that special arrangements can be made to accommodate your needs.
Basic information on this software package can be found in the following tutorial:
http://easynews.dl.sourceforge.net/sourceforge/weka/ExplorerGuide-3.4.pdf
Note that the Weka webpage offers many documents introducing different aspects of Weka.
For example, another useful one is the following document, which describes the input
format for Weka, ARFF (Attribute-Relation File Format):
http://www.cs.waikato.ac.nz/~ml/weka/arff.html
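
To make the ARFF layout concrete, here is a minimal Python sketch that reads a small
ARFF-style string. The sample file, its attribute names, and the `parse_arff` helper are
all hypothetical and exist only for illustration; the sketch handles simple dense files
only (no quoted values, no sparse rows) and is not a replacement for Weka's own loader.

```python
# Hypothetical toy file laid out like an ARFF document (not iris.arff itself).
SAMPLE_ARFF = """\
% Lines starting with a percent sign are comments
@relation toy-flowers

@attribute petal_length numeric
@attribute petal_width numeric
@attribute species {small, large}

@data
1.4,0.2,small
4.7,1.4,large
"""


def parse_arff(text):
    """Return (relation, attributes, rows) from a simple, dense ARFF string."""
    relation = None
    attributes = []  # list of (name, type_string) pairs, in file order
    rows = []        # each row is a list of raw string fields
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, type_string = line.split(None, 2)
            attributes.append((name, type_string))
        elif lower.startswith("@data"):
            in_data = True  # everything after this line is a data row
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation, [name for name, _ in attributes], len(rows))
```

The header declares one attribute per line, and each data row lists its values in the
same order, with the class conventionally placed last.
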
Please use Weka to explore the “iris” data set that comes with the software. To open this
data set, choose “Explorer” from the Weka GUI Chooser, which opens a panel with
several tabs. Select the “Preprocess” tab and click “Open file”, then open the “data”
folder and choose the “iris.arff” file.
With the help of the Weka software, answer the following questions:
1. How many classes are there in this data set? (2pts)
2. How many (non-class) attributes are there? (2pts)
3. What are the mean and standard deviation for each attribute? (2pts)
4. If you were to choose only one attribute to build your classifier, which attribute
should you choose? (4pts)
5. Which pair of attributes provides the best discrimination among classes? (4pts)
(Suggestion: use the visualization tool to look for good separations among classes.)
Note that questions 4 and 5 are subjective; please give reasons for your answers.
Reasons can be described in words and/or shown through figures.
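
The quantities asked about above (class count, per-attribute mean and standard deviation,
and how well one attribute separates the classes) can also be sketched in a few lines of
Python. The rows, attribute names, and helper functions below are hypothetical toy values,
not the iris data, and the sketch uses the sample standard deviation (n - 1 denominator),
which is an assumption; compare against the numbers Weka reports before relying on it.

```python
from statistics import mean, stdev

# Hypothetical toy rows: two numeric attributes followed by a class label.
ATTRIBUTE_NAMES = ["length", "width"]
ROWS = [
    (1.0, 2.0, "a"),
    (2.0, 2.0, "a"),
    (3.0, 4.0, "b"),
    (4.0, 4.0, "b"),
]


def summarize(rows, names):
    """Return the sorted class labels and a (mean, sample stdev) pair per attribute."""
    classes = sorted({row[-1] for row in rows})
    stats = {}
    for i, name in enumerate(names):
        column = [row[i] for row in rows]
        stats[name] = (mean(column), stdev(column))
    return classes, stats


def per_class_ranges(rows, index):
    """Return {class: (min, max)} for one attribute, to eyeball class overlap."""
    ranges = {}
    for row in rows:
        value, label = row[index], row[-1]
        low, high = ranges.get(label, (value, value))
        ranges[label] = (min(low, value), max(high, value))
    return ranges


classes, stats = summarize(ROWS, ATTRIBUTE_NAMES)
print(len(classes), "classes;", len(ATTRIBUTE_NAMES), "non-class attributes")
for name, (m, s) in stats.items():
    print(f"{name}: mean={m:.3f} stdev={s:.3f}")
print(per_class_ranges(ROWS, 0))
```

Per-class ranges that do not overlap for an attribute suggest it separates the classes
well on its own; heavily overlapping ranges suggest the opposite. This is only one rough
heuristic, and the visualization tool remains the suggested way to justify an answer.
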


Part II

  1. Below is a set of 2-d data points, with black dots representing the positive class and red dots representing the negative class. The blue line segments show the Voronoi diagram of these points. (14pts)
     a. What is the training error of 1-nearest neighbor? (2pts)
     b. What is the training error of 3-nearest neighbor? (4pts)
     c. Please mark the 1-nearest neighbor decision boundary for this data set, which should be a subset of the blue line segments. (4pts)
     d. Now consider 3-nearest neighbor. True or false: the decision boundary of 3-NN is also formed by a subset of these blue line segments, but a different subset from the answer to (c). Explain your answer. (4pts)
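
As a rough illustration of what "training error of k-nearest neighbor" means, the sketch
below classifies every training point using the training set itself and counts the
mistakes. The points are hypothetical and are NOT the ones in the assignment figure. The
sketch also assumes that each point counts as its own nearest neighbor (distance zero)
and that votes are decided by simple majority; both are assumptions, not something the
assignment states.

```python
import math
from collections import Counter

# Hypothetical 2-d points (x, y, label); not the points from the figure.
POINTS = [
    (0.0, 0.0, "+"),
    (1.0, 0.0, "+"),
    (0.0, 1.0, "+"),
    (5.0, 5.0, "-"),
    (6.0, 5.0, "-"),
    (0.5, 0.5, "-"),  # a negative point surrounded by positive points
]


def knn_predict(points, query, k):
    """Predict a label for `query` by majority vote among its k nearest points."""
    qx, qy = query
    by_distance = sorted(points, key=lambda p: math.hypot(p[0] - qx, p[1] - qy))
    votes = Counter(label for _, _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def training_error(points, k):
    """Fraction of training points misclassified when each is also a query."""
    mistakes = sum(
        1 for x, y, label in points if knn_predict(points, (x, y), k) != label
    )
    return mistakes / len(points)


for k in (1, 3):
    print(f"k={k}: training error = {training_error(POINTS, k):.3f}")
```

On this toy set every point is its own nearest neighbor, so 1-NN makes no training
mistakes, while 3-NN outvotes the isolated negative point with its two positive
neighbors. The same procedure can be applied by hand to the points in the figure.
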