Design Assignment-Artificial Intelligence-Project Report, Exercises of Artificial Intelligence

This is a project report for an Artificial Intelligence course. It was supervised by Madam Amrita Ahuja at Central University of Jammu and Kashmir. Its main points are: Data, Sets, Experiments, Array, Algorithm, Write-Up, Graphs, Tables, Grading, Classifier, Organization.

Typology: Exercises

2011/2012

Uploaded on 07/31/2012

shaina_44kin
6.034 Design Assignment 2
April 5, 2005
Weka Script Due: Friday April 8, in recitation
Paper Due: Wednesday April 13, in class
Oral reports: Friday April 15, by appointment
The goal of this assignment is for you to gain some practice in the application of machine learning
algorithms to real data. We give you two data sets and a framework that will allow you to experiment with
different learning algorithms on that data.
1 Data Sets
We ask you to build classifiers for two data sets:
1. credit-g-500.arff: This is a two-class data set related to credit rating in Germany. Information on
the data set can be found at the top of the file.
2. digits-2-4-5-9.arff: This is a collection of 14x14 binary images of hand-written digits (2,4,5,9); see
figure. There are 250 samples of each digit. Each image is converted into a feature vector by listing
the content of the array in row-major order.
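
As a minimal sketch of what row-major order means here, the following Python flattens a small binary image row by row. The function name `flatten_row_major` and the toy 3x3 image are illustrative assumptions and are not taken from the assignment's data files; applied to a 14x14 image, the same procedure would give 196 features.

```python
def flatten_row_major(image):
    """Concatenate the rows of a 2-D list, top row first, into one flat list."""
    return [pixel for row in image for pixel in row]


# Toy 3x3 binary image (hypothetical; the real images are 14x14).
toy = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
features = flatten_row_major(toy)
print(features)  # [0, 1, 0, 1, 1, 0, 0, 0, 1]

# Pixel (row r, column c) of a width-w image lands at index r * w + c.
r, c, w = 1, 0, 3
print(features[r * w + c] == toy[r][c])  # True
```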
2 Experiments
We would like you to find an effective learning algorithm for each of these data sets. In order to do so, you
should think about the general strengths and weaknesses of the different learning algorithms, as well as
experiment with them on the data.
Using the algorithm you choose, generate a classifier, and a prediction of how well it will perform on new
data. We will run that classifier on some additional data and compare its performance to your predicted
performance.
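
One possible way to arrive at such a prediction is to hold out some labelled data, measure accuracy on it, and attach a rough error bar. The sketch below uses the normal approximation to the binomial; the function name `accuracy_interval` and the example counts are assumptions made for illustration, not results from either data set, and this is not a method the assignment prescribes.

```python
import math


def accuracy_interval(correct, total, z=1.96):
    """Return (accuracy, low, high) using the normal approximation
    to the binomial; z=1.96 corresponds to roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    acc = correct / total
    half_width = z * math.sqrt(acc * (1.0 - acc) / total)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


# Hypothetical numbers: 140 of 170 held-out instances classified correctly.
acc, low, high = accuracy_interval(140, 170)
print(f"accuracy={acc:.3f}, approx. 95% interval=({low:.3f}, {high:.3f})")
```

The approximation becomes unreliable when the held-out set is small or the accuracy is close to 0 or 1, so the interval is best read as a rough guide.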
3 Write-up
In your write-up, describe the process by which you developed your classifiers. You should answer the
following questions, in detail, including supporting data in graphs or tables.
  • What algorithms are generally expected to be appropriate for these data sets?
  • How did you choose among the different algorithms? Report your chosen algorithm, as well as at least three others that you tried.
  • How did you choose parameter settings for each algorithm? Report the parameters that gave you the best results.
  • How did you come up with a prediction for how well the classifiers you delivered would perform on previously unseen data? Report your prediction.
  • Compare the best performance you got on each data set with the performance if you had picked the class (a) at random (unbiased coin flip) or (b) by always predicting the most prevalent class in the training data.
  • What classifier would you use in the credit data if it were twice as expensive to say that a person with bad credit was going to have good credit, as to say that a person with good credit would have bad credit?
  • What two attributes seem to be most relevant in each data set? Or is it the case that they’re all just about equally significant? Explain how you determined this, and why you think you obtained the answer you did.
  • In the multiple-class digits problem, which two digits are most frequently confused by your classifier? Does that make sense to you?
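
Several of the questions above reduce to small calculations. The sketch below shows one way to compute the two baselines, a cost-aware decision for the credit data, and the most-confused pair of digits. All function names are hypothetical, the decision rule assumes your classifier outputs a probability of "good" credit, and the confusion counts are invented for illustration; none of this is output from Weka.

```python
from collections import Counter


def baseline_accuracies(labels):
    """Expected accuracy of (a) a uniform random guess over the classes
    and (b) always predicting the most prevalent class."""
    counts = Counter(labels)
    random_guess = 1.0 / len(counts)
    majority = max(counts.values()) / len(labels)
    return random_guess, majority


def predict_with_costs(p_good, cost_false_good=2.0, cost_false_bad=1.0):
    """Pick the label with the lower expected cost.
    cost_false_good: cost of calling a bad-credit person 'good'.
    cost_false_bad:  cost of calling a good-credit person 'bad'."""
    expected_cost_good = (1.0 - p_good) * cost_false_good
    expected_cost_bad = p_good * cost_false_bad
    return "good" if expected_cost_good < expected_cost_bad else "bad"


def most_confused_pair(confusion):
    """confusion[a][b] = number of instances of true class a predicted as b.
    Returns the unordered pair with the most mutual errors and that count."""
    best_pair, best_count = None, -1
    classes = sorted(confusion)
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            count = confusion[a].get(b, 0) + confusion[b].get(a, 0)
            if count > best_count:
                best_pair, best_count = (a, b), count
    return best_pair, best_count


# Digits data: 250 samples of each of four digits (from the handout).
digit_labels = ["2"] * 250 + ["4"] * 250 + ["5"] * 250 + ["9"] * 250
print(baseline_accuracies(digit_labels))  # (0.25, 0.25)

# With a 2:1 cost ratio, 'good' is predicted only when P(good) exceeds 2/3.
print(predict_with_costs(0.6))  # bad
print(predict_with_costs(0.8))  # good

# Invented confusion counts, for illustration only.
confusion = {
    "2": {"2": 80, "4": 1, "5": 3, "9": 1},
    "4": {"2": 1, "4": 75, "5": 1, "9": 8},
    "5": {"2": 2, "4": 1, "5": 81, "9": 1},
    "9": {"2": 0, "4": 6, "5": 2, "9": 77},
}
print(most_confused_pair(confusion))  # (('4', '9'), 14)
```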

4 Grading

There will be a late penalty of 20% per day assessed, with no credit given for assignments turned in after the oral report. Grading will be broken down as follows:

30: Good plan for choosing and validating algorithm, parameters, and classifiers
15: How effective are the classifiers on new data
10: How good is the supplied performance prediction
5: Completing the Weka script given at the end of this handout
20: Clarity and organization of written report
20: Clarity and understanding in oral report

5 Software

We ask you to use the Weka environment for machine learning. You can download the software from:

http://www.cs.waikato.ac.nz/~ml/weka/

The software is written in Java and should run under Windows, Linux and Mac. A word of warning: Weka will often run out of memory and need to be restarted, so save results as you go. Within this system, you can find the major algorithms that we’ve studied:

  • K Nearest Neighbor (called IBk in Weka)
  • Decision Trees (called J48 in Weka)
  • Naive Bayes (called Naive Bayes in Weka)
  • SVM (called SMO in Weka)

  1. Click Classify Tab
  2. Click Choose in Classifier Pane, under Trees, pick J48 (which is Decision Tree)
  3. Click the Percentage Split button under Test Options (this holds out 1/3 of the data for validation; you could instead do cross-validation)
  4. Click Start (always make sure it says Class right above the Start button)
  5. Report the line "Correctly Classified Instances" in the Classifier Output window
  6. Click on the Classifier pane (where it says "J48 -C ..."). In the dialog window, change the value of MinNumObj to 1
  7. Click OK
  8. Click Start
  9. Report the line "Correctly Classified Instances" in the Classifier Output window
  10. Click Choose in the Classifier Pane, under Bayes, pick NaiveBayes
  11. Click Start
  12. Report the line "Correctly Classified Instances" in the Classifier Output window
  13. Click Choose in the Classifier Pane, under functions, pick SMO (which is an SVM)
  14. Click Start
  15. Report the line "Correctly Classified Instances" in the Classifier Output window
  16. Click on the Classifier pane (where it says "SMO -C ..."). In the dialog window, change the value of gamma to 0.1 and useRBF to True
  17. Click OK
  18. Click Start
  19. Report the line "Correctly Classified Instances" in the Classifier Output window
  20. Click Choose in the Classifier Pane, under lazy, pick IBk (which is K Nearest Neighbor)
  21. Click Start
  22. Report the line "Correctly Classified Instances" in the Classifier Output window
  23. Click on the Classifier pane (where it says "IBk -K ..."). In the dialog window, change the value of KNN to 3
  24. Click OK
  25. Click Start
  26. Report the line "Correctly Classified Instances" in the Classifier Output window

The procedure above is for a binary (two-class) classification problem. Most of the classifiers in Weka have been extended to handle multiple classes, including Naive Bayes. The SMO (SVM) algorithm is inherently for two classes; if you use it on a data set with more than two classes, Weka will build classifiers for each pair of classes (1-against-1) by default. You can control how a multi-class classifier is built from a binary classifier by using the Weka “Meta” classifier MultiClassClassifier (see below).
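
To make the difference between the two decompositions concrete, the sketch below enumerates the binary problems each scheme would create for the four digit classes. The helper names are hypothetical and the code only counts sub-problems; it does not reproduce how Weka trains or combines the binary classifiers.

```python
from itertools import combinations


def one_against_one_tasks(classes):
    """One binary problem per unordered pair of classes: k * (k - 1) / 2 in total."""
    return list(combinations(classes, 2))


def one_against_all_tasks(classes):
    """One binary problem per class: that class versus all the others, k in total."""
    return [(c, [o for o in classes if o != c]) for c in classes]


digits = ["2", "4", "5", "9"]
print(len(one_against_one_tasks(digits)))  # 6
print(len(one_against_all_tasks(digits)))  # 4
print(one_against_all_tasks(digits)[0])    # ('2', ['4', '5', '9'])
```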

  1. Choose Preprocess Tab
  2. Open File
  3. Click Classify Tab
  4. Click Choose in the Classifier Pane, under meta, pick MultiClassClassifier
  5. Click on the Classifier pane (where it says "MultiClassClassifier -M ..."). In the dialog window, the method pull-down allows you to choose 1-against-all or 1-against-1 (and some other options). Pick 1-against-all.
  6. In the same dialog window, click Choose in the Classifier entry
  7. Under functions, pick SMO
  8. Click OK
  9. Click on the Classifier pane (where it says "MultiClassClassifier -M ..."). In the dialog window, click the Classifier pane (where it says "SMO...")
  10. Change the value of C to 10.
  11. Click OK, Click OK
  12. Click Start (make sure it says Class right above the Start button)
  13. Report the line "Correctly Classified Instances" in the Classifier Output window

Note that you can right-click on each of the entries in the Results list and choose to save the output to a file.

To select a subset of the features (attributes) for the vehicle.arff dataset:

  1. Choose Select attributes Tab
  2. Leave default choices for Attribute Evaluator and Search Method
  3. Click Use full training set button
  4. Click Start
  5. Report the line Selected attributes
  6. Click the Preprocess Tab
  7. On the Attributes, click on the attributes that were NOT selected. Make sure that you DON’T click on the Class attribute.
  8. Click on the Remove button
  9. Repeat the MultiClassClassifier operation we described above.

Here’s an example of comparing multiple algorithms on a dataset:

  1. Go to Weka GUI Chooser window
  2. Click Experimenter
  3. Click Setup Tab at the top of the new window
  4. Click New
  5. Pick a name for an ARFF File under Results Destination
  6. Make sure Experiment Type is Cross-validation and Number of Folds is 10
  7. Click Add new under Datasets
  8. Click Add new under Algorithms
  9. Click Choose, under bayes, pick NaiveBayes, click OK
  10. Click Choose, under trees, pick J48, click OK
  11. Click Choose, under lazy, pick IBk, set KNN to 3, click OK
  12. Click Choose, under functions, pick SMO, set C to 10.0, gamma to 0.1 and useRBF to True, click OK
  13. Click the Run Tab at the top of the window
  14. Click Start (wait till it says Finished)
  15. Click Analyse Tab
  16. Click Experiment button
  17. Click Perform test button
  18. Click Save output
  19. Report the performance of the methods
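
For readers unfamiliar with 10-fold cross-validation, here is a rough sketch of the bookkeeping involved. The names `k_fold_indices` and `cross_validated_accuracy`, and the `evaluate_fold` callback, are assumptions for illustration; the split here is a plain shuffled partition and is not claimed to match how Weka forms its folds.

```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.
    Every instance appears in exactly one test fold."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(n_instances, evaluate_fold, k=10, seed=0):
    """evaluate_fold(train, test) is a caller-supplied (hypothetical) function
    that trains on `train` and returns the number of correct predictions on `test`."""
    correct = 0
    for train, test in k_fold_indices(n_instances, k, seed):
        correct += evaluate_fold(train, test)
    return correct / n_instances


# Dummy evaluator that pretends every test instance is classified correctly.
print(cross_validated_accuracy(1000, lambda train, test: len(test)))  # 1.0
```

Because each instance is tested exactly once, the pooled accuracy is simply the total number of correct predictions divided by the number of instances.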

Visualization

Weka has tools for helping you understand the classifiers you’re learning. If you right-click on a classifier result, you get a menu of options.

  • Visualize tree: (available only for classifiers that build trees) will show the tree in a new window. If you resize the window, and then right-click (or option-click) in the window, you can resize the tree to fit in the window.
  • Visualize classifier errors: correctly classified instances are represented by crosses, errors by squares. You can pick which attributes to use on the X and Y axes.
  • Visualize margin curve: The margin is the difference between the probability predicted for the actual class and the highest probability predicted for the other classes. (So, for a single class, if it is predicted to