



This is a project report for an Artificial Intelligence course, supervised by Madam Amrita Ahuja at Central University of Jammu and Kashmir. Its main points are: Data, Sets, Experiments, Array, Algorithm, WriteUp, Graphs, Tables, Grading, Classifier, Organization




Weka Script Due: Friday April 8, in recitation
Paper Due: Wednesday April 13, in class
Oral reports: Friday April 15, by appointment

The goal of this assignment is for you to gain some practice in applying machine learning algorithms to real data. We give you two data sets and a framework that will allow you to experiment with different learning algorithms on that data.
We ask you to build classifiers for two data sets:
We would like you to find an effective learning algorithm for each of these data sets. To do so, you should think about the general strengths and weaknesses of the different learning algorithms and experiment with them on the data. Using the algorithm you choose, generate a classifier and a prediction of how well it will perform on new data. We will run that classifier on some additional data and compare its performance to your predicted performance.
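One possible way to turn held-out results into a performance prediction is sketched below. It is not prescribed by this handout: it assumes you have counted correct predictions on a held-out set and uses a normal approximation to the binomial to attach a rough 95% interval to the accuracy. The function name `accuracy_interval` and the counts in the usage line are hypothetical.

```python
import math

def accuracy_interval(correct, total, z=1.96):
    """Held-out accuracy with an approximate interval (normal approximation).

    Assumption: test instances are independent draws from the same
    distribution as the new data; z=1.96 corresponds to roughly 95%.
    """
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because the approximation can overshoot near the ends.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical counts, for illustration only.
p, low, high = accuracy_interval(correct=85, total=100)
print(f"predicted accuracy {p:.3f}, approximate 95% interval [{low:.3f}, {high:.3f}]")
```

A wide interval is a sign that the held-out set is too small to support a confident prediction.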
In your writeup, describe the process by which you developed your classifiers. You should answer the following questions in detail, including supporting data in graphs or tables.
A late penalty of 20% per day will be assessed, and no credit will be given for assignments turned in after the oral report. Grading will be broken down as follows:
30: Good plan for choosing and validating algorithm, parameters, and classifiers
15: How effective are the classifiers on new data
10: How good is the supplied performance prediction
5: Completing the Weka script given at the end of this handout
20: Clarity and organization of written report
20: Clarity and understanding in oral report
We ask you to use the Weka environment for machine learning. You can download the software from:
http://www.cs.waikato.ac.nz/~ml/weka/
The software is written in Java and should run under Windows, Linux and Mac. A word of warning: Weka will often run out of memory and need to be restarted, so save results as you go. Within this system, you can find the major algorithms that we’ve studied:
Click the Classify tab
Click Choose in the Classifier pane; under trees, pick J48 (which is a decision tree)
Click the Percentage Split button under Test Options (this holds out 1/3 of the data for validation; you could instead do cross-validation)
Click Start (always make sure it says Class right above the Start button)
Report the line "Correctly Classified Instances" in the Classifier Output window
Click on the Classifier pane (where it says "J48 -C ..."). In the dialog window, change the value of MinNumObj to 1
Click OK
Click Start
Report the line "Correctly Classified Instances" in the Classifier Output window
Click Choose in the Classifier pane; under bayes, pick NaiveBayes
Click Start
Report the line "Correctly Classified Instances" in the Classifier Output window
Click Choose in the Classifier pane; under functions, pick SMO (which is an SVM)
Click Start
Report the line "Correctly Classified Instances" in the Classifier Output window
Click on the Classifier pane (where it says "SMO -C ..."). In the dialog window, change the value of gamma to 0.1 and useRBF to True
Click OK
Click Start
Report the line "Correctly Classified Instances" in the Classifier Output window
Click Choose in the Classifier pane; under lazy, pick IBk (which is k-nearest neighbor)
Click Start
Report the line "Correctly Classified Instances" in the Classifier Output window
Click on the Classifier pane (where it says "IBk -K ..."). In the dialog window, change the value of KNN to 3
Click OK
Click Start
Report the line "Correctly Classified Instances" in the Classifier Output window
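To make the percentage-split and nearest-neighbor steps concrete, here is a minimal Python sketch of the same idea outside Weka. It is an illustration under assumptions, not Weka's implementation: the 66% training fraction is chosen to match the "holds out 1/3" note, the distance is plain squared Euclidean, and the toy data and all function names are hypothetical.

```python
import random

def percentage_split(instances, train_fraction=0.66, seed=1):
    """Shuffle a copy of the data and split it into (train, test)."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train, x, k=3):
    """Majority vote among the k training points nearest to x."""
    def squared_distance(instance):
        features, _ = instance
        return sum((a - b) ** 2 for a, b in zip(features, x))
    nearest = sorted(train, key=squared_distance)[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def correctly_classified(train, test, k=3):
    """Return (count, percentage) of test instances predicted correctly."""
    correct = sum(1 for x, y in test if knn_predict(train, x, k) == y)
    return correct, 100.0 * correct / len(test)

# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
data = [((float(i), float(i % 5)), "a") for i in range(15)]
data += [((float(i) + 100.0, float(i % 5)), "b") for i in range(15)]

train, test = percentage_split(data)
correct, pct = correctly_classified(train, test, k=3)
print(f"Correctly Classified Instances  {correct}  {pct:.1f} %")
```

Changing `k` here plays the same role as changing KNN in the IBk dialog: it controls how many neighbors vote on each prediction.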
The procedure above is for a binary (two-class) classification problem. Most of the classifiers in Weka have been extended to handle multiple classes, including Naive Bayes. The SMO (SVM) algorithm is inherently for two classes; if you use it on a data set with more than two classes, Weka will build classifiers for each pair of classes (1-against-1) by default. You can control how a multiclass classifier is built from a binary classifier by using the Weka "Meta" classifier MultiClassClassifier (see below).
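The 1-against-1 idea can be sketched in a few lines of Python. This is only an illustration of the pairwise scheme, not Weka's code: the class names, the `centres` values, and the toy binary predictors are all hypothetical, and ties are broken arbitrarily by vote order.

```python
from collections import Counter
from itertools import combinations

def pairwise_problems(classes):
    """Every unordered pair of classes; k classes give k*(k-1)/2 pairs."""
    return list(combinations(classes, 2))

def one_vs_one_predict(binary_predictors, x):
    """Each pairwise predictor casts one vote; the most-voted class wins."""
    votes = Counter(predict(x) for predict in binary_predictors.values())
    return votes.most_common(1)[0][0]

# Hypothetical four-class problem on a single numeric feature.
classes = ["c1", "c2", "c3", "c4"]
centres = {"c1": 0.0, "c2": 10.0, "c3": 20.0, "c4": 30.0}

def make_predictor(a, b):
    # Toy binary classifier: pick whichever of the two classes has the nearer centre.
    return lambda x: a if abs(x - centres[a]) <= abs(x - centres[b]) else b

pairs = pairwise_problems(classes)
predictors = {(a, b): make_predictor(a, b) for a, b in pairs}

print(len(pairs), "pairwise classifiers")
print("prediction for x=19.0:", one_vs_one_predict(predictors, 19.0))
```

The number of binary classifiers grows quadratically with the number of classes, which is one reason to consider the other decomposition options that MultiClassClassifier exposes.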
Choose the Preprocess tab
Open File
Click OK, Click OK
Click Start (make sure it says Class right above the Start button)
Report the line "Correctly Classified Instances" in the Classifier Output window
Note that you can right-click on each of the entries in the Result list and choose to save the output to a file.

To select a subset of the features (attributes) for the vehicle.arff dataset:
Choose the Select attributes tab
Leave the default choices for Attribute Evaluator and Search Method
Click the Use full training set button
Click Start
Report the line "Selected attributes"
Click the Preprocess tab
In the Attributes list, click on the attributes that were NOT selected
Make sure that you DON'T click on the Class attribute
Click on the Remove button
Repeat the MultiClassClassifier operation we described above
Here is an example of comparing multiple algorithms on a dataset:
Go to the Weka GUI Chooser window
Click Experimenter
Click the Setup tab at the top of the new window
Click New
Pick a name for an ARFF file under Results Destination
Make sure Experiment Type is Cross-validation and Number of Folds is 10
Click Add new under Datasets
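For intuition about what a 10-fold cross-validation experiment computes, the sketch below splits instance indices into 10 folds and averages a classifier's accuracy over them. It is a simplified illustration, not the Experimenter's implementation: the folds are plain shuffled folds with no stratification, and the majority-class "classifier", the labels, and all function names are hypothetical.

```python
import random
from collections import Counter

def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(n_instances, evaluate, k=10):
    """evaluate(train, test) returns the number of correct test predictions."""
    total_correct = sum(evaluate(train, test)
                        for train, test in k_fold_indices(n_instances, k))
    return total_correct / n_instances

# Hypothetical labels and a majority-class baseline used as the "classifier".
labels = ["a"] * 70 + ["b"] * 30

def majority_baseline(train, test):
    majority = Counter(labels[i] for i in train).most_common(1)[0][0]
    return sum(1 for i in test if labels[i] == majority)

accuracy = cross_validate(len(labels), majority_baseline, k=10)
print(f"10-fold cross-validated accuracy: {accuracy:.2f}")
```

Every instance is used for testing exactly once, so the estimate uses all of the data while never testing a model on instances it was trained on.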
Weka has tools for helping you understand the classifiers you're learning. If you right-click on a classifier result, you get a menu of options.
Visualize tree: (available only for classifiers that build trees) will show the tree in a new window. If you resize the window and then right-click (or option-click) in the window, you can resize the tree to fit in the window.
Visualize classifier errors: correctly classified instances are represented by crosses, errors by squares. You can pick which attributes to use on the X and Y axes.
Visualize margin curve: The margin is the difference between the probability predicted for the actual class and the highest probability predicted for the other classes. (So, for a single class, if it is predicted to