






















































































An in-depth guide to Weka, a workbench for data analysis and predictive modeling. It covers the features, benefits, and usage of Weka, including its portability, comprehensive collection of data preprocessing and modeling techniques, and ease of use. The document also includes exercises on creating datasets, preprocessing techniques, association rule mining, decision tree learning, and Naive Bayes classification. It is particularly useful for information technology students studying data mining and machine learning.























































































(Recognized under 2(f) and 12(B) of UGC Act 1956, Affiliated to JNTUH, Hyderabad, Approved by AICTE, Accredited by NBA & NAAC – ‘A’ Grade, ISO 9001: Certified)
Maisammaguda, Dhulapally (Post Via. Hakimpet), Secunderabad – 500100, Telangana State, India
Vision
skills and attitude to adapt to the global needs of the Information Technology sector, through academic and research excellence.
Mission
the teaching learning pedagogy by using innovative techniques.
motivation towards possession of effective academic skills and relevant research experience.
the betterment of the society.
PROGRAM SPECIFIC OUTCOMES (PSOs)
After the completion of the course, B. Tech Information Technology, the graduates will have the following Program Specific Outcomes:
PROGRAM OUTCOMES (POs)
Engineering Graduates will be able to:
COURSE NAME: DATA WAREHOUSING AND DATA MINING LAB
COURSE CODE: R18A
COURSE OBJECTIVES:
Cloudera, Teradata, Oracle, Tableau
OPEN SOURCE DATA MINING TOOLS
WEKA, Orange, KNIME, R-Programming
Information Technology Page 1
Experiment 1: Installation of WEKA Tool
Aim: A. Investigate the application interfaces of the Weka tool.
Introduction: Weka (pronounced to rhyme with Mecca) is a workbench that contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to these functions. The original non-Java version of Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other programming languages, plus data preprocessing utilities in C and a Makefile-based system for running machine learning experiments. This original version was primarily designed as a tool for analyzing data from agricultural domains, but the more recent, fully Java-based version (Weka 3), for which development started in 1997, is now used in many different application areas, in particular for education and research.
Advantages of Weka include:
▪ Free availability under the GNU General Public License.
▪ Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform.
▪ A comprehensive collection of data preprocessing and modeling techniques.
▪ Ease of use due to its graphical user interfaces.
Description: Open the program. Once the program has been installed on the user's machine, it is opened from the operating system's programs menu; the exact steps depend on the user's operating system. Figure 1.1 is an example of the initial opening screen on a computer. There are four options available on this initial screen:
Fig: 1.1 Weka GUI
1. Explorer - the graphical interface used to conduct experimentation on raw data. After clicking the Explorer button, the Weka Explorer interface appears.
Fig: 1.2 Pre-processor
Inside the Weka Explorer window there are six tabs:
1. Preprocess - used to choose the data file to be used by the application.
Open File - allows the user to select files residing on the local machine or on recorded media.
Open URL - provides a mechanism to locate a file or data source at a different location specified by the user.
Open Database - allows the user to retrieve files or data from a database source provided by the user.
2. Classify - used to train and test different learning schemes on the preprocessed data file under experimentation.
Fig: 1.3 choosing Zero set from classify
Again, there are several options to be selected inside the Classify tab. Test options gives the user the choice of four different test modes on the data set.
4. Associate - used to apply different rules to the data file that identify associations within the data. The Associate tab opens a window to select the options for associations within the dataset.
3. Knowledge Flow - basically the same functionality as Explorer, with drag-and-drop functionality. The advantage of this option is that it supports incremental learning from previous results.
4. Simple CLI - provides users without a graphical interface option the ability to execute commands from a terminal window.
B. Explore the default datasets in the Weka tool.
Click the “Open file…” button to open a data set and double-click on the “data” directory. Weka provides a number of small, common machine learning datasets that you can use to practice on. Select the “iris.arff” file to load the Iris dataset.
Fig: 1.7 Different Data Sets in weka
References:
[1] Witten, I.H. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition. Morgan Kaufmann, San Francisco.
[2] Quinlan, R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA.
[3] CVS: http://weka.sourceforge.net/wiki/index.php/CVS
[4] Weka Doc: http://weka.sourceforge.net/wekadoc/
Exercise:
1. Normalize the data using min-max normalization
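As a starting point for the exercise, min-max normalization rescales each value v of an attribute to v' = (v - min) / (max - min) * (new_max - new_min) + new_min. The following is a minimal Python sketch of that formula, not the manual's own solution: the function name min_max_normalize and the sample values are hypothetical and chosen only for illustration. Within Weka itself, the filter weka.filters.unsupervised.attribute.Normalize (available under the Preprocess tab) is one way to get a comparable rescaling of numeric attributes.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: map them to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Hypothetical sample column (not taken from any Weka dataset).
sample = [2.0, 4.0, 6.0, 10.0]
print(min_max_normalize(sample))  # -> [0.0, 0.25, 0.5, 1.0]
```

The smallest value always maps to new_min and the largest to new_max, so a single outlier can compress all other values into a narrow band; it is worth inspecting the attribute's range in the Preprocess tab before normalizing.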
Record Notes
Experiment 2: Creating a new ARFF file
Aim: Creating a new ARFF file.
An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes. ARFF files were developed by the Machine Learning Project at the Department of Computer Science of The University of Waikato for use with the Weka machine learning software. In WEKA, each data entry is an instance of the Java class weka.core.Instance, and each instance consists of a number of attributes. For loading datasets, WEKA can load ARFF files. An ARFF file has two sections: