Understanding Classification vs. Prediction: Applications - Prof. Abdelghani Bellaachia, Study notes of Computer Science

George Washington University (GW)Computer Science

Prof. Abdelghani Bellaachia

An overview of classification and prediction, two fundamental concepts in machine learning and data mining. It covers the definitions, differences, typical applications, supervised vs. unsupervised learning, and related issues. Topics include document categorization, credit approval, medical diagnosis, treatment effectiveness analysis, and data preparation. The document also discusses classification methods such as decision trees and neural networks, along with attribute selection measures.

Typology: Study notes

Pre 2010

Uploaded on 02/24/2010


A. Bellaachia Page: 1

Classification and Prediction

1. Objectives
2. Classification vs. Prediction
    2.1. Definitions
    2.2. Supervised vs. Unsupervised Learning
    2.3. Classification and Prediction Related Issues
3. Common Test Corpora
4. Classification
5. Decision Tree Induction
    5.1. Decision Tree Induction Algorithm
    5.2. Other Attribute Selection Measures
    5.3. Extracting Classification Rules from Trees
    5.4. Avoid Overfitting in Classification
    5.5. Classification in Large Databases
6. Bayesian Classification
    6.1. Basics
    6.2. Naïve Bayesian Classifier
7. Bayesian Belief Networks
    7.1. Definition
8. Neural Networks: Classification by Backpropagation
    8.1. Neural Network Issues
    8.2. Backpropagation Algorithm
9. Prediction
    9.1. Regression Analysis and Log-Linear Models in Prediction
10. Classification Accuracy: Estimating Error Rates


1. Objectives

Techniques to classify datasets and provide categorical labels, e.g., sports, technology, kids.
Example: {credit history, salary} -> credit approval (Yes/No)
Models to predict certain future behaviors, e.g., who is going to buy PDAs?

Unsupervised learning (clustering): the class labels of the training data are unknown.

Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

2.3. Classification and Prediction Related Issues

Data Preparation
    Data cleaning
        o Preprocess data in order to reduce noise and handle missing values
    Relevance analysis (feature selection)
        o Remove the irrelevant or redundant attributes
    Data transformation
        o Generalize and/or normalize data
Performance Analysis
    Predictive accuracy
        o Ability to classify new or previously unseen data
    Speed and scalability
        o Time to construct the model
        o Time to use the model
    Robustness
        o Model makes correct predictions: handling noise and missing values
    Scalability
        o Efficiency in disk-resident databases
    Interpretability
        o Understanding and insight provided by the model
    Goodness of rules
        o Decision tree size
        o Compactness of classification rules
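
The data preparation steps above can be illustrated with a minimal sketch. It assumes mean imputation for missing values and min-max scaling for normalization; the notes do not prescribe either technique, and the `salaries` attribute and both helper names are hypothetical.

```python
from statistics import mean

def fill_missing(values):
    """Data cleaning: replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical attribute with one missing value.
salaries = [30000, None, 50000, 70000]
cleaned = fill_missing(salaries)
normalized = min_max_normalize(cleaned)
print(cleaned)
print(normalized)
```

Mean imputation is only one option; dropping incomplete samples or using the most frequent value are common alternatives.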

3. Common Test Corpora

Reuters - Collection of newswire stories from 1987 to 1991, labeled with categories.
TREC-AP - Newswire stories from 1988 to 1990, labeled with categories.
OHSUMED - Medline articles from 1987 to 1991, with MeSH categories assigned.
UseNet newsgroups.
WebKB - Web pages gathered from university CS departments.

o Distance based
o Partitioning based

Classification as a two-step process:
o Model construction: Build a model for pre-determined classes.
o Model usage: Classify unknown data samples.
o If the accuracy is acceptable, use the model to classify data objects whose class labels are not known.

[Figure: two panels, "Partitioning Based" and "Distance Based"]

Model construction: describing a set of predetermined classes
o Each data sample is assumed to belong to a predefined class, as determined by the class label attribute
o Use a training dataset for model construction
o The model is represented as classification rules, decision trees, or mathematical formulae

Training Data

NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no

Classification Algorithms

Classifier (Model)

IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
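
As a rough illustration of model usage, the rule above can be written as a function and checked against the six training samples. The function name `classify` and the unseen sample at the end are hypothetical and not part of the notes.

```python
# Training data from the notes: (name, rank, years, tenured).
training_data = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]

def classify(rank, years):
    """The learned rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# Step 1 check: how well does the model fit the data it was built from?
correct = sum(classify(rank, years) == tenured
              for _, rank, years, tenured in training_data)
accuracy = correct / len(training_data)
print(f"training accuracy: {accuracy:.2f}")

# Step 2, model usage: classify a hypothetical sample whose label is unknown.
prediction = classify("Assistant Prof", 8)
print(prediction)
```

Accuracy on the training data is an optimistic estimate; a separate test set gives a more honest measure before the model is put to use.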

Neural Networks (NNet) - Learn non-linear mapping from input data samples to categories.
Support Vector Machines (SVMs).

5. Decision Tree Induction

Example: Training Dataset [Quinlan’s ID3]

age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no

5.1. Decision Tree Induction Algorithm

Basic algorithm (a greedy algorithm)
o Tree is constructed in a top-down recursive divide-and-conquer manner
o At start, all the training examples are at the root
o Attributes are categorical (if continuous-valued, they are discretized in advance)
o Samples are partitioned recursively based on selected attributes
o Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
o Conditions for stopping partitioning
    - All samples for a given node belong to the same class
    - There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf
    - There are no samples left
Attribute Selection Measure: Information gain

o Select the attribute with the highest information gain
o S contains s_i tuples of class C_i for i = {1, …, m}; s is the total number of tuples in S
o Entropy of the set of tuples: it measures how informative a node is.

$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}$$

o Entropy after choosing attribute A with values {a1,a2,…,av}

$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s} \, I(s_{1j}, \ldots, s_{mj})$$

o Information gained by branching on attribute A

$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$$
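
The three quantities I, E(A), and Gain(A) translate fairly directly into code. The following is a minimal sketch, assuming samples are stored as dictionaries; the function names and the four-row `toy` dataset are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter
from math import log2

def info(counts):
    """I(s1, ..., sm): entropy of a set described by its per-class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def expected_info(rows, attr, target):
    """E(A): entropy of the partitions induced by attr, weighted by size."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr], []).append(row[target])
    total = len(rows)
    return sum(len(labels) / total * info(Counter(labels).values())
               for labels in partitions.values())

def gain(rows, attr, target):
    """Gain(A) = I(s1, ..., sm) - E(A)."""
    labels = [row[target] for row in rows]
    return info(Counter(labels).values()) - expected_info(rows, attr, target)

# Hypothetical toy data: attribute "a" separates the two classes perfectly.
toy = [
    {"a": "x", "cls": "yes"},
    {"a": "x", "cls": "yes"},
    {"a": "y", "cls": "no"},
    {"a": "y", "cls": "no"},
]
print(gain(toy, "a", "cls"))
```

A perfectly separating attribute leaves zero entropy in every partition, so its gain equals the entropy of the whole set.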

Information Gain Computation

o Class P: buys_computer = “yes” (p: number of samples)

o Class N: buys_computer = “no” (n: number of samples)

o The expected information:

I(p, n) = I(9, 5) = 0.940

o Compute the entropy for age:

age     pi   ni   I(pi, ni)
<=30    2    3    0.971
31…40   4    0    0
>40     3    2    0.971
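
For reference, the remaining arithmetic can be worked through as follows. This assumes the class counts per age group listed above and the rounded values I(9, 5) ≈ 0.940 and I(2, 3) = I(3, 2) ≈ 0.971, which follow from the entropy definition in section 5.1.

```latex
\begin{aligned}
E(\text{age}) &= \tfrac{5}{14}\, I(2,3) + \tfrac{4}{14}\, I(4,0) + \tfrac{5}{14}\, I(3,2) \\
              &\approx \tfrac{5}{14}(0.971) + \tfrac{4}{14}(0) + \tfrac{5}{14}(0.971) \approx 0.694 \\
\mathrm{Gain}(\text{age}) &= I(9,5) - E(\text{age}) \approx 0.940 - 0.694 = 0.246
\end{aligned}
```

The same computation is repeated for income, student, and credit_rating, and the attribute with the largest gain becomes the test at the root.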

Recursively apply the same process to each subset.

Age?

Branch age <=30:

income   student   credit_rating   class
high     no        fair            no
high     no        excellent       no
low      yes       fair            yes
medium   no        fair            no
medium   yes       excellent       yes

Branch age >40:

income   student   credit_rating   class
medium   no        fair            yes
low      yes       fair            yes
low      yes       excellent       no
medium   yes       fair            yes
medium   no        excellent       no

Branch age 31…40:

income   student   credit_rating   class
high     no        fair            yes
low      yes       excellent       yes
medium   no        excellent       yes
high     yes       fair            yes

All yes. It is a leaf.

ID3 Algorithm:
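
The body of the algorithm is not included in this preview. What follows is a minimal sketch of an ID3-style induction procedure written from the description in section 5.1, not the author's own listing. It assumes the 14-sample buys_computer dataset with age values <=30, 31…40, and >40; the nested-dictionary tree encoding and all function names are hypothetical.

```python
from collections import Counter
from math import log2

COLUMNS = ["age", "income", "student", "credit_rating", "class"]
RAW = """
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
"""
rows = [dict(zip(COLUMNS, line.split())) for line in RAW.strip().splitlines()]

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def gain(samples, attr):
    """Information gain obtained by partitioning samples on attr."""
    remainder = 0.0
    for value in {s[attr] for s in samples}:
        subset = [s["class"] for s in samples if s[attr] == value]
        remainder += len(subset) / len(samples) * entropy(subset)
    return entropy([s["class"] for s in samples]) - remainder

def id3(samples, attrs):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    labels = [s["class"] for s in samples]
    if len(set(labels)) == 1:
        return labels[0]                              # all samples in one class
    if not attrs:
        return Counter(labels).most_common(1)[0][0]   # majority voting
    best = max(attrs, key=lambda a: gain(samples, a))
    rest = [a for a in attrs if a != best]
    # Only values present in samples are branched on, so the
    # "no samples left" case cannot arise in this sketch.
    return {best: {v: id3([s for s in samples if s[best] == v], rest)
                   for v in sorted({s[best] for s in samples})}}

tree = id3(rows, COLUMNS[:-1])
print(tree)
```

On this dataset the sketch selects age at the root, then student under age <=30 and credit_rating under age >40, while the 31…40 branch becomes a leaf.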

5.3. Extracting Classification Rules from Trees:

Represent the knowledge in the form of IF-THEN rules
One rule is created for each path from the root to a leaf
Each attribute-value pair along a path forms a conjunction
The leaf node holds the class prediction
Rules are easier for humans to understand
Example
o IF age = “<=30” AND student = “no” THEN buys_computer = “no”
o IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
o IF age = “31…40” THEN buys_computer = “yes”
o IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”
o IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”

5.4. Avoid Overfitting in Classification

Overfitting: An induced tree may overfit the training data
o Too many branches, some may reflect anomalies due to noise or outliers
o Poor accuracy for unseen samples
Two approaches to avoid overfitting
o Prepruning: Halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
    - Difficult to choose an appropriate threshold
o Postpruning: Remove branches from a “fully grown” tree to get a sequence of progressively pruned trees
    - Use a set of data different from the training data to decide which is the “best pruned tree”
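
Choosing the "best pruned tree" can be sketched as an accuracy comparison on held-out data. The example below is illustrative only: the three candidate classifiers, the validation samples, and the tie-breaking rule that prefers the more heavily pruned tree are all assumptions, not taken from the notes.

```python
def accuracy(classifier, samples):
    """Fraction of (features, label) pairs the classifier labels correctly."""
    return sum(classifier(x) == y for x, y in samples) / len(samples)

def select_best_pruned(candidates, validation):
    """candidates: (name, classifier) pairs ordered from least to most pruned.
    Ties go to the later, more heavily pruned candidate."""
    best_name, best_acc = None, -1.0
    for name, clf in candidates:
        acc = accuracy(clf, validation)
        if acc >= best_acc:
            best_name, best_acc = name, acc
    return best_name, best_acc

def full_tree(x):
    if x["age"] == "<=30":
        return "yes" if x["student"] == "yes" else "no"
    if x["age"] == ">40":
        return "yes" if x["credit_rating"] == "fair" else "no"
    return "yes"

def pruned_tree(x):
    # The student subtree under age <=30 is collapsed to its majority class.
    if x["age"] == "<=30":
        return "no"
    if x["age"] == ">40":
        return "yes" if x["credit_rating"] == "fair" else "no"
    return "yes"

def stump(x):
    return "yes"   # everything pruned: predict the overall majority class

# Hypothetical validation samples, kept separate from the training data.
validation = [
    ({"age": "<=30", "student": "no", "credit_rating": "fair"}, "no"),
    ({"age": "<=30", "student": "yes", "credit_rating": "fair"}, "no"),
    ({"age": "31…40", "student": "no", "credit_rating": "excellent"}, "yes"),
    ({"age": ">40", "student": "yes", "credit_rating": "excellent"}, "no"),
    ({"age": ">40", "student": "no", "credit_rating": "fair"}, "yes"),
]
candidates = [("full", full_tree), ("pruned", pruned_tree), ("stump", stump)]
best = select_best_pruned(candidates, validation)
print(best)
```

Here the fully grown tree is penalized for a branch that does not hold up on the held-out samples, while pruning too far discards useful structure.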

5.5. Classification in Large Databases

Classification—a classical problem extensively studied by statisticians and machine learning researchers
Scalability: Classifying data sets with millions of examples and hundreds of attributes with reasonable speed
Why decision tree induction in data mining?
o Relatively fast learning speed (compared with other classification methods)
o Convertible to simple and easy-to-understand classification rules
o Can use SQL queries for accessing databases
o Classification accuracy comparable with other methods
