Machine Learning (CS 567), Fall 2008
Time: T-Th 5:00pm - 6:20pm
Location: GFS 118
Instructor: Sofus A. Macskassy ([email protected])
Office: SAL 216
Office hours: by appointment
Teaching assistant: Cheol Han ([email protected])
Office: SAL 229
Office hours: M 2-3pm, W 11-12
Class web page: http://www-scf.usc.edu/~csci567/index.html

Administrative – 599 Spring 2009
• I teach a 599 seminar next semester (Spring 2009)
  – Style is seminar: weekly readings, class discussion
• Title: Advanced Topics in Machine Learning: Statistical Relational Learning
  – This is not the same as what Prof. Fei Sha is teaching, which is a different advanced-topics-in-ML seminar
• The focus of the course is relational learning
  – Standard ML considers instances to be independent
  – What if they are not, as in relational databases, social networks, and other graph data such as the web or hypertext?
  – Topics include collective inference, relational inference, search space, bias, graphical models, and more.
• A preliminary syllabus is posted on the csci567 page

Cost-Sensitive Learning
• In most applications, false positive and false negative errors are not equally important, so we want to adjust the tradeoff between them. Many learning algorithms provide a way to do this:
  – Probabilistic classifiers: combine the cost matrix with decision theory to make classification decisions
  – Discriminant functions: adjust the threshold for classifying into the positive class
  – Ensembles: adjust the number of votes required to classify as positive

Example: 30 Trees Constructed by Bagging
• Classify as positive if K out of 30 trees predict positive. Vary K.

Directly Visualizing the Tradeoff
• We can plot the false positives versus the false negatives directly.
• If R·L(0,1) = L(1,0) (i.e., a FP is R times more expensive than a FN), then
  total cost = L(0,1)·FN(θ) + L(1,0)·FP(θ) ∝ FN(θ) + R·FP(θ),
  so lines of equal cost have the form FN(θ) = c − R·FP(θ).
• The best operating point will be tangent to a line with a slope of −R.
• For the bagging example above: if R=1, we should set the threshold to 10; if R=10, the threshold should be 29.

SVM: Asymmetric Margins
• Minimize ||w||² + C·Σ_i ξ_i
  subject to   w·x_i + ξ_i ≥ R   (positive examples)
              −w·x_i + ξ_i ≥ 1   (negative examples)

ROC: Sub-Optimal Models
• [Figure: ROC plot with the sub-optimal region marked]

ROC Convex Hull
• If we have two classifiers h1 and h2 with (fp1, fn1) and (fp2, fn2), then we can construct a stochastic classifier that interpolates between them. Given a new data point x, we use classifier h1 with probability p and h2 with probability (1 − p). The resulting classifier has an expected false positive level of p·fp1 + (1 − p)·fp2 and an expected false negative level of p·fn1 + (1 − p)·fn2.
• This means that we can create a classifier that matches any point on the convex hull of the ROC curve.

Computing AUC
• Let S1 = the sum of the ranks r(i) of the positive examples (those with yi = 1). Then the estimated AUC is
  ÂUC = (S1 − N1(N1 + 1)/2) / (N0·N1),
  where N0 is the number of negative examples and N1 is the number of positive examples.
• This can also be computed exactly using standard geometry.
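As a minimal sketch of the rank-sum formula (Python; the helper name and the use of scipy's rankdata for tie handling are implementation choices, not anything prescribed by the slides):

    import numpy as np
    from scipy.stats import rankdata  # average ranks for tied scores

    def auc_from_ranks(scores, labels):
        """Estimate AUC via AUC-hat = (S1 - N1*(N1+1)/2) / (N0*N1),
        where S1 is the sum of the ranks of the positive examples."""
        scores = np.asarray(scores, dtype=float)
        labels = np.asarray(labels)
        r = rankdata(scores)            # rank 1 = lowest score
        n1 = int(np.sum(labels == 1))   # number of positive examples
        n0 = int(np.sum(labels == 0))   # number of negative examples
        s1 = r[labels == 1].sum()       # S1: rank sum of the positives
        return (s1 - n1 * (n1 + 1) / 2) / (n0 * n1)

    # Example: three of the four positive-negative pairs are ranked
    # correctly, so this prints 0.75.
    print(auc_from_ranks([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))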
Optimizing AUC
• A hot topic in machine learning right now is developing algorithms for optimizing AUC.
• RankBoost: a modification of AdaBoost. The main idea is to define a “ranking loss” function and then penalize a training example x by the number of examples of the other class that are misranked (relative to x).

Rejection Curves
• In most learning algorithms, we can specify a threshold θ for making a rejection decision
  – Probabilistic classifiers: adjust the cost of rejecting versus the cost of FP and FN
  – Decision-boundary method: if a test point x is within θ of the decision boundary, then reject
• Equivalent to requiring that the “activation” of the best class is larger than that of the second-best class by at least θ

Precision Recall Graph
• Plot recall on the horizontal axis and precision on the vertical axis, and vary the threshold θ for making positive predictions (or vary K).

The F1 Measure
• A figure of merit that combines precision and recall:
  F1 = 2·P·R / (P + R),
  where P = precision and R = recall. This is the harmonic mean of P and R.
• We can plot F1 as a function of the classification threshold θ.

Summarizing a Single Operating Point
• WEKA and many other systems normally report various measures for a single operating point (e.g., a threshold of θ = 0.5). Here is example output from WEKA:

  === Detailed Accuracy By Class ===
  TP Rate   FP Rate   Precision   Recall   F-Measure   Class
  0.854     0.1       0.899       0.854    0.876       0
  0.9       0.146     0.854       0.9      0.876       1

Estimating the Error Rate of a Classifier
• Compute the error rate on hold-out data
  – Suppose a classifier makes k errors on n holdout data points
  – The estimated error rate is ε̂ = k / n
• Compute a confidence interval on this estimate
  – The standard error of this estimate is SE = √( ε̂·(1 − ε̂) / n )
  – A (1 − α) confidence interval on the true error ε is ε̂ − z_{α/2}·SE ≤ ε ≤ ε̂ + z_{α/2}·SE
  – For a 95% confidence interval, z_{0.025} = 1.96, so we use ε̂ − 1.96·SE ≤ ε ≤ ε̂ + 1.96·SE

Hypothesis Testing
• Instead of computing the error directly, we sometimes want to know whether the error is less than some value X
  – e.g., the error of another learner
  – i.e., our hypothesis is that ε < X
• If the sample data are consistent with this hypothesis, then we accept the hypothesis; otherwise we reject it.
• However, we can only assess this given our observed data, so we can only be sure to a certain degree of confidence.

Hypothesis Testing (2)
• For example, we may want to know whether the error ε is equal to another value ε0.
• We define the null hypothesis H0: ε = ε0 against the alternative hypothesis H1: ε ≠ ε0.
• It is reasonable to accept H0 if ε̂ is not too far from ε0.
• Using the confidence interval from before, we accept H0 with level of significance α if ε0 lies in the range
  ε̂ − z_{α/2}·SE ≤ ε0 ≤ ε̂ + z_{α/2}·SE
• This is a two-sided test.
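A minimal sketch of the interval and the two-sided test just described, assuming the normal approximation from the slide (the function names are illustrative, not from any course code):

    from math import sqrt

    def error_confidence_interval(k, n, z=1.96):
        """Normal-approximation confidence interval for the true error
        rate, given k errors on n held-out examples (95% when z = 1.96)."""
        eps_hat = k / n                          # estimated error rate
        se = sqrt(eps_hat * (1 - eps_hat) / n)   # standard error
        return eps_hat - z * se, eps_hat + z * se

    def accept_h0(k, n, eps0, z=1.96):
        """Two-sided test of H0: error = eps0 with z = z_{alpha/2};
        accept H0 if eps0 falls inside the confidence interval."""
        lo, hi = error_confidence_interval(k, n, z)
        return lo <= eps0 <= hi

    # Example: 20 errors on 200 test points gives eps_hat = 0.10 and a 95%
    # interval of roughly (0.058, 0.142), so H0: error = 0.15 is rejected.
    print(error_confidence_interval(20, 200))
    print(accept_h0(20, 200, eps0=0.15))   # False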
Problem of Multiple Comparisons
• A multiple comparisons problem arises if one wanted to use this test (which is appropriate for testing the fairness of a single coin) to test the fairness of many coins.
  – Imagine testing 100 coins by this method.
  – Given that the probability of a fair coin coming up 9 or 10 heads in 10 flips is 0.0107, one would expect that in flipping 100 fair coins ten times each, seeing some coin (not a particular, pre-selected one) come up heads 9 or 10 times would be a relatively likely event.
  – Precisely, the likelihood that all 100 fair coins are identified as fair by this criterion is (1 − 0.0107)^100 ≈ 0.34. Therefore, applying our single-test coin-fairness criterion to multiple comparisons would, more likely than not, falsely identify at least one fair coin as unfair.

Problem of Multiple Comparisons
• Technically, the problem of multiple comparisons (also known as the multiple testing problem) is the potential increase in Type I error that occurs when statistical tests are used repeatedly:
  – If n independent comparisons are performed, the experiment-wide significance level is α_experiment = 1 − (1 − α_single_comparison)^n
  – (α_single_comparison = 0.0107 on the previous slide)
  – Therefore, for the 100 fair-coin tests from the previous slide, α_experiment = 0.66
• This increases as the number of comparisons increases.

Comparing Two Classifiers
• Goal: decide which of two classifiers h1 and h2 has the lower error rate.
• Method: run them both on the same test data set and record the following counts:
  – n00: the number of examples correctly classified by both classifiers
  – n01: the number of examples correctly classified by h1 but misclassified by h2
  – n10: the number of examples misclassified by h1 but correctly classified by h2
  – n11: the number of examples misclassified by both h1 and h2

                h2 correct   h2 wrong
  h1 correct       n00          n01
  h1 wrong         n10          n11

Cost-Sensitive Comparison of Two Classifiers
• Suppose we have a non-0/1 loss matrix L(ŷ, y) and we have two classifiers h1 and h2. Goal: determine which classifier has the lower expected loss.
• A method that does not work well:
  – For each algorithm a and each test example (x_i, y_i), compute ℓ_{a,i} = L(h_a(x_i), y_i).
  – Let δ_i = ℓ_{1,i} − ℓ_{2,i}
  – Treat the δ's as normally distributed and compute a normal confidence interval
• The problem is that there are only a finite number of different possible values for δ_i. They are not normally distributed, and the resulting confidence intervals are too wide.

A Better Method: BDeltaCost
• Let Δ = {δ_i}, i = 1, …, N, be the set of δ_i's computed as above.
• For b from 1 to 1000:
  – Let T_b be a bootstrap replicate of Δ
  – Let s_b = the average of the δ's in T_b
• Sort the s_b's and identify the 26th and 975th items. These form a 95% confidence interval on the average difference between the loss from h1 and the loss from h2.
• The bootstrap confidence interval quantifies the uncertainty due to the size of the test set. It does not allow us to compare algorithms, only classifiers.
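A minimal sketch of the BDeltaCost bootstrap, assuming the per-example loss differences δ_i have already been computed (the function name and the NumPy-based resampling are implementation choices):

    import numpy as np

    def bdeltacost_interval(deltas, seed=0):
        """95% bootstrap confidence interval on the mean loss difference
        delta_i = L(h1(x_i), y_i) - L(h2(x_i), y_i), using 1000 replicates."""
        rng = np.random.default_rng(seed)
        deltas = np.asarray(deltas, dtype=float)
        n = len(deltas)
        # 1000 bootstrap replicates: resample the deltas with replacement
        # and record the average of each replicate.
        means = np.sort([rng.choice(deltas, size=n, replace=True).mean()
                         for _ in range(1000)])
        # 26th and 975th of the sorted averages, as on the slide.
        return means[25], means[974]

    # If the resulting interval excludes 0, we conclude that h1 and h2
    # differ in expected loss on this test set.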
Estimating the Error Rate of a Learning Algorithm
• Under the PAC model, training examples x are drawn from an underlying distribution D and labeled according to an unknown function f to give (x, y) pairs where y = f(x).
• The error rate of a classifier h is error(h) = P_D(h(x) ≠ f(x)).
• Define the error rate of a learning algorithm A for sample size m and distribution D as error(A, m, D) = E_S[error(A(S))].
• This is the expected error rate of h = A(S) for training sets S of size m drawn according to D.
• We could estimate this if we had several training sets S1, …, SL all drawn from D. We could compute A(S1), A(S2), …, A(SL), measure their error rates, and average them.
• Unfortunately, we don't have enough data to do this!

5x2CV F test
• Perform 5 replications of 2-fold cross-validation. For each replication i = 1, …, 5 and each fold j = 1, 2, record:
  – p_A^(i,j) and p_B^(i,j): the error rates of algorithms A and B on fold j of replication i
  – p_i^(j) = p_A^(i,j) − p_B^(i,j): their difference
  – p̄_i = (p_i^(1) + p_i^(2)) / 2 and s_i² = (p_i^(1) − p̄_i)² + (p_i^(2) − p̄_i)²: the mean difference and variance for replication i

5x2CV F test
• If F > 4.47, then with 95% confidence we can reject the null hypothesis that algorithms A and B have the same error rate when trained on data sets of size m/2.
• An F-test is any statistical test in which the test statistic has an F-distribution if the null hypothesis is true, e.g.:
  – The hypothesis that the means of multiple normally distributed populations, all having the same standard deviation, are equal.
  – The hypothesis that the standard deviations of two normally distributed populations are equal, and thus that they are of comparable origin.
• (A brief code sketch of this test follows the summary.)

Summary
• ROC Curves
• Reject Curves
• Precision-Recall Curves
• Statistical Tests
  – Estimating the error rate of a classifier
  – Comparing two classifiers
  – Estimating the error rate of a learning algorithm
  – Comparing two algorithms
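A minimal sketch of the 5x2cv F test in Python. The slides give the table of fold-wise error rates and the 4.47 threshold; the specific combination of the p_i^(j) and s_i² into an F statistic below follows the standard 5x2cv F test and is an assumption about the intended formula, as it is not spelled out above.

    import numpy as np

    def five_by_two_cv_f(p_a, p_b, threshold=4.47):
        """5x2cv F test from fold-wise error rates.

        p_a, p_b: 5x2 arrays, where p_a[i, j] is algorithm A's error rate
        on fold j of replication i (and likewise for p_b).
        Returns (F, reject); reject=True means we reject the hypothesis
        that A and B have the same error rate.
        """
        p = np.asarray(p_a, dtype=float) - np.asarray(p_b, dtype=float)  # p_i^(j)
        p_bar = p.mean(axis=1, keepdims=True)        # per-replication mean
        s2 = ((p - p_bar) ** 2).sum(axis=1)          # per-replication variance s_i^2
        f_stat = (p ** 2).sum() / (2.0 * s2.sum())   # assumed 5x2cv F statistic
        return f_stat, f_stat > threshold

    # p_a and p_b would be filled in by training each algorithm on one half
    # of each of 5 random splits of the data and testing on the other half.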