Administrative Machine Learning - Project | CSCI 567, Study Guides, Projects, Research of Computer Science

Material Type: Project; Professor: Sha; Class: Machine Learning; Subject: Computer Science; University: University of Southern California; Term: Fall 2008;

Typology: Study Guides, Projects, Research

Pre 2010

Uploaded on 02/24/2010

koofers-user-5o1
koofers-user-5o1 🇺🇸

10 documents

1 / 42

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Fall 2008 Evaluation - Dr. Sofus A. Macskassy1
Machine Learning (CS 567)
Fall 2008
Time: T-Th 5:00pm - 6:20pm
Location: GFS 118
Instructor: Sofus A. Macskassy ([email protected])
Office: SAL 216
Office hours: by appointment
Teaching assistant: Cheol Han ([email protected])
Office: SAL 229
Office hours: M 2-3pm, W 11-12
Class web page:
http://www-scf.usc.edu/~csci567/index.html
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27
pf28
pf29
pf2a

Partial preview of the text

Download Administrative Machine Learning - Project | CSCI 567 and more Study Guides, Projects, Research Computer Science in PDF only on Docsity!

Machine Learning (CS 567)

Fall 2008

Time: T-Th 5:00pm - 6:20pm Location: GFS 118

Instructor : Sofus A. Macskassy ([email protected]) Office: SAL 216 Office hours: by appointment

Teaching assistant : Cheol Han ([email protected]) Office: SAL 229 Office hours: M 2-3pm, W 11-

Class web page: http://www-scf.usc.edu/~csci567/index.html

Administrative – 599 Spring 2009

  • I teach a 599 Seminar next semester (Spring 2009)
    • Style is seminar: weekly readings, class discussion
  • Title: Advanced Topics in Machine Learning: Statistical Relational Learning - This is not the same as what Prof. Fei Sha is teaching, which is a different advanced topics in ML seminar
  • Focus of the course is on relational learning
    • Standard ML considers instances to be independent
    • What if they are not? Such as in relational databases, social networks, other graphical data such as the web or hypertext?
    • Topics include issues such as collective inference, relational inference, search space, bias, graphical models and more.
  • Preliminary syllabus is posted on the csci567 page

Evaluation of Classifiers

  • ROC Curves
  • Reject Curves
  • Precision-Recall Curves
  • Statistical Tests
    • Estimating the error rate of a classifier
    • Comparing two classifiers
    • Estimating the error rate of a learning algorithm
    • Comparing two algorithms

Cost-Sensitive Learning

  • In most applications, false positive and false

negative errors are not equally important. We therefore want to adjust the tradeoff between them. Many learning algorithms provide a way to do this:

  • probabilistic classifiers: combine cost matrix with decision theory to make classification decisions
  • discriminant functions: adjust the threshold for classifying into the positive class
  • ensembles: adjust the number of votes required to classify as positive

Directly Visualizing the Tradeoff

  • We can plot the false positives versus false negatives directly.
  • If R ¢ L(0,1) = L(1,0) (i.e., a FP is R times more expensive than a FN), then total errors = L(0,1)FN() + L(1,0)FP() / FN() + RFP(); FN() = – RFP()
  • The best operating point will be tangent to a line with a slope of – R

If R=1, we should set the threshold to

If R=10, the threshold should be 29

Receiver Operating Characteristic (ROC) Curve

  • It is traditional to plot this same information in a normalized form with True Positive Rate plotted against the False Positive Rate.

The optimal operating point is tangent to a line with a slope of R

R=2 R=

SVM: Asymmetric Margins

Minimize ||w||^2 + C i i

Subject to w ¢ x i + i ¸ R (positive examples)

  • w ¢ x i + i ¸ 1 (negative examples)

ROC: Sub-Optimal Models

Sub-optimal region

ROC: Sub-Optimal Models ROC Convex Hull

Original ROC Curve

Maximizing AUC

  • At learning time, we may not know the cost ratio R. In such cases, we can maximize the Area Under the ROC Curve (AUC)
  • Efficient computation of AUC
    • Assume h( x ) returns a real quantity (larger values => class 1)
    • Sort x i according to h( x i). Number the sorted points from 1 to N such that r(i) = the rank of data point x i
    • AUC = P(s(x=1)>s(x=0))
      • probability that a randomly chosen example from class 1 ranks above a randomly chosen example from class 0
    • This is also known as the Wilcoxon-Mann-Whitney statistic

Optimizing AUC

  • A hot topic in machine learning right now is

developing algorithms for optimizing AUC

  • RankBoost: A modification of AdaBoost.

The main idea is to define a “ranking loss”

function and then penalize a training

example x by the number of examples of

the other class that are misranked (relative

to x )

Rejection Curves

  • In most learning algorithms, we can specify

a threshold for making a rejection decision

  • Probabilistic classifiers: adjust cost of rejecting versus cost of FP and FN
  • Decision-boundary method: if a test point x is within  of the decision boundary, then reject - Equivalent to requiring that the “activation” of the best class is larger than the second-best class by at least 

Precision versus Recall

  • Information Retrieval:
    • y = 1: document is relevant to query
    • y = 0: document is irrelevant to query
    • K: number of documents retrieved
  • Precision:
    • fraction of the K retrieved documents (ŷ=1) that are actually relevant (y=1)
    • TP / (TP + FP)
  • Recall:
    • fraction of all relevant documents that are retrieved
    • TP / (TP + FN) = true positive rate

Precision Recall Graph

  • Plot recall on horizontal axis; precision on vertical axis; and vary the threshold for making positive predictions (or vary K)