Download Administrative Machine Learning - Project | CSCI 567 and more Study Guides, Projects, Research Computer Science in PDF only on Docsity!
Machine Learning (CS 567)
Fall 2008
Time: T-Th 5:00pm - 6:20pm Location: GFS 118
Instructor : Sofus A. Macskassy ([email protected]) Office: SAL 216 Office hours: by appointment
Teaching assistant : Cheol Han ([email protected]) Office: SAL 229 Office hours: M 2-3pm, W 11-
Class web page: http://www-scf.usc.edu/~csci567/index.html
Administrative – 599 Spring 2009
- I teach a 599 Seminar next semester (Spring 2009)
- Style is seminar: weekly readings, class discussion
- Title: Advanced Topics in Machine Learning: Statistical Relational Learning - This is not the same as what Prof. Fei Sha is teaching, which is a different advanced topics in ML seminar
- Focus of the course is on relational learning
- Standard ML considers instances to be independent
- What if they are not? Such as in relational databases, social networks, other graphical data such as the web or hypertext?
- Topics include issues such as collective inference, relational inference, search space, bias, graphical models and more.
- Preliminary syllabus is posted on the csci567 page
Evaluation of Classifiers
- ROC Curves
- Reject Curves
- Precision-Recall Curves
- Statistical Tests
- Estimating the error rate of a classifier
- Comparing two classifiers
- Estimating the error rate of a learning algorithm
- Comparing two algorithms
Cost-Sensitive Learning
- In most applications, false positive and false
negative errors are not equally important. We therefore want to adjust the tradeoff between them. Many learning algorithms provide a way to do this:
- probabilistic classifiers: combine cost matrix with decision theory to make classification decisions
- discriminant functions: adjust the threshold for classifying into the positive class
- ensembles: adjust the number of votes required to classify as positive
Directly Visualizing the Tradeoff
- We can plot the false positives versus false negatives directly.
- If R ¢ L(0,1) = L(1,0) (i.e., a FP is R times more expensive than a FN), then total errors = L(0,1)FN() + L(1,0)FP() / FN() + RFP(); FN() = – RFP()
- The best operating point will be tangent to a line with a slope of – R
If R=1, we should set the threshold to
If R=10, the threshold should be 29
Receiver Operating Characteristic (ROC) Curve
- It is traditional to plot this same information in a normalized form with True Positive Rate plotted against the False Positive Rate.
The optimal operating point is tangent to a line with a slope of R
R=2 R=
SVM: Asymmetric Margins
Minimize ||w||^2 + C i i
Subject to w ¢ x i + i ¸ R (positive examples)
- w ¢ x i + i ¸ 1 (negative examples)
ROC: Sub-Optimal Models
Sub-optimal region
ROC: Sub-Optimal Models ROC Convex Hull
Original ROC Curve
Maximizing AUC
- At learning time, we may not know the cost ratio R. In such cases, we can maximize the Area Under the ROC Curve (AUC)
- Efficient computation of AUC
- Assume h( x ) returns a real quantity (larger values => class 1)
- Sort x i according to h( x i). Number the sorted points from 1 to N such that r(i) = the rank of data point x i
- AUC = P(s(x=1)>s(x=0))
- probability that a randomly chosen example from class 1 ranks above a randomly chosen example from class 0
- This is also known as the Wilcoxon-Mann-Whitney statistic
Optimizing AUC
- A hot topic in machine learning right now is
developing algorithms for optimizing AUC
- RankBoost: A modification of AdaBoost.
The main idea is to define a “ranking loss”
function and then penalize a training
example x by the number of examples of
the other class that are misranked (relative
to x )
Rejection Curves
- In most learning algorithms, we can specify
a threshold for making a rejection decision
- Probabilistic classifiers: adjust cost of rejecting versus cost of FP and FN
- Decision-boundary method: if a test point x is within of the decision boundary, then reject - Equivalent to requiring that the “activation” of the best class is larger than the second-best class by at least
Precision versus Recall
- Information Retrieval:
- y = 1: document is relevant to query
- y = 0: document is irrelevant to query
- K: number of documents retrieved
- Precision:
- fraction of the K retrieved documents (ŷ=1) that are actually relevant (y=1)
- TP / (TP + FP)
- Recall:
- fraction of all relevant documents that are retrieved
- TP / (TP + FN) = true positive rate
Precision Recall Graph
- Plot recall on horizontal axis; precision on vertical axis; and vary the threshold for making positive predictions (or vary K)