















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Material Type: Notes; Class: Advanced Special Topics in Bioengineering; Subject: Bioengineering; University: University of Illinois - Chicago; Term: Spring 2003;
Typology: Study notes
1 / 23
This page cannot be seen from the preview
Don't miss anything!
















Prof. Yang Dai, BioE 594 Computational Functional Genomics Classification tasks for microarray analysis^
Generate gene expression profiles that can ^
discriminate between different
known
cell types or conditions,
e.g. between tumor and normal tissue, ^
identify different and previously
unknown
cell types or
conditions, e.g. new subclasses of an existing class of tumors.
Assign an unknown cDNA sequence to one of a set of known
gene classes.
Partition a set of genes into new (
unknown
) functional
classes on the basis of their expression patterns across anumber of samples.
Prof. Yang Dai, BioE 594 Computational Functional Genomics
Each tuple/sample is assumed to belong to a predefined class ^
The model is represented as classification rules, decision trees,or mathematical formulae
Estimate accuracy of the model ^
The known label of test example is compared with theclassified result from the model ^
Accuracy rate is the % of test cases that are correctlyclassified by the model
^
If the accuracy is acceptable, use the model to classify datatuples whose class labels are not known.
Prof. Yang Dai, BioE 594 Computational Functional Genomics Evaluating Classification Methods ^
Predictive accuracy ^
Speed and scalability^
time to construct the model time to use the model ^
Robustness: handling noise and missing values ^
Scalability: efficiency in disk-resident databases ^
Interpretability:^
understandable and insight provided by the model ^
Compactness of the model: size of the tree, or the number of rules.
Prof. Yang Dai, BioE 594 Computational Functional Genomics
Prof. Yang Dai, BioE 594 Computational Functional Genomics
8
Performance assessment^
Error rate on the learning set. ^
Can be severely biased
Points in the learning set
are divided into two sets,
and 1
classifier is built using
and error rate is computed for 1
2
Must ensure that
and 1
are from the same population 2
2/3 and 1/3 repeated random sampling. This procedurereduces effective sample size.
Prof. Yang Dai, BioE 594 Computational Functional Genomics
The use of cross-validation (or any other process) is intendedto provide accurate estimates of the classification error rate.In addition this is also a procedure for model selection, i.e., to
determine parameters involved in the model.
^
These estimates relate only to experiment that was (cross-)validated. ^
There is a common practice in this area of doing featureselection using all of the data and them using cross-validation only on the model building and classificationportion.
Prof. Yang Dai, BioE 594 Computational Functional Genomics
Prof. Yang Dai, BioE 594 Computational Functional Genomics
False Positive rate
True Positive rate
Cut-offmeasure
Specificity
Sensitivity
Cut-offmeasure
Table 2
Table 3
Prof. Yang Dai, BioE 594 Computational Functional Genomics
Random performance
Good separation betweenclasses, convex curve
Reasonable separation,
mostly convex
Fairly poor separation,
mostly convex
Poor separation, large and small concavities
Prof. Yang Dai, BioE 594 Computational Functional Genomics
For each class, calculate the center of mass of the representativepoints in the class. ^
Calculate the distance between the position of the sample to beclassified and each of the centers of mass of the classes using anappropriate distance measure ^
Assign sample to the class whose center of mass is nearest to it
Prof. Yang Dai, BioE 594 Computational Functional Genomics
17
c^ |
d), i.e., given a data tuple
d, what is the probability that
d^
is
of class
c. (
is the class attribute).
Let
through
be the attributes with discrete values. They arek
used to predict a discrete class
Given an example with observed attribute values
a
through
ak
The prediction is the class c such thatP(
c|A
a
=k ak
) is maximal.
Prof. Yang Dai, BioE 594 Computational Functional Genomics
Prof. Yang Dai, BioE 594 Computational Functional Genomics An example - training dataset
age
income
student
credit_rating
buys_computer
<=
high
no
fair
no
<=
high
no
excellent
no
30…
high
no
fair
yes
medium
no
fair
yes
low
yes
fair
yes
low
yes
excellent
no
31…
low
yes
excellent
yes
<=
medium
no
fair
no
<=
low
yes
fair
yes
medium
yes
fair
yes
<=
medium
yes
excellent
yes
31…
medium
no
excellent
yes
31…
high
yes
fair
yes
medium
no
excellent
no
Class:C1:buys_computer=‘yes’C2:buys_computer=‘no’ Data sampleX =(age<=30,Income=medium,Student=yesCredit_rating=Fair)