
Material Type: Assignment; Class: Special Topics: Advanced Database Systems; Subject: Elect Engr & Computer Science; University: University of Kansas; Term: Fall 2005;

Classification of Leukemia with gene expression profiles. The training set (in the file golub-data-train.txt) and the test set (in the file golub-data-independent.txt) are available at http://people.eecs.ku.edu/~yazhang/course/f05/Homework.html.
Background on the dataset: The training set contains gene expression profiles for 38 bone marrow samples from acute leukemia patients, with each profile consisting of about 7000 gene expression levels. The training samples are labeled as either ALL (acute lymphoid leukemia) or AML (acute myeloid leukemia), two clinically distinct types of leukemia. The ALL samples can be further divided into T-lineage ALL and B-lineage ALL. Finally, there is a test ("independent") set of 50 additional samples, also consisting of AML, T-lineage ALL, and B-lineage ALL leukemia types. Your task is to distinguish between ALL and AML. (Reference: "Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring", Golub et al., Science, 286, 1999.)
Actual \ Predicted    Negative    Positive
Negative              A           B
Positive              C           D
where A, B, C, D are the numbers of test examples falling into each category, and calculate the following simple statistics:
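Before any statistics can be calculated, the four counts A, B, C, D have to be tallied from the classifier's output. A minimal sketch in Python follows. It makes several assumptions that are not part of the assignment: AML is treated as the "positive" class (the handout does not say which class is positive), the function name confusion_counts is hypothetical, and the label lists are made-up toy values rather than results from the Golub data.

```python
def confusion_counts(actual, predicted, positive="AML"):
    """Tally (A, B, C, D) using the table's layout: rows are the actual
    class, columns the predicted class, negative listed before positive.

    Assumption: `positive` names the class treated as positive; every
    other label counts as negative.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    a = b = c = d = 0
    for truth, guess in zip(actual, predicted):
        if truth != positive and guess != positive:
            a += 1  # A: actual negative, predicted negative
        elif truth != positive and guess == positive:
            b += 1  # B: actual negative, predicted positive
        elif truth == positive and guess != positive:
            c += 1  # C: actual positive, predicted negative
        else:
            d += 1  # D: actual positive, predicted positive
    return a, b, c, d


# Toy labels for illustration only (not taken from the dataset).
actual = ["ALL", "ALL", "AML", "AML", "ALL"]
predicted = ["ALL", "AML", "AML", "ALL", "ALL"]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 1)
```

Passing positive="ALL" instead swaps the roles of the two classes, so A and D trade places, as do B and C.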