Microarray Analysis Exercise: Classification using Support Vector Machines, Assignments of Biology

University of Illinois - Chicago Biology

An exercise on using support vector machines (SVM) for classification in microarray analysis. Students use data from Golub et al. (1999) and preprocess it using R and Bioconductor packages. They then apply an SVM to the preprocessed data and evaluate the error rates on both the training and test sets. The exercise aims to help students learn how to apply SVMs to microarray data analysis.

Uploaded on 07/23/2009

Exercise: Microarray Analysis: Classification using Support Vector Machines

April 14, 2006

Due Date: April 27, 2006

Objective

• Learn to apply SVM to microarray data analysis.

1 Pre-lab

In this exercise we explore the use of support vector machines (SVM) for classification in microarray analysis. We will use the data set presented in Golub et al. (1999), which is available in an online repository from the authors (http://www-genome.wi.mit.edu/mpr/data_set_ALL_AML.html) and is included in an R data package, golubEsets, in Bioconductor. The expression data in this data set come from a study of gene expression in two types of acute leukemia: acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). Gene expression levels were measured using Affymetrix high-density oligonucleotide arrays (HU6800 chip) containing probes for 6,817 human genes and ESTs. The chip actually contains 7,129 different probe sets; some of these map to the same genes and others are there for quality control purposes. The data comprise 47 samples of ALL (38 B-cell ALL and 9 T-cell ALL) and 25 samples of AML. These samples are further divided into a training set (golubTrain) with 38 observations and a test set (golubTest) with 34 observations. The SVM solver comes from the package e1071 (you can download it from http://cran.r-project.org/src/contrib/Descriptions/e1071.html).

2 Data pre-processing

First you will need to load the following R and Bioconductor packages

library(golubEsets)
library(e1071)
library(Biobase)
library(genefilter)

Then we obtain the required expression data in the form of exprSets (golubTrain and golubTest) by using the data function.

data(golubTrain)
data(golubTest)

Apply the preliminary gene filter procedure to golubTrain as we did before.

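# Floor and ceiling the raw intensities
X <- exprs(golubTrain)
X[X < 100] <- 100
X[X > 16000] <- 16000

# Filter factory: keep a gene when max/min > r and max - min > d
mmfilt <- function(r = 5, d = 500, na.rm = TRUE) {
  function(x) {
    minval <- min(x, na.rm = na.rm)
    maxval <- max(x, na.rm = na.rm)
    (maxval / minval > r) && (maxval - minval > d)
  }
}
mmfun <- mmfilt()
ffun <- filterfun(mmfun)
sub <- genefilter(X, ffun)   # logical vector, one entry per gene

# Keep the selected genes and log-transform them
X <- X[sub, ]
X <- log10(X)
golubTrainSub <- golubTrain[sub, ]
golubTrainSub@exprs <- X

# Class labels: ALL/AML combined with the B-/T-cell subtype
Y <- golubTrainSub$ALL.AML
Y <- paste(golubTrain$ALL.AML, golubTrain$T.B.cell)
Y <- sub(" NA", "", Y)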
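The last two label statements are easy to misread, so here is a minimal sketch of what paste followed by sub produces. The two input vectors are hypothetical stand-ins for golubTrain$ALL.AML and golubTrain$T.B.cell, assuming the subtype is missing (NA) for AML samples.

```r
# Hypothetical stand-ins for the two phenotype columns (not the real data).
all_aml  <- c("ALL", "ALL", "AML")
t_b_cell <- c("B-cell", "T-cell", NA)

# paste() turns NA into the string "NA", giving "AML NA" for the AML sample.
Y <- paste(all_aml, t_b_cell)

# sub() strips the trailing " NA" so the AML label is just "AML".
Y <- sub(" NA", "", Y)
print(Y)
```

Under that assumption the result has three classes (ALL B-cell, ALL T-cell, AML) rather than two.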

This is a non-specific filter. The genes were selected according to their variability, not according to their ability to classify any particular set of samples.
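To see why the filter is non-specific, it can help to run the same rule on a toy matrix. The sketch below uses base R only and three made-up genes; the helper name passes is hypothetical, and apply over rows stands in for what genefilter does with the filter function.

```r
# Toy expression matrix: 3 made-up genes (rows) x 4 samples (columns).
X <- rbind(
  flat     = c(120, 130, 125, 140),    # little variation across samples
  variable = c(100, 900, 2500, 150),   # large ratio and large difference
  ratio    = c(20, 150, 90, 300)       # small values are floored at 100
)

# Same floor and ceiling as in the exercise.
X[X < 100] <- 100
X[X > 16000] <- 16000

# Hypothetical helper with the same rule as mmfilt(): max/min > r and max - min > d.
passes <- function(x, r = 5, d = 500) {
  (max(x) / min(x) > r) && (max(x) - min(x) > d)
}

# Apply the rule to every gene (row); no class labels are used anywhere.
keep <- apply(X, 1, passes)
print(keep)
```

Only the gene named variable passes. The class labels never enter the computation, which is what makes the filter non-specific.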

In order to make the test set comparable we must select the same set of genes and apply the same transformations to that data set.

Xt <- exprs(golubTest)
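A minimal sketch of what "the same set of genes and the same transformations" means, using toy matrices rather than the Golub data. The names clamp, keep, train and test are hypothetical; keep plays the role of the logical vector sub computed from the training set.

```r
# Toy training and test matrices with the same genes (rows); values are made up.
train <- rbind(g1 = c(100, 2000, 150),
               g2 = c(110, 120, 130),
               g3 = c(50, 900, 20000))
test  <- rbind(g1 = c(80, 300),
               g2 = c(5000, 100),
               g3 = c(17000, 400))

# Hypothetical helper: the floor/ceiling step used in the exercise.
clamp <- function(m, lo = 100, hi = 16000) {
  m[m < lo] <- lo
  m[m > hi] <- hi
  m
}

# Gene selection is decided on the TRAINING data only.
tr <- clamp(train)
keep <- apply(tr, 1, function(x) (max(x) / min(x) > 5) && (max(x) - min(x) > 500))

# The test set gets the same clamp, the same gene subset, and the same log10.
Xt <- log10(clamp(test)[keep, , drop = FALSE])
print(rownames(Xt))
```

Gene g2 would pass the filter if it were evaluated on the test matrix alone, but it is dropped because the selection comes from the training set.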

Question 4: As a second exercise you could reverse the roles of the two data sets: the test set could be treated as the training set and the training set could be treated as the test set. What is the error rate for the training set? What is the average error rate for 10-fold cross-validation? What is the error rate for the test set?

For more details about svm in R, please refer to the package e1071 manual at http://cran.r-project.org/doc/packages/e1071.pdf
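The three error rates asked for above can be computed along the following lines. This is a sketch on synthetic data, not the Golub data or a model answer: the helpers make_data and error_rates are hypothetical, and the linear kernel is an assumption. It relies on svm from e1071 accepting a cross argument for k-fold cross-validation and reporting the overall accuracy, in percent, as tot.accuracy.

```r
library(e1071)

set.seed(42)

# Hypothetical generator: two classes, with the first 5 "genes" shifted for AML.
make_data <- function(n_per_class, n_genes = 20) {
  y <- factor(rep(c("ALL", "AML"), each = n_per_class))
  x <- matrix(rnorm(2 * n_per_class * n_genes), ncol = n_genes)
  x[y == "AML", 1:5] <- x[y == "AML", 1:5] + 2
  list(x = x, y = y)
}

train <- make_data(20)
test  <- make_data(15)

# Hypothetical helper: fit on `train`, report training, CV and test error rates.
error_rates <- function(train, test, folds = 10) {
  model <- svm(train$x, train$y, kernel = "linear", cross = folds)
  c(train = mean(predict(model, train$x) != train$y),
    cv    = 1 - model$tot.accuracy / 100,   # tot.accuracy is a percentage
    test  = mean(predict(model, test$x) != test$y))
}

rates         <- error_rates(train, test)
rates_swapped <- error_rates(test, train)   # roles of the two sets reversed
print(rbind(original = rates, swapped = rates_swapped))
```

Wrapping the fit in one function makes the role reversal in Question 4 a matter of swapping the two arguments.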

4 What do you need to submit?

• The complete source code in R (do include some comments).
• The answers to the questions in this exercise. You will need some R commands that do not appear in the text to answer the questions. Get help from the reference manual or from CRAN.

5 Acknowledgement

This exercise is adapted from the lab material in "A short course on Computational and Statistical Aspects of Microarray Analysis", A. Antoniadis and R. Gentleman, May 2003, Milan.
