Microarray Data Analysis Exercise: Clustering, Functional Enrichment, and Motif Finding

Homework 3

Due: Thursday, Dec 6, 8:30pm

Turn in everything electronically. However, please do not send me extra material that I did not ask for. Read carefully below for what I am expecting.

Purpose

The purpose of this exercise is for you to get familiar with microarray data and some of the available tools for analyzing them.

Data and tools needed

(1) Download the gene expression data cellcycle.txt from the course website (http://www.cs.utsa.edu/~jruan/teaching/cs5263_fall_2007/hw3_files/cellcycle.txt).

The original data were obtained by two different research groups that used microarrays to study gene expression at different stages of the cell cycle in Saccharomyces cerevisiae (more commonly known as baker’s yeast). (If you have forgotten what a cell cycle is, just keep reading. You don’t need to know it to finish the homework.) One study was conducted by Cho et al. (Molecular Cell, July 1998) using Affymetrix microarrays, and included 17 time points covering approximately two cell cycles. The other study was conducted by Spellman et al. (Molecular Biology of the Cell, December 1998) using cDNA microarrays, and consisted of three subsets of experiments, each with 10-20 time points covering about two cell cycles. The combined data were downloaded from http://genome-www.stanford.edu/cellcycle/.

I have processed the data with the following procedure:

i. Removed genes with more than ten missing values.

ii. Replaced missing values for the remaining genes with uniformly distributed random numbers between -1 and 1.

iii. Ranked genes according to the standard deviation of their expression vectors. Selected the top 3000 genes with the highest variability and discarded the rest.
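The three processing steps above can be sketched in Python with NumPy. This is only an illustration, not the script that was actually used to produce cellcycle.txt. The function name `preprocess`, its parameters, the use of NaN to mark missing values, and the choice to compute the standard deviation after the missing values are filled in are all assumptions made for this example.

```python
import numpy as np

def preprocess(expr, max_missing=10, top_n=3000, seed=0):
    """Sketch of steps i-iii.

    expr: 2-D float array (genes x experiments); NaN marks a missing value
    (an assumption of this example, not something the data file guarantees).
    Returns the filtered matrix and the original row indices of the kept genes.
    """
    rng = np.random.default_rng(seed)

    # Step i: keep only genes with at most max_missing missing values.
    keep = np.isnan(expr).sum(axis=1) <= max_missing
    data = expr[keep].copy()

    # Step ii: fill the remaining gaps with draws from Uniform(-1, 1).
    gaps = np.isnan(data)
    data[gaps] = rng.uniform(-1.0, 1.0, size=int(gaps.sum()))

    # Step iii: rank genes by the standard deviation of their expression
    # vectors (computed after filling, which is an assumption) and keep
    # the top_n most variable ones.
    order = np.argsort(data.std(axis=1))[::-1][:top_n]
    kept_rows = np.flatnonzero(keep)[order]
    return data[order], kept_rows
```

Because the fill-in values are random, two runs with different seeds would give slightly different matrices, so a fixed seed is used here to keep the sketch reproducible.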

The processed data are stored as a 3000 x 73 matrix, where each row corresponds to a gene and each column corresponds to an experiment (a time point). Gene names and experiment names are given as row and column headers, respectively. Each value is a log ratio between two gene expression levels: the level measured from yeast cells collected at a particular time point (if you want to know, the cells were “synchronized” so that they were all in the same cell-cycle phase), and the level measured from a mixture of yeast cells collected at different time points (i.e., a sample consisting of cells from different cell-cycle phases).
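If you prefer to inspect the matrix outside of a clustering tool, a small loader along the following lines may help. It is a sketch that assumes the file is tab-delimited, that the first line holds the experiment names after a leading label cell, and that each later line starts with a gene name. None of this is confirmed here, so check the actual file first. The function name `read_expression_table` and the demo values are made up for illustration.

```python
import io
import numpy as np

def read_expression_table(handle, sep="\t"):
    """Read a header row plus one row per gene from an open text handle.

    Assumed layout (not verified against cellcycle.txt):
      <label><sep>exp1<sep>exp2...
      gene1<sep>value<sep>value...
    """
    header = handle.readline().rstrip("\n").split(sep)
    experiments = header[1:]          # drop the label cell above the gene names
    genes, rows = [], []
    for line in handle:
        fields = line.rstrip("\n").split(sep)
        if len(fields) < 2:           # skip blank or malformed lines
            continue
        genes.append(fields[0])
        rows.append([float(x) for x in fields[1:]])
    return genes, experiments, np.array(rows, dtype=float)

# Tiny in-memory stand-in for the real file (values are invented).
demo = io.StringIO("gene\tt0\tt1\ngeneA\t0.5\t-0.25\ngeneB\t-1.0\t1.5\n")
genes, experiments, matrix = read_expression_table(demo)
print(genes, experiments, matrix.shape)
```

With the real file opened via `open("cellcycle.txt")`, the returned matrix should have shape (3000, 73) if the layout assumption holds.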

(2) Download the MeV 4.0 software for microarray gene expression clustering from http://www.tm4.org/mev.html (or use another clustering tool if you like).

(3) FuncAssociate web interface for gene ontology analysis (http://llama.med.harvard.edu/cgi/func/funcassociate). You can use other similar tools if you prefer.

(4) AlignACE or MEME for motif finding. Both tools have standalone versions and web interfaces. AlignACE is available at http://atlas.med.harvard.edu/. MEME can be accessed and downloaded at http://meme.sdsc.edu/meme/intro.html. I believe both web interfaces limit the input size, so if the number of sequences you provide exceeds the limit, you may have to download and install a standalone version on your local computer.

Problem 1: Clustering of microarray data (15 points)

Motif 1 Description

Motif 1 in BLOCKS format

19 ( 161) AATACAATCAGCTGC 1

7 ( 23) ACAACACTCAGAGTC 1

4 ( 16) CAAACACAAAACGGT 1
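The site lines above can be turned into per-position base counts, which is a common first step when summarizing a motif. The sketch below assumes each line has the form "sequence name, start position in parentheses, site, weight". That is how the three lines above read, but it is an assumption about the format in general. The helper names `parse_blocks_sites` and `position_counts` are made up for this example.

```python
from collections import Counter

def parse_blocks_sites(lines):
    """Extract (sequence name, start, site) from BLOCKS-style site lines.

    Lines that do not split into name, start, site, weight are skipped.
    """
    sites = []
    for line in lines:
        parts = line.replace("(", " ").replace(")", " ").split()
        if len(parts) != 4 or not parts[1].isdigit():
            continue
        name, start, site, _weight = parts
        sites.append((name, int(start), site))
    return sites

def position_counts(sites):
    """Count how often each base occurs at each motif position."""
    width = len(sites[0][2])
    return [Counter(site[i] for _, _, site in sites) for i in range(width)]

block = [
    "19 ( 161) AATACAATCAGCTGC 1",
    "7 ( 23) ACAACACTCAGAGTC 1",
    "4 ( 16) CAAACACAAAACGGT 1",
]
sites = parse_blocks_sites(block)
counts = position_counts(sites)
for i, column in enumerate(counts, start=1):
    print(i, dict(column))
```

Positions where all three sites agree (for example, the fourth and fifth positions) show up as a single base with count 3.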

Bonus (5 points)