Gene Expression Analysis and Statistics: A Case Study in Bioinformatics - Prof. Nicholas S, Lab Reports of Computer Science

Utah State University (USU), Computer Science

Prof. Nicholas S. Flann

A case study in gene expression analysis and statistics in bioinformatics. It covers background on gene expression analysis with microarray technology, data preprocessing, differential expression testing, and multiple comparison adjustment, along with visualization and annotation tests. The case study uses acute lymphoblastic leukemia (ALL) data with 12,625 genes and 128 samples, and focuses on the main ideas of the analysis.

Typology: Lab Reports

Pre 2010

Uploaded on 07/30/2009

Gene Expression Analysis and Statistics: A Case Study

Bioinformatics: Problems and Solutions

Dr. John R. Stevens

Utah State University

CS 5890/6890

9 July 2007

Outline

Background – ALL data

Differential Expression

Multiple Comparisons

Report File

Visual Summaries

Annotation Test

Reference: Bioinformatics and Computational Biology Solutions Using R and Bioconductor (2005), edited by Gentleman et al.

Here, focus on the main ideas

Microarray Technology: General Idea

Use mRNA transcript abundance level as a measure of the level of “expression” for the corresponding gene

Represent thousands of genes on an array

each spot on array corresponds to a specific gene (a set of probes on the array represent a gene)

“apply” a prepared sample to chip (array)

mRNA in sample hybridizes to array

fluoresces

look at spot intensities

intensity assumed proportional to mRNA transcript abundance level

Background: ALL data

Acute Lymphoblastic Leukemia (Ritz lab)

12,625 genes

128 samples (arrays)

phenotypic data on all 128 patients, including:

95 B-cell cancer patients

33 T-cell cancer patients

Reference: Chiaretti et al., Blood (2004) 103(7)

Array images and “quality”

(code chunk 1)

residual images from 2 of the 128 arrays

Boxplots and “Preprocessing”

(code chunk 2)

boxplots of log2 intensities from 20 of the 128 arrays
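The numbers behind such a boxplot can be sketched as follows. This is a minimal illustration, not the slides' code chunk 2: the intensities are made up, and `log2_summary` is a hypothetical helper that log2-transforms one array's intensities and returns the five values a boxplot draws.

```python
import math
import statistics

def log2_summary(intensities):
    """Return (min, Q1, median, Q3, max) of the log2-transformed intensities."""
    logged = sorted(math.log2(x) for x in intensities)
    q1, med, q3 = statistics.quantiles(logged, n=4, method="inclusive")
    return logged[0], q1, med, q3, logged[-1]

# Made-up raw intensities for two arrays (illustration only).
arrays = {
    "array_A": [64, 128, 256, 512, 1024],
    "array_B": [128, 256, 512, 1024, 2048],
}
for name, values in arrays.items():
    print(name, log2_summary(values))
```

Arrays whose summaries are shifted relative to each other, as array_B is here, are the kind of difference that preprocessing is meant to remove.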

Common preprocessing methods: each has strengths and weaknesses

MAS5

Affymetrix’s algorithm

dChip

Li & Wong’s model (probe-level model)

vsn

“Variance Stabilization”

RMA

fits probe-level model to all arrays (assume probes have same “effect” on each array)

GCRMA

RMA-based, with additional adjustment for position of G & C nucleotides in probe sequence

(more proposed all the time – active area of research)
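One ingredient of RMA is quantile normalization, which forces every array to share the same intensity distribution. The sketch below shows only that step (not background correction or probe-level summarization), on made-up numbers, with no special handling of ties; `quantile_normalize` is a hypothetical helper, not a Bioconductor function.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (ties are not treated specially)."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the r-th smallest value across arrays.
    rank_means = [
        sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from smallest to largest intensity on this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized

raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # two arrays, three probes (made up)
print(quantile_normalize(raw))
```

After this step each array contains exactly the same set of values; only the assignment of values to probes differs between arrays.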

Each point represents a single gene’s expression level on one array after preprocessing all the arrays together

Notice similarities and differences

(code chunk 3)

Simple / Naïve test of DE

“Observe” gene expression levels under two conditions (after preprocessing)

$Y_{ijk}$ = log expr. level for gene $k$ in replicate $j$ of “treatment” $i$

$\bar{Y}_{i \cdot k}$ = ave. log expr. for gene $k$ in treatment $i$

Calculate: average log fold change

$LFC_k = \bar{Y}_{1 \cdot k} - \bar{Y}_{2 \cdot k}$ = ave. log fold change for gene $k$

Make a cut-off: $R$

Gene $k$ is “significant” if $|LFC_k| > R$
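The naïve rule can be sketched in a few lines. This is a minimal illustration on made-up log2 expression values; it assumes the log fold change is taken as the treatment-1 mean minus the treatment-2 mean, and the function names and cut-off value are hypothetical.

```python
from statistics import mean

def log_fold_change(group1, group2):
    """Average log expression in treatment 1 minus that in treatment 2."""
    return mean(group1) - mean(group2)

def naive_de_calls(expr, cutoff):
    """Flag a gene as 'significant' when |LFC| exceeds the cut-off R.

    expr maps gene name -> (log2 values in treatment 1, log2 values in treatment 2).
    """
    return {
        gene: abs(log_fold_change(g1, g2)) > cutoff
        for gene, (g1, g2) in expr.items()
    }

# Made-up preprocessed log2 expression values (illustration only).
expr = {
    "gene_1": ([8.0, 8.5, 9.0], [6.0, 6.5, 7.0]),
    "gene_2": ([7.0, 7.2, 7.1], [7.0, 6.9, 7.3]),
}
calls = naive_de_calls(expr, cutoff=1.0)
print(calls)
```

Note that the rule uses only the two group means; the spread of the replicates plays no role.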

What’s wrong with the naïve test?

Summary interpretation okay:

LFC > 0 for “up-regulated” genes

LFC < 0 for “down-regulated” genes

Ignores variability of estimate

cannot really test for “significance”

what if larger LFC have large variability?

then not necessarily significant

How to use this to “test” for DE?

What is being tested?

Null: No change for gene k

Under null, $t_k \sim t$ dist. with $n_k$ d.f.

(expected distribution under repeat sampling)

But what is needed to do this?

“Large” sample size

Estimate $\sigma_k$ = “pop. SD” for gene $k$

“parametric” assumption
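The preview does not show how the per-gene t statistic is defined, so the sketch below assumes the classical pooled-variance two-sample t statistic, with d.f. equal to the total number of replicates minus 2. The data are made up; both genes have the same log fold change but very different variability.

```python
import math
from statistics import mean, variance

def pooled_t(group1, group2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled estimate of the per-gene variance (sigma_k squared).
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, df

# Same LFC (about 1.0) for both genes, very different variability (made-up data).
t_steady, df_steady = pooled_t([8.0, 8.1, 7.9], [7.0, 7.1, 6.9])
t_noisy, df_noisy = pooled_t([10.0, 8.0, 6.0], [9.0, 7.0, 5.0])
print(round(t_steady, 2), round(t_noisy, 2), df_steady)
```

The steady gene gets a large t statistic and the noisy gene a small one, which is the distinction the LFC cut-off alone cannot make.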

(code chunk 3.5)

What if we don’t have a large enough sample size? (note: probably don’t)

Two main problems:

Estimate $\sigma_k$ (especially for small sample size)

Appropriate sampling distribution of test stat.

Basic solutions:

To estimate $\sigma_k$: Pool information across genes

For comparison:

use parametric assumption on “improved” test stat.

use non-parametric methods
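As one example of a non-parametric comparison, an exact permutation test replaces the t distribution with the distribution of the statistic over all relabelings of the samples. The sketch below is a generic illustration on made-up data, not a method taken from the slides; it enumerates every relabeling, so it is only practical for small samples.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group1, group2):
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    observed = abs(mean(group1) - mean(group2))
    count = total = 0
    # Every way of choosing which samples carry the "group 1" label.
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            count += 1
    return count / total

p = permutation_p_value([8.0, 8.5, 9.0, 9.5], [6.0, 6.5, 7.0, 7.5])
print(p)
```

With 4 replicates per group there are only 70 relabelings, so the smallest achievable two-sided p-value is 2/70; small samples limit the resolution of this approach.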

Assumptions in linear model (Smyth)

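Obtain estimates $\hat{\alpha}_k$ and $s_k^2$, with $\mathrm{Var}[\hat{\alpha}_k] = V_k s_k^2$

For covariate $w$: $\hat{\alpha}_{kw} \mid \alpha_{kw}, \sigma_k^2 \sim N(\alpha_{kw},\, v_{kw}\sigma_k^2)$, where $v_{kw} = [V_k]_{ww}$

resid. d.f.: $d_k = n - p$

Then $t_{kw} = \hat{\alpha}_{kw} / (s_k \sqrt{v_{kw}}) \sim t_{d_k}$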

k not necessary here

This is a traditional statistical linear regression model.
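For a single gene and a two-group design, the regression quantities can be computed in closed form. The sketch below assumes a design with an intercept and one 0/1 group indicator (so p = 2); the function name and the data are hypothetical, and real analyses would fit all genes at once with a linear-model package.

```python
import math

def fit_two_group(y, group):
    """OLS fit of y = a0 + a1 * group for one gene.

    Returns (a1_hat, s2, v, t, d): the group coefficient, the residual
    variance, the unscaled variance of a1_hat, the ordinary t statistic,
    and the residual degrees of freedom.
    """
    n, p = len(y), 2
    sx = sum(group)
    sxx = sum(g * g for g in group)
    sy = sum(y)
    sxy = sum(g * yi for g, yi in zip(group, y))
    det = n * sxx - sx * sx            # determinant of X'X
    a1 = (n * sxy - sx * sy) / det     # coefficient for the group covariate
    a0 = (sy - a1 * sx) / n            # intercept
    resid = [yi - (a0 + a1 * g) for yi, g in zip(y, group)]
    d = n - p                          # residual d.f.
    s2 = sum(r * r for r in resid) / d
    v = n / det                        # (X'X)^{-1} diagonal entry for a1
    t = a1 / math.sqrt(s2 * v)
    return a1, s2, v, t, d

y = [7.0, 7.1, 6.9, 8.0, 8.1, 7.9]     # made-up log2 expression values
group = [0, 0, 0, 1, 1, 1]
a1, s2, v, t, d = fit_two_group(y, group)
print(round(a1, 3), round(s2, 4), round(v, 4), round(t, 2), d)
```

With this design the ordinary t statistic coincides with the pooled two-sample t statistic, and v equals 1/3 + 1/3 for three replicates per group.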

Hierarchical model to borrow informationacross genes (Smyth): eBayes

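Assume prior distribution: $\frac{1}{\sigma_k^2} \sim \frac{1}{d_0 s_0^2}\chi^2_{d_0}$ ($d_0$ and $s_0^2$ estimated from data using empirical Bayes methods, using all of the genes)

Consider posterior mean: $\tilde{s}_k^{-2} = E[\sigma_k^{-2} \mid s_k^2]$, so that $\tilde{s}_k^2 = \frac{d_0 s_0^2 + d_k s_k^2}{d_0 + d_k}$

Then the “moderated” t statistic: $\tilde{t}_{kw} = \hat{\alpha}_{kw} / (\tilde{s}_k \sqrt{v_{kw}}) \sim t_{d_0 + d_k}$, where $v_{kw}$ is the unscaled variance of $\hat{\alpha}_{kw}$

The extra $d_0$ d.f. represents added information from using all genes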
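The shrinkage step can be sketched as below. This is a minimal illustration that takes the prior values d0 and s0² as given (in practice they are estimated from all genes by empirical Bayes, which is not shown here); the function name and all numbers are hypothetical.

```python
import math

def moderated_t(alpha_hat, s2, v, d, d0, s0_sq):
    """Moderated t statistic and its degrees of freedom.

    alpha_hat : estimated coefficient for the gene
    s2, d     : the gene's residual variance and residual d.f.
    v         : unscaled variance of alpha_hat
    d0, s0_sq : prior d.f. and prior variance (assumed already estimated)
    """
    # Shrink the gene-wise variance toward the prior variance.
    s2_tilde = (d0 * s0_sq + d * s2) / (d0 + d)
    return alpha_hat / math.sqrt(s2_tilde * v), d0 + d

# A gene with an unusually small sample variance (made-up numbers).
t_ordinary = 1.0 / math.sqrt(0.01 * (2 / 3))
t_mod, df_mod = moderated_t(alpha_hat=1.0, s2=0.01, v=2 / 3, d=4, d0=4, s0_sq=0.09)
print(round(t_ordinary, 2), round(t_mod, 2), df_mod)
```

The gene's small variance is pulled toward the prior value, so the moderated statistic is less extreme than the ordinary one, while the degrees of freedom increase from d to d0 + d.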
