Making and Using DNA Microarrays: A Comprehensive Guide - Prof. Sandra L. Rodriguez-Zas, Study notes of Data Analysis & Statistical Methods

An in-depth exploration of DNA microarrays, their creation through robotic spotting and in-situ synthesis, and their application in gene expression analysis. Topics include microarray technologies, making microarrays, labeling methods, and image acquisition. This guide is intended for students of molecular biology and genetics.

Typology: Study notes

2010/2011

Uploaded on 10/11/2011

ANSC 545 - 1 - Created by Sandra Rodriguez-Zas©
ANALYSIS OF GENE EXPRESSION PROFILES
CHAPTER ONE Microarrays: Making Them and Using Them
DNA microarray:
set of DNA reagents on a solid surface
solid surface (e.g. microscope slide) with single-strand nucleotide chain
external cDNA sample is hybridized to DNA
detect abundance of labeled nucleic acids in a sample
measures the expression of many thousands of genes simultaneously
Reporters = probe = element = DNA on array
Hybridization extract = target = sample DNA
Slide = array = microarray
Two-dye (spotted arrays) and one-dye arrays (in situ synthesized chips)
(Hamadeh and Afshari, 2000) (http://jcsmr.anu.edu.au)
Array assesses abundance of thousands of sequence transcripts in sample
Gene expression array experiments involve large and complex data sets
Global analysis of gene expression levels:
Allows identification of differentially expressed genes
Can suggest gene function and identify networks
Can help detect genes associated with states (e.g. disease)
Can be used as diagnostic tool
Can suggest targets for new drugs or bioengineering

SECTION 1.2 MAKING MICROARRAYS

Two main technologies: robotic spotting and in-situ synthesis

Robotic spotting

Spotting robot

Pins are held in a cassette mounted on a robot arm

Arm moves between microtiter well plates and arrays to deposit probes

Each pin spots a different grid on the array

Steps of spotted array synthesis

Spotted Microarrays

Process:

1. Make DNA probes (parallel PCR or presynthesized oligonucleotides)

2. Robotic spotting of DNA onto glass surface

3. Post-spotting processing

array surface is fixed to minimize nonspecific attachment of target DNA to the glass

Attachment of sequences to slide:

covalent

non-covalent

Covalent:

aliphatic amine group is added to 5’ end of probe

probes are attached to glass from 5’ end

SECTION 1.3 USING MICROARRAYS

Steps:

1. Sample preparation and labeling

2. Hybridization

3. Washing

4. Image acquisition

Sample preparation and labeling

Variable process of extraction of RNA from the sample

Labeling:

biotin: Affymetrix oligonucleotide arrays

fluorescent dyes:

Cy3 (excited by laser at approximate wavelength 532 nm, fluoresces green)

Cy5 (excited by laser at approximate wavelength 635 nm, fluoresces red)

Spotted array experiments:

two samples are competitively hybridized to the arrays

(each sample receives a different dye)

simultaneous measurement of both samples

Most common labeling methods:

Direct incorporation (labeling):

mRNA is primed with a poly-T primer

reverse transcription starts from the poly-A tail at the 3' end of the mRNA

dCTP (or dUTP) with fluorescent Cy dye is added

transcripts are typically a few hundred bases complementary to 3’ end

Cy5 is incorporated less well by reverse transcriptase than Cy3

Indirect labeling

use amino-allyl-modified dC (smaller than the Cy-modified dCTP)

reverse transcription is more efficient

following reverse transcription:

cDNA is reacted with active ester of the dye

dye becomes attached to the modified dCs in the cDNA

advantage:

each target has the same "foreign" base incorporated at the same rate

Cy5 is incorporated as well as Cy3 by reverse transcriptase

transcripts are typically a few hundred bases complementary to 3’ end

Hybridization:

formation of base-paired heteroduplexes between probe and target

influenced by temperature, humidity, concentrations (salt, formamide)

volume of target solution

operator

robotic or manual

After hybridization, washing of slides:

remove excess hybridization solution

(so that only labeled target that has bound specifically to the features remains on the array)

reduce cross-hybridization

(so that only DNA complementary to each feature remains bound)

KEY POINTS SUMMARY

Microarray technologies:

• Spotted cDNAs

• Spotted oligonucleotides

• Light-directed in-situ synthesized arrays

• Inkjet in-situ synthesized arrays

Steps in using a microarray:

Target preparation

Hybridization

Washing

Image acquisition using a scanner

CHAPTER TWO Sequence Databases for Microarrays

Bioinformatics databases can be used to:

1. Annotate or find information about the microarray probes.

Identify genes represented by probes that may be differentially expressed in the experiment.

2. Design probes to include in a microarray.

Obtain information about sequences and splice variants.

1. Primary databases are main source of general sequence information:

GenBank, EMBL, DNA Data Bank of Japan (DDBJ)

2. Secondary and genomic databases are useful for microarray design

a. UniGene

Cluster of multiple ESTs and mRNA representing a unique gene

Splice variants are not identified and are clustered together

Multiple species, no consensus sequence of cluster

b. TIGR Gene Indices

Multiple species (more than UniGene), consensus sequence of cluster

Splice variants are identified

c. Reference Sequence (RefSeq)

Fewer species and sequences than UniGene and TIGR

Well-curated clusters and splice variants are identified

DESCRIPTION OF 2-DYE MICROARRAY EXPERIMENT

EXAMPLE USED THROUGHOUT THE COURSE

During the course we will study, in class and as part of the homework assignments, the expression of 100 genes in three neural tissues in mice

Tissues (samples):

Cerebral Cortex

Mid Brain

Spinal Cord

Experimental design:

reference standard sample

each 2-dye microarray includes one reference sample

there are 2 microarrays per tissue in a dye-swap design: each tissue receives the Cy3 dye in one microarray and the Cy5 dye in the other, and the reference receives the alternative dye in each microarray:

For each tissue:

Microarray I: Cy3=tissue, Cy5=reference

Microarray II: Cy5=tissue, Cy3=reference

Each gene is double spotted in the microarray
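
As a check on the design described above, the layout can be enumerated in a short SAS data step. This is only a sketch: the data set name design and the variables array, gene and spot are assumptions made for illustration, and the tissue labels are written as single words. With 3 tissues, 2 dye-swapped arrays per tissue, 100 genes and 2 spots per gene, the data step writes 3 x 2 x 100 x 2 = 1200 records.

```sas
/* Sketch: enumerate 3 tissues x 2 dye-swapped arrays x 100 genes x 2 spots. */
/* Data set and variable names (design, array, gene, spot) are hypothetical. */
data design;
  length tissue cy3 cy5 $14;
  do tissue = 'CerebralCortex', 'MidBrain', 'SpinalCord';
    do array = 1 to 2;
      /* Array I: Cy3 = tissue, Cy5 = reference; Array II: dyes swapped */
      if array = 1 then do;
        cy3 = tissue;
        cy5 = 'Reference';
      end;
      else do;
        cy3 = 'Reference';
        cy5 = tissue;
      end;
      do gene = 1 to 100;
        do spot = 1 to 2;   /* each gene is spotted twice */
          output;
        end;
      end;
    end;
  end;
run;

/* Count records per tissue and per dye assignment */
proc freq data=design;
  tables tissue cy3 cy5;
run;
```

Under these assumptions PROC FREQ reports 400 records per tissue and, for each dye, 600 reference records and 200 records per tissue.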

Statistical software used throughout the course: SAS

SAS tutorials:

http://www.umanitoba.ca/centres/mchp/teaching/sasmanual/index.html

http://www.itc.virginia.edu/research/sas/training/v8/

http://www.ats.ucla.edu/stat/sas/

SAS Manual: http://support.sas.com/documentation/onlinedoc/sas9doc.html

TWO-DYE SYSTEM DATA FILE FORMAT

There is one file per microarray

After the images are scanned and the fluorescence intensities are translated into numbers, a typical data file (for example, a .gpr file) has the following structure:

rows = microarray probes

columns = information on each microarray probe (e.g. location, intensity value)

The number of columns varies with the program settings. The most critical

columns are:

Block number

Column number (within block)

Row number (within block)

Name of microarray probe or element

Identification of microarray probe or element

Location of spot on the horizontal axis of the array

Location of spot on the vertical axis of the array

Diameter of the spot

Median foreground intensity of Cy5 pixels or median of F

Mean foreground intensity of Cy5 (red) pixels

Standard deviation of foreground intensity of Cy5 pixels

Median background intensity of Cy5 pixels or median B

Mean background intensity of Cy5 pixels

Standard deviation of background intensity of Cy5 pixels

[Repeat intensity measurements for Cy3 or 532 or green]

Ratio of Medians

Ratio of Means

Median of Ratios

Mean of Ratios Foreground Background

Flags (negative values denote unsuitable spots, 0 denotes OK spot)
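
The flag convention above (negative = unsuitable, 0 = OK) can be applied with a subsetting IF in a data step. The sketch below is illustrative only: the data set names spots and good_spots, the shortened column names and the three records are assumptions, not part of the course data.

```sas
/* Illustrative records: id, flag, median Cy5 and Cy3 foreground (made-up values) */
data spots;
  input id $ Flags cy5_median cy3_median;
  datalines;
g1 0 500 1000
g2 -100 25 30
g3 0 1030 1490
;
run;

/* Keep only spots that were not flagged as unsuitable (Flags not negative) */
data good_spots;
  set spots;
  if Flags >= 0;   /* subsetting IF: records failing the condition are dropped */
run;

proc print data=good_spots;
run;
```

Here the record with Flags = -100 is dropped and the other two are kept. Whether flagged spots should be deleted or only down-weighted is an analysis decision that this sketch does not settle.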

DESCRIPTION OF 2-DYE MICROARRAY DATA SET TO BE

ANALYZED INCLUDING ALL 6 MICROARRAYS

All 6 files were appended vertically one after the other.

All files have the same column order and are comma-delimited

Prior to appending, three new columns (tissue, cy3 and cy5) were created in each file to denote the tissue and dye assignment of each file

The new columns allow the identification of arrays

Creation of additional columns can be done in Excel

Files can be appended vertically with Excel, Word, a text editor, or a Unix command
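
The same tagging and appending can also be done inside SAS. The following is a minimal sketch, assuming each array has already been read into its own data set; the data set names array1, array2 and stacked, and the few records shown, are hypothetical. The SET statement stacks the data sets vertically, and the IN= data set option identifies which array each record came from so that the tissue and dye columns can be filled in.

```sas
/* Two illustrative single-array data sets (records are for illustration only) */
data array1;
  input id $ block column row Flags cy5_fore cy5_back cy3_fore cy3_back;
  datalines;
1 3 2 3 0 500 40 1000 50
2 3 3 3 0 1030 55 1490 45
;
run;

data array2;
  input id $ block column row Flags cy5_fore cy5_back cy3_fore cy3_back;
  datalines;
1 3 2 3 0 1460 180 460 80
;
run;

/* Stack the arrays and add the three identifying columns */
data stacked;
  length tissue cy3 cy5 $14;
  set array1(in=in1) array2(in=in2);
  tissue = 'Cer';
  if in1 then do;          /* Microarray I: Cy3 = tissue, Cy5 = reference */
    cy3 = 'Cer';
    cy5 = 'Ref';
  end;
  else if in2 then do;     /* Microarray II: Cy5 = tissue, Cy3 = reference */
    cy3 = 'Ref';
    cy5 = 'Cer';
  end;
run;
```

The remaining arrays would be added to the SET statement in the same way, each with its own IN= variable.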

id   tissue  cy3  cy5  block  column  row  Flags  cy5_fore  cy5_back  cy3_fore  cy3_back
1    Cer     Cer  Ref      3       2    3      0       500        40      1000        50
1    Cer     Ref  Cer      3       2    3      0      1460       180       460        80
1    Cer     Cer  Ref      3       3    3      0      1030        55      1490        45
1    Spi     Spi  Ref      3       2    3   -100        25        20        30        21
100  Mid     Ref  Mid     32      24   20      0     20000        80      1000        60

id = Unique identifier for sequence

tissue = Name of the tissue

cy3 = Name assigned to the cy3 labeled sample

cy5 = Name assigned to the cy5 labeled sample

block = Printing block of the array

column = Column within the array block

row = Row within the array block

Flags = Flags provided by software

cy5_fore = foreground of cy5 labeled sample

cy5_back = background of cy5 labeled sample

cy3_fore = foreground of cy3 labeled sample

cy3_back = background of cy3 labeled sample
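
One common use of these columns, shown here only as a hedged sketch and not as the course's prescribed analysis, is to subtract background from foreground and express each spot as a log2 ratio of tissue to reference. Because of the dye swap, the tissue is on Cy3 in one array and on Cy5 in the other, so the ratio is oriented by comparing the cy5 column with the tissue column. The names demo, ratios, cy5_net, cy3_net and logratio are assumptions; the two records are the first two rows of the table above.

```sas
/* Two records from the example table: same spot, dye-swapped arrays */
data demo;
  length id $20 tissue cy3 cy5 $14;
  input id $ tissue $ cy3 $ cy5 $ block column row Flags
        cy5_fore cy5_back cy3_fore cy3_back;
  datalines;
1 Cer Cer Ref 3 2 3 0 500 40 1000 50
1 Cer Ref Cer 3 2 3 0 1460 180 460 80
;
run;

data ratios;
  set demo;
  /* Background-subtracted (net) intensities for each dye */
  cy5_net = cy5_fore - cy5_back;
  cy3_net = cy3_fore - cy3_back;
  /* The log ratio is left missing unless both net intensities are positive */
  if cy5_net > 0 and cy3_net > 0 then do;
    /* Orient the ratio as tissue / reference regardless of dye */
    if cy5 = tissue then logratio = log2(cy5_net / cy3_net);
    else logratio = log2(cy3_net / cy5_net);
  end;
run;

proc print data=ratios;
  var id tissue cy3 cy5 cy5_net cy3_net logratio;
run;
```

For the first record the net intensities are 460 (Cy5) and 950 (Cy3); for the second they are 1280 (Cy5) and 380 (Cy3).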

SAS code to read the data file including the 6 microarrays

(available in the “SAS code” folder on the course website)

Data (available in the “Demo data” folder on the course website)

options ls=110; * set page output dimensions;

data all;
  * Allocate sufficient size for variables;
  informat id $20. tissue cy3 cy5 $14.;
  /* Identify file, you must change this:
     A) To the name of the file assigned to you
     B) Under Windows, you need the full pathname
     lrecl=2000 tells SAS to expect lines up to 2000 characters in length
     dlm=',' tells SAS that a comma separates (delimits) columns */
  infile 'C:\Documents and Settings\Rodrgzzs\My Documents\ArrayClass\ARRAY_DATA_1' lrecl=2000 dlm=',';
  /* Tell SAS the names of the columns
     $ denotes that the preceding column has characters in it
     Variable Names:
     id = Unique identifier for sequence
     tissue = Name of the tissue
     cy3 = Name assigned to the cy3 labeled sample
     cy5 = Name assigned to the cy5 labeled sample
     block = Printing block of the array
     column = Column within the array block
     row = Row within the array block
     Flags = Flags provided by software
     cy5_fore = foreground of cy5 labeled sample
     cy5_back = background of cy5 labeled sample
     cy3_fore = foreground of cy3 labeled sample
     cy3_back = background of cy3 labeled sample */
  input id $ tissue $ cy3 $ cy5 $ block column row Flags cy5_fore cy5_back cy3_fore cy3_back;
run;

* Check data entry and obtain descriptive statistics;
proc means data=all; * provides default statistics;
run;
*proc means data=all n nmiss mean stddev stderr min max p1 p5 p10 p25 p50 p75 p90 p95 p99 cv; * specify statistics;
*run;
proc freq data=all;
  tables tissue cy3 cy5 flags;
run;

* You can sort the data by group and check the data by group;
*proc sort data=all; *by flags; *run;
*proc means data=all; *var cy3_fore cy3_back; *by flags; *run;
*proc freq data=all; *tables flags*cy3 flags*cy5 flags*tissue; *run;

Use PROC MEANS to obtain means, standard deviations, number of observations, minimum and maximum values of the quantitative variables

Simplified example of SAS procedure to obtain descriptive statistics (e.g. averages, minimum, etc.) for the data “all”

proc means data=all;

run;

Use PROC FREQ to count the absolute and relative frequencies of the values of qualitative and quantitative variables (e.g. tissue, block)

Simplified example of SAS procedure to obtain frequencies

proc freq data=all;

tables tissue cy3 cy5 flags;

run;
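
PROC FREQ can also cross-tabulate two variables by joining them with an asterisk in the TABLES statement, for example to see how flagged spots are distributed across tissues. The sketch below uses a small data set named demo (an assumption) built from the first four rows of the example table; with the full data, data=all would be used instead.

```sas
/* Four records from the example table */
data demo;
  length id $20 tissue cy3 cy5 $14;
  input id $ tissue $ cy3 $ cy5 $ block column row Flags
        cy5_fore cy5_back cy3_fore cy3_back;
  datalines;
1 Cer Cer Ref 3 2 3 0 500 40 1000 50
1 Cer Ref Cer 3 2 3 0 1460 180 460 80
1 Cer Cer Ref 3 3 3 0 1030 55 1490 45
1 Spi Spi Ref 3 2 3 -100 25 20 30 21
;
run;

/* Two-way table of flag value by tissue; the options suppress the percentages */
proc freq data=demo;
  tables Flags*tissue / nopercent norow nocol;
run;
```

With these four records the table shows three unflagged Cer spots and one Spi spot flagged -100.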

To get descriptive statistics or frequencies by group level in SAS, the data must be sorted by group and the procedure must include a “by” statement

Use PROC SORT (followed by the “by” statement) to sort data sets by the levels of the variable(s) of interest (e.g. tissue).

For example,

proc sort data=all;

by tissue;

run;

proc means data=all;

var cy3_fore;

by tissue;

run;
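
As an alternative to sorting, PROC MEANS accepts a CLASS statement, which produces statistics per group without requiring the data to be sorted first. The sketch below is illustrative: the data set demo is an assumption built from four rows of the example table, and with the full data data=all would be used instead.

```sas
/* Four records from the example table, deliberately not sorted by tissue */
data demo;
  length id $20 tissue cy3 cy5 $14;
  input id $ tissue $ cy3 $ cy5 $ block column row Flags
        cy5_fore cy5_back cy3_fore cy3_back;
  datalines;
1 Cer Cer Ref 3 2 3 0 500 40 1000 50
1 Spi Spi Ref 3 2 3 -100 25 20 30 21
1 Cer Ref Cer 3 2 3 0 1460 180 460 80
1 Cer Cer Ref 3 3 3 0 1030 55 1490 45
;
run;

/* CLASS groups the statistics by tissue; no PROC SORT is needed */
proc means data=demo n mean std min max;
  class tissue;
  var cy3_fore;
run;
```

The BY approach prints a separate table per group, whereas CLASS prints one table with a row per group.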

Partial SAS output

The MEANS Procedure

Variable       N           Mean        Std Dev        Minimum    Maximum
block       1200     17.0800000      9.1609790      1.0000000        32.
column      1200     12.8900000      7.2898687      1.0000000        24.
row         1200      4.3000000      3.3074334      1.0000000        20.
Flags       1200     -2.0000000      9.8020440    -50.0000000          0
cy5_fore    1200        1933.46        3631.92    107.0000000     37140.
cy5_back    1200    183.2875000     69.5292752     95.0000000       434.
cy3_fore    1200        2079.78        3791.55    137.0000000     49034.
cy3_back    1200    212.3533333     82.4677064     79.0000000       604.

The FREQ Procedure

                                          Cumulative    Cumulative
tissue            Frequency    Percent     Frequency       Percent
CerebralCortex          400      33.33           400         33.33
MidBrain                400      33.33           800         66.67
SpinalCord              400      33.33          1200        100.00

                                          Cumulative    Cumulative
cy3               Frequency    Percent     Frequency       Percent
CerebralCortex          200      16.67           200         16.67
MidBrain                200      16.67           400         33.33
Reference               600      50.00          1000         83.33
SpinalCord              200      16.67          1200        100.00

                                          Cumulative    Cumulative
cy5               Frequency    Percent     Frequency       Percent
CerebralCortex          200      16.67           200         16.67
MidBrain                200      16.67           400         33.33
Reference               600      50.00          1000         83.33
SpinalCord              200      16.67          1200        100.00

                                          Cumulative    Cumulative
Flags             Frequency    Percent     Frequency       Percent
-50                      48       4.00            48          4.00
0                      1152      96.00          1200        100.00