Multiple Sequence Alignment - Lecture Slides | BIT 150, Study notes of Bioinformatics

Material Type: Notes; Class: Applied Bioinformatics; Subject: Biotechnology; University: University of California - Davis; Term: Unknown 2006;


Uploaded on 07/30/2009

Multiple Sequence Alignment (MSA)
1. Uses of MSA
2. Technical difficulties
   1. Select sequences
   2. Select objective function
   3. Optimize the objective function
      1. Exact algorithms
      2. Progressive algorithms
      3. Iterative algorithms
      4. Consistency-based algorithms
3. Tools to view alignments
   1. MEGA
   2. BOXSHADE & Seq. LOGOS
Chapter 12 & Notredame 2002.pdf

USES of Multiple Sequence Alignment

Fig. from Boris Steipe, U. of Toronto

Sequence relationships

MSA can be used to:
• Infer function
• Predict secondary structure
• Phylogenetic reconstruction
• Identify residues important for function
• Sensitive database searching algorithms

If the MSA is incorrect, all the above inferences will be incorrect!

[Figure: MSA of ZCCT genes. Cluster analysis of the CCT domains, with sequences grouped as I, II, III, IV (HvCO, Os and AtCO/AtCOL sequences) and ZCCT (ZCCT1 and ZCCT2 from Hv, Tm and Td). Gene diagram showing Exon 1, Exon 2, ZF and CCT domains. Photoperiod response. Yan et al. 2004. Science 303:]


Fig. from Boris Steipe Univ. of Toronto

MSA. Technical difficulties

We need an Objective Function (OF) to measure which alignment is better.

Simplest OF: Sum-of-pairs (SP)

After the best MSA is obtained, non-alignable sequences and spaces facing spaces are removed, and a score is calculated for the induced alignment using any chosen scoring scheme (distance or similarity).

MSA:
  AT-AATG
  CTGAG-G
  ATGAA-G

Induced alignment (one pair of sequences from the MSA):
  AT-AATG
  ATGAA-G

Distance scheme: # mismatches (including -)

Sum-of-pairs distance = 4 + 3 + 2 = 9

Sum-of-pairs score: The SP score of an MSA is the sum of the scores of all the induced pairwise global alignments.

Weighted sum-of-pairs score: each pairwise score can be multiplied by a weight to reflect evolutionary distances.
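The sum-of-pairs distance can be sketched in a few lines of Python. This is a minimal illustration, not the slide author's code: the function names and the `drop_gap_gap` switch are assumptions introduced here. With columns of two spaces removed, as the text describes, the three pairwise distances for the example come out as 4, 2 and 2. Counting that column as a mismatch instead gives 4, 2 and 3, which reproduces the total of 9 shown on the slide.

```python
from itertools import combinations

def induced_pair(row_a, row_b, drop_gap_gap=True):
    """Columns of the pairwise alignment induced by two rows of an MSA."""
    cols = list(zip(row_a, row_b))
    if drop_gap_gap:
        # Remove columns where a space faces a space.
        cols = [(a, b) for a, b in cols if not (a == "-" and b == "-")]
    return cols

def pair_distance(row_a, row_b, drop_gap_gap=True):
    """Distance scheme: number of mismatching columns, gaps included."""
    cols = induced_pair(row_a, row_b, drop_gap_gap)
    return sum(1 for a, b in cols if a != b or a == "-")

def sp_distance(msa, weights=None, drop_gap_gap=True):
    """(Weighted) sum of the distances of all induced pairwise alignments.

    weights is an optional dict keyed by (i, j) with i < j (hypothetical
    format); when omitted every pair has weight 1.
    """
    total = 0.0
    for i, j in combinations(range(len(msa)), 2):
        w = 1.0 if weights is None else weights[(i, j)]
        total += w * pair_distance(msa[i], msa[j], drop_gap_gap)
    return total

msa = ["AT-AATG", "CTGAG-G", "ATGAA-G"]
print(sp_distance(msa))                      # gap-gap columns removed
print(sp_distance(msa, drop_gap_gap=False))  # gap-gap column counted
```

Passing a `weights` dictionary turns the same function into a weighted sum-of-pairs, for example to down-weight pairs of closely related sequences.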

New Objective Functions

Multiple MSA: Depending on the Mutation Data matrix selected and on the selected gap penalties (opening and extension), very different MSAs will be obtained. Which one is the correct one?

New generation of objective functions: less sensitive to gap penalty estimations thanks to the incorporation of local information.

• Segment-to-segment comparisons of the sequences (instead of character-to-character) without gap penalties is the strategy used by DiAlign. This approach is efficient for local similarities (genomic DNA, many protein families). http://bibiserv.techfak.uni-bielefeld.de/dialign/
• Consistency objective function (e.g. T-Coffee, DiAlign). The optimal MSA is defined as the one that agrees the most with all the optimal pairwise alignments. “Given a set of independent observations, the most consistent are often closer to ‘the truth’.”
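A consistency objective function can be illustrated with a small sketch. This is a simplified, unweighted version written for these notes, not the scoring code of T-Coffee or DiAlign: the function names and the `library` format (a dict mapping a pair of sequence indices to the residue pairs aligned in that pair's own pairwise alignment) are assumptions. The score is the fraction of residue pairs aligned in the MSA that are also found in the pairwise library.

```python
from itertools import combinations

def aligned_pairs(row_a, row_b):
    """Residue index pairs (i, j) that two gapped rows place in the same column."""
    pairs, i, j = set(), 0, 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs

def consistency(msa, library):
    """Fraction of aligned residue pairs in the MSA that agree with the library."""
    shared = total = 0
    for s, t in combinations(range(len(msa)), 2):
        in_msa = aligned_pairs(msa[s], msa[t])
        total += len(in_msa)
        shared += len(in_msa & library.get((s, t), set()))
    return shared / total if total else 0.0

# Toy library: here the pairwise alignments are simply read off a reference MSA.
reference = ["AT-AATG", "CTGAG-G", "ATGAA-G"]
library = {(s, t): aligned_pairs(reference[s], reference[t])
           for s, t in combinations(range(len(reference)), 2)}

candidate = ["ATAATG-", "CTGAG-G", "ATGAA-G"]
print(consistency(reference, library))
print(consistency(candidate, library))
```

In practice the library would come from independently computed optimal pairwise alignments, so no single MSA is guaranteed to agree with all of it.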

Clustal example

Unaligned sequences:
  ATAATG
  CTGAGG
  ATGAAG

MSA:
  AT-AATG
  CTGAG-G
  ATGAA-G

Distance scheme: # mismatches (including -)
Gap open=
Gap ext.=

From Boris Steipe Univ. of Toronto

MSA. Technical difficulties

MSA: Exact algorithm

MSA program
• Multidimensional dynamic programming
• Optimizes sum-of-pairs
• More accurate than progressive methods
• BUT… Time proportional to L^n (L = sequence length, n = number of sequences)
• Practical to ~10 seq. of L<200-bp
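The L^n growth of multidimensional dynamic programming can be made concrete with a rough cost estimate. The sketch below assumes a full, unpruned lattice with one cell per combination of sequence prefixes, where each cell looks back at every non-empty subset of sequences that advance by one residue. The function name is an assumption for illustration, and real programs reduce this cost by restricting the search space.

```python
def exact_msa_cost(seq_len, n_seqs):
    """Rough size of a full n-dimensional dynamic programming lattice.

    Returns (cells, cell_updates): one cell per combination of prefixes,
    and 2**n - 1 predecessor cells examined for each of them.
    """
    cells = (seq_len + 1) ** n_seqs
    moves_per_cell = 2 ** n_seqs - 1
    return cells, cells * moves_per_cell

for n in (2, 3, 5, 10):
    cells, updates = exact_msa_cost(200, n)
    print(f"n={n:2d}  cells={cells:.3e}  updates={updates:.3e}")
```

Each extra sequence of length 200 multiplies the number of cells by about 200, which is why exact methods are limited to a handful of short sequences.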

DCA (Divide & Conquer Algorithm)
• Sits on top of MSA program
• Produces simultaneous MSA
• Cuts the sequences into shorter segments that are fed into MSA
• Practical to ~20-30 seq. of L<200-bp
• Easy WEB submission
http://bibiserv.techfak.uni-bielefeld.de/dca/

OMA (Optimal Multiple Alignment)
• Iterative implementation of DCA
• Speeds up DCA
• Decreases memory requirements
http://bibiserv.techfak.uni-bielefeld.de/oma/

Progressive algorithms (ClustalW, MultAlign, AMPS)

Example of Progressive algorithm
• Calculate distances/similarities between sequences
• Construct a tree
• Add sequentially, following tree

CLUSTALW
• Non-iterative & deterministic.
• OF: weighted sum-of-pairs.
• Affine gap penalties.
• Automatic substitution matrix choice.
• Most popular.
• Performs well in dense trees without obvious outliers (needs stepping stones).
• Can use SwissProt secondary structure information for gap penalty estimation.
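The first two steps of a progressive scheme (pairwise distances, then a tree that fixes the order of addition) can be sketched as follows. This is an illustration under stated assumptions, not ClustalW's procedure: plain edit distance stands in for the pairwise alignment scores and average-linkage clustering stands in for the guide tree method. The function names are hypothetical.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance by dynamic programming (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # gap in b
                            curr[j - 1] + 1,            # gap in a
                            prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = curr
    return prev[-1]

def join_order(seqs):
    """Average-linkage clustering; returns the groups merged at each step."""
    dist = {frozenset(p): edit_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(range(len(seqs)), 2)}

    def avg(c1, c2):
        total = sum(dist[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [(i,) for i in range(len(seqs))]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges

# Each merge is the point where a progressive aligner would align two groups.
print(join_order(["AAAA", "AAAT", "GGGG"]))
```

The third step, aligning the two groups at each merge, is where a real program applies its objective function and gap penalties; it is left out here.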

Problems with progressive algorithms

Sequences:
  GARFIELD THE LAST FAT CAT
  GARFIELD THE FAST CAT
  GARFIELD THE VERY FAST CAT
  THE FAT CAT

First pairwise alignment:
  GARFIELDTHEFASTCAT---
  GARFIELDTHELASTFATCAT

After adding the third sequence:
  GARFIELDTHEVERYFASTCAT
  GARFIELDTHEFASTCAT----
  GARFIELDTHELASTFAT-CAT

DCA alignment:
  --------THEFA-----TCAT
  GARFIELDTHEVERYFASTCAT
  GARFIELDTHEFAS----TCAT
  GARFIELDTHELASTFA-TCAT

ClustalW alignment (Blosum, Gap 11-):
  --------THEFAT-----CAT
  GARFIELDTHEVERYFASTCAT
  GARFIELDTHEFASTCAT----
  GARFIELDTHELASTFAT-CAT

Cheaper to open distal gap than to align C and F

1. Bad decisions taken in the initial alignments will persist throughout the process and cannot be corrected.
2. For numerous sequences, it may take a long time to calculate the tree. MUSCLE is a good rapid alternative.
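The difference between the two four-sequence GARFIELD alignments can be quantified with a simple count of identical residue pairs per column. This is a minimal sketch: the identity count is an assumption chosen for illustration and is not the scoring scheme used by either program.

```python
from itertools import combinations

def identical_pairs(msa):
    """Number of identical residue pairs, summed over all columns."""
    return sum(1 for col in zip(*msa)
               for a, b in combinations(col, 2)
               if a == b and a != "-")

dca = ["--------THEFA-----TCAT",
       "GARFIELDTHEVERYFASTCAT",
       "GARFIELDTHEFAS----TCAT",
       "GARFIELDTHELASTFA-TCAT"]

clustalw = ["--------THEFAT-----CAT",
            "GARFIELDTHEVERYFASTCAT",
            "GARFIELDTHEFASTCAT----",
            "GARFIELDTHELASTFAT-CAT"]

print("DCA:     ", identical_pairs(dca))
print("ClustalW:", identical_pairs(clustalw))
```

Under this count the simultaneous (DCA) alignment scores higher, because the final CAT of every sequence ends up in the same columns. In the progressive alignment the terminal gap opened in the first step is never revisited.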

Iterative algorithms
Recurrent modifications of suboptimal solutions

SAGA
• Uses a ‘Genetic Algorithm’
• Can use different objective functions (e.g. Coffee)
• Mutations randomly insert or shift gaps
• Alignments in the population can recombine
• The population evolves; alignments with higher OF scores survive
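The idea of mutating an alignment and keeping the variants with higher OF scores can be sketched as a toy hill climber. This is deliberately much simpler than SAGA: there is a single alignment instead of a population, no recombination, only one mutation operator (shifting a gap by one position), and the objective function is a plain count of identical residue pairs. All names are hypothetical.

```python
import random
from itertools import combinations

def sp_identity(msa):
    """Objective function: identical residue pairs summed over all columns."""
    return sum(1 for col in zip(*msa)
               for a, b in combinations(col, 2)
               if a == b and a != "-")

def shift_gap(msa, rng):
    """Mutation: swap one gap with a neighbouring residue in a random row."""
    rows = [list(r) for r in msa]
    candidates = [(r, c) for r, row in enumerate(rows)
                  for c in range(len(row) - 1)
                  if (row[c] == "-") != (row[c + 1] == "-")]
    if not candidates:
        return list(msa)
    r, c = rng.choice(candidates)
    rows[r][c], rows[r][c + 1] = rows[r][c + 1], rows[r][c]
    return ["".join(row) for row in rows]

def evolve(msa, generations=200, seed=0):
    """Accept a mutant only if the objective function does not decrease."""
    rng = random.Random(seed)
    best, best_score = list(msa), sp_identity(msa)
    for _ in range(generations):
        mutant = shift_gap(best, rng)
        score = sp_identity(mutant)
        if score >= best_score:
            best, best_score = mutant, score
    return best, best_score

start = ["GARFIELDTHEFASTCAT---", "GARFIELDTHELASTFATCAT"]
final, score = evolve(start)
print("\n".join(final), score)
```

Because a gap is only swapped with an adjacent residue, the ungapped sequences are never altered; only the placement of gaps changes.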

Gibbs sampler (Local MSA)
• Finds un-gapped motifs
• Segments are removed or added to increase a P value
• http://bayesweb.wadsworth.org/gibbs/gibbs.html

HMMER (also SAM)
• Build Hidden Markov Models based on seq.
• Align sequences to HMM
• http://hmmer.janelia.org/

GAs and HMMs have been rather disappointing in ab initio alignments.
Better: pre-compute the MSA with another program and then use these methods for optimization.

[Figure: Evolution of a seq. alignment by recombination (compatible ends)]
