Multiple Alignment Motif Finding - Lecture Notes | CMSC 423, Study notes of Computer Science

Material Type: Notes; Class: BIOINFO ALGS, DB, TOOLS; Subject: Computer Science; University: University of Maryland; Term: Unknown 1989;

Uploaded on 02/13/2009

CMSC423: Bioinformatic Algorithms,
Databases and Tools
Lecture 13
multiple alignment
motif finding

Recap

• Multiple alignment is expensive: O(n^k) for k sequences of length n (use the same DP as for pairwise alignment, but on a k-dimensional matrix)

• An approximation algorithm (star alignment) can find a solution in O(n^2 k^2) that is at most twice as bad as the best alignment
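
To get a feel for the gap between the two costs in the recap, here is a minimal sketch that evaluates n^k and n^2 k^2 for concrete values. The function names are made up for illustration, and constant factors are ignored, so the numbers only indicate orders of magnitude.

```python
def exact_dp_cells(n, k):
    """Number of cells in the k-dimensional DP matrix: n^k."""
    return n ** k


def star_alignment_steps(n, k):
    """Order-of-magnitude work for the star approximation: n^2 * k^2."""
    return n ** 2 * k ** 2


if __name__ == "__main__":
    # Five sequences of length 100.
    print(exact_dp_cells(100, 5))
    print(star_alignment_steps(100, 5))
```

For five sequences of length 100 the exact DP needs ten billion cells, while the approximation stays in the hundreds of thousands of steps.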

Iterative alignment revisited

  • Pick a sequence (e.g. SC) as a starting point
  • Align S1 to it & build consensus for the alignment
  • Take S2 and align it to the consensus (instead of SC)
  • repeat...
  • Problem: consensus (or any single sequence) ignores the other sequences being aligned.
  • Solution: keep track of % of each amino-acid aligned in each column
  • score of alignment to profile: combination of scores to each AA.

Profile alignment

  • A profile keeps track of the % of each amino acid (or gap) aligned in each column
  • Score of aligning to a profile: combination of the scores against each amino acid in the column
  • Score(prof1, prof2) = weighted average of the scores of all pairs of amino acids

S1 YFPHF-DLS-----HGSAQVKAHGKKVG-----DALTLAVAHLDDLPGAL
S2 YFPHF-DLS-----HG-AQVKG-GKKVA-----DALTNAVAHVDDMPNAL
S3 FFPKFKGLTTADQLKKSADVRWHAERII-----NAVNDAVASMDDTEKMS
S4 LFSFLKGTSEVP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATL

Example profile columns:
column 2: 100% F
column 17: 50% S, 25% N, 25% -
column 39: 75% A, 25% Q
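
The profile idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's own code: the names `build_profile` and `column_score` are made up, the long dash that appears in S2 in the notes is treated as a single gap character, and the match/mismatch scores of +1/-1 stand in for a real substitution matrix (a gap paired with a gap is simply counted as a match).

```python
from collections import Counter

# The four aligned rows from the example above (S2's long dash read as one gap).
ROWS = [
    "YFPHF-DLS-----HGSAQVKAHGKKVG-----DALTLAVAHLDDLPGAL",
    "YFPHF-DLS-----HG-AQVKG-GKKVA-----DALTNAVAHVDDMPNAL",
    "FFPKFKGLTTADQLKKSADVRWHAERII-----NAVNDAVASMDDTEKMS",
    "LFSFLKGTSEVP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATL",
]


def build_profile(rows):
    """One dict per column: symbol -> fraction of rows showing that symbol."""
    return [
        {ch: count / len(rows) for ch, count in Counter(col).items()}
        for col in zip(*rows)
    ]


def column_score(p, q, match=1.0, mismatch=-1.0):
    """Weighted average of pair scores between two profile columns.

    Each pair (a, b) is weighted by the product of its frequencies.
    """
    return sum(
        pa * qb * (match if a == b else mismatch)
        for a, pa in p.items()
        for b, qb in q.items()
    )


if __name__ == "__main__":
    prof = build_profile(ROWS)
    print(prof[1])    # column 2 (1-based)
    print(prof[16])   # column 17
    print(prof[38])   # column 39
    print(column_score(prof[38], prof[38]))
```

Scoring a single sequence against a profile is the special case where one of the two columns holds a single symbol with frequency 1.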

Iterative alignment

• Take sequences si in order:

  • align s1 with sc - results in gaps being inserted in both sequences
  • align s2 with sc - if gaps must be inserted – insert in previously aligned sequences
  • and so on (note: if gaps coincide with previously introduced gaps no need to change previously aligned sequences)

SC YFPHFDLSHGSAQVKAHGKKVGDALTLAVGHLDDLPGAL

SC YFPHFDLSHGSAQVKAHGKKVGDALTLAVGHLDDLPGAL
S1 YFPHFDLSHG-AQVKG--KKVADALTNAVAHVDDMPNAL

SC YFPHF-DLS-----HGSAQVKAHGKKVG-----DALTLAVAHLDDLPGAL
S1 YFPHF-DLS-----HG-AQVKG-GKKVA-----DALTNAVAHVDDMPNAL
S2 FFPKFKGLTTADQLKKSADVRWHAERII-----NAVNDAVASMDDTEKMS

SC YFPHF-DLS-----HGSAQVKAHGKKVG-----DALTLAVAHLDDLPGAL
S1 YFPHF-DLS-----HG-AQVKG-GKKVA-----DALTNAVAHVDDMPNAL
S2 FFPKFKGLTTADQLKKSADVRWHAERII-----NAVNDAVASMDDTEKMS
S3 LFSFLKGTSEVP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATL
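
The gap bookkeeping described above (new gaps are pushed into the previously aligned sequences, coinciding gaps change nothing) can be sketched as follows. This is a minimal illustration: the function name `add_to_alignment` is made up, the pairwise alignment of the center with the new sequence is assumed to be computed elsewhere, and short toy strings are used instead of the protein sequences.

```python
def add_to_alignment(msa, center_aln, new_aln):
    """Merge one pairwise alignment into an existing multiple alignment.

    msa[0] is the center sequence with the gaps inserted so far.
    (center_aln, new_aln) is a pairwise alignment of the ungapped center
    with the next sequence. Returns the rows of the enlarged alignment.
    """
    cols = list(zip(*msa))
    out = [[] for _ in range(len(msa) + 1)]
    i = j = 0
    while i < len(cols) or j < len(center_aln):
        a = cols[i][0] if i < len(cols) else None
        b = center_aln[j] if j < len(center_aln) else None
        if a is not None and a == b:
            # Same residue, or gaps that coincide: nothing to change.
            column = cols[i] + (new_aln[j],)
            i += 1
            j += 1
        elif a == "-":
            # Gap introduced earlier: the new sequence gets a gap here.
            column = cols[i] + ("-",)
            i += 1
        elif b == "-":
            # New gap in the center: insert it in all earlier rows.
            column = ("-",) * len(msa) + (new_aln[j],)
            j += 1
        else:
            raise ValueError("the two copies of the center disagree")
        for row, ch in zip(out, column):
            row.append(ch)
    return ["".join(row) for row in out]


if __name__ == "__main__":
    current = ["AC-GT", "ACTGT"]          # center and s1
    for row in add_to_alignment(current, "A-CGT", "ATCGT"):
        print(row)
```

In the toy run the new sequence forces a gap after the first column, which is copied into both existing rows, while the gap that was already present becomes a gap in the new row.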

CLUSTALW

• Compute pairwise distances between strings

• Build phylogenetic tree

• Build iterative alignment by following tree edges
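
The first two steps can be sketched as below. This is only an illustration under loud assumptions: edit distance stands in for the pairwise distance, and plain average-linkage clustering stands in for whatever tree-building method the real tool uses. The names `edit_distance` and `guide_tree` are made up for this sketch.

```python
from itertools import combinations


def edit_distance(a, b):
    """Unit-cost edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def guide_tree(seqs):
    """Return a nested tuple describing the merge order of the sequences.

    seqs maps a name to a sequence. Clusters are joined by smallest
    average pairwise distance (an assumption, see the text above).
    """
    dist = {
        frozenset((x, y)): edit_distance(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)
    }
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in seqs]

    def average(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1[1] for y in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average(clusters[p[0]], clusters[p[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


if __name__ == "__main__":
    print(guide_tree({"a": "AAAA", "b": "AAAT", "c": "GGGG", "d": "GGGC"}))
```

The nested tuple gives the order in which alignments would be built: the closest pairs are aligned first, and the resulting groups are then aligned to each other.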

Biological relevance of multiple alignments

Motif finding

Motif finding...example

From genetics.mgh.harvard.edu/sheenlab/

Promoter region                      Gene
TTAGAGGTTGACTATTCAACTTTTGAGGAGGCCTAG TAGAGC
AGCCGACTTGCAACTTAGGCGTGGTCAGTGCCCTAA TAGAGC
GGCCTATTTGGGCCACTTAGACCTTCAACTTTTGCA TAGAGC
CCACAGTTAGATGTCCAAAAGACAAATATAGAGGGC TAGAGC
ACACGGACTGCGTTCAATGCTTACAGCAGATTGAGT TAGAGC
TTCAAAGACTTGACTATTGTTCAACTTTGAAGACTA TAGAGC

Motif “sequence logo”

Finding motifs – Gibbs sampling

• Observations:

  • since there are no gaps, all motif instances have equal length (assume a known value m)
  • exhaustive search of the promoter regions is impractical: the number of combinations of substrings of length m, one from each of k sequences of length L, is (L - m + 1)^k
  • Solution: random search
  1. Pick a random substring of length m from each of the strings
  2. Construct the multiple alignment (easy since there are no gaps) and compute its profile
  3. Pick a random sequence s and remove it from the multiple alignment. Recompute the profile.
  4. Within the removed sequence, search for the best fit to the profile and insert it into the alignment
  5. Repeat steps 3-4 until the profile does not improve
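
The steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation, and several choices are assumptions not found in the notes: all function names are made up, pseudocounts of 1 keep unseen letters from getting probability zero, "best fit" is taken to mean the window with the highest product of profile probabilities, and a fixed number of rounds replaces the "until the profile does not improve" test.

```python
import random


def search_space(L, m, k):
    """Number of ways to pick one length-m window from each of k sequences."""
    return (L - m + 1) ** k


def profile(motifs, alphabet="ACGT", pseudo=1.0):
    """Per-column letter frequencies of equal-length motifs (with pseudocounts)."""
    cols = []
    for j in range(len(motifs[0])):
        counts = {a: pseudo for a in alphabet}
        for motif in motifs:
            counts[motif[j]] += 1
        total = sum(counts.values())
        cols.append({a: c / total for a, c in counts.items()})
    return cols


def fit(window, prof):
    """Probability of the window under the profile (product over columns)."""
    p = 1.0
    for ch, col in zip(window, prof):
        p *= col[ch]
    return p


def best_start(seq, prof, m):
    """Start position of the length-m window in seq that fits prof best."""
    return max(range(len(seq) - m + 1), key=lambda i: fit(seq[i:i + m], prof))


def motif_search(seqs, m, rounds=200, seed=0):
    rng = random.Random(seed)
    # Step 1: a random window in every sequence.
    starts = [rng.randrange(len(s) - m + 1) for s in seqs]
    for _ in range(rounds):
        # Step 3: leave one sequence out and rebuild the profile (step 2).
        i = rng.randrange(len(seqs))
        others = [s[p:p + m] for k, (s, p) in enumerate(zip(seqs, starts)) if k != i]
        # Step 4: re-place the left-out sequence at its best-fitting window.
        starts[i] = best_start(seqs[i], profile(others), m)
    return starts


if __name__ == "__main__":
    promoters = [
        "TTAGAGGTTGACTATTCAACTTTTGAGGAGGCCTAG",
        "AGCCGACTTGCAACTTAGGCGTGGTCAGTGCCCTAA",
        "GGCCTATTTGGGCCACTTAGACCTTCAACTTTTGCA",
    ]
    m = 8  # assumed motif length, chosen only for the demo
    for s, p in zip(promoters, motif_search(promoters, m)):
        print(p, s[p:p + m])
    print(search_space(36, m, 6))
```

Because the search is random and greedy, different seeds can end in different local optima, so in practice one would rerun it from several starting points and keep the best-scoring profile.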

Phylogenetic trees

Phylogenetic trees – how evolution works

• http://www.tolweb.org/tree/ - the tree of life

Phylogeny questions

  • Given several organisms & a set of features (usually sequence, but also morphological: wing shape/color...)
  • A. Given a phylogenetic tree: figure out what the ancestors looked like (what are the features of internal nodes)
  • B. Find the phylogenetic tree that best describes the common evolutionary heritage of the organisms

[Figure: example trees over organisms A, B and C, annotated with features: wings, feathers, teeth, claws, no wings, fur]