Chaining Algorithms and Multiple Alignment - Lecture Notes | CMSC 423, Study notes of Computer Science

Material Type: Notes; Class: BIOINFO ALGS, DB, TOOLS; Subject: Computer Science; University: University of Maryland; Term: Unknown 1989;

Uploaded on 02/13/2009

CMSC423 Fall 2008
CMSC423: Bioinformatic Algorithms,
Databases and Tools
Lecture 12
chaining algorithms
multiple alignment

Jobs

• Applied Predictive Technologies – looking for the best students – focus on databases (forwarded by Daniel Hackner) – not bioinformatics

Path “planning” and dynamic programming

• One intuitive way to think about dynamic programming

  • similar to finding shortest path between two points
  • at each “point” ask – what are all possible ways to get here?
  • pick the best (shortest, fastest, etc.)

[Figure: road map connecting DC, Frederick, Baltimore, Harrisburg, Philly, NYC]
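
The path-planning intuition can be sketched in a few lines of Python. This is only an illustration: the city names come from the slide, but the road list, the mileages, and the function names `best_routes` and `route_to` are made up for the example, and the sketch assumes roads only lead "forward" in the listed city order (no cycles).

```python
def best_routes(order, roads, start):
    """Cheapest cost from `start` to every city.

    order -- cities listed so that every road goes from an earlier
             city to a later one (an assumption of this sketch)
    roads -- dict mapping (from_city, to_city) -> cost
    """
    INF = float("inf")
    # "What are all possible ways to get here?" -> incoming roads per city.
    incoming = {city: [] for city in order}
    for (src, dst), cost in roads.items():
        incoming[dst].append((src, cost))

    best = {city: INF for city in order}
    came_from = {city: None for city in order}
    best[start] = 0
    for city in order:
        # "Pick the best" among all ways of arriving at this city.
        for src, cost in incoming[city]:
            if best[src] + cost < best[city]:
                best[city] = best[src] + cost
                came_from[city] = src
    return best, came_from


def route_to(city, came_from):
    """Follow the back-pointers to recover the chosen route."""
    path = []
    while city is not None:
        path.append(city)
        city = came_from[city]
    return path[::-1]


# Hypothetical road network; the costs are invented, not real mileages.
ORDER = ["DC", "Frederick", "Baltimore", "Harrisburg", "Philly", "NYC"]
ROADS = {
    ("DC", "Frederick"): 50,
    ("DC", "Baltimore"): 40,
    ("Frederick", "Harrisburg"): 70,
    ("Baltimore", "Harrisburg"): 75,
    ("Baltimore", "Philly"): 100,
    ("Harrisburg", "Philly"): 105,
    ("Harrisburg", "NYC"): 170,
    ("Philly", "NYC"): 95,
}
best, came_from = best_routes(ORDER, ROADS, "DC")
print(best["NYC"], route_to("NYC", came_from))
```

Each city is solved once, using only the already-solved cities that lead into it, which is the same pattern the alignment recurrences follow.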

Chaining in 1D

  • Sort the endpoints (starts, ends) of the intervals
  • For every interval j, store V[j] – best score of a chain ending in j
  • MAX – store highest V[j] seen so far
  • Process endpoints in increasing order of x coordinate
  • If we encounter left end (start) of interval j
    • V[j] = weight(j) + MAX
  • If we encounter right end (end) of interval j
    • MAX = max{V[j], MAX}
  • Running time?
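
The sweep above might look as follows in Python. This is a minimal sketch under stated assumptions: intervals are given as hypothetical `(start, end, weight)` tuples, the function name `chain_1d` is invented here, and at equal coordinates a left end is processed before a right end, so intervals that merely touch are not chained (the slide does not say how ties should be handled). On the running-time question: the sort costs O(n log n) and the sweep itself is linear, so sorting dominates.

```python
def chain_1d(intervals):
    """Best total weight of a chain of non-overlapping intervals.

    intervals -- list of (start, end, weight) tuples
    Returns (best_score, V) where V[j] is the best score of a chain
    ending in interval j.
    """
    LEFT, RIGHT = 0, 1
    events = []
    for j, (start, end, _weight) in enumerate(intervals):
        events.append((start, LEFT, j))
        events.append((end, RIGHT, j))
    # Sort by x coordinate; LEFT < RIGHT breaks ties (assumption, see above).
    events.sort()

    V = [0] * len(intervals)
    best = 0  # MAX in the slide: highest V[j] among intervals already closed
    for _x, kind, j in events:
        if kind == LEFT:
            V[j] = intervals[j][2] + best
        else:
            best = max(best, V[j])
    return best, V


# Hypothetical example data.
score, V = chain_1d([(1, 3, 2), (2, 5, 4), (4, 7, 3), (6, 9, 1)])
print(score, V)
```

Because MAX is only updated at a right end, it never includes an interval that is still open, which is what keeps the chain free of overlaps.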

Multiple sequence alignment

• Simultaneously identify relationship between multiple sequences

• Note: multiple alignment implies (not necessarily optimal) pairwise alignment between the individual sequences

HBB_HUMAN   FFESFGDLSTPDAVMGNPKVKAHGKKVL-----GAFSDGLAHLDNLKGTF
HBB_HORSE   FFDSFGDLSNPGAVMGNPKVKAHGKKVL-----HSFGEGVHHLDNLKGTF
HBA_HUMAN   YFPHF-DLS-----HGSAQVKGHGKKVA-----DALTNAVAHVDDMPNAL
HBA_HORSE   YFPHF-DLS-----HGSAQVKAHGKKVG-----DALTLAVGHLDDLPGAL
MYG_PHYCA   KFDRFKHLKTEAEMKASEDLKKHGVTVL-----TALGAILKKKGHHEAEL
GLB5_PETMA  FFPKFKGLTTADQLKKSADVRWHAERII-----NAVNDAVASMDDTEKMS
LGB2_LUPLU  LFSFLKGTSEVP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATL
            :.. .:: *. : :. :

Implied pairwise alignment (HBA_HUMAN vs. HBA_HORSE):

HBA_HUMAN   YFPHF-DLS-----HGSAQVKGHGKKVA-----DALTNAVAHVDDMPNAL
HBA_HORSE   YFPHF-DLS-----HGSAQVKAHGKKVG-----DALTLAVGHLDDLPGAL
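
To make the note about implied pairwise alignments concrete, here is a small Python sketch that pulls two rows out of a multiple alignment. The function name `induced_pairwise` is invented for this example. It also drops columns where both rows hold a gap, since such columns carry no information in a pairwise alignment; that cleanup step is an assumption of the sketch, not something stated on the slide.

```python
def induced_pairwise(row_a, row_b, gap="-"):
    """Pairwise alignment implied by two rows of a multiple alignment.

    Columns in which both rows have a gap are removed.
    """
    kept = [(a, b) for a, b in zip(row_a, row_b)
            if not (a == gap and b == gap)]
    return ("".join(a for a, _ in kept),
            "".join(b for _, b in kept))


# Two rows taken from the globin alignment above.
hba_human = "YFPHF-DLS-----HGSAQVKGHGKKVA-----DALTNAVAHVDDMPNAL"
hba_horse = "YFPHF-DLS-----HGSAQVKAHGKKVG-----DALTLAVGHLDDLPGAL"
pair = induced_pairwise(hba_human, hba_horse)
print(pair[0])
print(pair[1])
```

For these two rows every gap is shared, so the induced alignment contains no gaps at all. In general the result need not be the optimal pairwise alignment of the two sequences.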

But... here's a solution

• Dynamic programming solution, e.g. 3 sequences

• Score(i, j, k) – optimal alignment between s1[1..i], s2[1..j], s3[1..k] – do DP as usual

• s(i,j,k) = max {
    s(i-1, j-1, k-1) + match(s1[i], s2[j], s3[k]),
    ...
  }

[Figure: DP cube with axes s1, s2, s3]
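
As a rough illustration of "do DP as usual" in three dimensions, the sketch below fills the cube for three short sequences. Several things here are assumptions rather than the lecture's definitions: the column score is a sum-of-pairs score with invented values (match +1, mismatch -1, gap -2, gap-gap 0), the names `sp_column` and `align3_score` are hypothetical, and the recurrence considers the seven standard predecessor cells (advance in one, two, or all three sequences), which is the usual textbook formulation.

```python
from itertools import product


def sp_column(a, b, c, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one alignment column (assumed scoring)."""
    score = 0
    for x, y in ((a, b), (a, c), (b, c)):
        if x == "-" and y == "-":
            continue            # gap against gap costs nothing
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


def align3_score(s1, s2, s3):
    """Optimal three-way alignment score by filling an n1*n2*n3 cube."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    NEG = float("-inf")
    S = [[[NEG] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    S[0][0][0] = 0
    # The 7 moves: consume a character (1) or insert a gap (0) per sequence.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    a = s1[i - 1] if di else "-"
                    b = s2[j - 1] if dj else "-"
                    c = s3[k - 1] if dk else "-"
                    cand = S[pi][pj][pk] + sp_column(a, b, c)
                    if cand > S[i][j][k]:
                        S[i][j][k] = cand
    return S[n1][n2][n3]


print(align3_score("AC", "AC", "AC"))
```

The three nested loops over i, j, k are exactly the cube that has to be filled, which is where the cost of this approach comes from.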

But... it's expensive

• 3 sequences – need to fill in the cube: $O(n^3)$

• k sequences – k-dimensional cube: $O(n^k)$ time/space

• There are tricks that can help – similar to AI techniques for reducing the search space

• Basic idea – if we can estimate optimal score, we can prune the search space.

• Note – these are just heuristics – not guaranteed to work faster

Iterative alignment

• Take sequences si in order:

  • align s1 with sc – results in gaps being inserted in both sequences
  • align s2 with sc – if gaps must be inserted, insert them in the previously aligned sequences as well
  • and so on (note: if gaps coincide with previously introduced gaps, there is no need to change previously aligned sequences)

SC  YFPHFDLSHGSAQVKAHGKKVGDALTLAVGHLDDLPGAL

SC  YFPHFDLSHGSAQVKAHGKKVGDALTLAVGHLDDLPGAL
S1  YFPHFDLSHG-AQVKG--KKVADALTNAVAHVDDMPNAL

SC  YFPHF-DLS-----HGSAQVKAHGKKVG-----DALTLAVAHLDDLPGAL
S1  YFPHF-DLS-----HG-AQVKG--KKVA-----DALTNAVAHVDDMPNAL
S2  FFPKFKGLTTADQLKKSADVRWHAERII-----NAVNDAVASMDDTEKMS

SC  YFPHF-DLS-----HGSAQVKAHGKKVG-----DALTLAVAHLDDLPGAL
S1  YFPHF-DLS-----HG-AQVKG--KKVA-----DALTNAVAHVDDMPNAL
S2  FFPKFKGLTTADQLKKSADVRWHAERII-----NAVNDAVASMDDTEKMS
S3  LFSFLKGTSEVP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATL
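
The gap bookkeeping in the steps above can be sketched as a merge of two gapped copies of the center sequence. This is one possible way to do it, not necessarily how the lecture's implementation works. The function name `add_to_star` is invented, the pairwise alignments are assumed to be computed elsewhere, and both inputs are assumed to contain the same center sequence once gaps are removed.

```python
def add_to_star(msa, center_aln, seq_aln, gap="-"):
    """Merge one pairwise alignment (center vs. new sequence) into a star MSA.

    msa        -- list of equal-length rows; msa[0] is the gapped center
    center_aln -- center row of the new pairwise alignment
    seq_aln    -- new sequence's row of that pairwise alignment
    Returns the new list of rows, with the new sequence appended last.
    """
    old_center = msa[0]
    new_rows = [[] for _ in msa]
    new_seq = []
    p = q = 0  # column pointers into the old MSA and the pairwise alignment
    while p < len(old_center) or q < len(center_aln):
        old_gap = p < len(old_center) and old_center[p] == gap
        new_gap = q < len(center_aln) and center_aln[q] == gap
        if old_gap and not new_gap:
            take_old, take_new = True, False   # new sequence gets a gap
        elif new_gap and not old_gap:
            take_old, take_new = False, True   # gap goes into all old rows
        else:
            take_old, take_new = True, True    # same letter, or gaps coincide
        for r, row in enumerate(msa):
            new_rows[r].append(row[p] if take_old else gap)
        new_seq.append(seq_aln[q] if take_new else gap)
        p += take_old
        q += take_new
    return ["".join(r) for r in new_rows] + ["".join(new_seq)]


# Hypothetical toy input: center ACGT already aligned with ACTGT.
msa = ["AC-GT", "ACTGT"]
print(add_to_star(msa, "A-CGT", "ATCGT"))
```

The third branch covers the case mentioned in the note: when the new alignment's gap lines up with a gap already present in the center row, the existing rows are copied through unchanged.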

Theorem proof

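  • Theorem: star alignment is 2-optimal
  • Assumption: distances obey triangle inequality

$$\mathrm{OPT} = \sum_{s_i,s_j} d^*(s_i,s_j) \;\ge\; \sum_{s_i,s_j} D(s_i,s_j) \;\ge\; k \sum_{s_i} D(s_i,s_c)$$

$$\mathrm{STAR} = \sum_{s_i,s_j} d(s_i,s_j) \;\le\; \sum_{s_i,s_j} \bigl(D(s_i,s_c) + D(s_j,s_c)\bigr) \qquad \text{(triangle ineq.)}$$

$$= \sum_{s_i,s_j} D(s_j,s_c) + \sum_{s_i,s_j} D(s_i,s_c) = 2k \sum_{s_i} D(s_i,s_c)$$

$$\Rightarrow\; \mathrm{STAR}/\mathrm{OPT} \le 2 \qquad \text{Q.E.D.}$$

Notes:

  • k is the number of sequences; sums over $s_i,s_j$ run over ordered pairs, so each $D(s_i,s_c)$ appears k times in each of the two sums on the last line
  • $\sum_{s_i} D(s_i,s_c)$ – the score optimized by the choice of sc
  • d*(si,sj) – score of the alignment between si and sj within the optimal alignment
  • d(si,sj) – score of the alignment between si and sj within the star alignment
  • D(si,sj) – score of the optimal alignment between si and sj

[Figure: triangle connecting sc, si, sj]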
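
The quantity the proof relies on, the sum of distances from every sequence to the center, is easy to compute directly. The sketch below picks the center that minimizes it. As an assumption for the example, D is taken to be unit-cost edit distance, which does obey the triangle inequality; the lecture may use a different distance. The names `edit_distance` and `pick_center` are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # match / substitute
        prev = cur
    return prev[-1]


def pick_center(seqs):
    """Index of the sequence minimizing the sum of distances to all others."""
    totals = [sum(edit_distance(s, t) for t in seqs) for s in seqs]
    center = min(range(len(seqs)), key=totals.__getitem__)
    return center, totals


# Hypothetical toy sequences.
center, totals = pick_center(["ACGT", "ACGA", "TCGT", "ACCT"])
print(center, totals)
```

Here the first sequence differs from each of the others by one substitution, so it has the smallest total and is chosen as sc.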