Spliced Alignment Problem: Solution and Secondary Structure Prediction in RNA - Prof. Saur, Study notes of Computer Science

These notes cover the spliced alignment problem, which aims to find the chain of exons with the maximum sequence similarity to a target protein sequence. They also cover the basics of RNA secondary structure, including base pairing, stem-loop structures, and non-canonical base pairs, and explain two approaches for predicting RNA secondary structures: Nussinov's algorithm and Zuker's algorithm.

Uploaded on 03/16/2009

Dynamic Programming
(cont’d)
CS 466
Saurabh Sinha

Spliced Alignment

  • Begins by selecting either all putative exons between potential acceptor and donor sites or by finding all substrings similar to the target protein (as in the Exon Chaining Problem).
  • This set is then filtered so as to retain all true exons, possibly along with some false ones.
  • Then find the chain of exons such that the sequence similarity to the target protein sequence is maximized

The DAG

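  • Vertices: one vertex for each block in B, the set of candidate exons (blocks)
  • Directed edge from one block to another if the first ends before the second begins (the blocks do not overlap)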
  • Label of vertex = string of block it represents
  • A path through the DAG spells out the string obtained by concatenating that particular chain of blocks
  • Weight of a path is the score of the optimal alignment between the string it spells out and the target sequence
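The DAG described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: blocks are represented as hypothetical `(start, end)` pairs of 0-based, inclusive positions in the genomic sequence, and the names `build_block_dag` and `spell_path` are invented here, not taken from the lecture.

```python
from typing import Dict, List, Tuple

# Assumed convention: a block is (start, end), 0-based and inclusive.
Block = Tuple[int, int]

def build_block_dag(blocks: List[Block]) -> Dict[int, List[int]]:
    """Adjacency lists: edge u -> v when block u ends before block v starts."""
    edges: Dict[int, List[int]] = {u: [] for u in range(len(blocks))}
    for u, (_, end_u) in enumerate(blocks):
        for v, (start_v, _) in enumerate(blocks):
            if end_u < start_v:  # non-overlapping, u strictly before v
                edges[u].append(v)
    return edges

def spell_path(genome: str, blocks: List[Block], path: List[int]) -> str:
    """Concatenate the strings of the blocks visited along a path."""
    return "".join(genome[blocks[b][0]: blocks[b][1] + 1] for b in path)

genome = "AAGTCCGATTAC"
blocks = [(0, 2), (2, 5), (4, 7), (8, 11)]
dag = build_block_dag(blocks)
print(dag)                               # edges between non-overlapping blocks
print(spell_path(genome, blocks, [0, 3]))  # string spelled by the chain 0 -> 3
```

Because an edge only ever goes from a block to one that starts later, the graph cannot contain a cycle, which is what makes it a DAG.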

Dynamic programming

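  • Genomic sequence $G = g_1 g_2 \ldots g_n$
  • Target sequence $T = t_1 t_2 \ldots t_m$
  • As usual, we want to find the optimal alignment score of the i-prefix of G and the j-prefix of T
  • The problem is that many different i-prefixes are possible, since multiple blocks may include position i
  • So we compute S(i, j, B): the optimal score for aligning the j-prefix of T with an i-prefix that ends inside block B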

If i is not the starting vertex of block B:

$$
S(i, j, B) = \max \begin{cases}
S(i-1, j, B) - \text{indel penalty} \\
S(i, j-1, B) - \text{indel penalty} \\
S(i-1, j-1, B) + \delta(g_i, t_j)
\end{cases}
$$

If i is the starting vertex of block B:

$$
S(i, j, B) = \max \begin{cases}
S(i, j-1, B) - \text{indel penalty} \\
\max_{\text{all blocks } B' \text{ preceding block } B} S(\mathrm{end}(B'), j, B') - \text{indel penalty} \\
\max_{\text{all blocks } B' \text{ preceding block } B} S(\mathrm{end}(B'), j-1, B') + \delta(g_i, t_j)
\end{cases}
$$
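The recurrence above can be turned into a short program. The sketch below is one possible reading, not the lecture's own code, and it relies on assumptions the slides do not state: blocks are `(start, end)` pairs of 0-based inclusive positions, δ is a simple match/mismatch score, a chain may start at any block (modelled by a virtual source whose score for the j-prefix of T is `-indel * j`), and the answer is the best S(end(B), m, B) over all blocks. The function name and parameters are hypothetical.

```python
from typing import Dict, List, Tuple

def spliced_alignment_score(genome: str, target: str,
                            blocks: List[Tuple[int, int]],
                            match: int = 1, mismatch: int = -1,
                            indel: int = 1) -> int:
    """Best score of aligning `target` against some chain of non-overlapping blocks."""
    m = len(target)

    def delta(a: str, b: str) -> int:
        # Assumed scoring function: +match or mismatch penalty.
        return match if a == b else mismatch

    # Any block preceding B starts before B, so processing by start is safe.
    order = sorted(range(len(blocks)), key=lambda b: blocks[b][0])
    last_row: Dict[int, List[int]] = {}  # block -> [S(end(B), j, B) for j in 0..m]

    for b in order:
        start, end = blocks[b]
        # Row "just before" the block: the virtual source (assumption) combined
        # with the best S(end(B'), j, B') over all blocks B' preceding B.
        prev = [-indel * j for j in range(m + 1)]
        for p, row_p in last_row.items():
            if blocks[p][1] < start:
                prev = [max(x, y) for x, y in zip(prev, row_p)]
        # With `prev` defined this way, both cases of the recurrence
        # reduce to the same three-way maximum.
        for i in range(start, end + 1):
            row = [prev[0] - indel]
            for j in range(1, m + 1):
                row.append(max(row[j - 1] - indel,
                               prev[j] - indel,
                               prev[j - 1] + delta(genome[i], target[j - 1])))
            prev = row
        last_row[b] = prev

    return max(row[m] for row in last_row.values())

genome = "AAGTCCGATTAC"
blocks = [(0, 2), (2, 5), (4, 7), (8, 11)]
print(spliced_alignment_score(genome, "AAGTTAC", blocks))
```

Keeping only the last row of each block is enough here because a later block only ever needs S(end(B'), j, B') from its predecessors; recovering the actual chain would require storing back-pointers as well.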

RNA secondary structure prediction

RNA

  • There’s more to RNA than mRNA
  • RNA can adopt interesting non-linear structures, and catalyze reactions
  • tRNAs (transfer RNAs) are the “adapters” that implement translation

Secondary structure

  • Several interesting RNAs have a conserved secondary structure (resulting from base-pairing interactions)
  • The sequence itself need not be conserved for the function to be retained
  • Predicting the secondary structure is therefore important for homology detection

Basics of secondary structure

  • G-C pairing: three bonds (strong)
  • A-U pairing: two bonds (weaker)
  • Base pairs are approximately coplanar

Secondary structure elements

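  • Stem: a run of consecutive base pairs
  • Loop: single-stranded subsequence bounded by base pairs
  • Hairpin loop: loop at the end of a stem (together they form a stem-loop)
  • Single-stranded bases within a stem …
  • … only on one side of the stem (bulge)
  • … on both sides of the stem (interior loop)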

Non-canonical base pairs

  • G-C and A-U are the canonical base pairs
  • G-U is also possible, almost as stable
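As a small illustration of these pairing rules, the sketch below encodes which bases may pair. The function name `can_pair` and the `allow_gu` flag are hypothetical, introduced only for this example.

```python
# Canonical Watson-Crick pairs and the non-canonical G-U pair from the notes.
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
GU_PAIRS = {("G", "U"), ("U", "G")}

def can_pair(a: str, b: str, allow_gu: bool = True) -> bool:
    """True if bases a and b can pair; G-U is accepted only when allow_gu is set."""
    pair = (a.upper(), b.upper())
    return pair in CANONICAL or (allow_gu and pair in GU_PAIRS)

print(can_pair("G", "C"), can_pair("G", "U"), can_pair("A", "G"))
```

Making G-U optional is a design choice for the example: some analyses count only canonical pairs, others include G-U as well.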

Pseudoknot

[Figure: base pairs (2, 11) and (9, 18) cross each other; they are NOT NESTED]
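The "not nested" condition can be checked mechanically: two base pairs (i, j) and (k, l) with i < k cross when i < k < j < l. The sketch below applies that test to the pairs (2, 11) and (9, 18) from the example; the function names are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]

def crosses(p: Pair, q: Pair) -> bool:
    """True if the two base pairs interleave (neither nested nor disjoint)."""
    (i, j), (k, l) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    return i < k < j < l

def has_pseudoknot(pairs: List[Pair]) -> bool:
    """True if any two base pairs in the structure cross."""
    return any(crosses(pairs[a], pairs[b])
               for a in range(len(pairs))
               for b in range(a + 1, len(pairs)))

print(crosses((2, 11), (9, 18)))   # the pair from the example
print(crosses((2, 18), (9, 11)))   # a nested pair, for comparison
```

This pairwise check is quadratic in the number of base pairs, which is fine for illustration.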

Pseudoknot problems

  • Pseudoknots are not handled by the algorithms we shall see
  • Pseudoknots do occur in many important RNAs
  • But the total number of pseudoknotted base pairs is typically relatively small