Assignment: Optimal Overlap Alignment and K-Parse Alignment of DNA Sequences, Assignments of Computer Science

Instructions for an assignment on defining and computing the optimal overlap alignment and the k-parse alignment of given DNA sequences. The assignment involves writing down the recurrences and initialization for dynamic programming algorithms, analyzing time complexity, and implementing the algorithm in a programming language. The inputs are two FASTA files, each containing one DNA sequence.

Typology: Assignments

Pre 2010

Uploaded on 03/10/2009

koofers-user-axy
Assignment 2. (Due on October 2.)
Instructions on submitting programming solutions:
a. Your code should be easy to compile and run on csil-linux machines.
b. Email your code directory as a tarball (.tar.gz) to the instructor.
c. Give clear instructions in the email about compiling (if necessary) and running the program.
1. (Problem 6.22 from text). Define an overlap alignment between two sequences v = v1…vn and w = w1…wm to be an alignment between a suffix of v and a prefix of w. For example, if v = TATATA and w = AAATTT, then a (not necessarily optimal) overlap alignment between v and w is
ATA
AAA
An optimal overlap alignment is an alignment that maximizes the global alignment score between vi…vn and w1…wj, where the maximum is taken over all suffixes vi…vn of v and all prefixes w1…wj of w. Give an algorithm that computes the optimal overlap alignment and runs in time O(nm). Assume that the score of an alignment is computed as usual, i.e., with a (5 x 5) scoring matrix δ. (10 points)
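One way to picture the overlap alignment defined above is the usual global-alignment table with two changes: starting the suffix of v anywhere is free, and the answer is read from the last row rather than the bottom-right cell. The Python sketch below is only an illustration under stated assumptions. The function names and the `simple_delta` scores (+1 match, -1 mismatch, -1 gap) are hypothetical stand-ins for the 5 x 5 matrix δ, and the empty overlap (score 0) is assumed to be allowed.

```python
def simple_delta(a, b):
    """Hypothetical stand-in for the 5 x 5 matrix delta over {A, C, G, T, -}."""
    if a == "-" or b == "-":
        return -1  # assumed gap penalty
    return 1 if a == b else -1  # assumed match / mismatch scores


def overlap_alignment_score(v, w, delta=simple_delta):
    """Best score of a global alignment between a suffix of v and a prefix of w."""
    n, m = len(v), len(w)
    # S[i][j]: best score of aligning some suffix of v[:i] against the prefix w[:j].
    S = [[0] * (m + 1) for _ in range(n + 1)]
    # Row 0: no character of v is available, so w[:j] is aligned entirely to gaps.
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + delta("-", w[j - 1])
    for i in range(1, n + 1):
        # Column 0 stays 0: the suffix of v may begin after position i at no cost.
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + delta(v[i - 1], w[j - 1]),  # align v_i with w_j
                S[i - 1][j] + delta(v[i - 1], "-"),           # v_i against a gap
                S[i][j - 1] + delta("-", w[j - 1]),           # w_j against a gap
            )
    # The suffix must end at v_n; the prefix of w may end at any j (j = 0 is the empty overlap).
    return max(S[n])
```

The table has (n + 1)(m + 1) cells and each cell takes constant time, which matches the O(nm) bound the problem asks for. Recovering the alignment itself would need a traceback from the best cell in the last row, which this sketch omits.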
2. Suppose u and v are two DNA sequences of length n each. Define a “k-parse” of a string s as a sequence of substrings: π(s) = {s1, s2, … sr} where each si is a k-length substring of s, with the condition that for any pair si = s[bi…ei] and sj = s[bj…ej], if i < j then ei < bj. That is, π(s) is a sequence of non-overlapping k-length substrings of s.
Given a k-parse π(u) of sequence u and a k-parse π(v) of sequence v, consider the task of finding the optimal global alignment of π(u) and π(v) under the following scheme:
• A k-mer x in π(u) may be aligned with a k-mer y in π(v).
• A k-mer in either π(u) or π(v) may be aligned with a gap in the other k-parse.
• The alignment preserves the order of k-mers in each k-parse. That is, if x1 and y1 are aligned and x2 and y2 are aligned, and if x1 is to the left of x2 in π(u), then y1 must be to the left of y2 in π(v).
• The score for aligning two k-mers x and y is equal to k - h(x,y)^2, where h(x,y) is the number of mismatches between x and y, or in other words, the Hamming distance between x and y.
• The score for aligning a k-mer with a gap is zero.
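The scheme above can be made concrete with a small Python sketch. It computes the k-mer score k - h(x,y)^2 and then fills a standard alignment table over two given k-parses, each represented as a list of k-mers in left-to-right order. The names `hamming`, `kmer_score`, and `align_k_parses` are hypothetical, and this is one possible formulation rather than a prescribed one.

```python
def hamming(x, y):
    """Number of mismatching positions between two equal-length strings."""
    if len(x) != len(y):
        raise ValueError("k-mers must have the same length")
    return sum(a != b for a, b in zip(x, y))


def kmer_score(x, y):
    """Score for aligning k-mer x with k-mer y: k - h(x, y)^2."""
    return len(x) - hamming(x, y) ** 2


def align_k_parses(X, Y):
    """Best global alignment score of two k-parses given as lists of k-mers."""
    r, t = len(X), len(Y)
    # A[i][j]: best score for the first i k-mers of X and the first j k-mers of Y.
    # Row 0 and column 0 are 0 because a k-mer aligned with a gap scores zero.
    A = [[0] * (t + 1) for _ in range(r + 1)]
    for i in range(1, r + 1):
        for j in range(1, t + 1):
            A[i][j] = max(
                A[i - 1][j - 1] + kmer_score(X[i - 1], Y[j - 1]),  # pair the two k-mers
                A[i - 1][j],  # X[i-1] aligned with a gap (score 0)
                A[i][j - 1],  # Y[j-1] aligned with a gap (score 0)
            )
    return A[r][t]
```

For example, with k = 3 the pair ACG / ACG scores 3, TTT / TTA scores 3 - 1 = 2, and ACG / TTA scores 3 - 9 = -6, so a pairing with many mismatches is worse than aligning both k-mers with gaps.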
(a) Write down the recurrences and initialization for a dynamic programming algorithm to find the highest scoring global alignment of two given k-parses. (5 points)
(b) The optimal alignment of u and v is the highest scoring global alignment of a k-parse of u and a k-parse of v, among all possible k-parses of the respective sequences. Write down the recurrences and initialization for a dynamic programming algorithm to find the optimal alignment of two given sequences, under this definition. (15 points)
(c) What is the time complexity of your alignment program, in terms of n and k? (5 points)
(d) Implement the algorithm in a programming language of your choice. The inputs to the algorithm are two FASTA files, each with one DNA sequence. The command line should look something like <yourprogramfilename> <fastafilename1> <fastafilename2>. The output should go into standard output. (15 points)
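As an illustration of how a program for part (d) might be organized, the Python sketch below reads one sequence from each FASTA file and prints a score to standard output. It is a sketch under stated assumptions, not a reference solution. The function names are hypothetical, `DEFAULT_K = 3` is an assumption because the assignment does not say how k is supplied, and the recurrence shown is one possible formulation of part (b): a cell either leaves the last character of u or v unpaired, or pairs the two k-mers ending at the current positions.

```python
import sys

DEFAULT_K = 3  # assumption: the assignment does not say how k is supplied


def parse_fasta(lines):
    """Return the first sequence found in an iterable of FASTA lines, uppercased."""
    parts, seen_header = [], False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts; only the first one is used
            seen_header = True
            continue
        parts.append(line.upper())
    return "".join(parts)


def read_fasta(path):
    """Read the first sequence from the FASTA file at path."""
    with open(path) as handle:
        return parse_fasta(handle)


def optimal_k_parse_score(u, v, k):
    """Best k-parse alignment score over all k-parses of u and v (k >= 1)."""
    n, m = len(u), len(v)
    # S[i][j]: best score using only u[:i] and v[:j]. Row 0 and column 0 are 0
    # because unpaired characters and gapped k-mers contribute nothing.
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(S[i - 1][j], S[i][j - 1])  # leave u[i-1] or v[j-1] unpaired
            if i >= k and j >= k:
                # Pair the k-mers ending at positions i and j.
                h = sum(a != b for a, b in zip(u[i - k:i], v[j - k:j]))
                best = max(best, S[i - k][j - k] + k - h * h)
            S[i][j] = best
    return S[n][m]


def main(argv, k=DEFAULT_K):
    """Command-line entry point: <program> <fastafilename1> <fastafilename2>."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} <fastafilename1> <fastafilename2>", file=sys.stderr)
        return 2
    u, v = read_fasta(argv[1]), read_fasta(argv[2])
    print(optimal_k_parse_score(u, v, k))
    return 0
```

To run it as a script, one would call `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard. As written, each of the roughly n^2 cells recomputes a Hamming distance in O(k) time, so this particular sketch runs in O(n^2 k) time. Other formulations may differ.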

