Bioinformatics Homework 2 - Edit Distance Problems - Prof. Yufeng Wu, Assignments of Biology

Edit distance problems for undergraduate and graduate students in the bioinformatics course (BME/4800, CSE 3800/5800) taught by Yufeng Wu in Fall 2009. The homework has four problems on dynamic programming and edit distance: showing the dynamic programming table for two strings, proving that the expected score of a random match is negative, finding a segment of a longer string whose edit distance to a shorter string is no more than a given constant, and proving claims about the dynamic programming table for edit distance.

Typology: Assignments

Pre 2010

Uploaded on 02/25/2010

BME/4800, CSE 3800/5800 - Bioinformatics - Yufeng Wu - Fall 2009
Homework 2 - Due: 10/6 in class.
Please be concise: each of the problems should not take more than 1 page.
Undergraduate students do problems 1, 2 and 3, and graduate students do
problems 2, 3 and 4.
Problem 1
Recall that edit distance has cost 1 for each mismatch, insertion and deletion. Let S1 = PARK and
S2 = SPAKE. Show me a dynamic programming table for the edit distance (of the two whole strings).
Show me how to find the optimal solution from the table.
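As an illustration of what Problem 1 asks for, here is a minimal Python sketch, not the course's official solution. It assumes the standard unit-cost recurrence, with rows indexed by prefixes of S1 and columns by prefixes of S2. The function names edit_distance_table and traceback are hypothetical, chosen only for this example.

```python
def edit_distance_table(s1, s2):
    """Return table D where D[i][j] is the unit-cost edit distance
    between the prefixes s1[:i] and s2[:j]."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i              # delete all i characters of s1[:i]
    for j in range(1, m + 1):
        D[0][j] = j              # insert all j characters of s2[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + sub,   # match or mismatch
                          D[i - 1][j] + 1,         # delete s1[i-1]
                          D[i][j - 1] + 1)         # insert s2[j-1]
    return D


def traceback(D, s1, s2):
    """Walk from D[n][m] back to D[0][0] and return one optimal edit
    script as a list of (operation, char_from_s1, char_from_s2)."""
    i, j = len(s1), len(s2)
    ops = []
    while i > 0 or j > 0:
        sub = 1
        if i > 0 and j > 0:
            sub = 0 if s1[i - 1] == s2[j - 1] else 1
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + sub:
            ops.append(("match" if sub == 0 else "substitute",
                        s1[i - 1], s2[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append(("delete", s1[i - 1], "-"))
            i -= 1
        else:
            ops.append(("insert", "-", s2[j - 1]))
            j -= 1
    ops.reverse()
    return ops


if __name__ == "__main__":
    D = edit_distance_table("PARK", "SPAKE")
    for row in D:
        print(row)
    print(traceback(D, "PARK", "SPAKE"))
```

For PARK and SPAKE the bottom-right cell is 3. The traceback breaks ties in favor of the diagonal move, so it reports just one of possibly several optimal edit scripts.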
Problem 2
Work out a step-by-step proof of the claim on p.24: the expected score of a random match is negative.
We went over this in class. Here I want you to write down the entire process for yourself. Concisely
show the justification for each step.
Problem 3
Suppose we are given two strings S1 and S2, where the length of S1 is n, the length of S2 is m, and n is
larger than m. The objective is to find (if any) a segment of S1 such that the edit distance between this
segment and S2 is no more than K. Here, K is a given constant. Recall that the edit distance problem
has unit-cost substitution/insertion/deletion and matches score 0. The objective is to find the least
costly way of changing sequences.
Now show me a dynamic programming algorithm for this problem that should run in O(nm)
time. Note: you should clearly define the meaning of any table you use, the recursions, the initialization,
how to find solutions, and the time analysis.
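One common way to approach Problem 3 is sketched below in Python. This is an assumption about a workable formulation, not necessarily the one intended in class. The table entry D[i][j] is taken to mean the minimum edit distance between S2[1..j] and any segment of S1 that ends at position i. The only change from the whole-string table is the initialization D[i][0] = 0, which lets a segment start anywhere in S1 for free. The function name find_segment is hypothetical.

```python
def find_segment(s1, s2, K):
    """Return (start, end, cost) such that s1[start:end] has edit
    distance cost <= K to s2, or None if no such segment exists.
    D[i][j] = min edit distance between s2[:j] and a segment of s1
    ending at position i (the segment may be empty)."""
    n, m = len(s1), len(s2)
    # D[i][0] = 0 for every i: an empty segment can end anywhere for free.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        D[0][j] = j              # nothing of s1 used: insert s2[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + sub,
                          D[i - 1][j] + 1,
                          D[i][j - 1] + 1)
    # The best segment ends at the row with the smallest last-column value.
    end = min(range(n + 1), key=lambda i: D[i][m])
    if D[end][m] > K:
        return None
    # Trace back until s2 is used up; the row reached is the segment start.
    i, j = end, m
    while j > 0:
        sub = 1
        if i > 0:
            sub = 0 if s1[i - 1] == s2[j - 1] else 1
        if i > 0 and D[i][j] == D[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return i, end, D[end][m]


if __name__ == "__main__":
    print(find_segment("GGGSPAKEGGG", "PARK", 1))
```

Filling the table takes constant work per cell over (n+1)(m+1) cells, and the scan of the last column plus the traceback add at most O(n + m) steps, so the sketch runs in O(nm) time.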
Problem 4
We are interested in the edit distance problem. Recall the simple dynamic programming formulation
we went over in class. If you forget it, try to write it down by yourself. In this formulation, we have
a table D, where D[i, j] is equal to the minimum cost of transforming S1[1..i] to S2[1..j].
(a) First prove the following claim: the values in the DP table D[i, j] along a line (horizontal, vertical
or diagonal in the increasing direction) can increase or decrease by at most one from one cell to the next.
(b) Next prove something stronger: D[i, j] cannot decrease along diagonals. That is, D[i, j] ≤
D[i + 1, j + 1] for all i, j (assuming within range).
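Before attempting the proofs in Problem 4, it can help to check both claims numerically. The Python sketch below is only a sanity check on random strings and is not a proof. The function names, the alphabet "ACGT", and the trial parameters are assumptions made for this example.

```python
import random


def edit_table(s1, s2):
    """Unit-cost edit distance table: D[i][j] is the minimum cost of
    transforming s1[:i] into s2[:j]."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i
    for j in range(1, m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + sub,
                          D[i - 1][j] + 1,
                          D[i][j - 1] + 1)
    return D


def check_claims(D):
    """Return True if neighboring cells differ by at most one (claim a)
    and values never decrease along diagonals (claim b)."""
    n, m = len(D) - 1, len(D[0]) - 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i < n and abs(D[i + 1][j] - D[i][j]) > 1:
                return False      # vertical step changes by more than one
            if j < m and abs(D[i][j + 1] - D[i][j]) > 1:
                return False      # horizontal step changes by more than one
            if i < n and j < m and not 0 <= D[i + 1][j + 1] - D[i][j] <= 1:
                return False      # diagonal step decreases or jumps by > 1
    return True


def run_random_check(trials=200, max_len=12, seed=0):
    """Check both claims on random strings over a small alphabet."""
    rng = random.Random(seed)
    for _ in range(trials):
        s1 = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))
        s2 = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))
        if not check_claims(edit_table(s1, s2)):
            return False
    return True


if __name__ == "__main__":
    print(run_random_check())
```

A failed check would indicate a bug in the table construction rather than a counterexample, since both claims hold for the unit-cost formulation.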