
Edit distance homework for undergraduate and graduate students in the bioinformatics course (BME/4800, CSE 3800/5800) taught by Yufeng Wu in Fall 2009. The homework has four problems on dynamic programming and edit distance: showing the dynamic programming table for two strings, proving that the expected score of a random match is negative, finding a segment of a longer string whose edit distance to a shorter string is at most a given constant, and proving claims about the dynamic programming table for edit distance.

BME/4800, CSE 3800/5800 - Bioinformatics - Yufeng Wu - Fall 2009
Please be concise: no problem should take more than one page. Undergraduate students do problems 1, 2 and 3; graduate students do problems 2, 3 and 4.
1. Recall that edit distance has cost 1 for each mismatch, insertion and deletion. Let S1 = PARK and S2 = SPAKE. Show me a dynamic programming table for the edit distance (of the two whole strings). Show me how to find the optimal solution from the table.
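A minimal Python sketch of the standard unit-cost edit distance table, plus one way to recover an optimal solution by walking back from the bottom-right cell. The function names `edit_distance_table` and `trace_back` are hypothetical, and ties during the walk are broken in a fixed order (diagonal, then deletion, then insertion), so it returns one optimal alignment rather than all of them.

```python
def edit_distance_table(s1: str, s2: str) -> list[list[int]]:
    """D[i][j] = minimum cost of transforming s1[:i] into s2[:j] (unit costs)."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # delete all of s1[:i]
    for j in range(m + 1):
        D[0][j] = j  # insert all of s2[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + sub,  # match or substitution
                          D[i - 1][j] + 1,        # delete s1[i-1]
                          D[i][j - 1] + 1)        # insert s2[j-1]
    return D


def trace_back(s1: str, s2: str, D: list[list[int]]) -> list[tuple[str, str, str]]:
    """Follow one optimal path from D[n][m] back to D[0][0]."""
    i, j = len(s1), len(s2)
    ops = []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = s1[i - 1] == s2[j - 1]
            if D[i][j] == D[i - 1][j - 1] + (0 if same else 1):
                ops.append(("match" if same else "substitute", s1[i - 1], s2[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append(("delete", s1[i - 1], "-"))
            i -= 1
        else:
            ops.append(("insert", "-", s2[j - 1]))
            j -= 1
    ops.reverse()  # operations were collected from the end of the strings
    return ops


if __name__ == "__main__":
    D = edit_distance_table("PARK", "SPAKE")
    for row in D:
        print(row)
    print("distance:", D[-1][-1])
    print(trace_back("PARK", "SPAKE", D))
```

Every cell on the returned path is justified by the recurrence, so the number of non-match operations equals D[n][m].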
2. Work out a step-by-step proof of the claim on p.24 that the expected score of a random match is negative. We went over this in class; here I want you to write down the entire process for yourself. Concisely justify each step.
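The slide on p.24 is not reproduced here. If, as is common in alignment scoring, the claim concerns log-odds substitution scores s(a,b) = log(p_ab / (q_a q_b)), where p_ab is the target frequency of the aligned pair (a,b), q_a is the background frequency of residue a, and all frequencies are positive (these are assumptions), then the core of the argument is a single application of Jensen's inequality to a random pair drawn independently from the background:

```latex
\begin{align*}
\mathbb{E}[s] &= \sum_{a,b} q_a q_b \, s(a,b)
              = \sum_{a,b} q_a q_b \log \frac{p_{ab}}{q_a q_b} \\
&\le \log \sum_{a,b} q_a q_b \cdot \frac{p_{ab}}{q_a q_b}
  && \text{(Jensen: $\log$ is concave and $\textstyle\sum_{a,b} q_a q_b = 1$)} \\
&= \log \sum_{a,b} p_{ab} = \log 1 = 0 .
\end{align*}
```

Equality in Jensen's inequality requires p_ab / (q_a q_b) to be the same constant c for every pair; summing p_ab = c q_a q_b over all pairs forces c = 1, which would make every score 0. Hence the expected score is strictly negative whenever the target frequencies differ from q_a q_b.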
3. Suppose we are given two strings S1 and S2, where S1 has length n, S2 has length m, and n > m. The objective is to find a segment of S1, if one exists, such that the edit distance between this segment and S2 is no more than K, where K is a given constant. Recall that in the edit distance problem, substitutions, insertions and deletions each cost 1 and matches cost 0; the objective is to find the least costly way of transforming one sequence into the other. Give a dynamic programming algorithm for this problem that runs in O(nm) time. Note: you should clearly define the meaning of any table you use, the recursions, the initialization, how to find solutions, and the time analysis.
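One common way to approach this kind of problem (a sketch, not necessarily the intended solution) is to change only the initialization of the edit distance DP so that S2 may start matching anywhere in S1 at no cost. Here E[i][j] is the minimum edit distance between S2[1..j] and some segment of S1 that ends at position i: E[i][0] = 0 for every i, E[0][j] = j, the recursion is unchanged, and any i with E[i][m] ≤ K ends a valid segment. The function name `find_segment` and the choice to return the cheapest such segment are assumptions.

```python
def find_segment(s1: str, s2: str, k: int):
    """Return (start, end, cost) with s1[start:end] within edit distance k of s2,
    choosing the cheapest end position; return None if no segment qualifies.

    E[i][j] = minimum edit distance between s2[:j] and some segment s1[h:i], 0 <= h <= i.
    """
    n, m = len(s1), len(s2)
    E = [[0] * (m + 1) for _ in range(n + 1)]  # E[i][0] = 0: empty s2 vs empty segment
    for j in range(m + 1):
        E[0][j] = j  # only the empty segment ends at position 0
    for i in range(1, n + 1):  # O(nm) cells, O(1) work each
        for j in range(1, m + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else 1
            E[i][j] = min(E[i - 1][j - 1] + sub,  # match or substitution
                          E[i - 1][j] + 1,        # extra character in the segment
                          E[i][j - 1] + 1)        # character of s2 missing from the segment
    end = min(range(n + 1), key=lambda i: E[i][m])  # scan the last column: O(n)
    if E[end][m] > k:
        return None
    # Walk back from (end, m) until all of s2 is consumed; the row reached is the start.
    i, j = end, m
    while j > 0:  # O(n + m) steps
        if i > 0 and E[i][j] == E[i - 1][j - 1] + (0 if s1[i - 1] == s2[j - 1] else 1):
            i, j = i - 1, j - 1
        elif i > 0 and E[i][j] == E[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return i, end, E[end][m]


if __name__ == "__main__":
    print(find_segment("zzPORKzz", "PARK", 1))
```

The only departure from the whole-string table is the first column: because E[i][0] = 0 for every i, skipping a prefix of S1 is free, and taking the minimum over the last column lets the segment end anywhere. Filling the table is O(nm); the final scan and the walk back add only O(n + m).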
4. We are interested in the edit distance problem. Recall the simple dynamic programming formulation we went over in class; if you forget it, try to write it down yourself. In this formulation, we have a table D, where D[i, j] equals the minimum cost of transforming S1[1..i] into S2[1..j].
(a) First prove the following claim: the values D[i, j] along a line (horizontal, vertical or diagonal, in the increasing direction) can increase or decrease by at most one between adjacent entries.
(b) Next prove something stronger: D[i, j] cannot decrease along diagonals. That is, D[i, j] ≤ D[i + 1, j + 1] for all i, j (assuming the indices are within range).
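Before writing the proofs, it can help to check both claims numerically. The sketch below (hypothetical helper names; random test strings over an assumed ACGT alphabet) builds D for many random string pairs and checks that horizontal and vertical neighbors differ by at most 1 and that every diagonal step is 0 or 1, which is (a) and (b) combined. Passing this check is evidence, not a proof.

```python
import random


def edit_distance_table(s1: str, s2: str) -> list[list[int]]:
    """D[i][j] = minimum unit-cost edit distance between s1[:i] and s2[:j]."""
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + sub, D[i - 1][j] + 1, D[i][j - 1] + 1)
    return D


def claims_hold(D: list[list[int]]) -> bool:
    """Check claim (a) on rows and columns, and (a) plus (b) on diagonals."""
    n, m = len(D) - 1, len(D[0]) - 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i < n and abs(D[i + 1][j] - D[i][j]) > 1:
                return False  # vertical neighbors differ by more than 1
            if j < m and abs(D[i][j + 1] - D[i][j]) > 1:
                return False  # horizontal neighbors differ by more than 1
            if i < n and j < m and not 0 <= D[i + 1][j + 1] - D[i][j] <= 1:
                return False  # diagonal step must be 0 or 1
    return True


def random_counterexample(trials: int = 2000, alphabet: str = "ACGT",
                          max_len: int = 8, seed: int = 0):
    """Return a pair (s1, s2) violating the claims, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        s1 = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
        s2 = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))
        if not claims_hold(edit_distance_table(s1, s2)):
            return s1, s2
    return None


if __name__ == "__main__":
    print(random_counterexample())  # None means no violation was found
```

A short, fixed-seed search like this is a quick way to catch a misstated claim before investing in a proof; it cannot replace the induction on i and j that the problem asks for.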