Biology Assignment: Tree Inference Algorithms and Protein Sequence Analysis, Exercises of Computational Biology

A biology assignment that involves computing distance matrices for tree inference, analyzing different algorithms for tree reconstruction, and identifying open reading frames and translating them into proteins. Students are expected to submit their answers in hardcopy in class.

Typology: Exercises

2012/2013

Uploaded on 04/23/2013

ashwini
HW 3: due Thursday, November 10th in class

For this assignment, all answers are to be submitted in hardcopy in class.

This problem compares different algorithms for tree inference. Note that although in this case you know the correct tree, with real data you would have only the distance matrix, not the tree, available to you.

  1. Compute the distance matrix for the taxa V, W, X, Y, and Z from the tree below.
  2. Is the distance matrix additive? Is it ultrametric? How do you know?
  3. Suppose you do not know the underlying tree. Working only from the distance matrix computed in part 1, which taxa would the UPGMA algorithm first select as neighbors? Show the first subtree connecting these two taxa, with branch lengths, according to the UPGMA algorithm.
  4. Suppose again that you do not know the underlying tree. Working only from the distance matrix computed in part 1, show the corrected distance matrix computed for the first step of the neighbor joining (NJ) algorithm. Which taxa would the NJ algorithm first select as neighbors? Show the subtree connecting these two taxa, with the branch lengths as determined by the NJ algorithm.
  5. Which algorithm is better for reconstructing this tree, and why?
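[Figure: tree with leaves V, W, X, Y, and Z. Branch-length labels as extracted: 5, 2, 10, 1, 2, 210, 5. The tree topology is not recoverable from the extracted text.]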
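
The checks asked for in parts 2 to 4 can be sketched in Python. This is a minimal illustration, not a solution: the matrix `D` is a made-up four-taxon example (not the matrix for the tree in this assignment), and all function names are hypothetical. The sketch tests the three-point (ultrametric) and four-point (additive) conditions, picks the closest pair as the first UPGMA step would, and computes one common form of the NJ selection criterion, Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of row i. Your course's "corrected distance" may be scaled differently.

```python
from itertools import combinations

def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances are equal."""
    for i, j, k in combinations(range(len(d)), 3):
        a, b, c = sorted((d[i][j], d[i][k], d[j][k]))
        if abs(c - b) > tol:
            return False
    return True

def is_additive(d, tol=1e-9):
    """Four-point condition: of the three pairings of a quartet, the two largest sums are equal."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True

def upgma_first_pair(d):
    """The first UPGMA step joins the pair with the smallest distance."""
    return min(combinations(range(len(d)), 2), key=lambda p: d[p[0]][p[1]])

def nj_q_matrix(d):
    """Q(i, j) = (n - 2) * d(i, j) - r_i - r_j, with r_i the sum of row i."""
    n = len(d)
    r = [sum(row) for row in d]
    q = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        q[i][j] = q[j][i] = (n - 2) * d[i][j] - r[i] - r[j]
    return q

def nj_first_pair(d):
    """The first NJ step joins the pair with the smallest Q value (first one found on ties)."""
    q = nj_q_matrix(d)
    return min(combinations(range(len(d)), 2), key=lambda p: q[p[0]][p[1]])

def nj_branch_lengths(d, i, j):
    """Branch lengths from taxa i and j to the new internal node created by NJ."""
    n = len(d)
    li = 0.5 * d[i][j] + (sum(d[i]) - sum(d[j])) / (2 * (n - 2))
    return li, d[i][j] - li

# Hypothetical example: unrooted tree ((A,B),(C,D)) with leaf branches
# A=1, B=4, C=1, D=4 and an internal branch of length 1.
TAXA = ["A", "B", "C", "D"]
D = [
    [0, 5, 3, 6],
    [5, 0, 6, 9],
    [3, 6, 0, 5],
    [6, 9, 5, 0],
]

print("ultrametric:", is_ultrametric(D))
print("additive:", is_additive(D))
print("UPGMA joins:", [TAXA[t] for t in upgma_first_pair(D)])
print("NJ joins:", [TAXA[t] for t in nj_first_pair(D)])
print("NJ branch lengths:", nj_branch_lengths(D, 0, 1))
```

For this toy matrix, UPGMA first joins A and C because they are the closest pair (each would get a branch of half their distance, 1.5), while NJ picks A and B, the true neighbors in the tree the matrix was built from, and recovers their branch lengths of 1 and 4.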
This problem was inspired by a problem assigned in Shamir’s 2001 Algorithms for Molecular Biology class.

Background: You are a biologist studying a rare human disease called Homework. After years of work, you understand that it is associated with specific malfunctioning cells. You harvest mRNA from such a cell, and get a cDNA, which you sequence. You reverse and complement the result to obtain the coding strand, and get:

GTGGCCGCTTCTGCCAGCGCGAGGTGAGCTTCCTCAATTGCTCGCTGGACA

ACGGCGGCTGCACGCATTACTGCCTAGAGGAGGTGGGCTGGCGGCGCTGTA

GCTGTGCGCCTGGCTACAAGCTGGGGGACGACCTCCTGCAGTGTCACCCCG

CAGTGAAGTTCCCTTGTGGGAGGCCCTGGAAGCGGATGGAGAAGAAGCGCA

GTCACCTGAAACGAGACACAGAAGACCAAGAAGACCAAGTAGATCCGCGCT

CATTGAT
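
The reverse-and-complement step mentioned in the background can be sketched in Python as follows. The function name `reverse_complement` is illustrative, and the sketch assumes an uppercase sequence containing only A, C, G, and T.

```python
# Complement table for unambiguous DNA bases (assumption: uppercase A/C/G/T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("ATGGCC"))  # GGCCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check when switching between strands.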

  1. Search this sequence for open reading frames that make sense, and translate them into protein. For the sequence, or a subsequence of it, is each of the following statements correct, incorrect, or possible? Explain your answers.

     (a) It is a coding region.
     (b) It is an exon.
     (c) It is an intron.
     (d) It is a 5’ untranslated region.
     (e) It is a 3’ untranslated region.

  2. Search the databases for your full-length molecule. Which database should you use? Which of the following is correct, incorrect, or possibly correct? Justify your answers.

     (a) You have found a new gene.
     (b) You have found a mutated version of a known gene (what are the mutations?).
     (c) Your sequencing machine made sequencing errors (what are they?).
     (d) The gene is not human, but rather contamination from a bacterium, fungus, or virus in the test tube.
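
The ORF search and translation in part 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not a full gene finder: it uses the standard genetic code, scans only the three forward frames of the strand it is given, and reports only ORFs that run from an ATG to an in-frame stop codon. The names `translate` and `find_orfs` and the `min_codons` parameter are hypothetical. A partial cDNA can contain a reading frame that is open at either end, which this sketch does not report.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate codon by codon from position 0; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def find_orfs(dna, min_codons=1):
    """Return (frame, start, end, protein) for each ATG-to-stop ORF on this strand.

    start is the 0-based index of the A in ATG; end is one past the stop codon.
    Assumes uppercase A/C/G/T input.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and CODON_TABLE[codon] == "*":
                protein = translate(dna[start:i])  # stop codon excluded
                if len(protein) >= min_codons:
                    orfs.append((frame, start, i + 3, protein))
                start = None
    return orfs

# Toy sequence (not the assignment's sequence): one ORF, ATG GCC TAA, in frame 2.
print(find_orfs("CCATGGCCTAAGG"))  # [(2, 2, 11, 'MA')]
```

To use it on the sequence above, join the lines into one string first and choose a `min_codons` threshold that filters out very short ORFs. To scan the opposite strand, pass in the reverse complement of the sequence.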