

Four potential projects for the CMSC423 class: a gene finder, a shotgun sequence overlapper, a multiple aligner, and an RNA folding program. Students can work individually or in teams, with team projects requiring additional features. Project deliverables include source code, documentation, and a report. The handout gives details on project requirements, input and output formats, and suggested resources.


CMSC423 Project 2
Handed out: 11/11/
Due: 12/9/
Project choice due by 11/18/

The second project in the class is your choice. For convenience I've listed four possible projects below; however, you can also come up with your own project. Furthermore, for this project you can work in a team of up to two people. You have until next Tuesday (11/18/2008) to email me both your project choice and the name of your partner.

If you choose to work as a team, the project will have to be more complex than if you work alone. I have outlined how in the description of the projects below. Note that the description of the projects is intentionally vague: part of your assignment is to figure out what will make your project good.

Project deliverables:
3. Multiple aligner
Implement a simple multiple alignment program by extending the Smith-Waterman algorithm implemented in project 1 to support the progressive alignment approach. As input you must accept a set of protein sequences in FASTA format, then output their multiple alignment in ClustalW format (http://www.bioperl.org/wiki/ClustalW_multiple_alignment_format). If working in a team, your aligner must use a guide tree, and you need to evaluate the impact of the tree structure on the multiple alignment (e.g. implement UPGMA and neighbor-joining and compare the quality of the alignments). Test data and reference alignments can be obtained from BAliBASE (http://bips.u-strasbg.fr/fr/Products/Databases/BAliBASE/).

4. RNA folding
Write a program that computes the secondary structure of an RNA molecule. The input to your program will consist of an RNA sequence in FASTA format, and the output must be presented in parenthesized form:
Sample RNA
AAAAAAAAAAAAAGGGGGGGUUUUUUUUUUUUUUCCCCCCCCCCCCCCCCC
.............(((((((..............)))))))..........

Note: RNA sequences can be obtained from the NCBI: http://www.ncbi.nlm.nih.gov

If working in a team, your program must take into account base-pairing energies as well as a stacking term (the energy of a stem depends on the number of stacked bases, in a way similar to affine gap penalties in Smith-Waterman). Energy values can be found at http://www.bioinfo.rpi.edu/zukerm/cgi-bin/efiles-3.0.cgi. Note: simplify the information on this website. Just pick two energy values for each pair of bases: one energy if the bases are part of a stem, another if they are not.
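
To illustrate the parenthesized output format for the RNA folding project, here is a minimal sketch of one possible starting point. The handout does not prescribe an algorithm; this sketch assumes a Nussinov-style dynamic program that simply maximizes the number of Watson-Crick pairs (A-U, G-C). The names `read_fasta`, `can_pair`, `fold`, and the `min_loop` parameter are hypothetical choices made for the example, and no energies or stacking terms are modeled.

```python
# Assumption: Nussinov-style pair maximization, not an energy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def read_fasta(text):
    """Return (header, sequence) for the first record in a FASTA string."""
    header, parts = None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                break  # only the first record is read in this sketch
            header = line[1:].strip()
        elif line:
            parts.append(line.upper())
    return header, "".join(parts)


def can_pair(a, b):
    return (a, b) in PAIRS


def fold(seq, min_loop=3):
    """Return a dot-bracket string with a maximum number of nested pairs.

    min_loop is the minimum number of unpaired bases inside a hairpin
    (a hypothetical default chosen for this sketch).
    """
    n = len(seq)
    # best[i][j] = maximum number of pairs within seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: base j is unpaired
            for k in range(i, j - min_loop):  # case: base j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back through the table to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


if __name__ == "__main__":
    name, seq = read_fasta(">toy hairpin\nGGGAAACCC\n")
    print(name)
    print(seq)
    print(fold(seq))
```

Counting pairs alone will not necessarily reproduce the sample structure above: on that sequence the A-U pairs outnumber the G-C pairs, so a pure pair count favours them over the G-C stem shown. Replacing the `+ 1` per pair with per-pair energies, and adding a stacking bonus for consecutive pairs, is the kind of extension the team version of the project asks for.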