Overlap Graph - Computational Biology - Assignment, Exercises of Computational Biology

Chitkara University Computational Biology

Main points of this assignment are: Overlap Graph, Set of Sequences, Corresponding Sequences, Hamiltonian Path, Maximum Overlap, Suffix Tree, Overlap Detection Program, Sequence File, Source Code, Successive ID Numbers, Overlapping Sequences

Typology: Exercises

2012/2013

Uploaded on 04/23/2013

ashwini 🇮🇳


HW 1: due Thursday, September 29th in class

Readings:

• Green, E.D. (2001). Strategies for the systematic sequencing of complex genomes. Nature Reviews Genetics, 2:573-83.

• Pop, Mihai and Salzberg, Steven (2008). Bioinformatics challenges of new sequencing technology. Trends in Genetics, 24(3): 142-149.

• (Optional) Miller, J.R., Koren, S. and Sutton, G. (2010). Assembly algorithms for next-generation sequencing data. Genomics, 95(6): 315-327.

Problems: Submit a hard copy of your answers to problems 1 and 2. For Problem 3, please use the provide command to submit your output file as well as any program files. This works from any EECS machine by typing:

conbrio% provide comp167 hw1 myfilename1.here myfilename2.here ..

1. Recall that the overlap graph for a set of sequences is defined by including a vertex for every sequence, and there is a directed edge from vertex a to vertex b of weight t if, for the corresponding sequences, the maximum overlap of a suffix of a with a prefix of b contains t characters. Draw the overlap graph for the following sequences. Find a maximum weight Hamiltonian path in this graph, and show the assembly this corresponds to.

(a) ACCA

(b) CAGGG

(c) CCAATA

(d) CGCC
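
The definition above can be checked mechanically. The following Python sketch is illustrative only and is not part of the assignment: the function names max_overlap, overlap_graph and best_hamiltonian_path are hypothetical, zero-weight edges are left out of the graph, and the path search simply tries every ordering, which is only practical for a handful of sequences.

```python
from itertools import permutations


def max_overlap(a, b):
    """Length of the longest suffix of a that equals a prefix of b (0 if none)."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def overlap_graph(seqs):
    """Map each ordered pair of labels (x, y) to its overlap weight.

    Assumption: pairs with no overlap are omitted rather than stored as 0.
    """
    edges = {}
    for x in seqs:
        for y in seqs:
            if x == y:
                continue
            t = max_overlap(seqs[x], seqs[y])
            if t > 0:
                edges[(x, y)] = t
    return edges


def best_hamiltonian_path(seqs):
    """Brute-force search over all vertex orderings; missing edges count as 0."""
    edges = overlap_graph(seqs)
    best_order, best_weight = None, -1
    for order in permutations(seqs):
        weight = sum(edges.get(pair, 0) for pair in zip(order, order[1:]))
        if weight > best_weight:
            best_order, best_weight = order, weight
    return best_order, best_weight


def assemble(seqs, order):
    """Merge sequences along a path, collapsing each pairwise overlap."""
    result = seqs[order[0]]
    for prev, nxt in zip(order, order[1:]):
        k = max_overlap(seqs[prev], seqs[nxt])
        result += seqs[nxt][k:]
    return result


if __name__ == "__main__":
    seqs = {"a": "ACCA", "b": "CAGGG", "c": "CCAATA", "d": "CGCC"}
    for (x, y), t in sorted(overlap_graph(seqs).items()):
        print(f"{x} -> {y} (weight {t})")
    order, weight = best_hamiltonian_path(seqs)
    print(" -> ".join(order), "total weight", weight)
    print(assemble(seqs, order))
```

Trying all orderings costs n! path evaluations, so for real read sets an assembler would use a heuristic (for example, greedily merging the highest-weight edges) rather than this exhaustive search.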

2. Draw the suffix tree associated with the sequence ATCCATTATG.

3. (Inspired by S. Salzberg) In this programming assignment, you are to write an overlap detection program that could be used as part of a sequence assembler. The program should assemble the sequences in the text file "hw1.reads", available from the course web site. To make this easier, the sequences are all from the same strand (you do not need to reverse-complement any of them) and there are no errors. Sequence A is considered to overlap sequence B if there is a suffix of A at least 40bp in length that exactly matches a prefix of B. Note that this relation is not necessarily symmetric; if A overlaps B, B may not overlap A!

You should submit both your source code and an output file showing all the overlaps detected in the sequence file. Sort the file by the ID number of each of the sequences; for each sequence, the file should contain EXACTLY one line. In other words, you will have one line in your file corresponding to each of the sequences in the data file, in sorted order. That line should contain the ID of the sequence followed by the IDs of all sequences that it overlaps. The list of overlapping sequences should also be sorted in order by ID number. For example:

R1 R24 R R175
R2 R33 R109 R138
... etc.

We will be comparing your files to the correct answer using 'diff', so the format should match exactly. Put exactly one space between successive ID numbers and no whitespace after the last ID number in each line. We will also check your program on another data set.
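
One possible shape for the overlap test and the output format is sketched below in Python. It is a minimal illustration under stated assumptions, not a reference solution: the layout of "hw1.reads" is not described here, so the sketch takes reads as an in-memory dictionary from ID to sequence; it assumes IDs have the form R followed by a number; and it assumes a sequence is not reported as overlapping itself. The names overlaps, id_number and overlap_report are hypothetical.

```python
def overlaps(a, b, min_len=40):
    """True if some suffix of a, at least min_len long, exactly matches a prefix of b."""
    longest = min(len(a), len(b))
    for k in range(max(min_len, 1), longest + 1):
        if a[-k:] == b[:k]:
            return True
    return False


def id_number(read_id):
    """Numeric part of an ID such as 'R24' (assumed format: one letter, then digits)."""
    return int(read_id[1:])


def overlap_report(reads, min_len=40):
    """One line per read, sorted by ID number: the read's ID, then the IDs it overlaps.

    Fields are joined by single spaces, so no line has trailing whitespace.
    Assumption: self-overlaps are not reported.
    """
    ids = sorted(reads, key=id_number)
    lines = []
    for a in ids:
        hits = [b for b in ids if b != a and overlaps(reads[a], reads[b], min_len)]
        lines.append(" ".join([a] + hits))
    return lines


if __name__ == "__main__":
    # Toy reads with a small threshold, purely to show the output format.
    toy = {"R2": "CCAATA", "R1": "ACCA", "R10": "CAGGG", "R3": "CGCC"}
    print("\n".join(overlap_report(toy, min_len=2)))
```

This compares every ordered pair of reads and tries every candidate overlap length, which is quadratic in the number of reads; an index over prefixes (or a suffix tree, as in problem 2) would avoid most of those comparisons on larger inputs.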
