
Assignment 3 - Unlocking the Code | STOR 072, Assignments of Statistics

University of North Carolina (UNC) - Chapel Hill Statistics

Prof. J. S. Provan

Material Type: Assignment; Professor: Provan; Class: FYS UNLOCKING THE CODE; Subject: STATISTICS AND OPERATIONS RESEARCH; University: University of North Carolina - Chapel Hill; Term: Unknown 1989;

Typology: Assignments

Pre 2010

Uploaded on 03/11/2009


OR072: Assignment #3

1. Recall that a restriction enzyme cuts a DNA molecule at points in the base sequence exhibiting a specific “word” of A/C/G/T bases. For example, a restriction enzyme might cut only after seeing the sequence ATT.

a. How many sequences of length 3 are possible using the letters A, C, G, and T?

b. Assuming any one of these 3-letter “words” is equally likely to occur in a typical DNA sequence, what are the chances of seeing the exact sequence ATT at a particular point in the sequence?

c. Using the fact that the average number of steps you would take down a DNA strand before hitting ATT is 1/(the chances of seeing ATT), on average how long would each fragment be after cutting at all ATTs using this enzyme?

d. Try (a)-(c) for a four-base sequence, such as ATTG, to find out how big your fragments would be. Continue this process to find out how big a “word” your restriction enzyme would need to cut at if you want to obtain DNA fragments of about 1,000 bases. How about 15,000 bases?
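The arithmetic in parts (a) through (d) can be checked with a short sketch. It uses the problem's own assumption that every k-letter word is equally likely, so a specific word appears at a given position with probability 1/4^k and the average fragment is 4^k bases long. The function names are illustrative, `smallest_word_length` reads "about N bases" as the shortest word whose average fragment reaches N, and the random-sequence simulation is an extra sanity check rather than part of the assignment.

```python
import random

BASES = "ACGT"


def num_words(k: int) -> int:
    """Number of distinct k-letter words over the alphabet A, C, G, T."""
    return len(BASES) ** k


def match_probability(k: int) -> float:
    """Chance that one specific k-letter word starts at a given position,
    assuming all k-letter words are equally likely."""
    return 1 / num_words(k)


def expected_fragment_length(k: int) -> float:
    """Average fragment length = 1 / (chance of seeing the word)."""
    return 1 / match_probability(k)


def smallest_word_length(target: float) -> int:
    """Shortest word length whose average fragment reaches `target` bases."""
    k = 1
    while expected_fragment_length(k) < target:
        k += 1
    return k


def simulated_mean_fragment(word: str, n: int, seed: int = 0) -> float:
    """Cut a random length-n sequence just after each occurrence of `word`
    and return the mean distance between consecutive cuts."""
    rng = random.Random(seed)
    seq = "".join(rng.choice(BASES) for _ in range(n))
    cuts = []
    hit = seq.find(word)
    while hit != -1:
        cuts.append(hit + len(word))  # cut position just after the word
        hit = seq.find(word, hit + 1)
    if len(cuts) < 2:
        raise ValueError("sequence too short to produce two cuts")
    gaps = [b - a for a, b in zip(cuts, cuts[1:])]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    for k in (3, 4):
        print(k, num_words(k), match_probability(k), expected_fragment_length(k))
    print(smallest_word_length(1000), smallest_word_length(15000))
    print(round(simulated_mean_fragment("ATT", 200_000), 1))
```

Running it prints `3 64 0.015625 64.0` and `4 256 0.00390625 256.0` for the ATT and ATTG cases, then `5 7`: a 5-base word gives fragments averaging 4^5 = 1,024 bases, and a 7-base word gives 4^7 = 16,384 (a 6-base word gives only 4,096). The simulated mean should land near 64, though its exact value depends on the seed.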

2. Use the BLAST webpage to find the best match for the following DNA sequences. In each case, give the first match that is a real organism (rather than a vector of some sort), give its E-value, and give the common name for that organism.
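The mystery sequences are printed in groups of ten letters. If you want to paste one into BLAST as FASTA text (a `>` header line followed by sequence lines), a minimal sketch like this one can join the groups and check whether a sequence is DNA, which tells you to use nucleotide BLAST rather than protein BLAST. The function names, header text, and 60-letter line width are arbitrary choices for illustration, not BLAST requirements.

```python
import re


def _clean(raw: str) -> str:
    """Drop whitespace and digits, then upper-case the remaining letters."""
    return re.sub(r"[\s\d]", "", raw).upper()


def to_fasta(raw: str, header: str = "mystery_sequence", width: int = 60) -> str:
    """Join a sequence printed in spaced groups into FASTA text,
    re-wrapped at `width` characters under a '>' header line."""
    residues = _clean(raw)
    if not residues:
        raise ValueError("no sequence letters found")
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([">" + header] + lines)


def is_dna(raw: str) -> bool:
    """True if the cleaned sequence uses only the bases A, C, G, T."""
    residues = _clean(raw)
    return bool(residues) and set(residues) <= set("ACGT")


if __name__ == "__main__":
    print(to_fasta("ATGTGCCATG TTAAGCGTAT", header="mystery1"))
```

The example prints a `>mystery1` header line followed by `ATGTGCCATGTTAAGCGTAT`.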

mystery sequence #1:

ATGTGCCATG TTAAGCGTAT TTAACACTGA TGATTACAGT CCAGCTGCGC AACAGAATAT
TCCTGCTCTC CGGAGAAGCT CTTCCTTCAT TTGCGCTGAA AGCTGTAGCT CTAAGTATCA
GTGTGAAGCA GGAGAAAACA GTAAAGGCAG CGTCCAGGAT AGAGTGAAGC GACCCATGAA
CGCATTCATT GTGTGGTCTC GGATCAGCAG GCGCAAGATG GCTCTAGAGA ATCCCAAAAT
GCGAAACTCA GAGATCAGCA AGCAGCTGGG ATACCAGTGG AAAATCCTTA CCGAAGCCGA
TAAATGGCCA TTCTTCCAGG AGGCACAGAA ACTACAGGCC ATGCATAGAG AGAAATACCC
GAATTATAAG TATCGACCTC GTCGGAAGGC GAAGATGCTG CAAAACAGTT GCAGTTTGCT
TCCGGCAGAT CCCTCTTCGG TCCCTGCCAG AGAAGTGTAC AACAACAGGT TGTACAGGGA
TGACTGTACC AAAGCCACGC ACTCAAGAAT GCAGCACCAG TTAGTCCACT TACCGCCCAT
CAACACAGCC AGCTCACCGC AGCAACGGGA CCGCTACAGC CACTCGATTC CAATCATATG
CCAAAGCTGT AG

Find the 6 errors in the best match, giving the location number of each error and whether it is a mismatch, a gap in the databank sequence, or a gap in your sequence.
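Once BLAST shows the pairwise alignment, the bookkeeping for the six errors can be sketched as follows. It assumes you have concatenated the aligned `Query` and `Sbjct` rows across all alignment blocks, with `-` marking gaps, and that `query_start` is the first query coordinate shown. Positions are reported in your sequence's numbering; where your sequence has a gap, the code reports the position of the last base before it, which is one reasonable convention but not necessarily the one the assignment expects. The function name is hypothetical.

```python
from typing import List, Tuple


def classify_differences(query: str, subject: str, query_start: int = 1) -> List[Tuple[int, str]]:
    """List alignment differences as (query position, kind) pairs.

    A '-' in the subject row is a gap in the databank sequence;
    a '-' in the query row is a gap in your sequence.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    pos = query_start - 1  # query position of the last base consumed so far
    diffs = []
    for q, s in zip(query.upper(), subject.upper()):
        if q == "-":
            # Your sequence lacks a base here; report the preceding query position.
            diffs.append((pos, "gap in your sequence"))
            continue
        pos += 1
        if s == "-":
            diffs.append((pos, "gap in databank sequence"))
        elif q != s:
            diffs.append((pos, "mismatch"))
    return diffs
```

For example, `classify_differences("ACGT-ACGTA", "ACCTTAC-TA")` returns `[(3, 'mismatch'), (4, 'gap in your sequence'), (7, 'gap in databank sequence')]`.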

mystery sequence #2:


In Michael Crichton's Jurassic Park (p. 103), a putative dinosaur DNA sequence is given. What is the nearest match in the database to this sequence? Is Crichton pulling one over on us? In the output screen, scroll down to the first diagram of the first match (the one with the letter pairings separated by |). Do you see anything unusual about the pattern of mismatches? Extra credit for the correct interpretation of this odd match. (Hint: The sequence is formatted exactly the way it appears in the book. Further, the mismatches have nothing to do with biology or BLAST.)

GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGC
GGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCG
TGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGC
TGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTG
CCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAA
AGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAG
ATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACT
CCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCT
GGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGG
CCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAA
CGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCG
CACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAA
CAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAA
GCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGG
CTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTG
ACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCA
ACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCC
GCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGG
CCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGG
CCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCT

mystery sequence #3: this is a protein sequence:

MAHETSFNDA LDYIYIANSM NDRAFLIAEP HPEQPNVDGQ DQDDAELEEL DDMAVTDDGQ
LEDTNNNNNS KRYYSSGKRR ADFIGSLALK PPPTDVNTTT TTAGSPLATA ALAAAAASAS
VAAAAARITA KAAHRALTTK QDATSSPASS PALQLIDMDN NYTNVAVGLG AMLLNDTLLL
EGNDSSLFGE MLANRSGQLD LINGTGGLNV TTSKVAEDDF TQLLRMAVTS VLLGLMILVT
IIGNVFVIAA IILERNLQNV ANYLVASLAV ADLFVACLVM PLGAVYEISQ GWILGPELCD
IWTSCDVLCC TASILHLVAI AVDRYWAVTN IDYIHSRTSN RVFMMIFCVW TAAVIVSLAP
QFGWKDPDYL QRIEQQKCMV SQDVSYQVFA TCCTFYVPLL VILALYWKIY QTARKRIHRR
RPRPVDAAVN NNQPDGGAAT DTKLHRLRLR LGRFSTAKSK TGSAVGVSGP ASGGRALGLV
DGNSTNTVNT VEDTEFSSSN VDSKSRAGVE APSTSGNQIA TVSHLVALAK QQGKSTAKSS
AAVNGMAPSG RQEDDGQRPE HGEQEDREEL EDQDEQVGPQ PTTATSAMTA AGTNESEDQC
KANGVEVLED PQLQQQLEQV QQLQKSVKSG GGGGASTSNA TTITSISALS PQTPTSQGVG
IAAAAAGPMT AKTSTLTSCN QSHPLCGTAN ESPSTPEPRS RQPTTPQQQP HQQAHQQQQQ
QQQLSSIANP MQKVNKRKET LEAKRERKAA KTLAIITGAF VVCWLPFFVM ALTMPLCAAC
QISDSVASLF LWLGYFNSTL NPVIYTIFSP EFRQAFKRIL FGGHRPVHYR SGKL

Also find the nearest match to sequence #3 among humans and among rats. In each case, give the E-value of the match.
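For mystery sequence #2, the hint points at the book's layout. One way to look for a pattern is to convert each mismatch position into the line and column where it falls on the page. The sketch below assumes the sequence is laid out in 60-base lines, as in the Jurassic Park sequence given with this problem, and that mismatch positions are counted in your query's numbering starting from base 1. The helper names are hypothetical, and the code only tabulates positions; interpreting the pattern is still up to you.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def book_line_and_column(position: int, line_length: int = 60) -> Tuple[int, int]:
    """Map a 1-based sequence position to a 1-based (line, column) pair."""
    if position < 1:
        raise ValueError("positions are 1-based")
    line, column = divmod(position - 1, line_length)
    return line + 1, column + 1


def mismatches_by_line(positions: Iterable[int], line_length: int = 60) -> Dict[int, List[int]]:
    """Group mismatch positions by printed line, listing their columns in order."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for pos in sorted(positions):
        line, column = book_line_and_column(pos, line_length)
        grouped[line].append(column)
    return dict(grouped)
```

For instance, `mismatches_by_line([70, 5, 65, 130])` returns `{1: [5], 2: [5, 10], 3: [10]}`.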
