Analyzing Sequences: Finding ORFs and Domains using NCBI Tools

October 6, 2004

INTRODUCTION TO ANALYZING SEQUENCES:

A) Finding sequences in the database:

To begin our sequence analysis, let’s first look at a sequence entry from GenBank. First go to the following webpage: http://www.ncbi.nlm.nih.gov/. This website is run by the National Center for Biotechnology Information (NCBI), part of the U.S. National Library of Medicine; it maintains databases of sequences and provides support programs for accessing them. One very useful tool here is PubMed, which allows you to search the literature for any subject or author you are interested in. To use PubMed, click the pulldown menu next to the SEARCH button on the left and choose the PUBMED option. Type in a name or subject and test it out.

Back to our task. Now, in that same SEARCH pulldown menu, choose Nucleotide. Enter CB092467, the accession number for the cycad clone we are working with. Take a look at the entry: it tells you about the tissue the ESTs were made from, the way the cDNA was isolated, and the cloning vector. Copy the DNA sequence and let’s try a few tests on it.
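Text copied from the sequence block of a GenBank entry usually carries position numbers and spaces between blocks of ten bases. The following is a minimal Python sketch for stripping those before pasting the sequence into another tool. The function name `clean_genbank_sequence` and the sample text are made up for illustration; the sample is not the real CB092467 sequence.

```python
import re


def clean_genbank_sequence(raw: str) -> str:
    """Return only the nucleotide letters from text copied out of a GenBank entry.

    Assumes only the numbered sequence lines were copied, not the
    'ORIGIN' keyword itself (its letters would otherwise be kept).
    """
    # Remove everything that is not a letter: position numbers,
    # spaces, line breaks, and the '//' record terminator.
    letters = re.sub(r"[^A-Za-z]", "", raw)
    return letters.upper()


# Made-up example in the numbered layout used by GenBank entries.
sample = """
        1 atggctagct agctaggatc
       21 ttaa
//
"""
print(clean_genbank_sequence(sample))
```

Most web tools used in this exercise tolerate the numbers and spaces anyway, so this step matters mainly if you want to count bases or analyze the sequence yourself.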

B) ORFS

Now let’s see if there are any ORFs (open reading frames) encoded within this DNA. Go to the following website: http://www.ncbi.nlm.nih.gov/gorf/gorf.html and paste the sequence into the big box. Push the button that says ORFfind; your results will arrive in a few seconds. The largest reading frames will be in green and ranked on the right side. 1) What is the longest reading frame (which frame and how many base pairs of DNA)? 2) How do you think this reading frame is defined? 3) How many amino acids is that? Now push the button that says SIX FRAMES and you will see the positions of the stop codons and start codons. Look at your figures again. 4) How long is the longest open reading frame?
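To make the idea of six reading frames concrete, here is a minimal Python sketch of one possible ORF definition: the stretch from the first ATG after a stop codon up to and including the next in-frame stop codon, searched in three frames on each strand. This is an assumption for illustration; the NCBI ORF finder may apply different rules (for example, reporting reading frames that run off the end of a partial EST). The names `find_orfs` and `reverse_complement` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna: str):
    """Return a list of (frame, start, length_bp) for ATG...stop ORFs.

    frame is +1..+3 on the given strand and -1..-3 on the reverse
    complement; start is a 0-based offset on the strand being read;
    length_bp includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for sign, strand in ((1, dna), (-1, reverse_complement(dna))):
        for offset in range(3):
            start = None  # position of the current open ATG, if any
            for i in range(offset, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orfs.append((sign * (offset + 1), start, i + 3 - start))
                    start = None
    return orfs


# Made-up toy sequence, not the EST from the assignment.
orfs = find_orfs("CCATGGCTTAAGG")
longest = max(orfs, key=lambda orf: orf[2])
amino_acids = longest[2] // 3 - 1  # the stop codon encodes no amino acid
print(longest, amino_acids)
```

Dividing the ORF length in base pairs by three, and subtracting one for the stop codon, gives the number of amino acids, which is the arithmetic behind question 3.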

C) DOMAINS

Now let’s look at protein domains encoded by this short EST. First go to ExPASy (http://us.expasy.org/) to translate your sequence. Under Tools and software packages on the right side, go to the DNA to PROTEIN section and pick translate. Then click on translate at the top of the page. Paste your sequence in and push the translate button. Wait a few seconds for the results. 5) Which is the best ORF? If you click on the strand and reading frame entry in blue, you will get the one-letter amino acid code for that sequence. Copy that and go back to the ExPASy home page: http://us.expasy.org/. Now go to the left-hand column and click on PROSITE (the second column on the left). Paste your amino acid sequence in the box under TOOLS FOR PROSITE. Click on quick scan. This will take a few minutes. 6) What domain is identified by PROSITE and

presented visually? Click on one or two of the PSxxxxx hits. 7) What can you conclude about the domains in this short EST sequence?

D) Basic BLAST searches:

One way to learn about a particular sequence quickly is to compare it to all of the sequences in the database and test whether it matches anything that is well characterized. One program that lets you do this quickly is called BLAST (Basic Local Alignment Search Tool). BLAST can be found on the NCBI website: http://www.ncbi.nlm.nih.gov/

There are basically five kinds of BLAST searches for DNA and protein sequences:

Blastp: Compares an amino acid query sequence against a protein sequence database

Blastn: Compares a nucleotide query sequence against a nucleotide sequence database

Blastx: Compares a nucleotide query sequence translated in all reading frames against a protein sequence database

Tblastn: Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames

Tblastx: Compares a nucleotide query sequence translated in all reading frames against a nucleotide sequence database dynamically translated in all reading frames

Today we will take the EST sequence and compare the results of blastn and blastx searches. On the NCBI website, click on BLAST in the dark blue bar at the top. There are three blastn programs that one can use; for simplicity’s sake let’s use blastn (the third choice). Feel free to test the others and see if you find something different. Click on blastn and paste the cDNA sequence into the search box. We will search the nr (for nonredundant) database, as this contains one entry for each known gene (at least this is the goal). Click search and the results will be back in a few minutes. On the page that opens up you will need to hit the “format” button to see your results.

The matches are color-coded; red and pink are “hot” matches. 8) What color is the best match? 9) What are the E values and what do they tell you? Scroll down to the best matches. 10) Which species are these from? 11) Is any gene mentioned? Scroll down to the alignments. 12) How long are the best matches? 13) How similar are these to your query?
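Two of the numbers BLAST reports can be reproduced by hand, which may help when interpreting questions 9 and 13. The Python sketch below computes percent identity from a pair of aligned strings and an approximate E value from a bit score using the standard relation E = m × n × 2^(−S'), where m is the query length, n is the database length, and S' is the bit score. The function names and example values are hypothetical, and BLAST itself uses corrected "effective" lengths, so its reported E values will differ somewhat.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns where both sequences have the same residue.

    Both strings must be the same length, with '-' marking gaps.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        q == s and q != "-"
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
    )
    return 100.0 * matches / len(aligned_query)


def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E value: chance hits expected at this bit score or better.

    Uses raw lengths rather than BLAST's effective lengths (an assumption).
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Made-up alignment and search-space sizes for illustration only.
print(percent_identity("ACGT-A", "ACCTTA"))
print(expected_hits(bit_score=20, query_len=100, db_len=2**20))
```

The second function shows why the E value depends on database size: the same bit score gives a larger E value in a larger database, and each additional bit of score halves the number of matches expected by chance.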
