Analyzing Sequences: Finding ORFs and Domains using NCBI Tools, Assignments of Biology

A step-by-step guide to analyzing sequences using NCBI tools. It covers finding ORFs using ORFfind, translating sequences to proteins using ExPASy, scanning for protein domains with PROSITE, and performing basic BLAST searches to compare sequences with known genes. Students will learn how to identify the longest ORF, determine protein domains, and understand the significance of E values and species matches.

Uploaded on 08/31/2009

October 6, 2004
INTRODUCTION TO ANALYZING SEQUENCES:
A) Finding sequences in the database:
To begin our sequence analysis, let’s first look at a sequence entry from GenBank.
First go to the following webpage: http://www.ncbi.nlm.nih.gov/. This is the website
of the National Center for Biotechnology Information (NCBI), part of the U.S.
National Library of Medicine at the NIH, which maintains databases of sequences and
provides support programs for accessing these sequences. One very useful tool here
is PUBMED, which allows you to search the literature by subject or author. To use
PUBMED, open the pulldown menu next to the SEARCH button on the left and choose
the PUBMED option. Type in a name or subject and test it out.
Back to our task. Now, at that same SEARCH button, choose nucleotide from the menu.
Enter CB092467, the accession number for the cycad clone we are working with. Take a
look at the entry: it tells you about the tissue the ESTs were made from, the way the
cDNA was isolated, and the cloning vector. Copy the DNA sequence and let’s try a
few tests of the sequence.
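
For those who would rather fetch the record from a script than copy it by hand, here is a minimal Python sketch. It assumes the NCBI E-utilities `efetch` service and assumes that CB092467 is served from the `nuccore` database; neither is part of the original handout. The function names `efetch_fasta_url` and `parse_fasta` are made up for illustration. The sketch only builds the request URL and parses FASTA text, it does not contact NCBI itself.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service (an assumption of this
# sketch; the handout itself only uses the web interface).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build the URL that asks efetch for one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop after the first
            header = line[1:]
        elif header is not None:
            chunks.append(line.upper())
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


if __name__ == "__main__":
    # Paste this URL into a browser, or download it with urllib.request.urlopen.
    print(efetch_fasta_url("CB092467"))
```

The text returned by that URL can be passed to `parse_fasta` to get the bare DNA sequence for the steps that follow.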
B) ORFS
Now let’s see if there are any ORFs (open reading frames) encoded within this DNA.
Go to the following website: http://www.ncbi.nlm.nih.gov/gorf/gorf.html and then
paste the sequence in the big box. Push the button that says ORFfind and wait a few
seconds and your results will arrive. The largest reading frames will be in green and
ranked on the right side. 1) What is the longest reading frame (which frame and how
many base pairs of DNA)? 2) How do you think this is defined? 3) How many
amino acids is that? Now push the button that says SIX FRAMES and you will see
the positions of the stop codons and start codons. Look at your figures again. 4) How
long is the longest open reading frame?
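
To check the web results by hand, the six-frame search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an ORF is taken to run from an ATG to the first in-frame stop codon (TAA, TAG or TGA), the standard genetic code is used, and the frame labels (+1 to +3, -1 to -3) are this sketch's own convention. The web tool may define and number things differently, and a partial EST may contain reading frames that lack a start or a stop.

```python
# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; unknown codons (e.g. containing N) give X."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def find_orfs(dna):
    """List ATG-to-stop ORFs in all six frames, longest first."""
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            start = None
            for i in range(offset, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    length_bp = i + 3 - start  # includes the stop codon
                    orfs.append({
                        "frame": f"{strand}{offset + 1}",
                        "start": start,  # 0-based index on that strand
                        "length_bp": length_bp,
                        "amino_acids": length_bp // 3 - 1,  # stop is not an amino acid
                        "protein": translate(seq[start:i]),
                    })
                    start = None
    return sorted(orfs, key=lambda orf: orf["length_bp"], reverse=True)


if __name__ == "__main__":
    # Toy sequence for illustration only, not the cycad EST.
    for orf in find_orfs("CCATGGCTTAAGG"):
        print(orf)
```

Note how base pairs and amino acids are related here: each codon is three bases, and the stop codon is counted in the DNA length but does not add an amino acid.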
C) DOMAINS
Now let’s look at protein domains encoded by this short EST. Go to EXPASY:
http://us.expasy.org/ to first translate your sequence. Under Tools and software
packages on the right side, go to DNA to PROTEIN-translate. Then click on translate
at the top of the page. Paste your sequence in and push the translate button. Wait a
few seconds for the results. 5) Which is the best ORF? If you click on the strand and
reading frame entry in blue, you will get the one-letter amino acid code for that
sequence. Copy that and go back to the EXPASY page: http://us.expasy.org/ or EXPASY
HOME. Now go to the left-hand column and click on PROSITE, the second column on the
left. Paste your amino acid sequence in the box under TOOLS FOR PROSITE. Click on
quick scan. This will take a few minutes. 6) What domain is identified by PROSITE and
presented visually? Click on one or two of the PSxxxxx hits. 7) What can you conclude
about the domains in this short EST sequence?
D) Basic BLAST searches:
One way to learn about a particular sequence quickly is to compare it to all of the
sequences in the database and test whether it matches anything that is well
characterized. One program that lets you do this quickly is called BLAST (Basic
Local Alignment Search Tool). BLAST can be found on the NCBI website:
http://www.ncbi.nlm.nih.gov/
There are basically five kinds of BLAST searches for DNA and protein sequences:
Blastp   Compares an amino acid query sequence against a protein sequence
         database
Blastn   Compares a nucleotide query sequence against a nucleotide sequence
         database
Blastx   Compares a nucleotide query sequence translated in all reading frames
         against a protein sequence database
Tblastn  Compares a protein query sequence against a nucleotide sequence
         database dynamically translated in all reading frames
Tblastx  Compares a nucleotide query sequence translated in all reading frames
         against a nucleotide sequence database dynamically translated in all
         reading frames
Today we will take the EST sequence and compare the results of blastn and blastx
searches. On the NCBI website, click on BLAST in the dark blue bar at the top. There
are three blastn programs that one can use; for simplicity’s sake let’s use the
blastn (the third choice). Feel free to test the others and see if you find
something different. Click on blastn and copy the cDNA sequence into the search box.
We will search the nr (for nonredundant) database, as this contains one entry for
each known gene (at least this is the goal). Click search and the results will be
back in a few minutes. On the page that opens up you will need to hit the “format”
button to see your results. The matches are color-coded; red and pink are “hot”
matches. 8) What color is the best match? 9) What are the E values and what does
this tell you? Scroll down to the best matches. 10) Which species are these from?
11) Is any gene mentioned? Scroll down to the alignments. 12) How long are the best
matches? 13) How similar are these to your query?
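
When thinking about questions 9 and 13, a small Python sketch may help make the numbers concrete. It rests on two assumptions that are not in the handout: that the E value can be approximated from a hit's bit score S' as E = m × n × 2^(−S'), where m is the query length and n is the total length of the database (BLAST itself uses corrected, "effective" lengths), and that similarity is measured as percent identity over the full alignment length, gaps included. The function names are made up for illustration.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate number of chance hits scoring at least this well.

    Uses E = m * n * 2**(-S'), with raw rather than effective lengths,
    so it will not reproduce BLAST's reported E values exactly.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def percent_identity(aligned_query, aligned_subject):
    """Percent of alignment columns where both sequences have the same residue.

    Both strings must be the aligned rows, the same length, with '-' for gaps.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the cycad EST search.
    for score in (20, 40, 80):
        print(score, expect_value(score, query_len=500, db_len=10**9))
    print(percent_identity("ACGT-A", "ACCTTA"))
```

The point of the first loop is the trend: every extra bit of score halves the number of matches expected by chance, so an E value near zero means the match is very unlikely to be a coincidence, while an E value near or above 1 means a match that good is expected by chance alone.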