Analyzing Sequences Using NCBI BLAST – Solved Assignment | MCB 432, Quizzes of Biology

Material Type: Quiz; Professor: Olsen; Class: Computing in Molecular Biology; Subject: Molecular and Cell Biology; University: University of Illinois - Urbana-Champaign; Term: Spring 2010;

Uploaded on 12/09/2010

Due: Mar. 2, 2010 MCB 432 Name __key________________
Assignment 4 20 points total
Analyzing Sequences Using NCBI BLAST
(http://blast.ncbi.nlm.nih.gov/Blast.cgi)
Data for the Assignment
This assignment explores some basic BLAST analyses using the NCBI WWW Server. A summary of
some elements can be found at: http://www.ncbi.nlm.nih.gov/blast/about/. Pay attention to the details.
In particular, read the entire BLAST Forms section of the page. How you use BLAST (or any other
tool) matters.
Data for the Assignment
The data for this assignment are available in electronic form from the class WWW server. The data are
in the Assignment 04 directory: http://www.life.uiuc.edu/mcb/432/Assignments/04/, or you can find
links to the file on the Schedule page: http://www.life.uiuc.edu/mcb/432/Schedule.html
The file pep1.txt is a plain text file with a peptide sequence in FASTA format.^1 Your goal is to identify
the function of the protein and the organism it comes from.
Using BLAST to Find Sequences Similar to a Protein
Go to the NCBI BLAST Home page: http://blast.ncbi.nlm.nih.gov/Blast.cgi.
Our first step in our quest to identify pep1 and its source organism is to use the blastp program to search
for similar sequences in the nr (non-redundant) protein database.
Follow the link from the BLAST Home page to "protein blast".
1. Associated with the Query Sequence box is a question mark icon. Click on the icon to reveal its
text box, which ends with "more...". Click this link for a detailed description of the input
formats. Review the information and answer the following:
1a. What are the 3 different types (formats) of input acceptable to the BLAST programs?
2 pt FASTA
Bare Sequence
[Sequence] Identifier
1b. What does "K" stand for in a nucleotide sequence? 1 pt G/T [might include (keto)]
1c. What does "K" stand for in a protein sequence? 1 pt lysine
1d. What does "U" stand for in a nucleotide sequence? 1 pt uridine
1e. What does "U" stand for in a protein sequence? 1 pt selenocysteine
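The answers to 1b through 1e show that the same letter can mean different things depending on whether the sequence is nucleotide or protein. A minimal sketch of that idea follows; the table names and the `meaning` helper are hypothetical, and the tables hold only the two letters asked about above, not the full code sets.

```python
# Partial lookup tables: only the letters from questions 1b-1e are included.
NUCLEOTIDE_CODES = {
    "K": "G or T (keto)",
    "U": "uridine",
}
PROTEIN_CODES = {
    "K": "lysine",
    "U": "selenocysteine",
}


def meaning(letter, alphabet):
    """Look up a one-letter code in the table for the given alphabet."""
    if alphabet == "nucleotide":
        table = NUCLEOTIDE_CODES
    elif alphabet == "protein":
        table = PROTEIN_CODES
    else:
        raise ValueError("alphabet must be 'nucleotide' or 'protein'")
    return table.get(letter.upper(), "not in this partial table")


for alphabet in ("nucleotide", "protein"):
    for letter in ("K", "U"):
        print(alphabet, letter, "=", meaning(letter, alphabet))
```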
Use your web browser's Copy and Paste functions to paste a copy of the pep1 definition and
sequence into the Enter Query Sequence box. Check that the following settings are selected (most, but not all,
will be the default value): Database, Non-redundant protein sequences (nr); Algorithm, blastp; Max
target sequences^2, 500; Short queries, Automatically adjust ...; Expect threshold, 10; Word size, 3;
Matrix, BLOSUM62; Gap Costs, Existence: 11 Extension: 1; Compositional adjustments, Conditional
compositional score ...; Filter, unchecked; and Mask, both choices unchecked.
(^1) FASTA format is defined by having a line that starts with a greater than sign (>) followed by the description of
the sequence. All remaining lines to the end of the file, or to the next line that starts with a greater than
sign (>), are interpreted as sequence. Most programs that accept a FASTA format file will allow blank
spaces in the lines with sequence data.
(^2) If you do not see this or the parameters that follow, click the arrow next to "Algorithm parameters" to
reveal these settings. Remember this in future exercises.
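The FASTA definition in the first footnote can be sketched as a small parser. This is only an illustration of the rule as stated there, not how BLAST itself reads input; the function name `parse_fasta` and the example records are hypothetical, and the example sequence is made up (it is not pep1).

```python
def parse_fasta(text):
    """Return a list of (description, sequence) tuples from FASTA text."""
    records = []
    description = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:].strip()
            chunks = []
        elif line and description is not None:
            # Blank spaces inside sequence lines are tolerated and removed.
            chunks.append("".join(line.split()))
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Made-up example input; not the pep1 sequence from the assignment.
example = """>demo_peptide hypothetical example
MKT AYI
AKQR
>second_record
GG
"""

records = parse_fasta(example)
for description, sequence in records:
    print(description, len(sequence), sequence)
```

Lines before the first ">" are ignored in this sketch, which is a design choice rather than part of the definition above.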

When this is all verified, click any of the BLAST buttons. Eventually you should get the "Formatting Results" page with the BLAST results. It is a good idea to take a look at the page to see that it makes sense for what you thought you were trying to do.
2a. What is the RID (Request ID) for your search?^3 The RID allows you to access old search results. 1 pt S4_________ – SX_________
2b. What was the length of the query sequence? 1 pt 132
2c. What is the number of sequences in the database searched?^4 1 pt 10,464,191 – 10,530,
2d. What is the number of amino acids (letters) in the database searched? 1 pt 3,569,800,998 – 3,591,264,
2e. What was the average length of the sequences in the database searched? (Calculate from the above values.) 1 pt 341
2f. For the highest-scoring (best) database sequence match, what is the score (in bits)? 1 pt 262
2g. The significance of a BLAST match to a database entry is summarized by the E-value (Expect). In words, the E-value is the expected number of database matches that a random query sequence would produce with a score at least as high as that of the given match. For the highest-scoring database sequence match, what is the E-value? 1 pt 8e-69 or 8 x 10^-69 [but not 8e^-69 or 8 x e^-69]
2h. How many database sequences have similarity to the query sequence with an E-value less than or equal to 10^-5? 1 pt 500 or >500 or 1440–
2i. Based on the descriptions of the 10–20 highest-scoring sequences, how would you describe the function of the query protein? (That is, what would you call it if you wanted someone to understand the cellular role of the protein?) 3 pt
1 pt for [optional mitochondrial] 50S or 54S or mitochondrial 60S or mitochondrial large subunit
1 pt for ribosomal protein
1 pt for L14 or L
2j. What is the percent sequence identity of the query sequence to the highest-scoring sequence that has the description you used to answer 2i? 1 pt varies over 50–60%
2k. From what organism does the genome with the highest-scoring sequence come? 1 pt Ustilago maydis 521
2l. To which domain of life (NCBI calls it a superkingdom) does this organism belong? 1 pt Eukaryota or Eucarya [not with Fungi included, as in Eukaryota; Fungi]
(^3) Near the top of the results page, to the right of "NCBI/ BLAST/ blastp suite/ Formatting Results -".
(^4) Look at "Other reports: Search Summary", just above "Graphic Summary".
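The arithmetic for 2e and the E-value notation in 2g and 2h can be checked with a short sketch. It uses the lower ends of the ranges given in the key for 2c and 2d. The helper names and the list `hypothetical_hits` are illustrative assumptions, not values taken from a real BLAST report.

```python
import math


def mean_length(total_letters, n_sequences):
    """Average sequence length = total letters / number of sequences."""
    return total_letters / n_sequences


# Lower ends of the ranges given in the key for 2d and 2c.
average = mean_length(3_569_800_998, 10_464_191)
print(round(average))  # 341


def count_significant(e_values, threshold=1e-5):
    """Count E-values (given as strings) that are <= the threshold."""
    return sum(1 for e in e_values if float(e) <= threshold)


# "8e-69" is computer shorthand for 8 x 10^-69; the "e" is not Euler's number.
best = float("8e-69")
print(math.isclose(best, 8 * 10 ** -69))  # True

# Made-up E-values, only to illustrate the threshold test from 2h.
hypothetical_hits = ["8e-69", "2e-40", "1e-05", "0.003", "4.2"]
print(count_significant(hypothetical_hits))  # 3
```

Note that when the number of hits reaches the Max target sequences setting (500 here), a count like this gives only a lower bound, which is why the key accepts ">500" for 2h.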