Macromolecular Structure Visualization - Lab 6 | BCB 444 | Lab Reports Bioinformatics

BCB 444/544 Fall 06 Oct. 4th Lab 6 p. 1

BCB 444/544

Lab 6 Name _____________________________

Macromolecular Structure Visualization

Objectives

1. Learn about the protein structure resources available at the PDB and NCBI

2. Understand the portions of a PDB formatted structure file relevant to structure visualization

3. Learn to use some useful features of the structural visualization programs, PyMol and Cn3D

Introduction

“The Protein Data Bank (PDB) is the single worldwide depository of information about the three-dimensional

structures of large biological molecules, including proteins and nucleic acids. These are the molecules of life

that are found in all organisms including bacteria, yeast, plants, flies, and mice, and in healthy as well as

diseased humans. Understanding the shape of a molecule helps to understand how it works.”

This introduction from the PDB provides the motivation for today’s exercises. In order to better understand a

particular protein, it is important to be able to retrieve structures of interest from the PDB, and to be able to

manipulate how these structures are displayed in order to highlight regions of interest.

Exercises

Required questions are in red.

Note: If you were not able to attend the regularly scheduled lab section, it may help to review the background

lecture slides, which can be downloaded from the course webpage. Please feel free to ask a TA if you have any

questions regarding these slides.

*Certain features of PDB require popups, so you may need to turn off popup blocking for the time being.

Please ask a TA if you need help with this.

1) Querying the PDB (Protein Data Bank)

To familiarize yourself with the PDB website, start by viewing the PDB tutorial at: http://www.pdb.org/pdbstatic/tutorials/Main_Tutorial.swf

We will now practice querying the PDB using the dystrophin protein discussed in our previous lab on NCBI resources.

a. Open your web browser, and enter http://pdb.org in the address bar

b. We will start with a simple keyword query. In the search bar, type dystrophin, then hit enter.

How many structure hits were found by this query? To narrow down the list of structures, we

will utilize the advanced search function of the PDB.

c. In the left menu, click on the Search tab, then click Search Database, then click Advanced

Search. Click on the “Choose a Query Type:” dropdown menu to bring up the list of available

queries. We will not be using most of these, but it is good to be aware of what is available, so

take a second to scroll down the list. Go back near the top of the list and select “Molecule Name”

under the “Structure Summary--” subheading. Type dystrophin in the box, then click Evaluate

Subquery. This feature will tell us how many structures would be returned by this particular

query. How many structures were found by this query?

d. Click Evaluate Query to view these results. We now see that the first result is the N-terminal

domain of dystrophin, with id: 1DXX. We will return to this structure in a moment, using the

Queries tab in the left menu to return to this query. For now we will perform a new query that

provides a better example of how you might use advanced search to obtain a set of proteins with

a desired property, rather than searching for one specific protein. In this example, let’s say we

want to find all structures of nucleic acid-binding proteins in which the protein is not currently

bound to nucleic acid.


e. Go to the advanced search page as before and click the Clear All box to reset the form. Select “Molecular Function” under the “Biology & Chemistry--” subheading near the bottom of the menu. In the window that pops up, click on the triangle next to the word “binding”. Once the types of binding have loaded, click on the words “nucleic acid binding”. How many structures are found by this query?

f. Since we wish to narrow this query to only those structures of proteins not bound to any nucleic acid, we need to add a query to further limit our search by clicking on the plus box to the right of the query. Select “Molecule / Chain Type” under “Structure Summary--”. From the “Contains Protein” menu, select yes. Select no from the other two menus. How many structures would be returned using only this subquery?

g. Now evaluate the entire query. Next to the word “Results” in the left menu you can see the total number of structures that were found. How many structures were found?

h. For some types of analysis, it is important to only use a set of proteins that does not have high sequence similarity between any two members in the data set. From the left menu, click Narrow Query, then click Remove Similar Structures, then click 50% Sequence Identity. How many structures are left?

2) Working with PDB results

We will now return to our previous query and examine the 1DXX structure in more detail.

a. We will now return to our previous query by clicking the Queries tab in the left menu. This brings up the Query History, where you can retrieve previously obtained search results. Click on the “View Results” button for the molecule name query for dystrophin we entered earlier. Now click on the text 1DXX, or the image under the text. This brings up the Structure Summary page, which contains some interesting information about the protein. For our purposes, the most interesting information is the derived data on the lower portion of the page. SCOP and CATH are two different methods of categorizing proteins by structure. From here, we can find other proteins with similar structure by clicking on any of the links in these two sections. We can also search for proteins with similar function by clicking on the links in the GO Terms section. How many other proteins are defined as having the molecular function “actin binding”?

b. Another interesting summary about a structure can be found by clicking on the Sequence Details tab at the top of the page. Here we see a nice graphical representation of the secondary structure of dystrophin. What secondary structure is most prevalent in dystrophin?

c. From here we can also download the sequence of our protein chain (or multiple chains if we were working with a complex of proteins with different sequences). Click on FASTA Sequence from the left menu, under “Download Files”, to download the sequence of dystrophin. Open the file, then copy and paste the FASTA sequence for chain B only, including the comment line, into your lab exercise document.

3) Displaying structures in PyMol (the fun part)

a. You don’t need to read it now, but for future reference, the user guide can be found here: http://pymol.sourceforge.net/newman/user/toc.html

b. To download the PDB file for 1DXX, click Download Files in the left menu, then click PDB text. Save this file where you can find it. This file contains the raw information about the protein structure, including the 3-D coordinates of nearly all of the atoms in the protein. Double click on the file to display our molecule in PyMol (or right click and select Open With… MacPyMol if double clicking only opens a text file).

c. Take a moment to familiarize yourself with the mouse controls. Holding down the left mouse button while moving the mouse rotates the molecule, while holding the right mouse button while moving the mouse up and down zooms in and out on the molecule.
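To make the description of the PDB file in step 3b more concrete, here is a minimal Python sketch of how the coordinate lines can be read. It assumes the standard fixed-column layout of ATOM records in the PDB format; the four sample lines are made up for illustration and are not taken from 1DXX, and the names `Atom`, `parse_atoms`, and `SAMPLE_PDB` are hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


# Made-up records for illustration only (not real 1DXX coordinates).
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
    "ATOM      3  N   LYS B   2      10.000  11.500  -3.250  1.00 12.00           N",
    "HETATM    4  O   HOH A 301       5.000   6.000   7.000  1.00 20.00           O",
])


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read ATOM records using the fixed column positions of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, and every other record type
        atoms.append(Atom(
            serial=int(line[6:11]),       # columns 7-11: atom serial number
            name=line[12:16].strip(),     # columns 13-16: atom name
            res_name=line[17:20].strip(), # columns 18-20: residue name
            chain=line[21],               # column 22: chain identifier
            res_seq=int(line[22:26]),     # columns 23-26: residue number
            x=float(line[30:38]),         # columns 31-38: x coordinate
            y=float(line[38:46]),         # columns 39-46: y coordinate
            z=float(line[46:54]),         # columns 47-54: z coordinate
        ))
    return atoms


atoms = parse_atoms(SAMPLE_PDB)
per_chain = Counter(atom.chain for atom in atoms)
print(per_chain)
```

To try this on the file you downloaded, you could replace `SAMPLE_PDB` with the contents of your saved 1DXX file. The chain identifier and residue number read here are the same fields that chain and residue-number selections in a viewer rely on.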

case, we want to show cartoon for the selection 1DXX. We can similarly hide lines by typing hide lines. Selections can be made using Boolean operators as well. For example, you can select chains C & D by typing select name, chain C OR chain D (where name is the label you choose for the selection). This selects all atoms that are either in chain C OR in chain D. (Stop for a moment to think about why we need to use OR, rather than AND, to select both chains. Ask now if you don’t understand this, or you will probably miss a later question.) Now type remove followed by your selection name to remove these atoms.

i. In the article in which the structure for 1DXX was published, the authors mentioned that some of the residues have been experimentally determined to bind actin. A region denoted ABS1 comprises residues 17–26, ABS2 comprises residues 88–116, and ABS3 comprises residues 131–148. Residues can be selected by residue number using the selector resi, just as we used resn previously. Make three separate selections using the select command. The color command can be used to color a selection using the syntax color color-name, selection-name. Color each selection a different color.

j. Selections can also be composed by combining previously defined selections. Make a new selection called ABS, by using Boolean operators and your previously defined selections. Use this selection to display all known actin-binding residues as spheres. Save a copy of your working session by selecting “Save Session” under the File menu, and include it with your assignment. This command is useful for returning to an existing state that may otherwise be difficult for you to reach again (e.g. adjusting a prepared image).

k. While the selections we have defined so far have been specific to dystrophin, you may at some point have a more general scheme for manipulating the display of any arbitrary molecule. To do this, we can create a PyMol script file containing a list of commands to be executed. Open the log you saved at the start of this portion of the exercise, and copy the commands entered into a new plain text document, with one command on each line. You should only copy commands you have typed, and only those that resulted in the desired action. Save this file as dystrophin.pml and submit it along with your assignment. To demonstrate how to execute an existing script file, please quit and once again relaunch PyMol with 1DXX.pdb. Once the molecule is loaded, load the script using the command @ followed by the path to the script. For example, if you saved the .pml file to the desktop, the command would be @~/Desktop/dystrophin.pml. This can also be accomplished by going under the “File” menu and selecting “Run…”, and then navigating to the .pml file using the graphical menu.

4) Cn3D

a. Though you aren’t required to submit anything for this part of the lab, you are strongly encouraged to investigate NCBI’s structure visualization tool Cn3D (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml), which can be particularly useful when viewing structures for which there are multiple sequence variants available and for viewing portions of the structure for which there are existing sequence annotations.

Optional additional sites to check out (not related to structure visualization):

One widely used package for generating and using Hidden Markov Models is HMMER, which you can download and investigate from here: http://hmmer.janelia.org/. You’ll probably want to check out the user guide in the documentation section. Some of the individual tools from the HMMER package are also available online at: http://bioweb.pasteur.fr/seqanal/motif/hmmer-uk.html. In the past, we’ve found this server to be a little slow, so don’t be surprised if your results take a little while to return. You may also be interested in some of the other tools linked from Sean Eddy’s lab website: http://selab.janelia.org/
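Returning to the script file described in step 3k: a .pml file is simply a plain text file with one PyMol command per line, so it can also be generated programmatically. The Python sketch below is one possible way to do this under stated assumptions. The selection names (ABS1, ABS2, ABS3) and the colors are illustrative choices rather than required answers, the output file name `dystrophin_sketch.pml` is hypothetical, and the combined selection from step 3j is deliberately left out.

```python
import tempfile
from pathlib import Path

# Residue ranges are those quoted in step 3i; names and colors are
# illustrative assumptions, not part of the lab handout.
REGIONS = {
    "ABS1": ("17-26", "red"),
    "ABS2": ("88-116", "blue"),
    "ABS3": ("131-148", "yellow"),
}


def build_commands(regions):
    """Return PyMol commands, one string per line of the script."""
    commands = ["hide lines", "show cartoon"]
    for name, (residues, color) in regions.items():
        commands.append(f"select {name}, resi {residues}")
        commands.append(f"color {color}, {name}")
    return commands


def write_pml(commands, path):
    """Write the commands to a plain text file, one command per line."""
    path.write_text("\n".join(commands) + "\n")
    return path


script_path = write_pml(
    build_commands(REGIONS),
    Path(tempfile.gettempdir()) / "dystrophin_sketch.pml",
)
print(script_path.read_text())
```

A file written this way could then be run from inside PyMol with the @ command followed by its path, exactly as for a script assembled by hand from your log.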
