Determining Protein Structure from Sequence: Computational Approaches, Study notes of Bioinformatics

These notes cover the importance of protein structure determination and the challenges in measuring it. They introduce methods for predicting protein structure from the amino acid sequence, including sequence similarity methods, secondary structure prediction algorithms, and tertiary structure prediction methods. They also cover the limitations of these methods and introduce energy minimization, molecular dynamics, and stochastic searches as alternative approaches.

Uploaded on 02/12/2009


Determining Protein Structure from Sequence using Computational Approaches

M. Saleet Jafri

Program in Bioinformatics and Computational Biology, George Mason University, and Medical Biotechnology Center, University of Maryland Biotechnology Institute

Protein Structure – Why do we care?

  • Structure-Function Relation – The shape of a protein molecule directly determines its biological function.
  • Proteins with similar function often have similar shapes or share similar regions or domains.
  • Hence, if we find a new protein and know its shape, we can make a good guess about its biological function.

Protein Databases

  • As of June 2000, 12,500 protein structures have been deposited into the Protein Data Bank (PDB) and 86, protein sequence entries were contained in SwissProt protein sequence database.
  • This is a 1:7 ratio – relatively few structures are known.
  • The number of sequences will increase much faster than the number of structures due to advances in sequencing.

Protein Basic Structure

  • A protein is made of a chain of amino acids.
  • There are 20 standard amino acids found in proteins.
  • Each amino acid is encoded in DNA by one or more codons, each a sequence of three bases.

Measuring Protein Structure

  • Determining protein structure directly is difficult
  • X-ray diffraction studies – the protein must first be crystallized; its structure is then calculated from the way the crystal diffracts X-rays.

From http://www.uni-wuerzburg.de/mineralogie/crystal/teaching/inv_a.html

Measuring Protein Structure

  • NMR – Uses nuclear magnetic resonance to estimate distances between different functional groups in a protein in solution. Possible structures are then calculated from these distances.

http://www.cis.rit.edu/htbooks/nmr/inside.htm

Why not stick to these methods?

  • X-ray Diffraction –
    • Only a small number of proteins can be made to form crystals.
    • A crystal is not the protein’s native environment.
    • Very time consuming.
  • NMR Distance Measurement –
    • Not all proteins are found in solution.
    • This method generally looks at isolated proteins rather than protein complexes.
    • Very time consuming.

Four Levels of Protein Structure

  • Primary Structure – Sequence of amino acids
  • Secondary Structure – Local structure such as α-helices and β-sheets.
  • Tertiary Structure – Arrangement of the secondary structural elements to give 3-dimensional structure of a protein
  • Quaternary Structure – Arrangement of the subunits to give a protein complex its 3-dimensional structure.

Secondary Structure Prediction Algorithms

  • These methods are 70-75% accurate at predicting secondary structure.
  • A few examples are:
    • Chou-Fasman algorithm
    • Garnier-Osguthorpe-Robson (GOR) method
    • Neural network models
    • Nearest-neighbor method

Chou-Fasman Algorithm

  • Based on an analysis of the frequency of each of the 20 amino acids in α helices, β sheets, and turns.
  • Ala (A), Glu (E), Leu (L), and Met (M) are strong predictors of α helices.
  • Pro (P) and Gly (G) break α helices.
  • When 4 of 6 amino acids have a high probability of being in an α helix, it predicts an α helix.
  • When 3 of 5 amino acids have a high probability of being in a β strand, it predicts a β strand.
  • 4 amino acids are used to predict turns.
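
The nucleation rule for helices can be illustrated with a small sliding-window scan. This is only a sketch under simplifying assumptions: residues are classed as helix formers or breakers using just the amino acids named in the bullets, whereas the published method uses a full table of propensity values. The function name and its `window` and `min_formers` parameters are hypothetical; the defaults follow the commonly cited Chou-Fasman helix nucleation rule of 4 formers in a 6-residue window.

```python
# Simplified helix nucleation scan in the spirit of Chou-Fasman.
# ASSUMPTION: only the residues named in the notes are classified; the
# published method uses a table of propensity values for all 20 residues.
HELIX_FORMERS = set("AELM")
HELIX_BREAKERS = set("PG")


def helix_nucleation_sites(seq, window=6, min_formers=4):
    """Return start indices of windows that could nucleate an alpha helix."""
    sites = []
    for start in range(len(seq) - window + 1):
        fragment = seq[start:start + window]
        formers = sum(res in HELIX_FORMERS for res in fragment)
        has_breaker = any(res in HELIX_BREAKERS for res in fragment)
        # A window qualifies if it has enough formers and no breaker.
        if formers >= min_formers and not has_breaker:
            sites.append(start)
    return sites


if __name__ == "__main__":
    print(helix_nucleation_sites("KAELMAGPSTAEL"))
```

In a fuller implementation each nucleation site would then be extended in both directions until the average helix propensity drops, which is omitted here.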

Garnier-Osguthorpe-Robson Method

  • Chou-Fasman assumes that each individual amino acid influences secondary structure.
  • GOR assumes that the amino acids flanking the central amino acid also influence the secondary structure.
  • Hence, it uses a window of 17 amino acids (8 on each side of the central amino acid).
  • Each amino acid in the window acts independently on influencing structure (to save computational time).
  • Certain pair-wise combinations of amino acids in the window also contribute to influencing structure.
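
A window-based prediction of this kind might be sketched as follows. The sketch covers only the independent, per-position contributions (not the pair-wise terms), and the `info` lookup table is hypothetical: real information values are derived from proteins of known structure and are not reproduced here.

```python
# Sketch of a GOR-style prediction with a 17-residue window.
# ASSUMPTION: `info` is a hypothetical nested lookup,
# info[state][offset][residue]; missing entries count as 0.
STATES = ("C", "H", "E")  # coil, helix, strand; ties fall back to coil
HALF_WINDOW = 8           # 8 residues on each side of the central one


def gor_predict(seq, info):
    """Predict one state per residue by summing independent window terms."""
    prediction = []
    for center in range(len(seq)):
        scores = {}
        for state in STATES:
            total = 0.0
            for offset in range(-HALF_WINDOW, HALF_WINDOW + 1):
                pos = center + offset
                if 0 <= pos < len(seq):  # skip positions past the ends
                    table = info.get(state, {}).get(offset, {})
                    total += table.get(seq[pos], 0.0)
            scores[state] = total
        prediction.append(max(STATES, key=lambda s: scores[s]))
    return "".join(prediction)


if __name__ == "__main__":
    toy_info = {
        "H": {0: {"A": 1.0}, -1: {"A": 0.5}, 1: {"A": 0.5}},
        "E": {0: {"V": 1.0}},
    }
    print(gor_predict("AAAVVV", toy_info))
```

The toy table is made up purely to exercise the code; it carries no biological meaning.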

Hydrophobicity/Hydrophilicity Plots

  • Charged amino acids are hydrophilic, e.g. Asp (D), Glu (E), Lys (K), Arg (R).
  • Nonpolar amino acids are hydrophobic, e.g. Ala (A), Leu (L), Ile (I), Val (V), Phe (F), Trp (W), Met (M), Pro (P).
  • In an α helix, hydrophobic amino acids might line up on one side, which suggests that that side faces the interior of a protein or protein complex.

From Bioinformatics: Sequence and Genome Analysis by David Mount – Helicalwheel plot by GCG
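
One way to quantify whether hydrophobic residues line up on one side of a helix is the hydrophobic moment, computed by placing successive residues 100° apart around a helical wheel (3.6 residues per turn). The sketch below assumes the Kyte-Doolittle hydropathy scale, which is a choice made here and is not named in the notes; the function name is hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
# ASSUMPTION: the notes do not specify a scale; this one is a common choice.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(seq, degrees_per_residue=100.0):
    """Length of the summed hydropathy vectors around a helical wheel."""
    x = y = 0.0
    for n, res in enumerate(seq):
        angle = math.radians(n * degrees_per_residue)
        h = KYTE_DOOLITTLE[res]
        x += h * math.cos(angle)
        y += h * math.sin(angle)
    return math.hypot(x, y)


if __name__ == "__main__":
    print(hydrophobic_moment("LKKLLKKLLKLLKKLLKK"))  # one hydrophobic face
    print(hydrophobic_moment("L" * 18))              # uniform, no preferred face
```

A large moment suggests an amphipathic helix with one hydrophobic face, while a uniform sequence gives a moment near zero because the contributions cancel around the wheel.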

Nearest Neighbor Method

  • Like neural networks, this is another machine learning approach to secondary structure prediction.
  • A very large list of short sequence fragments is made by sliding a window (n=16) along a set of 100-400 training sequences of known structure but with minimal similarity to each other.
  • A same-size window is selected from the query sequence and the 50 best matching fragments are found.
  • The frequencies of the secondary structure states of the middle amino acid in the matching fragments are used to predict the secondary structure of the middle amino acid in the query window.
  • Can be very accurate (up to 86%).
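
The procedure above can be sketched in a few lines. This is a minimal illustration, assuming that fragments are ranked simply by the number of identical residues and that the prediction is a majority vote; practical implementations use more refined scoring. The function names are hypothetical.

```python
from collections import Counter


def build_fragments(training, window=16):
    """Slide a window along (sequence, structure) pairs of equal length.

    Each fragment is stored with the state of its middle residue.
    """
    fragments = []
    for seq, struct in training:
        for start in range(len(seq) - window + 1):
            mid = start + window // 2
            fragments.append((seq[start:start + window], struct[mid]))
    return fragments


def predict_middle(query_window, fragments, k=50):
    """Vote on the state of the middle residue of the query window."""
    def identity(fragment):
        # ASSUMPTION: similarity is the count of identical positions.
        return sum(a == b for a, b in zip(fragment, query_window))

    best = sorted(fragments, key=lambda f: identity(f[0]), reverse=True)[:k]
    votes = Counter(state for _, state in best)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy training data with a short window, for illustration only.
    frags = build_fragments([("AAAAKKKK", "HHHHEEEE")], window=4)
    print(predict_middle("KKKK", frags, k=3))
```

The query window is moved along the query sequence one residue at a time, so each call predicts a single position.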

Energy Potential Functions

  • Contain terms for electrostatic interactions, van der Waals forces, hydrogen bonding, and bond angle and bond length energies.
  • Common software packages have their own implementations: CHARMM, ECEPP, AMBER, GROMOS, and CVFF.
  • Structural predictions are only as good as the assumptions on which they are based (mainly the energy potential function).

Bonded Terms

Bond Length

$$E_{\text{bond-length}} = \sum_{\text{bonds}} k_b (r - r_0)^2$$

Bond Angle

$$E_{\text{bond-angle}} = \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2$$

Dihedral Angle

$$E_{\text{dihedral-angle}} = \sum_{\text{dihedrals}} K_\phi \left[ 1 + \cos(n\phi - \delta) \right]$$
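
The bonded terms can be evaluated directly once the geometry is known. The sketch below assumes one parameter set per term (real force fields assign parameters per bond, angle, and dihedral type), angles in radians, and a dihedral term of the form K(1 + cos(nφ − δ)), which is the form used by force fields such as CHARMM. The function and parameter names are hypothetical.

```python
import math


def bond_energy(lengths, k_b, r0):
    """Sum of k_b * (r - r0)^2 over all bond lengths."""
    return sum(k_b * (r - r0) ** 2 for r in lengths)


def angle_energy(angles, k_theta, theta0):
    """Sum of k_theta * (theta - theta0)^2 over all bond angles (radians)."""
    return sum(k_theta * (t - theta0) ** 2 for t in angles)


def dihedral_energy(dihedrals, k_phi, n, delta):
    """Sum of k_phi * (1 + cos(n * phi - delta)) over all dihedrals (radians)."""
    return sum(k_phi * (1.0 + math.cos(n * phi - delta)) for phi in dihedrals)


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any force field.
    total = (bond_energy([1.0, 1.2], k_b=100.0, r0=1.0)
             + angle_energy([1.9, 2.0], k_theta=50.0, theta0=1.91)
             + dihedral_energy([math.pi / 3], k_phi=2.0, n=3, delta=0.0))
    print(total)
```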

Energy Minimization

  • Assumes that proteins are found at or near the lowest energy conformation.
  • Uses an empirical function that describes the interaction of different parts of the protein with each other (energy potential function).
  • Searches conformation space for the global minimum using optimization techniques such as steepest descents and conjugate gradients.
  • To avoid the multiple-minima problem, approaches such as dynamic programming or simulated annealing have been used.
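
The multiple-minima problem is easy to see on a toy example. The sketch below applies steepest descent to a one-dimensional double-well function chosen purely for illustration; it is not a protein energy function, and the step size and iteration count are arbitrary assumptions.

```python
def energy(x):
    """Toy double-well potential with a local and a global minimum."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Derivative of the toy potential with respect to x."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def steepest_descent(x, step=0.01, iterations=5000):
    """Repeatedly move downhill along the negative gradient."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


if __name__ == "__main__":
    for start in (1.5, -1.5):
        x_min = steepest_descent(start)
        print(start, x_min, energy(x_min))
```

Starting on the right-hand side, the search settles in the higher local minimum and never reaches the lower one on the left, which is why a purely downhill method depends so strongly on its starting conformation.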

Molecular Dynamics

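$$
\begin{aligned}
F_i &= m_i a_i && \text{force, by Newton's second law of motion} \\
a_i &= \frac{dv_i}{dt} && \text{acceleration} \\
v_i &= \frac{dr_i}{dt} && \text{velocity} \\
-\frac{dE}{dr_i} &= F_i && \text{work} = \text{force} \times \text{distance} \\
-\frac{dE}{dr_i} &= m_i \frac{d^2 r_i}{dt^2} && \text{putting it all together}
\end{aligned}
$$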
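
The combined equation of motion is solved numerically in practice. As a minimal sketch, the code below integrates it with the velocity Verlet scheme for a single particle in one dimension. The harmonic force used in the demonstration is an assumption chosen because its exact solution is known; the function name and parameters are hypothetical.

```python
import math


def velocity_verlet(r, v, force, mass, dt, steps):
    """Integrate m * d2r/dt2 = force(r) for a single coordinate."""
    a = force(r) / mass
    for _ in range(steps):
        r = r + v * dt + 0.5 * a * dt * dt   # advance position
        a_new = force(r) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity
        a = a_new
    return r, v


if __name__ == "__main__":
    # Harmonic potential E = 0.5 * r**2, so the force is -dE/dr = -r.
    r, v = velocity_verlet(1.0, 0.0, lambda x: -x, mass=1.0, dt=0.01, steps=1000)
    print(r, math.cos(10.0))           # numerical vs exact position at t = 10
    print(0.5 * v * v + 0.5 * r * r)   # total energy, initially 0.5
```

The total energy stays close to its initial value over the run, which is the property that makes this family of integrators attractive for molecular dynamics.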

Molecular Dynamics

  • Model System – Choose protein model, energy potential function, ensemble, and boundary conditions.
  • Initial Conditions – Need initial positions of the atoms, an initial distribution of the velocities (assume no net momentum, i.e. $\sum_i m_i v_i = 0$), and the acceleration, which is determined by the potential energy function.
  • Boundary Conditions – If water molecules are not explicitly included in the potential function, solvent boundary conditions must be imposed. The water molecules must not diffuse away from the protein. Also, usually only a limited number of solvent molecules are included.
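
The zero net momentum condition on the initial velocities can be imposed by subtracting the centre-of-mass velocity. The sketch below draws velocities from a Gaussian with an arbitrary width, which is an assumption made for brevity; a real setup would scale them to the target temperature. The function name and parameters are hypothetical.

```python
import random


def initial_velocities(masses, scale=1.0, seed=0):
    """Random 3D velocities adjusted so that sum(m_i * v_i) = 0."""
    rng = random.Random(seed)
    velocities = [[rng.gauss(0.0, scale) for _ in range(3)] for _ in masses]
    total_mass = sum(masses)
    for axis in range(3):
        # Centre-of-mass velocity along this axis.
        v_cm = sum(m * v[axis] for m, v in zip(masses, velocities)) / total_mass
        for v in velocities:
            v[axis] -= v_cm
    return velocities


if __name__ == "__main__":
    masses = [12.0, 1.0, 16.0, 14.0]
    vels = initial_velocities(masses)
    for axis in range(3):
        print(sum(m * v[axis] for m, v in zip(masses, vels)))  # close to zero
```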

Molecular Dynamics

  • Integration Algorithm – Solve the equations of motion with an algorithm that conserves energy and momentum, is computationally efficient, and allows a large time step. Examples:
    • Verlet algorithm
    • Leap-frog algorithm
    • Velocity Verlet
    • Beeman’s algorithm
  • Constraints