










































Material Type: Notes; Professor: Tseng; Class: ADV TOPC PROG LANG; Subject: Computer Science; University: University of Maryland; Term: Unknown 1989;











































CMSC 838T – Lecture 6
- Predict protein 3D structure from (amino acid) sequence
- One step closer to useful biological knowledge
- Sequence → secondary structure → 3D structure → function
5' atgcccaagctgaat … 3'
atg ccc aag ctg aat …
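
The codon grouping shown above can be sketched in a few lines of Python. This is only an illustration: the table below is deliberately partial and covers just the five codons in the example, and the names `CODON_TABLE`, `split_codons` and `translate` are made up for this sketch.

```python
# Partial codon table: ONLY the codons that appear in the example above.
# A real translator needs all 64 codons of the genetic code.
CODON_TABLE = {
    "atg": "M",  # methionine (start codon)
    "ccc": "P",  # proline
    "aag": "K",  # lysine
    "ctg": "L",  # leucine
    "aat": "N",  # asparagine
}


def split_codons(dna):
    """Split a DNA string into consecutive 3-letter codons (incomplete tail dropped)."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Translate codons to one-letter amino acids; codons missing from the table become 'X'."""
    return "".join(CODON_TABLE.get(codon, "X") for codon in split_codons(dna.lower()))


print(split_codons("atgcccaagctgaat"))  # ['atg', 'ccc', 'aag', 'ctg', 'aat']
print(translate("atgcccaagctgaat"))     # MPKLN
```

The resulting amino acid string is the input that the structure prediction methods in this lecture start from.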
- Assume that, in most cases, 3D structure → biological function
  - Lock & key model of enzyme function (docking)
- Folding problem: protein sequence ⇔ 3D structure
- Structure prediction → protein design, drug design, etc.
- The “holy grail” of bioinformatics

- Sequence information uniquely determines 3D structure
- Sequence similarity (>50%) tends to imply structural similarity

- DNA sequence data ≫ protein sequence data ≫ structure data
- Interatomic forces
- Secondary structure
- Tertiary structure

- Secondary structure
- 3D structure
  - Ab initio
  - Comparative modeling
  - Threading

- 3D structure alignment
- Protein docking

- Residues can share similar biochemical properties
- Hydrogen bonding with H2O in solution
  - Non-polar residues interfere (hydrophobic)
  - Polar residues participate (hydrophilic)
- Main cause of globular 3D protein shape → protects the hydrophobic core

- Electrostatic attractive force

- Nonspecific electrostatic attractive force
- From transient attractions between instantaneous dipoles

- Repulsive force between atomic nuclei
[Figure legend: covalent bond; van der Waals interaction; hydrogen bond; charge-charge interaction; negative charge; positive charge; no charge; disulfide bond / bridge]
- Meaningful in context of protein sequence region
- Hydrophobicity plot → track local hydrophobic residues
- Many local hydrophobic residues → hydrophobic region

[Figure: human tumor antigen p (window = 19 residues)]
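
A hydrophobicity plot of this kind is a sliding-window average over per-residue hydropathy values. The sketch below assumes the Kyte-Doolittle scale (the slides do not say which scale was used) and a hypothetical helper name `hydropathy_profile`; the default window of 19 matches the figure.

```python
# Kyte-Doolittle hydropathy values (ASSUMED scale; the slides do not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; entry i covers seq[i:i + window]."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy sequence (hypothetical, not a real protein): a leucine-rich stretch
# flanked by charged residues gives high values in the middle of the profile.
toy = "KKDE" + "L" * 19 + "EDKK"
profile = hydropathy_profile(toy)
print(max(profile), min(profile))
```

Windows whose average stays high mark a hydrophobic region; plotting `profile` against window position reproduces the kind of plot described above.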
- van der Waals (attractive, far)
- Steric interaction (repulsive, close)

- Plot of pair potential energy vs. distance
- Local minimum (energy well) is the stable distance for two atoms

[Figure: potential energy vs. distance]
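
One common functional form with exactly this shape (repulsive at short range, attractive at long range, a single energy well in between) is the Lennard-Jones 12-6 potential. The slides do not name a specific potential, so treating it as Lennard-Jones is an assumption; `epsilon` (well depth) and `sigma` (distance at which the energy crosses zero) are the usual parameters, set to 1 here for illustration.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 pair potential: 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def well_distance(sigma=1.0):
    """Distance of the energy minimum, r_min = 2**(1/6) * sigma."""
    return 2 ** (1 / 6) * sigma


r_min = well_distance()
print(r_min)                 # about 1.122 (in units of sigma)
print(lennard_jones(r_min))  # about -1.0, i.e. -epsilon at the bottom of the well
print(lennard_jones(0.9))    # positive: steric repulsion at short distance
print(lennard_jones(2.0))    # slightly negative: weak attraction at long distance
```

The bottom of the well is the stable separation for the atom pair; moving closer climbs the steep repulsive wall, moving apart climbs the shallow attractive slope.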
- Hydrogen bond between C=O (carbonyl) & N-H (amide) groups within strand (4 positions apart)
- 3.6 residues / turn, 1.5 Å rise / residue
- Typically right-handed
- Most abundant secondary structure
- α-helix formers: A,C,L,M,E,Q,H,K

- Hydrogen bond between groups across strands
- Forms parallel and antiparallel pleated sheets
- Amino acids less compact: 3.5 Å between adjacent residues
- Residues alternate above and below β-sheet
- β-sheet formers: V,I,P,T,W
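
The geometric figures above translate directly into lengths. A small sketch, using only the numbers given in the text (the function names are hypothetical, and segment length is approximated as residues × rise):

```python
HELIX_RESIDUES_PER_TURN = 3.6  # from the text
HELIX_RISE = 1.5               # Å per residue along the helix axis (from the text)
STRAND_RISE = 3.5              # Å between adjacent residues in a β-strand (from the text)


def helix_pitch():
    """Axial advance of one full α-helix turn, in Å."""
    return HELIX_RESIDUES_PER_TURN * HELIX_RISE


def segment_length(n_residues, rise):
    """Approximate axial length in Å of a segment with the given rise per residue."""
    return n_residues * rise


print(helix_pitch())                    # about 5.4 Å per turn
print(segment_length(20, HELIX_RISE))   # 30.0 Å for a 20-residue helix
print(segment_length(20, STRAND_RISE))  # 70.0 Å for a 20-residue strand
```

The same 20 residues span more than twice the distance as a β-strand, which is what "less compact" means here.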
- Generally speaking, anything besides α-helix, β-sheet, β-turn

- Short turn (4 residues)
- Hydrogen bond between C=O & NH groups within strand (3 positions apart)
- Usually polar, found near surface
- β-turn formers: S,D,N,P,R

- Regions between α-helices and β-sheets
- On the surface, vary in length and 3D configurations
- Do not have regular periodic structures
- Loop formers: small polar residues

- Secondary structure is very context-dependent
  - Relies on substructures in nearby environment
  - β-sheet stabilized by hydrogen bonds with other β-sheet
  - α-helix more stable on its own
- Prediction is difficult

- Can help predict 3D protein structure
  - Intermediate step in prediction
  - Use topology of secondary structures to predict 3D
- Can help model protein folding process
  - Sequences first form secondary structures as frame
  - 3D structure formed by attaching substructures to frame
- Protein Data Bank (PDB) version 1.61, September 2002
- 17406 entries, 44327 domains

- Globular
- Membrane

- Hydrophobic
- Closely packed
- Limited substitutions

- Loops
- Hydrophilic (usually >70%)
- Contact with other molecules
- More flexibility in substitutions
[Figure: globular protein with hydrophobic core and hydrophilic surface]
- Oxygen transport

- Transmembrane transport
- Can use individual or multiple sequence alignment (MSA)
  - Predict structure based on profile / consensus alignment

- Lim – rules based on physicochemical nature of residues
- Chou-Fasman – residue frequency statistics (6 AA window)
- GOR – pairwise residue frequency statistics (17 AA window)

- PHDsec – combines results from different neural networks
- PSIPRED – combines neural network with PSI-BLAST result

- JPRED – combines multiple prediction tools with MSA

- Examine single protein sequence
- Base prediction on
  - Statistics: composition of amino acids
  - Neural networks: patterns of amino acids

- First create MSA
  - Use sequences from PSI-BLAST, CLUSTALW, etc.
  - Align sequence with related proteins in family
- Predict secondary structure based on consensus / profile
- Generally improves prediction by 8-9%
- Amino acid sequence is sufficient information
- Residues determine secondary structure
- Examining small windows of 13-17 residues is sufficient
- Finds α-helices, β-sheet, β-turn; everything else → coil

- Statistics of local residue interactions within sliding window
  - Considers nearby residues

- Secondary structural state of the central residue
  - Consider current predicted structure

[Figure: sliding window, window size = 6]
Position    1    2    3    4    5    6    7    8    9   10   11   12
P(H)       69   77   57   69  142  151  121  145   98   77   69   57
P(E)      147   75   55  147   83   37  130  105   93   75  147   75
P(turn)   114  143  152  114   66   74   59   60   95  143  114  156
- Extend the helix in both directions until the average P(α-helix) < 100 for 4 contiguous amino acids
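
The helix part of the procedure can be sketched on the P(H) row above. The extension rule is the one stated in the text (stop when the average P(H) over 4 contiguous residues drops below 100). The nucleation rule used here, at least 4 helix formers with P(H) > 100 inside a window of 6, is the commonly quoted Chou-Fasman rule and is an assumption, since the slides only preserve "window size = 6". The function name `predict_helices` is hypothetical, and β-sheet prediction and overlap resolution are left out.

```python
# P(H) values for the 12-residue example above.
P_H = [69, 77, 57, 69, 142, 151, 121, 145, 98, 77, 69, 57]


def predict_helices(p_h, window=6, min_formers=4, threshold=100):
    """Return helix segments as (start, end) pairs, 0-based and inclusive."""
    n = len(p_h)
    in_helix = [False] * n
    for start in range(n - window + 1):
        # Nucleation (ASSUMED rule): enough helix formers inside the window.
        formers = sum(1 for p in p_h[start:start + window] if p > threshold)
        if formers < min_formers:
            continue
        lo, hi = start, start + window - 1
        # Extend right while the 4 residues ending at hi + 1 average >= threshold.
        while hi + 1 < n and sum(p_h[hi - 2:hi + 2]) / 4 >= threshold:
            hi += 1
        # Extend left while the 4 residues starting at lo - 1 average >= threshold.
        while lo - 1 >= 0 and sum(p_h[lo - 1:lo + 3]) / 4 >= threshold:
            lo -= 1
        for i in range(lo, hi + 1):
            in_helix[i] = True
    # Collapse the per-residue flags into contiguous segments.
    segments, seg_start = [], None
    for i, flag in enumerate(in_helix):
        if flag and seg_start is None:
            seg_start = i
        elif not flag and seg_start is not None:
            segments.append((seg_start, i - 1))
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, n - 1))
    return segments


print(predict_helices(P_H))  # [(2, 9)], i.e. positions 3-10 in the table
```

Under these simplified rules the predicted helix includes weak formers at positions 3 and 4 because the nucleating window already contains four strong formers; the full method also compares P(H) against P(E) before accepting a region.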
- Build on Chou-Fasman P values
- Assumes residues outside central region affect structure
- Evaluates interaction of residue with 16 adjacent residues
- Use 17 × 20 scoring matrix (17 residues, 20 amino acids) to calculate probability of α-helix, β-sheet, β-turn, coil

- Predicts α-helix, β-sheet, coil
- Also uses 17 residue sliding window
- Calculates probabilities for all N(N-1)/2 paired positions
  - Considers correlation between residues
- Post-processing for α-helix, β-sheet minimum lengths

- Predictions based on combination of
  - Amino acid population statistics for α-helix, β-sheet
  - Long-range interactions (hydrogen bonding)
    - Within strand: α-helix (N, N+4)
    - Between strands: β-sheet (statistics)
- Use statistics on residues for hydrogen bonds in β-sheets
- For multiple sequences
  - Uses pairwise local alignment to query sequence (instead of multiple sequence alignment)
- Use multiple neural networks & combine results
  - Average output
  - Majority decision

[Figure: neural network, Input → Output]
- Finds consensus from PHD, PREDATOR, DSC, NNSSP, etc.

        : 1---------11--------21--------31-:
OrigSeq : ASYKVTLKTPDGDNVITVPDDEYILDVAEEEGL:
cons    : --EEEEEE-----EEEE----HHHHHHHHH---:
dsc     : ---EEEEE------EEE-----HHHHHHHH---:
mul     : --EEEEEE-------EE-----HHHHHHHH---:
nnssp   : --EEEEEE-----EEEE----HHHHHHHHH---:
phd     : --EEEEEEE----EEEE---HHHHHHHHHH---:
pred    : ---EEEE------EEEE-----HHHHHHHH---:
zpred   : HHHEEEEE-------EE---HHHHHHHHHHH--:
dssp    : --EEEEEEE--EEEEEEE-----HHHHHHHH--:
define  : EEEEEEEE---EEEEEE-----HHHHHHHHHHE:
stride  : -EEEEEEEE--EEEEEEEE----HHHHHHHH--:
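
A "majority decision" over several predictors can be sketched as a per-position vote. The code below is a plain plurality vote over the six predictor rows in the example (dsc, mul, nnssp, phd, pred, zpred), with ties broken in the arbitrary order H > E > coil. It is not JPRED's actual consensus procedure, which the slides do not describe; the tie-break and the name `consensus` are assumptions made for this illustration.

```python
from collections import Counter

# Predictor rows copied from the example output above.
PREDICTIONS = {
    "dsc":   "---EEEEE------EEE-----HHHHHHHH---",
    "mul":   "--EEEEEE-------EE-----HHHHHHHH---",
    "nnssp": "--EEEEEE-----EEEE----HHHHHHHHH---",
    "phd":   "--EEEEEEE----EEEE---HHHHHHHHHH---",
    "pred":  "---EEEE------EEEE-----HHHHHHHH---",
    "zpred": "HHHEEEEE-------EE---HHHHHHHHHHH--",
}

# ASSUMED tie-break: prefer helix, then strand, then coil.
PRIORITY = {"H": 2, "E": 1, "-": 0}


def consensus(predictions):
    """Per-position plurality vote over equal-length prediction strings."""
    rows = list(predictions)
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*rows):
        counts = Counter(column)
        # Highest vote count wins; PRIORITY decides ties.
        result.append(max(counts, key=lambda s: (counts[s], PRIORITY[s])))
    return "".join(result)


print(consensus(PREDICTIONS.values()))
# --EEEEEE-----EEEE----HHHHHHHHH---
```

On this example the vote happens to reproduce the `cons` row; positions 14 and 22 are 3-3 ties, so that agreement depends on the chosen tie-break.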
- Random guess (for 30% α-helices, 20% β-sheet) = 40%
- Sequence-based: Chou-Fasman 50%, GOR 53%, GOR IV 64%
- Alignment-based: PHD 71%, PREDATOR 75%, PSIPRED 77%
- More accurate for α-helices than β-sheet

- Secondary structure prediction is very difficult
- Accuracy seems to be reaching limits
- Absolute accuracy may not be necessary
- Focus on usage of secondary structure prediction instead
  - Using secondary structure motifs to predict 3D structure
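
Accuracy figures like these are usually per-residue three-state accuracy (often called Q3): the fraction of positions where the predicted state matches the observed one. The sketch below assumes that measure. It also shows one plausible reading of the random-guess baseline, guessing each state with its background frequency, which gives 0.38 and is close to the quoted 40%; the slides do not say how their figure was derived. Both function names are hypothetical.

```python
def q3(predicted, observed):
    """Fraction of positions where the predicted and observed states agree."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def random_baseline(fractions):
    """Expected accuracy when each state is guessed with its background frequency."""
    return sum(f * f for f in fractions.values())


# 30% helix and 20% sheet as in the text; the remaining 50% is taken to be coil.
print(random_baseline({"H": 0.30, "E": 0.20, "-": 0.50}))  # about 0.38
print(q3("HHEE--", "HHE---"))                              # 5 of 6 positions agree
```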
- Transmembrane helices
  - 20-30 residues with strong hydrophobicity
- Coiled coils
  - 2-3 α-helices coiled around each other in supercoil
- Leucine zippers
  - Antiparallel α-helices held together by leucine (L) interactions
  - Leucine residues spaced every 7 amino acids
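
The leucine spacing rule lends itself to a simple pattern scan. A minimal sketch, assuming that a candidate zipper is any run of leucines repeating every 7 residues; the function name `heptad_leucine_runs` and the default of 4 repeats are illustrative choices, not values from the slides.

```python
def heptad_leucine_runs(seq, repeats=4, spacing=7):
    """Return 0-based start positions i where seq[i], seq[i + 7], ... are all 'L'."""
    seq = seq.upper()
    span = spacing * (repeats - 1)  # distance from the first to the last leucine
    return [
        i
        for i in range(len(seq) - span)
        if all(seq[i + k * spacing] == "L" for k in range(repeats))
    ]


# Toy sequence (hypothetical): leucine at every 7th position, alanine elsewhere.
toy = "LAAAAAA" * 4
print(heptad_leucine_runs(toy))  # [0]
```

A real motif search would also score the residues between the leucines; this scan only checks the spacing.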
- Strands, stems, hairpin / interior / bulge loops, knots, etc.

- Useful since some RNA perform protein-like functions