Alignment File Format-Biogenetics and Computers-Lecture Slides, Slides of Biogenetics and Computers

Rajaram Purshotam Joshi delivered this lecture at All India Institute of Medical Sciences for the Biogenetics and Computers course. It includes: Alignment, File, Format, Protein, Code, Line, Identifier, Specification, Residue, Chain, Crystallographic

Typology: Slides

2011/2012

Uploaded on 07/11/2012

Alignment File Format
>P1;5fd1
structureX:5fd1:1 : :106 : :ferredoxin:Azotobacter vinelandii: 1.90: 0.19
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAI
FSEDEVPEDMQEFIQLNAELA
EVWPNITEKKDPLPDAEDWDGVKGKLQHLER*
1) First line: Gives the protein code after the ">P1;" line identifier.
2) Second line: Ten fields separated by colons.
Field 1: Specifies whether a 3D structure is available.
valid values: structureX, structureM, structure, sequence
Field 2: PDB code
Fields 3-6: Residue number and chain id of the first residue, then residue number and chain id of the last residue.
Field 7: Protein name
Field 8: Source of the protein
Field 9: Resolution of the crystallographic analysis
Field 10: R-factor
3) Remaining lines: The sequence, terminated by "*".
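To make the layout described above concrete, here is a minimal sketch that splits the ferredoxin entry from this slide into its protein code, its ten header fields and its sequence. The function name and the field labels are illustrative choices, not MODELLER identifiers, and the colon split assumes that no field contains a colon itself.

```python
# Illustrative labels for the ten header fields (not MODELLER names).
FIELD_NAMES = [
    "structure_type", "pdb_code",
    "first_residue", "first_chain", "last_residue", "last_chain",
    "protein_name", "source", "resolution", "r_factor",
]

# The example entry shown on the slide.
ENTRY = """\
>P1;5fd1
structureX:5fd1:1 : :106 : :ferredoxin:Azotobacter vinelandii: 1.90: 0.19
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAI
FSEDEVPEDMQEFIQLNAELA
EVWPNITEKKDPLPDAEDWDGVKGKLQHLER*
"""


def parse_pir_entry(text):
    """Split one PIR entry into (protein code, header fields, sequence)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3 or not lines[0].startswith(">P1;"):
        raise ValueError("expected a '>P1;' line, a header line and a sequence")
    code = lines[0][len(">P1;"):]
    # Naive split: assumes no field (e.g. the protein name) contains a colon.
    values = [v.strip() for v in lines[1].split(":")]
    if len(values) > len(FIELD_NAMES):
        raise ValueError("header has more than ten fields")
    # Pad trailing fields that were left out.
    values += [""] * (len(FIELD_NAMES) - len(values))
    sequence = "".join(lines[2:])
    if not sequence.endswith("*"):
        raise ValueError("sequence must be terminated by '*'")
    return code, dict(zip(FIELD_NAMES, values)), sequence[:-1]


code, header, sequence = parse_pir_entry(ENTRY)
print(code, header["first_residue"], header["last_residue"], len(sequence))
```

The sequence length printed at the end can be compared with the first and last residue numbers in the header as a quick consistency check on a hand-edited alignment file.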

Residue Numbers and Chain Ids

Residue Numbers:

An actual residue number from the PDB file, or one of the following:

  • FIRST: the first residue number
  • LAST: the last residue number
  • END: the last residue in the PDB file
  • @: matches any residue number and chain id
  • @:@: the first residue of the first chain

Chain Ids:

An actual chain id, or @, meaning any chain

Alignment File Format

Template format of the second line

Template structures must have at least the first two fields specified.

E.g.:

  • structure:pdb_file is equivalent to
  • structure:pdb_file: : : : : : : :

Target format of the second line

The target sequence must have at least the first field filled in.

E.g.:

  • sequence
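The equivalence between the short and the fully written header can be sketched as a small padding routine. This is only an illustration of the rule stated on the slide; `expand_header` is a hypothetical helper, not part of MODELLER, and it writes the empty fields without the spaces shown above.

```python
def expand_header(header, n_fields=10):
    """Pad an abbreviated second line to the full ten colon-separated fields."""
    fields = [f.strip() for f in header.split(":")]
    if len(fields) > n_fields:
        raise ValueError("too many fields")
    if not fields[0]:
        raise ValueError("field 1 (structure/sequence) must be filled in")
    # Templates need the first two fields; a target only needs the first one.
    if fields[0].startswith("structure") and (len(fields) < 2 or not fields[1]):
        raise ValueError("a template entry also needs field 2 (the PDB code)")
    fields += [""] * (n_fields - len(fields))
    return ":".join(fields)


print(expand_header("structure:pdb_file"))
print(expand_header("sequence"))
```

Both results contain nine colons, that is, ten fields, with every field that was not given left empty.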

Script File

• MODELLER is a command-line-only tool

– It has no graphical user interface

– You must provide it with a script file containing MODELLER commands in Python

– You can use the examples provided with MODELLER

– OR learn Python:

– http://www.python.org/doc/2.3.5/tut/

Environ class

• Contains information about the MODELLER environment

• Holds default values for many functions

• Usually it is the first class to be used in a MODELLER script, as it provides methods to create the other main classes.

• You can assign the new environ object to the Python variable 'env' with the following:

from modeller import *   # load the MODELLER classes, including environ

env = environ()

Log object

• The log object allows you to control the amount of information output to the MODELLER log file.

log.verbose()

This instructs MODELLER to display all log output.

log.minimal()

This instructs MODELLER to display only important output and errors.

automodel() class

• Prepares to build one or more comparative models

• Input parameters:

  • env: object of the environ() class
  • alnfile: file containing the target-template alignment in PIR format
  • knowns: codes of the template structures in the alignment file
  • sequence: code of the target sequence in the alignment file

automodel e.g.

from modeller.automodel import *   # load the automodel class

a = automodel(env,                           # env: the environ object created earlier
              alnfile = 'TvLDH-4mdhA.ali',   # alignment filename
              knowns = '4mdhA',              # codes of the templates
              sequence = 'TvLDH')            # code of the target

automodel.make()

  • Builds all models
  • Call this command after creating an automodel object and setting any desired parameters; it then goes ahead and builds all models.
  • E.g.:

a.make()

SCRIPT FILES

• EASY WAY: UNDERSTAND AND MODIFY THE EXAMPLE PROGRAMS WHERE NEEDED

Running Modeller

Run MODELLER itself by typing the following at the command prompt:

> mod8v1 model-default.py

Model File:

A number of intermediary files are created as the program proceeds.

After about 30 seconds on a Pentium IV workstation, the final 1fdx model is written to file 1fdx.B99990001.pdb.
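The output file name above appears to combine the target code, the marker ".B9999", a four-digit model number and the ".pdb" extension. The sketch below reproduces that pattern; it is inferred from the single file name on this slide, and `model_filename` is a hypothetical helper rather than a MODELLER function.

```python
def model_filename(sequence_code, model_number):
    """Build a model file name following the pattern seen in 1fdx.B99990001.pdb."""
    # Assumption: the model number is zero-padded to four digits.
    return f"{sequence_code}.B9999{model_number:04d}.pdb"


# File names expected if three models of the target 1fdx were requested.
for number in range(1, 4):
    print(model_filename("1fdx", number))
```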

Understanding results

  • To find out how good the model is, inspect the log file.
  • Search for the following keywords for information:
  • check_a : to find out about the alignment
  • _W> : to check for warning messages
  • _E> : to check for error messages
  • If everything is OK so far, the most important part of the log file is the output of the model.energy() command for each model. This is where the violations of restraints are listed. When too many restraints are severely violated, more optimization or a different alignment is needed.
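The keyword search described above can be automated with a few lines of Python. This is a minimal sketch: `scan_log` is a hypothetical helper, and the two sample lines marked as made up only illustrate the `_W>` and `_E>` markers; they are not real MODELLER messages.

```python
# Keywords from the list above, with illustrative labels.
MARKERS = {
    "alignment_check": "check_a",
    "warning": "_W>",
    "error": "_E>",
}


def scan_log(lines):
    """Return {label: [(line number, text), ...]} for lines containing a marker."""
    hits = {label: [] for label in MARKERS}
    for number, line in enumerate(lines, start=1):
        for label, marker in MARKERS.items():
            if marker in line:
                hits[label].append((number, line.rstrip()))
    return hits


sample_log = [
    "check_a_343_> >> BEGINNING OF COMMAND",
    "check_ali___> Checking the sequence-structure alignment.",
    "example_W> made-up warning line for illustration",
    "example_E> made-up error line for illustration",
    "END OF TABLE",
]

for label, found in scan_log(sample_log).items():
    print(label, len(found))
```

To scan a real run, pass an open file instead of the list, for example `scan_log(open("model-default.log"))`; the log file name here is an assumption based on the script name used earlier.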

LOG FILE – Template alignment

check_ali___> Checking pairwise structural superpositions.

Equivalent CA pairs with distance difference larger than 6. angstroms:

ALN_POS  TMPL1  TMPL2  RID1  RID2  NAM1  NAM2  DIST
    337      1      2   332   324     I     A   11.
     95      2      3    90    96     K     G    6.
     96      2      3    91    97     N     P    7.
     97      2      3    92    98     A     G    7.
     98      2      3    93    99     A     M    7.
     99      2      3    94   100     K     E    6.
    337      2      3   324   329     A     G    9.
END OF TABLE

check_ali___> Checking the sequence-structure alignment.

Search for check_ali in the log file.

Log file - Target Template Alignment

  • Check alignment

runcmd______> alignment.check()
check_a_343_> >> BEGINNING OF COMMAND
openf5__224_> Open 11 OLD SEQUENTIAL ./\4mdhA.pdb
Dynamically allocated memory at amaxstructure [B,kB,MB]: 2645671 2583.663 2.
openf5__224_> Open 11 OLD SEQUENTIAL ./\4mdhA.pdb
check_ali___> Checking the sequence-structure alignment.
Implied target CA(i)-CA(i+1) distances longer than 8.0 angstroms:
ALN_POS TMPL RID1 RID2 NAM1 NAM2 DIST
END OF TABLE
check_a_344_> << END OF COMMAND

Column legend: ALN_POS = alignment position, TMPL = template, RID1 and RID2 = residue ids 1 and 2, NAM1 and NAM2 = residue names 1 and 2, DIST = distance.