Basic Local Alignment Search Tool - Computational Biology Tools | BME 110, Exams of Chemistry

Material Type: Exam; Class: Computational Biology Tools; Subject: Biomolecular Engineering; University: University of California-Santa Cruz; Term: Unknown 1989;

Uploaded on 08/19/2009

koofers-user-rhx

BLAST

  • Basic Local Alignment Search Tool (1990), by Altschul, Gish, Miller, Myers, & Lipman
  • Uses shortcuts or “heuristics” to improve search speed
  • Like speed-reading, does not examine every nucleotide of the database
  • However, there are many more choices (parameters) to make to adjust search success (over 30!!)

Varieties of BLAST

  • BLASTN: database = nucleotide; query = nucleotide
    • Typical uses: mapping oligonucleotides, cDNAs, and PCR products to a genome; screening repetitive elements; cross-species sequence exploration; annotating genomic DNA; clustering sequencing reads; vector clipping
  • BLASTP: database = protein; query = protein
    • Typical uses: identifying common regions between proteins; collecting related proteins for phylogenetic analyses
  • BLASTX: database = protein; query = nucleotide translated into protein
    • Typical uses: finding protein-coding genes in genomic DNA; determining if a cDNA corresponds to a known protein
  • TBLASTN: database = nucleotide translated into protein; query = protein
    • Typical uses: identifying transcripts, potentially from multiple organisms, similar to a given protein; mapping a protein to genomic DNA
  • TBLASTX: database = nucleotide translated into protein; query = nucleotide translated into protein
    • Typical uses: cross-species gene prediction at the genome or transcript level; searching for genes missed by traditional methods or not yet in protein databases

From: BLAST by Joseph Bedell, Ian Korf, Mark Yandell; O’Reilly 2003
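The program list above amounts to a simple decision rule on the molecule types of the query and the database. A minimal Python sketch, assuming those types are already known; the function name `pick_blast_program` and its arguments are hypothetical, while the returned strings are the names of the BLAST+ command-line programs (blastn, blastp, blastx, tblastn, tblastx).

```python
def pick_blast_program(query_type, db_type, translate_both=False):
    """Choose a BLAST program from the query and database molecule types.

    query_type, db_type: "nucleotide" or "protein" (a hypothetical convention).
    translate_both: for nucleotide vs. nucleotide, compare both sides as
    translated protein (TBLASTX) instead of raw nucleotides (BLASTN).
    """
    valid = {"nucleotide", "protein"}
    if query_type not in valid or db_type not in valid:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    if query_type == "nucleotide" and db_type == "nucleotide":
        return "tblastx" if translate_both else "blastn"
    if query_type == "protein" and db_type == "protein":
        return "blastp"
    if query_type == "nucleotide":
        return "blastx"   # nucleotide query translated, searched against proteins
    return "tblastn"      # protein query against a translated nucleotide database


print(pick_blast_program("nucleotide", "protein"))
```

The flag only matters for nucleotide-vs-nucleotide searches, the one case in the list where two programs share the same input types.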

How Does it Work?

  • Searches for short exact matches (nucleotide) or near-exact matches, aka “neighborhood” words (protein), of a set “word” length
  • Defaults:
    • Blastp: 3 amino acids
    • Blastn: 11 nucleotides
  • Without an initial word match, BLAST can MISS potentially important matches
  • From a “seed hit”, it tries to extend the alignment in both directions
  • A fully extended hit is an HSP (high-scoring segment pair)
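The seed-and-extend idea above can be sketched for the nucleotide (exact word) case. This is a simplified illustration, not NCBI's implementation: it indexes every query word of length `w`, reports exact word hits in the subject, and then extends a hit without gaps in both directions, stopping when the running score falls more than `x_drop` below the best score seen (an X-drop rule). The scores (+1 match, -3 mismatch) and `x_drop=5` are illustrative assumptions. Real BLAST also builds scored "neighborhood" words for proteins, uses a two-hit trigger, merges hits on the same diagonal, and performs gapped extension.

```python
def find_seeds(query, subject, w=11):
    """Return (query_pos, subject_pos) for every exact shared word of length w."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, w, match=1, mismatch=-3, x_drop=5):
    """Ungapped X-drop extension of the seed query[i:i+w] == subject[j:j+w].

    Returns (score, q_start, q_end, s_start, s_end); end positions are exclusive.
    """
    best = w * match                  # the seed word itself is an exact match
    # Extend to the right, remembering where the best score occurred.
    cur, right, k = best, 0, 0
    while i + w + k < len(query) and j + w + k < len(subject):
        cur += match if query[i + w + k] == subject[j + w + k] else mismatch
        k += 1
        if cur > best:
            best, right = cur, k
        elif best - cur > x_drop:
            break                     # score dropped too far below the best: stop
    # Extend to the left, starting from the best right-hand extension.
    cur, left, k = best, 0, 0
    while i - k - 1 >= 0 and j - k - 1 >= 0:
        cur += match if query[i - k - 1] == subject[j - k - 1] else mismatch
        k += 1
        if cur > best:
            best, left = cur, k
        elif best - cur > x_drop:
            break
    return best, i - left, i + w + right, j - left, j + w + right


# Usage with a short word size (w=4) so the example stays readable.
q, s = "ACGTACGTAC", "TTACGTACGTACTT"
for qi, sj in find_seeds(q, s, w=4):
    print(qi, sj, extend_seed(q, s, qi, sj, w=4))
```

Each returned tuple plays the role of an HSP: a scored, ungapped pair of segments. Because only seeds are ever extended, a similar region containing no shared word of length `w` is never found, which is exactly the sensitivity trade-off noted above.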

Translated BLAST

  • DNA -> protein: 3 reading frames on the upper (forward) strand + 3 reading frames on the lower (reverse complement) strand
  • Important when you don’t know, or don’t trust, the sequencing quality or gene annotation
  • Translating in all possible frames adds sensitivity and avoids reading-frame errors caused by incorrect gene prediction or by sequencing errors
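A minimal sketch of the six-frame translation described above. It assumes the standard genetic code (NCBI translation table 1) and A/C/G/T input; codons containing any other character (e.g. N) become `X`, and stop codons are shown as `*`. The frame labels `+1..+3` and `-1..-3` follow a common convention and are used here for illustration.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Lower (reverse complement) strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate one reading frame (offset 0, 1 or 2); '*' = stop, 'X' = unknown codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[p:p + 3], "X") for p in range(frame, len(dna) - 2, 3)
    )


def six_frames(dna):
    """All six conceptual translations: +1..+3 (upper strand), -1..-3 (lower strand)."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


for label, protein in six_frames("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG").items():
    print(label, protein)
```

BLASTX applies this idea to the query, TBLASTN to the database, and TBLASTX to both, so only one of the six frames needs to be the "right" one for a protein-level hit to be found.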

Re: The “Twilight Zone”

  • Less than 25% sequence identity between two protein sequences
  • The sequences may still be homologous, but only similarity of their 3-D protein structures can verify similar function (structural comparison tools to detect this are discussed later in the quarter)
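Percent identity is simple arithmetic over an alignment, but its denominator is a convention. The sketch below assumes two already-aligned sequences of equal length with `-` for gaps, counts only columns where neither sequence has a gap, and applies the 25% cutoff from the slide. Other tools may divide by the full alignment length or by the shorter sequence instead, which gives different numbers; the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Identical residues / gap-free aligned columns * 100 (one common convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def in_twilight_zone(pid, cutoff=25.0):
    """True if percent identity is below the ~25% 'twilight zone' cutoff."""
    return pid < cutoff


pid = percent_identity("MKT-LLVAG", "MRTALIVSG")
print(f"{pid:.1f}% identity; twilight zone: {in_twilight_zone(pid)}")
```

Falling below the cutoff does not rule out homology; it only means sequence identity alone is too weak to support the inference.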

Quantitative Assessment of Significance: E-values & P-values

Two related expressions of significance:

  • E-value: number of matches this good or better expected by chance (not biology) in a search of a database of this size
    • Examples: 0.001, 0.1, 1.0, 10, 100, 1000
  • P-value: probability of finding at least one match this good by chance (not biology)
    • Examples: 0.1, 0.05, 0.0001, 1x10^-3, 1x10^-60

E/P-values

Mathematical conversion between them: $P = 1 - e^{-E}$

For values < 0.01, the E-value and P-value are nearly the same
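The conversion above can be checked numerically in both directions. In this short Python sketch the function names are hypothetical; `math.expm1` and `math.log1p` are standard-library functions, used only for numerical accuracy when E or P is tiny.

```python
import math


def p_from_e(e_value):
    """P = 1 - exp(-E): probability of at least one chance hit this good."""
    return -math.expm1(-e_value)


def e_from_p(p_value):
    """Inverse of p_from_e: E = -ln(1 - P), defined for 0 <= P < 1."""
    return -math.log1p(-p_value)


for e in (0.001, 0.01, 0.1, 1.0, 10.0):
    print(f"E = {e:<6} P = {p_from_e(e):.6f}")
```

Since $P = E - E^2/2 + \dots$, the relative difference between the two is about $E/2$, i.e. at most about half a percent below E = 0.01. At E = 10 the P-value is already about 0.99995, so P saturates near 1 while E keeps growing. This is why large E-values are easier to read than their P-values.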

The E-value takes into account:

  • Length of sequence similarity
  • Conservation of aligned nucleotides/amino acids
  • Number and length of insertions and deletions
  • Sizes of query sequence and database you are searching
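These factors enter through the Karlin-Altschul statistics that BLAST uses: $E = K m n e^{-\lambda S}$, where $S$ is the raw alignment score, $m$ and $n$ are the query and database lengths, and $\lambda$ and $K$ depend on the scoring matrix and gap penalties. Similarity length, conservation, and indels act through $S$, and sequence sizes act through $m n$. The equivalent bit score $S' = (\lambda S - \ln K)/\ln 2$ gives $E = m n 2^{-S'}$. In the sketch below the helper names are hypothetical, and `LAM`, `K`, the lengths, and the score are illustrative placeholders. Real BLAST also uses effective lengths that correct for edge effects.

```python
import math


def karlin_altschul_evalue(raw_score, m, n, lam, k):
    """E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits, m, n):
    """Equivalent form: E = m * n * 2^(-S')."""
    return m * n * 2.0 ** (-bits)


# Illustrative placeholders only; real lambda and K depend on the scoring system.
LAM, K = 0.267, 0.041
m, n = 300, 50_000_000      # hypothetical query length and database size (residues)
S = 120                     # hypothetical raw alignment score

e1 = karlin_altschul_evalue(S, m, n, LAM, K)
print(e1)
print(evalue_from_bits(bit_score(S, LAM, K), m, n))   # same value, via the bit score
print(karlin_altschul_evalue(S, m, 2 * n, LAM, K))    # doubling the database doubles E
```

The last line shows why the same alignment gets a worse E-value when you search a larger database.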

What is Reliable?

  • In biology, a P-value of 0.05 would typically be considered “good enough” (5 chances in 100 that the result arose by chance)
  • Because BLAST only estimates significance, you shouldn’t blindly trust P- or E-values > 1x10-
  • Note: Even with a “good” E-value, the match may be between paralogs with different functions! Examine the alignment for local regions of high similarity (are these known domains in a CDD search?)
  • As a rule of thumb, I don’t have great confidence unless the E-value is less than 1x10^-8
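One way to apply a cutoff like this to many hits at once is BLAST+ tabular output (`-outfmt 6`). Its default 12 columns are query id, subject id, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, and bit score. The sketch below assumes that default column order and uses the 1x10^-8 threshold above. The function name and the sample lines are hypothetical, not real search results.

```python
def confident_hits(lines, max_evalue=1e-8):
    """Keep hits from BLAST+ -outfmt 6 lines whose E-value (column 11) is below max_evalue."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])        # 11th column = E-value
        if evalue < max_evalue:
            kept.append((fields[0], fields[1], float(fields[2]), evalue))
    return kept


# Hypothetical example lines in the default 12-column layout.
SAMPLE = [
    "query1\tsubjA\t92.50\t240\t18\t0\t1\t240\t5\t244\t3e-120\t430",
    "query1\tsubjB\t31.20\t125\t76\t4\t10\t130\t22\t140\t2e-05\t45.8",
    "query1\tsubjC\t25.00\t60\t45\t0\t50\t109\t7\t66\t0.8\t28.1",
]
for hit in confident_hits(SAMPLE):
    print(hit)
```

Passing the E-value filter is only a first step. As noted above, the surviving alignments still need to be inspected for paralogs and domain-level similarity.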

Why would you ever use BLASTN if BLASTP is more sensitive?

  • Non-translated sequences (RNA genes, promoters, etc.)
  • Closely related species, where you expect sequence identity > 70%

A Related Note: Homology

  • Based on the inference that two sequences are ancestrally derived from the same molecule
  • If two sequences have high similarity, they may be inferred to be homologous
  • It is WRONG to say two sequences or genes are 80% homologous (they either are related, or they are not)

Homology: Same Function?

  • Even if two sequences are ancestrally derived from the same molecule, they may or may not still have the same function
  • Orthologs: homologous genes created by speciation
    • Generally implies the function remains the same
  • Paralogs: homologous genes created by a gene duplication event (in the same species)
    • Implies the function may have changed

Homology Diagram

Source: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html