Lecture Slides on Two Sequence Alignment and Scoring Models | BME 110, Assignments of Chemistry

Material Type: Assignment; Class: Computational Biology Tools; Subject: Biomolecular Engineering; University: University of California-Santa Cruz; Term: Unknown 2008;


Uploaded on 08/19/2009

Two Sequence Alignment &
Scoring Matrices
BME 110: CompBio Tools
Todd Lowe
April 08, 2008


Admin

  • Reading:
      – Chapter 3 should be completed
      – Chapter 5 for Tuesday
  • Homework #1 due tomorrow (Fri) 5pm
  • Homework #2 assigned Tuesday

Full Genome Dot-Plot

Multiple Genome Alignment Dot Plots

[Figure: dot plots comparing the genomes of P. calidifontis, P. arsenaticum, P. islandicum, and P. aerophilum]

Pair-wise Sequence Comparison

  • Basis for relating biological information from a well-studied gene to a new sequence
  • Many programs exist for pairwise comparison
  • Some specialize in fast database searching and get “good” alignments
      – One sequence v. many thousands:
          • BLAST or FASTA
  • Some are much slower, but guarantee the “optimal alignment”
      – Smith-Waterman is the de facto standard

Dot-plots: Dotlet

http://myhits.isb-sib.ch/cgi-bin/dotlet

Example:

  • In Archaeal Genome browser, bring up Pyrobaculum aerophilum
  • Select CRISPR2 region (chr:45,423-46,754) to compare to CRISPR6-7 region (chr:1,898,656-1,899,678)
  • Get DNA, paste into Dotlet one at a time, giving descriptive labels
  • Zoom 1:5
  • Are there direct or inverted repeats in each CRISPR (against itself)?
  • Relative to each other, are these direct or inverted repeats?
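To make the direct-versus-inverted question concrete, here is a minimal Python sketch of a dot plot. It is a simplification and not how Dotlet itself works: it marks a dot only where two windows match exactly, and the `window` size, the function names, and the toy sequences are all assumptions made up for illustration (they are not the CRISPR regions).

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def dot_plot(seq_x, seq_y, window=7):
    """Return the set of (i, j) where seq_x[i:i+window] == seq_y[j:j+window]."""
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    # Index every window of seq_y by its start position.
    positions = {}
    for j in range(len(seq_y) - window + 1):
        positions.setdefault(seq_y[j:j + window], []).append(j)
    # Look up every window of seq_x in that index.
    dots = set()
    for i in range(len(seq_x) - window + 1):
        for j in positions.get(seq_x[i:i + window], []):
            dots.add((i, j))
    return dots


# Toy sequences, made up for illustration only.
direct = "GATTACA" + "CC" + "GATTACA"
inverted = "GATTACA" + "CC" + reverse_complement("GATTACA")

# Direct repeat: off-diagonal dots when the sequence is compared to itself.
direct_hits = {(i, j) for i, j in dot_plot(direct, direct) if i != j}

# Inverted repeat: no off-diagonal dots against itself, but dots appear
# when the sequence is compared to its own reverse complement.
inverted_self = {(i, j) for i, j in dot_plot(inverted, inverted) if i != j}
inverted_hits = dot_plot(inverted, reverse_complement(inverted))
```

In this sketch a direct repeat shows up as dots off the main diagonal of a self-comparison, while an inverted repeat only produces dots once one sequence is reverse-complemented.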

Assessing Alignment Significance

Most basic rules of thumb:

  • Two nucleotide sequences: if at least 70% identical, they are likely homologous
  • Two protein sequences: at least 25% identical over 100 amino acid alignment
  • Does not take into account precise length of alignment, or number of gaps!
  • Not sufficient to quantitatively rank hits from a database search
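The percent-identity figure behind these rules of thumb can be computed from a finished alignment. The sketch below is one possible convention, not a standard: it assumes gaps are written as "-" and divides by the number of alignment columns including gap columns. The function name and return shape are hypothetical. It also reports alignment length and gap count, the two things the rule of thumb ignores.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, alignment columns, gap columns).

    Assumption: identity is identical columns divided by all alignment
    columns, gap columns included. Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = identical = gaps = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # a column of two gaps carries no information
        columns += 1
        if a == gap or b == gap:
            gaps += 1
        elif a == b:
            identical += 1
    percent = 100.0 * identical / columns if columns else 0.0
    return percent, columns, gaps


ungapped = percent_identity("GATC", "GTGC")
gapped = percent_identity("GAT-C", "G-TGC")
```

Two alignments of the same pair of sequences can report different identities, which is one reason raw percent identity alone cannot rank database hits.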

The “Twilight Zone”

  • Less than 25% sequence identity for two protein sequences
  • May still be homologous, but only similarity of 3-D protein structures can verify similar function (structural comparison tools to detect these discussed later in quarter)
  • Must have a good / near optimal alignment for most distantly related proteins

Dynamic Programming

  • Fancy term for type of algorithm used to get the “optimal” or best possible alignment between two sequences
  • Needleman and Wunsch (1970) most basic method
      – Gives the “global” (end to end) best alignment
  • Smith-Waterman based closely on this algorithm, but allows for “local” alignments (best subsequence match only)
  • See simple example of Global v. Local alignments in book, Figure 3.1 p.
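As a rough illustration of how the two algorithms differ, the sketch below computes only the optimal score (no traceback) for a global and a local alignment. It assumes a simple linear gap penalty and the parameter values match +1, mismatch -1, gap -1; the gap value and the function names are assumptions for this example, not part of the lecture. The only differences in the local version are that cells are floored at zero and the best cell anywhere in the table is reported.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch style optimal global score (linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]  # first row: all gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first column: all gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]  # bottom-right cell: end-to-end alignment


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman style optimal local score (linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)  # best subsequence match can end anywhere
        prev = curr
    return best


# Toy sequences that share the subsequence GATTACA but differ at the ends.
x, y = "AAAGATTACA", "GATTACATTT"
g = global_score(x, y)
l = local_score(x, y)
```

For sequences that share only a region, the local score rewards the shared stretch while the global score is dragged down by the unmatched ends.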

Basic Example

  • Find best global alignment of two sequences: GATC and GTGC

Which is better? Match +1, Mismatch –1, Gap -

    G A T C
    |     |          (Score = 0)
    G T G C

OR

    G A T - C        +1 -1 +1 -1 +
    |   |   |        (Score =
    G - T G C
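The column-by-column scoring used in this example can be sketched in a few lines of Python. The gap value is cut off in the slide text, so the default `gap=-1` below is an assumption (it is consistent with the per-column scores that survive); the function name is also hypothetical.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-1):
    """Return (per-column scores, total) for two aligned strings.

    Assumption: gap=-1 by default, since the slide's gap value is cut off.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    per_column = []
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            per_column.append(gap)
        elif x == y:
            per_column.append(match)
        else:
            per_column.append(mismatch)
    return per_column, sum(per_column)


ungapped = score_alignment("GATC", "GTGC")
gapped = score_alignment("GAT-C", "G-TGC")
gapped_costly = score_alignment("GAT-C", "G-TGC", gap=-2)
```

Under the assumed gap of -1 the gapped alignment totals 1 and beats the ungapped one (0); raising the gap cost to -2 drops it to -1, so the ungapped alignment wins instead.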

Moral: Scoring Model Matters!!

  • For DNA, model can be very simple:
      – +1 match, -1 mismatch
  • However, not all mutations have equal likelihood:
      – Transition: A <–> G or C <–> T (more likely)
      – Transversion: A <–> C or G <–> T (less likely)
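A scoring model that treats the two kinds of substitution differently might look like the sketch below. The classification follows from purines (A, G) and pyrimidines (C, T); the numeric scores are hypothetical values chosen only so that transitions are penalized less than transversions, and are not taken from the lecture.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(x, y):
    """Classify a pair of DNA bases as match, transition, or transversion."""
    x, y = x.upper(), y.upper()
    for base in (x, y):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if x == y:
        return "match"
    # Transition: both purines or both pyrimidines. Otherwise transversion.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# Hypothetical scores: only their relative order matters here.
EXAMPLE_SCORES = {"match": 2, "transition": -1, "transversion": -2}


def substitution_score(x, y, scores=EXAMPLE_SCORES):
    """Score one aligned pair of bases under the example model."""
    return scores[substitution_type(x, y)]
```

With this model, an A aligned to G costs less than an A aligned to C, which a flat +1/-1 scheme cannot express.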

Same Matrix (*10)

        A   C   G   T
    A   6
    C   1
    G   2
    T   1

Actual values not important, only values relative to each other
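One way to see why only relative values matter: multiplying every score by the same positive constant leaves the ranking of candidate alignments unchanged. The sketch below checks this on two toy alignments. All numbers and names are hypothetical (they are not the slide's matrix), and it assumes the gap penalty is scaled along with the matrix.

```python
from itertools import product

BASES = "ACGT"
PURINES = {"A", "G"}


def make_matrix(match, transition, transversion):
    """Build a DNA scoring matrix as a dict keyed by (base, base)."""
    matrix = {}
    for x, y in product(BASES, repeat=2):
        if x == y:
            matrix[(x, y)] = match
        elif (x in PURINES) == (y in PURINES):
            matrix[(x, y)] = transition  # purine<->purine or pyr<->pyr
        else:
            matrix[(x, y)] = transversion
    return matrix


def alignment_score(top, bottom, matrix, gap):
    """Sum matrix scores over aligned columns; '-' marks a gap."""
    total = 0
    for x, y in zip(top, bottom):
        total += gap if "-" in (x, y) else matrix[(x, y)]
    return total


candidates = [("GATC", "GTGC"), ("GAT-C", "G-TGC")]

# Hypothetical values, then the same model with everything multiplied by 10.
base = make_matrix(match=2, transition=-1, transversion=-2)
scaled = {pair: 10 * value for pair, value in base.items()}

base_scores = [alignment_score(t, b, base, gap=-2) for t, b in candidates]
scaled_scores = [alignment_score(t, b, scaled, gap=-20) for t, b in candidates]

# The same alignment wins under both versions of the model.
best_base = base_scores.index(max(base_scores))
best_scaled = scaled_scores.index(max(scaled_scores))
```

Scaling changes the size of the scores but not which alignment is preferred, provided the gap penalty is scaled by the same factor.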

Protein Matrices, Same Idea

  • Original: Dayhoff matrix aka PAM
  • PAM = Percent accepted mutations
  • Based on small number of correctly aligned proteins
  • Simply count how often each amino acid is substituted for another
  • Frequency of substitutions based on properties of amino acids relative to each other
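The counting step described above can be sketched as follows. This covers only the tally of observed substitutions; it is not the full procedure for building a PAM matrix, which involves further normalization. The function name and the toy alignments are made up for illustration and are not real protein data.

```python
from collections import Counter


def count_substitutions(aligned_pairs, gap="-"):
    """Count how often each unordered pair of residues is substituted.

    aligned_pairs: iterable of (top, bottom) strings of equal length.
    Identical columns and gap columns are skipped.
    """
    counts = Counter()
    for top, bottom in aligned_pairs:
        if len(top) != len(bottom):
            raise ValueError("aligned sequences must have the same length")
        for x, y in zip(top.upper(), bottom.upper()):
            if x == gap or y == gap or x == y:
                continue
            # Sort so that K->R and R->K land in the same bin.
            counts[tuple(sorted((x, y)))] += 1
    return counts


# Toy alignments, invented for this example.
toy_alignments = [
    ("MKVLA", "MRVIA"),
    ("MKIL-", "MRILS"),
]
substitution_counts = count_substitutions(toy_alignments)
```

In this toy data the K/R exchange is seen twice and I/L once; in a real data set such counts reflect how readily residues with similar properties replace one another.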