Lecture Slides on Two Sequence Alignment and Scoring Models | BME 110, Assignments of Chemistry

University of California-Santa Cruz Chemistry

Material Type: Assignment; Class: Computational Biology Tools; Subject: Biomolecular Engineering; University: University of California-Santa Cruz; Term: Unknown 2008;


Uploaded on 08/19/2009


Two Sequence Alignment & Scoring Matrices

BME 110: CompBio Tools

Todd Lowe

April 08, 2008


Admin
• Reading:
  – Chapter 3 should be completed
  – Chapter 5 for Tuesday
• Homework #1 due tomorrow (Fri) 5pm
• Homework #2 assigned Tuesday

Full Genome Dot-Plot

Multiple Genome Alignment

Dot Plots
[Figure: pairwise whole-genome dot plots among Pyrobaculum calidifontis, P. arsenaticum, P. islandicum, and P. aerophilum]

Pair-wise Sequence Comparison
• Basis for relating biological information from a well-studied gene to a new sequence
• Many programs exist for pairwise comparison
• Some specialize in fast database searching and get “good” alignments
  – One sequence v. many thousands: BLAST or FASTA
• Some are much slower, but guarantee the “optimal alignment”
  – Smith-Waterman is the de facto standard

Dot-plots: Dotlet

http://myhits.isb-sib.ch/cgi-bin/dotlet

Example: In the Archaeal Genome Browser, bring up Pyrobaculum aerophilum.
• Select the CRISPR2 region (chr:45,423-46,754) to compare to the CRISPR6-7 region (chr:1,898,656-1,899,678).
• Get the DNA for each region and paste it into Dotlet one at a time, giving descriptive labels; zoom 1:5.
• Are there direct or inverted repeats in each CRISPR (compared against itself)?
• Relative to each other, are these direct or inverted repeats?
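To see what a dot plot computes, here is a minimal Python sketch of a windowed dot plot. It assumes a simple identity filter (a dot is placed when at least `stringency` of the `window` positions match); the function names and parameters are hypothetical and are not Dotlet's own settings. An off-diagonal line of dots in a sequence-vs-itself plot suggests a direct repeat; dots in a plot of a sequence against its own reverse complement suggest an inverted repeat.

```python
def revcomp(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def dot_plot(seq1, seq2, window=10, stringency=8):
    """Return the set of (i, j) where the windows seq1[i:i+window] and
    seq2[j:j+window] share at least `stringency` identical positions."""
    dots = set()
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            identical = sum(a == b for a, b in zip(seq1[i:i + window], seq2[j:j + window]))
            if identical >= stringency:
                dots.add((i, j))
    return dots


def render(seq1, seq2, dots):
    """Text rendering: one row per position in seq1, '*' marks a dot."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(len(seq2)))
        for i in range(len(seq1))
    )
```

For example, `dot_plot(s, s, 7, 7)` with `s = "GATTACACCCCGATTACA"` contains the cell (0, 11), marking the direct repeat of GATTACA. Larger windows and stricter stringency suppress noise at the cost of missing short or diverged repeats.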

Assessing Alignment Significance
Most basic rules of thumb:
• Two nucleotide sequences: if at least 70% identical, they are likely homologous
• Two protein sequences: at least 25% identical over a 100 amino acid alignment
• Does not take into account the precise length of the alignment, or the number of gaps!
• Not sufficient to quantitatively rank hits from a database search
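The rules of thumb can be written as a short check. This is a sketch under stated assumptions: percent identity is computed only over gap-free columns (one of several conventions in use), and "over 100 amino acid alignment" is read as at least 100 gap-free aligned columns. The function names are hypothetical.

```python
def percent_identity(top, bottom):
    """Percent identity over gap-free columns of an alignment ('-' = gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    columns = [(a, b) for a, b in zip(top, bottom) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


def likely_homologous(top, bottom, molecule):
    """Apply the rules of thumb; molecule is 'dna' or 'protein'."""
    pid = percent_identity(top, bottom)
    if molecule == "dna":
        return pid >= 70.0
    if molecule == "protein":
        # Assumption: "over 100 amino acids" = at least 100 gap-free aligned columns.
        aligned = sum(a != "-" and b != "-" for a, b in zip(top, bottom))
        return pid >= 25.0 and aligned >= 100
    raise ValueError("molecule must be 'dna' or 'protein'")
```

The gap convention alone can swing the result: `GATC`/`GTGC` is 50% identical, while `GAT-C`/`G-TGC` comes out at 100% here because its gap columns are simply skipped, which is exactly why identity by itself cannot rank database hits.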

The “Twilight Zone”
• Less than 25% sequence identity for two protein sequences
• May still be homologous, but only similarity of 3-D protein structures can verify similar function (structural comparison tools to detect these are discussed later in the quarter)
• Must have a good / near-optimal alignment for most distantly related proteins

Dynamic Programming
• Fancy term for the type of algorithm used to get the “optimal” (best possible) alignment between two sequences
• Needleman and Wunsch (1970): the most basic method
  – Gives the “global” (end-to-end) best alignment
• Smith-Waterman: based closely on this algorithm, but allows for “local” alignments (best subsequence match only)
• See simple example of global v. local alignments in book, Figure 3.1 p.
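Below is a minimal Python sketch of both recurrences, computing only the optimal score (no traceback). It assumes a linear gap penalty and flat match/mismatch scores; the function name `align_score` and its defaults are hypothetical. The global (Needleman-Wunsch) and local (Smith-Waterman) versions differ only in the first row/column initialization, the floor of 0 on every cell, and where the answer is read (bottom-right cell versus best cell anywhere).

```python
def align_score(s1, s2, match=1, mismatch=-1, gap=-1, local=False):
    """Optimal alignment score by dynamic programming, linear gap penalty.
    local=False: Needleman-Wunsch (global, end-to-end).
    local=True:  Smith-Waterman (best-scoring pair of subsequences)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalized along the first row and column.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            score = max(H[i - 1][j - 1] + sub,  # pair s1[i-1] with s2[j-1]
                        H[i - 1][j] + gap,      # gap in s2
                        H[i][j - 1] + gap)      # gap in s1
            if local:
                score = max(score, 0)           # local: a new alignment may start here
                best = max(best, score)
            H[i][j] = score
    return best if local else H[rows - 1][cols - 1]
```

With these defaults, `align_score("TTGATCTT", "GATC")` returns 0 because the global alignment must pay for four unmatched T's, while `align_score("TTGATCTT", "GATC", local=True)` returns 4, the score of the exact GATC match.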

Basic Example

Find the best global alignment of two sequences: GATC and GTGC

Which is better? Match +1, Mismatch –1, Gap –1

G A T C
|     |
G T G C
(Score = 0)

OR

G A T - C
|   |   |
G - T G C
+1 –1 +1 –1 +1 (Score = 1)
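The two candidate scores can be checked column by column. This sketch assumes a gap penalty of –1, which is what the column sums of the gapped alignment imply; `score_alignment` is a hypothetical helper.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-1):
    """Sum the column scores of a given alignment ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


print(score_alignment("GATC", "GTGC"))            # 0
print(score_alignment("GAT-C", "G-TGC"))          # 1
print(score_alignment("GAT-C", "G-TGC", gap=-2))  # -1
```

Raising the gap penalty to –2 (a hypothetical alternative) drops the gapped alignment to –1, so the ungapped one, at 0, would win instead: the scoring model decides which alignment is “better”.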

Moral: Scoring Model Matters!!
• For DNA, the model can be very simple:
  – +1 match, –1 mismatch
• However, not all mutations have equal likelihood:
  – Transition (purine <–> purine or pyrimidine <–> pyrimidine): A <–> G or C <–> T, more likely
  – Transversion (purine <–> pyrimidine), e.g. A <–> C or G <–> T: less likely
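Transitions and transversions can be told apart by base class alone: A and G are purines, C and T are pyrimidines. A minimal sketch; the scores in `ts_tv_score` (0 for a transition, –1 for a transversion) are hypothetical, chosen only to make transitions cheaper than transversions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mutation_type(x, y):
    """Classify the DNA substitution x -> y (x and y must be A, C, G or T)."""
    x, y = x.upper(), y.upper()
    if not {x, y} <= PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G or T")
    if x == y:
        return "identity"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"      # purine <-> purine or pyrimidine <-> pyrimidine
    return "transversion"        # purine <-> pyrimidine


def ts_tv_score(x, y, match=1, transition=0, transversion=-1):
    """Hypothetical scores: transitions penalized less than transversions."""
    return {"identity": match,
            "transition": transition,
            "transversion": transversion}[mutation_type(x, y)]
```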

Same Matrix (*10)
     A    C    G    T
A    6
C    1
G    2
T    1

Actual values are not important, only their values relative to each other
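One way to check that only relative values matter: multiply every score, including the gap penalty, by the same positive factor, and the ranking of alignments does not change. The matrix values below (2 for identity, –1 for transitions, –2 for transversions, –2 per gap) are hypothetical, not the slide's matrix.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def make_matrix(identity, transition, transversion):
    """Symmetric DNA substitution matrix built from three relative values."""
    matrix = {}
    for x in BASES:
        matrix[x] = {}
        for y in BASES:
            if x == y:
                matrix[x][y] = identity
            elif (x in PURINES) == (y in PURINES):
                matrix[x][y] = transition      # both purines or both pyrimidines
            else:
                matrix[x][y] = transversion
    return matrix


def score_with_matrix(top, bottom, matrix, gap):
    """Score a given alignment ('-' = gap) with a matrix and a linear gap penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    return sum(gap if "-" in (a, b) else matrix[a][b] for a, b in zip(top, bottom))


candidates = [("GATC", "GTGC"), ("GAT-C", "G-TGC")]
for factor in (1, 10):
    m = make_matrix(2 * factor, -1 * factor, -2 * factor)
    scores = [score_with_matrix(t, b, m, -2 * factor) for t, b in candidates]
    print(factor, scores)  # prints "1 [0, 2]", then "10 [0, 20]"
```

The gap penalty has to be scaled along with the matrix; scaling only the substitution scores changes their balance against gaps and can change which alignment wins.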

Protein Matrices, Same Idea
• Original: Dayhoff matrix, aka PAM
• PAM = Percent accepted mutations
• Based on a small number of correctly aligned proteins
• Simply count how often each amino acid is substituted for another
• Frequency of substitutions based on properties of amino acids relative to each other
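The counting step can be sketched in a few lines. This only tallies substitutions in gap-free columns of already aligned pairs, treating A→B and B→A as the same event; the full Dayhoff procedure goes further (turning counts into mutation probabilities and log-odds scores), which this sketch omits. The function name and example sequences are hypothetical.

```python
from collections import Counter


def count_substitutions(aligned_pairs):
    """Count amino-acid substitutions in gap-free columns of aligned protein pairs.
    Pairs are unordered (A->B counted together with B->A), as in a symmetric matrix."""
    counts = Counter()
    for top, bottom in aligned_pairs:
        if len(top) != len(bottom):
            raise ValueError("aligned strings must have equal length")
        for a, b in zip(top, bottom):
            if a == "-" or b == "-" or a == b:
                continue  # skip gaps and identities
            counts[tuple(sorted((a, b)))] += 1
    return counts


pairs = [("ACDEFG", "ACEEFG"), ("KLMN-P", "KIMNQP")]
print(count_substitutions(pairs))
```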
