Sequence Alignment: DNA and Protein, Scoring Functions and Homology, Study notes of Bioinformatics

George Mason University (GMU) Bioinformatics

Discusses the importance of sequence alignment in recognizing common sequences and homology between DNA and proteins. It covers various types of mutations and their probabilities, alignment methods, and scoring strategies. The document also introduces the concept of point accepted mutation (PAM) and amino acid pair probabilities, which are essential for estimating the probability of homology.

Typology: Study notes

Pre 2010

Uploaded on 02/12/2009

koofers-user-aje 🇺🇸


BINF 730

Lecture 2

Sequence Alignment

DNA Sequence Alignment – Why?

Recognition sites might be common – restriction enzymes, start sequences, stop sequences, other regulatory sequences

Homology – evolutionary common progenitor


Mutations

-Deletions

-Insertions

-Transitions (purine-purine A-G, pyrimidine-pyrimidine T-C)

-Transversions (purine-pyrimidine, pyrimidine-purine)

Example

Start with ACGTACGT and let it evolve for 9540 generations with the following probabilities:

Deletion 0.

Insertion 0.

Transition 0.

Transversion 0.
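The probability values on this slide are cut off, so the sketch below only illustrates the kind of simulation the example describes; all probabilities passed to it are hypothetical. It assumes each base, in each generation, may be deleted, may undergo a transition (A<->G, C<->T) or a transversion (purine<->pyrimidine, either of the two choices), and may have a random base inserted after it. The names `mutate_once` and `simulate` are illustrative, not from the lecture.

```python
import random

# Transition partner of each base: purine<->purine, pyrimidine<->pyrimidine.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
PURINES, PYRIMIDINES = "AG", "CT"


def mutate_once(seq, p_del, p_ins, p_ts, p_tv, rng):
    """Apply one generation of hypothetical per-base mutation events to seq."""
    out = []
    for base in seq:
        if rng.random() < p_del:
            continue  # this base is deleted
        r = rng.random()
        if r < p_ts:
            base = TRANSITION[base]  # transition
        elif r < p_ts + p_tv:
            # transversion: swap to one of the two bases of the other class
            base = rng.choice(PYRIMIDINES if base in PURINES else PURINES)
        out.append(base)
        if rng.random() < p_ins:
            out.append(rng.choice("ACGT"))  # random base inserted after this one
    return "".join(out)


def simulate(seq, generations, p_del, p_ins, p_ts, p_tv, seed=None):
    """Run the hypothetical mutation model for a number of generations."""
    rng = random.Random(seed)
    for _ in range(generations):
        seq = mutate_once(seq, p_del, p_ins, p_ts, p_tv, rng)
    return seq
```

For example, `simulate("ACGTACGT", 9540, 1e-4, 1e-4, 2e-4, 1e-4, seed=42)` runs the slide's number of generations with made-up rates. In this simple model an empty sequence can no longer gain insertions, which is one of several simplifications.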

Example

or using Gotoh’s algorithm with mismatch penalty 3 and gap penalty function g(k) = 2 + 2k for a gap of length k:

ACACG--GTCCTAATAATGGCC

CAGGAAGATCT--TAGTT--C

The alignment depends on the algorithm used!
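Gotoh's algorithm handles affine gap costs such as g(k) = 2 + 2k with three dynamic-programming tables instead of one. A minimal sketch under stated assumptions: it computes the minimum cost of a global alignment (the slide does not say global or local), a match costs 0 (the slide gives only the mismatch and gap penalties), and the traceback that recovers the alignment itself is omitted. The function name `gotoh_cost` is illustrative.

```python
import math


def gotoh_cost(s, t, mismatch=3, gap_open=2, gap_extend=2):
    """Minimum global alignment cost with gap cost g(k) = gap_open + gap_extend * k."""
    INF = math.inf
    n, m = len(s), len(t)
    # D[i][j]: best cost of aligning s[:i] with t[:j], any final column
    # P[i][j]: best cost where the final column aligns s[i-1] with a gap
    # Q[i][j]: best cost where the final column aligns t[j-1] with a gap
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    P = [[INF] * (m + 1) for _ in range(n + 1)]
    Q = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        P[i][0] = D[i][0] = gap_open + gap_extend * i
    for j in range(1, m + 1):
        Q[0][j] = D[0][j] = gap_open + gap_extend * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # open a new gap (pay g(1)) or extend an existing one (pay gap_extend)
            P[i][j] = min(D[i - 1][j] + gap_open + gap_extend, P[i - 1][j] + gap_extend)
            Q[i][j] = min(D[i][j - 1] + gap_open + gap_extend, Q[i][j - 1] + gap_extend)
            sub = 0 if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1] + sub, P[i][j], Q[i][j])
    return D[n][m]
```

With these parameters one gap of length 2 costs 6 while two separate gaps of length 1 cost 8, which is why affine costs favour fewer, longer gaps.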

Protein sequence alignment

A. Homologous proteins
   i. Evolutionary common origin
   ii. Structural similarity
   iii. Functional similarity

B. Conserved regions
   i. Functional domains
   ii. Evolutionary similarity
   iii. Structural motif

Example 3.

Choosing the best alignment

•Every alignment has a score

•Choose the alignment with the highest score

•Must choose an appropriate scoring function

•Scoring function based on an evolutionary model with insertions, deletions, and substitutions

•Use a substitution score matrix – contains an entry for every amino acid pair
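A minimal sketch of scoring candidate alignments with a substitution score matrix and keeping the highest-scoring one. The matrix values (+2 for identical residues, -1 otherwise) and the fixed penalty of -4 per gap column are hypothetical placeholders, not a real amino acid matrix; `score_alignment` and `best_alignment` are illustrative names.

```python
def score_alignment(a, b, sub, gap=-4):
    """Sum substitution scores over aligned residue pairs; each gap column adds `gap`."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += gap
        else:
            total += sub[(x, y)]
    return total


def best_alignment(candidates, sub, gap=-4):
    """Return the candidate (a, b) pair with the highest score."""
    return max(candidates, key=lambda ab: score_alignment(ab[0], ab[1], sub, gap))


# Hypothetical toy substitution matrix over four amino acids.
TOY_SUB = {(x, y): 2 if x == y else -1 for x in "HEAW" for y in "HEAW"}
```

For HEAW against HEW, the candidates HE-W, H-EW and HEW- score 2, -1 and -1, so `best_alignment` keeps HE-W. Real scoring schemes replace the toy matrix with one derived from an evolutionary model, as the following slides describe.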

Statistical approach

Let s and s’ be two amino acid sequences of length n for which we want to compute an alignment score.

Assume only substitutions occur (no insertions or deletions).

Works for local alignment

Odds Ratio and Log Odds Ratio

The score for aligning s and s’ is based on comparing the hypothesis that the two sequences were generated randomly (model R) with the hypothesis that they come from a common ancestor. Assume $q_A$ is the probability of producing amino acid A in model R (based on the relative frequency at which A is found in proteins). The probability of the null hypothesis (that s and s’ do not stem from a common ancestor) is

$$P(s, s' \mid R) = \prod_{1 \le i \le n} q_{s_i} \prod_{1 \le i \le n} q_{s'_i} = \prod_{1 \le i \le n} q_{s_i} q_{s'_i}$$
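The null-model probability is just a product of background frequencies, one factor per residue of each sequence. A minimal sketch, assuming gapless sequences of equal length and a hypothetical background-frequency table `q` (the values used in the test are made up). Because products of many small numbers underflow for long sequences, a log-space variant is included as a practical design choice.

```python
import math


def null_probability(s, t, q):
    """P(s, t | R) = prod_i q[s_i] * q[t_i] for two gapless sequences of length n."""
    if len(s) != len(t):
        raise ValueError("s and t must have the same length n")
    p = 1.0
    for a, b in zip(s, t):
        p *= q[a] * q[b]
    return p


def null_log_probability(s, t, q):
    """Natural log of P(s, t | R); avoids underflow for long sequences."""
    if len(s) != len(t):
        raise ValueError("s and t must have the same length n")
    return sum(math.log(q[a]) + math.log(q[b]) for a, b in zip(s, t))
```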

Odds Ratio and Log Odds Ratio

The second hypothesis (the homologous hypothesis), that s and s’ arise from a common ancestor sequence r of length n, is based on the evolutionary model E. The probability that amino acids A and B are aligned, and hence have been derived from an ancestral amino acid C, is denoted $p_{A,B}$. The probability of the homologous hypothesis is then

$$P(s, s' \mid E) = \prod_{1 \le i \le n} p_{s_i, s'_i}$$

How this probability is determined will be explained later.

Odds Ratio and Log Odds Ratio

The odds ratio compares the homologous hypothesis with the null hypothesis

$$\frac{P(s, s' \mid E)}{P(s, s' \mid R)} = \frac{\prod_{1 \le i \le n} p_{s_i, s'_i}}{\prod_{1 \le i \le n} q_{s_i} q_{s'_i}} = \prod_{1 \le i \le n} \frac{p_{s_i, s'_i}}{q_{s_i} q_{s'_i}}$$
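The odds ratio can be computed column by column as the product of $p_{s_i,s'_i} / (q_{s_i} q_{s'_i})$. A minimal sketch, assuming gapless equal-length sequences; the pair probabilities `p` and background frequencies `q` in the test are hypothetical toy values for a two-letter alphabet, not real amino acid data. A ratio above 1 favours the homologous hypothesis, below 1 the null hypothesis.

```python
def odds_ratio(s, t, p, q):
    """P(s, t | E) / P(s, t | R) = prod_i p[(s_i, t_i)] / (q[s_i] * q[t_i])."""
    if len(s) != len(t):
        raise ValueError("s and t must have the same length n")
    ratio = 1.0
    for a, b in zip(s, t):
        ratio *= p[(a, b)] / (q[a] * q[b])
    return ratio
```

With the toy values below, each identical column contributes 0.4 / 0.25 = 1.6 and each mismatched column 0.1 / 0.25 = 0.4.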

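To obtain a scoring function that is additive rather than multiplicative, the log odds ratio can be used. Each amino acid pair gets the score

$$s_{A,B} = \log \frac{p_{A,B}}{q_A q_B}$$

and the score of a whole alignment is the sum of its column scores, $\sum_{1 \le i \le n} s_{s_i, s'_i}$, which equals the logarithm of the odds ratio.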
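A minimal sketch of turning pair probabilities into a log-odds score matrix and scoring a gapless alignment by summing column scores. The logarithm base is a free choice (base 2, giving scores in bits, is assumed here), and the toy `p` and `q` values in the test are hypothetical. The test also checks that the summed score equals the log of the odds ratio, which is the point of switching to logs.

```python
import math


def log_odds_matrix(p, q, base=2):
    """s[(A, B)] = log(p[(A, B)] / (q[A] * q[B])) for every pair in p."""
    return {(a, b): math.log(p_ab / (q[a] * q[b]), base) for (a, b), p_ab in p.items()}


def alignment_score(s, t, scores):
    """Additive score of a gapless alignment: sum of per-column log-odds scores."""
    if len(s) != len(t):
        raise ValueError("s and t must have the same length n")
    return sum(scores[(a, b)] for a, b in zip(s, t))
```

Pairs seen more often in homologous alignments than chance predicts get positive scores, and rarer pairs get negative ones.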

PAM and Amino Acid Pair Probabilities

We now have $p_{A,B} = n_{A,B}(s, s')/n$, which is the relative frequency of the pair (A,B) in the alignment of s and s’, where $n_{A,B}(s, s')$ is the number of times amino acids A and B are aligned in one column of the alignment of s and s’, and n is the length of s and s’.

To find a value for $n_{A,B}$, some homologous sequences are needed. To obtain them, Dayhoff and co-workers used local sequence alignment.
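A minimal sketch of estimating $p_{A,B}$ by counting aligned column pairs. For a single alignment it computes exactly $n_{A,B}(s, s')/n$ with ordered pairs, as defined above; pooling counts over several gapless alignments is an added assumption, and this is not Dayhoff's full procedure. The function name and the sequences in the test are hypothetical.

```python
from collections import Counter


def pair_frequencies(alignments):
    """Relative frequency of each ordered pair (A, B) over gapless aligned pairs (s, t)."""
    counts = Counter()
    total = 0
    for s, t in alignments:
        if len(s) != len(t):
            raise ValueError("aligned sequences must have equal length")
        counts.update(zip(s, t))  # n_AB: columns where A in s is aligned with B in t
        total += len(s)           # n: number of columns
    return {pair: c / total for pair, c in counts.items()}
```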

PAM and Amino Acid Pair Probabilities

Problem – They used sequence alignment to find a substitution matrix (substitution score matrix) for sequence alignment – which comes first, the chicken or the egg?

Answer – Use only very closely related sequences (sequences that differ in at most 15% of their amino acids).

Caveat – The substitution matrix is only valid for closely related protein sequences.
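A minimal sketch of the 15% rule as a filter: keep only aligned pairs whose fraction of differing positions is at most 0.15. It assumes gapless, non-empty, equal-length aligned sequences; the function names and test sequences are hypothetical.

```python
def fraction_different(s, t):
    """Fraction of aligned positions where s and t carry different amino acids."""
    if len(s) != len(t) or not s:
        raise ValueError("need non-empty aligned sequences of equal length")
    return sum(a != b for a, b in zip(s, t)) / len(s)


def closely_related(pairs, max_diff=0.15):
    """Keep only aligned pairs that differ in at most max_diff of their positions."""
    return [(s, t) for s, t in pairs if fraction_different(s, t) <= max_diff]
```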
