






These notes cover the importance of sequence alignment in recognizing common sequences and homology in DNA and protein sequences. They cover various types of mutations and their probabilities, alignment methods, and scoring strategies. They also introduce the concept of point accepted mutation (PAM) and amino acid pair probabilities, which are essential for estimating the probability of homology.
Typology: Study notes







A. Homologous proteins
   i. Evolutionary common origin
   ii. Structural similarity
   iii. Functional similarity
B. Conserved regions
   i. Functional domains
   ii. Evolutionary similarity
   iii. Structural motif
The score for aligning $s$ and $s'$ is based on comparing two hypotheses: that the two sequences were generated randomly, and that they derive from a common ancestor. Let $q_A$ be the probability of producing amino acid A in the random model R (based on the relative frequency with which A is found in proteins). The probability under the null hypothesis (that $s$ and $s'$ do not stem from a common ancestor) is
$$P(s, s' \mid R) = \prod_{1 \le i \le n} q_{s_i}\, q_{s'_i} = \prod_{1 \le i \le n} q_{s_i} \prod_{1 \le i \le n} q_{s'_i}$$
The second hypothesis (the homologous hypothesis), that $s$ and $s'$ arise from a common ancestor sequence $r$ of length $n$, is based on the evolutionary model E. The probability that the amino acids A and B are aligned, and hence have been derived from a common ancestor amino acid C, is denoted $p_{A,B}$. The probability of the two sequences under E is then
$$P(s, s' \mid E) = \prod_{1 \le i \le n} p_{s_i, s'_i}$$
How this probability is determined will be explained later.
The odds ratio compares the homologous hypothesis with the null hypothesis
$$\frac{P(s, s' \mid E)}{P(s, s' \mid R)} = \frac{\prod_{1 \le i \le n} p_{s_i, s'_i}}{\prod_{1 \le i \le n} q_{s_i}\, q_{s'_i}} = \prod_{1 \le i \le n} \frac{p_{s_i, s'_i}}{q_{s_i}\, q_{s'_i}}$$
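The odds ratio can be checked numerically on a small example. The sketch below is only an illustration: the two-letter alphabet and the values in `q` and `p` are made-up toy numbers (not real amino acid data), the function names are hypothetical, and the two sequences are assumed to be aligned without gaps and of equal length.

```python
from math import prod, isclose

# Hypothetical toy values over a two-letter alphabet; not real amino acid data.
q = {"A": 0.6, "B": 0.4}                     # background frequencies q_A (model R)
p = {("A", "A"): 0.5, ("A", "B"): 0.1,
     ("B", "A"): 0.1, ("B", "B"): 0.3}       # pair probabilities p_{A,B} (model E)


def prob_random(s, t, q):
    """P(s, s' | R): every residue is produced independently with probability q."""
    return prod(q[a] * q[b] for a, b in zip(s, t))


def prob_evolution(s, t, p):
    """P(s, s' | E): every aligned column (a, b) is produced with probability p[(a, b)]."""
    return prod(p[(a, b)] for a, b in zip(s, t))


def odds_ratio(s, t, p, q):
    """Column-wise product of p_{a,b} / (q_a * q_b)."""
    return prod(p[(a, b)] / (q[a] * q[b]) for a, b in zip(s, t))


s, t = "AAB", "ABB"                          # assumed gap-free, equal length
ratio = prob_evolution(s, t, p) / prob_random(s, t, q)
print(isclose(ratio, odds_ratio(s, t, p, q)))
```

Both ways of computing the ratio agree, which is the equality of the quotient of products and the product of per-column quotients.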
To obtain a scoring function that is additive rather than multiplicative, the log-odds ratio can be used:
$$s_{A,B} = \log \frac{p_{A,B}}{q_A\, q_B}$$
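A short sketch of why the logarithm makes the score additive: summing the per-column scores gives the logarithm of the odds ratio for the whole alignment. The values in `q` and `p` are hypothetical toy numbers, the function names are illustrative, and the natural logarithm is an assumption, since the notes do not state the base.

```python
from math import log, prod, isclose

# Hypothetical toy values over a two-letter alphabet; not real amino acid data.
q = {"A": 0.6, "B": 0.4}
p = {("A", "A"): 0.5, ("A", "B"): 0.1,
     ("B", "A"): 0.1, ("B", "B"): 0.3}


def log_odds(a, b, p, q):
    """Score s_{a,b} = log(p_{a,b} / (q_a * q_b)); natural log is an assumption."""
    return log(p[(a, b)] / (q[a] * q[b]))


def alignment_score(s, t, p, q):
    """Additive score: the sum of the log-odds scores of all aligned columns."""
    return sum(log_odds(a, b, p, q) for a, b in zip(s, t))


s, t = "AAB", "ABB"                          # assumed gap-free, equal length
multiplicative = prod(p[(a, b)] / (q[a] * q[b]) for a, b in zip(s, t))
print(isclose(alignment_score(s, t, p, q), log(multiplicative)))
```

A positive score for a column means the pair is more likely under the homologous hypothesis than under the random model; a negative score means the opposite.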
PAM and Amino Acid Pair Probabilities
We now have

$$p_{A,B} = \frac{n_{AB}(s, s')}{n}$$

which is the relative frequency of the pair (A, B) in the alignment of $s$ and $s'$, where $n_{AB}(s, s')$ is the number of times the amino acids A and B are aligned in one column of the alignment and $n$ is the length of $s$ and $s'$.
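The relative frequency described here, the count n_AB(s, s') divided by the alignment length n, can be sketched as follows. The function name is hypothetical, and pairs are treated as ordered, i.e. (A, B) and (B, A) are counted separately; the notes do not say whether the two orders should be pooled.

```python
from collections import Counter


def pair_frequencies(s, t):
    """Relative frequency n_AB(s, s') / n of each aligned pair (A, B).

    Assumes s and t are already aligned without gaps and have equal length n.
    Pairs are ordered: (A, B) and (B, A) are counted separately.
    """
    if len(s) != len(t) or not s:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(s)
    counts = Counter(zip(s, t))              # n_AB for every observed column (A, B)
    return {pair: count / n for pair, count in counts.items()}


freqs = pair_frequencies("AAB", "ABB")
for pair, freq in sorted(freqs.items()):
    print(pair, freq)
```

Pairs that never occur in the alignment are simply absent from the result, so a real estimate would need many more aligned columns than this toy input.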
To find a value for $n_{AB}$, some homologous sequences are needed. To do this, Dayhoff and co-workers used local sequence alignment.
Problem – They used sequence alignment to find a substitution matrix (substitution score matrix) for sequence alignment – which comes first, the chicken or the egg?
Answer – Use only very closely related sequences (sequences that differ in at most 15% of their amino acids).
Caveat – The substitution matrix is only valid for closely related protein sequences.
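The 15% criterion can be illustrated with a simple filter. This is only a sketch under stated assumptions: the function names are hypothetical, the sequences are assumed to be aligned without gaps, and "differ in at most 15% of the amino acids" is read as the fraction of mismatching columns. It is not a reconstruction of the procedure Dayhoff and co-workers actually used.

```python
def fraction_different(s, t):
    """Fraction of aligned positions at which s and t carry different residues."""
    if len(s) != len(t) or not s:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(s, t)) / len(s)


def closely_related(s, t, max_diff=0.15):
    """True if the sequences differ in at most max_diff of their positions."""
    return fraction_different(s, t) <= max_diff


reference = "ACDEFGHIKLMNPQRSTVWY"           # 20 residues, toy example
near = "ACDEFGHIKLMNPQRSTVAA"                # 2 of 20 positions differ (10%)
far = "ACDEFGHIKLMNPQRSAAAA"                 # 4 of 20 positions differ (20%)
print(closely_related(reference, near), closely_related(reference, far))
```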