Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Maximum Likelihood II - Lecture Notes | BSC 5936, Study notes of Biology

Florida State University (FSU)Biology

Material Type: Notes; Class: ST:TEACH/LEARN SCIEN; Subject: BIOLOGICAL SCIENCES; University: Florida State University; Term: Fall 2005;

Typology: Study notes

Pre 2010

Uploaded on 08/30/2009

koofers-user-rwa 🇺🇸

10 documents

1 / 6

This page cannot be seen from the preview

Don't miss anything!

Maximum likelihood II

Peter Beerli

October 3, 2005

1 Conditional likelihoods revisited

Some of this is already covered in the algorithm in the last chapter, but more elaboration on the

practical procedures and why are using these might give an even better understanding. This section

follows the Felsenstein book pages 251-255. We express the likelihood of the tree in Figure 1 as

Prob(D(i)|T) = X

zX

yX

wX

x

Prob(A, A, C, G, G, w, y, x, z |T)|T) (1)

where T= (t1, t2, t3, t4, t5, t6, t7, t8). Each summation is over all 4 nucleotides. The above proba-

bility can be separated into

Prob(A, A, C, G, G, w, y, x, z |T)|T) =Prob(z)

×Prob(w|y, t3)Prob(A|w, t1)Prob(A|w, t2)

×Prob(y|t5, z)Prob(C|y, t4)

×Prob(x|z, t6)Prob(G|x, t7)Prob(G|x, t8)

Prob(z) at the root is often assumed to depend on the stationary base frequencies. All parts are

easy to calculate and if we order the terms of the sum in formula 1 and move the summations as

far right as possible we get a summation pattern that uses the same structure as our tree ((C, (A,

A)),(G, G))

Prob(D(i)|T) =X

z

Prob(z) X

y

Prob(y|t5, z)Prob(C|y, t4)

×"X

w

Prob(w|y, t3)Prob(A|w, t1)Prob(A|w, t2)#!

× X

x

Prob(x|z, t6)Prob(G|x, t7)Prob(G|x, t8)!

1

Discover Study notes of Biology Florida State University (FSU)

Partial preview of the text

Download Maximum Likelihood II - Lecture Notes | BSC 5936 and more Study notes Biology in PDF only on Docsity!

Maximum likelihood II

Peter Beerli

October 3, 2005

1 Conditional likelihoods revisited

Some of this is already covered in the algorithm in the last chapter, but more elaboration on the

practical procedures and why are using these might give an even better understanding. This section

follows the Felsenstein book pages 251-255. We express the likelihood of the tree in Figure 1 as

Prob(D(i)|T ) =

z

y

w

x

Prob(A, A, C, G, G, w, y, x, z|T )|T ) (1)

where T = (t 1 , t 2 , t 3 , t 4 , t 5 , t 6 , t 7 , t 8 ). Each summation is over all 4 nucleotides. The above proba-

bility can be separated into

Prob(A, A, C, G, G, w, y, x, z|T )|T ) =Prob(z)

× Prob(w|y, t 3 )Prob(A|w, t 1 )Prob(A|w, t 2 )

× Prob(y|t 5 , z)Prob(C|y, t 4 )

× Prob(x|z, t 6 )Prob(G|x, t 7 )Prob(G|x, t 8 )

Prob(z) at the root is often assumed to depend on the stationary base frequencies. All parts are

easy to calculate and if we order the terms of the sum in formula 1 and move the summations as

far right as possible we get a summation pattern that uses the same structure as our tree ((C, (A,

A)),(G, G))

Prob(D(i)|T ) =

z

Prob(z)

y

Prob(y|t 5 , z)Prob(C|y, t 4 )

×

[

w

Prob(w|y, t 3 )Prob(A|w, t 1 )Prob(A|w, t 2 )

])

×

x

Prob(x|z, t 6 )Prob(G|x, t 7 )Prob(G|x, t 8 )

Figure 1: A tree with branch length and data

This method was introduced by Felsenstein in 1973 as pruning but was known before that in

pedigree analyses and even long before that as a summation reduction device called Horner’s rule.

Even though it is named after William George Horner, who described the algorithm in 1819, it

was already known to Isaac Newton in 1669 and even to the Chinese mathematician Ch’in Chiu-

Shao around 1200s (Horner’s rule Wikipedia.com 2005). Using the tree structure and calculating

quantities (conditional likelihoods) on nodes we arrive at the solution outlined in the algorithm in

the last lecture.

2 Scaling of likelihoods

With large trees the conditional likelihoods will be very small and on finite machines we need to

scale the conditional likelihoods so that they do not underflow. This would have dire consequences

need only little recalculation. The conditional likelihood of the subtrees towards the tips did not

change by this action.

For the optimization, it seems, often Newton-Raphson minimization is used. This needs analytical

derivates (first and second) and might be unstable for some mutation models. Several other mini-

mization methods could be also used to find the optimal branch length.

[Mathematica examples]

4 Optimizing mutation models

Optimizing parameters in the mutation models are rather cumbersome. Every change in the model

is changing the likelihood on the tree. Branch length need to be re-optimized. Often in programs

like PAUP* one is not co-optimizing the tree and the model at the same time. but running

preliminary runs on fixed trees to optimize the mutation model then optimizing the tree, and

perhaps use a couple of cycles to hopefully converge to the best mutation model and best tree.

It might be possible to use minimization methods that use derivatives, such as Newton-Raphson

minimization, but the calculation of the derivatives might be rather difficult for complex models

and only numerical approximations might be available.

5 Ancestral state reconstruction

We can use the same technique as for finding the best branch length to find the site that contributes

most to the likelihood. We could in turn set the root close to all the nodes of interest and recalculate

the likelihood, but we have seen already that this not necessary as most of the tree does not change

on such an operation and only the “views” to the branch that contains the root needs to change.

These saving are similar to the ones one can do in parsimony reconstruction.

Scaling on trees (an example using Jukes-Cantor and the tree

(((A,A),C),(G,G))

ü Transition probability matrix

pii pij @@ t_t_ DD :: == 31 ê ê (^4) 4 +- 11 ê ê (^4) 4 Exp Exp @@-- 44 t t ê ê (^3) D 3 D p @ t_ 8 pij D : @= t (^8) D (^8) ,pii pij @@ tt DD ,, pij pii @@ t D t, D , pij pij @@ tt DD< , pij, 8 pij @ t D<@ t, D (^8) ,pij pij @@ tt DD ,, pii pij @ t @ t DD ,, pij pii @@ tt D , D<< pij @ t D< ,

ü Conditional likelihoods

condL pi @= gi_ p @ ,ti gj_ D ;, ti_, tj_ D : = Module @8< , pj hi == p @ Plustj D ; üü H pi gi L ; hj hi (^) hj = Plus üü H pj gj L ; D

ü Likelihood of a single site

In[437]:= treelike @ gi_, basefreq_ D : = Log @ Plus üü H gi basefreq LD

ü Examples

condL @ 8 1 , 0 , 0 , 0 < , 8 0 , 0 , 0, 1 < , 0.01, 0.01 D {0.00330025, 0.0000109641, 0.0000109641, 0.00330025} condL @ %, 8 0 , 0 , 0 , 1 < , 0.01, 0.01 D {0.000010928, 1.08673 10 -7 , 1.08673 10-7 , 0.00328939} condL @ %, 8 1 , 0 , 0 , 0 < , 0.01, 0.01 D {0.0000217123, 3.65449 10^ -8 , 3.65449 10-8 , 0.0000108559} scaler.nb 1

ü Example tree (((A,A),C),(G,G))

w = condL @ 8 1, 0 , 0 , 0 < , 8 1 , 0 , 0 , 0 < , 0.01, 0.01 D y = condL @ w, 8 0, 1, 0, 0 < , 0.05, 0.06 D x = condL @ 8 0, 0, 1, 0 < , 8 0, 0 , 1 , 0 < , 0.02, 0.02 D z = condL @ y, x, 0.04, 0.08 D treelike @ z, 8 1 ê 4, 1 ê 4 , 1 ê 4 , 1 ê 4 < , 0 D {0.993389, 0.0000109641, 0.0000109641, 0.0000109641} {0.018786, 0.0157197, 0.000308069, 0.000308069} {0.0000432775, 0.0000432775, 0.986886, 0.0000432775} {0.000468974, 0.000394289, 0.000727305, 0.0000189071} -7.

ü Truncation problem

condL @ 8 .00001, 0, 0 , 0 < , 8 0, 0 , 0 , .00001 < , 0.1, 0.1 D {3.02328 10^ -12 , 9.73856 10 -14 , 9.73856 10-14 , 3.02328 10-12 } Chop @ condL @ 8 .00001, 0 , 0 , 0 < , 8 0 , 0, 0, .00001 < , 0.1, 0.1 DD {0, 0, 0, 0} condL @ %, 8 1 , 0 , 0 , 0 < , 0.1, 0.1 D {0, 0, 0, 0}

ü Using a scaling factor

condLs pi =@ gi_p @ ti, (^) D gj_;, ti_, tj_, si_, sj_, scale_ D : = Module @8< , pj hi == p @ Plustj D ; üü H pi gi L ; hj gp == hiPlus hj (^) ; üü H pj gj L ; D^ If @ scale^ ã^^1 ,^^8 gp^ ê^ Max @ gp D ,^ Log @ Max @ gp DD^ +^ si^ +^ sj < ,^^8 gp,^ si^ +^ sj <D treelikes @ gi_, basefreq_, scaler_ D : = Log @H Plus üü H gi basefreq LLD + scaler condLs @8 .00001, 0 , 0 , 0 < , 8 0 , 0 , 0 , .00001 < , 0.1, 0.1, 0 , 0 , 1 D {{1., 0.0322119, 0.0322119, 1.}, -26.5247} scaler.nb 2

ü Example tree (((A,A),C),(G,G)) using scaler

w = condLs @ 8 1 , 0 , 0 , 0 < , 8 1 , 0 , 0, 0 < , 0.01, 0.01, 0 , 0, 1 D y = condLs @ w @@ 1 DD , 8 0 , 1, 0, 0 < , 0.05, 0.06, 0, w @@ 2 DD , 0 D x = condLs @ 8 0 , 0 , 1 , 0 < , 8 0 , 0 , 1 , 0 < , 0.02, 0.02, 0 , 0 , 0 D z = condLs @ y @@ 1 DD , x @@ 1 DD , 0.04, 0.08, y @@ 2 DD , x @@ 2 DD , 1 D treelikes @ z @@ 1 DD , 8 1 ê 4, 1 ê 4 , 1 ê 4 , 1 ê 4 < , z @@ 2 DDD {{1., 0.0000110371, 0.0000110371, 0.0000110371}, -0.00663341} {{0.018911, 0.0158243, 0.000310119, 0.000310119}, -0.00663341} {{0.0000432775, 0.0000432775, 0.986886, 0.0000432775}, 0} {{0.64481, 0.542123, 1., 0.0259961}, -7.22616} -7. scaler.nb 3

Maximum Likelihood II - Lecture Notes | BSC 5936, Study notes of Biology

Related documents

Partial preview of the text

Download Maximum Likelihood II - Lecture Notes | BSC 5936 and more Study notes Biology in PDF only on Docsity!

Maximum likelihood II

Peter Beerli

October 3, 2005

1 Conditional likelihoods revisited

Some of this is already covered in the algorithm in the last chapter, but more elaboration on the

practical procedures and why are using these might give an even better understanding. This section

follows the Felsenstein book pages 251-255. We express the likelihood of the tree in Figure 1 as

Prob(D(i)|T ) =

z

y

w

x

Prob(A, A, C, G, G, w, y, x, z|T )|T ) (1)

where T = (t 1 , t 2 , t 3 , t 4 , t 5 , t 6 , t 7 , t 8 ). Each summation is over all 4 nucleotides. The above proba-

bility can be separated into

Prob(A, A, C, G, G, w, y, x, z|T )|T ) =Prob(z)

× Prob(w|y, t 3 )Prob(A|w, t 1 )Prob(A|w, t 2 )

× Prob(y|t 5 , z)Prob(C|y, t 4 )

× Prob(x|z, t 6 )Prob(G|x, t 7 )Prob(G|x, t 8 )

Prob(z) at the root is often assumed to depend on the stationary base frequencies. All parts are

easy to calculate and if we order the terms of the sum in formula 1 and move the summations as

far right as possible we get a summation pattern that uses the same structure as our tree ((C, (A,

A)),(G, G))

Prob(D(i)|T ) =

z

Prob(z)

y

Prob(y|t 5 , z)Prob(C|y, t 4 )

×

[

w

Prob(w|y, t 3 )Prob(A|w, t 1 )Prob(A|w, t 2 )

])

×

x

Prob(x|z, t 6 )Prob(G|x, t 7 )Prob(G|x, t 8 )

Figure 1: A tree with branch length and data

This method was introduced by Felsenstein in 1973 as pruning but was known before that in

pedigree analyses and even long before that as a summation reduction device called Horner’s rule.

Even though it is named after William George Horner, who described the algorithm in 1819, it

was already known to Isaac Newton in 1669 and even to the Chinese mathematician Ch’in Chiu-

Shao around 1200s (Horner’s rule Wikipedia.com 2005). Using the tree structure and calculating

quantities (conditional likelihoods) on nodes we arrive at the solution outlined in the algorithm in

the last lecture.

With large trees the conditional likelihoods will be very small and on finite machines we need to

scale the conditional likelihoods so that they do not underflow. This would have dire consequences

need only little recalculation. The conditional likelihood of the subtrees towards the tips did not

change by this action.

For the optimization, it seems, often Newton-Raphson minimization is used. This needs analytical

derivates (first and second) and might be unstable for some mutation models. Several other mini-

mization methods could be also used to find the optimal branch length.

[Mathematica examples]

Optimizing parameters in the mutation models are rather cumbersome. Every change in the model

is changing the likelihood on the tree. Branch length need to be re-optimized. Often in programs

like PAUP* one is not co-optimizing the tree and the model at the same time. but running

preliminary runs on fixed trees to optimize the mutation model then optimizing the tree, and

perhaps use a couple of cycles to hopefully converge to the best mutation model and best tree.

It might be possible to use minimization methods that use derivatives, such as Newton-Raphson

minimization, but the calculation of the derivatives might be rather difficult for complex models

and only numerical approximations might be available.

We can use the same technique as for finding the best branch length to find the site that contributes

most to the likelihood. We could in turn set the root close to all the nodes of interest and recalculate

the likelihood, but we have seen already that this not necessary as most of the tree does not change

on such an operation and only the “views” to the branch that contains the root needs to change.

These saving are similar to the ones one can do in parsimony reconstruction.

Scaling on trees (an example using Jukes-Cantor and the tree

(((A,A),C),(G,G))

ü Transition probability matrix

ü Conditional likelihoods

ü Likelihood of a single site

ü Examples

ü Example tree (((A,A),C),(G,G))

ü Truncation problem

ü Using a scaling factor

ü Example tree (((A,A),C),(G,G)) using scaler

Figure 2: Scaling of conditional likelihoods [Example with mathematica]