Maximum Likelihood II - Lecture Notes | BSC 5936, Study notes of Biology

Material Type: Notes; Class: ST:TEACH/LEARN SCIEN; Subject: BIOLOGICAL SCIENCES; University: Florida State University; Term: Fall 2005;

Typology: Study notes

Pre 2010

Uploaded on 08/30/2009

koofers-user-rwa
koofers-user-rwa 🇺🇸

10 documents

1 / 6

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Maximum likelihood II
Peter Beerli
October 3, 2005
1 Conditional likelihoods revisited
Some of this is already covered in the algorithm in the last chapter, but more elaboration on the
practical procedures and why are using these might give an even better understanding. This section
follows the Felsenstein book pages 251-255. We express the likelihood of the tree in Figure 1 as
Prob(D(i)|T) = X
zX
yX
wX
x
Prob(A, A, C, G, G, w, y, x, z |T)|T) (1)
where T= (t1, t2, t3, t4, t5, t6, t7, t8). Each summation is over all 4 nucleotides. The above proba-
bility can be separated into
Prob(A, A, C, G, G, w, y, x, z |T)|T) =Prob(z)
×Prob(w|y, t3)Prob(A|w, t1)Prob(A|w, t2)
×Prob(y|t5, z)Prob(C|y, t4)
×Prob(x|z, t6)Prob(G|x, t7)Prob(G|x, t8)
Prob(z) at the root is often assumed to depend on the stationary base frequencies. All parts are
easy to calculate and if we order the terms of the sum in formula 1 and move the summations as
far right as possible we get a summation pattern that uses the same structure as our tree ((C, (A,
A)),(G, G))
Prob(D(i)|T) =X
z
Prob(z) X
y
Prob(y|t5, z)Prob(C|y, t4)
×"X
w
Prob(w|y, t3)Prob(A|w, t1)Prob(A|w, t2)#!
× X
x
Prob(x|z, t6)Prob(G|x, t7)Prob(G|x, t8)!
1
pf3
pf4
pf5

Partial preview of the text

Download Maximum Likelihood II - Lecture Notes | BSC 5936 and more Study notes Biology in PDF only on Docsity!

Maximum likelihood II

Peter Beerli

October 3, 2005

1 Conditional likelihoods revisited

Some of this is already covered in the algorithm in the last chapter, but more elaboration on the

practical procedures and why are using these might give an even better understanding. This section

follows the Felsenstein book pages 251-255. We express the likelihood of the tree in Figure 1 as

Prob(D(i)|T ) =

z

y

w

x

Prob(A, A, C, G, G, w, y, x, z|T )|T ) (1)

where T = (t 1 , t 2 , t 3 , t 4 , t 5 , t 6 , t 7 , t 8 ). Each summation is over all 4 nucleotides. The above proba-

bility can be separated into

Prob(A, A, C, G, G, w, y, x, z|T )|T ) =Prob(z)

× Prob(w|y, t 3 )Prob(A|w, t 1 )Prob(A|w, t 2 )

× Prob(y|t 5 , z)Prob(C|y, t 4 )

× Prob(x|z, t 6 )Prob(G|x, t 7 )Prob(G|x, t 8 )

Prob(z) at the root is often assumed to depend on the stationary base frequencies. All parts are

easy to calculate and if we order the terms of the sum in formula 1 and move the summations as

far right as possible we get a summation pattern that uses the same structure as our tree ((C, (A,

A)),(G, G))

Prob(D(i)|T ) =

z

Prob(z)

y

Prob(y|t 5 , z)Prob(C|y, t 4 )

×

[

w

Prob(w|y, t 3 )Prob(A|w, t 1 )Prob(A|w, t 2 )

])

×

x

Prob(x|z, t 6 )Prob(G|x, t 7 )Prob(G|x, t 8 )

Figure 1: A tree with branch length and data

This method was introduced by Felsenstein in 1973 as pruning but was known before that in

pedigree analyses and even long before that as a summation reduction device called Horner’s rule.

Even though it is named after William George Horner, who described the algorithm in 1819, it

was already known to Isaac Newton in 1669 and even to the Chinese mathematician Ch’in Chiu-

Shao around 1200s (Horner’s rule Wikipedia.com 2005). Using the tree structure and calculating

quantities (conditional likelihoods) on nodes we arrive at the solution outlined in the algorithm in

the last lecture.

2 Scaling of likelihoods

With large trees the conditional likelihoods will be very small and on finite machines we need to

scale the conditional likelihoods so that they do not underflow. This would have dire consequences

need only little recalculation. The conditional likelihood of the subtrees towards the tips did not

change by this action.

For the optimization, it seems, often Newton-Raphson minimization is used. This needs analytical

derivates (first and second) and might be unstable for some mutation models. Several other mini-

mization methods could be also used to find the optimal branch length.

[Mathematica examples]

4 Optimizing mutation models

Optimizing parameters in the mutation models are rather cumbersome. Every change in the model

is changing the likelihood on the tree. Branch length need to be re-optimized. Often in programs

like PAUP* one is not co-optimizing the tree and the model at the same time. but running

preliminary runs on fixed trees to optimize the mutation model then optimizing the tree, and

perhaps use a couple of cycles to hopefully converge to the best mutation model and best tree.

It might be possible to use minimization methods that use derivatives, such as Newton-Raphson

minimization, but the calculation of the derivatives might be rather difficult for complex models

and only numerical approximations might be available.

5 Ancestral state reconstruction

We can use the same technique as for finding the best branch length to find the site that contributes

most to the likelihood. We could in turn set the root close to all the nodes of interest and recalculate

the likelihood, but we have seen already that this not necessary as most of the tree does not change

on such an operation and only the “views” to the branch that contains the root needs to change.

These saving are similar to the ones one can do in parsimony reconstruction.

Scaling on trees (an example using Jukes-Cantor and the tree

(((A,A),C),(G,G))

ü Transition probability matrix

pii pij @@ t_t_ DD :: == 31 ê ê (^4) 4 +- 11 ê ê (^4) 4 Exp Exp @@-- 44 t t ê ê (^3) D 3 D p @ t_ 8 pij D : @= t (^8) D (^8) ,pii pij @@ tt DD ,, pij pii @@ t D t, D , pij pij @@ tt DD< , pij, 8 pij @ t D<@ t, D (^8) ,pij pij @@ tt DD ,, pii pij @ t @ t DD ,, pij pii @@ tt D , D<< pij @ t D< ,

ü Conditional likelihoods

condL pi @= gi_ p @ ,ti gj_ D ;, ti_, tj_ D : = Module @8< , pj hi == p @ Plustj D ; üü H pi gi L ; hj hi (^) hj = Plus üü H pj gj L ; D

ü Likelihood of a single site

In[437]:= treelike @ gi_, basefreq_ D : = Log @ Plus üü H gi basefreq LD

ü Examples

condL @ 8 1 , 0 , 0 , 0 < , 8 0 , 0 , 0, 1 < , 0.01, 0.01 D {0.00330025, 0.0000109641, 0.0000109641, 0.00330025} condL @ %, 8 0 , 0 , 0 , 1 < , 0.01, 0.01 D {0.000010928, 1.08673 10 -7 , 1.08673 10-7 , 0.00328939} condL @ %, 8 1 , 0 , 0 , 0 < , 0.01, 0.01 D {0.0000217123, 3.65449 10^ -8 , 3.65449 10-8 , 0.0000108559} scaler.nb 1

ü Example tree (((A,A),C),(G,G))

w = condL @ 8 1, 0 , 0 , 0 < , 8 1 , 0 , 0 , 0 < , 0.01, 0.01 D y = condL @ w, 8 0, 1, 0, 0 < , 0.05, 0.06 D x = condL @ 8 0, 0, 1, 0 < , 8 0, 0 , 1 , 0 < , 0.02, 0.02 D z = condL @ y, x, 0.04, 0.08 D treelike @ z, 8 1 ê 4, 1 ê 4 , 1 ê 4 , 1 ê 4 < , 0 D {0.993389, 0.0000109641, 0.0000109641, 0.0000109641} {0.018786, 0.0157197, 0.000308069, 0.000308069} {0.0000432775, 0.0000432775, 0.986886, 0.0000432775} {0.000468974, 0.000394289, 0.000727305, 0.0000189071} -7.

ü Truncation problem

condL @ 8 .00001, 0, 0 , 0 < , 8 0, 0 , 0 , .00001 < , 0.1, 0.1 D {3.02328 10^ -12 , 9.73856 10 -14 , 9.73856 10-14 , 3.02328 10-12 } Chop @ condL @ 8 .00001, 0 , 0 , 0 < , 8 0 , 0, 0, .00001 < , 0.1, 0.1 DD {0, 0, 0, 0} condL @ %, 8 1 , 0 , 0 , 0 < , 0.1, 0.1 D {0, 0, 0, 0}

ü Using a scaling factor

condLs pi =@ gi_p @ ti, (^) D gj_;, ti_, tj_, si_, sj_, scale_ D : = Module @8< , pj hi == p @ Plustj D ; üü H pi gi L ; hj gp == hiPlus hj (^) ; üü H pj gj L ; D^ If @ scale^ ã^^1 ,^^8 gp^ ê^ Max @ gp D ,^ Log @ Max @ gp DD^ +^ si^ +^ sj < ,^^8 gp,^ si^ +^ sj <D treelikes @ gi_, basefreq_, scaler_ D : = Log @H Plus üü H gi basefreq LLD + scaler condLs @8 .00001, 0 , 0 , 0 < , 8 0 , 0 , 0 , .00001 < , 0.1, 0.1, 0 , 0 , 1 D {{1., 0.0322119, 0.0322119, 1.}, -26.5247} scaler.nb 2

ü Example tree (((A,A),C),(G,G)) using scaler

w = condLs @ 8 1 , 0 , 0 , 0 < , 8 1 , 0 , 0, 0 < , 0.01, 0.01, 0 , 0, 1 D y = condLs @ w @@ 1 DD , 8 0 , 1, 0, 0 < , 0.05, 0.06, 0, w @@ 2 DD , 0 D x = condLs @ 8 0 , 0 , 1 , 0 < , 8 0 , 0 , 1 , 0 < , 0.02, 0.02, 0 , 0 , 0 D z = condLs @ y @@ 1 DD , x @@ 1 DD , 0.04, 0.08, y @@ 2 DD , x @@ 2 DD , 1 D treelikes @ z @@ 1 DD , 8 1 ê 4, 1 ê 4 , 1 ê 4 , 1 ê 4 < , z @@ 2 DDD {{1., 0.0000110371, 0.0000110371, 0.0000110371}, -0.00663341} {{0.018911, 0.0158243, 0.000310119, 0.000310119}, -0.00663341} {{0.0000432775, 0.0000432775, 0.986886, 0.0000432775}, 0} {{0.64481, 0.542123, 1., 0.0259961}, -7.22616} -7. scaler.nb 3

Figure 2: Scaling of conditional likelihoods [Example with mathematica]