Download Maximum Likelihood II - Lecture Notes | BSC 5936 and more Study notes Biology in PDF only on Docsity!
Maximum likelihood II
Peter Beerli
October 3, 2005
1 Conditional likelihoods revisited
Some of this is already covered in the algorithm in the last chapter, but more elaboration on the
practical procedures and why are using these might give an even better understanding. This section
follows the Felsenstein book pages 251-255. We express the likelihood of the tree in Figure 1 as
Prob(D(i)|T ) =
z
y
w
x
Prob(A, A, C, G, G, w, y, x, z|T )|T ) (1)
where T = (t 1 , t 2 , t 3 , t 4 , t 5 , t 6 , t 7 , t 8 ). Each summation is over all 4 nucleotides. The above proba-
bility can be separated into
Prob(A, A, C, G, G, w, y, x, z|T )|T ) =Prob(z)
× Prob(w|y, t 3 )Prob(A|w, t 1 )Prob(A|w, t 2 )
× Prob(y|t 5 , z)Prob(C|y, t 4 )
× Prob(x|z, t 6 )Prob(G|x, t 7 )Prob(G|x, t 8 )
Prob(z) at the root is often assumed to depend on the stationary base frequencies. All parts are
easy to calculate and if we order the terms of the sum in formula 1 and move the summations as
far right as possible we get a summation pattern that uses the same structure as our tree ((C, (A,
A)),(G, G))
Prob(D(i)|T ) =
z
Prob(z)
y
Prob(y|t 5 , z)Prob(C|y, t 4 )
×
[
w
Prob(w|y, t 3 )Prob(A|w, t 1 )Prob(A|w, t 2 )
])
×
x
Prob(x|z, t 6 )Prob(G|x, t 7 )Prob(G|x, t 8 )
Figure 1: A tree with branch length and data
This method was introduced by Felsenstein in 1973 as pruning but was known before that in
pedigree analyses and even long before that as a summation reduction device called Horner’s rule.
Even though it is named after William George Horner, who described the algorithm in 1819, it
was already known to Isaac Newton in 1669 and even to the Chinese mathematician Ch’in Chiu-
Shao around 1200s (Horner’s rule Wikipedia.com 2005). Using the tree structure and calculating
quantities (conditional likelihoods) on nodes we arrive at the solution outlined in the algorithm in
the last lecture.
2 Scaling of likelihoods
With large trees the conditional likelihoods will be very small and on finite machines we need to
scale the conditional likelihoods so that they do not underflow. This would have dire consequences
need only little recalculation. The conditional likelihood of the subtrees towards the tips did not
change by this action.
For the optimization, it seems, often Newton-Raphson minimization is used. This needs analytical
derivates (first and second) and might be unstable for some mutation models. Several other mini-
mization methods could be also used to find the optimal branch length.
[Mathematica examples]
4 Optimizing mutation models
Optimizing parameters in the mutation models are rather cumbersome. Every change in the model
is changing the likelihood on the tree. Branch length need to be re-optimized. Often in programs
like PAUP* one is not co-optimizing the tree and the model at the same time. but running
preliminary runs on fixed trees to optimize the mutation model then optimizing the tree, and
perhaps use a couple of cycles to hopefully converge to the best mutation model and best tree.
It might be possible to use minimization methods that use derivatives, such as Newton-Raphson
minimization, but the calculation of the derivatives might be rather difficult for complex models
and only numerical approximations might be available.
5 Ancestral state reconstruction
We can use the same technique as for finding the best branch length to find the site that contributes
most to the likelihood. We could in turn set the root close to all the nodes of interest and recalculate
the likelihood, but we have seen already that this not necessary as most of the tree does not change
on such an operation and only the “views” to the branch that contains the root needs to change.
These saving are similar to the ones one can do in parsimony reconstruction.
Scaling on trees (an example using Jukes-Cantor and the tree
(((A,A),C),(G,G))
ü Transition probability matrix
pii pij @@ t_t_ DD :: == 31 ê ê (^4) 4 +- 11 ê ê (^4) 4 Exp Exp @@-- 44 t t ê ê (^3) D 3 D p @ t_ 8 pij D : @= t (^8) D (^8) ,pii pij @@ tt DD ,, pij pii @@ t D t, D , pij pij @@ tt DD< , pij, 8 pij @ t D<@ t, D (^8) ,pij pij @@ tt DD ,, pii pij @ t @ t DD ,, pij pii @@ tt D , D<< pij @ t D< ,
ü Conditional likelihoods
condL pi @= gi_ p @ ,ti gj_ D ;, ti_, tj_ D : = Module @8< , pj hi == p @ Plustj D ; üü H pi gi L ; hj hi (^) hj = Plus üü H pj gj L ; D
ü Likelihood of a single site
In[437]:= treelike @ gi_, basefreq_ D : = Log @ Plus üü H gi basefreq LD
ü Examples
condL @ 8 1 , 0 , 0 , 0 < , 8 0 , 0 , 0, 1 < , 0.01, 0.01 D {0.00330025, 0.0000109641, 0.0000109641, 0.00330025} condL @ %, 8 0 , 0 , 0 , 1 < , 0.01, 0.01 D {0.000010928, 1.08673 10 -7 , 1.08673 10-7 , 0.00328939} condL @ %, 8 1 , 0 , 0 , 0 < , 0.01, 0.01 D {0.0000217123, 3.65449 10^ -8 , 3.65449 10-8 , 0.0000108559} scaler.nb 1
ü Example tree (((A,A),C),(G,G))
w = condL @ 8 1, 0 , 0 , 0 < , 8 1 , 0 , 0 , 0 < , 0.01, 0.01 D y = condL @ w, 8 0, 1, 0, 0 < , 0.05, 0.06 D x = condL @ 8 0, 0, 1, 0 < , 8 0, 0 , 1 , 0 < , 0.02, 0.02 D z = condL @ y, x, 0.04, 0.08 D treelike @ z, 8 1 ê 4, 1 ê 4 , 1 ê 4 , 1 ê 4 < , 0 D {0.993389, 0.0000109641, 0.0000109641, 0.0000109641} {0.018786, 0.0157197, 0.000308069, 0.000308069} {0.0000432775, 0.0000432775, 0.986886, 0.0000432775} {0.000468974, 0.000394289, 0.000727305, 0.0000189071} -7.
ü Truncation problem
condL @ 8 .00001, 0, 0 , 0 < , 8 0, 0 , 0 , .00001 < , 0.1, 0.1 D {3.02328 10^ -12 , 9.73856 10 -14 , 9.73856 10-14 , 3.02328 10-12 } Chop @ condL @ 8 .00001, 0 , 0 , 0 < , 8 0 , 0, 0, .00001 < , 0.1, 0.1 DD {0, 0, 0, 0} condL @ %, 8 1 , 0 , 0 , 0 < , 0.1, 0.1 D {0, 0, 0, 0}
ü Using a scaling factor
condLs pi =@ gi_p @ ti, (^) D gj_;, ti_, tj_, si_, sj_, scale_ D : = Module @8< , pj hi == p @ Plustj D ; üü H pi gi L ; hj gp == hiPlus hj (^) ; üü H pj gj L ; D^ If @ scale^ ã^^1 ,^^8 gp^ ê^ Max @ gp D ,^ Log @ Max @ gp DD^ +^ si^ +^ sj < ,^^8 gp,^ si^ +^ sj <D treelikes @ gi_, basefreq_, scaler_ D : = Log @H Plus üü H gi basefreq LLD + scaler condLs @8 .00001, 0 , 0 , 0 < , 8 0 , 0 , 0 , .00001 < , 0.1, 0.1, 0 , 0 , 1 D {{1., 0.0322119, 0.0322119, 1.}, -26.5247} scaler.nb 2
ü Example tree (((A,A),C),(G,G)) using scaler
w = condLs @ 8 1 , 0 , 0 , 0 < , 8 1 , 0 , 0, 0 < , 0.01, 0.01, 0 , 0, 1 D y = condLs @ w @@ 1 DD , 8 0 , 1, 0, 0 < , 0.05, 0.06, 0, w @@ 2 DD , 0 D x = condLs @ 8 0 , 0 , 1 , 0 < , 8 0 , 0 , 1 , 0 < , 0.02, 0.02, 0 , 0 , 0 D z = condLs @ y @@ 1 DD , x @@ 1 DD , 0.04, 0.08, y @@ 2 DD , x @@ 2 DD , 1 D treelikes @ z @@ 1 DD , 8 1 ê 4, 1 ê 4 , 1 ê 4 , 1 ê 4 < , z @@ 2 DDD {{1., 0.0000110371, 0.0000110371, 0.0000110371}, -0.00663341} {{0.018911, 0.0158243, 0.000310119, 0.000310119}, -0.00663341} {{0.0000432775, 0.0000432775, 0.986886, 0.0000432775}, 0} {{0.64481, 0.542123, 1., 0.0259961}, -7.22616} -7. scaler.nb 3
Figure 2: Scaling of conditional likelihoods [Example with mathematica]