









Parsimony analysis in phylogeny. The most parsimonious tree is chosen in this method. Exercise problems.










Three general types of phylogenetic tree methods:
- distance: find the tree that accounts for estimated evolutionary distances
- parsimony: find the tree that requires the minimum number of changes to explain the data
- maximum likelihood: find the tree that maximizes the likelihood of the data
Given: character-based data
Do: find the tree that explains the data with a minimal number of changes
- focus is on finding the right tree topology, not on estimating branch lengths
[Figure: two candidate trees over the leaf sequences AAG, AAA, GGA, AGA, with internal nodes labeled by inferred ancestral sequences]
- there are various trees that could explain the phylogeny of the sequences AAG, AAA, GGA, AGA, including these two
- parsimony prefers the first tree because it requires fewer substitution events
- brute force approach
  - for each possible assignment of states to the internal nodes, calculate the number of changes
  - report the minimum number of changes found
- runtime is $O(N k^N)$
  - $k$ = number of possible character states (4 for DNA)
  - $N$ = number of leaves
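The brute force approach can be sketched in a few lines of Python. This is a minimal illustration, assuming a rooted binary tree stored as a dictionary from each internal node to its two children; the node names (`root`, `x`, `y`, `l1` to `l4`) and the example states are hypothetical and not taken from the notes.

```python
from itertools import product

def brute_force_parsimony(tree, leaf_states, alphabet="ACGT"):
    """Minimum number of changes for one character on a fixed rooted tree.

    tree: dict mapping each internal node to its (left, right) children.
    leaf_states: dict mapping each leaf to its observed state.
    """
    internal = list(tree)
    best = float("inf")
    # Try every assignment of states to the internal nodes.
    for combo in product(alphabet, repeat=len(internal)):
        state = dict(leaf_states)
        state.update(zip(internal, combo))
        # Count the edges whose two endpoints carry different states.
        changes = sum(
            state[parent] != state[child]
            for parent, children in tree.items()
            for child in children
        )
        best = min(best, changes)
    return best

# Hypothetical 4-leaf tree: root -> (x, y), x -> (l1, l2), y -> (l3, l4)
tree = {"root": ("x", "y"), "x": ("l1", "l2"), "y": ("l3", "l4")}
leaves = {"l1": "A", "l2": "A", "l3": "G", "l4": "A"}
print(brute_force_parsimony(tree, leaves))  # 1
```

The loop visits every combination of internal states, which is why this approach is only practical for very small trees.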
1. traverse tree from leaves to root determining set of possible states (e.g. nucleotides) for each internal node
2. traverse tree from root to leaves picking ancestral states for internal nodes
- do a post-order (from leaves to root) traversal of tree
- determine possible states $R_i$ of internal node $i$ with children $j$ and $k$:

$$
R_i = \begin{cases} R_j \cup R_k, & \text{if } R_j \cap R_k = \emptyset \\ R_j \cap R_k, & \text{otherwise} \end{cases}
$$

- this step calculates the number of changes required: # of changes = # union operations
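The leaves-to-root pass can be sketched as follows. This is a minimal illustration of the set rule above for a single character, assuming the same hypothetical dictionary-based tree representation (internal node mapped to its two children); it covers only the first traversal, not the root-to-leaves pass that picks ancestral states.

```python
def fitch(tree, leaf_states, root):
    """Post-order pass: return (state sets R, number of changes)."""
    R = {}
    changes = 0

    def visit(node):
        nonlocal changes
        if node not in tree:              # leaf: its set is the observed state
            R[node] = {leaf_states[node]}
            return
        j, k = tree[node]
        visit(j)
        visit(k)
        common = R[j] & R[k]
        if common:                        # non-empty intersection: no change
            R[node] = common
        else:                             # empty intersection: union, one change
            R[node] = R[j] | R[k]
            changes += 1

    visit(root)
    return R, changes

# Hypothetical 4-leaf tree and states (not from the notes)
tree = {"root": ("x", "y"), "x": ("l1", "l2"), "y": ("l3", "l4")}
leaf_states = {"l1": "A", "l2": "A", "l3": "G", "l4": "A"}
R, changes = fitch(tree, leaf_states, "root")
print(changes)  # 1
```

Each union operation corresponds to one required change, so the counter is the parsimony score for that character.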
- [Sankoff & Cedergren, 1983]
- instead of assuming all state changes are equally likely, use different costs $S(a, b)$ for different changes $a \to b$
- 1st step of algorithm is to propagate costs up through tree
- want to determine cost $R_i(a)$ of assigning character $a$ to node $i$
- for leaves:

$$
R_i(a) = \begin{cases} 0, & \text{if } a \text{ is character at leaf} \\ \infty, & \text{otherwise} \end{cases}
$$
- for an internal node $i$ with children $j$ and $k$:

$$
R_i(a) = \min_b \left( R_j(b) + S(a, b) \right) + \min_b \left( R_k(b) + S(a, b) \right)
$$
[Figure: tree with root node 1, whose children are internal node 2 and leaf 5 (A); the children of node 2 are leaf 3 (G) and leaf 4 (T)]

Leaf costs:

$$
\begin{aligned}
R_3[A] &= \infty, & R_3[C] &= \infty, & R_3[G] &= 0, & R_3[T] &= \infty \\
R_4[A] &= \infty, & R_4[C] &= \infty, & R_4[G] &= \infty, & R_4[T] &= 0 \\
R_5[A] &= 0, & R_5[C] &= \infty, & R_5[G] &= \infty, & R_5[T] &= \infty
\end{aligned}
$$

Costs at node 2:

$$
\begin{aligned}
R_2[A] &= R_3[G] + S(A, G) + R_4[T] + S(A, T) \\
&\ \ \vdots \\
R_2[T] &= R_3[G] + S(T, G) + R_4[T] + S(T, T)
\end{aligned}
$$

Costs at node 1:

$$
\begin{aligned}
R_1[A] &= \min\left( R_2[A] + S(A, A), \ldots, R_2[T] + S(A, T) \right) + R_5[A] + S(A, A) \\
&\ \ \vdots \\
R_1[T] &= \min\left( R_2[A] + S(T, A), \ldots, R_2[T] + S(T, T) \right) + R_5[A] + S(T, A)
\end{aligned}
$$
The minimal cost characters for node 1 are either g or t. The minimal cost character for node 3 is g. The maximum parsimony approach would prefer the other tree (exercise left to the reader).
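[Figure: tree with root node 1, whose children are a leaf with character t and internal node 3; the children of node 3 are leaves with characters a and c]

Substitution costs $S$:

$$
\begin{array}{c|cccc}
 & a & c & g & t \\ \hline
a & 0 & 0.8 & 0.2 & 0.9 \\
c & 0.8 & 0 & 0.7 & 0.5 \\
g & 0.2 & 0.7 & 0 & 0.1 \\
t & 0.9 & 0.5 & 0.1 & 0
\end{array}
$$

Costs at node 3:

$$
\begin{aligned}
R_3(a) &= 0 + 0.8 = 0.8 \\
R_3(c) &= 0.8 + 0 = 0.8 \\
R_3(g) &= 0.2 + 0.7 = 0.9 \\
R_3(t) &= 0.9 + 0.5 = 1.4
\end{aligned}
$$

Costs at node 1:

$$
\begin{aligned}
R_1(a) &= 0.9 + \min\{0.8,\ 0.8 + 0.8,\ 0.2 + 0.9,\ 0.9 + 1.4\} = 1.7 \\
R_1(c) &= 0.5 + \min\{0.8 + 0.8,\ 0.8,\ 0.7 + 0.9,\ 0.5 + 1.4\} = 1.3 \\
R_1(g) &= 0.1 + \min\{0.2 + 0.8,\ 0.7 + 0.8,\ 0.9,\ 0.1 + 1.4\} = 1.0 \\
R_1(t) &= 0 + \min\{0.9 + 0.8,\ 0.5 + 0.8,\ 0.1 + 0.9,\ 1.4\} = 1.0
\end{aligned}
$$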
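The cost-propagation step of weighted parsimony can be checked against this worked example with a short sketch. It assumes the tree shape implied by the computations (node 1 has a leaf t and node 3 as children; node 3 has leaves a and c), and it reads the last column of the cost table off the leading terms of the node 1 computations (0.9, 0.5, 0.1); both are assumptions about the original figure. The leaf names `leaf_t`, `leaf_a`, and `leaf_c` are hypothetical.

```python
import math

STATES = "acgt"
# Substitution costs S[a][b]; the last column is an assumed reading of the table.
S = {
    "a": {"a": 0.0, "c": 0.8, "g": 0.2, "t": 0.9},
    "c": {"a": 0.8, "c": 0.0, "g": 0.7, "t": 0.5},
    "g": {"a": 0.2, "c": 0.7, "g": 0.0, "t": 0.1},
    "t": {"a": 0.9, "c": 0.5, "g": 0.1, "t": 0.0},
}

def sankoff(tree, leaf_states, node):
    """Return {state a: minimal cost R_node(a) of the subtree rooted at node}."""
    if node not in tree:
        # Leaf: cost 0 for the observed character, infinity otherwise.
        return {a: 0.0 if a == leaf_states[node] else math.inf for a in STATES}
    child_costs = [sankoff(tree, leaf_states, child) for child in tree[node]]
    # For each state a, add up the cheapest (child cost + substitution cost)
    # over the children of this node.
    return {
        a: sum(min(Rc[b] + S[a][b] for b in STATES) for Rc in child_costs)
        for a in STATES
    }

tree = {1: ("leaf_t", 3), 3: ("leaf_a", "leaf_c")}
leaf_states = {"leaf_t": "t", "leaf_a": "a", "leaf_c": "c"}
R1 = sankoff(tree, leaf_states, 1)
print({a: round(cost, 2) for a, cost in R1.items()})
# {'a': 1.7, 'c': 1.3, 'g': 1.0, 't': 1.0}
```

The minimum over the root costs is the weighted parsimony score of the tree; here g and t tie at 1.0.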
- we've considered how to find the minimum number of changes for a given tree topology
- need some search procedure for exploring the space of tree topologies
- for any internal edge in a tree, there are 3 ways the four subtrees can be grouped
- nearest neighbor interchanges move from one grouping to another
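As a small illustration of the three groupings, the sketch below pairs up four subtrees around one internal edge. The function name and the subtree labels `A` to `D` are hypothetical; each subtree could itself be any nested tree.

```python
def nni_groupings(a, b, c, d):
    """The three ways to pair four subtrees around one internal edge."""
    return [
        ((a, b), (c, d)),
        ((a, c), (b, d)),
        ((a, d), (b, c)),
    ]

groupings = nni_groupings("A", "B", "C", "D")
for grouping in groupings:
    print(grouping)
```

Starting from any one of these groupings, a nearest neighbor interchange on that edge produces one of the other two.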
given: set of leaves L
create an initial tree t incorporating all leaves in L
best-score = parsimony algorithm applied to t
best-tree = t
repeat
    for each internal edge e in t
        for each nearest neighbor interchange
            t' ← tree with interchange applied to edge e in t
            score = parsimony algorithm applied to t'
            if score < best-score
                best-score = score
                best-tree = t'
    t = best-tree
until stopping criteria met
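The search loop above can be sketched in runnable form under several assumptions that are not in the notes: trees are nested tuples rooted at the first leaf (so every internal edge lies inside the second element), the initial tree is a simple caterpillar, scoring uses the unweighted set-based parsimony count, and the stopping criterion is "no interchange improves the score". The leaf names `s1` to `s4` are hypothetical labels for the four example sequences.

```python
def fitch_score(tree, seqs):
    """Total number of changes over all sites for a nested-tuple tree."""
    total = 0
    for site in range(len(next(iter(seqs.values())))):
        def visit(node):
            # Returns (possible states, changes within this subtree).
            if isinstance(node, str):
                return {seqs[node][site]}, 0
            (sl, cl), (sr, cr) = visit(node[0]), visit(node[1])
            common = sl & sr
            if common:
                return common, cl + cr
            return sl | sr, cl + cr + 1
        total += visit(tree)[1]
    return total

def nni_neighbors(tree):
    """Yield subtrees one nearest neighbor interchange away from tree."""
    if isinstance(tree, str):
        return
    left, right = tree
    for child, sibling, rebuild in (
        (left, right, lambda c, s: (c, s)),
        (right, left, lambda c, s: (s, c)),
    ):
        if not isinstance(child, str):
            x, y = child
            yield rebuild((sibling, y), x)   # swap sibling with x
            yield rebuild((x, sibling), y)   # swap sibling with y
        for sub in nni_neighbors(child):     # edges deeper in the tree
            yield rebuild(sub, sibling)

def hill_climb(leaves, seqs):
    rest = leaves[1]
    for leaf in leaves[2:]:                  # caterpillar initial tree
        rest = (rest, leaf)
    tree = (leaves[0], rest)
    best_score = fitch_score(tree, seqs)
    while True:
        best_tree = None
        for sub in nni_neighbors(tree[1]):
            candidate = (tree[0], sub)
            score = fitch_score(candidate, seqs)
            if score < best_score:
                best_score, best_tree = score, candidate
        if best_tree is None:                # no improving interchange: stop
            return tree, best_score
        tree = best_tree

seqs = {"s1": "AAG", "s2": "AAA", "s3": "GGA", "s4": "AGA"}
best_tree, best_score = hill_climb(["s1", "s2", "s3", "s4"], seqs)
print(best_tree, best_score)
```

Because only strictly better trees are accepted, the search can stop at a local optimum; it is a heuristic, not a complete search.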
given: set of leaves L
use heuristic method to grow initial tree t'
initialize Q with a partial tree with 3 leaves from L
repeat
    t ← tree in Q with lowest lower bound
    if t has incorporated all leaves in L
        return t
    else
        create new trees by adding next leaf from L to each branch of t
        for each new tree n
            if lower-bound(n) < score(t')
                put n in Q sorted by lower bound
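A runnable sketch of this procedure is given below, under assumptions that go beyond the notes: trees are nested tuples rooted at the first leaf, the lower bound of a partial tree is its own unweighted parsimony score (adding leaves cannot lower it), the queue is a binary heap, and the initial tree t' is passed in by the caller. If no tree beats t', the sketch falls back to returning t'. The leaf names `s1` to `s4` are hypothetical.

```python
import heapq
from itertools import count

def fitch_score(tree, seqs):
    """Total number of changes over all sites for a nested-tuple tree."""
    total = 0
    for site in range(len(next(iter(seqs.values())))):
        def visit(node):
            if isinstance(node, str):
                return {seqs[node][site]}, 0
            (sl, cl), (sr, cr) = visit(node[0]), visit(node[1])
            common = sl & sr
            if common:
                return common, cl + cr
            return sl | sr, cl + cr + 1
        total += visit(tree)[1]
    return total

def insertions(tree, leaf):
    """Yield every tree obtained by attaching leaf to one branch of tree."""
    yield (tree, leaf)                       # branch directly above this subtree
    if not isinstance(tree, str):
        left, right = tree
        for sub in insertions(left, leaf):
            yield (sub, right)
        for sub in insertions(right, leaf):
            yield (left, sub)

def branch_and_bound(leaves, seqs, initial_tree):
    bound = fitch_score(initial_tree, seqs)  # score(t')
    tie = count()                            # tie-breaker so trees are never compared
    start = (leaves[0], (leaves[1], leaves[2]))
    queue = [(fitch_score(start, seqs), next(tie), 3, start)]
    while queue:
        lower, _, used, t = heapq.heappop(queue)
        if used == len(leaves):              # t has incorporated all leaves
            return t, lower
        for sub in insertions(t[1], leaves[used]):
            n = (t[0], sub)
            n_lower = fitch_score(n, seqs)
            if n_lower < bound:              # prune trees that cannot beat t'
                heapq.heappush(queue, (n_lower, next(tie), used + 1, n))
    return initial_tree, bound               # nothing beat the initial tree

seqs = {"s1": "AAG", "s2": "AAA", "s3": "GGA", "s4": "AGA"}
initial = ("s1", (("s2", "s3"), "s4"))
bb_tree, bb_score = branch_and_bound(["s1", "s2", "s3", "s4"], seqs, initial)
print(bb_tree, bb_score)
```

The pruning test is what separates this from exhaustive enumeration: a partial tree whose bound already matches or exceeds the score of t' is never expanded.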
- we described parsimony calculations in terms of rooted trees
- but we described the search procedures in terms of unrooted trees
- unweighted parsimony: minimum cost is independent of where root is located
- weighted parsimony: minimum cost is independent of root if substitution cost is a metric (refer back to definition of metric from distance-based methods)
- branch and bound is a complete search method
  - guaranteed to find optimal solution
- may be much more efficient than exhaustive search
- in the worst case, it is no better
- efficiency depends on
  - the tightness of the lower bound
  - the quality of the initial tree
- search space may be large, but
  - can find the optimal tree efficiently in some cases
  - heuristic methods can be applied
- difficult to evaluate inferred phylogenies: ground truth not usually known
  - can look at agreement across different sources of evidence
  - can look at repeatability across subsamples of the data
  - can look at indirect predictions, e.g. conservation of sites in proteins
- some newer methods use data based on linear order of orthologous genes along chromosome
- phylogenies for bacteria, viruses not so straightforward because of lateral transfer of genetic material; "local" phylogenies might be more appropriate
[Figure: tree labeled with the amino acids D, P, T, H and branch lengths $t_1, \ldots, t_5$]
- actually we can do this without assuming particular amino-acid assignments at the internal nodes: sum over all possibilities
- as before, calculate the rate $r$ that maximizes this expression
[Figure: comparison of the MP-ConSurf method and the Rate4Site method, with rates estimated using 233 sequences and using 34 sequences]