Parsimony analysis phylogeny, Exercises of Biology

Parsimony analysis in phylogeny. The maximum-parsimony tree is chosen in this method. Exercise problems.

Typology: Exercises

2019/2020

Uploaded on 02/18/2020

akshita-baiju-gopal
Parsimony-Based Approaches to Inferring Phylogenetic Trees

BMI/CS 576
www.biostat.wisc.edu/bmi576.html
Mark Craven
Fall 2011

Phylogenetic tree approaches

- three general types
  - distance: find the tree that accounts for estimated evolutionary distances
  - parsimony: find the tree that requires the minimum number of changes to explain the data
  - maximum likelihood: find the tree that maximizes the likelihood of the data

Parsimony based approaches

given: character-based data
do: find the tree that explains the data with a minimal number of changes

- the focus is on finding the right tree topology, not on estimating branch lengths

Parsimony example

[Figure: two candidate trees. Labels on the first tree: AAG, AAA, GGA, AGA, AAA, AAA, AGA. Labels on the second tree: AAG, AGA, AAA, GGA, AAA, AAA, AAA.]

- there are various trees that could explain the phylogeny of the sequences AAG, AAA, GGA, AGA, including the two shown
- parsimony prefers the first tree because it requires fewer substitution events

Finding minimum number of changes for a given tree

- brute force approach
  - for each possible assignment of states to the internal nodes, calculate the number of changes
  - report the minimum number of changes found
- runtime is $O(N k^N)$, where
  - k = number of possible character states (4 for DNA)
  - N = number of leaves
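The brute-force approach can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the nested-tuple tree encoding (leaves are single-character strings, internal nodes are tuples of children) and the function names are assumptions made here.

```python
from itertools import product

STATES = "ACGT"  # k = 4 possible character states for DNA


def internal_nodes(tree, path=()):
    """Yield a path (tuple of child indices) for every internal node."""
    if isinstance(tree, tuple):
        yield path
        for idx, child in enumerate(tree):
            yield from internal_nodes(child, path + (idx,))


def count_changes(tree, assignment, path=()):
    """Return (number of changes below this node, state at this node)."""
    if not isinstance(tree, tuple):          # leaf: its observed state
        return 0, tree
    here = assignment[path]
    total = 0
    for idx, child in enumerate(tree):
        sub_total, child_state = count_changes(child, assignment, path + (idx,))
        total += sub_total + (child_state != here)
    return total, here


def brute_force_parsimony(tree):
    """Minimum changes over every assignment of states to internal nodes."""
    nodes = list(internal_nodes(tree))
    best = None
    for states in product(STATES, repeat=len(nodes)):
        changes, _ = count_changes(tree, dict(zip(nodes, states)))
        if best is None or changes < best:
            best = changes
    return best


# One site, four leaves: a single change explains the G.
print(brute_force_parsimony((("A", "A"), ("G", "A"))))  # prints 1
```

The loop visits every one of the k^(number of internal nodes) assignments, which is why this approach is only usable on very small trees.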

Fitch’s Algorithm [1971]

1. traverse the tree from leaves to root, determining the set of possible states (e.g. nucleotides) for each internal node
2. traverse the tree from root to leaves, picking ancestral states for internal nodes

Fitch’s algorithm: step 1, possible states for internal nodes

- do a post-order (from leaves to root) traversal of the tree
- determine the possible states $R_i$ of internal node i with children j and k:

\[
R_i =
\begin{cases}
R_j \cap R_k, & \text{if } R_j \cap R_k \neq \emptyset \\
R_j \cup R_k, & \text{otherwise}
\end{cases}
\]

- this step also calculates the number of changes required: # of changes = # of union operations

Fitch’s algorithm: step 1 example

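[Figure: tree with leaves C, T, G, T, A, T and the state sets computed for its internal nodes]

- sets shown at internal nodes: {G, T}, {A, G, T}, {T}
- {C, T} ∩ {A, G, T} = {T}
- {C} ∪ {T} = {C, T}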
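Both passes of Fitch's algorithm can be sketched as follows. The tree topology used in the example is an assumption: it is one arrangement of the leaves C, T, G, T, A, T that reproduces the sets listed above, since the figure itself is not available. The rule used in the top-down pass (keep the parent's state when the child's set allows it, otherwise pick any state from the set) is the commonly used one and is not spelled out in the slides.

```python
def fitch_up(tree):
    """Post-order pass: return (annotated_tree, state_set, n_changes)."""
    if not isinstance(tree, tuple):          # leaf: a single-character string
        return tree, {tree}, 0
    left, r_j, c_j = fitch_up(tree[0])
    right, r_k, c_k = fitch_up(tree[1])
    common = r_j & r_k
    if common:                               # intersection non-empty: no change
        r_i, extra = common, 0
    else:                                    # union operation: one change
        r_i, extra = r_j | r_k, 1
    return (r_i, left, right), r_i, c_j + c_k + extra


def fitch_down(annotated, parent_state=None):
    """Pre-order pass: pick one ancestral state per internal node."""
    if not isinstance(annotated, tuple):
        return annotated
    r_i, left, right = annotated
    # Keep the parent's state if allowed; min() just makes the choice repeatable.
    state = parent_state if parent_state in r_i else min(r_i)
    return (state, fitch_down(left, state), fitch_down(right, state))


# Assumed topology for the leaves C, T, G, T, A, T.
tree = ((("C", "T"), (("G", "T"), "A")), "T")
annotated, root_set, changes = fitch_up(tree)
print(root_set, changes)       # {'T'} 3
print(fitch_down(annotated))
```

Three union operations occur ({C} ∪ {T}, {G} ∪ {T}, and {G, T} ∪ {A}), so the tree needs three changes, matching the count-the-unions rule.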

Weighted parsimony

- [Sankoff & Cedergren, 1983]
- instead of assuming all state changes are equally likely, use different costs $S(a, b)$ for different changes $a \to b$
- the first step of the algorithm is to propagate costs up through the tree

Weighted parsimony

- want to determine the cost $R_i(a)$ of assigning character a to node i
- for leaves:

\[
R_i(a) =
\begin{cases}
0, & \text{if } a \text{ is the character at leaf } i \\
\infty, & \text{otherwise}
\end{cases}
\]

Weighted parsimony

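- for an internal node i with children j and k:

\[
R_i(a) = \min_b \bigl( R_j(b) + S(a, b) \bigr) + \min_b \bigl( R_k(b) + S(a, b) \bigr)
\]

where a is the state assigned to node i and b ranges over the possible states of each child, so $S(a, b)$ is the cost of the change $a \to b$ along the edge to that child.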

Example: weighted parsimony

[Figure: tree with root 1, whose children are internal node 2 and leaf 5 (A); node 2 has children leaf 3 (G) and leaf 4 (T)]

Leaves:

\[
\begin{aligned}
R_3[A] &= \infty, & R_3[C] &= \infty, & R_3[G] &= 0, & R_3[T] &= \infty \\
R_4[A] &= \infty, & R_4[C] &= \infty, & R_4[G] &= \infty, & R_4[T] &= 0 \\
R_5[A] &= 0, & R_5[C] &= \infty, & R_5[G] &= \infty, & R_5[T] &= \infty
\end{aligned}
\]

Node 2:

\[
\begin{aligned}
R_2[A] &= R_3[G] + S(A, G) + R_4[T] + S(A, T) \\
&\;\;\vdots \\
R_2[T] &= R_3[G] + S(T, G) + R_4[T] + S(T, T)
\end{aligned}
\]

Node 1:

\[
\begin{aligned}
R_1[A] &= \min\bigl( R_2[A] + S(A, A), \ldots, R_2[T] + S(A, T) \bigr) + R_5[A] + S(A, A) \\
&\;\;\vdots \\
R_1[T] &= \min\bigl( R_2[A] + S(T, A), \ldots, R_2[T] + S(T, T) \bigr) + R_5[A] + S(T, A)
\end{aligned}
\]

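Weighted Parsimony Example

[Figure: tree with leaves t, a, c; internal node 3 is the parent of a and c, and root 1 is the parent of t and node 3]

Substitution costs S (the last column, truncated in the source, is filled in from the sums below and matches the t row):

      a     c     g     t
a     0     0.8   0.2   0.9
c     0.8   0     0.7   0.5
g     0.2   0.7   0     0.1
t     0.9   0.5   0.1   0

\[
\begin{aligned}
R_3(a) &= 0 + 0.8 = 0.8 \\
R_3(c) &= 0.8 + 0 = 0.8 \\
R_3(g) &= 0.2 + 0.7 = 0.9 \\
R_3(t) &= 0.9 + 0.5 = 1.4
\end{aligned}
\]

\[
\begin{aligned}
R_1(a) &= 0.9 + \min\{0.8,\; 0.8 + 0.8,\; 0.2 + 0.9,\; 0.9 + 1.4\} = 1.7 \\
R_1(c) &= 0.5 + \min\{0.8 + 0.8,\; 0.8,\; 0.7 + 0.9,\; 0.5 + 1.4\} = 1.3 \\
R_1(g) &= 0.1 + \min\{0.2 + 0.8,\; 0.7 + 0.8,\; 0.9,\; 0.1 + 1.4\} = 1.0 \\
R_1(t) &= 0 + \min\{0.9 + 0.8,\; 0.5 + 0.8,\; 0.1 + 0.9,\; 1.4\} = 1.0
\end{aligned}
\]

The minimal cost characters for node 1 are either g or t (both cost 1.0). The minimal cost character for node 3 is g: tracing back from either root choice, g attains the minimum (0 + 0.9 from g, 0.1 + 0.9 from t). The maximum parsimony approach would prefer the other tree (exercise left to the reader).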
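The cost propagation in this example can be checked with a short Python sketch of the weighted-parsimony recurrence. The nested-tuple tree encoding and the function name `sankoff` are assumptions made for illustration, and the t column of the cost matrix is taken to be 0.9, 0.5, 0.1, 0, which is what the worked sums use.

```python
import math

STATES = "acgt"

# Substitution costs S[a][b]; the t column is assumed to mirror the t row.
S = {
    "a": {"a": 0.0, "c": 0.8, "g": 0.2, "t": 0.9},
    "c": {"a": 0.8, "c": 0.0, "g": 0.7, "t": 0.5},
    "g": {"a": 0.2, "c": 0.7, "g": 0.0, "t": 0.1},
    "t": {"a": 0.9, "c": 0.5, "g": 0.1, "t": 0.0},
}


def sankoff(tree):
    """Return {state a: R_i(a)} for the root of a nested-tuple binary tree."""
    if not isinstance(tree, tuple):
        # Leaf: cost 0 for the observed character, infinity otherwise.
        return {a: (0.0 if a == tree else math.inf) for a in STATES}
    r_j, r_k = sankoff(tree[0]), sankoff(tree[1])
    # Internal node: best child state for each child, given state a here.
    return {
        a: min(r_j[b] + S[a][b] for b in STATES)
        + min(r_k[b] + S[a][b] for b in STATES)
        for a in STATES
    }


# Root 1 has children leaf t and node 3; node 3 has children leaves a and c.
tree = ("t", ("a", "c"))
for state, cost in sankoff(tree).items():
    print(state, round(cost, 2))
```

The printed costs are 1.7, 1.3, 1.0, and 1.0 for a, c, g, and t, agreeing with the hand calculation. Recovering the actual state at each internal node would need a second, top-down pass that remembers which b achieved each minimum; that pass is not shown here.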

Exploring the space of trees

- we’ve considered how to find the minimum number of changes for a given tree topology
- we need some search procedure for exploring the space of tree topologies

Heuristic method: nearest neighbor interchange

[Figure: the three ways to group four subtrees A, B, C, D around an internal edge: (A, B | C, D), (A, C | B, D), (A, D | B, C)]

- for any internal edge in a tree, there are 3 ways the four subtrees can be grouped
- nearest neighbor interchanges move from one grouping to another
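For a single internal edge, the two alternative groupings can be generated directly. The sketch below assumes the edge is written as a pair of pairs, ((A, B), (C, D)), where each of A, B, C, D may itself be a subtree; the function name is hypothetical.

```python
def nni_neighbors(edge):
    """Return the two other groupings of the four subtrees around an edge."""
    (a, b), (c, d) = edge
    return [((a, c), (b, d)), ((a, d), (b, c))]


for neighbor in nni_neighbors((("A", "B"), ("C", "D"))):
    print(neighbor)
```

Applying this to a full tree also requires locating every internal edge and its four attached subtrees, which this sketch leaves out.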

Heuristic method: hill-climbing with nearest neighbor interchange

given: set of leaves L
create an initial tree t incorporating all leaves in L
best-tree = t
best-score = parsimony algorithm applied to t
repeat
    for each internal edge e in t
        for each nearest neighbor interchange
            t’ ← tree with interchange applied to edge e in t
            score = parsimony algorithm applied to t’
            if score < best-score
                best-score = score
                best-tree = t’
    t = best-tree
until stopping criteria met
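The hill-climbing loop can be sketched in Python for the smallest interesting case: four leaves, where the tree has exactly one internal edge. Everything here is an illustrative assumption rather than the course's code: the tree is encoded as ((s1, s2), (s3, s4)) with sequence strings as leaves, scoring uses the unweighted Fitch count summed over sites, and the stopping criteria are "no improvement" or a round limit.

```python
def fitch_changes(tree, site):
    """Fitch bottom-up pass for one site: return (state_set, changes)."""
    if isinstance(tree, str):                 # leaf: a sequence string
        return {tree[site]}, 0
    r_j, c_j = fitch_changes(tree[0], site)
    r_k, c_k = fitch_changes(tree[1], site)
    common = r_j & r_k
    if common:
        return common, c_j + c_k
    return r_j | r_k, c_j + c_k + 1           # union operation: one change


def parsimony(tree, n_sites):
    """Total number of changes summed over all sites."""
    return sum(fitch_changes(tree, s)[1] for s in range(n_sites))


def nni_neighbors(tree):
    """Four-leaf case only: the two other groupings around the single edge."""
    (a, b), (c, d) = tree
    return [((a, c), (b, d)), ((a, d), (b, c))]


def hill_climb(tree, n_sites, max_rounds=100):
    best_tree, best_score = tree, parsimony(tree, n_sites)
    for _ in range(max_rounds):               # stop after a fixed number of rounds
        current = best_tree
        for candidate in nni_neighbors(current):
            score = parsimony(candidate, n_sites)
            if score < best_score:
                best_tree, best_score = candidate, score
        if best_tree is current:              # stop when no neighbor improves
            break
    return best_tree, best_score


start = (("AAG", "GGA"), ("AAA", "AGA"))
print(hill_climb(start, n_sites=3))
```

Starting from a grouping that needs 4 changes, the search moves to ((AAG, AAA), (GGA, AGA)), which needs 3, and then stops because neither neighbor is better. Like any hill-climbing method, it can stop at a local optimum on larger trees.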

Branch and bound (alternate version)

given: set of leaves L
use heuristic method to grow initial tree t’
initialize Q with a partial tree with 3 leaves from L
repeat
    t ← tree in Q with lowest lower bound
    if t has incorporated all leaves in L
        return t
    else
        create new trees by adding next leaf from L to each branch of t
        for each new tree n
            if lower-bound(n) < score(t’)
                put n in Q sorted by lower bound

Rooted or unrooted trees for parsimony?

- we described parsimony calculations in terms of rooted trees
- but we described the search procedures in terms of unrooted trees
- unweighted parsimony: minimum cost is independent of where the root is located
- weighted parsimony: minimum cost is independent of the root if the substitution cost is a metric (refer back to the definition of a metric from distance-based methods)

Comments on branch and bound

- it is a complete search method
  - guaranteed to find the optimal solution
- may be much more efficient than exhaustive search
- in the worst case, it is no better
- efficiency depends on
  - the tightness of the lower bound
  - the quality of the initial tree

Comments on tree inference

- search space may be large, but
  - can find the optimal tree efficiently in some cases
  - heuristic methods can be applied
- difficult to evaluate inferred phylogenies: ground truth not usually known
  - can look at agreement across different sources of evidence
  - can look at repeatability across subsamples of the data
  - can look at indirect predictions, e.g. conservation of sites in proteins
- some newer methods use data based on the linear order of orthologous genes along chromosomes
- phylogenies for bacteria and viruses are not so straightforward because of lateral transfer of genetic material; “local” phylogenies might be more appropriate

Identifying functional regions in proteins

[Figure: tree with leaves D, P, T, H, internal nodes X and Y, and branch lengths t1 to t5]

\[
P(\text{data} \mid r) = \sum_{X, Y} \pi_X \, P_{X,D}(r \cdot t_1) \, P_{Y,P}(r \cdot t_2) \, P_{Y,H}(r \cdot t_3) \, P_{X,T}(r \cdot t_4) \, P_{X,Y}(r \cdot t_5)
\]

Here the leading factor, garbled in the source, is read as $\pi_X$, the prior probability of amino acid X at the root.

- actually we can do this without assuming particular amino-acid assignments at the internal nodes: sum over all possibilities
- as before, calculate the rate r that maximizes this expression

[Figure: rates estimated using 233 sequences and rates estimated using 34 sequences, for the MP-ConSurf method and the Rate4Site method]