







These notes present parsimony methods in evolutionary biology as cost minimization procedures, focusing on Fitch parsimony. They cover the Fitch downpass and uppass algorithms for rooted binary trees and mention other variants such as Camin-Sokal and Dollo parsimony. They also touch on multiple ancestral state reconstructions and the use of the Wagner parsimony model for quantitative characters.








Reference: Felsenstein Chapters 1 and 2 (pp. 1-18), Chapter 6 (pp. 67-72), Chapter 7 (pp. 73-86)
Parsimony methods are perhaps the simplest methods used for inferring evolutionary trees, so we will start with them. Parsimony means “extreme or excessive economy or frugality”. In the context of evolutionary inference, the idea is to minimize the amount of evolutionary change. Another way of understanding parsimony methods is that they are cost minimization procedures, where the cost is a measure of the amount of evolutionary change. In the typical case, we are dealing with discrete characters, and the natural measure of evolutionary change is simply the number of changes between states. However, there are many other possibilities as well, as we will discover.
Throughout this lecture, we will be assuming that the tree is fixed. We will then use parsimony methods to count the number of changes and to infer the ancestral states in the tree. In principle, the step is small from this to comparing and selecting among trees according to the amount of evolutionary change they imply. However, several practical issues arise when we start searching for trees so we will defer treatment of this topic to the next lecture.
To start with, let us assume that we are interested in the evolution of a DNA character, which has four discrete states: A, C, G, and T. In the simplest possible model, we assume that all changes between all states are allowed and count equally (Fig. 1). According to Felsenstein (2004), this model was initially proposed by Kluge and Farris (1969) and named Wagner parsimony by them, but it is more widely known as Fitch parsimony because the first algorithms implementing the model were published by Walter Fitch (1970, 1971). The naming issue is utterly confusing because there is another parsimony model, which we will examine shortly, that is widely known as Wagner parsimony although Felsenstein calls it “parsimony on an ordinal scale”.
Figure 1: Fitch parsimony model for DNA characters. All state changes are possible and count equally.
In simple cases, when there are few state changes in a small tree, it is straightforward to find the number of required changes and the most parsimonious states of the interior nodes in the tree (and hence the position of the changes) (Fig. 2a). When there are more changes, it is quite common for there to be ambiguity both concerning the interior node states and the position of the changes, even though we can still find the minimum number of changes relatively easily in many cases (Fig. 2b).
For more complicated trees and larger problems, it is necessary to use an algorithm. The basic idea is to pass through the tree from the tips to the root (a downpass or postorder traversal). At each interior node, we count the minimum number of steps required by the subtree rooted at that node, and we record the ancestral states giving us that number of steps. When we have reached the root node, we know the minimum number of steps for the entire tree. Since it is a recursive algorithm, we only need to specify it for one node p and its two children q and r. Assume that Si is the set of observed or inferred minimal states at node i. For instance, if p is a terminal and we have observed an A for it, we have Sp = {A}. Similarly, if there were ambiguity about the states for a terminal or interior node p, such that the state could be either A or G, then we have Sp = {A, G}. Now we can calculate the length (or cost) l of a tree using the Fitch downpass algorithm by finding intersections and unions of state sets (Algorithm 1).
Algorithm 1 Fitch downpass algorithm
    Sp ← Sq ∩ Sr
    if Sp = ∅ then
        Sp ← Sq ∪ Sr
        l ← l + 1
    end if
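The recursion in Algorithm 1 can be sketched in Python as follows. This is only an illustration: the tree encoding (nested 2-tuples with tips given as sets of observed states) and the name `fitch_downpass` are assumptions made for this example, not part of the original notes.

```python
def fitch_downpass(node):
    """Return (downpass state set, number of steps) for the subtree at `node`.

    Assumed encoding: a tip is a set of observed states, and an interior
    node is a (left, right) tuple of subtrees.
    """
    if isinstance(node, (set, frozenset)):
        return frozenset(node), 0            # tip: observed states, no steps

    left, right = node
    s_q, steps_q = fitch_downpass(left)
    s_r, steps_r = fitch_downpass(right)

    s_p = s_q & s_r                          # Sp <- Sq ∩ Sr
    steps = steps_q + steps_r
    if not s_p:                              # empty intersection
        s_p = s_q | s_r                      # Sp <- Sq ∪ Sr
        steps += 1                           # l <- l + 1
    return s_p, steps


# Hypothetical four-taxon example: ((A, C), (A, G))
tree = (({"A"}, {"C"}), ({"A"}, {"G"}))
root_set, length = fitch_downpass(tree)
print(sorted(root_set), length)
```

Returning the step count from each call, rather than updating a global l, keeps the sketch free of shared state; the total is the same.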
Assume that we have the final state set Fa of node a, which is the immediate ancestor of node p. We also have the downpass sets of p and its two children q and r; they are labeled Sp, Sq and Sr, respectively, as in the description of the downpass algorithm. Now Fitch’s uppass (also known as the final pass) algorithm is based on combining information from these state sets (Algorithm 2). The algorithm is started by noting that, for the root, the final state set is the same as the downpass state set.
Algorithm 2 Fitch uppass algorithm
    Fp ← Sp ∩ Fa
    if Fp ≠ Fa then
        if Sq ∩ Sr ≠ ∅ then
            Fp ← ((Sq ∪ Sr) ∩ Fa) ∪ Sp
        else
            Fp ← Sp ∪ Fa
        end if
    end if
Fitch’s uppass algorithm is much less intuitive than the downpass algorithm but we will nevertheless make an attempt to explain why it works. In the first step, you test whether the downpass state set of p includes all of the states in the final set of a. If it does, then it means that each optimal assignment of final state to a can be combined with the same state at p to give zero changes on the branch between a and p and the minimal number of changes in the subtree rooted at p. Thus, the final set of p must be the same as the set at a because any other state would add steps to the tree length (including any states that might be present in the downpass set of p but not in the final set of a).
If the final set of a includes states that are not present in the downpass set of p, then there must be optimal solutions that have one state at a and a different state at p. This means that there is a potential change on the branch between a and p and we need to see if that change can be pushed onto one of the descendant branches of p without increasing the total tree length. If so, we may have to add states from the final set of a to the downpass set of p. The second part of the algorithm (after the first if statement) takes care of this potential addition of states. If the downpass set of p was formed by taking an intersection, then we need to add the states that are in the final set of a and in the downpass set of either one of the two descendants of p because each of these additional state assignments to p will push a change on the branch between p and a to one of the descendant branches of p without increasing total tree length. If the downpass set of p was formed by a union, then we simply add the states in the final set of a to those in the downpass set of p.
Figure 3: An example of state sets calculated during a Fitch downpass.
Felsenstein describes an alternative uppass algorithm briefly in Chapter 6 (p. 68) but that algorithm is erroneous as far as we can tell (please let us know if you think we are wrong). Felsenstein’s algorithm will work on the example given in his Fig. 2.1 (Fig. 3) but it will fail on other trees. For instance, you can test it on the following simple tree (terminals labeled with the observed states): ((({A},{A}),{T}),{T}). The final interior state sets, listed from the root towards the tips, are obviously {T}, {T} and {A}, but Felsenstein’s algorithm will give you {T}, {T} and {A, T}, regardless of whether you understand L as the final state set or the downpass state set for the ancestral node. Admittedly, it is easy to underestimate the complexity of the Fitch uppass!
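To check Algorithm 2 on the small tree above, here is a minimal Python sketch of both passes. The `Node` class and the function names are hypothetical helpers introduced for this illustration, and tips are assumed to simply keep their observed states as final sets.

```python
class Node:
    """Hypothetical tree node: tips carry observed states, interior nodes two children."""

    def __init__(self, states=None, children=()):
        self.children = list(children)
        self.down = frozenset(states) if states else None   # downpass set S
        self.final = None                                    # final set F


def downpass(p):
    """Fill in the downpass sets below p and return the number of steps."""
    if not p.children:
        return 0
    q, r = p.children
    steps = downpass(q) + downpass(r)
    p.down = q.down & r.down
    if not p.down:
        p.down = q.down | r.down
        steps += 1
    return steps


def uppass(p, final_a=None):
    """Fill in the final sets, given the final set of p's ancestor."""
    if final_a is None or not p.children:
        p.final = p.down                 # root, or tip (assumption: tips are fixed)
    else:
        q, r = p.children
        p.final = p.down & final_a                       # Fp <- Sp ∩ Fa
        if p.final != final_a:
            if q.down & r.down:                          # Sp came from an intersection
                p.final = ((q.down | r.down) & final_a) | p.down
            else:                                        # Sp came from a union
                p.final = p.down | final_a
    for child in p.children:
        uppass(child, p.final)


# The test tree ((({A},{A}),{T}),{T})
inner = Node(children=[Node("A"), Node("A")])
middle = Node(children=[inner, Node("T")])
root = Node(children=[middle, Node("T")])

length = downpass(root)
uppass(root)
print(length, [sorted(n.final) for n in (root, middle, inner)])
```

On this tree the sketch yields one step and the final sets {T}, {T} and {A} for the root, the middle node and the innermost node, matching the values stated in the text.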
would simply group the measurements into n ordered groups that would comprise the states of the Wagner parsimony character. In the latter case, we divide the range between the minimum and maximum value into n equal-length intervals to give us a Wagner parsimony character with n states. In the extreme case, we simply use the raw measurements up to some fixed precision.
Felsenstein describes the Wagner parsimony algorithms using raw measurements, but let us use a slightly modified set description instead. Assume that the state set of a node p is a set of continuous elements S = {x, x + 1, x + 2, ..., y} where min(S) = x and max(S) = y (we can also call this set an interval). Now define the operation Si ⊓ Sj as producing the set of continuous elements between max(min(Si), min(Sj)) and min(max(Si), max(Sj)), inclusive. If Si and Sj overlap, then this operation simply produces their intersection, but if they do not overlap, the result is a minimum spanning interval connecting the two sets. For instance, {2, 3, 4} ⊓ {6, 7, 8} = {4, 5, 6}. As usual, the Wagner downpass algorithm finds the downpass interval for a node p from the downpass intervals of its two daughters q and r, and it adds to a global tree length variable l when we need to connect two non-overlapping state sets (Algorithm 3).
Algorithm 3 Wagner downpass algorithm
    Sp ← Sq ∩ Sr
    if Sp = ∅ then
        Sp ← Sq ⊓ Sr
        l ← l + (|Sp| − 1)
    end if
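A minimal Python sketch of Algorithm 3 follows. It assumes that state sets are closed integer intervals stored as (lo, hi) pairs and that interior nodes are (left, right) tuples; the function names `meet` and `wagner_downpass` are illustrative only.

```python
def meet(a, b):
    """The ⊓ operation on closed integer intervals given as (lo, hi) pairs."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    # If the intervals overlap, (lo, hi) is their intersection. Otherwise
    # lo > hi, and the swapped pair is the interval spanning the gap.
    return (lo, hi) if lo <= hi else (hi, lo)


def wagner_downpass(node):
    """Return (downpass interval, length) for the subtree rooted at `node`."""
    if isinstance(node[0], int):             # tip: an (lo, hi) interval
        return node, 0

    (s_q, len_q), (s_r, len_r) = (wagner_downpass(child) for child in node)
    length = len_q + len_r
    disjoint = max(s_q[0], s_r[0]) > min(s_q[1], s_r[1])
    s_p = meet(s_q, s_r)
    if disjoint:
        length += s_p[1] - s_p[0]            # l <- l + (|Sp| - 1)
    return s_p, length


# The example from the text: {2, 3, 4} ⊓ {6, 7, 8} = {4, 5, 6}
print(meet((2, 4), (6, 8)))

# Hypothetical tree with observed values ((2, 5), (3, 9))
tree = (((2, 2), (5, 5)), ((3, 3), (9, 9)))
print(wagner_downpass(tree))
```

Because `meet` returns the plain intersection whenever the intervals overlap, the sketch uses it for both branches of the algorithm and only adds to the length in the disjoint case.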
The similarities to the Fitch downpass algorithm are obvious. The uppass algorithm is also similar to its Fitch parsimony analog, but it is a little less complex because of the additive nature of the state transition costs. Before describing it, let us define the operation Si ⊔ Sj as producing the set of continuous elements from min(min(Si), min(Sj)) to max(max(Si), max(Sj)). If the two intervals overlap, the result is simply their union, but if they are disjoint then the operation will produce an interval including all the values from the smallest to the largest. For example, {3, 4} ⊔ {6, 7} = {3, 4, 5, 6, 7}. With additional notation as above, the Wagner uppass algorithm is now easy to formulate (Algorithm 4). The algorithm is slightly simpler than the Fitch uppass but can be justified (proven to be correct) with argumentation similar to that used above for the latter.
Algorithm 4 Wagner uppass algorithm
    Fp ← Sp ∩ Fa
    if Fp ≠ Fa then
        Fp ← ((Sq ⊔ Sr) ∩ Fa) ∪ Sp
    end if
Many other variants of parsimony that count changes slightly differently have been proposed. Some examples include Camin-Sokal parsimony, where changes are assumed to be unidirectional, and Dollo parsimony, where changes in one direction are assumed to occur only once in a tree. We refer to Felsenstein (Chapter 7) for a detailed description of these methods and others. In practice, these alternative parsimony approaches are rarely used.
All parsimony methods are special cases of a technique called Sankoff optimization (sometimes referred to as generalized parsimony or step matrix optimization). Sankoff optimization is based on a cost matrix C = {cij}, the elements of which define the cost cij of moving from a state i to a state j along any branch in the tree. The cost matrix is used to find the minimum cost of a tree and the set of optimal states at the interior nodes of the tree. Felsenstein gives an example of a cost matrix and the Sankoff downpass algorithm in his Figure 2.2 and the accompanying discussion (pp. 13-16). The uppass algorithm is presented in Chapter 6 (pp. 68-69 and Figs. 6.1-6.2).
A slightly different presentation of the Sankoff algorithm is given here. Assume that we are interested in a character with k states. For the downpass, assign to each node p a set Gp = {g_1^(p), g_2^(p), ..., g_k^(p)} containing the (minimum) downpass cost g_i^(p) of assigning state i to node p. We will use another similar set Hp, which will give the (minimum) cost of assigning each state i to the ancestral end of the branch having p as its descendant. The elements of Hp will be derived from Gp and C, as you might have guessed. Before we start the algorithm, we go through all terminal nodes and set g_i^(p) = 0 if the state i has been observed at tip p and g_i^(p) = ∞ otherwise. As usual, the downpass algorithm is formulated for a node p and its two descendant nodes q and r (Algorithm ??). At the root node ρ, we easily obtain the tree length l as l = min_i(g_i^(ρ)), the minimum cost for any state assignment. A worked example appears in Figure ??.
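The downpass algorithm itself is not reproduced in these notes, so the Python sketch below assumes the standard Sankoff recurrence: for each descendant, h_i = min_j (c_ij + g_j), and the cost at p is g_i^(p) = h_i^(q) + h_i^(r). The tree encoding (tips as integer state indices, interior nodes as 2-tuples) and both cost matrices are hypothetical choices made for illustration.

```python
import math


def sankoff_downpass(node, cost):
    """Return Gp, the list of minimum downpass costs for each state at `node`."""
    k = len(cost)
    if isinstance(node, int):                # tip: index of the observed state
        return [0 if i == node else math.inf for i in range(k)]

    g_q, g_r = (sankoff_downpass(child, cost) for child in node)
    # H sets: cost of each state i at the ancestral end of each descendant branch
    h_q = [min(cost[i][j] + g_q[j] for j in range(k)) for i in range(k)]
    h_r = [min(cost[i][j] + g_r[j] for j in range(k)) for i in range(k)]
    return [h_q[i] + h_r[i] for i in range(k)]


# States indexed A=0, C=1, G=2, T=3.
unit = [[0 if i == j else 1 for j in range(4)] for i in range(4)]   # Fitch-like costs
weighted = [                                  # assumed: transitions 1, transversions 2
    [0, 2, 1, 2],
    [2, 0, 2, 1],
    [1, 2, 0, 2],
    [2, 1, 2, 0],
]

tree = ((0, 1), (0, 2))                       # hypothetical tree ((A, C), (A, G))
for name, matrix in (("unit", unit), ("weighted", weighted)):
    g_root = sankoff_downpass(tree, matrix)
    print(name, g_root, "length =", min(g_root))
```

With unit costs the tree length equals the Fitch step count for the same tree, which is a quick sanity check that Fitch parsimony is the special case with all off-diagonal costs equal to one.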
The uppass is relatively straightforward. First, we find the state or states of the root node ρ associated with the minimum cost in the Gρ set. Let Fρ be the set of indices of these states. Now we can find the final state set of a node p from its downpass cost set Gp, the cost matrix C and the final state set Fa of its ancestor a easily (Algorithm ??).
In some cases, we may be interested not only in the optimal states at each interior node but also the cost of the suboptimal state assignments. To determine these, we need a slightly more complicated
internal nodes to result in a large number of equally parsimonious reconstructions of ancestral states for the entire tree. Felsenstein describes several methods for systematically resolving this ambiguity. Perhaps the most popular among these are the accelerated transformation (ACCTRAN) and delayed transformation (DELTRAN) methods. Both methods select one state for each interior node among the most parsimonious assignments, resulting in a unique reconstruction of ancestral states. The selection is done using rules that differ between ACCTRAN and DELTRAN: in the former we try to press state changes towards the root of the tree, whereas in the latter we press the state changes towards the tips. In many cases, it is more appropriate to enumerate all possibilities or to randomly select among them than to systematically select a particular type of reconstruction. Felsenstein (Chapter 6) briefly describes how this can be accomplished and we refer to this chapter and the references cited there for more details.
If we want to calculate the overall length (cost) of a tree with m taxa, n characters, and k states, it is relatively easy to see that the Fitch and Wagner algorithms are of complexity O(mnk) and the Sankoff algorithm is of complexity O(mnk^2). If there is a small number of states, we can save a considerable amount of time in the Fitch and Wagner algorithms by packing several characters into the smallest unit handled by the processor, typically a 32-bit or 64-bit unit, and using bitwise operators to handle several characters simultaneously. There are also a number of shortcuts that can help us to quickly calculate the length of a tree if we know the length and, in particular, the final state sets for some other tree which is similar to the tree we are interested in. Felsenstein describes several of these techniques briefly but we find that the discussion is difficult to follow and it appears to be partly incorrect, so we recommend that you go to the original literature if you are interested in them. We will briefly touch on some of these computational shortcuts when we describe searches for most parsimonious trees later during the course.
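The packing idea can be sketched for the Fitch downpass as follows. The layout is an assumption made for this example: each DNA character occupies four bits (A=1, C=2, G=4, T=8), so a 64-bit unit would hold 16 characters. Python integers are unbounded, so the sketch only imitates a machine word, and `pack` and `fitch_node_packed` are hypothetical helper names.

```python
CODE = {"A": 1, "C": 2, "G": 4, "T": 8}      # one bit per state, four bits per character


def pack(states):
    """Pack a string of unambiguous DNA states into one integer, 4 bits each."""
    word = 0
    for i, base in enumerate(states):
        word |= CODE[base] << (4 * i)
    return word


def fitch_node_packed(q, r, n_chars):
    """Fitch downpass step for n_chars packed characters at once.

    Returns (packed state sets of the parent, number of added steps).
    """
    low = int("1" * n_chars, 16)             # lowest bit of every 4-bit field
    inter = q & r                            # all intersections in one operation
    # Fold each field onto its lowest bit: set if the intersection is non-empty.
    nonempty = (inter | inter >> 1 | inter >> 2 | inter >> 3) & low
    empty = nonempty ^ low                   # lowest bit set where intersection is empty
    fill = empty * 0xF                       # expand each marker to a full field
    parent = inter | ((q | r) & fill)        # use the union only where needed
    steps = bin(empty).count("1")            # one step per empty intersection
    return parent, steps


q, r = pack("ACGT"), pack("ACTT")
parent, steps = fitch_node_packed(q, r, 4)
print(hex(parent), steps)
```

The per-character branch of the Fitch downpass is replaced by a handful of word-wide operations, which is where the time saving comes from when the number of states is small.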
Figure 6: A worked example of the Sankoff uppass algorithm. At the root node, the downpass has given us the final cost of all states. We can now work our way up the tree, one branch at a time, by subtracting the downpass costs at the bottom end of the branch from the final costs of the ancestor (a). This gives us the cost of each state at the bottom end of the branch given the subtree below the branch (b). The downpass costs of the descendant node similarly give us the cost of each state in the tree above the branch. We then take each of the states of the descendant node and find the minimum combination of transformation cost plus downpass cost at the descendant node plus remainder cost at the ancestor node (c). This results in the final costs for the descendant node (d).