Understanding Parsimony Methods in Evolutionary Biology: Fitch Algorithms and Beyond, Study notes of Biology

Parsimony methods in evolutionary biology as cost minimization procedures, focusing on fitch parsimony. It discusses the fitch downpass and uppass algorithms for rooted binary trees, and mentions other variants like camin-sokal and dollo parsimony. The text also touches upon multiple ancestral state reconstructions and the use of the wagner parsimony model for quantitative characters.

Typology: Study notes

Pre 2010

Uploaded on 08/30/2009

koofers-user-9nm
koofers-user-9nm 🇺🇸

5

(1)

10 documents

1 / 13

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Parsimony: Counting Changes
Fredrik Ronquist
August 31, 2005
1 Counting Evolutionary Changes
Reference: Felsenstein Chapters 1 and 2 (pp. 1-18), Chapter 6 (pp. 67-72), Chapter 7 (pp. 73-86)
Parsimony methods are perhaps the simplest methods used for inferring evolutionary trees, so we
will start with them. Parsimony means ”extreme or excessive economy or frugality”. In the context
of evolutionary inference, the idea is to minimize the amount of evolutionary change. Another way
of understanding parsimony methods is that they are cost minimization procedures, where the
cost is a measure of the amount of evolutionary change. In the typical case, we are dealing with
discrete characters, and the natural measure of evolutionary change is simply the number of changes
between states. However, there are many other possibilities as well, as we will discover.
Throughout this lecture, we will be assuming that the tree is fixed. We will then use parsimony
methods to count the number of changes and to infer the ancestral states in the tree. In principle,
the step is small from this to comparing and selecting among trees according to the amount of
evolutionary change they imply. However, several practical issues arise when we start searching for
trees so we will defer treatment of this topic to the next lecture.
1.1 Fitch Parsimony
To start with, let us assume that we are interested in the evolution of a DNA character, which has
four discrete states: A, C, G, and T. In the simplest possible model, we assume that all changes
between all states are allowed and count equally (Fig. 1). According to Felsenstein (2004), this
model was initially proposed by Kluge and Farris (1969) and named Wagner parsimony by them
1
pf3
pf4
pf5
pf8
pf9
pfa
pfd

Partial preview of the text

Download Understanding Parsimony Methods in Evolutionary Biology: Fitch Algorithms and Beyond and more Study notes Biology in PDF only on Docsity!

Parsimony: Counting Changes

Fredrik Ronquist

August 31, 2005

1 Counting Evolutionary Changes

Reference: Felsenstein Chapters 1 and 2 (pp. 1-18), Chapter 6 (pp. 67-72), Chapter 7 (pp. 73-86)

Parsimony methods are perhaps the simplest methods used for inferring evolutionary trees, so we will start with them. Parsimony means ”extreme or excessive economy or frugality”. In the context of evolutionary inference, the idea is to minimize the amount of evolutionary change. Another way of understanding parsimony methods is that they are cost minimization procedures, where the cost is a measure of the amount of evolutionary change. In the typical case, we are dealing with discrete characters, and the natural measure of evolutionary change is simply the number of changes between states. However, there are many other possibilities as well, as we will discover.

Throughout this lecture, we will be assuming that the tree is fixed. We will then use parsimony methods to count the number of changes and to infer the ancestral states in the tree. In principle, the step is small from this to comparing and selecting among trees according to the amount of evolutionary change they imply. However, several practical issues arise when we start searching for trees so we will defer treatment of this topic to the next lecture.

1.1 Fitch Parsimony

To start with, let us assume that we are interested in the evolution of a DNA character, which has four discrete states: A, C, G, and T. In the simplest possible model, we assume that all changes between all states are allowed and count equally (Fig. 1). According to Felsenstein (2004), this model was initially proposed by Kluge and Farris (1969) and named Wagner parsimony by them

but it is more widely known as Fitch Parsimony because the first algorithms implementing the model were published by Walter Fitch (1970, 1971). The naming issue is utterly confusing because there is another parsimony model that we will examine shortly that is widely known as Wagner parsimony although Felsenstein calls it “parsimony on an ordinal scale”.

Figure 1: Fitch parsimony model for DNA characters. All state changes are possible and count equally.

In simple cases, when there are few state changes in a small tree, it is straightforward to find the number of required changes and the most parsimonious states of the interior nodes in the tree (and hence the position of the changes) (Fig. 2a). When there are more changes, it is quite common for there to be ambiguity both concerning the interior node states and the position of the changes, even though we can still find the minimum number of changes relatively easily in many cases (Fig. 2b).

For more complicated trees and larger problems, it is necessary to use an algorithm. The basic idea is to pass through the tree from the tips to the root (a downpass or postorder traversal). At each interior node, we count the minimum number of steps required by the subtree rooted at that node, and we record the ancestral states giving us that number of steps. When we have reached the root node, we know the minimum number of steps for the entire tree. Since it is a recursive algorithm, we only need to specify it for one node p and its two children q and r. Assume that Si is the set of observed or inferred minimal states at node i. For instance, if p is a terminal and we have observed an A for it, we have Sp = {A}. Similarly, if there were ambiguity about the states for a terminal or interior node p, such that the state could be either A or G, then we have Sp = {A, G}. Now we can calculate the length (or cost) l of a tree using the Fitch downpass algorithm by finding sections and unions of state sets (Algorithm 1).

Algorithm 1 Fitch downpass algorithm Sp ← Sq ∩ Sr if Sp = ∅ then Sp ← Sq ∪ Sr l ← l + 1 end if

Assume that we have the final state set Fa of node a, which is the immediate ancestor of node p. We also have the downpass sets of p and its two children q and r; they are labeled Sp, Sq and Sr, respectively, as in the description of the downpass algorithm. Now Fitch’s uppass (also known as the final pass) algorithm is based on combining information from these state sets (Algorithm 2). The algorithm is started by noting that, for the root, the final state set is the same as the downpass state set.

Algorithm 2 Fitch uppass algorithm Fp ← Sp ∩ Fa if Fp 6 = Fa then if Sq ∩ Sr 6 = ∅ then Fp ← ((Sq ∪ Sr) ∩ Fa) ∪ Sp else Fp ← Sp ∪ Fa end if end if

Fitch’s uppass algorithm is much less intuitive than the downpass algorithm but we will nevertheless make an attempt to explain why it works. In the first step, you test whether the downpass state set of p includes all of the states in the final set of a. If it does, then it means that each optimal assignment of final state to a can be combined with the same state at p to give zero changes on the branch between a and p and the minimal number of changes in the subtree rooted at p. Thus, the final set of p must be the same as the set at a because any other state would add steps to the tree length (including any states that might be present in the downpass set of p but not in the final set of a).

If the final set of a includes states that are not present in the downpass set of p, then there must be optimal solutions that have one state at a and a different state at p. This means that there is a potential change on the branch between a and p and we need to see if that change can be pushed onto one of the descendant branches of p without increasing the total tree length. If so, we may have to add states from the final set of a to the downpass set of p. The second part of the algorithm

Figure 3: An example of state sets calculated during a Fitch downpass.

(after the first if statement) takes care of this potential addition of states. If the downpass set of p was formed by taking an intersection, then we need to add the states that are in the final set of a and in the downpass set of either one of the two descendants of p because each of these additional state assignments to p will push a change on the branch between p and a to one of the descendant branches of p without increasing total tree length. If the downpass set of p was formed by a union, then we simply add the states in the final set of a to those in the downpass set of p.

Felsenstein describes an alternative uppass algorithm briefly in Chapter 6 (p. 68) but that algorithm is erroneous as far as we can tell (please let us know if you think we are wrong). Felsenstein’s algorithm will work on the example given in his Fig. 2.1 (Fig. 3) but it will fail on other trees. For instance, you can test it on the following simple tree (terminals labeled with the observed states): ((({A},{A}),{T}),{T}). The final interior state sets are obviously {T}, {T} and {A} but Felsenstein’s algorithm will give you {T}, {T} and {AT}, regardless of whether you understand L as the final state set or the downpass state set for the ancestral node. Admittedly, it is easy to underestimate the complexity of the Fitch uppass!

would simply group the measurements into n ordered groups that would comprise the states of the Wagner parsimony character. In the latter case, we divide the range between the minimum and maximum value into n equal-length intervals to give us a Wagner parsimony character with n states. In the extreme case, we simply use the raw measurements up to some fixed precision.

Felsenstein describes the Wagner parsimony algorithms using raw measurements, but let us use a slightly modified set description instead. Assume that the state set of a node p is a set of continuous elements S = {x, x + 1, x + 2, ..., y} where min(S) = x and max(S) = y (we can also call this set an interval). Now define the operation Si u Sj as producing the set of continuous elements from {max(min(Si), min(Sj ))} to {min(max(Si), max(Sj ))}. If Si and Sj overlap, then this operation simply produces their intersection, but if they do not overlap, the result is a minimum spanning interval connecting the two sets. For instance, { 2 , 3 , 4 } u { 6 , 7 , 8 } = { 4 , 5 , 6 }. As usual, the Wagner downpass algorithm finds the downpass interval for a node p from the downpass intervals of its two daughters q and r, and it adds to a global tree length variable l when we need to connect two non-overlapping state sets (Algorithm 3).

Algorithm 3 Wagner downpass algorithm Sp ← Sq ∩ Sr if Sp = ∅ then Sp ← Sq u Sr l ← l + (|Sp| − 1) end if

The similarities to the Fitch downpass algorithm are obvious. The uppass algorithm is also similar to its Fitch parsimony analog, but it is a little less complex because of the additive nature of the state transition costs. Before describing it, let us define the operation Si t Sj as producing the set of continuous elements from {min(min(Si), min(Sj ))} to {max(max(Si), max(Sj ))}. If the two intervals overlap, the result is simply their union, but if they are disjoint then the operation will produce an interval including all the values from the smallest to the largest. For example, { 3 , 4 } t { 6 , 7 } = { 3 , 4 , 5 , 6 , 7 }. With additional notation as above, the Wagner uppass algorithm is now easy to formulate (Algorithm 4). The algorithm is slightly simpler than the Fitch uppass but can be justified (proven to be correct) with argumentation similar to that used above for the latter.

Algorithm 4 Wagner uppass algorithm Fp ← Sp ∩ Fa if Fp 6 = Fa then Fp ← ((Sq t Sr) ∩ Fa) ∪ Sp end if

1.3 Other variants of parsimony

Many other variants of parsimony that count changes slightly differently have been proposed. Some examples include Camin-Sokal parsimony, where changes are assumed to be unidirectional, and Dollo parsimony, where changes in one direction are assumed to occur only once in a tree. We refer to Felsenstein (Chapter 7) for a detailed description of these methods and others. In practice, these alternative parsimony approaches are rarely used.

1.4 Sankoff Optimization

All parsimony methods are special cases of a technique called Sankoff optimization (sometimes referred to as generalized parsimony or step matrix optimization). Sankoff optimization is based on a cost matrix C = {cij }, the elements of which define the cost cij of moving from a state i to a state j along any branch in the tree. The cost matrix is used to find the minimum cost of a tree and the set of optimal states at the interior nodes of the tree. Felsenstein gives an example of a cost matrix and the Sankoff downpass algorithm in his Figure 2.2 and the accompanying discussion (pp. 13-16). The uppass algorithm is presented in Chapter 6 (pp. 68-69 and Figs. 6.1 - 6.2).

A slightly different presentation of the Sankoff algorithm is given here. Assume that we are interested in a character with k states. For the downpass, assign to each node p a set Gp = {g( 1 p ), g 2 (p ), ..., g k(p )} containing the (minimum) downpass cost g( i p)of assigning state i to node p. We will use another similar set Hp, which will give the (minimum) cost of assigning each state i to the ancestral end of the branch having p as its descendant. The elements of Hp will be derived from Gp and C, as you might have guessed. Before we start the algorithm, we go through all terminal nodes and set g i( p)= 0 if the state i has been observed at tip p and g( i p)= ∞ otherwise. As usual, the downpass algorithm is formulated for a node p and its two descendant nodes q and r (Algorithm ??). At the root node ρ, we easily obtain the tree length l as l = mini(g( i ρ)), the minimum cost for any state assignment. A worked example appears in Figure ??.

The uppass is relatively straightforward. First, we find the state or states of the root node ρ associated with the minimum cost in the Gρ set. Let Fρ be the set of indices of these states. Now we can find the final state set of a node p from its downpass cost set Gp, the cost matrix C and the final state set Fa of its ancestor a easily (Algorithm ??).

In some cases, we may be interested not only in the optimal states at each interior node but also the cost of the suboptimal state assignments. To determine these, we need a slightly more complicated

internal nodes to result in a large number of equally parsimonious reconstructions of ancestral states for the entire tree. Felsenstein describes several methods for systematically resolving this ambiguity. Perhaps the most popular among these are the accelerated transformation (ACCTRAN) and delayed transformation (DELTRAN) methods. Both methods will select one state for each interior node among the most parsimonious assignments, resulting in a unique reconstruction of ancestral states. The selection is done using rules that differ between ACCTRAN and DELTRAN; in the former we are trying to press state changes towards the root of the tree, in the other we press the state changes towards the tips. In many cases, it is more appropriate to enumerate all possibilities or to randomly select among them than to systematically select a particular type of reconstruction. Felsenstein (Chapter 6) briefly describes how this can be accomplished and we refer to this chapter and the references cited there for more details.

1.6 Time Complexity

If we want to calculate the overall length (cost) of a tree with m taxa, n characters, and k states, it is relatively easy to see that the Fitch and Wagner algorithms are of complexity O(mnk) and the Sankoff algorithm is of complexity O(mnk^2 ). If there is a small number of states, we can save a considerable amount of time in the Fitch and Wagner algorithms by packing several characters into the smallest unit handled by the processor, typically a 32-bit or 64-bit unit, and use binary operators to handle several characters simultaneously. There is also a number of shortcuts that can help us to quickly calculate the length of a tree if we know the length and, in particular, the final state sets for some other tree which is similar to the tree we are interested in. Felsenstein describes several of these techniques briefly but we find that the discussion is difficult to follow and it appears to be partly incorrect, so we recommend that you go to the original literature if you are interested in them. We will briefly touch on some of these computational shortcuts when we describe searches for most parsimonious trees later during the course.

1.7 Study Questions

  1. What is the difference between Fitch and Wagner parsimony?
  2. When Wagner parsimony is used for continuous characters on an interval scale (or raw mea- surements), what exactly is the amount of change that we are minimizing?
  3. Find the Fitch uppass and downpass algorithms for an unrooted binary tree. Tip: formulate a variant of the algorithm for dealing with a basal trichotomy and use it when deriving the states for an interior node selected to be the ‘root’ of the unrooted tree.
  1. Find the Fitch uppass and downpass algorithms for a polytomous tree. Tip: Use the same variants of the algorithms from the previous problem.
  2. Express Fitch parsimony for a four-state character using a Sankoff cost matrix
  3. Express Wagner parsimony for a four-state character using a Sankoff cost matrix
  4. Is there a natural measure of branch length under Fitch parsimony? How would you calculate that branch length? Would the sum of all the branch lengths be equal to the tree length?
  5. Can a Sankoff cost matrix have non-zero diagonal entries? Can it be non-metric (violate the triangle-inequality)?

Figure 6: A worked example of the Sankoff uppass algorithm. At the root node, the downpass has given us the final cost of all states. We can now work our way up the tree, one branch at a time, by subtracting the downpass costs at the bottom end of the branch from the final costs of the ancestor (a). This gives us the cost of each state at the bottom end of the branch given the subtree below the branch (b). The downpass costs of the descendant node similarly gives us the cost of each state in the tree above the branch. We then take each of the states of the descendant node and find the minimum combination of transformation cost plus downpass cost at descendant node plus remainder cost at the ancestor node (c). This results in the final costs for the descendant node (d).