
Phylogenetics Series

Bayesian inference of character evolution

Fredrik Ronquist

Computational Science and Information Technology, Florida State University, Tallahassee, FL 32306–4120, USA

Much recent progress in evolutionary biology is based on the inference of ancestral states and past transformations in important traits on phylogenetic trees. These exercises often assume that the tree is known without error and that ancestral states and character change can be mapped onto it exactly. In reality, there is often considerable uncertainty about both the tree and the character mapping. Recently introduced Bayesian statistical methods enable the study of character evolution while simultaneously accounting for both phylogenetic and mapping uncertainty, adding much needed credibility to the reconstruction of evolutionary history.

Evolution is a difficult phenomenon to study. It is rarely fast enough to be observed directly and only in exceptional cases is it possible to find physical evidence, such as fossils or ancient DNA, of past states and events. Fortunately, evolution leaves its footprint in the distribution of traits among living things. By studying this footprint, we can infer how organisms originated through the successive splitting of ancestral lineages, a process depicted in phylogenetic trees. Given a phylogenetic tree, we can also reconstruct the evolutionary history of individual traits of interest.

The wide range of questions that can be addressed by the INFERENCE (see Glossary) of ancestral states or paths of change in key traits on phylogenetic trees is fascinating. A few examples include the design of vaccines [1], the reconstruction of ancestral hormone receptors [2] and ancestral metabolic pathways [3], the inference of ancient behaviours [4], the identification of past dispersal patterns [5–7], the study of positive selection in proteins [8], the discovery of viral infection pathways [9], and the recognition of character correlation in coevolving lineages [10].

Many of these applications still rely on explicit or implicit PARSIMONY mapping of characters onto a single phylogenetic tree. The parsimony method finds the reconstruction that implies the smallest number of changes on the given tree; the solution is often intuitively obvious (Figure Ia in Box 1). The inferred ancestral states and character changes using parsimony typically reveal the process of evolution in exhilarating detail.
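To make the parsimony principle concrete, the following is a minimal sketch of the downward pass of Fitch's algorithm, one standard way of counting the minimum number of changes for an unordered character on a rooted binary tree. The function name `fitch`, the nested-tuple tree encoding, the five-tip tree and the tip states are all hypothetical illustrations; they are not the tree of Box 1.

```python
def fitch(tree, tip_states):
    """Downward pass of Fitch parsimony on a rooted binary tree.

    tree: nested 2-tuples of tip names, e.g. (("a", "b"), "c")
    tip_states: dict mapping each tip name to its observed state
    Returns (set of most parsimonious root states, minimum number of changes).
    """
    changes = 0

    def down(node):
        nonlocal changes
        if isinstance(node, str):          # a tip: its state is observed
            return {tip_states[node]}
        left, right = (down(child) for child in node)
        common = left & right
        if common:                         # children agree: no change needed
            return common
        changes += 1                       # children disagree: one change
        return left | right

    root_states = down(tree)
    return root_states, changes


# Hypothetical example: 0 = purple, 1 = green.
example_tree = ((("a", "b"), "c"), ("d", "e"))
example_states = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}
root_states, n_changes = fitch(example_tree, example_states)
print(root_states, n_changes)
```

The sketch returns only the optimal root state set and the change count. It says nothing about how much worse the alternative reconstructions are, which is the limitation that motivates probabilistic methods.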

It has long been recognized that this approach ignores two important sources of error. First, the parsimony principle singles out the solution(s) requiring the minimum amount of change on the given tree, although there is usually a range of alternative reconstructions on the same tree that are almost as likely [11] (MAPPING UNCERTAINTY; Box 1). Second, the tree is almost never known without error [12] (PHYLOGENETIC UNCERTAINTY; Box 2). If there is a range of plausible trees, it is possible that the evolutionary history of a trait could differ depending on the tree. Clearly, ignoring either of these sources of error is potentially misleading.
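Both sources of error can be illustrated with a few lines of arithmetic. The sketch below first turns the two likelihoods quoted in the Glossary for ancestor B into conditional probabilities (mapping uncertainty), and then averages a state probability over several candidate trees weighted by their probabilities (phylogenetic uncertainty). The function names, the three per-tree probabilities and the tree weights are hypothetical. The sketch also assumes that ancestor B can be identified on every tree, which need not hold in practice.

```python
def conditional_probabilities(state_likelihoods):
    """Normalize per-state likelihoods so that they sum to one."""
    total = sum(state_likelihoods.values())
    return {state: lik / total for state, lik in state_likelihoods.items()}


def average_over_trees(prob_per_tree, tree_weights):
    """Weight a per-tree probability by each tree's probability."""
    total_weight = sum(tree_weights)
    weighted = sum(p * w for p, w in zip(prob_per_tree, tree_weights))
    return weighted / total_weight


# Mapping uncertainty: likelihoods for ancestor B as quoted in the Glossary
# (state 0 = purple, state 1 = green).
likelihoods_b = {0: 0.00024, 1: 0.00013}
probs_b = conditional_probabilities(likelihoods_b)

# Phylogenetic uncertainty: HYPOTHETICAL probabilities that B is purple on
# three candidate trees, and HYPOTHETICAL probabilities of those trees.
prob_b_purple_per_tree = [0.65, 0.40, 0.90]
tree_weights = [0.6, 0.3, 0.1]
averaged = average_over_trees(prob_b_purple_per_tree, tree_weights)

print(probs_b, averaged)
```

With these made-up inputs the averaged probability differs from the value on any single tree, which is the sense in which conditioning on one tree can mislead.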

Glossary

Bayesian inference: theory of statistical inference based on the idea of rational accumulation of scientific knowledge. Statistical models and model parameters are regarded as random variables, and statistical analysis uses data (observations) to update a prior probability distribution on these parameters to a posterior probability distribution.

Bootstrapping (nonparametric): procedure for examining the uncertainty in a statistical estimate by drawing new samples (pseudosamples) from the original sample, and repeating the statistical procedure for each of these new samples. There is also a parametric variant that generates new samples by using a parametric model estimated from the original sample.

Conditional probability: the probability conditioned on (given) some information; we can think of it as a relative probability. In Box 1, the conditional (relative) probability of ancestor B being purple (state 0) is $\Pr(B=0) = 0.00024/0.00037 = 0.65$. Hence, the conditional probability of it being green is $\Pr(B=1) = 1 - \Pr(B=0) = 1 - 0.65 = 0.35$ (Box 1).

Inference: to draw conclusions about a statistical model using empirical data.

Likelihood: probability of some observed data given a particular model (with specific parameter values). For instance, the likelihood (probability) of the data in Box 1 is $L = 0.00037$ given the binary Markov model with $p = 0.5$ and summing over ancestral states. If ancestor B has state 0 (purple), the likelihood is $L(B=0) = 0.00024$; if it has state 1 (green), the likelihood is $L(B=1) = L - L(B=0) = 0.00037 - 0.00024 = 0.00013$ (Box 1).

Mapping uncertainty: the error associated with reconstructing the evolution of a character on a given phylogenetic tree.

Markov chain Monte Carlo (MCMC): stochastic simulation technique for generating a sample from a complex distribution that is known up to a normalizing constant. It is widely used to sample Bayesian posterior distributions, where it is based on specially designed Markov models (similar to, but more complex than, the ones used to model evolution; Box 3) and their tendency to move towards a stationary condition.

Maximum likelihood (ML): widely used method of statistical inference that finds the parameter values that maximize likelihood. For instance, when $p = 0.5$, the ML state of ancestor B (Figure Ib in Box 1) is 0 (purple) because $L(B=0) > L(B=1)$. More typically, ML is used to estimate the free parameters of a probability model. For instance, if we vary $p$, we discover that the likelihood of the observed data is maximized when $p \approx 0.20$. This is the ML estimate of $p$ (Figure I, Box 1).

Parsimony: inference principle based on minimizing cost; in evolutionary inference, usually the same as minimizing the number of character changes.

Phylogenetic uncertainty: the uncertainty in reconstructing character evolution owing to error in the phylogenetic estimate.

Posterior (probability distribution): probability distribution describing the knowledge about a model and its parameters after a Bayesian analysis. Can be used as the prior in a subsequent Bayesian analysis.

Prior (probability distribution): probability distribution specifying the knowledge about a model and its parameters before a Bayesian analysis.
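The Glossary entries for prior, posterior and MCMC can be tied together with a small simulation. The following is a minimal sketch of a Metropolis sampler (one simple form of MCMC) for a single probability parameter `p`, assuming a flat prior on (0, 1) and hypothetical data in which a change is observed on `k` of `n` independent branches. The data, the function names, the proposal width and the chain length are all assumptions chosen for illustration; this is not the sampler used for character histories in Box 4.

```python
import math
import random


def log_unnormalized_posterior(p, k, n):
    """Log of prior times likelihood, up to a constant (flat prior on (0, 1))."""
    if not 0.0 < p < 1.0:
        return -math.inf                   # zero prior mass outside (0, 1)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis(k, n, n_steps=20000, burn_in=2000, step=0.2, seed=1):
    """Sample p from its posterior using a symmetric uniform proposal."""
    rng = random.Random(seed)
    p = 0.5                                # arbitrary starting value
    current = log_unnormalized_posterior(p, k, n)
    samples = []
    for i in range(n_steps + burn_in):
        proposal = p + rng.uniform(-step, step)
        candidate = log_unnormalized_posterior(proposal, k, n)
        # Accept with probability min(1, posterior ratio); the normalizing
        # constant cancels in the ratio, so it is never needed.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        if i >= burn_in:                   # discard the early, non-stationary part
            samples.append(p)
    return samples


# HYPOTHETICAL data: a change on 2 of 10 branches.
samples = metropolis(k=2, n=10)
posterior_mean = sum(samples) / len(samples)
print(round(posterior_mean, 2))
```

For this toy problem the posterior is known analytically to be a Beta(k + 1, n − k + 1) distribution with mean (k + 1)/(n + 2), so the sampled mean can be checked against 0.25. In realistic phylogenetic problems no such closed form exists, which is why sampling is used.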

Corresponding author: Fredrik Ronquist ([email protected]).

Available online 21 July 2004

Review TRENDS in Ecology and Evolution Vol.19 No.9 September 2004


Box 1. Mapping uncertainty

Box 3. Markov models

Box 4. Generating a Bayesian sample of character change histories

[Figure: tree diagram with nodes labelled A–G]