Phylogenetics - Distance-Based Methods
Phylogenetics
- Attempts to infer the evolutionary history of a
group of organisms or sequences of nucleic acids
or proteins
- Phylogenetic methods can be used for the study of evolutionary relationships between species of organisms as well as genes
- Attempt to reconstruct evolutionary ancestors
- Estimate time of divergence from ancestor
Phylogenetic Trees
Phylogenetic Tree for Close Human Relatives
[Figure: tree with leaves Humans, Orangutans, Chimpanzees, and Gorillas; internal nodes labeled "Common Ancestor of Gorillas and Chimps", "Common Ancestor of Gorillas, Chimps, and Orangs", and "Common Ancestor of Humans and Apes"]
History
- Taxonomists used anatomy and physiology to
group and classify organisms
- Morphological features like presence of feathers or number of legs
- When protein sequencing, and later DNA
sequencing, became common, amino acid and
DNA sequences became the standard way to
construct trees
The Big Picture
- Determine the species or genes to be studied
- Acquire homologous sequence data
- Use multiple sequence alignment software like ClustalW to align
- Clean up data by hand
- Use phylogenetic analysis software like Phylip based on techniques we will study
- Verify experimentally
Phylogenetics
- Can be used to solve a number of interesting
problems
- Forensics
- HIV virus mutates rapidly
- Predicting evolution of influenza viruses
- Predicting functions of uncharacterized genes - ortholog detection
- Drug discovery
- Vaccine development
- Target inferred common ancestor
Phylogenetic Trees
- Trees are composed of nodes and branches
- Terminal or leaf nodes correspond to a gene or organism for which data has been collected
- Internal nodes usually represent an inferred common ancestor that gave rise to two independent lineages sometime in the past
Rooted and Unrooted Trees
- Some trees make an inference about a
common ancestor and the direction of
evolution and some don’t
- First type is called a rooted tree and has a single node designated as root which is the common ancestor
- Second type is called an unrooted tree
- Specifies only relationship between nodes and says nothing about direction of evolution
Rooted and Unrooted Trees
- Roots can usually be assigned to unrooted
trees using an outgroup
- A species that unambiguously diverged earlier than all the others being studied
- E.g. baboons in case of humans and gorillas
- For three species there are 3 possible rooted trees, but only one possible unrooted tree
Rooted and Unrooted Trees
- In fact, the numbers of rooted ($N_R$) and unrooted ($N_U$) trees for $n$ species are
- $N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$
- $N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$
Data Sets (n)   Rooted Trees                     Unrooted Trees
2               1                                1
3               3                                1
4               15                               3
5               105                              15
10              34,459,425                       2,027,…
15              213,458,046,767,875              7,905,853,580,…
20              8,200,794,532,637,891,559,375    221,643,095,476,699,771,…
Rooting a Tree
More Tree Terminology
- Structure of a phylogenetic tree can be represented in Newick format using nested parentheses
- (((B, C), (D, E)), A)
- If we lack the data to tell in which order two or more independent lineages arose, the tree may be multifurcating (an interior node with more than two descendants); otherwise, it is bifurcating (exactly two descendants per interior node)
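To make the nested-parentheses idea concrete, here is a minimal sketch of a parser for topology-only Newick strings such as (((B, C), (D, E)), A). It is an illustration under simplifying assumptions, not a full Newick reader: branch lengths, internal node labels, and quoted names are not handled, and the names `parse_newick` and `is_bifurcating` are chosen here for illustration.

```python
def parse_newick(text: str):
    """Parse a topology-only Newick string into nested tuples of leaf names."""
    # Drop whitespace and normalize to exactly one trailing ';' as a sentinel.
    text = "".join(text.split()).rstrip(";") + ";"
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return tuple(children)
        # Otherwise read a leaf name up to the next structural character.
        start = pos
        while text[pos] not in "(),;":
            pos += 1
        if start == pos:
            raise ValueError(f"expected a leaf name at position {start}")
        return text[start:pos]

    tree = parse_node()
    if pos != len(text) - 1:
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def is_bifurcating(node) -> bool:
    """True if every interior node has exactly two descendants."""
    if isinstance(node, str):  # a leaf
        return True
    return len(node) == 2 and all(is_bifurcating(child) for child in node)


tree = parse_newick("(((B, C), (D, E)), A)")
print(tree)                  # nested tuples mirroring the parentheses
print(is_bifurcating(tree))  # the example tree is bifurcating
```

A multifurcating tree such as (A, B, C) parses to a three-element tuple, for which `is_bifurcating` returns False.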
Character and Distance Data
- Distance-based methods must transform the
sequence data into a pairwise distance matrix
for use during tree inference
Species   A   B   C   D
B         2   -   -   -
C         4   5   -   -
D         7   9   5   -
E         3   5   7   8
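As an illustration of how sequence data can be turned into such a matrix, the sketch below counts mismatched positions between each pair of aligned sequences. This is only one simple distance measure, assumed here for illustration; the slides do not say which measure produced the values above. The sequences are invented, and gap characters would be compared like any other character.

```python
from itertools import combinations


def mismatch_count(seq_a: str, seq_b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def distance_matrix(aligned: dict) -> dict:
    """Symmetric pairwise matrix as a dict of dicts, with zeros on the diagonal."""
    names = list(aligned)
    matrix = {name: {name: 0} for name in names}
    for x, y in combinations(names, 2):
        d = mismatch_count(aligned[x], aligned[y])
        matrix[x][y] = d
        matrix[y][x] = d
    return matrix


# Hypothetical aligned sequences, invented for illustration only.
aligned = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTA",
}
matrix = distance_matrix(aligned)

# Print the lower triangle, in the same layout as the table above.
names = list(aligned)
for i, row in enumerate(names[1:], start=1):
    print(row, *(matrix[row][col] for col in names[:i]))
```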
Distance-Based Methods
- Given such an input matrix, we want to find an
edge-weighted tree whose leaves correspond to
the species and in which the distance measured
along the tree between two leaves matches the
corresponding matrix value for those leaves
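The goal can be checked mechanically: sum the edge weights along the path between every pair of leaves and compare the result with the matrix. The sketch below does this for a small hypothetical tree and a target matrix invented so that the tree fits it exactly; real data will generally not fit this cleanly. The names `leaf_distances`, `edges`, and `target` are assumptions made for this example.

```python
def leaf_distances(edges, leaves):
    """Path length between every pair of leaves in an edge-weighted tree."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    def distances_from(source):
        # A tree has exactly one path between two nodes, so a simple
        # depth-first traversal yields the path length to every node.
        dist = {source: 0}
        stack = [source]
        while stack:
            node = stack.pop()
            for neighbor, weight in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + weight
                    stack.append(neighbor)
        return dist

    result = {}
    for leaf in leaves:
        dist = distances_from(leaf)
        result[leaf] = {other: dist[other] for other in leaves}
    return result


# Hypothetical unrooted tree: leaves A-D, internal nodes X and Y.
edges = [("A", "X", 1), ("B", "X", 1), ("X", "Y", 2), ("C", "Y", 2), ("D", "Y", 3)]
leaves = ["A", "B", "C", "D"]

# Hypothetical target matrix, invented to match the tree above.
target = {("A", "B"): 2, ("A", "C"): 5, ("A", "D"): 6,
          ("B", "C"): 5, ("B", "D"): 6, ("C", "D"): 5}

tree_dist = leaf_distances(edges, leaves)
fits = all(tree_dist[x][y] == d for (x, y), d in target.items())
print(fits)
```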