Phylogenetics: Distance-Based Methods for Tree Building in Molecular Biology - Prof. Drena, Lab Reports of Bioinformatics

An overview of distance-based methods for constructing phylogenetic trees in molecular biology. Covers the basics of phylogenetics, the procedure for building a tree, the choice of molecular markers, multiple sequence alignment, and automatic editing of alignments. Distance matrices, UPGMA, and neighbor-joining methods are discussed in detail.

Typology: Lab Reports

Pre 2010

Uploaded on 09/02/2009


#30 - Phylogenetics Distance-Based Methods 11/02/07
BCB 444/544 Fall 07 Dobbs
BCB 444/544 F07 ISU Terribilini #30 - Phylogenetics - Distance-Based Methods 11/02/07

BCB 444/544
Lecture 30
Phylogenetics – Distance-Based Methods
#30_Nov02

Required Reading (before lecture)

Wed Oct 31 - Lecture 29
Phylogenetics Basics
  • Chp 10 - pp 127 - 141
Thurs Nov 1 - Lab 9
Gene & Regulatory Element Prediction
Fri Nov 2 - Lecture 30
Phylogenetics – Distance-Based Methods
  • Chp 11 - pp 142 - 169
Mon Nov 5 - Lecture 31
Phylogenetics – Parsimony and ML
  • Chp 11 - pp 142 - 169

Assignments & Announcements

Mon Oct 29 - HW#5
HW#5 = Hands-on exercises with phylogenetics and tree-building software
Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

BCB 544 "Team" Projects

Last week of classes will be devoted to Projects

  • Written reports due:
    • Mon Dec 3 (no class that day)
  • Oral presentations (20-30 min) will be:
    • Wed-Fri Dec 5, 6, 7
    • 1 or 2 teams will present during each class period
  • See Guidelines for Projects posted online

BCB 544 Only: New Homework Assignment

544 Extra#2
Due: PART 1 - ASAP
     PART 2 - meeting prior to 5 PM Fri Nov 2

Part 1 - Brief outline of Project, email to Drena & Michael; after response/approval, then:
Part 2 - More detailed outline of project
  • Read a few papers and summarize status of problem
  • Schedule meeting with Drena & Michael to discuss ideas

Seminars this Week

BCB List of URLs for Seminars related to Bioinformatics: http://www.bcb.iastate.edu/seminars/index.html

  • Nov 2 Fri - BCB Faculty Seminar 2:10 in 102 ScI
    • Bob Jernigan BBMB, ISU
      • Control of Protein Motions by Structure

Chp 10 - Phylogenetics

SECTION IV MOLECULAR PHYLOGENETICS

Xiong: Chp 10 Phylogenetics Basics

  • Evolution and Phylogenetics
  • Terminology
  • Gene Phylogeny vs. Species Phylogeny
  • Forms of Tree Representation
  • Why Finding a True Tree is Difficult
  • Procedure of Building a Phylogenetic Tree

Tree Building Procedure

  • Choose molecular markers
  • Perform MSA
  • Choose a model of evolution
  • Determine tree building method
  • Assess tree reliability

Choice of Molecular Markers

  • Very closely related organisms - use nucleic acid sequences, which show more differences than protein sequences
  • Individuals within a species - use noncoding regions of mtDNA, which have a faster mutation rate
  • More distantly related species - use slowly evolving nucleic acid sequences such as ribosomal RNA, or protein sequences
  • Very distantly related species - use highly conserved protein sequences

Multiple Sequence Alignment

  • Most critical step in tree building - cannot build correct tree without correct alignment
  • Should build alignments with multiple programs, then inspect and compare to identify the most reasonable one
  • Most alignments need manual editing
    • Make sure important functional residues align
    • Align secondary structure elements
    • Use full alignment or just parts

Automatic Editing of Alignments

  • Rascal and NorMD – correct alignment errors, remove potentially unrelated or highly divergent sequences
  • Gblocks – detect and eliminate poorly aligned positions and divergent regions

How do we measure divergence between sequences?

  • Simple measure – just count the number of substitutions observed between the sequences in the MSA
  • Problem – number of substitutions may not represent the number of evolutionary events that actually occurred
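
The two bullets above can be illustrated with a short sketch. It computes the observed proportion of differing sites (often called the p-distance) for two aligned sequences, then applies the Jukes-Cantor correction as one commonly used way to estimate the number of substitutions that actually occurred. The choice of Jukes-Cantor, the function names, and the toy sequences are assumptions made for illustration; they are not taken from the lecture.

```python
import math

def p_distance(seq1, seq2):
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared

def jukes_cantor(p):
    """Jukes-Cantor corrected distance for nucleotides (assumed model)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical aligned sequences, for illustration only
p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
corrected = jukes_cantor(p)
print(f"observed p = {p:.3f}, corrected d = {corrected:.3f}")
```

The corrected distance is always at least as large as the observed one, which reflects the point of the second bullet: multiple changes at the same site hide some of the evolutionary events.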

What about differences in mutation rates between positions within a sequence?

  • One of our assumptions was that all positions in a sequence are evolving at the same rate
  • Bad assumption
    • Third position in a codon changes with higher frequency
    • In proteins, some amino acids can change and others cannot
  • This variation is called among-site rate heterogeneity
  • Many tree building programs have parameters meant to deal with this problem – adds to complexity of getting the correct tree

Chp 11 – Phylogenetic Tree Construction Methods and Programs

SECTION IV MOLECULAR PHYLOGENETICS

Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs

  • Distance-Based Methods
  • Character-Based Methods
  • Phylogenetic Tree Evaluation
  • Phylogenetic Programs

Tree Construction

  • Two main categories of tree building methods
    • Distance-based
      • Overall similarity between sequences
    • Character-based
      • Consider the entire MSA

Distance-Based Methods

  • Given an MSA and an evolutionary model, calculate the distance between all pairs of sequences
  • Construct distance matrix
  • Construct phylogenetic tree based on the distance matrix
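
As a minimal sketch of the first two steps, the code below builds a symmetric distance matrix from a small alignment. The alignment, the sequence names, and the use of the uncorrected proportion of differences as the default "model" are hypothetical; any function that maps two aligned sequences to a distance could be passed in instead.

```python
from itertools import combinations

def p_distance(s1, s2):
    """Observed proportion of differing sites, ignoring gapped columns."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(msa, model=p_distance):
    """Return {name: {name: distance}} for every pair of aligned sequences."""
    names = list(msa)
    dist = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        # the matrix is symmetric, so each pair is computed once
        dist[a][b] = dist[b][a] = model(msa[a], msa[b])
    return dist

# Hypothetical alignment, for illustration only
msa = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTTCGTAA",
    "s3": "ACGATCGTAA",
    "s4": "TCGATCG-AA",
}
dist = distance_matrix(msa)
for name in msa:
    print(name, [round(dist[name][other], 3) for other in msa])
```

The resulting matrix is the only input the tree-building step needs; the alignment itself is no longer consulted.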

Distance Matrices

        a     b     c     d
  a     0
  b     6     0
  c     7     3     0
  d    14    10     9     0

[Figure: tree for taxa a, b, c, d drawn against a distance scale from 0 to 8]

Distance-Based Methods

  • Two ways to construct a tree based on a distance matrix
    • Clustering
    • Optimality

Clustering-Based Methods

  • E.g., UPGMA and Neighbor-Joining
  • A cluster is a set of taxa
  • Interspecies distances translate into intercluster distances
  • Clusters are repeatedly merged
  • “Closest” clusters merged first
  • Distances are recomputed after merging

UPGMA

  • UPGMA – Unweighted Pair Group Method Using Arithmetic Average
  • Uses molecular clock assumption – all taxa evolve at a constant rate and are equally distant from the root (ultrametric tree)
  • This assumption is usually wrong
  • So why use UPGMA?
    • Very fast
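
The clustering procedure can be sketched in a few lines. The example below is a minimal UPGMA implementation, assuming the four-taxon distance matrix for a, b, c, d from the Distance Matrices slide as input. The function name, the Newick-style labels, and the tie-breaking rule (first pair in sorted order) are illustrative choices, not part of the lecture.

```python
from itertools import combinations

def upgma(names, dist):
    """Minimal UPGMA sketch. dist maps (x, y) tuples to distances.

    Returns a Newick-style label for the root and the list of merges
    as (cluster_x, cluster_y, node_height).
    """
    size = {n: 1 for n in names}                     # leaves per cluster
    d = {frozenset(k): v for k, v in dist.items()}   # order-free lookup
    merges = []
    while len(size) > 1:
        # closest pair of current clusters is merged first
        x, y = min(combinations(sorted(size), 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((x, y))] / 2.0          # molecular clock: split evenly
        new = f"({x},{y})"
        for z in size:
            if z not in (x, y):
                # size-weighted mean equals the arithmetic mean over all leaf pairs
                d[frozenset((new, z))] = (
                    size[x] * d[frozenset((x, z))] + size[y] * d[frozenset((y, z))]
                ) / (size[x] + size[y])
        size[new] = size.pop(x) + size.pop(y)
        merges.append((x, y, height))
    return new, merges

dist = {
    ("a", "b"): 6, ("a", "c"): 7, ("b", "c"): 3,
    ("a", "d"): 14, ("b", "d"): 10, ("c", "d"): 9,
}
tree, merges = upgma(["a", "b", "c", "d"], dist)
print(tree)
for x, y, height in merges:
    print(f"merge {x} + {y} at height {height}")
```

Because every node height is half the distance between the clusters it joins, all leaves end up equally far from the root, which is the ultrametric property the molecular clock assumption implies.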

UPGMA Example

[Figures: worked UPGMA example]

Neighbor Joining Example

         A        B        C
  B     0.40
  C     0.35     0.45
  D     0.60     0.70     0.55

  • Initialize tree into a star shape with all taxa connected to the center
  • Step 1: Compute r-values and transformed r-values for all taxa

$$r_A = d_{AB} + d_{AC} + d_{AD} = 0.4 + 0.35 + 0.6 = 1.35$$

$$r'_A = \frac{r_A}{4 - 2} = \frac{1.35}{2} = 0.675$$
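
Step 1 can be checked with a small sketch. The distances for pairs involving A (0.40, 0.35, 0.60) and the B-D distance (0.70) appear in the example; the values 0.45 for B-C and 0.55 for C-D are assumptions chosen so that the r-values and converted distances quoted in the example are reproduced. The function name is hypothetical.

```python
names = ["A", "B", "C", "D"]
d = {
    frozenset("AB"): 0.40, frozenset("AC"): 0.35, frozenset("AD"): 0.60,
    frozenset("BC"): 0.45,  # assumed: gives r_B = 1.55 as used in the example
    frozenset("BD"): 0.70,
    frozenset("CD"): 0.55,  # assumed: gives a converted A-C distance of -1
}

def r_values(names, d):
    """r_i = sum of distances from i to all other taxa; r'_i = r_i / (n - 2)."""
    n = len(names)
    r = {i: sum(d[frozenset((i, j))] for j in names if j != i) for i in names}
    r_prime = {i: r[i] / (n - 2) for i in names}
    return r, r_prime

r, r_prime = r_values(names, d)
for taxon in names:
    print(taxon, round(r[taxon], 3), round(r_prime[taxon], 3))
```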

  • Step 2: Compute converted distances

$$d'_{AB} = d_{AB} - \frac{1}{2}(r_A + r_B) = 0.4 - \frac{1}{2}(1.35 + 1.55) = -1.05$$

  • Step 3: Fill out converted distance matrix

         A        B        C
  B    -1.05
  C    -1       -1
  D    -1       -1       -1.05
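
Steps 2 and 3 amount to applying one formula to every pair. The sketch below does this for the four taxa of the example. It divides by (n - 2), which equals the factor 1/2 in the example because n = 4; writing it that way is an assumption taken from the usual statement of the neighbor-joining criterion. The B-C and C-D distances (0.45, 0.55) are likewise assumed values consistent with the numbers in the example.

```python
from itertools import combinations

names = ["A", "B", "C", "D"]
d = {
    frozenset("AB"): 0.40, frozenset("AC"): 0.35, frozenset("AD"): 0.60,
    frozenset("BC"): 0.45, frozenset("BD"): 0.70, frozenset("CD"): 0.55,
}

def converted_distances(names, d):
    """d'_ij = d_ij - (r_i + r_j) / (n - 2) for every pair of taxa."""
    n = len(names)
    r = {i: sum(d[frozenset((i, j))] for j in names if j != i) for i in names}
    return {
        (i, j): d[frozenset((i, j))] - (r[i] + r[j]) / (n - 2)
        for i, j in combinations(names, 2)
    }

conv = converted_distances(names, d)
for (i, j), value in conv.items():
    print(i, j, round(value, 3))
```

The A-B and C-D pairs share the smallest converted distance, so either pair is a valid first merge.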

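  • Step 4: Create a node by merging the closest taxa (the pair with the smallest converted distance)
  • In this example, the converted distance between A and B is the same as the converted distance between C and D
  • We can pick either pair to start with
  • Let’s pick A and B and create a node called U

[Figure: star tree with A and B joined at a new node U; C and D remain attached to the center]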

  • Step 5: Compute branch lengths
  • Use the equation for computing the distance from a taxon to a node

$$d_{AU} = \frac{1}{2}\left[d_{AB} + (r'_A - r'_B)\right] = \frac{1}{2}\left[0.4 + (0.675 - 0.775)\right] = 0.15$$
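
The branch-length step can be written as a two-line function. The sketch below computes the length of the branch from A to U with the formula of Step 5, and takes the branch from B to U as the remainder of the A-B distance, which is the usual companion formula and is stated here as an assumption since only the A-U computation appears in the example. The function name is hypothetical.

```python
def branch_lengths(d_ab, r_prime_a, r_prime_b):
    """Lengths of the two branches joining taxa A and B to their new node U."""
    d_au = 0.5 * (d_ab + (r_prime_a - r_prime_b))
    d_bu = d_ab - d_au  # assumed: the two branches together span d_AB
    return d_au, d_bu

# Values taken from the worked example: d_AB = 0.4, r'_A = 0.675, r'_B = 0.775
d_au, d_bu = branch_lengths(0.40, 0.675, 0.775)
print(round(d_au, 3), round(d_bu, 3))
```

Unlike UPGMA, the two branches need not be equal: the taxon with the larger transformed r-value receives the longer branch.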

  • Step 6: Construct reduced distance matrix by computing distances from each remaining taxon to the new node U
  • In UPGMA, we simply calculated the average

$$d_{CU} = \frac{1}{2}\left[(d_{AC} - d_{UA}) + (d_{BC} - d_{UB})\right]$$

Our reduced distance matrix:

         U        C
  C     0.20
  D     0.45     0.55
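
A short sketch of Step 6, assuming the branch lengths from A and B to U are 0.15 and 0.25 and that the B-C and C-D distances are 0.45 and 0.55 (values consistent with the example but not all legible in it). The function name and dictionary layout are illustrative.

```python
def distance_to_new_node(d_ak, d_bk, d_au, d_bu):
    """Distance from a remaining taxon K to node U, which joins A and B."""
    return 0.5 * ((d_ak - d_au) + (d_bk - d_bu))

d_au, d_bu = 0.15, 0.25  # branch lengths from A and B to U
d_cu = distance_to_new_node(0.35, 0.45, d_au, d_bu)  # uses d_AC and d_BC
d_du = distance_to_new_node(0.60, 0.70, d_au, d_bu)  # uses d_AD and d_BD

# The C-D distance is untouched by the merge (0.55 is an assumed value)
reduced = {("C", "U"): d_cu, ("D", "U"): d_du, ("C", "D"): 0.55}
for pair, value in reduced.items():
    print(pair, round(value, 3))
```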

  • From here, we go back to step 1
  • Continue until all taxa have been decomposed from the star tree
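
Putting the steps together, the loop can be sketched as follows. This is a minimal neighbor-joining implementation under several assumptions: it uses the general criterion d_ij - (r'_i + r'_j), which coincides with the example's formula when four taxa are present; it stops when three nodes remain and joins them at a central node, a common shortcut; it breaks ties by taking the first pair in input order; and it uses 0.45 and 0.55 for the B-C and C-D distances. Node names such as U1 and center are hypothetical.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Return a list of (node, parent, branch_length) edges of an unrooted tree."""
    d = {frozenset(k): v for k, v in dist.items()}
    nodes = list(names)
    edges = []
    counter = 0
    while len(nodes) > 3:
        n = len(nodes)
        # Step 1: transformed r-values
        r_prime = {
            i: sum(d[frozenset((i, j))] for j in nodes if j != i) / (n - 2)
            for i in nodes
        }

        # Steps 2-3: converted distances; rounding keeps exact ties stable
        def converted(pair):
            i, j = pair
            return round(d[frozenset(pair)] - (r_prime[i] + r_prime[j]), 9)

        # Step 4: merge the pair with the smallest converted distance
        i, j = min(combinations(nodes, 2), key=converted)
        counter += 1
        u = f"U{counter}"
        # Step 5: branch lengths from i and j to the new node
        d_ij = d[frozenset((i, j))]
        d_iu = 0.5 * (d_ij + (r_prime[i] - r_prime[j]))
        d_ju = d_ij - d_iu
        edges += [(i, u, d_iu), (j, u, d_ju)]
        # Step 6: distances from every remaining node to the new node
        for k in nodes:
            if k not in (i, j):
                d[frozenset((k, u))] = 0.5 * (
                    (d[frozenset((i, k))] - d_iu) + (d[frozenset((j, k))] - d_ju)
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: attach each to a central node
    x, y, z = nodes
    dxy, dxz, dyz = (d[frozenset(p)] for p in ((x, y), (x, z), (y, z)))
    edges += [
        (x, "center", 0.5 * (dxy + dxz - dyz)),
        (y, "center", 0.5 * (dxy + dyz - dxz)),
        (z, "center", 0.5 * (dxz + dyz - dxy)),
    ]
    return edges

dist = {
    ("A", "B"): 0.40, ("A", "C"): 0.35, ("A", "D"): 0.60,
    ("B", "C"): 0.45, ("B", "D"): 0.70, ("C", "D"): 0.55,
}
edges = neighbor_joining(["A", "B", "C", "D"], dist)
for node, parent, length in edges:
    print(f"{node} -> {parent}: {length:.2f}")
```

Summing branch lengths along the path between any two taxa in the result gives back the corresponding entry of the input matrix, which is a quick sanity check for this example.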

Optimality-Based Methods

  • Clustering methods produce a single tree with no ability to judge how good it is compared to alternative tree topologies
  • Optimality-based methods compare all possible tree topologies and select a tree that best fits the distance matrix
  • Two algorithms:
    • Fitch-Margoliash
    • Minimum evolution

Fitch-Margoliash

  • Selects best tree among all possible trees based on minimum deviation between distances calculated in the tree and distances in the distance matrix
  • Basically, a least squares method
  • Dij = distance between i and j in matrix
  • dij = distance between i and j in tree
  • Objective: Find tree that minimizes

$$\sum_{1 \le i < j \le n} (D_{ij} - d_{ij})^2$$
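
The objective can be evaluated directly once the tree distances are known. The sketch below scores two hypothetical candidate trees against the four-taxon matrix for a, b, c, d from the Distance Matrices slide, using the unweighted sum of squares written above. The candidate trees and their path distances were worked out by hand for illustration, and a full Fitch-Margoliash search over all topologies is not attempted here.

```python
def least_squares_score(matrix, tree):
    """Sum over all pairs of (D_ij - d_ij)^2, matrix vs. tree path distances."""
    return sum((matrix[pair] - tree[pair]) ** 2 for pair in matrix)

# D_ij: distances in the matrix
D = {
    ("a", "b"): 6, ("a", "c"): 7, ("b", "c"): 3,
    ("a", "d"): 14, ("b", "d"): 10, ("c", "d"): 9,
}

# Candidate 1 (hypothetical): unrooted tree grouping (a,b) against (c,d),
# leaf branches a=5, b=1, c=1, d=8 and an internal branch of length 1
tree_1 = {
    ("a", "b"): 5 + 1, ("a", "c"): 5 + 1 + 1, ("b", "c"): 1 + 1 + 1,
    ("a", "d"): 5 + 1 + 8, ("b", "d"): 1 + 1 + 8, ("c", "d"): 1 + 8,
}

# Candidate 2 (hypothetical): ultrametric tree (((b,c),a),d)
# with node heights 1.5, 3.25 and 5.5
tree_2 = {
    ("b", "c"): 2 * 1.5,
    ("a", "b"): 2 * 3.25, ("a", "c"): 2 * 3.25,
    ("a", "d"): 2 * 5.5, ("b", "d"): 2 * 5.5, ("c", "d"): 2 * 5.5,
}

score_1 = least_squares_score(D, tree_1)
score_2 = least_squares_score(D, tree_2)
print(score_1, score_2)
```

Under this criterion the first candidate is preferred because its path distances match the matrix more closely.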

Minimum Evolution

  • Similar to Fitch-Margoliash, but uses a different optimality criterion
  • Searches for a tree with the minimum total branch length
  • This is an indirect way of achieving the best fit of the branch lengths with the original data