Comparative Genomics - Lecture Slides | BIT 150, Study notes of Bioinformatics

Material Type: Notes; Class: Applied Bioinformatics; Subject: Biotechnology; University: University of California - Davis; Term: Unknown 1989;

Uploaded on 07/30/2009

Comparative genomics
Chapter 15

Features that are investigated include:
1. Genome size variation
2. Base composition biases
3. Gene order
   1. Colinearity
   2. Identification of orthologs
   3. Functional predictions
4. Whole genome alignments
   1. Identify genes
   2. Improve gene annotation
   3. Identify regulatory regions (phylogenetic footprinting)
5. Evolution of genomes

Comparative genomics. Genome size

- There are huge differences in genome size among related organisms.
- More complex organisms do not necessarily have larger genomes (the "C-value paradox").
- Repetitive elements are responsible for a large proportion of the differences.
- The amount of DNA affects cell size, the speed of the cell cycle, and other traits.

Comparisons of rice, barley, & wheat orthologous BAC sequences

[Figure: maps of orthologous BAC sequences from A. rice, B. barley, and C. wheat, with genes numbered 1 to 5 (gene 4 also labeled 4a and 4b). Symbols: genes, inverted repeats, non-LTR transposable elements, MITEs, LTR transposable elements, TRIM. Scale bar: 10 kb. Accession labels: AY013245, AY..., AF...]

Differences in size are generated mainly by differences in the intergenic regions.

Plant Physiology 2001. 125:1342-1353; Functional & Integrative Genomics 2002. 2: 70-80; Genetics 2002. 1389-

Comparative genomics. Composition bias

Effect of repetitive elements on GC content

• Grass genes are 51% GC; Arabidopsis genes are 44% GC.
• The overall GC content in wheat (46%) is 2% higher than in Arabidopsis or rice.
• The GC content of barley and wheat is very similar to the GC content of the most abundant retroelements.
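The GC percentages quoted above are simple base counts. As a minimal sketch (the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions for illustration, not part of the original slides), GC content can be computed as:

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous symbols such as N are ignored (assumption)
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


if __name__ == "__main__":
    # Short illustrative fragment; real comparisons use whole genes or BACs.
    fragment = "ATGGACACCTCTACT"
    print(f"{gc_content(fragment):.1f}% GC")
```

Running the same function separately over gene sequences and over retroelement sequences is one way to see how abundant repeats pull the overall GC content of a genome toward their own composition.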

Homology search for the mouse genome

Homology search of all genes in the mouse genome:

• 27% in other metazoans
• 29% in other eukaryotes
• 6% in other chordates
• 14% in other mammals
• Less than 1% rodent specific

[Figure: pie chart with sectors labeled Eukaryote, Metazoan, Chordate, Mammal, Rodent specific, and Other.]

Dotplots to compare orthologous BACs: barley (horizontal) vs. rice (vertical)

[Figure: dotplot of barley BAC 635P2 (horizontal) vs. rice BAC 36I05 (vertical). Axis annotations: genes G1, G2, G3, G4 (G4a, G4b), and G5; labels Ta to Ti.]

Dubcovsky et al. Plant Physiology. 125:1342-1353
Dotter: http://www.cgr.ki.se/cgr/groups/sonnhammer/Dotter.html
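The idea behind a dotplot can be sketched in a few lines. The version below is a simplified illustration that marks exact shared words only; it is not how Dotter itself scores matches, and the names `dotplot_matches`, `render`, and the `word` parameter are hypothetical.

```python
from collections import defaultdict


def dotplot_matches(seq_x, seq_y, word=4):
    """Return sorted (x, y) start positions where both sequences share an identical word."""
    index = defaultdict(list)
    for i in range(len(seq_x) - word + 1):
        index[seq_x[i:i + word]].append(i)  # word -> positions in the horizontal sequence
    hits = []
    for j in range(len(seq_y) - word + 1):
        for i in index.get(seq_y[j:j + word], ()):
            hits.append((i, j))
    return sorted(hits)


def render(seq_x, seq_y, word=4):
    """Draw the dotplot as text: seq_x runs horizontally, seq_y vertically."""
    hits = set(dotplot_matches(seq_x, seq_y, word))
    width = len(seq_x) - word + 1
    rows = []
    for j in range(len(seq_y) - word + 1):
        rows.append("".join("*" if (i, j) in hits else "." for i in range(width)))
    return "\n".join(rows)


if __name__ == "__main__":
    print(render("ACGTACGTTTGACC", "ACGTTTGACCACGT"))
```

Conserved colinear regions (for example, orthologous genes) show up as diagonal runs of marks, while repeats produce parallel diagonals and inversions would need the reverse complement to be indexed as well.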

Comparative genomics. Aligning genomic sequences

What have we learned?

• Small groups of genes are dispersed within a sea of repetitive elements.
• Rapid divergence in intergenic regions. Comparison of orthologous A and A^m VRN regions (A^m genome of T. monococcum vs. A genome of durum wheat): 1.1 MYA divergence, 31% similarity.
• The fast decay of % similarity is also affecting genes.
• Fast rate of gene deletions and of gene inactivation by retroinsertions.
• Small phenotypic changes in polyploid wheat. Valuable source of diversity.

[Figure: decay of synteny in intergenic regions, C(t), plotted against divergence time (million years) for the wheat-barley, A-B-D wheat, and A-A^m comparisons. Inset map (scale bar 20 kb, segments A to E): A^m genome of T. monococcum vs. A genome of durum wheat.]

Dubcovsky and Dvorak 2007, Science 2007 316: 1862-

Human vs. chimp: 7 MYA, 3% divergence

Dotplots to discover duplications in rice

• A large genome duplication occurred approximately 70 million years ago, before the divergence of most grass subfamilies.
• Diploid grasses are actually ancient polyploids.
• The figure shows the arrangement of duplicated protein-encoding genes in rice, in the order found in the current sequence assembly. Both the X and Y axes represent 45,174 genes in their chromosomal order.
• Colors indicate the same (red) or opposite (green) transcriptional orientations.
• Differential gene loss contributes to apparent incongruities in comparative grass genomics.
• Paterson et al. 2004 PNAS
• A similar result has been described for Arabidopsis, suggesting that this species is also an ancient polyploid.
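The gene-order dotplot described above can be sketched as follows. This is a minimal illustration under stated assumptions: the inputs (`gene_order`, `strands`, `pairs`) and the function name `duplication_points` are hypothetical, and how duplicated gene pairs are detected in the first place is outside the sketch.

```python
def duplication_points(gene_order, strands, pairs):
    """Turn duplicated gene pairs into colored points for a gene-order dotplot.

    gene_order: gene IDs in chromosomal order (used for both axes)
    strands:    dict mapping gene ID -> "+" or "-"
    pairs:      iterable of (gene_a, gene_b) duplicated pairs
    """
    rank = {gene: i for i, gene in enumerate(gene_order)}
    points = []
    for a, b in pairs:
        # red = same transcriptional orientation, green = opposite
        color = "red" if strands[a] == strands[b] else "green"
        points.append((rank[a], rank[b], color))
        # Both axes carry the same gene list, so the plot is symmetric (assumption).
        points.append((rank[b], rank[a], color))
    return points


if __name__ == "__main__":
    order = ["g1", "g2", "g3", "g4"]
    strand = {"g1": "+", "g2": "-", "g3": "+", "g4": "+"}
    for point in duplication_points(order, strand, [("g1", "g3"), ("g2", "g4")]):
        print(point)
```

Plotting positions by gene rank rather than by base-pair coordinate is what makes ancient duplicated blocks stand out as diagonals even when intergenic distances have changed.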

Comparative Genomics. Gene Annotation

[Figure: Barley Gene 1 vs. Rice Gene 1. Exon predictions from GeneMark and GENSCAN, shown alongside BLASTN wheat-rice matches (98%).]

Sequence legend:
Pink: confirmed predicted gene
Underlined: BLASTN barley vs. rice genomic DNA

TCCGAGCCAGAGGAGGCGAGGAACACATGCTCTCTTTCCTCCCCGCTGAAGAGTAGCTGCCAAGACCACCACGGAAGCGGAGGCGCACCACCCAGGCGGGCCATTCTCGCCGAGCCCCTCCCCCTCCCCCTCCCCCGGCCGCCGCGCCCTCGCCTCCGGGGATCGCGGGATTGGGCGGCC

ATGGACACCTCTACT

AGGGCGGCCAAGATCCCCTCCCTCCCCCAACAGACGGAGATCAACTGGGACAAGTAAGCGC

CCGC

CCGCCCGCCCGCCTCCTCTTTCCTCAGCCGTTCCCCGGCGATTCGCCCCGTCTCGCGCGCTCCGGGGTATTCCCCCCCCCCCCCCCCCCCCCGGCCGCAATTCGGAGCTGGCATTAGAAATTTCGGTTTAGGAACTTGCCGTGGCTGCTGTTGTCGCCGGATGGTGCGCGATTTAGGAAGCTGTGGTGAATTACGGCGTCTAGGTAGCTCTCTGGTTGGTATATTGTTCTGAAAATGCGGTGTG

TACATGTTGCAGCCTC

GACATGACCAAGCTCTATGTGGTTGGGGCCGGCATGTTCAGCTGCGTGACCGTGGCGCTCTACCCGGTCTCTGTGATCAAGACCCGGATGCAGGTTGCCTCGGGGGAGGCCATGAGGAGGAACGCCCTGGCCACCTTTAAGAACATCCTCAAGGTGGATGGGGTGCCTGGGCTCTACCGAGGGTTTGGGACCGTCATCACTGGGGCTATCCCGGCTAGGATTATCTTCCTCACGGCGCTCGAGAAGACAAAGGCGACTTCGCTCAAGCTTGTCGAGCCTCTTCAGCTGTCAGAGTCGATGGAGGCTGCCCTTGCTAATGGCCTTGGTGGTTTGACGGCGTCTCTGTGTTCGCAGGCCGTGTTTGTGCCCATTGATGTGGTATGC

TTATTA

TTGGTCATTTCCAGTTCCCAGTTTCTACGATGCTGCAAAGAGCTGTGCTGTTTTTACCAGATGAGGCGAATCCATTTATTATGTTTGTATTGGCCTTGT

ATCAGGTTAGTCAGAAATTGATGGTTCAAGG

ATATTCTGGTCATGTTAGATATAAGGGTGGAATCGATGTTGTTCAAAAGATTATGAAAGCTGATGGGCCACGGGGATTGTACAGAGGATTTGGCCTATCTGTTATGACGTATGCACCATCTAGCGCTGTGTGGTGGGCAAGCTATGGATTCAGCCAGCGTGTAATTTGGAGGTTAG

TAATGGTCTAATTAGCTTA

GCGCCATGTTTCTCTCTCTTAGTCTCTATGCTGTTTAAAGTCATGGTTTTCTTAAGTTGAACAAAGAAGAAAAGGCTAAGATGGTGACAGACTCTTTGAAGCA//AGGGCCAGTACATAAGTTAGTTTTGGGGAGTTCTTTCTTAATTCTTGATGATAGCATAAAAATGTTAGAAATTGAAATAAAGTTATTTATCTTGTATTCTTGTTAGCATAGCTACCTGAGAGTTCACTACTGTTGTTCTGCTTGTATAA

ACAGTG

CTCTTGGCCGTTTGGATGATAAAGAAGATACTCCTAGTCAGTTGAAAATAGTCGGTGTTCAAGCAACAGGGGGAATGGTTGCTGGGGCAGTGACCTCTTGTGTTTCAACTCCTCTAGATACAATCAAAACCAGGCTGCAGGTG

ACATTTATGTTTCTGATTATTGTTGCTTATTTCTGCCTAATTGGGTTTGAAC

AAAGTTCATTGTTCAAGACTGCCCTTGTAACTAGTGCTTATTAAGTTTCACACTGATGCCATTAAGATACCTATAACTTGACTCCATTATTAGAATATCATATACAATTATTGAGTTAGGTTTAGGAGCATGGATATTTTGTCTTGAAGTTCATTGATGTCCCTCT//TTTCTTTATGTCATCAAGAATTCGCATCTTTTCTATTTCTTAAATGTGACATGCATCTTACAGGTTGCGATGTGTGTATCAGTCACATTTATTATTTTTCTATTTTTTCCTC

TAGGTTAATATCAACAAGCCAAAAGCCAGTGAAGTAGTTAGAAGA

TTGATCGCTGAAGATGGATGGAAAGGTTTCTACAGAGGGTTAGGCCCAAGGTTTTTCAGCTCATCTGCTTGGGGTACATCAATGATCGTATGCTACGAATACCTAAGTATGCCCATC

CTTTCCGAGTCTT

ATCTGTATTTTTTTTTTTGTATATTCACCATATCTAATGCTATTTGCAAGCATACAAAATTGCATATAATAAATACACTGTTGCAATTAATTACATATT//CCTTTTTCCTAATGCTCTACTTATTTAGATGACACCAGGTTTTAAATTTCTCAATCTTGACCTGAAACTATTCTTTTTTCCCCTCTATTATGCTAACTGTTCCTTCCGCTTCTTC

CCAGAGCGTGTATGTGCTAAAGTTGAAGAGGCCTGAGTC

GCCC

BLASTN Rice vs Barley

Though most human and mouse orthologues have a similar structure, some differences are observed:

• Comparison of the human and mouse spermidine synthase genes revealed an additional intron in the human gene that is not found in the mouse homologue.

[Figure: human vs. mouse gene comparison; scale: 5,500 bp.]

Comparative genomics. LAGAN

LAGAN (Limited Area Global Alignment of Nucleotides): global pair-wise and multiple alignment of finished sequences.

• Detects closely and distantly related sequences.
• If some of the sequences are in a draft format, your query will be redirected to AVID.
• Multiple alignments will be visualized by VISTA.
• This is the only alignment program available through the VISTA server that produces true multiple alignments.
• LAGAN performs better than BLASTZ for distantly related organisms.

AVID

AVID: global pair-wise alignment.

• Fast alignment of large sequences.
• Detects weak homologies.
• One of the sequences should be finished, but all others can be either finished or in a draft format.
• For all finished sequences in the set, AVID generates all-against-all pair-wise alignments.
• Draft sequences are aligned to the finished sequence.
• View using VISTA.

[Figure: AVID anchoring steps. Labels: Clean Matches (st) & Repeat matches; Set of non-overlapping, non-crossing matches; Recursion: repeat for each inter-anchor region from the previous step. Red: non-overlapping, non-crossing matches.]
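The anchoring idea (pick a set of non-overlapping, non-crossing matches, then repeat inside each inter-anchor region) can be sketched as below. This is an illustrative quadratic-time chaining routine, not AVID's actual anchor-selection procedure; the match format `(x_start, y_start, length)` and the names `best_chain` and `inter_anchor_regions` are assumptions.

```python
def best_chain(matches):
    """Pick the non-overlapping, non-crossing matches with the largest total length.

    matches: list of (x_start, y_start, length) exact matches between two sequences.
    A match may follow another only if it starts after the other ends in BOTH sequences.
    """
    ms = sorted(matches)
    n = len(ms)
    if n == 0:
        return []
    score = [m[2] for m in ms]  # best chain score ending at each match
    prev = [-1] * n             # back-pointers for traceback
    for j in range(n):
        xj, yj, lj = ms[j]
        for i in range(j):
            xi, yi, li = ms[i]
            if xi + li <= xj and yi + li <= yj and score[i] + lj > score[j]:
                score[j] = score[i] + lj
                prev[j] = i
    j = max(range(n), key=score.__getitem__)
    chain = []
    while j != -1:
        chain.append(ms[j])
        j = prev[j]
    return chain[::-1]


def inter_anchor_regions(chain, len_x, len_y):
    """Return the ((x_from, x_to), (y_from, y_to)) gaps between consecutive anchors."""
    regions = []
    px = py = 0
    for x, y, length in chain:
        regions.append(((px, x), (py, y)))
        px, py = x + length, y + length
    regions.append(((px, len_x), (py, len_y)))
    return regions


if __name__ == "__main__":
    found = [(0, 0, 5), (3, 10, 4), (6, 6, 5), (12, 2, 3), (12, 12, 4)]
    anchors = best_chain(found)
    print(anchors)
    print(inter_anchor_regions(anchors, 20, 20))
```

In a recursive aligner, each non-empty inter-anchor region would be searched again for shorter matches, and the remaining small gaps would be filled with a standard dynamic-programming alignment.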

zPicture

http://zpicture.dcode.org/

zPicture is a pair-wise alignment and visualization tool that compares two sequences using the local alignment program BlastZ.

Multiple sequences: http://zpicture.dcode.org/multiz.php

[Figure: zPicture alignment, rice (AY013245) vs. barley (AY013246). Annotations: Gene 1 (4548-), Gene 2 (8663-13534 R), Gene (14166-), Gene (24136-), Repeat.]

MULTI-LAGAN

http://lagan.stanford.edu/lagan_web/index.shtml

• MULTI-LAGAN is similar to ClustalW.
• It produces dynamic alignments and also a PDF of the alignments presented here.
• Any of the genomes can be used as the reference.
• In the alignment between rice, barley, and wheat it fails to detect the inverted Gene 2 in barley.
• It produces an evolutionary tree for the submitted sequences.

[Figure: MULTI-LAGAN alignments of rice, barley, and wheat, shown as a barley view and a rice view.
Barley view annotations: Gene 1 (2563-6313, 6 exons); Gene 2 (30279-, 13 exons; reverse, missed); Gene (64887-, 18 exons); Gene4a (74857-, 18 exons); Gene4b (84895-, 18 exons).
Rice view annotations: Gene 1 (4548-7287, 6 exons); Gene 2 (8663-, 13 exons; missed in barley); Gene (14166-, 18 exons); Gene (24136-, 18 exons).]