Bloc III, Biotechnology notes

Course: Bioinformatics. Degree: Biotechnology. University: UB.

Type: Notes

Academic year 2012/2013

Uploaded on 10/06/2013

kirtash18
BIOINFORMATICS
Block 3
Block 3. Genetic variation and molecular evolution
3.1. Basic concepts of molecular evolution. The molecular clock.
The evolutionary roots lie in population genetics (the theoretical basis); the empirical side is provided by molecular genetics.
There are two main areas of study:
- Understanding how macromolecules evolve, and
- Using this information as a tool (for example, to identify orthologous genes).
The methodologies used are: at the experimental level, the “omics” (which require a great deal of technology); at the analytical level, statistical genetics and population genetics; and at the computational level, bioinformatic tools.
We can work at the interspecific level (to learn how sequences evolve between species) or at the intraspecific level, within a single species.
History of research
It all began in the 1960s, when it was seen that amino acid changes followed a regular pattern over time. Later in the 1960s, work began on reconstructing the evolutionary history of molecules. Since this required computational tools, a computer program was written to reconstruct that evolutionary history.
Genetics and bioinformatics are closely linked: knowledge of the human genome cannot advance without advances in bioinformatics.
Suitable bioinformatic tools now exist, and they are what allow us to sequence so many genomes.
Evolution of macromolecules
The simplest case is to take a DNA fragment shared by two species and align it. This is already complex because of gaps, which are inserted so that homologous nucleotides line up in the same column.
But we only see the difference, not the change that actually happened. In molecular evolution, what we want to infer are the changes (for example, a difference could be an insertion in Acer or a deletion in Quercus). To know what happened, we have to reproduce the historical process and find the origin of that change.
We will try to reconstruct the evolutionary tree of a particular sequence from the differences among present-day sequences.
Bioinformatics
Block III. Genetic/Genomic Variation and Molecular Evolution
Julio Rozas
Department of Genetics, Universitat de Barcelona
http://www.ub.es/molevol/julio

BLOCK III. Genetic/genomic variation and molecular evolution
- Basic concepts of molecular evolution. The molecular clock.
- Genetic distances and models of DNA evolution.
- Similarity measures: substitution matrices (PAM/BLOSUM).
- Basic concepts in phylogeny: branch, topology, outgroups, etc.
- Gene trees and species trees.
- Phylogenetic reconstruction methods (UPGMA, Neighbour-Joining, Maximum Likelihood).
- Assessing the reliability of a phylogenetic tree: bootstrapping.
- Gene duplication: identifying orthologues and paralogues.
- Basic concepts of comparative genomics.
- Intraspecific variability: DNA and haplotype polymorphisms.
- Genomic databases and portals for genetic variation (dbSNP, HapMap, PopSet).

Slide: Mutations & Multiple Substitutions


Types of changes we find

These are the complicated cases, because changes can happen in several ways, and from the differences we observe today we cannot always tell what happened during evolution.

It is very important to know whether a change was a mutation or a substitution. We need to know how many substitutions have occurred in a sequence.

When we compare two sequences, for example the human and the chimpanzee ones, we use them to infer the sequence of a common ancestor. Many bioinformatic tools are aimed at determining these substitutions.

At the intraspecific level, DNA molecules undergo different changes that generate polymorphisms.

- Natural selection: the only force directed towards producing adaptive change.

- Genetic drift: for example, suppose there are twenty individuals on an island. If one of them carries a particular variant and that variant is lost by chance, it will no longer be preserved.

Mutations, then, can have different consequences for the evolution of the species, depending on their phenotypic effects. They can be:

- NEUTRAL: they neither harm nor benefit the individual that carries them (selectively equivalent changes, whose fate depends on chance). These are the ones we will be most interested in.

- BENEFICIAL: their dynamics depend on natural selection, which will fix them rapidly.

- DELETERIOUS: the mutation harms the individual and will be eliminated.

Natural selection acts as a filter on DNA changes.

In negative (purifying) selection, harmful changes are eliminated and the existing, favourable forms are maintained.

In positive (adaptive) selection, DNA changes are driven towards the forms that most benefit the individual. With polymorphisms, for example, we can find directional selection, in which only the best allele is kept, or balancing selection, in which several alleles with different characteristics are maintained.

For neutral mutations the dynamics can be predicted, which gives the probability that a given neutral mutation becomes fixed in the genome.

Slide: Mutation
Neutral mutations (selectively equivalent changes); beneficial/advantageous mutations; deleterious mutations.
Natural selection:
- Negative/purifying selection: removes deleterious mutations; maintains existing adaptations.
- Positive/adaptive selection: drives advantageous mutations; origin of novelties and adaptations.
- Directional selection (the allele frequency shifts in one direction) -> fixation (loss of polymorphism).
- Balancing selection (two or more alleles are actively maintained) -> maintains polymorphisms.

Slide: Neutral mutations
Polymorphism -> genetic drift. Divergence -> genetic drift.
Polymorphism is a transient phase towards fixation (mutation-drift equilibrium).
Probability of fixation: $P = 1/(2N)$. Fixation time: $\bar{t} = 4N$ generations.

We speak of polymorphism when, within a given population, there are different variants.

DIVERGENCE: when there are different variants of a gene or sequence between two evolutionarily related species.
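The fixation probability on the slide can be checked with a small Wright-Fisher simulation. This is a minimal sketch, assuming a diploid population of constant size N (2N gene copies), non-overlapping generations and a single new neutral mutant; `neutral_fixation_rate` and its parameters are illustrative names, not part of the course material.

```python
import random


def neutral_fixation_rate(n_diploid, replicates, seed=1):
    """Fraction of replicates in which one new neutral allele reaches fixation."""
    rng = random.Random(seed)
    two_n = 2 * n_diploid
    fixed = 0
    for _ in range(replicates):
        copies = 1  # a single new mutant copy among 2N genes
        while 0 < copies < two_n:
            p = copies / two_n
            # Wright-Fisher sampling: each of the 2N genes of the next
            # generation copies a random parent (a binomial draw)
            copies = sum(1 for _ in range(two_n) if rng.random() < p)
        fixed += copies == two_n
    return fixed / replicates


N = 10
print(neutral_fixation_rate(N, 10_000), "expected", 1 / (2 * N))
```

With N = 10 the estimate should fall close to 1/(2N) = 0.05; more replicates tighten it. Each generation is a pure sampling step, so no selection coefficient appears anywhere: only drift decides the outcome.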


3.2. Models of DNA evolution

To find out how genes originated, we need to carry out a phylogenetic analysis:

1. Obtain the sample. Until a few years ago, obtaining samples was a simple process. Now it is more complicated: the common organisms are already known, and the rare ones are hard to obtain.

2. Extract the DNA. There are machines that do this automatically.

3. Sequence the DNA. Nowadays there are also robots that do this.

4. Assemble the DNA. For a genome this is harder than for a single gene: we need to join the pieces obtained in each sequencing run into a single sequence.

Everything will be based on statistical methods: probabilistic models give us the probability of each nucleotide change, and with that we can establish the phylogeny. At the DNA level we will look at three models, but there are many, and we have to know which one to use. Depending on the question, we will use one or another.

After choosing a model, we have to choose the methodology to solve our problem, and also a way to validate our result, knowing the strengths and weaknesses of each approach.

Multiple substitutions

These are substitutions that affect the same site, or back mutations (reversions), and they distort what we observe.

At first the observed differences grow linearly, but as time passes this linearity is lost.

What we want to determine is the number of substitutions, by looking at our present-day sequences and the differences observed between them. At first, the observed differences increase linearly, but beyond a certain point what we observe saturates and the number of differences no longer reflects the real number of changes.

We are interested in the changes, so we need to correct the gap between what is observed and the actual divergence.

The problem is that the dynamics of the “observed” differences are not the same in every sequence: we need to know how the nucleotides change and at what rate in each sequence. In the figure, the red curve is the “theoretical, hypothetical” one, inferred from the blue one, which is the “observed” one.
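The saturation described above can be reproduced with a toy simulation that throws substitutions at random sites and compares the true number of substitutions with the observed proportion of differing sites. This sketch assumes JC69-like changes (each hit replaces the base with one of the other three, equally likely); `simulate_observed_p` is a hypothetical helper, not from any package.

```python
import random


def simulate_observed_p(length, n_substitutions, seed=0):
    """Observed proportion of differing sites after n random substitutions."""
    rng = random.Random(seed)
    ancestor = [rng.choice("ACGT") for _ in range(length)]
    current = list(ancestor)
    for _ in range(n_substitutions):
        site = rng.randrange(length)
        # JC69-style hit: the new base is one of the other three, equally likely
        current[site] = rng.choice([b for b in "ACGT" if b != current[site]])
    differences = sum(a != b for a, b in zip(ancestor, current))
    return differences / length


for subs_per_site in (0.05, 0.5, 1.0, 3.0):
    n = int(subs_per_site * 2000)
    p = simulate_observed_p(2000, n)
    print(f"true substitutions/site = {subs_per_site:.2f}  observed p = {p:.3f}")
```

The observed p tracks the true value at first and then levels off near 0.75 while the true number of substitutions keeps growing, because repeated hits and reversions at the same site leave no extra visible difference.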

SUMMARY OF THE CURRENT DEBATE

The selectionists argued that the changes that occur are either good or bad (there are no neutral ones). Most changes in the genome are deleterious (everyone agrees on this).

The neutralists argued that there are many deleterious changes, some neutral ones and very few beneficial ones. They are the ones who are most right, since they say that the dynamics of mutations are driven by the large number of neutral mutations. Deleterious mutations do not contribute to the observed rate of change, because they are eliminated.

What both sides agree on is that the genes with the fastest rate of change are the least functionally constrained ones.


Slide: Multiple substitutions (number of changes vs. observed differences)


Jukes-Cantor model (JC69)

It is also called the one-parameter model, since it only takes the mutation rate into account. Each nucleotide has the same probability of being replaced by any other. It is adequate in many cases, even though it is a simplification.

We assume that the mutation rate (α) is always the same. The probability that a site that is A at time 0 is still A at time 1 is $P_{A1} = 1 - 3\alpha$. At time 2 it is $P_{A2} = (1-3\alpha)^2 + 3\alpha^2$: either the site stayed A in both steps, or it changed to one of the other three bases (probability $3\alpha$) and then back to A (probability $\alpha$).

We build a table with the four nucleotides on each side and the probabilities in the cells. Depending on our α, the values of the matrix will differ. Once we have the probabilities of change, we can build our substitution matrices.
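The step-by-step reasoning above is a recursion, $P_{A}(t+1) = P_{A}(t)(1-3\alpha) + (1-P_{A}(t))\alpha$, which can be iterated directly. The sketch below compares it with the closed form $1/4 + (3/4)(1-4\alpha)^t$, which follows from the recursion (its fixed point is 1/4) but is not stated in the notes; the function names are illustrative.

```python
def p_same_recursive(alpha, t):
    """P(site is still A after t steps) under the JC69 discrete-time recursion."""
    p = 1.0  # P_A0 = 1
    for _ in range(t):
        # stay A (prob 1 - 3*alpha) or come back to A from another base (prob alpha)
        p = p * (1 - 3 * alpha) + (1 - p) * alpha
    return p


def p_same_closed(alpha, t):
    """Closed form of the same recursion: 1/4 + 3/4 * (1 - 4*alpha)**t."""
    return 0.25 + 0.75 * (1 - 4 * alpha) ** t


alpha = 0.01
print(p_same_recursive(alpha, 2), (1 - 3 * alpha) ** 2 + 3 * alpha ** 2)
print(p_same_closed(alpha, 1000))  # approaches the equilibrium value 1/4
```

Over long times every base becomes equally likely, which is why JC69 distances saturate.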

Kimura model (K2P)

It is called the two-parameter model, because transitions and transversions are modelled differently. Transitions have rate α and transversions rate β. Transitions are much more frequent than transversions. We also build our substitution matrix: $M_{AG}$ is the probability of the change A → G, which, being a transition, corresponds to α in the matrix.
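The notes define K2P through its rates α and β; to turn observed differences into a distance, the usual K2P formula (Kimura 1980, not derived in these notes) is $d = -\tfrac{1}{2}\ln(1-2P-Q) - \tfrac{1}{4}\ln(1-2Q)$, where P and Q are the proportions of sites differing by a transition and by a transversion. A minimal sketch, assuming two gap-free aligned DNA sequences:

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (same length)")
    # keep only sites where both sequences have an unambiguous base
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    transitions = sum(1 for a, b in pairs
                      if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES))
    transversions = sum(1 for a, b in pairs if a != b) - transitions
    P = transitions / n    # proportion of transition differences
    Q = transversions / n  # proportion of transversion differences
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))  # one transition (A->G) in 10 sites
print(k2p_distance("ACGTACGTAC", "CCGTACGTAC"))  # one transversion (A->C) in 10 sites
```

A single transversion yields a larger corrected distance than a single transition, reflecting that transversions are expected to be rarer.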

General time-reversible model (GTR)

An important property that neither of the two previous models takes into account is the unequal frequency of the bases in DNA. Sometimes it makes sense to include the equilibrium frequencies of the different bases, as well as the substitution rates.

The GTR model is the most general time-reversible model and takes every possibility into account. It has ten parameters: six for the pairs of bases that can interchange, plus the four equilibrium frequencies of the bases. Irreversible models are of little practical use, because on a phylogenetic tree we do not know whether we are moving forwards or backwards in time. If the model were irreversible, it would need six more parameters, since the change from one base to another and the reverse change would have different rates. With little data, we would not use it.
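The GTR rate matrix can be assembled from the six exchange rates (a to f, as labelled on the GTR slide) and the four equilibrium frequencies. This sketch assumes the common parametrization $q_{ij} = r_{ij}\pi_j$, with the diagonal set so each row sums to zero; software usually also rescales Q so that the mean rate is 1, which is omitted here, and the numeric rates and frequencies are hypothetical.

```python
BASES = "ACGT"
# Labels as on the slide: a = A<->C, b = A<->G, c = A<->T, d = C<->G, e = C<->T, f = G<->T
PAIR_LABEL = {
    frozenset("AC"): "a", frozenset("AG"): "b", frozenset("AT"): "c",
    frozenset("CG"): "d", frozenset("CT"): "e", frozenset("GT"): "f",
}


def gtr_q_matrix(rates, freqs):
    """GTR instantaneous rate matrix with q_ij = rate(i<->j) * pi_j (unscaled)."""
    q = {i: {} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                q[i][j] = rates[PAIR_LABEL[frozenset((i, j))]] * freqs[j]
        # diagonal chosen so that each row sums to zero
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    return q


rates = {"a": 1.0, "b": 4.0, "c": 1.0, "d": 1.0, "e": 4.0, "f": 1.0}
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
Q = gtr_q_matrix(rates, freqs)
print(Q["A"])
```

Time reversibility shows up as $\pi_i q_{ij} = \pi_j q_{ji}$ for every pair. Setting all six rates equal and all frequencies to 1/4 gives the JC69 case.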

Slide: Jukes and Cantor model
$M_{ij}$: probability of changing nucleotide i (row) to j (column) per unit of time; each row sums to 1.
$M_{GC}$: probability of substitution of G (row) by C (column), G → C.

Example of a substitution matrix (rows G and T):
        A      G      C      T
G   0.002  0.983  0.005  0.010
T   0.002  0.013  0.005  0.979

$M_{AC}$: probability that a site that started as A has nucleotide C after one time interval.
$M_{CC}$: probability that a C remains unchanged per unit of time, i.e. there has been no (apparent) change at this site.
Each row of the matrix sums to 1 (all possible states are represented, and one of them must be chosen).

The Jukes-Cantor (JC69) one-parameter model (1969) assumes that each nucleotide has the same probability of being substituted by any of the other three in a fixed period of time.
$P_{A0} = 1$
$P_{A1} = 1 - 3\alpha$ (probability of not being A: $3\alpha$)
$P_{A2} = (1-3\alpha)P_{A1} + \alpha(1 - P_{A1}) = (1-3\alpha)^2 + 3\alpha^2$

Slide: Kimura (1980) two-parameter model
The Jukes-Cantor model assumes that each nucleotide has the same probability of being substituted by any of the other three in a fixed period of time. However, in real data transitions are often observed to occur more frequently than transversions. The Kimura two-parameter (K2P) model has two parameters: α (for transitions) and β (for transversions). Usually α > β.
Note that JC69 is simply a particular case of K2P in which α = β.
$M_{ij}$: probability of substitution of nucleotide i (row) by j (column) per unit time.
$M_{AT}$: probability of substitution A → T = β (a transversion).

Slide: General Time-Reversible model (GTR)
An important property of real sequences (not accounted for by the JC69 and K2P models) is that the frequencies of the four bases are often different, and the substitution rates may vary across nucleotide pairs.
GTR is the most general time-reversible model. It allows different instantaneous rates of substitution between each of the six nucleotide pairs: a = A↔C, b = A↔G, c = A↔T, d = C↔G, e = C↔T and f = G↔T.
The Q substitution matrix gives the instantaneous rate of change from base i to base j. The equilibrium frequencies of the bases are denoted $\pi_A$, $\pi_C$, $\pi_G$ and $\pi_T$. The diagonal elements are omitted for clarity.


The higher the score, the more certain we are that the residues are homologous (and the more robust our estimate).

We have to bear in mind that not all amino acids (AA) are alike, so AA replacements do not all occur at the same rate. A change from one positively charged amino acid to another can be fairly frequent, but a change from positive to negative will be rarer. Conservative replacements are probably more frequent than replacements between groups, and this is what we want to quantify. In addition, in the genetic code the codons for amino acids of a given type tend to be grouped together and share more nucleotides, so if there is only a single nucleotide change the resulting AA will probably be similar. We want to quantify this to have a realistic basis on which to build the probability matrix. We want to identify nucleotides and amino acids that are homologous, that is, derived from a common ancestor.

PAM

To build this substitution matrix, the different proteins were grouped into superfamilies sharing similar functions, and phylogenetic trees were built for them. In total there were 34 superfamilies.

From these families, multiple alignments were generated. PAM1 refers to sequences with only 1% divergence (one change per 100 AA). This can happen because the sequences come from two very closely related species (human-chimpanzee), or because they are highly conserved between different species. When the analysis uses very conserved sequences, the changes found between them are those that natural selection has accepted. If we align sequences in which only 1 in 100 AA differs, we are very likely to align them correctly; this is what PAM aimed for, a reliable multiple alignment. After analysing 1572 replacements in a phylogenetic context, the results were examined.

For example, suppose we have four aligned sequences. If at one position some sequences share the same AA and others differ, we want to know which one changed and which change is more frequent. Here the phylogenetic tree indicated the direction of the changes (although, since we do not know which state is older, changes are treated as reversible). Dayhoff did not simply look at which changes were present, but analysed them in a phylogenetic context. In the half-matrix of PAM1 (it does not matter whether the change is Gly → Ala or Ala → Gly), the counts vary widely. Next to it is the table obtained by Dayhoff, who counted, for example, 3644 changes involving Ala. There were also cases, among the 1572 replacements, in which a threonine (T) was replaced by a tryptophan (W). In the example shown, the counts are multiplied by 10, and the total corresponds to the 1572 replacements.

Slide: Accepted point mutations. X ↔ Ala = 3644 changes. PAM1 matrix: proteins that have undergone 1% change (1 substitution per 100 residues).
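The counting step can be sketched as follows: for each branch of a tree, compare the ancestral and descendant sequences and add each observed replacement to a symmetric table, since Gly → Ala and Ala → Gly are pooled in the half-matrix. This is a simplified illustration with made-up sequences; Dayhoff's actual procedure (ancestral reconstruction, weighting) is not reproduced, and `count_accepted_replacements` is a hypothetical helper.

```python
from collections import Counter


def count_accepted_replacements(branches):
    """Count amino acid replacements along tree branches, pooling X->Y with Y->X.

    branches: list of (ancestor, descendant) aligned protein sequences.
    """
    counts = Counter()
    for ancestor, descendant in branches:
        for a, b in zip(ancestor, descendant):
            if a != b and a != "-" and b != "-":
                counts[tuple(sorted((a, b)))] += 1  # unordered pair: half-matrix cell
    return counts


# Hypothetical branches of a small tree (illustrative sequences only)
branches = [("GAVLK", "AAVLK"), ("GAVLK", "GAVIK"), ("AAVLK", "GAVLR")]
print(count_accepted_replacements(branches))
```

Because the key is the sorted pair, the G → A change on the first branch and the A → G change on the third land in the same cell.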


From this, Dayhoff obtained the relative mutability of each amino acid as number of substitutions / normalised relative frequency. It is important to divide by the normalised relative frequency, because AAs do not occur at the same frequency in all sequences. In the table alongside, the red column is the observed one, the lilac one is the normalised one and the green one is the relative mutability. The relative mutability of alanine is set to 100. The most mutable amino acid is asparagine (N), because it shows many substitutions relative to the number of N residues in the sequences. The matrix shows what we expected: changes that involve a change of “chemical” type are less frequent than those between similar amino acids.

The cells give the probability that one AA changes into another.
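The relative mutability described above can be sketched as changes divided by relative frequency, rescaled so that alanine equals 100. This follows the simplified description in these notes (Dayhoff's original normalisation also accounted for exposure to mutation); the frequencies come from the amino acid frequency table in these notes, while the change counts are hypothetical.

```python
def relative_mutability(changes, frequencies, reference="A"):
    """Changes per unit of relative frequency, scaled so the reference amino acid = 100."""
    raw = {aa: changes[aa] / frequencies[aa] for aa in changes}
    return {aa: 100 * value / raw[reference] for aa, value in raw.items()}


# Frequencies from the table in these notes; change counts are hypothetical
frequencies = {"A": 0.087, "N": 0.040, "W": 0.010}
changes = {"A": 300, "N": 250, "W": 20}
print(relative_mutability(changes, frequencies))
```

Dividing by frequency is what lets a rare residue with many changes (here N) come out as more mutable than a common one.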

LM4(+"/%.'#$ ./6#':&'.+".%"+'"+'cNS B$7$5".G+;-$&%/%.'#$ !VROLS B$K)%$%#$.%P'/3$ S+/".'#$ )-"$ ."5/+A$ FESTB$ K)%$ 7"$ -'$ #'-$ ".G+'#B$ #+-'$ 9,#;+;-"-=+=' =&' 6%3+.-/0 3$ I/"G"P".'#$ '-$ #%)%-+"#$ '-$ #,('$)-$ 7o' 0%$".G+'#3$ !'$K)%$%#4%/".'#$ %-'-5/"/$ 0%$ .%0+"$ %-$ ("$ =-+1#0+"' &:' %0' ^^o' =&' -=&03-=+= $ 1 ,7%# /# ,&0/%# &%# !/# 34&# 8&,/%2# "4034&# &%'7# ,4!'-!+":/# -/)# IJJJJ 2 3$$ Q%$.+/"$"($/%&;#$K)%$%-$ 0[."-=#: 3$$ L($".G+'$ SNH' #+?-+=+"$%($".G+'$0%$ H! N 3$!"$4/'G"G+(+0"0$ 0%$K)%$.)5%$0%$L$"$E$4)%0%$#%/$ =-B&,&03&'$%&'"+'-05&,:+ $ 1 &0#&%'&#+"%/2#9PX>#&%#IJ#C#:&#>PX9#&%#IT 23 $ !"$."5/+A$ cNS7' %#$("$R-+"$K)%$5+%-%$0"5'#$%.4@/+'#B$("#$ '5/"#$%#58-$ _,&,.&34/")/6"<"6"),(_* 3 $ F'/$ %P%.4('B$ ("$ _P+QR/ "4/ ,$)/ 4",."),(4/ (%.&"4_** B$ ("$ 0+"?'-"($#%/8$ 7EE' 7$("#$'5/"#$"#+(("#$#%/8-$]3$$ E$4"/5+/$0%$('#$&"('/%#$0%$ cNS7' 7$.)(5+4(+"-0'$%#'#$&"('/%#$4'/$#+$.+#.'$4'0%.'#$".G+"/("$4'/$)-+0"0%#$ 0%$5+%.4'3$F'/$%P%.4('B$5%-%.'#$("$ cNSYiE B$K)%$-'#$+-0+"/8$ -)/'&.0"%#34&#+/,-")'&0#/#'&0&0#40#LJN#:&# :&0':":#GYJN#:8&)1&0'&%K 3$$ Q+$ >"%.'#$ -0B-0-3# B$ '.'$ )-"$ cNSYEEE B$ ')//+/8$ K)%$ _+4"!34&)# 99# 5 "+"# "!"00"# %&)7# Y2TN2_ $ K)%$ %#$ #)$ B,&.%&0.-+' &:9&,+=+' 1 !"# 34&# %&,-)&# &0+/0')",/%# :&# "!"00"#&0#!"%#%&+4&0+*"% 23$$ Q%/@"$ %($ &"('/$ 0%$ &$%-"-;,-#' B-0+" 3$$ !'$ K)%$ 0%+.'#$ %#$ K)%$ #+$ %#"$ 3+:+'=&'6%3+.-/0'#.%,,&'=%,+03&'-0B-0-3#J $#+%.4/%$&'(&%/%.'#$ "$('#$&"('/%#$0%$%K)+(+G/+'$0%$('#$".+-'8+0'#3$ !"$ ."5/+A$ 0%$ 4/'G"G+(+ 0 "0%#$ ('?:'00#$ #'/+-?$ ."5/+M$ 0%$ cNSYiE' #%$ G"#"/8$ %-$ ("$ 6+3,-e' =&' :%;:3-3%.-/0' =&' cNSYiE 3$ E($ =+-"($ 5%-0/%.'#$ )-$ :.#,& B$ "$ 4"/5+/$ 0%$ ("#$ #)G#5+5)+'-%#$ 7$ ("#$ =/%)%-+"#$ /%("5+&"#3$ <'.'$ ('$ >"%.'#$'-$ "#1+,-36#: B$('#$4'0/%.'#$#)."/B$7"$K)%$"($%#5"/$>"G("-0'$0%$4/'G"G+(+0"0%#B$4"/"$'G5%-%/$("$ -)/$"$!:":#'/'"!#'&0:).",/%#34&#,4!'-!+") H$ !"#$%&#'()(+,&!-+.)./( 00 &#)(-1& 22 $3 443331 55 ##** (^) "" 6&3 789:6&3%789: #;" 2"&<=&;5&6&3%33$3 #"; 2;&<=&"5&6&3%33$: Bioinformàtica UNIVERSITAT DE 
BARCELONA^ U B J. Rozas^15 !"#3!"# PAM0 A Ala R Arg N Asn D Asp C Cys Q Gln E Glu G Gly A 100% 0% 0% 0% 0% 0% 0% 0% R 0% 100% 0% 0% 0% 0% 0% 0% N 0%^ 0%^ 100%^ 0%^ 0%^ 0%^ 0%^ 0% D 0% 0% 0% 100% 0% 0% 0% 0% C 0% 0% 0% 0% 100% 0% 0% 0% Q 0%^ 0%^ 0%^ 0%^ 0%^ 100%^ 0%^ 0% E 0% 0% 0% 0% 0% 0% 100% 0% G 0% 0 % 0% 0 % 0 % 0% 0 % 100% Bioinformàtica UNIVERSITAT DE BARCELONA^ U J. Rozas B 16 Bioinformàtica UNIVERSITAT DE BARCELONA U B !"#$%&'()'++%,-%.'/( ;<(4%<='>%<3;<(4%<='>%<* Proteins that (1 substituti >%<3-0?%'2"- @"$4-0-"-0( Alanine: 3644 (The Alanine D7EF8 7G Bioinformàtica UNIVERSITAT DE BARCELONA U B !"#$#%&#'%"(%#'')'%)(&+$#','% -"#%.)&%,/0+)&%-"#%1#/)&%#(% ,-"#..,%"(+$,$%$#%2+#/3)%$#.% !"#$% &#'4(%+5",.%&+#/3'#6%7% -"#%/".2+3.+,($)%#&)&%1,.)'#&% 2#($'#/)&%#.%#-"+1,.#(2#%,%.,% #1)."+8(%$#%.,%&#"#(+,9% !"#$%&'()+,#)-.)/0')1+",+%$'/"+%0'1!"#$%&'()+,#)-.)/0')1+",+%$'/"+%0'** Gly 8.9% Arg 4.1% Ala 8.7% Asn 4.0% LeuLeu 8.5%8.5% PhePhe 4.0%4.0% Lys 8.1% Gln 3.8% Ser 7.0% Ile 3.7% ValVal 6 5% 6 .5% HisHis 3 4% 3 .4% Thr 5.8% Cys 3.3% Pro 5.1% Tyr 3.0% GlGl u 5 0% 5 .0% M tMet 1 5% 1 .5% Asp 4.7% Trp 1.0% Bioinformàtica UNIVERSITAT DE BARCELONA^ U J. Rozas B 19
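The extrapolation from PAM1 to PAM-n and the log-odds step can be illustrated on a toy alphabet. This sketch assumes a hypothetical three-letter "PAM1" (rows = original residue, columns = replacement, rows summing to 1) and scores defined as $10\log_{10}(M_{ij}/f_j)$; published PAM tables use Dayhoff's column convention and round to integers, so this is an illustration of the idea, not a reimplementation.

```python
import math


def mat_mult(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_power(m, n):
    """PAM-n from PAM1 by repeated multiplication (n >= 1)."""
    result = m
    for _ in range(n - 1):
        result = mat_mult(result, m)
    return result


def log_odds(m, freqs):
    """Score s_ij = 10 * log10(M_ij / f_j): target vs. background probability."""
    n = len(m)
    return [[10 * math.log10(m[i][j] / freqs[j]) for j in range(n)] for i in range(n)]


# Toy 3-letter "PAM1" (rows sum to 1); values are hypothetical
pam1 = [[0.990, 0.006, 0.004],
        [0.003, 0.990, 0.007],
        [0.002, 0.008, 0.990]]
pam_far = mat_power(pam1, 3000)   # every row converges to the equilibrium frequencies
equilibrium = pam_far[0]
scores = log_odds(mat_power(pam1, 250), equilibrium)
print([round(f, 3) for f in equilibrium])
```

At very large PAM distances every row equals the equilibrium frequencies, so all log-odds scores tend to 0: the matrix no longer carries any information about homology.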

PAM250 log-odds scores (lower triangle, first five amino acids):
      A    R    N    D    C
A     2
R    -2    6
N     0    0    2
D     0   -1    2    4
C    -2   -4   -4   -5   12
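Because log-odds scores add, a gapless alignment can be scored by summing the matrix entries for each aligned pair. This sketch uses only the five PAM250 rows listed above and a symmetric lookup into the half matrix; `alignment_score` is an illustrative name and gaps are not handled.

```python
# Lower triangle of PAM250 for A, R, N, D, C as listed above
PAM250_PART = {
    ("A", "A"): 2,
    ("R", "A"): -2, ("R", "R"): 6,
    ("N", "A"): 0, ("N", "R"): 0, ("N", "N"): 2,
    ("D", "A"): 0, ("D", "R"): -1, ("D", "N"): 2, ("D", "D"): 4,
    ("C", "A"): -2, ("C", "R"): -4, ("C", "N"): -4, ("C", "D"): -5, ("C", "C"): 12,
}


def score(a, b):
    """Symmetric lookup in the half matrix."""
    return PAM250_PART.get((a, b), PAM250_PART.get((b, a)))


def alignment_score(s1, s2):
    """Gapless alignment score: log-odds values simply add up."""
    return sum(score(a, b) for a, b in zip(s1, s2))


print(alignment_score("ANDC", "ARDC"))  # 2 + 0 + 4 + 12 = 18
```

Adding the scores corresponds to multiplying the underlying odds ratios, which is why the matrix is stored in logarithmic form.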

The branches that join an internal node to an OTU are the terminal branches. We also have internal branches. The innermost node is the root, the MRCA (most recent common ancestor).

In general, trees will have dichotomies, where a branch splits in two. There are also polytomies, where a branch splits into more than two. A polytomy may reflect a lack of information, or it may be real.

Taxa that are more closely related to each other, through an exclusive common ancestor, are sister groups.

Monophyletic and paraphyletic groups

The members of a monophyletic group share a common ancestor, and the group includes all of its descendants.

Birds together with reptiles form a monophyletic group.

Paraphyletic: the members do not form a clade, even though they descend from the same common ancestor.

Birds are not considered reptiles, even though they share a common ancestor with the reptiles.

Sometimes we use external species that we know diverged earlier, as an outgroup.

They allow us to polarize the tree and better establish the direction of time, for example when we are studying mammals.

Slide: Phylogenetic tree inference
Statistical inference (relies on probabilities):
- Models of DNA/protein evolution -> how to choose the model?
- Phylogenetic algorithm/method (NJ, ML, Bayesian, etc.)
- Robustness of the phylogenetic tree (bootstrap, etc.)

Slide: Basic phylogenetic concepts
Leaf / OTU / external node (OTU: Operational Taxonomic Unit); terminal branch; internal node; internal branch; root; MRCA: most recent common ancestor of A to J.
Sister taxa/groups: e.g. D and E (more closely related to each other than either is to a third taxon). The clade of species F+G is the sister group to the clade of species H+I+J. Dichotomy; polytomy.

Slide: Monophyletic and paraphyletic groups
Birds and crocodiles are sister taxa. Reptiles + birds are monophyletic; reptiles are paraphyletic.
Paraphyletic: the group contains some, but not all, of the descendants of a common ancestor; its members do not form a natural clade.
Monophyletic: all taxa in the group derive from a single common ancestor, and the group includes all of its descendants.
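Whether a group is monophyletic can be checked by listing the leaf set of every clade and seeing if the group matches one exactly. This sketch assumes a tree written as nested tuples and a simplified, hypothetical four-taxon topology in which birds are sister to crocodiles; a group that fails the test while its common ancestor also gave rise to the excluded taxa (reptiles without birds) is paraphyletic.

```python
def clades(tree):
    """Leaf sets of all clades in a tree written as nested tuples (leaves are strings)."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found, leaves = [], set()
    for child in tree:
        child_clades = clades(child)
        found.extend(child_clades)
        leaves |= child_clades[-1]  # last entry is the child's own leaf set
    found.append(frozenset(leaves))
    return found


def is_monophyletic(tree, group):
    """True if the group is exactly the set of leaves of some clade."""
    return frozenset(group) in clades(tree)


# Simplified, hypothetical topology: birds are sister to crocodiles
tree = ("Mammal", ("Lizard", ("Crocodile", "Bird")))
print(is_monophyletic(tree, {"Lizard", "Crocodile", "Bird"}))  # True
print(is_monophyletic(tree, {"Lizard", "Crocodile"}))          # False
```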

Rooted and unrooted trees. The root is the common ancestor of the sequences we are working with.

An unrooted tree has no time direction; it only gives the distances between sequences. The trees produced by most methods are unrooted.

There are always more possible rooted trees than unrooted ones, so there are more possible combinations.

Rooted and unrooted trees are written differently. For example, (((A,B),C),(D,E)) would be a rooted tree.
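In Newick notation, a common convention is to write a rooted binary tree with two children at the top level and an unrooted tree with a basal trifurcation. This sketch extracts leaf names and counts the children of the top-level node; it assumes simple Newick strings (no quoted labels or comments), and both helpers are hypothetical.

```python
import re


def newick_leaves(newick):
    """Leaf names: labels that follow '(' or ',' (internal labels follow ')')."""
    return re.findall(r"(?<=[(,])\s*([^(),:;\s]+)", newick)


def root_degree(newick):
    """Number of children of the top-level node."""
    depth, children = 0, 1
    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            children += 1
    return children


rooted = "(((A,B),C),(D,E));"
unrooted = "((A,B),C,(D,E));"
print(newick_leaves(rooted), root_degree(rooted), root_degree(unrooted))
```

Both strings describe the same relationships among A to E; only the top-level node differs, which is exactly what rooting adds.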

Slide: Types of trees. Rooted and unrooted trees
[Figure: the same set of taxa drawn as a rooted and as an unrooted tree]
Rooted trees: have a root that denotes common ancestry.
Unrooted trees: only specify the relationships among taxa, without reference to the direction of evolutionary time (the common ancestor is unspecified).

Slide: 4 OTUs give 3 unrooted and 15 rooted trees
Number of OTUs   Unrooted trees   Rooted trees
2                1                1
3                1                3
4                3                15
5                15               105
6                105              945
7                945              10,395
8                10,395           135,135
9                135,135          2,027,025
10               2,027,025        34,459,425
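The numbers in the table follow the standard counts for bifurcating trees with n labelled OTUs: $(2n-5)!!$ unrooted and $(2n-3)!!$ rooted trees (these formulas are not stated in the notes, but they reproduce every row). A minimal sketch:

```python
def double_factorial(n):
    """n!! for odd n >= -1 (with (-1)!! = 1)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def n_unrooted(n):
    """Unrooted bifurcating trees for n >= 2 labelled OTUs: (2n-5)!!."""
    return double_factorial(2 * n - 5)


def n_rooted(n):
    """Rooted bifurcating trees for n >= 2 labelled OTUs: (2n-3)!!."""
    return double_factorial(2 * n - 3)


for n in range(2, 11):
    print(n, n_unrooted(n), n_rooted(n))
```

Each rooted tree corresponds to placing a root on one of the $2n-3$ branches of an unrooted tree, which is why the rooted count is always $(2n-3)$ times the unrooted one.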


;<=>?>'@AB;C'

@AB;C )$4)5")301..)$)&(+&5+(%)+()'41("&'()/$"01'&(:)<)"'=$+)$+)>?<2)".4)'"'&() !"#$%#&'(#)#%'+,$%)-%

'!"'$!.# :) D )$4)$+)"@3$%.)*$) 9%E93-3%.-#0&9 )7.%)4'1'.)$"1%$)+()4$&5$"&'() -' 9)+()4$&5$"&'() F :)

A.%) $B$37+.2) 4') +() 4$&5$"&'() - ) 9) +() F ) 1'$"$") GHH' 0%."&/3-I#9 2) 4'") /(742) $+) ;JC ) C /"&+.0&'% *'1"'$!'% #&.$2/'$+ D)

1'$"$) EFF) 7.4'&'."$4:) G4(3.4) #H%35+(4) $) +.4) 3.$+.4) *$) $=.+5&'H") C 34567% 89:; D) 9) &(+&5+(3.4) +() *'41("&'()

I') 6(9) JF) '#$%$"&'(42) 1$"%$3.4) 7KF2J:)

L5("*.) &.%%$/'3.4) +() '41("&'() 3$'("1$) +()

#H%35+() *$) M5N$4OL("1.%2) 7."$3.4) 5"() I ) .)

5"() K 2)9)$")$41$)&(4.)4$%P()*$) HLM(( :)

Q() '*$() *$+) @AB;C' $4) +() $) '%) (/%57(".) +(4) &.4(4) $") <"$!.=$% )'% *"% *./.&.+")% -% "% !'>!#$?# :) R41$) 301..)

/$"$%()4'$37%$) N,E#"&9'.#0',+OP :))

R4)$+)@"'&.)301.*.)85$)+.)6(&$2),%-.+$4) #,-&03+I#9'&0'&"'

3-&6Q# 2)85$)1$"%,")5"()%(3'1()'"'&("*.)+()S%(PTU:)

<85P)$4)S%$=$%4'-+$U2)".)7.+(%'T(3.4)9)*$&'3.4)85$)5"()

4$&5$"&'() $41(-() ("1$4) 85$) +() .1%(:) A.%) +.) 1("1.2) 4H+.)

Q.&(+'T(%$3.4) +() '41("&'() 3,4) 7$85$V() C &#% '!"'$!.#%

/,% !'>!#$#% 2'$@+.!#/'$+' D:) R+) 301.*.) (453$) 3+9+'

Phylogenetic Reconstruction Steps

1. MSA (Multiple Sequence Alignment)
2. Phylogenetic tree inference: statistical inference (relies on probabilities)
   - Models of DNA/protein evolution (JC69, K80, BLOSUM, etc.) -> how to choose the model?
   - Phylogenetic algorithm/method:
     - Distance matrix methods (UPGMA, NJ)
     - Maximum parsimony methods (MP)
     - Maximum likelihood methods (ML)
3. Robustness of the phylogenetic tree (bootstrap, etc.)

UPGMA

UPGMA: Unweighted Pair-Group Method with Arithmetic mean

UPGMA: main steps
1. MSA (Multiple Sequence Alignment)
2. Estimate genetic distances between pairs of sequences (applying JC69, K2P, GTR, etc.). UPGMA assumes that the substitution (evolution) rate is constant across lineages.
3. Construct the phylogenetic tree (linear relationship between genetic distance and divergence time).
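The sketch below shows a minimal version of the UPGMA clustering loop in Python. It assumes a precomputed symmetric distance matrix, for example Jukes-Cantor distances. At each step the closest pair of clusters is merged, and the new node is placed at half their distance, which is the constant-rate assumption. The distance from the merged cluster to every other cluster is the arithmetic mean over all leaf pairs, that is, a size-weighted average. The function name and the Newick-style output are illustrative choices. Ties go to whichever closest pair the loop finds first.

```python
def upgma(names, matrix):
    """UPGMA on a symmetric distance matrix (list of lists).

    Returns (newick, root_height). Branch lengths are height differences,
    assuming a constant substitution rate (molecular clock).
    """
    # active clusters: id -> [newick text, number of leaves, node height]
    clusters = {i: [name, 1, 0.0] for i, name in enumerate(names)}
    # pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): matrix[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # 1. find the closest pair of clusters
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        text_i, size_i, h_i = clusters.pop(i)
        text_j, size_j, h_j = clusters.pop(j)
        # 2. place the new node at half the distance (equal rates in both lineages)
        h = dij / 2.0
        text = f"({text_i}:{h - h_i:g},{text_j}:{h - h_j:g})"
        # 3. distance from the merged cluster = arithmetic mean over all leaf pairs,
        #    i.e. the size-weighted average of the two old distances
        merged = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            merged[k] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        for k, v in merged.items():
            d[(k, next_id)] = v  # k < next_id, so keys stay (smaller, larger)
        clusters[next_id] = [text, size_i + size_j, h]
        next_id += 1
    (text, _, height), = clusters.values()
    return text + ";", height


if __name__ == "__main__":
    print(upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]]))
```

In this toy example, A and B are joined first at height 1. C then joins them at height 3, which gives `("(C:3,(A:1,B:1):2);", 3.0)`.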

UPGMA

$d_{ij}$: genetic distance between sequences i and j.
$d_{ij}$ could be $K_{ij}$ (the number of substitutions per site between sequences i and j) corrected by Jukes and Cantor.

For example, if sequences i and j have 100 nucleotides and there are no gaps, the MSA has 100 positions (sites or columns). Suppose there are 20 differences between i and j:

$$p = \frac{20}{100} = 0.2$$
$$d_{ij} = K_{ij} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot 0.2\right) \approx 0.233$$

[Figure: UPGMA tree for sequences 1 to 5, with internal nodes 6 to 9 and the root]
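The worked example above, and the Kimura two-parameter distance (K2P/K80) listed among the UPGMA main steps, can each be written as a short Python function. This is a sketch with hypothetical function names. Both formulas become undefined once the argument of the logarithm reaches zero (for Jukes-Cantor, p ≥ 0.75).

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(p):
    """JC69: K = -(3/4) ln(1 - (4/3) p); undefined for p >= 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(P, Q):
    """K2P/K80: d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q),
    with P = proportion of transitions and Q = proportion of transversions."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


if __name__ == "__main__":
    seq_i = "A" * 100
    seq_j = "A" * 80 + "G" * 20      # 20 differences over 100 sites
    p = p_distance(seq_i, seq_j)      # 0.2
    print(p, round(jukes_cantor(p), 4))
```

K2P reduces to JC69 when transitions make up one third of the differences and transversions two thirds (P = p/3, Q = 2p/3). This is a quick sanity check for both functions.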


&/+./1+,$&=( >$&6$( @/1/0$0( /#( :0A"#(

9$&( 5,&6$1+,$&( 5/( .1( :0A"#( BCDE)(

F&( "60"( $#@"0,6'"=( ;./( +#.&6/0,G$( #$&(

5/( 0/#$H$0( /#( >/+>"( 5/( ;./( #$( 6$&$(

/I"#.6,I$(&/$(+"1&6$16/3((

J"( @/1/0$0:( :0A"#/&( +"1( 0$?G=( %"5/'"&( 6/1/0( 0$'$&( ':&( +"06$&( "( ':&( #$0@$&=( "( A,/1( 6/1/0( +$'A,"&(

/&6"+:&6,+"&3(J"($+$A$(6"5"(/1(/#(',&'"(&,6,"3(

F&6$&( '/6"5"#"@?$&( A$&$5$&( /1( 5,&6$1+,$&( 1"( &"1( '.7( -,$A#/&=( %/0"( &"1( 0:%,5$&( 7( &,( 6/1/'"&( '.+>$&(

&/+./1+,$&(,0:1(A,/13(

0 12343'46' 0 >?70@'A@;B70357@'

J"(&"1('.7(26,#/&(/1('"5/#"&(5/(&/+./1+,$&(5/(8J)("(5/(%0"6/?1$&=(7$(;./(1"(&/(A$&$1(/1(1,1@21('"5/#"(

/I"#.6,I"3(F'%/G$'"&(,@.$#=(+"1(.1( +"/#$#/!&'$C+!"%+# =(%/0"(#./@"($%#,+$'"&("60"&($#@"0,6'"&3(

8/&%.K&(,5/16,-,+$0/'"&(#"&( D"!"&D'"/E&-$!"F&D'%&-'%-D"$&/"* =(;./(&/0:1(#"&(21,+"&(;./(.6,#,G$0/'"&(%$0$(

0/+"1&60.,0( /#( :0A"#( L !"#$% &!"#'&(% )*+,'%

-./$&)'+-0. M3(

F1( /#( $#,1/$',/16"( 5/( .1$&( +.$16$&(

&/+./1+,$&(6/150/'"&(5,&6,16"&(&,6,"&N((

O 7 /F-"/!#D (L .$%*#-1-2'&!)$" M=((

O G-",+#D' /&' "/E&-$*!"F& ( L .$%

*#-1-2'&!)$" M(7((

O B"!"&D' "/E&-$!"F&D' %&-' %-D"$&/"* (

L "-#-$"%3$.3!%1'%4'&-'.#!%)!.$"%/&!+*!.#!%

4!+!" M3(

!.$15"(>$+/'"&(.1$( ("D!/)"'H#/I!")* =(

.&:A$'"&(#"&(&,6,"&(,1I$0,$16/&(7(6$'A,K1(

#"&( 1"( ,1-"0'$6,I"&3( );.?( 1"=( &P#"(

.6,#,G$'"&(#"&( "/E&-$*!"F&D 3()&?=(6/1/'"&(

'.+>$('/1"&(,1-"0'$+,P13((

C"0(/H/'%#"=(+"@/'"&(/#(&,6,"(Q(5/(#$(,'$@/13(C"5/'"&(>$+/0( J'K-,&+#D'%&D",+#D'D"/'-LM')N/(&'O*P'Q'32RD 3(

E,0$0/'"&(+.$16"&(+$'A,"&(>$+/1(-$#6$(%$0$(H.&6,-,+$0(1./&60$(5,&60,A.+,P13((

F1( /&6/( /H/'%#"=( /#( K-,&+' S ( &/0?$( /#( '/H"0=( 7$( ;./( "01$% .!+!"-#'&7')$"% .'% )#'+-0. ( %$0$( @/1/0$0( /&6$&(

&/+./1+,$&3(F#(12'/0"('?1,'"(5/(+$'A,"&(%$0$(H.&6,-,+$0(/&$(5,&60,A.+,P1(&/0?$(R3( 8-%"01$%.$"%/-9(&')$"%!.%!1%

MP, maximum parsimony

MP: main steps
1. MSA (Multiple Sequence Alignment)
2. Identify parsimony-informative sites
3. Evaluate all trees: the tree that requires the lowest number of changes is chosen

[Figure: example alignment with columns labelled I (invariant site), U (uninformative site) and * (parsimony-informative site)]

PIS, parsimony-informative site: a variable site with at least two different variants, each occurring at least twice (such sites favor a subset of trees over the other possible trees).
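The Python sketch below makes the MP main steps concrete. It assumes binary trees written as nested tuples and one character per site. `is_parsimony_informative` applies the PIS definition above. `fitch_steps` counts the minimum number of changes for a site on a given tree using Fitch's algorithm, which is a standard way to count steps; the notes do not name a specific algorithm. Taxon names and states are hypothetical.

```python
from collections import Counter


def is_parsimony_informative(column):
    """PIS: at least two different states, each present in at least two sequences."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


def fitch_steps(tree, states):
    """Minimum number of changes for one site on a binary tree (Fitch algorithm).

    tree: nested 2-tuples of taxon names, e.g. (("sp1", "sp2"), ("sp3", "sp4"))
    states: taxon name -> character state at this site
    """
    def visit(node):
        if isinstance(node, str):               # leaf: its observed state
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        cost = left_cost + right_cost
        common = left_set & right_set
        if common:                              # no change needed at this node
            return common, cost
        return left_set | right_set, cost + 1   # one extra change
    return visit(tree)[1]


if __name__ == "__main__":
    site = {"sp1": "A", "sp2": "A", "sp3": "G", "sp4": "G"}   # hypothetical PIS
    trees = {
        "Tree#1": (("sp1", "sp2"), ("sp3", "sp4")),
        "Tree#2": (("sp1", "sp3"), ("sp2", "sp4")),
        "Tree#3": (("sp1", "sp4"), ("sp2", "sp3")),
    }
    print(is_parsimony_informative(site.values()))
    for name, tree in trees.items():
        print(name, fitch_steps(tree, site))
```

With states A, A, G, G, the tree that groups sp1 with sp2 needs 1 step and the other two topologies need 2 steps each. This is the same pattern as in the MP site example of these notes. In a full MP analysis the steps are summed over all informative sites for each tree.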

MP. Site

[Figure: a single parsimony-informative site (*) mapped onto three alternative trees; Tree#1 requires 1 step, Tree#2 and Tree#3 require 2 steps each]

This parsimony-informative site favors Tree#1 over the other 2 trees.

NJ, neighbor-joining: finds the shortest (minimum evolution) tree by finding neighbors that minimize the total length of the tree. The closest pairs are chosen as neighbors and then joined in the distance matrix as a single OTU.

Neighbor-joining tree
[Figure: neighbor-joining tree of D. mimica, D. nigra, S. albovittata, D. crassifemur, D. affinidisjuncta, D. heteroneura, D. adiastola, D. mulleri, S. lebanonensis, D. melanogaster and D. pseudoobscura, shown as an unrooted tree and rooted on the midpoint; scale bar 0.05]
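Below is a compact Python sketch of neighbor-joining. It uses the usual Q-criterion formulation, Q(i, j) = (n - 2) d(i, j) - R_i - R_j, where R_i is the sum of distances from i to all other nodes. This is the standard Saitou-Nei presentation and may not be the exact variant used in the course. Function and node names are hypothetical, and the sketch assumes that no input OTU is already named U1, U2, and so on. The distance matrix in the test is a small illustrative example chosen so the joins can be checked by hand.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a dict-of-dicts distance matrix.

    Returns (joins, last_edge): each join is (i, j, branch_i, branch_j, new_node);
    last_edge connects the two nodes left at the end. The result is unrooted.
    Assumes no input OTU is named U1, U2, ... (those labels are used for new nodes).
    """
    d = {a: dict(dist[a]) for a in names}      # working copy, grows with new nodes
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][k] for k in nodes if k != a) for a in nodes}
        # Q criterion: the pair whose joining gives the shortest total tree length
        _, i, j = min(((n - 2) * d[a][b] - r[a] - r[b], a, b)
                      for x, a in enumerate(nodes) for b in nodes[x + 1:])
        branch_i = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        branch_j = d[i][j] - branch_i
        u = f"U{len(joins) + 1}"                # new node = the joined OTU
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j, branch_i, branch_j, u))
    a, b = nodes
    return joins, (a, b, d[a][b])
```

Because NJ returns an unrooted tree, a separate step such as midpoint rooting (as in the Drosophila figure) is needed to place a root.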


A&+2$."#0&(7*$0(."'1+&(2$+$%&'(;3$('3)&+$#B 84 (

+%+.+!&%+%/%)!$% 4(C+"(D$E(2$+$%&'(0&'("2&'(

F1+"0$',(<"#$%&'(3+(=#G&0(%"#."+*&($0()&#.$+2"6$(

>+( $'2$( ."'&,( HI( $'2"#@"+( 63+2&'( $+( 3+( :7;)

7 7%-#%)#!* $.7,-#. 8( %1$+2#"'( ;3$( +&( '"G#@"%&'(

<..1(1!'#?) /'(3+()#&./'(J"D"03".1K($(0"('1?+1F1.".1K($'2"@'21."($0(+&'2#$("#G#$(F10&?$+L21.4(>+'(."0.30"#M(0$'(

)#&G"G1012"2'($(;3$(.$#2("#G#$(F&'(.&+'2#3N2("()"#21#($(0$'(+&'2#$'("$'4(

[Figure: bootstrap example. A tree is inferred from the true data set; 1000 pseudoreplicates yield 3 topologies for taxa A, B, C and D (rooted with an outgroup). Clade frequencies across replicates: AB 90%, ABC 60%, CD 40%, BCD 10%. Consensus bootstrap tree with support values 90 (AB) and 60 (ABC).]

Eventually the two runs should reach convergence.

An MCMC analysis is performed in three steps: first, a Markov chain is started with a tree that may be randomly chosen. Second, a new tree is proposed. Third, the new tree is accepted with some probability. Typically tens to hundreds of thousands of MCMC iterations are performed. The proportion of time that the Markov chain visits a particular tree is an approximation of the posterior probability of that tree. Some authors have cautioned that MCMC algorithms can give misleading results, especially when data have conflicting phylogenetic signals (Mossel and Vigoda, 2005).

Fourth, summarize the samples. MrBayes provides a variety of summary statistics including a phylogram, branch lengths (in units of the number of expected substitutions per site), and clade credibility values. An example for 13 globin proteins is shown in Fig. 7.31. The summary statistics for a Bayesian analysis are provided. They include a list of all trees sorted by their probabilities; this is used to create a "credible" list of trees (Fig. 7.31a). A consensus tree showing branch lengths and support values for interior nodes is generated (Fig. 7.31b).

Bayesian inference of phylogeny resembles maximum likelihood because each method seeks to identify a quantity called the likelihood, which is proportional to the probability of observing the data conditional on a tree. The methods differ in that Bayesian inference includes the specification of prior information and uses MCMC to estimate the posterior probability distribution. Although they were introduced relatively recently, Bayesian approaches to phylogeny are becoming increasingly commonplace.
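The following toy Python example illustrates the three MCMC steps described above. It assumes only three candidate trees, with made-up unnormalized posterior values (likelihood × prior). Each proposal picks another tree uniformly at random. Because this proposal is symmetric, the plain Metropolis acceptance rule applies. The fraction of iterations the chain spends at each tree then approximates that tree's posterior probability. Real programs such as MrBayes propose trees with topology and branch-length moves and compute likelihoods from the alignment. This sketch does not reproduce their implementation.

```python
import random


def metropolis_over_trees(weights, n_iter, seed=1):
    """Toy Metropolis sampler over a fixed set of (at least two) trees.

    weights: tree label -> unnormalized posterior (likelihood x prior); hypothetical.
    Returns the fraction of iterations the chain spent at each tree.
    """
    rng = random.Random(seed)
    trees = list(weights)
    current = rng.choice(trees)                  # 1. start from a randomly chosen tree
    visits = dict.fromkeys(trees, 0)
    for _ in range(n_iter):
        # 2. propose a different tree, uniformly (a symmetric proposal)
        proposal = rng.choice([t for t in trees if t != current])
        # 3. accept with probability min(1, posterior ratio) (Metropolis rule)
        if rng.random() < min(1.0, weights[proposal] / weights[current]):
            current = proposal
        visits[current] += 1
    return {t: v / n_iter for t, v in visits.items()}


if __name__ == "__main__":
    posterior = {"Tree#1": 6.0, "Tree#2": 3.0, "Tree#3": 1.0}  # made-up values
    print(metropolis_over_trees(posterior, 100_000))
```

With these weights the visit fractions should come out close to 0.6, 0.3 and 0.1. Running the chain longer tightens the approximation.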

Stage 5: Evaluating Trees

After you have constructed a phylogenetic tree, how can you assess its accuracy? The main criteria by which accuracy may be assessed are consistency, efficiency, and robustness (Hillis, 1995; Hillis and Huelsenbeck, 1992). One may study the accuracy of a tree-building approach or the accuracy of a particular tree.

The most common approach is bootstrap analysis (Felsenstein, 1985; Hillis and Bull, 1993). Bootstrapping is not a technique to assess the accuracy of a tree. Instead, it describes the robustness of the tree topology. Given a particular branching order, how consistently does a tree-building algorithm find that branching order using a randomly resampled version of the original data set? Bootstrapping allows an inference of the variability in an unknown distribution from which the data were drawn (Felsenstein, 1985).

Nonparametric bootstrapping is performed as follows. A multiple sequence alignment is used as the input data to generate a tree using some tree-building method. The program then makes an artificial data set of the same size as the original data set by randomly picking columns from the multiple sequence alignment. This is usually performed with replacement, meaning that any individual column may appear multiple times (or not at all). A tree is generated from the randomized data set. A large number of bootstrap replicates are then generated; typically, between 100 and 1000 new trees are made by this process. The bootstrap trees are compared to the original, inferred tree(s). The information you get from bootstrapping is the frequency with which each clade in the original tree is observed.

An example of the bootstrap procedure using MEGA4 is shown in Fig. 7.20. For each clade in the original tree, the program reports the percentage of bootstrap replicates that support it. Bootstrap values above 70% are sometimes considered to provide support for the clade designations. Hillis and Bull (1993) have estimated that such values provide statistical significance at the p < 0.05 level. This approach measures the effect of random
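The column-resampling step of nonparametric bootstrapping can be sketched in Python as follows. The sketch assumes that the alignment is stored as a dict of equal-length, gap-free strings. Tree building itself is left out. Instead, `clade_support` takes the clades found in each replicate tree (hypothetical input) and reports the percentage of replicates that contain a given clade. Function names are illustrative.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One nonparametric pseudoreplicate: the same number of columns,
    drawn at random with replacement (the same columns for every sequence)."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def clade_support(replicate_clades, clade):
    """Percentage of replicate trees whose set of clades contains `clade`."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)


if __name__ == "__main__":
    rng = random.Random(2013)
    aln = {"A": "ACGTACGTAC", "B": "ACGTTCGTAC", "C": "ACCTTCGAAC", "D": "TCCTTCGAAA"}
    replicates = [bootstrap_replicate(aln, rng) for _ in range(1000)]
    print(replicates[0])
    # Next (not shown): build one tree per replicate with any method,
    # collect its clades as frozensets, then call clade_support.
```

Drawing one shared list of column indices is what keeps each resampled site intact across all sequences, so the replicate is still a valid alignment.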


1 !"#$%&'()'+,-+.-/0'1&023-.+'4'&5#"%.-/0'6#"&.%"+,' ()7)'+,-+.-/0'1&0/6-.+'4'1&0/6-.+'.#68+,+9+'

!"# $%# &'($)+,"-# .)&/&"# *%01+(.# &"# &$# 2&"(0%# 3)&# 4%56"# 3)&# .&# 5&&"# ")&'%.# &.7&+&.# 8# 25)7(.#

9+$(2&":;+(.<# =&"&0(.# /+9&5&";&.# /&"(0+"%+("&.# 8# $%.+9+%+("&.#/&#$(.#75(&.(.#/&#&.7&+%+,"<# !"# &.;&# &>&07$(# '&0(.# (0(# &"# &$# %"&.;5(# (0?"# .&# 75(/)&# )"%# 9%8"-.+.-/0'120-.+ -#/&#$%#)%$#"%&56"#$%.# !"#$%"&'"()"'*'+$,"' #$' ("' -(.+/%" <# @# 7%5;+5# /&# %$$A-# *%/%# )"%# /&# $%.# *(7+%.# +56# *%01+%"/(-#2&"&5%"/(#/+.;+";(.#2&"&.<#

:;<;=' >?@ABC:C= B# $(.# /(.# 2&"&.# /+.;+";(.# 3)&# /&5+'%"# 9&' %0'

6-D6#'1&0E'&0'"+'6-D6+'&D8&.-& <# 0(.+/%"'"()"'$%'1",2%'*'-(.+/%"'

+$,"'$%'1",2%3'

:;<;=' C@FGBC:C= B# &$# 6-D6#' 1&0' &0' 9-D3-03+D' &D8&.-&D <# 0(.+/%"' "()"'

$%'1",2%'*'-(.+/%"'"()"'$%'4.((.3'

=(/(.# $(.# 0+&015(.# /&# $%# 9%0+$+%# /&# $%.# 2$(1+"%.# 75('+&"&"# &$# 0+.0(# 2&"# %"&.;5%$-# 3)&# ;&"A%# $%# 9)"+,"# /&# ;5%".7(5;&# 8# %7;%+,"# /&# (CA2&"(<# !.;&# 2&"# .&# /)7$+,-# 8# /+(# $)2%5# %# $%.# 6-#1"#H-0+D # 8# %# $%.# I&6#1"#H-0+D <# D&.7):.# /&$# 75(&.(# /&# /)7$+%+,"-# 4%8# )"%# /+'&5.+9+%+,"#9)"+("%$<# E+#4%&0(.#)"%#5&(".;5)+,"#.,$(#("# #,3/"#1#D -#$%#4+.;(5+%#&'($);+'%# /&# $%.# &.7&+&.# "(.# .%$/56# 1+&"<# !$# 75(1$&0%# &.# 6&J."+, -# 8# (2&5# 8+,K"#1#D' 8# #,3/"#1#D -# &"# ;%$# %.(# %.' 4.#1$5.&' 1$!.%&,16/1' ("' )/(.-$%7,/!" <# E+# ;5%1%>%0(.# ("# 2&"&.# 3)&# ;+&"&"# 6%.I+D' .#8-+D' &0' &"' 1&0#6+ -# $%# 75(1%1+$+/%/# /&# &"(";5%5.&# ("# (5;,$(2(.#8#7%56$(2(.#&.#0)8#&$&'%/%-#%.A#3)&#;&"&0(.#3)&#+5#("#)+/%/(<# F"#2&"#%"&.;5%$#.&#/)7$+%-#8#$(.#/(.#2&"&.#.&56"#+2)%$&.#&"#.&)&"+%#8#&"#9)"+,"<#G)&/&#.&5#3)&#%$2)"(# 0%";&"2%# $%# 9)"+,"# 8# &$# (;5(# '%8%# %01+%"/(-# (# 3)&# .&# 3)&/&"# +2)%$-# &;H# I%/%# )"(# ;&"/56# &5#"%.-/0' -09&8&09-&03& <#

()L)'*+,-+.-/0'-03,+&D8&.MN-.+'

=<>D #

J&"&5%$0&";&#"(#("(&0(.#$%# H+D&'+0.&D3,+"' /&#)"#2&"(0%-#"(50%$0&";&#$(#3)&#.%1&0(.#&.#3)%8#)"%#

+"3&,0+3-5+ - #8#3)&5&0(.#(07%5%5#/(.#.&)&"+%.#("#/&;&50+"%/(#25%/(#/&#/+9&5&"+%<# D&";5(# /&# )"%# 0+.0%# &.7&+&-# 4%8# 0)4%# 5+,-+.-/0 <# K(.# ELG.# .&# 75&.&";%56"# /&# )"%# 9(50%# )# (;5%# &"# $(.# +"/+'+/)(.# /&# )"%# 7(1$%+,"# M 4"1"' $(' 89:;' <"+1=' "(-6/$%'>6$',$%-"'6%"'?''.,1.&'6%"'@ N<#K%#0%8(5A%#/&# 7($+0(59+.0(.#"(#;+&"&"#"+"2?"#&9&;(-#.("# 0&%3,#D <# G)&/&#.&5#3)&#;&"2%0(.#+"9(50%+,"#.(15&#&$#%"&.;5(-# )%"/(# ;&"&0(.# )"# #%31,#%8 <# I(07%5%"/(# 4)0%"(.-# .+# 0+5%0(.# &$# =<>' 3)&# ;&"2%# )"# .I-68+0.2 -# &.(# "(.# 7(/5A%#.&5'+5#(0(#();25()7<# G&5(# .+# .,$(# ;&"&0(.# +"9(50%+,"# .(15&# $%# &.7&+&#


-Paralogous & Trees

e species: A, B and C 3; alpha orthologous group 6; beta orthologous group

U

B

If they are monomorphic, all individuals carry the same variant.

SNP 1 and SNP 2 in a coding sequence:
Ind_1: CTC ACT AGA -> Leu Thr Arg
Ind_2: CTG ACT AAA -> Leu Thr Lys
SNP 1 (C/G, third position of the first codon): synonymous SNP (sSNP), Leu -> Leu
SNP 2 (G/A, second position of the third codon): nonsynonymous SNP (nsSNP), Arg -> Lys
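The distinction between synonymous and nonsynonymous SNPs can be checked mechanically by translating the codon that contains each SNP. The Python sketch below uses the standard genetic code (NCBI translation table 1) and applies it to the Ind_1/Ind_2 example. The function names are hypothetical, and the sketch assumes the sequences start at the first position of a codon.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(seq):
    """Translate complete codons (reading frame starting at position 0)."""
    return "".join(CODON_TABLE[seq[p:p + 3]] for p in range(0, len(seq) - len(seq) % 3, 3))


def classify_snp(seq1, seq2, pos):
    """Compare the codon containing 0-based position `pos` in both sequences."""
    start = pos - pos % 3
    aa1 = CODON_TABLE[seq1[start:start + 3]]
    aa2 = CODON_TABLE[seq2[start:start + 3]]
    return "synonymous (sSNP)" if aa1 == aa2 else "nonsynonymous (nsSNP)"


if __name__ == "__main__":
    ind_1, ind_2 = "CTCACTAGA", "CTGACTAAA"
    print(translate(ind_1), translate(ind_2))
    for pos in (i for i, (a, b) in enumerate(zip(ind_1, ind_2)) if a != b):
        print(pos, classify_snp(ind_1, ind_2, pos))
```

The sketch compares single SNPs one at a time. If two SNPs fall in the same codon, they would need to be considered together.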


@ !"#$%&'()'+,)'(&-(.#(/'+'-+01*.#23/'(,.4 A6##

5%'0-%&# >0)# 1(-9+:$# 0"# './.) *!0) 1!" <# B#

'0$1)%#'0#0&10#"%&#'+,0)0$10ř&#=20#C(B(<#

- .!(/+!'!()'()+0%!(

?0$:1+.(&<#'0$1)%#'0#"(#"+9)0) 8 (#'0"#4DEF<#=20#

5(/6!# # 0&# 2$(# 9(&0# '0# '(1%&# 0&;0.+("+G('(#

0$# .0-"!.'-!"#()'&0#-/0! <#;()(#"(# )(5#23'6+36('(&,7'3"(&,)+'89: 6#5%'0-%�$.%$1)()#0"#)0;%&+1%)+%#=20#C(#

0$.%$1)('%# 0"# +$>0&1+?('%)# ;()(# C(.0)# 2$# 1)(9(H%# 0$# ;()1+.2"()6# !$# 5%;&01# 0$.%$1)()0-%&# 2$(# .%"0..+/$# '0#

&0.20$.+(&#&+-+"()0&#=20#C($#&+'%#($("+G('(&#;()(#'010)-+$()#"(#)0"(.+/$#0>%"21+>(#'0#2$(#;%9"(.+/$ 6

8./3./ #0&#"(#9(&0#'0#'(1%&#-&#+-;%)1($10#@ (&'"3'-)+;(.,+ A#'0# './.9)!):./0(#-/(9 #@ .+3<"3,+&'-),#."1)(&'

6('89:& A#'0#"(#0&;0.+0#C2-($(6#D2($'%#&0#+$+.+/#0"#;)%B0.1%#&0#92&.(9(#)0.%;+"()#+$,%)-(.+/$#;()(#'010.1()#

345&#=20#10$?($#=20#>0)#.%$#;(1%"%?8(&<#'+&0I%#'0#,*)-(.%&<#01.J#

'+&1+$1(&# 01$+(&<# B# "%# =20# &0#

.%$# KLE# '0# '+&1($.+(# -0'+(#

!&1%#(;"+.(#(# :='."(9) B#(#%1)(&#

M(# -(B%)8(# '0# ;%&+.+%$0&# '0"#

N47#'0#"(#0&;0.+0#C2-($(<#&%$#

&%$# '("('>$+-,.9 <# C(B# ;%.(&#

=20#&0($#-2B#>()+(9"0&6#30#1)(1/#'0#0$.%$1)()#(=20""(&#=20#0)($#;%"+-/),+.(&<#;()(#'0,+$+)#(#"%඀#7#;()1+)#

'0#"%ř&#%910$+'%&<#.(".2"(-%&#"%&#C(;"%1+;%&<#B#0$1%$.0&.(-%&#"%&# #.1)6<59 6

!&%&# #.16<59) &%$# 0"# 8%&9"&0%( #:&$#%( )'( 456!( ;"'( 8+<+80'<$=+&( +( ,%!( >+?,%0$?%! 6# !"# C(;"%1+;%# O# 0&1()*#

."(&+,+.('%#;%)#0&1%&#PQ#345&<#B#&+#.%$#R#345&#&%-%&#.(;(.0&#'0#.()(.10)+G()#("#C(;"%1+;%<#$%#$0.0&+1()0-%&#

C(.0)#0"#?0$%1+;%#'0#1%'%&#;()(#&(90)#&2#C(;"%1+;%6##

30# 1)(1(# '0# ;%'0)# ."(&+,+.()# (# "%&# '+&1+$%&# C(;"%1+;%&# .%$# 0"# - 8 $+-%# $S-0)%# '0# 345&<# ;()(# %;1+-+G()# "%&#

)0.2)&%&#'0#;)0'+..+/$#B#'010..+/$ 6

HapMap

HapMap: International HapMap Project
- Haplotype: a specific set of alleles (e.g. SNP variants) on a chromosome.
- HapMap: data on the common patterns of human DNA sequence variation (SNPs, haplotypes, etc.). Useful for finding genes/variants affecting health, disease, the response to drugs or environmental factors.
- DNA samples from 270 people: Yoruba (Nigeria), Japan, China and USA (European ancestry).
- Data: SNP and genotype and haplotype frequencies; measures of SNP association (LD, linkage disequilibrium), among others.
- Phase I (2002-2005): >1 million SNPs (on average 1 SNP every 5 kb); 300,000 tagSNPs.
- Phase II-III (2006-): more individuals, populations, SNPs, haplotypes, etc.

SNPs, Haplotypes & tagSNPs
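One SNP association measure is linkage disequilibrium. The Python sketch below computes the linkage disequilibrium coefficient D = p_AB - p_A p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)) from a list of two-SNP haplotypes. The allele coding (A/a at the first SNP, B/b at the second) and the example haplotypes are hypothetical. HapMap itself distributes precomputed LD values, and this is not its pipeline.

```python
def linkage_disequilibrium(haplotypes):
    """D and r^2 for two biallelic SNPs.

    haplotypes: list of 2-character strings, allele at SNP 1 then SNP 2,
    coded A/a and B/b (hypothetical coding), e.g. ["AB", "ab", "Ab"].
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n      # frequency of allele A
    p_b = sum(h[1] == "B" for h in haplotypes) / n      # frequency of allele B
    p_ab = sum(h == "AB" for h in haplotypes) / n       # frequency of haplotype AB
    D = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = D * D / denom if denom > 0 else float("nan")   # undefined if a SNP is monomorphic
    return D, r2


if __name__ == "__main__":
    print(linkage_disequilibrium(["AB", "AB", "ab", "ab"]))   # complete association
    print(linkage_disequilibrium(["AB", "Ab", "aB", "ab"]))   # no association
```

A high r² between two SNPs means that genotyping one of them predicts the other. This is the idea behind choosing tagSNPs.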