http://www.supersmart-project.org

Self-Updating Platform for the Estimation of Rates of Speciation, Migration, And Relationships of Taxa

Rutger Vos, Naturalis Biodiversity Center, Leiden, the Netherlands

@rvosa
  
Methods to construct large species trees

[Figure: flowcharts comparing three ways to build large species trees.
Supertree panel: alignments; tree inference; tree inference (same study); set of (gene) trees; supertree methods (e.g. MRP); calibrate.
Supermatrix panel: alignments; concatenation; tree inference (e.g. ML, Bayesian); calibrate.
SUPERSMART panel: list of species (or root taxa); identify & align orthologous sequences; tree inference; backbone tree (genus level); decompose & add markers; full alignment (species or below); infer trees & place in backbone; re-calibrate.]

http://www.supersmart-project.org - @rvosa
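Supertree methods such as MRP (matrix representation with parsimony) recode every input tree as binary characters, one character per clade, so that a parsimony search over the combined matrix yields the supertree. A minimal sketch of the standard Baum/Ragan coding, assuming trees are given as nested Python tuples of taxon names; the function names are hypothetical and not part of SUPERSMART:

```python
from typing import Dict, List, Set, Union

# A rooted tree as nested tuples of taxon names, e.g. (("A", "B"), "C").
Tree = Union[str, tuple]


def leaves(tree: Tree) -> Set[str]:
    """Return the set of taxon names below a node."""
    if isinstance(tree, str):
        return {tree}
    result: Set[str] = set()
    for child in tree:
        result |= leaves(child)
    return result


def informative_clades(tree: Tree) -> List[Set[str]]:
    """Leaf sets of internal nodes, excluding the root (it contains every taxon)."""
    all_taxa = leaves(tree)
    clades: List[Set[str]] = []

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        members = leaves(node)
        if 1 < len(members) < len(all_taxa):
            clades.append(members)
        for child in node:
            walk(child)

    walk(tree)
    return clades


def mrp_matrix(trees: List[Tree]) -> Dict[str, str]:
    """Baum/Ragan MRP coding: one binary column per clade of each input tree.

    '1' = taxon inside the clade, '0' = taxon in that tree but outside the clade,
    '?' = taxon absent from that input tree.
    """
    taxa = sorted(set().union(*(leaves(t) for t in trees)))
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for tree in trees:
        present = leaves(tree)
        for clade in informative_clades(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon].append("?")
                elif taxon in clade:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(chars) for taxon, chars in rows.items()}
```

For two input trees with partly overlapping taxa, `mrp_matrix([((("A", "B"), "C"), "D"), (("B", "C"), "E")])` produces three columns, with `?` wherever a taxon is missing from a tree. Many MRP implementations also add an all-zero outgroup row before the parsimony search; that step is omitted here.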
  
The SUPERSMART algorithm

[Figure: schematic of the SUPERSMART algorithm; trees are drawn on axes labelled "Time (Ma)" (80 to 0) and "Relative time" (1 to 0).]
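The figure pairs trees in absolute time (Ma) with trees in relative time (1 to 0). In line with the "infer trees & place in backbone" and "re-calibrate" steps of the previous slide, a clade tree inferred in relative time can be stretched onto the age of the matching backbone node. A minimal sketch, assuming each clade tree is ultrametric with relative node heights (crown = 1, tips = 0) and that the dated backbone supplies the crown age; `scale_clade` and its inputs are hypothetical, not SUPERSMART's code:

```python
from typing import Dict, Optional


def scale_clade(
    heights: Dict[str, float],
    parents: Dict[str, Optional[str]],
    crown_age_ma: float,
) -> Dict[str, float]:
    """Convert a clade tree from relative time to absolute time (Ma).

    heights: relative node heights, crown node = 1.0, extant tips = 0.0.
    parents: parent of each node; the crown node maps to None.
    crown_age_ma: age of the corresponding node in the dated backbone (assumed input).
    Returns branch lengths in millions of years for every non-crown node.
    """
    crowns = [node for node, parent in parents.items() if parent is None]
    if len(crowns) != 1:
        raise ValueError("expected exactly one crown node")
    if crown_age_ma <= 0:
        raise ValueError("crown age must be positive")
    # A single factor maps the whole clade from relative units onto the backbone's time scale.
    scale = crown_age_ma / heights[crowns[0]]
    branch_lengths: Dict[str, float] = {}
    for node, parent in parents.items():
        if parent is not None:
            # A branch spans the height difference between a node and its parent.
            branch_lengths[node] = (heights[parent] - heights[node]) * scale
    return branch_lengths
```

Because one factor is applied to every branch, the clade stays ultrametric and every root-to-tip path sums to the backbone's crown age.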
  
Data mining
  
  
[Figure: matrix of taxon coverage (data available) for alignments A1 to A7 across species S1 to S7, with panels labelled "Alignments", "Exemplar species" and "Minimally sparse supermatrix (minimum of two markers per exemplar species)".]
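The last panel requires at least two markers per exemplar species. One simple way to approximate such a minimally sparse supermatrix is a greedy pass over candidate alignments followed by concatenation, padding missing taxa with `?`. This is a sketch under stated assumptions: the greedy rule, tie-breaking by marker name, and the function names are hypothetical and not necessarily how SUPERSMART selects markers.

```python
from typing import Dict, List


def select_markers(
    alignments: Dict[str, Dict[str, str]],
    exemplars: List[str],
    min_markers: int = 2,
) -> List[str]:
    """Greedily pick alignments until every exemplar has min_markers markers.

    alignments: marker name -> {species name -> aligned sequence}.
    Stops early when no remaining alignment covers an under-covered exemplar.
    """
    coverage = {species: 0 for species in exemplars}
    chosen: List[str] = []
    remaining = dict(alignments)

    def gain(marker: str) -> int:
        # Number of still under-covered exemplars this marker would help.
        return sum(1 for s in remaining[marker] if s in coverage and coverage[s] < min_markers)

    while remaining and any(c < min_markers for c in coverage.values()):
        best = max(sorted(remaining), key=gain)  # sorted() makes ties deterministic
        if gain(best) == 0:
            break
        chosen.append(best)
        for species in remaining.pop(best):
            if species in coverage:
                coverage[species] += 1
    return chosen


def concatenate(
    alignments: Dict[str, Dict[str, str]], markers: List[str], taxa: List[str]
) -> Dict[str, str]:
    """Build a supermatrix; a taxon missing from a marker gets '?' for its full length."""
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for marker in markers:
        aln = alignments[marker]
        length = len(next(iter(aln.values())))
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "?" * length))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}
```

The greedy pass first takes the marker that helps the most under-covered exemplars, so widely sampled markers tend to enter the supermatrix before taxonomically narrow ones.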
Effect of data mining parameterization
  
  
[Figure: averaged posterior probabilities on nodes (classes p < 0.8, 0.8 < p < 0.95, p > 0.95) plotted against minimum marker coverage per taxon (x axis, 0 to 20) and maximum averaged uncorrected pairwise distance (y axis, 0.05 to 0.20).]
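Both axes of the plot are data mining parameters. The uncorrected pairwise distance (p-distance) is the proportion of compared sites at which two sequences differ; averaging it over all sequence pairs gives one value per alignment that can be checked against a maximum. A minimal sketch, assuming gaps, `?` and `N` are skipped rather than counted as differences, and that the boundary values 0.8 and 0.95 fall into the middle support class (the legend does not say); all names are hypothetical:

```python
from itertools import combinations
from typing import Dict, List, Optional

# Characters ignored when comparing two sequences (assumption).
MISSING = set("-?N")


def p_distance(a: str, b: str) -> Optional[float]:
    """Share of differing sites among sites where both sequences have a base."""
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in MISSING or y in MISSING:
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else None


def mean_p_distance(alignment: Dict[str, str]) -> Optional[float]:
    """Average p-distance over all sequence pairs sharing at least one comparable site."""
    values: List[float] = []
    for a, b in combinations(alignment.values(), 2):
        d = p_distance(a, b)
        if d is not None:
            values.append(d)
    return sum(values) / len(values) if values else None


def passes_filter(alignment: Dict[str, str], max_distance: float) -> bool:
    """Keep an alignment only if its mean p-distance does not exceed the threshold."""
    d = mean_p_distance(alignment)
    return d is not None and d <= max_distance


def support_class(posterior: float) -> str:
    """Bin a node's posterior probability into the plot's three classes."""
    if posterior < 0.8:
        return "p < 0.8"
    if posterior <= 0.95:  # boundary handling is an assumption
        return "0.8 < p < 0.95"
    return "p > 0.95"
```

Raising the maximum distance admits more divergent (and harder to align) markers, while raising the minimum marker coverage drops poorly sampled taxa; the plot explores that trade-off.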
  
Recovering a simulated tree
  
[Figure: simulated tree and re-estimated tree (time in myr, 80 to 0), plus panels comparing empirical (data) and simulated (sim) alignments: % of invariant sites per alignment; number of indels per alignment; average distance within alignment (relative edit distance); average size of indels per alignment (nucleotides); number of gaps per sequence; % gaps per sequence.]
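The panels compare summary statistics of empirical (data) and simulated (sim) alignments. A minimal sketch of how several of them could be computed for a single alignment, assuming `-` is the only gap symbol, each maximal run of `-` in a sequence is one gap, and identical gap runs shared by several sequences count as one indel; these definitions and `alignment_stats` are assumptions, not taken from SUPERSMART, and the relative edit distance panel is not reproduced:

```python
import re
from statistics import mean
from typing import Dict

GAP_RUN = re.compile(r"-+")


def alignment_stats(alignment: Dict[str, str]) -> Dict[str, float]:
    """Summary statistics for one alignment of equal-length sequences ('-' = gap)."""
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same aligned length")

    # A site counts as invariant if its non-gap characters show at most one state.
    invariant = sum(1 for column in zip(*seqs) if len({c for c in column if c != "-"}) <= 1)

    # Each maximal run of '-' in a sequence is one gap; identical (start, end)
    # runs shared by several sequences are counted once as an indel event.
    gap_spans = [[m.span() for m in GAP_RUN.finditer(s)] for s in seqs]
    indels = {span for spans in gap_spans for span in spans}
    n_gaps = sum(len(spans) for spans in gap_spans)

    return {
        "pct_invariant_sites": 100.0 * invariant / length,
        "n_indels": float(len(indels)),
        "mean_indel_size": mean(end - start for start, end in indels) if indels else 0.0,
        "gaps_per_sequence": n_gaps / len(seqs),
        "pct_gaps_per_sequence": mean(100.0 * s.count("-") / length for s in seqs),
    }
```

Computing the same dictionary for every empirical and every simulated alignment gives paired distributions like those in the data/sim panels.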
  
Install a Vagrant box, then:

$ smrt [COMMAND] [OPTIONS]
  
  
Acknowledgements	
  
  
The smarties:
• Alexandre Antonelli
• Rutger Vos
• Hannes Hettling
• Mike Sanderson
• Bengt Oxelman
• Karin Nilsson
• Mats Töpel
• Hervé Sauquet
• Henrik Nilsson
• Daniele Silvestro
• Fabien Condamine
• Ruud Scharn

Thank you to:
• Erick Matsen
• Meg Pirrung
• You, the audience!
  
