SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Downloaden Sie, um offline zu lesen
Natural history research as a
replicable data science
Rutger Vos
Natural history museums
and collections
• Main goal is not to
exhibit but to collect and
curate specimens
• Usually multiple
specimens per species,
sometimes many more
• Specimens are research
and reference materials
Natural history research
To understand the patterns and processes of biodiversity
Biodiversity is expressed and studied in multiple ways:
- Species diversity, e.g.
counts of species, maybe
taking abundances into
account
- Phylogenetic diversity,
i.e. the evolutionary
distances between
species
- Functional diversity, i.e.
the ecological roles
species play, and the
characteristics associated
with that role
Natural history research
To understand the patterns and processes of biodiversity
The patterns and processes of biodiversity are systematized
as taking place:
- Within a given system (⍺
diversity), e.g. a biome
- Across systems (β
diversity, turnover)
- Among systems (ɣ
diversity, totality)
Natural history data
High dimensionality:
- Sequential
- Geospatial
- Morphological
DNA barcoding
• Some genes are variable so that a
few hundred letters suffice to identify
species
• In addition, barcodes are useful for
studying evolution and phylogeny
• Taking the barcode of a specimen (by
Sanger seq) is part of the workflow of
indexing collection specimens
Barcoding example: species boundaries in beetles
Pentinsaari, Vos & Mutanen. 2016. Algorithmic single-locus species
delimitation: effects of sampling effort, variation and nonmonophyly in four
methods and 1870 species of beetles. Molecular Ecology Resources 17(3):
393-404
Metabarcoding
• The species contents of organic mixtures can also be identified using
identifiable marker genes
• This is typically done using multiplexed, high-throughput (“next
generation”) sequencing
• Consequently, data storage and processing requirements are higher
Metabarcoding
examples: gut contents
of Ice Age grazers
Species distribution modelling
• Collection specimens are (ideally) stored with their collection locality
recorded as lat/lon coordinates
• Based on the localities where specimens were found, and geospatial
data layers (climate, land use, soil, etc.) a correlative model of the
species affinities can be constructed
• With such a model, habitat suitability and predictive scenarios (e.g.
climate change) can be projected
Biogeographic example: vulnerability of
European butterflies
Shapes, traits, and phenotypes
Natural history data
• Highest data volumes are HTS,
3D scanning, images
• High dimensionality at multiple
scales
• Many biases in species/locality
sampling
• Many axes are messy:
- Species names have been
changing for centuries
- Likewise place names
- Trait descriptions are often
ambiguous 3d.naturalis.nl
The Reproducibility Crisis
• More than 70%
of researchers
(n=1576) have
tried and failed to
reproduce
another
scientist's
experiments
• More than half
have failed to
reproduce their
own experiments
Reproducible data science
and cultural change
1. “Data available from the author upon request”
No: data are open, as FAIR as possible
2. “Data were processed with custom scripts”
No: scripts/workflows are open source
3. “Data were analyzed on a Pentium III 450 MHz…”
No: the environment can be cloned as VM
1. FAIR data management
Findable: increasing attention to
metadata, and discoverability
and indexing of data
Accessible: implementation of
resolvable identifiers, e.g.
PURLs and DOIs
Interoperable: increasing
attention for open community
standards (syntax) and
semantics
Re-usable: increasing attention
for data ownership and
licensing
2. Open source
Analytical code is no longer a folder on a postdoc’s laptop, it’s a code
repository with specific versions, documentation, tests, and a license
3. Virtualization
- Analyses are not run on dedicated hardware, e.g.
workstations, clusters, but in the (private) cloud
- Complex workflows are distributed as virtual machines,
docker containers, or deployed with devops tools
In closing
Thank you for
your attention

Weitere ähnliche Inhalte

Was ist angesagt?

Bioinformatics t6-phylogenetics v2013-wim_vancriekinge
Bioinformatics t6-phylogenetics v2013-wim_vancriekingeBioinformatics t6-phylogenetics v2013-wim_vancriekinge
Bioinformatics t6-phylogenetics v2013-wim_vancriekingeProf. Wim Van Criekinge
 
Species concept
Species conceptSpecies concept
Species conceptBio-Geek
 
species Concept
species Conceptspecies Concept
species Conceptkaakaawaah
 
Open Access Bhl Ia
Open Access Bhl IaOpen Access Bhl Ia
Open Access Bhl Iatgarnett
 
Modeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveModeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveRutger Vos
 
Animal Systematics Lecture 3
Animal Systematics Lecture 3Animal Systematics Lecture 3
Animal Systematics Lecture 3Hamid Ur-Rahman
 
Species concept
Species conceptSpecies concept
Species conceptNoor Zada
 
Species concept
Species conceptSpecies concept
Species conceptNoor Zada
 
Higher taxa and higher category
Higher taxa and higher categoryHigher taxa and higher category
Higher taxa and higher categoryNoor Zada
 
Evolução e Raças Humanas
Evolução e Raças HumanasEvolução e Raças Humanas
Evolução e Raças HumanasFlávia Smarti
 
Species Concepts And Speciation
Species Concepts And SpeciationSpecies Concepts And Speciation
Species Concepts And SpeciationMark McGinley
 
Nomenclature for the Future: The power and challenges for stable and sensible...
Nomenclature for the Future: The power and challenges for stable and sensible...Nomenclature for the Future: The power and challenges for stable and sensible...
Nomenclature for the Future: The power and challenges for stable and sensible...ICZN
 
Quentin D. Wheeler - ZooBank and the Taxonomic Renaissance
Quentin D. Wheeler - ZooBank and the Taxonomic RenaissanceQuentin D. Wheeler - ZooBank and the Taxonomic Renaissance
Quentin D. Wheeler - ZooBank and the Taxonomic RenaissanceICZN
 

Was ist angesagt? (20)

New Systematics
New SystematicsNew Systematics
New Systematics
 
Bioinformatics t6-phylogenetics v2013-wim_vancriekinge
Bioinformatics t6-phylogenetics v2013-wim_vancriekingeBioinformatics t6-phylogenetics v2013-wim_vancriekinge
Bioinformatics t6-phylogenetics v2013-wim_vancriekinge
 
Species concept
Species conceptSpecies concept
Species concept
 
species Concept
species Conceptspecies Concept
species Concept
 
Open Access Bhl Ia
Open Access Bhl IaOpen Access Bhl Ia
Open Access Bhl Ia
 
Modeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveModeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspective
 
Animal Systematics Lecture 3
Animal Systematics Lecture 3Animal Systematics Lecture 3
Animal Systematics Lecture 3
 
Species concept
Species conceptSpecies concept
Species concept
 
The Good Species
The Good SpeciesThe Good Species
The Good Species
 
Evolution
EvolutionEvolution
Evolution
 
Species concept
Species conceptSpecies concept
Species concept
 
Higher taxa and higher category
Higher taxa and higher categoryHigher taxa and higher category
Higher taxa and higher category
 
Evolução e Raças Humanas
Evolução e Raças HumanasEvolução e Raças Humanas
Evolução e Raças Humanas
 
Species Concepts And Speciation
Species Concepts And SpeciationSpecies Concepts And Speciation
Species Concepts And Speciation
 
Species problem
Species problemSpecies problem
Species problem
 
Bi 2005 20
Bi 2005 20Bi 2005 20
Bi 2005 20
 
Nomenclature for the Future: The power and challenges for stable and sensible...
Nomenclature for the Future: The power and challenges for stable and sensible...Nomenclature for the Future: The power and challenges for stable and sensible...
Nomenclature for the Future: The power and challenges for stable and sensible...
 
Species
SpeciesSpecies
Species
 
Phylogenetics
PhylogeneticsPhylogenetics
Phylogenetics
 
Quentin D. Wheeler - ZooBank and the Taxonomic Renaissance
Quentin D. Wheeler - ZooBank and the Taxonomic RenaissanceQuentin D. Wheeler - ZooBank and the Taxonomic Renaissance
Quentin D. Wheeler - ZooBank and the Taxonomic Renaissance
 

Ähnlich wie Natural history research as a replicable data science

How can we release biodiversity data from herbarium specimens for climate cha...
How can we release biodiversity data from herbarium specimens for climate cha...How can we release biodiversity data from herbarium specimens for climate cha...
How can we release biodiversity data from herbarium specimens for climate cha...redrinkwater
 
Webs of Life and Data: Impacts of open and networked data on scientific pract...
Webs of Life and Data: Impacts of open and networked data on scientific pract...Webs of Life and Data: Impacts of open and networked data on scientific pract...
Webs of Life and Data: Impacts of open and networked data on scientific pract...Sarah Anna Stewart
 
Lab report instrukcije
Lab report instrukcijeLab report instrukcije
Lab report instrukcijeEna Lalic
 
Is there a bias in deep sea diversity patterns?
Is there a bias in deep sea diversity patterns?Is there a bias in deep sea diversity patterns?
Is there a bias in deep sea diversity patterns?Graeme Lloyd
 
Finding the annotation needs of the botanical community in a digital library
Finding the annotation needs of the botanical community in a digital libraryFinding the annotation needs of the botanical community in a digital library
Finding the annotation needs of the botanical community in a digital libraryWilliam Ulate
 
Phylogenomic methods for comparative evolutionary biology - University Colleg...
Phylogenomic methods for comparative evolutionary biology - University Colleg...Phylogenomic methods for comparative evolutionary biology - University Colleg...
Phylogenomic methods for comparative evolutionary biology - University Colleg...Joe Parker
 
Interpreting ‘tree space’ in the context of very large empirical datasets
Interpreting ‘tree space’ in the context of very large empirical datasetsInterpreting ‘tree space’ in the context of very large empirical datasets
Interpreting ‘tree space’ in the context of very large empirical datasetsJoe Parker
 
Frontiers of discovery with Encyclopedia of Life
Frontiers of discovery with Encyclopedia of LifeFrontiers of discovery with Encyclopedia of Life
Frontiers of discovery with Encyclopedia of Life Cyndy Parr
 
Fifty shades of evidence: A transdisciplinary research project on changing cl...
Fifty shades of evidence: A transdisciplinary research project on changing cl...Fifty shades of evidence: A transdisciplinary research project on changing cl...
Fifty shades of evidence: A transdisciplinary research project on changing cl...Carina van Rooyen
 
Slicing and Dicing a Newspaper Corpus for Historical Ecology Research
Slicing and Dicing a Newspaper Corpus for Historical Ecology ResearchSlicing and Dicing a Newspaper Corpus for Historical Ecology Research
Slicing and Dicing a Newspaper Corpus for Historical Ecology ResearchMarieke van Erp
 
Multiplying method: Ethnography and the reconceptualization of evaluation stu...
Multiplying method: Ethnography and the reconceptualization of evaluation stu...Multiplying method: Ethnography and the reconceptualization of evaluation stu...
Multiplying method: Ethnography and the reconceptualization of evaluation stu...Gemma Derrick
 
Scratchpad 2014-introduction
Scratchpad 2014-introductionScratchpad 2014-introduction
Scratchpad 2014-introductionVince Smith
 
Thesis Proposal Presentations Sample.pdf.pptx
Thesis Proposal Presentations Sample.pdf.pptxThesis Proposal Presentations Sample.pdf.pptx
Thesis Proposal Presentations Sample.pdf.pptxMaribeth Manuel
 
The Future of Microalgal Taxonomy
The Future of Microalgal TaxonomyThe Future of Microalgal Taxonomy
The Future of Microalgal TaxonomyAnne Thessen
 
Research Skills for Egyptology
Research Skills for EgyptologyResearch Skills for Egyptology
Research Skills for EgyptologyMelanie Pitkin
 
Research skills for Egyptology
Research skills for Egyptology Research skills for Egyptology
Research skills for Egyptology Melanie Pitkin
 
Ethics, Ethnography, Archeology
Ethics, Ethnography, ArcheologyEthics, Ethnography, Archeology
Ethics, Ethnography, Archeologyanimation0118
 
Genomics and proteomics ppt
Genomics and proteomics pptGenomics and proteomics ppt
Genomics and proteomics pptPatelSupriya
 

Ähnlich wie Natural history research as a replicable data science (20)

How can we release biodiversity data from herbarium specimens for climate cha...
How can we release biodiversity data from herbarium specimens for climate cha...How can we release biodiversity data from herbarium specimens for climate cha...
How can we release biodiversity data from herbarium specimens for climate cha...
 
Webs of Life and Data: Impacts of open and networked data on scientific pract...
Webs of Life and Data: Impacts of open and networked data on scientific pract...Webs of Life and Data: Impacts of open and networked data on scientific pract...
Webs of Life and Data: Impacts of open and networked data on scientific pract...
 
Lab report instrukcije
Lab report instrukcijeLab report instrukcije
Lab report instrukcije
 
Is there a bias in deep sea diversity patterns?
Is there a bias in deep sea diversity patterns?Is there a bias in deep sea diversity patterns?
Is there a bias in deep sea diversity patterns?
 
Finding the annotation needs of the botanical community in a digital library
Finding the annotation needs of the botanical community in a digital libraryFinding the annotation needs of the botanical community in a digital library
Finding the annotation needs of the botanical community in a digital library
 
Phylogenomic methods for comparative evolutionary biology - University Colleg...
Phylogenomic methods for comparative evolutionary biology - University Colleg...Phylogenomic methods for comparative evolutionary biology - University Colleg...
Phylogenomic methods for comparative evolutionary biology - University Colleg...
 
Interpreting ‘tree space’ in the context of very large empirical datasets
Interpreting ‘tree space’ in the context of very large empirical datasetsInterpreting ‘tree space’ in the context of very large empirical datasets
Interpreting ‘tree space’ in the context of very large empirical datasets
 
importance of biology and biological observations.pptx
importance of biology and biological observations.pptximportance of biology and biological observations.pptx
importance of biology and biological observations.pptx
 
Research Data Lifecycle: Role of Data Services
Research Data Lifecycle: Role of Data ServicesResearch Data Lifecycle: Role of Data Services
Research Data Lifecycle: Role of Data Services
 
Frontiers of discovery with Encyclopedia of Life
Frontiers of discovery with Encyclopedia of LifeFrontiers of discovery with Encyclopedia of Life
Frontiers of discovery with Encyclopedia of Life
 
Fifty shades of evidence: A transdisciplinary research project on changing cl...
Fifty shades of evidence: A transdisciplinary research project on changing cl...Fifty shades of evidence: A transdisciplinary research project on changing cl...
Fifty shades of evidence: A transdisciplinary research project on changing cl...
 
Slicing and Dicing a Newspaper Corpus for Historical Ecology Research
Slicing and Dicing a Newspaper Corpus for Historical Ecology ResearchSlicing and Dicing a Newspaper Corpus for Historical Ecology Research
Slicing and Dicing a Newspaper Corpus for Historical Ecology Research
 
Multiplying method: Ethnography and the reconceptualization of evaluation stu...
Multiplying method: Ethnography and the reconceptualization of evaluation stu...Multiplying method: Ethnography and the reconceptualization of evaluation stu...
Multiplying method: Ethnography and the reconceptualization of evaluation stu...
 
Scratchpad 2014-introduction
Scratchpad 2014-introductionScratchpad 2014-introduction
Scratchpad 2014-introduction
 
Thesis Proposal Presentations Sample.pdf.pptx
Thesis Proposal Presentations Sample.pdf.pptxThesis Proposal Presentations Sample.pdf.pptx
Thesis Proposal Presentations Sample.pdf.pptx
 
The Future of Microalgal Taxonomy
The Future of Microalgal TaxonomyThe Future of Microalgal Taxonomy
The Future of Microalgal Taxonomy
 
Research Skills for Egyptology
Research Skills for EgyptologyResearch Skills for Egyptology
Research Skills for Egyptology
 
Research skills for Egyptology
Research skills for Egyptology Research skills for Egyptology
Research skills for Egyptology
 
Ethics, Ethnography, Archeology
Ethics, Ethnography, ArcheologyEthics, Ethnography, Archeology
Ethics, Ethnography, Archeology
 
Genomics and proteomics ppt
Genomics and proteomics pptGenomics and proteomics ppt
Genomics and proteomics ppt
 

Mehr von Rutger Vos

Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Rutger Vos
 
10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie10 Misverstanden Over Evolutie
10 Misverstanden Over EvolutieRutger Vos
 
Crash Course Biodiversiteit
Crash Course BiodiversiteitCrash Course Biodiversiteit
Crash Course BiodiversiteitRutger Vos
 
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Rutger Vos
 
Robot eye for the butterfly
Robot eye for the butterflyRobot eye for the butterfly
Robot eye for the butterflyRutger Vos
 
Taxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningTaxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningRutger Vos
 
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Rutger Vos
 
Assembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataAssembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataRutger Vos
 
Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Rutger Vos
 
Kunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenKunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenRutger Vos
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationRutger Vos
 
SUPERSMART pipeline intro
SUPERSMART pipeline introSUPERSMART pipeline intro
SUPERSMART pipeline introRutger Vos
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsRutger Vos
 
Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Rutger Vos
 
The Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environmentThe Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environmentRutger Vos
 
Retrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRetrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRutger Vos
 
NeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLNeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLRutger Vos
 
Vos at NCB Naturalis
Vos at NCB NaturalisVos at NCB Naturalis
Vos at NCB NaturalisRutger Vos
 
Perl for Phyloinformatics
Perl for PhyloinformaticsPerl for Phyloinformatics
Perl for PhyloinformaticsRutger Vos
 

Mehr von Rutger Vos (20)

Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?
 
10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie
 
Crash Course Biodiversiteit
Crash Course BiodiversiteitCrash Course Biodiversiteit
Crash Course Biodiversiteit
 
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
 
Robot eye for the butterfly
Robot eye for the butterflyRobot eye for the butterfly
Robot eye for the butterfly
 
Taxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningTaxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learning
 
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
 
Assembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataAssembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence data
 
Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?
 
Kunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenKunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proeven
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integration
 
SUPERSMART pipeline intro
SUPERSMART pipeline introSUPERSMART pipeline intro
SUPERSMART pipeline intro
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomics
 
Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...
 
The Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environmentThe Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environment
 
Retrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRetrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collections
 
NeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLNeXML - phylogenetic data as XML
NeXML - phylogenetic data as XML
 
Vos at NCB Naturalis
Vos at NCB NaturalisVos at NCB Naturalis
Vos at NCB Naturalis
 
Tree of Life
Tree of LifeTree of Life
Tree of Life
 
Perl for Phyloinformatics
Perl for PhyloinformaticsPerl for Phyloinformatics
Perl for Phyloinformatics
 

Kürzlich hochgeladen

GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests GlycosidesNandakishor Bhaurao Deshmukh
 
Explainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenariosExplainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenariosZachary Labe
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
DECOMPOSITION PATHWAYS of TM-alkyl complexes.pdf
DECOMPOSITION PATHWAYS of TM-alkyl complexes.pdfDECOMPOSITION PATHWAYS of TM-alkyl complexes.pdf
DECOMPOSITION PATHWAYS of TM-alkyl complexes.pdfDivyaK787011
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxGiDMOh
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsSérgio Sacani
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPirithiRaju
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlshansessene
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxzaydmeerab121
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书zdzoqco
 
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2AuEnriquezLontok
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxPayal Shrivastava
 
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Christina Parmionova
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsCharlene Llagas
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxkumarsanjai28051
 
Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxzeus70441
 

Kürzlich hochgeladen (20)

GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
Explainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenariosExplainable AI for distinguishing future climate change scenarios
Explainable AI for distinguishing future climate change scenarios
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
DECOMPOSITION PATHWAYS of TM-alkyl complexes.pdf
DECOMPOSITION PATHWAYS of TM-alkyl complexes.pdfDECOMPOSITION PATHWAYS of TM-alkyl complexes.pdf
DECOMPOSITION PATHWAYS of TM-alkyl complexes.pdf
 
Interferons.pptx.
Interferons.pptx.Interferons.pptx.
Interferons.pptx.
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptx
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPR
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
PLASMODIUM. PPTX
PLASMODIUM. PPTXPLASMODIUM. PPTX
PLASMODIUM. PPTX
 
bonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girlsbonjourmadame.tumblr.com bhaskar's girls
bonjourmadame.tumblr.com bhaskar's girls
 
well logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptxwell logging & petrophysical analysis.pptx
well logging & petrophysical analysis.pptx
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
办理麦克马斯特大学毕业证成绩单|购买加拿大文凭证书
 
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptx
 
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
 
Quarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and FunctionsQuarter 4_Grade 8_Digestive System Structure and Functions
Quarter 4_Grade 8_Digestive System Structure and Functions
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptx
 

Natural history research as a replicable data science

  • 1. Natural history research as a replicable data science Rutger Vos
  • 2. Natural history museums and collections • Main goal is not to exhibit but to collect and curate specimens • Usually multiple specimens per species, sometimes many more • Specimens are research and reference materials
  • 3. Natural history research To understand the patterns and processes of biodiversity Biodiversity is expressed and studied in multiple ways: - Species diversity, e.g. counts of species, maybe taking abundances into account - Phylogenetic diversity, i.e. the evolutionary distances between species - Functional diversity, i.e. the ecological roles species play, and the characteristics associated with that role
  • 4. Natural history research To understand the patterns and processes of biodiversity The patterns and processes of biodiversity are systematized as taking place: - Within a given system (⍺ diversity), e.g. a biome - Across systems (β diversity, turnover) - Among systems (ɣ diversity, totality)
  • 5. Natural history data High dimensionality: - Sequential - Geospatial - Morphological
  • 6. DNA barcoding • Some genes are variable so that a few hundred letters suffice to identify species • In addition, barcodes are useful for studying evolution and phylogeny • Taking the barcode of a specimen (by Sanger seq) is part of the workflow of indexing collection specimens
  • 7. Barcoding example: species boundaries in beetles Pentinsaari, Vos & Mutanen. 2016. Algorithmic single-locus species delimitation: effects of sampling effort, variation and nonmonophyly in four methods and 1870 species of beetles. Molecular Ecology Resources 17(3): 393-404
  • 8. Metabarcoding • The species contents of organic mixtures can also be identified using identifiable marker genes • This is typically done using multiplexed, high-throughput (“next generation”) sequencing • Consequently, data storage and processing requirements are higher
  • 10. Species distribution modelling • Collection specimens are (ideally) stored with their collection locality recorded as lat/lon coordinates • Based on the localities where specimens were found, and geospatial data layers (climate, land use, soil, etc.) a correlative model of the species affinities can be constructed • With such a model, habitat suitability and predictive scenarios (e.g. climate change) can be projected
  • 11. Biogeographic example: vulnerability of European butterflies
  • 12. Shapes, traits, and phenotypes
  • 13. Natural history data • Highest data volumes are HTS, 3D scanning, images • High dimensionality at multiple scales • Many biases in species/locality sampling • Many axes are messy: - Species names have been changing for centuries - Likewise place names - Trait descriptions are often ambiguous 3d.naturalis.nl
  • 14. The Reproducibility Crisis • More than 70% of researchers (n=1576) have tried and failed to reproduce another scientist's experiments • More than half have failed to reproduce their own experiments
  • 15. Reproducible data science and cultural change 1. “Data available from the author upon request” No: data are open, as FAIR as possible 2. “Data were processed with custom scripts” No: scripts/workflows are open source 3. “Data were analyzed on a Pentium III 450 MHz…” No: the environment can be cloned as VM
  • 16. 1. FAIR data management Findable: increasing attention to metadata, and discoverability and indexing of data Accessible: implementation of resolvable identifiers, e.g. PURLs and DOIs Interoperable: increasing attention for open community standards (syntax) and semantics Re-usable: increasing attention for data ownership and licensing
  • 17. 2. Open source Analytical code is no longer a folder on a postdoc’s laptop, it’s a code repository with specific versions, documentation, tests, and a license
  • 18. 3. Virtualization - Analyses are not run on dedicated hardware, e.g. workstations, clusters, but in the (private) cloud - Complex workflows are distributed as virtual machines, docker containers, or deployed with devops tools
  • 20. Thank you for your attention