Museum Impact
Linking up specimens with the research published on them
Ross Mounce
@rmounce
formerly at
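About Me
Currently a Postdoc at the Department of Plant Sciences, University of Cambridge: plantsci.cam.ac.uk
a Fellow of the Software Sustainability Institute ('Class of 2016'): software.ac.uk/fellows
a researcher with ContentMine: contentmine.org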
About This Talk (A little warning!)
● Don't expect to see much biology in this talk
● I'm going to talk about informatics
● I will focus on context, background and methods rather than 'results' per se
● There will be more questions than answers :)
Source: http://www.nhm.ac.uk/our-science/collections.html © The Trustees of the Natural History Museum, London
ïŹ New
ïŹ Open Data
ïŹ Easy-to-use
ïŹ Quick
ïŹ Images
ïŹ Audio
ïŹ Interactive Maps
ïŹ Citable
ïŹ API access
ïŹ Open Source
Infrastructure
It’s not KE EMu :)
What I want to do:
link specimen records to their mentions in the literature
“Micro-computed tomography scan slice through four bat skulls, displaying the relative position of
the three semicircular canals within the skull. Scans are from the following species: (A) Pteropus
rodricensis (BMNH.76.3.15.14); …”
NHM Data Portal Link (Stable, Unique Identifier)
http://data.nhm.ac.uk/specimen/69e97f52-0275-4a82-9fa6-cf1c3949f408
Article DOI (Stable, Unique Identifier)
http://dx.doi.org/10.1371/journal.pone.0061998
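These two identifiers are the join keys for the whole exercise. As a minimal sketch (not the project's actual code), a link could be stored as a pair of normalised identifiers. DOIs appear in this talk both as `http://dx.doi.org/…` URLs and as `doi:…` strings, so normalising them first avoids duplicate links; the prefix list in `_DOI_PREFIX` and the `SpecimenLink` record are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Resolver prefixes that often precede a DOI (assumption: not an exhaustive list).
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalise_doi(raw: str) -> str:
    """Strip a resolver URL or 'doi:' prefix and lower-case the DOI."""
    doi = _DOI_PREFIX.sub("", raw.strip())
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {raw!r}")
    # DOI names are case-insensitive, so lower-casing gives a stable join key.
    return doi.lower()


@dataclass(frozen=True)
class SpecimenLink:
    """One specimen-to-article link, stored as two stable identifiers."""
    specimen_url: str  # NHM Data Portal specimen URL
    doi: str           # normalised article DOI


link = SpecimenLink(
    specimen_url="http://data.nhm.ac.uk/specimen/69e97f52-0275-4a82-9fa6-cf1c3949f408",
    doi=normalise_doi("http://dx.doi.org/10.1371/journal.pone.0061998"),
)
print(link.doi)  # 10.1371/journal.pone.0061998
```

The same function turns the `doi:10.1371/journal.pone.0061998` form used in the image credit into the identical key.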
114,000,000
scholarly papers available online
36,000,000 of which are
‘Biology’ / ‘Environmental Studies’ / ‘Geosciences’ / ‘Multidisciplinary’
Khabsa, M. and Giles, C. L. 2014. The number of scholarly documents on the public web. PLoS ONE
Sadly, the vast majority of papers are only ‘available’ online to paying subscribers and no
institution in the world has access to everything. Not even close to everything!
In 2016, libraries still pay subscriptions, and individuals pay per-article fees, to access even out-of-copyright works
http://outofcopyright.eu/rights-after-digitisation/
Some academic societies recognise the value of releasing out-of-copyright content
This is what a PDF looks like
PDF is NOT a good method of exchanging information
HTML is better:
+ italics & bold preserved, semantic links to figures & tables
- lacks standardisation
The industry-standard format for scholarly articles is JATS XML
● JATS (Journal Article Tag Suite) is standardised as NISO Z39.96-2015, which defines a set of XML elements and attributes for tagging journal articles
● Standardising the format of digital scholarly publications is HIGHLY desirable
e.g. for this project, knowing whether the string 'NHM' occurs in the Materials section rather than the Acknowledgements section is hugely helpful.
Much harder to do with PDF/HTML.
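To illustrate why section tags matter, here is a minimal sketch using Python's standard `xml.etree.ElementTree` on a tiny hand-made JATS-style fragment. `sec` (with its `sec-type` attribute), `ack` and `back` are JATS elements; the sample text and the `sections_mentioning` helper are invented for illustration, and nested sections would be reported more than once.

```python
import xml.etree.ElementTree as ET

# A tiny hand-made JATS-style article (illustrative only, not a real paper).
SAMPLE = """<article>
  <body>
    <sec sec-type="materials|methods">
      <title>Materials and methods</title>
      <p>Skulls were examined at the NHM, London (BMNH 76.3.15.14).</p>
    </sec>
    <sec sec-type="results">
      <title>Results</title>
      <p>No museum codes here.</p>
    </sec>
  </body>
  <back>
    <ack><p>We thank the NHM library staff.</p></ack>
  </back>
</article>"""


def sections_mentioning(jats_xml: str, term: str) -> list:
    """Label every <sec> or <ack> element whose text contains `term`."""
    root = ET.fromstring(jats_xml)
    hits = []
    for el in root.iter():
        if el.tag not in ("sec", "ack"):
            continue
        # itertext() gathers the text of the element and all its descendants.
        if term in "".join(el.itertext()):
            # Prefer the sec-type attribute; fall back to the element name.
            hits.append(el.get("sec-type", el.tag))
    return hits


print(sections_mentioning(SAMPLE, "NHM"))  # ['materials|methods', 'ack']
```

A mention in `materials|methods` is a strong hint that a specimen was examined; the same string in `ack` is usually just a thank-you.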
Section-based search is already implemented in Europe PMC!
→ Section level search functionality in Europe PMC. Kafkas et al (2015) J Biomed Semantics
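A hedged sketch of building a section-restricted query for the Europe PMC REST search endpoint. It only constructs the URL and sends nothing; the `METHODS` and `ACK_FUND` field names are assumptions based on Europe PMC's section-level search syntax, so check the current search syntax reference before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Europe PMC REST search endpoint.
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def section_query_url(term: str, section: str = "METHODS", page_size: int = 100) -> str:
    """Build (but do not send) a Europe PMC search URL limited to one section.

    ASSUMPTION: `section` names one of Europe PMC's section-level search
    fields (e.g. METHODS, ACK_FUND); check the search syntax reference.
    """
    params = {
        "query": f'{section}:"{term}"',  # exact phrase within that section
        "format": "json",
        "pageSize": page_size,
    }
    return f"{EPMC_SEARCH}?{urlencode(params)}"


url = section_query_url("BMNH")
# Decode the query parameter again to show what would be sent.
print(parse_qs(urlparse(url).query)["query"])  # ['METHODS:"BMNH"']
```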
A plea for full text XML
A minority of journals do not provide full text XML
✓PLOS, eLife, PeerJ, Pensoft, Wiley, Elsevier, Springer,
NPG, Ubiquity Press, Copernicus, Hindawi, MDPI
✘ Geological Society of London Publications,
Magnolia Press, a long tail of smaller publishers
Making fuller use of our expensively provisioned access
Image credit: Ubiquity Press
http://ubiquitypress.tumblr.com/post/96012592921/the-right-to-read-is-the-right-to-mine
UK copyright law changed in 2014, giving a specific copyright exemption for non-commercial text and data mining work
A complicated, fragmented landscape of relevant journals
Nature + Science + PNAS + Phytotaxa + Zootaxa
BioOne Journals (131)
Springer Journals (32)
Wiley Journals (22)
Taylor & Francis Journals (14)
Elsevier Journals (12)
Oxford University Press Journals (8)
SciELO Journals (7) [Open Access but not in PMC]
Ecological Society of America Journals (6)
Geological Society Journals (4)
CSIRO Journals (4)
Cambridge University Press Journals (3)
Royal Society Journals (2)
Journal-omics!
I discover 'new' journals every week
e.g. last week I 'found' Oryctos (published 1998 to 2010), still behind a paywall. Does anyone have access to this journal? Please let me know
http://www.dinosauria.org/oryctos.php
How are we meant to achieve a comprehensive
aggregation of research literature (to do rigorous science,
inclusive of all the evidence) when it is so unhelpfully
scattered and we don't even know where it all is?
https://github.com/rossmounce/NHM-specimens
I don’t just find in-text mentions.
I’m trying to match them up to our
NHM Data Portal records too!
Specimens in RED do not appear
to be on the Data Portal ...yet
Blue globe represents a PLOS ONE paper
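The matching step can be sketched as a regular expression plus a lookup table. This is a simplified illustration, not the code in the NHM-specimens repository: the pattern covers only a few NHM-style forms, and `PORTAL_INDEX` is a hypothetical stand-in for the Data Portal whose two entries are the specimen/URL pairings shown in this talk (the bat skull and the Archaeopteryx).

```python
import re
from typing import List, Optional, Tuple

# ASSUMPTION: a deliberately simplified pattern for NHM London-style numbers:
# acronym, optional bracketed department letter, optional upper-case
# collection codes, then a (possibly dotted) register number.
SPECIMEN_RE = re.compile(
    r"\b(?P<inst>BMNH|NHMUK)"
    r"(?:\((?P<dept>[A-Z])\))?"
    r"[\s.]*"
    r"(?P<codes>(?:[A-Z]{1,3}\s+)*)"
    r"(?P<num>\d+(?:\.\d+)*)"
)

# Hypothetical stand-in for the Data Portal: register number -> portal URL.
PORTAL_INDEX = {
    "76.3.15.14": "http://data.nhm.ac.uk/specimen/69e97f52-0275-4a82-9fa6-cf1c3949f408",
    "37001": "http://data.nhm.ac.uk/object/57ee3bf1-0a74-4ae4-a588-ba9ea8dc5265",
}


def find_specimens(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (mention, portal URL or None) for each NHM-style number in `text`."""
    results = []
    for m in SPECIMEN_RE.finditer(text):
        # None means "no match in the index", like the red nodes in the figure.
        results.append((m.group(0), PORTAL_INDEX.get(m.group("num"))))
    return results


sample = "(BMNH.76.3.15.14); NHMUK PV OR 37001; BMNH(E)609062."
for mention, url in find_specimens(sample):
    print(mention, "->", url)
```

Keying the index on the bare register number keeps the sketch short, but numbers can collide across departments; a real matcher would also compare the department and collection codes.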
Searching ALL full texts is
not enough!!!
A significant number of specimens are probably ‘hiding out’ in supplementary data files of all sorts of formats.
Google Scholar does not index SI
Web of Science doesn’t either
Nor does Scopus
At scale, journal-held supplementary
data files are the ‘darkest corners’ of
science
“Specimens were deposited in the collections of the California Academy of Sciences' Department of
Herpetology (CAS), the British Museum of Natural History (BMNH) and of author GJM (Table S1)”
10.1371/journal.pone.0104628 http://rossmounce.co.uk/2015/06/20/deep-indexing-supplementary-data-files/
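Text-based supplementary files can at least be searched in the same way. A minimal sketch, assuming the supplementary files for an article have already been downloaded into one folder; the simplified pattern and the set of handled suffixes are assumptions, and spreadsheet, Word and PDF files are only reported as skipped because each needs its own parser. The file names and voucher number in the demo are made up.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

# Simplified NHM-style number pattern (an assumption; real citations vary more).
CODE_RE = re.compile(r"\b(?:BMNH|NHMUK)[\s.]*\d+(?:\.\d+)*")

# Only plain-text formats are searched; spreadsheets, Word files and PDFs
# would each need their own parser, so they are reported as skipped.
TEXT_SUFFIXES = {".csv", ".tsv", ".txt"}


def scan_supplements(folder: Path) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return ({file name: [codes found]}, [skipped file names]) for `folder`."""
    found: Dict[str, List[str]] = {}
    skipped: List[str] = []
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() not in TEXT_SUFFIXES:
            skipped.append(path.name)
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        codes = CODE_RE.findall(text)
        if codes:
            found[path.name] = codes
    return found, skipped


# Demo with made-up files: a CSV "Table S1" and a PDF figure.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "Table_S1.csv").write_text(
        "species,voucher\nSpecies A,BMNH 1999.1.2.3\n", encoding="utf-8"
    )
    (folder / "Figure_S2.pdf").write_bytes(b"%PDF-1.4")
    result = scan_supplements(folder)

print(result)  # ({'Table_S1.csv': ['BMNH 1999.1.2.3']}, ['Figure_S2.pdf'])
```

The skipped list is as useful as the hits: it shows how much of the supplementary material is still unsearched.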
Why write such descriptive papers in natural
language? Keep data as data!
The above was published in 2013(!)
Almost nothing in Nature & Science ‘full (short) text’
Context: 15 years' worth of full-text research in Nature & Science examined
Science: only 11 NHM specimens found in 39,600 full texts.
Nature: similar story. <30 specimens in 14,132 full texts.
Clearly there are more,
but it’s all buried in supplementary materials :(
Blue globe represents a PLOS ONE paper
Very few specimens occur in more than one paper
Can you guess what BMNH 37001 is?
Hint: it’s a very famous specimen! Grey represents an NHMUK specimen
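The network behind this figure is bipartite (specimens on one side, papers on the other), so finding specimens that occur in more than one paper comes down to counting distinct papers per specimen. In this sketch the mention pairs and the `paper-A` to `paper-D` labels are hypothetical placeholders, not data from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def papers_per_specimen(mentions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Turn (specimen, paper) pairs into {specimen: set of distinct papers}."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for specimen, paper in mentions:
        graph[specimen].add(paper)
    return dict(graph)


def shared_specimens(graph: Dict[str, Set[str]], min_papers: int = 2) -> List[Tuple[str, int]]:
    """Specimens cited by at least `min_papers` papers, most-cited first."""
    shared = [(s, len(p)) for s, p in graph.items() if len(p) >= min_papers]
    return sorted(shared, key=lambda item: (-item[1], item[0]))


# Hypothetical mention pairs; paper-A ... paper-D are placeholders, not real articles.
mentions = [
    ("BMNH 37001", "paper-A"),
    ("BMNH 37001", "paper-B"),
    ("BMNH 37001", "paper-C"),
    ("BMNH 76.3.15.14", "paper-A"),
    ("BMNH 2013.2.13.3", "paper-D"),
    ("BMNH 2013.2.13.3", "paper-D"),  # repeat mentions in one paper count once
]
graph = papers_per_specimen(mentions)
print(shared_specimens(graph))  # [('BMNH 37001', 3)]
```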
Huge variation in how specimens are cited (not helpful!)
PI AZ 8459, TEXSpruce6067, BM000922891, NYRaz054, BMNH(E)609062, MSB00509, Belize_CW_All_1071, F1629082, BM-BRIT-EURO 3948, OR.5379
“BMNH” is not necessarily British Museum of Natural History (UK).
Can also be Beijing Museum of Natural History (CN) or Bell Museum of Natural History (US)
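One hypothetical way to cope with the ambiguity is to score each candidate institution by cue words in the surrounding text and refuse to guess when the evidence is absent or tied. The cue lists below are illustrative assumptions, not a vetted gazetteer; citing standard identifiers in the first place avoids the problem.

```python
from typing import Optional

# Hypothetical context cues for each institution that uses the acronym "BMNH".
# These keyword lists are illustrative guesses, not a curated resource.
BMNH_CANDIDATES = {
    "British Museum of Natural History (UK)": ("london", "british", "nhmuk"),
    "Beijing Museum of Natural History (CN)": ("beijing", "china"),
    "Bell Museum of Natural History (US)": ("minnesota", "bell museum"),
}


def guess_bmnh(context: str) -> Optional[str]:
    """Pick the candidate whose cues occur most often in the surrounding text.

    Returns None when no cue matches or when two candidates tie, so the
    mention can be passed to a human for checking instead of being guessed.
    """
    lowered = context.lower()
    scores = {
        name: sum(lowered.count(cue) for cue in cues)
        for name, cues in BMNH_CANDIDATES.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, runner_up) = ranked[0], ranked[1]
    return best if top > runner_up else None


print(guess_bmnh("deposited in the British Museum of Natural History (BMNH)"))
```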
Where possible use standard/permanent identifiers
Want to discuss a particular collection? Use the official GRSciColl identifier
The Global Registry of Scientific Collections (GRSciColl)
http://grscicoll.org/
Which for the Natural History Museum, London (UK) is: NHMUK
http://biocol.org/urn:lsid:biocol.org:col:34665
Want to cite the BM Archaeopteryx specimen?
NHMUK PV OR 37001
http://data.nhm.ac.uk/object/57ee3bf1-0a74-4ae4-a588-ba9ea8dc5265
Credit: Davies KTJ, Bates PJJ, Maryanto I, Cotton JA, Rossiter SJ (2013) The
Evolution of Bat Vestibular Systems in the Face of Potential Antagonistic Selection
Pressures for Flight and Echolocation. PLoS ONE 8(4): e61998.
doi:10.1371/journal.pone.0061998
Openly-licensed data on specimens, published elsewhere, could
be re-incorporated back into the online museum catalogue. A
one-stop shop for information.
Beyond linking:
repatriation of knowledge
This is a CT scan of “BMNH 76.3.15.14”.
Without mining, I wouldn’t know this data exists.
Perhaps it could also be made available on the portal?
http://data.nhm.ac.uk/specimen/69e97f52-0275-4a82-9fa6-cf1c3949f408
Does published info make it back ‘home’ to the collections?
BMNH 2013.2.13.3 on the portal as “Petrochromis nov.sp. Takahashi”
I found it (by text mining) here: http://dx.doi.org/10.1007/s10228-014-0396-9
It’s now called Petrochromis horii n. sp., according to the paper.
What mechanisms are there to update newer information back into the collection?
Content mining could definitely help keep collections data up-to-date!
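A sketch of how a mined name could be flagged for curators: compare the genus and epithet on the portal with those in the paper, treating placeholders such as `sp.` and `nov.` as "not yet named". The placeholder list and the `binomial` and `suggest_update` helpers are assumptions; a real workflow would need a proper taxonomic name parser and a curator's review rather than automatic overwriting.

```python
import re
from typing import Optional, Tuple

# Tokens that mark an undetermined or not-yet-named species (assumption: not exhaustive).
PLACEHOLDERS = {"sp", "spp", "nov", "n", "cf", "aff", "indet"}


def binomial(name: str) -> Tuple[str, Optional[str]]:
    """Return (genus, epithet or None) from a free-text determination."""
    words = re.findall(r"[A-Za-z]+", name)
    genus = words[0] if words else ""
    has_epithet = len(words) > 1 and words[1].lower() not in PLACEHOLDERS
    return genus, (words[1] if has_epithet else None)


def suggest_update(portal_name: str, literature_name: str) -> Optional[str]:
    """Suggest the literature name when it is more specific than, or differs from, the portal."""
    p_genus, p_epithet = binomial(portal_name)
    l_genus, l_epithet = binomial(literature_name)
    if l_epithet is None:
        return None  # the paper adds nothing more specific
    if (p_genus, p_epithet) == (l_genus, l_epithet):
        return None  # the portal record is already up to date
    return f"{l_genus} {l_epithet}"  # flag for a curator to review


print(suggest_update("Petrochromis nov.sp. Takahashi", "Petrochromis horii n. sp."))
```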
Can we create a (better) digital NHM metadata catalogue
entirely from the literature, hundreds of years before the NHM
themselves complete their own digitisation programme?
Given funding and time, perhaps
Acknowledgements
Sincere thanks to:
Aime Rankin for help with the project
The NHM Library staff, particularly Sarah Vincent for actively supporting my content mining
Nancy Chillingsworth (IPR, NHM London)
Mark Wilkinson (Life Sciences, NHM London)
Peter Murray-Rust & the ContentMine team
Vince Smith (Life Sciences, NHM London)
Ben Scott (NHM Data Portal Lead Architect)
Rod Page (University of Glasgow)
All of the Biodiversity Informatics team
http://contentmine.org/
For a more detailed version of this talk on
YouTube see: bit.ly/nhmlink
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxnelietumpap1
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
USPSÂź Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPSÂź Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPSÂź Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPSÂź Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 

KĂŒrzlich hochgeladen (20)

ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Q4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptxQ4 English4 Week3 PPT Melcnmg-based.pptx
Q4 English4 Week3 PPT Melcnmg-based.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
USPSÂź Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPSÂź Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPSÂź Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPSÂź Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 

Specimen-level mining: bringing knowledge back 'home' to the Natural History Museum, London

  • 1. Museum Impact Linking-up specimens with research published on them Ross Mounce @rmounce formerly at
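  • 2. About Me: currently a Postdoc at the University of Cambridge, Department of Plant Sciences (plantsci.cam.ac.uk); a Fellow of the Software Sustainability Institute ('Class of 2016', software.ac.uk/fellows); a researcher with ContentMine (contentmine.org)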
  • 3. About This Talk (a little warning!): ● Don't expect to see much biology in this talk ● I'm going to talk about informatics ● I will focus more on context, background and methods than on 'results' per se ● There will be more questions than answers :)
  • 4. Source: http://www.nhm.ac.uk/our-science/collections.html © The Trustees of the Natural History Museum, London
  • 5. ïŹ New ïŹ Open Data ïŹ Easy-to-use ïŹ Quick ïŹ Images ïŹ Audio ïŹ Interactive Maps ïŹ Citable ïŹ API access ïŹ Open Source Infrastructure It’s not KE Emu :)
  • 6. What I want to do: link specimen records to their mentions in the literature. “Micro-computed tomography scan slice through four bat skulls, displaying the relative position of the three semicircular canals within the skull. Scans are from the following species: (A) Pteropus rodricensis (BMNH.76.3.15.14); …” NHM Data Portal link (stable, unique identifier): http://data.nhm.ac.uk/specimen/69e97f52-0275-4a82-9fa6-cf1c3949f408 Article DOI (stable, unique identifier): http://dx.doi.org/10.1371/journal.pone.0061998
  • 7. 114,000,000 scholarly papers are available online, 36,000,000 of which are ‘Biology’ / ‘Environmental Studies’ / ‘Geosciences’ / ‘Multidisciplinary’. Khabsa, M. and Giles, C. L. 2014. The number of scholarly documents on the public web. PLoS ONE
  • 8. Sadly, the vast majority of papers are only ‘available’ online to paying subscribers and no institution in the world has access to everything. Not even close to everything!
  • 9. In 2016, libraries pay subscriptions, or individuals pay per-article fees, to access even out-of-copyright works ?? http://outofcopyright.eu/rights-after-digitisation/
  • 10. Some academic societies recognise the value of releasing out-of-copyright content
  • 11. This is what a PDF looks like. PDF is NOT a good method of exchanging information
  • 12. HTML is better, but lacks standardisation: + italics & bold preserved, semantic links to figures & tables; - lacks standardisation
  • 13. The industry-standard format for scholarly articles is JATS XML ● JATS (Journal Article Tag Suite) is defined by NISO Z39.96-2015, which specifies a set of XML elements and attributes for tagging journal articles ● Standardising the format of digital scholarly publications is HIGHLY desirable. For example, in this project, knowing whether the string 'NHM' occurs in the Materials section rather than the Acknowledgements section is hugely helpful. This is much harder to do with PDF/HTML. Section-based search is already implemented in Europe PMC! → Section level search functionality in Europe PMC. Kafkas et al (2015) J Biomed Semantics
  • 14. A plea for full-text XML. A minority of journals do not provide full-text XML. ✓ PLOS, eLife, PeerJ, Pensoft, Wiley, Elsevier, Springer, NPG, Ubiquity Press, Copernicus, Hindawi, MDPI ✘ Geological Society of London Publications, Magnolia Press, a long tail of smaller publishers
  • 15. Making fuller use of our expensively provisioned access
  • 16. Image credit: Ubiquity Press (http://ubiquitypress.tumblr.com/post/96012592921/the-right-to-read-is-the-right-to-mine). UK copyright law has changed recently, giving a specific copyright exemption for non-commercial text and data mining work
  • 17. A complicated, fragmented landscape of relevant journals: Nature + Science + PNAS + Phytotaxa + Zootaxa; BioOne Journals (131); Springer Journals (32); Wiley Journals (22); Taylor & Francis Journals (14); Elsevier Journals (12); Oxford University Press Journals (8); SciELO Journals (7) [Open Access but not in PMC]; Ecological Society of America Journals (6); Geological Society Journals (4); CSIRO Journals (4); Cambridge University Press Journals (3); Royal Society Journals (2). Journal-omics!
  • 18. I discover 'new' journals every week. For example, last week I 'found' Oryctos (published between 1998 and 2010), which is still behind a paywall. Does anyone have access to this journal? Please let me know: http://www.dinosauria.org/oryctos.php How are we meant to achieve a comprehensive aggregation of research literature (to do rigorous science, inclusive of all the evidence) when it is so unhelpfully scattered and we don't even know where it all is?
  • 20. I don’t just find in-text mentions. I’m trying to match them up to our NHM Data Portal records too! Specimens in RED do not appear to be on the Data Portal ...yet. A blue globe represents a PLOS ONE paper.
  • 21. Searching ALL full texts is not enough!!! A significant number of specimens are probably ‘hiding out’ in supplementary data files of all sorts of formats. Google Scholar does not index SI (supplementary information). Web of Science doesn’t either, and nor does Scopus. At scale, journal-held supplementary data files are the ‘darkest corners’ of science. “Specimens were deposited in the collections of the California Academy of Sciences' Department of Herpetology (CAS), the British Museum of Natural History (BMNH) and of author GJM (Table S1)” 10.1371/journal.pone.0104628 http://rossmounce.co.uk/2015/06/20/deep-indexing-supplementary-data-files/
  • 22. Why write such descriptive papers in natural language? Keep data as data! The above was published in 2013(!)
  • 23. Almost nothing in Nature & Science ‘full (short) text’. Context: 15 years’ worth of full-text research in Nature & Science was examined. Science: only 11 NHM specimens found in 39,600 full texts. Nature: a similar story, with <30 specimens in 14,132 full texts. Clearly there are more, but it’s all buried in supplementary materials :(
  • 24. A blue globe represents a PLOS ONE paper; grey represents an NHMUK specimen. Very few specimens occur in more than one paper. Can you guess what BMNH 37001 is? Hint: it’s a very famous specimen!
  • 25. Huge variation in how specimens are cited (not helpful!): PI AZ 8459, TEXSpruce6067, BM000922891, NYRaz054, BMNH(E)609062, MSB00509, Belize_CW_All_1071, F1629082, BM-BRIT-EURO 3948, OR.5379. “BMNH” is not necessarily the British Museum of Natural History (UK). It can also be the Beijing Museum of Natural History (CN) or the Bell Museum of Natural History (US).
  • 26. Where possible, use standard/permanent identifiers. Want to discuss a particular collection? Use its official identifier from the Global Registry of Scientific Collections (GRSciColl, http://grscicoll.org/). For the Natural History Museum, London (UK) this is NHMUK (http://biocol.org/urn:lsid:biocol.org:col:34665). Want to cite the BM Archaeopteryx specimen? Use NHMUK PV OR 37001 (http://data.nhm.ac.uk/object/57ee3bf1-0a74-4ae4-a588-ba9ea8dc5265)
  • 27. Beyond linking: repatriation of knowledge. Openly licensed data on specimens, published elsewhere, could be re-incorporated into the online museum catalogue, making it a one-stop shop for information. This is a CT scan of “BMNH 76.3.15.14”. Without mining, I wouldn’t know this data exists. Perhaps it could also be made available on the portal? http://data.nhm.ac.uk/specimen/69e97f52-0275-4a82-9fa6-cf1c3949f408 Credit: Davies KTJ, Bates PJJ, Maryanto I, Cotton JA, Rossiter SJ (2013) The Evolution of Bat Vestibular Systems in the Face of Potential Antagonistic Selection Pressures for Flight and Echolocation. PLoS ONE 8(4): e61998. doi:10.1371/journal.pone.0061998
  • 28. Does published info make it back ‘home’ to the collections? BMNH 2013.2.13.3 is on the portal as “Petrochromis nov.sp. Takahashi”. I found it (by text mining) here: http://dx.doi.org/10.1007/s10228-014-0396-9 According to the paper, it is now called Petrochromis horii n. sp. What mechanisms are there to feed newer information back into the collection? Content mining could definitely help keep collections data up to date!
  • 29. Can we create a (better) digital NHM metadata catalogue entirely from the literature, hundreds of years before the NHM themselves complete their own digitisation programme? Given funding and time, perhaps

  • 30. Acknowledgements. Sincere thanks to: Aime Rankin for help with the project; the NHM Library staff, particularly Sarah Vincent, for actively supporting my content mining; Nancy Chillingsworth (IPR, NHM London); Mark Wilkinson (Life Sciences, NHM London); Peter Murray-Rust & the ContentMine team; Vince Smith (Life Sciences, NHM London); Ben Scott (NHM Data Portal Lead Architect); Rod Page (University of Glasgow); all of the Biodiversity Informatics team. http://contentmine.org/ For a more detailed version of this talk on YouTube see: bit.ly/nhmlink
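The first step from slide 6 is spotting a specimen number in running text such as a figure caption. That step can be sketched with a regular expression. This is a minimal illustration and not the method used in the project. The pattern is an assumption that only covers "BMNH" followed by a dotted numeric register number. `extract_bmnh_numbers` is a hypothetical helper name.

```python
import re

# Assumed pattern: "BMNH", an optional space or dot, then a dotted numeric
# register number such as 76.3.15.14 or 2013.2.13.3. Forms with letters,
# e.g. "BMNH(E)609062" or "BMNH R 1234", are NOT covered by this sketch.
BMNH_PATTERN = re.compile(r"\bBMNH[ .]?(\d+(?:\.\d+)*)")


def extract_bmnh_numbers(text: str) -> list:
    """Return every BMNH register number in `text`, rewritten as 'BMNH <number>'."""
    return [f"BMNH {m.group(1)}" for m in BMNH_PATTERN.finditer(text)]


caption = ("Scans are from the following species: "
           "(A) Pteropus rodricensis (BMNH.76.3.15.14);")
print(extract_bmnh_numbers(caption))  # ['BMNH 76.3.15.14']
```

Rewriting every hit into one canonical "BMNH <number>" form makes later matching against catalogue records simpler. A trailing full stop at the end of a sentence is not swallowed into the number.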
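Slide 13 argues for section-aware search. The idea can be illustrated with Python's standard `xml.etree.ElementTree`. The XML below is an invented, stripped-down JATS-style fragment. `sections_mentioning` is a hypothetical helper. It reports which top-level body section (by its `sec-type` attribute, falling back to its title) or which `<ack>` acknowledgements element contains a given string. This lets an 'NHM' in Materials and Methods be told apart from one that only appears in the thanks. Real JATS files also carry namespaces, a DOCTYPE and nested sections, which this sketch does not attempt to handle.

```python
import xml.etree.ElementTree as ET

# Invented, minimal JATS-style fragment for illustration only.
JATS = """<article>
  <body>
    <sec sec-type="intro"><title>Introduction</title>
      <p>Background text without any museum mention.</p></sec>
    <sec sec-type="materials|methods"><title>Materials and Methods</title>
      <p>Skulls were scanned at the NHM (BMNH.76.3.15.14).</p></sec>
  </body>
  <back>
    <ack><p>We thank the NHM library staff.</p></ack>
  </back>
</article>"""


def sections_mentioning(xml_text: str, needle: str) -> list:
    """Return the sec-type (or title) of each top-level body section, plus 'ack'
    for the acknowledgements, whose text contains `needle`."""
    root = ET.fromstring(xml_text)
    hits = []
    for sec in root.findall("./body/sec"):
        if needle in "".join(sec.itertext()):
            hits.append(sec.get("sec-type") or sec.findtext("title", default="untitled"))
    for ack in root.findall("./back/ack"):
        if needle in "".join(ack.itertext()):
            hits.append("ack")
    return hits


hits = sections_mentioning(JATS, "NHM")
print(hits)                               # ['materials|methods', 'ack']
print(any("methods" in h for h in hits))  # True: a Methods mention, not just thanks
```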
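Slide 20 describes matching in-text mentions to Data Portal records. This needs some normalisation first, because the same specimen can be written "BMNH.76.3.15.14" or "BMNH 76.3.15.14". The sketch below assumes a local dictionary (`PORTAL_RECORDS`, hypothetical) that maps normalised numbers, as written in papers, to record URLs. It does not call the Data Portal itself and makes no claim about how the portal stores catalogue numbers. Mentions with no entry correspond to the red "not on the Data Portal ...yet" specimens.

```python
import re

# Hypothetical local lookup: normalised catalogue number as written in papers ->
# Data Portal record URL. The one entry pairs the specimen and URL quoted on
# slide 6; a real table would have to be built from the Data Portal itself.
PORTAL_RECORDS = {
    "BMNH 76.3.15.14":
        "http://data.nhm.ac.uk/specimen/69e97f52-0275-4a82-9fa6-cf1c3949f408",
}


def normalise(mention: str) -> str:
    """Trim whitespace and collapse whatever separates 'BMNH' from the number into one space."""
    return re.sub(r"^BMNH[\s.]*", "BMNH ", mention.strip())


def match_mentions(mentions):
    """Split mentions into a {mention: URL} dict and a list of unmatched ('red') mentions."""
    matched, unmatched = {}, []
    for mention in mentions:
        url = PORTAL_RECORDS.get(normalise(mention))
        if url is None:
            unmatched.append(mention)
        else:
            matched[mention] = url
    return matched, unmatched


matched, unmatched = match_mentions(
    ["BMNH.76.3.15.14", "BMNH 76.3.15.14", "NHMUK PV OR 37001"])
print(len(matched), unmatched)  # 2 ['NHMUK PV OR 37001']
```

Here "NHMUK PV OR 37001" ends up unmatched only because the toy table lacks it. Slide 26 gives its portal URL.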
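The supplementary files on slide 21 come in many formats. The simplest case, a CSV table, can be scanned cell by cell with the standard `csv` module. The table and its specimen numbers below are invented. The institution codes (CAS, BMNH) come from the quoted paper. The regular expression is an assumption that only covers those two codes followed by dotted digits. Spreadsheets, Word files and PDFs would each need their own extraction step before a scan like this.

```python
import csv
import io
import re

# Assumed pattern: one of two institution codes, an optional space or dot,
# then a dotted numeric catalogue number.
CODE = re.compile(r"\b(?:BMNH|CAS)[ .]?\d+(?:\.\d+)*")


def scan_table(csv_text: str):
    """Yield (row_index, column_name, code) for every specimen code found in any cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for i, row in enumerate(reader):
        for column, cell in row.items():
            for code in CODE.findall(cell or ""):
                yield i, column, code


# Invented two-row table standing in for a "Table S1"-style supplementary file.
TABLE_S1 = """taxon,voucher,notes
Taxon A,CAS 123456,collected 2010
Taxon B,BMNH 1234.5.6.7,paratype; see also CAS 654321
"""

for hit in scan_table(TABLE_S1):
    print(hit)
```

Scanning every cell, not just a "voucher" column, catches codes tucked into free-text notes.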
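Expressed as rates, the counts on slide 23 make the scarcity concrete. These figures cover the main texts only, not supplementary files.

```latex
\begin{aligned}
r_{\text{Science}} &= \frac{11}{39\,600} \approx 2.8 \times 10^{-4}
  &&\text{(one NHM specimen per } 3\,600 \text{ full texts)}\\
r_{\text{Nature}} &< \frac{30}{14\,132} \approx 2.1 \times 10^{-3}
  &&\text{(fewer than one per } {\approx}\,471 \text{ full texts)}
\end{aligned}
```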
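Slide 24 observes that very few specimens occur in more than one paper. That observation comes from a specimen-to-paper mapping. A minimal way to compute it is to collect a set of papers per specimen, assuming mentions have already been extracted as (paper, specimen) pairs. The pairs below are invented placeholders, not project data. `specimens_in_multiple_papers` is a hypothetical name.

```python
from collections import defaultdict


def specimens_in_multiple_papers(mentions):
    """Given (paper_id, specimen_id) pairs, return {specimen: sorted paper ids}
    for specimens mentioned by more than one distinct paper."""
    papers_by_specimen = defaultdict(set)
    for paper, specimen in mentions:
        papers_by_specimen[specimen].add(paper)  # a set ignores repeat mentions
    return {s: sorted(p) for s, p in papers_by_specimen.items() if len(p) > 1}


# Invented pairs; "paper-A" etc. are placeholders, not real articles.
pairs = [
    ("paper-A", "BMNH 37001"),
    ("paper-A", "BMNH 37001"),
    ("paper-B", "BMNH 37001"),
    ("paper-B", "BMNH 76.3.15.14"),
    ("paper-C", "BMNH 2013.2.13.3"),
]
print(specimens_in_multiple_papers(pairs))  # {'BMNH 37001': ['paper-A', 'paper-B']}
```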
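Because of the ambiguity noted on slide 25, a bare "BMNH" should not be resolved to an institution automatically. One cautious heuristic, offered here only as an assumption, is to accept an institution only when exactly one of the expansions listed on the slide is spelled out in the surrounding text. Otherwise the mention is left unresolved for a human to check. `resolve_bmnh` is a hypothetical name.

```python
from typing import Optional

# The three expansions of "BMNH" named on slide 25 (not claimed to be exhaustive).
BMNH_EXPANSIONS = (
    "British Museum of Natural History",  # UK
    "Beijing Museum of Natural History",  # CN
    "Bell Museum of Natural History",     # US
)


def resolve_bmnh(context: str) -> Optional[str]:
    """Return the single expansion spelled out in `context`; None if none or several appear."""
    lowered = context.lower()
    found = [name for name in BMNH_EXPANSIONS if name.lower() in lowered]
    return found[0] if len(found) == 1 else None


quote = ("Specimens were deposited in the collections of the California Academy of "
         "Sciences' Department of Herpetology (CAS), the British Museum of Natural "
         "History (BMNH) and of author GJM (Table S1)")
print(resolve_bmnh(quote))                      # British Museum of Natural History
print(resolve_bmnh("BMNH material examined."))  # None
```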
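Slide 28 asks how newer published information could flow back into collection records. One way content mining could help is to look for provisional names on the portal and pair them with the name found in the literature. The markers in the regular expression (`nov.sp.`, `sp. nov.`, `n. sp.`) are an assumed heuristic. The hypothetical `suggest_update` function only proposes a candidate; whether the record should change remains a curatorial decision.

```python
import re
from typing import Optional

# Assumed heuristic: these open-nomenclature markers signal a provisional name.
PROVISIONAL = re.compile(r"\b(?:nov\.\s*sp\.|sp\.\s*nov\.|n\.\s*sp\.)", re.IGNORECASE)


def suggest_update(portal_name: str, literature_name: str) -> Optional[str]:
    """Return `literature_name` as a candidate update when the portal name looks
    provisional and differs from it; otherwise return None. A curator still decides."""
    if PROVISIONAL.search(portal_name) and literature_name != portal_name:
        return literature_name
    return None


print(suggest_update("Petrochromis nov.sp. Takahashi", "Petrochromis horii"))
# Petrochromis horii
```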