CONTENT-MINING IN SCIENCE
TheContentMine
Progress since “Hargreaves” legislation
Opportunities for UK, and Europe
Peter Murray-Rust, 2015-04-14
Workshop sponsored by Wellcome Trust
OUR TEAM
Jenny Molloy
@jenny_molloy
Ross Mounce
@rmounce
Richard Smith-Unna
@blahah404
Stephanie Smith-Unna
@treblesteph
Mark MacGillivray
@cottagelabs
Peter Murray-Rust
@petermurrayrust
Charles Oppenheim
@CharlesOppenh
Graham Steel
@McDawg
OUR MISSION
“make 100,000,000 facts from the STEM
literature open, accessible and reusable”
WHY?
http://www.nytimes.com/2015/04/08/opinion/yes-we-were-warned-about-ebola.html
We were stunned recently when we stumbled across an article by European
researchers in Annals of Virology [1982]: “The results seem to indicate that
Liberia has to be included in the Ebola virus endemic zone.” In the future,
the authors asserted, “medical personnel in Liberian health centers should be
aware of the possibility that they may come across active cases and thus be
prepared to avoid nosocomial epidemics,” referring to hospital-acquired
infection.
Adage in public health: “The road to inaction is paved with research papers.”
Bernice Dahn is the chief medical officer of Liberia’s Ministry of Health,
where Vera Mussah is the director of county health services. Cameron Nutt
is the Ebola response adviser to Partners in Health.
THE RIGHT TO READ IS
THE RIGHT TO MINE
Recommendations of the Hargreaves report (UK)
became law in 2014, allowing
limitations and exceptions for
non-commercial content mining
for research.
The Hague decal
THE SCALE OF THE TASK
• ~ 27,000 peer reviewed journals*
• > 5,000 publishers
• ~ 3,000 new papers per day
• “costing” 15 Billion USD to publish
• Representing 500 Billion USD of research
*Ulrich’s database:
http://ulrichsweb.serialssolutions.com/login
OUR WORKSHOPS
• Shuttleworth Foundation
• Leicester Univ
• Electronic Theses and Dissertations
• Austrian Science Fund AT
• OKFest DE
• Eur. Bioinformatics Institute (x2)
• Open Science Rio de Janeiro BR
• SciDataCon, Delhi IN
• Univ of Chicago US
• OpenCon 2014, Washington DC, US
• JISC, London
• LIBER
• Cochrane UK
• British Library
• Wellcome Trust
• WHO
OUR COLLABORATORS
• Shuttleworth Foundation
• Wikimedia/Wikidata
• Mozilla
• Open Knowledge
• LIBER
• British Library
• Wellcome Trust
• EBI (Eur. Bioinf. Inst.)
• JISC
• BBSRC
• Cochrane UK
• Open Access Button
• SPARC
• Creative Commons
• CORE
• Europe PubMed Central
• Cambridge University Library
STRUCTURED INFORMATION
• chemical names and structures
• species
• metabolism
• phylogenetic trees
• …
INTERACTIVE DEMO
of content mining
http://chemicaltagger.ch.cam.ac.uk/
ContentMine at Cochrane UK, 2015-03-16
CLINICAL TRIALS
How do we find (mentions of) clinical trials?
Is a document a (clinical) trial?
What is the subject of the trial?
What is the methodology used? How many/long?
Does the design and practice conform to CONSORT?
What are the outcomes?
Can we extract specific re-usable information?
Who is involved? (researchers, sponsors, patients?)
Has a proposed trial been completed and reported?
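One simple way into the first question above is to look for registry identifiers in the text. The sketch below is only an illustration, not the method used with Cochrane UK: it assumes trials are cited by ClinicalTrials.gov (NCT) or ISRCTN numbers, each a fixed prefix followed by eight digits. The function name `find_trial_ids` and the identifiers in the sample text are made up.

```python
import re

# Registry identifier patterns: a fixed prefix followed by eight digits.
TRIAL_ID = re.compile(r"\b(NCT\d{8}|ISRCTN\d{8})\b")

def find_trial_ids(text):
    """Return distinct registry identifiers in order of first appearance."""
    seen = []
    for match in TRIAL_ID.finditer(text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen

# Made-up sample text with invented identifiers.
sample = (
    "The study was registered as NCT01234567. A follow-up "
    "(ISRCTN12345678) cites NCT01234567 again."
)
print(find_trial_ids(sample))
```

A match only shows that a trial is mentioned; deciding whether the document itself reports a trial, or conforms to CONSORT, needs more than pattern matching.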
COMMUNITY PROJECTS
• Clinical Trials (with Cochrane UK)
• Phyloinformatic Literature Unlocking Tools (PLUTo/BBSRC)
• EBI – MetaboLights
• Plant Sciences and farming (Cambridge, TGAC, OpenFarm)
• Crystallography Open Database (COD)
• OpenOil / OpenCorporates
METABOLIGHTS
• European Bioinformatics Institute
• database for metabolomics experiments and
derived information
• cross-species, cross-technique, structures,
biological roles, locations, concentrations
• http://www.ebi.ac.uk/metabolights/
CONTENTMINE WORKSHOPS AND
HACKDAYS
Open Science Brazil, 2014-08
Easily distributed software
Get started in 30 mins
Build application in a day
Start simple: bagOfWords, Stemming, Regex, templates
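A minimal sketch of what "start simple" could look like in a hackday: a regex tokenizer, a toy suffix-stripping stemmer and a bag of words. None of this is ContentMine code. The function names are hypothetical, the stemmer is a crude stand-in for a real one, and templates are not covered.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (regex)."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word):
    """Strip a few common English suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def bag_of_words(text):
    """Count stemmed tokens, ignoring word order."""
    return Counter(crude_stem(token) for token in tokenize(text))

bag = bag_of_words("Mining papers: the miner mined a paper and mines facts.")
print(bag["min"], bag["paper"])
```

Here "mining", "mined" and "mines" collapse to one stem while "miner" does not, which shows both the value and the limits of crude stemming.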
What is “Content”?
http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0111303&representation=PDF CC-BY
SECTIONS
MAPS
TABLES
CHEMISTRY
TEXT
MATH
contentmine.org tackles these
What is “Content”?
Emily Sena (neuroscience.ed.ac.uk) spends
half a day digitising a diagram like this
ContentMine will soon be able to do it in 1 second
Note jaggy and broken pixels
New bacteria must have a phylogenetic tree
[Figure: phylogenetic tree; labelled parts: binomial name, culture/strain, GenBank ID, length, weight, evolution rate]
• CRAWL the web for scientific documents
(articles, grey literature, repositories)
• quickSCRAPE pages (text, graphics, images, data)
• NORMA-lize page to semantic form
…Open semantic science …
• MINE pages with your methods and tools (AMI)
• CAT-alogue results in searchable index
• Automate daily process (CANARY)
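The steps above can be sketched as a chain of small functions. This is an in-memory toy under stated assumptions, not the actual quickscrape, Norma, AMI or CANARY code: every function name, the `Document` class, the example.org URLs and the sample pages are hypothetical, and the "mining" is a plain term lookup.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Document:
    url: str
    raw_html: str
    text: str = ""
    facts: list = field(default_factory=list)

def crawl():
    """Stand-in for CRAWL: yield (url, html) pairs instead of fetching from the web."""
    yield ("http://example.org/paper1",
           "<html><body><p>We grew <i>Escherichia coli</i> at 37 C.</p></body></html>")
    yield ("http://example.org/paper2",
           "<html><body><p>No organisms here.</p></body></html>")

def scrape(url, html):
    """Stand-in for the scrape step: wrap the fetched page in a Document."""
    return Document(url=url, raw_html=html)

def normalize(doc):
    """Stand-in for normalization: strip tags and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", doc.raw_html)
    doc.text = re.sub(r"\s+", " ", text).strip()
    return doc

def mine(doc, terms):
    """Stand-in for MINE: record which of the given terms occur in the text."""
    doc.facts = [term for term in terms if term in doc.text]
    return doc

def catalogue(docs):
    """Stand-in for CAT-alogue: index each fact to the URLs that mention it."""
    index = {}
    for doc in docs:
        for fact in doc.facts:
            index.setdefault(fact, []).append(doc.url)
    return index

def run_pipeline(terms):
    """One pass over everything crawled; a daily job would call this on a schedule."""
    docs = [mine(normalize(scrape(url, html)), terms) for url, html in crawl()]
    return catalogue(docs)

index = run_pipeline(["Escherichia coli", "Homo sapiens"])
print(index)
```

Keeping each stage as a separate function mirrors the idea that miners plug in their own methods at the MINE step without touching the rest.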
contentmine.org Infrastructure
[Diagram. Components: Crawl, Feed, quickscrape, Norma, AMI, Index & Transform. Sources: scientific literature, repositories. Inputs and formats: URL, DOI, PDF, XML, DOC, CSV, sHTML. Scrapers: bespoke, per-journal. Plugins and taggers: regex, XPath, per-journal, sequences, species, metadata, chemistry, phylogenetics, farming. Also handled: bad HTML, OCR, diagrams. Outputs: Open NORMA-lized Scientific Literature + Facts, CANARY pipeline, CAT-alogue index.]
POSSIBLE USES
• Indexing/searching the literature; G***** for science
• Current awareness; alerts and practices
• Extraction and re-use of facts; re-computation
• Multidisciplinary integration; co-occurrence
• Compliance with funder/institution policies
• Managing your Research Data!
• Finding similar and complementary colleagues
• Reproducibility, checking data and avoiding fraud
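As one illustration of the co-occurrence use above: once terms have been extracted per document, counting which pairs appear together is a few lines. This is a sketch with made-up term lists; the function name `cooccurrence` is hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs_terms):
    """Count, for each unordered pair of terms, the documents containing both."""
    counts = Counter()
    for terms in docs_terms:
        # Deduplicate and sort so each pair has one canonical form per document.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts

# Made-up extracted terms, one list per document.
docs_terms = [
    ["ebola", "liberia", "serology"],
    ["ebola", "liberia"],
    ["malaria", "liberia"],
]
pairs = cooccurrence(docs_terms)
print(pairs[("ebola", "liberia")])
```

Pairs are stored in alphabetical order, so look them up that way.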
How to leverage Content
Mining for benefit of UK/EU
• Create UK showcase of successes in mining
• Graduate training by 3rd year UK graduate students.
• Develop EuropePMC as world resource for bio-mining
• Training/support for UK/EU libraries about Hargreaves.
• Central collection of born-digital UK theses
• Collect pre-copyright author manuscripts
• Integrate CM into Research Data Management tools
• Promote mining in all aspects of healthcare information
• Open collection of extracted scientific facts for the world

 
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be BraveEarly Career Reseachers in Science. Start Early, Be Open , Be Brave
Early Career Reseachers in Science. Start Early, Be Open , Be Brave
 
Openplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searchingOpenplant2018 Poster; Semantic searching
Openplant2018 Poster; Semantic searching
 
Extracting science from the archive
Extracting science from the archiveExtracting science from the archive
Extracting science from the archive
 
WikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and EverythingWikiFactMine: Ontology for Everybody and Everything
WikiFactMine: Ontology for Everybody and Everything
 
Disrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic ComplexDisrupting the Publisher-Academic Complex
Disrupting the Publisher-Academic Complex
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Mine
 
Young people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge NeocolonialismYoung people in an Age of Knowledge Neocolonialism
Young people in an Age of Knowledge Neocolonialism
 
WikiFactMine: Science for Everyone
WikiFactMine: Science for EveryoneWikiFactMine: Science for Everyone
WikiFactMine: Science for Everyone
 
ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017ContentMining and Copyright at CopyCamp2017
ContentMining and Copyright at CopyCamp2017
 
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
The mining "Revolution"; are Libraries supporting Researchers or Publishers"?
 
WikiFactMine for Plant Chemistry
WikiFactMine for Plant ChemistryWikiFactMine for Plant Chemistry
WikiFactMine for Plant Chemistry
 

Kürzlich hochgeladen

Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learningvschiavoni
 
Unveiling the Cannabis Plant’s Potential
Unveiling the Cannabis Plant’s PotentialUnveiling the Cannabis Plant’s Potential
Unveiling the Cannabis Plant’s PotentialMarkus Roggen
 
dll general biology week 1 - Copy.docx
dll general biology   week 1 - Copy.docxdll general biology   week 1 - Copy.docx
dll general biology week 1 - Copy.docxkarenmillo
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPRPirithiRaju
 
linear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annovalinear Regression, multiple Regression and Annova
linear Regression, multiple Regression and AnnovaMansi Rastogi
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11GelineAvendao
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxpriyankatabhane
 
Advances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerAdvances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerLuis Miguel Chong Chong
 
Measures of Central Tendency.pptx for UG
Measures of Central Tendency.pptx for UGMeasures of Central Tendency.pptx for UG
Measures of Central Tendency.pptx for UGSoniaBajaj10
 
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsTotal Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsMarkus Roggen
 
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...Chayanika Das
 
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer ZahanaEGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer ZahanaDr.Mahmoud Abbas
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxMedical College
 
cybrids.pptx production_advanges_limitation
cybrids.pptx production_advanges_limitationcybrids.pptx production_advanges_limitation
cybrids.pptx production_advanges_limitationSanghamitraMohapatra5
 
Probability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UGProbability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UGSoniaBajaj10
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPRPirithiRaju
 

Kürzlich hochgeladen (20)

Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
 
Unveiling the Cannabis Plant’s Potential
Unveiling the Cannabis Plant’s PotentialUnveiling the Cannabis Plant’s Potential
Unveiling the Cannabis Plant’s Potential
 
dll general biology week 1 - Copy.docx
dll general biology   week 1 - Copy.docxdll general biology   week 1 - Copy.docx
dll general biology week 1 - Copy.docx
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
 
linear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annovalinear Regression, multiple Regression and Annova
linear Regression, multiple Regression and Annova
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
 
PLASMODIUM. PPTX
PLASMODIUM. PPTXPLASMODIUM. PPTX
PLASMODIUM. PPTX
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
 
Advances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerAdvances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of Cancer
 
Measures of Central Tendency.pptx for UG
Measures of Central Tendency.pptx for UGMeasures of Central Tendency.pptx for UG
Measures of Central Tendency.pptx for UG
 
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of CannabinoidsTotal Legal: A “Joint” Journey into the Chemistry of Cannabinoids
Total Legal: A “Joint” Journey into the Chemistry of Cannabinoids
 
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
 
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer ZahanaEGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
EGYPTIAN IMPRINT IN SPAIN Lecture by Dr Abeer Zahana
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
Introduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptxIntroduction of Human Body & Structure of cell.pptx
Introduction of Human Body & Structure of cell.pptx
 
Ultrastructure and functions of Chloroplast.pptx
Ultrastructure and functions of Chloroplast.pptxUltrastructure and functions of Chloroplast.pptx
Ultrastructure and functions of Chloroplast.pptx
 
cybrids.pptx production_advanges_limitation
cybrids.pptx production_advanges_limitationcybrids.pptx production_advanges_limitation
cybrids.pptx production_advanges_limitation
 
Interferons.pptx.
Interferons.pptx.Interferons.pptx.
Interferons.pptx.
 
Probability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UGProbability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UG
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
 

Content Mining at Wellcome Trust

  • 1. CONTENT-MINING IN SCIENCE TheContentMine Progress since “Hargreaves” legislation Opportunities for UK, and Europe Peter Murray-Rust, 2015-04-14 Workshop sponsored by Wellcome Trust
  • 2. OUR TEAM Jenny Molloy (@jenny_molloy), Ross Mounce (@rmounce), Richard Smith-Unna (@blahah404), Stephanie Smith-Unna (@treblesteph), Mark MacGillivray (@cottagelabs), Peter Murray-Rust (@petermurrayrust), Charles Oppenheim (@CharlesOppenh), Graham Steel (@McDawg)
  • 3. OUR MISSION “make 100,000,000 facts from the STEM literature open, accessible and reusable”
  • 4. WHY? http://www.nytimes.com/2015/04/08/opinion/yes-we-were-warned-about-ebola.html We were stunned recently when we stumbled across an article by European researchers in Annals of Virology [1982]: “The results seem to indicate that Liberia has to be included in the Ebola virus endemic zone.” In the future, the authors asserted, “medical personnel in Liberian health centers should be aware of the possibility that they may come across active cases and thus be prepared to avoid nosocomial epidemics,” referring to hospital-acquired infection. Adage in public health: “The road to inaction is paved with research papers.” Bernice Dahn is the chief medical officer of Liberia’s Ministry of Health, where Vera Mussah is the director of county health services. Cameron Nutt is the Ebola response adviser to Partners in Health.
  • 5. THE RIGHT TO READ IS THE RIGHT TO MINE The Hargreaves report (UK), legalised 2014, allows limitations and exceptions for non-commercial content mining for research. The Hague decal
  • 6. THE SCALE OF THE TASK • ~ 27,000 peer reviewed journals* • > 5,000 publishers • ~ 3,000 new papers per day • “costing” 15 Billion USD to publish • Representing 500 Billion USD of research *Ulrich’s database: http://ulrichsweb.serialssolutions.com/login
  • 7. OUR WORKSHOPS
      • Shuttleworth Foundation
      • Leicester Univ
      • Electronic Theses and Dissertations
      • Austrian Science Fund AT
      • OKFest DE
      • Eur. Bioinformatics Institute (x2)
      • Open Science Rio de Janeiro BR
      • Sci DataCon, Delhi IN
      • Univ of Chicago US
      • OpenCon 2014, Wash DC, US
      • JISC, London
      • LIBER
      • Cochrane UK
      • British Library
      • Wellcome Trust
      • WHO
    OUR COLLABORATORS
      • Shuttleworth Foundation
      • Wikimedia/Wikidata
      • Mozilla
      • Open Knowledge
      • LIBER
      • British Library
      • Wellcome Trust
      • EBI (Eur. Bioinf. Inst.)
      • JISC
      • BBSRC
      • Cochrane UK
      • Open Access Button
      • SPARC
      • Creative Commons
      • CORE
      • EuropePubmedCentral
      • Cambridge University Library
  • 8. STRUCTURED INFORMATION • chemical names and structures • species • metabolism • phylogenetic trees • …
  • 9. INTERACTIVE DEMO of content mining http://chemicaltagger.ch.cam.ac.uk/
  • 10. ContentMine at Cochrane UK, 2015-03-16
  • 11. CLINICAL TRIALS
      • How do we find (mentions of) clinical trials?
      • Is a document a (clinical) trial?
      • What is the subject of the trial?
      • What is the methodology used?
      • How many/long?
      • Do the design and practice conform to CONSORT?
      • What are the outcomes?
      • Can we extract specific re-usable information?
      • Who is involved? (researchers, sponsors, patients?)
      • Has a proposed trial been completed and reported?
  • 12. COMMUNITY PROJECTS • Clinical Trials (with Cochrane UK) • Phyloinformatic Literature Unlocking Tools (PLUTo/BBSRC) • EBI – MetaboLights • Plant Sciences and farming (Cambridge, TGAC, OpenFarm) • Crystallography Open Database (COD) • OpenOil / OpenCorporates
  • 13. METABOLIGHTS • European Bioinformatics Institute • database for metabolomics experiments and derived information • cross-species, cross-technique, structures, biological roles, locations, concentrations • http://www.ebi.ac.uk/metabolights/
  • 14. CONTENTMINE WORKSHOPS AND HACKDAYS (Open Science Brazil, 2014-08)
      • Easily distributed software
      • Get started in 30 mins
      • Build application in a day
      • Start simple: bagOfWords, Stemming, Regex, templates
  • 16. What is “Content”? Emily Sena (neuroscience.ed.ac.uk) spends half a day digitising a diagram like this ContentMine will soon be able to do it in 1 second
  • 17. [Figure: phylogenetic tree diagram. Annotations: “Note jaggy and broken pixels”; “NEW Bacteria must have a phylogenetic tree”. Labels: Length, Weight, Binomial Name, Culture/Strain, GENBANK ID, Evolution Rate]
  • 18. contentmine.org Infrastructure
      • CRAWL the web for scientific documents (articles, grey literature, repositories)
      • quickSCRAPE pages (text, graphics, images, data)
      • NORMA-lize page to semantic form … Open semantic science …
      • MINE pages with your methods and tools (AMI)
      • CAT-alogue results in searchable index
      • Automate daily process (CANARY)
  • 19. [Diagram: CANARY pipeline. Stages: Crawl, quickscrape, Feed, Norma, AMI, Index & Transform, CAT-alogue index. Inputs: scientific literature, repositories; PDF, XML, URL, DOI, DOC, CSV, sHTML. Other labels: plugins, regex, sequences, species, chemistry, phylogenetics, farming, per-journal taggers, bespoke and per-journal scrapers, XPath, metadata, bad HTML, OCR, diagrams. Output: open NORMA-lized scientific literature + facts]
  • 20. POSSIBLE USES • Indexing/searching the literature; G***** for science • Current awareness; alerts and practices • Extraction and re-use of facts; re-computation • Multidisciplinary integration; co-occurrence • Compliance with funder/institution policies • Managing your Research Data! • Finding similar and complementary colleagues • Reproducibility, checking data and avoiding fraud
  • 21. How to leverage Content Mining for benefit of UK/EU • Create UK showcase of successes in mining • Graduate training by 3rd year UK graduate students. • Develop EuropePMC as world resource for bio-mining • Training/support for UK/EU libraries about Hargreaves. • Central collection of born-digital UK theses • Collect pre-copyright author manuscripts • Integrate CM into Research Data Management tools • Promote mining in all aspects of healthcare information • Open collection of extracted scientific facts for the world
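
Slide 14 suggests starting simple (bagOfWords, stemming, regex), and slide 18 lists the normalise, mine and catalogue steps. The following is a minimal Python sketch of those ideas, not the ContentMine software itself. Every function name, the stop-word list, the sample documents and the trial identifiers in them are hypothetical and chosen only for illustration; none of this is the API of quickscrape, Norma, AMI or CAT. The regex targets ClinicalTrials.gov registry identifiers ("NCT" followed by eight digits), as one simple way to find mentions of clinical trials.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "was", "were"}

# ClinicalTrials.gov registry identifiers: "NCT" followed by eight digits.
TRIAL_ID = re.compile(r"\bNCT\d{8}\b")


def normalise(raw: str) -> str:
    """Crude stand-in for a normalisation step: drop tags, collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def crude_stem(word: str) -> str:
    """Very rough suffix stripping; a real stemmer (e.g. Porter) does far more."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text: str) -> Counter:
    """Count stemmed, lower-cased tokens, ignoring stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def mine(doc_id: str, raw: str) -> dict:
    """Normalise one document, then extract trial IDs and a bag of words."""
    text = normalise(raw)
    return {
        "id": doc_id,
        "trial_ids": sorted(set(TRIAL_ID.findall(text))),
        "words": bag_of_words(text),
    }


def catalogue(records: list) -> dict:
    """Build a searchable index: trial ID -> documents that mention it."""
    index = {}
    for record in records:
        for trial_id in record["trial_ids"]:
            index.setdefault(trial_id, []).append(record["id"])
    return index


# Hypothetical input documents with made-up identifiers.
docs = {
    "doc1": "<p>Patients were randomised in trial NCT01234567.</p>",
    "doc2": "<p>We report outcomes of NCT01234567 and NCT07654321.</p>",
}
records = [mine(doc_id, raw) for doc_id, raw in docs.items()]
index = catalogue(records)
print(index)
```

The index maps each extracted identifier to the documents that mention it, which is also the starting point for the co-occurrence use listed on slide 20. Swapping the regex for another pattern (species names, chemical identifiers) leaves the rest of the sketch unchanged.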

Editor's notes

  1. This presentation will be a quick introduction to the ContentMine software for literature scraping, normalising, and fact extraction.
  2. Because information is structured (some examples listed), we can aggregate similar objects and mine using a modular systematic approach.
  3. Can describe each collaboration, but keep this slide brief if the presentation is short.