Defrosting the Digital Library: A survey of bibliographic tools for the next generation Web. Duncan Hull. Faculty of Life Sciences (1992-6) BSc; Computer Science (2002-2007) MSc, PhD; Chemistry (2008-date) Postdoc
It’s all Casey’s fault! Dr. Casey Bergman, Lecturer  Faculty of Life Sciences I  s  Citeulike.org! http://ukpmc.ac.uk/
Defrosting the Digital Library (in one slide)
Metawhat? getMetadata getData Title: defrosting the digital library Authors: Duncan Hull, Steve Pettifer and Douglas Kell Published: 2008 Journal: PLoS Computational Biology Tell me more? What is it about? Where did it come from?
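The getMetadata / getData split on this slide can be sketched in code. This is only an illustration: the `Publication` class and its method names are hypothetical, chosen to mirror the slide's labels, and the body bytes are a placeholder. The field values are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical container pairing a paper's content with its description."""
    title: str
    authors: list[str]
    published: int
    journal: str
    body: bytes  # the data itself, e.g. the bytes of a PDF

    def get_metadata(self) -> dict:
        # Answers "What is it about? Where did it come from?"
        return {
            "title": self.title,
            "authors": self.authors,
            "published": self.published,
            "journal": self.journal,
        }

    def get_data(self) -> bytes:
        # Returns the content itself, without any description.
        return self.body


paper = Publication(
    title="Defrosting the digital library",
    authors=["Duncan Hull", "Steve Pettifer", "Douglas Kell"],
    published=2008,
    journal="PLoS Computational Biology",
    body=b"%PDF placeholder bytes, not a real file",
)
print(paper.get_metadata()["journal"])
```

The point of the sketch is that the description can be requested without touching the content, and the other way round.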
Metadata in: Chemistry (Science of Matter) Biology (Science of Life) Informatics (Science of  Information) Cheminformatics Biochemistry Bioinformatics Science! www.mib.ac.uk nactem.ac.uk/refine www.citeulike.org
Representing Evidence For Interacting Network Elements (REFINE) www.sbml.org from www.biomodels.net database at the EBI.ac.uk
Example from Glycolysis in Yeast reactant reactant product product modifier This is just one reaction, there are at least another 1700+ in Yeast
Synonyms from Pedro Mendes B-Net Database http://www.comp-sys-bio.org/yeastnet/
Name: Synonyms
D-Glucose: dextrose; D-Glucose; D-(+)-glucose; D(+)-glucose; grape sugar; Traubenzucker
ATP: Adenosine 5'-triphosphate; Adenosine triphosphate; H4atp
Hexokinase: Hexokinase-1; Hexokinase-A; Hexokinase PI; YFR053C
ADP: 5'-adenylphosphoric acid; Adenosine 5'-diphosphate; H3adp
Glucose-6-phosphate: Robison ester; D-Glucose 6-phosphate
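One way to picture why synonyms matter for metadata: many names have to resolve to one entity. The sketch below copies the names and synonyms from this slide into a lookup table. The function names (`build_index`, `canonical_name`) and the case-insensitive matching are assumptions made for illustration, not how the B-Net database works.

```python
# Name -> synonyms, as listed on the slide.
SYNONYMS = {
    "D-Glucose": ["dextrose", "D-Glucose", "D-(+)-glucose",
                  "D(+)-glucose", "grape sugar", "Traubenzucker"],
    "ATP": ["Adenosine 5'-triphosphate", "Adenosine triphosphate", "H4atp"],
    "Hexokinase": ["Hexokinase-1", "Hexokinase-A", "Hexokinase PI", "YFR053C"],
    "ADP": ["5'-adenylphosphoric acid", "Adenosine 5'-diphosphate", "H3adp"],
    "Glucose-6-phosphate": ["Robison ester", "D-Glucose 6-phosphate"],
}


def build_index(table):
    """Map every name and synonym (case-folded) to its canonical name."""
    index = {}
    for name, synonyms in table.items():
        for term in [name, *synonyms]:
            index[term.casefold()] = name
    return index


INDEX = build_index(SYNONYMS)


def canonical_name(term):
    """Return the canonical name for a term, or None if it is unknown."""
    return INDEX.get(term.strip().casefold())


print(canonical_name("Traubenzucker"))
print(canonical_name("YFR053C"))
```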
Chemistry Biology Informatics Cheminformatics Biochemistry Bioinformatics
For more info. www.nactem.ac.uk/refine   One of the biggest challenges is getting hold of accurate metadata from libraries and databases
But first…
getMetadata getData 6 million+ “units” sold worldwide to date: America, Europe, Middle East, Africa, Australasia Lots of data, metadata and money! Owner’s handbook Tell me more? What is it about?
Final solution: Web XSLT Print
Summary: Lessons from Ford DATA METADATA
 
BBC Spooks? Keeping an eye on people around the world since 1939 Winston Churchill “Big British Castle” (BBC)
I  hate powerpoint Radio MS Word TV
How do they stay in business? Broadcasting House, London Foreign governments, e.g. U.S.A. etc
Word:  Not  the best way to manage data and metadata
Getting Rid of Word database XML schema Web &  Intranet Printed documents XSLT
A solution that worked! getMetadata getData Who is Thabo Mbeki? These documents are all about  Thabo Mbeki Thabo Mbeki
Summary: Lessons from the BBC
How have libraries managed metadata? Image via http://en.wikipedia.org/wiki/Library_of_Alexandria
From  ~1824  until ~1989 Photos via dpicker  http://www.flickr.com/photos/dpicker/3107856991/  and pit yacker  http://www.flickr.com/photos/78825653@N00/131611136   JRULM (Main Library) Joule  Library Mostly “private” only available to an elite (e.g. University of Manchester Students and Staff)
Data Tightly bound (literally) Rarely separated First published 1687, over 300 years old
Data and metadata were like this for centuries!
+ Tim Berners-Lee 1989
Timeline: Unchanged for centuries but… 20 years  ÷   2309 years  = <1%
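The "<1%" on this slide is a single division. A quick check, using only the two figures quoted on the slide (the variable names are illustrative):

```python
web_years = 20        # years since 1989, as quoted on the slide
library_years = 2309  # total span quoted on the slide

share = web_years / library_years
print(f"{share:.2%}")  # prints 0.87%
```

So the Web era covers a little under one percent of the timeline.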
Everything’s Gone Digital! www.scopus.com www.pubmed.gov http://ukpmc.ac.uk www.isiknowledge.com scholar.google.com
Digital Utopia? Alexander Griekspoor www.mekentosj.com
Welcome to Digital Dystopia
Isolated publication silos Chemistry Informatics Biology impersonal, isolated, unsociable, Generally rubbish
Identity Crisis part 1: Which publication?
Identity crisis part 2: Who are you? Who, who … who, who? Neil Smalheiser and Vetle Torvik Typo Attribution would seem to be a simple process and yet it represents a major, unsolved problem for information science. http://tinyurl.com/authorid
Identity crisis part 3: Mistaken Identity Dr. Duncan Hull Humble Postdoc Article about Authored-by Authored-by Wrong! “DNA mania” title http://tinyurl.com/mistakenid
Can’t get metadata (decoupled from data): PDF getMetadata getData Title: defrosting the digital library Authors: Duncan Hull, Steve Pettifer and Douglas Kell Published: 2008 Tell me more Don’t know, Try google Don’t know, Title might be “defrosting…” Where did this come from?
Can’t get metadata (decoupled from data): PDF Why can't I manage academic papers like MP3s? http://tinyurl.com/mp3vpdf James Howison, Carnegie Mellon University Data is tightly coupled to its metadata getMetadata getData Artist: The Who Title: Who Are You? Recorded: 1978 Album: Who Are You
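To make "data is tightly coupled to its metadata" concrete for MP3s, here is a minimal sketch of the ID3v1 layout, one of the tag formats used in MP3 files: a fixed 128-byte record at the very end of the file. The slide does not say which tag format it has in mind, so ID3v1 is an assumption chosen for simplicity. The audio bytes are placeholders and the helper names are hypothetical.

```python
import struct

# ID3v1: "TAG" marker, title, artist, album, year, comment, genre byte.
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")  # 128 bytes in total


def make_tag(title, artist, album, year):
    """Build an ID3v1 record; struct pads short fields with null bytes."""
    return ID3V1.pack(
        b"TAG",
        title.encode("latin-1"),
        artist.encode("latin-1"),
        album.encode("latin-1"),
        year.encode("latin-1"),
        b"",   # empty comment
        255,   # genre left undefined
    )


def read_tag(file_bytes):
    """Return the metadata stored in the last 128 bytes, or None."""
    if len(file_bytes) < ID3V1.size:
        return None
    marker, title, artist, album, year, _comment, _genre = ID3V1.unpack(
        file_bytes[-ID3V1.size:]
    )
    if marker != b"TAG":
        return None

    def text(raw):
        # Fields are null-padded; keep only the part before the padding.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
    }


fake_audio = b"\xff\xfb" + b"\x00" * 64  # placeholder, not real audio
mp3 = fake_audio + make_tag("Who Are You?", "The Who", "Who Are You", "1978")
print(read_tag(mp3))
```

Because the tag travels inside the file, any copy of the file carries its own description. A PDF downloaded from a publisher often gives a tool no equally simple place to look.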
Can’t get metadata (decoupled from data): PDF Peter Murray-Rust Hamburger (unstructured data) PDF is a hamburger,  and we're trying to turn it  back into a cow.   http://tinyurl.com/pdfhamburger   Cow (structured data) publishing text-mining
Can’t get metadata (decoupled from data): HTTP
Can’t get metadata (decoupled from data): HTTP Tim Bray, Sun Microsystems One of the Web's distinguishing features is that there's a big gaping hole where the metadata ought to be. http://tinyurl.com/nometadata
I’ll stop moaning now
www.citeulike.org   Richard Cameron Kevin Emamy Picture from  http://network.nature.com/people/mfenner/blog/2009/01/30/interview-with-kevin-emamy  and  http://www.citeulike.org/faq/faq.adp   The reason I wrote the site [citeulike.org] was, after recently coming back to academia,  I was slightly shocked by the quality of some of the tools available to help academics  do their job. I found it preferable to start writing proper tools for my own use than to use existing software.
Why should you care about citeulike?
All references in one place
Click Post to Citeulike
Tag it (optional)
Citeulike: Recoupling data and metadata
Citegeist = Citeulike + Zeitgeist
[Chart annotations: allegedly 2,243,177; ~2,000/day; variable; 674,076; 2,880/day (2 papers/min); linear growth; ~500,000]
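The "2,880/day" and "2 papers/min" annotations are the same rate in different units. A small check, with a hypothetical helper name; which service each figure belongs to is not recoverable from this transcript, so the sketch only converts units:

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def per_minute(per_day):
    """Convert a daily rate into a per-minute rate."""
    return per_day / MINUTES_PER_DAY


print(per_minute(2880))            # prints 2.0
print(round(per_minute(2000), 2))  # prints 1.39
```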
Where will citeulike break?
Why should you bother with citeulike?
Casey Bergman story I was importing papers on Solexa and 454 genome assembly and came across the following paper: http://www.citeulike.org/user/cisevol/article/1465689 which was a real find in terms of convincing me that light shotgun sequence data is worth analysing. I nicked this from a PhD student's library in Brazil http://www.citeulike.org/profile/GustavoLacerda Wouldn’t have found this any other way (e.g. keyword searching or following citation trails)
Many  different  solutions e.g.  Papyro:  Steve  Pettifer http://utopia.cs.manchester.ac.uk/
And the rest… www.mendeley.com www.zotero.org www.connotea.org www.mekentosj.com www.hubmed.org Re-couple metadata that has been de-coupled from data www.2collab.com www.refworks.com “iTunes for PDF files”
There is still lots  more metadata How many times  has  http://pubmed.gov/19060304  been cited? Who has cited  http://pubmed.gov/19060304   ?  Give me all the references that cite this one Give me all the references cited by  http://pubmed.gov/19060304   Who the hell is Doug Kell? Steve Pettifer? Duncan Hull? What is Doug Kell’s h-index? Remember: Machines ask these questions, not just humans Notify me whenever Steve Pettifer publishes a paper Notify me whenever someone cites http://pubmed.gov/19060304   Impact factor?
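Several of these questions ("who has cited this?", "what is the h-index?") reduce to simple operations once citation metadata is available in machine-readable form. The sketch below uses a tiny made-up citation graph: the `paper:A`, `paper:B` and `paper:C` identifiers and all counts are hypothetical, and only the `pmid:19060304` label comes from the slide. The h-index follows the standard definition: the largest h such that h papers have at least h citations each.

```python
from collections import defaultdict

# Hypothetical toy data: citing paper -> papers it cites.
CITES = {
    "paper:A": ["pmid:19060304", "paper:B"],
    "paper:B": ["pmid:19060304"],
    "pmid:19060304": ["paper:C"],
}


def invert(cites):
    """Turn 'cites' links into 'cited by' links."""
    cited_by = defaultdict(list)
    for citing, references in cites.items():
        for reference in references:
            cited_by[reference].append(citing)
    return dict(cited_by)


def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


cited_by = invert(CITES)
print(cited_by["pmid:19060304"])   # who cites it
print(CITES["pmid:19060304"])      # what it cites
print(h_index([10, 8, 5, 4, 3]))   # prints 4
```

None of this is hard once identifiers for papers and people are unambiguous, which is the difficulty the preceding identity-crisis slides describe.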
Digital Identity would solve  some  of these problems Give yourself a URI,  you deserve it! Tim Berners-Lee  http://www.w3.org/People/Berners-Lee/card#i see  http://dig.csail.mit.edu/breadcrumbs/node/71
URIs for Douglas Kell www.myopenid.com www.openid.net (Also note ResearcherID from Thomson)
Phil Bourne
Science is public knowledge http://tinyurl.com/publicknowledge
Conclusions: What hasn’t changed
Conclusions: Publication metadata matters
Conclusions: Scientists are too blasé about metadata! metadata
Conclusions: Do us a favour!
Acknowledgements

Weitere ähnliche Inhalte

Was ist angesagt?

Modern Tools & Rationales for 21st Century Research
Modern Tools & Rationales  for 21st Century ResearchModern Tools & Rationales  for 21st Century Research
Modern Tools & Rationales for 21st Century ResearchRoss Mounce
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themRoss Mounce
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Ross Mounce
 
Open Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureOpen Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureRoss Mounce
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData TheContentMine
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebMartin Kalfatovic
 
Data, data, data
Data, data, dataData, data, data
Data, data, dataandrewxhill
 
Best Practices for Multilingual Linked Open Data
Best Practices for Multilingual Linked Open DataBest Practices for Multilingual Linked Open Data
Best Practices for Multilingual Linked Open DataJose Emilio Labra Gayo
 
Introduction to bibframe
Introduction to bibframeIntroduction to bibframe
Introduction to bibframeKai Li
 
ContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesesContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesespetermurrayrust
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)net2-project
 
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsLinked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsMark Matienzo
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic WebMark Matienzo
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebMartin Kalfatovic
 
Data Journalism (City Online Journalism wk8)
Data Journalism (City Online Journalism wk8)Data Journalism (City Online Journalism wk8)
Data Journalism (City Online Journalism wk8)Paul Bradshaw
 
Transcript - Provenance and Social Science data
Transcript  - Provenance and Social Science dataTranscript  - Provenance and Social Science data
Transcript - Provenance and Social Science dataARDC
 
YQL:: Select * from Internet
YQL:: Select * from InternetYQL:: Select * from Internet
YQL:: Select * from Internetdrgath
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for LibrariesLukas Koster
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neurosciencepetermurrayrust
 

Was ist angesagt? (20)

Modern Tools & Rationales for 21st Century Research
Modern Tools & Rationales  for 21st Century ResearchModern Tools & Rationales  for 21st Century Research
Modern Tools & Rationales for 21st Century Research
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on them
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
 
Open Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | FutureOpen Research Data: Licensing | Standards | Future
Open Research Data: Licensing | Standards | Future
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic Web
 
Data, data, data
Data, data, dataData, data, data
Data, data, data
 
Best Practices for Multilingual Linked Open Data
Best Practices for Multilingual Linked Open DataBest Practices for Multilingual Linked Open Data
Best Practices for Multilingual Linked Open Data
 
Introduction to bibframe
Introduction to bibframeIntroduction to bibframe
Introduction to bibframe
 
ContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesesContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and theses
 
Unknown Unknowns
Unknown UnknownsUnknown Unknowns
Unknown Unknowns
 
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
SPARQL1.1 Tutorial, given in UChile by Axel Polleres (DERI)
 
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsLinked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic Web
 
The Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic WebThe Digital Library from Information Superhighway to the Semiotic Web
The Digital Library from Information Superhighway to the Semiotic Web
 
Data Journalism (City Online Journalism wk8)
Data Journalism (City Online Journalism wk8)Data Journalism (City Online Journalism wk8)
Data Journalism (City Online Journalism wk8)
 
Transcript - Provenance and Social Science data
Transcript  - Provenance and Social Science dataTranscript  - Provenance and Social Science data
Transcript - Provenance and Social Science data
 
YQL:: Select * from Internet
YQL:: Select * from InternetYQL:: Select * from Internet
YQL:: Select * from Internet
 
Linked Open Data for Libraries
Linked Open Data for LibrariesLinked Open Data for Libraries
Linked Open Data for Libraries
 
ContentMining in Neuroscience
ContentMining in NeuroscienceContentMining in Neuroscience
ContentMining in Neuroscience
 

Ähnlich wie Defrosting the Digital Library: A survey of bibliographic tools for the next generation web

Is this BIG DATA which I see before me?
Is this BIG DATA which I see before me?Is this BIG DATA which I see before me?
Is this BIG DATA which I see before me?Dorothea Salo
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDan Brickley
 
The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960mare34
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsDuncan Hull
 
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...Boris Adryan
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future? ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future? Dr. Haxel Consult
 
myExperiment @ Nettab
myExperiment @ NettabmyExperiment @ Nettab
myExperiment @ NettabDuncan Hull
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingGigaScience, BGI Hong Kong
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machinespetermurrayrust
 
Module 1 - Data Around Us .pptx
Module 1 - Data Around Us .pptxModule 1 - Data Around Us .pptx
Module 1 - Data Around Us .pptxesta2310819
 
Introduction to the Venice Time Machine
Introduction to the Venice Time MachineIntroduction to the Venice Time Machine
Introduction to the Venice Time MachineGiovanni Colavizza
 
Blogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsBlogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsJeremy Frey
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysisLuke Czarnecki
 
HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7Scott Edmunds
 
Data Integration Lecture
Data Integration LectureData Integration Lecture
Data Integration LectureSUNY Oneonta
 

Ähnlich wie Defrosting the Digital Library: A survey of bibliographic tools for the next generation web (20)

Is this BIG DATA which I see before me?
Is this BIG DATA which I see before me?Is this BIG DATA which I see before me?
Is this BIG DATA which I see before me?
 
2015 illinois-talk
2015 illinois-talk2015 illinois-talk
2015 illinois-talk
 
Describing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classificationDescribing Everything - Open Web standards and classification
Describing Everything - Open Web standards and classification
 
The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
 
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
O'Reilly Webcast: Organizing the Internet of Things - Actionable Insight Thro...
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future? ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
ICIC 2014 What Can We Learn from Our Past, that Equips Us for the Future?
 
A biologist in e-Science
A biologist in e-ScienceA biologist in e-Science
A biologist in e-Science
 
myExperiment @ Nettab
myExperiment @ NettabmyExperiment @ Nettab
myExperiment @ Nettab
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machines
 
Module 1 - Data Around Us .pptx
Module 1 - Data Around Us .pptxModule 1 - Data Around Us .pptx
Module 1 - Data Around Us .pptx
 
Introduction to the Venice Time Machine
Introduction to the Venice Time MachineIntroduction to the Venice Time Machine
Introduction to the Venice Time Machine
 
Blogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart LabsBlogs Logs Pods: Smart Labs
Blogs Logs Pods: Smart Labs
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysis
 
HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7HKU Data Curation MLIM7350 Class 7
HKU Data Curation MLIM7350 Class 7
 
Web3uploaded
Web3uploadedWeb3uploaded
Web3uploaded
 
Data Integration Lecture
Data Integration LectureData Integration Lecture
Data Integration Lecture
 

Mehr von Duncan Hull

Why study plants?
Why study plants?Why study plants?
Why study plants?Duncan Hull
 
Embedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculumEmbedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculumDuncan Hull
 
Wikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the UglyWikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the UglyDuncan Hull
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Duncan Hull
 
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusBibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusDuncan Hull
 
Accessing small molecule data using ChEBI
Accessing small molecule data using ChEBIAccessing small molecule data using ChEBI
Accessing small molecule data using ChEBIDuncan Hull
 
OWL-XML-Summer-School-09
OWL-XML-Summer-School-09OWL-XML-Summer-School-09
OWL-XML-Summer-School-09Duncan Hull
 
Authenticating Scientists with OpenID
Authenticating Scientists with OpenIDAuthenticating Scientists with OpenID
Authenticating Scientists with OpenIDDuncan Hull
 
The Invisible Scientist
The Invisible ScientistThe Invisible Scientist
The Invisible ScientistDuncan Hull
 
The Year of Blogging Dangerously
The Year of Blogging DangerouslyThe Year of Blogging Dangerously
The Year of Blogging DangerouslyDuncan Hull
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)Duncan Hull
 
Chemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-upChemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-upDuncan Hull
 
Chemoinformatics and information management
Chemoinformatics and information managementChemoinformatics and information management
Chemoinformatics and information managementDuncan Hull
 
Text mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literatureText mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literatureDuncan Hull
 
Issues for metabolomics and
Issues for metabolomics and Issues for metabolomics and
Issues for metabolomics and Duncan Hull
 
Adding Meaning To Your Data
Adding Meaning To Your DataAdding Meaning To Your Data
Adding Meaning To Your DataDuncan Hull
 
Web of Science: REST or SOAP?
Web of Science: REST or SOAP?Web of Science: REST or SOAP?
Web of Science: REST or SOAP?Duncan Hull
 

Mehr von Duncan Hull (20)

Why study plants?
Why study plants?Why study plants?
Why study plants?
 
Embedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculumEmbedding employability in the Computer Science curriculum
Embedding employability in the Computer Science curriculum
 
Wikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the UglyWikipedia at the Royal Society: The Good, the Bad and the Ugly
Wikipedia at the Royal Society: The Good, the Bad and the Ugly
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia
 
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome CampusBibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
Bibliography 2.0: A citeulike case study from the Wellcome Trust Genome Campus
 
OWL and OBO
OWL and OBOOWL and OBO
OWL and OBO
 
Accessing small molecule data using ChEBI
Accessing small molecule data using ChEBIAccessing small molecule data using ChEBI
Accessing small molecule data using ChEBI
 
How to Blog
How to BlogHow to Blog
How to Blog
 
OWL-XML-Summer-School-09
OWL-XML-Summer-School-09OWL-XML-Summer-School-09
OWL-XML-Summer-School-09
 
Authenticating Scientists with OpenID
Authenticating Scientists with OpenIDAuthenticating Scientists with OpenID
Authenticating Scientists with OpenID
 
The Invisible Scientist
The Invisible ScientistThe Invisible Scientist
The Invisible Scientist
 
The Year of Blogging Dangerously
The Year of Blogging DangerouslyThe Year of Blogging Dangerously
The Year of Blogging Dangerously
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)
 
Chemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-upChemical named entity recognition and literature mark-up
Chemical named entity recognition and literature mark-up
 
Chemoinformatics and information management
Chemoinformatics and information managementChemoinformatics and information management
Chemoinformatics and information management
 
Text mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literatureText mining tools for semantically enriching scientific literature
Text mining tools for semantically enriching scientific literature
 
Issues for metabolomics and
Issues for metabolomics and Issues for metabolomics and
Issues for metabolomics and
 
Adding Meaning To Your Data
Adding Meaning To Your DataAdding Meaning To Your Data
Adding Meaning To Your Data
 
Web of Science: REST or SOAP?
Web of Science: REST or SOAP?Web of Science: REST or SOAP?
Web of Science: REST or SOAP?
 

Defrosting the Digital Library: A survey of bibliographic tools for the next generation web

  • 1. Defrosting the Digital Library: A survey of bibliographic tools for the next generation Web. Duncan Hull. Faculty of Life Sciences (1992-6) BSc; Computer Science (2002-2007) MSc, PhD; Chemistry (2008-date) Postdoc
  • 2. It’s all Casey’s fault! Dr. Casey Bergman, Lecturer Faculty of Life Sciences I s Citeulike.org! http://ukpmc.ac.uk/
  • 6. Metadata in: Chemistry (Science of Matter), Biology (Science of Life), Informatics (Science of Information); Cheminformatics, Biochemistry, Bioinformatics. Science! www.mib.ac.uk nactem.ac.uk/refine www.citeulike.org
  • 7. Representing Evidence For Interacting Network Elements (REFINE). www.sbml.org from www.biomodels.net database at the EBI.ac.uk
  • 8. Example from Glycolysis in Yeast reactant reactant product product modifier This is just one reaction, there are at least another 1700+ in Yeast
  • 9. Synonyms from Pedro Mendes B-Net Database http://www.comp-sys-bio.org/yeastnet/
      Name: Synonyms
      D-Glucose: dextrose; D-Glucose; D-(+)-glucose; D(+)-glucose; grape sugar; Traubenzucker
      ATP: Adenosine 5'-triphosphate; Adenosine triphosphate; H4atp
      Hexokinase: Hexokinase-1; Hexokinase-A; Hexokinase PI; YFR053C
      ADP: 5'-adenylphosphoric acid; Adenosine 5'-diphosphate; H3adp
      Glucose-6-phosphate: Robison ester; D-Glucose 6-phosphate
  • 10. Chemistry Biology Informatics Cheminformatics Biochemistry Bioinformatics
  • 11. For more info: www.nactem.ac.uk/refine. One of the biggest challenges is getting hold of accurate metadata from libraries and databases.
  • 14. Final solution: Web XSLT Print
  • 18. I hate powerpoint Radio MS Word TV
  • 19. How do they stay in business? Broadcasting House, London Foreign governments, e.g. U.S.A. etc
  • 20. Word: Not the best way to manage data and metadata
  • 21. Getting Rid of Word database XML schema Web & Intranet Printed documents XSLT
  • 22. A solution that worked! getMetadata getData Who is Thabo Mbeki? These documents are all about Thabo Mbeki Thabo Mbeki
  • 25. From ~1824 until ~1989: JRULM (Main Library), Joule Library. Mostly “private”, only available to an elite (e.g. University of Manchester students and staff). Photos via dpicker http://www.flickr.com/photos/dpicker/3107856991/ and pit yacker http://www.flickr.com/photos/78825653@N00/131611136
  • 29. Timeline: Unchanged for centuries but… 20 years ÷ 2309 years = <1%
  • 30. Everything’s Gone Digital! www.scopus.com www.pubmed.gov http://ukpmc.ac.uk www.isiknowledge.com scholar.google.com
  • 33. Isolated publication silos (Chemistry, Informatics, Biology): impersonal, isolated, unsociable, generally rubbish
  • 37. Can’t get metadata (decoupled from data): PDF. getMetadata, getData. Title: defrosting the digital library. Authors: Duncan Hull, Steve Pettifer and Douglas Kell. Published: 2008. Tell me more. Don’t know, try Google. Don’t know, title might be “defrosting…”. Where did this come from?
  • 39. Can’t get metadata (decoupled from data): PDF. Peter Murray-Rust: “PDF is a hamburger, and we're trying to turn it back into a cow.” http://tinyurl.com/pdfhamburger Hamburger (unstructured data); Cow (structured data); publishing; text-mining
  • 43. www.citeulike.org: Richard Cameron, Kevin Emamy. “The reason I wrote the site [citeulike.org] was, after recently coming back to academia, I was slightly shocked by the quality of some of the tools available to help academics do their job. I found it preferable to start writing proper tools for my own use than to use existing software.” Pictures from http://network.nature.com/people/mfenner/blog/2009/01/30/interview-with-kevin-emamy and http://www.citeulike.org/faq/faq.adp
  • 45. All references in one place
  • 46. Click Post to Citeulike
  • 49. Citegeist = Citeulike + Zeitgeist
  • 50. allegedly 2,243,177 ~2,000 /day variable 674,076 2,880 /day 2 papers / min Linear growth ~500,000
  • 53. Casey Bergman story: I was importing papers on Solexa and 454 genome assembly and came across the following paper: http://www.citeulike.org/user/cisevol/article/1465689 which was a real find in terms of convincing me that light shotgun sequence data is worth analysing. I nicked this from a PhD student's library in Brazil: http://www.citeulike.org/profile/GustavoLacerda Wouldn’t have found this any other way (e.g. keyword searching or following citation trails)
  • 54. Many different solutions e.g. Papyro: Steve Pettifer http://utopia.cs.manchester.ac.uk/
  • 55. And the rest… www.mendeley.com www.zotero.org www.connotea.org www.mekentosj.com www.hubmed.org www.2collab.com www.refworks.com Re-couple metadata that has been de-coupled from data: “iTunes for PDF files”
  • 56. There is still lots more metadata. How many times has http://pubmed.gov/19060304 been cited? Who has cited http://pubmed.gov/19060304 ? Give me all the references that cite this one. Give me all the references cited by http://pubmed.gov/19060304. Who the hell is Doug Kell? Steve Pettifer? Duncan Hull? What is Doug Kell’s h-index? Impact factor? Notify me whenever Steve Pettifer publishes a paper. Notify me whenever someone cites http://pubmed.gov/19060304. Remember: machines ask these questions, not just humans.
  • 57. Digital Identity would solve some of these problems Give yourself a URI, you deserve it! Tim Berners-Lee http://www.w3.org/People/Berners-Lee/card#i see http://dig.csail.mit.edu/breadcrumbs/node/71
  • 64. Conclusions: Do us a favour!
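
The synonym table on slide 9 shows why matching metadata across databases is hard: one compound or enzyme can appear under many names. As a rough illustration only, the table could be turned into a case-insensitive lookup that maps any listed synonym back to its name. The `SYNONYMS` dictionary, `build_index` and `canonical_name` below are hypothetical names invented for this sketch; they are not part of the B-Net database or of any tool mentioned in the talk, and real name resolution would need far more than exact string matching.

```python
# Hypothetical sketch: the data is copied from the slide 9 synonym table;
# the structure and function names are assumptions made for illustration.
SYNONYMS = {
    "D-Glucose": ["dextrose", "D-Glucose", "D-(+)-glucose", "D(+)-glucose",
                  "grape sugar", "Traubenzucker"],
    "ATP": ["Adenosine 5'-triphosphate", "Adenosine triphosphate", "H4atp"],
    "Hexokinase": ["Hexokinase-1", "Hexokinase-A", "Hexokinase PI", "YFR053C"],
    "ADP": ["5'-adenylphosphoric acid", "Adenosine 5'-diphosphate", "H3adp"],
    "Glucose-6-phosphate": ["Robison ester", "D-Glucose 6-phosphate"],
}


def build_index(synonyms):
    """Map every name and synonym (lower-cased, trimmed) to its name."""
    index = {}
    for name, alternatives in synonyms.items():
        for label in [name, *alternatives]:
            index[label.strip().lower()] = name
    return index


INDEX = build_index(SYNONYMS)


def canonical_name(label):
    """Return the table's name for a label, or None if it is not listed."""
    return INDEX.get(label.strip().lower())


if __name__ == "__main__":
    for label in ["Traubenzucker", "h4atp", "YFR053C", "unknown sugar"]:
        print(label, "->", canonical_name(label))
```

Exact matching like this fails as soon as a paper uses a spelling that is not in the table, which is one reason accurate metadata from libraries and databases matters.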