Data for Science: How Elsevier is using data science to empower researchers


Each month 12 million people use Elsevier’s ScienceDirect platform. The Mendeley social network has 4.6 million registered users. 3500 institutions make use of ClinicalKey to bring the latest in medical research to doctors and nurses. How can we help these users be more effective? In this talk, I give an overview of how Elsevier is employing data science to improve its services, from recommendation systems to natural language processing and analytics. While data science is changing how Elsevier serves researchers, it is also changing research practice itself. In that context, I discuss the impact that large amounts of open research data are having and the challenges researchers face in making use of it, in particular in terms of data integration and reuse. We are just beginning to see how technology and data are changing science, and this in turn affects how best to empower those who practice it.


DATA FOR SCIENCE
HOW ELSEVIER IS USING DATA SCIENCE TO EMPOWER RESEARCHERS
Paul Groth | @pgroth | pgroth.com
Disruptive Technology Director
Elsevier Labs | @elsevierlabs
European Data Forum 2016

40 million reactions
75 million compounds
500 million facts

3 EXAMPLES
• Personalized: what should I read?
• Actionable: who should I collaborate with?
• Consumable: how do I make my data available?

RECOMMENDATIONS AT MENDELEY
• Maya Hristakeva
• Data Scientist at Mendeley
• @mayahhf
• Spark Summit 2015
• http://www.slideshare.net/SparkSummit/sparking-science-up-with-research-recommendations-by-maya-hristakeva

MENDELEY BUILDS TOOLS TO HELP RESEARCHERS …
• Read & Organize
• Search & Discover
• Collaborate & Network
• Experiment & Synthesize

BEING THE BEST RESEARCHER YOU CAN BE!
• Good researchers are on top of their game
• Large amount of research produced
• Takes time to get what you need
• Help researchers by recommending relevant research

PERSONALIZED ARTICLE RECOMMENDATION
Input: User libraries
Output: Suggested articles to read
Algorithms:
• Collaborative Filtering
– Item-based
– User-based
– Matrix Factorization
• Content-based
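
As a rough illustration of the item-based collaborative filtering option listed above, the following is a minimal sketch, not Mendeley's implementation. It assumes each user library is simply a set of article identifiers and scores unseen articles by cosine similarity over co-occurrence in libraries. The function name `item_based_recommend`, the user names, and the article identifiers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_based_recommend(libraries, user, top_n=3):
    """Suggest articles for `user` based on co-occurrence in all libraries."""
    # Count how many libraries hold each article and each pair of articles.
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for articles in libraries.values():
        for a in articles:
            item_count[a] += 1
        for a, b in combinations(sorted(articles), 2):
            pair_count[(a, b)] += 1

    def similarity(a, b):
        # Cosine similarity between two binary "which libraries hold it" vectors.
        key = (a, b) if a < b else (b, a)
        return pair_count.get(key, 0) / sqrt(item_count[a] * item_count[b])

    owned = libraries[user]
    scores = {}
    for candidate in item_count:
        if candidate in owned:
            continue
        # A candidate scores higher the more similar it is to articles already owned.
        score = sum(similarity(candidate, a) for a in owned)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]


# Hypothetical toy data: user name -> set of article identifiers.
libraries = {
    "ana": {"paper_a", "paper_b", "paper_c"},
    "ben": {"paper_a", "paper_b", "paper_d"},
    "cho": {"paper_b", "paper_c", "paper_d"},
    "dev": {"paper_a", "paper_b"},
    "eli": {"paper_a", "paper_c"},
}
suggestions = item_based_recommend(libraries, "dev")
print(suggestions)
```

At the scale mentioned in the talk, the pair counting would need a distributed framework; this sketch only shows the scoring idea on in-memory data.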

[Chart: quadrants labelled Costly & Good, Costly & Bad, Cheap & Good, Cheap & Bad. Implementations plotted: Tuned IB Mahout, Tuned UB Mahout, Tuned UB Spark, Tuned IB Spark, UB DimSum (Spark MLlib), ALS Matrix Fact. (Spark MLlib). Annotations: Performance +100%, +150%, ~$50.]

CALCULATING 75 TRILLION METRICS
• Benchmark 4600 institutions & 220 countries updated weekly
• 40 terabytes of data
• HPCC massively parallel compute system – 40 node system

60% OF TIME IS SPENT ON DATA PREPARATION

10 ASPECTS OF HIGHLY EFFECTIVE RESEARCH DATA
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data

http://data.mendeley.com/
• Each dataset receives a versioned DOI, so it can be cited
• The citation for the associated article is displayed

CONCLUSION
• Researchers are faced with an ever growing amount of data and content
• Data Science is key to making systems that help them
• I’ve shown three Elsevier examples. Many more!
• Antonio Gulli’s codingplayground.blogspot.nl
• labs.elsevier.com
• Of course, we’re hiring 
Contact: Paul Groth @pgroth



3. 12 million people per month


16. ALL DATA ISN’T CURATED


21. ACADEMIC COLLABORATIONS


Editor's notes

1.8 million unique authors worldwide submitted 1.3 million manuscripts to Elsevier journals
40 million reactions, 75 million compounds, 500 million experimental facts
At Mendeley we build tools to help researchers organise and read research articles, collaborate and connect with other researchers, search and discover new research articles, etc.
815 million articles
“Mendeley Suggest” is our personalised article recommender. It is based on what users have in their libraries, and recommends other related articles.
Calculated for over 4 million users. We are building a personalised article recommender based on what users read. The input is the users’ libraries and the output is a list of articles they may want to add to their library and read. There are a number of different algorithms we can use to generate the recommendations (content-based, collaborative filtering), and in this talk we’ll focus on three types of collaborative filtering algorithms (user-based and item-based, as well as matrix factorisation).
To sum up, we now have a Spark implementation of our production UB CF algorithm which performs well and is a lot simpler to maintain and extend. There are still a few areas where we can tune and optimise further, which could make it faster still and yield bigger gains from using Spark. Depending on your data, different algorithms might work better, so do experiment.
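
For readers unfamiliar with user-based (UB) collaborative filtering, here is a minimal in-memory sketch of the general idea. It is not the production Spark implementation described in these notes. It assumes libraries are sets of article identifiers, uses Jaccard similarity between libraries (one common choice, assumed here), and takes suggestions from the `k` most similar users. All names and data are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two libraries: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_based_recommend(libraries, user, k=2, top_n=3):
    """Suggest articles held by the k users most similar to `user`."""
    mine = libraries[user]
    # Rank all other users by similarity; ties are broken by name.
    neighbours = sorted(
        ((jaccard(mine, lib), name) for name, lib in libraries.items() if name != user),
        key=lambda pair: (-pair[0], pair[1]),
    )[:k]

    scores = {}
    for sim, name in neighbours:
        if sim == 0:
            continue  # a user with no overlap tells us nothing
        for article in libraries[name] - mine:
            # Each neighbour votes for its unseen articles, weighted by similarity.
            scores[article] = scores.get(article, 0.0) + sim
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]


# Hypothetical toy data: user name -> set of article identifiers.
libraries = {
    "ana": {"p1", "p2", "p3"},
    "ben": {"p1", "p2", "p4"},
    "cho": {"p2", "p3", "p5", "p7"},
    "dev": {"p6"},
}
ub_suggestions = user_based_recommend(libraries, "ana")
print(ub_suggestions)
```

The neighbourhood size `k` and the similarity measure are the main tuning knobs in a sketch like this, which is in line with the note's advice to experiment on your own data.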
http://www.tamr.com/piketty-revisited-improving-economics-data-science/
NASA, A.40 Computational Modeling Algorithms and Cyberinfrastructure, tech. report, NASA, 19 Dec. 2011
Data engineering pipelines