SlideShare ist ein Scribd-Unternehmen logo
1 von 56
Paul Groth
Elsevier Labs
@pgroth | pgroth.com
Data Integration & Transparency
Tackling the tension
University of Fribourg Informatics Colloquium June 15, 2015
Outline
• Data integration for analysis
– i.e. remixing data
• The need for transparency
• Provenance as a solution
• The downloads folder problem
60 % of time is spent on data
preparation
http://openphacts.org
pmu@openphacts.org
@Open_PHACTS
Why?
Public Domain Drug Discovery Data:
Pharma are accessing, processing, storing & re-processing
Literature
PubChem
Genbank
Patents
Databases
Downloads
Data Integration Data Analysis
Firewalled Databases
Repeat @
each
company
x
Prioritised Research Questions
Number sum Nr of 1 Question
15 12 9 All oxido,reductase inhibitors active <100nM in both human and mouse
18 14 8
Given compound X, what is its predicted secondary pharmacology? What are the on and off,target safety
concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor,
KOL) for findings associated with a compound?
24 13 8
Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine
ADMET profile of actives.
32 13 8 For a given interaction profile, give me compounds similar to it.
37 13 8
The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine
protease assays for molecules that contain substructure X.
38 13 8
Retrieve all experimental and clinical data for a given list of compounds defined by their chemical
structure (with options to match stereochemistry or not).
41 13 8
A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to
modulate the target directly? What are the compounds that may modulate the target directly? i.e. return
all cmpds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both
from structured assay databases and the literature.
44 13 8 Give me all active compounds on a given target with the relevant assay data
46 13 8 Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease)
59 14 8 Identify all known protein-protein interaction inhibitors
www.openphacts.org
Research question 15: All oxido reductase inhibitors active < 100nM in both human and mouse
From Mabel Loza - USC team
From Mabel Loza - USC team
From Mabel Loza - USC team
From Mabel Loza - USC team
Research question 15: All oxido reductase inhibitors active < 100nM in both human and mouse
ChEMBL:
Search target Oxidoreductase: 481 targets from different species
Selection of all the oxidoreductases and filtering bioactivities with the
criteria IC50 < 100 (no units could be selected): 11497 data obtained
Table exported to a excel spreadsheet and manually filtered
From Mabel Loza - USC team
5 people
Working 6 hours
Problem: Data Integration
Data
Source
Data
Source
Data Warehouse
Queries
Extract
Transform
Load
Data
Source
Data
Source
Mediator
Queries
Query
Reformulation
Using the Power of Open PHACTS, London, 22-23 April 2013
Nanopub
Db
VoID
Data Cache
(Virtuoso Triple Store)
Semantic Workflow Engine
Linked Data API (RDF/XML, TTL, JSON)
Domain
Specific
Services
Identity
Resolution
Service
Chemistry
Registration
Normalisation
& Q/C
Identifier
Management
Service
index
CorePlatform
P12374
EC2.43.4
CS4532
“Adenosine
receptor 2a”
VoID
Db
Nanopub
Db
VoID
Db
VoID
Nanopub
VoID
Public Content Commercial
Public Ontologies
User
Annotations
Applications
16
Open PHACTS Explorer
17
Open PHACTS Explorer
?
Credits: Curt Tilmes, Peter Fox
Tilmes, C.; Fox, P.; Ma, X.; McGuinness, D.L.; Privette, A.P.; Smith, A.; Waple, A.;
Zednik, S.; Zheng, J.G., "Provenance Representation for the National Climate
Assessment in the Global Change Information System," Geoscience and Remote
Sensing, IEEE Transactions on , vol.51, no.11, pp.5160,5168, Nov. 2013
Problem: I don’t trust your assessment what is
it based on?
Tension:
Integrated &
Summarized
Data
Transparenc
y& Trust
Solution
Integrating and exposing provenance
provided by multiple sources
provbook.org
National Climate Change Assessment Provenance
Tooling
http://asdf.readthedocs.org/en/latest/provenance.html
http://www.slideshare.net/soilandreyes/20130529-taverna-provenance
Towards Workflow Ecosystems Through Semantic and Standard Representations Garijo, D.; Gil, Y.; and Corcho, O.
In Proceedings of the Ninth Workshop on Workflows in Support of Large-Scale Science (WORKS), held in
conjunction with the IEEE ACM International Conference on High-Performance Computing (SC), New Orleans, LA,
https://github.com/pgroth/PROVTutorial
Great…..
but
Data integration is manual
Look to OS techniques
1. Taint Tracking
2. Record and Replay
Work with Manolis Stamatogiannakis & Herbert Bos VU University Amsterdam – Security
Group
https://www.youtube.com/watch?v=BD0h6M5m
Voo
http://www.androidreran.com
Use R&R for provenance
1. Execution Capture
2. Application of instrumentation
3. Provenance analysis
4. Selection and iteration
Implemented using plugins for Platform for
Architecture-Neutral Dynamic Analysis
(PANDA)
An Example (1)
<exe://pam-foreground-~3451> prov:endedAtTime 199090196 .
<exe://getent~3451> a prov:Activity .
<exe://getent~3451> rdf:type dt:getent .
<exe://cut~3452> a prov:Activity .
<exe://cut~3452> rdf:type dt:cut .
<file:/etc/nsswitch.conf> a prov:Entity .
<file:/etc/nsswitch.conf> rdfs:label "/etc/nsswitch.conf" .
<file:/etc/nsswitch.conf> rdf:type dt:Unknown .
<exe://getent~3451> prov:used <file:/etc/nsswitch.conf> .
# unused file:3477815296:getent~3451:/etc/passwd:r0:w0:f524288
<exe://getent~3451> prov:startedAtTime 199090196 .
<exe://getent~3451> prov:endedAtTime 200392668 .
<file:FD0_3452> a prov:Entity .
<file:FD0_3452> rdfs:label "FD0_3452"
An Example (2
Example (3)
Example (4)
Conclusion
• Tension between:
– putting stuff together; and
– documenting what’s been done
• Provenance helps
• Issues in collection
• Standards + Stealth1
1 hat tip Carole Goble
Questions?
• More info:
– openphacts.org
– data2semantics.org
– provbook.org
– https://github.com/m000/dtracker
– Manolis Stamatogiannakis, Paul Groth and Herbert Bos. Decoupling
Provenance Capture and Analysis from Execution. Theory and Practice
of Provenance 2015
– Luc Moreau, Paul Groth, James Cheney, Timothy Lebo, Simon Miles,
The rationale of PROV, Web Semantics: Science, Services and Agents
on the World Wide Web, Available online 20 April 2015
http://dx.doi.org/10.1016/j.websem.2015.04.001.
– Paul Groth, "Transparency and Reliability in the Data Supply Chain,"
IEEE Internet Computing, vol. 17, no. 2, pp. 69-71, March-April, 2013
– Paul Groth, "The Knowledge-Remixing Bottleneck," Intelligent Systems,
IEEE , vol.28, no.5, pp.44,48, Sept.-Oct. 2013

Weitere ähnliche Inhalte

Was ist angesagt?

Ilya Kupershmidt speaks at the Molecular Medicine Tri-Conference
Ilya Kupershmidt speaks at the Molecular Medicine Tri-ConferenceIlya Kupershmidt speaks at the Molecular Medicine Tri-Conference
Ilya Kupershmidt speaks at the Molecular Medicine Tri-ConferenceNextBio
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataPhilip Cheung
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...Carole Goble
 
Reproducibility and replicability: a practical approach
Reproducibility and replicability: a practical approachReproducibility and replicability: a practical approach
Reproducibility and replicability: a practical approachKrzysztof Gorgolewski
 
NIST Dec08 Open Notebook Science Talk
NIST Dec08 Open Notebook Science TalkNIST Dec08 Open Notebook Science Talk
NIST Dec08 Open Notebook Science TalkJean-Claude Bradley
 
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS FoundationPistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS FoundationPistoia Alliance
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...open_phacts
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)Carole Goble
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationStuart Chalk
 
Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Philippe Rocca-Serra
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAGopen_phacts
 
BigDataEurope - Big Data & Health
BigDataEurope - Big Data & HealthBigDataEurope - Big Data & Health
BigDataEurope - Big Data & HealthBigData_Europe
 
Mercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkMercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkBOSC 2010
 
Open Notebook Science in Drug Discovery
Open Notebook Science in Drug DiscoveryOpen Notebook Science in Drug Discovery
Open Notebook Science in Drug DiscoveryJean-Claude Bradley
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Carole Goble
 
Assessing Drug Safety Using AI
Assessing Drug Safety Using AIAssessing Drug Safety Using AI
Assessing Drug Safety Using AIDatabricks
 

Was ist angesagt? (20)

Ilya Kupershmidt speaks at the Molecular Medicine Tri-Conference
Ilya Kupershmidt speaks at the Molecular Medicine Tri-ConferenceIlya Kupershmidt speaks at the Molecular Medicine Tri-Conference
Ilya Kupershmidt speaks at the Molecular Medicine Tri-Conference
 
Experimenta
ExperimentaExperimenta
Experimenta
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
 
SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...SEEK for Science: A Data and Model Management Platform to support Open and Re...
SEEK for Science: A Data and Model Management Platform to support Open and Re...
 
Reproducibility and replicability: a practical approach
Reproducibility and replicability: a practical approachReproducibility and replicability: a practical approach
Reproducibility and replicability: a practical approach
 
NIST Dec08 Open Notebook Science Talk
NIST Dec08 Open Notebook Science TalkNIST Dec08 Open Notebook Science Talk
NIST Dec08 Open Notebook Science Talk
 
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS FoundationPistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
Pistoia Alliance European Conference 2015 - Nick Lynch / Open PHACTS Foundation
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
 
chattopadhyay resume
chattopadhyay resumechattopadhyay resume
chattopadhyay resume
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 
Beyond the PDF 2, 2013
Beyond the PDF 2, 2013Beyond the PDF 2, 2013
Beyond the PDF 2, 2013
 
ACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka CollaborationACS 248th Paper 67 Eureka Collaboration
ACS 248th Paper 67 Eureka Collaboration
 
Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3Ontomaton icbo2013-alternative order-t_wv3
Ontomaton icbo2013-alternative order-t_wv3
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG
 
BigDataEurope - Big Data & Health
BigDataEurope - Big Data & HealthBigDataEurope - Big Data & Health
BigDataEurope - Big Data & Health
 
Mercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_frameworkMercer bosc2010 microsoft_framework
Mercer bosc2010 microsoft_framework
 
IGERT Open Notebook Science
IGERT Open Notebook ScienceIGERT Open Notebook Science
IGERT Open Notebook Science
 
Open Notebook Science in Drug Discovery
Open Notebook Science in Drug DiscoveryOpen Notebook Science in Drug Discovery
Open Notebook Science in Drug Discovery
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 
Assessing Drug Safety Using AI
Assessing Drug Safety Using AIAssessing Drug Safety Using AI
Assessing Drug Safety Using AI
 

Andere mochten auch

"Don't Publish, Release" - Revisited
"Don't Publish, Release" - Revisited "Don't Publish, Release" - Revisited
"Don't Publish, Release" - Revisited Paul Groth
 
Altmetrics: painting a broader picture of impact
Altmetrics: painting a broader picture of impactAltmetrics: painting a broader picture of impact
Altmetrics: painting a broader picture of impactPaul Groth
 
Telling your research story with (alt)metrics
Telling your research story with (alt)metricsTelling your research story with (alt)metrics
Telling your research story with (alt)metricsPaul Groth
 
Decoupling Provenance Capture and Analysis from Execution
Decoupling Provenance Capture and Analysis from ExecutionDecoupling Provenance Capture and Analysis from Execution
Decoupling Provenance Capture and Analysis from ExecutionPaul Groth
 
Data for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchersData for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchersPaul Groth
 
Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at ElsevierPaul Groth
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkPaul Groth
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsPaul Groth
 
Structured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialStructured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialPaul Groth
 
Knowledge Graphs at Elsevier
Knowledge Graphs at ElsevierKnowledge Graphs at Elsevier
Knowledge Graphs at ElsevierPaul Groth
 
Open PHACTS API Walkthrough
Open PHACTS API WalkthroughOpen PHACTS API Walkthrough
Open PHACTS API WalkthroughPaul Groth
 
Tradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance CaptureTradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance CapturePaul Groth
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging EnvironmentsPaul Groth
 
Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaPaul Groth
 
Φύλλο Αναφοράς Ομάδας
Φύλλο Αναφοράς ΟμάδαςΦύλλο Αναφοράς Ομάδας
Φύλλο Αναφοράς ΟμάδαςPetros Michailidis
 
Star Chart Presentation
Star Chart PresentationStar Chart Presentation
Star Chart Presentationtbilleaud
 
Google History
Google HistoryGoogle History
Google HistoryRoger
 

Andere mochten auch (20)

"Don't Publish, Release" - Revisited
"Don't Publish, Release" - Revisited "Don't Publish, Release" - Revisited
"Don't Publish, Release" - Revisited
 
Altmetrics: painting a broader picture of impact
Altmetrics: painting a broader picture of impactAltmetrics: painting a broader picture of impact
Altmetrics: painting a broader picture of impact
 
Telling your research story with (alt)metrics
Telling your research story with (alt)metricsTelling your research story with (alt)metrics
Telling your research story with (alt)metrics
 
Decoupling Provenance Capture and Analysis from Execution
Decoupling Provenance Capture and Analysis from ExecutionDecoupling Provenance Capture and Analysis from Execution
Decoupling Provenance Capture and Analysis from Execution
 
Data for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchersData for Science: How Elsevier is using data science to empower researchers
Data for Science: How Elsevier is using data science to empower researchers
 
Information architecture at Elsevier
Information architecture at ElsevierInformation architecture at Elsevier
Information architecture at Elsevier
 
Research Data Sharing: A Basic Framework
Research Data Sharing: A Basic FrameworkResearch Data Sharing: A Basic Framework
Research Data Sharing: A Basic Framework
 
Sources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization SystemsSources of Change in Modern Knowledge Organization Systems
Sources of Change in Modern Knowledge Organization Systems
 
Structured Data & the Future of Educational Material
Structured Data & the Future of Educational MaterialStructured Data & the Future of Educational Material
Structured Data & the Future of Educational Material
 
Knowledge Graphs at Elsevier
Knowledge Graphs at ElsevierKnowledge Graphs at Elsevier
Knowledge Graphs at Elsevier
 
Open PHACTS API Walkthrough
Open PHACTS API WalkthroughOpen PHACTS API Walkthrough
Open PHACTS API Walkthrough
 
Tradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance CaptureTradeoffs in Automatic Provenance Capture
Tradeoffs in Automatic Provenance Capture
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
 
Knowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPediaKnowledge Graph Construction and the Role of DBPedia
Knowledge Graph Construction and the Role of DBPedia
 
Φύλλο Αναφοράς Ομάδας
Φύλλο Αναφοράς ΟμάδαςΦύλλο Αναφοράς Ομάδας
Φύλλο Αναφοράς Ομάδας
 
D I A P 1
D I A P 1D I A P 1
D I A P 1
 
Star Chart Presentation
Star Chart PresentationStar Chart Presentation
Star Chart Presentation
 
Adicción al sexo
Adicción al sexoAdicción al sexo
Adicción al sexo
 
Google History
Google HistoryGoogle History
Google History
 
04. Αggregeted results
04. Αggregeted results04. Αggregeted results
04. Αggregeted results
 

Ähnlich wie Data Integration vs Transparency: Tackling the tension

EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...European Data Forum
 
Opening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs apiOpening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs apiChris Evelo
 
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
2014-03-20 Open PHACTS - A Data Platform for Drug Discoveryopen_phacts
 
Mashing Up Drug Discovery
Mashing Up Drug DiscoveryMashing Up Drug Discovery
Mashing Up Drug DiscoverySciBite Limited
 
Open PHACTS (Sept 2013) EBI Industry Programme
Open PHACTS (Sept 2013) EBI Industry ProgrammeOpen PHACTS (Sept 2013) EBI Industry Programme
Open PHACTS (Sept 2013) EBI Industry ProgrammeSciBite Limited
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsCarole Goble
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...GigaScience, BGI Hong Kong
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and modelsmyGrid team
 
Integrated Analysis of Toxicology Data supported by ToxBank
Integrated Analysis of Toxicology Data supported by ToxBankIntegrated Analysis of Toxicology Data supported by ToxBank
Integrated Analysis of Toxicology Data supported by ToxBankBarry Hardy
 
OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...Barry Hardy
 
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BigData_Europe
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsmikaelhuss
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceCarole Goble
 
Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium Kerstin Forsberg
 
The FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyThe FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyFAIRDOM
 

Ähnlich wie Data Integration vs Transparency: Tackling the tension (20)

EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
EDF2014: Paul Groth, Department of Computer Science & The Network Institute, ...
 
Practical semantics in the pharmaceutical industry - the Open PHACTS project
Practical semantics in the pharmaceutical industry - the Open PHACTS projectPractical semantics in the pharmaceutical industry - the Open PHACTS project
Practical semantics in the pharmaceutical industry - the Open PHACTS project
 
OpenTox Europe 2013
OpenTox Europe 2013OpenTox Europe 2013
OpenTox Europe 2013
 
Opening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs apiOpening up pharmacological space, the OPEN PHACTs api
Opening up pharmacological space, the OPEN PHACTs api
 
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
2014-03-20 Open PHACTS - A Data Platform for Drug Discovery
 
Mashing Up Drug Discovery
Mashing Up Drug DiscoveryMashing Up Drug Discovery
Mashing Up Drug Discovery
 
Open PHACTS (Sept 2013) EBI Industry Programme
Open PHACTS (Sept 2013) EBI Industry ProgrammeOpen PHACTS (Sept 2013) EBI Industry Programme
Open PHACTS (Sept 2013) EBI Industry Programme
 
Reproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trendsReproducibility (and the R*) of Science: motivations, challenges and trends
Reproducibility (and the R*) of Science: motivations, challenges and trends
 
Online Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery SystemsOnline Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery Systems
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and models
 
Integrated Analysis of Toxicology Data supported by ToxBank
Integrated Analysis of Toxicology Data supported by ToxBankIntegrated Analysis of Toxicology Data supported by ToxBank
Integrated Analysis of Toxicology Data supported by ToxBank
 
Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...
 
OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...
 
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
 
Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium Lankade data Vinnova webbinarium
Lankade data Vinnova webbinarium
 
The FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems BiologyThe FAIRDOM Commons for Systems Biology
The FAIRDOM Commons for Systems Biology
 
Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...
 

Mehr von Paul Groth

Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIPaul Groth
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningPaul Groth
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Paul Groth
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-cziPaul Groth
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph MaintenancePaul Groth
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph FuturesPaul Groth
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph MaintenancePaul Groth
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of DataPaul Groth
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text Paul Groth
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data ShowcasingPaul Groth
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphPaul Groth
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for SciencePaul Groth
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?Paul Groth
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsPaul Groth
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationPaul Groth
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsPaul Groth
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsPaul Groth
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chainPaul Groth
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicinePaul Groth
 

Mehr von Paul Groth (20)

Data Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AIData Curation and Debugging for Data Centric AI
Data Curation and Debugging for Data Centric AI
 
Content + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learningContent + Signals: The value of the entire data estate for machine learning
Content + Signals: The value of the entire data estate for machine learning
 
Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.Data Communities - reusable data in and outside your organization.
Data Communities - reusable data in and outside your organization.
 
Minimal viable-datareuse-czi
Minimal viable-datareuse-cziMinimal viable-datareuse-czi
Minimal viable-datareuse-czi
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
 
Knowledge Graph Maintenance
Knowledge Graph MaintenanceKnowledge Graph Maintenance
Knowledge Graph Maintenance
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
Thinking About the Making of Data
Thinking About the Making of DataThinking About the Making of Data
Thinking About the Making of Data
 
End-to-End Learning for Answering Structured Queries Directly over Text
End-to-End Learning for  Answering Structured Queries Directly over Text End-to-End Learning for  Answering Structured Queries Directly over Text
End-to-End Learning for Answering Structured Queries Directly over Text
 
From Data Search to Data Showcasing
From Data Search to Data ShowcasingFrom Data Search to Data Showcasing
From Data Search to Data Showcasing
 
Elsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge GraphElsevier’s Healthcare Knowledge Graph
Elsevier’s Healthcare Knowledge Graph
 
The Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for ScienceThe Challenge of Deeper Knowledge Graphs for Science
The Challenge of Deeper Knowledge Graphs for Science
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domains
 
Progressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computationProgressive Provenance Capture Through Re-computation
Progressive Provenance Capture Through Re-computation
 
From Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge GraphsFrom Text to Data to the World: The Future of Knowledge Graphs
From Text to Data to the World: The Future of Knowledge Graphs
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
The need for a transparent data supply chain
The need for a transparent data supply chainThe need for a transparent data supply chain
The need for a transparent data supply chain
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 

Kürzlich hochgeladen

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Kürzlich hochgeladen (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Data Integration vs Transparency: Tackling the tension

  • 1. Paul Groth Elsevier Labs @pgroth | pgroth.com Data Integration & Transparency Tackling the tension University of Fribourg Informatics Colloquium June 15, 2015
  • 2.
  • 3. Outline • Data integration for analysis – i.e. remixing data • The need for transparency • Provenance as a solution • The downloads folder problem
  • 4. 60 % of time is spent on data preparation
  • 6. Why? Public Domain Drug Discovery Data: Pharma are accessing, processing, storing & re-processing Literature PubChem Genbank Patents Databases Downloads Data Integration Data Analysis Firewalled Databases Repeat @ each company x
  • 7. Prioritised Research Questions Number sum Nr of 1 Question 15 12 9 All oxido,reductase inhibitors active <100nM in both human and mouse 18 14 8 Given compound X, what is its predicted secondary pharmacology? What are the on and off,target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound? 24 13 8 Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives. 32 13 8 For a given interaction profile, give me compounds similar to it. 37 13 8 The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X. 38 13 8 Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not). 41 13 8 A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all cmpds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both from structured assay databases and the literature. 44 13 8 Give me all active compounds on a given target with the relevant assay data 46 13 8 Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease) 59 14 8 Identify all known protein-protein interaction inhibitors www.openphacts.org
  • 8. Research question 15: All oxido reductase inhibitors active < 100nM in both human and mouse From Mabel Loza - USC team
  • 9. From Mabel Loza - USC team
  • 10. From Mabel Loza - USC team
  • 11. From Mabel Loza - USC team
  • 12. Research question 15: All oxido reductase inhibitors active < 100nM in both human and mouse ChEMBL: Search target Oxidoreductase: 481 targets from different species Selection of all the oxidoreductases and filtering bioactivities with the criteria IC50 < 100 (no units could be selected): 11497 data obtained Table exported to a excel spreadsheet and manually filtered From Mabel Loza - USC team
  • 14. Problem: Data Integration Data Source Data Source Data Warehouse Queries Extract Transform Load Data Source Data Source Mediator Queries Query Reformulation
  • 15. Using the Power of Open PHACTS, London, 22-23 April 2013 Nanopub Db VoID Data Cache (Virtuoso Triple Store) Semantic Workflow Engine Linked Data API (RDF/XML, TTL, JSON) Domain Specific Services Identity Resolution Service Chemistry Registration Normalisation & Q/C Identifier Management Service index CorePlatform P12374 EC2.43.4 CS4532 “Adenosine receptor 2a” VoID Db Nanopub Db VoID Db VoID Nanopub VoID Public Content Commercial Public Ontologies User Annotations Applications
  • 18. Credits: Curt Tilmes, Peter Fox Tilmes, C.; Fox, P.; Ma, X.; McGuinness, D.L.; Privette, A.P.; Smith, A.; Waple, A.; Zednik, S.; Zheng, J.G., "Provenance Representation for the National Climate Assessment in the Global Change Information System," Geoscience and Remote Sensing, IEEE Transactions on , vol.51, no.11, pp.5160,5168, Nov. 2013
  • 19.
  • 20. Problem: I don’t trust your assessment what is it based on?
  • 22. Solution Integrating and exposing provenance provided by multiple sources
  • 23.
  • 25.
  • 26. National Climate Change Assessment Provenance
  • 27.
  • 29.
  • 30.
  • 31.
  • 33.
  • 35. Towards Workflow Ecosystems Through Semantic and Standard Representations Garijo, D.; Gil, Y.; and Corcho, O. In Proceedings of the Ninth Workshop on Workflows in Support of Large-Scale Science (WORKS), held in conjunction with the IEEE ACM International Conference on High-Performance Computing (SC), New Orleans, LA,
  • 37.
  • 38.
  • 39.
  • 42.
  • 43.
  • 44. Look to OS techniques 1. Taint Tracking 2. Record and Replay Work with Manolis Stamatogiannakis & Herbert Bos VU University Amsterdam – Security Group
  • 45.
  • 46.
  • 47.
  • 50. Use R&R for provenance 1. Execution Capture 2. Application of instrumentation 3. Provenance analysis 4. Selection and iteration Implemented using plugins for Platform for Architecture-Neutral Dynamic Analysis (PANDA)
  • 51. An Example (1) <exe://pam-foreground-~3451> prov:endedAtTime 199090196 . <exe://getent~3451> a prov:Activity . <exe://getent~3451> rdf:type dt:getent . <exe://cut~3452> a prov:Activity . <exe://cut~3452> rdf:type dt:cut . <file:/etc/nsswitch.conf> a prov:Entity . <file:/etc/nsswitch.conf> rdfs:label "/etc/nsswitch.conf" . <file:/etc/nsswitch.conf> rdf:type dt:Unknown . <exe://getent~3451> prov:used <file:/etc/nsswitch.conf> . # unused file:3477815296:getent~3451:/etc/passwd:r0:w0:f524288 <exe://getent~3451> prov:startedAtTime 199090196 . <exe://getent~3451> prov:endedAtTime 200392668 . <file:FD0_3452> a prov:Entity . <file:FD0_3452> rdfs:label "FD0_3452"
  • 55. Conclusion • Tension between: – putting stuff together; and – documenting what’s been done • Provenance helps • Issues in collection • Standards + Stealth1 1 hat tip Carole Goble
  • 56. Questions? • More info: – openphacts.org – data2semantics.org – provbook.org – https://github.com/m000/dtracker – Manolis Stamatogiannakis, Paul Groth and Herbert Bos. Decoupling Provenance Capture and Analysis from Execution. Theory and Practice of Provenance 2015 – Luc Moreau, Paul Groth, James Cheney, Timothy Lebo, Simon Miles, The rationale of PROV, Web Semantics: Science, Services and Agents on the World Wide Web, Available online 20 April 2015 http://dx.doi.org/10.1016/j.websem.2015.04.001. – Paul Groth, "Transparency and Reliability in the Data Supply Chain," IEEE Internet Computing, vol. 17, no. 2, pp. 69-71, March-April, 2013 – Paul Groth, "The Knowledge-Remixing Bottleneck," Intelligent Systems, IEEE , vol.28, no.5, pp.44,48, Sept.-Oct. 2013

Hinweis der Redaktion

  1. A lot of my work is based on the assumption that great stuff is a combo of old stuff and that the web accelerates all aspects of this process. How do deal (technically & socially) with the implications of in terms of usability, credit, responsibility, and transparency?
  2. NASA, A.40 Computational Modeling Algorithms and Cyberinfrastructure, tech. report, NASA, 19 Dec. 2011
  3. pubmed
  4. scifinder
  5. brenda
  6. chembl
  7. Extract, transform, load Or source wrapping
  8. libraries
  9. da